I am trying to plot my MultiIndex xarray object in Jupyter using Matplotlib and HoloViews. I can produce a very basic plot with Matplotlib, but I get errors otherwise.
My xarray is this -
I am using this code to plot my spectrogram with Matplotlib, together with a built-in xarray function that finds the maximum value in the matrix I want to plot.
# Plotting in some other way
plt.figure(figsize=(3,5))
data_slice = temp1
max_value = np.log(temp1.max(xr.ALL_DIMS)['__xarray_dataarray_variable__'].values)
xr.ufuncs.log(data_slice).plot(cmap='magma', vmin=0, vmax = max_value*.7)
In this code I get the error - KeyError: 'xarray_dataarray_variable'
When I am plotting the spectrogram using holoviews I use this code -
# plotting the new xarray that we got - 2 dimensional
# making an array that represents the freq bins
final_freqs = np.linspace(0, 125000, 257)
time_to_see = 10
time_stamps_to_be_displayed = [[] for _ in range(165)]
for x in range(0, 55):
    # multiplying it by 0.01 to get it to seconds as each window is for 10 milliseconds.
    time_stamps_to_be_displayed[x].append(time_to_see + x * 0.005)
time_displayed = np.array(time_stamps_to_be_displayed).flatten()
xr_spec = xr.DataArray(temp1, dims = ('freq','time') ,coords = {'freq':final_freqs,'time':time_displayed})
xr_spec.name = 'Spectrogram'
# plotting the graph
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', 'matplotlib')
import os
os.environ['HV_DOC_HTML'] = 'true'
#%env HV_DOC_HTML=true
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
hv.extension('bokeh')
output_notebook()
import imp
imp.reload(hv)
hv_spec = hv.Dataset(xr_spec)
hv.extension('bokeh')
hv_spec.to(hv.Image, ['time', 'freq'])
In this, I get the error - unsupported operand type(s) for -: 'list' and 'list' for the very last line.
What am I doing wrong? Please help me.
StackTrace is here -
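One likely cause of the second error, offered as a guess since the stack trace is not shown: time_stamps_to_be_displayed is a list of lists (most of them empty), so np.array(...).flatten() can end up as an object array whose elements are still Python lists, and a later subtraction between neighbouring coordinates would then fail with "list - list". A minimal sketch of building the time axis as a flat numeric array instead. It assumes 55 time steps of 0.005 s starting at time_to_see; n_time_steps and step_seconds are names introduced here, and the step count has to match the time dimension of temp1.

```python
import numpy as np

time_to_see = 10        # start time in seconds, as in the question
n_time_steps = 55       # assumption: must equal the length of the 'time' axis of temp1
step_seconds = 0.005    # assumption: step taken from the loop in the question

# One float per time step, no nested lists
time_displayed = time_to_see + step_seconds * np.arange(n_time_steps)

print(time_displayed.dtype, time_displayed.shape)
```

An array like this can be passed as the 'time' coordinate of xr.DataArray in place of the flattened list of lists.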
I am trying to make a best fit line for each of my graphs in HoloViews. Right now it just makes one line based on all the data instead of one per graph.
vdims = [('year avg', 'Yearly Average Temperature')]
ds = hv.Dataset(temp, ['Year','State Name'], vdims)
ds = ds.aggregate(function=np.mean)
scat = hv.Scatter(ds,'Year','year avg')
layout = ds.to(hv.Scatter,'Year','year avg') * hv.Slope.from_scatter(scat)
layout.opts(opts.Curve(width=800, height=400, framewise=True))
Which gives this
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import holoviews as hv
hv.extension('bokeh')
from holoviews import opts
from holoviews.plotting.links import DataLink
import hvplot.pandas
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['Even'] = df['D']%2 == 0
vdims = [('A', 'A')]
ds = hv.Dataset(df, ['B','Even'], vdims)
#ds = ds.aggregate(function=np.mean) this does nothing here, in my code it takes the mean of 100 or so data points in each group for each year.
scat = hv.Scatter(ds,'B','A')
layout = ds.to(hv.Scatter,'B','A') * hv.Slope.from_scatter(scat)
layout.opts(opts.Curve(width=800, height=400, framewise=True))
The idea is to make a scatter plot grouped by a variable in the dataframe, in this case if D is even or odd, and also plot a best fit line of that same grouped scatter plot.
What I have is a set of grouped scatter plots but with a best fit line based on all of the data combined, not grouped by even and odd.
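To make the goal concrete, here is a minimal sketch of fitting one line per group with np.polyfit and a pandas groupby, outside HoloViews. The data is synthetic and chosen so the two groups have clearly different trends; the column names follow the example above, and the fits dictionary is a name introduced here.

```python
import numpy as np
import pandas as pd

# Synthetic data: even and odd rows follow different linear trends
b = np.arange(100)
even = b % 2 == 0
a = np.where(even, 2.0 * b + 1.0, -1.0 * b + 50.0)
df = pd.DataFrame({"A": a, "B": b, "Even": even})

# Fit a separate degree-1 polynomial (slope, intercept) for each group
fits = {}
for is_even, group in df.groupby("Even"):
    slope, intercept = np.polyfit(group["B"], group["A"], deg=1)
    fits[bool(is_even)] = (slope, intercept)

for is_even, (slope, intercept) in fits.items():
    print(f"Even={is_even}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Each (slope, intercept) pair could then be drawn with hv.Slope(slope, intercept) and overlaid on the scatter of the matching group; that last step is not tested here.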
I'm a newbie to Altair, and I would like to change the number of bars being plotted in a bar plot. I have done some research online, but couldn't find anything helpful. Here is my code:
import altair as alt
import pandas as pd
import numpy as np
# Generate a random np array of size 1000, and our goal is to plot its distribution.
my_numbers = np.random.normal(size = 1000)
my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})
alt.Chart(my_numbers_df).mark_bar(size = 10).encode(
    alt.X("Integers",
          bin = True,
          scale = alt.Scale(domain=(-5, 5))
    ),
    y = 'count()',
)
The plot right now looks something like this
You can increase the number of bins by passing an alt.Bin() object and specifying maxbins:
import altair as alt
import pandas as pd
import numpy as np
# Generate a random np array of size 1000, and our goal is to plot its distribution.
my_numbers = np.random.normal(size = 1000)
my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})
alt.Chart(my_numbers_df).mark_bar(size = 10).encode(
    alt.X("Integers",
          bin = alt.Bin(maxbins=25),
          scale = alt.Scale(domain=(-5, 5))
    ),
    y = 'count()',
)
I have several histograms that I managed to plot using plotly like this:
fig.add_trace(go.Histogram(x=np.array(data[key]), name=self.labels[i]))
I would like to create something like this 3D stacked histogram but with the difference that each 2D histogram inside is a true histogram and not just a hardcoded line (my data is of the form [0.5 0.4 0.5 0.7 0.4] so using Histogram directly is very convenient)
Note that what I am asking is not similar to this and therefore also not the same as this. In the matplotlib example, the data is presented directly in a 2D array so the histogram is the 3rd dimension. In my case, I wanted to feed a function with many already computed histograms.
The snippet below takes care of both binning and formatting of the figure so that it appears as a stacked 3D chart using multiple traces of go.Scatter3d and np.histogram.
The input is a dataframe with random numbers generated using np.random.normal(50, 5, size=(300, 4)).
We can talk more about the other details if this is something you can use:
Plot 1: Angle 1
Plot 2: Angle 2
Complete code:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'browser'
# data
np.random.seed(123)
df = pd.DataFrame(np.random.normal(50, 5, size=(300, 4)), columns=list('ABCD'))
# plotly setup
fig = go.Figure()
# data binning and traces
for i, col in enumerate(df.columns):
    # bin once and reuse both the counts and the bin edges
    counts, edges = np.histogram(df[col], bins=10, density=False)
    # repeat each count twice and add a zero at both ends so the outline
    # starts and ends on the baseline and matches the length of the edges
    a0 = [0] + np.repeat(counts, 2).tolist() + [0]
    # repeat each bin edge twice so every bar gets two vertical sides
    a1 = np.repeat(edges, 2)
    fig.add_trace(go.Scatter3d(x=[i]*len(a0), y=a1, z=a0,
                               mode='lines',
                               name=col
                               )
                  )
fig.show()
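The binning step can also be checked on its own, without plotly. This is a minimal sketch, assuming a closed step outline is wanted: every bin edge is doubled, and the doubled counts get a zero at each end so both arrays have the same length. step_outline is a helper name introduced here for illustration.

```python
import numpy as np

def step_outline(values, bins=10):
    """Return (y, z) coordinates tracing the outline of a histogram."""
    counts, edges = np.histogram(values, bins=bins)
    y = np.repeat(edges, 2)                                # 2 * (bins + 1) points
    z = np.concatenate(([0], np.repeat(counts, 2), [0]))   # same length as y
    return y, z

rng = np.random.default_rng(123)
sample = rng.normal(50, 5, size=300)
y, z = step_outline(sample)

print(len(y), len(z))
```

The returned y and z can be passed to go.Scatter3d together with a constant x per histogram.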
Unfortunately, you can't use go.Histogram in 3D space, so you need an alternative. I used go.Scatter3d and wanted to use the option to fill the line (see the documentation), but it has an evident bug.
import numpy as np
import plotly.graph_objs as go
# random mat
m = 6
n = 5
mat = np.random.uniform(size=(m,n)).round(1)
# we want to have the number repeated
mat = mat.repeat(2).reshape(m, n*2)
# and finally plot
x = np.arange(2*n)
y = np.ones(2*n)
fig = go.Figure()
for i in range(m):
    fig.add_trace(go.Scatter3d(x=x,
                               y=y*i,
                               z=mat[i,:],
                               mode="lines",
                               # surfaceaxis=1 # bug
                               )
                  )
fig.show()
I want to draw a Rectangle (hv.Bounds) on a plot that has a datetime axis. However it's not clear from the documentation how one would specify the corner points.
Naturally I tried to specify a datetime object, however this results in the following error message:
ValueError: lbrt: tuple element is not numeric
import holoviews as hv
import numpy as np
import pandas as pd
hv.extension('bokeh')
%%opts Curve [width=500]
xs = pd.date_range('1.1.2019', '31.1.2019')
ys = np.sin(range(len(xs)))
box=hv.Bounds((pd.to_datetime('5.1.2019'), 0.1, pd.to_datetime('7.1.2019'), .8))
hv.Curve((xs,ys))
As of version 1.12.3 it's possible to do this:
import holoviews as hv
import pandas as pd
import numpy as np
import hvplot.pandas
hv.extension('bokeh')
index = pd.date_range('1.1.2019', '2.28.2019')
df = pd.DataFrame(np.random.rand(len(index)), index)
pts = pd.to_datetime(['1.15.2019', '2.15.2019'])
box = hv.Bounds((pts[0], 0.1, pts[1], .9)).opts(color='red')
df.hvplot.scatter() * box
As of now, Bounds only accepts numeric bounds, but see this PR if you feel like testing it: https://github.com/pyviz/holoviews/pull/3640
I am trying to generate a smooth line using a dataset that contains time (measured as number of days) and a set of numbers that represent a socioeconomic variable.
Here is a sample of my data:
date, data
726,1.2414
727,1.2414
728,1.2414
729,1.2414
730,1.2414
731,1.2414
732,1.2414
733,1.2414
734,1.2414
735,1.2414
736,1.2414
737,1.804597701
738,1.804597701
739,1.804597701
740,1.804597701
741,1.804597701
742,1.804597701
743,1.804597701
744,1.804597701
745,1.804597701
746,1.804597701
747,1.804597701
748,1.804597701
749,1.804597701
750,1.804597701
751,1.804597701
752,1.793103448
753,1.793103448
754,1.793103448
755,1.793103448
756,1.793103448
757,1.793103448
758,1.793103448
759,1.793103448
760,1.793103448
761,1.793103448
762,1.793103448
763,1.793103448
764,1
765,1
This is my code so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
out_file = "path_to_file/file.csv"
df = pd.read_csv(out_file)
time = df['date']
data = df['data']
ax1 = plt.subplot2grid((4,3),(0,0), colspan = 2, rowspan = 2) # Will be adding other plots
plt.plot(time, data)
plt.yticks(np.arange(1,5,1)) # Include classes 1-4 showing only 1 step changes
plt.gca().invert_yaxis() # Reverse y axis
plt.ylabel('Trend', fontsize = 8, labelpad = 10)
This generates the following plot:
Test plot
I have seen posts that answer similar questions (like the ones below), but can't seem to get my code to work. Can anyone suggest an elegant solution?
Generating smooth line graph using matplotlib
Python Matplotlib - Smooth plot line for x-axis with date values
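One simple option, sketched here with a centered moving average rather than the spline interpolation those posts may use, so treat it as a swapped-in technique. The arrays below rebuild the sample data shown above; moving_average and the window of 7 days are choices made for this sketch, and the window must be odd for the output to keep the input length.

```python
import numpy as np

# Rebuild the sample from the question: day numbers 726..765 and their values
days = np.arange(726, 766)
values = np.concatenate([
    np.full(11, 1.2414),        # days 726-736
    np.full(15, 1.804597701),   # days 737-751
    np.full(12, 1.793103448),   # days 752-763
    np.full(2, 1.0),            # days 764-765
])

def moving_average(y, window):
    """Centered moving average; edges are padded by repeating the end values."""
    pad = window // 2
    padded = np.pad(y, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

smooth = moving_average(values, window=7)
print(len(days), len(smooth))
```

Plotting plt.plot(days, smooth) in place of plt.plot(time, data) would then show ramps instead of steps; a larger window gives a smoother but flatter curve.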