How to access/export holoviews (HexTiles) rendered data - python

Is there a way to access the aggregated data contained in, e.g.,
import holoviews as hv
import numpy as np
hv.HexTiles(np.random.rand(100,2)).options(gridsize=4)
that is the locations and values (here: counts) of all hexagons?

There is. The matplotlib backend performs the aggregation internally, but the bokeh backend uses an operation that returns the aggregated data together with the q and r coordinates that define the hex grid. You can import and use the operation like this:
import holoviews as hv
import numpy as np
from holoviews.plotting.bokeh.hex_tiles import hex_binning
hextiles = hv.HexTiles(np.random.rand(100,2))
df = hex_binning(hextiles, gridsize=4).dframe()
df.head()
If you need to compute the hexagon's x/y-locations you'll have to read up on hexagon offset coordinates.
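As a rough illustration of that conversion, here is a generic geometry sketch, not a HoloViews or bokeh call. It assumes that q and r are axial coordinates in a pointy-top layout, that r increases downwards, and that you already know the hexagon size (centre-to-corner distance) in data units. The function name axial_to_xy and the size parameter are hypothetical; check the orientation and scaling against your own plot before relying on it.

```python
import math

def axial_to_xy(q, r, size=1.0):
    """Centre of a pointy-top hexagon given axial coordinates (q, r).

    Assumptions (not taken from HoloViews): `size` is the distance from
    the hexagon centre to a corner, and r grows downwards on the y axis.
    """
    x = size * math.sqrt(3) * (q + r / 2)
    y = -size * 1.5 * r
    return x, y

# One step along q moves the centre horizontally by sqrt(3) * size.
print(axial_to_xy(1, 0))
# One step along r moves half a hexagon sideways and 1.5 * size down.
print(axial_to_xy(0, 1, size=2.0))
```

If the plot uses a different aspect ratio on the two axes, the x and y values would need an additional scale factor, which this sketch deliberately leaves out.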

Related

How can you decimate data depending on zoom level of plot?

The documentation for holoviews' decimate operation seems to imply that if max_samples=100, say, you get a plot with 100 points at most no matter the zoom level.
With the following example, I see no new dots appear as I zoom in. Can holoviews achieve this? Can some other package?
import numpy as np
import holoviews as hv
import pandas as pd
# from holoviews import opts
# from holoviews.operation.datashader import datashade, shade, dynspread, spread
# from holoviews.operation.datashader import rasterize, ResamplingOperation
from holoviews.operation import decimate
hv.extension('bokeh', width=200)
# Default values suitable for this notebook
decimate.max_samples=100
np.random.seed(1)
points = hv.Points(np.random.multivariate_normal((0,0), [[0.1, 0.1], [0.1, 1.0]], (1_000_000,)))
decimate(points)
The above code seems to decimate data once, and then show the same dots regardless of x_range and y_range...
Any ideas?
Thanks!

How to change or customize the colors from pandas?

Hi, I'm just starting to use pandas in Python to graph some data instead of Excel.
I want to customize the colors as well as the opacity of some given data, because it always falls back to the default color list.
Here's my code:
from pandas import DataFrame
import matplotlib.pyplot as plt
import numpy as np
x=np.array([[4,8,5,7,6],[2,3,4,2,6],[4,7,4,7,8],[2,6,4,8,6],[2,4,3,3,2]])
df=DataFrame(x, columns=['a','b','c','d','e'], index=[2,4,6,8,10])
df.plot(kind='bar')
plt.show()
You can call df.plot.bar directly and pass a dictionary that maps column names to colors as the color parameter.
from pandas import DataFrame
import matplotlib.pyplot as plt
import numpy as np
x=np.array([[4,8,5,7,6],[2,3,4,2,6],[4,7,4,7,8],[2,6,4,8,6],[2,4,3,3,2]])
df=DataFrame(x, columns=['a','b','c','d','e'], index=[2,4,6,8,10])
df.plot.bar(color={'a':'gold','b':'silver','c':'green','d':'purple','e':'blue'})
plt.show()
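The question also asks about opacity, which the answer above does not cover. A minimal sketch, assuming that extra keyword arguments such as alpha are forwarded to matplotlib's bar call (the usual pandas behaviour) and that your pandas version accepts a dict for color:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from pandas import DataFrame

x = np.array([[4, 8, 5, 7, 6], [2, 3, 4, 2, 6], [4, 7, 4, 7, 8],
              [2, 6, 4, 8, 6], [2, 4, 3, 3, 2]])
df = DataFrame(x, columns=['a', 'b', 'c', 'd', 'e'], index=[2, 4, 6, 8, 10])

# Same colour mapping as above; alpha=0.5 makes every bar half transparent.
col_colors = {'a': 'gold', 'b': 'silver', 'c': 'green', 'd': 'purple', 'e': 'blue'}
ax = df.plot.bar(color=col_colors, alpha=0.5)
plt.show()
```

Here alpha applies to all bars at once; a different opacity per column would need RGBA colour tuples instead of colour names.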

How to specify a rectangle on a datetime axis plot?

I want to draw a rectangle (hv.Bounds) on a plot that has a datetime axis. However, it's not clear from the documentation how one would specify the corner points.
Naturally I tried to specify a datetime object, but this results in the following error message:
ValueError: lbrt: tuple element is not numeric
import holoviews as hv
import numpy as np
import pandas as pd
hv.extension('bokeh')
%%opts Curve [width=500]
xs = pd.date_range('1.1.2019', '31.1.2019')
ys = np.sin(range(len(xs)))
box=hv.Bounds((pd.to_datetime('5.1.2019'), 0.1, pd.to_datetime('7.1.2019'), .8))
hv.Curve((xs,ys))
As of HoloViews version 1.12.3 it's possible to do this:
import holoviews as hv
import pandas as pd
import numpy as np
import hvplot.pandas
hv.extension('bokeh')
index = pd.date_range('1.1.2019', '2.28.2019')
df = pd.DataFrame(np.random.rand(len(index)), index)
pts = pd.to_datetime(['1.15.2019', '2.15.2019'])
box = hv.Bounds((pts[0], 0.1, pts[1], .9)).opts(color='red')
df.hvplot.scatter() * box
Before version 1.12.3, Bounds only accepted numeric bounds; the PR that proposed datetime support is here: https://github.com/pyviz/holoviews/pull/3640

Adding shaded areas onto a normal distribution for standard deviation and mean with matplotlib [duplicate]

I would like to use fill_between to shade a subsection of a normal distribution, say the left 5th percentile.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as stats
plt.style.use('ggplot')
mean=1000
std=250
x=np.linspace(mean-3*std, mean+3*std,1000)
iq=stats.norm(mean,std)
plt.plot(x,iq.pdf(x),'b')
Great so far.
Then I set px to fill the area from x=0 to 500:
px=np.arange(0,500,10)
plt_fill_between(px,iq.pdf(px),color='r')
The problem is that the above only shows the pdf from 0 to 500, in red.
I want to show the full pdf from 0 to 2000, with the region from 0 to 500 shaded.
Any idea how to create this?
As commented, you need to use plt.fill_between instead of plt_fill_between. With that change the output looks like this, which seems to be exactly what you're looking for.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as stats
plt.style.use('ggplot')
mean=1000
std=250
x=np.linspace(mean-3*std, mean+3*std,1000)
iq=stats.norm(mean,std)
plt.plot(x,iq.pdf(x),'b')
px=np.arange(0,500,10)
plt.fill_between(px,iq.pdf(px),color='r')
plt.show()
You only use the x values from 0 to 500 in your np.arange. If you want to go to 2000, write:
px=np.arange(0,2000,10)
plt.fill_between(px,iq.pdf(px),color='r')
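Since the question mentions the left 5th percentile, one way to shade exactly that region is to compute the cutoff from the distribution instead of hard-coding 500. This is only a sketch under that reading of the question; the name cutoff is introduced here for illustration, and it uses the ppf method of the frozen scipy distribution together with the where argument of fill_between:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

mean, std = 1000, 250
iq = stats.norm(mean, std)
x = np.linspace(mean - 3 * std, mean + 3 * std, 1000)

# x value below which 5% of the probability mass lies
cutoff = iq.ppf(0.05)

fig, ax = plt.subplots()
ax.plot(x, iq.pdf(x), 'b')  # full curve
# shade only the part of the curve to the left of the cutoff
ax.fill_between(x, iq.pdf(x), where=(x <= cutoff), color='r')
plt.show()
```

The full curve stays visible because it is plotted over the whole x range, while the where mask restricts the fill to the tail.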

Plot a pandas dataframe with vertical lines

I want to plot a dataframe where each data point is represented not as a point but as a vertical line from the zero axis, like:
df['A'].plot(style='xxx')
where xxx is the style I need.
Also, ideally I would like to be able to color each bar based on the values in another column in my dataframe.
To be precise, my x-axis values are numbers and are not equally spaced.
The pandas plotting tools are convenient wrappers to matplotlib. There is no way I know of to get the functionality you want directly via pandas.
You can get it in a few lines of matplotlib. Most of the code is to do the colour mapping:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as colors
import matplotlib.cm as cmx
# make the dataframe
a = np.random.rand(100)
b = np.random.rand(100)
df = pd.DataFrame({'a': a, 'b': b})
# map the values in column b to colours
c_norm = colors.Normalize(vmin=df.b.min(), vmax=df.b.max())
scalar_map = cmx.ScalarMappable(norm=c_norm, cmap=plt.get_cmap('jet'))
color_vals = [scalar_map.to_rgba(val) for val in df.b]
# draw one vertical line per row, from zero up to the value in column a
plt.vlines(df.index, np.zeros_like(df.a), df.a, colors=color_vals)
plt.show()
I've used the DataFrame index for the x axis values but there is no reason that you could not use irregularly spaced x values.
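For the unevenly spaced case, a minimal sketch could look like the following. The column name x, the random data and the viridis colormap are assumptions made for illustration; the only real change from the code above is that a column of x positions is passed to vlines instead of the index:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.sort(rng.uniform(0, 10, 50)),  # irregularly spaced positions
    "a": rng.random(50),                   # line heights
    "b": rng.random(50),                   # values that drive the colour
})

# scale column b to [0, 1], then look up one RGBA colour per row
norm = colors.Normalize(vmin=df["b"].min(), vmax=df["b"].max())
line_colors = plt.get_cmap("viridis")(norm(df["b"].to_numpy()))

fig, ax = plt.subplots()
lines = ax.vlines(df["x"], 0, df["a"], colors=line_colors)
plt.show()
```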
