How can you decimate data depending on the zoom level of a plot? - python

The documentation for HoloViews' decimate operation seems to imply that with max_samples=100, say, you get a plot with at most 100 points no matter the zoom level.
With the following example, however, I see no new dots appear as I zoom in. Can HoloViews achieve this? Can some other package?
import numpy as np
import holoviews as hv
from holoviews.operation import decimate
hv.extension('bokeh', width=200)
# Cap the number of points decimate will draw at any one time
decimate.max_samples = 100
np.random.seed(1)
points = hv.Points(np.random.multivariate_normal((0,0), [[0.1, 0.1], [0.1, 1.0]], (1_000_000,)))
decimate(points)
The above code seems to decimate the data once and then shows the same dots regardless of x_range and y_range.
Any ideas?
Thanks!
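One possible approach (a sketch, not from the original post): drive the subsampling yourself with a DynamicMap and a RangeXY stream, so the sample is redrawn from whatever falls inside the current viewport. This assumes a live Python session (a Jupyter notebook or a Bokeh server) so the callback can respond to zoom events; view and MAX_POINTS below are illustrative names, not HoloViews API.

import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import streams
hv.extension('bokeh')

np.random.seed(1)
data = pd.DataFrame(
    np.random.multivariate_normal((0, 0), [[0.1, 0.1], [0.1, 1.0]], (1_000_000,)),
    columns=['x', 'y'])
MAX_POINTS = 100

def view(x_range, y_range):
    # On the first draw the ranges are None, so fall back to the full data set
    df = data
    if x_range and y_range:
        df = df[df.x.between(*x_range) & df.y.between(*y_range)]
    # Resample within the visible window, so new dots appear as you zoom in
    if len(df) > MAX_POINTS:
        df = df.sample(MAX_POINTS)
    return hv.Points(df, kdims=['x', 'y'])

hv.DynamicMap(view, streams=[streams.RangeXY()])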

Related

change sns.kdeplot cbar scale

I want to change the scale of the sns.kdeplot colorbar so that it shows the number of points instead of a decimal density value (honestly, I don't fully understand what that decimal represents).
The code:
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.randint(0, 50, size=(50, 2)), columns=list('AB'))
sns.kdeplot(df['A'], df['B'], cmap='Reds', shade=True, shade_lowest=False, cbar=True)
The result:
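One possible route (a sketch, not taken from the original thread): the KDE colorbar shows a probability density, which does not map directly onto a number of points, so if actual counts are what you want, a 2D histogram reports counts per bin directly. Assuming seaborn >= 0.11 for the bivariate histplot API:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randint(0, 50, size=(50, 2)), columns=list('AB'))
# Counts per bin, so the colorbar is in numbers of points rather than a density
sns.histplot(data=df, x='A', y='B', bins=10, cmap='Reds', cbar=True)
plt.show()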

How to specify a rectangle on a datetime-axis plot?

I want to draw a rectangle (hv.Bounds) on a plot that has a datetime axis, but it's not clear from the documentation how one would specify the corner points.
Naturally, I tried passing datetime objects; however, this results in the following error message:
ValueError: lbrt: tuple element is not numeric
import holoviews as hv
import pandas as pd
import numpy as np
hv.extension('bokeh')
%%opts Curve [width=500]
xs = pd.date_range('1.1.2019', '31.1.2019')
ys = np.sin(range(len(xs)))
box = hv.Bounds((pd.to_datetime('5.1.2019'), 0.1, pd.to_datetime('7.1.2019'), 0.8))
hv.Curve((xs, ys))
As of version 1.12.3 it's possible to do this:
import holoviews as hv
import pandas as pd
import numpy as np
import hvplot.pandas
hv.extension('bokeh')
index = pd.date_range('1.1.2019', '2.28.2019')
df = pd.DataFrame(np.random.rand(len(index)), index)
pts = pd.to_datetime(['1.15.2019', '2.15.2019'])
box = hv.Bounds((pts[0], 0.1, pts[1], .9)).opts(color='red')
df.hvplot.scatter() * box
Before that release, Bounds only accepted numeric bounds; see this PR if you feel like testing it: https://github.com/pyviz/holoviews/pull/3640
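For completeness, the same idea applied to the question's original Curve example (a sketch, assuming HoloViews >= 1.12.3 as in the answer above):

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')

xs = pd.date_range('1.1.2019', '31.1.2019')
ys = np.sin(np.arange(len(xs)))
# Datetime corner points are accepted directly from 1.12.3 onwards
box = hv.Bounds((pd.to_datetime('5.1.2019'), 0.1, pd.to_datetime('7.1.2019'), 0.8)).opts(color='red')
hv.Curve((xs, ys)).opts(width=500) * box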

Adding shaded areas onto a normal distribution for standard deviation and mean with matplotlib [duplicate]

I would like to fill_between a subsection of a normal distribution, say the left 5th percentile.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as stats
plt.style.use('ggplot')
mean=1000
std=250
x=np.linspace(mean-3*std, mean+3*std,1000)
iq=stats.norm(mean,std)
plt.plot(x,iq.pdf(x),'b')
Great so far.
Then I set px to fill the area between x=0 and x=500:
px=np.arange(0,500,10)
plt_fill_between(px,iq.pdf(px),color='r')
The problem is that the above only shows the pdf from 0 to 500 in red.
I want to show the full pdf from 0 to 2000, with the region from 0 to 500 shaded.
Any idea how to create this?
As commented, you need to use plt.fill_between instead of plt_fill_between. When you do, the output seems to be exactly what you're looking for.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as stats
plt.style.use('ggplot')
mean=1000
std=250
x=np.linspace(mean-3*std, mean+3*std,1000)
iq=stats.norm(mean,std)
plt.plot(x,iq.pdf(x),'b')
px=np.arange(0,500,10)
plt.fill_between(px,iq.pdf(px),color='r')
plt.show()
You only use the x values from 0 to 500 in your np.arange. If you want the shading to go to 2000, write:
px = np.arange(0, 2000, 10)
plt.fill_between(px, iq.pdf(px), color='r')
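Since the question mentions the left 5th percentile specifically, a small variation (a sketch, not from the original answers) uses scipy's inverse CDF (ppf) to find the cutoff instead of hard-coding 500:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

plt.style.use('ggplot')
mean, std = 1000, 250
iq = stats.norm(mean, std)

x = np.linspace(mean - 3*std, mean + 3*std, 1000)
plt.plot(x, iq.pdf(x), 'b')

# ppf(0.05) is the x-value below which 5% of the probability mass lies
cutoff = iq.ppf(0.05)
px = np.linspace(mean - 3*std, cutoff, 200)
plt.fill_between(px, iq.pdf(px), color='r')
plt.show()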

How to access/export holoviews (HexTiles) rendered data

Is there a way to access the aggregated data contained in, e.g.,
import holoviews as hv
import numpy as np
hv.HexTiles(np.random.rand(100,2)).options(gridsize=4)
that is, the locations and values (here: counts) of all the hexagons?
There is. The matplotlib backend performs the aggregation internally, but the bokeh backend uses an operation that returns the aggregated data along with the q and r coordinates that define the hex grid. You can import and use the operation like this:
import holoviews as hv
import numpy as np
from holoviews.plotting.bokeh.hex_tiles import hex_binning
hextiles = hv.HexTiles(np.random.rand(100,2))
df = hex_binning(hextiles, gridsize=4).dframe()
df.head()
If you need to compute the hexagons' x/y locations, you'll have to read up on hexagon offset coordinates.
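As a starting point, here is a sketch of the standard axial-to-Cartesian conversion for pointy-top hexagons; the orientation and especially the hexagon size are assumptions here (the actual size depends on the gridsize and the plotted x/y ranges), so treat size as a placeholder to calibrate against your plot:

import numpy as np

def axial_to_xy(q, r, size):
    # Standard pointy-top axial -> Cartesian conversion; size is the
    # hexagon radius (a placeholder to be derived from the gridsize
    # and the data ranges of the plot).
    x = size * np.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y

This would be applied to the q/r columns of the dframe above.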

Changing colormap for categorical data in Holoviews/Datashader

I'm trying to visualize categorical spatial data using Datashader and HoloViews, similarly to https://anaconda.org/jbednar/census-hv-dask/notebook. However, when I try to assign different colors to the categories, I always end up with the same (presumably default) colors.
Here is the code I'm running in a Jupyter notebook. Could anyone advise me on how to make the custom color map work, or at least run the code to see whether the colors you get match the legend? Thank you!
from sklearn.datasets.samples_generator import make_blobs
from matplotlib import pyplot
import pandas as pd
import holoviews as hv
import geoviews as gv
import datashader as ds
from cartopy import crs
from matplotlib.cm import get_cmap
from holoviews.operation.datashader import datashade, aggregate
hv.notebook_extension('bokeh', width=95)
# Generating blob data:
X, y = make_blobs(n_samples=5000000, centers=5, n_features=2)
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
# Plotting the blobs using datashader and holoviews:
%opts Overlay [width=800 height=455 xaxis=None yaxis=None show_grid=False]
%opts Shape (fill_color=None line_width=1.5) [apply_ranges=False]
%opts Points [apply_ranges=False] WMTS (alpha=0.5) NdOverlay [tools=['tap']]
color_key = {0:'red', 1:'blue', 2:'green', 3:'yellow', 4:'black'}
labels = {0:'red', 1:'blue', 2:'green', 3:'yellow', 4:'black'}
color_points = hv.NdOverlay({labels[k]: gv.Points([0, 0], crs=crs.PlateCarree(),
                                                  label=labels[k])(style=dict(color=v))
                             for k, v in color_key.items()})
dataset = gv.Dataset(df, kdims=['x', 'y'], vdims=['label'])
shaded = datashade(hv.Points(dataset), cmap=color_key, aggregator=ds.count_cat('label'))
shaded * color_points
In any case, categorical colors in datashade are determined by the color_key argument, not cmap, so you'd need to change cmap=color_key to color_key=color_key.
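Concretely, reusing the dataset, color_key and color_points variables from the question's code above, the suggested change amounts to:

# Same imports and data as in the question; only the datashade call changes
shaded = datashade(hv.Points(dataset), color_key=color_key,
                   aggregator=ds.count_cat('label'))
shaded * color_points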
