I am using the code below to get a Panel dashboard with a dropdown select box, a histogram and a map.
import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade, rasterize, shade
import panel as pn
from holoviews.element.tiles import OSM
import hvplot.pandas
df = pd.read_parquet('cleanedFiles/AllMNO.parquet')
mno = pn.widgets.Select(options=df['mnc'].unique().tolist())
@pn.depends(mno)
def mnoStats(operator):
    return '### Operator {} has {} samples'.format(operator, len(df[df['mnc'] == operator]))

@pn.depends(mno)
def plotMap(mno):
    opts = dict(width=700, height=300, tools=['hover'])
    tiles = OSM().opts(alpha=0.4, xaxis=None, yaxis=None)
    points = hv.Points(df[df['mnc'] == mno], ['latitude', 'longitude'])
    rasterized = shade(rasterize(points, x_sampling=1, y_sampling=1)).opts(**opts)
    return tiles * rasterized

def plotHist(df):
    return df.hvplot.hist(y='rsrp', by='mnc', bins=20)

pn.Row(pn.Column(pn.WidgetBox('## Ofcom scanner data', mno, mnoStats)),
       pn.Column(plotHist(df))).servable()
pn.Row(plotMap).servable()
The dropdown selector and histogram appear as expected, however I get a 'blocky' image for the map as below. I wanted to get the locations (lat/longs) of the measurements, each coloured/datashaded by the signal level in the 'rsrp' column.
Please advise how this can be corrected.
According to the HoloViews docs, rasterize is a high-level resampling interface that passes parameters through to several internal operations:
holoviews.core.operation.Operation: group, input_ranges
holoviews.operation.datashader.LinkableOperation: link_inputs
holoviews.operation.datashader.ResamplingOperation: dynamic, streams, expand, height, width, x_range, y_range, x_sampling, y_sampling, target, element_type, precompute
holoviews.operation.datashader.AggregationOperation: vdim_prefix
Based on this, it looks like your x_sampling and y_sampling arguments are passed to ResamplingOperation, where they are described as:
x_sampling = param.Number(allow_None=True, inclusive_bounds=(True, True), label='X sampling')
Specifies the smallest allowed sampling interval along the x axis.
y_sampling = param.Number(allow_None=True, inclusive_bounds=(True, True), label='Y sampling')
Specifies the smallest allowed sampling interval along the y axis.
So, I'd guess that the issue is that providing the arguments x_sampling=1, y_sampling=1 to rasterize has the effect of aggregating all of your data to 1 degree, or approximately 110 km/70 mile blocks, which is causing the blockiness in your figure. Changing these parameters to a smaller value, such as 0.1 or smaller, should resolve the issue, as long as your data itself has sufficient resolution.
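For example (a minimal sketch applying that suggestion inside the question's plotMap; the 0.001-degree value is an illustrative guess to be tuned to your data density):
from holoviews.operation.datashader import rasterize, shade

# x_sampling/y_sampling are in data units (degrees of lat/lon here);
# 0.001 degrees is roughly 100 m, far finer than the ~110 km blocks the
# original 1-degree setting produced. Omitting both arguments lets
# datashader pick the resolution from the plot size instead.
rasterized = shade(rasterize(points, x_sampling=0.001, y_sampling=0.001)).opts(**opts)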
Related question:
I have been struggling for more than a week now to do something probably pretty simple:
I want to make a time series plot in which I can control the x-axis range/zoom with a datetime picker widget.
I also want the datetime picker to be updated when the x range is changed with the plot's zoom controls.
So far I can do either but not both. It did work for other widgets such as IntSlider.
Requirements:
A solution with one DatetimeRangePicker to define the x range, or with two DatetimePicker widgets (one for start, one for end), would both work great for me.
My datasets are huge, so it would be great if it works with datashader.
Any help is much appreciated :)
What I tried:
MRE & CODE BELOW
Create a DatetimeRangePicker widget, plot the data using hvplot, and set xlim to the DatetimeRangePicker.
Result: the zoom changes with the selected dates on the widget, but zooming / panning the plot does not change the values of the widget.
Use hv.streams.RangeX stream to capture changes in x range when panning / zooming. Use a pn.depends function to generate plot when changing DatetimeRangePicker widget.
Result: the figure loads and zooming/panning changes the widget (but is very slow), while setting the widget raises an AttributeError.
Create a DatetimePicker widget for start and end. Link them with widget.jslink() bidirectionally to x_range.start and x_range.end of the figure.
Result: figure loads but nothing changes when changing values on the widget or panning/zooming.
MRE & Failed Attempts
Create Dataset
import pandas as pd
import numpy as np
import panel as pn
import holoviews as hv
import hvplot.pandas
hv.extension('bokeh')
df = pd.DataFrame({'data': np.random.randint(0, 100, 100)}, index=pd.date_range(start="2022", freq='D', periods=100))
Failed Method 1:
plot changes with widget, but widget does not change with plot
range_select = pn.widgets.DatetimeRangePicker(value=(df.index[0], df.index[-1]))
pn.Column(df.data.hvplot.line(datashade=True, xlim=range_select), range_select)
Failed Method 2:
Slow and causes AttributeError: 'NoneType' object has no attribute 'id' when changing widget
range_select = pn.widgets.DatetimeRangePicker(value=(df.index[0], df.index[-1]))
@pn.depends(range_x=range_select.param.value)
def make_fig(range_x):
    fig = df.data.hvplot.line(datashade=True, xlim=range_x)
    pointer = hv.streams.RangeX(source=fig)
    tabl = hv.DynamicMap(show_x, streams=[pointer])  # plot a (useless) table to make the stream fire
    return fig + tabl

def show_x(x_range):
    if x_range is not None:
        range_select.value = tuple([pd.Timestamp(i).to_pydatetime() for i in x_range])
    return hv.Table({"start": [x_range[0]], "stop": [x_range[1]]}, ["start", "stop"]) if x_range else hv.Table({})

pn.Column(range_select, make_fig)
Failed Method 3:
does not work with the DatetimePicker widget, but does work with other widgets (e.g. IntSlider)
pn.widgets.DatetimePicker._source_transforms = ({}) # see https://discourse.holoviz.org/t/using-jslink-with-pn-widgets-datepicker/1116
# datetime range widgets
range_strt = pn.widgets.DatetimePicker()
range_end = pn.widgets.DatetimePicker()
# int sliders as example that some widgets work
int_start_widget = pn.widgets.IntSlider(start=0, step=int(1e6), end=int(1.7e12))
int_end_widget = pn.widgets.IntSlider(start=0, step=int(1e6), end=int(1.7e12))
points = df.data.hvplot.line(datashade=True) # generate plot
# link widgets to plot:
int_start_widget.jslink(points, value="x_range.start", bidirectional=True)
int_end_widget.jslink(points, value="x_range.end", bidirectional=True)
range_strt.jslink(points, value="x_range.start", bidirectional=True)
range_end.jslink(points, value="x_range.end", bidirectional=True)
pn.Row(points, pn.Column(range_strt, range_end, int_start_widget, int_end_widget))
Here is what I came up with:
range_select = pn.widgets.DatetimeRangePicker(value=(df.index[0].to_pydatetime(), df.index[-1].to_pydatetime()))
curve = df.data.hvplot.line(datashade=True).apply.opts(xlim=range_select, framewise=True)
rxy = hv.streams.RangeX(source=curve)
def update_widget(event):
    new_dates = tuple([pd.Timestamp(i).to_pydatetime() for i in event.new])
    if new_dates != range_select.value:
        range_select.value = new_dates
rxy.param.watch(update_widget, 'x_range')
pn.Column(range_select, curve)
Basically, we use .apply.opts to apply the current widget value as the xlim dynamically (and set framewise=True so the plot ranges update dynamically). Then we instantiate a RangeX stream, which we use to update the widget value. Annoyingly, we have to do some datetime conversions, because np.datetime64 and Timestamp objects aren't supported in some cases.
So when one exports r.out.vtk from GRASS GIS, we get a bad surface with -99999 points instead of nulls:
I want to remove them, yet a simple clip is not enough:
import pyvista as pv

pd = pv.read('./pid1.vtk')
pd = pd.clip((0, 1, 1), invert=False).extract_surface()
p = pv.Plotter()
p.add_mesh(pd)  # add atoms to scene
p.show()
resulting in:
So I wonder how to keep only the top (> -999) points and connected vertices, in order to get only the top plane (it is actually curved, not flat), using pyvista?
link to example .vtk
There is an easy way to do this and there isn't...
You could use pyvista's threshold filter with all_scalars=True as long as you have only one set of scalars:
import pyvista as pv
pd = pv.read('./pid1.vtk')
pd = pd.threshold(-999, all_scalars=True)
plotter = pv.Plotter()
plotter.add_mesh(pd) #add atoms to scene
plotter.show()
Since all_scalars filters based on every scalar array, this will only do what you'd expect if there are no other scalars. Furthermore, there unfortunately seems to be a bug in pyvista (expected to be fixed in version 0.32.0) that makes this keyword unusable.
What you can do in the meantime (if you don't want to use pyvista's main branch before the fix is released) is to threshold the data yourself using numpy:
import pyvista as pv
pd = pv.read('./pid1.vtk')
scalars = pd.active_scalars
keep_inds = (scalars > -999).nonzero()[0]
pd = pd.extract_points(keep_inds, adjacent_cells=False)
plotter = pv.Plotter()
plotter.add_mesh(pd) #add atoms to scene
plotter.show()
The main point of both all_scalars (in threshold) and adjacent_cells (in extract_points) is to only keep cells where every point satisfies the condition.
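For contrast, a hedged sketch of what the defaults would give instead (all_scalars defaults to False in threshold, and adjacent_cells defaults to True in extract_points), continuing from the snippet above:
# With adjacent_cells=True (the default), every cell touching a kept point
# survives, so cells straddling the -99999 region are retained and the bad
# spikes remain in the extracted mesh.
pd_loose = pd.extract_points(keep_inds, adjacent_cells=True)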
With both of the above approaches (threshold and extract_points) I get the following figure using your data:
I'm interested in being able to recreate the multidimensional strip plot below, generated by the missingno Python library, using Vega-Lite, and I'm looking for a few pointers on how I might do this. The code to generate the image below looks a bit like this snippet:
>>> from quilt.data.ResidentMario import missingno_data
>>> collisions = missingno_data.nyc_collision_factors()
>>> collisions = collisions.replace("nan", np.nan)
>>> import missingno as msno
>>> %matplotlib inline
>>> msno.matrix(collisions.sample(250))
For each column, a mark is shown at each index position, indicating whether the data there is null or not null.
When I look through a gallery of charts created by Altair, I see this horizontal strip plot, which seems to be presenting a similar kind of information, but I'm not sure how to express the same idea.
The viz below shows a mark when there is data matching a given combination of horsepower and cylinder count; the horsepower and cylinders are encoded into the x and y channels.
I'm not sure how I'd express the same for the nullity matrix, and I think I need some pointers here.
I get that I can reset the index to come up with a y index, but it's not clear to me how the index of each sample gets encoded in the y channel, nor how I'd populate the x axis with the null/not-null results for each column. Is this something I'd need to do before the data gets to Vega-Lite, or does Vega support it?
Yes, you can do this after reshaping your data with a Fold Transform. It looks something like this using Altair:
import numpy as np
import quilt
quilt.install("ResidentMario/missingno_data")
from quilt.data.ResidentMario import missingno_data
collisions = missingno_data.nyc_collision_factors()
collisions = collisions.replace("nan", np.nan)
collisions = collisions.set_index("Unnamed: 0")
import altair as alt
alt.Chart(collisions.sample(250)).transform_window(
    index='row_number()'
).transform_fold(
    collisions.columns.to_list()
).transform_calculate(
    defined="isValid(datum.value)"
).mark_rect().encode(
    x=alt.X(
        'key:N',
        title=None,
        sort=collisions.columns.to_list(),
        axis=alt.Axis(orient='top', labelAngle=-45)
    ),
    y=alt.Y('index:O', title=None),
    color=alt.Color(
        'defined:N',
        legend=None,
        scale=alt.Scale(domain=["true", "false"], range=["black", "white"])
    )
).properties(
    width=800,
    height=400
)
I am using the seaborn clustermap function and I would like to make multiple plots where the cell sizes are exactly identical. Also the size of the axis labels should be the same. This means figure size and aspect ratio will need to change, the rest needs to stay identical.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

dataFrameA = pd.DataFrame([[1, 2], [3, 4]])
dataFrameB = pd.DataFrame(np.arange(3*6).reshape(3, -1))
Then decide how big the clustermap itself needs to be, something along the lines of:
dpi = 72
cellSizePixels = 150
This decides that dataFrameA's heatmap should be 300 by 300 pixels. I think those need to be converted to the size units of the figure, which works out to cellSizePixels/dpi inches per cell; for dataFrameA that gives a heatmap size of roughly 4.17 by 4.17 inches. Here I am introducing a problem: there is stuff around the heatmap which will also take up some space, and I don't know exactly how much space it will take.
I tried to parametrize the heatmap function with a guess of the image size using the formula above:
def fixedWidthClusterMap(dpi, cellSizePixels, dataFrame):
    clustermapParams = {
        'square': False  # Tried to set this to True before. Don't: the dendrograms do not scale well with it.
    }
    figureWidth = (cellSizePixels/dpi) * dataFrame.shape[1]
    figureHeight = (cellSizePixels/dpi) * dataFrame.shape[0]
    return sns.clustermap(dataFrame, figsize=(figureWidth, figureHeight), **clustermapParams)
fixedWidthClusterMap(dpi, cellSizePixels, dataFrameA)
plt.show()
fixedWidthClusterMap(dpi, cellSizePixels, dataFrameB)
plt.show()
This yields:
My question: how do I obtain square cells which are exactly the size I want?
This is a bit tricky, because there are quite a few things to take into consideration, and in the end, it depends how "exact" you need the sizes to be.
Looking at the code for clustermap the heatmap part is designed to have a ratio of 0.8 compared to the axes used for the dendrograms. But we also need to take into account the margins used to place the axes. If one knows the size of the heatmap axes, one should therefore be able to calculate the desired figure size that would produce the right shape.
dpi = matplotlib.rcParams['figure.dpi']
marginWidth = matplotlib.rcParams['figure.subplot.right'] - matplotlib.rcParams['figure.subplot.left']
marginHeight = matplotlib.rcParams['figure.subplot.top'] - matplotlib.rcParams['figure.subplot.bottom']
Ny, Nx = dataFrame.shape
figWidth = (Nx*cellSizePixels/dpi)/0.8/marginWidth
figHeight = (Ny*cellSizePixels/dpi)/0.8/marginHeight
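As a quick sanity check (a hypothetical worked example, assuming matplotlib's default rcParams where dpi=100 and figure.subplot.left/right are 0.125/0.9):
# hypothetical numbers: 2 columns of 50 px cells with default rcParams
dpi = 100
marginWidth = 0.9 - 0.125                      # 0.775
figWidth = (2 * 50 / dpi) / 0.8 / marginWidth  # ~1.61 inches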
Unfortunately, it seems matplotlib must adjust things a bit during plotting, because that was not enough to get perfectly square heatmap cells. So I chose to resize the various axes created by clustermap after the fact, starting with the heatmap, then the dendrogram axes.
I think the resulting image is pretty close to what you were trying to get, but my tests sometimes show errors of 1-2 px, which I attribute to rounding errors due to all the conversions between sizes in inches and pixels.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

dataFrameA = pd.DataFrame([[1, 2], [3, 4]])
dataFrameB = pd.DataFrame(np.arange(3*6).reshape(3, -1))

def fixedWidthClusterMap(dataFrame, cellSizePixels=50):
    # Calculate the figure size; this gets us close, but not quite to the right place
    dpi = matplotlib.rcParams['figure.dpi']
    marginWidth = matplotlib.rcParams['figure.subplot.right'] - matplotlib.rcParams['figure.subplot.left']
    marginHeight = matplotlib.rcParams['figure.subplot.top'] - matplotlib.rcParams['figure.subplot.bottom']
    Ny, Nx = dataFrame.shape
    figWidth = (Nx*cellSizePixels/dpi)/0.8/marginWidth
    figHeight = (Ny*cellSizePixels/dpi)/0.8/marginHeight

    # do the actual plot
    grid = sns.clustermap(dataFrame, figsize=(figWidth, figHeight))

    # calculate the size of the heatmap axes as a fraction of the figure
    axWidth = (Nx*cellSizePixels)/(figWidth*dpi)
    axHeight = (Ny*cellSizePixels)/(figHeight*dpi)

    # resize heatmap
    ax_heatmap_orig_pos = grid.ax_heatmap.get_position()
    grid.ax_heatmap.set_position([ax_heatmap_orig_pos.x0, ax_heatmap_orig_pos.y0,
                                  axWidth, axHeight])

    # resize dendrograms to match
    ax_row_orig_pos = grid.ax_row_dendrogram.get_position()
    grid.ax_row_dendrogram.set_position([ax_row_orig_pos.x0, ax_row_orig_pos.y0,
                                         ax_row_orig_pos.width, axHeight])
    ax_col_orig_pos = grid.ax_col_dendrogram.get_position()
    grid.ax_col_dendrogram.set_position([ax_col_orig_pos.x0, ax_heatmap_orig_pos.y0 + axHeight,
                                         axWidth, ax_col_orig_pos.height])
    return grid  # return ClusterGrid object

grid = fixedWidthClusterMap(dataFrameA, cellSizePixels=75)
plt.show()
grid = fixedWidthClusterMap(dataFrameB, cellSizePixels=75)
plt.show()
Not a complete answer (it doesn't deal with pixels), but I suspect the OP has moved on after four years.
def reshape_clustermap(cmap, cell_width=0.02, cell_height=0.02):
    ny, nx = cmap.data2d.shape
    hmap_width = nx * cell_width
    hmap_height = ny * cell_height
    hmap_orig_pos = cmap.ax_heatmap.get_position()
    cmap.ax_heatmap.set_position(
        [hmap_orig_pos.x0, hmap_orig_pos.y0, hmap_width, hmap_height]
    )
    top_dg_pos = cmap.ax_col_dendrogram.get_position()
    cmap.ax_col_dendrogram.set_position(
        [hmap_orig_pos.x0, hmap_orig_pos.y0 + hmap_height, hmap_width, top_dg_pos.height]
    )
    left_dg_pos = cmap.ax_row_dendrogram.get_position()
    cmap.ax_row_dendrogram.set_position(
        [left_dg_pos.x0, left_dg_pos.y0, left_dg_pos.width, hmap_height]
    )
    if cmap.ax_cbar:
        cbar_pos = cmap.ax_cbar.get_position()
        hmap_pos = cmap.ax_heatmap.get_position()
        cmap.ax_cbar.set_position(
            [cbar_pos.x0, hmap_pos.y1, cbar_pos.width, cbar_pos.height]
        )
cmap = sns.clustermap(dataFrameA)
reshape_clustermap(cmap)
I have a series of lines that each need to be plotted with a separate colour. Each line is actually made up of several data sets (positive and negative regions, etc.), so I'd like to be able to create a generator that will feed one colour at a time across a spectrum, for example the gist_rainbow map shown here.
I have found that the following works, but it seems very complicated and, more importantly, is difficult to remember:
from pylab import *
NUM_COLORS = 22
mp = cm.datad['gist_rainbow']
get_color = matplotlib.colors.LinearSegmentedColormap.from_list(mp, colors=['r', 'b'], N=NUM_COLORS)
...
# Then in a for loop
this_color = get_color(float(i)/NUM_COLORS)
Moreover, it does not cover the range of colours in the gist_rainbow map, I have to redefine a map.
Maybe a generator is not the best way to do this, if so what is the accepted way?
To index colors from a specific colormap you can use:
import pylab
NUM_COLORS = 22
cm = pylab.get_cmap('gist_rainbow')
for i in range(NUM_COLORS):
    color = cm(1.*i/NUM_COLORS)  # color will now be an RGBA tuple

# or if you really want a generator:
cgen = (cm(1.*i/NUM_COLORS) for i in range(NUM_COLORS))
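For example, here's a minimal usage sketch (the sine-wave data is just a placeholder) where each plotted line pulls the next colour from the generator:
import numpy as np
import pylab

NUM_COLORS = 22
cm = pylab.get_cmap('gist_rainbow')
cgen = (cm(1.*i/NUM_COLORS) for i in range(NUM_COLORS))

x = np.linspace(0, 10, 200)
for k in range(NUM_COLORS):
    # each next() call hands out the next colour along the colormap
    pylab.plot(x, np.sin(x + 0.3 * k), color=next(cgen))
pylab.show()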