I am trying to plot a few points on a graph, similar to a heat map.
Sample code (adapted from the heat map section here):
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.models import BasicTicker, ColorBar, ColumnDataSource, LinearColorMapper, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.transform import transform
import numpy as np
# change this if you don't run it on a Jupyter Notebook
output_notebook()
testx = np.random.randint(0,10,10)
testy = np.random.randint(0,10,10)
npdata = np.stack((testx,testy), axis = 1)
hist, bins = np.histogramdd(npdata, density=False, bins=(10, 10), range=((0, 10), (0, 10)))
data = pd.DataFrame(hist, columns = [str(x) for x in range(10)])
data.columns.name = 'y'
data['x'] = [str(x) for x in range(10)]
data = data.set_index('x')
df = pd.DataFrame(data.stack(), columns=['present']).reset_index()
source = ColumnDataSource(df)
colors = ['lightblue', "yellow"]
mapper = LinearColorMapper(palette=colors, low=df.present.min(), high=df.present.max())
p = figure(plot_width=400, plot_height=400, title="test circle map",
x_range=list(data.index), y_range=list((data.columns)),
toolbar_location=None, tools="", x_axis_location="below")
p.circle(x="x", y="y", size=20, source=source,
line_color=None, fill_color=transform('present', mapper))
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "10pt"
p.axis.major_label_standoff = 10
p.xaxis.major_label_orientation = 0
show(p)
That returns:
Now, as you can see, the grid lines are centered on the points (circles), and I would instead like each circle to be enclosed in a square formed by the grid lines.
I went through this to see if I could find information on how to offset the grid lines by 0.5 (that would have worked), but I could not find any.
There's nothing built into Bokeh to accomplish this kind of offsetting of categorical ticks, but you can write a custom extension to do it:
CS_CODE = """
import {CategoricalTicker} from "models/tickers/categorical_ticker"
export class MyTicker extends CategoricalTicker
type: "MyTicker"
get_ticks: (start, end, range, cross_loc) ->
ticks = super(start, end, range, cross_loc)
# shift the default tick locations by half a categorical bin width
ticks.major = ([x, 0.5] for x in ticks.major)
return ticks
"""
class MyTicker(CategoricalTicker):
__implementation__ = CS_CODE
p.xgrid.ticker = MyTicker()
p.ygrid.ticker = MyTicker()
Note that Bokeh assumes CoffeeScript by default when the code is just a string, but it's possible to use pure JS or TypeScript as well. Adding this to your code yields:
Please note the comment about output_notebook: you must call it (possibly again, if you have called it previously) after the custom model is defined, due to #6107.
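To see why the half-bin offset moves the grid lines onto the cell boundaries, here is a toy model in plain Python. It assumes, for illustration only, that a FactorRange with its default zero padding gives each factor one unit of width, so factor i is centered at i + 0.5. The helper synthetic() is hypothetical and not part of Bokeh's API.

```python
# Toy model (an assumption for illustration, not Bokeh's code): on a FactorRange
# with zero padding, factor i spans [i, i + 1] and is centered at i + 0.5.

def synthetic(factors, factor, offset=0.0):
    """Hypothetical helper: map a categorical coordinate (factor, offset) to a number."""
    return factors.index(factor) + 0.5 + offset

factors = [str(i) for i in range(10)]

# Default CategoricalTicker behaviour: one tick at the center of each factor.
default_ticks = [synthetic(factors, f) for f in factors]

# MyTicker turns every tick x into [x, 0.5], i.e. half a bin further along.
shifted_ticks = [synthetic(factors, f, 0.5) for f in factors]

print(default_ticks[:3])  # [0.5, 1.5, 2.5]
print(shifted_ticks[:3])  # [1.0, 2.0, 3.0]
```

Under this model the shifted ticks fall on 1, 2, ..., 10, the right edge of every cell. No tick lands on 0, but that boundary coincides with the left (or bottom) edge of the plot frame, so each circle should still appear boxed in.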
I'm getting this error:
TypeError: Object of type Interval is not JSON serializable
Here is my code.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.models import NumeralTickFormatter
def construct_labels(start, end):
    labels = []
    for index, x in enumerate(start):
        y = end[index]
        labels.append('({}, {}]'.format(x, y))
    return labels
values = {'Length': np.random.uniform(0, 4, 10)}
df = pd.DataFrame(values, columns=['Length'])
bin_step_size = 0.5
# List of bin points.
p_bins = np.arange(0, (df['Length'].max() + bin_step_size), bin_step_size)
# Reduce the tail to create the left side bounds.
p_left_limits = p_bins[:-1].copy()
# Cut the head to create the right side bounds.
p_right_limits = np.delete(p_bins, 0)
# Create the bins.
p_range_bins = pd.IntervalIndex.from_arrays(p_left_limits, p_right_limits)
# Create labels.
p_range_labels = construct_labels(p_left_limits, p_right_limits)
p_ranges_binned = pd.cut(
df['Length'],
p_range_bins,
labels=p_range_labels,
precision=0,
include_lowest=True)
out = p_ranges_binned
counts = out.value_counts(sort=False)
total_element_count = len(df.index)
foo = pd.DataFrame({'bins': counts.index, 'counts': counts})
foo.reset_index(drop=True, inplace=True)
foo['percent'] = foo['counts'].apply(lambda x: x / total_element_count)
foo['percent_full'] = foo['counts'].apply(lambda x: x / total_element_count * 100)
bin_labels = p_range_labels
# Data Container
source = ColumnDataSource(dict(
bins=foo['bins'],
percent=foo['percent'],
count=foo['counts'],
labels=pd.DataFrame({'labels': bin_labels})
))
p = figure(x_range=bin_labels, plot_height=600, plot_width=1200, title="Range Counts",
toolbar_location=None, tools="")
p.vbar(x='labels', top='percent', width=0.9, source=source)
p.yaxis[0].formatter = NumeralTickFormatter(format="0.0%")
p.xaxis.major_label_orientation = math.pi / 2
p.xgrid.grid_line_color = None
p.y_range.start = 0
output_file("bars.html")
show(p)
The error comes from here:
source = ColumnDataSource(dict(
bins=foo['bins'],
percent=foo['percent'],
count=foo['counts'],
labels=pd.DataFrame({'labels': bin_labels})
))
The bins column you passed in contains pandas Interval objects, which cannot be JSON serialized.
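The failure can be reproduced without Bokeh: the standard library's json module rejects pandas Interval objects in the same way. As a minimal sketch (the four bins and the counts are made-up sample values), converting the intervals to strings is one way to keep the column if you do need it:

```python
import json

import pandas as pd

# Interval bins like those built with pd.IntervalIndex.from_arrays in the question
# (four sample bins and made-up counts).
bins = pd.IntervalIndex.from_arrays([0.0, 0.5, 1.0, 1.5], [0.5, 1.0, 1.5, 2.0])
foo = pd.DataFrame({'bins': bins, 'counts': [3, 1, 4, 1]})

try:
    json.dumps(foo['bins'].tolist())
    serializable = True
except TypeError as exc:
    serializable = False
    print(exc)  # Object of type Interval is not JSON serializable

# str() of an Interval is its usual "(left, right]" notation, which JSON accepts.
foo['bins'] = foo['bins'].astype(str)
print(json.dumps(foo['bins'].tolist()))
```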
After reviewing your code, I see that this bins column is not used in your plot, so you can change it to:
source = ColumnDataSource(dict(
percent=foo['percent'],
count=foo['counts'],
labels=bin_labels
))
Notice that I also changed your labels to bin_labels, which is a plain list; ColumnDataSource accepts lists as input. You may want to format these labels further, since right now they look like this:
['(0.0, 0.5]',
'(0.5, 1.0]',
'(1.0, 1.5]',
'(1.5, 2.0]',
'(2.0, 2.5]',
'(2.5, 3.0]',
'(3.0, 3.5]',
'(3.5, 4.0]']
You might want to format them to something prettier.
After this small change you should be able to see your bar graph:
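For prettier labels, one option is to build them yourself from the bin edges. The sketch below assumes bins of width 0.5 from 0 to 4 (in the question the upper edge depends on df['Length'].max()) and uses Python's :g format to drop trailing zeros. Whatever list you produce would need to be used both as the x_range factors and as the labels column, so the bars line up with the axis.

```python
import numpy as np

# Assumed bin layout: width 0.5 from 0 up to 4 (the question derives the end from the data).
bin_step_size = 0.5
p_bins = np.arange(0, 4 + bin_step_size, bin_step_size)
left, right = p_bins[:-1], p_bins[1:]

# "left-right" with the general format, so 1.0 prints as 1 and 0.5 stays 0.5.
pretty_labels = [f"{l:g}-{r:g}" for l, r in zip(left, right)]
print(pretty_labels)
```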
I have code below that creates a simple line x-y plot.
When I zoom in, I want the x-axis ticks to start at 0 again instead of at 3.9 (or wherever the zoom starts), as shown in the images.
No Zoom:
After Zooming:
How do I do that?
Code:
from bokeh.io import output_file, show, save
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
data = []
x = list(range(11))
y0 = x
y1 = [10 - xx for xx in x]
y2 = [abs(xx - 5) for xx in x]
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1, y2=y2))
for i in range(3):
    p = figure(title="Title " + str(i), plot_width=300, plot_height=300)
    if len(data):
        p.x_range = data[0].x_range
        p.y_range = data[0].y_range
    p.circle('x', 'y0', size=10, color="navy", alpha=0.5, legend_label='line1', source=source)
    p.legend.location = 'top_right'
    p.legend.click_policy = "hide"
    data.append(p)
plot_col = column(data)
# show the results
show(plot_col)
This is an unusual requirement, and none of the built-in tickers or formatters behave this way. If you zoom in to the interval [4, 7], the range will be updated to [4, 7], and so the axis will display labels for [4, 7]. If it suffices to display different tick labels while the underlying range start/end keep their usual values, then you could use a Custom Extension to generate whatever customized labels you want. There is an example in the User's Guide that does almost exactly what you want:
https://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/ticking.html#userguide-extensions-examples-ticking
You might also be able to do something even simpler with a FuncTickFormatter (called CustomJSTickFormatter in Bokeh 3.x), e.g. (untested):
from bokeh.models import FuncTickFormatter
p.xaxis.formatter = FuncTickFormatter(code="""
return tick - ticks[0]
""")
I've included the PolyDrawTool in my Bokeh plot to let users circle points. When a user draws a line near the edge of the plot, the tool expands the axes, which often distorts the shape. Is there a way to freeze the axes while a user is drawing on the plot?
I'm using Bokeh 1.3.4.
MRE:
import numpy as np
import pandas as pd
import string
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.models import PolyDrawTool, MultiLine
def prepare_plot():
    embedding_df = pd.DataFrame(np.random.random((100, 2)), columns=['x', 'y'])
    embedding_df['word'] = embedding_df.apply(lambda x: ''.join(np.random.choice(list(string.ascii_lowercase), (8,))), axis=1)
    # Plot preparation configuration Data source
    source = ColumnDataSource(ColumnDataSource.from_df(embedding_df))
    labels = LabelSet(x="x", y="y", text="word", y_offset=-10, x_offset=5,
                      text_font_size="10pt", text_color="#555555",
                      source=source, text_align='center')
    plot = figure(plot_width=1000, plot_height=500, active_scroll="wheel_zoom",
                  tools='pan, box_select, wheel_zoom, save, reset')
    # Configure free-hand draw
    draw_source = ColumnDataSource(data={'xs': [], 'ys': [], 'color': []})
    renderer = plot.multi_line('xs', 'ys', line_width=5, alpha=0.4, color='color', source=draw_source)
    renderer.selection_glyph = MultiLine(line_color='color', line_width=5, line_alpha=0.8)
    draw_tool = PolyDrawTool(renderers=[renderer], empty_value='red')
    plot.add_tools(draw_tool)
    # Add the data and labels to plot
    plot.circle("x", "y", size=0, source=source, line_color="black", fill_alpha=0.8)
    plot.add_layout(labels)
    return plot
if __name__ == '__main__':
    plot = prepare_plot()
    show(plot)
The PolyDrawTool actually updates a ColumnDataSource to drive a glyph that draws what the user indicates. The behavior you are seeing is a natural consequence of that fact, combined with Bokeh's default auto-ranging DataRange1d (which by default considers every glyph when computing the auto-bounds). So you have two options:
Don't use DataRange1d at all, e.g. you can provide fixed axis bounds when you call figure:
p = figure(..., x_range=(0, 10), y_range=(-20, 20))
or you can set them after the fact:
from bokeh.models import Range1d
p.x_range = Range1d(0, 10)
p.y_range = Range1d(-20, 20)
Of course, with this approach you will no longer get any auto-ranging at all; you will need to set the axis ranges to exactly the start/end that you want.
Make DataRange1d more selective by explicitly setting its renderers property:
r = p.circle(...)
p.x_range.renderers = [r]
p.y_range.renderers = [r]
Now the DataRange models will only consider the circle renderer when computing the auto-ranged start/end.
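A toy model may help show why the drawn line stretches the axes and why restricting renderers fixes it. The function auto_bounds() below is hypothetical and much simpler than the real DataRange1d (it ignores range padding, for instance): it just takes the min/max over the data of every renderer, or of only the selected ones.

```python
def auto_bounds(renderer_data, selected=None):
    """Return (start, end) spanning the data of the chosen renderers.

    renderer_data maps a renderer name to the coordinates it draws; when
    selected is None every renderer counts, like DataRange1d's default.
    """
    names = list(renderer_data) if selected is None else selected
    values = [v for name in names for v in renderer_data[name]]
    return min(values), max(values)

data = {
    "circles": [0.05, 0.5, 0.95],  # the scatter points, all inside [0, 1]
    "drawing": [0.9, 1.3],         # a PolyDrawTool stroke past the right edge
}

print(auto_bounds(data))               # (0.05, 1.3): the stroke widens the range
print(auto_bounds(data, ["circles"]))  # (0.05, 0.95): only the circles count
```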
I would like to produce a heatmap in Python, similar to the one shown, where the size of the circle indicates the size of the sample in that cell. I looked in seaborn's gallery and couldn't find anything, and I don't think I can do this with matplotlib.
It's the inverse. While matplotlib can do pretty much everything, seaborn only provides a small subset of options.
So using matplotlib, you can plot a PatchCollection of circles as shown below.
Note: You could equally use a scatter plot, but since scatter dot sizes are in absolute units it would be rather hard to scale them into the grid.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
N = 10
M = 11
ylabels = ["".join(np.random.choice(list("PQRSTUVXYZ"), size=7)) for _ in range(N)]
xlabels = ["".join(np.random.choice(list("ABCDE"), size=3)) for _ in range(M)]
x, y = np.meshgrid(np.arange(M), np.arange(N))
s = np.random.randint(0, 180, size=(N,M))
c = np.random.rand(N, M)-0.5
fig, ax = plt.subplots()
R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap="RdYlGn")
ax.add_collection(col)
ax.set(xticks=np.arange(M), yticks=np.arange(N),
xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(M+1)-0.5, minor=True)
ax.set_yticks(np.arange(N+1)-0.5, minor=True)
ax.grid(which='minor')
fig.colorbar(col)
plt.show()
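The division by 2 in R = s/s.max()/2 is what keeps the circles inside their cells: the largest radius becomes 0.5, half the unit spacing between grid points, so two neighbours can at most touch. A quick numpy check of that reasoning, using a seeded generator in place of the answer's unseeded random sizes:

```python
import numpy as np

N, M = 10, 11
rng = np.random.default_rng(0)
s = rng.integers(0, 180, size=(N, M))

R = s / s.max() / 2  # radii in data units, as in the answer

# Horizontal and vertical neighbours sit 1 data unit apart, so they cannot
# overlap as long as the sum of their radii is at most 1.
fits_horizontally = np.all(R[:, :-1] + R[:, 1:] <= 1)
fits_vertically = np.all(R[:-1, :] + R[1:, :] <= 1)
print(R.max(), fits_horizontally, fits_vertically)
```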
Here's a possible solution using Bokeh Plots:
import pandas as pd
from bokeh.palettes import RdBu
from bokeh.models import LinearColorMapper, ColumnDataSource, ColorBar
from bokeh.models.ranges import FactorRange
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
output_notebook()
d = dict(x = ['A','A','A', 'B','B','B','C','C','C','D','D','D'],
y = ['B','C','D', 'A','C','D','B','D','A','A','B','C'],
corr = np.random.uniform(low=-1, high=1, size=(12,)).tolist())
df = pd.DataFrame(d)
df['size'] = np.abs(df['corr']) * 50  # circle size proportional to |corr|
colors = list(reversed(RdBu[9]))
exp_cmap = LinearColorMapper(palette=colors,
low = -1,
high = 1)
p = figure(x_range = FactorRange(), y_range = FactorRange(), plot_width=700,
plot_height=450, title="Correlation",
toolbar_location=None, tools="hover")
p.scatter("x","y",source=df, fill_alpha=1, line_width=0, size="size",
fill_color={"field":"corr", "transform":exp_cmap})
p.x_range.factors = sorted(df['x'].unique().tolist())
p.y_range.factors = sorted(df['y'].unique().tolist(), reverse = True)
p.xaxis.axis_label = 'Values'
p.yaxis.axis_label = 'Values'
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "right")
show(p)
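The dictionary d in that answer is written out by hand. If you start from raw columns instead, a sketch of building the same long-form x/y/corr table with DataFrame.corr() could look like this; the random data and the column names A to D are placeholders. The resulting frame could then be passed as source= in the same p.scatter call.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("ABCD"))

# Square correlation matrix -> one row per (x, y) pair.
corr = values.corr()
long = corr.reset_index().melt(id_vars="index", var_name="y", value_name="corr")
long = long.rename(columns={"index": "x"})

# Drop the diagonal, whose correlation is always 1.
long = long[long["x"] != long["y"]].reset_index(drop=True)

# Circle size proportional to the strength of the correlation.
long["size"] = long["corr"].abs() * 50
print(long.head())
```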
One option is to use matplotlib's scatter plots with legends and a grid. You can set the size of each circle (the s parameter) and its color (the c parameter). You need to choose the X, Y values so that the circles sit exactly on the grid lines. This is an example I got from here:
import numpy as np
import matplotlib.pyplot as plt
volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)
fig, ax = plt.subplots()
# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
vmin=-3, vmax=3, cmap="Spectral")
# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
loc="upper left", title="Ranking")
ax.add_artist(legend1)
# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes from above. The *fmt* ensures to show the price
# in dollars. Note how we target at 5 elements here, but obtain only 4 in the
# created legend due to the automatic round prices that are chosen for us.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
loc="lower right", title="Price")
plt.show()
Output:
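To make the circles sit on grid points as described above, the x/y coordinates can come from a meshgrid over the cells and the sizes from the cell values. The helper grid_scatter_args() below is a hypothetical sketch; its result is meant for ax.scatter(x, y, s=s, c=...). Keep in mind that s is a marker area in points squared, so unlike the PatchCollection circles, these markers do not grow or shrink with the axes.

```python
import numpy as np

def grid_scatter_args(counts, max_area=400.0):
    """Hypothetical helper: x, y, s arrays for ax.scatter, one marker per cell.

    Marker centers land on integer grid points (column index, row index) and
    marker areas are proportional to the cell values, up to max_area points^2.
    """
    counts = np.asarray(counts, dtype=float)
    n_rows, n_cols = counts.shape
    x, y = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    s = max_area * counts / counts.max()
    return x.ravel(), y.ravel(), s.ravel()

x, y, s = grid_scatter_args([[1, 2], [3, 4]])
print(x, y, s)
```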
I don't have enough reputation to comment on Delenges' excellent answer, so I'll leave my comment as an answer instead:
R.flat doesn't order the way we need it to, so the circles assignment should be:
circles = [plt.Circle((j,i), radius=R[j][i]) for j, i in zip(x.flat, y.flat)]
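It may be worth checking this ordering claim before applying it. Under the original answer's own setup (N = 10 rows, M = 11 columns), the quick numpy check below suggests that zip(R.flat, x.flat, y.flat) already pairs every radius with its own cell, because all three arrays have shape (N, M) and are flattened in the same row-major order. R[j][i], by contrast, uses the column index as a row index: it raises IndexError once M > N, and for N == M it would transpose the sizes.

```python
import numpy as np

N, M = 10, 11  # rows (y) and columns (x), as in the original answer
x, y = np.meshgrid(np.arange(M), np.arange(N))
R = np.random.default_rng(0).random((N, M))

# .flat walks all three (N, M) arrays in the same order, so the radius at each
# step should be the one stored at row y, column x.
pairs_match = all(r == R[i, j] for r, j, i in zip(R.flat, x.flat, y.flat))
print(pairs_match)  # True

# R[j][i] treats the column index j as a row index; j reaches 10 but R has 10 rows.
try:
    [R[j][i] for j, i in zip(x.flat, y.flat)]
    transposed_ok = True
except IndexError:
    transposed_ok = False
print(transposed_ok)  # False
```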
Here is a simple example that plots a circle heatmap with the psynlig package.
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine as load_data
from psynlig import plot_correlation_heatmap
plt.style.use('seaborn-v0_8-talk')  # named 'seaborn-talk' before matplotlib 3.6
data_set = load_data()
data = pd.DataFrame(data_set['data'], columns=data_set['feature_names'])
#data = df_corr_selected
kwargs = {
'heatmap': {
'vmin': -1,
'vmax': 1,
'cmap': 'viridis',
},
'figure': {
'figsize': (14, 10),
},
}
plot_correlation_heatmap(data, bubble=True, annotate=False, **kwargs)
plt.show()
For some reason I am unable to plot an area chart in Bokeh.
Below is the code I used:
from bokeh.charts import Area, show, output_file
Areadict = dict(
I = df['IEXT'],
Date=df['Month'],
O = df['OT']
)
area = Area(Areadict, x='Date', y=['I','O'], title="Area Chart",
legend="top_left",
xlabel='time', ylabel='memory')
output_file('area.html')
show(area)
All I see is the date axis being plotted, but no sign of the two area charts I am interested in.
Please advise.
I would recommend looking at HoloViews, a very high-level API built on top of Bokeh that is endorsed by the Bokeh team. You can see an Area chart example in their documentation. Basically it looks like:
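import numpy as np
import holoviews as hv
hv.extension('bokeh')
# create holoviews objects (random example data stands in for your arrays)
python_array = np.random.randint(0, 300, 14)
pypy_array = np.random.randint(0, 300, 14)
jython_array = np.random.randint(0, 300, 14)
dims = dict(kdims='time', vdims='memory')
python = hv.Area(python_array, label='python', **dims)
pypy = hv.Area(pypy_array, label='pypy', **dims)
jython = hv.Area(jython_array, label='jython', **dims)
# plot: overlay the three areas, next to a stacked version
overlay = python * pypy * jython
overlay.relabel("Area Chart") + hv.Area.stack(overlay).relabel("Stacked Area Chart")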
Which results in
Otherwise, as of Bokeh 0.13, to create a stacked area chart with the stable bokeh.plotting API you will need to stack the data yourself, as shown in this example:
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.palettes import brewer
N = 20
cats = 10
df = pd.DataFrame(np.random.randint(10, 100, size=(N, cats))).add_prefix('y')
def stacked(df):
    df_top = df.cumsum(axis=1)
    df_bottom = df_top.shift(axis=1).fillna({'y0': 0})[::-1]
    df_stack = pd.concat([df_bottom, df_top], ignore_index=True)
    return df_stack
areas = stacked(df)
colors = brewer['Spectral'][areas.shape[1]]
x2 = np.hstack((df.index[::-1], df.index))
p = figure(x_range=(0, N-1), y_range=(0, 800))
p.grid.minor_grid_line_color = '#eeeeee'
p.patches([x2] * areas.shape[1], [areas[c].values for c in areas],
color=colors, alpha=0.8, line_color=None)
show(p)
which results in
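The core of that example is stacked(): each band becomes one polygon whose bottom edge is traced right-to-left and whose top edge is traced left-to-right, which is why x2 pairs the reversed index with the forward index. A tiny two-column frame (values made up for illustration) makes this visible without any plotting:

```python
import numpy as np
import pandas as pd

def stacked(df):
    # Top edge of each band: running total across the columns.
    df_top = df.cumsum(axis=1)
    # Bottom edge: the previous band's top (0 for the first band), rows reversed.
    df_bottom = df_top.shift(axis=1).fillna({'y0': 0})[::-1]
    return pd.concat([df_bottom, df_top], ignore_index=True)

df = pd.DataFrame({'y0': [1, 2, 3], 'y1': [10, 20, 30]})
areas = stacked(df)
x2 = np.hstack((df.index[::-1], df.index))

# Polygon vertices for band y1: bottom edge backwards, then top edge forwards.
print(list(zip(x2.tolist(), areas['y1'].tolist())))
```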