I am trying to plot a simple heatmap from a dataframe that looks like this:
row column content amount
0 x a c1 1
2 x b c3 3
4 x c c2 1
6 y a c1 1
8 y b c3 3
10 y c c2 1
12 z a c1 1
14 z b c3 3
16 z c c2 1
row and column indicate the position of the cell, its color should be chosen based on content, and I want tooltips displaying the content and the amount.
I currently try it like this (using bokeh 1.2.0):
import pandas as pd
from bokeh.io import show
from bokeh.models import CategoricalColorMapper, LinearColorMapper, BasicTicker, PrintfTickFormatter, ColorBar, ColumnDataSource
from bokeh.plotting import figure
from bokeh.palettes import all_palettes
from bokeh.transform import transform
df = pd.DataFrame({
'row': list('xxxxxxyyyyyyzzzzzz'),
'column': list('aabbccaabbccaabbcc'),
'content': ['c1', 'c2', 'c3', 'c1', 'c2', 'c3'] * 3,
'amount': list('123212123212123212')})
df = df.drop_duplicates(subset=['row', 'column'])
source = ColumnDataSource(df)
rows = df['row'].unique()
columns = df['column'].unique()
content = df['content'].unique()
colors = all_palettes['Viridis'][max(len(content), 3)]
mapper = CategoricalColorMapper(palette=colors, factors=content)
TOOLS = "hover,save,pan,box_zoom,reset,wheel_zoom"
p = figure(title="My great heatmap",
x_range=columns, y_range=rows,
x_axis_location="above", plot_width=600, plot_height=400,
tools=TOOLS, toolbar_location='below',
tooltips=[('cell content', '@content'), ('amount', '@amount')])
p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"
p.axis.major_label_standoff = 0
p.rect(x="row", y="column", width=1, height=1,
source=source,
fill_color=transform('content', mapper))
# color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="5pt",
# location=(0, 0))
# p.add_layout(color_bar, 'right')
show(p)
However, there are two issues:
1) When executed, I get an empty heatmap:
Any ideas why?
2) When I uncomment the color_bar = ... part, I receive an error saying:
ValueError: expected an instance of type ContinuousColorMapper, got
CategoricalColorMapper(id='3820', ...) of type CategoricalColorMapper
What am I doing wrong?
Your x and y coordinates are swapped; it should be:
p.rect(x="column", y="row", ...)
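A quick way to catch this kind of mix-up is to compare the values in each coordinate column with the factors given to the corresponding range. The following is a minimal sketch in plain Python; `check_factors` is a hypothetical helper, not part of Bokeh:

```python
def check_factors(values, factors):
    """Return the values that are missing from the categorical range."""
    allowed = set(factors)
    return sorted(set(values) - allowed)

rows = ["x", "y", "z"]       # passed as y_range in the question
columns = ["a", "b", "c"]    # passed as x_range in the question

row_values = ["x", "x", "x", "y", "y", "y", "z", "z", "z"]
column_values = ["a", "b", "c"] * 3

# Swapped as in the question: x="row" is checked against x_range=columns
swapped = check_factors(row_values, columns)
# Corrected: x="column" is checked against x_range=columns
fixed = check_factors(column_values, columns)
print(swapped)
print(fixed)
```

A non-empty result means the glyph receives coordinates that the categorical range does not contain, which is consistent with the empty plot in the question.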
As for the other message, it is self-explanatory: As of Bokeh 1.2, ColorBar can only be configured with continuous color mappers (e.g. LinearColorMapper). You can either:
compute colors yourself in Python code, and include a column of colors in source, or
re-cast your plot to use a LinearColorMapper (i.e. map content appropriately to some numerical scale)
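Both options boil down to adding one extra column to the data before it goes into the source. A rough sketch with pandas only; the hex colors and the column names `color` and `content_code` are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "row": list("xxxyyyzzz"),
    "column": list("abcabcabc"),
    "content": ["c1", "c3", "c2"] * 3,
    "amount": [1, 3, 1] * 3,
})

# Option 1: one fixed (hypothetical) color per category, stored as a column
color_for = {"c1": "#440154", "c2": "#21918c", "c3": "#fde725"}
df["color"] = df["content"].map(color_for)

# Option 2: one numeric code per category, for use with a continuous mapper
factors = sorted(df["content"].unique())
code_for = {factor: i for i, factor in enumerate(factors)}
df["content_code"] = df["content"].map(code_for)

print(df[["content", "color", "content_code"]].head(3))
```

With the first option the glyph can reference the column by name (for example `fill_color="color"`); with the second, `content_code` is the kind of numeric column a LinearColorMapper could transform.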
As for your ColorBar, the solution is below. I have not yet understood what happened to your source; I will dig a bit deeper another time.
The ColorBar expects a continuous mapper, but you gave it a categorical one.
from bokeh.models import (CategoricalColorMapper, LinearColorMapper,
    BasicTicker, PrintfTickFormatter, ColorBar, ColumnDataSource)
factors = df['content'].unique().tolist()
colors = all_palettes['Viridis'][max(len(factors), 3)]
mapper = LinearColorMapper(palette=colors)
The Goal:
To generate a scatter plot from a pandas DataFrame with 3 columns: x, y, type (either a, b or c). The data points should have different colors based on the type. Every data point should have a hover effect. However, data points with type c should have a tap effect too. The data file (data_file.csv) looks something like:
x y z
1 4 a
2 3 b
3 2 a
4 4 c
.. .. ..
My attempt:
First, I imported the DataFrame and divided it into two parts: one with the c type data and another with everything else. Then I created two ColumnDataSource objects and plotted the data. Is there a shortcut or a better way than this? Also, I couldn't achieve some features (see below).
Code:
from pandas import read_csv
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, OpenURL, TapTool
from bokeh.models.tools import HoverTool
from bokeh.transform import factor_cmap
file = "data.csv"
df = read_csv(file, skiprows=1, header=None, sep="\t")
# now I will separate the dataframe into two: one with type **a** & **b**
# and another dataframe containing type **c**
c_df = df.drop(df[df[2] != 'c'].index)
ab_df = df.drop(df[df[2] == 'c'].index)
ab_source = ColumnDataSource(data=dict(
Independent = ab_df[0],
Dependent = ab_df[1],
Type = ab_df[2]
))
c_source = ColumnDataSource(data=dict(
Independent = c_df[0],
Dependent = c_df[1],
Type = c_df[2],
link = "http://example.com/" + c_df[0].apply(str) + ".php"
))
p = figure(title="Random PLot")
p.circle('Independent', 'Dependent',
size=10,
source=ab_source,
color=factor_cmap('Type',
['red', 'blue'],
['a', 'b']),
legend_group='Type'
)
p.circle('Independent', 'Dependent',
size=12,
source=c_source,
color=factor_cmap('Type',
['green'],
['c']),
name='needsTapTool'
)
p.legend.title = "Type"
hover = HoverTool()
hover.tooltips = """
<div>
<h3>Type: @Type</h3>
<p> @Independent and @Dependent </p>
</div>
"""
p.add_tools(hover)
url = "@link"
tap = TapTool(names=['needsTapTool'])
tap.callback = OpenURL(url=url)
p.add_tools(tap)
show(p)
Problems:
(1) How can I add two different hover tools so that data points behave differently depending on their type? Whenever I add another hover tool, only the last one takes effect.
(2) How can I use only part of a value in a CDS? For example, imagine I have a column called 'href' which contains a link that includes an "http://www" part. How can I set the 'link' variable inside a CDS so that it doesn't contain this part? When I try:
c_source = ColumnDataSource(data=dict(
link = c_df[3].apply(str)[10:]
))
I get a KeyError. Any help will be appreciated.
It is possible to define multiple tools, even multiple HoverTools, in one plot. The trick is to collect the renderers and assign them to a specific tool.
In the example below, two HoverTools and one TapTool are added.
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, OpenURL, TapTool, HoverTool
output_notebook()
df = pd.DataFrame({'x':[1,2,3,4], 'y':[4,3,2,1], 'z':['a','b','a','c']})
>>
x y z
0 1 4 a
1 2 3 b
2 3 2 a
3 4 1 c
color = {'a':'red', 'b':'blue', 'c':'green'}
p = figure(width=300, height=300)
# list to collect the renderers
renderers = []
for item in df['z'].unique():
df_group = df[df['z']==item].copy()
# if group 'c', add some urls
if item == 'c':
# url with "https"
df_group['link'] = df_group.loc[:,'x'].apply(lambda x: f"https://www.example.com/{x}.php")
# url without "https"
df_group['plain_link'] = df_group.loc[:,'x'].apply(lambda x: f"example.com/{x}.php")
renderers.append(
p.circle(
'x',
'y',
size=10,
source=ColumnDataSource(df_group),
color=color[item],
legend_label=item
)
)
p.legend.title = "Type"
# HoverTool for renderers of group 'a' and 'b'
hover = HoverTool(renderers=renderers[:2])
hover.tooltips = """
<div>
<h3>Type: @z</h3>
<p> @x and @y </p>
</div>
"""
p.add_tools(hover)
# HoverTool for renderer of group 'c'
hover_c = HoverTool(renderers=[renderers[-1]])
hover_c.tooltips = """
<div>
<h3>Type: @z</h3>
<p> @x and @y </p>
<p> @plain_link </p>
</div>
"""
p.add_tools(hover_c)
# TapTool for renderer of group 'c'
tap = TapTool(renderers=[renderers[-1]], callback=OpenURL(url="@link"))
p.add_tools(tap)
show(p)
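Regarding the second question: as far as I can tell, `c_df[3].apply(str)[10:]` slices the rows of the Series rather than the characters of each string. A minimal sketch using the pandas `.str` accessor instead, with made-up `href` values:

```python
import pandas as pd

# Hypothetical link column that still carries the "http://www" prefix
href = pd.Series(["http://www.example.com/1.php",
                  "http://www.example.com/4.php"])

prefix = "http://www"

# Slicing the Series itself selects rows by position (here: none are left)
row_slice = href[len(prefix):]

# Slicing through .str works on the characters of every string
char_slice = href.str[len(prefix):]

print(len(row_slice))
print(char_slice.tolist())
```

The resulting Series can be assigned to the `link` entry of the dict passed to ColumnDataSource in the same way as the other columns.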
I'm trying to create a bar chart to see which stores had the biggest revenue in my dataset. Using the default Pandas plot I can do that in one line:
df.groupby('store_name')['sale_value'].sum().sort_values(ascending=False).head(20).plot(kind='bar')
But this chart is not very interactive and I can't see the exact values, so I want to try to create it using Bokeh and be able to mouse over a bar and see the exact amount, for example.
I tried doing the following but just got a blank page:
source = ColumnDataSource(df.groupby('store_name')['sale_value'])
plot = Plot()
glyph = VBar(x='store_name', top='sale_value')
plot.add_glyph(source, glyph)
show(plot)
and if I change source to ColumnDataSource(df.groupby('store_name')['sale_value'].sum()) I get 'ValueError: expected a dict or pandas.DataFrame, got store_name'
How can I create this chart with mouseover using Bokeh?
Let's assume this is our DataFrame:
df = pd.DataFrame({'store_name':['a', 'b', 'a', 'c'], 'sale_value':[4, 5, 2, 4]})
df
>>>
store_name sale_value
0 a 4
1 b 5
2 a 2
3 c 4
Now it is possible to create a bar chart with your approach.
First we have to do some imports and preprocessing:
from bokeh.io import show
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, Title
source = ColumnDataSource(df.groupby('store_name')['sale_value'].sum().to_frame().reset_index())
my_ticks = [i for i in range(len(source.data['store_name']))]
my_tick_labels = {i: source.data['store_name'][i] for i in range(len(source.data['store_name']))}
There are some changes in the groupby section: a .sum() is added, and the result is converted back to a DataFrame with an ascending integer index.
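To make the preprocessing step more tangible, here is a small sketch of what the chained calls produce for the sample data, using pandas only; the names `totals`, `frame`, `ticks` and `tick_labels` are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"store_name": ["a", "b", "a", "c"],
                   "sale_value": [4, 5, 2, 4]})

# groupby(...).sum() returns a Series indexed by store_name
totals = df.groupby("store_name")["sale_value"].sum()

# to_frame().reset_index() turns it into a DataFrame with a 0..n-1 index
frame = totals.to_frame().reset_index()

# Tick positions and labels derived from that integer index
ticks = list(frame.index)
tick_labels = {i: name for i, name in zip(frame.index, frame["store_name"])}

print(frame)
print(tick_labels)
```

The integer index is what ends up as the `index` column in the ColumnDataSource, which is why the glyph can use `x='index'`.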
Then you can create a plot.
plot = Plot(title=Title(text='Plot'),
plot_width=300,
plot_height=300,
min_border=0,
toolbar_location=None
)
glyph = VBar(x='index',
top='sale_value',
bottom=0,
width=0.5,
fill_color="#b3de69"
)
plot.add_glyph(source, glyph)
xaxis = LinearAxis(ticker = my_ticks,
major_label_overrides= my_tick_labels
)
plot.add_layout(xaxis, 'below')
yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')
plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))
show(plot)
I also want to show you a second approach, which I prefer.
from bokeh.plotting import figure, show
plot = figure(title='Plot',
plot_width=300,
plot_height=300,
min_border=0,
toolbar_location=None
)
plot.vbar(x='index',
top='sale_value',
source=source,
bottom=0,
width=0.5,
fill_color="#b3de69"
)
plot.xaxis.ticker = my_ticks
plot.xaxis.major_label_overrides = my_tick_labels
show(plot)
I like the second one more because it is a bit shorter.
The resulting figure is the same in both cases. It looks like this.
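Neither variant adds the mouseover asked for in the question. One possible sketch, assuming a categorical x-axis and the `tooltips` argument of `figure`; the title and the tooltip labels are made up:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({"store_name": ["a", "b", "a", "c"],
                   "sale_value": [4, 5, 2, 4]})

# Same aggregation as the pandas one-liner in the question
totals = (df.groupby("store_name")["sale_value"].sum()
            .sort_values(ascending=False)
            .head(20)
            .reset_index())
source = ColumnDataSource(totals)

# The store names become the factors of a categorical x-axis
stores = totals["store_name"].tolist()
p = figure(x_range=stores, title="Revenue per store",
           tooltips=[("store", "@store_name"), ("revenue", "@sale_value")])
p.vbar(x="store_name", top="sale_value", width=0.5, source=source)
# Calling show(p) (from bokeh.plotting) would render the chart
```

With a categorical range the store names serve directly as x values, so no tick overrides are needed in this variant.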
I have the following dataframe:
Foo Bar
A 100. 20.
B 65.2 78.
And I want to plot this dataframe in Bokeh, such that I have a line for Foo and a line for Bar, and the x-axis ticks are labelled A and B, not 0 and 1. So far, I have the following:
p = figure()
p.line(df["Foo"], df.index.values)
show(p)
But this still shows the x axis ticks as integers and not as the index values A and B as expected. How to show the index values?
I tried the following as well:
p = figure(x_range=df.index.values)
p.line(df["Foo"])
show(p)
And I still don't see any lines on the graph.
The tricky part when working with Bokeh is that if you want an axis to be categorical, you need to specify its possible values when setting up the figure.
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show
df = pd.DataFrame({"Foo":[100, 65.2], "Bar": [20, 78]}, index=["A", "B"])
print(df)
Foo Bar
A 100.0 20
B 65.2 78
# Tell bokeh our plot will have a categorical x-axis
# whose values are the index of our dataframe
p = figure(x_range=df.index.values, width=250, height=250)
p.line(x="index", y="Foo", source=df, legend_label="Foo")
p.line(x="index", y="Bar", source=df, legend_label="Bar")
p.legend.location = "bottom_right"
show(p)
I am trying to draw a heatmap (spectrogram) in Bokeh, but when the heatmap displays, it is empty.
This is the code, which uses some simple sample data, but it will be extended to fetch a large dataset via JSON.
from math import pi
import pandas as pd
from bokeh.io import show
from bokeh.models import LinearColorMapper, BasicTicker, PrintfTickFormatter, ColorBar
from bokeh.plotting import figure
# initialise data of lists.
data = {'epoch':[63745131000000, 63745131000000, 63745131100000,63745131100000], 'energy':[1.06811, 1.22078, 1.59495, 1.82245],'value':[3981.9308143034305, 2868.5202872178324, 1330.887696894385, 745.6847248644897]}
# Creates pandas DataFrame.
df = pd.DataFrame(data)
# print the data
print(df)
# this is the colormap from the original NYTimes plot
colors = ['#00007F', '#0000ff', '#007FFF', '#00ffff', '#7FFF7F', '#ffff00', '#FF7F00', '#ff0000', '#7F0000']
mapper = LinearColorMapper(palette=colors, low=df.value.min(), high=df.value.max())
TOOLS = "hover,save,pan,box_zoom,reset,wheel_zoom"
epochs = list(df.epoch.drop_duplicates())
print(epochs)
energies = list(df.energy.drop_duplicates())
print(energies)
p = figure(title="My Plot",
x_axis_location="below",
tools=TOOLS, toolbar_location='below',
tooltips=[('epoch', '@epoch'), ('energy', '@energy'), ('value', '@value')])
p.xaxis.ticker = epochs
p.yaxis.ticker = energies
p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = pi / 3
p.rect(x="epoch", y="energy", width=1, height=1,
source=df,
fill_color={'field': 'value', 'transform': mapper},
line_color=None)
color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="5pt",
ticker=BasicTicker(desired_num_ticks=len(colors)),
label_standoff=6, border_line_color=None, location=(0, 0))
p.add_layout(color_bar, 'right')
show(p)
The frame which is output looks correct:
epoch energy value
0 63745131000000 1.06811 3981.930814
1 63745131000000 1.22078 2868.520287
2 63745131100000 1.59495 1330.887697
3 63745131100000 1.82245 745.684725
and the ranges for the x and y look ok as well:
[63745131000000, 63745131100000]
[1.06811, 1.22078, 1.59495, 1.82245]
But the image that appears has no points plotted:
I should mention that if I simply change the second epoch to the value right after the first, e.g.:
'epoch':[63745131000000, 63745131000000, 63745131000001,63745131000001]
Then the chart seems to be displayed correctly:
Grateful for any help.
Thanks
The reason nothing shows up is that Bokeh apparently does not assign a color to values lying exactly at the edge of the mapper's range.
What you should change is the limits in your mapper:
mapper = LinearColorMapper(palette=colors, low=df.value.min()-1, high=df.value.max()+1) # this will make sure your data is inside the mapping
Also, the width of your rectangles is set to 1. Since your epochs differ by 100000, you will still see almost nothing when plotting this, so change the width:
p.rect(x="epoch", y="energy", width=100000, height=1, # here width is set to an adequate level.
source=df,
fill_color={'field': 'value', 'transform': mapper},
line_color=None)
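Rather than hard-coding the width, it can be derived from the spacing of the x values, since the width of a rect glyph is given in data units. A small sketch in plain Python, assuming evenly spaced epochs; the name `rect_width` is made up:

```python
epochs = [63745131000000, 63745131000000, 63745131100000, 63745131100000]

# Distinct x positions in ascending order
unique_epochs = sorted(set(epochs))

# Smallest gap between neighbouring epochs, in data units
rect_width = min(b - a for a, b in zip(unique_epochs, unique_epochs[1:]))
print(rect_width)
```

Passing this value as `width` makes neighbouring cells touch; the same idea could be applied to the energy axis for `height`, although the sample energies are not evenly spaced.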
I have a dataframe as
df = pd.DataFrame(data = {'Country':['Spain','Japan','Brazil'],'Number':[10,20,30]})
I wanted to plot a bar chart with labels (that is value of 'Number') annotated on the top for each bar and proceeded accordingly.
from bokeh.charts import Bar, output_file,output_notebook, show
from bokeh.models import Label
p = Bar(df,'Country', values='Number',title="Analysis", color = "navy")
label = Label(x='Country', y='Number', text='Number', level='glyph',x_offset=5, y_offset=-5)
p.add_annotation(label)
output_notebook()
show(p)
But I got an error: ValueError: expected a value of type Real, got Country of type str.
How do I solve this issue ?
Label produces a single label at position x and y. In your example, you are trying to add multiple labels using the data from your DataFrame as coordinates. That is why you are getting the error message: x and y need to be real coordinate values that map to the figure's x_range and y_range. You should look into using LabelSet, which can take a Bokeh ColumnDataSource as an argument and build multiple labels.
Unfortunately, you are also using a Bokeh Bar chart, a high-level chart that creates a categorical y_range. Bokeh cannot put labels on categorical y_ranges for now. You can circumvent this problem by creating a lower-level vbar chart using placeholder x values and then styling it to give it the same look as your original chart. Here it is in action.
import pandas as pd
from bokeh.plotting import output_file, show, figure
from bokeh.models import LabelSet, ColumnDataSource, FixedTicker
# arbitrary placeholders which depend on the length and number of labels
x = [1,2,3]
# This offset is based on the length of the string and the placeholder size
offset = -0.05
x_label = [xi + offset for xi in x]
df = pd.DataFrame(data={'Country': ['Spain', 'Japan', 'Brazil'],
'Number': [10, 20, 30],
'x': x,
'y_label': [-1.25, -1.25, -1.25],
'x_label': x_label})
source = ColumnDataSource(df)
p = figure(title="Analysis", x_axis_label='Country', y_axis_label='Number')
p.vbar(x='x', width=0.5, top='Number', color="navy", source=source)
p.xaxis.ticker = FixedTicker(ticks=x) # Create custom ticks for each country
p.xaxis.major_label_text_font_size = '0pt' # turn off x-axis tick labels
p.xaxis.minor_tick_line_color = None # turn off x-axis minor ticks
label = LabelSet(x='x_label', y='y_label', text='Number',
level='glyph', source=source)
p.add_layout(label)
show(p)
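If the values should sit on top of the bars, as the question asks, the label coordinates could be taken from the bar data instead of fixed placeholders. A sketch of that bookkeeping in plain Python; `label_gap` and the `overrides` dict are assumptions made for illustration:

```python
countries = ["Spain", "Japan", "Brazil"]
numbers = [10, 20, 30]

# Placeholder x positions, one per bar, as in the answer above
x = list(range(1, len(countries) + 1))

# Put each label slightly above the top of its bar
label_gap = 0.5
y_label = [n + label_gap for n in numbers]

# Map each placeholder position back to its country name
overrides = {pos: name for pos, name in zip(x, countries)}

data = {"Country": countries, "Number": numbers, "x": x, "y_label": y_label}
print(overrides)
print(y_label)
```

The `data` dict could be passed to ColumnDataSource in place of the DataFrame, and `overrides` could be assigned to `p.xaxis.major_label_overrides` to show the country names at the placeholder ticks rather than hiding the tick labels.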