I'm trying to display qualitative data using a donut plot with the bokeh library. I have 2 datasets sharing some data labels, and I want to have a unified legend that gathers both labels.
I have managed to either show the legend for only one plot, or have it for both but with repeated items. However, I did not find a way to have unique entries. Here is a sample code to show my issue:
from math import pi
import pandas as pd
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.palettes import Set3
# Create fake data
df = pd.DataFrame(
{'label': ['X{}'.format(i) for i in range(0, 4)] + ['X{}'.format(i) for i in range(2, 8)],
'angle': [2*pi / 4] * 4 + [2*pi / 6] * 6,
'group': [1]*4 + [2]*6})
# Set up colors
unique_labels = df.label.unique()
color_mapping = pd.Series(dict(zip(unique_labels, Set3[len(unique_labels)])))
df['color'] = color_mapping.loc[df.label].values
# Plot two concentric donuts
p = figure(title='Test', tools="hover", tooltips="@label")
p.annular_wedge(source=df[df.group==1], x=0, y=1, inner_radius=0.5, outer_radius=0.6,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_group='label')
p.annular_wedge(source=df[df.group==2], x=0, y=1, inner_radius=0.3, outer_radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_group='label')
show(p)
In the end, I get the following result:
Any idea to solve it?
I found other related issues (e.g. for matplotlib), but not for bokeh.
I think this will work:
legend_tmp = {x.label['value']: x for x in p.legend.items}
p.legend.items.clear()
p.legend.items.extend(legend_tmp.values())
When Bokeh creates the legend for the plot, it adds all of the items for both annular_wedge calls, but they don't get deduplicated the way you might expect, since the legend items are fairly complex objects themselves (meaning they are identified by more than just the value of the label).
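To make the dictionary trick above easier to follow, here is a minimal sketch that runs without Bokeh. FakeLegendItem and dedupe_by_label are hypothetical names invented for illustration; the stand-in class only mimics the `label['value']` access used in the answer and is not Bokeh's actual LegendItem.

```python
from dataclasses import dataclass


@dataclass
class FakeLegendItem:
    """Hypothetical stand-in for a legend item: a label dict plus the ring it belongs to."""
    label: dict
    renderer: str


def dedupe_by_label(items):
    """Keep one item per label value.

    A dict keeps the position of the first occurrence of each key,
    while a later duplicate overwrites the stored item.
    """
    unique = {item.label['value']: item for item in items}
    return list(unique.values())


items = [
    FakeLegendItem({'value': 'X2'}, 'outer'),
    FakeLegendItem({'value': 'X3'}, 'outer'),
    FakeLegendItem({'value': 'X2'}, 'inner'),
    FakeLegendItem({'value': 'X4'}, 'inner'),
]
deduped = dedupe_by_label(items)
print([(item.label['value'], item.renderer) for item in deduped])
```

Note that the surviving entry for a shared label is the one added last, so it refers to the second ring. That should only matter if the legend is made interactive.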
I'm plotting covid-19 data for countries grouped by World Bank regions using pandas and Bokeh.
from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
group = data.groupby(["region", "CountryName"])
index_cmap = factor_cmap(
'region_CountryName',
palette=Spectral5,
factors=sorted(data.region.unique()),
end=1
)
p = figure(plot_width=800, plot_height=600, title="Confirmed cases per 100k people by country",
x_range=group, toolbar_location="left")
p.vbar(x='region_CountryName', top='ConfirmedPer100k_max', width=1, source=group,
line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.major_label_orientation = 3.14159/2
p.xaxis.group_label_orientation = 3.14159/2
p.outline_line_color = None
show(p)
And I get a plot that is hard to read. I would like to set some sort of initial zoom into the x-axis to get a more manageable image, like the one I got by manually zooming in.
Any suggestions?
You should be able to accomplish this with the x_range parameter. In this example, the plot's x range would be the first 20 countries. You can adjust as needed. You might also have to mess around a bit to get the group_cn_list correct. It's hard to say without seeing your data. If you can post a df example for reproducibility, it would help.
group_cn_list = group["CountryName"].tolist()
p = figure(plot_width=800, plot_height=600, title="Confirmed cases per 100k people by country",
x_range=group_cn_list[0:20], toolbar_location="left")
I've included the PolyDrawTool in my Bokeh plot to let users circle points. When a user draws a line near the edge of the plot the tool expands the axes which often messes up the shape. Is there a way to freeze the axes while a user is drawing on the plot?
I'm using bokeh 1.3.4
MRE:
import numpy as np
import pandas as pd
import string
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.models import PolyDrawTool, MultiLine
def prepare_plot():
    embedding_df = pd.DataFrame(np.random.random((100, 2)), columns=['x', 'y'])
    embedding_df['word'] = embedding_df.apply(lambda x: ''.join(np.random.choice(list(string.ascii_lowercase), (8,))), axis=1)
    # Plot preparation configuration Data source
    source = ColumnDataSource(ColumnDataSource.from_df(embedding_df))
    labels = LabelSet(x="x", y="y", text="word", y_offset=-10, x_offset=5,
                      text_font_size="10pt", text_color="#555555",
                      source=source, text_align='center')
    plot = figure(plot_width=1000, plot_height=500, active_scroll="wheel_zoom",
                  tools='pan, box_select, wheel_zoom, save, reset')
    # Configure free-hand draw
    draw_source = ColumnDataSource(data={'xs': [], 'ys': [], 'color': []})
    renderer = plot.multi_line('xs', 'ys', line_width=5, alpha=0.4, color='color', source=draw_source)
    renderer.selection_glyph = MultiLine(line_color='color', line_width=5, line_alpha=0.8)
    draw_tool = PolyDrawTool(renderers=[renderer], empty_value='red')
    plot.add_tools(draw_tool)
    # Add the data and labels to plot
    plot.circle("x", "y", size=0, source=source, line_color="black", fill_alpha=0.8)
    plot.add_layout(labels)
    return plot
if __name__ == '__main__':
    plot = prepare_plot()
    show(plot)
The PolyDrawTool actually updates a ColumnDataSource to drive a glyph that draws what the user indicates. The behavior you are seeing is a natural consequence of that fact, combined with Bokeh's default auto-ranging DataRange1d (which by default also considers every glyph when computing the auto-bounds). So, you have two options:
Don't use DataRange1d at all, e.g. you can provide fixed axis bounds when you call figure:
p = figure(..., x_range=(0,10), y_range=(-20, 20))
or you can set them after the fact:
p.x_range = Range1d(0, 10)
p.y_range = Range1d(-20, 20)
Of course, with this approach you will no longer get any auto-ranging at all; you will need to set the axis ranges to exactly the start/end that you want.
Make DataRange1d be more selective by explicitly setting its renderers property:
r = p.circle(...)
p.x_range.renderers = [r]
p.y_range.renderers = [r]
Now the DataRange models will only consider the circle renderer when computing the auto-ranged start/end.
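As a rough sketch of the second option applied to a plot with a draw tool, the snippet below restricts auto-ranging to the data renderer only. The sample coordinates and the variable names points and drawn are made up for illustration, and scatter is used here in place of circle purely as an assumption about what works across Bokeh versions.

```python
from bokeh.models import ColumnDataSource, PolyDrawTool
from bokeh.plotting import figure

p = figure(tools="pan,wheel_zoom,reset")

# Data renderer: the only one that should influence auto-ranging.
points = p.scatter([0.1, 0.5, 0.9], [0.2, 0.8, 0.4], size=8)

# Renderer driven by the draw tool; it starts empty and grows as the user draws.
draw_source = ColumnDataSource(data={'xs': [], 'ys': []})
drawn = p.multi_line('xs', 'ys', line_width=5, alpha=0.4, source=draw_source)
p.add_tools(PolyDrawTool(renderers=[drawn]))

# Restrict both auto-ranges to the data renderer, so that lines drawn
# near the edge of the plot do not rescale the axes.
p.x_range.renderers = [points]
p.y_range.renderers = [points]
```

Unlike fixed Range1d bounds, this keeps auto-ranging for the data itself, which may be preferable if the points are updated later.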
I'm using the datetime axis of Bokeh. In the Bokeh data source, my x values are in numpy datetime format and the others are y numbers. I'm looking for a way to show the label of the x datetime axis right below each point. I want Bokeh to show the exact datetime that I provided via my data source, not some approximation! For instance, I provide 5:15:00 and it shows 5:00:00 somewhere before the related point. I plan to stream data to the chart every hour, and I want to show 5 points each time. Therefore, I need 5 datetime labels. How can I do that? I tried p.yaxis[0].ticker.desired_num_ticks = 5 but it didn't help. Bokeh still shows as many ticks as it wants! Here is my code and result:
import math
import numpy as np
from bokeh.models import DatetimeTickFormatter
from bokeh.models.sources import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.palettes import Category10
p = figure(x_axis_type="datetime", plot_width=800, plot_height=500)
data = {'x':
[np.datetime64('2019-01-26T03:15:10'),
np.datetime64('2019-01-26T04:15:10'),
np.datetime64('2019-01-26T05:15:10'),
np.datetime64('2019-01-26T06:15:10'),
np.datetime64('2019-01-26T07:15:10')],
'A': [10,25,15,55,40],
'B': [60,50,80,65,120],}
source = ColumnDataSource(data=data)
cl = Category10[3][1:]
r11 = p.line(source=source, x='x', y='A', color=cl[0], line_width=3)
r12 = p.line(source=source, x='x', y='B', color=cl[1], line_width=3)
p.xaxis.formatter=DatetimeTickFormatter(
seconds=["%H:%M:%S"],
minsec=["%H:%M:%S"],
minutes=["%H:%M:%S"],
hourmin=["%H:%M:%S"],
hours=["%H:%M:%S"],
days=["%H:%M:%S"],
months=["%H:%M:%S"],
years=["%H:%M:%S"],
)
p.y_range.start = -100
p.x_range.range_padding = 0.1
p.yaxis[0].ticker.desired_num_ticks = 5
p.xaxis.major_label_orientation = math.pi/2
show(p)
and here is the result:
As stated in the docs, desired_num_ticks is only a suggestion. If you want ticks at specific locations that do not change, then you can use a FixedTicker, which can be set with a plain list as a convenience:
p.xaxis.ticker = [2, 3.5, 4]
For datetimes, you would pass the values as milliseconds since epoch.
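A minimal sketch of that conversion, using only the standard library. The helper name to_epoch_ms is hypothetical, and treating the naive timestamps from the question as UTC is an assumption made here.

```python
from datetime import datetime, timezone


def to_epoch_ms(iso_stamp):
    """Milliseconds since the Unix epoch for a naive ISO timestamp, read as UTC (assumption)."""
    dt = datetime.fromisoformat(iso_stamp).replace(tzinfo=timezone.utc)
    return dt.timestamp() * 1000.0


# The five x values from the question, written as ISO strings.
stamps = [
    '2019-01-26T03:15:10',
    '2019-01-26T04:15:10',
    '2019-01-26T05:15:10',
    '2019-01-26T06:15:10',
    '2019-01-26T07:15:10',
]
tick_locations = [to_epoch_ms(s) for s in stamps]

# With the figure from the question, the list would then be assigned as:
# p.xaxis.ticker = tick_locations
```

When new points are streamed in, the list would need to be recomputed from the latest x values and assigned again.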
If you want a fixed number of ticks, but the locations may change (i.e. because the range may change), then there is nothing built in to do that. You could make a custom ticker extension.
In the Bokeh guide there are examples of various bar charts that can be created. http://docs.bokeh.org/en/0.10.0/docs/user_guide/charts.html#id4
This code will create one:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, 'cyl', values='mpg', title="Total MPG by CYL")
output_file("bar.html")
show(p)
My question is if it's possible to add data labels to each individual bar of the chart? I searched online but could not find a clear answer.
Use LabelSet
Use LabelSet to create a label over each individual bar.
In my example I'm using vbar with the plotting interface. It is a little more low-level than the Charts interface, but there might be a way to add it to the Bar chart.
from bokeh.palettes import PuBu
from bokeh.io import show, output_notebook
from bokeh.models import ColumnDataSource, ranges, LabelSet
from bokeh.plotting import figure
output_notebook()
source = ColumnDataSource(dict(x=['Áætlaðir','Unnir'],y=[576,608]))
x_label = ""
y_label = "Tímar (klst)"
title = "Tímar; núllti til þriðji sprettur."
plot = figure(plot_width=600, plot_height=300, tools="save",
x_axis_label = x_label,
y_axis_label = y_label,
title=title,
x_minor_ticks=2,
x_range = source.data["x"],
y_range= ranges.Range1d(start=0,end=700))
labels = LabelSet(x='x', y='y', text='y', level='glyph',
x_offset=-13.5, y_offset=0, source=source, render_mode='canvas')
plot.vbar(source=source,x='x',top='y',bottom=0,width=0.3,color=PuBu[7][2])
plot.add_layout(labels)
show(plot)
You can find more about labelset here: Bokeh annotations
NOTE FROM BOKEH MAINTAINERS The portions of the answer below that refer to the bokeh.charts are of historical interest only. The bokeh.charts API was deprecated and subsequently removed from Bokeh. See the answers here and above for information on the stable bokeh.plotting API
Yes, you can add labels to each bar of the chart. There are a few ways to do this. By default, your labels are tied to your data. But you can change what is displayed. Here are a few ways to do that using your example:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
from bokeh.layouts import gridplot
from pandas import DataFrame
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import Range1d, HoverTool
# output_file("bar.html")
""" Adding some sample labels a few different ways.
Play with the sample data and code to get an idea what does what.
See below for output.
"""
Sample data (new labels):
I used some logic to determine the new dataframe column. Of course you could use another column already in df (it all depends on what data you're working). All you really need here is to supply a new column to the dataframe.
# One method
labels = []
for number in df['cyl']:
    if number == 3:
        labels.append("three")
    if number == 4:
        labels.append("four")
    if number == 5:
        labels.append("five")
    if number == 6:
        labels.append("six")
    if number == 8:
        labels.append("eight")
df['labels'] = labels
Another way to get a new dataframe column. Again, we just need to supply df a new column to use on our bar plot.
# Another method
def new_labels(x):
    if x % 2 != 0 or x == 6:
        y = "Inline"
    elif x % 2 == 0:
        y = "V"
    else:
        y = "nan"
    return y
df["more_labels"] = df["cyl"].map(new_labels)
Now the bar chart:
I've done it two ways. p1 just specifies the new labels. Note that because I used strings it put them in alphabetical order on the chart. p2 uses the original labels, plus adds my new labels on the same bar.
# Specifying your labels
p1 = Bar(df, label='labels', values='mpg',
title="Total MPG by CYL, remapped labels, p1",
width=400, height=400, legend="top_right")
p2 = Bar(df, label=['cyl', 'more_labels'], values='mpg',
title="Total MPG by CYL, multiple labels, p2", width=400, height=400,
legend="top_right")
Another way:
Bokeh has three main "interface levels": high-level charts provide quick, easy access but limited functionality; plotting gives more options; models gives even more options.
Here I'm using the plotting interface and the Figure class that contains a rect method. This gives you more detailed control of your chart.
# Plot with "intermediate-level" bokeh.plotting interface
new_df = DataFrame(df.groupby(['cyl'])['mpg'].sum())
factors = ["three", "four", "five", "six", "eight"]
ordinate = new_df['mpg'].tolist()
mpg = [x * 0.5 for x in ordinate]
p3 = figure(x_range=factors, width=400, height=400,
title="Total MPG by CYL, using 'rect' instead of 'bar', p3")
p3.rect(factors, y=mpg, width=0.75, height=ordinate)
p3.y_range = Range1d(0, 6000)
p3.xaxis.axis_label = "x axis name"
p3.yaxis.axis_label = "Sum(Mpg)"
A fourth way to add specific labels:
Here I'm using the hover plot tool. Hover over each bar to display your specified label.
# With HoverTool, using 'quad' instead of 'rect'
top = [int(x) for x in ordinate]
bottom = [0] * len(top)
left = [x - 0.2 for x in range(1, len(top) + 1)]
right = [x + 0.2 for x in range(1, len(top) + 1)]
cyl = ["three", "four", "five", "six", "eight"]
source = ColumnDataSource(
data=dict(
top=[int(x) for x in ordinate],
bottom=[0] * len(top),
left=left,
right=right,
cyl=["three", "four", "five", "six", "eight"],
)
)
hover = HoverTool(
tooltips=[
("cyl", "@cyl"),
("sum", "@top")
]
)
p4 = figure(width=400, height=400,
title="Total MPG by CYL, with HoverTool and 'quad', p4")
p4.add_tools(hover)
p4.quad(top=[int(x) for x in ordinate], bottom=[0] * len(top),
left=left, right=right, color="green", source=source)
p4.xaxis.axis_label = "x axis name"
Show all four charts in a grid:
grid = gridplot([[p1, p2], [p3, p4]])
show(grid)
These are the ways I am aware of. There may be others. Change whatever you like to fit your needs. Here is what running all of this will output (you'll have to run it or serve it to get the hovertool):
All I'd like to do is create a pie chart. The Bokeh documentation covers a number of sophisticated charts, including a donut chart, but it doesn't seem to cover pie chart.
Is there any example of this?
Ultimately, the chart will need to be embedded in a webpage, so I'll need to take advantage of Bokeh's HTML embed capabilities.
The answer below is very outdated. The Donut function was part of the old bokeh.charts API that was deprecated and removed long ago. For any modern version of Bokeh (e.g. 0.13 or newer) you can create a pie chart using the wedge glyphs, as follows:
from math import pi
import pandas as pd
from bokeh.io import output_file, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum
x = { 'United States': 157, 'United Kingdom': 93, 'Japan': 89, 'China': 63,
'Germany': 44, 'India': 42, 'Italy': 40, 'Australia': 35,
'Brazil': 32, 'France': 31, 'Taiwan': 31, 'Spain': 29 }
data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Category20c[len(x)]
p = figure(plot_height=350, title="Pie Chart", toolbar_location=None,
tools="hover", tooltips="@country: @value")
p.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend='country', source=data)
show(p)
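To show what the two cumsum expressions in the wedge call work out to, here is a plain-Python sketch of the same arithmetic on a three-country subset of the data. The names starts and ends are chosen here for illustration; Bokeh evaluates the real transform in the browser.

```python
from itertools import accumulate
from math import isclose, pi

# Subset of the values used above.
values = {'United States': 157, 'United Kingdom': 93, 'Japan': 89}

# Each slice gets a share of the full circle proportional to its value.
total = sum(values.values())
angles = [v / total * 2 * pi for v in values.values()]

# cumsum('angle') corresponds to the running total: where each wedge ends.
ends = list(accumulate(angles))

# cumsum('angle', include_zero=True) is the same series shifted to start at 0:
# where each wedge begins.
starts = [0.0] + ends[:-1]

for name, start, end in zip(values, starts, ends):
    print(f"{name}: {start:.3f} to {end:.3f} rad")
```

Each wedge begins where the previous one ends, and the last end angle closes the circle at 2*pi.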
OUTDATED BELOW
An example for Bokeh 0.8.1 using the bokeh.plotting interface:
from bokeh.plotting import *
from numpy import pi
# define starts/ends for wedges from percentages of a circle
percents = [0, 0.3, 0.4, 0.6, 0.9, 1]
starts = [p*2*pi for p in percents[:-1]]
ends = [p*2*pi for p in percents[1:]]
# a color for each pie piece
colors = ["red", "green", "blue", "orange", "yellow"]
p = figure(x_range=(-1,1), y_range=(-1,1))
p.wedge(x=0, y=0, radius=1, start_angle=starts, end_angle=ends, color=colors)
# display/save everything
output_file("pie.html")
show(p)
Bokeh >0.9 will correctly compute the bounding area of all glyphs, not just "pointlike" marker glyphs, and explicitly setting the ranges like this will not be required.
NOTE from project maintainers: This answer refers to an old bokeh.charts API that was removed from Bokeh a long time ago.
A Donut chart will return a simple pie chart if you input a pandas series rather than a dataframe. And it will display labels too!
from bokeh.charts import Donut, show
import pandas as pd
data = pd.Series([0.15,0.4,0.7,1.0], index = list('abcd'))
pie_chart = Donut(data)
show(pie_chart)
Thanks to the answers above for helping me as well. I want to add how to add a legend to your pie-chart as I had some trouble with that. Below is just a snippet. My piechart just had 2 sections. Thus, I just made a pie chart figure and called wedge on it twice:
import numpy as np
from bokeh.plotting import figure
percentAchieved = .6
pieFigure = figure(x_range=(-1, 1), y_range=(-1, 1))
starts = [np.pi / 2, np.pi * 2 * percentAchieved + np.pi / 2]
ends = [np.pi / 2+ np.pi * 2 * percentAchieved, np.pi / 2 + 2*np.pi]
pieColors = ['blue', 'red']
#therefore, this first wedge will add a legend entry for the first color 'blue' and label it 'hello'
pieFigure.wedge(x=0, y=0, radius=.7, start_angle=starts, end_angle=ends, color=pieColors, legend="hello")
#this will add a legend entry for the 'red' color and label it 'bye'. Made radius zero to not make
#another piechart overlapping the original one
pieFigure.wedge(x=0, y=0, radius=0, start_angle=starts, end_angle=ends, color=pieColors[1], legend="bye")