Using Bokeh to create a bar graph - Python

There is an example on the Bokeh website:
https://docs.bokeh.org/en/latest/docs/gallery/bar_nested.html
but it does not work in my Jupyter notebook.
I have the following data frame:
               precision  recall        f1
Random Forest   0.493759     1.0  0.661096
XGBoost         0.493759     1.0  0.661096
I want to build a graph that compares the two models on these 3 metrics.
But to start, I just wanted to compare one metric. This is my non-working code:
import pandas as pd
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
data = pd.DataFrame({'precision':[percision_rf,percision_xgb],'recall':[recall_rf,recall_xgb],'f1':[f1_rf,f1_xgb]})
data.rename({0:'Random Forest',1:'XGBoost'}, inplace=True)
source = ColumnDataSource(data=data)
p = figure()
p.vbar(x='Random Forest', top=0.9, width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
There is an example of a simple bar graph on the bokeh website, but it is not using a ColumnDataSource.

When you pass a DataFrame to a ColumnDataSource, Bokeh creates a CDS column for each column of the DataFrame (plus one for the index). Those column names are what you can refer to in the glyph methods, and the glyph method then draws one glyph for every value in that column. For example, in the code above you could do:
# plot bars for every precision value along the x axis
p.vbar(x='precision', top=0.9, width=0.9, source=source)
All Bokeh glyphs are inherently "vectorized" in this way.
In the above code, x='Random Forest' is not meaningful to pass to vbar, because there is no column in the DataFrame (and hence no column in the CDS) called "Random Forest".
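Putting this together, here is a minimal sketch that compares the two models on all three metrics, assuming the scores from the question's data frame are typed in directly. Naming the DataFrame index ("model" is a name chosen for illustration) turns the model names into a CDS column the glyphs can refer to, each vbar call refers to one metric column by name, and bokeh.transform.dodge shifts each metric's bars sideways so they sit next to each other. The offsets and colours are arbitrary choices.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge

# Scores copied from the question's data frame
data = pd.DataFrame(
    {"precision": [0.493759, 0.493759], "recall": [1.0, 1.0], "f1": [0.661096, 0.661096]},
    index=["Random Forest", "XGBoost"],
)
data.index.name = "model"  # the named index becomes a CDS column called "model"
source = ColumnDataSource(data)

models = list(data.index)
p = figure(x_range=models, title="Model comparison")

# One vbar call per metric: top refers to a DataFrame column by name, and
# dodge offsets the bars so the three metrics for a model sit side by side
bars = [(-0.25, "precision", "#c9d9d3"), (0.0, "recall", "#718dbf"), (0.25, "f1", "#e84d60")]
for offset, metric, color in bars:
    p.vbar(x=dodge("model", offset, range=p.x_range), top=metric, width=0.2,
           source=source, color=color, legend_label=metric)

p.y_range.start = 0
p.xgrid.grid_line_color = None
# show(p)  # in Jupyter, call bokeh.io.output_notebook() first
```

To compare only one metric, keep a single vbar call with x="model" and top="precision" and drop the dodge.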

Related

Is there a way to add a 3rd, 4th and 5th y axis using Bokeh?

I would like to add multiple y axes to a bokeh plot (similar to the one achieved using matplotlib in the attached image).
Would this also be possible using Bokeh? The resources I found only demonstrate a second y axis.
Thanks in advance!
Best Regards,
Pranit Iyengar
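Yes, this is possible. To add a new range to the figure p, use p.extra_y_ranges["my_new_axis_name"] = Range1d(...), then add an axis for it with p.add_layout(LinearAxis(y_range_name="my_new_axis_name"), 'left') and pass the same y_range_name to every glyph that should use it. Do not write p.extra_y_ranges = {"my_new_axis_name": Range1d(...)} if you want to add multiple axes, because this overwrites the dictionary instead of extending it. Other range objects, such as DataRange1d, are also valid.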
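The difference between extending and overwriting extra_y_ranges can be checked directly. This is a small sketch; the range names "a" and "b" and the figures p and q are placeholders chosen for illustration.

```python
from bokeh.models import Range1d
from bokeh.plotting import figure

# Item assignment extends extra_y_ranges, so both ranges survive
p = figure()
p.extra_y_ranges["a"] = Range1d(start=0, end=10)
p.extra_y_ranges["b"] = Range1d(start=0, end=20)
print(sorted(p.extra_y_ranges))  # ['a', 'b']

# Assigning a whole new dict replaces the previous one, so "a" is dropped
q = figure()
q.extra_y_ranges = {"a": Range1d(start=0, end=10)}
q.extra_y_ranges = {"b": Range1d(start=0, end=20)}
print(sorted(q.extra_y_ranges))  # ['b']
```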
Minimal example
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import LinearAxis, Range1d
output_notebook()
data_x = [1,2,3,4,5]
data_y = [1,2,3,4,5]
color = ['red', 'green', 'magenta', 'black']
# width/height replace plot_width/plot_height, which were removed in Bokeh 3.0
p = figure(width=500, height=300)
p.line(data_x, data_y, color='blue')
for i, c in enumerate(color, start=1):
    name = f'extra_range_{i}'
    label = f'extra range {i}'
    # register a new named y range without replacing the existing ones
    p.extra_y_ranges[name] = Range1d(start=0, end=10*i)
    # draw an axis for that range on the left side of the plot
    p.add_layout(LinearAxis(axis_label=label, y_range_name=name), 'left')
    # plot a line against the new range
    p.line(data_x, data_y, color=c, y_range_name=name)
show(p)
Official example
See also the twin axis example on the official website; it uses the same syntax with only two axes. Another example is the twin axis example for models.

Whisker not showing up in Bokeh plot

I had a homework problem recently where we were given a data set and asked to calculate some parameters for a model distribution, with confidence intervals. I wanted to make a quick plot with error bars to display the data, but I can't get the Whisker to show up at all. I'm using Bokeh 2.2.1 so I don't think it's a problem with the version, and the example whisker code from the Bokeh documentation works as well.
Here is the code I wrote for the plot:
from bokeh.io import show
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
groups= ['Het', 'Wt', 'Mut']
vals = [mle_het[0], mle_wt[0], mle_mut[0]]
upper = [conf_int_het[1][0], conf_int_wt[1][0], conf_int_mut[1][0]]
lower = [conf_int_het[0][0], conf_int_wt[0][0], conf_int_mut[0][0]]
source = ColumnDataSource(data=dict(groups=groups, vals=vals, upper=upper, lower=lower))
p = figure(x_range=groups, plot_height=350, title="Mu MLEs with Error Bars # 95% Confidence Interval", y_range=(0,40))
p.add_layout(
Whisker(source=source, base="groups", upper="upper", lower="lower")
)
p.circle(x='groups', y = 'vals', size=15, source=source, legend_group="groups",
line_color='white', fill_color=factor_cmap('groups', palette=["#962980","#295f96","#29966c"],
factors=groups))
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
The vals, upper, and lower lists are just three floats each that I'm pulling from the data earlier in the code.
Everything in the plot shows up fine except the error bars, and I don't get any error messages either. If anyone has any idea how to fix it I'd be grateful!
This is a bug that'll be fixed in 2.3:
https://github.com/bokeh/bokeh/issues/10575
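If upgrading is not an option, one possible workaround (my assumption, not something the linked issue prescribes) is to draw the error bars with the segment glyph instead of the Whisker annotation. The numbers below are hypothetical placeholders standing in for the question's mle_* values and conf_int_* bounds.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical placeholder numbers; substitute your MLEs and interval bounds
groups = ["Het", "Wt", "Mut"]
vals = [12.0, 20.0, 27.0]
lower = [9.0, 16.0, 22.0]
upper = [15.0, 24.0, 32.0]

source = ColumnDataSource(data=dict(groups=groups, vals=vals, upper=upper, lower=lower))

p = figure(x_range=groups, y_range=(0, 40), title="Error bars drawn with segment glyphs")
# One vertical segment per category, from the lower to the upper bound
p.segment(x0="groups", y0="lower", x1="groups", y1="upper", line_width=2, source=source)
# Point estimates drawn on top of the error bars
p.scatter(x="groups", y="vals", size=15, source=source)
p.xgrid.grid_line_color = None
```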

Make the colour AND marker of bokeh plot scatter points dependent on dataframe values

I've been playing around with bokeh in order to get an interactive scatter plot, with tooltips and interactive legends etc.
Currently I am able to set the colour of the points using the values of a column in the pandas dataframe behind the plot. However I'm wondering if it's possible to set the marker type (diamond, circle, square etc.) as well, using another column in the dataframe?
I appreciate this would mean you'd need a double legend, but hopefully this wouldn't be too much of a problem.
This can be accomplished with factor_mark (for the marker) and factor_cmap (for the colour):
from bokeh.plotting import figure, show, output_file
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark
SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']
p = figure(title = "Iris Morphology", background_fill_color="#fafafa")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Sepal Width'
# legend_group replaces the deprecated legend= argument (removed in Bokeh 3.0)
p.scatter("petal_length", "sepal_width", source=flowers, legend_group="species",
          fill_alpha=0.4, size=12,
          marker=factor_mark('species', MARKERS, SPECIES),
          color=factor_cmap('species', 'Category10_3', SPECIES))
show(p)
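The example above drives both colour and marker from the same column (species). To drive them from two different columns, as the question asks, pass one column to factor_cmap and another to factor_mark. This is a minimal sketch assuming a small hypothetical DataFrame with made-up columns x, y, group and kind; the combined label column is one way to get a legend that names both categories, not a built-in "double legend" feature.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, factor_mark

# Hypothetical data: "group" drives the colour, "kind" drives the marker
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [3, 1, 4, 1, 5, 9],
    "group": ["a", "a", "b", "b", "c", "c"],
    "kind": ["p", "q", "p", "q", "p", "q"],
})
# Combined label column so each legend entry names both categories
df["label"] = df["group"] + " / " + df["kind"]

GROUPS = sorted(df["group"].unique())
KINDS = sorted(df["kind"].unique())
source = ColumnDataSource(df)

p = figure(title="Colour and marker from two columns")
p.scatter("x", "y", source=source, size=12, legend_field="label",
          color=factor_cmap("group", Category10[3], GROUPS),
          marker=factor_mark("kind", ["circle", "triangle"], KINDS))
```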

Position the legend outside the plot area with Bokeh

I am making a plot following the example found here
Unfortunately, I have 17 curves I need to display, and the legend overlaps them. I know I can create a legend object that can be displayed outside the plot area like here, but I have 17 curves so using a loop is much more convenient.
Do you know how to combine both methods?
Ok, I found the solution. See the code below where I have just modified the interactive legend example:
import pandas as pd
from bokeh.palettes import Spectral4
from bokeh.plotting import figure, output_file, show
from bokeh.sampledata.stocks import AAPL, IBM, MSFT, GOOG
from bokeh.models import Legend
from bokeh.io import output_notebook
output_notebook()
p = figure(width=800, height=250, x_axis_type="datetime", toolbar_location='above')
p.title.text = 'Click on legend entries to mute the corresponding lines'
legend_it = []
for data, name, color in zip([AAPL, IBM, MSFT, GOOG], ["AAPL", "IBM", "MSFT", "GOOG"], Spectral4):
    df = pd.DataFrame(data)
    df['date'] = pd.to_datetime(df['date'])
    c = p.line(df['date'], df['close'], line_width=2, color=color, alpha=0.8,
               muted_color=color, muted_alpha=0.2)
    # collect (label, [renderer]) pairs instead of passing a legend argument to each glyph
    legend_it.append((name, [c]))
# build one legend from all collected items and place it to the right of the plot area
legend = Legend(items=legend_it)
legend.click_policy="mute"
p.add_layout(legend, 'right')
show(p)
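The loop above has four lines; the question has 17 curves. A self-contained sketch of the same approach that needs no sample data might look like this, where the phase-shifted sine curves, the "curve k" labels and the Category20 palette are illustrative assumptions.

```python
import math

from bokeh.models import Legend
from bokeh.palettes import Category20
from bokeh.plotting import figure

xs = [i / 10 for i in range(63)]  # x values from 0.0 to 6.2
p = figure(width=800, height=400)

legend_items = []
for k in range(17):
    # Hypothetical curve k: a phase-shifted sine wave
    ys = [math.sin(x + 0.3 * k) for x in xs]
    r = p.line(xs, ys, line_width=2, color=Category20[20][k])
    legend_items.append((f"curve {k}", [r]))

# A single legend built from the loop's items, placed outside the plot area
legend = Legend(items=legend_items, click_policy="hide")
p.add_layout(legend, "right")
```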
I'd like to expand on joelostblom's answer.
It is also possible to pull out the legend from an existing plot and add it somewhere else after the plot has been created.
from bokeh.palettes import Category10
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
# add a column with colors to the data
colors = dict(zip(flowers['species'].unique(), Category10[10]))
flowers["color"] = [colors[species] for species in flowers["species"]]
# make plot
p = figure(height=350, width=500)
p.circle("petal_length", "petal_width", source=flowers, legend_group='species',
color="color")
p.add_layout(p.legend[0], 'right')
show(p)
It is also possible to place legends outside the plot area for auto-grouped, indirectly created legends. The trick is to create an empty legend and use add_layout to place it outside the plot area before calling the glyph method with the legend_group parameter:
from bokeh.models import CategoricalColorMapper, Legend
from bokeh.palettes import Category10
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
color_mapper = CategoricalColorMapper(
factors=[x for x in flowers['species'].unique()], palette=Category10[10])
p = figure(height=350, width=500)
p.add_layout(Legend(), 'right')
p.circle("petal_length", "petal_width", source=flowers, legend_group='species',
color=dict(field='species', transform=color_mapper))
show(p)
A note on visibility: the answers above, while useful, did not let me place the legend below the plot, and others may run into this too.
If plot_height or height is set for the figure like this:
p = figure(height=400)
and the legend is created as in Despee1990's answer and then placed below the plot like this:
legend = Legend(items=legend_it)
p.add_layout(legend, 'below')
then neither the legend nor the plot is displayed.
If the location is changed to the right:
p.add_layout(legend, 'right')
...then only the legend items that fit within the figure's plot height are displayed. For example, if the plot height is 400 but the legend needs a height of 800, you won't see the items that don't fit within the plot area.
To resolve this, either remove the plot height from the figure entirely or specify a height large enough to include the legend items box, i.e. either:
p = figure()
or, if the legend requires a height of 800 and the glyphs require a height of 400:
p = figure(height=800)
p.add_layout(legend, 'below')

Bokeh bar plot: color bars by category

I'm tweaking the second example located here.
Here is my code:
from bokeh.charts import BoxPlot, Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
output_file("bar.html")
p = Bar(df, values='mpg', label='cyl', color='origin', legend="top_left",
title="MPG Summary (grouped and shaded by CYL)")
show(p)
There are three changes: (1) I used a Bar plot, (2) I changed the color attribute to a different categorical variable and (3) I added a legend attribute.
The problem is between (2) and (3) I believe. More specifically, the legend becomes tuples of the label and color attributes because they are different - when they are the same, the chart and the legend work properly.
This is a basic feature of ggplot2 in R and I thought it would work here. Am I doing something wrong or is this a bug?
bokeh version 0.12.0
The bokeh.charts API, including Bar, was deprecated and removed in 2017. Since then, much work has gone into improving the stable and supported bokeh.plotting API, and it is now possible to easily create many kinds of categorical and bar plots. Many examples can be found in the Handling Categorical Data chapter of the User's Guide.
It's not exactly clear what you are trying to accomplish with your plot. Using the same data, here is a plot of car counts broken down by origin and number of cylinders:
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
# Bokeh categories are strings
df.cyl = [str(x) for x in df.cyl]
df.origin = [str(x) for x in df.origin]
# pivot to wide format: one row per cylinder count, one column per origin
df = df.pivot_table(index='cyl', columns='origin', values='mpg', fill_value=0, aggfunc='count')
p = figure(title="Count by cylinder and origin", x_axis_label="Cylinders",
           x_range=sorted(df.index))
p.y_range.start = 0
# legend_label replaces the deprecated legend=[value(...)] form
p.vbar_stack(list(df.columns), x='cyl', width=0.9, color=["#c9d9d3", "#718dbf", "#e84d60"],
             source=df, legend_label=list(df.columns))
show(p)
For an even higher-level, data-centric API that lets you do this with even less code, you might check out HoloViews, which is built on top of Bokeh.
