Whisker not showing up in Bokeh plot - python

I had a homework problem recently where we were given a data set and asked to calculate some parameters for a model distribution, with confidence intervals. I wanted to make a quick plot with error bars to display the data, but I can't get the Whisker to show up at all. I'm using Bokeh 2.2.1 so I don't think it's a problem with the version, and the example whisker code from the Bokeh documentation works as well.
Here is the code I wrote for the plot:
from bokeh.io import show
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
groups= ['Het', 'Wt', 'Mut']
vals = [mle_het[0], mle_wt[0], mle_mut[0]]
upper = [conf_int_het[1][0], conf_int_wt[1][0], conf_int_mut[1][0]]
lower = [conf_int_het[0][0], conf_int_wt[0][0], conf_int_mut[0][0]]
source = ColumnDataSource(data=dict(groups=groups, vals=vals, upper=upper, lower=lower))
p = figure(x_range=groups, plot_height=350, title="Mu MLEs with Error Bars # 95% Confidence Interval", y_range=(0,40))
p.add_layout(
    Whisker(source=source, base="groups", upper="upper", lower="lower")
)
p.circle(x='groups', y='vals', size=15, source=source, legend_group="groups",
         line_color='white',
         fill_color=factor_cmap('groups', palette=["#962980", "#295f96", "#29966c"],
                                factors=groups))
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
The vals, upper, and lower lists are just three floats each that I'm pulling from the data earlier in the code.
There's a link to the plot I'm getting; everything shows up fine except the error bars. I don't get any error messages either. If anyone has any idea how to fix it, I'd be grateful!

This is a bug that'll be fixed in 2.3:
https://github.com/bokeh/bokeh/issues/10575
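Until upgrading is an option, one possible workaround is to draw the error bars as ordinary glyphs rather than as a Whisker annotation. The sketch below assumes the bug is specific to the Whisker annotation and that segment glyphs render normally on a categorical axis. The numbers are placeholder values, not the asker's data.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

groups = ["Het", "Wt", "Mut"]

# Placeholder estimates and interval bounds (hypothetical values).
source = ColumnDataSource(data=dict(
    groups=groups,
    vals=[20.0, 25.0, 15.0],
    lower=[17.0, 22.5, 11.0],
    upper=[23.0, 27.5, 19.0],
))

p = figure(x_range=groups, y_range=(0, 40),
           title="Estimates with 95% confidence intervals")

# One vertical segment per group, from the lower bound to the upper bound.
p.segment(x0="groups", y0="lower", x1="groups", y1="upper",
          source=source, line_width=2)

# Markers for the estimates, drawn on top of the segments.
p.scatter(x="groups", y="vals", size=15, source=source)
```

Passing p to show() displays the figure. This version has no end caps on the bars; short horizontal segments could be added the same way if they are wanted.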

Related

Python Bokeh plotting tool - Changing the font size of the ticker(both X and Y axis)

I want to enlarge the font size of the tick labels on both the x- and y-axis.
I am using Bokeh for plotting and can already generate a neat plot, but the tick labels are far too small. I searched Google and could not find a solution. Many thanks. (The tick labels to enlarge are inside the red box.)
You need the major_label_text_font_size attribute:
from bokeh.io import show
from bokeh.plotting import figure
p = figure()
p.circle(0, 0)
p.xaxis.major_label_text_font_size = "20px"
show(p)
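The snippet above changes only the x-axis, while the question asks about both axes. A minimal sketch that sets both at once, using p.axis, which addresses every axis on the figure:

```python
from bokeh.plotting import figure

p = figure()
p.scatter([0], [0])

# p.axis covers all axes of the figure, so x and y are set together.
p.axis.major_label_text_font_size = "20px"
```

Setting p.xaxis and p.yaxis separately works as well if the two axes should use different sizes.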

using bokeh to create a bar graph

There is an example on the Bokeh website:
https://docs.bokeh.org/en/latest/docs/gallery/bar_nested.html
but it does not work in my Jupyter notebook.
I have the following data frame:
               precision  recall  f1
Random Forest  0.493759   1.0     0.661096
XGBoost        0.493759   1.0     0.661096
I want to build a graph that compares the two models on these 3 metrics.
But to start, I just wanted to compare one metric. This is my non-working code:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
data = pd.DataFrame({'precision':[percision_rf,percision_xgb],'recall':[recall_rf,recall_xgb],'f1':[f1_rf,f1_xgb]})
data.rename({0:'Random Forest',1:'XGBoost'}, inplace=True)
source = ColumnDataSource(data=data)
p = figure()
p.vbar(x='Random Forest', top=0.9, width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
There is an example of a simple bar graph on the bokeh website, but it is not using a ColumnDataSource.
When you pass a DataFrame to a ColumnDataSource, Bokeh makes CDS columns out of the columns of the DataFrame. Those are what you can refer to in the glyph methods, and then the glyph will draw glyphs for all values of that column. For example, in the example above, you could do
# plot bars for every precision value along the x axis
p.vbar(x='precision', top=0.9, width=0.9, source=source)
All Bokeh glyphs are inherently "vectorized" in this way.
In the above code, x='Random Forest' is not meaningful to pass to vbar, because there is no column in the DataFrame (and hence no column in the CDS) called "Random Forest".
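Putting this together, here is a minimal sketch of a bar chart comparing one metric across the two models. It assumes the DataFrame from the question, with the metric values typed in directly. The index name "model" is an illustrative choice: a named index becomes a CDS column of that name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

data = pd.DataFrame(
    {
        "precision": [0.493759, 0.493759],
        "recall": [1.0, 1.0],
        "f1": [0.661096, 0.661096],
    },
    index=["Random Forest", "XGBoost"],
)
# Name the index so it appears in the CDS as a column called "model".
data.index.name = "model"

source = ColumnDataSource(data)

# A categorical x-axis needs its factors passed as x_range.
p = figure(x_range=list(data.index), title="Precision by model")

# x and top both refer to CDS columns; one bar is drawn per row.
p.vbar(x="model", top="precision", width=0.9, source=source)
p.y_range.start = 0
```

Passing p to show() displays the figure. Swapping top="precision" for "recall" or "f1" plots a different metric from the same source.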

How to add multiple Y axis with chartify to draw Elbow curves

I'd like to create a line chart with two distinct y-axes on different scales, to replace this piece of code, which generates two charts:
ch = chartify.Chart(blank_labels=True)
ch.set_title("Elbow method with Euclidian distance")
ch.plot.line(
    data_frame=df_elbow,
    x_column='K',
    y_column='Distortion',
    line_width=1)
ch.show()
ch = chartify.Chart(blank_labels=True)
ch.set_title("Elbow method with sum of squared errors")
ch.plot.line(
    data_frame=df_elbow,
    x_column='K',
    y_column='SSE',
    line_width=1)
ch.show()
Thanks !
Update:
2nd y-axis plots have been implemented! See chartify.examples.chart_second_axis()
Old answer:
At the moment there isn't support for 2nd y-axis plots, but I'll add in an issue for it. Thanks for the suggestion!
For now I'd suggest falling back on Bokeh. See an example here.
Thanks, here is what I did using the Bokeh figure while waiting for chartify to support two axes:
import bokeh.plotting
from bokeh.models import LinearAxis, Range1d
ch = chartify.Chart(blank_labels=True)
ch.set_title("Elbow method to find optimal K")
ch.set_subtitle("Euclidian distance (Blue) and sum of squared errors (Red)")
ch.figure.y_range = Range1d(5, 14)
ch.figure.line(x=df_elbow['K'], y=df_elbow['Distortion'], line_width=1, line_color="Blue")
ch.figure.extra_y_ranges = {"sum": Range1d(start=200000, end=1200000)}
ch.figure.add_layout(LinearAxis(y_range_name="sum"), 'right')
ch.figure.line(x=df_elbow['K'], y=df_elbow['SSE'], line_width=1, y_range_name='sum', line_color="Red")
ch.show()

Datashader canvas.line() aliasing

I use Bokeh to plot temperature curves, but in some cases the dataset is quite big (> 500k measurements) and the user experience with Bokeh is laggy (even with output_backend="webgl"). So I'm experimenting with Datashader to get faster rendering and a smoother user experience.
But the visual result from Datashader is not as clean as Bokeh's: the Datashader result shows aliasing.
I obtain a side-by-side comparison with the following code:
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.layouts import row
import numpy as np
output_notebook()
# generate signal
n = 2000
start = 0
end = 70
signal = [np.sin(x) for x in np.arange(start, end, step=(end-start)/n)]
signal = pd.DataFrame(signal, columns=["signal"])
signal = signal.reset_index()
# create a bokeh plot
source = ColumnDataSource(signal)
p = figure(plot_height=300, plot_width=400, title="bokeh plot")
p.line(source=source, x="index", y="signal")
# create a datashader image and put it in a bokeh plot
x_range = (signal["index"].min(), signal["index"].max())
y_range = (signal["signal"].min(), signal["signal"].max())
cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=300, plot_width=400)
agg = cvs.line(signal, 'index', 'signal')
img = tf.shade(agg)
image_source = ColumnDataSource(data=dict(image = [img.data]))
q = figure(x_range=x_range, y_range=y_range, plot_height=300, plot_width=400, title="datashader + bokeh")
q.image_rgba(source=image_source,
             image="image",
             dh=(y_range[1] - y_range[0]),
             dw=(x_range[1] - x_range[0]),
             x=x_range[0],
             y=y_range[0],
             dilate=False)
# visualize both plot, bokeh on left
show(row(p, q))
Do you have any idea how to fix this aliasing and get a smooth result, similar to Bokeh's?
Here's a runnable version of your code, using HoloViews in a Jupyter notebook:
import pandas as pd, numpy as np, holoviews as hv
from holoviews.operation.datashader import datashade, dynspread
hv.extension("bokeh")
%opts Curve RGB [width=400]
n, start, end = 2000, 0, 70
sine = [np.sin(x) for x in np.arange(start, end, step=(end-start)/n)]
signal = pd.DataFrame(sine, columns=["signal"]).reset_index()
curve = hv.Curve(signal)
curve + datashade(curve)
It's true that the datashaded output here doesn't look very nice. Datashader's timeseries support, like the rest of datashader, was designed to allow accurate accumulation and summation of huge numbers of mathematically perfect (i.e., infinitely thin) curves on a raster grid, so that every x location on every curve will fall into one and only one y location in the grid. Here you just seem to want server-side rendering of a large timeseries, which requires partial incrementing of multiple nearby bins in the grid and isn't something that datashader is optimized for yet.
One thing you can do already is to render the curve at a high resolution then "spread" it so that each non-zero pixel will show up in neighboring pixels as well:
curve + dynspread(datashade(curve, height=1200, width=1200, dynamic=False, \
cmap=["#30a2da"]), max_px=3, threshold=1)
Here I set the color to match Bokeh's default, then forced HoloViews' "dynspread" function to spread by 3 pixels. Using Datashader+Bokeh as in your version, you would do img = tf.spread(tf.shade(agg), px=3) and increase the plot size in the Canvas call to get a similar result.
I haven't tried running a simple smoothing filter over the result of tf.shade() or tf.spread(), but those both just return RGB images, so some filter like that would probably give good results.
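As a rough illustration of that untested idea, here is a NumPy-only sketch of a box blur over a packed RGBA image. It assumes the image is a 2D uint32 array with alpha stored in the fourth byte of each pixel (the layout Bokeh's image_rgba uses on little-endian machines). box_blur_rgba is a hypothetical helper, not part of Datashader or Bokeh.

```python
import numpy as np

def box_blur_rgba(packed, radius=1):
    """Blur a 2D uint32 RGBA image with a (2*radius+1)^2 box filter."""
    h, w = packed.shape
    # Unpack each uint32 pixel into four uint8 channels.
    rgba = (np.ascontiguousarray(packed, dtype=np.uint32)
            .view(np.uint8).reshape(h, w, 4).astype(np.float64))
    alpha = rgba[..., 3:4] / 255.0
    # Premultiply colour by alpha so transparent pixels do not darken edges.
    premult = np.concatenate([rgba[..., :3] * alpha, rgba[..., 3:4]], axis=-1)
    # Repeat edge pixels so the output keeps the input size.
    padded = np.pad(premult, ((radius, radius), (radius, radius), (0, 0)),
                    mode="edge")
    size = 2 * radius + 1
    acc = np.zeros_like(premult)
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    acc /= size ** 2
    # Undo the premultiplication wherever some alpha remains.
    out_alpha = acc[..., 3:4]
    scale = np.where(out_alpha > 0, out_alpha / 255.0, 1.0)
    out = np.concatenate([acc[..., :3] / scale, out_alpha], axis=-1)
    out = np.rint(out).astype(np.uint8)
    # Repack the four channels into one uint32 per pixel.
    return np.ascontiguousarray(out).view(np.uint32).reshape(h, w)
```

With the question's code, this would presumably be applied to img.data before building image_source. Whether the result looks acceptable on a real curve has not been checked here.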
The real solution would be to implement an optional antialiased line-drawing function for datashader, operating when the lines are drawn first rather than fixing up the pixels later, but that would take some work. Contributions welcome!

Bokeh bar plot: color bars by category

I'm tweaking the second example located here.
Here is my code:
from bokeh.charts import BoxPlot, Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
output_file("bar.html")
p = Bar(df, values='mpg', label='cyl', color='origin', legend="top_left",
title="MPG Summary (grouped and shaded by CYL)")
show(p)
There are three changes: (1) I used a Bar plot, (2) I changed the color attribute to a different categorical variable and (3) I added a legend attribute.
The problem is between (2) and (3) I believe. More specifically, the legend becomes tuples of the label and color attributes because they are different - when they are the same, the chart and the legend work properly.
This is a basic feature of ggplot2 in R and I thought it would work here. Am I doing something wrong or is this a bug?
bokeh version 0.12.0
The bokeh.charts API, including Bar, was deprecated and removed in 2017. Since then, much work has been done to improve the stable and supported bokeh.plotting API, and it is now possible to easily create many kinds of categorical and bar plots. Many examples can be found in the Handling Categorical Data chapter of the Users Guide.
It's not exactly clear what you are trying to accomplish with your plot. Using the same data, here is a plot of car counts broken down by origin and number of cylinders:
from bokeh.core.properties import value
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
# Bokeh categories are strings
df.cyl = [str(x) for x in df.cyl]
df.origin = [str(x) for x in df.origin]
# pivot to wide format
df = df.pivot_table(index='cyl', columns='origin', values='mpg', fill_value=0, aggfunc='count')
p = figure(title="Count by cylinder and origin", x_axis_label="Cylinders",
           x_range=sorted(df.index))
p.y_range.start = 0
p.vbar_stack(df.columns, x='cyl', width=0.9, color=["#c9d9d3", "#718dbf", "#e84d60"],
             source=df, legend=[value(x) for x in df.columns])
show(p)
For an even higher-level, data-centric API that lets you do this with even less code, you might check out HoloViews, which is built on top of Bokeh.
