I have a Bokeh plot with a nested categorical x-axis. Here's the code for a toy problem. My real use case is naturally a larger, more complex data set.
import pandas as pd
from bokeh.io import output_notebook, show, reset_output
from bokeh.models import Band, Span, FactorRange, ColumnDataSource
from bokeh.plotting import figure
reset_output()
output_notebook()
data = {'fruit': ['Apples', 'Pears'],
'2015': [2, 1],
'2016': [5, 3]}
tidy_df = (pd.DataFrame(data)
.melt(id_vars=["fruit"], var_name="year")
.assign(fruit_year=lambda df: list(zip(df['fruit'], df['year'])))
.set_index('fruit_year'))
display(tidy_df)
p = figure(x_range=FactorRange(factors=tidy_df.index.unique()),
height=300,
width=300)
cds = ColumnDataSource(tidy_df)
p.circle(x='fruit_year',
y='value',
size=20,
source=cds,
line_color=None,
)
# this does not show anything or cause an error
p.line(
x=[("Apples", 2015), ("Apples", 2016)],
y=[3.5, 3.5],
color="red",
line_width=2
)
# this works, but does not scale to problems where location can't be manually specified
# also, the line does not line up with the data?
p.line(
x=[4, 5],
y=[2, 2],
color="red",
line_width=2
)
show(p)
This line does not appear on the plot, and does not throw an error:
p.line(
x=[("Apples", 2015), ("Apples", 2016)],
y=[3.5, 3.5],
color="red",
line_width=2
)
How do I specify x to get the line to show up? Can I specify it for an arbitrary sub-level, i.e. just for ("Apples", 2015)?
Very similar question / solution here. (I asked that question and built off that answer.)
The general concept involves creating dataframes based on one's initial data set, then building multiple ColumnDataSources off of those dataframes.
Here is the complete code:
from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
p = figure(x_range=FactorRange(factors=tidy_df.index.unique()),
           height=400,
           width=400,
           tooltips=[('Fruit', '@fruit'),  # first string is user-defined; second string must refer to a column
                     ('Year', '@year'),
                     ('Value', '@value')])
cds = ColumnDataSource(tidy_df)
index_cmap = factor_cmap("fruit",
Spectral5[:2],
factors=sorted(tidy_df["fruit"].unique())) # this is a reference back to the dataframe
p.circle(x='fruit_year',
y='value',
size=20,
source=cds,
fill_color=index_cmap,
line_color=None,
)
# add global median
# how to add for each fruit?
median = Span(location=tidy_df["value"].median(), # global median across all fruits
#dimension='height',
line_color='orange',
line_dash='dashed',
line_width=1.0
)
p.add_layout(median)
# groupby sorts the fruit names, so iterate over its result to pair each fruit with its own std dev
for fruit, stddev in tidy_df.groupby("fruit")["value"].std().items():
    b_df = tidy_df[tidy_df['fruit'] == fruit]\
        .drop(columns=['fruit', 'year'])\
        .assign(lower=lambda df: df['value'].median() - stddev,
                upper=lambda df: df['value'].median() + stddev)\
        .assign(median=lambda df: df["value"].median())\
        .drop(columns='value')
    display(b_df)
    # create a separate ColumnDataSource for this fruit
    cds2 = ColumnDataSource(b_df)
    p.add_layout(
        Band(
            base='fruit_year',
            lower='lower',
            upper='upper',
            source=cds2)
    )
    p.line(x="fruit_year",
           y="median",
           source=cds2,
           color="red",
           line_width=2,
           line_dash='dashed',
           )
show(p)
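On the original question of why the first p.line call draws nothing: a likely cause (an assumption based on the code shown, not something confirmed in this thread) is that melt stores the years as the strings '2015' and '2016', while the line uses the integers 2015 and 2016, so its coordinates match no factor on the axis. The sketch below is plain Python with the factor list written out by hand; missing_factors is a hypothetical helper, not a Bokeh function.

```python
# Factors as they would come out of the melted dataframe: the years are strings.
factors = [("Apples", "2015"), ("Pears", "2015"),
           ("Apples", "2016"), ("Pears", "2016")]


def missing_factors(coords, factors):
    """Return the coordinates that do not match any factor on the axis."""
    known = set(factors)
    return [c for c in coords if c not in known]


int_years = [("Apples", 2015), ("Apples", 2016)]      # as in the question
str_years = [("Apples", "2015"), ("Apples", "2016")]  # years as strings

bad = missing_factors(int_years, factors)   # both coordinates are unknown
good = missing_factors(str_years, factors)  # nothing is missing
print(bad)
print(good)
```

With string years, p.line(x=str_years, y=[3.5, 3.5], ...) refers to positions that exist on the axis. As far as I know, Bokeh also accepts a numeric offset as the last tuple element, for example ("Apples", "2015", -0.3), which may help when a line should cover only part of one sub-level.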
I want to use a for loop to call a charting function and then place each resulting chart in one section of a multi-chart page.
(image: example single chart)
(image: expected outcome)
I have a charting function (see the Charting Function section below) that I call from the main script in a for loop to produce several charts in sequence. Now I would like to show all the charts at a compact size (2 columns, 4 rows) on a single page. From what I have read, subplots allow this, but I am struggling to find the right command to place the output of the charting function.
I thought something like the code in the Main Section below would work, but it does not:
---------- Main Section ---------
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from sklearn.cluster import KMeans
from plotly.subplots import make_subplots
for cont in range(8):
    fig = charting_func(cont)
    fig_all.add_trace(fig,
                      row=1, col=1
                      ) #row and col incrementing function to be defined
fig_all.update_layout(height=600, width=800, title_text="Side By Side Subplots")
fig_all.show()
----- Charting Function ------
def charting_func(n_chrt):
    # Arbitrarily 10 colors for up to 10 clusters
    #colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet', 'purple','pink', 'silver']
    # Create Scatter plot, assigning each point a color where
    # point group = color index.
    fig = btc.plot.scatter(
        x=btc.index,
        y="Adj Close",
        color=[colors[i] for i in lists_clusters[n_chrt]],
        title="k-values = {0}".format(n_chrt+2)
    )
    # Add horizontal lines
    for cluster_avg in output[n_chrt][1:-1]:
        fig.add_hline(y=cluster_avg, line_width=1, line_color="blue")
    # Add a trace of the price for better clarity
    fig.add_trace(go.Scatter(
        x=btc.index,
        y=btc['Adj Close'],
        line_color="black",
        line_width=1
    ))
    # Make it pretty
    layout = go.Layout(
        plot_bgcolor='#D9D9D9',
        showlegend=False,
        # Font Families
        font_family='Monospace',
        font_color='#000000',
        font_size=20,
        xaxis=dict(
            rangeslider=dict(
                visible=False
            ))
    )
    fig.update_layout(layout)
    return fig
The basic approach with subplots is to tell each trace where it sits in the grid when you add it, so a charting function needs to receive (or compute) its row and column position. I don't have your data, so I took the stock prices of four companies and plotted those instead. Your price clustering is not included in the question's code, so I used binning (pd.cut) to get the values and color labels for the horizontal lines; please replace that part with your own logic. With that substitution, wrapping the loop body in a function should work.
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import itertools
import yfinance as yf
stock = ['TSLA','MSFT','AAPL','AMD']
#fig = go.Figure()
fig = make_subplots(rows=2, cols=2, subplot_titles=['MSFT','TSLA','AMD','AAPL'])
for s,rc in zip(stock, itertools.product([1,2],[2,1])):
    #print(s, rc[0], rc[1])
    df = yf.download(s, start="2017-09-01", end="2022-04-01", interval='1mo', progress=False)
    colors = ['blue', 'red', 'green', 'purple', 'orange']
    s_cut, bins = pd.cut(df['Adj Close'], 5, retbins=True, labels=colors)
    fig.add_trace(go.Scatter(mode='markers+lines',
                             x=df.index,
                             y=df['Adj Close'],
                             marker=dict(
                                 size=10,
                                 color=s_cut.tolist()
                             )),
                  row=rc[0], col=rc[1]
                  )
    for b in bins[1:-1]:
        fig.add_hline(y=b, line_width=1, line_color="blue", row=rc[0], col=rc[1])
fig.update_layout(autosize=True, height=600, title_text="Side By Side Subplots")
fig.show()
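The question left the "row and col incrementing function" undefined, and the answer walks a 2x2 grid with itertools.product. For the 2-column, 4-row layout asked about, one possible sketch maps the loop counter to a grid cell with divmod; grid_position is a hypothetical helper name, and filling the grid row by row is an assumption about the desired order.

```python
def grid_position(i, ncols=2):
    """Map a 0-based chart index to a 1-based (row, col), filling row by row."""
    row, col = divmod(i, ncols)
    return row + 1, col + 1


# Positions for the eight charts produced by the loop in the question.
positions = [grid_position(i) for i in range(8)]
print(positions)
```

In the question's loop this could be used as row, col = grid_position(cont). Note that add_trace expects a single trace rather than a whole Figure, so the traces of each chart would be added one at a time (for trace in fig.data: fig_all.add_trace(trace, row=row, col=col)), and fig_all would first have to be created with make_subplots(rows=4, cols=2).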
I have a Pandas dataframe which has values for the y axis spread over 3 columns. Those I want to show in a categorical y-axis. Then I have a column for x and a column for the color. From those values I want to create a heatmap.
I created the following code, which returns the error E-1019 (DUPLICATE_FACTORS): FactorRange must specify a unique list of categorical factors for an axis
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, FactorRange, LinearColorMapper
from bokeh.plotting import figure
from bokeh.palettes import Greys256
mapper = LinearColorMapper(palette=Greys256, low=0, high=5, high_color = 'red')
df_in = pd.DataFrame([['cat1', 'ccat1', 'cccat1', 4, 20],['cat1', 'ccat1', 'cccat1', 5, 15],['cat1', 'ccat1', 'cccat1', 6, 10]], columns=['key1','key2', 'key3', 'x', 'color'])
factors = list(df_in[['key1', 'key2', 'key3']].astype(str).itertuples(index=False, name=None))
data = dict(
y=factors,
x=list(df_in['x'].astype(int)),
color=list(df_in['color'].astype(int)),
)
source = ColumnDataSource(data=data)
p = figure(y_range=FactorRange(*factors))
p.rect(y='y', x='x', width=1, height=0.75, source = source, fill_color={'field': 'color', 'transform': mapper})
show(p)
When I construct the dictionary inside data manually by hardcoding it (including duplicated values in key1-3), I do not get this error.
Am I extracting the data from the dataframe incorrectly?
Figured it out by myself:
The factors passed to FactorRange need to be unique, while the y column in the data source should keep one entry for each value in the heatmap.
A list can be made unique with list(set(factors)):
p = figure(y_range=FactorRange(*list(set(factors))))
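One caveat with set() is that it does not preserve order, so the categories may appear on the axis in a different order than in the dataframe. A small sketch of an order-preserving alternative uses dict.fromkeys; the factor tuples below are made up for illustration (the cat2 row is not in the original data).

```python
# Illustrative factors with duplicates; the "cat2" entry is invented for the example.
factors = [
    ("cat1", "ccat1", "cccat1"),
    ("cat1", "ccat1", "cccat1"),
    ("cat2", "ccat1", "cccat1"),
    ("cat1", "ccat1", "cccat1"),
]

# dict keys are unique and keep insertion order, so this removes duplicates
# while keeping the first occurrence of each factor in place.
unique_factors = list(dict.fromkeys(factors))
print(unique_factors)
```

The result could then be passed as FactorRange(*unique_factors), while the y column of the data source keeps the full, duplicated list.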
I have two plots and a data table. I want to select a value in plot 1 and have the corresponding value highlighted in plot 2. In addition, I would like to show a data table under both plots with the selected values. Here is what I have so far:
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.io import show
import numpy as np
import pandas as pd
from bokeh.layouts import row
from bokeh.models.widgets import DataTable, TableColumn
df2 = pd.DataFrame(np.array([[1, 3.280, 3.3925], [2, 3.3012, 3.4303], [3, 3.5972, 3.8696]]),
columns=['abspos', 'val1', 'val1_q'])
source = ColumnDataSource(data=df2)
p1 = figure(title="Plot1",
plot_width=1500,
plot_height=900,
x_range=[0, 5],
y_range=[0, 5])
p1.circle('abspos', 'val1', source=source, line_color=None, color='red', size=6)
pq1 = figure(title="Plot2",plot_width=900, plot_height=900)
pq1.circle('val1_q', 'val1', source=source, line_color=None, size=6)
columns = [
TableColumn(field="abspos", title="abspos"),
TableColumn(field="val1", title="val1"),
TableColumn(field="val1_q", title="val1_q")
]
data_table = DataTable(source=source, columns=columns, width=300, height=280)
def plot_both(plot1, plot2):
    show(row(plot1, plot2))
plot_both(p1, pq1)
If I select the data point [2, 3.3012] in plot 1, the data point [3.3012, 3.4303] should be highlighted in plot 2. The data table underneath should show [2, 3.3012, 3.4303].
Question 1: How to achieve a highlight in plot 2 according to selected point in plot 1 (and vice versa).
Question 2: How to display a Table under both plots which shows the data of the selected data points.
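The simplest way would be to use the tap tool (the code below works for Bokeh v2.1.1). Because both plots and both data tables share the same ColumnDataSource, a point selected in one plot is highlighted in the other plot, and the matching row is highlighted in the tables.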
from bokeh.plotting import show, figure
from bokeh.models import ColumnDataSource, Row, Column, CustomJS, DataTable, TableColumn
import numpy as np
import pandas as pd
df2 = pd.DataFrame(np.array([[1, 3.280, 3.3925], [2, 3.3012, 3.4303], [3, 3.5972, 3.8696]]),
columns=['abspos', 'val1', 'val1_q'])
source = ColumnDataSource(data=df2)
p1 = figure(title="Plot1",plot_width=900, plot_height=500, tools="tap,pan,box_zoom,wheel_zoom,save,reset")
p1.circle('abspos', 'val1', source=source, line_color=None, color='blue', size=10)
p2 = figure(title="Plot2",plot_width=900, plot_height=500, tools="tap,pan,box_zoom,wheel_zoom,save,reset")
p2.circle('val1_q', 'val1', source=source, line_color=None, color='blue', size=10)
columns = [
TableColumn(field="abspos", title="abspos"),
TableColumn(field="val1", title="val1"),
TableColumn(field="val1_q", title="val1_q")
]
dt1 = DataTable(source=source, columns=columns, width=900, height=300)
dt2 = DataTable(source=source, columns=columns, width=900, height=300)
show(Column(Row(p1, p2), Row(dt1, dt2)))
If you need more functionality you would need to add a JS callback like p1.js_on_event('tap', CustomJS(args={...}, code="..."))
I'm trying to create a bar chart to see which stores had the biggest revenue in my dataset. Using the default Pandas plot I can do that in one line:
df.groupby('store_name')['sale_value'].sum().sort_values(ascending=False).head(20).plot(kind='bar')
But this chart is not very interactive and I can't see the exact values, so I want to create it using Bokeh and be able to hover over a bar and see the exact amount, for example.
I tried doing the following but just got a blank page:
source = ColumnDataSource(df.groupby('store_name')['sale_value'])
plot = Plot()
glyph = VBar(x='store_name', top='sale_value')
plot.add_glyph(source, glyph)
show(plot)
and if I change source to ColumnDataSource(df.groupby('store_name')['sale_value'].sum()) I get 'ValueError: expected a dict or pandas.DataFrame, got store_name'
How can I create this chart with mouseover using Bokeh?
Let's assume this is our DataFrame:
df = pd.DataFrame({'store_name':['a', 'b', 'a', 'c'], 'sale_value':[4, 5, 2, 4]})
df
>>>
  store_name  sale_value
0          a           4
1          b           5
2          a           2
3          c           4
Now it is possible to create a bar chart with your approach.
First we have to do some imports and preprocessing:
from bokeh.io import show
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, Title
source = ColumnDataSource(df.groupby('store_name')['sale_value'].sum().to_frame().reset_index())
my_ticks = [i for i in range(len(source.data['store_name']))]
my_tick_labels = {i: source.data['store_name'][i] for i in range(len(source.data['store_name']))}
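The groupby expression changes in two ways: .sum() aggregates the sales per store, and .to_frame().reset_index() turns the result back into a DataFrame with an ascending integer index. That index provides the x positions of the bars, and my_tick_labels maps each position back to its store name.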
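To make the preprocessing concrete, here is a plain-Python sketch of what it produces for the sample DataFrame above. It does not use pandas or Bokeh; the totals are computed by hand to mirror the groupby and sum, and sorting the store names is an assumption that matches groupby's default behavior.

```python
from collections import defaultdict

# The same rows as the sample DataFrame: (store_name, sale_value).
rows = [("a", 4), ("b", 5), ("a", 2), ("c", 4)]

# Equivalent of groupby('store_name')['sale_value'].sum()
totals = defaultdict(int)
for store, value in rows:
    totals[store] += value

# groupby sorts the group keys by default, so sort the store names here too.
store_names = sorted(totals)

# One integer tick per bar, and a mapping from each tick back to its store name.
my_ticks = list(range(len(store_names)))
my_tick_labels = dict(enumerate(store_names))
print(dict(totals), my_ticks, my_tick_labels)
```

dict(enumerate(...)) is a shorter way to write the dictionary comprehension used for my_tick_labels.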
Then you can create a plot.
plot = Plot(title=Title(text='Plot'),
plot_width=300,
plot_height=300,
min_border=0,
toolbar_location=None
)
glyph = VBar(x='index',
top='sale_value',
bottom=0,
width=0.5,
fill_color="#b3de69"
)
plot.add_glyph(source, glyph)
xaxis = LinearAxis(ticker = my_ticks,
major_label_overrides= my_tick_labels
)
plot.add_layout(xaxis, 'below')
yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')
plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))
show(plot)
I also want to show you a second approach, which I prefer.
from bokeh.plotting import figure, show
plot = figure(title='Plot',
plot_width=300,
plot_height=300,
min_border=0,
toolbar_location=None
)
plot.vbar(x='index',
top='sale_value',
source=source,
bottom=0,
width=0.5,
fill_color="#b3de69"
)
plot.xaxis.ticker = my_ticks
plot.xaxis.major_label_overrides = my_tick_labels
show(plot)
I like the second one more, because it is a bit shorter.
The created figure is in both cases the same. It looks like this.
I'm trying to display qualitative data using a donut plot with the bokeh library. I have 2 datasets sharing some data labels, and I want to have a unified legend that gathers both labels.
I have managed to either show the legend for only one plot, or have it for both but with repeated items. However, I did not find a way to have unique entries. Here is a sample code to show my issue:
from math import pi
import pandas as pd
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.transform import cumsum
from bokeh.palettes import Set3
# Create fake data
df = pd.DataFrame(
{'label': ['X{}'.format(i) for i in range(0, 4)] + ['X{}'.format(i) for i in range(2, 8)],
'angle': [2*pi / 4] * 4 + [2*pi / 6] * 6,
'group': [1]*4 + [2]*6})
# Set up colors
unique_labels = df.label.unique()
color_mapping = pd.Series(dict(zip(unique_labels, Set3[len(unique_labels)])))
df['color'] = color_mapping.loc[df.label].values
# Plot two concentric donuts
p = figure(title='Test', tools="hover", tooltips="@label")
p.annular_wedge(source=df[df.group==1], x=0, y=1, inner_radius=0.5, outer_radius=0.6,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_group='label')
p.annular_wedge(source=df[df.group==2], x=0, y=1, inner_radius=0.3, outer_radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_group='label')
show(p)
In the end, I get the following result:
Any ideas on how to solve this?
I found related questions for other libraries (e.g. matplotlib), but not for Bokeh.
I think this will work:
legend_tmp = {x.label['value']: x for x in p.legend.items}
p.legend.items.clear()
p.legend.items.extend(legend_tmp.values())
When Bokeh creates the legend for the plot, it adds all of the items for both annular_wedge calls, but they are not deduplicated the way you might expect, since the legend items are fairly complex objects themselves (meaning they are identified by more than just the value of the label). The dictionary above keeps one item per label, and the legend is then rebuilt from those.
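The deduplication step can be tried without Bokeh. In the sketch below, FakeLegendItem is a hypothetical stand-in that only mimics the label['value'] access used above; it is not a Bokeh class, and the renderer strings are placeholders.

```python
from dataclasses import dataclass


@dataclass
class FakeLegendItem:
    """Hypothetical stand-in for a legend item; only the fields used here."""
    label: dict
    renderer: str


# Outer donut has labels X0..X3, inner donut has X2..X7, as in the question.
items = [FakeLegendItem({"value": f"X{i}"}, "outer") for i in range(4)]
items += [FakeLegendItem({"value": f"X{i}"}, "inner") for i in range(2, 8)]

# Keying by label text keeps one item per label. Keys stay in first-seen
# order, but a repeated label is overwritten by its later item.
legend_tmp = {item.label["value"]: item for item in items}
deduped = list(legend_tmp.values())
labels = [item.label["value"] for item in deduped]
print(labels)
```

One consequence worth knowing: for labels that appear in both donuts, the surviving legend entry is the one from the second glyph, which matters if legend entries are used to toggle glyphs.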