I am intending to create a scatter plot with a linear colormapper. The dataset is the popular Female Literacy and Birthrate dataset.
The plot would have "GDP per capita" on the x axis and "Life Expectancy at Birth" on the y axis. In addition (and this is where I am running into the issue), I want to vary the color of the points according to "Birth rate".
Current Code:
#DATA MANIPULATION
# import Pandas, Bokeh, etc
import numpy as np
import pandas as pd
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis256 as palette
from bokeh.plotting import figure
from bokeh.transform import linear_cmap
# load the data file
excel_file = '../factbook.xlsx'
#(removed url above since it is private)
factbook = pd.read_excel(excel_file)
source = ColumnDataSource(factbook)
colormapper = linear_cmap(field_name = factbook["Birth rate"], palette=palette, low=min(factbook["Birth rate"]), high=max(factbook["Birth rate"]))
p = figure(title = "UN Factbook Bubble Visualization",
x_axis_label = 'GDP per capita', y_axis_label = 'Life expectancy at birth')
p.circle(x = 'GDP per capita', y = 'Life expectancy at birth', source = source, color =colormapper)
output_file("file", title="Bubble Graph")
show(p)
The p.circle line fails when it tries to consume the colormapper. I would like help understanding how to resolve this.
The field_name parameter should be provided with the name of a column. You are supplying the entire data column itself. Since you have not provided a complete runnable example, it is impossible to test for sure, but presumably you want:
linear_cmap(field_name="Birth rate", ...)
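For reference, here is a minimal self-contained sketch of that fix. The factbook file is private, so the DataFrame below is filled with random numbers as a stand-in (an assumption; it is not real factbook data). It also draws the points with p.scatter and an explicit size instead of p.circle, which is an assumption about the Bokeh version (recent releases prefer scatter for sized markers). The only change that matters for the error is that field_name is the string "Birth rate".

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# Random stand-in for the private factbook.xlsx (NOT real data)
rng = np.random.default_rng(0)
factbook = pd.DataFrame({
    "GDP per capita": rng.uniform(500, 60000, 50),
    "Life expectancy at birth": rng.uniform(45, 85, 50),
    "Birth rate": rng.uniform(8, 45, 50),
})
source = ColumnDataSource(factbook)

# field_name is the *name* of a column in `source`, not the column's values
colormapper = linear_cmap(
    field_name="Birth rate",
    palette=Viridis256,
    low=factbook["Birth rate"].min(),
    high=factbook["Birth rate"].max(),
)

p = figure(title="UN Factbook Bubble Visualization",
           x_axis_label="GDP per capita",
           y_axis_label="Life expectancy at birth")
p.scatter(x="GDP per capita", y="Life expectancy at birth",
          source=source, size=8, color=colormapper)
# show(p)  # uncomment to open the plot in a browser
```

Bokeh resolves "Birth rate" against the glyph's source in the browser, which is why it needs a column name rather than a pandas Series.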
Related
I plotted a line graph using Bokeh in Python. I want to highlight the areas I select with the Box Select tool and read out their values (min/max, x and y coordinates), as shown below. When I select a section of the graph with the Box Select tool, the color of the selected part does not change. How can I solve this problem?
Example
import numpy as np
import pandas as pd
from bokeh.plotting import figure,show,output_file
from bokeh.models import ColumnDataSource
output_file("PlottingTest.html")
dataset = pd.read_csv("data.csv")
data = dataset.iloc[:,3]
time = np.linspace(1, 500, num = 500)
TOOLS ="pan,wheel_zoom,reset,hover,poly_select,xbox_select,lasso_select"
s1 = ColumnDataSource(data=dict(x=time, y=data))
p = figure(title = 'Test',x_axis_label = 'time', y_axis_label='csv Data',plot_width=1000, plot_height=500,tools=TOOLS)
p.line ('date', 't1', source=s1, selection_color="orange")
p.line(time, data, legend_label="Current", line_width=1)
p.toolbar.autohide = True
show(p)
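One possible approach for the Box Select question above, sketched under assumptions: data.csv is not available, so a sine wave stands in for its fourth column. Note that in the question's code, p.line('date', 't1', source=s1, ...) refers to columns that s1 does not contain (s1 only has x and y), and the second p.line call does not use s1 at all. The sketch draws the line from s1 and overlays invisible markers on the same source, so a box selection can color the selected points orange; a CustomJS callback on s1.selected then logs the min/max x and y of the selection to the browser console. The marker overlay and the console logging are choices made for this sketch, not necessarily the only way to do it.

```python
import numpy as np
from bokeh.models import ColumnDataSource, CustomJS
from bokeh.plotting import figure

# Stand-in for dataset.iloc[:, 3] from the private data.csv (assumption)
time = np.linspace(1, 500, num=500)
data = np.sin(time / 30.0)

s1 = ColumnDataSource(data=dict(x=time, y=data))
TOOLS = "pan,wheel_zoom,reset,hover,poly_select,xbox_select,lasso_select"
p = figure(title="Test", x_axis_label="time", y_axis_label="csv Data",
           width=1000, height=500, tools=TOOLS)

# The line and the markers share s1, so they always show the same data
p.line("x", "y", source=s1, line_width=1, legend_label="Current")
# Markers are invisible until selected; selected ones are drawn in orange
p.scatter("x", "y", source=s1, size=5, alpha=0,
          selection_color="orange", selection_alpha=1.0,
          nonselection_alpha=0)

# Runs in the browser whenever the selection changes
report = CustomJS(args=dict(source=s1), code="""
    const idx = source.selected.indices;
    if (idx.length === 0) { return; }
    const xs = idx.map((i) => source.data.x[i]);
    const ys = idx.map((i) => source.data.y[i]);
    console.log("x min/max:", Math.min(...xs), Math.max(...xs));
    console.log("y min/max:", Math.min(...ys), Math.max(...ys));
""")
s1.selected.js_on_change("indices", report)
p.toolbar.autohide = True
# show(p)  # uncomment to open the plot in a browser
```

To use the values in Python instead of the browser, the same "indices" property can be watched with on_change, but that requires running the plot in a Bokeh server application.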
I'm trying to highlight the last value of a time series plot by plotting its value on the y axis, as shown in this question. I prefer LabelSet over Legend because you can precisely control the text positions and also use a data source to update it. Unfortunately, I cannot find out how to draw label text outside the plot box.
Here is some code that plots a LabelSet; notice how the text is only shown inside the box (66.1x is partially blocked by the y axis):
import pandas as pd
from bokeh.io import output_notebook
output_notebook()
from bokeh.plotting import figure, show
from bokeh.models import LabelSet, ColumnDataSource
#import bokeh.sampledata
#bokeh.sampledata.download()
from bokeh.sampledata.stocks import MSFT
df = pd.DataFrame(MSFT)[:50]
df["date"] = pd.to_datetime(df["date"])
p = figure(
x_axis_type="datetime", width=1000, toolbar_location='left',
title = "MSFT Candlestick", y_axis_location="right")
p.line(df.date, df.close)
ds = ColumnDataSource({'x': [df.date.iloc[-1]], 'y': [df.close.iloc[-1]], 'text': [' ' + str(df.close.iloc[-1])]})
ls = LabelSet(x='x', y='y', text='text', source=ds)
p.add_layout(ls)
show(p)
Please let me know how to show a LabelSet outside the box. Thanks!
I have a dataframe that details sales of various product categories vs. time. I'd like to make a "line and marker" plot of sales vs. time, per category. To my surprise, this appears to be very difficult in Bokeh.
The scatter plot is easy. But overplotting a line of sales vs. date from the same source (so I can update both the scatter and line plots in one go when the source updates), in such a way that the line colors match the scatter marker colors, has proven nearly impossible.
Minimal reproducible example with contrived data:
import pandas as pd
df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],\
'Product Category':['shoes','shoes','grocery','grocery'],\
'Sales':[100,180,21,22],'Colors':['red','red','green','green']})
df['Date'] = pd.to_datetime(df['Date'])
from bokeh.io import output_notebook
output_notebook()
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", plot_width=800, toolbar_location=None)
plot.scatter(x="Date",y="Sales",size=15, source=source, fill_color="Colors", fill_alpha=0.5, \
line_color="Colors",legend="Product Category")
for cat in list(set(source.data['Product Category'])):
    tmp = source.to_df()
    col = tmp[tmp['Product Category']==cat]['Colors'].values[0]
    plot.line(x="Date",y="Sales",source=source, line_color=col)
show(plot)
Here's what it looks like, which is clearly wrong:
Here's what I want and don't know how to make:
Can Bokeh not make such plots, where scatter markers and lines have the same color per category, with a legend?
With Bokeh it is often helpful to first think about the visualisation you want and then structure the data source appropriately. You want two lines, one per category; the x axis is time and the y axis is sales. A natural way to structure your data source is then the following:
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
output_notebook()
# Convert the dates so the datetime x axis can place them
df = pd.DataFrame({'Date': pd.to_datetime(['2020-01-01', '2020-01-02']),
                   'Shoe Sales': [100, 180],
                   'Grocery Sales': [21, 22]
                   })
source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", width=800, toolbar_location=None)
categories = ["Shoe Sales", "Grocery Sales"]
colors = {"Shoe Sales": "red", "Grocery Sales": "green"}
# One scatter and one line per column, both drawn from the same source
for category in categories:
    plot.scatter(x="Date", y=category, size=15, source=source, fill_color=colors[category], legend_label=category)
    plot.line(x="Date", y=category, source=source, line_color=colors[category])
show(plot)
The solution is to group your data; then you can plot one line per group.
Minimal Example
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
output_notebook()
df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],
'Product Category':['shoes','shoes','grocery','grocery'],
'Sales':[100,180,21,22],'Colors':['red','red','green','green']})
df['Date'] = pd.to_datetime(df['Date'])
plot = figure(x_axis_type="datetime",
              width=400,
              height=400,
              toolbar_location=None
              )
plot.scatter(x="Date",
y="Sales",
size=15,
source=df,
fill_color="Colors",
fill_alpha=0.5,
line_color="Colors",
legend_field="Product Category"
)
for color in df['Colors'].unique():
    plot.line(x="Date", y="Sales", source=df[df['Colors']==color], line_color=color)
show(plot)
Output
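A variation on the grouping approach, sketched here as an assumption rather than taken from either answer: group by "Product Category" with DataFrame.groupby and give the scatter and line glyphs of each group the same legend_label. Bokeh combines renderers that share a legend_label into a single legend entry, so each category's entry shows both its marker and its line. The trade-off is that each category gets its own ColumnDataSource instead of the single shared source the question asked for; the sources dict keeps them reachable for later updates.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-02'],
                   'Product Category': ['shoes', 'shoes', 'grocery', 'grocery'],
                   'Sales': [100, 180, 21, 22],
                   'Colors': ['red', 'red', 'green', 'green']})
df['Date'] = pd.to_datetime(df['Date'])

plot = figure(x_axis_type="datetime", width=400, height=400,
              toolbar_location=None)

sources = {}  # one ColumnDataSource per category, kept for later updates
for category, group in df.groupby('Product Category'):
    color = group['Colors'].iloc[0]
    # Sort by date so the line is drawn left to right
    src = ColumnDataSource(group.sort_values('Date'))
    sources[category] = src
    # Same legend_label on both glyphs -> one legend entry with marker and line
    plot.scatter(x="Date", y="Sales", size=15, source=src,
                 fill_color=color, fill_alpha=0.5, line_color=color,
                 legend_label=category)
    plot.line(x="Date", y="Sales", source=src, line_color=color,
              legend_label=category)
# show(plot)  # uncomment to display
```

Updating a category then means assigning new columns to sources[category].data, which redraws both its markers and its line.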
I'm trying to plot a simple heatmap using Bokeh/HoloViews. My data (a pandas DataFrame) has categoricals on y and datetimes on x. The problem is that there are more than 3000 categories, and the y axis tick labels overlap so badly that the plot is useless. Is there currently a reliable way in Bokeh to show only a subset of the ticks depending on the zoom level?
I've already tried Plotly and the result looks perfect, but I need to use Bokeh/HoloViews and Datashader. I also want to avoid replacing the categoricals with numerical ticks.
I've also tried this solution but actually it doesn't work (bokeh 1.2.0).
This is a toy example representing my use case (Actually here #y is 1000 but it gives the idea)
from datetime import datetime
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
from bokeh.io import output_notebook
output_notebook()
# build sample data
index = pd.date_range(start='1/1/2019', periods=1000, freq='T')
data = np.random.rand(1000,100)
columns = ['col'+ str(n) for n in range(100)]
# initial data format
df = pd.DataFrame(data=data, index=index, columns=columns)
# bokeh
df = df.stack().reset_index()
df.rename(columns={'level_0':'x','level_1':'y', 0:'z'},inplace=True)
df.sort_values(by=['y'],inplace=True)
x = [
date.to_datetime64().astype('M8[ms]').astype('O')
for date in df.x.to_list()
]
data = {
'value': df.z.to_list(),
'x': x,
'y': df.y.to_list(),
'date' : df.x.to_list()
}
p = figure(x_axis_type='datetime', y_range=columns, width=900, tooltips=[("x", "@date"), ("y", "@y"), ("value", "@value")])
p.rect(x='x', y='y', width=60*1000, height=1, line_color=None,
fill_color=linear_cmap('value', 'Viridis256', low=df.z.min(), high=df.z.max()), source=data)
show(p)
In the end, I partially followed James's suggestion and got it to work using a Python callback for the tick formatter. This solution was hard for me to find; I searched the Bokeh docs, examples and source code for days.
The main problem for me was that the docs do not mention how to use ColumnDataSource objects in the custom callback:
https://docs.bokeh.org/en/1.2.0/docs/reference/models/formatters.html#bokeh.models.formatters.FuncTickFormatter.from_py_func
Finally, this helped a lot:
https://docs.bokeh.org/en/1.2.0/docs/user_guide/interaction/callbacks.html#customjs-with-a-python-function.
So I modified the original code as follows, in the hope it can be useful to someone:
from datetime import datetime
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
from bokeh.io import output_notebook
from bokeh.models import FuncTickFormatter
from bokeh.models import ColumnDataSource
output_notebook()
# build sample data
index = pd.date_range(start='1/1/2019', periods=1000, freq='T')
data = np.random.rand(1000,100)
columns_labels = ['col'+ str(n) for n in range(100)]
columns = [n for n in range(100)]
# initial data format
df = pd.DataFrame(data=data, index=index, columns=columns)
# bokeh
df = df.stack().reset_index()
df.rename(columns={'level_0':'x','level_1':'y', 0:'z'},inplace=True)
df.sort_values(by=['y'],inplace=True)
x = [
date.to_datetime64().astype('M8[ms]').astype('O')
for date in df.x.to_list()
]
data = {
'value': df.z.to_list(),
'x': x,
'y': df.y.to_list(),
'y_labels_tooltip' : [columns_labels[k] for k in df.y.to_list()],
'y_ticks' : columns_labels*1000,
'date' : df.x.to_list()
}
cd = ColumnDataSource(data=data)
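def ticker(source=cd):
    # `tick` is not a Python name here: it is provided by FuncTickFormatter
    # once this function is translated to JavaScript by from_py_func
    labels = source.data['y_ticks']
    return "{}".format(labels[tick])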
#p = figure(x_axis_type='datetime', y_range=columns, width=900, tooltips=[("x", "@date{%F %T}"), ("y", "@y_labels"), ("value", "@value")])
p = figure(x_axis_type='datetime', width=900, tooltips=[("x", "@date{%F %T}"), ("y", "@y_labels_tooltip"), ("value", "@value")])
p.rect(x='x', y='y', width=60*1000, height=1, line_color=None,
fill_color=linear_cmap('value', 'Viridis256', low=df.z.min(), high=df.z.max()), source=cd)
p.hover.formatters = {'date': 'datetime'}
p.yaxis.formatter = FuncTickFormatter.from_py_func(ticker)
p.yaxis[0].ticker.desired_num_ticks = 20
show(p)
The result is this:
I get two different results when I use Bokeh's circle (or diamond_cross) function and its line function: the line plot includes negative values and the circle plot does not.
Plot with line
and a plot with diamond_cross
I want to plot the temperatures for a certain place over a timespan. I have a lot of values, so I would like to make a scatter plot with Bokeh.
I also get the same problem when I use the x function.
If you replace diamond_cross with line in my code below (and remove the fill_alpha and size arguments), you will probably also get two different graphs.
import pandas as pd
import numpy as np
import bokeh as bk
import scipy.special
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
from bokeh.models.glyphs import Quad
from bokeh.layouts import gridplot
df = pd.read_csv('KNMI2.csv', sep=';',
usecols= ['YYYYMMDD','Techt', 'YYYY', 'MM', 'D'])
jan= df[df['MM'].isin(['1'])]
source_jan = ColumnDataSource(jan)
p = figure(plot_width = 800, plot_height = 800,
x_range=(0,32), y_range=(-20,20))
p.diamond_cross(x='D', y='Techt', source=source_jan,
fill_alpha=0.2, size=2)
p.title.text = 'Temperatuur per uur vanaf 1951 tot 2019'
p.xaxis.axis_label = 'januari'
p.yaxis.axis_label = 'Temperatuur (C)'
show(p)
If the circle and diamond_cross functions worked the same way as the line function, their plots would also show negative values.
I had a similar issue where the data type of the variable I was trying to plot was string instead of int.
Try using
jan = df[df['MM'].isin(['1'])].copy()  # copy so the assignment below doesn't warn
jan['Techt'] = jan['Techt'].astype(int)
source_jan = ColumnDataSource(jan)
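A quick way to check for this, sketched with a small hypothetical frame because KNMI2.csv is not available: look at df.dtypes, and convert text columns with pd.to_numeric. With errors='coerce', unparseable entries become NaN instead of raising an error the way astype(int) would. The stray spaces in the sample values are an assumption about how the numbers might end up as text; converting the x column D as well is also an assumption, since it comes from the same file.

```python
import pandas as pd

# Hypothetical sample: numbers read from a CSV as text (note the stray spaces)
df = pd.DataFrame({"MM": ["1", "1", "1", "2"],
                   "D": ["1", "2", "3", "1"],
                   "Techt": [" -35", "12", " 4", "7"]})
print(df.dtypes)  # object (or string) dtype means the values are text, not numbers

jan = df[df["MM"].isin(["1"])].copy()  # .copy() avoids SettingWithCopyWarning
for col in ["D", "Techt"]:
    # Strip whitespace, then parse; anything unparseable becomes NaN
    jan[col] = pd.to_numeric(jan[col].str.strip(), errors="coerce")
print(jan.dtypes)
# source_jan = ColumnDataSource(jan)  # then plot as in the question
```

After the conversion, jan["Techt"] holds real numbers, so every glyph type places negative temperatures on the numeric y range in the same way.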