Unable to populate Bokeh HoverTool values from a passed-in pandas DataFrame - python

I am using Bokeh in Jupyter notebooks to help with data visualization. I want to plot the data from a pandas DataFrame, and when I hover over the Bokeh plot, all the feature values should be visible in the hover box. However, with the code below, only the index displays correctly; all the other fields appear as ???, and I'm not sure why.
Here is my example:
# Import everything needed
import numpy as np
import pandas as pd
from bokeh.layouts import row, widgetbox, column
from bokeh.models import CustomJS, Slider, Select, HoverTool
from bokeh.plotting import figure, output_file, show, ColumnDataSource
from bokeh.io import push_notebook, output_notebook, curdoc
from bokeh.client import push_session
#from bokeh.scatter_with_hover import scatter_with_hover
output_notebook()
np.random.seed(0)
samples = np.random.randint(low = 0, high = 1000, size = 1000)
samples = samples.reshape(200,5)
cols = ["A", "B", "C", "D", "E"]
df = pd.DataFrame(samples, columns=cols)
# Here is a dict of some keys that I want to be able to pick from for plotting
labels = list(df.columns.values)
axis_map = {key:key for key in labels}
code2 = ''' var data = source.data;
//axis values with select widgets
var value1 = val1.value;
var value2 = val2.value;
var original_data = original_source.data
// get data corresponding to selection
x = original_data[value1];
y = original_data[value2];
data['x'] = x;
data['y'] = y;
source.trigger('change');
// set axis labels
x_axis.axis_label = value1;
y_axis.axis_label = value2;
'''
datas = "datas"
source = ColumnDataSource(data=dict( x=df['A'], y=df['B'],
label = labels, datas = df))
original_source = ColumnDataSource(data=df.to_dict(orient='list'))
a = source.data[datas].columns.values
#print(a.columns.values)
print(a)
TOOLS = [HoverTool(tooltips=[(c, '@' + c) for c in source.data[datas].columns.values] +
[('index', '$index')])]
# hover.tooltips.append(('index', '$index'))
#plot the figures
plot = figure(plot_width=800, plot_height=800, tools= TOOLS)
plot.scatter(x= "x",y="y", source=source, line_width=2, line_alpha=0.6,
size = 3)
callback = CustomJS(args=dict(source=source, original_source = original_source,
x_axis=plot.xaxis[0],y_axis=plot.yaxis[0]), code=code2)
#Create two select widgets to pick the features of interest
x_axis = Select(title="X Axis", options=sorted(axis_map.keys()), value="A", callback = callback)
callback.args["val1"] = x_axis
y_axis = Select(title="Y Axis", options=sorted(axis_map.keys()), value="B", callback = callback)
callback.args["val2"] = y_axis
plot.xaxis[0].axis_label = 'A'
plot.yaxis[0].axis_label = 'B'
#Display the graph in a jupyter notebook
layout = column(plot, x_axis, y_axis )
show(layout, notebook_handle=True)
I'm even passing the full DataFrame into the source ColumnDataSource so I can access it later, but it still doesn't work. Any guidance would be greatly appreciated!

Running your code with a recent version of Bokeh results in a warning, which suggests the root of the problem. If we actually look at the data source you create for the glyph:
source = ColumnDataSource(data=dict(x=df['A'],
y=df['B'],
label=labels,
datas = df))
It's apparent that two things are going wrong:
You are violating the fundamental assumption that all CDS columns must always be the same length at all times. The CDS is like a "cheap DataFrame", i.e. it doesn't have ragged columns: all the columns must have the same length, just like a DataFrame.
You are configuring a HoverTool to report values from columns named "A", "B", etc., but your data source has no such columns at all! The CDS you create above for your glyph has columns named:
"x", "y"
"label", which has the wrong length (5 entries versus 200 rows)
"datas", which has a bad value; columns should normally be 1-D arrays
The last column, "datas", is irrelevant, by the way. I suspect you expect the hover tool to somehow look in that column for information to display, but that is not how the hover tool works. Bottom line:
If you configured a tooltip for a field @A, then your CDS must have a column named "A".
And that's exactly what's not the case here.
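Both rules above can be checked mechanically before anything is rendered. Below is a minimal sketch in plain Python; `check_source` is a hypothetical helper, not part of Bokeh, and it only mirrors the two rules described in this answer (equal column lengths, and one column per `@field` in the tooltips). The example data reproduces the shape of the question's source, with the DataFrame-valued "datas" column left out.

```python
import re


def check_source(data, tooltips):
    """Report problems in a CDS-style dict `data` for the given `tooltips`.

    Hypothetical helper, not a Bokeh API. `data` maps column name -> list,
    as passed to ColumnDataSource(data=...); `tooltips` is a list of
    (label, spec) pairs, as passed to HoverTool(tooltips=...).
    """
    problems = []

    # Rule 1: all columns must have the same length, like a DataFrame.
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns have different lengths: {lengths}")

    # Rule 2: every @field must name an existing column. Specs starting
    # with $ (such as $index) are special variables and need no column.
    for _label, spec in tooltips:
        for braced, plain in re.findall(r"@\{([^}]+)\}|@(\w+)", spec):
            name = braced or plain
            if name not in data:
                problems.append(f"tooltip field @{name} has no column {name!r}")
    return problems


# Same shape as the question: 200 rows, but "label" has only 5 entries
# and there are no columns named "A" ... "E".
rows = 200
data = {"x": [0] * rows, "y": [0] * rows, "label": ["A", "B", "C", "D", "E"]}
tooltips = [(c, "@" + c) for c in "ABCDE"] + [("index", "$index")]

for problem in check_source(data, tooltips):
    print(problem)
```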
It's hard to say exactly how you should change your code without more context about what exactly you want to do. I'd suggest taking a closer look at the documentation for hover tools in the User's Guide.
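If the goal is simply to show every feature in the tooltip, one possible approach (an assumption about what is wanted, not the only fix) is to put every DataFrame column into the glyph's source as its own full-length column, alongside "x" and "y" copies of the currently selected axes. Here is a stdlib sketch of building that dict and the matching tooltips; `build_source_data` and `build_tooltips` are hypothetical names, and the input stands in for `df.to_dict(orient='list')`, which the question already uses for `original_source`.

```python
def build_source_data(columns, x_name, y_name):
    """Copy every feature column and add 'x'/'y' copies of the chosen axes.

    `columns` maps feature name -> list of values, e.g. the result of
    df.to_dict(orient='list'). Assumes no feature is itself named 'x' or 'y'.
    """
    data = {name: list(values) for name, values in columns.items()}
    data["x"] = list(columns[x_name])
    data["y"] = list(columns[y_name])
    return data


def build_tooltips(feature_names):
    """One @field tooltip per feature, plus the row index."""
    return [(name, "@" + name) for name in feature_names] + [("index", "$index")]


features = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
data = build_source_data(features, "A", "B")
tooltips = build_tooltips(features)
print(sorted(data))  # ['A', 'B', 'C', 'x', 'y']
print(tooltips[0])   # ('A', '@A')
```

The resulting dict would then be passed to `ColumnDataSource(data=data)` and the list to `HoverTool(tooltips=tooltips)`. Because every column has the same length, a callback that replaces `data['x']` and `data['y']` with other full-length columns keeps the equal-length rule intact.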

Related

Bokeh hover special variable `$data_x` shows number instead of FactorRange category label for multi-line glyph

I am using Bokeh multi_line to show several lines using a categorical x_range, and would like the hover to display the hovered x category. I thought $data_x might help, but it shows numerical values related to the category indexes rather than the category labels. I can use CustomJSHover with special_vars["segment_index"] to display what I want, but is there a simpler way?
To demonstrate, this code creates a figure with multi_line():
from collections import defaultdict
import pandas as pd
from bokeh import palettes
from bokeh.plotting import show, figure
from bokeh.models import CustomJSHover, HoverTool
# Substantive data.
df_data = pd.DataFrame.from_records([
dict(date="2001 Q1", output=100, inputs=100),
dict(date="2001 Q2", output=105, inputs=102),
dict(date="2001 Q3", output=110, inputs=105),
])
# Make list of lists for multi_line(), with metadata.
lines_data = defaultdict(list)
for var in ["inputs", "output"]:
lines_data["variable"].append(var)
lines_data["date"].append(df_data["date"])
lines_data["value"].append(df_data[var])
lines_data["color"] = palettes.Category10_10[:2]
fig = figure(
x_range = df_data["date"],
plot_height=400,
)
fig.multi_line(
source = lines_data,
xs = "date",
ys = "value",
color = "color",
legend_group = "variable",
line_width = 5,
line_alpha = 0.6,
hover_line_alpha = 1.0, # Highlight hover line.
)
The hover I want can be created like this using CustomJSHover:
# Custom hover formatting for @date.
hover_date = CustomJSHover(
# Show value[$segment_index].
code="""
console.log("> Show value[$segment_index] hover", value);
return "" + value[special_vars["segment_index"]];
""")
fig.add_tools(HoverTool(
tooltips=[
('variable', '@variable'),
('date', '@date{custom}'), # Show hovered date only.
('value', '$data_y'),
],
formatters={'@date': hover_date},
))
show(fig)
Potentially a more straightforward hover specification would use something like $data_x without a custom format, except $data_x itself apparently does not look up the label in the FactorRange (applying this HoverTool instead of the one above):
# Simple hover showing $data_x.
fig.add_tools(HoverTool(
tooltips=[
('variable', '@variable'),
('date', '$data_x'), # Does not show x_range value!
('value', '$data_y'),
]))
show(fig)
Now, hovering over a line shows a 'date' like "1.500" instead of "2001 Q2" etc.
Am I missing a trick, or is CustomJSHover the best way to show the x category?
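The value 1.500 appears consistent with how a flat categorical range places factors: assuming no range or factor padding (an assumption for this sketch), factor i occupies the interval from i to i + 1 and is centred at i + 0.5, so "2001 Q2", the second factor, sits at 1.5. Under that assumption, mapping a numeric coordinate back to its label is a floor followed by a list lookup, sketched here in Python with a hypothetical `factor_at` helper.

```python
import math


def factor_at(x, factors):
    """Map a synthetic categorical coordinate back to its factor label.

    Assumes a flat (non-nested) factor list with no padding, where factor i
    spans [i, i + 1) and its centre is i + 0.5. Returns None outside the range.
    """
    i = math.floor(x)
    if 0 <= i < len(factors):
        return factors[i]
    return None


dates = ["2001 Q1", "2001 Q2", "2001 Q3"]
print(factor_at(1.5, dates))  # 2001 Q2
```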

How to retrieve coordinates of PointDrawTool in Bokeh?

I'm trying to get xy coordinates of points drawn by the user. I want to have them as a dictionary, a list or a pandas DataFrame.
I'm using Bokeh 2.0.2 in Jupyter. There'll be a background image (which is not the focus of this post) and on top, the user will create points that I could use further.
Below is where I've managed to get to (with some dummy data). I've commented out some lines that I believe point in the direction I'd have to go, but I can't quite get the hang of it.
from bokeh.plotting import figure, show, Column, output_notebook
from bokeh.models import PointDrawTool, ColumnDataSource, TableColumn, DataTable
output_notebook()
my_tools = ["pan, wheel_zoom, box_zoom, reset"]
#create the figure object
p = figure(title= "my_title", match_aspect=True,
toolbar_location = 'above', tools = my_tools)
seeds = ColumnDataSource({'x': [2,14,8], 'y': [-1,5,7]}) #dummy data
renderer = p.scatter(x='x', y='y', source = seeds, color='red', size=10)
columns = [TableColumn(field="x", title="x"),
TableColumn(field="y", title="y")]
table = DataTable(source=seeds, columns=columns, editable=True, height=100)
#callback = CustomJS(args=dict(source=seeds), code="""
# var data = source.data;
# var x = data['x']
# var y = data['y']
# source.change.emit();
#""")
#
#seeds.x.js_on_change('change:x', callback)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
show(Column(p, table))
From the documentation at https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#pointdrawtool:
The tool will automatically modify the columns on the data source corresponding to the x and y values of the glyph. Any additional columns in the data source will be padded with the declared empty_value, when adding a new point. Any newly added points will be inserted on the ColumnDataSource of the first supplied renderer.
So, just check the corresponding data source, seeds in your case.
The only issue here is if you want to know exactly what point has been changed or added. In this case, the simplest solution would be to create a custom subclass of PointDrawTool that does just that. Alternatively, you can create an additional "original" data source and compare seeds to it each time it's updated.
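The "compare against an original copy" idea can be sketched in plain Python, working on the dict of lists that `seeds.data` holds. `diff_points` is a hypothetical helper; it compares rows by position, so it assumes points are only added or dragged. Deleting a point (which PointDrawTool also allows) shifts the positions, and this simple comparison would misreport that case.

```python
def diff_points(original, current):
    """Compare two {'x': [...], 'y': [...]} dicts row by row.

    Returns (added, moved): `added` holds (index, x, y) for rows past the
    original length, `moved` holds (index, (old_x, old_y), (new_x, new_y)).
    Assumes points are only appended or dragged, never deleted.
    """
    old = list(zip(original["x"], original["y"]))
    new = list(zip(current["x"], current["y"]))
    moved = [(i, old[i], new[i])
             for i in range(min(len(old), len(new))) if old[i] != new[i]]
    added = [(i, *new[i]) for i in range(len(old), len(new))]
    return added, moved


original = {"x": [2, 14, 8], "y": [-1, 5, 7]}       # the dummy seeds
current = {"x": [2, 15, 8, 3], "y": [-1, 5, 7, 4]}  # one dragged, one added
print(diff_points(original, current))
# ([(3, 3, 4)], [(1, (14, 5), (15, 5))])
```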
The problem is that you want to use the points in Python, but show(p) creates a static version of the plot. Here is a simple example that fixes this by running it as an app. I removed the table and such to make it a bit cleaner, but it will also work with it:
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import PointDrawTool
output_notebook()
#create the figure object
p = figure(width=400,height=400)
renderer = p.scatter(x=[0,1], y=[1,2],color='red', size=10)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
# This part is important
def app(doc):
global p
doc.add_root(p)
show(app) #<-- show app and not p!
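Once the plot runs as an app like this, the Python side should see the drawn points in the renderer's data source (`renderer.data_source.data`, a dict of column lists like `seeds.data` in the question). Turning that dict into the list or dict shapes the question asks for needs no Bokeh at all. Here is a small sketch with hypothetical `columns_to_records` and `columns_to_points` helpers; for a DataFrame, `pd.DataFrame(data)` accepts the same dict directly.

```python
def columns_to_records(data, keys=("x", "y")):
    """Convert a dict of equal-length column lists into a list of row dicts."""
    return [dict(zip(keys, row)) for row in zip(*(data[k] for k in keys))]


def columns_to_points(data):
    """Convert {'x': [...], 'y': [...]} into a list of (x, y) tuples."""
    return list(zip(data["x"], data["y"]))


data = {"x": [0, 1, 5.5], "y": [1, 2, 3.25]}  # e.g. after one point was drawn
print(columns_to_records(data))
# [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 5.5, 'y': 3.25}]
print(columns_to_points(data))
# [(0, 1), (1, 2), (5.5, 3.25)]
```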

Interactively change a point plot in bokeh using RangeSlider to select columns in a pandas dataframe

I have a pandas dataframe df where the first two columns represent x, y coordinates and the remaining columns represent time slices (t0,...tn) where the presence(1) or absence(0) of each point at each time slice (ti) is recorded.
I would like to use a RangeSlider (not a Slider) so that I can slide across a range of time slices and plot points that are present within that range.
This is what I have so far:
from bokeh.layouts import column
from bokeh.plotting import figure, show
from bokeh.models import CustomJS, ColumnDataSource
from bokeh.models.widgets import RangeSlider
# pts is a dataframe with columns (x, y, t0, t1,...t19)
src = ColumnDataSource(data = pts)
p = figure(plot_height = 500)
p.circle(source= src, x='x', y= 'y', size=2, color="navy", alpha=0.1)
callback = CustomJS( args = dict(source = src), code="""
var data = source.data;
// changed ti range
var ti_start = cb.obj.value[0] + 2 //offset
var ti_end = cb.obj.value[1] + 2
// change data (how to select columns???????)
data = data[ti_start:ti_end]
source.change.emit()
""")
ti_slider = RangeSlider(start=0, end=19, value=(1,2), step=1, title="Time Period",
callback = callback)
layout = column(ti_slider, p)
show(layout)
The above code does not work at all. The points are plotted and the RangeSlider appears but when I alter the range or slide across nothing happens. I am not able to restrict the columns that make up the data source (i.e. dataframe). I have tried changing the code that selects the columns but I don't know any javascript.
This is my first time trying to use the CustomJS function with bokeh.
There are a number of issues in the code above:
It is cb_obj, not cb.obj
Use the modern js_on_change, not the very old ad-hoc callback parameters
You are assigning to a local variable data and then throwing away the result; you need to actually assign to source.data at some point for there to be any effect.
To do this by updating data sources, you would need two data sources: one that always has the complete data that you pull from, and another that you only use to hold the subset. If you only have one data source and you subset it, you've now thrown away data you can never get back. (Future subsets will be against the current subset, not the whole.)
So, it's better to use CDSView for this, which lets you express an updatable subset view to apply to a constant data source.
JS does not have pandas-like operations; you just have to do all the nested looping to check every row to determine the subset indices
Just guessing, but you will probably want to fix x/y ranges if you intend to maintain the same spatial extent for comparison, as the slider moves.
Here is a simplified working example:
from bokeh.layouts import column
from bokeh.plotting import figure, show
from bokeh.models import CustomJS, ColumnDataSource, RangeSlider, CDSView, IndexFilter
source = ColumnDataSource(data=dict(
x=[1,2,3,4],
y=[1,1,1,1],
t0=[1,1,0,0],
t1=[0,1,0,0],
t2=[0,1,1,0],
t3=[0,0,1,1],
t4=[0,0,0,1],
t5=[0,0,0,0],
))
p = figure(plot_height=500, x_range=(0,5), y_range=(0,2))
view = CDSView(source=source, filters=[IndexFilter([0, 1])])
p.circle('x', 'y', size=10, color="navy", alpha=0.8,
source=source, view=view)
callback = CustomJS(args=dict(source=source, view=view), code="""
const start = cb_obj.value[0]
const end = cb_obj.value[1]
const indices = []
for (var i=0; i < source.get_length(); i++) {
for (var j=start; j<=end; j++) {
if (source.data["t" + j][i]==1) {
indices.push(i)
break
}
}
}
view.indices = indices
""")
ti_slider = RangeSlider(start=0, end=5, value=(0,1), step=1, title="Time Period")
ti_slider.js_on_change('value', callback)
show(column(ti_slider, p))
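The index computation inside the CustomJS callback is the only non-trivial logic, so it can help to check it outside the browser. Here is a Python mirror of that loop (a sketch for testing the logic, not something Bokeh runs); `present_indices` is a hypothetical name, and the data is the same toy source as in the example.

```python
def present_indices(data, start, end, prefix="t"):
    """Rows whose value is 1 in at least one column t{start}..t{end}.

    Mirrors the nested loop in the CustomJS callback; `end` is inclusive,
    matching `j <= end` in the JavaScript.
    """
    indices = []
    for i in range(len(data["x"])):
        for j in range(start, end + 1):
            if data[f"{prefix}{j}"][i] == 1:
                indices.append(i)
                break  # one match is enough for this row
    return indices


data = dict(
    x=[1, 2, 3, 4], y=[1, 1, 1, 1],
    t0=[1, 1, 0, 0], t1=[0, 1, 0, 0], t2=[0, 1, 1, 0],
    t3=[0, 0, 1, 1], t4=[0, 0, 0, 1], t5=[0, 0, 0, 0],
)
print(present_indices(data, 0, 1))  # [0, 1], matching the initial IndexFilter
print(present_indices(data, 2, 3))  # [1, 2, 3]
```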

Create a stacked graph or bar graph using plotly in python

I have data like this:
[ ('2018-04-09', '10:18:11', ['s1',10], ['s2',15], ['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20], ['s1',10]),
('2018-04-10', '10:30:11', ['s4',10], ['s5',6], ['s6',3]) ]
I would like to plot a stacked graph of this data, preferably with time on the x-axis. I created an image in Paint just to show roughly what it should look like.
The x-axis will show time like a normal graph does (10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
Just to hard-code and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
graph_data = []
l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
trace_temp = go.Bar(
x='2018-04-09 10:18:11',
y=l1[0],
name = 'top',
)
graph_data.append(trace_temp)
plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I am getting is: 'tuple' object is not callable.
So, can someone please suggest some code for plotting such data?
If needed, I can store the data in some other form that may be more helpful for plotting.
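The "'tuple' object is not callable" error comes from the lists l1, l2 and l3 above: a comma is missing between ('com.p2',2) and ('com.p3',3) (and the corresponding pairs in l2 and l3), so Python parses the second tuple as the arguments of a call on the first. A minimal reproduction, with the call written through a variable so the behaviour is explicit:

```python
# Without the comma, ('com.p2', 2)('com.p3', 3) means "call the tuple
# ('com.p2', 2) with the arguments 'com.p3' and 3", which Python rejects.
pair = ('com.p2', 2)
try:
    pair('com.p3', 3)
except TypeError as exc:
    message = str(exc)
print(message)  # 'tuple' object is not callable

# With the commas restored, each list holds three (name, value) tuples.
l1 = [('com.p1', 1), ('com.p2', 2), ('com.p3', 3)]
l2 = [('com.p1', 1), ('com.p4', 2), ('com.p5', 3)]
l3 = [('com.p2', 8), ('com.p3', 2), ('com.p6', 30)]
print(len(l1), len(l2), len(l3))  # 3 3 3
```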
Edit:
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig = df.iplot(kind='bar', barmode='stack', asFigure=True)
plotly.offline.plot(fig, filename="stack1.html")
However, I ran into one problem:
When time intervals are very close, the data overlaps on the graph.
Is there a way to overcome this?
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to create the table of category/value pairs that you have to generate anyway.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
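The melt and pivot steps above produce a table with one row per day/time label and one column per category, with NaN where a category is absent. To see that shape without pandas, here is a stdlib sketch of the same reshaping; `to_stacked_table` is a hypothetical helper and uses None where pandas would use NaN.

```python
def to_stacked_table(records):
    """Reshape (date, time, [cat, val], ...) records for a stacked bar chart.

    Returns (categories, table): table maps a "date + newline + time" label
    to {category: value}, with None for categories missing from a record.
    """
    categories = sorted({cat for rec in records for cat, _ in rec[2:]})
    table = {}
    for date, time, *pairs in records:
        row = dict.fromkeys(categories)  # every category starts as None
        for cat, val in pairs:
            row[cat] = val
        table[f"{date}\n{time}"] = row
    return categories, table


all_data = [('2018-04-09', '10:18:11', ['s1', 10], ['s2', 15], ['s3', 5]),
            ('2018-04-09', '10:20:11', ['s4', 8], ['s2', 20], ['s1', 10]),
            ('2018-04-10', '10:30:11', ['s4', 10], ['s5', 6], ['s6', 3])]
categories, table = to_stacked_table(all_data)
print(categories)  # ['s1', 's2', 's3', 's4', 's5', 's6']
print(table['2018-04-09\n10:18:11'])
# {'s1': 10, 's2': 15, 's3': 5, 's4': None, 's5': None, 's6': None}
```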

Bokeh: chart from pandas dataframe won't update on trigger

I have a pandas DataFrame whose columns I want to show as lines in a plot using a Bokeh server. Additionally, I would like to have a slider for shifting one of the lines relative to the other.
My problem is the update functionality when the slider value changes. I have tried the code from Bokeh's sliders example, but it does not work.
Here is an example:
import pandas as pd
from bokeh.io import vform
from bokeh.plotting import Figure, output_file, show
from bokeh.models import CustomJS, ColumnDataSource, Slider
df = pd.DataFrame([[1,2,3],[3,4,5]])
df = df.transpose()
myindex = list(df.index.values)
mysource = ColumnDataSource(df)
plot = Figure(plot_width=400, plot_height=400)
for i in range(len(mysource.column_names) - 1):
name = mysource.column_names[i]
plot.line(x = myindex, y = str(name), source = mysource)
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1)
def update_data(attrname, old, new):
# Get the current slider values
a = offset.value
temp = df[1].shift(a)
#to finish#
offset.on_change('value', update_data)
layout = vform(offset, plot)
show(layout)
Inside the update_data-function I have to update mysource, but I cannot figure out how to do that. Can anybody point me in the right direction?
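The missing piece in update_data is essentially a shift of one column followed by replacing the source's data. One alternative, staying with the Bokeh-server approach the question describes, is to build a complete new dict inside the Python callback and assign it to `mysource.data`. The shifting itself can be sketched without pandas; `shift` and `shifted_data` are hypothetical helpers that imitate `Series.shift` on plain lists. The slider delivers floats such as 1.0, so the sketch converts to int.

```python
def shift(values, periods, fill=None):
    """Shift a list like pandas Series.shift: positive moves values later.

    Positions vacated at the start (periods > 0) or at the end
    (periods < 0) are filled with `fill`.
    """
    n = len(values)
    periods = int(periods)  # slider values arrive as floats, e.g. 1.0
    if periods >= 0:
        k = min(periods, n)
        return [fill] * k + list(values[:n - k])
    k = min(-periods, n)
    return list(values[k:]) + [fill] * k


def shifted_data(columns, name, periods, fill=None):
    """Return a new column dict in which only `name` is shifted."""
    data = {key: list(vals) for key, vals in columns.items()}
    data[name] = shift(columns[name], periods, fill)
    return data


columns = {"x": [0, 1, 2], "a": [1, 2, 3], "b": [3, 4, 5]}
print(shifted_data(columns, "b", 1.0))
# {'x': [0, 1, 2], 'a': [1, 2, 3], 'b': [None, 3, 4]}
```

For a line glyph, passing `float('nan')` as the fill (rather than None) should leave a gap where the shifted-out values were, since Bokeh breaks lines at NaN values.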
Give this a try: change a=offset.value to a=cb_obj.get('value').
Then put source.trigger('change') after you do whatever it is you are trying to do in that update_data function, instead of using offset.on_change('value', update_data).
Also change offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1, callback=CustomJS.from_py_func(offset))
Note that this format I'm using only works with flexx installed (https://github.com/zoofio/flexx). If you have Python 3.5, you'll have to download the zip file, extract it, and run python setup.py install, as a compiled version isn't posted yet for this version.
