Bokeh: chart from pandas dataframe won't update on trigger - python

I have got a pandas dataframe whose columns I want to show as lines in a plot using a Bokeh server. Additionally, I would like to have a slider for shifting one of the lines against the other.
My problem is the update functionality when the slider value changes. I have tried the code from the sliders-example of bokeh, but it does not work.
Here is an example:
import pandas as pd
from bokeh.io import vform
from bokeh.plotting import Figure, output_file, show
from bokeh.models import CustomJS, ColumnDataSource, Slider
df = pd.DataFrame([[1,2,3],[3,4,5]])
df = df.transpose()
myindex = list(df.index.values)
mysource = ColumnDataSource(df)
plot = Figure(plot_width=400, plot_height=400)
for i in range(len(mysource.column_names) - 1):
    name = mysource.column_names[i]
    plot.line(x = myindex, y = str(name), source = mysource)
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1)
def update_data(attrname, old, new):
    # Get the current slider values
    a = offset.value
    temp = df[1].shift(a)
    #to finish#
offset.on_change('value', update_data)
layout = vform(offset, plot)
show(layout)
Inside the update_data-function I have to update mysource, but I cannot figure out how to do that. Can anybody point me in the right direction?

Give this a try: change a = offset.value to a = cb_obj.get('value').
Then, instead of calling offset.on_change('value', update_data), put source.trigger('change') at the end of update_data, after you have done whatever you are trying to do with the data.
Also change the slider definition to offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1, callback=CustomJS.from_py_func(update_data))
Note that the format I'm using works with flexx installed (https://github.com/zoofio/flexx). If you have Python 3.5, you'll have to download the zip file, extract it, and run python setup.py install, because a compiled package for this version has not been published yet.
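For a Bokeh server app, the step the question leaves unfinished could be sketched as below. This is a minimal sketch and not the CustomJS approach from the answer: it assumes the callback runs in Python, and shift_values and make_update are hypothetical helper names. The callback only needs an object with a data attribute and an object with a value attribute, which is what a ColumnDataSource and a Slider provide.

```python
import math


def shift_values(values, periods, fill=math.nan):
    """Shift a sequence the way pandas' Series.shift does: pad with `fill`, keep the length."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill] * n
    if periods > 0:
        return [fill] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill] * (-periods)


def make_update(source, slider, base, column):
    """Build a callback with the (attr, old, new) signature that on_change expects."""
    def update(attr, old, new):
        periods = int(slider.value)      # the slider reports a float
        new_data = dict(base)            # start from the unshifted columns
        new_data[column] = shift_values(base[column], periods)
        source.data = new_data           # reassign all columns in one step
    return update


# Hypothetical wiring inside a Bokeh server script (names from the question):
#   base = dict(mysource.data)
#   offset.on_change("value", make_update(mysource, offset, base, "1"))
```

Keeping an unshifted copy in base means each slider move shifts the original values rather than the already shifted ones.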

Related

How to remove points from a dataframe based on a selected area on a plot

I have some experimental data that is often flawed with artifacts exemplified with something like this:
I need a quick way to manually select these random spikes and remove them from datasets.
I figured that any plotting library with a focus on interactive plots should have an easy way to do this but so far I keep struggling with finding a simple way to do what I want.
I'm a Matplotlib/Seaborn guy and this calls for interactive solution. I briefly checked Plotly, Bokeh and Altair and decided to go with the first one. My first attempt looks like this:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interactive, HBox, VBox, Button
url='https://drive.google.com/file/d/1hCX8Bn_y30aXVN_TyHTTx015u44pO9yB/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_csv(url, index_col=0)
f = go.FigureWidget()
for col in df.columns[-1:]:
    f.add_scatter(x = df.index, y=df[col], mode='markers+lines',
                  selected_marker=dict(size=5, color='red'),
                  marker=dict(size=1, color='lightgrey', line=dict(width=1, color='lightgrey')))
t = go.FigureWidget([go.Table(
    header=dict(values=['selector range'],
                fill = dict(color='#C2D4FF'),
                align = ['left'] * 5),
    cells=dict(values=['None selected' for col in ['ID']],
               fill = dict(color='#F5F8FF'),
               align = ['left'] * 5)
)])
def selection_fn(trace,points,selector):
    t.data[0].cells.values = [selector.xrange]
def update_axes(dataset):
    scatter = f.data[0]
    scatter.x = df.index
    scatter.y = df[dataset]
f.data[0].on_selection(selection_fn)
axis_dropdowns = interactive(update_axes, dataset = df.columns)
button1 = Button(description="Remove points")
button2 = Button(description="Reset")
button3 = Button(description="Fit data")
VBox((HBox((axis_dropdowns.children)), HBox((button1, button2, button3)), f,t))
Which gives:
So I managed to get Selector Box x coordinates (and temporarily print them inside the table widget). But what I couldn't figure out is how to easily bind a function to button1 that would take as an argument Box Selector coordinates and remove selected points from a dataframe and replot the data. So something like this:
def on_button_click_remove(scatter.selector.xrange):
    mask = (df.index >= scatter.selector.xrange[0]) & (df.index <= scatter.selector.xrange[1])
    clean_df = df.drop(df.index[mask])
    scatter(data = clean_df...) #update scatter plot
button1 = Button(description="Remove points", on_click = on_button_click_remove)
I checked https://plotly.com/python/custom-buttons/ but I am still not sure how to use it for my purpose.
I suggest using HoloViews and Panel.
They are high-level visualization tools that facilitate the creation and control of low-level Bokeh, Matplotlib or Plotly figures.
Here is an example:
import panel as pn
import holoviews as hv
import pandas as pd
from bokeh.models import ColumnDataSource
# This example uses Bokeh as the backend.
# You can try Plotly or Matplotlib with minor modifications to the code below.
# For example, you can use the on_selection callback from Plotly
# https://plotly.com/python/v3/selection-events/
hv.extension('bokeh')
display( pn.extension( ) ) # activate panel
df=pd.read_csv('spiked_data.csv',index_col=0).reset_index()
pt = hv.Points(
    data=df, kdims=['index', 'A' ]
).options( marker='x', size=2,
    tools=['hover', 'box_select', 'lasso_select', 'reset'],
    height=250, width=600
)
fig = hv.render(pt)
source = fig.select({'type':ColumnDataSource})
bt = pn.widgets.Button(name='remove selected')
def rm_sel(evt):
    i = df.iloc[source.selected.indices].index # get index to delete
    df.drop(i, inplace=True, errors='ignore') # modify dataframe
    source.data = df # update data source
    source.selected.indices=[] # clear selection
    pn.io.push_notebook(app) # update figure
bt.on_click(rm_sel)
app=pn.Column(fig,'Click to delete the selected points', bt)
display(app)
A related example can be found in this SO answer
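The masking step from the question (drop every point whose x value falls inside the selected range) does not depend on the plotting library. Here is a minimal sketch on plain lists; drop_x_range is a hypothetical helper name and the data is made up for illustration.

```python
def drop_x_range(xs, ys, x_min, x_max):
    """Return copies of xs and ys without the points whose x lies in [x_min, x_max]."""
    kept = [(x, y) for x, y in zip(xs, ys) if not (x_min <= x <= x_max)]
    new_xs = [x for x, _ in kept]
    new_ys = [y for _, y in kept]
    return new_xs, new_ys


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.1, 9.7, 9.9, 1.2, 1.0]   # made-up data with a spike at x = 2..3
clean_xs, clean_ys = drop_x_range(xs, ys, 2, 3)
```

In the Plotly version from the question, the selection callback could store selector.xrange in a variable, and a handler registered with button1.on_click(...) could then apply this filter and assign the result back to scatter.x and scatter.y, in the same way update_axes does.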

How to retrieve coordinates of PointDrawTool in Bokeh?

I'm trying to get xy coordinates of points drawn by the user. I want to have them as a dictionary, a list or a pandas DataFrame.
I'm using Bokeh 2.0.2 in Jupyter. There'll be a background image (which is not the focus of this post) and on top, the user will create points that I could use further.
Below is where I've managed to get to (with some dummy data). And I've commented some lines which I believe are the direction in which I'd have to go. But I don't seem to get the grasp of it.
from bokeh.plotting import figure, show, Column, output_notebook
from bokeh.models import PointDrawTool, ColumnDataSource, TableColumn, DataTable
output_notebook()
my_tools = ["pan, wheel_zoom, box_zoom, reset"]
#create the figure object
p = figure(title= "my_title", match_aspect=True,
           toolbar_location = 'above', tools = my_tools)
seeds = ColumnDataSource({'x': [2,14,8], 'y': [-1,5,7]}) #dummy data
renderer = p.scatter(x='x', y='y', source = seeds, color='red', size=10)
columns = [TableColumn(field="x", title="x"),
           TableColumn(field="y", title="y")]
table = DataTable(source=seeds, columns=columns, editable=True, height=100)
#callback = CustomJS(args=dict(source=seeds), code="""
# var data = source.data;
# var x = data['x']
# var y = data['y']
# source.change.emit();
#""")
#
#seeds.x.js_on_change('change:x', callback)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
show(Column(p, table))
From the documentation at https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#pointdrawtool:
The tool will automatically modify the columns on the data source corresponding to the x and y values of the glyph. Any additional columns in the data source will be padded with the declared empty_value, when adding a new point. Any newly added points will be inserted on the ColumnDataSource of the first supplied renderer.
So, just check the corresponding data source, seeds in your case.
The only issue here is if you want to know exactly what point has been changed or added. In this case, the simplest solution would be to create a custom subclass of PointDrawTool that does just that. Alternatively, you can create an additional "original" data source and compare seeds to it each time it's updated.
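The "compare against an original copy" idea can be sketched without any Bokeh code, because a ColumnDataSource's data behaves like a dict of columns. This is only an illustration; new_points is a hypothetical helper name and the fourth point is invented.

```python
def new_points(original, current):
    """Return the (x, y) pairs present in `current` but not in `original`."""
    before = list(zip(original["x"], original["y"]))
    added = []
    for point in zip(current["x"], current["y"]):
        if point in before:
            before.remove(point)   # match each original point at most once
        else:
            added.append(point)
    return added


original = {"x": [2, 14, 8], "y": [-1, 5, 7]}            # the question's dummy data
current = {"x": [2, 14, 8, 5.5], "y": [-1, 5, 7, 3.0]}   # after one hypothetical click
added = new_points(original, current)
```

Note that a point dragged to a new position also shows up as "added" in this sketch, since its coordinates no longer match any original point.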
The problem is that you want to work with the data in Python, but show(p) creates a static version of the plot. Here is a simple example that fixes it. I removed the table and such to make it a bit cleaner, but it will also work with them:
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import PointDrawTool
output_notebook()
#create the figure object
p = figure(width=400,height=400)
renderer = p.scatter(x=[0,1], y=[1,2],color='red', size=10)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
# This part is important
def app(doc):
    global p
    doc.add_root(p)
show(app) #<-- show app and not p!
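Once the plot runs as an app, the drawn coordinates can be collected on the Python side. The sketch below assumes that edits made with the tool arrive as changes to the data property of the renderer's data source; to_records and on_data_change are hypothetical names, and the final call only simulates an edit.

```python
def to_records(data):
    """Turn column-oriented data {'x': [...], 'y': [...]} into a list of row dicts."""
    columns = list(data)
    return [dict(zip(columns, row)) for row in zip(*(data[c] for c in columns))]


points = []  # always holds the latest coordinates


def on_data_change(attr, old, new):
    """Callback with the (attr, old, new) signature used by on_change."""
    points[:] = to_records(new)


# Hypothetical wiring inside app(doc), before doc.add_root(p):
#   renderer.data_source.on_change("data", on_data_change)

on_data_change("data", {}, {"x": [0, 1, 0.5], "y": [1, 2, 1.5]})  # simulated edit
```

A list of row dicts like points can be passed straight to pandas.DataFrame if a DataFrame is preferred.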

Why my bokeh server app won't update the figure

Here's my data: https://drive.google.com/drive/folders/1CabmdDQucaKW2XhBxQlXVNOSiNRtkMm-?usp=sharing
I want to use the Select to choose the stock to show,
the Sliders to choose the year range to show,
and the CheckboxGroup to choose the indices to compare it with.
The problem is that when I adjust a slider, the figure updates, but when I use the Select or the CheckboxGroup, the figure doesn't update.
What's the reason?
from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, Slider, TextInput , Select , Div, CheckboxGroup
from bokeh.plotting import figure
import pandas as pd
import numpy as np
price=pd.read_excel('price.xlsx',index_col=0)
# input control
stock = Select(title='Stock',value='AAPL',options=[x for x in list(price.columns) if x not in ['S&P','DOW']])
yr_1 = Slider(title='Start year',value=2015,start=2000,end=2020,step=1)
yr_2 = Slider(title='End year',value=2020,start=2000,end=2020,step=1)
index = CheckboxGroup(labels=['S&P','DOW'],active=[0,1])
def get_data():
    compare_index = [index.labels[i] for i in index.active]
    stocks = stock.value
    start_year = str(yr_1.value)
    end_year = str(yr_2.value)
    select_list = []
    select_list.append(stocks)
    select_list.extend(compare_index)
    selected = price[select_list]
    selected = selected [start_year:end_year]
    for col in selected.columns:
        selected[col]=selected[col]/selected[col].dropna()[0]
    return ColumnDataSource(selected)
def make_plot(source):
    fig=figure(plot_height=600, plot_width=700, title="",sizing_mode="scale_both", x_axis_type="datetime")
    data_columns = list(source.data.keys())
    for data in data_columns[1:]:
        fig.line(x=data_columns[0],y=data,source=source,line_width=3, line_alpha=0.6, legend_label=data)
    return fig
def update(attrname, old, new):
    new_source = get_data()
    source.data.clear()
    source.data.update(new_source.data)
#get the initial value and plot
source = get_data()
plot = make_plot(source)
#control_update
stock.on_change('value', update)
yr_1.on_change('value', update)
yr_2.on_change('value', update)
index.on_change('active', update)
# Set up layouts and add to document
inputs = column(stock, yr_1, yr_2, index)
curdoc().add_root(row(inputs, plot, width=800))
curdoc().title = "Stocks"
You're creating a new ColumnDataSource for new data. That's not a good approach.
Instead, create it once and then just assign its data as appropriate.
In your case, I would do it like this:
Create the ColumnDataSource just once, as described above.
Do not use .update on the CDS; just reassign .data.
Create the legend manually.
For the one line that depends on the Select value, choose a static field name and use it everywhere instead of the stock's name.
When the Select value changes, change the first legend item's label so that it shows the correct stock name instead of that static field name.
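The steps above can be sketched as follows. This is a minimal sketch under stated assumptions: prices is a plain dict of columns that has already been cut to the chosen year range, "stock" is the static field name picked for illustration, and normalize, build_data and update are hypothetical helpers. The update function only needs an object with a data attribute, such as an existing ColumnDataSource.

```python
def normalize(values):
    """Divide a series by its first non-missing value so it starts at 1.0."""
    base = next((v for v in values if v is not None), None)
    if base is None or base == 0:
        return list(values)
    return [None if v is None else v / base for v in values]


def build_data(prices, dates, stock, indices):
    """Build the column dict; the selected stock always lands in the 'stock' column."""
    data = {"date": list(dates), "stock": normalize(prices[stock])}
    for name in indices:
        data[name] = normalize(prices[name])
    return data


def update(source, prices, dates, stock, indices):
    """Reassign .data on the existing source instead of creating a new one."""
    source.data = build_data(prices, dates, stock, indices)
```

Because the glyph for the selected stock always reads the "stock" column, changing the Select value only swaps the numbers behind that column; the line itself does not have to be recreated.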

Multiple HoverTools for different lines (bokeh)

I have more than one line on a bokeh plot, and I want the HoverTool to show the value for each line, but using the method from a previous stackoverflow answer isn't working:
https://stackoverflow.com/a/27549243/3087409
Here's the relevant code snippet from that answer:
fig = bp.figure(tools="reset,hover")
s1 = fig.scatter(x=x,y=y1,color='#0000ff',size=10,legend='sine')
s1.select(dict(type=HoverTool)).tooltips = {"x":"$x", "y":"$y"}
s2 = fig.scatter(x=x,y=y2,color='#ff0000',size=10,legend='cosine')
fig.select(dict(type=HoverTool)).tooltips = {"x":"$x", "y":"$y"}
And here's my code:
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
source = ColumnDataSource(data=dict(
    x = [list of datetimes],
    wind = [some list],
    coal = [some other list],
))
hover = HoverTool(mode = "vline")
plot = figure(tools=[hover], toolbar_location=None,
              x_axis_type='datetime')
plot.line('x', 'wind')
plot.select(dict(type=HoverTool)).tooltips = {"y":"@wind"}
plot.line('x', 'coal')
plot.select(dict(type=HoverTool)).tooltips = {"y":"@coal"}
As far as I can tell, it's equivalent to the code in the answer I linked to, but when I hover over the figure, both hover tools boxes show the same value, that of the wind.
You need to create a separate HoverTool for each line and pass that line's renderer to it. Also, do not use the same label for both values; give each tooltip its own name.
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show
source = ColumnDataSource(data=df)
plot = figure(x_axis_type='datetime',plot_width=800, plot_height=300)
plot1 =plot.line(x='x',y= 'wind',source=source,color='blue')
plot.add_tools(HoverTool(renderers=[plot1], tooltips=[('wind',"@wind")],mode='vline'))
plot2 = plot.line(x='x',y= 'coal',source=source,color='red')
plot.add_tools(HoverTool(renderers=[plot2], tooltips=[("coal","@coal")],mode='vline'))
show(plot)
The output looks like this.
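With more than two lines, the one-HoverTool-per-renderer pattern can be wrapped in a small helper. This is a sketch, not part of the answer: tooltip_for and add_line_with_hover are hypothetical names, and it relies on Bokeh's @ prefix for referring to a data-source column in a tooltip (the braces allow column names that contain spaces).

```python
def tooltip_for(column):
    """Tooltip spec that shows one data-source column, e.g. [('wind', '@{wind}')]."""
    return [(column, f"@{{{column}}}")]


def add_line_with_hover(plot, source, column, color):
    """Draw one line and attach a HoverTool restricted to that line's renderer."""
    from bokeh.models import HoverTool  # imported here so tooltip_for works without Bokeh

    renderer = plot.line(x="x", y=column, source=source, color=color)
    plot.add_tools(HoverTool(renderers=[renderer], tooltips=tooltip_for(column), mode="vline"))
    return renderer


# Hypothetical usage, with `plot` and `source` built as in the answer:
#   add_line_with_hover(plot, source, "wind", "blue")
#   add_line_with_hover(plot, source, "coal", "red")
```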

Display only part of Y-axis on Bokeh

Using Bokeh 0.8.1, how can I display a long time series but start 'zoomed in' on one part, while keeping the rest of the data available for scrolling?
For instance, considering the following time series (IBM stock price since 1980), how could I get my chart to initially display only the price since 01/01/2014?
Example code :
import pandas as pd
import bokeh.plotting as bk
from bokeh.models import ColumnDataSource
bk.output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,save"
# Quandl data, too lazy to generate some random data
df = pd.read_csv('https://www.quandl.com/api/v1/datasets/GOOG/NYSE_IBM.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
#Generating a bokeh source
source = ColumnDataSource()
dtest = {}
for col in df:
    dtest[col] = df[col]
source = ColumnDataSource(data=dtest)
# plotting stuff !
p = bk.figure(title='title', tools=TOOLS,x_axis_type="datetime", plot_width=600, plot_height=300)
p.line(y='Close', x='Date', source=source)
bk.show(p)
Output:
But I want to get this (which you can achieve with the box-zoom tool, but I'd like to start like this immediately):
So, it looks (as of 0.8.1) that we need to add some more convenient ways to set ranges with datetime values. That said, although this is a bit ugly, it does currently work for me:
import time, datetime
x_range = (
    time.mktime(datetime.datetime(2014, 1, 1).timetuple())*1000,
    time.mktime(datetime.datetime(2016, 1, 1).timetuple())*1000
)
p = bk.figure(
    title='title', tools=TOOLS,x_axis_type="datetime",
    plot_width=600, plot_height=300, x_range=x_range
)
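One caveat: time.mktime interprets the datetime in the machine's local time zone, so the computed range can be off by the local UTC offset. A variant that reads a naive datetime as UTC could look like the sketch below; to_epoch_ms is a hypothetical helper, and whether UTC is the right choice depends on how the timestamps in the data were produced.

```python
import calendar
from datetime import datetime


def to_epoch_ms(moment):
    """Milliseconds since the Unix epoch, reading a naive datetime as UTC."""
    return calendar.timegm(moment.timetuple()) * 1000


x_range = (to_epoch_ms(datetime(2014, 1, 1)), to_epoch_ms(datetime(2016, 1, 1)))
```

The resulting tuple can be passed as x_range in the same way as in the code above.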
