Remove old data (flush buffer) from Bokeh Plot fed with AjaxDataSource? - python

Is it possible to remove old data that is streamed in append mode when using AjaxDataSource?
Here's the code I'm running:
import numpy as np
from flask import Flask, jsonify, make_response, request
from bokeh.plotting import figure, show
from bokeh.models import AjaxDataSource, CustomJS
# Bokeh related code
adapter = CustomJS(code="""
    const result = {x: [], y: []}
    const pts = cb_data.response.points
    for (let i=0; i<pts.length; i++) {
        result.x.push(pts[i][0])
        result.y.push(pts[i][1])
    }
    return result
""")
source = AjaxDataSource(data_url='http://localhost:5050/data',
                        polling_interval=200, adapter=adapter, mode='append', max_size=20)
p = figure(plot_height=300, plot_width=800, background_fill_color="lightgrey",
           title="Streaming Noisy sin(x) via Ajax")
p.circle('x', 'y', source=source)
p.x_range.follow = "end"
p.x_range.follow_interval = 10
# Flask related code
app = Flask(__name__)
def crossdomain(f):
    def wrapped_function(*args, **kwargs):
        resp = make_response(f(*args, **kwargs))
        h = resp.headers
        h['Access-Control-Allow-Origin'] = '*'
        h['Access-Control-Allow-Methods'] = "GET, OPTIONS, POST"
        h['Access-Control-Max-Age'] = str(21600)
        requested_headers = request.headers.get('Access-Control-Request-Headers')
        if requested_headers:
            h['Access-Control-Allow-Headers'] = requested_headers
        return resp
    return wrapped_function
x = list(np.arange(0, 6, 0.1))
y = list(np.sin(x) + np.random.random(len(x)))
@app.route('/data', methods=['GET', 'OPTIONS', 'POST'])
@crossdomain
def data():
    x.append(x[-1]+0.1)
    y.append(np.sin(x[-1])+np.random.random())
    return jsonify(points=list(zip(x,y)))
# show and run
show(p)
app.run(port=5050)
This is modified slightly from the example here: https://github.com/bokeh/bokeh/blob/1.3.2/examples/howto/ajax_source.py
I simply added mode='append', max_size=20 on the line that creates the AjaxDataSource.
My issue is that I want to flush/reset the AjaxDataSource buffer occasionally. My use case is plotting data over time. If I pause the data generation, I can pause the AjaxDataSource without a problem, but when I resume plotting, the graph jumps to the latest data point and keeps all of the old data. What I want instead is for the old data to be flushed from the AjaxDataSource's internal buffer. I tried to use the ResetTool for this, but after much reflection and soul-searching I don't believe that is its intended use. So the question remains: how can I flush the internal buffer of AjaxDataSource?
Side note: the other option for AjaxDataSource is non-append mode, in which case the entire graph is replaced with the new dataset. I don't want to use this mode, as there is a lot of data and append mode is MUCH more efficient from a CPU/memory perspective.
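To make the terms in the question concrete, here is a minimal plain-Python model of what an append-mode buffer with a max_size does and what "flushing" it would mean. This is only a sketch: RollingColumns and its methods are hypothetical names, and it does not reproduce Bokeh's implementation. In the browser, the counterpart of flush() would presumably be a CustomJS callback that assigns empty columns to source.data, but that is an assumption and is not tested here.

```python
from collections import deque


class RollingColumns:
    """Toy model of an append-mode source: at most max_size points per column."""

    def __init__(self, names, max_size):
        self.columns = {name: deque(maxlen=max_size) for name in names}

    def append(self, new_data):
        # Extend each column; the oldest points roll off once max_size is hit.
        for name, values in new_data.items():
            self.columns[name].extend(values)

    def flush(self):
        # Drop everything buffered so far; later appends start from empty.
        for column in self.columns.values():
            column.clear()

    def as_lists(self):
        return {name: list(col) for name, col in self.columns.items()}


buf = RollingColumns(["x", "y"], max_size=3)
buf.append({"x": [0, 1, 2, 3], "y": [10, 11, 12, 13]})
print(buf.as_lists())   # {'x': [1, 2, 3], 'y': [11, 12, 13]}

buf.flush()             # e.g. when plotting resumes after a pause
buf.append({"x": [4], "y": [14]})
print(buf.as_lists())   # {'x': [4], 'y': [14]}
```

After the flush, only points received after the pause remain, which is the behaviour the question is asking for.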

Related

Return Figure in FastAPI

I'm trying to return a matplotlib.figure.Figure in FastAPI.
If I save it as an image, it works (code here):
@router.get("/graph/{id_file}", name="Return the graph obtained")
async def create_graph(id_file: str):
    data = HAR.createGraph(id_file)
    graph = HAR.scatterplot(data['dateTimes'], data['label'], "Time", "Activity")
    graph.savefig('saved_figure.jpg')
    return FileResponse('saved_figure.jpg')
Where graph is my Figure.
But I would like to show it without saving it on my computer.
savefig can accept a binary file-like object, which can be used to achieve what you want.
The code could be:
from io import BytesIO
from starlette.responses import StreamingResponse
...
@router.get("/graph/{id_file}", name="Return the graph obtained")
async def create_graph(id_file: str):
    data = HAR.createGraph(id_file)
    graph = HAR.scatterplot(data['dateTimes'], data['label'], "Time", "Activity")
    # create a buffer to store image data
    buf = BytesIO()
    graph.savefig(buf, format="png")
    # rewind so the response streams the image from its first byte
    buf.seek(0)
    return StreamingResponse(buf, media_type="image/png")
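The buf.seek(0) call in the answer is easy to overlook, so here is a small standard-library sketch of why it matters. The byte string is a made-up stand-in for what savefig would write; nothing here touches matplotlib or FastAPI.

```python
from io import BytesIO

buf = BytesIO()
buf.write(b"fake image bytes")   # stand-in for graph.savefig(buf, format="png")

# After writing, the position sits at the end of the buffer,
# so a reader starting from here gets nothing.
print(buf.read())                # b''

buf.seek(0)                      # rewind to the start, as the answer does
print(buf.read())                # b'fake image bytes'
```

Without the rewind, a response that reads from the buffer would send an empty body.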

How can I cache my SQL result so I don't have to call SQL repeatedly to get data for Dash plots?

I am trying to build a dashboard that will generate several plots based on a single SQL data query. I want the query to be modifiable via the dashboard (e.g. to query a different order amount or similar), and then change all plots at once. The query may be expensive, so I don't want it to run N times for N different plots.
I have tried to do this using the flask cache decorator @cache.memoize(), similar to the example given in the docs: https://dash.plotly.com/performance
Here is a stripped-back version of what I'm doing. I can tell that the query_data function is not doing what I intend because:
1. The resulting graphs show different data points on the x-axis. If they were using the same cached dataset, the data points in x should be the same.
2. The print statements in the query_data function come out twice every time I change an input cell.
Can anyone explain why this isn't working or how I can achieve what I want?
import sys
import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from dash.dependencies import Input, Output
from setup_redshift import setup_connection
from flask_caching import Cache
from datetime import datetime
import pandas as pd
conn = setup_connection()
app = dash.Dash(__name__)
cache = Cache(app.server, config={
    # 'CACHE_TYPE': 'filesystem',
    'CACHE_TYPE': 'memcached',
    'CACHE_DIR': 'cache-directory'
})
sql_query = '''select i.order_amount_in_usd, r.calibrated_score, r.score
from datalake.investigations i
inner join datalagoon.prod_model_decision r
ON i.investigation_id = r.investigation_id
where i.team_id = {}
AND i.order_amount_in_usd < {}
AND r.calibrated_score >= 0
order by RANDOM()
limit 1000'''
@cache.memoize()
def query_data(team_id, max_usd):
    print("Calling data query now with team_id={} and max_usd={} at time {}".format(team_id, max_usd, datetime.now()))
    _sql = sql_query.format(team_id, max_usd)
    print(_sql)
    data = pd.read_sql(sql_query.format(team_id, max_usd), conn)
    print("data is {} rows ".format(len(data)))
    print("data max usd is {}".format(data['order_amount_in_usd'].max()))
    return data
@app.callback(Output(component_id='output-graph', component_property='figure'),
              [Input(component_id='data-select-team-id', component_property='value'),
               Input(component_id='data-select-max-usd', component_property='value')])
def plot_data(team_id, max_usd):
    print("calling query_data at from graph at {}".format(datetime.now()))
    in_data = query_data(team_id, max_usd)
    print("going to make graph1 now at {}".format(datetime.now()))
    fig = px.scatter(in_data,
                     x='order_amount_in_usd',
                     y='calibrated_score')
    return fig
@app.callback(Output(component_id='output-graph2', component_property='figure'),
              [Input(component_id='data-select-team-id', component_property='value'),
               Input(component_id='data-select-max-usd', component_property='value')])
def plot_second_data(team_id, max_usd):
    print("calling query_data at from graph2 at {}".format(datetime.now()))
    in_data = query_data(team_id, max_usd)
    print("going to make graph2 now at {}".format(datetime.now()))
    fig = px.scatter(in_data,
                     x='order_amount_in_usd',
                     y='score')
    return fig
app.layout = html.Div(  # style={'backgroundColor': colors['background']},
    children=[dcc.Input(id='data-select-team-id',
                        value=7625,
                        placeholder='Input Team ID',
                        type='number',
                        min=0,
                        max=1_000_000_000,
                        debounce=True
                        ),
              dcc.Input(id='data-select-max-usd',
                        value=5000,
                        type='number',
                        debounce=True),
              dcc.Graph(id='output-graph'),
              dcc.Graph(id='output-graph2')]
)
if __name__ == '__main__':
    app.run_server(debug=True)
In the past I've stored the results using dcc.Store (see here).
You could structure your app like this:
1. Run the SQL query and store the results using dcc.Store (local or memory, depending on your use case). This only runs once (per app load, interval timer, user button refresh, etc.).
2. Callbacks that generate different cuts of the data in Dash tables or charts then load the store.
If the results of the query are large (see 'Storage Limitations' in the above link), then you should save the results to a local flat file such as JSON or CSV and read that each time.
An alternative is to use PostgreSQL and materialized views to make the SQL query cheap (with a trade-off on storage space).
These approaches make the Dash app appear very responsive to the user while allowing the analysis of large data.
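The store-based structure described above can be sketched without Dash at all. In this minimal, standard-library-only illustration, run_query is a hypothetical stand-in for the expensive SQL call, fill_store plays the role of the single callback that would write to a dcc.Store, and points_for_plot plays the role of each plot callback that only reads from it. All three names are assumptions made for the example.

```python
import json
import random

QUERY_CALLS = 0  # counts how often the stand-in query actually runs


def run_query(team_id, max_usd):
    """Hypothetical stand-in for the expensive SQL query."""
    global QUERY_CALLS
    QUERY_CALLS += 1
    return [{"order_amount_in_usd": random.uniform(0, max_usd),
             "calibrated_score": random.random(),
             "score": random.random()} for _ in range(5)]


def fill_store(team_id, max_usd):
    # One callback runs the query and returns JSON; in Dash this value
    # would become the data held by a dcc.Store component.
    return json.dumps(run_query(team_id, max_usd))


def points_for_plot(store_json, y_column):
    # Each plot callback takes the stored JSON as input and only reshapes it.
    rows = json.loads(store_json)
    return [(row["order_amount_in_usd"], row[y_column]) for row in rows]


store = fill_store(7625, 5000)
graph1 = points_for_plot(store, "calibrated_score")
graph2 = points_for_plot(store, "score")

print(QUERY_CALLS)                                        # 1
print([x for x, _ in graph1] == [x for x, _ in graph2])   # True
```

Because both plots read the same stored result, they share identical x values and the query runs once per input change, however many plots there are.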

plotting multiple lines of streaming data in a bokeh server application

I'm trying to build a bokeh application with streaming data that tracks multiple "strategies" as they are generated in a prisoners-dilemma agent based model. I've run into a problem trying to get my line plots NOT to connect all the data points in one line. I put together this little demo script that replicates the issue. I've read lots of documentation on line and multi_line rendering in bokeh plots, but I just haven't found something that seems to match my simple case. You can run this code & it will automatically open a bokeh server at localhost:5004 ...
from bokeh.server.server import Server
from bokeh.application import Application
from bokeh.application.handlers.function import FunctionHandler
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import Button
from bokeh.layouts import column
import random
def make_document(doc):
    # Create a data source
    data_source = ColumnDataSource({'step': [], 'strategy': [], 'ncount': []})
    # make a list of groups
    strategies = ['DD', 'DC', 'CD', 'CCDD']
    # Create a figure
    fig = figure(title='Streaming Line Plot',
                 plot_width=800, plot_height=400)
    fig.line(x='step', y='ncount', source=data_source)
    global step
    step = 0
    def button1_run():
        global callback_obj
        if button1.label == 'Run':
            button1.label = 'Stop'
            button1.button_type='danger'
            callback_obj = doc.add_periodic_callback(button2_step, 100)
        else:
            button1.label = 'Run'
            button1.button_type = 'success'
            doc.remove_periodic_callback(callback_obj)
    def button2_step():
        global step
        step = step+1
        for i in range(len(strategies)):
            new = {'step': [step],
                   'strategy': [strategies[i]],
                   'ncount': [random.choice(range(1,100))]}
            fig.line(x='step', y='ncount', source=new)
            data_source.stream(new)
    # add on_click callback for button widget
    button1 = Button(label="Run", button_type='success', width=390)
    button1.on_click(button1_run)
    button2 = Button(label="Step", button_type='primary', width=390)
    button2.on_click(button2_step)
    doc.add_root(column(fig, button1, button2))
    doc.title = "Now with live updating!"
apps = {'/': Application(FunctionHandler(make_document))}
server = Server(apps, port=5004)
server.start()
if __name__ == '__main__':
    server.io_loop.add_callback(server.show, "/")
    server.io_loop.start()
My hope was that by looping thru the 4 "strategies" in the example (after clicking button2), I could stream the new data coming out of the simulation into a line plot for that one strategy and step only. But what I get is one line with all four values connected vertically, then one of them connected to the first one at the next step. Here's what it looks like after a few steps:
I noticed that if I move data_source.stream(new) out of the for loop, I get a nice single line plot, but of course it is only for the last strategy coming out of the loop.
In all the bokeh multiple line plotting examples I've studied (not the multi_line glyph, which I can't figure out and which seems to have some issues with the Hover tool), the instructions seem pretty clear: if you want to render a second line, you add another fig.line renderer to an existing figure, and it draws a line with the data provided in source=data_source for this line. But even though my for-loop collects and adds data separately for each strategy, I don't get 4 line plots, I get only one.
Hoping I'm missing something obvious! Thanks in advance.
Seems like you need a line per strategy, not a line per step. If so, here's how I would do it:
import random
from bokeh.application import Application
from bokeh.application.handlers.function import FunctionHandler
from bokeh.layouts import column
from bokeh.models import Button
from bokeh.palettes import Dark2
from bokeh.plotting import figure, ColumnDataSource
from bokeh.server.server import Server
STRATEGIES = ['DD', 'DC', 'CD', 'CCDD']
def make_document(doc):
    step = 0
    def new_step_data():
        nonlocal step
        result = [dict(step=[step],
                       ncount=[random.choice(range(1, 100))])
                  for _ in STRATEGIES]
        step += 1
        return result
    fig = figure(title='Streaming Line Plot', plot_width=800, plot_height=400)
    sources = []
    for s, d, c in zip(STRATEGIES, new_step_data(), Dark2[4]):
        # Generate the very first step right away
        # to avoid having a completely empty plot.
        ds = ColumnDataSource(d)
        sources.append(ds)
        fig.line(x='step', y='ncount', source=ds, color=c)
    callback_obj = None
    def button1_run():
        nonlocal callback_obj
        if callback_obj is None:
            button1.label = 'Stop'
            button1.button_type = 'danger'
            callback_obj = doc.add_periodic_callback(button2_step, 100)
        else:
            button1.label = 'Run'
            button1.button_type = 'success'
            doc.remove_periodic_callback(callback_obj)
            # reset so that the next click starts a new periodic callback
            callback_obj = None
    def button2_step():
        for src, data in zip(sources, new_step_data()):
            src.stream(data)
    # add on_click callback for button widget
    button1 = Button(label="Run", button_type='success', width=390)
    button1.on_click(button1_run)
    button2 = Button(label="Step", button_type='primary', width=390)
    button2.on_click(button2_step)
    doc.add_root(column(fig, button1, button2))
    doc.title = "Now with live updating!"
apps = {'/': Application(FunctionHandler(make_document))}
server = Server(apps, port=5004)
if __name__ == '__main__':
    server.io_loop.add_callback(server.show, "/")
    server.start()
    server.io_loop.start()
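The idea behind this answer, one data source per strategy rather than one shared source, can be illustrated without Bokeh. The sketch below uses made-up sample records and a hypothetical helper, split_by_strategy, to show how a flat stream of (step, strategy, ncount) records separates into one column set per line.

```python
from collections import defaultdict


def split_by_strategy(records):
    """Group flat (step, strategy, ncount) records into one column dict per strategy."""
    columns = defaultdict(lambda: {"step": [], "ncount": []})
    for step, strategy, ncount in records:
        columns[strategy]["step"].append(step)
        columns[strategy]["ncount"].append(ncount)
    return dict(columns)


# Made-up sample data: two steps, two strategies.
records = [
    (1, "DD", 40), (1, "DC", 75),
    (2, "DD", 42), (2, "DC", 70),
]

for strategy, cols in split_by_strategy(records).items():
    print(strategy, cols)
# DD {'step': [1, 2], 'ncount': [40, 42]}
# DC {'step': [1, 2], 'ncount': [75, 70]}
```

A single line glyph joins the points of its source in row order, so feeding all four records per step into one source produces the vertical zig-zag described in the question; giving each strategy its own columns, and its own line, avoids it.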
Thank you, Eugene. Your solution got me back on the right track. I played around with it a bit more and ended up with the following:
import colorcet as cc
from bokeh.server.server import Server
from bokeh.application import Application
from bokeh.application.handlers.function import FunctionHandler
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import Button
from bokeh.layouts import column
import random
def make_document(doc):
    # make a list of groups
    strategies = ['DD', 'DC', 'CD', 'CCDD']
    # initialize some vars
    step = 0
    callback_obj = None
    colors = cc.glasbey_dark
    # create a list to hold all CDSs for active strategies in next step
    sources = []
    # Create a figure container
    fig = figure(title='Streaming Line Plot - Step 0', plot_width=800, plot_height=400)
    # get step 0 data for initial strategies
    for i in range(len(strategies)):
        step_data = dict(step=[step],
                         strategy = [strategies[i]],
                         ncount=[random.choice(range(1, 100))])
        data_source = ColumnDataSource(step_data)
        color = colors[i]
        # this will create one fig.line renderer for each strategy & its data for this step
        fig.line(x='step', y='ncount', source=data_source, color=color, line_width=2)
        # add this CDS to the sources list
        sources.append(data_source)
    def button1_run():
        nonlocal callback_obj
        if button1.label == 'Run':
            button1.label = 'Stop'
            button1.button_type='danger'
            callback_obj = doc.add_periodic_callback(button2_step, 100)
        else:
            button1.label = 'Run'
            button1.button_type = 'success'
            doc.remove_periodic_callback(callback_obj)
    def button2_step():
        nonlocal step
        data = []
        step += 1
        fig.title.text = 'Streaming Line Plot - Step '+str(step)
        for i in range(len(strategies)):
            step_data = dict(step=[step],
                             strategy = [strategies[i]],
                             ncount=[random.choice(range(1, 100))])
            data.append(step_data)
        for source, data in zip(sources, data):
            source.stream(data)
    # add on_click callback for button widget
    button1 = Button(label="Run", button_type='success', width=390)
    button1.on_click(button1_run)
    button2 = Button(label="Step", button_type='primary', width=390)
    button2.on_click(button2_step)
    doc.add_root(column(fig, button1, button2))
    doc.title = "Now with live updating!"
apps = {'/': Application(FunctionHandler(make_document))}
server = Server(apps, port=5004)
server.start()
if __name__ == '__main__':
    server.io_loop.add_callback(server.show, "/")
    server.io_loop.start()
Result is just what I was looking for ...

Display Streaming DataFrame in Jupyter from Redis Subscription

I have a Redis pub-sub channel, 'price-updates', on which a publisher posts stock price updates. I want to display a streaming grid that keeps appending the price updates at the end of the grid as they arrive.
So far, I have only a non-working version of what I want to do.
from streamz import Stream
from streamz.dataframe import DataFrame
source = Stream()
data = []
def handler(message):
    json_data = json.loads(message['data'])
    df = pd.DataFrame.from_dict([json_data]).set_index('sym')
source.map(handler).sink(data.append)
sdf = DataFrame(source)
## Run this in a different thread
p.subscribe('price-updates')
while True:
    message = p.get_message()
    if message:
        source.emit(message)
    time.sleep(0.001)
## end of thread block
#displayStreamingDataGrid(sdf)
I would appreciate it if someone with more experience with the sdf could help me do this.
I was able to do this without streams. However, I am not getting a streaming grid that is updated in place; instead the output is displayed and cleared on every update, which is very annoying.
Here is the output widget in one Jupyter cell:
import ipywidgets as iw
from IPython.display import display
o = iw.Output()
def output_to_widget(df, output_widget):
    output_widget.clear_output()
    with output_widget:
        display(df)
o
Here is the code to subscribe to Redis and handle the messages:
import redis, json, time
import pandas as pd
r = redis.StrictRedis(host = HOST, password = PASS, port = PORT, db = DB)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe('QUOTES')
mdf = pd.DataFrame()
while True:
    message = p.get_message()
    if message:
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg]).set_index('sym')
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        mdf = pd.concat([mdf, df])
        output_to_widget(mdf, o)
    time.sleep(0.001)
You can use https://github.com/AaronWatters/jp_proxy_widget to create an html
table which you can update in place without visibly clearing the table between updates.
I put an example notebook here: https://github.com/AaronWatters/jp_doodle/blob/master/notebooks/misc/In%20place%20html%20table%20update%20demo.ipynb
The trick is to create a widget that displays a table and attaches
an update operation which modifies the table:
# Create a proxy widget with a table update method
import jp_proxy_widget
def updateable_table(headers, rows):
    w = jp_proxy_widget.JSProxyWidget()
    w.js_init("""
    // injected javascript for the widget:
    element.update_table = function(headers, rows) {
        element.empty();
        var table = $("<table border style='text-align:center'/>");
        table.appendTo(element);
        var header_row = $("<tr/>");
        for (var i=0; i<headers.length; i++) {
            $("<th style='text-align:center'>" + headers[i] + "</th>")
                .width(50)
                .appendTo(header_row);
        }
        header_row.appendTo(table);
        for (var j=0; j<rows.length; j++) {
            var table_row = $("<tr/>").appendTo(table);
            var data_row = rows[j];
            for (var i=0; i<data_row.length; i++) {
                $("<td>" + data_row[i] + "</td>").appendTo(table_row);
            }
        }
    }
    element.update_table(headers, rows);
    """, headers=headers, rows=rows)
    return w
# show the widget
w = updateable_table(headers, rows)
w
The code to update the widget
# Update the widget 20 times
import time
count = -20
for i in range(21):
    time.sleep(1)
    rows = [rows[-1]] + rows[:-1]  # rotate the rows
    rows[0][0] = count  # change the upper left entry.
    count += 1
    w.element.update_table(headers, rows)
updates the table in place with no visible erasure. The example
notebook linked above also shows how to do the same thing using a
pandas dataframe.
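As a dependency-free variation on the same idea (rebuild the table markup and swap it in, rather than clearing and redisplaying output), the sketch below keeps the latest rows in a bounded buffer and renders them to an HTML string. The render_table helper, the sample quotes, and the 100-row limit are all made up for illustration. In a notebook, one plausible way to show the result in place would be to assign the string to the value of an ipywidgets.HTML widget on each update; that part is an assumption and is not shown or tested here.

```python
from collections import deque
from html import escape


def render_table(headers, rows):
    """Build a plain HTML table string from headers and row tuples."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Keep only the most recent 100 updates; older rows roll off automatically.
latest = deque(maxlen=100)
latest.append(("ABC", 101.5))    # made-up price updates
latest.append(("XYZ", 20.25))

html_table = render_table(["sym", "price"], latest)
print(html_table)
```

Escaping each cell keeps symbols or values containing characters such as < from breaking the markup.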

Create a graph from a CSV file and render to browser with Django and the Pandas Python library

I'm learning how to use the Django framework for a work project that will allow users to load files in various formats (at the moment I am only dealing with CSV files), graph that data using Pandas, and display that data back to the user via a Django template. I haven't had any problems creating the graph in iPython, but have been struggling with getting it to an HTML Django template.
I've followed the following example from matplotlib:
# graph input file
import datetime
import random
from django.http import HttpResponse
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.dates import DateFormatter
def graph(request):
    fig = Figure()
    ax = fig.add_subplot(111)
    x = []
    y = []
    now = datetime.datetime.now()
    delta = datetime.timedelta(days=1)
    for i in range(10):
        x.append(now)
        now += delta
        y.append(random.randint(0, 1000))
    ax.plot_date(x, y, '-')
    ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
    fig.autofmt_xdate()
    canvas = FigureCanvas(fig)
    # write the PNG bytes straight into the HTTP response
    response = HttpResponse( content_type = 'image/png')
    canvas.print_png(response)
    return response
The above example works great and I can see it in a template, but that's just a graph with hard-coded values.
I've attempted to use Pandas because of its seemingly simple syntax, and my attempts in Django are as follows:
# graph input file
import pandas as pd
from pandas import DataFrame
def graph(request):
    data_df = pd.read_csv("C:/Users/vut46744/Desktop/graphite_project/sampleCSV.csv")
    data_df = pd.DataFrame(dataArray)
    data_df.plot()
    response = HttpResponse( content_type = 'image/png')
    return response
In Django calling the .plot() displays the graph fine, but displays a blank page to the HTML template. I've also tried using Numpy's genfromtxt() and loadtxt(), but to no avail. Also, my Google searches have not been fruitful either.
Any help or suggestion would be great. If you know of a better alternative to Pandas then I am willing to try other options.
Haven't tried this yet, but I would attack it something like:
def graph(request):
    fig = Figure()
    ax = fig.add_subplot(111)
    data_df = pd.read_csv("C:/Users/vut46744/Desktop/graphite_project/sampleCSV.csv")
    data_df = pd.DataFrame(data_df)
    data_df.plot(ax=ax)
    canvas = FigureCanvas(fig)
    response = HttpResponse(content_type='image/png')
    canvas.print_png(response)
    return response
