I am trying to display live price updates coming from a Redis pub/sub channel in a grid in Jupyter. Every time there is a price update, the message will be appended to the end of the grid. In other words, a grid-view widget will be tied to a DataFrame, so every time the DataFrame changes, the grid view changes. The idea is to get something like this:
I tried to do that by displaying and clearing the output. However, I am not getting a streaming grid that updates in place, but rather output that is displayed and cleared, which is very annoying.
Here is the output widget in one Jupyter cell:
import ipywidgets as iw
from IPython.display import display

o = iw.Output()

def output_to_widget(df, output_widget):
    output_widget.clear_output()
    with output_widget:
        display(df)

o
Here is the code to subscribe to Redis and handle the messages:
import redis, json, time
import pandas as pd

# HOST, PASS, PORT and DB are defined elsewhere
r = redis.StrictRedis(host=HOST, password=PASS, port=PORT, db=DB)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe('QUOTES')

mdf = pd.DataFrame()
while True:
    message = p.get_message()
    if message:
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg]).set_index('sym')
        mdf = pd.concat([mdf, df])  # DataFrame.append was removed in pandas 2.0
        output_to_widget(mdf, o)
    time.sleep(0.001)
Try changing the first line of output_to_widget to output_widget.clear_output(wait=True). With wait=True, the widget waits to clear the existing output until new output is available to replace it, which removes the flicker.
https://ipython.org/ipython-doc/3/api/generated/IPython.display.html
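For reference, a minimal sketch of the function with only that change applied (wait=True defers the clear until the next display call, so the grid is replaced in place rather than flickering):

def output_to_widget(df, output_widget):
    # wait until new output is ready before clearing the old output
    output_widget.clear_output(wait=True)
    with output_widget:
        display(df)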
I was able to get it to work using Streaming DataFrames from the streamz library.
Here is the class that emits the data to the streaming dataframe:
import json
import pandas as pd

class DataEmitter:
    def __init__(self, pubsub, src):
        self.pubsub = pubsub
        self.src = src
        self.thread = None

    def emit_data(self, channel):
        self.pubsub.subscribe(**{channel: self._handler})
        self.thread = self.pubsub.run_in_thread(sleep_time=0.001)

    def stop(self):
        self.pubsub.unsubscribe()
        self.thread.stop()

    def _handler(self, message):
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg])
        self.src.emit(df)
and here is the cell to display the streaming dataframe
from streamz import Stream
import pandas as pd
import redis

r = redis.StrictRedis(host=HOST, password=PASS, port=PORT, db=DB)
p = r.pubsub(ignore_subscribe_messages=True)

source = Stream()
emitter = DataEmitter(p, source)
emitter.emit_data('PRICE_UPDATES')

# sample of what the dataframe is going to look like
example = pd.DataFrame({'time': [], 'sym': []})
sdf = source.to_dataframe(example=example)
sdf
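One note: run_in_thread starts a background thread for the subscription, so when the feed is no longer needed (for example, before re-running the cell) it should be shut down with the stop method defined above:

# unsubscribe and stop the background pubsub thread
emitter.stop()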
Summary
I want to use Streamlit to create a dashboard of all the trades (buy and sell) happening in a given market. I connect to a websocket stream to receive data for BTCUSDT from the Binance exchange. Messages are received every ~0.1 s, and I would like to update my dashboard every ~0.09 s.
How can you handle this kind of situation, where messages are delivered at high frequency? With my code, I successfully create a dashboard, but it doesn't get updated fast enough; I am wondering if the dashboard is running behind.
The dashboard must display the buy and sell volumes at any moment in time as bar charts. I am also adding some metrics to show the total buy and sell volumes, as well as their change.
Steps to reproduce
My code is structured in the following way.
There is a streamer.py file that defines a class Streamer. The Streamer object is a websocket client. It connects to a stream, handles messages, and updates the dashboard. Whenever a new message is received, Streamer acquires a threading.Lock() and updates the pandas dataframes (one dataframe for buy orders and one for sell orders). If there are multiple orders happening at the same timestamp, it combines them by summing the corresponding volumes. It then releases the threading.Lock() and creates a new thread in which the update function (defined in streamer.py) is executed. The update function acquires the lock to avoid concurrent access to the shared dataframes.
In the main.py file, Streamlit's dashboard and the Streamer object are initialized.
To reproduce the following code, you need to connect to the websocket from a region where Binance is not restricted. Since I live in the US, I must use a VPN to receive the data properly.
Code snippet:
main.py file
# main.py
import streamer
import pandas as pd
import streamlit as st  # web development
import numpy as np  # np mean, np random
import time  # to simulate real-time data, time loop
import plotly.express as px  # interactive charts

df_buy = pd.DataFrame(columns=['Price', 'Quantity', 'USD Value'])
df_sell = pd.DataFrame(columns=['Price', 'Quantity', 'USD Value'])

st.set_page_config(
    page_title='Real-Time Data Science Dashboard',
    page_icon='✅',
    layout='wide'
)

# dashboard title
st.title("Real-Time / Live Data Science Dashboard")

placeholder = st.empty()
streamer.Stream(df_buy, df_sell, placeholder).connect()
streamer.py file
# streamer.py
import websocket
import json
import streamlit as st
import plotly.express as px
import pandas as pd
from threading import Thread, Lock
from streamlit.script_run_context import add_script_run_ctx
from datetime import datetime
import time

def on_close(ws, close_status_code, close_msg):
    print('LOG', 'Closed orderbook client')

def update(df_buy, df_sell, placeholder, lock):
    lock.acquire()
    with placeholder.container():
        # create two columns for the KPIs
        kpi1, kpi2 = st.columns(2)
        current_sumSellVolumes = df_sell['Quantity'].sum()
        previous_sumSellVolumes = df_sell.iloc[:-1]['Quantity'].sum()
        current_sumBuyVolumes = df_buy['Quantity'].sum()
        previous_sumBuyVolumes = df_buy.iloc[:-1]['Quantity'].sum()
        # fill in the two columns with the respective metrics or KPIs
        kpi2.metric(label="Sell quantity 📉", value=round(current_sumSellVolumes, 2),
                    delta=round(current_sumSellVolumes - previous_sumSellVolumes, 2))
        kpi1.metric(label="Buy quantity 📈", value=round(current_sumBuyVolumes, 2),
                    delta=round(current_sumBuyVolumes - previous_sumBuyVolumes, 2))
        # create two columns for charts
        fig_col1, fig_col2 = st.columns(2)
        with fig_col1:
            st.markdown("### Buy Volumes")
            fig = px.bar(data_frame=df_buy, x=df_buy.index, y='Quantity')
            st.write(fig)
        with fig_col2:
            st.markdown("### Sell Volumes")
            fig2 = px.bar(data_frame=df_sell, x=df_sell.index, y='Quantity')
            st.write(fig2)
        st.markdown("### Detailed Data View")
        st.dataframe(df_buy)
        st.dataframe(df_sell)
    lock.release()

class Stream():
    def __init__(self, df_buy, df_sell, placeholder):
        self.symbol = 'BTCUSDT'
        self.df_buy = df_buy
        self.df_sell = df_sell
        self.placeholder = placeholder
        self.lock = Lock()
        self.url = "wss://stream.binance.com:9443/ws"
        self.stream = f"{self.symbol.lower()}@aggTrade"
        self.times = []

    def on_error(self, ws, error):
        print(self.times)
        print('ERROR', error)

    def on_open(self, ws):
        print('LOG', f'Opening WebSocket stream for {self.symbol}')
        subscribe_message = {"method": "SUBSCRIBE",
                             "params": [self.stream],
                             "id": 1}
        ws.send(json.dumps(subscribe_message))

    def handle_message(self, message):
        self.lock.acquire()
        timestamp = datetime.utcfromtimestamp(int(message['T']) / 1000)
        price = float(message['p'])
        qty = float(message['q'])
        USDvalue = price * qty
        side = 'BUY' if message['m'] == False else 'SELL'
        if side == 'BUY':
            df = self.df_buy
        else:
            df = self.df_sell
        if timestamp not in df.index:
            df.loc[timestamp] = [price, qty, USDvalue]
        else:
            df.loc[df.index == timestamp, 'Quantity'] += qty
            df.loc[df.index == timestamp, 'USD Value'] += USDvalue
        self.lock.release()

    def on_message(self, ws, message):
        message = json.loads(message)
        self.times.append(time.time())
        if 'e' in message:
            self.handle_message(message)
            thr = Thread(target=update, args=(self.df_buy, self.df_sell, self.placeholder, self.lock,))
            add_script_run_ctx(thr)
            thr.start()

    def connect(self):
        print('LOG', 'Connecting to websocket')
        self.ws = websocket.WebSocketApp(self.url, on_close=on_close, on_error=self.on_error,
                                         on_open=self.on_open, on_message=self.on_message)
        self.ws.run_forever()
Debug info
Streamlit version: 1.4.0
Python version: 3.10.4
OS version: MacOS 13.1
Browser version: Safari 16.2
I am trying to resample live ticks from the KiteTicker websocket into OHLC candles using pandas, and this is the code I have written. It works fine with a single instrument (the commented-out trd_portfolio line) but doesn't work with multiple instruments (the active line above it), as it mixes up the data of different instruments.
Is there any way to relate the final candles df to the instrument tokens, or to make this work with multiple instruments?
I would like to run my algo on multiple instruments at once; please suggest if there is a better way around it.
from kiteconnect import KiteTicker
from kiteconnect import KiteConnect
import logging
import time, os, datetime, math
import winsound
import pandas as pd

trd_portfolio = {954883: "USDINR19MARFUT", 4632577: "JUBLFOOD"}
# trd_portfolio = {954883: "USDINR19MARFUT"}

trd_tkn1 = []
for x in trd_portfolio:
    trd_tkn1.append(x)

c_id = '****************'
ak = '************'
asecret = '*************************'

kite = KiteConnect(api_key=ak)
print('[*] Generate access Token : ', kite.login_url())
request_tkn = input('[*] Enter Your Request Token Here : ')[-32:]
data = kite.generate_session(request_tkn, api_secret=asecret)
kite.set_access_token(data['access_token'])
kws = KiteTicker(ak, data['access_token'])

# columns in data frame
df_cols = ["Timestamp", "Token", "LTP"]
data_frame = pd.DataFrame(data=[], columns=df_cols, index=[])

def on_ticks(ws, ticks):
    global data_frame, df_cols
    data = dict()
    for company_data in ticks:
        token = company_data["instrument_token"]
        ltp = company_data["last_price"]
        timestamp = company_data['timestamp']
        data[timestamp] = [timestamp, token, ltp]
    tick_df = pd.DataFrame(data.values(), columns=df_cols, index=data.keys())
    data_frame = data_frame.append(tick_df)
    ggframe = data_frame.set_index(['Timestamp'], ['Token'])
    print(ggframe)
    gticks = ggframe.loc[:, ['LTP']]
    candles = gticks['LTP'].resample('1min').ohlc().dropna()
    print(candles)

def on_connect(kws, response):
    print('Connected')
    kws.subscribe(trd_tkn1)
    kws.set_mode(kws.MODE_FULL, trd_tkn1)

def on_close(ws, code, reason):
    print('Connection Error')

kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close
kws.connect()
I don't have access to the Kite API, but I've been looking at some code snippets that use it while trying to figure out another issue I'm having related to websockets. I came across this open question, and I think I can help, though I can't really test this solution.
The problem, I think, is that you're not calculating OHLC per "token"; the code resamples across all tokens at once.
data_frame = data_frame.append(tick_df)
ggframe = data_frame.set_index('Timestamp')
candles = ggframe.groupby('Token').resample('1min').agg({'LTP': 'ohlc'})
You'll get a multi-index output, but the column names might not quite line up with the rest of your code. To fix that:
candles.columns = ['open', 'high', 'low', 'close']
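Putting it together, on_ticks would look something like this (untested, since I don't have Kite API access; it assumes the same df_cols and globals as in the question):

def on_ticks(ws, ticks):
    global data_frame, df_cols
    data = dict()
    for company_data in ticks:
        token = company_data["instrument_token"]
        ltp = company_data["last_price"]
        timestamp = company_data['timestamp']
        data[timestamp] = [timestamp, token, ltp]
    tick_df = pd.DataFrame(data.values(), columns=df_cols, index=data.keys())
    data_frame = data_frame.append(tick_df)
    # group by instrument token, then resample each group into 1-minute candles
    ggframe = data_frame.set_index('Timestamp')
    candles = ggframe.groupby('Token').resample('1min').agg({'LTP': 'ohlc'}).dropna()
    candles.columns = ['open', 'high', 'low', 'close']
    print(candles)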
I have a Redis pub/sub channel 'price-updates' to which a publisher sends updates for a stock price. I want to display a streaming grid that keeps appending the price updates to the end of the grid as they arrive.
So far, I have created a non-working version of what I want to do.
from streamz import Stream
from streamz.dataframe import DataFrame

source = Stream()
data = []

def handler(message):
    json_data = json.loads(message['data'])
    df = pd.DataFrame.from_dict([json_data]).set_index('sym')

source.map(handler).sink(data.append)
sdf = DataFrame(source)

## Run this in a different thread
p.subscribe('price-updates')
while True:
    message = p.get_message()
    if message:
        source.emit(message)
    time.sleep(0.001)
## end of thread block

# displayStreamingDataGrid(sdf)
I would appreciate it if someone with more experience with streaming dataframes could help me do this.
I was able to do this without streams. However, I am not getting a streaming grid that updates in place, but rather output that is displayed and cleared, which is very annoying.
Here is the output widget in one Jupyter cell:
import ipywidgets as iw
from IPython.display import display

o = iw.Output()

def output_to_widget(df, output_widget):
    output_widget.clear_output()
    with output_widget:
        display(df)

o
Here is the code to subscribe to Redis and handle the messages:
import redis, json, time
import pandas as pd

# HOST, PASS, PORT and DB are defined elsewhere
r = redis.StrictRedis(host=HOST, password=PASS, port=PORT, db=DB)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe('QUOTES')

mdf = pd.DataFrame()
while True:
    message = p.get_message()
    if message:
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg]).set_index('sym')
        mdf = pd.concat([mdf, df])  # DataFrame.append was removed in pandas 2.0
        output_to_widget(mdf, o)
    time.sleep(0.001)
You can use https://github.com/AaronWatters/jp_proxy_widget to create an HTML table which you can update in place, without visibly clearing the table between updates.
I put an example notebook here: https://github.com/AaronWatters/jp_doodle/blob/master/notebooks/misc/In%20place%20html%20table%20update%20demo.ipynb
The trick is to create a widget that displays a table and attaches an update operation which modifies the table:
# Create a proxy widget with a table update method
import jp_proxy_widget

def updateable_table(headers, rows):
    w = jp_proxy_widget.JSProxyWidget()
    w.js_init("""
        // injected javascript for the widget:
        element.update_table = function(headers, rows) {
            element.empty();
            var table = $("<table border style='text-align:center'/>");
            table.appendTo(element);
            var header_row = $("<tr/>");
            for (var i=0; i<headers.length; i++) {
                $("<th style='text-align:center'>" + headers[i] + "</th>")
                    .width(50)
                    .appendTo(header_row);
            }
            header_row.appendTo(table);
            for (var j=0; j<rows.length; j++) {
                var table_row = $("<tr/>").appendTo(table);
                var data_row = rows[j];
                for (var i=0; i<data_row.length; i++) {
                    $("<td>" + data_row[i] + "</td>").appendTo(table_row);
                }
            }
        };
        element.update_table(headers, rows);
    """, headers=headers, rows=rows)
    return w
# show the widget (headers is a list of column names and rows a list of
# row lists, defined earlier in the notebook)
w = updateable_table(headers, rows)
w
The following code updates the widget:
# Update the widget 20 times
import time

count = -20
for i in range(21):
    time.sleep(1)
    rows = [rows[-1]] + rows[:-1]  # rotate the rows
    rows[0][0] = count  # change the upper left entry
    count += 1
    w.element.update_table(headers, rows)
This updates the table in place with no visible erasure. The example notebook linked above also shows how to do the same thing using a pandas dataframe.
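For the dataframe case, the glue is just converting the frame to headers and rows before calling the widget's update method. A minimal sketch (my own illustration, not code from the linked notebook):

import pandas as pd

def show_dataframe(w, df):
    # reuse the widget's update_table entry point with the dataframe contents
    headers = [df.index.name or "index"] + list(df.columns)
    rows = [[str(ix)] + [str(v) for v in row] for ix, row in zip(df.index, df.values)]
    w.element.update_table(headers, rows)

# usage: call again whenever the dataframe changes, e.g. show_dataframe(w, mdf)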
My problem seems to be similar to this thread; however, while I think I am following the advised method, I still get a PicklingError. When I run my process locally, without sending it to an IPython cluster engine, the function works fine.
I am using zipline with IPython's notebook, so I first create a class based on zipline.TradingAlgorithm.
Cell [ 1 ]
from IPython.parallel import Client
rc = Client()
lview = rc.load_balanced_view()
Cell [ 2 ]
%%px --local # this ensures that the class and modules exist on each engine
import zipline as zpl
import numpy as np

class Agent(zpl.TradingAlgorithm):  # must define initialize and handle_data methods
    def initialize(self):
        self.valueHistory = None

    def handle_data(self, data):
        for security in data.keys():
            ## Just randomly buy/sell/hold for each security
            coinflip = np.random.random()
            if coinflip < .25:
                self.order(security, 100)
            elif coinflip > .75:
                self.order(security, -100)
Cell [ 3 ]
from zipline.utils.factory import load_from_yahoo

start = '2013-04-01'
end = '2013-06-01'
sidList = ['SPY', 'GOOG']
data = load_from_yahoo(stocks=sidList, start=start, end=end)

agentList = []
for i in range(3):
    agentList.append(Agent())

def testSystem(agent, data):
    results = agent.run(data)  #-- This is how the zipline based class is executed
    #-- next I'm just storing the final value of the test so I can plot later
    agent.valueHistory.append(results['portfolio_value'][len(results['portfolio_value'])-1])
    return agent

for i in range(10):
    tasks = []
    for agent in agentList:
        #agent = testSystem(agent,data) ## On its own, this works!
        #-- To test, uncomment the above line and comment out the next two
        tasks.append(lview.apply_async(testSystem, agent, data))
    agentList = [ar.get() for ar in tasks]

for agent in agentList:
    plot(agent.valueHistory)
Here is the error produced:
PicklingError                             Traceback (most recent call last)
/Library/Python/2.7/site-packages/IPython/kernel/zmq/serialize.pyc in serialize_object(obj, buffer_threshold, item_threshold)
    100         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    101
--> 102     buffers.insert(0, pickle.dumps(cobj,-1))
    103     return buffers
    104
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
If I override the run() method from zipline.TradingAlgorithm with something like:

def run(self, data):
    return 1

then the passing off to the engines works, but obviously the guts of the test are not performed. Trying something like this...

def run(self, data):
    return zpl.TradingAlgorithm.run(self, data)

...results in the same PicklingError. As run is a method internal to zipline.TradingAlgorithm and I don't know everything that it does, how would I make sure it is passed through?
It looks like the zipline TradingAlgorithm object is not pickleable after it has been run:
import pickle
import zipline as zpl

class Agent(zpl.TradingAlgorithm):  # must define initialize and handle_data methods
    def handle_data(self, data):
        pass

agent = Agent()
pickle.dumps(agent)[:32]  # ok

agent.run(data)  # data loaded as in the question
pickle.dumps(agent)[:32]  # fails
But this suggests to me that you should be creating the Agents on the engines, and only passing data / results back and forth (ideally, not passing data across at all, or at most once).
Minimizing data transfers might look something like this:
define the class:
%%px
import zipline as zpl
import numpy as np

class Agent(zpl.TradingAlgorithm):  # must define initialize and handle_data methods
    def initialize(self):
        self.valueHistory = []

    def handle_data(self, data):
        for security in data.keys():
            ## Just randomly buy/sell/hold for each security
            coinflip = np.random.random()
            if coinflip < .25:
                self.order(security, 100)
            elif coinflip > .75:
                self.order(security, -100)
load the data:
%%px
from zipline.utils.factory import load_from_yahoo

start = '2013-04-01'
end = '2013-06-01'
sidList = ['SPY', 'GOOG']
data = load_from_yahoo(stocks=sidList, start=start, end=end)
agent = Agent()
and run the code:
from IPython import parallel

def testSystem(agent, data):
    results = agent.run(data)  #-- This is how the zipline based class is executed
    #-- next I'm just storing the final value of the test so I can plot later
    agent.valueHistory.append(results['portfolio_value'][len(results['portfolio_value'])-1])

# create references to the remote agent / data objects
agent_ref = parallel.Reference('agent')
data_ref = parallel.Reference('data')

tasks = []
for i in range(10):
    for j in range(len(rc)):
        tasks.append(lview.apply_async(testSystem, agent_ref, data_ref))

# wait for the tasks to complete
[t.get() for t in tasks]
And plot the results, never fetching the agents themselves:
%matplotlib inline
import matplotlib.pyplot as plt

for history in rc[:].apply_async(lambda: agent.valueHistory):
    plt.plot(history)
This is not quite the same as the code you shared: yours had three agents bouncing back and forth across all your engines, whereas this has one agent per engine. I don't know enough about zipline to say whether that matters for your use case.
In general, I want to connect to the database selected by the user.
I am using two modules, dblogin.py and xconn.py.
dblogin.py is the GUI for the user to set the desired database name, and xconn.py makes the connection to PostgreSQL.
The problem is that I can't get the value of dbedit from dblogin.py.
How can I fix it?
Thanks in advance for the answers. God bless you all.
Regards,
ide
dblogin.py
class dblog(QDialog):
    def __init__(self):
        super(dblog, self).__init__()
        self.dblabel = QLabel('Database Name')
        self.dbedit = QLineEdit('')
        # create button
        ...
        # set layout in grid
        # action for button
        self.connect(self.connectbutton, SIGNAL('clicked()'), self.connectaction)

    def connectaction(self):
        self._data = self.dbedit.text()
        if self._data == '':
            # "Database name must be filled in!"
            _msg = QMessageBox.information(self, 'information', 'Nama Database harus diisi !', QMessageBox.Ok)
            self.dbedit.setFocus()
        else:
            try:
                xconn.getconn()
                # "Wait, checking database structure!"
                _msg = QMessageBox.information(self, 'information', 'Tunggu, Check database struktur!', QMessageBox.Ok)
            except:
                # "Database not found!"
                _msg = QMessageBox.information(self, 'information', 'Database tidak ditemukan !', QMessageBox.Ok)
xconn.py
import psycopg2
import dblogin

def getconn():
    _host = '127.0.0.1'
    _user = 'postgres'
    _pass = 'xxx'
    _data = dblogin.dblog.getdb()
    conn = psycopg2.connect(database=_data, user=_user, password=_pass, host=_host)
    return conn
Your QDialog class name should begin with capitalized letters: class DBLog. You can use the standard buttons:
self.buttonBox = QtGui.QDialogButtonBox(QtGui.QDialogButtonBox.Ok | QtGui.QDialogButtonBox.Cancel)
To set the text from the QLineEdit as the return value, reimplement the accept method:
self.buttonBox.accepted.connect(self.accept)

def accept(self):
    self._data = self.dbedit.text()
    self.done(1)
Then in xconn create an instance of DBLog and use the dialog only to get this value. From xconn.py, do something like this:
dblog = DBLog()  # create an instance of your dialog
if dblog.exec_():  # exec_() returns the value passed to done(), nonzero on accept
    _data = dblog._data
else:
    print('Dialog not accepted')
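Putting it together, a minimal sketch of what xconn.py could look like with this approach (my own illustration; the host/user/password values are the ones from the question, and DBLog is the dialog class above):

import psycopg2
import dblogin

def getconn():
    _host = '127.0.0.1'
    _user = 'postgres'
    _pass = 'xxx'
    dialog = dblogin.DBLog()   # show the login dialog
    if not dialog.exec_():     # blocks until the user closes the dialog
        raise RuntimeError('Dialog not accepted')
    _data = dialog._data       # the database name captured in accept()
    conn = psycopg2.connect(database=_data, user=_user, password=_pass, host=_host)
    return conn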