Resampling Live Websocket Ticks to Candles using Pandas in Python

I am trying to resample live ticks from the KiteTicker websocket into OHLC candles using pandas. The code below works fine with a single instrument (the commented-out trd_portfolio) but not with multiple instruments (the two-entry trd_portfolio), because it mixes up the data of the different instruments.
Is there any way to relate the final candles df to instrument tokens, or to make this work with multiple instruments?
I would like to run my algo on multiple instruments at once; please suggest a better way if there is one.
from kiteconnect import KiteTicker
from kiteconnect import KiteConnect
import logging
import time, os, datetime, math
import winsound
import pandas as pd

trd_portfolio = {954883: "USDINR19MARFUT", 4632577: "JUBLFOOD"}
# trd_portfolio = {954883: "USDINR19MARFUT"}
trd_tkn1 = []
for x in trd_portfolio:
    trd_tkn1.append(x)  # collect the instrument tokens used in on_connect

c_id = '****************'
ak = '************'
asecret = '*************************'
kite = KiteConnect(api_key=ak)
print('[*] Generate access Token : ', kite.login_url())
request_tkn = input('[*] Enter Your Request Token Here : ')[-32:]
data = kite.generate_session(request_tkn, api_secret=asecret)
kws = KiteTicker(ak, data['access_token'])

# columns in data frame
df_cols = ["Timestamp", "Token", "LTP"]
data_frame = pd.DataFrame(data=[], columns=df_cols, index=[])

def on_ticks(ws, ticks):
    global data_frame, df_cols
    data = dict()
    for company_data in ticks:
        token = company_data["instrument_token"]
        ltp = company_data["last_price"]
        timestamp = company_data['timestamp']
        data[timestamp] = [timestamp, token, ltp]
    tick_df = pd.DataFrame(data.values(), columns=df_cols, index=data.keys())
    data_frame = data_frame.append(tick_df)
    # ggframe and candles come from resampling lines that are not shown here
    print(ggframe)
    print(candles)

def on_connect(kws, response):
    kws.set_mode(kws.MODE_FULL, trd_tkn1)

def on_close(ws, code, reason):
    print('Connection Error')

kws.on_ticks = on_ticks
kws.on_connect = on_connect
kws.on_close = on_close

I don't have access to the Kite API, but I've been looking at some code snippets that use it trying to figure out another issue I'm having related to websockets. I came across this open question, and I think I can help, though I can't really test this solution.
The problem I think is that you're not calculating OHLC for each "token"... it just does it for all tokens.
data_frame = data_frame.append(tick_df)
You'll get a multi-index output, but the column names might not quite line up for the rest of your code. To fix that:
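A minimal sketch of per-token resampling, assuming a tick frame with the question's Timestamp, Token and LTP columns. The ticks here are made-up values standing in for the live feed, and nothing below has been run against the Kite API:

```python
import pandas as pd

# Hypothetical ticks for two instruments (values invented for illustration).
ticks = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2019-03-01 09:15:05", "2019-03-01 09:15:05",
                "2019-03-01 09:15:40", "2019-03-01 09:15:45",
                "2019-03-01 09:16:10", "2019-03-01 09:16:20",
            ]
        ),
        "Token": [954883, 4632577, 954883, 4632577, 954883, 4632577],
        "LTP": [70.10, 1250.0, 70.25, 1248.5, 70.05, 1251.0],
    }
)

# Group by instrument first, then resample each group's prices into 1-minute OHLC bars.
candles = (
    ticks.set_index("Timestamp")
    .groupby("Token")["LTP"]
    .resample("1min")
    .ohlc()
)

# Flatten the (Token, Timestamp) MultiIndex back into ordinary columns.
flat = candles.reset_index()
print(flat)
```

The result is indexed by (Token, Timestamp), so each instrument keeps its own candles; reset_index() is only needed if the rest of the code expects a flat frame.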


Handling high frequency updates with streamlit

I want to use streamlit to create a dashboard of all the trades (buy and sell) happening in a given market. I connect to a websocket stream to receive data of BTCUSDT from the Binance exchange. Messages are received every ~0.1s and I would like to update my dashboard in ~0.09s.
How can you handle this kind of situation where messages are delivered at high frequency? With my code, I successfully create a dashboard but it doesn't get updated fast enough. I am wondering if the dashboard is running behind.
The dashboard must display the buy and sell volumes at any moment in time as bar charts. I am also adding some metrics to show the total volume of buy and sell, as well as their change.
Steps to reproduce
My code is structured in the following way.
There is a file, streamer.py, that defines a class Streamer. The Streamer object is a Websocket client. It connects to a stream, handles messages, and updates the dashboard. Whenever a new message is received, Streamer acquires a threading.Lock() and updates the pandas dataframes (one dataframe for buy orders and one dataframe for sell orders). If multiple orders happen at the same timestamp, it combines them by summing the corresponding volumes. Then it releases the threading.Lock() and creates a new thread in which the update function (defined in the same file) is executed. The update function acquires the lock to avoid corrupting shared memory.
In the main file, streamlit's dashboard and the Streamer object are initialized.
To reproduce the following code you need to connect to the Websocket from a region where Binance is not restricted. Since I live in the US, I must use a VPN to properly receive the data.
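The locking and same-timestamp aggregation described above can be sketched in isolation as follows. This is only an illustration: add_trade is a hypothetical helper name, the trades are made-up values, and no websocket or Streamlit code is involved.

```python
from threading import Lock

import pandas as pd

lock = Lock()
df_buy = pd.DataFrame(columns=["Price", "Quantity", "USD Value"])


def add_trade(df, timestamp, price, qty):
    """Insert a trade, or add its volume to an existing row with the same timestamp."""
    with lock:  # only one thread may touch the frame at a time
        if timestamp in df.index:
            df.loc[timestamp, "Quantity"] += qty
            df.loc[timestamp, "USD Value"] += price * qty
        else:
            df.loc[timestamp] = [price, qty, price * qty]


ts = pd.Timestamp("2023-01-01 00:00:00.100")
add_trade(df_buy, ts, 100.0, 0.5)
add_trade(df_buy, ts, 100.0, 0.25)  # same timestamp: volumes are summed
add_trade(df_buy, pd.Timestamp("2023-01-01 00:00:00.200"), 101.0, 1.0)
print(df_buy)
```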
Code snippet: main file
import streamer
import pandas as pd
import streamlit as st  # web development
import numpy as np  # np mean, np random
import time  # to simulate a real time data, time loop
import plotly.express as px  # interactive charts

df_buy = pd.DataFrame(columns=['Price', 'Quantity', 'USD Value'])
df_sell = pd.DataFrame(columns=['Price', 'Quantity', 'USD Value'])
st.set_page_config(
    page_title='Real-Time Data Science Dashboard',
)
# dashboard title
st.title("Real-Time / Live Data Science Dashboard")
placeholder = st.empty()
streamer.Stream(df_buy, df_sell, placeholder).connect()
Code snippet: streamer.py file
import websocket
import json
import streamlit as st
import plotly.express as px
import pandas as pd
from threading import Thread, Lock
from streamlit.script_run_context import add_script_run_ctx
from datetime import datetime
import time


def on_close(ws, close_status_code, close_msg):
    print('LOG', 'Closed orderbook client')


def update(df_buy, df_sell, placeholder, lock):
    with placeholder.container():
        # create two columns for the metrics
        kpi1, kpi2 = st.columns(2)
        current_sumSellVolumes = df_sell['Quantity'].sum()
        previous_sumSellVolumes = df_sell.iloc[:-1]['Quantity'].sum()
        current_sumBuyVolumes = df_buy['Quantity'].sum()
        previous_sumBuyVolumes = df_buy.iloc[:-1]['Quantity'].sum()
        # fill in those two columns with respective metrics or KPIs
        kpi2.metric(label="Sell quantity 📉", value=round(current_sumSellVolumes, 2),
                    delta=round(current_sumSellVolumes - previous_sumSellVolumes, 2))
        kpi1.metric(label="Buy quantity 📈", value=round(current_sumBuyVolumes, 2),
                    delta=round(current_sumBuyVolumes - previous_sumBuyVolumes, 2))
        # create two columns for charts
        fig_col1, fig_col2 = st.columns(2)
        with fig_col1:
            st.markdown("### Buy Volumes")
            fig = px.bar(df_buy, x=df_buy.index, y='Quantity')
        with fig_col2:
            st.markdown("### Sell Volumes")
            fig2 = px.bar(df_sell, x=df_sell.index, y='Quantity')
        st.markdown("### Detailed Data View")


class Stream():
    def __init__(self, df_buy, df_sell, placeholder):
        self.symbol = 'BTCUSDT'
        self.df_buy = df_buy
        self.df_sell = df_sell
        self.placeholder = placeholder
        self.lock = Lock()
        self.url = "wss://"
        # attribute name assumed; Binance stream names use the form <symbol>@aggTrade
        self.stream = f"{self.symbol.lower()}@aggTrade"
        self.times = []

    def on_error(self, ws, error):
        print('ERROR', error)

    def on_open(self, ws):
        print('LOG', f'Opening WebSocket stream for {self.symbol}')
        subscribe_message = {"method": "SUBSCRIBE",
                             "params": [self.stream],
                             "id": 1}

    def handle_message(self, message):
        timestamp = datetime.utcfromtimestamp(int(message['T']) / 1000)
        price = float(message['p'])
        qty = float(message['q'])
        USDvalue = price * qty
        side = 'BUY' if message['m'] == False else 'SELL'
        if side == 'BUY':
            df = self.df_buy
        else:
            df = self.df_sell
        if timestamp not in df.index:
            df.loc[timestamp] = [price, qty, USDvalue]
        else:
            # same timestamp: sum the volumes instead of adding a row
            df.loc[df.index == timestamp, 'Quantity'] += qty
            df.loc[df.index == timestamp, 'USD Value'] += USDvalue

    def on_message(self, ws, message):
        message = json.loads(message)
        if 'e' in message:
            # update the dataframes under the lock, then redraw in a new thread
            with self.lock:
                self.handle_message(message)
            thr = Thread(target=update, args=(self.df_buy, self.df_sell, self.placeholder, self.lock,))
            add_script_run_ctx(thr)
            thr.start()

    def connect(self):
        print('LOG', 'Connecting to websocket')
        self.ws = websocket.WebSocketApp(self.url, on_close=on_close, on_error=self.on_error,
                                         on_open=self.on_open, on_message=self.on_message)
        self.ws.run_forever()
Debug info
Streamlit version: 1.4.0
Python version: 3.10.4
OS version: MacOS 13.1
Browser version: Safari 16.2

Pandas not outputting anything after successful deployment

When I run this script, I do not get an output. It appears as if it is successful as I do not get any errors telling me otherwise. When I run the notebook, a cell appears below the 5th cell, indicating that the script ran successfully, but there's nothing populated. All of my auth is correct as when I use the same auth in postman to pull tag data values, it's successful. This script used to run fine and output a table in addition to a graph.
What gives? Any help would be greatly appreciated.
Sample dataset when pulling tag data values from the Azure API:
{
    "c": 100,
    "s": "opc",
    "t": "2021-06-11T16:45:55.04Z",
    "v": 80321248.5
}
import pandas as pd
from modules.services_factory import ServicesFactory
from modules.data_service import TagDataValue
from modules.model_service import ModelService
from datetime import datetime
import dateutil.parser
pd.options.plotting.backend = "plotly"
#specify tag list, start and end times here
taglist = ['c41f-ews-systemuptime']
starttime = '2021-06-10T14:00:00Z'
endtime = '2021-06-10T16:00:00Z'
# Get data and model services.
services = ServicesFactory('local.settings.production.json')
data_service = services.get_data_service()
tagvalues = []
for tag in taglist:
    for tagvalue in data_service.get_tag_data_values(tag, dateutil.parser.parse(starttime), dateutil.parser.parse(endtime)):
        tagvaluedict = tagvalue.__dict__
        tagvaluedict['tag_id'] = tag
        tagvalues.append(tagvaluedict)  # collect each record for the DataFrame
df = pd.DataFrame(tagvalues)
df = df.pivot(index='t',columns='tag_id')
fig = df['v'].plot()
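A minimal sketch of the DataFrame and pivot steps on a single hard-coded record shaped like the sample above, with a check for an empty result before pivoting. The record, the added tag_id value and the emptiness check are illustrative assumptions, not part of the original script, and the plotting step is left out:

```python
import pandas as pd

# One record shaped like the sample, plus the tag_id the loop adds (hard-coded here).
tagvalues = [
    {
        "c": 100,
        "s": "opc",
        "t": "2021-06-11T16:45:55.04Z",
        "v": 80321248.5,
        "tag_id": "c41f-ews-systemuptime",
    },
]

df = pd.DataFrame(tagvalues)
if df.empty:
    # An empty frame here would mean the service returned no rows.
    print("No rows returned for the requested tags and time window")
else:
    wide = df.pivot(index="t", columns="tag_id")
    values = wide["v"]  # one column per tag, indexed by timestamp
    print(values)
```

Printing the frame before plotting is a quick way to tell whether the data is missing or only the figure is not being displayed.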

How do I import various Data Frames from different python file?

I have a Python file called 'clean_data.py' which has all the data frames I need, and I want to import them into another Python file to use in creating a dashboard.
Is it possible to create a class in my clean_data file, and if so can someone direct me to an article (which I have struggled to find so far) so that I can figure it out?
The aim is to shift from CSV to an API over time, so I wanted to keep the data wrangling side of things in one file and the web app components in the other.
Any help would be much appreciated.
The code from clean_data.py is:
import pandas as pd
import csv
import os # To access my file directory
print(os.getcwd()) # Let's me know the Current Work Directory
fdi_data = pd.read_csv(r'Data/fdi_data.csv')
fdi_meta = pd.read_csv(r'Data/fdi_metadata.csv')
debt_data = pd.read_csv(r'Data/debt_data.csv')
debt_meta = pd.read_csv(r'Data/debt_metadata.csv')
gdp_percap_data = pd.read_csv(r'Data/gdp_percap_data.csv', header=2)
gdp_percap_meta = pd.read_csv(r'Data/gdp_percap_metadata.csv')
gov_exp_data = pd.read_csv(r'Data/gov_exp_data.csv', header=2)
gov_exp_meta = pd.read_csv(r'Data/gov_exp_metadata.csv')
pop_data = pd.read_csv(r'Data/pop_data.csv', header=2)
pop_meta = pd.read_csv(r'Data/pop_metadata.csv')
# 'wb' stands for World Bank
def wb_merge_data(data, metadata):
    merge = pd.merge(
        data,
        metadata,
        on = 'Country Code',
        how = 'inner'
    )
    return merge
fdi_merge = wb_merge_data(fdi_data, fdi_meta)
debt_merge = wb_merge_data(debt_data, debt_meta)
gdp_percap_merge = wb_merge_data(gdp_percap_data, gdp_percap_meta)
gov_exp_merge = wb_merge_data(gov_exp_data, gov_exp_meta)
pop_merge = wb_merge_data(pop_data, pop_meta)
def wb_drop_data(data):
    drop = data.drop(['Country Code','Indicator Name','Indicator Code','TableName','SpecialNotes','Unnamed: 5'], axis=1)
    return drop
fdi_merge = wb_drop_data(fdi_merge)
debt_merge = wb_drop_data(debt_merge)
gdp_percap_merge = wb_drop_data(gdp_percap_merge)
gov_exp_merge = wb_drop_data(gov_exp_merge)
pop_merge = wb_drop_data(pop_merge)
def wb_mr_data(data, value_name):
    data = data.melt(['Country Name','Region','IncomeGroup']).reset_index()
    data = data.rename(columns={'variable': 'Year', 'value': value_name})
    data = data.drop('index', axis = 1)
    return data
fdi_merge = wb_mr_data(fdi_merge, 'FDI')
debt_merge = wb_mr_data(debt_merge, 'Debt')
gdp_percap_merge = wb_mr_data(gdp_percap_merge, 'GDP per Cap')
gov_exp_merge = wb_mr_data(gov_exp_merge, 'Gov Expend.')
pop_merge = wb_mr_data(pop_merge, 'Population')
def avg_groupby(data, col_cal, cn=False, ig=False, rg=False):
    if cn == True:
        return data.groupby('Country Name')[col_cal].mean().reset_index()
    elif ig == True:
        return data.groupby('IncomeGroup')[col_cal].mean().reset_index()
    elif rg == True:
        return data.groupby('Region')[col_cal].mean().reset_index()
# avg_cn_... For country
# avg_ig_... Income Group
# avg_rg_... Region
avg_cn_fdi = avg_groupby(fdi_merge, 'FDI', cn=True)
avg_ig_fdi = avg_groupby(fdi_merge, 'FDI', ig=True)
avg_rg_fdi = avg_groupby(fdi_merge, 'FDI', rg=True)
avg_cn_debt = avg_groupby(debt_merge, 'Debt', cn=True)
avg_ig_debt = avg_groupby(debt_merge, 'Debt', ig=True)
avg_rg_debt = avg_groupby(debt_merge, 'Debt', rg=True)
avg_cn_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', cn=True)
avg_ig_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', ig=True)
avg_rg_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', rg=True)
avg_cn_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', cn=True)
avg_ig_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', ig=True)
avg_rg_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', rg=True)
avg_cn_pop = avg_groupby(pop_merge, 'Population', cn=True)
avg_ig_pop = avg_groupby(pop_merge, 'Population', ig=True)
avg_rg_pop = avg_groupby(pop_merge, 'Population', rg=True)
In Python, every file is a module, so if you want to reuse your code you can simply import that module. For example:
import clean_data
You may not need to create a class for this.
You can import the whole Python file like any other locally created file and access the DataFrames in it. Here's an example:
I created a file called temporary.py:
import pandas as pd
data = pd.read_csv("temp.csv")
And then in a separate file I was able to use data like so:
import temporary
Or, you could also do:
from temporary import data
All that being said, I don't believe that this would be the best way to handle your data.
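The import pattern from the answers can be tried end to end with the sketch below. To stay self-contained it writes a small temporary.py into a scratch directory first; the DataFrame contents and the average_fdi helper are made up for illustration and stand in for the real clean_data.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a tiny module that builds a DataFrame at import time (illustrative data).
module_source = '''
import pandas as pd

data = pd.DataFrame({"Country Name": ["A", "B"], "FDI": [1.0, 2.0]})


def average_fdi():
    return data["FDI"].mean()
'''

# Write the module somewhere importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "temporary.py").write_text(module_source)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Style 1: import the module and reach the DataFrame through it.
temporary = importlib.import_module("temporary")
print(temporary.data)

# Style 2: import the name directly.
from temporary import data

print(data is temporary.data)  # both names refer to the same DataFrame object
```

In a real project the two files simply sit in the same directory, and a plain `import temporary` at the top of the dashboard file does the same job as the importlib call here.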

Streaming Grid Display in Jupyter Notebook

I am trying to display live price updates coming from a redis pubsub channel in a grid in Jupyter. Every time there is a price update, the message is added at the end of the grid. In other words, a gridview widget is tied to a DataFrame, so every time the DataFrame changes, the gridview changes. The idea is to get something like this:
I tried to do that by displaying and clearing the output. However, I am not getting a streaming grid that is updated in place; instead the output is cleared and redisplayed each time, which is very annoying.
Here is the output widget in one jupyter cell
import ipywidgets as iw
from IPython.display import display
o = iw.Output()
def output_to_widget(df, output_widget):
    with output_widget:
        output_widget.clear_output()
        display(df)
Here is the code to subscribe to redis and handle the messages
import redis, json, time
import pandas as pd

r = redis.StrictRedis(host = HOST, password = PASS, port = PORT, db = DB)
p = r.pubsub(ignore_subscribe_messages=True)
mdf = pd.DataFrame()
while True:
    message = p.get_message()
    if message:
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg]).set_index('sym')
        mdf = pd.concat([mdf, df])  # DataFrame.append was removed in pandas 2.0
        output_to_widget(mdf, o)
Try changing the first line of output_to_widget to output_widget.clear_output(wait = True).
I was able to get it to work using Streaming DataFrames from the streamz library.
Here is the class that emits the data to the streaming dataframe.
class DataEmitter:
    def __init__(self, pubsub, src):
        self.pubsub = pubsub
        self.src = src
        self.thread = None

    def emit_data(self, channel):
        self.pubsub.subscribe(**{channel: self._handler})
        self.thread = self.pubsub.run_in_thread(sleep_time=0.001)

    def stop(self):
        self.thread.stop()

    def _handler(self, message):
        json_msg = json.loads(message['data'])
        df = pd.DataFrame([json_msg])
        self.src.emit(df)  # push the new row into the stream
And here is the cell that displays the streaming dataframe:
from streamz import Stream

r = redis.StrictRedis(host = HOST, password = PASS, port = PORT, db = DB)
p = r.pubsub(ignore_subscribe_messages=True)
source = Stream()
emitter = DataEmitter(p, source)
# example of what the dataframe is going to look like
example = pd.DataFrame({'time': [], 'sym': []})
sdf = source.to_dataframe(example=example)

Simple Bokeh app: Chart does not update as expected

So I have been trying to build a little something from the bokeh example there:
My dataset is really similar and this should be very straightforward, yet I have a problem that I cannot explain.
import os, pickle
import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import row, column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

base_path = '/Users/xxxxxx/Desktop/data/'
domain = 'IEM_Domain'
metric = 'total_area_burned'

def get_dataset(dic, selection, scenario):
    def _get_mean(thing):
        _df = pd.DataFrame(thing)
        _df = _df.mean(axis=1).cumsum(axis=0)
        return _df
    data = {model: _get_mean(dic[model]) for model in dic.keys() if all([scenario in model, selection in model])}
    df = pd.DataFrame(data)
    return ColumnDataSource(data=df)

def make_plot(source, title):
    plot = figure(x_axis_type="datetime", plot_width=800, tools="")
    plot.title.text = title
    for _df in source:
        for col in _df.to_df().columns:
            if 'index' not in col:
                plot.line(_df.to_df()['index'], _df.to_df()[col], source=_df)
            else:
                pass
    # fixed attributes
    plot.xaxis.axis_label = 'Year'
    plot.yaxis.axis_label = "Area burned (km)"
    plot.axis.axis_label_text_font_style = "bold"
    return plot

def update_plot(attrname, old, new):
    rcp45 = rcp45_select.value
    rcp85 = rcp85_select.value
    src45 = get_dataset(dic, rcp45, 'rcp45')
    src85 = get_dataset(dic, rcp85, 'rcp85')
    source45.data = src45.data
    source85.data = src85.data

rcp45 = 'CCSM4_rcp45'
rcp85 = 'CCSM4_rcp85'
dic = pickle.load(open(os.path.join(base_path , "_".join([domain , metric ]) + '.p'), 'rb'),encoding='latin1')
rcp45_models = [ i for i in dic.keys() if 'rcp45' in i]
rcp85_models = [ i for i in dic.keys() if 'rcp85' in i]
rcp45_select = Select(value=rcp45, title='RCP 45', options=sorted(rcp45_models))
rcp85_select = Select(value=rcp85, title='RCP 85', options=sorted(rcp85_models))
source45 = get_dataset(dic , rcp45 , 'rcp45')
source85 = get_dataset(dic , rcp85 ,'rcp85')
plot = make_plot([source45 , source85], "Total area burned ")
rcp45_select.on_change('value', update_plot)
rcp85_select.on_change('value', update_plot)
controls = column(rcp45_select, rcp85_select)
curdoc().add_root(row(plot, controls))
curdoc().title = "Total Area burned"
Everything runs fine until I try to change the value in the dropdown. I can see that the function update_plot() is doing its job, updating the data when the dropdown is used. But for some reason the plot doesn't change, although the example works fine. I have been digging everywhere in the code but can't seem to find what I am doing wrong.
I have tried to simplify make_plot() to see if the problem could come from there, but that didn't change anything, so I am out of ideas.
I found this but couldn't apply it: Bokeh: chart from pandas dataframe won't update on trigger
Edit after first answer
I tried to get rid of the ColumnDataSource and replaced it with a traditional dictionary, but I still run into the same issue.
Here is the updated code:
import os, pickle
import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import row, column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

base_path = '/Users/julienschroder/Desktop/data/'
domain = 'IEM_Domain'
metric = 'total_area_burned'
scenarios = ['rcp45', 'rcp85']

def get_dataset(dic, selection, scenario=scenarios):
    # Takes the raw source as dic and a selection of models; returns a dictionary
    # like {scenario: pd.DataFrame(models)} so that each scenario can be plotted on its own
    def _get_mean_cumsum(df, name):
        # Extract, average and cumsum the raw data to a dataframe
        _df = pd.DataFrame(df)
        _df = _df.mean(axis=1).cumsum(axis=0)
        _df = _df.to_frame(name=name)
        return _df
    # Just one model at a time for now, but hoping to get multiple lines and so multiple models in the future
    data = {scenario: pd.concat([_get_mean_cumsum(dic[model], model) for model in selection if scenario in model], axis=1) for scenario in scenarios}
    return data

def make_plot(source, title):
    plot = figure(x_axis_type="datetime", plot_width=800, tools="")
    plot.title.text = title
    # for now it deals with one model at a time, but in the future I hope to have multiline plotting, hence the for loops
    for col in source['rcp45']:
        plot.line(source['rcp45'].index, source['rcp45'][col])
    for col in source['rcp85']:
        plot.line(source['rcp85'].index, source['rcp85'][col])
    # fixed attributes
    plot.xaxis.axis_label = 'Year'
    plot.yaxis.axis_label = "Area burned (km)"
    plot.axis.axis_label_text_font_style = "bold"
    return plot

def update_plot(attrname, old, new):
    rcp45 = rcp45_select.value
    rcp85 = rcp85_select.value
    source = get_dataset(dic, [rcp45, rcp85])
    # check to see if source gets updated
    print(source)  # <- gets updated properly after dropdown action

rcp45 = 'CCSM4_rcp45'
rcp85 = 'CCSM4_rcp85'
# dic = pickle.load(open(os.path.join(base_path , "_".join([domain , metric ]) + '.p'), 'rb'),encoding='latin1')
dic = pickle.load(open('IEM_Domain_total_area_burned.p', 'rb'),encoding='latin1') #data available there :
rcp45_models = [ i for i in dic.keys() if 'rcp45' in i]
rcp85_models = [ i for i in dic.keys() if 'rcp85' in i]
rcp45_select = Select(value=rcp45, title='RCP 45', options=sorted(rcp45_models))
rcp85_select = Select(value=rcp85, title='RCP 85', options=sorted(rcp85_models))
source = get_dataset(dic,[rcp45 ,rcp85])
plot = make_plot(source , "Total area burned ")
rcp45_select.on_change('value', update_plot)
rcp85_select.on_change('value', update_plot)
controls = column(rcp45_select, rcp85_select)
curdoc().add_root(row(plot, controls))
curdoc().title = "Total Area burned"
I get my first two lines, but nothing happens when I use the dropdown.
I uploaded a smaller dataset on this GitHub page if someone wants to try out the data.
Well, I can't say with 100% certainty since I can't run the code without the data, but I have a decent idea of what the problem might be. The .data attribute of a ColumnDataSource is not actually a simple Python dictionary:
In [4]: s = ColumnDataSource(data=dict(a=[1,2], b=[3,4]))
In [5]: type(s.data)
It's actually a specially wrapped dictionary that can automatically emit event notifications when its contents are changed. This is part of the machinery that lets Bokeh respond and update things automatically in such a handy way. I'm guessing that setting the .data of one source by using the .data of another source is what is somehow causing the problem. I'm hypothesizing that setting .data with something besides a real Python dict prevents the event handlers from getting wired up correctly.
So, a suggestion for an immediate-term workaround: don't construct a ColumnDataSource in get_dataset. Only construct and return a plain Python dictionary. It's possible that df.to_dict will give you exactly the right sort of dict. Or else you can make a dict by hand, putting in the columns you need.
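A minimal sketch of the plain-dictionary workaround, using only pandas so it runs without a Bokeh server. The helper name frame_to_plain_dict and the sample values are hypothetical, and the final assignment to an existing source is shown only as a comment:

```python
import pandas as pd


def frame_to_plain_dict(df):
    """Return {column_name: list_of_values}, with the index included as a column."""
    return df.reset_index().to_dict(orient="list")


# Stand-in for the cumulative series that get_dataset builds (made-up numbers).
df = pd.DataFrame(
    {"CCSM4_rcp45": [1.0, 3.0, 6.0]},
    index=pd.Index([2006, 2007, 2008], name="index"),
)

new_data = frame_to_plain_dict(df)
print(new_data)

# In update_plot, assuming source45 is the ColumnDataSource the glyphs were built from:
# source45.data = new_data
```

Because the column names change with the selected model, glyphs that reference a column by name would need a stable name (for example always "value") for the update to show up; that renaming is left out here.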
And a request: It's possible that this limitation can be fixed up. Or if not, it's certainly possible that a loud warning can be made if a user does this. Please file a bug report on the GitHub issue tracker with all this information.

