Using Bokeh 0.8.1, how can i display a long timeserie, but start 'zoomed-in' on one part, while keeping the rest of data available for scrolling ?
For instance, considering the following time serie (IBM stock price since 1980), how could i get my chart to initially display only price since 01/01/2014 ?
Example code :
import pandas as pd
import bokeh.plotting as bk
from bokeh.models import ColumnDataSource
bk.output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,save"
# Quandl data, too lazy to generate some random data
df = pd.read_csv('https://www.quandl.com/api/v1/datasets/GOOG/NYSE_IBM.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
#Generating a bokeh source
source = ColumnDataSource()
dtest = {}
for col in df:
dtest[col] = df[col]
source = ColumnDataSource(data=dtest)
# plotting stuff !
p = bk.figure(title='title', tools=TOOLS,x_axis_type="datetime", plot_width=600, plot_height=300)
p.line(y='Close', x='Date', source=source)
bk.show(p)
outputs :
but i want to get this (which you can achieve with the box-zoom tool - but I'd like to immediately start like this)
So, it looks (as of 0.8.1) that we need to add some more convenient ways to set ranges with datetime values. That said, although this is a bit ugly, it does currently work for me:
import time, datetime
x_range = (
time.mktime(datetime.datetime(2014, 1, 1).timetuple())*1000,
time.mktime(datetime.datetime(2016, 1, 1).timetuple())*1000
)
p = bk.figure(
title='title', tools=TOOLS,x_axis_type="datetime",
plot_width=600, plot_height=300, x_range=x_range
)
Related
I have a Pandas dataframe representing portfolio weights in multiple dates, such as the following contents in CSV format:
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
For each row in the Pandas dataframe resulting from this CSV, we can generate a bar chart with the portfolio composition at that day. I would like to have multiple bar charts, with a time slider, such that we can choose one of the dates and see the portfolio composition during that day.
Can this be achieved with Plotly?
I could not find a way to do it straight in the dataframe above, but it is possible to do it by "melting" the dataframe. The following code achieves what I was looking for, together with some beautification of the chart:
import pandas as pd
from io import StringIO
import plotly.express as px
string = """
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
"""
df = pd.read_csv(StringIO(string))
df = df.melt(id_vars=['DATE']).sort_values(by = 'DATE')
fig = px.bar(df, x="variable", y="value", animation_frame="DATE")
fig.update_layout(legend_title_text = None)
fig.update_xaxes(title = "Asset")
fig.update_yaxes(title = "Proportion")
fig.update_layout(autosize = True, height = 600)
fig.update_layout(hovermode="x")
fig.update_layout(plot_bgcolor="#F8F8F8")
fig.update_traces(
hovertemplate=
'<i></i> %{y:.2%}'
)
fig.show()
This produces the following:
I'm trying to create an animated line plot to illustrate the price increase of 3 different asset classes over time (year), but it doesn't work and I don't know why!
What I've done so far:
get closing price data for each asset
start = datetime.datetime(2010,7,01)
end = datetime.datetime(2021,7,01)
data = pdr.get_data_yahoo(['BTC-USD', 'GC=F','^GSPC'],startDate,endDate)['Adj Close']
transpose columns into rows to avoid a lot of calculations
data['Date'] = data.index
data['Year'] = data.index.year
dataNew =data.melt(['Date', 'Year'], var_name='Asset')
dataNew = dataNew.rename(columns = {'value': 'Price'})
plot
fig = px.line(dataNew,
x = 'Date',
y = 'Price',
range_y=[0,50000],
color = 'Asset',
animation_frame = 'Year')
st.write(fig)
Output:
Short answer:
This is in fact very much possible, and the only additions you'll have to make to a standard px.line() time series using axes of type date plot setup is this:
# input data
dfi = px.data.stocks().head(50)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
dfa = dfi.head(i).copy()
dfa['ix']=i
df = pd.concat([df, dfa])
The cool details:
Contrary to what seems to be the most common belief, and contrary to my own comments just a few minutes ago, this is in fact possible to do with px.line and a set of time series as you're describing. As long, as you massage the dataset just a little bit. The only real drawback seems to be that it might not work very well for larger datasets, since the amount of data that the figure structure will contain will be huge. But let's get back to the boring details after the cool stuff. The snippet below and the dataset px.data.stocks() will produce the following figure:
Plot 1 - Animation using the play button:
When the animation has come to an end, you can also subset the lines however you'd like.
Plot 2 - Animation using the slider:
The boring details:
I'll get back to this if the OP or anyone else is interested
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
# input data
dfi = px.data.stocks().head(50)
dfi['date'] = pd.to_datetime(dfi['date'])
start = 12
obs = len(dfi)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
dfa = dfi.head(i).copy()
dfa['ix']=i
df = pd.concat([df, dfa])
# plotly figure
fig = px.line(df, x = 'date', y = ['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'],
animation_frame='ix',
# template = 'plotly_dark',
width=1000, height=600)
# attribute adjusments
fig.layout.updatemenus[0].buttons[0]['args'][1]['frame']['redraw'] = True
fig.show()
I'm trying to plot a simple heatmap using bokeh/holoviews. My data (pandas dataframe) has categoricals (on y) and datetime (on x). The problem is that the number of categorical elements is >3000 and the resulting plot appears with messed overlapped tickers on the y axis that makes it totally useless. Currently, is there a reliable way in bokeh to select only a subset of the tickers based on the zoom level?
I've already tried plotly and the result looks perfect but however I need to use bokeh/holoviews and datashader. I want also avoid to replace categoricals with numericals tickers.
I've also tried this solution but actually it doesn't work (bokeh 1.2.0).
This is a toy example representing my use case (Actually here #y is 1000 but it gives the idea)
from datetime import datetime
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
from bokeh.io import output_notebook
output_notebook()
# build sample data
index = pd.date_range(start='1/1/2019', periods=1000, freq='T')
data = np.random.rand(1000,100)
columns = ['col'+ str(n) for n in range(100)]
# initial data format
df = pd.DataFrame(data=data, index=index, columns=columns)
# bokeh
df = df.stack().reset_index()
df.rename(columns={'level_0':'x','level_1':'y', 0:'z'},inplace=True)
df.sort_values(by=['y'],inplace=True)
x = [
date.to_datetime64().astype('M8[ms]').astype('O')
for date in df.x.to_list()
]
data = {
'value': df.z.to_list(),
'x': x,
'y': df.y.to_list(),
'date' : df.x.to_list()
}
p = figure(x_axis_type='datetime', y_range=columns, width=900, tooltips=[("x", "#date"), ("y", "#y"), ("value", "#value")])
p.rect(x='x', y='y', width=60*1000, height=1, line_color=None,
fill_color=linear_cmap('value', 'Viridis256', low=df.z.min(), high=df.z.max()), source=data)
show(p)
Finally, I partially followed the suggestion from James and managed to get it to work using a python callback for the ticker. This solution was hard to find for me. I really searched all the Bokeh docs, examples and source code for days.
The main problem for me is that in the doc is not mentioned how I can use "ColumnDataSource" objects in the custom callback.
https://docs.bokeh.org/en/1.2.0/docs/reference/models/formatters.html#bokeh.models.formatters.FuncTickFormatter.from_py_func
Finally, this helped a lot:
https://docs.bokeh.org/en/1.2.0/docs/user_guide/interaction/callbacks.html#customjs-with-a-python-function.
So, I modified the original code as follow in the hope it can be useful to someone:
from datetime import datetime
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
from bokeh.io import output_notebook
from bokeh.models import FuncTickFormatter
from bokeh.models import ColumnDataSource
output_notebook()
# build sample data
index = pd.date_range(start='1/1/2019', periods=1000, freq='T')
data = np.random.rand(1000,100)
columns_labels = ['col'+ str(n) for n in range(100)]
columns = [n for n in range(100)]
# initial data format
df = pd.DataFrame(data=data, index=index, columns=columns)
# bokeh
df = df.stack().reset_index()
df.rename(columns={'level_0':'x','level_1':'y', 0:'z'},inplace=True)
df.sort_values(by=['y'],inplace=True)
x = [
date.to_datetime64().astype('M8[ms]').astype('O')
for date in df.x.to_list()
]
data = {
'value': df.z.to_list(),
'x': x,
'y': df.y.to_list(),
'y_labels_tooltip' : [columns_labels[k] for k in df.y.to_list()],
'y_ticks' : columns_labels*1000,
'date' : df.x.to_list()
}
cd = ColumnDataSource(data=data)
def ticker(source=cd):
labels = source.data['y_ticks']
return "{}".format(labels[tick])
#p = figure(x_axis_type='datetime', y_range=columns, width=900, tooltips=[("x", "#date{%F %T}"), ("y", "#y_labels"), ("value", "#value")])
p = figure(x_axis_type='datetime', width=900, tooltips=[("x", "#date{%F %T}"), ("y", "#y_labels_tooltip"), ("value", "#value")])
p.rect(x='x', y='y', width=60*1000, height=1, line_color=None,
fill_color=linear_cmap('value', 'Viridis256', low=df.z.min(), high=df.z.max()), source=cd)
p.hover.formatters = {'date': 'datetime'}
p.yaxis.formatter = FuncTickFormatter.from_py_func(ticker)
p.yaxis[0].ticker.desired_num_ticks = 20
show(p)
The result is this:
I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found adding error bars have been based on formulas or other equal amounts over/under, but in my case the max/min will be different so I'm not sure how to integrate that. Possible just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
& (dfClean.Site_Cluster == 'Lakes')
& (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='24H')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
sd1 = (y1*2)
sd2 = (y1*2)
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output:
I have got a pandas dataframe whose columns I want to show as lines in a plot using a Bokeh server. Additionally, I would like to have a slider for shifting one of the lines against the other.
My problem is the update functionality when the slider value changes. I have tried the code from the sliders-example of bokeh, but it does not work.
Here is an example
import pandas as pd
from bokeh.io import vform
from bokeh.plotting import Figure, output_file, show
from bokeh.models import CustomJS, ColumnDataSource, Slider
df = pd.DataFrame([[1,2,3],[3,4,5]])
df = df.transpose()
myindex = list(df.index.values)
mysource = ColumnDataSource(df)
plot = Figure(plot_width=400, plot_height=400)
for i in range(len(mysource.column_names) - 1):
name = mysource.column_names[i]
plot.line(x = myindex, y = str(name), source = mysource)
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1)
def update_data(attrname, old, new):
# Get the current slider values
a = offset.value
temp = df[1].shift(a)
#to finish#
offset.on_change('value', update_data)
layout = vform(offset, plot)
show(layout)
Inside the update_data-function I have to update mysource, but I cannot figure out how to do that. Can anybody point me in the right direction?
Give this a try... change a=offset.value to a=cb_obj.get('value')
Then put source.trigger('change') after you do whatever it is you are trying to do in that update_data function instead of offset.on_change('value', update_data).
Also change offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1, callback=CustomJS.from_py_func(offset))
Note this format I'm using works with flexx installed. https://github.com/zoofio/flexx if you have Python 3.5 you'll have to download the zip file, extract, and type python setup.py install as it isn't posted yet compiled for this version...