I'm trying to create an animated line plot to illustrate the price increase of 3 different asset classes over time (year), but it doesn't work and I don't know why!
What I've done so far:
get closing price data for each asset
start = datetime.datetime(2010, 7, 1)
end = datetime.datetime(2021, 7, 1)
data = pdr.get_data_yahoo(['BTC-USD', 'GC=F', '^GSPC'], start, end)['Adj Close']
transpose columns into rows to avoid a lot of calculations
data['Date'] = data.index
data['Year'] = data.index.year
dataNew =data.melt(['Date', 'Year'], var_name='Asset')
dataNew = dataNew.rename(columns = {'value': 'Price'})
plot
fig = px.line(dataNew,
              x='Date',
              y='Price',
              range_y=[0, 50000],
              color='Asset',
              animation_frame='Year')
st.write(fig)
Output:
Short answer:
This is in fact very much possible, and the only addition you'll have to make to a standard px.line() time series setup with a date axis is this:
# input data
dfi = px.data.stocks().head(50)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start, obs):
    dfa = dfi.head(i).copy()
    dfa['ix'] = i
    df = pd.concat([df, dfa])
The cool details:
Contrary to what seems to be the most common belief, and contrary to my own comments just a few minutes ago, this is in fact possible to do with px.line and a set of time series like the ones you're describing, as long as you massage the dataset just a little bit. The only real drawback seems to be that it might not work very well for larger datasets, since every animation frame stores its own copy of the data up to that point, so the figure structure grows very large. But let's get back to the boring details after the cool stuff. The snippet below and the dataset px.data.stocks() will produce the following figure:
Plot 1 - Animation using the play button:
When the animation has come to an end, you can also subset the lines however you'd like.
Plot 2 - Animation using the slider:
The boring details:
I'll get back to this if the OP or anyone else is interested
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
# input data
dfi = px.data.stocks().head(50)
dfi['date'] = pd.to_datetime(dfi['date'])
start = 12
obs = len(dfi)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start, obs):
    dfa = dfi.head(i).copy()
    dfa['ix'] = i
    df = pd.concat([df, dfa])
# plotly figure
fig = px.line(df, x='date', y=['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'],
              animation_frame='ix',
              # template='plotly_dark',
              width=1000, height=600)
# attribute adjustments
fig.layout.updatemenus[0].buttons[0]['args'][1]['frame']['redraw'] = True
fig.show()
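The restructuring step can also be written without growing a DataFrame inside the loop. The following is only a sketch of that idea, not part of the answer above: `cumulative_frames` and the `demo` data are hypothetical names made up for illustration. Unlike the loop above, which stops at `obs - 1` rows, this version also emits a final frame containing every row.

```python
import pandas as pd


def cumulative_frames(df, start=1, frame_col="ix"):
    """Stack growing prefixes of df, tagging each prefix with its frame number.

    Hypothetical helper: frame i holds the first i rows of df.
    """
    frames = [
        df.head(i).assign(**{frame_col: i})
        for i in range(start, len(df) + 1)
    ]
    # A single concat at the end avoids repeatedly copying the growing result.
    return pd.concat(frames, ignore_index=True)


# Small made-up dataset standing in for the real price data.
demo = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
})

stacked = cumulative_frames(demo, start=2)
print(stacked.groupby("ix").size())
```

The resulting frame can be passed to px.line with animation_frame='ix' in the same way as `df` in the complete code. The row count still grows roughly quadratically with the number of observations, so the size caveat for larger datasets applies here too.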
I want to make weekly line charts for different categories, where each category is a different country. Initially, I was able to draft line plots using seaborn, but it is not very handy for things like setting labels, legends, the color palette and so on. I am wondering whether there is an easy way to reshape this data with multiple categorical variables and render line charts. In my initial attempt I tried seaborn.relplot, but it is not easy to tune its parameters and the resulting plot is hard to customize. Can anyone point me to an efficient way to reshape a dataframe with multiple categorical columns and render a clear line chart? Any thoughts?
reproducible data & my attempt:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'])
dff.drop('Unnamed: 0', axis=1, inplace=True)
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.isocalendar().week
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
df_ = df.groupby(['country', 'year', 'week'], as_index=False)['value'].sum()
sns.relplot(data=df_, x='week', y='value', hue='year', row='country', kind='line', height=6, aspect=2, facet_kws={'sharey': False, 'sharex': False}, sizes=(20, 10))
current plot
this is one of current plot that I made with seaborn.relplot
The structure of the plot is okay for me, but seaborn.relplot makes it hard to tune parameters and it is not as flexible as using matplotlib directly. Also, I realized that the way I aggregate my data is not very efficient. I think there might be a shortcut to make the above code snippet more efficient, like:
plt_data = []
for i in dff.loc[:, ['FCF_Beef','FCF_Beef']]:
...
but doing this way I faced a couple of issues to make the right plot. Can anyone point me out how to make this simple and efficient in order to make the expected line chart with matplotlib? Does anyone know any better way of doing this? Any idea? Thanks
desired output
In my desired plot, I first need to iterate over the list of countries, where each country gets one subplot. In each subplot, the x-axis shows the 52 weeks and the y-axis shows the weekly export amount, with one line per year for that country. Here is a draft plot that I made with seaborn.relplot.
Note that I don't like the output from seaborn.relplot, so I am wondering how I can make the above attempt more efficient with matplotlib. Any idea?
As requested by the OP, following is an iterative way to plot the data.
The following example plots each year, for a given 'destination' in a single figure
This is similar to the answer for this question.
import pandas as pd
import matplotlib.pyplot as plt
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
df = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
# groupby destination and iterate through for plotting
for g, d in df.groupby('destination'):
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data['week'] = data.weekly.dt.isocalendar().week  # create a week column
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter, if it causes an error, use print
        data.plot(x='week', y='FCF_Beef', ax=ax, label=year)
    plt.show()
Single sample plot
If we look at the tail of one of the dataframes, data.weekly.dt.isocalendar().week is putting the last day of the year in week 1, so a line is drawn back to the last data point, which is placed at week 1.
This function rests on datetime.datetime(2018, 12, 31).isocalendar() and is the expected behavior from the datetime module, as per this closed pandas bug.
Removing the last row with .iloc[:-1, :], is a work around
Alternatively, replace data['week'] = data.weekly.dt.isocalendar().week with data['week'] = data.weekly.dt.strftime('%W').astype('int')
data.iloc[:-1, :].plot(x='week', y='FCF_Beef', ax=ax, label=year)
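To see the difference between the two week-numbering schemes in isolation, here is a small sketch using only the standard library. It checks the same date mentioned above, 2018-12-31; the variable names are just for illustration.

```python
import datetime

last_day = datetime.date(2018, 12, 31)

# ISO 8601 numbering: this Monday already belongs to week 1 of ISO year 2019.
iso_year, iso_week, iso_weekday = last_day.isocalendar()

# '%W' numbering: weeks start on Monday and never roll over into the next year.
monday_week = int(last_day.strftime('%W'))

print(iso_year, iso_week)  # 2019 1
print(monday_week)         # 53
```

With '%W' the week number keeps increasing to the end of the calendar year, which is why the line no longer jumps back to week 1.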
Updated with all code from OP
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.strftime('%W').astype('int')
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
# groupby country and iterate through for plotting
for g, d in df.groupby('country'):
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter, if it causes an error, use print
        data.plot(x='week', y='value', ax=ax, label=year, title=g)
    plt.show()
I am currently using plotly express to create a Sunburst Chart. However, I realized that children are ordered alphabetically for nominal values. For plotting months in particular, that is pretty unfortunate... Do you know how to handle that issue? Maybe a property or some workaround? Below is an example so you can try it yourself. Thanks in advance!
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for m in months:
    data.append(['2018', m, 2])
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
Please Check this out. I have just added values to each months instead of hardcoding 2. So the corresponding month matches with corresponding number.
January-1, February-2, ... December-12
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for i, m in enumerate(months):
    data.append(['2018', m, i+1])
print(data)
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
The other solution gives each month an angle proportional to its number. A small tweak to line 8 as follows:
data.append(['2018', m,0.00001*i+1])
gives each month the same sized piece of the pie.
A better solution is to disable the auto-sorting of the elements:
fig.update_traces(sort=False, selector=dict(type='sunburst'))
which then adds the elements in the order that they are defined in the data.
I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found adding error bars have been based on formulas or other equal amounts over/under, but in my case the max/min will be different so I'm not sure how to integrate that. Possible just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
              'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
                  & (dfClean.Site_Cluster == 'Lakes')
                  & (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='24H')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
sd1 = (y1*2)
sd2 = (y1*2)
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output:
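The example above uses the same error above and below each point. For the min/max case in the question, matplotlib's Axes.errorbar also accepts yerr as a pair of arrays (distance below, distance above). The following is only a sketch under assumptions: the `obs` values are made up, and only the column names Observation_Year and Day_of_Year are taken from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up observations for a single phenophase (assumption, not the OP's data).
obs = pd.DataFrame({
    "Observation_Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Day_of_Year": [150, 160, 176, 140, 155, 158],
})

stats = obs.groupby("Observation_Year")["Day_of_Year"].agg(["min", "mean", "max"])

# errorbar expects non-negative distances from the plotted value, not absolute limits.
lower = (stats["mean"] - stats["min"]).to_numpy()
upper = (stats["max"] - stats["mean"]).to_numpy()

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats["mean"], yerr=[lower, upper], marker="o", capsize=4)
ax.set_xlabel("Observation year")
ax.set_ylabel("Day of year")
```

For several phenophases, the same call can be repeated once per phenophase on the same ax, passing a different label each time.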
First of all I am sorry if I am not describing the problem correctly but the example should make my issue clear.
I have this dataframe and I need to plot it sorted by date, but I have lots of dates (around 60), so pandas automatically chooses which dates to label on the x-axis, and the chosen dates look random. For readability I also want to label only selected dates on the x-axis, but I want them to follow a pattern, such as January of every year.
This is my code:
df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 32543]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if df1.index.contains('entry'):
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if df2.index.contains('entry'):
    df2.T[['entry','sum']].plot(rot = 30)
else:
    df2.T[['sum']].plot(kind = 'bar')
ax1 = plt.axes()
ax1.legend(["Seitenzugriffe", "Dateiabrufe"])
plt.xlabel("")
plt.savefig('image.png')
As you can see, the plot has 2010-08, 2013-09, 2014-07 as the x-axis values. How can I make it something like 2010-01, 2013-01, 2014-01, etc.?
Thank you very much, I know this is not the optimal description but since english is not my first language this is the best I could come up with.
NOTE: Updated to answer OP question more directly.
You are mixing Pandas plotting with both the matplotlib PyPlot API and the object-oriented API, by using axes methods (ax1 above) as well as plt methods. The latter two are distinctly different APIs and they may not work correctly when mixed. The matplotlib documentation recommends using the object-oriented API.
While it is easy to quickly generate plots with the matplotlib.pyplot module, we recommend using the object-oriented approach for more control and customization of your plots. See the methods in the matplotlib.axes.Axes() class for many of the same plotting functions. For examples of the OO approach to Matplotlib, see the API Examples.
Here's how you can control the x-axis "tick" values/labels using proper matplotlib date formatting (see matplotlib example) with the object-oriented API. Also, see the link from @ImportanceOfBeingErnest's answer to another question for incompatibilities between Pandas' and matplotlib's datetime objects.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# prepare your data
# note: in pandas >= 1.3 use on_bad_lines='skip' instead of error_bad_lines=False
df = pd.read_csv('../../../so/dbo.Access_Stat_all.csv', error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])
df.head()
df1 = df[df['Resource_ID'] == 10021]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
# Index.contains() no longer exists in current pandas; "in" works in all versions
if 'entry' in df1.index:
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
# note: in pandas >= 2.0 use pd.concat([df1, df2.to_frame().T]) instead of append
df2 = df1.append(df2)
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if 'entry' in df2.index:
    # convert your index to use pandas datetime format
    df3 = df2.T[['entry','sum']].copy()
    df3.index = pd.to_datetime(df3.index)
    # for illustration, I changed a couple dates and added some dummy values
    # (a single .loc[row, column] call avoids chained assignment)
    df3.loc['2014-01-01', 'entry'] = 48
    df3.loc['2014-05-01', 'entry'] = 28
    df3.loc['2015-05-01', 'entry'] = 36
    print(df3)
    # plot your data
    fig, ax = plt.subplots()
    # use matplotlib date formatters
    years = mdates.YearLocator()  # every year
    yearsFmt = mdates.DateFormatter('%Y-%m')
    # format the major ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    ax.plot(df3)
    # add legend
    ax.legend(["Seitenzugriffe", "Dateiabrufe"])
    fig.savefig('image.png')
else:
    # left as an exercise...
    df2.T[['sum']].plot(kind = 'bar')
Using Bokeh 0.8.1, how can I display a long time series, but start 'zoomed-in' on one part, while keeping the rest of the data available for scrolling?
For instance, considering the following time series (IBM stock price since 1980), how could I get my chart to initially display only the price since 01/01/2014?
Example code :
import pandas as pd
import bokeh.plotting as bk
from bokeh.models import ColumnDataSource
bk.output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,save"
# Quandl data, too lazy to generate some random data
df = pd.read_csv('https://www.quandl.com/api/v1/datasets/GOOG/NYSE_IBM.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
#Generating a bokeh source
source = ColumnDataSource()
dtest = {}
for col in df:
    dtest[col] = df[col]
source = ColumnDataSource(data=dtest)
# plotting stuff !
p = bk.figure(title='title', tools=TOOLS,x_axis_type="datetime", plot_width=600, plot_height=300)
p.line(y='Close', x='Date', source=source)
bk.show(p)
outputs :
but i want to get this (which you can achieve with the box-zoom tool - but I'd like to immediately start like this)
So, it looks (as of 0.8.1) that we need to add some more convenient ways to set ranges with datetime values. That said, although this is a bit ugly, it does currently work for me:
import time, datetime
x_range = (
    time.mktime(datetime.datetime(2014, 1, 1).timetuple())*1000,
    time.mktime(datetime.datetime(2016, 1, 1).timetuple())*1000
)
p = bk.figure(
    title='title', tools=TOOLS, x_axis_type="datetime",
    plot_width=600, plot_height=300, x_range=x_range
)
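The x_range values above are milliseconds since the Unix epoch. Note that time.mktime interprets the datetime in the machine's local timezone, so the exact numbers depend on where the code runs. A minimal sketch of a timezone-independent variant follows, assuming the dates should be read as UTC; `to_epoch_ms` is a hypothetical helper name, not a Bokeh function.

```python
import calendar
import datetime


def to_epoch_ms(dt):
    """Convert a naive datetime, interpreted as UTC, to milliseconds since the Unix epoch."""
    return calendar.timegm(dt.timetuple()) * 1000


x_range = (
    to_epoch_ms(datetime.datetime(2014, 1, 1)),
    to_epoch_ms(datetime.datetime(2016, 1, 1)),
)
print(x_range)
```

The resulting tuple can be passed as x_range to bk.figure exactly as in the snippet above.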