How can I plot different length pandas series with matplotlib? - python

I've got two pandas series, one with a 7-day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7-day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7-day rolling mean series, which has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!

The problem is that pandas bar plots are categorical (the bars sit at consecutive integer positions, and the dates are only used as tick labels). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
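To make the categorical behaviour concrete, here is a minimal sketch with made-up data (the name `monthly` and its values are assumptions, not the asker's series). It only inspects where pandas places the bars:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly series with a DatetimeIndex
monthly = pd.Series(
    np.linspace(0.1, 1.0, 12),
    index=pd.date_range("2016-01-01", periods=12, freq="MS"),
)

ax = monthly.plot(kind="bar")

# The bar centres are plain integers 0..11; the dates are only tick labels
centres = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(centres)
print(list(ax.get_xticks()))
```

A line plotted on the same axes with real dates as x values would therefore land far away from these integer positions.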
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
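Since the question also asks how to create two such series for a minimal working example, here is one possible sketch. It assumes both series derive from a single daily series; the names `daily`, `rolling_7d` and `monthly_avg` and the random values are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
daily = pd.Series(rng.random(len(days)), index=days)  # made-up daily values

# 7-day rolling mean: the first 6 entries are NaN because the window is incomplete
rolling_7d = daily.rolling(7).mean()

# Monthly average: one value per month, labelled with the first day of the month
monthly_avg = daily.resample("MS").mean()

print(rolling_7d.head(8))
print(monthly_avg)
```

The two results have different lengths (one value per day versus one per month), which is the situation described in the question.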

The problem might be the two series' indices are of very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()


How to plot a variable dataframe

I have a dataframe with a variable number of stock prices. In other words, I have to be able to plot the entire Dataframe, because I may encounter 1 to 10 stocks prices.
The x axis are dates, the Y axis are Stock prices. Here is a sample of my Df:
df = pd.DataFrame(all_Assets)
df2 = df.transpose()
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 40.802803 24.440001 104.500526
2018-12-13 00:00:00-05:00 41.249191 25.119333 104.854965
2018-12-14 00:00:00-05:00 39.929325 24.380667 101.578560
2018-12-17 00:00:00-05:00 39.557732 23.228001 98.570381
2018-12-18 00:00:00-05:00 40.071678 22.468666 99.605057
This is not working
fig = go.Figure(data=go.Scatter(df2, mode='lines'),)
I need to plot this entire dataframe on a single chart, with 3 different lines. But the code has to adapt automatically if there is a fourth stock, a fifth stock, etc. By the way, I want it to be a logarithmic plot.
Plotly ships a sample stock dataset (px.data.stocks()), so let's graph it in wide and long format with Plotly Express, and in wide and long format with graph objects. You can pick whichever of these four variants fits what you need.
express wide format
df.head()
date GOOG AAPL AMZN FB NFLX MSFT
0 2018-01-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 2018-01-08 1.018172 1.011943 1.061881 0.959968 1.053526 1.015988
2 2018-01-15 1.032008 1.019771 1.053240 0.970243 1.049860 1.020524
3 2018-01-22 1.066783 0.980057 1.140676 1.016858 1.307681 1.066561
4 2018-01-29 1.008773 0.917143 1.163374 1.018357 1.273537 1.040708
import plotly.express as px
df = px.data.stocks()
fig = px.line(df, x='date', y=df.columns[1:])
fig.show()
express long format
df_long = df.melt(id_vars='date', value_vars=df.columns[1:],var_name='ticker')
px.line(df_long, x='date', y='value', color='ticker')
graph_objects wide format
import plotly.graph_objects as go
fig = go.Figure()
for ticker in df.columns[1:]:
    fig.add_trace(go.Scatter(x=df['date'], y=df[ticker], name=ticker))
fig.show()
graph_objects long format
fig = go.Figure()
for ticker in df_long.ticker.unique():
    # @ticker refers to the Python loop variable inside the query string
    dff = df_long.query('ticker == @ticker')
    fig.add_trace(go.Scatter(x=dff['date'], y=dff['value'], name=ticker))
fig.show()
I recommend using pandas.DataFrame.plot. A minimal working example for your case should be just
df2.plot()
Then play around with the plot() method and your df2 dataframe to get exactly the output you need.
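As a sketch of how that could look with a logarithmic y axis, using made-up column names (`STOCK_A` etc. are placeholders; the values are rounded from the sample in the question): `DataFrame.plot` draws one line per column, so the same call works for any number of stocks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd

# Hypothetical frame: one column per stock, dates as the index
dates = pd.date_range("2018-12-12", periods=5, freq="B")
prices = pd.DataFrame(
    {
        "STOCK_A": [40.80, 41.25, 39.93, 39.56, 40.07],
        "STOCK_B": [24.44, 25.12, 24.38, 23.23, 22.47],
        "STOCK_C": [104.50, 104.85, 101.58, 98.57, 99.61],
    },
    index=dates,
)

# One line per column; logy=True switches the y axis to a log scale
ax = prices.plot(logy=True)
ax.set_ylabel("Close (log scale)")
```

Adding a fourth or fifth column to `prices` needs no change to the plotting call.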

Plot boxplots for minute - and hourly data using pandas and seaborn

I've got the following dataframe:
dfB1
Date_and_time MP
2020-08-28 19:05:00.066663676 75.0
2020-08-28 19:05:00.133330342 70.0
2020-08-28 19:05:00.199997008 76.0
2020-08-28 19:05:00.266663674 85.0
2020-08-28 19:05:00.333330340 73.0
... ...
2020-08-29 01:59:50.666414770 1454.0
2020-08-29 01:59:50.733081436 1359.0
2020-08-29 01:59:50.799748102 1320.0
2020-08-29 01:59:50.866414768 1217.0
2020-08-29 01:59:50.933081434 1246.0
373364 rows × 1 columns
My goal is to create a plot which displays boxplots for every 1, 5 or 30 minutes, or even every hour. The DatetimeIndex is in the correct format to index by hour (data was collected at 15 Hz, which means the datapoints are 66666666 nanoseconds apart).
dfB1.index
DatetimeIndex(['2020-08-28 19:05:00.066663676',
'2020-08-28 19:05:00.133330342',
'2020-08-28 19:05:00.199997008',
'2020-08-28 19:05:00.266663674',
'2020-08-28 19:05:00.333330340',
...
'2020-08-29 01:59:50.666414770',
'2020-08-29 01:59:50.733081436',
'2020-08-29 01:59:50.799748102',
'2020-08-29 01:59:50.866414768',
'2020-08-29 01:59:50.933081434'],
dtype='datetime64[ns]', name='Date_and_time', length=373364, freq='66666666N')
I've tried plotting using seaborn, and I get a result. But I can't interact with the plot and it is also plotted very poorly. I am familiar with plotly, but I can't seem to find a way to integrate plotly. Also, the minute plot is completely wrong. I only get 59 points on the x-axis. What should I do to interact with the plots and to get boxplots every minute (or every 5 minutes)? I've also read and tried functions described here: Box plot of hourly data in Time Series Python
import seaborn as sns
fig, ax = plt.subplots(figsize=(15,5))
sns.boxplot(x=dfB1.index.hour, y=dfB1['MP'], ax=ax)
.hour gives only the hour of the day, so both 2020-01-01 00:00 and 2020-01-10 00:00 will give 0. I think you want .floor:
sns.boxplot(x=dfB1.index.floor('H'), y=dfB1['MP'], ax=ax)
and also:
sns.boxplot(x=dfB1.index.floor('5Min'), y=dfB1['MP'], ax=ax)
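The difference between the two groupings can be checked without plotting. This is a small sketch on a made-up index (two full days at one sample per minute, not the 15 Hz data from the question); it uses the lowercase "h" alias, which recent pandas versions prefer:

```python
import pandas as pd

# Hypothetical index: two full days, one sample per minute
idx = pd.date_range("2020-08-28", periods=2 * 24 * 60, freq="min")

by_hour_of_day = idx.hour       # 0..23, the same labels repeat on every day
by_hour = idx.floor("h")        # one distinct timestamp per hour
by_5min = idx.floor("5min")     # one distinct timestamp per 5-minute bin

print(by_hour_of_day.nunique(), by_hour.nunique(), by_5min.nunique())
# 24 48 576
```

The same reasoning explains why grouping by `.minute` can never give more than 60 boxes, however long the recording is.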

Python matplotlib: data labels for multiple line graphs

In an existing thread (Annotate Time Series plot in Matplotlib), they annotate a single line graph. I am after annotation of multiple line graphs that share the same x-axis. I have two data frames which look as follows:
df:
Value
Week
2020-04-05 0.330967
2020-04-12 1.307075
2020-04-19 2.406805
2020-04-26 2.562565
2020-05-03 2.868995
2020-05-10 5.174968
2020-05-17 5.734933
2020-05-24 6.903961
2020-05-31 7.205925
2020-06-07 9.960470
2020-06-14 11.106135
2020-06-21 12.356842
2020-06-28 13.247175
2020-07-05 13.600287
2020-07-12 15.098775
2020-07-19 16.754835
2020-07-26 18.596575
2020-08-02 20.118878
2020-08-09 21.168825
2020-08-16 21.201978
2020-08-23 21.784821
2020-08-30 22.329772
2020-09-06 23.981835
2020-09-13 23.981835
2020-09-20 23.981835
df2:
Value
Date
2020-09-27 29.003255
2020-10-04 29.642155
2020-10-11 30.872583
2020-10-18 32.492713
2020-10-25 33.436226
2020-11-01 35.187827
2020-11-08 35.589155
2020-11-15 37.185094
2020-11-22 37.575597
2020-11-29 39.273018
2020-12-06 40.047140
2020-12-13 41.621320
2020-12-20 42.563794
2020-12-27 43.750932
2021-01-03 44.823089
2021-01-10 45.797449
2021-01-17 47.109407
2021-01-24 48.045107
2021-01-31 49.472744
2021-02-07 50.355325
2021-02-14 51.717578
2021-02-21 52.602765
2021-02-28 53.886987
2021-03-07 54.888933
2021-03-14 56.108036
2021-03-21 57.226216
2021-03-28 58.345462
I want to plot these two data frames as line graphs and show the data labels on the graph. For this purpose, I was following this article (https://queirozf.com/entries/add-labels-and-text-to-matplotlib-plots-annotation-examples) to plot labels on the line graph. As I have two different data frames, I tried a slightly different method to get the values of xs and ys. Here is my code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
ys = np.array([df.index,df2.index])
xs = np.array([df.Value,df2.Value])
fig, ax = plt.subplots(figsize=(12,6))
ax.plot(df.index,df['Value'],'-',color='c')
ax.plot(df2.index,df2['Value'],'-',color='g')
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, (x,y), textcoords="offset points", ha='center')
plt.show()
When I ran the above code, it gave me the following error:
TypeError: unsupported format string passed to DatetimeIndex.__format__
Could anyone point out where I am making the mistake?
The problems can be solved by keeping things clearer. Specifically, you build arrays that combine data from the two data frames, and then you sometimes use those and sometimes use the original data frames, so things get confused (note also that xs holds the values and ys holds the dates, so the format call receives a DatetimeIndex).
Instead, I'd suggest keeping the data frames separate throughout, since you clearly treat them as distinct (you plot them in different colors), and looping over the data frames so you don't duplicate code. So something like this:
df0 = pd.read_csv("data5001.csv", sep=r"\s+")  # how I read in my copy of the data; use whatever you have here
df1 = pd.read_csv("data5002.csv", sep=r"\s+")
fig, ax = plt.subplots(figsize=(16,8))  # basically what you have
ax.plot(df0['Date'], df0['Value'], '-', color='c')
ax.plot(df1['Date'], df1['Value'], '-', color='g')
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
for df in (df0, df1):  # loop through the dataframes
    for index, v in df.iterrows():  # loop through the rows of each frame
        label = "{:.2f}".format(v['Value'])  # I assume you want the value and not the date
        plt.annotate(label, (v['Date'], v['Value']), ha='center')
I won't address the over-crowding problems since that's an entirely separate question.
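For readers without the CSV files, here is a self-contained sketch of the same idea. It assumes, as in the question, that the dates live in the index rather than in a 'Date' column; the frame names `first` and `second` and the shortened data are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, shortened versions of the two frames (dates in the index)
first = pd.DataFrame(
    {"Value": [0.33, 1.31, 2.41]},
    index=pd.to_datetime(["2020-04-05", "2020-04-12", "2020-04-19"]),
)
second = pd.DataFrame(
    {"Value": [29.00, 29.64, 30.87]},
    index=pd.to_datetime(["2020-09-27", "2020-10-04", "2020-10-11"]),
)

fig, ax = plt.subplots(figsize=(12, 6))
for frame, colour in ((first, "c"), (second, "g")):
    ax.plot(frame.index, frame["Value"], "-", color=colour)
    # Format the numeric value (not the date) and place the label just above the point
    for when, value in frame["Value"].items():
        ax.annotate(f"{value:.2f}", (when, value),
                    textcoords="offset points", xytext=(0, 8), ha="center")

labels = [t.get_text() for t in ax.texts]
print(labels)
```

Each label is built from a single float, so the format string never sees a DatetimeIndex, which is what triggered the TypeError in the question.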

Pandas dataframe groupby plot

I have a dataframe which is structured as:
Date ticker adj_close
0 2016-11-21 AAPL 111.730
1 2016-11-22 AAPL 111.800
2 2016-11-23 AAPL 111.230
3 2016-11-25 AAPL 111.790
4 2016-11-28 AAPL 111.570
...
8 2016-11-21 ACN 119.680
9 2016-11-22 ACN 119.480
10 2016-11-23 ACN 119.820
11 2016-11-25 ACN 120.740
...
How can I plot based on the ticker the adj_close versus Date?
For a simple plot, you can use:
df.plot(x='Date',y='adj_close')
Or you can set the index to be Date beforehand, then it's easy to plot the column you want:
df.set_index('Date', inplace=True)
df['adj_close'].plot()
If you want a chart with one series by ticker on it
You need to groupby before:
df.set_index('Date', inplace=True)
df.groupby('ticker')['adj_close'].plot(legend=True)
If you want a chart with individual subplots:
grouped = df.groupby('ticker')
ncols=2
nrows = int(np.ceil(grouped.ngroups/ncols))
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12,4), sharey=True)
for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
    grouped.get_group(key).plot(ax=ax)
ax.legend()
plt.show()
Similar to Julien's answer above, I had success with the following:
fig, ax = plt.subplots(figsize=(10,4))
for key, grp in df.groupby('ticker'):  # a plain string key keeps the label a string rather than a tuple
    ax.plot(grp['Date'], grp['adj_close'], label=key)
ax.legend()
plt.show()
This solution might be more relevant if you want more control in matplotlib.
Solution inspired by: https://stackoverflow.com/a/52526454/10521959
The question is How can I plot based on the ticker the adj_close versus Date?
This can be accomplished by reshaping the dataframe to a wide format with .pivot or .groupby, or by plotting the existing long form dataframe directly with seaborn.
In the following sample data, the 'Date' column has a datetime64[ns] Dtype.
Convert the Dtype with pandas.to_datetime if needed.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
Imports and Sample Data
import pandas as pd
import pandas_datareader as web # for sample data; this can be installed with conda if using Anaconda, otherwise pip
import seaborn as sns
import matplotlib.pyplot as plt
# sample stock data, where .iloc[:, [5, 6]] selects only the 'Adj Close' and 'tkr' column
tickers = ['aapl', 'acn']
df = pd.concat((web.DataReader(ticker, data_source='yahoo', start='2020-01-01', end='2022-06-21')
                .assign(ticker=ticker) for ticker in tickers)).iloc[:, [5, 6]]
# display(df.head())
Date Adj Close ticker
0 2020-01-02 73.785904 aapl
1 2020-01-03 73.068573 aapl
2 2020-01-06 73.650795 aapl
3 2020-01-07 73.304420 aapl
4 2020-01-08 74.483604 aapl
# display(df.tail())
Date Adj Close ticker
1239 2022-06-14 275.119995 acn
1240 2022-06-15 281.190002 acn
1241 2022-06-16 270.899994 acn
1242 2022-06-17 275.380005 acn
1243 2022-06-21 282.730011 acn
pandas.DataFrame.pivot & pandas.DataFrame.plot
pandas plots with matplotlib as the default backend.
Reshaping the dataframe with pandas.DataFrame.pivot converts from long to wide form, and puts the dataframe into the correct format to plot.
.pivot does not aggregate data, so if there is more than 1 observation per index, per ticker, then use .pivot_table
Adding subplots=True will produce a figure with two subplots.
# reshape the long form data into a wide form
dfp = df.pivot(index='Date', columns='ticker', values='Adj Close')
# display(dfp.head())
ticker aapl acn
Date
2020-01-02 73.785904 203.171112
2020-01-03 73.068573 202.832764
2020-01-06 73.650795 201.508224
2020-01-07 73.304420 197.157654
2020-01-08 74.483604 197.544434
# plot
ax = dfp.plot(figsize=(11, 6))
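The same reshaping can be tried without downloading anything. This is a minimal sketch on a few rows typed in from the question's sample (the names `df_long` and `wide` are made up here):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd

# A few rows in long form, copied from the question's sample
df_long = pd.DataFrame({
    "Date": pd.to_datetime(["2016-11-21", "2016-11-22", "2016-11-23"] * 2),
    "ticker": ["AAPL"] * 3 + ["ACN"] * 3,
    "adj_close": [111.73, 111.80, 111.23, 119.68, 119.48, 119.82],
})

# Long to wide: one row per date, one column per ticker
wide = df_long.pivot(index="Date", columns="ticker", values="adj_close")
print(wide)

# One line per ticker column
ax = wide.plot(figsize=(11, 6))
```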
Use seaborn, which accepts long form data, so reshaping the dataframe to a wide form isn't necessary.
seaborn is a high-level api for matplotlib
sns.lineplot: axes-level plot
fig, ax = plt.subplots(figsize=(11, 6))
sns.lineplot(data=df, x='Date', y='Adj Close', hue='ticker', ax=ax)
sns.relplot: figure-level plot
Adding row='ticker', or col='ticker', will generate a figure with two subplots.
g = sns.relplot(kind='line', data=df, x='Date', y='Adj Close', hue='ticker', aspect=1.75)

Remove interpolation Time series plot for missing values

I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I dont want the interpolation between missing values!
How can I achieve this?
Since you are using pandas you could do something like this:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
# float dtype so that NaN can be assigned below
vals = pd.Series(np.random.randint(1, 10, size=idx.size), index=idx, dtype=float)
vals.iloc[4:8] = np.nan
print(vals)
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
clean = vals.dropna()
fig, ax = plt.subplots()
ax.plot(range(clean.size), clean)
ax.set_xticks(range(clean.size))  # one tick per remaining point, so the labels line up
ax.set_xticklabels(clean.index.date.tolist())
fig.autofmt_xdate()
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that do not trigger matplotlib's internal date processing when you call .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.
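A different option from the one above, sketched here as an assumption about what the asker may want: if the dates are simply absent from the index and the goal is a visible gap on a real date axis, the series can be reindexed to a regular daily frequency. The missing days become NaN, and matplotlib does not draw line segments through NaN values. The data and the names `sparse` and `full` are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series where four days are missing from the index entirely
present = pd.to_datetime(["2016-03-29", "2016-03-30", "2016-03-31", "2016-04-01",
                          "2016-04-06", "2016-04-07"])
sparse = pd.Series([4.0, 7.0, 6.0, 5.0, 9.0, 1.0], index=present)

# Reindex to one row per day; the missing days are filled with NaN
full = sparse.asfreq("D")

fig, ax = plt.subplots()
(line,) = ax.plot(full.index, full.values, "g", linewidth=2.0, label="Data")
fig.autofmt_xdate()
```

This keeps the x axis as true dates, at the cost of leaving empty space where data is missing.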
