Values are plotted out of order with a datetime axis - python

This is the code for showing the 'Close' prices for Amazon, and visualizing the data using matplotlib.pyplot in Python.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('amzn_close.csv')
df = df.set_index(pd.DatetimeIndex(df['Date'].values))
plt.figure(figsize=(16,8))
plt.plot(df['Close'], label='Close')
plt.title('Close Price')
plt.xlabel('Date')
plt.ylabel('Price USD')
plt.show()
Unfortunately, it came out like this, with squiggly lines:
Can someone help me to present and visualize the data correctly?
amzn_close.csv
Date,Close
2021-03-05,3000.46
2021-04-01,3161.0
2021-03-17,3135.73
2021-02-23,3194.5
2021-03-10,3057.64
2021-03-16,3091.86
2021-03-18,3027.99
2021-02-25,3057.16
2021-04-15,3379.09
2021-03-22,3110.87
2021-04-14,3333.0
2021-03-25,3046.26
2021-03-24,3087.07
2021-02-26,3092.93
2021-04-20,3334.69
2021-04-19,3372.01
2021-04-16,3399.44
2021-04-08,3299.3
2021-03-08,2951.95
2021-03-30,3055.29
2021-03-02,3094.53
2021-03-09,3062.85
2021-02-24,3159.53
2021-02-22,3180.74
2021-04-22,3309.04
2021-03-01,3146.14
2021-03-15,3081.68
2021-03-26,3052.03
2021-04-05,3226.73
2021-03-31,3094.08
2021-03-03,3005.0
2021-04-23,3340.88
2021-04-26,3409.0
2021-03-19,3074.96
2021-03-23,3137.5
2021-04-21,3362.02
2021-03-29,3075.73
2021-04-12,3379.39
2021-04-07,3279.39
2021-04-13,3400.0
2021-04-27,3417.43
2021-04-06,3223.82
2021-03-12,3089.49
2021-03-11,3113.59
2021-03-04,2977.57
2021-04-09,3372.2

Two things to always verify are:
The 'Date' should be set as a datetime64[ns] or DatetimeIndex dtype, using pd.to_datetime(), pd.DatetimeIndex(), or by parsing dates when importing the data.
The value column, 'Close', should be a numeric dtype
Check the dtypes with df.info()
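Both checks can also be scripted instead of read off df.info(). The following is only a minimal sketch: it assumes three rows copied from amzn_close.csv are supplied inline through io.StringIO rather than read from disk, and the variable names are illustrative.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# a few rows from amzn_close.csv, inlined so the example is self-contained
csv_text = """Date,Close
2021-03-05,3000.46
2021-04-01,3161.0
2021-02-23,3194.5
"""

# without parse_dates, 'Date' is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
date_is_datetime_before = is_datetime64_any_dtype(raw['Date'])

# convert explicitly, then re-check both columns
raw['Date'] = pd.to_datetime(raw['Date'])
date_is_datetime_after = is_datetime64_any_dtype(raw['Date'])
close_is_numeric = is_numeric_dtype(raw['Close'])
```

If the first check is False, the axis is being built from strings rather than dates, which is worth ruling out before looking at the sort order.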
Use matplotlib directly
In this case, only the index needs to be sorted, because the y-axis is already numeric and the x-axis is already a datetime dtype.
The csv can also be read in with:
df = pd.read_csv('amzn_close.csv', parse_dates=['Date'], index_col=['Date'])
# load and format the data
df = pd.read_csv('amzn_close.csv')
df = df.set_index(pd.DatetimeIndex(df['Date'].values))
# sort the index
df.sort_index(inplace=True)
# plot
plt.figure(figsize=(16, 7))
plt.plot(df['Close'], label='Close')
plt.title('Close Price')
plt.xlabel('Date')
plt.ylabel('Price USD')
plt.show()
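To see why the sort matters: plt.plot connects the points in row order, so an unsorted index makes the line double back on itself. A small sketch, assuming four rows of the csv built inline, that checks the order before and after sort_index:

```python
import pandas as pd

# four rows from the csv, in their original (unsorted) order
df = pd.DataFrame(
    {'Close': [3000.46, 3161.0, 3135.73, 3194.5]},
    index=pd.DatetimeIndex(['2021-03-05', '2021-04-01', '2021-03-17', '2021-02-23']),
)

# False here means a line plot would zigzag back and forth in time
sorted_before = df.index.is_monotonic_increasing

df = df.sort_index()
sorted_after = df.index.is_monotonic_increasing
first_date = df.index[0]
```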
Use pandas.DataFrame.plot
Doesn't require sorting the index
# load the data
df = pd.read_csv('amzn_close.csv', parse_dates=['Date'], index_col=['Date'])
# plot the data
df.plot(figsize=(16, 7), title='Close Price', ylabel='Price USD', rot=0, legend=False)
Use seaborn.lineplot
Doesn't require sorting the index
import seaborn as sns
# load the data
df = pd.read_csv('amzn_close.csv', parse_dates=['Date'], index_col=['Date'])
# plot
fig, ax = plt.subplots(figsize=(16, 7))
sns.lineplot(data=df, ax=ax, legend=False)
ax.set(title='Close Price', xlabel='Date', ylabel='Price USD')
Output for all implementations
Notes
Package versions:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4

Related

How to create a yearly bar plot grouped by months

I'm having a difficult time trying to create a bar plot with a DataFrame grouped by year and month. With the following code I'm trying to plot the data in the created figure, but instead it is drawn in a second figure. I also tried to move the legend to the right and change its values to the corresponding month names.
I have started to get a feel for the DataFrames obtained with the groupby command, but I'm not getting what I expected, so I'm asking here.
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv('fcc-forum-pageviews.csv', index_col='date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
plt.show()
This is the format of the data that I am analyzing.
date,value
2016-05-09,1201
2016-05-10,2329
2016-05-11,1716
2016-05-12,10539
2016-05-13,6933
Add a sorted categorical 'month' column with pd.Categorical
Transform the dataframe to a wide format with pd.pivot_table where aggfunc='mean' is the default.
Wide format is typically best for plotting grouped bars.
pandas.DataFrame.plot returns matplotlib.axes.Axes, so there's no need to use fig, ax = plt.subplots(figsize=(10,10)).
The pandas .dt accessor is used to extract various components of 'date', which must be a datetime dtype
If 'date' is not a datetime dtype, then transform it with df.date = pd.to_datetime(df.date).
Tested with python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2
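As a quick illustration of the .dt accessor point, here is a minimal sketch. It assumes a tiny frame in the same date,value format as the question; the second and third rows are made up for the example.

```python
import pandas as pd

# hypothetical rows in the same format as the question's data
df = pd.DataFrame({'date': ['2016-05-09', '2016-06-10', '2017-01-11'],
                   'value': [1201, 2329, 1716]})

# string columns have no .dt accessor, so convert first
df['date'] = pd.to_datetime(df['date'])

# components of the datetime column
years = df['date'].dt.year.tolist()
month_names = df['date'].dt.strftime('%B').tolist()
```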
Imports and Test Data
import pandas as pd
from calendar import month_name # conveniently supplies a list of sorted month names or you can type them out manually
import numpy as np # for test data
# test data and dataframe
np.random.seed(365)
rows = 365 * 3
data = {'date': pd.bdate_range('2021-01-01', freq='D', periods=rows), 'value': np.random.randint(100, 1001, size=(rows))}
df = pd.DataFrame(data)
# select data within specified quantiles
df = df[df.value.gt(df.value.quantile(0.025)) & df.value.lt(df.value.quantile(0.975))]
# display(df.head())
date value
0 2021-01-01 694
1 2021-01-02 792
2 2021-01-03 901
3 2021-01-04 959
4 2021-01-05 528
Transform and Plot
# list of month names in calendar order
months = month_name[1:]
# create the month column
df['months'] = pd.Categorical(df.date.dt.strftime('%B'), categories=months, ordered=True)
# if 'date' has been set to the index, as stated in the comments, use the following instead:
# df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)
# pivot the dataframe into the correct shape
dfp = pd.pivot_table(data=df, index=df.date.dt.year, columns='months', values='value')
# display(dfp.head())
months January February March April May June July August September October November December
date
2021 637.9 595.7 569.8 508.3 589.4 557.7 508.2 545.7 560.3 526.2 577.1 546.8
2022 567.9 521.5 625.5 469.8 582.6 627.3 630.4 474.0 544.1 609.6 526.6 572.1
2023 521.1 548.5 484.0 528.2 473.3 547.7 525.3 522.4 424.7 561.3 513.9 602.3
# plot
ax = dfp.plot(kind='bar', figsize=(12, 4), ylabel='Mean Page Views', xlabel='Year', rot=0)
_ = ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
Just pass the ax you defined to pandas:
bar_plot.plot(ax = ax, kind='bar')
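A small sketch of what passing ax changes, assuming a made-up two-year frame in place of the real grouped data and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

plt.close('all')  # start from a clean state

# hypothetical stand-in for the grouped and unstacked data
bar_plot = pd.DataFrame({1: [10, 20], 2: [15, 25]}, index=[2016, 2017])

fig, ax = plt.subplots(figsize=(10, 10))
returned_ax = bar_plot.plot(ax=ax, kind='bar')

# the bars were drawn on the existing axes and no second figure was opened
same_axes = returned_ax is ax
figure_count = len(plt.get_fignums())
```

Without ax=ax, pandas opens its own figure, which is where the unexpected second image comes from.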
If you also want to replace months numbers with names, you have to get those labels, replace numbers with names and re-define the legend by passing to it the new labels:
handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))
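As an aside, the same number-to-name mapping can be sketched with calendar.month_name instead of building datetime.date objects. This assumes the legend labels come back as the strings '1' to '12'; the labels list below is a hypothetical sample.

```python
from calendar import month_name

# sample of legend labels as returned by get_legend_handles_labels()
labels = ['1', '2', '12']

# month_name[0] is an empty string, so the month number indexes it directly
new_labels = [month_name[int(label)] for label in labels]
```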
Complete Code
import pandas as pd
from matplotlib import pyplot as plt
import datetime
df = pd.read_csv('fcc-forum-pageviews.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(ax = ax, kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))
plt.show()
(plot generated with fake data)

Any way to correctly make weekly time series line chart in matplotlib?

I am trying to make a line chart that visualizes a product's export and sales activity using weekly data. Basically, I want to see how the export numbers of different commodities change week by week. I was able to aggregate the data for a line chart of the export trends of different commodities for the top-5 countries, but the resulting plot did not match my expected output. Can anyone point out how to make this right? Is there a better way to make a product export trend line chart using matplotlib or seaborn in Python?
my current attempt
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['weekEndingDate','country', 'commodity'])['weeklyExports'].sum().unstack().reset_index()
df_grp = df_grp .fillna(0)
for c in df_grp[['FCF_Beef', 'FCF_Pork']]:
    fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
    df_grp_new = df_grp.groupby(['country', 'weekEndingDate'])[c].sum().unstack().fillna(0)
    df_grp_new = df_grp_new.T
    df_grp_new.drop([col for col, val in df_grp_new.sum().iteritems() if val < 1000], axis=1, inplace=True)
    for col in df_grp_new.columns:
        sns.lineplot(x='WeekEndingDate', y='weekly export', ci=None, data=df_grp_new, label=col)
    ax.relim()
    ax.autoscale_view()
    ax.xaxis.label.set_visible(False)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    plt.ylabel('weekly export')
    plt.margins(x=0)
    plt.title(c)
    plt.tight_layout()
    plt.grid(True)
    plt.show()
    plt.close()
But this attempt didn't produce my expected output. Essentially, I want to see how the weekly exports of different commodities, like beef and pork, change over time for different countries. Can anyone suggest what went wrong in my code? How can I get the desired line chart from the above data?
desired output
Here is an example of the desired plots (style only):
There are plenty of ways to do it. If you convert your time column to datetime, seaborn will handle formatting the axis for you.
You could use a FacetGrid to split by commodity or, if you want finer control over the individual charts, filter the df by commodity first and plot each one using lineplot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df['weekEndingDate'] = pd.to_datetime(df['weekEndingDate'])
# sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='commodity', height=8, sharex=False, sharey=False, legend_out=True)
g.map_dataframe(sns.lineplot, x='weekEndingDate',y='weeklyExports', hue='country', ci=None)
g.add_legend()
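The second option mentioned above, filtering by commodity first, could look roughly like the sketch below. The column names and commodity values follow sales_df.csv as used in the question, while the rows and country names are made up so the example does not depend on the download.

```python
import pandas as pd

# hypothetical rows using the same column names as sales_df.csv
df = pd.DataFrame({
    'weekEndingDate': ['2020-01-04', '2020-01-04', '2020-01-11', '2020-01-11', '2020-01-04'],
    'country': ['Japan', 'Mexico', 'Japan', 'Mexico', 'Japan'],
    'commodity': ['FCF_Beef', 'FCF_Beef', 'FCF_Beef', 'FCF_Beef', 'FCF_Pork'],
    'weeklyExports': [100, 50, 120, 70, 30],
})
df['weekEndingDate'] = pd.to_datetime(df['weekEndingDate'])

# keep one commodity, then reshape to one column per country
beef = df[df['commodity'] == 'FCF_Beef']
beef_wide = beef.pivot_table(index='weekEndingDate', columns='country',
                             values='weeklyExports', aggfunc='sum')

# beef_wide.plot() would then draw one line per country on a datetime axis
```

Repeating the filter for each commodity gives one wide frame per chart, so each can be styled separately.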

Lineplot doesn't show all dates in axis

I have the following:
fig, ax = plt.subplots(figsize=(40, 10))
sns.lineplot(x="Date", y="KFQ imports", data=df_dry, color="BLACK", ax=ax)
sns.lineplot(x="Date", y="QRR imports", data=df_dry, color="RED",ax=ax)
ax.set(xlabel="Date", ylabel="Value", )
x_dates = df_dry['Date'].dt.strftime('%b-%Y')
ax.set_xticklabels(labels=x_dates, rotation=45)
Result
When I use a bar chart (sns.barplot), the entire range of dates is shown. Am I missing something for the line chart?
The idea would be to set the xticks to exactly the dates in your dataframe. To this end you can use set_xticks(df.Date.values). It might then be good to use a custom formatter for the dates, which lets you format them the way you want.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
import seaborn as sns
df = pd.DataFrame({"Date" : ["2018-01-22", "2018-04-04", "2018-12-06"],
"val" : [1,2,3]})
df.Date = pd.to_datetime(df.Date)
ax = sns.lineplot(data=df, x="Date", y="val", marker="o")
ax.set(xticks=df.Date.values)
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b-%Y"))
plt.show()
Note how the same can be achieved without seaborn, as
ax = df.set_index("Date").plot(x_compat=True, marker="o")
ax.set(xticks=df.Date.values)
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b-%Y"))
plt.show()
One way to circumvent the date tick sparsification from Seaborn is to convert the date column to string values.
df['dates_str'] = df.dates.astype(str)
Plotting using the new column as the x-axis will show all the dates.
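A minimal sketch of the string conversion, assuming the column is called Date as in the question and using strftime rather than astype(str) so the label format can be chosen:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2018-01-22', '2018-04-04', '2018-12-06']),
                   'val': [1, 2, 3]})

# format each date as text; a plot using this column treats x as categories
df['dates_str'] = df['Date'].dt.strftime('%b-%Y')
labels = df['dates_str'].tolist()
```

The trade-off is that a string axis is categorical, so the points are typically spaced evenly regardless of the real gap between dates.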

Make line thicker in a matplotlib time series 'spaghetti' plot

Thanks for reading.
I have a plot and would like to make the latest year in my dataset stand out. My data is just one long time series, so I want to plot YoY comparisons, so I pivot it, then plot it.
The first block of code runs and gives me roughly what I am after (without the latest year standing out), then in the second block of code I try to make my latest stand out (which technically works) but the colour is different, doesn't match the legend and can even be the same colour as another year.
I can see the old series in the background. I think I am creating another plot and putting this on top, but how can I select the original line for the latest year (in this case 2018) and just make that stand out?
Or is there a better way to do this whole process?
Any tips on code, formatting or anything would be much appreciated, I am very new to this!
Thanks so much!
13sen1
FIRST BLOCK
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
df_pivot.plot(title='Data by Year', figsize=(6,4))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.show()
SECOND BLOCK
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
df_pivot.plot(title='Data by Year', figsize=(6,4))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
# plot the thicker last line
# **************** ERROR HERE *************************
plt.plot(df_pivot.iloc[:, -1:], lw=4, ls='--')
# **************** ERROR HERE *************************
plt.show()
You can make the line of the last year thicker. Because columns are sorted, it will be the last line in the axes (index -1).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
ax = df_pivot.plot(title='Data by Year', figsize=(6,4))
ax.get_lines()[-1].set_linewidth(5)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.figure.tight_layout()
plt.show()
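If relying on the position of the last line feels fragile, the line can also be picked by its label. This is only a sketch under a few assumptions: a made-up three-year pivot, the Agg backend so it runs without a display, and pandas labelling each line with the column name converted to a string.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical pivot: months in rows, years in columns
rng = np.random.default_rng(0)
df_pivot = pd.DataFrame(rng.standard_normal((12, 3)),
                        index=range(1, 13), columns=[2016, 2017, 2018])

ax = df_pivot.plot(title='Data by Year', figsize=(6, 4))

# find the line whose label matches the latest year and thicken only that one
latest = str(df_pivot.columns.max())
for line in ax.get_lines():
    if str(line.get_label()) == latest:
        line.set_linewidth(4)

# rebuild the legend afterwards so it reflects the new width
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

highlighted = [line.get_label() for line in ax.get_lines() if line.get_linewidth() == 4]
```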

How to format x axis in matplotlib when plotting pandas series with timestamp as index? [duplicate]

Compare the following code:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
I added DateFormatter in the end:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n\n%a')) ## Added this line
The issue with the second graph is that it starts on 5-24 instead of 5-25. Also, 5-25 of 2017 is a Thursday, not a Monday. What is causing the issue? Is this timezone related? (I don't understand why the date numbers are stacked on top of each other either.)
In general the datetime utilities of pandas and matplotlib are incompatible. So trying to use a matplotlib.dates object on a date axis created with pandas will in most cases fail.
One reason is e.g. seen from the documentation
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25.
However, this is not the only difference and it is thus advisable not to mix pandas and matplotlib when it comes to datetime objects.
There is, however, the option to tell pandas not to use its own datetime format. In that case, using the matplotlib.dates tickers is possible. This can be enabled via:
df.plot(x_compat=True)
Since pandas does not provide sophisticated formatting capabilities for dates, one can use matplotlib for plotting and formatting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
df = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
df['date'] = pd.to_datetime(df['date'])
usePandas=True
#Either use pandas
if usePandas:
    df = df.set_index('date')
    df.plot(x_compat=True)
    plt.gca().xaxis.set_major_locator(dates.DayLocator())
    plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
    plt.gca().invert_xaxis()
    plt.gcf().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
    plt.plot(df["date"], df["ratio1"])
    plt.gca().xaxis.set_major_locator(dates.DayLocator())
    plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
    plt.gca().invert_xaxis()
plt.show()
Updated using the matplotlib object oriented API
usePandas=True
#Either use pandas
if usePandas:
    df = df.set_index('date')
    ax = df.plot(x_compat=True, figsize=(6, 4))
    ax.xaxis.set_major_locator(dates.DayLocator())
    ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
    ax.invert_xaxis()
    ax.get_figure().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot('date', 'ratio1', data=df)
    ax.xaxis.set_major_locator(dates.DayLocator())
    ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
    # invert_xaxis is an Axes method, not a Figure method
    ax.invert_xaxis()
plt.show()
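To cross-check that the tick labels on the finished plot are right, the expected day and weekday strings can be computed directly with pandas, independent of any plotting. A minimal sketch using the three dates from the question; the one-line label format is an illustrative choice:

```python
import pandas as pd

# the three dates from the question, in their original order
dates_idx = pd.to_datetime(['20170527', '20170526', '20170525'])

# day of month and abbreviated weekday for each date
expected_labels = dates_idx.strftime('%d %a').tolist()
```

If the ticks on the plot disagree with these strings, the formatter is most likely interpreting the axis values on a different date scale than the one used to draw the data.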
