Make line thicker in a matplotlib time series 'spaghetti' plot - python

Thanks for reading.
I have a plot and would like to make the latest year in my dataset stand out. My data is one long time series and I want to plot year-over-year (YoY) comparisons, so I pivot it and then plot it.
The first block of code runs and gives me roughly what I am after (without the latest year standing out), then in the second block of code I try to make my latest stand out (which technically works) but the colour is different, doesn't match the legend and can even be the same colour as another year.
I can see the old series in the background. I think I am creating another plot and putting this on top, but how can I select the original line for the latest year (in this case 2018) and just make that stand out?
Or is there a better way to do this whole process?
Any tips on code, formatting or anything would be much appreciated, I am very new to this!
Thanks so much!
13sen1
FIRST BLOCK
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
df_pivot.plot(title='Data by Year', figsize=(6,4))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.show()
[image: result of the first block]
SECOND BLOCK
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
df_pivot.plot(title='Data by Year', figsize=(6,4))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
# plot the thicker last line
# **************** ERROR HERE *************************
plt.plot(df_pivot.iloc[:, -1:], lw=4, ls='--')
# **************** ERROR HERE *************************
plt.show()
[image: result of the second block]

You can make the line of the last year thicker. Because columns are sorted, it will be the last line in the axes (index -1).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='01-01-2019', freq='M')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by month in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values='Data')
# plot
ax = df_pivot.plot(title='Data by Year', figsize=(6,4))
ax.get_lines()[-1].set_linewidth(5)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.figure.tight_layout()
plt.show()
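If relying on the position of the line feels fragile, a variation is to look the line up by its legend label. This is only a sketch: it assumes pandas labels each line with the string form of its column name (here the year), and it uses `freq="MS"` and a seeded generator purely to keep the fake data reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# fake monthly series, same shape as in the question (2012 through 2018)
index = pd.date_range(start="2012-01-01", end="2018-12-01", freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Data": rng.standard_normal(len(index))}, index=index)

# months in rows, years in columns
df_pivot = pd.pivot_table(df, index=df.index.month, columns=df.index.year, values="Data")

ax = df_pivot.plot(title="Data by Year", figsize=(6, 4))

# pick the line whose label matches the latest year instead of relying on position
latest = str(df_pivot.columns.max())
highlighted = [line for line in ax.get_lines() if line.get_label() == latest]
for line in highlighted:
    line.set_linewidth(5)

# build the legend after restyling so the legend entry shows the thicker line
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ax.figure.tight_layout()
```

Because the existing line is restyled rather than plotted again, its colour still matches the legend.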

Related

Setting the axes date formatting of a pandas stacked bar subplot

I'm attempting to plot a pandas stacked bar plot with the x axis showing Months on the major ticks, or years on Jan 1, ideally with small ticks identifying the weeks but with no label.
I have a dataset with a datetime index that was then grouped by week and then I plot that dataset. If I don't attempt to control the settings the dates show up but are vertical and don't fit. So I used the set formatter to fix that but then the axes changed to 1970 as if following an index number instead of date. If I replace the pandas plotting with a regular bar chart, the "ConciseDateFormatter" works as desired/expected. But I wanted to use stacked with pandas as creating a regular stacked bar chart is a pain. I don't understand why I can't control pandas axes like I can a regular plot.
One thing I notice is that the index is shown with dtype object. If I convert it with to_datetime(), it adds a 00:00 time component that I don't want on the axes or in my data.
My data is a simple set of weekly random data:
date A B C D
3/20/2022 1.540765154 0.504616419 1.543679189 2.952934623
3/27/2022 1.781135128 4.594966635 4.799026389 3.499803401
4/3/2022 0.254059207 0.69835265 0.323039575 1.628138491
4/10/2022 3.112760301 0.287056897 4.372938373 0.130817579
4/17/2022 0.497273044 0.913246096 1.296612207 1.250610278
4/24/2022 1.370087689 3.124985109 4.322253295 4.49571603
5/1/2022 3.952629538 3.976896924 1.679311114 1.265443147
5/8/2022 3.470328161 1.266161308 3.990502436 1.364929959
5/15/2022 2.296588269 4.639761391 0.04685036 1.438471692
5/22/2022 3.443458637 2.66592719 0.968656871 2.349325343
5/29/2022 1.820278464 4.794211675 2.435710815 2.156110694
6/5/2022 4.328825266 0.049132356 1.842839099 3.665701299
6/12/2022 0.184631564 0.412976815 4.787477069 4.80052839
6/19/2022 4.846734385 3.471474741 1.808871854 2.440013553
6/26/2022 1.612870444 0.70191857 3.55713114 1.438699834
7/3/2022 2.896859156 4.025996887 0.209608767 4.174881655
Code:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
maxval = 200
values = ['A','B','C','D']
cum = [v + '_CUM' for v in values]
df = pd.read_csv('test_data.csv', index_col='date', parse_dates=True,
                 infer_datetime_format=True)
#df.index = pd.to_datetime(df.index.date).strftime("%b %d")
df = df.join(df.cumsum(), rsuffix="_CUM")
df = df.join(df[cum]/maxval * 100, rsuffix="_LIFE")
fig, axs = plt.subplots(nrows=2, ncols=1, sharex=False, squeeze=False,
                        facecolor='white')
axs = axs.flatten()
ax = axs[0]
df[values].plot.bar(ax=ax, grid=True, stacked=True, legend=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(
    mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
# ax.xaxis.set_tick_params(rotation = 0)
plt.show(block=False)
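A likely reason for the 1970 dates is that pandas bar plots place the bars at integer positions 0, 1, 2, ... and only label them with the index values, so a date formatter reads those positions as days since the epoch. One possible workaround, sketched below with made-up data shaped like the sample above, is to stack the bars with plain matplotlib so that real datetimes sit on the x axis. The bar width of 5 days and the Sunday minor ticks are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic weekly data shaped like the sample above (values are made up)
dates = pd.date_range("2022-03-20", periods=16, freq="W")
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.uniform(0, 5, size=(16, 4)), index=dates, columns=list("ABCD"))

fig, ax = plt.subplots(facecolor="white")
bottom = np.zeros(len(df))
for col in df.columns:
    # real datetimes on x, so matplotlib's date locators and formatters apply
    ax.bar(df.index, df[col].to_numpy(), width=5, bottom=bottom, label=col)
    bottom += df[col].to_numpy()  # next column is stacked on top of this one

locator = mdates.MonthLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
# minor ticks mark each week (Sundays here) and carry no label by default
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))
ax.grid(True)
ax.legend()
```

The running `bottom` array is what makes the bars stacked; after the loop it holds the total height of each week's bar.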

Any way to correctly make weekly time series line chart in matplotlib?

I am trying to make a line chart that visualizes a product's export and sales activity from weekly data. Basically, I want to see how the export volumes of different commodities change week by week. I was able to aggregate the data for a line chart of export trends for the top 5 countries, but the plot from my attempt is not what I expected. Can anyone point out how to make this right? Is there a better way to make a product export trend line chart using matplotlib or seaborn in Python?
my current attempt
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['weekEndingDate','country', 'commodity'])['weeklyExports'].sum().unstack().reset_index()
df_grp = df_grp .fillna(0)
for c in df_grp[['FCF_Beef', 'FCF_Pork']]:
    fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
    df_grp_new = df_grp.groupby(['country', 'weekEndingDate'])[c].sum().unstack().fillna(0)
    df_grp_new = df_grp_new.T
    df_grp_new.drop([col for col, val in df_grp_new.sum().iteritems() if val < 1000], axis=1, inplace=True)
    for col in df_grp_new.columns:
        sns.lineplot(x='WeekEndingDate', y='weekly export', ci=None, data=df_grp_new, label=col)
    ax.relim()
    ax.autoscale_view()
    ax.xaxis.label.set_visible(False)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    plt.ylabel('weekly export')
    plt.margins(x=0)
    plt.title(c)
    plt.tight_layout()
    plt.grid(True)
    plt.show()
    plt.close()
But this attempt didn't produce the output I expected. Essentially, I want to see the weekly exports of different commodities, such as beef and pork, for different countries as a weekly time series. Can anyone suggest what went wrong in my code, and how I can get a proper line chart from the above data?
desired output
Here is an example of the desired plots (style only) that I want to make:
There are plenty of ways to do it. If you convert your time column to datetime, seaborn will handle formatting the axis for you.
You could use a FacetGrid to split by commodity, or, if you want finer control over the individual charts, filter the df by commodity first and plot each subset with lineplot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
url = 'https://gist.githubusercontent.com/adamFlyn/e9ad428a266eccb5dc38b4cee7084372/raw/cfcbe9cf0ed19ada6a4ea409644db7414de9c87f/sales_df.csv'
df = pd.read_csv(url)
df.drop(columns=['Unnamed: 0'], inplace=True)
df['weekEndingDate'] = pd.to_datetime(df['weekEndingDate'])
# sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='commodity', height=8, sharex=False, sharey=False, legend_out=True)
g.map_dataframe(sns.lineplot, x='weekEndingDate',y='weeklyExports', hue='country', ci=None)
g.add_legend()

How to create a min-max lineplot by month

I have time series data of retail beef ad counts, and I want to make a stacked line chart that shows, on a three-week average basis, the average number of ads that grocers posted per store last week. I managed to aggregate the data for plotting and tried to make the line chart I want, but the result is not informative enough to understand. How can I achieve this in matplotlib? Can anyone suggest what I should change in my current attempt?
reproducible data and current attempt
Here is minimal reproducible data that I used in my current attempt:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import timedelta, datetime
url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
df_grp["percentage"] = df_grp.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
df_grp = df_grp.reset_index(level=[0,1])
for item in df_grp['retail_item'].unique():
    dd = df_grp[df_grp['retail_item'] == item].groupby(['date', 'percentage'])[['number_of_ads']].sum().reset_index(level=[0,1])
    dd['weakly_change'] = dd[['percentage']].rolling(7).mean()
    fig, ax = plt.subplots(figsize=(8, 6), dpi=144)
    sns.lineplot(dd.index, 'weakly_change', data=dd, ax=ax)
    ax.set_xlim(dd.index.min(), dd.index.max())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
    plt.gcf().autofmt_xdate()
    plt.style.use('ggplot')
    plt.xticks(rotation=90)
    plt.show()
Current Result
But I couldn't get the line chart I expected. I want to reproduce the plot from this site. Is that achievable?
desired plot
Here is an example of the desired plot that I want to make from this minimal reproducible data:
I don't know what changes to make to my current attempt to get the desired plot above. Does anyone know a way of doing this in matplotlib? Any help would be appreciated. Thanks.
Also see How to create a min-max plot by month with fill_between?
See in-line comments for details
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
#################################################################
# setup from question
url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
df_grp["percentage"] = df_grp.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
df_grp = df_grp.reset_index(level=[0,1])
#################################################################
# create a month map from long to abbreviated calendar names
month_map = dict(zip(calendar.month_name[1:], calendar.month_abbr[1:]))
# update the month column name
df_grp['month'] = df_grp.date.dt.month_name().map(month_map)
# set month as categorical so they are plotted in the correct order
df_grp.month = pd.Categorical(df_grp.month, categories=month_map.values(), ordered=True)
# use groupby to aggregate min mean and max
dfmm = df_grp.groupby(['retail_item', 'month'])['percentage'].agg(['max', 'min', 'mean']).stack().reset_index(level=[2]).rename(columns={'level_2': 'mm', 0: 'vals'}).reset_index()
# create a palette map for line colors
cmap = {'min': 'k', 'max': 'k', 'mean': 'b'}
# iterate through each retail item and plot the corresponding data
for g, d in dfmm.groupby('retail_item'):
    plt.figure(figsize=(7, 4))
    sns.lineplot(x='month', y='vals', hue='mm', data=d, palette=cmap)
    # select only min or max data for fill_between
    y1 = d[d.mm == 'max']
    y2 = d[d.mm == 'min']
    plt.fill_between(x=y1.month, y1=y1.vals, y2=y2.vals, color='gainsboro')
    # add lines for specific years
    for year in [2016, 2018, 2020]:
        data = df_grp[(df_grp.date.dt.year == year) & (df_grp.retail_item == g)]
        sns.lineplot(x='month', y='percentage', ci=None, data=data, label=year)
    plt.ylim(0, 100)
    plt.margins(0, 0)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    plt.ylabel('Percentage of Ads')
    plt.title(g)
    plt.show()

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my Pandas Dataframe for each rows (based on row indexes) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering whether I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside it. Is it feasible, as in the following picture, to calculate the mean and show it either in the legend or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
       ax=ax, kind='scatter',
       c='level_0', colormap='tab20',
       colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) DataFrame into a Series with a two-level (row, col) index.
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the label of the x-axis to what you want.
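To make the reshaping concrete, here is a tiny sketch on a made-up 2 x 3 frame standing in for the sliced data (the values are invented, not taken from the sample file):

```python
import pandas as pd

# tiny stand-in for the sliced frame: 2 rows (cycles) x 3 columns (readings)
C = pd.DataFrame([[20.0, 21.0, 22.0], [30.0, 31.0, 32.0]])

stacked = C.stack()               # Series indexed by (row, column)
long_df = stacked.reset_index()   # index levels become columns level_0, level_1
long_df.columns = ["level_0", "level_1", "Temperature"]

print(long_df)
```

Every value ends up on its own row, with level_0 recording the original row, which is what lets the scatter plot colour the points by row.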
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be only an approximate answer. The c= and cmap= arguments of scatter can be used to get the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

How to select a specific date range in a csv file with pandas (again)?

I looked at the responses to the original question (see here), but they don't seem to solve my issue.
import pandas as pd
import pandas_datareader.data
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv(mypath + filename, \
skiprows=4,index_col=0,usecols=['Day', 'Cushing OK Crude Oil Future Contract 1 Dollars per Barrel'], \
skipfooter=0,engine='python')
df.index = pd.to_datetime(df.index)
fig = plt.figure(figsize=plt.figaspect(0.25))
ax = fig.add_subplot(1,1,1)
ax.grid(axis='y',color='lightgrey', linestyle='--', linewidth=0.5)
ax.grid(axis='x',color='lightgrey', linestyle='none', linewidth=0.5)
df['Cushing OK Crude Oil Future Contract 1 Dollars per Barrel'].plot(ax=ax, grid=True,
    color='blue', fontsize=14, legend=False)
plt.show()
The graph turns out fine but I can't figure out a way to show only a certain date range. I have tried everything.
type(df) = pandas.core.frame.DataFrame
type(df.index) = pandas.core.indexes.datetimes.DatetimeIndex
also, the format for the column 'Day' is YYYY-MM-DD
Assuming you have a datetime index on your dataframe (it looks that way), you can slice using .loc like so:
%matplotlib inline
import pandas as pd
import numpy as np
data = pd.DataFrame({'values': np.random.rand(31)}, index = pd.date_range('2018-01-01', '2018-01-31'))
# Plot the entire dataframe.
data.plot()
# Plot a slice of the dataframe.
data.loc['2018-01-05':'2018-01-10', 'values'].plot(legend = False)
Gives:
The orange series is the slice.
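As a rough sketch of the selection step on its own, the snippet below shows three ways to pick a date range from a frame with a DatetimeIndex. The frame and the column name `price` are made up for illustration; in the question's case the column would be the crude oil price column.

```python
import numpy as np
import pandas as pd

# made-up daily series for 2018
rng = np.random.default_rng(2)
prices = pd.DataFrame(
    {"price": rng.random(365)},
    index=pd.date_range("2018-01-01", periods=365, freq="D"),
)

# label-based slicing on a DatetimeIndex includes both endpoints
window = prices.loc["2018-03-01":"2018-03-31", "price"]

# partial string indexing: every row in March 2018
march = prices.loc["2018-03", "price"]

# boolean mask: same rows here, and it does not need a sorted index
mask = (prices.index >= "2018-03-01") & (prices.index <= "2018-03-31")
masked = prices.loc[mask, "price"]
```

Calling `.plot(ax=ax)` on any of the three results should then draw only that range on the existing axes.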
