I can't get legends to show on my subplots. The subplots themselves render fine and pick up the other formatting I've applied. What am I missing?
If I do a plot for the dataframe alone, it shows the legend. If I add a label to the plot for the subplots, it assigns that label to all three lines.
[Image: plot vs. subplot]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from functools import reduce
%matplotlib notebook
#Source for files
# Per Capita Personal Income
# Ann Arbor https://fred.stlouisfed.org/series/ANNA426PCPI
# MI https://fred.stlouisfed.org/series/MIPCPI
# USA https://fred.stlouisfed.org/series/A792RC0A052NBEA
dfAnnArbor_PCPI = pd.read_csv('PerCapitaPersonalIncomeAnnArborMI.csv', skiprows=1, names=['Date', 'PCPI'])
dfMI_PCPI = pd.read_csv('PerCapitaPersonalIncomeMI.csv', skiprows=1, names=['Date', 'PCPI'])
dfUSA_PCPI = pd.read_csv('PerCapitaPersonalIncomeUSA.csv', skiprows=1, names=['Date', 'PCPI'])
# consolidate three df into one using Date
dfAll = [dfAnnArbor_PCPI, dfMI_PCPI, dfUSA_PCPI]
dfPCPI = reduce(lambda left, right: pd.merge(left, right, on='Date', how='outer'), dfAll)
dfPCPI = dfPCPI.dropna() # drop rows with NaN
dfPCPI.columns = ['Date', 'AnnArbor', 'MI', 'USA'] # rename columns
dfPCPI['Date'] = dfPCPI['Date'].str[:4] # select only year
dfPCPI = dfPCPI.set_index('Date')
dfPCPI_Rel = dfPCPI.apply(lambda x: x / x.iloc[0]) # scale each column by its first value
dfPCPI_Small = dfPCPI.iloc[8:].copy()
dfPCPI_SmRel = dfPCPI_Small.apply(lambda x: x / x.iloc[0])
dfPCPI_SmRel.plot()
fig, ax = plt.subplots(1, 2)
ax0 = ax[0].plot(dfPCPI_Rel, '-', label='a')
ax1 = ax[1].plot(dfPCPI_SmRel, '-', label='test1')
ax[0].legend()
for x in fig.axes:
    for label in x.get_xticklabels():
        label.set_rotation(45)
ax[1].xaxis.set_major_locator(ticker.MultipleLocator(2))
plt.show()
A legend in Matplotlib belongs to a single Axes instance. Therefore, if you want each subplot to have its own legend, you need to call legend() on each Axes. In your case:
ax[0].legend()
ax[1].legend()
Additionally, as you are calling plot(), you may want to use the keyword label in each plot() call so as to have a label for each legend entry.
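As a minimal sketch of both points together (one legend() call per Axes, one label per plot() call), the following uses a small made-up DataFrame in place of dfPCPI_Rel and dfPCPI_SmRel. The data values and the names df and df_small are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the relative income frames (assumption, not the FRED data)
years = [str(y) for y in range(2000, 2010)]
df = pd.DataFrame(
    {
        "AnnArbor": np.linspace(1.0, 1.5, 10),
        "MI": np.linspace(1.0, 1.3, 10),
        "USA": np.linspace(1.0, 1.4, 10),
    },
    index=years,
)
df_small = df.iloc[4:]

fig, ax = plt.subplots(1, 2)
for axes, frame in zip(ax, (df, df_small)):
    # One plot() call per column, so every line carries its own label
    for column in frame.columns:
        axes.plot(frame.index, frame[column], "-", label=column)
    # One legend() call per Axes
    axes.legend()
```

Passing a single label to one plot() call that draws a whole DataFrame gives every line the same label, which is the behaviour described in the question.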
You should try fig.legend() instead of plt.legend()
I have a data frame that contains average concentrations for 4 different sites based on season and year. The code I wrote produces a figure for each site, with four subplots for each season. Year is on the y-axis and concentration is on the x-axis.
Here's the link to my data: https://drive.google.com/file/d/1mVAsjRiFmMXaW0F8HBhadi1ZQPcUGIa7/view?usp=sharing
The issue is that the code automatically plots the subplots as
fall - spring
summer - winter
I want them to plot in chronological order, because that makes more sense than alphabetical:
spring - summer
fall - winter
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.formula.api as smf
import scipy.stats
main_dataframe = pd.read_csv('NOx_sznl.csv')
main_dataframe.rename(columns={'NOx_3168':'Banning NOx', 'NOx_2199':'Palm Springs NOx', 'NOx_2551':'El Centro NOx', 'NOx_3135':'Calexico NOx'}, inplace=True)
col = list(main_dataframe.columns)
col.remove('Year')
col.remove('Season')
for ind,station in enumerate(col):
    df_new = main_dataframe[['Season', 'Year', col[ind]]]
    ###here I tried to reorder the seasons in the dataframe
    df_new = df_new.set_index('Season')
    df_new = df_new.loc[['Spring', 'Summer', 'Fall', 'Winter'], :]
    df_new = df_new.reset_index()
    ###but it didn't change the outcome
    df_new = df_new.set_index('Year')
    # df_new['Betty Jo Mcneece Receiving Home'].astype('float')
    df_new[col[ind]] = df_new[col[ind]]
    grouped = df_new.groupby('Season')
    rowlength = grouped.ngroups/2 # fix up if odd number of groups
    fig, axs = plt.subplots(figsize=(15,10),
                            nrows=2, ncols=int(rowlength), # fix as above
                            gridspec_kw=dict(hspace=0.4))#, sharex='col', sharey='row') # Much control of gridspec
    targets = zip(grouped.groups.keys(), axs.flatten())
    for i, (key, ax) in enumerate(targets):
        ax.plot(grouped.get_group(key)[col[ind]], marker='o', color='orange')
        ax.set_ylim(0,)
        ax.set_yticks(ax.get_yticks(),size=12)
        #ax.set_xlim(2009,2020)
        ax.set_xticks(np.arange(2009,2020,1))
        ax.set_xticklabels(ax.get_xticks(), rotation = 45, size=12)
    fig.suptitle("%s"%col[ind], fontsize=30)
    # ax.set_title('%s')
    plt.subplot(221)
    plt.gca().set_title('Fall', fontsize=20)
    plt.subplot(222)
    plt.gca().set_title('Spring', fontsize=20)
    plt.subplot(223)
    plt.gca().set_title('Summer', fontsize=20)
    plt.subplot(224)
    plt.gca().set_title('Winter', fontsize=20)
    plt.show()
I would appreciate any help rearranging the subplots.
The order of the subplots is given by grouped.groups.keys() in targets = zip(grouped.groups.keys(), axs.flatten()), but the problem is further upstream, in grouped = df_new.groupby('Season'), which is where grouped.groups.keys() comes from. df.groupby() sorts the group keys (here, alphabetically) unless you pass sort=False, so grouped = df_new.groupby('Season', sort=False) should follow the order you provided when you made df_new.
Here is what your code looks like on my end so you can have an exact copy.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.formula.api as smf
import scipy.stats
main_dataframe = pd.read_csv('NOx_sznl.csv')
main_dataframe.rename(columns={'NOx_3168': 'Banning NOx',
'NOx_2199': 'Palm Springs NOx',
'NOx_2551': 'El Centro NOx',
'NOx_3135': 'Calexico NOx'},
inplace=True)
col = list(main_dataframe.columns)
col.remove('Year')
col.remove('Season')
for ind, station in enumerate(col):
    df_new = main_dataframe[['Season', 'Year', col[ind]]]
    ###here I tried to reorder the seasons in the dataframe
    df_new = df_new.set_index('Season')
    df_new = df_new.loc[['Spring', 'Summer', 'Fall', 'Winter'], :]
    df_new = df_new.reset_index()
    ###but it didn't change the outcome
    df_new = df_new.set_index('Year')
    # df_new['Betty Jo Mcneece Receiving Home'].astype('float')
    df_new[col[ind]] = df_new[col[ind]]
    grouped = df_new.groupby('Season', sort=False)
    rowlength = grouped.ngroups/2 # fix up if odd number of groups
    fig, axs = plt.subplots(figsize=(15,10),
                            nrows=2, ncols=int(rowlength), # fix as above
                            gridspec_kw=dict(hspace=0.4))#, sharex='col', sharey='row') # Much control of gridspec
    targets = zip(grouped.groups.keys(), axs.flatten())
    for i, (key, ax) in enumerate(targets):
        ax.plot(grouped.get_group(key)[col[ind]], marker='o', color='orange')
        ax.set_ylim(0,)
        ax.set_yticks(ax.get_yticks(),size=12)
        #ax.set_xlim(2009,2020)
        ax.set_xticks(np.arange(2009,2020,1))
        ax.set_xticklabels(ax.get_xticks(), rotation = 45, size=12)
        ax.set_title(key)
    fig.suptitle("%s"%col[ind], fontsize=30)
    plt.show()
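To see the effect of sort=False in isolation, here is a small sketch on a made-up frame whose rows are already in seasonal order. The column name NOx and the values are assumptions, not the linked data.

```python
import pandas as pd

# Tiny made-up frame (assumption): rows already arranged in seasonal order
df = pd.DataFrame({
    "Season": ["Spring", "Summer", "Fall", "Winter"] * 2,
    "Year": [2009] * 4 + [2010] * 4,
    "NOx": [10.0, 8.0, 9.0, 12.0, 9.5, 7.5, 8.5, 11.0],
})

# Default: group keys come back sorted alphabetically
sorted_keys = [key for key, _ in df.groupby("Season")]
# sort=False: group keys keep their order of first appearance
unsorted_keys = [key for key, _ in df.groupby("Season", sort=False)]

print(sorted_keys)    # ['Fall', 'Spring', 'Summer', 'Winter']
print(unsorted_keys)  # ['Spring', 'Summer', 'Fall', 'Winter']
```

Because sort=False preserves order of first appearance, the reordering with .loc[['Spring', 'Summer', 'Fall', 'Winter'], :] still matters: it is what puts Spring first.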
Question
I have used the secondary_y argument in pd.DataFrame.plot().
While trying to change the fontsize of legends by .legend(fontsize=20), I ended up having only 1 column name in the legend when I actually have 2 columns to be printed on the legend.
This problem (having only 1 column name in the legend) does not take place when I did not use secondary_y argument.
I want all the column names in my dataframe to be printed in the legend, and change the fontsize of the legend even when I use secondary_y while plotting dataframe.
Example
The following example with secondary_y shows only 1 column name A, when I have actually 2 columns, which are A and B.
The fontsize of the legend is changed, but only for 1 column name.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
df.plot(secondary_y = ["B"], figsize=(12,5)).legend(fontsize=20, loc="upper right")
When I do not use secondary_y, then legend shows both of the 2 columns A and B.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
df.plot(figsize=(12,5)).legend(fontsize=20, loc="upper right")
To customize the legend, you can build the plot with Matplotlib's subplots function and a twin axis, then pass both lines to a single legend call:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
#define colors to use
col1 = 'steelblue'
col2 = 'red'
#define subplots
fig,ax = plt.subplots()
#add first line to plot
lns1=ax.plot(df.index,df['A'], color=col1)
#add x-axis label
ax.set_xlabel('dates', fontsize=14)
#add y-axis label
ax.set_ylabel('A', color=col1, fontsize=16)
#define second y-axis that shares x-axis with current plot
ax2 = ax.twinx()
#add second line to plot
lns2=ax2.plot(df.index,df['B'], color=col2)
#add second y-axis label
ax2.set_ylabel('B', color=col2, fontsize=16)
#legend
ax.legend(lns1+lns2,['A','B'],loc="upper right",fontsize=20)
#another solution is to create the legend on the figure:
#fig.legend(['A','B'],loc="upper right")
plt.show()
result:
This is a somewhat late response, but something that worked for me was simply calling plt.legend(fontsize=wanted_fontsize) after the plot function.
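If you would rather keep the pandas secondary_y plot, another option is to collect the lines from both axes and build one legend from them. This is a sketch under the assumption that pandas exposes the secondary axes as ax.right_ax (as described in the pandas plotting documentation); the exact label text pandas gives the secondary line may differ between versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(24 * 3, 2),
                  index=pd.date_range("1/1/2019", periods=24 * 3, freq="h"),
                  columns=["A", "B"])

ax = df.plot(secondary_y=["B"], figsize=(12, 5))

# Column B is drawn on the secondary axes (ax.right_ax), so a plain
# ax.legend() only sees column A. Gather the lines from both axes instead.
lines = list(ax.get_lines()) + list(ax.right_ax.get_lines())
labels = [line.get_label() for line in lines]
legend = ax.legend(lines, labels, fontsize=20, loc="upper right")
```

This replaces the legend pandas created, so both entries are kept and the font size applies to each of them.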
I have the following function:
Say hue="animals" has three categories (dog, bird, horse) and we have two dataframes, df_m and df_f, containing data for male and female animals only, respectively.
The function plots three distplots of y (e.g. y="weight"), one for each value of hue (dog, bird, horse). In each subplot we plot df_m[y] and df_f[y] so that I can compare the weight of male dogs/female dogs, male birds/female birds, male horses/female horses.
If I set distkwargs={"hist":False} when calling the function, the legend entries ["F","M"] disappear for some reason. With distkwargs={"hist":True} the legend is shown.
def plot_multi_kde_cat(self,dfs,y,hue,subkwargs={},distkwargs={},legends=[]):
    """
    Create a subplot multi_kde with categories in the same plot
    dfs: List
        - DataFrames for each category e.g one for male and one for females
    hue: string
        - column for which each category is plotted (in each subplot)
    """
    hues = dfs[0][hue].cat.categories
    if len(hues)==2: #Only two categories
        fig,axes = plt.subplots(1,2,**subkwargs) #Get axes and flatten them
        axes=axes.flatten()
        for ax,hu in zip(axes,hues):
            for df in dfs:
                sns.distplot(df.loc[df[hue]==hu,y],ax=ax,**distkwargs)
            ax.set_title(f"Segment: {hu}")
            ax.legend(legends)
    else: #More than two categories: create a square grid and remove unused axes
        n_rows = int(np.ceil(np.sqrt(len(hues)))) #number of rows
        fig,axes = plt.subplots(n_rows,n_rows,**subkwargs)
        axes = axes.flatten()
        for ax,hu in zip(axes,hues):
            for df in dfs:
                sns.distplot(df.loc[df[hue]==hu,y],ax=ax,**distkwargs)
            ax.set_title(f"Segment: {hu}")
            ax.legend(legends)
        n_remove = len(axes)-len(hues) #number of axes to remove
        if n_remove>0:
            for ax in axes[-n_remove:]:
                ax.set_visible(False)
    fig.tight_layout()
    return fig,axes
You can work around the problem by explicitly providing the label to the distplot. This forces a legend entry for each distplot. ax.legend() then already gets the correct labels.
Here is some minimal sample code to illustrate how everything works together:
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
def plot_multi_kde_cat(dfs, y, hue, subkwargs={}, distkwargs={}, legends=[]):
    hues = np.unique(dfs[0][hue])
    fig, axes = plt.subplots(1, len(hues), **subkwargs)
    axes = axes.flatten()
    for ax, hu in zip(axes, hues):
        for df, legend_label in zip(dfs, legends):
            sns.distplot(df.loc[df[hue] == hu, y], ax=ax, label=legend_label, **distkwargs)
        ax.set_title(f"Segment: {hu}")
        ax.legend()
N = 20
df_m = pd.DataFrame({'animal': np.random.choice(['tiger', 'horse'], N), 'weight': np.random.uniform(100, 200, N)})
df_f = pd.DataFrame({'animal': np.random.choice(['tiger', 'horse'], N), 'weight': np.random.uniform(80, 160, N)})
plot_multi_kde_cat([df_m, df_f], 'weight', 'animal',
                   subkwargs={}, distkwargs={'hist': False}, legends=['male', 'female'])
plt.show()
plt.show()
I would like to plot certain slices of my Pandas Dataframe for each rows (based on row indexes) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None) # raw string so the backslash is not treated as an escape
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering if I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside the plot. Is it feasible (as in the following picture) to calculate the mean and display it in the legend or, using a small font size, next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
       ax=ax, kind='scatter',
       c='level_0', colormap='tab20',
       colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) dataframe into a series with a two-level index (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the label of the x-axis to what you want.
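To make the stack() and reset_index() steps concrete, here is a small sketch on a made-up two-row frame (the values are assumptions; the real file has many more columns).

```python
import pandas as pd

# Made-up frame (assumption): every third column, starting at 2, holds a temperature
df = pd.DataFrame([[1, 2, 10.0, 3, 4, 11.0],
                   [5, 6, 20.0, 7, 8, 21.0]])

# Keep columns 2 and 5, then move them into one long column
C = df.iloc[:, 2::3].stack().reset_index()
C.columns = ["level_0", "level_1", "Temperature"]

# level_0 is the original row, level_1 the original column label
print(C)
```

Each original row now contributes one line per selected column, which is what lets level_0 serve as both the x value and the color key in the scatter call.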
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be an approximate answer. scatter(c=..., cmap=...) can be used for the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)
With a dataframe and basic plot such as this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123456)
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range('2017-01-01', periods=rows).tolist() # pd.datetime was removed in pandas 2.0
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
df.plot()
What is the best way of annotating the last points on the lines so that you get the result below?
In order to annotate a point use ax.annotate(). In this case it makes sense to specify the coordinates to annotate separately. I.e. the y coordinate is the data coordinate of the last point of the line (which you can get from line.get_ydata()[-1]) while the x coordinate is independent of the data and should be the right hand side of the axes (i.e. 1 in axes coordinates). You may then also want to offset the text a bit such that it does not overlap with the axes.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range('2017-01-01', periods=rows).tolist() # pd.datetime was removed in pandas 2.0
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
ax = df.plot()
for line, name in zip(ax.lines, df.columns):
    y = line.get_ydata()[-1]
    ax.annotate(name, xy=(1,y), xytext=(6,0), color=line.get_color(),
                xycoords = ax.get_yaxis_transform(), textcoords="offset points",
                size=14, va="center")
plt.show()
Method 1
Here is one way, or at least a method, which you can adapt to aesthetically fit in whatever way you want, using the plt.annotate method:
[EDIT]: If you're going to use a method like this first one, the method outlined in ImportanceOfBeingErnest's answer is better than what I've proposed.
df.plot()
for col in df.columns:
    plt.annotate(col,xy=(plt.xticks()[0][-1]+0.7, df[col].iloc[-1]))
plt.show()
For the xy argument, which is the x and y coordinates of the text, I chose the last x coordinate in plt.xticks() and added 0.7 so that it is outside of your x axis, but you can choose to make it closer or further as you see fit.
Method 2
You could also just use the right y axis, and label it with your 3 lines. For example:
fig, ax = plt.subplots()
df.plot(ax=ax)
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
ax2.set_yticks([df[col].iloc[-1] for col in df.columns])
ax2.set_yticklabels(df.columns)
plt.show()
This gives you the following plot:
I've got some tips from the other answers and believe this is the easiest solution.
Here is a generic function to improve the labels of a line chart. Its advantages are:
you don't need to mess with the original DataFrame since it works over a line chart,
it will use the already set legend label,
removes the frame,
just copy'n paste it to improve your chart :-)
You can just call it after creating any line chart:
def improve_legend(ax=None):
    if ax is None:
        ax = plt.gca()
    for spine in ax.spines:
        ax.spines[spine].set_visible(False)
    for line in ax.lines:
        data_x, data_y = line.get_data()
        right_most_x = data_x[-1]
        right_most_y = data_y[-1]
        ax.annotate(
            line.get_label(),
            xy=(right_most_x, right_most_y),
            xytext=(5, 0),
            textcoords="offset points",
            va="center",
            color=line.get_color(),
        )
    ax.legend().set_visible(False)
This is the original chart:
Now you just need to call the function to improve your plot:
ax = df.plot()
improve_legend(ax)
The new chart:
Beware, it will probably not work well if a line has null values at the end.
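One possible way around the trailing-null caveat is to annotate the last point whose y value is not NaN rather than the last point overall. The sketch below is a variation on improve_legend, not part of the original answer; the names last_valid_point and improve_legend_skip_nan are hypothetical, and it leaves the spines untouched.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def last_valid_point(line):
    """Return (x, y) of the last point whose y is not NaN, or None if there is none."""
    data_x, data_y = line.get_data()
    # Masked entries (if any) and NaNs are both treated as missing
    y = np.ma.filled(np.ma.array(data_y, dtype=float), np.nan)
    valid = np.flatnonzero(~np.isnan(y))
    if valid.size == 0:
        return None
    last = int(valid[-1])
    return data_x[last], float(y[last])


def improve_legend_skip_nan(ax=None):
    """Label each line at its last non-missing point and hide the legend."""
    if ax is None:
        ax = plt.gca()
    for line in ax.lines:
        point = last_valid_point(line)
        if point is None:
            continue  # nothing to annotate on an all-missing line
        ax.annotate(line.get_label(), xy=point, xytext=(5, 0),
                    textcoords="offset points", va="center",
                    color=line.get_color())
    legend = ax.get_legend()
    if legend is not None:
        legend.set_visible(False)


# Made-up data (assumption): column B ends with two missing values
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 3.0, np.nan, np.nan]})
ax = df.plot()
improve_legend_skip_nan(ax)
```

With this data, the label for B sits next to its second point instead of being placed at a NaN coordinate.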