Empty legend when plotting pandas dataframe columns with matplotlib - python

When I plot two columns of my dataframe, where one contains the x-values and the other the y-values, I can't get a legend to show. This is because there are no handles in the legend call, but I haven't figured out how to create those handles from the dataframe.
import matplotlib.pyplot as plt

plt.figure(num=None, figsize=(15, 6), dpi=80, facecolor='w', edgecolor='k')
colordict = {'m':'blue', 'f':'red'}
plt.scatter(x=df.p_age, y=df.p_weight, c=df.p_gender.apply(lambda x: colordict[x]))
plt.title('Weight distribution per age')
plt.xlabel('Age [months]')
plt.ylabel('individual weight [kg]')
plt.legend(title="Legend Title")  # stays empty: no plotted artist has a label
The color dictionary colordict lets me use different colors for the two genders, and that works. What I want is a legend with a blue dot and a red dot, labeled "male" and "female".
I have tried:
plt.legend(title="Legend Title", handles=(df.p_age,df.p_weight))
But that does not work, because a Series cannot be used as a legend handle.

Derived from the solution found here. This solution draws the scatter plot once per class and gives each call its own label, so the legend gets one entry per class.
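import matplotlib.pyplot as plt

labels = {'m': 'male', 'f': 'female'}  # legend text for each gender code
fig, ax = plt.subplots()
for gender, color in colordict.items():
    subset = df[df.p_gender == gender]  # plot only the rows of this gender
    ax.scatter(x=subset.p_age, y=subset.p_weight, c=color, label=labels[gender])
ax.legend(title="Legend Title")
plt.show()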
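The question also asks how to build legend handles directly. One way to keep the single scatter call is matplotlib's proxy-artist approach: create invisible Line2D objects that carry only a marker color and a label, then pass them to legend(handles=...). The sketch below uses made-up sample data with the question's column names; the names dictionary mapping 'm'/'f' to display text is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "p_age": [3, 6, 9, 12, 3, 6, 9, 12],
    "p_weight": [6.0, 7.8, 9.0, 10.1, 5.6, 7.2, 8.5, 9.4],
    "p_gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})
colordict = {"m": "blue", "f": "red"}
names = {"m": "male", "f": "female"}  # hypothetical display names

fig, ax = plt.subplots(figsize=(8, 4))
# One scatter call, with a color per row as in the question
ax.scatter(x=df.p_age, y=df.p_weight, c=df.p_gender.map(colordict).tolist())

# Proxy artists: lines with no line style, only a colored marker and a label
handles = [
    Line2D([0], [0], marker="o", linestyle="", color=color, label=names[g])
    for g, color in colordict.items()
]
ax.legend(handles=handles, title="Legend Title")
ax.set_xlabel("Age [months]")
ax.set_ylabel("individual weight [kg]")
```

The proxies are never drawn on the axes; they exist only so the legend has something to render for each gender.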

Related

How to reduce the blank area in a grouped boxplot with many missing hue categories

I have an issue when plotting a categorical grouped boxplot with seaborn in Python, especially when using 'hue'.
My raw data is as shown in the figure below. I want to plot the values in column 8, grouped by columns 1 and 4.
I used seaborn, and my code is shown below:
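import matplotlib.pyplot as plt
import seaborn as sns

ax = sns.boxplot(x=output[:, 1], y=output[:, 8], hue=output[:, 4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)  # set_xticklabels (plural)
plt.legend([], [])  # hide the legend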
However, the generated plot always contains a large blank area, as shown in the upper figure below. Following a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), I tried adding dodge=False to sns.boxplot, but that produces the lower figure below.
What I actually want is a boxplot like the one I generated with JMP, shown below.
It seems that if a second-level category (the hue) is empty for some first-level category, seaborn still reserves its space within every first-level group, which causes the observed offset/blank area.
Is there any way to solve this, for example by using another Python package?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying empty spots. (If there were only one box per x value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value.
The original hue is used as the x value in each subplot. To avoid empty spots, the hue column should be of string type: if it were pd.Categorical, seaborn would still reserve a spot for each of the categories.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
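To see why the column type matters, the sketch below rebuilds the reproducible example's data and compares the categorical column with its string version after filtering to a single label. It only inspects the pandas side; the claim that seaborn reserves a spot for every declared category comes from the explanation above and is not demonstrated here.

```python
import numpy as np
import pandas as pd

# Same construction (and seed) as the reproducible example above
np.random.seed(20230206)
letters = [*'abcdefghijklmnopqrst']
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice(letters, 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], letters)

sub = df[df['label'] == 'label1']
# The categorical column still declares all 20 categories after filtering ...
n_declared = len(sub['cat'].cat.categories)
# ... although only a handful of them occur in this subset
n_present = sub['cat'].nunique()
# value_counts on a categorical lists the absent categories with a count of 0
zero_counts = int((sub['cat'].value_counts() == 0).sum())
# After astype(str), only the observed values remain
n_str = len(sub['cat'].astype(str).unique())
print(n_declared, n_present, zero_counts, n_str)
```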
Adding consistent coloring
A dictionary palette can color the boxes so that corresponding boxes in different subplots get the same color. Setting hue= to the same column as x= does the coloring, and dodge=False removes the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
# ax.tick_params(axis='x', labelrotation=90) # optionally rotate the tick labels
plt.tight_layout()
plt.show()
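If a single Axes is preferred over facets, plain matplotlib can also avoid the gaps, because Axes.boxplot accepts explicit positions. The following is a sketch (not part of the original answer) that assigns consecutive positions only to the (label, cat) combinations that actually occur and leaves one empty slot between x labels. It reuses the example data with cat kept as plain strings; helper names such as group_centers are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
# 'cat' holds plain strings, so groupby only yields combinations that occur

data, positions, ticklabels, group_centers = [], [], [], []
pos = 0
for label, grp in df.groupby('label'):
    start = pos
    for cat, sub in grp.groupby('cat'):
        data.append(sub['value'].to_numpy())
        positions.append(pos)
        ticklabels.append(cat)
        pos += 1
    group_centers.append(((start + pos - 1) / 2, label))
    pos += 1  # one empty slot between x labels, not one per missing hue

fig, ax = plt.subplots(figsize=(15, 5))
bp = ax.boxplot(data, positions=positions, widths=0.7)
ax.set_xticks(positions)
ax.set_xticklabels(ticklabels)
for center, label in group_centers:
    # x in data coordinates, y in axes fraction: place the outer label below the ticks
    ax.text(center, -0.1, label, ha='center', transform=ax.get_xaxis_transform())
```

The trade-off is that colors, legend, and spacing must be handled by hand, which seaborn would otherwise do for you.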

Plot pandas df into boxplot & histogram

I am trying to plot a boxplot above a histogram (without using seaborn). I have tried many variations, but I always get skewed graphs.
This was my starting point:
#Histogram
df.hist(column="Q7", bins=20, figsize=(14,6))
#Boxplot
df.boxplot(column="Q7",vert=False,figsize=(14,6))
which resulted in the following graph:
As you can see, the boxplot and its outliers are at the bottom, but I want them on top.
Does anyone have an idea?
You can use subplots and set the height ratio of each plot so that the boxplot is on top and the histogram below it. In the example below, I use 30% of the height for the boxplot and 70% for the histogram. I also adjusted the spacing between the plots and used a common x-axis via sharex. Hope this is what you are looking for...
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, figsize=(14, 6), sharex=True,  # common x-axis
                       gridspec_kw={"height_ratios": (.3, .7)})  # boxplot gets 30% of the vertical space
# Boxplot
df.boxplot(column="Q7", vert=False, ax=ax[0])
# Histogram
df.hist(column="Q7", bins=20, ax=ax[1])
ax[1].set_title('')  # hide the "Q7" title added by df.hist
plt.subplots_adjust(hspace=0.1)  # adjust the gap between the two plots
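For reference, the same layout can be sketched with plain matplotlib calls instead of the pandas wrappers. The data below is synthetic (a hypothetical Q7 column); only the subplot setup mirrors the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's "Q7" column
rng = np.random.default_rng(0)
df = pd.DataFrame({"Q7": rng.gamma(shape=2.0, scale=1.5, size=500)})

fig, (ax_box, ax_hist) = plt.subplots(
    2, figsize=(14, 6), sharex=True,
    gridspec_kw={"height_ratios": (.3, .7)})  # boxplot gets 30% of the height
ax_box.boxplot(df["Q7"].to_numpy(), vert=False)  # horizontal box on top
ax_box.set_yticks([])  # a single box needs no y tick
ax_hist.hist(df["Q7"], bins=20)  # histogram below, same x-axis
ax_hist.set_xlabel("Q7")
fig.subplots_adjust(hspace=0.1)
```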

Matplotlib scatter plot dual y-axis

I am trying to figure out how to create a scatter plot in matplotlib with two different y-axes.
Right now I have one, and I need to add a second one with the index values on the y-axis.
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [20, 10]  # must be set before the figure is created
points1 = plt.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
                      c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)  # set style options
#plt.colorbar(points)
plt.title("timeUTC vs Load")
#plt.xlim(0, 400)
#plt.ylim(0, 300)
plt.xlabel('timeUTC')
plt.ylabel('Load_MW')
cbar = plt.colorbar(points1)
cbar.set_label('Load')
The result I expect looks like this:
So the second scatter series should be TimeUTC vs. index. The colors don't matter ;) Also, in Excel the y-axes are on different sides, but that doesn't matter.
I appreciate your help! Thanks, Paulina
Continuing after the suggestions in the comments.
There are two ways of using matplotlib.
Via the matplotlib.pyplot interface, as you were doing in your original code snippet with plt.
The object-oriented way. This is the recommended way to use matplotlib, especially when you need more customization, as in your case. In the code below, ax1 is an Axes instance.
From an Axes instance, you can plot your data with the Axes.plot and Axes.scatter methods, very much as you did through the pyplot interface. This means you can write an Axes.scatter call instead of .plot and use the same parameters as in your original code:
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis that shares the x-axis with ax1
ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
            c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)
ax2.plot(r3_dda249["TimeUTC"], r3_dda249.index, c='b', linestyle='-')
ax1.set_xlabel('TimeUTC')
ax1.set_ylabel('r3_load_MW', color='g')
ax2.set_ylabel('index', color='b')
plt.show()
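Since the question asks for two scatter series, here is a self-contained sketch that uses scatter on both axes. The DataFrame is synthetic, a hypothetical stand-in for r3_load; twinx() creates the second Axes, which shares the x-axis and draws its y-axis on the right.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for r3_load: a time column plus a load column
times = pd.date_range("2021-01-01", periods=48, freq="30min")
rng = np.random.default_rng(1)
r3_load = pd.DataFrame({"TimeUTC": times,
                        "r3_load_MW": 200 + 50 * rng.standard_normal(len(times))})

fig, ax1 = plt.subplots(figsize=(12, 5))
ax2 = ax1.twinx()  # right-hand y-axis, x-axis shared with ax1

pts = ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
                  c=r3_load["r3_load_MW"], s=50, cmap="rainbow")
ax2.scatter(r3_load["TimeUTC"], r3_load.index, c="b", s=10)  # TimeUTC vs index

ax1.set_xlabel("TimeUTC")
ax1.set_ylabel("r3_load_MW")
ax2.set_ylabel("index", color="b")
cbar = fig.colorbar(pts, ax=[ax1, ax2])  # shrink both twin axes to make room
cbar.set_label("Load")
```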

Personalize pandas boxplot with colors

I've been trying to make a boxplot of some gender data that I split into two separate dataframes, one for males and one for females.
I managed to make the graph basically the way I wanted, but now I would like to make it look better. I'd like it to look like a seaborn graph, but I couldn't find a way to do this with the seaborn library. I tried some ideas I found for coloring the pandas boxplot, but nothing worked.
Is there a way to color these graphs? Or is there a way to make these side-by-side boxplots with seaborn?
import matplotlib.pyplot as plt

dados_generos = dados_sem_zeros[["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO", "TP_SEXO"]]
sexo_f = dados_generos[dados_generos["TP_SEXO"].str.contains("F")]
sexo_m = dados_generos[dados_generos["TP_SEXO"].str.contains("M")]
provas = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT", "NU_NOTA_LC", "NU_NOTA_REDACAO"]  # assumed: the five score columns
labels = ["CN", "CH", "MT", "LC", "REDAÇÃO"]
fig, (ax, ax2) = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
# Setting axis titles
ax.set_xlabel('Provas')
ax2.set_xlabel('Provas')
ax.set_ylabel('Notas')
# Making plots
chart1 = sexo_f[provas].boxplot(ax=ax)
chart2 = sexo_m[provas].boxplot(ax=ax2)
# Setting axis labels
chart1.set_xticklabels(labels, rotation=45)
chart2.set_xticklabels(labels, rotation=45)
plt.show()
This is the result I have:
This is the link to the data I'm using:
https://github.com/KarolDuarte/dados_generos/blob/main/dados_generos.csv
Since seaborn works best with long-form data, let's melt the data and use seaborn.
import matplotlib.pyplot as plt
import seaborn as sns

# melting the data (df is the question's dados_generos)
plot_data = df.melt('TP_SEXO')
fig, axes = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
for ax, (gender, data) in zip(axes, plot_data.groupby('TP_SEXO')):
    sns.boxplot(x='variable', y='value', data=data, ax=ax)
    ax.set_title(gender)  # label each panel with its gender
Output:
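As an alternative to two separate panels, the melted data can go into a single seaborn boxplot with hue='TP_SEXO', which places the F and M boxes side by side for each exam and colors them by gender. The data here is randomly generated as a hypothetical stand-in for the linked CSV; only the column names are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for dados_generos: five score columns plus TP_SEXO
rng = np.random.default_rng(2)
cols = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT", "NU_NOTA_LC", "NU_NOTA_REDACAO"]
n = 200
df = pd.DataFrame(rng.normal(550, 80, size=(n, len(cols))), columns=cols)
df["TP_SEXO"] = rng.choice(["F", "M"], size=n)

# Long form: one row per (student, exam) with columns TP_SEXO, variable, value
plot_data = df.melt("TP_SEXO")

fig, ax = plt.subplots(figsize=(10, 7))
# hue draws the F and M boxes next to each other for every exam, in distinct colors
sns.boxplot(data=plot_data, x="variable", y="value", hue="TP_SEXO",
            hue_order=["F", "M"], ax=ax)
ax.set_xticks(range(len(cols)))
ax.set_xticklabels(["CN", "CH", "MT", "LC", "REDAÇÃO"], rotation=45)
ax.set_xlabel("Provas")
ax.set_ylabel("Notas")
```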

Matplotlib Subplot axes sharing: Apply to every other plot?

I am trying to find a way to apply the shared axes parameters of subplot() to every other plot in a series of subplots.
I've got the following code, which uses data from RPM4, based on rows in fpD
fig, ax = plt.subplots(2*(fpD['name'].count()), sharex=True, figsize=(6,fpD['name'].count()*2),
                       gridspec_kw={'height_ratios':[5,1]*fpD['name'].count()})
for i, r in fpD.iterrows():
    RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(x='date', y='RPM', ax=ax[(2*i)], legend=False)
    RPM4[RPM4['name'] == RPM3.iloc[i,0]].plot(kind='area', color='lightgrey', x='date', y='total', ax=ax[(2*i)+1],
                                              legend=False,)
    ax[2*i].set_title('test', fontsize=12)
plt.tight_layout()
Which produces an output that is very close to what I need. It loops through the 'name' column in a table and produces two plots for each, and displays them as subplots:
As you can see, the sharex parameter works fine for me here, since I want all the plots to share the same axis.
However, what I'd really like is for all the even-numbered (bigger) plots to share the same y axis, and for the odd-numbered (small grey) plots to all share a different y axis.
Any help on accomplishing this is much appreciated, thanks!
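No answer is included here for this question. One possible approach, sketched under the assumption of matplotlib 3.3 or newer (where Axes.sharey is available), is to create the subplots with sharex=True only and then link each big plot's y-axis to ax[0] and each small plot's y-axis to ax[1]. The number of names n and the plotted data are hypothetical placeholders for fpD and RPM4.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

n = 3  # hypothetical number of names (fpD['name'].count() in the question)
fig, ax = plt.subplots(2 * n, sharex=True, figsize=(6, 2 * n),
                       gridspec_kw={'height_ratios': [5, 1] * n})

# Even-indexed (big) axes share y with ax[0]; odd-indexed (small) ones with ax[1]
for k in range(2, 2 * n):
    ax[k].sharey(ax[k % 2])

x = np.arange(10)
rng = np.random.default_rng(3)
for i in range(n):
    ax[2 * i].plot(x, rng.normal(10 * (i + 1), 1, size=x.size))  # big line plot
    ax[2 * i + 1].fill_between(x, rng.uniform(0, i + 1, size=x.size),
                               color='lightgrey')  # small area plot
fig.tight_layout()
```

The question's pandas .plot(..., ax=ax[...]) calls should work unchanged on these axes, since the sharing is set up before anything is drawn.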
