Personalize pandas boxplot with colors - python

I've been trying to make a boxplot of some gender data that I divided into two separate dataframes, one for male and one for female.
I managed to make the graph basically how I wanted it, but now I would like to make it look better. I'd like it to look like a seaborn graph, but I wasn't able to find a way to do this with the seaborn library. I tried some ideas I found for coloring the pandas boxplot, but nothing worked.
Is there a way to color these graphs? Or is there a way to make these side-by-side boxplots with seaborn?
dados_generos = dados_sem_zeros[["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO", "TP_SEXO"]]
sexo_f = dados_generos[dados_generos["TP_SEXO"].str.contains("F")]
sexo_m = dados_generos[dados_generos["TP_SEXO"].str.contains("M")]
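labels = ["CN", "CH", "MT", "LC", "REDAÇÃO"]
provas = ["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO"]  # score columns (assumed; `provas` was not defined in the original snippet)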
fig, (ax, ax2) = plt.subplots(figsize = (10,7), ncols=2, sharey=True)
#Setting axis titles
ax.set_xlabel('Provas')
ax2.set_xlabel('Provas')
ax.set_ylabel('Notas')
#Making plots
chart1 = sexo_f[provas].boxplot(ax=ax)
chart2 = sexo_m[provas].boxplot(ax=ax2)
#Setting axis labels
chart1.set_xticklabels(labels,rotation=45)
chart2.set_xticklabels(labels,rotation=45)
plt.show()
This is the result I have:
This is the link to the data I'm using:
https://github.com/KarolDuarte/dados_generos/blob/main/dados_generos.csv

Since seaborn works best with long-form data, let's melt the data and plot it with seaborn.
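# melt to long form: one row per (TP_SEXO, variable, value)
# dados_generos is the dataframe from the question
plot_data = dados_generos.melt('TP_SEXO')
fig, axes = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
# one panel per gender
for ax, (gender, data) in zip(axes, plot_data.groupby('TP_SEXO')):
    sns.boxplot(x='variable', y='value', data=data, ax=ax)
    ax.set_title(gender)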
Output:
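
As for coloring the pandas boxplots directly, one option is to ask `boxplot` for the artists it draws and fill them yourself. The following is only a sketch under assumptions: it uses random numbers in place of the linked CSV, and the short column names and the two colors are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data (assumption): random scores instead of the linked CSV.
rng = np.random.default_rng(0)
provas = ["CN", "CH", "MT", "LC", "REDACAO"]  # hypothetical short column names
sexo_f = pd.DataFrame(rng.normal(550, 80, size=(200, 5)), columns=provas)
sexo_m = pd.DataFrame(rng.normal(540, 90, size=(200, 5)), columns=provas)

fig, (ax, ax2) = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
parts_by_gender = {}
panels = [(ax, sexo_f, "F", "lightcoral"), (ax2, sexo_m, "M", "skyblue")]
for axis, frame, gender, color in panels:
    # patch_artist=True draws filled boxes; return_type="dict" exposes the artists.
    parts = frame.boxplot(ax=axis, patch_artist=True, return_type="dict", grid=False)
    for box in parts["boxes"]:
        box.set_facecolor(color)
    for median in parts["medians"]:
        median.set_color("black")
    axis.set_title(gender)
    axis.set_xlabel("Provas")
    parts_by_gender[gender] = parts
ax.set_ylabel("Notas")
```

Call plt.show() afterwards to display the figure. The same dictionary also holds the "whiskers", "caps" and "fliers" artists if you want to restyle those too.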

Related

How to show multiple already plotted matplotlib figures side-by-side or on-top in Python without re-plotting them?

I have already plotted two figures separately in a single jupyter notebook file, and exported them.
What I want is to show them side by side, but not plot them again by using matplotlib.pyplot.subplots.
For example, in Mathematica, it's easier to do this by just saving the figures into a Variable, and displaying them afterwards.
What I tried was saving the figures, using
fig1, ax1 = plt.subplots(1,1)
... #plotting using ax1.plot()
fig2, ax2 = plt.subplots(1,1)
... #plotting using ax2.plot()
Now, fig1 and fig2 are of type matplotlib.figure.Figure, which stores the figure as an 'image-type' instance. I can even see them separately by calling just fig1 or fig2 in my notebook.
But I cannot show them together by doing something like
plt.show(fig1, fig2)
It returns nothing, since no figures are currently being plotted.
You may look at this link or this, which is a Mathematica version of what I was talking about.
Assuming you want to merge those subplots in the end,
here is the code:
import numpy as np
import matplotlib.pyplot as plt
# example function to plot
x = np.linspace(0, 10)
y = np.exp(x)
# almost your code
figure, axes = plt.subplots(1,1)
res_1, = axes.plot(x,y) # keep the Line2D object returned by plot()
plt.show()
plt.close(figure)
figure, axes = plt.subplots(1,1)
res_2, = axes.plot(x,-y) # same as before
plt.show()
# restructure to merge
figure_2, (axe_1,axe_2) = plt.subplots(1,2) # one row, two columns
axe_1.plot(*res_1.get_data()) # reuse the x/y data stored in the first line
axe_2.plot(*res_2.get_data())
# if you want to show them in one figure
plt.show()
Not quite sure what you mean with:
but not plot them again by using matplotlib.pyplot.subplots.
But you can display two figures next to each other in a jupyter notebook by using:
fig, ax = plt.subplots(nrows=1, ncols=2)
# draw the first figure on ax[0], e.g. ax[0].plot(...)
# draw the second figure on ax[1], e.g. ax[1].plot(...)
plt.show()
Or above each other:
fig, ax = plt.subplots(nrows=2, ncols=1)
# draw the top figure on ax[0], e.g. ax[0].plot(...)
# draw the bottom figure on ax[1], e.g. ax[1].plot(...)
plt.show()
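
If the point is really to avoid re-plotting, one possible workaround is to render each finished figure to an image and show the images next to each other. This is a minimal sketch, not a built-in matplotlib feature: `figure_to_image` is a hypothetical helper, and the sine and cosine curves are example data.

```python
import io

import numpy as np
import matplotlib.pyplot as plt


def figure_to_image(fig, dpi=100):
    """Render an existing figure to an RGBA array without re-plotting it."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    buffer.seek(0)
    return plt.imread(buffer, format="png")


# Two figures that were "already plotted" (example data, an assumption).
x = np.linspace(0, 10)
fig1, ax1 = plt.subplots()
ax1.plot(x, np.sin(x))
fig2, ax2 = plt.subplots()
ax2.plot(x, np.cos(x))

# Show the rendered images side by side in a new figure.
images = [figure_to_image(fig) for fig in (fig1, fig2)]
combined, axes = plt.subplots(1, 2, figsize=(12, 4))
for axis, image in zip(axes, images):
    axis.imshow(image)
    axis.axis("off")  # hide the outer axes; the images carry their own ticks
```

The combined figure contains raster copies, so the panels can no longer be edited as plots. If you need live axes, copying the data as in the first answer is the safer route.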

Seaborn Box Plot X-Axis Too Crowded

Good Day,
See the attached image for reference. The x-axis on the Seaborn bar chart I created has overlapping text and is too crowded. How do I fix this?
The data source is on Kaggle and I was following along with this article: https://towardsdatascience.com/a-quick-guide-on-descriptive-statistics-using-pandas-and-seaborn-2aadc7395f32
Here is the code I used:
sns.set(style = 'darkgrid')
plt.figure(figsize = (20, 10))
ax = sns.countplot(x = 'Regionname', data = df)
I'd appreciate any help.
Thanks!
It is safer to create the figure and axes explicitly and pass the axes to seaborn, so the plot is guaranteed to land on the figure you sized. Try
fig, ax = plt.subplots(figsize=(20, 10)) # generate a figure and return figure and axis handle
sns.countplot(x='Regionname', data=df, ax=ax) # passing the `ax` to seaborn so it knows about it
An extra thing after this might be to rotate the labels:
ax.set_xticklabels(ax.get_xticklabels(), rotation=60)
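
Putting the two suggestions together might look like the sketch below. The region names are made-up stand-ins for the Kaggle data, and right-aligning the labels is my own addition so that each rotated label ends under its bar.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in data (assumption): made-up long region names instead of the Kaggle file.
df = pd.DataFrame({
    "Regionname": [
        "Northern Metropolitan", "Southern Metropolitan", "Western Metropolitan",
        "Eastern Metropolitan", "South-Eastern Metropolitan", "Northern Victoria",
        "Southern Metropolitan", "Northern Metropolitan", "Southern Metropolitan",
    ]
})

fig, ax = plt.subplots(figsize=(20, 10))
sns.countplot(x="Regionname", data=df, ax=ax)
# Rotate the existing tick labels and right-align them.
plt.setp(ax.get_xticklabels(), rotation=60, ha="right")
fig.tight_layout()  # leave room for the rotated labels
```

Another option, if the labels are still too long, is to pass y="Regionname" instead of x so the bars are horizontal and the names read left to right.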

Empty legend when plotting pandas dataframe columns with matplotlib

When I plot two columns of my dataframe, where one contains the x-values and the other the y-values, I cannot get a legend to show. This is because there are no handles in the legend call, but I could not figure out how to create those handles from the dataframe.
plt.figure(num=None, figsize=(15, 6), dpi=80, facecolor='w', edgecolor='k')
colordict = {'m':'blue', 'f':'red'}
plt.scatter(x=df.p_age, y=df.p_weight, c=df.p_gender.apply(lambda x: colordict[x]))
plt.title('Weight distribution per age')
plt.xlabel('Age [months]')
plt.ylabel('individual weight [kg]')
plt.legend(title="Legend Title")
The color dictionary colordict is there so I can have different colors for the two genders. That works. I want a legend with a blue and a red dot and "male", "female" next to them.
I have tried:
plt.legend(title="Legend Title", handles=(df.p_age,df.p_weight))
But that does not work, as handles cannot be made out of series.
Derived from the solution found here. This solution iteratively draws the plot for each class.
fig, ax = plt.subplots()
for gender, color in colordict.items():
    subset = df[df.p_gender == gender]  # plot only this gender's rows
    ax.scatter(x=subset.p_age, y=subset.p_weight, c=color, label=gender)
ax.legend()
plt.show()
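
If you would rather keep a single scatter call, another approach is to build the legend from proxy artists, which also lets you write "male" and "female" instead of the raw codes. This is a sketch with a small made-up dataframe; the `names` mapping is a hypothetical helper, not something from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Stand-in data (assumption) using the column names from the question.
df = pd.DataFrame({
    "p_age": [1, 2, 3, 4, 5, 6],
    "p_weight": [4.1, 5.0, 5.8, 6.1, 7.2, 7.5],
    "p_gender": ["m", "f", "m", "f", "m", "f"],
})
colordict = {"m": "blue", "f": "red"}
names = {"m": "male", "f": "female"}  # legend text, hypothetical mapping

fig, ax = plt.subplots(figsize=(15, 6))
ax.scatter(df.p_age, df.p_weight, c=df.p_gender.map(colordict).tolist())

# Proxy artists: empty lines that exist only to appear in the legend.
handles = [
    Line2D([], [], marker="o", linestyle="", color=color, label=names[gender])
    for gender, color in colordict.items()
]
legend = ax.legend(handles=handles, title="Legend Title")
```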

Seaborn heatmap widths do not match when using subplots

I am trying to adjust the width of my second subplot (column sum with the binary cmap) to the first one.
So far I have only managed to do so by trying different figsize values at random, but every time I try to re-use the code on a dataset of a different size I always end up with something like the picture below (the second heatmap is always wider than the first one).
Am I missing something to adjust the second one automatically ?
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
test = pd.DataFrame({'user': ['Bob', 'Bob', 'Bob','Janice','Janice','Fernand','Kevin','Sidhant'],
'tag' : ['enfant','enfant','enfant','femme','femme','jeune','jeune','jeune'],
'income': [3, 5, 1,14,8,10,13,17]})
# specify font sizes for later:
titlesize= 30
ticklabel = 23
legendlabel = 23
# Generate custom diverging colormaps:
cmap = sns.color_palette("ch:18,-.1,dark=.3", 6)
cmap2 = sns.color_palette("binary", 6)
# Preparing data for the heatmap:
heatmap1_data = pd.pivot_table(test, values='income',
index=['user'],
columns='tag')
heatmap1_data = heatmap1_data.reindex(heatmap1_data.sum().sort_values(ascending=False).index, axis=1)
# Creating figure:
fig, (ax1, ax2) = plt.subplots(2,1,figsize=(10,15))
# First subplot:
sns.heatmap(heatmap1_data, ax= ax1, cmap=cmap,square=True, linewidths=.5, annot=True, cbar = False,annot_kws={"size": legendlabel} )
# Cosmetic first subplot:
ax1.xaxis.tick_top()
ax1.tick_params(labelsize= ticklabel, top = False)
ax1.set_xlabel('')
ax1.set_ylabel('')
ax1.set_xticklabels(heatmap1_data.columns,rotation=90)
ax1.set_yticklabels(heatmap1_data.index,rotation=0)
ax1.set_title("Activités par agence et population vulnérable", size= titlesize, pad=20)
# Second subplot (column sum at the bottom):
sns.heatmap((pd.DataFrame(heatmap1_data.sum(axis=0))).transpose().round(1), ax=ax2, square=True, fmt='g', linewidths=.5, annot=True, cmap=cmap2 , cbar=False, xticklabels=False, yticklabels=False, annot_kws={"size": legendlabel})
ax2.set_xlabel("Nombre d'activités", size = ticklabel, labelpad = 5)
# More cosmetic:
ax1.set_title("Title", size= titlesize, pad=35)
ax1.set_xlabel('')
ax1.set_ylabel('')
plt.tick_params(labelsize= ticklabel,left=False, bottom=False)
plt.xticks(rotation=60)
ax1.spines['bottom'].set_color('#dfe1ec')
ax1.spines['left'].set_color('#dfe1ec')
ax1.spines['top'].set_color('#dfe1ec')
ax1.spines['right'].set_color('#dfe1ec')
plt.tight_layout()
plt.show()
The issue is using square=True in sns.heatmap. Since the two heatmaps have different aspect ratios (tall vs wide), the "squaring" is done differently for each: the first is made narrower and the second is made shorter. This is done to fit within the sizes of your subplot Axes, which are equal by default when you call plt.subplots.
One way to get around this is to define the aspect ratios of your two Axes to be different and fit the shape of your data. This won't work 100 % of the time but will in most cases. You can use the keyword gridspec_kw and define a dictionary with 'height_ratios' in your call of plt.subplots.
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(10,15), gridspec_kw={'height_ratios':[5, 1]})
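
To avoid hard-coding the ratio for every dataset, you could derive it from the number of rows in the pivot table. The sketch below assumes the bottom heatmap always has exactly one row; the table is a stand-in loosely modelled on the question's example, with zeros where the real pivot would have missing values.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in pivot table (assumption): 5 users x 3 tags.
data = pd.DataFrame(
    [[3, 0, 0], [0, 11, 0], [0, 0, 10], [0, 0, 13], [0, 0, 17]],
    index=["Bob", "Janice", "Fernand", "Kevin", "Sidhant"],
    columns=["enfant", "femme", "jeune"],
)
totals = data.sum(axis=0).to_frame().T  # one-row frame of column sums

n_rows = data.shape[0]
fig, (ax1, ax2) = plt.subplots(
    2, 1, figsize=(10, 15),
    gridspec_kw={"height_ratios": [n_rows, 1]},  # top panel n_rows times taller
)
sns.heatmap(data, ax=ax1, square=True, annot=True, cbar=False, linewidths=.5)
sns.heatmap(totals, ax=ax2, square=True, annot=True, fmt="g", cbar=False,
            linewidths=.5, xticklabels=False, yticklabels=False)

fig.canvas.draw()  # applies the equal-aspect adjustment so positions are final
width_top = ax1.get_position().width
width_bottom = ax2.get_position().width
```

With the height ratio tied to the row count, both panels are limited by the same cell size, so the two widths should come out equal in this setup. Layout tweaks applied afterwards may still shift things slightly.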

Question on Matplotlib bar plot with a pandas dataframe?

I want to create a bar chart with SALES_COV_TYPE on the x-axis and ID_ev on the y-axis. I would like the two bars to have different colors. I also want to rename the x-axis values and rename the legend entry, as can be seen in the image link: change ID_ev to 'Visits'.
This is my attempt at it and I'm not satisfied with it.
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791]}
df = pd.DataFrame(data)
df.plot(kind='bar',y='ID_ev',x='SALES_COV_TYPE')
ax=plt.gca()
my_xticks = ['CAM','GAM']
ax.set_xticklabels(my_xticks)
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
Plot
I want to create the bar chart using the fig, ax method so that it becomes a template when I create subplots in the future, but I haven't been able to get it done. If possible, I would rather not use the pandas wrapper for matplotlib. What do you all suggest?
Thanks!
Do you mean something like this:
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791], 'SALES_COV_CLASS':['CAM','GAM']}
df = pd.DataFrame(data)
colors = ['green', 'red']
fig, ax = plt.subplots(1, 1, figsize=(4,3))
ax.bar(df['SALES_COV_CLASS'], df['ID_ev'], color = colors)
ax.set_title('Title')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
plt.show()
I find it easier to add a column with the name of the group, that way you don't need to reindex the x axis to remove the empty columns.
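
Since you mentioned wanting a template for future subplots, the same idea can be wrapped in a small function that draws on whatever axes it is given. This is just one way to sketch it: `plot_visits` is a hypothetical helper name, and giving each bar its own legend entry under a "Visits" title is my guess at what the legend should show.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_visits(ax, df, colors):
    """Draw one bar per coverage class on the given axes, each with a legend entry."""
    for (_, row), color in zip(df.iterrows(), colors):
        ax.bar(row["SALES_COV_CLASS"], row["ID_ev"], color=color,
               label=row["SALES_COV_CLASS"])
    ax.set_xlabel("Sales Coverage Type")
    ax.set_ylabel("Number of Customer Visits")
    ax.legend(title="Visits")
    return ax


df = pd.DataFrame({
    "SALES_COV_TYPE": [84.0, 88.0],
    "ID_ev": [2360869, 882791],
    "SALES_COV_CLASS": ["CAM", "GAM"],
})

# Reuse the template on two subplots.
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
plot_visits(axes[0], df, ["green", "red"])
plot_visits(axes[1], df.iloc[::-1], ["red", "green"])  # same data, reversed order
```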
