for loop with seaborn not displaying all plots - python

I've been trying to plot multiple graphs using a for loop and seaborn. I have tried different approaches (with subplots and displaying them sequentially), but I can't get all the graphs to display (the best I've achieved is plotting the last one in the list). Here are the two approaches I've tried:
fig, ax = plt.subplots(1, 3, sharex=True)  # Just hardcoding the 3 here (number of slicers to plot) for testing
for i, col in enumerate(slicers):
    plt.sca(ax[i])
    ax[i] = sns.catplot(x='seq', kind='count', hue=col,
                        order=dfFirst['seq'].value_counts().index, height=6, aspect=11.7/6,
                        data=dfFirst)  # distribution.set_xticklabels(rotation=65, horizontalalignment='right')
display(fig)
I have tried every combination of plt.sca(ax[i]) and ax[i] = sns.catplot (both active, as in the example, and one at a time), but fig is always empty when displayed. I also tried displaying the figures sequentially:
for i, col in enumerate(slicers):
    plt.figure(i)
    sns.catplot(x='seq', kind='count', hue=col,
                order=dfFirst['seq'].value_counts().index, height=6, aspect=11.7/6,
                data=dfFirst)  # distribution.set_xticklabels(rotation=65, horizontalalignment='right')
display(figure)

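catplot is a figure-level function: it creates its own figure on every call, so it cannot draw into axes created with plt.subplots. See Plotting with seaborn using the matplotlib object-oriented interface.
Hence, here it's just
for col in slicers:
    sns.catplot(x='seq', kind='count', hue=col, data=dfFirst,
                order=dfFirst['seq'].value_counts().index, height=6, aspect=11.7/6)
plt.show()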
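A minimal, self-contained sketch of that loop is shown here. The data frame and the slicer column names (device, region) are invented stand-ins for dfFirst and slicers, which the question does not show. Each call to catplot returns its own FacetGrid, so tick-label rotation is applied through that object rather than through a pre-made axes.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for dfFirst and slicers from the question
rng = np.random.default_rng(0)
dfFirst = pd.DataFrame({
    "seq": rng.choice(["A>B", "B>C", "C>A"], size=200),
    "device": rng.choice(["mobile", "desktop"], size=200),
    "region": rng.choice(["north", "south", "east"], size=200),
})
slicers = ["device", "region"]
order = dfFirst["seq"].value_counts().index

grids = []
for col in slicers:
    # catplot is figure-level: every call creates and returns a new FacetGrid
    g = sns.catplot(data=dfFirst, x="seq", kind="count", hue=col,
                    order=order, height=6, aspect=11.7 / 6)
    g.set_xticklabels(rotation=65, horizontalalignment="right")
    grids.append(g)

plt.show()  # shows one figure per slicer
```

If the plots really have to share one figure, an axes-level function such as sns.countplot, which accepts ax=, is the usual choice instead of catplot.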

Related

How to reduce the blank area in a grouped boxplot with many missing hue categories

I have an issue when plotting a categorical grouped boxplot with seaborn in Python, especially when using 'hue'.
My raw data is as shown in the figure below. I want to plot the values in column 8, grouped by columns 1 and 4.
I used seaborn, and my code is shown below:
ax = sns.boxplot(x=output[:,1], y=output[:,8], hue=output[:,4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.legend([],[])
However, the generated plot always contains a large blank area, as shown in the upper figure below. I tried adding dodge=False to sns.boxplot, following a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), but that gives the lower figure below.
What I actually want is a boxplot like the one I generated using JMP below.
It seems that if one of the second-level categories is empty, seaborn still reserves space for it within each first-level category, which causes the observed offset/blank area.
So I wonder if there is any way to solve this issue, for example by using another Python package?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying open spots. (When there would be only one box per x-value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value.
The original hue is used as the x-value within each subplot. To avoid empty spots, the hue column should be of string type. If it were pd.Categorical, seaborn would still reserve a spot for every category.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
Adding consistent coloring
A dictionary palette can color the boxes such that corresponding boxes in different subplots have the same color. hue= with the same column as the x= will do the coloring, and dodge=False will remove the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
    # ax.tick_params(axis='x', labelrotation=90)  # optionally rotate the tick labels
plt.tight_layout()
plt.show()

pandas subplot title size

I have recently figured out that I can use the plot function directly from pandas, without Seaborn, for quick visualisations.
I used the following code to generate a series of graphs from a data frame that contains years in the first column and the prices of different products in the remaining columns.
df_annual_price.plot.line(x='Date',
                          subplots=True,
                          layout=(5,5),
                          figsize=(60,60),
                          fontsize=20,
                          sharex=False,
                          title=list_of_products
                          )
It neatly graphs the lineplot for all the columns. However, one thing I can't figure out is how to control the fontsize of the title for each plot. I have tried to look it up in other threads but couldn't find an answer.
Is there a simple and elegant answer to this?
With subplots=True, pandas's plot() returns a NumPy array of axes (2-D with the shape of layout when layout is given, 1-D otherwise).
We can iterate over the axes and call set_title() on each one with a title and a font size.
This is how you change the title font size of each subplot.
We can pick any one of the axes and call its get_figure() to obtain the Figure object of the overall plot, then call the Figure's suptitle() with a title and a font size. This is how you change the title font size of the overall figure.
The example below creates a 2 x 2 grid of subplots and illustrates functions that may be useful for people who are new to Matplotlib and pandas's plot() function.
import numpy as np
import pandas as pd
labels = ['y1', 'y2', 'y3', 'y4']
x = 'x'
columns = [x] + labels
matrix = np.random.rand(10, 5)
df = pd.DataFrame(matrix, columns=columns)
df = df.sort_values(by=x)
axes = df.plot(
    x=x,
    y=labels,
    subplots=True,
    layout=(2,2),
    kind='hist',
    figsize=(8,8)
)
for i, row in enumerate(axes):
    for j, ax in enumerate(row):
        ax.set_title(f'Subplot {i, j}', fontsize=12)
        ax.set_xlabel('Width')
        ax.set_ylabel('Percentage')
fig = axes[0, 0].get_figure()
fig.subplots_adjust(top=0.9, wspace=0.3, hspace=0.3)
_ = fig.suptitle('Distribution of Widths', fontsize=16)  # suppress printing of title
pandas's plot() also accepts **kwargs, which it forwards to the underlying matplotlib plotting method. See https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.plot.html for the parameters that matplotlib.pyplot.plot() accepts.
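Applied to the line-plot case from the question, the same idea might look like the sketch below. The data frame is a small invented stand-in for df_annual_price, and the product names are placeholders. The titles are set after plotting, so the title= argument is not needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df_annual_price and list_of_products
rng = np.random.default_rng(0)
list_of_products = ["wheat", "corn", "rice", "soy"]
df_annual_price = pd.DataFrame(rng.random((10, 4)).cumsum(axis=0),
                               columns=list_of_products)
df_annual_price.insert(0, "Date", np.arange(2010, 2020))

axes = df_annual_price.plot.line(x="Date", subplots=True, layout=(2, 2),
                                 figsize=(10, 8), sharex=False, legend=False)

# axes is a 2-D NumPy array here; .flat walks it in row-major order,
# which matches the column order of the data frame
for ax, product in zip(axes.flat, list_of_products):
    ax.set_title(product, fontsize=14)

plt.tight_layout()
plt.show()
```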

Personalize pandas boxplot with colors

I've been trying to make a boxplot of some gender data that I divided into two separate dataframes, one for male and one for female.
I managed to make the graph basically how I wanted it, but now I would like to make it look better. I'd like it to look like a seaborn graph, but I wasn't able to find a way to do this with the seaborn library. I tried some ideas I found for coloring the pandas boxplot, but nothing worked.
Is there a way to color these graphs? Or is there a way to make these side-by-side boxplots with seaborn?
dados_generos = dados_sem_zeros[["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO", "TP_SEXO"]]
sexo_f = dados_generos[dados_generos["TP_SEXO"].str.contains("F")]
sexo_m = dados_generos[dados_generos["TP_SEXO"].str.contains("M")]
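provas = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT", "NU_NOTA_LC", "NU_NOTA_REDACAO"]  # score columns; not defined in the original post, assumed here
labels = ["CN", "CH", "MT", "LC", "REDAÇÃO"]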
fig, (ax, ax2) = plt.subplots(figsize = (10,7), ncols=2, sharey=True)
#Setting axis titles
ax.set_xlabel('Provas')
ax2.set_xlabel('Provas')
ax.set_ylabel('Notas')
#Making plots
chart1 = sexo_f[provas].boxplot(ax=ax)
chart2 = sexo_m[provas].boxplot(ax=ax2)
#Setting axis labels
chart1.set_xticklabels(labels,rotation=45)
chart2.set_xticklabels(labels,rotation=45)
plt.show()
This is the result I have:
This is the link to the data I'm using:
https://github.com/KarolDuarte/dados_generos/blob/main/dados_generos.csv
Since seaborn works best with long-form data, let's melt the data first and then plot it with seaborn.
# melting the data (df is the dataframe that holds the score columns and TP_SEXO)
plot_data = df.melt('TP_SEXO')
fig, axes = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
for ax, (gender, data) in zip(axes, plot_data.groupby('TP_SEXO')):
    sns.boxplot(x='variable', y='value', data=data, ax=ax)
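As an alternative sketch, a figure-level catplot with col='TP_SEXO' can build the side-by-side panels in one call, and seaborn colors the boxes by default. The data below is synthetic, and only three of the score columns are used for brevity; the names prova and nota are chosen here for the melted columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the question's data (values are made up)
rng = np.random.default_rng(0)
score_cols = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT"]
df = pd.DataFrame(rng.normal(500, 80, size=(200, 3)), columns=score_cols)
df["TP_SEXO"] = rng.choice(["F", "M"], size=200)

# Long form: one row per (student, exam) pair
plot_data = df.melt(id_vars="TP_SEXO", var_name="prova", value_name="nota")

# One panel per gender, one box per exam
g = sns.catplot(data=plot_data, x="prova", y="nota", col="TP_SEXO",
                col_order=["F", "M"], kind="box", height=5, aspect=0.9)
g.set_xticklabels(rotation=45)
g.set_axis_labels("Provas", "Notas")
plt.show()
```

Passing x='prova' together with hue='TP_SEXO' to sns.boxplot on a single axes would instead put the two genders next to each other within each exam.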

How to combine 2 dataframe histograms in 1 plot?

I would like to show all the histograms of a dataframe, which df.hist(bins=10) does. However, I would also like to add a second set of histograms that shows the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")
I tried separating their matplotlib axes by using fig=plt.figure() and plt.subplot(211), but df.hist is a pandas function, not a matplotlib one. I also tried creating axes and passing ax=ax1 and ax=ax2 to each histogram, but it didn't work.
How can I combine these histograms?
Any help?
The histograms I want to combine look like these. I want to show them side by side, or put the second one on top of the first.
Sorry that I didn't care to make them look good.
It is possible to draw them together:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# toy data frame
df = pd.DataFrame(np.random.normal(0,1,(100,20)))
# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)
# clone axes so they have different scales
ax_new = [ax.twinx() for ax in axes.flatten()]
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()
It's also possible to draw them side-by-side. For example:
fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)
kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)
This places the histograms in the top five rows of the grid and the KDE plots in the bottom five.
You can find more info here: Multiple histograms in Pandas (a possible duplicate, by the way), but apparently pandas cannot draw multiple histograms on the same graph.
That's OK, because np.histogram and matplotlib.pyplot can; check the above link for a more complete answer.
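For illustration, a minimal sketch of drawing two histograms on the same axes with matplotlib directly could look like this. The two-column data frame is invented, and sharing one set of bin edges is a choice made here so that the bars line up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data: two columns to overlay (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, size=(100, 2)), columns=["a", "b"])

fig, ax = plt.subplots(figsize=(6, 4))
# 21 edges give 20 bins spanning the full range of both columns
bins = np.linspace(df.min().min(), df.max().max(), 21)
for col in df.columns:
    ax.hist(df[col], bins=bins, alpha=0.5, label=col)
ax.legend()
plt.show()
```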
Solution for overlapping histograms with df.hist with any number of subplots
You can combine two dataframe histogram figures by creating twin axes using the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms where the size of the figure and the layout of the grid of subplots are taken care of automatically:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1) # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)
# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)
# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]
# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True,
        color='tab:orange', linewidth=2, grid=False)
# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)
plt.show()
Solution for displaying histograms of different types side-by-side with matplotlib
To my knowledge, it is not possible to show the different types of plots side-by-side with df.hist. You need to create the figure from scratch, like in this example using the same dataset as before:
# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars) # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)
# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)
# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))
plt.show()

Creating a matrix of plots with sns distplot

I am plotting 20+ features like so:
for col in dsd_mod["ae_analysis"].columns[:len(dsd_mod["ae_analysis"].columns)]:
    if col != "sae_flag":
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 1), col],
                     color='r',
                     kde=True,
                     hist=False,
                     label='sae_ae = 1')
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 0), col],
                     color='y',
                     kde=True,
                     hist=False,
                     label='sae_ae = 0')
This creates a separate graph for each feature. How can I put them all in a matrix, similar to the output of a pair plot?
Right now I get 30 graphs like this, all in one column:
How can I modify this so that I get 6 rows and 5 columns?
Thanks in advance!
distplot can draw into whatever axes object you pass through ax=. So you just need to create your axes with the desired geometry and pass the relevant axes to each call. (distplot has been deprecated since seaborn 0.11; histplot and kdeplot accept ax= in the same way.)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
fig, axs = plt.subplots(6, 5)
# axs is a 2D array with shape (6,5)
# you can keep track of counters in your for-loop to place the resulting graphs
# using ax=axs[i,j]
# or an alternative is to use a generator that you can use to get the next axes
# instance at every step of the loop
ax_iter = iter(axs.flat)
for _ in range(30):
    ax = next(ax_iter)
    sns.distplot(np.random.normal(loc=0, size=(1000,)), ax=ax)
    sns.distplot(np.random.normal(loc=1, size=(1000,)), ax=ax)
