Creating a matrix of plots with sns distplot - python

I am plotting 20+ features like so:
for col in dsd_mod["ae_analysis"].columns[:len(dsd_mod["ae_analysis"].columns)]:
    if col != "sae_flag":
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 1),col],
                     color='r',
                     kde=True,
                     hist=False,
                     label='sae_ae = 1')
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 0),col],
                     color='y',
                     kde=True,
                     hist=False,
                     label='sae_ae = 0')
This creates a separate graph for each feature. How can I put them all in a matrix, similar to what a pair plot outputs?
Right now I get 30 graphs, all in one column.
How can I modify this so that I get 6 rows and 5 columns?
Thanks in advance!

distplot can draw on whatever axes object you give it. So you just need to create your axes with the desired geometry and pass the relevant axes to each call through the ax argument.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
fig, axs = plt.subplots(6,5)
# axs is a 2D array with shape (6,5)
# you can keep track of counters in your for-loop to place the resulting graphs
# using ax=axs[i,j]
# or an alternative is to use an iterator that you can use to get the next axes
# instance at every step of the loop
ax_iter = iter(axs.flat)
for _ in range(30):
    ax = next(ax_iter)
    sns.distplot(np.random.normal(loc=0, size=(1000,)), ax=ax)
    sns.distplot(np.random.normal(loc=1, size=(1000,)), ax=ax)

Related

How to combine boxplot figures into one?

I am plotting my dataset with boxplots using the code below:
plt.figure(figsize=(8,5))
fig = plt.figure()
num_list=Final_dataset.columns.values.tolist()
for i in range(len(num_list)):
    column=num_list[i]
    sns.boxplot(x="label", y=column, data=Final_dataset, palette='Set2')
    plt.savefig('{}.png'.format(i))
    plt.show()
I need to produce one image that combines the figures for all attributes, as in this figure, rather than several separate figures. How can I fix it? Thanks a lot.
See the subplot function in matplotlib. Select the subplot before drawing into it, and save the figure once after the loop:
nrows = 3 # decide how many you want
ncols = 4 # decide how many you want
plt.figure(figsize=(8,5))
num_list=Final_dataset.columns.values.tolist()
for i in range(len(num_list)):
    column=num_list[i]
    # make subplot number i+1 the current axes so that seaborn draws into it
    plt.subplot(nrows, ncols, 1+i)
    sns.boxplot(x="label", y=column, data=Final_dataset, palette='Set2')
plt.savefig('combined.png') # example file name
plt.show()
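The same layout can also be built with explicit axes objects, which makes it easy to skip the "label" column and to drop unused cells. This is a rough sketch, not the asker's data: `data` is a hypothetical stand-in for Final_dataset with five synthetic attributes, and the figure is written to an in-memory buffer only so the example leaves no file behind.

```python
import io
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for Final_dataset: 5 numeric attributes plus "label"
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(120, 5)), columns=list("abcde"))
data["label"] = rng.choice(["x", "y"], size=len(data))

columns = [c for c in data.columns if c != "label"]  # do not plot label against itself
ncols = 3
nrows = math.ceil(len(columns) / ncols)

# squeeze=False keeps axes 2D even when there is a single row or column
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
for column, ax in zip(columns, axes.flat):
    sns.boxplot(x="label", y=column, data=data, ax=ax)

# Remove the unused axes in the last row
for ax in axes.flat[len(columns):]:
    ax.remove()

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # pass a file name such as "boxplots.png" to write to disk
```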

Sub Plots using Seaborn

I am trying to plot box plots and violin plots for three variables against a variable in a 3X2 subplot formation. But I am not able to figure out how to include sns lib with subplot function.
#plots=plt.figure()
axis=plt.subplots(nrows=3,ncols=3)
for i,feature in enumerate(list(df.columns.values)[:-1]):
    axis[i].plot(sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature))
    i+=1
    axis[i].plot(sns.violinplot(data=df,x='survival_status_after_5yrs',y=feature))
plt.show()
I am expecting a 3X2 grid of subplots in which the x axis stays the same throughout and the y axis cycles through the three variables I mentioned.
Thanks for your help.
I think you have two problems.
First, plt.subplots(nrows=3, ncols=2) returns a figure object and an array of axes objects so you should replace this line with:
fig, ax = plt.subplots(nrows=3, ncols=2). The ax object is now a 3x2 numpy array of axes objects.
You could turn this into a 1-d array with ax = ax.flatten() but given what I think you are trying to do I think it is easier to keep as 3x2.
(Btw I assume the ncols=3 is a typo)
Second, as Ewoud's answer mentions, with seaborn you pass the axes to plot on as an argument to the plot call.
I think the following will work for you:
fig, ax = plt.subplots(nrows=3, ncols=2)
for i, feature in enumerate(list(df.columns.values)[:-1]):
    # for each feature create two plots on the same row
    sns.boxplot(data=df, x='survival_status_after_5yrs',y=feature, ax=ax[i, 0])
    sns.violinplot(data=df, x='survival_status_after_5yrs', y=feature, ax=ax[i, 1])
plt.show()
Most seaborn plot functions have an ax kwarg, so instead of
axis[i].plot(sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature))
try
sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature,ax=axis[i])

How to prepare 2 row, 5 column grid to plot 10 boxplot in one figure using seaborn or matplotlib library?

I am trying to draw 10 box plots in one image, in two rows, with the code below, but without success. How can I implement this idea?
fig, axes =plt.subplots(2,5)
sns.set_style("darkgrid")
for i,t in enumerate(new_fs):
    df = pd.read_csv(t,sep='\t')
    sns.boxplot(data=df, orient='v',ax=axes[i % 2] )
Thank you.
As per the documentation for plt.subplots():
for NxM, subplots with N>1 and M>1 are returned as a 2D array.
Therefore, your variable axes is a 2D array, and you need to access the individual axes using axes[i,j]. Indexing with axes[i % 2] returns a whole row of axes rather than a single one.
Alternatively, I would rewrite your for loop like so:
for t,ax in zip(new_fs, axes.flat):
    df = pd.read_csv(t,sep='\t')
    sns.boxplot(data=df, orient='v', ax=ax)
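To make the shape rule concrete, here is a small sketch that only inspects what plt.subplots returns. It draws nothing and is not tied to the question's files.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 5)
print(axes.shape)       # (2, 5): index with axes[row, col]

fig1, row_axes = plt.subplots(1, 5)
print(row_axes.shape)   # (5,): a single row is squeezed to a 1D array

fig2, grid = plt.subplots(1, 5, squeeze=False)
print(grid.shape)       # (1, 5): squeeze=False always returns a 2D array

# A running index i maps onto (row, col) with divmod, which is the same
# row-major order that axes.flat iterates in
for i in range(10):
    row, col = divmod(i, 5)
    assert axes[row, col] is axes.flat[i]
```

If the grid size can change, passing squeeze=False keeps the indexing code the same for every shape.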

How to combine 2 dataframe histograms in 1 plot?

I would like to show all histograms of a dataframe, which is what df.hist(bins=10) does. However, I would also like to add a second set of histograms showing the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")
I tried separating their matplotlib axes by using fig=plt.figure() and plt.subplot(211). But df.hist is a pandas function, not a matplotlib function. I also tried creating axes and passing ax=ax1 and ax=ax2 to each histogram call, but it didn't work.
How can I combine these histograms?
Any help?
The histograms that I want to combine look like these. I want to show them side by side or put the second one on top of the first one.
Sorry that I didn't take care to make them look good.
It is possible to draw them together:
# toy data frame
df = pd.DataFrame(np.random.normal(0,1,(100,20)))
# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)
# clone axes so they have different scales
ax_new = [ax.twinx() for ax in axes.flatten()]
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()
It's also possible to draw them side-by-side. For example
fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)
kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)
This plots the histograms in the top rows of the grid and the KDE curves in the rows below them.
You can find more info in "Multiple histograms in Pandas" (a possible duplicate, by the way), but apparently pandas cannot handle multiple histograms on the same graph.
That's fine, because np.histogram and matplotlib.pyplot can; check the question mentioned above for a more complete answer.
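As a rough illustration of the matplotlib route, the sketch below overlays the histograms of two columns on one axes. The dataframe is synthetic and the column names `a` and `b` are made up for the example; the bin edges are computed once so that both histograms share them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic two-column dataframe (hypothetical data)
rng = np.random.default_rng(2)
df = pd.DataFrame({"a": rng.normal(0, 1, 500), "b": rng.normal(1, 1, 500)})

# Shared bin edges over all values so the bars of both columns line up
edges = np.histogram_bin_edges(df.to_numpy().ravel(), bins=20)

fig, ax = plt.subplots()
for name in df.columns:
    ax.hist(df[name], bins=edges, alpha=0.5, label=name)
ax.legend()

# np.histogram returns the same counts without drawing anything
counts_a, _ = np.histogram(df["a"], bins=edges)
```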
Solution for overlapping histograms with df.hist with any number of subplots
You can combine two dataframe histogram figures by creating twin axes using the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms where the size of the figure and the layout of the grid of subplots are taken care of automatically:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1) # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)
# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)
# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]
# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True,
        color='tab:orange', linewidth=2, grid=False)
# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)
plt.show()
Solution for displaying histograms of different types side-by-side with matplotlib
To my knowledge, it is not possible to show the different types of plots side-by-side with df.hist. You need to create the figure from scratch, like in this example using the same dataset as before:
# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars) # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)
# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)
# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))
plt.show()

A matplotlib histogram matrix, using Pandas, with multiple categories overlaid

I am trying to combine two approaches at creating histograms.
#Sample Data
df = pd.DataFrame({'V1':[1,2,3,4,5,6],
'V2': [43,35,6,7,31,34],
'V3': [23,75,67,23,56,32],
'V4': [23,45,67,63,56,32],
'V5': [23,5,67,23,6,2],
'V6': [23,78,67,76,56,2],
'V7': [23,45,67,53,56,32],
'V8': [5,5,5,5,5,5],
'cat': ["A","B","C","A","B","B"],})
I am able to create a histogram matrix for each category using this code.
#1. Creating histogram matrix for each category
for i in df['cat'].unique():
    fig, ax = plt.subplots()
    df[df['cat']==i].hist(figsize=(20,20),ax =ax)
    fig.suptitle(i + " Feature-Class Relationships", fontsize = 20)
    fig.savefig('Histogram Matrix %s.png' %(i), dpi = 240)
This creates a separate histogram matrix for each category. However what I would like is for the categories to be overlaid on the same matrix.
I am able to create an overlaid histogram using this approach:
#2. Overlaid histogram for single variable
fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i]['V8'].hist(figsize=(12,8),ax =ax, alpha = 0.5, label = i)
ax.legend()
plt.show()
However, this only creates a single overlaid image. I want to create an overlaid histogram for all of the variables in the matrix, i.e. all categories shown in the same matrix rather than a separate matrix for each category.
I have created the following code, which is a combination of the above two approaches, but it does not overlay each of the histogram matrices together and only the last plot is created.
#3. Combining approaches to create a matrix of overlaid histograms
fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i].hist(figsize=(12,8),ax =ax, alpha = 0.5, label = i)
ax.legend()
fig.savefig('Combined.png', dpi = 240)
Is what I am trying to do possible?
I guess this is what you want. A matrix of 2 columns and 4 rows and in each "cell" of this matrix you get the histogram for a column with the categories overlapped.
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'V1':[1,2,3,4,5,6],
'V2': [43,35,6,7,31,34],
'V3': [23,75,67,23,56,32],
'V4': [23,45,67,63,56,32],
'V5': [23,5,67,23,6,2],
'V6': [23,78,67,76,56,2],
'V7': [23,45,67,53,56,32],
'V8': [5,5,5,5,5,5],
'cat': ["A","B","C","A","B","B"],})
# Define your subplots matrix.
# In this example the fig has 4 rows and 2 columns
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
# This approach is better than looping through df.cat.unique
for g, d in df.groupby('cat'):
    d.hist(alpha = 0.5, ax=axes, label=g)
# Just outputting the legend for each column in fig
for c1, c2 in axes:
    c1.legend()
    c2.legend()
plt.show()
The last code from the question should give you a warning about the axes being cleared - essentially the phenomenon you observe.
UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
Now the idea could be to let pandas plot each histogram in its own axes, but to make sure that each of those is the same, namely ax. This can be done by passing a list that contains ax eight times, ax=[ax]*8:
fig, ax = plt.subplots(figsize=(12,8),)
for i in df['cat'].unique():
    df[df['cat']==i].hist(ax =[ax]*8, alpha = 0.5, label = i)
ax.legend()
plt.show()
The result will look very crowded, but this is apparently desired.
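If one subplot per variable with the categories overlaid is what is wanted, the loops can also be written out with matplotlib directly. This is a minimal sketch rather than either answerer's method: it uses only the first four variables of the question's sample data to stay short, and it picks 5 bins per variable as an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First four variables of the question's sample data
df = pd.DataFrame({'V1': [1, 2, 3, 4, 5, 6],
                   'V2': [43, 35, 6, 7, 31, 34],
                   'V3': [23, 75, 67, 23, 56, 32],
                   'V4': [23, 45, 67, 63, 56, 32],
                   'cat': ["A", "B", "C", "A", "B", "B"]})

value_cols = [c for c in df.columns if c != 'cat']
fig, axes = plt.subplots(2, 2, figsize=(10, 6))

for col, ax in zip(value_cols, axes.flat):
    # Same bin edges for every category so the overlaid bars are comparable
    edges = np.histogram_bin_edges(df[col], bins=5)
    for name, group in df.groupby('cat'):
        ax.hist(group[col], bins=edges, alpha=0.5, label=name)
    ax.set_title(col)
    ax.legend(title='cat')

fig.tight_layout()
```

Writing the loops explicitly gives each variable its own axes and each category its own legend entry, at the cost of a few more lines than df.hist.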
