Seaborn heatmap widths do not match when using subplots


I am trying to make the width of my second subplot (the column sums, drawn with the binary cmap) match the width of the first one.
So far I have only managed to do this by trying different figsize values at random, but every time I try to reuse the code on a dataset of a different size, I end up with the same problem: the second heatmap is always wider than the first one.
Is there a way to adjust the second one automatically?
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
test = pd.DataFrame({'user': ['Bob', 'Bob', 'Bob','Janice','Janice','Fernand','Kevin','Sidhant'],
'tag' : ['enfant','enfant','enfant','femme','femme','jeune','jeune','jeune'],
'income': [3, 5, 1,14,8,10,13,17]})
# specify font sizes for later:
titlesize= 30
ticklabel = 23
legendlabel = 23
# Generate custom color palettes:
cmap = sns.color_palette("ch:18,-.1,dark=.3", 6)
cmap2 = sns.color_palette("binary", 6)
# Preparing data for the heatmap:
heatmap1_data = pd.pivot_table(test, values='income',
index=['user'],
columns='tag')
heatmap1_data = heatmap1_data.reindex(heatmap1_data.sum().sort_values(ascending=False).index, axis=1)
# Creating figure:
fig, (ax1, ax2) = plt.subplots(2,1,figsize=(10,15))
# First subplot:
sns.heatmap(heatmap1_data, ax= ax1, cmap=cmap,square=True, linewidths=.5, annot=True, cbar = False,annot_kws={"size": legendlabel} )
# Cosmetic first subplot:
ax1.xaxis.tick_top()
ax1.tick_params(labelsize= ticklabel, top = False)
ax1.set_xlabel('')
ax1.set_ylabel('')
ax1.set_xticklabels(heatmap1_data.columns,rotation=90)
ax1.set_yticklabels(heatmap1_data.index,rotation=0)
ax1.set_title("Activités par agence et population vulnérable", size= titlesize, pad=20)
# Second subplot (column sum at the bottom):
sns.heatmap((pd.DataFrame(heatmap1_data.sum(axis=0))).transpose().round(1), ax=ax2, square=True, fmt='g', linewidths=.5, annot=True, cmap=cmap2 , cbar=False, xticklabels=False, yticklabels=False, annot_kws={"size": legendlabel})
ax2.set_xlabel("Nombre d'activités", size = ticklabel, labelpad = 5)
# More cosmetic:
ax1.set_title("Title", size= titlesize, pad=35)
ax1.set_xlabel('')
ax1.set_ylabel('')
plt.tick_params(labelsize= ticklabel,left=False, bottom=False)
plt.xticks(rotation=60)
ax1.spines['bottom'].set_color('#dfe1ec')
ax1.spines['left'].set_color('#dfe1ec')
ax1.spines['top'].set_color('#dfe1ec')
ax1.spines['right'].set_color('#dfe1ec')
plt.tight_layout()
plt.show()

The issue is using square=True in sns.heatmap. Because the two heatmaps have different shapes (the first is tall, the second is wide), the "squaring" is done differently for each: the first is made narrower, and the second is made shorter. This happens because each heatmap has to fit inside its subplot Axes, and plt.subplots makes those Axes the same size by default.
One way around this is to give your two Axes different sizes that fit the shape of your data. This won't work 100% of the time, but it will in most cases. You can pass the gridspec_kw keyword a dictionary with 'height_ratios' in your call to plt.subplots.
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(10,15), gridspec_kw={'height_ratios':[5, 1]})
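To avoid picking the ratio by hand for every dataset, it can be derived from the data itself. The sketch below is not from the original answer: the helper name heatmap_with_sum_row is hypothetical, and it assumes the sum heatmap always has exactly one row. Using height_ratios=[n_rows, 1] gives each heatmap row the same available height in both subplots, so with square=True both heatmaps should end up in the same sizing regime and therefore with the same width.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def heatmap_with_sum_row(data, figsize=(10, 15)):
    """Hypothetical helper: square-cell heatmap of `data` with its column sums below."""
    n_rows = data.shape[0]
    # Top Axes gets n_rows height units, bottom Axes 1 unit, so every heatmap
    # row has the same height available in both subplots.
    fig, (ax1, ax2) = plt.subplots(
        2, 1, figsize=figsize, gridspec_kw={"height_ratios": [n_rows, 1]}
    )
    sns.heatmap(data, ax=ax1, square=True, annot=True, cbar=False, linewidths=.5)
    # Column sums as a one-row DataFrame with the same columns as `data`
    sums = data.sum(axis=0).to_frame().T
    sns.heatmap(sums, ax=ax2, square=True, annot=True, fmt="g", cbar=False,
                linewidths=.5, cmap="binary", xticklabels=False, yticklabels=False)
    return fig, ax1, ax2


test = pd.DataFrame({
    "user": ["Bob", "Bob", "Bob", "Janice", "Janice", "Fernand", "Kevin", "Sidhant"],
    "tag": ["enfant", "enfant", "enfant", "femme", "femme", "jeune", "jeune", "jeune"],
    "income": [3, 5, 1, 14, 8, 10, 13, 17],
})
heatmap1_data = pd.pivot_table(test, values="income", index="user", columns="tag")
fig, ax1, ax2 = heatmap_with_sum_row(heatmap1_data)
fig.canvas.draw()  # the square aspect is applied when the figure is drawn
print(ax1.get_position().width, ax2.get_position().width)
```

The same call works unchanged for a pivot table with more or fewer users, which is the case that previously needed a new figsize.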

Related

set custom tick labels on heatmap color bar

I have a list of dataframes named merged_dfs that I am looping through to compute each correlation matrix and plot it as a heatmap subplot using seaborn.
I want to customize the colorbar tick labels, but I am having trouble figuring out how to do it with my example.
Currently, my colorbar scale values from top to bottom are
[1,0.5,0,-0.5,-1]
I want to keep these values, but change the tick labels to be
[1,0.5,0,0.5,1]
for my diverging color bar.
Here is the code and my attempt:
fig, ax = plt.subplots(nrows=6, ncols=2, figsize=(20,20))
for i, (title,merging) in enumerate (zip(new_name_data,merged_dfs)):
    graph = merging.corr()
    colormap = sns.diverging_palette(250, 250, as_cmap=True)
    a = sns.heatmap(graph.abs(), cmap=colormap, vmin=-1,vmax=1,center=0,annot = graph, ax=ax.flat[i])
    cbar = fig.colorbar(a)
    cbar.set_ticklabels(["1","0.5","0","0.5","1"])
fig.delaxes(ax[5,1])
plt.show()
plt.close()
I keep getting this error:
AttributeError: 'AxesSubplot' object has no attribute 'get_array'

Several things are going wrong:
fig.colorbar(...) creates a new colorbar, by default attached to the last subplot that was created.
sns.heatmap returns the Axes (subplot) it drew on. This is very different from matplotlib functions such as plt.imshow(), which return the graphical element that was plotted. That is why fig.colorbar(a) fails with the AttributeError: a is an Axes, not a mappable with a get_array method.
You can suppress the heatmap's colorbar (cbar=False) and then create a new one with the parameters you want.
fig.colorbar(...) needs an ax=... parameter when the figure contains more than one subplot.
Instead of creating a new colorbar, you can pass the colorbar parameters to sns.heatmap via cbar_kws=.... The colorbar itself can then be found via ax.collections[0].colorbar. (ax.collections[0] is where matplotlib stores the graphical object that contains the heatmap.)
Looping over an index is generally discouraged in Python. It's usually more readable, easier to maintain and less error-prone to put everything into the zip call.
Now that vmin is -1, taking the absolute value for the coloring seems to be a mistake.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# toy data standing in for the real list of dataframes
merged_dfs = [pd.DataFrame(data=np.random.rand(5, 7), columns=[*'ABCDEFG']) for _ in range(5)]
new_name_data = [f'Dataset {i + 1}' for i in range(len(merged_dfs))]
ticks = [-1, -0.5, 0, 0.5, 1]  # colorbar tick positions; the labels will show their absolute values
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 7))
for title, merging, ax in zip(new_name_data, merged_dfs, axes.flat):
    graph = merging.corr()
    colormap = sns.diverging_palette(250, 250, as_cmap=True)
    sns.heatmap(graph, cmap=colormap, vmin=-1, vmax=1, center=0, annot=True, ax=ax, cbar_kws={'ticks': ticks})
    ax.set_title(title)
    # seaborn attaches the colorbar to the mesh it created
    ax.collections[0].colorbar.set_ticklabels([f'{abs(t):g}' for t in ticks])
fig.delaxes(axes.flat[-1])  # 5 datasets on a 2x3 grid: remove the unused sixth subplot
fig.tight_layout()
plt.show()
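The other route from the list above, switching off seaborn's colorbar with cbar=False and creating one with fig.colorbar(..., ax=...), could look roughly like this. It is a sketch that relies on the same assumption as the code above, namely that ax.collections[0] is the mesh seaborn drew; the helper name corr_heatmap_with_abs_ticks is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def corr_heatmap_with_abs_ticks(df, ax, ticks=(-1, -0.5, 0, 0.5, 1)):
    """Hypothetical helper: correlation heatmap whose colorbar labels show |r|."""
    colormap = sns.diverging_palette(250, 250, as_cmap=True)
    sns.heatmap(df.corr(), cmap=colormap, vmin=-1, vmax=1, center=0,
                annot=True, ax=ax, cbar=False)  # no automatic colorbar
    # ax.collections[0] is the mesh seaborn drew; unlike the Axes, it is a mappable
    mesh = ax.collections[0]
    # ax=ax makes matplotlib take the colorbar space from this subplot only
    cbar = ax.figure.colorbar(mesh, ax=ax, ticks=list(ticks))
    cbar.set_ticklabels([f"{abs(t):g}" for t in ticks])
    return cbar


rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.random((5, 4)), columns=list("ABCD")) for _ in range(2)]
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
cbars = [corr_heatmap_with_abs_ticks(frame, ax) for frame, ax in zip(frames, axes)]
```

Passing ticks explicitly matters: set_ticklabels is only reliable when the tick positions are fixed, otherwise the labels may not line up with the values.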

How to plot a stacked area plot

I have a dataframe (df) with two columns: 'Foundation Type', which has 4 types of foundations (Shafts, Piles, Combination, Spread), and 'Vs30', with different values of the parameter Vs30. Each row represents a bridge, with a type of foundation and a Vs30 value.
First, I create a new column 'binVs30' in df by assigning each element of 'Vs30' to one of 5 bins ([0-200], [200-400] ... [800-1000]).
df['binVs30'] = pd.cut(df.Vs30, bins=np.arange(0, 1100, 200))
Then I created a stacked area plot with the following code:
color_table = pd.crosstab(df['binVs30'], df['Foundation Type'], dropna=False)
ax = color_table.plot(kind='area', figsize=(8, 8), stacked=True, rot=0)
display(ax)
plt.xlabel('')
plt.ylabel('Frequency', fontsize=12)
plt.legend(title='Foundation Type', loc='upper right')
plt.title('Column Database', fontsize='20')
plt.show()
The resulting picture shows some extra bins that shouldn't be there. Therefore, I had to fix the xticks by manually adding the following code:
locs, labels = plt.xticks()
plt.xticks(locs, ['','0-200','','200-400','','400-600','','600-800','','800-1000'], fontsize=10, rotation=45)
Is there a reason why Python creates those extra bins that shouldn't exist? Is it a bug? If I change it to a stacked bar plot, the problem disappears. Is there a way to fix it without manually writing the bin labels?
I also have two other questions. First, how do I add an edgecolor to an area plot? Something like:
color_table.plot(kind='area', figsize=(8, 8), stacked=True, edgecolor='black', legend=None, rot=0)
The edgecolor='black' argument doesn't work in a stacked area plot.
Second, what if I want to create bins for 'Vs30' like ([0-200], [200-400] ... [>800])? The way I create the 'binVs30' column doesn't let me create a '>800' bin.

There are a couple of questions here. First, to include an open-ended bin in pd.cut(), you can use np.inf as the last bin edge to capture everything above 800 and give that bin a custom label. Second, since you're already using matplotlib, I'd recommend using its stackplot function directly rather than going through pandas. Then you can use the edgecolor argument without any issues.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame(data={
"foundation" : np.random.choice(list("ABCD"), 1000),
"binVs30" : np.random.randint(0, 1200, 1000)
})
bins = [0, 200, 400, 600, 800, np.inf]
labels = ["0-199", "200-399", "400-599", "600-799", "800+"]
df["bins"] = pd.cut(
df["binVs30"], bins=bins, labels=labels,
right=False, include_lowest=True)
stack_data = pd.crosstab(df['bins'], df['foundation'], dropna=False)
stack_array = stack_data.values.T.tolist()
pal = sns.color_palette("Set1")
plt.figure(figsize=(8,4))
plt.stackplot(
labels, stack_array, labels=list("ABCD"),
colors=pal, alpha=0.4, edgecolor="black")
plt.legend(loc='upper left')
plt.show()
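The answer above does not address the extra tick labels in the pandas version. A plausible explanation (my reading, not confirmed in the thread) is that pandas draws a categorical index at positions 0, 1, 2, ... and matplotlib's default locator adds ticks between those positions. Pinning exactly one tick per crosstab row should then avoid hand-writing the label list. A sketch that also uses the np.inf bin, with a hypothetical helper name and column names taken from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stacked_area_by_bin(df):
    """Hypothetical helper: stacked area plot of foundation types per Vs30 bin."""
    bins = [0, 200, 400, 600, 800, np.inf]  # np.inf makes the last bin open-ended
    labels = ["0-200", "200-400", "400-600", "600-800", ">800"]
    binned = pd.cut(df["Vs30"], bins=bins, labels=labels, right=False)
    table = pd.crosstab(binned, df["Foundation Type"], dropna=False)
    ax = table.plot(kind="area", stacked=True, figsize=(8, 8), rot=0)
    # One tick per crosstab row, at the integer positions the rows are drawn at
    ax.set_xticks(range(len(table)))
    ax.set_xticklabels(table.index.astype(str), rotation=45)
    return ax, table


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Foundation Type": rng.choice(["Shafts", "Piles", "Combination", "Spread"], 200),
    "Vs30": rng.integers(0, 1200, 200),
})
ax, table = stacked_area_by_bin(df)
```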

How to combine 2 dataframe histograms in 1 plot?

I would like to show histograms of all columns in a dataframe, which df.hist(bins=10) does. However, I would like to add another set of histograms showing the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")
I tried separating their matplotlib axes by using fig=plt.figure() and plt.subplot(211). But df.hist is actually a pandas function, not a matplotlib function. I also tried creating axes and passing ax=ax1 and ax=ax2 to each histogram call, but it didn't work.
How can I combine these histograms?
Any help?
I want to show the two sets of histograms side by side, or put the second one on top of the first one.
Sorry that I didn't bother to make them look good.

It is possible to draw them together:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# toy data frame
df = pd.DataFrame(np.random.normal(0,1,(100,20)))
# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)
# clone axes so they have different scales
ax_new = [ax.twinx() for ax in axes.flatten()]
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()
It's also possible to draw them side-by-side. For example
fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)
kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)
This draws the 20 histograms in the top five rows of the grid and the 20 KDE plots in the bottom five rows.

You can find more info here: Multiple histograms in Pandas (possibly a duplicate, by the way), but apparently pandas cannot draw multiple histograms on the same axes.
That's fine, because np.histogram and matplotlib.pyplot can; check the link above for a more complete answer.
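As a rough illustration of that point (my own sketch, not the linked answer's code), np.histogram provides the counts and bin edges, and matplotlib can draw the counts as bars plus a binned cumulative curve on a twin y-axis of the same subplot. The helper name hist_with_cdf is hypothetical, and the curve connects the cumulative fraction at each bin edge rather than drawing a step histogram.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_with_cdf(values, ax, bins=10):
    """Hypothetical helper: histogram bars plus a cumulative curve on a twin axis."""
    counts, edges = np.histogram(values, bins=bins)
    ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge",
           edgecolor="white", alpha=0.6)
    # Cumulative fraction at each bin's right edge, starting from 0 at the left edge
    cdf = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])
    ax_cdf = ax.twinx()  # second y-axis so the 0..1 curve is not squashed
    ax_cdf.plot(edges, cdf, color="tab:orange")
    ax_cdf.set_ylim(0, 1.05)
    return counts, cdf


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("ABCD"))
fig, axes = plt.subplots(2, 2, figsize=(10, 6))
results = {}
for col, ax in zip(df.columns, axes.flat):
    results[col] = hist_with_cdf(df[col], ax)
    ax.set_title(col)
fig.tight_layout()
```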

Solution for overlapping histograms with df.hist with any number of subplots
You can combine two dataframe histogram figures by creating twin axes using the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms where the size of the figure and the layout of the grid of subplots are taken care of automatically:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1) # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)
# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)
# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]
# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True,
color='tab:orange', linewidth=2, grid=False)
# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)
plt.show()
Solution for displaying histograms of different types side-by-side with matplotlib
To my knowledge, it is not possible to show the different types of plots side-by-side with df.hist. You need to create the figure from scratch, like in this example using the same dataset as before:
# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars) # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)
# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)
# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))
plt.show()

Python: subplots with different total sizes

Original Post
I need to make several subplots with different sizes.
I have simulation areas ranging in size (x × y) from 35x6µm to 39x2µm, and I want to plot them in one figure. All subplots have the same x-ticklabels (there is a grid line every 5µm on the x-axis).
When I plot the subplots into one figure, the graphs with the smaller x-extent are stretched so that the full figure width is used. As a result, the x-gridlines no longer line up.
How can I make sure the subplots aren't stretched and are aligned on the left?
Edit: Here is some code:
size=array([[3983,229],[3933,350],[3854,454],[3750,533],[3500,600]], dtype=float)
resolution=array([[1024,256],[1024,320],[1024,448],[1024,512],[1024,640]], dtype=float)
aspect_ratios=(resolution[:,0]/resolution[:,1])*(size[:,1]/size[:,0])
number_of_graphs=len(data)
fig, ax=plt.subplots(nrows=number_of_graphs, sharex=xshare)
fig.set_size_inches(12,figheight)
for i in range(number_of_graphs):
    temp=np.rot90(np.loadtxt(path+'/'+data[i]))
    img=ax[i].imshow(temp,
                     interpolation="none",
                     cmap=mapping,
                     norm=specific_norm,
                     aspect=aspect_ratios[i]
                     )
    ax[i].set_adjustable('box-forced')
    #Here I have to set some ticks and labels....
    ax[i].xaxis.set_ticks(np.arange(0,int(size[i,0]),stepwidth_width)*resolution[i,0]/size[i,0])
    ax[i].set_xticklabels((np.arange(0, int(size[i,0]), stepwidth_width)))
    ax[i].yaxis.set_ticks(np.arange(0,int(size[i,1]),stepwidth_height)*resolution[i,1]/size[i,1])
    ax[i].set_yticklabels((np.arange(0, int(size[i,1]), stepwidth_height)))
    ax[i].set_title(str(mag[i]))
    grid(True)
savefig(path+'/'+name+'all.pdf', bbox_inches='tight', pad_inches=0.05) #saves graph
Here are some examples:
If I plot the matrices one by one in a for loop, IPython generates output that is pretty much what I want: the vertical distance between subplots is constant, and the size of each figure is correct. You can see that the x-labels match each other.
When I plot the matrices in one figure using subplots, this is not the case: the x-ticks do not line up, and every subplot gets the same size on the canvas (which means that more white space is reserved for thin subplots).
I simply want the first result from IPython as one output file using subplots.
Using GridSpec
After the community told me to use GridSpec to set the sizes of my subplots directly, I wrote code for automatic plotting:
size=array([[3983,229],[3933,350],[3854,454],[3750,533],[3500,600]], dtype=float)
#total size of the figure
total_height=int(sum(size[:,1]))
total_width=int(size.max())
#determines steps of ticks
stepwidth_width=500
stepwidth_height=200
fig, ax=plt.subplots(nrows=len(size))
fig.set_size_inches(size.max()/300., total_height/200)
gs = GridSpec(total_height, total_width)
gs.update(left=0, right=0.91, hspace=0.2)
height=0
for i in range (len(size)):
    ax[i] = plt.subplot(gs[int(height):int(height+size[i,1]), 0:int(size[i,0])])
    temp=np.rot90(np.loadtxt(path+'/'+FFTs[i]))
    img=ax[i].imshow(temp,
                     interpolation="none",
                     vmin=-100,
                     vmax=+100,
                     aspect=aspect_ratios[i],
                     )
    #Some rescaling
    ax[i].xaxis.set_ticks(np.arange(0,int(size[i,0]),stepwidth_width)*resolution[i,0]/size[i,0])
    ax[i].set_xticklabels((np.arange(0, int(size[i,0]), stepwidth_width)))
    ax[i].yaxis.set_ticks(np.arange(0,int(size[i,1]),stepwidth_height)*resolution[i,1]/size[i,1])
    ax[i].set_yticklabels((np.arange(0, int(size[i,1]), stepwidth_height)))
    ax[i].axvline(antenna[i]) #at the antenna position a vertical line is plotted
    grid(True)
    #colorbar
    cbaxes = fig.add_axes([0.93, 0.2, 0.01, 0.6]) #[left, bottom, width, height]
    cbar = plt.colorbar(img, cax = cbaxes, orientation='vertical')
    tick_locator = ticker.MaxNLocator(nbins=3)
    cbar.locator = tick_locator
    cbar.ax.yaxis.set_major_locator(matplotlib.ticker.AutoLocator())
    cbar.set_label('Intensity',
                   #fontsize=12
                   )
    cbar.update_ticks()
    height=height+size[i,1]
plt.show()
And here is the result....
Do you have any ideas?

What about using matplotlib.gridspec.GridSpec (see the GridSpec documentation)?
You could try something like
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
gs = GridSpec(8, 39)
ax1 = plt.subplot(gs[:6, :35])
ax2 = plt.subplot(gs[6:, :])
data1 = np.random.rand(6, 35)
data2 = np.random.rand(2, 39)
ax1.imshow(data1)
ax2.imshow(data2)
plt.show()
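Building on that, the GridSpec cells can be derived from the physical sizes instead of being typed in. This is a minimal sketch, assuming whole-µm widths and hypothetical sizes in the 35x6µm to 39x2µm range from the question; the helper name plot_left_aligned is made up. Each image spans as many columns as it is µm wide, always starting at column 0 so it is left-aligned, and imshow's extent puts the axes in µm so a 5 µm grid lines up across rows. wspace=0 matters here: otherwise the gaps between the many narrow columns would make a span's width not exactly proportional to its column count.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec
from matplotlib.ticker import MultipleLocator


def plot_left_aligned(images, sizes, figsize=(10, 6)):
    """Hypothetical helper: stack images of physical size (width, height) in µm."""
    widths = [int(w) for w, _ in sizes]  # assumes whole-µm widths
    heights = [h for _, h in sizes]
    fig = plt.figure(figsize=figsize)
    # One column per µm of the widest image; row heights proportional to image height.
    # wspace=0 keeps a span of k columns exactly k column-widths wide.
    gs = GridSpec(len(sizes), max(widths), figure=fig,
                  height_ratios=heights, wspace=0, hspace=0.5)
    axes = []
    for row, (img, w, h) in enumerate(zip(images, widths, heights)):
        ax = fig.add_subplot(gs[row, :w])  # always start at column 0: left-aligned
        # extent maps the pixel grid onto µm, so tick positions are physical
        ax.imshow(img, extent=(0, w, 0, h), aspect="auto", interpolation="none")
        ax.xaxis.set_major_locator(MultipleLocator(5))  # grid line every 5 µm
        ax.grid(True)
        axes.append(ax)
    return fig, axes


rng = np.random.default_rng(0)
sizes = [(35, 6), (37, 4), (39, 2)]  # hypothetical sizes in µm
images = [rng.random((10 * h, 10 * w)) for w, h in sizes]
fig, axes = plot_left_aligned(images, sizes)
```

With aspect="auto", the µm-per-inch scale is the same horizontally across all rows and vertically across all rows, but x and y scales only match each other if figsize is chosen accordingly.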

Python Subplot function parameters

I am having a hard time choosing the parameters for the Python subplot function.
What I want is to plot 4 graphs in the same image file with the following layout:
left
space
right
space
left
space
right
I have tried different combinations of the three numbers, but the output doesn't come out right.

Do you mean something like this?
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(4,2,1)
ax2 = fig.add_subplot(4,2,4)
ax3 = fig.add_subplot(4,2,5)
ax4 = fig.add_subplot(4,2,8)
fig.subplots_adjust(hspace=1)
plt.show()

Well, the not-so-easy-to-find documentation describes the subplot call signature as follows:
subplot(nrows, ncols, index)
Let us investigate the code from Joe Kington above:
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(4,2,1)
ax2 = fig.add_subplot(4,2,4)
ax3 = fig.add_subplot(4,2,5)
ax4 = fig.add_subplot(4,2,8)
fig.subplots_adjust(hspace=1)
plt.show()
You told matplotlib that you want a grid of graphs with 4 rows and 2 columns. ax1, ax2 and so on are the graphs you add at the index positions given by the third parameter. Indices start at 1 and count from left to right, row by row.
I hope that helped :)
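The counting rule fits in one line: for a grid with ncols columns, the cell in 0-based row r and column c has index r * ncols + c + 1. Here is a tiny sketch (the helper name subplot_index is hypothetical) that rebuilds the checkerboard layout from above with it:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def subplot_index(row, col, ncols):
    """Hypothetical helper: 1-based subplot index of a 0-based (row, col) cell."""
    return row * ncols + col + 1


nrows, ncols = 4, 2
# (row, col) cells of the checkerboard: left, right, left, right
cells = [(0, 0), (1, 1), (2, 0), (3, 1)]
indices = [subplot_index(r, c, ncols) for r, c in cells]
print(indices)  # [1, 4, 5, 8]

fig = plt.figure()
axes = [fig.add_subplot(nrows, ncols, i) for i in indices]
fig.subplots_adjust(hspace=1)
```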

Matplotlib provides several ways to deal with the deliberate placement of plots on a single page; I think the best is GridSpec, which I believe first appeared in the 1.0 release. The other two, by the way, are (i) directly indexing subplot and (ii) the newer ImageGrid toolkit.
GridSpec works like the grid-based packers in GUI toolkits that are used to place widgets in a parent frame, so for that reason at least, it seems the easiest to use and the most configurable of the three placement techniques.
import numpy as NP
import matplotlib.pyplot as PLT
import matplotlib.gridspec as gridspec
import matplotlib.cm as CM
V = 10 * NP.random.rand(10, 10) # some data to plot
fig = PLT.figure(1, (5., 5.)) # create the top-level container
gs = gridspec.GridSpec(4, 4) # create a GridSpec object
# for the arguments to subplot that are identical across all four subplots,
# to avoid keying them in four times, put them in a dict
# and let subplot unpack them
kx = dict(frameon = False, xticks = [], yticks = [])
ax1 = PLT.subplot(gs[0, 0], **kx)
ax3 = PLT.subplot(gs[2, 0], **kx)
ax2 = PLT.subplot(gs[1, 1], **kx)
ax4 = PLT.subplot(gs[3, 1], **kx)
for itm in [ax1, ax2, ax3, ax4]:
    itm.imshow(V, cmap=CM.jet, interpolation='nearest')
PLT.show()
Beyond arranging the four plots in a 'checkerboard' configuration (per your question), I have not tried to tune this layout, but that's easy to do. E.g.,
# to change the space between the cells that hold the plots:
gs.update(left=.1, right=.9, wspace=.1, hspace=.1)
# to create a grid comprised of varying cell sizes (one ratio per column and one per row):
gs = gridspec.GridSpec(2, 2, width_ratios=[1, 2], height_ratios=[4, 1])
