Pandas combine multiple subplots with same x axis into 1 bar chart - python

I am looping through a list of 6 column names, taking 3 columns at a time so that I can draw 3 subplots per iteration.
I have 2 dataframes with the same column names, so they look identical except for the histograms of each column.
I want to plot matching columns of both dataframes on the same subplot. Right now, I'm plotting their histograms on 2 separate subplots.
Currently, for columns 'A', 'B', 'C' in df_plot:
and for columns 'A', 'B', 'C' in df_plot2:
I only want 3 charts, with matching column names combined into the same chart so that there are blue and yellow bars in the same chart.
Adding df_plot2 as below doesn't work. I think I'm not defining my second axs properly, but I'm not sure how to do that.
col_name_list = ['A','B','C','D','E','F']
chunk_list = [col_name_list[i:i + 3] for i in xrange(0, len(col_name_list), 3)]
for k,g in enumerate(chunk_list):
    df_plot = df[g]
    df_plot2 = df[g][df[g] != 0]
    fig, axs = plt.subplots(1,len(g),figsize = (50,20))
    axs = axs.ravel()
    for j,x in enumerate(g):
        df_plot[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs[j], position=0, title = x, fontsize = 30)
        # adding this doesnt work.
        df_plot2[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs[j], position=1, fontsize = 30)
        axs[j].title.set_size(40)
    fig.tight_layout()

The solution is to plot on the same ax:
change axs[j] to axs
for k,g in enumerate(chunk_list):
    df_plot = df[g]
    df_plot2 = df[g][df[g] != 0]
    fig, axs = plt.subplots(1,len(g),figsize = (50,20))
    axs = axs.ravel()
    for j,x in enumerate(g):
        df_plot[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs, position=0, title = x, fontsize = 30)
        # adding this doesnt work.
        df_plot2[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs, position=1, fontsize = 30)
        axs[j].title.set_size(40)
    fig.tight_layout()
Then just call plt.show().
Example: this will plot x and y on the same subplot:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 1)
y = np.arange(0, 20, 2)
fig = plt.figure()
ax = fig.gca()  # the figure's single Axes
ax.plot(x)      # both calls draw on the same Axes
ax.plot(y)
plt.show()
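For the original goal of blue and yellow bars side by side in each chart, one possible approach (a sketch, not necessarily what the answer above intended) is to align the two value_counts results on their index with pd.concat and draw them with a single bar-plot call on one Axes. Series.plot and DataFrame.plot expect a single Axes for ax, so each chart receives one element of the axes array rather than the whole array. The sample df and the column labels "all" and "nonzero" are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; the real df comes from the question.
df = pd.DataFrame({
    "A": [0, 1, 1, 2, 0, 3],
    "B": [0, 0, 5, 5, 6, 7],
    "C": [1, 1, 1, 0, 2, 2],
})
cols = ["A", "B", "C"]

# squeeze=False keeps a 2D array even for a single column, then flatten it.
fig, axs = plt.subplots(1, len(cols), figsize=(15, 5), squeeze=False)
axs = axs.ravel()

for ax, col in zip(axs, cols):
    all_values = df[col].value_counts(normalize=True).head()
    nonzero = df[col][df[col] != 0].value_counts(normalize=True).head()
    # Aligning on the index puts matching categories side by side;
    # a category missing from one series becomes NaN.
    combined = pd.concat([all_values, nonzero], axis=1, keys=["all", "nonzero"])
    combined.plot(kind="bar", ax=ax, title=col)

fig.tight_layout()
# Call plt.show() here when running interactively.
```

Because both series end up in one DataFrame, pandas handles the bar offsets and the legend, so no manual position arguments are needed.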
EDIT:
There is now a squeeze keyword argument. Passing squeeze=False makes sure the result is always a 2D numpy array.
fig, ax2d = plt.subplots(2, 2, squeeze=False)
If needed, turning that into a 1D array is easy:
axli = ax2d.flatten()
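A minimal sketch of what squeeze=False changes, assuming a recent matplotlib: the returned axes are a 2D array even for a single row or a single subplot, so indexing code does not need special cases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A single subplot still comes back as a 1 x 1 array.
fig_single, ax2d = plt.subplots(1, 1, squeeze=False)

# One row of three subplots comes back as a 1 x 3 array.
fig_row, row = plt.subplots(1, 3, squeeze=False)

# Flatten to a 1D array for simple iteration.
axli = row.flatten()
for ax in axli:
    ax.plot([0, 1], [0, 1])
```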

Related

How to create subplots of all column combinations from two dataframes

I have made a function which plots input variables against predicted variables.
dummy_data = pd.DataFrame(np.random.uniform(low=65.5,high=140.5,size=(50,4)), columns=list('ABCD'))
dummy_predicted = pd.DataFrame(np.random.uniform(low=15.5,high=17.5,size=(50,4)), columns=list('WXYZ'))
##Plot test input distributions
fig = plt.figure(figsize=(15,6))
n_rows = 1
n_cols = 4
counter = 1
for i in dummy_data.keys():
    plt.subplot(n_rows, n_cols, counter)
    plt.scatter(dummy_data[i], dummy_predicted['Z'])
    plt.title(f'{i} vs Z')
    plt.xlabel(i)
    counter += 1
plt.tight_layout()
plt.show()
How do I create a 4 x 4 subplot of all combinations of 'ABCD' and 'WXYZ'? I can have any number of dummy_data and dummy_predicted columns so some dynamism would be useful.
Use itertools.product from the standard library, to create all combinations of column names, combos.
Use the len of each set of columns to determine nrows and ncols for plt.subplots
Flatten the array of axes to easily iterate through a 1D array instead of a 2D array.
zip combos and axes to iterate through, and plot each group with a single loop.
See this answer in How to plot in multiple subplots.
from itertools import product
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# sample data
np.random.seed(2022)
dd = pd.DataFrame(np.random.uniform(low=65.5, high=140.5, size=(50, 4)), columns=list('ABCD'))
dp = pd.DataFrame(np.random.uniform(low=15.5, high=17.5, size=(50, 4)), columns=list('WXYZ'))
# create combinations of columns
combos = product(dd.columns, dp.columns)
# create subplots
fig, axes = plt.subplots(nrows=len(dd.columns), ncols=len(dp.columns), figsize=(15, 6))
# flatten axes into a 1d array
axes = axes.flat
# iterate and plot
for (x, y), ax in zip(combos, axes):
    ax.scatter(dd[x], dp[y])
    ax.set(title=f'{x} vs. {y}', xlabel=x, ylabel=y)
plt.tight_layout()
plt.show()
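The zip above works because itertools.product yields pairs with the last input varying fastest, which matches the row-major order of a flattened axes array. A small sketch with two made-up column lists shows the pairing of grid positions and combinations:

```python
from itertools import product

# Hypothetical column names, shortened to keep the output small.
data_cols = ["A", "B"]
pred_cols = ["W", "X"]

# product varies the second argument fastest: A-W, A-X, B-W, B-X.
combos = list(product(data_cols, pred_cols))

# Row-major positions of a len(data_cols) x len(pred_cols) grid,
# i.e. the order in which a flattened axes array is traversed.
positions = [(r, c) for r in range(len(data_cols)) for c in range(len(pred_cols))]

# Each grid cell (row, col) is paired with (data column, predicted column).
pairs = dict(zip(positions, combos))
print(pairs)
```

So row r always holds data column r and column c always holds predicted column c, provided nrows comes from the first argument of product and ncols from the second.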
Just do a double for loop:
n_rows = len(dummy_data.columns)
n_cols = len(dummy_predicted.columns)
fig, axes = plt.subplots(n_rows, n_cols, figsize=(15,6))
for row, data_col in enumerate(dummy_data):
    for col, pred_col in enumerate(dummy_predicted):
        ax = axes[row][col]
        ax.scatter(dummy_data[data_col], dummy_predicted[pred_col])
        ax.set_title(f'{data_col} vs {pred_col}')
        ax.set_xlabel(data_col)
plt.tight_layout()
plt.show()
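Since the question asks for any number of columns, note that plt.subplots returns a 1D array (or a bare Axes) when one dimension is 1, which would break the axes[row][col] indexing. A hedged sketch of the same double loop with squeeze=False follows; plot_all_pairs is a hypothetical helper name and the figure sizing is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_all_pairs(data, predicted):
    """Scatter every column of `data` against every column of `predicted`."""
    n_rows, n_cols = len(data.columns), len(predicted.columns)
    # squeeze=False guarantees a 2D array, even for a single row or column.
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(4 * n_cols, 3 * n_rows), squeeze=False)
    for row, data_col in enumerate(data.columns):
        for col, pred_col in enumerate(predicted.columns):
            ax = axes[row, col]
            ax.scatter(data[data_col], predicted[pred_col])
            ax.set(title=f"{data_col} vs {pred_col}",
                   xlabel=data_col, ylabel=pred_col)
    fig.tight_layout()
    return fig, axes


# Made-up data: three inputs, one predicted column.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.uniform(65.5, 140.5, size=(50, 3)), columns=list("ABC"))
predicted = pd.DataFrame(rng.uniform(15.5, 17.5, size=(50, 1)), columns=["Z"])
fig, axes = plot_all_pairs(data, predicted)
```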
Output:

two DataFrame plots

I have a plot similar to the one answered in the link below:
two DataFrame plot in a single plot matplotlip
I modified the code block that plots the df2 columns, because I think that is where the change is needed, but I could not get the desired output.
A sample of the plot I want is this:
This is how I modified it:
f, axes = plt.subplots(nrows=len(signals.columns)+1, sharex=True, )
i = 0
for col in df2.columns:
    fig, axs = plt.subplots()
    sns.regplot(x='', y='', data=df2, ax=axs[0])
    df2[col].plot(ax=axes[i], color='grey')
    axes[i].set_ylabel(col)
    i+=1
I can see that this is wrong.
I then tried the following, which seems like some headway :)
How do I modify this to get what I want?
f, axes = plt.subplots(nrows=len(signals.columns)+1, sharex=True, )
# plots for df2 columns
i = 0
for col in df2.columns:
    lw=1
    df2[col].plot(ax=axes[i], color='grey')
    axes[i].set_ylim(0, 1)
    axes[i].set_ylabel(col)
sns.rugplot(df2["P1"])
You have several options to make this graph. df1 and df2 are as defined in your previous question.
The version with matplotlib.pyplot.scatter is faster to draw, but less faithful to the example. The version with seaborn.rugplot looks identical to the example, but takes longer to draw. I highlighted the important part of the code between the ####### comment lines.
Using matplotlib.pyplot.scatter:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
f, axes = plt.subplots(nrows=len(df2.columns)+1, sharex=True,
                       gridspec_kw={'height_ratios':np.append(np.repeat(1, len(df2.columns)), 3)})
####### variable part below #######
# plots for df2 columns
i = 0
for col in df2.columns:
    axes[i].scatter(x=df2.index, y=np.repeat(0, len(df2)), c=df2[col], marker='|', cmap='Greys')
    axes[i].set_ylim(-0.5, 0.5)
    axes[i].set_yticks([0])
    axes[i].set_yticklabels([col])
    i+=1
###################################
## code to plot annotations
axes[-1].set_xlabel('Genomic position')
axes[-1].set_ylabel('annotations')
axes[-1].set_ylim(-0.5, 1.5)
axes[-1].set_yticks([0, 1])
axes[-1].set_yticklabels(['−', '+'])
for _, r in df1.iterrows():
    marker = '|'
    lw=1
    if r['type'] == 'exon':
        marker=None
        lw=8
    y = 1 if r['strand'] == '+' else 0
    axes[-1].plot((r['start'], r['stop']), (y, y),
                  marker=marker, lw=lw,
                  solid_capstyle='butt',
                  color='#505050')
# remove space between plots
plt.subplots_adjust(hspace=0)
axes[-1].set_xlim(0, len(df2))
f.set_size_inches(6, 2)
Using seaborn.rugplot:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
f, axes = plt.subplots(nrows=len(df2.columns)+1, sharex=True,
                       gridspec_kw={'height_ratios':np.append(np.repeat(1, len(df2.columns)), 3)})
####### variable part below #######
import matplotlib
import matplotlib.cm as cm
norm = matplotlib.colors.Normalize(vmin=0, vmax=1, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.Greys)
# plots for df2 columns
i = 0
for col in df2.columns:
    sns.rugplot(x=df2.index, color=list(map(mapper.to_rgba, df2[col])), height=1, ax=axes[i])
    axes[i].set_yticks([0])
    axes[i].set_yticklabels([col])
    i+=1
###################################
## code to plot annotations
axes[-1].set_xlabel('Genomic position')
axes[-1].set_ylabel('annotations')
axes[-1].set_ylim(-0.5, 1.5)
axes[-1].set_yticks([0, 1])
axes[-1].set_yticklabels(['−', '+'])
for _, r in df1.iterrows():
    marker = '|'
    lw=1
    if r['type'] == 'exon':
        marker=None
        lw=8
    y = 1 if r['strand'] == '+' else 0
    axes[-1].plot((r['start'], r['stop']), (y, y),
                  marker=marker, lw=lw,
                  solid_capstyle='butt',
                  color='#505050')
# remove space between plots
plt.subplots_adjust(hspace=0)
axes[-1].set_xlim(0, len(df2))
f.set_size_inches(6, 2)
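The rugplot version relies on a ScalarMappable to turn each value into a grey level. A small standalone sketch of that mapping, with made-up values, may make the step clearer: values are first normalized to the range 0 to 1 (clipped, because clip=True) and then looked up in the Greys colormap, where low values are light and high values are dark.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.cm as cm
from matplotlib.colors import Normalize

# Same normalization and colormap as in the answer above.
norm = Normalize(vmin=0, vmax=1, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap="Greys")

# Made-up signal values; 2.0 lies outside [0, 1] and is clipped to 1.
values = [0.0, 0.5, 1.0, 2.0]

# One RGBA row per value, shape (len(values), 4).
colors = mapper.to_rgba(values)
print(colors)
```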

Adjusting gridspec so that plotted data aligns

I have several graphs to plot, all having the width a multiple of some unit as in the figure below.
So the bottom axis is 1/4 of the whole width, the second-to-bottom one is 2/4 of the width etc.
The code I am using:
import matplotlib.pyplot as plt
divs = 4
fig = plt.figure()
gs = fig.add_gridspec(ncols = divs, nrows = divs)
axes = [fig.add_subplot(gs[div, div:]) for div in range(divs)]
for row in range(divs):
    axes[row].plot([1]*10*(divs - row), c = 'r')
    axes[row].set_xlabel('', fontsize = 6)
fig.set_figheight(10)
fig.set_figwidth(10)
plt.show()
My problem is that the plots don't exactly align as I want them to: The plot on row 2 begins slightly to the right of the '10' tick mark on the plot on row 1, and the same applies for the plot on row 3 vs the plot on row 2 etc. I would like the beginning of the plot on row 2 to synchronize precisely with the '10' on row 1, and likewise for the other plots. How is this achievable (not necessarily but preferably using gridspec)?
I tried adding axes[row].tick_params(axis="y",direction="in", pad=-22) to push the y-axis inside the plot but that didn't change the alignment. Also I tried using fig.tight_layout(pad = 0.3): this did not change the alignment either.
If you set the default x-axis margin (the axes.xmargin rcParam) to 0, the ticks will match.
import matplotlib.pyplot as plt
divs = 4
fig = plt.figure()
gs = fig.add_gridspec(ncols = divs, nrows = divs)
plt.rcParams['axes.xmargin'] = 0.0 #updated
axes = [fig.add_subplot(gs[div, div:]) for div in range(divs)]
for row in range(divs):
    axes[row].plot([1]*10*(divs - row), c = 'r')
    axes[row].set_xlabel('', fontsize = 6)
fig.set_figheight(10)
fig.set_figwidth(10)
plt.show()
Alternatively, use plt.subplots with gridspec_kw and sharex='col':
import matplotlib.pyplot as plt
divs = 4
fig, axes = plt.subplots(4,1, gridspec_kw=dict(height_ratios=[1,1,1,1]), sharex='col', figsize=(10,10))
for row in range(divs):
    axes[row].plot([1]*10*(divs - row), c = 'r')
    axes[row].set_xlabel('', fontsize = 6)
plt.show()
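If changing the global rcParams is undesirable, the same margin can be removed per Axes with ax.margins(x=0). The sketch below is only an illustration of the margin effect under default rcParams, not a full solution to the alignment question: it compares the autoscaled x limits with and without the margin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax_default, ax_tight) = plt.subplots(2, 1)

# Ten points at x = 0..9 on both axes.
ax_default.plot([1] * 10, c="r")
ax_tight.plot([1] * 10, c="r")

# Remove the horizontal padding on the second axes only.
ax_tight.margins(x=0)

default_xlim = ax_default.get_xlim()  # padded beyond the data on both sides
tight_xlim = ax_tight.get_xlim()      # ends exactly at the first and last point
print(default_xlim, tight_xlim)
```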

Tick labels appearing twice

I am trying to create a figure that is a dendrogram on top of a scatterplot, where the ends of the leaves on the dendrogram match up with the dots on the scatterplot, which in turn match up with the tick labels below. I have this working, but for some reason the tick labels appear twice. The labels in red and green are the ones I'm trying to keep.
This is my code:
import pandas as pd
from matplotlib import pyplot as plt
import scipy.cluster.hierarchy as sch
import numpy as np
import json
import random
def scatter_and_dendrogram(df, colors,wn='',label_x=False):
    '''Args:
        df (Pandas DataFrame): similarity matrix
        colors (list of strs): list of colors
        wn (str): window name
        label_x=False(Bool): whether or not to label x axis
    Returns: None
    '''
    norm = plt.Normalize(1,4)
    dist_matrix = [] #linkage
    for i in range(len(df)):
        arr = []
        for j in range(1,len(df.iloc[i])):
            arr.append(df.iloc[i,j])
        dist_matrix.append(list(arr))
    X = np.asarray(dist_matrix)
    Z = sch.linkage(X, 'ward')
    sch.set_link_color_palette(['b'])
    fig = plt.figure()
    fig, axs = plt.subplots(2, 1, sharex='col', sharey='row',
                            gridspec_kw={'width_ratios': [1],
                                         'height_ratios': [30, 1],
                                         'hspace': 0, 'wspace': 0})
    (ax1, ax2) = axs
    dendrogram = sch.dendrogram(Z=Z, p=3,ax=ax1)
    icoords = dendrogram['icoord']
    dcoords = dendrogram['dcoord']
    lst = [[],[],colors]
    for i in range(len(icoords)):
        ic = icoords[i]
        dc = dcoords[i]
        if dc.count(0) == 2:
            lst[0].append(ic[0])
            lst[0].append(ic[-1])
        elif dc.count(0) == 1:
            ind = dc.index(0)
            lst[0].append(ic[ind])
    lst[1] = [-0.1]*len(lst[0])
    ax2.scatter(lst[0],lst[1],s=10,norm=norm, alpha=0.7)
    fig.canvas.set_window_title(wn)
    ax1.set_yticklabels([])
    ax1.set_xticklabels([])
    ax2.set_yticklabels([])
    ax2.set_xticklabels([])
    if label_x:
        letters = list('ABCD')
        labels = [letters[ind] for ind in dendrogram['leaves']]
        c1 = '#ff0033' #red
        c2 = '#006600'#green
        xlbls = ax2.set_xticklabels(labels,fontsize=11,linespacing=3)
        for lbl in xlbls:
            t = lbl.get_text()
            c = c2
            if letters.index(t) < 2:
                c = c1
            print(c)
            lbl.set_color(c)
    ax1.set_title(wn)
    ax1.set_ylabel('Aggregation Criterion',fontsize=15)
    ax2.set_xlabel('Articles', fontsize=15)
    plt.show()
l = ['A','B','C','D']
df = pd.DataFrame(index=l, columns=l)
for i in range(len(l)-1):
    for j in range(i+1, len(l)):
        r = random.randint(0, 10)
        df.iloc[i,j] = r
        df.iloc[j, i] = r
df.fillna(0,inplace=True)
print(df)
wn = 'Set C'
scatter_and_dendrogram(df, l, wn,True)
This is what it looks like:
According to the matplotlib.pyplot.subplots documentation on sharex and sharey:
When subplots have a shared x-axis along a column, only the x tick labels of the bottom subplot are created. Similarly, when subplots have a shared y-axis along a row, only the y tick labels of the first column subplot are created. To later turn other subplots' ticklabels on, use tick_params.
You need to add ax1.tick_params(axis='x', labelbottom=False) after the line xlbls = ax2.set_xticklabels(...).
Besides, fig = plt.figure() is redundant, because plt.subplots creates its own figure; remove it.
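A reduced sketch of that fix, without scipy, is shown below. The call to set_ticks_position("bottom") is only a stand-in (an assumption) for plotting code that switches the upper tick labels back on; the data and labels are made up. Because the x-axis is shared, the labels set on ax2 also apply to ax1, and tick_params then hides them on the upper axes only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

positions = [5, 15, 25, 35]  # made-up leaf positions

fig, (ax1, ax2) = plt.subplots(2, 1, sharex="col")
ax1.plot(positions, [3, 2, 3, 1])
ax2.scatter(positions, [0, 0, 0, 0])

# Stand-in (assumption): some plotting code re-enables the top axes' labels.
ax1.xaxis.set_ticks_position("bottom")

# Labels set on the bottom axes are shared with the top axes.
ax2.set_xticks(positions)
ax2.set_xticklabels(list("ABCD"))

# Hide the duplicated labels on the upper axes only.
ax1.tick_params(axis="x", labelbottom=False)
```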

How to have a secondary y axis in a nested GridSpec?

I'd like to obtain this figure:
But with two plots inside each graph, like this:
Here is a sample of the code I used for the first figure
measures = ['ACE', 'SCE', 'LZs', 'LZc']
conditions = ['dark','light','flick3','flick10','switch']
outer_grid = gridspec.GridSpec(2,2)
for measure in measures:
    inner_grid = gridspec.GridSpecFromSubplotSpec(5, 1, subplot_spec=outer_grid[measures.index(measure)])
    ax={}
    for cond in conditions:
        c=conditions.index(cond)
        ax[c] = plt.Subplot(fig, inner_grid[c])
        if c != 0:
            ax[c].get_shared_y_axes().join(ax[0], ax[c])
        ax[c].plot()
        ax[c+n]=ax[c].twinx()
        ax[c+n].scatter()
        ax[c+n].set_ylim(0,5)
        fig.add_subplot(ax[c],ax[c+n])
For the second plot, it's basically the same without the first loop and GridSpec, using ax[c]=plt.subplot('51{c}') instead of ax[c]=plt.Subplot(fig, inner_grid[c]).
As you can see, when using GridSpec I still have the secondary y axis but not the scatter plot associated.
I guess the short question would be How to write fig.add_subplot(ax[c],ax[c+n]) properly?
(fig.add_subplot(ax[c]) fig.add_subplot(ax[c+n]) in two lines doesn't work.)
It is not clear from your question exactly which data you're plotting in each subplot, plus the way you're creating your subplots seems a little convoluted, which is probably why you're having problems. Here is how I would do it:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gs
measures = ['ACE', 'SCE', 'LZs', 'LZc']
conditions = ['dark','light','flick3','flick10','switch']
colors = ['g','c','b','r','grey']
Npoints = 10
data = [np.random.random((Npoints,len(measures))) for i in range(len(conditions))]
gs00 = gs.GridSpec(len(conditions), 1)
fig = plt.figure(figsize=(5,5))
for i,condition in enumerate(conditions):
    ax1 = fig.add_subplot(gs00[i])
    ax2 = ax1.twinx()
    ax1.plot(range(Npoints), data[i][:,0], 'o-', color=colors[i], label=measures[0])
    ax2.plot(range(Npoints), data[i][:,1], 'o-.', color=colors[i], label=measures[1])
    ax1.set_ylim((-0.1,1.1))
    ax2.set_ylim(ax1.get_ylim())
    ax1.set_title(condition)
EDIT: to get the same thing repeated 4 times, the logic is exactly the same; you just have to play around with the gridspec. The only lines that matter are ax1 = fig.add_subplot(gs01[j]) followed by ax2 = ax1.twinx(), which creates a second axis on top of the first.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gs
measures = ['ACE', 'SCE', 'LZs', 'LZc']
conditions = ['dark','light','flick3','flick10','switch']
colors = ['g','c','b','r','grey']
Npoints = 10
data = [np.random.random((Npoints,len(measures))) for i in range(len(conditions))]
gs00 = gs.GridSpec(2,2)
# on matplotlib >= 3.6 this style is named 'seaborn-v0_8-paper'
plt.style.use('seaborn-paper')
fig = plt.figure(figsize=(10,10))
grid_x, grid_y = np.unravel_index(range(len(measures)),(2,2))
for i,measure in enumerate(measures):
    gs01 = gs.GridSpecFromSubplotSpec(len(conditions), 1, subplot_spec=gs00[grid_x[i],grid_y[i]])
    for j,condition in enumerate(conditions):
        ax1 = fig.add_subplot(gs01[j])
        ax2 = ax1.twinx()
        ax1.plot(range(Npoints), data[j][:,0], 'o-', color=colors[j], label=measures[0])
        ax2.plot(range(Npoints), data[j][:,1], 'o-.', color=colors[j], label=measures[1])
        ax1.set_ylim((-0.1,1.1))
        ax2.set_ylim(ax1.get_ylim())
        if j==0:
            ax1.set_title(measure)
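The same layout can also be sketched with fig.add_gridspec and SubplotSpec.subgridspec, with a scatter on the secondary axis as in the question. This is a hedged variant, assuming matplotlib 3.1 or later; the random data and the twins dictionary are placeholders for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

measures = ['ACE', 'SCE', 'LZs', 'LZc']
conditions = ['dark', 'light', 'flick3', 'flick10', 'switch']
rng = np.random.default_rng(0)  # placeholder data source
x = np.arange(10)

fig = plt.figure(figsize=(10, 10))
outer = fig.add_gridspec(2, 2)  # one outer cell per measure
twins = {}
for i, measure in enumerate(measures):
    # Nested grid: one row per condition inside the outer cell.
    inner = outer[i].subgridspec(len(conditions), 1)
    for j, condition in enumerate(conditions):
        ax = fig.add_subplot(inner[j])
        ax.plot(x, rng.random(10))
        # twinx creates the secondary y-axis and registers it with the figure,
        # so no extra fig.add_subplot call is needed for it.
        twin = ax.twinx()
        twin.scatter(x, rng.uniform(0, 5, 10), color='k', s=10)
        twin.set_ylim(0, 5)
        twins[(measure, condition)] = twin
        if j == 0:
            ax.set_title(measure)
```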
