How to create subplots of all column combinations from two dataframes - python

I have made a function that plots input variables against predicted variables.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
dummy_data = pd.DataFrame(np.random.uniform(low=65.5,high=140.5,size=(50,4)), columns=list('ABCD'))
dummy_predicted = pd.DataFrame(np.random.uniform(low=15.5,high=17.5,size=(50,4)), columns=list('WXYZ'))
## Plot test input distributions
fig = plt.figure(figsize=(15,6))
n_rows = 1
n_cols = 4
counter = 1
for i in dummy_data.keys():
    plt.subplot(n_rows, n_cols, counter)
    plt.scatter(dummy_data[i], dummy_predicted['Z'])
    plt.title(f'{i} vs Z')
    plt.xlabel(i)
    counter += 1
plt.tight_layout()
plt.show()
How do I create a 4 x 4 subplot of all combinations of 'ABCD' and 'WXYZ'? I can have any number of dummy_data and dummy_predicted columns so some dynamism would be useful.

Use itertools.product from the standard library to create all combinations of column names (combos).
Use the len of each set of columns to determine nrows and ncols for plt.subplots.
Flatten the array of axes to easily iterate through a 1D array instead of a 2D array.
zip combos and axes to iterate through, and plot each group with a single loop.
See this answer in How to plot in multiple subplots.
from itertools import product
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# sample data
np.random.seed(2022)
dd = pd.DataFrame(np.random.uniform(low=65.5, high=140.5, size=(50, 4)), columns=list('ABCD'))
dp = pd.DataFrame(np.random.uniform(low=15.5, high=17.5, size=(50, 4)), columns=list('WXYZ'))
# create combinations of columns
combos = product(dd.columns, dp.columns)
# create subplots
fig, axes = plt.subplots(nrows=len(dd.columns), ncols=len(dp.columns), figsize=(15, 6))
# flatten axes into a 1d array
axes = axes.flat
# iterate and plot
for (x, y), ax in zip(combos, axes):
    ax.scatter(dd[x], dp[y])
    ax.set(title=f'{x} vs. {y}', xlabel=x, ylabel=y)
plt.tight_layout()
plt.show()
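Because the number of columns can vary, it may help to wrap the steps above in a small helper. The following is a minimal sketch rather than the answer's original code: plot_column_grid is a hypothetical name, and squeeze=False is added on the assumption that either dataframe might have only one column, in which case plt.subplots would otherwise not return a 2D array.

```python
from itertools import product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_column_grid(inputs, predicted, figsize=(15, 12)):
    """Scatter every column of `inputs` against every column of `predicted`.

    Hypothetical helper: rows follow `inputs`, columns follow `predicted`.
    """
    n_rows, n_cols = len(inputs.columns), len(predicted.columns)
    # squeeze=False keeps `axes` 2D even when n_rows or n_cols is 1
    fig, axes = plt.subplots(n_rows, n_cols, figsize=figsize, squeeze=False)
    combos = product(inputs.columns, predicted.columns)
    # product() and axes.flat both advance row by row, so the pairs line up
    for (x, y), ax in zip(combos, axes.flat):
        ax.scatter(inputs[x], predicted[y])
        ax.set(title=f"{x} vs. {y}", xlabel=x, ylabel=y)
    fig.tight_layout()
    return fig, axes


# made-up sample data: three input columns, one predicted column
rng = np.random.default_rng(2022)
dd = pd.DataFrame(rng.uniform(65.5, 140.5, size=(50, 3)), columns=list("ABC"))
dp = pd.DataFrame(rng.uniform(15.5, 17.5, size=(50, 1)), columns=["Z"])
fig, axes = plot_column_grid(dd, dp)  # a 3 x 1 grid
```

Call plt.show() afterwards to display the figure. The helper returns fig and axes so that individual panels can still be adjusted by [row, col] index.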

Just use a nested for loop:
n_rows = len(dummy_data.columns)
n_cols = len(dummy_predicted.columns)
fig, axes = plt.subplots(n_rows, n_cols, figsize=(15,6))
for row, data_col in enumerate(dummy_data):
    for col, pred_col in enumerate(dummy_predicted):
        ax = axes[row][col]
        ax.scatter(dummy_data[data_col], dummy_predicted[pred_col])
        ax.set_title(f'{data_col} vs {pred_col}')
        ax.set_xlabel(data_col)
plt.tight_layout()
plt.show()

Related

Automatically get the dimensions or indices of matplotlib gridspec

Given a gridspec object in matplotlib, I want to automatically iterate through all its indices so I can add the corresponding Axes automatically, something like:
for i, j in gspec.indices: # whatever those indices are
    axs[i,j] = fig.add_subplot(gspec[i][j])
How do I do that, without knowing how many rows or columns the gridspec has in advance?
gspec.get_geometry() returns the number of rows and of columns. Here is some example code:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(constrained_layout=True)
gspec = fig.add_gridspec(3, 4)
nrows, ncols = gspec.get_geometry()
axs = np.array([[fig.add_subplot(gspec[i, j]) for j in range(ncols)] for i in range(nrows)])
t = np.linspace(0, 4 * np.pi, 1000)
for i in range(nrows):
    for j in range(ncols):
        axs[i, j].plot(np.sin((i + 1) * t), np.sin((j + 1) * t))
plt.show()
If axs isn't needed as a NumPy array, the conversion with np.array can be left out, which leaves a plain nested list.
Note that the code assumes you need a subplot in every possible grid position, which also can be obtained via fig, axs = plt.subplots(...). A gridspec is typically used when you want to combine grid positions to create custom layouts, as shown in the examples of the tutorial.
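As a rough illustration of such a custom layout, the sketch below reuses the 3 x 4 gridspec from above and merges grid positions by slicing it. The particular arrangement and the names ax_top, ax_left and ax_main are made up for this example.

```python
import matplotlib.pyplot as plt

fig = plt.figure(constrained_layout=True)
gspec = fig.add_gridspec(3, 4)

# slices of the gridspec merge several grid cells into one Axes
ax_top = fig.add_subplot(gspec[0, :])     # whole first row
ax_left = fig.add_subplot(gspec[1:, 0])   # first column of rows 1-2
ax_main = fig.add_subplot(gspec[1:, 1:])  # remaining 2 x 3 block

for name, ax in [("top", ax_top), ("left", ax_left), ("main", ax_main)]:
    ax.set_title(name)
```

Here the twelve grid positions end up in only three Axes, which is the kind of layout plt.subplots(...) does not produce directly.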

How to plot barplot 3D projection in matplotlib for multiple columns

I have a table that contains three different time characteristics according to two different parameters. I want to plot those parameters on x and y-axis and show bars of the three different times on the z-axis. I have created a simple bar plot where I plot one of the time characteristics:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = ['R','Users','A','B','C']
df=pd.DataFrame({'R':[2,2,2,4,4,4,6,6,6,8,8],
'Users':[80,400,1000,80,400,1000,80,400,1000,80,400],
'A':[ 0.05381,0.071907,0.08767,0.04493,0.051825,0.05295,0.05285,0.0804,0.0967,0.09864,0.1097],
'B':[0.04287,0.83652,5.49683,0.02604,.045599,2.80836,0.02678,0.32621,1.41399,0.19025,0.2111],
'C':[0.02192,0.16217,0.71645, 0.25314,5.12239,38.92758,1.60807,262.4874,8493,11.6025,6288]},
columns=columns)
fig = plt.figure()
ax = plt.axes(projection="3d")
num_bars = 11
x_pos = df["R"]
y_pos = df["Users"]
z_pos = [0] * num_bars
x_size = np.ones(num_bars)/4
y_size = np.ones(num_bars)*50
z_size = df["A"]
ax.bar3d(x_pos, y_pos, z_pos, x_size, y_size, z_size, color='aqua')
plt.show()
This produces a simple 3d barplot:
However, I would like to plot similar bars next to the existing ones for the remaining two columns (B and C) in different colors, and add a legend as well. I could not figure out how to achieve this.
As a side question, is it also possible to show only the values from df on the x- and y-axes? The values are 2-4-6-8 and 80-400-1000, and I do not want pyplot to add additional values on those axes.
I managed to find a solution myself. To deal with the wide range of values, I added one to all times (to avoid negative logarithms) and applied np.log to all time columns. This brings the values onto a scale of roughly 0-10 and makes the plot much easier to read. I then loop over the columns and build the corresponding values, positions and colors, collecting each in a list. I shift y_pos for each column so that the bars are not drawn at the same position.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = ['R','Users','A','B','C']
df=pd.DataFrame({'R':[2,2,2,4,4,4,6,6,6,8,8],
'Users':[80,400,1000,80,400,1000,80,400,1000,80,400],
'A':[ 0.05381,0.071907,0.08767,0.04493,0.051825,0.05295,0.05285,0.0804,0.0967,0.09864,0.1097],
'B':[0.04287,0.83652,5.49683,0.02604,.045599,2.80836,0.02678,0.32621,1.41399,0.19025,0.2111],
'C':[0.02192,0.16217,0.71645, 0.25314,5.12239,38.92758,1.60807,262.4874,8493,11.6025,6288]},
columns=columns)
fig = plt.figure(figsize=(10, 10))
ax = plt.axes(projection="3d")
df["A"] = np.log(df["A"]+1)
df["B"] = np.log(df["B"]+1)
df["C"] = np.log(df["C"]+1)
colors = ['r', 'g', 'b']
num_bars = 11
x_pos = []
y_pos = []
x_size = np.ones(num_bars*3)/4
y_size = np.ones(num_bars*3)*50
c = ['A','B','C']
z_pos = []
z_size = []
z_color = []
for i,col in enumerate(c):
    x_pos.append(df["R"])
    y_pos.append(df["Users"]+i*50)
    z_pos.append([0] * num_bars)
    z_size.append(df[col])
    z_color.append([colors[i]] * num_bars)
x_pos = np.reshape(x_pos,(33,))
y_pos = np.reshape(y_pos,(33,))
z_pos = np.reshape(z_pos,(33,))
z_size = np.reshape(z_size,(33,))
z_color = np.reshape(z_color,(33,))
ax.bar3d(x_pos, y_pos, z_pos, x_size, y_size, z_size, color=z_color)
plt.xlabel('R')
plt.ylabel('Users')
ax.set_zlabel('Time')
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], marker='o', color='w', label='A',markerfacecolor='r', markersize=10),
Line2D([0], [0], marker='o', color='w', label='B',markerfacecolor='g', markersize=10),
Line2D([0], [0], marker='o', color='w', label='C',markerfacecolor='b', markersize=10)
]
# Make legend
ax.legend(handles=legend_elements, loc='best')
# Set view
ax.view_init(elev=35., azim=35)
plt.show()
Final plot:
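For the side question about showing only the values from df on the x- and y-axes, one option is to set the ticks explicitly with set_xticks and set_yticks. The sketch below is an assumption-laden variant, not the code above: it uses a small made-up table with the same column layout, calls bar3d once per column instead of reshaping lists, uses np.log1p (equivalent to np.log(x + 1)), and builds the legend from Patch proxies.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

# small made-up table with the same column layout as df above
df = pd.DataFrame({
    "R": [2, 2, 4, 4],
    "Users": [80, 400, 80, 400],
    "A": [0.05, 0.07, 0.04, 0.05],
    "B": [0.04, 0.84, 0.03, 0.05],
    "C": [0.02, 0.16, 0.25, 5.12],
})
time_cols = ["A", "B", "C"]
colors = ["r", "g", "b"]

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")

x = df["R"].to_numpy()
y = df["Users"].to_numpy()
for i, (col, color) in enumerate(zip(time_cols, colors)):
    # shift each column along y so the bars sit side by side
    heights = np.log1p(df[col].to_numpy())
    ax.bar3d(x, y + i * 50, 0, 0.25, 50, heights, color=color)

# show only the values that occur in the data
ax.set_xticks(sorted(df["R"].unique()))
ax.set_yticks(sorted(df["Users"].unique()))
ax.set_xlabel("R")
ax.set_ylabel("Users")
ax.set_zlabel("log(time + 1)")
# proxy patches stand in for the bars in the legend
ax.legend(handles=[Patch(color=c, label=n) for n, c in zip(time_cols, colors)])
```

Call plt.show() to display the result. Because each column gets its own bar3d call, the number of rows no longer has to be hard-coded.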

How should a nested loop statement to create vertical lines in a for loop statement that creates histograms work?

I am trying to use a for loop to create a histogram for each field in a dataframe. The dataframe here is named 'df4'.
There are 3 fields/columns.
Then I want to create vertical lines using quantiles for each of the columns as defined in the following series: p, exp, eng.
My code below only successfully creates the vertical lines on the last field/column or histogram.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df4 = pd.read_csv("xyz.csv", index_col = "abc_id" )
# dataframe
# x coordinates for the lines
p = df4['abc'].quantile([0.25,0.5,0.75,0.9,0.95])
exp = df4['efg'].quantile([0.25,0.5,0.75,0.9,0.95])
eng = df4['xyz'].quantile([0.25,0.5,0.75,0.9,0.95])
# colors for the lines
colors = ['r','k','b','g','y']
bins = [0,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000]
fig, axs = plt.subplots(len(df4.columns), figsize=(10, 25))
for n, col in enumerate(df4.columns):
    if (n==0):
        for xc,c in zip(exp,colors):
            plt.axvline(x=xc, label='line at x = {}'.format(xc), c=c)
    if (n==1):
        for xc,c in zip(eng,colors):
            plt.axvline(x=xc, label='line at x = {}'.format(xc), c=c)
    if (n==2):
        for xc,c in zip(p,colors):
            plt.axvline(x=xc, label='line at x = {}'.format(xc), c=c)
    df4[col].hist(ax=axs[n],bins=50)
plt.legend()
plt.show()
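plt.axvline draws on the current Axes, which after plt.subplots is typically the last one created, and that is consistent with the lines only showing up on the last histogram. One possible way around it is to call axvline on each Axes object directly. The sketch below makes several assumptions: it uses synthetic data in place of xyz.csv, the column names are taken from the quantile code above, and each histogram gets the quantiles of its own column.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic stand-in for df4 (the real data comes from xyz.csv)
rng = np.random.default_rng(0)
df4 = pd.DataFrame(rng.uniform(0, 2000, size=(200, 3)),
                   columns=["abc", "efg", "xyz"])

quantiles = [0.25, 0.5, 0.75, 0.9, 0.95]
colors = ["r", "k", "b", "g", "y"]

fig, axs = plt.subplots(len(df4.columns), figsize=(10, 12))
for ax, col in zip(axs, df4.columns):
    df4[col].hist(ax=ax, bins=50)
    # draw on this column's own Axes instead of the "current" one
    for xc, c in zip(df4[col].quantile(quantiles), colors):
        ax.axvline(x=xc, label=f"line at x = {xc:.0f}", c=c)
    ax.set_title(col)
    ax.legend()
fig.tight_layout()
```

Call plt.show() to display the figure. Computing the quantiles inside the loop also removes the need for the separate if (n==...) branches.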

Pandas combine multiple subplots with same x axis into 1 bar chart

I am looping through a list containing 6 column names. I take 3 columns at a time so I can print 3 subplots per iteration later.
I have 2 dataframes with the same column names, so they look identical except for the histograms of each column.
I want to plot matching columns of both dataframes on the same subplot. Right now, I'm plotting their histograms on 2 separate subplots.
Currently, for columns 'A','B','C' in df_plot:
and for columns 'A','B','C' in df_plot2:
I only want 3 charts, with matching column names combined into the same chart so that there are blue and yellow bars in the same chart.
Adding df_plot2 below doesn't work. I think I'm not defining my second axs properly, but I'm not sure how to do that.
col_name_list = ['A','B','C','D','E','F']
chunk_list = [col_name_list[i:i + 3] for i in range(0, len(col_name_list), 3)]
for k,g in enumerate(chunk_list):
    df_plot = df[g]
    df_plot2 = df[g][df[g] != 0]
    fig, axs = plt.subplots(1,len(g),figsize = (50,20))
    axs = axs.ravel()
    for j,x in enumerate(g):
        df_plot[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs[j], position=0, title = x, fontsize = 30)
        # adding this doesn't work.
        df_plot2[x].value_counts(normalize=True).head().plot(kind='bar',ax=axs[j], position=1, fontsize = 30)
        axs[j].title.set_size(40)
    fig.tight_layout()
The solution is to draw both series on the same Axes, i.e. pass the same ax object to both plot calls:
for k,g in enumerate(chunk_list):
    df_plot = df[g]
    df_plot2 = df[g][df[g] != 0]
    fig, axs = plt.subplots(1,len(g),figsize = (50,20))
    axs = axs.ravel()
    for j,x in enumerate(g):
        ax = axs[j]  # one Axes per column, shared by both plots
        df_plot[x].value_counts(normalize=True).head().plot(kind='bar',ax=ax, position=0, title = x, fontsize = 30)
        df_plot2[x].value_counts(normalize=True).head().plot(kind='bar',ax=ax, position=1, fontsize = 30)
        ax.title.set_size(40)
    fig.tight_layout()
Then call plt.show().
For example, this plots x and y on the same subplot:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 1)
y = np.arange(0, 20, 2)
# create one figure with a single Axes
fig, ax = plt.subplots()
ax.plot(x)
ax.plot(y)
plt.show()
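For the bar charts in the question, another possible approach is to collect both sets of relative frequencies in one DataFrame and let pandas draw grouped bars on a single Axes. This is only a sketch under assumptions: the Series s is made-up data standing in for one column of df, and the labels "all values" and "non-zero" are invented for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# made-up data standing in for one column of df
s = pd.Series([0, 0, 1, 1, 1, 2], name="A")

# one column of relative frequencies per data set; the shared index
# aligns the categories, and a category missing from one set becomes 0
freq = pd.DataFrame({
    "all values": s.value_counts(normalize=True),
    "non-zero": s[s != 0].value_counts(normalize=True),
}).fillna(0).sort_index()

fig, ax = plt.subplots()
freq.plot(kind="bar", ax=ax, title=s.name)  # grouped bars on one Axes
```

Since each DataFrame column is drawn as its own bar group, the two data sets get different colors and a legend entry without any manual positioning.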
EDIT:
There is now a squeeze keyword argument. Passing squeeze=False makes sure the result is always a 2D NumPy array.
fig, ax2d = plt.subplots(2, 2, squeeze=False)
If needed, turning that into a 1D array is easy:
axli = ax2d.flatten()

Using two indices to refer to subplot axes when either the number of rows or columns is unity

It is possible to refer to different subplots using two indices to index their axes as in the following example
import matplotlib.pyplot as plt
import numpy as np
def plotFunction(xdata, ydata, ax, i, j):
    ax[i,j].plot(xdata, ydata, marker='o', label='quadratic')
rows = 2
cols = 2
f, ax = plt.subplots(rows, cols)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1)
However, if either rows or cols is 1, plt.subplots returns a 1D array of axes (or a single Axes object if both are 1), so two indices cannot be used. This precludes the generic use of my plotting function, which relies on double indexing. So the following won't work:
rows = 1
cols = 2
f, ax = plt.subplots(rows, cols)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1)  # IndexError: ax is 1D here
One way is to use the squeeze=False option when calling subplots:
def plotFunction(xdata, ydata, ax, i, j, label):
    ax[i,j].plot(xdata, ydata, marker='o', label=label)
rows = 1
cols = 2
f, ax = plt.subplots(rows, cols, squeeze=False)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1,label='quadratic')
This permits [row,col] indexing in all cases.
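The effect of squeeze can be checked by looking at the shape of the returned array. The following is a small sketch with arbitrary grid sizes and made-up variable names:

```python
import matplotlib.pyplot as plt

# default squeeze=True drops dimensions of length 1
_, ax_default = plt.subplots(1, 2)
# squeeze=False always returns a 2D array of Axes
_, ax_2d = plt.subplots(1, 2, squeeze=False)
_, ax_single = plt.subplots(1, 1, squeeze=False)

print(ax_default.shape)  # (2,)
print(ax_2d.shape)       # (1, 2)
print(ax_single.shape)   # (1, 1)

# a 2D array can still be walked as a flat sequence when convenient
for ax in ax_2d.flat:
    ax.plot([0, 1], [0, 1])
plt.close("all")
```

With squeeze=False, code that indexes ax[i, j] does not need a special case for single-row or single-column grids.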
