I have a dataframe X with 30 variables, v1, v2 ... v30 and
col_name=[v1,v2.....v30]
For each variable, I want to plot a histogram to understand its distribution. However, writing the code to plot them one by one is too tedious. Can I use something like a for loop to draw all 30 histograms, one under another, in one go?
For example:
for i in range(30):
    plt.hist(np.array(X[col_name[i]]).astype(float), bins=100, color='blue', label=col_name[i], density=True, alpha=0.5)
How can I do that? I'd like one page of graphs (each with a title and label) that I can scroll down through.
You could do something like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'v1': np.random.normal(0, 3, 20),
    'v2': np.random.normal(0, 3, 20),
    'v3': np.random.normal(0, 3, 20),
    'v4': np.random.normal(0, 3, 20),
    'v5': np.random.normal(0, 3, 20),
    'v6': np.random.normal(0, 3, 20),
})

# Generically define how many plots along and across
ncols = 3
nrows = int(np.ceil(len(df.columns) / (1.0*ncols)))
# squeeze=False keeps axes 2-D even when there is only one row
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(10, 10), squeeze=False)

# Lazy counter so we can remove unwanted axes
counter = 0
for i in range(nrows):
    for j in range(ncols):
        ax = axes[i][j]
        # Plot when we have data
        if counter < len(df.columns):
            ax.hist(df[df.columns[counter]], bins=10, color='blue', alpha=0.5, label='{}'.format(df.columns[counter]))
            ax.set_xlabel('x')
            ax.set_ylabel('Count')
            leg = ax.legend(loc='upper left')
            leg.set_frame_on(False)
        # Remove axis when we no longer have data
        else:
            ax.set_axis_off()
        counter += 1

plt.show()
Adapted from: How do I get multiple subplots in matplotlib?
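Since the question asks for the histograms stacked one under another, each with its own title, here is a minimal sketch of that variant. It assumes every column is numeric; `plot_stacked_histograms` and `height_per_plot` are hypothetical names, and `density=True` stands in for the `normed=1` argument from the question, which newer Matplotlib versions no longer accept.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_stacked_histograms(df, bins=100, height_per_plot=2.5):
    """Draw one histogram per column, stacked vertically, each with its own title."""
    n = len(df.columns)
    # squeeze=False keeps `axes` 2-D even when df has a single column
    fig, axes = plt.subplots(nrows=n, ncols=1, squeeze=False,
                             figsize=(8, height_per_plot * n))
    axes = axes[:, 0]
    for ax, col in zip(axes, df.columns):
        # drop NaNs, which would break the automatic bin range
        values = df[col].astype(float).dropna()
        # density=True normalizes the histogram (the old normed=1)
        ax.hist(values, bins=bins, color='blue', alpha=0.5, density=True)
        ax.set_title(str(col))
        ax.set_xlabel(str(col))
        ax.set_ylabel('Density')
    fig.tight_layout()
    return fig, axes


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X = pd.DataFrame({f'v{i}': rng.normal(0, 3, 500) for i in range(1, 31)})
    plot_stacked_histograms(X)
    plt.show()
```

With 30 columns the figure is tall, so in a notebook or a saved PNG you can simply scroll through it.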
I want to create a heatmap in seaborn, and have a nice way to see the labels.
With ax.figure.tight_layout(), the resulting layout is obviously bad.
Without ax.figure.tight_layout(), the labels get cropped.
The code is:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
n_classes = 10
confusion = np.random.randint(low=0, high=100, size=(n_classes, n_classes))
label_length = 20
label_ind_by_names = {
"A"*label_length: 0,
"B"*label_length: 1,
"C"*label_length: 2,
"D"*label_length: 3,
"E"*label_length: 4,
"F"*label_length: 5,
"G"*label_length: 6,
"H"*label_length: 7,
"I"*label_length: 8,
"J"*label_length: 9,
}
# confusion matrix
df_cm = pd.DataFrame(
confusion,
index=label_ind_by_names.keys(),
columns=label_ind_by_names.keys()
)
plt.figure()
sn.set(font_scale=1.2)
ax = sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='d')
# ax.figure.tight_layout()
plt.show()
I would like to create an extra legend based on label_ind_by_names, then post an abbreviation on the heatmap itself, and be able to look up the abbreviation in the legend.
How can this be done in seaborn?
You can define your own legend handler, e.g. for integers:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
n_classes = 10
confusion = np.random.randint(low=0, high=100, size=(n_classes, n_classes))
label_length = 20
label_ind_by_names = {
"A"*label_length: 0,
"B"*label_length: 1,
"C"*label_length: 2,
"D"*label_length: 3,
"E"*label_length: 4,
"F"*label_length: 5,
"G"*label_length: 6,
"H"*label_length: 7,
"I"*label_length: 8,
"J"*label_length: 9,
}
# confusion matrix
df_cm = pd.DataFrame(
confusion,
index=label_ind_by_names.values(),
columns=label_ind_by_names.values()
)
fig, ax = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(left=0.05, right=.65)
sn.set(font_scale=1.2)
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='d', ax=ax)
class IntHandler:
    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        text = plt.matplotlib.text.Text(x0, y0, str(orig_handle))
        handlebox.add_artist(text)
        return text

ax.legend(label_ind_by_names.values(),
          label_ind_by_names.keys(),
          handler_map={int: IntHandler()},
          loc='upper left',
          bbox_to_anchor=(1.2, 1))
plt.show()
Explanation of the hard-coded figures: the first two (0.05 and 0.65) are the left and right positions of the Axes in the figure, as fractions of the figure width (0.05 = 5% of the figure width, etc.). The pair (1.2, 1) is the location of the upper-left corner of the legend box relative to the Axes: (1, 1) is the upper-right corner of the Axes, and 0.2 is added to 1 to account for the space used by the colorbar. Ideally one would use constrained layout instead of fiddling with these parameters, but it doesn't (yet) support figure legends, and with an Axes legend it places the legend between the Axes and the colorbar.
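A lighter alternative to a custom handler, sketched here under the assumption that plain text entries are enough, is to give the legend invisible handles and put the "abbreviation: full name" pairs into the labels themselves. `add_lookup_legend` is a hypothetical helper; `handlelength=0` and `handletextpad=0` only remove the empty space where the handle symbols would normally be drawn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D


def add_lookup_legend(ax, abbrev_by_name, **legend_kwargs):
    """Add a legend mapping each abbreviation to its full name (hypothetical helper)."""
    # A Line2D with no data, no line style and no marker draws nothing,
    # so only the label text is visible in the legend.
    handles = [Line2D([], [], linestyle='none') for _ in abbrev_by_name]
    labels = [f'{abbrev}: {name}' for name, abbrev in abbrev_by_name.items()]
    return ax.legend(handles, labels, handlelength=0, handletextpad=0, **legend_kwargs)


if __name__ == '__main__':
    n_classes = 10
    label_ind_by_names = {chr(ord('A') + i) * 20: i for i in range(n_classes)}
    confusion = np.random.randint(low=0, high=100, size=(n_classes, n_classes))
    # The short integer codes are used as tick labels on the heatmap itself
    df_cm = pd.DataFrame(confusion,
                         index=list(label_ind_by_names.values()),
                         columns=list(label_ind_by_names.values()))
    fig, ax = plt.subplots(figsize=(10, 5))
    fig.subplots_adjust(left=0.05, right=0.65)
    sns.heatmap(df_cm, annot=True, fmt='d', ax=ax)
    add_lookup_legend(ax, label_ind_by_names, loc='upper left', bbox_to_anchor=(1.2, 1))
    plt.show()
```

The layout numbers are the same hard-coded values as in the answer above, so the same caveats about tuning them apply.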
I have a dataframe with 1000 simulations of a portfolio's returns. I am able to graph the simulations and do the respective histogram separately, but I have absolutely no idea how to merge them in order to resemble the following image:
Please use this example data to facilitate answers:
import numpy as np
import pandas as pd
def simulate_panel(T, N):
    """This function simulates return paths"""
    dates = pd.date_range("20210218", periods=T, freq='D')
    columns = []
    for i in range(N):
        columns.append(str(i+1))
    return pd.DataFrame(np.random.normal(0, 0.01, size=(T, N)), index=dates,
                        columns=columns)
df=(1+simulate_panel(1000,1000)).cumprod()
df.plot(figsize=(8,6),title=('Bootstrap'), legend=False)
Thank you very much in advance.
To color the curves by their last value, they can be drawn one by one. With a colormap and a norm, the value can be converted to the appropriate color. With some transparency (alpha), the most visited positions will appear more strongly colored.
In a second subplot, a vertical histogram can be drawn, with the bars colored similarly.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def simulate_panel(T, N):
    """This function simulates return paths"""
    dates = pd.date_range("20210218", periods=T, freq='D')
    columns = [str(i + 1) for i in range(N)]
    return pd.DataFrame(np.random.normal(0, 0.01, size=(T, N)), index=dates, columns=columns)

df = (1 + simulate_panel(1000, 1000)).cumprod()

fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True, figsize=(12, 4),
                               gridspec_kw={'width_ratios': [5, 1], 'wspace': 0})
data = df.to_numpy().T  # one row per simulated path
cmap = plt.get_cmap('turbo')
norm = plt.Normalize(data[:, -1].min(), data[:, -1].max())  # maps final values to [0, 1]
for row in data:
    ax1.plot(df.index, row, c=cmap(norm(row[-1])), alpha=0.1)
ax1.margins(x=0)

_, bin_edges, bars = ax2.hist(data[:, -1], bins=20, orientation='horizontal')
for x0, x1, bar in zip(bin_edges[:-1], bin_edges[1:], bars):
    bar.set_color(cmap(norm((x0 + x1) / 2)))  # color each bar by its bin center
ax2.tick_params(left=False)

plt.tight_layout()
plt.show()
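Drawing 1000 curves with one `plot` call each can be slow. One possible speed-up, sketched below, is to put all paths into a single `matplotlib.collections.LineCollection` and let it color each path by its final value through `set_array`, `cmap` and `norm`. This is a swapped-in technique, not what the code above does; the dates are converted to Matplotlib's float date numbers with `matplotlib.dates.date2num` so the collection can sit on a date axis, and `plot_paths_collection` is a hypothetical name.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def plot_paths_collection(df, ax, cmap='turbo', alpha=0.1):
    """Draw every column of df as one line, colored by its last value (hypothetical helper)."""
    x = mdates.date2num(df.index.to_pydatetime())  # dates -> Matplotlib float dates
    data = df.to_numpy().T                          # one row per path
    # segments has shape (n_paths, n_points, 2): the (x, y) pairs of each path
    segments = np.stack([np.broadcast_to(x, data.shape), data], axis=-1)
    final = data[:, -1]
    lc = LineCollection(segments, cmap=cmap,
                        norm=plt.Normalize(final.min(), final.max()), alpha=alpha)
    lc.set_array(final)                             # color = cmap(norm(final value))
    ax.add_collection(lc)
    ax.autoscale_view()
    ax.xaxis_date()                                 # format the float x values as dates
    ax.margins(x=0)
    return lc


if __name__ == '__main__':
    dates = pd.date_range('20210218', periods=1000, freq='D')
    rng = np.random.default_rng(0)
    df = pd.DataFrame((1 + rng.normal(0, 0.01, size=(1000, 1000))).cumprod(axis=0), index=dates)
    fig, ax = plt.subplots(figsize=(10, 4))
    lc = plot_paths_collection(df, ax)
    fig.colorbar(lc, ax=ax, label='final value')
    plt.show()
```

The histogram panel from the answer above can be added unchanged next to it, reusing the same colormap and norm.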
You can use GridSpec to set up axes for line chart and the histogram next to each other:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# df is the simulated DataFrame from the question

# layout
fig = plt.figure(figsize=(8, 6))
gs = fig.add_gridspec(1, 2, wspace=0, width_ratios=[9, 1])
ax = gs.subplots(sharey=True)

# line chart, each path colored by its final value
z = df.iloc[-1]
df.plot(title='Bootstrap', legend=False, ax=ax[0],
        color=cm.RdYlBu_r((z - z.min()) / (z.max() - z.min())))

# histogram
n_bins = 20
cnt, bins, patches = ax[1].hist(
    z, np.linspace(z.min(), z.max(), n_bins),
    ec='k', orientation='horizontal')
colors = cm.RdYlBu_r((bins - z.min()) / (z.max() - z.min()))
for i, p in enumerate(patches):
    p.set_facecolor(colors[i])  # keep the black edges set by ec='k'

plt.show()
I cannot for the life of me find a similar question to this, and I have been pulling my hair out trying to figure out how to do this. It seems like it should be a simple thing!
The setup: I have some X vs Y data grouped into bins, and each bin contains X and Y data points. For each bin, I would like to plot the mean of X vs mean of Y along with their respective stdevs, and most importantly: color code each bin using the Seaborn "colorblind" palette (this is mandatory).
What I've tried: everything under the sun. Lineplot, scatterplot, catplot, pointplot. And when none of those worked, I tried matplotlib's errorbar, but I can't seem to export Seaborn's "colorblind" palette to matplotlib, so that was a bust too.
Some dummy code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
some_data = pd.DataFrame({'X':[9,10,11,12,39,40,41,42], 'Y':[99,100,110,111,499,500,510,511], 'Bin':[1,1,1,1,2,2,2,2]})
Results of some tries:
sns.pointplot(x="X", y="Y", data=some_data, legend='full', hue='Bin')
Pointplot completely screws up the x-axis scale, so that's another issue that I haven't been able to work around.
sns.lineplot(x="X", y="Y", data=some_data, legend='full', hue='Bin', err_style="band", estimator="mean", ci='sd')
Better, but it just draws a line between the points instead of calculating the mean and stdev, which I thought it would do when I specify an estimator and a confidence-interval method!
sns.scatterplot(x="X", y="Y", data=some_data, legend='full', hue='Bin')
Scatterplot is fine, but it doesn't possess estimator functionality so I'm literally just plotting the raw data.
I'm just completely lost on what to do. I've been at this all night. It's 4:30 AM and I've barely slept for the past few nights. Any help would be appreciated!
The following approach draws an ellipse using the means and standard deviations:
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'X':[9,10,11,12,39,40,41,42], 'Y':[99,100,110,111,499,500,510,511], 'Bin':[1,1,1,1,2,2,2,2]})
means = df.groupby('Bin').mean()
sdevs = df.groupby('Bin').std()
fig, ax = plt.subplots()
colors = ['crimson', 'dodgerblue']
sns.scatterplot(x='X', y='Y', hue='Bin', palette=colors, data=df, ax=ax)
sns.scatterplot(x='X', y='Y', data=means, color='limegreen', label='means', ax=ax)
for (_, mean), (_, sdev), color in zip(means.iterrows(), sdevs.iterrows(), colors):
    ellipse = Ellipse((mean['X'], mean['Y']), width=2*sdev['X'], height=2*sdev['Y'],
                      facecolor=color, alpha=0.3)
    ax.add_patch(ellipse)
plt.show()
Here is a more elaborate example, showing ellipses for 1, 2 and 3 times the sdev.
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import pandas as pd
import numpy as np
import seaborn as sns
K = 5
N = 100
df = pd.DataFrame({'X': np.random.normal(np.tile(np.random.uniform(10, 40, K), N), np.tile([3, 4, 7, 9, 10], N)),
'Y': np.random.normal(np.tile(np.random.uniform(90, 500, K), N), np.tile([20, 25, 8, 22, 18], N)),
'Bin': np.tile(np.arange(1, K + 1), N)})
means = df.groupby('Bin').mean()
sdevs = df.groupby('Bin').std()
fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
colors = ['crimson', 'dodgerblue', 'limegreen', 'turquoise', 'gold']
for ax in axes:
    sns.scatterplot(x='X', y='Y', hue='Bin', palette=colors, s=5, ec='none', data=df, ax=ax)
    sns.scatterplot(x='X', y='Y', marker='o', s=50, fc='none', ec='black', label='means', data=means, ax=ax)
    if ax == axes[1]:
        for (_, mean), (_, sdev), color in zip(means.iterrows(), sdevs.iterrows(), colors):
            for sdev_mult in [1, 2, 3]:
                ellipse = Ellipse((mean['X'], mean['Y']), width=2 * sdev['X'] * sdev_mult,
                                  height=2 * sdev['Y'] * sdev_mult,
                                  facecolor=color, alpha=0.2 if sdev_mult == 1 else 0.1)
                ax.add_patch(ellipse)
plt.show()
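The question specifically asks for mean ± standard deviation per bin in seaborn's "colorblind" palette. `sns.color_palette('colorblind')` returns a list of RGB tuples, which plain matplotlib accepts as colors, so the hand-picked `colors` list above can be replaced by it. A minimal sketch with `ax.errorbar` (the grouping mirrors the `means`/`sdevs` computation above; `plot_bin_means` is a hypothetical helper, and the raw points are drawn faintly for context):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_bin_means(df, ax, x='X', y='Y', by='Bin'):
    """Plot mean(x) vs mean(y) per bin with +/- 1 sample standard deviation error bars."""
    stats = df.groupby(by)[[x, y]].agg(['mean', 'std'])
    palette = sns.color_palette('colorblind', n_colors=len(stats))  # list of RGB tuples
    for color, (bin_id, row) in zip(palette, stats.iterrows()):
        # raw points of this bin, drawn faintly in the same color
        raw = df[df[by] == bin_id]
        ax.scatter(raw[x], raw[y], color=color, alpha=0.3, s=15)
        ax.errorbar(row[(x, 'mean')], row[(y, 'mean')],
                    xerr=row[(x, 'std')], yerr=row[(y, 'std')],
                    fmt='o', color=color, capsize=4, label=f'{by} {bin_id}')
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return stats


if __name__ == '__main__':
    some_data = pd.DataFrame({'X': [9, 10, 11, 12, 39, 40, 41, 42],
                              'Y': [99, 100, 110, 111, 499, 500, 510, 511],
                              'Bin': [1, 1, 1, 1, 2, 2, 2, 2]})
    fig, ax = plt.subplots()
    plot_bin_means(some_data, ax)
    plt.show()
```

Because `x` stays numeric here, the axis keeps its real scale, unlike `pointplot`, which treats `x` as categorical.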
I acknowledge this is not the full answer, but I hope it helps with the data stats and gives you some direction with the plot. I'm not terribly good with matplotlib/seaborn, so to get this over to you quickly, I've written the graph in plotly. I hope it at least points you in the right direction...
Mean / Std:
import pandas as pd
from plotly.offline import iplot
x = [9, 10, 11, 12, 39, 40, 41, 42]
y = [99, 100, 110, 111, 499, 500, 510, 511]
b = [1, 1, 1, 1, 2, 2, 2, 2]
df = pd.DataFrame({'x': x, 'y': y, 'bin': b})
df = df.groupby(['bin']).agg(['mean', 'std'])
df.columns = ['_'.join(c).rstrip('_') for c in df.columns.to_list()]
df.reset_index(inplace=True)
Output:
bin x_mean x_std y_mean y_std
0 1 10.5 1.290994 105 6.377042
1 2 40.5 1.290994 505 6.377042
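As an aside, pandas named aggregation (available since pandas 0.25) produces the same flat column names directly, without the join/rstrip step. This is an alternative spelling of the same statistics, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'x': [9, 10, 11, 12, 39, 40, 41, 42],
                   'y': [99, 100, 110, 111, 499, 500, 510, 511],
                   'bin': [1, 1, 1, 1, 2, 2, 2, 2]})

# each keyword becomes an output column: new_name=(source_column, aggregation)
stats = (df.groupby('bin')
           .agg(x_mean=('x', 'mean'), x_std=('x', 'std'),
                y_mean=('y', 'mean'), y_std=('y', 'std'))
           .reset_index())
print(stats)
```

Like the `agg(['mean', 'std'])` version, `'std'` is the sample standard deviation (ddof=1).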
Plotting:
data = []
for row in df.itertuples():
    data.append({'x': [row.x_mean],
                 'y': [row.y_mean],
                 'mode': 'markers',
                 'name': '{} mean'.format(row.bin),
                 'marker': {'size': 25}})
    data.append({'x': [row.x_std],
                 'y': [row.y_std],
                 'mode': 'markers',
                 'name': '{} std'.format(row.bin),
                 'marker': {'size': 25}})
iplot({'data': data})
In the resulting plot, the red and purple std markers overlay each other, because the two bins have the same stds.
I hope this helps a bit ...
I have a figure containing a graph and two tables.
I want to align the x-position of each sample with the center of the respective column.
The amount of columns is the same as the amount of samples to plot.
I have found this related question, which covers the same problem for a bar chart, but I couldn't transfer the solution to my case.
Here is a minimal, working code example:
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(20)
b = np.random.randint(1, 5, 20)
fig, ax = plt.subplots()
ax.plot(a, b, marker='o')
ax.table(np.random.randint(1, 5, (4, 20)), loc="top")
ax.table(np.random.randint(1, 5, (4, 20)))
ax.set_xticklabels([])
plt.subplots_adjust(top=0.85, bottom=0.15)
fig.savefig('test.png')
In the output, the circles representing the samples are not centered on their respective columns.
Any help appreciated!
What has always worked for me is changing the xlim, which hard-codes the alignment (first and last being the first and last x values):
plt.xlim(left=first-0.5, right=last+0.5)
Integrating this into your example gives:
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(20)
b = np.random.randint(1, 5, 20)
fig, ax = plt.subplots()
ax.plot(a, b, marker='o')
ax.table(np.random.randint(1, 5, (4, 20)), loc="top")
ax.table(np.random.randint(1, 5, (4, 20)))
ax.set_xticklabels([])
plt.xlim(left=a[0]-0.5, right=a[-1]+0.5)
plt.subplots_adjust(top=0.85, bottom=0.15)
fig.savefig('test.png')
Hope that helps!
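Why the ±0.5 padding works: with the default column widths, `ax.table` gives each of the n columns an equal share (1/n) of the Axes width, so column i is centered at (i + 0.5)/n in Axes coordinates. With xlim = (a[0] - 0.5, a[-1] + 0.5) and unit-spaced samples, sample i lands at exactly the same fraction. A quick check of that arithmetic, assuming evenly spaced integer x values as in the example:

```python
import numpy as np

a = np.arange(20)                    # sample x positions, as in the question
n = len(a)                           # one table column per sample
xmin, xmax = a[0] - 0.5, a[-1] + 0.5

# position of each sample as a fraction of the Axes width
sample_frac = (a - xmin) / (xmax - xmin)
# center of each equally wide table column, also as a fraction of the Axes width
column_centers = (np.arange(n) + 0.5) / n

print(np.allclose(sample_frac, column_centers))  # True
```

For samples spaced d apart, the same argument gives a padding of d/2 on each side.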
Here is my code (adapted from here):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'Cells' : np.arange(0,100), 'Delta_7' : np.random.rand(100,), 'Delta_10' : np.random.rand(100,), 'Delta_14' : np.random.rand(100,)}, columns = ['Cells','Delta_7', 'Delta_10', 'Delta_14'])
#figure
fig, ax1 = plt.subplots()
fig.set_size_inches(13, 10)
#c sequence
c = df_1['Delta_7']
#plot
plt.scatter(np.full((len(df_1), 1), 1), df_1['Delta_7'] , s = 50, c=c, cmap = 'viridis')
plt.scatter(np.full((len(df_1), 1), 2), df_1['Delta_10'] , s = 50, c=c, cmap = 'viridis')
plt.scatter(np.full((len(df_1), 1), 3), df_1['Delta_14'] , s = 50, c=c, cmap = 'viridis')
cbar = plt.colorbar()
I would like to make a beautiful jitterplot (like in R or seaborn) with matplotlib. The catch is that I would like to color each cell by its 'Delta_7' value and keep that color when plotting 'Delta_10' and 'Delta_14', which I didn't manage to do with seaborn.
Could you let me know if you have any ideas (Python package, coding tricks, …)?
Kindly,
The positions of the dots can be obtained from the PathCollection returned by scatter (via get_offsets()). These positions can be jittered, for example only in the x-direction. The range of the x-axis may need to be extended a bit to show every displaced dot.
Here is some code to start experimenting:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def jitter_dots(dots):
    offsets = dots.get_offsets()
    jittered_offsets = offsets.copy()
    # only jitter in the x-direction
    jittered_offsets[:, 0] += np.random.uniform(-0.3, 0.3, offsets.shape[0])
    dots.set_offsets(jittered_offsets)
df_1 = pd.DataFrame({'Cells': np.arange(0, 100),
'Delta_7': np.random.rand(100),
'Delta_10': np.random.rand(100),
'Delta_14': np.random.rand(100)})
fig, ax1 = plt.subplots()
columns = df_1.columns[1:]
c = df_1['Delta_7']
for i, column in enumerate(columns):
    dots = plt.scatter(np.full((len(df_1), 1), i), df_1[column], s=50, c=c, cmap='plasma')
    jitter_dots(dots)
plt.xticks(range(len(columns)), columns)
xmin, xmax = plt.xlim()
plt.xlim(xmin - 0.3, xmax + 0.3) # make some room to show the jittered dots
cbar = plt.colorbar()
plt.show()
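An alternative that avoids touching the offsets after the fact is to add the jitter to the x positions before calling scatter, and to pass one shared `Normalize` so the color scale is explicitly the same for every column. A minimal sketch; `jitter_plot` and its `width` and `seed` parameters are hypothetical names, and the jitter is uniform as in the answer above:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def jitter_plot(df, columns, color_by, ax, width=0.3, cmap='plasma', seed=None):
    """Scatter each column at x = its index +/- uniform jitter, colored by `color_by`."""
    rng = np.random.default_rng(seed)
    c = df[color_by].to_numpy()
    norm = plt.Normalize(c.min(), c.max())   # one color scale shared by all columns
    for i, column in enumerate(columns):
        x = i + rng.uniform(-width, width, len(df))
        sc = ax.scatter(x, df[column], s=50, c=c, cmap=cmap, norm=norm)
    ax.set_xticks(range(len(columns)))
    ax.set_xticklabels(columns)
    ax.set_xlim(-0.5 - width, len(columns) - 0.5 + width)  # room for the jittered dots
    return sc


if __name__ == '__main__':
    df_1 = pd.DataFrame({'Cells': np.arange(100), 'Delta_7': np.random.rand(100),
                         'Delta_10': np.random.rand(100), 'Delta_14': np.random.rand(100)})
    fig, ax = plt.subplots()
    sc = jitter_plot(df_1, ['Delta_7', 'Delta_10', 'Delta_14'], 'Delta_7', ax, seed=0)
    fig.colorbar(sc, ax=ax, label='Delta_7')
    plt.show()
```

Since every scatter call gets the same `c` array and the same norm, each cell keeps its 'Delta_7' color in all three columns.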