Plot two histograms on single chart with matplotlib - python

I created a histogram plot using data from a file without any problem. Now I want to superimpose data from another file on the same histogram, so I do something like this:
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. How can I plot both histograms at the same time with different colors?

Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
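The range of -10 to 10 above is chosen by hand. As a sketch of one alternative, shared bin edges can be derived from the pooled data with numpy.histogram_bin_edges, so that both histograms use identical bins without hard-coded limits. The seed, the bin count of 50 and the variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
x = rng.normal(3, 1, 400)
y = rng.normal(4, 2, 400)

# Compute one set of edges from the pooled data so both histograms line up.
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=50)

fig, ax = plt.subplots()
ax.hist(x, bins=bins, alpha=0.5, label='x')
ax.hist(y, bins=bins, alpha=0.5, label='y')
ax.legend(loc='upper right')
# plt.show()  # uncomment to display the figure
```

Because the edges span the minimum and maximum of both samples, no data point falls outside the plotted range.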

The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars to be side-by-side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-deep')  # renamed to 'seaborn-v0_8-deep' in matplotlib 3.6
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by #stochastic_zeitgeist

In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# plot once only to obtain the counts and bin edges
n, bins, patches = ax1.hist([y1, y2], color=colors)
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()

As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
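The two normalizations can also be compared numerically. A minimal sketch, assuming np.histogram as a stand-in for the plotting call (it accepts the same weights and density arguments); the seed and variable names are assumptions. With weights of 1/N each bar height is the fraction of samples in that bin, whereas density=True rescales the heights so that the bars enclose unit area.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice
y = rng.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)

# Weights of 1/N: each bar height is the fraction of samples in that bin.
frac, _ = np.histogram(y, bins=bins, weights=np.full(y.shape, 1 / y.size))

# density=True: heights are rescaled so that the bars enclose unit area.
dens, _ = np.histogram(y, bins=bins, density=True)

print(frac.sum())                    # at most 1; samples outside the bins are dropped
print((dens * np.diff(bins)).sum())  # 1 up to floating-point rounding
```

Which of the two is preferable depends on whether the y-axis should read as "share of samples per bin" or as a probability density.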

You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
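# density=True replaces the normed argument, which was removed from newer matplotlib versions
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
plt.show()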

Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
import matplotlib.pyplot as plt
def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g', 'r'], alpha=0.8, bins=50)
    plt.show()

Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
        alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.

It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.

There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
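As a quick check of what the swap does (a sketch; the seed and names are assumptions): the histogram call treats each column of a 2-D array as one dataset, so an array of shape (2, 300) has to become (300, 2) to yield two histograms. Left as it is, it would be read as 300 datasets of 2 values each. For a 2-D array, data.T gives the same result as np.swapaxes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 300))     # 2 datasets stored as rows

swapped = np.swapaxes(data, 0, 1)    # shape (300, 2): one column per dataset
assert np.array_equal(swapped, data.T)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(swapped, bins=30, label=['x', 'y'])
print(np.asarray(n).shape)           # one row of counts per dataset
# plt.show()  # uncomment to display the figure
```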

Another option, quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:

Just in case you have pandas or are OK with using it:
import random
import pandas as pd
import matplotlib.pyplot as plt
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()

This question has been answered before, but I wanted to add another quick and easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.

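Inspired by Solomon's answer, but sticking with the question, which is about histograms, a clean solution is:
import seaborn as sns
import matplotlib.pyplot as plt
# note: distplot is deprecated since seaborn 0.11 in favour of histplot/displot
sns.distplot(bar)
sns.distplot(foo)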
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.

Related

How to overlay two 2D-histograms in Matplotlib?

I have two datasets (corresponding with the time-positional data of hydrogen atoms and time-positional data of alumina atoms) in the same system.
I want to plot the density of each element by overlaying two hist2d plots using matplotlib.
I am currently doing this by setting an alpha value on the second hist2d:
fig, ax = plt.subplots(figsize=(4, 4))
v = ax.hist2d(x=alx, y=aly,
bins=50, cmap='Reds')
h = ax.hist2d(x=hx, y=hy,
bins=50, cmap='Blues',
alpha=0.7)
ax.set_title('Adsorption over time, {} K'.format(temp))
ax.set_xlabel('picoseconds')
ax.set_ylabel('z-axis')
fig.colorbar(h[3], ax=ax)
fig.savefig(savename, dpi=300)
I do get the plot that I want, however the colors seem washed out due to the alpha value.
Is there a more correct way to generate such plots?
One way to achieve this would be to add an alpha that fades out towards the lower levels of the existing colormaps:
import numpy as np
import matplotlib.pylab as pl
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
# modify existing Reds colormap with a linearly fading alpha
red = pl.cm.Reds # original colormap
fading_red = red(np.arange(red.N)) # extract colors
fading_red[:, -1] = np.linspace(0, 1, red.N) # modify alpha
fading_red = ListedColormap(fading_red) # convert to colormap
# data generation
random_1 = np.random.randn(10000)+1
random_2 = np.random.randn(10000)+1
random_3 = np.random.randn(10000)
random_4 = np.random.randn(10000)
# plot
fig, ax = plt.subplots(1,1)
plt.hist2d(x=random_3, y=random_4, bins=100, cmap="Blues")
plt.hist2d(x=random_1, y=random_2, bins=50, cmap=fading_red)
plt.show()
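The same idea can be wrapped in a small helper so that both layers fade out where the density is low. This is only a sketch: with_fading_alpha is a hypothetical name introduced here, and the seed and data are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def with_fading_alpha(name):
    """Return a copy of the named colormap whose alpha rises linearly from 0 to 1."""
    base = plt.get_cmap(name)
    colors = base(np.arange(base.N))               # RGBA table, shape (N, 4)
    colors[:, -1] = np.linspace(0, 1, base.N)      # overwrite the alpha column
    return ListedColormap(colors)

fading_red = with_fading_alpha("Reds")
fading_blue = with_fading_alpha("Blues")

rng = np.random.default_rng(0)  # arbitrary seed
fig, ax = plt.subplots()
ax.hist2d(rng.normal(0, 1, 10000), rng.normal(0, 1, 10000), bins=100, cmap=fading_blue)
ax.hist2d(rng.normal(1, 1, 10000), rng.normal(1, 1, 10000), bins=50, cmap=fading_red)
# plt.show()  # uncomment to display the figure
```

Empty and sparsely populated bins of either layer become transparent, so neither histogram hides the other behind its background color.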

matplotlib: labeling of curves

When I create a plot with many curves it would be convenient to be able to label each curve at the right where it ends.
The result of plt.legend produces too many similar colors and the legend is overlapping the plot.
As one can see in the example below the use of plt.legend is not very effective:
import numpy as np
from matplotlib import pyplot as plt
n=10
x = np.linspace(0,1, n)
for i in range(n):
    y = np.linspace(x[i],x[i], n)
    plt.plot(x, y, label=str(i))
plt.legend(loc='upper right')
plt.show()
If possible I would like to have something similar to this plot:
or this:
I would recommend the answer suggested in the comments, but another method that gives something similar to your first option (albeit without the exact placement of the legend markers matching the positions of the associated lines) is:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
n=10
x = np.linspace(0, 1, n)
labels = [str(i) for i in range(len(x))]
for i in range(n):
    y = np.linspace(x[i], x[i], n)
    ax.plot(x, y, label=labels[i])
h, _ = ax.get_legend_handles_labels()
# sort the legend handles/labels so they are in the same order as the data
hls = sorted(zip(x, h, labels), reverse=True)
ax.legend(
    [ha[1] for ha in hls], # get handles
    [la[2] for la in hls], # get labels
    bbox_to_anchor=(1.04, 0, 0.1, 1), # set box outside of axes
    loc="lower left",
    labelspacing=1.6, # add space between labels
)
leg = ax.get_legend()
# expand the border of the legend
fontsize = fig.canvas.get_renderer().points_to_pixels(leg._fontsize)
pad = 2 * (leg.borderaxespad + leg.borderpad) * fontsize
leg._legend_box.set_height(leg.get_bbox_to_anchor().height - pad)
This is heavily reliant on the answers here and here.
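A different route to labels at the curve ends, sketched here as an assumption rather than as the approach from the linked answers, is to skip the legend and annotate the last point of each line directly with ax.annotate. The offset of 5 points and the color matching are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
n = 10
x = np.linspace(0, 1, n)
for i in range(n):
    y = np.full(n, x[i])                 # horizontal line at height x[i]
    (line,) = ax.plot(x, y)
    # Place the label just right of the last data point, in the line's color.
    ax.annotate(str(i), xy=(x[-1], y[-1]), xytext=(5, 0),
                textcoords="offset points", va="center",
                color=line.get_color())
# plt.show()  # uncomment to display the figure
```

Each label then sits exactly at the height where its curve ends, at the cost of possible overlaps when several curves end close together.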

Set size of subplot to other subplot with equal aspect ratio

I would like to create a figure consisting of a scatter plot and two histograms, one to the right of the scatter plot and one below it. I have the following requirements:
1.) In the scatter plot, the aspect ratio is equal so that the circle does not look like an ellipse.
2.) In the figure, the histogram subplots should be exactly as wide or as high as the axes of the scatter plot.
This also works to a limited extent. However, I can't make the lower histogram as wide as the x axis of the scatter plot. How do I do that?
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import random
#create some demo data
x = [random.uniform(-2.0, 2.0) for i in range(100)]
y = [random.uniform(-2.0, 2.0) for i in range(100)]
#create figure
fig = plt.figure()
gs = gridspec.GridSpec(2, 2, width_ratios = [3, 1], height_ratios = [3, 1])
ax = plt.subplot(gs[0])
# Axis labels
plt.xlabel('pos error X [mm]')
plt.ylabel('pos error Y [mm]')
ax.grid(True)
ax.axhline(color="#000000")
ax.axvline(color="#000000")
ax.set_aspect('equal')
radius = 1.0
xc = radius*np.cos(np.linspace(0,np.pi*2))
yc = radius*np.sin(np.linspace(0,np.pi*2))
plt.plot(xc, yc, "k")
ax.scatter(x,y)
hist_x = plt.subplot(gs[1],sharey=ax)
hist_y = plt.subplot(gs[2],sharex=ax)
plt.tight_layout() # needed; without it the xlabel is not visible
plt.show()
what i want is:
Many thanks for your help!
The easiest (but not necessarily most elegant) solution is to manually position the lower histogram after applying the tight layout:
ax_pos = ax.get_position()
hist_y_pos = hist_y.get_position()
hist_y.set_position((ax_pos.x0, hist_y_pos.y0, ax_pos.width, hist_y_pos.height))
This output was produced by matplotlib version 3.4.3. For your example output, you're obviously using a different version, as I get a much wider lower histogram than you.
(I retained the histogram names as in your example although I guess the lower one should be hist_x instead of hist_y).

Removing legend from mpl parallel coordinates plot?

I have a parallel coordinates plot with lots of data points so I'm trying to use a continuous colour bar to represent that, which I think I have worked out. However, I haven't been able to remove the default key that is put in when creating the plot, which is very long and hinders readability. Is there a way to remove this table to make the graph much easier to read?
This is the code I'm currently using to generate the parallel coordinates plot:
parallel_coordinates(data[[' male_le','female_le','diet','activity','obese_perc','median_income']],
                     'median_income', colormap = 'rainbow', alpha = 0.5)
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
cmap = mpl.cm.rainbow
bounds = [0.00,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N,)
plt.colorbar(mpl.cm.ScalarMappable(norm = norm, cmap=cmap),cax = ax, orientation = 'horizontal',
label = 'normalised median income', alpha = 0.5)
plt.show()
Current Output:
I want my legend to be represented as a color bar, like this:
Any help would be greatly appreciated. Thanks.
You can use ax.legend_.remove() to remove the legend.
The cax parameter of plt.colorbar indicates the subplot in which to put the colorbar. If you leave it out, matplotlib will create a new subplot, "stealing" space from the current subplot (subplots are often referred to as ax in matplotlib). So, leaving out cax here will create the desired colorbar. Adding ax=ax was not necessary when this was written, as ax is the current subplot, but newer matplotlib versions may require it for a standalone ScalarMappable.
The code below uses seaborn's penguin dataset to create a standalone example.
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
from pandas.plotting import parallel_coordinates
penguins = sns.load_dataset('penguins')
fig, ax = plt.subplots(figsize=(10, 4))
cmap = plt.get_cmap('rainbow')
bounds = np.arange(penguins['body_mass_g'].min(), penguins['body_mass_g'].max() + 200, 200)
norm = mpl.colors.BoundaryNorm(bounds, 256)
penguins = penguins.dropna(subset=['body_mass_g'])
parallel_coordinates(penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']],
'body_mass_g', colormap=cmap, alpha=0.5, ax=ax)
ax.legend_.remove()
plt.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
ax=ax, orientation='horizontal', label='body mass', alpha=0.5)
plt.show()

Convert a Histogram which has two variables plotted on it into a smooth Curve

Here is the code for generating the histogram. For the full code you can refer to this iPython Notebook
# Splitting the dataset into malignant and benign.
dataMalignant=datas[datas['diagnosis'] ==1]
dataBenign=datas[datas['diagnosis'] ==0]
#Plotting these features as a histogram
fig, axes = plt.subplots(nrows=10, ncols=1, figsize=(15,60))
for idx,ax in enumerate(axes):
    ax.figure
    binwidth= (max(datas[features_mean[idx]]) - min(datas[features_mean[idx]]))/250
    # density=True replaces normed=True, which newer matplotlib versions no longer accept
    ax.hist([dataMalignant[features_mean[idx]],dataBenign[features_mean[idx]]], bins=np.arange(min(datas[features_mean[idx]]), max(datas[features_mean[idx]]) + binwidth, binwidth) , alpha=0.5,stacked=True, density = True, label=['M','B'],color=['r','g'])
    ax.legend(loc='upper right')
    ax.set_title(features_mean[idx])
plt.show()
How do I convert this histogram into a smooth curve with the area under the curve shaded/highlighted?
Here is a simple example that might help you:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(123)
datas = pd.DataFrame(np.random.randint(0, 2, size=(100, 1)), columns=['diagnosis'])
datas['data'] = np.random.randint(0, 100,size=(100, 1))
I used numpy's histogram function, but you could also use ax.hist with the same arguments instead.
benign_hist=np.histogram(datas[datas['diagnosis']==0]['data'],bins=np.arange(0, 100, 10))
malignant_hist=np.histogram(datas[datas['diagnosis']==1]['data'],bins=np.arange(0, 100, 10))
fig,ax=plt.subplots(1,1)
ax.fill_between(malignant_hist[1][1:], malignant_hist[0], color='r', alpha=0.5)
ax.fill_between(benign_hist[1][1:], benign_hist[0], color='b', alpha=0.5)
In the above example, for plotting convenience, I used the 9 right-hand bin edges as x positions instead of the bin midpoints.
In the OP's code one could assign hist_data = ax.hist(...).
hist_data[0] contains the histogram values and hist_data[1] contains the bin edges. To fill in the areas, use something like:
fig, ax=plt.subplots(1,1)
ax.fill_between(hist_data[1][1:],hist_data[0][0],color='g',alpha=0.5)
ax.fill_between(hist_data[1][1:],hist_data[0][1],color='r',alpha=0.5)
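The snippets above use the right-hand bin edges as x positions. A sketch of the bin-midpoint variant mentioned in the answer, with stand-in random data (the seed and the names values and centers are illustrative, not from the original):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
values = rng.integers(0, 100, size=100)   # stand-in data in the range 0..99

counts, edges = np.histogram(values, bins=np.arange(0, 101, 10))
centers = (edges[:-1] + edges[1:]) / 2    # midpoint of each bin

fig, ax = plt.subplots()
ax.fill_between(centers, counts, color='b', alpha=0.5)
# plt.show()  # uncomment to display the figure
```

Using midpoints keeps the filled curve centered on the bins instead of shifted half a bin to the right.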
