Setting a fixed size for points in legend - python

I'm making some scatter plots and I want to set the size of the points in the legend to a fixed, equal value.
Right now I have this:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Plot legend.
plt.legend(loc="lower left", markerscale=2., scatterpoints=1, fontsize=10)
plt.show()
which produces this:
The sizes of the points in the legend are scaled but not the same. How can I fix the sizes of the points in the legend to an equal value without affecting the sizes in the scatter plot?

I had a look into the source code of matplotlib. The bad news is that there does not seem to be any simple way of setting equal sizes of points in the legend. It is especially difficult with scatter plots (wrong: see the update below). There are essentially two alternatives:
1. Change the matplotlib code
2. Add a transform to the PathCollection objects representing the dots in the image. The transform (scaling) has to take the original size into account.
Neither of these is much fun, though #1 seems to be easier. The scatter plots are especially challenging in this respect.
However, I have a hack which does probably what you want:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.plot(x1, y1, 'o', label='first', markersize=np.sqrt(20.), c='b')
plt.plot(x2, y2, 'o', label='second', markersize=np.sqrt(35.), c='r')
# Plot legend.
lgnd = plt.legend(loc="lower left", numpoints=1, fontsize=10)
#change the marker size manually for both lines
lgnd.legendHandles[0]._legmarker.set_markersize(6)
lgnd.legendHandles[1]._legmarker.set_markersize(6)
plt.show()
This gives:
Which seems to be what you wanted.
The changes:
scatter changed into a plot, which changes the marker scaling (hence the sqrt) and makes it impossible to use changing marker size (if that was intended)
the marker size changed manually to be 6 points for both markers in the legend
As you can see, this relies on a hidden underscore attribute (_legmarker) and is quite ugly. It may break with any matplotlib update.
Update
Haa, I found it. A better hack:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Plot legend.
lgnd = plt.legend(loc="lower left", scatterpoints=1, fontsize=10)
lgnd.legendHandles[0]._sizes = [30]
lgnd.legendHandles[1]._sizes = [30]
plt.show()
Now the _sizes (another underscore property) does the trick. No need to touch the source, even though this is quite a hack. But now you can use everything scatter offers.

Similarly to the answer, assuming you want all the markers with the same size:
lgnd = plt.legend(loc="lower left", scatterpoints=1, fontsize=10)
for handle in lgnd.legendHandles:
    handle.set_sizes([6.0])
Tested with Matplotlib 2.0.0.
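The loop above can be made a little more robust across versions. As far as I know, recent Matplotlib releases expose the handles as legend_handles, while older ones only have legendHandles. The sketch below tries both; the helper name legend_handles() and the size 36.0 are just illustrative choices, and the Agg backend is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def legend_handles(legend):
    """Return the legend's handle artists under either attribute name (hypothetical helper)."""
    if hasattr(legend, "legend_handles"):  # name used by newer Matplotlib releases
        return legend.legend_handles
    return legend.legendHandles  # name used by older releases


rng = np.random.default_rng(0)
fig, ax = plt.subplots()
ax.scatter(rng.random(100), rng.random(100), s=20.0, c="b", label="first")
ax.scatter(rng.random(100), rng.random(100), s=35.0, c="r", label="second")

lgnd = ax.legend(loc="lower left", scatterpoints=1, fontsize=10)
for handle in legend_handles(lgnd):
    # Same marker area (in points**2) for every legend entry;
    # the scatter points in the axes keep their own sizes.
    handle.set_sizes([36.0])

fig.canvas.draw()
```

Only the legend copies of the markers are touched, so the sizes passed to scatter stay as they were.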

You can make a Line2D object that resembles your chosen markers, except with a different marker size of your choosing, and use that to construct the legend. This is nice because it doesn't require placing an object in your axes (potentially triggering a resize event), and it doesn't require use of any hidden attributes. The only real downside is that you have to construct the legend explicitly from lists of objects and labels, but this is a well-documented matplotlib feature so it feels pretty safe to use.
from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Create dummy Line2D objects for legend
h1 = Line2D([0], [0], marker='o', markersize=np.sqrt(20), color='b', linestyle='None')
h2 = Line2D([0], [0], marker='o', markersize=np.sqrt(20), color='r', linestyle='None')
# Set axes limits
plt.gca().set_xlim(-0.2, 1.2)
plt.gca().set_ylim(-0.2, 1.2)
# Plot legend.
plt.legend([h1, h2], ['first', 'second'], loc="lower left", markerscale=2,
           scatterpoints=1, fontsize=10)
plt.show()

I did not have much success using @DrV's solution, though perhaps my use case is unique. Because of the density of points, I am using the smallest marker size, i.e. plt.plot(x, y, '.', ms=1, ...), and want the legend symbols larger.
I followed the recommendation I found on the matplotlib forums:
plot the data (no labels)
record axes limit (xlimits = plt.xlim())
plot fake data far away from real data with legend-appropriate symbol colors and sizes
restore axes limits (plt.xlim(xlimits))
create legend
Here is how it turned out (for this the dots are actually less important than the lines):
Hope this helps someone else.
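The five steps above can be sketched roughly as follows. The random data, the label text, the marker sizes and the "1000 axis spans away" offset are all assumptions made for illustration, not part of the original recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=5000), rng.normal(size=5000)

fig, ax = plt.subplots()
# 1. Plot the dense data with tiny markers and no label.
ax.plot(x, y, ".", ms=1, color="tab:blue")
# 2. Record the axes limits chosen by autoscaling.
xlimits, ylimits = ax.get_xlim(), ax.get_ylim()
# 3. Plot one fake point far outside the data, with a legend-sized marker.
far_x = xlimits[1] + 1000 * (xlimits[1] - xlimits[0])
far_y = ylimits[1] + 1000 * (ylimits[1] - ylimits[0])
ax.plot([far_x], [far_y], ".", ms=12, color="tab:blue", label="dense data")
# 4. Restore the limits so the fake point stays out of view.
ax.set_xlim(xlimits)
ax.set_ylim(ylimits)
# 5. Build the legend; only the labelled fake point gets an entry.
legend = ax.legend(loc="upper right")
```

The legend entry takes its marker size from the fake point, so the real data can stay at ms=1.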

Just another alternative here. This has the advantage that it would not use any "private" methods and works even with other objects than scatters present in the legend. The key is to map the scatter PathCollection to a HandlerPathCollection with an updating function being set to it.
def update(handle, orig):
    handle.update_from(orig)
    handle.set_sizes([64])
plt.legend(handler_map={PathCollection : HandlerPathCollection(update_func=update)})
Complete code example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
from matplotlib.collections import PathCollection
from matplotlib.legend_handler import HandlerPathCollection, HandlerLine2D
colors = ["limegreen", "crimson", "indigo"]
markers = ["o", "s", r"$\clubsuit$"]
labels = ["ABC", "DEF", "XYZ"]
plt.plot(np.linspace(0,1,8), np.random.rand(8), marker="o", markersize=22, label="A line")
for i,(c,m,l) in enumerate(zip(colors,markers,labels)):
    plt.scatter(np.random.rand(8),np.random.rand(8),
                c=c, marker=m, s=10+np.exp(i*2.9), label=l)
def updatescatter(handle, orig):
    handle.update_from(orig)
    handle.set_sizes([64])
def updateline(handle, orig):
    handle.update_from(orig)
    handle.set_markersize(8)
plt.legend(handler_map={PathCollection : HandlerPathCollection(update_func=updatescatter),
                        plt.Line2D : HandlerLine2D(update_func = updateline)})
plt.show()

Related

How to add a legend entry without a symbol/color into an existing legend?

I have a plot with a legend. I would like to add an entry into the legend box. This entry could be something like a fit parameter or something else descriptive of the data.
As an example, one can use the code below
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 101)
y = np.sin(x)
plt.plot(x, y, color='r', label='y = sin(x)')
plt.plot(np.nan, np.nan, color='k', label='extra label')
plt.plot(np.nan, np.nan, color='b', linestyle=None, label='linestyle = None')
plt.plot(np.nan, np.nan, color='orange', marker=None, label='marker = None')
plt.plot(np.nan, np.nan, color=None, label='color = None')
plt.legend()
plt.show()
to generate the plot below
I would like instead to have a label "extra label" with only whitespace and no symbol. I tried changing the linestyle, marker, and color kwargs to None but without success. I've also tried plotting plt.plot([], []) instead of plotting plt.plot(np.nan, np.nan). I suppose some hacky workaround is to change color='k' to color='white'. But I'm hoping there is a more proper way to do this. How can I do this?
EDIT
My question is not a duplicate. The post that this is accused of being a duplicate of shows another way of producing a legend label, but not for one without a symbol. One can run the code below to test as the same problem from my original question applies.
import matplotlib.patches as mpatches
nan_patch = mpatches.Patch(color=None, label='The label for no data')
and modify this instance from plt.legend()
plt.legend(handles=[nan_patch])
You can add items to the legend as shown in the (now removed) duplicate. Note that to have no color in the legend itself you must set color="none" (the string, not None), e.g.:
empty_patch = mpatches.Patch(color='none', label='Extra label')
plt.legend(handles=[empty_patch])
In order to have this, as well as your existing legend entries, you can get a list of the existing legend handles and labels, add the extra ones to it, then plot the legend:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
x = np.linspace(0, 2*np.pi, 101)
y = np.sin(x)
plt.plot(x, y, color='r', label='y = sin(x)')
handles, labels = plt.gca().get_legend_handles_labels() # get existing handles and labels
empty_patch = mpatches.Patch(color='none', label='Extra label') # create a patch with no color
handles.append(empty_patch) # add new patches and labels to list
labels.append("Extra label")
plt.legend(handles, labels) # apply new handles and labels to plot
plt.show()
Which gives:
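The difference between color=None and color='none' mentioned above can be checked directly on the patch objects. This is only a small sketch; the variable names are made up for illustration, and it assumes the default rcParams, under which a Patch without an explicit color falls back to an opaque default face color.

```python
import matplotlib.patches as mpatches

# color=None means "no color given": the patch falls back to the default face color.
default_patch = mpatches.Patch(color=None, label="color=None")
# color='none' (a string) means "draw nothing": fully transparent face and edge.
empty_patch = mpatches.Patch(color="none", label="color='none'")

# get_facecolor() returns an RGBA tuple; the last entry is the alpha channel.
print(default_patch.get_facecolor())
print(empty_patch.get_facecolor())
```

The transparent patch is what leaves the legend entry without a visible symbol.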

Change size/alpha of markers in the legend box of matplotlib

What is the most convenient way to enlarge and set the alpha value of the markers (back to 1.0) in the legend box? I'm also happy with big coloured boxes.
import matplotlib.pyplot as plt
import numpy as np
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
plt.plot(s1, ys, ',', label='data1', alpha=0.1)
plt.plot(s2, ys, ',', label='data2', alpha=0.1)
plt.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.)
For the size you can include the keyword markerscale=## in the call to legend and that will make the markers bigger (or smaller).
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1)
fig.clf()
x1,y1 = 4.*np.random.randn(10000), np.random.randn(10000)
x2,y2 = np.random.randn(10000), 4.*np.random.randn(10000)
ax = [fig.add_subplot(121+c) for c in range(2)]
ax[0].plot(x1, y1, 'bx',ms=.1,label='blue x')
ax[0].plot(x2, y2, 'r^',ms=.1,label='red ^')
ax[0].legend(loc='best')
ax[1].plot(x1, y1, 'bx',ms=.1,label='blue x')
ax[1].plot(x2, y2, 'r^',ms=.1,label='red ^')
ax[1].legend(loc='best', markerscale=40)
leg = plt.legend()
for lh in leg.legendHandles:
    lh.set_alpha(1)
credit to https://izziswift.com/set-legend-symbol-opacity-with-matplotlib/
We can use the handler_map option to .legend() to define a custom function to update the alpha or marker for all Line2D instances in the legend. This method has the advantage that it gets the legend markers correct first time, they do not need to be modified afterwards, and fixes issues where the original legend markers can sometimes still be seen.
This method makes use of HandlerLine2D from the matplotlib.legend_handler module. I'm not aware of a way to do this without adding the extra import.
A complete script would look like this:
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D
import numpy as np
# Generate data
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
# Create figure and plot the data
fig, ax = plt.subplots()
ax.plot(s1, ys, ',', label='data1', alpha=0.1)
ax.plot(s2, ys, ',', label='data2', alpha=0.1)
def change_alpha(handle, original):
    ''' Change the alpha and marker style of the legend handles '''
    handle.update_from(original)
    handle.set_alpha(1)
    handle.set_marker('.')
# Add the legend, and set the handler map to use the change_alpha function
ax.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.,
          handler_map={plt.Line2D: HandlerLine2D(update_func=change_alpha)})
plt.show()
Note, below is my original answer. I have left it here for posterity as it may work for some use cases, but it has the problem that when you change the alpha and markers, it actually creates new instances on the legend and does not remove the old ones, so both can still be visible. I would recommend the method above in most cases.
If you name your legend, you can then iterate over the lines contained within it. For example:
leg=plt.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.)
for l in leg.get_lines():
    l.set_alpha(1)
    l.set_marker('.')
Note that you also have to set the marker again. I suggest setting it to . rather than , here, to make it a little more visible.
For me the trick was to use the right property:
leg = axs.legend()
for l in leg.get_lines():
    l._legmarker.set_markersize(6)
Another option: instead of altering your legend's markers, you can make a custom legend (ref the matplotlib legend tutorial)
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
plt.plot(s1, ys, ',', label='data1', alpha=0.1)
plt.plot(s2, ys, ',', label='data2', alpha=0.1)
# manually generate legend
handles, labels = plt.gca().get_legend_handles_labels()  # gca() reuses the current axes instead of creating a new one
patches = [Patch(color=handle.get_color(), label=label) for handle, label in zip(handles, labels)]
plt.legend(handles=patches, bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0., frameon=False)

multiple savefig() 's with different styles

I'm writing a program, which has two outputs: a GUI and a printed report on paper (a simple pdf printed out).
On both outputs I would like to have a diagram, but with different styles.
dark_background on the GUI (http://matplotlib.org/examples/style_sheets/plot_dark_background.html)
and fivethirtyeight on the paper http://matplotlib.org/examples/style_sheets/plot_fivethirtyeight.html
Somehow I could not manage to generate 2 images with proper styles. Only one of them was correct. I do not have enough experience-points to post pictures. So I will post only my code.
My first idea was:
import numpy as np
import matplotlib.pyplot as plt
def plot():
    #set pic size
    fig = plt.figure(figsize=(16, 9), dpi=100)
    ax = plt.subplot(111)
    # set x data
    x = range(10)
    # set y data
    y1 = np.zeros(10)
    y2 = [0,1,2,3,4,5,6,7,8,9]
    y3 = [10,9,8,7,6,5,4,3,2,1]
    #plot as errorbar
    ax.errorbar(x, y1, fmt='o', color='green', markersize=8, label='Normal')
    ax.errorbar(x, y2, yerr=0.1, fmt='o', color='orange', markersize=8, label='abw_up')
    ax.errorbar(x, y3, yerr=0.1, fmt='o', color='purple', markersize=8,label='abw_down')
    # limits
    ax.axhline(0.1*10, color='red', lw=2)
    ax.axhline(-0.1*10, color='red', lw=2)
    #set limit of y-Axis
    ax.set_ylim((-1.3,1.3))
    # Labels
    ax.set_xlabel('points number')
    ax.set_ylabel('values')
    # legend
    legend=ax.legend(loc=('upper center'), shadow='true',bbox_to_anchor=(0.5, 1.05),ncol=3, fancybox=True)
plt.style.use('dark_background')
plt.savefig('result_dark.png')
plt.style.use('fivethirtyeight')
plt.savefig('result_white.png')
But it did not work properly. One of the images was correct. The second had a correct backgroundcolor, but the fontcolor of legend/labels did not change. I tried to separate the 2 images, the result was the same:
import numpy as np
import matplotlib.pyplot as plt
import os
def plot():
    #set pic size
    ax = plt.subplot(111)
    # set x data
    x = range(10)
    # set y data
    y1 = np.zeros(10)
    y2 = [1,2,3,1,2,3,1,2,3,1]
    y3 = [3,1,2,3,1,2,3,1,2,3]
    #plot as errorbar
    ax.errorbar(x, y1, fmt='o', color='green', markersize=8, label='Normal')
    ax.errorbar(x, y2, yerr=0.2, fmt='o', color='orange', markersize=8, label='abw_up')
    ax.errorbar(x, y3, yerr=0.1, fmt='o', color='purple', markersize=8,label='abw_down')
    # limits
    ax.axhline(0.1*10, color='red', lw=2)
    ax.axhline(-0.1*10, color='red', lw=2)
    #set limit of y-Axis
    ax.set_ylim((-1.3,5.3))
    # Labels
    ax.set_xlabel('Messpunkte-Nr.\nMeasurement points number')
    ax.set_ylabel('Spezifikationsgrenze normiert\nnormed to specification')
    # legend
    legend=ax.legend(loc=('upper center'), shadow='true',bbox_to_anchor=(0.5, 1.05),ncol=3, fancybox=True)
    texts =legend.get_texts()
    texts[0].set_color('green')
    texts[1].set_color('orange')
    texts[2].set_color('purple')
fig = plt.figure(figsize=(16, 9), dpi=100)
plt.style.use('dark_background')
plot()
plt.savefig('result_dark.png')
plt.clf()
#plt.close()
fig = plt.figure(figsize=(16, 9), dpi=100)
plt.style.use('fivethirtyeight')
plot()
plt.savefig('result_white.png')
plt.clf()
#plt.close()
How should I fix my code to have 2 images with the same values, but different styles?
I would suggest structuring your code something like:
import matplotlib.pyplot as plt
from matplotlib.style import context
def my_plot_function(ax, data, style):
    # do all of your plotting in here, should be fast as no computation
    pass
with context('dark_background'):
    fig, ax = plt.subplots(1, 1)
    my_plot_function(ax, data, style)
    fig.savefig(...)
with context('fivethirtyeight'):
    fig, ax = plt.subplots(1, 1)
    my_plot_function(ax, data, style)
    fig.savefig(...)
This is a design feature, not a bug. Almost all of the values controlled by rcParams are set at object creation time, not at draw time, because having the look of your rendered figure depend on global state is terrifying. This also allows you to use context managers for the rcParams, as shown above. Calling use only overrides the values that the style sheet explicitly sets (which is also a design feature, so you can apply multiple styles à la cascading style sheets).
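To make the "create everything inside the style context" idea concrete, here is a minimal runnable sketch. The helper names draw() and render(), the reduced data set, and the temporary output directory are assumptions for illustration; only plt.style.context and the two style names come from the discussion above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def draw(ax):
    """Plot the data; styling comes from the rcParams active when artists are created."""
    x = np.arange(10)
    ax.errorbar(x, np.zeros(10), fmt="o", color="green", label="Normal")
    ax.errorbar(x, x / 10, yerr=0.1, fmt="o", color="orange", label="abw_up")
    ax.set_xlabel("points number")
    ax.set_ylabel("values")
    ax.legend(loc="upper center", ncol=2)


def render(style, path):
    """Create, draw, and save one figure entirely inside a style context."""
    with plt.style.context(style):
        fig, ax = plt.subplots(figsize=(8, 4.5))
        draw(ax)
        fig.savefig(path)
        facecolor = fig.get_facecolor()
        plt.close(fig)
    return facecolor


out_dir = tempfile.mkdtemp()
dark = render("dark_background", os.path.join(out_dir, "result_dark.png"))
light = render("fivethirtyeight", os.path.join(out_dir, "result_white.png"))
```

Because the figure, axes, labels and legend are all created while the style is active, each saved image picks up its own text and background colors, and the global rcParams are restored when the with block exits.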
So your problem appears to be that you do a lot of plotting and then tell pylab you'd like your plots to have a particular style. That instruction doesn't seem to be updating everything. So instead, tell it you want to use a particular style. Then plot. Then clear everything. Then plot again.
import numpy as np
import matplotlib.pyplot as plt
def plot():
    #set pic size
    fig = plt.figure(figsize=(16, 9), dpi=100)
    ax = plt.subplot(111)
    # set x data
    x = range(10)
    # set y data
    y1 = np.zeros(10)
    y2 = [0,1,2,3,4,5,6,7,8,9]
    y3 = [10,9,8,7,6,5,4,3,2,1]
    #plot as errorbar
    ax.errorbar(x, y1, fmt='o', color='green', markersize=8, label='Normal')
    ax.errorbar(x, y2, yerr=0.1, fmt='o', color='orange', markersize=8, label='abw_up')
    ax.errorbar(x, y3, yerr=0.1, fmt='o', color='purple', markersize=8,label='abw_down')
    # limits
    ax.axhline(0.1*10, color='red', lw=2)
    ax.axhline(-0.1*10, color='red', lw=2)
    #set limit of y-Axis
    ax.set_ylim((-1.3,1.3))
    # Labels
    ax.set_xlabel('points number')
    ax.set_ylabel('values')
    # legend
    legend=ax.legend(loc=('upper center'), shadow='true',bbox_to_anchor=(0.5, 1.05),ncol=3, fancybox=True)
plt.style.use('dark_background')
plot()
plt.savefig('result_dark.png')
plt.clf()
plt.style.use('fivethirtyeight')
plot()
plt.savefig('result_white.png')
Does this give what you want? Here are the figures I got.

Adding a legend to PyPlot in Matplotlib in the simplest manner possible

TL;DR -> How can one create a legend for a line graph in Matplotlib's PyPlot without creating any extra variables?
Please consider the graphing script below:
if __name__ == '__main__':
    PyPlot.plot(total_lengths, sort_times_bubble, 'b-',
                total_lengths, sort_times_ins, 'r-',
                total_lengths, sort_times_merge_r, 'g+',
                total_lengths, sort_times_merge_i, 'p-', )
    PyPlot.title("Combined Statistics")
    PyPlot.xlabel("Length of list (number)")
    PyPlot.ylabel("Time taken (seconds)")
    PyPlot.show()
As you can see, this is a very basic use of matplotlib's PyPlot. This ideally generates a graph like the one below:
Nothing special, I know. However, it is unclear what data is being plotted where (I'm trying to plot the data of some sorting algorithms, length against time taken, and I'd like to make sure people know which line is which). Thus, I need a legend. However, take a look at the following example (from the official site):
ax = subplot(1,1,1)
p1, = ax.plot([1,2,3], label="line 1")
p2, = ax.plot([3,2,1], label="line 2")
p3, = ax.plot([2,3,1], label="line 3")
handles, labels = ax.get_legend_handles_labels()
# reverse the order
ax.legend(handles[::-1], labels[::-1])
# or sort them by labels
import operator
hl = sorted(zip(handles, labels),
            key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
ax.legend(handles2, labels2)
You will see that I need to create an extra variable ax. How can I add a legend to my graph without having to create this extra variable and retaining the simplicity of my current script?
Add a label= to each of your plot() calls, and then call legend(loc='upper left').
Consider this sample (tested with Python 3.8.0):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, "-b", label="sine")
plt.plot(x, y2, "-r", label="cosine")
plt.legend(loc="upper left")
plt.ylim(-1.5, 2.0)
plt.show()
Slightly modified from this tutorial: http://jakevdp.github.io/mpl_tutorial/tutorial_pages/tut1.html
You can access the Axes instance (ax) with plt.gca(). In this case, you can use
plt.gca().legend()
You can do this either by using the label= keyword in each of your plt.plot() calls or by assigning your labels as a tuple or list within legend, as in this working example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-0.75,1,100)
y0 = np.exp(2 + 3*x - 7*x**3)
y1 = 7-4*np.sin(4*x)
plt.plot(x,y0,x,y1)
plt.gca().legend(('y0','y1'))
plt.show()
However, if you need to access the Axes instance more than once, I do recommend saving it to the variable ax with
ax = plt.gca()
and then calling ax instead of plt.gca().
Here's an example to help you out ...
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
ax.set_title('ADR vs Rating (CS:GO)')
ax.scatter(x=data[:,0],y=data[:,1],label='Data')
plt.plot(data[:,0], m*data[:,0] + b,color='red',label='Our Fitting Line')
ax.set_xlabel('ADR')
ax.set_ylabel('Rating')
ax.legend(loc='best')
plt.show()
You can add a custom legend (see the documentation):
first = [1, 2, 4, 5, 4]
second = [3, 4, 2, 2, 3]
plt.plot(first, 'g--', second, 'r--')
plt.legend(['First List', 'Second List'], loc='upper left')
plt.show()
A simple plot for sine and cosine curves with a legend.
Used matplotlib.pyplot
import math
import matplotlib.pyplot as plt
x=[]
for i in range(-314,314):
    x.append(i/100)
ysin=[math.sin(i) for i in x]
ycos=[math.cos(i) for i in x]
plt.plot(x,ysin,label='sin(x)') #specify label for the corresponding curve
plt.plot(x,ycos,label='cos(x)')
plt.xticks([-3.14,-1.57,0,1.57,3.14],[r'-$\pi$',r'-$\pi$/2',0,r'$\pi$/2',r'$\pi$'])
plt.legend()
plt.show()
Add labels to each argument in your plot call corresponding to the series it is graphing, i.e. label = "series 1"
Then simply add Pyplot.legend() to the bottom of your script and the legend will display these labels.
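Applied to the script from the question, this could look roughly like the sketch below. A single plot() call with several series has no place for one label per series, so the call is split into one plot() per series. The timing numbers are hypothetical placeholders, not real measurements, and the label strings are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical timing data standing in for the question's measurements.
total_lengths = [10, 100, 1000]
sort_times_bubble = [0.001, 0.08, 7.9]
sort_times_ins = [0.001, 0.04, 3.8]
sort_times_merge_r = [0.001, 0.01, 0.12]

# One plot() call per series, each with its own label.
plt.plot(total_lengths, sort_times_bubble, "b-", label="bubble sort")
plt.plot(total_lengths, sort_times_ins, "r-", label="insertion sort")
plt.plot(total_lengths, sort_times_merge_r, "g+", label="merge sort (recursive)")
plt.title("Combined Statistics")
plt.xlabel("Length of list (number)")
plt.ylabel("Time taken (seconds)")
# legend() collects the labels; no explicit Axes variable is needed.
legend = plt.legend(loc="upper left")
```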

Plot two histograms on single chart with matplotlib

I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from another file in the same histogram, so I do something like this
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. I wonder how could I plot both histograms at the same time with different colors.
Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars to be side-by-side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-deep')
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist
In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist([y1, y2], color=colors)
n, bins, patches = ax1.hist([y1,y2])
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
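The difference between the two normalizations can also be checked numerically. The sketch below uses np.histogram directly (which, as far as I know, is what plt.hist uses for the binning) on one sample; the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1, 2, 5000)
bins = np.linspace(-10, 10, 30)
widths = np.diff(bins)

# Per-sample weights of 1/N: each bar is the fraction of samples in that bin.
fractions, _ = np.histogram(x, bins=bins, weights=np.full(x.shape, 1 / x.size))
# density=True: bar height times bin width integrates to 1 over the binned range.
density, _ = np.histogram(x, bins=bins, density=True)

# Fraction of samples that fall inside the binned range at all.
in_range = np.mean((x >= bins[0]) & (x <= bins[-1]))
print(fractions.sum(), in_range)
print((density * widths).sum())
```

With weights, the bar heights add up to the fraction of samples inside the bin range; with density=True, it is the area under the bars that equals 1, so the heights depend on the bin width.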
You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)  # density replaces the removed normed keyword
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g','r'], alpha=0.8, bins=50)
    plt.show()
Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.
It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.
There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
Another option, quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:
Just in case you have pandas (import pandas as pd) or are ok with using it:
import random
import matplotlib.pyplot as plt
import pandas as pd
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()
This question has been answered before, but wanted to add another quick/easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.
Inspired by Solomon's answer, but to stick with the question, which is related to histogram, a clean solution is:
sns.distplot(bar)
sns.distplot(foo)
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.
