Adding a legend to PyPlot in Matplotlib in the simplest manner possible - python

TL;DR -> How can one create a legend for a line graph in Matplotlib's PyPlot without creating any extra variables?
Please consider the graphing script below:
if __name__ == '__main__':
    PyPlot.plot(total_lengths, sort_times_bubble, 'b-',
                total_lengths, sort_times_ins, 'r-',
                total_lengths, sort_times_merge_r, 'g+',
                total_lengths, sort_times_merge_i, 'p-', )
    PyPlot.title("Combined Statistics")
    PyPlot.xlabel("Length of list (number)")
    PyPlot.ylabel("Time taken (seconds)")
    PyPlot.show()
As you can see, this is a very basic use of matplotlib's PyPlot. This ideally generates a graph like the one below:
Nothing special, I know. However, it is unclear what data is being plotted where (I'm trying to plot the data of some sorting algorithms, length against time taken, and I'd like to make sure people know which line is which). So I need a legend. However, take a look at the following example (from the official site):
ax = subplot(1,1,1)
p1, = ax.plot([1,2,3], label="line 1")
p2, = ax.plot([3,2,1], label="line 2")
p3, = ax.plot([2,3,1], label="line 3")
handles, labels = ax.get_legend_handles_labels()
# reverse the order
ax.legend(handles[::-1], labels[::-1])
# or sort them by labels
import operator
hl = sorted(zip(handles, labels),
            key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
ax.legend(handles2, labels2)
You will see that I need to create an extra variable ax. How can I add a legend to my graph without having to create this extra variable and retaining the simplicity of my current script?

Add a label= to each of your plot() calls, and then call legend(loc='upper left').
Consider this sample (tested with Python 3.8.0):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, "-b", label="sine")
plt.plot(x, y2, "-r", label="cosine")
plt.legend(loc="upper left")
plt.ylim(-1.5, 2.0)
plt.show()
Slightly modified from this tutorial: http://jakevdp.github.io/mpl_tutorial/tutorial_pages/tut1.html

You can access the Axes instance (ax) with plt.gca(). In this case, you can use
plt.gca().legend()
You can do this either by using the label= keyword in each of your plt.plot() calls or by assigning your labels as a tuple or list within legend, as in this working example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-0.75,1,100)
y0 = np.exp(2 + 3*x - 7*x**3)
y1 = 7-4*np.sin(4*x)
plt.plot(x,y0,x,y1)
plt.gca().legend(('y0','y1'))
plt.show()
However, if you need to access the Axes instance more than once, I do recommend saving it to the variable ax with
ax = plt.gca()
and then calling ax instead of plt.gca().

Here's an example to help you out ...
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
ax.set_title('ADR vs Rating (CS:GO)')
ax.scatter(x=data[:,0],y=data[:,1],label='Data')
plt.plot(data[:,0], m*data[:,0] + b,color='red',label='Our Fitting Line')
ax.set_xlabel('ADR')
ax.set_ylabel('Rating')
ax.legend(loc='best')
plt.show()

You can add a custom legend (see the documentation):
first = [1, 2, 4, 5, 4]
second = [3, 4, 2, 2, 3]
plt.plot(first, 'g--', second, 'r--')
plt.legend(['First List', 'Second List'], loc='upper left')
plt.show()

A simple plot for sine and cosine curves with a legend.
Used matplotlib.pyplot
import math
import matplotlib.pyplot as plt
x=[]
for i in range(-314,314):
    x.append(i/100)
ysin=[math.sin(i) for i in x]
ycos=[math.cos(i) for i in x]
plt.plot(x,ysin,label='sin(x)') #specify label for the corresponding curve
plt.plot(x,ycos,label='cos(x)')
plt.xticks([-3.14,-1.57,0,1.57,3.14],[r'-$\pi$',r'-$\pi$/2','0',r'$\pi$/2',r'$\pi$']) #raw strings keep the backslashes intact
plt.legend()
plt.show()

Add labels to each argument in your plot call corresponding to the series it is graphing, i.e. label = "series 1"
Then simply add Pyplot.legend() to the bottom of your script and the legend will display these labels.
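Applied to the script in the question, that could look roughly like the sketch below. It assumes one plot() call per series, because label= on a single multi-series call would give every line the same label. The timing numbers are made-up placeholders, and the label strings are only my guess at what each variable holds.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it and call plt.show() to get a window
import matplotlib.pyplot as plt

# Placeholder data (made up); substitute the real measurements.
total_lengths = [10, 100, 1000, 10000]
sort_times_bubble = [0.001, 0.02, 1.9, 190.0]
sort_times_ins = [0.001, 0.01, 0.9, 95.0]
sort_times_merge_r = [0.001, 0.005, 0.06, 0.7]
sort_times_merge_i = [0.001, 0.004, 0.05, 0.6]

# One call per series so that each line carries its own label.
plt.plot(total_lengths, sort_times_bubble, 'b-', label='Bubble sort')
plt.plot(total_lengths, sort_times_ins, 'r-', label='Insertion sort')
plt.plot(total_lengths, sort_times_merge_r, 'g+', label='Merge sort (recursive)')
plt.plot(total_lengths, sort_times_merge_i, 'p-', label='Merge sort (iterative)')
plt.title("Combined Statistics")
plt.xlabel("Length of list (number)")
plt.ylabel("Time taken (seconds)")

# legend() collects every label= given above; no Axes variable is needed.
legend = plt.legend(loc='upper left')
```

The only extra name here is legend, and it exists just so the result can be inspected; a bare plt.legend(loc='upper left') does the same job.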

Related

Matplotlib legend in separate figure with PolyCollection object

I need to create the legend as a separate figure, and more importantly separate instance that can be saved in a new file. My plot consists of lines and a filled in segment.
The problem is the fill_between element, I can not add it to the external figure/legend.
I realise this is a different type of object: it is a PolyCollection, while the line plots are Line2D elements.
How do I handle the PolyCollection so that I can use it in the external legend?
INFO: matplotlib version 3.3.2
import matplotlib.pyplot as plt
import numpy as np
# Dummy data
x = np.linspace(1, 100, 1000)
y = np.log(x)
y1 = np.sin(x)
# Create regular plot and plot everything
fig = plt.figure('Line plot')
ax = fig.add_subplot(111)
line1, = ax.plot(x, y)
line2, = ax.plot(x, y1)
fill = ax.fill_between(x, y, y1)
ax.legend([line1, line2, fill],['Log','Sin','Area'])
ax.plot()
# create new plot only for legend
legendFig = plt.figure('Legend plot')
legendFig.legend([line1, line2],['Log','Sin'])  # <-- This works
# legendFig.legend([line1, line2, fill],['Log','Sin', 'Area'])  # <-- This does not work
You forgot to mention what "does not work" means here.
Apparently, you get an error message: RuntimeError: Can not put single artist in more than one figure.
Matplotlib doesn't allow elements placed in one figure to be reused in another. It is just a lucky coincidence that the lines don't give an error.
To use an element in another figure, you can create a new element and then copy the style from the original element:
from matplotlib.lines import Line2D
from matplotlib.collections import PolyCollection
legendFig = plt.figure('Legend plot')
handle_line1 = Line2D([], [])
handle_line1.update_from(line1)
handle_line2 = Line2D([], [])
handle_line2.update_from(line2)
handle_fill = PolyCollection([])
handle_fill.update_from(fill)
legendFig.legend([handle_line1, handle_line2, handle_fill], ['Log', 'Sin', 'Area'])
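Since the goal was to save the legend on its own, here is a minimal self-contained sketch of the approach above that also writes the legend-only figure to disk. The file name legend_only.png and the temp-directory location are placeholders I picked, not something from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection
from matplotlib.lines import Line2D

# Same dummy data as in the question.
x = np.linspace(1, 100, 1000)
y, y1 = np.log(x), np.sin(x)

fig, ax = plt.subplots()
line1, = ax.plot(x, y)
line2, = ax.plot(x, y1)
fill = ax.fill_between(x, y, y1)

# Fresh artists that only copy the styling of the originals.
handles = []
for original, proxy in [(line1, Line2D([], [])),
                        (line2, Line2D([], [])),
                        (fill, PolyCollection([]))]:
    proxy.update_from(original)
    handles.append(proxy)

legend_fig = plt.figure('Legend plot')
legend = legend_fig.legend(handles, ['Log', 'Sin', 'Area'], loc='center')

# Hypothetical output path; bbox_inches='tight' crops the image to roughly the legend box.
out_path = os.path.join(tempfile.gettempdir(), 'legend_only.png')
legend_fig.savefig(out_path, bbox_inches='tight')
```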

How to add a legend entry without a symbol/color into an existing legend?

I have a plot with a legend. I would like to add an entry into the legend box. This entry could be something like a fit parameter or something else descriptive of the data.
As an example, one can use the code below
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 101)
y = np.sin(x)
plt.plot(x, y, color='r', label='y = sin(x)')
plt.plot(np.nan, np.nan, color='k', label='extra label')
plt.plot(np.nan, np.nan, color='b', linestyle=None, label='linestyle = None')
plt.plot(np.nan, np.nan, color='orange', marker=None, label='marker = None')
plt.plot(np.nan, np.nan, color=None, label='color = None')
plt.legend()
plt.show()
to generate the plot below
I would like instead to have a label "extra label" with only whitespace and no symbol. I tried changing the linestyle, marker, and color kwargs to None but without success. I've also tried plotting plt.plot([], []) instead of plotting plt.plot(np.nan, np.nan). I suppose some hacky workaround is to change color='k' to color='white'. But I'm hoping there is a more proper way to do this. How can I do this?
EDIT
My question is not a duplicate. The post that this is accused of being a duplicate of shows another way of producing a legend label, but not for one without a symbol. One can run the code below to test as the same problem from my original question applies.
import matplotlib.patches as mpatches
nan_patch = mpatches.Patch(color=None, label='The label for no data')
and modify this instance from plt.legend()
plt.legend(handles=[nan_patch])
You can add items to the legend as shown in the (now removed) duplicate. Note that to have no color in the legend itself you must set color="none", e.g.:
empty_patch = mpatches.Patch(color='none', label='Extra label')
plt.legend(handles=[empty_patch])
In order to have this, as well as your existing legend entries, you can get a list of the existing legend handles and labels, add the extra ones to it, then plot the legend:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
x = np.linspace(0, 2*np.pi, 101)
y = np.sin(x)
plt.plot(x, y, color='r', label='y = sin(x)')
handles, labels = plt.gca().get_legend_handles_labels() # get existing handles and labels
empty_patch = mpatches.Patch(color='none', label='Extra label') # create a patch with no color
handles.append(empty_patch) # add new patches and labels to list
labels.append("Extra label")
plt.legend(handles, labels) # apply new handles and labels to plot
plt.show()
Which gives:

Setting a fixed size for points in legend

I'm making some scatter plots and I want to set the size of the points in the legend to a fixed, equal value.
Right now I have this:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Plot legend.
plt.legend(loc="lower left", markerscale=2., scatterpoints=1, fontsize=10)
plt.show()
which produces this:
The sizes of the points in the legend are scaled but not the same. How can I fix the sizes of the points in the legend to an equal value without affecting the sizes in the scatter plot?
I had a look into the source code of matplotlib. The bad news is that there does not seem to be any simple way of setting equal sizes of points in the legend. It is especially difficult with scatter plots (wrong: see the update below). There are essentially two alternatives:
1. Change the matplotlib code
2. Add a transform into the PathCollection objects representing the dots in the image. The transform (scaling) has to take the original size into account.
Neither of these is very much fun, though #1 seems to be easier. The scatter plots are especially challenging in this respect.
However, I have a hack which does probably what you want:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.plot(x1, y1, 'o', label='first', markersize=np.sqrt(20.), c='b')
plt.plot(x2, y2, 'o', label='second', markersize=np.sqrt(35.), c='r')
# Plot legend.
lgnd = plt.legend(loc="lower left", numpoints=1, fontsize=10)
#change the marker size manually for both lines
lgnd.legendHandles[0]._legmarker.set_markersize(6)
lgnd.legendHandles[1]._legmarker.set_markersize(6)
plt.show()
This gives:
Which seems to be what you wanted.
The changes:
scatter changed into a plot, which changes the marker scaling (hence the sqrt) and makes it impossible to use changing marker size (if that was intended)
the marker size changed manually to be 6 points for both markers in the legend
As you can see, this utilizes hidden underscore properties (_legmarker) and is bug-ugly. It may break down at any update in matplotlib.
Update
Haa, I found it. A better hack:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Plot legend.
lgnd = plt.legend(loc="lower left", scatterpoints=1, fontsize=10)
lgnd.legendHandles[0]._sizes = [30]
lgnd.legendHandles[1]._sizes = [30]
plt.show()
Now the _sizes (another underscore property) does the trick. No need to touch the source, even though this is quite a hack. But now you can use everything scatter offers.
Similarly to the answer, assuming you want all the markers with the same size:
lgnd = plt.legend(loc="lower left", scatterpoints=1, fontsize=10)
for handle in lgnd.legendHandles:
    handle.set_sizes([6.0])
With MatPlotlib 2.0.0
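One caveat I'm fairly sure of: newer Matplotlib releases (3.7 onward, as far as I know) renamed the legend's handle list from legendHandles to legend_handles, and the old name was dropped later on. Below is a small sketch that tries the new name first and falls back to the old one. The random data and the size 36 are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it and call plt.show() to see the figure
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
plt.figure()
plt.scatter(rng.random(100), rng.random(100), marker='o', label='first', s=20., c='b')
plt.scatter(rng.random(100), rng.random(100), marker='o', label='second', s=35., c='r')

lgnd = plt.legend(loc="lower left", scatterpoints=1, fontsize=10)

# Newer releases expose `legend_handles`; older ones only have `legendHandles`.
handles = getattr(lgnd, "legend_handles", None)
if handles is None:
    handles = lgnd.legendHandles

for handle in handles:
    handle.set_sizes([36.0])  # same marker area (points^2) for every legend entry
```

The markers in the scatter plot itself keep their original sizes; only the copies drawn inside the legend are changed.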
You can make a Line2D object that resembles your chosen markers, except with a different marker size of your choosing, and use that to construct the legend. This is nice because it doesn't require placing an object in your axes (potentially triggering a resize event), and it doesn't require use of any hidden attributes. The only real downside is that you have to construct the legend explicitly from lists of objects and labels, but this is a well-documented matplotlib feature so it feels pretty safe to use.
from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=0., high=1., size=(100,))
# Generate data.
x1, y1 = [rand_data() for i in range(2)]
x2, y2 = [rand_data() for i in range(2)]
plt.figure()
plt.scatter(x1, y1, marker='o', label='first', s=20., c='b')
plt.scatter(x2, y2, marker='o', label='second', s=35., c='r')
# Create dummy Line2D objects for legend
h1 = Line2D([0], [0], marker='o', markersize=np.sqrt(20), color='b', linestyle='None')
h2 = Line2D([0], [0], marker='o', markersize=np.sqrt(20), color='r', linestyle='None')
# Set axes limits
plt.gca().set_xlim(-0.2, 1.2)
plt.gca().set_ylim(-0.2, 1.2)
# Plot legend.
plt.legend([h1, h2], ['first', 'second'], loc="lower left", markerscale=2,
scatterpoints=1, fontsize=10)
plt.show()
I did not have much success using @DrV's solution, though perhaps my use case is unique. Because of the density of points, I am using the smallest marker size, i.e. plt.plot(x, y, '.', ms=1, ...), and want the legend symbols larger.
I followed the recommendation I found on the matplotlib forums:
plot the data (no labels)
record axes limit (xlimits = plt.xlim())
plot fake data far away from real data with legend-appropriate symbol colors and sizes
restore axes limits (plt.xlim(xlimits))
create legend
Here is how it turned out (for this the dots are actually less important than the lines):
Hope this helps someone else.
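The five steps listed above can be sketched like this. It is only an illustration under my own assumptions: the random data, the label text, the far-away coordinate 1e6 and the legend marker size 12 are all invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it and call plt.show() to see the figure
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.random(5000), rng.random(5000)

# 1. Plot the real data with tiny markers and no label.
plt.plot(x, y, '.', ms=1, color='b')

# 2. Record the axes limits chosen for the real data.
xlimits, ylimits = plt.xlim(), plt.ylim()

# 3. Plot one fake point far away, styled the way the legend entry should look.
plt.plot([1e6], [1e6], '.', ms=12, color='b', label='dense data')

# 4. Restore the limits so the fake point stays out of view.
plt.xlim(xlimits)
plt.ylim(ylimits)

# 5. Create the legend; only the labelled fake series shows up in it.
legend = plt.legend(loc='upper right')
```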
Just another alternative here. This has the advantage that it would not use any "private" methods and works even with other objects than scatters present in the legend. The key is to map the scatter PathCollection to a HandlerPathCollection with an updating function being set to it.
def update(handle, orig):
    handle.update_from(orig)
    handle.set_sizes([64])
plt.legend(handler_map={PathCollection : HandlerPathCollection(update_func=update)})
Complete code example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
from matplotlib.collections import PathCollection
from matplotlib.legend_handler import HandlerPathCollection, HandlerLine2D
colors = ["limegreen", "crimson", "indigo"]
markers = ["o", "s", r"$\clubsuit$"]
labels = ["ABC", "DEF", "XYZ"]
plt.plot(np.linspace(0,1,8), np.random.rand(8), marker="o", markersize=22, label="A line")
for i,(c,m,l) in enumerate(zip(colors,markers,labels)):
    plt.scatter(np.random.rand(8),np.random.rand(8),
                c=c, marker=m, s=10+np.exp(i*2.9), label=l)
def updatescatter(handle, orig):
    handle.update_from(orig)
    handle.set_sizes([64])
def updateline(handle, orig):
    handle.update_from(orig)
    handle.set_markersize(8)
plt.legend(handler_map={PathCollection : HandlerPathCollection(update_func=updatescatter),
                        plt.Line2D : HandlerLine2D(update_func = updateline)})
plt.show()

twinx kills tick label color

I am plotting a double plot with two y-axes. The second axis ax2 is created by twinx. The problem is that the coloring of the second y-axis via yticks is not working anymore. Instead I have to set_color the labels individually. Here is the relevant code:
fig = plt.figure()
fill_between(data[:,0], 0, (data[:,2]), color='yellow')
yticks(arange(0.2,1.2,0.2), ['.2', '.4', '.6', '.8', ' 1'], color='yellow')
ax2 = twinx()
ax2.plot(data[:,0], (data[:,1]), 'green')
yticks(arange(0.1,0.6,0.1), ['.1 ', '.2', '.3', '.4', '.5'], color='green')
# color='green' has no effect here ?!
# instead this is needed:
for t in ax2.yaxis.get_ticklabels(): t.set_color('green')
show()
Resulting in:
This issue only occurs if I set the tick strings.
yticks(arange(0.1,0.6,0.1), ['.1 ', '.2', '.3', '.4', '.5'], color='green')
Omit it, like here
yticks(arange(0.1,0.6,0.1), color='green')
and the coloring works fine.
Is that a bug (could not find any reports to this), a feature (?!) or
am I missing something here? I am using python 2.6.5 with matplotlib 0.99.1.1 on ubuntu.
For whatever it's worth, your code works fine on my system even without the for loop to set the label colors. Just as a reference, here's a stand-alone example trying to follow essentially exactly what you posted:
import matplotlib.pyplot as plt
import numpy as np
# Generate some data
num = 200
x = np.linspace(501, 1200, num)
yellow_data, green_data = np.random.random((2,num))
green_data -= np.linspace(0, 3, yellow_data.size)
# Plot the yellow data
plt.fill_between(x, yellow_data, 0, color='yellow')
plt.yticks([0.0, 0.5, 1.0], color='yellow')
# Plot the green data
ax2 = plt.twinx()
ax2.plot(x, green_data, 'g-')
plt.yticks([-4, -3, -2, -1, 0, 1], color='green')
plt.show()
My guess is that your problem is mostly coming from mixing up references to different objects. I'm guessing that your code is a bit more complex, and that when you call plt.yticks, ax2 is not the current axis. You can test that idea by explicitly calling sca(ax2) (set the current axis to ax2) before calling yticks and see if that changes things.
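To make that test concrete, here is a minimal sketch of the sca(ax2) idea. The data is invented, and the extra plt.sca(ax1) call is only there to imitate other code having changed the current axes in between.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it and call plt.show() to see the figure
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)

fig, ax1 = plt.subplots()
ax1.fill_between(x, 0, 0.5 + 0.5 * np.sin(x), color='yellow')
ax2 = ax1.twinx()
ax2.plot(x, 0.3 + 0.2 * np.cos(x), 'green')

plt.sca(ax1)  # imitate other code leaving the wrong axes current
plt.sca(ax2)  # make ax2 current again so that yticks() acts on it

locs, labels = plt.yticks([0.1, 0.2, 0.3, 0.4, 0.5],
                          ['.1', '.2', '.3', '.4', '.5'], color='green')
```

If the labels come out green with the explicit sca(ax2) but not without it, the current axes was the culprit.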
Generally speaking, it's best to stick to either entirely the matlab-ish state machine interface or the OO interface, and don't mix them too much. (Personally, I prefer just sticking to the OO interface. Use pyplot to set up figure objects and for show, and use the axes methods otherwise. To each his own, though.)
At any rate, with matplotlib >= 1.0, the tick_params function makes this a bit more convenient. (I'm also using plt.subplots here, which is only in >= 1.0, as well.)
import matplotlib.pyplot as plt
import numpy as np
# Generate some data
yellow_data, green_data = np.random.random((2,2000))
yellow_data += np.linspace(0, 3, yellow_data.size)
green_data -= np.linspace(0, 3, yellow_data.size)
# Plot the data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(yellow_data, 'y-')
ax2.plot(green_data, 'g-')
# Change the axis colors...
ax1.tick_params(axis='y', labelcolor='yellow')
ax2.tick_params(axis='y', labelcolor='green')
plt.show()
The equivalent code for older versions of matplotlib would look more like this:
import matplotlib.pyplot as plt
import numpy as np
# Generate some data
yellow_data, green_data = np.random.random((2,2000))
yellow_data += np.linspace(0, 3, yellow_data.size)
green_data -= np.linspace(0, 3, yellow_data.size)
# Plot the data
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
ax1.plot(yellow_data, 'y-')
ax2.plot(green_data, 'g-')
# Change the axis colors...
for ax, color in zip([ax1, ax2], ['yellow', 'green']):
    for label in ax.yaxis.get_ticklabels():
        label.set_color(color)
plt.show()

Plot two histograms on single chart with matplotlib

I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from another file in the same histogram, so I do something like this
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. I wonder how could I plot both histograms at the same time with different colors.
Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars side-by-side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-deep')  # this style was called 'seaborn-deep' before Matplotlib 3.6
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist
In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist([y1, y2], color=colors)
n, bins, patches = ax1.hist([y1,y2])
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g','r'], alpha=0.8, bins=50)
    plt.show()
Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.
It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.
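A rough sketch of the subplots variant, in case it helps: each data set gets its own panel, stacked vertically with a shared x-axis. The random arrays stand in for the two data files from the question, and the bin range is my own choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it and call plt.show() to see the figure
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
mydata1 = rng.normal(3, 1, 400)  # stand-ins for the two data files
mydata2 = rng.normal(4, 2, 400)

bins = np.linspace(-10, 10, 100)  # shared bin edges keep the panels comparable

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
n1, _, _ = ax_top.hist(mydata1, bins, color='b', label='mydata1')
n2, _, _ = ax_bottom.hist(mydata2, bins, color='r', label='mydata2')
ax_top.legend(loc='upper right')
ax_bottom.legend(loc='upper right')
```

Nothing can hide behind anything else this way, at the cost of comparing heights across two panels instead of within one.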
There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
Another option, which is quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:
Just in case you have pandas (import pandas as pd) or are ok with using it:
import random
import pandas as pd
import matplotlib.pyplot as plt
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()
This question has been answered before, but wanted to add another quick/easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.
Inspired by Solomon's answer, but to stick with the question, which is related to histogram, a clean solution is:
sns.distplot(bar)
sns.distplot(foo)
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.
