matplotlib: labeling of curves - python

When I create a plot with many curves it would be convenient to be able to label each curve at the right where it ends.
The result of plt.legend produces too many similar colors and the legend is overlapping the plot.
As one can see in the example below the use of plt.legend is not very effective:
import numpy as np
from matplotlib import pyplot as plt
n=10
x = np.linspace(0,1, n)
for i in range(n):
    y = np.linspace(x[i],x[i], n)
    plt.plot(x, y, label=str(i))
plt.legend(loc='upper right')
plt.show()
If possible I would like to have something similar to this plot:
or this:

I would recommend the answer suggested in the comments, but another method that gives something similar to your first option (albeit without the exact placement of the legend markers matching the positions of the associated lines) is:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
n=10
x = np.linspace(0, 1, n)
labels = [str(i) for i in range(len(x))]
for i in range(n):
    y = np.linspace(x[i], x[i], n)
    ax.plot(x, y, label=labels[i])
h, _ = ax.get_legend_handles_labels()
# sort the legend handles/labels so they are in the same order as the data
hls = sorted(zip(x, h, labels), reverse=True)
ax.legend(
    [ha[1] for ha in hls], # get handles
    [la[2] for la in hls], # get labels
    bbox_to_anchor=(1.04, 0, 0.1, 1), # set box outside of axes
    loc="lower left",
    labelspacing=1.6, # add space between labels
)
leg = ax.get_legend()
# expand the border of the legend
fontsize = fig.canvas.get_renderer().points_to_pixels(leg._fontsize)
pad = 2 * (leg.borderaxespad + leg.borderpad) * fontsize
leg._legend_box.set_height(leg.get_bbox_to_anchor().height - pad)
This is heavily reliant on the answers here and here.
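The labelling the question asks for (a label at the right-hand end of each curve) can also be sketched directly with ax.annotate, without a legend at all. This is only a minimal sketch and not the method from the linked answers: the Agg backend, the 5-point offset and the name end_labels are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

n = 10
x = np.linspace(0, 1, n)

fig, ax = plt.subplots()
end_labels = []  # hypothetical name, kept only so the labels can be inspected later
for i in range(n):
    y = np.full(n, x[i])  # horizontal line at height x[i], as in the question
    (line,) = ax.plot(x, y)
    # place the text just to the right of the last data point, in the line's colour
    txt = ax.annotate(
        str(i),
        xy=(x[-1], y[-1]),           # end of the curve, in data coordinates
        xytext=(5, 0),               # shifted 5 points to the right
        textcoords="offset points",
        va="center",
        color=line.get_color(),
    )
    end_labels.append(txt)

fig.canvas.draw()
```

Because each label is anchored to the last data point, it follows the curve when the axes limits change. Labels of curves that end close together will still overlap, so this works best when the end points are reasonably spread out.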

Related

Put legend on a place of a subplot

I would like to put a legend on a place of a central subplot (and remove it).
I wrote this code:
import matplotlib.pylab as plt
import numpy as np
f, ax = plt.subplots(3,3)
x = np.linspace(0, 2. * np.pi, 1000)
y = np.sin(x)
for axis in ax.ravel():
    axis.plot(x, y)
    legend = axis.legend(loc='center')
plt.show()
I do not know how to hide the central plot. And why does the legend not appear?
This link did not help http://matplotlib.org/1.3.0/examples/pylab_examples/legend_demo.html
There are several problems with your code. In your for loop, you are attempting to plot a legend on each axis (the loc="center" refers to the axis, not the figure), yet you have not given a plot label to represent in your legend.
You need to choose the central axis in your loop and only display a legend for this axis. This iteration of the loop should have no plot call either, if you don't want a line there. You can do this with a set of conditionals like I have done in the following code:
import matplotlib.pylab as plt
import numpy as np
f, ax = plt.subplots(3,3)
x = np.linspace(0, 2. * np.pi, 1000)
y = np.sin(x)
handles, labels = (0, 0)
for i, axis in enumerate(ax.ravel()):
    if i == 4:
        axis.set_axis_off()
        legend = axis.legend(handles, labels, loc='center')
    else:
        axis.plot(x, y, label="sin(x)")
    if i == 3:
        handles, labels = axis.get_legend_handles_labels()
plt.show()
This gives me the following image:
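A variation on the same idea, as a sketch only: the central axes can be picked by its grid position instead of by loop index, and the legend entries borrowed from any of the plotted axes afterwards. The Agg backend and the name centre are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(3, 3)
x = np.linspace(0, 2 * np.pi, 1000)

centre = axes[1, 1]  # middle cell of the 3x3 grid
for axis in axes.ravel():
    if axis is not centre:
        axis.plot(x, np.sin(x), label="sin(x)")

# reuse the legend entries of one plotted axes and draw them in the empty centre
handles, labels = axes[0, 0].get_legend_handles_labels()
centre.set_axis_off()  # hide frame, ticks and labels of the central axes
legend = centre.legend(handles, labels, loc="center")
fig.canvas.draw()
```

Indexing axes[1, 1] avoids having to know that the flattened index of the middle cell is 4, which helps if the grid size changes.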

Dynamic marker colour in matplotlib

I have two lists containing the x and y coordinates of some points. There is also a list with some values assigned to each of those points. Now my question is, I can always plot the points (x,y) using markers in python. Also I can select colour of the marker manually (as in this code).
import matplotlib.pyplot as plt
x=[0,0,1,1,2,2,3,3]
y=[-1,3,2,-2,0,2,3,1]
colour=['blue','green','red','orange','cyan','black','pink','magenta']
values=[2,6,10,8,0,9,3,6]
for i in range(len(x)):
    plt.plot(x[i], y[i], linestyle='none', color=colour[i], marker='o')
plt.axis([-1,4,-3,4])
plt.show()
But is it possible to choose a colour for the marker marking a particular point according to the value assigned to that point (using cm.jet, cm.gray or similar other color schemes) and provide a colorbar with the plot ?
For example, this is the kind of plot I am looking for
where the red dots denote high temperature points and the blue dots denote low temperature ones and others are for temperatures in between.
You are most likely looking for matplotlib.pyplot.scatter. Example:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
# Generate data:
N = 10
x = np.linspace(0, 1, N)
y = np.linspace(0, 1, N)
x, y = np.meshgrid(x, y)
colors = np.random.rand(N, N) # colors for each x,y
# Plot
circle_size = 200
cmap = matplotlib.cm.viridis # replace with your favourite colormap
fig, ax = plt.subplots(figsize=(4, 4))
s = ax.scatter(x, y, s=circle_size, c=colors, cmap=cmap)
# Prettify
ax.axis("tight")
fig.colorbar(s)
plt.show()
Note: viridis may not be available in older versions of matplotlib.
Resulting image:
Edit
scatter does not require your input data to be 2-D; here are 4 alternatives that generate the same image:
import matplotlib
import matplotlib.pyplot as plt
x = [0,0,1,1,2,2,3,3]
y = [-1,3,2,-2,0,2,3,1]
values = [2,6,10,8,0,9,3,6]
# Let the colormap extend between:
vmin = min(values)
vmax = max(values)
cmap = matplotlib.cm.viridis
norm = matplotlib.colors.Normalize(vmin=vmin, vmax=vmax)
fig, ax = plt.subplots(4, sharex=True, sharey=True)
# Alternative 1: using plot:
for i in range(len(x)):
    color = cmap(norm(values[i]))
    ax[0].plot(x[i], y[i], linestyle='none', color=color, marker='o')
# Alternative 2: using scatter without specifying norm
ax[1].scatter(x, y, c=values, cmap=cmap)
# Alternative 3: using scatter with normalized values:
ax[2].scatter(x, y, c=cmap(norm(values)))
# Alternative 4: using scatter with vmin, vmax and cmap keyword-arguments
ax[3].scatter(x, y, c=values, vmin=vmin, vmax=vmax, cmap=cmap)
plt.show()
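The question also asks for a colorbar next to the plot. A minimal sketch for the 1-D lists from the question, assuming the Agg backend, a marker size of 80 and the label text "value" (all three are illustrative choices, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [-1, 3, 2, -2, 0, 2, 3, 1]
values = [2, 6, 10, 8, 0, 9, 3, 6]

fig, ax = plt.subplots()
# c=values lets scatter map each value through the colormap
points = ax.scatter(x, y, c=values, cmap="viridis", s=80)
# the colorbar is built from the object returned by scatter
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("value")
fig.canvas.draw()
```

The colorbar needs the object returned by scatter, which is why Alternative 1 (individual plot calls) cannot produce one without extra work.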

Change size/alpha of markers in the legend box of matplotlib

What is the most convenient way to enlarge and set the alpha value of the markers (back to 1.0) in the legend box? I'm also happy with big coloured boxes.
import matplotlib.pyplot as plt
import numpy as np
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
plt.plot(s1, ys, ',', label='data1', alpha=0.1)
plt.plot(s2, ys, ',', label='data2', alpha=0.1)
plt.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.)
For the size you can include the keyword markerscale=## in the call to legend and that will make the markers bigger (or smaller).
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import randn  # randn is used below
fig = plt.figure(1)
fig.clf()
x1,y1 = 4.*randn(10000), randn(10000)
x2,y2 = randn(10000), 4.*randn(10000)
ax = [fig.add_subplot(121+c) for c in range(2)]
ax[0].plot(x1, y1, 'bx',ms=.1,label='blue x')
ax[0].plot(x2, y2, 'r^',ms=.1,label='red ^')
ax[0].legend(loc='best')
ax[1].plot(x1, y1, 'bx',ms=.1,label='blue x')
ax[1].plot(x2, y2, 'r^',ms=.1,label='red ^')
ax[1].legend(loc='best', markerscale=40)
leg = plt.legend()
# newer Matplotlib releases name this attribute leg.legend_handles
for lh in leg.legendHandles:
    lh.set_alpha(1)
credit to https://izziswift.com/set-legend-symbol-opacity-with-matplotlib/
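The attribute that holds the legend handles has been renamed across Matplotlib releases, so a version-tolerant form of the loop above can be sketched as follows. The getattr fallback, the Agg backend, the sample sizes and markerscale=3 are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
ys = np.linspace(0, 1, 1000)

fig, ax = plt.subplots()
ax.plot(rng.normal(0, 0.05, 1000), ys, ".", label="data1", alpha=0.1)
ax.plot(rng.normal(0, 0.08, 1000), ys, ".", label="data2", alpha=0.1)

leg = ax.legend(markerscale=3)  # enlarge the legend markers
# prefer the newer attribute name, fall back to the older one if it is missing
handles = getattr(leg, "legend_handles", None)
if handles is None:
    handles = leg.legendHandles
for lh in handles:
    lh.set_alpha(1)  # only the legend copies become opaque
fig.canvas.draw()
```

The plotted lines keep their alpha of 0.1, because the legend holds its own copies of the artists.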
We can use the handler_map option to .legend() to define a custom function to update the alpha or marker for all Line2D instances in the legend. This method has the advantage that it gets the legend markers correct first time, they do not need to be modified afterwards, and fixes issues where the original legend markers can sometimes still be seen.
This method makes use of HandlerLine2D from the matplotlib.legend_handler module. I'm not aware of a way to do this without adding the extra import.
A complete script would look like this:
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D
import numpy as np
# Generate data
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
# Create figure and plot the data
fig, ax = plt.subplots()
ax.plot(s1, ys, ',', label='data1', alpha=0.1)
ax.plot(s2, ys, ',', label='data2', alpha=0.1)
def change_alpha(handle, original):
    ''' Change the alpha and marker style of the legend handles '''
    handle.update_from(original)
    handle.set_alpha(1)
    handle.set_marker('.')
# Add the legend, and set the handler map to use the change_alpha function
ax.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.,
          handler_map={plt.Line2D: HandlerLine2D(update_func=change_alpha)})
plt.show()
Note: below is my original answer. I have left it here for posterity as it may work for some use cases, but it has the problem that when you change the alpha and markers, it actually creates new instances on the legend and does not remove the old ones, so both can still be visible. I would recommend the method above in most cases.
If you name your legend, you can then iterate over the lines contained within it. For example:
leg=plt.legend(bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0.)
for l in leg.get_lines():
    l.set_alpha(1)
    l.set_marker('.')
Note that you also have to set the marker again. I suggest setting it to . rather than , here, to make it a little more visible.
For me the trick was to use the right property:
leg = axs.legend()
for l in leg.get_lines():
    l._legmarker.set_markersize(6)
Another option: instead of altering your legend's markers, you can make a custom legend (ref the matplotlib legend tutorial)
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
n = 100000
s1 = np.random.normal(0, 0.05, n)
s2 = np.random.normal(0, 0.08, n)
ys = np.linspace(0, 1, n)
plt.plot(s1, ys, ',', label='data1', alpha=0.1)
plt.plot(s2, ys, ',', label='data2', alpha=0.1)
# manually generate legend
# plt.gca() returns the current axes; plt.axes() can create a new, empty one
handles, labels = plt.gca().get_legend_handles_labels()
patches = [Patch(color=handle.get_color(), label=label) for handle, label in zip(handles, labels)]
plt.legend(handles=patches, bbox_to_anchor=(1.005, 1), loc=2, borderaxespad=0., frameon=False)

how to shade points in scatter based on colormap in matplotlib?

I'm trying to shade points in a scatter plot based on a set of values (from 0 to 1) picked from one of the already defined color maps, like Blues or Reds. I tried this:
import matplotlib
import matplotlib.pyplot as plt
from numpy import *
from scipy import *
fig = plt.figure()
mymap = plt.get_cmap("Reds")
x = [8.4808517662594909, 11.749082788323497, 5.9075039082855652, 3.6156231827873615, 12.536817102137768, 11.749082788323497, 5.9075039082855652, 3.6156231827873615, 12.536817102137768]
spaced_colors = linspace(0, 1, 10)
print(spaced_colors)
plt.scatter(x, x,
            color=spaced_colors,
            cmap=mymap)
# this does not work either
plt.scatter(x, x,
            color=spaced_colors,
            cmap=plt.get_cmap("gray"))
But it does not work, using either the Reds or gray color map. How can this be done?
edit: if I want to plot each point separately so it can have a separate legend, how can I do it? I tried:
fig = plt.figure()
mymap = plt.get_cmap("Reds")
data = np.random.random([10, 2])
colors = list(linspace(0.1, 1, 5)) + list(linspace(0.1, 1, 5))
print("colors: ", colors)
plt.subplot(1, 2, 1)
plt.scatter(data[:, 0], data[:, 1],
            c=colors,
            cmap=mymap)
plt.subplot(1, 2, 2)
# attempt to plot first five points in five shades of red,
# with a separate legend for each point
for n in range(5):
    plt.scatter([data[n, 0]], [data[n, 1]],
                c=[colors[n]],
                cmap=mymap,
                label="point %d" %(n))
plt.legend()
but it fails. I need to make a call to scatter for each point so that it can have a separate label=, but still want each point to have a different shade of the color map as its color.
thanks.
If you really want to do this (what you describe in your edit), you have to "pull" the colors from your colormap (I have commented all changes I made to your code):
import numpy as np
import matplotlib.pyplot as plt
# plt.subplots instead of plt.subplot
# create a figure and two subplots side by side, they share the
# x and the y-axis
fig, axes = plt.subplots(ncols=2, sharey=True, sharex=True)
data = np.random.random([10, 2])
# np.r_ instead of lists
colors = np.r_[np.linspace(0.1, 1, 5), np.linspace(0.1, 1, 5)]
mymap = plt.get_cmap("Reds")
# get the colors from the color map
my_colors = mymap(colors)
# here you give floats as color to scatter and a color map
# scatter "translates" this
axes[0].scatter(data[:, 0], data[:, 1], s=40,
                c=colors, edgecolors='None',
                cmap=mymap)
for n in range(5):
    # here you give a color to scatter
    axes[1].scatter(data[n, 0], data[n, 1], s=40,
                    color=my_colors[n], edgecolors='None',
                    label="point %d" %(n))
# by default legend would show multiple scatterpoints (as you would normally
# plot multiple points with scatter)
# I reduce the number to one here
plt.legend(scatterpoints=1)
plt.tight_layout()
plt.show()
However, if you only want to plot 10 values and want to name every single one, you should consider using something different, for instance a bar chart as in this example. Another option would be to use plt.plot with a custom color cycle, like in this example.
As per the documentation, you want the c keyword argument instead of color. (I agree that this is a bit confusing, but the "c" and "s" terminology is inherited from MATLAB, in this case.)
E.g.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
x, y, colors = np.random.random((3,10))
fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, s=50, cmap=mpl.cm.Reds)
plt.show()
How about:
import matplotlib.pyplot as plt
import numpy as np
reds = plt.get_cmap("Reds")
x = np.linspace(0, 10, 10)
y = np.log(x)
# color by value given a cmap
plt.subplot(121)
plt.scatter(x, y, c=x, s=100, cmap=reds)
# color by value, and add a legend for each
plt.subplot(122)
norm = plt.Normalize()  # the class is Normalize; lowercase plt.normalize is not available in current Matplotlib
norm.autoscale(x)
for i, (x_val, y_val) in enumerate(zip(x, y)):
    plt.plot(x_val, y_val, 'o', markersize=10,
             color=reds(norm(x_val)),
             label='Point %s' % i
             )
plt.legend(numpoints=1, loc='lower right')
plt.show()
The code should all be fairly self explanatory, but if you want me to go over anything, just shout.

Plot two histograms on single chart with matplotlib

I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from another file in the same histogram, so I do something like this
n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)
but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. I wonder how could I plot both histograms at the same time with different colors.
Here you have a working example:
import random
import numpy
from matplotlib import pyplot
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
bins = numpy.linspace(-10, 10, 100)
pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars to be side-by-side (as I did), try the variation below:
import numpy as np
import matplotlib.pyplot as plt
# on Matplotlib 3.6 and later this style is named 'seaborn-v0_8-deep'
plt.style.use('seaborn-deep')
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html
EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist
In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:
import numpy as np
import matplotlib.pyplot as plt
#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']
#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don't overlap):
#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
n, bins, patches = ax1.hist([y1, y2], color=colors)
ax1.cla() #clear the axis
#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])
#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
As a completion to Gustavo Bezerra's answer:
If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)
plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()
As a comparison, the exact same x and y vectors with default weights and density=True:
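What the two normalisations mean for the bar heights can also be checked numerically. A minimal sketch, assuming np.histogram as a stand-in for the binning that plt.hist performs; the seeded generator and the names x_frac, y_frac, x_dens and in_range are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1, 2, 5000)
y = rng.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)
widths = np.diff(bins)

# weights of 1/N: each bar is the fraction of that sample falling in the bin
x_frac, _ = np.histogram(x, bins, weights=np.full(x.shape, 1 / x.size))
y_frac, _ = np.histogram(y, bins, weights=np.full(y.shape, 1 / y.size))

# density=True: bars are scaled so that the area (height * width) is 1
x_dens, _ = np.histogram(x, bins, density=True)

# share of x that lies inside the binned range at all
in_range = np.mean((x >= bins[0]) & (x <= bins[-1]))

print("sum of weighted bars:", x_frac.sum(), "share of x in range:", in_range)
print("area under density bars:", (x_dens * widths).sum())
```

With 1/N weights the bar heights add up to the share of the sample inside the binned range, so two samples of different size become directly comparable as fractions. With density=True it is the area under the bars that equals 1, so the heights depend on the bin width.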
You should use bins from the values returned by hist:
import numpy as np
import matplotlib.pyplot as plt
foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution
# density=True replaces the normed=True keyword, which newer Matplotlib no longer accepts
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
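If no fixed range is known in advance, the shared bin edges can be computed from both samples before plotting. This is a sketch of a related technique rather than the answer's own code: it uses np.histogram_bin_edges on the concatenated data, and the Agg backend, the seed and the name edges are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
foo = rng.normal(loc=1, size=100)
bar = rng.normal(loc=-1, size=10000)

# one set of 50 bins spanning the combined range of both samples
edges = np.histogram_bin_edges(np.concatenate([foo, bar]), bins=50)

fig, ax = plt.subplots()
_, bins_foo, _ = ax.hist(foo, bins=edges, alpha=0.5, density=True, label="foo")
_, bins_bar, _ = ax.hist(bar, bins=edges, alpha=0.5, density=True, label="bar")
ax.legend()
fig.canvas.draw()
```

Both histograms then use identical bin edges, so their bars line up and can be compared bin by bin.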
Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:
import matplotlib.pyplot as plt

def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g','r'], alpha=0.8, bins=50)
    plt.show()
Plotting two overlapping histograms (or more) can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves the readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited like in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
from matplotlib.lines import Line2D
rng = np.random.default_rng(seed=123)
# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
        alpha=0.7, label=['data1','data2'])
# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)
# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()
As you can see, the result looks quite clean. This is especially useful when overlapping even more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require the use of another type of plot, such as one of those presented here.
It sounds like you might want just a bar graph:
http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
http://matplotlib.sourceforge.net/examples/pylab_examples/barchart_demo.html
Alternatively, you can use subplots.
There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
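For a 2-D array the same swap can be written as a transpose. A minimal sketch, assuming the Agg backend and a seeded generator; it also inspects the returned counts to confirm that each column was treated as one dataset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(2, 300))  # two datasets stored as rows

fig, ax = plt.subplots()
# hist treats each column of a 2-D array as one dataset, so pass the transpose
counts, edges, _ = ax.hist(data.T, bins=30, label=["x", "y"])
ax.legend()

counts = np.asarray(counts)  # one row of bin counts per dataset
print(counts.shape)
```

Passing data without the transpose would instead be read as 300 datasets of 2 values each, which is rarely what is wanted.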
Another option, which is quite similar to joaquin's answer:
import random
from matplotlib import pyplot
#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]
#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
Gives the following output:
Just in case you have pandas (import pandas as pd) or are ok with using it:
import random
import pandas as pd
import matplotlib.pyplot as plt
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()
This question has been answered before, but wanted to add another quick/easy workaround that might help other visitors to this question.
import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
Some helpful examples are here for kde vs histogram comparison.
Inspired by Solomon's answer, but to stick with the question, which is related to histogram, a clean solution is:
sns.distplot(bar)
sns.distplot(foo)
plt.show()
Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.
