Customizing legend in matplotlib - python

I'm plotting rectangles that have different colors in matplotlib. I would like each type of rectangle to have a single label that appears once in the legend. My code is:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig1 = plt.figure()
ax = plt.subplot(1,1,1)
times = [0, 1, 2, 3, 4]
for t in times:
    if t % 2 == 0:
        color="blue"
    else:
        color="green"
    ax.add_patch(patches.Rectangle((t, 0.5), 0.1, 0.1,
                                   facecolor=color,
                                   label=color))
plt.xlim(times[0], times[-1] + 0.1)
plt.legend()
plt.show()
The problem is that each rectangle appears multiple times in the legend. I would like to have only two entries in the legend: a blue rectangle labeled "blue", and a green rectangle labeled "green". How can this be achieved?

As documented here, you can control the legend by passing it the handles of the graphical objects you want entries for. In this case only two of the five objects are needed, one per color, so you can store them in a dictionary keyed by color (each new rectangle overwrites the previous handle for its color):
import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig1 = plt.figure()
ax = plt.subplot(1,1,1)
times = [0, 1, 2, 3, 4]
handle = {}
for t in times:
    if t % 2 == 0:
        color="blue"
    else:
        color="green"
    # keep one handle per color; later rectangles replace earlier ones
    handle[color] = ax.add_patch(patches.Rectangle((t, 0.5), 0.1, 0.1,
                                                   facecolor=color,
                                                   label=color))
plt.xlim(times[0], times[-1] + 0.1)
print(handle)
plt.legend([handle['blue'],handle['green']],['MyBlue','MyGreen'])
plt.show()
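
An alternative sketch, not taken from the answer above: keep the original labels and deduplicate the legend entries after plotting. It relies on `Axes.get_legend_handles_labels()` and on dictionaries preserving insertion order; the names `unique` and `legend` are only illustrative, and the Agg backend line is there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()
times = [0, 1, 2, 3, 4]
for t in times:
    color = "blue" if t % 2 == 0 else "green"
    ax.add_patch(patches.Rectangle((t, 0.5), 0.1, 0.1,
                                   facecolor=color, label=color))
ax.set_xlim(times[0], times[-1] + 0.1)

# Collect every labelled artist, then keep a single handle per label.
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))  # one entry per distinct label
legend = ax.legend(list(unique.values()), list(unique.keys()))
```

This keeps the plotting loop unchanged, at the cost of one extra step before calling `legend`.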

Related

Set steps on y-axis with matplotlib

Currently I have the problem that I do not get the steps on the y-axis (score) changed. My representation currently looks like this:
However, since only whole numbers are possible in my evaluation, these 0.5 steps are rather meaningless in my representation. I would like to change these steps from 0.5 to 1.0. So that I get the steps [0, 1, 2, 3, ...] instead of [0.0, 0.5, 1.0, 1.5, 2.0, ...].
My code broken down to the most necessary and simplified looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# some calculations
x = np.arange(len(something)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, test1_means, width, label='Test 1')
rects2 = ax.bar(x + width/2, test2_means, width, label='Test 2')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Something to check')
ax.set_xticks(x, something)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
fig.tight_layout()
plt.show()
In addition, after research, I tried setting the variable ax.set_yticks or adjusting the fig. Unfortunately, these attempts did not work.
What am I doing wrong or is this a default setting of matplotlib at this point?
Edit after comment:
My calculations are prepared on the basis of Excel data. Here is a reproducible code snippet with the current values how the code might look like in the final effect:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
stories = ["A", "B", "C", "D", "E"]
test1_means = [2, 3, 2, 3, 1]
test2_means = [0, 1, 0, 0, 0]
x = np.arange(len(stories)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, test1_means, width, label='Test 1')
rects2 = ax.bar(x + width/2, test2_means, width, label='Test 2')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Something')
ax.set_xticks(x, stories)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
fig.tight_layout()
plt.show()
You're looking for Axes.set_yticks. Add these lines after the bars are drawn and before plt.show():
N = 1 # <- you can adjust the step here
ax.set_yticks(np.arange(0, max(test1_means + test2_means) + 1, N))
Based on this answer, you just need to do:
from matplotlib.ticker import MaxNLocator
# set y-axis to only show integer values
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
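
A minimal side-by-side sketch of the two suggestions above, using the values from the question. The two-panel layout and the names `ax_fixed`, `ax_auto`, and `step` are assumptions made for illustration; the Agg backend line only makes the snippet runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

stories = ["A", "B", "C", "D", "E"]
test1_means = [2, 3, 2, 3, 1]
test2_means = [0, 1, 0, 0, 0]
x = np.arange(len(stories))
width = 0.35

fig, (ax_fixed, ax_auto) = plt.subplots(1, 2, figsize=(8, 3))
for ax in (ax_fixed, ax_auto):
    ax.bar(x - width / 2, test1_means, width, label="Test 1")
    ax.bar(x + width / 2, test2_means, width, label="Test 2")
    ax.set_xticks(x)
    ax.set_xticklabels(stories)

# Option 1: explicit tick positions with a fixed step.
step = 1
ax_fixed.set_yticks(np.arange(0, max(test1_means + test2_means) + 1, step))

# Option 2: let matplotlib choose the ticks, restricted to integers.
ax_auto.yaxis.set_major_locator(MaxNLocator(integer=True))

fixed_ticks = ax_fixed.get_yticks()
auto_ticks = ax_auto.get_yticks()
```

The explicit version pins the ticks to the data range known at that moment, while the locator keeps working if the data or the view limits change later.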

Matplotlib colorbar labels to fit into boxes

I created a scatter plot using matplotlib, but I am unable to get the labels centered in the boxes of the colorbar.
This is the code I have so far:
cMap = ListedColormap(['Orange', 'Purple', 'Blue','Red','Green'])
fig, ax = plt.subplots()
plt.figure(figsize=(12,12),dpi = 80)
#data
dist = np.random.rand(1900,1900)
#legend
cbar = plt.colorbar(scatter)
cbar.ax.get_yaxis().set_ticks([])
for j, lab in enumerate(['$Training$','$None$','$GS$','$ML$','$Both$']):
    cbar.ax.text( .5, j - .985, lab, ha='left', va='center', rotation = 270)
cbar.ax.get_yaxis().labelpad = 15
cbar.ax.set_ylabel('Outliers', rotation=270)
indices = np.where(outlier_label != -2)[0]
plt.scatter(dist[indices, 0], dist[indices, 1], c=outlier_label[indices], cmap=cMap, s=20)
plt.gca().set_aspect('equal', 'datalim')
plt.title('Projection of the data', fontsize=24)
Thanks!
In the line cbar.ax.text( .5, j - .985, lab, ha='left', va='center', rotation = 270) you have to adjust the '.985' offset by trial and error to get better results.
You can extract the y limits of the colorbar to know its top and bottom. Dividing that range into 11 equally spaced positions (2 × 5 + 1) puts the 5 box centers at the odd indices of that list. Similarly, you can extract the x limits to find the horizontal center.
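
As a quick check of that arithmetic, here is a small sketch with assumed colorbar limits of 0 and 4 (in the real plot they come from `cbar.ax.get_ylim()`):

```python
import numpy as np

num_colors = 5
cb_ymin, cb_ymax = 0.0, 4.0  # assumed limits, for illustration only

# 2 * n + 1 points: box edges at the even indices, box centers at the odd ones.
positions = np.linspace(cb_ymin, cb_ymax, 2 * num_colors + 1)
centers = positions[1::2]
print(centers)
```

Each box is 0.8 units tall here, so the centers sit 0.4 above each lower edge.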
Some remarks:
If you already called plt.subplots(), then plt.figure() will create a new figure, leaving the first plot empty. You can set the figsize directly via plt.subplots(figsize=...)
You are mixing matplotlib's "object-oriented interface" with the pyplot interface. This can lead to a lot of confusion. It is best to stick to one or the other. (The object-oriented interface is preferred, especially when you are creating non-trivial plots.)
You set dist = np.random.rand(1900,1900) of dimensions 1900x1900 while you are only using dimensions 1900x2.
Neither the code nor the text gives an indication of the values inside outlier_label. The code below assumes they are 5 equally spaced numbers, and that both the lowest and the highest value are present in the data.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
colors = ['Orange', 'Purple', 'Blue', 'Red', 'Green']
cmap = ListedColormap(colors)
fig, ax = plt.subplots(figsize=(12, 12), dpi=80)
# data
dist = np.random.randn(1900, 2).cumsum(axis=0)
outlier_label = np.repeat(np.arange(5), 1900 // 5)
indices = outlier_label != -2
scatter = ax.scatter(dist[indices, 0], dist[indices, 1], c=outlier_label[indices], cmap=cmap, s=20)
# legend
cbar = plt.colorbar(scatter, ax=ax)
cbar.ax.get_yaxis().set_ticks([])
cb_xmin, cb_xmax = cbar.ax.get_xlim()
cb_ymin, cb_ymax = cbar.ax.get_ylim()
num_colors = len(colors)
for j, lab in zip(np.linspace(cb_ymin, cb_ymax, 2 * num_colors + 1)[1::2],
                  ['$Training$', '$None$', '$GS$', '$ML$', '$Both$']):
    cbar.ax.text((cb_xmin + cb_xmax) / 2, j, lab, ha='center', va='center', rotation=270, color='white', fontsize=16)
cbar.ax.get_yaxis().labelpad = 25
cbar.ax.set_ylabel('Outliers', rotation=270, fontsize=18)
ax.set_aspect('equal', 'datalim')
ax.set_title('Projection of the data', fontsize=24)
plt.show()

matplotlib subplots last plot disturbs log scale

I am making a matplotlib figure with a 2x2 dimension where x- and y-axis are shared, and then loop over the different axes to plot in them. I'm plotting variant data per sample, and it is possible that a sample doesn't have variant data, so then I want the plot to say "NA" in the middle of it.
import matplotlib.pyplot as plt
n_plots_per_fig = 4
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols, sharex="all", sharey="all", figsize=(8, 6))
axs = axs.ravel()
for i, ax in enumerate(axs):
    x = [1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]  # example values, but this list CAN be empty
    bins = 3  # example bins
    if x:
        ax.hist(x, bins=bins)  # plot the hist
        ax.set_yscale("log")
        ax.set_title(str(i), fontsize="medium")
    else:
        ax.set_title(str(i), fontsize="medium")
        ax.text(0.5, 0.5, 'NA', ha='center', va='center', transform=ax.transAxes)
fig.show()
This works in almost every case; example of wanted output:
However, only if the last plot in the figure doesn't have any data, then this disturbs the log scale. Example code that triggers this:
import matplotlib.pyplot as plt
n_plots_per_fig = 4
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols, sharex="all", sharey="all", figsize=(8, 6))
axs = axs.ravel()
for i, ax in enumerate(axs):
    x = [1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
    bins = 3
    if i == n_plots_per_fig-1:  # this will distort the log scale
        ax.set_title(str(i), fontsize="medium")
        ax.text(0.5, 0.5, 'NA', ha='center', va='center', transform=ax.transAxes)
    elif x:
        ax.hist(x, bins=bins)  # plot the hist
        ax.set_yscale("log")
        ax.set_title(str(i), fontsize="medium")
    else:
        ax.set_title(str(i), fontsize="medium")
        ax.text(0.5, 0.5, 'NA', ha='center', va='center', transform=ax.transAxes)
fig.show()
The log scale is now set to really low values, and this is not what I want. I've tried several things to fix this, like unsharing the y-axes for the plot that doesn't have any data [ax.get_shared_y_axes().remove(axis) for axis in axs] or hiding the plot ax.set_visible(False), but none of this works. The one thing that does work is removing the axes from the plot with ax.remove(), but since this is the bottom most sample, this also removes the values for the x ticks for that column:
And besides that, I would still like the name of the sample that didn't have any data to be visible in the axes (and the "NA" text), and removing the axes doesn't allow this.
Any ideas on a fix?
Edit: I simplified my example.
You can set the limits manually with ax.set_xlim() / ax.set_ylim().
Note that if you share the axes, it does not matter on which subplot you call those functions. For example:
axs[-1][-1].set_ylim(1e0, 1e2)  # for the 2D axs array; after axs.ravel(), use axs[-1]
If you do not know the limits in advance, you can infer them from the other plots:
import numpy as np
x = np.random.random(100)
bins = 10
if bins != 0:
    ...
    # remember the histogram extent of a subplot that has data
    yy, xx = np.histogram(x, bins=bins)
    ylim = yy.min(), yy.max()
    xlim = xx.min(), xx.max()
else:
    # empty subplot: reuse the limits found above
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
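
A runnable sketch of that idea, under stated assumptions: the `samples` list is made-up data whose last entry is empty, and the limits are padded by a factor of 2 and restricted to non-zero bin counts so they stay valid on a log scale. Neither choice comes from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: the last sample has no values.
samples = [[1, 2, 2, 3, 3, 3], [1, 1, 2, 3], [2, 3, 3, 3], []]
bins = 3

fig, axs = plt.subplots(2, 2, sharex="all", sharey="all", figsize=(8, 6))
axs = axs.ravel()

counts_seen = []  # non-zero bin counts from every populated subplot
for i, (ax, x) in enumerate(zip(axs, samples)):
    ax.set_title(str(i), fontsize="medium")
    if x:
        counts, edges = np.histogram(x, bins=bins)
        ax.hist(x, bins=bins)
        counts_seen.extend(counts[counts > 0])
    else:
        ax.text(0.5, 0.5, "NA", ha="center", va="center",
                transform=ax.transAxes)

# With shared axes, setting scale and limits once applies to all subplots.
axs[0].set_yscale("log")
ylim = (0.5 * min(counts_seen), 2 * max(counts_seen))
axs[0].set_ylim(ylim)
```

Because the limits are set explicitly after the loop, it no longer matters which subplot is empty or in which order the subplots are drawn.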

Arrows between matplotlib subplots

I decided to play around with this example code a bit. I was able to figure out how to draw a straight line between the two subplots, even when the line is outside the bounds of one of the subplots.
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
axs = [ax1, ax2]
# Fixing random state for reproducibility
np.random.seed(19680801)
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]
# plot violin plot
axs[0].violinplot(all_data,
                  showmeans=False,
                  showmedians=True)
axs[0].set_title('Violin plot')
# plot box plot
axs[1].boxplot(all_data)
axs[1].set_title('Box plot')
# adding horizontal grid lines
for ax in axs:
    ax.yaxis.grid(True)
    ax.set_xticks([y + 1 for y in range(len(all_data))])
    ax.set_xlabel('Four separate samples')
    ax.set_ylabel('Observed values')
    for tick in ax.xaxis.get_major_ticks():
        # label1 is the bottom/left tick label (tick.label in older matplotlib)
        tick.label1.set_fontsize(20)
plt.setp(axs[0], xticklabels=['x1', 'x2', 'x3', 'x4'])
transFigure = fig.transFigure.inverted()
coord1 = transFigure.transform(ax1.transData.transform([5,10]))
coord2 = transFigure.transform(ax2.transData.transform([2,-10]))
line = mpl.lines.Line2D((coord1[0],coord2[0]),(coord1[1],coord2[1]),
                        c='k', lw=5, transform=fig.transFigure)
fig.lines.append(line)
Yes that added line is ugly but I just wanted to get it functional.
However, what I'd really like to do is make an arrow between the subplots, and I can't figure out how without jury-rigging my own arrow tails. Is there a way to do this that uses the matplotlib.pyplot.arrow class?
I also wanted to draw an arrow between two subplots but I didn't even know where to start! However, the line between subplots example in the original question gave me enough of a clue to get started...
First, I reduced the code in the original question to a minimal working example:
from matplotlib import lines, pyplot as plt
fig = plt.figure()
# First subplot
ax1 = fig.add_subplot(121)
plt.plot([0, 1], [0, 1])
# Second subplot
ax2 = fig.add_subplot(122)
plt.plot([0, 1], [0, 1])
# Add line from one subplot to the other
xyA = [0.5, 1.0]
ax1.plot(*xyA, "o")
xyB = [0.75, 0.25]
ax2.plot(*xyB, "o")
transFigure = fig.transFigure.inverted()
coord1 = transFigure.transform(ax1.transData.transform(xyA))
coord2 = transFigure.transform(ax2.transData.transform(xyB))
line = lines.Line2D(
    (coord1[0], coord2[0]),  # xdata
    (coord1[1], coord2[1]),  # ydata
    transform=fig.transFigure,
    color="black",
)
fig.lines.append(line)
# Show figure
plt.show()
This produces the following output:
Then, using this blog post, I thought the answer was to create a matplotlib.patches.FancyArrowPatch and append it to fig.patches (instead of creating a matplotlib.lines.Line2D and appending it to fig.lines). After consulting the matplotlib.patches.FancyArrowPatch documentation, plus some trial and error, I came up with something that works in matplotlib 3.1.2:
from matplotlib import patches, pyplot as plt
fig = plt.figure()
# First subplot
ax1 = fig.add_subplot(121)
plt.plot([0, 1], [0, 1])
# Second subplot
ax2 = fig.add_subplot(122)
plt.plot([0, 1], [0, 1])
# Add line from one subplot to the other
xyA = [0.5, 1.0]
ax1.plot(*xyA, "o")
xyB = [0.75, 0.25]
ax2.plot(*xyB, "o")
transFigure = fig.transFigure.inverted()
coord1 = transFigure.transform(ax1.transData.transform(xyA))
coord2 = transFigure.transform(ax2.transData.transform(xyB))
arrow = patches.FancyArrowPatch(
    coord1,  # posA
    coord2,  # posB
    shrinkA=0,  # so tail is exactly on posA (default shrink is 2)
    shrinkB=0,  # so head is exactly on posB (default shrink is 2)
    transform=fig.transFigure,
    color="black",
    arrowstyle="-|>",  # "normal" arrow
    mutation_scale=30,  # controls arrow head size
    linewidth=3,
)
fig.patches.append(arrow)
# Show figure
plt.show()
However, as per the comments below, this does not work in matplotlib 3.4.2, where you get this:
Notice that the ends of the arrow do not line up with the target points (orange circles), which they should do.
This matplotlib version change also causes the original line example to fail in the same way.
However, there is a better patch! Use ConnectionPatch (docs), a subclass of FancyArrowPatch, instead of using FancyArrowPatch directly. ConnectionPatch is designed specifically for this use case and handles the transform correctly, as shown in this matplotlib documentation example:
from matplotlib import patches, pyplot as plt
fig = plt.figure()
# First subplot
ax1 = fig.add_subplot(121)
plt.plot([0, 1], [0, 1])
# Second subplot
ax2 = fig.add_subplot(122)
plt.plot([0, 1], [0, 1])
# Add line from one subplot to the other
xyA = [0.5, 1.0]
ax1.plot(*xyA, "o")
xyB = [0.75, 0.25]
ax2.plot(*xyB, "o")
# ConnectionPatch handles the transform internally so no need to get fig.transFigure
arrow = patches.ConnectionPatch(
    xyA,
    xyB,
    coordsA=ax1.transData,
    coordsB=ax2.transData,
    # Default shrink parameter is 0 so can be omitted
    color="black",
    arrowstyle="-|>",  # "normal" arrow
    mutation_scale=30,  # controls arrow head size
    linewidth=3,
)
fig.patches.append(arrow)
# Show figure
plt.show()
This produces the correct output in both matplotlib 3.1.2 and matplotlib 3.4.2, which looks like this:
To draw a correctly positioned line connecting across two subplots in matplotlib 3.4.2, use a ConnectionPatch as above but with arrowstyle="-" (i.e. no arrow heads, so just a line).
NB: You cannot use:
plt.arrow as it is automatically added to the current axes so only appears in one subplot
matplotlib.patches.Arrow as the axes-figure transform skews the arrow-head
matplotlib.patches.FancyArrow as this also results in a skewed arrow-head
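
For the plain-line variant mentioned above (arrowstyle="-"), a minimal sketch could look like the following. Note one substitution that is not part of the answer: the patch is attached with `fig.add_artist(...)` rather than by appending to `fig.patches`. The variable name `line` and the Agg backend line are likewise only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
from matplotlib import patches, pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot([0, 1], [0, 1])
ax2.plot([0, 1], [0, 1])

# End points, each given in the data coordinates of its own subplot.
xyA = (0.5, 1.0)
xyB = (0.75, 0.25)
ax1.plot(*xyA, "o")
ax2.plot(*xyB, "o")

line = patches.ConnectionPatch(
    xyA=xyA, xyB=xyB,
    coordsA=ax1.transData, coordsB=ax2.transData,
    arrowstyle="-",  # no arrow head, just a line
    color="black",
    linewidth=3,
)
fig.add_artist(line)  # assumption: used here instead of fig.patches.append
```

Swapping `arrowstyle="-"` for `"-|>"` and adding `mutation_scale` gives the arrow version again.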

how to shade points in scatter based on colormap in matplotlib?

I'm trying to shade points in a scatter plot based on a set of values (from 0 to 1) picked from one of the already defined color maps, like Blues or Reds. I tried this:
import matplotlib
import matplotlib.pyplot as plt
from numpy import *
from scipy import *
fig = plt.figure()
mymap = plt.get_cmap("Reds")
x = [8.4808517662594909, 11.749082788323497, 5.9075039082855652, 3.6156231827873615, 12.536817102137768, 11.749082788323497, 5.9075039082855652, 3.6156231827873615, 12.536817102137768]
spaced_colors = linspace(0, 1, 10)
print(spaced_colors)
plt.scatter(x, x,
            color=spaced_colors,
            cmap=mymap)
# this does not work either
plt.scatter(x, x,
            color=spaced_colors,
            cmap=plt.get_cmap("gray"))
But it does not work, using either the Reds or gray color map. How can this be done?
edit: if I want to plot each point separately so it can have a separate legend, how can I do it? I tried:
fig = plt.figure()
mymap = plt.get_cmap("Reds")
data = np.random.random([10, 2])
colors = list(linspace(0.1, 1, 5)) + list(linspace(0.1, 1, 5))
print("colors: ", colors)
plt.subplot(1, 2, 1)
plt.scatter(data[:, 0], data[:, 1],
            c=colors,
            cmap=mymap)
plt.subplot(1, 2, 2)
# attempt to plot first five points in five shades of red,
# with a separate legend for each point
for n in range(5):
    plt.scatter([data[n, 0]], [data[n, 1]],
                c=[colors[n]],
                cmap=mymap,
                label="point %d" %(n))
plt.legend()
but it fails. I need to make a call to scatter for each point so that it can have a separate label=, but still want each point to have a different shade of the color map as its color.
thanks.
If you really want to do this (what you describe in your edit), you have to "pull" the colors from your colormap (I have commented all changes I made to your code):
import numpy as np
import matplotlib.pyplot as plt
# plt.subplots instead of plt.subplot
# create a figure and two subplots side by side, they share the
# x and the y-axis
fig, axes = plt.subplots(ncols=2, sharey=True, sharex=True)
data = np.random.random([10, 2])
# np.r_ instead of lists
colors = np.r_[np.linspace(0.1, 1, 5), np.linspace(0.1, 1, 5)]
mymap = plt.get_cmap("Reds")
# get the colors from the color map
my_colors = mymap(colors)
# here you give floats as color to scatter and a color map
# scatter "translates" this
axes[0].scatter(data[:, 0], data[:, 1], s=40,
                c=colors, edgecolors='None',
                cmap=mymap)
for n in range(5):
    # here you give a color to scatter
    axes[1].scatter(data[n, 0], data[n, 1], s=40,
                    color=my_colors[n], edgecolors='None',
                    label="point %d" %(n))
# by default legend would show multiple scatterpoints (as you would normally
# plot multiple points with scatter)
# I reduce the number to one here
plt.legend(scatterpoints=1)
plt.tight_layout()
plt.show()
However, if you only want to plot 10 values and want to name every single one, you should consider using something different, for instance a bar chart as in this example. Another option would be to use plt.plot with a custom color cycle, like in this example.
As per the documentation, you want the c keyword argument instead of color. (I agree that this is a bit confusing, but the "c" and "s" terminology is inherited from MATLAB in this case.)
E.g.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
x, y, colors = np.random.random((3,10))
fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, s=50, cmap=mpl.cm.Reds)
plt.show()
How about:
import matplotlib.pyplot as plt
import numpy as np
reds = plt.get_cmap("Reds")
x = np.linspace(0, 10, 10)
y = np.log(x)
# color by value given a cmap
plt.subplot(121)
plt.scatter(x, y, c=x, s=100, cmap=reds)
# color by value, and add a legend for each
plt.subplot(122)
norm = plt.Normalize()  # plt.normalize no longer exists in current matplotlib
norm.autoscale(x)
for i, (x_val, y_val) in enumerate(zip(x, y)):
    plt.plot(x_val, y_val, 'o', markersize=10,
             color=reds(norm(x_val)),
             label='Point %s' % i
             )
plt.legend(numpoints=1, loc='lower right')
plt.show()
The code should all be fairly self explanatory, but if you want me to go over anything, just shout.
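
For reference, the two-step lookup these answers rely on (data value to [0, 1], then [0, 1] to an RGBA color) can be tried on its own. This is only a sketch; the `values` array is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([2.0, 4.0, 6.0, 10.0])  # hypothetical data values

# Step 1: scale the data linearly so that min -> 0 and max -> 1.
norm = Normalize(vmin=values.min(), vmax=values.max())

# Step 2: calling a colormap with floats in [0, 1] returns RGBA rows.
reds = plt.get_cmap("Reds")
rgba = reds(norm(values))  # shape (4, 4): one RGBA row per value
```

Each row of `rgba` can be passed as `color=` to a separate `plot` or `scatter` call, which is what allows one legend entry per point.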
