Matplotlib Scatter Plot Legend Creation Mystery

I have the following snippet of code (the values for c, s, x and y are mock-ups, but the real lists follow the same format and are just much bigger; only two colors, red and green, are used, and all lists are the same size).
The issue is that the color legend fails to materialize, and I am completely at a loss as to why. The code for legend generation is basically a cut-and-paste from the docs, i.e. (https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_with_legend.html#sphx-glr-gallery-lines-bars-and-markers-scatter-with-legend-py)
Does anyone have any idea?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
c = [ 'g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']
s = [ 10, 20, 10, 40, 60, 90, 90, 50, 60, 40]
x = [ 2.4, 3.0, 3.5, 3.5, 3.5, 3.5, 3.5, 2.4, 3.5, 3.5]
y = [24.0, 26.0, 20.0, 19.0, 19.0, 21.0, 20.0, 23.0, 20.0, 20.0]
fig, ax = plt.subplots()
scatter = plt.scatter(x, y, s=s, c=c, alpha=0.5)
# produce a legend with the unique colors from the scatter
handles, labels = scatter.legend_elements()
legend1 = ax.legend(handles, labels, loc="lower left", title="Colors")
ax.add_artist(legend1)
# produce a legend with a cross section of sizes from the scatter
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.5)
legend2 = ax.legend(handles, labels, loc="upper right", ncol=2, title="Sizes")
plt.show()
Plot output:

It seems that legend_elements() is only meant to be used when c= is passed a numeric array to be mapped against a colormap.
You can test by replacing c=c by c=s in your code, and you will get the desired output.
Personally, I would have expected your code to work, and maybe it is worth bringing it up as either a bug or a feature request on Matplotlib's GitHub. EDIT: actually, there is already a discussion about this very issue on the issue tracker.
One way to circumvent this limitation is to replace your array of color names with a numeric array and create a custom colormap that maps each value in your array to the desired color:
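import matplotlib.colors
import matplotlib.pyplot as plt
# c = [ 'g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']  # original color names
c = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
# map code 0 to green and code 1 to red
cmap = matplotlib.colors.ListedColormap(['g','r'])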
s = [ 10, 20, 10, 40, 60, 90, 90, 50, 60, 40]
x = [ 2.4, 3.0, 3.5, 3.5, 3.5, 3.5, 3.5, 2.4, 3.5, 3.5]
y = [24.0, 26.0, 20.0, 19.0, 19.0, 21.0, 20.0, 23.0, 20.0, 20.0]
fig, ax = plt.subplots()
scatter = plt.scatter(x, y, s=s, c=c, alpha=0.5, cmap=cmap)
# produce a legend with the unique colors from the scatter
handles, labels = scatter.legend_elements()
legend1 = ax.legend(handles, labels, loc="lower left", title="Colors")
ax.add_artist(legend1)
# produce a legend with a cross section of sizes from the scatter
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.5)
legend2 = ax.legend(handles, labels, loc="upper right", ncol=2, title="Sizes")
plt.show()
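Since the real lists are much bigger, the numeric array does not have to be typed by hand. A minimal sketch, assuming the colors are plain strings; the helper name encode_colors is made up for this illustration and is not part of Matplotlib:

```python
def encode_colors(color_names):
    """Map each color name to an integer code, in order of first appearance."""
    palette = []    # unique color names, in order of first appearance
    index_of = {}   # color name -> integer code
    codes = []
    for name in color_names:
        if name not in index_of:
            index_of[name] = len(palette)
            palette.append(name)
        codes.append(index_of[name])
    return codes, palette


c = ['g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']
codes, palette = encode_colors(c)
print(codes)    # [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
print(palette)  # ['g', 'r']
```

The result can then be passed on as c=codes together with cmap=matplotlib.colors.ListedColormap(palette). With only a handful of distinct codes, the handles from legend_elements() should come out in code order, so palette could serve as the label list, but that is worth checking on your Matplotlib version.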

Related

How to generate proper legends for scatter plot in python

I am trying to prepare a box and scatter plot for 8 data points in python. I use the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = [24.4, 6.7, 19.7, 16.0, 25.1, 19.5, 10, 22.1]
f, ax = plt.subplots()
ax.boxplot(x, vert=False, showmeans=True, showfliers=False)
x0 = np.random.normal(1, 0.05, len(x))
c = ['r', 'b', 'c', 'm', 'y', 'g', 'm', 'k']
lab = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
ax.scatter(x, x0, c=c, s=60, alpha=0.2)
ax.legend(labels=lab, loc="upper left", ncol=8)
It generates an image like the following:
It looks like the legend doesn't have the proper round symbols in different colors, which is what I expected. Besides, the colors of the symbols are pale and washed out.
So how can I generate a proper legend with the correct symbols, and how can I make the colors of the symbols brighter and sharper?
I will deeply appreciate it if anyone can help.
Best regards
To make the colours brighter, just raise the alpha value.
For the legend, the order of plotting matters here: it is better to plot the boxplot after the scatter plots. Also, to give each point its own entry in the legend, each point should be treated as a separate plot, so I used a loop over the values of x, x0 and c. Here's the outcome:
import numpy as np
import matplotlib.pyplot as plt
# init figure
f, ax = plt.subplots()
# values
x = [24.4, 6.7, 19.7, 16.0, 25.1, 19.5, 10, 22.1]
x0 = np.random.normal(1, 0.05, len(x))
# labels and colours
c = ['r', 'b', 'c', 'm', 'y', 'g', 'm', 'k']
lab = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
# put the plots into a list
plots = []
for i in range(len(x)):
    p = ax.scatter(x[i], x0[i], c=c[i], s=60, alpha=0.5) # raised the alpha to get sharper colors
    plots.append(p)
# plot legends
plt.legend(plots,
           labels=lab,
           scatterpoints=1,
           loc='upper left',
           ncol=8,
           fontsize=8)
# plot the box plot (the order here matters!)
ax.boxplot(x, vert=False, showmeans=True, showfliers=False)
# save the desired figure
plt.savefig('tt.png')
Output:
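An alternative that keeps a single scatter call is to hand the legend its own "proxy" handles, one marker per colour. This is only a sketch: Line2D proxy handles are a general Matplotlib technique rather than what the answer above does, the helper name color_legend_handles is invented here, and a fixed height replaces the random jitter so the example is deterministic.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def color_legend_handles(colors, labels, markersize=8, alpha=0.5):
    """Build one round marker handle per (color, label) pair."""
    return [
        Line2D([], [], linestyle='none', marker='o', markersize=markersize,
               color=color, alpha=alpha, label=label)
        for color, label in zip(colors, labels)
    ]


x = [24.4, 6.7, 19.7, 16.0, 25.1, 19.5, 10, 22.1]
x0 = [1.0] * len(x)  # assumption: fixed height instead of random jitter
c = ['r', 'b', 'c', 'm', 'y', 'g', 'm', 'k']
lab = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

f, ax = plt.subplots()
ax.scatter(x, x0, c=c, s=60, alpha=0.5)  # one call for all points
handles = color_legend_handles(c, lab)
legend = ax.legend(handles=handles, loc='upper left', ncol=8, fontsize=8)
```

Because the handles are passed explicitly, the legend no longer depends on the order in which the scatter and the boxplot are drawn. Call plt.show() or plt.savefig(...) afterwards as usual.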

mplcursors interactivity with endpoints of scatterplots

import pandas as pd
import matplotlib.pyplot as plt
import mplcursors
df = pd.DataFrame(
    {'Universe': ['Darvel', 'MC', 'MC', 'Darvel', 'MC', 'Other', 'Darvel'],
     'Value': [10, 11, 13, 12, 9, 7, 10],
     'Upper': [12.5, 11.3, 15.4, 12.2, 13.1, 8.8, 11.5],
     'Lower': [4.5, 9.6, 11.8, 6, 6.5, 5, 8]})
df['UpperError'] = df['Upper'] - df['Value']
df['LowerError'] = df['Value'] - df['Lower']
colors = ['r', 'g', 'b']
fig, ax = plt.subplots()
for i, universe in enumerate(df['Universe'].unique()):
    to_plot = df[df['Universe'] == universe]
    ax.scatter(to_plot.index, to_plot['Value'], s=16, c=colors[i])
    error = to_plot[['LowerError', 'UpperError']].transpose().to_numpy()
    ax.errorbar(to_plot.index, to_plot['Value'], yerr=error, fmt='o',
                markersize=0, capsize=6, color=colors[i])
    ax.scatter(to_plot.index, to_plot['Upper'], c='w', zorder=-1)
    ax.scatter(to_plot.index, to_plot['Lower'], c='w', zorder=-1)
mplcursors.cursor(hover=True)
plt.show()
This does most of what I want, but I want the following changes.
I do not want the mplcursors cursor to interact with the errorbars, but just the scatter plots, including the invisible scatterplots on top and bottom of the errorbars.
I just want the y value to show. For example, the first bar should say "12.5" on the top, "10.0" in the middle, and "4.5" on the bottom.
To have mplcursors only interact with some elements, a list of those elements can be given as the first parameter to mplcursors.cursor(). The list could be built from the return values of the calls to ax.scatter.
To modify the annotation text shown, a custom function can be connected. In the example below, the label and the y-position are extracted from the selected element and put into the annotation text. Such a label can be added via ax.scatter(..., label=...).
(Choosing 'none' as the color for the "invisible" elements makes them really invisible. To make the code more "Pythonic", explicit indices can be avoided by working with zip instead of enumerate.)
import matplotlib.pyplot as plt
import mplcursors
import pandas as pd
def show_annotation(sel):
    text = f'{sel.artist.get_label()}\n y={sel.target[1]:.1f}'
    sel.annotation.set_text(text)
df = pd.DataFrame(
    {'Universe': ['Darvel', 'MC', 'MC', 'Darvel', 'MC', 'Other', 'Darvel'],
     'Value': [10, 11, 13, 12, 9, 7, 10],
     'Upper': [12.5, 11.3, 15.4, 12.2, 13.1, 8.8, 11.5],
     'Lower': [4.5, 9.6, 11.8, 6, 6.5, 5, 8]})
df['UpperError'] = df['Upper'] - df['Value']
df['LowerError'] = df['Value'] - df['Lower']
colors = ['r', 'g', 'b']
fig, ax = plt.subplots()
all_scatters = []
for universe, color in zip(df['Universe'].unique(), colors):
    to_plot = df[df['Universe'] == universe]
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Value'], s=16, c=color, label=universe))
    error = to_plot[['LowerError', 'UpperError']].transpose().to_numpy()
    ax.errorbar(to_plot.index, to_plot['Value'], yerr=error, fmt='o',
                markersize=0, capsize=6, color=color)
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Upper'], c='none', zorder=-1, label=universe))
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Lower'], c='none', zorder=-1, label=universe))
cursor = mplcursors.cursor(all_scatters, hover=True)
cursor.connect('add', show_annotation)
plt.show()
PS: You can also show the 'Universe' via the x ticks:
ax.set_xticks(df.index)
ax.set_xticklabels(df['Universe'])
If you want to, for short functions you could use the lambda notation instead of writing a separate function:
cursor.connect('add',
               lambda sel: sel.annotation.set_text(f'{sel.artist.get_label()}\n y={sel.target[1]:.1f}'))
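If only the y value should appear, as the question asks, the text can be built by a small pure function that is easy to check without opening a figure. A sketch: format_annotation is a hypothetical helper, not part of mplcursors, and show_annotation relies only on the sel attributes already used above.

```python
def format_annotation(y, label=None):
    """Return the annotation text: the y value alone, or prefixed by a label."""
    value = f'{y:.1f}'
    return value if label is None else f'{label}\n y={value}'


def show_annotation(sel):
    # sel is the selection object that mplcursors passes to 'add' callbacks;
    # sel.target[1] is the y coordinate of the selected point.
    sel.annotation.set_text(format_annotation(sel.target[1]))


print(format_annotation(12.5))  # 12.5
print(format_annotation(4.5))   # 4.5
```

It is connected the same way as before, with cursor.connect('add', show_annotation).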

How to add labels to the axes of subplots

I am plotting 8 subplots into a figure as follows:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(8)
label = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data = [0.6, 0.4, 1.3, 0.8, 0.9, 1.0, 1.6, 0.2]
plt.xlim(0,2)
for i in range(8):
    axs[i].get_yaxis().set_visible(False)
    axs[i].get_xaxis().set_visible(False)
    axs[i].set_xlim([0, 2])
    axs[i].axvline(data[i],linestyle='--')
    axs[i].get_yaxis().set_visible(False)
axs[7].get_xaxis().set_visible(True)
plt.show()
This looks like:
In order to label the subplots I would like to write label[i] (see code above) to the left of subplot i. How can you do that?
(As a quick fix), you might just be able to use Axes.text, for example:
axs[i].text(-0.1,0.2,label[i])
Adjust the x and y arguments as needed depending on the length of the labels.
As mentioned in the comments, another (much better) option is to keep the y-axis visible, but then set the ticks to nothing:
axs[i].set_yticks(())
axs[i].set_ylabel(label[i], rotation=0, ha='right', va='center')
As I mentioned in the comments, the proper approach would be to not set the y axis off, and remove the ticks.
The trick is to remove the two lines with axs[i].get_yaxis().set_visible(False) and add the following two lines:
axs[i].tick_params(left=False, labelleft=False)
axs[i].set_ylabel(label[i])
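The two suggestions can also be combined: hide the ticks with tick_params and keep the label horizontal with the rotation and alignment arguments from the first answer. A short sketch, assuming horizontal labels are wanted; looping with zip instead of an index is a stylistic choice and not part of either answer.

```python
import matplotlib.pyplot as plt

label = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data = [0.6, 0.4, 1.3, 0.8, 0.9, 1.0, 1.6, 0.2]

fig, axs = plt.subplots(len(label), sharex=True)
for ax, name, value in zip(axs, label, data):
    ax.set_xlim(0, 2)
    ax.axvline(value, linestyle='--')
    # hide the y ticks and their labels, but keep the axis so it can carry a label
    ax.tick_params(left=False, labelleft=False)
    # horizontal text, right-aligned against the left edge of the subplot
    ax.set_ylabel(name, rotation=0, ha='right', va='center')
```

Call plt.show() afterwards to display the figure.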
Please, consider the following code as a full answer (edited to include bnaecker's suggestion):
import matplotlib.pyplot as plt
plt.close('all')
fig, axs = plt.subplots(8, sharex="col")
label = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
data = [0.6, 0.4, 1.3, 0.8, 0.9, 1.0, 1.6, 0.2]
plt.xlim(0, 2)
for i in range(8):
    axs[i].set_xlim([0, 2])
    axs[i].tick_params(left=False, labelleft=False)
    axs[i].axvline(data[i], linestyle='--')
    axs[i].set_ylabel(label[i])
plt.show()
The figure should look like this:

How to fix the heatmap plotted in python which seems way off from the scatterplot

I want to plot a heatmap that better visualizes the distribution pattern in a scatterplot, but I am having trouble generating it. The data on the y-axis spread from 0 to 15 and on the x-axis from 0 to 7.
I referred to the post below on how to generate a heatmap and wrote the following code, but the resulting heatmap seems quite off from what I would expect given the scatterplot.
Generate a heatmap in MatPlotLib using a scatter data set
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm as CM
x = [0.3178, 2.0857, 2.5922, 0.088, 0.3, 0.4006, 1.0241, 0.1913, 0.56, 1.1828, 2.6879, 5.8044, 0.3593, 1.8732, 10.8003, 0.3457, 1.7003, 0.1677, 0.7442, 1.5731, 0.4927, 0.4143, 0.558, 0.2486, 0.3009, 0.163, 2.645, 4.1364, 13.8043, 3.9997, 0.258, 0.78, 10.3991, 0.2425, 0.3335, 4.8002, 0.3529, 5.9263, 0.151, 0.34, 0.1146, 13.6505, 2.8802, 3.2738, 0.5562, 0.5067, 1.5142, 2.0373, 2.5427, 12.1005]
y = [4.4903, 6.8879, 5.6211, 5.1128, 1.8125, 4.9716, 2.6847, 5.3744, 6.5254, 3.875, 3.6667, 2.0, 6.9811, 6.0501, 6.0, 6.8478, 5.0, 5.3676, 3.403, 6.1015, 6.8793, 4.7684, 3.5934, 2.6224, 5.9319, 1.8191, 3.0554, 3.5207, 3.6786, 3.0, 5.9041, 1.9128, 6.3333, 5.4949, 5.7135, 6.0, 5.5348, 3.0, 5.2644, 5.8111, 1.093, 4.0, 7.0, 6.0, 3.8684, 4.8, 1.5283, 6.6932, 7.0, 4.0]
# plot the scatter_plot
xposition = [0,7]
plt.figure()
plt.plot(y,x,'r^', label='series_1',markersize=12)
plt.gcf().set_size_inches(11.7, 8.27)
ax = plt.gca()
ax.tick_params(axis = 'both', which = 'major', labelsize = 16)
for xc in range(0,xposition[-1]+1):
    ax.axvline(x=xc, color='darkgrey', linestyle='--', linewidth = 2)
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.xlim(xposition)
plt.ylim([0,15])
plt.legend(loc='upper right',fontsize = 'x-large')
# plot the heatmap
plt.figure()
heatmap, xedges, yedges = np.histogram2d(y, x, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.clf()
plt.imshow(heatmap.T, extent=extent, interpolation='nearest', origin='lower')
plt.pcolormesh(xedges, yedges, heatmap, cmap=CM.RdBu_r, vmin=-7, vmax=7)
plt.gcf().set_size_inches(11.7, 8.27)
plt.show()
As for the results: first, the plot size of the heatmap seems to differ from that of the scatterplot, although I specified them to be the same. Second, the heatmap simply does not match the pattern in the scatterplot, where the points gather towards the bottom right. Please advise on what I should change to get the correct heatmap. Thank you.
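The code below seems to fix it. There were three problems:
1. You made the figures the same size, not the axes. I added a set_aspect for the scatter plot to make the aspect ratio equal, same as in the heat map.
2. You drew an imshow and then a pcolormesh on top of it (you don't need both).
3. np.histogram2d returns an array whose first axis follows its first argument (here the values drawn along the horizontal axis), while pcolormesh, like imshow, expects the rows to run along the vertical axis. The heat map therefore has to be transposed before it is passed to pcolormesh, so I transposed it.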
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm as CM
x = [0.3178, 2.0857, 2.5922, 0.088, 0.3, 0.4006, 1.0241, 0.1913, 0.56, 1.1828, 2.6879, 5.8044, 0.3593, 1.8732, 10.8003, 0.3457, 1.7003, 0.1677, 0.7442, 1.5731, 0.4927, 0.4143, 0.558, 0.2486, 0.3009, 0.163, 2.645, 4.1364, 13.8043, 3.9997, 0.258, 0.78, 10.3991, 0.2425, 0.3335, 4.8002, 0.3529, 5.9263, 0.151, 0.34, 0.1146, 13.6505, 2.8802, 3.2738, 0.5562, 0.5067, 1.5142, 2.0373, 2.5427, 12.1005]
y = [4.4903, 6.8879, 5.6211, 5.1128, 1.8125, 4.9716, 2.6847, 5.3744, 6.5254, 3.875, 3.6667, 2.0, 6.9811, 6.0501, 6.0, 6.8478, 5.0, 5.3676, 3.403, 6.1015, 6.8793, 4.7684, 3.5934, 2.6224, 5.9319, 1.8191, 3.0554, 3.5207, 3.6786, 3.0, 5.9041, 1.9128, 6.3333, 5.4949, 5.7135, 6.0, 5.5348, 3.0, 5.2644, 5.8111, 1.093, 4.0, 7.0, 6.0, 3.8684, 4.8, 1.5283, 6.6932, 7.0, 4.0]
# plot the scatter_plot
xposition = [0,7]
plt.figure()
plt.plot(y,x,'r^', label='series_1',markersize=12)
plt.gcf().set_size_inches(11.7, 8.27)
ax = plt.gca()
ax.tick_params(axis = 'both', which = 'major', labelsize = 16)
for xc in range(0,xposition[-1]+1):
    ax.axvline(x=xc, color='darkgrey', linestyle='--', linewidth = 2)
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.xlim(xposition)
plt.ylim([0,15])
plt.legend(loc='upper right',fontsize = 'x-large')
plt.gca().set_aspect('equal')
heatmap, xedges, yedges = np.histogram2d(y, x, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
# plot the heatmap
plt.figure()
#plt.imshow(heatmap.T, extent=extent, interpolation='nearest', origin='lower')
plt.pcolormesh(xedges, yedges, heatmap.transpose(), cmap=CM.RdBu_r, vmin=-7, vmax=7)
plt.gcf().set_size_inches(11.7, 8.27)
plt.gca().set_aspect('equal')
plt.show()
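The axis order of np.histogram2d can be checked on a tiny example. The three points and the bin layout below are made up purely for illustration; unequal bin counts are used on purpose so that the two axes cannot be confused.

```python
import numpy as np

# Three made-up points: small horizontal value, large vertical value.
horizontal = [0.1, 0.2, 0.3]
vertical = [9.1, 9.2, 9.3]

counts, h_edges, v_edges = np.histogram2d(
    horizontal, vertical, bins=[2, 5], range=[[0, 2], [0, 10]])

print(counts.shape)    # (2, 5): the first axis follows the first argument
print(counts[0, 4])    # 3.0: first horizontal bin, last vertical bin
print(counts.T.shape)  # (5, 2): rows now run along the vertical axis
```

With bins=50 on both axes the array is square, so the untransposed version still fits the edges and the mistake only shows up as a mirrored picture.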
Also, why not use subplots instead of two figures, as in the following example? Adding a colorbar may take some extra care, but it's solvable.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm as CM
x = [0.3178, 2.0857, 2.5922, 0.088, 0.3, 0.4006, 1.0241, 0.1913, 0.56, 1.1828, 2.6879, 5.8044, 0.3593, 1.8732, 10.8003, 0.3457, 1.7003, 0.1677, 0.7442, 1.5731, 0.4927, 0.4143, 0.558, 0.2486, 0.3009, 0.163, 2.645, 4.1364, 13.8043, 3.9997, 0.258, 0.78, 10.3991, 0.2425, 0.3335, 4.8002, 0.3529, 5.9263, 0.151, 0.34, 0.1146, 13.6505, 2.8802, 3.2738, 0.5562, 0.5067, 1.5142, 2.0373, 2.5427, 12.1005]
y = [4.4903, 6.8879, 5.6211, 5.1128, 1.8125, 4.9716, 2.6847, 5.3744, 6.5254, 3.875, 3.6667, 2.0, 6.9811, 6.0501, 6.0, 6.8478, 5.0, 5.3676, 3.403, 6.1015, 6.8793, 4.7684, 3.5934, 2.6224, 5.9319, 1.8191, 3.0554, 3.5207, 3.6786, 3.0, 5.9041, 1.9128, 6.3333, 5.4949, 5.7135, 6.0, 5.5348, 3.0, 5.2644, 5.8111, 1.093, 4.0, 7.0, 6.0, 3.8684, 4.8, 1.5283, 6.6932, 7.0, 4.0]
# plot the scatter_plot
xposition = [0,7]
plt.figure()
ax1 = plt.subplot(1,2,1)
plt.plot(y,x,'r^', label='series_1',markersize=12)
plt.gcf().set_size_inches(11.7, 8.27)
ax1.tick_params(axis = 'both', which = 'major', labelsize = 16)
for xc in range(0,xposition[-1]+1):
    ax1.axvline(x=xc, color='darkgrey', linestyle='--', linewidth = 2)
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.xlim(xposition)
plt.ylim([0,15])
plt.legend(loc='upper right',fontsize = 'x-large')
plt.gca().set_aspect('equal')
heatmap, xedges, yedges = np.histogram2d(y, x, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
# plot the heatmap
#plt.figure()
#plt.imshow(heatmap.T, extent=extent, interpolation='nearest', origin='lower')
ax2 = plt.subplot(1,2,2,sharex=ax1,sharey=ax1)
heatmap_copy = heatmap.transpose()
heatmap_copy[heatmap_copy==0] = np.nan
plt.pcolormesh(xedges, yedges, heatmap_copy, cmap=CM.RdBu_r, vmin=-7, vmax=7)
ax2.set_aspect('equal')
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)
plt.ylim([0,3])
ax2.tick_params(axis = 'both', which = 'major', labelsize = 16)
for xc in range(0,xposition[-1]+1):
    ax2.axvline(x=xc, color='darkgrey', linestyle='--', linewidth = 2)
plt.show()
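As for the colorbar, one possible approach is to keep the object returned by pcolormesh and pass it to fig.colorbar together with the axes it belongs to. This is a sketch on made-up random data (not the data from the question), with bin counts and a label text chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data in the same ranges as the question (0..7 and 0..15).
rng = np.random.default_rng(0)
px = rng.uniform(0, 7, 200)
py = rng.uniform(0, 15, 200)
counts, xedges, yedges = np.histogram2d(px, py, bins=[14, 30])

fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True,
                               figsize=(11.7, 8.27))
ax1.plot(px, py, 'r^')
# transpose so that rows run along the vertical axis
mesh = ax2.pcolormesh(xedges, yedges, counts.T, cmap='RdBu_r')
# attach the colorbar to the heatmap axes only
cbar = fig.colorbar(mesh, ax=ax2, label='points per bin')
```

Passing ax=ax2 lets Matplotlib take the space for the colorbar from the heatmap subplot, which makes that panel slightly narrower than the scatter panel.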

How to put labels on plot markers in pyplot (not a scatter plot)

I am plotting a Precision/Recall Curve and want to put specific labels for each marker in the plot.
Here is the code that generates the plot:
from matplotlib import pyplot
pyplot.plot([0, 100], [94, 100], linestyle='--')
pyplot.xlabel("Recall")
pyplot.ylabel("Precision")
list_of_rec = [
99.96,99.96,99.96,99.96,99.96,99.96,99.8,98.25,96.59,93.37,83.74,63.53,48.72,25.05,10.7,4.27,0.73,0.23]
list_of_prec = [
94.12,94.12,94.12,94.12,94.12,94.12,94.42,95.14,95.92,96.57,97.33,98.26,98.72,99.0,99.0,99.17,99.75,99.19]
list_of_markers = [
0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5
]
# plot the precision-recall curve for the model
pyplot.plot(list_of_rec, list_of_prec, marker='*', markersize=8)
pyplot.show()
This gives me the following plot:
I want to label each of the markers (*) in the plot with the corresponding text from list_of_markers. I can't seem to find an option to pass a list of text labels to the plot anywhere; any help is appreciated.
You can annotate each of your markers by looping through them and adding the labels as text annotations (note that the question imports the module as pyplot, not plt):
for x, y, text in zip(list_of_rec, list_of_prec, list_of_markers):
    pyplot.text(x, y, text)
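If the labels end up sitting right on top of the markers, annotate with an offset given in points is one way to nudge them aside. A sketch using a shortened subset of the lists from the question; the offset of (5, 5) points is an arbitrary choice.

```python
from matplotlib import pyplot

# Every few entries of the original lists, to keep the sketch short.
rec = [99.96, 98.25, 83.74, 48.72, 10.7]
prec = [94.12, 95.14, 97.33, 98.72, 99.0]
markers = [0.0, 3.5, 5.0, 6.0, 7.0]

fig, ax = pyplot.subplots()
ax.plot(rec, prec, marker='*', markersize=8)
annotations = [
    # place each label 5 points to the right of and above its marker
    ax.annotate(str(text), (x, y), xytext=(5, 5), textcoords='offset points')
    for x, y, text in zip(rec, prec, markers)
]
```

In the full lists the first six points coincide at (99.96, 94.12), so their labels will be drawn on top of each other whichever method is used.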
