Stop matplotlib repeating labels in legend - python

Here is a very simplified example:
xvalues = [2,3,4,6]
for x in xvalues:
plt.axvline(x,color='b',label='xvalues')
plt.legend()
The legend will now show 'xvalues' as a blue line 4 times in the legend.
Is there a more elegant way of fixing this than the following?
for i,x in enumerate(xvalues):
if not i:
plt.axvline(x,color='b',label='xvalues')
else:
plt.axvline(x,color='b')

plt.legend takes as parameters
A list of axis handles which are Artist objects
A list of labels which are strings
These parameters are both optional defaulting to plt.gca().get_legend_handles_labels().
You can remove duplicate labels by putting them in a dictionary before calling legend. This is because dicts can't have duplicate keys.
For example:
For Python versions < 3.7
from collections import OrderedDict
import matplotlib.pyplot as plt
handles, labels = plt.gca().get_legend_handles_labels()
by_label = OrderedDict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys())
For Python versions > 3.7
As of Python 3.7, dictionaries retain input order by default. Thus, there is no need for OrderedDict form the collections module.
import matplotlib.pyplot as plt
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys())
Docs for plt.legend

handles, labels = ax.get_legend_handles_labels()
handle_list, label_list = [], []
for handle, label in zip(handles, labels):
if label not in label_list:
handle_list.append(handle)
label_list.append(label)
plt.legend(handle_list, label_list)

I don't know if this can be considered "elegant", but you can have your label a variable that gets set to "_nolegend_" after first usage:
my_label = "xvalues"
xvalues = [2,3,4,6]
for x in xvalues:
plt.axvline(x, color='b', label=my_label)
my_label = "_nolegend_"
plt.legend()
This can be generalized using a dictionary of labels if you have to put several labels:
my_labels = {"x1" : "x1values", "x2" : "x2values"}
x1values = [1, 3, 5]
x2values = [2, 4, 6]
for x in x1values:
plt.axvline(x, color='b', label=my_labels["x1"])
my_labels["x1"] = "_nolegend_"
for x in x2values:
plt.axvline(x, color='r', label=my_labels["x2"])
my_labels["x2"] = "_nolegend_"
plt.legend()
(Answer inspired by https://stackoverflow.com/a/19386045/1878788)

Problem - 3D Array
Questions: Nov 2012, Oct 2013
import numpy as np
a = np.random.random((2, 100, 4))
b = np.random.random((2, 100, 4))
c = np.random.random((2, 100, 4))
Solution - dict uniqueness
For my case _nolegend_ (bli and DSM) would not work, nor would label if i==0. ecatmur's answer uses get_legend_handles_labels and reduces the legend down with collections.OrderedDict. Fons demonstrates this is possible without an import.
Inline with these answers, I suggest using dict for unique labels.
# Step-by-step
ax = plt.gca() # Get the axes you need
a = ax.get_legend_handles_labels() # a = [(h1 ... h2) (l1 ... l2)] non unique
b = {l:h for h,l in zip(*a)} # b = {l1:h1, l2:h2} unique
c = [*zip(*b.items())] # c = [(l1 l2) (h1 h2)]
d = c[::-1] # d = [(h1 h2) (l1 l2)]
plt.legend(*d)
Or
plt.legend(*[*zip(*{l:h for h,l in zip(*ax.get_legend_handles_labels())}.items())][::-1])
Maybe less legible and memorable than Matthew Bourque's solution. Code golf welcome.
Example
import numpy as np
a = np.random.random((2, 100, 4))
b = np.random.random((2, 100, 4))
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
ax.plot(*a, 'C0', label='a')
ax.plot(*b, 'C1', label='b')
ax.legend(*[*zip(*{l:h for h,l in zip(*ax.get_legend_handles_labels())}.items())][::-1])
# ax.legend() # Old, ^ New
plt.show()

Based on answer https://stackoverflow.com/a/13589144/9132798 and https://stackoverflow.com/a/19386045/9132798
plt.gca().get_legend_handles_labels()[1] gives a list of names, it is possible to check if the label is already in the list while in the loop plotting (label= name[i] if name[i] not in plt.gca().get_legend_handles_labels()[1] else '').
For the given example this solution would look like:
import matplotlib.pyplot as plt
xvalues = [2,3,4,6]
for x in xvalues:
plt.axvline(x,color='b',\
label= 'xvalues' if 'xvalues' \
not in plt.gca().get_legend_handles_labels()[1] else '')
plt.legend()
Which is much shorter than https://stackoverflow.com/a/13589144/9132798 and more flexible than https://stackoverflow.com/a/19386045/9132798 as it could be use for any kind of loop any plot function in the loop individually.
However, for many cycles it probably slower than https://stackoverflow.com/a/13589144/9132798.

These code snippets didn't work for me personally. I was plotting two different groups in two different colors. The legend would show two red markers and two blue markers, when I only wanted to see one per color. I'll paste a simplified version of what did work for me:
Import statements
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D
Plot data
points_grp, = plt.plot(x[grp_idx], y[grp_idx], color=c.c[1], marker=m, ms=4, lw=0, label=leglab[1])
points_ctrl, = plt.plot(x[ctrl_idx], y[ctrl_idx], color=c.c[0], marker=m, ms=4, lw=0, label=leglab[0])
Add legend
points_dict = {points_grp: HandlerLine2D(numpoints=1),points_ctrl: HandlerLine2D(numpoints=1)}
leg = ax.legend(fontsize=12, loc='upper left', bbox_to_anchor=(1, 1.03),handler_map=points_dict)

Related

redundant legends in python matplot lib

I am plotting two conditions and want only two legends. But there are replicates in my data, and I am getting a separate legend for each replicate. Why? I apologize if this has previously been addressed, but I have spent an embarrassing amount of time on this and much of what I find seems overly complex for my situation. Any help would be appreciated.
import matplotlib.pyplot as plt
import pandas as pd
#####read and organize data
alldata = pd.read_csv('Fig_1.csv')
CondtionA = list(zip(alldata.iloc[:,1],alldata.iloc[:,2]))
ConditionB = list(zip(alldata.iloc[:,7],alldata.iloc[:,8]))
### make the figure
fig, ax = plt.subplots()
plt.plot(alldata['Temperature'],ConditionA,linewidth = 1,c='k', linestyle = '--',label = 'ConditionA')
plt.plot(alldata['Temperature'],ConditionB,linewidth = 1,c='k', label = "ConditonB")
ax.legend(numpoints=1)
plt.show()
a) use returned lines
You should be able to create a legend from only the first item of the returned lines of each plot call.
lines1 = plt.plot(...)
lines2 = plt.plot(...)
plt.legend(handles=(lines1[0], lines2[0]), labels=("Label A", "Label B"))
The drawback here is that you need to name the labels again manually.
b) select every second legend handle/label
If that is undesired, but if in turn you know that you want to use every second handle and label from the originally created legend, you can get those handles and labels via get_legend_handles_labels().
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles[::2], labels[::2])
Reproducible example:
import numpy as np; np.random.seed(10)
import matplotlib.pyplot as plt
x=np.arange(10)
a = np.cumsum(np.cumsum(np.random.randn(10,2), axis=0), axis=1)
b = np.cumsum(np.cumsum(np.random.randn(10,2), axis=0), axis=1)+6
lines1 = plt.plot(x,a, label="Label A", color="k")
lines2 = plt.plot(x,b, label="Label B", color="k", linestyle="--")
# either:
plt.legend(handles=(lines1[0], lines2[0]), labels=("Label A", "Label B"))
# or alternatively:
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles[::2], labels[::2])
plt.show()
If you remove
ax.legend(numpoints=1)
and add
plt.legend(handles=[p1,p2], bbox_to_anchor=(0.75, 1), loc=2, borderaxespad=0.)
You will get only one legend.
So your code will look like
import matplotlib.pyplot as plt
import pandas as pd
#####read and organize data
alldata = pd.read_csv('Fig_1.csv')
CondtionA = list(zip(alldata.iloc[:,1],alldata.iloc[:,2]))
ConditionB = list(zip(alldata.iloc[:,7],alldata.iloc[:,8]))
### make the figure
fig, ax = plt.subplots()
p1 = plt.plot(alldata['Temperature'],ConditionA,linewidth = 1,c='k', linestyle = '--',label = 'ConditionA')
p2 = plt.plot(alldata['Temperature'],ConditionB,linewidth = 1,c='k', label = "ConditonB")
#ax.legend(numpoints=1)
plt.legend(handles=[p1,p2], bbox_to_anchor=(0.75, 1), loc=2, borderaxespad=0.)
plt.show()

Matplotlib: Automatic coloured legend for all subplots using subplot line labels

The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurance of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
bbox_inches='tight')
You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
if y[ii] not in handles:
handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.

Python, matplotlib - legend based on a conditional variable -

I have a data, which can be used in scatter plot.
I also have labels for the same data. So I am using conditional coloring:
# import needed things
fig = plt.figure()
r = fig.add_subplot(121)
r.scatter(np.arange(500), X[ :500, 0] c = Y[:500]
# x and y labels set here
g = fig.add_subplot(122)
g.scatter(np.arange(500), X[ :500, 1] c = Y[:500]
# x and y labels set here
plt.show()
I need to have a legend as well, suggesting which type has which color. I tried this:
plt.legend((r, g), ("one", "zero"), scatterpoints = 1, loc = "upper left")
but I get a warning
.../site-packages/matplotlib/legend.py:633: UserWarning: Legend does not support <matplotlib.axes._subplots.AxesSubplot object at 0x7fe37f460668> instances.
A proxy artist may be used instead.
and legend is not displayed.
I was able to run your code by substituting
r.scatter(np.arange(500), np.arange(500), c= np.arange(500))
g.scatter(np.arange(500), np.arange(500), c= np.arange(500))
I got a similar error that points me to a page on matplotlib.org, see below:
/Users/sdurant/anaconda/lib/python2.7/site-packages/matplotlib/legend.py:611: UserWarning: Legend does not support instances.
A proxy artist may be used instead.
See: http://matplotlib.org/users/legend_guide.html#using-proxy-artist
"#using-proxy-artist".format(orig_handle))
I don't understand exactly what you want your legend to look like, but that page has examples for a few types, hope this helps.
Edit: That makes more sense, is this roughly what you are looking for then?
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
red_patch = mpatches.Patch(color='red', label='one')
blue_patch = mpatches.Patch(color='blue', label='zero')
area = np.pi *30
fig = plt.figure()
r = fig.add_subplot(121)
r.scatter(np.arange(10), np.arange(10), c= [random.randint(2) for k in range(10)], s=area)
# x and y labels set here
plt.legend(handles=[red_patch,blue_patch])
g = fig.add_subplot(122)
g.scatter(np.arange(10), np.arange(10), c= [random.randint(2) for k in range(10)], s=area)
# x and y labels set here
plt.legend(handles=[red_patch,blue_patch])

Discrete colorbar in matplotlib [duplicate]

How does one set the color of a line in matplotlib with scalar values provided at run time using a colormap (say jet)? I tried a couple of different approaches here and I think I'm stumped. values[] is a storted array of scalars. curves are a set of 1-d arrays, and labels are an array of text strings. Each of the arrays have the same length.
fig = plt.figure()
ax = fig.add_subplot(111)
jet = colors.Colormap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
lines = []
for idx in range(len(curves)):
line = curves[idx]
colorVal = scalarMap.to_rgba(values[idx])
retLine, = ax.plot(line, color=colorVal)
#retLine.set_color()
lines.append(retLine)
ax.legend(lines, labels, loc='upper right')
ax.grid()
plt.show()
The error you are receiving is due to how you define jet. You are creating the base class Colormap with the name 'jet', but this is very different from getting the default definition of the 'jet' colormap. This base class should never be created directly, and only the subclasses should be instantiated.
What you've found with your example is a buggy behavior in Matplotlib. There should be a clearer error message generated when this code is run.
This is an updated version of your example:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
# define some random data that emulates your indeded code:
NCURVES = 10
np.random.seed(101)
curves = [np.random.random(20) for i in range(NCURVES)]
values = range(NCURVES)
fig = plt.figure()
ax = fig.add_subplot(111)
# replace the next line
#jet = colors.Colormap('jet')
# with
jet = cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
print scalarMap.get_clim()
lines = []
for idx in range(len(curves)):
line = curves[idx]
colorVal = scalarMap.to_rgba(values[idx])
colorText = (
'color: (%4.2f,%4.2f,%4.2f)'%(colorVal[0],colorVal[1],colorVal[2])
)
retLine, = ax.plot(line,
color=colorVal,
label=colorText)
lines.append(retLine)
#added this to get the legend to work
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
plt.show()
Resulting in:
Using a ScalarMappable is an improvement over the approach presented in my related answer:
creating over 20 unique legend colors using matplotlib
I thought it would be beneficial to include what I consider to be a more simple method using numpy's linspace coupled with matplotlib's cm-type object. It's possible that the above solution is for an older version. I am using the python 3.4.3, matplotlib 1.4.3, and numpy 1.9.3., and my solution is as follows.
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import linspace
start = 0.0
stop = 1.0
number_of_lines= 1000
cm_subsection = linspace(start, stop, number_of_lines)
colors = [ cm.jet(x) for x in cm_subsection ]
for i, color in enumerate(colors):
plt.axhline(i, color=color)
plt.ylabel('Line Number')
plt.show()
This results in 1000 uniquely-colored lines that span the entire cm.jet colormap as pictured below. If you run this script you'll find that you can zoom in on the individual lines.
Now say I want my 1000 line colors to just span the greenish portion between lines 400 to 600. I simply change my start and stop values to 0.4 and 0.6 and this results in using only 20% of the cm.jet color map between 0.4 and 0.6.
So in a one line summary you can create a list of rgba colors from a matplotlib.cm colormap accordingly:
colors = [ cm.jet(x) for x in linspace(start, stop, number_of_lines) ]
In this case I use the commonly invoked map named jet but you can find the complete list of colormaps available in your matplotlib version by invoking:
>>> from matplotlib import cm
>>> dir(cm)
A combination of line styles, markers, and qualitative colors from matplotlib:
import itertools
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 8*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, colormap)):
plt.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=4);
UPDATE: Supporting not only ListedColormap, but also LinearSegmentedColormap
import itertools
import matplotlib.pyplot as plt
Ncolors = 8
#colormap = plt.cm.Dark2# ListedColormap
colormap = plt.cm.viridis# LinearSegmentedColormap
Ncolors = min(colormap.N,Ncolors)
mapcolors = [colormap(int(x*colormap.N/Ncolors)) for x in range(Ncolors)]
N = Ncolors*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
fig,ax = plt.subplots(gridspec_kw=dict(right=0.6))
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, mapcolors)):
ax.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3,prop={'size': 8})
U may do as I have written from my deleted account (ban for new posts :( there was). Its rather simple and nice looking.
Im using 3-rd one of these 3 ones usually, also I wasny checking 1 and 2 version.
from matplotlib.pyplot import cm
import numpy as np
#variable n should be number of curves to plot (I skipped this earlier thinking that it is obvious when looking at picture - sorry my bad mistake xD): n=len(array_of_curves_to_plot)
#version 1:
color=cm.rainbow(np.linspace(0,1,n))
for i,c in zip(range(n),color):
ax1.plot(x, y,c=c)
#or version 2: - faster and better:
color=iter(cm.rainbow(np.linspace(0,1,n)))
c=next(color)
plt.plot(x,y,c=c)
#or version 3:
color=iter(cm.rainbow(np.linspace(0,1,n)))
for i in range(n):
c=next(color)
ax1.plot(x, y,c=c)
example of 3:
Ship RAO of Roll vs Ikeda damping in function of Roll amplitude A44

matplotlib: Group boxplots

Is there a way to group boxplots in matplotlib?
Assume we have three groups "A", "B", and "C" and for each we want to create a boxplot for both "apples" and "oranges". If a grouping is not possible directly, we can create all six combinations and place them linearly side by side. What would be to simplest way to visualize the groupings? I'm trying to avoid setting the tick labels to something like "A + apples" since my scenario involves much longer names than "A".
How about using colors to differentiate between "apples" and "oranges" and spacing to separate "A", "B" and "C"?
Something like this:
from pylab import plot, show, savefig, xlim, figure, \
hold, ylim, legend, boxplot, setp, axes
# function for setting the colors of the box plots pairs
def setBoxColors(bp):
setp(bp['boxes'][0], color='blue')
setp(bp['caps'][0], color='blue')
setp(bp['caps'][1], color='blue')
setp(bp['whiskers'][0], color='blue')
setp(bp['whiskers'][1], color='blue')
setp(bp['fliers'][0], color='blue')
setp(bp['fliers'][1], color='blue')
setp(bp['medians'][0], color='blue')
setp(bp['boxes'][1], color='red')
setp(bp['caps'][2], color='red')
setp(bp['caps'][3], color='red')
setp(bp['whiskers'][2], color='red')
setp(bp['whiskers'][3], color='red')
setp(bp['fliers'][2], color='red')
setp(bp['fliers'][3], color='red')
setp(bp['medians'][1], color='red')
# Some fake data to plot
A= [[1, 2, 5,], [7, 2]]
B = [[5, 7, 2, 2, 5], [7, 2, 5]]
C = [[3,2,5,7], [6, 7, 3]]
fig = figure()
ax = axes()
hold(True)
# first boxplot pair
bp = boxplot(A, positions = [1, 2], widths = 0.6)
setBoxColors(bp)
# second boxplot pair
bp = boxplot(B, positions = [4, 5], widths = 0.6)
setBoxColors(bp)
# thrid boxplot pair
bp = boxplot(C, positions = [7, 8], widths = 0.6)
setBoxColors(bp)
# set axes limits and labels
xlim(0,9)
ylim(0,9)
ax.set_xticklabels(['A', 'B', 'C'])
ax.set_xticks([1.5, 4.5, 7.5])
# draw temporary red and blue lines and use them to create a legend
hB, = plot([1,1],'b-')
hR, = plot([1,1],'r-')
legend((hB, hR),('Apples', 'Oranges'))
hB.set_visible(False)
hR.set_visible(False)
savefig('boxcompare.png')
show()
Here is my version. It stores data based on categories.
import matplotlib.pyplot as plt
import numpy as np
data_a = [[1,2,5], [5,7,2,2,5], [7,2,5]]
data_b = [[6,4,2], [1,2,5,3,2], [2,3,5,1]]
ticks = ['A', 'B', 'C']
def set_box_color(bp, color):
plt.setp(bp['boxes'], color=color)
plt.setp(bp['whiskers'], color=color)
plt.setp(bp['caps'], color=color)
plt.setp(bp['medians'], color=color)
plt.figure()
bpl = plt.boxplot(data_a, positions=np.array(xrange(len(data_a)))*2.0-0.4, sym='', widths=0.6)
bpr = plt.boxplot(data_b, positions=np.array(xrange(len(data_b)))*2.0+0.4, sym='', widths=0.6)
set_box_color(bpl, '#D7191C') # colors are from http://colorbrewer2.org/
set_box_color(bpr, '#2C7BB6')
# draw temporary red and blue lines and use them to create a legend
plt.plot([], c='#D7191C', label='Apples')
plt.plot([], c='#2C7BB6', label='Oranges')
plt.legend()
plt.xticks(xrange(0, len(ticks) * 2, 2), ticks)
plt.xlim(-2, len(ticks)*2)
plt.ylim(0, 8)
plt.tight_layout()
plt.savefig('boxcompare.png')
I am short of reputation so I cannot post an image to here.
You can run it and see the result. Basically it's very similar to what Molly did.
Note that, depending on the version of python you are using, you may need to replace xrange with range
A simple way would be to use pandas.
I adapted an example from the plotting documentation:
In [1]: import pandas as pd, numpy as np
In [2]: df = pd.DataFrame(np.random.rand(12,2), columns=['Apples', 'Oranges'] )
In [3]: df['Categories'] = pd.Series(list('AAAABBBBCCCC'))
In [4]: pd.options.display.mpl_style = 'default'
In [5]: df.boxplot(by='Categories')
Out[5]:
array([<matplotlib.axes.AxesSubplot object at 0x51a5190>,
<matplotlib.axes.AxesSubplot object at 0x53fddd0>], dtype=object)
Mock data:
df = pd.DataFrame({'Group':['A','A','A','B','C','B','B','C','A','C'],\
'Apple':np.random.rand(10),'Orange':np.random.rand(10)})
df = df[['Group','Apple','Orange']]
Group Apple Orange
0 A 0.465636 0.537723
1 A 0.560537 0.727238
2 A 0.268154 0.648927
3 B 0.722644 0.115550
4 C 0.586346 0.042896
5 B 0.562881 0.369686
6 B 0.395236 0.672477
7 C 0.577949 0.358801
8 A 0.764069 0.642724
9 C 0.731076 0.302369
You can use the Seaborn library for these plots. First melt the dataframe to format data and then create the boxplot of your choice.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dd=pd.melt(df,id_vars=['Group'],value_vars=['Apple','Orange'],var_name='fruits')
sns.boxplot(x='Group',y='value',data=dd,hue='fruits')
The accepted answer uses pylab and works for 2 groups. What if we have more?
Here is the flexible generic solution with matplotlib
import matplotlib.pyplot as pl
# there are 4 individuals, each one tested under 3 different settings
# --- Random data, e.g. results per algorithm:
# Invidual 1
d1_1 = [1,1,2,2,3,3]
d1_2 = [3,3,4,4,5,5]
d1_3 = [5,5,6,6,7,7]
# Individual 2
d2_1 = [7,7,8,8,9,9]
d2_2 = [9,9,10,10,11,11]
d2_3 = [11,11,12,12,13,13]
# Individual 3
d3_1 = [1,2,3,4,5,6]
d3_2 = [4,5,6,7,8,9]
d3_3 = [10,11,12,13,14,15]
# Individual 4
d4_1 = [1,1,2,2,3,3]
d4_2 = [9,9,10,10,11,11]
d4_3 = [10,11,12,13,14,15]
# --- Combining your data:
data_group1 = [d1_1, d1_2, d1_3]
data_group2 = [d2_1, d2_2, d2_3]
data_group3 = [d3_1, d3_2, d3_3]
data_group4 = [d4_1, d4_2, d4_3]
colors = ['pink', 'lightblue', 'lightgreen', 'violet']
# we compare the performances of the 4 individuals within the same set of 3 settings
data_groups = [data_group1, data_group2, data_group3, data_group4]
# --- Labels for your data:
labels_list = ['a','b', 'c']
width = 1/len(labels_list)
xlocations = [ x*((1+ len(data_groups))*width) for x in range(len(data_group1)) ]
symbol = 'r+'
ymin = min ( [ val for dg in data_groups for data in dg for val in data ] )
ymax = max ( [ val for dg in data_groups for data in dg for val in data ])
ax = pl.gca()
ax.set_ylim(ymin,ymax)
ax.grid(True, linestyle='dotted')
ax.set_axisbelow(True)
pl.xlabel('X axis label')
pl.ylabel('Y axis label')
pl.title('title')
space = len(data_groups)/2
offset = len(data_groups)/2
# --- Offset the positions per group:
group_positions = []
for num, dg in enumerate(data_groups):
_off = (0 - space + (0.5+num))
print(_off)
group_positions.append([x+_off*(width+0.01) for x in xlocations])
for dg, pos, c in zip(data_groups, group_positions, colors):
boxes = ax.boxplot(dg,
sym=symbol,
labels=['']*len(labels_list),
# labels=labels_list,
positions=pos,
widths=width,
boxprops=dict(facecolor=c),
# capprops=dict(color=c),
# whiskerprops=dict(color=c),
# flierprops=dict(color=c, markeredgecolor=c),
medianprops=dict(color='grey'),
# notch=False,
# vert=True,
# whis=1.5,
# bootstrap=None,
# usermedians=None,
# conf_intervals=None,
patch_artist=True,
)
ax.set_xticks( xlocations )
ax.set_xticklabels( labels_list, rotation=0 )
pl.show()
Just to add to the conversation, I have found a more elegant way to change the color of the box plot by iterating over the dictionary of the object itself
import numpy as np
import matplotlib.pyplot as plt
def color_box(bp, color):
# Define the elements to color. You can also add medians, fliers and means
elements = ['boxes','caps','whiskers']
# Iterate over each of the elements changing the color
for elem in elements:
[plt.setp(bp[elem][idx], color=color) for idx in xrange(len(bp[elem]))]
return
a = np.random.uniform(0,10,[100,5])
bp = plt.boxplot(a)
color_box(bp, 'red')
Cheers!
Here's a function I wrote that takes Molly's code and some other code I've found on the internet to make slightly fancier grouped boxplots:
import numpy as np
import matplotlib.pyplot as plt
def custom_legend(colors, labels, linestyles=None):
""" Creates a list of matplotlib Patch objects that can be passed to the legend(...) function to create a custom
legend.
:param colors: A list of colors, one for each entry in the legend. You can also include a linestyle, for example: 'k--'
:param labels: A list of labels, one for each entry in the legend.
"""
if linestyles is not None:
assert len(linestyles) == len(colors), "Length of linestyles must match length of colors."
h = list()
for k,(c,l) in enumerate(zip(colors, labels)):
clr = c
ls = 'solid'
if linestyles is not None:
ls = linestyles[k]
patch = patches.Patch(color=clr, label=l, linestyle=ls)
h.append(patch)
return h
def grouped_boxplot(data, group_names=None, subgroup_names=None, ax=None, subgroup_colors=None,
box_width=0.6, box_spacing=1.0):
""" Draws a grouped boxplot. The data should be organized in a hierarchy, where there are multiple
subgroups for each main group.
:param data: A dictionary of length equal to the number of the groups. The key should be the
group name, the value should be a list of arrays. The length of the list should be
equal to the number of subgroups.
:param group_names: (Optional) The group names, should be the same as data.keys(), but can be ordered.
:param subgroup_names: (Optional) Names of the subgroups.
:param subgroup_colors: A list specifying the plot color for each subgroup.
:param ax: (Optional) The axis to plot on.
"""
if group_names is None:
group_names = data.keys()
if ax is None:
ax = plt.gca()
plt.sca(ax)
nsubgroups = np.array([len(v) for v in data.values()])
assert len(np.unique(nsubgroups)) == 1, "Number of subgroups for each property differ!"
nsubgroups = nsubgroups[0]
if subgroup_colors is None:
subgroup_colors = list()
for k in range(nsubgroups):
subgroup_colors.append(np.random.rand(3))
else:
assert len(subgroup_colors) == nsubgroups, "subgroup_colors length must match number of subgroups (%d)" % nsubgroups
def _decorate_box(_bp, _d):
plt.setp(_bp['boxes'], lw=0, color='k')
plt.setp(_bp['whiskers'], lw=3.0, color='k')
# fill in each box with a color
assert len(_bp['boxes']) == nsubgroups
for _k,_box in enumerate(_bp['boxes']):
_boxX = list()
_boxY = list()
for _j in range(5):
_boxX.append(_box.get_xdata()[_j])
_boxY.append(_box.get_ydata()[_j])
_boxCoords = zip(_boxX, _boxY)
_boxPolygon = plt.Polygon(_boxCoords, facecolor=subgroup_colors[_k])
ax.add_patch(_boxPolygon)
# draw a black line for the median
for _k,_med in enumerate(_bp['medians']):
_medianX = list()
_medianY = list()
for _j in range(2):
_medianX.append(_med.get_xdata()[_j])
_medianY.append(_med.get_ydata()[_j])
plt.plot(_medianX, _medianY, 'k', linewidth=3.0)
# draw a black asterisk for the mean
plt.plot([np.mean(_med.get_xdata())], [np.mean(_d[_k])], color='w', marker='*',
markeredgecolor='k', markersize=12)
cpos = 1
label_pos = list()
for k in group_names:
d = data[k]
nsubgroups = len(d)
pos = np.arange(nsubgroups) + cpos
label_pos.append(pos.mean())
bp = plt.boxplot(d, positions=pos, widths=box_width)
_decorate_box(bp, d)
cpos += nsubgroups + box_spacing
plt.xlim(0, cpos-1)
plt.xticks(label_pos, group_names)
if subgroup_names is not None:
leg = custom_legend(subgroup_colors, subgroup_names)
plt.legend(handles=leg)
You can use the function(s) like this:
data = { 'A':[np.random.randn(100), np.random.randn(100) + 5],
'B':[np.random.randn(100)+1, np.random.randn(100) + 9],
'C':[np.random.randn(100)-3, np.random.randn(100) -5]
}
grouped_boxplot(data, group_names=['A', 'B', 'C'], subgroup_names=['Apples', 'Oranges'], subgroup_colors=['#D02D2E', '#D67700'])
plt.show()
Grouped boxplots, towards subtle academic publication styling... (source)
(Left) Python 2.7.12 Matplotlib v1.5.3. (Right) Python 3.7.3. Matplotlib v3.1.0.
Code:
import numpy as np
import matplotlib.pyplot as plt
# --- Your data, e.g. results per algorithm:
data1 = [5,5,4,3,3,5]
data2 = [6,6,4,6,8,5]
data3 = [7,8,4,5,8,2]
data4 = [6,9,3,6,8,4]
# --- Combining your data:
data_group1 = [data1, data2]
data_group2 = [data3, data4]
# --- Labels for your data:
labels_list = ['a','b']
xlocations = range(len(data_group1))
width = 0.3
symbol = 'r+'
ymin = 0
ymax = 10
ax = plt.gca()
ax.set_ylim(ymin,ymax)
ax.set_xticklabels( labels_list, rotation=0 )
ax.grid(True, linestyle='dotted')
ax.set_axisbelow(True)
ax.set_xticks(xlocations)
plt.xlabel('X axis label')
plt.ylabel('Y axis label')
plt.title('title')
# --- Offset the positions per group:
positions_group1 = [x-(width+0.01) for x in xlocations]
positions_group2 = xlocations
plt.boxplot(data_group1,
sym=symbol,
labels=['']*len(labels_list),
positions=positions_group1,
widths=width,
# notch=False,
# vert=True,
# whis=1.5,
# bootstrap=None,
# usermedians=None,
# conf_intervals=None,
# patch_artist=False,
)
plt.boxplot(data_group2,
labels=labels_list,
sym=symbol,
positions=positions_group2,
widths=width,
# notch=False,
# vert=True,
# whis=1.5,
# bootstrap=None,
# usermedians=None,
# conf_intervals=None,
# patch_artist=False,
)
plt.savefig('boxplot_grouped.png')
plt.savefig('boxplot_grouped.pdf') # when publishing, use high quality PDFs
#plt.show() # uncomment to show the plot.
I used the code given by Kuzeko and it worked well, but I found that the boxes in each group were being drawn in the reverse order. I changed ...x-_off... to ...x+_off... in the following line (just above the last for loop) which fixes it for me:
group_positions.append([x+_off*(width+0.01) for x in xlocations])
A boxplot above was modified to obtain group boxplots with 3 data types.
import matplotlib.pyplot as plt
import numpy as np
ord = [[16.9423,
4.0410,
19.1185],
[18.5134,
17.8048,
19.2669],
[18.7286,
18.0576,
19.1717],
[18.8998,
18.8469,
19.0005],
[18.8126,
18.7870,
18.8393],
[18.7770,
18.7511,
18.8022],
[18.7409,
18.7075,
18.7747],
[18.6866,
18.6624,
18.7093
],
[18.6748],
[18.9069,
18.6752,
19.0769],
[19.0012,
18.9783,
19.0202
],
[18.9448,
18.9134,
18.9813],
[19.1242,
18.8256,
19.3185],
[19.2118,
19.1661,
19.2580],
[19.2505,
19.1231,
19.3526]]
seq = [[17.8092,
4.0410,
19.6653],
[18.7266,
18.2556,
19.3739],
[18.6051,
18.0589,
19.0557],
[18.6467,
18.5629,
18.7566],
[18.5307,
18.4999,
18.5684],
[18.4732,
18.4484,
18.4985],
[18.5234,
18.5027,
18.4797,
18.4573],
[18.3987,
18.3636,
18.4544],
[18.3593],
[18.7234,
18.7092,
18.7598],
[18.7438,
18.7224,
18.7677],
[18.7304,
18.7111,
18.6880,
18.6913,
18.6678],
[18.8926,
18.5902,
19.2003],
[19.1059,
19.0835,
19.0601,
19.0373,
19.0147],
[19.1925,
19.0177,
19.2588]]
apd=[[17.0331,
4.0410,
18.5670],
[17.6124,
17.1975,
18.0755],
[17.3956,
17.1572,
17.9140],
[17.8295,
17.6514,
18.1466],
[18.0665,
17.9144,
18.2157],
[18.1518,
18.0382,
18.2722],
[18.1975,
18.0956,
18.2987],
[18.2219,
18.1293,
18.3062],
[18.2870,
18.2215,
18.3513],
[18.3047,
18.2363,
18.3950],
[18.3580,
18.2923,
18.4205],
[18.3830,
18.3250,
18.4381],
[18.4135,
18.3645,
18.4753],
[18.4580,
18.4095,
18.5170],
[18.4900,
18.4430,
18.5435]
]
ticks = [120,
240,
360,
516,
662,
740,
874,
1022,
1081,
1201,
1320,
1451,
1562,
1680,
1863]
def set_box_color(bp, color):
plt.setp(bp['boxes'], color=color)
plt.setp(bp['whiskers'], color=color)
plt.setp(bp['caps'], color=color)
plt.setp(bp['medians'], color=color)
plt.figure()
bpl = plt.boxplot(ord, positions=np.array(range(len(ord)))*3.0-0.3, sym='', widths=0.6)
bpr = plt.boxplot(seq, positions=np.array(range(len(seq)))*3.0+0.3, sym='', widths=0.6)
bpg = plt.boxplot(apd, positions=np.array(range(len(apd)))*3.0+0.9, sym='', widths=0.6)
set_box_color(bpl, '#D7191C') # colors are from http://colorbrewer2.org/
set_box_color(bpr, '#2C7BB6')
set_box_color(bpg, '#99d8c9')
# draw temporary red and blue lines and use them to create a legend
plt.plot([], c='#D7191C', label='ORD')
plt.plot([], c='#2C7BB6', label='SEQ')
plt.plot([], c='#99d8c9', label='APD')
plt.legend()
plt.xticks(range(0, len(ticks) * 3, 3), ticks)
plt.xlim(-2, len(ticks)*3)
plt.ylim(0, 20)
plt.tight_layout()
plt.show()
plt.savefig('boxcompare.png')

Categories

Resources