Redundant legends in Python matplotlib - python

I am plotting two conditions and want only two legend entries. But there are replicates in my data, and I am getting a separate legend entry for each replicate. Why? I apologize if this has previously been addressed, but I have spent an embarrassing amount of time on this and much of what I find seems overly complex for my situation. Any help would be appreciated.
import matplotlib.pyplot as plt
import pandas as pd
#####read and organize data
alldata = pd.read_csv('Fig_1.csv')
ConditionA = list(zip(alldata.iloc[:,1],alldata.iloc[:,2]))
ConditionB = list(zip(alldata.iloc[:,7],alldata.iloc[:,8]))
### make the figure
fig, ax = plt.subplots()
plt.plot(alldata['Temperature'],ConditionA,linewidth = 1,c='k', linestyle = '--',label = 'ConditionA')
plt.plot(alldata['Temperature'],ConditionB,linewidth = 1,c='k', label = "ConditionB")
ax.legend(numpoints=1)
plt.show()

a) use returned lines
You should be able to create a legend from only the first item of the returned lines of each plot call.
lines1 = plt.plot(...)
lines2 = plt.plot(...)
plt.legend(handles=(lines1[0], lines2[0]), labels=("Label A", "Label B"))
The drawback here is that you need to name the labels again manually.
b) select every second legend handle/label
If that is undesired, and you know that you want every second handle and label from the originally created legend, you can get those handles and labels via get_legend_handles_labels().
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles[::2], labels[::2])
Reproducible example:
import numpy as np; np.random.seed(10)
import matplotlib.pyplot as plt
x=np.arange(10)
a = np.cumsum(np.cumsum(np.random.randn(10,2), axis=0), axis=1)
b = np.cumsum(np.cumsum(np.random.randn(10,2), axis=0), axis=1)+6
lines1 = plt.plot(x,a, label="Label A", color="k")
lines2 = plt.plot(x,b, label="Label B", color="k", linestyle="--")
# either:
plt.legend(handles=(lines1[0], lines2[0]), labels=("Label A", "Label B"))
# or alternatively:
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles[::2], labels[::2])
plt.show()

If you remove
ax.legend(numpoints=1)
and add
plt.legend(handles=[p1,p2], bbox_to_anchor=(0.75, 1), loc=2, borderaxespad=0.)
where p1 and p2 are the first line returned by each plot call, you will get only one legend entry per condition.
So your code will look like this:
import matplotlib.pyplot as plt
import pandas as pd
#####read and organize data
alldata = pd.read_csv('Fig_1.csv')
ConditionA = list(zip(alldata.iloc[:,1],alldata.iloc[:,2]))
ConditionB = list(zip(alldata.iloc[:,7],alldata.iloc[:,8]))
### make the figure
fig, ax = plt.subplots()
# plt.plot returns one line per replicate; keep only the first as the legend handle
p1 = plt.plot(alldata['Temperature'],ConditionA,linewidth = 1,c='k', linestyle = '--',label = 'ConditionA')[0]
p2 = plt.plot(alldata['Temperature'],ConditionB,linewidth = 1,c='k', label = "ConditionB")[0]
#ax.legend(numpoints=1)
plt.legend(handles=[p1,p2], bbox_to_anchor=(0.75, 1), loc=2, borderaxespad=0.)
plt.show()
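Since Fig_1.csv is not available, here is a minimal self-contained sketch of the same idea. The column names and the synthetic data are assumptions made up for illustration; only the pattern (keep the first returned line of each plot call as the legend handle) comes from the answers above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for Fig_1.csv: one temperature column and
# two replicate columns per condition.
rng = np.random.default_rng(0)
temperature = np.linspace(20, 90, 30)
alldata = pd.DataFrame({
    "Temperature": temperature,
    "A_rep1": 0.5 * temperature + rng.normal(size=30),
    "A_rep2": 0.5 * temperature + rng.normal(size=30),
    "B_rep1": 0.8 * temperature + rng.normal(size=30),
    "B_rep2": 0.8 * temperature + rng.normal(size=30),
})

fig, ax = plt.subplots()
x = alldata["Temperature"].to_numpy()
# A 2D y array produces one Line2D per column, and each one carries the label.
lines_a = ax.plot(x, alldata[["A_rep1", "A_rep2"]].to_numpy(),
                  linewidth=1, c="k", linestyle="--", label="ConditionA")
lines_b = ax.plot(x, alldata[["B_rep1", "B_rep2"]].to_numpy(),
                  linewidth=1, c="k", label="ConditionB")

# Pass only the first line of each call as a handle: one entry per condition.
legend = ax.legend(handles=[lines_a[0], lines_b[0]])
```

To view the figure interactively, remove the matplotlib.use("Agg") line and call plt.show() at the end.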

Related

Plotting Errorbars from different DataFrame into SubPlots with matplotlib

I just stumbled upon a problem I simply cannot solve. I have a dataset with raw data, which I will upload here: https://file.io/oJqkZjAGyqV1
It's an Excel file with the data inside.
I then wrote some code to open it, read it, and compute the mean and SEM of my data, as shown below.
# Import required packages
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pylab import cm
df = pd.read_excel("Chlorophyll_data_mod.xlsx")
#----Calculation of meanvalues and sem from raw_data---------
meandf2 = df.set_index(["Group"])
sets = []
for x in ["A","B","AB","xc"]:
    meandf3 = meandf2.filter(like=f"Chl_{x}_").reset_index()
    sets.append(meandf3)
#---------Grouping DataFrame----------#
means = []
ster = []
for x in range(len(sets)):
    meandf = sets[x].groupby(["Group"]).mean()
    meandf = meandf.reset_index()
    means.append(meandf)
    sems = sets[x].groupby("Group").sem()
    sems = sems.reset_index()
    ster.append(sems)
#----Selecting Dataframe from List-----#
plotdf = means[0]
ploter = ster[0]
plotgroup = plotdf.iloc[:,[0,]]
plotdata = plotdf.iloc[:,[1,]]
grouparray = plotgroup.to_numpy()
dataarray = plotdata.to_numpy()
#-----CreatePlot------#
fig, ax = plt.subplots(nrows=3, ncols=1, sharex="all", figsize=(10,8))
plotdf.plot(ax=ax[0,],x="Group",y="Chl_A_0D", kind="bar", legend=False, color="black")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_0D"],yerr=ploter["Chl_A_0D"])
plotdf.plot(ax=ax[1,],x="Group",y="Chl_A_10DaT", kind="bar", legend=False, color="blue")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_10DaT"],yerr=ploter["Chl_A_10DaT"])
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_7DaR"],yerr=ploter["Chl_A_7DaR"])
#----Legend of the Plot-----#
fig.legend(loc="lower center", bbox_to_anchor=(0.5,0), fancybox=True, ncol=6)
#----Layout------#
plt.tight_layout(rect=[0, 0.02, 1,1])
plt.show()
With this I manage to create a figure with three subplots, each showing one of the data columns I am interested in. However, I struggle with the error bars.
My approach was to calculate the SEM and store it in a new DataFrame, and then read it from there for yerr. However, this doesn't work.
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta", yerr=ploter["Chl_A_7DaR"])
This results in an array error because of the structure.
My current approach, as in the main code above, only draws the error bars in the last subplot, not in each individual plot.
Could someone help me understand this function?
Best regards
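One possible direction, sketched under assumptions: plt.errorbar always draws into the current Axes (here the last subplot created), so calling the drawing method on each Axes object explicitly puts the error bars where they belong. The group names and numbers below are invented stand-ins for means[0] and ster[0]; the Excel file itself is not used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical means and standard errors, shaped like means[0] and ster[0].
groups = ["ctrl", "low", "high"]
plotdf = pd.DataFrame({"Group": groups,
                       "Chl_A_0D": [1.0, 1.2, 0.9],
                       "Chl_A_10DaT": [0.8, 1.1, 0.7],
                       "Chl_A_7DaR": [0.9, 1.3, 0.8]})
ploter = pd.DataFrame({"Group": groups,
                       "Chl_A_0D": [0.10, 0.20, 0.10],
                       "Chl_A_10DaT": [0.05, 0.15, 0.10],
                       "Chl_A_7DaR": [0.10, 0.10, 0.20]})

columns = ["Chl_A_0D", "Chl_A_10DaT", "Chl_A_7DaR"]
colors = ["black", "blue", "magenta"]

fig, axes = plt.subplots(nrows=3, ncols=1, sharex="all", figsize=(10, 8))
containers = []
for ax, col, color in zip(axes, columns, colors):
    # Draw bars and their error bars on this specific Axes, not via plt.*
    bars = ax.bar(plotdf["Group"].to_numpy(), plotdf[col].to_numpy(),
                  yerr=ploter[col].to_numpy(), color=color,
                  ecolor="gray", capsize=4, label=col)
    containers.append(bars)

fig.legend(loc="lower center", bbox_to_anchor=(0.5, 0), ncol=3)
fig.tight_layout(rect=[0, 0.05, 1, 1])
```

The same per-Axes idea should apply to ax[i].errorbar(...) if separate error bar artists are preferred over the yerr argument of bar.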

Moving Graph Titles in the Y axis of Subplots

This question is adapted from this answer; however, the solution provided does not work, and the following is my result. I am interested in adding an individual title on the right side of each subplot.
(P.S. No matter how large a y-axis offset I provide, the title seems to stay at the same y-value.)
from matplotlib import pyplot as plt
import numpy as np
fig, axes = plt.subplots(nrows=2)
ax0label = axes[0].set_ylabel('Axes 0')
ax1label = axes[1].set_ylabel('Axes 1')
title = axes[0].set_title('Title')
offset = np.array([-0.15, 0.0])
title.set_position(ax0label.get_position() + offset)
title.set_rotation(90)
fig.tight_layout()
plt.show()
Something like this? This is the only other way i can think of.
from matplotlib import pyplot as plt
import numpy as np
fig, axes = plt.subplots(nrows=2)
ax0label = axes[0].set_ylabel('Axes 0')
ax1label = axes[1].set_ylabel('Axes 1')
ax01 = axes[0].twinx()
ax02 = axes[1].twinx()
ax01.set_ylabel('title')
ax02.set_ylabel('title')
fig.tight_layout()
plt.show()

Heatmap with circles indicating size of population

I would like to produce a heatmap in Python, similar to the one shown, where the size of the circle indicates the size of the sample in that cell. I looked in seaborn's gallery and couldn't find anything, and I don't think I can do this with matplotlib.
It's the inverse. While matplotlib can do pretty much everything, seaborn only provides a small subset of options.
So using matplotlib, you can plot a PatchCollection of circles as shown below.
Note: You could equally use a scatter plot, but since scatter dot sizes are in absolute units it would be rather hard to scale them into the grid.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
N = 10
M = 11
ylabels = ["".join(np.random.choice(list("PQRSTUVXYZ"), size=7)) for _ in range(N)]
xlabels = ["".join(np.random.choice(list("ABCDE"), size=3)) for _ in range(M)]
x, y = np.meshgrid(np.arange(M), np.arange(N))
s = np.random.randint(0, 180, size=(N,M))
c = np.random.rand(N, M)-0.5
fig, ax = plt.subplots()
R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap="RdYlGn")
ax.add_collection(col)
ax.set(xticks=np.arange(M), yticks=np.arange(N),
xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(M+1)-0.5, minor=True)
ax.set_yticks(np.arange(N+1)-0.5, minor=True)
ax.grid(which='minor')
fig.colorbar(col)
plt.show()
Here's a possible solution using Bokeh Plots:
import pandas as pd
from bokeh.palettes import RdBu
from bokeh.models import LinearColorMapper, ColumnDataSource, ColorBar
from bokeh.models.ranges import FactorRange
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
output_notebook()
d = dict(x = ['A','A','A', 'B','B','B','C','C','C','D','D','D'],
y = ['B','C','D', 'A','C','D','B','D','A','A','B','C'],
corr = np.random.uniform(low=-1, high=1, size=(12,)).tolist())
df = pd.DataFrame(d)
df['size'] = np.where(df['corr']<0, np.abs(df['corr']), df['corr'])*50
#added a new column to make the plot size
colors = list(reversed(RdBu[9]))
exp_cmap = LinearColorMapper(palette=colors,
low = -1,
high = 1)
p = figure(x_range = FactorRange(), y_range = FactorRange(), plot_width=700,
plot_height=450, title="Correlation",
toolbar_location=None, tools="hover")
p.scatter("x","y",source=df, fill_alpha=1, line_width=0, size="size",
fill_color={"field":"corr", "transform":exp_cmap})
p.x_range.factors = sorted(df['x'].unique().tolist())
p.y_range.factors = sorted(df['y'].unique().tolist(), reverse = True)
p.xaxis.axis_label = 'Values'
p.yaxis.axis_label = 'Values'
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "right")
show(p)
One option is to use matplotlib's scatter plots with legends and a grid. You can set the size of the circles through the s argument and their color through c. You need to choose the X and Y values so that the circles sit exactly on the grid lines. This is an example I got from here:
import numpy as np
import matplotlib.pyplot as plt
volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)
fig, ax = plt.subplots()
# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
vmin=-3, vmax=3, cmap="Spectral")
# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
loc="upper left", title="Ranking")
ax.add_artist(legend1)
# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes from above. The *fmt* ensures to show the price
# in dollars. Note how we target at 5 elements here, but obtain only 4 in the
# created legend due to the automatic round prices that are chosen for us.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
loc="lower right", title="Price")
plt.show()
I don't have enough reputation to comment on Delenges' excellent answer, so I'll leave my comment as an answer instead:
R.flat doesn't order the way we need it to, so the circles assignment should be:
circles = [plt.Circle((j,i), radius=R[j][i]) for j, i in zip(x.flat, y.flat)]
Here is an easy example to plot circle_heatmap.
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine as load_data
from psynlig import plot_correlation_heatmap
plt.style.use('seaborn-talk')
data_set = load_data()
data = pd.DataFrame(data_set['data'], columns=data_set['feature_names'])
#data = df_corr_selected
kwargs = {
'heatmap': {
'vmin': -1,
'vmax': 1,
'cmap': 'viridis',
},
'figure': {
'figsize': (14, 10),
},
}
plot_correlation_heatmap(data, bubble=True, annotate=False, **kwargs)
plt.show()

Change Error Bar Markers (Caplines) in Pandas Bar Plot

I am plotting error bars for a pandas DataFrame. The error bars currently have a weird arrow at the top, but what I want is a horizontal line. For example, a figure like this:
But now my error bars end with an arrow instead of a horizontal line.
Here is the code I used to generate it:
plot = meansum.plot(
kind="bar",
yerr=stdsum,
colormap="OrRd_r",
edgecolor="black",
grid=False,
figsize=(8, 2),
ax=ax,
position=0.45,
error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
width=0.8,
)
So what should I change to get the error bars I want? Thanks.
Using plt.errorbar from matplotlib makes it easier as it returns several objects including the caplines which contain the marker you want to change (the arrow which is automatically used when lolims is set to True, see docs).
Using pandas, you just need to dig out the correct line among the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5, lolims=True),width=0.8)
for ch in plot.get_children():
    if str(ch).startswith('Line2D'): # this is silly, but it appears that the first Line in the children are the caplines...
        ch.set_marker('_')
        ch.set_markersize(10) # to change its size
        break
plt.show()
The result looks like:
Just don't set lolims=True and you are good to go. An example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5),width=0.8)
plt.show()
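As a further option not taken from the answers above, matplotlib's bar accepts a capsize argument, and pandas forwards extra keyword arguments to it. The following is a minimal sketch assuming the same sample data; it draws flat horizontal caps without setting lolims at all.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4], "error": [0.4, 0.3, 0.6, 0.9]})

fig, ax = plt.subplots(figsize=(8, 2))
# capsize (in points) sets the length of the horizontal cap on each error bar.
df["val"].plot(kind="bar", yerr=df["error"], capsize=4,
               edgecolor="black", grid=False, ax=ax, width=0.8,
               error_kw=dict(ecolor="black", elinewidth=0.5))
```

The cap width is given in points, so it stays the same on screen regardless of the data range.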

Stop matplotlib repeating labels in legend

Here is a very simplified example:
import matplotlib.pyplot as plt
xvalues = [2,3,4,6]
for x in xvalues:
    plt.axvline(x,color='b',label='xvalues')
plt.legend()
The legend will now show 'xvalues' as a blue line 4 times in the legend.
Is there a more elegant way of fixing this than the following?
for i,x in enumerate(xvalues):
    if not i:
        plt.axvline(x,color='b',label='xvalues')
    else:
        plt.axvline(x,color='b')
plt.legend takes as parameters
A list of axis handles which are Artist objects
A list of labels which are strings
These parameters are both optional defaulting to plt.gca().get_legend_handles_labels().
You can remove duplicate labels by putting them in a dictionary before calling legend. This is because dicts can't have duplicate keys.
For example:
For Python versions < 3.7
from collections import OrderedDict
import matplotlib.pyplot as plt
handles, labels = plt.gca().get_legend_handles_labels()
by_label = OrderedDict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys())
For Python versions >= 3.7
As of Python 3.7, dictionaries retain insertion order by default. Thus, there is no need for OrderedDict from the collections module.
import matplotlib.pyplot as plt
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys())
Docs for plt.legend
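Applied to the axvline example from the question, the dictionary approach can be sketched as follows. This is only an illustration of the snippet above with an explicit Axes object; the variable name legend is introduced here for convenience.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for x in [2, 3, 4, 6]:
    ax.axvline(x, color="b", label="xvalues")  # four artists, one shared label

handles, labels = ax.get_legend_handles_labels()
# Keys of a dict are unique, so repeated labels collapse to a single entry;
# the handle kept for each label is the last one seen.
by_label = dict(zip(labels, handles))
legend = ax.legend(list(by_label.values()), list(by_label.keys()))
```

Because all four lines look identical here, it does not matter which handle survives; if the artists sharing a label differ in style, the legend shows the style of the last one.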
handles, labels = ax.get_legend_handles_labels()
handle_list, label_list = [], []
for handle, label in zip(handles, labels):
    if label not in label_list:
        handle_list.append(handle)
        label_list.append(label)
plt.legend(handle_list, label_list)
I don't know if this can be considered "elegant", but you can have your label a variable that gets set to "_nolegend_" after first usage:
my_label = "xvalues"
xvalues = [2,3,4,6]
for x in xvalues:
    plt.axvline(x, color='b', label=my_label)
    my_label = "_nolegend_"
plt.legend()
This can be generalized using a dictionary of labels if you have to put several labels:
my_labels = {"x1" : "x1values", "x2" : "x2values"}
x1values = [1, 3, 5]
x2values = [2, 4, 6]
for x in x1values:
    plt.axvline(x, color='b', label=my_labels["x1"])
    my_labels["x1"] = "_nolegend_"
for x in x2values:
    plt.axvline(x, color='r', label=my_labels["x2"])
    my_labels["x2"] = "_nolegend_"
plt.legend()
(Answer inspired by https://stackoverflow.com/a/19386045/1878788)
Problem - 3D Array
Questions: Nov 2012, Oct 2013
import numpy as np
a = np.random.random((2, 100, 4))
b = np.random.random((2, 100, 4))
c = np.random.random((2, 100, 4))
Solution - dict uniqueness
For my case, neither _nolegend_ (bli and DSM) nor label if i==0 would work. ecatmur's answer uses get_legend_handles_labels and reduces the legend with collections.OrderedDict. Fons demonstrates that this is possible without an import.
In line with these answers, I suggest using dict for unique labels.
# Step-by-step
ax = plt.gca() # Get the axes you need
a = ax.get_legend_handles_labels() # a = [(h1 ... h2) (l1 ... l2)] non unique
b = {l:h for h,l in zip(*a)} # b = {l1:h1, l2:h2} unique
c = [*zip(*b.items())] # c = [(l1 l2) (h1 h2)]
d = c[::-1] # d = [(h1 h2) (l1 l2)]
plt.legend(*d)
Or
plt.legend(*[*zip(*{l:h for h,l in zip(*ax.get_legend_handles_labels())}.items())][::-1])
Maybe less legible and memorable than Matthew Bourque's solution. Code golf welcome.
Example
import numpy as np
a = np.random.random((2, 100, 4))
b = np.random.random((2, 100, 4))
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
ax.plot(*a, 'C0', label='a')
ax.plot(*b, 'C1', label='b')
ax.legend(*[*zip(*{l:h for h,l in zip(*ax.get_legend_handles_labels())}.items())][::-1])
# ax.legend() # Old, ^ New
plt.show()
Based on answer https://stackoverflow.com/a/13589144/9132798 and https://stackoverflow.com/a/19386045/9132798
plt.gca().get_legend_handles_labels()[1] gives a list of names, it is possible to check if the label is already in the list while in the loop plotting (label= name[i] if name[i] not in plt.gca().get_legend_handles_labels()[1] else '').
For the given example this solution would look like:
import matplotlib.pyplot as plt
xvalues = [2,3,4,6]
for x in xvalues:
    plt.axvline(x,color='b',\
                label= 'xvalues' if 'xvalues' \
                not in plt.gca().get_legend_handles_labels()[1] else '')
plt.legend()
This is much shorter than https://stackoverflow.com/a/13589144/9132798 and more flexible than https://stackoverflow.com/a/19386045/9132798, as it can be used for any kind of loop and for any plot function inside the loop individually.
However, for many iterations it is probably slower than https://stackoverflow.com/a/13589144/9132798.
These code snippets didn't work for me personally. I was plotting two different groups in two different colors. The legend would show two red markers and two blue markers, when I only wanted to see one per color. I'll paste a simplified version of what did work for me:
Import statements
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D
Plot data
points_grp, = plt.plot(x[grp_idx], y[grp_idx], color=c.c[1], marker=m, ms=4, lw=0, label=leglab[1])
points_ctrl, = plt.plot(x[ctrl_idx], y[ctrl_idx], color=c.c[0], marker=m, ms=4, lw=0, label=leglab[0])
Add legend
points_dict = {points_grp: HandlerLine2D(numpoints=1),points_ctrl: HandlerLine2D(numpoints=1)}
leg = ax.legend(fontsize=12, loc='upper left', bbox_to_anchor=(1, 1.03),handler_map=points_dict)
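The snippet above relies on variables (x, y, grp_idx, ctrl_idx, c, m, leglab, ax) that are not shown. A self-contained sketch with made-up data and names, assuming two groups of scattered points, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.legend_handler import HandlerLine2D

# Hypothetical data: 40 points, the first 20 belong to the "group" set.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = rng.normal(size=40)
is_grp = np.arange(40) < 20

fig, ax = plt.subplots()
# The trailing comma unpacks the single Line2D returned by each plot call.
points_grp, = ax.plot(x[is_grp], y[is_grp], color="C1", marker="o",
                      ms=4, lw=0, label="group")
points_ctrl, = ax.plot(x[~is_grp], y[~is_grp], color="C0", marker="o",
                       ms=4, lw=0, label="control")

# Ask the legend to draw a single marker for each of these two handles.
handler_map = {points_grp: HandlerLine2D(numpoints=1),
               points_ctrl: HandlerLine2D(numpoints=1)}
leg = ax.legend(fontsize=12, loc="upper left", bbox_to_anchor=(1, 1.03),
                handler_map=handler_map)
```

Here the handler map only controls how many markers are drawn inside each legend entry; the number of entries is still one per labeled plot call.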
