I'm creating a lineplot from a dataframe with seaborn and I want to add a horizontal line to the plot. That works fine, but I am having trouble adding the horizontal line to the legend.
Here is a minimal, verifiable example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.array([2, 2, 4, 4])
y = np.array([5, 10, 10, 15])
isBool = np.array([True, False, True, False])
data = pd.DataFrame(np.column_stack((x, y, isBool)), columns=["x", "y", "someBoolean"])
print(data)
ax = sns.lineplot(x="x", y="y", hue="someBoolean", data=data)
plt.axhline(y=7, c='red', linestyle='dashed', label="horizontal")
plt.legend(("some name", "some other name", "horizontal"))
plt.show()
This results in the following plot:
The legends for "some name" and "some other name" show up correctly, but the "horizontal" legend is just blank. I tried simply using plt.legend() but then the legend consists of seemingly random values from the dataset.
Any ideas?
Simply using plt.legend() tells you what data is being plotted:
You are using someBoolean as the hue. So you are essentially creating two lines by applying a Boolean mask to your data. One line is for values that are False (shown as 0 on the legend above), the other for values that are True (shown as 1 on the legend above).
In order to get the legend you want you need to set the handles and the labels. You can get a list of them using ax.get_legend_handles_labels(). Then make sure to omit the first handle which, as shown above, has no artist:
ax = sns.lineplot(x="x", y="y", hue="someBoolean", data=data)
plt.axhline(y=7, c='red', linestyle='dashed', label="horizontal")
labels = ["some name", "some other name", "horizontal"]
handles, _ = ax.get_legend_handles_labels()
# Slice list to remove first handle
plt.legend(handles = handles[1:], labels = labels)
This gives:
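The handle/label pairing can also be done by label rather than by position. The sketch below uses plain matplotlib (not seaborn) to stand in for the two hue lines; the labels "0" and "1" and the `rename` mapping are assumptions for illustration. Filtering by label avoids depending on whether a title entry sits at index 0, which may differ between seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Two lines standing in for the hue levels that seaborn would draw
ax.plot([2, 4], [10, 15], label="0")
ax.plot([2, 4], [5, 10], label="1")
ax.axhline(y=7, c="red", linestyle="dashed", label="horizontal")

# Hypothetical mapping from the automatic labels to the wanted legend text
rename = {"0": "some name", "1": "some other name", "horizontal": "horizontal"}

handles, labels = ax.get_legend_handles_labels()
# Keep only the handles whose label we know how to rename
pairs = [(h, rename[l]) for h, l in zip(handles, labels) if l in rename]
new_handles, new_labels = zip(*pairs)

legend = ax.legend(handles=new_handles, labels=new_labels)
legend_texts = [t.get_text() for t in legend.get_texts()]
```

Any handle whose label is not in `rename` (for example a title placeholder) is simply dropped, so no slicing is needed.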
I would like to customize the labels on the geopandas plot legend.
fig, ax = plt.subplots(figsize = (8,5))
gdf.plot(column = "WF_CEREAL", ax = ax, legend=True, categorical=True, cmap='YlOrBr',legend_kwds = {"loc":"lower right"}, figsize =(10,6))
Adding "labels" in legend_kwds does not help.
I tried to add labels with legend_kwds in the following ways, but it didn't work-
legend_kwds = {"loc":"lower right", "labels":["low", "mid", "high", "strong", "severe"]}
legend_labels:["low", "mid", "high", "strong", "severe"]
legend_labels=["low", "mid", "high", "strong", "severe"]
Since the question does not include reproducible code and data to work on, I will give demo code that general readers can follow; some of it may answer the question.
The code below runs without any external data. Comments are inserted in various places to explain the important steps.
# Part 1
# Classifying the data of choice
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world['gdp_per_cap'] = world.gdp_md_est / world.pop_est
num_classes = 4 #quartile scheme has 4 classes
# You can use values derived from your preferred classification scheme here
num_qtiles = [0, .25, .5, .75, 1.] #class boundaries for quartiles
# Here is the categorical data to append to the dataframe
# They are also used as legend's label texts
qlabels = ["1st quartile","2nd quartile","3rd quartile","4th quartile"] #matching categorical data/labels
# Conditions
# len(num_qtiles)-1 == num_classes
# len(qlabels) == num_classes
# Create a new column for the categorical data mentioned above
world['gdp_quartile'] = pd.qcut(world['gdp_per_cap'], num_qtiles, labels=qlabels)
# Plotting the categorical data for checking
ax1 = world['gdp_quartile'].value_counts().plot(figsize=(5,4), kind='bar', xlabel='Quartile_Classes', ylabel='Countries', rot=45, legend=True)
The output of part1:-
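To see what pd.qcut does in Part 1 on something small enough to check by hand, here is a minimal sketch. The toy series `values` is an assumption for illustration and is not part of the world dataset.

```python
import pandas as pd

# Toy data (hypothetical), small enough to verify by eye
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="toy_values")

boundaries = [0, .25, .5, .75, 1.]  # quantile boundaries for quartiles
qlabels = ["1st quartile", "2nd quartile", "3rd quartile", "4th quartile"]

# Same condition as in Part 1: one label per class
assert len(boundaries) - 1 == len(qlabels)

# Each value is replaced by the label of the quartile it falls into
quartile = pd.qcut(values, boundaries, labels=qlabels)

# Count members per class, keeping the category order
counts = quartile.value_counts(sort=False)
print(counts)
```

The result is a categorical column, which is what lets geopandas draw a categorical legend with these label texts.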
# Part 2
# Plot world map using the categorical data
fig, ax = plt.subplots(figsize=(9,4))
# num_classes = 4 # already defined
#color_steps = plt.colormaps['Reds']._resample(num_classes) #For older version
color_steps = plt.colormaps['Reds'].resampled(num_classes) #Current version of matplotlib
# This plots choropleth map using categorical data as the theme
world.plot(column='gdp_quartile', cmap = color_steps,
legend=True,
legend_kwds={'loc':'lower left',
'bbox_to_anchor':(0, .2),
'markerscale':1.29,
'title_fontsize':'medium',
'fontsize':'small'},
ax=ax)
leg1 = ax.get_legend()
leg1.set_title("GDP per capita")
ax.title.set_text("World Map: GDP per Capita")
plt.show()
Output of part2:-
Edit
Additional code: use it to replace the plt.show() line above.
This answers the question posted in the comment below.
# Part 3
# New categorical texts to use with legend
new_legtxt = ["low","mid","high","v.high"]
for ix,eb in enumerate(leg1.get_texts()):
    print(eb.get_text(), "-->", new_legtxt[ix])
    eb.set_text(new_legtxt[ix])
plt.show()
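A variation on Part 3, as a sketch only: relabel the legend texts through a dictionary instead of by position, so a change in legend order cannot attach the wrong text. It uses plain matplotlib with a few dummy scatter points in place of the map; the `relabel` mapping is a hypothetical example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Dummy artists standing in for the categorical map classes
for name, x in [("1st quartile", 1), ("2nd quartile", 2), ("3rd quartile", 3)]:
    ax.scatter([x], [x], label=name)
legend = ax.legend(title="GDP per capita")

# Hypothetical mapping from the existing legend text to the new text
relabel = {"1st quartile": "low", "2nd quartile": "mid", "3rd quartile": "high"}

for text in legend.get_texts():
    old = text.get_text()
    # Leave the text unchanged when no replacement is defined
    text.set_text(relabel.get(old, old))

new_texts = [t.get_text() for t in legend.get_texts()]
```

The same loop should work on the legend returned by ax.get_legend() after a geopandas plot, since that is an ordinary matplotlib Legend.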
I am not sure if this will work but try:
gdf.plot(column = "WF_CEREAL", ax = ax, legend=True, categorical=True, cmap='YlOrBr',legend_kwds = {"loc":"lower right"}, figsize =(10,6), legend_labels=["low", "mid", "high", "strong", "severe"])
I am using seaborn in Jupyterlab to plot my data. Here is the code snippet for plotting the graph where I have separated data based on the presence/absence of PMMA shown by PMMA=1, PMMA=0 respectively. However, the strip plot on PMMA=1 for 17 and 20 on the x-axis is plotting the individual data points from PMMA=0 and the strip plot for PMMA=0 is not showing for the rest of the data. How can I fix this issue? Also, the legend is not showing the tag as "Day#"
both = pd.concat((df1, df2))
grped_bplot = sns.catplot(x='Passage#',
y='Dendrite Length (um)',
hue="Day#",
col="PMMA",
kind="box",
legend=False,
height=6,
aspect=1.3,
palette="Set2",
data=both);
grped_bplot = sns.stripplot(x='Passage#',
y='Dendrite Length (um)',
hue='Day#',
jitter=True,
dodge=True,
marker='o',
palette="Set2",
alpha=0.5,
data=both)
handles, labels = grped_bplot.get_legend_handles_labels()
l = plt.legend(handles[0:3], labels[0:3])
Boxplot with overlapping strip plot
sns.catplot returns a FacetGrid. You can call .map_dataframe(sns.stripplot) to create strip plots for the same data.
Here is some example code starting from generated test data:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first, create some test data
both = pd.DataFrame({'Passage#': np.random.choice([6, 17, 18, 19, 20], 500),
'Dendrite Length (um)': np.random.uniform(1, 17, 500) ** 3,
'Day#': np.random.choice([3, 4, 7], 500),
'PMMA': np.random.randint(0, 2, 500)})
g = sns.catplot(x='Passage#',
y='Dendrite Length (um)',
hue="Day#",
col="PMMA",
kind="box",
legend=False,
height=6,
aspect=1.3,
palette="Set2",
boxprops={'alpha': 0.4},
data=both)
g.map_dataframe(sns.stripplot,
x='Passage#',
y='Dendrite Length (um)',
hue='Day#',
jitter=True,
dodge=True,
marker='o',
palette="Set2",
alpha=0.5)
g.add_legend(title='Day#')
plt.show()
PS: To have the boxes in the legend instead of the dots, you can call g.add_legend() after the catplot but before calling g.map_dataframe.
I followed all the steps from my earlier question: Pandas Dataframe : How to add a vertical line with label to a bar plot when your data is time-series?
That was supposed to solve my problem, but when I change the kind of plot to line, the vertical line does not appear. I copied the same code and changed the plot type from bar to line.
As you can see, with bar the vertical line (in red) appears.
# function to plot a bar
def dessine_line3(madataframe,debut_date , mes_colonnes):
    madataframe.index = pd.to_datetime(madataframe.index,format='%m/%d/%y')
    df = madataframe.loc[debut_date:,mes_colonnes].copy()
    filt = (df[df.index == '4/20/20']).index
    df.index.searchsorted(value=filt)
    fig,ax = plt.subplots()
    df.plot.bar(figsize=(17,8),grid=True,ax=ax)
    ax.axvline(df.index.searchsorted(filt), color="red", linestyle="--", lw=2, label="lancement")
    plt.tight_layout()
out :
But when I change the code to use a line plot, there is no vertical line, and the x-axis (dates) changes as well.
So I wrote another piece of code just to draw the lines with a vertical line:
ax = madagascar_maurice_case_df[["Madagascar Covid-19 Ratio","Maurice Covid-19 Ratio"]].loc['3/17/20':].plot.line(figsize=(17,7),grid=True)
filt = (df[df.index=='4/20/20']).index
ax.axvline(df.index.searchsorted(filt),color="red",linestyle="--",lw=2 ,label="lancement")
plt.show()
But the result is the same.
Following the comment below, here is my final code:
from datetime import datetime  # needed for datetime.toordinal below
def dessine_line5(madataframe,debut_date , mes_colonnes):
    plt.figure(figsize=(17,8))
    # flag passed positionally; the b= keyword was renamed in newer matplotlib
    plt.grid(True,which='major',axis='y')
    df = madataframe.loc[debut_date:,mes_colonnes]
    sns.lineplot(data=df)
    lt = datetime.toordinal(pd.to_datetime('4/20/20'))
    plt.axvline(lt,color="red",linestyle="--",lw=2,label="lancement")
    plt.show()
and the result is :
Plot tick locs
The issue is that the x-axis tick locations use different units depending on the plot kind and API:
df.plot vs. plt.plot vs. sns.lineplot
Placing ticks, labels = plt.xticks() after df.plot.bar(figsize=(17,8),grid=True,ax=ax) and printing ticks gives array([0, 1, 2, ..., len(df.index) - 1]). Bar plots use integer positions, which is why df.index.searchsorted(filt) works: it also produces an integer location.
df.plot() has tick locs like array([13136, 13152, 13174, 13175], dtype=int64), for my sample date range. I don't actually know how those numbers are derived, so I don't know how to convert the date to that format.
sns.lineplot and plt.plot have tick locs that are the ordinal representation of the datetime, e.g. array([737553., 737560., 737567., 737577., 737584., 737591., 737598., 737607.]).
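The bar-plot case above can be checked with a small sketch. The business-day index and the "ratio" column are made-up sample data, not the asker's dataframe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: 20 business days starting 2020-04-01
dates = pd.bdate_range("2020-04-01", periods=20)
df = pd.DataFrame({"ratio": np.linspace(0.0, 1.0, 20)}, index=dates)

fig, ax = plt.subplots()
df.plot.bar(ax=ax)

# pandas places the bars at integer positions 0 .. len(df) - 1
bar_ticks = ax.get_xticks()

# searchsorted turns the date into the matching integer position
pos = df.index.searchsorted(pd.Timestamp("2020-04-20"))
ax.axvline(pos, color="red", linestyle="--", lw=2, label="lancement")
```

Because the bar axis is purely positional, the vertical line has to be given `pos`, not a date value.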
For a lineplot with your example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime
sns.lineplot(data=df)
lt = datetime.toordinal(pd.to_datetime('2020/04/20'))
plt.axvline(lt, color="red", linestyle="--", lw=2, label="lancement")
plt.show()
For my example data:
import numpy as np
data = {'a': [np.random.randint(10) for _ in range(40)],
'b': [np.random.randint(10) for _ in range(40)],
'date': pd.bdate_range(datetime.today(), periods=40).tolist()}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
sns.lineplot(data=df)
ticks, labels = plt.xticks()
lt = datetime.toordinal(pd.to_datetime('2020-05-19'))
plt.axvline(lt, color="red", linestyle="--", lw=2, label="lancement")
plt.show()
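One caveat with the ordinal approach: Matplotlib changed its default date epoch to 1970-01-01 in version 3.3, so on newer versions `datetime.toordinal` may no longer match the axis units. A sketch that sidesteps this is to let matplotlib do the conversion with `matplotlib.dates.date2num`. It uses plain `ax.plot` on made-up data rather than the asker's dataframe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data on a business-day index
dates = pd.bdate_range("2020-04-01", periods=40)
df = pd.DataFrame({"a": np.arange(40), "b": np.arange(40)[::-1]}, index=dates)

fig, ax = plt.subplots()
ax.plot(df.index, df["a"], label="a")
ax.plot(df.index, df["b"], label="b")

# date2num converts using whatever epoch this matplotlib install uses
x_launch = mdates.date2num(datetime(2020, 4, 20))
vline = ax.axvline(x_launch, color="red", linestyle="--", lw=2, label="lancement")
ax.legend()

# The line position should fall inside the plotted date range
x_min, x_max = ax.get_xlim()
```

This only applies to axes that matplotlib itself treats as dates. The df.plot() line case above uses different tick units, so the value would have to be converted differently there.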
I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add legend to the plot ?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing. According to the documentation, pointplot does not accept a label argument.
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.
Sample output :
I would suggest not to use seaborn pointplot for plotting. This makes things unnecessarily complicated.
Instead use matplotlib plot_date. This allows to set labels to the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case one is still interested in obtaining the legend for pointplots, here is a way to go:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()
Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.
I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
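The same proxy-artist idea can use Line2D handles so that the legend shows a line with a marker instead of a filled box. This is a sketch in plain matplotlib; the `series` dictionary and its counts are made-up data, and `ax.plot` stands in for the sequential pointplot calls.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Hypothetical data: region name -> (color, counts)
series = {
    "A": ("blue", [35, 43, 12, 27]),
    "B": ("green", [30, 20, 25, 40]),
    "C": ("red", [10, 15, 22, 18]),
}

fig, ax = plt.subplots()
for name, (color, counts) in series.items():
    # Plotted without a label, as happens with the sequential pointplots
    ax.plot(range(len(counts)), counts, color=color, marker="o")

# Proxy artists: empty lines that exist only to describe a legend entry
proxies = [
    Line2D([], [], color=color, marker="o", label=name)
    for name, (color, _) in series.items()
]
legend = ax.legend(handles=proxies)
proxy_texts = [t.get_text() for t in legend.get_texts()]
```

Since the proxies are built from the same colors as the plots, the legend stays correct however many artists each plotting call adds to the axes.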
This goes a bit beyond the original question, but also builds on @PSub's response to something more general. I do know some of this is easier in Matplotlib directly, but many of the default styling options for Seaborn are quite nice, so I wanted to work out how you could have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
data=clusters,
x='x', y='y',
hue='name',
shade=True,
thresh=0.05,
n_levels=2,
alpha=0.2,
ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
data=points,
x="x",
y="y",
hue='name',
style='name',
markers=markers[:len(groups)],
palette=colors[:len(groups)],
legend=False,
s=30,
alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();
Here's the output:
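For comparison, a shorter route to two legends on one axes is to create the first with ax.legend(), re-attach it with ax.add_artist(), and then call ax.legend() again for the second. The sketch below uses plain matplotlib and made-up data; it is not a replacement for the seaborn styling above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.legend import Legend

fig, ax = plt.subplots(figsize=(6, 6))
# Made-up artists standing in for the KDE clusters and the scatter groups
line_a, = ax.plot([0, 1, 2], [0, 1, 4], color="tab:blue", label="Cluster 1")
line_b, = ax.plot([0, 1, 2], [0, 2, 3], color="tab:orange", label="Cluster 2")
pts = ax.scatter([0.5, 1.5], [3, 1], marker="v", color="black", label="Group 1")

# First legend; a later ax.legend() call would otherwise replace it
first = ax.legend(handles=[line_a, line_b], title="Clusters",
                  loc="upper left", frameon=False)
ax.add_artist(first)  # keep the first legend on the axes

# Second legend, built from a different set of handles
second = ax.legend(handles=[pts], title="Groups",
                   loc="lower right", frameon=False)

legends_on_axes = [c for c in ax.get_children() if isinstance(c, Legend)]
```

Here the handles are real artists, so no 'fake' Line2D objects are needed. That trick is only required when, as with the scatterplot above, the marker style cannot be read back from the plotted artist.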
I have the following plot built with seaborn using the factorplot() method.
Is it possible to use the line style as a legend to replace the legend based on line color on the right?
graycolors = sns.mpl_palette('Greys_r', 4)
g = sns.factorplot(x="k", y="value", hue="class", palette=graycolors,
data=df, linestyles=["-", "--"])
Furthermore I'm trying to get both lines in black color using the color="black" parameter in my factorplot method but this results in an exception "factorplot() got an unexpected keyword argument 'color'". How can I paint both lines in the same color and separate them by the linestyle only?
I have been looking for a solution trying to put the linestyle in the legend like matplotlib, but I have not yet found how to do this in seaborn. However, to make the data clear in the legend I have used different markers:
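import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.show() below
# same grey palette as defined in the question
graycolors = sns.mpl_palette('Greys_r', 4)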
# creating some data
n = 11
x = np.linspace(0,2, n)
y = np.sin(2*np.pi*x)
y2 = np.cos(2*np.pi*x)
data = {'x': np.append(x, x), 'y': np.append(y, y2),
'class': np.append(np.repeat('sin', n), np.repeat('cos', n))}
df = pd.DataFrame(data)
# plot the data with the markers
# note that I put the legend=False to move it up (otherwise it was blocking the graph)
g=sns.factorplot(x="x", y="y", hue="class", palette=graycolors,
data=df, linestyles=["-", "--"], markers=['o','v'], legend=False)
# placing the legend up
g.axes[0][0].legend(loc=1)
# showing graph
plt.show()
you can try the following:
h = plt.gca().get_lines()
lg = plt.legend(handles=h, labels=['YOUR Labels List'], loc='best')
It worked fine for me.
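As a sketch of what the question asks for (both lines in one color, told apart only by line style, with the style shown in the legend), here is a plain-matplotlib version. It does not use factorplot, and the sin/cos data mirrors the sample data from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 11)

fig, ax = plt.subplots()
# Same color for both lines; only the line style differs
ax.plot(x, np.sin(2 * np.pi * x), color="black", linestyle="-", label="sin")
ax.plot(x, np.cos(2 * np.pi * x), color="black", linestyle="--", label="cos")

# Use the plotted lines themselves as legend handles
handles = ax.get_lines()
legend = ax.legend(handles=handles, loc="best")

legend_styles = [line.get_linestyle() for line in legend.get_lines()]
legend_texts = [t.get_text() for t in legend.get_texts()]
```

With long dashes, passing a larger `handlelength` to ax.legend() can make the pattern easier to see in the legend.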