How to specify the palette in a seaborn figure-level plot

I've learned not to use seaborn when I need specific changes or detail-oriented visualizations, but I feel I'm not fully using what it has to offer at times.
I have a series of 2D slices plotting cluster memberships.
The issue is that the number of clusters present changes between cases, so seaborn resets the color palette for every case and the same color ends up being used for different clusters.
I'd like to specify the color palette explicitly with seaborn. Am I just missing something, or is this a detail that cannot be controlled when using FacetGrid?
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame()
df['I'] = np.full(20,1)
df['J'] = np.arange(0,20,1)
df['K'] = [1]*12 + [2]*8
df['CM_Hard'] = [1]*10 + [2] + [0] + [2]*8
df['Realization'] = ['p25']*10 + ['p50']*9 + ['p75']
for layer in df['K'].unique():
    layer_data_slice = df.groupby('K').get_group(layer)
    g = sns.FacetGrid(layer_data_slice, col="Realization",hue="CM_Hard")
    g.map_dataframe(sns.scatterplot, x="I", y="J", s=50, marker='+', palette='deep')
    g.add_legend()
    g.fig.suptitle("Training Realizations, Layer: {}".format(int(layer)), size=16, y=1.05)
    figure_title = 'Training_Layer_{}'.format(int(layer))
I've attempted to use the following palette definition, but it does not affect the plots:
palette = {0:"tab:cyan", 1:"tab:orange", 2:"tab:purple"}
I have tried this with "tab:color" names, plain "color" names, and RGB values, with no luck. There is no error; changing the palette simply has no effect.

Update to seaborn 0.11.2. Using FacetGrid directly is not recommended. Use seaborn.relplot with kind='scatter' for a figure-level plot.
The keys in palette must match the unique values from the column passed to hue.
Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3, seaborn 0.11.2
import seaborn as sns
# load the data - this is a pandas.DataFrame
tips = sns.load_dataset('tips')
# set the hue palette as a dict for custom mapping
palette = {'Lunch': "tab:cyan", 'Dinner':"tab:purple"}
# plot
p = sns.relplot(kind='scatter', data=tips, col='smoker', x='total_bill', y='tip', hue='time', palette=palette)
Using the new sample data added to the OP
Here the palette keys must match the values in 'CM_Hard', so use the dict from the question: palette = {0: "tab:cyan", 1: "tab:orange", 2: "tab:purple"}
If the 'K' column is renamed to 'Layer', then the subplot title will match your example: df = df.rename({'K': 'Layer'}, axis=1)
p = sns.relplot(data=df, x='I', y='J', s=50, marker='+', row='Layer', col='Realization', hue='CM_Hard', palette=palette, height=4)
p.fig.suptitle('Training Realizations', y=1.05, size=16)
FacetGrid
Note that palette is in the FacetGrid call, not map_dataframe
for layer in df['K'].unique():
    layer_data_slice = df.groupby('K').get_group(layer)
    g = sns.FacetGrid(layer_data_slice, col="Realization",hue="CM_Hard", palette=palette)
    g.map_dataframe(sns.scatterplot, x="I", y="J", s=50, marker='+')
    g.add_legend()
    g.fig.suptitle("Training Realizations, Layer: {}".format(int(layer)), size=16, y=1.05)
    figure_title = 'Training_Layer_{}'.format(int(layer))
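Since the palette keys have to cover every value that can appear in the hue column, one option is to build the dict once from the full set of cluster labels and reuse it for every slice. The following is only a sketch in plain Python: `fixed_palette`, `missing_levels` and `TAB_COLORS` are hypothetical names, and the color list is just matplotlib's "tab:" color names.

```python
# Hypothetical helpers (not part of seaborn): build one palette dict
# from ALL hue levels so that every subset of the data shares it.
TAB_COLORS = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan",
]


def fixed_palette(all_levels, colors=TAB_COLORS):
    """Map each distinct hue level to one color, in sorted level order."""
    levels = sorted(set(all_levels))
    if len(levels) > len(colors):
        raise ValueError("more hue levels than available colors")
    return dict(zip(levels, colors))


def missing_levels(palette, hue_values):
    """Return the hue values that have no key in the palette dict."""
    return sorted(set(hue_values) - set(palette))


# Cluster labels taken from the question's sample data (column 'CM_Hard')
cm_hard = [1] * 10 + [2] + [0] + [2] * 8
palette = fixed_palette(cm_hard)
```

The resulting dict can then be passed as palette=palette to sns.relplot or to the sns.FacetGrid call for each layer; checking missing_levels(palette, layer_data_slice['CM_Hard']) first shows whether a slice contains a cluster the palette does not know about.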

Related

half (not split!) violin plots in seaborn

Currently seaborn offers functionality for split violinplots by setting split=True, according to a hue variable. I would like to make a 'half' violin plot, i.e. a plot where half of each violin is omitted. Such a plot depicts something similar to a pdf for each continuous variable, plotted on one side of each vertical line of each categorical variable only.
I have managed to trick seaborn to plot this with an extra data point outside the plotted range of values and an extra dummy hue, but I would like to know if this can be done without actually altering the dataset, e.g. within sns.violinplot() arguments.
For instance, this graph (image not included) was created by this snippet:
# imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load dataset from seaborn
datalist = sns.get_dataset_names()
dataset_name = 'iris'
if dataset_name in datalist:
    df = sns.load_dataset(dataset_name)
else:
    print("Dataset with name: " + dataset_name + " was not found in the available datasets online by seaborn.")
# prepare data (note: DataFrame.append was removed in pandas 2.0, so this line needs an older pandas)
df2 = df.append([-999,-999,-999,-999,'setosa'])
df2['huecol'] = 0.0
df2['huecol'].iloc[-1]= -999
# plot
fig = plt.figure(figsize=(6,6))
sns.violinplot(x='species',y="sepal_width",
               split=True, hue ='huecol', inner = 'quartile',
               palette="pastel", data=df2, legend=False)
plt.title('iris')
# remove hue legend
leg = plt.gca().legend()
leg.remove()
plt.ylim([1,5.0])
plt.show()

I was looking for a solution similar to this but did not find anything satisfactory. I ended up calling seaborn.kdeplot multiple times as violinplot is essentially a one-sided kernel density plot.
Example
The function definition for categorical_kde_plot is given below.
categorical_kde_plot(
    df,
    variable="tip",
    category="day",
    category_order=["Thur", "Fri", "Sat", "Sun"],
    horizontal=False,
)
With horizontal=True, the densities are drawn horizontally, one row per category (image not included).
Code
import seaborn as sns
from matplotlib import pyplot as plt
def categorical_kde_plot(
    df,
    variable,
    category,
    category_order=None,
    horizontal=False,
    rug=True,
    figsize=None,
):
    """Draw a categorical KDE plot
    Parameters
    ----------
    df: pd.DataFrame
        The data to plot
    variable: str
        The column in the `df` to plot (continuous variable)
    category: str
        The column in the `df` to use for grouping (categorical variable)
    horizontal: bool
        If True, draw density plots horizontally. Otherwise, draw them
        vertically.
    rug: bool
        If True, add also a sns.rugplot.
    figsize: tuple or None
        If None, use default figsize of (7, 1*len(categories))
        If tuple, use that figsize. Given to plt.subplots as an argument.
    """
    if category_order is None:
        categories = list(df[category].unique())
    else:
        categories = category_order[:]
    # only fall back to the default when no figsize was given
    if figsize is None:
        figsize = (7, 1.0 * len(categories))
    fig, axes = plt.subplots(
        nrows=len(categories) if horizontal else 1,
        ncols=1 if horizontal else len(categories),
        figsize=figsize[::-1] if not horizontal else figsize,
        sharex=horizontal,
        sharey=not horizontal,
    )
    for i, (cat, ax) in enumerate(zip(categories, axes)):
        sns.kdeplot(
            data=df[df[category] == cat],
            x=variable if horizontal else None,
            y=None if horizontal else variable,
            # kde kwargs
            bw_adjust=0.5,
            clip_on=False,
            fill=True,
            alpha=1,
            linewidth=1.5,
            ax=ax,
            color="lightslategray",
        )
        keep_variable_axis = (i == len(fig.axes) - 1) if horizontal else (i == 0)
        if rug:
            sns.rugplot(
                data=df[df[category] == cat],
                x=variable if horizontal else None,
                y=None if horizontal else variable,
                ax=ax,
                color="black",
                height=0.025 if keep_variable_axis else 0.04,
            )
        _format_axis(
            ax,
            cat,
            horizontal,
            keep_variable_axis=keep_variable_axis,
        )
    plt.tight_layout()
    plt.show()
def _format_axis(ax, category, horizontal=False, keep_variable_axis=True):
    # Remove the axis lines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    if horizontal:
        ax.set_ylabel(None)
        lim = ax.get_ylim()
        ax.set_yticks([(lim[0] + lim[1]) / 2])
        ax.set_yticklabels([category])
        if not keep_variable_axis:
            ax.get_xaxis().set_visible(False)
            ax.spines["bottom"].set_visible(False)
    else:
        ax.set_xlabel(None)
        lim = ax.get_xlim()
        ax.set_xticks([(lim[0] + lim[1]) / 2])
        ax.set_xticklabels([category])
        if not keep_variable_axis:
            ax.get_yaxis().set_visible(False)
            ax.spines["left"].set_visible(False)
if __name__ == "__main__":
    df = sns.load_dataset("tips")
    categorical_kde_plot(
        df,
        variable="tip",
        category="day",
        category_order=["Thur", "Fri", "Sat", "Sun"],
        horizontal=True,
    )

The answer is simply, no, it's not possible with seaborn without tricking it into thinking there is a hue present.
This answer shows how to do it in matplotlib and in principle the same can be applied to seaborn violinplots as well, namely to cut out half of the violin path.
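To illustrate what "cutting out half of the violin path" can look like, here is a minimal sketch using matplotlib's own ax.violinplot rather than seaborn. The random sample data and the variable names are assumptions made for the example; the idea is only that each violin body is a polygon symmetric about its x position, so clipping the x coordinates at that position leaves the left half.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up sample data: three groups with different means
rng = np.random.default_rng(0)
samples = [rng.normal(loc, 1.0, 200) for loc in (0.0, 1.0, 2.0)]
positions = [1, 2, 3]

fig, ax = plt.subplots()
parts = ax.violinplot(samples, positions=positions, showextrema=False)

# Each body is a closed polygon that is symmetric about its position.
# Clipping the x coordinates at the position keeps only the left half.
for body, pos in zip(parts["bodies"], positions):
    verts = body.get_paths()[0].vertices
    verts[:, 0] = np.clip(verts[:, 0], None, pos)
```

Call plt.show() afterwards to display the figure. To keep the right half instead, clip with np.clip(verts[:, 0], pos, None).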

It's not necessary to modify the data:
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.violinplot(
    data=tips,
    x="day", y="total_bill", hue=True,
    hue_order=[True, False], split=True,
)
ax.legend_ = None

How to add median value labels to a Seaborn boxplot using the hue argument

In addition to the solution posted in this link, I would like to use the hue parameter and add the median values to each of the boxes.
The current code:
testPlot = sns.boxplot(x='Pclass', y='Age', hue='Sex', data=trainData)
m1 = trainData.groupby(['Pclass', 'Sex'])['Age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
p1 = range(len(m1))
for tick, label in zip(p1, testPlot.get_xticklabels()):
    print(testPlot.text(p1[tick], m1[tick] + 1, mL1[tick]))
This gives an output like the following (image not included).
I'm working on the Titanic dataset, which can be found in this link.
I get the required values, but only when I use a print statement. How do I include them in my plot?

Place the labels manually: loop over all xticklabels and, within each category, offset each label horizontally according to its hue level and the width of the boxes:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
trainData = pd.read_csv('titanic.csv')
testPlot = sns.boxplot(x='pclass', y='age', hue='sex', data=trainData)
m1 = trainData.groupby(['pclass', 'sex'])['age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
ind = 0
for tick in range(len(testPlot.get_xticklabels())):
    testPlot.text(tick-.2, m1[ind+1]+1, mL1[ind+1], horizontalalignment='center', color='w', weight='semibold')
    testPlot.text(tick+.2, m1[ind]+1, mL1[ind], horizontalalignment='center', color='w', weight='semibold')
    ind += 2
plt.show()
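The ±0.2 offsets above are specific to two hue levels. As a rough sketch of how the offsets could be generalized, assuming the boxes of one category share a total width of 0.8 split evenly between hue levels (which I believe is seaborn's default, but treat it as an assumption), the label x positions can be computed like this. `dodge_offsets` and `label_positions` are hypothetical helper names.

```python
def dodge_offsets(n_hue, width=0.8):
    """Horizontal offset of each hue level's box centre from the tick.

    Assumes the category's total box width is split evenly between
    the hue levels.
    """
    step = width / n_hue
    return [(j - (n_hue - 1) / 2) * step for j in range(n_hue)]


def label_positions(n_ticks, n_hue, width=0.8):
    """x position of every box centre, as one list per category tick."""
    offsets = dodge_offsets(n_hue, width)
    return [[tick + off for off in offsets] for tick in range(n_ticks)]


# Three passenger classes, two sexes: x positions for the six labels
positions = label_positions(n_ticks=3, n_hue=2)
```

For two hue levels this reproduces tick-.2 and tick+.2; which median belongs to which position still depends on the hue order used in the plot.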

This answer is nearly copied and pasted from here, but adapted to fit your example code. The linked answer is IMHO a bit misplaced there, because that question is just about labeling a boxplot and not about a boxplot using the hue argument.
I couldn't use your Train dataset because it is not available in a Python package, so I used seaborn's Titanic dataset instead, which has nearly the same column names.
#!/usr/bin/env python3
import pandas as pd
import matplotlib
import matplotlib.patheffects as path_effects
import seaborn as sns
def add_median_labels(ax, fmt='.1f'):
    """Credits: https://stackoverflow.com/a/63295846/4865723
    """
    lines = ax.get_lines()
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    for median in lines[4:len(lines):lines_per_box]:
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
df = sns.load_dataset('titanic')
plot = sns.boxplot(x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot)
plot.figure.show()
As an alternative, you can create the boxplot with a figure-level function. In that case you need to pass the axes to add_median_labels().
# imports and add_median_labels() unchanged
df = sns.load_dataset('titanic')
plot = sns.catplot(kind='box', x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot.axes[0][0])
plot.figure.show()
The resulting plot (image not included)
This solution also works with more than two categories in the column used for the hue argument.

Add Legend to Seaborn point plot

I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add legend to the plot ?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing, and the documentation does not list a label argument.
One workaround that worked was creating a new dataframe and using the hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know whether there is a way to add the point plots to the figure one after another, as in my code, and then create a legend for them.
Sample output (image not included)

I would suggest not using seaborn pointplot for this; it makes things unnecessarily complicated.
Instead, use matplotlib's plot_date. It lets you set a label on each plot and have the labels collected automatically into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case you are still interested in obtaining the legend for pointplots, here is one way to do it:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()

Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.

I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once the patches corresponding to the different plots are set up, pass them to the legend:
plt.legend(handles=[red_patch, black_patch])
The legend should then appear in the pointplot.
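Put together, the proxy-artist idea looks roughly like the sketch below. It uses Line2D handles instead of patches so the legend entries resemble the plotted lines, and plain ax.plot calls stand in for the three sns.pointplot calls; the labels, colors and counts are taken from or modeled on the question and are otherwise assumptions.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Label -> color for each series (labels as in the question's workaround)
series = {"A": "blue", "B": "green", "C": "red"}
counts = [35, 43, 12, 27]

fig, ax = plt.subplots()
# Stand-in for the sequential sns.pointplot(..., color=...) calls
for shift, color in enumerate(series.values()):
    ax.plot(range(len(counts)), [c + 10 * shift for c in counts],
            color=color, marker="o")

# Proxy artists: never drawn on the axes, used only as legend handles
handles = [
    Line2D([0], [0], color=color, marker="o", label=label)
    for label, color in series.items()
]
legend = ax.legend(handles=handles)
```

Because the handles are built from the same label-to-color mapping that drives the plotting calls, the legend stays in sync no matter how many artists each pointplot adds to the axes.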

This goes a bit beyond the original question, but it builds on @PSub's response to get to something more general. I know some of this is easier in Matplotlib directly, but many of Seaborn's default styling options are quite nice, so I wanted to work out how to have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
    data=clusters,
    x='x', y='y',
    hue='name',
    shade=True,
    thresh=0.05,
    n_levels=2,
    alpha=0.2,
    ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
    data=points,
    x="x",
    y="y",
    hue='name',
    style='name',
    markers=markers[:len(groups)],
    palette=colors[:len(groups)],
    legend=False,
    s=30,
    alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
             loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg)
# Done
plt.show()
Here's the output:

Custom legend for Seaborn regplot (Python 3)

I've been trying to follow this How to make custom legend in matplotlib SO question but I think a few things are getting lost in translation. I used a custom color mapping for the different classes of points in my plot and I want to be able to put a table with those color-label pairs. I stored the info in a dictionary D_color_label and then made 2 parallel lists colors and labels. I tried using it in the ax.legend but it didn't seem to work.
np.random.seed(0)
# Create dataframe
DF_0 = pd.DataFrame(np.random.random((100,2)), columns=["x","y"])
# Label to colors
D_idx_color = {**dict(zip(range(0,25), ["#91FF61"]*25)),
               **dict(zip(range(25,50), ["#BA61FF"]*25)),
               **dict(zip(range(50,75), ["#916F61"]*25)),
               **dict(zip(range(75,100), ["#BAF1FF"]*25))}
D_color_label = {"#91FF61":"label_0",
                 "#BA61FF":"label_1",
                 "#916F61":"label_2",
                 "#BAF1FF":"label_3"}
# Add color column
DF_0["color"] = pd.Series(list(D_idx_color.values()), index=list(D_idx_color.keys()))
# Plot
fig, ax = plt.subplots(figsize=(8,8))
sns.regplot(data=DF_0, x="x", y="y", scatter_kws={"c":DF_0["color"]}, ax=ax)
# Add custom legend
colors = list(set(DF_0["color"]))
labels = [D_color_label[x] for x in set(DF_0["color"])]
# If I do this, I get the following error:
# ax.legend(colors, labels)
# UserWarning: Legend does not support '#BA61FF' instances.
# A proxy artist may be used instead.

According to http://matplotlib.org/users/legend_guide.html, the legend function has to be given the artists that are to be labeled. To get one scatter artist per class, group your data by color and plot the points of each color separately, so that every artist carries its own label:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import seaborn as sns
np.random.seed(0)
# Create dataframe
DF_0 = pd.DataFrame(np.random.random((100, 2)), columns=["x", "y"])
DF_0['color'] = ["#91FF61"]*25 + ["#BA61FF"]*25 + ["#91FF61"]*25 + ["#BA61FF"]*25
D_color_label = {"#91FF61": "label_0", "#BA61FF": "label_1",
                 "#916F61": "label_2", "#BAF1FF": "label_3"}
colors = list(DF_0["color"].unique())
labels = [D_color_label[x] for x in DF_0["color"].unique()]
ax = sns.regplot(data=DF_0, x="x", y="y", scatter_kws={'c': DF_0['color'], 'zorder':1})
# Make a legend
# group by color and plot the points of each color with its own label
for color, grp in DF_0.groupby('color'):
    grp.plot(kind='scatter', x='x', y='y', c=color, ax=ax, label=D_color_label[color], zorder=0)
ax.legend(loc=2)
plt.show()

Python Seaborn Matplotlib setting line style as legend

I have the following plot build with seaborn using factorplot() method.
Is it possible to use the line style as a legend to replace the legend based on line color on the right?
graycolors = sns.mpl_palette('Greys_r', 4)
g = sns.factorplot(x="k", y="value", hue="class", palette=graycolors,
data=df, linestyles=["-", "--"])
Furthermore I'm trying to get both lines in black color using the color="black" parameter in my factorplot method but this results in an exception "factorplot() got an unexpected keyword argument 'color'". How can I paint both lines in the same color and separate them by the linestyle only?

I have been looking for a solution trying to put the linestyle in the legend like matplotlib, but I have not yet found how to do this in seaborn. However, to make the data clear in the legend I have used different markers:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# creating some data
n = 11
x = np.linspace(0,2, n)
y = np.sin(2*np.pi*x)
y2 = np.cos(2*np.pi*x)
data = {'x': np.append(x, x), 'y': np.append(y, y2),
        'class': np.append(np.repeat('sin', n), np.repeat('cos', n))}
df = pd.DataFrame(data)
# gray palette as defined in the question
graycolors = sns.mpl_palette('Greys_r', 4)
# plot the data with the markers
# note that I put the legend=False to move it up (otherwise it was blocking the graph)
g=sns.factorplot(x="x", y="y", hue="class", palette=graycolors,
                 data=df, linestyles=["-", "--"], markers=['o','v'], legend=False)
# placing the legend up
g.axes[0][0].legend(loc=1)
# showing graph
plt.show()

you can try the following:
h = plt.gca().get_lines()
lg = plt.legend(handles=h, labels=['YOUR Labels List'], loc='best')
It worked fine for me.
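Building on the get_lines() idea, here is a small sketch of how both lines could be painted black and told apart by line style only, with a legend that shows the styles. Plain ax.plot calls stand in for the lines seaborn would draw; the data, labels and styles are assumptions modeled on the example above. Note that a seaborn point plot may add extra Line2D artists (for example error bars), so on a real seaborn axes the lines may need to be filtered first.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 11)

fig, ax = plt.subplots()
# Stand-ins for the two lines drawn by the seaborn plot
ax.plot(x, np.sin(2 * np.pi * x), color="gray")
ax.plot(x, np.cos(2 * np.pi * x), color="silver")

labels = ["sin", "cos"]
styles = ["-", "--"]

# Restyle the existing lines: same color, different line style
lines = ax.get_lines()
for line, style in zip(lines, styles):
    line.set_color("black")
    line.set_linestyle(style)

# Reuse the restyled lines as legend handles so the legend shows the styles
legend = ax.legend(handles=lines, labels=labels, loc="best")
```

Call plt.show() to display the result.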
