seaborn barplot with labels for x values (and no hue) - python

My dataframe contains two columns, and I would like to plot their values in a barplot, like this:
import seaborn as sns
# load sample data and drop all but two columns
tips = sns.load_dataset("tips")
tips = tips[["day", "total_bill"]]
sns.set(style="whitegrid")
ax = sns.barplot(x="day", y="total_bill", data=tips)
On top of this barplot, I would also like to add a legend with labels for each x value. Seaborn supports this, but as far as I can see, it works only when you specify a hue argument. Each label in the legend then corresponds to a hue value.
Can I create a legend with explanations for the x values?
This might be a confusing question. I don't want to rename the label for the axis or the ticks along the axis. Instead, I would like to have a separate legend with additional explanations. My bars give me some nice space to put this legend and the explanations would be too long to have them as ticks.

Is this what you want:
sns.set(style="whitegrid")
ax = sns.barplot(x="day", y="total_bill", data=tips)
ax.legend(ax.patches, ['1','2','3','Something that I can\'t say'], loc=[1.01,0.5])
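A variation on the answer above, as a minimal sketch: the explanations live in a dictionary keyed by x value, so the legend labels cannot drift out of order with the bars. The small inline DataFrame and the explanation strings are made up for illustration; with the real data you would pass data=tips instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (values are made up)
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"] * 2,
    "total_bill": [17.0, 16.5, 20.1, 21.3, 18.2, 17.9, 20.9, 21.6],
})

# Hypothetical long explanations, one per x value
explanations = {
    "Thur": "Thursday: weekday lunch crowd",
    "Fri": "Friday: mixed lunch and dinner",
    "Sat": "Saturday: dinner only",
    "Sun": "Sunday: dinner only",
}
order = list(explanations)

fig, ax = plt.subplots()
sns.barplot(x="day", y="total_bill", data=df, order=order, ax=ax)

# Sort the bars left to right so the handles line up with `order`
bars = sorted(ax.patches, key=lambda p: p.get_x())
labels = [explanations[day] for day in order]

# Place the legend just outside the right edge of the axes
ax.legend(bars, labels, loc="upper left", bbox_to_anchor=(1.01, 1))
```

Passing the bars themselves as handles means each legend entry shows the color of its bar; loc and bbox_to_anchor can be changed to move the legend into free space inside the plot.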

Related

How to reduce the blank area in a grouped boxplot with many missing hue categories

I have an issue when plotting a categorical grouped boxplot by seaborn in Python, especially using 'hue'.
My raw data is as shown in the figure below. I want to plot the values in column 8, grouped by columns 1 and 4.
I used seaborn and my code is shown below:
ax = sns.boxplot(x=output[:,1], y=output[:,8], hue=output[:,4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.legend([],[])
However, the generated plot always contains large blank area, as shown in the upper figure below. I tried to add 'dodge=False' in sns.boxplot according to a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), but it gives the lower figure below.
Actually, what I want Python to plot is a boxplot like what I generated using JMP below.
It seems that if one of the 2nd categories is empty, seaborn still reserves space for it within each 1st category, which causes the observed offset and blank area.
So I wonder if there is any way to solve this issue, for example by using another Python package?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying open spots. (When there would be only one box per x-value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value.
The original hue will be used as x-value for each subplot. To avoid empty spots, the hue should be of string type. When the hue would be pd.Categorical, seaborn would still reserve a spot for each of the categories.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('') # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label) # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
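The note above about string type versus pd.Categorical can be checked with pandas alone. The following is a small sketch on a made-up Series (not the answer's dataframe): a categorical column keeps every declared category even when only a few occur, while the string version only knows the values that are present.

```python
import pandas as pd

# Hypothetical subset in which only 'a' and 'c' occur, out of four declared categories
subset = pd.Series(pd.Categorical(["a", "c", "a"], categories=list("abcd")))

# Every declared category survives, which is why a slot is reserved for each
declared = list(subset.cat.categories)

# After converting to strings, only the values actually present remain
as_strings = sorted(subset.astype(str).unique())

# Alternative that keeps the categorical dtype but drops the unused categories
trimmed = list(subset.cat.remove_unused_categories().cat.categories)

print(declared)
print(as_strings)
print(trimmed)
```

Note that remove_unused_categories applied to the whole column would not help here, because every category occurs somewhere in the full dataframe; it would have to be applied per facet, which is why the string conversion is the simpler route.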
Adding consistent coloring
A dictionary palette can color the boxes such that corresponding boxes in different subplots have the same color. hue= with the same column as the x= will do the coloring, and dodge=False will remove the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('') # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label) # use the label from the dataframe as xlabel
    # ax.tick_params(axis='x', labelrotation=90) # optionally rotate the tick labels
plt.tight_layout()
plt.show()

Rename xticks in seaborn boxplot

Is there a default function/method in seaborn to rename the xticks of a boxplot without the need of changing the input data frame?
I haven't found anything in the documentation or by googling.
Since there is no code or data in the question, the example below is based on the official reference. The x-axis label can be set to any string. The tick labels can be changed either by transforming the existing strings or by passing a list with one entry per tick.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=tips)
xlabel = ax.get_xlabel()
print(xlabel)
labels = ax.get_xticklabels()
print(labels)
labels = [x.get_text().upper() for x in labels]
ax.set_xticklabels(labels)
ax.set_xlabel('dayofweek')
plt.show()
Graph before customization
This replaces the tick label of each box (pass one label per box):
ax.set_xticklabels(["First box","Second box"])
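The two answers above can be combined into a rename by lookup that leaves the dataframe untouched. This is a minimal sketch on a made-up DataFrame; the rename dictionary and its long names are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data (values are made up)
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"] * 3,
    "total_bill": [17.0, 16.5, 20.1, 21.3, 18.2, 17.9,
                   20.9, 21.6, 15.4, 19.0, 22.5, 20.0],
})
order = ["Thur", "Fri", "Sat", "Sun"]

# Hypothetical mapping from the values in the data to the labels to display
rename = {"Thur": "Thursday", "Fri": "Friday", "Sat": "Saturday", "Sun": "Sunday"}

fig, ax = plt.subplots()
sns.boxplot(x="day", y="total_bill", data=df, order=order, ax=ax)

# Boxes sit at positions 0..n-1; fix the ticks there, then relabel them
ax.set_xticks(range(len(order)))
ax.set_xticklabels([rename[day] for day in order])
```

Fixing the tick positions first keeps the number of labels and ticks in agreement, and passing order explicitly makes the label order independent of how the data happens to be sorted.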

Seaborn lineplot without lines between points

How can I use the lineplot plotting function in seaborn to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of merging all datapoints with the same x value and plotting a single mean and confidence interval.
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could get into the matplotlib figure artists and delete the line, but that gets more complicated as the amount of stuff on the plot increases. I was wondering if there are some sort of arguments that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
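If you would rather not rely on lineplot at all, a different technique is to aggregate with pandas and draw the points with matplotlib's errorbar. The sketch below uses made-up data and a normal-approximation interval (1.96 times the standard error), which is an assumption of this example and not the bootstrapped interval that seaborn computes by default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with repeated x values (made up for illustration)
df = pd.DataFrame({
    "size": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "total_bill": [7.2, 8.1, 6.9, 15.0, 17.3, 16.1, 22.4, 25.0, 23.8],
})

# Mean and standard error of the mean for each distinct x value
stats = df.groupby("size")["total_bill"].agg(["mean", "sem"])

fig, ax = plt.subplots()
# fmt="o" draws markers only, so no line connects the points
container = ax.errorbar(stats.index, stats["mean"],
                        yerr=1.96 * stats["sem"], fmt="o", capsize=3)
ax.set_xlabel("size")
ax.set_ylabel("total_bill")
```

This works for continuous x values as well, since the grouping is done on the actual numbers and nothing is treated as categorical.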
I recommend setting join=False.
For me only join = True works.
sns.pointplot(data=df, x = "x_attribute", y = "y_attribute", ci= 95, join=False)

creating plot matrix with relplot in seaborn

I am trying to add multiple plots and create a matrix plot with seaborn. Unfortunately, Python gives me the following warning:
"relplot is a figure-level function and does not accept target axes. You may wish to try scatterplot"
fig, axes = plt.subplots(nrows=5,ncols=5,figsize=(20,20),sharex=True, sharey=True)
for i in range(5):
    for j in range(5):
        axes[i][j] = seaborn.relplot(x=col[i+2], y=col[j+2], data=df, ax=axes[i][j])
I would like to know if there's any method with which I can combine all the plots plotted with relplot.
Hi Kinto welcome to StackOverflow!
relplot works differently than for example scatterplot. With relplot you don't need to define subplots and loop over them. Instead you can say what you would like to vary on each row or column of a graph.
For an example from the documentation:
import seaborn as sns
sns.set(style="ticks")
tips = sns.load_dataset("tips")
g = sns.relplot(
x="total_bill", y="tip", hue="day",
col="time", row="sex", data=tips
)
Which says: on each subplot, plot the total bill on the x-axis, the tip on the y-axis and vary the hue in a subplot with the day. Then for each column, plot unique data from the "time" column of the tips dataset. In this case there are two unique times: "Lunch" and "Dinner". And finally vary the "sex" for each subplot row. In this case there are two types of "sex": "Male" and "Female", so on one row you plot the male tipping behavior and on the second the female tipping behavior.
I'm not sure what your data looks like, but hopefully this explanation helps you.
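If you do want the grid of subplots from the question, the warning itself points to the alternative: scatterplot is an axes-level function and accepts ax. A minimal sketch, assuming a made-up dataframe with three numeric columns in place of the asker's data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data; the column names are made up
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
cols = list(df.columns)

fig, axes = plt.subplots(nrows=len(cols), ncols=len(cols), figsize=(9, 9))
for i, y_col in enumerate(cols):
    for j, x_col in enumerate(cols):
        # Axes-level function: draws into the given subplot and returns that Axes
        sns.scatterplot(x=x_col, y=y_col, data=df, ax=axes[i][j])
fig.tight_layout()
```

There is no need to assign the return value back into axes, since scatterplot draws on the axes it is given. If the goal is every pairwise combination of columns, sns.pairplot(df[cols]) builds a similar grid in one call.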

How to put the legend on first subplot of seaborn.FacetGrid?

I have a pandas DataFrame df which I visualize with subplots of a seaborn.barplot. My problem is that I want to move my legend inside one of the subplots.
To create subplots based on a condition (in my case Area), I use seaborn.FacetGrid. This is the code I use:
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
# .. load data
grid = sns.FacetGrid(df, col="Area", col_order=['F1','F2','F3'])
bp = grid.map(sns.barplot,'Param','Time','Method')
bp.add_legend()
bp.set_titles("{col_name}")
bp.set_ylabels("Time (s)")
bp.set_xlabels("Number")
plt.show()
Which generates this plot:
You see that the legend here is totally at the right, but I would like to have it inside one of the plots (for example the left one) since my original data labels are quite long and the legend occupies too much space. This is the example for only 1 plot where the legend is inside the plot:
and the code:
mask = df['Area']=='F3'
ax=sns.barplot(x='Param',y='Time',hue='Method',data=df[mask])
plt.show()
Test 1:
I tried the example of an answer where they have the legend in one of the subplots:
grid = sns.FacetGrid(df, col="Area", col_order=['F1','F2','F3'])
bp = grid.map(sns.barplot,'Param','Time','Method')
Ax = bp.axes[0]
Boxes = [item for item in Ax.get_children()
         if isinstance(item, matplotlib.patches.Rectangle)][:-1]
legend_labels = ['So1', 'So2', 'So3', 'So4', 'So5']
# Create the legend patches
legend_patches = [matplotlib.patches.Patch(color=C, label=L) for
                  C, L in zip([item.get_facecolor() for item in Boxes],
                              legend_labels)]
# Plot the legend
plt.legend(legend_patches)
plt.show()
Note that plt.legend(handles=legend_patches) did not work for me, so I used plt.legend(legend_patches) instead, as suggested in a comment on this answer. The result, however, is:
As you see the legend is in the third subplot and neither the colors nor labels match.
Test 2:
Finally I tried to create a subplot with a column wrap of 2 (col_wrap=2) with the idea of having the legend in the right-bottom square:
grid = sns.FacetGrid(df, col="MapPubName", col_order=['F1','F2','F3'],col_wrap=2)
but this also results in the legend being at the right:
Question: How can I get the legend inside the first subplot? Or how can I move the legend to anywhere in the grid?
You can set the legend on the specific axes you want, by using grid.axes[i][j].legend()
For your case of a 1 row, 3 column grid, you want to set grid.axes[0][0].legend() to plot on the left hand side.
Here's a simple example derived from your code, but changed to account for the sample dataset.
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
df = sns.load_dataset("tips")
grid = sns.FacetGrid(df, col="day")
bp = grid.map(sns.barplot,"time",'total_bill','sex')
grid.axes[0][0].legend()
bp.set_titles("{col_name}")
bp.set_ylabels("Time (s)")
bp.set_xlabels("Number")
plt.show()
Use the legend_out=False option.
If you are making a faceted bar plot, you should use catplot (called factorplot before seaborn 0.9) with kind="bar". Otherwise, if you don't explicitly specify the order for each facet, it is possible that your plot will end up being wrong.
import seaborn as sns
tips = sns.load_dataset("tips")
sns.catplot(x="sex", y="total_bill", hue="smoker", col="day",
            data=tips, kind="bar", aspect=.7, legend_out=False)
