How to do a boxplot with individual data points using seaborn - python

I have a box plot that I create using the following command:
sns.boxplot(y='points_per_block', x='block', data=data, hue='habit_trial')
So the different colors represent whether the trial was a habit trial or not (0,1). I want to also plot the individual data points, which I tried to achieve using:
sns.stripplot(y='points_per_block', x='block', data=data, hue='habit_trial')
The result was the following
I want the individual points to display over the corresponding box plots. Is there a way to do this without resorting to hacking their positions in some manner? The problem comes from the fact that the separation of data using hue works differently for stripplot and boxplot but I would have thought that these would be easily combinable.
Thanks in advance.

Seaborn functions working with categorical data usually have a dodge= parameter indicating whether data with different hue should be separated a bit. For a boxplot, dodge defaults to True, as it usually would look bad without dodging. For a stripplot, dodge defaults to False, so it has to be set explicitly.
The following example also shows how the legend can be updated (matplotlib 3.4 is needed for HandlerTuple):
import seaborn as sns
from matplotlib.legend_handler import HandlerTuple
tips = sns.load_dataset("tips")
ax = sns.boxplot(data=tips, x="day", y="total_bill",
                 hue="smoker", hue_order=['Yes', 'No'], boxprops={'alpha': 0.4})
sns.stripplot(data=tips, x="day", y="total_bill",
              hue="smoker", hue_order=['Yes', 'No'], dodge=True, ax=ax)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=[(handles[0], handles[2]), (handles[1], handles[3])],
          labels=['Smoker', 'Non-smoker'],
          loc='upper left', handlelength=4,
          handler_map={tuple: HandlerTuple(ndivide=None)})
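Applied to the column names from the question, the same idea might look like the sketch below. The data frame is synthetic (the real data was not posted), habit_trial is filled with the made-up string labels "no" and "yes" instead of 0/1, and the alpha value is only a styling choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the asker's data: 3 blocks x 2 trial types x 20 rows
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "block": np.repeat([1, 2, 3], 40),
    "habit_trial": np.tile(["no", "yes"], 60),
    "points_per_block": rng.normal(50, 10, size=120),
})

# Boxes are dodged by default; make them translucent so the points stay visible
ax = sns.boxplot(data=data, x="block", y="points_per_block",
                 hue="habit_trial", boxprops={"alpha": 0.4})
# dodge=True shifts each hue level's points onto the matching box
sns.stripplot(data=data, x="block", y="points_per_block",
              hue="habit_trial", dodge=True, ax=ax)
ax.figure.savefig("box_with_points.png")
```

Both calls draw a legend entry per hue level, so the legend shows each label twice unless it is rebuilt as in the answer above.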

Related

Legend not showing with barless histogram plot in python

I am trying to plot a kde plot in seaborn using the histplot function, and then removing the bars of the histogram in the following way (see last part of the accepted answer here):
fig, ax = plt.subplots()
sns.histplot(data, kde=True, binwidth=5, stat="probability", label='data1', kde_kws={'cut': 3})
The reason for using histplot instead of kdeplot is that I need to set a specific binwidth. The problem I have is that I cannot print out the legend, meaning that
ax.legend(loc='best')
does nothing, and I receive the following message: No handles with labels found to put in legend.
I have also tried with
handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels, loc='best')
but without results. Does anybody have an idea of what is going on here? Thanks in advance!
You can add the label for the kde line via the line_kws={'label': ...} parameter.
sns.kdeplot can't be used directly, because currently the only option is the default scaling (density).
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = np.random.normal(0.01, 0.1, size=10000).cumsum()
ax = sns.histplot(data, kde=True, binwidth=5, stat="probability", label='data1',
                  kde_kws={'cut': 3}, line_kws={'label': 'kde scaled to probability'})
ax.containers[0].remove() # remove the bars of the histogram
ax.legend()
plt.show()

Seaborn lineplot without lines between points

How can I use the lineplot plotting function in seaborn to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of merging all datapoints with the same x value and plotting a single mean and confidence interval.
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could get into the matplotlib figure artists and delete the line, but that gets more complicated as the amount of stuff on the plot increases. I was wondering if there are some sort of arguments that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
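If the points need to be drawn without going through seaborn at all, the aggregation can also be done by hand. The following is only a rough sketch under stated assumptions: the data is synthetic, the column names x and y are made up, and the error bars use 1.96 times the standard error as a normal approximation, which is not the bootstrap interval that lineplot computes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with continuous, unevenly spaced x values (30 rows per x)
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": np.repeat([0.5, 1.25, 2.0, 3.75], 30)})
df["y"] = 2.0 * df["x"] + rng.normal(0, 1, size=len(df))

# Merge all rows sharing an x value into a mean and a standard error
stats = df.groupby("x")["y"].agg(["mean", "sem"])

fig, ax = plt.subplots()
# fmt="o" draws markers only, so no line connects the means
container = ax.errorbar(stats.index, stats["mean"],
                        yerr=1.96 * stats["sem"], fmt="o", capsize=3)
fig.savefig("points_with_errorbars.png")
```

Because the x values are treated as numbers rather than categories, the markers keep their true spacing on the axis.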
If you use sns.pointplot instead, I recommend setting join=False.
For me only join = True works.
sns.pointplot(data=df, x = "x_attribute", y = "y_attribute", ci= 95, join=False)

categorical data not fitting text into plot legend from dataframe when using seaborn

I am using pandas in conjunction with numpy, matplotlib, seaborn and a load of data. I am plotting dataframe data with the following lines of code:
#### MINIMIZED CODE ####
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.read_csv("csvfile")
df["movieTitle"] = pd.Categorical(df["movieTitle"])  # done for "movieTitle"
df["releaseDate"] = pd.to_datetime(df["releaseDate"])
df2 = df.sort_values('inflationGross', ascending=True)  # which was disneyData2 in the original code
df2 = df = df2.head(100)
plot = rando = sns.lmplot(x='releaseDate', y='inflationGross', data=df2, fit_reg=False, hue='movieTitle', legend=True)
rando.set_xticklabels(rotation=90)
#### ACTUAL CODE ####
rando = sns.lmplot(x='releaseDate', y='inflationGross', data=disneyData2, fit_reg=False, hue='movieTitle', legend=True)
The plot itself is fine. However, the categorical data, which is a series of film titles, won't fit entirely in the legend. There's a large number of categories, which I understand isn't good practice, but I'd like to find a way to shrink the text size or format the legend correctly.
Everything has been imported correctly and I have checked there are no bugs in the code.
I would like to keep using only seaborn to analyse this data, NOT matplotlib.
Picture of the problem is included below.
Thank you in advance for any help!
Roughly speaking, you can delete the automatically created legend and reuse the handles and label text of the original legend to create a new one with a smaller font. (I set the font size to 8.) Since no data was presented, I modified the code on the official site for illustrative purposes. If the data is presented we will get some great answers from more people.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
g = sns.lmplot(x="total_bill", y="tip", hue='smoker', data=tips, legend=True, legend_out=True)
# Figure can hold multiple legends, so you can specify the first legend in the list
lg = g.fig.legends[0]
# Figure legend delete
g.fig.legends[0].remove()
# The list where the handles are kept (matplotlib 3.7+ names this attribute legend_handles)
handles = lg.legendHandles
# Extracts only strings from the list of Text objects holding labels
labels = [t.get_text() for t in lg.texts]
print(handles)
print(labels)
g.fig.axes[0].legend(handles, labels, loc='center left', bbox_to_anchor=(1.0, 0.5), frameon=False, fontsize=8)
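A shorter route may be sns.move_legend, which rebuilds the legend in one call and passes extra keywords on to matplotlib's legend. This is a sketch that assumes seaborn 0.11.2 or later (where move_legend was added). The data, the column names release_year, gross and title, and the film titles are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data: 30 long category labels, one point each
rng = np.random.default_rng(2)
n_titles = 30
df = pd.DataFrame({
    "release_year": rng.integers(1940, 2020, size=n_titles),
    "gross": rng.uniform(1e6, 5e8, size=n_titles),
    "title": [f"Hypothetical Film Title Number {i}" for i in range(n_titles)],
})

g = sns.lmplot(data=df, x="release_year", y="gross", hue="title",
               fit_reg=False, legend=True)
# Re-create the figure legend with a smaller font, split over two columns
sns.move_legend(g, "center left", bbox_to_anchor=(1.0, 0.5),
                ncol=2, fontsize=8, frameon=False)
# bbox_inches="tight" keeps the legend, which sits outside the figure edge
g.savefig("many_categories.png", bbox_inches="tight")
```

With very many categories, increasing ncol or lowering fontsize further are the obvious knobs to try.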

Difficulty combining and repositioning the legends of two charts in matplotlib and pandas

I am trying to plot two charts onto one figure, with both charts coming from the same dataframe, but one represented as a stacked bar chart and the other a simple line plot.
When I create the plot using the following code:
combined.iloc[:, 1:10].plot(kind='bar', stacked=True, figsize=(20,10))
combined.iloc[:, 0].plot(kind='line', secondary_y=True, use_index=False, linestyle='-', marker='o')
plt.legend(loc='upper left', fancybox=True, framealpha=1, shadow=True, borderpad=1)
plt.show()
With the combined data frame looking like this:
I get the following image:
I am trying to combine both legends into one, and position the legend in the upper left hand corner so all the chart is visible.
Can someone explain why plt.legend() only seems to be editing the line chart corresponding to the combined.iloc[:, 0] slice of my combined dataframe? If anyone can see a quick and easy way to combine and reposition the legends please let me know! I'd be most grateful.
Passing True for the secondary_y argument means that the plot is created on a separate axes instance with a twin x-axis. Since this creates a different axes instance, the usual solution is to create the legend manually, as in the answers to the question linked by @ImportanceOfBeingErnest. If you don't want to create the legend directly, you can get around this issue by calling plt.legend() between the calls to pandas.DataFrame.plot and storing the result. You can then recover the handles and labels from the two axes instances. The following code is a complete example of this:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'x' : np.random.random(25),
                   'y' : np.random.random(25)*5,
                   'z' : np.random.random(25)*2.5})
df.iloc[:, 1:10].plot(kind='bar', stacked=True)
leg = plt.legend()
df.iloc[:, 0].plot(kind='line', y='x', secondary_y=True)
leg2 = plt.legend()
plt.legend(leg.get_patches()+leg2.get_lines(),
           [text.get_text() for text in leg.get_texts()+leg2.get_texts()],
           loc='upper left', fancybox=True, framealpha=1, shadow=True, borderpad=1)
leg.remove()
plt.show()
This produces a single combined legend and should be fairly easy to modify to suit your specific use case.
Alternatively, you can use matplotlib.pyplot.figlegend(), but you will need to pass legend = False in all calls to pandas.DataFrame.plot(), i.e.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'x' : np.random.random(25),
                   'y' : np.random.random(25)*5,
                   'z' : np.random.random(25)*2.5})
df.iloc[:, 1:10].plot(kind='bar', stacked=True, legend=False)
df.iloc[:, 0].plot(kind='line', y='x', secondary_y=True, legend=False)
plt.figlegend(loc='upper left', fancybox=True, framealpha=1, shadow=True, borderpad=1)
plt.show()
This will however default to positioning the legend outside the axes, but you can override the automatic positioning via the bbox_to_anchor argument in calling plt.figlegend().
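The manual route mentioned at the start of this answer can be sketched roughly as follows: collect the handles and labels from each axes and pass the combined lists to a single legend call. The data frame is random, as in the examples above, and ax.right_ax is the attribute pandas sets on the primary axes to point at the secondary one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.random(25),
                   "y": rng.random(25) * 5,
                   "z": rng.random(25) * 2.5})

# Stacked bars on the primary axes, line on a secondary y-axis; no automatic legends
ax = df[["y", "z"]].plot(kind="bar", stacked=True, legend=False)
df["x"].plot(kind="line", secondary_y=True, use_index=False,
             marker="o", ax=ax, legend=False)
ax2 = ax.right_ax  # the twin axes created by secondary_y=True

# Gather the labeled artists from both axes and build one legend
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
# Draw on the secondary axes, which sits on top, so the line cannot cover the legend
ax2.legend(h1 + h2, l1 + l2, loc="upper left",
           fancybox=True, framealpha=1, shadow=True, borderpad=1)
plt.savefig("combined_legend.png")
```

This avoids creating and then removing intermediate legends, at the cost of a couple of extra lines.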

Seaborn jointplot group colour coding (for both scatter and density plots)

I would like to use sns.jointplot to visualise the association between X and Y in the presence of two groups. However, in
tips = sns.load_dataset("tips")
sns.jointplot(x="total_bill", y="tip", data=tips)
there is no "hue" option as in other sns plots such as sns.scatterplot. How could one assign different colours for different groups (e.g. hue="smoker") in both the scatter plot and the two overlapping density plots?
In R this could be done by creating a scatter plot with two marginal density plots as shown in here.
What is the equivalent in sns? If this is not possible in sns, is there another python package that can be used for this?
jointplot is a simple wrapper around sns.JointGrid. If you create a JointGrid object and add plots to it manually, you will have much more control over the individual plots.
In this case, your desired jointplot is simply a scatterplot combined with a kdeplot, and what you want to do is pass hue='smoker' (for example) to scatterplot.
The kdeplot is more complex; seaborn doesn't really support one KDE for each class, AFAIK, so I was forced to plot them individually (you could use a for loop with more classes).
Accordingly, you can do this:
import seaborn as sns
tips = sns.load_dataset('tips')
grid = sns.JointGrid(x='total_bill', y='tip', data=tips)
g = grid.plot_joint(sns.scatterplot, hue='smoker', data=tips)
sns.kdeplot(tips.loc[tips['smoker']=='Yes', 'total_bill'], ax=g.ax_marg_x, legend=False)
sns.kdeplot(tips.loc[tips['smoker']=='No', 'total_bill'], ax=g.ax_marg_x, legend=False)
sns.kdeplot(tips.loc[tips['smoker']=='Yes', 'tip'], ax=g.ax_marg_y, vertical=True, legend=False)
sns.kdeplot(tips.loc[tips['smoker']=='No', 'tip'], ax=g.ax_marg_y, vertical=True, legend=False)
