Showing both boxplots when using split in seaborn violinplots - python

I would like to make split violin plots that also show the boxplots for both datasets, as in the figure in the question "Seaborn: How to apply custom color to each seaborn violinplot?". The problem is that with split=True seaborn shows only one of them (and it is not clear to me which dataset it refers to), as you can see in the answer there. Is there a way to overcome this, or should I use a different package?

Here is an example with an artificial dataset to show how the default inner='box' shows a simple boxplot-like box for the combined dataset.
The second plot shows what inner='quartiles' looks like.
The rightmost plot shows an approach to explicitly draw separate boxplots (using width= to place them close to the center).
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
data = pd.DataFrame({'Value': (np.random.randn(4, 100).cumsum(axis=0) + np.array([[15], [5], [12], [7]])).ravel(),
'Set': np.repeat(['A', 'B', 'A', 'B'], 100),
'x': np.repeat([1, 2], 200)})
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 4))
palette = ['paleturquoise', 'yellow']
sns.violinplot(data=data, x='x', y='Value', hue='Set', split=True, inner='box', palette=palette, ax=ax1)
ax1.set_title('Default, inner="box"')
sns.violinplot(data=data, x='x', y='Value', hue='Set', split=True, inner='quartiles', palette=palette, ax=ax2)
ax2.set_title('Using inner="quartiles"')
sns.violinplot(data=data, x='x', y='Value', hue='Set', split=True, inner=None, palette=palette, ax=ax3)
sns.boxplot(data=data, x='x', y='Value', hue='Set', color='white', width=0.3, boxprops={'zorder': 2}, ax=ax3)
ax3.set_title('Explicitly drawing boxplots')
handles, labels = ax3.get_legend_handles_labels()
ax3.legend(handles[:2], labels[:2], title='Set')
plt.tight_layout()
plt.show()
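To check which dataset a drawn box or quartile line belongs to, the quartiles can also be computed directly. The following is a minimal sketch, assuming the same artificial data layout as above; the fixed seed and the name `summary` are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same artificial layout as above; the seed is an assumption added for reproducibility
np.random.seed(0)
data = pd.DataFrame({
    'Value': (np.random.randn(4, 100).cumsum(axis=0)
              + np.array([[15], [5], [12], [7]])).ravel(),
    'Set': np.repeat(['A', 'B', 'A', 'B'], 100),
    'x': np.repeat([1, 2], 200),
})

# One row per (x, Set) pair, one column per quartile
summary = (data.groupby(['x', 'Set'])['Value']
               .quantile([0.25, 0.5, 0.75])
               .unstack())
print(summary)
```

Comparing these numbers with the plot shows which half of a split violin a given box or line describes.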

Related

How to color different seaborn kdeplots in one figure?

With seaborn, I want to plot the kde distribution of 4 different arrays all in one plot.
The problem is that the arrays all have different lengths.
mc_means_TP.shape, mc_means_TN.shape, mc_means_FP.shape, mc_means_FN.shape
> ((3640, 1), (3566, 1), (170, 1), (238, 1))
This makes some workaround necessary, in which I plot them all in one plot by sharing the same axis:
import seaborn as sns
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
sns.kdeplot(data=mc_means_TP, ax=ax, color='red', fill=True)
sns.kdeplot(data=mc_means_TN, ax=ax, color='green', fill=True)
sns.kdeplot(data=mc_means_FP, ax=ax, color='yellow')
sns.kdeplot(data=mc_means_FN, ax=ax, color='purple')
The result looks like this:
Since they share the same axis, it seems impossible to color them differently: they are all colored blue.
I tried solving this with ax.set_prop_cycle(color=['red', 'green', 'blue', 'purple']), but it doesn't work, I guess because I'm using the same ax for all plots.
I guess the question breaks down to how to visualize the distribution density of different sized data arrays in one plot?
When arrays with more than one dimension are used, seaborn here ignores the color parameter and only considers the palette. You can either provide a palette (to override the default blue one used in this case) or squeeze the arrays to make them one-dimensional:
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
mc_means_TP = np.random.normal(10, 1, size=(3640, 1))
mc_means_TN = np.random.normal(20, 1, size=(3566, 1))
mc_means_FP = np.random.normal(12, 1, size=(170, 1))
mc_means_FN = np.random.normal(18, 1, size=(238, 1))
fig, ax = plt.subplots()
sns.kdeplot(data=mc_means_TP.squeeze(), ax=ax, color='red', fill=True, label='means TP')
sns.kdeplot(data=mc_means_TN.squeeze(), ax=ax, color='green', fill=True, label='means TN')
sns.kdeplot(data=mc_means_FP.squeeze(), ax=ax, color='gold', label='means FP')
sns.kdeplot(data=mc_means_FN.squeeze(), ax=ax, color='purple', label='means FN')
ax.legend(bbox_to_anchor=(1.02, 1.02), loc='upper left')
plt.tight_layout()
plt.show()

How to add x and y axis line in seaborn scatter plot

I used the following code to create a scatterplot (data is imported as an example). However, the plot was created without x and y axis lines, which looks weird. I would like to keep facecolor='white' as well.
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
fig, ax = plt.subplots(figsize=(10, 8))
sns.scatterplot(
x='total_bill',
y='tip',
data=tips,
hue='total_bill',
edgecolor='black',
palette='rocket_r',
linewidth=0.5,
ax=ax
)
ax.set(
title='title',
xlabel='total_bill',
ylabel='tip',
facecolor='white'
);
Any suggestions? Thanks a lot.
You seem to have explicitly set the default seaborn theme. That theme has no border (so also no line for the x and y axis), a grey facecolor and white grid lines. You can use sns.set_style("whitegrid") to have a white facecolor. You can also use sns.despine() to show only the x and y axis lines, without the "spines" at the top and right. See Controlling figure aesthetics for more information about fine-tuning how the plot looks.
Here is a comparison. Note that the style should be set before the axes are created, so for demonstration purposes plt.subplot creates the axes one at a time.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set() # set the default style
# sns.set_style('white')
tips = sns.load_dataset("tips")
fig = plt.figure(figsize=(18, 6))
for subplot_ind in (1, 2, 3):
    if subplot_ind >= 2:
        sns.set_style('white')
    ax = plt.subplot(1, 3, subplot_ind)
    sns.scatterplot(
        x='total_bill',
        y='tip',
        data=tips,
        hue='total_bill',
        edgecolor='black',
        palette='rocket_r',
        linewidth=0.5,
        ax=ax
    )
    ax.set(
        title={1: 'Default theme', 2: 'White style', 3: 'White style with despine'}[subplot_ind],
        xlabel='total_bill',
        ylabel='tip'
    )
    if subplot_ind == 3:
        sns.despine(ax=ax)
plt.tight_layout()
plt.show()
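Because the style has to be active when the axes are created, a context manager can limit it to a single figure. This is a minimal sketch using sns.axes_style; the synthetic `total_bill` and `tip` arrays are stand-ins for the tips dataset so that the example runs without downloading data.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=100)  # synthetic stand-in for tips['total_bill']
tip = 0.15 * total_bill + rng.normal(0, 1, size=100)

# The style only applies to axes created inside the with-block
with sns.axes_style('white'):
    fig, ax = plt.subplots(figsize=(6, 4))

sns.scatterplot(x=total_bill, y=tip, hue=total_bill, palette='rocket_r',
                edgecolor='black', linewidth=0.5, ax=ax)
ax.set(title='White style with despine', xlabel='total_bill', ylabel='tip')

# Hide the top and right spines, keep the x and y axis lines
sns.despine(ax=ax)
```

Axes created outside the with-block keep whatever style was set globally, which is handy when only one plot needs the white background.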

Seaborn plots appending legends when plotting multiple plots in the same script

I am new to Seaborn. When plotting multiple plots in the same script, the first plot is correct, but for the rest the legends are appended, which skews the plots.
My code
sns.set()
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
ax = sns.scatterplot(x="Clicks", y="Impressions",
hue="Language2", size="CTR",
palette=cmap, sizes=(10, 200),
data=df)
ax.get_figure().savefig('Test plot.png')
sns.set()
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
ax0 = sns.scatterplot(x="Impressions", y="Clicks",
hue="Word2", size="Transactions",
palette=cmap, sizes=(10, 200),
data=df)
ax0.get_figure().savefig('Test plot 2.png')
sns.set()
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
ax1 = sns.scatterplot(x="CTR", y="CostPerTransaction",
hue="Language2", size="Transactions",
palette=cmap, sizes=(10, 200),
data=df)
ax1.get_figure().savefig('Test plot 3.png')
I am not sure if I should use sns.set() each time. I've renamed each ax but the issue persists.
Also, maybe you could suggest how I could improve my plots.
Thank you for your suggestions.
I am not certain if this will fix your problem, but in general I strongly prefer the explicit object-oriented approach whenever creating more than one plot in matplotlib/seaborn (matplotlib is the underlying library; seaborn just wraps it to make certain applications quicker). This means getting rid of the ax.get_figure().savefig parts and creating each figure and axes explicitly. I found this tutorial really useful in understanding the object-oriented matplotlib approach compared to the implicit state approach.
Your code in this method would look like this:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
fig1, ax1 = plt.subplots()
sns.scatterplot(x="Clicks", y="Impressions",
hue="Language2", size="CTR",
palette=cmap, sizes=(10, 200),
data=df,
ax=ax1)
# This may help with your axes labels spilling off the figure:
fig1.tight_layout()
fig1.savefig('Test plot.png')
# the sns.set is not needed each time
fig2, ax2 = plt.subplots()
# cmap is the same, so we don't need to define that again
sns.scatterplot(x="Impressions", y="Clicks",
hue="Word2", size="Transactions",
palette=cmap, sizes=(10, 200),
data=df,
ax=ax2)
fig2.savefig('Test plot 2.png')
fig3, ax3 = plt.subplots()
sns.scatterplot(x="CTR", y="CostPerTransaction",
hue="Language2", size="Transactions",
palette=cmap, sizes=(10, 200),
data=df,
ax=ax3)
fig3.savefig('Test plot 3.png')
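A likely reason the legends pile up is that, without an ax argument, each seaborn call draws on the current axes, so later plots land on top of earlier ones. The sketch below illustrates this with a small synthetic DataFrame; the data and the names `first`, `second`, `same_axes` and `separate_axes` are placeholders for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Clicks': rng.integers(1, 100, size=50),
    'Impressions': rng.integers(100, 1000, size=50),
    'Language2': rng.choice(['en', 'de'], size=50),
})

# Implicit approach: both calls fall back to the current axes
first = sns.scatterplot(data=df, x='Clicks', y='Impressions', hue='Language2')
second = sns.scatterplot(data=df, x='Impressions', y='Clicks', hue='Language2')
same_axes = first is second  # the second plot was drawn over the first
plt.close('all')

# Explicit approach: every plot gets its own figure and axes
fig_a, ax_a = plt.subplots()
fig_b, ax_b = plt.subplots()
sns.scatterplot(data=df, x='Clicks', y='Impressions', hue='Language2', ax=ax_a)
sns.scatterplot(data=df, x='Impressions', y='Clicks', hue='Language2', ax=ax_b)
separate_axes = ax_a is not ax_b
```

If the implicit style is kept, calling plt.close() or plt.figure() between plots should have a similar effect, since it starts a fresh current figure.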

How to assign different position for each group in seaborn violin plot

The shape of a violin plot is useful for visualizing the distribution of grouped data. The size of each group can also be visualized as the area of the 'violin'.
But when the data is heterogeneous, the width of certain groups is too small to show any meaningful information (Fri group in Figure 1). There is a width option in seaborn's violinplot for enlarging the plot.
However, once a small group is enlarged to a suitable scale, the large ones become too large (Sat group in Figure 2) and overlap with each other.
Thus, my question is how to assign different spacing between the violins in a seaborn violin plot.
The demo
Code for generating the Figure 1:
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="sex",
data=tips, palette="Set2", split=True,
scale="count", inner="stick",
scale_hue=False, bw=.2)
Figure 1
Code for generating the Figure 2:
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="sex",
data=tips, palette="Set2", split=True,
scale="count", inner="stick", width=2.5,
scale_hue=False, bw=.2)
Figure 2
What is your solution?
My first attempt was to increase the figure width, but it looks terrible and leaves too much white space in the figure.
I then tried to map the categorical data on the x axis to numeric values with different distances between them.
tips["day_n"] = tips["day"].map(dict(zip(tips["day"].unique(), [1, 2, 4, 6])))
But it seems that seaborn does not support numeric positions here: the distance between groups either stays unchanged or gets messed up when switching the x and y axes.
Code for generating the Figure 3:
ax = sns.violinplot(y="day_n", x="total_bill", hue="sex",
data=tips, palette="Set2", split=True,
scale="count", inner="stick", width=2.5,
scale_hue=False, bw=.2)
Figure 3
A similar question on Stack Overflow indicates that matplotlib has a positions option, but it does not work for seaborn either.
Using the order parameter with empty placeholder levels can achieve the same spacing as the [1, 2, 4, 6] positions on the x-axis (the violins land at 0, 1, 3 and 5):
import seaborn as sns, matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="sex",
data=tips, palette="Set2", split=True,
scale="count", inner="stick",
scale_hue=False, bw=.2, width=2.5,
order=('Thur', 'Fri', '', 'Sat', '', 'Sun'))
# get rid of ticks for empty columns (levels)
ax.set_xticks([0,1,3,5])
ax.set_xticklabels(['Thur', 'Fri', 'Sat', 'Sun'])
plt.show()
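The order tuple with empty placeholders can also be generated from the desired positions instead of being typed by hand. The helper below is a hypothetical convenience function (the name `order_with_gaps` is not part of seaborn); it only builds the tuple and the tick positions used above.

```python
def order_with_gaps(levels, positions, filler=''):
    """Place each level at its integer position and fill the gaps with `filler`."""
    if len(levels) != len(positions):
        raise ValueError('levels and positions must have the same length')
    if len(set(positions)) != len(positions):
        raise ValueError('positions must be unique')
    slots = [filler] * (max(positions) + 1)
    for level, position in zip(levels, positions):
        slots[position] = level
    return tuple(slots)


levels = ['Thur', 'Fri', 'Sat', 'Sun']
positions = [0, 1, 3, 5]
order = order_with_gaps(levels, positions)
print(order)
```

The resulting tuple can be passed as order=order, and positions and levels can be reused for ax.set_xticks and ax.set_xticklabels.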

How to set xticks in subplots

If I plot a single imshow plot I can use
fig, ax = plt.subplots()
ax.imshow(data)
plt.xticks( [4, 14, 24], [5, 15, 25] )
to replace my xtick labels.
Now, I am plotting 12 imshow plots using
f, axarr = plt.subplots(4, 3)
axarr[i, j].imshow(data)
How can I change my xticks just for one of these subplots? I can only access the axes of the subplots with axarr[i, j]. How can I access plt just for one particular subplot?
There are two ways:
Use the axes methods of the subplot object (e.g. ax.set_xticks and ax.set_xticklabels) or
Use plt.sca to set the current axes for the pyplot state machine (i.e. the plt interface).
As an example (this also illustrates using setp to change the properties of all of the subplots):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=3, ncols=4)
# Set the ticks and ticklabels for all axes
plt.setp(axes, xticks=[0.1, 0.5, 0.9], xticklabels=['a', 'b', 'c'],
yticks=[1, 2, 3])
# Use the pyplot interface to change just one subplot...
plt.sca(axes[1, 1])
plt.xticks(range(3), ['A', 'Big', 'Cat'], color='red')
fig.tight_layout()
plt.show()
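The first option, calling the axes methods on one subplot, can be sketched for the 4 x 3 imshow grid from the question as follows. The random `data` array and the choice of axarr[1, 2] as the subplot to change are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).random((30, 30))  # placeholder image data

fig, axarr = plt.subplots(4, 3)
for ax in axarr.flat:
    ax.imshow(data)

# Change tick positions and labels on a single subplot via its Axes object
target = axarr[1, 2]
target.set_xticks([4, 14, 24])
target.set_xticklabels(['5', '15', '25'])

fig.tight_layout()
```

All other subplots keep their automatically chosen ticks; call plt.show() to display the grid.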
See the (quite) recent answer on the matplotlib repository, in which the following solution is suggested:
If you want to set the xticklabels:
ax.set_xticks([1,4,5])
ax.set_xticklabels([1,4,5], fontsize=12)
If you want to only increase the fontsize of the xticklabels, using the default values and locations (which is something I personally often need and find very handy):
ax.tick_params(axis="x", labelsize=12)
To do it all at once:
plt.setp(ax.get_xticklabels(), fontsize=12, fontweight="bold",
horizontalalignment="left")
