I want to add an axis line to my plot. I've tried other functions to set the ticks, but they didn't work.
warnings.filterwarnings('ignore')
xvalues = np.arange(2)
# Graph - Grouped by class, survival and sex
r = sns.factorplot(x="Sex", y="Survived", col="Pclass", data=titanicData_clean,
saturation=.5, kind="bar", ci=None, size=5 ,aspect=.8)
r.fig.suptitle('class, survival and sex')
r.fig.subplots_adjust(top=0.9)
plt.xticks(xvalues)
plt.show()
# Fix up the labels
(r.set_axis_labels('', 'Survival Rate')
.set_titles("Class {col_name}")
.set(ylim=(0, 1))
.despine(left=True, bottom=True));
I don't know what kind of line you want, so I will use a horizontal line as an example. The code below shows both how to draw a separate line on each subplot and how to draw the same line on all subplots at once.
The line positions are arbitrary, so please replace them with values from your own data.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
titanicData_clean = sns.load_dataset("titanic")
# warnings.filterwarnings('ignore')
xvalues = np.arange(2)
# Graph - Grouped by class, survival and sex
r = sns.catplot(x="sex", y="survived", col="pclass", data=titanicData_clean,
saturation=.5, kind="bar", ci=None, height=5 ,aspect=.8)
r.fig.suptitle('class, survival and sex')
r.fig.subplots_adjust(top=0.9)
plt.xticks(xvalues)
ax1,ax2,ax3 = r.axes[0]
# Fix up the labels
(r.set_axis_labels('', 'Survival Rate')
.set_titles("Class {col_name}")
.set(ylim=(0, 1))
.despine(left=True, bottom=True))
# When drawing individual horizontal lines
ax1.axhline(0.4, ls='--')
ax2.axhline(0.2, ls='--')
ax3.axhline(0.2, ls='--')
# To draw a horizontal line in a unified manner
r.map(plt.axhline, y=0.5,ls='-', c='r')
plt.show()
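If the line height should come from the data rather than being hard-coded, one option is to loop over the facets and compute a value per column. The sketch below is only an illustration: it assumes a small made-up frame (`toy`, with invented values) instead of the Titanic data, and `ref_lines` is just a helper dictionary for keeping the drawn lines. It marks each facet's mean survival rate:

```python
import pandas as pd
import seaborn as sns

# Made-up data standing in for the Titanic frame (assumption, not real values)
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "sex":      ["male", "female"] * 6,
    "survived": [1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1],
})

g = sns.catplot(x="sex", y="survived", col="pclass", data=toy, kind="bar")

# Mean survival rate per class, used as the height of each facet's line
class_means = toy.groupby("pclass")["survived"].mean()

# g.col_names lists the facet values in the same order as the axes
ref_lines = {}
for pclass, ax in zip(g.col_names, g.axes.flat):
    ref_lines[pclass] = ax.axhline(class_means[pclass], ls="--", c="k")
```

Call `plt.show()` (after importing `matplotlib.pyplot as plt`) to display the figure.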
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FormatStrFormatter

def comparison_visuals(df_new):
    matplotlib.rc_file_defaults()
    sns.set_style(style=None, rc=None)
    fig, ax1 = plt.subplots(figsize=(12, 6))
    # transfer fee in millions on the left y-axis
    sns.lineplot(data=df_new, x='Date', y=df_new['Transfer_fee'] / 1000000,
                 marker='o', sort=False, ax=ax1)
    ax1.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
    # inflation on a second y-axis sharing the same x-axis
    ax2 = ax1.twinx()
    sns.lineplot(data=df_new, x='Date', y='Inflation', alpha=0.5, ax=ax2)
    ax2.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))

comparison_visuals(df_new)
(Edited to paste code above)
Can anyone tell me what the shaded area on my line graph represents (see screenshot)? The middle line represents the mean. I haven't specifically added the shading. I would like to know first what it is and second how to remove it (I might choose to keep it if I find out what it represents and it adds value to my visualisation).
None of the related answers I've come across deal with this directly. Thanks in advance.
Screenshot of line graph
I have a dataset with a lot of categorical variables and a binary target variable. What package is available in Python or other opensource GUI-based software where I can scatterplot two categorical variables on the X and Y axis and use the target variable as hue?
I have looked at Seaborn's catplot, but for that, one axis has to be numerical while the other categorical. So it doesn't serve this case.
For example, take the following dataset:
import seaborn as sns
data = sns.load_dataset('titanic')
Here are the plot features I want
X-axis - 'embark_town'
Y-axis - 'class'
hue - 'alive'
In my opinion, if you have to rearrange a seaborn graph substantially, you might as well create it from scratch with matplotlib. This also gives us the opportunity to display the categorical vs. categorical plot differently, with one half-filled marker per category:
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
import numpy as np
#dataframe and categories
import seaborn as sns
df = sns.load_dataset('titanic')
X = "embark_town"
Y = "class"
H = "alive"
bin_dic = {0: "yes", 1: "no"}
#counting the X-Y-H category entries
plt_df = df.groupby([X, Y, H]).size().to_frame(name="vals").reset_index()
#figure preparation with grid and scaling
fig, ax = plt.subplots(figsize=(9, 6))
ax.set_ylim(plt_df[Y].unique().size-0.5, -0.5)
ax.set_xlim(-0.5, plt_df[X].unique().size+1.0)
ax.grid(ls="--")
#upscale factor for scatter marker size
scale=10000/plt_df.vals.max()
#left marker for category 0
ax.scatter(plt_df[plt_df[H]==bin_dic[0]][X],
plt_df[plt_df[H]==bin_dic[0]][Y],
s=plt_df[plt_df[H]==bin_dic[0]].vals*scale,
c=[(0, 0, 1, 0.5)], edgecolor="black", marker=MarkerStyle("o", fillstyle="left"),
label=bin_dic[0])
#right marker for category 1
ax.scatter(plt_df[plt_df[H]==bin_dic[1]][X],
plt_df[plt_df[H]==bin_dic[1]][Y],
s=plt_df[plt_df[H]==bin_dic[1]].vals*scale,
c=[(1, 0, 0, 0.5)], edgecolor="black", marker=MarkerStyle("o", fillstyle="right"),
label=bin_dic[1])
#legend entries for the two categories
l = ax.legend(title="Survived the catastrophe", ncol=2, framealpha=0, loc="upper right", columnspacing=0.1,labelspacing=1.5)
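#legendHandles was renamed to legend_handles in matplotlib 3.7; fall back for older versions
for handle in getattr(l, "legend_handles", None) or l.legendHandles:
    handle.set_sizes([800])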
#legend entries representing sizes
bubbles_n=5
bubbles_min = 50*(1+plt_df.vals.min()//50)
bubbles_step = 10*((plt_df.vals.max()-bubbles_min)//(10*(bubbles_n-1)))
bubbles_x = plt_df[X].unique().size+0.5
for i, bubbles_y in enumerate(np.linspace(0.5, plt_df[Y].unique().size-1, bubbles_n)):
    #plot each legend bubble to indicate different marker sizes
    ax.scatter(bubbles_x,
               bubbles_y,
               s=(bubbles_min + i*bubbles_step) * scale,
               c=[(1, 0, 1, 0.6)], edgecolor="black")
    #and label it with a value
    ax.annotate(bubbles_min+i*bubbles_step, xy=(bubbles_x, bubbles_y),
                ha="center", va="center",
                fontsize="large", fontweight="bold", color="white")
plt.show()
Seaborn, just like matplotlib, supports plotting categorical vs. categorical variables. One can use semitransparent markers so that both categories remain visible, although overlapping markers of similar size can be hard to tell apart. The essential plot is rather easy: we transform the dataframe with groupby and size to count the entries per embark town / class / alive triplet, then create a scatterplot with the count as marker size. The legend is the complicated part. Either the markers are tiny in the plot or massive in the legend. I tried to balance this, but I am not happy with the result. It needs a lot of manual adjustment, so seaborn offers no real advantage here. Any suggestions on how to simplify this within seaborn are welcome.
import seaborn as sns
import matplotlib.pyplot as plt
#dataframe and categories
df = sns.load_dataset('titanic')
X = "embark_town"
Y = "class"
H = "alive"
#counting the X-Y-H category entries
plt_df = df.groupby([X, Y, H]).size().to_frame(name="people").reset_index()
#figure preparation with grid and scaling
fig, ax = plt.subplots(figsize=(6,4))
ax.set_ylim(plt_df[Y].unique().size-0.5, -0.5)
ax.set_xlim(-0.5, plt_df[X].unique().size+1.0)
ax.grid(ls="--")
#the actual scatterplot with markersize representing the counted values
sns.scatterplot(x=X,
y=Y,
size="people",
sizes=(100, 10000),
alpha=0.5,
edgecolor="black",
hue=H,
data=plt_df,
ax=ax)
#creating two legends because the hue markers differ in size from the others
handles, labels = ax.get_legend_handles_labels()
l = ax.legend(handles[:3], labels[:3], title="The poor die first", markerscale=2, loc="upper right")
ax.add_artist(l)
#and seaborn plots the size markers in black, so you would get massive black blobs in the legend
#we change the color and make them transparent
for handle in handles:
    handle.set_facecolors((0, 1, 1, 0.5))
ax.legend(handles[4::2], labels[4::2], title="N° of people", loc="lower right", handletextpad=4, labelspacing=3, markerfirst=False)
plt.tight_layout()
plt.show()
Sample output:
I'm using Seaborn to generate many types of graphs, but will use just a simple example here for illustration purposes based on an included dataset:
import seaborn
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
In this result, the single legend box contains two titles "time" and "sex", each with sub-elements.
How could I easily separate the legend into two boxes, each with a single title? I.e. one for legend box indicating color codes (that could be placed at the left), and one legend box indicating size codes (that would be placed at the right).
The following code works well because there are as many time categories as sex categories. If that is not the case, you would have to work out beforehand how many legend entries each group ("time", "sex") requires.
import matplotlib.pyplot as plt
import seaborn
fig = plt.figure()
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
h, l = axes.get_legend_handles_labels()
half = len(h) // 2  # split point: assumes both groups have the same number of entries
l1 = axes.legend(h[:half], l[:half], loc='upper left')
l2 = axes.legend(h[half:], l[half:], loc='upper right')
axes.add_artist(l1) # we need this because the 2nd call to legend() erases the first
If you want to use matplotlib instead of seaborn, you can build the two legends from the scatter plot's legend elements:
import matplotlib.pyplot as plt
import seaborn
tips = seaborn.load_dataset("tips")
tips["time_int"] = tips["time"].cat.codes
tips["sex_int"] = (tips["sex"].cat.codes*5+5)**2
sc = plt.scatter(x="day", y="tip", s="sex_int", c="time_int", data = tips, cmap="bwr")
leg1 = plt.legend(sc.legend_elements("colors")[0], tips["time"].cat.categories,
title="Time", loc="upper right")
leg2 = plt.legend(sc.legend_elements("sizes")[0], tips["sex"].cat.categories,
title="Sex", loc="upper left")
plt.gca().add_artist(leg1)
plt.show()
I took Diziet's answer and expanded on it. It provided the syntax I needed but, as its author pointed out, lacked a way to calculate how many legend entries belong to each group when splitting the legend. I have added this and written a complete script:
# Modules #
import seaborn
from matplotlib import pyplot
# Plot #
tips = seaborn.load_dataset("tips")
axes = seaborn.scatterplot(x="day", y="tip", size="sex", hue="time", data=tips)
# Legend split and place outside #
num_of_colors = len(tips['time'].unique()) + 1
handles, labels = axes.get_legend_handles_labels()
color_hl = handles[:num_of_colors], labels[:num_of_colors]
sizes_hl = handles[num_of_colors:], labels[num_of_colors:]
# Call legend twice #
color_leg = axes.legend(*color_hl,
bbox_to_anchor = (1.05, 1),
loc = 'upper left',
borderaxespad = 0.)
sizes_leg = axes.legend(*sizes_hl,
bbox_to_anchor = (1.05, 0),
loc = 'lower left',
borderaxespad = 0.)
# We need this because the 2nd call to legend() erases the first #
axes.add_artist(color_leg)
# Adjust #
pyplot.subplots_adjust(right=0.75)
# Display #
pyplot.ion()
pyplot.show()
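The `+ 1` in the script accounts for the title entry ("time") that precedes the color labels in the list returned by `get_legend_handles_labels()`. A more general variant, sketched below, splits the lists wherever such a title entry occurs. It assumes the labels contain one title entry per group, as in the script above (this may differ between seaborn versions), and `split_legend` is a hypothetical helper, not a seaborn function:

```python
def split_legend(handles, labels, titles):
    """Split legend entries into groups, each starting at one of `titles`."""
    groups = []
    for handle, label in zip(handles, labels):
        if label in titles or not groups:
            groups.append(([], []))   # start a new (handles, labels) group
        groups[-1][0].append(handle)
        groups[-1][1].append(label)
    return groups


# Labels in the order assumed for the tips example; strings stand in for handles
labels = ["time", "Lunch", "Dinner", "sex", "Male", "Female"]
handles = ["h_" + label for label in labels]

color_hl, sizes_hl = split_legend(handles, labels, titles={"time", "sex"})
```

Each group can then be unpacked into a legend call, for example `axes.legend(*color_hl, loc='upper left')`, in the same way as in the script.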
I am plotting two seaborn categorical plots (a pointplot and a swarmplot) on top of each other and can't figure out how to shift the x-axis position of one of them (the swarmplot in my case) so that the plots sit side by side instead of overlapping. Ideally, the individual data points would appear to the right of the mean and CI.
Here's the code to produce the plot:
import seaborn as sns
# set style and font size
sns.set(style='white', rc={'figure.figsize':(6,6)}, font_scale=1.3)
# plot means as points with confidence intervals
a = sns.pointplot(x='Group',
y='RT',
data=data,
estimator= np.mean,
capsize=.2,
join=False,
color='black',
size=12)
# plot individual data points as swarmplot
b = sns.swarmplot(x='Group',
y='RT',
data=data,
size=8,
alpha=0.8)
You can pass an axes handle to the seaborn plotting functions through the ax argument and draw each plot on its own subplot.
I am not sure whether this is what you want!
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
tips = sns.load_dataset("tips")
fig,ax =plt.subplots(1,2,figsize=(15,7))
sns.swarmplot(x="day", y="total_bill", data=tips,
ax= ax[1])
sns.pointplot(x='day',
y='total_bill',
data=tips,
estimator= np.mean,
capsize=.2,
join=False,
color='black',
size=12,ax=ax[0])
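To keep both layers on the same axes but side by side, as the question asks, one option is to draw the raw points yourself at a shifted x position. The sketch below is an assumption-laden illustration rather than a seaborn solution: it swaps `swarmplot` for a plain matplotlib scatter with random horizontal jitter (so it is not a true swarm layout), uses a normal-approximation interval instead of seaborn's bootstrapped CI, and works on made-up data. `offset` and `jitter` are illustrative values:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Made-up reaction times for two groups (assumption, not the asker's data)
data = pd.DataFrame({
    "Group": np.repeat(["A", "B"], 20),
    "RT": np.concatenate([rng.normal(500, 50, 20), rng.normal(600, 50, 20)]),
})

offset, jitter = 0.25, 0.05   # shift of the raw points and jitter half-width
fig, ax = plt.subplots(figsize=(6, 6))

groups = list(data.groupby("Group"))
for pos, (name, grp) in enumerate(groups):
    mean = grp["RT"].mean()
    # Normal-approximation 95% interval of the mean
    half_width = 1.96 * grp["RT"].std(ddof=1) / np.sqrt(len(grp))
    ax.errorbar(pos, mean, yerr=half_width, fmt="o", color="black", capsize=6)
    # Individual points drawn to the right of the mean
    x = pos + offset + rng.uniform(-jitter, jitter, len(grp))
    points = ax.scatter(x, grp["RT"], alpha=0.8)

ax.set_xticks(range(len(groups)))
ax.set_xticklabels([name for name, _ in groups])
ax.set_ylabel("RT")
```

Increasing `offset` moves the raw points further from the mean marker, and a negative value places them on the left instead.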
I want to plot categorical plots with the Seaborn pointplot, but data points that are not adjacent are not connected with a line in the plot. I would like to interpolate between non adjacent points, and connect them in the same way as adjacent points are connected, how can I do this?
An example: In the left and middle images, the blue and green points should be connected with a curve, respectively, but now they are separated into small parts. How can I plot the left and middle images just like the right one?
fig, axs = plt.subplots(ncols=3, figsize=(10,5))
exp_methods = ['fMRI left', 'fMRI right', 'MEG']
for i in range(3):
    experiment = exp_methods[i]
    dataf = df[df['data']==experiment]
    sns.pointplot(x='number_of_subjects', y='accuracy', hue='training_size', data=dataf,
                  capsize=0.2, size=6, aspect=0.75, ci=95, legend=False, ax=axs[i])
I don't think there is an option to interpolate where there are missing data points, and hence the line stops instead. This question on the same topic from 2016 remains unanswered.
Instead, you could use plt.errorbar as suggested in the comments, or add the lines afterwards using plt.plot while still using seaborn to plot the means and error bars:
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Create a gap in the data and plot it
tips.loc[(tips['size'] == 4) & (tips['sex'] == 'Male'), 'size'] = 5
sns.pointplot(x='size', y='total_bill', hue='sex', data=tips, dodge=True)
# Fill gap with manual line plot (on a new figure)
plt.figure()
ax = sns.pointplot(x='size', y='total_bill', hue='sex', data=tips, dodge=True, join=False)
# Loop over the collections of points in the axes and the grouped data frame
# (this relies on seaborn storing the points in ax.collections, which may differ between versions)
for points, (gender_name, gender_slice) in zip(ax.collections, tips.groupby('sex')):
    # Retrieve the x axis positions for the points
    x_coords = [coord[0] for coord in points.get_offsets()]
    # Manually calculate the mean y-values to use with the line
    means = gender_slice.groupby('size')['total_bill'].mean()
    ax.plot(x_coords, means, lw=2)
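The `plt.errorbar` route mentioned above could look roughly like this. It is a sketch with made-up data (the column names mirror the tips example, the values are invented) and uses the standard error of the mean for the bars rather than seaborn's bootstrapped interval. Because the means are plotted against their numeric x values, the line simply spans the missing category. `lines` is just a helper dictionary for inspecting the result:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: party size 4 is missing for "Male" (assumption for illustration)
df = pd.DataFrame({
    "size": [2, 2, 3, 3, 5, 5, 2, 2, 3, 3, 4, 4],
    "sex": ["Male"] * 6 + ["Female"] * 6,
    "total_bill": [10.0, 14.0, 18.0, 22.0, 30.0, 34.0,
                   9.0, 11.0, 15.0, 19.0, 24.0, 28.0],
})

fig, ax = plt.subplots()
lines = {}
for sex, grp in df.groupby("sex"):
    # Mean and standard error of the mean per party size
    stats = grp.groupby("size")["total_bill"].agg(["mean", "sem"])
    # errorbar connects consecutive points, so the gap at size 4 is bridged
    container = ax.errorbar(stats.index, stats["mean"], yerr=stats["sem"],
                            marker="o", capsize=4, label=sex)
    lines[sex] = container.lines[0]   # the Line2D joining the means

ax.set_xlabel("size")
ax.set_ylabel("total_bill")
ax.legend()
```

Unlike `pointplot`, this treats the x values as numbers, so the spacing between 3 and 5 is twice that between 2 and 3.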