Labeling boxplot in seaborn with median value

How can I label each boxplot in a seaborn plot with the median value?
E.g.
import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=tips)
How do I label each boxplot with the median or average value?

I love when people include sample datasets!
import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
box_plot = sns.boxplot(x="day",y="total_bill",data=tips)
medians = tips.groupby(['day'])['total_bill'].median()
vertical_offset = tips['total_bill'].median() * 0.05 # offset from median for display
for xtick in box_plot.get_xticks():
    median = medians.iloc[int(xtick)] # positional lookup; the index holds day names
    box_plot.text(xtick, median + vertical_offset, median,
                  horizontalalignment='center', size='x-small', color='w', weight='semibold')
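Since the question also asks about the average, here is a minimal sketch of building label strings that carry both statistics. The small DataFrame and the label format are made up for illustration; with the tips data you would group the real columns instead.

```python
import pandas as pd

# Made-up stand-in for the tips data (assumption, for illustration only)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 60.0, 5.0, 15.0, 16.0],
})

# One row per group, one column per statistic; sort=False keeps first-appearance order
stats = df.groupby("day", sort=False)["total_bill"].agg(["median", "mean"])

# Build one label per group, in the same order as the groups
labels = [
    f"median {med:.1f} / mean {avg:.1f}"
    for med, avg in zip(stats["median"], stats["mean"])
]
print(labels)  # ['median 20.0 / mean 30.0', 'median 15.0 / mean 12.0']
```

Each string could then be passed as the text argument of box_plot.text(...) in the loop above, in place of the bare median.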

Based on ShikharDua's approach, I created a version which works independently of tick positions. This comes in handy when dealing with grouped data in seaborn (i.e. with the hue parameter). Additionally, I added flier and orientation detection.
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects
def add_median_labels(ax, fmt='.1f'):
    lines = ax.get_lines()
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    for median in lines[4:len(lines):lines_per_box]:
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
tips = sns.load_dataset("tips")
ax = sns.boxplot(data=tips, x='day', y='total_bill', hue="sex")
add_median_labels(ax)
plt.show()

This can also be achieved by deriving the median from the plot itself, without explicitly computing it from the data:
box_plot = sns.boxplot(x="day", y="total_bill", data=tips)
ax = box_plot.axes
lines = ax.get_lines()
categories = ax.get_xticks()
for cat in categories:
    # each box contributes 6 lines; the median is the line at index 4 of each group
    # first y-value of each line: 0 -> p25, 1 -> p75, 2 -> lower whisker, 3 -> upper whisker, 4 -> p50, 5 -> upper extreme value
    y = round(lines[4+cat*6].get_ydata()[0],1)
    ax.text(
        cat,
        y,
        f'{y}',
        ha='center',
        va='center',
        fontweight='bold',
        size=10,
        color='white',
        bbox=dict(facecolor='#445A64'))
box_plot.figure.tight_layout()
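For comparison, plain matplotlib hands back the median artists directly, so no index arithmetic on ax.get_lines() is needed. This is only a sketch with made-up random data, not a replacement for the seaborn code above; the Agg backend is selected just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: three groups with different centres
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 1.0, 50) for loc in (10, 12, 15)]

fig, ax = plt.subplots()
bp = ax.boxplot(groups)  # returns a dict of artist lists, including "medians"

labels = []
for median_line in bp["medians"]:
    x = np.mean(median_line.get_xdata())  # horizontal centre of the median line
    y = median_line.get_ydata()[0]        # the median value itself
    labels.append(ax.text(x, y, f"{y:.1f}", ha="center", va="bottom",
                          fontweight="bold"))
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.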

Related

How to overlay two 2D-histograms in Matplotlib?

I have two datasets (corresponding with the time-positional data of hydrogen atoms and time-positional data of alumina atoms) in the same system.
I want to plot the density of each element by overlaying two hist2d plots using matplotlib.
I am currently doing this by setting an alpha value on the second hist2d:
fig, ax = plt.subplots(figsize=(4, 4))
v = ax.hist2d(x=alx, y=aly,
bins=50, cmap='Reds')
h = ax.hist2d(x=hx, y=hy,
bins=50, cmap='Blues',
alpha=0.7)
ax.set_title('Adsorption over time, {} K'.format(temp))
ax.set_xlabel('picoseconds')
ax.set_ylabel('z-axis')
fig.colorbar(h[3], ax=ax)
fig.savefig(savename, dpi=300)
I do get the plot that I want; however, the colors seem washed out due to the alpha value.
Is there a more correct way to generate such plots?

One way to achieve this would be to add alphas that fade towards the lower levels to the existing color maps:
import numpy as np
import matplotlib.pylab as pl
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
# modify existing Reds colormap with a linearly fading alpha
red = pl.cm.Reds # original colormap
fading_red = red(np.arange(red.N)) # extract colors
fading_red[:, -1] = np.linspace(0, 1, red.N) # modify alpha
fading_red = ListedColormap(fading_red) # convert to colormap
# data generation
random_1 = np.random.randn(10000)+1
random_2 = np.random.randn(10000)+1
random_3 = np.random.randn(10000)
random_4 = np.random.randn(10000)
# plot
fig, ax = plt.subplots(1,1)
plt.hist2d(x=random_3, y=random_4, bins=100, cmap="Blues")
plt.hist2d(x=random_1, y=random_2, bins=50, cmap=fading_red)
plt.show()
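The same modification can be wrapped in a small helper so that any named colormap gets the fading alpha. The function name with_fading_alpha is hypothetical (it is not part of matplotlib), and the sketch assumes a linear fade is what you want.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def with_fading_alpha(name):
    """Return a copy of the named colormap whose alpha rises linearly from 0 to 1."""
    base = plt.get_cmap(name)
    colors = base(np.arange(base.N))                # RGBA table, shape (N, 4)
    colors[:, -1] = np.linspace(0, 1, base.N)       # overwrite the alpha column
    return ListedColormap(colors, name=f"{name}_fading")


fading_blue = with_fading_alpha("Blues")
```

The result can be passed as cmap=fading_blue to hist2d, exactly like fading_red above.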

How to shift quartile lines in seaborn grouped violin plots?

Consider the following seaborn grouped violinplot with split violins, where I inserted a small space in between.
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
fig, ax = plt.subplots()
sns.violinplot(
data=tips, x="day", y="total_bill", hue="smoker", split=True, inner="quart", linewidth=1,
palette={"Yes": "b", "No": ".85"}, ax=ax
)
sns.despine(left=True)
delta = 0.025
for ii, item in enumerate(ax.collections):
    if isinstance(item, matplotlib.collections.PolyCollection):
        path, = item.get_paths()
        vertices = path.vertices
        if ii % 2: # -> to right
            vertices[:, 0] += delta
        else: # -> to left
            vertices[:, 0] -= delta
plt.show()
How can I shift the quartile (and median) indicating dotted (and dashed) lines back inside the violins?

You can do it exactly the same way as you did with the violins:
for i, line in enumerate(ax.get_lines()):
    line.get_path().vertices[:, 0] += delta if i // 3 % 2 else -delta
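The index arithmetic appears to rely on inner="quart" drawing three lines per half violin (two quartiles and the median), so i // 3 numbers the half violins and % 2 alternates between left and right. A tiny sketch of the resulting shifts, without any plotting (the constant lines_per_half is an assumption named here for illustration):

```python
delta = 0.025
lines_per_half = 3  # assumption: two quartile lines plus one median line per half violin

# Shift applied to each of the first 12 lines (i.e. two split violins)
shifts = [delta if (i // lines_per_half) % 2 else -delta for i in range(12)]
print(shifts[:6])  # [-0.025, -0.025, -0.025, 0.025, 0.025, 0.025]
```

Lines 0 to 2 belong to a left half and move left, lines 3 to 5 belong to the matching right half and move right, and the pattern repeats for each violin.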

How to add a colorbar to a plt.bar chart?

I am trying to create a self-updating chart that displays a horizontal line and colors bars based on a y-axis value of interest. So bars might be colored red if they are definitely above this value (given a 95% confidence interval), blue if they are definitely below this value, or white if they contain this value. Something similar to this:
The problem I have is that I can't display the colorbar on my plot. I managed to color each bar based on a LinearSegmentedColormap and some conditions, but I can't manage to display this colorbar on my image.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
import matplotlib.axes
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
from matplotlib.cm import ScalarMappable
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
np.random.normal(43000,100000,3650),
np.random.normal(43500,140000,3650),
np.random.normal(48000,70000,3650)],
index=[1992,1993,1994,1995])
means = []
for i in df.index:
    means.append(df.loc[i].mean())
std = []
for i in df.index:
    std.append(df.loc[i].std())
# compute the 95% confidence intervals
conf = []
for i in range(len(means)):
    margin = (1.96*std[i])/sqrt(len(df.columns))
    conf.append(margin)
fig, ax = plt.subplots(1)
bars = plt.bar(df.index, means, yerr= conf, tick_label = df.index, capsize = 10)
#Setup the plot
yinterest = 43000
plt.gca().spines.get('top').set_visible(False)
plt.gca().spines.get('right').set_visible(False)
plt.axhline(yinterest, color = 'black', label = '43000')
#setting the y-interest tick
plt.draw()
labels = [w.get_text() for w in ax.get_yticklabels()]
locs=list(ax.get_yticks())
labels+=[str(yinterest)]
locs+=[float(yinterest)]
ax.set_yticks(locs)
ax.set_yticklabels(labels)
plt.draw()
#setting up the colormap
colormap = cm.get_cmap('RdBu', 10)
colores = []
for i in range(len(means)):
    color = (yinterest-(means[i]-conf[i]))/((means[i]+conf[i])-(means[i]-conf[i]))
    bars[i].set_color(colormap(color))
I am fairly new to python (or programming for that matter) and I have searched everywhere for a solution but to no avail. Any help would be appreciated.
Greetings.

The first hint is to use idiomatic pandas methods to compute the plot data (much more concise):
means = df.mean(axis=1)
std = df.std(axis=1)
conf = (std * 1.96 / sqrt(df.shape[1]))
And to draw your plot, run:
yinterest = 39541
fig, ax = plt.subplots(figsize=(10,6))
ax.spines.get('top').set_visible(False)
ax.spines.get('right').set_visible(False)
colors = (yinterest - (means - conf)) / (2 * conf)
colormap = plt.get_cmap('RdBu', 10)
plt.bar(df.index, means, yerr=conf, tick_label=df.index, capsize=10, color=colormap(colors))
cbar = plt.colorbar(plt.cm.ScalarMappable(cmap=colormap), ax=ax, orientation='horizontal')
cbar.set_label('Color', labelpad=5)
plt.axhline(yinterest, color='black', linestyle='--', linewidth=1)
plt.show()
One trick that avoids colouring the bars after they are created is to compute the color values first, map them through the colormap, and pass the result to plt.bar.
To draw the color bar, use plt.colorbar.
I changed the value of yinterest to the one in your picture and got something similar to your picture, but with a color bar.
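To see what the color formula does, here is a small worked sketch with made-up means and interval half-widths (not the values from the question). It expresses where yinterest sits inside each confidence interval as a number between 0 and 1; the explicit np.clip only makes visible what the colormap does anyway with out-of-range values.

```python
import numpy as np

yinterest = 43000
means = np.array([30000.0, 43000.0, 50000.0])  # made-up bar heights
conf = np.array([2000.0, 2000.0, 2000.0])      # made-up half-widths of the intervals

# 0 -> yinterest at the lower bound, 1 -> yinterest at the upper bound
raw = (yinterest - (means - conf)) / (2 * conf)
position = np.clip(raw, 0, 1)

print(raw)       # [ 3.75  0.5  -1.25]
print(position)  # [1.  0.5 0. ]
```

With 'RdBu', a position of 0 (bar entirely above yinterest) lands on the red end, 1 (bar entirely below) on the blue end, and values in between on the lighter colors.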

Boxplot and Data points side by side in one plot

plt.figure(figsize=(8,5))
sns.boxplot(x=df.StoreType, y=df.Sales)
I'm getting the above plot, but I want the boxplot and the data points side by side (not overlapping, using seaborn or matplotlib), like the one below:

The code below borrows from a couple of other SO answers:
the idea of offsetting the swarmplot is from: https://stackoverflow.com/a/56655927/42346
the width-changing code is from: https://stackoverflow.com/a/44542112/42346
If you have questions about the code, please tell me.
import seaborn as sns, numpy as np
import matplotlib, matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
# adjust these as per your data
boxplot_width = .25 # thinner to make room for having swarmplot beside
swarmplot_offset = -.5 # offset to left of boxplot
xlim_offset = -1 # necessary to show leftmost swarmplot
fig = plt.figure(figsize=(6,4))
ax = sns.swarmplot(x="day", y="total_bill", data=tips)
path_collections = [child for child in ax.get_children()
                    if isinstance(child,matplotlib.collections.PathCollection)]
for path_collection in path_collections:
    x,y = np.array(path_collection.get_offsets()).T
    xnew = x + swarmplot_offset
    offsets = list(zip(xnew,y))
    path_collection.set_offsets(offsets)
sns.boxplot(x="day", y="total_bill", data=tips, width=boxplot_width, ax=ax)
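def change_width(ax, new_value):
    for patch in ax.patches:
        # only Rectangle patches have a width; skip other patch types such as PathPatch boxes
        if not isinstance(patch, matplotlib.patches.Rectangle):
            continue
        current_width = patch.get_width()
        diff = current_width - new_value
        # change patch width
        patch.set_width(new_value)
        # re-center patch
        patch.set_x(patch.get_x() + diff * .5)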
change_width(ax,.25)
ax.set_xticklabels(ax.get_xticklabels(), ha="right") # align labels to left
ax.set_xlim(xlim_offset,ax.get_xlim()[1]) # to show leftmost swarmplot
plt.show()
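If seaborn is not a requirement, a similar side-by-side layout can be sketched in plain matplotlib by giving the boxes and the points separate x positions. Everything here (the random data, the group names, the offset and jitter sizes) is made up for illustration, and the points are simply jittered rather than arranged like a swarmplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
names = ["a", "b", "c"]                                   # made-up group names
groups = [rng.normal(loc, 1.0, 40) for loc in (5, 7, 6)]  # made-up data
centers = np.arange(len(groups))
offset = 0.2  # boxes go to the right of each tick, points to the left

fig, ax = plt.subplots(figsize=(6, 4))
# manage_ticks=False stops boxplot from moving the ticks to the box positions
bp = ax.boxplot(groups, positions=centers + offset, widths=0.25,
                manage_ticks=False)
for center, values in zip(centers, groups):
    jitter = rng.uniform(-0.08, 0.08, size=values.size)   # spread points sideways
    ax.scatter(center - offset + jitter, values, s=10, alpha=0.6)

ax.set_xticks(centers)
ax.set_xticklabels(names)
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.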

frequency trail in matplotlib

I'm looking into outlier detection. Brendan Gregg has a really nice article and I'm especially intrigued by his visualizations. One of the methods he uses is frequency trails.
I'm trying to reproduce this in matplotlib using this example, which looks like this:
And the plot is based on this answer: https://stackoverflow.com/a/4152016/948369
Now my issue is, as described by Brendan, that I have a continuous line that masks the outlier (I simplified the input values so you can still see them):
Any help on making the line "non-continuous" for non-existent values?

Seaborn also provides a very neat example, although they call it a joy/ridge plot: https://seaborn.pydata.org/examples/kde_ridgeplot.html
#!/usr/bin/python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, fill=True, alpha=1, lw=1.5, bw_method=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_method=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)

I would stick with a flat 2D plot and displace each level by a set vertical amount. You'll have to play with the levels (in the code below I called it displace) to properly see the outliers, but this does a pretty good job of replicating your target image. The key, I think, is to set the "zero" values to None so pylab does not draw them.
import numpy as np
import pylab as plt
import itertools
k = 20
X = np.linspace(0, 20, 500)
Y = np.zeros((k,X.size))
# Add some fake data
MU = np.random.random(k)
for n in range(k):
    Y[n] += np.exp(-(X-MU[n]*n)**2 / (1+n/3))
Y *= 50
# Add some outliers for show
Y += 2*np.random.random(Y.shape)
displace = Y.max()/4
# Add a cutoff
Y[Y<1.0] = None
face_colors = itertools.cycle(["#D3D820", "#C9CC54",
"#D7DA66", "#FDFE42"])
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='black')
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
for n,y in enumerate(Y):
    # Vertically displace each plot
    y0 = np.ones(y.shape) * n * displace
    y1 = y + n*displace
    plt.fill_between(X, y0,y1,lw=1,
                     facecolor=next(face_colors),
                     zorder=len(Y)-n)
plt.show()
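The masking trick can be seen in isolation with a handful of made-up values: wherever the data is replaced by NaN, matplotlib leaves a gap instead of connecting the neighbouring points. This is only a sketch; the cutoff of 1.0 mirrors the code above, and np.nan is used here in place of None.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(8)
y = np.array([0.2, 3.0, 4.0, 0.1, 0.3, 0.2, 5.0, 0.4])  # made-up values

# Replace everything below the cutoff with NaN so it is not drawn
y_masked = np.where(y < 1.0, np.nan, y)

fig, ax = plt.subplots()
ax.plot(x, y, color="0.7", label="continuous")
# markers keep isolated points (such as x=6) visible even without a line segment
ax.plot(x, y_masked, color="C1", marker="o", label="gaps where y < 1")
ax.legend()
```

The grey line runs through all values, while the orange one only appears where the data exceeds the cutoff, which is what keeps an isolated outlier from being swallowed by a continuous baseline.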
