Boxplot and Data points side by side in one plot

plt.figure(figsize=(8,5))
sns.boxplot(x=df.StoreType, y=df.Sales)
I'm getting the above plot, but I want the boxplot and the data points side by side (not overlapping), using seaborn or matplotlib, like the one below:

The code below borrows from a couple of other SO answers:
The idea of offsetting the swarmplot is from: https://stackoverflow.com/a/56655927/42346
The width-changing code is from: https://stackoverflow.com/a/44542112/42346
If you have questions about the code, please let me know.
import seaborn as sns, numpy as np
import matplotlib, matplotlib.pyplot as plt
import matplotlib.collections, matplotlib.patches

tips = sns.load_dataset('tips')

# adjust these as per your data
boxplot_width = .25     # thinner to make room for the swarmplot beside it
swarmplot_offset = -.5  # offset to the left of the boxplot
xlim_offset = -1        # necessary to show the leftmost swarmplot

fig = plt.figure(figsize=(6,4))
ax = sns.swarmplot(x="day", y="total_bill", data=tips)

# shift every swarm (one PathCollection per category) to the left
path_collections = [child for child in ax.get_children()
                    if isinstance(child, matplotlib.collections.PathCollection)]
for path_collection in path_collections:
    x, y = np.array(path_collection.get_offsets()).T
    xnew = x + swarmplot_offset
    offsets = list(zip(xnew, y))
    path_collection.set_offsets(offsets)

sns.boxplot(x="day", y="total_bill", data=tips, width=boxplot_width, ax=ax)

def change_width(ax, new_value):
    for patch in ax.patches:
        # only rectangles have a width; skip any other patch type
        if not isinstance(patch, matplotlib.patches.Rectangle):
            continue
        current_width = patch.get_width()
        diff = current_width - new_value
        # change patch width
        patch.set_width(new_value)
        # re-center patch
        patch.set_x(patch.get_x() + diff * .5)

change_width(ax, boxplot_width)
ax.set_xticklabels(ax.get_xticklabels(), ha="right")  # right-aligned labels extend to the left
ax.set_xlim(xlim_offset, ax.get_xlim()[1])            # to show the leftmost swarmplot
plt.show()
Example image:
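A version-independent alternative is to place both elements by hand with plain matplotlib. The sketch below is not part of the original answer: it assumes made-up data (the `groups` dictionary) and uses jittered scatter points instead of a true swarm, with `offset`, `box_width` and the jitter range chosen arbitrarily.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: three groups of 40 values each
groups = {
    "a": rng.normal(20, 5, 40),
    "b": rng.normal(25, 4, 40),
    "c": rng.normal(18, 6, 40),
}

offset = 0.2      # distance of each element from the category centre
box_width = 0.25  # narrow boxes leave room for the points

fig, ax = plt.subplots(figsize=(6, 4))
centres = np.arange(len(groups))

# Boxes sit to the right of each category centre
ax.boxplot(list(groups.values()), positions=centres + offset,
           widths=box_width, showfliers=False)

# Raw points sit to the left, with a little horizontal jitter
for centre, values in zip(centres, groups.values()):
    jitter = rng.uniform(-0.08, 0.08, size=values.size)
    ax.scatter(centre - offset + jitter, values, s=10, alpha=0.6)

# Put one tick per category back at the category centre
ax.set_xticks(centres)
ax.set_xticklabels(list(groups.keys()))
```

Because the positions are set explicitly, nothing has to be moved after plotting; call plt.show() or fig.savefig(...) to view the result.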

Related

Removing legend from mpl parallel coordinates plot?

I have a parallel coordinates plot with lots of data points so I'm trying to use a continuous colour bar to represent that, which I think I have worked out. However, I haven't been able to remove the default key that is put in when creating the plot, which is very long and hinders readability. Is there a way to remove this table to make the graph much easier to read?
This is the code I'm currently using to generate the parallel coordinates plot:
import matplotlib as mpl
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# data is the DataFrame holding the columns listed below
parallel_coordinates(data[[' male_le','female_le','diet','activity','obese_perc','median_income']],
                     'median_income', colormap = 'rainbow', alpha = 0.5)
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
cmap = mpl.cm.rainbow
bounds = [0.00,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N,)
plt.colorbar(mpl.cm.ScalarMappable(norm = norm, cmap=cmap), cax = ax, orientation = 'horizontal',
             label = 'normalised median income', alpha = 0.5)
plt.show()
Current Output:
I want my legend to be represented as a color bar, like this:
Any help would be greatly appreciated. Thanks.

You can use ax.legend_.remove() to remove the legend.
The cax parameter of plt.colorbar names the subplot into which the colorbar is drawn. If you leave it out, matplotlib creates a new subplot for the colorbar, "stealing" space from the current subplot (in matplotlib code, a subplot is usually called ax). So here, leaving out cax creates the desired colorbar; adding ax=ax isn't strictly necessary, because ax is already the current subplot.
The code below uses seaborn's penguin dataset to create a standalone example.
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
from pandas.plotting import parallel_coordinates

penguins = sns.load_dataset('penguins')
fig, ax = plt.subplots(figsize=(10, 4))
cmap = plt.get_cmap('rainbow')
bounds = np.arange(penguins['body_mass_g'].min(), penguins['body_mass_g'].max() + 200, 200)
norm = mpl.colors.BoundaryNorm(bounds, 256)
penguins = penguins.dropna(subset=['body_mass_g'])
parallel_coordinates(penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']],
                     'body_mass_g', colormap=cmap, alpha=0.5, ax=ax)
ax.legend_.remove()
plt.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
             ax=ax, orientation='horizontal', label='body mass', alpha=0.5)
plt.show()
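The difference between cax and ax described above can be checked in isolation. This is a minimal sketch with empty axes and a made-up ScalarMappable (the variable names are illustrative only): with ax=, a new axes is created and the parent shrinks; with cax=, the colorbar is drawn into an axes you already own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib as mpl
import matplotlib.pyplot as plt

mappable = mpl.cm.ScalarMappable(norm=mpl.colors.Normalize(0, 1), cmap="rainbow")

# ax=...: matplotlib creates a new axes for the colorbar and shrinks ax_main
fig_steal, ax_main = plt.subplots()
width_before = ax_main.get_position().width
fig_steal.colorbar(mappable, ax=ax_main)
width_after = ax_main.get_position().width

# cax=...: the colorbar fills the axes that is passed in; no axes is added
fig_own, (ax_plot, ax_bar) = plt.subplots(1, 2)
fig_own.colorbar(mappable, cax=ax_bar)

print("axes in first figure:", len(fig_steal.axes))
print("axes in second figure:", len(fig_own.axes))
print("parent axes shrank:", width_after < width_before)
```

In the question, passing cax=ax to a separate 6x1 figure is what produced a standalone colorbar instead of one attached to the plot.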

How to add a colorbar to a plt.bar chart?

I am trying to create a self-updating chart that displays a horizontal line and colors the bars based on a y-axis value of interest. Bars should be colored red if they are definitely above this value (given a 95% confidence interval), blue if they are definitely below it, or white if the interval contains it. Something similar to this:
The problem is that I can't display the colorbar on my plot. I managed to color each bar based on a LinearSegmentedColormap and some conditions, but I can't get the colorbar to show up in my image.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
import matplotlib.axes
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
from matplotlib.cm import ScalarMappable

np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])

means = []
for i in df.index:
    means.append(df.loc[i].mean())
std = []
for i in df.index:
    std.append(df.loc[i].std())

# compute the 95% confidence intervals
conf = []
for i in range(len(means)):
    margin = (1.96*std[i])/sqrt(len(df.columns))
    conf.append(margin)

fig, ax = plt.subplots(1)
bars = plt.bar(df.index, means, yerr= conf, tick_label = df.index, capsize = 10)

# Set up the plot
yinterest = 43000
plt.gca().spines.get('top').set_visible(False)
plt.gca().spines.get('right').set_visible(False)
plt.axhline(yinterest, color = 'black', label = '43000')

# setting the y-interest tick
plt.draw()
labels = [w.get_text() for w in ax.get_yticklabels()]
locs = list(ax.get_yticks())
labels += [str(yinterest)]
locs += [float(yinterest)]
ax.set_yticks(locs)         # set the positions first, then the matching labels
ax.set_yticklabels(labels)
plt.draw()

# setting up the colormap
colormap = plt.get_cmap('RdBu', 10)
colores = []
for i in range(len(means)):
    color = (yinterest-(means[i]-conf[i]))/((means[i]+conf[i])-(means[i]-conf[i]))
    bars[i].set_color(colormap(color))
I am fairly new to python (or programming for that matter) and I have searched everywhere for a solution but to no avail. Any help would be appreciated.
Greetings.

The first hint is to use pandas' built-in vectorized methods to compute the plot data
(much more concise):
means = df.mean(axis=1)
std = df.std(axis=1)
conf = (std * 1.96 / sqrt(df.shape[1]))
And to draw your plot, run:
yinterest = 39541
fig, ax = plt.subplots(figsize=(10,6))
ax.spines.get('top').set_visible(False)
ax.spines.get('right').set_visible(False)
colors = (yinterest - (means - conf)) / (2 * conf)
colormap = plt.get_cmap('RdBu', 10)
plt.bar(df.index, means, yerr=conf, tick_label=df.index, capsize=10, color=colormap(colors))
# pass ax explicitly so matplotlib knows which axes to take the space from
cbar = plt.colorbar(plt.cm.ScalarMappable(cmap=colormap), ax=ax, orientation='horizontal')
cbar.set_label('Color', labelpad=5)
plt.axhline(yinterest, color='black', linestyle='--', linewidth=1)
plt.show()
One trick avoids colouring the bars after they are created: I compute
colors (each bar's position relative to yinterest), map them through
the colormap, and pass the result to plt.bar as the color argument.
To draw the color bar, use plt.colorbar.
I changed the value of yinterest to the one shown in your picture and got
something similar to your picture, but with a color bar:
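To see what the colors expression evaluates to, here is a small worked example with made-up means and margins (the numbers and the helper name interval_position are illustrative, not taken from the data above). The value is the fraction of each confidence interval that lies below yinterest; the explicit clip only makes visible what the colormap does anyway with values outside [0, 1].

```python
import numpy as np

def interval_position(mean, margin, y):
    """Fraction of [mean - margin, mean + margin] lying below y, clipped to [0, 1]."""
    frac = (y - (mean - margin)) / (2 * margin)
    return np.clip(frac, 0.0, 1.0)

# Hypothetical bars: one clearly below, one centred on, one clearly above y
means = np.array([30000.0, 43000.0, 50000.0])
margins = np.array([2000.0, 2000.0, 2000.0])

fractions = interval_position(means, margins, 43000)
print(fractions)
```

With the 'RdBu' colormap, 0 maps to red and 1 to blue, so a bar whose whole interval lies above the line comes out red, one entirely below comes out blue, and one centred on the line lands in the pale middle of the map.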

How to obtain correct size for a second colorbar in matplotlib plot?

I wish to create a "split plot", i.e. to use different colormaps in the left and right halves of my plot. Accordingly, I need two different colorbars. Unfortunately, I have to set the position of the second colorbar by hand and adjust it every time a label or title is added. Is there a way to automate that?
I wondered whether I could extract the rect parameter of the right colorbar in the following minimal example. That would help, as I would then only have to shift it a bit. Any other (or better) idea is also welcome.
At the moment, whenever I change the labels or the title a bit, the manually set position of the left colorbar has to be modified again, which is very annoying. I include a running minimal example and the output it produces:
import matplotlib as mpl
params = {
    'xtick.direction' : 'out',
    'ytick.direction' : 'out',
    'text.usetex' : True,
}
mpl.rcParams.update(params)
mpl.rcParams.update({'figure.autolayout': True})
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors

extent_arr1 = [-1,0, -1,1]
extent_arr2 = [ 0,1, -1,1]
M = 501

# define test-data
data_arr1 = np.zeros((M, M))
data_arr2 = np.ones((M, M))

# define figure
fig = plt.figure()
ax = fig.add_subplot(111)

# left plot:
image1 = ax.imshow(data_arr1, cmap='jet', interpolation='bilinear', extent=extent_arr1,
                   origin='lower')
plt.title("Minimal example")
cbar1 = plt.colorbar(image1)

# right plot:
image2 = ax.imshow(data_arr2, cmap='gnuplot', interpolation='bilinear', extent=extent_arr2,
                   origin='lower')

# define axes-labels:
plt.xlabel(r"$x$")
plt.ylabel(r"$y$")

# define colour-bar at left side:
rect_loc = [0.0, 0.08, 0.03, 0.88]  # define position ---> how to automatise this?
cax2 = fig.add_axes(rect_loc)       # left | bottom | width | height
cbar2 = plt.colorbar(image2, cax=cax2)
cbar2.ax.yaxis.set_ticks_position('left')

# set limits:
ax.set_xlim(-1,1)
ax.set_ylim(-1,1)
plt.show()
output:
Thanks in advance!

There are of course several ways to create a colorbar axes and put it next to a plot. I would recommend reading those questions:
positioning the colorbar
Matplotlib 2 Subplots, 1 Colorbar
Many of those concepts can be extended to a second colorbar. The solution I would personally prefer is the following, which uses an axes divider. The advantage is that the colorbar keeps the size of the axes.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np; np.random.seed(1)
plt.rcParams.update({'figure.autolayout': True})
fig, ax = plt.subplots(figsize=(6,4))
im = ax.imshow(np.random.rand(10,10), extent=[-1,0,0,1], cmap="RdYlGn")
im2 = ax.imshow(np.random.rand(10,10), extent=[0,1,0,1], cmap="magma")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_xlim(-1,1)
ax.set_ylim(0,1)
divider = make_axes_locatable(ax)
cax = divider.new_horizontal(size="5%", pad=0.2)
fig.add_axes(cax)
fig.colorbar(im2, cax=cax)
cax2 = divider.new_horizontal(size="5%", pad=0.7, pack_start=True)
fig.add_axes(cax2)
cb2 = fig.colorbar(im, cax=cax2)
cb2.ax.yaxis.set_ticks_position('left')
plt.show()
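A shorter route, not part of the answer above, is to let fig.colorbar place both bars through its location argument. This is a sketch that assumes a reasonably recent Matplotlib; the data and the pad value are arbitrary. Unlike the axes-divider solution, the colorbars here are not guaranteed to match the image height when imshow fixes the aspect ratio.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, ax = plt.subplots(figsize=(6, 4))
im_left = ax.imshow(rng.random((10, 10)), extent=[-1, 0, 0, 1], cmap="RdYlGn")
im_right = ax.imshow(rng.random((10, 10)), extent=[0, 1, 0, 1], cmap="magma")
ax.set_xlim(-1, 1)
ax.set_ylim(0, 1)

# Each call takes space from ax on the requested side
cb_right = fig.colorbar(im_right, ax=ax, location="right")
cb_left = fig.colorbar(im_left, ax=ax, location="left", pad=0.15)

# Horizontal start of each axes, in figure fractions
x_left = cb_left.ax.get_position().x0
x_main = ax.get_position().x0
x_right = cb_right.ax.get_position().x0
```

No rect has to be maintained by hand, so adding a title or axis labels does not require repositioning anything.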

frequency trail in matplotlib

I'm looking into outlier detection. Brendan Gregg has a really nice article on it, and I'm especially intrigued by his visualizations. One of the methods he uses is frequency trails.
I'm trying to reproduce this in matplotlib using this example, which looks like this:
The plot is based on this answer: https://stackoverflow.com/a/4152016/948369
My issue, as described by Brendan, is that I get a continuous line that masks the outliers (I simplified the input values so you can still see them):
How can I make the line "non-continuous" where there are no values?

Seaborn also provides a very neat example, although they call it a joy plot or ridge plot: https://seaborn.pydata.org/examples/kde_ridgeplot.html
#!/usr/bin/python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m

# Initialize the FacetGrid object (height= replaces the older size= argument)
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)

# Draw the densities in a few steps
# (fill= and bw_method= replace the older shade= and bw= arguments)
g.map(sns.kdeplot, "x", clip_on=False, fill=True, alpha=1, lw=1.5, bw_method=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_method=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)

# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)

g.map(label, "x")

# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)

# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)

I would stick with a flat 2D plot and displace each level by a set vertical amount. You'll have to play with the displacement (called displace in the code below) to see the outliers properly, but this does a pretty good job of replicating your target image. The key, I think, is to set the "zero" values to NaN (assigning None to a float array has the same effect) so that pylab does not draw them.
import numpy as np
import pylab as plt
import itertools

k = 20
X = np.linspace(0, 20, 500)
Y = np.zeros((k,X.size))

# Add some fake data
MU = np.random.random(k)
for n in range(k):
    # integer division keeps the behaviour of the original Python 2 code
    Y[n] += np.exp(-(X-MU[n]*n)**2 / (1+n//3))
Y *= 50

# Add some outliers for show
Y += 2*np.random.random(Y.shape)
displace = Y.max()/4

# Add a cutoff: values below it become NaN and are not drawn
Y[Y<1.0] = np.nan

face_colors = itertools.cycle(["#D3D820", "#C9CC54",
                               "#D7DA66", "#FDFE42"])
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='black')
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)

for n,y in enumerate(Y):
    # Vertically displace each plot
    y0 = np.ones(y.shape) * n * displace
    y1 = y + n*displace
    plt.fill_between(X, y0, y1, lw=1,
                     facecolor=next(face_colors),
                     zorder=len(Y)-n)
plt.show()
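The masking step is the part that breaks the line, and it can be tried on a single trail. The following sketch uses made-up data (one Gaussian peak plus a short artificial outlier) and an arbitrary cutoff; values below the cutoff become NaN, and matplotlib skips NaN samples in both plot and fill_between.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
y = np.exp(-(x - 3) ** 2)  # one main peak around x = 3
y[178:183] = 0.4           # a short, isolated outlier far from the peak

cutoff = 0.01
y_masked = np.where(y < cutoff, np.nan, y)  # NaN marks "no data"

fig, ax = plt.subplots()
ax.fill_between(x, 0, y_masked)                  # filled only where finite
(line,) = ax.plot(x, y_masked, color="k", lw=1)  # line breaks at every NaN
```

The outlier spans several samples here because a single finite sample surrounded by NaN has no neighbours to connect to and would not be drawn as a line segment.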

Tick properties for scatterplot matrices with Matplotlib

I am trying to plot a scatterplot matrix based on the code written by Joe Kington: Is there a function to make scatterplot matrices in matplotlib?
Some people already helped me: Thank you again (especially J.K.).
I have one last problem: I cannot rotate the tick labels on the axes where the numbers overlap (bottom left):
I would like to make them vertical, but I cannot manage it. Here is my code:
import itertools
import numpy as np
import pylab as plot
import scipy
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import axis
from matplotlib.ticker import MaxNLocator
import math
from matplotlib import rc
import os
import platform

def main():
    FigSize=8.89
    FontSize=8
    np.random.seed(1977)
    numvars, numdata = 4, 10
    data = 10 * np.random.random((numvars, numdata))
    fig = scatterplot_matrix(data, ['mpg', 'disp', 'drat', 'wt'], FigSize, FontSize,
                             linestyle='none', marker='o', color='black', mfc='none', markersize=3,)
    fig.suptitle('Simple Scatterplot Matrix')
    plt.savefig('Plots/ScatterplotMatrix/ScatterplotMatrix2.pdf',format='pdf', dpi=1000, transparent=True, bbox_inches='tight')
    plt.show()

def scatterplot_matrix(data, names, FigSize, FontSize, **kwargs):
    """Plots a scatterplot matrix of subplots. Each row of "data" is plotted
    against other rows, resulting in a nrows by nrows grid of subplots with the
    diagonal subplots labeled with "names". Additional keyword arguments are
    passed on to matplotlib's "plot" command. Returns the matplotlib figure
    object containing the subplot grid."""
    legend=['(kPa)','\%','\%','\%']
    numvars, numdata = data.shape
    fig, axes = plt.subplots(nrows=numvars, ncols=numvars, figsize=(FigSize/2.54,FigSize/2.54))
    fig.subplots_adjust(hspace=0.05, wspace=0.05)
    sub_labelx_top=[2,4]
    sub_labelx_bottom=[13,15]
    sub_labely_left=[5,13]
    sub_labely_right=[4,12]
    for i, ax in enumerate(axes.flat, start=1):
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        ax.xaxis.set_major_locator(MaxNLocator(prune='both',nbins=4))
        ax.yaxis.set_major_locator(MaxNLocator(prune='both',nbins=4)) #http://matplotlib.org/api/ticker_api.html#matplotlib.ticker.MaxNLocator
        # Set up ticks only on one side for the "edge" subplots...
        # (recent matplotlib exposes the is_first_col() family on the subplot spec)
        spec = ax.get_subplotspec()
        if spec.is_first_col():
            ax.yaxis.set_ticks_position('left')
            ax.tick_params(direction='out')
            ax.yaxis.set_tick_params(labelsize=0.75*FontSize)
            if i in sub_labely_left:
                ax.yaxis.set_label_position('left')
                ax.set_ylabel('(\%)',fontsize=0.75*FontSize)
        if spec.is_last_col():
            ax.yaxis.set_ticks_position('right')
            ax.tick_params(direction='out')
            ax.yaxis.set_tick_params(labelsize=0.75*FontSize)
            if i in sub_labely_right:
                ax.yaxis.set_label_position('right')
                if i==4:
                    ax.set_ylabel('(kPa)',fontsize=0.75*FontSize)
                else:
                    ax.set_ylabel('(\%)',fontsize=0.75*FontSize)
        if spec.is_first_row():
            ax.xaxis.set_ticks_position('top')
            ax.tick_params(direction='out')
            ax.xaxis.set_tick_params(labelsize=0.75*FontSize)
            if i in sub_labelx_top:
                ax.xaxis.set_label_position('top')
                ax.set_xlabel('(\%)',fontsize=0.75*FontSize)
        if spec.is_last_row():
            ax.xaxis.set_ticks_position('bottom')
            ax.tick_params(direction='out')
            ax.xaxis.set_tick_params(labelsize=0.75*FontSize)
            if i in sub_labelx_bottom:
                ax.xaxis.set_label_position('bottom')
                if i==13:
                    ax.set_xlabel('(kPa)',fontsize=0.75*FontSize)
                else:
                    ax.set_xlabel('(\%)',fontsize=0.75*FontSize)
    # Plot the data.
    for i, j in zip(*np.triu_indices_from(axes, k=1)):
        for x, y in [(i,j), (j,i)]:
            axes[x,y].plot(data[y], data[x], **kwargs)
    # Label the diagonal subplots...
    for i, label in enumerate(names):
        axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
                           ha='center', va='center',fontsize=FontSize)
    # Turn on the proper x or y axes ticks.
    for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
        axes[j,i].xaxis.set_visible(True)
        axes[i,j].yaxis.set_visible(True)
    return fig

main()
My second question is more for fun: how can I make the subplots perfectly square?
I apologize to Joe Kington; I know my code is far less elegant than his. I only started a few weeks ago. If you have any suggestions to improve mine, for example to make it more dynamic, I am very interested.

You can rotate the xtick labels using setp.
from matplotlib.artist import setp
Then, after you set the x tick positions for the top row and left column of subplots, call:
setp(ax.get_xticklabels(), rotation=90)
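An equivalent that avoids collecting the label objects is tick_params with labelrotation. This is a minimal sketch on an empty 2x2 grid, assuming a Matplotlib version whose tick_params accepts labelrotation; only the bottom row is rotated here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)

# Rotate the x tick labels of the bottom row only
for ax in axes[-1, :]:
    ax.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()  # make sure the tick labels exist before inspecting them

bottom_rotations = [lbl.get_rotation() for ax in axes[-1, :] for lbl in ax.get_xticklabels()]
top_rotations = [lbl.get_rotation() for ax in axes[0, :] for lbl in ax.get_xticklabels()]
```

In the scatterplot matrix this call would go inside the is_first_row / is_last_row branches, next to the existing tick_params calls.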
To make the subplots square, you can use fig.subplots_adjust to set the area covered by all the subplots to a square. Something like this:
gridSize = 0.6
leftBound = 0.5 - gridSize/2
bottomBound = 0.1
rightBound = leftBound + gridSize
topBound = bottomBound + gridSize
fig.subplots_adjust(hspace=0.05, wspace=0.05, left=leftBound,
                    bottom=bottomBound, right=rightBound, top=topBound)
If the figure size isn't square, you'll need to change the shape of the grid accordingly. Alternately, you could add each subplot axes individually with fig.add_axes. That will allow you to set the size directly but you'll also have to set the location.
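For a non-square figure, the grid's width fraction has to be scaled by the figure's aspect ratio so that width and height are equal in inches. The helper below is a hypothetical sketch (square_grid_bounds and its parameters are not part of matplotlib); it assumes the grid height is given as a fraction of the figure height and that the grid is centred horizontally.

```python
def square_grid_bounds(fig_width, fig_height, grid_size=0.6, bottom=0.1):
    """Bounds for fig.subplots_adjust that give a physically square grid.

    grid_size is the grid height as a fraction of the figure height.
    """
    # Same length in inches as the height, expressed as a width fraction
    width_frac = grid_size * fig_height / fig_width
    if width_frac > 1 or bottom + grid_size > 1:
        raise ValueError("grid does not fit in the figure; reduce grid_size")
    left = 0.5 - width_frac / 2  # centre the grid horizontally
    return {"left": left, "right": left + width_frac,
            "bottom": bottom, "top": bottom + grid_size}

# Example: an 8 x 4 inch figure
bounds = square_grid_bounds(8, 4)
grid_width_in = (bounds["right"] - bounds["left"]) * 8
grid_height_in = (bounds["top"] - bounds["bottom"]) * 4
print(bounds, grid_width_in, grid_height_in)
```

The result can be passed on as fig.subplots_adjust(hspace=0.05, wspace=0.05, **bounds). With equal hspace and wspace and the same number of rows and columns, a square grid gives square subplots.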
Don't use bbox_inches='tight' when saving the figure, or you'll lose the title with these settings. You can save it like this:
plt.savefig('ScatterplotMatrix.pdf',format='pdf', dpi=1000, transparent=True)
The resulting graph looks like this:
