Increasing space between bins in seaborn distplot - python

I have a probably simple question. I created a histogram with seaborn from data in an Excel file. For better visualization, I would like to have some space between the bars/bins. Is that possible?
My code looks as follows:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf')
df = pd.read_excel('test.xlsx')
sns.set_style("white")
#sns.set_style("dark")
plt.figure(figsize=(12,10))
plt.xlabel('a', fontsize=18)
plt.ylabel('test2', fontsize=18)
plt.title ('tests ^2', fontsize=22)
ax = sns.distplot(st,bins=34, kde=False, hist_kws={'range':(0,1), 'edgecolor':'black', 'alpha':1.0}, axlabel='test1')
A second question, though a bit off topic: how do I get the exponent in the chart title to actually be raised (superscript)?
Thanks!

The matplotlib hist function has an argument rwidth
rwidth : scalar or None, optional
The relative width of the bars as a fraction of the bin width.
You can use this inside the distplot via the hist_kws argument.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.random.normal(0.5,0.2,1600)
ax = sns.distplot(x, bins=34, kde=False,
                  hist_kws={"rwidth": 0.75, 'edgecolor': 'black', 'alpha': 1.0})
plt.show()
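The second part of the question (the raised exponent in the title) is not covered above. A minimal sketch, assuming matplotlib's built-in mathtext is acceptable: wrap the exponent in dollar signs inside a raw string. The title text and font size come from the question; the Agg backend line is only an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 10))
# $^2$ is rendered by mathtext as a superscript 2
ax.set_title(r"tests$^2$", fontsize=22)
fig.canvas.draw()  # forces the title to be rendered
```

The same raw string should also work with plt.title(), as used in the question.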

For seaborn >= 0.11, use the shrink parameter. It scales the width of each bar relative to the bin width; the rest is left as empty space.
Documentation: https://seaborn.pydata.org/generated/seaborn.histplot.html
Edit:
The OP originally asked about sns.distplot(), but it is deprecated in favor of sns.histplot() or sns.displot() as of version 0.11. Since the OP is generating a histogram, both histplot and displot in hist mode accept shrink.

After posting my answer, I realized I answered the opposite of what was being asked. I found this question while trying to figure out how to remove the space between bars. I almost deleted my answer, but in case anyone else stumbles on this question and is trying to remove the space between bars in seaborn's histplot, I'll leave it for now.
Thanks to @miro for the link to seaborn's updated documentation, I found that element='step' worked for me. Depending on exactly what you want, element='poly' may be what you are after.
My implementation with 'step':
fig,axs = plt.subplots(4,2,figsize=(10,10))
i,j = 0,0
for col in cols:
    sns.histplot(df[col],ax=axs[i,j],bins=100,element='step')
    axs[i,j].set(title="",ylabel='Frequency',xlabel=labels[col])
    i+=1
    if i == 4:
        i = 0
        j+=1
My implementation with 'poly':
fig,axs = plt.subplots(4,2,figsize=(10,10))
i,j = 0,0
for col in cols:
    sns.histplot(df[col],ax=axs[i,j],bins=100,element='poly')
    axs[i,j].set(title="",ylabel='Frequency',xlabel=labels[col])
    i+=1
    if i == 4:
        i = 0
        j+=1

Related

Make width of seaborn facets proportional to the range of data along the x axis

I have used FacetGrid() from the seaborn module to break a line graph into segments with labels for each region as the title of each subplot. I saw the option in the documentation to have the x-axes be independent. However, I could not find anything related to having the plot sizes correspond to the size of each axis.
The code I used to generate this plot, along with the plot, are found below.
import matplotlib.pyplot as plt
import seaborn as sns
# Added during Edit 1.
sns.set()
graph = sns.FacetGrid(rmsf_crys, col = "Subunit", sharex = False)
graph.map(plt.plot, "Seq", "RMSF")
graph.set_titles(col_template = '{col_name}')
plt.show()
Plot resulting from the above code
Edit 1
Updated plot code using relplot() instead of calling FacetGrid() directly. The final result is the same graph.
import matplotlib.pyplot as plt
import seaborn as sns
# Forgot to include this in the original code snippet.
sns.set()
graph = sns.relplot(data = rmsf_crys, x = "Seq", y = "RMSF",
                    col = "Subunit", kind = "line",
                    facet_kws = dict(sharex=False))
graph.set_titles(col_template = '{col_name}')
plt.show()
Full support for this would need to live at the matplotlib layer, and I don't believe it's currently possible to have independent axes but shared transforms. (Someone with deeper knowledge of the matplotlib scale internals may prove me wrong).
But you can get pretty close by calculating the x range you'll need ahead of time and using that to parameterize the gridspec for the facets:
import numpy as np, seaborn as sns
tips = sns.load_dataset("tips")
xranges = tips.groupby("size")["total_bill"].agg(np.ptp)
xranges *= 1.1 # Account for default margins
sns.relplot(
    data=tips, kind="line",
    x="total_bill", y="tip",
    col="size", col_order=xranges.index,
    height=3, aspect=.65,
    facet_kws=dict(sharex=False, gridspec_kws=dict(width_ratios=xranges))
)

Is there a way to adjust the axes limits of pairplot(), but not as individual plots?

Is there a way to adjust the axes limits of pairplot(), but not as individual plots? Maybe a setting to produce better axes limits?
I would like the plots to have a bigger range on the axes. My plots' axes allow all the data to be visualized, but they are too 'zoomed in'.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
g = sns.pairplot(iris, hue = 'species', diag_kind = 'hist', palette = 'Dark2', plot_kws={"s": 20})
A link to my plot, and to what I would like the plot to look like, is here:
pairplot
To change the subplots, g.map(func, <parameters>) can be used. A small problem is that func needs to accept color as a parameter, and plt.margins() raises an error when color is passed. Moreover, map uses x and y to indicate the row and column variables. You could write a dummy function that simply calls plt.margins(), for example g.map(lambda *args, **kwargs: plt.margins(x=0.2, y=0.3)).
An alternative is to loop through g.axes.flat and call ax.margins() on each of them. Note that many axes are shared in x and/or y direction. The diagonal is treated differently; for some reason ax.margins needs to be called a second time on the diagonal.
To have the histogram for the different colors stacked instead of overlapping, diag_kws={"multiple": "stack"} can be set.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2',
                 plot_kws={"s": 20}, diag_kws={"multiple": "stack"})
# g.map(plt.margins, x=0.2, y=0.2) # gives an error
for ax in g.axes.flat:
    ax.margins(x=0.2, y=0.2)
for ax in g.diag_axes:
    ax.margins(y=0.2)
plt.show()
PS: Yet another option is to change the rcParams, which affects all plots created later in the code:
import matplotlib as mpl
mpl.rcParams['axes.xmargin'] = 0.2
mpl.rcParams['axes.ymargin'] = 0.2
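If the global change is too broad, one possible variation is to scope the same settings with matplotlib's rc_context. This is only a sketch with a plain line plot rather than a pairplot; the data values are made up, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib as mpl
import matplotlib.pyplot as plt

default_xmargin = mpl.rcParams['axes.xmargin']

# Create the axes inside the block so they pick up the temporary margins
with mpl.rc_context({'axes.xmargin': 0.2, 'axes.ymargin': 0.2}):
    fig, ax = plt.subplots()
    ax.plot([0, 10], [0, 5])

# After the block, the global setting is back to its previous value
restored = mpl.rcParams['axes.xmargin'] == default_xmargin
```

Figures created after the with block use the previous margins again, so later plots in the same script are unaffected.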

How to add a new color in matplotlib graph (or use colormaps)?

I have seen several similar questions, but none that answers my precise question, so please bear with me before marking it as a duplicate.
I have to plot n lines on a graph.
The standard value of n is 10 but it can vary. Previously, I used
ax.set_color_cycle([plt.cm.Accent(i) for i in np.linspace(0, 1, limit)])
Which works well for me.
However, this method has recently been deprecated, and my program now gives a warning telling me to use set_prop_cycle instead.
I wasn't able to find any example in the documentation http://matplotlib.org/api/axes_api.html that suits my need to use a colormap such as Accent, hot, or cool.
Update
Sample code for testing:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
limit=10
ax.set_color_cycle([plt.cm.Accent(i) for i in np.linspace(0, 1, limit)])
for limit in range(1,limit+1):
    x=np.random.randint(10, size=10)
    y=limit*x
    plt.plot(x,y,lw=2, label=limit)
plt.legend(loc='best')
What this shows me is:
The question is: how can I use a colormap to choose between different shades of a color, or something of the sort? http://matplotlib.org/users/colormaps.html
You need a slightly different formulation in order to make use of the more generic set_prop_cycle. You have to create a cycler for the argument 'color' with plt.cycler and for the rest do it exactly as you did before:
import matplotlib.pyplot as plt
import numpy as np
limit=10
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_prop_cycle(plt.cycler('color', plt.cm.Accent(np.linspace(0, 1, limit))))
for limit in range(1,limit+1):
    x=np.random.randint(10, size=10)
    y=limit*x
    plt.plot(x,y,lw=2, label=limit)
plt.legend(loc='best')
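For the "different shades of a color" part of the question, the same cycler approach can be pointed at a sequential colormap. The sketch below assumes the Blues colormap and a lower bound of 0.3 (to skip the near-white end); both are illustrative choices, not something from the original answer, and n_lines is a made-up name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

n_lines = 10
# Sample a sequential colormap; each row of `colors` is one RGBA shade
colors = plt.cm.Blues(np.linspace(0.3, 1.0, n_lines))

fig, ax = plt.subplots()
ax.set_prop_cycle(plt.cycler('color', colors))

x = np.arange(10)
for k in range(1, n_lines + 1):
    # Each new line takes the next shade from the cycle
    ax.plot(x, k * x, lw=2, label=k)
ax.legend(loc='best')
```

Swapping plt.cm.Blues for another colormap (for example hot or cool, as mentioned in the question) should only require changing that one name.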

secondary_y=True changes x axis in pandas

I'm trying to plot two series together in Pandas, from different dataframes.
Both of their axes are datetime objects, so they can be plotted together:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot()
plt.plot()
Yields:
All fine, but I need the green graph to have its own scale. So I use secondary_y=True:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot(secondary_y=True)
plt.plot()
This secondary_y creates a problem, as instead of having the desired graph, I have the following:
Any help with this is hugely appreciated.
(Less relevant notes: I'm (evidently) using Pandas and Matplotlib, and all of this is in an IPython notebook.)
EDIT:
I've since noticed that removing the resample("W") solves the issue. It is still a problem however as the non-resampled data is too noisy to be visible. Being able to plot sampled data with a secondary axis would be hugely helpful.
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
df.a = df.a*100
fig, ax1 = plt.subplots(1,1)
df.a.plot(ax=ax1, color='blue', label='a')
ax2 = ax1.twinx()
df.b.plot(ax=ax2, color='green', label='b')
ax1.set_ylabel('a')
ax2.set_ylabel('b')
ax1.legend(loc=3)
ax2.legend(loc=0)
plt.show()
I had the same issue, always getting a strange plot when I wanted a secondary_y.
I don't know why no-one mentioned this method in this post, but here's how I got it to work, using the same example as cphlewis:
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
ax = df.plot(secondary_y=['b'])
plt.show()
Here's what it'll look like
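The EDIT in the question asks for resampled data on a secondary axis. One possible sketch, assuming weekly means are the intended aggregation (the question's resample("W") has no explicit aggregation), combines the twinx approach above with matplotlib's own plot calls. The data and the names prices, sentiment, and weekly are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the question's two dataframes
idx = pd.date_range("2015-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
prices = pd.Series(300 + rng.normal(0, 1, 120).cumsum(), index=idx)
sentiment = pd.Series(rng.normal(0, 1, 120), index=idx)

# Aggregate explicitly before plotting
weekly = sentiment.resample("W").mean()

fig, ax1 = plt.subplots()
ax1.plot(prices.index, prices.values, color="blue", label="Close")
ax1.set_ylabel("Close")

# Second y-axis that shares the same datetime x-axis
ax2 = ax1.twinx()
ax2.plot(weekly.index, weekly.values, color="green", label="weekly mean")
ax2.set_ylabel("weekly mean")
```

Passing the index and values to ax.plot directly keeps both series on plain datetime coordinates, which sidesteps any differences in how pandas handles the x-axis for series with different frequencies.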

How to change figuresize using seaborn factorplot

%pylab inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import seaborn as sns
typessns = pd.DataFrame.from_csv('C:/data/testesns.csv', index_col=False, sep=';')
mpl.rc("figure", figsize=(45, 10))
sns.factorplot("MONTH", "VALUE", hue="REGION", data=typessns, kind="box", palette="OrRd");
I always get a small figure, no matter what size I've specified in figsize...
How to fix it?
Note added in 2019: In modern seaborn versions the size argument has been renamed to height.
To be a little more concrete:
%matplotlib inline
import seaborn as sns
exercise = sns.load_dataset("exercise")
# Defaults are size=5, aspect=1
sns.factorplot("kind", "pulse", "diet", exercise, kind="point", size=2, aspect=1)
sns.factorplot("kind", "pulse", "diet", exercise, kind="point", size=4, aspect=1)
sns.factorplot("kind", "pulse", "diet", exercise, kind="point", size=4, aspect=2)
You want to pass in the arguments 'size' or 'aspect' to the sns.factorplot() when constructing your plot.
Size will change the height while maintaining the aspect ratio (so the plot will also get wider if only size is changed).
Aspect will change the width while keeping the height constant.
The above code should be able to be run locally in an ipython notebook.
Plot sizes are reduced in these examples to show the effects, and because the plots from the above code were fairly large when saved as png's. This also shows that size/aspect includes the legend in the margin.
size=2, aspect=1
size=4, aspect=1
size=4, aspect=2
Also, all other useful parameters/arguments and defaults for this plotting function can be viewed with help() once the 'sns' module is loaded:
help(sns.factorplot)
The mpl.rc settings are stored in a global dictionary (see http://matplotlib.org/users/customizing.html).
So, if you only want to change the size of one figure (locally), this will do the trick:
plt.figure(figsize=(45,10))
sns.factorplot(...)
It worked for me using matplotlib-1.4.3 and seaborn-0.5.1
The size of the figure is controlled by the size and aspect arguments to factorplot. They correspond to the size of each facet ("size" really means "height", and size * aspect gives the width), so if you are aiming for a particular size for the whole figure you'll need to work backwards from there.
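Working backwards can be sketched as plain arithmetic. The helper below is hypothetical (facet_size is not a seaborn function) and assumes the figure is an even grid of facets, ignoring any extra room taken by a legend or padding:

```python
def facet_size(fig_width, fig_height, n_rows=1, n_cols=1):
    """Return (height, aspect) per facet for a target figure size in inches."""
    height = fig_height / n_rows             # each facet's height
    aspect = (fig_width / n_cols) / height   # facet width divided by facet height
    return height, aspect

# Target from the question: a single 45 x 10 inch figure
height, aspect = facet_size(45, 10)
```

The resulting values would then be passed as size=height, aspect=aspect (or height=height in seaborn versions where size has been renamed).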
import seaborn as sns
sns.set(rc={'figure.figsize':(12.7,8.6)})
plt.figure(figsize=(45,10))
Output
Do not use %pylab inline; it is deprecated. Use %matplotlib inline instead.
The question is not specific to IPython.
Use seaborn's set_style function and pass it your rc as the second parameter or as a kwarg: http://web.stanford.edu/~mwaskom/software/seaborn/generated/seaborn.set_style.html
If you just want to scale the figure use the below code:
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
sns.factorplot("MONTH", "VALUE", hue="REGION", data=typessns, kind="box", palette="OrRd")  # or any plot code
Note as of July 2018:
seaborn.__version__ == 0.9.0
Two main changes which affect the above answers
The factorplot function has been renamed to catplot()
The size parameter has been renamed to height for multi plot grid functions and those that use them.
https://seaborn.pydata.org/whatsnew.html
This means the answer provided by @Fernando Hernandez should be adjusted as below:
%matplotlib inline
import seaborn as sns
exercise = sns.load_dataset("exercise")
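# Defaults are height=5, aspect=1
# Keyword arguments are used because newer seaborn releases no longer accept x, y, hue, and data positionally
sns.catplot(x="kind", y="pulse", hue="diet", data=exercise, kind="point", height=4, aspect=2)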
