A short while ago, I posted this question:
seaborn barplot: vary color with x and hue
My sample code from that question produces a barplot, which looks like this:
As you can see, there is a very tiny space between the bars.
The code is this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=["model", "time", "value"])
df["model"] = ["on"]*2 + ["off"]*2
df["time"] = ["short", "long"] * 2
df["value"] = [1, 10, 2, 4]
sns.barplot(data=df, x="model", hue="time", y="value")
plt.show()
Now I executed the code on a different machine and it produced a different image:
The different colors are no concern; I'll specify my own palette in any case. The important difference for me is that the bars now touch each other: there is no longer a white border between them.
How can I reproduce the original behaviour? How can I explicitly set the padding of barplot bars in seaborn?
My current seaborn version is 0.9.0. Unfortunately, I don't know which version created the original image.
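One possible way to get a visible gap back, sketched here as an assumption rather than as the cause of the original difference: seaborn's barplot forwards extra keyword arguments to matplotlib's bar, so an explicit white edge of a chosen width can stand in for the padding. The linewidth value is arbitrary.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "model": ["on"] * 2 + ["off"] * 2,
    "time": ["short", "long"] * 2,
    "value": [1, 10, 2, 4],
})

fig, ax = plt.subplots()
# edgecolor and linewidth are passed through to matplotlib's bar();
# a white outline makes neighbouring bars look separated.
sns.barplot(data=df, x="model", hue="time", y="value",
            edgecolor="white", linewidth=2, ax=ax)
```

Call plt.show() afterwards to display the figure. This does not change the bar width itself, only the outline drawn around each bar.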
I had a look at Kaggle's univariate-plotting-with-pandas. It contains this line, which generates a bar graph:
reviews['province'].value_counts().head(10).plot.bar()
I don't see any color scheme defined specifically.
I tried plotting it in a Jupyter notebook but got only one color instead of the multiple colors shown on Kaggle.
I read the documentation and searched online but couldn't find a way to generate these colors with just the line above.
How do we do that? Is there a config option that enables these varied colors by default?
It seems like the multicoloured bars were the default behaviour in one of the former pandas versions and Kaggle must have used that one for their tutorial (you can read more here).
You can easily recreate the plot by defining a list of standard colours and then using it as an argument in bar.
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
'#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
reviews['province'].value_counts().head(10).plot.bar(color=colors)
Tested on pandas 0.24.1 and matplotlib 2.2.2.
In seaborn this is not a problem:
import seaborn as sns
sns.countplot(x='province', data=reviews)
With pandas/matplotlib you can get one color per bar by converting the values to a one-row DataFrame, although the bars are then drawn without spaces between them:
reviews['province'].value_counts().head(10).to_frame(0).T.plot.bar()
Or use some qualitative colormap:
import numpy as np
import matplotlib.pyplot as plt
N = 10
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Paired(np.arange(N)))
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Pastel1(np.arange(N)))
The colorful plot has been produced with an earlier version of pandas (<= 0.23). Since then, pandas has decided to make bar plots monochrome, because the color of the bars is pretty meaningless. If you still want to produce a bar chart with the default colors from the "tab10" colormap in pandas >= 0.24, and hence recreate the previous behaviour, it would look like
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 13
df = pd.Series(np.random.randint(10,50,N), index=np.arange(1,N+1))
cmap = plt.cm.tab10
colors = cmap(np.arange(len(df)) % cmap.N)
df.plot.bar(color=colors)
plt.show()
I have a probably simple question. I created a histogram with seaborn from data in an Excel file. For better visualization, I would like to have some space between the bars/bins. Is that possible?
My code looks as follows:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg', 'pdf')
df = pd.read_excel('test.xlsx')
sns.set_style("white")
#sns.set_style("dark")
plt.figure(figsize=(12,10))
plt.xlabel('a', fontsize=18)
plt.ylabel('test2', fontsize=18)
plt.title ('tests ^2', fontsize=22)
ax = sns.distplot(st,bins=34, kde=False, hist_kws={'range':(0,1), 'edgecolor':'black', 'alpha':1.0}, axlabel='test1')
A second question, though a bit off topic: how do I get the exponent in the title of the chart to actually be raised as a superscript?
Thanks!
The matplotlib hist function has an argument rwidth
rwidth : scalar or None, optional
The relative width of the bars as a fraction of the bin width.
You can use this inside the distplot via the hist_kws argument.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.random.normal(0.5,0.2,1600)
ax = sns.distplot(x,bins=34, kde=False,
hist_kws={"rwidth":0.75,'edgecolor':'black', 'alpha':1.0})
plt.show()
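On the side question about the exponent in the title: matplotlib renders text between dollar signs with its mathtext engine, so a caret there produces a superscript. A minimal sketch, reusing the title and label strings from the question:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 10))
# Text between dollar signs goes through mathtext, so ^2 is drawn raised.
# The raw string keeps any backslashes in math commands intact.
ax.set_title(r"tests$^2$", fontsize=22)
ax.set_xlabel("a", fontsize=18)
ax.set_ylabel("test2", fontsize=18)
```

The same syntax works with plt.title(), since it accepts the same kind of string.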
For seaborn >= 0.11, use the shrink parameter of histplot. It scales the width of each bar relative to the bin width by this factor; the rest is left as empty space.
Documentation: https://seaborn.pydata.org/generated/seaborn.histplot.html
edit:
OP was originally asking about sns.distplot(); however, it is deprecated in favor of sns.histplot() or sns.displot() as of version 0.11. Since OP is generating a histogram, both histplot and displot in hist mode accept shrink.
After posting my answer, I realized I answered the opposite of what was being asked. I found this question while trying to figure out how to remove the space between bars. I almost deleted my answer, but in case anyone else stumbles on this question and is trying to remove the space between bars in seaborn's histplot, I'll leave it for now.
Thanks to @miro for the link to seaborn's updated documentation, I found that element='step' worked for me. Depending on exactly what you want, element='poly' may be what you are after.
My implementation with 'step':
fig,axs = plt.subplots(4,2,figsize=(10,10))
i,j = 0,0
for col in cols:
    sns.histplot(df[col],ax=axs[i,j],bins=100,element='step')
    axs[i,j].set(title="",ylabel='Frequency',xlabel=labels[col])
    i+=1
    if i == 4:
        i = 0
        j+=1
My implementation with 'poly':
fig,axs = plt.subplots(4,2,figsize=(10,10))
i,j = 0,0
for col in cols:
    sns.histplot(df[col],ax=axs[i,j],bins=100,element='poly')
    axs[i,j].set(title="",ylabel='Frequency',xlabel=labels[col])
    i+=1
    if i == 4:
        i = 0
        j+=1
I'm plotting a scatter plot using a pandas dataframe. This works correctly, but I wanted to use seaborn themes and special functions. When I plot the same data points with seaborn, the y-axis values remain almost invisible. The x-axis values range from 5000 to 15000, while the y-axis values are in [-6:6]*10^-7.
If I multiply the y-axis values by 10^6, they display correctly, but the actual values remain invisible/indistinguishable in a seaborn-generated plot.
How can I configure seaborn so that the y-axis scales automatically in the resulting plot?
Also, some rows contain NaN (not in this case). How can I disregard those while plotting, short of manually weeding out the rows containing NaN?
Below is the code I've used to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("datascale.csv")
subdf = df.loc[(df.types == "easy") & (df.weight > 1300), ]
subdf = subdf.iloc[1:61, ]
subdf.drop(subdf.index[[25]], inplace=True) #row containing NaN
subdf.plot(x='length', y='speed', style='s') #scales y-axis correctly
sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True, lowess=True) #doesn't scale y-axis properly
# multiplying by 10^6 displays the plot correctly, in matplotlib
plt.scatter(subdf['length'], 10**6*subdf['speed'])
Strange that seaborn does not scale the axis correctly. Nonetheless, you can correct this behaviour. First, get a reference to the axis object of the plot:
lm = sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True)
After that you can manually set the y-axis limits:
lm.axes[0,0].set_ylim(min(subdf.speed), max(subdf.speed))
The result should look something like this:
Example Jupyter notebook here.
Seaborn and matplotlib should just ignore NaN values when plotting. You should be able to leave them as is.
As for the y scaling: there might be a bug in seaborn.
The most basic workaround is still to scale the data before plotting.
Scale to microspeed in the dataframe before plotting and plot microspeed instead.
subdf['microspeed']=subdf['speed']*10**6
Or transform to log y before plotting, i.e.
import math
df = pd.DataFrame({'speed':[1, 100, 10**-6]})
df['logspeed'] = df['speed'].map(lambda x: math.log(x,10))
then plot logspeed instead of speed.
Another approach would be to use seaborn regplot instead.
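A rough sketch of the regplot route, using synthetic data as a stand-in for subdf (the column names come from the question; the values are made up). Setting the y-limits explicitly is kept as a fallback in case autoscaling still picks too wide a range:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in: x in 5000-15000, y on the order of 1e-7
rng = np.random.default_rng(0)
length = np.linspace(5000, 15000, 60)
speed = 1e-10 * (length - 10000) + rng.normal(0, 5e-8, 60)
subdf = pd.DataFrame({"length": length, "speed": speed})

fig, ax = plt.subplots()
# regplot draws onto a plain Axes, unlike lmplot which builds a FacetGrid
sns.regplot(x="length", y="speed", data=subdf, ci=None, ax=ax)
# Pin the y-limits to the data range
ax.set_ylim(subdf["speed"].min(), subdf["speed"].max())
```

Because regplot works on an ordinary Axes, the usual matplotlib calls such as set_ylim apply directly without going through lm.axes.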
Matplotlib scales and plots correctly for me as follows:
plt.plot(subdf['length'], subdf['speed'], 'o')
I would like to "maximize" the color space for a plot in seaborn. By that I mean I would like the color range to include the two extreme colors in a given palette.
For example if I choose the matplotlib color palette "YlGn_r" and plot it with pandas:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot(kind='bar', colormap='YlGn_r', width=.8);
If I plot this with the same palette but seaborn the colors are different:
df['x'] = df.index
sns.barplot(x='x', y='value', hue='variable', data=pd.melt(df, 'x'), palette='YlGn_r')
I realize this is probably intended behavior, which I am not entirely opposed to, however is there a way to force seaborn to use the full spectrum? I have many plots that need the colors to match, some in seaborn and some with matplotlib. Thanks!
You can just pass a list of colors anywhere a palette name is accepted, so if you want specific colors, that's the best way to get them. One way would be mpl.cm.YlGn_r(np.linspace(0, 1, 4)).
However, barplot also desaturates the colors a bit, which looks better with large patches. If you don't want that, you can set saturation=1.
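Putting both suggestions together, a minimal sketch might look like the following. The random data and the long_df name are only for illustration; the colormap is sampled at four evenly spaced points so that both end colors are included.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])
df["x"] = df.index
long_df = pd.melt(df, "x")  # columns: x, variable, value

# Sample the colormap from 0 to 1 inclusive, one color per hue level
colors = [tuple(c) for c in cm.YlGn_r(np.linspace(0, 1, 4))]

fig, ax = plt.subplots()
# saturation=1 turns off barplot's default desaturation
sns.barplot(x="x", y="value", hue="variable", data=long_df,
            palette=colors, saturation=1, ax=ax)
```

The same colors list can be passed to the pandas plot via color=colors, which should keep the two kinds of plot visually consistent.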
Using Seaborn, I can create boxplots of multiple columns of one pandas DataFrame on the same figure. I would like to apply a custom style to the fliers (outliers), e.g. setting the marker symbol, color and marker size.
The API documentation on seaborn.boxplot, however, only provides an argument fliersize which lets me control the size of the fliers but not the color and symbol.
Since Seaborn uses matplotlib for plotting, I thought I could provide a matplotlib styling dictionary to the boxplot function like so:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create a dataframe
df = pd.DataFrame({'column_a': [3, 6, 200, 100, 7], 'column_b': [1, 8, 4, 150, 290], 'column_c': [6, 7, 20, 80, 275]})
# set figure size
sns.set(rc={"figure.figsize": (14, 6)})
# define outlier properties
flierprops = dict(marker='o', markersize=5)
# create boxplot
ax = sns.boxplot(df, vert=False, showmeans=True, flierprops=flierprops)
plt.show()
Result:
According to the provided dictionary, I would expect a large red circle representing the flier of column_c, but the standard settings are still used instead.
This thread describes a similar problem when matplotlib is used directly. However, from the discussion I gathered that this should have been fixed in recent versions of matplotlib.
I tried this with an iPython notebook (iPython 3.10), matplotlib 1.4.3 and seaborn 0.5.1.
flierprops = dict(marker='o', markerfacecolor='None', markersize=10, markeredgecolor='black')
sns.boxplot(y=df.Column,orient="v",flierprops=flierprops)
Seaborn's boxplot code ignores your flierprops argument and overwrites it with its own before passing arguments to Matplotlib's. Matplotlib's boxplot also returns all the flier objects as part of its return value, so you could modify this after running boxplot, but Seaborn doesn't return this.
The overwriting of flierprops (and sym) seems like a bug, so I'll see if I can fix it: see this issue. Meanwhile, you may want to consider using matplotlib's boxplot instead. Looking at seaborn's code may be useful (boxplot is in distributions.py).
Update: there is now a pull request that fixes this (flierprops and other *props, but not sym)
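For the matplotlib fallback mentioned above, a minimal sketch could look like this. It uses the question's data, draws vertical boxes for simplicity, and the red fill and size 10 are illustrative choices rather than values taken from the question's dictionary.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "column_a": [3, 6, 200, 100, 7],
    "column_b": [1, 8, 4, 150, 290],
    "column_c": [6, 7, 20, 80, 275],
})

# Illustrative outlier style
flierprops = dict(marker="o", markerfacecolor="red", markersize=10)

fig, ax = plt.subplots(figsize=(14, 6))
# matplotlib's boxplot applies flierprops and returns a dict of artists
bp = ax.boxplot(df.to_numpy(), showmeans=True, flierprops=flierprops)
ax.set_xticklabels(df.columns)

# The returned flier Line2D objects can also be restyled afterwards
for flier in bp["fliers"]:
    flier.set_markeredgecolor("black")
```

The bp["fliers"] list holds one Line2D per box, which is the return value that the seaborn version discussed here does not expose.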