Seaborn's pairplot seems to have a scaling issue on diagonal plots

Here is the code I tested:
import seaborn as sns
tips = sns.load_dataset('tips')
sns.pairplot(tips)
By default, the diagonal plots are all histograms, and everything looks right (see the picture below).
However, when I change the pairplot call to something like the line at the end of this question, the vertical scale of the histograms shrinks, while the shape and the number of bins stay the same (see the picture below). Does anyone know what is happening here?
I checked the pairplot documentation (https://seaborn.pydata.org/generated/seaborn.pairplot.html): by default, diag_kind is set to 'auto'. When the kind parameter is 'scatter' (also the default), diag_kind='auto' is reset to 'hist' behind the scenes (https://github.com/mwaskom/seaborn/blob/master/seaborn/axisgrid.py#L1822). So technically the two scripts presented here should produce the same histograms. Totally lost here ...
sns.pairplot(tips, diag_kind='hist')

The reason you're seeing this behaviour is that the diagonal plots only share a y-axis if diag_kind == 'hist' is passed explicitly. When diag_kind == 'auto', the diag_sharey parameter passed to PairGrid is set to False, so each histogram gets its own vertical scale.
I see you've already opened an issue about it on seaborn's GitHub. I guess a clarification of this behaviour (principle of least astonishment, etc.) in the docstring for diag_kind would be helpful.
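To look at the effect in isolation, here is a minimal sketch that builds the grid directly with PairGrid and toggles diag_sharey. It assumes a seaborn version whose PairGrid accepts diag_sharey, and it uses a small synthetic DataFrame (the column names a, b, c and the helper diagonal_grid are made up for illustration) instead of the tips dataset so that it runs offline.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.uniform(size=200),
    "c": rng.exponential(size=200),
})

def diagonal_grid(data, share):
    """Build a pair grid whose diagonal histograms do or do not share y."""
    g = sns.PairGrid(data, diag_sharey=share)
    g.map_diag(plt.hist)        # histograms on the diagonal
    g.map_offdiag(plt.scatter)  # scatter plots everywhere else
    return g

shared = diagonal_grid(df, share=True)
independent = diagonal_grid(df, share=False)

# Compare the vertical limits of the diagonal axes in both grids.
for name, g in [("shared", shared), ("independent", independent)]:
    print(name, [ax.get_ylim() for ax in g.diag_axes])
```

Passing diag_sharey explicitly like this sidesteps whatever default pairplot picks for a given diag_kind, which may differ between seaborn versions.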

Related

Matplotlib plot function output differs from seaborn's lineplot

I want to plot the PDF of data that follows a normal distribution. I mainly followed this link.
If I work with data created as on that website (x=np.linspace()) and plot it with either seaborn.lineplot() or matplotlib.pyplot.plot(), I get a normal curve like the one shown on the website linked above. But when I do this with my own data (which I believe is normal, but with many more data points) instead of initializing it with np.linspace, I get a clean normal curve with seaborn's lineplot and a messy normal curve with matplotlib's plot function.
I have looked through the default arguments of both functions but couldn't find any (except estimator) that would cause this behavior. The estimator argument of seaborn's lineplot was the only one that looked like it could do something like this, but setting it to None did not make any difference (which makes sense, I think, since the y value is always the same for a specific x, so averaging produces the same value).
I used to think both functions were the same, so why do they produce different output?
The seaborn lineplot function has the default parameter sort=True.
So unless you tell it not to, it will order the data by x for you. pyplot.plot() does not do this; it draws lines between the points in the order provided.
If you want to order the data before plotting it with pyplot, there's a good solution for how to do that.
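As a rough sketch of the sorting step, assuming the data is a 1-D NumPy array of samples (the synthetic x below stands in for the asker's data, which is not shown), np.argsort gives an ordering that can be applied to both x and the PDF values before calling plot:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=500)     # unsorted samples
pdf = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # standard normal PDF at each sample

# Sort both arrays by x so consecutive points are neighbours on the curve.
order = np.argsort(x)
x_sorted, pdf_sorted = x[order], pdf[order]

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
ax_raw.plot(x, pdf)                    # lines jump back and forth: messy
ax_raw.set_title("unsorted input")
ax_sorted.plot(x_sorted, pdf_sorted)   # smooth bell curve
ax_sorted.set_title("sorted with argsort")
```

The left panel should reproduce the messy curve and the right panel the clean one that lineplot draws by default.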

Why is part of my contour plot showing white?

I am using Python's matplotlib.pyplot.contourf to create a contour plot of my data with a color bar. I have done this successfully countless times, even with other layers of the same variable. However, when the values get small (on the order of 1E-12), parts of the contour show up white. The white color does not show up in the color bar either. Does anyone know what causes this and how to fix this? The faulty contour is attached below.
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]))
plt.colorbar(a1)
plt.show()
tl;dr
Given the new information, matplotlib couldn't set the right number of levels (see the levels parameter in the documentation) for your data, which left some of the data unplotted. To fix that, tell matplotlib to extend the limits with either plt.contourf(..., extend="max") or plt.contourf(..., extend="both").
Extensive answer
There are a few reasons why contourf() is showing white zones with a colormap that doesn't include white.
NaN values
NaN values are never plotted.
Masked data
If you mask data before plotting, it won't appear in the plot. But you should know if you masked your data.
However, you may have masked your data without noticing if you use something like locator=LogLocator().
Matplotlib couldn't set the right levels for your data
Sometimes matplotlib doesn't set the right levels, leaving some of your data unplotted.
To fix that you can use plt.contourf(..., extend=EXTENDS), where EXTENDS can be "neither", "both", "min" or "max".
Coarse grid
contourf plots whitespace over finite data. Past answers do not correct
One remark: white sections in the plot can also occur if the X and Y data points are not equally spaced. In that case it is best to use the function tricontourf().
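Before changing the plot call, it can help to check which of the causes above applies. The helper below is only a sketch (describe_gaps is a made-up name, not a matplotlib function): it counts NaN or masked entries and, if you pass your levels, the values that fall outside them.

```python
import numpy as np

def describe_gaps(z, levels=None):
    """Count the values that are likely to show up as white in contourf."""
    z = np.ma.masked_invalid(z)  # masks NaN/inf and keeps any existing mask
    report = {"masked_or_nan": int(np.ma.count_masked(z))}
    if levels is not None:
        valid = z.compressed()   # only the unmasked values
        report["below_levels"] = int((valid < levels[0]).sum())
        report["above_levels"] = int((valid > levels[-1]).sum())
    return report

z = np.array([[0.0, 1.0, np.nan],
              [2.0, 3.0, 9.0]])
print(describe_gaps(z, levels=[0.5, 1.5, 2.5, 3.5]))
# {'masked_or_nan': 1, 'below_levels': 1, 'above_levels': 1}
```

Non-zero below_levels or above_levels counts point to the levels/extend issue rather than to missing data.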
I faced the same problem recently, when my data contained values higher or lower than the levels I had set. plt.contourf fills only the contours given by you and ignores any higher or lower values present in your data.
I solved this by adding the keyword argument extend="both", which for your case would be something like this:
a1 = plt.contourf(np.linspace(1,24,24),np.linspace(1,20,20),np.transpose(data[:,:,15]), extend="both")
or in general form:
a1 = plt.contourf(x,y,variable[:,:,15],extend="both")
By doing this, you're instructing the module to plot the higher (or lower) values with the colour of the highest (or lowest) filled contour.
If you only want to extend the lower or the higher range, you can change the keyword argument to
extend="min" or extend="max"
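Here is a small self-contained sketch of the difference. The data array is synthetic (a stand-in for data[:,:,15], which is not shown in the question), and the levels are deliberately chosen narrower than the data range so that the white patches appear in the left panel only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 24, 24)
y = np.linspace(1, 20, 20)
X, Y = np.meshgrid(x, y)
# Synthetic values on the order of 1e-12, shape (20, 24).
data = 1e-12 * np.sin(X / 4.0) * np.cos(Y / 4.0)

# Levels that do not cover the full data range.
levels = np.linspace(-5e-13, 5e-13, 11)

fig, (ax_gap, ax_ext) = plt.subplots(1, 2, figsize=(9, 3.5))
cs_gap = ax_gap.contourf(x, y, data, levels=levels)                 # white where data is out of range
cs_ext = ax_ext.contourf(x, y, data, levels=levels, extend="both")  # out-of-range values are filled
fig.colorbar(cs_gap, ax=ax_gap)
fig.colorbar(cs_ext, ax=ax_ext)
```

With extend="both" the colorbar also gets pointed ends, which signals that values beyond the outermost levels are present.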

Matplotlib example output significantly differs from website

When I run the Matplotlib API example code radar_chart.py on my computer, the output differs from the result on the Matplotlib website at a crucial point. On the Matplotlib website the zero values, of which there are plenty, do not hit the origin of the chart (see the chart at the link). When I run the exact same code on my own computer, the zero values do hit the origin (see the picture below). The result is a less smooth and less readable chart than the one on the Matplotlib website, which is not what one would expect. Could anyone please tell me why this difference exists?
The reason for this difference is that the linked example was produced with matplotlib 2.0, while on your computer you are running version 1.5 or earlier.
It can be observed when looking at the old example on the matplotlib page.
This difference is due to the axes margins being set to 0 in matplotlib 1.5 and to 0.05 in matplotlib 2.0.
There are several ways to set the margins, one being plt.margins(x=0.05, y=0.05).
Since here you want to have the same margins for all axes, one easy method is to use rc params. Adding
plt.rcParams['axes.xmargin'] = 0.05
plt.rcParams['axes.ymargin'] = 0.05
at the top of the script will set the margins to the values used by default in matplotlib 2.0. Of course you can play around with them and see which values best fit your needs.
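To see what the margin setting does to the autoscaled limits, here is a minimal sketch on a plain (non-polar) axes rather than the radar chart. The helper autoscaled_limits is made up for illustration; it uses plt.rc_context so the rc values are only changed temporarily.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

def autoscaled_limits(margin):
    """Return ((xmin, xmax), (ymin, ymax)) for a simple line with the given margins."""
    with plt.rc_context({"axes.xmargin": margin, "axes.ymargin": margin}):
        fig, ax = plt.subplots()   # the axes picks up the margins at creation
        ax.plot([0, 10], [0, 1])
        limits = ax.get_xlim(), ax.get_ylim()
    plt.close(fig)
    return limits

print(autoscaled_limits(0.0))    # limits hug the data, as in matplotlib 1.5
print(autoscaled_limits(0.05))   # limits padded by 5% of the data range
```

With a margin of 0.05 the x range 0 to 10 is padded by 0.5 on each side, which is why the zero values no longer sit exactly on the edge of the plot.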

Matplotlib and seaborn figure parameters/customizations

I'm so confused between the two. Every time I make a chart with either pyplot or seaborn, I have to guess which syntax to use. For example, seaborn doesn't have a title setter, so I have to remember to use plt.title. Or, for seaborn charts, plt.xlabel doesn't work, so I have to use sns.axlabel(x, y).
I also randomly run into problems like the following. I'm simply trying to make my seaborn jointplot bigger, but I have had no success with either the plt or the seaborn methods. (Any tips on good documentation that shows all the chart parameters? I find them scattered around the web, and it seems like each solution on Stack Overflow is unique, which adds to the overall confusion.)
Here's my code:
a = plt.figure(figsize=(30,30))
a.set_size_inches(30,30)
sns.jointplot(x='COAST',y='NORTH',data = data_df, kind = 'kde')
Notice that I tried both plt.figure(figsize=...) and set_size_inches. Both gave me a small chart.
I'm so frustrated with the random overlaps between the two libraries. Any pro tips to lessen the confusion would be greatly appreciated!
Edit: This is also true for seaborn's pairplot. I have had no success in changing the pairplot's size.
sns.jointplot creates its own figure instance (as @tcaswell suspected). It doesn't appear that you can tell jointplot to use an existing figure. I think you have two options:
1. You can give sns.jointplot the size option, e.g.:
sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde', size=30)
2. You can alter the JointGrid figure size after creating it, using:
g = sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde')
g.fig.set_size_inches(30, 30)
I presume option 1 is the better option, as it is a built-in seaborn option.

How to prevent outliers from being plotted in a pandas boxplot

I have a DataFrame (called result_df) and want to plot one column as a boxplot.
But certain outliers spoil the visualization. How can I prevent the outliers from being plotted?
Code I used:
fig, ax = pl.subplots()
fig.set_size_inches(18.5,10.5)
result_df.boxplot(ax=ax)
pl.show()
Important: I did not pay enough attention (apparently that happens a lot) and missed that this question is pandas-specific. However, from the questions I have seen, pandas basically uses matplotlib for plotting in the background, so this could still work. Sorry I was not more careful.
Luckily for you there is such a thing. In the manual, under the "results: dict" entry towards the bottom of the page, it states:
fliers: points representing data that extend beyond the whiskers (outliers).
Setting showfliers=False will hopefully help you.
I do have to mention, though, that I find it really strange that they shortened outliers to fliers. If that doesn't help, the manual offers a second solution:
sym : str or None, default = None
The default symbol for flier points. Enter an empty string (‘’) if you don’t want to show fliers. If None, then the fliers default to ‘b+’. If you want more control use the flierprops kwarg.
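Applied to the code from the question, this could look like the sketch below. It assumes that DataFrame.boxplot forwards extra keyword arguments such as showfliers to matplotlib's boxplot; the data is synthetic, with two obvious outliers added on purpose, and return_type="dict" is only there so the drawn artists can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for result_df: normal data plus two extreme values.
rng = np.random.default_rng(7)
values = np.append(rng.normal(size=100), [15.0, -12.0])
result_df = pd.DataFrame({"value": values})

fig, ax = plt.subplots()
fig.set_size_inches(18.5, 10.5)
# showfliers=False is passed through to matplotlib and suppresses the outlier markers.
bp = result_df.boxplot(ax=ax, showfliers=False, return_type="dict")

print(len(bp["fliers"]))  # number of flier artists that were drawn
print(ax.get_ylim())      # y range is no longer stretched by the outliers
```

The outliers are still part of the data and of the quartile computation; only their markers are left out of the plot.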
