With seaborn's jointplot function, setting kind="reg" draws a scatter plot with a regression line, marginal histograms, and a kernel density estimate over each histogram.
I would like to get all of that except the kernel density estimate. However, it seems that the kind switch gives me either all or nothing. The only alternative I can think of is to use a standard regplot and then manually add the histograms around it. Is there a more convenient way of doing this?
The seaborn jointplot has the arguments:
{joint, marginal, annot}_kws : dicts, optional
Additional keyword arguments for the plot components.
Since jointplot draws a distplot on each marginal axis, marginal_kws is the way to pass additional keyword arguments to distplot.
The argument that turns the KDE off is listed in the distplot documentation:
kde : bool, optional
Whether to plot a gaussian kernel density estimate.
I want to plot the PDF of data that follows a normal distribution. I mainly followed this link.
If I work with data created as on that website (x=np.linspace()) and plot it with either seaborn.lineplot() or matplotlib.pyplot.plot(), I get a normal curve like the one shown there. But when I use my own data (which I believe is normal, but with a lot more data points) instead of initializing it with np.linspace, I get a clean normal curve with seaborn's lineplot and a messy one with matplotlib's plot function.
I looked through the default arguments of both functions but couldn't find any (except estimator) that would cause this behavior. The estimator argument of seaborn's lineplot was the only one that looked like it could do something like this, but setting it to None made no difference (which makes sense, I think, since the y value is always the same for a given x, so averaging produces the same value).
I used to think both functions are the same, but then why do they have different output?
The Seaborn lineplot function has the default parameter sort=True.
So unless you tell it not to, it will sort the data by x for you. pyplot.plot() does not do this; it draws lines between the points in the order provided.
If you want the same result from pyplot, sort the x values and reorder the y values to match before plotting.
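As a rough sketch of sorting before plotting, assuming unsorted synthetic samples and a standard normal PDF evaluated at each sample (the variable names here are illustrative), np.argsort gives an ordering that can be applied to both arrays:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Unsorted samples, standing in for "my own data" from the question.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)
pdf = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)  # standard normal PDF at each sample

# Sort x and apply the same ordering to the PDF values.
order = np.argsort(x)
x_sorted, pdf_sorted = x[order], pdf[order]

# Lines are now drawn left to right, giving a clean curve.
fig, ax = plt.subplots()
ax.plot(x_sorted, pdf_sorted)
```

Plotting x and pdf directly would connect the points in sample order, which is what produces the messy curve.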
Out of the box, seaborn does a very good job of plotting a 2D KDE or jointplot. However, it does not return anything like a function that I can evaluate to read off the estimated density numerically.
How can I evaluate numerically the density that sns.kdeplot or jointplot has put in the plot?
For completeness: I see something interesting in the scipy docs, stats.gaussian_kde, but I am getting very clunky density plots which, apparently because of a missing extent, are really off compared to the scatter plot. So I would like to stay away from the scipy KDE, at least until I figure out how to make it work and why pyplot is so much less "smart" than seaborn.
Anyhow, the evaluate method of the scipy.stats.gaussian_kde does its job.
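A minimal sketch of evaluating a 2D density with scipy.stats.gaussian_kde, assuming synthetic x and y samples. The grid size of 100 and the data-range limits are arbitrary choices for illustration, and the default bandwidth may differ from the one seaborn used, so the values need not match a seaborn plot exactly:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic correlated samples (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.5, size=300)

# gaussian_kde expects an array of shape (n_dims, n_points).
kde = gaussian_kde(np.vstack([x, y]))

# Evaluate the density on a 100 x 100 grid spanning the data range.
xmin, xmax, ymin, ymax = x.min(), x.max(), y.min(), y.max()
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
z = kde.evaluate(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Density at a single point, here (0, 0).
point_density = kde.evaluate([[0.0], [0.0]])[0]
```

Because np.mgrid varies x along the first axis, displaying z with matplotlib's imshow needs z.T together with origin="lower" and extent=[xmin, xmax, ymin, ymax]; leaving out the extent is one way to end up with an image that does not line up with the scatter plot.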
I also faced this issue with the jointplot() method. I opened the file distributions.py under anaconda3/lib/python3.7/site-packages/seaborn/ and added these lines to the _bivariate_kdeplot() function:
print("xx=",xx[50])
print("yy=",yy[:,50])
print("z=",z[50])
This prints the 100 values at index 50 of each of the xx, yy, and z arrays. Here z is the density, and xx and yy are the grid coordinates in meshgrid form: they span the data range as adjusted by the bandwidth, cut, and clip settings, and are spaced according to the grid size given by the user. This gave me some idea of the actual values behind the 2D KDE plot.
If you print the entire array of each variable, you will get 100 x 100 values for each.
Plotly currently supports Catmull-Rom splines for interpolation of the lines between markers on a Scatter plot.
I have graphs where the data is fundamentally a normal distribution. Cubic or Hermite interpolation works very well for this type of data in other graphing frameworks. Unfortunately, the Catmull-Rom splines (or at least Plotly's implementation of them) really don't.
I've experimented with values of "smoothing" between 0.0 and 1.0 (it seems, though this is not documented, that values over 1.0 make no further difference). Unfortunately, they all look bad.
I've seen a suggestion elsewhere that it might make sense to do my own interpolation using scipy's interpolate.interp2d, and graph that line separately. However, this fails for my use case, since I want the color of the line to be paired with the color of the markers, and for both to appear on the legend as a single item, as shown above.
Has anyone had any experience making the Plotly splines look nicer than they do on a quasi-normal distribution using smoothing=1.0?
I'm getting into seaborn for python and I have a quick question that I was not able to find an answer to. If I add jitter to a plot, does it actually change the fit values (such as r^2, p-value, etc) or is it just cosmetic for the plot's look?
Compare, for example, sns.lmplot("size", "tip", tips, x_jitter=.15) with sns.lmplot("size", "tip", tips) at https://web.stanford.edu/~mwaskom/software/seaborn/tutorial/quantitative_linear_models.html
No, the regression is estimated on the original data; the jitter is applied to a copy of the data that is used to draw the scatterplot.
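One way to check this numerically, sketched here with synthetic data standing in for the tips dataset (the size and tip arrays below are made up), is to compare the slope of the line that regplot draws with an ordinary least-squares fit on the unjittered data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Discrete x values, loosely imitating party size; not the real tips data.
rng = np.random.default_rng(0)
size = rng.integers(1, 7, size=200).astype(float)
tip = 1.0 + 0.7 * size + rng.normal(scale=0.8, size=200)

# Draw the jittered scatter plot with its regression line (no confidence band).
fig, ax = plt.subplots()
sns.regplot(x=size, y=tip, x_jitter=0.15, ci=None, ax=ax)

# Slope of the plotted regression line, from its two end points.
line_x, line_y = ax.lines[0].get_data()
plotted_slope = (line_y[-1] - line_y[0]) / (line_x[-1] - line_x[0])

# Least-squares fit on the original, unjittered data.
ols_slope, ols_intercept = np.polyfit(size, tip, 1)
```

If the jitter affected the fit, plotted_slope and ols_slope would differ; np.isclose(plotted_slope, ols_slope) is a quick way to compare them.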
I am using Python's matplotlib acorr to plot autocorrelations of time series, but the graph always includes the negative lags.
Since the autocorrelation function is always even anyway, I would like to suppress the negative half of the x-axis.
Is there a parameter I can pass to acorr?