When plotting the estimated density function of my data using sns.kdeplot(), the algorithm extrapolates outside of the boundaries of the data, meaning that it draws the plot for values smaller than 0 or greater than 1, which is particularly annoying when dealing with probabilities. Example:
import numpy as np
import seaborn as sns
data = np.random.random(100)
sns.kdeplot(data)
How can I fix this so that the plotted curve stays within [0, 1] in the x-direction, without simply calling plt.xlim((0, 1))?
I visualize a probability density function (PDF) using two plotting approaches: displot() and plot(). I don't understand why displot() doesn't produce a normal-distribution curve, whereas plot() does so perfectly. The two density plots should look alike, but they don't. What's wrong with displot() here?
from scipy.stats import norm
import seaborn as sns
import numpy as np
data_x= np.arange(-4, 4, 0.001)
norm_pdf = norm.pdf(data_x)
sns.displot(data = norm_pdf, x = data_x, kind='kde')
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np
data_x= np.arange(-4, 4, 0.001)
plt.plot(data_x, norm.pdf(data_x))
plt.show()
displot (or the underlying kdeplot) creates an approximation of a probability density function (pdf) to resemble the function that might have generated the given random data. As input, you'll need random data. The function will mimic these data as a sum of Gaussian bell shapes (a "kernel density estimation" with a Gaussian kernel).
Here is an example using 8000 random points as input. You'll notice the curve resembles the normal pdf, but is also a bit "bumpier" (that's what randomness looks like).
data_x = norm.rvs(size=8000)
sns.kdeplot(x=data_x)
When you call kdeplot (or displot(..., kind='kde')) with both data= and x=, and x= isn't a column name in a dataframe, data= gets ignored. So you are effectively computing the kde of 8000 evenly spaced values between -4 and 4. The kde of such data looks like a flat line between -4 and 4. But because the kde assumes the underlying function locally resembles a Gaussian, the start and end are smoothed out.
data_x = np.arange(-4, 4, 0.001)
sns.kdeplot(x=data_x)
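To make the "sum of Gaussian bell shapes" idea concrete, here is a minimal hand-rolled sketch using only NumPy. The helper name `gaussian_kde_manual` and the fixed bandwidth of 0.3 are assumptions chosen for illustration; seaborn picks its bandwidth automatically, so its curve will not match these numbers exactly.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# The evenly spaced x values from the question, used as if they were samples
data_x = np.arange(-4, 4, 0.001)

# Evaluate the estimate at the two ends and at a few interior points
grid = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
density = gaussian_kde_manual(data_x, grid, bandwidth=0.3)
print(density)
```

In the interior the estimate sits near 1/8, the height of a uniform density on an interval of length 8. At -4 and 4 only about half of the bumps contribute, so the value drops to roughly half of that, which is the smoothed start and end described above.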
First time user so apologies for any mistakes.
I have some code (pasted below) which is used to analyse and gain values/graphs from a simulation I have run.
This results in the following image:
I would now like to plot a line graph on top of this image showing the field values along r = 0 (on the y-axis) at every point on the x-axis, i.e. the values that the colour map displays along that line. However, I'm completely lost on where to even begin with this. I've tried looking into KDE and similar things, but I realise I'm not sure how to get at the numerical values that were used to generate the colour map.
from openpmd_viewer import OpenPMDTimeSeries
from openpmd_viewer.addons import LpaDiagnostics
import numpy as np
from scipy.constants import c, e, m_e
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Replace the string below, to point to your data
ts = OpenPMDTimeSeries(r"/Users/bentorrance/diags/hdf5/")
ts_2d = LpaDiagnostics(r"/Users/bentorrance/diags/hdf5/")
plt.figure(1)
# get_field returns the field array together with a metadata object
Ez, info_Ez = ts.get_field(iteration=5750, field='E', coord='z', plot=True, cmap='inferno')
plt.title(r'Electric Field Density $E_{z}$')
plt.show()
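As a starting point, the line-out at r = 0 can be taken directly from the 2D array behind the colour map instead of from the image. The sketch below uses a synthetic array, because the simulation data is not available here. The axis arrays `r` and `z`, the field values, and the `[r, z]` index order are all assumptions; with openPMD-viewer the axes would presumably come from the metadata object returned by `get_field`.

```python
import numpy as np

# Synthetic stand-in for the simulation output (hypothetical values)
r = np.linspace(-20e-6, 20e-6, 81)   # transverse axis, contains r = 0
z = np.linspace(0.0, 100e-6, 200)    # longitudinal axis
field = np.exp(-(r[:, None] / 5e-6) ** 2) * np.sin(2 * np.pi * z[None, :] / 25e-6)

# Index of the row closest to r = 0 (assumes the array is indexed as [r, z])
i_axis = np.argmin(np.abs(r))

# One field value per z position along that row
lineout = field[i_axis, :]
print(i_axis, lineout.shape)
```

The resulting `lineout` could then be drawn with `plt.plot(z, lineout)`, for example on a twin axis over the colour map.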
The pandas.plot.kde() function is handy for plotting the estimated density function of a continuous random variable. It will take data x as input, and display the probabilities p(x) of the binned input as its output.
How can I extract the values of probabilities it computes? Instead of just plotting the probabilities of bandwidthed samples, I would like an array or pandas series that contains the probability values it internally computed.
If this can't be done with the pandas kde, let me know of any equivalent in scipy or another library.
There are several ways to do that: you can either read the values back from the plot or compute them yourself.
1. As pointed out in the comment by @RichieV following this post, you can extract the data from the plot using
data.plot.kde().get_lines()[0].get_xydata()
2. Use seaborn and then the same approach as in 1):
You can use seaborn to estimate the kernel density and then matplotlib to extract the values (as in this post). You can use either kdeplot or distplot:
import seaborn as sns
# kde plot
x, y = sns.kdeplot(data).get_lines()[0].get_data()
# distplot (deprecated since seaborn 0.11)
x, y = sns.distplot(data, hist=False).get_lines()[0].get_data()
3. You can use scipy.stats.gaussian_kde, which pandas uses under the hood, to estimate the kernel density yourself:
import numpy as np
import scipy.stats
density = scipy.stats.gaussian_kde(data)
and then evaluate it on a set of points:
x = np.linspace(0, 80, 200)
y = density(x)
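One caveat worth illustrating: the values returned by the kde are densities, not probabilities. A probability only arises by integrating the density over an interval. The sketch below uses `gaussian_kde.integrate_box_1d` for that; the sample data (a seeded normal sample centred on 40) and the interval bounds are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)                      # seeded for reproducibility
data = rng.normal(loc=40.0, scale=10.0, size=500)   # hypothetical sample

density = gaussian_kde(data)

# Density values on a grid (these can exceed 1 for narrow distributions)
x = np.linspace(0, 80, 200)
y = density(x)

# Probability mass assigned to an interval, here [30, 50]
p_30_50 = density.integrate_box_1d(30, 50)

# Over a very wide interval the mass should be close to 1
total = density.integrate_box_1d(-1e3, 1e3)
print(p_30_50, total)
```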
How would I make a plot of this style in Python with matplotlib? (Cumulative probability plot) I don't need complete code, mostly just a place to start and a general idea of what I need to do for it.
A cumulative probability plot is really easy to make:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig,ax = plt.subplots()
ax.plot(np.sort(data),np.linspace(0.0,1.0,len(data)))
plt.xlabel(r'$x$')
plt.ylabel(r'$P(X \leq x)$')
plt.show()
Note that it can have a strong advantage over a probability density plot as it does not require binning of your data. (Should you be looking for the latter you can check this code).
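The same idea can be wrapped in a small helper that returns the empirical cumulative distribution explicitly. This is a sketch under one stated assumption: it uses the common k/n convention (the fraction of samples less than or equal to each sorted value), which differs slightly from the `np.linspace(0.0, 1.0, len(data))` spacing used above. The function name `ecdf` is hypothetical.

```python
import numpy as np

def ecdf(samples):
    """Return sorted samples and the fraction of samples <= each of them."""
    x = np.sort(samples)
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p

# Tiny example with a repeated value
x, p = ecdf(np.array([3.0, 1.0, 2.0, 2.0]))
print(x)  # [1. 2. 2. 3.]
print(p)  # [0.25 0.5  0.75 1.  ]
```

For plotting, `ax.step(x, p, where='post')` draws the staircase shape of the empirical distribution, while `ax.plot(x, p)` gives the smoother-looking line from the answer above.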
I have two sine signals with the same frequency, and I plot a Lissajous figure using them:
Now I want to calculate the phase difference between them. To do that, I have to know the values of a and b. How can I extract these values from the figure? By this I mean the exact coordinates of the largest x-position and of the x-position where the curve crosses the zero line.
For example:
import numpy as np
import matplotlib.pyplot as plt
t=np.arange(0,1.0,1.0/8000)
U=np.sin(2*np.pi*100*t)
I=np.sin(2*np.pi*100*t+np.deg2rad(45))  # phase shift of 45 degrees, converted to radians
plt.plot(U,I)
plt.grid(True)
plt.show()
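Rather than reading a and b off the figure, they can be computed from the sampled signals. The sketch below assumes the usual Lissajous relation sin(phi) = b / a, where a is the largest |x| and b is |x| at the points where y crosses zero, and it assumes a true phase shift of pi/4 so the result can be checked. Zero crossings are located by linear interpolation between neighbouring samples.

```python
import numpy as np

fs = 8000                      # sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
phi_true = np.pi / 4           # assumed phase shift, used only to check the result
U = np.sin(2 * np.pi * 100 * t)
I = np.sin(2 * np.pi * 100 * t + phi_true)

# a: largest excursion of the curve in the x-direction
a = np.max(np.abs(U))

# Indices where I changes sign between consecutive samples
idx = np.where(np.signbit(I[:-1]) != np.signbit(I[1:]))[0]

# Linearly interpolate U at the positions where I crosses zero
frac = I[idx] / (I[idx] - I[idx + 1])
U_cross = U[idx] + frac * (U[idx + 1] - U[idx])

# b: distance from the origin to the zero crossings, averaged over all of them
b = np.mean(np.abs(U_cross))

phi_est = np.arcsin(b / a)
print(np.degrees(phi_est))
```

Because arcsin only returns angles between 0 and 90 degrees, this estimate cannot by itself distinguish phi from 180 degrees minus phi; the tilt direction of the ellipse would be needed to resolve that.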