Hi everyone,
I have a generic distribution of values; I've posted the graph below.
Is there a way to generate a CDF from these values? Using seaborn (sns) I can create a graph:
My goal is to pick a value on the y-axis and read the corresponding value off the x-axis of the CDF. I've been searching online but can't find a method that doesn't require normalising the curve first.
I'm not sure of the exact data format, but something like numpy.cumsum will take a numpy array that represents a PDF and turn it into an array that represents the CDF.
From there, with your arrays of p and cdf, it is straightforward to find the p value that corresponds to a given cdf value (which is what I understand you are looking for) using interpolation with "nearest" as the interpolation kind (see the documentation for scipy.interpolate.interp1d, for example).
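To make that concrete, here is a minimal sketch of the idea (the variable names, bin count, and example data are mine, not from the question): build an approximate CDF with numpy.cumsum on histogram counts, then invert it with scipy.interpolate.interp1d using "nearest" interpolation.

```python
import numpy as np
from scipy.interpolate import interp1d

# Stand-in for the actual values behind the plotted distribution.
values = np.random.normal(loc=5.0, scale=2.0, size=10_000)

# Histogram -> approximate PDF -> CDF via cumulative sum, normalised to end at 1.
counts, edges = np.histogram(values, bins=100)
centers = 0.5 * (edges[:-1] + edges[1:])
cdf = np.cumsum(counts) / counts.sum()

# Interpolate the inverse CDF: give a probability (y-axis), get back an x value.
inverse_cdf = interp1d(cdf, centers, kind="nearest",
                       bounds_error=False, fill_value=(centers[0], centers[-1]))

print(inverse_cdf(0.5))   # roughly the median of the data
```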
I have an array of binned angle data, and another array of weights for each bin. I am using the vmpar() function found here to estimate the loc and kappa parameters. I then use the vmpdf() function, found in the same script, to create a von Mises probability density function (PDF).
However, the vmpar function does not give me a scale parameter the way scipy's vonmises.fit() function does, and I don't know how to use vonmises.fit() with binned data, since that function does not seem to accept weights as input.
My question is therefore: how do I estimate the scale from my binned angle data? The reason I want to adjust the scale is so that I can plot my original data and the PDF on the same graph. For example, right now the PDF is not scaled to my original data, as seen in the image below (blue = original data, red line = PDF).
I am quite new to circular statistics, so perhaps there is a very easy way to implement this that I am overlooking. I need to figure this out ASAP, so I appreciate any help!
I have [x,y,z] data to plot on a ternary diagram, and I would like to plot its contours based on the density of the points in [x,y,z]-space. I have my data stored as a list of ((x1,y1,z1), (x2,y2,z2), etc.), and also in individual DataFrame columns.
I see many options (using Marc Harper's function, plotly's 'create_ternary_contour', etc.) for plotting contours based on a 4th dimension (usually output values of a function of x, y, z), but I haven't found a solution that defines them based on density. I think what I want is analogous to the 2D solution available with hist2d and/or contour/contourf using a KDE approach... but on a ternary diagram.
Does anyone know how to do this? I suspect I would have to make some sort of grid in the ternary geometry, and then evaluate the KDE of the [x,y,z] data and define contours based on that somehow. I found a similar question here, but it is unfortunately in R, not Python.
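Not a full answer, but here is a rough sketch of that grid-plus-KDE idea under my own assumptions (project the barycentric coordinates to 2D, fit scipy.stats.gaussian_kde there, and contour with matplotlib's tricontourf; the Dirichlet data is just a stand-in for the real compositions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def ternary_to_xy(abc):
    """Project barycentric (a, b, c) rows (summing to 1) onto 2D Cartesian coordinates."""
    abc = np.asarray(abc, dtype=float)
    abc = abc / abc.sum(axis=1, keepdims=True)
    x = abc[:, 1] + 0.5 * abc[:, 2]
    y = (np.sqrt(3) / 2) * abc[:, 2]
    return x, y

# Stand-in compositions; replace with the list of (x, y, z) tuples from the question.
data = np.random.default_rng(0).dirichlet([4, 2, 3], size=500)

# Fit the KDE in the projected 2D space.
dx, dy = ternary_to_xy(data)
kde = gaussian_kde(np.vstack([dx, dy]))

# Regular grid of compositions inside the simplex, then evaluate the KDE on it.
step = 0.02
a, b = np.meshgrid(np.arange(0, 1 + step, step), np.arange(0, 1 + step, step))
mask = (a + b) <= 1.0
grid = np.column_stack([a[mask], b[mask], 1.0 - a[mask] - b[mask]])
gx, gy = ternary_to_xy(grid)
density = kde(np.vstack([gx, gy]))

# Density-based contours over the (projected) ternary triangle.
plt.tricontourf(gx, gy, density, levels=10)
plt.gca().set_aspect("equal")
plt.show()
```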
Out of the box, seaborn does a very good job of plotting a 2D KDE or jointplot. However, it does not return anything like a function that I can evaluate to read the values of the estimated density numerically.
How can I numerically evaluate the density that sns.kdeplot or jointplot has put in the plot?
Just for completeness: I see something interesting in the scipy docs, stats.gaussian_kde, but I am getting very clunky density plots from it, which for some reason (perhaps a missing extent) look really off compared to the scatter plot. So I would like to stay away from the scipy KDE, at least until I figure out how to make it work and why pyplot is so much less "smart" than seaborn here.
Anyhow, the evaluate method of scipy.stats.gaussian_kde does the job.
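A minimal sketch of that (variable names and example data are mine): fit gaussian_kde on the same data you gave seaborn and call .evaluate() wherever you need a numeric density. Note that the bandwidth selection may differ slightly from seaborn's defaults, so the values will be close to, but not identical to, what the plot shows.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for the data that was passed to sns.kdeplot / jointplot.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)

kde = gaussian_kde(np.vstack([x, y]))

# Density at a single (x, y) point; evaluate() expects an array of shape (2, n_points).
print(kde.evaluate(np.array([[0.0], [0.0]])))
```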
I also faced this issue with the jointplot() method. I opened the file distributions.py at this path: anaconda3/lib/python3.7/site-packages/seaborn/. Then I added these lines to the _bivariate_kdeplot() function:
print("xx=",xx[50])
print("yy=",yy[:,50])
print("z=",z[50])
This prints out 100 values each from the xx, yy, and z arrays at index 50, where "z" is the density and "xx" and "yy" are the grid values, adjusted according to the bandwidth, cut, and clip given by the user and arranged as a meshgrid whose resolution follows the grid size. This gave me some idea of the actual values behind the 2D KDE plot.
If you print out the entire array for each variable, you will get 100 x 100 values of each.
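If you would rather not edit the installed package, a rough equivalent (under my assumptions about the defaults, so the values will not match seaborn's exactly) is to evaluate scipy's gaussian_kde on a 100 x 100 meshgrid yourself:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in data; use whatever was passed to jointplot().
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)

kde = gaussian_kde(np.vstack([x, y]))

# 100 x 100 grid over the data range, analogous to seaborn's grid size.
xx, yy = np.meshgrid(np.linspace(x.min(), x.max(), 100),
                     np.linspace(y.min(), y.max(), 100))
z = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

print("xx=", xx[50])      # one row of grid x values
print("yy=", yy[:, 50])   # one column of grid y values
print("z=", z[50])        # densities along that row
```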
I have two sets of data points; effectively, one is from a preimage and the other from its image, but I do not know the rule between the two. This rule/function is nonlinear.
I've collected many data points of corresponding locations on both images, and I was wondering if anyone knew of a way to find a more complete mapping. That is, does anyone know the best way to find a mapping from R^2 to R^2 given an extensive set of sample points? The mapping is one-to-one and onto.
My goal is to use the data I've found to find a polynomial function that takes in some x,y coordinate from the preimage, and outputs the shifted coordinates.
edit: I have sample points along the domain and their corresponding points in the image, but not for every point in the domain. I want to be able to input any point (only integer values) in the domain and output the shifted point.
I don't think a polynomial is easy (or easy to guarantee to be a bijection). The obvious thing to do is:
Construct the Delaunay triangulation of the known points in the domain.
For each Delaunay triangle, the mapping is just the linear map that interpolates the known mapping at the vertices.
Then, when you have a random point, look up its Delaunay triangle and apply the requisite map.
I believe all of the above can be done via scipy.spatial.Delaunay.
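A short sketch of that approach (the example points are made up): scipy.interpolate.LinearNDInterpolator builds the Delaunay triangulation of the domain points internally and applies exactly the per-triangle linear interpolation described above.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical corresponding points: src in the preimage, dst in the image.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.4]])
dst = np.array([[0.1, 0.0], [1.2, 0.1], [0.0, 0.9], [1.1, 1.0], [0.6, 0.5]])

# Piecewise-linear map over the Delaunay triangulation of src; the values may be
# vector-valued, so both output coordinates are interpolated at once.
mapping = LinearNDInterpolator(src, dst)

print(mapping([[0.25, 0.75]]))   # mapped point; NaN outside the convex hull of src
```

Note that this only covers points inside the convex hull of the known domain points, and the resulting map is a bijection only if the mapped triangles do not fold over one another.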
The transformation you're trying to find sounds a lot like what's accomplished in Geographic Information Systems using a technique called rubber-sheeting: https://en.wikipedia.org/wiki/Rubbersheeting
Igor Rivin's description of a process using a Delaunay triangulation is pretty much the solution that's used in such systems. Some systems will use a Barycentric coordinate system rather than a linear mapping to try to reduce the appearance of triangle-related artifacts in the transformed image.
What you are describing also sounds a bit like the "morphing" special effect used in video. Maybe a web search on that topic would turn up some leads for you.
I have a set of 1D values. When I plot a histogram of the values, I notice that they're not uniformly distributed. Can I find a non-linear mapping such that the transformed scores are uniformly distributed? I also need the reverse transform function.
One way I know of is to do histogram equalization, as is done for images. Is there any built-in function in Python to achieve this?
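Here is a rough sketch of that histogram-equalisation idea with plain numpy (the names and example data are mine, not a specific library routine): use the empirical CDF as the forward map to uniform scores, and interpolate it the other way round for the reverse transform.

```python
import numpy as np

# Stand-in for the 1D values; a skewed distribution so the effect is visible.
values = np.random.gamma(shape=2.0, scale=1.0, size=5000)

sorted_vals = np.sort(values)
# Empirical CDF ranks in (0, 1); pushing values through this flattens the histogram.
ranks = (np.arange(1, len(sorted_vals) + 1) - 0.5) / len(sorted_vals)

def to_uniform(x):
    """Forward transform: original scores -> approximately uniform [0, 1] scores."""
    return np.interp(x, sorted_vals, ranks)

def from_uniform(u):
    """Reverse transform: uniform scores -> original scale."""
    return np.interp(u, ranks, sorted_vals)

u = to_uniform(values)        # histogram of u is close to flat
back = from_uniform(u)        # approximately recovers the original values
```

For something pre-built, scikit-learn's QuantileTransformer(output_distribution='uniform') does essentially the same thing and provides an inverse_transform method.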