Calculate area between two curves (that are normal distributions) - python

I need to calculate the area between two curves.
I have lots of data, so I'd like to do it programmatically.
Basically, I always have 2 normal distributions, calculated from a mean value and standard deviation. I would then like to calculate how much they intersect.
Here is an example of what I mean, and also some code in R (that I don't know).
Is there already a function in matplotlib or scipy or some other module that does it for me?
In case I have to implement it myself, I think that I should do:
1. find the intersections (there will be at most 2)
2. see which function is lower before, between, and after the intersections
3. calculate the integral of the lower function on each piece and add them all together
Is that right? How can I do the single steps? Are there functions, modules, etc that can help?

I don't know R either, but the answer seems to be in the link you provided: just integrate the minimum of your distributions.
You don't need to find intersections, just feed min(f(x), g(x)) to scipy.integrate.quad.
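A minimal sketch of that suggestion, assuming the two curves are normal probability densities. The helper name overlap_area and the choice to integrate over ±8 standard deviations around the means are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def overlap_area(mu1, sigma1, mu2, sigma2):
    """Area under min(f, g) for two normal pdfs f and g (hypothetical helper)."""
    f = norm(mu1, sigma1).pdf
    g = norm(mu2, sigma2).pdf

    # Assumption: +/- 8 sigma covers essentially all of the probability mass.
    lo = min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
    hi = max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)

    # Passing the means as break points helps quad locate the peaks.
    area, _abs_err = quad(
        lambda x: min(f(x), g(x)),
        lo,
        hi,
        points=sorted({mu1, mu2}),
        limit=200,
    )
    return area


if __name__ == "__main__":
    print(overlap_area(0.0, 1.0, 1.0, 1.0))
    print(overlap_area(0.0, 1.0, 3.0, 2.0))
```

Because both densities integrate to 1, the result lies between 0 (no overlap) and 1 (identical distributions).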

Related

use FFT spectrum to define a lambda function

My problem is the following: I have N different measurements of a quantity which depends on two other quantities that I also know. I would like to find a function of two variables that approximates the data, and I thought that using Fourier transforms was a nice idea.
Does anybody have a suggestion on how I should proceed? I think as a first step I want to do an FFT of my data, but then how can I implement the inverse FT not only for the points where I measured but for any pair (x,y) as input?
Thanks a lot.
(I am using python).

Curve Fitting in Python for extrapolation, Regression analysis

This question is regarding curve fitting in python.
First, I should say that I do not know which model function to pass to the "curve_fit" function in the scipy library; therefore, I am trying to use polyfit, which is OK if I am interested in interpolation, but my goal is to predict values at future points, in other words extrapolation.
I have attached a screenshot of a raw signal, its smoothed version and the polyfit result. It has the correct polynomial order but still fails at extrapolation. My conclusion is that polyfit is not the right approach here, but I cannot guess the curve function. What are your thoughts?
Please note that this is not a distribution since the y values may keep slowly decreasing indefinitely, even below 0.
I'd say the function looks like an exponential Gaussian, but again it's not a distribution so I don't want to do that.
My last thought was to split the plot into two: the first part can certainly be modeled as a polynomial and the second as an exponential. (The values are different from the first png because it's of a different signal.)
Then, maybe combine the two. What do you think about this?
Attached is a screenshot of this too.
Since many curves can fit the data and extrapolate differently, you need to choose the right basis functions to get the behaviour you want.
So far you have tried polynomials, for instance. These, however, tend to ±infinity outside the fitted range, which is perhaps not what you want.
I would try and use curve_fit on a sum of Hermite polynomials or Laguerre polynomials. For instance, for Laguerre polynomials, you could try
a + b*exp(-k x) + c*(1-x)*exp(-k x) + d*(x^2 - 4*x + 2)*exp(-k x) + ...
NumPy has a lot of convenience functions for these polynomials, see e.g. https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.polynomials.laguerre.html
Note however that you should also fit k to your data, which you could use curve_fit for.
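A rough sketch of fitting the Laguerre-times-exponential expression above with curve_fit, with k as one of the fitted parameters. The synthetic data, the parameter values and the starting guess p0 are made up for illustration; a real signal will likely need its own starting guess.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b, c, d, k):
    """a + (b + c*(1 - x) + d*(x^2 - 4x + 2)) * exp(-k x), as in the answer."""
    decay = np.exp(-k * x)
    return a + (b + c * (1 - x) + d * (x**2 - 4 * x + 2)) * decay


# Synthetic stand-in for the measured signal (assumed values, not real data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
true_params = (0.5, 2.0, -1.0, 0.3, 0.8)
y = model(x, *true_params) + rng.normal(0.0, 0.01, x.size)

# Fit on the first 150 samples only, then extrapolate to the remaining ones.
popt, pcov = curve_fit(model, x[:150], y[:150], p0=(0.0, 1.0, 0.0, 0.0, 1.0))
y_future = model(x[150:], *popt)

print("fitted parameters:", popt)
print("max extrapolation error:", np.max(np.abs(y_future - model(x[150:], *true_params))))
```

Unlike a plain polynomial, this model settles to the constant a for large x whenever the fitted k is positive, which is the extrapolation behaviour the basis was chosen for.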

Manipulate 2D symbolic parametric curves in Python

I am trying to compute intersections, distances and derivatives on 2D symbolic parametric curves (that is, curves defined in the plane by a function) but I can't find any Python module that seems to do the job.
So far I have only found libraries that deal with plotting or do numerical approximation, so I thought I could implement it myself as a light overlay on top of a symbolic mathematics library.
I started experimenting with SymPy but I can't wrap my head around it: it doesn't seem to be able to return intervals, even a finite number of them (for instance solve(x = x) fails!), and it returns only a small number of solutions in some simple cases.
What tool could be suitable for the task ?
I guess that parametric functions relate to the advanced topics of mathematical analysis, and I haven't seen any libraries yet that could match your demands. However you could try to look through the docs of the Sage project...
It would help if you give an example of two curves that you want to define. solve is up to the task for finding intersections of all quadratic curves (it will actually solve quartics and some quintics, too).
When you say "distance" what do you mean - arc length sort of distance or distance from a point to the curve?
As for tangents, that is easily handled with idiff (see its docstring for examples with help(idiff)).

Curve_fit not converging means...?

I need to crossmatch a list of astronomical coordinates with different catalogues, and I want to decide a maximum radius for the crossmatch. This will avoid mismatches between my list and the catalogues.
To do this, I compute the separation between each object in my list and its best match in the catalogue. My initial list is supposed to contain the positions of known objects, but it could happen that an object is not detected in the catalogue, and my coordinates may suffer from small offsets.
The way I am computing the maximum radius is by fitting the gaussian kernel density of the separations with a gaussian, and using the center + 3 sigma value. The method works nicely in most cases, but when a small subsample of my list has an offset, I have two gaussians instead. In these cases, I will specify the max radius in a different way.
My problem is that when this happens, curve_fit usually can't do the fit with one gaussian. For a scientific publication, I will need to justify the "no fit" in curve_fit, and in which cases the "different way" is used. Could someone give me a hand on what this means in mathematical terms?
There are varying lengths to which you can go in justifying this or that fitting ansatz, which strongly depends on the details of your specific case (e.g. why do you expect a gaussian to work in the first place? To what depth do you need or want to delve into why exactly a certain fitting procedure fails, and what exactly counts as a failure?).
If the question is really about the curve_fit and its failure to converge, then show us some code and some input data which demonstrate the problem.
If the question is about how to evaluate the goodness-of-fit, you're best off going back to the library and picking a good book on statistics.
If all you are looking for is a way of justifying why in a certain case a gaussian is not a good fitting ansatz, one way would be to calculate the moments: for a gaussian distribution the 1st, 2nd, 3rd and higher moments are related to each other in a very precise way. If you can demonstrate that for your underlying data the relation between moments is very different, it sounds reasonable that these data can't be fit by a gaussian.
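One way to sketch the moment check is with the standardized third and fourth moments: for a gaussian, the skewness and the excess kurtosis are both 0. The samples below are synthetic stand-ins for the separations (a single gaussian, and a mixture where a subsample is offset), and the helper name gaussian_moment_check is made up for this example.

```python
import numpy as np
from scipy.stats import skew, kurtosis


def gaussian_moment_check(sample):
    """Return (skewness, excess kurtosis); both are ~0 for gaussian data."""
    return skew(sample), kurtosis(sample)  # kurtosis() defaults to Fisher (excess)


# Synthetic separations (assumed values): one population, then one with an offset subsample.
rng = np.random.default_rng(0)
single = rng.normal(1.0, 0.3, 5000)
mixed = np.concatenate([rng.normal(1.0, 0.3, 4000), rng.normal(4.0, 0.3, 1000)])

print("single gaussian:", gaussian_moment_check(single))
print("two populations:", gaussian_moment_check(mixed))
```

Values far from 0 relative to their sampling scatter (roughly sqrt(6/n) for skewness and sqrt(24/n) for excess kurtosis with n points) give a quantitative reason to reject the single-gaussian model.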

Curve Fitting with Known Integrals Python

I have some data that are the integrals of an unknown curve within bins. For your interest, the data is ocean wave energy and the bins are for directions, e.g. 0-15 degrees. If possible, I would like to fit a curve on to the data that conserves the integrals within the bins. I've tried sketching it on a notepad with a pencil and it seems like it could be possible. Does anyone know of any curve-fitting tool in Python to do this, for example in the scipy interpolation sub-package?
Thanks in advance
Edit:
Thanks for the help. If I do it, it looks like I will try the method that is recommended in section 4 of this paper: http://journals.ametsoc.org/doi/abs/10.1175/1520-0485%281996%29026%3C0136%3ATIOFFI%3E2.0.CO%3B2. In theory, it basically uses matrices to make some 'fake' data from the known integrals between each band. When plotted, this data then produces an interpolated line graph that preserves the integrals.
It's a little outside my bailiwick, but I can suggest having a look at SciKits to see if there's anything there that might be useful. Other packages to browse would be pandas and StatsModels. Good luck!
If you have a curve f(x) which is an approximation to the integral of another curve g(x), i.e. f=int(g,x) then the two are related by the Fundamental theorem of calculus, that is, your original function is the derivative of the first curve g = df/dx. As such you can use numpy.diff or any of the higher order methods to approximate df/dx to obtain an estimate of your original curve.
One possibility: calculate the cumulative sum of the bin volumes (np.cumsum), fit an interpolating spline to it, and then take the derivative to get the curve.
scipy splines have methods to calculate the derivatives.
The only limitation, in case it is relevant for you: the spline through the cumulative sum might not be monotonic, so the derivative might be negative over some intervals.
I guess that the literature on smoothing a histogram looks at similar constraints on the volume of the integral/bin, but I don't have any references ready.
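The cumulative-sum approach can be sketched as follows. The bin edges and bin integrals are invented numbers, and CubicSpline is just one of the scipy splines that could be used here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Assumed example data: six 15-degree direction bins and their known integrals.
edges = np.arange(0.0, 91.0, 15.0)                 # 0, 15, ..., 90
bin_integrals = np.array([1.0, 3.0, 6.0, 4.0, 2.0, 0.5])

# Cumulative integral at each bin edge, starting from 0 at the first edge.
cumulative = np.concatenate([[0.0], np.cumsum(bin_integrals)])

# Interpolating spline through the cumulative curve; its derivative is the density.
cumulative_spline = CubicSpline(edges, cumulative)
density = cumulative_spline.derivative()

# Integrating the density over each bin recovers the original bin integrals.
recovered = np.array(
    [density.integrate(a, b) for a, b in zip(edges[:-1], edges[1:])]
)
print(recovered)

# The density may dip below zero, as noted above; check on a fine grid.
grid = np.linspace(edges[0], edges[-1], 500)
print("minimum of fitted density:", density(grid).min())
```

The bin integrals are preserved by construction, because the spline passes exactly through the cumulative values at the bin edges.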
1/ fit2histogram
Your question is about fitting a histogram. I just came across the documentation of a Python package for Multi-Variate Pattern Analysis, PyMVPA, which offers a function for histogram fitting. An example is here: PyMVPA.
However, I guess that set of available distributions is limited to famous distributions.
2/ integral computation
As already mentioned, the next option is to approximate the integral values and fit a model to the resulting data. Then either you know an explicit expression for the derivative of that model, or you compute the derivative numerically, e.g. with finite differences.
