I have a problem: I have an expensive to compute 1D function (float->float). I want to approximate it with splines for efficiency.
I know I can define a set of knot points with a uniform grid on the function's domain, evaluate the function on that grid and calculate a spline over that set. But the functions are special - they have huge dull regions and some special places with complex behavior. I want an algorithm that adaptively finds some optimal set of the knot points, sapling them more densely where the function has a difficult shape, and less so when the spline makes a good job in approximating it.
How can I find a library (preferably Python, but at this point I will take anything open source) that does it with an automatic knot selection? I tried many Google searches and still found nothing
I have finally found one, that seems to be working. It's splipy. It's method fit produces a B-Spline interpolation of the function object, with a relatively precise way to control the accuracy of the interpolation.
The source is on GitHub.
Related
I have two related fitting parameters. They have the same fitting range. Let's call them r1 and r2. I know I can limit the fitting range using minuit.limits, but I have an additional criteria that r2 has to be smaller than r1, can I do that in iminuit?
I've found this, I hope this can help you!
Extracted from: https://iminuit.readthedocs.io/en/stable/faq.html
**Can I have parameter limits that depend on each other (e.g. x^2 + y^2 < 3)?**ΒΆ
MINUIT was only designed to handle box constrains, meaning that the limits on the parameters are independent of each other and constant during the minimisation. If you want limits that depend on each other, you have three options (all with caveats), which are listed in increasing order of difficulty:
Change the variables so that the limits become independent. For example, transform from cartesian coordinates to polar coordinates for a circle. This is not always possible, of course.
Use another minimiser to locate the minimum which supports complex boundaries. The nlopt library and scipy.optimize have such minimisers. Once the minimum is found and if it is not near the boundary, place box constraints around the minimum and run iminuit to get the uncertainties (make sure that the box constraints are not too tight around the minimum). Neither nlopt nor scipy can give you the uncertainties.
Artificially increase the negative log-likelihood in the forbidden region. This is not as easy as it sounds.
The third method done properly is known as the interior point or barrier method. A glance at the Wikipedia article shows that one has to either run a series of minimisations with iminuit (and find a clever way of knowing when to stop) or implement this properly at the level of a Newton step, which would require changes to the complex and convoluted internals of MINUIT2.
Warning: you cannot just add a large value to the likelihood when the parameter boundary is violated. MIGRAD expects the likelihood function to be differential everywhere, because it uses the gradient of the likelihood to go downhill. The derivative at a discrete step is infinity and zero in the forbidden region. MIGRAD does not like this at all.
I have a data set that looks like this:
I used the scipy.signal.find_peaks function to determine the peaks of the data set, and it works out fine enough, but since this function determines the local maxima of the data, it is not neglecting the noise in the data which causes overshoot. Therefore what I'm determining isn't actually the location of the most likely maxima, but rather the location of an 'outlier'.
Is there another, more exact way to approximate the local maxima?
I'm not sure that you can consider those points to be outliers so easily, as they look to be close to the place I would expect them to be. But if you don't think they are a valid approximation let me tell you three other ways you can use.
First option
I would construct a physical model of these peaks (a mathematical formula) and do a fitting analysis around the peaks. You can for instance, suppose that the shape of the plot is the sum of some background model (maybe constant or maybe more complicated) plus some Gaussian peaks (or Lorentzian).
This is what we usually do in physics. Of course it will be more accurate taking knowledge from the underlying processes, which I don't have.
Having a good model, this way is definitely better as taking the maximum values, as even if they are not outliers, they still have errors which you want to reduce.
Second option
But if you want a easier way, just a rough estimation and you already found the location of the three peaks programmatically, you can make the average of a few points around the maximum. If you do it so, the function np.where or np.argwhere tend to be useful for this kind of things.
Third option
The easiest option is taking the value by hand. It could sound unacceptable for academic proposes and it probably is. Even worst, it is not a programmatic way, and this is SO. But at the end, it depends on why and for what you need those values and on the confidence interval you need for your measurement.
My task at hand is finding specific contours in a cloud of points. They need to be determined as accurately as possible. Therefore higher order polynomial interpolation must be used. I have data consisting of points in a grid, each point having its own height.
I had great success finding contours with matplotlib.contours function. Then I directly used matplotlib.cntr (C++ algorithm) to determine specific contours. But because the file uses marching squares algorithm (linear interpolation) the precision of found contours is not good enough.
I am looking for alternative ways to solve this problem. A suggested solution was to
Use bicubic interpolation to find analytical function around a specific point.
Followed by calculation of the function derivative.
Derived function is then used to determine direction in which derivative stays at 0 (using point as a point of view). By moving a small distance in that direction height should remain constant.
Using this algorithm a contour should be found. Would this algorithm work, can I take any shortcuts when implementing it (by using already implemented functions)?
Implementing this algorithm would take quite the effort on my part, and besides it may only work in theory. I am guessing that the numerical error should still be present (the derivative may not be exactly 0) and may in the end even surpass the current one (using marching squares). I want to consider other, easier and faster methods to solve this problem.
What other algorithm could be used to solve it?
Did anyone ever face a similar problem? How did you solve it?
I got this array of data and I need to calculate the area under the curve, so I use the Numpy library and the Scipy library which contain the functions trapz in Numpy and integrate.simps in Scipy for a Numerical Integration which gave me a really nice result in both cases.
The problem now is, that I need the error for each one or at least the error for the Trapezoidal Rule. The thing is, that the formula for that ask me a function, which obviously I don't have. I have been researching for a way to obtain the error but always return to the same point...
Here are the pages of scipy.integrate http://docs.scipy.org/doc/scipy/reference/integrate.html and trapz in Numpy http://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html I try and see a lot of code about the Numerical Integration and prefer to use the existing ones...
Any ideas please?
While cel is right that you cannot determine an integration error if you don't know the function, there is something you can do.
You can use curve fitting to fit a function through the available data points. You can then use that function for error estimation.
If you expect the data to fit a certain kind of function like a sine, log or exponential it is good to use that as a basis for curve fitting.
For instance, if you are measuring the drag on a moving car, it is known that this mostly proportional to the velocity squared because of air resistance.
However, if you do not have any knowledge about the applicable function then assuming you have N data points, there is a polynomial of the N-1 degree that fits exactly though all those data points. Determining such a polynomial from the data is solving a system of lineair equations. See e.g. polynomial interpolation. You could use this polynomial as an estimate for the unknown real function. Note however that outside the range of data points this polynomial might be wildly inaccurate.
I need to crossmatch a list of astronomical coordinates with different catalogues, and I want to decide a maximum radius for the crossmatch. This will avoid mismatches between my list and the catalogues.
To do this, I compute the separation between the best match with the catalogue for each object in my list. My initial list is supossed to be the position of a known object, but it could happend that it is not detected in the catalog, and my coordinates may suffer from small offsets.
They way I am computing the maximum radius is by fitting the gaussian kernel density of the separation with a gaussian, and use the center + 3sigmas value. The method works nicely for most of the cases, but when a small subsample of my list has an offset, I have two gaussians instead. In these cases, I will specify the max radius in a different way.
My problem is that when this happens, curve_fit can't normally do the fit with one gaussian. For a scientific publication, I will need to justify the "no fit" in curve_fit, and in which cases the "different way" is used. Could someone give me a hand on what this means in mathematical terms?
There are varying lengths to which you can go justifying this or that fitting ansatz --- which strongly depends on the details of your specific case (eg: why do you expect a gaussian to work in a first place? to what depth you need/want to delve into why exactly a certain fitting procedure fails and what exactly is a fail etc).
If the question is really about the curve_fit and its failure to converge, then show us some code and some input data which demonstrate the problem.
If the question is about how to evaluate the goodness-of-fit, you're best off going back to the library and picking a good book on statistics.
If all you look for is way of justifying why in a certain case a gaussian is not a good fitting ansatz, one way would be to calculate the moments: for a gaussian distribution 1st, 2nd, 3rd and higher moments are related to each other in a very precise way. If you can demonstrate that for your underlying data the relation between moments is very different, it sounds reasonable that these data can't be fit by a gaussian.