I require a Python solution to force a polynomial fit to end at a specific point. I have read the solutions offered in "How to do a polynomial fit with fixed points", a similar question, but have been unable to get any of those methods working on my data set, since they define interior locations for the polynomial to pass through rather than an end point.
I therefore require a solution to force a polynomial curve to end at a specific location.
To put this in context, the example I need this for is shown in the image below. I require a line of best fit for the data: the green points represent the raw data and the pink points are the mean of the green points for every x value. The best fit should be a 3rd-order polynomial until the data becomes a horizontal line. The black line is my current attempt at a line of best fit using np.polyfit(); I have defined the polynomial to only plot up to the location where I would then start the linear best-fit line, but as you can see the tail of the polynomial is far too low, hence I want to force it to end at / pass through a specific point.
I am open to all options for getting a nice, mathematically sensible best fit, as I have been banging my head against this problem for too long now.
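For reference, one standard way to force a cubic through a fixed end point is to build the constraint into the parameterisation: write the cubic as y0 + (x - x0)(a + b·x + c·x²), which equals y0 at x0 by construction, and solve for a, b, c with ordinary least squares. A minimal sketch with synthetic data; the arrays x, y and the anchor (x0, y0) are placeholders for your own values:

```python
import numpy as np

# Synthetic stand-in for the real data; replace x, y and the anchor
# point (x0, y0) with your own values.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 0.5 * x**3 - 2.0 * x**2 + 3.0 * x + rng.normal(0.0, 5.0, x.size)

x0, y0 = 10.0, 330.0  # point the cubic must pass through (assumed)

# Write the cubic as p(x) = y0 + (x - x0) * (a + b*x + c*x**2).
# By construction p(x0) == y0, so only a, b, c are free and can be
# found with ordinary linear least squares.
dx = x - x0
A = np.column_stack([dx, dx * x, dx * x**2])
coef, *_ = np.linalg.lstsq(A, y - y0, rcond=None)

def p(xq):
    """Evaluate the constrained cubic at xq."""
    d = xq - x0
    return y0 + d * (coef[0] + coef[1] * xq + coef[2] * xq**2)

print(p(x0))  # exactly y0
```

np.polyfit cannot express this constraint directly, which is why the design matrix is assembled by hand here.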
Using a logistic sigmoid instead of a polynomial:
Formula and parameters, generated from some of your sample data points (taken from the picture):
where S(x) = 1 / (1 + exp(-x)) is the logistic sigmoid function.
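Below is a rough sketch of fitting such a sigmoid with scipy.optimize.curve_fit. The model form b + L·S(k·(x − x0)) and all numbers are assumptions standing in for the original formula and parameters (which were given as an image), not a reproduction of them:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_model(x, L, k, x0, b):
    """Scaled logistic sigmoid: b + L * S(k * (x - x0))."""
    return b + L / (1.0 + np.exp(-k * (x - x0)))

# Synthetic stand-in for the sample points read off the picture.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 25.0, 100)
y = sigmoid_model(x, 2000.0, 0.5, 12.0, 0.0) + rng.normal(0.0, 50.0, x.size)

# Rough starting values help curve_fit converge.
p0 = [y.max() - y.min(), 1.0, np.median(x), y.min()]
params, cov = curve_fit(sigmoid_model, x, y, p0=p0)
print(params)  # fitted L, k, x0, b
```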
A distinct approach, since you seem to want to identify outliers in the horizontal dimension.
Stratify your data by power, say into 10-kW intervals, so that each interval contains 'enough' points to use some robust estimator of their dispersion. In each stratum discard upper extreme observations until the robust estimate falls to a stable value. Now use the maximum values for the strata as the criteria against which to gauge whether any given device is to be regarded as 'inefficient'.
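A rough sketch of this stratify-and-trim idea, assuming the data comes as two arrays, power (kW) and the quantity being screened; the 10-kW bin width, the MAD as the robust dispersion estimate, and the stopping rule are all placeholders to tune:

```python
import numpy as np

def stratum_thresholds(power, value, bin_width=10.0, tol=0.01, max_iter=50):
    """For each power stratum, trim upper extremes until a robust
    dispersion estimate (the MAD) stabilises, then return the maximum
    of the surviving points as that stratum's cut-off."""
    edges = np.arange(power.min(), power.max() + bin_width, bin_width)
    thresholds = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        v = np.sort(value[(power >= lo) & (power < hi)])
        if v.size < 10:          # not 'enough' points for a robust estimate
            continue
        mad = np.median(np.abs(v - np.median(v)))
        for _ in range(max_iter):
            if v.size < 5:                        # too few points left to trim further
                break
            trimmed = v[:-1]                      # discard the largest remaining point
            new_mad = np.median(np.abs(trimmed - np.median(trimmed)))
            if abs(new_mad - mad) <= tol * max(mad, 1e-12):
                break                             # robust estimate has stabilised
            v, mad = trimmed, new_mad
        thresholds[(lo, hi)] = v.max()
    return thresholds

# Example with synthetic data: efficiency-like values vs. power in kW.
rng = np.random.default_rng(5)
power = rng.uniform(0.0, 100.0, 5000)
value = rng.normal(1.0, 0.05, 5000) + rng.exponential(0.2, 5000) * (rng.random(5000) < 0.05)
print(stratum_thresholds(power, value))
```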
I have two related fitting parameters with the same fitting range; let's call them r1 and r2. I know I can limit the fitting range using minuit.limits, but I have an additional constraint that r2 has to be smaller than r1. Can I do that in iminuit?
I've found this; I hope it can help you!
Extracted from: https://iminuit.readthedocs.io/en/stable/faq.html
**Can I have parameter limits that depend on each other (e.g. x^2 + y^2 < 3)?**
MINUIT was only designed to handle box constraints, meaning that the limits on the parameters are independent of each other and constant during the minimisation. If you want limits that depend on each other, you have three options (all with caveats), which are listed in increasing order of difficulty:
Change the variables so that the limits become independent. For example, transform from cartesian coordinates to polar coordinates for a circle. This is not always possible, of course.
Use another minimiser to locate the minimum which supports complex boundaries. The nlopt library and scipy.optimize have such minimisers. Once the minimum is found and if it is not near the boundary, place box constraints around the minimum and run iminuit to get the uncertainties (make sure that the box constraints are not too tight around the minimum). Neither nlopt nor scipy can give you the uncertainties.
Artificially increase the negative log-likelihood in the forbidden region. This is not as easy as it sounds.
The third method done properly is known as the interior point or barrier method. A glance at the Wikipedia article shows that one has to either run a series of minimisations with iminuit (and find a clever way of knowing when to stop) or implement this properly at the level of a Newton step, which would require changes to the complex and convoluted internals of MINUIT2.
Warning: you cannot just add a large value to the likelihood when the parameter boundary is violated. MIGRAD expects the likelihood function to be differentiable everywhere, because it uses the gradient of the likelihood to go downhill. The derivative at a discrete step is infinite, and it is zero in the forbidden region. MIGRAD does not like this at all.
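Applying the first option above to the original r1/r2 question: the coupled constraint r2 < r1 becomes a pair of independent box limits if you fit r1 and the non-negative difference delta = r1 − r2 instead of r2 itself. A sketch along those lines (the cost function, data, and limits are dummies, and exact iminuit details may differ between versions):

```python
import numpy as np
from iminuit import Minuit

# Dummy least-squares cost in terms of r1 and delta = r1 - r2;
# replace with your real model and data.
x = np.linspace(0.0, 1.0, 50)
y_obs = 2.0 * x + 0.5

def cost(r1, delta):
    r2 = r1 - delta              # recover r2; delta >= 0 guarantees r2 <= r1
    y_model = r1 * x + r2
    return np.sum((y_obs - y_model) ** 2)

cost.errordef = Minuit.LEAST_SQUARES   # tell MIGRAD this is a chi-square-like cost

m = Minuit(cost, r1=1.0, delta=0.1)
m.limits["r1"] = (0.0, 5.0)            # the shared fitting range
m.limits["delta"] = (0.0, None)        # delta >= 0  <=>  r2 <= r1
m.migrad()
print(m.values["r1"], m.values["r1"] - m.values["delta"])  # r1 and r2
```

The strict inequality r2 < r1 becomes r2 ≤ r1 here; in a continuous minimisation that distinction rarely matters.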
I'm trying to approximate a digital filter impulse response with a set of piecewise polynomials:
The number of segments (knots) is a free parameter on the entire interval [0, 1). To give perspective of the problem size, I'm expecting something like 256 to 1024 segments for a good approximation.
The knot positions have to fall on a power of 2 integer grid on the interval [0,1] for easy hardware implementation of the polynomial selection.
The polynomial order for each segment can be different, the lower the better. The maximum order is known (could be set to 2, or 3).
The length of each segment does not need to be equal as long as (2) is obeyed.
For example a linear segment on [0, 1/256) followed by a 3rd order segment on [1/256, 22/256) followed by a 2nd order segment on [22/256, 1) would be fine.
The goal is to minimize some kind of combination of the number of segments and their order to reduce overall computation/memory cost (tradeoff to be defined), while the mean square or maximum error between fitted curve and ideal is below a given value.
I know I could brute-force search the entire space and calculate the max error for each allowed polynomial order, for each allowed segment. I could then 'construct' the final piecewise curve by walking through this large table - although I'm not entirely sure how exactly to do that final construction.
I'm wondering if this is not a 'known' type of problem for which algorithms already exist. Any comments welcome!
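One way to make the 'walk through the table' concrete, not from the original post but a sketch under stated assumptions: treat it as a shortest-path / dynamic-programming problem over the grid points, where a segment is admissible only if its max error is below the tolerance, and its cost is a fixed per-segment charge plus one unit per stored coefficient:

```python
import numpy as np

def fit_piecewise(f, n_grid=64, max_order=3, tol=1e-2, seg_cost=1.0):
    """Cheapest admissible piecewise-polynomial cover of [0, 1) on a uniform
    grid, found by dynamic programming over knot indices.  A segment [i, j)
    is admissible if a degree <= max_order least-squares fit stays within tol
    of the samples; its cost is seg_cost plus one unit per coefficient."""
    grid = np.linspace(0.0, 1.0, n_grid + 1)
    xs = np.linspace(0.0, 1.0, 4096, endpoint=False)   # dense sample of f
    ys = f(xs)

    best = np.full(n_grid + 1, np.inf)    # best[j] = min cost covering [0, grid[j])
    best[0] = 0.0
    back = [None] * (n_grid + 1)          # back-pointers: (i, order, coeffs)

    for j in range(1, n_grid + 1):
        for i in range(j):
            mask = (xs >= grid[i]) & (xs < grid[j])
            t = xs[mask] - grid[i]                     # local coordinate per segment
            for order in range(max_order + 1):
                if t.size <= order:
                    continue
                c = np.polyfit(t, ys[mask], order)
                err = np.max(np.abs(np.polyval(c, t) - ys[mask]))
                if err > tol:
                    continue
                cost = best[i] + seg_cost + (order + 1)
                if cost < best[j]:
                    best[j], back[j] = cost, (i, order, c)

    segments, j = [], n_grid              # walk back-pointers to recover segments
    while j > 0:
        i, order, c = back[j]
        segments.append((grid[i], grid[j], order, c))
        j = i
    return segments[::-1]

# Example: a damped oscillation as a stand-in for the impulse response.
segs = fit_piecewise(lambda x: np.exp(-3 * x) * np.cos(20 * x))
print(len(segs), "segments")
```

For 256 to 1024 knots the full O(n²) table of candidate segments gets large; restricting candidate segment lengths to powers of two (which your grid constraint suggests anyway) shrinks it considerably.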
You can try a variant of the Ramer–Douglas–Peucker algorithm. It's an easy-to-implement algorithm for simplifying polygonal lines. In your context, the polygonal line is a sample of your filter curve at the grid points and the algorithm certifies that the maximal error is smaller than some threshold.
If you need a smooth curve you can modify the algorithm to implement a quadratic spline interpolation instead of a polyline approximation (which corresponds to a linear spline interpolation), and similarly a cubic spline for second-order continuity. In each iteration the farthest sample point is added to the interpolation point set and the interpolation spline is re-computed.
A slightly different alternative is to use a least-square approximating spline instead of an interpolation spline. A new knot will be added in each iteration at the grid point with farthest distance but the curve will not be required to pass through it.
This approach, while simple, answers most of your requirements and gives good results in practice.
However, it may not give the theoretical optimal solution (although I don't currently have a counter example).
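For reference, a minimal Ramer–Douglas–Peucker sketch on uniformly sampled grid points (the plain polyline version, without the spline variants described above); the example filter curve is made up:

```python
import numpy as np

def rdp(x, y, eps):
    """Ramer-Douglas-Peucker: keep the end points, find the sample with
    the largest perpendicular distance to the chord, and recurse on the
    two halves if that distance exceeds eps."""
    if len(x) < 3:
        return list(zip(x, y))
    # Perpendicular distance of every point to the chord between the ends.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    norm = np.hypot(dx, dy)
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / norm
    k = int(np.argmax(dist[1:-1])) + 1
    if dist[k] <= eps:
        return [(x[0], y[0]), (x[-1], y[-1])]
    left = rdp(x[: k + 1], y[: k + 1], eps)
    right = rdp(x[k:], y[k:], eps)
    return left[:-1] + right            # drop the duplicated split point

# Example: simplify a sampled filter-like curve on a 1/256 grid.
t = np.arange(0, 256) / 256.0
h = np.sinc(8 * (t - 0.3)) * np.exp(-4 * t)
pts = rdp(t, h, eps=1e-3)
print(len(pts), "knots kept out of", len(t))
```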
I'm attempting to fit a cubic spline to a time-series using scipy's interpolate.splrep. However, I can't work out how to determine a valid smoothing condition without manually adjusting it by eye. It seems like there should be a way to calculate this condition.
According to the docs, the smoothing condition should be determined in this way:
Recommended values of s depend on the weights, w. If the weights represent the inverse of the standard-deviation of y, then a good s value should be found in the range (m-sqrt(2*m),m+sqrt(2*m)) where m is the number of datapoints in x, y, and w. default : s=m-sqrt(2*m) if weights are supplied. s = 0.0 (interpolating) if no weights are supplied.
However, after much testing, I haven't been able to get this to work (where smoothing is non-zero). The "fit" usually ends up looking like an arbitrary 3rd degree polynomial. I'm dealing with a dataset that should be a high degree polynomial when fit properly. Just from fiddling around with the smoothing condition, I've found s = 1E-9 to balance closeness and smoothing well (I'm using weights with the data).
Does anyone have any ideas what's going on?
There are reasons I'm using a cubic-spline over other interpolation methods, but I'm wondering if I should be looking elsewhere...
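For what it's worth, a small sketch of the docs' recommendation: weights equal to 1/σ of the noise and s taken near m. If the weights are not really the inverse standard deviation (for example, the noise estimate is off by orders of magnitude), the recommended (m ± sqrt(2m)) range is scaled wrongly, which is one plausible reason a hand-tuned s such as 1e-9 ends up looking better. The σ below is assumed known and the data are synthetic:

```python
import numpy as np
from scipy.interpolate import splrep, splev

rng = np.random.default_rng(2)
m = 200
x = np.linspace(0.0, 4.0, m)
sigma = 0.05                                   # assumed per-point noise level
y = np.sin(x**2) + rng.normal(0.0, sigma, m)   # synthetic wiggly signal

w = np.full(m, 1.0 / sigma)                    # weights = inverse standard deviation
s_lo, s_hi = m - np.sqrt(2 * m), m + np.sqrt(2 * m)

tck = splrep(x, y, w=w, s=m)                   # any s in (s_lo, s_hi) is 'reasonable'
y_smooth = splev(x, tck)
print(f"recommended s range: ({s_lo:.1f}, {s_hi:.1f}), knots used: {len(tck[0])}")
```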
I have a specific problem where I need an equation fit to a set of data, from which I can extract a general function (it can be complex) that gives me a Y value as close to the data as possible - as if I used the data as a lookup table.
I have been reading up on various approaches and Multivariate adaptive regression splines [MARS] seemed like an excellent candidate. The problem I have is that it is not detecting/fitting a hinge at a very important segment of the data.
I'm primarily using R with the Earth package, with the intention of putting the equation into Excel. I can use other languages or packages if they will give me the results I need.
PROBLEM:
At the low end of my data I have a small set of values that form an important lower bound and need a hinge or knot placed there.
The rest of the data should have automatic hinge/knot detection.
Example:
X Y
0 130
1 130
10000 130
rest of X past 10000 with increasing Y's at various rates.
MARS averages the 0 through 10,000 range into the increasing Y values, so if I call predict(model, 5000) I may get, say, 150 as a result. The line needs a flat linear segment, then a hinge at 10000. This lack of a hinge makes all the high values of X very accurate in the MARS model output, but the low values of X diverge significantly from the base data.
I would rather not manually place this as the lower end may change and I would like a generalized approach.
Does anyone know of an approach similar to MARS that can provide
Automatic knot/hinge detection
Output is an equation I could place into Excel
The ability to manually specify a point to hinge on, if the automatic detection is failing on an important section
The MARS approach is working for all the other breakpoints in the data but because of the limited "range" of the lower bound it doesn't place a hinge there even with pruning turned off.
Does anyone know of a better approach?
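If nothing automatic cooperates, one fallback (not MARS itself, just the same hinge idea applied by hand) is to add the basis function MAX(0, X − 10000) explicitly and fit an ordinary regression on it; the resulting equation drops straight into Excel. A numpy sketch with made-up data, where the 10000 knot and the single extra hinge are assumptions:

```python
import numpy as np

# Made-up data in the spirit of the example: flat at 130 up to X = 10000,
# then increasing.
x = np.concatenate([np.array([0.0, 1.0, 5000.0, 10000.0]),
                    np.linspace(10001.0, 60000.0, 50)])
y = np.where(x <= 10000.0, 130.0, 130.0 + 0.002 * (x - 10000.0))

knot = 10000.0
h = np.maximum(0.0, x - knot)          # MARS-style hinge basis: h(x) = max(0, x - knot)

# Ordinary least squares on [1, h]; the intercept captures the flat 130 level.
A = np.column_stack([np.ones_like(x), h])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"Y = {b0:.3f} + {b1:.6f} * MAX(0, X - {knot:.0f})")  # Excel-ready form
```

For the full data set you would keep the hinges Earth finds automatically for the upper range and simply append this hand-placed one as an extra column.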
I need to crossmatch a list of astronomical coordinates with different catalogues, and I want to decide a maximum radius for the crossmatch. This will avoid mismatches between my list and the catalogues.
To do this, I compute the separation to the best catalogue match for each object in my list. My initial list is supposed to contain the positions of known objects, but it could happen that an object is not detected in the catalogue, and my coordinates may suffer from small offsets.
The way I am computing the maximum radius is by fitting the gaussian kernel density of the separations with a gaussian, and using the centre + 3 sigma value. The method works nicely for most cases, but when a small subsample of my list has an offset, I get two gaussians instead. In these cases, I will specify the max radius in a different way.
My problem is that when this happens, curve_fit usually can't do the fit with a single gaussian. For a scientific publication, I will need to justify the "no fit" from curve_fit, and in which cases the "different way" is used. Could someone give me a hand on what this means in mathematical terms?
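For concreteness, a sketch of the pipeline as described (gaussian KDE of the separations, a single-gaussian curve_fit to the KDE curve, radius = centre + 3σ), with synthetic separations and assumed initial guesses; the except branch is the "no fit" case in question:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import curve_fit

def gauss(x, a, mu, sig):
    return a * np.exp(-0.5 * ((x - mu) / sig) ** 2)

# Synthetic separations (arcsec): one well-behaved population plus a small
# offset sub-sample that produces the second bump.
rng = np.random.default_rng(3)
sep = np.concatenate([np.abs(rng.normal(0.3, 0.1, 900)),
                      rng.normal(1.5, 0.1, 100)])

grid = np.linspace(0.0, 3.0, 300)
kde = gaussian_kde(sep)(grid)

try:
    (a, mu, sig), _ = curve_fit(gauss, grid, kde,
                                p0=[kde.max(), grid[np.argmax(kde)], 0.2])
    max_radius = mu + 3.0 * abs(sig)
    print(f"max radius = {max_radius:.2f} arcsec")
except RuntimeError:
    # This is the 'no fit' case: curve_fit did not converge to a single gaussian.
    print("single-gaussian fit failed; fall back to the alternative criterion")
```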
There are varying lengths to which you can go in justifying this or that fitting ansatz - it strongly depends on the details of your specific case (e.g.: why do you expect a gaussian to work in the first place? to what depth do you need or want to delve into why exactly a certain fitting procedure fails, what exactly counts as a failure, etc.).
If the question is really about the curve_fit and its failure to converge, then show us some code and some input data which demonstrate the problem.
If the question is about how to evaluate the goodness-of-fit, you're best off going back to the library and picking a good book on statistics.
If all you are looking for is a way of justifying why, in a certain case, a gaussian is not a good fitting ansatz, one way would be to calculate the moments: for a gaussian distribution the 1st, 2nd, 3rd and higher moments are related to each other in a very precise way. If you can demonstrate that for your underlying data the relations between the moments are very different, it sounds reasonable that these data can't be fit by a gaussian.
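A small sketch of that moment check: for a gaussian the skewness (standardised 3rd moment) is 0 and the excess kurtosis is 0, so clearly non-zero sample values are a quantitative argument that the data are not gaussian; scipy also wraps both into a formal normality test. The synthetic samples below are only for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
gaussian_like = rng.normal(0.3, 0.1, 1000)
bimodal = np.concatenate([rng.normal(0.3, 0.1, 900), rng.normal(1.5, 0.1, 100)])

for name, data in [("gaussian-like", gaussian_like), ("bimodal", bimodal)]:
    skew = stats.skew(data)              # 3rd standardised moment, 0 for a gaussian
    kurt = stats.kurtosis(data)          # excess kurtosis, 0 for a gaussian
    stat, p = stats.normaltest(data)     # D'Agostino-Pearson test built on both
    print(f"{name}: skew={skew:.2f}, excess kurtosis={kurt:.2f}, normaltest p={p:.3g}")
```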