I have data that I want to fit with a Fourier3 series. I looked at this answer: here, and tried different algorithms from different packages (like symfit and scipy), but when I plot the data, the different packages give me this result:
Currently, I'm using curve_fit from scipy.optimize, and here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range(2, len(as_bs)-1, 2):
        sum_a += as_bs[i] * np.cos(j * w * x)
        sum_b += as_bs[i+1] * np.sin(j * w * x)
        j = j + 1
    return a0 + sum_a + sum_b
T = pd.read_excel('FS_data.xlsx')
A = pd.DataFrame(T)
xdata = np.array(A.iloc[:, 0])
ydata = np.array(A.iloc[:, 1])
# fits
popt, pcov = curve_fit(fourier, xdata, ydata, [np.random.rand(1)] * 8)
print(popt)
data_fit = fourier(ydata, *popt)
print(data_fit)
plt.plot(ydata)
plt.plot(data_fit, label='after fitting')
plt.legend()
plt.show()
So, my code basically generates 8 random numbers and assigns them as initial guesses for (f, a0, a1, b1, a2, b2, a3, b3) respectively.
I tried to fit the data on Matlab to check if the data can be fitted with the fourier3 and the results there are great:
I printed the output from both Python and Matlab to compare, and here are the results for both:
Python:
w = 5.66709943e-01
a0 = 3.80499132e+01
a1 = 5.56883486e-04
b1 = -3.88408379e-04
a2 = -3.88408379e-04
b2 = 3.32951592e-04
a3 = 3.15641900e-04
b3 = 1.96414168e-04
Matlab:
a0 = 38.07 (38.07, 38.08)
a1 = 0.5352 (0.4951, 0.5753)
b1 = -0.5788 (-0.5863, -0.5714)
a2 = -0.3728 (-0.413, -0.3326)
b2 = 0.5411 (0.492, 0.5901)
a3 = 0.2357 (0.2226, 0.2488)
b3 = 0.05895 (0.02773, 0.09018)
w = 0.0003088
So, as noted, only the value of a0 is close; the others are very far from the Matlab results.
Why am I getting this result in Python? What am I doing wrong?
Here is the data for those who like to test it out:
https://docs.google.com/spreadsheets/d/18lL1iMZ3kdaqUUtRDLNRK4A3uCPzOrXt/edit?usp=sharing&ouid=112684448221465330517&rtpof=true&sd=true
I am not into Matlab, so I don't know what additional work the Matlab fit does to estimate starting values for a non-linear fit. I can say, though, that curve_fit does none at all, i.e. all values are assumed to be on the order of 1. The easiest way would have been to rescale the x axis to the range [0, 2 pi]. Hence, the problem of the OP is, once again, wrong starting values. Rescaling requires, however, the knowledge that the main wave to be fitted is approximately the width of the data set. Moreover, we need to assume that all other fit parameters are also of the order 1. Luckily, this is the case, so this would have worked:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

xdat, ydat = np.loadtxt( "data.tsv", unpack=True, skiprows=1 )

def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range( 2, len( as_bs ) - 1, 2 ):
        sum_a += as_bs[i] * np.cos( j * w * x )
        sum_b += as_bs[i+1] * np.sin( j * w * x )
        j = j + 1
    return a0 + sum_a + sum_b

"""
let's rescale the data to get the base frequency in the range of one
"""
xmin = min( xdat )
xmax = max( xdat )
xdat = ( xdat - xmin ) / ( xmax - xmin ) * 2 * np.pi

popt, pcov = curve_fit(
    fourier,
    xdat, ydat,
    p0 = np.ones(8)
)
### here I assume that higher orders are similar to lower orders
### but slightly smaller ... hoping that the fit corrects errors in
### this assumption
print(popt)

### scale back w, noting that it scales inversely to x
print( popt[0] * 2 * np.pi / ( xmax - xmin ) )

data_fit = fourier( xdat, *popt )
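For completeness, this first fit can be checked with a quick plot; a minimal sketch reusing xdat, ydat and data_fit from the code above:
### quick visual check of the fit on the rescaled x axis
plt.plot( xdat, ydat, ls="", marker="o", ms=0.5, label="data" )
plt.plot( xdat, data_fit, label="fit with rescaled x" )
plt.legend()
plt.show()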
If we cannot make the assumptions above, we may only assume that there is a base frequency with a dominant contribution to the signal (note that this is not always true). In this case we can pre-calculate starting guesses in a non-iterative way.
The solution looks a bit more complicated:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import cumtrapz

xdat, ydat = np.loadtxt( "data.tsv", unpack=True, skiprows=1 )

def fourier(x, *as_bs):
    sum_a = 0
    sum_b = 0
    j = 1
    w = as_bs[0]
    a0 = as_bs[1]
    for i in range( 2, len( as_bs ) - 1, 2 ):
        sum_a += as_bs[i] * np.cos( j * w * x )
        sum_b += as_bs[i+1] * np.sin( j * w * x )
        j = j + 1
    return a0 + sum_a + sum_b

#### initial guess
"""
This uses the fact that if y = a sin w t + b cos w t + c, we have
int int y = -y/w^2 + c/2 t^2 + d t + e,
i.e. we can get 1/w^2 as a linear fit parameter without the danger of
a non-linear iterative fit running into a local minimum.
For details see:
https://scikit-guess.readthedocs.io/en/sine/_downloads/4b4ed1e691ff195be3ca73879a674234/Regressions-et-equations-integrales.pdf
"""
Sy = cumtrapz( ydat, xdat, initial=0 )
SSy = cumtrapz( Sy, xdat, initial=0 )

ST = np.array( [
    ydat, xdat**2, xdat, np.ones( len( xdat ) )
] )
S = np.transpose( ST )

eta = np.dot( ST, SSy )
A = np.dot( ST, S )
sol = np.linalg.solve( A, eta )
wFit = np.sqrt( -1 / sol[0] )

### linear parameters
"""
Once we have a good guess for w, we can get starting guesses for
a, b and c from a standard linear fit.
"""
ST = np.array( [
    np.sin( wFit * xdat ), np.cos( wFit * xdat ), np.ones( len( xdat ) )
] )
S = np.transpose( ST )
eta = np.dot( ST, ydat )
A = np.dot( ST, S )
sol = np.linalg.solve( A, eta )
a1 = sol[0]
b1 = sol[1]
a0 = sol[2]

### final non-linear fit
"""
Now we can use the guesses from above as input for the final
non-linear fit. Hopefully, we are now close enough to the global minimum
and the algorithm converges reasonably.
"""
popt, pcov = curve_fit(
    fourier,
    xdat, ydat,
    p0=[
        wFit, a0, a1, b1,
        a1 / 2, b1 / 2,
        a1 / 4, b1 / 4
    ]
)
### here I assume that higher orders are similar to lower orders
### but slightly smaller ... hoping that the fit corrects errors in
### this assumption
print(popt)

data_fit = fourier( xdat, *popt )
plt.plot( xdat, ydat, ls="", marker="o", ms=0.5, label="data" )
plt.plot( xdat, data_fit, label='fitting')
plt.legend()
plt.show()
Both provide basically the same solution, with the latter code being applicable to more cases with fewer assumptions.
Related
The curve and my attempt at fitting:
I wish to find the coefficients (A, B, C, D, E, F) for my model function A * x**2 + B * x + C * np.cos(D * x - E) + F that would almost exactly match the blue curve. But because I used SciPy's curve_fit, which finds the curve with the lowest squared difference, it is going to look like the red curve in the image, while I would want the red curve to match up with the crests and troughs of the blue curve. Can scipy do this, and how do you do it? If not, is there another library that can handle this?
This is the method mentioned by JJacquelin to make a double linear fit. It fits the data and can be used to provide initial guesses for the non-linear fit. Note that for this method, it is required to express P sin( w t + p ) as A sin( w t ) + B cos( w t ), but that is easily done.
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumtrapz
from scipy.optimize import curve_fit

def signal( x, A, B, C, D, E, F ):
    ### note: C, D, E, F have a different meaning here
    r = (
        A * x**2
        + B * x
        + C
        + D * np.sin( F * x )
        + E * np.cos( F * x )
    )
    return r

def signal_p( x, A, B, C, D, E, F ):
    r = (
        A * x**2
        + B * x
        + C * np.sin( D * x - E )
        + F
    )
    return r
testparams = [ -1, 1, 3, 0.005, 0.03, 22 ]
### test data with noise
xl = np.linspace( -0.3, 1.6, 190 )
sl = signal( xl, *testparams )
sl += np.random.normal( size=len( xl ), scale=0.005 )
### numerical integrals
Sl = cumtrapz( sl, x=xl, initial=0 )
SSl = cumtrapz( Sl, x=xl, initial=0 )
### fitting the integro-differential equation to get the frequency
"""
note:
with y = A x**2 +...+ D sin() + E cos()
the double integral int( int(y) ) = a x**4 + ... - y/F**2
"""
VMXT = np.array( [ xl**4, xl**3, xl**2, xl, np.ones( len( xl ) ), sl ] )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, SSl )
AI = np.linalg.inv( A )
result = np.dot( AI , SV )
print ( "Fit: ",result )
F = np.sqrt( -1 / result[-1] )
print("F = ", F)
### Fitting the linear parameters with the frequency known
VMXT = np.array(
    [
        xl**2, xl, np.ones( len( xl ) ),
        np.sin( F * xl ), np.cos( F * xl )
    ]
)
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, sl )
AI = np.linalg.inv( A )
A, B, C, D, E = np.dot( AI , SV )
print( A, B, C, D, E )
### Non-linear fit with initial guesses
amp = np.sqrt( D**2 + E**2 )
phi = -np.arctan( D / E )
opt, cov = curve_fit( signal_p, xl, sl, p0=( A, B, amp, F, phi, C ) )
print( opt )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot(
    xl, sl,
    ls='', marker='+', label="data", markersize=5
)
ax.plot(
    xl, signal( xl, A, B, C, D, E, F ),
    ls="--", label="double linear fit"
)
ax.plot(
    xl, signal_p( xl, *opt ),
    ls=":", label="non-linear"
)
ax.legend( loc=0 )
ax.grid()
plt.show()
Providing
Fit: [-0.083161 0.1659759 1.49879056 0.848999 0.130222 -0.001990]
F = 22.414133356157887
-0.998516 0.998429 3.000265 0.012701 0.026926
[-0.99856269 0.9973273 0.0305014 21.96402992 -1.4215656 3.00100979]
and
When using the non-linear fit without initial guesses, I basically get a parabola. One can understand why when visualizing a sine half-wave: that is basically a parabola as well. Hence, the non-linear fit drives the according parameters in that direction, especially given that the default initial guesses are 1. So one starts far off the small amplitude and the high frequency, and the fit only finds a local minimum in the chi-square hyper-surface.
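To illustrate the point, one can run the fit with the default guesses and compare; a minimal sketch reusing signal_p, xl and sl from the code above (maxfev is raised in case the default iteration limit is hit):
### fit without meaningful initial guesses: all parameters start at 1
opt_default, _ = curve_fit( signal_p, xl, sl, maxfev=20000 )
print( opt_default )
### the resulting curve follows the parabolic trend, not the oscillation
plt.plot( xl, sl, ls='', marker='+', markersize=5, label="data" )
plt.plot( xl, signal_p( xl, *opt_default ), label="default guesses" )
plt.legend()
plt.show()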
I'm trying to make a piecewise linear fit consisting of 3 pieces, whereof the first and last pieces are constant. As you can see in this figure, I
don't get the expected fit, since the fit doesn't capture the 3 linear pieces that are clearly visible in the original data points.
I've tried following this question and expanded it to the case of 3 pieces with the two constant pieces, but I must have done something wrong.
Here is my code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = [16, 6]
x = np.arange(0, 50, dtype=float)
y = np.array([50 for i in range(10)]
             + [50 - (50-5)/31 * i for i in range(1, 31)]
             + [5 for i in range(10)],
             dtype=float)

def piecewise_linear(x, x0, y0, x1, y1):
    return np.piecewise(x,
                        [x < x0, (x >= x0) & (x < x1), x >= x1],
                        [lambda x: y0, lambda x: (y1-y0)/(x1-x0)*(x-x0)+y0, lambda x: y1])
p , e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 50, 101)
plt.plot(x, y, "o", label='original data')
plt.plot(xd, piecewise_linear(xd, *p), label='piecewise linear fit')
plt.legend()
The accepted answer to the previously mentioned question suggests looking at segments_fit.ipynb for the case of N parts, but following that it doesn't seem that I can specify that the first and last pieces should be constant.
Furthermore, I do get the following warning:
OptimizeWarning: Covariance of the parameters could not be estimated
What am I doing wrong?
You could directly copy the segments_fit implementation
from scipy import optimize

def segments_fit(X, Y, count):
    xmin = X.min()
    xmax = X.max()

    seg = np.full(count - 1, (xmax - xmin) / count)

    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])

    def func(p):
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        return px, py

    def err(p):
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)

    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)
Then you apply it as follows
import numpy as np
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40)
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
This will give you (fx, fy), the corners of your piecewise fit. Let's plot it:
import matplotlib.pyplot as plt
# show the results
plt.figure(figsize=(8, 3))
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.')
plt.legend(['fitted line', 'given points'])
EDIT: Introducing constant segments
As mentioned in the comments, the above example doesn't guarantee that the output will be constant in the end segments.
Based on this implementation, the easiest way I can think of is to restrict func(p) to do that. A simple way to ensure a segment is constant is to set y[i+1] == y[i]. Thus I added xanchors and yanchors. If you give an array with repeated numbers, you can bind multiple points to the same value.
from scipy import optimize

def segments_fit(X, Y, count, xanchors=slice(None), yanchors=slice(None)):
    xmin = X.min()
    xmax = X.max()

    seg = np.full(count - 1, (xmax - xmin) / count)

    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])

    def func(p):
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        py = py[yanchors]
        px = px[xanchors]
        return px, py

    def err(p):
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)

    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)
I modified the data generation a little to make the effect of the change clearer:
import matplotlib.pyplot as plt
import numpy as np
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40) + np.random.randn(len(x)) + 0.25 * x
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.k')
# apply the segment fit with some consecutive points having the
# same anchor
fx, fy = segments_fit(x, y, 3, yanchors=[1,1,2,2])
plt.plot(fx, fy, 'o--r')
plt.legend(['fitted line', 'given points', 'with const segments'])
You can get a one-line solution (not counting the import) using univariate splines of degree one, like this:
from scipy.interpolate import UnivariateSpline
f = UnivariateSpline(x,y,k=1,s=0)
Here k=1 means we interpolate using polynomials of degree one, aka lines. s is the smoothing parameter. It decides how much you want to compromise on the fit to avoid using too many segments. Setting it to zero means no compromise, i.e. the line HAS to go through all points. See the documentation.
Then
plt.plot(x, y, "o", label='original data')
plt.plot(x, f(x), label='linear interpolation')
plt.legend()
plt.savefig("out.png", dpi=300)
gives
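If a coarser fit with fewer segments is acceptable, a nonzero s can be used; a minimal sketch, where the value s=10 is only an illustrative guess that would need tuning to the data:
from scipy.interpolate import UnivariateSpline
# a larger smoothing parameter lets the spline deviate from the points,
# so fewer knots (i.e. fewer linear segments) are used
f_smooth = UnivariateSpline(x, y, k=1, s=10)
plt.plot(x, y, "o", label='original data')
plt.plot(x, f_smooth(x), label='smoothed linear spline')
plt.legend()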
This I consider a funny non-linear approach that works quite well.
Note that even though this is highly non-linear, it approximates the linear behavior very well. Moreover, the fit parameters provide the linear results. Only for the offset b is a little transformation and the according error propagation required. (Also, I don't care about the value of p as long as it is somewhat larger than 5.)
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
np.set_printoptions( linewidth=250, precision=4 )
### piecewise linear function for data generation
def pwl( x, m, b, a1, a2 ):
    if x < a1:
        out = pwl( a1, m, b, a1, a2 )
    elif x > a2:
        out = pwl( a2, m, b, a1, a2 )
    else:
        out = m * x + b
    return out

### non-linear approximation
def func( x, m, b, a1, a2, p ):
    out = b + np.log(
        1 / ( 1 + np.exp( -m * ( x - a1 ) )**p )
    ) / p - np.log(
        1 / ( 1 + np.exp( -m * ( x - a2 ) )**p )
    ) / p
    return out
### some data
nn = 36
xdata = np.linspace( -5, 19, nn )
ydata = np.fromiter( (pwl( x, -2.1, 11.6, -1.1, 12.7 ) for x in xdata ), float)
ydata += np.random.normal( size=nn, scale=0.2)
### dense grid for printing
xth = np.linspace( -5, 19, 150 )
###fitting
popt, cov = curve_fit( func, xdata, ydata, p0=[-2, 11, -1, 10, 1])
mF, betaF, a1F, a2F, pF = popt
bF = betaF - mF * a1F
sol=( mF, bF, a1F, a2F, pF )
### transforming the covariance due to the b' -> b mapping
J1 = np.identity(5)
J1[1,0] = -popt[2]
J1[1,2] = -popt[0]
cov2 = np.dot( J1, np.dot( cov, np.transpose( J1 ) ) )
### results
print( cov2 )
for i, v in enumerate( ( "m", "b", "a1", "a2", "p" ) ):
    print( "{:>2} = {:+2.4e} ± {:0.4e}".format( v, sol[i], np.sqrt( cov2[i,i] ) ) )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xdata, ydata, ls='', marker='+' )
ax.plot( xth, func( xth, -2, 11, -1, 10, 1 ) )
ax.plot( xth, func( xth, *popt ) )
plt.show()
Providing
[[ 1.3553e-04 -7.6291e-04 -4.3488e-04 4.5624e-04 1.2619e-01]
[-7.6291e-04 6.4126e-03 3.4560e-03 -1.5573e-03 -7.4983e-01]
[-4.3488e-04 3.4560e-03 3.4741e-03 -9.8284e-04 -4.2344e-01]
[ 4.5624e-04 -1.5573e-03 -9.8284e-04 3.0842e-03 -5.2739e+00]
[ 1.2619e-01 -7.4983e-01 -4.2344e-01 -5.2739e+00 3.1583e+05]]
m = -2.0810e+00 ± 9.7718e-03
b = +1.1463e+01 ± 6.7217e-02
a1 = -1.2545e+00 ± 5.0384e-02
a2 = +1.2739e+01 ± 4.7176e-02
p = +1.6840e+01 ± 2.9872e+02
and
I attempted to use the code below as a guide, in order to solve a non-linear equation, but I continue to get errors such as "object too deep for desired array" and "Result from function call is not a proper array of floats".
from scipy.optimize import fsolve
from math import exp

def equations(vars):
    x, y = vars
    eq1 = x + y**2 - 4
    eq2 = exp(x) + x*y - 3
    return [eq1, eq2]

x, y = fsolve(equations, (1, 1))
print(x, y)
My code is posted below. The error points to the line "Q = fsolve(equations, 1)".
%reset -f
from math import *
T = 4 # N·m
ω = 1800*(pi/30) # rad/s
A1 = .00131 # m^2
A2 = .00055 # m^2
P1 = 12000 # Pa
P2 = 200000 # Pa
ρ = 1000 # kg/m^3
μ = .89e-3 # N·s/m^2
η = .57 # % efficiency
g = 9.81 # m/s^2
γ = 9810 # N/m^3
Z2 = .7 # m
P_motor = T*ω
print(P_motor)
P_pump = P_motor*η
print(P_pump)
from scipy.optimize import fsolve
def equations(vars):
    Q = vars
    eq1 = γ*Q*( ((P2-P1)/γ) + ( ( ( (Q**2) / (A2**2) ) - ( (Q**2) / (A1**2) ) )/2*g) + Z2) - P_pump
    return [eq1]
Q = fsolve(equations, 1)
print(Q)
Since you have only one equation with one unknown variable, you don't need to put the output in a list. You can replace return [eq1] with return eq1.
The new code would be:
%reset -f
from math import *
T = 4 # N·m
ω = 1800*(pi/30) # rad/s
A1 = .00131 # m^2
A2 = .00055 # m^2
P1 = 12000 # Pa
P2 = 200000 # Pa
ρ = 1000 # kg/m^3
μ = .89e-3 # N·s/m^2
η = .57 # % efficiency
g = 9.81 # m/s^2
γ = 9810 # N/m^3
Z2 = .7 # m
P_motor = T*ω
print(P_motor)
P_pump = P_motor*η
print(P_pump)
from scipy.optimize import fsolve
def equations(vars):
    Q = vars
    eq1 = γ*Q*( ((P2-P1)/γ) + ( ( ( (Q**2) / (A2**2) ) - ( (Q**2) / (A1**2) ) )/2*g) + Z2) - P_pump
    return eq1
Q = fsolve(equations, 1)
print(Q)
Output:
753.9822368615503
429.7698750110836
[0.0011589]
The documentation states
func : callable f(x, *args)
A function that takes at least one (possibly vector) argument, and returns a value of the same length.
If your input is a list of 2 values, it expects the function to return something of the same shape. So in your first example you pass [x, y] and return [eq1, eq2], so it works; but in the second case you pass a scalar and return a list.
So, you can change your input to Q = fsolve(equations, (1,)) or change your returned value to return eq1:
def equations(vars):
    Q = vars
    eq1 = γ*Q*( ((P2-P1)/γ) + ( ( ( (Q**2) / (A2**2) ) - ( (Q**2) / (A1**2) ) )/2*g) + Z2) - P_pump
    return eq1
I'm using LMFIT to fit a piecewise polynomials to the first quadrant of a sine wave.
I would like to be able to add a constraint on the polynomial output - as opposed to on its parameters.
For example, I would like to ensure that the output is >= 0 and <= 1.0 (which of course only affects the first and last segment in the code below).
Another use case is if I want the polynomial to pass through some specific exact (x, y) points.
I understand this might be better done with np.polyfit but eventually I want to add more non-linear constraints and the LMFIT framework is more flexible.
import numpy as np
from lmfit.models import LinearModel
#split sine wave in 4 segments with 1024 points
nseg = 4
frac = 2**10
npoints = nseg*frac
xfrac = np.linspace(0, 1, num=frac, endpoint=False)
x = np.linspace(0, 1, num=npoints, endpoint=False)
y = np.sin(x*np.pi/2)
yseg = np.reshape(y, (nseg, frac))
mod = LinearModel()
coeff = []
bestfit = []
for i in range(nseg):
    pars = mod.guess(yseg[i], x=xfrac)
    out = mod.fit(yseg[i], pars, x=xfrac)
    coeff.append([out.best_values['slope'], out.best_values['intercept']])
    bestfit.append(out.best_fit)
bestfit = np.reshape(bestfit, (1, npoints))[0]
It turns out this is done by adding constraints on the parameters themselves that translate into the right constraint on the model output.
Using a custom model for linear interpolation, it can be done as follows:
from lmfit import Model, Parameters

def func(x, c0, c1):
    return c0 + c1*x

pmodel = Model(func)
params = Parameters()
params.add('c0', value=0)  # give c0 a finite starting value
params.add('clip', value=0, max=1.0, vary=True)
params.add('c1', expr='clip-c0')
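A minimal usage sketch, assuming the xfrac and yseg arrays from the question, could then look like this (only meant to show how the constrained parameters plug into the fit):
# fit one segment; at x = 1 the model evaluates to c0 + c1 = clip, which is capped at 1.0
result = pmodel.fit(yseg[-1], params, x=xfrac)
print(result.best_values)
print(result.eval(x=1.0))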
One option might be using splines.
A quick and dirty approach, just to present the idea, might look like this:
import matplotlib.pyplot as plt
import numpy as np

## quick and dirty spline function
def l_spline( x, abc ):
    if isinstance( x, ( list, tuple, np.ndarray ) ):
        out = [ l_spline( elem, abc ) for elem in x ]
    else:
        a, b, c = abc
        if x < a:
            f = lambda t: 0
        elif x < b:
            f = lambda t: ( t - a ) / ( b - a )
        elif x < c:
            f = lambda t: -( t - c ) / ( c - b )
        else:
            f = lambda t: 0
        out = f(x)
    return out

### test data
xl = np.linspace( 0, 4, 150 )
sl = np.fromiter( ( np.sin( elem ) for elem in xl ), float )

### test splines with manual double knots on first and last
yl = dict()
yl[0] = l_spline( xl, ( 0, 0, .4 ) )
for i in range( 1, 10 ):
    yl[i] = l_spline( xl, ( ( i - 1 ) * 0.4, i * 0.4, ( i + 1 ) * 0.4 ) )
yl[10] = l_spline( xl, ( 3.6, 4, 4 ) )

## this is the most simple linear least squares for the coefficients
AT = list()
for i in range( 11 ):
    AT.append( yl[i] )
AT = np.array( AT )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, sl )

## adding up the weighted sum
out = np.zeros( len( sl ) )
for a, l in zip( v, AT ):
    out += a * l

### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xl, sl, ls=':' )
for i in range( 11 ):
    ax.plot( xl, yl[i] )
ax.plot( xl, out, color='k' )
plt.show()
Looks like this:
Instead of the simple linear optimization one could use more complex approaches to ensure that no coefficient is larger than 1; this automatically ensures that the function does not go beyond 1. A fixed point can be established by setting the according B-spline coefficient to a fixed value, i.e. not fitting its parameter.
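To sketch that idea, the plain linear solve above could be replaced by a bounded least-squares solve, for instance with scipy.optimize.lsq_linear; the snippet below is only an illustration of the suggestion, reusing A, AT and sl from the code above, with the pinned value 0.0 chosen arbitrarily:
from scipy.optimize import lsq_linear

## bound all spline coefficients to [0, 1]; since the hat functions are
## between 0 and 1 and sum to at most 1 at any x, the fitted curve then
## cannot exceed 1 either
res = lsq_linear( A, sl, bounds=( 0, 1 ) )
v_bounded = res.x

## a fixed point: pin the first coefficient, subtract its contribution,
## and fit only the remaining coefficients
v0_fixed = 0.0
res_fixed = lsq_linear( A[:, 1:], sl - v0_fixed * AT[0], bounds=( 0, 1 ) )
v_rest = res_fixed.x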
I am trying to make a Gaussian fit on a function that is messy. I want to only fit the exterior outer shell (these are not just the max values at each x, because some of the max values will be too low, since the sample size is small).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def Gauss(x, a, x0, sigma, offset):
    return a * np.exp(-np.power(x - x0, 2) / (2 * np.power(sigma, 2))) + offset

def fitNormal(x, y):
    popt, pcov = curve_fit(Gauss, x, y, p0=[np.max(y), np.median(x), np.std(x), np.min(y)])
    return popt
plt.plot(xPlot,yPlot, 'k.')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Y(x)')
x,y = xPlot,yPlot
popt = fitNormal(x, y)
minx, maxx = np.min(x), np.max(x)
xFit = np.arange(start=minx, stop=maxx, step=(maxx-minx)/1000)
yFitTest = Gauss(xPlot, popt[0], popt[1], popt[2], popt[3])
print('max fit test: ',np.max(yFitTest))
print('max y: ',np.max(yPlot))
maxIndex = np.where(yPlot==np.max(yPlot))[0][0]
factor = yPlot[maxIndex]/yFitTest[maxIndex]
yFit = Gauss(xPlot, popt[0], popt[1], popt[2], popt[3]) * factor
plt.plot(xFit,yFit,'r')
This is an iterative approach similar to this post. It is different in the sense that the shape of the graph does not permit the use of a convex hull. So the idea is to create a cost function that tries to minimize the area of the graph while paying a high cost if a point is above the graph. Depending on the type of graph in the OP, the cost function needs to be adapted. One also has to check whether, in the final result, all points are really below the graph. Here one can fiddle with the details of the cost function. One may, e.g., include an offset in the tanh, like tanh( slope * ( x - offset ) ), to push the solution farther away from the data.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq

def g( x, a, s ):
    return a * np.exp( -x**2 / s**2 )

def cost_function( params, xData, yData, slope, val ):
    a, s = params
    area = 0.5 * np.sqrt( np.pi ) * a * s
    diff = np.fromiter( ( y - g( x, a, s ) for x, y in zip( xData, yData ) ), float )
    cDiff = np.fromiter( ( val * ( 1 + np.tanh( slope * d ) ) for d in diff ), float )
    out = np.concatenate( [ [ area ], cDiff ] )
    return out

xData = np.linspace( -5, 5, 500 )
yData = np.fromiter( ( g( x, .77, 2 ) * np.sin( 257.7 * x )**2 for x in xData ), float )

sol = [ [ 1, 2.2 ] ]
for i in range( 1, 6 ):
    solN, err = leastsq( cost_function, sol[-1], args=( xData, yData, 10**i, 1 ) )
    sol += [ solN ]
    print( solN )

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.scatter( xData, yData, s=1 )
for solN in sol:
    solY = np.fromiter( ( g( x, *solN ) for x in xData ), float )
    ax.plot( xData, solY )
plt.show()
giving
>> [0.8627445 3.55774814]
>> [0.77758636 2.52613376]
>> [0.76712184 2.1181137 ]
>> [0.76874125 2.01910211]
>> [0.7695663 2.00262339]
and
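The offset variant mentioned above could be sketched as follows; this is my reading of the suggestion, with the offset value chosen arbitrarily (it reuses g, xData, yData and leastsq from the code above):
def cost_function_offset( params, xData, yData, slope, val, offset ):
    a, s = params
    area = 0.5 * np.sqrt( np.pi ) * a * s
    diff = yData - g( xData, a, s )
    ### the penalty already sets in when diff > offset, so with a small
    ### negative offset the curve has to stay at least |offset| above the data
    cDiff = val * ( 1 + np.tanh( slope * ( diff - offset ) ) )
    return np.concatenate( [ [ area ], cDiff ] )

solO, errO = leastsq( cost_function_offset, [ 1, 2.2 ],
                      args=( xData, yData, 1000, 1, -0.02 ) )
print( solO )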
Here is a different approach using scipy's differential_evolution module combined with a "brick wall": if any predicted value during the fit is less than the corresponding Y value, the fitting error is made extremely large. I have shamelessly poached code from the answer of @mikuszefski to generate the data used in this example.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
def g( x, a, s ):
    return a * np.exp( -x**2 / s**2 )

xData = np.linspace( -5, 5, 500 )
yData = np.fromiter( ( g( x, .77, 2 ) * np.sin( 257.7 * x )**2 for x in xData ), float )

def Gauss(x, a, x0, sigma, offset):
    return a * np.exp(-np.power(x - x0, 2) / (2 * np.power(sigma, 2))) + offset

# function for the genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = Gauss(xData, *parameterTuple)
    multiplier = 1.0
    for i in range(len(val)):
        if val[i] < yData[i]:  # ****** brick wall ******
            multiplier = 1.0E10
    return np.sum((multiplier * (yData - val)) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)
    minData = min(minX, minY)
    maxData = max(maxX, maxY)

    parameterBounds = []
    parameterBounds.append([minData, maxData])  # parameter bounds for a
    parameterBounds.append([minData, maxData])  # parameter bounds for x0
    parameterBounds.append([minData, maxData])  # parameter bounds for sigma
    parameterBounds.append([minData, maxData])  # parameter bounds for offset

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3, polish=False)
    return result.x
# generate initial parameter values
geneticParameters = generate_Initial_Parameters()
# create values for display of fitted function
y_fit = Gauss(xData, *geneticParameters)
plt.scatter(xData, yData, s=1 ) # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print('parameters:', geneticParameters)