Related
I have 2 formulas that describes the behaviour in 2 perpendicular axes. Also I have data from FEM simulation. The goal is to use least mean square method to get parameters Rr, Lr and cm.
I wanted to use scipy.curve_fit unfortunately it accepts only single function as an input. In this case i would need it to accept 2 functions as an input.
I did something in excel where arguments are inserted by hand to prove that it can/can not be perfectly fitted. They cant be but i would like to get "best" fit.
Any idea how it can be solved besides hard coding the last mean square method by hand to calculate deviances and find min?
Thank you so much for help.
If not using packages like lmfit or similar, fitting curves with shared parameters will always require to write some sort of wrapper. Personally I'd write a residual function and use scipy.optimize.least_squares, but if one insists to use curve_fit, this would be a possible wrapper:
import numpy as np
from scipy.optimize import curve_fit
def f1( x, c, L, R):
a = c**2 * x / ( R**2 + (x * L )**2 )
return a * x * L
def f2( x, c, L, R):
a = c**2 * x / ( R**2 + (x * L )**2 )
return a * R
def falt( x, c, L, R, n=-1):
"""
by construction x is the doubled x-list, 0 <= nn / l < 1
and >= 1/2 is the second part
"""
if isinstance( x, ( list, tuple, np.ndarray ) ):
### curve_fit sends array
l = len( x )
out = [ falt( xx, c, L, R, n=( nn / l ) ) for nn, xx in enumerate( x ) ]
else:
if n < 0.5:
out = f1( x, c, L, R)
else:
out = f2( x, c, L, R)
return out
## some data
c0=1.2
L0=0.3
R0= 0.45
size = 99
xl = np.linspace( 0, 10, size )
y1l = f1( xl , c0, L0, R0 ) + ( 2 * np.random.random( size=size ) - 1 ) * 0.1
y2l = f2( xl , c0, L0, R0 ) + ( 2 * np.random.random( size=size ) - 1 ) * 0.1
sol, err = curve_fit(
falt,
np.append( xl, xl ),
np.append( y1l, y2l )
)
print( sol )
You can put the relative importance of the functions in a hyperparamter lambda, then use func1 + lambda * func2.
With code:
importance_of_func1_relative_to_func2 = 1
def objective(args1, args2):
return func1(args1) * importance_of_func1_relative_to_func2 + func2(args2)
If I understand the goal (not sure of that!), I think that what you might want to do is have a single function that evaluates your 2 values for Fperp and Fpara and then concatenates them. You wrote those as both being multiplied by |z| (maybe abs(zhat)?) -- I cannot tell if that should be a common scaling factor, a fitting variable, or some other array of values...
Anyway, I might suggest a function like
def f_model(omega, cm, rr, lr, zhat):
ll = lr * omega
scale = abs(zhat) * cm**2 / (rr**2 + ll**2)
fpara = scale * ll * omega
fperp = scale * rr * omega
return np.concatenate((fpara, fperp))
Then you would want to arrange the data that you model with this function to also be the concatenation of the data corresponding to fpara and fperp.
That concatenation would effectively fit Fpara and Fperp together, weighting them evenly in the fit.
The curve and my attempt at fitting:
I wish to find the coefficients (A, B, C, D, E, F) for my model function: A * x**2 + B * x + C * np.cos(D * x - E) + F that would almost exactly match the blue curve. But because I used SciPy's optimization curve_fit, which finds the curve with the lowest square difference, it's going to look like the red curve in the image. While I would want the red curve to match up with the crests and troughs of the blue curve. Can scipy do this and how do you do it. If not is there any other library that can handle this?
This is the method mentioned by JJacquelin to make a double linear fit. It fits the data and can be used to provide initial guesses for the non-linear fit. Note that for this method, it is required to express P sin( w t + p ) as A sin( w t ) + B cos( w t ), but that is easily done.
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import cumtrapz
from scipy.optimize import curve_fit
def signal( x, A, B, C, D, E, F ):
### note: C, D, E, F have different meaning here
r = (
A * x**2
+ B * x
+ C
+ D * np.sin( F * x )
+ E * np.cos( F * x )
)
return r
def signal_p( x, A, B, C, D, E, F ):
r = (
A * x**2
+ B * x
+ C * np.sin( D * x - E )
+ F
)
return r
testparams = [ -1, 1, 3, 0.005, 0.03, 22 ]
### test data with noise
xl = np.linspace( -0.3, 1.6, 190 )
sl = signal( xl, *testparams )
sl += np.random.normal( size=len( xl ), scale=0.005 )
### numerical integrals
Sl = cumtrapz( sl, x=xl, initial=0 )
SSl = cumtrapz( Sl, x=xl, initial=0 )
### fitting the integro-differential equation to get the frequency
"""
note:
with y = A x**2 +...+ D sin() + E cos()
the double integral int( int(y) ) = a x**4 + ... - y/F**2
"""
VMXT = np.array( [ xl**4, xl**3, xl**2, xl, np.ones( len( xl ) ), sl ] )
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, SSl )
AI = np.linalg.inv( A )
result = np.dot( AI , SV )
print ( "Fit: ",result )
F = np.sqrt( -1 / result[-1] )
print("F = ", F)
### Fitting the linear parameters with the frequency known
VMXT = np.array(
[
xl**2, xl, np.ones( len( xl ) ),
np.sin( F * xl), np.cos( F * xl )
]
)
VMX = VMXT.transpose()
A = np.dot( VMXT, VMX )
SV = np.dot( VMXT, sl )
AI = np.linalg.inv( A )
A, B, C, D, E = np.dot( AI , SV )
print( A, B, C, D, E )
### Non-linear fit with initial guesses
amp = np.sqrt( D**2 + E**2 )
phi = -np.arctan( D / E )
opt, cov = curve_fit( signal_p, xl, sl, p0=( A, B, amp, F, phi, C ) )
print( opt )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot(
xl, sl,
ls='', marker='+', label="data", markersize=5
)
ax.plot(
xl, signal( xl, A, B, C, D, E, F ),
ls="--", label="double linear fit"
)
ax.plot(
xl, signal_p( xl, *opt ),
ls=":", label="non-linear"
)
ax.legend( loc=0 )
ax.grid()
plt.show()
Providing
Fit: [-0.083161 0.1659759 1.49879056 0.848999 0.130222 -0.001990]
F = 22.414133356157887
-0.998516 0.998429 3.000265 0.012701 0.026926
[-0.99856269 0.9973273 0.0305014 21.96402992 -1.4215656 3.00100979]
and
When using the non-linear fit without initial guesses, I get basically a parabola. One can understand why when visualizing a sine half-wave. That is basically a parabola as well. Hence, the non-linear fit drives the according parameters in that direction, especially knowing that the default initial guesses are 1. So one is far off the small amplitude and the high frequency. The fit only finds a local minimum in the chi-square hyper-plane.
I'm using LMFIT to fit a piecewise polynomials to the first quadrant of a sine wave.
I would like to be able to add a constraint on the polynomial output - as opposed to on its parameters.
For example, I would like to ensure that the output is >= 0 and <= 1.0 (which of course only affects the first and last segment in the code below).
Another use case if if I want the polynomial to pass through some specific (x,y) exact points.
I understand this might be better done with np.polyfit but eventually I want to add more non-linear constraints and the LMFIT framework is more flexible.
import numpy as np
from lmfit.models import LinearModel
#split sine wave in 4 segments with 1024 points
nseg = 4
frac = 2**10
npoints = nseg*frac
xfrac = np.linspace(0, 1, num=frac, endpoint=False)
x = np.linspace(0, 1, num=npoints, endpoint=False)
y = np.sin(x*np.pi/2)
yseg = np.reshape(y, (nseg, frac))
mod = LinearModel()
coeff = []
bestfit = []
for i in range(nseg):
pars = mod.guess(yseg[i], x=xfrac)
out = mod.fit(yseg[i], pars, x=xfrac)
coeff.append([out.best_values['slope'], out.best_values['intercept']])
bestfit.append(out.best_fit)
bestfit = np.reshape(bestfit, (1, npoints))[0]
Turns out this is done by adding constraints on the parameters themselves that turns into the right constraint on the model output.
Using a custom model for linear interpolation it can be done as following:
def func(x, c0, c1):
return c0 + c1*x
pmodel = Model(func)
params = Parameters()
params.add('c0')
params.add('clip', value=0, max=1.0, vary=True)
params.add('c1', expr='clip-c0')
One option might be using splines.
A quick and dirty approach, just to present the idea, might look like this:
import matplotlib.pyplot as plt
import numpy as np
## quich and dirty spline function
def l_spline(x, abc ):
if isinstance( x, ( list, tuple, np.ndarray ) ):
out = [ l_spline( elem, abc ) for elem in x]
else:
a, b, c = abc
if x < a:
f = lambda t: 0
elif x < b:
f = lambda t: ( t - a ) / ( b - a )
elif x < c:
f = lambda t: -( t - c ) / (c - b )
else:
f = lambda t: 0
out = f(x)
return out
### test data
xl = np.linspace( 0, 4, 150 )
sl = np.fromiter( ( np.sin( elem ) for elem in xl ), np.float )
### test splines with manual double knots on first and last
yl = dict()
yl[0] = l_spline( xl, ( 0, 0, .4 ) )
for i in range(1, 10 ):
yl[i] = l_spline( xl, ( (i - 1 ) * 0.4 , i * 0.4, (i + 1 ) * 0.4 ) )
yl[10] = l_spline( xl, ( 3.6, 4, 4 ) )
## This is the most simple linear least square for the coefficients
AT = list()
for i in range( 11 ):
AT.append( yl[i] )
AT = np.array( AT )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, sl )
## adding up the weigthed sum
out = np.zeros( len( sl ) )
for a, l in zip( v, AT ):
out += a * l
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xl, sl, ls=':' )
for i in range( 11 ):
ax.plot( xl, yl[i] )
ax.plot( xl, out, color='k')
plt.show()
Looks like this:
Instead of the simple linear optimization one could use more complex functions to ensure that no parameter is larger than 1. This automatically ensures that the function does not go beyond 1. A fixed point can be established by setting the according b-spline to a fixed value, i.e. not fitting its parameter.
Hello as the title suggests I have been trying to add an exponential and power law fit to my PDF.
As shown in this picture:
The code i am using produces the underlying graph:
The code is this one:
a11=[9.76032106e-02, 6.73754187e-02, 3.20683249e-02, 2.21788509e-02,
2.70850237e-02, 9.90377323e-03, 2.11573411e-02, 8.46232347e-03,
8.49027869e-03, 7.33997745e-03, 5.71819070e-03, 4.62720448e-03,
4.11562884e-03, 3.20064313e-03, 2.66192941e-03, 1.69116510e-03,
1.94355212e-03, 2.55224949e-03, 1.23822395e-03, 5.29618250e-04,
4.03769641e-04, 3.96865740e-04, 3.38530868e-04, 2.04124701e-04,
1.63913557e-04, 2.04486864e-04, 1.82216592e-04, 1.34708400e-04,
9.24289261e-05, 9.55074181e-05, 8.13695322e-05, 5.15610541e-05,
4.15425149e-05, 4.68101099e-05, 3.33696885e-05, 1.61893058e-05,
9.61743970e-06, 1.17314090e-05, 6.65239507e-06]
b11=[3.97213201e+00, 4.77600082e+00, 5.74255432e+00, 6.90471618e+00,
8.30207306e+00, 9.98222306e+00, 1.20023970e+01, 1.44314081e+01,
1.73519956e+01, 2.08636432e+01, 2.50859682e+01, 3.01627952e+01,
3.62670562e+01, 4.36066802e+01, 5.24316764e+01, 6.30426504e+01,
7.58010432e+01, 9.11414433e+01, 1.09586390e+02, 1.31764173e+02,
1.58430233e+02, 1.90492894e+02, 2.29044305e+02, 2.75397642e+02,
3.31131836e+02, 3.98145358e+02, 4.78720886e+02, 5.75603061e+02,
6.92091976e+02, 8.32155588e+02, 1.00056488e+03, 1.20305636e+03,
1.44652749e+03, 1.73927162e+03, 2.09126048e+03, 2.51448384e+03,
3.02335795e+03, 3.63521656e+03, 4.37090138e+03]
plt.plot(b11,a11, 'ro')
plt.yscale("log")
plt.xscale("log")
plt.show()
I would like to add to the underlying graph a power law fit at smaller time and an exponential fit for loner times based on chi square error minimization method.
The data for the x axis saved in csv form:
The data for the x axis:
As mentioned in my comments, I think you can couple the power law and the exponential via a constant term. Alternatively, the data look like it can be fitted by two power laws. Although the comments suggest that there is truly an exponential behavior. Anyhow, I show both approaches here. In both cases I try to avoid any type of piece-wise definition. This also ensures $C^infty$.
In the first approach we have a * x**( -b ) for small x and a1 * exp( -d * x ) for large x. The idea is to choose an c such that the power law is much bigger than c for the required small x but significantly smaller otherwise.
This allows for the function mentioned in my comment, namely ( a * x**( -b ) + c ) * exp( -d * x ) . One may consider c as an transition parameter.
In the alternative approaches, I am taking two power-laws. There are, hence, two regions, In the first one function one is smaller, in the second, the second is smaller. As I always want the smaller function I make inverse summation, i.e., f = 1 / ( 1 / f1 + 1 / f2 ). As can be seen in the code below, I add an additional parameter ( technically in ] 0, infty [ ). This parameter controls the smoothness of the transition.
import matplotlib.pyplot as mp
import numpy as np
from scipy.optimize import curve_fit
data = np.loadtxt( "7jyRi.txt", delimiter=',' )
#### p-e: power and exponential coupled via a small constant term
def func_log( x, a, b, c, d ):
return np.log10( ( a * x**( -b ) + c ) * np.exp( -d * x ) )
guess = [.1, .8, 0.01, .005 ]
testx = np.logspace( 0, 3, 150 )
testy = np.fromiter( ( 10**func_log( x, *guess ) for x in testx ), np.float )
sol, _ = curve_fit( func_log, data[ ::, 0 ], np.log10( data[::,1] ), p0=guess )
fity = np.fromiter( ( 10**func_log( x, *sol ) for x in testx ), np.float )
#### p-p: alternatively using two power laws
def double_power_log( x, a, b, c, d, k ):
s1 = ( a * x**( -b ) )**k
s2 = ( c * x**( -d ) )**k
out = 1.0 / ( 1.0 / s1 + 1.0 / s2 )**( 1.0 / k )
return np.log10( out )
aguess = [.1, .8, 1e7, 4, 1 ]
atesty = np.fromiter( ( 10**double_power_log( x, *aguess ) for x in testx ), np.float )
asol, _ = curve_fit( double_power_log, data[ ::, 0 ], np.log10( data[ ::, 1 ] ), p0=aguess )
afity = np.fromiter( ( 10**double_power_log( x, *asol ) for x in testx ), np.float )
#### plotting
fig = mp.figure( figsize=( 10, 8 ) )
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( data[::,0], data[::,1] ,ls='', marker='o', label="data" )
ax.plot( testx, testy ,ls=':', label="guess p-e" )
ax.plot( testx, atesty ,ls=':',label="guess p-p" )
ax.plot( testx, fity ,ls='-',label="fit p-e: {}".format( sol ) )
ax.plot( testx, afity ,ls='-', label="fit p-p: {}".format( asol ) )
ax.set_xscale( "log" )
ax.set_yscale( "log" )
ax.set_xlim( [ 5e-1, 2e3 ] )
ax.set_ylim( [ 1e-5, 2e-1 ] )
ax.legend( loc=0 )
mp.show()
The results look like
For completeness I'd like to add a solution with a piece-wise definition. As I want the function continuous and differentiable, the parameters of the exponential law are not completely free. With f = a * x**(-b) and g = alpha * exp( -beta * x ) and a transition at x0 I choose ( a, b, x0 ) as free parameters. From this alpha and beta follow. The equations have no easy solution though, such that this itself requires a minimization.
import matplotlib.pyplot as mp
import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import minimize
from scipy.special import lambertw
data = np.loadtxt( "7jyRi.txt", delimiter=',' )
def pwl( x, a, b):
return a * x**( -b )
def expl( x, a, b ):
return a * np.exp( -b * x )
def alpha_fun(alpha, a, b, x0):
out = alpha - pwl( x0, a, b ) * expl(1, 1, lambertw( pwl( x0, -a * b/ alpha, b ) ) )
return 1e10 * np.abs( out )**2
def p_w( v, a,b, alpha, beta, x0 ):
if v < x0:
out = pwl( v, a, b )
else:
out = expl( v, alpha, beta )
return np.log10( out )
def alpha_beta( x, a, b, x0 ):
"""
continuous and differentiable define alpha and beta
free parameter is the point where I connect
"""
sol = minimize(alpha_fun, .005, args=( a, b, x0 ) )### attention, strongly depends on starting guess, i.e might be a catastrophic fail
alpha = sol.x[0]
# ~print alpha
beta = np.real( -lambertw( pwl( x0, -a * b/ alpha, b ) )/ x0 )
###
if isinstance( x, ( np.ndarray, list, tuple ) ):
out = list()
for v in x:
out.append( p_w( v, a, b, alpha, beta, x0 ) )
else:
out = p_w( v, a, b, alpha, beta, x0 )
return out
sol,_ = curve_fit( alpha_beta, data[ ::, 0 ], np.log10( data[ ::, 1 ] ), p0=[ .1, .8, 70. ] )
alpha0 = minimize(alpha_fun, .005, args=tuple(sol ) ).x[0]
beta0 = np.real( -lambertw( pwl( sol[2], -sol[0] * sol[1]/ alpha0, sol[1] ) )/ sol[2] )
xl = np.logspace(0,3,100)
yl = alpha_beta( xl, *sol )
pl = pwl( xl, sol[0], sol[1] )
el = expl( xl, alpha0, beta0 )
#### plotting
fig = mp.figure( figsize=( 10, 8 ) )
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( data[::,0], data[::,1] ,ls='', marker='o', label="data" )
ax.plot( xl, pl ,ls=':', label="p" )
ax.plot( xl, el ,ls=':', label="{:0.3e} exp(-{:0.3e} x)".format(alpha0, beta0) )
ax.plot( xl, [10**y for y in yl] ,ls='-', label="sol: {}".format(sol) )
ax.axvline(sol[-1], color='k', ls=':')
ax.set_xscale( "log" )
ax.set_yscale( "log" )
ax.set_xlim( [ 5e-1, 2e3 ] )
ax.set_ylim( [ 1e-5, 2e-1 ] )
ax.legend( loc=0 )
mp.show()
Eventually providing
I am trying to make a gaussian fit on a function that is messy. I want to only fit the exterior outer shell (these are not just the max values at each x, because some of the max values will be too low too, because the sample size is low).
from scipy.optimize import curve_fit
def Gauss(x, a, x0, sigma, offset):
return a * np.exp(-np.power(x - x0,2) / (2 * np.power(sigma,2))) + offset
def fitNormal(x, y):
popt, pcov = curve_fit(Gauss, x, y, p0=[np.max(y), np.median(x), np.std(x), np.min(y)])
return popt
plt.plot(xPlot,yPlot, 'k.')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Y(x)')
x,y = xPlot,yPlot
popt = fitNormal(x, y)
minx, maxx = np.min(x), np.max(x)
xFit = np.arange(start=minx, stop=maxx, step=(maxx-minx)/1000)
yFitTest = Gauss(xPlot, popt[0], popt[1], popt[2], popt[3])
print('max fit test: ',np.max(yFitTest))
print('max y: ',np.max(yPlot))
maxIndex = np.where(yPlot==np.max(yPlot))[0][0]
factor = yPlot[maxIndex]/yFitTest[maxIndex]
yFit = Gauss(xPlot, popt[0], popt[1], popt[2], popt[3]) * factor
plt.plot(xFit,yFit,'r')
This is an iterative approach similar to this post. It is different in the sense that the shape of the graph does not permit the use of convex hull. So the idea is to create a cost function that tries to minimize the area of the graph while paying high cost if a point is above the graph. Depending on the type of the graph in OP the cost function needs to be adapted. One also has to check if in the final result all points are really below the graph. Here one can fiddle with details of the cost function. One my, e.g., include an offset in the tanh like tanh( slope * ( x - offset) ) to push the solution farther away from the data.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq
def g( x, a, s ):
return a * np.exp(-x**2 / s**2 )
def cost_function( params, xData, yData, slope, val ):
a,s = params
area = 0.5 * np.sqrt( np.pi ) * a * s
diff = np.fromiter ( ( y - g( x, a, s) for x, y in zip( xData, yData ) ), np.float )
cDiff = np.fromiter( ( val * ( 1 + np.tanh( slope * d ) ) for d in diff ), np.float )
out = np.concatenate( [ [area] , cDiff ] )
return out
xData = np.linspace( -5, 5, 500 )
yData = np.fromiter( ( g( x, .77, 2 ) * np.sin( 257.7 * x )**2 for x in xData ), np.float )
sol=[ [ 1, 2.2 ] ]
for i in range( 1, 6 ):
solN, err = leastsq( cost_function, sol[-1] , args=( xData, yData, 10**i, 1 ) )
sol += [ solN ]
print sol
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1)
ax.scatter( xData, yData, s=1 )
for solN in sol:
solY = np.fromiter( ( g( x, *solN ) for x in xData ), np.float )
ax.plot( xData, solY )
plt.show()
giving
>> [0.8627445 3.55774814]
>> [0.77758636 2.52613376]
>> [0.76712184 2.1181137 ]
>> [0.76874125 2.01910211]
>> [0.7695663 2.00262339]
and
Here is a different approach using scipy's Differental Evolution module combined with a "brick wall", where if any predicted value during the fit is greater than the corresponding Y value, the fitting error is made extremely large. I have shamelessly poached code from the answer of #mikuszefski to generate the data used in this example.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
def g( x, a, s ):
return a * np.exp(-x**2 / s**2 )
xData = np.linspace( -5, 5, 500 )
yData = np.fromiter( ( g( x, .77, 2 )* np.sin( 257.7 * x )**2 for x in xData ), np.float )
def Gauss(x, a, x0, sigma, offset):
return a * np.exp(-np.power(x - x0,2) / (2 * np.power(sigma,2))) + offset
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = Gauss(xData, *parameterTuple)
multiplier = 1.0
for i in range(len(val)):
if val[i] < yData[i]: # ****** brick wall ******
multiplier = 1.0E10
return np.sum((multiplier * (yData - val)) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
minData = min(minX, minY)
maxData = max(maxX, maxY)
parameterBounds = []
parameterBounds.append([minData, maxData]) # parameter bounds for a
parameterBounds.append([minData, maxData]) # parameter bounds for x0
parameterBounds.append([minData, maxData]) # parameter bounds for sigma
parameterBounds.append([minData, maxData]) # parameter bounds for offset
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3, polish=False)
return result.x
# generate initial parameter values
geneticParameters = generate_Initial_Parameters()
# create values for display of fitted function
y_fit = Gauss(xData, *geneticParameters)
plt.scatter(xData, yData, s=1 ) # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print('parameters:', geneticParameters)