Fitting a trapezoid with curve_fit - python

I am trying to fit a trapezoid to a set of time series using the curve_fit function from scipy.optimize. The function that I use to generate a trapezoid is the following:
import numpy as np
from scipy.optimize import curve_fit

def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    y[:int(tau1)] = a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] = a*tau1 + b
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)
    return y
Where a and c are the slopes, and tau1 and tau2 mark the beginning and the end of the flat phase.
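For intuition (an illustration added here, not part of the original post), a quick synthetic call shows the shape these parameters produce:
import matplotlib.pyplot as plt

x = np.arange(60, dtype=float)
# rising slope a=0.1 from intercept b=1.0, flat from tau1=15 to tau2=40,
# then falling with slope of magnitude c=0.05
y = trapezoid(x, a=0.1, b=1.0, c=0.05, tau1=15, tau2=40)
plt.plot(x, y)
plt.show()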
And in order to fit I just use:
popt, pcov = curve_fit(trapezoid, xdata, ydata, method='lm')
For most of the cases it works just fine, such as in the following:
However, I'm also getting some cases in which it simply fails to fit the data, even though it looks like it should do fine:
The problem in these cases is that the fit sets tau2 (the end of the flat phase) smaller than tau1 (the beginning of it).
Could anyone suggest a way to solve this issue? Whether by imposing a constraint or in some other way?
Example array for which the fit does not work:
array([1.2 , 1.21, 1.2 , 1.19, 1.21, 1.22, 2.47, 2.53, 2.49, 2.39, 2.28,
2.16, 2.07, 1.99, 1.91, 1.83, 1.74, 1.65, 1.57, 1.5 , 1.45, 1.41,
1.38, 1.35, 1.33, 1.29, 1.24, 1.19, 1.14, 1.11, 1.07, 1.04, 1. ,
0.95, 0.91, 0.87, 0.84, 0.8 , 0.77, 0.74, 0.72, 0.7 , 0.68, 0.66,
0.63, 0.61, 0.59, 0.57, 0.55, 0.52, 0.5 , 0.48, 0.45, 0.43, 0.41,
0.39, 0.38, 0.37, 0.37, 0.36, 0.35, 0.34, 0.34, 0.33])
Which yields: tau1 = 8.45, tau2 = 5.99.

You might find lmfit (http://lmfit.github.io/lmfit-py/) useful for this problem. Lmfit provides a slightly higher-level interface to curve fitting, still based on the scipy optimizers, but with some better abstractions and features.
In particular for your question, lmfit parameters are Python objects that can have bounds, be fixed, or be written as simple algebraic constraints in terms of other variables. That makes it possible to impose tau2 > tau1.
The idea is essentially to set tau2 = tau1 + taudiff and place a lower bound of 0 on taudiff. You could rewrite your function to do that in the code, but with lmfit you don't have to: you can put that logic into the Parameters instead.
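For illustration, that in-code rewrite would look roughly like this (a sketch; trapezoid_w and dtau are my names, not from the original answer):
import numpy as np
from scipy.optimize import curve_fit

def trapezoid_w(x, a, b, c, tau1, dtau):
    # fit the width of the flat phase instead of its end point,
    # so tau2 = tau1 + |dtau| can never fall before tau1
    return trapezoid(x, a, b, c, tau1, tau1 + np.abs(dtau))

popt, pcov = curve_fit(trapezoid_w, xdata, ydata, method='lm')
tau1_fit = popt[3]
tau2_fit = popt[3] + np.abs(popt[4])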
Converting your script to use lmfit instead would give something like this:
import numpy as np
import pylab
from lmfit import Model

# use your same model function
def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    y[:int(tau1)] = a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] = a*tau1 + b
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)
    return y

# turn the model function into an lmfit Model
tmod = Model(trapezoid)

# create Parameters for this model: they will be *named* according
# to the signature of the model function, and be used as keys in
# an ordered-dictionary-derived object. Here you can also give
# initial values
params = tmod.make_params(a=1, b=2, c=0.5, tau1=5, tau2=-1)

# now you can set bounds or constraints.
# 1st, add a new variable "taudiff"
params.add('taudiff', value=0.1, min=0, vary=True)

# constrain tau2 to be taudiff + tau1 -- it is no longer a "free" variable
params['tau2'].expr = "taudiff + tau1"

# now do the fit to data:
result = tmod.fit(ydata, params, x=xdata)

# print a report of the fit
print(result.fit_report())

# get the best-fit params:
for parname, param in result.params.items():
    print(parname, param.value, param.stderr, param.expr)

# get the best-fit array for plotting
pylab.plot(xdata, ydata)
pylab.plot(xdata, result.best_fit)
Hope that helps.

Just setting tau1 and tau2 to the minimum and maximum value, respectively, also works:
def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    # make sure tau1 <= tau2, whatever the optimizer proposes
    (tau1, tau2) = (min(tau1, tau2), max(tau1, tau2))
    y[:int(tau1)] = a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] = a*tau1 + b
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)
    return y

x_data = np.arange(len(A))  # A is the example array from the question
popt, pcov = curve_fit(trapezoid, x_data, A, method='lm')
print(popt)
fit = trapezoid(x_data, *popt)
leads to:

Related

Normalize vectors in the complex space: mean=0 and std-dev=1/sqrt(n)

I have a vector, or a bunch of vectors (stored in a 2D array, by rows).
The vectors are generated with:
MEAN=0, STD-DEV=1/SQRT(vec_len)
and before or after operations they have to be normalized to the same form.
I want to normalize them in the complex space.
Here is the wrapper of the function:
@staticmethod
def fft_normalize(x, dim=DEF_DIM):
    cx = rfft(x, dim=dim)
    ....
    rv = irfft(cx_proj, dim=dim)
    return rv
Help me fill in the dots.
Here is the real-valued normalization that I use:
@staticmethod
def normalize(a, dim=DEF_DIM):
    norm = torch.linalg.norm(a, dim=dim)
    # if torch.eq(norm,0) : return torch.divide(a, st.MIN)
    if dim is not None: norm = norm.unsqueeze(dim)
    return torch.divide(a, norm)
In [70]: st.normalize(x + 3)
Out[70]:
([[0.05, 0.04, 0.05, ..., 0.04, 0.04, 0.04],
[0.04, 0.04, 0.05, ..., 0.05, 0.04, 0.05],
[0.05, 0.04, 0.05, ..., 0.04, 0.05, 0.04]])
In [71]: st.normalize(x + 5)
Out[71]:
([[0.05, 0.04, 0.05, ..., 0.04, 0.04, 0.04],
[0.04, 0.04, 0.05, ..., 0.05, 0.04, 0.04],
[0.05, 0.04, 0.04, ..., 0.04, 0.05, 0.04]])
In [73]: st.normalize(x + 5).len()
Out[73]: ([1.00, 1.00, 1.00])
In [74]: st.normalize(x + 3).len()
Out[74]: ([1., 1., 1.])
In [75]: st.normalize(x).len()
Out[75]: ([1.00, 1.00, 1.00])
#bad, need normalization
In [76]: (x + 3).len()
Out[76]: ([67.13, 67.13, 67.13])
@staticmethod
def len(a, dim=DEF_DIM): return torch.linalg.norm(a, dim=dim)
I did not want to post this, so as not to influence a possibly better solution. So here is one of my attempts; parts are borrowed from what I found.
This only works for 1D vectors ;(
@staticmethod
def fft_normalize(x, dim=DEF_DIM):  # normalize a vector x in the complex domain
    c = rfft(x, dim=dim)
    ri = torch.vstack([c.real, c.imag])
    norm = torch.abs(c)
    print(norm.shape, ri.shape)
    # norm = torch.linalg.norm(ri, dim=dim)
    # if dim is not None : norm = norm.unsqueeze(dim)
    if torch.any(torch.eq(norm, 0)): norm[torch.eq(norm, 0)] = st.MIN  # !fixme
    ri = torch.divide(ri, norm)  # 2D fails here
    c_proj = ri[0, :] + 1j * ri[1, :]
    rv = irfft(c_proj, dim=dim)
    return rv
I also adapted the solution of Thibault Cimic below; it seems to work for 1D vectors, but not for 2D:
@staticmethod
def fft_normalize(x, dim=DEF_DIM, dot_dim=None):  # normalize a vector x in the complex domain
    c = rfftn(x, dim=dim)
    c_conj = torch.conj(c)
    if dot_dim is None: dot_dim = st.dot_dims(c, c_conj)
    c_norm = torch.sqrt(torch.tensordot(c, c_conj, dims=dot_dim))
    c_proj = torch.divide(c, c_norm)
    rv = irfftn(c_proj, dim=dim)
    return rv
I'm guessing you want to normalize with the norm associated with the natural complex inner product. So, is this what you're trying to do:
def fft_normalize(x, dim=DEF_DIM):  # normalize a vector x in the complex domain
    c = rfft(x, dim=dim)
    c_norm = math.sqrt(c.dot(numpy.conjugate(c)))
    c_proj = c / c_norm
    rv = irfft(c_proj, dim=dim)
    return rv
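That works along a single axis. For the 2D (batch-of-rows) case in the question, a dimension-aware sketch with PyTorch's torch.fft could look like this (my adaptation of the same divide-the-spectrum-by-its-norm idea; fft_normalize_nd and eps are illustrative names, not part of the original answer):
import torch
from torch.fft import rfft, irfft

def fft_normalize_nd(x, dim=-1, eps=1e-12):
    # transform along `dim`; works for 1D vectors and 2D batches alike
    c = rfft(x, dim=dim)
    # norm of the complex spectrum along `dim`, kept for broadcasting
    norm = torch.linalg.norm(c, dim=dim, keepdim=True)
    c_proj = c / torch.clamp(norm, min=eps)  # guard against zero vectors
    # pass n so odd-length inputs round-trip exactly
    return irfft(c_proj, n=x.shape[dim], dim=dim)

# usage: each row is normalized independently
x = torch.randn(3, 64)
y = fft_normalize_nd(x, dim=1)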

getting values from a CDF

Good morning, everyone. I have a set of values:
Arr = np.array([0.11, 0.14, 0.22, 0.26, 0.31, 0.36, 0.44, 0.69, 0.70, 0.70, 0.70, 0.75, 0.98, 1.40])
I have constructed the CDF function in this way:
import numpy as np
import matplotlib.pyplot as plt

def ecdf(a):
    x, counts = np.unique(a, return_counts=True)
    cusum = np.cumsum(counts)
    return x, cusum / cusum[-1]

def plot_ecdf(a):
    x, y = ecdf(a)
    x = np.insert(x, 0, x[0])
    y = np.insert(y, 0, 0.)
    plt.plot(x, y, drawstyle='steps-post')
    plt.grid(True)

plot_ecdf(Arr)  # note: plot_ecdf expects the raw data, not the output of ecdf
Obtaining this figure:
Now I want to divide the space (the y-axis) into 5 parts. To do this I am using the following:
from scipy.stats.qmc import LatinHypercube
engine = LatinHypercube(d=1)
sample = engine.random(n=5) #Array of float64
obtaining, for example, these 5 randomly generated values:
0.0886183
0.450613
0.808077
0.753524
0.343108
At this point I would like to obtain the corresponding values from the CDF, as in the picture.
I have also observed that the CDF constructed this way takes a discrete set of values, which may not be optimal for my purpose.
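No answer is attached to this question here, but a minimal sketch of the lookup (assuming the goal is the empirical quantile: the smallest data value whose ECDF is at least the sampled y) could be:
import numpy as np

def inverse_ecdf(a, probs):
    # empirical quantile: smallest x whose ECDF value is >= p
    x, counts = np.unique(a, return_counts=True)
    cusum = np.cumsum(counts)
    y = cusum / cusum[-1]
    idx = np.searchsorted(y, probs, side='left')
    return x[np.clip(idx, 0, len(x) - 1)]

samples = np.array([0.0886183, 0.450613, 0.808077, 0.753524, 0.343108])
print(inverse_ecdf(Arr, samples))
If the step-like discreteness is a problem, np.interp(probs, y, x) over the unique values would give a piecewise-linear inverse instead.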

Finding the probability of a variable in collection of lists

I have a selection of lists of variables:
import numpy.random as npr
w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = 1
y = False
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
v = npr.choice(w, x, y, z)
I want to find the probability of the value v being a given selection of the variables, e.g. False or 0.12. How do I do this?
Here's what I've tried:
import numpy.random as npr
import math
w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = 1
y = False
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]
v = npr.choice(w, x, y, z)
from collections import Counter
c = Counter(0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17, 1, False, 0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175)
def probability(0.12):
    return float(c[v]/len(w,x,y,z))
for which I'm getting that 0.12 is an invalid syntax.
There are several issues in the code; I think you want the following:
import numpy.random as npr
from collections import Counter

def probability(v=0.12):
    return float(c[v] / len(combined))

w = [0.02, 0.03, 0.05, 0.07, 0.11, 0.13, 0.17]
x = [1]
y = [False]
z = [0.12, 0.2, 0.25, 0.05, 0.08, 0.125, 0.175]

combined = w + x + y + z
v = npr.choice(combined)
c = Counter(combined)

print(probability())
print(probability(v=0.05))
1) def probability(0.12) does not make sense; you have to pass a variable, which can also have a default value (above I use 0.12).
2) len(w, x, y, z) does not make much sense either; you are probably looking for a list that combines all the elements of w, x, y and z. I put all of those into the list combined.
3) One would also have to add a check for the case where the user passes e.g. v=12345, which is not included in combined (I leave this to you; a possible sketch follows the output below).
The above will print
0.0625
0.125
which gives the expected outcome.
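For point 3, one possible sketch of that extra check (my illustration, not part of the original answer; note that a Counter already returns 0 for missing keys, so this mainly makes the behavior explicit):
def probability(v=0.12):
    # reject values that never occur in combined instead of silently returning 0
    if v not in c:
        raise ValueError(f"{v!r} is not in the combined list")
    return c[v] / len(combined)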

Chi-squared for the optimal order of a fit with polynomial

I have the following code, in which DGauss is a function that generates the expected values. The two arrays, in turn, allow me to generate a distribution that I take as the observed values.
Based on the observed values, the code extracts a polynomial (of the seventh degree, for the moment) that describes their trend.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
import sympy
from sympy import symbols, S

def DGauss(x, I1, I2, sigma1, sigma2):
    return I1*np.exp(-x*x/(2*sigma1*sigma1)) + I2*np.exp(-x*x/(2*sigma2*sigma2))
Pos = np.array([3.28, 3.13, 3.08, 3.03, 2.98, 2.93, 2.88, 2.83, 2.78, 2.73, 2.68,
2.63, 2.58, 2.53, 2.48, 2.43, 2.38, 2.33, 2.28, 2.23, 2.18, 2.13,
2.08, 2.03, 1.98, 1.93, 1.88, 1.83, 1.78, 1.73, 1.68, 1.63, 1.58,
1.53, 1.48, 1.43, 1.38, 1.33, 1.28, 1.23, 1.18, 1.13, 1.08, 1.03,
0.98, 0.93, 0.88, 0.83, 0.78, 0.73, 0.68, 0.63, 0.58, 0.53, 0.48,
0.43, 0.38, 0.33, 0.28, 0.23, 0.18, 0.13, 0.08, 0.03])
Val = np.array([0.00986279, 0.01529543, 0.0242624 , 0.0287456 , 0.03238484,
0.03285927, 0.03945234, 0.04615091, 0.05701618, 0.0637672 ,
0.07194268, 0.07763934, 0.08565687, 0.09615262, 0.1043281 ,
0.11350606, 0.1199406 , 0.1260062 , 0.14093328, 0.15079665,
0.16651464, 0.18065023, 0.1938894 , 0.2047541 , 0.21794024,
0.22806706, 0.23793043, 0.25164404, 0.2635118 , 0.28075974,
0.29568682, 0.30871501, 0.3311846 , 0.34648062, 0.36984661,
0.38540666, 0.40618835, 0.4283945 , 0.45002014, 0.48303911,
0.50746062, 0.53167057, 0.5548792 , 0.57835128, 0.60256181,
0.62566436, 0.65704847, 0.68289386, 0.71332794, 0.73258027,
0.769608 , 0.78769989, 0.81407275, 0.83358852, 0.85210239,
0.87109068, 0.89456217, 0.91618782, 0.93760247, 0.95680234,
0.96919757, 0.9783219 , 0.98486193, 0.9931429 ])
f = np.linspace(-9, 9, 2*len(Pos))

plt.errorbar(Pos, Val, xerr=0.02, yerr=2.7e-3, fmt='o')

popt, pcov = curve_fit(DGauss, Pos, Val)
plt.plot(f, DGauss(f, *popt), '--', label='Double Gauss')

x = Pos
y = Val
z, w = np.polyfit(x, y, 7, full=False, cov=True)
p = np.poly1d(z)
u = np.array(p)

xp = np.linspace(1, 6, 100)
_ = plt.plot(xp, p(xp), '-', color='darkviolet')

x = symbols('x')
coeffs = u[::-1]  # coefficients in ascending order (avoids shadowing the built-in list)
poly = sum(S("{:7.3f}".format(v))*x**i for i, v in enumerate(coeffs))
eq_latex = sympy.printing.latex(poly)
print(eq_latex)

# loop suggested by @Fourier
dof = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in dof:
    z = np.polyfit(x, y, i, full=False, cov=True)
    chi = np.sum((np.polyval(z, x) - y) ** 2)
    chinorm = chi/i
    plt.plot(chinorm)
What I would like to do now is to repeat the fit while varying the order of the polynomial, to figure out the minimum order I need for a good fit without an excessive number of free parameters. In particular, I would like to run this fit with different orders and plot the chi-squared, normalized with respect to the number of degrees of freedom.
Could someone kindly help me?
Thanks!
Based on the posted code, this should work for your purpose:
chiSquares = []
dofs = 10
for i in np.arange(1, dofs+1):
    z = np.polyfit(x, y, i, full=False, cov=False)
    chi = np.sum((np.polyval(z, x) - y) ** 2) / np.std(y)  # ideally you should divide this by an error estimate for the Val array
    chinorm = chi/i
    chiSquares.append(chinorm)
plt.plot(np.arange(1, dofs+1), chiSquares)
If it is not evident from the plot, you can further use an F-test to check how many degrees of freedom are really needed:
import scipy.stats

n = len(y)
for d, (rss1, rss2) in enumerate(zip(chiSquares, chiSquares[1:])):
    p1 = d + 1
    p2 = d + 2
    F = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
    p = 1.0 - scipy.stats.f.cdf(F, p2 - p1, n - p2)  # the F statistic has (p2-p1, n-p2) degrees of freedom
    print('F-stats: {:.3f}, p-value: {:.5f}'.format(F, p))
A small p-value means the higher-order fit is still a significant improvement; once the p-values become large, adding further orders no longer pays off.

Least squares function and 4 parameter logistics function not working

Relatively new to Python, mainly using it for plotting. I am currently attempting to determine a best-fit curve using the four-parameter logistic (4PL) equation and curve_fit from scipy. There are one or two sites showing how 4PL works, but I could not get them to work for my data. Example (similar) 4PL data below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import scipy.optimize as optimization

xdata = [2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]
ydata = [0.32, 0.3, 0.55, 0.60, 0.88, 0.92, 1.27, 1.21, 1.15, 1.12, 1.1, 1.1]

def fourPL(x, A, B, C, D):
    return ((A-D)/(1.0+((x/C)**(B))) + D)

guess = [0, -0.5, 0.5, 1]
params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, guess)
params
This gives a warning (also an exponent warning with the test data, but not with the real data):
OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)
And params just returns my initial guess. I have tried various initial guesses.
A best-fit line is drawn when plotting, but it is not a curve and it does not go below x = 0 (I cannot find a reason why negative values would mess with the 4PL model).
4PL fit plotted
I'm not sure whether I am doing something incorrect with the equation, with how the curve_fit function works, or both. I have a similar issue using least squares instead of curve_fit. I've tried a bunch of variations based on similar equations for the fit, etc., but have been stuck for a while; any help pointing me in the right direction would be much appreciated.
I'm surprised you did not get any warnings, or that you did not share them with us. I can't analyze this task for you by scientific means, just some remarks about the technical stuff:
Observation
When running your code, you should see some warnings like:
RuntimeWarning: invalid value encountered in power
return ((A-D)/(1.0+((x/C)**(B))) + D)
Don't ignore this!
Debugging
Add some prints to your function fourPL, probably for all the different components of your function, and look at what's happening.
Example:
def fourPL(x, A, B, C, D):
    print('1: ', (A-D))
    print('2: ', (x/C))
    print('3: ', (1.0+((x/C)**(B))))
    return ((A-D)/(1.0+((x/C)**(B))) + D)

...

params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, guess, maxfev=1)
# maxfev=1 -> let's just check one or a few iterations
Output:
1: -1.0
2: [ 4.60000000e+00 4.60000000e+00 4.00000000e+00 4.00000000e+00
3.40000000e+00 3.40000000e+00 2.00000000e+00 2.00000000e+00
2.00000000e-06 2.00000000e-06 -2.00000000e+00 -2.00000000e+00]
RuntimeWarning: invalid value encountered in power
print('3: ', (1.0+((x/C)**(B))))
3: [ 1.4662524 1.4662524 1.5 1.5 1.54232614
1.54232614 1.70710678 1.70710678 708.10678119 708.10678119
nan nan]
That's enough to stop: NaNs and infs are bad!
Theory
Now it's time for theory, and I won't do that here. But usually you should now think about the underlying theory and about why these problems occur.
Is there something you missed in regard to the assumptions?
Repair (without checking the theory)
Without checking the theory, and just looking over some example found within 30 seconds: hmm, are negative x-values a problem?
Let's shift x (by the minimum; hardcoded 1 here):
xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1
Complete code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import scipy.optimize as optimization

xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1
ydata = np.array([0.32, 0.3, 0.55, 0.60, 0.88, 0.92, 1.27, 1.21, 1.15, 1.12, 1.1, 1.1])

def fourPL(x, A, B, C, D):
    return ((A-D)/(1.0+((x/C)**(B))) + D)

guess = [0, -0.5, 0.5, 1]
params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, guess)  # , maxfev=1

x_min, x_max = np.amin(xdata), np.amax(xdata)
xs = np.linspace(x_min, x_max, 1000)

plt.scatter(xdata, ydata)
plt.plot(xs, fourPL(xs, *params))
plt.show()
Output:
RuntimeWarning: divide by zero encountered in power
return ((A-D)/(1.0+((x/C)**(B))) + D)
Looks good, but it's time for another theory session: what did our linear shift do to our results? I'm ignoring this again.
So, just one warning and a nice-looking output.
If you want to remove that last warning, add some small epsilon to not have 0's in xdata:
xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1 + 1e-10
which will achieve the same result, without any warning.
