Apply non-linear regression for multi dimension data samples in Python

I have installed NumPy and SciPy, but I don't quite understand their documentation about polyfit.
For example, here are my data samples:
[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074]
[0.042780748663101636, -0.0044771571786609945, -0.10506567946276074]
[0.542780748663101636, -0.005771571786609945, 0.30506567946276074]
[-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]
The first two columns are sample features and the third column is the output. My goal is to get a function that takes two parameters (the first two columns) and returns a prediction (the output).
Is there a simple example?
====================== EDIT ======================
Note that I need to fit a curve, not only straight lines. The polynomial should be something like this (n = 3):
a*x1^3 + b*x2^2 + c*x3 + d = y
Not:
a*x1 + b*x2 + c*x3 + d = y
x1, x2, x3 are features of one sample, y is the output

Try something like
edit: added an example function that used results of linear regression to estimate output.
import numpy as np
data = np.array(
    [[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074],
     [0.042780748663101636, -0.0044771571786609945, -0.10506567946276074],
     [0.542780748663101636, -0.005771571786609945, 0.30506567946276074],
     [-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]])
coefficient = data[:, 0:2]  # the two feature columns
dependent = data[:, -1]     # the output column
# Solve coefficient @ x ~= dependent in the least-squares sense
x, residuals, rank, s = np.linalg.lstsq(coefficient, dependent, rcond=None)
def f(x, u, v):
    # Linear prediction from the fitted coefficients
    return u*x[0] + v*x[1]
for datum in data:
    print(f(x, *datum[0:2]))
Which gives
>>> x
array([ 0.16991146, -30.18923739])
>>> residuals
array([ 0.07941146])
>>> rank
2
>>> s
array([ 0.64490113, 0.02944663])
and the function created with your coefficients gave
0.115817326583
0.142430900298
0.266464019171
0.859743371665
More info can be found at the documentation I posted as a comment.
edit 2: fitting your data to an arbitrary model.
edit 3: made my model a function for ease of understanding.
edit 4: made code more easily read/ changed model to a quadratic fit, but you should be able to read this code and know how to make it minimize any residual you want now.
contrived example:
import numpy as np
from scipy.optimize import leastsq
data = np.array(
    [[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074],
     [0.042780748663101636, -0.0044771571786609945, -0.10506567946276074],
     [0.542780748663101636, -0.005771571786609945, 0.30506567946276074],
     [-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]])
coefficient = data[:, 0:2]
dependent = data[:, -1]
def model(p, x):
    a, b, c = p
    u = x[:, 0]
    v = x[:, 1]
    return a*u**2 + b*v + c
def residuals(p, y, x):
    # Difference between the observed outputs and the model prediction
    return y - model(p, x)
p0 = np.array([2, 3, 4])  # some initial guess
p = leastsq(residuals, p0, args=(dependent, coefficient))[0]
def f(p, x):
    # Note: this evaluates a linear form of the fitted parameters, not the
    # quadratic model above; use model(p, coefficient) to evaluate the fit itself.
    return p[0]*x[0] + p[1]*x[1] + p[2]
for x in coefficient:
    print(f(p, x))
gives
-0.108798280153
-0.00470479385807
0.570237823475
0.413016072653
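Because the quadratic model above is linear in its coefficients a, b and c, the same kind of fit can also be set up as ordinary linear least squares on transformed features. The following is only a minimal sketch under that assumption; the names design, coeffs and predict are illustrative and do not come from the answer above. The result should agree with leastsq up to solver tolerance.

```python
import numpy as np

data = np.array(
    [[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074],
     [0.042780748663101636, -0.0044771571786609945, -0.10506567946276074],
     [0.542780748663101636, -0.005771571786609945, 0.30506567946276074],
     [-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]])
u, v, y = data[:, 0], data[:, 1], data[:, 2]

# Columns are the transformed features of y = a*u**2 + b*v + c
design = np.column_stack((u**2, v, np.ones_like(u)))
coeffs, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)

def predict(u, v):
    """Evaluate the fitted model at feature values (hypothetical helper)."""
    a, b, c = coeffs
    return a*u**2 + b*v + c

print(coeffs)
print(predict(u, v))
```

Any other polynomial terms (for example u**3 or u*v) can be added as further columns of the design matrix, as long as the model stays linear in the coefficients.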

Related

fft normalization vs the norm-option

I often have to normalize the result of a circular convolution, so I borrowed and modified the following routine:
import numpy as np
def fft_normalize(x):  # Normalize a vector x in the complex domain.
    c = np.fft.rfft(x)
    # Look at real and imaginary parts as if they were real
    ri = np.vstack([c.real, c.imag])
    # Normalize magnitude of each complex/real pair
    norm = np.linalg.norm(ri, axis=0)
    if np.any(norm==0): norm[norm == 0] = np.float64(1e-308) #!fixme
    ri = np.divide(ri, norm)
    c_proj = ri[0,:] + 1j * ri[1,:]
    rv = np.fft.irfft(c_proj, n=x.shape[-1])
    return rv
def fft_convolution(a, b):
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b))
so that I can do this:
fft_normalize(fft_convolution(a,b))
I see in the NumPy docs that there is a 'norm' option. Is this equivalent to what I'm doing?
And if yes, which option should I use?
def fft_convolution2(a, b):
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), norm='ortho')
When I test it, it behaves better when I use fft_normalize().
Second, I had to add this line, but it does not seem right. Any ideas?
if np.any(norm==0): norm[norm == 0] = np.float64(1e-308) #!fixme
As a side note: the NumPy docs mention that np.fft promotes float32 to float64 and that scipy.fftpack does not.
Would fftpack be faster? SciPy says fftpack is obsolete, and I found no information about the new SciPy interface.
@Cris, are you suggesting I do it this way:
def fft_normalize(x):  # Normalize a vector x in the complex domain.
    c = np.fft.rfft(x)
    ri = np.vstack([c.real, c.imag])
    norm = np.abs(c)
    # MIN is a small positive constant (not defined in this snippet)
    if np.any(norm==0): norm[norm == 0] = MIN #!fixme
    ri = np.divide(ri, norm)
    c_proj = ri[0,:] + 1j * ri[1,:]
    rv = np.fft.irfft(c_proj, n=x.shape[-1])
    return rv

The norm argument to the FFT functions in NumPy determines whether the transform result is multiplied by 1, 1/N or 1/sqrt(N), with N the number of samples in the array. By default, the inverse transform is normalized by dividing by N, and the forward transform is not. Specifying “ortho” here causes both transforms to be normalized by 1/sqrt(N). Specifying “forward” causes only the forward transform to be normalized by 1/N.
These normalizations are very different from the one you apply, where you normalize each element in the frequency domain.
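The difference can be checked numerically. The sketch below uses an arbitrary four-sample signal (an assumption for illustration only) and compares the global scale factors selected by norm with a per-bin division like the one in fft_normalize. Note that norm="forward" is only accepted by reasonably recent NumPy releases (1.20 or later, as far as I know).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary example signal
N = x.size

backward = np.fft.fft(x)                  # default: forward transform unscaled
ortho = np.fft.fft(x, norm="ortho")       # forward transform scaled by 1/sqrt(N)
forward = np.fft.fft(x, norm="forward")   # forward transform scaled by 1/N

# The norm option applies one global factor to every frequency bin
print(np.allclose(ortho, backward / np.sqrt(N)))
print(np.allclose(forward, backward / N))

# Per-bin normalization, as in fft_normalize: each bin has its own divisor
c = np.fft.rfft(x)
unit = c / np.abs(c)   # safe here because no bin of this signal is zero
print(np.abs(unit))
```

With the norm option the relative magnitudes of the bins are preserved, whereas the per-bin division forces every bin to unit magnitude and keeps only the phase.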

How to include known parameter that changes over time in solve_bvp

I am trying to use SciPy's solve_bvp in Python to solve differential equations that depend on a known parameter that changes over time. I have this parameter saved in a NumPy array. However, when I try to use this array in the derivatives function, I get the following error: ValueError: operands could not be broadcast together with shapes (10,) (11,).
Below is a simplified version of my code. I want the variable d2 to take certain values at different times according to an array, d2_set_values. The differential equations for some of the 12 variables then depend on d2. I hope it's clear from this code what I'm trying to achieve.
import numpy as np
from scipy.integrate import solve_bvp
t = np.linspace(0, 10, 11)
# Known parameter that changes over time
d2_set_values = np.zeros(t.size)
d2_set_values[:4] = 0.1
d2_set_values[4:8] = 0.2
d2_set_values[8:] = 0.1
# Initialise y vector
y = np.zeros((12, t.size))
# ODEs
def fun(x, y):
    S1, I1, R1, S2, I2, R2, lamS1, lamI1, lamR1, lamS2, lamI2, lamR2 = y
    d1 = 0.5*(I1 + 0.1*I2)*(lamS1 - lamI1)
    d2 = d2_set_values
    dS1dt = -0.5*S1*(1-d1)*(I1 + 0.1*I2)
    dS2dt = -0.5*S2*(1-d2)*(I2 + 0.1*I1)
    dI1dt = 0.5*S1*(1-d1)*(I1 + 0.1*I2) - 0.2*I1
    dI2dt = 0.5*S2*(1-d2)*(I2 + 0.1*I1) - 0.2*I2
    dR1dt = 0.2*I1
    dR2dt = 0.2*I2
    dlamS1dt = 0.5*(1-d1)*S1*lamS1
    dlamS2dt = 0.5*(1-d2)*S2*lamS2
    dlamI1dt = 0.5*(1-d1)*I1*lamI1
    dlamI2dt = 0.5*(1-d2)*I2*lamI2
    dlamR1dt = lamR1
    dlamR2dt = lamR2
    return np.vstack((dS1dt, dI1dt, dR1dt, dS2dt, dI2dt, dR2dt, dlamS1dt, dlamI1dt, dlamR1dt, dlamS2dt, dlamI2dt, dlamR2dt))
# Boundary conditions
def bc(ya, yb):
    return np.array([ya[0]-0.99, ya[1]-0.01, ya[2]-0., ya[3]-1.0, ya[4]-0., ya[5]-0.,
                     yb[6]-0., yb[7]-1., yb[8]-0., yb[9]-0, yb[10]-0, yb[11]-0])
# Run the solver
sol = solve_bvp(fun, bc, t, y)
I have even tried reducing the size of d2_set_values by one, but that doesn't solve the issue.
Any help I can get would be much appreciated!
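One possible reading of the error: solve_bvp evaluates the derivative function on meshes other than the initial one (for example at interval midpoints), so an array fixed to the length of t cannot be combined with the y it passes in. A hedged sketch of one way around this is to look the parameter up at the x values the solver supplies, here with np.interp. The toy problem below (y'' = -q(x) with zero boundary values) and the name q_values are assumptions for illustration and are not the model from the question; linear interpolation also smooths the steps between tabulated values, which may or may not be acceptable.

```python
import numpy as np
from scipy.integrate import solve_bvp

t = np.linspace(0, 1, 11)
# Tabulated parameter (illustrative values, not the question's model)
q_values = np.where(t < 0.5, 1.0, 2.0)

def fun(x, y):
    # x may have any length, so interpolate the tabulated parameter
    # at exactly the points the solver asks for
    q = np.interp(x, t, q_values)
    return np.vstack((y[1], -q))      # y0' = y1, y1' = -q(x)

def bc(ya, yb):
    return np.array([ya[0], yb[0]])   # y0(0) = 0 and y0(1) = 0

sol = solve_bvp(fun, bc, t, np.zeros((2, t.size)))
print(sol.status, sol.message)
```

In the question's code the corresponding change would be to replace d2 = d2_set_values with a lookup that depends on x.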

Fitting a quadratic function in python without numpy polyfit

I am trying to fit a quadratic function to some data, and I'm trying to do this without using numpy's polyfit function.
Mathematically, I tried to follow this website https://neutrium.net/mathematics/least-squares-fitting-of-a-polynomial/ but somehow I don't think that I'm doing it right. If anyone could assist me, that would be great, or if you could suggest another way to do it, that would also be awesome.
What I've tried so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'b--')
plt.show()
As you can see, my curve doesn't compare well to NumPy's polyfit curve.
Update:
I went through my code and removed all the stupid mistakes and now it works, when I try to fit it over 3 points, but I have no idea how to fit over more than three points.
This is the new code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'r--')
plt.show()

Instead of using Cramer's Rule, actually solve the system using least squares. Remember that Cramer's Rule will only work if the total number of points you have equals the desired order of the polynomial plus 1.
If you don't have this, then Cramer's Rule will not work, because it tries to find an exact solution to the problem. If you have more points, the method is unsuitable because the system of equations becomes overdetermined.
To adapt this to more points, numpy.linalg.lstsq is a better fit: it solves Ax = b by computing the vector x that minimizes the Euclidean norm of b - Ax. Therefore, remove the y values from the last column of the features matrix and use numpy.linalg.lstsq to solve for the coefficients:
import numpy as np
import matplotlib.pyplot as plt
ones = np.ones(4)
xfeature = np.asarray([0,1,2,3])
squaredfeature = xfeature ** 2
b = np.asarray([1,2,0,3])
features = np.concatenate((np.vstack(ones),np.vstack(xfeature),np.vstack(squaredfeature)), axis = 1) # Change - remove the y values
determinants = np.linalg.lstsq(features, b, rcond=None)[0] # Change - use least squares
plt.scatter(xfeature,b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
plt.show()
The plot I get now matches the dashed curve in your graph, which is also what numpy.polyfit gives you.
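The agreement with numpy.polyfit can also be checked without plotting. This is a small sketch using the same four points as the code above; np.vander is used here only as a shorthand for building the columns 1, x, x**2, and the variable names are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 3.0])

# Columns 1, x, x**2 (lowest power first)
features = np.vander(x, 3, increasing=True)
coeffs = np.linalg.lstsq(features, y, rcond=None)[0]

# np.polyfit returns the highest power first, so reverse before comparing
print(np.allclose(coeffs[::-1], np.polyfit(x, y, 2)))
```

The same construction works for any number of points of at least three; only the number of rows of the feature matrix changes.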

Scipy.Odr multiple variable regression

I would like to perform a multidimensional ODR with scipy.odr. I read the API documentation, which says that multi-dimensionality is possible, but I cannot make it work. I cannot find a working example on the internet, and the API documentation is really crude and gives no hints on how to proceed.
Here is my MWE:
import numpy as np
import scipy.odr
def linfit(beta, x):
    return beta[0]*x[:,0] + beta[1]*x[:,1] + beta[2]
n = 1000
t = np.linspace(0, 1, n)
x = np.full((n, 2), float('nan'))
x[:,0] = 2.5*np.sin(2*np.pi*6*t)+4
x[:,1] = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
e = 0.25*np.random.randn(n)
y = 3*x[:,0] + 4*x[:,1] + 5 + e
print(x.shape)
print(y.shape)
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
It raises the following exception:
scipy.odr.odrpack.odr_error: number of observations do not match
This seems to be related to my array shapes, but I do not know how I should shape them properly. Does anyone know?

Firstly, in my experience scipy.odr mostly uses arrays, not matrices. The library seems to perform a large number of size checks along the way, and getting it to work with multiple variables seems to be quite troublesome.
This is the workflow I usually use to get it to work (it worked at least on Python 2.7):
import numpy as np
import scipy.odr
n = 1000
t = np.linspace(0, 1, n)
def linfit(beta, x):
    return beta[0]*x[0] + beta[1]*x[1] + beta[2] #notice changed indices for x
x1 = 2.5*np.sin(2*np.pi*6*t)+4
x2 = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
x = np.vstack( (x1, x2) ) #same as the older row_stack; odr doesn't seem to work with column_stack
e = 0.25*np.random.randn(n)
y = 3*x[0] + 4*x[1] + 5 + e #indices changed
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
So using identical (1D?) arrays, stacking them as rows (np.row_stack, an alias of np.vstack), and addressing each variable by a single index seems to work.

Python lmfit: Fitting a 2D Model

I'm trying to fit a 2D-Gaussian to some greyscale image data, which is given by one 2D array.
The lmfit library implements an easy-to-use Model class that should be capable of doing this.
Unfortunately, the documentation (http://lmfit.github.io/lmfit-py/model.html) only provides examples for 1D fitting. For my case I simply construct the lmfit Model with 2 independent variables.
The following code seems valid to me, but causes scipy to throw a "minpack.error: Result from function call is not a proper array of floats."
To sum it up: how do I input 2D (x1, x2) -> (y) data to an lmfit Model?
Here is my approach:
Everything is packed in a GaussianFit2D class, but here are the important parts:
That's the Gaussian function. The documentation says about user defined functions
Of course, the model function will have to return an array that will be the same size as the data being modeled. Generally this is handled by also specifying one or more independent variables.
I don't really get what this should mean, since for given values x1,x2 the only reasonable result is a scalar value.
def _function(self, x1, x2, amp, wid, cen1, cen2):
    val = (amp/(np.sqrt(2*np.pi)*wid)) * np.exp(-((x1-cen1)**2+(x2-cen2)**2)/(2*wid**2))
    return val
Here the model is generated:
def _buildModel(self, **kwargs):
    model = lmfit.Model(self._function, independent_vars=["x1", "x2"],
                        param_names=["amp", "wid", "cen1", "cen2"])
    return model
That's the function that takes the data, builds the model and params and calls lmfit fit():
def fit(self, data, freeX, **kwargs):
    freeX = np.asarray(freeX, float)
    model = self._buildModel(**kwargs)
    params = self._generateModelParams(model, **kwargs)
    # Return the result so the caller can inspect it
    return model.fit(data, x1=freeX[0], x2=freeX[1], params=params)
And finally, here this fit function gets called:
data = np.asarray(img, float)
gaussFit = GaussianFit2D()
x1 = np.arange(len(img[0, :]))
x2 = np.arange(len(img[:, 0]))
fit = gaussFit.fit(data, [x1, x2])

OK, I wrote to the devs and got the answer from them (thanks to Matt here).
The basic idea is to flatten all the input to 1D data, hiding the >1-dimensional input from lmfit.
Here's how you do it.
Modify your function:
def function(self, x1, x2):
    return (x1+x2).flatten()
Flatten your 2D input array you want to fit to:
...
data = data.flatten()
...
Modify the two 1D x-variables such that you have every combination of them:
...
x1n = []
x2n = []
# x2 indexes the rows and x1 the columns of the image, so x1 must vary
# fastest to match the row-major order of data.flatten()
for j in x2:
    for i in x1:
        x1n.append(i)
        x2n.append(j)
x1n = np.asarray(x1n)
x2n = np.asarray(x2n)
...
And throw everything into the fitter:
model.fit(data, x1=x1n, x2=x2n, params=params)
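The coordinate bookkeeping can be sketched with NumPy alone. The example below assumes, as in the question, that x1 runs along the columns and x2 along the rows of the image; the image size and parameter values are made up for illustration, and np.meshgrid is used in place of the explicit loops. It only demonstrates that the flattened coordinates line up with data.flatten(); it does not call lmfit.

```python
import numpy as np

def gaussian2d(x1, x2, amp, wid, cen1, cen2):
    """Same functional form as the question's _function, evaluated elementwise."""
    return (amp / (np.sqrt(2 * np.pi) * wid)) * np.exp(
        -((x1 - cen1) ** 2 + (x2 - cen2) ** 2) / (2 * wid ** 2))

n_rows, n_cols = 4, 6        # hypothetical image size
x1 = np.arange(n_cols)       # column coordinate
x2 = np.arange(n_rows)       # row coordinate

# With the default indexing, both grids have shape (n_rows, n_cols)
x1_grid, x2_grid = np.meshgrid(x1, x2)
img = gaussian2d(x1_grid, x2_grid, amp=1.0, wid=1.5, cen1=2.0, cen2=1.0)

# Flatten data and coordinates the same way so element k of each belongs together
data = img.flatten()
x1n = x1_grid.flatten()
x2n = x2_grid.flatten()

print(data.shape, x1n.shape, x2n.shape)
```

Because the model function then receives 1D arrays and returns a 1D array of the same length as data, it satisfies the requirement quoted from the documentation.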

Here is an example for your reference, hope it may help you.
import numpy
from lmfit import Model
def gaussian(x, cenu, cenv, wid):
    u = x[:, 0]
    v = x[:, 1]
    return (1/(2*numpy.pi*wid**2)) * numpy.exp(-(u-cenu)**2 / (2*wid**2)-(v-cenv)**2 / (2*wid**2))
data = numpy.empty((25,3))
x = numpy.arange(-2,3,1)
y = numpy.arange(-2,3,1)
xx, yy = numpy.meshgrid(x, y)
data[:,0] = xx.flatten()
data[:,1] = yy.flatten()
data[:, 2]= gaussian(data[:,0:2],0,0,0.5)
print('xx\n', xx)
print('yy\n', yy)
print('data to be fit\n', data[:, 2])
cu = 0.9
cv = 0.5
wid = 1
gmod = Model(gaussian)
gmod.set_param_hint('cenu', value=cu, min=cu-2, max=cu+2)
gmod.set_param_hint('cenv', value=cv, min=cv -2, max=cv+2)
gmod.set_param_hint('wid', value=wid, min=0.1, max=5)
params = gmod.make_params()
result = gmod.fit(data[:, 2], x=data[:, 0:2], params=params)
print(result.fit_report(min_correl=0.25))
print(result.best_values)
print(result.best_fit)
