Scipy.Odr multiple variable regression - python

I would like to perform a multidimensional ODR with scipy.odr. I read the API documentation; it says that multi-dimensionality is possible, but I cannot make it work. I cannot find a working example on the internet, and the API is really crude and gives no hints on how to proceed.
Here is my MWE:
import numpy as np
import scipy.odr
def linfit(beta, x):
    return beta[0]*x[:,0] + beta[1]*x[:,1] + beta[2]
n = 1000
t = np.linspace(0, 1, n)
x = np.full((n, 2), float('nan'))
x[:,0] = 2.5*np.sin(2*np.pi*6*t)+4
x[:,1] = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
e = 0.25*np.random.randn(n)
y = 3*x[:,0] + 4*x[:,1] + 5 + e
print(x.shape)
print(y.shape)
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
It raises the following exception:
scipy.odr.odrpack.odr_error: number of observations do not match
This seems to be related to my array shapes, but I do not know how I must shape them properly. Does anyone know?

Firstly, in my experience scipy.odr works with plain arrays, not matrices. The library seems to make a large number of size checks along the way, and getting it to work with multiple variables can be quite troublesome.
This is the workflow I usually use to get it working (tested at least on Python 2.7):
import numpy as np
import scipy.odr
n = 1000
t = np.linspace(0, 1, n)
def linfit(beta, x):
    return beta[0]*x[0] + beta[1]*x[1] + beta[2]  # notice changed indices for x
x1 = 2.5*np.sin(2*np.pi*6*t)+4
x2 = 0.5*np.sin(2*np.pi*7*t + np.pi/3)+2
x = np.row_stack((x1, x2))  # odr doesn't seem to work with column_stack
e = 0.25*np.random.randn(n)
y = 3*x[0] + 4*x[1] + 5 + e  # indices changed
linmod = scipy.odr.Model(linfit)
data = scipy.odr.Data(x, y)
odrfit = scipy.odr.ODR(data, linmod, beta0=[1., 1., 1.])
odrres = odrfit.run()
odrres.pprint()
So using identically shaped 1D arrays, stacking them with row_stack so that x has shape (m, n) (one row per input dimension), and addressing each dimension by a single index inside the model function seems to work.
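As a quick sanity check (my addition, not part of the original answer), the fit should recover the coefficients used to generate y, i.e. beta close to [3, 4, 5]:
# Sanity check (my addition): y was built with coefficients 3, 4 and 5,
# so the estimates should land near them.
print(odrres.beta)     # estimated [beta0, beta1, beta2], expected near [3, 4, 5]
print(odrres.sd_beta)  # standard errors of the estimates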

Related

Numpy only computation of mathematical expression involving a nested sum of functions over the same array

I need help computing a mathematical expression using only numpy operations. The expression I want to compute is the following:
s = sum_{i=1..N} sum_{j=1..N} prod_{k=1..S} f(x[i,k], x[j,k])
where x is an (N, S) array and f is a numpy function that can work with broadcastable arrays (e.g. np.maximum, np.sum, np.prod, ...). If that is of importance, in my case f is a symmetric function.
So far my code looks like this:
s = 0
for xp in x:  # Loop over N...
    s += np.sum(np.prod(f(xp, x), axis=1))
It still has a loop over N that I'd like to get rid of.
Typically N is "large" (around 30k) but S is small (less than 20), so if anyone can find a trick to loop only over S, that would still be a major improvement.
I believe the problem would be easy by N-plicating the array, but an array of size (32768, 32768, 20) requires 150 GB of RAM that I don't have. A (32768, 32768) array does fit in memory, though I would appreciate a solution that does not allocate such an array.
Maybe a use of np.einsum with well-chosen arrays is possible?
Thanks for your replies. If any information is missing let me know!
Have a nice day !
Edit 1:
The forms of f I'm interested in include (for now): f(x, y) = |x - y|, f(x, y) = |x - y|^2, f(x, y) = 2 - max(x, y).
Your loop is already quite efficient. Some possible alternatives are:
Method-1 (looping over S)
import numpy as np
def f(x, y):
    return np.abs(x - y)
N = 200
S = 20
x_data = np.random.rand(N, S)  # (i, s)
y_data = np.random.rand(N, S)  # (i', s)
product = f(np.broadcast_to(x_data[:, 0][..., None], (N, N)),
            np.broadcast_to(y_data[:, 0][..., None], (N, N)).T)
for i in range(1, S):
    product *= f(np.broadcast_to(x_data[:, i][..., None], (N, N)),
                 np.broadcast_to(y_data[:, i][..., None], (N, N)).T)
total = np.sum(product)  # avoid shadowing the built-in sum
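The gain of Method-1 is memory: it only ever materialises (N, N) arrays instead of the full (N, N, S) cube, at the price of a short Python loop over the S columns.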
Method-2 (dispatching S number of blocks)
import numpy as np
def f(x, y):
    x1 = np.broadcast_to(x[:, None, ...], (x.shape[0], y.shape[0], x.shape[1]))
    y1 = np.broadcast_to(y[None, ...], (x.shape[0], y.shape[0], x.shape[1]))
    return np.abs(x1 - y1)
def f1(x1, y1):
    return np.abs(x1 - y1)
N = 5000
S = 20
x_data = np.random.rand(N, S)  # (i, s)
y_data = np.random.rand(N, S)  # (i', s)
def fun_new(x_data1, y_data1):
    # split the N rows into S blocks and process one (N/S, N, S) block at a time
    s = 0
    pp = np.split(x_data1, S, axis=0)
    for xp in pp:
        s += np.sum(np.prod(f(xp, y_data1), axis=2))
    return s
def fun_op(x_data1, y_data1):
    s = 0
    for xp in x_data1:  # Loop over N...
        s += np.sum(np.prod(f1(xp, y_data1), axis=1))
    return s
fun_new(x_data,y_data)
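As a quick check (my addition, assuming the setup above), the blocked version and the original per-row loop should agree to floating-point tolerance:
# Sanity check (my addition): both implementations compute the same sum.
assert np.isclose(fun_new(x_data, y_data), fun_op(x_data, y_data))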

How to include known parameter that changes over time in solve_bvp

I am trying to use scipy's solve_bvp in Python to solve differential equations that depend on a known parameter that changes over time. I have this parameter saved in a numpy array. However, when I try to use this array in the derivatives function, I get the following error: ValueError: operands could not be broadcast together with shapes (10,) (11,).
Below is a simplified version of my code. I want the variable d2 to take certain values at different times according to an array, d2_set_values. The differential equations for some of the 12 variables then depend on d2. I hope it's clear from this code what I'm trying to achieve.
import numpy as np
from scipy.integrate import solve_bvp
t = np.linspace(0, 10, 11)
# Known parameter that changes over time
d2_set_values = np.zeros(t.size)
d2_set_values[:4] = 0.1
d2_set_values[4:8] = 0.2
d2_set_values[8:] = 0.1
# Initialise y vector
y = np.zeros((12, t.size))
# ODEs
def fun(x, y):
    S1, I1, R1, S2, I2, R2, lamS1, lamI1, lamR1, lamS2, lamI2, lamR2 = y
    d1 = 0.5*(I1 + 0.1*I2)*(lamS1 - lamI1)
    d2 = d2_set_values
    dS1dt = -0.5*S1*(1-d1)*(I1 + 0.1*I2)
    dS2dt = -0.5*S2*(1-d2)*(I2 + 0.1*I1)
    dI1dt = 0.5*S1*(1-d1)*(I1 + 0.1*I2) - 0.2*I1
    dI2dt = 0.5*S2*(1-d2)*(I2 + 0.1*I1) - 0.2*I2
    dR1dt = 0.2*I1
    dR2dt = 0.2*I2
    dlamS1dt = 0.5*(1-d1)*S1*lamS1
    dlamS2dt = 0.5*(1-d2)*S2*lamS2
    dlamI1dt = 0.5*(1-d1)*I1*lamI1
    dlamI2dt = 0.5*(1-d2)*I2*lamI2
    dlamR1dt = lamR1
    dlamR2dt = lamR2
    return np.vstack((dS1dt, dI1dt, dR1dt, dS2dt, dI2dt, dR2dt,
                      dlamS1dt, dlamI1dt, dlamR1dt, dlamS2dt, dlamI2dt, dlamR2dt))
# Boundary conditions
def bc(ya, yb):
    return np.array([ya[0]-0.99, ya[1]-0.01, ya[2]-0., ya[3]-1.0, ya[4]-0., ya[5]-0.,
                     yb[6]-0., yb[7]-1., yb[8]-0., yb[9]-0, yb[10]-0, yb[11]-0])
# Run the solver
sol = solve_bvp(fun, bc, t, y)
I have even tried reducing the size of d2_set_values by one, but that doesn't solve the issue.
Any help I can get would be much appreciated!
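No answer is recorded here, but note that solve_bvp in general calls fun on meshes whose size differs from the initial mesh (it evaluates at interval midpoints and may insert new nodes), so an array pinned to the 11 nodes of t cannot broadcast against quantities evaluated elsewhere. A sketch of a possible fix (my suggestion, not from the original post) is to interpolate the known parameter at whatever points x the solver passes in, replacing the d2 = d2_set_values line:
# Possible fix (my suggestion, not from the original post): inside fun,
# evaluate the known parameter at the solver's current points x instead
# of reusing the array sized for the initial mesh t.
d2 = np.interp(x, t, d2_set_values)  # shape matches x, whatever its length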

Fitting a quadratic function in python without numpy polyfit

I am trying to fit a quadratic function to some data, and I'm trying to do this without using numpy's polyfit function.
Mathematically I tried to follow this website https://neutrium.net/mathematics/least-squares-fitting-of-a-polynomial/ but somehow I don't think that I'm doing it right. If anyone could assist me that would be great, or if you could suggest another way to do it that would also be awesome.
What I've tried so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'b--')
plt.show()
As you can see, my curve doesn't compare well to numpy's polyfit curve.
Update:
I went through my code and removed all the stupid mistakes, and now it works when I fit over 3 points, but I have no idea how to fit over more than three points.
This is the new code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
ones = np.ones(3)
A = np.array( ((0,1),(1,1),(2,1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
b = np.array( (1,2,0), ndmin=2 ).T
b = b.reshape(3)
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature)), axis = 1)
featuresc = features.copy()
print(features)
m_det = np.linalg.det(features)
print(m_det)
determinants = []
for i in range(3):
    featuresc.T[i] = b
    print(featuresc)
    det = np.linalg.det(featuresc)
    determinants.append(det)
    print(det)
    featuresc = features.copy()
determinants = determinants / m_det
print(determinants)
plt.scatter(A.T[0],b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
p2 = np.polyfit(A.T[0],b,2)
plt.plot(u, np.polyval(p2,u), 'r--')
plt.show()
Instead of using Cramer's Rule, actually solve the system using least squares. Remember that Cramer's Rule will only work if the total number of points you have equals the desired order of polynomial plus 1.
If you don't have this, then Cramer's Rule will not work, as you're trying to find an exact solution to the problem. If you have more points, the method is unsuitable, as you would create an overdetermined system of equations.
To adapt this to more points, numpy.linalg.lstsq is a better fit, as it solves Ax = b by computing the vector x that minimizes the Euclidean norm of the residual. Therefore, build the features matrix without the y values and use numpy.linalg.lstsq to solve for the coefficients:
import numpy as np
import matplotlib.pyplot as plt
ones = np.ones(4)
xfeature = np.asarray([0,1,2,3])
squaredfeature = xfeature ** 2
b = np.asarray([1,2,0,3])
features = np.concatenate((np.vstack(ones),np.vstack(xfeature),np.vstack(squaredfeature)), axis = 1) # Change - remove the y values
determinants = np.linalg.lstsq(features, b, rcond=None)[0]  # Change - use least squares
plt.scatter(xfeature,b)
u = np.linspace(0,3,100)
plt.plot(u, u**2*determinants[2] + u*determinants[1] + determinants[0] )
plt.show()
I get this plot now, which matches the dashed curve in your graph, and also matches what numpy.polyfit gives you.
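As a cross-check (my addition, not part of the original answer), the least-squares coefficients should agree with numpy.polyfit up to ordering, since polyfit returns the highest degree first:
# Cross-check (my addition): polyfit on the same points should return
# the same coefficients, highest degree first.
p2 = np.polyfit(xfeature, b, 2)
assert np.allclose(p2[::-1], determinants)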

3-D interpolation using LinearNDInterpolator (Python)

I want to interpolate some 3-d data using the scipy LinearNDInterpolator function (Python 2.7). I can't quite figure out how to use it, though: below is my attempt. I'm getting the error ValueError: different number of values and points. This leads me to believe that the shape of "coords" is not appropriate for these data, but it looks in the documentation like the shape is okay.
Note that in the data I really want to use (instead of this example) the spacing of my grid is irregular, so something like RegularGridInterpolator will not do the trick.
Thanks very much for your help!
import numpy as np
from scipy.interpolate import LinearNDInterpolator
def f(x, y, z):
    return 2 * x**3 + 3 * y**2 - z
x = np.linspace(1, 2, 2)
y = np.linspace(1, 2, 2)
z = np.linspace(1, 2, 2)
data = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))
coords = np.zeros((len(x), len(y), len(z), 3))
coords[..., 0] = x.reshape((len(x), 1, 1))
coords[..., 1] = y.reshape((1, len(y), 1))
coords[..., 2] = z.reshape((1, 1, len(z)))
coords = coords.reshape(data.size, 3)
my_interpolating_function = LinearNDInterpolator(coords,data)
pts = np.array([[2.1, 6.2, 8.3], [3.3, 5.2, 7.1]])
print(my_interpolating_function(pts))
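No answer is recorded here, but the error message matches the shapes involved (my reading, not from the original post): coords has been flattened to (8, 3), while data still has shape (2, 2, 2), and LinearNDInterpolator requires exactly one value per point. A sketch of the likely fix:
# Likely fix (not from the original post): flatten the values so they
# line up one-to-one with the coordinate rows; both flatten in C order,
# so the pairing stays consistent.
my_interpolating_function = LinearNDInterpolator(coords, data.ravel())
print(my_interpolating_function(pts))  # points outside the convex hull yield nan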

Apply non-linear regression for multi dimension data samples in Python

I have installed Numpy and SciPy, but I don't quite understand their documentation about polyfit.
For example, here are my data samples:
[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074]
[0.042780748663101636, -0.0044771571786609945, -0.10506567946276074]
[0.542780748663101636, -0.005771571786609945, 0.30506567946276074]
[-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]
The first two columns are sample features, the third column is the output. My goal is to get a function that takes two parameters (the first two columns) and returns its prediction (the output).
Any simple example ?
====================== EDIT ======================
Note that, I need to fit something like a curve, not only straight lines. The polynomial should be something like this ( n = 3):
a*x1^3 + b*x2^2 + c*x3 + d = y
Not:
a*x1 + b*x2 + c*x3 + d = y
x1, x2, x3 are features of one sample, y is the output
Try something like the following.
edit: added an example function that uses the results of the linear regression to estimate the output.
import numpy as np
data =np.array(
[[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074],
[0.042780748663101636, -0.0044771571786609945, -0.10506567946276074],
[0.542780748663101636, -0.005771571786609945, 0.30506567946276074],
[-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]])
coefficient = data[:,0:2]
dependent = data[:,-1]
x,residuals,rank,s = np.linalg.lstsq(coefficient,dependent)
def f(x, u, v):
    return u*x[0] + v*x[1]
for datum in data:
    print(f(x, *datum[0:2]))
Which gives
>>> x
array([ 0.16991146, -30.18923739])
>>> residuals
array([ 0.07941146])
>>> rank
2
>>> s
array([ 0.64490113, 0.02944663])
and the function created with your coefficients gave
0.115817326583
0.142430900298
0.266464019171
0.859743371665
More info can be found at the documentation I posted as a comment.
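One caveat worth noting (my observation, not part of the original answer): the regression above fits y = u*x1 + v*x2 with no constant term, because the design matrix holds only the two feature columns. A minimal sketch for adding an intercept, assuming the same arrays as above:
# Sketch (my addition): append a column of ones so lstsq also fits a
# constant term d in  y = u*x1 + v*x2 + d.
A = np.column_stack((coefficient, np.ones(len(coefficient))))
u, v, d = np.linalg.lstsq(A, dependent, rcond=None)[0]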
edit 2: fitting your data to an arbitrary model.
edit 3: made my model a function for ease of understanding.
edit 4: made the code easier to read / changed the model to a quadratic fit, but you should be able to read this code and know how to make it minimize any residual you want now.
contrived example:
import numpy as np
from scipy.optimize import leastsq
data =np.array(
[[-0.042780748663101636, -0.0040771571786609945, -0.00506567946276074],
[0.042780748663101636, -0.0044771571786609945, -0.10506567946276074],
[0.542780748663101636, -0.005771571786609945, 0.30506567946276074],
[-0.342780748663101636, -0.0304077157178660995, 0.90506567946276074]])
coefficient = data[:,0:2]
dependent = data[:,-1]
def model(p, x):
    a, b, c = p
    u = x[:, 0]
    v = x[:, 1]
    return a*u**2 + b*v + c
def residuals(p, y, x):
    return y - model(p, x)
p0 = np.array([2, 3, 4])  # some initial guess
p = leastsq(residuals, p0, args=(dependent, coefficient))[0]
def f(p, x):
    # note: this evaluates p as linear coefficients on one sample,
    # not the quadratic model fitted above
    return p[0]*x[0] + p[1]*x[1] + p[2]
for x in coefficient:
    print(f(p, x))
gives
-0.108798280153
-0.00470479385807
0.570237823475
0.413016072653
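Note that f above evaluates the parameters linearly; to see the predictions of the fitted quadratic model itself (my addition, using the names defined above), evaluate model on the whole feature matrix:
# Predictions of the fitted quadratic model (my addition): model() takes
# the full (N, 2) feature array and returns one prediction per sample.
print(model(p, coefficient))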
