Implicit fitting with scipy.odr - python

Using scipy.odr, I would like to perform a fit with an implicit model function. For simplicity, let us assume a linear model, namely
y = a*x + b, (1)
and we are interested in its implicit form, i.e.
a*x + b - y = 0. (2)
In the following I show my code:
import scipy.odr
import numpy as np
import matplotlib.pyplot as plt
a = 1
b = 2
x = np.linspace(0, 10, 50)
y = a * x + b + np.random.normal(scale=0.1, size=len(x))
# Define function for scipy.odr
def func(p, var):
    x, y = var
    a, b = p
    return a * x + b - y
# Fit the data using scipy.odr
Model = scipy.odr.Model(func, implicit=True)
Data = scipy.odr.Data(np.array([x,y]), 0)
Odr = scipy.odr.ODR(Data, Model, [a,b], maxit=10000)
From the last line of the code I get this error:
in _check(self)
848 print(res.shape)
849 print(fcn_perms)
--> 850 raise OdrError("fcn does not output %s-shaped array" % y_s)
851
852 if self.model.fjacd is not None:
OdrError: fcn does not output [0, 50]-shaped array
I don't get what I should modify in the code. For sure there is something I'm not understanding, even if I carefully read the documentation.
EDIT:
I misunderstood the meaning of the y parameter in scipy.odr.Data. I thought it was the right-hand side of Eq. (2), but instead it is the dimensionality of the response. This is the correct line of code:
Data = scipy.odr.Data(np.array([x,y]), 1)
Now the code properly works.
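For completeness, here is a minimal sketch of the corrected script, including the run() call that actually performs the fit. The seeded random generator and the lower-case variable names are choices made for this sketch and are not part of the original post.

```python
import numpy as np
import scipy.odr

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
a_true, b_true = 1.0, 2.0
x = np.linspace(0, 10, 50)
y = a_true * x + b_true + rng.normal(scale=0.1, size=x.size)

def func(p, var):
    # Implicit model: returns a*x + b - y, which ODR drives towards zero
    x, y = var
    a, b = p
    return a * x + b - y

model = scipy.odr.Model(func, implicit=True)
# For an implicit model the second argument of Data is the dimensionality
# of the response (here 1), not an array of measured values
data = scipy.odr.Data(np.array([x, y]), 1)
odr = scipy.odr.ODR(data, model, beta0=[a_true, b_true], maxit=10000)
output = odr.run()

print(output.beta)     # fitted [a, b]
print(output.sd_beta)  # estimated standard errors of a and b
```

Both x and y are passed as "explanatory" variables here, which is why the array handed to Data has shape (2, 50).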

Related

How to use scipy least_squares to get the estimation of unknown variables

I am a newbie in using scipy.optimize. I have the following function called func. I have x and y values given as lists and need to estimate the values of a, b and c. I could use curve_fit for this. However, I want to explore the possibility of using least_squares.
When I run the following code, I get the error below. It'd be great if anyone could point me in the right direction.
import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(x,a,b,c):
    return a*x**2 + b*x + c
def result(list_x, list_y):
    popt = curve_fit(func, list_x, list_y)
sol = least_squares(result,x, args=(y,),method='lm',jac='2-point',max_nfev=2000)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''
The following code uses the least_squares() routine for optimization. The most important change in comparison to your code is ensuring that func() returns a vector of residuals. I also compared the solution to the linear algebra result to ensure correctness.
import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(theta, x, y):
    # Return residual = fit-observed
    return (theta[0]*x**2 + theta[1]*x + theta[2]) - y
# Initial parameter guess
theta0 = np.array([0.5, -0.1, 0.3])
# Compute solution providing initial guess theta0, x input, and y input
sol = least_squares(func, theta0, args=(x,y))
print(sol.x)
#------------------- OPTIONAL -------------------#
# Compare to linear algebra solution
temp = x.reshape((100,1))
X = np.hstack( (temp**2, temp, np.ones((100,1))) )
OLS = np.linalg.lstsq(X, y.reshape((100,1)), rcond=None)
print(OLS[0])
To use least_squares you need to pass a residual function, not a function that calls curve_fit. Also, least_squares requires an initial guess for the parameters that you are fitting (i.e. a, b, c). In your case, if you want to use least_squares you can write something like the following (I just used random values for the guess):
import numpy as np
from scipy.optimize import least_squares
np.random.seed(0)
x = np.random.randint(0, 100, 100) # sample dataset for independent variables
y = np.random.randint(0, 100, 100) # sample dataset for dependent variables
def func(x,a,b,c):
    return a*x**2 + b*x + c
def residual(p, x, y):
    return y - func(x, *p)
guess = np.random.rand(3)
sol = least_squares(residual, guess, args=(x, y,),method='lm',jac='2-point',max_nfev=2000)
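Because this model is linear in a, b and c, the result can be cross-checked against an ordinary polynomial fit. The sketch below does that with np.polyfit; the fixed starting guess of ones is an assumption made here so that the run is repeatable.

```python
import numpy as np
from scipy.optimize import least_squares

np.random.seed(0)
x = np.random.randint(0, 100, 100)
y = np.random.randint(0, 100, 100)

def residual(p, x, y):
    # residual of the quadratic model a*x**2 + b*x + c
    return y - (p[0] * x**2 + p[1] * x + p[2])

sol = least_squares(residual, np.ones(3), args=(x, y), method='lm')
reference = np.polyfit(x, y, 2)  # coefficients, highest power first: [a, b, c]

print(sol.x)      # fitted [a, b, c]
print(reference)  # should agree closely with sol.x
```

The fitted parameters are in sol.x; sol.success and sol.message report how the optimizer terminated.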

fitting an inverse proportional function

I want to fit the function f(x) = b + a / x to my data set. For that, least_squares from scipy.optimize seemed suitable.
My code is as follows:
x = np.asarray(range(20,401,20))
y is distances that I calculated, but is an array of length 20, here is just random numbers for example
y = np.random.rand(20)
Initial guesses of the params a and b:
params = np.array([1,1])
Function to minimize
def funinv(x):
    return params[0]/x+params[1]
res = least_squares(funinv, params, args=(x, y))
Error given:
return np.atleast_1d(fun(x, *args, **kwargs))
TypeError: funinv() takes 1 positional argument but 3 were given
How can I fit my data?
To clarify, there are two related problems:
Minimizing a function
Fitting a model to data
To fit a model to observed data is to find the model parameters that minimize some measure of error between the model's predictions and the observed data.
least_squares simply minimizes the following function with respect to x (x can be a vector):
F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
(rho is a loss function and default is rho(x) = x so don't mind it for now)
least_squares(func, x0) expects that call to func(x) will return a vector [a1, a2, a3, ...] for which a sum of squares will be computed: S = 0.5 * (a1^2 + a2^2 + a3^2 + ...).
least_squares will tweak x0 to minimize S.
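As a small illustration of the cost being minimized, the sketch below uses a made-up toy residual function with two unknowns (not taken from the question) and checks that the reported cost is half the sum of squared residuals at the solution.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p):
    # three residuals f_i(p) of a toy problem with two unknowns
    return np.array([p[0] - 1.0, p[1] - 2.0, p[0] + p[1] - 2.5])

res = least_squares(residuals, x0=np.zeros(2))

print(res.x)                       # parameters that minimize the sum of squares
print(res.cost)                    # cost reported by least_squares
print(0.5 * np.sum(res.fun ** 2))  # same value, computed from the residual vector
```

Here res.fun holds the residual vector at the solution, so with the default loss the last two printed values coincide.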
Thus, to fit a model to data, one must construct a function that returns the residuals (the errors between the model and the actual data) and then minimize it. In your case you can write it as follows:
import numpy as np
from scipy.optimize import least_squares
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
    return b + a/x
def residuals(params, x, data):
    # evaluates function given vector of params [a, b]
    # and return residuals: (observed_data - model_data)
    a, b = params
    func_eval = funcinv(x, a, b)
    return (data - func_eval)
res = least_squares(residuals, params, args=(x, y))
This gives a result:
print(res)
...
message: '`gtol` termination condition is satisfied.'
nfev: 4
njev: 4
optimality: 5.6774618339971994e-10
status: 1
success: True
x: array([ 6.89518618, 0.37118815])
However, since the residual function is almost always the same (residuals = observed_data - model_data), scipy.optimize provides a shortcut called curve_fit: curve_fit(func, xdata, ydata, p0). curve_fit builds the residual function automatically, so you can simply write:
import numpy as np
from scipy.optimize import curve_fit
x = np.asarray(range(20,401,20))
y = np.random.rand(20)
params = np.array([1,1])
def funcinv(x, a, b):
    return b + a/x
res = curve_fit(funcinv, x, y, params)
print(res) # ... array([ 6.89518618, 0.37118815]), ...
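Note that curve_fit returns a tuple: the optimal parameters and their estimated covariance matrix. A minimal sketch of unpacking both follows; the synthetic data (a = 7, b = 0.4 plus small noise) is an assumption made here so that the fit has something meaningful to recover.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)  # assumption: synthetic data for illustration
x = np.arange(20, 401, 20, dtype=float)
y = 0.4 + 7.0 / x + rng.normal(scale=0.01, size=x.size)

def funcinv(x, a, b):
    return b + a / x

# popt holds the fitted [a, b]; pcov is their estimated covariance matrix
popt, pcov = curve_fit(funcinv, x, y, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation uncertainties

print(popt)
print(perr)
```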

Doing many iterations of curve_fit in one go for piecewise function

I'm trying to perform many runs of SciPy's curve_fit at once, in order to avoid loops and therefore increase speed.
This is very similar to this problem, which was solved. However, because the functions here are piecewise (discontinuous), that solution isn't applicable.
Consider this example:
import numpy as np
from numpy import random as rng
from scipy.optimize import curve_fit
rng.seed(0)
N=20
X=np.logspace(-1,1,N)
Y = np.zeros((4, N))
for i in range(0,4):
    b = i+1
    a = b
    print(a,b)
    Y[i] = (X/b)**(-a) #+ 0.01 * rng.randn(6)
    Y[i, X>b] = 1
This yields these arrays (plot not shown), which are discontinuous at X == b. I can retrieve the original values of a and b by using curve_fit iteratively:
def plaw(r, a, b):
    """ Theoretical power law for the shape of the normalized conditional density """
    import numpy as np
    return np.piecewise(r, [r < b, r >= b], [lambda x: (x/b)**-a, lambda x: 1])
coeffs=[]
for ix in range(Y.shape[0]):
    print(ix)
    c0, pcov = curve_fit(plaw, X, Y[ix])
    coeffs.append(c0)
But this process can be very slow depending on the size of X, Y and the loop, so I'm trying to speed things up by getting coeffs without the need for a loop. So far I haven't had any luck.
Things that might be important:
X and Y only contain positive values
a and b are always positive
Although the data to fit in this example is smooth (for the sake of simplicity), the real data has noise
EDIT
This is as far as I've gotten:
y=np.ma.masked_where(Y<1.01, Y)
lX = np.log(X)
lY = np.log(y)
A = np.vstack([lX, np.ones(len(lX))]).T
m,c=np.linalg.lstsq(A, lY.T)[0]
print('a=',-m)
print('b=',np.exp(-c/m))
But even without any noise the output is:
a= [0.18978965578339158 1.1353633705997466 2.220234483915197 3.3324502660995714]
b= [339.4090881838179 7.95073481873057 6.296592007396107 6.402567167503574]
Which is way worse than I was hoping to get.
Here are three approaches to speeding this up. You gave no desired speed up or accuracies, or even vector sizes, so buyer beware.
TL;DR
Timings (seconds for 10 calls, one column per method; "-" means not run at that size):
len        1      2      3      4
1000       0.045  0.033  0.025  0.022
10000      0.290  0.097  0.029  0.023
100000     3.429  0.767  0.083  0.030
1000000    -      -      0.546  0.046
1) Original Method
2) Pre-estimate with Subset
3) M Newville [linear log-log estimate](https://stackoverflow.com/a/44975066/7311767)
4) Subset Estimate (Use Less Data)
Pre-estimate with Subset (Method 2):
A decent speedup can be achieved by simply running the curve_fit twice, where the first time uses a short subset of the data to get a quick estimate. That estimate is then used to seed a curve_fit with the entire dataset.
x, y = current_data
stride = int(max(1, len(x) / 200))
c0 = curve_fit(power_law, x[0:len(x):stride], y[0:len(y):stride])[0]
return curve_fit(power_law, x, y, p0=c0)[0]
M Newville linear log-log estimate (Method 3):
The log estimate proposed by M Newville is also considerably faster. As the OP was concerned about the initial estimate method proposed by Newville, this method uses curve_fit with a subset to provide the estimate of the break point in the curve.
x, y = current_data
stride = int(max(1, len(x) / 200))
c0 = curve_fit(power_law, x[0:len(x):stride], y[0:len(y):stride])[0]
index_max = np.where(x > c0[1])[0][0]
log_x = np.log(x[:index_max])
log_y = np.log(y[:index_max])
result = linregress(log_x, log_y)
m, c = -result[0], np.exp(-result[1] / result[0])
return (m, c), result
Use Less Data (Method 4):
Finally the seed mechanism used for the previous two methods provides pretty good estimates on the sample data. Of course it is sample data so your mileage may vary.
stride = int(max(1, len(x) / 200))
c0 = curve_fit(power_law, x[0:len(x):stride], y[0:len(y):stride])[0]
Test Code:
import numpy as np
from numpy import random as rng
from scipy.optimize import curve_fit
from scipy.stats import linregress
fit_data = {}
current_data = None
def data_for_fit(a, b, n):
    key = a, b, n
    if key not in fit_data:
        rng.seed(0)
        x = np.logspace(-1, 1, n)
        y = np.clip((x / b) ** (-a) + 0.01 * rng.randn(n), 0.001, None)
        y[x > b] = 1
        fit_data[key] = x, y
    return fit_data[key]
def power_law(r, a, b):
    """ Power law for the shape of the normalized conditional density """
    import numpy as np
    return np.piecewise(
        r, [r < b, r >= b], [lambda x: (x/b)**-a, lambda x: 1])
def method1():
    x, y = current_data
    return curve_fit(power_law, x, y)[0]
def method2():
    x, y = current_data
    return curve_fit(power_law, x, y, p0=method4()[0])
def method3():
    x, y = current_data
    c0, pcov = method4()
    index_max = np.where(x > c0[1])[0][0]
    log_x = np.log(x[:index_max])
    log_y = np.log(y[:index_max])
    result = linregress(log_x, log_y)
    m, c = -result[0], np.exp(-result[1] / result[0])
    return (m, c), result
def method4():
    x, y = current_data
    stride = int(max(1, len(x) / 200))
    return curve_fit(power_law, x[0:len(x):stride], y[0:len(y):stride])
from timeit import timeit
def runit(stmt):
    print("%s: %.3f %s" % (
        stmt, timeit(stmt + '()', number=10,
                     setup='from __main__ import ' + stmt),
        eval(stmt + '()')[0]
    ))
def runit_size(size):
    print('Length: %d' % size)
    if size <= 100000:
        runit('method1')
        runit('method2')
    runit('method3')
    runit('method4')
for i in (1000, 10000, 100000, 1000000):
    current_data = data_for_fit(3, 3, i)
    runit_size(i)
Two suggestions:
Use numpy.where (and possibly argmin) to find the X value at which the Y data becomes 1, or perhaps just slightly larger than 1, and truncate the data to that point -- effectively ignoring the data where Y=1.
That might be something like:
index_max = numpy.where(y < 1.2)[0][0]
x = x[:index_max]
y = y[:index_max]
Use the hint shown in your log-log plot that the power law is now linear in log-log. You don't need curve_fit, but can use scipy.stats.linregress on log(Y) vs log(X). For your real work, that will at the very least give good starting values for a subsequent fit.
Following up on this and trying to follow your question, you might try something like:
import numpy as np
from scipy.stats import linregress
np.random.seed(0)
npts = 51
x = np.logspace(-2, 2, npts)
YTHRESH = 1.02
for i in range(5):
    b = i + 1.0 + np.random.normal(scale=0.1)
    a = b + np.random.random()
    y = (x/b)**(-a) + np.random.normal(scale=0.0030, size=npts)
    y[x>b] = 1.0
    # to model exponential decay, first remove the values
    # where y ~= 1 where the data is known to not decay...
    imax = np.where(y < YTHRESH)[0][0]
    # take log of this truncated x and y
    _x = np.log(x[:imax])
    _y = np.log(y[:imax])
    # use linear regression on the log-log data:
    out = linregress(_x, _y)
    # map slope/intercept to scale, exponent
    afit = -out.slope
    bfit = np.exp(out.intercept/afit)
    print(""" === Fit Example {i:3d}
a expected {a:4f}, got {afit:4f}
b expected {b:4f}, got {bfit:4f}
""".format(i=i+1, a=a, b=b, afit=afit, bfit=bfit))
Hopefully that's enough to get you going.
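Since the original goal was to avoid the Python loop altogether, the log-log regression idea can also be applied to all rows at once by writing out the closed-form least-squares line with a 0/1 weight mask. This is only a sketch on the noise-free example from the question; the threshold of 1.01 and the variable names are assumptions, and noisy data would likely need a more careful choice of which points to keep.

```python
import numpy as np

N = 20
X = np.logspace(-1, 1, N)
Y = np.zeros((4, N))
for i in range(4):          # build the example data as in the question
    b = i + 1
    a = b
    Y[i] = (X / b) ** (-a)
    Y[i, X > b] = 1

# Keep only points on the power-law branch of each row (Y clearly above 1)
w = (Y > 1.01).astype(float)   # 0/1 weights, shape (rows, N)
lx = np.log(X)[None, :]        # shape (1, N), broadcast over rows
ly = np.log(Y)

# Weighted sums for a straight-line fit log(Y) = slope*log(X) + intercept
n = w.sum(axis=1)
sx = (w * lx).sum(axis=1)
sy = (w * ly).sum(axis=1)
sxx = (w * lx * lx).sum(axis=1)
sxy = (w * lx * ly).sum(axis=1)

slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
intercept = (sy - slope * sx) / n

a_fit = -slope                      # log Y = -a*log X + a*log b
b_fit = np.exp(intercept / a_fit)

print('a =', a_fit)
print('b =', b_fit)
```

The resulting a_fit and b_fit can be used directly or passed as p0 to curve_fit for a final refinement on noisy data.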

Python lambda function with arrays as parameters

I am trying to define a function of n variables to fit to a data set. The function looks like this.
Kelly function, as implemented in kellyFunc below: f(x) = (sum_i a_i * x**(2*i)) / (sum_i b_i * x**(2*i))
I then want to find the optimal ai's and bj's to fit my data set using scipy.optimize.leastsq
Here's my code so far.
from scipy.optimize import leastsq
import numpy as np
def kellyFunc(a, b, x): #Function to fit.
    top = 0
    bot = 0
    a = [a]
    b = [b]
    for i in range(len(a)):
        top = top + a[i]*x**(2*i)
        bot = bot + b[i]*x**(2*i)
    return(top/bot)
def fitKelly(x, y, n):
    line = lambda params, x : kellyFunc(params[0,:], params[1,:], x) #Lambda Function to minimize
    error = lambda params, x, y : line(params, x) - y #Kelly - dataset
    paramsInit = [[1 for x in range(n)] for y in range(2)] #define all ai and bi = 1 for initial guess
    paramsFin, success = leastsq(error, paramsInit, args = (x,y)) #run leastsq optimization
    #line of best fit
    xx = np.linspace(x.min(), x.max(), 100)
    yy = line(paramsFin, xx)
    return(paramsFin, xx, yy)
At the moment it's giving me the error:
"IndexError: too many indices" because of the way I've defined my initial lambda function with params[0,:] and params[1,:].
There are a few problems with your approach, which is why I am writing a full answer.
As for your specific question: leastsq doesn't really expect multidimensional arrays as parameter input. The documentation doesn't make this clear, but parameter inputs are flattened when passed to the objective function. You can verify this by using full functions instead of lambdas:
from scipy.optimize import leastsq
import numpy as np
def kellyFunc(a, b, x): #Function to fit.
    top = 0
    bot = 0
    for i in range(len(a)):
        top = top + a[i]*x**(2*i)
        bot = bot + b[i]*x**(2*i)
    return(top/bot)
def line(params,x):
    print(repr(params)) # params is 1d!
    params = params.reshape(2,-1) # need to reshape back
    return kellyFunc(params[0,:], params[1,:], x)
def error(params,x,y):
    print(repr(params)) # params is 1d!
    return line(params, x) - y # pass it on, reshape in line()
def fitKelly(x, y, n):
    #paramsInit = [[1 for x in range(n)] for y in range(2)] #define all ai and bi = 1 for initial guess
    paramsInit = np.ones((2,n)) #better; shape (2,n) matches the reshape in line()
    paramsFin, success = leastsq(error, paramsInit, args = (x,y)) #run leastsq optimization
    #line of best fit
    xx = np.linspace(x.min(), x.max(), 100)
    yy = line(paramsFin, xx)
    return(paramsFin, xx, yy)
Now, as you can see, the shape of the params array is (2*n,) instead of (2,n). Once we reshape it back ourselves, your code (almost) works. Of course the print calls are only there to show you this fact; they are not needed for the code to run (and will produce a bunch of needless output in each iteration).
See my other changes, related to other errors: you had a=[a] and b=[b] in your kellyFunc, for no good reason. This turned the input arrays into lists containing arrays, which made the next loop do something very different from what you intended.
Finally, the sneakiest error: you have input variables named x, y in fitKelly, then you use x and y as loop variables in a list comprehension. Please be aware that this only works as you expect it to in Python 3; in Python 2 the loop variables of a list comprehension leak into the enclosing scope, overwriting your input variables named x and y.
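The flattening can also be checked without printing on every iteration. The sketch below records the parameter shapes that leastsq hands to the objective; the target data and the vectorized form of the Kelly function are made up here for illustration.

```python
import numpy as np
from scipy.optimize import leastsq

seen_shapes = set()

def residual(params, x, y):
    seen_shapes.add(params.shape)   # record what leastsq actually passes in
    a, b = params.reshape(2, -1)    # restore the (2, n) layout
    powers = x[:, None] ** (2 * np.arange(a.size))  # columns x**0, x**2, ...
    return (powers @ a) / (powers @ b) - y

x = np.linspace(0.0, 2.0, 50)
y = (1.0 + 2.0 * x**2) / (1.0 + 0.5 * x**2)  # made-up target curve
params_init = np.ones((2, 2))                # 2-D initial guess

params_fin, ier = leastsq(residual, params_init, args=(x, y))

print(seen_shapes)       # {(4,)}: the (2, 2) guess arrived flattened
print(params_fin.shape)  # (4,): the result is flat as well
```

Note that a ratio of two polynomials is unchanged when all a_i and b_i are multiplied by the same constant, so the fitted coefficients are only determined up to a common scale.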

How to get dimensions right using fmin_cg in scipy.optimize

I have been trying to use fmin_cg to minimize cost function for Logistic Regression.
xopt = fmin_cg(costFn, fprime=grad, x0= initial_theta,
args = (X, y, m), maxiter = 400, disp = True, full_output = True )
This is how I call my fmin_cg
Here is my CostFn:
def costFn(theta, X, y, m):
    h = sigmoid(X.dot(theta))
    J = 0
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    return J.flatten()
Here is my grad:
def grad(theta, X, y, m):
    h = sigmoid(X.dot(theta))
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    gg = 1 / m * (X.T.dot(h-y))
    return gg.flatten()
It seems to be throwing this error:
/Users/sugethakch/miniconda2/lib/python2.7/site-packages/scipy/optimize/linesearch.pyc in phi(s)
85 def phi(s):
86 fc[0] += 1
---> 87 return f(xk + s*pk, *args)
88
89 def derphi(s):
ValueError: operands could not be broadcast together with shapes (3,) (300,)
I know it's something to do with my dimensions. But I can't seem to figure it out.
I am noob, so I might be making an obvious mistake.
I have read this link:
fmin_cg: Desired error not necessarily achieved due to precision loss
But, it somehow doesn't seem to work for me.
Any help?
Updated size for X,y,m,theta
(100, 3) ----> X
(100, 1) -----> y
100 ----> m
(3, 1) ----> theta
This is how I initialize X,y,m:
data = pd.read_csv('ex2data1.txt', sep=",", header=None)
data.columns = ['x1', 'x2', 'y']
x1 = data.iloc[:, 0].values[:, None]
x2 = data.iloc[:, 1].values[:, None]
y = data.iloc[:, 2].values[:, None]
# join x1 and x2 to make one array of X
X = np.concatenate((x1, x2), axis=1)
m, n = X.shape
ex2data1.txt:
34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
.....
If it helps, I am trying to re-code one of the homework assignments for the Coursera's ML course by Andrew Ng in python
Finally, I figured out what the problem in my initial program was.
My 'y' was (100, 1) and the fmin_cg expects (100, ). Once I flattened my 'y' it no longer threw the initial error. But, the optimization wasn't working still.
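To see where the (300,) in the error message comes from, the shapes can be traced with dummy arrays of the sizes listed above (the values themselves are placeholders). fmin_cg works with a flat theta, so h is 1-D, and subtracting a column-vector y broadcasts to a full matrix:

```python
import numpy as np

X = np.ones((100, 3))
theta = np.zeros(3)          # fmin_cg works with a flat parameter vector
y_col = np.zeros((100, 1))   # column vector, as in the question
y_flat = y_col.ravel()       # shape (100,)

h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # sigmoid, shape (100,)

print((h - y_col).shape)                    # (100, 100): broadcasting
print(X.T.dot(h - y_col).flatten().shape)   # (300,): the "gradient" seen by fmin_cg
print(X.T.dot(h - y_flat).flatten().shape)  # (3,): the intended gradient
```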
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 0.693147
Iterations: 0
Function evaluations: 43
Gradient evaluations: 41
This was the same as what I achieved without optimization.
I figured out the way to optimize this was to use the 'Nelder-Mead' method. I followed this answer: scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"
Result = op.minimize(fun = costFn,
x0 = initial_theta,
args = (X, y, m),
method = 'Nelder-Mead',
options={'disp': True})#,
#jac = grad)
This method doesn't need a 'jacobian'.
I got the results I was looking for,
Optimization terminated successfully.
Current function value: 0.203498
Iterations: 157
Function evaluations: 287
Well, since I don't know exactly how you're initializing m, X, y, and theta, I had to make some assumptions. Hopefully my answer is relevant:
import numpy as np
from scipy.optimize import fmin_cg
from scipy.special import expit
def costFn(theta, X, y, m):
    # expit is the same as sigmoid, but faster
    h = expit(X.dot(theta))
    # instead of 1/m, I take the mean
    J = np.mean((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    return J #should be a scalar
def grad(theta, X, y, m):
    h = expit(X.dot(theta))
    J = np.mean((-(y * np.log(h))) - ((1-y) * np.log(1-h)))
    # divide by the number of samples so the gradient matches the mean in costFn
    gg = X.T.dot(h-y) / len(y)
    return gg.flatten()
# initialize matrices
X = np.random.randn(100,3)
y = np.random.randint(0, 2, 100).astype(float) # 0/1 labels; this apparently needs to be a 1-d vector
m = np.ones((3,)) # not using m, used np.mean for a weighted sum (see ali_m's comment)
theta = np.ones((3,1))
xopt = fmin_cg(costFn, fprime=grad, x0=theta, args=(X, y, m), maxiter=400, disp=True, full_output=True )
While the code runs, I don't know enough about your problem to know if this is what you're looking for. But hopefully this can help you understand the problem better. One way to check your answer is to call fmin_cg with fprime=None and see how the answers compare.
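Another way to check an analytic gradient before handing it to fmin_cg is scipy.optimize.check_grad, which compares it against a finite-difference approximation. The sketch below uses random 0/1 labels and simplified function signatures (no m argument); both are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import check_grad, fmin_cg
from scipy.special import expit

def cost(theta, X, y):
    # mean logistic-regression cost
    h = expit(X.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # gradient of the mean cost with respect to theta
    h = expit(X.dot(theta))
    return X.T.dot(h - y) / y.size

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.5).astype(float)  # random 0/1 labels, 1-d
theta0 = np.zeros(3)

# Norm of the difference between grad and a finite-difference estimate
err = check_grad(cost, grad, theta0, X, y)
print(err)  # should be very small if cost and grad are consistent

theta_opt = fmin_cg(cost, theta0, fprime=grad, args=(X, y), disp=False)
print(theta_opt)
```

A large value from check_grad usually points to a mismatch between the cost and its gradient, such as a missing 1/m factor or an unintended broadcast.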
