Python3 Scipy: Desired error not necessarily achieved due to precision loss - python

I'm implementing Andrew Ng's Coursera course in Python and I'm doing Ex2 right now, Logistic Regression. I'm trying to use SciPy's optimize.minimize but I can't seem to get it to run correctly. I'll try to give as brief a summary of my code as possible while being thorough. I'm using Python3. Here is my variable setup, I move everything to numpy after using pandas to read in the csv file:
import numpy as np
import pandas as pd
from scipy.optimize import fmin_bfgs
from scipy import optimize as opt
from scipy.optimize import minimize
class Ex2:
def __init__(self):
self.pandas_data = pd.read_csv("ex2data1.txt", skipinitialspace=True)
self.data = self.pandas_data.values
self.data = np.insert(self.data, 0, 1, axis=1)
self.x = self.data[:, 0:3]
self.y = self.data[:, 3:]
self.theta = np.zeros(shape=(self.x.shape[1]))
x: (100, 3) numpy ndarray
y: (100, 1) numpy ndarray
theta: (3,) numpy ndarray (1-d)
Then, I define a sigmoid, cost and gradient function to give to Scipy's minimize:
#staticmethod
def sigmoid(x):
return 1/(1 + np.exp(x))
def cost(self, theta):
x = self.x
y = self.y
m = len(y)
h = self.sigmoid(x.dot(theta))
j = (1/m) * ((-y.T.dot(np.log(h))) - ((1-y).T.dot(np.log(1-h))))
return j[0]
def grad(self, theta):
x = self.x
y = self.y
theta = np.expand_dims(theta, axis=0)
m = len(y)
h = self.sigmoid(x.dot(theta.T))
grad = (1/m) * (x.T.dot(h-y))
grad = np.squeeze(grad)
return grad
These take theta, a 1-D numpy ndarray. Cost returns a scalar (the cost associated with the theta given) and gradient returns a 1-D numpy ndarray of updates for theta.
When I then run this code:
def run(self):
options = {'maxiter': 100}
print(minimize(self.cost, self.theta, jac=self.grad, options=options))
ex2 = Ex2()
ex2.run()
I get:
fun: 0.69314718055994529
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
jac: array([ -0.1 , -12.00921659, -11.26284221])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 106
nit: 0
njev: 94
status: 2
success: False
x: array([ 0., 0., 0.])
Process finished with exit code 0
Can't quite get the formatting right on the output, apologies. That's the gist of what I'm doing, am I returning something from cost or gradient incorrectly? That seems most likely to me but I've been trying various combinations and formats of return values and nothing seems to work. Any help is greatly appreciated.
Edit: Among other things, to debug this I've made sure that cost and grad are returning what I expect, which they are (cost: float, grad: 1-D ndarray). Running both on an initial theta array of zeros gives me the same values as I get in Octave (which I know to be correct thanks to the provided code for the exercises). However, giving these values to the minimize function does not seem to be minimizing the theta values as expected.

If anyone stumbles across this and happens to have the same problem, I figured out that in my sigmoid function I should have had
return 1/(1 + np.exp(-x))
but had
return 1/(1 + np.exp(x))
After fixing that, the minimize function converged normally.

Related

Improve performance of autograd jacobian

I'm wondering how the following code could be faster. At the moment, it seems unreasonably slow, and I suspect I may be using the autograd API wrong. The output I expect is each element of timeline evaluated at the jacobian of f, which I do get, but it takes a long time:
import numpy as np
from autograd import jacobian
def f(params):
mu_, log_sigma_ = params
Z = timeline * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]))
I would expect the following:
jacobian(f) returns an function that represents the gradient vector w.r.t. the parameters.
jacobian(f)(np.array([1.0, 1.0])) is the Jacobian evaluated at the point (1, 1). To me, this should be like a vectorized numpy function, so it should execute very fast, even for 40k length arrays. However, this is not what is happening.
Even something like the following has the same poor performance:
import numpy as np
from autograd import jacobian
def f(params, t):
mu_, log_sigma_ = params
Z = t * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]), timeline)
From https://github.com/HIPS/autograd/issues/439 I gathered that there is an undocumented function autograd.make_jvp which calculates the jacobian with a fast forward mode.
The link states:
Given a function f, vectors x and v in the domain of f, make_jvp(f)(x)(v) computes both f(x) and the Jacobian of f evaluated at x, right multiplied by the vector v.
To get the full Jacobian of f you just need to write a loop to evaluate make_jvp(f)(x)(v) for each v in the standard basis of f's domain. Our reverse mode Jacobian operator works in the same way.
From your example:
import autograd.numpy as np
from autograd import make_jvp
def f(params):
mu_, log_sigma_ = params
Z = timeline * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp(f)(np.array([1.0, 1.0]))
# loop through each basis
# [1, 0] evaluates (f(0), first column of jacobian)
# [0, 1] evaluates (f(0), second column of jacobian)
for basis in (np.array([1, 0]), np.array([0, 1])):
val_of_f, col_of_jacobian = gradient_at_mle(basis)
print(col_of_jacobian)
Output:
[ 1. 1.00247506 1.00495012 ... 99.99504988 99.99752494
100. ]
[ -1. -1.00247506 -1.00495012 ... -99.99504988 -99.99752494
-100. ]
This runs in ~ 0.005 seconds on google collab.
Edit:
Functions like cdf aren't defined for the regular jvp yet but you can use another undocumented function make_jvp_reversemode where it is defined. Usage is similar except that the output is only the column and not the value of the function:
import autograd.numpy as np
from autograd.scipy.stats.norm import cdf
from autograd.differential_operators import make_jvp_reversemode
def f(params):
mu_, log_sigma_ = params
Z = timeline * cdf(mu_ / log_sigma_)
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp_reversemode(f)(np.array([1.0, 1.0]))
# loop through each basis
# [1, 0] evaluates first column of jacobian
# [0, 1] evaluates second column of jacobian
for basis in (np.array([1, 0]), np.array([0, 1])):
col_of_jacobian = gradient_at_mle(basis)
print(col_of_jacobian)
Output:
[0.05399097 0.0541246 0.05425823 ... 5.39882939 5.39896302 5.39909665]
[-0.05399097 -0.0541246 -0.05425823 ... -5.39882939 -5.39896302 -5.39909665]
Note that make_jvp_reversemode will be slightly faster than make_jvp by a constant factor due to it's use of caching.

machine learning and optimizing scipy

I have coding machine learning and for optimizing my cost function i used scipy.optimize.minimum for it and scipy doesn't return right answer.so what should i do?
code:
data1 = pd.read_csv('ex2data1.txt', header = None, names =
['exam1','exam2', 'y'])
data1['ones'] = pd.Series(np.ones(100), dtype = int)
data1 = data1[['ones', 'exam1', 'exam2', 'y']]
X = np.matrix(data1.iloc[:, 0:3])
y = np.matrix(data1.iloc[:, 3:])
def gFunction(z):
return sc.special.expit(-z)
def hFunction(theta, X):
theta = np.matrix(theta).T
h = np.matrix(gFunction(X.dot(theta)))
return h
def costFunction(theta, X, y):
m = y.size
h = hFunction(theta, X).T
j = (-1 / m) * (np.dot(np.log(h), y) + np.dot(np.log(1-h), (1-y)))
return j
def gradientDescent(theta, X, y):
theta = np.matrix(theta)
m = y.size
h = hFunction(theta, X)
gradient = (1 / m) * X.T.dot(h - y)
return gradient.flatten()
initial_theta = np.zeros(X.shape[1])
cost = costFunction(initial_theta, X, y)
grad = gradientDescent(initial_theta, X, y)
print('Cost: \n', cost)
print('Grad: \n', grad)
Cost:
[[ 0.69314718]]
Grad:
[[ -0.1 -12.00921659 -11.26284221]]
def optimizer(costFunction, theta, X, y, gradientDescent):
optimum = sc.optimize.minimize(costFunction, theta, args = (X, y),
method = None, jac = gradientDescent, options={'maxiter':400})
return optimum
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 0.693147
Iterations: 0
Function evaluations: 106
Gradient evaluations: 94
Out[46]:
fun: matrix([[ 0.69314718]])
hess_inv: array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
jac: matrix([[ -0.1 , -12.00921659, -11.26284221]])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 106
nit: 0
njev: 94
status: 2
success: False
x: array([ 0., 0., 0.])
this is the message that says success False
i have done everything right i don't know what's happen
It's hard to debug something like this when:
code is not reproducible because of external data
question does not even try to explain what is optimized here
There are some strange design-decisions:
use of np.matrix -> do use np.array!
don't call the jacobian gradientDescent
And then in regards to your observation:
Iterations: 0
Function evaluations: 106
Gradient evaluations: 94
zero iterations while doing so many function-evaluations is a very bad sign. Something is very broken. Probably line-search is going crazy above, but that's just a guess.
Now what's broken?:
your jacobian is definitely broken!
i did not check the math, but:
your jacobian-shape is dependent on the number of samples when number of variables is fixed -> no ! that does not make sense!
Steps to do:
run with jac=False
If working: your cost-fuc looks ok
If not working: your trouble probably (no proof) starts even there
repair the jacobian!
check the jacobian against check_grad
I wonder why you don't get any shape errors here. I do, when trying to mimic your input shapes and playing around with sample-size!

Keras custom RMSLE metric

How do I implement this metric in Keras? My code below gives the wrong result!
Note that I'm undoing a previous log(x + 1) transformation via exp(x) - 1, also negative predictions are clipped to 0:
def rmsle_cust(y_true, y_pred):
first_log = K.clip(K.exp(y_pred) - 1.0, 0, None)
second_log = K.clip(K.exp(y_true) - 1.0, 0, None)
return K.sqrt(K.mean(K.square(K.log(first_log + 1.) - K.log(second_log + 1.)), axis=-1)
For comparison, here's the standard numpy implementation:
def rmsle_cust_py(y, y_pred, **kwargs):
# undo 1 + log
y = np.exp(y) - 1
y_pred = np.exp(y_pred) - 1
y_pred[y_pred < 0] = 0.0
to_sum = [(math.log(y_pred[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i,pred in enumerate(y_pred)]
return (sum(to_sum) * (1.0/len(y))) ** 0.5
What I'm doing wrong? Thanks!
EDIT: Setting axis=0 seems to give a value very close to the correct one, but I'm not sure since all the code I've seem uses axis=-1.
I ran into the same problem and searched for it, here is what I found
https://www.kaggle.com/jpopham91/rmlse-vectorized
After modified a bit, this seems to work for me,rmsle_K method implemented with Keras and TensorFlow.
import numpy as np
import math
from keras import backend as K
import tensorflow as tf
def rmsle(y, y0):
assert len(y) == len(y0)
return np.sqrt(np.mean(np.power(np.log1p(y)-np.log1p(y0), 2)))
def rmsle_loop(y, y0):
assert len(y) == len(y0)
terms_to_sum = [(math.log(y0[i] + 1) - math.log(y[i] + 1)) ** 2.0 for i,pred in enumerate(y0)]
return (sum(terms_to_sum) * (1.0/len(y))) ** 0.5
def rmsle_K(y, y0):
return K.sqrt(K.mean(K.square(tf.log1p(y) - tf.log1p(y0))))
r = rmsle(y=[5, 20, 12], y0=[8, 16, 12])
r1 = rmsle_loop(y=[5, 20, 12], y0=[8, 16, 12])
r2 = rmsle_K(y=[5., 20., 12.], y0=[8., 16., 12.])
print(r)
print(r1)
sess = tf.Session()
print(sess.run(r2))
Result:
Using TensorFlow backend
0.263978210565
0.263978210565
0.263978
By the use of a list (to_sum) in the numpy implementation, I suspect your numpy array has shape (length,).
And on Keras, since you've got different results with axis=0 and axis=1, you probably got some shape like (length,1).
Also, when creating the to_sum list, you're using y[i] and y_pred[i], which means you're taking elements from the axis=0 in numpy implementation.
The numpy implementation also sums everything for calculating the mean in sum(to_sum). So, you really don't need to use any axis in the K.mean.
If you make sure your model's output shape is either (length,) or (length,1), you can use just K.mean(value) without passing the axis parameter.

nonlinear optimization with vectors, scalars and inequality constraints

I have set of equation in form: Y=aA+bB
where Y-is know vector of floats (only this one is known!); a, b are unkown scalar (float) and A, B are unknown vectors of floats. Each equation have it own Y, a, b, whereas all equation share the same unknow vectors A and B.
I have set of such equation so my problem is to minimize function:
(Y-aA-bB)+(Y'-a'A-b'B)+....
I have also many inequality constrains of type: Ai>Aj (Ai i-th element of vector A), Bi>= Bk, Bi>0, a>a', ...
Is there any software or library (ideally for python) which can handle this problem?
General remarks
This is a linear problem (at least in the linear least-squares sense, continue reading)!
It's also incompletely specified as it's not clear if there should be always a feasible solution in your case or if you want to minimize some given loss in general. Your text sounds like the latter, but in this case one has to chose the loss (which makes a difference in regards to possible algorithms). Let's take the euclidean-norm (probably the best pick here)!
Ignoring constraints for a moment, we can view this problem as basic least-squares solution to a linear matrix equation problem (euclidean-norm vs. squared euclidean-norm does not make a difference!).
min || b - Ax ||^2
Here:
M = number of Y's
N = size of Y
b = (Y0,
Y1,
...) -> shape: M*N (flattened: Y_x = (y_x_0, y_x_1).T)
A = ((a0, 0, 0, ..., b0, 0, 0, ...),
(0, a0, 0, ..., 0, b0, 0, ...),
(0, 0, a0, ..., 0, 0, b0, ...),
...
(a1, 0, 0, ..., b1, 0, 0, ...)) -> shape: (M*N, N*2)
x = (A0, A1, A2, ... B0, B1, B2, ...) -> shape: N*2 (one for A, one for B)
What you should do
If unconstrained:
Convert to standard-form and use numpy's lstsq
If constrained:
Either use customized optimization algorithms, or:
Linear-programming (if minimizing absolute-differences / l1-norm)
I'm too lazy to formulate it for scipy's linprog
Not that hard, but l1-norm is non-trivial using scipy's API
Much easier to formulate with cvxpy (obj=cvxpy.norm(X, 1))
Quadratic-programming / Second-order-cone-programming (if minimizing euclidean norm / l2-norm)
Again, too lazy to formuate it; no special solver available at scipy yet
Could be easily formulated with cvxpy (obj=cvxpy.norm(X, 2))
Emergency: use general-purpose constrained nonlinear-optimization algorithms like SLSQP -> see code
Some hacky code (not the best approach!)
This code:
Is just a demo!
Uses general nonlinear optimization algorithms from scipy
Therefore:
easier to formulate
Less fast & robust than LP, QP, SOCP
But will achieve approximately the same result as convergence on convex optimization problems is guaranteed
Uses automatic-differentiation whenever needed
(author too lazy to add gradients)
this can really hurt if performance is important
Is really ugly in terms of np.repeat vs. broadcasting!
Code:
import numpy as np
from scipy.optimize import minimize
np.random.seed(1)
""" Fake-problem (usually the job of the question-author!) """
def get_partial(N=10):
Y = np.random.uniform(size=N)
a, b = np.random.uniform(size=2)
return Y, a, b
""" Optimization """
def optimize(list_partials, N, M):
""" General approach:
This is a linear system of equations (with constraints)
Basic (unconstrained) form: min || b - Ax ||^2
"""
Y_all = np.vstack(map(lambda x: x[0], list_partials)).ravel() # flat 1d
a_all = np.hstack(map(lambda x: np.repeat(x[1], N), list_partials)) # repeat to be of same shape
b_all = np.hstack(map(lambda x: np.repeat(x[2], N), list_partials)) # """
def func(x):
A = x[:N]
B = x[N:]
return np.linalg.norm(Y_all - a_all * np.repeat(A, M) - b_all * np.repeat(B, M))
""" Example constraints: A >= B element-wise """
cons = ({'type': 'ineq',
'fun' : lambda x: x[:N] - x[N:]})
res = minimize(func, np.zeros(N*2), constraints=cons, method='SLSQP', options={'disp': True})
print(res)
print(Y_all - a_all * np.repeat(res.x[:N], M) - b_all * np.repeat(res.x[N:], M))
""" Test """
M = 4
N = 3
list_partials = [get_partial(N) for i in range(M)]
optimize(list_partials, N, M)
Output:
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.9019356096498999
Iterations: 12
Function evaluations: 96
Gradient evaluations: 12
fun: 0.9019356096498999
jac: array([ 1.03786588e-04, 4.84041870e-04, 2.08129734e-01,
1.57609582e-04, 2.87599862e-04, -2.07959406e-01])
message: 'Optimization terminated successfully.'
nfev: 96
nit: 12
njev: 12
status: 0
success: True
x: array([ 1.82177105, 0.62803449, 0.63815278, -1.16960281, 0.03147683,
0.63815278])
[ 3.78873785e-02 3.41189867e-01 -3.79020251e-01 -2.79338679e-04
-7.98836875e-02 7.94168282e-02 -1.33155595e-01 1.32869391e-01
-3.73398306e-01 4.54460178e-01 2.01297470e-01 3.42682496e-01]
I did not check the result! If there is an error it's an implementation-error, not a conceptional one (my opinion)!
I agree with sascha that this is a linear problem. As I do not like constrains very much, I prefer, actually, to make it a non-linear without constrains. I do so by setting the vector A=(a1**2, a1**2+a2**2, a1**2+a2**2+a3**2, ...) like this it is ensured that it is all positive and A_i > A_j for i>j. That makes errors a bit problematic, as you now have to consider error propagation to get A1, A2, etc. including correlation, but I will have an important point on that at the end. The "simple" solution would look as follows:
import numpy as np
from scipy.optimize import leastsq
from random import random
np.set_printoptions(linewidth=190)
def generate_random_vector(n, sortIt=True):
out=np.fromiter( (random() for x in range(n) ),np.float)
if sortIt:
out.sort()
return out
def residuals(parameters,dataVec,dataLength,vecDims):
aParams=parameters[:dataLength]
bParams=parameters[dataLength:2*dataLength]
AParams=parameters[-2*vecDims:-vecDims]
BParams=parameters[-vecDims:]
YList=dataVec
AVec=[a**2 for a in AParams]##assures A_i > 0
BVec=[b**2 for b in BParams]
AAVec=np.cumsum(AVec)##assures A_i>A_j for i>j
BBVec=np.cumsum(BVec)
dist=[ np.array(Y)-a*np.array(AAVec)-b*np.array(BBVec) for Y,a,b in zip(YList,aParams,bParams) ]
dist=np.ravel(dist)
return dist
if __name__=="__main__":
aList=generate_random_vector(20, sortIt=False)
bList=generate_random_vector(20, sortIt=False)
AVec=generate_random_vector(5)
BVec=generate_random_vector(5)
YList=[a*AVec+b*BVec for a,b in zip(aList,bList)]
aGuess=20*[.2]
bGuess=20*[.3]
AGuess=5*[.4]
BGuess=5*[.5]
bestFitValues, covMX, infoDict, messages ,ier = leastsq(residuals, aGuess+bGuess+AGuess+BGuess ,args=(YList,20,5) ,full_output=True)
print "a"
print aList
besta = bestFitValues[:20]
print besta
print "b"
print bList
bestb = bestFitValues[20:40]
print bestb
print "A"
print AVec
bestA = bestFitValues[-2*5:-5]
realBestA = np.cumsum([x**2 for x in bestA])
print realBestA
print "B"
print BVec
bestB = bestFitValues[-5:]
realBestB = np.cumsum([x**2 for x in bestB])
print realBestB
print covMX
The problem on errors and correlation is that the solution to the problem is not unique. If Y = a A + b B is a solution and we, e.g., rotate such that A = c E + s F and B = -s E + c F then also Y = (ac-bs) E + (as+bc) F =e E + f F is a solution. The parameter space is, hence, completely flat at "the solution" resulting in huge errors and apocalyptic correlations.

defining a fuction to be used in solving equation from a data file

I am completely new to python and in fact any fundamental programming language, I use Mathematica for my all my symbolic and numeric calculations. I am learning to work with python and finding it really awesome! Here is a problem I am trying to solve but stuck without a clue!
I have a data file for example
0. 1.
0.01 0.9998000066665778
0.02 0.9992001066609779
... ..
Which just the {t, Cos[2t]}.
I want to define a function out of this data and use it in solving an equation in python. My Mathematica intuition tells me that I should define the function like:
iFunc[x_] = Interpolation[iData, x]
and rest of the job is easy. for instance
NDSolve[{y''[x] + iFunc[x] y[x] == 0, y[0] == 1, y[1] == 0}, y, {x, 0, 1}]
Solves the equation easily. (I have not tried with more complicated cases though).
Now how to do the job in python and also accuracy is an important issue for me. So, now I would like to ask two questions.
1. Is this the most accurate method in Mathematica?
2. And what is the equivalent of more accurate way to do the problem in python?
Here is my humble attempt to solve the problem (with a lot of input from StackOverflow) where the definition with cos(2t) works:
from scipy.integrate import odeint
import numpy as np
import matplotlib.pyplot as plt
from math import cos
from scipy import interpolate
data = np.genfromtxt('cos2t.dat')
T = data[:,0] #first column
phi = data[:,1] #second column
f = interpolate.interp1d(T, phi)
tmin = 0.0# There should be a better way to define from the data
dt = 0.01
tmax = 2*np.pi
t = np.arange(tmin, tmax, dt)
phinew = f(t) # use interpolation function returned by `interp1d`
"""
def fun(z, t):
x, y = z
return np.array([y, -(cos(2*t))*x ])
"""
def fun(z, t):
x, y = z
return np.array([y, -(phinew(t))*x ])
sol1 = odeint(fun, [1, 0], t)[..., 0]
# for checking the plots
plt.plot(t, sol1, label='sol')
plt.show()
*When I run the code with interpolated function from cos(2t) data, is not working...the error message tell
Traceback (most recent call last): File "testde.py", line 30,
in <module> sol1 = odeint(fun, [1, 0], t)[..., 0]
File "/home/archimedes/anaconda3/lib/python3.6/site-packages/scip‌​y/integrate/odepack.‌​py",
line 215, in odeint ixpr, mxstep, mxhnil, mxordn, mxords)
File "testde.py",
line 28, in fun return np.array([y, -(phinew(t))*x ])
TypeError: 'numpy.ndarray' object is not callable.
I really can't decipher them. Please help...
In Mathematica, the usual way is simply
iFunc = Interpolation[iData]
Interpolation[iData] already returns a function.
To sub-question 2
With
t = np.arange(tmin, tmax, dt)
phinew = f(t) # use interpolation function returned by `interp1d`
equivalent to
phinew = np.array([ f(s) for s in t])
you construct phinew not as callable function but as array of values, closing the circle array to interpolation function to array. Use f which is a scalar function directly in the derivatives function,
def fun(z, t):
x, y = z
return np.array([y, -f(t)*x ])

Categories

Resources