I'm using SciPy's fmin to search for the parameters that maximize the log-likelihood of a distribution's fit to some data, like so:
j = fmin(lambda p:-sum(log(likelihood_calculator(data, p))), array([1.5]), full_output=True)
(likelihood_calculator takes data and a parameter and spits out an array of likelihood values for each data point.)
If we start that search with a parameter that yields a likelihood of 0, the loglikelihood is -inf, so the -sum is inf. fmin should run away from the initial parameter, but instead it sticks on that value for the maximum number of calls, then returns it:
In [268]: print j
(array([ 1.5]), inf, 67, 200, 1)
I thought this was perhaps a problem with fmin's handling of infs, but if we remove the likelihood calculator and just hand a 0 directly, we get better behavior:
In [269]: i = fmin(lambda p: -sum(log(p)), array([0]), full_output=1)
Warning: Maximum number of function evaluations has been exceeded.
In [270]: i
Out[270]: (array([ 3.16912650e+26]), -61.020668415892501, 100, 200, 1)
This same correct behavior happens if we use an array of zeros, if those zeros are floats, or if we use fmin_bfgs. The same incorrect behavior with the function call continues if we use fmin_bfgs, but fmin works correctly if we start with a parameter that doesn't yield a zero likelihood (and thus no infs).
Thoughts? Thanks!
Update:
If there's a broad area of parameters that result in zeros, we can push the parameter value up to the edge. If the parameter is near enough the edge, fmin will get out of zeroland and start searching.
E.g., if the objective is inf for all p < 1, then starting at p = 0.99 fmin will work, but not at p = 0.95.
Maybe your update answers the question. fmin uses the Nelder-Mead downhill simplex algorithm, so it evaluates the objective at points in a neighborhood of the initial guess to decide which way to move. If you are deep enough inside a parameter region where the function always returns inf, the algorithm cannot see which direction to go.
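One common workaround (a rough sketch, assuming likelihood_calculator and data are defined as in the question) is to return a large finite penalty instead of inf whenever any likelihood is zero, so the simplex always sees finite values it can compare:

import numpy as np
from scipy.optimize import fmin

def neg_log_likelihood(p):
    L = likelihood_calculator(data, p)   # assumed to exist as in the question
    bad = L <= 0
    if np.any(bad):
        # finite penalty that shrinks as fewer data points have zero likelihood,
        # giving the simplex a direction to move in
        return 1e10 * np.count_nonzero(bad)
    return -np.sum(np.log(L))

j = fmin(neg_log_likelihood, np.array([1.5]), full_output=True)

Once the search leaves the zero-likelihood region, the penalty branch is never hit again and the ordinary negative log-likelihood takes over.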
Related
I have this set of experimental data:
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([0, 2, 5, 10, 15, 30, 60, 120])
y_data = np.array([1.00, 0.71, 0.41, 0.31, 0.29, 0.36, 0.26, 0.35])
t = np.linspace(min(x_data), max(x_data), 151)
[scatter plot of the data]
I want to fit them with a curve that follows an exponential behaviour for t < t_lim and a linear behaviour for t > t_lim, where t_lim is a value that I can set as I want. I want to use curve_fit to find the best fit meeting these two conditions:
The end point of the first (exponential) part must be the starting point of the second (linear) part: in other words, I don't want a jump discontinuity in the middle.
The second (linear) part should be descending.
I tried to solve it in this way:
t_lim = 15

def y(t, k, m, q):
    return np.concatenate((np.exp(-k*t)[t < t_lim], (m*t + q)[t >= t_lim]))

popt, pcov = curve_fit(y, x_data, y_data, p0=[0.5, -0.005, 0.005])
k_opt, m_opt, q_opt = popt
y_model = y(t, k_opt, m_opt, q_opt)
I obtain this kind of curve:
[plot of the fitted piecewise curve]
I don't know how to tell Python to find the best values of m, k, q that meet the two conditions (no jump discontinuity, and m < 0).
Instead of trying to add these conditions as explicit constraints, I'd go about modifying the form of y so that these conditions are always satisfied.
For example, try replacing m with -m**2. That way, the coefficient in the linear part will always be negative.
For the continuity condition, how about this: for an exponential with a given decay factor and a linear curve with a given slope that are supposed to meet at a given t_lim, there is exactly one value of q that satisfies the condition. You can compute that value explicitly and just plug it in.
Basically, q won't be a fit parameter anymore; instead, inside of y, you'd compute the correct q value based on k, m, t_lim.
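A minimal sketch of that reparameterization, reusing x_data, y_data, t and t_lim from the question (the parameter name m_raw is my own):

import numpy as np
from scipy.optimize import curve_fit

def y(t, k, m_raw):
    t = np.asarray(t, dtype=float)
    m = -m_raw**2                        # slope of the linear part is forced to be <= 0
    q = np.exp(-k * t_lim) - m * t_lim   # q chosen so both pieces meet exactly at t_lim
    return np.where(t < t_lim, np.exp(-k * t), m * t + q)

popt, pcov = curve_fit(y, x_data, y_data, p0=[0.5, 0.1])
k_opt, m_raw_opt = popt
y_model = y(t, k_opt, m_raw_opt)

Note that np.where (instead of np.concatenate) also lets you evaluate the fitted model on the dense t grid, not just on x_data.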
This post is not a direct answer to the question. This is a preliminary study.
First: fitting a simple exponential function plus a constant (without a decreasing or increasing linear part):
The result is not bad considering the wide scatter on the right part.
Second: fitting an exponential function plus a linear function (without taking into account the expected decrease on the right).
The slope of the linear part is very small: 0.000361.
But the slope is positive, which is not what is wanted.
Since the scatter is very large, one suspects that the slope of the linear part might be governed mainly by the scatter. To check this hypothesis, the same fit is repeated without one point. Taking only the first seven points (that is, dropping the eighth point), the result is:
Now the slope is negative, as wanted. But this result is not trustworthy.
Of course, if some technical reason implies that the slope is necessarily negative, one could use a piecewise function made of an exponential part and a linear part. But what is the credibility of such a model?
This doesn't answer the question directly. Nevertheless, I hope this inspection is of interest.
For information:
The usual nonlinear regression methods are often non-convergent when the scatter is large, because of the difficulty of setting initial parameter values sufficiently close to the unknown correct values. To avoid this difficulty, the above fittings were made with an unusual method that doesn't require guessed initial values. For the principle, refer to: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
In the referenced document, the case of a combined exponential and linear function isn't fully treated. To fill this gap, the method is shown below with the numerical calculation (MathsCAD).
If more accuracy is needed, use nonlinear regression software with the values of p, a, b, c found above as initial values to start the iterative calculation.
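With SciPy, that refinement step could look roughly like this (the model form y = a + b*t + c*exp(-p*t) matches the fits above; the numbers in p0 are placeholders standing in for the p, a, b, c from the integral method, not the actual values):

import numpy as np
from scipy.optimize import curve_fit

def model(t, p, a, b, c):
    return a + b * t + c * np.exp(-p * t)

# replace p0 with the p, a, b, c values produced by the integral-equation method
p0 = [0.3, 0.3, 0.001, 0.7]
popt, pcov = curve_fit(model, x_data, y_data, p0=p0)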
I am trying to apply a simple optimization using gradient descent. In particular, I want to calculate the vector of parameters (theta) that minimizes the cost function (mean squared error).
The gradient descent function looks like this:
import numpy as np

eta = 0.1  # learning rate
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)  # random initialization

for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)  # partial derivatives of the MSE cost function
    theta = theta - eta * gradients
Where X_b and y are respectively the input matrix and the target vector.
Now, if I take a look at my final theta, it is always equal to [[nan], [nan]], while it should be equal to [[85.4575313], [0.11802224]] (obtained by using both np.linalg and ScikitLearn LinearRegression).
In order to get a numeric result, I have to reduce the learning rate to 0.00001 and the number of iterations to 500. By applying these changes, the results are far away from the real theta.
My data, both X_b and y, are scaled using a StandardScaler.
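The scaling and the construction of X_b are not shown in the question; a typical version might look like the sketch below (X_raw and y_raw are hypothetical names for the unscaled data, and this is a guess at the missing code, not the asker's actual code):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X_raw)
y = StandardScaler().fit_transform(y_raw.reshape(-1, 1))
X_b = np.c_[np.ones((len(X_scaled), 1)), X_scaled]   # prepend a bias column for theta_0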
If I try to print out theta at each iteration, I get the following (these are only a few of the results):
...
[[2.09755838e+297]
[7.26731496e+299]]
[[-3.54990719e+300]
[-1.22992017e+303]]
[[6.00786188e+303]
[ inf]]
[[-inf]
[ nan]]
...
How can I solve this problem? Is it because of the domain of the function?
Thanks
I've found an error in the code. For the benefit of all readers: the error was in the feature-scaling part, which isn't shown in the code above.
The initial theta (randomly assigned) had a completely different scale compared to the dataset, and this made it impossible to find valid parameters for the regression.
So, by using correctly scaled inputs and targets, the function does its job and converges to the values that I know are correct, as reported in my question.
As Kuedsha suggested, I tried applying a learning schedule to reduce the learning rate at each iteration, even though it is not necessary in this specific case. It works, but of course it takes more iterations to converge. I think this could potentially be a useful thing to do in a stochastic gradient descent algorithm.
Thanks for your support
In my personal experience, this is probably due to the learning rate you are using. If your result goes to infinity, this might be because the learning rate is too large. Also, be sure to decrease the learning rate (eta in your code) at each iteration, as this will help your solution converge. I am not sure what the optimal way to do it for your particular problem would be, but you could try something like:
eta=initial_eta/(iteration+1)
or
eta=initial_eta/sqrt(iteration+1)
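For example, plugged into the loop from the question (X_b and y as defined there), the first schedule would look like this:

import numpy as np

initial_eta = 0.1
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)

for iteration in range(n_iterations):
    eta = initial_eta / (iteration + 1)                # decaying learning rate
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients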
Edit: in fact, as you can see in your results, the parameter values alternate between negative and positive from one iteration to the next, always increasing in modulus.
I think this is because, when you calculate the gradient in the first iteration, eta*gradients is so large that the update overshoots to a value of the opposite sign with a larger modulus. In the next iteration the gradient is even larger, so eta*gradients is larger as well, giving a value of the opposite sign that is larger still in modulus. This continues until you get infinity.
This is the reason why you normally have to be careful when tuning the learning rate, and decrease it over the iterations.
I wanted to solve for the point where a function is close to 0.
I tried using the newton function in the SciPy package, but the tolerance seems to apply to the input rather than to the function of the input:
from scipy.optimize import newton
fn = lambda x: x*x-60
res = newton(fn, 0, tol=0.1, maxiter=10000)
print(res)
print(fn(res))
res is close to 0, and fn(res) is about -60.
It looks like newton() stopped because it found two x values which are bounding the solution and within the tolerance.
Is that correct that the tolerance is on x and not fn(x)?
That seems very counterintuitive to me.
SciPy reference on scipy.optimize.newton: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.newton.html
Yes, the tolerance refers to the function argument x of the function f(x).
See here: zeros.py
The code checks if the step from p0 to p is small enough (they call the function argument p and not x in their code), and if yes, the algorithm stops.
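A simplified sketch of that stopping rule (illustration only, not scipy's actual code) makes the point: the loop exits when successive x iterates are within tol of each other, regardless of how far f(x) still is from zero.

def secant_sketch(f, p0, p1, tol=0.1, maxiter=10000):
    for _ in range(maxiter):
        denom = f(p1) - f(p0)
        if denom == 0:
            return p1
        p = p1 - f(p1) * (p1 - p0) / denom   # secant step
        if abs(p - p1) < tol:                # tolerance on the argument, not on f(p)
            return p
        p0, p1 = p1, p
    return p1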
You're in fact using the secant method and not Newton's method because you did not provide the derivative of your function.
Use newton(func=lambda x: x*x-60, fprime=lambda x:2*x, x0=0, tol=0.1, maxiter=10000) instead.
When you run the above code you get an error. That is because you're starting from a local minimum of your function where the tangent line is parallel to the x-axis.
In each iteration Newton's method jumps to the position where the tangent line and the x-axis intersect.
This is of course not possible if x-axis and tangent line are parallel. Try starting e.g. from x0=0.1 to get the correct output.
The plot below shows your quadratic function and the tangent lines at x=0 and x=2. It is easy to see that starting from x0=2 works, because the x-axis and the tangent will intersect at some point.
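Putting the two fixes together (provide the derivative and start away from the flat tangent), something like this should return a root near sqrt(60) ≈ 7.75:

from scipy.optimize import newton

fn = lambda x: x * x - 60
res = newton(func=fn, fprime=lambda x: 2 * x, x0=0.1, tol=0.1, maxiter=10000)
print(res, fn(res))   # res ≈ 7.75, fn(res) close to 0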
Background: A ship is berthed to a jetty using 24 mooring lines and 4 fenders. These mooring lines need to be pre-tensioned to a design value by experienced engineers. Pre-tensioning is done by setting the appropriate length of each mooring line. A static simulation is done to obtain the tension on the lines and the compression on the fenders. This is an iterative process, as a small change in mooring line length may cause a significant variation in the tension.
Problem Description:
An objective function is set up to take the mooring line lengths as an input array and return the sum of absolute differences between the target and achieved pre-tension values.
Now, I am using the scipy.optimize.minimize function with the following options:
target_wire_lenghts = {'Line1': (48.0, 49.0),'Line2': (48.0, 49.0),'Line3': (45.0,46.0),
'Line4': (10.0,11.0),'Line5': (8.0,9.0),'Line6': (7.0,8.0),
'Line7': (46.0,47.0),'Line8': (48.0,49.0),'Line9': (50.0,51.0),
'Line10': (33.0,34.0),'Line11': (31.0,32.0),'Line12': (29.0,30.0),
'Line13': (32.0,33.0),'Line14': (34.0,35.0),'Line15': (36.0,37.0),
'Line16': (48.0,49.0),'Line17': (46.0,47.0),'Line18': (45.0,46.0),
'Line19': (8.0,9.0), 'Line20': (8.0,9.0), 'Line21': (9.0,10.0),
'Line22': (44.0,45.0),'Line23': (45.0,46.0), 'Line24': (46.0,48.0)}
# Bounds
bounds = list(target_wire_lenghts.values())
# Initial guess
x0 = [np.mean([min, max], axis=0) for min,max in bounds]
# Options
options = {'ftol': 0.1,
           'xtol': 0.1,
           'gtol': 0.1,
           'maxiter': 100,
           'accuracy': 0.1}
result = minimize(objfn, x0, method = 'TNC', bounds = bounds, options = options)
print(result)
However, the optimizer is not varying the input array. The results are the same as the initial input array x0 (see the length column below). I tried playing around with the optional tolerance parameters of the 'TNC' solver, but did not see any improvement. Also, notice that even though I have set maxiter = 100, the iteration count went to 130.
Please suggest what mistake I am making when calling the minimize function.
EDIT: I figured out that the optimization was running, but it was changing the variables by only 0.000001 at a time. When the option eps (the step size used for the numerical approximation of the Jacobian) was set to 0.01, the optimization appeared to work. Unfortunately, it still was not able to reach a reasonable solution. I tried an unbounded optimization with an initial guess x0 very close to the answer (which I found by manually altering each variable), and then the optimizer was able to give a better solution than my manual one.
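For reference, the eps change described in the edit looks like this (objfn, bounds and x0 as defined above):

from scipy.optimize import minimize

result = minimize(objfn, x0, method='TNC', bounds=bounds,
                  options={'eps': 0.01, 'ftol': 0.1, 'xtol': 0.1,
                           'gtol': 0.1, 'maxiter': 100, 'accuracy': 0.1})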
So the question now is: how can a 24-variable optimization be done quickly with a bad initial guess? Could multi-objective optimization be the answer, with reaching each line's pre-tension as a separate objective?
I'm using scipy.optimize.curve_fit to fit a sigmoidal curve to data. I need to constrain one of the parameters to lie in either [-3, 0.5] or [0.5, 3.0].
I tried fitting the curve without bounds first; then, if the parameter comes out below zero, I fit once more with bounds [-3, 0.5], and otherwise with [0.5, 3.0].
Is it possible to give curve_fit bounds made of two intervals?
No, least_squares (hence curve_fit) only supports box constraints.
There is a crude way to do this, and that is to have your function return very large values if the parameter is outside the multiple bounds. For example:
def sigmoid_func(x, parameters):
    if parameter outside multiple bounds:
        return 1.0E10 * len(x)  # very large number
    else:
        return sigmoid value
This has the effect of yielding very large errors if the parameter is outside your multiple bounds. If you have a single bound range of [lower, upper], you should not use this method, since recent versions of scipy already support that more common, single-range type of problem.
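A fleshed-out version of that sketch, using the two intervals from the question (the sigmoid form and the parameter names a, b, k are my assumptions, and x_vals/y_vals stand in for your data):

import numpy as np
from scipy.optimize import curve_fit

# the two allowed intervals for the constrained parameter k
INTERVALS = [(-3.0, 0.5), (0.5, 3.0)]

def sigmoid_func(x, a, b, k):
    if not any(lo <= k <= hi for lo, hi in INTERVALS):
        return np.full_like(x, 1.0e10, dtype=float)   # huge residuals push the fit away
    return a / (1.0 + np.exp(-k * (x - b)))

# popt, pcov = curve_fit(sigmoid_func, x_vals, y_vals, p0=[1.0, 0.0, 1.0])

Keep in mind that the hard jump in the objective can confuse the gradient-based solver, so results that land near an interval edge should be checked carefully.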