Background: A ship is berthed to a jetty using 24 mooring lines and 4 fenders. These mooring lines need to be pre-tensioned to a design value by experienced engineers. Pre-tensioning is done by setting the appropriate length of each mooring line. A static simulation is run to obtain the tension in the lines and the compression in the fenders. This is an iterative process, as a small change in mooring line length may cause a significant variation in the tension.
Problem Description:
An objective function is set up that takes the mooring line lengths as an input array and returns the sum of absolute differences between the target and achieved pre-tension values.
Now, I am calling scipy.optimize.minimize with the following options:
import numpy as np
from scipy.optimize import minimize

target_wire_lenghts = {'Line1': (48.0, 49.0),'Line2': (48.0, 49.0),'Line3': (45.0,46.0),
'Line4': (10.0,11.0),'Line5': (8.0,9.0),'Line6': (7.0,8.0),
'Line7': (46.0,47.0),'Line8': (48.0,49.0),'Line9': (50.0,51.0),
'Line10': (33.0,34.0),'Line11': (31.0,32.0),'Line12': (29.0,30.0),
'Line13': (32.0,33.0),'Line14': (34.0,35.0),'Line15': (36.0,37.0),
'Line16': (48.0,49.0),'Line17': (46.0,47.0),'Line18': (45.0,46.0),
'Line19': (8.0,9.0), 'Line20': (8.0,9.0), 'Line21': (9.0,10.0),
'Line22': (44.0,45.0),'Line23': (45.0,46.0), 'Line24': (46.0,48.0)}
# Bounds
bounds = list(target_wire_lenghts.values())
# Initial guess
x0 = [np.mean([lo, hi], axis=0) for lo, hi in bounds]  # midpoint of each bound
# Options
options = {'ftol' : 0.1,
'xtol' : 0.1,
'gtol' : 0.1,
'maxiter' : 100,
'accuracy' : 0.1}
result = minimize(objfn, x0, method = 'TNC', bounds = bounds, options = options)
print(result)
However, the optimizer is not varying the input array: the results are the same as the initial guess x0 (see the length column below). I tried playing around with the optional tolerance parameters of the 'TNC' solver, but do not see any improvement. Also, notice that even though I set maxiter = 100, the iteration count went to 130.
Please suggest what mistake I am making when calling the minimize function.
EDIT: I figured out that the optimization was running, but it was changing the variables by only 0.000001 at a time. When the option eps (the step size used for the numerical approximation of the Jacobian) is set to 0.01, the optimization appears to work. Unfortunately, it still was not able to reach a reasonable solution. I tried an unbounded optimization with an initial guess x0 very close to the answer (which I found by manually altering each variable), and then the optimizer was able to give a better solution than my manual one.
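For reference, a minimal sketch of the same call with eps set explicitly (same objfn, bounds, and x0 as above; the other options are unchanged):
# Same setup as above, but with eps (the finite-difference step) set explicitly
options = {'ftol': 0.1,
           'xtol': 0.1,
           'gtol': 0.1,
           'maxiter': 100,
           'accuracy': 0.1,
           'eps': 0.01}   # larger step for the numerical Jacobian approximation
result = minimize(objfn, x0, method='TNC', bounds=bounds, options=options)
print(result)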
So the question now is: how do I perform a 24-variable optimization quickly with a bad initial guess? Could multi-objective optimization be the answer, where reaching each line's pre-tension is a separate objective?
Related
I have this set of experimental data:
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([0, 2, 5, 10, 15, 30, 60, 120])
y_data = np.array([1.00, 0.71, 0.41, 0.31, 0.29, 0.36, 0.26, 0.35])
t = np.linspace(min(x_data), max(x_data), 151)
(scatter plot of x_data vs y_data)
I want to fit them with a curve that follows an exponential behaviour for t < t_lim and a linear behaviour for t > t_lim, where t_lim is a value I can set as I want. I want to use curve_fit to find the best fit, subject to these two conditions:
The end point of the first behaviour (exponential) must be the starting point of the second behaviour (linear): in other words, I don't want a jump discontinuity in the middle.
I would like the second behaviour (linear) to be descending.
I solved it in this way:
t_lim = 15

def y(t, k, m, q):
    return np.concatenate((np.exp(-k*t)[t < t_lim], (m*t + q)[t >= t_lim]))

popt, pcov = curve_fit(y, x_data, y_data, p0=[0.5, -0.005, 0.005])
k_opt, m_opt, q_opt = popt
y_model = y(t, k_opt, m_opt, q_opt)
I obtain this kind of curve:
(plot of the fitted curve over the data)
I don't know how to tell Python to find the best values of m, k, and q that meet the two conditions (no jump discontinuity, and m < 0).
Instead of trying to add these conditions as explicit constraints, I'd go about modifying the form of y so that these conditions are always satisfied.
For example, try replacing m with -m**2. That way, the coefficient in the linear part will always be negative.
For the continuity condition, how about this: for an exponential with a given decay factor and a linear curve with a given slope that are supposed to meet at a given t_lim, there is exactly one value of q that satisfies that condition. You can compute that value explicitly and just plug it in.
Basically, q won't be a fit parameter anymore; instead, inside y, you'd compute the correct q value from k, m, and t_lim.
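A minimal sketch of what I mean, assuming the same x_data, y_data, and t_lim as in the question (the p0 values here are illustrative):
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([0, 2, 5, 10, 15, 30, 60, 120])
y_data = np.array([1.00, 0.71, 0.41, 0.31, 0.29, 0.36, 0.26, 0.35])
t_lim = 15

def y(t, k, m):
    slope = -m**2                              # always negative
    q = np.exp(-k * t_lim) - slope * t_lim     # forces continuity at t_lim
    return np.where(t < t_lim, np.exp(-k * t), slope * t + q)

popt, pcov = curve_fit(y, x_data, y_data, p0=[0.1, 0.05])
k_opt, m_opt = popt
With this form, curve_fit only searches over k and m; every candidate curve is continuous at t_lim and has a descending linear part by construction.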
This post is not a direct answer to the question. This is a preliminary study.
First: fitting to a simple exponential function plus a constant (without a decreasing or increasing linear part):
The result is not bad, considering the wide scatter on the right part.
Second: fitting to an exponential function plus a linear function (without taking into account the expected decrease on the right).
The slope of the linear part is very low: 0.000361.
But the slope is positive, which is not what is wanted.
Since the scatter is very large, one suspects that the slope of the linear function might be governed mainly by the scatter. To check this hypothesis, the same fit was carried out without one point. Taking only the first seven points (that is, leaving out the eighth point), the result is:
Now the slope is negative, as wanted. But this result is not trustworthy.
Of course, if some technical reason implies that the slope is necessarily negative, one could use a piecewise function made of an exponential and a linear function. But what is the credibility of such a model?
This doesn't answer the question. Nevertheless, I hope that this inspection will be of interest.
For information:
The usual nonlinear regression methods are often non-convergent in the case of large scatter, due to the difficulty of setting initial values of the parameters sufficiently close to the unknown correct values. To avoid this difficulty, the above fits were made with an unusual method that doesn't require "guessed" initial values. For the principle, refer to: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
In the referenced document, the case of an exponential plus a linear function isn't fully treated. To overcome this deficiency, the method is shown below with the numerical calculation (Mathcad).
If more accuracy is needed, use nonlinear regression software with the values of p, a, b, c found above as initial values to start the iterative calculation.
I'm using the scipy.optimize "minimize" function to solve a nonlinear optimization problem.
optimizationOutput = minimize(objective, x0, args=(n), method='SLSQP', bounds = bnds, constraints = cons, tol = 1, options={'disp': True ,'eps' : 1e5, "maxiter": 2000})
My input (x0) is an array containing 20 floats.
The first 19 inputs are initially estimated to be 10000, bounded by (0, None).
The last input is initially estimated to be 65, bounded by (60,70).
My problem has to do with the last input. The optimization algorithm seems to exclusively choose its upper and lower bounds, occasionally choosing values like 60.00001460018602. I know that this has to do with the 'eps' parameter of minimize(), but I am wondering if there is a way to set a separate step size for this last input, since the first 19 inputs rely on the larger step size in order to find a global solution in a timely manner.
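For context, a rough sketch of the setup described above; the objective, n, and cons here are hypothetical stand-ins, not the real ones:
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-ins for the real objective, n, and cons
def objective(x, n):
    return np.sum((x[:19] - n) ** 2) + (x[19] - 65.0) ** 2

n = 5000.0
cons = ()

x0 = np.array([10000.0] * 19 + [65.0])   # 19 inputs at 10000, the last one at 65
bnds = [(0, None)] * 19 + [(60, 70)]     # bounds as described above

out = minimize(objective, x0, args=(n,), method='SLSQP', bounds=bnds,
               constraints=cons, tol=1,
               options={'disp': True, 'eps': 1e5, 'maxiter': 2000})
Note that the single eps applies to every variable, which is why the step that suits the first 19 inputs is far too large for the last one.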
I am trying to apply a simple optimization using gradient descent. In particular, I want to calculate the vector of parameters (theta) that minimizes the cost function (mean squared error).
The gradient descent function looks like this:
import numpy as np

eta = 0.1  # learning rate
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)  # random initialization

for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)  # partial derivatives of the MSE cost function
    theta = theta - eta * gradients
Where X_b and y are respectively the input matrix and the target vector.
Now, if I take a look at my final theta, it is always equal to [[nan], [nan]], while it should be equal to [[85.4575313], [0.11802224]] (obtained using both np.linalg and scikit-learn's LinearRegression).
In order to get a numeric result, I have to reduce the learning rate to 0.00001 and the number of iterations to 500. Applying these changes, the results are far from the real theta.
My data, both X_b and y, are scaled using a StandardScaler.
If I try to print out theta at each iteration, I get the following (these are only few results):
...
[[2.09755838e+297]
[7.26731496e+299]]
[[-3.54990719e+300]
[-1.22992017e+303]]
[[6.00786188e+303]
[ inf]]
[[-inf]
[ nan]]
...
How can I solve this problem? Is it due to the domain of the function?
Thanks
I've found an error in the code. For the benefit of all readers: the error was generated by the feature-scaling part, which isn't reported in the code above.
The initial theta (randomly assigned) had a completely different scale compared to the dataset, and this made it impossible to find valid parameters for the regression.
So, using correctly scaled inputs and targets, the function does its job and converges to the values I know are correct, as reported in my question.
As Kuedsha suggested, I tried applying a learning schedule to reduce the learning rate at each iteration, even though it is not necessary in this specific case. It works, but of course it takes more iterations to converge. I think this could potentially be useful in a stochastic gradient descent algorithm.
Thanks for your support
In my personal experience, this is probably due to the learning rate you are using. If your result diverges to infinity, it might be because the learning rate is too large. Also, be sure to decrease the learning rate (eta in your code) at each iteration, as this will help your solution converge. I am not sure what the optimal way to do it would be for your particular problem, but you could try something like:
eta=initial_eta/(iteration+1)
or
eta=initial_eta/sqrt(iteration+1)
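For illustration, a minimal sketch of this decaying learning rate applied to a loop like the one in the question; the synthetic X_b and y here are stand-ins for your real (scaled) data:
import numpy as np

# Synthetic stand-ins for the question's X_b and y
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), X]          # add a bias column

initial_eta = 0.1
n_iterations = 1000
theta = rng.standard_normal((2, 1))      # random initialization

for iteration in range(n_iterations):
    eta = initial_eta / np.sqrt(iteration + 1)   # decaying learning rate
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients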
Edit: in fact, as you can see in your results, the value of your parameter flips between negative and positive at each iteration, always increasing in modulus.
I think this is because, in the first iteration, eta*gradient is so large that theta jumps to a negative value that is larger in modulus. Then, in the second iteration, the gradient is even greater, and eta*gradient is therefore also greater, which gives you a positive number that is again larger in modulus. This continues until you get infinity.
This is why you normally have to be careful when tuning the learning rate, and why you should decrease it over the iterations.
I'm trying to compute the maximum likelihood estimate (MLE) for the following probability density function (PDF) (as coded below, f(d; a) = (d / a**2) * exp(-d / a), divided by a scale factor).
I'm computing it by minimising the objective function (the negative log-likelihood), without relying on any predefined log-likelihood Python modules whatsoever. The code is:
import os
import numpy as np
import scipy.io
from scipy import optimize

# Alpha distribution (PDF)
def AD(z, *params):
    a, scale = z
    diameters = params
    return -np.sum(np.log((((diameters)/(a**2) * np.exp(-diameters/a))) / scale))

# load data
currpath = ('path')
os.chdir(currpath)
diameters = scipy.io.loadmat('data.mat')["m1"]

# minimise
x0 = [1, 1]  # initial guesses
res = optimize.minimize(AD, x0, args = diameters, method='Nelder-Mead', tol=1e-6)
print(res.x)
My data vector (here already sorted) comprises a number of diameters in the following form (0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27 ...).
First question: since I'm fairly new to the topic of MLE, is the form of my data vector correct? I'm not completely sure whether I should use a data vector containing every observed diameter (as shown above), a data vector containing only the "possible" (distinct) diameters (0.19, 0.2, 0.21, 0.22, 0.25, 0.27, ...), or just the frequencies of the observed diameters (3, 1, 2, 3, 1, 1, ...). I think the first option is the right one, but I just wanted to be completely sure.
Second question: if I wish to use a cumulative distribution function (CDF) instead of a PDF to perform my MLE on, I would have to change my PDF function to a CDF, right? I was just wondering if I could alternatively somehow modify my data vector and still use the PDF.
However, for the minimisation in Python (if I understood it correctly) I had to rethink the definition of my variables. Normally I would assume that the parameters of my PDF (here "a" and "scale") are the variables that should be passed to "args" in optimize.minimize. However, the documentation states that args should contain the "constant" parameters, so I used my data vector as a constant "parameter vector" for the minimisation.
Third question: is this assumption an error in reasoning?
Fourth question: is the optimisation method "Nelder-Mead" appropriate? I'm not really familiar with optimisation methods and am not sure which of the options I should use, or which is best.
Finally, the program returns the error "TypeError: bad operand type for unary -: 'tuple'", and I have no clue how to deal with it, since I'm not passing any tuples to the minimisation function...
Fifth question: where does the tuple come from and how can I resolve this error?
I'd appreciate any help you could give me very much!
Best regards!
PS: Since this post is kind of a mixture between general math and programming, I wasn't completely sure if this is the right place to put the question. Sorry if I'm mistaken!
First, apart from the first part (before the multiplication operator), we are discussing what is generally called maximum likelihood estimation (MLE) for the exponential distribution. It has just been reparameterised in terms of something called a.
We want to estimate this single parameter based on a sample of diameters; there is no scale parameter. Under MLE, we pretend that the sample is fixed and treat the parameter as something that can be varied. We form the likelihood of the sample by taking the product of the density functions (not the cdfs) where each density function is to be calculated for one element of the sample.
(Likelihood is, in concept, like throwing a die twice. In ultra ugly terms, we could say that the likelihood of getting two ones in a row might be (1/6)(1/6).)
We want to maximise this likelihood. However, to make the optimisation problem mathematically and/or computationally tractable, we take the function's logarithm. Since all of its constituent functions are densities, less than one, this function must be everywhere less than zero. Thus, the maximisation problem becomes one of minimisation.
If you want to avoid almost all of the algebra then you would:
Write a function to calculate the density function for a given diameter and parameter value.
Write another function that accepts a density-function parameter value as its first Python parameter and the sample as its second. Have it call the first function once for each sample value, take the log of each of these, and return the sum of the logs.
Call minimize with the second function as its first argument, some reasonable guess for the density-function parameter (in a list) as the second argument, and the sample for args. Nelder-Mead is probably ok.
Edit: In a nutshell:
diameters =[ 0.19, 0.19, 0.19, 0.2, 0.21, 0.21, 0.22, 0.22, 0.22, 0.25, 0.27]
from scipy.optimize import minimize
from math import exp, log
def pdf(d, a):
    result = d*exp(-d/a)/a**2
    return result

def log_L(a, diameters):
    result = sum(log(pdf(d, a)) for d in diameters)
    return result

res = minimize(log_L, [1], args=diameters)
print (res)
Output:
fun: -337.80985348524604
hess_inv: array([[ 8.71770021e+10]])
jac: array([ -7.62939453e-06])
message: 'Optimization terminated successfully.'
nfev: 93
nit: 30
njev: 31
status: 0
success: True
x: array([ 2157576.39996697])
Addendum:
The Wikipedia article gives the following form for the pdf of the exponential distribution: f(x; lambda) = lambda * exp(-lambda * x), for x >= 0.
The constant lambda can be viewed as a value that scales the integral of the remainder of the expression, from zero to infinity, to one. We can ignore it and equate the exponents of your pdf (without the scaling factor) and of the exponential. We have to remember that d takes the role of x.
Solve for 'lambda'.
We see that this is the normalising expression in your pdf. In other words, the alpha is an exponential expressed with different parameters.
Here is another approach, assuming that you're analysing data and not simply working out the details of MLE.
scipy provides the means for generating samples from arbitrary distributions. Here I define just the pdf for your alpha. Your parameter a becomes p, because a is used as the lower limit of the distribution's support, which I define to be zero.
I draw a sample of size 100 with p set, somewhat arbitrarily, to 0.4. I did a little experimentation, trying to find a value that would give me a sample whose lowest 11 values would approximate those in your sample.
The scipy rv_continuous object has a method called fit that will attempt calculation of MLE estimates of location, scale and 'shape'. In this case, the value for shape, about 0.36, is not all that far from 0.4.
from scipy.stats import rv_continuous
import numpy as np

class Alpha(rv_continuous):
    'alpha distribution'
    def _pdf(self, x, p):
        return x*np.exp(-x/p)/p**2

alpha = Alpha(a=0, shapes='p')
sample = sorted(alpha.rvs(size=100, p=0.4))

for a in sample[:12]:
    print('{:10.2f}'.format(a))

print(Alpha(a=0, shapes='p').fit(sample))
I don't believe that your sample is alpha-distributed. The values seem to be too 'uniform' compared with what I could generate. But I've been wrong before.
I would suggest plotting your sample cdf to see if you can recognise what it is.
Incidentally, when I changed the sign of the log-likelihood in the other answer the code croaked. I suspect that the alpha is just a poor fit.
0.00
0.03
0.04
0.04
0.08
0.09
0.09
0.11
0.12
0.14
0.19
0.20
(1.0902616847853124, -0.039102949269294023, 0.35922022997329517)
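As an aside, here is a minimal sketch of the empirical-CDF plot suggested above, using the eleven diameters quoted in the question (matplotlib assumed available):
import numpy as np
import matplotlib.pyplot as plt

# The diameters quoted in the question
sample = np.sort(np.array([0.19, 0.19, 0.19, 0.2, 0.21, 0.21,
                           0.22, 0.22, 0.22, 0.25, 0.27]))
ecdf = np.arange(1, len(sample) + 1) / len(sample)   # empirical CDF values

plt.step(sample, ecdf, where='post')
plt.xlabel('diameter')
plt.ylabel('empirical CDF')
plt.show()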
I'm using Scipy's fmin search to compute the log of the likelihood of a distribution's fit to some data. I'm using fmin to search for the parameters that maximize the log likelihood, like so:
j = fmin(lambda p:-sum(log(likelihood_calculator(data, p))), array([1.5]), full_output=True)
(likelihood_calculator takes data and a parameter and spits out an array of likelihood values for each data point.)
If we start that search with a parameter that yields a likelihood of 0, the loglikelihood is -inf, so the -sum is inf. fmin should run away from the initial parameter, but instead it sticks on that value for the maximum number of calls, then returns it:
In [268]: print j
(array([ 1.5]), inf, 67, 200, 1)
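For concreteness, here is a minimal, self-contained reproduction of that behaviour with a made-up likelihood_calculator (standing in for mine) that returns zero likelihood whenever p < 1:
import numpy as np
from scipy.optimize import fmin

data = np.array([0.5, 1.0, 2.0, 3.0])   # made-up data, just for illustration

def likelihood_calculator(data, p):
    # hypothetical stand-in: zero likelihood for p < 1, an exponential likelihood otherwise
    return np.where(p < 1, 0.0, np.exp(-data / p) / p)

# Starting in the zero-likelihood region: the objective is inf and fmin never moves off x0.
j = fmin(lambda p: -np.sum(np.log(likelihood_calculator(data, p))),
         np.array([0.5]), full_output=True)
print(j)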
I thought this was perhaps a problem with fmin's handling of infs, but if we remove the likelihood calculator and just hand a 0 directly, we get better behavior:
In [269]: i = fmin(lambda p: -sum(log(p)), array([0]), full_output=1)
Warning: Maximum number of function evaluations has been exceeded.
In [270]: i
Out[270]: (array([ 3.16912650e+26]), -61.020668415892501, 100, 200, 1)
This same correct behavior happens if we use an array of zeros, if those zeros are floats, or if we use fmin_bfgs. The same incorrect behavior with the function call continues if we use fmin_bfgs, but fmin works CORRECTLY if we start with a parameter that doesn't yield a 0 likelihood (and thus no infs).
Thoughts? Thanks!
Update:
If there's a broad area of parameters that result in zeros, we can push the parameter value up to the edge. If the parameter is near enough the edge, fmin will get out of zeroland and start searching.
E.g. if p < 1 gives Inf, then starting at p = 0.99 fmin will work, but not at p = 0.95.
Maybe your update answers the question. Since fmin uses a downhill simplex algorithm, it searches in a neighborhood of the initial guess for a direction of descent. If you are deep enough inside a parameter region where the function always returns inf, the algorithm cannot see which direction to go.