I'm trying to use fmin_bfgs to find the local minimum of the absolute-value function abs(x), starting from the initial point 100.0; the expected answer is 0.0. However, I get:
In [184]: op.fmin_bfgs(lambda x:np.abs(x),100.0)
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 100.000000
Iterations: 0
Function evaluations: 64
Gradient evaluations: 20
Out[184]: array([100.0])
Why?
Methods like fmin_bfgs and fmin_slsqp require smooth (continuously differentiable) functions in order to provide reliable results, and abs(x) has a discontinuous derivative at its minimum. A method like the Nelder-Mead simplex, which does not use derivatives at all, might give better results in this case.
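A minimal sketch of the Nelder-Mead suggestion, calling scipy.optimize.minimize with method="Nelder-Mead" on the same objective and starting point (the older scipy.optimize.fmin wrapper runs the same algorithm). It only compares function values, so the kink at 0 should not stall it the way it stalls BFGS:

```python
import numpy as np
from scipy.optimize import minimize

# Nelder-Mead works with function values only, so the non-differentiable
# point of abs(x) at 0 does not break its search the way it breaks BFGS.
res = minimize(lambda x: np.abs(x[0]), x0=[100.0], method="Nelder-Mead")
print(res.x, res.fun)
```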
I have two loss functions here to be minimized:
The first one is a local one, where:
min f1(x1),
min f2(x2),
min f3(x3),....
min fn(xn)
The other one is a global one, where:
min f(x1,x2,...,xn) = f1(x1)+f2(x2)+...fn(xn)
For each local problem fi(x), I have 2 variables to optimize, and there are 1000 local problems. Correspondingly, the global problem has 2000 variables to optimize. The global problem naturally has more parameters, but since f1, f2, f3, ..., fn are independent of each other, I would expect the two approaches to take comparable time.
I use scipy.optimize.minimize for the optimization, but the second approach is much, much slower than the first.
I think the only drawback of the global approach is that it computes more gradients than it actually needs. For example, the gradient with respect to x1 comes only from f1, but the global version also computes the contributions from f2, f3, ..., fn, which are all 0. This makes it slower. If that is the case, I hope there is some way to speed it up.
By the way, I will later need to add a global constraint to the optimization, which is why I must use the global loss function instead of the local ones.
I think your guess is correct: the extra time is spent computing the gradients. According to the documentation for scipy.optimize.minimize (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html), the method estimates the gradient numerically when jac is False or not given. With finite differences, each gradient of a 2000-variable objective needs on the order of 2000 extra evaluations of the full objective, and each of those evaluates all n terms.
jac : bool or callable, optional
Jacobian (gradient) of objective function. Only for CG, BFGS, Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov, trust-region-exact. If jac is a Boolean and is True, fun is assumed to return the gradient along with the objective function. If False, the gradient will be estimated numerically. jac can also be a callable returning the gradient of the objective. In this case, it must accept the same arguments as fun.
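Based on the above, you can set jac=True and have your objective return both the function value and the gradient. Because the fi are independent, the gradient of the global objective is simply the concatenation of the local gradients, so it is cheap to compute. This should speed up the process considerably.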
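A minimal sketch of the jac=True approach, assuming a toy separable loss: each hypothetical local term is f_i(x_i) = ||x_i - t_i||^2 over two variables, with made-up random targets t_i. The same objective is also minimized without a gradient so the evaluation counts can be compared:

```python
import numpy as np
from scipy.optimize import minimize

n_problems = 100                          # hypothetical: 100 local problems, 2 variables each
rng = np.random.default_rng(0)
targets = rng.normal(size=2 * n_problems) # made-up per-variable targets

def fun_and_grad(x):
    # Global loss f(x) = sum_i f_i(x_i) with toy terms f_i = ||x_i - t_i||^2.
    r = x - targets
    value = np.sum(r ** 2)
    # Each block of the gradient depends only on its own f_i, so the full
    # gradient is just the local gradients stacked together.
    grad = 2.0 * r
    return value, grad

x0 = np.zeros(2 * n_problems)
res_analytic = minimize(fun_and_grad, x0, jac=True, method="L-BFGS-B")
res_numeric = minimize(lambda x: fun_and_grad(x)[0], x0, method="L-BFGS-B")
print(res_analytic.nfev, res_numeric.nfev)
```

With 200 variables, every finite-difference gradient costs roughly 200 extra objective calls, which is where the slowdown of the global formulation comes from; the analytic version needs far fewer.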
Another option is to write your own custom minimizer and pass it to minimize as a callable method.
As far as I understand, for minimize with method 'trust-ncg', the method-specific parameter max_trust_radius is the maximum size of a new optimization step.
However, I'm seeing some weird behaviour.
I'm working on my doctoral research data, and I have code that calls the minimize function with the trust-ncg method,
passing the options
opt_par = {
'initial_trust_radius':0.1,
'max_trust_radius':1,
'eta':0.15,
'gtol':1e-5,
'disp': True
}
I invoke minimize function as:
res = minimize(bbox, x0, method='trust-ncg', jac=bbox_der, hess=bbox_hess, options=opt_par)
where
bbox is a function to evaluate the objective function
x0 is the initial guess
bbox_der is the gradient function
bbox_hess is the Hessian function
opt_par is the dictionary above with the parameters.
bbox runs the simulation code and gets the data. This works: minimize goes back and forth proposing new values, and bbox runs the simulation for each of them.
Everything worked well until I ran into a weird issue.
The x vector contains 8 values, and I noticed that in one of the iterations the last value is greater than 1.
Given max_trust_radius, I thought it should be less than 1, but it is 1.0621612802208713e+00.
This causes problems because bbox cannot accept values of 1 or greater: it runs a simulation program that requires the value to be strictly less than 1.
I looked at the scipy source to see whether I could find a bug or anything else wrong, but I couldn't.
My main concerns are:
My understanding is that there is a bug in scipy's minimize, since the new value is greater than max_trust_radius.
How can I control the values so that they never become 1 or greater?
What would you suggest to investigate the issue?
The max_trust_radius controls how large steps you are allowed to take:
max_trust_radius : float
Maximum value of the trust-region radius.
No steps that are longer than this value will be proposed.
Since you are very likely to take many steps during the minimization, each of which can be up to 1 long, it is not strange at all that you (assuming ||x0|| = 0) end up with ||x|| > 1. The radius bounds the length of each step ||x_{k+1} - x_k||, not the size of x itself.
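A small sketch illustrating this, assuming a toy quadratic whose minimizer c = (3, 3) lies outside the unit ball (c, f, grad and hess are made up for illustration). The callback records every iterate, so we can check that no single step is longer than max_trust_radius even though x itself ends up well beyond norm 1:

```python
import numpy as np
from scipy.optimize import minimize

c = np.array([3.0, 3.0])        # hypothetical minimizer, far outside ||x|| <= 1

def f(x):
    return np.sum((x - c) ** 2)

def grad(x):
    return 2.0 * (x - c)

def hess(x):
    return 2.0 * np.eye(x.size)

iterates = [np.zeros(2)]        # start at the origin, so ||x0|| = 0
res = minimize(f, iterates[0], method="trust-ncg", jac=grad, hess=hess,
               callback=lambda xk: iterates.append(np.copy(xk)),
               options={"initial_trust_radius": 0.1, "max_trust_radius": 1.0})

# Length of each individual step between recorded iterates.
steps = np.linalg.norm(np.diff(iterates, axis=0), axis=1)
print("largest single step:", steps.max())
print("final ||x||:", np.linalg.norm(res.x))
```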
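If your problem is strictly bounded, you need to use an optimization algorithm that supports bounds on the parameters.
For scipy.optimize.minimize, at the time of writing only the L-BFGS-B, TNC and SLSQP methods seemed to support the bounds= keyword; newer SciPy releases also accept bounds for other methods, including trust-constr (which, like trust-ncg, can use your Hessian), Powell and Nelder-Mead.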
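A minimal sketch with L-BFGS-B and the bounds= argument, assuming the same kind of toy quadratic with its unconstrained minimizer at (3, 3). Since the simulation cannot accept 1, the upper bound is placed slightly below 1; the 1e-6 margin is an arbitrary assumption:

```python
import numpy as np
from scipy.optimize import minimize

c = np.array([3.0, 3.0])   # hypothetical unconstrained minimizer, outside the allowed box
upper = 1.0 - 1e-6         # assumption: the simulation needs values strictly below 1

def fun_and_grad(x):
    r = x - c
    return np.sum(r ** 2), 2.0 * r

# One (lower, upper) pair per variable; None means unbounded on that side.
res = minimize(fun_and_grad, x0=np.zeros(2), jac=True, method="L-BFGS-B",
               bounds=[(None, upper)] * 2)
print(res.x)   # both components end up on the upper bound
```

Because L-BFGS-B keeps its iterates inside the box, the objective should not be evaluated at a point the simulation rejects. In the 8-variable case you would pass (None, None) for components that need no bound.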
I'm trying to solve a minimization problem numerically using the optimization library in SciPy.
My function and its derivative are a bit too complicated to present here, but they are based on the following functions, whose minimization does not work either:
import numpy as np

def func(x):
    return np.log(1 + np.abs(x))

def grad(x):
    return np.sign(x) / (1.0 + np.abs(x))
When I call fmin_bfgs with the descent starting at x=10, I get the following message:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 2.397895
Iterations: 0
Function evaluations: 24
Gradient evaluations: 22
and the output is 10 (i.e. the initial point). I suspect this error may be caused by one of two problems:
The objective function is not convex; however, I tried other non-convex functions and the method gave the right result.
The objective function is "very flat" when far from the minimum because of the log.
Are my suppositions true? Or does the problem come from anything else?
Whatever the cause, what can I do to fix this? In particular, is there another minimization method I could use?
Thanks in advance.
abs(x) is always somewhat dangerous because it is non-differentiable at 0, and most solvers expect problems to be smooth. Note that since log is monotonically increasing, minimizing log(1 + abs(x)) is equivalent to minimizing 1 + abs(x), and dropping the constant 1 leaves us with minimizing abs(x). This can often be handled better with the following reformulation.
Instead of min abs(x), use
min t
subject to -t <= x <= t
Of course this requires a solver that can solve (linearly) constrained NLPs.
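A minimal sketch of this reformulation with SLSQP, which handles linear inequality constraints. The decision vector z = [x, t] is my own naming; the objective is t, and -t <= x <= t becomes the two smooth constraints t - x >= 0 and t + x >= 0. For the original objective you could minimize log(1 + t) instead of t under the same constraints:

```python
import numpy as np
from scipy.optimize import minimize

# z = [x, t]; SLSQP "ineq" constraints mean fun(z) >= 0.
constraints = [
    {"type": "ineq", "fun": lambda z: z[1] - z[0],     # t - x >= 0
     "jac": lambda z: np.array([-1.0, 1.0])},
    {"type": "ineq", "fun": lambda z: z[1] + z[0],     # t + x >= 0
     "jac": lambda z: np.array([1.0, 1.0])},
]

res = minimize(lambda z: z[1], x0=[10.0, 11.0],        # feasible start: x = 10, t = 11
               jac=lambda z: np.array([0.0, 1.0]),
               method="SLSQP", constraints=constraints)
x_opt, t_opt = res.x
print(x_opt, t_opt)
```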
I'm using a certain StatsModels distribution (Azzalini's Skew Student-t) and I'd like to perform a (one-sample) Kolmogorov-Smirnov test with it.
Is it possible to use Scipy's kstest with a StatsModels distribution? Scipy's documentation (rather vaguely) suggests that the cdf argument may be a String or a callable, with no further details or examples about the latter.
On the other hand, the StatsModels' distribution I'm using has many of the methods that Scipy distributions do; thus, I'm supposing there is some way of using it as a callable argument passed to kstest. Am I wrong?
Here is what I have so far. What I'd like to achieve is commented out in the last line:
import statsmodels.sandbox.distributions.extras as azt
import scipy.stats as stats
x = ([-0.2833379 , -3.05224565, 0.13236267, -0.24549146, -1.75106484,
0.95375723, 0.28628686, 0. , -3.82529261, -0.26714159,
1.07142857, 2.56183746, -1.89491817, -0.3414301 , 1.11589663,
-0.74540174, -0.60470106, -1.93307821, 1.56093656, 1.28078818])
# This is how kstest works.
print(stats.kstest(x, stats.norm.cdf))  # (0.21003262911224113, 0.29814145956367311)
# This is Statsmodels' distribution I'm using. It has a cdf function as well.
ast = azt.ACSkewT_gen()
# This is what I'd want. Executing this will throw a TypeError because ast.cdf
# needs some shape parameters etc.
# print(stats.kstest(x, ast.cdf))
Note: I'll happily use a two-sample KS test if what I'm after is not possible. I just wanted to know whether it is.
Those functions were written a long time ago with scipy compatibility in mind, but scipy has changed in several ways since then.
kstest has an args keyword for the distribution parameters.
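kstest only needs a callable that maps an array of values to CDF values, so distribution parameters can be supplied either through args or by binding them in a lambda. A small sketch using scipy.stats.t as a stand-in for the statsmodels distribution, with made-up parameter values:

```python
import numpy as np
from scipy import stats

x = np.array([-0.2833379, -3.05224565, 0.13236267, -0.24549146, -1.75106484,
              0.95375723, 0.28628686, 0.0, -3.82529261, -0.26714159,
              1.07142857, 2.56183746, -1.89491817, -0.3414301, 1.11589663,
              -0.74540174, -0.60470106, -1.93307821, 1.56093656, 1.28078818])

params = (5, 0.0, 1.5)   # hypothetical (df, loc, scale) for the Student-t stand-in

# Option 1: let kstest forward the parameters via args.
res_args = stats.kstest(x, stats.t.cdf, args=params)

# Option 2: bind the parameters in a one-argument callable.
res_lambda = stats.kstest(x, lambda v: stats.t.cdf(v, *params))
print(res_args.statistic, res_lambda.statistic)
```

The two calls should give the same statistic; with ast.cdf the shape parameters would go into args in the same way.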
To get the distribution parameters we can try to estimate them by using the fit method of the scipy.stats distributions. However, estimating all parameters prints some warnings and the estimated df parameter is large. If we fix df at specific values we get estimates without warnings that we can use in the call of kstest.
>>> ast.fit(x)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)
(31834.800527154337, -2.3475921468088172, 1.3720725621594987, 2.2766515091760722)
>>> p = ast.fit(x, f0=100)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.13897385693057401, 0.83458552699682509)
>>> p = ast.fit(x, f0=5)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.097960232618178544, 0.990756154198281)
However, the null distribution of the Kolmogorov-Smirnov statistic assumes that the distribution parameters are fixed, not estimated. If we estimate the parameters as above, the p-value will not be correct, since it is not based on the correct distribution (it is typically too large).
For some distributions there are tables for the KS test with estimated mean and scale parameters, e.g. the Lilliefors test kstest_normal in statsmodels. If shape parameters are estimated as well, the distribution of the KS test statistic depends on the parameters of the model, and we could obtain the p-value by bootstrapping.
(I don't remember anything about estimating the parameters of the SkewT distribution and whether maximum likelihood estimation has any specific problems.)
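A minimal sketch of the parametric bootstrap, assuming scipy.stats.norm as a stand-in distribution because its fit is fast; ks_bootstrap_pvalue is a hypothetical helper, not a scipy or statsmodels function, and the data are synthetic. Each simulated sample is refitted before its KS statistic is computed, so the reference distribution includes the effect of estimating the parameters. The same loop could in principle be run with ast and its fit method, at a much higher cost per fit:

```python
import numpy as np
from scipy import stats

def ks_bootstrap_pvalue(data, dist=stats.norm, n_boot=200, seed=0):
    """Hypothetical helper: KS test with estimated parameters via parametric bootstrap."""
    data = np.asarray(data, dtype=float)
    params = dist.fit(data)                                   # estimate once on the data
    d_obs = stats.kstest(data, dist.cdf, args=params).statistic
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model and re-estimate on the simulated sample.
        sim = dist.rvs(*params, size=data.size, random_state=rng)
        d_sim = stats.kstest(sim, dist.cdf, args=dist.fit(sim)).statistic
        exceed += d_sim >= d_obs
    # Add-one correction keeps the p-value strictly positive.
    return d_obs, (exceed + 1) / (n_boot + 1)

sample = np.random.default_rng(1).normal(loc=0.5, scale=2.0, size=50)  # synthetic data
d, p = ks_bootstrap_pvalue(sample)
print(d, p)
```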
I'm using the differential_evolution algorithm in scipy to fit some data with various exponential functions convolved with Gaussian functions; this in itself is not a problem, as the fit works well.
However, it does not return the Jacobian in the result dictionary (which I would like to use to calculate the errors on my fit constants), even though I have set polish=True (i.e. use scipy.optimize.minimize with the L-BFGS-B method to polish the best population member at the end), and the documentation states that it should then return the Jacobian. My function takes the Gaussian width and any number of exponents, and is fit like so:
result = differential_evolution(exponentialfit, bounds, args=(avgspectra, c, fitfrom, errors, numcomponents, 1), tol=0.000000000001, disp=True, polish=True)
Is there any reason it is not giving the jacobian in the result output?
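As far as I can tell from SciPy's differential_evolution source, the polished L-BFGS-B result (including its jac) is only copied into the returned result when polishing actually lowers the objective; with a very tight tol, DE may already sit at the polished optimum, so nothing is copied. A workaround sketch, using the 2-D Rosenbrock function (scipy.optimize.rosen) as a stand-in for exponentialfit: run DE with polish=False and polish explicitly, since a minimize result from L-BFGS-B always carries a jac:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize, rosen

bounds = [(-2.0, 2.0), (-2.0, 2.0)]   # stand-in problem, not the original fit

# Global search without the built-in polish step.
de = differential_evolution(rosen, bounds, seed=1, polish=False)

# Explicit polish from the DE optimum; L-BFGS-B reports the gradient as .jac.
polished = minimize(rosen, de.x, method="L-BFGS-B", bounds=bounds)

print(de.x, de.fun)
print(polished.x, polished.fun, polished.jac)
```

Note that this jac is the gradient of the scalar objective at the optimum (close to zero there), so on its own it does not give parameter uncertainties.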