Differential_evolution in Scipy not giving a Jacobian - python

I'm using the differential_evolution algorithm in scipy to fit some data with various exponential functions convolved with Gaussian functions. This in itself is not a problem; the function fits the data well.
However, the result does not include the Jacobian (which I would like to use to calculate the errors on my fit constants), despite the fact that I have set "polish" (i.e. use scipy.optimize.minimize with the L-BFGS-B method to polish the best population member at the end) to True, in which case the documentation says the result should contain the Jacobian. My function takes the Gaussian width and any number of exponents, and is being fit like so:
result = differential_evolution(exponentialfit, bounds, args=(avgspectra, c, fitfrom, errors, numcomponents, 1), tol=0.000000000001, disp=True, polish=True)
Is there any reason it is not returning the Jacobian in the result output?
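As a rough sketch of one possible workaround (an assumption on my part, not from the thread): run the L-BFGS-B polishing step yourself on the best member returned by differential_evolution. The OptimizeResult from scipy.optimize.minimize with L-BFGS-B carries jac (the gradient) and hess_inv (an approximate inverse Hessian), which can feed an error estimate. The objective and its extra arguments below simply mirror the call above.

import numpy as np
from scipy.optimize import differential_evolution, minimize

# exponentialfit, bounds and the extra arguments are assumed to be defined as in the question
de_args = (avgspectra, c, fitfrom, errors, numcomponents, 1)
result = differential_evolution(exponentialfit, bounds, args=de_args,
                                tol=1e-12, disp=True, polish=True)

# Repeat the polishing step explicitly so that the gradient and the approximate
# inverse Hessian are available regardless of whether the polish improved the fit.
polished = minimize(exponentialfit, result.x, args=de_args,
                    method='L-BFGS-B', bounds=bounds)
print(polished.jac)                # gradient at the minimum
cov = polished.hess_inv.todense()  # rough covariance matrix of the fit parameters
print(np.sqrt(np.diag(cov)))       # 1-sigma errors, up to scaling by the residual variance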

Related

How can one optimize a multivariate function by differential_evolution

I would like to optimize a multivariate function with differential evolution, using a lambda function. The parameters over which I want to optimize the function are matrices, each of which has a different dimension.
This is my code:
R0 = [[(0, 1)]]
Q0 = [[(0, 1)]]
c0 = (0, 1)
beta0 = [[(-6, 9)]]
B0 = [(-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2),
      (-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2), (-2, 2)]
B0 = np.array([B0])
B0 = B0.T
kal_ = lambda R, Q, c, beta, B: myfunc(R, Q, c, beta, B, Y, X)
opt = scipy.optimize.differential_evolution(kal_(R0, Q0, c0, beta0, B0), maxiter=1000000, tol=1e-6)
Python returns the following error, which is due to the bounds being used as the initial values:
ValueError: setting an array element with a sequence.
Can anyone let me know what is wrong with the code?
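As a sketch of one common fix (an assumption on my part, not from the thread): differential_evolution expects a callable that takes a flat 1-D parameter vector, plus a flat sequence of (min, max) bounds as its second positional argument, so matrix-valued parameters have to be packed into a single vector and unpacked inside the objective. myfunc, Y and X below are the placeholders from the question.

import numpy as np
from scipy.optimize import differential_evolution

# One (min, max) pair per scalar parameter: R, Q, c, beta, then the 14 entries of B.
bounds = [(0, 1), (0, 1), (0, 1), (-6, 9)] + [(-2, 2)] * 14

def objective(theta, Y, X):
    # Unpack the flat vector into the shapes myfunc expects (assumed 1x1 matrices here).
    R = np.array([[theta[0]]])
    Q = np.array([[theta[1]]])
    c = theta[2]
    beta = np.array([[theta[3]]])
    B = theta[4:].reshape(-1, 1)
    return myfunc(R, Q, c, beta, B, Y, X)

opt = differential_evolution(objective, bounds, args=(Y, X), maxiter=1000000, tol=1e-6)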

logit optimization methods for a binary set of variables

I am looking for help with the implementation of a logit model with statsmodels for binary variables.
Here is my code:
(I am using the feature selection methods MinimumRedundancyMaximumRelevance and RecursiveFeatureElimination available in Python.)
for i_mrmr in range(4, 20):
    for i_rfe in range(3, i_mrmr):
        regressors_step1 = ...  # selecting the MRMR features
        regressors_step2 = ...  # selecting features from the previous list with the RFE method
        for method in ['newton', 'nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg']:
            logit_model = Logit(y, X.loc[:, regressors_step2])
            try:
                result = logit_model.fit(method=method, cov_type='HC1')
                print(result.summary())
            except:
                result = "error"
I am using Logit from statsmodels.discrete.discrete_model.
The y variable, the target, is a binary.
All explanatory variables, the X, are binary too.
The logit model is "functioning" for the different optimization methods; that is to say, I end up with a summary to print. Nonetheless, it prints various warnings such as: "Maximum Likelihood optimization failed to converge."
The optimization methods offered by statsmodels are the ones from scipy:
‘newton’ for Newton-Raphson, ‘nm’ for Nelder-Mead
‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
‘lbfgs’ for limited-memory BFGS with optional box constraints
‘powell’ for modified Powell’s method
‘cg’ for conjugate gradient
‘ncg’ for Newton-conjugate gradient
These methods can be found in scipy.optimize.
Here are my questions:
I have not found any argument against using these optimization methods with a binary set of variables. But, because of the warnings, I am wondering whether it is correct to do so. And if so, which method is the most appropriate in this case?
Here: Scipy minimize: how to restrict x only to 0 and 1? it is implicitly suggested that a model of the Python MIP (Mixed-Integer Linear Programming) kind could be better in the case of a binary set of variables. In the documentation of the Python MIP package, it appears that to implement this kind of model I should explicitly give a function to minimize or maximize and also express the constraints... (see: https://docs.python-mip.com/en/latest/quickstart.html#creating-models)
Therefore I am wondering whether I need to define a logit function as the objective function. What constraints should I express? Is there an easier way to do this?
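As a hedged note (my assumption, not from the thread): convergence warnings with an all-binary design matrix often point to (quasi-)complete separation rather than to the choice of scipy optimizer, and statsmodels offers an L1-penalized fit that remains well-defined in that situation. A minimal sketch, reusing the names from the question:

from statsmodels.discrete.discrete_model import Logit

# y, X and regressors_step2 are assumed to be defined as in the question.
logit_model = Logit(y, X.loc[:, regressors_step2])

# Plain maximum likelihood; may warn about non-convergence under separation.
result = logit_model.fit(method='bfgs', maxiter=500, cov_type='HC1')

# L1-penalized maximum likelihood, which stays finite even under separation.
result_reg = logit_model.fit_regularized(alpha=1.0)
print(result_reg.params)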

Difference between scipy.special.boxcox and scipy.stats.boxcox

In my program, I am applying a Box-Cox transform to my data, and at a certain step of my experiment I want to reverse the transformation. However, I noticed there are two variants of boxcox:
scipy.special.boxcox
scipy.stats.boxcox
I learned that the first option has a function that reverses the Box-Cox transform here.
However, I just want to know why in scipy.special the lambda parameter cannot be None, while in scipy.stats it can. In my code I am actually using scipy.stats and the lambda is None. Now, if I want to switch to scipy.special in order to use its reverse function, what should I set lambda to?
Here is my current code:
elif self.output_box:
    y_train, self.y_train_lambda_ = boxcox(y_train)
    y_test, self.y_test_lambda_ = boxcox(y_test)
They both use the same formula for the transformation, so the only difference is that with scipy.stats you can also estimate the lambda that is optimal for the data. If you call scipy.stats.boxcox with lmbda=None, it returns two values: the transformed array and the lambda that maximizes the log-likelihood function (and, if alpha is not None, also the confidence interval for lambda). That is therefore the lambda you have to use with the inverse transformation.
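A minimal sketch of the round trip (my own illustration; the data here are made up), using scipy.special.inv_boxcox with the lambda estimated by scipy.stats.boxcox:

import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

y = np.random.lognormal(size=100)  # any strictly positive data
y_trans, lam = boxcox(y)           # lam maximizes the log-likelihood
y_back = inv_boxcox(y_trans, lam)  # reuse the same lam to reverse the transform
print(np.allclose(y, y_back))      # True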

Is there a bug in scipy-0.18.1's `scipy.optimize.minimize`?

I am trying to minimize a function using specifically the dogleg method in scipy.optimize.minimize. To understand it better, I am adapting an example from the bottom of the help page to use the dogleg method:
from scipy.optimize import minimize

def fun(x):
    return (x[0] - 1)**2 + (x[1] - 2.5)**2

# solver
res = minimize(fun, (2, 0), method='dogleg', jac=False)  # or jac=None, it doesn't matter
print(res)
I get the error ValueError: Jacobian is required for dogleg minimization.
This is similar to an older question ("Jacobian is required for Newton-CG method" when an approximation to the Jacobian is not being used with jac=False), which doesn't appear to be resolved.
So my question: is there really a bug in this minimize or am I not using it properly?
You must pass a Jacobian function to use the dogleg method, as it is a gradient-based optimization method. If you look at the jac argument of scipy.optimize.minimize, it says:
jac : bool or callable, optional
Jacobian (gradient) of objective function. Only for CG, BFGS, Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg. If jac is a Boolean and is True, fun is assumed to return the gradient along with the objective function. If False, the gradient will be estimated numerically. jac can also be a callable returning the gradient of the objective. In this case, it must accept the same arguments as fun.
Then if you look down at the Notes at the bottom of the page:
Method dogleg uses the dog-leg trust-region algorithm for unconstrained minimization. This algorithm requires the gradient and Hessian; furthermore the Hessian is required to be positive definite.
Some gradient-based methods do not require an explicit Jacobian (and/or Hessian), as they approximate them by finite differences. However, dogleg does require you to pass such functions explicitly.
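A minimal sketch of what this means for the example above (my own illustration): supply the analytic gradient and a positive-definite Hessian explicitly.

import numpy as np
from scipy.optimize import minimize

def fun(x):
    return (x[0] - 1)**2 + (x[1] - 2.5)**2

def jac(x):
    # analytic gradient of fun
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2.5)])

def hess(x):
    # constant, positive-definite Hessian, as dogleg requires
    return np.array([[2.0, 0.0], [0.0, 2.0]])

res = minimize(fun, (2, 0), method='dogleg', jac=jac, hess=hess)
print(res.x)  # approximately [1.0, 2.5]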

Is it possible to use a distribution from statsmodels with scipy.stats?

I'm using a certain StatsModels distribution (Azzalini's Skew Student-t) and I'd like to perform a (one-sample) Kolmogorov-Smirnov test with it.
Is it possible to use Scipy's kstest with a StatsModels distribution? Scipy's documentation (rather vaguely) suggests that the cdf argument may be a string or a callable, with no further details or examples about the latter.
On the other hand, the StatsModels' distribution I'm using has many of the methods that Scipy distributions do; thus, I'm supposing there is some way of using it as a callable argument passed to kstest. Am I wrong?
Here is what I have so far. What I'd like to achieve is commented out in the last line:
import statsmodels.sandbox.distributions.extras as azt
import scipy.stats as stats

x = [-0.2833379, -3.05224565, 0.13236267, -0.24549146, -1.75106484,
     0.95375723, 0.28628686, 0., -3.82529261, -0.26714159,
     1.07142857, 2.56183746, -1.89491817, -0.3414301, 1.11589663,
     -0.74540174, -0.60470106, -1.93307821, 1.56093656, 1.28078818]

# This is how kstest works.
print(stats.kstest(x, stats.norm.cdf))  # (0.21003262911224113, 0.29814145956367311)

# This is the statsmodels distribution I'm using. It has a cdf function as well.
ast = azt.ACSkewT_gen()

# This is what I'd want. Executing this will throw a TypeError because ast.cdf
# needs some shape parameters etc.
# print(stats.kstest(x, ast.cdf))
Note: I'll happily use two-sample KS test if what I'm expecting is not possible. Just wanted to know if this is possible.
Those functions were written a long time ago with scipy compatibility in mind, but scipy has changed in several ways since then.
kstest has an args keyword for the distribution parameters.
To get the distribution parameters, we can try to estimate them using the fit method of the scipy.stats distributions. However, estimating all parameters prints some warnings, and the estimated df parameter is large. If we fix df at specific values, we get estimates without warnings that we can use in the call to kstest.
>>> ast.fit(x)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)
(31834.800527154337, -2.3475921468088172, 1.3720725621594987, 2.2766515091760722)
>>> p = ast.fit(x, f0=100)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.13897385693057401, 0.83458552699682509)
>>> p = ast.fit(x, f0=5)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.097960232618178544, 0.990756154198281)
However, the distribution for the Kolmogorov-Smirnov test assumes that the distribution parameters are fixed and not estimated. If we estimate the parameters as above, then the p-value will not be correct since it is not based on the correct distribution.
For some distributions we can use tables for the kstest with an estimated mean and scale parameter, e.g. the Lilliefors test kstest_normal in statsmodels. If we have estimated shape parameters, then the distribution of the KS test statistic will depend on the parameters of the model, and we could get the p-value from bootstrapping.
(I don't remember anything about estimating the parameters of the SkewT distribution and whether maximum likelihood estimation has any specific problems.)
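A rough sketch of the parametric bootstrap mentioned above (my own illustration, not from the answer): simulate samples from the fitted distribution, refit on each sample, and compare the observed KS statistic against the simulated statistics. Note that rvs for ACSkewT_gen falls back on generic numerical inversion, so this can be slow.

import numpy as np
import scipy.stats as stats

def bootstrap_kstest_pvalue(data, dist, fixed_df, n_boot=500):
    """Parametric-bootstrap p-value for a KS test whose shape parameters were estimated."""
    params = dist.fit(data, f0=fixed_df)
    d_obs = stats.kstest(data, dist.cdf, args=params).statistic
    d_boot = []
    for _ in range(n_boot):
        sample = dist.rvs(*params, size=len(data))  # simulate from the fitted distribution
        p_hat = dist.fit(sample, f0=fixed_df)       # refit on the simulated sample
        d_boot.append(stats.kstest(sample, dist.cdf, args=p_hat).statistic)
    return np.mean(np.asarray(d_boot) >= d_obs)

# e.g. bootstrap_kstest_pvalue(np.asarray(x), ast, fixed_df=5), with x and ast as above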
