I am looking for help with the implementation of a logit model for binary variables using statsmodels.
Here is my code (I am using two feature-selection methods available in Python: Minimum Redundancy Maximum Relevance (MRMR) and Recursive Feature Elimination (RFE)):
from statsmodels.discrete.discrete_model import Logit

for i_mrmr in range(4, 20):
    for i_rfe in range(3, i_mrmr):
        regressors_step1 = ...  # the i_mrmr features selected by MRMR
        regressors_step2 = ...  # the i_rfe features selected from regressors_step1 by RFE
        for method in ['newton', 'nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg']:
            logit_model = Logit(y, X.loc[:, regressors_step2])
            try:
                result = logit_model.fit(method=method, cov_type='HC1')
                print(result.summary())
            except Exception:
                result = "error"
I am using Logit from statsmodels.discrete.discrete_model.
The target variable y is binary.
All explanatory variables X are binary as well.
The logit model "works" with the different optimization methods, in the sense that I always end up with a summary to print. Nonetheless, various warnings appear, such as: "Maximum Likelihood optimization failed to converge."
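For reference, a small sketch of how non-convergence could be detected programmatically inside the loop above (mle_retvals is filled in by the statsmodels fit; disp=False only silences the printed messages):

result = logit_model.fit(method=method, cov_type='HC1', disp=False)
# the optimizer's return values include a convergence flag
if not result.mle_retvals.get('converged', False):
    print(method, 'did not converge')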
The optimization methods offered by statsmodels are the ones from scipy:
‘newton’ for Newton-Raphson
‘nm’ for Nelder-Mead
‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
‘lbfgs’ for limited-memory BFGS with optional box constraints
‘powell’ for modified Powell’s method
‘cg’ for conjugate gradient
‘ncg’ for Newton-conjugate gradient
We can find these methods in scipy.optimize.
Here are my questions:
I did not find any argument anywhere against the use of these optimization methods with a binary set of variables. But, because of the warnings, I am wondering whether it is correct to do so, and if it is, which method is the most appropriate in this case.
Here: Scipy minimize: how to restrict x only to 0 and 1? it is implicitly suggested that a model of the Python MIP (Mixed-Integer Linear Programming) kind could be better suited to the binary-variables case. The documentation of the Python MIP package indicates that to implement this kind of model I should explicitly give a function to minimize or maximize and also express the constraints (see: https://docs.python-mip.com/en/latest/quickstart.html#creating-models).
Therefore I am wondering: would I need to define a logit function as the objective function? What constraints should I express? Is there an easier way to do this?
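For reference, a sketch of what such a logit objective would look like if written out explicitly as a negative log-likelihood (roughly what statsmodels maximizes, sign-flipped, internally; the names are illustrative):

import numpy as np

def neg_log_likelihood(beta, X, y):
    # X: (n, k) design matrix, y: (n,) binary target, beta: (k,) coefficients
    z = X @ beta
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))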
Related
I'm working with PyMC3 and I'm unsure how to mark certain variables as "observed". In a simple example, I could have two input variables modeled as a uniform distribution on [0, 1]. I know a third "output" random variable is equal to the product of the two inputs, and let's say that I observed that the first input is 1 and the output is 0. Then I want to use PyMC3 to predict the second input, which in this case must be 0.
It is unclear to me how to tell PyMC3 that the output is observed, since it is the result of a mathematical expression and not created explicitly from a constructor.
import pymc3 as pm

with pm.Model() as model:
    input1 = pm.Uniform('RV1', lower=0, upper=1, observed=1)  # API is clear how to mark it observed
    input2 = pm.Uniform('RV2', lower=0, upper=1)  # This one is not observed
    output = input1 * input2  # How to tell PyMC3 the observed value of "output"?
    # Now I will do variational inference, sampling, etc... on the model
The random variables are technically boolean random variables, but I need to model them as continuous in order to do variational inference. And I have a lot of them; this is a minimal example. Setting output.observed = 0 doesn't seem to work, although it doesn't crash.
When doing a deterministic (i.e. closed-form) mathematical operation on the variables, you can wrap it as pymc3.Deterministic('your_deterministic_var', input1 * input2). As far as I know, this does not accept the observed keyword as you requested, but after sampling is done (pm.sample()) you can run pm.sample_posterior_predictive(chain, var_names=['your_deterministic_var']) and take the means of the variable entries, or whatever you need.
For more information on Deterministic, check out the documentation and also the PyMC3 Discourse forum, which is a better place for PyMC3 questions than Stack Overflow anyway.
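A minimal sketch of that pattern applied to the model from the question (the variable name 'prod' and the sampling settings are just illustrative):

import pymc3 as pm

with pm.Model() as model:
    input1 = pm.Uniform('RV1', lower=0, upper=1, observed=1)
    input2 = pm.Uniform('RV2', lower=0, upper=1)
    # track the closed-form product as a named node
    prod = pm.Deterministic('prod', input1 * input2)
    trace = pm.sample(1000)
    # posterior-predictive draws of the deterministic node
    ppc = pm.sample_posterior_predictive(trace, var_names=['prod'])

print(ppc['prod'].mean())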
I am trying to calculate the constants of a known mathematical model for my measured (and already filtered) data.
A little bit of pseudocode:
#my data is saved in y_data and x_data
#my model is a function of constants (a,b,c) and the x_data
model = f(x_data, a, b, c)
#set the model (approximately) equal to the data
y_data ≈ model
calculate(a, b, c)
Is there any way to find the constants? I know there will be no exact result... but is it possible within some deviation (e.g. 5%) of y_data?
An alternative would be to set up one equation per index; then I would have len(x_data) equations and would somehow need to find the best-fitting model and its constants.
I tried to simplify my issue. Furthermore, this is my first question, so let me know if I could do something better.
Thanks in advance!
If your model is linear, the simplest method would be least squares (linear regression). There is a nice tutorial with examples here.
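For the linear case, a quick sketch with numpy's least-squares solver (the straight-line model and the synthetic data are purely illustrative):

import numpy as np

x_data = np.linspace(0, 1, 20)
y_data = 3.0 * x_data + 1.5 + 0.01 * np.random.randn(20)

# fit y = a*x + b by ordinary least squares
A = np.column_stack([x_data, np.ones_like(x_data)])
coef, *_ = np.linalg.lstsq(A, y_data, rcond=None)
print(coef)  # approximately [3.0, 1.5]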
Since it appears your model is non-linear (and assuming you can't or don't want to find an analytical solution), you might want to try numerical optimization.
Scipy has a function called minimize which you could use to try to find the best solution. But it is not guaranteed to find the global minimum. So you may have to try different initial conditions.
Your code might look something like this:
from scipy.optimize import minimize
# Initial guess
x0 = [1.3, 0.7, 0.8]
res = minimize(cost_function, x0, tol=1e-5)
print(res.x)
The trick is that you first need to define the cost_function that you give to the solver.
It's common to use quadratic errors (sum of squared-errors or mean-squared error). Something like this:
def cost_function(x):
    a, b, c = x
    model_predictions = f(x_data, a, b, c)
    return sum((model_predictions - y_data)**2)
You might also have to try the different solvers built into scipy.optimize.minimize. Refer to the documentation for the pros and cons of each method.
Maybe get familiar with the examples in the Scipy documentation first, then try to set it up on your actual problem.
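To make that concrete, here is a self-contained sketch, assuming a simple non-linear model y = a*exp(b*x) + c (the model and the synthetic data are made up for illustration; replace f, x_data and y_data with your own):

import numpy as np
from scipy.optimize import minimize

def f(x, a, b, c):
    return a * np.exp(b * x) + c

rng = np.random.default_rng(0)
x_data = np.linspace(0, 2, 50)
y_data = f(x_data, 2.5, -1.3, 0.5) + 0.05 * rng.standard_normal(50)

def cost_function(params):
    a, b, c = params
    return np.sum((f(x_data, a, b, c) - y_data) ** 2)

# initial guess; try several if the result looks like a local minimum
res = minimize(cost_function, x0=[1.3, 0.7, 0.8], tol=1e-5)
print(res.x)  # should be close to (2.5, -1.3, 0.5)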
I am using the scipy.optimize.minimize function, and I'd like one of the parameters to be searched only over values with two decimals.
from sklearn.metrics import mean_squared_error
from scipy.optimize import minimize

def cost(parameters, input, target):
    output = self.model(parameters=parameters, input=input)
    cost = mean_squared_error(target.flatten(), output.flatten())
    return cost

parameters = [1, 1]  # initial parameters
res = minimize(fun=cost, x0=parameters, args=(input, target))
model_parameters = res.x
Here self.model is a function that performs some matrix manipulation based on the parameters; input and target are two matrices. The function works the way I want, except that I would like parameters[1] to be constrained. Ideally I'd just like to give it a numpy array, like np.arange(0, 10, 0.01). Is this possible?
In general this is very hard to do, as smoothness is one of the core assumptions of those optimizers.
Problems where some variables are discrete and some are not are hard, and are usually tackled either by mixed-integer optimization (which works well for MI linear programming and reasonably well for MI convex programming, although there are fewer good solvers) or by global optimization (usually derivative-free).
Depending on your task details, I recommend decomposing the problem:
an outer loop that fixes this variable over an np.arange(0, 10, 0.01)-like grid
an inner loop that optimizes while this variable is fixed
return the model with the best objective (with status = success)
This results in N inner optimizations, where N is the size of the state space of the variable to fix.
Depending on your task/data, it might be a good idea to traverse the fixing space monotonically (e.g. using np.arange) and use the solution of iteration i as the initial point for problem i+1 (potentially fewer iterations are needed if the guess is good). But this is probably not relevant here; see the next part.
If you really have only 2 parameters, as indicated, this decomposition leads to an inner problem with only 1 variable. In that case, don't use minimize; use minimize_scalar (faster and more robust, and it does not need an initial point).
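A minimal sketch of that decomposition, assuming a two-parameter objective cost(p_cont, p_disc) where only p_disc is restricted to the grid (the objective below is a placeholder):

import numpy as np
from scipy.optimize import minimize_scalar

def cost(p_cont, p_disc):
    # placeholder objective; replace with the real model error
    return (p_cont - 0.5 * p_disc) ** 2 + 0.1 * abs(p_disc - 3.0)

best = None
for p_disc in np.arange(0, 10, 0.01):  # outer loop: fix the grid variable
    res = minimize_scalar(lambda p: cost(p, p_disc))  # inner 1-D optimization
    # older scipy versions may not set res.success for this method
    if getattr(res, 'success', True) and (best is None or res.fun < best[0]):
        best = (res.fun, res.x, p_disc)

best_obj, best_cont, best_disc = best
print(best_obj, best_cont, best_disc)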
I'm using a certain StatsModels distribution (Azzalini's Skew Student-t) and I'd like to perform a (one-sample) Kolmogorov-Smirnov test with it.
Is it possible to use Scipy's kstest with a StatsModels distribution? Scipy's documentation (rather vaguely) suggests that the cdf argument may be a string or a callable, with no further details or examples about the latter.
On the other hand, the StatsModels' distribution I'm using has many of the methods that Scipy distributions do; thus, I'm supposing there is some way of using it as a callable argument passed to kstest. Am I wrong?
Here is what I have so far. What I'd like to achieve is commented out in the last line:
import statsmodels.sandbox.distributions.extras as azt
import scipy.stats as stats
x = ([-0.2833379 , -3.05224565, 0.13236267, -0.24549146, -1.75106484,
0.95375723, 0.28628686, 0. , -3.82529261, -0.26714159,
1.07142857, 2.56183746, -1.89491817, -0.3414301 , 1.11589663,
-0.74540174, -0.60470106, -1.93307821, 1.56093656, 1.28078818])
# This is how kstest works.
print(stats.kstest(x, stats.norm.cdf))  # (0.21003262911224113, 0.29814145956367311)
# This is Statsmodels' distribution I'm using. It has a cdf function as well.
ast = azt.ACSkewT_gen()
# This is what I'd want. Executing this will throw a TypeError because ast.cdf
# needs some shape parameters etc.
# print stats.kstest(x, ast.cdf)
Note: I'll happily use two-sample KS test if what I'm expecting is not possible. Just wanted to know if this is possible.
Those functions were written a long time ago with scipy compatibility in mind, but there have been several changes in scipy in the meantime.
kstest has an args keyword for the distribution parameters.
To get the distribution parameters, we can try to estimate them using the fit method of the scipy.stats distributions. However, estimating all parameters prints some warnings and the estimated df parameter is large. If we fix df at specific values, we get estimates without warnings that we can use in the call to kstest.
>>> ast.fit(x)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)
(31834.800527154337, -2.3475921468088172, 1.3720725621594987, 2.2766515091760722)
>>> p = ast.fit(x, f0=100)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.13897385693057401, 0.83458552699682509)
>>> p = ast.fit(x, f0=5)
>>> print(stats.kstest(x, ast.cdf, args=p))
(0.097960232618178544, 0.990756154198281)
However, the distribution for the Kolmogorov-Smirnov test assumes that the distribution parameters are fixed, not estimated. If we estimate the parameters as above, then the p-value will not be correct, since it is not based on the correct distribution.
For some distributions we can use tables for the kstest with estimated mean and scale parameters, e.g. the Lilliefors test kstest_normal in statsmodels. If we have estimated shape parameters, then the distribution of the KS test statistic depends on the parameters of the model, and we could get the p-value from bootstrapping.
(I don't remember anything about estimating the parameters of the SkewT distribution or whether maximum likelihood estimation has any specific problems.)
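As a rough illustration of the bootstrapping idea, a parametric-bootstrap sketch (the helper name is hypothetical; it assumes a scipy-style distribution with fit/rvs/cdf, e.g. ast above with df fixed via f0, and note that rvs on the skew-t can be slow):

import numpy as np
from scipy import stats

def ks_pvalue_bootstrap(dist, x, fit_kwargs=None, n_boot=500, seed=0):
    # parametric bootstrap p-value for a KS test with estimated parameters
    fit_kwargs = fit_kwargs or {}
    params = dist.fit(x, **fit_kwargs)
    stat_obs = stats.kstest(x, dist.cdf, args=params).statistic
    boot_stats = []
    for i in range(n_boot):
        xb = dist.rvs(*params, size=len(x), random_state=seed + i)
        pb = dist.fit(xb, **fit_kwargs)
        boot_stats.append(stats.kstest(xb, dist.cdf, args=pb).statistic)
    return np.mean(np.array(boot_stats) >= stat_obs)

# e.g. ks_pvalue_bootstrap(ast, x, fit_kwargs={'f0': 5}, n_boot=200)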
I'm using the differential_evolution algorithm in scipy to fit some data with various exponential functions convolved with Gaussian functions. This in itself is not a problem; the function fits the data well.
However, it is not giving the jacobian in the result dictionary (which I would like to use to calculate the errors on my fitted constants), despite the fact that I have set polish (i.e. use scipy.optimize.minimize with the L-BFGS-B method to polish the best population member at the end) to True, and thus the documentation states it should give the jacobian. My function takes the Gaussian width and any number of exponents, and is being fit like so:
result = differential_evolution(exponentialfit, bounds, args=(avgspectra, c, fitfrom, errors, numcomponents, 1), tol=0.000000000001, disp=True, polish=True)
Is there any reason it is not giving the jacobian in the result output?