I have some functional, such as S[f] = \int_\Omega f^2(x) dx. If you're familiar with physics, it's the action. This object takes in a function defined on a certain domain \Omega and returns a number; the math jargon for this is a functional.
Now I need to minimize this with respect to f. I know SciPy has an optimize package for minimizing multivariable functions, but if I used it I would be minimizing over ~10,000 variables (the functions are essentially just lists of 10,000 numbers), so I am curious whether there is a better way.
Do I have any other options?
You could use symbolic regression to find the function.
There are several packages available:
deap
glyph
gplearn
monkeys
Here is a good paper on symbolic regression by Schmidt and Lipson.
Although it is designed mainly for neural networks, TensorFlow sounds like it would work for you. It can differentiate vector expressions automatically and optimize them using gradient descent.
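For comparison, here is a minimal sketch of the gradient-based route using SciPy alone. It assumes the functional is discretized on a uniform grid over a hypothetical domain [0, 1], so that S[f] is approximated by sum(f_i^2) * dx. Supplying the analytic gradient 2 * f_i * dx lets L-BFGS-B handle 10,000 variables without finite differences. The grid, the starting guess, and the helper name action_and_grad are illustrative assumptions, not part of the question.

```python
import numpy as np
from scipy.optimize import minimize

n = 10_000                      # number of grid points (assumed)
x = np.linspace(0.0, 1.0, n)    # hypothetical domain Omega = [0, 1]
dx = x[1] - x[0]

def action_and_grad(f):
    """Return the discretized S[f] = sum(f_i^2) * dx and its gradient."""
    return np.sum(f**2) * dx, 2.0 * f * dx

f0 = np.sin(2.0 * np.pi * x) + 1.0   # arbitrary starting guess
# jac=True tells minimize that the callable returns (value, gradient).
res = minimize(action_and_grad, f0, jac=True, method="L-BFGS-B")
print(res.fun, np.abs(res.x).max())
```

For this particular functional the minimizer is simply f = 0, so the run is only a check that the approach scales. A real action would replace the body of action_and_grad.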
I am solving a function minimization problem using the BFGS optimizer available in SciPy from https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
In certain cases I would like to perform just a single optimization step with my SciPy optimizer. I would think this should be easy, but I cannot find a way to do it in the documentation linked above. There is a 'maxiter' option, which I have tried setting to 1. But this seems to limit the number of internal iterations of the BFGS algorithm before it returns the new function value, not the number of function evaluations. Does anyone have an idea how to solve my problem?
Kind regards
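A minimal sketch of what maxiter=1 does, using SciPy's built-in Rosenbrock test function as a stand-in objective (an assumption; the real objective is not shown in the question). As far as I can tell, one BFGS iteration means one search direction plus a line search along it, and the line search may evaluate the function several times, which is why nit and nfev differ.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# One BFGS iteration: a search direction plus a line search along it.
res = minimize(rosen, x0, jac=rosen_der, method="BFGS",
               options={"maxiter": 1})

print(res.nit)    # iterations performed
print(res.nfev)   # objective evaluations used, including the line search
print(res.x)      # iterate after the single step

# To take another step, restart from res.x. Note that a restart discards
# the inverse-Hessian approximation BFGS has built up so far.
res2 = minimize(rosen, res.x, jac=rosen_der, method="BFGS",
                options={"maxiter": 1})
```

The result reports success=False because the iteration limit was hit, which is expected here.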
I am interested in finding optimized parameters of a model (by minimizing the difference between the model's output and the known value). The parameters have bounds, and they are also constrained by an inequality of the form 1 - sum(x_par) >= 0, where x_par is a list of some of the parameters out of the total parameter list. I have used scipy.optimize.minimize with different methods (such as COBYLA and SLSQP), but the fitting performance is quite poor and the error is generally above 50%.
I have noticed that scipy.optimize.curve_fit and scipy.optimize.differential_evolution work very well in terms of fitting the given values, but these functions do not allow constraints on parameters. I am looking for an alternative in python to optimize my problem that allows constraining parameters and can do a better job in fitting the given curve/values than scipy.optimize.minimize.
You might find lmfit useful. This module is a wrapper around many of the scipy.optimize routines (including leastsq, differential_evolution, and most of the scalar minimizers) that replaces all variables with Parameter objects. These can be fixed or free, have bounds applied, or be constrained as mathematical expressions of other Parameters, all independent of the method used to solve the minimization problem. There is also a Model class to support many curve-fitting problems, and support for improved analysis of confidence intervals for parameters.
With some care, inequality constraints can be applied, as is discussed briefly at
http://lmfit.github.io/lmfit-py/constraints.html#using-inequality-constraints .
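A common trick for turning an inequality such as 1 - (a + b) >= 0 into a simple bound is to fit an auxiliary parameter delta = a + b, bound it by 1, and derive b = delta - a. The sketch below illustrates this with plain SciPy rather than lmfit. The two-component decay model, the synthetic data, and all names are hypothetical. Note that the derived parameter b gets no bounds of its own in this form.

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, a, b):
    # Hypothetical two-component decay model, used only for illustration.
    return a * np.exp(-t) + b * np.exp(-3.0 * t)

t = np.linspace(0.0, 4.0, 50)
y = model(t, 0.3, 0.5)          # synthetic, noise-free data with a + b = 0.8

def residuals(p):
    a, delta = p
    b = delta - a               # b is derived, so a + b == delta always holds
    return model(t, a, b) - y

# Bounding delta to [0, 1] enforces 1 - (a + b) >= 0 for any solver
# that supports simple box bounds.
fit = least_squares(residuals, x0=[0.5, 0.5],
                    bounds=([0.0, 0.0], [1.0, 1.0]))
a_fit, delta_fit = fit.x
b_fit = delta_fit - a_fit
print(a_fit, b_fit)
```

Because the constraint is now just a bound, the same parameterization works with bounded global methods such as differential_evolution.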
I have a Stochastic Optimal Control problem that I wish to solve, using some type of Bayesian Simulation based framework. My problem has the following general structure:
s_{t+1} = r s_t (1 - s_t) - x_{t+1} + \epsilon_{t+1}
x_{t+1} ~ Beta(u_{t+1}, w_{t+1})
u_{t+1} = f_1(u_t, w_t, s_t, x_t)
w_{t+1} = f_2(u_t, w_t, s_t, x_t)
\epsilon_t ~ Normal(0, \sigma)
objective function: \max_{x_t} E[\sum_{t=0}^{T} V(s_t, x_t, c) \rho^t]
My goal is to explore different functional forms of f_1, f_2, and V to determine how this model differs w.r.t a non-stochastic model and another simpler stochastic model.
The state variables are s_t and the control variables are x_t, with u_t and w_t representing some belief about the current state. The objective is to maximize the expected discounted sum of gains (function V) over the time period t=0 to t=T.
I was thinking of using Python, specifically PyMC, to solve this, though I am not sure how to proceed, in particular how to optimize the control variables. I found a book published in 1967, Optimization of Stochastic Systems by Masanao Aoki, that references some Bayesian techniques that may be useful. Is there a current Python implementation that may help? Or is there a much better way to simulate an optimal path using Python?
My first guess is to try neural network packages like Chainer or Theano, which can track the derivative of your cost function with respect to the control function parameters; they also come with a number of plug-in optimization routines. You can use numpy.random to generate samples (particles), compose your control functions from the libraries' components, and run them through an explicit Euler scheme as a first try. This gives you the cost function on your particles and its derivative with respect to the parameters, which can be fed to the optimizers.
One issue that can arise here is that the solver's iterations will create a host of derivative-tracking objects.
update: Please see this example on Github
There are also a number of hits on GitHub for the keywords "particle filter python":
https://github.com/strohel/PyBayes
https://github.com/jerkern/pyParticleEst
Also there is a manuscript around which mentions that the author implemented filters in Python, so you might want to contact them.
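The particle-simulation part of this suggestion can be sketched with NumPy alone. The sketch leaves out automatic differentiation and only estimates the objective by Monte Carlo for given starting beliefs (u0, w0). The forms of f1, f2, and V, all numerical values, and the clipping of s_t to [0, 1] are placeholder assumptions made for illustration, since the question leaves them open.

```python
import numpy as np

# The functional forms below are placeholders, not the ones from the question.
def f1(u, w, s, x):
    return u                      # hypothetical: belief parameter u stays fixed

def f2(u, w, s, x):
    return w                      # hypothetical: belief parameter w stays fixed

def V(s, x, c):
    return s * x - c * x**2       # hypothetical gain function

def expected_gain(u0, w0, T=20, n_paths=5000, r=2.5, sigma=0.02,
                  rho=0.95, c=0.5, s0=0.5, seed=0):
    """Monte Carlo estimate of E[sum_t rho^t V(s_t, x_t, c)]."""
    rng = np.random.default_rng(seed)
    s = np.full(n_paths, s0)
    u = np.full(n_paths, float(u0))
    w = np.full(n_paths, float(w0))
    x = rng.beta(u, w)                        # x_0
    total = V(s, x, c)                        # t = 0 term (rho**0 == 1)
    for t in range(1, T + 1):
        # Both updates use the previous step's values (tuple assignment).
        u, w = f1(u, w, s, x), f2(u, w, s, x)
        x = rng.beta(u, w)                    # x_t ~ Beta(u_t, w_t)
        eps = rng.normal(0.0, sigma, size=n_paths)
        # Clipping keeps the state in [0, 1]; this is an added assumption.
        s = np.clip(r * s * (1.0 - s) - x + eps, 0.0, 1.0)
        total = total + rho**t * V(s, x, c)
    return total.mean()

for u0, w0 in [(1.0, 9.0), (2.0, 8.0), (3.0, 7.0)]:
    print(u0, w0, expected_gain(u0, w0))
```

Using the same seed for every candidate gives common random numbers, so differences between candidates are less noisy. The function can then be handed to a derivative-free optimizer, or rewritten in an autodiff framework as described above.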
Is there a way to pass a dataset of x,y pairs to a function that returns a list of fitted curve models and their coefficients? The program DataFit does this with about 200 different models, ranging from exponential to inverse polynomial and so on, but we are looking for a Pythonic way.
I have seen many posts of manually using scipy to type each model, but this is not feasible for the number of models we want to test.
The closest I found was pyeq2, but this is not returning the list of functions, and seems to be a rabbit hole to code for.
If R has this available, we could use that, but Python is really the goal.
Below is an example of the data; we want to find the best way to describe this curve.
You can try the splines library in R. I have used it for higher-order curve fitting on some univariate data. You can adjust the fit and compare the corresponding R^2 values to achieve something similar.
You can do one of the following:
Choose a model and fit its parameters. This model should be based on a single independent variable. This can be done with the curve_fit function from Python's scipy.optimize. You can choose something like a hyperbola.
Choose a model that is complex and likely represents an underlying mechanism at work, like the system of ODEs from an SIR disease model. Fitting the parameters will be no easy task. This is done with Markov Chain Monte Carlo (MCMC) methods. This is VERY difficult.
Realise that you have data and can use machine learning via scikit-learn to predict from your data. This is a method that doesn't require you to choose a parametric model.
Machine learning and neural networks don't fit an explicit model and can't really tell you about the underlying mechanism, but they can make predictions just as a best-fit model would, dare I say even better.
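The first option can be automated over a list of candidate forms. The sketch below loops curve_fit over a small, hypothetical model library and ranks the fits by R^2. The four models, the synthetic data, and the helper name rank_models are assumptions for illustration; a real library would need many more forms and sensible starting values for each.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate library; extend with as many forms as needed.
MODELS = {
    "linear":      lambda x, a, b: a * x + b,
    "quadratic":   lambda x, a, b, c: a * x**2 + b * x + c,
    "exponential": lambda x, a, b: a * np.exp(b * x),
    "hyperbola":   lambda x, a, b: a / x + b,
}

def rank_models(x, y, models=MODELS):
    """Fit every candidate and return (name, r_squared, params), best first."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    results = []
    for name, func in models.items():
        try:
            params, _ = curve_fit(func, x, y, maxfev=10000)
        except RuntimeError:
            continue  # fit did not converge; skip this candidate
        ss_res = np.sum((y - func(x, *params)) ** 2)
        results.append((name, 1.0 - ss_res / ss_tot, params))
    return sorted(results, key=lambda r: r[1], reverse=True)

x = np.linspace(1.0, 5.0, 30)
y = 2.0 * np.exp(0.5 * x)      # synthetic data, for illustration only
for name, r2, params in rank_models(x, y):
    print(f"{name:12s} R^2 = {r2:.4f}  params = {np.round(params, 3)}")
```

Plain R^2 favours models with more parameters, so with noisy data a penalized score such as AIC may be a fairer way to rank.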
In the end, we found that Eureqa software was able to achieve this. https://www.nutonian.com/products/eureqa/
I'm a beginner to using statsmodels & I'm also open to using other Python based methods of solving my problem:
I have a data set with ~ 85 features some of which are highly correlated.
When I run the OLS method I get a helpful 'strong multicollinearity problems' warning as I might expect.
I've previously run this data through Weka, which as part of the regression classifier has an eliminateColinearAttributes option.
How can I do the same thing, that is, get the model to choose which attributes to use instead of having them all in the model?
Thanks!
Note that scipy.stats.linregress handles only a single explanatory variable, so for multivariate regression stay with statsmodels OLS (or use numpy.linalg.lstsq). Check out this nice example which has a good explanation.
The eliminateColinearAttributes option in the software you've mentioned is just one algorithm that Weka implements to fight the problem. Here, you need to implement an iterative algorithm yourself: among the highly correlated variables, eliminate the one with the highest p-value, then run the regression again and repeat until the multicollinearity is gone.
There is no single right way to do this; there are different techniques. It is also good practice to choose manually which variable to omit from each set of highly correlated variables, so that the choice also makes sense for your problem.
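One such iterative scheme can be sketched with NumPy alone. Note that this swaps in a different criterion from the p-value rule: it repeatedly drops the feature with the largest variance inflation factor (VIF), computed as the diagonal of the inverse correlation matrix. The threshold of 10, the synthetic data, and the helper name drop_collinear are assumptions for illustration.

```python
import numpy as np

def drop_collinear(X, names, vif_threshold=10.0):
    """Iteratively drop the column with the largest VIF above the threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        corr = np.corrcoef(X[:, keep], rowvar=False)
        vif = np.diag(np.linalg.inv(corr))   # VIF_j = 1 / (1 - R_j^2)
        worst = int(np.argmax(vif))
        if vif[worst] <= vif_threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)   # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
print(drop_collinear(X, ["x1", "x2", "x3"]))
```

If two columns are exactly collinear the correlation matrix is singular and the inverse fails, so exact duplicates should be removed first.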