Bayesian Stochastic Optimal Control, MCMC - python

I have a Stochastic Optimal Control problem that I wish to solve, using some type of Bayesian Simulation based framework. My problem has the following general structure:
s_{t+1} = r * s_t * (1 - s_t) - x_{t+1} + \epsilon_{t+1}
x_{t+1} ~ Beta(u_{t+1}, w_{t+1})
u_{t+1} = f_1(u_t, w_t, s_t, x_t)
w_{t+1} = f_2(u_t, w_t, s_t, x_t)
\epsilon_t ~ Normal(0, \sigma)
objective function: \max_{x_t} E[\sum_{t=0}^{T} V(s_t, x_t, c) \rho^t]
My goal is to explore different functional forms of f_1, f_2, and V to determine how this model differs from a non-stochastic model and from another, simpler stochastic model.
The state variable is s_t, the control variable is x_t, and u_t and w_t represent a belief about the current state. The objective is the expected discounted sum of gains (function V) over the time period t = 0 to t = T.
I was thinking of using Python, specifically PyMC, to solve this, though I am not sure how to proceed, in particular how to optimize the control variables. I found a 1967 book, Optimization of Stochastic Systems by Masanao Aoki, that references some Bayesian techniques that may be useful. Is there a current Python implementation that may help? Or is there a much better way to simulate an optimal path using Python?

The first guess that comes to mind is to try neural-network packages like Chainer or Theano, which can track the derivative of your cost function with respect to the control function's parameters; they also ship with a number of plug-in optimization routines. You can use numpy.random to generate samples (particles), compose your control functions from the library's components, and run them through an explicit Euler scheme as a first try. This gives you the cost function on your particles and its derivative with respect to the parameters, which can be fed to the optimizers.
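As a rough illustration of that recipe, here is a minimal sketch in plain NumPy/SciPy rather than an autodiff library. It replaces the Beta-distributed control with a deterministic parametric feedback policy, fixes the noise once (common random numbers) so the objective is deterministic, and uses illustrative constants and an illustrative gain function; all of these are assumptions, not part of the original problem:

import numpy as np
from scipy.optimize import minimize

# illustrative constants (assumptions, not from the question)
r, sigma, rho, T, n_particles = 3.7, 0.05, 0.95, 50, 1000

rng = np.random.default_rng(0)
eps = sigma * rng.standard_normal((T, n_particles))  # fixed noise (CRN)

def V(s, x):
    # placeholder gain function; swap in the forms you want to study
    return np.log1p(np.maximum(x, 0.0)) - 0.1 * (s - 0.5) ** 2

def policy(s, params):
    # hypothetical sigmoid feedback rule standing in for the Beta control
    a, b, c = params
    return a / (1.0 + np.exp(-(b * s + c)))

def neg_expected_gain(params):
    s = np.full(n_particles, 0.5)            # assumed initial state
    total = np.zeros(n_particles)
    for t in range(T):
        x = policy(s, params)
        total += rho ** t * V(s, x)
        s = r * s * (1.0 - s) - x + eps[t]   # dynamics from the question
    return -total.mean()                     # Monte Carlo estimate

res = minimize(neg_expected_gain, x0=[0.1, 1.0, 0.0], method="Nelder-Mead")
print(res.x, -res.fun)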
The issue that can arise here is that the solver's iterations will create a host of derivative-tracking objects.
Update: please see this example on GitHub.
There are also a number of hits on GitHub for the keywords "particle filter python":
https://github.com/strohel/PyBayes
https://github.com/jerkern/pyParticleEst
There is also a manuscript around whose author mentions having implemented such filters in Python, so you might want to contact them.

Related

TensorFlow MCMC doesn't evolve chain states

I'm fairly new to TensorFlow and MCMC in general. I'm doing a few basic calculations with different models; the most basic model converges without problems and gives good results from the MCMC calculation. However, when I use a more advanced model, I have a problem where the chain states never evolve from the initial state.
I'm calling the sampler via this code:
nkernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=_tf_lnlike,
    num_leapfrog_steps=5,
    step_size=0.1)
adapt_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    inner_kernel=nkernel,
    num_adaptation_steps=num_burnin_steps,
    target_accept_prob=0.75)
chains_states = tfp.mcmc.sample_chain(
    num_results=nresults,
    num_burnin_steps=num_burnin_steps,
    current_state=initial_state,
    kernel=adapt_kernel,
    trace_fn=None)
The likelihood function looks like this:
@tf.function
def _tf_lnlike(theta):
    y0 = tf.tensordot(tf.ones(theta.shape[0], dtype=dtype), data, axes=0)
    y0_err = tf.tensordot(tf.ones(theta.shape[0], dtype=dtype), data_err, axes=0)
    y_model = _tf_model(theta)
    return tf.math.reduce_sum(-0.5 * ((y_model - y0) / y0_err) ** 2, axis=1)
where _tf_model is a rather complex function (so I won't post it here). This is essentially trying to fit some input data (which are tf.constant). The first thing I checked was the gradients, which had inf or nan values coming from _tf_model. The simplest fix I could think of was to write a very simple numerical-gradient function into the likelihood function, since the model is not analytically differentiable. _tf_lnlike now returns reasonable gradients, but I still have the same problem with the sampler. Honestly, I'm not familiar enough with tf to even diagnose why it's not working, so suggestions for troubleshooting would be appreciated!
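For reference, one way to wire a numerical gradient into a TensorFlow function is tf.custom_gradient with central differences. This is only a sketch of that idea: _tf_lnlike_raw stands in for the original likelihood body, and the step size h is an assumption:

import tensorflow as tf

h = 1e-5  # assumed finite-difference step size

@tf.custom_gradient
def _tf_lnlike(theta):
    value = _tf_lnlike_raw(theta)  # the original (non-differentiable) body

    def grad(dy):
        n = theta.shape[-1]
        eye = tf.eye(n, dtype=theta.dtype)
        # central difference along each parameter axis
        cols = [(_tf_lnlike_raw(theta + h * eye[i])
                 - _tf_lnlike_raw(theta - h * eye[i])) / (2.0 * h)
                for i in range(n)]
        jac = tf.stack(cols, axis=-1)     # shape [batch, n]
        return dy[..., tf.newaxis] * jac  # chain rule with upstream dy
    return value, grad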
Edit: after some playing around it seems to be related to whether or not the model function calls tf.reduce_sum at any point.
It's hard to say much without knowing what's inside _tf_model. If it produces inf or nan values or gradients, that can be troublesome, as you've already seen. But also, if the curvature (second-order derivatives) of the likelihood is very sharp, the log-likelihood can be extremely sensitive to any move, so that every proposal is rejected. Are there any constraints on theta (must be positive, etc.)? If so, you may want to use TransformedTransitionKernel to enforce them.
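For example, here is a sketch of the usual wrapping pattern, assuming theta must be positive (the Exp bijector is an assumption about your constraint; nkernel and num_burnin_steps are from your code):

import tensorflow_probability as tfp
tfb = tfp.bijectors

# sample in unconstrained space; Exp() maps proposals back to theta > 0
constrained_kernel = tfp.mcmc.TransformedTransitionKernel(
    inner_kernel=nkernel,
    bijector=tfb.Exp())
adapt_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    inner_kernel=constrained_kernel,
    num_adaptation_steps=num_burnin_steps,
    target_accept_prob=0.75)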

Why does the pendulum in OpenAI Gym have cos and sin features? Why not just use one of them?

Why does the pendulum have cos and sin features? Can I just use one of them? Or can I use theta (the angle) instead?
I expect some explanation for this XD; intuitive or theoretical ones are all welcome.
The angles (thetas) are passed through the sin() and cos() functions so that the observations lie in the range [-1, 1]. This fixed range of [-1, 1] helps stabilise training of the neural networks, which has been explained well here.
You could even use only sin() or only cos() as your observation. The reason (which I can think of) for using both sin() and cos() is to give more information about the state: together they encode the angle uniquely, whereas sin(theta) alone is ambiguous (sin(theta) = sin(pi - theta)), and both are continuous at the +/-pi wrap-around, unlike theta itself. Using both may also lead to faster convergence.
But normalisation of the inputs is necessary, so you cannot just use the raw angles as your state observations for training.
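As a small illustration (assuming the standard Pendulum observation layout [cos(theta), sin(theta), theta_dot], which is an assumption about the environment version), both components together recover the angle unambiguously:

import numpy as np

def obs_to_angle(obs):
    # Pendulum observations are [cos(theta), sin(theta), theta_dot];
    # arctan2 needs both components to pin theta down in (-pi, pi].
    cos_th, sin_th, theta_dot = obs
    return np.arctan2(sin_th, cos_th), theta_dot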
Edit: answer to the comment by @CHEN TIANRONG
I ran DDPG with just sin() and theta_dot in one experiment, and with sin(), cos(), and theta_dot in another. The agent never learns the task in the first experiment.
I guess the use of both sin() and cos() was settled experimentally.
You can find the code I used for the experiments here.
Improving the rate of convergence of neural networks for RL agents is an active area of research. You could search for algorithms that are sample efficient, for example Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion and Sample Efficient Actor-Critic with Experience Replay.

Theoretical underpinning behind Hardmax operator

In the TensorFlow GitHub repository, in the file attention_wrapper.py, a hardmax operator is defined. It is documented as tf.contrib.seq2seq.hardmax.
I want to know the theoretical underpinning behind providing this hardmax operator. Google searches over the past few weeks haven't led me to a concrete understanding of the concept.
If softmax is differentiable (soft), why would hardmax ever be used? If it can't be used in backpropagation (because the gradient calculation requires differentiability), where else can it be used?
The reinforcement learning literature talks about soft vs. hard attention. However, I couldn't find concrete examples or explanations of where tf.contrib.seq2seq.hardmax can actually be used in an RL model.
By the looks of it, since it is part of seq2seq, it should have some application in natural language processing. But exactly where? There are tonnes of NLP tasks, and I couldn't find a SOTA algorithm for any of them that uses hardmax.
Hardmax is used when you have no choice but to make a decision non-probabilistically. For example, when you use a model to generate a neural architecture, as in neural module networks, you have to make a discrete choice. To make this trainable (since it is non-differentiable, as you state), you can use REINFORCE (an algorithm in RL) to train via the policy gradient, estimating this loss contribution with Monte Carlo sampling. Neural module networks are an NLP construct and depend on seq2seq. I'm sure there are many examples, but this is the one that immediately came to mind.
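To make the contrast concrete, here is a minimal sketch of what a hardmax computes (a one-hot vector at the argmax) next to softmax's smooth distribution; this is illustrative, not the tf.contrib source:

import tensorflow as tf

def hardmax(logits):
    # one-hot at the argmax: a hard, non-differentiable decision
    depth = tf.shape(logits)[-1]
    return tf.one_hot(tf.argmax(logits, axis=-1), depth, dtype=logits.dtype)

logits = tf.constant([[1.0, 3.0, 2.0]])
print(hardmax(logits))        # [[0. 1. 0.]]
print(tf.nn.softmax(logits))  # approximately [[0.09 0.67 0.24]]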

Neural network and genetic algorithm

I am working with a complex system. The system has five input variables, and depending on their values, the response of the system is measured; seven output variables are measured in order to completely define the response.
I have been using an artificial neural network to model the relationship between the five input variables and the seven output parameters. This has been successful so far: the ANN predicts the outputs really well (I have also tested the trained network on a validation set of test cases). I used Python with Keras/TensorFlow for this.
By the way, I also tried linear regression as the function approximator, but it produces large errors. These errors are expected, considering that the system is highly non-linear and may not be continuous everywhere.
Now I would like to predict the values of the five input variables from a vector of the seven output parameters (the target vector). I tried using a genetic algorithm for this. After a lot of effort designing the GA, I still end up with large differences between the target vector and the GA's prediction. I simply try to minimize the mean squared error between the ANN prediction (the function approximator) and the target vector.
Is this the right approach, using the ANN as a function approximator and the GA for design-space exploration?
Yes, it is a good approach to do search-space exploration with a GA, but the design of the crossover, mutation, generation-evolution logic, etc. plays a major role in determining the performance of the genetic algorithm.
If your search space is limited, you can use exact methods, which solve to optimality.
There are a few implementations in python-scipy itself (see the sketch after the list below).
If you prefer to go with meta-heuristics, there is a wide range of options other than genetic algorithms:
Memetic algorithm
Tabu Search
Simulated annealing
Particle swarm optimization
Ant colony optimization
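As a concrete starting point, scipy.optimize.differential_evolution is a GA-like meta-heuristic that is easy to point at this inversion problem. Below is a minimal sketch; the trained Keras model ann (5 inputs to 7 outputs), the 7-element target vector, and the [0, 1] bounds are all assumptions for illustration:

import numpy as np
from scipy.optimize import differential_evolution

# `ann` (trained Keras model) and `target` (7-vector) are assumed to exist
def mse_to_target(x):
    # distance between the ANN's prediction at candidate inputs x
    # and the desired output vector
    pred = ann.predict(x.reshape(1, -1), verbose=0)[0]
    return float(np.mean((pred - target) ** 2))

bounds = [(0.0, 1.0)] * 5  # one (low, high) pair per input variable
result = differential_evolution(mse_to_target, bounds, seed=0, maxiter=200)
print(result.x, result.fun)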

Minimizing Functional with Python

I have some functional, such as S[f] = \int_\Omega f^2(x) dx. If you're familiar with physics, it's the action. This object takes in a function defined on a certain domain \Omega and gives you a number; the mathematical jargon for such an object is a functional.
Now I need to minimize this thing with respect to f. I know SciPy has an optimize package that allows one to minimize multivariable functions, but I am curious whether there is a better way, considering that I would be minimizing over ~10,000 variables (because the functions are essentially just lists of 10,000 numbers).
Do I have any other options?
You could use symbolic regression to find the function.
There are several packages available:
deap
glyph
gplearn
monkeys
Here is a good paper on symbolic regression by Schmidt and Lipson.
Although it is designed more for neural-network work, TensorFlow sounds like it would work for you: it can differentiate vector expressions and optimize them using gradient descent.
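A minimal sketch of that idea, assuming a 1-D domain [0, 1] discretized on a uniform grid; the endpoint penalty is purely illustrative, since the unconstrained minimum of \int f^2 dx is just f = 0:

import tensorflow as tf

n = 10_000                  # grid points: f is just a vector of values
dx = 1.0 / n
f = tf.Variable(tf.random.normal([n]))
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(1_000):
    with tf.GradientTape() as tape:
        action = tf.reduce_sum(f ** 2) * dx        # S[f] ~ sum_i f_i^2 * dx
        penalty = f[0] ** 2 + (f[-1] - 1.0) ** 2   # pin f(0) = 0, f(1) = 1
        loss = action + penalty
    grads = tape.gradient(loss, [f])
    opt.apply_gradients(zip(grads, [f]))

print(float(loss))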
