I am working with a complex system that has five input variables; depending on the values of these five variables, the response of the system is measured. Seven output variables are measured in order to completely define the response.
I have been using an artificial neural network (ANN) to model the relationship between the five input variables and the seven output parameters. This has been successful so far: the ANN predicts the outputs really well (I have also tested the trained network on a validation set of test cases). I used Python with Keras/TensorFlow for this.
By the way, I also tried linear regression as a function approximator, but it produces large errors. These errors are expected, considering that the system is highly non-linear and may not be continuous everywhere.
Now I would like to predict the values of the five input variables from a vector of the seven output parameters (the target vector). I tried using a genetic algorithm (GA) for this. After a lot of effort in designing the GA, I still end up with large differences between the target vector and the GA prediction. I simply try to minimize the mean squared error between the ANN prediction (function approximator) and the target vector.
Is this the right approach: using an ANN as the function approximator and a GA for design space exploration?
Yes, it is a good approach to do search space exploration with a GA. However, the design of the crossover, mutation, and generation-evolution logic plays a major role in determining the performance of the genetic algorithm.
If your search space is limited, you can use exact methods (which solve to optimality).
There are a few implementations in Python's scipy itself.
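For instance, here is a minimal sketch of that idea using scipy.optimize.differential_evolution (one of scipy's global optimizers; scipy.optimize.brute does an exhaustive grid search if the space is small enough). The model file name, the bounds on the five inputs, and the target vector are hypothetical placeholders, not taken from your setup:

# Minimal sketch: invert the trained ANN by minimizing the MSE between its
# prediction and the target vector. File name, bounds, and target are hypothetical.
import numpy as np
from scipy.optimize import differential_evolution
from tensorflow import keras

model = keras.models.load_model("ann_model.h5")            # trained 5-input / 7-output ANN
target = np.array([1.0, 0.5, 0.2, 3.0, 0.8, 1.5, 2.0])     # example target vector (7 outputs)
bounds = [(0.0, 1.0)] * 5                                   # example bounds for the 5 design variables

def mse_to_target(x):
    """MSE between the ANN prediction for design x and the target vector."""
    pred = model.predict(x.reshape(1, -1), verbose=0)[0]
    return float(np.mean((pred - target) ** 2))

result = differential_evolution(mse_to_target, bounds, seed=0, maxiter=200)
print(result.x, result.fun)                                  # candidate design and achieved MSE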
If you prefer to go with meta-heuristics, there is a wide range of options other than the genetic algorithm:
Memetic algorithm
Tabu Search
Simulated annealing
Particle swarm optimization
Ant colony optimization
I have a reinforcement learning problem where the optimal policy does not depend on the next state (i.e. gamma equals 0). I think this means that I only need an efficient exploration algorithm.
I know that contextual bandits are specialized for this situation, except that they only work for discrete action spaces, and I still need my policy network to make complex decisions (training a deep neural network, whereas most contextual bandit algorithms I found learn linear policies).
Therefore I am looking for algorithms, or ideally a Python library, that solve RL for continuous action spaces when gamma = 0.
Many Thanks,
I have some data (from sensors, etc.) from an energy system. Consider that the x-axis is temperature and the y-axis is energy consumption. Suppose we just have the data and we don't have access to the mathematical formulation of the problem:
[Figure: energy consumption vs. temperature curve]
In the above figure, it is absolutely obvious that the optimum point is at 20. I want to predict this optimum point using ML or DL models. Based on the courses I have taken, I know that this is a supervised regression problem; however, I don't know how to do optimization on this kind of problem.
I don't want you to write code for this problem; I just want some hints and instructions on how to approach this optimization problem.
Also, if you can recommend any references or courses for learning how to predict the optimum point of a supervised regression problem without knowing the mathematical formulation, I would welcome them.
There are lots of things you can try when it comes to optimizing your model, for example fine-tuning it. With fine-tuning, you try the different options a model offers and look for the smallest error or highest accuracy based on the actual and predicted data.
With a DecisionTreeRegressor model, you can try different split criteria and limit the minimum number of samples per split and the maximum depth to see which combination gives you the best prediction scores/errors; a rough sketch is given below. For a neural network model using Keras, you can try different optimizers and loss functions, tune your hyperparameters, etc., and try them out in combination.
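As an illustration of that kind of hyperparameter search, here is a minimal sketch with scikit-learn's GridSearchCV; the arrays X and y are hypothetical stand-ins for your temperature/consumption data, and the available criterion names depend on your scikit-learn version:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-ins for the sensor data: temperature vs. energy consumption.
X = np.linspace(0, 40, 200).reshape(-1, 1)
y = (X.ravel() - 20.0) ** 2 + np.random.normal(0, 5, X.shape[0])

param_grid = {
    "criterion": ["squared_error", "absolute_error"],   # names vary with sklearn version
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_, -search.best_score_)          # best settings and their CV MSE

# Once a model is chosen, the "optimum point" can be read off its predictions on a fine grid.
grid = np.linspace(0, 40, 401).reshape(-1, 1)
print(grid[np.argmin(search.best_estimator_.predict(grid))])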
As for resources, you can go to Google, YouTube, and other platforms with keywords such as "fine tuning DNN model", and a lot of references will pop up. The bottom line is that you will need to try different models and fine-tune them until you are satisfied with your results. The results depend on your judgement and there is no single right or wrong answer (i.e. errors are always there); it is completely up to you how you reach your solution with the handful of ML and DL models available. My advice is to spend more time getting your hands dirty. It will be worth it in the long run. HFGL!
I am struggling to implement the following relation in Python, which holds by the law of large numbers:

E(X_t) = lim_{J -> infinity} Sigma_{j=1}^{J} ANN(X_t, N, \theta_j, j)

where ANN stands for artificial neural network.
I have created a sample from which I draw several subsamples. I want to feed one subsample at a time, incrementally, to train a neural network. That implies I will have a neural network for each subsample:
ANN(X_t, N, \theta_1, 1) + ANN(X_t, N, \theta_2, 2) + ...
And each needs to be incorporated in a sum.
However, I have no idea how to implement this, since I would need to store not the values but the neural network itself after each computation. Are there any references on how to solve a problem of this kind? I have looked at the recurrent neural networks implemented in Python, namely the LSTM, but that does not "store" each neural network; furthermore, it selects the variables that are most meaningful across time.
Thanks in advance.
By invoking (artificial) neural networks and the Central Limit Theorem you touch on quite a few concepts. Let me try to elaborate on these concepts before suggesting a solution.
First, the fact that

lim_{J -> infinity} (1/J) Sigma_{j=1}^{J} X_j = E(X)

holds P-almost surely for a family of random variables X_1, X_2, ... that are iid (independent and identically distributed) like the random variable X is called the Strong Law of Large Numbers (LLN). In contrast, the Central Limit Theorem (CLT) refers to the limiting distribution (as the name suggests), which is Gaussian. Both theorems require proper scaling, namely (1/J) Sigma_{j=1}^{J} X_j for the LLN and (1/sqrt(J)) Sigma_{j=1}^{J} (X_j - E(X)) for the CLT, respectively. Both theorems allow approximation through a finite sum of up to J summands, which is what you attempt. However, equality is then lost and approximate equality, i.e. ≈, is appropriate. Moreover, there is no normalization in your summation, which will cause the sum to diverge. Note that the limits hold for certain functions being applied to X; you assume that this function is ANN(X_t, N, Θ, j).
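To see the role of the 1/J normalization concretely, here is a tiny numpy illustration (the exponential distribution and the sample sizes are arbitrary choices for demonstration, not taken from your setup): the sample mean stabilizes near E(X) while the unnormalized sum keeps growing.

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # iid samples with E[X] = 2
for J in (10, 1_000, 100_000):
    print(J, x[:J].mean(), x[:J].sum())        # mean stays near 2, sum grows roughly like 2*J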
Second, the (artificial) neural network. Like any statistical model, a neural network takes a data input X, hyperparameters that determine the network architecture (e.g. the depth and size of the layers involved), which might be N in your case, and a parameter vector Θ. The latter is only obtained after the model has been trained on data. In turn, I'd interpret your function
def ANN(X_t, N, Θ)
as the inference function that reconstructs a previously trained neural network from the hyperparameter value N and the parameter vector Θ and applies it to the current data input X_t. However, you don't clarify what the input j is. j and Θ_j seem to suggest a recurrent neural network (RNN); an LSTM is a special type of RNN. However, it is unclear what the inputs actually are, as you leave this vague. RNNs are used on speech, text, and numeric time-series data. This is further complicated by the fact that X_t appears on the left-hand side inside the expectation and on the right-hand side as the input to the neural network.
Finally, the suggested solution. If the ANNs are in fact independent and you meant to write E(Y), then your equation vaguely describes ensemble learning: several neural networks (of the same architecture) are trained on the same dataset and their predictions are averaged (not summed) to obtain a more accurate prediction of the expectation of Y. If, on the other hand, you do describe RNNs, the equation above for E(X) vaguely describes convergence of non-independent random variables, as X_{t+1} and Θ_{t+1} depend on the previous X_t's and Θ_t's. Intuitively, you are trying to show that the output of an RNN converges to some numeric value when applied iteratively. Mathematically speaking, there are LLN-like results for non-iid random variables, but they impose other very specific assumptions, e.g. on the type of dependence.
Regarding storing neural networks: you could implement your own ANN program, which is a lot of work (as it requires training and inference functions). Alternatively, virtually every deep learning framework in Python allows storing/loading a parameter vector Θ, which would let you implement your procedure regardless of the mathematical meaning you'd like to derive from it. In Keras, for example, a model can be saved via
model.save(PARAMETER_PATH)
and later re-loaded via
keras.models.load_model(PARAMETER_PATH)
see the Keras reference documentation. Similar methods exist for PyTorch, another very popular deep learning framework in Python.
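Putting the two points together (ensemble-style averaging and saving each trained network), here is a minimal sketch; the data, the architecture, and the file names are hypothetical placeholders, and it is only meant to show the storing/averaging mechanics, not to settle the mathematical question above.

import numpy as np
from tensorflow import keras

# Hypothetical data: 1000 observations with 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ rng.normal(size=(4,)) + rng.normal(scale=0.1, size=1000)

# Train one small network per subsample and save each one to disk.
paths = []
for j, idx in enumerate(np.array_split(np.arange(len(X)), 5)):
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X[idx], y[idx], epochs=20, verbose=0)
    path = f"ann_{j}.h5"
    model.save(path)                       # store the whole trained network
    paths.append(path)

# Later: reload every stored network and average (not sum) their predictions.
models = [keras.models.load_model(p) for p in paths]
preds = np.stack([m.predict(X, verbose=0).ravel() for m in models])
ensemble_prediction = preds.mean(axis=0)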
I already asked this question here: Can Convolutional Neural Networks (CNN) be represented by a Mathematical formula? But I feel that I was not clear enough, and the proposed idea did not work for me.
Let's say that, using my computer, I train a certain machine learning algorithm (e.g. naive Bayes, decision tree, linear regression, or another). So I have a trained model to which I can give an input value and it returns the prediction result (e.g. 1 or 0).
Now, let's say that I still want to give an input and get a predicted output. However, this time I would like my input value to be, for example, multiplied by some sort of mathematical formula, weights, or matrix that represents my "trained model".
In other words, I would like my trained model to be "transformed" into some sort of formula to which I can give an input and get the predicted number.
The reason I want to do this is that I want to train on a big dataset with a complex prediction model and then use the trained prediction model on simpler hardware such as a PIC32 microcontroller. The PIC32 microcontroller would not train the machine learning model or store all the inputs. Instead, the microcontroller would simply read certain numbers from the system, apply a mathematical formula or some sort of matrix multiplication, and give me the predicted output. With that, I could use "fancy" neural networks on much simpler devices that can easily evaluate mathematical formulas.
If I read this properly, you want a generally continuous function in many variables to replace a CNN. The central point of a CNN existing in a world with ANNs ("normal" neural networks) is that it includes irruptive transformations: non-linearities, discontinuities, etc., that enable the CNN to develop recognitions and relationships that simple linear combinations, such as matrix multiplication, cannot handle.
If you want to understand this better, I recommend that you choose an introduction to deep learning and CNNs in whatever presentation mode fits your learning style.
Essentially, every machine learning algorithm is a parameterized formula, with your trained model being the learned parameters that are applied to the input.
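For example, here is a minimal sketch (a hypothetical one-hidden-layer network with a tanh activation and made-up weights) of what such a "formula" looks like once the learned parameters are written out explicitly; the non-linearity in the middle is exactly what a single matrix multiplication cannot provide.

import numpy as np

# W1, b1, W2, b2 stand for weights exported from a trained model; the values are made up.
W1 = np.array([[0.5, -1.2], [0.3, 0.8], [-0.7, 0.1]])   # 3 inputs -> 2 hidden units
b1 = np.array([0.1, -0.2])
W2 = np.array([[1.5], [-0.9]])                           # 2 hidden units -> 1 output
b2 = np.array([0.05])

def predict(x):
    hidden = np.tanh(x @ W1 + b1)    # the non-linearity between the two matrix products
    return hidden @ W2 + b2

print(predict(np.array([0.2, -0.4, 1.0])))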
So what you're actually asking is to simplify arbitrary computations to, more or less, a matrix multiplication. I'm afraid that's mathematically impossible in general. If you ever do come up with a solution to this, make sure to share it - you'd instantly become famous, most likely rich, and put a lot of researchers out of business. If you couldn't train a plain matrix multiplication to the accuracy you want in the first place, what makes you think you can boil down arbitrary "complex prediction models" to such simple computations?
I have a stochastic optimal control problem that I wish to solve using some type of Bayesian, simulation-based framework. My problem has the following general structure:
s_{t+1} = r * s_t * (1 - s_t) - x_{t+1} + epsilon_{t+1}
x_{t+1} ~ Beta(u_{t+1}, w_{t+1})
u_{t+1} = f_1(u_t, w_t, s_t, x_t)
w_{t+1} = f_2(u_t, w_t, s_t, x_t)
epsilon_t ~ Normal(0, sigma)
Objective function: max_{x_t} E( Sigma_{t=0}^{T} V(s_t, x_t, c) * rho^t )
My goal is to explore different functional forms of f_1, f_2, and V to determine how this model differs from a non-stochastic model and from another, simpler stochastic model.
The state variable is s_t and the control variable is x_t, with u_t and w_t representing a belief about the current state. The objective is to maximize the expected discounted sum of gains (function V) over the time period t = 0 to t = T.
I was thinking of using Python, specifically PyMC, to solve this, though I am not sure how to proceed, particularly how to optimize the control variables. I found a book published in 1967, Optimization of Stochastic Systems by Masanao Aoki, that references some Bayesian techniques that may be useful; is there a current Python implementation that might help? Or is there a much better way to simulate an optimal path using Python?
The first guess that comes to my mind is to try neural-network packages like Chainer or Theano, which can track the derivative of your cost function with respect to the control-function parameters; they also come with a bunch of plug-in optimization routines. You can use numpy.random to generate samples (particles), compose your control functions from the libraries' components, and run them through an explicit Euler scheme as a first try. This will give you the cost function on your particles and its derivative with respect to the parameters, which can be fed to the optimizers. A rough sketch of this idea is given below.
One issue that can arise here is that the solver's iterations will create a host of derivative-tracking objects.
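Here is a very rough sketch of that idea, with PyTorch autograd as a stand-in for Chainer/Theano and under strong simplifying assumptions: a deterministic control rule x_t = scale * sigmoid(a*s_t + b) in place of the Beta-distributed control, Gaussian noise only, and made-up values for r, sigma, rho, the horizon, and the gain function V.

import torch

# Made-up constants; V(s, x) = x - c*x^2 is a placeholder gain function.
r, sigma, rho, T, n_particles, c = 3.7, 0.02, 0.95, 30, 256, 0.5
theta = torch.tensor([0.0, 0.0], requires_grad=True)       # parameters of the control rule
opt = torch.optim.Adam([theta], lr=0.05)

for step in range(100):
    s = torch.full((n_particles,), 0.5)                     # initial state for all particles
    total = torch.zeros(n_particles)
    for t in range(T):
        x = 0.2 * torch.sigmoid(theta[0] * s + theta[1])    # simplified control applied at t+1
        eps = sigma * torch.randn(n_particles)              # Gaussian shock
        s = torch.clamp(r * s * (1 - s) - x + eps, 0.0, 1.0)  # explicit time stepping, clamped for stability
        total = total + (rho ** t) * (x - c * x ** 2)        # discounted placeholder gain V
    loss = -total.mean()                                     # maximize the expected discounted gain
    opt.zero_grad()
    loss.backward()
    opt.step()

print(theta.detach(), -loss.item())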
update: Please see this example on Github
There are also a number of hits on GitHub for the keywords "particle filter python":
https://github.com/strohel/PyBayes
https://github.com/jerkern/pyParticleEst
There is also a manuscript around which mentions that the author implemented such filters in Python, so you might want to contact them.