I implemented extreme learning machine (ELM) optimization using particle swarm optimization (PSO): each particle's fitness is the accuracy value obtained from the extreme learning machine method. However, when I iterate, the fitness value of each particle does not change; it just stays at the accuracy value of the extreme learning machine method. By the way, I am using Python.
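For context, the loop structure I am trying to implement looks roughly like this (a simplified sketch, not my exact code; train_elm_and_score stands in for my ELM training and accuracy computation):

import numpy as np

def train_elm_and_score(position):
    # Placeholder: build an ELM from this particle's parameters and return its accuracy
    raise NotImplementedError

n_particles, n_dims, n_iters = 20, 10, 50
pos = np.random.uniform(-1, 1, (n_particles, n_dims))
vel = np.zeros_like(pos)
pbest_pos, pbest_fit = pos.copy(), np.full(n_particles, -np.inf)
gbest_pos, gbest_fit = pos[0].copy(), -np.inf

for it in range(n_iters):
    for i in range(n_particles):
        # The fitness has to be re-evaluated from the particle's current position
        # on every iteration; if it is computed once and cached, it never changes.
        fit = train_elm_and_score(pos[i])
        if fit > pbest_fit[i]:
            pbest_fit[i], pbest_pos[i] = fit, pos[i].copy()
        if fit > gbest_fit:
            gbest_fit, gbest_pos = fit, pos[i].copy()
    r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest_pos - pos) + 1.5 * r2 * (gbest_pos - pos)
    pos = pos + vel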
For regression problems, RMSE (Root Mean Square Error) is often used as the evaluation metric. It is also used as the loss function in linear regression (moreover, minimizing it is equivalent to maximum likelihood estimation when the output is assumed to follow a normal distribution).
In real-life problems, I find MAPE (Mean Absolute Percentage Error) can be more meaningful. For example, when predicting house prices, we are more interested in the relative error: a difference of $100k is not the same if the house is priced around $100k or around $1M.
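As a quick illustration with made-up prices (the numbers are only an example): the same $100k error contributes identically to RMSE but very differently to MAPE.

import numpy as np

y_true = np.array([100_000, 1_000_000])
y_pred = np.array([200_000, 1_100_000])   # both predictions are off by $100k

rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))           # 100,000 for both houses
mape = np.mean(np.abs((y_pred - y_true) / y_true)) * 100  # (100% + 10%) / 2 = 55%
print(rmse, mape)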
When creating a linear regression for a house price prediction problem, I obtained the following graph:
x axis: real value of prices
y axis: relative error = (prediction-real_value) / real_value
The algorithm predicts relatively much higher prices when the real price is low
The algorithm predicts relatively lower prices when the real price is high.
What kind of transformation can we apply in order to obtain a model with more homogeneous relative errors?
Sure, one method of obtaining the maximum likelihood estimator is by gradient descent. In this process, the error between the predicted and actual values is determined, and the gradient of this error with respect to each of the changeable parameters of the model is found. Then, these parameters are tweaked slightly according to the calculated gradients such that the error value would be minimized. This process is repeated until the error converges to a suitably low value.
The great thing about this method is that you have a lot of flexibility in how you define your error or loss function. For instance, the L2 norm (MSE) is often used, but you can also use the L1 norm, a smooth-L1 norm, or any other function.
An error function that yields MAPE would simply divide each error term by the true value's magnitude, thus yielding an error relative to the value's size. The gradients of this error can then be calculated with respect to each parameter and gradient descent carried out as before.
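As a minimal sketch of what that could look like for a linear model (X and y are assumed to be your features and strictly positive target prices; this is illustrative, not production code):

import numpy as np

def fit_relative_error(X, y, lr=1e-2, n_iters=5000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        pred = X @ w + b
        # Loss is mean(|pred - y| / y); its (sub)gradient w.r.t. pred is sign(pred - y) / y
        grad_pred = np.sign(pred - y) / y
        w -= lr * (X.T @ grad_pred) / n
        b -= lr * grad_pred.mean()
    return w, b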
Comment if there's any part of this that is unclear or needs more explanation!
I am currently training a deep reinforcement learning model in a continuous environment using Ray.
The environment I am using was coded up in OpenAI Gym, using baselines, by another person whose research I am trying to replicate in Ray so that it can be parallelized. The model converges about 5% of the time in baselines after thousands of episodes, though I have not been able to replicate this.
My problem is that the agent stops changing its policy very quickly, and the mean reward then stays roughly the same for the rest of the run. This is independent of every parameter I've tried to vary (a sketch of the config shape I've been sweeping follows the list below). So far I've tried changing:
Learning Algorithm: I've used PPO, A3C, and DDPG
Learning Rate: I've done hyperparameter sweeps from 1e-6 up to 10
Episode Length: I've used complete_episodes as well as truncate_episodes for batch_mode. I've varied rollout_fragment_length and made sure train_batch_size was always equal to rollout_fragment_length times the number of workers
entropy_coeff in A3C
kl_target, kl_coeff, and sgd_minibatch_size in PPO
Network Architecture: I've tried both wide and deep networks. The default Ray network is 2x256 layers. I've tried changing it to 4x2048 and 10x64. The baselines network that succeeded was 8x64.
Changing activation function: ReLU returns NaNs but tanh lets the program run
Implementing L2 loss to prevent connection weights from exploding. This kept tanh activation from returning NaNs after a certain amount of time (the higher the learning rate the faster it got to the point it returned NaNs)
Normalization: I normalized the state that is returned to the agent from 0 to 1. Actions are not normalized because they both go from -6 to 6.
Scaling the reward: I've had rewards in the range of 1e-5 to 1e5. This had no effect on the agent's behavior.
Changing reward functions
Making the job easier (more lenient conditions)
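For reference, the shape of the PPO config I have been sweeping looks roughly like this (the values shown are placeholders rather than the exact settings of any single run):

num_workers = 8
config = {
    "env": "MyCustomEnv-v0",            # placeholder name for the Gym environment
    "num_workers": num_workers,
    "batch_mode": "complete_episodes",  # also tried "truncate_episodes"
    "rollout_fragment_length": 200,
    "train_batch_size": 200 * num_workers,
    "lr": 1e-4,                         # swept from 1e-6 up to 10
    "sgd_minibatch_size": 128,
    "kl_target": 0.01,
    "kl_coeff": 0.2,
    "entropy_coeff": 0.0,
    "model": {
        "fcnet_hiddens": [64] * 8,      # matching the 8x64 baselines network
        "fcnet_activation": "tanh",     # ReLU produced NaNs
    },
}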
Note: I am running this in a docker image and have verified that this occurs on another machine so it's not a hardware issue.
Here is a list of things I believe are red flags that could be interrelated:
I was originally getting NaNs (Ray returned NaN actions) when using ReLU, but this stopped after switching my activation function to tanh.
The KL divergence coefficient cur_kl_coeff in PPO goes to zero almost immediately when I include L2 loss. From what I understand, this means that the model is not making meaningful weight changes after the first few iterations.
However, when the loss is just policy loss, cur_kl_coeff varies normally.
Loss doesn't substantially change on average and never converges.
Reward doesn't change substantially: reward_mean always converges to roughly the same value. This occurs even when reward_mean starts above average, or initially increases to above average, due to random weight initialization.
Note: This is not an issue of the agent not finding a better path or a poor reward function not valuing good actions.
Zooming out shows why mean reward peaked previously. The agent performed phenomenally but was never able to repeat anywhere near its success.
And, I cannot stress this enough, this is not an issue of me not letting the agent run for long enough. I have observed this same behavior across dozens of runs and cumulatively tens of thousands of iterations. I understand that it can take models a long time to converge, but the models consistently make zero progress.
tl;dr: My model doesn't change its policy in any meaningful way despite getting very high rewards on certain runs.
I know that the Monte Carlo REINFORCE policy gradient algorithm differs in how it calculates the reward values: it computes the discounted cumulative future reward at each step.
Here is the piece of code that calculates the discounted cumulative future reward at each time step.
G = np.zeros_like(self.reward_memory, dtype=np.float64)
for t in range(len(self.reward_memory)):
    G_sum = 0
    discount = 1
    # Sum the rewards from step t onward, discounting each by gamma^(k - t)
    for k in range(t, len(self.reward_memory)):
        G_sum += self.reward_memory[k] * discount
        discount *= self.gamma
    G[t] = G_sum
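(If it helps clarify what the loop computes: I believe the same values can be obtained in a single backward pass, since G[t] = r[t] + gamma * G[t+1].)

G = np.zeros_like(self.reward_memory, dtype=np.float64)
running = 0.0
for t in reversed(range(len(self.reward_memory))):
    running = self.reward_memory[t] + self.gamma * running
    G[t] = running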
Another example for increasing accuracy is to use only the rewards obtained after each action, the so-called "reward to go".
Another example is to add an entropy bonus.
Is it possible to add the entropy bonus, the reward to go, or either one of them to the Monte Carlo method?
Another step taken in the Monte Carlo method after the return calculation is to normalize the values:
“In practice it can also be important to normalize these. For example, suppose we compute [discounted cumulative reward] for all of the 20,000 actions in the batch of 100 Pong game rollouts above. One good idea is to “standardize” these returns (e.g. subtract mean, divide by standard deviation) before we plug them into backprop. This way we’re always encouraging and discouraging roughly half of the performed actions. Mathematically you can also interpret these tricks as a way of controlling the variance of the policy gradient estimator”.
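In code, that standardization step applied to the G array above would look something like this:

# Standardize the returns: zero mean, unit variance (small epsilon guards against zero std)
G = (G - G.mean()) / (G.std() + 1e-8)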
Does it affect the accuracy if both or either of the entropy bonus and reward-to-go modifications are added?
That quote is from the research PDF https://arxiv.org/pdf/1506.02438.pdf
I am studying policy gradient algorithms and want to know how to improve them. I would greatly appreciate it if you could help me out.
Edit:
I would also like to ask whether the advantage function could be added as well.
A(s, a) is the advantage function; is it possible to add this to the Monte Carlo approach, assuming we also add both the reward to go and the entropy bonus?
You are mixing some things up here.
The Monte Carlo approach is a way to compute the returns for the state-action pairs: the discounted sum of all future rewards obtained after a state-action pair (s, a) while following the current policy π.
(It is also worth noting that REINFORCE is not an especially good RL algorithm, and that Monte Carlo estimates of the returns have a rather high variance compared to, e.g., TD(λ).)
The entropy bonus and the advantage function, on the other hand, are part of the loss (the function you use to train your actor), and therefore have nothing to do with the return computation.
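For example, a REINFORCE-style loss with an advantage and an entropy bonus could look roughly like this (a PyTorch-flavoured sketch; log_probs, returns, values, and entropies are assumed to come from your own rollout and networks):

import torch

def policy_loss(log_probs, returns, values, entropies, entropy_coeff=0.01):
    # Advantage: Monte Carlo return minus a learned state-value baseline
    advantages = returns - values.detach()
    # Policy-gradient term (negated because we minimize the loss)
    pg_loss = -(log_probs * advantages).mean()
    # Subtracting the entropy term rewards higher entropy, i.e. more exploration
    return pg_loss - entropy_coeff * entropies.mean()

The return computation (Monte Carlo or otherwise) stays exactly the same; only the loss changes.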
I would suggest you read the Reinforcement Learning Book to get a deeper understanding of what you're doing.
I am working with a complex system that has five input variables; depending on the values of these five variables, the response of the system is measured. Seven output variables are measured in order to completely define the response.
I have been using an artificial neural network to model the relationship between the five variables and the seven output parameters. This has been successful so far: the ANN predicts the outputs really well (I have also tested the trained network on a validation set of test cases). I used Python with Keras/TensorFlow for this.
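For reference, the network is roughly of this shape (a simplified sketch; the layer sizes here are placeholders, not my exact architecture):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(5,)),           # five input variables
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(7),                    # seven output parameters
])
model.compile(optimizer="adam", loss="mse")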
By the way, I also tried linear regression as the function approximator, but it produces large errors. These errors are expected, considering that the system is highly non-linear and may not be continuous everywhere.
Now I would like to predict the values of the five variables from a vector of the seven output parameters (the target vector). I tried using a genetic algorithm for this. After a lot of effort in designing the GA, I still end up with large differences between the target vector and the GA prediction. I simply minimize the mean squared error between the ANN prediction (the function approximator) and the target vector.
Is it the right approach to use an ANN as the function approximator and a GA for design space exploration?
Yes, it is a good approach to do search space exploration using a GA, but designing the crossover, mutation, generation-evolution logic, etc. plays a major role in determining the performance of the genetic algorithm (a minimal sketch of such a search is given after the list below).
If your search space is limited, you can use exact methods (which solve to optimality).
There are a few implementations in python-scipy itself.
If you prefer to go with meta-heuristics, there is a wide range of options other than the genetic algorithm:
Memetic algorithm
Tabu Search
Simulated annealing
Particle swarm optimization
Ant colony optimization
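With the trained ANN as the forward model, the whole inverse search can be written as a black-box minimization. Here is a minimal sketch using SciPy's differential_evolution, an evolutionary method similar in spirit to a GA (model, target, and the variable bounds are assumed to be yours):

import numpy as np
from scipy.optimize import differential_evolution

# target: the 7-element vector of desired outputs
# model:  the trained ANN mapping 5 inputs -> 7 outputs
def objective(x):
    pred = model.predict(x.reshape(1, -1), verbose=0)[0]
    return np.mean((pred - target) ** 2)   # MSE against the target vector

bounds = [(0.0, 1.0)] * 5                  # placeholder bounds for the 5 input variables
result = differential_evolution(objective, bounds, maxiter=200, tol=1e-8)
print(result.x, result.fun)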
The documentation for Statsmodels' linear mixed-effect models claims that
The Statsmodels LME framework currently supports post-estimation inference via Wald tests and confidence intervals on the coefficients, profile likelihood analysis, likelihood ratio testing, and AIC. [emphasis added]
I've noted the MixedLM.loglike method, but I can't seem to find a function/method for running a likelihood ratio test.
Could somebody kindly point me in the right direction?
I'm running a development branch so things may have changed, but the results class returned by MixedLM.fit() should have an attribute called 'llf'. That is the value of the log-likelihood function at the estimated parameters. If you have two nested models and take -2 times the difference in their llf values, under the null hypothesis where the simpler model is true, this will be a chi^2 random variable with degrees of freedom equal to the difference in degrees of freedom for the two models.
Note that many people feel that you should switch the estimator to ML (not the default REML) when using LR tests.
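A minimal sketch of how that could look (the column names y, x1, x2, and group are placeholders for your own data):

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

data = pd.read_csv("data.csv")  # assumed to contain columns y, x1, x2, group

# Fit both nested models with ML (reml=False) so the likelihoods are comparable
full = smf.mixedlm("y ~ x1 + x2", data, groups=data["group"]).fit(reml=False)
reduced = smf.mixedlm("y ~ x1", data, groups=data["group"]).fit(reml=False)

lr_stat = -2 * (reduced.llf - full.llf)
df_diff = 1  # the full model has one extra fixed-effect coefficient (x2)
p_value = stats.chi2.sf(lr_stat, df_diff)
print(lr_stat, p_value)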