The fitness output is always 0 while my genetic algorithm is training. A fitness of 0 would be normal at first, but it stays at 0 for the entire duration of training and never improves. I have set up the training so that the genetic algorithm trains on small increments of the data at a time instead of the entire training data array. The reason I am doing this is that I want the genetic algorithm to train on the most recent data last. This is a simplified version of what I am trying...
def trainModels(coin):
    global endSliceTraining, startSliceTraining  # These are also used in the fitness function
    torch_ga = pygad.torchga.TorchGA(model=NeuralNetworkModel, num_solutions=10)
    while endSliceTraining < len(trainingData):
        ga_instance = pygad.GA(num_generations=10,
                               num_parents_mating=2,
                               initial_population=torch_ga.population_weights,
                               fitness_func=fitness_func,
                               parent_selection_type="sss",
                               mutation_type="random",
                               mutation_by_replacement=True)
        ga_instance.run()
        solution, solution_fitness, solution_idx = ga_instance.best_solution()
        startSliceTraining += 1
        endSliceTraining += 1
def fitness_func(solution, solution_idx):
    global startSliceTraining, endSliceTraining
    stats = numpy.array(trainingData)
    statsSlice = stats[startSliceTraining:endSliceTraining]
    for stats in statsSlice:
        action = [0, 0, 0]
        stats = torch.tensor(stats, dtype=torch.float)
        prediction = pygad.torchga.predict(model=NeuralNetworks[currentCoinIndex], solution=solution, data=stats)
        move = torch.argmax(prediction).item()
        action[move] = 1
        """
        I deleted the complicated part here. This area was not the problem. I hope what's above in this function is understandable
        """
    return ...
I'm thinking that maybe my parameters in pygad.GA are wrong, or that for some reason the neural network is not being transferred over to be used on the next set of data, but I don't know.
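For reference, this is roughly what I mean by carrying the network over between slices: re-seeding each new run from the previous population (an untested sketch using the ga_instance.population attribute; I don't know if this is the intended way):

# Untested sketch: seed each new GA run with the population evolved on the
# previous slice, instead of restarting from torch_ga.population_weights.
population = torch_ga.population_weights
while endSliceTraining < len(trainingData):
    ga_instance = pygad.GA(num_generations=10,
                           num_parents_mating=2,
                           initial_population=population,
                           fitness_func=fitness_func,
                           parent_selection_type="sss",
                           mutation_type="random",
                           mutation_by_replacement=True)
    ga_instance.run()
    population = ga_instance.population  # carry the evolved weights forward
    startSliceTraining += 1
    endSliceTraining += 1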
Help would be appreciated, thank you!
In the following code I want to optimize a wind farm using a penalty function.
Using the first function (newSite), I have defined the wind turbine numbers and layout. Then in the next function, after importing x0 (c = x0 = initial guess), for each range of 10 wind directions (wd) I take the c values of the mean wd of that range. For instance, for wd [0, 10] the mean value is 5, so I take the c values at wd = 5 and apply them to all wd in the range [0, 10], for each wind speed (ws). I should mention that c is the value that shows whether a wind turbine is off or on (c = 0 means the turbine is off). I have then defined operating according to c, which means that if operating is 0, c = 0 and that turbine is off.
Then I defined the penalty function to optimize power output: wherever TI_eff > 0.14, I need to apply a penalty that is subtracted from the original power output. For instance, if sim_res.TI_eff[1][2][3] > 0.14, then curr_func[1][2][3] = sim_res.Power[1][2][3] - 10000*(sim_res.TI_eff[1][2][3] - 0.14)**2.
The problem is that when I run this code it does not give me any results; I waited for many hours, and I think it was stuck in a loop that could not converge. So I want to know: what is the problem?
import time
from py_wake.examples.data.hornsrev1 import V80
from py_wake.examples.data.hornsrev1 import Hornsrev1Site  # We work with the Horns Rev 1 site, which comes already set up with PyWake.
from py_wake import BastankhahGaussian
from py_wake.turbulence_models import GCLTurbulence
from py_wake.deflection_models.jimenez import JimenezWakeDeflection
from scipy.optimize import minimize
from py_wake.wind_turbines.power_ct_functions import PowerCtFunctionList, PowerCtTabular
import numpy as np

def newSite(x, y):
    xNew = np.array([x[0] + 560 * i for i in range(4)])
    yNew = np.array([y[0] + 560 * i for i in range(4)])
    x_newsite = np.array([xNew[0], xNew[0], xNew[0], xNew[1]])
    y_newsite = np.array([yNew[0], yNew[1], yNew[2], yNew[0]])
    return (x_newsite, y_newsite)

def wt_simulation(c):
    c = c.reshape(4, 360, 23)
    site = Hornsrev1Site()
    x, y = site.initial_position.T
    x_newsite, y_newsite = newSite(x, y)
    windTurbines = V80()

    for item in range(4):
        for j in range(10, 370, 10):
            for i in range(j - 10, j):
                c[item][i] = c[item][j - 5]

    windTurbines.powerCtFunction = PowerCtFunctionList(
        key='operating',
        powerCtFunction_lst=[PowerCtTabular(ws=[0, 100], power=[0, 0], power_unit='w', ct=[0, 0]),  # 0=No power and ct
                             windTurbines.powerCtFunction],                                         # 1=Normal operation
        default_value=1)

    operating = np.ones((4, 360, 23))  # shape=(#wt, wd, ws)
    operating[c <= 0.5] = 0

    wf_model = BastankhahGaussian(site, windTurbines,
                                  deflectionModel=JimenezWakeDeflection(),
                                  turbulenceModel=GCLTurbulence())

    # run wind farm simulation
    sim_res = wf_model(
        x_newsite, y_newsite,  # wind turbine positions
        h=None,   # wind turbine heights (defaults to the heights defined in windTurbines)
        wd=None,  # Wind direction (defaults to site.default_wd (0,1,...,360 if not overridden))
        ws=None,  # Wind speed (defaults to site.default_ws (3,4,...,25 m/s if not overridden))
        operating=operating
    )

    curr_func = np.ones((4, 360, 23))
    for i in range(4):
        for l in range(360):
            for k in range(23):
                if sim_res.TI_eff[i][l][k] - 0.14 > 0:
                    curr_func[i][l][k] = sim_res.Power[i][l][k] - 10000 * (sim_res.TI_eff[i][l][k] - 0.14) ** 2
                else:
                    curr_func[i][l][k] = sim_res.Power[i][l][k]

    return -float(np.sum(curr_func))  # negative because of scipy minimize

t0 = time.perf_counter()

def solve():
    wt = 4  # for V80
    wd = 360
    ws = 23
    x0 = np.ones((wt, wd, ws)).reshape(-1)  # initial value for c
    b = (0, 1)
    bounds = np.full((wt, wd, ws, 2), b).reshape(-1, 2)
    res = minimize(wt_simulation, x0=x0, bounds=bounds)
    return res

res = solve()
print(f'success status: {res.success}')
print(f'aep: {-res.fun}')  # negative to get the true maximum aep
print(f'c values: {res.x}\n')
print(f'elapse: {round(time.perf_counter() - t0)}s')
sim_res = wt_simulation(res.x)
There are a number of things in your approach that are either wrong or incomprehensible to me. Just for fun I have tried your code. A few observations:
Your set of parameters (optimization variables) has a shape of (4, 360, 23), i.e. you are looking at 33,120 parameters. There is no nonlinear optimization algorithm that is going to give you any meaningful answer to a problem that big. Ever. But then again, you shouldn't be looking at SciPy optimize if your optimization variables should only assume 0/1 values.
Calling SciPy minimize like this:
res = minimize(wt_simulation, x0=x0, bounds=bounds)
is going to select a nonlinear optimizer among BFGS, L-BFGS-B, or SLSQP (according to the documentation at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html).
Those algorithms are gradient-based, and since you're not providing a gradient of your objective function, SciPy is going to approximate it numerically. Good luck with that when you have 33,000+ parameters; it is never going to finish.
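To put rough numbers on it (my arithmetic, using the evaluation time I measured below):

# Back-of-the-envelope cost of ONE finite-difference gradient evaluation,
# assuming ~20 s per objective evaluation (see the timing note below).
n_params = 4 * 360 * 23            # 33,120 optimization variables
evals_per_gradient = n_params + 1  # 2-point forward differences
print(evals_per_gradient * 20 / 86400.0)  # ~7.7 days for a single gradient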
At the beginning of your objective function you are doing this:
for item in range(4):
    for j in range(10, 370, 10):
        for i in range(j - 10, j):
            c[item][i] = c[item][j - 5]
I don't understand why you're doing this, but you are overriding the input values of c coming from the optimizer.
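If the intent is simply that c be constant within each 10-degree sector, a much cheaper formulation (a sketch based on my reading of your description, not code you have) would be to optimize one value per sector and expand it before the simulation:

import numpy as np

# Hypothetical reparameterization: one c value per (turbine, sector, wind speed),
# instead of optimizing all 360 directions and then overwriting them.
c_sector = np.ones((4, 36, 23))           # 4 * 36 * 23 = 3,312 parameters
c_full = np.repeat(c_sector, 10, axis=1)  # expand back to shape (4, 360, 23)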
Your objective function takes 20-25 seconds to evaluate on my powerful workstation. Even if you had only 10-15 optimization parameters, it would take you several days to get any answer out of an optimizer. You have 33,000+ variables. No way.
I don't know why you are doing this and why you're doing it the way you're doing it. You should rethink your approach.
I have recently started using GPflow to build a GP model. I used Hamiltonian Monte Carlo to sample the posterior with a single chain.
My goal is to run multiple chains and perform convergence diagnostics.
This is my set up for one chain:
num_burnin_steps = ci_niter(100)
num_samples = ci_niter(500)

hmc_helper = gpflow.optimizers.SamplingHelper(
    model.log_posterior_density, model.trainable_parameters
)
hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=hmc_helper.target_log_prob_fn, num_leapfrog_steps=10, step_size=0.01
)
adaptive_hmc = tfp.mcmc.SimpleStepSizeAdaptation(
    hmc, num_adaptation_steps=10, target_accept_prob=f64(0.75), adaptation_rate=0.1
)

@tf.function
def run_chain_fn():
    return tfp.mcmc.sample_chain(
        num_results=num_samples,
        num_burnin_steps=num_burnin_steps,
        current_state=hmc_helper.current_state,
        kernel=adaptive_hmc,
        trace_fn=lambda _, pkr: pkr.inner_results.is_accepted,
    )

samples, _ = run_chain_fn()
constrained_samples = hmc_helper.convert_to_constrained_values(samples)
I cannot find any examples of how to modify this for multiple chains. Here is the closest example I can find that does the same thing; its initial_state is altered to have 10 chains. To do the same with my example I would need to alter hmc_helper.current_state, but I cannot figure out the right way to do this.
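My best guess is something along these lines, but I am not sure it is right (presumably the target log-prob would also have to handle the extra chain dimension):

# Untested sketch: give every unconstrained state part a leading "chain"
# dimension by tiling it num_chains times, then pass that as current_state.
num_chains = 10
initial_state = [
    tf.tile(s[tf.newaxis, ...], [num_chains] + [1] * s.shape.ndims)
    for s in hmc_helper.current_state
]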
Any suggestions would be greatly appreciated.
I am new to Stack Overflow, so apologies if my question is not clear.
Thanks!
I have previously used PyMC3 and am now looking to use TensorFlow Probability.
I have built the same model in both, but unfortunately I am not getting the same answer. In fact, the answers are not even close.
# definition of the joint_log_prob to evaluate samples
def joint_log_prob(data, proposal):
    prior = tfd.Normal(mu_0, sigma_0, name='prior')
    likelihood = tfd.Normal(proposal, sigma, name='likelihood')
    return (prior.log_prob(proposal) + tf.reduce_mean(likelihood.log_prob(data)))

proposal = 0

# define a closure on joint_log_prob
def unnormalized_log_posterior(proposal):
    return joint_log_prob(data=observed, proposal=proposal)

# define how to propose state
rwm = tfp.mcmc.NoUTurnSampler(
    target_log_prob_fn=unnormalized_log_posterior,
    max_tree_depth=100,
    step_size=0.1
)

# define initial state
initial_state = tf.constant(0., name='initial_state')

@tf.function
def run_chain(initial_state, num_results=7000, num_burnin_steps=2000, adaptation_steps=1):
    adaptive_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
        rwm, num_adaptation_steps=adaptation_steps,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size),
        step_size_getter_fn=lambda pkr: pkr.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
    )
    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=adaptive_kernel,
        trace_fn=lambda cs, kr: kr)

trace, kernel_results = run_chain(initial_state)
I am using the NoUTurnSampler and have added some step size adaptation; without it, the result is pretty much the same.
I don't really know how to move forward. Maybe my joint log probability is wrong?
You should use reduce_sum in your log_prob instead of reduce_mean. Otherwise you are effectively downweighting the likelihood by a factor equal to the size of your data set. This would cause the samples to look a lot more like the prior, which might be what you’re seeing in the plot.
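Concretely, that change in your joint_log_prob would look like this:

def joint_log_prob(data, proposal):
    prior = tfd.Normal(mu_0, sigma_0, name='prior')
    likelihood = tfd.Normal(proposal, sigma, name='likelihood')
    # Sum, rather than average, the per-observation log-likelihoods.
    return prior.log_prob(proposal) + tf.reduce_sum(likelihood.log_prob(data))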
This is my first time trying to train on a dataset containing 8 variables in a time series of 20 years or so, using a GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm trying first with a 1-layer GRU. I'm not using softmax for the output layer. MSE is used for my cost function.
It is a basic GRU with forward propagation and backward gradient update. Here are the main functions I defined:
# x_t is the input training dataset with a dimension of 7572x8.
# So T = 7572, input_dim = 8, hidden_dim = 128. y_train is my train label.

def forward_prop_step(self, x_t, y_train, s_t1_prev, V, U, W, b, c, learning_rate):
    T = x_t.shape[0]
    z_t1 = np.zeros((T, self.hidden_dim))
    r_t1 = np.zeros((T, self.hidden_dim))
    h_t1 = np.zeros((T, self.hidden_dim))
    s_t1 = np.zeros((T + 1, self.hidden_dim))
    o_s = np.zeros((T, self.input_dim))
    for i in xrange(T):
        x_e = x_t[i].T
        z_t1[i] = sigmoid(U[0].dot(x_e) + W[0].dot(s_t1[i]) + b[0])  # 128x1
        r_t1[i] = sigmoid(U[1].dot(x_e) + W[1].dot(s_t1[i]) + b[1])  # 128x1
        h_t1[i] = np.tanh(U[2].dot(x_e) + W[2].dot(s_t1[i] * r_t1[i]) + b[2])  # 128x1
        s_t1[i + 1] = (np.ones_like(z_t1[i]) - z_t1[i]) * h_t1[i] + z_t1[i] * s_t1[i]  # 128x1
        o_s[i] = np.dot(V, s_t1[i + 1]) + c  # 8x1
    return [o_s, z_t1, r_t1, h_t1, s_t1]
def bptt(self, x, y_train, o, z_t1, r_t1, h_t1, s_t1, V, U, W, b, c):
    bptt_truncate = 360
    T = x.shape[0]  # length of time scale of input data (train)
    dLdU = np.zeros(U.shape)
    dLdV = np.zeros(V.shape)
    dLdW = np.zeros(W.shape)
    dLdb = np.zeros(b.shape)
    dLdc = np.zeros(c.shape)
    y_train_sp = np.repeat(y_train, self.input_dim)
    for t in np.arange(T)[::-1]:
        dLdy = 2 * (o[t] - y_train_sp[t])
        dydV = s_t1[t]
        dydc = 1.0
        dLdV += np.outer(dLdy, dydV)
        dLdc += dLdy * dydc
        for i in np.arange(max(0, t - bptt_truncate), t + 1)[::-30]:  # every month in the past year
            s_t1_pre = s_t1[i]
            dydst1 = V  # 8x128
            dst1dzt1 = -h_t1[i] + s_t1_pre  # 128x1
            dst1dht1 = np.ones_like(z_t1[i]) - z_t1[i]  # 128x1
            dzt1dU = np.outer(z_t1[i] * (1.0 - z_t1[i]), x[i])  # 128x8
            # print dzt1dU.shape
            dzt1dW = np.outer(z_t1[i] * (1.0 - z_t1[i]), s_t1_pre)  # 128x128
            dzt1db = z_t1[i] * (1.0 - z_t1[i])  # 128x1
            dht1dU = np.outer((1.0 - h_t1[i] ** 2), x[i])  # 128x8
            dht1dW = np.outer((1.0 - h_t1[i] ** 2), s_t1_pre * r_t1[i])  # 128x128
            dht1db = 1.0 - h_t1[i] ** 2  # 128x1
            dht1drt1 = (1.0 - h_t1[i] ** 2) * (W[2].dot(s_t1_pre))  # 128x1
            drt1dU = np.outer((r_t1[i] * (1.0 - r_t1[i])), x[i])  # 128x8
            drt1dW = np.outer((r_t1[i] * (1.0 - r_t1[i])), s_t1_pre)  # 128x128
            drt1db = (r_t1[i] * (1.0 - r_t1[i]))  # 128x1
            dLdW[0] += np.outer(dydst1.T.dot(dLdy), dzt1dW.dot(dst1dzt1))  # 128x128
            dLdU[0] += np.outer(dydst1.T.dot(dLdy), dst1dzt1.dot(dzt1dU))  # 128x8
            dLdb[0] += (dydst1.T.dot(dLdy)) * dst1dzt1 * dzt1db  # 128x1
            dLdW[1] += np.outer(dydst1.T.dot(dLdy), dst1dht1 * dht1drt1).dot(drt1dW)  # 128x128
            dLdU[1] += np.outer(dydst1.T.dot(dLdy), dst1dht1 * dht1drt1).dot(drt1dU)  # 128x8
            dLdb[1] += (dydst1.T.dot(dLdy)) * dst1dht1 * dht1drt1 * drt1db  # 128x1
            dLdW[2] += np.outer(dydst1.T.dot(dLdy), dht1dW.dot(dst1dht1))  # 128x128
            dLdU[2] += np.outer(dydst1.T.dot(dLdy), dst1dht1.dot(dht1dU))  # 128x8
            dLdb[2] += (dydst1.T.dot(dLdy)) * dst1dht1 * dht1db  # 128x1
    return [dLdV, dLdU, dLdW, dLdb, dLdc]
def predict(self, x):
    pred = np.amax(x, axis=1)
    pred_f = relu(pred)
    return pred_f
Parameters V, U, W, b, c are updated by the gradients dLdV, dLdU, dLdW, dLdb, dLdc calculated by bptt.
I have tried different weight initializations (Xavier or just random) and different time truncations, but they all lead to the same outcome. Perhaps the weight update isn't right? The network set-up seems simple, though. I also really struggle with understanding the prediction step and translating it to an actual biomass value. The function predict is what I defined to translate the output layer of the GRU network into a biomass value by taking the maximum value. But the output layer gives similar values for almost all time iterations. I'm not sure of the best way to do the job, though. Thanks for any help or suggestions in advance.
I doubt anyone on Stack Overflow is going to debug a custom implementation of a GRU for you. If you were using TensorFlow or another high-level library I might take a stab at it, or if it were a simple fully connected network, but as it is, all I can do is give some advice on how to proceed with debugging.
First, it sounds like you're running a brand-new implementation on your own data set right off the bat. Instead, start by testing your network on trivial, synthetic data sets. Can it learn an identity function? A response which is simply the weighted average of the three previous time steps? And so on. It's easier to debug small, simple problems. Once you know your implementation can learn the things a GRU-based recurrent network should be able to learn, then you can start using your own data.
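For example, a tiny synthetic task along these lines (a sketch; the names and weights are illustrative) gives you a known, learnable target:

import numpy as np

# Synthetic sanity check: the target is a weighted average of the first
# input variable over the three previous time steps.
T, input_dim = 1000, 8
x = np.random.randn(T, input_dim)
w = np.array([0.5, 0.3, 0.2])
y = np.array([w @ x[t - 3:t, 0] if t >= 3 else 0.0 for t in range(T)])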
Second, this comment of yours was very insightful:
Probably the weight update wasn't right?
While it's impossible to say for sure, this is a very common - perhaps the most common - source of bugs in backprop implementations. Andrew Ng recommends gradient checking to debug an implementation like this. Essentially, it involves numerically approximating the gradient. It's computationally inefficient, but it relies only on a correct implementation of forward propagation, which makes it very useful for debugging. For one, if the algorithm converges when the numerically approximated gradient is used, you can be more confident that your forward prop is correct and focus on debugging backprop. (On the other hand, if it still does not succeed, the issue is likely in your forward prop function.) For another, once the algorithm is working with the numerically approximated gradient, you can compare the output of your analytic gradient function against it and debug any discrepancies. This makes it a lot easier because you then know the correct answer it should return.
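A minimal central-difference gradient check looks something like this (a sketch; loss_fn stands for your loss as a function of a single parameter array, everything else held fixed):

import numpy as np

def numerical_gradient(loss_fn, theta, eps=1e-5):
    """Central-difference approximation of d(loss)/d(theta)."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        orig = theta[idx]
        theta[idx] = orig + eps
        loss_plus = loss_fn(theta)
        theta[idx] = orig - eps
        loss_minus = loss_fn(theta)
        theta[idx] = orig  # restore
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Compare against the analytic gradient, e.g. dLdV from bptt:
# rel_err = np.abs(num - dLdV) / (np.abs(num) + np.abs(dLdV) + 1e-12)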
I am trying to create a fairly generic genetic algorithm implementation in TensorFlow. I have an implementation that is slow and am trying to increase its speed. I will provide a really simple example of where the program gets slow and would welcome recommendations for improving the speed of the current implementation.
Let us say that we create the following:
W = tf.Variable(tf.convert_to_tensor(Warr, dtype=tf.float32))
X = tf.placeholder(dtype=tf.float32, shape=(3, None))
y = tf.placeholder(dtype=tf.float32, shape=(None) )
And we want to find W for the condition:
Warr = np.array([[0.1, 0, 0]])
Xarr = np.random.random((3, 100))
yarr = np.dot(Warr, Xarr)
A naive implementation (like the one I have created) goes thus:
1. A cost function is created for this implementation:
yHat = tf.matmul(W, X)
costFunction = tf.reduce_mean( tf.sqrt((y - yHat)*(y - yHat)) )
Note that the cost function can be arbitrarily complex and is not known a priori; hence, it is something that will be passed into a class. Note that the rest of the code consists of excerpts from within a class, but the main idea is easy to follow:
2. A population is generated (within a class):
self.population = []
for i in tqdm(range(self.GAconfig['numChildren'])):
    temp = []
    for j, v in enumerate(locVars):
        v = (v + (np.random.random(v.shape) - 0.5) * 2)
        v = tf.Variable(tf.convert_to_tensor(v, dtype=tf.float32))
        temp.append(v)
    self.population.append(temp)
Evaluating the cost function for this population is a rather arduous task: first the weights of each individual in the population are copied into the original weight tensors, and then the original cost function is calculated:
for ps in self.population:
    for i, v in enumerate(self.variables):
        sess.run(tf.assign(self.variables[i], ps[i]))
    result = sess.run(self.costFunction, feed_dict={
        self.X: X, self.y: y
    })
This implementation is obviously slow. One possible improvement would be to generate a set of cost function tensors, rather than weight variables, which could all be evaluated at once.
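Roughly, I imagine something like this for the toy problem above, stacking every individual's W into one tensor so a single session run scores the whole population (an untested sketch; Wpop and costsPop are illustrative names):

# Population of candidate weight matrices, shape (P, 1, 3).
Wpop = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
# Predictions for every individual at once: shape (P, 1, N).
yHatPop = tf.tensordot(Wpop, X, axes=[[2], [0]])
# One cost per individual, shape (P,); a single sess.run(costsPop, ...)
# with the stacked population in the feed_dict scores everyone.
costsPop = tf.reduce_mean(tf.sqrt((y - yHatPop) * (y - yHatPop)), axis=[1, 2])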
However, this is the point at which I am not sure what a "good implementation" that improves on the current speed would look like. Any help will be greatly appreciated ...
Note: The full implementation is available here:
https://github.com/sankhaMukherjee/tfNNGA
It is at a very early stage, so the code at the moment is very bad.
The implementation of the GA function can be found in the file src/lib/GA/GA.py
Crossover is found within the GA class
This is called from within the file src/moduleGA/moduleGA.py
Have you tried the following functions?
copy_variable_to_graph
copy_op_to_graph
get_copied_op
source