How to cache layer activations in Keras?

I am training a NN in Keras whose first layers have fixed (non-trainable) weights.
The computation performed by these layers is fairly expensive during training. It would make sense to cache the layer activations for each input and reuse them when the same input data is passed in on the next epoch, to save computation time.
Is it possible to achieve this behaviour in Keras?

You could separate your model into two different models. For example, in the following snippet x_ would correspond to your intermediate activations:
from keras.models import Model
from keras.layers import Input, Dense
import numpy as np

nb_samples = 100
in_dim = 2
h_dim = 3
out_dim = 1

# Frozen feature extractor: its Dense layer is not trainable.
a = Input(shape=(in_dim,))
b = Dense(h_dim, trainable=False)(a)
model1 = Model(a, b)
model1.compile('sgd', 'mse')

# Trainable head that consumes the precomputed activations.
c = Input(shape=(h_dim,))
d = Dense(out_dim)(c)
model2 = Model(c, d)
model2.compile('sgd', 'mse')

x = np.random.rand(nb_samples, in_dim)
y = np.random.rand(nb_samples, out_dim)

x_ = model1.predict(x)  # intermediate activations, shape=(nb_samples, h_dim)
model2.fit(x_, y)       # only model2's weights are updated
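If the frozen layers are the expensive part, you can take this one step further and persist the intermediate activations so they are computed only once across epochs or runs. A minimal sketch (the cache path and epoch count are placeholders, not part of the snippet above):
import os
cache_path = 'activations_cache.npy'  # assumed location for the cached activations
if os.path.exists(cache_path):
    x_ = np.load(cache_path)          # reuse previously computed activations
else:
    x_ = model1.predict(x)            # expensive pass through the frozen layers
    np.save(cache_path, x_)
model2.fit(x_, y, epochs=10)          # only the trainable head is optimized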

Related

Incompatible shape sizes using PyGAD

I'm trying to follow the tutorial given here.
This tutorial trains a Keras model using a genetic algorithm, with the PyGAD package. I'm interested in the binary classification case. My input matrix is of dimension 10000x20. Hence, I've created the following model using Keras:
input_layer = tensorflow.keras.layers.Input(20)
dense_layer1 = tensorflow.keras.layers.Dense(500, activation="relu")(input_layer)
dense_layer2 = tensorflow.keras.layers.Dense(500, activation="relu")(dense_layer1)
output_layer = tensorflow.keras.layers.Dense(1, activation="softmax")(dense_layer2)
model = tensorflow.keras.Model(inputs=input_layer, outputs=output_layer)
keras_ga = pygad.kerasga.KerasGA(model=model,
                                 num_solutions=10)
However, when I go to run the algorithm, using ga_instance.run(), I get the error:
ValueError: Shapes (10000,) and (10000, 1) are incompatible
I can't figure out why I'm getting this error. I want my Keras model to have 2 hidden layers, each with 500 hidden nodes, and 1 output node.
I think the problem is related to how each output is represented in the array. If you have a single output per instance, then the outputs should be prepared as a 2-D array with one column. For example, this prepares output data in a shape that works with PyGAD; its shape is (1000, 1):
numpy.random.uniform(0, 1, (1000, 1))
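Applied to the question's setup, if your labels currently form a flat vector of length 10000, reshaping them into a column vector should resolve the shape mismatch. A small sketch (assuming y holds your 0/1 labels):
y = numpy.random.randint(0, 2, size=10000)  # stand-in for your flat label vector, shape (10000,)
data_outputs = y.reshape(-1, 1)             # shape (10000, 1), matching the single output node
print(data_outputs.shape)                   # (10000, 1)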
Here is code that works, although with a simpler network architecture, because with the fitness function you used the fitness is sometimes NaN.
Since I do not have the data you used, I generated the input/output data randomly.
import tensorflow.keras
import pygad.kerasga
import numpy
import pygad
def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, keras_ga, model
    model_weights_matrix = pygad.kerasga.model_weights_as_matrix(model=model,
                                                                 weights_vector=solution)
    model.set_weights(weights=model_weights_matrix)
    predictions = model.predict(data_inputs)
    cce = tensorflow.keras.losses.CategoricalCrossentropy()
    solution_fitness = 1.0 / (cce(data_outputs, predictions).numpy() + 0.00000001)
    # print("solution_fitness", cce(data_outputs, predictions).numpy(), solution_fitness)
    return solution_fitness

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Fitness = {fitness}".format(fitness=ga_instance.best_solution(ga_instance.last_generation_fitness)[1]))
data_inputs = numpy.random.uniform(0, 1, (1000, 20))
data_outputs = numpy.random.uniform(0, 1, (1000, 1))
# create model
from tensorflow.keras.layers import Dense, Dropout
l1_rate=1e-6
l2_rate = 1e-6
input_layer = tensorflow.keras.layers.InputLayer(20)
dense_layer1 = tensorflow.keras.layers.Dense(10, activation="relu",kernel_regularizer=tensorflow.keras.regularizers.l1_l2(l1=l1_rate, l2=l2_rate))
output_layer = tensorflow.keras.layers.Dense(1, activation="sigmoid")
model = tensorflow.keras.Sequential()
model.add(input_layer)
model.add(dense_layer1)
model.add(Dropout(0.2))
model.add(output_layer)
keras_ga = pygad.kerasga.KerasGA(model=model,
                                 num_solutions=10)
# Run pygad
num_generations = 30
num_parents_mating = 5
initial_population = keras_ga.population_weights
ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       initial_population=initial_population,
                       fitness_func=fitness_func,
                       on_generation=callback_generation)
ga_instance.run()
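Since this particular question is binary classification with a single sigmoid output, a BinaryCrossentropy-based fitness may be a more natural fit than CategoricalCrossentropy. This variant is an assumption on my part, not part of the answer above:
def fitness_func(solution, sol_idx):
    # Same structure as the fitness above, but using binary cross-entropy,
    # which matches a single sigmoid output (assumed alternative, not the original).
    global data_inputs, data_outputs, keras_ga, model
    model_weights_matrix = pygad.kerasga.model_weights_as_matrix(model=model,
                                                                 weights_vector=solution)
    model.set_weights(weights=model_weights_matrix)
    predictions = model.predict(data_inputs)
    bce = tensorflow.keras.losses.BinaryCrossentropy()
    return 1.0 / (bce(data_outputs, predictions).numpy() + 1e-8)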
Thanks for using PyGAD!

Should my model with Monte Carlo dropout provide a mean prediction similar to the deterministic prediction?

I have a model trained with multiple LayerNormalization layers, and I am unsure if a simple weight transfer works properly when activating dropout for prediction. This is the code I am using:
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, Dropout, LayerNormalization, Input
model0 = load_model(path + 'model0.h5')
OW = model0.get_weights()
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=True)
N1 = LayerNormalization()(DO1)
D2 = Dense(460,activation='softsign')(N1)
DO2 = Dropout(0.16)(D2,training=True)
N2 = LayerNormalization()(DO2)
D3 = Dense(664,activation='softsign')(N2)
DO3 = Dropout(0.09)(D3,training=True)
N3 = LayerNormalization()(DO3)
out = Dense(1,activation='linear')(N3)
mP = Model(inp,out)
mP.set_weights(OW)
mP.compile(loss='mse',optimizer='Adam')
mP.save(path + 'new_model.h5')
If I set training=False on the dropout layers, the model makes identical predictions to the original model. However, when the code is written as above the mean prediction is not close to the original/deterministic prediction.
Previous models that I developed with dropout set to training had mean probabilistic predictions nearly identical to the deterministic model. Is there something I am doing incorrectly, or is this an issue with using LayerNormalization together with active dropout? As far as I know, LayerNormalization has trainable parameters, so I didn't know whether active dropout interferes with them. If it does, I am not sure how to remedy this.
This segment of code is for running a quick test and plotting the results:
inputs = np.zeros(shape=(1,10),dtype='float32')
inputsP = np.zeros(shape=(1000,10),dtype='float32')
opD = mD.predict(inputs)[0,0]  # mD: the deterministic model (presumably the loaded model0)
opP = mP.predict(inputsP).reshape(1000)
print('Deterministic: %.4f Probabilistic: %.4f' % (opD,np.mean(opP)))
plt.scatter(0,opD,color='black',label='Det',zorder=3)
plt.scatter(0,np.mean(opP),color='red',label='Mean prob',zorder=2)
plt.errorbar(0,np.mean(opP),yerr=np.std(opP),color='red',zorder=2,markersize=0, capsize=20,label=r'$\sigma$ bounds')
plt.grid(axis='y',zorder=0)
plt.legend()
plt.tick_params(axis='x',labelsize=0,labelcolor='white',color='white',width=0,length=0)
And the resulting output and plot are shown below.
Deterministic: -0.9732 Probabilistic: -0.9011
Edit to my answer:
I think the problem is just under-sampling from the model. The standard deviation of the predictions is directly tied to the dropout rate, so the number of predictions you need to approximate the deterministic model goes up as well. If you run an extreme version of the code below with dropout set to 0.7 for each dropout layer, 100,000 samples is no longer enough to approximate the deterministic mean to within 10^-3, and the standard deviation of the predictions gets much larger.
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Input
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
GPUs = tf.config.experimental.list_physical_devices('GPU')
for gpu in GPUs:
    tf.config.experimental.set_memory_growth(gpu, True)
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
D2 = Dense(460, activation='softsign')(D1)
D3 = Dense(664, activation='softsign')(D2)
out = Dense(1, activation='linear')(D3)
mP = Model(inp, out)
mP.compile(loss='mse', optimizer='Adam')
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1,training=False)
D2 = Dense(460, activation='softsign')(DO1)
DO2 = Dropout(0.16)(D2,training=True)
D3 = Dense(664, activation='softsign')(DO2)
DO3 = Dropout(0.09)(D3,training=True)
out = Dense(1, activation='linear')(DO3)
mP2 = Model(inp, out)
mP2.set_weights(mP.get_weights())
mP2.compile(loss='mse', optimizer='Adam')
data = np.zeros(shape=(100000, 10),dtype='float32')
res = mP.predict(data).reshape(data.shape[0])
res2 = mP2.predict(data).reshape(data.shape[0])
print (np.abs(res[0] - res2.mean()))
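To put a rough number on the under-sampling argument: the error of the Monte Carlo mean shrinks like the standard error sigma/sqrt(N), so the number of stochastic forward passes needed to reach a tolerance eps scales roughly as (sigma/eps)^2. A back-of-the-envelope sketch (the sigma values are illustrative, not measured from this model):
import numpy as np

def mc_samples_needed(sigma, eps):
    # Standard-error heuristic: need roughly (sigma / eps)**2 passes so that sigma / sqrt(N) <= eps.
    return int(np.ceil((sigma / eps) ** 2))

for sigma in (0.05, 0.5):  # small vs. large spread of individual MC-dropout predictions
    print(sigma, mc_samples_needed(sigma, eps=1e-3))
# sigma=0.05 -> 2500 passes; sigma=0.5 -> 250000 passes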

keras rl - dqn model update

I am reading through the DQN implementation in keras-rl /rl/agents/dqn.py and see that in the compile() step essentially 3 keras models are instantiated:
self.model : provides q value predictions
self.trainable_model : same as self.model but has the loss function we want to train
self.target_model : target model which provides the q targets and is updated every k steps with the weights from self.model
The only model on which train_on_batch() is called is trainable_model, however - and this is what I don't understand - this also updates the weights of model.
In the definition of trainable_model one of the output tensors y_pred is referencing the output from model:
y_pred = self.model.output
y_true = Input(name='y_true', shape=(self.nb_actions,))
mask = Input(name='mask', shape=(self.nb_actions,))
loss_out = Lambda(clipped_masked_error, output_shape=(1,), name='loss')([y_true, y_pred, mask])
ins = [self.model.input] if type(self.model.input) is not list else self.model.input
trainable_model = Model(inputs=ins + [y_true, mask], outputs=[loss_out, y_pred])
When trainable_model.train_on_batch() is called, BOTH the weights in trainable_model and in model change. I am surprised because even though the two models reference the same output tensor object (trainable_model.y_pred = model.output), the instantiation of trainable_model = Model(...) should also instantiate a new set of weights, no?
Thanks for the help!
This is a small example to show that when you instantiate a new keras.models.Model() using the input and output tensor from another model, then the weights of these two models are shared. They don't get re-initialized.
# keras version: 2.2.4
import numpy as np
from keras.models import Sequential, Model
from keras.layers import Dense, Input
from keras.optimizers import SGD
np.random.seed(123)
model1 = Sequential()
model1.add(Dense(1, input_dim=1, activation="linear", name="model1_dense1", weights=[np.array([[10]]),np.array([10])]))
model1.compile(optimizer=SGD(), loss="mse")
model2 = Model(inputs=model1.input, outputs=model1.output)
model2.compile(optimizer=SGD(), loss="mse")
x = np.random.normal(size=2000)
y = 2 * x + np.random.normal(size=2000)
print("model 1 weights", model1.get_weights())
print("model 2 weights", model2.get_weights())
model2.fit(x,y, epochs=3, batch_size=32)
print("model 1 weights", model1.get_weights())
print("model 2 weights", model2.get_weights())
Definitely something to keep in mind. It wasn't intuitive to me.
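Conversely, if you ever want a second model with its own independent weights rather than shared ones (not something keras-rl itself needs), one option is keras.models.clone_model, which rebuilds the architecture with fresh weight variables. A sketch based on the example above:
from keras.models import clone_model

model3 = clone_model(model1)              # same architecture, new (independent) weight variables
model3.compile(optimizer=SGD(), loss="mse")
model3.set_weights(model1.get_weights())  # optionally copy current values; they evolve separately afterwards
model3.fit(x, y, epochs=1, batch_size=32)
print("model 1 weights", model1.get_weights())  # unchanged by model3.fit
print("model 3 weights", model3.get_weights())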

Make predictions using a tensorflow graph from a keras model

I have a model trained using Keras with TensorFlow as my backend, but now I need to turn my model into a TensorFlow graph for a certain application. I attempted to do this and make predictions to ensure that it is working correctly, but when comparing to the results gathered from model.predict() I get very different values. For instance:
from keras.models import load_model
import tensorflow as tf
model = load_model('model_file.h5')
x_placeholder = tf.placeholder(tf.float32, shape=(None,7214,1))
y = model(x_placeholder)
x = np.ones((1,7214,1))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("Predictions from:\ntf graph: "+str(sess.run(y, feed_dict={x_placeholder:x})))
    print("keras predict: "+str(model.predict(x)))
returns:
Predictions from:
tf graph: [[-0.1015993 0.07432419 0.0592984 ]]
keras predict: [[ 0.39339241 0.57949686 -3.67846966]]
The values from keras predict are correct, but the tf graph results are not.
If it helps to know the final intended application, I am creating a jacobian matrix with the tf.gradients() function, but currently it does not return the correct results when comparing to theano's jacobian function, which gives the correct jacobian. Here is my tensorflow jacobian code:
x = tf.placeholder(tf.float32, shape=(None,7214,1))
y = tf.reshape(model(x)[0],[-1])
y_list = tf.unstack(y)
jacobian_list = [tf.gradients(y_, x)[0] for y_ in y_list]
jacobian = tf.stack(jacobian_list)
EDIT: Model code
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, InputLayer, Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
# activation function used following every layer except for the output layers
activation = 'relu'
# model weight initializer
initializer = 'he_normal'
# shape of input data that is fed into the input layer
input_shape = (None,7214,1)
# number of filters used in the convolutional layers
num_filters = [4,16]
# length of the filters in the convolutional layers
filter_length = 8
# length of the maxpooling window
pool_length = 4
# number of nodes in each of the hidden fully connected layers
num_hidden_nodes = [256,128]
# number of samples fed into model at once during training
batch_size = 64
# maximum number of interations for model training
max_epochs = 30
# initial learning rate for optimization algorithm
lr = 0.0007
# exponential decay rate for the 1st moment estimates for optimization algorithm
beta_1 = 0.9
# exponential decay rate for the 2nd moment estimates for optimization algorithm
beta_2 = 0.999
# a small constant for numerical stability for optimization algorithm
optimizer_epsilon = 1e-08
model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[0], kernel_size=filter_length),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[1], kernel_size=filter_length),
    MaxPooling1D(pool_size=pool_length),
    Flatten(),
    Dense(units=num_hidden_nodes[0], kernel_initializer=initializer, activation=activation),
    Dense(units=num_hidden_nodes[1], kernel_initializer=initializer, activation=activation),
    Dense(units=3, activation="linear", input_dim=num_hidden_nodes[1]),
])
# compile model
loss_function = 'mean_squared_error'
early_stopping_min_delta = 0.0001
early_stopping_patience = 4
reduce_lr_factor = 0.5
reduce_lr_epsilon = 0.0009
reduce_lr_patience = 2
reduce_lr_min = 0.00008
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=optimizer_epsilon, decay=0.0)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=early_stopping_min_delta,
                               patience=early_stopping_patience, verbose=2, mode='min')
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, epsilon=reduce_lr_epsilon,
                              patience=reduce_lr_patience, min_lr=reduce_lr_min, mode='min', verbose=2)
model.compile(optimizer=optimizer, loss=loss_function)
model.fit(train_x, train_y, validation_data=(cv_x, cv_y),
          epochs=max_epochs, batch_size=batch_size, verbose=2,
          callbacks=[reduce_lr, early_stopping])
model.save('model_file.h5')
@frankyjuang linked me here:
https://github.com/amir-abdi/keras_to_tensorflow
and combining this with code from
https://github.com/metaflow-ai/blog/blob/master/tf-freeze/load.py
and
https://github.com/tensorflow/tensorflow/issues/675
I have found a solution to both predicting using a tf graph and creating the jacobian function:
import tensorflow as tf
import numpy as np
# Create function to convert saved keras model to tensorflow graph
def convert_to_pb(weight_file, input_fld='', output_fld=''):
    import os
    import os.path as osp
    from tensorflow.python.framework import graph_util
    from tensorflow.python.framework import graph_io
    from keras.models import load_model
    from keras import backend as K

    # weight_file is a .h5 keras model file
    output_node_names_of_input_network = ["pred0"]
    output_node_names_of_final_network = 'output_node'

    # change filename to a .pb tensorflow file
    output_graph_name = weight_file[:-2]+'pb'
    weight_file_path = osp.join(input_fld, weight_file)
    net_model = load_model(weight_file_path)

    num_output = len(output_node_names_of_input_network)
    pred = [None]*num_output
    pred_node_names = [None]*num_output
    for i in range(num_output):
        pred_node_names[i] = output_node_names_of_final_network+str(i)
        pred[i] = tf.identity(net_model.output[i], name=pred_node_names[i])

    sess = K.get_session()
    constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), pred_node_names)
    graph_io.write_graph(constant_graph, output_fld, output_graph_name, as_text=False)
    print('saved the constant graph (ready for inference) at: ', osp.join(output_fld, output_graph_name))
    return output_fld+output_graph_name
Call:
tf_model_path = convert_to_pb('model_file.h5','/model_dir/','/model_dir/')
Create function to load the tf model as a graph:
def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    # Then, we can use again a convenient built-in function to import a graph_def into the
    # current default Graph
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(
            graph_def,
            input_map=None,
            return_elements=None,
            name="prefix",
            op_dict=None,
            producer_op_list=None
        )
    input_name = graph.get_operations()[0].name+':0'
    output_name = graph.get_operations()[-1].name+':0'
    return graph, input_name, output_name
Create a function to make model predictions using the tf graph
def predict(model_path, input_data):
    # load tf graph
    tf_model, tf_input, tf_output = load_graph(model_path)

    # Create tensors for model input and output
    x = tf_model.get_tensor_by_name(tf_input)
    y = tf_model.get_tensor_by_name(tf_output)

    # Number of model outputs
    num_outputs = y.shape.as_list()[0]
    predictions = np.zeros((input_data.shape[0], num_outputs))
    for i in range(input_data.shape[0]):
        with tf.Session(graph=tf_model) as sess:
            y_out = sess.run(y, feed_dict={x: input_data[i:i+1]})
            predictions[i] = y_out
    return predictions
Make predictions:
tf_predictions = predict(tf_model_path,test_data)
Jacobian function:
def compute_jacobian(model_path, input_data):
    tf_model, tf_input, tf_output = load_graph(model_path)
    x = tf_model.get_tensor_by_name(tf_input)
    y = tf_model.get_tensor_by_name(tf_output)
    y_list = tf.unstack(y)
    num_outputs = y.shape.as_list()[0]
    jacobian = np.zeros((num_outputs, input_data.shape[0], input_data.shape[1]))
    for i in range(input_data.shape[0]):
        with tf.Session(graph=tf_model) as sess:
            y_out = sess.run([tf.gradients(y_, x)[0] for y_ in y_list], feed_dict={x: input_data[i:i+1]})
            jac_temp = np.asarray(y_out)
            jacobian[:,i:i+1,:] = jac_temp[:,:,:,0]
    return jacobian
Compute Jacobian Matrix:
jacobians = compute_jacobian(tf_model_path,test_data)
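As a side note on the original mismatch (not verified against your exact model): running tf.global_variables_initializer() in a fresh session re-initializes the variables that load_model had already populated, which would produce random-looking graph outputs. If you only need predictions from the live Keras model rather than a frozen .pb, reusing the session Keras already holds avoids that. A sketch, assuming the placeholder setup (x_placeholder, y, x, model) from the first snippet in the question:
from keras import backend as K

# Reuse the session that already contains the loaded weights instead of
# creating a new session and re-initializing all variables.
sess = K.get_session()
tf_out = sess.run(y, feed_dict={x_placeholder: x})
keras_out = model.predict(x)
print(tf_out)
print(keras_out)  # the two outputs should now agree up to float precision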

Memory usage is increasing all the time when I train rnn with keras

I want to train a sorting network based on an RNN using the Keras library. But when I train the net on the CPU, the CPU memory usage keeps increasing, and the machine soon runs out of memory.
I used Python's gc module to observe changes in the tracked objects, but the counts of all objects seem to stay the same the whole time. I also found that when I set a bigger batch size, memory usage increases faster.
The network code is as follows:
#!/usr/bin/env python
from collections import defaultdict
from gc import get_objects
from keras.models import Sequential
from keras.layers.core import Activation, RepeatVector, TimeDistributedDense, Dropout, Dense
from keras.layers import recurrent
import numpy as np
from keras import backend as K
import sys
#from data import batch_gen, encode
RNN = recurrent.LSTM
# Neural networks take input as vectors, so we have to convert integers to
# vectors using one-hot encoding
# This function will encode a given integer sequence into RNN compatible format (one-hot representation)
def encode(X, seq_len, vocab_size):
    x = np.zeros((len(X), seq_len, vocab_size), dtype=np.float32)
    for ind, batch in enumerate(X):
        for j, elem in enumerate(batch):
            x[ind, j, elem] = 1
    return x
# This is a generator function which can generate infinite-stream of inputs for training
def batch_gen(batch_size=32, seq_len=6, max_no=10):
    # Randomly generate a batch of integer sequences (X) and its sorted
    # counterpart (Y)
    x = np.zeros((batch_size, seq_len, max_no), dtype=np.float32)
    y = np.zeros((batch_size, seq_len, max_no), dtype=np.float32)
    while True:
        # Generates a batch of input
        X = np.random.randint(max_no, size=(batch_size, seq_len))
        Y = np.sort(X, axis=1)
        for ind, batch in enumerate(X):
            for j, elem in enumerate(batch):
                x[ind, j, elem] = 1
        for ind, batch in enumerate(Y):
            for j, elem in enumerate(batch):
                y[ind, j, elem] = 1
        yield x, y
        x.fill(0.0)
        y.fill(0.0)
# global parameters.
batch_size=32
seq_len = 6
max_no = 10
# Initializing model
model = Sequential()
# This is encoder RNN (we are taking a variant of RNN called LSTM, because plain RNN's suffer from long-term dependencies issues
model.add(RNN(max_no))#, input_shape=(seq_len, max_no)))
# Dropout to enhace RNN's generalization capacities
model.add(Dropout(0.25))
# At this point RNN will generate a summary vector of the sequence, so to feed it to decoder we need to repeat it lenght of output seq. number of times
model.add(RepeatVector(seq_len))
# Decoder RNN, which will return output sequence
model.add(RNN(128, return_sequences=True))
# Adding linear layer at each time step
model.add(TimeDistributedDense(128,max_no))
# Adding non-linearity on top of linear layer at each time-step, since output at each time step is supposed to be probability distribution over max. no of integer in sequence
# we add softmax non-linearity
model.add(Dropout(0.5))
model.add(Activation('softmax'))
# Since this is a multiclass classification task, crossentropy loss is being used. Optimizer is adam, which is a particular instance of adaptive learning rate Gradient Descent methods
model.compile(loss='categorical_crossentropy', optimizer='adam')#,
#metrics=['accuracy'])
# Now the training loop: we'll sample input batches from the generator
# function written previously and feed them to the RNN for learning

# using get_objects to observe the objects
before = defaultdict(int)
after = defaultdict(int)
for i in get_objects():
    before[type(i)] += 1

for ind, (X, Y) in enumerate(batch_gen(batch_size, seq_len, max_no)):
    for i in get_objects():
        after[type(i)] += 1
    print(after)
    print([(k, after[k] - before[k]) for k in after if after[k] - before[k]])
    before = after
    loss, acc = model.train_on_batch(X, Y, accuracy=True)
    if ind % 100 == 99:
        print(ind, 'loss=', loss, 'acc=', acc)
