Parallel fitting of multiple Keras Models on single GPU

Parallel fitting of multiple Keras Models on single GPU - python

I'm trying to fit multiple small Keras models in parallel on a single GPU. Because of reasons i need to get them out of a list and train them one step at a time. Since I was not lucky with the standard multiprocessing module i use pathos.
What I tried to do is something like this:
from pathos.multiprocessing import ProcessPool as Pool
import tensorflow as tf
import keras.backend as K
def multiprocess_step(self, model):
K.set_session(sess)
with sess.graph.as_default():
model = step(model, sess)
return model
def step(model, sess):
K.set_session(sess)
with sess.graph.as_default():
model.fit(x=data['X_train'], y=data['y_train'],
batch_size=batch_size
validation_data=(data['X_test'], data['y_test']),
verbose=verbose,
shuffle=True,
initial_epoch=self.step_num - 1)
return model
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = "0"
sess = tf.Session(config=config)
K.set_session(sess)
with sess.graph.as_default():
pool = Pool(8).map
model_list = pool(multiprocess_step, model_list)
but whatever I try I keep getting an error claiming that the models dont seem to be on the same graph...
ValueError: Tensor("training/RMSprop/Variable:0", shape=(25, 352), dtype=float32_ref) must be from the same graph as Tensor("RMSprop/rho/read:0", shape=(), dtype=float32).
The exception originates in the model.fit() row so I must have done something wrong with the assignment of the session graph even though I tried to set that in every possible location?
Does anyone have experience with something similar?

The following was suggested on the Keras issue tracker. I'm not sure about the relative merits of the approach compared to using multiprocessing.
in_1 = Input()
lstm_1 = LSTM(...)(in_1)
out_1 = Dense(...)(lstm_1)
in_2 = Input()
lstm_2 = LSTM(...)(in_2)
out_2 = Dense(...)(lstm_2)
model_1 = Model(input=in_1, output=out_1)
model_2 = Model(input=in_2, output=out_2)
model = Model(input = [in_1, in_2], output = [out_1, out_2])
model.compile(...)
model.fit(...)
model_1.predict(...)
model_2.predict(...)

Considering the backend is set to tensorflow for the keras. you can use code and do parallel processing for multiple model invocation/ multiple model loading.
def model1(dir_model):
model = os.path.join(dir_model, 'model.json')
dir_weights = os.path.join(dir_model, 'model.h5')
graph1 = Graph()
with graph1.as_default():
session1 = Session(graph=graph1, config=config)
with session1.as_default():
with open(model, 'r') as data:
model_json = data.read()
model_1 = model_from_json(model_json)
model_1.load_weights(dir_weights)
return model_1,gap_weights,session1,graph1
def model_2(dir_model):
model = os.path.join(dir_model, 'model.json')
dir_weights = os.path.join(dir_model, 'model.h5')
graph2 = Graph()
with graph2.as_default():
session2 = Session(graph=graph2, config=config)
with session2.as_default():
with open(model, 'r') as data:
model_json = data.read()
model_2 = model_from_json(model_json)
model_2.load_weights(dir_weights)
return model_2,session2,graph2
and for invocation of the specific model do the following experiments.
for model 1 predict do the following
K.set_session(session2)
with graph2.as_default():
img_pred[img_name] =
patch_dict[np.argmax(np.squeeze(model_2.predict(img_invoke)))
and for the model 2 it follows same as
K.set_session(session2)
with graph2.as_default():
img_pred[img_name] =
patch_dict[np.argmax(np.squeeze(model_2.predict(img_invoke)))]

Related

How do I properly restore a Tensorflow Checkpoint?

I've extended the python implementation of WGAN-GP from here: https://keras.io/examples/generative/wgan_gp/
Basically, I added a callback to the fit function:
class GANCheckpoint(keras.callbacks.Callback):
def __init__(self, cpkt=None, manager=None):
self.cpkt = cpkt
self.manager = manager
def on_epoch_begin(self, epoch, logs=None):
if self.manager.latest_checkpoint:
self.cpkt.restore(self.manager.latest_checkpoint)
print("Restored from {}".format(self.manager.latest_checkpoint))
else:
print("Initializing from scratch.")
def on_epoch_end(self, epoch, logs=None):
save_path = manager.save()
self.cpkt.step.assign_add(1)
print("\nSaved checkpoint for step {}: {}".format(int(checkpoint.step), save_path))
And the checkpoint manager is initialized as:
Checkpoint manager
checkpoint_dir = './training_checkpoints/GAN/'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(step=tf.Variable(1),
d_model=d_model, g_model=g_model,
discriminator_optimizer=discriminator_optimizer, generator_optimizer=generator_optimizer)
manager = tf.train.CheckpointManager(checkpoint, checkpoint_dir, max_to_keep=None)
cbk = GANCheckpoint(cpkt=checkpoint, manager=manager)
Finally I have the fit call:
wgan.fit(X, batch_size=BATCH_SIZE, epochs=epochs, verbose = True, callbacks=[cbk])
I'm using checkpoint.restore(manager.latest_checkpoint) to restore weights in another python file.
However, my generator results are way off compared to what it is supposed to be.
I'm using the following code:
for i in range(10):
a = tf.random.normal(shape=(1, 128))
sample = checkpoint.g_model.predict(a)
print(sample)
I checked the weights of the generator and optimizer, they're coherent and seem identical.
Are checkpoints tied to a specific python file ?
Additionaly, even when I try to restore a checkpoint without fitting the model a first time, in the original python file, it does not work either.
Do you have any idea ?
Thanks in advance

Running FaceNet and MTCNN model simutaneously on 2 cameras

I am trying to run the MTCNN and FaceNet model on 2 cameras simultaneously. So, I am not getting any error while doing this but the code doesn't give me any results.
It just loads both the models and doesn't give me any predictions. Can anyone help me with this?
I have created 2 separate graphs and sessions using g=tf.Graph for MTCNN and FaceNet.
I think this error is coming due to multi-processing with TensorFlow as it might try to load MTCNN input to the Facenet graph. *this is my assumption.
Please let me know if you have any ideas about this. Thanks.
FaceNet:
with face_rec_graph.graph.as_default():
self.sess = tf.Session()
with self.sess.as_default():
self.__load_model(model_path)
self.x = tf.get_default_graph() \
.get_tensor_by_name("input:0")
self.embeddings = tf.get_default_graph() \
.get_tensor_by_name("embeddings:0")
self.phase_train_placeholder = tf.get_default_graph() \
.get_tensor_by_name("phase_train:0")
print("Model loaded")
face_rec_graph was created as follows:
class FaceRecGraph(object):
def __init__(self):
self.graph = tf.Graph();
MTCNN:
graph = tf.Graph()
with graph.as_default():
with open(model_path, 'rb') as f:
graph_def = tf.GraphDef.FromString(f.read())
tf.import_graph_def(graph_def, name='')
self.graph = graph
config = tf.ConfigProto(
allow_soft_placement=True,
intra_op_parallelism_threads=4,
inter_op_parallelism_threads=4)
config.gpu_options.allow_growth = True
self.sess = tf.Session(graph=graph, config=config)
There is no error coming just both the cameras stop giving any result.

Trying to wrap up a keras model in a flask REST app but getting a ValueError

I can create a simple keras model by running
python create-flask-model.py
create-flask-model.py
##points in square that are in or out of a quarter circle
import random
import math
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
training_size = 8000
testing_size = 2000
batch_size = 10
epoch_no = 30
modelStructureFileName = 'simple-flask.json'
modelWeightFileName = 'simple-flask.h5'
def get_model():
model = Sequential()
model.add(Dense(4, input_dim=2, activation='tanh'))
model.add(Dense(4, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='rmsprop')
return model
def get_data_instances(size):
result = []
for i in range(0, size):
number_1 = random.uniform(0,1)
number_2 = random.uniform(0,1)
squares = math.pow(number_1,2) + math.pow(number_2,2)
target = 0
if squares < 0.49:
target = 1
line = number_1,number_2,target
result.append(line)
return np.array(result)
##create data and split in to training and test, features and targets
data_instances = get_data_instances(training_size+testing_size)
train_x, train_y = data_instances[:training_size,0:2], data_instances[:training_size,-1]
test_x, test_y = data_instances[training_size:,0:2], data_instances[training_size:,-1]
##load model and train
model = get_model()
history = model.fit(train_x, train_y, batch_size=batch_size, epochs=epoch_no, validation_data=(test_x, test_y))
##save the model
model_json = model.to_json()
with open(modelStructureFileName, 'w') as json_file:
json_file.write(model_json)
model.save_weights(modelWeightFileName)
##how to get prediction for an instance
#instance = np.array([0.3, 0.6])
#instance = instance.reshape(1,2)
#yhat = model.predict(instance)
#print(yhat)
I wish to load the resulting model in to a flask app and be able to pass instances as json objects and have predictions made and returned. Running
python flask-app.py
in the same directory as the model json and h5 files.
flask-app.py
import json
import numpy as np
from flask import Flask
from keras.models import model_from_yaml
app = Flask(__name__)
model = None
modelStructureFileName = 'simple-flask.json'
modelWeightFileName = 'simple-flask.h5'
def load_model():
yaml_file = open(modelStructureFileName, 'r')
loaded_model_yaml = yaml_file.read()
yaml_file.close()
global model
model = model_from_yaml(loaded_model_yaml)
model.load_weights(modelWeightFileName)
#app.route('/flask/<input>', methods=['GET'])
def predict(input):
input_array = json.loads(input)
instance = np.array(input_array)
instance = instance.reshape(1,2)
yhat = model.predict(instance)
return str(yhat)
if __name__ == '__main__':
load_model()
app.run(port = 9000, debug = True)
If I navigate to http://localhost:9000/flask/[0.3,0.6] I get an error
builtins.ValueError
ValueError: Tensor Tensor("dense_3/Sigmoid:0", shape=(?, 1), dtype=float32) is not an element of this graph.
I think it's something to do with the scope of the model in the app, but can't figure it out. If I load the model in the request method it works once, but then fails with another error. I only want to load the model once. How can I get the flask app to work as expected?
EDIT: I ended up using bottle instead of flask and it worked no problem.
bottle-app.py
from bottle import route, run
import json
import numpy as np
from keras.models import model_from_yaml
modelStructureFileName = 'simple-flask.json'
modelWeightFileName = 'simple-flask.h5'
yaml_file = open(modelStructureFileName, 'r')
loaded_model_yaml = yaml_file.read()
yaml_file.close()
model = model_from_yaml(loaded_model_yaml)
model.load_weights(modelWeightFileName)
print('model loaded')
#route('/bottle/<input>')
def predict(input):
input_array = json.loads(input)
instance = np.array(input_array)
instance = instance.reshape(1,2)
yhat = model.predict(instance)
print(input_array, yhat)
return str(yhat[0][0])
run(host='localhost', port=9000, debug=True)

This happens because, you have multiple threads enabled in flask by default. Tensorflow models are not working well with multiple threads. You can read more about this in the below links
https://github.com/keras-team/keras/issues/5640
https://github.com/tensorflow/tensorflow/issues/14356
The following workaround worked for me
global graph
graph = tf.get_default_graph()
with graph.as_default():
model.compile()
model.fit()
with graph.as_default():
model.predict()

This answer is with respect to flask API.
The problem is Flask API works only once and then after it gives errors. So, in that case, you should write K.clear_session() at the end of the API before the return statement.
And do not forget to write from keras import backend as K line at the top.

Running different models in one script in Tensorflow 1.9

I have very simple model which consists of one tf.Variable() and here is who code:
import tensorflow as tf
save_path="model1/model1.ckpt"
num_input = 2
n_nodes_hl1 = 2
with tf.variable_scope("model1"):
hidden_1_layer = {
'weights' : tf.Variable(tf.random_normal([num_input, n_nodes_hl1]), name='Weight1')
}
def train_model():
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
save_model(sess)
def save_model(sess):
saver = tf.train.Saver(tf.global_variables(), save_path)
saver.save(sess, save_path)
def load_model(sess):
saver = tf.train.Saver(tf.global_variables(), save_path)
saver.restore(sess, save_path)
def run_model():
print("model1 running...")
with tf.Session() as sess:
load_model(sess)
x = sess.run(hidden_1_layer)
print(x)
#train_model()
The second model is completely the same, but with changed names "model1" to "model2". Both models are trained, saved and work perfect separately. So now I want to test them using following script:
import model1 as m1
import model2 as m2
m1.run_model()
m2.run_model()
And here I got an error message:
NotFoundError (see above for traceback): Key model2/Weight2 not found in checkpoint
So it looks like running imports causes adding all variables to common graph (even though they are in separate variable scopes) and then it cannot find variable from model2 saved in checkpoint in model1.
Can anyone solve my problem?
Is it possible in Tensorflow to run a few different models in one script?
EDIT - PROBLEM SOLVED
The solution is very easy. What you have to do is to create separate graphs for each model like. It means that all tensors you declare or calculate must be within that graph. You also must put it as an argument in Session, like: tf.Session(graph=self.graph)
Whole example below:
import tensorflow as tf
save_path="model1/model1.ckpt"
class model1:
num_input = 2
n_nodes_hl1 = 2
def init(self):
self.graph = tf.Graph()
with self.graph.as_default():
with tf.variable_scope("model1"):
self.hidden_1_layer = {
'weights' : tf.Variable(tf.random_normal([self.num_input, self.n_nodes_hl1]), name='Weight1')
}
def train_model(self):
init = tf.global_variables_initializer()
with tf.Session(graph = self.graph) as sess:
sess.run(init)
self.save_model(sess)
def save_model(self, sess):
saver = tf.train.Saver(tf.global_variables(), save_path)
saver.save(sess, save_path)
def load_model(self, sess):
saver = tf.train.Saver(tf.global_variables(), save_path)
saver.restore(sess, save_path)
def run_model(self):
print("model1 running...")
with tf.Session(graph = self.graph) as sess:
self.load_model(sess)
x = sess.run(self.hidden_1_layer)
print(x)

Oh! the common "I want to use several models" question! just make sure that you reset the graph after each model:
tf.reset_default_graph()
Your code would look like:
import tensorflow as tf
import model1 as m1
m1.run_model()
tf.reset_default_graph()
import model2 as m2
m2.run_model()
Why? The moment you create a variable in tensorflow using tf.Variable, that variable is added to the default graph. If you import both models one after the other, you just created all the variables in the default graph! This is by far the easiest solution. Consider the default graph as a blackboard: you can draw your fancy ML model, but you need to wipe it clean before reuse!
NOTE: If you are wondering, the alternative is to create separate graphs for each of the models, but it is much more worrysome and I only recommend it for times when you must have both models at the same time.
EXTRA: Encapsulating your model in a Tensorflow class
A fancier way to do it while avoiding several graphs (seriously, it is horrible!) is to encapsulate the whole model in a class. Thus, your code would look like this:
import tensorflow as tf
class model():
self.num_input = 2
self.n_nodes_hl1 = 2
def init(self, new_save_path)
self.save_path=new_save_path
tf.reset_default_graph()
with tf.variable_scope("model1"):
self.hidden_1_layer = {
'weights' : tf.Variable(tf.random_normal([self.num_input,
self.n_nodes_hl1]), name='Weight1')
}
self.saver = tf.train.Saver(tf.global_variables(), self.save_path)
self.sess = tf.Session()
self.sess.run(tf.global_variables_initializer())
def save_model(self):
self.saver.save(self.sess, self.save_path)
def load_model(self):
self.saver.restore(self.sess, self.save_path)
def run_model(self):
print("model1 running...")
load_model()
x = sess.run(self.hidden_1_layer)
print(x)
#train_model(self)
This way you could simply do:
import model
m1 = model('model1/model1.ckpt') # These two lines could be put into one
m1.run_model() # m1 = model('model1/model1.ckpt').run_model()
m2 = model('model2/model2.ckpt')
m2.run_model()
You still want it in a for loop?
import model
model_file_list = ['model1/model1.ckpt', 'model2/model2.ckpt']
for model_file in model_list:
m = model(model_file ).run_model()
# Run tests, print stuff, save stuff here!

Tensorflow: define placeholders/operation name in image pipeline

I would like to save my trained Tensorflow model, so it can be deployed by restoring the model file (I'm following this example, which seems to make sense). To do this, however, I need to have named tensors, so that I can do reload the variables with something like:
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("my_tensor:0")
I am queuing images from a list of filenames using string_input_producer (code below), but how do I name the tensors so that I can reload them at a later stage?
import tensorflow as tf
flags = tf.app.flags
conf = flags.FLAGS
class ImageDataSet(object):
def __init__(self, img_list_path, num_epoch, batch_size):
# Build the record list queue
input_file = open(images_list_path, 'r')
self.record_list = []
for line in input_file:
line = line.strip()
self.record_list.append(line)
filename_queue = tf.train.string_input_producer(self.record_list, num_epochs=num_epoch)
image_reader = tf.WholeFileReader()
_, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file, conf.img_colour_channels)
# preprocess
# ...
min_after_dequeue = 1000
capacity = min_after_dequeue + 400 * batch_size
self.images = tf.train.shuffle_batch(image, batch_size=batch_size, capacity=capacity,
min_after_dequeue=min_after_dequeue)

I assume that you want to restore the graph for testing or deploying.
For these purposes, you can edit your graph by insert a placeholder as an entrance of the testing data.
To edit the graph, you can use tf's graph editor, or build an new graph with placeholder and save it.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parallel fitting of multiple Keras Models on single GPU - python

Related

How do I properly restore a Tensorflow Checkpoint?

Running FaceNet and MTCNN model simutaneously on 2 cameras

Trying to wrap up a keras model in a flask REST app but getting a ValueError

Running different models in one script in Tensorflow 1.9

Tensorflow: define placeholders/operation name in image pipeline

Categories

Resources