For a project I'm working on I have to compare the performance of a small, simulated dataset to the performance of well-known datasets (e.g. ImageNet, CIFAR-10) using two neural networks, ResNet v2 at different depths and Inception v3.
To account for the imbalance between my set and the big popular ones, one of the methods I want to employ is augmenting the big sets using either a class from my simulated set or the same class but instead with real-life images. Then, using a pre-trained model, I want to train on both augmented sets and later compare performance.
The issue
The augmentation works fine: I copy over the .tfrecord files containing only images of the extra class to the original dataset and I modify the labels.txt file to contain the appropriate extra labels. However, the issue occurs when trying to train on this augmented dataset:
I0125 16:40:31.279634 140094914221888 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [258] rhs shape= [257]
which is to be expected, as the graph cannot handle the extra class.
My attempted solution
To alleviate this issue I tried applying the same method as was posed in this answer, using the Python script below (modified for brevity):
checkpoint = {}
saver = tf.compat.v1.train.import_meta_graph(f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}.meta")
with tf.Session() as sess:
saver.restore(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
# get appropriate nodes where number of classes is one of its dimensions
if "inception_v3" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
elif "resnet_v2" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
# select the nodes previously defined
nodes = [node for node in tf.global_variables() if node.name in target_nodes]
for node in nodes:
# load the values of those nodes from the checkpoint
checkpoint[node.name] = node.eval()
# extend dimensions for extra class
checkpoint.update({node.name: np.insert(checkpoint[node.name], checkpoint[node.name].shape[-1], 0.0, axis=-1)})
# assign new nodes to the checkpoint
sess.run(tf.assign(node, checkpoint[node.name], validate_shape=False))
saver.save(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
In short, this script loads the checkpoint and isolates all nodes where one of the dimensions is equal to the number of classes and increases it. I have verified these are always equal to the number of classes using the checkpoints from the different datasets the models have been trained on. In fact, it seems that those tensors are the only thing that's different between the checkpoints.
Although this method does execute correctly and allows me to run the training, it feels very boneheaded and I'm almost positive it is not at all what I'm supposed to do. However I'm curious as to what exactly is wrong with this (apart from initializing the new values as 0.0), as except from a wild fluctuation in loss at the start of the tranining it seems to work as I had hoped.
Other solutions
When looking further into this issue I also found other users' answers on similar questions suggesting that it is in fact impossible to modify a checkpoint in the way that I want to and suggested transfer learning or modifying the output layer. I am very new to neural network training and I don't know who to believe, as the linked answer seems to suggest that what I'm trying to do should work.
My question
So I ask: which approach is correct? Could my initial approach work, or should I try a different method to fix this issue? Starting from scratch would not be ideal, as the progress so far has taken over a month of continuous training.
I am using a modified version the TensorFlow Model Garden as my environment to train the networks, with the modificiations pertaining to creating custom datasets and using them using the supplied training scripts.
Related
I recently have an issue with saving the model in a bigger size.
I am using tensorflow 1.4
Before, I used
tf.train.string_input_producer() and tf.train.batch()
to load images from a text file. And in the training,
tf.train.start_queue_runners() and tf.train.Coordinator()
were used to provide data to the network. In this case, every time I saved the model using
saver.save(sess, checkpoint_path, global_step=iters)
only gave me a small size file, i.e. a file named model.ckpt-1000.data-00000-of-00001 with 1.6MB.
Now, I use
tf.data.Dataset.from_tensor_slices()
to supply images to an input placeholder and the saved model become 290MB. But I don't know why. I suspect the tensorflow saver saved the dataset into the model as well. If so, how to remove them to make it smaller, and only the weights of the network are saved.
This is not network depended because I tried in two networks and they were all like that.
I have googled but unfortunately didn't see any inspiration related to this issue. (Or this is not an issue, just I don't know how do?)
Thank you very much for any idea and help!
Edit
The method I initialised the dataset is:
1.First generated numpy.array dataset:
self.train_hr, self.train_lr = cifar10.load_dataset(sess)
The initial dataset is numpy.array, for example [8000,32,32,3]. I passed sess into this function is because in the function, I did tf.image.resize_images() and use sess.run() to generate numpy.array. The returns self.train_hr and self.train_lr are numpy.array in shape [8000,64,64,3].
2.Then I created the dataset:
self.img_hr = tf.placeholder(tf.float32)
self.img_lr = tf.placeholder(tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((self.img_hr, self.img_lr))
dataset = dataset.repeat(conf.num_epoch).shuffle(buffer_size=conf.shuffle_size).batch(conf.batch_size)
self.iterator = dataset.make_initializable_iterator()
self.next_batch = self.iterator.get_next()
3.Then I initialised network and dataset, did the training and saved model:
self.labels = tf.placeholder(tf.float32,
shape=[conf.batch_size, conf.hr_size, conf.hr_size, conf.img_channel])
self.inputs = tf.placeholder(tf.float32,
shape=[conf.batch_size, conf.lr_size, conf.lr_size, conf.img_channel])
self.net = Net(self.labels, self.inputs, mask_type=conf.mask_type,
is_linear_only=conf.linear_mapping_only, scope='sr_spc')
sess.run(self.iterator.initializer,
feed_dict={self.img_hr: self.train_hr, self.img_lr: self.train_lr})
while True:
hr_img, lr_img = sess.run(self.next_batch)
_, loss, summary_str = sess.run([train_op, self.net.loss, summary_op],
feed_dict={self.labels: hr_img, self.inputs: lr_img})
...
...
checkpoint_path = os.path.join(conf.model_dir, 'model.ckpt')
saver.save(sess, checkpoint_path, global_step=iters)
All the sess are the same instance.
I suspect you created a tensorflow constant tf.constant out of your dataset, which would explain why the dataset gets stored with the graph. There is an initializeable dataset which let's you feed in the data using feed_dict at runtime. It's a few extra lines of code to configure but it's probably what you wanted to use.
https://www.tensorflow.org/programmers_guide/datasets
Note that constants get created for you automatically in the Python wrapper. The following statements are equivalent:
tf.Variable(42)
tf.Variable(tf.constant(42))
Tensorflow indeed saves your dataset. To solve it, lets understand why.
How tensorflow works and what does it save?
In short, Tensorflow API lets you build a computation graph via code, and then optimize it. Every op/variable/constant you define in the graph is working on tensors and is part of that graph. This framework is convenient since Tensorflow just build a graph, then the framework decides (or you specify) where to compute the graph in order to gain maximum speed out of your hardware, for instance, by computing on your GPU.
The GPU is a great example since this is a great example for your issue. Sending data from HDD/RAM/Processor to GPU is expensive time-wise. Therefore, Tensorflow also allow you to create input producers that will pretty much automatically manage the data transferred between all peripheral units, by queuing them and managing threads. However, I haven't seen much gain from that approach. Note that the inputs produced by datasets are also tensors, specifically constants/variables that are used as input to the network.. Therefore, they are part of the graph.
When saving a graph, we save several things:
Metadata - which defines the graph and its structure.
Values - of each variable/constant in the graph, in order to load it and reuse the network.
When you use datasets, the values of the non-trainable variables are saved, and therefore, your checkpoint file is larger.
To better understand datasets, see its implementation in the package files.
TL;DR - How do I fix my problem?
If its not reducing performance, use feeding dictionary to feed placeholders. Do not use tensors to store your data. This way those variables will not be saved.
Save only tensors that you would like to load (weights, biases, etc). You can user .eval() method to find its values, save it as JSON or such, and load it later by reconstructing the graph.
Good luck!
I solved this issue (not perfectly as I still don't know where the problem happens). Instead, I made a workaround to avoid saving a large amount of data.
I defined a saver fed in a specific list of variables. That list only contains the nodes of my graph. Here I show a small example of my workaround:
import tensorflow as tf
v1= tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2= tf.Variable(tf.zeros([200]), name="v2")
saver = tf.train.Saver( [v2])
# saver = tf.train.Saver()
with tf.Session() as sess:
init_op = tf.global_variables_initializer()
sess.run(init_op)
saver.save(sess,"checkpoint/model_test",global_step=1)
v2 is the variable list. Or you can use variables_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='net') to collect all the nodes.
I've been using tensorflow for a while now. At first I had stuff like this:
def myModel(training):
with tf.scope_variables('model', reuse=not training):
do model
return model
training_model = myModel(True)
validation_model = myModel(False)
Mostly because I started with some MOOCs that tought me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since I've been usually using only one model. My inputs are always place_holders and I just input either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimators code and it appears to create two different graphs for training and validation. May be that those issues are arising from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow helps you having a monolothical train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets such as having different inputs, different outputs, removing out intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't evaluate while training easily (though there are workarounds, train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
I'm trying to load three different models in the same process. Only the first one works as expected, the rest of them return like random results.
Basically the order is as follows:
define and compile first model
load trained weights before
rename layers
the same process for the second model
the same process for the third model
So, something like:
model1 = Model(inputs=Input(shape=input_size_im) , outputs=layers_firstmodel)
model1.compile(optimizer='sgd', loss='mse')
model1.load_weights(weights_first, by_name=True)
# rename layers but didn't work
model2 = Model(inputs=Input(shape=input_size_im) , outputs=layers_secondmodel)
model2.compile(optimizer='sgd', loss='mse')
model2.load_weights(weights_second, by_name=True)
# rename layers but didn't work
model3 = Model(inputs=Input(shape=input_size_im) , outputs=layers_thirdmodel)
model3.compile(optimizer='sgd', loss='mse')
model3.load_weights(weights_third, by_name=True)
# rename layers but didn't work
for im in list_images:
results_firstmodel = model1.predict(im)
results_secondmodel = model2.predict(im)
results_thirdmodel = model2.predict(im)
I'd like to perform some inference over a bunch of images. To do that the idea consists in looping over the images and perform inference with these three algorithms, and return the results.
I have tried to rename all layers to make them unique with no success. Also I created a different graph for each network, and with a different session do the inference. This works but it's very inefficient (in addition I have to set their weights every time because of sess.run(tf.global_variables_initializer()) removes them). Each time it's created a session tensorflow prints "creating tensorflow device (/device:GPU:0)".
I am running Tensorflow 1.4.0-rc0, Keras 2.1.1 and Ubuntu 16.04 kernel 4.14.
The OP is correct here. There is a serious bug when you try to load multiple weight files in the same script. The above answer doesn't solve this. If you actually interrogate the weights when loading weights for multiple models in the same script you will notice that the weights are different than when you just load weights for one model on its own. This is where the randomness is the OP observes coming from.
EDIT: To solve this problem you have to encapsulate the model.load_weight command within a function and the randomness that you are experiencing should go away. The problem is that something weird screws up when you have multiple load_weight commands in the same script like you have above. If you load those model weights with a function you issues should go away.
From the Keras docs we have this explanation for the user of load_weights:
loads the weights of the model from a HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
Therefore, if your architecture is unchanged you should drop the by_name=True or make it False (its default value). This could be causing the inconsistencies that you are facing, as your weights are not being loaded probably due to having different names on your layers.
Another important thing to consider is the nature of your HDF5 file, and the way you created it. If it indeed contains only the weights (created with save_weights as the docs point out) then there should be no problem in proceeding as explained before.
Now, if that HDF5 contains weights and architecture in the same file, then you should be loading it with keras.models.load_model instead (further reading if you like here). If this is the case then this would also explain those inconsistencies.
As a side suggestion, I prefer to save my models using Callbacks, like the ModelCheckpoint or the EarlyStopping if you want to automatically determine when to stop training. This not only gives you greater flexibility when training and saving your models (as you can stop them on the optimal training epoch or when you desire), but also makes loading those models easily, as you can simply use the load_model method to load both architecture and weights to your desired variable.
Finally, here is one useful SO post where saving (and loading) Keras models is explained.
I'm studying different object detection algorithms for my interest.
The main reference are Andrej Karpathy's slides on object detection slides here.
I would like to start from some reference, in particular something which allows me to directly test some of the network mentioned on my data (mainly consisting in onboard cameras of car and bike races).
Unfortunately I already used some pretrained network (repo forked from JunshengFu one, where I slightly adapt Yolo to my use case), but the classification accuracy is rather poor, I guess because there were not many training instances of racing cars like Formula 1.
For this reason I would like to retrain the networks and here is where I'm finding the most issues:
properly training some of the networks requires either hardware (powerful GPUs) or time I don't have so I was wondering whether I could retrain just some part of the network, in particular the classification network and if there is any repo already allowing that.
Thank you in advance
That is called fine-tuning of the network or transfer-learning. Basically you can do that for any network you find (having similar problem domains of course), and then depending on the amount of the data you have you will either fine-tune whole network or freeze some layers and train only last layers. For your case you would probably need to freeze whole network except last fully-connected layers (which you will actually replace with new ones, satisfying your number of classes), which perform classification. I don't know what library you use, but tensorflow has official tutorial on transfer-learning. However it's not very clear tbh.
More user-friendly tutorial you can find here by some enthusiast: tutorial. Here you can find a code repository as well. One correction you need thou is that the author performs fine-tuning of the whole network, while if you want to freeze some layers you will need to get list of the trainable variables and remove those you want to freeze and pass the resultant list to the optimizer (so he ignores removed vars), like following:
all_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,scope='InceptionResnetV2')
to_train = all_vars[-6:] // you better specify them by name explicitely, but this still will work
optimizer = tf.train.AdamOptimizer(lr=0.0001)
train_op = slim.learning.create_train_op(total_loss,optimizer, variables_to_train=to_train)
Further, tensorflow has a so called model zoo (bunch of trained models you can use for your purposes and transfer-learning). You can find it here.
I'm adapting the Tensorflow tutorial for sequence to sequence modeling for my project. Specifically, I am basing my code off of translate.py.
The tutorial computes the perplexity on the dev set every n training steps. I'd instead like to calculate BLEU score on the dev set.
The problem I'm facing is that when creating a model, you specify whether it is forward only or not. Looking through the code, it seems that if it is (which happens when training), at each step the network will not calculate the final output for the input sequence, but will calculate gradients. When it's not forward only (as in the decoding function later in the tutorial), it applies the loop function which feeds the output back into the input for the RNN which allows for the output sequence to be generated. However, it doesn't compute the gradients. So as far as I understand it, you can construct a model for either training (i.e. gradients) or testing (i.e. performing full inference on it).
Since I want to compute the BLEU score, I need some sequence produced by the model which corresponds to an input sequence in the dev set. Because of how the models are constructed, I would need both types of models (forward-only and not forward-only). However, trying to do this (even with a new session and a new variable scope), I can't seem to load the model for inference while I also have a model created for training. Without a new session/variable scope, I get errors about duplicated variables. It would be nice if there were a way to switch the model from not forward-only to forward-only.
In this case, is there any way to perform inference (run the full RNN) while I am also in the scope of training it?