Using Torchvision ImageFolder with Test Set - python

I am trying to solve the Dogs-vs-Cats challenge on Kaggle using the Sample Notebook that has been provided in the Udacity course. I have rearranged the files into two folder dogs/ and cats/ in the train/ directory so that the ImageFolder class can pick up the categories, but I don't know what to do in the test folder? I don't have the labels ready.
Do I just not use the ImageFolder API (seems that the course used it, so it should be usable, and obviously very convenient), or is there some option to use it when the classes are not already known. I could not find anything in this vein on the official documentation, but it should be possible seeing the course solution does it that way. Thanks for any help.

Usually, in the context of training networks, the "test" set is actually a "validation" set: its a subset of the labeled examples that the model does not train on, but only being evaluated. This validation set is used for tuning meta parameters (e.g., number of epochs, learning rate, batch size etc.).
Therefore, despite the fact that the validation ("test") set is not used for actual SGD training, you do have its labels and they are used to estimate the generalization error of the trained model.
Since you usually do have the labels for this set, you can read it using ImageFolder class same as the training set.
However, if you have a test set that has no labels at all, you can still use the ImageFolder class to handle the set. All you need is to create a dummy subfolder to represent a "label" for the set: ImageFolder assumes the images are stored in subfolders based on their labels.

Related

Increasing number of classes on TensorFlow checkpoint

For a project I'm working on I have to compare the performance of a small, simulated dataset to the performance of well-known datasets (e.g. ImageNet, CIFAR-10) using two neural networks, ResNet v2 at different depths and Inception v3.
To account for the imbalance between my set and the big popular ones, one of the methods I want to employ is augmenting the big sets using either a class from my simulated set or the same class but instead with real-life images. Then, using a pre-trained model, I want to train on both augmented sets and later compare performance.
The issue
The augmentation works fine: I copy over the .tfrecord files containing only images of the extra class to the original dataset and I modify the labels.txt file to contain the appropriate extra labels. However, the issue occurs when trying to train on this augmented dataset:
I0125 16:40:31.279634 140094914221888 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [258] rhs shape= [257]
which is to be expected, as the graph cannot handle the extra class.
My attempted solution
To alleviate this issue I tried applying the same method as was posed in this answer, using the Python script below (modified for brevity):
checkpoint = {}
saver = tf.compat.v1.train.import_meta_graph(f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}.meta")
with tf.Session() as sess:
saver.restore(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
# get appropriate nodes where number of classes is one of its dimensions
if "inception_v3" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
elif "resnet_v2" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
# select the nodes previously defined
nodes = [node for node in tf.global_variables() if node.name in target_nodes]
for node in nodes:
# load the values of those nodes from the checkpoint
checkpoint[node.name] = node.eval()
# extend dimensions for extra class
checkpoint.update({node.name: np.insert(checkpoint[node.name], checkpoint[node.name].shape[-1], 0.0, axis=-1)})
# assign new nodes to the checkpoint
sess.run(tf.assign(node, checkpoint[node.name], validate_shape=False))
saver.save(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
In short, this script loads the checkpoint and isolates all nodes where one of the dimensions is equal to the number of classes and increases it. I have verified these are always equal to the number of classes using the checkpoints from the different datasets the models have been trained on. In fact, it seems that those tensors are the only thing that's different between the checkpoints.
Although this method does execute correctly and allows me to run the training, it feels very boneheaded and I'm almost positive it is not at all what I'm supposed to do. However I'm curious as to what exactly is wrong with this (apart from initializing the new values as 0.0), as except from a wild fluctuation in loss at the start of the tranining it seems to work as I had hoped.
Other solutions
When looking further into this issue I also found other users' answers on similar questions suggesting that it is in fact impossible to modify a checkpoint in the way that I want to and suggested transfer learning or modifying the output layer. I am very new to neural network training and I don't know who to believe, as the linked answer seems to suggest that what I'm trying to do should work.
My question
So I ask: which approach is correct? Could my initial approach work, or should I try a different method to fix this issue? Starting from scratch would not be ideal, as the progress so far has taken over a month of continuous training.
I am using a modified version the TensorFlow Model Garden as my environment to train the networks, with the modificiations pertaining to creating custom datasets and using them using the supplied training scripts.

Saving then reusing CNN models - preserving initializations

I wish to repeat a series of image classification experiments by reusing a CNN with the same CNN with identical hyperparameters especially initializations. So, if I save a model after I have instantiated it and before I train it, does that also save the initializations so I then reload it later and train with a different data set and labels, does it start this new model with the same hyperparameters and initializations as the first model I trained with the first data set/classification labels? I am currently using fastai which is, of course, a library/set of API's, built on Pythorch but I think that everyone would be helped with a more general explanation that covers all CNN's using any library.
I expect an answer that says, "after this point in the workflow creating a CNN, the model is initialized and if you save it at this point, you can reload it later and use the same hyperparameters and initializations in your next model."
you can save the learner as soon it is created.
Example:
learn = cnn_learner(data,models.resnet34,metrics=error_rate)
learn.save('init')
later on:
learn.load('init)

Why should I build separated graph for training and validation in tensorflow?

I've been using tensorflow for a while now. At first I had stuff like this:
def myModel(training):
with tf.scope_variables('model', reuse=not training):
do model
return model
training_model = myModel(True)
validation_model = myModel(False)
Mostly because I started with some MOOCs that tought me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since I've been usually using only one model. My inputs are always place_holders and I just input either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimators code and it appears to create two different graphs for training and validation. May be that those issues are arising from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow helps you having a monolothical train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets such as having different inputs, different outputs, removing out intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't evaluate while training easily (though there are workarounds, train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.

Does Tensorflow use only one hot encoding to store labels?

I have just started working with Tensorflow, with Caffe it was super practical reading in the data in an efficient manner but with Tensorflow I see that I have to write data loading process myself, creating TFRecords, the batching, the multiple threats, handling those threads etc. So I started with an example, inception v3, as they handle the part to read in the data. I am new to Tensorflow and relatively new to Python, so I feel like I don't understand what is going on with this part exactly (I mean yes it extends the size of the labels to label_index * no of files -but- why? Is it creating one hot encoding for labels? Do we have to? Why doesn't it just extend as much for the length or files as each file have a label? Thx.
labels.extend([label_index] * len(filenames))
texts.extend([text] * len(filenames))
filenames.extend(filenames)
The whole code is here: https://github.com/tensorflow/models/tree/master/research/inception
The part mentioned is under data/build_image_data.py and builds image dataset from an existing dataset as images stored under folders (where foldername is the label): https://github.com/tensorflow/models/blob/master/research/inception/inception/data/build_image_data.py
Putting together what we discussed in the comments:
You have to one-hot encode because the network architecture requires you to, not because it's Tensorflow's demand. The network is a N-class classifier, so the final layer will have one neuron per class and you'll train the network to activate the neuron matching the class the sample belongs to. One-hot encoding the label is the first step in doing this.
About the human-readable labels, the code you're referring to is located in the _find_image_files function, which in turn is used by _process_dataset to transform the dataset from a set of folders to a set TfRecord files, which are a convenient input format type for Tensorflow.
The human-readable label string is included as a feature in the Examples inside the tfrecord files as an 'extra' (probably to simplify visualization of intermediate results during training), it is not strictly necessary for the dataset and will not be used in any way in the actual optimization of the network's parameters.

Is there a way to limit an existing TensorFlow object detection model?

On the tensorflow/models repo on GitHub, they supply five pre-trained models for object detection.
These models are trained on the COCO dataset and can identify 90 different objects.
I need a model to just detect people, and nothing else. I can modify the code to only print labels on people, but it will still look for the other 89 objects, which takes more time than just looking for one object.
I can train my own model, but I would rather be able to use a pre-trained model, instead of spending a lot of time training my own model.
So is there a way, either by modifying the model file, or the TensorFlow or Object Detection API code, so it only looks for a single object?
Either you finetune your model, with just a few thousands steps, on pedestrians (a small dataset to train would be enough) or you look in your in your label definition file (.pbtx file), search for the person label, and do whatever stuff you want with the others.

Categories

Resources