I'm trying to train a very simple neural network to classify samples of data where some classes necessarily succeed others - this is why I decided to let the input data enter the network in batches. With Tensorflow there are apparently multiple ways of declaring batches, such as tf.data.Dataset.batch (which I currently use, training with the Adam Optimizer) and tf.train.batch. What is the difference? Should the methods be used together or are they mutually exclusive? In the latter case: which one should I prefer?
tf.train.* is an older API that is more complex and more error-prone than tf.data.* (you have to manage queues, thread runners, the coordinator, etc. yourself). For your stated purpose (batching data and feeding it to a model), the two are functionally equivalent in that both achieve your goal. However, you should use tf.data, as it is both simpler and the currently recommended way to handle input datasets.
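As a rough illustration of the tf.data route, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution (so the dataset can be iterated directly), and the toy `features` and `labels` arrays are made up for the example. Since no shuffle step is added, `batch` keeps the samples in their original order, which matters when some classes have to follow others.

```python
import numpy as np
import tensorflow as tf

# Toy data (hypothetical): 10 samples with one feature each, classes in a fixed order.
features = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2], dtype=np.int64)

# Slice the arrays into per-sample elements, then group them into batches of 4.
# No shuffle is applied, so the original sample order is preserved.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

# In TF 2.x the dataset is a plain Python iterable of (features, labels) batches.
batches = [(x.numpy(), y.numpy()) for x, y in dataset]
for x, y in batches:
    print(x.shape, y)
```

The last batch is smaller than the others because 10 is not a multiple of 4; passing drop_remainder=True to batch would discard it instead.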
I am new to Pytorch and I am now following the tutorial on transforms. I see that the transformations are configured into the dataset object. I am wondering, however, why aren't they configured within the neural network itself. My naive point of view is that the transformations should be in any case the most external layers of the network, in the same way as the eye comes before the brain to transform light into signals for the brain, and you don't modify the world instead to adapt it to the brain.
So, is there any technical reason for putting the transformations in the dataset instead of the net? Is it a good/bad practice to put the transformations within my neural network instead? Why?
These are some of the reasons that can explain why one would do this.
We would like to use the same NN code for training as well as testing / inference. Typically during inference, we don't want to do any transformation and hence one might want to keep it out of the network. However, you may argue that one can simply use the model.training flag to skip the transformation.
Most of the transformations happen on the CPU. Doing them in the dataset makes it easy to use multiprocessing and prefetching: the data-loading code can fetch the data, transform it, and keep it ready to be fed into the NN from separate worker processes. If we instead do it inside the forward function, the GPU will sit idle during the transformations (as these happen on the CPU), likely leading to a longer training time.
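To make the pattern concrete, here is a minimal sketch of a dataset that applies its transform in `__getitem__`. It is deliberately written without importing torch so that it runs on its own; in real code the class would normally subclass torch.utils.data.Dataset. The names `TransformedDataset` and `random_flip` are hypothetical and exist only for this illustration.

```python
import random


class TransformedDataset:
    """Map-style dataset: it only needs __len__ and __getitem__."""

    def __init__(self, samples, labels, transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform  # optional callable applied per sample

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        sample = self.samples[index]
        # The transform runs here, in the data-loading code, not in the model.
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[index]


rng = random.Random(0)


def random_flip(sample):
    """Hypothetical augmentation: reverse the sample half of the time."""
    return sample[::-1] if rng.random() < 0.5 else sample


raw = [[1, 2, 3], [4, 5, 6]]
train_set = TransformedDataset(raw, [0, 1], transform=random_flip)  # augmented
eval_set = TransformedDataset(raw, [0, 1])                          # untouched

print(train_set[1], eval_set[1])
```

The same model code can then consume either dataset. With PyTorch, wrapping such a dataset in torch.utils.data.DataLoader with num_workers greater than zero is what moves the `__getitem__` calls, and therefore the transforms, into background worker processes.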
I've been using tensorflow for a while now. At first I had stuff like this:
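    import tensorflow as tf

    def myModel(training):
        # Reuse the existing variables when building the validation copy.
        with tf.variable_scope('model', reuse=not training):
            model = ...  # build the model here
        return model

    training_model = myModel(True)
    validation_model = myModel(False)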
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since, I've usually been using only one model. My inputs are always placeholders and I just feed in either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimators code and it appears to create two different graphs for training and validation. It may be that those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow helps you build a monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit in having distinct nets, such as different inputs, different outputs, removed intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
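For the single-net option, here is a minimal sketch of switching behavior with a boolean placeholder. It assumes TensorFlow 2.x used through the tf.compat.v1 graph API, and it swaps tf.layers.dropout for an explicit tf.cond around tf.nn.dropout, which illustrates the same idea but is not the layer discussed in the question. The placeholder names are made up for the example.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API, as shipped with TensorFlow 2.x

graph = tf.Graph()
with graph.as_default():
    inputs = tf1.placeholder(tf.float32, [None, 4], name="inputs")
    training = tf1.placeholder(tf.bool, [], name="training")

    # One graph, two behaviors: dropout is applied only when `training` is True.
    outputs = tf.cond(
        training,
        lambda: tf.nn.dropout(inputs, rate=0.5),
        lambda: tf.identity(inputs),
    )

    data = np.ones((2, 4), dtype=np.float32)
    with tf1.Session(graph=graph) as sess:
        # Training mode: each unit is either dropped (0.0) or rescaled (2.0).
        train_out = sess.run(outputs, {inputs: data, training: True})
        # Validation mode: the inputs pass through unchanged.
        valid_out = sess.run(outputs, {inputs: data, training: False})

print(train_out)
print(valid_out)
```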
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't evaluate while training easily (though there are workarounds, train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
I'm using Keras for a sliding window object detection system. This naturally requires the ability to do many, many classifications quickly. Unfortunately, Keras's model.predict() function has a significant overhead and spends more time on something else (loading? preprocessing the data? who knows) than it does on the actual network processing. I know because I've tried removing layers, etc. and it makes almost no difference to the time spent in a model.predict() call.
So basically what I'm looking for is a way to use one network and run predictions on several inputs at once. Not necessarily in separate threads, but without returning to my code. Is anyone aware of such a technique?
I have two models trained with Tensorflow Python, exported to binary files named export1.meta and export2.meta. Each file generates only one output when fed an input, say output1 and output2.
My question is whether it is possible to merge the two graphs into one big graph so that it generates output1 and output2 together in one execution.
Any comment will be helpful. Thanks in advance!
I kicked this around with my local TF expert, and the brief answer is "no"; TF doesn't have a built-in facility for this. However, you could write custom endpoint layers (input and output) with synch operations from Python's process management, so that they'd maintain parallel processing of each input, and concatenate the outputs.
Rationale
I like the way this could be used to get greater accuracy with multiple features, where the features have little or no correlation. For instance, you could train two character recognition models: one to identify the digit, the other to discriminate between left- and right-handed writers.
This would also allow you to examine the internal kernels that evolved for each individual feature, without interdependence with other features: the double-loop of an '8' vs the general slant of right-handed writing.
I also expect that the models for individual features will converge measurably faster than one over-arching training session.
Finally, it's quite possible that the individual models could be used in mix-and-match feature sets. For instance, you could train another model to differentiate letters, while your previously-trained left/right flagger would still have a pretty good guess at the writer's moiety.
OK, it's so easy in Torch ML ;) and I am following the indico example for threading to load the data: https://indico.io/blog/tensorflow-data-input-part2-extensions/
So far I have found three ways, none of which I like, and I am sure there is a better way.
1) Train and evaluate/validate in two different applications (runs): tensorflow/models/image/cifar10/cifar10_train.py and cifar10_eval.py
I don't like this one because it wastes resources, i.e. the GPUs on which cifar10_eval.py runs. I could do both from one file or application, but I don't want to save the model unless it is the best-performing one!
2) Create a validation model with weight sharing: tensorflow/models/image/mnist/convolutional.py
Much better, but I don't like the fact that I need to remember all the model parameters. I am sure there is a better way to share parameters in TensorFlow, i.e. can I just copy the model and say that its parameters are shared but its input feeds are different?
3) The one I am currently using: tf.placeholder
But I can't do the threading things, i.e. tf.RandomShuffleQueue, with this approach. Maybe I just don't know how to do it this way.
So, how can I use threading to load the training data, do one epoch of training, then use those weights and again use threading to load the validation data and measure the model's performance?
Basically, I mean multiple threads to load the training and validation data, saving the best-performing model. An example EXACTLY like the ImageNet multi-GPU training in Torch: https://github.com/soumith/imagenet-multiGPU.torch
Thank you so much!
The variable-sharing approach is probably the easiest way to do what you want.
Take a look at the "Sharing Variables" tutorial; by using tf.variable_scope() and tf.get_variable() you can reuse variables without having to manage the sharing explicitly. You can instead define the model in a function, call it with different arguments, but share the model variables between the two calls.
There are also convenience layers that wrap Tensorflow's variable management. One option is Tensorflow Slim, which makes it easier to define some classes of models (especially convolutional models).
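A minimal sketch of the variable-sharing approach, assuming TensorFlow 2.x with the TF1-style API reached through tf.compat.v1. The function name `build_model`, the layer shapes, and the placeholder names are made up for illustration. The model is defined once in a function and called twice with different inputs, and tf.get_variable inside a reusing tf.variable_scope ensures both calls use the same weights.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API, as shipped with TensorFlow 2.x


def build_model(inputs):
    """Hypothetical one-layer model; its variables live in the 'model' scope."""
    # AUTO_REUSE creates the variables on the first call and reuses them afterwards.
    with tf1.variable_scope("model", reuse=tf1.AUTO_REUSE):
        w = tf1.get_variable("w", shape=[3, 2], initializer=tf1.ones_initializer())
        b = tf1.get_variable("b", shape=[2], initializer=tf1.zeros_initializer())
        return tf.matmul(inputs, w) + b


graph = tf.Graph()
with graph.as_default():
    train_in = tf1.placeholder(tf.float32, [None, 3], name="train_in")
    valid_in = tf1.placeholder(tf.float32, [None, 3], name="valid_in")

    # Two nets with different input feeds but a single set of variables.
    train_out = build_model(train_in)
    valid_out = build_model(valid_in)

    variable_names = sorted(v.name for v in tf1.global_variables())

    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        result = sess.run(valid_out, {valid_in: np.ones((1, 3), dtype=np.float32)})

print(variable_names)
print(result)
```

Only one `w` and one `b` exist in the graph even though the model function was called twice, so there is no need to keep track of the parameters by hand.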