I'd like to have two separate networks running in Theano at the same time, where the first network trains on the results of the second. I could embed both networks in the same structure, but that would be a real mess in the entire forward pass (and probably wouldn't even work because of the shared variables, etc.).
The problem is that when I define a Theano function I don't specify the model it's applied to, meaning that if I have a predict and a train function they'll both work on the first model I define.
Is there a way to overcome that issue?
I've managed to find a rather simple solution. The trick was to create one model and compile its functions, and only then create the other model and compile its functions. Works like a charm.
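A minimal sketch of that ordering in Theano (build_model and its softmax layer are illustrative, not the actual networks from the question):

import numpy as np
import theano
import theano.tensor as T

def build_model(n_in, n_out):
    # each call creates its own symbolic input and its own shared weights
    x = T.matrix('x')
    w = theano.shared(np.zeros((n_in, n_out), dtype='float32'), name='w')
    return x, T.nnet.softmax(T.dot(x, w))

# build the first model and compile its functions first...
x1, out1 = build_model(10, 2)
predict_1 = theano.function([x1], out1)

# ...then build the second model and compile its functions
x2, out2 = build_model(10, 2)
predict_2 = theano.function([x2], out2)

Each compiled function is bound to the symbolic variables of the graph it was built from, so the two models don't interfere.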
I am new to PyTorch and I am now following the tutorial on transforms. I see that the transformations are configured in the dataset object. I am wondering, however, why they aren't configured within the neural network itself. My naive point of view is that the transformations should in any case be the most external layers of the network, in the same way the eye comes before the brain and transforms light into signals for it; you don't modify the world to adapt it to the brain.
So, is there any technical reason for putting the transformations in the dataset instead of the net? Is it a good/bad practice to put the transformations within my neural network instead? Why?
These are some of the reasons that can explain why one would do this.
We would like to use the same NN code for training as well as testing/inference. Typically, during inference we don't want to do any transformation, and hence one might want to keep it out of the network. However, you may argue that one could simply use the model.training flag to skip the transformation.
Most of the transformations happen on the CPU. Doing transformations in the dataset makes it easy to use multi-processing and prefetching: the dataset code can prefetch the data, transform it, and keep it ready to be fed into the NN in a separate worker. If we instead do it inside the forward function, the GPU will idle during the transformations (as they happen on the CPU), likely leading to a longer training time.
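A minimal sketch of the usual pattern, with the transforms attached to the dataset so the DataLoader workers run them on the CPU (the MNIST dataset and the particular transforms are only illustrative):

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)

# num_workers > 0: the transforms run in separate worker processes and overlap
# with the GPU's forward/backward pass on the previous batch
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)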
I'm using the Keras plot_model() function to visualize my machine learning models. Apart from the issue that the first node of my output is always simply a very large number, there is another thing annoying me about this function: it does not provide very elaborate output. For example, I would like to be able to see more information about the loss function, the batch size, the number of epochs, the optimizer used, etc.
Is there any way I can retrieve this information from a model I previously saved to the disk and loaded again with the model_from_json() function?
How about the TensorBoard callback? It will create interactive graphs of your model that you can explore, if you use TensorFlow as your backend.
You just need to add it as a callback to your fit function and make sure write_graph=True is set (which it is by default). If you want a shortcut, you can invoke its methods directly instead of passing it as a callback:
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', write_graph=True)
tensorboard.set_model(model)    # your model here; creates the writer and writes the graph
tensorboard.on_train_end(None)  # closes the writer
Then just run tensorboard --logdir=./logs to start the server.
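For the more common callback route (assuming model, x_train and y_train are your compiled model and training data, which are not shown in the question), the same object can simply be passed to fit:

model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard])  # Keras calls set_model() and closes the writer for you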
I've been using tensorflow for a while now. At first I had stuff like this:
def myModel(training):
    with tf.variable_scope('model', reuse=not training):
        # ... build the model ...
        return model

training_model = myModel(True)
validation_model = myModel(False)
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or queues, and I didn't know why I was using two separate models. I tried building only one and feeding the data with feed_dict: everything worked.
Ever since, I've usually been using only one model. My inputs are always placeholders and I just feed in either training or validation data.
Lately, I've noticed some weird behavior in models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I set with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimator code and it appears to create two different graphs for training and validation. Maybe those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, TensorFlow helps you have a monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? Having separate nets for training and for validation sets you on the right path and future-proofs your code. Your training and validation nets might be identical today, but you might later see some benefit in having distinct nets, such as different inputs, different outputs, removing intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
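A minimal sketch of the shared-variables pattern in TF 1.x (build_net, its layer sizes, and the placeholder inputs are illustrative, not taken from the question):

import tensorflow as tf

def build_net(x, is_training):
    with tf.variable_scope('net', reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 128, activation=tf.nn.relu)
        h = tf.layers.dropout(h, rate=0.5, training=is_training)
        return tf.layers.dense(h, 10)

x_train = tf.placeholder(tf.float32, [None, 784])
x_valid = tf.placeholder(tf.float32, [None, 784])

train_logits = build_net(x_train, is_training=True)   # creates the variables
valid_logits = build_net(x_valid, is_training=False)  # reuses the same variables

The two heads share every variable, so the validation net adds graph nodes but essentially no extra parameter memory.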
tf.estimator.Estimator classes indeed create a new graph for each invocation, and this has been the subject of furious debate; see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate, and predict invocation and to restore the model from the last checkpoint. There are clear downsides to this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't easily evaluate while training (though there is a workaround, train_and_evaluate, sketched below, but it doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
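For completeness, a minimal sketch of the train_and_evaluate workaround mentioned above (model_fn, train_input_fn and eval_input_fn are assumed to be defined elsewhere):

import tensorflow as tf

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='./ckpt')

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, throttle_secs=60)

# rebuilds the graph and restores from the latest checkpoint for each phase
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)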
OK, it's so easy in Torch ML ;) and I am following the indico example for threading to load the data: https://indico.io/blog/tensorflow-data-input-part2-extensions/
So far I have found three ways, none of which I like, and I am sure there is a better one.
1) Train and evaluate/validate in two different applications/runs: tensorflow/models/image/cifar10/cifar10_train.py and cifar10_eval.py
I don't like this one because I will waste resources, i.e. the GPUs where cifar10_eval.py runs. I could do both from one file or application, but I don't want to save a checkpoint if the model is not the best performing one!
2) Create a validation model with weight sharing: tensorflow/models/image/mnist/convolutional.py
Much better, but I don't like the fact that I need to remember all the model parameters. I am sure there is a better way to share parameters in TensorFlow, i.e. can I just copy the model and say it's for parameter sharing, but the input feeds are different?
3) The one I am currently using relies on tf.placeholder
But I can't do the threading, i.e. tf.RandomShuffleQueue, with this approach. Maybe I just don't know how to do it this way.
So, how can I use threading to load the training data, do one epoch of training, then use those weights and again use threading to load the validation data and get the model's performance?
Basically, I am asking for multiple threads to load the training and validation data and to save the best performing model. An example EXACTLY like this is the ImageNet multi-GPU training in Torch: https://github.com/soumith/imagenet-multiGPU.torch
Thank you so much!
The variable-sharing approach is probably the easiest way to do what you want.
Take a look at the "Sharing Variables" tutorial; by using tf.variable_scope() and tf.get_variable() you can reuse variables without having to manage the sharing explicitly. You can instead define the model in a function, call it with different arguments, but share the model variables between the two calls.
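A minimal sketch of that pattern, assuming a training batch coming from a queue and a validation placeholder as the two inputs (inference, train_batch and valid_placeholder are illustrative names):

import tensorflow as tf

def inference(inputs):
    # tf.get_variable returns the existing variable when the scope is reused
    w = tf.get_variable('w', [784, 10], initializer=tf.zeros_initializer())
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    return tf.matmul(inputs, w) + b

with tf.variable_scope('model') as scope:
    train_logits = inference(train_batch)        # first call creates w and b
    scope.reuse_variables()
    valid_logits = inference(valid_placeholder)  # second call reuses the same w and b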
There are also convenience layers that wrap Tensorflow's variable management. One option is Tensorflow Slim, which makes it easier to define some classes of models (especially convolutional models).
I'm trying to write something similar to Google's wide and deep learning model after running into difficulties doing multi-class classification (12 classes) with the sklearn API. I've tried to follow the advice in a couple of posts and used tf.group(logistic_regression_optimizer, deep_model_optimizer). It seems to work, but I'm trying to figure out how to get predictions out of this model. I'm hoping that with the tf.group operator the model is learning to weight the logistic and deep models differently, but I don't know how to get these weights out so I can get the right combination of the two models' predictions. Thanks in advance for any help.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Cs0R75AGi8A
How to set layer-wise learning rate in Tensorflow?
tf.group() creates a node that forces a list of other nodes to run using control dependencies. It's really just a handy way to package up logic that says "run this set of nodes, and I don't care about their output". In the discussion you point to, it's just a convenient way to create a single train_op from a pair of training operators.
If you're interested in the value of a Tensor (e.g., weights), you should pass it to session.run() explicitly, either in the same call as the training step, or in a separate session.run() invocation. You can pass a list of values to session.run(), for example, your tf.group() expression, as well as a Tensor whose value you would like to compute.
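A toy sketch of that, with two stand-in optimizers grouped into one train op (the wide/deep losses here are stand-ins for the real model, not the actual wide-and-deep code):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])

# two toy "models" standing in for the wide and deep parts
w_wide = tf.Variable(tf.zeros([4, 1]))
w_deep = tf.Variable(tf.zeros([4, 1]))
loss_wide = tf.reduce_mean(tf.square(tf.matmul(x, w_wide) - y))
loss_deep = tf.reduce_mean(tf.square(tf.matmul(x, w_deep) - y))

opt_wide = tf.train.FtrlOptimizer(0.1).minimize(loss_wide)
opt_deep = tf.train.AdagradOptimizer(0.1).minimize(loss_deep)

# a single op that runs both training steps; it has no useful output itself
train_op = tf.group(opt_wide, opt_deep)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # fetch the train op and any tensors you care about (e.g. weights) in one call
    _, wide_w, deep_w = sess.run([train_op, w_wide, w_deep],
                                 feed_dict={x: [[1., 2., 3., 4.]], y: [[1.]]})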
Hope that helps!