Reading the Keras code for Sequential models, I see that it allows only a single output for any layer defined within a Sequential model. I know how to get multiple outputs using the functional API (the Model class).
However, I don't see why the Sequential model is limited to layers with a single output. Is there a design reason for enforcing this constraint?
Not really. The Sequential model exists to keep things simple when designing smaller, straightforward neural networks. As noted here, it is useful for most problems:
The Sequential API allows you to create models layer-by-layer for most
problems. It is limited in that it does not allow you to create models
that share layers or have multiple inputs or outputs.
But if you need a more complex design, with multiple inputs or outputs or with models that share layers, you can use the Functional API to achieve your goal.
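As a minimal sketch of the multiple-output case, assuming TensorFlow 2.x with tf.keras: the layer sizes and the names "features", "class_out" and "value_out" are made up for illustration and do not come from the question.

```python
import tensorflow as tf

# One input feeds a shared hidden layer, which feeds two separate output heads.
inputs = tf.keras.Input(shape=(16,), name="features")  # hypothetical input size
hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
class_out = tf.keras.layers.Dense(3, activation="softmax", name="class_out")(hidden)
value_out = tf.keras.layers.Dense(1, name="value_out")(hidden)

# A functional Model accepts a list of outputs, which Sequential cannot express.
model = tf.keras.Model(inputs=inputs, outputs=[class_out, value_out])

# Calling the model returns one tensor per output head.
batch = tf.zeros((4, 16))
class_pred, value_pred = model(batch)
print(class_pred.shape, value_pred.shape)
```

Each head can then be given its own loss when the model is compiled.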
I need to implement a neural network which is NOT layer based, meaning that ANY neuron may be connected to any other neuron, and that there's no way to logically organize them in consecutive layers.
What I'm asking for is an example or a reference to proper and clear documentation about how to implement the following:
Originally I had my own implementation in MATLAB. However, I've been using TensorFlow and Keras to test simple models, and they let you tune your networks very quickly with pretty efficient implementations, so I decided to try out more complex models. I just got stuck creating this type of network.
HINT: It MAY be OK to create single-neuron layers, as long as you can connect a layer to ANY layer (without caring if it is not adjacent) and to MORE THAN ONE LAYER.
I'm new to TF and Keras, so a simple Python example would be appreciated, although pointing me in the right direction would be OK too.
This is an example network (loops are intentional!):
I don't need to train at the moment, just to evaluate models. Keep in mind, however, that evaluation of this kind of network is different too: one possible way is to keep sending the signal until the output stabilizes, but that is just an example.
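The "keep sending the signal until the output stabilizes" idea can be sketched in plain Python without any framework. Everything here is an assumption for illustration: the four-neuron topology, the weights, the tanh activation and the synchronous update rule are not taken from the question's (missing) figure.

```python
import math

# Hypothetical network: weights[(src, dst)] is the strength of the edge src -> dst.
# Neuron 0 is the input and neuron 3 the output; neurons 1 and 2 form a loop.
weights = {
    (0, 1): 0.8,
    (1, 2): 0.5,
    (2, 1): -0.3,  # feedback edge (intentional loop)
    (2, 3): 1.0,
    (0, 3): 0.2,   # connection to a non-adjacent neuron
}

def evaluate(weights, n_neurons, inputs, tol=1e-6, max_steps=1000):
    """Update all neurons synchronously until the activations stop changing."""
    state = [0.0] * n_neurons
    for idx, value in inputs.items():
        state[idx] = value
    for step in range(max_steps):
        new_state = list(state)
        for dst in range(n_neurons):
            if dst in inputs:
                continue  # input neurons stay clamped to their given value
            total = sum(w * state[src] for (src, d), w in weights.items() if d == dst)
            new_state[dst] = math.tanh(total)
        if max(abs(a - b) for a, b in zip(state, new_state)) < tol:
            return new_state, step + 1
        state = new_state
    raise RuntimeError("activations did not stabilize")

state, steps = evaluate(weights, 4, {0: 1.0})
print(steps, state[3])
```

Convergence is not guaranteed for arbitrary weights; this particular loop is weak enough to settle, and the max_steps guard covers the cases that do not.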
I have started learning TensorFlow, but I notice there is a lot of confusion about how to use it.
First, some tutorials present models using the low-level API (tf.Variable, scopes, etc.), while other tutorials use Keras instead and, for example, use callbacks to invoke TensorBoard.
Second, what is the purpose of having so many duplicate APIs? What is the point of a high-level API like Keras when the low-level API already lets you build models like Lego blocks?
Finally, what's the true purpose of using eager execution?
You can use these APIs together. For example, if you have a regular dense network with one special layer, you can use the higher-level API (tf.layers and tf.keras) for the dense layers and the low-level API for your special layer. Furthermore, complex graphs are easier to define in the low-level API, e.g. if you want to share variables.
Eager execution helps with fast debugging: it evaluates tensors directly, without the need to invoke a session.
There are different "levels" of APIs (high-level APIs such as keras and estimators, and low level APIs such as Variables, etc) to suit different developer needs.
For the average industry developer who already knows approximately which ML model they intend to use, keras is a good fit. For example, if you know you want to implement a sequential model with two dense layers with softmax activation, you need only do something like:
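import tensorflow as tf
from tensorflow import keras

# Two dense layers, both with softmax activation
model = keras.Sequential([
    keras.layers.Dense(128, activation=tf.nn.softmax),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])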
Using keras is generally simpler as you don't have to think about low-level implementation details such as tf.Variables. For more complete examples, check out the keras tutorials on tensorflow.org.
The low-level API gives you finer control over the models you're developing. These APIs are more commonly used by developers and researchers developing novel ML methods; for example, if you need a specialized layer that does something different from canonical ML methods, you can manually define a layer using low-level APIs.
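A rough sketch of mixing the two levels, assuming TensorFlow 2.x: the Scale layer below is a hypothetical "special layer" invented for this example (a learnable per-feature multiplier), sitting between ordinary Dense layers.

```python
import tensorflow as tf

class Scale(tf.keras.layers.Layer):
    """Hypothetical custom layer: multiplies each feature by a learned factor."""

    def build(self, input_shape):
        # One trainable scalar per input feature, created on first use.
        self.scale = self.add_weight(
            shape=(input_shape[-1],),
            initializer="ones",
            trainable=True,
            name="scale",
        )

    def call(self, inputs):
        return inputs * self.scale

# High-level Dense layers combined with the hand-written layer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    Scale(),
    tf.keras.layers.Dense(4),
])

out = model(tf.ones((2, 5)))
print(out.shape)
```

The custom layer only defines its variables and its forward computation; training, saving and the rest of the Keras machinery treat it like any built-in layer.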
Finally, eager execution is an imperative programming style. It enables faster debugging, and has a gentler learning curve for those new to tensorflow, since it is more "pythonic"/intuitive. Check out the eager guide for more.
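To illustrate, here is a small sketch assuming TensorFlow 2.x, where eager execution is enabled by default (in 1.x it had to be switched on explicitly). The function name squared is made up for the example.

```python
import tensorflow as tf

# Eager mode: operations run immediately and return concrete values,
# so there is no session and no separate graph-building step.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())  # the result is available right away as a NumPy array

# The same computation can still be compiled into a graph when needed.
@tf.function
def squared(t):
    return tf.matmul(t, t)

z = squared(x)
```

Because intermediate values are ordinary Python objects, they can be inspected with print statements or a debugger at any point.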
I've been using tensorflow for a while now. At first I had stuff like this:
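def myModel(training):
    # Create the variables on the training call, reuse them on the validation call
    with tf.variable_scope('model', reuse=not training):
        model = ...  # build the model here
    return model

training_model = myModel(True)
validation_model = myModel(False)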
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or Queues. And I didn't know why I was using two separate models. I tried building only one and feeding the data with the feed_dict: everything worked.
Ever since, I have usually used only one model. My inputs are always placeholders and I just feed either training or validation data.
Lately, I've noticed some weird behavior in models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimators code and it appears to create two different graphs for training and validation. It may be that those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate but equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, tensorflow lets you have a monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit in having distinct nets, such as different inputs, different outputs, removal of intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
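For comparison, here is how the single-net variant looks in TensorFlow 2.x with tf.keras, which postdates the placeholder-based code in the question. This is only a sketch with made-up layer sizes: one set of variables, with the training flag passed per call instead of through a tf.bool placeholder.

```python
import tensorflow as tf

# One model, one set of variables, used for both training and validation.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])

x = tf.ones((4, 3))

# training=True: dropout randomly zeroes activations on every call.
train_out = model(x, training=True)

# training=False: dropout is a no-op, so repeated calls are deterministic.
val_a = model(x, training=False)
val_b = model(x, training=False)
print(bool(tf.reduce_all(tf.equal(val_a, val_b))))
```

Layers such as dropout and batch normalization read this flag to pick their behavior, which is the same role the 'training' parameter plays in tf.layers.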
tf.estimator.Estimator classes indeed create a new graph for each invocation, and this has been the subject of furious debate; see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocation and to restore the model from the last checkpoint. There are clear downsides to this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't evaluate while training easily (though there are workarounds, train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
What more can be done using the Keras functional API that could not be done using Keras Sequential models?
Apart from the fact that a simple model can be reused for time-based data using the “TimeDistributed” layer wrapper?
It is much more than model reuse: the functional API allows you to easily define models where layers connect to more than just the previous and next layers. You can connect layers to any other layers as you wish, so siamese networks, densely connected networks and such become possible. The old Graph API allowed the same level of connectivity but it was a PITA due to its use of layer node names to define connectivity.
The sequential model is just a sequential set of layers, and new neural network architectures at this time are moving away from such a pattern.
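A small sketch of that kind of connectivity, assuming TensorFlow 2.x: two inputs pass through one shared encoder layer (siamese style), and the final layer sees both the merged encodings and a later hidden layer (dense-style skip connection). All sizes and variable names are illustrative.

```python
import tensorflow as tf

layers = tf.keras.layers

left = tf.keras.Input(shape=(10,))
right = tf.keras.Input(shape=(10,))

# A single layer object applied twice: both branches share the same weights.
encoder = layers.Dense(16, activation="relu")
enc_left = encoder(left)
enc_right = encoder(right)

merged = layers.Concatenate()([enc_left, enc_right])
hidden = layers.Dense(32, activation="relu")(merged)

# Skip connection: the output layer is fed by a non-adjacent layer as well.
skip = layers.Concatenate()([merged, hidden])
output = layers.Dense(1, activation="sigmoid")(skip)

model = tf.keras.Model(inputs=[left, right], outputs=output)
pred = model([tf.zeros((2, 10)), tf.zeros((2, 10))])
print(pred.shape)
```

Neither the shared encoder nor the skip connection can be written as a plain stack of layers, which is why Sequential cannot represent this model.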
OK, it's so easy in Torch ML ;) and I am following the indico example for using threading to load the data: https://indico.io/blog/tensorflow-data-input-part2-extensions/
So far I have found three ways, none of which I like, and I am sure there is a better way.
1) Train and evaluate/validate in two different applications (runs): tensorflow/models/image/cifar10/cifar10_train.py and cifar10_eval.py
I don't like this one because it wastes resources, i.e. the GPUs on which cifar10_eval.py runs. I could do both from one file or application, but I don't want to save the model if it is not the best-performing one!
2) Create a validation model with weight sharing: tensorflow/models/image/mnist/convolutional.py
Much better, but I don't like the fact that I need to keep track of all the model parameters. I am sure there is a better way to share parameters in TensorFlow, i.e. can I just copy the model and say that the parameters are shared but the input feeds are different?
3) The one I am currently using is tf.placeholder
But I can't do threaded loading, i.e. tf.RandomShuffleQueue, with this approach. Maybe I just don't know how to do it this way.
So, how can I use threading to load the training data and do one epoch of training, then use those weights while again using threading to load the validation data and measure the model's performance?
Basically, I mean using multiple threads to load the training and validation data and saving the best-performing model. An example EXACTLY like the ImageNet multi-GPU training in Torch: https://github.com/soumith/imagenet-multiGPU.torch
Thank you so much!
The variable-sharing approach is probably the easiest way to do what you want.
Take a look at the "Sharing Variables" tutorial; by using tf.variable_scope() and tf.get_variable() you can reuse variables without having to manage the sharing explicitly. You can instead define the model in a function, call it with different arguments, but share the model variables between the two calls.
There are also convenience layers that wrap TensorFlow's variable management. One option is TensorFlow-Slim, which makes it easier to define some classes of models (especially convolutional models).
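The variable-sharing pattern might look roughly like this. It is a sketch under assumptions: it uses the tf.compat.v1 aliases so that it also runs under TensorFlow 2.x, reuse=AUTO_REUSE stands in for managing the reuse flag by hand, and the function name model_fn, the tiny linear model and the placeholder shapes are invented for the example.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode (1.x style) API

def model_fn(inputs):
    # AUTO_REUSE creates the variables on the first call and reuses them afterwards.
    with tf1.variable_scope("model", reuse=tf1.AUTO_REUSE):
        w = tf1.get_variable("w", shape=[3, 1], initializer=tf1.zeros_initializer())
        b = tf1.get_variable("b", shape=[1], initializer=tf1.zeros_initializer())
        return tf.matmul(inputs, w) + b

graph = tf.Graph()
with graph.as_default():
    # Two different input feeds; in practice these could be queue outputs.
    train_in = tf1.placeholder(tf.float32, [None, 3], name="train_in")
    valid_in = tf1.placeholder(tf.float32, [None, 3], name="valid_in")

    # Same function, different inputs, one shared set of variables.
    train_out = model_fn(train_in)
    valid_out = model_fn(valid_in)

    var_names = sorted(v.name for v in tf1.global_variables())

print(var_names)
```

Only one copy of each variable exists in the graph even though the model function was called twice, so the validation branch always sees the weights produced by training.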