I am starting to learn TensorFlow 2.0, and one major source of my confusion is when to use the Keras-style model.compile vs. tf.GradientTape to train a model.
In the TensorFlow 2.0 tutorial for MNIST classification, they train two similar models: one with model.compile and the other with tf.GradientTape.
Apologies if this is trivial, but when do you use one over the other?
This is really a case-specific thing and it's difficult to give a definite answer here (it might border on "too opinion-based"). But in general, I would say:
The "classic" Keras interface (using compile, fit, etc.) allows for quick and easy building, training & evaluation of standard models. However, it is very high-level/abstract and as such doesn't give you much low-level control. If you are implementing models with non-trivial control flow, this can be hard to accommodate.
GradientTape gives you full low-level control over all aspects of training/running your model, allowing easier debugging as well as more complex architectures, but you will need to write more boilerplate code for many things that a compiled model hides from you (e.g. training loops). Still, if you do research in deep learning, you will probably be working at this level most of the time.
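To make the contrast concrete, here is a minimal sketch of the same toy model trained both ways (the data and hyperparameters are made up purely for illustration):

    import numpy as np
    import tensorflow as tf

    # Dummy data just so the sketch is self-contained.
    x_train = np.random.rand(128, 20).astype("float32")
    y_train = np.random.randint(0, 10, size=(128,))
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam()

    # High-level: compile/fit hides the training loop from you.
    model.compile(optimizer=optimizer, loss=loss_fn)
    model.fit(x_train, y_train, epochs=1)

    # Low-level: one epoch of the equivalent loop, written out by hand.
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)
            loss = loss_fn(y_batch, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))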
Related
This tutorial describes how to build a TFF computation from a Keras model.
This tutorial describes how to build a custom TFF computation from scratch, possibly with a custom federated learning algorithm.
What I need is a combination of these: I want to build a custom federated learning algorithm, and I want to use an existing Keras model. How can this be done?
The second tutorial requires MODEL_TYPE, which is based on MODEL_SPEC, but I don't know how to get it. I can see some variables in model.trainable_variables (where model = tff.learning.from_keras_model(keras_model, ...)), but I doubt that's what I need.
Of course, I can implement the model by hand (as in the second tutorial), but I want to avoid it.
I think you have the correct pointers for writing a custom federated computation, as well as converting a Keras model to a tff.learning.Model. So we'll focus on pulling a TFF type signature from an existing tff.learning.Model.
Once you have your hands on such a model, you should be able to use tff.learning.framework.weights_type_from_model to pull out the appropriate TFF type to use for your custom algorithm.
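As a rough sketch (assuming a TFF version where tff.learning.framework.weights_type_from_model is available; the Keras model and input_spec below are made up for illustration, so adjust them to your own model and data):

    import collections
    import tensorflow as tf
    import tensorflow_federated as tff

    # A made-up Keras model, just for illustration.
    def create_keras_model():
        return tf.keras.Sequential([
            tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,))
        ])

    # input_spec must match the batch format your federated data will have.
    input_spec = collections.OrderedDict(
        x=tf.TensorSpec([None, 784], tf.float32),
        y=tf.TensorSpec([None, 1], tf.int32))

    def model_fn():
        return tff.learning.from_keras_model(
            keras_model=create_keras_model(),
            input_spec=input_spec,
            loss=tf.keras.losses.SparseCategoricalCrossentropy())

    # Pull out the TFF type of the model weights for your custom algorithm.
    model_weights_type = tff.learning.framework.weights_type_from_model(model_fn)
    print(model_weights_type)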
There is an interesting caveat here: how precisely you use a tff.learning.Model in your custom algorithm is pretty much up to you, and this could affect your desired model weights type. In practice this is unlikely to matter (most likely you will simply be assigning values from incoming tensors to the model variables), so I think we can avoid going deeper into this caveat.
Finally, a few pointers to end-to-end custom algorithm implementations in TFF:
One of the simplest complete examples TFF has is simple_fedavg, which is totally self-contained and contains instructions for running.
The code for a paper on Adaptive Federated Optimization contains a handwritten implementation of learning rate decay on the clients in TFF.
A similar implementation of adaptive learning rate decay (think Keras' functions to decay learning rate on plateaus) is right next door to the code for AFO.
I see many examples with either MonitoredTrainingSession or tf.estimator.Estimator as the training framework. However, it's not clear why I would use one over the other. Both are configurable with SessionRunHooks. Both integrate with tf.data.Dataset iterators and can feed training/validation datasets. I'm not sure what the benefits of one setup over the other would be.
The short answer is that MonitoredTrainingSession lets the user access the Graph and Session objects and the training loop, while Estimator hides the details of graphs and sessions from the user and generally makes it easier to run training, especially with train_and_evaluate if you need to evaluate periodically.
MonitoredTrainingSession differs from a plain tf.Session() in that it handles variable initialization and sets up file writers, and it also incorporates functionality for distributed training.
The Estimator API, on the other hand, is a high-level construct just like Keras. It's maybe used less in the examples because it was introduced later. It also allows you to distribute training/evaluation with DistributionStrategy, and it has several canned estimators which allow rapid prototyping.
In terms of model definition they are pretty equal; both allow you to use keras.layers or to define a completely custom model from the ground up. So, if, for whatever reason, you need to access graph construction or customize the training loop, use MonitoredTrainingSession. If you just want to define a model, train it, and run validation and prediction without additional complexity and boilerplate code, use Estimator.
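For illustration, here is a minimal sketch of both setups on the same toy regression problem (data, paths and hyperparameters are arbitrary):

    import numpy as np
    import tensorflow as tf  # TF 1.x APIs

    # Dummy regression data so the sketch is self-contained.
    xs = np.random.rand(256, 10).astype(np.float32)
    ys = np.random.rand(256, 1).astype(np.float32)

    # --- MonitoredTrainingSession: you build the graph and own the loop ---
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    loss = tf.losses.mean_squared_error(y, tf.layers.dense(x, 1))
    step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=step)

    hooks = [tf.train.StopAtStepHook(last_step=100)]
    with tf.train.MonitoredTrainingSession(checkpoint_dir='/tmp/mts', hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op, feed_dict={x: xs, y: ys})

    # --- Estimator: graph and session are hidden behind model_fn/input_fn ---
    def model_fn(features, labels, mode):
        loss = tf.losses.mean_squared_error(labels, tf.layers.dense(features, 1))
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    def input_fn():
        return tf.data.Dataset.from_tensor_slices((xs, ys)).repeat().batch(32)

    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='/tmp/est')
    estimator.train(input_fn=input_fn, steps=100)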
Well, I've started learning TensorFlow, but I notice there's so much confusion about how to use this thing.
First, some tutorials present models using the low-level API (tf.Variable, scopes, etc.), but other tutorials use Keras instead and, for example, use TensorBoard by invoking callbacks.
Second, what's the purpose of having a ton of duplicate APIs? Really, what's the point of using a high-level API like Keras when you have the low-level one to build models like Lego blocks?
Finally, what's the true purpose of using eager execution?
You can use these APIs all together. E.g., if you have a regular dense network but with one special layer, you can use the higher-level API for the dense layers (tf.layers and tf.keras) and the low-level API for your special layer. Furthermore, complex graphs are easier to define in the low-level APIs, e.g. if you want to share variables.
Eager execution helps you with fast debugging: it evaluates tensors directly, without the need to invoke a session.
There are different "levels" of APIs (high-level APIs such as Keras and Estimators, and low-level APIs such as tf.Variable, etc.) to suit different developer needs.
For the average industry developer, who already knows approximately what ML model they intend to use, Keras is a good fit. For example, if you know you want to implement a sequential model with two dense layers with softmax activation, you need only do something like:
    import tensorflow as tf
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(128, activation=tf.nn.softmax),
        keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
Using Keras is generally simpler, as you don't have to think about low-level implementation details such as tf.Variable. For more complete examples, check out the Keras tutorials on tensorflow.org.
The low-level APIs give users finer control over the models they're developing. These APIs are more commonly used by developers and researchers developing novel ML methods; for example, if you need a specialized layer that does something different from canonical ML methods, you can manually define a layer using the low-level APIs.
Finally, eager execution is an imperative programming style. It enables faster debugging and has a gentler learning curve for those new to TensorFlow, since it is more "pythonic"/intuitive. Check out the eager execution guide for more.
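As a tiny illustration of what "imperative" means here:

    import tensorflow as tf  # TF 2.x, eager by default

    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)
    print(b)          # values are computed immediately, no session needed
    print(b.numpy())  # and they interoperate with NumPy for easy inspection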
I have a dataset with 11k instances containing 0s, 1s and -1s. I heard that deep learning can be applied to feature values, so I applied it to my dataset, but surprisingly it resulted in lower accuracy (<50%) than traditional machine learning algorithms (RF, SVM, ELM). Is it appropriate to apply deep learning algorithms to feature values for a classification task? Any suggestion is greatly appreciated.
First of all, deep learning isn't a mythical hammer you can throw at every problem and expect better results. It requires careful analysis of your problem, choosing the right method, crafting your network, and properly setting up your training; only then, with a lot of luck, will you see significantly better results than with classical methods.
From what you describe (and without any more details about your implementation), it seems to me that several things could have gone wrong:
Your task may simply not be suited to a neural network. Some tasks are still better solved with classical methods, since these manually account for patterns in your data or distill your domain reasoning/knowledge into a prediction. You might not be directly aware of it, but sometimes neural networks are just overkill.
You don't describe how your 11,000 instances are distributed with respect to the target classes, how big the input is, what kind of preprocessing you are performing for either method, and so on. Maybe your data is simply processed wrong, your training is diverging due to an unfortunate parameter setup, or any of a number of other things.
To expect a reasonable answer, you would have to share at least a bit of code regarding the implementation of your task, along with the parameters you are using for training.
I've been using tensorflow for a while now. At first I had stuff like this:
    def myModel(training):
        with tf.variable_scope('model', reuse=not training):
            # ... build the model here ...
            return model

    training_model = myModel(True)
    validation_model = myModel(False)
Mostly because I started with some MOOCs that taught me to do that. But they also didn't use TFRecords or queues, and I didn't know why I was using two separate models. I tried building only one and feeding the data with feed_dict: everything worked.
Ever since, I've usually been using only one model. My inputs are always placeholders, and I just feed in either training or validation data.
Lately, I've noticed some weird behavior on models that use tf.layers.dropout and tf.layers.batch_normalization. Both functions have a 'training' parameter that I use with a tf.bool placeholder. I've seen tf.layers used generally with a tf.estimator.Estimator, but I'm not using it. I've read the Estimator code, and it appears to create two different graphs for training and validation. Maybe those issues arise from not having two separate models, but I'm still skeptical.
Is there a clear reason I'm not seeing that implies that two separate-equivalent models have to be used?
You do not have to use two neural nets for training and validation. After all, as you noticed, TensorFlow helps you keep a monolithic train-and-validate net by allowing the training parameter of some layers to be a placeholder.
However, why wouldn't you? By having separate nets for training and for validation, you set yourself on the right path and future-proof your code. Your training and validation nets might be identical today, but you might later see some benefit to having distinct nets, such as having different inputs or different outputs, or removing intermediate layers, etc.
Also, because variables are shared between them, having distinct training and validation nets comes at almost no penalty.
So, keeping a single net is fine; in my experience though, any project other than playful experimentation is likely to implement a distinct validation net at some point, and tensorflow makes it easy to do just that with minimal penalty.
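For reference, a minimal sketch of the single-net pattern discussed above, with a tf.bool placeholder toggling dropout and batch normalization (layer sizes are arbitrary):

    import tensorflow as tf  # TF 1.x APIs

    x = tf.placeholder(tf.float32, [None, 10])
    training = tf.placeholder(tf.bool, [])

    h = tf.layers.dense(x, 32, activation=tf.nn.relu)
    h = tf.layers.batch_normalization(h, training=training)
    h = tf.layers.dropout(h, rate=0.5, training=training)
    logits = tf.layers.dense(h, 2)
    # NB: when training batch norm this way, also run the update ops in
    # tf.GraphKeys.UPDATE_OPS so the moving averages get updated.

    # At run time, flip the flag instead of building a second net:
    #   sess.run(logits, feed_dict={x: batch, training: True})   # training
    #   sess.run(logits, feed_dict={x: batch, training: False})  # validation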
tf.estimator.Estimator classes indeed create a new graph for each invocation and this has been the subject of furious debates, see this issue on GitHub. Their approach is to build the graph from scratch on each train, evaluate and predict invocations and restore the model from the last checkpoint. There are clear downsides of this approach, for example:
A loop that calls train and evaluate will create two new graphs on every iteration.
One can't easily evaluate while training (though there are workarounds, such as train_and_evaluate, but this doesn't look very nice).
I tend to agree that having the same graph and model for all actions is convenient and I usually go with this solution. But in a lot of cases when using a high-level API like tf.estimator.Estimator, you don't deal with the graph and variables directly, so you shouldn't care how exactly the model is organized.
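For completeness, a minimal sketch of the train_and_evaluate workaround mentioned above (toy model and data, purely for illustration):

    import numpy as np
    import tensorflow as tf  # TF 1.x APIs

    xs = np.random.rand(128, 4).astype(np.float32)
    ys = np.random.rand(128, 1).astype(np.float32)

    def input_fn():
        return tf.data.Dataset.from_tensor_slices((xs, ys)).repeat().batch(32)

    def model_fn(features, labels, mode):
        loss = tf.losses.mean_squared_error(labels, tf.layers.dense(features, 1))
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    estimator = tf.estimator.Estimator(model_fn=model_fn)
    # One call interleaves training with periodic evaluation; under the hood
    # it rebuilds the graph and restores from checkpoint for each phase.
    tf.estimator.train_and_evaluate(
        estimator,
        tf.estimator.TrainSpec(input_fn=input_fn, max_steps=200),
        tf.estimator.EvalSpec(input_fn=input_fn, steps=10))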