I have been implementing some deep nets in Keras, but have eventually gotten frustrated with some of its limitations. For example: setting floatx to float16 fails on batch normalization layers, and the only way to fix it is to actually edit the Keras source; implementing custom layers requires coding them against the backend API, which destroys the ability to switch backends; there appear to be no parallel training mechanisms (unlike tf.Estimator); and even vanilla programs reportedly run 30% slower in Keras than in plain TensorFlow (if one is to trust the interwebs). I was grumbling about moving to TensorFlow, but was pleased to discover that TensorFlow (especially if you use the tf.layers stuff) is not actually any longer, code-wise, for anything imaginable you might want to do. Is this a failure of my imagination, or is tf.layers basically a backporting of Keras into core TensorFlow, and is there any actual use case for Keras?
Keras used to have the upper hand over TensorFlow in the past, but ever since its author became affiliated with Google, all the features that made it attractive have been implemented into TensorFlow itself (you can check version 1.8). As you rightfully pointed out, tf.layers is one such example.
Related
I'm perplexed by the Tensorflow post-training quantization process. The official site refers to Tensorflow Lite Quantization. Unfortunately, this doesn't work in my case, that is, TFLiteConverter returns errors for my Mask RCNN model:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: <...>. Here is a list of operators for which you will need custom implementations: DecodeJpeg, StatelessWhile.
Basically, I've tried all the options offered by TFLiteConverter, including the experimental ones. I'm not too surprised by those errors, as it might make sense not to support DecodeJpeg on mobile. However, I want my model to be served by Tensorflow Serving, so I don't understand why Tensorflow Lite is the official route to go through.
I've also tried the Graph Transform Tool, which seems to be deprecated, and ran into two issues. Firstly, it's impossible to quantize to bfloat16 or float16, only int8. Secondly, the quantized model breaks with the error:
Broadcast between [1,20,1,20,1,256] and [1,1,2,1,2,1] is not supported yet
which is not an issue in the regular (non-quantized) model.
Furthermore, it's worth mentioning that my model was originally built with Tensorflow 1.x and then ported to Tensorflow 2.1 via tensorflow.compat.v1.
This issue has eaten a significant amount of my time, so I'd be grateful for any hint.
You can convert the model to Tensorflow Lite and still use unsupported ops (like DecodeJpeg) from TF; this is called Select TF ops. See the guide on how to enable it during conversion.
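For reference, enabling this fallback during conversion looks roughly like the sketch below. It is a minimal example, not taken from your setup: "saved_model_dir" and the output filename are placeholders, and it assumes a TF 2.x SavedModel export of the model.

```python
import tensorflow as tf

# Minimal sketch: convert a SavedModel while letting ops without a TFLite
# builtin kernel (e.g. DecodeJpeg) fall back to the TensorFlow kernels.
# "saved_model_dir" is a placeholder path, not from the original question.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops at runtime
]

tflite_model = converter.convert()
with open("model_with_select_ops.tflite", "wb") as f:
    f.write(tflite_model)
```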
I understand the historical need for keras.backend in the long-gone days of multi-framework support. But now that we are talking about tf.keras, and since Keras is scheduled to support this toolkit only, I am wondering what today's use for tf.keras.backend is. From what I can see, it exposes only a fraction of the functions available in tf.*, and evolves more slowly.
So, is tf.keras.backend
best avoided, because it is an obsolete remnant of the past that is likely to be dropped in a future release?
or, a future-proof alternative to tf.* to be preferred whenever possible, because this API changes at a much slower pace than TF itself and is not going down anytime soon?
or something else?
It is difficult to say that either is better at this point, because the Keras backend still offers some unique features.
For example, K.rnn is a very valuable function provided by the Keras backend. It can be used to iterate over the temporal output of a sequential model (LSTM/GRU) along the time dimension. This is pretty useful when you need a map()-like function over each temporal output of a sequential model (e.g. computing an attention vector for each LSTM output of the encoder). It is a very convenient way to achieve this because (as far as I know) doing it with tf.* involves tf.gather and can become ugly (especially in TF 1.x). I am not really sure about other functions that might offer a unique advantage over tf.*, but there are probably a few (e.g. K.foldl).
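To make this concrete, here is a minimal sketch of K.rnn iterating a toy step function over the time axis; the shapes and the step function are invented for illustration, not an actual attention implementation:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# Toy inputs: (batch, timesteps, features); values are arbitrary.
inputs = tf.random.normal((4, 10, 8))
initial_state = tf.zeros((4, 8))

def step(x_t, states):
    # x_t: (batch, features) for one time step; states: list of state tensors.
    prev = states[0]
    out = tf.tanh(x_t + prev)  # stand-in for a per-step computation
    return out, [out]          # (output for this step, new states)

# K.rnn applies `step` along the time dimension and stacks the per-step outputs.
last_output, outputs, new_states = K.rnn(step, inputs, [initial_state])
print(outputs.shape)  # (4, 10, 8): one output per time step
```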
On the other hand, tf.* does offer many more functions than what the Keras backend offers.
In conclusion, I think it's too early to completely avoid the Keras backend. But I do feel like the Keras backend will get merged into tf.* at some point in order to offer a more consistent API.
While using tensorflow 1.14, I noticed some very strange behavior when using tf.layers.Dense vs tf.keras.layers.Dense. People on Stack Overflow say that these two layers are exactly the same, and I would basically agree, but looking at the discounted reward while training an AC agent yields the following graph:
The arguments are exactly the same. Repeated runs lead to the same result (see differently colored data in image). As far as I understand the code, one of the Dense layers inherits from the other: tf.keras.layers.core and tf.layers.core.
Is anyone able to explain this behavior?
According to a response to a similar issue on the stable-baselines repository, it seems that Keras does not support shared weights between multiple agents. Therefore, when training an actor-critic network with multiple instances, every environment gets its own network, which leads to completely different results. The fix is to use the tensorflow layers directly, which support reusing the same weights.
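As a rough illustration of that fix (shapes, names, and the two-input setup are assumptions, not the stable-baselines code), weight sharing with tf.layers in TF 1.x looks like this:

```python
import tensorflow as tf  # TF 1.x, as in the question (1.14)

# Two inputs standing in for two agents/environments that should share one network.
obs_a = tf.placeholder(tf.float32, [None, 16])
obs_b = tf.placeholder(tf.float32, [None, 16])

def policy_net(x, reuse):
    # With reuse=True the dense layers pick up the variables created earlier
    # instead of allocating a second set of weights.
    with tf.variable_scope("policy", reuse=reuse):
        h = tf.layers.dense(x, 64, activation=tf.nn.relu, name="fc1")
        return tf.layers.dense(h, 4, name="logits")

logits_a = policy_net(obs_a, reuse=False)  # creates the weights
logits_b = policy_net(obs_b, reuse=True)   # reuses the same weights
```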
I am starting to learn Tensorflow 2.0, and one major source of my confusion is when to use the Keras-like model.compile vs tf.GradientTape to train a model.
In the Tensorflow 2.0 tutorial for MNIST classification, they train two similar models: one with model.compile and the other with tf.GradientTape.
Apologies if this is trivial, but when do you use one over the other?
This is really a case-specific thing and it's difficult to give a definite answer here (it might border on "too opinion-based"). But in general, I would say:
The "classic" Keras interface (using compile, fitetc.) allows for quick and easy building, training & evaluation of standard models. However, it is very high-level/abstract and as such doesn't give you much low-level control. If you are implementing models with non-trivial control flow, this can be hard to accommodate.
GradientTape gives you full low-level control over all aspects of training/running your model, allowing easier debugging as well as more complex architectures, etc., but you will need to write more boilerplate code for many things that a compiled model hides from you (e.g. training loops). Still, if you do research in deep learning, you will probably be working at this level most of the time.
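For a side-by-side feel of the two styles, here is a minimal sketch on toy data (the model, shapes, and hyperparameters are arbitrary, not taken from the MNIST tutorial):

```python
import tensorflow as tf

# Toy classification data; values and shapes are arbitrary.
x = tf.random.normal((256, 32))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)

def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# 1) High-level: compile/fit hides the training loop.
model_a = make_model()
model_a.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model_a.fit(x, y, batch_size=32, epochs=1, verbose=0)

# 2) Low-level: the same update written out explicitly with GradientTape.
model_b = make_model()
optimizer = tf.keras.optimizers.Adam()
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)
for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
        logits = model_b(x_batch, training=True)
        loss = loss_fn(y_batch, logits)
    grads = tape.gradient(loss, model_b.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_b.trainable_variables))
```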
What is the difference between the various fully connected layers available in tensorflow? I understand that there could be two versions, object-oriented and functional, but I was able to find 4 different layers in tensorflow:
tf.keras.layers.Dense
tf.layers.dense
tf.layers.Dense
tf.contrib.layers.fully_connected
The documentation contains examples using all of them. I'd also like to know when to use each layer.
tf.keras.layers.Dense: Keras is a deep learning library which functions as a wrapper over 'lower level' frameworks such as Tensorflow and Theano. It has recently been integrated as a Tensorflow project and is part of its code-base. If you are using 'raw' Tensorflow, you should not use this layer.
tf.layers.dense: Tensorflow defines a functional interface. Layers and operations that are lowercase are typically part of it. These functions are used as building blocks when defining a custom layer or a loss function.
tf.layers.Dense: This is the layer you should be using.
tf.contrib.layers.fully_connected: This comes from the contrib library, whose features are typically more experimental and volatile. Once a feature is deemed stable, you should use its other implementation (tf.layers.Dense). fully_connected will still be present in the library to maintain backwards compatibility.
tf.keras.layers.Dense is a Keras wrapper; its functionality is the same as tf.layers.Dense. Check out Keras.
tf.layers.dense is the functional interface for tensorflow.
tf.layers.Dense is the commonly used object-oriented layer.
tf.contrib.layers.fully_connected is a function still under development.
Technically speaking, the first three have the same functionality (same inputs and outputs).
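As a quick sketch (TF 1.x API, arbitrary shapes), the four variants are called like this; the first three build the same kind of dense op:

```python
import tensorflow as tf  # TF 1.x

x = tf.placeholder(tf.float32, [None, 16])

# Object-oriented Keras layer (the only variant that survives into TF 2.x).
y1 = tf.keras.layers.Dense(32, activation=tf.nn.relu)(x)

# Functional core wrapper (deprecated in later 1.x releases).
y2 = tf.layers.dense(x, 32, activation=tf.nn.relu)

# Object-oriented core layer.
y3 = tf.layers.Dense(32, activation=tf.nn.relu)(x)

# contrib variant (removed entirely in TF 2.x).
y4 = tf.contrib.layers.fully_connected(x, 32, activation_fn=tf.nn.relu)
```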