I am trying to optimize a convolutional neural network with both Adam and L-BFGS, for comparison purposes. However, I am having a hard time implementing the wrapper function needed to use a Keras Sequential model inside the TensorFlow Probability lbfgs_minimize function (https://www.tensorflow.org/probability/api_docs/python/tfp/optimizer/lbfgs_minimize).
Can anyone please provide me with some direction? Here is the reference on the wrapper function:
https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/
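For context, the pattern in that post is a "function factory": it flattens the model's trainable variables into the single 1-D tensor that lbfgs_minimize expects, and returns a function that computes the loss and its flattened gradient at a given position. A minimal sketch of that idea, assuming a standard, already-built Keras model and loss (the names here are illustrative, not taken from the post):

import tensorflow as tf
import tensorflow_probability as tfp

def make_value_and_grad_fn(model, loss_fn, x_train, y_train):
    # Number of elements in each trainable variable, used to split the
    # flat parameter vector back into per-variable tensors.
    sizes = [v.shape.num_elements() for v in model.trainable_variables]

    def assign_params(flat_params):
        for var, chunk in zip(model.trainable_variables,
                              tf.split(flat_params, sizes)):
            var.assign(tf.reshape(chunk, var.shape))

    @tf.function
    def value_and_grad(flat_params):
        # lbfgs_minimize hands us a 1-D position; load it into the model,
        # then return (loss, flattened gradient) as the API requires.
        assign_params(flat_params)
        with tf.GradientTape() as tape:
            loss = loss_fn(y_train, model(x_train, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        return loss, tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)

    return value_and_grad

# Usage: start L-BFGS from the model's current weights.
# fn = make_value_and_grad_fn(model, tf.keras.losses.MeanSquaredError(), x, y)
# init = tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0)
# result = tfp.optimizer.lbfgs_minimize(fn, initial_position=init, max_iterations=100)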
On the Keras losses page I saw two main distinctions: loss classes vs. loss functions. Can anyone explain why both APIs are provided for the same losses? Is it just for class initialization, or does it serve other purposes? It would also be great if someone could explain in which cases we should use which one.
Thanks in advance.
A deep learning model can be built and trained in multiple ways.
The simplest approach is to build the model with the Keras functional or sequential API, use the compile method to specify the optimizer, loss, metrics, etc., and use the fit method to train it.
If you build the model this way, the compile method accepts a loss class.
Note: You can use the loss function as well
The actual logic that computes the loss lives in the call method of the loss class, which fit invokes internally.
However, there are cases (mostly in research) where the training loop has to be written from scratch in a particular way; in that case, you can use a loss function to compute the losses.
Note: You can use the loss class as well
The loss class gives you some extra functionality, such as specifying whether the predictions are logits (from_logits) or choosing the reduction technique (reduction). If your code needs those features, use the loss class to compute the losses.
If you do not need any such functionality, you can simply use the loss function.
Note: Under the hood, both APIs end up executing the same TensorFlow ops.
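To make the difference concrete, here is a small comparison (a sketch; binary cross-entropy is chosen arbitrarily). Note that the class reduces over the batch by default, while the bare function returns per-sample losses:

import tensorflow as tf

y_true = tf.constant([[0.0], [1.0]])
y_pred = tf.constant([[0.4], [0.6]])  # probabilities, not logits

# Loss class: configurable (from_logits, reduction, ...); this is what you
# would pass to compile(). Reduces to a single scalar by default.
bce_class = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print(bce_class(y_true, y_pred).numpy())  # one scalar

# Loss function: a plain callable, convenient in custom training loops.
# Returns one loss value per sample, with no reduction applied.
print(tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy())  # shape (2,)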
I am following the tutorial on neural style transfer. The style transfer is done by minimizing a loss function with respect to an image (initialized with the content image). What confuses me is the following piece of code:
preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
which is part of the call method in the StyleContentModel class. How does TensorFlow know the gradient of this operation? I have checked if this operation has a gradient function using get_gradient_function in the module tensorflow.python.framework.ops, and as far as I can tell it does not.
It is very simple: the function is built from symbolic tensor operations that are differentiable. TensorFlow can compute gradients through any function that internally uses TensorFlow operations; there is no need to manually define a gradient for each function.
You can confirm this by looking at the code of that function here, especially at _preprocess_symbolic_input here, which uses ordinary scalar operations and Keras backend functions (which are just TensorFlow functions in tf.keras).
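You can also verify this directly with a gradient tape (a quick check, assuming eager execution; the shapes are arbitrary):

import tensorflow as tf

# preprocess_input is built from ordinary differentiable TF ops, so
# GradientTape can differentiate straight through it.
image = tf.Variable(tf.random.uniform([1, 224, 224, 3], maxval=255.0))
with tf.GradientTape() as tape:
    preprocessed = tf.keras.applications.vgg19.preprocess_input(image)
    loss = tf.reduce_sum(preprocessed ** 2)
print(tape.gradient(loss, image).shape)  # (1, 224, 224, 3): a well-defined gradient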
This has nothing to do with the model or its gradients. What this function does is preprocess the input images to match what the pretrained network expects: for VGG it converts images from RGB to BGR and subtracts the ImageNet channel means, while for models such as MobileNet it scales the pixels to the range -1 to +1. If you use the ImageDataGenerator, it has a preprocessing_function parameter which the generator calls to preprocess the images. Make sure that if you preprocess the training images, you do the same for the test and validation images.
This tutorial describes how to build a TFF computation from a Keras model.
This tutorial describes how to build a custom TFF computation from scratch, possibly with a custom federated learning algorithm.
What I need is a combination of these: I want to build a custom federated learning algorithm, and I want to use an existing Keras model. How can this be done?
The second tutorial requires MODEL_TYPE, which is based on MODEL_SPEC, but I don't know how to get it. I can see some variables in model.trainable_variables (where model = tff.learning.from_keras_model(keras_model, ...)), but I doubt that's what I need.
Of course, I can implement the model by hand (as in the second tutorial), but I want to avoid it.
I think you have the correct pointers for writing a custom federated computation, as well as converting a Keras model to a tff.learning.Model. So we'll focus on pulling a TFF type signature from an existing tff.learning.Model.
Once you have your hands on such a model, you should be able to use tff.learning.framework.weights_type_from_model to pull out the appropriate TFF type to use for your custom algorithm.
There is an interesting caveat here: precisely how you use a tff.learning.Model in your custom algorithm is pretty much up to you, and this could affect your desired model weights type. In practice this is unlikely to matter (most likely you will simply be assigning values from incoming tensors to the model variables), so I think we can avoid going deeper into this caveat.
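A minimal sketch of that step, assuming the TFF APIs as they existed at the time of writing (the toy Keras model and input_spec are stand-ins for yours; weights_type_from_model accepts a tff.learning.Model or a no-argument function returning one):

import collections
import tensorflow as tf
import tensorflow_federated as tff

# A toy stand-in for your Keras model; only the structure matters here.
def create_keras_model():
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

def model_fn():
    return tff.learning.from_keras_model(
        create_keras_model(),
        input_spec=collections.OrderedDict(
            x=tf.TensorSpec(shape=[None, 784], dtype=tf.float32),
            y=tf.TensorSpec(shape=[None, 1], dtype=tf.int32),
        ),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

# Pull out the TFF type of the model weights, for use in the type
# signatures of your custom federated computations.
model_weights_type = tff.learning.framework.weights_type_from_model(model_fn)
print(model_weights_type)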
Finally, a few pointers to end-to-end custom algorithm implementations in TFF:
One of the simplest complete examples TFF has is simple_fedavg, which is totally self-contained and contains instructions for running.
The code for a paper on Adaptive Federated Optimization contains a handwritten implementation of learning rate decay on the clients in TFF.
A similar implementation of adaptive learning rate decay (think Keras' functions to decay learning rate on plateaus) is right next door to the code for AFO.
I am trying to implement TD-Gammon, as described in this paper, which uses the TD-Lambda learning algorithm. This has been done already here, but that implementation is 4 years old and doesn't use TensorFlow 2. I am trying to do this in TensorFlow 2 and think I need to create a custom optimizer to perform the weight update described in the paper linked above.
I know that to create a custom optimizer, you need to subclass the Optimizer class and implement the _create_slots, _resource_apply_dense, _resource_apply_sparse, and get_config methods. However, the weight update for TD-Lambda requires the neural network outputs (Y_{t-1} and Y_t in the paper), and the _resource_apply_dense method doesn't seem to have access to them.
How do I access the neural network outputs? Or am I just going about this the wrong way?
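For reference, here is the skeleton of the subclass structure described above (assuming the tf.keras OptimizerV2 API; the eligibility-trace slot is my sketch of where TD-Lambda state could live, and the TD error term is deliberately left out, since _resource_apply_dense only sees the gradient and the variable, which is exactly the problem raised here):

import tensorflow as tf

class TDLambdaOptimizer(tf.keras.optimizers.Optimizer):
    """Sketch of a TD-Lambda-style optimizer skeleton, not a full solution."""

    def __init__(self, learning_rate=0.01, lam=0.7, name="TDLambdaOptimizer", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("lam", lam)

    def _create_slots(self, var_list):
        # One eligibility trace per weight, as TD-Lambda requires.
        for var in var_list:
            self.add_slot(var, "trace")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
        lam = self._get_hyper("lam", var.dtype.base_dtype)
        trace = self.get_slot(var, "trace")
        trace.assign(lam * trace + grad)
        # Only grad and var are visible here; Y_{t-1} and Y_t are not,
        # so the TD error cannot be formed at this point.
        return var.assign_sub(lr * trace)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        raise NotImplementedError

    def get_config(self):
        config = super().get_config()
        config.update({
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
            "lam": self._serialize_hyperparameter("lam"),
        })
        return config

One possible workaround is to compute the TD error Y_t - Y_{t-1} outside the optimizer, in a custom training loop, and fold it into the gradient tensors before calling apply_gradients, so the optimizer itself only ever sees ordinary gradients.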
I am working with the DCGAN code. I need to modify the reward that is given to one of the neural nets by adding a function that would take the output of this neural net, analyse it, and issue a penalty on it. So my loss function would look like:
self.g_loss = self.g_loss + self.penalty
The problem is:
this penalty function only takes numpy arrays as input (I have no way of modifying this),
the neural network output is a tf.Tensor,
and as the values haven't been assigned to the neural net yet (technically it hasn't been built yet), I can run neither .eval() nor sess.run().
So how can I convert a TensorFlow tensor into a numpy array in this case?
TensorFlow has tf.py_func for wrapping Python functions and passing tensors to them. However, you can't then use this loss function to train the network, because TensorFlow doesn't automatically differentiate numpy code.
Luckily for you, autograd does automatically differentiate numpy code. If you compute the gradients with autograd in another tf.py_func call, you can feed them back into the TensorFlow graph on the backward pass.
Here's an example of how you can do it all in this gist.
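A minimal sketch of the same pattern (all names are placeholders; assumes TF1-style graph mode, matching the sess.run usage above):

import autograd
import autograd.numpy as anp
import numpy as np
import tensorflow.compat.v1 as tf

def penalty_np(x):
    # Placeholder for the real numpy-only penalty function.
    return anp.sum(x ** 2)

# autograd differentiates the numpy code for us.
penalty_grad_np = autograd.grad(penalty_np)

@tf.custom_gradient
def penalty_op(x):
    # Forward pass: run the numpy penalty on the tensor's value.
    y = tf.py_func(lambda v: np.float32(penalty_np(v)), [x], tf.float32)

    def grad(dy):
        # Backward pass: feed autograd's gradient back into the TF graph.
        g = tf.py_func(lambda v: penalty_grad_np(v).astype(np.float32), [x], tf.float32)
        return dy * g

    return y, grad

# Usage in the loss: self.g_loss = self.g_loss + penalty_op(generator_output)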