I have a model that is essentially an Auxiliary Conditional GAN; the first part of the model is the Generator, the last part is the Discriminator. The Discriminator makes multiclass (k=10) predictions.
Following http://arxiv.org/abs/1912.07768 (see p. 3 for a helpful diagram; note that I ignore the network structure modifications for the purposes of this question), I train the entire model for T=32 iterations on generated synthetic inputs and class labels (the 'inner loop'). I can then predict on real data and labels using just the Discriminator (the Learner) to get losses. However, I need to back-propagate the Discriminator's error all the way back through the inner loop to the Generator.
How can I achieve this with Keras? Is it possible to do loop unrolling in Keras? How can I provide an arbitrary loss and backprop this down the unrolled layers?
Update: There is now one implementation, in PyTorch, which uses Facebook's 'higher' library. This appears to mean that the updates made during the inner loop must be 'unwrapped' so that the final meta-loss can be applied throughout the entire network. Is there a Keras way of achieving this? https://github.com/GoodAI/GTN
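To make the requirement concrete, here is a toy sketch in plain Python (not Keras) of differentiating a meta-loss through an unrolled inner loop. Everything in it is an assumption for illustration: the learner is a single weight w, the 'generator' is a single parameter g that plays the role of the synthetic target, and the sensitivity dw/dg is carried forward by hand through each of the T inner updates. In TensorFlow the analogous approach would presumably be to express the inner updates as differentiable tensor operations recorded by a tf.GradientTape, rather than as in-place variable assignments made by an optimizer.

```python
def unrolled_meta_gradient(g, w0, y_real, inner_lr=0.1, T=32):
    """Return (meta_loss, d meta_loss / d g) for a scalar toy problem.

    Inner loss at each step:  0.5 * (w - g) ** 2   (learner fits synthetic target g)
    Meta-loss after T steps:  0.5 * (w_T - y_real) ** 2
    """
    w = w0
    dw_dg = 0.0  # sensitivity of the learner weight to the generator parameter
    for _ in range(T):
        grad_w = w - g  # d(inner loss)/dw
        # Differentiate the update  w <- w - lr * (w - g)  with respect to g.
        dw_dg = (1.0 - inner_lr) * dw_dg + inner_lr
        w = w - inner_lr * grad_w
    meta_loss = 0.5 * (w - y_real) ** 2
    meta_grad = (w - y_real) * dw_dg  # chain rule through all T updates
    return meta_loss, meta_grad


# Outer loop: update the generator parameter using the meta-gradient.
g = 0.0
for _ in range(200):
    loss, grad = unrolled_meta_gradient(g, w0=5.0, y_real=2.0)
    g -= 0.5 * grad

print(g, loss)
```

The key point is that the meta-gradient depends on every inner update, so the inner loop cannot be treated as a black box that merely leaves updated weights behind.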
This appears to be a Generative Adversarial Network (GAN), in which two models learn together: one classifies and the other generates.
To summarize, the output of the Generator is fed as input to the Discriminator, whose output is in turn fed back to the Generator.
There is also a discussion and implementation of this in TensorFlow Keras, using the MNIST digit dataset, in the TensorFlow GAN/DCGAN documentation at this link.
This may be a silly question, but how exactly do the preprocessing layers in Keras work, especially when they are part of the model itself, as opposed to applying the preprocessing outside the model and then feeding the results in for training?
I'm trying to understand running data augmentation in Keras models. Let's say I have 1000 images for training. Outside the model I can apply augmentation 10x and get 10000 resultant images for training.
But I don't understand what happens when you use a preprocessing layer for augmentation. Does this layer (or these layers, if you use several) take each image and apply the transformations before training? Does this mean the total number of images used for training (and validation, I assume) is the number of epochs * the original number of images?
Is one option better than the other? Does that depend on the number of images one originally has before augmentation?
The benefit of preprocessing layers is that the model is truly end-to-end, i.e. raw data comes in and a prediction comes out. It makes your model portable since the preprocessing procedure is included in the SavedModel.
However, it will run everything on the GPU. Usually it makes sense to load the data using CPU worker(s) in the background while the GPU optimizes the model.
Alternatively, you could use a preprocessing layer outside of the model and inside a Dataset. The benefit of that is that you can easily create an inference-only model including the layers, which then gives you the portability at inference time but still the speedup during training.
For more information, see the Keras guide.
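As a rough illustration of the counting question above, the following plain-Python sketch (no Keras involved) mimics an augmentation step that runs every time a sample is drawn. The augment function and the integer 'images' are hypothetical stand-ins. The point is only that the dataset size stays the same while each epoch sees freshly transformed variants, so the number of augmented samples processed is epochs * dataset size.

```python
import random


def augment(image, rng):
    # Hypothetical stand-in for a random transformation (flip, rotation, ...).
    return (image, rng.randint(0, 10**9))


def run_epochs(images, epochs, seed=0):
    rng = random.Random(seed)
    seen = []
    for _ in range(epochs):
        # The transformation is re-applied on every pass, as a random
        # augmentation layer would do during training.
        for image in images:
            seen.append(augment(image, rng))
    return seen


images = list(range(1000))  # 1000 original "images"
seen = run_epochs(images, epochs=10)
print(len(images), len(seen))  # dataset size vs. samples processed
```

Nothing is materialized up front in this scheme: the 10000 variants exist only as they are drawn, which is the main practical difference from augmenting 10x before training.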
I am new to Keras and deep learning, and I am not sure of the right way to add regularization. I wrote a CNN autoencoder using the Model class API, and right now I add a regularizer in each "Conv2D" call. I am not sure if this is the right place to add regularization; could anyone please give me some suggestions?
(I ran the training and checked the reconstructed test images. The result is OK but not very good: testing with MNIST, the strokes of the reconstructed digits are thicker than the originals.)
In my problem, the input image is an impaired one and the original good image is used as the training label. By comparing the output image of the CNN with the training label image, I use the "mean absolute error" to define the loss, and I also use it as the metric.
I defined three functions first, one downsampling function (the one below), one upsampling function, and one function to squeeze the third dimension of the matrix to get a two-dimensional matrix as the output.
My code is too long to post in full, so to help illustrate the problem, part of it is as follows:
With these three functions defined, I defined the model as follows (not in detail, just the part that helps explain my problem):
Then I load all necessary parameters into the model, define the optimizer parameters, and compile the model.
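Since the code itself is not shown, here is only a conceptual sketch in plain Python of what a per-layer L2 weight regularizer contributes: a penalty term, scaled by a coefficient, that is added to the data loss (mean absolute error here). The weights, the coefficient l2, and the helper names are all made up for illustration; this is not the author's model.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_penalty(weights, l2):
    # Sum of squared weights, scaled by the regularization coefficient.
    return l2 * sum(w * w for w in weights)


def total_loss(y_true, y_pred, layer_weights, l2=0.01):
    # Each regularized layer adds its own penalty to the data loss.
    penalty = sum(l2_penalty(w, l2) for w in layer_weights)
    return mean_absolute_error(y_true, y_pred) + penalty


y_true = [0.0, 1.0, 1.0, 0.0]
y_pred = [0.1, 0.8, 0.9, 0.2]
layer_weights = [[0.5, -0.5], [1.0, 2.0, -1.0]]  # hypothetical kernels of two layers

print(mean_absolute_error(y_true, y_pred), total_loss(y_true, y_pred, layer_weights))
```

One practical consequence, as far as I understand Keras behaviour: because the penalties are part of the training loss but not of the metric, the reported loss and a "mean absolute error" metric can differ even though they use the same function.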
I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop steps, it does it once? Or am I wrong?
Yes, train_on_batch performs exactly one training step on a single batch.
fit, by contrast, trains on many batches for many epochs (each batch causes one update of the weights).
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to inspect the model or make custom changes after each batch.
A more precise use case is with GANs.
You have to update the discriminator, but while updating the combined GAN network you have to keep the discriminator untrainable. So you first train the discriminator, and then train the GAN with the discriminator frozen.
See this for more detail:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
The model's fit method trains the model by passing through the data you gave it (one pass per epoch). However, because of memory limitations (especially GPU memory), we can't train on a large number of samples at once, so the data is divided into small pieces called mini-batches (or just batches). The fit method of Keras models does this splitting for you and passes through all the data you gave it.
However, sometimes we need a more complicated training procedure; for example, we may want to randomly select new samples to put in the batch buffer each epoch (e.g. GAN training and Siamese CNN training). In these cases we don't use the simple fit method but instead use the train_on_batch method. To use this method, we generate a batch of inputs and a batch of outputs (labels) in each iteration and pass them to it; it trains the model on all the samples in the batch at once and returns the loss and other metrics calculated with respect to the batch samples.
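The relationship described in these answers can be sketched with a hypothetical one-weight model in plain Python. TinyModel is not Keras code; it only mirrors the two method names to show that fit is, conceptually, a loop over mini-batches and epochs around a single-update step.

```python
class TinyModel:
    """Hypothetical one-weight model y = w * x trained with squared error."""

    def __init__(self, lr=0.01):
        self.w = 0.0
        self.lr = lr
        self.updates = 0

    def train_on_batch(self, xs, ys):
        # Exactly one gradient update computed from this batch.
        n = len(xs)
        loss = sum((self.w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        grad = sum(2 * (self.w * x - y) * x for x, y in zip(xs, ys)) / n
        self.w -= self.lr * grad
        self.updates += 1
        return loss

    def fit(self, xs, ys, batch_size, epochs=1):
        # Splits the data into mini-batches and runs the single-update
        # step for every batch, once per epoch.
        for _ in range(epochs):
            for i in range(0, len(xs), batch_size):
                self.train_on_batch(xs[i:i + batch_size], ys[i:i + batch_size])


xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]

a = TinyModel()
a.train_on_batch(xs[:4], ys[:4])  # one update

b = TinyModel()
b.fit(xs, ys, batch_size=4, epochs=5)  # 2 batches * 5 epochs = 10 updates

print(a.updates, b.updates)
```

Writing the loop yourself, as with train_on_batch, is what leaves room for custom logic between updates, such as alternating between two models.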
I'm building a CNN model using TensorFlow, without the use of any frontend APIs such as Keras. I'm creating a VGG-16 model with the pre-trained weights, and I want to fine-tune the last layers to serve my purpose.
Following the tutorial here, http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
I re-created the training script and modified it as per my requirements. However, the model does not train: the training accuracy is stuck at 50.00% and the validation accuracy cycles through a repeating pattern of values.
Attached is the screenshot of the same.
I have been stuck on this for days now and can't seem to find the error. Any help is appreciated.
The code is pretty long and hence here is the gist file for the same
Your cross entropy is wrong: you are comparing your logits with the softmax of your logits instead of with the true labels.
This:
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                            labels=y_pred)
Should be:
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
                                                            labels=y_true)
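To see why the first version cannot learn, here is a small plain-Python sketch of the same computation (softmax followed by cross entropy), with made-up logits for a two-class problem. When the "labels" are the model's own softmax output, the loss has the same value whichever class is actually correct, so it carries no information about the true label.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(labels, logits):
    probs = softmax(logits)
    return -sum(l * math.log(p) for l, p in zip(labels, probs))


logits = [2.0, 0.5]  # hypothetical output: the network favours class 0
y_pred = softmax(logits)

# Buggy version: the labels are the model's own predictions, so the value
# is the same whatever the true class is.
buggy = cross_entropy(y_pred, logits)

# Correct version: the loss depends on the true one-hot label.
loss_if_class0 = cross_entropy([1.0, 0.0], logits)
loss_if_class1 = cross_entropy([0.0, 1.0], logits)

print(buggy, loss_if_class0, loss_if_class1)
```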
Some things to note. I would not train on a data point and then evaluate on the same data point; your training accuracy is probably going to be biased by doing so. Another point to note is that tf.argmax(tf.nn.softmax(logits)) is the same as tf.argmax(logits).
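The last remark holds because softmax is strictly increasing in each logit, so it preserves their ordering. A quick check in plain Python, with arbitrary example logits and hand-written softmax and argmax helpers:

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.3, -1.2, 4.0, 2.5]
print(argmax(logits), argmax(softmax(logits)))  # both give the same index
```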
I've been looking through the TensorFlow FullyConnected tutorial. This also uses the helper code mnist.py
I understand the code but for one nagging piece. After training the Neural Net, the weights obtained from training should be used to evaluate the precision of the model on the Validation (and Test) data. However, I don't see that being done anywhere.
In fact, this is the only thing I see in fully_connected_feed.py:
    # Evaluate against the validation set.
    print('Validation Data Eval:')
    do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_sets.validation)
    # Evaluate against the test set.
    print('Test Data Eval:')
    do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_sets.test)
The do_eval() function seems to be passed a parameter eval_correct, which seems to be recalculating the logits on this new data. I've been playing around with TF for a while now but I'm baffled by this code. Any thoughts would be great.
TensorFlow creates a graph with the weights and biases. Roughly speaking, while you train this neural net the weights and biases get changed so that it produces the expected outputs. Line 131 in fully_connected_feed.py (with tf.Graph().as_default():) creates a graph and makes it the default for the enclosed block. Therefore every line in the training loop, including the calls to do_eval(), uses that same graph. Since the weights obtained from training are not reset before evaluation, they are the ones used for it.
eval_correct is the operation used instead of the training operation to evaluate the neural net without training it. This is important because otherwise the net would be trained on the validation and test data, which would give distorted (too good) results.
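The same idea can be sketched without TensorFlow. In this hypothetical plain-Python stand-in, TinyGraph holds one shared weight; train_op reads and updates it, while eval_correct only reads it. Evaluation therefore sees whatever training left behind and changes nothing. The class, data, and tolerance are invented for illustration and do not come from the tutorial.

```python
class TinyGraph:
    """Hypothetical stand-in for a graph holding one shared weight."""

    def __init__(self):
        self.w = 0.0

    def train_op(self, x, y, lr=0.1):
        # Reads AND updates the shared weight (one squared-error gradient step).
        self.w -= lr * 2 * (self.w * x - y) * x

    def eval_correct(self, xs, ys, tol=0.1):
        # Only reads the shared weight: counts predictions within tol.
        return sum(1 for x, y in zip(xs, ys) if abs(self.w * x - y) < tol)


graph = TinyGraph()
train = [(1.0, 2.0), (2.0, 4.0)]
validation_x, validation_y = [1.5, 0.5], [3.0, 1.0]

before = graph.eval_correct(validation_x, validation_y)
for _ in range(50):
    for x, y in train:
        graph.train_op(x, y)
w_after_training = graph.w
after = graph.eval_correct(validation_x, validation_y)

print(before, after, graph.w == w_after_training)
```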