Two-step training in neural network - python

In the article https://www.nature.com/articles/s41598-019-51269-8 on sleep-stage classification, the author mentions two-step training. Specifically,
"In the pretraining step, the scoring module (Fig. 2) is temporarily replaced with a softmax layer, which plays the same roles as the original scoring module...In the fine-tuning step, the softmax layer is replaced by the original scoring module. Then, the entire system is trained again using the same training dataset..."
I do not want the actual code. I just want a code snippet that demonstrates the idea of using two-step training like the one mentioned in the article. It can be as short as you like, but it must show how two-step training is done.

You could use the Functional API of Keras. It lets you define your layers separately and assign them to variables, so you can connect and disconnect layers at any time.
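To illustrate, here is a minimal sketch in tf.keras, assuming a toy dense feature extractor and placeholder dimensions (the paper's actual representation and scoring modules are more elaborate than these stand-in layers):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 128, 5  # hypothetical sizes, replace with your own

# Shared feature-extraction layers (stand-in for the paper's representation module).
inputs = keras.Input(shape=(n_features,))
x = layers.Dense(64, activation="relu", name="feature_1")(inputs)
features = layers.Dense(32, activation="relu", name="feature_2")(x)

# Step 1 (pretraining): temporary softmax head in place of the scoring module.
pretrain_head = layers.Dense(n_classes, activation="softmax", name="temp_softmax")(features)
pretrain_model = keras.Model(inputs, pretrain_head)
pretrain_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# pretrain_model.fit(x_train, y_train, ...)

# Step 2 (fine-tuning): reconnect the same feature layers to the real scoring module
# (here just a placeholder dense stack) and train the whole system again.
scoring = layers.Dense(64, activation="relu", name="scoring_1")(features)
outputs = layers.Dense(n_classes, activation="softmax", name="scoring_out")(scoring)
finetune_model = keras.Model(inputs, outputs)
finetune_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# finetune_model.fit(x_train, y_train, ...)

Because both models are built from the same layer objects, the feature layers carry their pretrained weights into the second step; only the newly attached scoring layers start from scratch.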

Related

Smartest way to add KL Divergence into (Variational) Auto Encoder

I have an Auto Encoder model with multiple outputs and weighting which I want to extend into a Variational Auto Encoder.
I followed this official Keras tutorial: https://keras.io/examples/generative/vae/
But if I manually adapt the train_step function I lose the majority of my original implementation details:
I have two weighted optimization goals: reconstruction (decoder) and classification (softmax)
accuracy metrics for the classification
the original fit method also takes care of the validation data and corresponding metrics
Adding the suggested sampling layer according to the Keras link is no problem, but correctly implementing the Kullback-Leibler loss is, because it depends on the additional parameters z_mu and z_log_var, which standard Keras losses do not support.
I searched for workarounds to solve this issue, but none of them was successful:
re-writing the train_step: it's hard to fully re-implement all the details (weighting, multiple losses with different inputs -> decoder: data, classifier: labels, etc.)
adding a pseudo layer to the encoder that calculates the loss, like here: https://tiao.io/post/tutorial-on-variational-autoencoders-with-a-concise-keras-implementation/. The problem with this is that the add_loss function does not specify to which output key the KL loss belongs or how it is added to the model's total loss
Adding everything as global/top-level elements to make z_mu and z_log_var accessible for the loss calculation, like here: https://www.machinecurve.com/index.php/2019/12/30/how-to-create-a-variational-autoencoder-with-keras/. This is the approach I like the least, as my current architecture is parametrized so that I can e.g. perform hyperopt tuning
I was not able to find a satisfying solution to this problem. As VAEs are becoming more and more popular, I am surprised that there is no extended tutorial about this, especially for the case of multiple inputs and outputs. Or I am just unable to find the right answers with my queries.
Any opinions welcome!
After a couple of re-designs and some bug-ticket tracing I found this recent example:
here
The VAE examples can be found at the very bottom of the post.
Solution: write your own train_step: the cleanest but also the hardest option, depending on how complex your loss calculation is.
Solution: use a functional approach to access the necessary variables and add the loss with .add_loss: not very clean but straightforward to implement (you will lose a dedicated loss tracker for the KL loss); see the sketch at the end of this answer.
To achieve my weighting, I scaled the KL loss by the weight of my decoder loss before adding it via .add_loss.
Note: The first solution I tested was to define a custom loss function for the mse+KL loss and add it to my functionally designed model - this works if one turns TF eager evaluation off. But be careful: this really slows down your network, and you lose the ability to monitor your training via TensorBoard if you don't have admin rights for your NVIDIA GPU (profile_batch=0 does not turn off profiling when eager mode is switched off, so you run into INSUFFICIENT_PRIVILEGES errors from the CUPTI driver).
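For reference, a minimal sketch of the functional .add_loss route in tf.keras 2.x, assuming hypothetical encoder/decoder sizes and a kl_weight factor (your real architecture with its multiple weighted outputs will differ):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, kl_weight = 2, 0.5  # hypothetical values

# Encoder producing the distribution parameters the KL term depends on.
enc_in = keras.Input(shape=(784,))
h = layers.Dense(64, activation="relu")(enc_in)
z_mu = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

# Reparameterization trick.
def sample(args):
    mu, log_var = args
    eps = tf.random.normal(shape=tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps
z = layers.Lambda(sample)([z_mu, z_log_var])

# Reconstruction and classification heads, as in the multi-output setup above.
recon = layers.Dense(784, activation="sigmoid", name="decoder")(z)
clf = layers.Dense(10, activation="softmax", name="classifier")(z)
vae = keras.Model(enc_in, [recon, clf])

# Because z_mu and z_log_var are plain symbolic tensors here, the KL term can be
# built from them directly and attached (pre-weighted) via add_loss.
kl = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_var - tf.square(z_mu) - tf.exp(z_log_var), axis=-1))
vae.add_loss(kl_weight * kl)

vae.compile(optimizer="adam",
            loss={"decoder": "mse", "classifier": "sparse_categorical_crossentropy"},
            loss_weights={"decoder": kl_weight, "classifier": 1.0})

As noted above, the KL term then only shows up in the aggregate total loss; it does not get its own named loss tracker.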

Keras: Manually backprop from Discriminator to Generator

I have a model that is essentially an Auxiliary Conditional GAN; the first part of the model is the Generator, the last part is the Discriminator. The Discriminator makes multiclass (k=10) predictions.
Following the work of http://arxiv.org/abs/1912.07768 (p. 3 has a helpful diagram, but note that I ignore the network-structure modifications for the purposes of this question), I train the entire model for T=32 iterations by generating synthetic input and class labels (the 'inner loop'). I can predict on real data and labels using just the Discriminator (Learner) to get losses. However, I need to back-propagate the Discriminator's error all the way back through the inner loop to the Generator.
How can I achieve this with Keras? Is it possible to do loop unrolling in Keras? How can I provide an arbitrary loss and backprop this down the unrolled layers?
Update: There is now one implementation, in PyTorch, which uses Facebook's 'higher' library. This appears to mean that the updates made during the inner loop must be 'unwrapped' in order for the final meta-loss to be applied throughout the entire network. Is there a Keras way of achieving this? https://github.com/GoodAI/GTN
This seems to be a Generative Adversarial Network (GAN) in which both models learn: one learns to classify and the other to generate.
To summarize, the output of the Generator is fed as input to the Discriminator, and the Discriminator's output is then fed back to the Generator as its training signal.
There is also a discussion and implementation of this in TensorFlow Keras, using the MNIST digit dataset, in the TensorFlow GAN/DCGAN documentation at this link.
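To illustrate the core of that feedback in tf.keras: with a GradientTape, the Discriminator's classification loss can be differentiated with respect to the Generator's weights, because the gradient flows through the Discriminator into the Generator. This is only a sketch with placeholder generator/discriminator models, showing the ordinary adversarial update rather than the paper's unrolled inner loop:

import tensorflow as tf

gen_opt = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def generator_step(generator, discriminator, noise, target_labels):
    # generator: (noise, label) -> synthetic sample; discriminator: sample -> k-class logits
    with tf.GradientTape() as tape:
        fake = generator([noise, target_labels], training=True)
        logits = discriminator(fake, training=False)
        loss = loss_fn(target_labels, logits)
    # Gradients of the Discriminator's error with respect to the Generator's weights:
    grads = tape.gradient(loss, generator.trainable_variables)
    gen_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return loss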

R-CNN: looking for REPO where FC for classification is retrainable

I'm studying different object detection algorithms for my interest.
The main reference is Andrej Karpathy's slides on object detection (slides here).
I would like to start from some reference, in particular something which allows me to directly test some of the networks mentioned on my own data (mainly consisting of onboard camera footage from car and bike races).
Unfortunately, I have already used some pretrained networks (a repo forked from JunshengFu's, where I slightly adapted YOLO to my use case), but the classification accuracy is rather poor, I guess because there were not many training instances of racing cars like Formula 1 cars.
For this reason I would like to retrain the networks, and this is where I'm running into the most issues:
properly training some of the networks requires either hardware (powerful GPUs) or time that I don't have, so I was wondering whether I could retrain just some part of the network, in particular the classification network, and whether there is any repo that already allows that.
Thank you in advance
That is called fine-tuning of the network, or transfer learning. Basically you can do that with any network you find (from a similar problem domain, of course), and then, depending on the amount of data you have, you either fine-tune the whole network or freeze some layers and train only the last ones. In your case you would probably need to freeze the whole network except the last fully-connected layers (which you will actually replace with new ones matching your number of classes), since they perform the classification. I don't know which library you use, but TensorFlow has an official tutorial on transfer learning. However, it's not very clear, to be honest.
You can find a more user-friendly tutorial, written by an enthusiast, here: tutorial. It comes with a code repository as well. One correction you need, though: the author fine-tunes the whole network, whereas if you want to freeze some layers you need to get the list of trainable variables, remove those you want to freeze, and pass the resulting list to the optimizer (so it ignores the removed variables), like this:
all_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='InceptionResnetV2')
to_train = all_vars[-6:]  # better to select these by name explicitly, but this still works
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
train_op = slim.learning.create_train_op(total_loss, optimizer, variables_to_train=to_train)
Furthermore, TensorFlow has a so-called model zoo (a collection of trained models you can use for your own purposes and for transfer learning). You can find it here.
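If you end up in tf.keras rather than TF-slim, the same freeze-and-replace idea looks roughly like this sketch, assuming an ImageNet-pretrained backbone and a hypothetical num_classes (adapt it to whatever detector backbone you actually use):

import tensorflow as tf

num_classes = 4  # hypothetical number of classes for your racing footage

# Pretrained backbone with its original classification head removed.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # freeze everything except the new head

# New fully-connected classification head for your own classes.
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)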

Tensorflow: run time test metrics and data queues

I want to compute and display accuracy on the test set while the network is training.
In the MNIST tutorial that uses feeds, one can see that it can be done easily by feeding test data rather than train data. Simple solution to a simple problem.
However I am not able to find such an easy example when using queues for batching. AFAICS, the documentation proposes two solutions:
Offline testing with saved states. I don't want offline.
Making a second 'test' network that shares weights with the network being trained. That doesn't sound simple, and I have not seen an example of it.
Is there a third, easy way to compute test metrics at run time? Or is there an example somewhere of the second, test network with shared weights that proves me wrong by being super simple to implement?
If I understand your question correctly, you want to validate your model while training with queue inputs rather than feed_dict?
see my program that does this.
Here is a short explanation:
First you need to convert your data into train and validation files such as 'train.tfrecords' and 'valid.tfrecords'.
Second, in your training program, start two queues that parse these two files, and use variable sharing to get the two logits for training and validation.
In my program this is done by
with tf.variable_scope("inference") as scope:
    logits = mnist.inference(images)
    scope.reuse_variables()
    validation_logits = mnist.inference(validation_images)
Then use logits to get the training loss and minimize it, and use validation_logits to get the validation accuracy.
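A sketch of that last step (TF1-style, assuming labels and validation_labels come out of the same two queues as the images):

import tensorflow as tf

# Training loss and optimizer, driven by the training-queue logits.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Validation accuracy from the validation-queue logits (same weights, thanks to reuse).
correct = tf.equal(tf.argmax(validation_logits, 1), tf.cast(validation_labels, tf.int64))
validation_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# In the session loop, run train_op every step and validation_accuracy every N steps.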

How to use TF model for inference while training

I'm adapting the Tensorflow tutorial for sequence to sequence modeling for my project. Specifically, I am basing my code off of translate.py.
The tutorial computes the perplexity on the dev set every n training steps. I'd instead like to calculate BLEU score on the dev set.
The problem I'm facing is that when creating a model, you specify whether it is forward-only or not. Looking through the code, it seems that if it is not forward-only (which happens when training), at each step the network will calculate gradients but will not produce the final output sequence for the input sequence. When it is forward-only (as in the decoding function later in the tutorial), it applies the loop function, which feeds the output back into the input of the RNN and allows the output sequence to be generated; however, it doesn't compute the gradients. So as far as I understand it, you can construct a model either for training (i.e. gradients) or for testing (i.e. performing full inference on it).
Since I want to compute the BLEU score, I need some sequence produced by the model which corresponds to an input sequence in the dev set. Because of how the models are constructed, I would need both types of model (forward-only and not forward-only). However, when trying to do this (even with a new session and a new variable scope), I can't seem to load the model for inference while I also have a model created for training. Without a new session/variable scope, I get errors about duplicated variables. It would be nice if there were a way to switch a model from not forward-only to forward-only.
In this case, is there any way to perform inference (run the full RNN) while I am also in the scope of training it?
