I'm adapting the Tensorflow tutorial for sequence to sequence modeling for my project. Specifically, I am basing my code off of translate.py.
The tutorial computes the perplexity on the dev set every n training steps. I'd instead like to calculate BLEU score on the dev set.
The problem I'm facing is that when creating a model, you specify whether it is forward only or not. Looking through the code, it seems that if it is (which happens when training), at each step the network will not calculate the final output for the input sequence, but will calculate gradients. When it's not forward only (as in the decoding function later in the tutorial), it applies the loop function which feeds the output back into the input for the RNN which allows for the output sequence to be generated. However, it doesn't compute the gradients. So as far as I understand it, you can construct a model for either training (i.e. gradients) or testing (i.e. performing full inference on it).
Since I want to compute the BLEU score, I need some sequence produced by the model which corresponds to an input sequence in the dev set. Because of how the models are constructed, I would need both types of models (forward-only and not forward-only). However, trying to do this (even with a new session and a new variable scope), I can't seem to load the model for inference while I also have a model created for training. Without a new session/variable scope, I get errors about duplicated variables. It would be nice if there were a way to switch the model from not forward-only to forward-only.
In this case, is there any way to perform inference (run the full RNN) while I am also in the scope of training it?
Related
For a project I'm working on I have to compare the performance of a small, simulated dataset to the performance of well-known datasets (e.g. ImageNet, CIFAR-10) using two neural networks, ResNet v2 at different depths and Inception v3.
To account for the imbalance between my set and the big popular ones, one of the methods I want to employ is augmenting the big sets using either a class from my simulated set or the same class but instead with real-life images. Then, using a pre-trained model, I want to train on both augmented sets and later compare performance.
The issue
The augmentation works fine: I copy over the .tfrecord files containing only images of the extra class to the original dataset and I modify the labels.txt file to contain the appropriate extra labels. However, the issue occurs when trying to train on this augmented dataset:
I0125 16:40:31.279634 140094914221888 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [258] rhs shape= [257]
which is to be expected, as the graph cannot handle the extra class.
My attempted solution
To alleviate this issue I tried applying the same method as was posed in this answer, using the Python script below (modified for brevity):
checkpoint = {}
saver = tf.compat.v1.train.import_meta_graph(f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}.meta")
with tf.Session() as sess:
saver.restore(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
# get appropriate nodes where number of classes is one of its dimensions
if "inception_v3" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
elif "resnet_v2" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
# select the nodes previously defined
nodes = [node for node in tf.global_variables() if node.name in target_nodes]
for node in nodes:
# load the values of those nodes from the checkpoint
checkpoint[node.name] = node.eval()
# extend dimensions for extra class
checkpoint.update({node.name: np.insert(checkpoint[node.name], checkpoint[node.name].shape[-1], 0.0, axis=-1)})
# assign new nodes to the checkpoint
sess.run(tf.assign(node, checkpoint[node.name], validate_shape=False))
saver.save(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
In short, this script loads the checkpoint and isolates all nodes where one of the dimensions is equal to the number of classes and increases it. I have verified these are always equal to the number of classes using the checkpoints from the different datasets the models have been trained on. In fact, it seems that those tensors are the only thing that's different between the checkpoints.
Although this method does execute correctly and allows me to run the training, it feels very boneheaded and I'm almost positive it is not at all what I'm supposed to do. However I'm curious as to what exactly is wrong with this (apart from initializing the new values as 0.0), as except from a wild fluctuation in loss at the start of the tranining it seems to work as I had hoped.
Other solutions
When looking further into this issue I also found other users' answers on similar questions suggesting that it is in fact impossible to modify a checkpoint in the way that I want to and suggested transfer learning or modifying the output layer. I am very new to neural network training and I don't know who to believe, as the linked answer seems to suggest that what I'm trying to do should work.
My question
So I ask: which approach is correct? Could my initial approach work, or should I try a different method to fix this issue? Starting from scratch would not be ideal, as the progress so far has taken over a month of continuous training.
I am using a modified version the TensorFlow Model Garden as my environment to train the networks, with the modificiations pertaining to creating custom datasets and using them using the supplied training scripts.
I'm implementing LightGBM (Python) into a continuous learning pipeline. My goal is to train an initial model and update the model (e.g. every day) with newly available data.
Most examples load an already trained model and apply train() once again:
updated_model = lightgbm.train(params=last_model_params, train_set=new_data, init_model = last_model)
However, I'm wondering if this is actually the correct way to approach continuous learning within the LightGBM library since the amount of fitted trees (num_trees()) grows for every application of train() by n_estimators. For my understanding a model update should take an initial model definition (under a given set of model parameters) and refine it without ever growing the amount of trees/size of the model definition.
I find the documentation regarding train(), update() and refit() not particularly helpful. What would be considered the right approach to implement continuous learning with LightGBM?
In lightgbm (the Python package for LightGBM), these entrypoints you've mentioned do have different purposes.
The main lightgbm model object is a Booster. A fitted Booster is produced by training on input data. Given an initial trained Booster...
Booster.refit() does not change the structure of an already-trained model. It just updates the leaf counts and leaf values based on the new data. It will not add any trees to the model.
Booster.update() will perform exactly 1 additional round of gradient boosting on an existing Booster. It will add at most 1 tree to the model.
train() with an init_model will perform gradient boosting for num_iterations additional rounds. It also allows for lots of other functionality, like custom callbacks (e.g. to change the learning rate from iteration-to-iteration) and early stopping (to stop adding trees if performance on a validation set fails to improve). It will add up to num_iterations trees to the model.
What would be considered the right approach to implement continuous learning with LightGBM?
There are trade-offs involved in this choice and no one of these is the globally "right" way to achieve the goal "modify an existing model based on newly-arrived data".
Booster.refit() is the only one of these approaches that meets your definition of "refine [the model] without ever growing the amount of trees/size of the model definition". But it could lead to drastic changes in the predictions produced by the model, especially if the batch of newly-arrived data is much smaller than the original training data, or if the distribution of the target is very different.
Booster.update() is the simplest interface for this, but a single iteration might not be enough to get most of the information from the newly-arrived data into the model. For example, if you're using fairly shallow trees (say, num_leaves=7) and a very small learning rate, even newly-arrived data that is very different from the original training data might not change the model's predictions by much.
train(init_model=previous_model) is the most flexible and powerful option, but it also introduces more parameters and choices. If you choose to use train(init_model=previous_model), pay attention to parameters num_iterations and learning_rate. Lower values of these parameters will decrease the impact of newly-arrived data on the trained model, higher values will allow a larger change to the model. Finding the right balance between those is a concern for your evaluation framework.
I have built a stock prediction model using LSTM. however, everytime when I run the program, the value of RMSE and the prediction result keep changing ( I did not change any data in the program. It giving out different result everytime when I clicking the run buttom everytime, ) Can anyone let me know what is the reason of it. Thank you very much
I will suggest you know more about layers and some other basic things of neural networks.
How does a neural network learn?
A neural network contains three types of layers. Input, output, and hidden layers. All these layers contain neurons or you can say nodes. Every layer's neurons are connected with it's previous and next layer's neurons. Take a look at the picture below.
You can call the connections 'path'. Every path has some weights value. A neuron's input value is calculated by summing all the multiplications of outputs of previous layer's neuron and the path's weight value. Then the sum value is processed by some activation function. You can learn more about it by joining online classes or from tutorials.
But my point is, prediction completely depends on those weights. And those weights value keeps changing depending on the learning rate and some other stuff during training. What about the very beginning? at epoch no. 1? Basically model generates some random weights for all the paths. Then keeps changing those values during training to minimize the loss.
Every time you run your train, it generates random values. That's why you get different results each time. If you fix those values using tf.seed or some other method, you will get reproducible results. btw, you don't need to train every time. Save your model weights, then load it whenever you need to predict. You will get the same result every time you load the model weights and use that model to predict.
There are many sources of randomness in machine learning (described in link below).
https://machinelearningmastery.com/randomness-in-machine-learning/
In this case it may probably be "Randomness in the Algorithm" even if you make sure to apply exactly same data in same order.
I have a Tensorflow question regarding Reinforcement Learning. I have everything working and training, but there is something that feels redundant. Want to point it out and hear your thoughts:
Lets assume something simple like episodic REINFORCE. Given this standard setup:
state -> network -> logits
When I want to train (when episode complete), I need to:
pass in an array of states (saved from running the episode) to a TF Placeholder
do a forward pass with those saved states to produce logits
compute log_probs (using saved array of actions)
compute loss (using saved array of advantages)
This works fine. However, what seems redundant is steps 1&2. I'd prefer to calculate log_probs during each step of episode while episode is being run. This way I don't have to do steps 1,2,3 during training, and forward pass is only performed once (during episode). I'd have my log_probs calculated by the time the episode was over.
However, if I create placeholders for log_probs and advantages, and don't pass in states (for the redundant forward prop), then I don't know how to get TF to know where the variables are for backprop. I get the error:
ValueError: No gradients provided for any variable
So my questions are:
if I'm passing in states, is it true that forward prop is being run again during training?
can I prevent this by my method above, finding some way to tell TF where the gradients are?
In case anyone wants to see actual code (I tried to be clear enough not to need it), here is a gist of the script in question
EDIT: I think the answer has something to do with using optimzer.compute_gradients (where I can pass in variables) and optimizer.apply_gradients, but not sure how yet...
I'm trying to write something similar to google's wide and deep learning after running into difficulties of doing multi-class classification(12 classes) with the sklearn api. I've tried to follow the advice in a couple of posts and used the tf.group(logistic_regression_optimizer, deep_model_optimizer). It seems to work but I was trying to figure out how to get predictions out of this model. I'm hoping that with the tf.group operator the model is learning to weight the logistic and deep models differently but I don't know how to get these weights out so I can get the right combination of the two model's predictions. Thanks in advance for any help.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Cs0R75AGi8A
How to set layer-wise learning rate in Tensorflow?
tf.group() creates a node that forces a list of other nodes to run using control dependencies. It's really just a handy way to package up logic that says "run this set of nodes, and I don't care about their output". In the discussion you point to, it's just a convenient way to create a single train_op from a pair of training operators.
If you're interested in the value of a Tensor (e.g., weights), you should pass it to session.run() explicitly, either in the same call as the training step, or in a separate session.run() invocation. You can pass a list of values to session.run(), for example, your tf.group() expression, as well as a Tensor whose value you would like to compute.
Hope that helps!