I know that with PyTorch you can turn off training for a model by calling eval() on it.
You can also freeze parameters by setting requires_grad=False.
How can you ensure that a TensorFlow element is not modified during training?
If you don't want to train certain Variables in TensorFlow you can achieve this behaviour by adding trainable=False to Variables.
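For illustration, a minimal sketch (the variable names here are made up):

import tensorflow as tf

# A variable the optimizer will not update: optimizers minimize over
# the trainable collection by default, so `frozen` is skipped.
frozen = tf.Variable(tf.zeros([3, 3]), trainable=False)
trained = tf.Variable(tf.zeros([3, 3]))  # trainable=True by default

# In TF1, tf.trainable_variables() would list only `trained`.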
I am currently trying to create a TensorFlow DNN model with a multilabel target variable. While my code hasn't had any problems so far, the imbalanced nature of the dataset I'm working with has caused a few problems.
As per the recommendations in Keras' documentation, I've applied an initial bias to the model. I've also tried to enable the class_weight parameter when fitting the model, and this is where I'm stuck.
https://github.com/tensorflow/tensorflow/issues/41448
There seems to be a known bug in this method, as seen in the GitHub issue linked above, and my attempts at creating a workaround haven't been successful at all. I'd appreciate any advice on creating a workaround, because I'm honestly at a loss myself. I'm currently running TensorFlow 2.4.
You are using a slightly old version of TensorFlow. The following worked for me on a multiclass dataset using TensorFlow 2.7 and Keras 2.7:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

class_weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(y_train),
                                     y=y_train)

model.fit(
    ...
    class_weight=dict(enumerate(class_weights))
)
The values of y_train must be integers in the range [0, NUMBER_CLASSES - 1] for this code to work correctly. You can accomplish this using LabelEncoder.
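For instance, a minimal sketch using scikit-learn's LabelEncoder:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)  # maps labels to integers 0..n_classes-1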
Alternatively, you can use sample_weight instead of class_weight to accomplish the same thing (in fact, Keras internally converts class_weight to sample_weight). Here you can find the documentation about these parameters.
Other easy-to-implement and effective methods for combating data imbalance are oversampling and undersampling, which have a similar effect to using class_weight. You can use them if you run into problems with class_weight or sample_weight.
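As a rough illustration, here is a minimal random oversampling sketch using scikit-learn's resample (assuming X_train and y_train are NumPy arrays and y_train holds integer class labels):

import numpy as np
from sklearn.utils import resample

# Upsample every class to the size of the largest class.
classes, counts = np.unique(y_train, return_counts=True)
max_count = counts.max()

X_parts, y_parts = [], []
for cls in classes:
    mask = (y_train == cls)
    X_cls, y_cls = resample(X_train[mask], y_train[mask],
                            replace=True, n_samples=max_count,
                            random_state=42)
    X_parts.append(X_cls)
    y_parts.append(y_cls)

X_balanced = np.concatenate(X_parts)
y_balanced = np.concatenate(y_parts)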
I built my own sklearn-like estimator that trains a PyTorch model on the GPU (CUDA), and it works fine with RandomizedSearchCV when n_jobs == 1. When n_jobs > 1, I get the following error:
PicklingError: Can't pickle <class '__main__.LSTM'>: attribute lookup LSTM on __main__ failed
This is the piece of code giving me the error:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
from sklearn.metrics import make_scorer, mean_squared_error

model = my_model(input_size=1, hidden_layer_size=80, n_lstm_units=3, bidirectional=False,
                 output_size=1, training_batch_size=60, epochs=7500, device=device)
model.to(device)

hidden_layer_size = np.random.uniform(40, 200, 20).astype(int)
n_lstm_units = np.arange(1, 4)
parametros = {'hidden_layer_size': hidden_layer_size, 'n_lstm_units': n_lstm_units}

splitter = ShuffleSplit()
regressor = model
cv_search = RandomizedSearchCV(estimator=regressor, cv=splitter,
                               param_distributions=parametros,
                               refit=True,
                               n_iter=4,
                               verbose=1,
                               n_jobs=2,
                               scoring=make_scorer(mean_squared_error,
                                                   greater_is_better=False,
                                                   needs_proba=False))
cv_search = MetaSKLearnWrapper(cv_search)  # Neuraxle's wrapper
cv_search.fit(X, y)
Using the Neuraxle wrapper leads to exactly the same error; it changes nothing.
The closest solution I found is here, but I still don't know how to use RandomizedSearchCV within Neuraxle. It is a brand new project, so I couldn't find an answer in their docs or community examples. If anyone can give me an example or a good pointer, it will save my life. Thank you.
P.S.: Any way to run RandomizedSearchCV with my PyTorch model on the GPU without Neuraxle would also help; I just need n_jobs > 1.
P.S. 2: My model has a fit() method that creates tensors and moves them to the GPU; it has already been tested and works.
There are multiple criteria that must be respected here for your code to work:
You need to use Neuraxle's RandomSearch instead of sklearn's random search for this to work. Use Neuraxle's base classes where possible.
Make sure that you use a Neuraxle BaseStep for your PyTorch model, instead of an sklearn base class.
Also, you should create your PyTorch code only in the setup() method or later. You can't create a PyTorch model in the __init__ of a BaseStep that contains PyTorch code. You will want to read this page.
You will probably have to create a Saver for your BaseStep that contains PyTorch code if you want to serialize and then load your trained pipeline again. You can see how we created our TensorFlow Saver for our TensorFlow BaseStep and do something similar. Your saver will probably be much simpler than ours due to the more eager nature of PyTorch. For instance, you could have self.model inside your extension of the BaseStep class. The role of the saver would be to save and strip away this simple variable from self, and to be able to reload it whenever needed.
To sum up: you'd need to create two classes, and your two classes should look very similar to our two TensorFlow step and saver classes here, with the exception that your PyTorch model would be in a self.model variable of your step. A rough sketch of such a step follows.
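As an untested sketch (LSTMStep is a hypothetical name, and the exact BaseStep method signatures may differ between Neuraxle versions):

from neuraxle.base import BaseStep
import torch.nn as nn

class LSTMStep(BaseStep):
    def __init__(self, hidden_layer_size=80, n_lstm_units=3):
        super().__init__()
        # Store only plain hyperparameters here: no torch objects,
        # so the step stays picklable for parallel search.
        self.hidden_layer_size = hidden_layer_size
        self.n_lstm_units = n_lstm_units
        self.model = None

    def setup(self):
        # Create the unpicklable PyTorch model lazily, after forking.
        if self.model is None:
            self.model = nn.LSTM(input_size=1,
                                 hidden_size=self.hidden_layer_size,
                                 num_layers=self.n_lstm_units)
        self.is_initialized = True
        return self

    def fit(self, data_inputs, expected_outputs=None):
        self.setup()
        # ... training loop using self.model goes here ...
        return self

    def transform(self, data_inputs):
        # ... inference using self.model goes here ...
        return data_inputs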
I'd be glad to see your implementation of your PyTorch base step and of your PyTorch saver!
You could then even use the AutoML class (see the AutoML example here) to save experiments in a Hyperparameter Repository, as shown in that example.
Let's say that I want to fine-tune one of the TensorFlow Hub image feature vector modules. The problem arises because, in order to fine-tune a module, the following needs to be done:
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3", trainable=True, tags={"train"})
Assume that the module is ResNet50.
In other words, the module is imported with the trainable flag set to True and with the train tag. Now, if I want to validate the model (perform inference on the validation set in order to measure its performance), I can't switch off batch norm because of the train tag and the trainable flag.
Please note that this question has already been asked here: Tensorflow hub fine-tune and evaluate. However, no answer has been provided.
I also raised a GitHub issue about it.
Looking forward to your help!
With hub.Module for TF1, the situation is as you say: either the training or the inference graph is instantiated, and there is no good way to import both and share variables between them in a single tf.Session. That's informed by the approach used by Estimators and many other training scripts in TF1 (esp. distributed ones): there's a training Session that produces checkpoints, and a separate evaluation Session that restores model weights from them. (The two will likely also differ in the dataset they read and the preprocessing they perform.)
With TF2 and its emphasis on Eager mode, this has changed. TF2-style Hub modules (as found at https://tfhub.dev/s?q=tf2-preview) are really just TF2-style SavedModels, and these don't come with multiple graph versions. Instead, the __call__ function on the restored top-level object takes an optional training=... parameter if the train/inference distinction is required.
With this, TF2 should match your expectations. See the interactive demo tf2_image_retraining.ipynb and the underlying code in tensorflow_hub/keras_layer.py for how it can be done. The TF Hub team is working on making a more complete selection of modules available for the TF2 release.
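For illustration, a minimal TF2-style sketch (assuming the /4 handle is the TF2 SavedModel version of the module above, and an arbitrary 10-class head):

import tensorflow as tf
import tensorflow_hub as hub

# One layer serves both training and inference; the training flag is
# passed per call instead of being baked in via graph tags.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4",
    trainable=True)  # fine-tune the module's weights

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(10, activation="softmax"),
])

# model.fit(...) calls the module with training=True;
# model.evaluate(...) and model.predict(...) call it with
# training=False, so batch norm uses its moving averages.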
Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants with that value, such that one can run the model on a new input tensor without initializing or restoring the weights in a session? So can I basically condense a trained model into a fixed and immutable TensorFlow operation?
Yes, there is a freeze_graph.py tool for exactly that purpose.
It is described (a bit) in the Tool Developer's Guide, and you can find a usage example in the "Preparing models for mobile deployment" section.
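If you'd rather do it from within a running session, a minimal TF1-style sketch (the output node name is a placeholder you'd replace with your own):

import tensorflow as tf

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # or restore a checkpoint
    # Replace every variable in the graph with a constant holding
    # its current value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output/predictions"])

# The frozen GraphDef can be serialized and later imported with
# tf.import_graph_def(), with no variable initialization needed.
with tf.gfile.GFile("frozen_model.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())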
I'm trying to write something similar to Google's wide and deep learning model after running into difficulties doing multi-class classification (12 classes) with the sklearn API. I've tried to follow the advice in a couple of posts and used tf.group(logistic_regression_optimizer, deep_model_optimizer). It seems to work, but I was trying to figure out how to get predictions out of this model. I'm hoping that with the tf.group operator the model is learning to weight the logistic and deep models differently, but I don't know how to get these weights out so I can get the right combination of the two models' predictions. Thanks in advance for any help.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Cs0R75AGi8A
How to set layer-wise learning rate in Tensorflow?
tf.group() creates a node that forces a list of other nodes to run using control dependencies. It's really just a handy way to package up logic that says "run this set of nodes, and I don't care about their output". In the discussion you point to, it's just a convenient way to create a single train_op from a pair of training operators.
If you're interested in the value of a Tensor (e.g., the weights), you should pass it to session.run() explicitly, either in the same call as the training step or in a separate session.run() invocation. You can pass a list of fetches to session.run(): for example, your tf.group() expression as well as a Tensor whose value you would like to compute.
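For illustration, a minimal sketch (the toy variable and optimizers stand in for your wide and deep training operators):

import tensorflow as tf

w = tf.Variable(1.0, name="weight")
loss = tf.square(w - 3.0)

# Two optimizers grouped into a single train_op, as in the discussion.
wide_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
deep_op = tf.train.AdamOptimizer(0.01).minimize(loss)
train_op = tf.group(wide_op, deep_op)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # tf.group() yields an Operation with no output, so its fetch
    # result is None; w's value is returned alongside it.
    _, w_value = sess.run([train_op, w])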
Hope that helps!