I'm working with the recent Reformer NLP model from Google (implemented in Trax).
I have read a few posts, but mostly I'm playing with the Colab example, which has all the model creation steps and testing functions. The problem I have right now is that, since the model takes a long time to train even on the Google TPUs, I need to save the trained model. My guess is that it works similarly to the GPT-2 model, in the sense that the model can be trained over several sessions, since it allows you to stop training at any moment:
# This will take at least 30 minutes to run to completion, but can safely
# be interrupted by selecting "Runtime > Interrupt Execution"
But I have not found an example of how to save and load the model once it is trained. In the case of GPT-2, a new directory was created automatically for each new model, and to use it you only had to point to that new directory; for this one, I'm not finding how to load a previously trained model.
EDIT:
In the notebook I saw this code:
# Set up a Trainer.
output_dir = os.path.expanduser('~/train_dir/')
!rm -f ~/train_dir/model.pkl  # Remove old model

trainer = trax.supervised.Trainer(
    model=trax.models.ReformerLM,
    loss_fn=trax.layers.CrossEntropyLoss,
    optimizer=trax.optimizers.Adam,
    lr_schedule=trax.lr.MultifactorSchedule,
    inputs=trax.supervised.inputs.Inputs(my_inputs),
    output_dir=output_dir,
    has_weights=True)
which deletes the previous model. I looked into that directory and found a model.pkl file.
I used pickle to load this model.pkl file, which I also copied to my Gdrive folder:
with open('model.pkl', 'rb') as handle:
    reformer_model = pickle.load(handle)

reformer_model
But this is just a dictionary with the weights, not a model I can use directly.
If you remove the line !rm -f ~/train_dir/model.pkl # Remove old model and change output_dir to point to the folder the saved model is in, it will load that model and continue training from where you left off. If there is no model in that directory, it will create a new one.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
%cd /content/gdrive/My\ Drive/

# Train tiny model with Trainer.
output_dir = "CornellMovieDialog/Model/"

trainer = trax.supervised.Trainer(
    model=tiny_transformer_lm,
    loss_fn=trax.layers.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adafactor,      # Change optimizer params here.
    lr_schedule=trax.lr.MultifactorSchedule,  # Change lr schedule here.
    inputs=copy_inputs,
    output_dir=output_dir)
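As a minimal sketch of resuming (assuming the same training loop as the Colab notebook; the step counts below are placeholders), the Trainer restores model.pkl from output_dir if it exists, so you just keep calling the usual training step and it continues from the saved state:

# Continue training from the restored checkpoint (step counts are placeholders).
for _ in range(10):
    trainer.train_epoch(n_steps=100, n_eval_steps=1)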
I am using PyTorch Lightning and, among other callbacks, a ModelCheckpoint which saves models with a formatted filename like `filename="model_{epoch}-{val_acc:.2f}"`.
In a later step I want to load these checkpoints again; for simplicity, let's say I only want the best ones via save_top_k=N.
As the filename is dynamic, I wonder how I can retrieve the checkpoints easily. Is there a built-in attribute, or a way via the trainer, that gives the saved checkpoint paths?
For example, something like
checkpoint_callback.get_top_k_paths()
I know I can do it with glob and model_dir, but I am wondering if there is a one-line solution built in somewhere.
You can retrieve the best model path after training from the checkpoint callback:
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# retrieve the best checkpoint after training
checkpoint_callback = ModelCheckpoint(dirpath='my/path/')
trainer = Trainer(callbacks=[checkpoint_callback])
model = ...
trainer.fit(model)
checkpoint_callback.best_model_path
To find all the checkpoints, you can list the files in the dirpath where the checkpoints are saved.
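For the top-k case specifically, ModelCheckpoint also keeps a best_k_models dict; here is a short sketch of both options (the glob fallback assumes the default .ckpt suffix):

import glob
import os

# Paths of the top-k checkpoints kept by the callback
# (best_k_models maps checkpoint path -> monitored metric value).
top_k_paths = list(checkpoint_callback.best_k_models.keys())
print(top_k_paths)

# Fallback: list every checkpoint file saved in the directory.
all_ckpts = glob.glob(os.path.join(checkpoint_callback.dirpath, "*.ckpt"))
print(all_ckpts)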
I am using the TensorFlow 2.x Object Detection API. I have trained a deep learning model from the model zoo on my dataset. I am using Google Colab. After training, I now want to evaluate my model using the COCO detection metrics. I used the following command to evaluate my model:
!python3 model_main_tf2.py \
    --model_dir=path/to/model_directory \
    --pipeline_config_path=path/to/pipeline_config_file \
    --checkpoint_dir=path/to/checkpoint_directory
After running the above command I get the mean average precision (mAP) and average recall (AR) for the latest checkpoint on my test set. But for academic purposes, I want to get these metrics for all the checkpoints, so I can graph how my model has improved over time. Is there a way to do that? Or is it possible to train and evaluate at the same time in the TensorFlow 2 Object Detection API? I am a beginner in this field, so kindly help me out with this issue. Thank you.
I am facing the same problem, so I had an idea: we can run the model_main_tf2.py script you mentioned to evaluate the model, but change the current checkpoint (the first line) in the checkpoint file each time:
model_checkpoint_path: "ckpt-1"
then
model_checkpoint_path: "ckpt-2"
then
model_checkpoint_path: "ckpt-3"
and so on.
For each checkpoint you will get a .tfevents file, so you can then open TensorBoard pointing to the directory that contains all the .tfevents files and see how the model improves over time.
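A rough sketch that automates this idea (the paths and the checkpoint range are placeholders; note that each eval invocation may keep waiting for a newer checkpoint until its timeout expires, so expect it to take a while):

import os
import subprocess

# Placeholder paths -- adjust to your setup.
model_dir = "path/to/model_directory"
checkpoint_dir = "path/to/checkpoint_directory"
pipeline_config = "path/to/pipeline_config_file"

# Evaluate ckpt-1, ckpt-2, ... in turn by rewriting the "checkpoint" file
# that the evaluation job reads, then running the normal eval command.
for i in range(1, 4):  # adjust the range to the checkpoints you actually have
    with open(os.path.join(checkpoint_dir, "checkpoint"), "w") as f:
        f.write(f'model_checkpoint_path: "ckpt-{i}"\n')
    subprocess.run([
        "python3", "model_main_tf2.py",
        f"--model_dir={model_dir}",
        f"--pipeline_config_path={pipeline_config}",
        f"--checkpoint_dir={checkpoint_dir}",
    ], check=True)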
I only kept the last 3 checkpoints on my computer, so I can't see the progress from the beginning (my fault), but if you have all the checkpoints, try what I suggest.
See my graph evaluating the last 3 checkpoints.
You should have an eval directory, containing an events.out.tfevents file, under your model directory. You can run !tensorboard --logdir=path/to/eval/directory to access the graphs.
You can run training with the same snippet you have, just without the checkpoint_dir flag, and open another terminal to run evaluation like you're currently doing.
I have been searching for a method to do this for so long, and I cannot find an answer. Most threads I found are from people wanting to do the opposite.
Backstory:
I am experimenting with some pre-trained models provided by the tensorflow/models repository. The models are saved as .pb frozen graphs. I want to fine-tune some of these models by changing the final layers to suit my application.
Hence, I want to load the models inside a Jupyter notebook as a normal Keras .h5 model.
How can I do that?
Or do you have a better way to do this?
Thanks.
It seems like all you would have to do is download the model files and store them in a directory, for example c:\models. Then load the model:
import os
import tensorflow as tf

model = tf.keras.models.load_model(r'c:\models')
model.summary()  # prints out the model layers

# modify the model as you typically do for transfer learning
# compile the changed model
# train the model

# save the trained model as a .h5 file
save_dir = r'path to the directory you want to save the model to'
model_identifier = 'abcd.h5'  # for abcd use whatever identification you want
save_path = os.path.join(save_dir, model_identifier)
model.save(save_path)
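A minimal sketch of what the "modify the model" step might look like for transfer learning, assuming the loaded model is a plain classifier (the class count and layer choice are placeholders, not anything from the original repository):

import tensorflow as tf

num_classes = 5  # placeholder: number of classes in your application

# Drop the original classification head, attach a new one, and freeze the rest.
base_output = model.layers[-2].output   # output of the penultimate layer
new_head = tf.keras.layers.Dense(num_classes, activation='softmax',
                                 name='new_head')(base_output)
new_model = tf.keras.Model(inputs=model.input, outputs=new_head)

for layer in new_model.layers[:-1]:
    layer.trainable = False             # train only the new head

new_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])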
I am using the Matterport repository to train Mask R-CNN on a custom dataset. I have been successful in training. Now I want to save the trained model and use it in a web application to detect objects. How do I save the Mask R-CNN model after training? Please guide me.
The link of the repository:
https://github.com/matterport/Mask_RCNN
Based on this discussion on GitHub, it appears that the architecture of a trained matterport/Mask_RCNN model can be saved as a JSON file in a manner similar to models built with standard Keras:
import keras
import json

def save_model(trained_model, out_fname="model.json"):
    jsonObj = trained_model.keras_model.to_json()
    with open(out_fname, "w") as fh:
        fh.write(jsonObj)

save_model(model, "mymodel.json")
Update: If you run into the error related to thread-like object, you might find this post helpful...
In the Inspect_model.ipynb notebook, under the "Load Model" topic, you can save the model after it has been loaded in inference mode.
Training also generates a folder inside Mask_RCNN/logs for each run.
I am not sure we really need to save the whole model again, since normally when we use the Matterport repo we just train new weights on the existing architecture and don't change the architecture itself. When we used this for a pet project, post training we defined a new model as the Mask R-CNN object (from mrcnn.model import MaskRCNN) with the parameter mode set to "inference", and then loaded the newly trained weights with model.load_weights('<logpath/trainedweights.h5>', by_name=True).
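A minimal sketch of that inference-mode reload (the config values are placeholders; in practice you reuse the config you trained with, just with a batch size of 1):

from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# Placeholder inference config -- mirror the settings of your training config.
class InferenceConfig(Config):
    NAME = "my_dataset"
    NUM_CLASSES = 1 + 1   # background + your classes (placeholder)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1    # detect one image at a time

# Rebuild the architecture in inference mode and load the newly trained weights.
model = MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="Mask_RCNN/logs")
model.load_weights('<logpath/trainedweights.h5>', by_name=True)

# results = model.detect([image], verbose=0)  # run detection on a loaded image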
Problem preface:
I have a database of user created neural network architectures (written in a different language that I transcompile to a Keras model) stored in MongoDB. My goal is to take these architectures, create a Keras model with them, then train them in the cloud using SageMaker. As of right now, I can load the models from MongoDB and transcompile them to Keras perfectly fine. However, I have trouble sending these dynamically created models to SageMaker using the Python SDK.
Is there a way to train and deploy these Keras model architectures, i.e. just Python Keras model objects, in SageMaker by specifying the entry_point attribute of an estimator as a file that has these model objects defined?
Work to Date & Code Example
As of right now, I can create a training job and deploy an endpoint when the model architecture is defined in a separate file. See this example of the separate file and the deployment/training process on SageMaker's GitHub.
train-and-deploy-sagemaker.py
# Import Sagemaker Tensorflow
from sagemaker.tensorflow import TensorFlow

# Create an estimator object using the entry_point file entry_point.py
estimator = TensorFlow(entry_point='entry_point.py',
                       role=arn_role,
                       framework_version='1.12.0',
                       hyperparameters={...some hyperparams for the model...},
                       training_steps=1000,
                       evaluation_steps=100,
                       train_instance_count=4,
                       train_instance_type='ml.p3.8xlarge')

# Start the training job to train the above estimator
estimator.fit(training_data_inputs)

# Deploy said estimator after training
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
entry_point.py
def keras_model_fn(hyperparameters):
    """keras_model_fn receives hyperparameters from the training job and returns a compiled keras model.
    The model will be transformed into a TensorFlow Estimator before training and it will be saved in a
    TensorFlow Serving SavedModel at the end of training.

    Args:
        hyperparameters: The hyperparameters passed to the SageMaker TrainingJob that runs your TensorFlow
            training script.
    Returns: A compiled Keras model
    """
    model = Sequential()
    # ... add layers ...
    return model

def train_input_fn():
    ...

# other functions for inference and training, see link above
However, is there a way I could define that architecture dynamically? I.e. grab the pre-written architecture from MongoDB and then transcompile it into the same Sequential Keras model in entry_point.py?
Potential ideas and concerns:
Idea: Just grab the models from MongoDB and do the transcompiling from within the entry_point file. Then each method required by AWS can reference the compiled model object.
Concern: Is that secure or best practice given AWS will create a VM from this file to run the code in their cloud? Also the source is later stored in an S3 bucket, so that might pose another security risk regardless of permissions. Also, dependencies like pymongo cannot be loaded from within the entry_point file, making the fetching of the data impossible without changing the training image.
Idea: Do the fetching and transcompiling within the file that creates the training job and deployment instance - train-and-deploy-sagemaker.py above. Then pass some code that can reconstruct the model - like Keras model JSON - through the hyperparams attribute within the estimator.
Concern: Hyperparams can only be 256 chars long according to AWS.
Idea: Dynamically generate the entry_point file based on the model architecture it needs to contain.
Concern: Many, e.g. not wanting to create a one-off file on a server for unnecessary I/O reasons, generating code being messy and bad practice, and the feeling that there has got to be a better way.
Idea: Make the entry_point attribute a non-external file, and instead specify the required methods within the file where the estimator is created. This would ostensibly solve all of my problems, but ...
Concern: I have seen nothing about this in the SageMaker documentation. Nonetheless, this is the most ideal.
Any help would be appreciated & thanks in advance!
Note that to simplify your training script you can use SageMaker script mode instead of the entry_point.py.
You can specify a requirements_file for your estimator, so you'll have the necessary pip-installable libraries. If MongoDB is running in your VPC, you'll want to run the training job in the VPC as well.
You can include the relevant files using the source_dir or dependencies params; they will end up in S3 in any case, but you can wipe the S3 bucket when the job completes.
From class FrameworkModel:
source_dir (str): Path (absolute or relative) to a directory with any other training
    source code dependencies aside from the entry point file (default: None). Structure within this
    directory will be preserved when training on SageMaker.
    If the directory points to S3, no code will be uploaded and the S3 location will be used instead.
dependencies (list[str]): A list of paths to directories (absolute or relative) with
    any additional libraries that will be exported to the container (default: []).
    The library folders will be copied to SageMaker in the same folder where the entrypoint is copied.
    If the source_dir points to S3, code will be uploaded and the S3 location will be used instead.
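For concreteness, a sketch of how the estimator from the question might be wired up with these parameters (the directory layout, requirements.txt, and the shared_libs folder are placeholders to adapt to your project; source_dir, dependencies, and requirements_file are the parameters discussed above):

from sagemaker.tensorflow import TensorFlow

# Hypothetical layout:
#   src/entry_point.py    - the training script (keras_model_fn, train_input_fn, ...)
#   src/requirements.txt  - extra pip packages such as pymongo
#   shared_libs/          - any additional local packages needed on the container
estimator = TensorFlow(entry_point='entry_point.py',
                       source_dir='src',                      # uploaded to S3, then copied to the container
                       dependencies=['shared_libs'],          # copied next to the entry point
                       requirements_file='requirements.txt',  # path relative to source_dir
                       role=arn_role,
                       framework_version='1.12.0',
                       training_steps=1000,
                       evaluation_steps=100,
                       train_instance_count=4,
                       train_instance_type='ml.p3.8xlarge')

estimator.fit(training_data_inputs)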
As for dynamically generating the entry_point file: yeah, better to avoid that.
Making entry_point a non-external file is not doable, as SageMaker reads the code from S3; you could pass an environment variable instead.
Hope it helps.