I trained a model on kaggle.com.
The final code is:
history = model.fit(dataset, epochs=1)
model.save("/kaggle/working/039_model.h5")
print("Model saved successfully!")
The output is:
31368/31368 [==============================] - 23489s 749ms/step - loss: 1.4623
Model saved successfully!
As a test, I only trained it for 1 epoch, but I cannot find my model in the /kaggle/working directory in the right-hand sidebar, even if I click the refresh button or refresh the page.
Thanks for your help!
The refresh button of the browser refreshes the whole environment you were working in: all variables that were set and any files that were saved in the storage instance allotted to you for that particular session.
You are given an instance of server resources, i.e., RAM, a few GB of temporary storage, and a CPU/GPU for a certain period of time. Refreshing the browser starts a completely new instance, losing all local changes to the previous one.
So even though your model might be saved in the local storage of that instance (on the server), it will get deleted once that session expires (after 20 minutes of inactivity), when you run out of quota, or when you refresh your page.
The solution is: don't refresh your page. Find where the model is saved by exploring the sidebar (click the dropdown button) and download the model for future use.
Side note: committing your notebook will only commit, i.e., save, that particular version of your code in the notebook; it will not save anything that you wrote to the local storage of that instance, such as, in your case, a saved model file.
Well, maybe I know what happened: I just found out that I should use Save & Run All.
File -> Save Version -> Save & Run All
That way we get everything we want.
It is managed by the application; it will save to the target directory unless you are using a checkpoint callback!
import os
from os.path import exists
import tensorflow as tf
from tensorflow.keras import layers

savedir = 'F:\\models\\save\\FlappyBird_15'  # directory for full-model saves (unused in this snippet)
checkpoint_path = "F:\\models\\checkpoint\\FlappyBird_15\\TF_DataSets_01.h5"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create the checkpoint directory if it does not exist yet
if not exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
    print("Create directory: " + checkpoint_dir)

### model initialize ###
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1200, 1)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
])
model.add(layers.Flatten())
model.add(layers.Dense(64))
model.add(layers.Dense(2))
model.summary()
########################

# Restore previously saved weights if a checkpoint already exists
if exists(checkpoint_path):
    model.load_weights(checkpoint_path)
    print("model load: " + checkpoint_path)
    input("Press Any Key!")

# ... train the model here, then save the weights to the checkpoint path
model.save_weights(checkpoint_path)
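Since the answer mentions a checkpoint callback, here is a minimal sketch of that alternative (assuming the same checkpoint_path and a placeholder tf.data dataset named train_ds), which writes the weights automatically during training instead of calling save_weights by hand:

# Hypothetical training call using tf.keras.callbacks.ModelCheckpoint;
# train_ds and the loss are placeholders for your own setup.
ckpt_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,   # store only the weights, matching save_weights above
    save_best_only=False,
)
model.compile(optimizer="adam", loss="mse")
model.fit(train_ds, epochs=10, callbacks=[ckpt_cb])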
I'm using Google Colab for finetuning a pre-trained model.
I successfully preprocessed a dataset and created an instance of the Seq2SeqTrainer class:
from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
The problem is resuming training from the last checkpoint after the session is over.
If I run trainer.train(), it runs correctly. As it takes a long time, I sometimes come back to the Colab tab after a few hours, and I know that if the session has crashed I can continue training from the last checkpoint like this: trainer.train("checkpoint-5500")
The checkpoint data no longer exists on Google Colab if I come back too late, so even though I know the point the training had reached, I have to start all over again.
Is there any way to solve this problem, i.e., extend the session?
To fix your problem, try using a full fixed path, for example on your Google Drive, and saving checkpoint-5500 to it.
With your trainer, you can set the output directory to your Google Drive path when creating the Seq2SeqTrainingArguments instance.
When you come back to your code, if the session is indeed over, you'll just need to load your checkpoint-5500 from your Google Drive instead of retraining everything.
Add the following code:
from google.colab import drive
drive.mount('/content/drive')
Then, after your trainer.train("checkpoint-5500") has finished (or as its last step), save your checkpoint to your Google Drive.
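For example, a minimal sketch that copies the finished checkpoint over to Drive (the local checkpoint directory and Drive destination below are hypothetical paths, adjust them to your run):

import shutil

# Copy the local checkpoint folder to a (hypothetical) folder on your mounted Drive
shutil.copytree("/content/output/checkpoint-5500",
                "/content/drive/MyDrive/checkpoints/checkpoint-5500")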
Or, if you prefer, you can add a callback to your training call so that a checkpoint is saved and updated after every single epoch (that way, if for some reason the session crashes before training finishes, you'll still have some progress saved).
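Rather than a custom callback, the simplest way to get that behaviour with the Trainer is the save_strategy argument. A minimal sketch, assuming a hypothetical Drive folder for output_dir, that lets checkpoints land straight on Drive and then resumes from the latest one:

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="/content/drive/MyDrive/my_finetune_run",  # checkpoints land on Drive
    save_strategy="epoch",             # write a checkpoint after every epoch
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

# After a crashed session, resume from the latest checkpoint found in output_dir:
# trainer.train(resume_from_checkpoint=True)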
I am trying to load a tf.keras model directly from a cloud bucket, but I can't see an easy way to do it.
I would like to load the whole model structure, not only the weights.
I see 3 possible directions:
Is it possible to load a Keras model directly from a Google Cloud bucket? The command tf.keras.models.load_model('gs://my_bucket/model.h5') doesn't work.
I tried to use tensorflow.python.lib.io.file_io, but I don't know how to load the result as a model.
I copied the model to a local directory with the gsutil cp command, but I don't know how to wait until the operation is complete. TF tries to load the model before the download has finished, so errors occur.
I will be thankful for any suggestions.
Peter
Load the file from gs storage
from tensorflow.python.lib.io import file_io
model_file = file_io.FileIO('gs://mybucket/model.h5', mode='rb')
Save a temporary copy of the model locally
temp_model_location = './temp_model.h5'
temp_model_file = open(temp_model_location, 'wb')
temp_model_file.write(model_file.read())
temp_model_file.close()
model_file.close()
Load the model saved locally
import tensorflow as tf
model = tf.keras.models.load_model(temp_model_location)
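Alternatively, if you prefer not to manage the file handles yourself, a roughly equivalent sketch (assuming the same bucket path) uses tf.io.gfile.copy, which only returns once the copy has completed, so the model is never loaded from a partial file:

import tensorflow as tf

# Copy the model from the bucket to local disk; the call blocks until the copy is done
tf.io.gfile.copy('gs://mybucket/model.h5', './temp_model.h5', overwrite=True)
model = tf.keras.models.load_model('./temp_model.h5')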
I am new to Microsoft Custom Vision and I am working on an integration of the Microsoft Azure Custom Vision API using Jupyter notebooks/Python. I was able to upload images, tag them automatically, and train the first iteration. However, when I tried to download a Dockerfile of the trained iteration/model, I got stuck while exporting the model. Using the function export_iteration I ended up with an msrest.pipeline.ClientRawResponse object. I think the export is currently only stored in the export queue. How do I access this queue element to download it to my local system?
PS: I am working with a General (compact) model format, so it should be exportable.
Example code:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient

# Initialize the Training Client
training_key = "your-training-key"
ENDPOINT = "your-endpoint"
c_plat = CustomVisionTrainingClient(training_key, ENDPOINT)
# List all projects you have
projects = c_plat.get_projects()
#Always take the newest project and its newest iteration and export it
iterations = c_plat.get_iterations(projects[0].id)
c_plat.export_iteration(project_id=projects[0].id, iteration_id=iterations[0].id, platform = "DockerFile", raw=True, flavor = "ARM")
After some trial and error I found a solution:
#Always takes the newest project and its newest iteration
iterations = c_plat.get_iterations(projects[0].id)
response = c_plat.export_iteration(project_id=projects[0].id, iteration_id=iterations[0].id, platform = "DockerFile", raw=False, flavor="ARM")
# Opening the URI
import webbrowser
webbrowser.open(c_plat.get_exports(project_id=projects[0].id, iteration_id=iterations[0].id)[0].download_uri)
This opened the uri in a new tab and started the automatic download. Hope this might help somebody else.
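If you would rather save the export without opening a browser tab, a small sketch (assuming the requests package and the same c_plat client, projects, and iterations as above) can download the archive directly:

import requests

# Fetch the newest export of the newest iteration and write it to disk
export = c_plat.get_exports(project_id=projects[0].id, iteration_id=iterations[0].id)[0]
resp = requests.get(export.download_uri)
resp.raise_for_status()
with open("custom_vision_export.zip", "wb") as f:
    f.write(resp.content)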
Cheers!
I am training a neural network for object detection using Google Colab. I want to visualize the learning process, but every time I try to access TensorBoard, it shows me the following:
No dashboards are active for the current data set. Probable causes: - You haven’t written any data to your event files. - TensorBoard can’t find your event files.
I am not training the model locally and have configured my google drive account with the colab notebook for the training data so user hpabst's answer does not seem useful.
I also tried setting up tensorboard using ngrok but that gave me a similar output.
I made sure I am generating summary data in a log directory by creating a summary writer:
import tensorflow as tf
sess = tf.Session()
file_writer = tf.summary.FileWriter('/content/logs/my_log_dir/', sess.graph)
and followed that with
tensorboard = TensorBoard(log_dir="/content/logs/my_log_dir/",batch_size=32, write_graph=True, update_freq='epoch')
model.fit_generator(
    train_generator,
    steps_per_epoch=(train_data/BS),
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=(test_data/BS),
    callbacks=[tensorboard, checkpoint])
and finally
tensorboard --logdir /content/logs/my_log_dir/
The event files are in place. The path to the log directory is also correct.
Like I said, I was getting the same "no active dashboards" error using ngrok. I moved over to the SCALARS menu in the TensorBoard GUI, and on the left, under the Runs section at the bottom, I found that the path to the log directory was being shown as '/content/log/my_log_dir', although everywhere in my code I had only used the path '/content/logs/my_log_dir'. Maybe setting up TensorBoard using ngrok expects the files to be in the 'log' and not the 'logs' directory. I made the change and now it works just fine.
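As a side note, if you are running inside Colab you can also skip ngrok entirely and use the notebook-native TensorBoard extension; a minimal sketch, pointing at whichever directory your FileWriter actually writes to:

%load_ext tensorboard
%tensorboard --logdir /content/logs/my_log_dir/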
I am looking to serve TensorFlow models by building a Docker image and deploying it on AWS. For this I need the .pb and variables files, which are a must when serving any TensorFlow model. However, I only have the checkpoint file of the model. Is there any way to restore the variables folder from the checkpoint file?
I am able to create the .pb file, but I am not sure how to get the variables folder.
import os
import tensorflow as tf

# `model` is your already-built tf.keras model; `args.model_path` is the checkpoint directory
ckpt = tf.train.latest_checkpoint(args.model_path)
model.load_weights(ckpt)
ckpt_filename = os.path.basename(ckpt)
saved_model_path = os.path.join('pb_files', ckpt_filename)
model.save(saved_model_path)  # writes saved_model.pb plus the variables/ and assets/ folders
https://www.tensorflow.org/guide/saved_model
Hello, I created the snippet above from the linked document. This code will create the .pb file, the variables folder, and the assets folder.