I have created a word2vec model in Google Colab. However, when I try to save it with the code I normally use on my own computer, the file doesn't appear:
model.init_sims(replace=True)
model_name = "Twitter"
model.save(model_name)
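One way to make the saved file persist outside the temporary Colab VM (a sketch, not a confirmed fix: it reuses the model object and model name from the question, assumes a gensim Word2Vec, and the Drive path is illustrative) is to mount Google Drive and save under the mounted path:
from google.colab import drive

drive.mount('/content/gdrive')  # authorize and mount Google Drive

model_name = "Twitter"
# Saving under the mounted Drive path keeps the file after the Colab VM is recycled.
model.save('/content/gdrive/My Drive/' + model_name + '.model')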
I use Google Colab to train my model, with TensorFlow 2.2.
Here is how I create the summary writer:
writer = tf.summary.create_file_writer(os.path.join(model_dir, 'logs'), max_queue=1)
Here is the code I run at each step:
with writer.as_default():
    tf.summary.scalar("train_loss", total_loss, step=num_steps)
    writer.flush()
The problem is that if model_dir is just /content/model_dir, then everything saves fine, but if I save my model to a folder on Google Drive (I mount my Google Drive with this code:
from google.colab import drive
drive.mount('/content/gdrive2', force_remount=True)
), then the event file doesn't get updated. It is created, but it doesn't fill with data during training (or even right after training finishes).
As I understand it, the problem is that Google Drive doesn't register that TensorFlow is updating the event file; the whole event file only shows up after training is finished. What can I do to fix this?
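One workaround (a sketch, not a confirmed fix: the copy interval, the train_step and total_steps placeholders, and the Drive path are assumptions) is to let TensorFlow write the event file on the local VM disk, where flushes behave as expected, and to mirror the log directory to Drive every so often:
import os
import shutil
import tensorflow as tf

local_log_dir = '/content/model_dir/logs'                    # local disk: flush works here
drive_log_dir = '/content/gdrive2/My Drive/model_dir/logs'   # mirrored copy on Drive

writer = tf.summary.create_file_writer(local_log_dir, max_queue=1)

for num_steps in range(total_steps):                         # total_steps / train_step are placeholders
    total_loss = train_step()
    with writer.as_default():
        tf.summary.scalar("train_loss", total_loss, step=num_steps)
        writer.flush()
    if num_steps % 100 == 0:                                  # copy the event files to Drive every 100 steps
        os.makedirs(drive_log_dir, exist_ok=True)
        for fname in os.listdir(local_log_dir):
            src = os.path.join(local_log_dir, fname)
            if os.path.isfile(src):
                shutil.copy2(src, os.path.join(drive_log_dir, fname))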
After training a model I would like to save the history to my bucket, or to any other location I can access later from my local machine.
When I run the code below on Google Colab, everything works fine:
history = model.fit(training_dataset, steps_per_epoch=steps_per_epoch, epochs=EPOCHS,
                    validation_data=validation_dataset, validation_steps=validation_steps)
#model.summary()
model.save(BUCKET)
#save as pickle
with open('/trainHistoryDict', 'wb') as file_pi:
    pickle.dump(history.history, file_pi)
I can read it back later using:
history = pickle.load(open('/trainHistoryDict', "rb"))
However, when I run the code as a job on Google Cloud AI Platform (using %%writefile), I cannot retrieve the history in Google Colab with pickle.load; I'm getting 'no such directory'.
So how can I run training on AI Platform on Google Cloud and then access the history from Google Colab?
Can I save history.history in a bucket? I tried to use PACKAGE_STAGING_PATH but it didn't work.
Found a solution:
subprocess.Popen('gsutil cp history gs://bigdatapart2-storage/history1', shell=True, stdout=subprocess.PIPE)
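As a variant (a sketch, assuming TensorFlow is available both in the training job and in the notebook; the bucket path reuses the one above), the history dict can also be written straight to the bucket with tf.io.gfile, which understands gs:// paths, and read back the same way in Colab:
import pickle
import tensorflow as tf

# In the AI Platform training job: write the history dict into the bucket.
with tf.io.gfile.GFile('gs://bigdatapart2-storage/history1', 'wb') as f:
    pickle.dump(history.history, f)

# Later, in Colab (authenticated against the same project): read it back.
with tf.io.gfile.GFile('gs://bigdatapart2-storage/history1', 'rb') as f:
    history_dict = pickle.load(f)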
I am using TensorFlow with Keras to train a classifier and I tried adding TensorBoard as a callback parameter to the fit method. I have installed TensorFlow 2.0 correctly and am also able to load TensorBoard by calling %load_ext tensorboard. I am working on Google Colab and thought I would be able to save the logs to Google Drive during training, so that I can visualize them with TensorBoard. However, when I try to fit the data to the model along with the TensorBoard callback, I get this error:
File system scheme '[local]' not implemented
(file: '/content/drive/My Drive/KInsekten/logs/20200409-160657/train')
Encountered when executing an operation using EagerExecutor.
I initialized the TensorBoard callback like this:
logs_base_dir = "/content/drive/My Drive/KInsekten/logs/"
if not os.path.exists(logs_base_dir):
    os.mkdir(logs_base_dir)
log_dir = logs_base_dir + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensor_board = tf.keras.callbacks.TensorBoard(log_dir = log_dir, histogram_freq = 1,
                                              write_graph = True, write_images = True)
I was facing the same issue. The problem is that the TPU can't use the local filesystem; you have to create a separate bucket on Cloud Storage and configure the TPU to use it.
Following are two links from the official Google Cloud TPU documentation: the first discusses the main problem, and the second implements the actual solution.
The main problem discussed
Solution to this problem
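A minimal sketch of that fix for the callback above (the bucket name is a placeholder; the TPU's service account needs write access to it):
import datetime
import tensorflow as tf

# TPUs can only read and write GCS paths, so point the logs at a bucket
# instead of Google Drive or the local /content filesystem.
logs_base_dir = "gs://your-bucket-name/KInsekten/logs/"
log_dir = logs_base_dir + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensor_board = tf.keras.callbacks.TensorBoard(log_dir = log_dir, histogram_freq = 1,
                                              write_graph = True, write_images = True)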
I have trained my Keras model in Google Colab and saved it as a model.h5 file in Google Drive. I made the model.h5 file accessible through a public URL. Now I want to load the model with load_model() from this URL, but I am getting the error File system scheme 'https' not implemented.
Here is my code:
model = load_model(url)
It would be great if I could get a possible solution for this.
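load_model() only accepts local (or GCS) paths, so one possible workaround (a sketch; it assumes url is a direct download link to the .h5 file) is to download the file first and then load it from disk:
import tensorflow as tf
from tensorflow.keras.models import load_model

# Download the .h5 file to the local filesystem, then load it from there.
local_path = tf.keras.utils.get_file('model.h5', origin=url)
model = load_model(local_path)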
According to Google Colab, there is a 12-hour time limit for training DL models on a GPU. Other people have asked similar questions in the past, but there has been no clear answer on how to save and load models halfway through training when the 12-hour limit is exceeded, including saving the number of epochs completed and other parameters. Is there an automated script to save the relevant parameters and resume training on another VM? I am a complete noob; clear-cut answers will be much appreciated.
As far as I know, there is no way to automatically reconnect to another VM whenever you reach the 12-hour limit. So in any case, you have to manually reconnect when the time is up.
As Bob Smith points out, you can mount Google Drive in Colab VM so that you can save and load data from there. In particular, you can periodically save model checkpoints so that you can load the most recent one whenever you connect to a new Colab VM.
Mount Drive in your Colab VM:
from google.colab import drive
drive.mount('/content/gdrive')
Create a saver in your graph:
saver = tf.train.Saver()
Periodically (e.g. every epoch) save a checkpoint in Drive:
saver.save(session, CHECKPOINT_PATH)
When you connect to a new Colab VM (because of the timeout), mount Drive again in your VM and restore the most recent checkpoint before the training phase:
saver.restore(session, CHECKPOINT_PATH)
...
# Start training with the restored model.
Take a look at the documentation to read more about tf.train.Saver.
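To avoid hard-coding the checkpoint name when restoring, here is a sketch (the directory is illustrative, and saver and session are the ones from the steps above) using tf.train.latest_checkpoint:
import tensorflow as tf

CHECKPOINT_DIR = '/content/gdrive/My Drive/checkpoints'
# tf.train.latest_checkpoint reads the 'checkpoint' index file written by saver.save()
# and returns the path of the most recent checkpoint, or None if there is none yet.
latest = tf.train.latest_checkpoint(CHECKPOINT_DIR)
if latest is not None:
    saver.restore(session, latest)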
Mount Drive, and save and load persistent data from there.
from google.colab import drive
drive.mount('/content/gdrive')
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=RWSJpsyKqHjH
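For example (a sketch; the path and filename are illustrative), a Keras model can be written to and later read back from the mounted path:
from tensorflow.keras.models import load_model

# Anything written under the mount point lives in your Drive, not on the VM disk.
model.save('/content/gdrive/My Drive/my_model.h5')

# In a later session, after mounting Drive again:
model = load_model('/content/gdrive/My Drive/my_model.h5')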
From Colab you can access GitHub, which makes it possible to push your model checkpoints to GitHub periodically. When a session ends, you can start another session and load the checkpoints back from your GitHub repo.
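A rough sketch of that workflow (it assumes the repository has already been cloned into /content/my-repo with push credentials configured, and that the checkpoints are small enough to live in a Git repo; the repo name and paths are placeholders):
import subprocess

# After saving checkpoints into the cloned repo, commit and push them.
subprocess.run('git -C /content/my-repo add checkpoints', shell=True, check=True)
subprocess.run('git -C /content/my-repo commit -m "periodic checkpoint"', shell=True, check=True)
subprocess.run('git -C /content/my-repo push', shell=True, check=True)

# In a new session, clone the repo again to get the checkpoints back.
subprocess.run('git clone https://github.com/<user>/my-repo.git /content/my-repo',
               shell=True, check=True)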