Today I used Jupyter to run a deep learning model remotely.
After the browser had been disconnected for some time, I reconnected to the running kernel, but Jupyter no longer printed the intermediate output.
From the GPU usage and the Jupyter command line, I can see that the kernel is still running.
Is there any way I can continue to observe the intermediate output of the kernel?
[Screenshot: the situation of the running kernel]
The Google Colab session lifetime with an open browser is usually 12 hours.
The best way to preserve your progress is to use checkpoints for your deep learning model, so you don't lose the last trained state.
This is an example of how you can use a checkpoint callback while training your deep learning model; more examples and details can be found here.
import os
import tensorflow as tf

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
batch_size = 32

# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5*batch_size)

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=50,
          batch_size=batch_size,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
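To pick up again after a disconnect, you can rebuild the model and reload the most recent checkpoint. A minimal sketch, assuming the same create_model() helper and checkpoint_dir from the snippet above:

# Rebuild the architecture and load the latest saved weights
model = create_model()
latest = tf.train.latest_checkpoint(checkpoint_dir)  # e.g. training_2/cp-0050.ckpt
model.load_weights(latest)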
Related
I am training a transformer model for a chatbot. I want to save checkpoints in Colab so that I can reuse the trained model whenever required after the training process is done.
I have followed the model saving tutorial from TensorFlow, but it keeps giving me the following error.
UnimplementedError: File system scheme '[local]' not implemented (file: 'training_1/cp.ckpt_temp/part-00000-of-00001') [Op:MultiDeviceIteratorInit]
This is my attempt at saving the checkpoints.
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
#Create checkpoint callback
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
save_weights_only=True,
verbose=1)
#Fit model:
model.fit(dataset, epochs=EPOCHS,callbacks=[cp_callback])
In some training runs the model gets trained for about 5 epochs before this error occurs, while in others the error occurs within just one or two epochs. I am using a TPU to train the model.
What causes this issue and is there a way to get rid of it?
Any help will be highly appreciated.
I have a Keras NN that I want to train and validate using two sets of data, and then test the final performance of using a third set. In order to avoid having to rerun the training every time I restart my Google Colab runtime or want to change my test data, I want to save the final state of the model after training in one script and then load it again in another script.
I've looked everywhere and it seems that model.save("content/drive/My Drive/Directory/ModelName", save_format='tf') should do the trick, but even though it outputs INFO:tensorflow:Assets written to: content/drive/My Drive/Directory/ModelName/assets nothing appears in my Google Drive, so I assume it isn't actually saving.
Please can someone help me solve this issue?
Thanks in advance!
The standard way of saving and retrieving your model's state after Google Colab has terminated your connection is to use a feature called ModelCheckpoint. This is a callback in Keras that runs after each epoch and saves your model, for instance whenever there's an improvement. Here are the steps needed to accomplish what you want:
Connect to Google Drive
Use this code in order to connect to Google Drive:
from google.colab import drive
drive.mount('/content/gdrive')
Give access to Google Colab
Then you'll be presented with a link that you should go to; authorize Google Colab by copying the given code into the text box that appears.
Define your ModelCheckpoint
This is how you could define your ModelCheckpoint's callback:
from keras.callbacks import *
filepath="/content/gdrive/My Drive/MyCNN/epochs:{epoch:03d}-val_acc:{val_acc:.3f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
Use it as a callback while you're training the model
Then you need to tell your model to run this callback after each epoch so that the model's state is saved.
model.fit(X_train, y_train,
          batch_size=64,
          epochs=epochs,
          verbose=1,
          validation_data=(X_val, y_val),
          callbacks=callbacks_list)
Load the model after Google Colab terminated
Finally, after your session has been terminated, you can load your previous model's state by simply running the following code. Don't forget to re-define your model first and only load the weights at this stage.
model.load_weights('/content/gdrive/My Drive/MyCNN/epochs:047-val_acc:0.905.hdf5')
Hope that this answers your question.
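If you'd rather save the whole model (architecture plus weights) instead of only the checkpointed weights, the same mounted Drive path works with model.save. A minimal sketch, assuming the Drive mount from the steps above; the directory and file names are only examples:

from tensorflow import keras

# Save the full model to the mounted Drive (adjust the path to your own layout)
model.save('/content/gdrive/My Drive/MyCNN/final_model', save_format='tf')

# Later, in a new session (after mounting Drive again), reload it
model = keras.models.load_model('/content/gdrive/My Drive/MyCNN/final_model')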
I am writing a neural network to train incrementally (not online learning). Here is a snippet of the code:
output = create_model()
model = Model(inputs=values, outputs=output)

if start_epoch > 1:
    weights_list = load_model_from_pickle()
    model.set_weights(weights_list)

model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(data, label, epochs=1, verbose=1, batch_size=1024, shuffle=False)
In essence, I want to load previously trained weights and train for a few more epochs. I read in some SO replies that calling compile changes the weights; is that true? Is there any other way to do this? Does it make sense to set the weights after calling compile? Will the answer change if I run my model in a multi-GPU setting?
You need to compile the model once; after training, when you reload the model, you don't need to compile it again. Read more here.
The compile function defines the optimizer, loss function and metrics you want. It does not change any weights. For more detailed information, read here.
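As a quick sanity check that compile leaves weights untouched, you can compare the weights before and after compiling. A minimal sketch with a toy model; the layer sizes are arbitrary:

import numpy as np
from tensorflow import keras

# Toy model just for illustration
model = keras.Sequential([keras.layers.Dense(4, input_shape=(8,)),
                          keras.layers.Dense(1)])

before = [w.copy() for w in model.get_weights()]
model.compile(loss='binary_crossentropy', optimizer='adam')
after = model.get_weights()

# Every weight tensor is identical after compile
print(all(np.array_equal(b, a) for b, a in zip(before, after)))  # True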
I'm making a convolutional neural network using Keras for image classification.
I'm making a script which will allow the user to retrain the model with new data.
As you know, weight initialisation influences model performance, so I want to automatically run the same model (same architecture and same data) with different random values for weight initialisation and save the weights of the best model.
What is a bit tricky is that I'm creating a .exe file and I can't
run all models and save their weights in different .h5 files
delete all the .h5 files which are useless (worst performance)
What I need is to create only one .h5 file, corresponding to the model with the best performance.
But I have no idea how I could do that.
Edit: about the code, even though my current process is completely standard, I'm adding it for better visualisation.
model = Sequential()
model.add(Conv2D(24, kernel_size=3, padding='same', activation='relu',
                 input_shape=(n, n, 1)))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(X, activation='softmax'))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

batch_size = 256
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=30)  # Stop training if no improvement after n = patience epochs
mc = ModelCheckpoint(ROOT+"\\model_weight_"+name+'_'+str(n)+".h5", monitor='val_loss', mode='min', verbose=1, save_best_only=True)  # Save the best model
train = model.fit_generator(image_gen.flow(train_X, train_label, batch_size=batch_size),
                            epochs=400, verbose=1,
                            validation_data=(valid_X, valid_label),
                            class_weight=class_weights,
                            callbacks=[es, mc],
                            steps_per_epoch=len(train_X)/batch_size)
There are several things I want to mention here, but first, this should be what you asked for.
mc = ModelCheckpoint(ROOT+"\\model_weight_"+name+"_{epoch:02d}-{val_loss:.2f}.h5", monitor='val_loss', mode='min', verbose=1, save_best_only=True)
Here val_loss is the metric name Keras prints during training (like val_acc if you set metrics=['accuracy']).
More information: https://keras.io/callbacks/.
And for deleting the files: I didn't check for the best weights during training, mainly because of save_best_only=True; a loop would still be needed to check every epoch in train.history.
import os

min_val_loss = float('inf')
best_file = ''

# Find the best weights from the file name
for weightsfile in os.listdir(ROOT):
    if not weightsfile.endswith('.h5'):
        continue
    if float(weightsfile[-7:-3]) < min_val_loss:  # Since I use .2f above, I use -7:-3 here
        min_val_loss = float(weightsfile[-7:-3])
        best_file = weightsfile

# Delete every weights file in ROOT except the best
for weightsfile in os.listdir(ROOT):
    if weightsfile == best_file or not weightsfile.endswith('.h5'):
        continue  # keep the best weights file
    os.remove(os.path.join(ROOT, weightsfile))  # I recommend `os.path.join` instead of SOMETHING+"\\"+ANOTHER; the latter can break when you move to a Linux machine (like Colab).
Now, for what I actually wanted to say: weight initialization doesn't matter much in deep CNNs (with 10 or 100 layers). In fact, I've trained the same network (like ResNet50) with the same settings several times on the same data, and the accuracy usually only differs by 2% or 3% (around 0.69 to 0.72). And the weights that performed best on another validation set weren't the 0.72 ones.
So it's better to start from the already fine-tuned weights of good models like ResNet, DenseNet, NASNet, or the most recent state-of-the-art EfficientNet, and spend your time trying different settings like the optimizer or learning rate schedule instead. And most importantly, train on a free GPU machine like Colab.
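If you go that route, a minimal transfer-learning sketch with a pretrained backbone could look like this; the input shape is a placeholder (ResNet50 expects 3-channel images), and X is the number of classes from the question's code:

import tensorflow as tf

# Pretrained ImageNet backbone; freeze it and train only a small new head
base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(224, 224, 3), pooling='avg')
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(X, activation='softmax'),  # X = number of classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])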
When I use the EarlyStopping callback, does Keras save the best model in terms of val_loss, or does it save the model at save_epoch = [best epoch in terms of val_loss] + EARLY_STOPPING_PATIENCE_EPOCHS?
If it's the second option, how do I save just the best model?
Here is code snippet:
early_stopping = EarlyStopping(monitor='val_loss', patience=EARLY_STOPPING_PATIENCE_EPOCHS)

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,  # 1 epoch = BATCH_SIZE * steps_per_epoch samples
    epochs=N_EPOCHS,
    validation_data=test_generator,
    validation_steps=20,
    callbacks=[early_stopping])
# Save train log to .csv
pd.DataFrame(history.history).to_csv('vgg16_binary_crossentropy_train_log.csv', index=False)
model.save('vgg16_binary_crossentropy.h5')
In Keras v2.2.4+, EarlyStopping has a restore_best_weights parameter which, when set to True, restores the weights from the epoch with the best monitored validation performance. For example:
EarlyStopping(restore_best_weights=True)
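In the context of the snippet above, that could look like this; a minimal sketch reusing the question's own names, with example monitor and patience values:

from tensorflow.keras.callbacks import EarlyStopping

# Restore the weights from the epoch with the best val_loss when training stops
early_stopping = EarlyStopping(monitor='val_loss',
                               patience=EARLY_STOPPING_PATIENCE_EPOCHS,
                               restore_best_weights=True)

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=N_EPOCHS,
    validation_data=test_generator,
    validation_steps=20,
    callbacks=[early_stopping])

# The model now holds the best weights, so saving it saves the best model
model.save('vgg16_binary_crossentropy.h5')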
From my experience using the EarlyStopping callback, the model will not be saved automatically; it will just stop training, and when you save it manually you get the second option you describe.
To have your model saved each time val_loss decreases, see the following documentation page:
https://keras.io/callbacks/ and look at the "Example: model checkpoints" section which will tell you exactly what to do.
Note that if you wish to re-use your saved model, I have had better luck using save_weights in combination with saving the architecture as JSON. YMMV.
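A minimal sketch of that weights-plus-JSON approach; the file names are just examples:

from tensorflow.keras.models import model_from_json

# Save: architecture as JSON, weights separately
with open('model_architecture.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('model_weights.h5')

# Load later: rebuild the architecture, then load the weights
with open('model_architecture.json') as f:
    model = model_from_json(f.read())
model.load_weights('model_weights.h5')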