I'm using TensorBoard with Keras this way:
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', histogram_freq=0,
                          write_graph=True, write_images=False)
# define model
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True,
          callbacks=[tensorboard])
If I run training a second time by calling model.fit(…) again, TensorBoard resets the step counter, so the metric plots turn into a mess. How can I make it append the new results to the previous ones?
Another question: how can I create a separate run, so that I can compare the results of different runs in TensorBoard?
To resume a previous training run, set the initial_epoch argument of model.fit. The new metrics will then be appended to the existing TensorBoard logs instead of starting over at step zero.
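For example, if the first run trained for 10 epochs, a second call like this (a minimal sketch; the epoch counts are illustrative) continues the step counter instead of restarting it:
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=20,             # total epoch count to reach, not additional epochs
          initial_epoch=10,      # the first run already covered epochs 0-9
          validation_data=(X_test, Y_test),
          shuffle=True,
          callbacks=[tensorboard])
As for comparing runs: point each run's TensorBoard callback at its own subdirectory, e.g. TensorBoard(log_dir='./logs/run1') for the first run and TensorBoard(log_dir='./logs/run2') for the second, then launch TensorBoard on the parent ./logs directory and it will show the runs side by side.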
I am trying to understand the TensorFlow internals, in particular how the per-epoch results stored in the history variable returned by model.fit() are calculated:
history = model.fit(X_train, y_train, epochs=150, batch_size=16,
                    verbose=0, validation_split=0.3, steps_per_epoch=10)
and how model.evaluate() arrives at the final test loss and accuracy:
loss, acc = model.evaluate(X_test, y_test, verbose=0)
I have tried stepping through the functions behind model.fit and model.evaluate, but there are so many layers of functions that it is hard to follow how the calculations are made.
What should I do to visualize the calculations performed by these two functions?
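In case it helps to see the mechanics, here is a simplified sketch of what one epoch of model.fit does internally (illustrative only; model, dataset, the loss, and the optimizer are placeholders you would define yourself). The per-epoch loss stored in history is the running mean of the batch losses; model.evaluate performs the same forward pass and metric averaging over the test set, just without the gradient update:
import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()
epoch_loss = tf.keras.metrics.Mean()

for x_batch, y_batch in dataset:                 # dataset yields (features, labels) batches
    with tf.GradientTape() as tape:
        y_pred = model(x_batch, training=True)   # forward pass
        loss = loss_fn(y_batch, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    epoch_loss.update_state(loss)                # running mean over the epoch

# this value is what history.history['loss'] records for the epoch
print('epoch loss:', epoch_loss.result().numpy())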
Today I used Jupyter to run a deep learning model remotely.
After the browser had been disconnected for some time, I reconnected to the running kernel, but Jupyter no longer printed the intermediate output.
From the GPU usage and the Jupyter command line, I can see that the kernel is still running.
Is there any way I can keep observing the kernel's intermediate output?
[screenshot: the situation of the running kernel]
A Google Colab session with the browser open usually lasts at most 12 hours.
The best way to protect your progress is to checkpoint your deep learning model during training, so that the most recently trained weights are not lost.
Here is an example of how to use the checkpoint callback while training a deep learning model; more examples and details can be found here.
import os
import tensorflow as tf

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
batch_size = 32

# Create a callback that saves the model's weights every 5 epochs.
# NOTE: save_freq is counted in batches, not epochs; 5*batch_size batches
# corresponds to 5 epochs only when each epoch contains batch_size batches.
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5*batch_size)

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=50,
          batch_size=batch_size,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
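Once checkpoints exist, you can restore the most recent one and resume, which also guards against a lost Jupyter/Colab session. A minimal sketch, reusing create_model and checkpoint_dir from above:
# Find and load the most recently saved weights
latest = tf.train.latest_checkpoint(checkpoint_dir)
model = create_model()
model.load_weights(latest)

# Evaluate or continue training from the restored weights
loss, acc = model.evaluate(test_images, test_labels, verbose=2)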
I have an LSTM model that was trained on a multi-feature daily dataset and predicts the target feature's value one day into the future.
How should I retrain the model each day as new data becomes available? Should I rerun model.fit with the full dataset (which gets updated each day), like in the example below?
model.fit(x_train, y_train, epochs=50, batch_size=20,
          validation_data=(x_test, y_test), verbose=2, shuffle=False)
Or can I call model.fit with only the newly available data?
# run once at the beginning
model.fit(x_train, y_train, epochs=50, batch_size=20,
          validation_data=(x_test, y_test), verbose=2, shuffle=False)

# run every day as new data becomes available
model.fit(x_yesterday, y_yesterday)
Assuming you are using Keras, I would use train_on_batch. See this previous question for the answer: What is the use of train_on_batch() in keras?
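A minimal sketch of that incremental scheme, assuming x_yesterday and y_yesterday hold only the newly arrived samples:
# One-time full training run
model.fit(x_train, y_train, epochs=50, batch_size=20,
          validation_data=(x_test, y_test), verbose=2, shuffle=False)

# Daily update: a single gradient step on the new samples only
loss = model.train_on_batch(x_yesterday, y_yesterday)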
I have a GRU model that predicts output power. For the training data I have a CSV file with data from 2018, while my testing data is a different CSV file with data from 2019.
I just have two short questions.
Since I'm using two different CSV files, one for training and one for testing, I do not need train_test_split, right?
When it comes to model.fit, I don't really understand the difference between validation_data and validation_split, and which one I should use.
I have tested these three lines separately; the second and third give me exactly the same results, while the first gives me a much lower val_loss.
Thank you.
history = model.fit(X_train, y_train, batch_size=256, epochs=25,
                    validation_split=0.1, verbose=1,
                    callbacks=[TensorBoardColabCallback(tbc)])
history = model.fit(X_train, y_train, batch_size=256, epochs=25,
                    validation_data=(X_test, y_test), verbose=1,
                    callbacks=[TensorBoardColabCallback(tbc)])
history = model.fit(X_train, y_train, batch_size=256, epochs=25,
                    validation_data=(X_test, y_test), validation_split=0.1,
                    verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
Yes, you can do that: use one file to train and one to validate. You could also merge them and then use train_test_split if you wish. In fact, I would recommend merging them, because your data comes from different time periods and there may be distribution differences between them.
Using validation_data means you provide the training set and the validation set yourself, whereas validation_split means you provide only one dataset and Keras splits off the validation_split fraction of it (taken from the end, before any shuffling) as the validation set. When both arguments are given, validation_data overrides validation_split, which is why your second and third lines produce identical results.
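To make the difference concrete, these two calls are roughly equivalent (a sketch; validation_split carves the last fraction out of the samples you pass in):
n_val = int(len(X_train) * 0.1)

# Keras splits the validation set off the data you provide
model.fit(X_train, y_train, batch_size=256, epochs=25,
          validation_split=0.1)

# Equivalent manual split: train on the first 90%, validate on the last 10%
model.fit(X_train[:-n_val], y_train[:-n_val], batch_size=256, epochs=25,
          validation_data=(X_train[-n_val:], y_train[-n_val:]))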
I'm training my model in batches using train_on_batch, but train_on_batch doesn't have an option to use callbacks, which seem to be a requirement for checkpoints.
I can't use model.fit, as that seems to require loading all of my data into memory.
model.fit_generator is giving me strange problems (like hanging at the end of an epoch).
Here is the example from the Keras API docs showing the use of ModelCheckpoint:
from keras.callbacks import ModelCheckpoint
from keras.layers import Activation, Dense
from keras.models import Sequential

model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

checkpointer = ModelCheckpoint(filepath='/tmp/weights.hdf5', verbose=1,
                               save_best_only=True)
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=[checkpointer])
If you train on each batch manually, you can do whatever you want at any epoch (or batch). There is no need for a callback; just call model.save or model.save_weights yourself.
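For example, a minimal sketch of a manual loop with periodic saving (num_epochs and batch_generator are placeholders for your own epoch count and batching logic):
for epoch in range(num_epochs):
    for x_batch, y_batch in batch_generator():   # yields (x, y) batches, e.g. read from disk
        loss = model.train_on_batch(x_batch, y_batch)
    if epoch % 5 == 0:
        # save whenever you like; no callback machinery required
        model.save_weights('weights_epoch_{:04d}.hdf5'.format(epoch))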