I am training a neural network for object detection using Google Colab. I wanted to visualize the learning process, but every time I try to access TensorBoard, it shows me the following:
No dashboards are active for the current data set.
Probable causes:
- You haven’t written any data to your event files.
- TensorBoard can’t find your event files.
I am not training the model locally; I have connected my Google Drive account to the Colab notebook for the training data, so user hpabst's answer does not seem useful.
I also tried setting up tensorboard using ngrok but that gave me a similar output.
I made sure I am generating summary data in a log directory by creating a summary writer:
import tensorflow as tf
sess = tf.Session()
file_writer = tf.summary.FileWriter('/content/logs/my_log_dir/', sess.graph)
and followed that with
tensorboard = TensorBoard(log_dir="/content/logs/my_log_dir/", batch_size=32, write_graph=True, update_freq='epoch')
model.fit_generator(
    train_generator,
    steps_per_epoch=(train_data/BS),
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=(test_data/BS),
    callbacks=[tensorboard, checkpoint])
and finally
tensorboard --logdir /content/logs/my_log_dir/
The event files are in place. The path to the log directory is also correct.
Like I said, I was getting the same "No dashboards are active" error using ngrok. I moved over to the SCALARS menu in the TensorBoard GUI and, on the left, at the bottom of the Runs section, I found that the path to the log directory was shown as '/content/log/my_log_dir', although everywhere in my code I had written the path as '/content/logs/my_log_dir'. Maybe setting up TensorBoard using ngrok expects the files to be in the 'log' directory and not the 'logs' directory. I made the change and now it works just fine.
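A quick way to rule out the "TensorBoard can't find your event files" cause is to list the event files under the exact directory passed to --logdir. The sketch below is only an illustration: the helper name find_event_files is hypothetical, and it relies on TensorBoard event files having names that start with events.out.tfevents.

```python
from pathlib import Path


def find_event_files(logdir):
    """Return sorted paths of TensorBoard event files found under logdir (recursive)."""
    root = Path(logdir)
    if not root.is_dir():
        # A missing directory is reported the same way as an empty one.
        return []
    # Event files are written with names such as events.out.tfevents.<timestamp>.<host>
    return sorted(p for p in root.rglob("events.out.tfevents.*") if p.is_file())
```

Calling find_event_files("/content/logs/my_log_dir/") and find_event_files("/content/log/my_log_dir/") side by side would show which of the two directories actually holds the data.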
I'm using Google Colab for finetuning a pre-trained model.
I successfully preprocessed a dataset and created an instance of the Seq2SeqTrainer class:
trainer = Seq2SeqTrainer(
model,
args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["validation"],
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics
)
The problem is resuming training from the last checkpoint after the session is over.
If I run trainer.train(), it runs correctly. As it takes a long time, I sometimes come back to the Colab tab after a few hours, and I know that if the session has crashed I can continue training from the last checkpoint like this: trainer.train("checkpoint-5500")
The checkpoint data no longer exists on Google Colab if I come back too late, so even though I know the point the training had reached, I have to start all over again.
Is there any way to solve this problem, e.g. by extending the session?
To fix your problem, try using a full, fixed path, for example on your Google Drive, and saving checkpoint-5500 to it.
With your trainer, you can set the output directory to your Google Drive path when creating the Seq2SeqTrainingArguments instance.
When you come back to your code, if the session is indeed over, you just need to load checkpoint-5500 from your Google Drive instead of retraining everything.
Add the following code:
from google.colab import drive
drive.mount('/content/drive')
Then, after your trainer.train("checkpoint-5500") has finished (or as its last step), save your checkpoint to your Google Drive.
Or, if you prefer, you can add a callback to your training run that saves and updates the checkpoint after every single epoch (that way, if for some reason the session crashes before training finishes, you will still have some progress saved).
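Once checkpoints are written to a persistent folder, the step number no longer has to be remembered by hand. The following is a minimal sketch, assuming the output directory contains subfolders named checkpoint-<step> as in the question; the helper name latest_checkpoint and the Drive path in the comments are hypothetical.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-10000 beats checkpoint-5500.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return str(best_path) if best_path is not None else None


# Hypothetical usage in Colab, after drive.mount('/content/drive'):
# output_dir = "/content/drive/MyDrive/my_run"  # assumed path, also passed as output_dir to the training arguments
# trainer.train(resume_from_checkpoint=latest_checkpoint(output_dir))
```

If no checkpoint folder exists yet, the helper returns None, and passing None as resume_from_checkpoint starts training from scratch.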
I have code that takes in a CSV, which can be uploaded through Streamlit, and then outputs a classification prediction.
For context, I use xgboost to create my model and I save it as follows:
joblib.dump(model, r'C:\Users\myname\classification\default_class_model.pkl')
To load the model I do:
model_from_joblib = joblib.load(r'C:\Users\myname\classification\default_class_model.pkl')
scoring = model_from_joblib.predict(X_test)
When I execute it in a Jupyter notebook it seems to work just fine, but when I run it from Anaconda with
streamlit run mymodel.py
I get the error:
XGBoostError: [13:38:10]
C:\Users\Administrator\workspace\xgboost-win64_release_1.1.0\include\xgboost/json.h:65:
Invalid cast, from Null to Array
Does anyone have an idea why this may be?
I solved the problem by updating the xgboost version I was using.
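One plausible reading, which is an assumption and not something the thread confirms, is that the Jupyter kernel and the environment running Streamlit had different xgboost versions installed. A small check like the one below, run in both places, would make such a mismatch visible; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this both in the notebook and at the top of the Streamlit script, then compare.
print("xgboost:", installed_version("xgboost"))
```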
I trained a model on kaggle.com.
The final code is:
history = model.fit(dataset, epochs=1)
model.save("/kaggle/working/039_model.h5")
print("Model saved successfully!")
The output is:
31368/31368 [==============================] - 23489s 749ms/step - loss: 1.4623
Model saved successfully!
As a test, I trained it for just 1 epoch, but I cannot find my model in the /kaggle/working directory in the right sidebar, even after clicking the refresh button or reloading the page.
Thanks for your help!
The browser's refresh button resets the whole environment you were working in: all variables that were set and any files that were saved in the storage instance allotted to you for that particular session.
You are given an instance of a server resource, i.e., RAM, a few GB of temporary storage, and a CPU/GPU for a certain period of time. Refreshing the browser starts a completely new instance, losing all local changes made to the previous one.
So even though your model might be saved in the local storage of that instance (on the server), it will be deleted once the session expires (after 20 minutes of inactivity), when you run out of quota, or when you refresh the page.
The solution is not to refresh the page. Instead, find where the model was saved by expanding the dropdown in the sidebar, and download the model for future use.
Side note: Committing your notebook only commits, i.e., saves, that particular version of the code in the notebook. It does not save anything you wrote to the local storage of that instance, such as, in your case, a saved model file.
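If the sidebar does not show the file, the directory can also be inspected from a notebook cell before the session ends. This is a minimal sketch using only the standard library; the helper name list_files is hypothetical, and /kaggle/working is the path from the question.

```python
import os


def list_files(directory):
    """Return (relative_path, size_in_bytes) for every file under directory, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append((os.path.relpath(full, directory), os.path.getsize(full)))
    return sorted(found)
```

For example, list_files("/kaggle/working") should include an entry for 039_model.h5 if model.save() wrote the file where the question expects it.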
Well, I think I know what happened: I just found out that I should use Save & Run All.
File -> Save Version -> Save & Run All
That way we get all the output we want.
This is managed by the application: the model is saved to the target directory, unless you are using a checkpoint callback.
import os
from os.path import exists

import tensorflow as tf
from tensorflow.keras import layers

savedir = 'F:\\models\\save\\FlappyBird_15'
checkpoint_path = "F:\\models\\checkpoint\\FlappyBird_15\\TF_DataSets_01.h5"
# Assumption: the checkpoint directory is the parent folder of checkpoint_path.
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create the checkpoint directory (and any missing parents) if needed.
if not exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
    print("Create directory: " + checkpoint_dir)

### model initialize ###
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1200, 1)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
])
model.add(layers.Flatten())
model.add(layers.Dense(64))
model.add(layers.Dense(2))
model.summary()
########################

# Resume from previously saved weights if a checkpoint file exists.
if exists(checkpoint_path):
    model.load_weights(checkpoint_path)
    print("model load: " + checkpoint_path)

input("Press Any Key!")

model.save_weights(checkpoint_path)
I am using Google Colab to analyze some data. I want to download the model file.
I have code like this:
model.fit(x_train, y_train, epochs=1, batch_size=1,
I tried using download(), but it doesn't work. Is there any way to download it?
I assume you are using Keras. The model.save() method returns None, so if you pass its result to download(), it will raise an error. Save the model first, then download it:
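# In Colab, download() comes from the google.colab.files module.
from google.colab.files import download
model.save('model_name.h5')
download('model_name.h5')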
Also make sure that you are in the directory that contains the model_name.h5 file. You can check that with the !ls command.
In older versions, we could use this command after creating the network structure and the session:
writer = tf.train.SummaryWriter("logs/", sess.graph)
And type this in the cmd after running your script:
tensorboard --logdir="logs"
Then you copy the link to your browser.
But it shows this error:
No graph definition files were found.
To store a graph, create a tf.summary.FileWriter and pass the graph either via the constructor, or by calling its add_graph() method. You may want to check out the graph visualizer tutorial.
Please help. I also tried using tf.summary.FileWriter() instead:
file_writer = tf.summary.FileWriter('/path/to/logs', sess.graph)
And I get the same error.
There is probably something wrong with the folder you are writing to or reading from. After you tried the FileWriter, can you verify that it actually wrote something to your logs folder?
If this is the case, start your tensorboard with this command:
tensorboard --logdir="logs" --debug
Take a look at this line:
INFO:tensorflow:TensorBoard path_to_run is: {'/Users/test/logs': None}
Verify that this is the same path! This is what went wrong for me when I had the same issue. More debug ideas can be found on this page by the way: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/README.md#my-tensorboard-isnt-showing-any-data-whats-wrong
If this does not help, let us know!
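A relative --logdir such as "logs" is interpreted relative to the directory TensorBoard is started from, which may differ from the working directory of the script that wrote the files. The sketch below shows how the same relative name resolves to different absolute paths; the helper name resolve_logdir and the example directories are hypothetical.

```python
import os


def resolve_logdir(logdir, cwd=None):
    """Return the absolute path that a (possibly relative) logdir refers to from cwd."""
    base = cwd if cwd is not None else os.getcwd()
    # os.path.join keeps logdir unchanged when it is already absolute.
    return os.path.normpath(os.path.join(base, logdir))


# The same "logs" argument points at different folders depending on where you start:
print(resolve_logdir("logs", "/Users/test"))
print(resolve_logdir("logs", "/Users/test/project"))
```

Comparing the result for the script's working directory with the path_to_run line in the debug output is one way to confirm both point at the same folder.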