I am using the TensorFlow 2.x Object Detection API. I have trained a deep learning model from the model zoo on my dataset. I am using Google Colab. After training, I now want to evaluate my model using the COCO detection metrics. I used the following script to evaluate my model:
!python3 model_main_tf2.py \
--model_dir=path/to/model_directory \
--pipeline_config_path=path/to/pipeline.config \
--checkpoint_dir=path/to/checkpoint_directory
After running the above code I get the mean average precision (mAP) and average recall (AR) for the latest checkpoint on my test set. But for academic purposes, I want to get these metrics for all the checkpoints, to get a graph of how my model has improved over time. Is there a possible way to do that? Or is it possible to train and evaluate at the same time in the TensorFlow 2 Object Detection API? I am a beginner in this field, so kindly help me out with this issue. Thank you.
I am facing the same problem, so I had an idea: we can run the model_main_tf2.py command you mentioned to evaluate the model, but change the checkpoint to evaluate by editing the first line of the checkpoint file:
model_checkpoint_path: "ckpt-1"
then
model_checkpoint_path: "ckpt-2"
then
model_checkpoint_path: "ckpt-3"
and so on, for each checkpoint you saved.
For each checkpoint you will get a .tfevents file; then you open TensorBoard pointing to the directory that contains all the .tfevents files and you can see how the model improves over time.
I only saved the last 3 checkpoints on my computer, so I can't see the progress from the beginning (my fault), but if you have all the checkpoints, try what I suggest; a rough sketch of automating it is below.
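A minimal Python sketch of that loop, assuming model_main_tf2.py is in the current directory and reusing the placeholder paths from the question; the number of checkpoints and the eval_timeout are assumptions you would adapt to your setup:

import os
import subprocess

MODEL_DIR = 'path/to/model_directory'      # contains the "checkpoint" file and the ckpt-N files
PIPELINE_CONFIG = 'path/to/pipeline.config'
NUM_CHECKPOINTS = 10                       # however many ckpt-N you kept

for i in range(1, NUM_CHECKPOINTS + 1):
    # Point the checkpoint file at ckpt-i so the eval run picks exactly this checkpoint.
    with open(os.path.join(MODEL_DIR, 'checkpoint'), 'w') as f:
        f.write(f'model_checkpoint_path: "ckpt-{i}"\n')
    # One eval pass; each run writes a tfevents file that TensorBoard can plot over time.
    subprocess.run([
        'python3', 'model_main_tf2.py',
        f'--model_dir={MODEL_DIR}',
        f'--pipeline_config_path={PIPELINE_CONFIG}',
        f'--checkpoint_dir={MODEL_DIR}',
        '--eval_timeout=60',               # exit shortly after evaluating instead of waiting for new checkpoints
    ], check=True)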
See my graph evaluating the last 3 checkpoints.
You should have an eval directory including an events.out.tfevents file under your model directory. You can run !tensorboard --logdir=path/to/eval/directory to access the graphs.
You can run training with the same snippet you have, just without the checkpoint_dir, and open another terminal to run evaluation like you're currently doing.
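For example, with the same placeholder paths as above (a sketch, not exact commands):
Terminal 1 (training):
python3 model_main_tf2.py \
--model_dir=path/to/model_directory \
--pipeline_config_path=path/to/pipeline.config
Terminal 2 (evaluation, watching the checkpoints that the training run writes):
python3 model_main_tf2.py \
--model_dir=path/to/model_directory \
--pipeline_config_path=path/to/pipeline.config \
--checkpoint_dir=path/to/model_directory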
Related
I have 31,216 labeled images for object detection. I used the LabelImg program to label the images, and the annotations are in Pascal-VOC format. I want to create a tflite model for my Kotlin project. However, I have serious problems.
First, in my local environment, I tried to install the tflite-model-maker library in PyCharm using pip install tflite-model-maker. It downloaded ~30 GB and Python still says "unresolved reference". Then I tried to add the library from here, but that also didn't work. I couldn't manage to import the library.
For the second approach, I used Google Colab, following this tutorial from TensorFlow. I mounted my Google Drive in Colab and edited all the code for my dataset path. Finally I ran the line model.export(export_dir='.', tflite_filename='AslModel.tflite') and it created the model file in the Colab directory. I continued with the next line, model.evaluate_tflite('AslModel.tflite', val_data), which gave a 16-hour ETA; after 14 hours the Google Colab runtime raised an error and the whole runtime was reset. Now I have a tflite model and I tested it; since there was no evaluation step, it makes bad predictions. I started all over again but Google Colab gave an error again. I guess ~7 hours of training + ~16 hours of evaluation is impossible with Google Colab because of the 24-hour limit. Thus, my question is: how can I run the evaluation step only?
The model is defined in this line, which takes 7 hours to run: model = object_detector.create(train_data, model_spec=spec, batch_size=4, train_whole_model=True, epochs=20, validation_data=val_data). Instead of this line, I want to initialize a model from my tflite file, something like model = LoadModel(PATH_OF_MY_TFLITE). I couldn't find any load method, so I'm stuck there.
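The closest I can get is running plain inference on the exported file with the generic TFLite interpreter; here is a minimal sketch (the file name and the dummy input are placeholders, and this still doesn't give me model maker's evaluate_tflite metrics):

import numpy as np
import tensorflow as tf

# Load the exported model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path='AslModel.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the shape/dtype the model expects; replace with a real preprocessed image.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

# Detection outputs (boxes, classes, scores, count); the exact ordering depends on the model.
outputs = [interpreter.get_tensor(d['index']) for d in output_details]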
To sum up, the objective is training on the Pascal-VOC formatted dataset. I couldn't import the libraries in my local Python environment, and with Google Colab I have a raw tflite model, but it still needs evaluation and I can't rerun the previous steps due to the time limit. Lastly, I bought Colab Pro but I have spent all my compute units; I don't even know what a compute unit is for. I'm waiting for suggestions. Thank you.
I would like to evaluate a custom-trained Tensorflow object detection model on a new test set using Google Cloud.
I obtained the initial checkpoints from:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
I know that the Tensorflow object-detection API allows me to run training and evaluation simultaneously by using:
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main.py
To start such a job, I submit the following ml-engine job:
gcloud ml-engine jobs submit training [JOBNAME] \
    --runtime-version 1.9 \
    --job-dir=gs://path_to_bucket/model-dir \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --model_dir=gs://path_to_bucket/model_dir \
    --pipeline_config_path=gs://path_to_bucket/data/model.config
However, after I have successfully transfer-trained a model, I would like to calculate performance metrics such as COCO mAP (http://cocodataset.org/#detection-eval) or PASCAL mAP (http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf) on a new test data set which has not been used before (neither during training nor during evaluation).
I have seen that there is a flag in model_main.py:
flags.DEFINE_string(
    'checkpoint_dir', None, 'Path to directory holding a checkpoint.  If '
    '`checkpoint_dir` is provided, this binary operates in eval-only mode, '
    'writing resulting metrics to `model_dir`.')
But I don't know whether this really implies that model_main.py can be run in an evaluation-only mode. If yes, how should I submit the ML Engine job?
Alternatively, are there any functions in the TensorFlow API which allow me to evaluate an existing output dictionary (containing bounding boxes, class labels, scores) based on COCO and/or Pascal mAP? If there are, I could easily read in a TensorFlow record file locally, run inference and then evaluate the output dictionary.
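A hedged sketch of that workflow, assuming the COCO evaluator shipped with the Object Detection API can be driven directly (the category ids, box coordinates and scores below are purely illustrative, and I have not verified the exact field names):

import numpy as np
from object_detection.core import standard_fields
from object_detection.metrics import coco_evaluation

categories = [{'id': 1, 'name': 'class_a'}, {'id': 2, 'name': 'class_b'}]  # mirror your label map
evaluator = coco_evaluation.CocoDetectionEvaluator(categories)

# For each test image: add the ground truth and the model's output dictionary.
evaluator.add_single_ground_truth_image_info(
    image_id='image_0',
    groundtruth_dict={
        standard_fields.InputDataFields.groundtruth_boxes: np.array([[10., 10., 50., 50.]]),
        standard_fields.InputDataFields.groundtruth_classes: np.array([1]),
    })
evaluator.add_single_detected_image_info(
    image_id='image_0',
    detections_dict={
        standard_fields.DetectionResultFields.detection_boxes: np.array([[12., 11., 49., 52.]]),
        standard_fields.DetectionResultFields.detection_scores: np.array([0.9]),
        standard_fields.DetectionResultFields.detection_classes: np.array([1]),
    })

metrics = evaluator.evaluate()  # dictionary of COCO mAP / AR values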
I know how to obtain these metrics for the evaluation data set that is evaluated during training in model_main.py. However, from my understanding I should still report model performance on a new test data set, since I compare multiple models and do some hyper-parameter optimization, and thus I should not report results on the evaluation data set, am I right? On a more general note, I really cannot comprehend why one would switch from separate training and evaluation (as in the legacy code) to a combined training and evaluation script.
Edit:
I found two related posts. However I do not think that the answers provided are complete:
how to check both training/eval performances in tensorflow object_detection
How to evaluate a pretrained model in Tensorflow object detection api
The latter has been written while TF's object detection API still had separate evaluation and training scripts. This is not the case anymore.
Thank you very much for any help.
If you specify the checkpoint_dir and set run_once to be true, then it should run evaluation exactly once on the eval dataset. I believe the metrics will be written to the model_dir and should also appear in your console logs. I usually just run this on my local machine (since it's just doing one pass over the dataset) and it's not a distributed job. Unfortunately I haven't tried running this particular codepath on CMLE.
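Something like the following, reusing the placeholder paths from the question and with the eval input in your pipeline config pointing at the new test set (treat this as a sketch and adjust to your layout):

python object_detection/model_main.py \
    --pipeline_config_path=gs://path_to_bucket/data/model.config \
    --model_dir=gs://path_to_bucket/model_dir \
    --checkpoint_dir=gs://path_to_bucket/model_dir \
    --run_once=True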
Regarding why we have a combined script: from the perspective of the Object Detection API, we were trying to write things in the tf.Estimator paradigm, but you are right that, personally, I found it a bit easier when the two functionalities lived in separate binaries. If you want, you can always wrap this functionality up in another binary :)
I have been training a custom object detector using the TensorFlow Object Detection API (network: SSD MobileNet V1). I have seen screenshots of TensorBoard where the accuracy of the network is shown; however, I get a bunch of metrics displayed, just not accuracy. Are there any specific steps that need to be taken to display the accuracy in TensorBoard?
I am using the updated model_main.py and the following Python command:
python model_main.py \
--pipeline_config_path=training/ssd_mobilenet_v1_coco.config \
--model_dir=training \
--num_train_steps=560000 \
--num_eval_steps=3 \
--alsologtostderr
How long have you been training it for?
It will run for a number of steps before doing the first evaluation. Precision and recall data should show up if you wait a bit longer.
To display precision and recall in TensorBoard, you should run this command after training your model, so that the training folder contains the checkpoints of your trained model:
python model_main_tf2.py \
--pipeline_config_path=training/ssd_mobilenet_v1_coco.config \
--model_dir=training \
--checkpoint_dir=training
This command will generate a folder named eval inside the training folder. To display the results, point TensorBoard at the path to your eval folder (training/eval).
Note: this command is for the TensorFlow 2 Object Detection API.
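For example (assuming the training/eval layout above):

tensorboard --logdir=training/eval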
I'm new to TensorFlow. I am currently working on a project that uses the TF Object Detection API. I'm training a model with two classes on my custom images. So far I have successfully run train.py and eval.py and executed TensorBoard at the same time to see how the training process is progressing.
Here is the image of my work:
How do I display a graph in which I can see the accuracy of the model being developed ?
Any help is appreciated!
You can do this by switching from the deprecated train.py and eval.py to the more recent model_main.py. It interleaves an evaluation schedule into the training session, so that you can evaluate the training progress of the model without doing so manually.
The flags of model_main.py are very similar to those of train.py, and you can see an example here:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md#running-the-training-job
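The invocation looks roughly like this (the paths and step count are placeholders; see the linked page for the exact flags):

python object_detection/model_main.py \
    --pipeline_config_path=path/to/pipeline.config \
    --model_dir=path/to/model_dir \
    --num_train_steps=50000 \
    --sample_1_of_n_eval_examples=1 \
    --alsologtostderr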
I'm looking to run a basic fully-connected neural network for the MNIST dataset with the C++ API v1.2 from Tensorflow. I have trained the model and exported it using tf.train.Saver() in Python. This gave me a checkpoint file, a data file, an index file and a meta file.
I know (from having used TensorBoard on a previous project) that the data file contains the saved variables, while the meta file contains the graph.
However, I am not sure what the recommended way is to load those files and run the trained model in a C++ environment in v1.2, since all the tutorials and questions I've found are for older versions which differ substantially.
I've found that tensorflow::ops::Restore should be the way to do such a thing, but I know that inference in TensorFlow isn't well supported; as such, I am not certain what parameters I should give it in order to obtain the trained model that I can just put into a session->Run() and get an accuracy figure when fed test data.