How to display the accuracy graph on TensorBoard? - python

I'm new to TensorFlow. I am currently working on a project that uses the TF Object Detection API. I'm training a model with two classes on my custom images. So far I have successfully run train.py and eval.py and executed TensorBoard at the same time to see how the training process is progressing.
Here is the image of my work:
How do I display a graph in which I can see the accuracy of the model being developed?
Any help is appreciated!

You can do this by switching from the deprecated train.py and eval.py to the more recent model_main.py. It interleaves an evaluation schedule into the training session, so you can monitor the model's progress on the evaluation set without running a separate evaluation manually.
The flags of model_main.py are very similar to those of train.py, and you can see an example here:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md#running-the-training-job
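For reference, a combined training-plus-evaluation run would look roughly like the following sketch (the paths, step count and eval sampling rate are placeholders; check the linked page for the exact flags of your version):
python object_detection/model_main.py \
    --pipeline_config_path=path/to/pipeline.config \
    --model_dir=path/to/model_dir \
    --num_train_steps=50000 \
    --sample_1_of_n_eval_examples=1 \
    --alsologtostderr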

Related

TensorFlow 2 Object Detection API Model Evaluation

I am using the TensorFlow 2.x object detection API. I have trained a deep learning model from the model zoo on my dataset. I am using Google Colab. After training, I now want to evaluate my model. I am using the COCO detection metrics. I used the following script to evaluate my model:
!python3 model_main_tf2.py \
    --model_dir=path/to/model_directory \
    --pipeline_config_path=path/to/pipeline_config_file \
    --checkpoint_dir=path/to/checkpoint_directory
After running the above code I get the mean average precision (mAP) and average recall (AR) for the latest checkpoint on my test set. But for academic purposes, I want to get these metrics on all the checkpoints to get a graph of how my model has improved over time. Is there a possible way to do that? Or is it possible to train and evaluate at the same time in the TensorFlow 2 object detection API? I am a beginner in this field, so kindly help me out with this issue. Thank you.
I am facing the same problem, so I had an idea. We can run the model_main_tf2.py script you mentioned to evaluate the model, but change which checkpoint gets evaluated by editing the first line of the checkpoint file:
model_checkpoint_path: "ckpt-1"
then
model_checkpoint_path: "ckpt-2"
then
model_checkpoint_path: "ckpt-3"
and so on.
For each checkpoint you will get a .tfevents file. Then open TensorBoard pointing at the directory that contains all the .tfevents files, and you can see how the model improves over time.
I only saved the last 3 checkpoints on my computer, so I can't see the progress from the beginning (my fault), but if you have all the checkpoints, try what I suggest.
See my graph evaluating the last 3 checkpoints.
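A rough sketch of that loop in Python (all paths and checkpoint names below are placeholders, and --eval_timeout, if available in your version, makes each eval run exit instead of waiting for new checkpoints):
import os
import subprocess

model_dir = 'path/to/model_dir'  # contains the ckpt-* files and the checkpoint file

for ckpt in ['ckpt-1', 'ckpt-2', 'ckpt-3']:
    # Point the checkpoint file at the checkpoint to evaluate next.
    with open(os.path.join(model_dir, 'checkpoint'), 'w') as f:
        f.write(f'model_checkpoint_path: "{ckpt}"\n')
    # One eval pass per checkpoint; each run adds a .tfevents file that
    # TensorBoard can pick up from the model directory.
    subprocess.run([
        'python3', 'model_main_tf2.py',
        f'--model_dir={model_dir}',
        '--pipeline_config_path=path/to/pipeline.config',
        f'--checkpoint_dir={model_dir}',
        '--eval_timeout=60',
    ], check=True)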
You should have an eval directory including an events.out.tfevents file under your model directory. You can run !tensorboard --logdir=path/to/eval/directory to access the graphs.
You can run training with the same snippet you have, just without the checkpoint_dir flag, and open another terminal to run evaluation like you're currently doing.

How to run TF object detection API model_main.py in evaluation mode only

I would like to evaluate a custom-trained Tensorflow object detection model on a new test set using Google Cloud.
I obtained the initial checkpoints from:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
I know that the Tensorflow object-detection API allows me to run training and evaluation simultaneously by using:
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main.py
To start such a job, I submit the following ML Engine job:
gcloud ml-engine jobs submit training [JOBNAME] \
    --runtime-version 1.9 \
    --job-dir=gs://path_to_bucket/model-dir \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --model_dir=gs://path_to_bucket/model_dir \
    --pipeline_config_path=gs://path_to_bucket/data/model.config
However, after I have successfully transfer-trained a model, I would like to calculate performance metrics, such as COCO mAP (http://cocodataset.org/#detection-eval) or PASCAL mAP (http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf), on a new test data set which has not been previously used (neither during training nor during evaluation).
I have seen that there is a flag in model_main.py:
flags.DEFINE_string(
    'checkpoint_dir', None, 'Path to directory holding a checkpoint. If '
    '`checkpoint_dir` is provided, this binary operates in eval-only mode, '
    'writing resulting metrics to `model_dir`.')
But I don't know whether this really means that model_main.py can be run in evaluation-only mode. If so, how should I submit the ML Engine job?
Alternatively, are there any functions in the Tensorflow API which allow me to evaluate an existing output dictionary (containing bounding boxes, class labels, scores) based on COCO and/or PASCAL mAP? If there are, I could easily read in a Tensorflow record file locally, run inference and then evaluate the output dictionary.
I know how to obtain these metrics for the evaluation data set that is evaluated during training in model_main.py. However, from my understanding I should still report model performance on a new test data set, since I compare multiple models and do some hyper-parameter optimization, and thus I should not report on the evaluation data set, am I right? On a more general note: I really cannot understand why one would switch from separate training and evaluation (as in the legacy code) to a combined training and evaluation script.
Edit:
I found two related posts. However I do not think that the answers provided are complete:
how to check both training/eval performances in tensorflow object_detection
How to evaluate a pretrained model in Tensorflow object detection api
The latter was written while TF's object detection API still had separate evaluation and training scripts. This is no longer the case.
Thank you very much for any help.
If you specify the checkpoint_dir and set run_once to be true, then it should run evaluation exactly once on the eval dataset. I believe that the metrics will be written to the model_dir and should also appear in your console logs. I usually just run this on my local machine, since it's just doing one pass over the dataset and is not a distributed job. Unfortunately I haven't tried running this particular codepath on CMLE.
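As a sketch (with placeholder paths), an eval-only run of model_main.py would look roughly like this:
python object_detection/model_main.py \
    --pipeline_config_path=path/to/model.config \
    --model_dir=path/to/metrics_output_dir \
    --checkpoint_dir=path/to/trained_model_dir \
    --run_once=True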
Regarding why we have a combined script: from the perspective of the Object Detection API, we were trying to write things in the tf.Estimator paradigm, but you are right that personally I found it a bit easier when the two functionalities lived in separate binaries. If you want, you can always wrap up this functionality in another binary :)

Calculating the mAP from scores obtained in Tensorflow object detection API

I am working on a server where I run the Tensorflow OD API to train a model on my custom dataset. So I divide my images into training, validation and test sets and run train.py on the training and validation sets. Next I run inference on my test images using the exported model checkpoint and frozen graph.
Now, my question is: when I run the inference as provided in the example, I get an output dict with detection scores, number of detections, detection classes, detection masks etc. for each image. From these outputs, how do I calculate the mAP for my test set?
Any guidance in this direction will be really helpful, thanks in advance.
You can use COCO's API for calculating COCO's metrics within the TF OD API. See this.
TF feeds COCO's API with your detections and ground truth, and COCO's API computes COCO's metrics and returns them to TF (so you can display their progress, for example in TensorBoard). mAP@0.5 is probably the most relevant metric (as it is the standard metric used for PASCAL VOC, Open Images, etc.), while mAP@0.5:0.95 is a much more demanding one localization-wise.
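If you would rather compute the metrics yourself from your exported detections, a minimal sketch using pycocotools directly could look like this (the file names are placeholders, and both the ground truth and the detections are assumed to already be in COCO JSON format):
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth annotations in COCO format, and detections as a JSON list of
# {"image_id", "category_id", "bbox": [x, y, w, h], "score"} entries.
coco_gt = COCO('ground_truth.json')
coco_dt = coco_gt.loadRes('detections.json')

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints mAP@[0.5:0.95], mAP@0.5, AR, etc.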

Keras evaluate generator progress

I'm using Keras to train an image classification network (using the standalone version, not the one included in TensorFlow, since it doesn't have Inception ResNet v2 yet).
I'm using evaluate_generator to evaluate the network on about 20,000 images, but it takes a long time to run (a few minutes). Is there a way to output the progress while it's running? I couldn't find anything about this in the documentation or in Google results.

Running tensorflow model training from dataflow

I am playing around with TensorFlow, and today I noticed that Google has also open-sourced the Python SDK for their Dataflow.
Currently, when I need to train and evaluate several networks in parallel, I usually either use Luigi and run one model training after another, or I use Spark and perform each model training within the map step.
All of this data processing is just a part of the pipeline.
I am wondering if there is, or if there are plans for, a way to perform a TensorFlow model training step inside a Dataflow pipeline?
Is there currently some best practice around this?
Or do I have to run each model setting within the map step?
I went through the documentation and for now it seems to be really vague, so I'm asking here if someone has some experience with this.
There is nothing planned at this time.
If you can run the Tensorflow training on a single machine (it sounds like this is what you were doing with Spark) then it should be possible to do the training within a DoFn of a Dataflow pipeline.
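As a rough sketch of that approach (assuming the Apache Beam Python SDK and TensorFlow are installed on the workers; the model, hyper-parameters and data below are placeholders):
import apache_beam as beam
import numpy as np
import tensorflow as tf


class TrainModelFn(beam.DoFn):
    """Trains one small model per input element (a dict of hyper-parameters)."""

    def process(self, config):
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(config['units'], activation='relu',
                                  input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer='adam', loss='mse')
        # Placeholder data; in practice you would load your real training set here.
        x, y = np.random.rand(200, 10), np.random.rand(200, 1)
        history = model.fit(x, y, epochs=config['epochs'], verbose=0)
        yield {'units': config['units'],
               'final_loss': float(history.history['loss'][-1])}


with beam.Pipeline() as pipeline:
    (pipeline
     | beam.Create([{'units': 16, 'epochs': 2}, {'units': 32, 'epochs': 2}])
     | beam.ParDo(TrainModelFn())
     | beam.Map(print))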
