Where is the merged distributed Keras / TensorFlow final model saved? - python

Excuse me, I have a question. I'm working with distributed TensorFlow and Keras, and I succeeded in making a sample deep-learning network train across multiple programs (Python scripts) working together, but I don't know how to save a single final model on one of the hosts; currently each script saves its own model as a separate checkpoint.
Thanks.
Source code I used to develop my program:
Distributed tensorflow / keras code sample on github
For example, in the source code above the final model save path is not set; would you tell me how to set it?
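In case it helps: with tf.distribute multi-worker training, the usual pattern is that only the chief task writes the final model, while the other workers save to throwaway locations. A minimal sketch of detecting the chief from the TF_CONFIG environment variable (the save paths and the model object below are hypothetical, not taken from the linked sample):

```python
import json
import os

def is_chief():
    """Return True if this process should write the final model.

    Every worker runs the same script in multi-worker training; the
    TF_CONFIG environment variable tells each process its role. Only
    the chief (or worker 0 when no explicit chief task exists) should
    save the merged model, so workers don't overwrite each other.
    """
    tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    if not cluster:
        return True  # not distributed: this is the only process
    if "chief" in cluster:
        return task.get("type") == "chief"
    return task.get("type") == "worker" and task.get("index", 0) == 0

# After training, on every worker (saving can be a collective op under
# MultiWorkerMirroredStrategy, so non-chief workers save to scratch):
# if is_chief():
#     model.save("/shared/final_model.h5")        # hypothetical shared path
# else:
#     model.save("/tmp/worker_scratch_model.h5")  # discarded afterwards
```

Note the non-chief workers still call save in the commented pattern: under MultiWorkerMirroredStrategy the save itself participates in collective ops, so skipping it entirely on workers can hang training.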


Tensorflow Object API TF2 not displaying visualizations in Tensorboard

Description
I had a setup for training using the Object Detection API that worked really well; however, I have had to upgrade from TF1.15 to TF2, so instead of using model_main.py I am now using model_main_tf2.py with the MobileNet SSD 320x320 pipeline to transfer-train a new model.
When training my model in TF1.15, it would display a whole heap of scalars as well as detection-box image samples. It was fantastic.
In TF2 training I get no such data, just loss scalars and 3 input images, and yet the event files are huge (gigabytes), whereas they were hundreds of megabytes in TF1.15.
The thing is, there is nowhere to specify what data is presented. I have not changed anything other than which model_main .py file I use to run the training. I added num_visualizations: to the pipeline config file, but no visualizations of detection boxes appear.
Can someone please explain what is going on? I need to be able to see what's happening throughout training!
Thank You
I am training on a PC in a virtual environment before performing TRT optimization on Linux, but I think that is irrelevant here.
Environment
GPU Type: P220
Operating System + Version: Win10 Pro
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2
Relevant Files
TF1.15 vs TF2 screenshots:
TF1 (model_main.py) Tensorboard Results
TF2 (model_main_tf2.py) Tensorboard Results
Steps To Reproduce
The repo I am working with GitHub Object Detection API
Model
Pipeline Config File
UPDATE: I have investigated further and discovered that the TensorBoard settings are set in Object Detection Trainer 1 for TF1.15 and Object Detection Trainer 2 for TF2.
So if someone who knows more about this than I do could work out what the difference is, and what I need to do to get the same results in TensorBoard with v2 as I do with the first one, that would be amazing and save me an enormous headache. It seems that this file, even though it is documented as being for TF2, does not actually follow TF2 syntax, but I could be wrong.
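One likely explanation, based on how the TF2 Object Detection API splits training and evaluation (my understanding, not verified against this exact setup): model_main_tf2.py writes only loss scalars and a few input images during training, and the detection-box visualizations are produced by a separate continuous-evaluation job that watches the checkpoint directory. A sketch of that second process, with placeholder paths:

```
# Run alongside (or after) the training job. Passing --checkpoint_dir
# switches model_main_tf2.py into evaluation mode, which is what writes
# the detection-box image summaries (up to num_visualizations of them)
# to the TensorBoard event files.
python model_main_tf2.py \
  --pipeline_config_path=path/to/pipeline.config \
  --model_dir=path/to/model_dir \
  --checkpoint_dir=path/to/model_dir
```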

Loading keras model into tensorspace

I understand that to visualize my model I have to follow two steps: 1) preprocessing the pre-trained model (let's assume it's called my_model.h5) and 2) creation of the interactive model.
Further, I have created a JSON file of my model as mentioned in the instructions (Model Preprocessing): https://tensorspace.org/html/docs/preKeras.html
I have node.js installed, and I installed tensorspace via npm install tensorspace. However, I'm not able to call the TensorSpace API. Does anyone know if I missed something?

Transfer learning with Faster-RCNN using tensorflow-hub.KerasLayer and tensorflow 2.x

System information
What is the top-level directory of the model you are using: http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.1.0
Bazel version (if compiling from source):
CUDA/cuDNN version: 10.0 / 7.6.5
GPU model and memory: GTX 1060
Exact command to reproduce:
I am trying to do transfer learning using faster_rcnn.
Currently this model is not available through tensorflow-hub, and thus I have to load a legacy module from the address given above.
I am able to load the model, retrieving a tensorflow-hub.KerasLayer object, and to pass data through it.
But now I would like to tune this network on my own dataset, which contains only 2 classes, so I wonder how I can modify the KerasLayer object so that the classification layer outputs just 2 classes instead of 90.
If I am not using the right approach, what do you advise to solve my problem?
I would like to avoid using the tensorflow object detection API.
Also, as mentioned in the documentation, legacy models are not trainable, but I do not think that would be a problem if I add my own layers on top. Am I right?
Here is the code I am using
import tensorflow_hub as hub
import numpy as np
import cv2
# Run this once, then provide the local path where the model was downloaded,
# instead of the below http address
model = hub.KerasLayer(
    "http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz",
    signature="serving_default",
    signature_outputs_as_dict=True,
)
image = np.expand_dims(cv2.imread("/your/image.jpg"), 0).astype(np.uint8)
# I would like this to output only 2 different classes in 'outputs["detection_classes"]'
outputs = model(image)
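Since the legacy hub.KerasLayer is frozen, one common workaround is to treat it as a fixed feature extractor and train only a small new 2-class head on top of its outputs. The detector itself is omitted below; this is a minimal pure-Python sketch of the idea (training a fresh classification head over frozen features), where the feature vectors, their dimension, and the labels are all illustrative stand-ins, not outputs of the real hub module:

```python
import math
import random

random.seed(0)

# Stand-ins for feature vectors pooled from the frozen detector's
# outputs; in practice these would be derived from the KerasLayer.
DIM = 8
data = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    y = 1 if x[0] > 0 else 0  # toy 2-class labels
    data.append((x, y))

# The new trainable head: one weight vector + bias with a sigmoid
# output, i.e. a binary classifier replacing the 90-class layer.
w = [0.0] * DIM
b = 0.0

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_epoch(lr=0.5):
    """One SGD pass over the head only; the backbone stays frozen."""
    global b
    total = 0.0
    for x, y in data:
        p = predict(x)
        total += -(y * math.log(p + 1e-9) + (1 - y) * math.log(1 - p + 1e-9))
        g = p - y                    # gradient of the log loss w.r.t. z
        for i in range(DIM):
            w[i] -= lr * g * x[i]
        b -= lr * g
    return total / len(data)

first_loss = train_epoch()
for _ in range(20):
    last_loss = train_epoch()
```

The point of the sketch: because only w and b are updated, the frozen module being non-trainable is not a blocker, which matches the questioner's intuition about adding their own layer on top.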

Why tensorflow would automatically continue training the model?

I'm running my Python project, which trains a neural network with TensorFlow, in PyCharm, and I find that my network is already well-trained from the second run of the project onward.
I have no command to restore any trained model in my project (I do have commands for saving models). Does anyone know anything about my problem?
Is it possible that tensorflow or pycharm have default settings to save and restore?
Great thanks!
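One known behavior that matches these symptoms: if the training code uses tf.estimator (or any loop pointed at a fixed checkpoint directory), TensorFlow automatically restores the latest checkpoint it finds in model_dir and continues training from there; PyCharm itself saves nothing. A minimal way to guarantee a fresh run is to use a new directory per run (a sketch; the directory names are illustrative):

```python
import os
import time

def fresh_model_dir(base="runs"):
    """Create a unique checkpoint directory so a new run never finds
    (and silently restores) checkpoints left by a previous run."""
    path = os.path.join(base, time.strftime("run_%Y%m%d_%H%M%S"))
    os.makedirs(path, exist_ok=True)
    return path

# e.g. estimator = tf.estimator.Estimator(model_fn=..., model_dir=fresh_model_dir())
```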

Tensorflow with poets

I have a question about why TensorFlow for Poets was not able to classify the image I want. I am using Ubuntu 14.04 with TensorFlow installed using Docker. Here is my story:
After a successful retraining on the flower categories following this link here, I wished to train on my own categories as well. I have 10 classes of images, and they are well organized according to the tutorial. My photos were also stored in the tf_files directory, and following the guide I retrained the Inception model on my categories.
Everything in the retraining went well. However, when I tried to classify the image I want, I was unable to do so and got this error. I also tried to look for the .py file in /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py, but my dist-packages were empty! Can someone help me with this? Where can I find the .py files? Thank you!
Your error indicates that it is unable to locate the file. I would suggest executing the following command in the directory where you have the graph and label files:
python label_image.py exact_path_to_your_testimage_file.jpg
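Since the suggestion above comes down to a path problem, a quick way to confirm which file is actually missing before digging into TensorFlow internals is a small check like this (the default graph/label file names here are assumptions based on the tutorial's conventions):

```python
import os

def check_paths(image_path,
                graph_path="retrained_graph.pb",
                labels_path="retrained_labels.txt"):
    """Report which of the files label_image.py needs are not
    reachable from the current working directory."""
    missing = [p for p in (image_path, graph_path, labels_path)
               if not os.path.isfile(p)]
    return missing  # an empty list means all three files were found

# e.g. check_paths("exact_path_to_your_testimage_file.jpg")
```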
