Does Tensorflow use only one hot encoding to store labels? - python

I have just started working with TensorFlow. With Caffe, reading in the data was super practical, but with TensorFlow I see that I have to write the data-loading process myself: creating TFRecords, batching, multiple threads, handling those threads, etc. So I started with the Inception v3 example, since it already handles the part that reads in the data. I am new to TensorFlow and relatively new to Python, so I feel like I don't fully understand what this part is doing. (I mean, yes, it extends the labels list by label_index repeated once per file, but why? Is it creating a one-hot encoding for the labels? Do we have to? Why doesn't it just append one label per file, since each file has a label?) Thanks.
labels.extend([label_index] * len(filenames))
texts.extend([text] * len(filenames))
filenames.extend(filenames)
The whole code is here: https://github.com/tensorflow/models/tree/master/research/inception
The part mentioned is in data/build_image_data.py, which builds an image dataset from an existing dataset of images stored in folders (where the folder name is the label): https://github.com/tensorflow/models/blob/master/research/inception/inception/data/build_image_data.py

Putting together what we discussed in the comments:
You have to one-hot encode because the network architecture requires it, not because TensorFlow demands it. The network is an N-class classifier, so the final layer has one neuron per class, and you train the network to activate the neuron matching the class the sample belongs to. One-hot encoding the label is the first step in doing this.
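To make that concrete, here is a minimal sketch (the label values below are made up) of what one-hot encoding the sparse integer labels looks like in TensorFlow; note that in practice you can also keep sparse labels and use tf.nn.sparse_softmax_cross_entropy_with_logits, which does the equivalent work for you:

import tensorflow as tf

# Sparse integer labels, one per image, as produced by _find_image_files.
labels = tf.constant([0, 2, 1, 2])   # hypothetical batch of 4 samples, classes 0..2
num_classes = 3                      # N-class classifier

# One-hot encode so each label lines up with the N output neurons of the final layer.
one_hot_labels = tf.one_hot(labels, depth=num_classes)
# label 2 -> [0., 0., 1.], label 0 -> [1., 0., 0.], etc.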
About the human-readable labels: the code you're referring to is located in the _find_image_files function, which in turn is used by _process_dataset to transform the dataset from a set of folders into a set of TFRecord files, a convenient input format for TensorFlow.
The human-readable label string is included as a feature in the Examples inside the TFRecord files as an 'extra' (probably to simplify visualization of intermediate results during training); it is not strictly necessary for the dataset and is not used in any way in the actual optimization of the network's parameters.
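For illustration, a stripped-down version of the Example that ends up in each TFRecord might look roughly like this (the feature keys follow _convert_to_example in build_image_data.py as far as I recall; the values are placeholders, so check that file for the exact set):

import tensorflow as tf

label_index = 7                     # integer label used for training
text = b'daisy'                     # human-readable label (the folder name)
encoded_jpeg = b'...jpeg bytes...'  # placeholder for the encoded image

example = tf.train.Example(features=tf.train.Features(feature={
    'image/class/label': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[label_index])),
    'image/class/text': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[text])),
    'image/encoded': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
}))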

Related

TFLite model maker custom object detector training using tfrecord

I am trying to train a custom object detector using tflite model maker (https://www.tensorflow.org/lite/tutorials/model_maker_object_detection). I want to deploy the trained tflite model to a Coral Edge TPU, and I want to use TensorFlow TFRecords (multiple) as input for training the model, like the Object Detection API does. I tried
tflite_model_maker.object_detector.DataLoader(
    tfrecord_file_patten, size, label_map, annotations_json_file=None
)
but I was not able to get it to work. I have the following questions:
Is it possible to use a TFRecord for training, as mentioned above?
Is it also possible to pass multiple CSV files for training?
For multiple CSV files, you could probably just append one file to the other; then you'd only have to pass one CSV file.
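A quick sketch of that "append the CSVs" idea using pandas (the file names are placeholders, and this assumes the AutoML-style annotation CSV without a header row; adjust if yours differs):

import pandas as pd

parts = ['train_part1.csv', 'train_part2.csv']   # hypothetical file names
merged = pd.concat([pd.read_csv(p, header=None) for p in parts],
                   ignore_index=True)
merged.to_csv('train_merged.csv', index=False, header=False)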
As for passing a TFRecord instead, this should be possible. I'm also attempting to do this, so if I get it working I'll update my post. Looking at the source, it seems from_cache is the function used internally. Following that structure, you should be able to create a DataLoader object similarly:
train_data = DataLoader(tfrecord_file_patten, meta_data['size'],
                        meta_data['label_map'], ann_json_file)
In this case, tfrecord_file_patten should be a tfrecord of your training data. You can construct the validation and test data the same way. This will work provided you're constructing your TFRecords correctly. There appears to be some inconsistency in how it's done in different places, so make sure you follow the same structure in creating the TFRecords as is found in the Model Maker source. This worked for me. One specific thing to watch out for is to use an integer for the 'image/source_id' feature in your TFExamples; if you use a string it'll throw an error.
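Regarding that 'image/source_id' point, here is a hedged sketch of how I would write the feature when building TFExamples by hand. The important part, as noted above, is that the id is numeric rather than an arbitrary string such as a filename; depending on the decoder version it may still be stored as a string-typed feature holding a number, so verify against the Model Maker source you are working with:

import tensorflow as tf

source_id = 42   # hypothetical numeric id for this image

feature = {
    # Numeric id; a non-numeric string (e.g. the filename) is what reportedly
    # triggers the error mentioned above. Some decoders expect a string-typed
    # feature containing a number, so check your version.
    'image/source_id': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[str(source_id).encode('utf-8')])),
    # ... plus the usual encoded-image / bbox / class features.
}
example = tf.train.Example(features=tf.train.Features(feature=feature))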

Saving custom variables in Keras .h5 file

I'm developing an RNN for a project, and I need to train it on one computer and be able to predict on another. The solution I found is to save the model into a .h5 file using the code below:
... # Train the data etc....
model.save("model.h5")
My problem is that I need to store some metadata about my training dataset and pre-processing, and be able to load it together with the model (e.g. name of the dataset file, size of the dataset file, number of characters, etc.).
I don't want to store this information in a second file (e.g. a .txt file), because then I would have to manage two files, and I don't want to use any additional library or framework for this task.
I was thinking (brainstorming) of code like this:
model.save("model.h5", metaData={'myVariableName': myVariable})
And to load would be:
myVariable = model.load("model.h5").getMetaData('myVariableName')
I know this is not possible in the current version, and I have already read the Keras docs, but I couldn't find any efficient method to do it. Notice that what I'm asking is different from custom_objects, because I want to save and load my own variables.
Is there a smarter approach to solve this problem?
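One possible direction, sketched here under the assumption that the saved .h5 file is an ordinary HDF5 file (this is not an official Keras API, but h5py is already required by Keras for .h5 saving, so it isn't really an extra dependency): attach the metadata as HDF5 attributes after model.save() and read them back on the other machine.

import h5py

# After model.save("model.h5") on the training machine (values are examples):
with h5py.File("model.h5", mode="a") as f:
    f.attrs["dataset_file"] = "my_dataset.txt"
    f.attrs["num_characters"] = 87

# On the prediction machine, alongside load_model("model.h5"):
with h5py.File("model.h5", mode="r") as f:
    dataset_file = f.attrs["dataset_file"]
    num_characters = int(f.attrs["num_characters"])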

splitting tfrecords dataset with multiple features

I have an image classification task where I've created multiple crops of each image as well as flipped/flopped versions to extend my limited dataset. I have written the dataset to a tfrecords file where each record consists of (simplified here to two crops and only a flipped version):
{
    lbl: int,
    crop_0: np.ndarray,
    crop_1: np.ndarray,
    crop_0_flipped: np.ndarray,
    crop_1_flipped: np.ndarray
}
Basically 4 images / entry. During training, I'd like to treat each image as separate, i.e. feed each record as 4 images with the same label, shuffled with the rest of the images in the dataset, so that N images becomes 4N images. During testing (using a separate but similarly structured dataset), I'd like to take each image, only use the crop_0 and crop_1 images and average the softmax outputs for classification.
My question is: what is the best and most efficient way of training with such a dataset? I'm willing to change my approach if that will make training more efficient. It seems the simplest thing to do would have been to have separate tfrecords files for each version (crop & flip/flop) and interleave the files into one dataset, but I don't want to have a whole bunch of files to deal with if I can help it.
Writing the dataset to disk with 4N images is an approach you'll come to loathe later (I did it this way originally and loathe that code now). The better way is to keep your original dataset on disk as-is and not write your preprocessing steps to disk; do that kind of preprocessing on the CPU while you train. The TensorFlow Dataset preprocessing pipeline makes this easy and modular, and provides the parallelization you need to take advantage of multiple cores at no extra coding expense.
This is the main guide:
https://www.tensorflow.org/programmers_guide/datasets
Your approach should be to create 2 Dataset objects, one for train and one for test. The train Dataset pipeline will perform all the data augmentation you mentioned. The test Dataset pipeline will not, naturally.
One key to understanding this approach is that you will not feed the data to tensorflow using feed_dict; instead, tensorflow will invoke the Dataset pipeline to pull the data it needs for each batch.
To get parallelization you'll use the Dataset.map function to apply some set of transformations, and use the num_parallel_calls argument to distribute the operations across multiple cores. If your preprocessing can be done in tensorflow code, great; if not, you'll need to use tf.py_func to wrap your python preprocessing code.
The guide I linked to above describes all of this very well. You will want to use a feedable iterator, described in the section called "Creating an iterator". This will allow you to get a string handle from each of the 2 datasets (train and test) and pass that string to tensorflow via feed_dict, indicating which of the two datasets tensorflow should pull samples from.
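As a rough sketch of that pipeline (TF 1.x-era API to match the linked guide; the feature names, crop size, and filenames are assumptions, and the random crop/flip stands in for whatever augmentation you actually want), assuming you keep one original image per record on disk as suggested:

import tensorflow as tf

def _parse(serialized):
    # Feature names assumed; adapt to however you wrote the records.
    feats = tf.parse_single_example(serialized, {
        'image': tf.FixedLenFeature([], tf.string),
        'lbl': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.image.decode_jpeg(feats['image'], channels=3)
    return image, feats['lbl']

def _augment(image, label):
    # Generate the crop/flip variants on the fly instead of storing 4N images.
    image = tf.random_crop(image, size=[224, 224, 3])   # crop size assumed
    image = tf.image.random_flip_left_right(image)
    return image, label

train_dataset = (tf.data.TFRecordDataset(['train.tfrecords'])
                 .map(_parse, num_parallel_calls=4)
                 .map(_augment, num_parallel_calls=4)
                 .shuffle(1000)
                 .batch(32))

# Test pipeline: same parsing, no augmentation.
test_dataset = (tf.data.TFRecordDataset(['test.tfrecords'])
                .map(_parse, num_parallel_calls=4)
                .batch(32))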

How to create my own dataset to train/test a convolutional neural network

So here is my question:
I want to make my very own dataset using a motion capture camera system to get the ground truth poses and one RGB camera to get images, and then use this as input to my network to train/test a ConvNet.
I have looked around at other datasets for tensorflow, caffe and Matlab. I have viewed the MNIST, Cats/Dogs, Iris, LSP, HumanEva, HumanEva3.6, FLIC, etc. datasets and have viewed and tried to understand their data as best as I can. I have viewed online people trying to make their own datasets. The one thing is usually when you use their datasets as an example, you download a .txt file that already contains the labels.
If anyone could please explain to me how to use the image data with the labels to feed into my network, it would be a tremendous help. I have written code before using tensorflow that inputs a .txt file into the network and gets the correct predicted output, but my brain is missing something about how to input an image with a label. How do I create that dataset?
Your input images and your labels are two separate variables. You will be writing separate bits of code to import them. The videos typically need to be converted to JPG files (it's a royal pain to read video files directly, mostly because you can't randomly skip around the video easily).
Probably the easiest way to structure your data is via a CSV that contains filename, poseinfoA, poseinfoB, etc., where the filename refers to the JPG image on disk.
To get started on the basics, I suggest looking at the aymericdamien tutorial examples; I haven't found tutorials anywhere else that are as clear and concise.
https://github.com/aymericdamien/TensorFlow-Examples
Those examples don't go into detail on the data input pipeline, though. To set up a good data input pipeline in tensorflow, I suggest you use the new (as of TF 1.4) Dataset object. It will force you into a good data input pipeline workflow, and it's the way all data input is going in tensorflow, so it's worth learning. It's also easy to test and debug when you write it this way. Here's the guide you want to follow.
https://www.tensorflow.org/programmers_guide/datasets
You can start your Dataset object from the CSV and use dataset.map() to load the images with tf.image.decode_jpeg.
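A small sketch of that idea (TF 1.x-era API; the CSV layout "filename,poseA,poseB" and the file name are assumptions, so adjust record_defaults to your actual columns):

import tensorflow as tf

def _parse_line(line):
    # One CSV row: filename, poseA, poseB (record_defaults set the column types).
    filename, pose_a, pose_b = tf.decode_csv(
        line, record_defaults=[[''], [0.0], [0.0]])
    image = tf.image.decode_jpeg(tf.read_file(filename), channels=3)
    label = tf.stack([pose_a, pose_b])
    return image, label

dataset = (tf.data.TextLineDataset('poses.csv')   # file name assumed
           .map(_parse_line, num_parallel_calls=4)
           .shuffle(500)
           .batch(16))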
Since you're doing pose estimation I'll also suggest a nice blog I came across recently that will probably interest you. The topic is segmentation, but pose estimation is quite related.
http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review

Tensorflow Object Detection API multi-class error

I am creating an 11-class object detector using the Faster R-CNN model, set up with a maximum size of 300x400 in the image-resizer tag. This is because a CUDA OOM error pops up if I go any higher, as the GPU is a 1050 Ti with 4 GB, so I have approximately 3800-3900 MB of run-time training memory for the model.
I have followed erishima's steps and adapted them using the Pets scripts and Dati Tran's to generate the TFRecord files.
The steps were as follows:
Create the labels for the categories using labelImg.
Use the name field in labelImg to annotate the class of the image file.
Create a CSV file and extract the filename, class, xmin, ymin, xmax, ymax from the XML file. (Custom Script)
Create a train and test/eval CSV from the main CSV file.
Generate the TFRecord files to be inputted into the config file. Train and Test.(Dati Tran's script modified to suit needs)
Modify faster_rcnn_config without touching the hyper-parameters.
Created a label_map.pbtxt file which corresponded to the names of the classes. Started from 1 as stated in many other answers related to this topic.
Started training the model via the stated method.
The dataset for the classes is custom, and the number of images per class varies from 300 to 2500. The dataset has no annotations for the orientation of the object or the difficulty of detection in the image, even though every possible angle of the object is present in those images.
The problem which arose after I had trained to a loss value of .002 over 217k steps was that a single class was enveloping the objects of all the other classes, whether I ran the detector on a video or on images. I have not tried to run the eval.py script, as that takes too long on this setup, so I can't really see the mAP for the classes, but I would assume it would be redundant information anyway, as the problem should be in the dataset preparation method or in the dataset itself.
When retrained from scratch for 60k steps, the problem persisted, but with another class enveloping all the others.
The warnings shown were:
The Sparse Index Tensor is going to take a lot of memory. Can I change the code so that this doesn't pop up and possibly save some precious memory?
Wanted [x,?,?,y], got [x,y,z,a,b] instead. This one stops the training. I got it twice in the training up to 217k steps. I have no idea where it originates; probably the dataset.
If someone can show me even a hint of the proper fix for this, I would highly appreciate it.
I believe you have class imbalance; I had a similar problem in the past.
Do an analysis of your dataset and make sure the number of images per class is of a similar order of magnitude.
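A quick way to sanity-check that (the CSV file name is a placeholder, and the 'class' column name is assumed from the filename, class, xmin, ymin, xmax, ymax layout described above):

import pandas as pd

df = pd.read_csv('train_labels.csv')   # the main CSV created earlier
print(df['class'].value_counts())      # annotation rows per class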
