How to create Yolo model from train and test images?

How to create Yolo model from train and test images? - python

I have a dataset of images that have two folders: test and training. I need to do object detection using OpenCV and Yolo.
Thus, I need to create my own Yolo model for the street objects.
For the training folder:
training
Example training image:
training image
For the test folder:
test
I have the classes txt file which includes id, name and classification (warning, indication and mandatory).
Example:
0 = animal crossing (warning)
1 = soft verges (warning)
2 = road narrows (warning)
Here, the numbers are the numbers (or ids) in the training folder, names, and classification.
My purpose is to create a Yolo model from these training images. I have checked some papers and articles, but in their case, they label the full image using labelimg, but in my case training images are so small and they don't need any labeling.
Thus, I'm confused about how to do this. Could you please give me some ideas?

Labeling images is a must in YOLO's that's how they deal with their loss functions. To detect objects something called (intersection over union )
More easy way to label images is by using (roboflow site ).

I would refer to this image that describes the different types of computer vision tasks.
I think what you want to do is a Classification tasks. Yolo is for Object Detection tasks, where you usually want to detect more than one object per image.
For classification tasks, it can be easier because you don't need to make separate label files. The names of the folders are the labels. Here is an example of a classification model that you can use https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
If you really want to use Yolo you will need to make label files. If you are going to do Classification of the whole image then the format of the annotation will be easy. It would be something like this.
`0 0.5 0.5 1 1' The first column is the class number: 0,1,2,3 etc. You will need to make one file for each image with the name .txt.
Does this help you?

Related

Keras "flow_from_directory", sequence of images does match with the file names

I am new to machine learning and want to ask a question about the "flow_from_directory" function in Keras.
I have trained an image recognition model with ResNet50, and now I want to predict the test images with this model. There are 5 classes of images, which are "daisy", "dandelion", "rose", "sunflower", and "tulip", and their label are corresponding to [10000],[01000],[00100],[00010],and[00001] respectively.
Attach is part of my code to read and predict the test images:
The variable "filenames" is a list of the test images, and the variable "y_act" should be the actual labels of the test images.
However, I found the sequence of "filenames" doesn't match with the "y_act", see the two attached images:
I want to make the "filenames" and the "y_act" in the same sequence, does anyone knows how do realize this? Thanks a lot in advance

General object recognition with biggest number of classes

I'm new to the computer vision world, I'm trying to create a script with the objective to gather data from a dataset of images.
I'm interested in what kind of objects are in those images and getting a summary of them in a json file for every image.
I've checked out some YOLO implementations but the ones I've seen are almost always based on COCO and have 80 classes or have a custom dataset.
I've seen that there are algorithms like InceptionV3 etc. which are capable of classifying 1000 classes. But per my understanding object classification is different from object recognition.
Is there a way to use those big dataset classification algos for object detection?
Or any other suggestion?

Unfortunately, I do not know where the breaking point is, and of course, it will depend on acceptable evaluation metrics and training data size.
From a technical point of view, there is no hard limit and if you go to extremes there could be Core ML model size issues and memory issues during inferences. However, that will only happen for an extremely large number of classes.
From a modeling perspective (which is a problem that will happen much earlier than the technical limitation) it is not as clear. As you increase the number of classes, you increase the risk of making classification mistakes. Although, the severity of a lot of the mistakes should simultaneously go down as you will have more and more classes that are naturally similar (breeds of dogs, etc.). The original YOLO9000 paper (https://arxiv.org/pdf/1612.08242.pdf) trained a model using 9000+ classes with reasonable results (lots of mistakes of course, but still impressive). They trained it on a combination of detection and classification data, so if they actually had detection data for all 9000, then results would presumably be even better.
In your experiment, it sounds like 50-60 was OK (thanks for giving us a sample point!). Anything below 100 is definitely tried and true, as long as you have the data. However, will 300 do OK? Will 1000 do OK? Theoretically, I would say yes, if you are able to provide enough training data and you adjust your expectation of what a good evaluation metric is since you know you'll make more mistakes. For instance, for classification with 1000 classes, it is common to report top-5 accuracy (that is, the correct label is in your top-5 classes for a sample).
Here is a useful link - https://github.com/apple/turicreate/issues/968

First, to level set on terminology.
Image Classification based neural networks, such as Inception and Resnet, classify an entire image based upon the classes the network was trained on. So if the image has a dog, then the classifier will most likely return the class dog with a higher confidence score as compared to the other classes the network was trained on. To train a network such as this, it's simple enough to group the same class images (all images with a dog) into folders as inputs. ImageNet and Pascal VOC are examples of public labeled datasets for Image Classification.
Object Detection based neural networks on the other hand, such as SSD and Yolo, will return a set of coordinates that indicate a bounding box and confident score for each class (object) that is detected based upon what the network was trained with. To train a network such as this, each object in an image much as annotated with a set of coordinates that correspond to the bounding boxes of the class (object). The COCO dataset, for example, is an annotated dataset of 80 classes (objects) with coordinates corresponding to the bounding box around each object. Another popular dataset is Object365 that contains 365 classes.
Another important type of neural network that the COCO dataset provides annotations for is Instance Segmentation models, such as Mask RCNN. These models provide pixel-level classification and are extremely compute-intensive, but critical for use cases such as self-driving cars. If you search for Detectron2 tutorials, you will find several great learning examples of training a Mask RCNN network on the COCO dataset.
So, to answer your question, Yes, you can use the COCO dataset (amongst many other options available publicly on the web) for object detection, or, you can also create your own dataset with a little effort by annotating your own dataset with bounding boxes around the object classes you want to train. Try Googling - 'using coco to train ssd model' to get some easy-to-follow tutorials. SSD stands for single-shot detector and is an alternative neural network architecture to Yolo.

Semantic segmentation dataset organization

I am trying to segment 4 lesions with semantic segmentation. I follow this
this great post
My training folder has only 2 subfolders with patches: masks and images. Inside the folder with masks, ALL the classes are mixed. The other folder has the corresponding images. So, when I train the model ,it appears: ONE CLASS FOUND, just following the abovementioned post. The results are disappointing and I am wondering if I have to split the classes in the folders, and thus the model recognizes 4 classes instead of the one.

What your really need to be attentive at is the way in which the masks are created.
It is possible that by default the ImageDataGenerator in Keras to output the number of folders, regardless of how you manually build and adapt the ImageDataGenerator for image segmentation instead of image classification.
My suggestion is to follow the entire post along and change nothing in the first instance. If you pay attention the final results obtained are quite good; this means that the dataset preparation process (mask creation) is correct.

Unable to improve the mask RCNN model for document images?

I am training a model to extract all the necessary fields from a resume for which I am using mask rcnn to detect the fields in image. I have trained my mask RCNN model for 1000 training samples with 49 fields to extract. I am unable to improve the accuracy. How to improve the model? Is there any pretrained weights that may help?
Difficulty in reading following text -

Looks like you want to do text classification/processing, you need to extract details from the text but you are applying object detection algorithms. I believe you need to use OCR to extract text (if you have cv as an image) and use the text classification model. Check out the below links more information about text classification -
https://medium.com/#armandj.olivares/a-basic-nlp-tutorial-for-news-multiclass-categorization-82afa6d46aa5
https://www.tensorflow.org/tutorials/tensorflow_text/intro

You can break up the problem two different ways:
Step 1- OCR seems to be the most direct way to get to your data. But increase the image size, thus resolution, otherwise, you may lose data.
Step 2- Store the coordinates of each OCRed word. This is valuable information in this context. How words line up have significance.
Step 3- At this point you can try to use basic positional clustering to group words. However, this can easily fail on a columnar vs row-based distribution of related text.
Step 4- See if you can identify which of 49 tags these clusters belong to.
Look at text classification for Hidden Markov models, Baum-Welch Algorithms. i.e. Go for basic models first.
OR
The above ignores the inherent classification opportunity that is the image of a, well, a properly formatted cv.
Step 1- Train your model to partition the image into sections without OCR. A good model should not break up the sentences, tables etc. This approach may leverage separators lines etc. There is also opportunity to decrease the size of your image since you are not OCRing yet.
Step 2 -OCR image sections and try to classify similar to above.

Another option is to use the neural networks like - PixelLink: Detecting Scene Text via Instance Segmentation
https://arxiv.org/pdf/1801.01315.pdf

Is Dataset Organization for Image Classification Necessary?

I'm currently working on a program that can do binary image classification with machine learning. I have a list of labels and a list of images that i'm using as inputs which are then fed into the Inception V3 model.
Will inputting of the dataset this way work with the inception V3 architecture? Is it necessary to organize the images with labeled folders before feeding it into the model?
Thanks for your help!

In your example, you have all the images in memory. You can simply call model.fit(trainX, trainY) to train your model. No need to organize the images in specific folder structures.
What you are referring to, is the flow_from_directory() method of the ImageDataGenerator. This is an object that will yield images from the directories, and automatically infer the labels from the folder structure. In this case, your images should be arranged in one folder per label. Since the ImageDataGenerator is a generator, you should use it in combination with model.fit_generator().
As a third option, you can write your own custom generator that yields both images and labels. This is advised in case you have a more complex label structure than one label per images; for instance in multi-label classification, object detection or semantic segmentation, where the outputs are also images. A custom generator should also be used with model.fit_generator().

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.