I am trying to segment 4 types of lesions with semantic segmentation, following this great post.
My training folder has only 2 subfolders with patches: masks and images. Inside the masks folder, ALL the classes are mixed together; the other folder has the corresponding images. So, when I train the model, it reports: ONE CLASS FOUND, just as in the above-mentioned post. The results are disappointing, and I am wondering whether I have to split the classes into separate folders so that the model recognizes 4 classes instead of one.
What you really need to pay attention to is the way the masks are created.
It is possible that, by default, the ImageDataGenerator in Keras reports the number of subfolders it finds as the number of classes, regardless of how you manually build and adapt the ImageDataGenerator for image segmentation instead of image classification.
My suggestion is to follow the entire post and change nothing at first. If you pay attention, the final results obtained there are quite good, which means the dataset preparation process (mask creation) is correct.
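For reference, here is a minimal sketch of how the image and mask generators are usually paired for segmentation in Keras, assuming a layout of `train/images` and `train/masks` as you describe (the target size and batch size are placeholders). The "one class found" message only reflects the single subfolder each generator sees and is expected with this setup; the four lesion classes live in the mask pixel values, not in the folder structure.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: train/images/<patches> and train/masks/<patches>.
# class_mode=None makes the generators yield raw arrays instead of
# inferring class labels from the folder structure.
seed = 42
img_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "train", classes=["images"], class_mode=None,
    target_size=(256, 256), color_mode="rgb", batch_size=16, seed=seed)
mask_gen = ImageDataGenerator().flow_from_directory(
    "train", classes=["masks"], class_mode=None,
    target_size=(256, 256), color_mode="grayscale", batch_size=16, seed=seed)

# Each mask pixel should hold an integer class id (e.g. 0 for background,
# 1-4 for the lesion types); the model output then needs that many channels.
train_gen = zip(img_gen, mask_gen)
# model.fit(train_gen, steps_per_epoch=len(img_gen), epochs=...)
```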
I have a dataset of images split into two folders: test and training. I need to do object detection using OpenCV and YOLO.
Thus, I need to create my own YOLO model for the street objects.
For the training folder: (screenshot of the training folder contents)
Example training image: (sample image)
For the test folder: (screenshot of the test folder contents)
I have the classes txt file which includes id, name and classification (warning, indication and mandatory).
Example:
0 = animal crossing (warning)
1 = soft verges (warning)
2 = road narrows (warning)
Here, each line gives the id (which matches the folder number in the training folder), the sign name, and its classification.
My purpose is to create a YOLO model from these training images. I have checked some papers and articles, but in their case they label the full image using LabelImg; in my case the training images are small crops, so they don't need any labeling.
Thus, I'm confused about how to do this. Could you please give me some ideas?
Labeling images is a must for YOLO; that is how its loss function works. To decide whether an object is detected correctly it uses a measure called intersection over union (IoU).
An easier way to label images is to use the Roboflow site.
I would refer to this image that describes the different types of computer vision tasks.
I think what you want to do is a classification task. YOLO is for object detection tasks, where you usually want to detect more than one object per image.
For classification tasks it can be easier, because you don't need to make separate label files: the names of the folders are the labels. Here is an example of a classification model that you can use: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
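For illustration, a rough sketch of the folder-per-class approach in the spirit of the linked transfer-learning tutorial (the folder name `training`, the image size and the optimizer settings are assumptions, not taken from your dataset):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Folder-per-class layout: training/<class_id>/*.png
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("training", transform=transform)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Pretrained backbone with a new head sized to the number of sign classes.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(5):
    for images, labels in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```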
If you really want to use YOLO, you will need to make label files. If you are going to classify the whole image, the annotation format is easy. It would be something like this:
`0 0.5 0.5 1 1` — the first column is the class number (0, 1, 2, 3, etc.), and the remaining values describe a box covering the whole image (center x, center y, width, height, all relative to the image size). You need to make one .txt file for each image, with the same base name as the image.
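A small sketch of how such whole-image label files could be generated, assuming the training folder has one numeric subfolder per class as described (the folder names, image extensions and id mapping are assumptions):

```python
import os

root = "training"  # assumed layout: training/<class_id>/<image files>
for class_dir in sorted(os.listdir(root)):
    class_path = os.path.join(root, class_dir)
    if not os.path.isdir(class_path):
        continue
    class_id = int(class_dir)  # assumes the folder name is the numeric id
    for fname in os.listdir(class_path):
        if not fname.lower().endswith((".jpg", ".jpeg", ".png", ".ppm")):
            continue
        label_path = os.path.join(class_path, os.path.splitext(fname)[0] + ".txt")
        with open(label_path, "w") as f:
            # class cx cy w h -- a single box covering the whole image
            f.write(f"{class_id} 0.5 0.5 1 1\n")
```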
Does this help you?
I have trained my CNN in TensorFlow using the MNIST data set; when I tested it, it worked very well on the test data. To check my model further, I built another set by randomly taking images from the train and test sets, deleting them from those sets at the same time so they were never given to my model during training. It worked very well on that set too, but with an image downloaded from Google it doesn't classify well, so my question is: do I have to apply any filter to that image before I give it to the prediction part?
I resized the image and converted it to gray scale beforehand.
MNIST is an easy dataset. Your model (CNN) structure may do quite well on MNIST, but there is no guarantee that it will do well on more complex images too. You can add some more layers and try different activation functions (like ReLU, ELU, etc.). Normalizing your image pixel values to a small range, for example between -1 and 1, may help too.
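On the preprocessing question itself: the network expects inputs that look like MNIST digits, i.e. 28x28, grayscale, a light digit on a dark background, scaled the same way as the training data. A rough sketch of that preprocessing (the file name and the exact scaling are assumptions; match whatever normalization you used during training):

```python
import numpy as np
from PIL import Image

img = Image.open("downloaded_digit.png").convert("L")  # grayscale
img = img.resize((28, 28))

arr = np.asarray(img, dtype=np.float32)
# MNIST digits are white on a black background; many downloaded images are
# the opposite, so invert when the background looks bright.
if arr.mean() > 127:
    arr = 255.0 - arr

arr = arr / 255.0                # match the scaling used during training
arr = arr.reshape(1, 28, 28, 1)  # batch of one, single channel
# prediction = model.predict(arr)
```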
I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching the net, I've found several webpages with Keras code using customized layers for custom object classification. In this case, I've understood that you need to provide images where your object appears alone. Although the training is good and it classifies them properly, unfortunately I haven't found how to later load this trained network to find objects in my big images.
On the other side, I have found that I can do this with the dnn module in OpenCV if I load the weights from the YOLOv3 network. In this case I provide the big images with the proper annotations, but the network is not well trained...
Given this context, could someone show me how to load weights trained with a customized network, and how to train that network?
After a lot of searching, I've found a better approach:
Cut your images into subimages (I cut mine into 2 rows and 4 columns; a small sketch of this tiling is shown after these steps).
Feed YOLO with these subimages and their proper annotations. I used YOLOv3-tiny, with a size of 960x960, for 10k steps. In my case intensity and color were important, so the random augmentation parameters hue, saturation and exposure were kept at 0. Use random angles. If your objects do not change in size, disable random in the [yolo] layers (random=0 in the cfg files; it only controls whether the input size is changed randomly during training at every step). For this I'm using Alexey's darknet fork. If you have blurred objects, add blur=1 in the [net] section of the cfg file (after hue). For blur you need Alexey's fork compiled with OpenCV (apart from CUDA if you can).
Calculate anchors with Alexey's fork. cluster_num is the number of anchor pairs you use; you can find it by opening your cfg and looking at any anchors= line. Anchors are the sizes of the boxes that darknet will use as starting points to predict positions. cluster_num = number of anchor pairs.
Change the cfg with your new anchors. If you have fixed-size objects, the anchors will be very close in size. I left the ones for the bigger objects (first [yolo] layer), but for the second, the smaller ones, I modified them and even removed one pair. If you remove some, change the order in mask= in all [yolo] sections; mask refers to the indices of the anchors, starting at index 0. If you remove some, also change num= inside the [yolo] sections.
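Here is the tiling sketch mentioned above (the 2x4 grid and output naming are just illustrative; adapt them to your images, and remember the box annotations have to be shifted and clipped to each tile as well):

```python
import os
from PIL import Image

def tile_image(path, rows=2, cols=4, out_dir="tiles"):
    """Cut one large image into rows * cols subimages."""
    os.makedirs(out_dir, exist_ok=True)
    img = Image.open(path)
    w, h = img.size
    tile_w, tile_h = w // cols, h // rows
    base = os.path.splitext(os.path.basename(path))[0]
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            img.crop(box).save(os.path.join(out_dir, f"{base}_{r}_{c}.jpg"))

tile_image("big_image.jpg")
```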
After that, detection is quite good. It can happen that if you detect on a video, some objects are lost in a few frames. You can try to avoid this by using the LSTM cfg: https://github.com/AlexeyAB/darknet/issues/3114
Now, if you also want to track them, you can apply a Deep SORT algorithm on top of your pretrained YOLO network. For example, you can convert your pretrained network to Keras using https://github.com/allanzelener/YAD2K (add this commit for tiny YOLOv3: https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3
As an alternative, you can train it with Mask R-CNN or any other Faster R-CNN based algorithm and then look for Deep SORT.
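Since the question also asked how to load the trained weights for inference, here is a minimal sketch using OpenCV's dnn module (the file names, thresholds and the 960x960 input size are assumptions; any darknet-trained cfg/weights pair is loaded the same way):

```python
import cv2
import numpy as np

# Load the darknet-trained network (cfg/weights produced by the training above).
net = cv2.dnn.readNetFromDarknet("yolov3-tiny-custom.cfg", "yolov3-tiny-custom.weights")
layer_names = net.getLayerNames()
out_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

img = cv2.imread("subimage.jpg")
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (960, 960), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for det in output:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(class_id)

# Non-maximum suppression to drop overlapping boxes.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
```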
I'm currently working on a program that can do binary image classification with machine learning. I have a list of labels and a list of images that I'm using as inputs, which are then fed into the Inception V3 model.
Will inputting of the dataset this way work with the inception V3 architecture? Is it necessary to organize the images with labeled folders before feeding it into the model?
Thanks for your help!
In your example, you have all the images in memory. You can simply call model.fit(trainX, trainY) to train your model. No need to organize the images in specific folder structures.
What you are referring to, is the flow_from_directory() method of the ImageDataGenerator. This is an object that will yield images from the directories, and automatically infer the labels from the folder structure. In this case, your images should be arranged in one folder per label. Since the ImageDataGenerator is a generator, you should use it in combination with model.fit_generator().
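For illustration, a minimal sketch of that generator-based option (the directory name, the 299x299 Inception V3 input size and the validation split are assumptions):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: data/train/<label_name>/*.jpg, one subfolder per label.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(299, 299), batch_size=32,
    class_mode="binary", subset="training")
val_gen = datagen.flow_from_directory(
    "data/train", target_size=(299, 299), batch_size=32,
    class_mode="binary", subset="validation")

# model is your compiled Inception V3 based model
# model.fit_generator(train_gen, validation_data=val_gen, epochs=10)
```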
As a third option, you can write your own custom generator that yields both images and labels. This is advised in case you have a more complex label structure than one label per image; for instance in multi-label classification, object detection or semantic segmentation, where the outputs are also images. A custom generator should also be used with model.fit_generator().
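A bare-bones sketch of such a custom generator, here for an image-to-image target as in segmentation (the paths, sizes and scaling are made up for illustration):

```python
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.image import load_img, img_to_array

class PairGenerator(Sequence):
    """Yields (image_batch, target_batch) where the target is also an image."""

    def __init__(self, image_paths, target_paths, batch_size=16, size=(256, 256)):
        self.image_paths = image_paths
        self.target_paths = target_paths
        self.batch_size = batch_size
        self.size = size

    def __len__(self):
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = np.stack([img_to_array(load_img(p, target_size=self.size)) / 255.0
                      for p in self.image_paths[sl]])
        y = np.stack([img_to_array(load_img(p, target_size=self.size,
                                            color_mode="grayscale")) / 255.0
                      for p in self.target_paths[sl]])
        return x, y

# model.fit_generator(PairGenerator(image_paths, mask_paths), epochs=10)
```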
I am creating an 11-class object detector using the Faster R-CNN model, set up with a maximum size of 300x400 in the image-resizer tag. This is because a CUDA OOM error pops up if I go any higher, as the GPU is a 1050 Ti (4 GB), so I have approximately 3800-3900 MB of run-time training memory for the model.
I have followed erishima's steps and adapted them with the Pets scripts and Dati Tran's to generate the TFRecord files.
The steps were as follows:
Create the labels for the categories using labelImg.
Use the name field in labelImg to annotate the class of the image file.
Create a CSV file and extract the filename, class, xmin, ymin, xmax, ymax from the XML files. (Custom script; a small sketch is shown after these steps.)
Create a train and test/eval CSV from the main CSV file.
Generate the TFRecord files (train and test) to be referenced in the config file. (Dati Tran's script, modified to suit my needs.)
Modify faster_rcnn_config without touching the hyper-parameters.
Created a label_map.pbtxt file which corresponds to the names of the classes, with ids starting from 1 as stated in many other answers related to this topic.
Started training the model via the stated method.
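A rough sketch of the kind of CSV-extraction script mentioned above, assuming the XML files follow the Pascal VOC layout that labelImg produces (the folder and output names are assumptions):

```python
import csv
import glob
import xml.etree.ElementTree as ET

# Collect filename, class, xmin, ymin, xmax, ymax from labelImg XML files.
rows = []
for xml_file in glob.glob("annotations/*.xml"):
    root = ET.parse(xml_file).getroot()
    filename = root.find("filename").text
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, obj.find("name").text,
            int(float(box.find("xmin").text)), int(float(box.find("ymin").text)),
            int(float(box.find("xmax").text)), int(float(box.find("ymax").text)),
        ])

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)
```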
The dataset for the classes is custom, and the number of images per class varies from 2500 down to 300. The dataset has no annotation of the orientation of the object or of the detection difficulty in the image, even though every possible angle of the object is present in those images.
The problem which arises is that, after training to a loss value of .002 over 217k steps, a single class was enveloping the objects of all other classes, whether I ran the detector on a video or on images. I have not tried to run the eval.py script, as that takes too long on this setup, so I can't really see the mAP for the classes, but I would assume it would be redundant information, as the problem is probably in the dataset preparation method or in the dataset itself.
When retrained from scratch for 60k steps, the problem persisted, but with another class enveloping all the others.
The warnings shown were:
The sparse index tensor is going to take a lot of memory. Can I change the code so that this does not pop up and possibly save me some precious memory?
Wanted [x,?,?,y], got [x,y,z,a,b] instead. This one stops the training. I got it twice in the training up to 217k steps; I have no idea where it originates, probably the dataset.
If someone can show me even a hint towards the proper fix for this, I would highly appreciate it.
I believe you have class imbalance; I had a similar problem in the past.
Do an analysis of your dataset and make sure the number of images per class is of a similar order of magnitude.
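A quick sketch of that kind of check, counting annotated boxes per class from a CSV like the one generated in the steps above (the file name and column name are assumptions):

```python
import csv
from collections import Counter

counts = Counter()
with open("labels.csv") as f:
    for row in csv.DictReader(f):
        counts[row["class"]] += 1

for name, n in counts.most_common():
    print(f"{name}: {n}")
# If one class has an order of magnitude more boxes than the rest,
# consider subsampling it or augmenting the rarer classes.
```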