Object Detector Training using TensorFlow - python

I am hoping someone on here has experience in training object detection models with tensorflow. I am a complete newbie, trying to learn. I ran through a few of the tutorials on the tensorflow site and am now going to try a real world example. I am following the tutorial here. I am at the point where I need to label the images.
My plan is to try to detect scallops, but the images I am using have several scallops each. Some I wouldn't really be able to tell were scallops other than from context: they are likely a scallop because they are next to a mound of other scallops.
My questions are:
Am I better off cutting them out and treating them individually, or labeling images that have several scallops?
When labeling the scallops, there are many that might look just like a round rock if I didn't have the context of seeing other scallops. Should I still label them?
I am guessing I will also need to find some images with differing backgrounds?
I know I can experiment to see how the models perform, but labeling these images is a labour-intensive task, so I am hoping I can borrow from the experience of someone who has attempted something similar in the past. Example of one of the images that I am part way through labeling:

1) Good question! The answer is easy: you should label the images as the model would see them at inference time. There's no reason to "lie" to your model by not labeling something; you'll only confuse it. Be truthful: if you see a scallop, label it. If you don't label something, it acts like a negative example, which will confuse the model. ==> A: multiple scallops
2) It seems the model will take images of (many) scallops as input, so it's not a problem that it learns that 'round objects next to a mound of scallops are likely also a scallop'; it's even a good thing, because they often are. So, again, be truthful and label everything.
3) That depends: how will you use the model at inference time? Will the images all have the same background then? If yes, you don't need different backgrounds; if no, you do.
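For illustration, this is roughly what a single labeled image could look like in Pascal VOC XML, the format written by labeling tools such as LabelImg (whether your tutorial uses that exact tool/format is an assumption; the file name and coordinates below are placeholders). The point is simply that one image gets one annotation file with one object entry per scallop:

    <annotation>
        <filename>scallops_001.jpg</filename>
        <size><width>1024</width><height>768</height><depth>3</depth></size>
        <object>
            <name>scallop</name>
            <bndbox><xmin>112</xmin><ymin>240</ymin><xmax>198</xmax><ymax>321</ymax></bndbox>
        </object>
        <object>
            <name>scallop</name>
            <bndbox><xmin>205</xmin><ymin>250</ymin><xmax>280</xmax><ymax>330</ymax></bndbox>
        </object>
        <!-- one <object> block per scallop you can identify -->
    </annotation>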

Related

Is there a better way to get an accurate count?

I'm working on a project to get an accurate count of the fish in images. I've used PixelLib but couldn't get an accurate result. Is there any package that could just count the objects in an image and give an accurate result?
This is a test image
https://i.stack.imgur.com/RUa79.jpg
This is the output of the test image.
Or would the watershed algorithm be better, since recognising the objects is less important than counting the fish in the image?
First of all, you don't need to spend any resources on segmentation if the actual goal is just to count your objects. Object detection might be enough.
The choice matters, because you will definitely need to train a custom model: the fish have a specific shape, are partially visible, overlap, etc.
I've just tried to segment your example:
another part:
BTW, it is better to improve the overall input quality. Anyway, segmented results can be used to "measure" shapes, so that joined shapes can be properly "interpreted".
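As a rough sketch of that "measuring" idea (assuming you already have a reasonably clean binary mask, which the current input does not really give you; the file name and the minimum-area filter are placeholders), you can take a typical single-fish area and interpret oversized blobs as several joined fish:

    import cv2
    import numpy as np

    # mask.png is assumed to be a binary segmentation result (fish = white)
    mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    areas = np.array([cv2.contourArea(c) for c in contours])
    areas = areas[areas > 50]                  # drop tiny specks
    typical = np.median(areas)                 # typical single-fish area

    # interpret each blob as round(area / typical) fish, at least 1
    count = int(sum(max(1, round(a / typical)) for a in areas))
    print("estimated fish count:", count)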
You can prepare a specific training set, so your model will recognize all corner cases in a proper way, but it might take some time.
Object detection will not require as much effort:
Long story short: improve your input (if possible) and try any suitable tutorial.
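For the counting step itself, once any detector is trained, very little code is needed. A minimal sketch, assuming a SavedModel exported with the TF2 Object Detection API (paths and the score threshold are placeholders to tune):

    import numpy as np
    import tensorflow as tf

    detect_fn = tf.saved_model.load("exported_model/saved_model")  # hypothetical path

    image = tf.io.decode_jpeg(tf.io.read_file("fishes.jpg"))       # hypothetical test image
    detections = detect_fn(tf.expand_dims(image, 0))               # batch of one uint8 image

    scores = detections["detection_scores"][0].numpy()
    count = int(np.sum(scores >= 0.5))   # count boxes above a confidence threshold
    print("fish count:", count)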
Update
The bounding box (detection) vs. accurate shape (segmentation) discussion is rather beside the point, because the goal is to get an accurate count, so let's address another issue:
Even a super-advanced model/approach will fail on the example provided. I'd suggest starting with any possible/reasonable input improvement.

How to determine which method of OCR to use depending on image quality

I am asking because two weeks of research have left me really confused.
I have a bunch of images from which I want to read the numbers at runtime (it is needed for the reward function in reinforcement learning). The thing is, they look pretty clear to me (I know that is a completely different matter for OCR systems, which is why I am providing additional images to show what I am talking about).
Because they seemed rather clear, I first tried PyTesseract, and when that did not work out I started researching which other methods could be useful to me.
...and that's how my search ended up here, because two weeks of trying to find out which method would be best suited to my problem just raised more questions.
Currently I think the best solution is to train a digit-recognition model on the MNIST/SVHN datasets, but isn't that a little bit overkill? I mean, the images are standardized, they are in grayscale, they are small, and the digit font stays the same, so I suppose there is an easier way of modifying those images or using a different OCR method.
That is why I am asking two questions:
Which method would be the most useful for my case, if not a model trained on the MNIST/SVHN datasets?
Is there any kind of documentation/books/sources that could make the actual choice of approach easier? Let's say that in the future I again need to plan which OCR system to use. On what basis should I make that choice? Is it purely a trial-and-error thing?
If what you have to recognize are those 7-segment digits, forget about any OCR package.
Use the outline of the window to find the size and position of the digits. Then count the black pixels in seven predefined areas, corresponding to the segments.
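A minimal sketch of that idea (the segment regions below are rough assumptions you would tune to your crops): threshold a single-digit crop, test whether each of the seven segment areas is mostly "on", and map the resulting pattern to a digit.

    import cv2

    # on/off pattern: (top, top-left, top-right, middle, bottom-left, bottom-right, bottom)
    DIGITS = {
        (1,1,1,0,1,1,1): 0, (0,0,1,0,0,1,0): 1, (1,0,1,1,1,0,1): 2,
        (1,0,1,1,0,1,1): 3, (0,1,1,1,0,1,0): 4, (1,1,0,1,0,1,1): 5,
        (1,1,0,1,1,1,1): 6, (1,0,1,0,0,1,0): 7, (1,1,1,1,1,1,1): 8,
        (1,1,1,1,0,1,1): 9,
    }

    def read_digit(crop):
        """crop: grayscale image containing exactly one 7-segment digit."""
        # dark digit on light background -> digit pixels become white (255)
        _, bw = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        h, w = bw.shape
        # segment areas as (x0, y0, x1, y1) fractions of the digit bounding box (rough guesses)
        regions = [
            (0.2, 0.00, 0.8, 0.15),   # top
            (0.0, 0.10, 0.2, 0.45),   # top-left
            (0.8, 0.10, 1.0, 0.45),   # top-right
            (0.2, 0.42, 0.8, 0.58),   # middle
            (0.0, 0.55, 0.2, 0.90),   # bottom-left
            (0.8, 0.55, 1.0, 0.90),   # bottom-right
            (0.2, 0.85, 0.8, 1.00),   # bottom
        ]
        pattern = []
        for x0, y0, x1, y1 in regions:
            seg = bw[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]
            pattern.append(1 if seg.mean() > 127 else 0)  # "on" if the area is mostly digit pixels
        return DIGITS.get(tuple(pattern))  # None if the pattern is not a valid digit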

How to better preprocess images for a better deep learning result?

We are experimenting with applying a convolutional neural network to classify good surfaces and surfaces with defects.
The good and bad images are mostly like the following:
Good ones:
Bad ones:
The images are relatively big (height: 800 pixels, width: 500 pixels)
The defect is very local and small relative to the image
The background is very noisy
The deep learning result (6 x conv+pooling -> flatten -> dense 64 -> dense 32) is very bad
(perhaps due to the limited number of bad samples and the very small defect pattern)
There are other defect patterns like very subtle scratches, residuals and stains, etc., which is one of the main reasons that we want to use deep learning instead of specific feature engineering.
We can and are willing to accumulate more images of defects.
So the questions are:
Is deep learning even an appropriate tool for defect detection like this in practice?
If yes, how can we adapt or pre-process the images into a form that deep learning models can really work with? (Could we apply some known filters to make the background much less noisy?)
If no, what other practical techniques can be used instead of deep models?
Will things like template matching or anything else actually be a fit for this type of problem?
Update:
Coming up with an explicit circular-stripes checker is a very good idea.
It could be used directly to check where the pattern is disturbed, or as a pre-processing step for deep learning.
Update:
A more subtle pattern: a 'scratch'.
There is a scratch starting from the bottom of the fan area going up and a little to the right.
Is deep learning even an appropriate tool for defect detection like this in practice?
Deep learning certainly is a possibility that promises to be universal. In general, it should rather be the last resort than the first approach. Downsides include:
It is difficult to include prior knowledge.
You therefore need an extreme amount of data to train the classifier for the general case.
If you succeed, the model is opaque. It might depend on subtle properties, which cause it to fail if the manufacturing process is changed in the slightest way and there is no easy way to fix it.
If yes, how can we adapt or pre-process the images into a form that deep learning models can really work with? (Could we apply some known filters to make the background much less noisy?)
Independent of the classifier you eventually decide to use, preprocessing should be optimal.
Illumination: The illumination is uneven. I'd suggest defining a region of interest in which the illumination is bright enough to see something, then calculating the average intensity over many images and using this to normalize the brightness. The result would be an image cropped to the region of interest, with homogeneous illumination.
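A minimal sketch of that normalization, assuming a set of images of the same size taken under the same lighting (the paths and the region-of-interest rectangle are placeholders):

    import glob
    import cv2
    import numpy as np

    paths = glob.glob("good_images/*.png")                       # placeholder path
    imgs = [cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float32) for p in paths]

    mean_img = np.mean(imgs, axis=0)                             # average intensity per pixel over many images
    mean_img = np.maximum(mean_img, 1.0)                         # avoid division by zero in dark corners

    def normalize(img, roi=(50, 100, 450, 600)):                 # (y0, x0, y1, x1) of the well-lit region, a placeholder
        y0, x0, y1, x1 = roi
        norm = img.astype(np.float32) / mean_img                 # divide out the illumination pattern
        return norm[y0:y1, x0:x1]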
Circular stripes: In the images you show, the stripes are circular, so their orientation depends on the position in the image. I would suggest applying a transformation that maps the region of interest (a fraction of a circle) into a trapezoid, where each stripe is horizontal and the length of each stripe is retained.
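One practical way to approximate that transformation is a polar unwarp around the fan's centre, so each circular stripe becomes a straight line; a sketch, assuming the centre and outer radius are known (the values below are placeholders, and note this does not exactly preserve stripe lengths as suggested above):

    import cv2

    img = cv2.imread("surface.png", cv2.IMREAD_GRAYSCALE)   # placeholder file
    center = (250.0, 400.0)                                  # (x, y) of the fan's centre, a placeholder
    max_radius = 380.0                                       # outer radius of the region of interest, a placeholder

    # after the unwarp, rows correspond to angle and columns to radius
    unwrapped = cv2.warpPolar(img, (512, 512), center, max_radius,
                              cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)

    stripes_horizontal = unwrapped.T   # transpose so each original circular stripe is a horizontal line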
If no, what are other practical techniques that can be used other than
deep models. Will things like template matching or anything else
actually be a fit for this type of problems?
Rather than identifying defects, you could try identifying the intact structure, which has relatively constant properties. (This would be the circular-stripes checker that I suggested in the comment.) Here, one obvious thing to test would be a 2D Fourier transform at each pixel within an image preprocessed as described above. If the stripes are intact, you should see that the frequency of intensity change is much lower in the horizontal than in the vertical direction. I would just plot these two quantities for many "good" and "bad" pixels and check whether that might already allow some classification.
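A rough sketch of that check on a single small patch of the preprocessed (stripes-horizontal) image; the patch size and the way the spectrum is split are assumptions to be tuned:

    import numpy as np

    def stripe_score(patch):
        """patch: small 2D array cut from the preprocessed image, stripes running horizontally."""
        spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
        h, w = spec.shape
        cy, cx = h // 2, w // 2
        spec[cy, cx] = 0.0                            # drop the DC component
        vert_energy = spec[:, cx - 1:cx + 2].sum()    # intensity change across the stripes (vertical direction)
        horiz_energy = spec[cy - 1:cy + 2, :].sum()   # intensity change along the stripes (horizontal direction)
        return vert_energy / (horiz_energy + 1e-9)    # high for intact stripes, lower where the pattern is disturbed

Plotting this score for patches around many "good" and "bad" pixels should show whether a simple threshold already separates the two.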
If you can preselect possible defects with that method, you could then crop out a small image and subject it to deep learning or whatever other method you want to use.

Compare original to modified images

I am working on a project in which I want to compare an unmodified original picture against a dataset of images, some of which are slightly to moderately altered versions of the original. These alterations range from simple colour changes, gradients, lighting and flipping/rotating, up to modifications done by a professional in Photoshop and used for a movie poster.
My goal is to identify, with rather good accuracy, whether the original image has been used in one of the dataset images.
I have already tried many different approaches:
Perceptual Hashing
Feature Extraction
Both with and without Machine Learning techniques
Tensorflow
...
However, I have the feeling that all of the above fall short in terms of accuracy and performance.
Therefore I was wondering if someone knows a good Python project (GitHub, website, ...) that will allow me to achieve my goal.
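For reference, the perceptual-hashing approach mentioned above can be sketched in a few lines with the imagehash library (file names and the distance threshold are placeholders; this handles colour/lighting changes reasonably well, but not flips, rotations or heavy edits):

    import imagehash
    from PIL import Image

    original = imagehash.phash(Image.open("original.jpg"))
    candidate = imagehash.phash(Image.open("candidate.jpg"))

    # Hamming distance between the two perceptual hashes; small values suggest
    # the candidate is a lightly modified copy of the original
    if original - candidate <= 10:
        print("probably derived from the original")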

Cityscapes traffic signs: no box or mask detection with TF Object Detection API

I'd be thankful for all thoughts, tips or links on this:
Using TF 1.10 and the recent Object Detection API (GitHub, 2018-08-18) I can do box and mask prediction on the PETS dataset as well as on my own proof-of-concept dataset:
But when training on the Cityscapes traffic signs (single class) I am having trouble achieving any results. I have adapted the anchors to account for the much smaller objects, and it seems the RPN is at least doing something useful:
Anyway, the box predictor is not kicking in at all. That means I am not getting any boxes, let alone masks.
My pipelines are mostly or even exactly like the sample configs.
So I'd expect either a problem with this specific type of data, or a bug.
Would you have any tips/links on how to (either)
visualize the RPN results when using 2 or 3 stages? (Using only one stage does that, but how would one force that?)
train the RPN first and continue with boxes later?
investigate where/why the boxes get lost? (having predictions with zero scores while evaluation yields zero classification error)
The solution finally turned out to be a combination of multiple issues:
The parameter from_detection_checkpoint: true is deprecated and is to be replaced by fine_tune_checkpoint_type: 'detection'. However, without either of them the framework seems to default to 'classification', which seems to break the whole idea of the object detection framework. It is not a good idea to rely on the defaults this time.
My data wasn't prepared well enough. I had boxes with zero width/height (for whatever reason). I also removed masks for instances that were disconnected.
Using the keep_aspect_ratio_resizer together with random_crop_image and random_coef: 0.0 does not seem to allow for the full resolution, as the resizer seems to be applied before the random cropping. I now split my input images into (vertical) stripes [to save memory] and apply random_crop_image with a small min_area so it does not skip the small features. I can also now allow max_area: 1 and a random coefficient > 0, as the memory usage is dealt with (see the config sketch below).
One potential problem also arose from the fact that I only considered a single class (so far). This might be an issue either for the framework or for the activation function in the network. However, in combination with the other fixes, this did not seem to cause any additional problems, at least.
Last but not least I updated the sources to 2018-10-02 but didn't walk through all modifications in detail.
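For reference, the relevant parts of the pipeline config after these changes looked roughly like this (a sketch with placeholder values, not the exact configuration used):

    model {
      faster_rcnn {
        image_resizer {
          keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 }
        }
        # ...
      }
    }
    train_config {
      fine_tune_checkpoint: "path/to/model.ckpt"
      fine_tune_checkpoint_type: "detection"   # replaces the deprecated from_detection_checkpoint: true
      data_augmentation_options {
        random_crop_image {
          min_area: 0.1      # small enough that crops keep the tiny traffic signs
          max_area: 1.0
          random_coef: 0.2   # > 0 is affordable once memory usage is under control
        }
      }
      # ...
    }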
I hope my findings can save others some time and trouble.
