I would like to train a CNN for detection and classification of various kinds of signs (mainly laboratory and safety markers) using TensorFlow.
While I can gather enough training data for the classification training set, e.g. using the Bing API, I'm struggling to think of a way to get enough images for the object detection training set. Since these markers are mostly not publicly available, I thought I could composite a natural scene image with an image of the marker itself to build a training set. Is there any way to do that automatically?
I looked at the TensorFlow data augmentation class, but it seems to provide functionality only for simpler data augmentation tasks.
You can do it with OpenCV as preprocessing.
The algorithm is as follows:
1. Randomly choose a pair consisting of a natural scene image and a sign image.
2. Sample a random position in the natural scene image where the sign image will be pasted.
3. Paste the sign image at that position.
4. Store the composite image and the position as one training example.
Steps 1 and 2 can be done with Python's standard random module or NumPy.
Step 3 can be done with opencv-python. See "overlay a smaller image on a larger image python OpenCv".
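The steps above could be sketched roughly as follows. This is only a minimal illustration, not a complete pipeline: the function names `paste_sign` and `make_sample` are made up for this example, images are assumed to be NumPy arrays of the kind `cv2.imread` returns, and the sign is pasted as an opaque rectangle with no scaling, rotation or alpha blending.

```python
import random

import numpy as np


def paste_sign(scene, sign, rng=random):
    """Paste `sign` at a random position inside a copy of `scene`.

    Returns the composite image and the bounding box
    (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    scene_h, scene_w = scene.shape[:2]
    sign_h, sign_w = sign.shape[:2]
    if sign_h > scene_h or sign_w > scene_w:
        raise ValueError("sign must fit inside the scene")

    # Step 2: sample the top-left corner so the sign stays fully inside.
    x = rng.randint(0, scene_w - sign_w)  # randint is inclusive on both ends
    y = rng.randint(0, scene_h - sign_h)

    # Step 3: overwrite the region with the sign (opaque paste).
    composite = scene.copy()
    composite[y:y + sign_h, x:x + sign_w] = sign
    return composite, (x, y, x + sign_w, y + sign_h)


def make_sample(scenes, signs, rng=random):
    """Steps 1-4: pick a scene and a sign, paste, return image, box, label."""
    scene = rng.choice(scenes)
    label = rng.randrange(len(signs))  # index of the sign doubles as class id
    composite, box = paste_sign(scene, signs[label], rng)
    return composite, box, label
```

Calling `make_sample` in a loop and writing each image together with its box and label to disk would give a detection training set; passing a seeded `random.Random` as `rng` makes the output reproducible.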
Related
I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching the net, I've found several webpages with Keras code that uses customized layers for custom object classification. In that case, as I understand it, you need to provide images where your object appears alone. Although the training is good and it classifies them properly, I haven't found how to later load this trained network to find objects in my big images.
On the other hand, I have found that I can do this using the cnn class in cv if I load the weights from the YOLOv3 network. In this case I provide the big images with the proper annotations, but the network is not well trained...
Given this context, could someone show me how to load weights into cnn that were trained with a customized network, and how to train that network?
After a lot of searching, I've found a better approach:
Cut your images into subimages (I cut mine into 2 rows and 4 columns).
Feed YOLO with these subimages and their proper annotations. I used yolov3 tiny, with a size of 960x960, for 10k steps. In my case intensity and color were important, so random parameters such as hue, saturation and exposure were kept at 0. Use random angles. If your objects do not change in size, disable random in the yolo layers (random=0 in the cfg files; this option only controls whether the input size is randomly changed during training). For this, I'm using AlexeyAB's darknet fork. If you have blurry objects, add blur=1 to the [net] properties in the cfg file (after hue). For blur you need AlexeyAB's fork, and it has to be compiled with OpenCV (apart from CUDA if you can).
Calculate anchors with AlexeyAB's fork. Cluster_num is the number of anchor pairs you use. You can find it by opening your cfg and looking at any anchors= line. Anchors are the sizes of the boxes that darknet will use to predict the positions.
Change the cfg to use your new anchors. If you have fixed-size objects, the anchors will be very close in size. I left the ones for bigger objects (first yolo layer) unchanged, but for the second layer, the tiny ones, I modified them and even removed 1 pair. If you remove some, then change the order in mask in every [yolo] section. Mask refers to the indices of the anchors, starting at 0. If you remove some, also change num= inside each [yolo] section.
After this, detection is quite good. When detecting on a video, it can happen that objects are lost in some frames. You can try to avoid this by using the lstm cfg. https://github.com/AlexeyAB/darknet/issues/3114
Now, if you also want to track them, you can apply a deep sort algorithm with your yolo pretrained network. For example, you can convert your pretrained network to keras using https://github.com/allanzelener/YAD2K (add this commit for tiny yolov3 https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3
As an alternative, you can train it with mask-rcnn or any other faster-rcnn algorithm and then look for deep-sort.
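For the first step of the approach above (cutting the images into subimages and keeping the annotations in sync), a rough sketch could look like this. The helper names `split_into_tiles` and `to_yolo` are invented for this example, boxes are assumed to be pixel tuples `(x_min, y_min, x_max, y_max)`, and each box is assigned to the tile that contains its center, which is a simplification that suits small objects.

```python
import numpy as np


def split_into_tiles(image, boxes, rows=2, cols=4):
    """Cut `image` into rows x cols tiles and remap the boxes.

    Returns a list of (tile, tile_boxes) in row-major order, where
    tile_boxes are expressed in the tile's own pixel coordinates.
    Pixels left over when the size is not divisible are dropped.
    """
    height, width = image.shape[:2]
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * tile_h, c * tile_w
            tile = image[y0:y0 + tile_h, x0:x0 + tile_w]
            tile_boxes = []
            for bx0, by0, bx1, by1 in boxes:
                cx, cy = (bx0 + bx1) / 2, (by0 + by1) / 2
                # Keep the box only in the tile that contains its center.
                if x0 <= cx < x0 + tile_w and y0 <= cy < y0 + tile_h:
                    tile_boxes.append((
                        max(bx0 - x0, 0), max(by0 - y0, 0),
                        min(bx1 - x0, tile_w), min(by1 - y0, tile_h),
                    ))
            tiles.append((tile, tile_boxes))
    return tiles


def to_yolo(box, tile_w, tile_h):
    """Convert a pixel box to darknet's normalized
    (x_center, y_center, width, height) values."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2 / tile_w, (y0 + y1) / 2 / tile_h,
            (x1 - x0) / tile_w, (y1 - y0) / tile_h)
```

Each tile and its converted boxes would then be written out as one image and one annotation file, with the class id in front of the four numbers on every line.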
Using a TensorFlow image classifier, would it be possible to return the positions of the items that were matched?
For example, the position of the shoe in the picture.
It is not possible to have what you want using a classical image classification model. What you need is a segmentation network (your training data must be appropriately changed). However, theoretically, it is possible to loosely localize an object using classification techniques. See this
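One simple way to get such a loose localization from a plain classifier is a sliding window: score every crop and keep the best one. The sketch below only illustrates that idea and is not what the linked material necessarily describes. `score_fn` is a hypothetical stand-in for whatever returns your classifier's confidence for the target class on a crop, and the window and stride values are arbitrary.

```python
import numpy as np


def sliding_window_localize(image, score_fn, window=32, stride=16):
    """Return the (x, y, w, h) window with the highest score, and the score.

    `score_fn` takes an image crop and returns a number, e.g. the
    classifier's probability for the class of interest (hypothetical).
    """
    height, width = image.shape[:2]
    best_score, best_box = -np.inf, None
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            crop = image[y:y + window, x:x + window]
            score = float(score_fn(crop))
            if score > best_score:
                best_score, best_box = score, (x, y, window, window)
    return best_box, best_score
```

The result is only as precise as the window size and stride allow, and it costs one classifier call per window, which is why dedicated detection or segmentation networks are usually preferred.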
I am trying to do handwritten character recognition using TensorFlow in Google Colab.
I have trained and tested a model with an accuracy of 91%.
I tried it on the image given in the tutorial, and it worked correctly.
That image was resized to 28x28.
When I try it on my own input image, it predicts wrong results such as 2 or 3, although my input image shows the digit 6.
The problem may be in the image operations I apply before passing the image to the model.
Later, I also want to pass such images in for real-time recognition.
I am resizing and inverting the image to make it compatible with the data my model was trained on.
The image loaded with OpenCV is the inverse of what the model was trained on (the matrix stores black as 0 and white as 255), which is why I invert it.
My Jupyter notebook (on GitHub) follows a tutorial from DigitalOcean's blog.
How can I upload an image taken with a phone or webcam and recognize characters in that image?
Where am I making mistakes in processing the image?
Further, I want to use this in a project for real-time recognition of characters.
testing images are
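Did you know that the images in the MNIST dataset are constrained by padding? Each digit is size-normalized and centered inside the 28x28 frame, with empty margin around it.
So your real-time input needs appropriate image processing before it is passed to the model.
This is a useful article about that:
https://link.medium.com/0ySCmyMpzU
And the following is my project, a simple MNIST game: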
https://github.com/mym0404/Math-Writer
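As a rough sketch of what such preprocessing could look like: MNIST digits fit a 20x20 box that sits inside the 28x28 frame, so a photo of dark ink on light paper can be inverted, cropped to the ink, scaled so its longer side is 20 px, and padded. The function names and the threshold below are assumptions made for this example. Centering here uses the bounding box rather than MNIST's center of mass, and the nearest-neighbour resize is only there to keep the example NumPy-only (`cv2.resize` would normally be the better choice).

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D array using index lookup."""
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols]


def to_mnist_style(gray, threshold=128):
    """Convert a 2-D uint8 image (dark ink, light background) into a
    28x28 float32 array in [0, 1] with a light digit on a black background."""
    inverted = 255 - gray                      # ink becomes bright
    ys, xs = np.nonzero(inverted >= threshold)
    canvas = np.zeros((28, 28), np.uint8)
    if ys.size == 0:                           # no ink found
        return canvas.astype(np.float32)

    # Crop to the bounding box of the ink.
    crop = inverted[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Scale so the longer side becomes 20 pixels, keeping the aspect ratio.
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    digit = resize_nearest(crop, new_h, new_w)

    # Paste into the middle of the 28x28 frame.
    top, left = (28 - new_h) // 2, (28 - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = digit
    return canvas.astype(np.float32) / 255.0
```

The result can be reshaped to whatever input shape the trained model expects, for example `(1, 28, 28, 1)` or `(1, 784)`.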
I am new to TensorFlow. I want to write a neural network that reads noisy images from one file and uncorrupted images from another file.
Then I want to correct the noisy images based on the uncorrupted ones.
What you are talking about is a denoising autoencoder.
This is not my code. It ranked very high in a Google search and has several GitHub stars and forks, all of which are good indicators that it is a working and supported implementation.
Actually, I'm trying to train a NN that takes corrupted images and, based on them and the ground truth, removes the noise from those images. It must be a Network in Network, in other words pixel-independent.
For my Deep Learning course, I need to implement a neural network which is exactly the same as the one in the TensorFlow MNIST for Experts tutorial.
The only difference is that I need to down-sample the dataset and then feed it into the neural network. Should I crop and resize, or should I implement the neural network with parameters that accept multiple data sizes (28x28 and 14x14)?
All of the parameters in the TensorFlow tutorial are static, so I couldn't find a way to feed the algorithm a 14x14 image. Which tool should I use for 'optimal' down-sampling?
You need to resize the input images to a fixed size (which appears to be 14x14 from your description). There are different ways of doing this: for example, you can use interpolation to resize, simply crop the central part or some corner of the image, or randomly choose one or more patches (all of the same size as your network's input) from a given image. You can also combine these methods. For example, in VGG, they first do an aspect-preserving resize using bilinear interpolation and then take a random patch from the resulting image (for the test phase they take the central crop). You can find VGG's preprocessing source code in TensorFlow at the following link:
https://github.com/tensorflow/models/blob/master/slim/preprocessing/vgg_preprocessing.py
The only parameters of the sample code in the tutorial you mentioned that need to be changed are those related to the input image size. For example, you need to change 28s to 14s and 784s to 196s (14 x 14 = 196). These are just examples; there are other weight sizes that you will need to change as well.
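As one simple option for the down-sampling itself (not necessarily the 'optimal' one), 2x2 average pooling halves each side of a 28x28 image and can be done with a single reshape in NumPy. The function name `downsample_2x` is made up for this sketch, and it assumes the images arrive either flattened as `(N, 784)` or as `(N, 28, 28)`.

```python
import numpy as np


def downsample_2x(images):
    """Average every 2x2 block of 28x28 images.

    Accepts an array of shape (N, 784) or (N, 28, 28) and returns
    flattened 14x14 images of shape (N, 196).
    """
    # Split each axis of 28 into 14 blocks of 2, then average the blocks.
    blocks = np.asarray(images, dtype=np.float32).reshape(-1, 14, 2, 14, 2)
    return blocks.mean(axis=(2, 4)).reshape(-1, 196)
```

The output width of 196 is the value that replaces 784 in the input placeholder and in the first reshape of the tutorial code.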