I would like to train a new model using my own dataset. I will be using Darkflow/TensorFlow for it.
My questions are:
(1) Should we resize our training images for a specific size?
(2) I think smaller images might save time, but can smaller images harm the accuracy?
(3) And what about the images to be predicted: should we resize them as well, or is that not necessary?
(1) Yes: the input resolution must be the same for all images. You can resize them yourself, or YOLO can do it for you (it already rescales when random=1 is set in the .cfg file).
(2) If your hardware is good enough, I suggest using larger images. Also, as a suggestion: if you will be using a webcam, use images at the same resolution as your webcam.
(3) Yes, the same as for training.
(1) Yes, neural networks have fixed input dimensions. These can be adjusted to fit your purpose, but in the end you need to commit to one defined input dimension, and thus you need to feed images that fit those dimensions. For YOLO I found the following:
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
It could be that the framework you are using already does that step for you. Maybe somebody could comment on that.
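If you do resize yourself, here is a minimal sketch with OpenCV (the 416x416 comes from the layer printout above; this is a plain resize, not YOLO's aspect-ratio-preserving letterbox, and the framework may well do this for you as noted above):

import cv2

def resize_for_yolo(image_path, size=416):
    img = cv2.imread(image_path)
    # Resize to the network's fixed input resolution; dsize is (width, height).
    return cv2.resize(img, (size, size))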
(3) The images/samples you feed at inference time, for prediction, should be as similar as possible to the training images/samples. So whatever preprocessing you're doing on your training data, you should definitely do the same on your inference data.
(2) Smaller images make sense if your hardware cannot hold larger images in memory, or if you train with large batch sizes so that your hardware needs to hold multiple images in memory at once. In the end, though, the computational time is roughly proportional to the number of operations in your architecture, not necessarily to the image size.
(1) No, it is not necessary. But if your dataset contains images of varying resolutions, you can put
random = 1
in your .cfg file for better results.
(2) Smaller images don't reduce the time to converge; moreover, if your dataset contains only small images, YOLO will probably fail to converge (YOLOv3 is not a good detector for lots of tiny objects).
(3) It is not necessary
Related
I am training a machine learning model with my own dataset, generated from another program. I have designed the model to work on 128x128 images; however, as I approach 100,000 images I start to run into issues with the training crashing without any informative output (the kernel dies). I am assuming this is caused by memory limits, since it only occurs as the number of images increases.
To mitigate the memory usage, I realized that all of the pixels in the input images are either 0 or 255, meaning that after normalization they are 0 or 1. Is there a way to exploit this in PyTorch to reduce memory usage? Or are there other benefits you can get when the input image contains only binary values?
I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching the net, I've found several webpages with Keras code using customized layers for custom object classification. In this case, I've understood that you need to provide images where your object appears alone. Although training goes well and it classifies them properly, unfortunately I haven't found how to later load this trained network to find objects in my big images.
On the other side, I have found that I can do this using the cnn class in cv if I load the weights from the YOLOv3 network. In this case I provide the big images with the proper annotations, but the network is not well trained...
Given this context, could someone show me how to load weights into a cnn that was trained with a customized network, and how to train that network?
After a lot of searching, I found a better approach:
Cut your images into subimages (I cut mine into 2 rows and 4 columns); see the tiling sketch after these steps.
Feed YOLO these subimages with their proper annotations. I used YOLOv3-tiny with a size of 960x960 for 10k steps. In my case, intensity and color were important, so random parameters such as hue, saturation and exposure were kept at 0. Use random angles. If your objects do not change in size, disable random in the [yolo] layers (random=0 in the cfg file; it only controls whether the input size is randomly changed during training at every step). For this I'm using Alexey's darknet fork. If you have blurred objects, add blur=1 in the [net] section of the cfg file (after hue). For blur you need Alexey's fork, compiled with OpenCV (apart from CUDA if you can).
Calculate anchors with Alexey's fork. cluster_num is the number of anchor pairs you use; you can find it by opening your cfg and counting the pairs on any anchors= line. Anchors are the sizes of the boxes that darknet will use to predict the positions, so cluster_num = number of anchor pairs.
Change the cfg to use your new anchors. If your objects have a fixed size, the anchors will be very close in size. I kept the ones for the bigger objects (first [yolo] layer), but for the second layer, the small ones, I modified them and even removed one pair. If you remove some, also change the order in mask= in every [yolo] section (mask refers to the indices of the anchors, starting at index 0) and change num= inside each [yolo] section as well.
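For reference, a minimal tiling sketch for the first step (2 rows x 4 columns as above; file paths and the non-overlapping split are illustrative, and the annotations still have to be recomputed relative to each tile):

import cv2

def cut_into_tiles(image_path, rows=2, cols=4):
    # Split one big image into rows x cols non-overlapping subimages.
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    tile_h, tile_w = h // rows, w // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = img[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            cv2.imwrite(f"{image_path}_r{r}_c{c}.jpg", tile)
            tiles.append(tile)
    return tiles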
Afterwards, detection is quite good. It can happen that, if you detect on a video, some objects are lost in a few frames. You can try to avoid this by using the LSTM cfg: https://github.com/AlexeyAB/darknet/issues/3114
Now, if you also want to track the objects, you can apply a Deep SORT algorithm on top of your pretrained YOLO network. For example, you can convert your pretrained network to Keras using https://github.com/allanzelener/YAD2K (add this commit for tiny YOLOv3: https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3
As an alternative, you can train with Mask R-CNN or any other Faster R-CNN variant and then look into Deep SORT.
I'm a student in medical imaging. I have to construct a neural network for image segmentation. I have a dataset of 285 subjects, each with 4 modalities (T1, T2, T1ce, FLAIR) plus their respective segmentation ground truth. Everything is in 3D with a resolution of 240x240x155 voxels (this is the BraTS dataset).
As we know, I cannot input the whole image to a GPU for memory reasons. I have to preprocess the images and decompose them into overlapping 3D patches (sub-volumes of 40x40x40), which I do with scikit-image's view_as_windows, and then serialize the windows into a TFRecords file. Since patches are taken every 10 voxels in each direction, this sums to 5,292 patches per volume. The problem is that, with only 1 modality, I get a size of 800 GB per TFRecords file. Plus, I have to compute each patch's segmentation weight map and store it as patches too, and the segmentation itself is also stored as patches in the same file.
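For reference, a small sketch of that patch extraction on a single modality (a step of 10 voxels is what yields the 5,292 patches per volume mentioned above):

import numpy as np
from skimage.util import view_as_windows

volume = np.zeros((240, 240, 155), dtype=np.float32)  # one modality of one subject

# 40x40x40 windows taken every 10 voxels: 21 x 21 x 12 = 5,292 patches
windows = view_as_windows(volume, window_shape=(40, 40, 40), step=10)
patches = windows.reshape(-1, 40, 40, 40)
print(patches.shape)  # (5292, 40, 40, 40)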
And I will eventually have to include all the other modalities, which would take nothing less than terabytes of storage. I also have to remember that I must sample an equivalent number of patches from background and foreground (class balancing).
So, I guess I have to do all the preprocessing steps on the fly, just before every training step (while hoping not to slow training down too much). I cannot use tf.data.Dataset.from_tensors() since I cannot load everything in RAM. I cannot use tf.data.TFRecordDataset either, since preprocessing the whole thing beforehand takes a lot of storage that I will eventually run out of.
The question is: what options are left for doing this cleanly, with the possibility of reloading the model after training for image inference?
Thank you very much and feel free to ask for any other details.
Pierre-Luc
Finally, I found a method to solve my problem.
I first compute a subject's crop without actually applying it: I only measure the slices needed to crop the volume down to the brain. I then serialize all the dataset's images into one TFRecords file, each training example consisting of an image modality, the original image's shape and the slices (saved as an Int64 feature).
I decode the TFRecords afterwards. Each training sample is reshaped to the shape stored in its features. I stack all the image modalities together using the tf.stack() method and crop the stack using the previously extracted slices (so the crop applies to all images in the stack). Finally, I take random patches using the tf.random_crop() method, which lets me randomly crop a 4-D array (height, width, depth, channel).
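A minimal sketch of that decoding pipeline, written in TF 1.x style to match the tf.random_crop() call above; the feature names, the float32 dtype and the slice encoding are assumptions, not the exact schema used here:

import tensorflow as tf

MODALITIES = ["t1", "t2", "t1ce", "flair", "seg"]  # hypothetical feature names

def parse_and_crop(serialized_example, patch_size=(40, 40, 40)):
    feature_spec = {name: tf.FixedLenFeature([], tf.string) for name in MODALITIES}
    feature_spec["shape"] = tf.FixedLenFeature([3], tf.int64)
    feature_spec["slices"] = tf.FixedLenFeature([6], tf.int64)  # start/stop per axis
    feats = tf.parse_single_example(serialized_example, feature_spec)

    shape = tf.cast(feats["shape"], tf.int32)
    s = tf.cast(feats["slices"], tf.int32)

    def decode(name):
        vol = tf.reshape(tf.decode_raw(feats[name], tf.float32), shape)
        return vol[s[0]:s[1], s[2]:s[3], s[4]:s[5]]  # brain bounding-box crop

    # Stack modalities and segmentation along the channel axis so the same
    # random spatial crop is applied to every volume at once.
    stack = tf.stack([decode(name) for name in MODALITIES], axis=-1)
    patch = tf.random_crop(stack, size=list(patch_size) + [len(MODALITIES)])
    return patch[..., :4], patch[..., 4]  # (input modalities, segmentation label)

dataset = (tf.data.TFRecordDataset("train.tfrecords")
           .map(parse_and_crop)
           .shuffle(64)
           .batch(2))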
The only thing I still haven't figured out is data augmentation. Since all of this happens on Tensors, I cannot use plain Python and NumPy to rotate, shear or flip a 4-D array. I would need to do it inside the tf.Session(), but I would rather avoid that and directly feed the training handle.
For evaluation, I serialize only one test subject per TFRecords file. The test subject contains all the modalities too, but since there is no TensorFlow method to extract patches in 4-D, the image is preprocessed into small patches with Scikit-Learn's extract_patches() method, and these patches are serialized to the TFRecords file.
This way, the training TFRecords file is a lot smaller, and I can evaluate the test data using batch prediction.
Thanks for reading and feel free to comment !
Acquisition
I have images that contain defect areas. The original image size is around 3072x16000, which is huge! The length can vary randomly, depending on the product length.
The image comes from a profilometer and looks like this:
The speed of the conveyor is fixed; the image length depends on the product's.
We can't do "online" processing because acquisition and image processing come from different suppliers.
The first supplier gives an individual image per product; the product length is not fixed.
Defects
Defects are quite small (less than 256x256 px), so I cropped them and taught a CNN to tell them apart from conforming areas (both 256x256x1 px).
The aim is to focus the network training on ROIs because I don't have a huge database of images.
I need very high resolution to catch small defects. The classification verdict is given on a small 256x256 px subimage.
I'll have around 20 classes of defects and 4 classes of conforming areas (depending on where I am in the image).
I use grey-level images to identify defects.
I can classify my small 256x256 px images into "Good"/"Bad" classes.
If one area is identified as "Bad", the product is "suspicious" and segregated.
CNN
I used TensorFlow and retrained a MobileNet network; it works well on 256x256 images, even if training was long.
Now I face another issue: the real input images are around 3072x16000 pixels.
Is there a recommended way to use my pretrained CNN on these huge images?
How should I cut them up and pass them to my CNN?
Many Thanks !
I am trying to assess the feasibility of using TensorFlow to identify features in my image data. I have 50x50 px grayscale images of nuclei that I would like to have segmented; the desired output would be either a 0 or a 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten-digit dataset, but I haven't seen many examples that would output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task: each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local (such as a patch around the pixel in question) or global (such as the pixel's location in the image), or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset; see the toy sketch below. Of course your results will be highly dependent on your choice of features.
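For instance, a toy version of that pixel-wise feature approach (the patch size, the position normalization and the dense model are arbitrary illustrative choices):

import numpy as np
import tensorflow as tf

PATCH = 5  # 5x5 neighbourhood around each pixel

def pixel_features(image):
    # Build one feature vector per pixel: the local patch plus the (row, col) position.
    h, w = image.shape
    padded = np.pad(image, PATCH // 2, mode="reflect")
    feats = []
    for r in range(h):
        for c in range(w):
            patch = padded[r:r + PATCH, c:c + PATCH].ravel()
            feats.append(np.concatenate([patch, [r / h, c / w]]))
    return np.asarray(feats, dtype=np.float32)  # shape (h*w, PATCH*PATCH + 2)

# A small fully connected classifier over those per-pixel features.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(PATCH * PATCH + 2,)),
    tf.keras.layers.Dense(2, activation="softmax"),  # background vs. nucleus
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# X = np.vstack([pixel_features(img) for img in images]); y = masks.reshape(-1)
# model.fit(X, y, epochs=5, batch_size=256)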
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives TensorFlow provides. You could then either forgo TensorFlow's optimization functions such as backprop, or, if there are differentiable variables in your computation, use TF's optimization functions to optimize those variables.
SoftmaxWithLoss() works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, the final predicted map will have channel = 2, and after reshaping, N = batch * height * width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
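A small sketch of that reshape-then-softmax loss in TF 1.x (tensor shapes are illustrative; TensorFlow's closest equivalent of SoftmaxWithLoss is softmax cross-entropy):

import tensorflow as tf

# Illustrative shapes: per-pixel logits from the network and integer labels.
logits = tf.placeholder(tf.float32, [None, 50, 50, 2])  # [batch, height, width, channel]
labels = tf.placeholder(tf.int32, [None, 50, 50])       # 0 = background, 1 = nucleus

# Flatten to [N, channel] with N = batch * height * width, as described above.
flat_logits = tf.reshape(logits, [-1, 2])
flat_labels = tf.reshape(labels, [-1])

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=flat_labels,
                                                   logits=flat_logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)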
See this question that may help.
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmax-with-loss. HTH.
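A rough sketch of such a model with tf.keras (layer sizes are arbitrary; a pixel-wise softmax cross-entropy plays the role of softmaxWithLoss):

import tensorflow as tf

# Encoder-decoder sketch: 50x50 grayscale input -> per-pixel background/nucleus probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same",
                           input_shape=(50, 50, 1)),
    tf.keras.layers.MaxPooling2D(2),                     # downsample to 25x25
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.UpSampling2D(2),                     # back to 50x50
    tf.keras.layers.Conv2D(2, 1, activation="softmax"),  # 2 classes per pixel
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# x: (num_images, 50, 50, 1) normalized images; y: (num_images, 50, 50) 0/1 masks
# model.fit(x, y, epochs=10, batch_size=32)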