Acquisition
I have images that contains defaults areas...Original image's size is arround 3072x16000, that is huge ! Lenght can randomly change, depending on product length.
The image is from a profilometer and look like it :
The speed of the convoyer is fixed. Image length depend on product's.
We can't do "online" processing because acquisition and image processing are from different suppliers.
The first supplier give individual image of products, product lenght is not fixed.
Defaults
Defaults are quite small (less than 256x256px), then i cropped it, learn a CNN how to recognize it from a conform area (both 256x256x1 px).
The aim is to focus the network training on ROIs because i don't have a huge database of images.
I need very high resolution to get small defaults. The classification verdict is on small 256x256px subimage.
I'll have arround 20 classes of defaults and 4 classes of conforms areas (depending on where i am in the image).
I use greylevel image to identify defaults.
I can classify my 256x256px small image between "Good"/"Bad" classes.
If one area is identified as "Bad", the product is "suspicious" and segregated...
CNN
I used TensorFlow and retrained a mobilenet network, that work well on 256x256 images, as if training was long.
Now i face other issue. Input images size are in reality arround 3072x16000 pixel.
Is there a recommanded way to use my pretrain CNN on theses huge images ?
How should i cut it and pass it to my CNN?
Many Thanks !
Related
I want to design and train a neural network for the automatic recognition of the edges, in some microscopic images.
I am using Keras for a start, I may consider PyTorch later.
The structure of the images is rather simple, with some dark areas, and some clear areas, relatively easy to distinguish, and the task is to select the pixels of the contour between dark and clear areas. The transition between dark and clear is gradual, so my result is not a single line of edge pixels, but rather a 10 or 15 pixels wide "ribbon" at the edge.
I have manually annotated 200-something images, so for each image I have another image, of the same size, where the pixels of the contours are black, and all the other pixels are white.
I have seen many tutorials on how to design, compile and fit a model (a neural network), and then how to test it, using the manually annotated data.
However, most of the tutorials work on problems of classification, where the number of neurons in the output layer is the number of categories.
My problem is not a problem of classification, and ideally my output should be an image of the same size of the input.
So, here is my question:
What is the best way to design the output layer? Is a layer with a number of neurons equal to the number of pixels the best idea? Or this is a waste, and there is a more efficient way?
Addendum
The images are "easy", but it is still difficult to find the contour pixels, so I believe that it is worth using the machine learning approach.
The transition between dark and clear is a little gradual, so my result is not a single line of pixels on the edge, but rather a band, a 10 or 15 wide ribbon of edge pixels. Since I am after a ribbon of pixels, my categories should be "edge" and "not-edge". If I use the categories "dark pixels" and "clear pixels", and then numerically find the pixels between the two areas I do not get the "ribbon" result, which I need.
The short answer is "yes": it is a good idea to have as many neurons in output as you have in input, i.e. to output an image with the same resolution of the input images.
The network architecture will have an input layer with a neuron for each pixel, then typically the hidden layers will shrink to less neurons, probably with convolutional layers, and then some more layers will re-expand the number of neurons, up to the output layer, which in principle may have the same number of neurons as the input layer.
The most common architecture in this type of problem is the U-net architecture, described in the article "U-Net: Convolutional Networks for Biomedical Image Segmentation", by Ronneberger, Fischer, and Brox, published on the open arxiv: https://arxiv.org/abs/1505.04597.
[
I have a simple question to some of you. I have worked on some image classification tutorials. Only the simpler ones like MNIST dataset. Then I noticed that they do this
train_images = train_images / 255.0
Now I know that every value from the matrix (which is the image) gets divided by 255.0. If I remember correctly this is called normalization right? (please correct me if I am wrong otherwise tell me that I am right).
I'm just curious is there a "BETTER WAY","ANOTHER WAY" or "THE BEST WAY" to pre-process or clean images then those cleaned images will be fed to the network for training.
Please if you would like to provide a sample source code. Please! be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the followings:
normalisation: which you already mentioned
reshaping into uniform resolution (img height x img width): higher resoltuion leads to better learning and smaller resolution may lose important features. Some models have default input size that you can refer to. Also an average size of all images can be used too.
color channel: 1 refers to gray-scale and 3 refers rgb-scale. Depending on your application you can set this.
data augmentation: if your model is overfitting or your dataset is small, you can reproduce your dataset by altering original images (flipping, rotating, cropping, zooming..) to increase your dataset
image segmentation: segmentation can be performed to highlight the area or boundaries that may benefit your application. For example, in medical image classification, some part of body maybe masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I have reshaped the images and made them gray-scale. Then I performed image segmentation to highlight the lungs in the images. And I normalised the image pixels to put into my classification model. Depending on your application, there may be other more pre-processing techniques you might want to consider.
I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching in the net, I've found several webpages with codes for keras using customized layers for custom objects classification. In this case, I've understood that you need to provide images where your object is alone. Although the training is goodand it classifies them properly, unfortunately I haven't found how to later load this trained network to find objects in my big images.
On the other side, I have found that I can do this using the cnn class in cv if I load the weigths from the Yolov3 netwrok. In this case I provide the big images with the proper annotations but the network is not well trained...
Given this context, could someone show me how to load weigths in cnn that are trained with a customized network and how to train that nrtwork?
After a lot of search, I've found a better approach:
Cut your images in subimages (I cut it in 2 rows and 4 columns).
Feed yolo with these subimages and their proper annotations. I used yolov3 tiny, with a size of 960x960 for 10k steps. In my case, intensity and color was important so random parameters such as hue, saturation and exposition were kept at 0. Use random angles. If your objects do not change in size, disable random at yolo layers (random=0 in cfg files. It only randomizes the fact that it changes the size for training in every step). For this, I'm using Alexey darknet fork. If you have some blur object, add blur=1 in the [net] properties in cfg file (after hue). For blur you need Alexey fork and to be compiled with opencv (appart from cuda if you can).
Calculate anchors with Alexey fork. Cluster_num is the number of pairs of anchors you use. You can know it by opening your cfg and look at any anchors= line. Anchors are the size of the boxes that darknet will use to predict the positions. Cluster_num = number of anchors pairs.
Change cfg with your new anchors. If you have fixed size objects, anchors will be very close in size. I left the ones for bigger (first yolo layer) but for the second, the tinies, I modified and I even removed 1 pair. If you remove some, then change the order in mask [yolo] (in all [yolo]). Mask refer to the index of the anchors, starting at 0 index. If you remove some, change also the num= inside the [yolo].
After, detection is quite good.It could happen that if you detect on a video, there are objects that are lost in some frames. You can try to avoid this by using the lstm cfg. https://github.com/AlexeyAB/darknet/issues/3114
Now, if you also want to track them, you can apply a deep sort algorithm with your yolo pretrained network. For example, you can convert your pretrained network to keras using https://github.com/allanzelener/YAD2K (add this commit for tiny yolov3 https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3
As an alternative, you can train it with mask-rcnn or any other faster-rcnn algorithm and then look for deep-sort.
I'm a student in medical imaging. I have to construct a neural network for image segmentation. I have a data set of 285 subjects, each with 4 modalities (T1, T2, T1ce, FLAIR) + their respective segmentation ground truth. Everything is in 3D with resolution of 240x240x155 voxels (this is BraTS data set).
As we know, I cannot input the whole image on a GPU for memory reasons. I have to preprocess the images and decompose them in 3D overlapping patches (sub-volumes of 40x40x40) which I do with scikit-image view_as_windows and then serialize the windows in a TFRecords file. Since each patch overlaps of 10 voxels in each direction, these sums to 5,292 patches per volume. The problem is, with only 1 modality, I get sizes of 800 GB per TFRecords file. Plus, I have to compute their respective segmentation weight map and store it as patches too. Segmentation is also stored as patches in the same file.
And I eventually have to include all the other modalities, which would take nothing less than terabytes of storage. I also have to remember I must also sample equivalent number of patches between background and foreground (class balancing).
So, I guess I have to do all preprocessing steps on-the-fly, just before every training step (while hoping not to slow down training too). I cannot use tf.data.Dataset.from_tensors() since I cannot load everything in RAM. I cannot use tf.data.Dataset.from_tfrecords() since preprocessing the whole thing before takes a lot of storage and I will eventually run out.
The question is : what's left for me for doing this cleanly with the possibility to reload the model after training for image inference ?
Thank you very much and feel free to ask for any other details.
Pierre-Luc
Finally, I found a method to solve my problem.
I first crop a subject's image without applying the actual crop. I only measure the slices I need to crop the volume to only the brain. I then serialize all the data set images into one TFRecord file, each training example being an image modality, original image's shape and the slices (saved as Int64 feature).
I decode the TFRecords afterward. Each training sample are reshaped to the shape it contains in a feature. I stack all the image modalities into a stack using tf.stack() method. I crop the stack using the previously extracted slices (the crop then applies to all images in the stack). I finally get some random patches using tf.random_crop() method that allows me to randomly crop a 4-D array (heigh, width, depth, channel).
The only thing I still haven't figured out is data augmentation. Since all this is occurring in Tensors format, I cannot use plain Python and NumPy to rotate, shear, flip a 4-D array. I would need to do it in the tf.Session(), but I would rather like to avoid this and directly input the training handle.
For the evaluation, I serialize in a TFRecords file only one test subject per file. The test subject contains all modalities too, but since there is no TensorFLow methods to extract patches in 4-D, the image is preprocessed in small patches using Scikit-Learn extract_patches() method. I serialize these patches to the TFRecords.
This way, training TFRecords is a lot smaller. I can evaluate the test data using batch prediction.
Thanks for reading and feel free to comment !
I would like to train a new model using my own dataset. I will be
using Darkflow/Tensorflow for it.
Regarding my doubts:
(1) Should we resize our training images for a specific size?
(2) I think smaller images might save time, but can smaller images harm the accuracy?
(3) And what about the images to be predicted, should we resize them as well or is it not necessary?
(1) It already resize it with random=1 in .cfg file.The answer is "yes".The input resolution of images are same.You can resize it by yourself or Yolo can do it.
(2)If your hardware is good enough,I suggest you to use big sized images.Also as a suggest,If you will use webcam,use images as the same resolutions as your webcam uses.
(3)Yes, same as training.
(1) Yes, neural networks have fixed input dimensions. These can be adjusted to fit your purpose, but at last you need to commit to a defined input dimension, and thus you need to input your images fitting these dimensions. For YOLO I found the following:
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
It could be that the framework you are using already does that step for you. Maybe somebody could comment on that.
(3) The images / samples you feed during inference, for prediction should be as similar to the training images / samples as possible. So whatever preprocessing you re doing with your training data, you should definitely do the same on your inference data.
(2) Smaller images make sense if your hardware is not able to hold larger images in memory, or if you train with large batch sizes so that your hardware needs to hold multiple images in memory at ones. In the end, the computational time is rather proportional to the amount of operations of your architecture, not necessarily to the images size.
(1) No, it is not necessary. But if your dataset contains random resolutions, you can put
random = 1
in your .cfg file for better results.
(2) Smaller images don't reduce the time to converge, but if your dataset contains only small images, Yolo will probably fail to converge (Yolov3 is not a good detector for a lot of tiny objects)
(3) It is not necessary