![sample training input]http://www.cs.toronto.edu/~vmnih/data/mass_roads/train/sat/10078660_15.tiff
![sample training output]
http://www.cs.toronto.edu/~vmnih/data/mass_roads/train/map/10078660_15.tif
I am a beginner to CNN , and have worked with the MNIST dataset in which we input 28x28x3 images and output a 10x1 vector containing probabilities of the 10 classes(0,1,2,3---,9).
How do we extract only the road pixels from the input image and display them, as is represented by the output image?
This problems is a binary segmentation problem. In a sense you learn a mapping from satellite images and predict for each pixel, iff this pixel is part of the road. A simple algorithm to do this would be to check if the pixel color is part of some range.
A CNN naturally will learn you a more complicated function based on the local neighborhood of said pixel. One repo to get you started should be this one: https://github.com/jocicmarko/ultrasound-nerve-segmentation. Therein they use a similar algorithm to segment ultrasound images using CNNs. You just have to use 3 input channels instead of 1 and everything else should be quite similar.
Related
There is a tutorial on the web for drawing bounding boxes using R-CCN, where a VGG16 network is modified for this task (using transfer learning take advantage that the inner layers are trained already.).
The edit consists on:
removing the classification layer
using a regression layer instead
The training involves images for inputs and [x1,y1,x2,y2] labeled outputs, each pair being a corner of an image, i.e a description of a square box around the object we want to detect.
I have tried it, and so far didn't have luck for the coordinates predicted. So my questions are:
Is the procedure of editing the CNN to create an R-CNN that outputs the vector (also in link at the top) a correct approach for predicting a bounding box for a specific object ?
I am trying with Mobile Net because it is lighter, so assuming 1. is correct, would this also be a "logically similar" idea?
I have a simple question to some of you. I have worked on some image classification tutorials. Only the simpler ones like MNIST dataset. Then I noticed that they do this
train_images = train_images / 255.0
Now I know that every value from the matrix (which is the image) gets divided by 255.0. If I remember correctly this is called normalization right? (please correct me if I am wrong otherwise tell me that I am right).
I'm just curious is there a "BETTER WAY","ANOTHER WAY" or "THE BEST WAY" to pre-process or clean images then those cleaned images will be fed to the network for training.
Please if you would like to provide a sample source code. Please! be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the followings:
normalisation: which you already mentioned
reshaping into uniform resolution (img height x img width): higher resoltuion leads to better learning and smaller resolution may lose important features. Some models have default input size that you can refer to. Also an average size of all images can be used too.
color channel: 1 refers to gray-scale and 3 refers rgb-scale. Depending on your application you can set this.
data augmentation: if your model is overfitting or your dataset is small, you can reproduce your dataset by altering original images (flipping, rotating, cropping, zooming..) to increase your dataset
image segmentation: segmentation can be performed to highlight the area or boundaries that may benefit your application. For example, in medical image classification, some part of body maybe masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I have reshaped the images and made them gray-scale. Then I performed image segmentation to highlight the lungs in the images. And I normalised the image pixels to put into my classification model. Depending on your application, there may be other more pre-processing techniques you might want to consider.
I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e identity cards, bills and passports).
As I read about that, one of the ways to do this task is clustering (since it is going to be unsupervised).
The idea for me is like this: the clustering will be based on the similarity between images (i.e images that have similar features will be grouped together).
I know also that this process can be done by using k-means.
So the problem for me is about features and using images with K-means.
If anyone has done this before, or has a clue about it, please would you recommend some links to start with or suggest any features that can be helpful.
Most simple way to get good results will be to break down the problem into two parts :
Getting the features from the images: Using the raw pixels as features will give you poor results. Pass the images through a pre trained CNN(you can get several of those online). Then use the last CNN layer(just before the fully connected) as the image features.
Clustering of features : Having got the rich features for each image, you can do clustering on these(like K-means).
I would recommend implementing(using already implemented) 1, 2 in Keras and Sklearn respectively.
Label a few examples, and use classification.
Clustering is as likely to give you the clusters "images with a blueish tint", "grayscale scans" and "warm color temperature". That is a quote reasonable way to cluster such images.
Furthermore, k-means is very sensitive to outliers. And you probably have some in there.
Since you want your clusters correspond to certain human concepts, classification is what you need to use.
I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.
My use case had images of People, so I had extracted the Face Embedding (aka Feature) Vector from each image. I have used dlib for face embedding and so each feature vector was 128d.
In general, the feature vector of each image can be extracted. A pre-trained VGG or CNN network, with its final classification layer removed; can be used for feature extraction.
A dictionary with KEY as the IMAGE_FILENAME and VALUE as the FEATURE_VECTOR can be created for all the images in the folder. This will make the co-relation between the filename and it’s feature vector easier.
Then create a single feature vector say X, which comprises of individual feature vectors of each image in the folder/group which needs to be clustered.
In my use case, X had the dimension as : NUMBER OF IMAGE IN THE FOLDER, 128 (i.e SIZE OF EACH FEATURE VECTOR). For instance, Shape of X : 50,128
This feature vector can then be used to fit an Agglomerative Hierarchical Cluster. One needs to fine tune the distance threshold parameter empirically.
Finally, we can write a code to identify which IMAGE_FILENAME belongs to which cluster.
In my case, there were about 50 images per folder so this was a manageable solution. This approach was able to group image of a single person into a single clusters. For example, 15 images of PERSON1 belongs to CLUSTER 0, 10 images of PERSON2 belongs to CLUSTER 2 and so on…
I am new by tensorflow. I want to write a Neural network, that gets noisy images from a file and uncorrupted images from another file.
then I want to correct noisy images based on the other images.
What you are talking about is a denoising autoencoder.
This is not my code. It was ranked very high on google search, has several github stars and forkes, all which are great indicators that it is a working and supported implementation.
Actually, I'm trying to train a NN that get corrupted images and based on them the grand truth, remove noise from that images.It must be Network in Network, an another word pixels independent.
I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to have segmented- the desired output would be either a 0 or 1 for each pixel. 0 for the background, 1 as the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label correspond to a single category, for example, a 10 number array [0,0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that would output a larger array. I would assume I the label would be a 50x50 array?
Also, any ideas on the processing CPU time for this time of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course your results will be highly dependant upon your choice of features.
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either not make use of TensorFlow's optimization functions such as backprop or if there are some differentiable variables in your computation you could use TF's optimization functions to optimize those variables.
SoftmaxWithLoss() works for your image segmentation problem, if you reshape the predicted label and true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will be channel = 2, and after reshaping, N = batchheightwidth, then you can use SoftmaxWithLoss() or similar loss function in tensorflow to run the optimization.
See this question that may help.
Try using a convolutional filters for the model. A stacking of convolution and downsampling layers. The input should be the normalized pixel image and output should be the mask. The last layer should be a softmaxWithLoss. HTH.