I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to have segmented; the desired output would be either a 0 or 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: [image: raw input data]
Example label (what the "label"/real answer would be): [image: output data (label)]
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label that corresponds to a single category, for example, a 10-element array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that output a larger array. I assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course, your results will be highly dependent on your choice of features.
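A minimal sketch of the per-pixel feature idea above, in Python with NumPy (assuming NumPy 1.20+ for `sliding_window_view`). The function name `pixel_features`, the 5x5 patch size and the random placeholder image are illustrative choices, not part of the original answer. Each row of `X` combines a local patch with the pixel's normalized location, and `X` and `y` could then be fed to any classifier.

```python
import numpy as np

def pixel_features(image, patch_radius=2):
    """Return one feature row per pixel: the flattened (2r+1)x(2r+1) patch
    around the pixel (local feature) plus its normalized (row, col) location
    (global feature)."""
    h, w = image.shape
    r = patch_radius
    # Reflect-pad so border pixels also get full-size patches.
    padded = np.pad(image, r, mode="reflect")
    # Shape (h, w, 2r+1, 2r+1): one window centered on every original pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (2 * r + 1, 2 * r + 1))
    local = windows.reshape(h * w, -1)
    rows, cols = np.mgrid[0:h, 0:w]
    location = np.stack([rows.ravel() / (h - 1), cols.ravel() / (w - 1)], axis=1)
    return np.hstack([local, location])

# Hypothetical 50x50 grayscale image and binary mask (0 = background, 1 = nucleus).
image = np.random.rand(50, 50).astype(np.float32)
mask = (image > 0.5).astype(np.int64)
X = pixel_features(image)  # one row per pixel: 25 patch values + 2 location values
y = mask.ravel()           # one 0/1 label per pixel
```

With thousands of 50x50 images, stacking these rows gives a standard binary classification dataset with 2,500 samples per image.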
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either skip TensorFlow's optimization functions (such as backprop) entirely, or, if your computation has some differentiable variables, use TF's optimization functions to optimize those variables.
A softmax cross-entropy loss (SoftmaxWithLoss() in Caffe; tf.nn.softmax_cross_entropy_with_logits or its sparse variant in TensorFlow) works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping, N = batch*height*width; you can then use this loss or a similar loss function in TensorFlow to run the optimization.
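To illustrate the reshaping step, here is a NumPy sketch (the function name is hypothetical) that flattens [batch, height, width, channel] logits to [N, channel] and computes the mean softmax cross-entropy over all pixels. In TensorFlow, the same loss on the reshaped tensors would come from `tf.nn.sparse_softmax_cross_entropy_with_logits(labels=..., logits=...)` followed by a mean.

```python
import numpy as np

def pixelwise_softmax_cross_entropy(logits, labels):
    """logits: [batch, height, width, channels]; labels: [batch, height, width] of class ids."""
    num_classes = logits.shape[-1]
    flat_logits = logits.reshape(-1, num_classes)  # [N, channels], N = batch*height*width
    flat_labels = labels.reshape(-1)               # [N]
    # Numerically stable log-softmax over the channel axis.
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability of the true class at every pixel, averaged over all pixels.
    return -log_probs[np.arange(flat_labels.size), flat_labels].mean()

logits = np.zeros((4, 50, 50, 2))                   # batch of 4, channels: background, nucleus
labels = np.random.randint(0, 2, size=(4, 50, 50))  # placeholder ground-truth masks
loss = pixelwise_softmax_cross_entropy(logits, labels)
```

With all-zero logits both classes get probability 0.5, so the loss is ln 2 ≈ 0.693 regardless of the labels.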
See this question that may help.
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a SoftmaxWithLoss. HTH.
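One possible Keras sketch of such a model, assuming TensorFlow 2.x. Two parts go beyond the answer and are assumptions: after the convolution and downsampling layers it adds an upsampling stage so the output is back at 50x50, and it uses `SparseCategoricalCrossentropy(from_logits=True)` as the Keras counterpart of a softmax-with-loss layer. The layer sizes are arbitrary.

```python
import tensorflow as tf

def build_segmenter(height=50, width=50):
    inputs = tf.keras.Input(shape=(height, width, 1))  # normalized grayscale pixels
    # Encoder: convolution + downsampling.
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D(2)(x)  # 50x50 -> 25x25
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    # Decoder: upsample back to the input resolution so there is one output per pixel.
    x = tf.keras.layers.UpSampling2D(2)(x)  # 25x25 -> 50x50
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    # Two output channels (background, nucleus) holding per-pixel logits.
    logits = tf.keras.layers.Conv2D(2, 1)(x)
    return tf.keras.Model(inputs, logits)

model = build_segmenter()
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```

Training would then be `model.fit(images, masks, ...)` with images of shape (n, 50, 50, 1) and integer masks of shape (n, 50, 50).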
Related
I am working on a problem that requires me to build a deep learning model that, given an input image, outputs another image. It is worth noting that these two images are conceptually related, but they don't have the same dimensions.
At first I thought that a classical CNN with a final dense layer whose number of units is the product of the height and width of the output image would suit this case, but during training it gave strange figures, such as an accuracy of 0.
While looking for some answers on the Internet I discovered the concepts of CNN autoencoders and I was wondering if this approach could help me solve my problem. Among all the examples I saw, the input and output of an autoencoder had the same size and dimensions.
At this point I wanted to ask whether there is a type of CNN autoencoder that produces an output image with different dimensions from the input image.
An autoencoder (AE) is an architecture that learns to encode your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. Therefore, AEs are trained in an unsupervised way (they don't need labels): the same data is used both as the input and as the target in the loss.
You can try using a U-Net-based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network (skip connections), which should help it learn to map the inputs into a new domain faster.
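A minimal Keras sketch of a one-level U-Net (TensorFlow 2.6+ assumed for `tf.keras.layers.Resizing`), where the encoder output `e1` is concatenated into the decoder as a skip connection. Handling the different output size with a `Resizing` layer before the last convolution is an assumption on my part, not something the answer prescribes; the 64x64 input and 128x128 output sizes are placeholders.

```python
import tensorflow as tf

def tiny_unet(in_shape=(64, 64, 3), out_size=(128, 128), out_channels=1):
    inputs = tf.keras.Input(shape=in_shape)
    # Encoder
    e1 = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)  # 64x64
    p1 = tf.keras.layers.MaxPooling2D(2)(e1)                                        # 32x32
    b = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(p1)       # bottleneck
    # Decoder: upsample, then concatenate the skip connection from e1.
    u1 = tf.keras.layers.UpSampling2D(2)(b)                                         # 64x64
    d1 = tf.keras.layers.Concatenate()([u1, e1])
    d1 = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(d1)
    # Assumption: resize to the (different) target resolution, then refine with a conv.
    r = tf.keras.layers.Resizing(*out_size)(d1)
    outputs = tf.keras.layers.Conv2D(out_channels, 3, padding="same")(r)
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
```

Because the target is an image rather than a class, a pixel-wise regression loss such as mean squared error is more meaningful here than accuracy.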
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which may or may not be enough for your use case.
If you want to dig a little deeper, you can look into Disco-GAN and related methods. They explicitly try to map an image into a new domain while preserving the image information.
I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e., identity cards, bills and passports).
From what I have read, one way to do this task is clustering (since it is going to be unsupervised).
My idea is this: the clustering will be based on the similarity between images (i.e., images that have similar features will be grouped together).
I also know that this can be done using k-means.
So my problem is which features to extract and how to use images with k-means.
If anyone has done this before or has a clue about it, could you please recommend some links to start with or suggest any features that could be helpful?
The simplest way to get good results is to break the problem down into two parts:
Getting the features from the images: using the raw pixels as features will give you poor results. Pass the images through a pre-trained CNN (you can get several of those online), then use the output of the last CNN layer (just before the fully connected layers) as the image features.
Clustering the features: having obtained rich features for each image, you can cluster them (for example, with k-means).
I would recommend implementing the first and second parts (using existing implementations) in Keras and scikit-learn, respectively.
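A sketch of the two parts, assuming TensorFlow 2.x and scikit-learn. ResNet50 with `include_top=False` and `pooling="avg"` stands in for "a pre-trained CNN" (any other model from `tf.keras.applications` would do), and the function names are hypothetical. TensorFlow is imported inside `extract_features` so the clustering part can be tried on its own; `weights="imagenet"` downloads the weights on first use.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_features(images):
    """images: float array [n, 224, 224, 3] with RGB values in 0-255."""
    import tensorflow as tf  # imported here so cluster_features runs without TensorFlow
    # Pre-trained ResNet50 without its classifier; global average pooling of the
    # last convolutional block gives one 2048-d vector per image.
    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
    x = tf.keras.applications.resnet50.preprocess_input(images.copy())
    return base.predict(x, verbose=0)

def cluster_features(features, n_clusters=3, seed=0):
    # One cluster per expected document type (identity cards, bills, passports).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(features)
```

Usage would be `labels = cluster_features(extract_features(images))`, where `images` is a hypothetical float array of shape (n, 224, 224, 3).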
Label a few examples, and use classification.
Clustering is as likely to give you the clusters "images with a blueish tint", "grayscale scans" and "warm color temperature". That is a quite reasonable way to cluster such images.
Furthermore, k-means is very sensitive to outliers. And you probably have some in there.
Since you want your clusters to correspond to certain human concepts, classification is what you need to use.
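If you follow this advice, a sketch of the classification route with scikit-learn might look like the following, assuming you already have a feature vector per image (for example from a pre-trained CNN) and have hand-labeled a small subset. Logistic regression is just one reasonable choice of classifier, and the function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_document_classifier(features, labels):
    """features: [n, d] image feature vectors (e.g. from a pre-trained CNN);
    labels: [n] hand-assigned classes such as "id_card", "bill", "passport"."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```

Once trained, `clf.predict(other_features)` assigns each unlabeled image to one of the human-defined classes, rather than to whatever grouping k-means happens to find.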
I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.
My use case had images of people, so I extracted the face embedding (aka feature) vector from each image. I used dlib for the face embeddings, so each feature vector was 128-dimensional.
In general, a feature vector can be extracted from each image. A pre-trained VGG or other CNN, with its final classification layer removed, can be used for feature extraction.
A dictionary with the IMAGE_FILENAME as KEY and the FEATURE_VECTOR as VALUE can be created for all the images in the folder. This makes it easier to map each filename to its feature vector.
Then create a single feature matrix, say X, which stacks the individual feature vectors of every image in the folder/group that needs to be clustered.
In my use case, X had the dimensions (NUMBER OF IMAGES IN THE FOLDER, 128), where 128 is the size of each feature vector. For instance, with 50 images, X has shape (50, 128).
This feature matrix can then be used to fit an agglomerative hierarchical clustering model. The distance threshold parameter needs to be fine-tuned empirically.
Finally, we can write code to identify which IMAGE_FILENAME belongs to which cluster.
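The steps above might be sketched with scikit-learn (0.21+ for `distance_threshold`) as follows. The function name `cluster_images` and the filenames are hypothetical; the default Ward linkage is an assumption, and the threshold still has to be tuned empirically as described.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_images(embeddings, distance_threshold):
    """embeddings: dict mapping IMAGE_FILENAME -> 1-D feature vector (e.g. a 128-d dlib face embedding)."""
    filenames = list(embeddings)
    # Stack the per-image vectors into X with shape (number of images, vector size).
    X = np.stack([embeddings[f] for f in filenames])
    # n_clusters=None lets the distance threshold decide how many clusters form.
    model = AgglomerativeClustering(n_clusters=None, distance_threshold=distance_threshold)
    labels = model.fit_predict(X)
    # Map each filename back to its cluster id.
    return {f: int(label) for f, label in zip(filenames, labels)}
```

A larger threshold merges more images into each cluster; a smaller one splits them further.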
In my case, there were about 50 images per folder, so this was a manageable solution. This approach was able to group the images of a single person into a single cluster. For example, 15 images of PERSON1 belong to CLUSTER 0, 10 images of PERSON2 belong to CLUSTER 2, and so on…
I have a model trained on RGB image samples that takes a 31x31 pixel region as input and produces a single classification for the center pixel.
I'd like to apply this model over an entire image to recover effectively a new image of classifications for each pixel. Since this isn't a convolution, I'm not sure what the preferred way to do this is in TensorFlow.
I know this is possible by exploding the image for inference into a ton of smaller tensors but this seems like a colossal waste since each pixel will be duplicated 961 times. Is there a way around this?
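Make your model a fully convolutional neural network, so that for a 31x31 image it produces a single label and for a larger image it produces a grid of labels, one per 31x31 window (e.g. 32x32 labels for a 62x62 image if the network has no pooling or striding). This removes the redundant computation you mentioned with the windowing method, because overlapping windows share the same convolution outputs.
If the network has fully-connected layers, they can be replaced with convolutional ones: the first with a kernel the size of its input feature map, and any later ones with 1x1 kernels.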
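A Keras sketch of this idea, assuming TensorFlow 2.x and a hypothetical 4-class problem. Five 7x7 "valid" convolutions give a 31-pixel receptive field (1 + 5*6 = 31), and 1x1 convolutions stand in for the fully-connected classifier. The filter counts are arbitrary; your trained weights would need to be transferred into such a layout, or the model retrained.

```python
import tensorflow as tf

NUM_CLASSES = 4  # hypothetical number of pixel classes

def build_fcn(num_classes=NUM_CLASSES):
    # Spatial dims are None, so any image size is accepted.
    inputs = tf.keras.Input(shape=(None, None, 3))
    x = inputs
    # Five 7x7 'valid' convolutions: each shrinks the map by 6, receptive field 31x31.
    for filters in (16, 32, 32, 64, 64):
        x = tf.keras.layers.Conv2D(filters, 7, activation="relu")(x)
    # 1x1 convolutions play the role of the fully-connected classifier layers.
    x = tf.keras.layers.Conv2D(64, 1, activation="relu")(x)
    logits = tf.keras.layers.Conv2D(num_classes, 1)(x)
    return tf.keras.Model(inputs, logits)

model = build_fcn()
```

To get one label per pixel of the original image, pad the input by 15 pixels on each side first, e.g. `tf.pad(images, [[0, 0], [15, 15], [15, 15], [0, 0]], mode="REFLECT")`.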
For my Deep Learning course, I need to implement a neural network that is exactly the same as the one in the TensorFlow MNIST for Experts tutorial.
The only difference is that I need to down-sample the database and then feed it into the neural network. Should I crop and resize, or should I implement the neural network with parameters that accept multiple data sizes (28x28 and 14x14)?
All of the parameters in the TensorFlow tutorial are static, so I couldn't find a way to feed the algorithm a 14x14 image. Which tool should I use for 'optimal' down-sampling?
You need to resize the input images to a fixed size (which appears to be 14x14 from your description). There are different ways of doing this: for example, you can resize using interpolation, simply crop the central part or a corner of the image, or randomly choose one or more patches (all of the same size as your network's input) from a given image. You can also combine these methods. For example, in VGG, they first do an aspect-preserving resize using bilinear interpolation and then take a random patch from the resulting image (for the test phase they take the central crop). You can find VGG's preprocessing source code in TensorFlow at the following link:
https://github.com/tensorflow/models/blob/master/slim/preprocessing/vgg_preprocessing.py
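Two of these options for the 28x28 to 14x14 case, sketched in NumPy (function names are hypothetical): averaging each 2x2 block, which is a simple area-interpolation resize, and taking a central crop. It assumes 2-D grayscale images with even side lengths; `tf.image.resize(images, [14, 14])` would be a bilinear alternative inside TensorFlow.

```python
import numpy as np

def downsample_2x(image):
    """Average each non-overlapping 2x2 block, e.g. 28x28 -> 14x14 (needs even sides)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def center_crop(image, size):
    """Keep the central size x size region of the image."""
    h, w = image.shape
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]
```

For MNIST digits, averaging keeps the whole digit at lower resolution, whereas a 14x14 central crop of a 28x28 image discards much of the stroke information.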
The only parameters in the tutorial's sample code that need to be changed are those related to the input image size. For example, you need to change 28s to 14s and 784s to 196s (14x14 = 196); these are just examples, and there are other weight sizes that you will need to change as well.
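As a worked check of which sizes change, this snippet assumes the tutorial's layout (SAME-padded convolutions, two 2x2 max-pools with stride 2 and SAME padding, 64 filters in the second convolution) and computes the flattened input width and the input width of the first fully-connected layer. The function name is hypothetical.

```python
import math

def mnist_expert_sizes(side, conv2_filters=64):
    """Return (flattened input width, first fully-connected layer input width)
    for square inputs of the given side length."""
    flat_input = side * side                 # e.g. 784 for 28x28
    after_pool1 = math.ceil(side / 2)        # SAME-padded 2x2 pool, stride 2
    after_pool2 = math.ceil(after_pool1 / 2)
    fc1_input = after_pool2 * after_pool2 * conv2_filters
    return flat_input, fc1_input
```

For 28x28 inputs this gives 784 and 7*7*64 = 3136, matching the tutorial; for 14x14 it gives 196 and 4*4*64 = 1024, so the reshape in front of the first fully-connected layer and that layer's weight shape both change.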
I'm trying to use Keras's implementation of ResNet for a transfer learning task with a quite different set of images (B&W, 16-bit). So what does Keras expect as input? An image with 3 channels and a -128 to 127 range (which is what I assume a zero-centered 8-bit image would be)? 0-255? What would happen if I pass something outside this range?
Thanks.
According to the paper referenced in the Keras documentation, you should provide a 224 x 224 RGB image with values in [0, 255]. The actual dimension ordering depends on the backend you use in your Keras installation.
The data preparation was performed as in AlexNet, so the mean pixel value was subtracted from each color channel. The mean vector is 103.939, 116.779, 123.68 in BGR channel order (i.e. B = 103.939, G = 116.779, R = 123.68), since Keras' ResNet50 preprocessing converts images from RGB to BGR before subtracting it.
If your color values extend beyond the [-255, 255] range, this could harm your training, because the network has not seen data of that magnitude. The network can still adapt to these changes, but it usually takes more time and makes training more chaotic.
For monochrome images, a commonly used technique is to repeat the same channel 3 times so that the input dimensions match what the network architecture expects.
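A NumPy sketch of preparing one 16-bit grayscale image for Keras' ResNet50, under the assumption that a linear rescale of the full 16-bit range to 0-255 is appropriate (if your data only uses part of that range, rescaling by its actual minimum and maximum may work better). It repeats the channel 3 times and mimics the caffe-style mean subtraction; on a 3-channel 0-255 array, `tf.keras.applications.resnet50.preprocess_input` performs the same flip and subtraction. The function name is hypothetical.

```python
import numpy as np

# Caffe-style ImageNet channel means in BGR order, as used by Keras' ResNet50 preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def prepare_16bit_gray(image16):
    """image16: uint16 array [height, width]. Returns float32 [height, width, 3]."""
    # Assumption: linearly map the full 16-bit range onto the 0-255 range.
    x = image16.astype(np.float32) * (255.0 / 65535.0)
    # Repeat the single channel 3 times to match the expected RGB input shape.
    x = np.repeat(x[..., np.newaxis], 3, axis=-1)
    # Flip channels to BGR and subtract the per-channel means.
    return (x[..., ::-1] - IMAGENET_BGR_MEAN).astype(np.float32)
```

For a batch, stack the results into an array of shape (n, 224, 224, 3) after resizing each image to 224x224.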