CNN for 3D image segmentation with different size

I am working on a 3D image segmentation task, but the length of the z-axis differs from image to image. For convolutional neural networks, I think the length should be the same for all images. How can I handle this?

I'm not sure what you're working with, but you should transform your input before passing it through your network (e.g. in a data loader). In this case, one of the transforms should be a resize operation (to the appropriate dimensions).
https://pytorch.org/vision/stable/transforms.html
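
As a rough illustration of such a resize step, here is a minimal NumPy sketch that resamples every volume to a common depth with nearest-neighbour sampling. The function name `resize_depth`, the target depth of 48 and the (depth, height, width) layout are assumptions for the example, not something from the question. Nearest-neighbour is used because it also keeps label masks integer-valued; for the images themselves a smoother interpolation may be preferable.

```python
import numpy as np

def resize_depth(volume, target_depth):
    """Resample a (depth, height, width) volume to target_depth slices.

    Nearest-neighbour sampling along the first axis only; height and
    width are left untouched.
    """
    depth = volume.shape[0]
    # Map the centre of each output slice to the nearest input slice.
    idx = np.floor((np.arange(target_depth) + 0.5) * depth / target_depth)
    idx = np.clip(idx.astype(int), 0, depth - 1)
    return volume[idx]

# Hypothetical volumes with different z-lengths.
volumes = [np.random.rand(d, 64, 64) for d in (40, 55, 73)]
batch = np.stack([resize_depth(v, 48) for v in volumes])

# The same function can be applied to an integer segmentation mask.
mask = np.random.randint(0, 3, size=(55, 64, 64))
resized_mask = resize_depth(mask, 48)
```

After this step all volumes share one shape and can be stacked into a batch. Padding or cropping to a fixed depth would be an alternative that avoids changing the voxel spacing.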

Related

convolution neural network image size distortion

I have two data sets of images. One contains perfectly square images, so resizing them to 224x224 for a CNN will not cause any distortion; the other is not square, so resizing to 224x224 will distort the images.
I will split the sets into training and validation. Is this a good way to train the model? Will there be any bias in the model?
I am afraid the model will learn to identify the distortion rather than the real differences between the two sets.

If you want to preserve your data, you can take random square crops instead of resizing, so that the model looks at an undistorted part of each image. Saving the cropped images to disk increases the size of your data set, whereas applying a random crop function in the data loader streamlines the process. Cropping is also a good augmentation technique when preprocessing the data.
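
A minimal sketch of such a random square crop, written in plain NumPy rather than with any particular data loader. The helper name `random_square_crop` and the 300x480 image size are made up for the example. The crop side equals the shorter image side, so a later resize to 224x224 does not change the aspect ratio.

```python
import numpy as np

def random_square_crop(image, rng):
    """Return a random square crop whose side is the image's shorter side."""
    h, w = image.shape[:2]
    side = min(h, w)
    # Random top-left corner such that the crop stays inside the image.
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    return image[top:top + side, left:left + side]

rng = np.random.default_rng(0)
image = np.zeros((300, 480, 3), dtype=np.uint8)  # hypothetical non-square image
crop = random_square_crop(image, rng)
```

Calling the function again on the same image gives a different crop, which is what makes it usable as augmentation.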

TensorFlow tf.data.Dataset API for medical imaging

I'm a student in medical imaging. I have to construct a neural network for image segmentation. I have a data set of 285 subjects, each with 4 modalities (T1, T2, T1ce, FLAIR) plus their respective segmentation ground truth. Everything is in 3D with a resolution of 240x240x155 voxels (this is the BraTS data set).
As we know, I cannot fit the whole image on a GPU for memory reasons. I have to preprocess the images and decompose them into overlapping 3D patches (sub-volumes of 40x40x40), which I do with scikit-image's view_as_windows, and then serialize the windows to a TFRecords file. Since the windows are taken with a step of 10 voxels in each direction, this adds up to 5,292 patches per volume. The problem is that, with only 1 modality, I get sizes of 800 GB per TFRecords file. On top of that, I have to compute the corresponding segmentation weight maps and store them as patches too. The segmentation is also stored as patches in the same file.
Eventually I also have to include all the other modalities, which would take nothing less than terabytes of storage. I must also remember to sample an equal number of patches from background and foreground (class balancing).
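
The figures quoted above can be checked with a little arithmetic. The sketch below assumes a window step of 10 voxels (which is what reproduces the 5,292 patches) and, purely as a guess, 8 bytes per voxel (float64) with no serialization overhead; under those assumptions the total for one modality comes out in the same range as the reported 800 GB.

```python
def windows_along_axis(size, window, step):
    """Number of full windows of length `window` that fit with a given step."""
    return (size - window) // step + 1

volume_shape = (240, 240, 155)
window, step = 40, 10          # step of 10 is an assumption
counts = [windows_along_axis(s, window, step) for s in volume_shape]
patches_per_volume = counts[0] * counts[1] * counts[2]

bytes_per_voxel = 8            # assumption: float64 storage
subjects = 285
total_bytes = patches_per_volume * window ** 3 * bytes_per_voxel * subjects

print(counts)                  # [21, 21, 12]
print(patches_per_volume)      # 5292
print(round(total_bytes / 1e9, 1), "GB")  # 772.2 GB
```

Because the windows overlap heavily, each voxel is stored many times over, which is why the serialized patches are so much larger than the raw volumes.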
So I guess I have to do all preprocessing steps on the fly, just before every training step (while hoping not to slow down training too much). I cannot use tf.data.Dataset.from_tensors() since I cannot load everything into RAM. I cannot use tf.data.TFRecordDataset either, since preprocessing the whole data set beforehand takes a lot of storage and I will eventually run out.
The question is: what options are left for doing this cleanly, with the possibility of reloading the model after training for inference on images?
Thank you very much and feel free to ask for any other details.
Pierre-Luc

Finally, I found a method to solve my problem.
I first compute the crop for a subject's image without actually applying it: I only measure the slices needed to crop the volume down to the brain. I then serialize all the data set images into one TFRecord file, each training example being an image modality, the original image's shape and the slices (saved as an Int64 feature).
I decode the TFRecords afterward. Each training sample is reshaped to the shape stored in its feature. I stack all the image modalities using the tf.stack() method. I crop the stack using the previously extracted slices (the crop then applies to all images in the stack). Finally, I get some random patches using the tf.random_crop() method, which allows me to randomly crop a 4-D array (height, width, depth, channel).
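
The steps above can be sketched in NumPy as follows. This is only an illustration of the logic, not the TensorFlow pipeline itself: the helper names, the synthetic 60x60x40 volumes and the 16x16x16 patch size are all invented for the example, and the brain is assumed to be the non-zero region of the first modality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four hypothetical modalities of one subject, zero outside a "brain" box.
shape = (60, 60, 40)
brain = np.zeros(shape, dtype=bool)
brain[10:50, 8:52, 5:35] = True
modalities = [(rng.random(shape).astype(np.float32) + 1.0) * brain
              for _ in range(4)]

def bounding_slices(volume):
    """Measure the slices enclosing all non-zero voxels (no cropping yet)."""
    nonzero = np.nonzero(volume)
    return tuple(slice(int(ax.min()), int(ax.max()) + 1) for ax in nonzero)

def random_patch(volume, patch_size, rng):
    """Cut one random patch from the three spatial axes, keeping channels."""
    starts = [rng.integers(0, dim - p + 1)
              for dim, p in zip(volume.shape[:3], patch_size)]
    region = tuple(slice(s, s + p) for s, p in zip(starts, patch_size))
    return volume[region]

# 1. Measure the crop once; these slices are what would be serialized.
slices = bounding_slices(modalities[0])
# 2. Stack the modalities on a trailing channel axis and crop them together.
stack = np.stack(modalities, axis=-1)
cropped = stack[slices]
# 3. Draw a random training patch from the cropped stack.
patch = random_patch(cropped, (16, 16, 16), rng)
```

Storing only the slices and the uncropped volumes keeps the serialized file small, since patches are generated at read time instead of being written out.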
The only thing I still haven't figured out is data augmentation. Since all of this happens on tensors, I cannot use plain Python and NumPy to rotate, shear or flip a 4-D array. I would need to do it in the tf.Session(), but I would rather avoid this and directly input the training handle.
For the evaluation, I serialize only one test subject per TFRecords file. The test subject contains all modalities too, but since there are no TensorFlow methods to extract patches in 4-D, the image is preprocessed into small patches using the Scikit-Learn extract_patches() method. I serialize these patches to the TFRecords.
This way, the training TFRecords file is a lot smaller. I can evaluate the test data using batch prediction.
Thanks for reading and feel free to comment !

Striding a 2D model over an image to produce image of labels (NOT convolution)

I have a model trained on RGB image samples that takes a 31x31 pixel region as input and produces a single classification for the center pixel.
I'd like to apply this model over an entire image to effectively recover a new image of classifications, one for each pixel. Since this isn't a convolution, I'm not sure what the preferred way to do this is in TensorFlow.
I know this is possible by exploding the image for inference into a ton of smaller tensors but this seems like a colossal waste since each pixel will be duplicated 961 times. Is there a way around this?

Make your model a fully convolutional neural network, so that for a 31x31 image it produces a single label and for a larger image it produces a grid of labels (for example, 2x2 labels for a 62x62 image; the exact grid size depends on the network's strides). This removes the redundant computation of the windowing method that you mentioned.
If the network has fully connected layers, they can be replaced with convolutional ones: the first with a kernel covering the whole feature map it used to receive, and any later ones with 1x1 kernels.
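
To make the equivalence concrete, here is a small NumPy sketch with a toy single-layer classifier (not the asker's model). It applies a fully connected layer to every k x k patch in a loop, then applies the same weights as a k x k kernel slid over the whole image, and checks that both give the same per-pixel scores. The patch size is 5 instead of 31 to keep the demo fast; the weights, image size and class count are arbitrary.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
k, n_classes = 5, 3
image = rng.normal(size=(12, 12))

# Hypothetical fully connected layer acting on a flattened k x k patch.
fc_weights = rng.normal(size=(k * k, n_classes))
fc_bias = rng.normal(size=n_classes)

# Patch-by-patch inference: one forward pass per output pixel.
out_h, out_w = image.shape[0] - k + 1, image.shape[1] - k + 1
naive = np.empty((out_h, out_w, n_classes))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + k, j:j + k]
        naive[i, j] = patch.reshape(-1) @ fc_weights + fc_bias

# Same weights reshaped into a k x k kernel with n_classes output channels.
kernel = fc_weights.reshape(k, k, n_classes)
windows = sliding_window_view(image, (k, k))  # a view, no pixel duplication
conv = np.einsum("ijab,abc->ijc", windows, kernel) + fc_bias

print(np.allclose(naive, conv))
```

In a deep network the saving comes from the earlier convolutional layers, whose outputs are shared between neighbouring windows instead of being recomputed for each patch.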

Down-sampling MNIST dataset for CNN

For my Deep Learning course, I need to implement a neural network that is exactly the same as the one in the TensorFlow MNIST for Experts tutorial.
The only difference is that I need to down-sample the data set before feeding it into the neural network. Should I crop and resize, or should I implement the neural network with parameters that accept multiple input sizes (28x28 and 14x14)?
All of the parameters in the TensorFlow tutorial are static, so I couldn't find a way to feed the algorithm a 14x14 image. Which tool should I use for 'optimal' down-sampling?

You need to resize the input images to a fixed size (which appears to be 14x14 from your description). There are different ways of doing this: for example, you can resize using interpolation, simply crop the central part or some corner of the image, or randomly choose one or more patches (all of the same size as your network's input) from a given image. You can also combine these methods. For example, in VGG, they first do an aspect-preserving resize using bilinear interpolation and then take a random patch from the resulting image (for the test phase they take the central crop). You can find VGG's preprocessing source code in TensorFlow at the following link:
https://github.com/tensorflow/models/blob/master/slim/preprocessing/vgg_preprocessing.py
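
Two of the options listed above, sketched in NumPy for a single 28x28 image. The 2x2 block average is used here as a simple stand-in for interpolation-based resizing (it is not the bilinear resize from the VGG code), and the function names are hypothetical.

```python
import numpy as np

def downsample_2x(image):
    """Halve each side by averaging non-overlapping 2x2 blocks."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def central_crop(image, size):
    """Keep the central size x size region of the image."""
    h, w = image.shape
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

digit = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)  # dummy image
small = downsample_2x(digit)      # whole digit at half resolution
cropped = central_crop(digit, 14) # central part at full resolution
```

For MNIST the down-sampled version keeps the whole digit, whereas a 14x14 central crop may cut parts of it off, so the choice affects what the network can see.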
The only parameters of the tutorial's sample code that need to be changed are those related to the input image size. For example, you need to change 28s to 14s and 784s to 196s (14 x 14 = 196; these are just examples, and there are other weight sizes that you will need to change as well).
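
The affected sizes can be worked out as below. The sketch assumes the tutorial's network uses two 2x2 max-pooling layers with stride 2 and 'SAME' padding and has 64 channels before the first fully connected layer; if your copy of the tutorial differs, adjust the arguments accordingly.

```python
import math

def tutorial_sizes(side, n_pool=2, channels=64):
    """Return (flattened input size, input size of the first dense layer)."""
    flat_input = side * side
    pooled = side
    for _ in range(n_pool):
        # 'SAME' padding with stride 2 rounds the output size up.
        pooled = math.ceil(pooled / 2)
    return flat_input, pooled * pooled * channels

print(tutorial_sizes(28))  # (784, 3136), i.e. 7 * 7 * 64
print(tutorial_sizes(14))  # (196, 1024), i.e. 4 * 4 * 64
```

So besides the input placeholder, the weight matrix of the first fully connected layer is one of the "other weight sizes" that has to change.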

Image Segmentation with TensorFlow

I am trying to assess the feasibility of using TensorFlow to identify features in my image data. I have 50x50px grayscale images of nuclei that I would like to have segmented; the desired output would be either 0 or 1 for each pixel: 0 for the background, 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label that corresponds to a single category, for example a 10-number array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the CPU processing time for this type of analysis?

Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question), global features (such as the pixel's location in the image), or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course, your results will be highly dependent upon your choice of features.
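
One way the per-pixel dataset described above might be built, sketched in NumPy. The 5x5 patch size, the reflect padding at the borders and the synthetic image and mask are assumptions for the example; each row of `X` holds the patch around one pixel plus its normalized row and column, and `y` holds that pixel's 0/1 label.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pixel_features(image, patch=5):
    """Build one feature row per pixel: local patch + normalized location."""
    h, w = image.shape
    half = patch // 2
    # Pad so that border pixels also get a full patch.
    padded = np.pad(image, half, mode="reflect")
    windows = sliding_window_view(padded, (patch, patch))  # (h, w, patch, patch)
    local = windows.reshape(h * w, patch * patch)
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    location = np.stack([rows.ravel() / (h - 1), cols.ravel() / (w - 1)], axis=1)
    return np.hstack([local, location])

rng = np.random.default_rng(0)
image = rng.random((50, 50))              # stand-in for a 50x50 grayscale image
mask = (image > 0.5).astype(np.int64)     # stand-in for the 0/1 label image

X = pixel_features(image)   # shape (2500, 27)
y = mask.reshape(-1)        # shape (2500,)
```

`X` and `y` can then be fed to any binary classifier, and the predictions reshaped back to 50x50 to obtain the mask.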
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either skip TensorFlow's optimization functions such as backprop or, if your computation contains some differentiable variables, use TF's optimization functions to optimize those variables.

SoftmaxWithLoss() works for your image segmentation problem if you reshape the predicted label map and the true label map from [batch, height, width, channel] to [N, channel].
In your case, the final predicted map will have channel = 2, and after reshaping, N = batch * height * width; you can then use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
See this question that may help.
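
The reshape described above can be sketched in NumPy as follows. This is a hand-written softmax cross-entropy for illustration, not a TensorFlow or Caffe API. It uses integer class labels of shape [N] instead of a one-hot [N, channel] map, which gives the same loss; the batch size and random inputs are arbitrary.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape [N, C] and integer labels [N]."""
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
batch, height, width, channels = 4, 50, 50, 2
logits = rng.normal(size=(batch, height, width, channels))     # network output
labels = rng.integers(0, channels, size=(batch, height, width))  # 0/1 masks

# Collapse batch and spatial axes so that every pixel is one sample.
flat_logits = logits.reshape(-1, channels)   # [N, channel], N = 4 * 50 * 50
flat_labels = labels.reshape(-1)             # [N]
loss = softmax_cross_entropy(flat_logits, flat_labels)
```

Treating each pixel as an independent sample is what lets an ordinary classification loss be reused for segmentation.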

Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a SoftmaxWithLoss. HTH.
