For my deep learning course, I need to implement a neural network that is exactly the same as the one in the TensorFlow MNIST for Experts tutorial.
The only difference is that I need to down-sample the dataset before feeding it into the network. Should I crop and resize, or should I implement the network with parameters that accept multiple data sizes (28x28 and 14x14)?
All of the parameters in the TensorFlow tutorial are static, so I couldn't find a way to feed the algorithm a 14x14 image. Which tool should I use for 'optimal' down-sampling?
You need to resize the input images to a fixed size (which appears to be 14x14 from your description). There are different ways of doing this: you can use interpolation to resize, simply crop the central part or some corner of the image, or randomly choose one or more patches (all of the same size as your network's input) from a given image. You can also combine these methods. For example, in VGG they first do an aspect-preserving resize using bilinear interpolation and then take a random patch from the resulting image (for the test phase they take the central crop). You can find VGG's preprocessing source code in TensorFlow at the following link:
https://github.com/tensorflow/models/blob/master/slim/preprocessing/vgg_preprocessing.py
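For instance, a minimal sketch of the interpolation/cropping options above using tf.image (TF 1.x names; `image` is assumed to be a single [28, 28, 1] MNIST tensor):

    import tensorflow as tf

    # Bilinear interpolation down to 14x14
    resized = tf.image.resize_images(image, [14, 14],
                                     method=tf.image.ResizeMethod.BILINEAR)

    # Central crop keeping the middle half of each side (28 -> 14)
    central = tf.image.central_crop(image, central_fraction=0.5)

    # A random 14x14 patch
    patch = tf.random_crop(image, size=[14, 14, 1])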
The only parameters of the sample code in the tutorial you mentioned that need to be changed are those related to the input image size. For example, you need to change the 28s to 14s and the 784s to 196s (these are just examples; there are other weight sizes you will need to change as well).
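As a rough sketch of the kind of changes meant here (against the TF 1.x "Deep MNIST for Experts" style code; exact variable names may differ in your version):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 196])   # was [None, 784]; 14*14 = 196
    x_image = tf.reshape(x, [-1, 14, 14, 1])      # was [-1, 28, 28, 1]

    # With two 2x2 max-pool layers and SAME padding, 14 -> 7 -> 4, so the first
    # fully connected weight becomes [4*4*64, 1024] instead of [7*7*64, 1024].
    W_fc1 = tf.Variable(tf.truncated_normal([4 * 4 * 64, 1024], stddev=0.1))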
Related
I am working on a problem that requires me to build a deep learning model that takes a certain input image and outputs another image. It is worth noting that these two images are conceptually related, but they don't have the same dimensions.
At first I thought that a classical CNN with a final dense layer whose size is the product of the output image's height and width would suit this case, but during training it was giving strange figures, such as an accuracy of 0.
While looking for some answers on the Internet I discovered the concepts of CNN autoencoders and I was wondering if this approach could help me solve my problem. Among all the examples I saw, the input and output of an autoencoder had the same size and dimensions.
At this point I wanted to ask if there is a type of CNN autoencoder that produces an output image with different dimensions than the input image.
An autoencoder (AE) is an architecture that tries to encode your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. AEs are therefore trained unsupervised (no labels needed): the same data is used both as the input and as the target in the loss.
You can try using a U-Net-based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network, which should help with faster learning/mapping of the inputs into a new domain.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which might or might not be enough for your use-case.
If you want to dig a little deeper, you can look into DiscoGAN and related methods. They explicitly try to map images into a new domain while preserving image information.
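As a minimal illustration of the "different output size" part (a plain Keras encoder-decoder, not a full U-Net; the 64x64 input and 32x32 output sizes are arbitrary examples):

    from tensorflow.keras import layers, models

    inp = layers.Input(shape=(64, 64, 1))
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)         # 64 -> 32
    x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)           # 32 -> 16
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)  # 16 -> 32
    out = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)                  # 32x32x1 output

    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss='mse')  # pixel-wise regression loss; accuracy is not meaningful here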
I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching the net, I've found several web pages with Keras code using customized layers for custom object classification. In this case, I've understood that you need to provide images where your object is alone. Although the training is good and it classifies them properly, unfortunately I haven't found how to later load this trained network to find objects in my big images.
On the other hand, I have found that I can do this using the DNN module in OpenCV if I load the weights from the YOLOv3 network. In this case I provide the big images with the proper annotations, but the network is not well trained...
Given this context, could someone show me how to load weights trained with a customized network into OpenCV's DNN module, and how to train that network?
After a lot of search, I've found a better approach:
Cut your images into sub-images (I cut them into 2 rows and 4 columns; see the tiling sketch after these steps).
Feed YOLO with these sub-images and their proper annotations. I used tiny YOLOv3 with a network size of 960x960 for 10k steps. In my case, intensity and color were important, so random parameters such as hue, saturation, and exposure were kept at 0. Use random angles. If your objects do not change in size, disable the random option in the [yolo] layers (random=0 in the cfg files; it only randomizes the network size used for training at every step). For this I'm using AlexeyAB's darknet fork. If you have some blurred objects, add blur=1 in the [net] section of the cfg file (after hue). For blur you need Alexey's fork compiled with OpenCV (apart from CUDA, if you can).
Calculate anchors with Alexey's fork. cluster_num is the number of anchor pairs you use; you can find it by opening your cfg and looking at any anchors= line. Anchors are the sizes of the boxes that darknet will use to predict the positions.
Change the cfg with your new anchors. If you have fixed-size objects, the anchors will be very close in size. I left the ones for bigger objects (first [yolo] layer), but for the second layer, the tiny ones, I modified them and even removed one pair. If you remove some, then change the mask= indices in every [yolo] section (mask refers to the indices of the anchors, starting at 0) and also change num= inside each [yolo].
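As a sketch of the tiling in step 1 (plain NumPy; the function name and the 2x4 grid are just examples, and you also have to remap the annotations to each tile):

    import numpy as np

    def split_into_tiles(image, rows=2, cols=4):
        """Split an HxWxC image into rows*cols non-overlapping tiles."""
        h, w = image.shape[:2]
        tile_h, tile_w = h // rows, w // cols
        tiles = []
        for r in range(rows):
            for c in range(cols):
                tiles.append(image[r * tile_h:(r + 1) * tile_h,
                                   c * tile_w:(c + 1) * tile_w])
        return tiles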
After this, detection is quite good. It can happen that if you detect on a video, some objects are lost in some frames. You can try to avoid this by using the LSTM cfg: https://github.com/AlexeyAB/darknet/issues/3114
Now, if you also want to track the objects, you can apply a Deep SORT algorithm with your pretrained YOLO network. For example, you can convert your pretrained network to Keras using https://github.com/allanzelener/YAD2K (add this commit for tiny YOLOv3: https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3
As an alternative, you can train with Mask R-CNN or any other Faster R-CNN-based algorithm and then apply Deep SORT.
I'm a student in medical imaging. I have to construct a neural network for image segmentation. I have a data set of 285 subjects, each with 4 modalities (T1, T2, T1ce, FLAIR) plus their respective segmentation ground truth. Everything is in 3D with a resolution of 240x240x155 voxels (this is the BraTS data set).
As we know, I cannot feed the whole image to the GPU for memory reasons. I have to preprocess the images and decompose them into overlapping 3D patches (sub-volumes of 40x40x40), which I do with scikit-image's view_as_windows, and then serialize the windows into a TFRecords file. Since each patch overlaps by 10 voxels in each direction, this sums to 5,292 patches per volume. The problem is that, with only 1 modality, I get sizes of 800 GB per TFRecords file. Plus, I have to compute the respective segmentation weight maps and store them as patches too. The segmentation is also stored as patches in the same file.
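For reference, a small sketch of that patch extraction (skimage's view_as_windows; a step of 30 corresponds to a 10-voxel overlap for 40-voxel windows):

    import numpy as np
    from skimage.util import view_as_windows

    volume = np.zeros((240, 240, 155), dtype=np.float32)               # one modality (placeholder data)
    windows = view_as_windows(volume, window_shape=(40, 40, 40), step=30)
    patches = windows.reshape(-1, 40, 40, 40)                          # one row per 40x40x40 patch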
And I eventually have to include all the other modalities, which would take nothing less than terabytes of storage. I also have to remember to sample an equivalent number of patches between background and foreground (class balancing).
So, I guess I have to do all preprocessing steps on the fly, just before every training step (while hoping not to slow down training too much). I cannot use tf.data.Dataset.from_tensors() since I cannot load everything in RAM. I cannot use tf.data.TFRecordDataset() either, since preprocessing the whole thing beforehand takes a lot of storage and I will eventually run out.
The question is: what options are left for doing this cleanly, with the possibility of reloading the model after training for image inference?
Thank you very much and feel free to ask for any other details.
Pierre-Luc
Finally, I found a method to solve my problem.
I first compute a subject's crop without actually applying it: I only record the slices I need to crop the volume down to just the brain. I then serialize all the data set images into one TFRecords file, each training example being an image modality, the original image's shape, and the slices (saved as an Int64 feature).
I decode the TFRecords afterward. Each training sample is reshaped to the shape stored in its feature. I stack all the image modalities using the tf.stack() method. I crop the stack using the previously extracted slices (the crop then applies to all images in the stack). Finally, I get random patches using the tf.random_crop() method, which allows me to randomly crop a 4-D array (height, width, depth, channel).
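A rough sketch of that decode step (TF 1.x style; the feature names 'image_raw', 'shape' and 'slices' are made up for illustration):

    import tensorflow as tf

    def parse_example(serialized):
        features = tf.parse_single_example(
            serialized,
            features={
                'image_raw': tf.FixedLenFeature([], tf.string),  # one modality, stored as bytes
                'shape': tf.FixedLenFeature([3], tf.int64),      # original volume shape
                'slices': tf.FixedLenFeature([6], tf.int64),     # crop bounds (z0, z1, y0, y1, x0, x1)
            })
        volume = tf.decode_raw(features['image_raw'], tf.float32)
        volume = tf.reshape(volume, tf.cast(features['shape'], tf.int32))
        s = tf.cast(features['slices'], tf.int32)
        return volume[s[0]:s[1], s[2]:s[3], s[4]:s[5]]           # crop down to the brain region

    # After decoding each modality, stack them and take a random 4-D patch:
    # stack = tf.stack([t1, t2, t1ce, flair], axis=-1)   # (height, width, depth, channel)
    # patch = tf.random_crop(stack, size=[40, 40, 40, 4])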
The only thing I still haven't figured out is data augmentation. Since all of this happens on Tensors, I cannot use plain Python and NumPy to rotate, shear, or flip a 4-D array. I would need to do it inside the tf.Session(), but I would rather avoid this and directly input the training handle.
For the evaluation, I serialize only one test subject per TFRecords file. The test subject contains all modalities too, but since there is no TensorFlow method to extract patches in 4-D, the image is preprocessed into small patches using Scikit-Learn's extract_patches() method. I serialize these patches to the TFRecords.
This way, the training TFRecords file is a lot smaller. I can evaluate the test data using batch prediction.
Thanks for reading and feel free to comment !
I am new to deep learning and am using Keras to learn it. I followed the instructions at this link to build a handwritten digit recognition classifier using the MNIST dataset. It worked fine in terms of producing comparable evaluation results. I used TensorFlow as the backend for Keras.
Now I want to read an image file containing a handwritten digit and predict the digit using the same model. I think the image needs to be transformed into 28x28 dimensions with 8-bit (0-255) pixel values first? I am not sure whether my understanding is correct to begin with. If so, how can I do this transformation in Python? If my understanding is incorrect, what kind of transformation is required?
Thank you in advance!
To my knowledge, you will need to turn this into a 28x28 grayscale image in order to work with it in Python. That's the same shape and scheme as the images that were used to train on MNIST, and the input tensors all expect 784-sized (28 * 28) items, with each pixel value between 0 and 255.
To resize an image you could use PIL or Pillow. See this SO post or this page in the Pillow docs (linked to by Wtower in the previously mentioned post, copied here for ease of access) on resizing while keeping the aspect ratio, if that's what you want to do.
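A minimal sketch of that preprocessing (Pillow + NumPy; `model` is assumed to be the Keras classifier you trained, and 'digit.png' is just an example file name):

    import numpy as np
    from PIL import Image

    img = Image.open('digit.png').convert('L')     # grayscale, like the MNIST images
    img = img.resize((28, 28))                     # match the training image size
    x = np.asarray(img).astype('float32') / 255.0  # scale the 0-255 pixel values to 0-1
    x = x.reshape(1, 784)                          # batch of one flattened 784-vector
                                                   # (use (1, 28, 28, 1) for a conv model)
    # Note: MNIST digits are light on a dark background; you may need to invert
    # your image (x = 1.0 - x) if it is dark on white.

    print(model.predict(x).argmax())               # predicted digit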
HTH!
Cheers,
-Maashu
I am trying to see the feasibility of using TensorFlow to identify features in my image data. I have 50x50 px grayscale images of nuclei that I would like to have segmented: the desired output would be either a 0 or a 1 for each pixel, 0 for the background and 1 for the nucleus.
Example input: raw input data
Example label (what the "label"/real answer would be): output data (label)
Is it even possible to use TensorFlow to perform this type of machine learning on my dataset? I could potentially have thousands of images for the training set.
A lot of the examples have a label corresponding to a single category, for example a 10-number array [0,0,0,0,0,0,0,0,0,0] for the handwritten digit data set, but I haven't seen many examples that would output a larger array. I would assume the label would be a 50x50 array?
Also, any ideas on the processing/CPU time for this type of analysis?
Yes, this is possible with TensorFlow. In fact, there are many ways to approach it. Here's a very simple one:
Consider this to be a binary classification task. Each pixel needs to be classified as foreground or background. Choose a set of features by which each pixel will be classified. These features could be local features (such as a patch around the pixel in question) or global features (such as the pixel's location in the image). Or a combination of the two.
Then train a model of your choosing (such as a NN) on this dataset. Of course, your results will be highly dependent upon your choice of features.
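As a toy sketch of that framing (assumed 50x50 grayscale inputs; the helper name and patch size are made up): each pixel gets a feature vector built from the patch around it plus its position, and those vectors can be fed to any classifier together with the flattened 0/1 label map.

    import numpy as np

    def pixel_features(image, patch=5):
        """One feature vector per pixel: the local patch values plus (row, col)."""
        pad = patch // 2
        padded = np.pad(image, pad, mode='reflect')
        h, w = image.shape
        feats = []
        for r in range(h):
            for c in range(w):
                window = padded[r:r + patch, c:c + patch].ravel()
                feats.append(np.concatenate([window, [r / h, c / w]]))
        return np.array(feats)  # shape: (h*w, patch*patch + 2)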
You could also take a graph-cut approach if you can represent that computation as a computational graph using the primitives that TensorFlow provides. You could then either skip TensorFlow's optimization functions such as backprop, or, if there are some differentiable variables in your computation, use TF's optimization functions to optimize those variables.
SoftmaxWithLoss() works for your image segmentation problem, if you reshape the predicted label and true label map from [batch, height, width, channel] to [N, channel].
In your case, your final predicted map will have channel = 2, and after reshaping, N = batch * height * width; then you can use SoftmaxWithLoss() or a similar loss function in TensorFlow to run the optimization.
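A small sketch of that reshaping (assumed names; TensorFlow's cross-entropy standing in for Caffe's SoftmaxWithLoss):

    import tensorflow as tf

    # logits: [batch, height, width, 2] raw scores
    # labels: [batch, height, width] integer class per pixel (0 = background, 1 = nucleus)
    def pixelwise_softmax_loss(logits, labels):
        flat_logits = tf.reshape(logits, [-1, 2])  # [N, channel], N = batch * height * width
        flat_labels = tf.reshape(labels, [-1])     # [N], must be an integer dtype
        losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=flat_labels, logits=flat_logits)
        return tf.reduce_mean(losses)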
See this question that may help.
Try using convolutional filters for the model: a stack of convolution and downsampling layers. The input should be the normalized pixel image and the output should be the mask. The last layer should be a softmax-with-loss. HTH.