I am currently implementing an FCN in TensorFlow that supports variable input image sizes.
My images come in widely varying sizes, but unfortunately I am not able to start training with a batch size other than 1.
I am using the feed dict in the following way:
feed_dict = {fcn.images: image_batch,
             fcn.labels: labels_batch,
             fcn.dropout_keep: dropout}
result = sess.run(list(tf_ops), feed_dict=feed_dict)
I have already tried:
Creating image_batch and labels_batch as numpy arrays; this does not work, however, since numpy arrays do not support variable sizes along certain dimensions.
Creating image_batch and labels_batch as lists of numpy arrays. Here it seems that TensorFlow tries to call numpy.array(image_batch).
Going with tf.pack(); unfortunately this does not support different image sizes either.
My question is:
Is there a way to solve this problem?
Thank you in advance for any suggestions and advice.
So we can close this - quoting Olivier Moindrot above:
You have to pad or resize all your images to the same size before batching them.
Note that after Olivier's answer, a new tf.image.decode_and_crop_jpeg op was added that can make this a bit easier.
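For reference, a minimal tf.data sketch (assuming TF 2.x; image_paths and the target size are placeholders for your own values) that pads every image to a common size before batching could look like this:
import tensorflow as tf

TARGET_H, TARGET_W = 512, 512  # placeholder target size

def load_and_pad(path):
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    # Pad (and scale) every image to the same spatial size so batching works.
    image = tf.image.resize_with_pad(image, TARGET_H, TARGET_W)
    return image

dataset = (tf.data.Dataset.from_tensor_slices(image_paths)
           .map(load_and_pad)
           .batch(16))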
Related
Question:
If I have an array in memory with dims (n, height, width, channels) and I want to get a Pytorch classifier to feed them forward and give me an array with class predictions for each of the n images in the array, how do I do that?
Background:
I am working on a computer vision problem where I modify some images using pre-existing code and want to send the modified images into a PyTorch classifier CNN (not developed or controlled by me). I am more accustomed to TensorFlow/Keras than PyTorch.
With TensorFlow/Keras models, you can hand them a batch of images in a numpy array and they will feed them forward through the model.
PS:
A colleague suggested saving all the images to disk first and then reading them in with a DataLoader, but that seems unnecessary when I already have the images in memory.
Sorry if it's a dumb question; I tried to find a solution elsewhere but haven't had much success.
You can write a custom conversion function (instead of a DataLoader) which takes the images in memory and returns tensors that can be fed directly to the model, without having to save them to disk first. A very simple implementation could be:
import numpy as np
import torch

def images_to_tensor(images):
    # `images` is a numpy array of shape (N, H, W, C)
    # normalize images to [-1, 1]; comment this out if you want [0, 1] instead
    images = (images.astype(np.float32) - 127.5) / 128.0
    # to normalize images to [0, 1], uncomment the line below
    # images = images.astype(np.float32) / 255.0
    # convert the numpy array to a tensor and reorder to (N, C, H, W)
    images = torch.from_numpy(images).permute(0, 3, 1, 2)
    # to move the tensors to the GPU, uncomment the line below
    # images = images.to("cuda")
    return images
You can then use the function to convert your images to tensors and pass them to the classification model to get the output predictions.
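As a rough usage sketch (assuming `model` is the classifier and `images` is your (N, H, W, C) numpy array):
# `model` and `images` are assumed to already exist.
model.eval()
with torch.no_grad():
    logits = model(images_to_tensor(images))
    preds = logits.argmax(dim=1).cpu().numpy()  # one class index per image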
I'm building a GAN in TensorFlow for image deblurring; it's an implementation of DeblurGANv2. I set up the GAN so that it has two inputs: a batch of blurred images and a batch of sharp images. Along these lines, I designed the input to be a Python dictionary with two keys, ['sharp', 'blur'], each holding a tensor of shape [batch_size, 512, 512, 3]. This makes it easy to feed the blurred image batch to the generator and then feed the generator's output together with the sharp image batch to the discriminator.
Based on these requirements, I created a tf.data.Dataset that outputs exactly that: a dict containing the two tensors, each with its own batch dimension. This fits my GAN implementation perfectly, and everything works fine and smoothly.
So keep in mind: my input is not a tensor but a Python dict, which itself has no batch dimension. This will be relevant for explaining my problem later.
Recently, I decided to add support for distributed training using TensorFlow distribution strategies. This TensorFlow feature allows training to be distributed over multiple devices, even over multiple machines. Some of the implementations, for example MirroredStrategy, take the input tensor, split it into equal parts, and feed each slice to a different device. That means that with a batch size of 16 and 4 GPUs, each GPU ends up taking a local batch of 4 data points; after that there is some magic for aggregating the results and other things that are not relevant to my problem.
As you will have noticed, it is critical for distribution strategies to have a tensor as input, or at least some sort of input with an outer batch dimension, and what I have is a Python dict, with the batch dimension of the inputs only inside the dictionary's tensor values. This is a huge problem: my current implementation is not compatible with distributed training.
I was looking for workarounds, but I can't quite wrap my head around this. Maybe just make the input one big tensor of shape [batch_size, 2, 512, 512, 3] and slice it? I'm not sure; the idea just came to me. In any case, I find it very ambiguous: I can no longer differentiate the two inputs, at least not with the clarity of the dictionary keys. Edit: The problem with this solution is that it makes my dataset transformations very expensive and hence the dataset throughput a lot slower; given that this is an image-loading pipeline, that is a major concern.
Maybe my explanation of how distribution strategies work is not the most rigorous one; if I'm missing something, please feel free to correct me.
PS: This is not a bug report or a code error, more of a "system design query"; I hope that's allowed here.
Instead of using a dictionary as the input to the GAN, you can try mapping a function in the following way:
import glob
import tensorflow as tf

def load_image(fileA, fileB):
    imageA = tf.io.read_file(fileA)
    imageA = tf.image.decode_jpeg(imageA, channels=3)
    imageB = tf.io.read_file(fileB)
    imageB = tf.image.decode_jpeg(imageB, channels=3)
    return imageA, imageB

trainA = glob.glob('blur/*.jpg')
trainB = glob.glob('sharp/*.jpg')
train_dataset = tf.data.Dataset.from_tensor_slices((trainA, trainB))
train_dataset = train_dataset.map(load_image).batch(batch_size)

# for mirrored strategy
dist_dataset = mirrored_strategy.experimental_distribute_dataset(train_dataset)
You can then iterate over the distributed dataset and update the network by passing in both images.
I hope this helps!
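As a rough sketch of that iteration (assuming TF 2.2+, and that the models, optimizers, and a per-replica train_step function are created inside mirrored_strategy.scope(); train_step here is a placeholder name):
@tf.function
def distributed_step(blur, sharp):
    # Run the per-replica update and reduce the per-replica losses.
    per_replica_loss = mirrored_strategy.run(train_step, args=(blur, sharp))
    return mirrored_strategy.reduce(
        tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)

for blur_batch, sharp_batch in dist_dataset:
    loss = distributed_step(blur_batch, sharp_batch)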
I have 70,000 2D numpy arrays on which I would like to train a CNN using Keras. Holding them in memory would be an option, but it would consume a lot of memory. I would therefore like to save the matrices to disk and load them at runtime. One option would be to use ImageDataGenerator. The problem is that it can only read images.
I would like to store the arrays not as images, because saving them as (grayscale) images changes the array values (normalization etc.), and in the end I want to feed the original matrices into the network, not values altered by saving them as images.
Is it possible to somehow store the arrays on disk and iterate over them in a similar way to what ImageDataGenerator does?
Or, alternatively, can I save the arrays as images without changing their values?
Instead of using ImageDataGenerator, you can define your own custom data generator class by overriding a few simple methods of the class.
You can follow this Medium post for more reference on this.
https://medium.com/#ensembledme/writing-custom-keras-generators-fe815d992c5a
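As a minimal sketch of such a generator (assuming each 2D array has been saved as its own .npy file; the class and variable names here are placeholders):
import numpy as np
from tensorflow.keras.utils import Sequence

class NpyArraySequence(Sequence):
    """Loads batches of 2D arrays saved as individual .npy files."""
    def __init__(self, file_paths, labels, batch_size):
        self.file_paths = file_paths  # list of paths to .npy files
        self.labels = labels          # labels in the same order
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.file_paths) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = np.stack([np.load(p) for p in self.file_paths[sl]])
        y = np.asarray(self.labels[sl])
        return x[..., np.newaxis], y  # add a channel axis for the CNN

# model.fit(NpyArraySequence(paths, labels, batch_size=32), epochs=10)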
I have been working on MNIST dataset to learn how to use Tensorflow and Python for my deep learning course.
I could read the data internally/externally and also train it in softmax and cnn thanks to tensorflow tutorial at website. At the end, I could get >%90 in softmax, >%98 in cnn, accuracy.
My problem is that I want to resize all images on MNIST as 14x14 and train it again, also to augment all (noising, rotating etc.) and train again. At the end, I want to be able to compare the accuracies of these three different dataset.
Could you please help me to solve it? How to resize all images and how the model should change.
Thanks!
One way to resize images is to use the SciPy imresize function:
from scipy.misc import imresize
img = imresize(yourimage, (14, 14))
But my real advice to you is that you should take a look at the Kadenze course "Creative Applications of Deep Learning". This is a notebook for lecture two: https://github.com/pkmital/CADL/blob/master/session-2/lecture-2.ipynb
This course is really good at helping you understand using images and Tensorflow.
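Note that scipy.misc.imresize has since been removed from newer SciPy releases; a rough Pillow-based equivalent (assuming yourimage is a 2D uint8 array) would be:
import numpy as np
from PIL import Image

# Roughly equivalent to the deprecated scipy.misc.imresize call above.
img = np.array(Image.fromarray(yourimage).resize((14, 14)))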
What you need is an image-processing library like OpenCV, PIL, etc. If you are using the dataset downloaded through TensorFlow, it will be a 3D array (an array of 2D arrays, one per image), or it may have more dimensions depending on how it is stored (I'm not sure). You can treat numpy arrays as images and use them with any image-processing library you like, but check what datatype they are in and whether it is compatible with the libraries you are using.
TensorFlow also has such functions, if you want to keep everything in TensorFlow.
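For example, a minimal TF 2.x sketch (assuming `images` is a float batch of shape (N, 28, 28, 1)) would be:
import tensorflow as tf

# Resizes every image in the batch from 28x28 to 14x14.
small_images = tf.image.resize(images, [14, 14])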
I am trying to build a deep learning model for saliency analysis using Caffe (I am using the Python wrapper), but I am unable to understand how to generate the LMDB data structure for this purpose. I have gone through the ImageNet and MNIST examples, and I understand that I should generate labels in the format
my_test_dir/picture-foo.jpg 0
But in my case, I will be labeling each pixel with 0 or 1, indicating whether that pixel is salient or not, so there is no single label per image.
How do I generate LMDB files for per-pixel labeling?
You can approach this problem in two ways:
1. Use an HDF5 data layer instead of LMDB. HDF5 is more flexible and can support labels the size of the image. You can see this answer for an example of constructing and using an HDF5 input data layer.
2. You can have two LMDB input layers: one for the images and one for the labels. Note that when you build the LMDBs you must not use the 'shuffle' option, in order to keep the images and their labels in sync.
Update: I recently gave a more detailed answer here.
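As a minimal sketch of option 1 (assuming X is a float32 array of shape (N, 3, H, W) and Y holds the per-pixel label maps with shape (N, 1, H, W); the dataset names must match the tops of your HDF5Data layer):
import h5py
import numpy as np

# Write images and per-pixel label maps into one HDF5 file.
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X.astype(np.float32))
    f.create_dataset('label', data=Y.astype(np.float32))

# The HDF5Data layer reads a text file listing the .h5 files.
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')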
Check this one: http://deepdish.io/2015/04/28/creating-lmdb-in-python/
Just load all images in X and corresponding labels in Y.
In Caffe, both LMDB and HDF5 support multiple labels per image (matrices, if you like); see this thread:
https://github.com/BVLC/caffe/issues/1698#issue-53768814
See this tutorial on how to create a multi-label dataset (LMDB here) for Caffe with Python code:
http://www.kostyaev.me/article/Multilabel%20Dataset/
EDIT: For the labels, for example, it uses the Caffe Python function that converts a 3-dimensional array to a datum, found in caffe/python/caffe/io.py:
array_to_datum(arr, label=None)
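A minimal sketch of writing per-pixel label maps into an LMDB with that helper (assuming `label_maps` is a list of 2D uint8 arrays; the keys must follow the same ordering as the image LMDB so the two stay in sync):
import lmdb
import numpy as np
import caffe

env = lmdb.open('labels_lmdb', map_size=int(1e9))
with env.begin(write=True) as txn:
    for i, label_map in enumerate(label_maps):
        # array_to_datum expects a 3D array (channels, height, width).
        datum = caffe.io.array_to_datum(label_map[np.newaxis, ...].astype(np.uint8))
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()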