I am working on doing prediction for my large database of ~1 million images. For each image, I have code that can chop the image up into ~200 smaller images and pass them into keras as a numpy array for prediction.
I want to avoid unnecessary reading and writing to the hard drive, so I don't want to save all these smaller images and use flow_from_directory. Instead, I am looking to read in an image, chop it up with my existing code, and pass the smaller images into my network as a batch all in memory, and then repeat this process for many images.
Is this something Keras can handle? If so, I suspect I will need to make my own custom generator, but I'm not sure how to do this, and I couldn't find any good examples. Does anyone have an example of how to implement a custom generator?
Try something like this:
import os
import cv2
import numpy as np

dpath = 'path to test folder'
ids = os.listdir(dpath + "test/")

first = 1
for img_id in ids:
    img = cv2.imread(dpath + 'test/{}.jpg'.format(img_id))  # .jpg if the images are in jpg format
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)  # only if resizing is needed
    x_batch = np.array(chop_image(img), np.float32)  # chop_image stands for your code that chops the image
    preds = model.predict_on_batch(x_batch)
    if first == 1:
        predsA = preds.copy()
        first = 0
    else:
        predsA = np.append(predsA, preds, axis=0)
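If you do want an actual custom generator, as asked, one minimal sketch uses keras.utils.Sequence; chop_image below is a hypothetical placeholder for your existing chopping code:

import os
import cv2
import numpy as np
from tensorflow import keras

class ChoppedImageSequence(keras.utils.Sequence):
    # yields one batch per source image: all of its chopped sub-images
    def __init__(self, image_paths, chop_fn):
        self.image_paths = image_paths   # paths to the large source images
        self.chop_fn = chop_fn           # your existing chopping code

    def __len__(self):
        return len(self.image_paths)     # one batch per source image

    def __getitem__(self, idx):
        img = cv2.imread(self.image_paths[idx])
        return np.array(self.chop_fn(img), np.float32)

# usage sketch:
# paths = [os.path.join(dpath, 'test', name) for name in ids]
# preds = model.predict(ChoppedImageSequence(paths, chop_image))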
Related Question:
If I have an array in memory with dims (n, height, width, channels) and I want to get a Pytorch classifier to feed them forward and give me an array with class predictions for each of the n images in the array, how do I do that?
Background:
I am working with a computer vision problem where I modify some images using pre-existing code and want to send the modified images into a Pytorch Classifier CNN (not developed or controlled by me). I am accustomed to Tensorflow/Keras more than Pytorch.
With Tensorflow/Keras models you can give them a bunch of images in a numpy array and it'll go ahead and feed them forward through the model.
PS:
A colleague suggested saving all the images to disk first, then reading them in with DataLoader but that is so unnecessary when I already have the images in memory.
Sorry if it's a dumb question, I tried to find a solution elsewhere but obviously haven't had much success.
You can write a custom conversion function that takes the in-memory images and returns tensors which can be fed directly to the model, without having to save anything to disk first. A very simple implementation could be:
import numpy as np
import torch

def images_to_tensor(images):
    # images is a numpy array of shape (N, H, W, C)
    # normalizes images between -1 and 1; comment this out if you want to normalize between 0 and 1
    images = (images.astype(np.float32) - 127.5) / 128.0
    # to normalize images from 0 to 1, uncomment the line below
    # images = images.astype(np.float32) / 255.0
    # convert the numpy array to a tensor and reorder from (N, H, W, C) to (N, C, H, W)
    images = torch.from_numpy(images).permute(0, 3, 1, 2)
    # to move the tensors to the GPU, uncomment the line below
    # images = images.to("cuda")
    return images
You can then use the function to convert your images to tensors and pass them to the classification model to get the output predictions.
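A short usage sketch, assuming a standard classifier that outputs one score per class:

import torch

inputs = images_to_tensor(images)   # images: numpy array of shape (N, H, W, C)

model.eval()                        # inference mode (disables dropout, etc.)
with torch.no_grad():               # no gradients needed for prediction
    logits = model(inputs)          # assumed shape (N, num_classes)
    predicted_classes = logits.argmax(dim=1)   # class index for each of the N images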
I have a simple question for some of you. I have worked through some image classification tutorials, only the simpler ones like the MNIST dataset. Then I noticed that they do this:
train_images = train_images / 255.0
Now I know that every value in the matrix (which is the image) gets divided by 255.0. If I remember correctly, this is called normalization, right? (Please correct me if I am wrong; otherwise tell me that I am right.)
I'm just curious whether there is a "better way", "another way" or "the best way" to pre-process or clean images before those cleaned images are fed to the network for training.
If you would like to provide some sample source code, please be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the following:
normalisation: which you already mentioned
reshaping into a uniform resolution (image height x image width): higher resolution leads to better learning, while a smaller resolution may lose important features. Some models have a default input size that you can refer to. The average size of all images can also be used.
color channels: 1 refers to grey-scale and 3 refers to RGB. Depending on your application you can choose this.
data augmentation: if your model is overfitting or your dataset is small, you can enlarge your dataset by altering the original images (flipping, rotating, cropping, zooming, ...).
image segmentation: segmentation can be performed to highlight the areas or boundaries that benefit your application. For example, in medical image classification, some parts of the body may be masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I reshaped the images and converted them to grey-scale. Then I performed image segmentation to highlight the lungs in the images, and finally I normalised the image pixels before feeding them into my classification model. Depending on your application, there may be other pre-processing techniques you might want to consider.
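As a rough illustration of the resizing, colour-channel and normalisation points above (the target size and the grey-scale flag are just example choices, not tied to any particular dataset):

import cv2
import numpy as np

def preprocess(img, target_size=(224, 224), grayscale=False):
    # reshape to a uniform resolution
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)
    if grayscale:
        # collapse the three colour channels to a single grey-scale channel
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)[..., np.newaxis]
    # normalise pixel values from [0, 255] to [0, 1]
    return img.astype(np.float32) / 255.0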
I am trying to load data from jpeg files to train a convolutional network. The images are large, however, with 24 million pixels each, so loading and using them at full resolution is not practical.
To get the images to a more useful format I am trying to load each image, rescale it and then append it to a list. Once this is done, I can then convert the list into a numpy array and feed into the network for training as usual.
My problem is that my data set is very large and it takes about a second to rescale every image, which means it is not feasible to resize every image the way I have currently implemented this:
import numpy as np
import matplotlib.pyplot as plt
from skimage.measure import block_reduce

length_training_DF = 30000
trainX = []
for i in range(length_training_DF):
    im = plt.imread(TRAIN_IM_DIR + trainDF.iloc[i]['image_name'] + '.jpg')
    image = block_reduce(im, block_size=(10, 10, 1), func=np.max)  # downsample by 10x via max-pooling
    trainX.append(image)
I have also used the following:
from keras.preprocessing import image

length_training_DF = 30000
trainX = []
for i in range(length_training_DF):
    img = image.load_img(TRAIN_IM_DIR + trainDF.iloc[i]['image_name'] + '.jpg', target_size=(224, 224))
    trainX.append(img)
Is there any way to load these images more quickly into a format suitable for training a network? I have thought about using a Keras dataset, perhaps tf.keras.preprocessing.image_dataset_from_directory(), but the directory in which the image data is stored is not organized into one folder per class, as this method requires.
The images are for a binary classification problem.
The usual way would be to write a preprocessing script that loads the large images, rescales them, applies any other operations needed, and then saves each class to a separate directory, as required by ImageDataGenerator (see the sketch after the list below).
There are at least three good reasons to do that:
Typically, you will run your training process dozens of times. You don't want to redo the rescaling (or, e.g., auto white balance) every time.
ImageDataGenerator provides vital methods for augmenting your training data set.
It's a good generator out of the box. Most likely you don't want to load the entire data set into memory.
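A minimal sketch of such a preprocessing script plus generator setup, assuming trainDF and TRAIN_IM_DIR from your snippet and a hypothetical 'target' column holding the binary label:

import os
from PIL import Image
from keras.preprocessing.image import ImageDataGenerator

DST_DIR = 'processed_images'   # one sub-directory per class will be created here
TARGET_SIZE = (224, 224)

# one-off preprocessing: rescale each image and save it into its class directory
for _, row in trainDF.iterrows():
    class_dir = os.path.join(DST_DIR, str(row['target']))   # 'target' is an assumed column name
    os.makedirs(class_dir, exist_ok=True)
    img = Image.open(os.path.join(TRAIN_IM_DIR, row['image_name'] + '.jpg'))
    img.resize(TARGET_SIZE).save(os.path.join(class_dir, row['image_name'] + '.jpg'))

# at training time: stream the preprocessed images, with augmentation if wanted
datagen = ImageDataGenerator(rescale=1. / 255, horizontal_flip=True)
train_gen = datagen.flow_from_directory(DST_DIR, target_size=TARGET_SIZE,
                                        batch_size=32, class_mode='binary')
# model.fit(train_gen, epochs=...)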
I'm trying to perform some digit recognition using PyTorch. I have implemented a convolutional version of the sliding window with size 32x32, which lets me identify digits in that size range in a picture.
But now let's imagine I have an image of size 300x300 with a digit that occupies the whole image. I will never be able to identify it...
I have seen people say that the image needs to be rescaled and resized, meaning that I need to create various scaled versions of my initial image and then feed my network with those "new" images.
Does anyone have any idea how I can perform that?
Here is a part of my code, if it can help..
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

# loading dataset
size = 200
height = 200
width = 300

transformer_svhn_test = transforms.Compose([
    transforms.Grayscale(3),
    transforms.Resize((height, width)),
    transforms.CenterCrop((size, size)),
    transforms.ToTensor(),
    transforms.Normalize([.5, .5, .5], [.5, .5, .5])
])

SVHN_test = SVHN_(train=False, transform=transformer_svhn_test)
SVHN_test_loader = DataLoader(SVHN_test, batch_size=batch_size, shuffle=False, num_workers=3)

# loading network
model = Network()
model.to(device)
model.load_state_dict(torch.load("digit_classifier_gray_scale_weighted.pth"))

# loading one image and feeding the model with it
image = next(iter(SVHN_test_loader))[0][0]   # first image of the first batch
image_tensor = image.unsqueeze(0)            # creating a single-image batch
image_tensor = image_tensor.to(device)

model.eval()
output = model(image_tensor)
Please correct me if I have understood your question wrongly:
Your network takes images of size 300x300 as input, performs a 32x32 sliding-window operation within the model, and outputs the locations of any digits in the input images? In this setup, you are framing the problem as an object detection task.
I imagine the digits in your training data have sizes close to 32x32, and you want to use multi-scale evaluation to make sure digits in your test images end up at sizes similar to those in your training data. For an object detection network, the input size is not fixed.
So what you need is called multi-scale evaluation/testing, and you will find it is very common in computer vision tasks.
A good starting point would be HERE
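As a rough sketch of what multi-scale evaluation could look like with your model (the scale factors and how the per-scale outputs are merged are assumptions that depend on your detection head):

import torch
import torch.nn.functional as F

scales = [0.5, 1.0, 1.5, 2.0]   # assumed image pyramid

model.eval()
outputs_per_scale = []
with torch.no_grad():
    for s in scales:
        # resize the input batch; the convolutional sliding window then sees
        # the digit at different effective sizes relative to its 32x32 window
        scaled = F.interpolate(image_tensor, scale_factor=s,
                               mode='bilinear', align_corners=False)
        outputs_per_scale.append(model(scaled))

# merging the per-scale outputs (e.g. non-maximum suppression over detections,
# or keeping the most confident scale) depends on how your model reports digits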
I'm a student in medical imaging. I have to construct a neural network for image segmentation. I have a data set of 285 subjects, each with 4 modalities (T1, T2, T1ce, FLAIR) + their respective segmentation ground truth. Everything is in 3D with resolution of 240x240x155 voxels (this is BraTS data set).
As we know, I cannot input the whole image to the GPU for memory reasons. I have to preprocess the images and decompose them into 3D overlapping patches (sub-volumes of 40x40x40), which I do with scikit-image view_as_windows, and then serialize the windows to a TFRecords file. Since the patches overlap by 10 voxels in each direction, this sums to 5,292 patches per volume. The problem is that, with only 1 modality, I get a size of 800 GB per TFRecords file. Plus, I have to compute the respective segmentation weight maps and store them as patches too. The segmentation is also stored as patches in the same file.
And I eventually have to include all the other modalities, which would take nothing less than terabytes of storage. I also have to remember that I must sample an equivalent number of patches from background and foreground (class balancing).
So, I guess I have to do all the preprocessing steps on the fly, just before every training step (while hoping not to slow down training too much). I cannot use tf.data.Dataset.from_tensors() since I cannot load everything in RAM. I cannot use tf.data.Dataset.from_tfrecords() since preprocessing the whole thing beforehand would take a lot of storage and I would eventually run out.
The question is: what is left for me to do this cleanly, with the possibility of reloading the model after training for inference on images?
Thank you very much and feel free to ask for any other details.
Pierre-Luc
Finally, I found a method to solve my problem.
I first crop a subject's image without applying the actual crop: I only measure the slices needed to crop the volume down to the brain. I then serialize the whole data set into one TFRecords file, each training example holding an image modality, the original image's shape and the crop slices (saved as Int64 features).
I decode the TFRecords afterwards. Each training sample is reshaped to the shape stored in its feature. I stack all the image modalities into one stack using the tf.stack() method. I crop the stack using the previously extracted slices (the crop then applies to all images in the stack). I finally get some random patches using the tf.random_crop() method, which allows me to randomly crop a 4-D array (height, width, depth, channel).
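A condensed sketch of that decode/stack/crop step (variable names such as t1, x_min, the patch size, and the use of a fifth channel for the segmentation are assumptions; tf.random_crop is the TF 1.x name, tf.image.random_crop in TF 2.x):

import tensorflow as tf

# t1, t2, t1ce, flair and seg are assumed to be 3-D tensors already decoded from
# the TFRecord and reshaped with the stored original shape
stack = tf.stack([t1, t2, t1ce, flair, seg], axis=-1)      # shape (H, W, D, 5)

# apply the pre-computed brain-bounding slices (stored as Int64 features)
stack = stack[x_min:x_max, y_min:y_max, z_min:z_max, :]

# draw one random 40x40x40 patch; the same crop applies to every modality and
# to the segmentation because they are stacked along the channel axis
patch = tf.random_crop(stack, size=[40, 40, 40, 5])

images = patch[..., :4]   # the 4 modalities
label = patch[..., 4:]    # the segmentation ground truth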
The only thing I still haven't figured out is data augmentation. Since all of this happens on tensors, I cannot use plain Python and NumPy to rotate, shear or flip a 4-D array. I would need to do it inside the tf.Session(), but I would rather avoid this and directly input the training handle.
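For what it's worth, simple flips can be done directly on the tensors with TF 1.x ops; the sketch below is only an assumption of how that could look (arbitrary rotations or shears would need extra ops such as tf.contrib.image):

import tensorflow as tf

# patch has shape (height, width, depth, channels); because the modalities and
# the segmentation are stacked, the same flip is applied to all of them
def random_flip(x, axis):
    return tf.cond(tf.random_uniform([]) < 0.5,
                   lambda: tf.reverse(x, axis=[axis]),
                   lambda: x)

patch = random_flip(patch, axis=0)   # flip along height
patch = random_flip(patch, axis=1)   # flip along width
patch = random_flip(patch, axis=2)   # flip along depth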
For evaluation, I serialize only one test subject per TFRecords file. The test subject contains all modalities too, but since there are no TensorFlow methods to extract patches in 4-D, the image is preprocessed into small patches using the Scikit-Learn extract_patches() method, and these patches are serialized to the TFRecords file.
This way, the training TFRecords file is a lot smaller, and I can evaluate the test data using batch prediction.
Thanks for reading and feel free to comment !