Create multiple images with different ratio PyTorch - python

I'm trying to perform some digit recognition using PyTorch. I have implemented a convolutional version of the sliding window with size 32x32. Which makes me able to identify digits of this range of size in a picture.
But now let's imagine I have an image of size 300x300 with a digit that occupies the whole image. I will never be able to identify it...
I have seen people saying that the image needs to be rescaled and resized. Meaning that I need to create various scaled versions of my initial image and then to feed my network with those "new" images.
Does anyone have any idea how I can perform that?
Here is a part of my code, if it can help..
# loading dataset
size=200
height=200
width= 300
transformer_svhn_test = transforms.Compose([
transforms.Grayscale(3),
transforms.Resize((height, width)),
transforms.CenterCrop((size, size)),
transforms.ToTensor(),
transforms.Normalize([.5,.5,.5], [.5,.5,.5])
])
SVHN_test = SVHN_(train=False, transform=transformer_svhn_test)
SVHN_test_loader = DataLoader(SVHN_test, batch_size=batch_size, shuffle=False, num_workers=3)
#loading network
model = Network()
model.to(device)
model.load_state_dict(torch.load("digit_classifier_gray_scale_weighted.pth"))
# loading one image and feeding the model with it
image = next(iter(SVHN_test_loader))[0]
image_tensor = image.unsqueeze(0) # creating a single-image batch
image_tensor = image_tensor.to(device)
model.eval()
output = model(image_tensor)

Please correct me if I understand your question wrong:
Your network takes images with size of 300x300 as input, and does 32x32 sliding window operation within your model, and output the locations of any digits in input images? In this setup, you are framing this problem as an object detection task.
I am imaging the digits in your training data have sizes that are similar to 32x32, and you wanted to use multiple scale evaluation to make sure digits on your testing images will also have similar sizes as those in your training data. As for object detection network, the input size of your network is not fixed.
So the thing you need is actually called multi scale evaluation/testing, and you will find it very common in Computer Vision tasks.
A good starting point would be HERE

Related

how to use pretrained model CNN with real images various size?

how to use pretrained model on 512x512 images on real images with different sizes like Yolo object detection on real image ?
CNNs require fixed image sizes, so how do you manage to use the models on real images larger than the inputs?
If it is just about the image size, you could resize your image to have same size as model input. When you receive the output, assuming that you have bounding boxes or locations etc, you can always rescale them back to original image size. Many ML/DL frameworks provide this functionality

How do we really clean or pre-process Image for Image Classification?

I have a simple question to some of you. I have worked on some image classification tutorials. Only the simpler ones like MNIST dataset. Then I noticed that they do this
train_images = train_images / 255.0
Now I know that every value from the matrix (which is the image) gets divided by 255.0. If I remember correctly this is called normalization right? (please correct me if I am wrong otherwise tell me that I am right).
I'm just curious is there a "BETTER WAY","ANOTHER WAY" or "THE BEST WAY" to pre-process or clean images then those cleaned images will be fed to the network for training.
Please if you would like to provide a sample source code. Please! be my guest. I would love to look at code samples.
Thank you!
Pre-processing images prior to image classification can include the followings:
normalisation: which you already mentioned
reshaping into uniform resolution (img height x img width): higher resoltuion leads to better learning and smaller resolution may lose important features. Some models have default input size that you can refer to. Also an average size of all images can be used too.
color channel: 1 refers to gray-scale and 3 refers rgb-scale. Depending on your application you can set this.
data augmentation: if your model is overfitting or your dataset is small, you can reproduce your dataset by altering original images (flipping, rotating, cropping, zooming..) to increase your dataset
image segmentation: segmentation can be performed to highlight the area or boundaries that may benefit your application. For example, in medical image classification, some part of body maybe masked to enhance classification performance.
For example, I recently worked on image classification of lung CT scan images. For pre-processing, I have reshaped the images and made them gray-scale. Then I performed image segmentation to highlight the lungs in the images. And I normalised the image pixels to put into my classification model. Depending on your application, there may be other more pre-processing techniques you might want to consider.

Single Prediction Image doesn't need to be rescaled?

I followed a tutorial to make my first Convolutional Neural Network using Keras and I have a small question regarding the rescaling step.
So when we are importing the training set and test set, we create an instance of the tf.keras.preprocessing.image.ImageDataGenerator class and use it as:
train_datagen = ImageDataGenerator(rescale=1/255)
Along with some other augmentation parameters. My understanding is that we use the rescale parameter to normalize the pixel values of the images imported.
But when we load up a single image to run through the CNN, we write something like (code from keras docs):
image = tf.keras.preprocessing.image.load_img(image_path)
input_arr = keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr]) # Convert single image to a batch.
predictions = model.predict(input_arr)
My question is, I cannot see the single input image being rescaled anywhere. Is it being done implicitly, or is there no need to actually perform rescaling? If the latter, then why is it so?
Thanks!
The image should be normalized that it should be divided by 255, if it's done during the training. Network will not be able to interpret that.
Also, when we use test_datagen, we apply Rescaling by 1/255 for the predict generator.
Normalization, mean subtraction and std deviation needs to be done at the testing time, if that has been applied during the training stage.

Resizing images in data preprocessing for training convolution network

I am trying to load data from jpeg files to train a convolution network. The images are large, with 24 million pixels however, so loading and using the full resolution is not practical.
To get the images to a more useful format I am trying to load each image, rescale it and then append it to a list. Once this is done, I can then convert the list into a numpy array and feed into the network for training as usual.
My problem is that my data set is very large and it takes about a second to rescale every image, which means it is not feasible to resize every image the way I have currently implemented this:
length_training_DF = 30000
for i in range(length_training_DF):
im = plt.imread(TRAIN_IM_DIR + trainDF.iloc[i]['image_name'] + '.jpg')
image = block_reduce(im, block_size=(10, 10, 1), func=np.max)
trainX.append(image)
I have also used the following:
length_training_DF = 30000
from keras.preprocessing import image
for i in range(50):
img = image.load_img(TRAIN_IM_DIR + trainDF.iloc[0]['image_name'] + '.jpg', target_size=(224, 224))
trainX.append(ima)
Is there any way to load these images more quickly into a format for training a network? I have thought about using a keras dataset, perhaps by using tf.keras.preprocessing.image_dataset_from_directory(), but the directory in which the image data is stored is not formatted correctly into folders containing the same targets as is required by this method.
The images are for a binary classification problem.
The usual way would be to write a preprocessing script that loads the large images, rescales them, applies other operations if needed, and then saves each class to a separate directory, as required by ImageDataGenerator.
There are at least three good reasons to do that:
Typically, you will run your training process dozens of time. You don't want to every time do the rescaling or e.g. auto-white balance.
ImageDataGenerator provides vital methods for augmenting your training data set.
It's a good generator out of the box. Likely you don't want to load entire data set into memory.

convolution neural network image size distortion

I have 2 data sets of images, one is perfect square, so resizing to 224x224 for CNN will not result in any distortion, the other dataset is not square so resizing to 224x224 will result in image distortion.
I will split the sets to train and validation, is this a good way to train the model? will there be any bias in the model?
I am afraid the model will identify distortion rather than the real differences between the 2 sets..
In case you want to preserve your data, you can crop it randomly and use it to transform to square. That way your model will look on the cropped part of the image. Doing so can increase your data but this is one good data if you save the transformed image. However using random crop function from from the dataloader will stream line the process. Cropping is a good augmentation technique for preprocessing the data.

Categories

Resources