I have about 1000 images listed in a CSV file. I have already managed to load those images in my Python program by doing the following:
df = pd.read_csv("./Testes_small.csv")

# Creates the dataframe
training_set = pd.DataFrame({'Images': training_imgs, 'Labels': training_labels})

train_dataGen = ImageDataGenerator(rescale=1./255)
train_generator = train_dataGen.flow_from_dataframe(dataframe=training_set, directory="",
                                                    x_col="Images", y_col="Labels",
                                                    class_mode="categorical",
                                                    target_size=(224, 224), batch_size=32)

## Steps to plot the images
batch_size = 32
imgs, labels = next(train_generator)
for i in range(batch_size):  # range 0 to 31
    image = imgs[i]
    plt.imshow(image)
    plt.show()
So now I have the train_generator variable, of type python.keras.preprocessing.image.DataframeIterator. Each batch it yields has shape (32, 224, 224, 3).
In ImageDataGenerator I want to plug in my own preprocessing function to resize the images. I want to do this because I have some rectangular images that lose their aspect ratio when resized.
For example, here is an image before (upper image) and after (lower image) resizing:
Clearly the second image loses its proportions.
I found this function (it's the answer from a previous thread):
def resize_image(self, image: Image, length: int) -> Image:
    """
    Resize an image to a square. Can make an image bigger to make it fit or smaller if it doesn't fit. It also crops
    part of the image.

    :param self:
    :param image: Image to resize.
    :param length: Width and height of the output image.
    :return: Return the resized image.
    """

    """
    Resizing strategy:
     1) We resize the smallest side to the desired dimension (e.g. 1080)
     2) We crop the other side so as to make it fit with the same length as the smallest side (e.g. 1080)
    """
    if image.size[0] < image.size[1]:
        # The image is in portrait mode. Height is bigger than width.

        # This makes the width fit the LENGTH in pixels while conserving the ratio.
        resized_image = image.resize((length, int(image.size[1] * (length / image.size[0]))))

        # Amount of pixels to lose in total on the height of the image.
        required_loss = (resized_image.size[1] - length)

        # Crop the height of the image so as to keep the center part.
        resized_image = resized_image.crop(
            box=(0, required_loss / 2, length, resized_image.size[1] - required_loss / 2))

        # We now have a length*length pixels image.
        return resized_image
    else:
        # This image is in landscape mode or already squared. The width is bigger than the height.

        # This makes the height fit the LENGTH in pixels while conserving the ratio.
        resized_image = image.resize((int(image.size[0] * (length / image.size[1])), length))

        # Amount of pixels to lose in total on the width of the image.
        required_loss = resized_image.size[0] - length

        # Crop the width of the image so as to keep 1080 pixels of the center part.
        resized_image = resized_image.crop(
            box=(required_loss / 2, 0, resized_image.size[0] - required_loss / 2, length))

        # We now have a length*length pixels image.
        return resized_image
I'm trying to insert it like this: img_datagen = ImageDataGenerator(rescale=1./255, preprocessing_function=resize_image), but it doesn't work because I'm not passing it an image. Do you have any ideas on how I can do this?
Check the documentation for providing custom functions to the ImageDataGenerator. It says, and I quote:
"preprocessing_function: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape."
From the above documentation we can note the following:

When will this function be executed:
- after the image is resized.
- after any data augmentation which has to be done.

Function argument requirements:
- only one argument.
- this argument is for only one numpy image.
- the image should be a numpy tensor of rank 3.

Function output requirements:
- one output image.
- it should be the same shape as the input.
This last point is really important for your question. Since your function resizes the image, its output will not be the same shape as the input, so you cannot do this directly.
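For contrast, a preprocessing_function that does satisfy the same-shape requirement looks something like this (a minimal sketch, not from the original question; the jitter amount is arbitrary):

import numpy as np

def random_brightness(img):
    # Valid custom preprocessing: takes a single rank-3 numpy image
    # and returns an array with exactly the same shape.
    return img * np.random.uniform(0.9, 1.1)  # arbitrary illustrative jitter

# img_datagen = ImageDataGenerator(rescale=1./255,
#                                  preprocessing_function=random_brightness)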
One alternative to get this done is to resize your dataset before passing it to ImageDataGenerator.
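A rough sketch of that alternative, assuming the Images column contains file paths and reusing the resize_image function from the question (the 224 target and the resized/ output directory are made up for the example):

import os
from PIL import Image

os.makedirs("resized", exist_ok=True)
resized_paths = []
for path in training_set["Images"]:
    img = Image.open(path)
    img = resize_image(None, img, 224)   # self is unused in the function, so None is fine
    out_path = os.path.join("resized", os.path.basename(path))
    img.save(out_path)
    resized_paths.append(out_path)

training_set["Images"] = resized_paths
# flow_from_dataframe with target_size=(224, 224) will no longer distort the images,
# since they are already square.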
def preprocess(self):
    # Import image
    pic1 = self.path
    raw_image = cv2.imread(pic1)
    #cv2.imshow('Raw image', raw_image)
    #cv2.waitKey(0)

    # Resize image
    dim = (320, 180)
    resized = cv2.resize(raw_image, dim)
    #cv2.imshow('Resized Image', resized)
    #cv2.waitKey(0)

    # Scale image
    scaled = cv2.normalize(resized, None, alpha=-1, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
    #cv2.imshow('Scaled Image', scaled)
    #cv2.waitKey(0)

    return scaled
I'm trying to scale the pixel values of "raw_image" to within the range -1 to 1 as part of the pre-processing for identifying an object using machine learning. Essentially, a camera takes a picture, which is resized and scaled to the same size as the images in the dataset used for training and validation. That image is then run through the model produced by model.fit() to detect what the object in the image actually is.
The question here is: is this scaling function correct for putting the pixel values in the range -1 to 1? The image appears SUPER dark when I use cv2.imshow, and I'm afraid the model isn't recognizing it properly.
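One way to sanity-check the result, and to get a sensible preview, is sketched below (not part of the original post; the filename is made up). Note that cv2.imshow treats floating-point images as being in the range [0, 1], so the [-1, 0] half of the range renders as black, which would explain the dark preview even if the normalization itself is doing what you intended:

import cv2

raw_image = cv2.imread('some_frame.jpg')          # hypothetical example file
resized = cv2.resize(raw_image, (320, 180))
scaled = cv2.normalize(resized, None, alpha=-1, beta=1,
                       norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)

# NORM_MINMAX stretches this image's own min/max to [-1, 1]
print(scaled.min(), scaled.max())                  # roughly -1.0 1.0

# For viewing only: shift back into [0, 1] so imshow renders it normally
cv2.imshow('Scaled (rescaled for display)', (scaled + 1.0) / 2.0)
cv2.waitKey(0)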
I'm rotating a picture in the following way:
import cv2
import imgaug.augmenters as iaa

# Read in image
img = cv2.imread("pic.jpg")
height, width, _ = img.shape
print("height ", height)
print("width ", width)

# Rotate image by 90 degrees
augmentation = iaa.Affine(rotate=90)
img_aug = augmentation(image=img)

height, width, _ = img_aug.shape
print("Height after rotation ", height)
print("Width after rotation ", width)
> height 1080
> width 1920
> Height after rotation 1080
> Width after rotation 1920
Why does the shape of the image not change?
Image augmentation does not change the actual or physical shape of your image. Think of the original shape as a window through which you see the image, or the outside world. All the transformations, i.e. rotations, stretches, translations, or in general any homography or non-linear warp, are applied to the world outside your window. The opening through which you look at the image, the window in my analogy, stays the same irrespective of how the world outside, i.e. the image, changes.
This can obviously bring some region that was not present in the original view into the augmented view. Most often the newly introduced pixels will be black, although it depends on what kind of padding is applied.
In short, augmentation operations are not something like matrix transpose where the actual shape may change.
img = cv2.imread("img.png")
augmentation = iaa.Affine(rotate=90)
img_aug = augmentation(image=img)
Let's see the images:
Original Image
Rotated Image
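To make the contrast concrete, a transpose-like rotation does change the array's shape, while the augmenter keeps the original canvas (a small sketch, not from the original answer):

import numpy as np
import cv2
import imgaug.augmenters as iaa

img = cv2.imread("img.png")            # e.g. shape (1080, 1920, 3)

rotated_np = np.rot90(img)             # rotates the array itself
print(rotated_np.shape)                # (1920, 1080, 3): height and width swap

img_aug = iaa.Affine(rotate=90)(image=img)
print(img_aug.shape)                   # (1080, 1920, 3): the canvas stays the same,
                                       # the rotated content is cropped/padded to fit it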
I am looking for a way to take an image/target batch for segmentation and return the batch where the image dimensions have been changed to be equal for the whole batch. I have tried this using the code below:
import torch

def collate_fn_padd(batch):
    '''
    Pads a batch of variable-length samples.
    note: it converts things to tensors manually here since the ToTensor transform
    assumes it takes in images rather than arbitrary tensors.
    '''
    # separate the images and masks
    image_batch, mask_batch = zip(*batch)

    # pad the images and masks
    image_batch = torch.nn.utils.rnn.pad_sequence(image_batch, batch_first=True)
    mask_batch = torch.nn.utils.rnn.pad_sequence(mask_batch, batch_first=True)

    # rezip the batch
    batch = list(zip(image_batch, mask_batch))
    return batch
However, I get this error:
RuntimeError: The expanded size of the tensor (650) must match the existing size (439) at non-singleton dimension 2. Target sizes: [3, 650, 650]. Tensor sizes: [3, 406, 439]
How do I efficiently pad the tensors to be of equal dimensions and avoid this issue?
rnn.pad_sequence only pads the sequence dimension; it requires all other dimensions to be equal. You cannot use it to pad images across two dimensions (height and width).
To pad an image, torch.nn.functional.pad can be used, but you need to manually determine the height and width to pad to.
import torch.nn.functional as F

# Determine the maximum height and width.
# The masks have the same height and width
# as the images they mask.
max_height = max([img.size(1) for img in image_batch])
max_width = max([img.size(2) for img in image_batch])

image_batch = [
    # The needed padding is the difference between the
    # max width/height and the image's actual width/height.
    F.pad(img, [0, max_width - img.size(2), 0, max_height - img.size(1)])
    for img in image_batch
]
mask_batch = [
    # Same as for the images, but there is no channel dimension,
    # therefore the mask's width is dimension 1 instead of 2.
    F.pad(mask, [0, max_width - mask.size(1), 0, max_height - mask.size(0)])
    for mask in mask_batch
]
The padding lengths are specified in reverse order of the dimensions, where every dimension has two values, one for the padding at the beginning and one for the padding at the end. For an image with the dimensions [channels, height, width] the padding is given as: [width_beginning, width_end, height_beginning, height_end], which can be reworded to [left, right, top, bottom]. Therefore the code above pads the images to the right and bottom. The channels are left out, because they are not being padded, which also means that the same padding can be directly applied to the masks.
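Once every image and mask in the batch has the same spatial size, the collate function can stack them into regular batch tensors. A minimal sketch of how the pieces could fit together (the function name and return format are just for illustration):

import torch
import torch.nn.functional as F

def collate_fn_pad2d(batch):
    image_batch, mask_batch = zip(*batch)

    # pad every sample to the largest height/width in this batch
    max_height = max(img.size(1) for img in image_batch)
    max_width = max(img.size(2) for img in image_batch)
    image_batch = [F.pad(img, [0, max_width - img.size(2), 0, max_height - img.size(1)])
                   for img in image_batch]
    mask_batch = [F.pad(mask, [0, max_width - mask.size(1), 0, max_height - mask.size(0)])
                  for mask in mask_batch]

    # now all tensors have equal shapes, so they can be stacked into
    # [batch, channels, height, width] and [batch, height, width]
    return torch.stack(image_batch), torch.stack(mask_batch)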
Albumentations provides a built-in PadIfNeeded transform
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Get batch dimensions as proposed by Michael Jungo
image_batch, mask_batch = zip(*batch)
batch_height = max([img.size(1) for img in image_batch])
batch_width = max([img.size(2) for img in image_batch])

# Define the transform
transform = A.Compose([
    A.PadIfNeeded(min_height=batch_height, min_width=batch_width),
    ToTensorV2()])

# Run it on each image (note Albumentations requires NumPy HWC input).
# Could be more efficient if you loaded from disk in that format first.
# The Albumentations return value is a dict, so we need to pick out the ``image`` key.
image_batch = [
    transform(image=img.permute(1, 2, 0).numpy())['image']
    for img in image_batch
]
# Follow the same process for masks
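Alternatively, image and mask can be run through the transform together so both always receive exactly the same padding; a rough sketch replacing the image-only loop above (assuming each mask is a 2D tensor, as in the previous answer):

transformed = [
    # Albumentations applies PadIfNeeded and ToTensorV2 to both targets
    transform(image=img.permute(1, 2, 0).numpy(), mask=mask.numpy())
    for img, mask in zip(image_batch, mask_batch)
]
image_batch = [t['image'] for t in transformed]
mask_batch = [t['mask'] for t in transformed]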
I am writing a handwriting recognition app and my inputs have to be of a certain size (128x128). When I detect a letter it looks like this:
That image, for instance, has a size of 40x53. I want to make it 128x128, but simply resizing it lowers the quality, especially for smaller images. I want to somehow fill the rest up to 128x128 with the 40x53 image in the middle. The background color should also stay relatively the same. I am using Python's OpenCV, but I am new to it. How can I do this, and is it even possible?
You can get what you asked for using outputImage below. Basically I have added a border using the copyMakeBorder method; you can refer to its documentation for more details. You have to set the color you want in the value parameter. For now it is white, [255,255,255].
But I would rather suggest resizing the original image, which seems like a better option than what you asked for. To get the image resized you can use resized in the following code. For your convenience I have added both methods in this code.
import cv2
import numpy as np

inputImage = cv2.imread('input.jpg', 1)

# Border sizes (top, bottom, left, right) chosen so the 40x53 example input becomes 128x128
outputImage = cv2.copyMakeBorder(inputImage, 37, 38, 44, 44, cv2.BORDER_CONSTANT, value=[255, 255, 255])
resized = cv2.resize(inputImage, (128, 128), interpolation=cv2.INTER_AREA)

cv2.imwrite('output.jpg', outputImage)
cv2.imwrite('resized.jpg', resized)
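If the detected letters come in varying sizes, the border widths could be computed instead of hard-coded; a rough sketch of that idea (the 128 target comes from the question, everything else is illustrative):

import cv2

def pad_to_square(img, target=128, color=(255, 255, 255)):
    # assumes the crop is no larger than target in either dimension
    h, w = img.shape[:2]
    top = (target - h) // 2
    bottom = target - h - top
    left = (target - w) // 2
    right = target - w - left
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

letter = cv2.imread('input.jpg', 1)     # e.g. the 40x53 crop from the question
cv2.imwrite('padded.jpg', pad_to_square(letter))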
I believe you want to scale your image.
This code might help:
import cv2
img = cv2.imread('name_of_image', cv2.IMREAD_UNCHANGED)
# Get original size of image
print('Original Dimensions: ',img.shape)
# Percentage of the original size
scale_percent = 220
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
# Resize/Scale the image
resized = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
# The new size of the image
print('Resized Dimensions: ',resized.shape)
cv2.imshow("Resized image", resized)
cv2.waitKey(0)
cv2.destroyAllWindows()
I am trying to calculate dense feature trajectories of a video as in https://hal.inria.fr/hal-00725627/document. I am trying to use OpenCV's HOG descriptors like this:
winSize = (32,32)
blockSize = (32,32)
blockStride = (2,2)
cellSize = (2,2)
nbins = 9
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins)
hist = hog.compute(img)
However, this returns a very large feature vector of size: (160563456, 1).
What is a window? (winSize)
What is a block?
What is a cell?
The documentation isn't particularly helpful at explaining what each of these parameters is.
From http://www.learnopencv.com/histogram-of-oriented-gradients/
I see that to compute HOGs we create a histogram for each cell of an image patch and then normalise over the patch.
What I want is four 9-bin histograms for each (32, 32) patch of my image, which should be calculated from the histograms of the (16, 16) cells of this patch. So I would expect a final HOG feature of size 40716 for a (480, 640) image.
(((32*32) / (16*16)) * 9) * ((((480-16)*(640-16)) / (32*32)) * 4) = 40716
((patchSize / cellSize) * numBins) * numPatches = hogSize
I have also seen people doing stuff like this:
winStride = (8,8)
padding = (8,8)
locations = ((10,20),)
hist = hog.compute(image,winStride,padding,locations)
However, I don't understand what the locations parameter does as I do not wish to only compute the HOG features at a single location but for all (32,32) patches of my image.
cell_size = (16, 16)  # h x w in pixels
block_size = (2, 2)   # h x w in cells
nbins = 9             # number of orientation bins

# winSize is the size of the image cropped to a multiple of the cell size
# cell_size is the size of the cells of the img patch over which to calculate the histograms
# block_size is the number of cells which fit in the patch
hog = cv2.HOGDescriptor(_winSize=(img.shape[1] // cell_size[1] * cell_size[1],
                                  img.shape[0] // cell_size[0] * cell_size[0]),
                        _blockSize=(block_size[1] * cell_size[1],
                                    block_size[0] * cell_size[0]),
                        _blockStride=(cell_size[1], cell_size[0]),
                        _cellSize=(cell_size[1], cell_size[0]),
                        _nbins=nbins)

hog_feats = hog.compute(img)
We divide the image into cells of mxn pixels. Let's say 8x8.
So a 64x64 image would result in 8x8 cells of 8x8 pixels.
To reduce overall brightness effects we add a normalization step into the feature calculation. A block contains several cells. Instead of normalizing each cell we normalize across a block. A 32x32 pixel block would contain 4x4 cells of 8x8 pixels.
A window is the part of the image we calculate the feature descriptor for.
Let's say you want to find something of 64x64 pixels in a large image. You then would slide a 64x64 pixel window across the image and calculate the feature descriptor for each location which you then use to find the location of best match...
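To see how the window/block/cell parameters translate into the descriptor length that hog.compute returns, you can work it out from the parameters themselves; a sketch of that arithmetic (my own illustration, following OpenCV's layout of blocks sliding inside the window):

def hog_descriptor_size(win, block, stride, cell, nbins):
    # number of block positions inside one detection window
    blocks_x = (win[0] - block[0]) // stride[0] + 1
    blocks_y = (win[1] - block[1]) // stride[1] + 1
    # each block contributes one histogram per cell it contains
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * nbins

# the parameters from the question: one block position per window,
# 16x16 cells of 2x2 pixels -> 256 cells * 9 bins = 2304 values per window
print(hog_descriptor_size((32, 32), (32, 32), (2, 2), (2, 2), 9))  # 2304

hog.compute then evaluates this for every window position across the image (controlled by the winStride argument), which is why the returned vector gets so large.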
It's all in the documentation. Just read it and experiment until you understand it.
If you can't follow the documentation, read the source code and see what is going on line by line.