I have a batch of sequential images, each containing 5 frames, with shape (Batch, Sequence, Height, Width, Channel). Here is how it looks with a batch size of 32:
data.shape
> (32, 5, 256, 512, 3)
Now I want to apply some OpenCV and Torch operations to these images. Some examples of these are
cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
or a torchvision.transforms.Compose operation such as:
midas_transformer
> Compose(
<function transforms.<locals>.<lambda> at 0x7ff5c5488a60>
<midas.transforms.Resize object at 0x7ff5c547c0a0>
<midas.transforms.NormalizeImage object at 0x7ff5c547c0d0>
<midas.transforms.PrepareForNet object at 0x7ff5c547c130>
<function transforms.<locals>.<lambda> at 0x7ff5c5488af0>
)
Currently my solution is a nested list comprehension:
new_image = np.array([[my_function(frame) for frame in sequence] for sequence in data])
My question is: What is the best practice to apply these operations on each image frame? Is there any better way to do that?
For the PyTorch operations, I would first flatten the batch and sequence dimensions (and move the channel axis to the front) to get a (32 * 5, 3, 256, 512) tensor, then apply the transformation to this whole batch to take full advantage of CPU/GPU parallelism, and finally restore the (32, 5, 256, 512, 3) shape.
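A minimal sketch of that idea, assuming my_transform is a placeholder for a transform that accepts a whole (N, C, H, W) batch:

import torch

data = torch.randn(32, 5, 256, 512, 3)                    # stands in for your (B, S, H, W, C) batch
b, s, h, w, c = data.shape

flat = data.reshape(b * s, h, w, c).permute(0, 3, 1, 2)   # (B*S, C, H, W)
flat = my_transform(flat)                                 # one batched call instead of a Python loop
out = flat.permute(0, 2, 3, 1).reshape(b, s, h, w, c)     # back to (B, S, H, W, C)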
I am using the Qubvel segmentation models repository (https://github.com/qubvel/segmentation_models) to train an Inception-V3-encoder based model for a binary segmentation task. I am using (256 width x 256 height) images to train the models and they work well. If I double one of the dimensions, say (256 width x 512 height), it works fine as well. However, when I adjust for the aspect ratio and resize the images to a custom dimension, say (272 width x 256 height), the model throws an error as follows:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 16, 18, 2048), (None, 16, 17, 768)]
Is there a way to use such custom dimensions to train these models?
Your ValueError says that you are trying to concatenate inputs with mismatched shapes.
This is caused by your dynamic aspect-ratio-based resizing of the images: the two tensors reaching the Concatenate layer have shapes (None, 16, 18, 2048) and (None, 16, 17, 768), and their spatial dimensions (18 vs. 17) do not line up.
Concatenate operation requires inputs with matching shapes except for
the concatenation axis.
A compatible concatenation would have inputs like (3, 256, 512, 3) and (15, 256, 512, 3) when concatenating on axis=0. Notice how the shapes match except along the concatenation axis. The output will have shape (18, 256, 512, 3).
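For illustration, a small NumPy example of the same rule:

import numpy as np

a = np.zeros((3, 256, 512, 3))
b = np.zeros((15, 256, 512, 3))
# shapes match on every axis except axis=0, the concatenation axis
print(np.concatenate([a, b], axis=0).shape)   # (18, 256, 512, 3)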
Clearly, with your input shapes this is not possible along any axis. Keep your height and width fixed while training, and if an image doesn't fit that size, resize it before passing it in; this resizing can be done as part of preprocessing before training, as sketched below.
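A rough sketch of that preprocessing step (the fixed 256 x 256 target size and the helper name are placeholders, not something required by the library):

import cv2

TARGET_W, TARGET_H = 256, 256   # pick one fixed size and use it for every image

def load_and_resize(path):
    image = cv2.imread(path)
    # cv2.resize takes (width, height); every image ends up with the same shape
    return cv2.resize(image, (TARGET_W, TARGET_H))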
I saw a face detection model which contains the function below, but I could not understand the purpose of the expand_dims function. Can anyone explain what it is and why we are using it?
from numpy import expand_dims

def get_embedding(model, face_pixels):
    face_pixels = face_pixels.astype('float32')
    # standardize pixel values across the face crop
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std
    # add a batch dimension: (h, w, c) -> (1, h, w, c)
    samples = expand_dims(face_pixels, axis=0)
    yhat = model.predict(samples)
    return yhat[0]
tf.keras.layers.Conv2D layers expect input with a 4D shape:
(n_samples, height, width, channels)
Most libraries that load images will load them in 3D, like this:
(height, width, channels)
By using np.expand_dims(image, axis=0) or tf.expand_dims(image, axis=0), you add a batch dimension at the beginning, effectively turning your data into the 4D format that Keras needs for Conv2D layers. For instance:
(224, 224, 3)
to:
(1, 224, 224, 3)
If you give Conv2D 3D data, it will raise an error like this:
ValueError: Error when checking input: expected conv2d_19_input to have 4 dimensions, but got array with shape (60000, 28, 28)
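A minimal sketch of the fix, using a dummy array in place of a real image:

import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)   # one image, rank 3
samples = np.expand_dims(image, axis=0)             # add the batch dimension
print(samples.shape)                                # (1, 224, 224, 3)
# model.predict(samples) now receives the 4D input that Conv2D layers expect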
I want to convert tensor data to NumPy and save it with OpenCV, but OpenCV expects the data to have a shape like [1, something, something, something], while my tensor is a batch whose shape is like [30, something, something, something]. How can I modify the data's dimensions in PyTorch?
P.S.: Is there any function in PyTorch that can save data as a binary picture? I used save_image to save my tensor data, whose values are all 1 or 0, but the saved picture still looks grayscale. If there is another way to save tensor data as a binary picture, please tell me.
import cv2
import torch

def save_image_tensor2cv2(input_tensor, filename):
    # expect a single image as a (1, C, H, W) batch
    assert len(input_tensor.shape) == 4 and input_tensor.shape[0] == 1
    input_tensor = input_tensor.clone().detach()
    input_tensor = input_tensor.to(torch.device('cpu'))
    input_tensor = input_tensor.squeeze(0)
    # scale to [0, 255], round, reorder to (H, W, C) and convert to uint8 for OpenCV
    input_tensor = input_tensor.mul_(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).type(torch.uint8).numpy()
    cv2.imwrite(filename, input_tensor)
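For a batch of, say, 30 images, one way to use this is to slice out one-image batches and save each frame separately (the variable and file names here are placeholders):

for i in range(input_batch.shape[0]):                     # input_batch: (30, C, H, W)
    save_image_tensor2cv2(input_batch[i:i + 1], f"frame_{i}.png")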
batch = next(iter(dataloader_test))
batch.shape
torch.Size([4, 3, 160, 160])
np.transpose(batch.numpy(), (0,2,3,1)).shape
(4, 160, 160, 3)
image = np.transpose(batch.numpy(), (0,2,3,1))
cv2.imwrite("image.png", image[0])
You might have to un-normalize the data before saving it, though.
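A rough sketch of that un-normalization, assuming ImageNet-style per-channel mean/std normalization (the exact constants depend on your pipeline):

import numpy as np
import cv2

mean = np.array([0.485, 0.456, 0.406])   # assumed normalization constants
std = np.array([0.229, 0.224, 0.225])

frame = image[0] * std + mean                          # back to roughly [0, 1]
frame = (frame * 255).clip(0, 255).astype(np.uint8)    # uint8 for OpenCV
cv2.imwrite("image.png", frame)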
According to the keras docs:
preprocessing_function: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape.
My NumPy tensor is of rank 5 because my input images are 3D (height, width, depth), plus the batch and channel dimensions.
from keras.preprocessing.image import ImageDataGenerator
label_datagen = ImageDataGenerator()
train_label_generator = label_datagen.flow_from_directory(
    directory="some_directory",
    target_size=(32, 32, 32),
    color_mode='grayscale',
    class_mode=None,
    batch_size=4)
When I check the first batch, I get my 5D numpy tensor:
first_item = train_label_generator.__getitem__(0)
print(first_item.shape)
(4, 32, 32, 32, 1)
Now I first want to do a simple operation on every input image; I also check the input shape by printing it:
def some_function(arr):
    print(arr.shape)
    arr += 1
    return arr
Here I add this function to my ImageDataGenerator:
label_datagen = FixedImageDataGenerator(preprocessing_function=some_function)
This is what I get as the input shape:
(32, 32, 1)
which means that it really is limited to rank 3. Any idea how I can modify this so that the input shape is (32, 32, 32)?
My goal is to use the to_categorical function on every input in the ImageDataGenerator. I cannot simply say class_mode="categorical" as I am doing semantic segmentation (not image classification). I know that I could write some custom code for generators for that purpose but I want to know if it would be difficult to modify the keras ImageDataGenerator.
You can use ImageDataGenerator like you would normally, but at the last step, instead of passing a preprocessing_function, wrap your generator in a generator of your own. By doing this, you get full control over the preprocessing function. This means its output no longer has to have the same shape as the input. Be aware that this wrapper function gets fed batches, not single images.
For example:
def preprocess(generator):
    for batch in generator:
        yield batch[:, 1:-1, 1:-1]  # example: crop 1 px off each border
Now use preprocess(train_label_generator) (the generator returned by flow_from_directory) instead of passing a preprocessing_function. I hope you can use this to circumvent the limitation.
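Applied to the to_categorical goal from the question, the same wrapper pattern could look roughly like this (the number of classes and the assumption of integer-labelled masks are mine, not from the question):

from keras.utils import to_categorical

NUM_CLASSES = 2   # assumption: masks contain integer class ids 0 and 1

def one_hot_labels(generator):
    for batch in generator:                  # each batch: (4, 32, 32, 32, 1)
        # drop the channel axis, then one-hot encode the class ids
        yield to_categorical(batch[..., 0], num_classes=NUM_CLASSES)   # (4, 32, 32, 32, NUM_CLASSES)

wrapped_label_generator = one_hot_labels(train_label_generator)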
ImageDataGenerator is a generator for images.
This means that, in order for this to work, your data should be images with 1 (grayscale) or 3 (RGB) channels. I think it won't work with your 4-D images (unless the depth equals 1 or 3).
I need to convert an image in a numpy array loaded via cv2 into the correct format for the deep learning library mxnet for its convolutional layers.
My current images are shaped as follows: (256, 256, 3), or (height, width, channels).
From what I've been told, this actually needs to be (3, 256, 256), or (channels, height, width).
Unfortunately, my knowledge of numpy/python opencv isn't good enough to know how to manipulate the arrays correctly.
I've figured out that I could split the array into channels with cv2.split, but I'm uncertain how to combine them again in the right format (I don't know whether using cv2.split is optimal, or if there are better ways in numpy).
Thanks for any help.
You can use numpy.rollaxis as follows.
If your image has shape (height, width, channels):
import numpy as np
new_shaped_image = np.rollaxis(image, axis=2, start=0)
This moves axis 2 of image (the channel axis) to position 0,
so new_shaped_image.shape will be (channels, height, width).
arr.transpose(2,0,1).shape
# (3, 256, 256)