I need to convert an image, stored in a numpy array loaded via cv2, into the correct format for the deep learning library mxnet and its convolutional layers.
My current images are shaped as follows: (256, 256, 3), or (height, width, channels).
From what I've been told, this actually needs to be (3, 256, 256), or (channels, height, width).
Unfortunately, my knowledge of numpy/Python OpenCV isn't good enough to know how to manipulate the arrays correctly.
I've figured out that I could split the array into channels with cv2.split, but I'm uncertain how to combine them again in the right format (and I don't know whether cv2.split is even the best approach, or if there are better ways in numpy).
Thanks for any help.
You can use numpy.rollaxis as follows.
If your image has shape (height, width, channels):
import numpy as np
new_shaped_image = np.rollaxis(image, axis=2, start=0)
This means that axis 2 of image will be moved to position 0 in new_shaped_image.
So new_shaped_image.shape will be (channels, height, width).
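For instance, with a dummy array standing in for your image (the numpy docs nowadays recommend np.moveaxis, which does the same job):

import numpy as np

image = np.zeros((256, 256, 3))  # (height, width, channels)
new_shaped_image = np.rollaxis(image, axis=2, start=0)
print(new_shaped_image.shape)  # (3, 256, 256)
print(np.moveaxis(image, -1, 0).shape)  # equivalent, preferred by the numpy docs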
Alternatively, you can simply transpose the axes:
arr.transpose(2, 0, 1).shape
# (3, 256, 256)
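For example, assuming the image was read with cv2.imread (the filename here is just a placeholder); note that transpose returns a non-contiguous view, so np.ascontiguousarray is useful if the downstream library expects contiguous memory:

import cv2
import numpy as np

arr = cv2.imread('image.png')  # (height, width, channels), BGR order
chw = np.ascontiguousarray(arr.transpose(2, 0, 1))  # (channels, height, width)
print(chw.shape)  # (3, 256, 256) for a 256x256 image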
I've used Keras and TensorFlow thinking that Input(shape=(1280, 224, 1)) will accept a tensor of a grayscale image of width=1280 and height=224, and that is what I had been using for all the pretrained models such as ResNet, via ImageDataGenerator.flow_from_directory(target_size=(1280, 224)).
Something like this is given in the ResNet50 GitHub code, which mentions width and height:
input_shape: optional shape tuple, only to be specified
if include_top is False (otherwise the input shape
has to be (224, 224, 3) (with channels_last data format)
or (3, 224, 224) (with channels_first data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 32.
But then I suddenly stumbled across tf.image.resize, where the size argument is given as:
size: A 1-D int32 Tensor of 2 elements: new_height, new_width. The new size for the images
Either my whole approach has been a mistake, or the functionality is different here. Please help.
Please correct me, but I think I have found the answer. Going through the Conv2D documentation, I found:
data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch_size, height, width, channels) while channels_first corresponds to inputs with shape (batch_size, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be channels_last.
The documentation of ImageDataGenerator says the same:
data_format Image data format, either "channels_first" or "channels_last". "channels_last" mode means that the images should have shape (samples, height, width, channels), "channels_first" mode means that the images should have shape (samples, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
So I think TF/Keras uses (height, width), while PIL/Pillow uses (width, height), as its documentation says:
size – The requested size in pixels, as a 2-tuple: (width, height).
Please correct me if I am wrong.
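A minimal sketch that makes the two conventions visible (assuming Pillow and TF are installed; the sizes are just the examples from the question):

import numpy as np
import tensorflow as tf
from PIL import Image

img = Image.new('RGB', (1280, 224))  # PIL size is (width, height)
arr = np.asarray(img)
print(arr.shape)  # (224, 1280, 3), i.e. (height, width, channels)
resized = tf.image.resize(arr, [224, 1280])  # size is (new_height, new_width)
print(resized.shape)  # (224, 1280, 3)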
I saw a face detection model which contains the function below, but I could not understand what the expand_dims function is for. Can anyone explain what it is and why we are using it?
from numpy import expand_dims

def get_embedding(model, face_pixels):
    # standardize pixel values (zero mean, unit variance)
    face_pixels = face_pixels.astype('float32')
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std
    # expand_dims adds a batch dimension: (h, w, c) -> (1, h, w, c)
    samples = expand_dims(face_pixels, axis=0)
    yhat = model.predict(samples)
    return yhat[0]
tf.keras.layers.Conv2D layers expect input with a 4D shape:
(n_samples, height, width, channels)
Most libraries that load images will load them in 3D, like this:
(height, width, channels)
By using np.expand_dims(image, axis=0) or tf.expand_dims(image, axis=0), you add a batch dimension at the beginning, effectively turning your data into the 4D format that Keras needs for Conv2D layers. For instance, going from:
(224, 224, 3)
to:
(1, 224, 224, 3)
If you give Conv2D 3D data, it will raise an error like this:
ValueError: Error when checking input: expected conv2d_19_input to have 4 dimensions, but got array with shape (60000, 28, 28)
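A quick illustration with a dummy image (the names here are just placeholders):

import numpy as np

image = np.zeros((224, 224, 3), dtype='float32')  # a single image, (height, width, channels)
batch = np.expand_dims(image, axis=0)  # add the batch dimension in front
print(batch.shape)  # (1, 224, 224, 3)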
I used to use Keras, and the image format it followed is [Height x Width x Channels x Samples]. I decided to switch to PyTorch, but I didn't switch out my data loading schemes. So now I have numpy arrays of shape HxWxCxS instead of SxCxHxW, which is what PyTorch requires. Does anyone have any idea how to convert this?
First, Keras format is (samples, height, width, channels).
All you need to do is moved = numpy.moveaxis(data, -1, 1).
If by luck you were using the non-default config "channels_first", then the config is identical to that of PyTorch, which is (samples, channels, height, width).
And when transforming to torch: data = torch.from_numpy(moved)
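Putting the two steps together, a minimal sketch with random data (the shapes are just examples):

import numpy as np
import torch

data = np.random.rand(8, 224, 224, 3)  # (samples, height, width, channels)
moved = np.moveaxis(data, -1, 1)  # (samples, channels, height, width)
tensor = torch.from_numpy(moved)  # shares memory with the numpy array
print(tensor.shape)  # torch.Size([8, 3, 224, 224])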
You can convert your numpy arrays to tensors in pytorch quite easily by using the from_numpy function:
import torch
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
b is now usable in PyTorch; note that it shares memory with a, so changes to one are reflected in the other.
By convention an image tensor is always 3D: one dimension for its height, one for its width, and a third one for its color channel. Its shape looks like (height, width, color).
For instance, a batch of 128 color images of size 256x256 could be stored in a 4D tensor of shape (128, 256, 256, 3). The color channel here represents RGB colors. Another example: a batch of 128 grayscale images could be stored in a 4D tensor of shape (128, 256, 256, 1). The color could be coded as 8-bit integers.
In the second example, the last dimension is a vector containing only one element. It would then seem possible to use a 3D tensor of shape (128, 256, 256) instead.
Here comes my question: is there a difference between using a 3D tensor rather than a 4D tensor as the training input of a deep learning framework such as Keras?
EDIT: My input layer is a Conv2D.
If you take a look at the Keras documentation of the Conv2D layer, you will see that the shape of the input tensor must be 4D.
Conv2D layer input shape:
4D tensor with shape: (batch, channels, rows, cols) if data_format is "channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if data_format is "channels_last".
So the 4th dimension of the shape is mandatory, even if it is only 1, as for a grayscale image.
So in fact it is not a matter of performance gain or simplicity; it is simply the mandatory shape of the input argument.
Hope it answers your question.
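So for the grayscale case, you just add the missing channel dimension yourself; a minimal sketch with dummy data:

import numpy as np

batch = np.zeros((128, 256, 256))  # 3D: (samples, height, width)
batch = np.expand_dims(batch, axis=-1)  # 4D: (samples, height, width, 1)
print(batch.shape)  # (128, 256, 256, 1)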
In the TensorFlow cifar10 tutorial, in the distorted_inputs function, they set the shape of the float image by doing:
float_image.set_shape([height, width, 3])
But isn't the tensor already in that shape after we create distorted_image?
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
Why do we need to set the shape of the tensor again before we pass it to the batch function?
Here is my opinion on why it is done, based on the documentation. We start with:
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
distorted_image = tf.image.random_flip_left_right(distorted_image)
distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
float_image = tf.image.per_image_standardization(distorted_image)
After random_crop() we know the shape of the tensor; from the documentation, the shape stays the same after random_flip_left_right(). But the documentation does not state that the shape is preserved after random_brightness() and random_contrast(). (Notice the difference from the previous functions, whose return-value documentation mentions that the shape stays the same.)
So, as far as I understand, TF does not know the exact shape of the image afterwards, and we simply remind TF of it with set_shape:
The tf.Tensor.set_shape method updates the static shape of a Tensor
object, and it is typically used to provide additional shape
information when this cannot be inferred directly.
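A minimal sketch of what set_shape does (TF 1.x graph mode, written here via tf.compat.v1): it only records static shape information, it never changes the data.

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32)  # graph-mode tensor with unknown static shape
print(x.shape)  # <unknown>
x.set_shape([24, 24, 3])  # adds static shape info only; no reshape happens
print(x.shape)  # (24, 24, 3)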