What is the use of expand_dims in image processing? - python

I saw a face detection model that includes the function below, but I could not understand what the expand_dims call is for. Can anyone explain what it does and why it is used here?
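from numpy import expand_dims  # assumed import: the snippet calls it unqualified

def get_embedding(model, face_pixels):
    # convert to float so the arithmetic below is not done on integers
    face_pixels = face_pixels.astype('float32')
    # standardize pixel values across all channels
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std
    # add a batch dimension: (h, w, c) -> (1, h, w, c)
    samples = expand_dims(face_pixels, axis=0)
    # predict returns one result per sample; take the only one
    yhat = model.predict(samples)
    return yhat[0]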

tf.keras.layers.Conv2D layers expect input with a 4D shape:
(n_samples, height, width, channels)
Most libraries that load images will load in 3D like this:
(height, width, channels)
By using np.expand_dims(image, axis=0) or tf.expand_dims(image, axis=0), you add a batch dimension at the beginning, turning your data into the 4D format that Keras needs for Conv2D layers. For instance:
(224, 224, 3)
to:
(1, 224, 224, 3)
If you give Conv2D 3D data, it raises an error like this:
ValueError: Error when checking input: expected conv2d_19_input to have 4 dimensions, but got array with shape (60000, 28, 28)
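A minimal NumPy sketch of that shape change. The random array is only a stand-in for a loaded face image, and the 160x160 size is an assumption for illustration, not something taken from the question:

```python
import numpy as np

# Stand-in for one loaded image: (height, width, channels).
# The 160x160x3 size is an arbitrary assumption for illustration.
face_pixels = np.random.rand(160, 160, 3).astype("float32")

# Add a batch axis at position 0: a batch holding one sample.
samples = np.expand_dims(face_pixels, axis=0)
print(face_pixels.shape, "->", samples.shape)  # (160, 160, 3) -> (1, 160, 160, 3)

# Indexing [0] takes the single sample back out of the batch.
restored = samples[0]
print(np.array_equal(restored, face_pixels))  # True
```

model.predict is batch-first in the same way: it returns one row per sample, which is why get_embedding returns yhat[0].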

Related

Difference between the input shape for a 1D CNN, 2D CNN and 3D CNN

I'm building a CNN model for image classification for the first time, and I'm a little confused about what the input shape should be for each type (1D CNN, 2D CNN, 3D CNN) and how to choose the number of filters in the convolution layers. My data is 100x100x30, where 30 is the number of features.
Here is my attempt at the 1D CNN using the Keras Functional API:
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.models import Model

def create_CNN1D_model(pool_type='max', conv_activation='relu'):
    # The Functional API needs a Keras Input tensor here, not a plain tuple
    input_layer = Input(shape=(30, 1))
    conv_layer1 = Conv1D(filters=16, kernel_size=3, activation=conv_activation)(input_layer)
    max_pooling_layer1 = MaxPooling1D(pool_size=2)(conv_layer1)
    conv_layer2 = Conv1D(filters=32, kernel_size=3, activation=conv_activation)(max_pooling_layer1)
    max_pooling_layer2 = MaxPooling1D(pool_size=2)(conv_layer2)
    flatten_layer = Flatten()(max_pooling_layer2)
    dense_layer = Dense(units=64, activation='relu')(flatten_layer)
    output_layer = Dense(units=10, activation='softmax')(dense_layer)
    CNN_model = Model(inputs=input_layer, outputs=output_layer)
    return CNN_model

CNN1D = create_CNN1D_model()
CNN1D.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Trace = CNN1D.fit(X, y, epochs=50, batch_size=100)
However, when I tried the 2D CNN model by just changing Conv1D and MaxPooling1D to Conv2D and MaxPooling2D, I got the following error:
ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)
Can anyone please tell me what the input shape should be for a 2D CNN and a 3D CNN? And what preprocessing can be done on the input data?
TL;DR: your X_train can be viewed as (batch, spatial dims..., channels). A kernel moves over the spatial dimensions and covers all channels at once. So a 2D CNN requires two spatial dimensions: (batch, dim 1, dim 2, channels).
So for images of shape (100,100,3), you need a 2D CNN that convolves over the 100-pixel height and the 100-pixel width, across all 3 channels.
Let's unpack that statement.
First, you need to understand what a CNN (in general) is doing.
A kernel slides over the spatial dimensions of a tensor, across its feature maps/channels, and at each position applies a simple operation (a dot product) to the values under it.
Kernel moves over the spatial dimensions
Now, let's say you have a batch of 100 images. Each image is 28 by 28 pixels and has 3 channels, R, G, B (which are also called feature maps in the context of CNNs). Stored as a tensor, this data has the shape (100,28,28,3).
However, the data might have no height at all (a signal, for example), or it might have an extra spatial dimension, such as a video (height, width, and time).
In general, the input to a CNN-based neural network has the shape (batch, spatial dims..., channels).
Same kernel, all channels
The second key point is that a 2D kernel convolves over 2 spatial dimensions, but it does so across all the feature maps/channels at once. So a (3,3) kernel is applied to the R, G, B channels together (in Keras, each filter holds a separate 3x3 slice of weights for every input channel) as it moves over the height and width of the image.
Operation is a dot product
Finally, the operation (for a single feature map/channel and a single convolution window) is an element-wise multiplication of the kernel with the window, followed by a sum of the results.
Therefore, in short -
A kernel gets applied to the spatial dimensions of the data
A kernel has one axis per spatial dimension (a 2D kernel for 2 spatial dimensions)
A kernel applies over all the feature maps/channels at once
The operation is a simple dot product between the kernel and window
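The summary above can be sketched in plain NumPy. This is only an illustration of the idea for one filter, with no padding and a stride of 1; conv2d_single_filter is a hypothetical helper name, and this is not how Keras implements convolution internally:

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Naive 'valid' 2D convolution (no padding, stride 1) for one filter.

    image:  (height, width, channels)
    kernel: (k_h, k_w, channels), one slice of weights per input channel
    """
    h, w, _ = image.shape
    k_h, k_w, _ = kernel.shape
    out = np.zeros((h - k_h + 1, w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + k_h, j:j + k_w, :]
            # Dot product: multiply element-wise, then sum over
            # height, width and all channels at once.
            out[i, j] = np.sum(window * kernel)
    return out

image = np.ones((5, 7, 3))    # height 5, width 7, 3 channels
kernel = np.ones((3, 3, 3))   # 3x3 window spanning all 3 channels
feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)   # (3, 5)
print(feature_map[0, 0])   # 27.0
```

The kernel only moves along height and width; the channel axis is consumed by the sum, which is why one filter produces one output feature map.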
Consider tensors with a single feature map/channel (for an image, that means greyscale).
With that intuition, to use a 1D CNN your data must have 1 spatial dimension, which means each sample needs to be 2D (spatial dimension and channels), so X_train must be a 3D tensor (batch, spatial dimension, channels).
Similarly, a 2D CNN has 2 spatial dimensions (H, W for example), so each sample is 3D (H, W, channels) and X_train is (samples, H, W, channels).
Let's try this with code -
import tensorflow as tf
from tensorflow.keras import layers
X_2D = tf.random.normal((100,7,3)) #Samples, width/time, channels (feature maps)
X_3D = tf.random.normal((100,5,7,3)) #Samples, height, width, channels (feature maps)
X_4D = tf.random.normal((100,6,6,2,3)) #Samples, height, width, time, channels (feature maps)
For applying 1D CNN -
#With padding='same', the input is zero-padded so the output keeps its spatial size
#Out featuremaps = 10, kernel (3,)
cnn1d = layers.Conv1D(10, 3, padding='same')(X_2D)
print(X_2D.shape,'->',cnn1d.shape)
#(100, 7, 3) -> (100, 7, 10)
For applying 2D CNN -
#Out featuremaps = 10, kernel (3,3)
cnn2d = layers.Conv2D(10, (3,3), padding='same')(X_3D)
print(X_3D.shape,'->',cnn2d.shape)
#(100, 5, 7, 3) -> (100, 5, 7, 10)
For 3D CNN -
#Out featuremaps = 10, kernel (3,3,2)
cnn3d = layers.Conv3D(10, (3,3,2), padding='same')(X_4D)
print(X_4D.shape,'->',cnn3d.shape)
#(100, 6, 6, 2, 3) -> (100, 6, 6, 2, 10)
By a 100x100x30 input shape, do you mean that the batch size is 100? Or does each sample have the shape 100x100x30? In the second case, you must use a Conv2D layer instead. The input shape of each layer (excluding the batch dimension) should be:
Conv1D: (size1, channel_number), Conv2D: (size1, size2, channel_number), Conv3D: (size1, size2, size3, channel_number)
The 1D, 2D and 3D in the layer names refer to the number of spatial dimensions the kernel moves over, not counting the channel dimension.
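As a rough way to check this before building a model, the rule "batch axis + spatial axes + channel axis" can be written as a tiny helper. fits_conv is a hypothetical name, the batch size of 8 is arbitrary, and the sketch assumes the default channels_last layout:

```python
import numpy as np

def fits_conv(batch, n_spatial_dims):
    """Return True if `batch` has the rank a ConvND layer expects.

    Hypothetical helper: rank = 1 (batch) + n_spatial_dims + 1 (channels),
    assuming the default channels_last layout.
    """
    return batch.ndim == n_spatial_dims + 2

# 8 samples of 100x100 with the 30 features treated as channels.
X = np.zeros((8, 100, 100, 30), dtype="float32")

print(fits_conv(X, 1))  # False: Conv1D wants (batch, steps, channels)
print(fits_conv(X, 2))  # True:  Conv2D wants (batch, h, w, channels)
print(fits_conv(X, 3))  # False: Conv3D wants (batch, d1, d2, d3, channels)
```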

Reshaping 2D Grayscale into 4D for Keras Model Inference

I have a pre-trained Keras model that I need to use to classify a 512x512 image that is originally in grayscale format. The input to the Keras model should have the shape (None, 512, 512, 1).
I executed the following code:
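# assumed imports; the original snippet did not show them
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model('model.h5')
img = Image.open('img.jpg')
img_array = np.array(img)
img_array = img_array / 255
model.predict(img_array)  # fails: img_array has shape (512, 512)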
However, I get the following error
Error when checking input: expected input_1 to have 4 dimensions, but got array with shape (512, 512)
I know that I need to reshape my grayscale image into 4D to match the desired input shape; however, I am not sure how to do this so that the image keeps its original features. How can I properly make the grayscale image 4D?
Thanks.
Try reshaping the array:
img_array = img_array.reshape((1, 512, 512, 1))
Here the first and last dimensions are the batch size and the number of channels, respectively.
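To see that the reshape does not disturb the image itself, here is a small NumPy check. The random array is only a stand-in for the image loaded from img.jpg:

```python
import numpy as np

# Stand-in for the grayscale image loaded with PIL: shape (512, 512).
img_array = np.random.randint(0, 256, size=(512, 512)).astype("float32") / 255

batch = img_array.reshape((1, 512, 512, 1))
print(batch.shape)  # (1, 512, 512, 1)

# The pixel values and their layout are unchanged.
print(np.array_equal(batch[0, :, :, 0], img_array))  # True

# Equivalent form that does not hard-code the image size.
same = img_array[np.newaxis, ..., np.newaxis]
print(np.array_equal(same, batch))  # True
```

Both forms only add axes of length 1, so no pixel is moved or altered.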

Tensorflow Keras Conv2D error with 2D numpy array input

I would like to train a CNN using a 2D numpy array as input, but I am receiving this error: ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (21, 21).
My input is indeed a 21x21 numpy array of floats. The first layer of the network is defined as Conv2D(32, (3, 3), input_shape=(21, 21, 1)) to match the shape of the input array.
I have found some similar questions, but none pertaining to a 2D input array; they mostly deal with images. According to the documentation, Conv2D expects a 4D input tensor containing (samples, channels, rows, cols), but I cannot find any documentation explaining the meaning of these values. Similar questions pertaining to image inputs suggest reshaping the input array using np.ndarray.reshape(), but when I try that I receive an input error.
How can I train a CNN on such an input array? Should input_shape be a different size tuple?
Your current numpy array has dimensions (21, 21). However, TensorFlow expects input tensors to have dimensions in the format (batch_size, height, width, channels), or BHWC, which means you need to convert your numpy input array to 4 dimensions (from the current 2 dimensions). One way to do so is as follows:
import numpy as np
input_array = np.expand_dims(input_array, axis=0)   # add the batch axis
input_array = np.expand_dims(input_array, axis=-1)  # add the channel axis
Now, the numpy input array has dimensions: (1, 21, 21, 1) which can be passed to a TF Conv2D operation.
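For training you will usually have many such arrays rather than one. A minimal sketch, assuming the samples are held in a Python list (the count of 50 and the random values are placeholders, not from the question):

```python
import numpy as np

# Stand-in training data: 50 separate 21x21 float arrays.
# The count of 50 is an arbitrary assumption.
arrays = [np.random.rand(21, 21) for _ in range(50)]

# Stack along a new leading batch axis, then add a channel axis.
X = np.stack(arrays, axis=0)   # (50, 21, 21)
X = X[..., np.newaxis]         # (50, 21, 21, 1)
print(X.shape)  # (50, 21, 21, 1)
```

The input_shape=(21, 21, 1) on the first layer can stay as it is, because input_shape describes a single sample and leaves out the batch axis.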
Hope this helps! :)

Is it possible to feed the pretrained Inception model (tensorflow 2.0/Keras) with 2D grayscale images?

According to Keras 2.0 documentation, in relation to the input shape of the images that can be fed to the pretrained inception model:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (299, 299, 3) (with
'channels_last' data format) or (3, 299, 299) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 75. E.g. (150, 150, 3) would be one
valid value.
However, I am dealing with grayscale images, which are 2D. How should I deal with this situation?
You can copy the grayscale image 3 times to get a pseudo-RGB image:
import numpy as np
img = np.zeros((224, 224))  # placeholder grayscale image
If your image has a shape of length 2 (only height and width), you first need to add an extra dimension:
img = np.expand_dims(img, -1)
Then you repeat this last dimension 3 times:
img = np.repeat(img, 3, 2)
print(img.shape)
# (224, 224, 3)

Difference between 3D-tensor and 4D-tensor for images input of DL Keras framework

By convention, an image tensor is always 3D: one dimension for its height, one for its width and a third one for its color channels. Its shape looks like (height, width, color).
For instance, a batch of 128 color images of size 256x256 could be stored in a 4D tensor of shape (128, 256, 256, 3). The color channels here represent RGB colors. Another example is a batch of 128 grayscale images stored in a 4D tensor of shape (128, 256, 256, 1). The color could be coded as 8-bit integers.
For the second example, the last dimension is a vector containing only one element. It is then possible to use a 3D tensor of shape (128, 256, 256) instead.
Here is my question: I would like to know if there is a difference between using a 3D tensor rather than a 4D tensor as the training input of a deep-learning framework using Keras.
EDIT: My input layer is a Conv2D.
If you take a look at the Keras documentation of the Conv2D layer, you will see that the input tensor must be 4D.
Conv2D layer input shape:
4D tensor with shape: (batch, channels, rows, cols) if data_format is "channels_first" or 4D tensor with shape: (batch, rows, cols, channels) if data_format is "channels_last".
So the 4th dimension of the shape is mandatory, even if it is only 1, as for a grayscale image.
So in fact, it is not a matter of performance or simplicity; it is simply the shape that the input argument is required to have.
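Going from the 3D tensor to either 4D layout is cheap. A NumPy sketch, using small 32x32 images as a stand-in for the 256x256 ones to keep it light:

```python
import numpy as np

# Batch of 128 grayscale images without a channel axis.
# 32x32 is an assumption standing in for 256x256.
gray_3d = np.zeros((128, 32, 32), dtype="uint8")

# channels_last: (batch, rows, cols, channels)
channels_last = np.expand_dims(gray_3d, axis=-1)
print(channels_last.shape)  # (128, 32, 32, 1)

# channels_first: (batch, channels, rows, cols)
channels_first = np.moveaxis(channels_last, -1, 1)
print(channels_first.shape)  # (128, 1, 32, 32)

# Same number of values in all three; only the shape differs.
print(gray_3d.size == channels_last.size == channels_first.size)  # True
```

Which of the two 4D layouts the layer expects depends on its data_format setting, as quoted above.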
Hope it answers your question.
