I am trying to implement deconvolution in Keras. My model definition is as follows:
model=Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3,border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
I want to perform deconvolution or transposed convolution on the output given by the first convolution layer i.e. convolution2d_1.
Lets say the feature map we have after first convolution layer is X which is of (9, 32, 32, 32) where 9 is the no of images of dimension 32x32 I have passed through the layer. The weight matrix of the first layer obtained by get_weights() function of Keras. The dimension of weight matrix is (32, 3, 3, 2).
The code I am using for performing transposed convolution is
conv_out = K.deconv2d(self.x, W, (9,3,32,32), dim_ordering = "th")
deconv_func = K.function([self.x, K.learning_phase()], conv_out)
X_deconv = deconv_func([X, 0 ])
But getting error:
CorrMM shape inconsistency:
bottom shape: 9 32 34 34
weight shape: 3 32 3 3
top shape: 9 32 32 32 (expected 9 3 32 32)
Can anyone please tell me where I am going wrong?
You can easily use Deconvolution2D layer.
Here is what you are trying to achieve:
batch_sz = 1
output_shape = (batch_sz, ) + X_train.shape[1:]
conv_out = Deconvolution2D(3, 3, 3, output_shape, border_mode='same')(model.layers[0].output)
deconv_func = K.function([model.input, K.learning_phase()], [conv_out])
test_x = np.random.random(output_shape)
X_deconv = deconv_func([test_x, 0 ])
But its better to create a functional model which will help both for training and prediction..
batch_sz = 10
output_shape = (batch_sz, ) + X_train.shape[1:]
conv_out = Deconvolution2D(3, 3, 3, output_shape, border_mode='same')(model.layers[0].output)
model2 = Model(model.input, [model.output, conv_out])
model2.summary()
model2.compile(loss=['categorical_crossentropy', 'mse'], optimizer='adam')
model2.fit(X_train, [Y_train, X_train], batch_size=batch_sz)
In Keras, Conv2DTranspose layer perform transposed convolution in other terms deconvolution. It supports both backend lib i.e. Theano & Keras.
Keras Documentation says:
Conv2DTranspose
Transposed convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire
to use a transformation going in the opposite direction of a normal
convolution, i.e., from something that has the shape of the output of
some convolution to something that has the shape of its input while
maintaining a connectivity pattern that is compatible with said
convolution.
Related
I'm following this tutorial from Nabeel Ahmed to create your own emotion detector using Keras (I'm a noob) and I've found a strange behaviour that I'd like to understand. The input data is a bunch of 48x48 images, each one with an integer value between 0 and 6 (each number stands for an emotion label), which represents the emotion present in the image.
train_X.shape -> (28709, 2304) // training-data, 28709 images of 48x48
train_Y.shape -> (28709,) //The emotion present in each image as an integer, 1 = happiness, 2 = sadness, etc.
val_X.shape -> (3589, 2304)
val_Y.shape -> (3589, )
In order to feed the data into the model, train_X and val_X are reshaped (as the tutorial explains)
train_X.shape -> (28709, 48, 48, 1)
val_X.shape -> (3589, 48, 48, 1)
The model, as it is in the tutorial, is this one:
model = Sequential()
input_shape = (48,48,1)
#1st convolution layer
model.add(Conv2D(64, (5, 5), input_shape=input_shape,activation='relu', padding='same'))
model.add(Conv2D(64, (5, 5), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
#2nd convolution layer
model.add(Conv2D(128, (5, 5),activation='relu',padding='same'))
model.add(Conv2D(128, (5, 5),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
#3rd convolution layer
model.add(Conv2D(256, (3, 3),activation='relu',padding='same'))
model.add(Conv2D(256, (3, 3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))
################################################################
model.add(Dense(7)) # <- problematic line
################################################################
model.add(Activation('softmax'))
my_optimiser = tf.keras.optimizers.Adam(
learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False,
name='Adam')
model.compile(loss='categorical_crossentropy', metrics=['accuracy'],optimizer=my_optimiser)
However, when I try to use it, using the tutorial snippet, I get an error in the line of the validation_data like this
history = model.fit(train_X,
train_Y,
batch_size=64,
epochs=80,
verbose=1,
validation_data=(val_X, val_Y),
shuffle=True)
ValueError: Shapes (None, 1) and (None, 7) are incompatible
After reviewing the code and the documentation about the fit method, my only idea was to change the 7 in the last Dense layer of the model to 1, which mysteriously works. I'd like to know what is happening here if anyone could give me a hint.
You seem to be working with sparse integer labels, where each sample belongs to one of seven classes {0, 1, 2, 3, 4, 5, 6}, so I would recommend using SparseCategoricalCrossentropy instead of CategoricalCrossentropy as your loss function. Just change this parameter and your model should work fine. If you want to use CategoricalCrossentropy, you will have to one-hot encode your labels, for example with:
train_Y = tf.keras.utils.to_categorical(train_Y, num_classes=7)
I am trying to combine CNN and LSTM for image classification.
I tried the following code and I am getting an error. I have 4 classes on which I want to train and test.
Following is the code:
from keras.models import Sequential
from keras.layers import LSTM,Conv2D,MaxPooling2D,Dense,Dropout,Input,Bidirectional,Softmax,TimeDistributed
input_shape = (200,300,3)
Model = Sequential()
Model.add(TimeDistributed(Conv2D(
filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape)))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=24, kernel_size=(8, 12), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=32, kernel_size=(5, 7), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(Bidirectional(LSTM((10),return_sequences=True)))
Model.add(Dense(64,activation='relu'))
Model.add(Dropout(0.5))
Model.add(Softmax(4))
Model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')
Model.build(input_shape)
I am getting the following error:
"Input tensor must be of rank 3, 4 or 5 but was {}.".format(n + 2))
ValueError: Input tensor must be of rank 3, 4 or 5 but was 2.
I found a lot of problems in the code:
your data are in 4D so simple Conv2D are ok, TimeDistributed is not needed
your output is 2D so set return_sequences=False in the last LSTM cell
your last layers are very messy: no need to put a dropout between a layer output and an activation
you need categorical_crossentropy and not sparse_categorical_crossentropy because your target is one-hot encoded
LSTM expects 3D data. So you need to pass from 4D (the output of convolutions) to 3D. There are two possibilities you can adopt: 1) make a reshape (batch_size, H, W * channel); 2) (batch_size, W, H * channel). In this way, u have 3D data to use inside your LSTM
here a full model example:
def ReshapeLayer(x):
shape = x.shape
# 1 possibility: H,W*channel
reshape = Reshape((shape[1],shape[2]*shape[3]))(x)
# 2 possibility: W,H*channel
# transpose = Permute((2,1,3))(x)
# reshape = Reshape((shape[1],shape[2]*shape[3]))(transpose)
return reshape
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=24, kernel_size=(8, 12), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=32, kernel_size=(5, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Lambda(ReshapeLayer)) # <========== pass from 4D to 3D
model.add(Bidirectional(LSTM(10, activation='relu', return_sequences=False)))
model.add(Dense(nclasses,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
model.summary()
here the running notebook
I'm trying to identify the sequence of images. I've 2 images and I need to identify the 3rd one. All are color images.
I'm getting below error:
ValueError: Error when checking input: expected
time_distributed_1_input to have 5 dimensions, but got array with
shape (32, 128, 128, 6)
This is my layer:
batch_size = 32
height = 128
width = 128
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), activation = 'relu'), input_shape=(batch_size, height, width, 2 * 3)))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(LSTM(256, return_sequences=True, dropout=0.5))
model.add(Conv2D(3, (3, 3), activation='relu', padding='same'))
model.compile(optimizer='adam')
model.summary()
My input images shapes are:
(128, 128, 2*3) [as I'm concatenating 2 input images]
My output image shape is:
(128, 128, 3)
You have applied the conv layer after Flatten(). This causes error because after flattening the data flowing through the Network is no more a 2D object.
I suggest you to keep the convolutional and the recurrent phases separated. First, you apply convolution to images, training the model to extract their relevant features. Later, you push these features into LSTM layers, so that you can capture also the information hidden in their sequence.
Hope this helps, otherwise let me know.
--
EDIT:
According to the error that you get, it seems that you are also not feeding the exact input shape. Keras is saying: "I need 5 dimensions, but you gave me 4". A TimeDistributed() layers needs a shape such as: (sample, time, width, length, channel). Your input lacks time, apparently.
I suggest you to print your model.summary() before running, and check the layer called time_distributed_1_input. That's the one your compiler is upset with.
I have the fllowing model in keras.
model = Sequential()
model.add(Conv2D(4, (3, 3), input_shape=input_shape, name='Conv2D_0', padding = 'same', use_bias=False, activation=None))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(8, (3, 3), name='Conv2D_1', padding='same', use_bias=False, activation=None))
model.add(MaxPooling2D(pool_size=(2, 2)))
input_shape is (32, 32). So, for the first layer, if I have a an image of size (32, 32), I get 4 images of size (32, 32). So the input image is convoluted with 4 diffrent kernels. After the pooling layer, I get 4 images of size (16, 16).
The second convolutional layer gives me 8 images of size (16, 16). This layer has
4*8 kernels. The kernels have the size (3, 3, 4, 8). But I don't get, how the 8 output images are computed.
I thought for example for the first image I can do sth like:
H_i : i-th output image of the first Pooling layer
Ker_i : i-th kernel. (:, :, i, 0)
So the first output image of the second convolutional layer could be:
conv(H_0, ker_0) + conv(H_1, ker_1) + conv(H_2, ker_2) + conv(H_3, ker_3)
But this seems to be wrong.
Can anyone explaine me, how the second conv-layer computes the output images?
Thank you for your help.
I have 1000, 28*28 resolution images. I converted those 1000 images into numpy array and formed a new array with size of (1000,28,28). So, while
creating convolution layer using keras, input shape(X value) is specified as (1000,28,28) and output shape(Y value) as (1000,10). Because I ha
ve 1000 examples are inputs and 10 categories of output.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal',input_shape=(1000,28,28)))
.
.
.
model.fit(train_x,train_y,batch_size=32,epochs=10,verbose=1)
So, while using fit function, it shows ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (1000, 28, 28) as error. Pls help me guys to provide proper input and output dimension for CNN.
Code:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal',input_shape=(4132,28,28)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
model.summary()
train_x = numpy.array([train_x])
model.fit(train_x,train_y,batch_size=32,epochs=10,verbose=1)
You need to change the inputs to 4 dimensions with channel set to 1 : (1000, 28, 28, 1) and you need to change the input_shape of the convolutional layer to (28, 28, 1):
model.add(Conv2D(32, kernel_size=(3, 3),...,input_shape=(28,28,1)))
Your numpy arrays need a fourth dimension, the common standard is to number the samples with the first dimension, so changing (1000, 28, 28) to (1, 1000, 28, 28).
You can read more about this here.
from your input it looks like you are using tensorflow as back end.
In keras the input_shape should always be 3 dimension .
For tensorflow as a backend the input_shape to your model will be
input_shape = [img_height,img_width,channels(depth)]
in your case for tensor flow backend that should be
input_shape = [28,28,1]
and the shape of train_x should be
train_x = [batch_size,img_height,img_width,channels(depth)]
in your case
train_x = [1000,28,28,1]
As you are using a gray scale image,the dimension of the image will be (image_height, image_width) and hence you have to add an extra dimension to the image which will result to (image_height, image_width, 1) the '1' suggests the depth of the image,for gray scale that is '1' and for rgb that is '3'.