The Input Shape for ConvLSTM model for image prediction

I am trying to predict images using a ConvLSTM model, but I am having trouble understanding the shape of the data set that I should feed into the network.
I searched the internet and found examples such as "Precipitation Prediction Using ConvLSTM" and a few other models where ConvLSTM is used for prediction. For a first trial I chose the model from the precipitation problem, to see how it reacts to my data set.
def fn_get_model_convLSTM_tframe_5():
    model = Sequential()
    model.add(ConvLSTM2D(filters=64, kernel_size=(7, 7),
                         input_shape=(None, 101, 101, 1), padding='same', return_sequences=True,
                         activation='tanh', recurrent_activation='hard_sigmoid',
                         kernel_initializer='glorot_uniform', unit_forget_bias=True,
                         dropout=0.3, recurrent_dropout=0.3, go_backwards=True))
    model.add(BatchNormalization())
    model.add(ConvLSTM2D(filters=32, kernel_size=(7, 7), padding='same', return_sequences=True,
                         activation='tanh', recurrent_activation='hard_sigmoid',
                         kernel_initializer='glorot_uniform', unit_forget_bias=True,
                         dropout=0.4, recurrent_dropout=0.3, go_backwards=True))
    model.add(BatchNormalization())
    model.add(ConvLSTM2D(filters=32, kernel_size=(7, 7), padding='same', return_sequences=True,
                         activation='tanh', recurrent_activation='hard_sigmoid',
                         kernel_initializer='glorot_uniform', unit_forget_bias=True,
                         dropout=0.4, recurrent_dropout=0.3, go_backwards=True))
    model.add(BatchNormalization())
    model.add(ConvLSTM2D(filters=32, kernel_size=(7, 7), padding='same', return_sequences=False,
                         activation='tanh', recurrent_activation='hard_sigmoid',
                         kernel_initializer='glorot_uniform', unit_forget_bias=True,
                         dropout=0.4, recurrent_dropout=0.3, go_backwards=True))
    model.add(BatchNormalization())
    model.add(Conv2D(filters=1, kernel_size=(1, 1),
                     activation='sigmoid',
                     padding='same', data_format='channels_last'))
    print(model.summary())
    return model
The entire code can be found here:
https://github.com/TeaPearce/precipitation-prediction-convLSTM-keras/blob/master/precip_v09.py
As I understand it, the images fed into that model have the shape 101x101 with 4 color channels. The same is the case for my dataset (I am not giving all the details of my dataset, as I am quite sure a ConvLSTM model is suitable for it and I want to avoid a redundant discussion).
My images are 35x45 with four color channels. In total I have 35 sequences, each containing 140 sequential images. I also have one more sequence on which I should make the prediction: it contains 120 images, and I need to fill it up to 140. I reshaped my dataset into an array of shape (35, 140, 35, 45, 4), so now I have an array called trainx for the training dataset. However, I am confused about the labels I should provide (I am not even sure whether I should provide any, since we are talking about sequential images).
Can anyone please help me prepare this dataset for the neural network and explain clearly what data shape I should provide?

input_shape=(None, 101, 101, 1)
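This input shape means sequences of any length (None) of 101x101 frames with 1 channel; the batch dimension is not included.
If your images have 4 channels, use (None, 101, 101, 4) as the input shape. For the 35x45 images described in the question, that would be (None, 35, 45, 4).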
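The answer above does not cover the labels. One common setup for frame prediction, shown here only as a sketch under assumptions, is to use the first frames of each sequence as input and the frame that follows them as the target. The helper name make_next_frame_pairs and the small array sizes are hypothetical and are not part of the original code.

```python
import numpy as np


def make_next_frame_pairs(sequences, n_input_frames):
    """Split each sequence into input frames and the frame that follows them.

    sequences: array of shape (n_sequences, n_frames, rows, cols, channels)
    Returns (x, y) with
      x: (n_sequences, n_input_frames, rows, cols, channels)
      y: (n_sequences, rows, cols, channels)
    """
    if n_input_frames >= sequences.shape[1]:
        raise ValueError("need at least one frame after the input window")
    x = sequences[:, :n_input_frames]
    y = sequences[:, n_input_frames]
    return x, y


# Small stand-in for the real (35, 140, 35, 45, 4) array to keep memory low.
trainx = np.random.rand(3, 10, 35, 45, 4).astype("float32")
x, y = make_next_frame_pairs(trainx, n_input_frames=9)
print(x.shape)  # (3, 9, 35, 45, 4)
print(y.shape)  # (3, 35, 45, 4)
```

A single-frame target like y matches a model whose last ConvLSTM2D layer uses return_sequences=False, as in the quoted model. With 4-channel targets, the final Conv2D would presumably need filters=4 rather than filters=1 so that the output shape matches y.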

Related

Incompatibility between input and final Dense Layer (Value Error)

I'm following this tutorial from Nabeel Ahmed to create your own emotion detector using Keras (I'm a noob) and I've found a strange behaviour that I'd like to understand. The input data is a bunch of 48x48 images, each one with an integer value between 0 and 6 (each number stands for an emotion label), which represents the emotion present in the image.
train_X.shape -> (28709, 2304) // training-data, 28709 images of 48x48
train_Y.shape -> (28709,) //The emotion present in each image as an integer, 1 = happiness, 2 = sadness, etc.
val_X.shape -> (3589, 2304)
val_Y.shape -> (3589, )
In order to feed the data into the model, train_X and val_X are reshaped (as the tutorial explains)
train_X.shape -> (28709, 48, 48, 1)
val_X.shape -> (3589, 48, 48, 1)
The model, as it is in the tutorial, is this one:
model = Sequential()
input_shape = (48,48,1)
#1st convolution layer
model.add(Conv2D(64, (5, 5), input_shape=input_shape,activation='relu', padding='same'))
model.add(Conv2D(64, (5, 5), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
#2nd convolution layer
model.add(Conv2D(128, (5, 5),activation='relu',padding='same'))
model.add(Conv2D(128, (5, 5),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
#3rd convolution layer
model.add(Conv2D(256, (3, 3),activation='relu',padding='same'))
model.add(Conv2D(256, (3, 3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))
################################################################
model.add(Dense(7)) # <- problematic line
################################################################
model.add(Activation('softmax'))
my_optimiser = tf.keras.optimizers.Adam(
learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False,
name='Adam')
model.compile(loss='categorical_crossentropy', metrics=['accuracy'],optimizer=my_optimiser)
However, when I try to use it, using the tutorial snippet, I get an error in the line of the validation_data like this
history = model.fit(train_X,
train_Y,
batch_size=64,
epochs=80,
verbose=1,
validation_data=(val_X, val_Y),
shuffle=True)
ValueError: Shapes (None, 1) and (None, 7) are incompatible
After reviewing the code and the documentation about the fit method, my only idea was to change the 7 in the last Dense layer of the model to 1, which mysteriously works. I'd like to know what is happening here if anyone could give me a hint.

You seem to be working with sparse integer labels, where each sample belongs to one of seven classes {0, 1, 2, 3, 4, 5, 6}, so I would recommend using SparseCategoricalCrossentropy instead of CategoricalCrossentropy as your loss function. Just change this parameter and your model should work fine. If you want to use CategoricalCrossentropy, you will have to one-hot encode your labels, for example with:
train_Y = tf.keras.utils.to_categorical(train_Y, num_classes=7)
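The shape mismatch in the error can be seen without Keras. The sketch below uses NumPy only, with a hypothetical five-sample label array, to show what one-hot encoding does to the label shape: integer labels have one value per sample, while Dense(7) with softmax produces seven values per sample.

```python
import numpy as np

# Integer labels for 5 samples, each in {0, ..., 6}.
train_Y = np.array([0, 3, 6, 1, 3])

# One-hot encoding: row i of the identity matrix is the vector for class i.
num_classes = 7
train_Y_onehot = np.eye(num_classes, dtype="float32")[train_Y]

print(train_Y.shape)         # (5,)
print(train_Y_onehot.shape)  # (5, 7)
print(train_Y_onehot[1])     # [0. 0. 0. 1. 0. 0. 0.]
```

Changing the last layer to Dense(1) makes the shapes match, which silences the error, but a softmax over a single unit always outputs 1, so that model is unlikely to learn anything useful.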

How do I access data after model.add(Flatten()) layer?

I am trying to use a CNN for feature extraction and XGBoost for classification of image data. I read that this can be done by extracting the data after the convolution layers. I found some source code for a similar problem and tried it on my own.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='same', activation="relu", input_shape = data.shape[1:]))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(64, kernel_size=(3,3), padding='same', strides=(1,1), activation="relu") )
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2))) #max pool window 2x2
model.add(Conv2D(128, kernel_size=(3,3), padding='same', strides=(1,1), activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2))) #max pool window 2x2
model.add(Conv2D(256, kernel_size=(3,3), padding='same', strides=(1,1), activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2))) #max pool window 2x2
model.add(Flatten())
model.add(Dense(128, activation="relu", name='firstDenseLayer'))
model.add(Dense(1, activation="sigmoid"))
# model.summary()
# print(model)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(data, label, batch_size=16, epochs=10, validation_data=(val_data, val_label))
Below I accessed the dense layer named "firstDenseLayer".
import xgboost as xgb
from keras.models import Model
layerName = 'firstDenseLayer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layerName).output)
intermediate_output = intermediate_layer_model.predict(data)
from xgboost import XGBClassifier
xgbmodel = XGBClassifier(objective='multi:softmax', num_class= 2)
xgbmodel.fit(intermediate_output, label)
xgbmodel.score(intermediate_output, label)
As I am new to this, I have several points of confusion.
How does the data flow? After I extract the features of the pictures via the convolution layers, how do I actually access the data from there?
What is this line of code doing? What data is it extracting?
intermediate_output = intermediate_layer_model.predict(data)
When I omit (keep commented out) the line below,
model.fit(data, label, batch_size=16, epochs=10, validation_data=(val_data, val_label))
from the first snippet and run the XGBoost model directly, XGBoost gives low accuracy; when I keep it, the accuracy is higher. Why is that?
Kindly help me out; I have been stuck on this for quite a while. I am just trying to access the extracted feature data from the last convolution layer and use it for classification with XGBoost. Since I followed a method I found online, I am not sure whether it is the only way of doing it. If there is another way, kindly let me know.

The model.fit(...) line does what you would expect: it trains the convnet defined by model on the given data and labels. It is not surprising that your classifier yields lower accuracy when the features come from randomly initialized weights (i.e. without running fit).
intermediate_layer_model is constructed as a Keras model whose output is the dense layer just before the output layer of model; calling predict on it returns that layer's activations for each input image. Note the name parameter given to the dense layer in the construction of model.
You could just as easily give a name to one of the Conv2D layers and access it the same way. Alternatively, you could store the layer in a Python variable, i.e. instead of
model.add(Conv2D(256, kernel_size=(3,3), padding='same', strides=(1,1), activation="relu"))
in the model construction it could say
last_conv_layer = Conv2D(256, kernel_size=(3,3), padding='same', strides=(1,1), activation="relu")
model.add(last_conv_layer)
Then for the intermediate_layer_model you put
intermediate_layer_model = Model(inputs=model.input, outputs=last_conv_layer.output)
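One detail worth keeping in mind when taking features from a Conv2D layer rather than from the dense layer: the convolution output is 4D (samples, height, width, channels), while XGBClassifier.fit expects a 2D feature matrix. A minimal sketch of the flattening step, using a random NumPy array with hypothetical sizes as a stand-in for the predict output:

```python
import numpy as np

# Stand-in for intermediate_layer_model.predict(data) taken at a Conv2D layer:
# 8 samples, 4x4 feature maps, 256 channels (sizes are hypothetical).
conv_features = np.random.rand(8, 4, 4, 256).astype("float32")

# Collapse everything except the sample axis into one feature axis.
flat_features = conv_features.reshape(conv_features.shape[0], -1)

print(flat_features.shape)  # (8, 4096)
```

This is the same operation the Flatten layer performs inside the model, so taking the output of the Flatten layer directly would give the same 2D array.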

Combining CNN and bidirectional LSTM

I am trying to combine CNN and LSTM for image classification.
I tried the following code and I am getting an error. I have 4 classes on which I want to train and test.
Following is the code:
from keras.models import Sequential
from keras.layers import LSTM,Conv2D,MaxPooling2D,Dense,Dropout,Input,Bidirectional,Softmax,TimeDistributed
input_shape = (200,300,3)
Model = Sequential()
Model.add(TimeDistributed(Conv2D(
filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape)))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=24, kernel_size=(8, 12), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
filters=32, kernel_size=(5, 7), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(Bidirectional(LSTM((10),return_sequences=True)))
Model.add(Dense(64,activation='relu'))
Model.add(Dropout(0.5))
Model.add(Softmax(4))
Model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')
Model.build(input_shape)
I am getting the following error:
"Input tensor must be of rank 3, 4 or 5 but was {}.".format(n + 2))
ValueError: Input tensor must be of rank 3, 4 or 5 but was 2.

I found several problems in the code:
Your data is 4D, so plain Conv2D layers are fine; TimeDistributed is not needed.
Your output is 2D, so set return_sequences=False in the last LSTM layer.
Your last layers are messy: there is no need to put a dropout between a layer's output and its activation.
You need categorical_crossentropy rather than sparse_categorical_crossentropy, because your target is one-hot encoded.
LSTM expects 3D data, so you need to go from 4D (the output of the convolutions) to 3D. There are two possibilities: 1) reshape to (batch_size, H, W * channel); 2) reshape to (batch_size, W, H * channel). Either way, you have 3D data to use inside your LSTM.
Here is a full model example:
from keras.layers import Lambda, Permute, Reshape

def ReshapeLayer(x):
    shape = x.shape
    # 1st possibility: (H, W*channel)
    reshape = Reshape((shape[1], shape[2] * shape[3]))(x)
    # 2nd possibility: (W, H*channel); swap H and W first, then merge H and channel
    # transpose = Permute((2, 1, 3))(x)
    # reshape = Reshape((shape[2], shape[1] * shape[3]))(transpose)
    return reshape
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=24, kernel_size=(8, 12), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Conv2D(filters=32, kernel_size=(5, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2),strides=2))
model.add(Lambda(ReshapeLayer)) # <========== pass from 4D to 3D
model.add(Bidirectional(LSTM(10, activation='relu', return_sequences=False)))
model.add(Dense(nclasses,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam')
model.summary()
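The two reshape possibilities can be checked on a plain array. The following NumPy sketch, with hypothetical sizes, mirrors what the Reshape and Permute layers do to the shape of a convolution output; it is an illustration of the idea, not part of the model above.

```python
import numpy as np

batch, H, W, C = 2, 5, 7, 3  # hypothetical sizes
x = np.arange(batch * H * W * C).reshape(batch, H, W, C)

# Possibility 1: rows become time steps, each step holds W*C features.
rows_as_steps = x.reshape(batch, H, W * C)

# Possibility 2: columns become time steps; swap H and W first.
cols_as_steps = x.transpose(0, 2, 1, 3).reshape(batch, W, H * C)

print(rows_as_steps.shape)  # (2, 5, 21)
print(cols_as_steps.shape)  # (2, 7, 15)
```

In the first case the LSTM reads the feature map top to bottom, one row per step; in the second it reads left to right, one column per step.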

ValueError while reshaping array to 4dim Keras API format

DL beginner here. I'm trying to implement LeNet using Keras and apply it on good ol' MNIST.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

class LeNet:
    @staticmethod
    def build(height, width, depth, classes):
        model = Sequential()
        inputshape = (height, width, depth)
        if K.image_data_format() == 'channels_first':
            inputshape = (depth, height, width)
        # build model
        model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputshape))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # replicate what's above again
        model.add(Conv2D(50, (5, 5), padding="same"))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # Fully Connected Layers
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation('relu'))
        # softmax
        model.add(Dense(classes))
        model.add(Activation('softmax'))
        return model
Then I load the dataset
print('[INFO] Accessing MNIST...')
dataset= loadmat('mnist-original.mat')
data=dataset['data']
Now I'm trying to reshape the data whose current shape is (784, 70000) by
data= data.reshape(data.shape[0], 28, 28, 1)
But I get an error that says ValueError: cannot reshape array of size 54880000 into shape (784,28,28,1)
Where am I going wrong here? Please help. Thanks.
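The error message follows from the element count: the array holds 784 x 70000 = 54880000 values, while the requested shape (784, 28, 28, 1) has room for only 784 x 784 = 614656, because data.shape[0] is the pixel count rather than the number of images. A minimal sketch of one possible fix, assuming each column of the (784, 70000) array is one flattened 28x28 image (this layout is an assumption, and a small stand-in array is used here):

```python
import numpy as np

n_samples = 100  # small stand-in for the 70000 MNIST images
data = np.zeros((784, n_samples), dtype="uint8")  # same layout as in the question

# data.shape[0] is 784 (pixels per image), not the number of images,
# so first move the sample axis to the front, then reshape.
images = data.T.reshape(-1, 28, 28, 1)

print(images.shape)  # (100, 28, 28, 1)
```

Using -1 for the first dimension lets NumPy infer the number of images, which avoids hard-coding 70000.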

Keras ConvLSTM2D: why use AveragePooling3D and how to do regression

I have been studying the question "Keras ConvLSTM2D: ValueError on output layer".
I want to use the same code, but for regression (a single output value).
I don't know how to do this. I also don't understand the purpose of the last layers in that post's code. Why is AveragePooling3D used?
The code from the link is:
model = Sequential()
model.add(ConvLSTM2D(
filters=40,
kernel_size=(3, 3),
input_shape=(None, 135, 240, 1),
padding='same',
return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
filters=40,
kernel_size=(3, 3),
padding='same',
return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(
filters=40,
kernel_size=(3, 3),
padding='same',
return_sequences=True))
model.add(BatchNormalization())
model.add(AveragePooling3D((1, 135, 240)))
model.add(Reshape((-1, 40)))
model.add(Dense(
units=9,
activation='sigmoid'))
model.compile(
loss='categorical_crossentropy',
optimizer='adadelta'
)

AveragePooling3D is used to reduce each frame in the sequence to a single value per filter, which also reduces the number of parameters in the Dense layer. With pool size (1, 135, 240), the output shape becomes (None, T, 1, 1, 40), where T is the number of frames. Reshape((-1, 40)) then turns this into (None, T, 40) so that it can feed the fully-connected part.
Also, as in Keras ConvLSTM2D: ValueError on output layer, AveragePooling3D is used instead of GlobalMaxPooling2D because the data is 5D, and a global pooling operation would leave only (batch_size, channels), which is not desirable in your case.
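The effect of the pooling and reshape steps on the shape can be reproduced with NumPy. The sketch below uses scaled-down, hypothetical frame sizes in place of 135x240 and a random array in place of the ConvLSTM2D output; averaging over the two spatial axes plays the role of AveragePooling3D with a pool size that spans the whole frame.

```python
import numpy as np

# Hypothetical, scaled-down sizes: 2 sequences of 6 frames, 27x48 pixels, 40 filters.
batch, T, rows, cols, filters = 2, 6, 27, 48, 40
features = np.random.rand(batch, T, rows, cols, filters).astype("float32")

# Average over the spatial axes only; the time axis is left untouched.
pooled = features.mean(axis=(2, 3), keepdims=True)

# Drop the singleton spatial axes, as Reshape((-1, 40)) does in the model.
per_frame = pooled.reshape(batch, -1, filters)

print(pooled.shape)     # (2, 6, 1, 1, 40)
print(per_frame.shape)  # (2, 6, 40)
```

Each frame ends up as a vector of 40 averages, one per filter, which is what the Dense layer then receives for every time step.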
