I have a series of images (MSL-5 bands images) for 5 locations (a, b, c, d, e) over three years (2020, 2021, 2022), so I have 15 images in total. For each location I stacked the three yearly images into one, so my sample size is 5 (5 locations) and each stacked image has shape (224, 224, 15). Note: image width = 224, image height = 224, and stacking 3 years of 5-band images gives 3 x 5 = 15 channels.
I have a temperature data set for these 3 locations.
I also divided the data into training (3 locations) and testing (2 locations).
Now I want to predict the temperature from the images using a 2D CNN-LSTM, ConvLSTM2D, or something similar. I'm not sure what the actual model should be. What should the input shape be, and what would the code for this model look like? If anyone can help me with this, please do.
The sample size above is just an example; my real sample size can be (3 years x 20 locations = 60) for training and (3 years x 10 locations = 30) for testing. If I use the following code, is it correct, and are there any suggestions for improving the temperature prediction accuracy?
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(3,3), activation='relu', padding='same', return_sequences=True, input_shape=(224, 224, 3, 5)))
model.add(BatchNormalization())
model.add(ConvLSTM2D(32, kernel_size=(3, 3), activation = 'relu', padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
model.summary()
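For reference, Keras' ConvLSTM2D expects each sample as (time, rows, cols, channels) in the default channels_last layout, so a stacked (224, 224, 15) image would first have to be split back into 3 time steps of 5 bands. A minimal NumPy sketch of that reshaping, assuming the 15 channels were stacked year by year (all 5 bands of 2020, then 2021, then 2022); the name `stacked` and the random values are placeholders, not the real data:

```python
import numpy as np

n_locations, height, width = 5, 224, 224
n_years, n_bands = 3, 5

# Placeholder standing in for the stacked images: (samples, rows, cols, 15)
stacked = np.random.rand(n_locations, height, width, n_years * n_bands).astype("float32")

# Split the channel axis into (years, bands); assumes year-major stacking order
split = stacked.reshape(n_locations, height, width, n_years, n_bands)

# Move the year axis to the front of each sample: (samples, time, rows, cols, bands)
sequences = np.transpose(split, (0, 3, 1, 2, 4))

print(sequences.shape)  # (5, 3, 224, 224, 5)
```

With this layout the first layer would take input_shape=(3, 224, 224, 5) rather than (224, 224, 3, 5). If the bands were stacked in a different order, the reshape would need to change accordingly.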
I have a dataset of ~16,000 .wav recordings from 70 bird species.
I'm training a model using tensorflow to classify the mel-spectrogram of these recordings using Convolution based architectures.
One of the architectures used is a simple multi-layer convolutional network, described below.
The pre-processing phase includes:
extract mel-spectrograms and convert to dB scale
segment audio into 1-second segments (pad with zeros or Gaussian noise if the residual is longer than 250 ms, discard it otherwise)
z-score normalization of the training data - subtract the mean and divide the result by the std
Pre-processing during inference:
same as described above
z-score normalization BY the training data - subtract the mean (of the training data) and divide the result by the std (of the training data)
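The normalization step above can be sketched as follows. This is only an illustration: the array names, shapes and random values are placeholders, and a single global mean and std (rather than per-frequency-bin statistics) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder mel-spectrograms in dB: (samples, mel_bins, frames)
train_db = rng.normal(-40.0, 12.0, size=(100, 64, 32))
test_db = rng.normal(-35.0, 10.0, size=(20, 64, 32))

# Statistics are computed on the training set only
train_mean = train_db.mean()
train_std = train_db.std()

# Training data: subtract the mean, divide by the std
train_norm = (train_db - train_mean) / train_std

# Inference data reuses the training statistics, not its own
test_norm = (test_db - train_mean) / train_std
```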
I understand that the output layer's probabilities with sigmoid activation are not supposed to sum to 1, but I get many (8-10) very high predicted probabilities (~0.999), and some are exactly 0.5.
The current test-set correct classification rate is ~84%, tested with 10-fold cross-validation, so it seems that the network mostly operates well.
notes:
1. I understand there are similar features in the vocalizations of different bird species, but the received probabilities don't seem to reflect them correctly
2. probabilities for example - a recording of natural noise:
Natural noise: 0.999
Mallard - 0.981
I'm trying to understand the reason for these results: whether it's related to the data (e.g. extensive mislabeling, which is probably not the case) or comes from another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are necessary, and I don't need them to accumulate to 1.
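To make the difference concrete, here is a small NumPy sketch comparing sigmoid and softmax on the same made-up logits. The logit values are invented for illustration. One way an output of exactly 0.5 can arise is a logit of exactly 0, but whether that is what happens in this network is not established here.

```python
import numpy as np

def sigmoid(z):
    # Element-wise: each class is scored independently of the others
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Normalized across classes: outputs compete and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([7.0, 4.0, 0.0, -3.0])  # made-up values

sig = sigmoid(logits)   # several classes can be close to 1 at once
soft = softmax(logits)  # probability mass is shared between classes

print(sig)
print(soft)
```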
def convnet1(input_shape, numClasses, activation='softmax'):
    # Define the network
    model = tf.keras.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    # model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dense(numClasses, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(numClasses, activation='sigmoid'))
    model.compile(
        loss='categorical_crossentropy',
        metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=0.001),
        run_eagerly=False)  # this parameter allows to debug and use regular functions inside layers: print(), save() etc..
    return model
For future searches: this problem was solved, and the reason was found.
The initial batch size was 256 or 512. Reducing the batch size to 16 or 32 solved the problem, and the differences in probabilities are now as expected for both training and test set samples: very high for the correct label and very low for the other classes.
I am designing a neural network for the classification of resting-state EEG signals. I have preprocessed my data such that each subject is characterized by a table consisting of 111 channels and their readings over 2505 timesteps. As a measure of dimensionality reduction, I clustered the 111 channels into the 10 lobes of the brain, effectively reducing the dimension to (2505,10) per subject. Since this data is 2D, I assume it would be analogous to CNNs for grayscale images.
I have compiled the EEG data for each subject into a dataframe of size (253, 2505, 10), where 253 is the number of subjects. The corresponding ground truth values are stored in a list of size (253,1) with the indices matching those from the dataframe. I want to build a classifier which tells if the subject is ADHD positive or negative. I am stuck on designing the neural network, particularly facing a dimensionality issue when passing a subject to the 1st layer.
#where X=[df0, df1, df2,......, df252] & y=[0,1,0,........,1]
# Model configuration
batch_size = 100
no_epochs = 30
learning_rate = 0.001
no_classes = 2
validation_split = 0.2
verbosity = 1
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
# Fit data to model
i=0 #validation_data=(X_test, y_test),
X_train = np.array(X_train)
y_train = np.array(y_train)
print("X_train:\t")
print(X_train.shape)
print("y_train:\t")
print(y_train.shape)
history = model.fit(X_train,y_train,
batch_size=batch_size,
epochs=no_epochs,
verbose=verbosity)
ValueError: Input 0 of layer sequential_12 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 2505, 10).
Any help shall be appreciated.
For a Conv2D model, your training data needs to be 4-dimensional, as in image processing: (n_observations, n_rows, n_columns, n_channels). You therefore have to reshape your features accordingly, using your domain knowledge to make the layout meaningful:
X_train = np.array(X_train).reshape(253, 2505, 10, 1)  # or: np.array(X_train).reshape(253, 2505, 1, 10)
# Then models can be defined as following:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape = X_train.shape[1:], padding='same'))
I don't have any experience working with signal data, but I would like to point out that if your channel columns have no spatial significance, unlike the pixels of an image, then a 2D convolutional network is not meaningful. For example, if it makes no difference whether channel X's data or channel Y's data goes in column 1 (whereas swapping pixel columns in an image does matter), then the sliding window of a Conv2D does not pick up any significant information along that axis. Instead, you can consider Conv1D or LSTM networks. For a Conv1D network you don't need a 4-dimensional X; your current 3-dimensional X is fine. You can try:
model = models.Sequential()
model.add(layers.Conv1D(32, 3, activation='relu', input_shape = X_train.shape[1:], padding='same'))
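The two input layouts discussed above can be checked with NumPy alone. This is a minimal sketch using a zero-filled placeholder array of the shape given in the question, not the real EEG data:

```python
import numpy as np

# Placeholder for the stacked per-subject tables: (subjects, timesteps, lobes)
X = np.zeros((253, 2505, 10), dtype="float32")

# Conv2D: add a trailing channel axis -> (subjects, rows, cols, channels)
X_conv2d = X[..., np.newaxis]

# Conv1D: already (subjects, steps, features), so no reshape is needed
X_conv1d = X

# These are the values that input_shape = X_train.shape[1:] would pass to the first layer
conv2d_input_shape = X_conv2d.shape[1:]
conv1d_input_shape = X_conv1d.shape[1:]

print(conv2d_input_shape)  # (2505, 10, 1)
print(conv1d_input_shape)  # (2505, 10)
```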
I am getting confused by the filters parameter, which is the first parameter of the Conv2D() layer in Keras. As I understand it, filters are supposed to do things like edge detection, sharpening, or blurring the image, but when I define the model as
input_shape = (32, 32, 3)
model = Sequential()
model.add( Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same') )
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(128, kernel_size=(5, 5), activation='relu', input_shape=input_shape, strides=(1,1), padding='same'))
model.add(Flatten())
model.add(Dense(3072, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
I am not mentioning edge detection, blurring, or sharpening anywhere in the Conv2D call. The input images are 32 by 32 RGB images.
So my question is: when I define the convolution layer as Conv2D(64, ...), does this 64 mean 64 different types of filters, such as vertical edge, horizontal edge, etc., which are chosen by Keras at random? If so, then the output of the convolution layer (with 64 filters, a 5x5 kernel and 1x1 stride) on a 32x32 1-channel image is 64 images of 28x28 each. How are these 64 images combined to form a single image for further layers?
The filters argument sets the number of convolutional filters in that layer. These filters are initialized to small, random values, using the method specified by the kernel_initializer argument. During network training, the filters are updated in a way that minimizes the loss. So over the course of training, the filters will learn to detect certain features, like edges and textures, and they might become something like the image below (from here).
It is very important to realize that one does not hand-craft filters. These are learned automatically during training -- that's the beauty of deep learning.
I would highly recommend going through some deep learning resources, particularly https://cs231n.github.io/convolutional-networks/ and https://www.youtube.com/watch?v=r5nXYc2wYvI&list=PLypiXJdtIca5sxV7aE3-PS9fYX3vUdIOX&index=3&t=3122s.
Just wanted to clarify what the output shape was.
Although jakub's answer was good, I don't think it addressed the "single image for further layers" part of the question.
I did a model.summary() to find out more.
I found that the shape returned from a Conv2D is (None, img_height, img_width, num_filters) (Keras' default channels_last layout puts rows before columns).
So when you pass the output of the Conv2D to MaxPooling2D you are passing that whole shape, which means every convolved feature map is passed on; the maps are not merged into a single image.
The other layers handle this gracefully. MaxPooling2D((2, 2)) keeps the number of filters but halves the image size: (None, img_height / 2, img_width / 2, num_filters).
Side note: I wish the filters argument was named num_filters, because filters seems to imply you're passing in a list of filters with which to convolve the image.
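To illustrate how the feature maps are consumed by the next layer, here is a naive NumPy sketch of a stride-1 'valid' convolution with no bias or activation. The function name `conv2d_valid` and the random weights are made up for illustration; real layers are implemented far more efficiently. Each filter in the second layer spans all 64 maps produced by the first, so the maps are combined by a weighted sum over channels rather than being merged beforehand.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Naive stride-1 'valid' convolution (cross-correlation, as in deep learning libraries).

    x: (rows, cols, in_channels); kernels: (k_rows, k_cols, in_channels, n_filters).
    """
    k_rows, k_cols, _, n_filters = kernels.shape
    out_rows = x.shape[0] - k_rows + 1
    out_cols = x.shape[1] - k_cols + 1
    out = np.zeros((out_rows, out_cols, n_filters))
    for i in range(out_rows):
        for j in range(out_cols):
            patch = x[i:i + k_rows, j:j + k_cols, :]
            # Each filter multiplies the whole patch, across ALL input channels, and sums
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 1))             # 32x32 image with 1 channel
layer1 = rng.normal(size=(5, 5, 1, 64))     # 64 filters, each 5x5x1
layer2 = rng.normal(size=(5, 5, 64, 128))   # 128 filters, each 5x5x64

maps1 = conv2d_valid(image, layer1)
maps2 = conv2d_valid(maps1, layer2)
print(maps1.shape)  # (28, 28, 64)
print(maps2.shape)  # (24, 24, 128)
```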
I am training a CNN model on KTH dataset to detect 6 classes of human actions.
Data Processing
The dataset consists of 599 videos; each action has 99-100 videos performed by 25 different persons. I divided the data into 300 videos for training, 98 videos for validation and 200 videos for the test set.
I reduced the resolution to 50x50 pixels so I don't run out of memory while processing.
I extracted 200 frames from the middle of each video.
I normalized the pixels from the range 0-255 to 0-1.
Finally, I one-hot encoded the class labels.
Model architecture
This is my model architecture.
And this is the code of the NN layers.
model = Sequential()
model.add(Conv3D(filters=64,
kernel_size=(3, 3, 3),
strides=(1, 1, 1),
padding='valid',
activation='relu',
input_shape=X_train.shape[1:]))
model.add(MaxPooling3D(pool_size=2,
strides=(2, 2, 2),
padding='same'))
model.add(Conv3D(filters=128,
kernel_size=(3, 3, 3),
strides=(1, 1, 1),
padding='valid',
activation='relu'))
model.add(MaxPooling3D(pool_size=2,
strides=(2, 2, 2),
padding='same'))
model.add(Conv3D(filters=256,
kernel_size=(3, 3, 3),
strides=(1, 1, 1),
padding='valid',
activation='relu'))
model.add(Conv3D(filters=256,
kernel_size=(3, 3, 3),
strides=(1, 1, 1),
padding='valid',
activation='relu'))
model.add(MaxPooling3D(pool_size=2,
strides=(2, 2, 2),
padding='same'))
model.add(Conv3D(filters=512,
kernel_size=(3, 3, 3),
strides=(1, 1, 1),
padding='valid',
activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
#model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(6, activation='softmax'))
model.summary()
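As a rough check of what this stack does to the tensor shape, the sketch below traces the three non-channel dimensions through the layers using the standard output-size formulas for 'valid' convolutions and 'same' pooling. It assumes an input of 200 frames of 50x50 pixels in that axis order, which the question does not state explicitly; the helper names are made up.

```python
import math

def conv_valid(size, kernel=3, stride=1):
    # Output length of a 'valid' convolution along one axis
    return (size - kernel) // stride + 1

def pool_same(size, stride=2):
    # Output length of 'same'-padded pooling along one axis
    return math.ceil(size / stride)

# Assumed input: (frames, rows, cols); the axis order is an assumption
shape = (200, 50, 50)

# Layer order copied from the model above (Conv3D / MaxPooling3D only)
layers = ["conv", "pool", "conv", "pool", "conv", "conv", "pool", "conv"]
for kind in layers:
    step = conv_valid if kind == "conv" else pool_same
    shape = tuple(step(s) for s in shape)
    print(kind, shape)

# Dense applied to a 5-D tensor acts on the last axis only, so Flatten
# receives (frames, rows, cols, 4096) rather than a single 4096-vector
flattened_units = shape[0] * shape[1] * shape[2] * 4096
print(flattened_units)  # 344064
```

Under these assumptions the two Dense(4096) layers are applied independently at every remaining spatio-temporal position, because they come before Flatten.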
Training
My problem is that neither training nor validation accuracy changes; they are basically frozen from the first epoch. These are the training steps.
These are the first 6 epochs and here the last 6 epochs.
The Loss looks like this.
Training loss is very high, and the loss for validation doesn't change.
and the training looks like this.
I am confused, is the model underfitting or overfitting?
How am I going to fix this problem? Will dropout help, since I can't do data augmentation on videos (I assume)?
I greatly appreciate any suggestion.
You are using frame values in the range 0-1 together with ReLU. With the dying ReLU problem the model freezes and doesn't learn at all, because ReLU outputs the maximum of 0 and weight*input (when no bias is added), so a unit whose input stays negative always outputs 0. You can do the following to make sure the model works properly. I am not sure whether you will get good accuracy, but you can try this to avoid the dying ReLU problem:
Use leaky ReLU with alpha >= 0.2.
Do not normalize the frames; instead, just convert them to grayscale to reduce the training cost.
Don't take 200 frames from the middle; divide each video into an equal number of frame chunks and take 2-3 consecutive frames from each chunk. Also try adding more dense layers, as they help in classification.
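The difference between the two activations mentioned in the first suggestion can be seen in a few lines of NumPy. This is an illustrative sketch with made-up pre-activation values, not the Keras layers themselves:

```python
import numpy as np

def relu(x):
    # Negative inputs are clamped to 0, so their gradient is 0 as well
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # Negative inputs keep a small slope alpha instead of being zeroed
    return np.where(x > 0, x, alpha * x)

pre_activations = np.array([-2.0, -0.5, 0.0, 1.5])  # made-up values

relu_out = relu(pre_activations)
leaky_out = leaky_relu(pre_activations)

print(relu_out)   # negative entries become 0
print(leaky_out)  # negative entries are scaled by alpha
```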
I worked on almost the same problem, and what I did was use Conv2D after merging the frames together: if you have 10 frames of size (64, 64, 3) each, then instead of Conv3D I ran Conv2D on the merged (640, 64, 3) input, which resulted in 86% accuracy on 16 classes of videos.
It depends on how you use the 200 frames of each video as training data to classify an action. Your training data has too much bias.
Since this is sequential data, you have to go for a memory-based architecture or a concatenation model.
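A minimal NumPy sketch of merging frames in this way, assuming the frames are simply stacked along the row axis (the answer does not say exactly how they were merged) and using random placeholder data:

```python
import numpy as np

# Placeholder clip: 10 frames of 64x64 RGB
frames = np.random.rand(10, 64, 64, 3)

# Stack the frames one under another along the row axis -> (640, 64, 3)
merged = np.concatenate(list(frames), axis=0)

print(merged.shape)  # (640, 64, 3)
```

Rows 0-63 of the merged array hold the first frame, rows 64-127 the second, and so on, so a Conv2D kernel only sees two different frames at once where it straddles a frame boundary.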
I am trying to implement the Model from the article (https://arxiv.org/abs/1411.4389) that basically consists of time-distributed CNNs followed by a sequence of LSTMs using Keras with TF.
However, I am having trouble figuring out whether I should use the TimeDistributed wrapper just for my convolutional and pooling layers or also for the LSTMs.
Is there a way to run the CNN Layers in parallel (Based on the number of frames in the sequence that I want to process and based on the number of cores that I have)?
Lastly, suppose that each entry is composed of "n" frames (in sequence), where n varies from one data entry to the next: what is the most suitable input dimension? Would "n" be the batch size? Is there a way to limit the number of CNNs in // to, for example, 4 (so that you get an output Y after 4 frames are processed)?
P.S.: The inputs are small videos (i.e. a sequence of frames)
P.S.: The output dimension is irrelevant to my question, so it is not discussed here
Thank you
[Edited]
Sorry, the link-only answer was bad, so I will try to answer the questions one by one.
if I should include the TimeDistributed function just for my Convolutional & Pooling Layers or also for the LSTMs?
Use the TimeDistributed wrapper only for the Conv and Pooling layers; there is no need for it on the LSTMs.
Is there a way to run the CNN Layers in parallel?
No, if you use CPU. It's possible if you utilize GPU.
Transparent Multi-GPU Training on TensorFlow with Keras
what is the best suitable input dimension?
Five. (batch, time, width, height, channel).
Is there a way to limit the number of CNNs in // to for example 4
You can do this in preprocessing by manually arranging the frames into clips of a specific length, not in the network. In other words, the "time" dimension should be 4 if you want an output after 4 frames are processed.
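One possible way to do that preprocessing is sketched below with NumPy. The helper name `to_fixed_length_clips` is hypothetical, and dropping the trailing frames that do not fill a clip is an assumption (padding them would be another option).

```python
import numpy as np

def to_fixed_length_clips(video, clip_len=4):
    """Split (n_frames, rows, cols, channels) into (n_clips, clip_len, rows, cols, channels).

    Trailing frames that do not fill a whole clip are dropped.
    """
    n_clips = video.shape[0] // clip_len
    usable = video[: n_clips * clip_len]
    return usable.reshape(n_clips, clip_len, *video.shape[1:])

# Placeholder video with 10 frames of 32x32 grayscale; frame k is filled with the value k
video = np.arange(10, dtype="float32").reshape(10, 1, 1, 1) * np.ones((1, 32, 32, 1), dtype="float32")

clips = to_fixed_length_clips(video, clip_len=4)
print(clips.shape)  # (2, 4, 32, 32, 1)
```

Each clip then matches the five-dimensional (batch, time, width, height, channel) input with time = 4, regardless of how many frames the original video had.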
model = Sequential()
model.add(
TimeDistributed(
Conv2D(64, (3, 3), activation='relu'),
input_shape=(data.num_frames, data.width, data.height, 1)
)
)
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(TimeDistributed(Conv2D(128, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(256, (4,4), activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
# extract features and dropout
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.5))
# input to LSTM
model.add(LSTM(256, return_sequences=False, dropout=0.5))
# classifier with sigmoid activation for multilabel
model.add(Dense(data.num_classes, activation='sigmoid'))
Reference:
PRI-MATRIX FACTORIZATION - BENCHMARK