Confusion in setting shape and input shape of a sequential Keras model

I have a dataset whose scheme is like:
X1 ... X20 C
where the first 20 columns are input data, and the last column is the target. The dataset includes 2000 records. I want to design a sequential Keras model to classify those target labels (which vary from 1 to 10, so I believe this is a multi-label classification problem). Assuming that I have saved the input data and labels in X_train_1 and y_train_1, here is my model:
def build_model_1(n_hidden = 1, n_neurons = 30, learning_rate = 3e-3, input_shape = X_train_1.shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer= tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    metric = [tf.keras.metrics.Accuracy()]
    model.compile(loss = loss, optimizer=optimizer, metrics=[metric])
    return model
I thought the shape of the input should be that of my training dataset, however when I compile and fit my model, I get the following error:
ValueError: Input 0 of layer sequential_12 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (32, 20)
What am I doing wrong here?

Your input shape is simply (20,), since each sample has 20 features; the 2000 samples form the batch dimension, which Keras adds on its own. You do not have to provide the batch size. Here is a working example:
import tensorflow as tf
import numpy as np
def build_model_1(n_hidden = 1, n_neurons = 30, learning_rate = 3e-3, input_shape = (20,)):
    model = tf.keras.Sequential()
    # input_shape describes a single sample; Keras adds the batch dimension itself
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer= tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    # CategoricalAccuracy compares the argmax of one-hot targets and softmax outputs;
    # tf.keras.metrics.Accuracy would test the raw probabilities for exact equality
    metrics = [tf.keras.metrics.CategoricalAccuracy()]
    model.compile(loss = loss, optimizer=optimizer, metrics=metrics)
    return model
train_data = np.random.random((2000, 20))
model = build_model_1()
y = model(train_data)
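To see why the original error mentions ndim=3, it may help to compare the full array shape with the per-sample shape. The sketch below uses only NumPy and a random stand-in for X_train_1 (an assumption; the real data is not shown). Passing X_train_1.shape as input_shape tells Keras that every sample is a (2000, 20) matrix, so it expects batches with three dimensions, while a real batch is just (32, 20).

```python
import numpy as np

# Stand-in for X_train_1 from the question: 2000 samples, 20 features each.
X_train_1 = np.random.random((2000, 20))

full_shape = X_train_1.shape            # includes the sample axis
per_sample_shape = X_train_1.shape[1:]  # the shape of one sample, as input_shape expects

print(full_shape)        # (2000, 20)
print(per_sample_shape)  # (20,)

# A batch of 32 samples, as produced during fit with the default batch size.
batch = X_train_1[:32]
print(batch.shape)       # (32, 20)
```

Using X_train_1.shape[1:] instead of a hard-coded (20,) keeps the model definition in sync with the data if the number of features changes.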
Also, ask yourself if you are really dealing with a multi-label classification problem. Can a sample from your dataset belong to more than one class, or are the classes mutually exclusive? If the classes are not mutually exclusive, I would recommend changing the activation function for the output layer to sigmoid and changing the loss function to binary_crossentropy. The intuition behind this can be found here.
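The difference between the two cases shows up in the targets themselves. As a minimal sketch with made-up labels (the arrays below are assumptions, not data from the question): mutually exclusive classes give one-hot rows, which pair with softmax and categorical_crossentropy, while non-exclusive classes give rows that may contain several ones, which pair with sigmoid and binary_crossentropy. Note that labels running from 1 to 10 have to be shifted to 0..9 before one-hot encoding into 10 columns.

```python
import numpy as np

# Hypothetical integer labels in the range 1..10, as described in the question.
y_train_1 = np.array([1, 3, 10, 7, 3])

# Multi-class: shift to 0..9 and one-hot encode; each row has exactly one 1.
num_classes = 10
y_zero_based = y_train_1 - 1
y_one_hot = np.eye(num_classes, dtype="float32")[y_zero_based]

print(y_one_hot.shape)        # (5, 10)
print(y_one_hot.sum(axis=1))  # [1. 1. 1. 1. 1.]

# Multi-label: a sample may belong to several classes at once, so a row
# can contain more than one 1 (a "multi-hot" vector).
y_multi_hot = np.zeros((2, num_classes), dtype="float32")
y_multi_hot[0, [0, 4]] = 1.0     # sample 0 belongs to classes 0 and 4
y_multi_hot[1, [2, 3, 9]] = 1.0  # sample 1 belongs to classes 2, 3 and 9
print(y_multi_hot.sum(axis=1))   # [2. 3.]
```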

Related

Why the accuracy is high but the result for confusion matrix is bad?

I have trained a VGG16 model with a total of 1000 images for 5 classes (200 images for each class). I have used data augmentation, stratified K-fold, and dropout to train the model. The train accuracy and validation accuracy are good. However, when I run predictions with the trained model on the test dataset, the confusion matrix is not consistent with the train accuracy.
Figures: train and validation accuracy, classification report, confusion matrix (https://i.stack.imgur.com/MAPXC.png, https://i.stack.imgur.com/OIX3O.png)
VGG model:
def create_model():
    # A CNN is a multilayered neural network with a special architecture to detect complex features in data.
    # VGG16 = Visual Geometry Group
    # 16 = it has 16 layers that have weights
    # VGG16 has about 138 million parameters
    # 3x3 filters with stride 1
    # 2x2 max-pool layers with stride 2
    # Conv-1 has 64 filters
    # Conv-2 has 128 filters
    # Conv-3 has 256 filters
    # Conv-4 and Conv-5 have 512 filters
    # import library (Flatten and Dropout are used below, so they are imported too)
    from tensorflow.keras.models import Model
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Dropout, Flatten, GlobalAveragePooling2D, Conv2D
    # number of species
    NO_CLASSES = 5
    # load the VGG16 model as the base model for training
    # exclude the fully connected layers
    base_model = VGG16(include_top=False, input_shape=(224, 224, 3))
    # add layers
    x = base_model.output
    x = Conv2D(64, (3,3), activation = 'relu')(x) # 64 filters, 3x3 kernel
    x = GlobalAveragePooling2D()(x) # use average pooling, as there is no min pooling to select the darkest pixels (our images have a white background)
    # add dense layers so that the model can learn more complex functions and classify for better results
    # Dense layer = a layer that is fully connected to its preceding layer
    x = Flatten()(x) # fully connected layers only accept 1D input (already 1D after global pooling)
    x = Dense(1024,activation='relu')(x)
    x = Dense(1024,activation='relu')(x) # dense layer 2
    x = Dense(512,activation='relu')(x) # dense layer 3
    x = Dropout(0.2)(x) # reduce dependency between neurons
    # final layer with softmax activation for multiclass classification
    preds = Dense(NO_CLASSES, activation='softmax')(x)
    # layers of the VGG16 model are frozen, because we don't want their weights to change during model training
    # create a new model with the base model's original input and the new model's output
    model = Model(inputs = base_model.input, outputs = preds)
    # don't train the first 19 layers - 0..18
    for layer in model.layers[:19]:
        layer.trainable=False
    # train the rest of the layers - 19 onwards
    for layer in model.layers[19:]:
        layer.trainable=True
    # compile the model
    model.compile(optimizer='Adam', # Adam optimizer: low training cost, high performance
                  loss='categorical_crossentropy', # for a multi-class problem (classes are mutually exclusive)
                  metrics=['accuracy']) # calculate accuracy
    return model
Stratified K fold & model fit
from sklearn.model_selection import StratifiedKFold
from statistics import mean, stdev
EPOCHS = 6
histories = []
kfold = StratifiedKFold(n_splits = 5, shuffle=True, random_state=123)
for f, (trn_ind, val_ind) in enumerate(kfold.split(train_dataset.Image_path, train_dataset.labels)):
    print(); print("#"*50)
    print("Fold: ",f+1)
    print("#"*50)
    train_ds = datagen.flow_from_dataframe(train_dataset.loc[trn_ind,:],
                                           x_col='Image_path', y_col='labels',
                                           target_size=(width,height),
                                           class_mode = 'categorical', color_mode = 'rgb',
                                           batch_size = 16, shuffle = True)
    val_ds = datagen.flow_from_dataframe(train_dataset.loc[val_ind,:],
                                         x_col='Image_path', y_col='labels',
                                         target_size=(width,height),
                                         class_mode = 'categorical', color_mode = 'rgb',
                                         batch_size = 16, shuffle = True)
    # Define start and end epoch for each fold
    fold_start_epoch = f * EPOCHS
    fold_end_epoch = EPOCHS * (f+1)
    step_size_train = train_ds.n // train_ds.batch_size
    # fit
    history=model.fit(train_ds,
                      initial_epoch=fold_start_epoch ,
                      epochs=fold_end_epoch,
                      validation_data=val_ds,
                      shuffle=True,
                      steps_per_epoch=step_size_train,
                      verbose=1)
    # store history for each fold
    histories.append(history)
Is this caused by the dataset itself or by a problem in my code? I hope to find the mistake.
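The prediction code is not shown, so the cause cannot be read off the question. One thing that may be worth ruling out, purely as an assumption, is that predictions and true labels are paired in different orders, which can happen when predictions come from a shuffled generator while the labels are read in file order. The NumPy sketch below uses made-up labels and a hypothetical perfect classifier to show how such a mismatch alone turns a clean confusion matrix into a poor one.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

num_classes = 5
rng = np.random.default_rng(0)

# Hypothetical test labels: 20 samples per class, in file order.
y_true = np.repeat(np.arange(num_classes), 20)
# A hypothetical classifier that is right on every sample.
y_pred = y_true.copy()

cm_aligned = confusion_matrix(y_true, y_pred, num_classes)
acc_aligned = np.trace(cm_aligned) / cm_aligned.sum()

# The same predictions, but returned in a shuffled order while the labels
# stay in file order: the pairing between label and prediction is lost.
order = rng.permutation(len(y_pred))
cm_shuffled = confusion_matrix(y_true, y_pred[order], num_classes)
acc_shuffled = np.trace(cm_shuffled) / cm_shuffled.sum()

print(acc_aligned)   # 1.0
print(acc_shuffled)  # much lower, although the model itself is unchanged
```

Accuracy is the trace of the confusion matrix divided by the number of samples, so if both are computed from the same label and prediction arrays they cannot disagree.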

I'm getting a ValueError when trying to load my own weights for a transfer learning task

Hi, I am trying to do transfer learning in Keras by loading weights that I trained myself on a different task into a new model.
That other task, however, is a binary classification problem, while my new problem is a multi-label classification problem.
I got my first set of weights doing this:
n_classes = 1
epochs = 100
batch_size = 32
input_shape = (224, 224, 3)
base_model = MobileNetV2(input_shape=input_shape, weights= None, include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
output = Dense(n_classes, activation='sigmoid')(x)
model = tf.keras.models.Model(inputs=[base_model.input], outputs=[output])
opt = optimizers.Adam(lr = 0.001)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
...
...
history = model.fit(train_generator, epochs=epochs,
                    steps_per_epoch=step_size_train,verbose=1,
                    validation_data=valid_generator,
                    validation_steps=STEP_SIZE_VALID,
                    class_weight=class_weights,
                    )
model.save_weights("initial-weights.h5")
But when I try to load these weights into my new model:
weights_path = 'initial-weights.h5'
n_classes = 14
epochs = 1000
batch_size = 32
input_shape = (224, 224, 3)
base_model = MobileNetV2(input_shape=input_shape, weights= None, include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
output = Dense(n_classes, activation='sigmoid')(x)
model = tf.keras.models.Model(inputs=[base_model.input], outputs=[output])
opt = optimizers.Adam(lr = 0.001)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
model.load_weights(weights_path)
I get the following error:
ValueError: Shapes (1280, 14) and (1280, 1) are incompatible
I understand from the error that this is very likely due to the difference in the number of classes, but from what I know about transfer learning, it is possible to transfer weights between tasks even if the number of classes is different (like how ImageNet weights are used for tasks that have a different number of classes).
How do I initialize my own set of custom weights that are trained from a different task that has a different number of classes?
I think that the best approach is to transfer the weights for all layers except the last (i.e. the feature extraction part). Then you can freeze all the transferred weights and train the model again, so that only the weights of the last layer (i.e. the classification layer) are trained.
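The idea can be sketched without any framework. In the example below, plain dicts of NumPy arrays stand in for the two models (the layer names "features" and "head" and the helper transfer_matching_weights are hypothetical, introduced only for illustration). Everything except the classification head is copied, and the 14-unit head keeps its fresh initialization.

```python
import numpy as np

def transfer_matching_weights(source, target, skip=()):
    """Copy arrays from source into target for every layer name not in skip.

    Both arguments are plain dicts mapping a layer name to a list of arrays,
    a hypothetical stand-in for what layer.get_weights() returns in Keras.
    """
    transferred = []
    for name, arrays in source.items():
        if name in skip or name not in target:
            continue
        same_count = len(arrays) == len(target[name])
        if same_count and all(s.shape == t.shape for s, t in zip(arrays, target[name])):
            target[name] = [a.copy() for a in arrays]
            transferred.append(name)
    return transferred

rng = np.random.default_rng(0)

# Hypothetical weights of the binary model: a feature layer plus a 1-unit head.
old_weights = {
    "features": [rng.normal(size=(8, 1280)), np.zeros(1280)],
    "head": [rng.normal(size=(1280, 1)), np.zeros(1)],
}
# The new model shares the feature layer but has a 14-unit head.
new_weights = {
    "features": [np.zeros((8, 1280)), np.zeros(1280)],
    "head": [rng.normal(size=(1280, 14)), np.zeros(14)],
}

copied = transfer_matching_weights(old_weights, new_weights, skip=("head",))
print(copied)                        # ['features']
print(new_weights["head"][0].shape)  # (1280, 14)
```

In Keras the same loop could run over pairs of layers from the old and new model, excluding the final Dense layer, using layer.get_weights() and layer.set_weights(), and then setting layer.trainable = False on the transferred layers before compiling.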

How to get a 2D shape ready for a Bi-LSTM in Keras

I've got a 2D numpy matrix (from a DataFrame) of already condensed word vectors (I used a max pooling technique, am trying to compare a logres to a bi-LSTM approach), and I'm not sure how to prepare it to use it in a keras model.
I'm aware of the need of a 3D tensor for the Bi-LSTM model, and have tried googling solutions, but couldn't find a solution that worked.
This is what I have right now:
# Set model parameters
epochs = 4
batch_size = 32
input_shape = (1, 10235, 3072)
# Create the model
model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences = True, input_shape = input_shape)))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
# Try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics = ['accuracy'])
# Fit the training set over the model and correct on the validation set
model.fit(inputs['X_train'], inputs['y_train'],
          batch_size = batch_size,
          epochs = epochs,
          validation_data = [inputs['X_validation'], inputs['y_validation']])
# Get score over the test set
return model.evaluate(inputs['X_test'], inputs['y_test'])
I currently got the following error:
ValueError: Input 0 is incompatible with layer bidirectional_23: expected ndim=3, found ndim=2
The shape of my training data (inputs['X_train']) is (10235, 3072).
Thanks so much!
I've made it work with the suggestion of the reply by doing the following:
Remove return_sequences = True;
Reshape each X set to 3D with np.reshape(inputs[dataset], (inputs[dataset].shape[0], inputs[dataset].shape[1], 1));
Change the input shape of the LSTM layer to match the reshaped data. X_train now has shape (10235, 3072, 1); since Keras' input_shape excludes the batch dimension, the per-sample shape is (3072, 1).
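The reshape step can be checked in isolation with NumPy. The sketch below uses a small random stand-in for X_train (an assumption; the real array has shape (10235, 3072)) and shows both the layout used above, where each feature becomes one timestep, and the alternative of a single timestep carrying all features.

```python
import numpy as np

# Small stand-in for X_train, which has shape (10235, 3072) in the question.
X_train = np.random.random((6, 12))

# Layout used above: every feature is one timestep carrying a single value.
X_train_3d = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
print(X_train_3d.shape)      # (6, 12, 1)  -> (samples, timesteps, features)
print(X_train_3d.shape[1:])  # (12, 1)     -> per-sample shape for the LSTM

# Alternative layout: one timestep that carries all features at once.
X_alt = X_train[:, np.newaxis, :]
print(X_alt.shape)           # (6, 1, 12)
```

Which layout makes sense depends on whether the columns really form a sequence; for max-pooled word vectors that is a modelling choice rather than something the reshape decides.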

LSTM Autoencoder for time series prediction

I am trying to build an LSTM autoencoder to predict time series data. Since I am new to Python, I have mistakes in the decoding part. I tried to build it up like here and Keras. I could not understand the difference between the given examples at all. My current code follows the two questions below.
Question 1: How do I choose the batch_size and input_dimension when each sample has 2000 values?
Question 2: How do I get this LSTM autoencoder working (the model and the prediction)? This is just the model, but how do I predict, say, starting from sample 10 until the end of the data?
My data has 1500 samples in total. I would go with 10 time steps (or more if better), and each sample has 2000 values. If you need more information, I can add it later.
trainX = np.reshape(data, (1500, 10,2000))
from keras.layers import *
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector
# Parameters
timesteps=10
input_dim=2000
units=100 # unit number chosen arbitrarily
batch_size=2000
epochs=20
# Model
inpE = Input((timesteps,input_dim))
outE = LSTM(units = units, return_sequences=False)(inpE)
encoder = Model(inpE,outE)
inpD = RepeatVector(timesteps)(outE)
outD1 = LSTM(input_dim, return_sequences=True)(outD
decoder = Model(inpD,outD)
autoencoder = Model(inpE, outD)
autoencoder.compile(loss='mean_squared_error',
                    optimizer='rmsprop',
                    metrics=['accuracy'])
autoencoder.fit(trainX, trainX,
                batch_size=batch_size,
                epochs=epochs)
encoderPredictions = encoder.predict(trainX)
The LSTM model that I use is this one:
def get_model(n_dimensions):
    inputs = Input(shape=(timesteps, input_dim))
    encoded = LSTM(n_dimensions, return_sequences=False, name="encoder")(inputs)
    decoded = RepeatVector(timesteps)(encoded)
    decoded = LSTM(input_dim, return_sequences=True, name='decoder')(decoded)
    autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)
    return autoencoder, encoder
autoencoder, encoder = get_model(n_dimensions)
autoencoder.compile(optimizer='rmsprop', loss='mse',
                    metrics=['acc', 'cosine_proximity'])
history = autoencoder.fit(x, x, batch_size=100, epochs=100)
encoded = encoder.predict(x)
It works with the data that I have: x has shape (3000, 180, 40), that is, 3000 samples, timesteps=180 and input_dim=40.
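To get data into the (samples, timesteps, input_dim) layout that this model expects, one common option is to cut overlapping windows out of the 2D array. The sketch below assumes that "10 time steps" means 10 consecutive rows per window, which the question does not confirm, and make_windows is a hypothetical helper. Note that np.reshape(data, (1500, 10, 2000)) cannot work on a (1500, 2000) array, because the element counts differ (3,000,000 versus 30,000,000).

```python
import numpy as np

def make_windows(data, timesteps):
    """Stack overlapping windows of `timesteps` consecutive rows.

    data has shape (n_rows, n_values); the result has shape
    (n_rows - timesteps + 1, timesteps, n_values).
    """
    n_windows = data.shape[0] - timesteps + 1
    return np.stack([data[i:i + timesteps] for i in range(n_windows)])

# Small stand-in for the real data, which has shape (1500, 2000) in the question.
data = np.arange(50 * 8, dtype="float32").reshape(50, 8)

windows = make_windows(data, timesteps=10)
print(windows.shape)  # (41, 10, 8)
```

Applied to the (1500, 2000) array from the question with timesteps=10, the same function would return shape (1491, 10, 2000), so timesteps=10 and input_dim=2000 would be the values to pass to Input.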

Does this Keras Conv1D model correctly represent the intended architecture?

I'm new to Keras and am trying to use a 1D convolutional neural network (CNN) for multi-class classification. I've created a simple model and want to check that it correctly represents my desired architecture.
My input data is a numpy array of shape (number_of_samples, number_of_features), where number_of_samples = 3541 and number_of_features = 144. There are 277 classes and I've used one-hot encoding to represent the targets as an array of shape (number_of_samples, number_of_classes). My desired architecture is shown in the picture below:
The code for my model (which I've run without any issues) is as follows:
# Variables:
############
num_features = 144
num_classes = 277
units = num_classes
input_dim = 1
num_filters = 1
kernel_size = 3
# Reshape training data and labels:
###################################
# initial training_data has shape (3541, 144)
training_data_reshaped = np.atleast_3d(training_data) # has shape (3541, 144, 1)
# initial labels vector has shape (3541, 1)
new_labels_binary = to_categorical(labels) # One-hot encoding of class labels
# Build, compile and fit model:
###############################
model = Sequential()
# A 1D convolutional layer which applies 1 output filter with a window size (length) of 3 and
# a (default) stride length of 1
model.add(Conv1D(filters = num_filters,
                 kernel_size = kernel_size,
                 activation = 'relu',
                 input_shape=(num_features, input_dim)))
model.add(Flatten())
# Output layer
model.add(Dense(units=units))
sgd = optimizers.SGD()
model.compile(optimizer = sgd,
              loss = 'categorical_crossentropy')
model.fit(x = training_data_reshaped,
          y = new_labels_binary,
          batch_size = batch_size)
print(model.summary())
Does my code correctly represent my desired architecture? In particular:
My aim is that each of the 142 neurons in the output of the convolutional layer is connected to each of the 277 neurons in the model output layer, and that, on input sample x, the vector output by the output layer is compared to row x of new_labels_binary. From what I understand of the Keras docs, this model should do just that, but I'm checking because I'm new to this and the docs were sometimes ambiguous!
I don't mean this to be vague: is there anything in my model which is not (quite) correct given my desired architecture? I just want to make sure I'm not missing anything!
Thanks in advance.
The structure looks fine to me, but since you want to solve a multi-class classification task with one-hot targets and categorical_crossentropy, the output layer should normally have a softmax activation.
model.add(Dense(units=units,activation='softmax'))
If you don't specify the activation for a Dense layer, a linear activation is applied.
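The numbers in the question can be checked with a little arithmetic, and the effect of softmax can be seen on a toy vector. The sketch below assumes the Conv1D defaults used in the question (stride 1, no padding); the softmax helper and the example logits are made up for illustration.

```python
import numpy as np

num_features, kernel_size, num_filters, num_classes = 144, 3, 1, 277

# Conv1D with stride 1 and no padding shortens the sequence by kernel_size - 1.
conv_length = num_features - kernel_size + 1
flattened = conv_length * num_filters
# Dense layer: one weight per (input, output) pair plus one bias per output.
dense_params = flattened * num_classes + num_classes

print(conv_length)   # 142
print(dense_params)  # 39611

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# A linear Dense layer can output any real numbers, including negative ones.
logits = np.array([[2.0, -1.0, 0.5]])
probs = softmax(logits)
print(probs.sum(axis=-1))  # sums to 1 (up to floating-point rounding)
```

The 142 matches the number of convolution outputs mentioned in the question, and model.summary() should report the same parameter count for the Dense layer.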
