Interpreting get_weights in an LSTM model in Keras - Python

This is my simple reproducible code:
from keras.callbacks import ModelCheckpoint
from keras.models import Model
from keras.models import load_model
import keras
import numpy as np
SEQUENCE_LEN = 45
LATENT_SIZE = 20
VOCAB_SIZE = 100
inputs = keras.layers.Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(VOCAB_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
x = np.random.randint(0, 90, size=(10, SEQUENCE_LEN,VOCAB_SIZE))
y = np.random.normal(size=(10, SEQUENCE_LEN, VOCAB_SIZE))
NUM_EPOCHS = 1
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS,callbacks=[checkpoint])
And here is my code to look at the weights in the encoder layer:
for epoch in range(1, NUM_EPOCHS + 1):
    file_name = "checkpoint/" + str(epoch) + ".hdf5"
    lstm_autoencoder = load_model(file_name)
    encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer('encoder_lstm').output)
    print(encoder.output_shape[1])
    weights = encoder.get_weights()[0]
    print(weights.shape)
    for idx in range(encoder.output_shape[1]):
        token_idx = np.argsort(weights[:, idx])[::-1]
Here, print(encoder.output_shape) gives (None, 20) and print(weights.shape) gives (100, 80).
I understand that get_weights gives the weights of the transition after the layer.
What I don't understand, given this architecture, is the 80. What is it?
Also, are these the weights that connect the encoder layer to the decoder? I mean the connection between the encoder and the decoder.
I had a look at this question here, but as it only involves simple dense layers, I could not connect the concept to the seq2seq model.
Update 1
What is the difference between
encoder.get_weights()[0] and encoder.get_weights()[1]
conceptually? The first one is (100, 80) and the second one is (20, 80).
Any help is appreciated :)

The encoder as you have defined it is a model, and it consists of two layers: an input layer and the 'encoder_lstm' layer which is the bidirectional LSTM layer in the autoencoder. So its output shape would be the output shape of 'encoder_lstm' layer which is (None, 20) (because you have set LATENT_SIZE = 20 and merge_mode="sum"). So the output shape is correct and clear.
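merge_mode decides how the forward and backward outputs of the Bidirectional wrapper are combined. The NumPy sketch below uses random stand-ins for the two final hidden states (shapes only, not a real LSTM) to show why merge_mode="sum" keeps the width at LATENT_SIZE = 20, whereas Keras's default merge_mode="concat" would give 40:

```python
import numpy as np

BATCH, LATENT_SIZE = 4, 20

# Random stand-ins for the final hidden states of the forward and backward LSTMs;
# only the shapes matter here.
rng = np.random.default_rng(0)
h_forward = rng.normal(size=(BATCH, LATENT_SIZE))
h_backward = rng.normal(size=(BATCH, LATENT_SIZE))

merged_sum = h_forward + h_backward                               # merge_mode="sum"
merged_concat = np.concatenate([h_forward, h_backward], axis=-1)  # merge_mode="concat"

print(merged_sum.shape)     # (4, 20)
print(merged_concat.shape)  # (4, 40)
```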
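However, since encoder is a model, encoder.get_weights() returns the weights of all the layers in the model as a list. The bidirectional LSTM consists of two separate LSTM layers (a forward one and a backward one). Each of those LSTM layers has 3 weights: the kernel, the recurrent kernel and the bias. So encoder.get_weights() returns a list of 6 arrays, 3 for each of the LSTM layers. The first element of this list, which you stored in weights and which is the subject of your question, is the kernel of one of the LSTM layers. The kernel of an LSTM layer has a shape of (input_dim, 4 * lstm_units); the factor 4 is there because Keras stores the weights for the input gate, forget gate, cell candidate and output gate side by side. The input dimension of the 'encoder_lstm' layer is VOCAB_SIZE and its number of units is LATENT_SIZE. Therefore, the kernel has shape (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80). Similarly, encoder.get_weights()[1] is the recurrent kernel of the same LSTM: it maps the previous hidden state to the same four gates, so its shape is (LATENT_SIZE, 4 * LATENT_SIZE) = (20, 80). All of these weights are internal to 'encoder_lstm'; the encoder output reaches the decoder through RepeatVector, which has no weights, and is then multiplied by the decoder LSTM's own kernel.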
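To see where the 4 * LATENT_SIZE columns go, here is a minimal NumPy sketch of one LSTM time step that uses the same weight shapes. It assumes the gate order Keras uses internally (input, forget, cell candidate, output, each a block of LATENT_SIZE columns), the tf.keras defaults of tanh activation and sigmoid recurrent activation (older standalone Keras defaulted to hard_sigmoid), and random weights in place of the trained ones:

```python
import numpy as np

VOCAB_SIZE, LATENT_SIZE = 100, 20
rng = np.random.default_rng(0)

# Same layouts as one LSTM direction in Keras (random values here).
kernel = rng.normal(scale=0.1, size=(VOCAB_SIZE, 4 * LATENT_SIZE))            # like get_weights()[0]: (100, 80)
recurrent_kernel = rng.normal(scale=0.1, size=(LATENT_SIZE, 4 * LATENT_SIZE)) # like get_weights()[1]: (20, 80)
bias = np.zeros(4 * LATENT_SIZE)                                              # like get_weights()[2]: (80,)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One time step for a batch: x_t is (batch, 100), h_prev and c_prev are (batch, 20)."""
    z = x_t @ kernel + h_prev @ recurrent_kernel + bias  # (batch, 80): all four gates at once
    u = LATENT_SIZE
    i = sigmoid(z[:, 0 * u:1 * u])      # input gate
    f = sigmoid(z[:, 1 * u:2 * u])      # forget gate
    c_hat = np.tanh(z[:, 2 * u:3 * u])  # candidate cell state
    o = sigmoid(z[:, 3 * u:4 * u])      # output gate
    c_t = f * c_prev + i * c_hat
    h_t = o * np.tanh(c_t)
    return h_t, c_t

x_t = rng.normal(size=(2, VOCAB_SIZE))
h0 = np.zeros((2, LATENT_SIZE))
c0 = np.zeros((2, LATENT_SIZE))
h1, c1 = lstm_step(x_t, h0, c0)
print(h1.shape, c1.shape)  # (2, 20) (2, 20)
```

Under this ordering, weights[:, :20] is the input-gate block of the kernel, weights[:, 20:40] the forget-gate block, and so on, so a loop with idx in range(20) only walks through the input-gate columns.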

Related

Why is the accuracy high but the confusion matrix result bad?

I have trained a VGG16 model with a total of 1000 images for 5 classes (200 images per class). I used data augmentation, stratified K-fold and dropout to train the model. The train accuracy and val accuracy are good. However, when I run predictions with the trained model on the test dataset, the confusion matrix does not match the training accuracy.
Images: Train & Val accuracy, Classification report, Confusion Matrix (https://i.stack.imgur.com/OIX3O.png, https://i.stack.imgur.com/MAPXC.png)
VGG model:
def create_model():
    # (CNN) is a multilayered neural network with a special architecture to detect complex features in data.
    # VGG16 = Visual Geometry Group
    # 16 = refers to the 16 layers that have weights
    # VGG16 has about 138 million parameters
    # 3x3 filter with stride 1
    # maxpool layer of 2x2 filter of stride 2
    # Conv-1 Layer has 64 filters
    # Conv-2 has 128 filters
    # Conv-3 has 256 filters
    # Conv-4 and Conv-5 have 512 filters
    # import library
    from tensorflow.keras.models import Model
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Conv2D, Flatten, Dropout
    # number of species
    NO_CLASSES = 5
    # load the VGG16 model as the base model for training
    # exclude the fully connected layers
    base_model = VGG16(include_top=False, input_shape=(224, 224, 3))  # output: (None, 7, 7, 512)
    # add layers
    x = base_model.output
    x = Conv2D(64, (3, 3), activation='relu')(x)  # 64 filters, (3x3) kernel; 'valid' padding turns 7x7 into 5x5
    x = GlobalAveragePooling2D()(x)  # use average pooling as we don't have min pooling, which would select the darkest pixels (our dataset here has a white background)
    # add dense layers so that the model can learn more complex functions and classify for better results
    # Dense layer = a layer that is fully connected to its preceding layer
    x = Flatten()(x)  # GlobalAveragePooling2D already returns one 1D vector per sample, so this Flatten has no effect
    x = Dense(1024, activation='relu')(x)
    x = Dense(1024, activation='relu')(x)  # dense layer 2
    x = Dense(512, activation='relu')(x)  # dense layer 3
    x = Dropout(0.2)(x)  # reduce dependency between neurons
    # final layer with softmax activation for multiclass classification
    preds = Dense(NO_CLASSES, activation='softmax')(x)
    # layers of the VGG16 model are frozen, because we don't want their weights to change during model training
    # create a new model with the base model's original input and the new model's output
    model = Model(inputs=base_model.input, outputs=preds)
    # don't train the first 19 layers - 0..18 (the whole VGG16 base without its top)
    for layer in model.layers[:19]:
        layer.trainable = False
    # train the rest of the layers - 19 onwards
    for layer in model.layers[19:]:
        layer.trainable = True
    # compile the model
    model.compile(optimizer='Adam',  # Adam optimizer -- training cost (low) and performance (high)
                  loss='categorical_crossentropy',  # for multi-class (classes are mutually exclusive) problems
                  metrics=['accuracy'])  # calculate accuracy
    return model
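The tensor shapes flowing through the new head can be checked with a short NumPy sketch. It assumes the 224x224 input used above, VGG16's five stride-2 max-pool layers, and Conv2D's default padding='valid'; the feature values are random stand-ins:

```python
import numpy as np

def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (no padding) convolution."""
    return (size - kernel) // stride + 1

base_hw = 224 // 2**5                    # 5 max-pools of stride 2 -> 7x7 feature map from VGG16
head_hw = conv_output_size(base_hw, 3)   # Conv2D(64, (3, 3)) with 'valid' padding -> 5x5

features = np.random.default_rng(0).random((8, head_hw, head_hw, 64))  # (batch, 5, 5, 64)
pooled = features.mean(axis=(1, 2))            # what GlobalAveragePooling2D computes -> (8, 64)
flattened = pooled.reshape(len(pooled), -1)    # Flatten on a 2D tensor leaves it unchanged

print(base_hw, head_hw, pooled.shape)  # 7 5 (8, 64)
```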
Stratified K fold & model fit
from sklearn.model_selection import StratifiedKFold
from statistics import mean, stdev
EPOCHS = 6
histories = []
kfold = StratifiedKFold(n_splits = 5, shuffle=True, random_state=123)
for f, (trn_ind, val_ind) in enumerate(kfold.split(train_dataset.Image_path, train_dataset.labels)):
    print(); print("#"*50)
    print("Fold: ",f+1)
    print("#"*50)
    train_ds = datagen.flow_from_dataframe(train_dataset.loc[trn_ind,:],
                                           x_col='Image_path', y_col='labels',
                                           target_size=(width,height),
                                           class_mode = 'categorical', color_mode = 'rgb',
                                           batch_size = 16, shuffle = True)
    val_ds = datagen.flow_from_dataframe(train_dataset.loc[val_ind,:],
                                         x_col='Image_path', y_col='labels',
                                         target_size=(width,height),
                                         class_mode = 'categorical', color_mode = 'rgb',
                                         batch_size = 16, shuffle = True)
    # Define start and end epoch for each folds
    fold_start_epoch = f * EPOCHS
    fold_end_epoch = EPOCHS * (f+1)
    step_size_train = train_ds.n // train_ds.batch_size
    # fit
    history=model.fit(train_ds,
                      initial_epoch=fold_start_epoch ,
                      epochs=fold_end_epoch,
                      validation_data=val_ds,
                      shuffle=True,
                      steps_per_epoch=step_size_train,
                      verbose=1)
    # store history for each folds
    histories.append(history)
Is this happening because of the dataset itself or because of a problem in the code? I hope to find the mistake.
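The prediction code is not shown, so the following is only one possible cause, not a diagnosis. A confusion matrix is meaningful only when the i-th prediction is compared with the i-th true label. When a Keras generator is created with shuffle=True, samples come out of model.predict in shuffled order, while the generator's classes attribute stays in the original order. The NumPy sketch below uses hypothetical labels for the 5 classes and a hand-written confusion_matrix helper (not scikit-learn's) to show that perfect predictions compared in the wrong order produce a poor-looking matrix:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])  # hypothetical test labels
y_pred = y_true.copy()                              # a perfect classifier

print(np.trace(confusion_matrix(y_true, y_pred, 5)))  # 10: everything on the diagonal

# The same predictions, but compared against the labels in a different (shuffled) order:
rng = np.random.default_rng(0)
shuffled_pred = y_pred[rng.permutation(len(y_pred))]
print(confusion_matrix(y_true, shuffled_pred, 5))     # typically far fewer diagonal hits
```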

Confusion in setting shape and input shape of a sequential Keras model

I have a dataset whose schema looks like this:
X1 ... X20 C
where the first 20 columns are input data and the last column is the target. The dataset includes 2000 records. I want to design a sequential Keras model to classify those target labels (which range from 1 to 10, making this a multi-label classification problem). I have saved the input data and labels in X_train_1 and y_train_1. Here is my model:
def build_model_1(n_hidden = 1, n_neurons = 30, learning_rate = 3e-3, input_shape = X_train_1.shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer= tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    metric = [tf.keras.metrics.Accuracy()]
    model.compile(loss = loss, optimizer=optimizer, metrics=[metric])
    return model
I thought the input shape should be the shape of my training dataset. However, when I compile and fit my model, I get the following error:
ValueError: Input 0 of layer sequential_12 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (32, 20)
What am I doing wrong here?
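Your input shape is simply (20,), since each of your 2000 samples has 20 features. Do not include the number of samples (the batch dimension) in input_shape; Keras adds it for you. Passing X_train_1.shape = (2000, 20) makes the model expect 3D input of shape (batch, 2000, 20), which is why the error says it expected ndim=3 but found ndim=2. Here is a working example: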
import tensorflow as tf
import numpy as np

def build_model_1(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=(20,)):
    model = tf.keras.Sequential()
    # input_shape excludes the batch dimension: 20 features per sample
    model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
    model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    for layer in range(n_hidden):
        model.add(tf.keras.layers.Dense(n_neurons, tf.keras.activations.selu,
                                        kernel_initializer="lecun_normal",
                                        kernel_regularizer=tf.keras.regularizers.l2(0.01)))
        model.add(tf.keras.layers.BatchNormalization(momentum=0.999))
    model.add(tf.keras.layers.Dense(10, tf.keras.activations.softmax, kernel_initializer="lecun_normal"))
    loss = tf.keras.losses.categorical_crossentropy
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, beta_1=0.9, beta_2=0.999)
    # CategoricalAccuracy compares the argmax of the softmax output with one-hot targets;
    # tf.keras.metrics.Accuracy would require the probabilities to equal the targets exactly
    metric = tf.keras.metrics.CategoricalAccuracy()
    model.compile(loss=loss, optimizer=optimizer, metrics=[metric])
    return model

train_data = np.random.random((2000, 20))
model = build_model_1()
y = model(train_data)  # output shape: (2000, 10)
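Two data-preparation details follow from this model: input_shape can be taken from the training array by dropping its first (sample) axis, and categorical_crossentropy with a 10-unit softmax expects one-hot targets with 10 columns, so labels in 1..10 need shifting to 0..9 first. A NumPy sketch, using random stand-ins for X_train_1 and y_train_1:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train_1 = rng.random((2000, 20))          # stand-in for the real features
y_train_1 = rng.integers(1, 11, size=2000)  # stand-in labels in 1..10

input_shape = X_train_1.shape[1:]           # drop the batch axis -> (20,)

# One column per output unit; label k goes to column k - 1.
y_one_hot = np.eye(10)[y_train_1 - 1]

print(input_shape, y_one_hot.shape)  # (20,) (2000, 10)
```

tf.keras.utils.to_categorical(y_train_1 - 1, 10) produces the same one-hot array; alternatively, sparse_categorical_crossentropy accepts the integer labels y_train_1 - 1 directly.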
Also, ask yourself if you are really dealing with a multi-label classification problem. Can a sample from your dataset belong to more than one class, or are the classes mutually exclusive? If the classes are not mutually exclusive, I would recommend changing the activation function for the output layer to sigmoid and changing the loss function to binary_crossentropy. The intuition behind this can be found here.
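The difference between the two output activations can be seen on a single vector of logits. In this NumPy sketch (the logit values are arbitrary), softmax produces scores that sum to 1 and compete for one class, while sigmoid scores each class independently, so several can be high at once:

```python
import numpy as np

logits = np.array([2.0, 1.0, -1.0])  # arbitrary example logits for 3 classes

shifted = np.exp(logits - logits.max())  # subtract the max for numerical stability
softmax = shifted / shifted.sum()        # mutually exclusive classes: sums to 1
sigmoid = 1.0 / (1.0 + np.exp(-logits))  # independent per-class probabilities (multi-label)

print(softmax, softmax.sum())
print(sigmoid, sigmoid.sum())
```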

Error when checking target: expected time_distributed_6 to have 3 dimensions, but got array with shape (200, 80)

I was trying to implement a sequence tagging model with an LSTM. For example, I have 200 sentences in which each token has a 1024-dimensional embedding. I also padded all the sentences to a length of 80 tokens, so my input matrix has shape (200, 80, 1024).
I also padded the targets: each token in a sentence has one tag, so my y has shape (200, 80).
I tried the LSTM this way:
import numpy as np
import tensorflow as tf
from keras.models import Model, Input
from keras.layers.merge import add
from keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Bidirectional, Lambda
max_len = 80
input_text = Input(shape=(max_len,1024), dtype=tf.float32)
x = Bidirectional(LSTM(units=512, return_sequences=True,
                       recurrent_dropout=0.2, dropout=0.2))(input_text)
x_rnn = Bidirectional(LSTM(units=512, return_sequences=True,
                           recurrent_dropout=0.2, dropout=0.2))(x)
x = add([x, x_rnn])  # residual connection to the first biLSTM
out = TimeDistributed(Dense(n_tags, activation="softmax"))(x)
model = Model(input_text, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(np.array(full_embeddings), y,batch_size=32, epochs=10, verbose=1)
but I get this error:
ValueError: Error when checking target: expected time_distributed_6 to have 3 dimensions, but got array with shape (200, 80)
Could anyone explain the problem to me? I'm quite new to Keras and neural nets and I can't work out the reason.
Thanks
Since your final output layer is time-distributed, its output has 3 dimensions, (batch, 80, n_tags), so your targets y should also have 3 dimensions. With sparse_categorical_crossentropy, add a trailing axis so that y has a shape of (200, 80, 1):
y = np.expand_dims(y, -1)
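The two target layouts that match a (batch, 80, n_tags) softmax output are sketched below in NumPy: integer tag ids with a trailing axis for sparse_categorical_crossentropy, or one-hot tags for categorical_crossentropy. The tags are random and n_tags = 17 is a hypothetical value:

```python
import numpy as np

n_sentences, max_len, n_tags = 200, 80, 17  # n_tags = 17 is hypothetical
rng = np.random.default_rng(0)
y = rng.integers(0, n_tags, size=(n_sentences, max_len))  # one tag id per token

# Option 1: integer tags plus a trailing axis, for sparse_categorical_crossentropy.
y_sparse = np.expand_dims(y, -1)  # (200, 80, 1)

# Option 2: one-hot tags, for categorical_crossentropy.
y_one_hot = np.eye(n_tags)[y]     # (200, 80, 17)

print(y_sparse.shape, y_one_hot.shape)  # (200, 80, 1) (200, 80, 17)
```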

Keras Functional API Multi Input Layer

How do I define a multi-input layer using the Keras Functional API? Below is an example of the neural network I want to build. There are three input nodes. I want each node to be a 1-dimensional numpy array of a different length.
Here's what I have so far. Basically I want to define an input layer with multiple input tensors.
from keras.layers import Input, Dense, Dropout, concatenate
from keras.models import Model
x1 = Input(shape =(10,))
x2 = Input(shape =(12,))
x3 = Input(shape =(15,))
input_layer = concatenate([x1,x2,x3])
hidden_layer = Dense(units=4, activation='relu')(input_layer)
prediction = Dense(1, activation='linear')(hidden_layer)
model = Model(inputs=input_layer,outputs=prediction)
model.summary()
The code gives the following error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("x1_1:0", shape=(?, 10), dtype=float32) at layer "x1". The following previous layers were accessed without issue: []
Later when I fit the model I will pass in a list of 1D numpy arrays with the corresponding lengths.
The inputs must be your Input() layers. input_layer is the output of concatenate, an intermediate tensor, so Keras cannot trace the graph back to values for x1, x2 and x3. Change
model = Model(inputs=input_layer,outputs=prediction)
to
model = Model(inputs=[x1, x2, x3],outputs=prediction)
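When fitting, pass one array per Input(), in the same order as the inputs list, for example model.fit([x1_data, x2_data, x3_data], y). Each array still needs a leading sample axis, so a single sample of length 10 has shape (1, 10), not (10,). The NumPy sketch below uses random data and a hypothetical batch of 32 to show what concatenate and the first Dense layer do with those shapes:

```python
import numpy as np

n_samples = 32  # hypothetical batch size
rng = np.random.default_rng(0)

# One 2D array per Input(): (n_samples, length).
x1 = rng.random((n_samples, 10))
x2 = rng.random((n_samples, 12))
x3 = rng.random((n_samples, 15))

# concatenate([x1, x2, x3]) joins features along the last axis: 10 + 12 + 15 = 37.
joined = np.concatenate([x1, x2, x3], axis=-1)

# Dense(units=4, activation='relu') therefore has a (37, 4) kernel and a (4,) bias.
kernel, bias = rng.normal(size=(37, 4)), np.zeros(4)
hidden = np.maximum(joined @ kernel + bias, 0.0)

print(joined.shape, hidden.shape)  # (32, 37) (32, 4)
```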

Does this Keras Conv1D model correctly represent the intended architecture?

I'm new to Keras and am trying to use a 1D convolutional neural network (CNN) for multi-class classification. I've created a simple model and want to check that it correctly represents my desired architecture.
My input data is a numpy array of shape (number_of_samples, number_of_features), where number_of_samples = 3541 and number_of_features = 144. There are 277 classes and I've used one-hot encoding to represent the targets as an array of shape (number_of_samples, number_of_classes). My desired architecture is shown in the picture below:
The code for my model (which I've run without any issues) is as follows:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense
from keras import optimizers
from keras.utils import to_categorical
# Variables:
############
num_features = 144
num_classes = 277
units = num_classes
input_dim = 1
num_filters = 1
kernel_size = 3
# Reshape training data and labels:
###################################
# initial training_data has shape (3541, 144)
training_data_reshaped = np.atleast_3d(training_data) # (has shape 3541, 144, 1)
# initial labels vector has shape (3541, 1)
new_labels_binary = to_categorical(labels) # One-hot encoding of class labels
# Build, compile and fit model:
###############################
model = Sequential()
# A 1D convolutional layer which applies 1 output filter with a window size (length) of 3 and
# a (default) stride length of 1
model.add(Conv1D(filters = num_filters,
                 kernel_size = kernel_size,
                 activation = 'relu',
                 input_shape=(num_features, input_dim)))
model.add(Flatten())
# Output layer
model.add(Dense(units=units))
sgd = optimizers.SGD()
model.compile(optimizer = sgd,
              loss = 'categorical_crossentropy')
model.fit(x = training_data_reshaped,
          y = new_labels_binary,
          batch_size = batch_size)
model.summary()  # summary() prints the table itself and returns None
Does my code correctly represent my desired architecture? In particular:
My aim is that each of the 142 neurons in the output of the convolutional layer is connected to each of the 277 neurons in the model output layer, and that, on input sample x, the vector output by the output layer is compared to row x of new_labels_binary. From what I understand of the Keras docs, this model should do just that, but I'm checking because I'm new to this and the docs were sometimes ambiguous!
I don't mean this to be vague: is there anything in my model which is not (quite) correct given my desired architecture? I just want to make sure I'm not missing anything!
Thanks in advance.
The structure looks fine to me, but for a multi-class classification task (each sample belongs to exactly one of the 277 classes) the output layer should normally have a softmax activation, which is also what categorical_crossentropy expects:
model.add(Dense(units=units,activation='softmax'))
If you don't specify the activation for a Dense layer, a linear activation is applied.
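The architecture can be checked end to end with a NumPy forward pass for one sample, assuming Keras's Conv1D defaults (padding='valid', stride 1, kernel layout (kernel_size, in_channels, filters)) and the softmax output suggested above. The weights are random stand-ins:

```python
import numpy as np

num_features, num_classes, kernel_size = 144, 277, 3
rng = np.random.default_rng(0)

x = rng.random((num_features, 1))  # one sample, as produced by np.atleast_3d

# Conv1D(filters=1, kernel_size=3), 'valid' padding, stride 1, relu.
conv_w = rng.normal(size=(kernel_size, 1, 1))  # (kernel_size, in_channels, filters)
conv_b = np.zeros(1)
steps = num_features - kernel_size + 1          # 142 output positions
conv_out = np.stack([
    np.tensordot(x[t:t + kernel_size], conv_w, axes=([0, 1], [0, 1])) + conv_b
    for t in range(steps)
])                                               # (142, 1)
conv_out = np.maximum(conv_out, 0.0)

# Flatten, then a Dense layer connecting each of the 142 values to each of the 277 outputs.
flat = conv_out.reshape(-1)
dense_w = rng.normal(size=(steps, num_classes))
dense_b = np.zeros(num_classes)
logits = flat @ dense_w + dense_b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # softmax over the 277 classes

conv_params = kernel_size * 1 * 1 + 1                # weights + bias
dense_params = steps * num_classes + num_classes     # weights + biases
print(conv_out.shape, logits.shape, conv_params, dense_params)  # (142, 1) (277,) 4 39611
```

These parameter counts should match what model.summary() reports for the two layers.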
