getting incompatible shape error in tensorflow functional model - python

I am trying to implement a deep model using tensorflow.keras which contains an embedding layer + Conv1D + 2 BiLstm layers. This is the implementation in sequential mode:
model = models.Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], input_length=limit_on_length, trainable=False))
model.add(layers.Conv1D(50, 4, padding='same', activation='relu'))
model.add(layers.Bidirectional(layers.LSTM(units=200, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model.add(layers.Bidirectional(layers.LSTM(units=200, return_sequences=False,dropout=0.2, recurrent_dropout=0.2)))
model.add(layers.Dense(len(set(tr_intents)), activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
And I fit the model like this:
model.fit(train_padded, to_categorical(tr_intents), epochs=15, batch_size=32, validation_split=0.1)
Everything works well with the Sequential model. But when I implement the same model with the functional API, I get this error:
ValueError: Shapes (32, 22) and (32, 11, 22) are incompatible
And here is my implementation in functional structure:
input_layer = layers.Input(shape=(None,))
x = layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], input_length=limit_on_length, trainable=False)(input_layer)
x = layers.Conv1D(50, 4, padding='same', activation='relu')(x)
x = layers.Bidirectional(layers.LSTM(units=200, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))(x)
x = layers.Bidirectional(layers.LSTM(units=200, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))(x)
intents_out = layers.Dense(n_intents, activation='softmax')(x)
model = models.Model(input_layer, intents_out)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
Can anybody help me with this error? I need to implement the model with the functional API because I have to add one more output layer to it.
I have 22 intents (labels), and each sentence has a length of 11.
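One difference between the two versions above is worth checking: the Sequential model sets return_sequences=False on the second Bidirectional LSTM, while the functional one sets it to True. With True, Dense is applied at each of the 11 timesteps, so the output is (batch, 11, 22) rather than (batch, 22), which matches the shapes in the error. Below is a minimal sketch of the functional model with that single change. The values vocab_size=100, embedding_dim=16 and the small LSTM width are placeholder assumptions so that it builds quickly; the pretrained embedding weights and dropout settings are left out.

```python
from tensorflow.keras import layers, models

# Placeholder sizes (assumptions for illustration, not the asker's real values).
vocab_size, embedding_dim = 100, 16
limit_on_length, n_intents = 11, 22

input_layer = layers.Input(shape=(limit_on_length,))
x = layers.Embedding(vocab_size, embedding_dim)(input_layer)
x = layers.Conv1D(50, 4, padding='same', activation='relu')(x)
x = layers.Bidirectional(layers.LSTM(units=8, return_sequences=True))(x)
# return_sequences=False returns one vector per sequence instead of one per timestep.
x = layers.Bidirectional(layers.LSTM(units=8, return_sequences=False))(x)
intents_out = layers.Dense(n_intents, activation='softmax')(x)

model = models.Model(input_layer, intents_out)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# The batch axis is None; the remaining axis is the number of intents.
output_shape = tuple(model.output.shape)
print(output_shape)
```

A second output layer can then be attached to the same x and passed to models.Model as a list of outputs.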

Related

How many layers should I stack in a sequential model?

I am trying to train a sequential model using the LSTM layer.
The size of sequence data for learning is as follows:
x = np.array(sequences)
y = to_categorical(labels).astype(int)
x.shape => (1800, 34, 48)
y.shape => (1800, 20)
After that, I build a sequential model and stack LSTM and dense layers, but I don't know how many layers to use.
First, I did something like this:
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=x_train.shape[1:3]))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['acc'])
model.summary()
However, I followed someone else's code here, and it doesn't seem to fit my case.
How many layers should I stack in a sequential model?

ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 33714, 12), found shape=(None, 12)

I am trying to run a simple RNN with some data extracted from a csv file. I have already preprocessed my data and split them into train set and validation set, but I get the error above.
This is my network structure and what I have tried so far. My shapes are (33714, 12) for x_train, (33714,) for y_train, (3745, 12) for x_val and (3745,) for y_val.
model = Sequential()
# LSTM LAYER IS ADDED TO MODEL WITH 128 CELLS IN IT
model.add(LSTM(128, input_shape=x_train.shape, activation='tanh', return_sequences=True))
model.add(Dropout(0.2)) # 20% DROPOUT ADDED FOR REGULARIZATION
model.add(BatchNormalization())
model.add(LSTM(128, input_shape=x_train.shape, activation='tanh', return_sequences=True)) # ADD ANOTHER LAYER
model.add(Dropout(0.1))
model.add(BatchNormalization())
model.add(LSTM(128, input_shape=x_train.shape, activation='tanh', return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu')) # ADD A DENSE LAYER
model.add(Dropout(0.2))
model.add(Dense(2, activation='softmax')) # FINAL CLASSIFICATION LAYER WITH 2 CLASSES AND SOFTMAX
# ---------------------------------------------------------------------------------------------------
# OPTIMIZER SETTINGS
opt = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, decay=DECAY)
# MODEL COMPILE
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# CALLBACKS
tensorboard = TensorBoard(log_dir=f"logs/{NAME}")
filepath = "RNN_Final-{epoch:02d}-{val_acc:.3f}"
checkpoint = ModelCheckpoint("models/{}.model".format(filepath), monitor='val_acc', verbose=1,
save_best_only=True, mode='max') # save only the best ones
# RUN THE MODEL
history = model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE,
validation_data=(x_val, y_val), callbacks=[tensorboard, checkpoint])
Although it produces a very large dimension, one option is to flatten the tensor that has the extra dimension.
A tensorflow.keras.layers.Flatten() layer keeps the batch axis and multiplies the remaining dimensions together, i.e. input: (None, 5, 5) -> Flatten() -> (None, 25)
For your example, this will give you:
(None, 33714,12) -> (None, 404568).
I'm not entirely sure if this will work when you change the shape sizes, but that is how I overcame my issue with incompatible shapes: expected: (None, x), got: (None, y, x).
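The shape arithmetic described above can be checked without building a model. This is a small sketch in plain Python; flattened_shape is a hypothetical helper written for illustration, not a Keras function. It only reproduces how the shape changes and says nothing about whether flattening is the right fix for the LSTM input in the question.

```python
from math import prod

def flattened_shape(shape):
    """Shape after a Flatten-style reshape: keep the batch axis, multiply the rest."""
    batch, *rest = shape
    return (batch, prod(rest))

print(flattened_shape((None, 5, 5)))       # (None, 25)
print(flattened_shape((None, 33714, 12)))  # (None, 404568)
```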

cnn model for binary classification always returning 1

I created a CNN model for binary classification. I used a balanced dataset of 300 images. I know it's a small dataset, but I used data augmentation. After fitting the model I got 86% val_accuracy on the validation set. However, when I printed the predicted probability for each picture, I got a probability of 1 for most pictures from the first class (and every probability was above 0.5), and a probability of 1 for all images from the second class.
This is my model:
model = keras.Sequential([
layers.InputLayer(input_shape=[128, 128, 3]),
preprocessing.Rescaling(scale=1/255),
preprocessing.RandomContrast(factor=0.10),
preprocessing.RandomFlip(mode='horizontal'),
preprocessing.RandomRotation(factor=0.10),
layers.BatchNormalization(renorm=True),
layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
layers.MaxPool2D(),
layers.BatchNormalization(renorm=True),
layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
layers.MaxPool2D(),
layers.BatchNormalization(renorm=True),
layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
layers.MaxPool2D(),
layers.BatchNormalization(renorm=True),
layers.Flatten(),
layers.Dense(8, activation='relu'),
layers.Dense(1, activation='sigmoid'),])
Edit:
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss='binary_crossentropy',
metrics=['binary_accuracy'],
)
history = model.fit(
ds_train,
validation_data=ds_valid,
epochs=50,
)
Thank you.
A pre-trained model such as VGG16 does most of the work well, so there is no need to make the model very complicated. Try the following code:
base_model = keras.applications.VGG16(
weights='imagenet',
input_shape=(128, 128, 3),
include_top=False)
base_model.trainable = True
inputs = keras.Input(shape=(128, 128, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
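# Dense(1) has no activation, so it outputs a logit: apply a sigmoid to it, or compile with a from_logits=True loss.
outputs = keras.layers.Dense(1)(x)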
model = keras.Model(inputs, outputs)
Set base_model.trainable to False if you want the model to train faster, or leave it as True for potentially more accurate results.
Notice that I used the GlobalAveragePooling2D layer, instead of Flatten, to reduce the number of parameters and to unstack the features.
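To make the parameter saving concrete, here is a rough calculation in plain Python. It assumes the VGG16 convolutional base turns a 128x128x3 input into a 4x4x512 feature map (five 2x2 poolings, 512 channels in the last block); dense_params is a hypothetical helper for illustration.

```python
# Assumed feature-map shape of the VGG16 convolutional base for a 128x128x3
# input: five 2x2 poolings halve each side (128 -> 4), last block has 512 channels.
height, width, channels = 128 // 2**5, 128 // 2**5, 512

def dense_params(n_inputs, units=1):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units

flatten_head = dense_params(height * width * channels)  # Flatten -> Dense(1)
gap_head = dense_params(channels)                        # GlobalAveragePooling2D -> Dense(1)
print(flatten_head, gap_head)  # 8193 513
```

With a single output unit the difference is small in absolute terms, but it grows by the same factor of 16 for every extra unit in the layer that follows the pooling.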

Getting vector obtained in the last layer of CNN before softmax layer

I am trying to implement a system by encoding inputs using CNN. After CNN, I need to get a vector and use it in another deep learning method.
def get_input_representation(self):
    # get word vectors from embedding
    inputs = tf.nn.embedding_lookup(self.embeddings, self.input_placeholder)
    sequence_length = inputs.shape[1] # 56
    vocabulary_size = 160 # 18765
    embedding_dim = 256
    filter_sizes = [3,4,5]
    num_filters = 3
    drop = 0.5
    epochs = 10
    batch_size = 30
    # this returns a tensor
    print("Creating Model...")
    inputs = Input(shape=(sequence_length,), dtype='int32')
    embedding = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim, input_length=sequence_length)(inputs)
    reshape = Reshape((sequence_length,embedding_dim,1))(embedding)
    conv_0 = Conv2D(num_filters, kernel_size=(filter_sizes[0], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
    conv_1 = Conv2D(num_filters, kernel_size=(filter_sizes[1], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
    conv_2 = Conv2D(num_filters, kernel_size=(filter_sizes[2], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
    maxpool_0 = MaxPool2D(pool_size=(sequence_length - filter_sizes[0] + 1, 1), strides=(1,1), padding='valid')(conv_0)
    maxpool_1 = MaxPool2D(pool_size=(sequence_length - filter_sizes[1] + 1, 1), strides=(1,1), padding='valid')(conv_1)
    maxpool_2 = MaxPool2D(pool_size=(sequence_length - filter_sizes[2] + 1, 1), strides=(1,1), padding='valid')(conv_2)
    concatenated_tensor = Concatenate(axis=1)([maxpool_0, maxpool_1, maxpool_2])
    flatten = Flatten()(concatenated_tensor)
    dropout = Dropout(drop)(flatten)
    output = Dense(units=2, activation='softmax')(dropout)
    model = Model(inputs=inputs, outputs=output)
    adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])
    print("Training Model...")
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, callbacks=[checkpoint], validation_data=(X_test, y_test)) # starts training
    return ??
The above code trains the model using X_train and y_train and then validates it on X_test and y_test. However, in my system I do not have y_train or y_test; I only need the vector from the last hidden layer before the softmax layer. How can I obtain it?
For that you can define a backend function to get the output of arbitrary layer(s):
from keras import backend as K
func = K.function([model.input], [model.layers[index_of_layer].output])
You can find the index of the layer you want using model.summary(), which lists the layers starting from index zero. If you need the layer before the last one, use -2 as the index (the .layers attribute is a Python list, so it can be indexed like one). You can then call the function you defined by passing it a list of input array(s):
outputs = func(inputs)
Alternatively, you can also define a model for this purpose. This has been covered in Keras documentation more thoroughly so I advise you to read that.
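As a rough sketch of that second option, the snippet below builds a second Model that reuses the layers of an existing one and stops at the layer before softmax. The toy classifier, its layer sizes and the all-zero input are assumptions for illustration only; in the question, the trained CNN would take the place of model.

```python
import numpy as np
from tensorflow import keras

# Toy classifier standing in for the trained model (layer sizes are assumptions).
inputs = keras.Input(shape=(8,))
hidden = keras.layers.Dense(4, activation='relu', name='hidden')(inputs)
outputs = keras.layers.Dense(2, activation='softmax')(hidden)
model = keras.Model(inputs, outputs)

# Second model sharing the same layers and weights, ending at the layer before softmax.
feature_extractor = keras.Model(inputs=model.inputs, outputs=model.layers[-2].output)

# No labels are needed to run a forward pass.
features = feature_extractor.predict(np.zeros((3, 8), dtype='float32'), verbose=0)
print(features.shape)  # (3, 4)
```

Because the extractor shares weights with the original model, it can be created after training and returns one feature vector per input row.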

Loading converted Keras LSTM model in Tensorflow.js leads to tensor shape error

I have defined an LSTM model in Keras and used tfjs.converters.save_keras_model to convert it to the Tensorflow.js format. But when I try to load the web-friendly model in JS, it fails with an error saying that the expected tensor shape does not match the contents of the weights file:
BenchmarkDialog.vue:47 Error: Based on the provided shape, [2,128], the tensor should have 256 values but has 139
at m (tf-core.esm.js:17)
at new t (tf-core.esm.js:17)
at Function.t.make (tf-core.esm.js:17)
at ke (tf-core.esm.js:17)
at i (tf-core.esm.js:17)
at Object.kh [as decodeWeights] (tf-core.esm.js:17)
at tf-layers.esm.js:17
at tf-layers.esm.js:17
at Object.next (tf-layers.esm.js:17)
at o (tf-layers.esm.js:17)
The model definition:
model = Sequential()
model.add(LSTM(
32,
batch_input_shape=(30, 5, 3),
return_sequences=True,
stateful=True,
activation='tanh',
))
model.add(Dropout(0.25))
model.add(LSTM(
32,
return_sequences=True,
stateful=True,
activation='tanh',
))
model.add(Dropout(0.25))
model.add(LSTM(
32,
return_sequences=False,
stateful=True,
activation='tanh',
))
model.add(Dropout(0.25))
model.add(Dense(3, activation='tanh', kernel_initializer='lecun_uniform'))
model.compile(loss='mse', optimizer=Adam())
The tensor in question belongs to the LSTM layers in model.json:
{"name": "lstm_1/kernel", "shape": [2, 128], "dtype": "float32"}
Here's the model.json, the weights file and the original keras model in case they are helpful.
Any ideas on what I'm doing wrong here?
