Predicting in Stateful LSTMs - python

I have the following Keras model. It is written with GRUs, but the question applies equally to any recurrent network (LSTMs included).
from keras.models import Sequential
from keras.layers import GRU, Dense
from sklearn.metrics import mean_squared_error

model = Sequential()
model.add(GRU(40, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(GRU(10, stateful=True))  # input shape is inferred from the previous layer
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train model, resetting the state after every epoch except the last one
n_iter = 10000
for i in range(n_iter):
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    if i < (n_iter - 1):
        model.reset_states()

testPred = model.predict(testX, batch_size=batch_size)
print(mean_squared_error(testY, testPred))
If I remove the if statement that guards the reset, the mean squared error is consistently higher. Given that the test set comes right after the training set, wouldn't it make sense to preserve the state of the last memory block?
This tutorial seems to suggest otherwise: http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/ (i.e. it simply doesn't have that if statement and doesn't explicitly mention anything about keeping the last state).
So I am just wondering whether my reasoning is correct.
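For reference, a minimal sketch comparing the two reset strategies, assuming trainX, trainY, testX, testY, batch_size and look_back are defined as above (the iteration count is shortened for illustration):

from keras.models import Sequential
from keras.layers import GRU, Dense
from sklearn.metrics import mean_squared_error

def build_model():
    m = Sequential()
    m.add(GRU(40, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
    m.add(GRU(10, stateful=True))
    m.add(Dense(1))
    m.compile(loss='mean_squared_error', optimizer='adam')
    return m

def train_and_score(reset_before_predict, n_iter=100):
    m = build_model()
    for i in range(n_iter):
        m.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        if reset_before_predict or i < n_iter - 1:
            m.reset_states()  # zero the hidden state between epochs
    # if reset_before_predict is False, the state left by the final pass over
    # the training data carries over into prediction on the test window
    pred = m.predict(testX, batch_size=batch_size)
    return mean_squared_error(testY, pred)

print("state carried into test:", train_and_score(reset_before_predict=False))
print("state reset before test:", train_and_score(reset_before_predict=True))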

Related

Loss function exhibits strange behavior during training

I am building a Deep Learning model for regression:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# early_stopping is assumed to be defined elsewhere, e.g. a keras.callbacks.EarlyStopping instance
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=np.shape(X_train)[1:]),
    keras.layers.Conv1D(filters=30, kernel_size=3, activation=tf.nn.tanh),
    keras.layers.Dropout(0.1),
    keras.layers.AveragePooling1D(pool_size=2),
    keras.layers.Conv1D(filters=20, kernel_size=3, activation=tf.nn.tanh),
    keras.layers.Dropout(0.1),
    keras.layers.AveragePooling1D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(30, tf.nn.tanh),
    keras.layers.Dense(20, tf.nn.tanh),
    keras.layers.Dense(10, tf.nn.tanh),
    keras.layers.Dense(3)
])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(
    X_train,
    Y_train,
    epochs=300,
    batch_size=32,
    validation_split=0.2,
    shuffle=True,
    callbacks=[early_stopping]
)
During training, the loss function (and MAE) exhibit this strange behavior:
What does this trend indicate? Could it mean that the model is overfitting?
It looks to me as if your optimiser is changing (decreasing) the learning rate at the points where the curve suddenly bends.
I think there is an issue with your dataset. Your training and validation losses are exactly the same value, which is practically impossible.
Please check your dataset and shuffle it before splitting.
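A minimal sketch of one way to do that, assuming X and Y are the full NumPy arrays before splitting (sklearn's train_test_split is just one option; model and early_stopping are as defined in the question):

from sklearn.model_selection import train_test_split

# Shuffle the whole dataset once before carving out the validation split,
# so the two splits are not consecutive blocks of the same ordered data.
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, shuffle=True, random_state=42)

model.fit(X_train, Y_train,
          epochs=300, batch_size=32,
          validation_data=(X_val, Y_val),
          callbacks=[early_stopping])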

Model Train Start Point

I would like to set a starting point before training a CNN. How do I set a starting point for a model? Here is my code. I also wonder whether the starting point changes each time I retrain the model. Any help is highly appreciated.
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(16, 5, activation='tanh'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(3, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model_path = "sentiment labelled sentences/imdb models/model{epoch:02d}.hdf5"
check = ModelCheckpoint(model_path, monitor='val_loss', verbose=0, save_best_only=False,
                        save_weights_only=False, mode='auto', save_freq='epoch')  # save the model at the end of every epoch

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    epochs=15,
                    validation_data=(X_test, y_test),
                    batch_size=10, callbacks=[check])
As you have typed it out, your neural network model is a Sequential(), meaning the first layer you add is where the model starts (i.e. model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))).
If you want a different input layer, you can simply rearrange the lines of code or write a new one.
If you are looking to debug or to start execution midway through your neural network, I don't think that is possible. This is a general issue with neural networks: they are not very interpretable or explainable, meaning they behave like a black box. You cannot look inside and check how the network reaches a prediction for a certain set of data points. I found this article expanding on the issue: https://www.altacognita.com/explainability-in-deep-neural-networks-2/
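If by "starting point" you mean the initial weights: those are randomly initialised on every fresh run, so they do change each time you retrain. One way to pin the starting point down is to resume from one of the checkpoints the ModelCheckpoint callback in the question already writes; a minimal sketch, assuming such a file exists (the epoch number 05 below is hypothetical):

from tensorflow.keras.models import load_model

# Load a previously saved model and continue training from its weights,
# instead of starting from a fresh random initialisation.
model = load_model("sentiment labelled sentences/imdb models/model05.hdf5")  # hypothetical checkpoint
history = model.fit(X_train, y_train,
                    epochs=15,
                    validation_data=(X_test, y_test),
                    batch_size=10, callbacks=[check])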

How to apply Attention layer to LSTM model

I am training a machine learning model for speech emotion recognition.
I wish to apply an attention layer to the model, but the documentation page for the layer is hard to understand.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68, units=128,
                      learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            # the metric key is 'accuracy' in recent Keras versions, 'acc' in older ones
            if logs.get('accuracy', logs.get('acc', 0)) > 0.95:
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    SGDopt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0.1, nesterov=False)

    model.compile(loss='binary_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)

    yhat = model.predict(X_test)

    return history, yhat
How can I apply an attention layer so that it fits into my model?
Are use_scale, causal and dropout the only arguments?
If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?
Attention can be interpreted as soft vector retrieval.
You have some query vectors. For each query, you want to retrieve some values and compute a weighted average of them, where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often keys and values are the same vectors).
In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.
In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query vector that is used to collect relevant information from the hidden states (similar to what was originally described in this paper).
If you approach the problem as sequence labeling, i.e. assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
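A minimal sketch of the classification case, assuming TensorFlow 2.x and the functional API (the layer sizes are illustrative; here the query is formed from a pooled summary of the hidden states rather than a separate trainable vector):

import tensorflow as tf

def build_attention_model(timesteps, features, num_classes, units=128):
    inputs = tf.keras.Input(shape=(timesteps, features))
    x = tf.keras.layers.Masking(mask_value=0.0)(inputs)
    # return_sequences=True so the attention layer can see every time step
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences=True))(x)
    # Build a single query per example from a pooled summary of the hidden states
    query = tf.keras.layers.Dense(2 * units)(tf.keras.layers.GlobalAveragePooling1D()(x))
    query = tf.keras.layers.Reshape((1, 2 * units))(query)
    # Dot-product attention: the query attends over the LSTM outputs (values double as keys)
    context = tf.keras.layers.Attention(use_scale=True)([query, x])
    context = tf.keras.layers.Flatten()(context)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(context)
    return tf.keras.Model(inputs, outputs)

Regarding the dropout question: in recent TensorFlow versions the dropout argument of tf.keras.layers.Attention is applied to the attention scores, which is independent of the input/recurrent dropout inside the LSTM layers, so the two can coexist.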

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use Keras to train a simple feedforward network. I tried two different ways of building what I think is the same network, but one performs significantly better. The first (and better performing) one is the following:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse', optimizer='Adam')

history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)

history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and this one has a validation loss of around 5. But when I call model.summary(), they look virtually the same. Is there something wrong with the second model?
I am not sure they are the same, since the second model has a relu activation after the last layer (384 units) and the first doesn't. This might be the issue, since the default activation of a Keras Dense layer is None (i.e. linear).
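A sketch of the second model with the relu removed from the output layer, so that it matches the functional-API model above:

model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384))  # no activation: linear output, as in the first model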

Different MSE result for the training set

I get different results for the MSE. During training I get 0.296 after the last training epoch, but when I evaluate the model on the training data I get 0.112. Does anyone know why that is?
Here is the code:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True, batch_input_shape=(batch_size, look_back, dim_x)))
model.add(Dropout(0.2))
model.add(LSTM(150, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# es is an EarlyStopping callback defined elsewhere
history = model.fit(x_train_r, y_train_r, validation_data=(x_test_r, y_test_r),
                    epochs=epochs, batch_size=batch_size, callbacks=[es])

score_test = model.evaluate(x_test_r, y_test_r, batch_size=batch_size)
score_train = model.evaluate(x_train_r, y_train_r, batch_size=batch_size)
print("Score Training Data:")
print(score_train)
The batch size and everything else stay the same. Does anyone know why I get such different results for the MSE?
The reason for the discrepancy between the training loss and the loss obtained on the training data after training has finished is the presence of Dropout layers in the model. Dropout behaves differently during training and at inference time. As I have mentioned in another answer, you can make the behaviour the same in both phases either by passing training=True to the dropout call, or by using the K.learning_phase() flag and a backend function.
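A small sketch of the difference described above, using a standalone Dropout layer (tf.keras assumed):

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 10), dtype="float32")

print(layer(x, training=False).numpy())  # inference mode: inputs pass through unchanged
print(layer(x, training=True).numpy())   # training mode: about half the units are zeroed, the rest scaled by 2

During fit() the per-epoch loss is computed with dropout active, while evaluate() runs with dropout disabled, which is why the two numbers differ.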
