Keras - Tune a sequential model by testing all the possible hyperparameters - python

I'm working on a simple Keras sequential model and I'm testing different combinations of hyperparameters. Is there a way to automatically try all possible combinations of these hyperparameters and report the best one?
Here's my keras model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, LeakyReLU, Dropout, Dense
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
input_neurons = 70
model.add(LSTM(input_neurons, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LeakyReLU(alpha=0.5))
model.add(Dropout(0.1))
model.add(Dense(1))
optimizer = RMSprop(learning_rate=0.00134)
model.compile(loss=loss_func, optimizer=optimizer)  # loss_func is defined elsewhere
history = model.fit(
    train_X,
    train_y,
    epochs=200, batch_size=72,
    validation_data=(test_X, test_y),
    verbose=2, shuffle=False)

Yes, you can try hyperas and talos, for example, but there are others too. Just look up automatic hyperparameter optimization and you will surely find more results.
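If you really want to try every combination exhaustively, a plain grid search over your own model-building function is often enough before reaching for a library. Below is a minimal sketch, assuming the train_X/train_y/test_X/test_y arrays from the question; build_model and the value grid are illustrative only, and 'mse' stands in for the unspecified loss_func:

import itertools
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, LeakyReLU, Dropout, Dense
from tensorflow.keras.optimizers import RMSprop

def build_model(neurons, dropout, lr):
    # Rebuild the model from the question with the given hyperparameters
    model = Sequential()
    model.add(LSTM(neurons, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(LeakyReLU(alpha=0.5))
    model.add(Dropout(dropout))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer=RMSprop(learning_rate=lr))  # 'mse' assumed
    return model

# Illustrative search space -- replace with your own ranges
grid = {'neurons': [35, 70, 140], 'dropout': [0.1, 0.2], 'lr': [0.001, 0.00134]}

best = None
for neurons, dropout, lr in itertools.product(grid['neurons'], grid['dropout'], grid['lr']):
    model = build_model(neurons, dropout, lr)
    model.fit(train_X, train_y, epochs=200, batch_size=72,
              validation_data=(test_X, test_y), verbose=0, shuffle=False)
    val_loss = model.evaluate(test_X, test_y, verbose=0)
    if best is None or val_loss < best[0]:
        best = (val_loss, {'neurons': neurons, 'dropout': dropout, 'lr': lr})

print('Best combination:', best)

Libraries such as KerasTuner, hyperas, or talos wrap this kind of loop (plus smarter search strategies) for you.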

Related

Oscillatory behavior of the train versus validation loss during LSTM model training

I am working on time series classification using an LSTM model. Here is the architecture:
import numpy as np
import random as python_random
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

np.random.seed(16)
python_random.seed(17)
tf.random.set_seed(18)

model = Sequential()
model.add(LSTM(128, input_shape=(50, 5), return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss=tfa.losses.SigmoidFocalCrossEntropy(),
              metrics=[tf.keras.metrics.AUC(name='auc'),
                       tf.keras.metrics.binary_accuracy,
                       tf.keras.metrics.Recall()],
              optimizer=adam)  # 'adam' is an optimizer instance defined elsewhere

np.random.seed(25)
python_random.seed(26)
tf.random.set_seed(27)

keras_callbacks = [
    EarlyStopping(monitor='val_loss', patience=20, mode='min'),
    ModelCheckpoint('1LSTM_4_4_2022.h5', monitor='val_loss', save_best_only=True, mode='min')
]
history = model.fit(X_train, y_train, batch_size=256, verbose=1,
                    validation_data=(X_val, y_val), epochs=100,
                    class_weight=class_weights, callbacks=keras_callbacks)
I used early stopping to avoid overfitting. However, I don't understand why I am seeing this oscillatory loss. My dataset is severely imbalanced, with an imbalance ratio of 11500:1, so I used class_weight to handle the imbalance; the class distribution was the same in the train, validation, and test data. How can I explain this loss curve?
The ROC-AUC plot, however, looked fine. I don't know what I am missing here. I appreciate your explanations.
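For reference, a class_weight dictionary like the one passed to model.fit above can be computed from the label distribution; here is a minimal sketch assuming binary labels in y_train and using scikit-learn's compute_class_weight (the question does not show how class_weights was actually built):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Balanced weighting: n_samples / (n_classes * count_of_class)
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weights = dict(zip(classes, weights))  # e.g. roughly {0: 0.5, 1: 5750} for an 11500:1 ratio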

Loss function exhibits strange behavior during training

I am building a Deep Learning model for regression:
import numpy as np
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=np.shape(X_train)[1:]),
    keras.layers.Conv1D(filters=30, kernel_size=3, activation=tf.nn.tanh),
    keras.layers.Dropout(0.1),
    keras.layers.AveragePooling1D(pool_size=2),
    keras.layers.Conv1D(filters=20, kernel_size=3, activation=tf.nn.tanh),
    keras.layers.Dropout(0.1),
    keras.layers.AveragePooling1D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(30, tf.nn.tanh),
    keras.layers.Dense(20, tf.nn.tanh),
    keras.layers.Dense(10, tf.nn.tanh),
    keras.layers.Dense(3)
])
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(
    X_train,
    Y_train,
    epochs=300,
    batch_size=32,
    validation_split=0.2,
    shuffle=True,
    callbacks=[early_stopping]  # an EarlyStopping callback defined elsewhere
)
During training, the loss function (and MAE) exhibit this strange behavior:
What does this trend indicate? Could it mean that the model is overfitting?
It looks to me like your optimiser changes (decreases) the learning rate at those points where the curve suddenly bends.
I think there is an issue with your dataset: your training and validation losses are exactly the same value, which is practically impossible. Please check your dataset and shuffle it before splitting.
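A minimal sketch of shuffling before splitting, assuming in-memory arrays X and Y (the names are illustrative; the code in the question currently relies on validation_split, which always takes the last fraction of the data without shuffling first):

from sklearn.model_selection import train_test_split

# Shuffle and split explicitly instead of relying on validation_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, shuffle=True, random_state=42)
model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
          epochs=300, batch_size=32, callbacks=[early_stopping])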

Feature importance in a neural network (classification problem)

I have a classification problem and I need to find the important features.
My code is as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dropout(0.99, input_dim=len(X_train.columns)))
model.add(Dense(100, activation='relu', name='layer1'))
model.add(Dense(1, activation='sigmoid', name='layer'))
print(model.summary())
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset (keep the History separate so 'model' is not overwritten)
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, batch_size=1)
How can I find the important features? As you can see in the code, it is not a regression problem.
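One common model-agnostic approach (not shown in the question) is permutation importance: shuffle one feature at a time and measure how much the validation accuracy drops. A minimal sketch, assuming the trained model above and pandas DataFrames X_val/y_val:

import numpy as np

def permutation_importance(model, X_val, y_val, n_repeats=5):
    # Baseline accuracy on the untouched validation set
    baseline = model.evaluate(X_val, y_val, verbose=0)[1]
    importances = {}
    for col in X_val.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            # Shuffling one column breaks its relationship with the target
            X_perm[col] = np.random.permutation(X_perm[col].values)
            drops.append(baseline - model.evaluate(X_perm, y_val, verbose=0)[1])
        importances[col] = float(np.mean(drops))
    return importances  # a larger accuracy drop means a more important feature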

How to apply Attention layer to LSTM model

I am training a machine learning model for speech emotion recognition.
I wish to apply an attention layer to the model, but the documentation page is hard to understand.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

def bi_duo_LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68,
                      units=128, learning_rate=0.005, epochs=20, dropout=0.2, recurrent_dropout=0.2):

    class myCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs={}):
            if (logs.get('acc') > 0.95):
                print("\nReached 95% accuracy so cancelling training!")
                self.model.stop_training = True

    callbacks = myCallback()

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout, return_sequences=True)))
    model.add(tf.keras.layers.Bidirectional(LSTM(units, dropout=dropout, recurrent_dropout=recurrent_dropout)))
    # model.add(tf.keras.layers.Bidirectional(LSTM(32)))
    model.add(Dense(num_classes, activation='softmax'))

    adamopt = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    RMSopt = tf.keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-6)
    SGDopt = tf.keras.optimizers.SGD(lr=learning_rate, momentum=0.9, decay=0.1, nesterov=False)

    model.compile(loss='binary_crossentropy',
                  optimizer=adamopt,
                  metrics=['accuracy'])

    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test),
                        verbose=1,
                        callbacks=[callbacks])

    score, acc = model.evaluate(X_test, y_test,
                                batch_size=batch_size)
    yhat = model.predict(X_test)

    return history, yhat
How can I apply it to my model?
Are use_scale, causal, and dropout the only arguments?
If there is dropout in the attention layer, how do we deal with it, given that we already have dropout in the LSTM layers?
Attention can be interpreted as soft vector retrieval.
You have some query vectors. For each query, you want to retrieve some values and compute a weighted sum of them, where the weights are obtained by comparing the query with keys (the number of keys must be the same as the number of values, and often they are the same vectors).
In sequence-to-sequence models, the query is the decoder state, and the keys and values are the encoder states.
In a classification task, you do not have such an explicit query. The easiest way to get around this is to train a "universal" query that is used to collect relevant information from the hidden states (something similar to what was originally described in this paper).
If you approach the problem as sequence labeling, assigning a label not to the entire sequence but to individual time steps, you might want to use a self-attentive layer instead.
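A minimal sketch of that "universal query" idea in Keras, assuming the bidirectional LSTM stack from the question with return_sequences=True also set on the last recurrent layer; the layer below is illustrative, not the only way to do it:

import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    # Collapses a (batch, time, features) sequence into (batch, features)
    # by attending over time with one trainable "universal" query vector.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def build(self, input_shape):
        self.query = self.add_weight(name='query', shape=(input_shape[-1], 1),
                                     initializer='glorot_uniform', trainable=True)
        super().build(input_shape)

    def call(self, inputs, mask=None):
        scores = tf.matmul(inputs, self.query)                      # (batch, time, 1)
        if mask is not None:
            # Give padded (masked) time steps effectively zero attention weight
            scores += (1.0 - tf.cast(mask[:, :, None], scores.dtype)) * -1e9
        weights = tf.nn.softmax(scores, axis=1)
        return tf.reduce_sum(weights * inputs, axis=1)              # weighted sum over time

    def compute_mask(self, inputs, mask=None):
        return None  # the time dimension is pooled away, so no mask to propagate

Usage, relative to the model in the question: set return_sequences=True on the second Bidirectional LSTM as well, then add AttentionPooling() immediately before the final Dense softmax layer. Dropout inside an attention layer (e.g. the dropout argument of tf.keras.layers.Attention) is applied to the attention scores and is independent of the dropout inside the LSTM layers.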

Machine Learning with Keras: Different Validation Loss for the Same Model

I am trying to use keras to train a simple feedforward network. I tried two different methods of what I think is the same network, but one is performing significantly better. The first and better-performing one is the following:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(384,))
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(384)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="simple_model")
model.compile(loss='mse', optimizer='Adam')
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and it settles on a validation loss of about 0.2. The second model performs much worse:
model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(384, activation='relu'))
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse', optimizer=optimizer)
history = model.fit(X_train,
                    y_train_tf,
                    epochs=20,
                    validation_data=(X_test, y_test),
                    steps_per_epoch=100,
                    validation_steps=50)
and this has a validation loss of around 5. But when I call model.summary(), they look virtually the same. Is there something wrong with the second model?
I am not sure that they are the same, since the second model has a relu activation after the last layer (384 units) and the first doesn't. This is likely the issue, since the default activation of a Keras Dense layer is None (linear).
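A minimal sketch of the fix, keeping everything else from the second model as in the question: drop the relu from the output layer so it matches the linear output of the functional model:

model = keras.models.Sequential()
model.add(Dense(64, input_shape=(384,), activation='relu'))
model.add(Dense(64, activation='relu'))
# Output layer with no activation (linear), matching layers.Dense(384) in the first model
model.add(Dense(384))
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam())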
