Is this a good learning rate for the Adam optimizer? - python

I am trying to estimate systolic blood pressure. I feed 27 PPG features into an ANN and get the result below. Is this a good learning rate? If not, is it too high or too low?
I set the learning rate to 0.000001. I think it is still too high, because the loss decreases too fast.
loss: 5.1285 - mse: 57.7257 - val_loss: 6.0154 - val_mse: 73.9671
# imports
import numpy
import pandas
import sklearn.model_selection
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
# import data
data = pandas.read_csv("data.csv", sep=",")
data = data[["cp", "st", "dt", "sw10", "dw10", "sw10+dw10", "dw10/sw10", "sw25", "dw25",
             "sw25+dw25", "dw25/sw25", "sw33", "dw33", "sw33+dw33", "dw33/sw33", "sw50",
             "dw50", "sw50+dw50", "dw50/sw50", "sw66", "dw66", "sw66+dw66", "dw66/sw66",
             "sw75", "dw75", "sw75+dw75", "dw75/sw75", "sys"]]
# data description
described_data = data.describe()
print(described_data)
print(len(data))
# # histograms of input data (features)
# data.hist(figsize=(12, 10))
# plt.show()
# index and shuffle data
data.reset_index(inplace=True, drop=True)
data = data.reindex(numpy.random.permutation(data.index))
# x (features) and y (blood pressure) data
predict = "sys"
X = numpy.array(data.drop([predict], 1))
y = numpy.array(data[predict])
# Splitting the total data into subsets: 90% - training, 10% - testing
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1, random_state=0)
def feature_normalize(X):  # standardization function
    mean = numpy.mean(X, axis=0)
    std = numpy.std(X, axis=0)
    return (X - mean) / std
# Feature scaling
X_train_standardized = feature_normalize(X_train)
X_test_standardized = feature_normalize(X_test)
# Build the ANN model
model = Sequential()
# Adding the input layer and the first hidden layer
model.add(Dense(25, activation='sigmoid', input_dim=27))
# Adding the second hidden layer
model.add(Dense(units=15, activation='sigmoid'))
# Adding the output layer
model.add(Dense(units=1, activation='linear', kernel_initializer='normal'))
model.summary()
optimizer = keras.optimizers.Adam(learning_rate=0.000001)
# Compiling the model
model.compile(loss='mae', optimizer='adam', metrics=['mse'])
# Early stopping to prevent overfitting
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=10, verbose=1, mode='auto',
                        restore_best_weights=True)
# Fitting the ANN to the Training set
history = model.fit(X_train_standardized, y_train, validation_split=0.2, verbose=2, epochs=1000, batch_size=5)
[screenshots: data, loss curve, predictions]

Your learning rate is not being used because you don't compile the model with your optimizer instance.
# Compiling the model
model.compile(loss='mae', optimizer='adam', metrics=['mse'])
Should be:
# Compiling the model
model.compile(loss='mae', optimizer=optimizer, metrics=['mse'])
Concerning the question itself: As The Half-Blood Prince mentioned, it's hard to say without knowing your data set. Furthermore, the landscape of the data itself is important. I would really suggest the following though:
Consider scaling your features into the range (0, 1), which can be done with sklearn.preprocessing.MinMaxScaler.
Instead of the gradual trial-and-error approach to picking your hyper-parameters, optimize them against your validation data and test the final result on a hold-out test set. Hyper-parameter optimization is easy with skopt.
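A minimal sketch of both suggestions, assuming the X_train / X_test / y_train split from the question; train_and_score() is a hypothetical helper that builds the ANN above, trains it, and returns the validation MAE:
from sklearn.preprocessing import MinMaxScaler
from skopt import gp_minimize
from skopt.space import Real
# 1) Scale the features into (0, 1); fit the scaler on the training set only
#    and reuse its min/max for the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# 2) Let skopt search the learning rate on a log scale instead of guessing it by hand.
def objective(params):
    lr = params[0]
    return train_and_score(X_train_scaled, y_train, learning_rate=lr)  # hypothetical helper
result = gp_minimize(objective,
                     [Real(1e-6, 1e-2, prior='log-uniform', name='learning_rate')],
                     n_calls=20, random_state=0)
print("best learning rate:", result.x[0])
Fitting the scaler on the training data only also keeps test-set statistics out of the features.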

Related

Deep learning accuracy changes

Every time I change the dataset, it gives a different accuracy: sometimes 97%, sometimes 50%, sometimes 92%. It is a text-classification task. Why does this happen? The other results, around 95%, come from two datasets that are the same size and give almost the same result.
# imports
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Bidirectional
from tensorflow.keras import layers, regularizers
# Split data
X_train, X_test, label_train, label_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# Size of train and test data:
print("Training:", len(X_train), len(label_train))
print("Testing: ", len(X_test), len(label_test))
# Function defined to test the models on the test set
def test_model(model, epoch_stop):
    model.fit(X_test, Y_test,
              epochs=epoch_stop,
              batch_size=batch_size,
              verbose=0)
    results = model.evaluate(X_test, Y_test)
    return results
#############
maxlen = 300
# Bidirectional LSTM model
embedding_dim = 100
dropout = 0.5
opt = 'adam'
####################
#embed_dim = 128  # dimension of the word embedding vector for each word in a sequence
lstm_out = 196  # number of LSTM units
lstm_model = Sequential()
# Adding dropout
#lstm_model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
lstm_model = Sequential()
lstm_model.add(layers.Embedding(input_dim=num_words,
                                output_dim=embedding_dim,
                                input_length=X_train.shape[1]))
#lstm_model.add(Bidirectional(LSTM(lstm_out, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(layers.GlobalMaxPool1D())
# Adding a regularized dense layer
lstm_model.add(layers.Dense(32, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
lstm_model.add(layers.Dropout(0.25))
lstm_model.add(Dense(3, activation='softmax'))
lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(lstm_model.summary())
# TRAINING
history = lstm_model.fit(X_train, label_train,
                         epochs=4,
                         verbose=True,
                         validation_data=(X_test, label_test),
                         batch_size=64)
loss, accuracy = lstm_model.evaluate(X_train, label_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss_val, accuracy_val = lstm_model.evaluate(X_test, label_test, verbose=True)
print("Testing Accuracy: {:.4f}".format(accuracy_val))
ML models base their predictions on the data they were trained on, so it is only natural that the outcome differs when the training data changes. It may also be that a different dataset performs better with different hyperparameters.
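A rough way to see how much of that spread comes from the data split alone is to train the same architecture on several random splits and compare the test accuracies; build_lstm() here is a hypothetical helper standing in for the model definition above:
import numpy as np
from sklearn.model_selection import train_test_split
accuracies = []
for seed in [0, 1, 2, 3, 4]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, Y, test_size=0.2, random_state=seed)
    model = build_lstm()  # hypothetical helper: same layers and hyperparameters on every run
    model.fit(X_tr, y_tr, epochs=4, batch_size=64, verbose=0)
    _, acc = model.evaluate(X_te, y_te, verbose=0)
    accuracies.append(acc)
print("accuracy: %.3f +/- %.3f" % (np.mean(accuracies), np.std(accuracies)))
A large standard deviation here points at the data (or the split) rather than the model.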

Keras sequential model results not reproducible with wildly inconsistent results on same dataset and parameters optimized using Optuna

I am running a Keras sequential model as a regressor with the TensorFlow backend. I am using Optuna to optimize its hyper-parameters, minimizing the RMSE in the Optuna objective.
However, when I re-create the Keras model with the best parameters from Optuna and use the same dataset for re-fitting and predicting as the one used in the Optuna objective function, I get wildly inconsistent results.
I'm aware that neural nets are stochastic in nature, with an element of randomness. To make the run deterministic I tried setting the seeds for both numpy and tensorflow at the beginning of my script in the following manner, but it doesn't work:
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
Following is my code and the output:
def create_model(trial):
    n_layers = trial.suggest_int("layers_number", 4, 8)  # 4
    model = keras.Sequential()
    for i in range(n_layers):
        num_hidden = trial.suggest_int("n_units_l_{}".format(i), 10, 16)
        activation = trial.suggest_categorical('activation_l_{}'.format(i), ['linear'])  # , 'relu', 'sigmoid', 'tanh', 'elu'
        model.add(layers.Dense(num_hidden, activation=activation, kernel_initializer='uniform'))
        dropout = trial.suggest_uniform("dropout_l_{}".format(i), 0.1, 0.4)
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation='linear'))
    lr = trial.suggest_loguniform("lr", 1e-5, 1e-1)
    model.compile(
        loss='mean_squared_error',
        optimizer=keras.optimizers.Adam(lr=lr),
        metrics=['mse']
    )
    return model

def objective(trial):
    keras.backend.clear_session()
    model = create_model(trial)
    epochs = trial.suggest_int("epochs", 3, 4)  # 50
    batch = trial.suggest_int("batch", 1, 2)
    model.fit(
        X_train.values,
        y_train.values,
        batch_size=batch,
        epochs=epochs,
        verbose=0,
        shuffle=False
    )
    y_pred_test = model.predict(X_test)
    test_copy['pred_scaled'] = y_pred_test
    rmse = inverse_transform(test_copy, y_pred_test, df_copy)  # inverse transforms the transformed target and calculates rmse
    return rmse

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=2)
Output: [best trial screenshot]
RMSE of best trial is 110.90926282554379
Refitting and predicting using best params.
def KerasRegressor(parameters):
    print(parameters)
    model = keras.Sequential()
    layers_number = int(parameters['layers_number'])
    for i in range(layers_number):
        model.add(layers.Dense(int(parameters['n_units_l_' + str(i)]), activation=parameters['activation_l_' + str(i)], kernel_initializer='uniform'))
        model.add(layers.Dropout(int(parameters['dropout_l_' + str(i)])))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(
        loss='mean_squared_error',
        optimizer=keras.optimizers.Adam(lr=float(parameters['lr'])),
        metrics=['mse'])
    return model

params = study.best_trial.params
epochs = params['epochs']
batch = params['batch']
del params['epochs']
del params['batch']
seed(1)
tensorflow.random.set_seed(2)
model = KerasRegressor(params)
model.fit(X_train.values, y_train.values, epochs=epochs, batch_size=batch, shuffle=False)
y_pred_test = model.predict(X_test)
test_copy['pred_scaled'] = y_pred_test
rmse = inverse_transform(test_copy, y_pred_test, df_copy)  # inverse transforms the transformed target and calculates rmse
print(rmse)
New RMSE on the same dataset used in the Optuna objective function, with the best hyperparameters:
New rmse - 227892.23560327655
Small differences in RMSE would be acceptable, but not a difference this large.
I have a different approach: I save the model to a file every time Optuna finds a new best metric. At prediction time I just load that model file and predict on the test set.
If you really want to debug, remove the sources of randomness in your system one by one: fix the seeds (you did that), fit on the same data in the same order, use the same layers and the same parameters, then test. Run it all again with everything fixed and test again. Are the two results the same? Run multiple tests; are they all the same?
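A minimal sketch of the save-and-reload approach, reusing create_model() and inverse_transform() from the question; the file names are illustrative:
def objective(trial):
    keras.backend.clear_session()
    model = create_model(trial)
    epochs = trial.suggest_int("epochs", 3, 4)
    batch = trial.suggest_int("batch", 1, 2)
    model.fit(X_train.values, y_train.values,
              batch_size=batch, epochs=epochs, verbose=0, shuffle=False)
    y_pred_test = model.predict(X_test)
    test_copy['pred_scaled'] = y_pred_test
    rmse = inverse_transform(test_copy, y_pred_test, df_copy)
    model.save("model_trial_{}.h5".format(trial.number))  # keep this trial's trained weights
    return rmse

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=2)

# Reload the exact weights Optuna scored instead of re-fitting from scratch.
best_model = keras.models.load_model("model_trial_{}.h5".format(study.best_trial.number))
y_pred_test = best_model.predict(X_test)
Because the stored weights are reused, the randomness of a fresh re-fit never enters the final prediction.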

Why are my loss and accuracy plots slightly shaky?

I built a Bi-LSTM model that tries to predict a category for a given word. For example, the word "smile" should be assigned the category "friendly".
However, after training the model with 100 samples per category over 10 categories (1000 in total), the accuracy and loss curves are continuously slightly shaky when plotted. Why does this occur? Increasing the number of samples causes underfitting.
Model
def build_model(vocab_size, embedding_dim=64, input_length=30):
    print('\nbuilding the model...\n')
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=(vocab_size + 1), output_dim=embedding_dim, input_length=input_length),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units, return_sequences=True, dropout=0.2)),
        tf.keras.layers.GlobalMaxPool1D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='tanh', kernel_regularizer=tf.keras.regularizers.L2(l2=0.01)),
        # softmax output layer
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    # optimizer & loss
    opt = 'RMSprop'  # tf.optimizers.Adam(learning_rate=1e-4)
    loss = 'categorical_crossentropy'
    # Metrics
    metrics = ['accuracy', 'AUC', 'Precision', 'Recall']
    # compile model
    model.compile(optimizer=opt,
                  loss=loss,
                  metrics=metrics)
    model.summary()
    return model
Training
def train(model, x_train, y_train, x_validation, y_validation,
          epochs, batch_size=32, patience=5,
          verbose=2, monitor_es='accuracy', mode_es='auto', restore=True,
          monitor_mc='val_accuracy', mode_mc='max'):
    # callbacks
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=monitor_es,
                                                      verbose=1, mode=mode_es, restore_best_weights=restore,
                                                      min_delta=1e-3, patience=patience)
    model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5', monitor=monitor_mc, mode=mode_mc,
                                                          verbose=1, save_best_only=True)
    keras_callbacks = [early_stopping, model_checkpoint]
    # train model
    history = model.fit(x_train, y_train,
                        batch_size=batch_size, epochs=epochs, verbose=verbose,
                        validation_data=(x_validation, y_validation),
                        callbacks=keras_callbacks)
    return history
[accuracy & loss plots]
BATCH SIZE
Currently the batch size is set to 16; if I increase the batch size to 64 with 2500 samples per category, the final plots show underfitting.
As pointed out in the comments, the smaller the batch size, the higher the variance of the per-batch mean, which shows up as more fluctuation in the loss. I typically use a batch size of 80 since I have a fairly large memory capacity. You are using the ModelCheckpoint callback and saving the model with the best validation accuracy; it is better to save the model with the lowest validation loss. You say increasing the number of samples leads to underfitting, which seems rather strange; usually more samples results in better accuracy.
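For the checkpoint change, a minimal sketch that keeps the file name from the question but monitors validation loss instead of validation accuracy:
import tensorflow as tf
model_checkpoint = tf.keras.callbacks.ModelCheckpoint('tfjsmode.h5',
                                                      monitor='val_loss',
                                                      mode='min',
                                                      verbose=1,
                                                      save_best_only=True)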

Issues with Keras load_model function

I am building a CNN in Keras with a TensorFlow backend for speaker identification, and I am currently attempting to train the model and then save it as an .hdf5 file. The program trains the model for 100 epochs with early stopping and checkpoints, saving only the best model to a file, as illustrated in the code below:
class BuildModel:
    # Create First Model in Ensemble
    def createModel(self, model_input, n_outputs, first_session=True):
        if first_session != True:
            model = load_model('SI_ideal_model_fixed.hdf5')
            return model
        # Define Input Layer
        inputs = model_input
        # Define Densely Connected Layers
        conv = Dense(16, activation='relu')(inputs)
        conv = Dense(64, activation='relu')(conv)
        conv = Dense(16, activation='relu')(conv)
        conv = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv)
        outputs = Dense(n_outputs, activation='softmax')(conv)
        # Create Model
        model = Model(inputs, outputs)
        model.summary()
        return model

    # Train the Model
    def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
        # Model Parameters
        verbose, epochs, batch_size, patience = 1, 100, 64, 10
        # Determine Input and Output Dimensions
        x = x_train[0].shape[0]  # Number of MFCC rows
        y = x_train[0].shape[1]  # Number of MFCC columns
        c = 1  # Number of channels
        # Create Model
        inputs = Input(shape=(x, y, c), name='input')
        model = self.createModel(model_input=inputs,
                                 n_outputs=num_classes,
                                 first_session=first_session)
        # Compile Model
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        # Callbacks
        es = EarlyStopping(monitor='val_loss',
                           mode='min',
                           verbose=verbose,
                           patience=patience,
                           min_delta=0.0001)  # Stop training at the right time
        mc = ModelCheckpoint('SI_ideal_model_fixed.hdf5',
                             monitor='val_accuracy',
                             verbose=verbose,
                             save_best_only=True,
                             mode='max')  # Save best model after each epoch
        reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                      factor=0.2,
                                      patience=patience//2,
                                      min_lr=1e-3)  # Reduce learning rate once learning stagnates
        # Evaluate Model
        model.fit(x_train, y=y_train, epochs=epochs,
                  callbacks=[es, mc, reduce_lr], batch_size=batch_size,
                  validation_data=(x_val, y_val))
        accuracy = model.evaluate(x=x_train, y=y_train,
                                  batch_size=batch_size,
                                  verbose=verbose)
        # Load Best Model
        model = load_model('SI_ideal_model_fixed.hdf5')
        return (accuracy[1], model)
However, it appears that the load_model function is not working properly, since the model achieved a validation accuracy of 0.56193 after the first training session but then started with a validation accuracy of only 0.2508 at the beginning of the second training session. (From what I have seen, the first epoch of the second training session should have a validation accuracy much closer to that of the best model.)
Moreover, I then attempted to test the trained model on a set of unseen samples with model.predict, and it failed on all six, often with high probabilities, which leads me to believe that it was using minimally trained (or untrained) weights.
So, my question is: could this be an issue with loading and saving the models via the load_model and ModelCheckpoint functions? If so, what is the best alternative? If not, what are some good troubleshooting tips for improving the model's prediction performance?
I am not sure what you mean by training session. What I would do is first train for a few epochs and note the validation accuracy. Then load the model and use evaluate() to get the same accuracy. If it differs, then yes, something is wrong with your loading. Here is what I would do:
def createModel(self, model_input, n_outputs):
    # Define Input Layer
    inputs = model_input
    # Define Densely Connected Layers
    conv = Dense(16, activation='relu')(inputs)
    conv2 = Dense(64, activation='relu')(conv)
    conv3 = Dense(16, activation='relu')(conv2)
    conv4 = Reshape((conv.shape[1]*conv.shape[2]*conv.shape[3],))(conv3)
    outputs = Dense(n_outputs, activation='softmax')(conv4)
    # Create Model
    model = Model(inputs, outputs)
    return model

# Train the Model
def evaluateModel(self, x_train, x_val, y_train, y_val, num_classes, first_session=True):
    # Model Parameters
    verbose, epochs, batch_size, patience = 1, 100, 64, 10
    # Determine Input and Output Dimensions
    x = x_train[0].shape[0]  # Number of MFCC rows
    y = x_train[0].shape[1]  # Number of MFCC columns
    c = 1  # Number of channels
    # Create Model
    inputs = Input(shape=(x, y, c), name='input')
    model = self.createModel(model_input=inputs,
                             n_outputs=num_classes)
    # Compile Model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    # Callbacks
    es = EarlyStopping(monitor='val_loss',
                       mode='min',
                       verbose=verbose,
                       patience=patience,
                       min_delta=0.0001)  # Stop training at the right time
    mc = ModelCheckpoint('SI_ideal_model_fixed.h5',
                         monitor='val_accuracy',
                         verbose=verbose,
                         save_best_only=True,
                         save_weights_only=False)  # Save best model after each epoch
    reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                                  factor=0.2,
                                  patience=patience//2,
                                  min_lr=1e-3)  # Reduce learning rate once learning stagnates
    # Evaluate Model
    model.fit(x_train, y=y_train, epochs=5,
              callbacks=[es, mc, reduce_lr], batch_size=batch_size,
              validation_data=(x_val, y_val))
    accuracy = model.evaluate(x=x_val, y=y_val,
                              batch_size=batch_size,
                              verbose=verbose)
    # Load Best Model
    model2 = load_model('SI_ideal_model_fixed.h5')
    accuracy2 = model2.evaluate(x=x_val, y=y_val,
                                batch_size=batch_size,
                                verbose=verbose)
    return (accuracy[1], model)
The two evaluations should print the same thing really.
P.S. TF might change the order of your computations, so I used different names in the model (e.g. conv2, conv3, ...) to prevent that.

Adding prior belief into a neural Network

I am busy with a classification problem with three classes, and one of the classes is never predicted/classified. I would like to know whether there is any way to inject a prior belief into my neural network, whether by design or otherwise.
My football prediction model predicts [Draw, Home Win, Away Win]. My classes are fairly balanced (40%, 30%, 30%). The class [Draw], which accounts for 40% of the data, is the one my NN never predicts. My dataset contains 1900 samples.
I am using a deep NN with 2 to 4 hidden layers.
The code of my best model (based on training/validation loss) is as follows:
X_all = df.copy()
train_cols = ['a_line0','a_line1','a_line2','a_line3','a_line4','a_line5',
'a_line6','a_line7','a_line8','a_line9','a_line10','h_line0',
'h_line1','h_line2','h_line3','h_line4','h_line5','h_line6',
'h_line7','h_line8','h_line9','h_line10','odds0','odds1','odds2']
x = X_all[train_cols]
x_v = x.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x_v)
x = pd.DataFrame(x_scaled)
y = X_all['result']
ohe = OneHotEncoder(n_values=3,categories='auto')
y = ohe.fit_transform(y.reshape(-1,1))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
for lr, ep in [(0.001, 300)]:
    model = Sequential()
    model.add(Dense(25, input_dim=25, activation='relu'))
    model.add(Dense(36, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(3, activation='sigmoid'))
    adam = kr.optimizers.Adam(lr=lr, decay=1e-6)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=ep, batch_size=10, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test)
    _, accuracy1 = model.evaluate(X_train, y_train)
    print('Testing Accuracy: %.2f' % (accuracy*100), 'Train Accuracy: %.2f' % (accuracy1*100), 'learning rate : ', lr)
I apologise if the code is a bit messy.
My model also overfits by about 16 percentage points (52% test vs. 68% train accuracy) with this configuration of my network.
Since you are in a multi-class single-label setting (i.e. your labels are mutually exclusive), you should not use sigmoid as activation in your final layer; change it to
model.add(Dense(3, activation='softmax'))
Also, dropout should not be used by default; remove it for starters, and only add it if it improves the result.
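Applied to the model above, a minimal sketch with both changes (softmax output, dropout removed):
model = Sequential()
model.add(Dense(25, input_dim=25, activation='relu'))
model.add(Dense(36, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(3, activation='softmax'))  # mutually exclusive classes -> softmax
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])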
