Validation accuracy not improving imbalanced data

Validation accuracy not improving imbalanced data - python

Attempting to make predictions using Kaggle Diabetic retinopathy data set and a CNN model. There are five classes to be predicted. Distribution % of the data label wise is as below.
0 0.73
2 0.15
1 0.07
3 0.02
4 0.02
Name: level, dtype: float64
The relevant important code blocks are furnished below.
# Network training parameters
EPOCHS = 25
BATCH_SIZE =50
VERBOSE = 1
lr=0.0001
OPTIMIZER = tf.keras.optimizers.Adam(lr)
target_size =(256, 256)
NB_CLASSES = 5
THe Image generator class and the preprocessing codes as below.
data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rotation_range=45,
horizontal_flip=True,
vertical_flip=True,
rescale=1./255,
validation_split=0.2)
train_gen=data_gen.flow_from_dataframe(
dataframe=label_csv, directory=IMAGE_FOLDER_PATH,
x_col='image', y_col='level',
target_size=target_size,
class_mode='categorical',
batch_size=BATCH_SIZE, shuffle=True,
subset='training',
validate_filenames=True
)
Found 28101 validated image filenames belonging to 5 classes.
validation_gen=data_gen.flow_from_dataframe(
dataframe=label_csv, directory=IMAGE_FOLDER_PATH,
x_col='image', y_col='level',
target_size=target_size,
class_mode='categorical',
batch_size=BATCH_SIZE, shuffle=True,
subset='validation',
validate_filenames=True
)
Found 7025 validated image filenames belonging to 5 classes.
train_gen.image_shape
(256, 256, 3)
Model building code blocks as below.
# Architect your CNN model1
model1=tf.keras.models.Sequential()
model1.add(tf.keras.layers.Conv2D(256,(3,3),input_shape=INPUT_SHAPE,activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(128,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(64,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(32,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Flatten())
model1.add(tf.keras.layers.Dense(units=512,activation='relu'))
model1.add(tf.keras.layers.Dense(units=256,activation='relu'))
model1.add(tf.keras.layers.Dense(units=128,activation='relu'))
model1.add(tf.keras.layers.Dense(units=64,activation='relu'))
model1.add(tf.keras.layers.Dense(units=32,activation='relu'))
model1.add(tf.keras.layers.Dense(units=NB_CLASSES,activation='softmax'))
model1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 254, 254, 256) 7168
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 127, 127, 256) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 125, 125, 128) 295040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 62, 62, 128) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 60, 60, 64) 73792
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 30, 30, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 28, 28, 32) 18464
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense (Dense) (None, 512) 3211776
_________________________________________________________________
dense_1 (Dense) (None, 256) 131328
_________________________________________________________________
dense_2 (Dense) (None, 128) 32896
_________________________________________________________________
dense_3 (Dense) (None, 64) 8256
_________________________________________________________________
dense_4 (Dense) (None, 32) 2080
_________________________________________________________________
dense_5 (Dense) (None, 5) 165
=================================================================
Total params: 3,780,965
Trainable params: 3,780,965
Non-trainable params: 0
# Compile model1
model1.compile(optimizer=OPTIMIZER,metrics=['accuracy'],loss='categorical_crossentropy')
print (train_gen.n,train_gen.batch_size)
28101 50
STEP_SIZE_TRAIN=train_gen.n//train_gen.batch_size
STEP_SIZE_VALID=validation_gen.n//validation_gen.batch_size
print(STEP_SIZE_TRAIN)
print(STEP_SIZE_VALID)
562
140
# Fit the model1
history1=model1.fit(train_gen,
steps_per_epoch=STEP_SIZE_TRAIN,
validation_data=validation_gen,
validation_steps=STEP_SIZE_VALID,
epochs=EPOCHS,verbose=1)
History of the epoch as below and trained stopped at epoch -14 as no improvement observed.
Epoch 1/25
562/562 [==============================] - 1484s 3s/step - loss: 0.9437 - accuracy: 0.7290 - val_loss: 0.8678 - val_accuracy: 0.7309
Epoch 2/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8748 - accuracy: 0.7337 - val_loss: 0.8673 - val_accuracy: 0.7309
Epoch 3/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8681 - accuracy: 0.7367 - val_loss: 0.8614 - val_accuracy: 0.7306
Epoch 4/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8619 - accuracy: 0.7333 - val_loss: 0.8592 - val_accuracy: 0.7306
Epoch 5/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8565 - accuracy: 0.7375 - val_loss: 0.8625 - val_accuracy: 0.7304
Epoch 6/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8608 - accuracy: 0.7357 - val_loss: 0.8556 - val_accuracy: 0.7310
Epoch 7/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8568 - accuracy: 0.7335 - val_loss: 0.8614 - val_accuracy: 0.7304
Epoch 8/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8541 - accuracy: 0.7349 - val_loss: 0.8591 - val_accuracy: 0.7301
Epoch 9/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8582 - accuracy: 0.7321 - val_loss: 0.8583 - val_accuracy: 0.7303
Epoch 10/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8509 - accuracy: 0.7354 - val_loss: 0.8599 - val_accuracy: 0.7311
Epoch 11/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8521 - accuracy: 0.7325 - val_loss: 0.8584 - val_accuracy: 0.7304
Epoch 12/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8422 - accuracy: 0.7352 - val_loss: 0.8481 - val_accuracy: 0.7307
Epoch 13/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8511 - accuracy: 0.7345 - val_loss: 0.8477 - val_accuracy: 0.7307
Epoch 14/25
562/562 [==============================] - 1462s 3s/step - loss: 0.8314 - accuracy: 0.7387 - val_loss: 0.8528 - val_accuracy: 0.7300
Epoch 15/25
73/562 [==>...........................] - ETA: 17:12 - loss: 0.8388 - accuracy: 0.7344
Validation accuracy not improving more than 73 % even after several epochs.In the earlier trial i tried the learning rate 0.001 but the case was same with no improvements.
Request suggestions to improve the model accuracy.
Also how can we use Grid search when we use the Image generator for preprocessing and would invite suggestions for the same
Many thanks in advance

your problem is most likely due to overfitting. your data is quite unbalanced and in addition to finding a better model, a better learning rate or a better optimizer. you could also create a custom generator to augment and select your data in a more balanced way.
I use custom generators for most of the models at work, I can't share the full code of generators but I'll show you a pseudocode example of how to create one. it's actually quite fun to play around and add more steps to it. you can -and you probably should- add pre-processing and post-processing steps but I hope this code gives you an overall idea of the process.
import random
import numpy as np
class myCostumGenerator:
def __init__(self) -> None:
# load dataset into a dict, if it's too big then just load filenames and load them at runtime
# each dict key is a class name, and each value is a list of images or filenames
self.dataSet, self.imageHeight, self.imageWidth, self.imageChannels = loadData()
def labelBinarizer(self, label):
# this is how you convert class names into target Y
pass
def augment(self, image):
# this is how you augment your images
pass
def yeildData(self):
while True:#keras generators need to run infinitly
for className, data in self.dataSet.items():
yield self.augment(random.choice(data)), self.labelBinarizer(className)
def getEmptyBatch(self, batchSize):
return (
np.empty([batchSize, self.imageHeight, self.imageWidth, self.imageChannels]),
np.empty([batchSize, len(self.dataset.keys())]), 0)
def getBatches(self, batchSize):
X, Y, i = self.getEmptyBatch(batchSize)
for image, label in self.yieldData():
X[i, ...] = image
Y[i, ...] = label
i += 1
if i== batchSize:
yield X, Y
X, Y, i = self.getEmptyBatch(batchSize)
# your model definition and other stuff
# ...
# ...
# ...
# with this method of defining a generator, you have to set number of steps per epoch
generator = myCostumGenerator()
model.fit(
generator.getBatches(batchSize=256),
steps_per_epoch = 500
# other params
)

Related

PYOD Autoencoders anomaly detection high false positives

I have a large dataset with 2 Million rows and 2800 columns, containing 2% of anomalous data. Currently, there is a label that says anomalous or not by 0 or 1, they were marked manually by domain experts. I have a need to convert this into unsupervised learning.
So, I started with PYOD's Autoencoders as they work well on high-dimensional data. The problem is all of them gave me high false positives. Based on the tutorial I developed the following Autoencoder
from pyod.models.auto_encoder import AutoEncoder
encoder=AutoEncoder(contamination=0.02,epochs=12,hidden_neurons=[2000,1000,500,500,1000,2000])
data.shape()
encoder.fit(data)
target
0.0 9737
1.0 263
dtype: int64
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_54 (Dense) (None, 2869) 8234030
dropout_44 (Dropout) (None, 2869) 0
dense_55 (Dense) (None, 2869) 8234030
dropout_45 (Dropout) (None, 2869) 0
dense_56 (Dense) (None, 2000) 5740000
dropout_46 (Dropout) (None, 2000) 0
dense_57 (Dense) (None, 1000) 2001000
dropout_47 (Dropout) (None, 1000) 0
dense_58 (Dense) (None, 500) 500500
dropout_48 (Dropout) (None, 500) 0
dense_59 (Dense) (None, 500) 250500
dropout_49 (Dropout) (None, 500) 0
dense_60 (Dense) (None, 1000) 501000
dropout_50 (Dropout) (None, 1000) 0
dense_61 (Dense) (None, 2000) 2002000
dropout_51 (Dropout) (None, 2000) 0
dense_62 (Dense) (None, 2869) 5740869
=================================================================
Total params: 33,203,929
Trainable params: 33,203,929
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/12
282/282 [==============================] - 22s 74ms/step - loss: 261.8725 - val_loss: 164.1958
Epoch 2/12
282/282 [==============================] - 21s 73ms/step - loss: 102.1214 - val_loss: 365.0436
Epoch 3/12
282/282 [==============================] - 21s 73ms/step - loss: 54.5027 - val_loss: 598.0752
Epoch 4/12
282/282 [==============================] - 20s 72ms/step - loss: 28.4714 - val_loss: 867.0073
Epoch 5/12
282/282 [==============================] - 20s 72ms/step - loss: 14.0551 - val_loss: 1149.2327
Epoch 6/12
282/282 [==============================] - 20s 72ms/step - loss: 7.2151 - val_loss: 1323.5684
Epoch 7/12
282/282 [==============================] - 20s 73ms/step - loss: 3.8648 - val_loss: 1449.9386
Epoch 8/12
282/282 [==============================] - 20s 72ms/step - loss: 2.7034 - val_loss: 1611.7833
Epoch 9/12
282/282 [==============================] - 20s 72ms/step - loss: 1.6767 - val_loss: 1712.9929
Epoch 10/12
282/282 [==============================] - 20s 72ms/step - loss: 1.3498 - val_loss: 1777.0973
Epoch 11/12
282/282 [==============================] - 20s 72ms/step - loss: 1.1861 - val_loss: 1821.0354
Epoch 12/12
282/282 [==============================] - 20s 73ms/step - loss: 1.1071 - val_loss: 1846.3872
313/313 [==============================] - 3s 10ms/step
AutoEncoder(batch_size=32, contamination=0.02, dropout_rate=0.2, epochs=12,
hidden_activation='relu',
hidden_neurons=[2000, 1000, 500, 500, 1000, 2000],
l2_regularizer=0.1,
loss=<function mean_squared_error at 0x7fadf2d4cf80>,
optimizer='adam', output_activation='sigmoid', preprocessing=True,
random_state=None, validation_size=0.1, verbose=1)
To speed up the iterations, I took 10K records with 2% of anomalous data init, as shown in the output of data.shape. As we can see training loss is reducing; however, val_loss is oscillating. The situation is the same even if the epochs=100.
When I predict on the same training data, not validation data
encoder.predict(data)
I get very high false positives, it will produce 200 anomalous data, but only 9 of them are actual anomalies based on the manual labels.
1). Am I using the encoder correctly?
2). I think, as the data were manually labeled by domain experts, I think the data itself doesn't have enough information to reveal the anomalous data. Hence, it needs to transformed, to help models identify anomalies correctly?
Please suggest.
Thanks

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

So, I am trying to code a multivariate LSTM for time series forecasting, and in my model, the losses decrease but accuracy metrics do not change at all. I tried changing number of neurons, layers, learning rate, early stopping, activation function on the output layer, and l2 regularization but nothing works. I am a beginner in machine learning, and so any help would be appreciated.Most of my efforts were like throwing stones in the dark. I am attaching a the GitHub link to my code, as well as a few of the training epochs.
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
model = Sequential()
model.add(LSTM(64,activation='sigmoid',return_sequences=True,input_shape = (trainX.shape[1],trainX.shape[2])))
model.add(LSTM(32,activation='sigmoid',return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(trainY.shape[1]))
opt = Adam(learning_rate= 1e-3)
model.compile(optimizer='adam',loss = 'mse', metrics=['accuracy'])
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 200, 64) 19200
_________________________________________________________________
lstm_7 (LSTM) (None, 32) 12416
_________________________________________________________________
dropout_3 (Dropout) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 33
=================================================================
Total params: 31,649
Trainable params: 31,649
Non-trainable params: 0
es_callback = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(trainX,trainY,epochs=40,batch_size= 32,verbose=1,validation_split=0.2, callbacks= [es_callback])
Epoch 1/40
214/214 [==============================] - 58s 169ms/step - loss: 0.1663 - accuracy: 0.0000e+00 - val_loss: 0.0483 - val_accuracy: 5.8617e-04
Epoch 2/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0497 - accuracy: 0.0000e+00 - val_loss: 0.0446 - val_accuracy: 5.8617e-04
Epoch 3/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0309 - accuracy: 0.0000e+00 - val_loss: 0.0092 - val_accuracy: 5.8617e-04
Epoch 4/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0143 - accuracy: 0.0000e+00 - val_loss: 0.0230 - val_accuracy: 5.8617e-04
Epoch 5/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0115 - accuracy: 0.0000e+00 - val_loss: 0.0160 - val_accuracy: 5.8617e-04
Epoch 6/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0099 - accuracy: 0.0000e+00 - val_loss: 0.0172 - val_accuracy: 5.8617e-04
My code: https://github.com/RiddhimanRaut/Deep-Learning-based-CPR-estimation/blob/main/CPR_prediction_multivariate_LSTM_tobetrialled_1.ipynb
Thank you!

Accuracy is the metric for classification tasks. To measure if a regression model is good or not, measurement such as MSE can be applied.
I think the discussion here can provide more information.

TensorFlow not running correct number of epochs with no errors

I am very much novice at neural networks / machine learning. I am trying to learn more by using RotNet, a NN that will classify rotation angles in images. I am trying to train my network using the MNIST dataset, and have changed only one line of the repo (a log directory file path) but other than that have been able to run it successfully.
Here is how I am running it based on the README:
& .../Anaconda3/envs/tflow/python.exe .../RotNet/train/train_mnist.py
and then the output:
Using TensorFlow backend.
Input shape: (28, 28, 1)
60000 train samples
10000 test samples
2020-10-16 12:18:17.031214: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 28, 28, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 26, 26, 64) 640
_________________________________________________________________
conv2d_2 (Conv2D) (None, 24, 24, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 1179776
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 360) 46440
=================================================================
Total params: 1,263,784
Trainable params: 1,263,784
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
1/468 [..............................] - ETA: 2:21 - loss: 5.8862 - angle_error: 87.14062020-10-16 12:18:18.337183: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
469/468 [==============================] - 61s 130ms/step - loss: 5.0338 - angle_error: 81.4492 - val_loss: 4.1144 - val_angle_error: 65.9470
Epoch 2/50
469/468 [==============================] - 61s 131ms/step - loss: 4.3072 - angle_error: 64.7485 - val_loss: 3.4630 - val_angle_error: 53.0140
Epoch 3/50
469/468 [==============================] - 63s 134ms/step - loss: 4.0303 - angle_error: 56.3245 - val_loss: 3.2241 - val_angle_error: 47.0283
Epoch 4/50
469/468 [==============================] - 63s 134ms/step - loss: 3.8824 - angle_error: 52.2043 - val_loss: 3.3227 - val_angle_error: 43.2439
Epoch 5/50
469/468 [==============================] - 63s 135ms/step - loss: 3.7982 - angle_error: 49.9996 - val_loss: 3.1930 - val_angle_error: 41.1242
Epoch 6/50
469/468 [==============================] - 73s 155ms/step - loss: 3.7288 - angle_error: 48.4027 - val_loss: 2.9600 - val_angle_error: 39.9322
Epoch 7/50
469/468 [==============================] - 63s 133ms/step - loss: 3.6781 - angle_error: 46.5616 - val_loss: 3.2243 - val_angle_error: 38.6193
Epoch 8/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6439 - angle_error: 45.2133 - val_loss: 2.8629 - val_angle_error: 38.0046
Epoch 9/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6132 - angle_error: 44.7204 - val_loss: 3.0085 - val_angle_error: 37.4514
Epoch 10/50
469/468 [==============================] - 62s 132ms/step - loss: 3.5817 - angle_error: 43.8439 - val_loss: 3.0073 - val_angle_error: 35.8109
The script train_mnist.py is located here and it specifies 50 epochs. I am getting no error, the program simply stops after the 8th or 10th epoch. I am at a loss for how to fix this issue. Any advice would be appreciated!

I took a quick look at the code. In it there is this line:
callbacks=[checkpointer, early_stopping, tensorboard]
The call back early_stopping by default monitors the validation loss. The code used for early stopping is set such that if the validation loss fails to improve for more than 2 consecutive epochs training will halt. That is why it does not train for 50 epochs. If you want it to continue training for the full 50 remove early_stopping from the line of code above. You can see that early_stopping is causing the training to terminate by changing the code in the script from
early_stopping = EarlyStopping(patience=2)
# change code to
early_stopping = EarlyStopping(patience=2, verbose=1)
From the training data this model does not appear to be training very well. I suggest you try transfer learning with MobileNet. Code below shows how to use it,
mobile = tf.keras.applications.mobilenet.MobileNet( include_top=False, input_shape=(img_size, img_size,3), pooling='max', weights='imagenet', dropout=.5)
x=mobile.layers[-1].output # this is the last layer in the mobilenet model the global max pooling layer
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
x=Dense(126, activation='relu')(x)
x=Dropout(rate=.3, seed = 123)(x)
predictions=Dense (len(classes), activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
Adapt the above to your situation it should work much better
for layer in model.layers:
layer.trainable=True
model.compile(Adamax(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])

My validation accuracy is stuck and training accuracy is decreased continuously

I am new to working with LSTM models, but I have a small network. I have extracted MFCC features from my audio files and have flattened it and given as input. But the validation accuracy is stuck between 2 values and my accuracy is decreasing continuously.
I have used RMSprop with a learning rate of 0.001.
I have tried changing Optimizer, adding dropout, and batch normalization.
The dataset is evenly balanced also.
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3460, 1) 0
_________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM) (None, 3460, 1024) 4206592
_________________________________________________________________
cu_dnnlstm_2 (CuDNNLSTM) (None, 1024) 8396800
_________________________________________________________________
dense_1 (Dense) (None, 512) 524800
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 131328
_________________________________________________________________
batch_normalization_2 (Batch (None, 256) 1024
_________________________________________________________________
dropout_2 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 257
=================================================================
Total params: 13,262,849
Trainable params: 13,261,313
Non-trainable params: 1,536
_________________________________________________________________
Train on 385 samples, validate on 165 samples
Epoch 1/10
385/385 [==============================] - 61s 160ms/step - loss: 1.0811 - accuracy: 0.5143 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 2/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7536 - accuracy: 0.5169 - val_loss: 0.6980 - val_accuracy: 0.4727
Epoch 3/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7484 - accuracy: 0.5039 - val_loss: 0.7002 - val_accuracy: 0.4727
Epoch 4/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7333 - accuracy: 0.5091 - val_loss: 0.7030 - val_accuracy: 0.5273
Epoch 5/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7486 - accuracy: 0.4675 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 6/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7222 - accuracy: 0.4935 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 7/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7208 - accuracy: 0.4883 - val_loss: 0.6919 - val_accuracy: 0.5273
Epoch 8/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7134 - accuracy: 0.4805 - val_loss: 0.6919 - val_accuracy: 0.5273
Epoch 9/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7168 - accuracy: 0.4987 - val_loss: 0.6927 - val_accuracy: 0.5273
Epoch 10/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7089 - accuracy: 0.4909 - val_loss: 0.6926 - val_accuracy: 0.5273
Here is my code:
def build_model():
input = Input((20*173,1))
x = Conv1D(filters=16, kernel_size=4, activation='relu')(input)
x = AveragePooling1D(pool_size=2)(x)
x = Conv1D(filters=16, kernel_size=3, activation='relu')(x)
x = AveragePooling1D(pool_size=2)(x)
x = Flatten()(x)
x = keras.layers.Reshape((13808, 1))(x)
x = CuDNNLSTM(1024, return_sequences=True)(x)
x = CuDNNLSTM(512)(x)
x = Dense(256,activation='relu')(x)
x = Dropout(0.3)(x)
x = Dense(128,activation='relu')(x)
x = Dropout(0.3)(x)
x = Dense(1,activation='sigmoid')(x)
model = Model(inputs=input, outputs=x)
return model
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.2,patience=3, min_lr=0.001)
opt = RMSprop(lr=0.0001)
m2 = build_model()
m2.compile(loss = "binary_crossentropy", metrics=['accuracy'],optimizer = opt)
m2.fit(X, y, batch_size=16, epochs=10, validation_split=0.3,callbacks = [reduce_lr])

Keras training crashes mid epoch after multiple correct executions

I am trying create a Cudgru based model that predicts sequence of 7 features that are interrelated. Here's my keras model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
cu_dnngru_1 (CuDNNGRU) (None, 49, 100) 32700
_________________________________________________________________
dropout_1 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_2 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_2 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_3 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_3 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_4 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_4 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_5 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_5 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_6 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_6 (Dropout) (None, 49, 100) 0
_________________________________________________________________
cu_dnngru_7 (CuDNNGRU) (None, 49, 100) 60600
_________________________________________________________________
dropout_7 (Dropout) (None, 49, 100) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4900) 0
_________________________________________________________________
dense_1 (Dense) (None, 7) 34307
=================================================================
Total params: 430,607
Trainable params: 430,607
Non-trainable params: 0
I'm trying to run this model for higher number of epochs. First few epochs are fine, but then it errors out:
Model] Model Compiled
Time taken: 0:00:02.314468
[Model] Training Started
[Model] 100 epochs, 1000 batch size, 20.0 batches per epoch
Epoch 1/100
20/20 [==============================] - 5s 240ms/step - loss: 0.1631 - acc: 0.2905
Epoch 2/100
20/20 [==============================] - 2s 81ms/step - loss: 0.1288 - acc: 0.2455
Epoch 3/100
20/20 [==============================] - 1s 73ms/step - loss: 0.0952 - acc: 0.5058
Epoch 4/100
20/20 [==============================] - 2s 76ms/step - loss: 0.1141 - acc: 0.3288
Epoch 5/100
20/20 [==============================] - 2s 75ms/step - loss: 0.1064 - acc: 0.3425
Epoch 6/100
20/20 [==============================] - 1s 75ms/step - loss: 0.0767 - acc: 0.4213
Epoch 7/100
20/20 [==============================] - 1s 75ms/step - loss: 0.0635 - acc: 0.4764
Epoch 8/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0555 - acc: 0.5274
Epoch 9/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0544 - acc: 0.5141
Epoch 10/100
...
Epoch 61/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0506 - acc: 0.3925
Epoch 62/100
20/20 [==============================] - 1s 72ms/step - loss: 0.0495 - acc: 0.4323
Epoch 63/100
20/20 [==============================] - 1s 73ms/step - loss: 0.0495 - acc: 0.4118
Epoch 64/100
2/20 [==>...........................] - ETA: 1s - loss: 0.0495 - acc: 0.4885Traceback (most recent call last):
File "./run.py", line 118, in <module>
main()
File "./run.py", line 92, in main
steps_per_epoch=steps_per_epoch)
File "/home/sridhar/PE_CSV/alarmProj/rnn/lstm/core/model.py", line 149, in train_generator
workers=70)
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1415, in fit_generator
initial_epoch=initial_epoch)
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training_generator.py", line 213, in fit_generator
class_weight=class_weight)
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1209, in train_on_batch
class_weight=class_weight)
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 749, in _standardize_user_data
exception_prefix='input')
File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training_utils.py", line 127, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected cu_dnngru_1_input to have 3 dimensions, but got array with shape (380, 1)
If I reduce the number of epochs to less than that value (say epoch 64 here), I don't have any issues, but increasing the number of epochs causes the above error at some point. The exact number of epochs where it crashes seem to vary with any change to the configuration. The same issue is seen with vanilla GRU/LSTM layers.
This is keras-2.2.2 and the model is being compiled with 70 worker threads.
Is there something I could do to avoid this issue?
Edit:
Here's the relevant approximate code used:
session_conf = tf.ConfigProto(
inter_op_parallelism_threads=multiprocessing.cpu_count(),
intra_op_parallelism_threads=multiprocessing.cpu_count())
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
self.model.add(CuDNNGRU(
100,
input_shape=(49,7),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
100,
input_shape=(None,None),
kernel_initializer='orthogonal',
return_sequences=true))
self.model.add(Dropout(0.4))
elf.model.add(Flatten())
self.model.add(Dense(7, activation='relu'))
sgd = SGD(lr=0.1, decay=1e-2, clipnorm=5.0)
self.model.compile(
loss='mse',
metrics=["accuracy"],
optimizer=sgd)
===================
def train_generator(self, data_gen, epochs, batch_size, steps_per_epoch):
timer = Timer()
timer.start()
print('[Model] Training Started')
print('[Model] %s epochs, %s batch size, %s batches per epoch' %
(epochs, batch_size, steps_per_epoch))
save_fname = '%s/%s-e%s.h5' % (self.model_dir, dt.datetime.now()
.strftime('%d%m%Y-%H%M%S'), str(epochs))
callbacks = [
ModelCheckpoint(
filepath=save_fname, monitor='loss', save_best_only=True)
]
try:
self.model.fit_generator(
data_gen,
steps_per_epoch=steps_per_epoch,
epochs=epochs,
callbacks=callbacks)
except:
pdb.set_trace()
)
print('[Model] Training Completed. Model saved as %s' % save_fname)
timer.stop()
=============
#invoked from main function
model.train_generator(
data_gen=data.generate_train_batch(
seq_len=50,
batch_size=1000,
normalise=false),
epochs=100,
batch_size=1000,
steps_per_epoch=steps_per_epoch)
=============
def generate_train_batch(self, seq_len, batch_size, normalise):
'''Yield a generator of training data from filename on given list of cols split for train/test'''
i = 0
while i < (self.len_train - seq_len):
x_batch = []
y_batch = []
for b in range(batch_size):
if i >= (self.len_train - seq_len):
# stop-condition for a smaller final batch if data doesn't divide evenly
yield np.array(x_batch), np.array(y_batch)
x, y = self._next_window(i, seq_len, normalise)
x_batch.append(x)
y_batch.append(y)
i += 1
yield np.array(x_batch), np.array(y_batch)
=======================

The generator was wrong. It incorrectly assumes a finite generator whilst keras expects an infinite one.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Validation accuracy not improving imbalanced data - python

Related

PYOD Autoencoders anomaly detection high false positives

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

TensorFlow not running correct number of epochs with no errors

My validation accuracy is stuck and training accuracy is decreased continuously

Keras training crashes mid epoch after multiple correct executions

Categories

Resources