PyOD AutoEncoder anomaly detection: high false positives - python

I have a large dataset with 2 million rows and 2,800 columns, about 2% of which are anomalous. Each row currently has a 0/1 label (anomalous or not) that was assigned manually by domain experts. I need to turn this into an unsupervised learning problem.
So I started with PyOD's AutoEncoder, since autoencoders work well on high-dimensional data. The problem is that every configuration I tried gave a high number of false positives. Based on the tutorial, I built the following autoencoder:
from pyod.models.auto_encoder import AutoEncoder
encoder = AutoEncoder(contamination=0.02, epochs=12, hidden_neurons=[2000, 1000, 500, 500, 1000, 2000])
data.shape  # shape is an attribute, not a method: data.shape() raises a TypeError
encoder.fit(data)
target
0.0 9737
1.0 263
dtype: int64
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_54 (Dense) (None, 2869) 8234030
dropout_44 (Dropout) (None, 2869) 0
dense_55 (Dense) (None, 2869) 8234030
dropout_45 (Dropout) (None, 2869) 0
dense_56 (Dense) (None, 2000) 5740000
dropout_46 (Dropout) (None, 2000) 0
dense_57 (Dense) (None, 1000) 2001000
dropout_47 (Dropout) (None, 1000) 0
dense_58 (Dense) (None, 500) 500500
dropout_48 (Dropout) (None, 500) 0
dense_59 (Dense) (None, 500) 250500
dropout_49 (Dropout) (None, 500) 0
dense_60 (Dense) (None, 1000) 501000
dropout_50 (Dropout) (None, 1000) 0
dense_61 (Dense) (None, 2000) 2002000
dropout_51 (Dropout) (None, 2000) 0
dense_62 (Dense) (None, 2869) 5740869
=================================================================
Total params: 33,203,929
Trainable params: 33,203,929
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/12
282/282 [==============================] - 22s 74ms/step - loss: 261.8725 - val_loss: 164.1958
Epoch 2/12
282/282 [==============================] - 21s 73ms/step - loss: 102.1214 - val_loss: 365.0436
Epoch 3/12
282/282 [==============================] - 21s 73ms/step - loss: 54.5027 - val_loss: 598.0752
Epoch 4/12
282/282 [==============================] - 20s 72ms/step - loss: 28.4714 - val_loss: 867.0073
Epoch 5/12
282/282 [==============================] - 20s 72ms/step - loss: 14.0551 - val_loss: 1149.2327
Epoch 6/12
282/282 [==============================] - 20s 72ms/step - loss: 7.2151 - val_loss: 1323.5684
Epoch 7/12
282/282 [==============================] - 20s 73ms/step - loss: 3.8648 - val_loss: 1449.9386
Epoch 8/12
282/282 [==============================] - 20s 72ms/step - loss: 2.7034 - val_loss: 1611.7833
Epoch 9/12
282/282 [==============================] - 20s 72ms/step - loss: 1.6767 - val_loss: 1712.9929
Epoch 10/12
282/282 [==============================] - 20s 72ms/step - loss: 1.3498 - val_loss: 1777.0973
Epoch 11/12
282/282 [==============================] - 20s 72ms/step - loss: 1.1861 - val_loss: 1821.0354
Epoch 12/12
282/282 [==============================] - 20s 73ms/step - loss: 1.1071 - val_loss: 1846.3872
313/313 [==============================] - 3s 10ms/step
AutoEncoder(batch_size=32, contamination=0.02, dropout_rate=0.2, epochs=12,
hidden_activation='relu',
hidden_neurons=[2000, 1000, 500, 500, 1000, 2000],
l2_regularizer=0.1,
loss=<function mean_squared_error at 0x7fadf2d4cf80>,
optimizer='adam', output_activation='sigmoid', preprocessing=True,
random_state=None, validation_size=0.1, verbose=1)
To speed up the iterations, I took a 10K-record sample with about 2% anomalous data in it (the label counts are shown in the output above). As we can see, the training loss keeps decreasing; however, val_loss keeps increasing after the first epoch. The situation is the same even with epochs=100.
When I predict on the same training data (not on validation data):
encoder.predict(data)
I get far too many false positives: the model flags about 200 rows as anomalous, but only 9 of them are actual anomalies according to the manual labels.
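To put numbers on these false positives, the detector's raw anomaly scores can be compared with the expert labels directly, instead of only through the 0/1 output of predict(). PyOD detectors store the training-set outlier scores in `decision_scores_` after `fit` (higher means more abnormal). The sketch below is plain NumPy; the function name is hypothetical and it assumes the scores and labels are aligned 1-D arrays. It reports precision and recall when the k highest-scoring rows are flagged.

```python
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """Precision and recall when the k highest-scoring rows are flagged as anomalies."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(int)
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k largest anomaly scores
    hits = int(labels[top_k].sum())        # how many flagged rows are true anomalies
    precision = hits / k
    recall = hits / max(int(labels.sum()), 1)
    return precision, recall

# Usage with PyOD (assumes `encoder` was fitted on `data` and `target` holds the 0/1 labels):
# p, r = precision_recall_at_k(encoder.decision_scores_, target, k=200)
```

With the numbers from the question, 9 true anomalies among 200 flagged rows is a precision of 9/200 = 0.045 and a recall of 9/263 ≈ 0.034.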
1) Am I using the autoencoder correctly?
2) Since the data were labeled manually by domain experts, I suspect the raw features do not carry enough information to reveal the anomalies. Does the data need to be transformed to help the models identify anomalies correctly?
Please suggest.
Thanks

Related

Accuracy and Validation Accuracy stay unchanged while both losses reduce. Tried everything I could find, still doesn't work

So, I am trying to code a multivariate LSTM for time series forecasting. In my model the losses decrease, but the accuracy metrics do not change at all. I tried changing the number of neurons, the number of layers, the learning rate, early stopping, the activation function on the output layer, and L2 regularization, but nothing works. I am a beginner in machine learning, so any help would be appreciated. Most of my efforts were like throwing stones in the dark. I am attaching the GitHub link to my code, as well as a few of the training epochs.
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
model = Sequential()
model.add(LSTM(64,activation='sigmoid',return_sequences=True,input_shape = (trainX.shape[1],trainX.shape[2])))
model.add(LSTM(32,activation='sigmoid',return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(trainY.shape[1]))
opt = Adam(learning_rate=1e-3)
model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])  # pass opt, otherwise the 'adam' string ignores it
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 200, 64) 19200
_________________________________________________________________
lstm_7 (LSTM) (None, 32) 12416
_________________________________________________________________
dropout_3 (Dropout) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 33
=================================================================
Total params: 31,649
Trainable params: 31,649
Non-trainable params: 0
es_callback = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(trainX,trainY,epochs=40,batch_size= 32,verbose=1,validation_split=0.2, callbacks= [es_callback])
Epoch 1/40
214/214 [==============================] - 58s 169ms/step - loss: 0.1663 - accuracy: 0.0000e+00 - val_loss: 0.0483 - val_accuracy: 5.8617e-04
Epoch 2/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0497 - accuracy: 0.0000e+00 - val_loss: 0.0446 - val_accuracy: 5.8617e-04
Epoch 3/40
214/214 [==============================] - 35s 164ms/step - loss: 0.0309 - accuracy: 0.0000e+00 - val_loss: 0.0092 - val_accuracy: 5.8617e-04
Epoch 4/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0143 - accuracy: 0.0000e+00 - val_loss: 0.0230 - val_accuracy: 5.8617e-04
Epoch 5/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0115 - accuracy: 0.0000e+00 - val_loss: 0.0160 - val_accuracy: 5.8617e-04
Epoch 6/40
214/214 [==============================] - 35s 163ms/step - loss: 0.0099 - accuracy: 0.0000e+00 - val_loss: 0.0172 - val_accuracy: 5.8617e-04
My code: https://github.com/RiddhimanRaut/Deep-Learning-based-CPR-estimation/blob/main/CPR_prediction_multivariate_LSTM_tobetrialled_1.ipynb
Thank you!
Accuracy is a metric for classification tasks. To measure whether a regression model is good, use a regression measure such as MSE (or MAE) instead.
I think the discussion here can provide more information.
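To see why accuracy reads as (nearly) zero here, the sketch below computes MSE and MAE next to a threshold-and-compare accuracy. It assumes, as tf.keras does for a single-unit output, that the string 'accuracy' resolves to binary accuracy: predictions are thresholded at 0.5 and then compared for exact equality with the targets, which continuous targets almost never satisfy. The function name is hypothetical and the code is NumPy only.

```python
import numpy as np

def regression_vs_accuracy(y_true, y_pred, threshold=0.5):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    # binary-accuracy style: threshold the predictions, then require exact equality with targets
    acc = float(np.mean((y_pred > threshold).astype(float) == y_true))
    return {"mse": mse, "mae": mae, "accuracy": acc}

# In Keras, report a regression metric instead of accuracy, e.g.:
# model.compile(optimizer=opt, loss='mse', metrics=['mae'])
```

Even perfect predictions of continuous targets score 0.0 on this accuracy: regression_vs_accuracy([0.3, 0.6], [0.3, 0.6]) gives an MSE of 0.0 and an accuracy of 0.0.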

Validation accuracy not improving imbalanced data

I am attempting to make predictions on the Kaggle Diabetic Retinopathy dataset with a CNN model. There are five classes to predict. The label distribution (fraction of samples per class) is as follows:
0 0.73
2 0.15
1 0.07
3 0.02
4 0.02
Name: level, dtype: float64
The most relevant code blocks are below.
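import tensorflow as tf
# Network training parameters
EPOCHS = 25
BATCH_SIZE = 50
VERBOSE = 1
lr = 0.0001
OPTIMIZER = tf.keras.optimizers.Adam(lr)
target_size = (256, 256)
INPUT_SHAPE = target_size + (3,)  # (256, 256, 3), matching train_gen.image_shape
NB_CLASSES = 5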
The image generator and preprocessing code:
data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rotation_range=45,
horizontal_flip=True,
vertical_flip=True,
rescale=1./255,
validation_split=0.2)
train_gen=data_gen.flow_from_dataframe(
dataframe=label_csv, directory=IMAGE_FOLDER_PATH,
x_col='image', y_col='level',
target_size=target_size,
class_mode='categorical',
batch_size=BATCH_SIZE, shuffle=True,
subset='training',
validate_filenames=True
)
Found 28101 validated image filenames belonging to 5 classes.
validation_gen=data_gen.flow_from_dataframe(
dataframe=label_csv, directory=IMAGE_FOLDER_PATH,
x_col='image', y_col='level',
target_size=target_size,
class_mode='categorical',
batch_size=BATCH_SIZE, shuffle=True,
subset='validation',
validate_filenames=True
)
Found 7025 validated image filenames belonging to 5 classes.
train_gen.image_shape
(256, 256, 3)
The model-building code:
# Architect your CNN model1
model1=tf.keras.models.Sequential()
model1.add(tf.keras.layers.Conv2D(256,(3,3),input_shape=INPUT_SHAPE,activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(128,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(64,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Conv2D(32,(3,3),activation='relu'))
model1.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
model1.add(tf.keras.layers.Flatten())
model1.add(tf.keras.layers.Dense(units=512,activation='relu'))
model1.add(tf.keras.layers.Dense(units=256,activation='relu'))
model1.add(tf.keras.layers.Dense(units=128,activation='relu'))
model1.add(tf.keras.layers.Dense(units=64,activation='relu'))
model1.add(tf.keras.layers.Dense(units=32,activation='relu'))
model1.add(tf.keras.layers.Dense(units=NB_CLASSES,activation='softmax'))
model1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 254, 254, 256) 7168
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 127, 127, 256) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 125, 125, 128) 295040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 62, 62, 128) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 60, 60, 64) 73792
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 30, 30, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 28, 28, 32) 18464
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense (Dense) (None, 512) 3211776
_________________________________________________________________
dense_1 (Dense) (None, 256) 131328
_________________________________________________________________
dense_2 (Dense) (None, 128) 32896
_________________________________________________________________
dense_3 (Dense) (None, 64) 8256
_________________________________________________________________
dense_4 (Dense) (None, 32) 2080
_________________________________________________________________
dense_5 (Dense) (None, 5) 165
=================================================================
Total params: 3,780,965
Trainable params: 3,780,965
Non-trainable params: 0
# Compile model1
model1.compile(optimizer=OPTIMIZER,metrics=['accuracy'],loss='categorical_crossentropy')
print (train_gen.n,train_gen.batch_size)
28101 50
STEP_SIZE_TRAIN=train_gen.n//train_gen.batch_size
STEP_SIZE_VALID=validation_gen.n//validation_gen.batch_size
print(STEP_SIZE_TRAIN)
print(STEP_SIZE_VALID)
562
140
# Fit the model1
history1=model1.fit(train_gen,
steps_per_epoch=STEP_SIZE_TRAIN,
validation_data=validation_gen,
validation_steps=STEP_SIZE_VALID,
epochs=EPOCHS,verbose=1)
The training history is below. Training was stopped at epoch 14 because no improvement was observed.
Epoch 1/25
562/562 [==============================] - 1484s 3s/step - loss: 0.9437 - accuracy: 0.7290 - val_loss: 0.8678 - val_accuracy: 0.7309
Epoch 2/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8748 - accuracy: 0.7337 - val_loss: 0.8673 - val_accuracy: 0.7309
Epoch 3/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8681 - accuracy: 0.7367 - val_loss: 0.8614 - val_accuracy: 0.7306
Epoch 4/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8619 - accuracy: 0.7333 - val_loss: 0.8592 - val_accuracy: 0.7306
Epoch 5/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8565 - accuracy: 0.7375 - val_loss: 0.8625 - val_accuracy: 0.7304
Epoch 6/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8608 - accuracy: 0.7357 - val_loss: 0.8556 - val_accuracy: 0.7310
Epoch 7/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8568 - accuracy: 0.7335 - val_loss: 0.8614 - val_accuracy: 0.7304
Epoch 8/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8541 - accuracy: 0.7349 - val_loss: 0.8591 - val_accuracy: 0.7301
Epoch 9/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8582 - accuracy: 0.7321 - val_loss: 0.8583 - val_accuracy: 0.7303
Epoch 10/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8509 - accuracy: 0.7354 - val_loss: 0.8599 - val_accuracy: 0.7311
Epoch 11/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8521 - accuracy: 0.7325 - val_loss: 0.8584 - val_accuracy: 0.7304
Epoch 12/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8422 - accuracy: 0.7352 - val_loss: 0.8481 - val_accuracy: 0.7307
Epoch 13/25
562/562 [==============================] - 1463s 3s/step - loss: 0.8511 - accuracy: 0.7345 - val_loss: 0.8477 - val_accuracy: 0.7307
Epoch 14/25
562/562 [==============================] - 1462s 3s/step - loss: 0.8314 - accuracy: 0.7387 - val_loss: 0.8528 - val_accuracy: 0.7300
Epoch 15/25
73/562 [==>...........................] - ETA: 17:12 - loss: 0.8388 - accuracy: 0.7344
Validation accuracy does not improve beyond 73% even after several epochs. In an earlier trial I tried a learning rate of 0.001, with the same result.
I would appreciate suggestions for improving the model's accuracy.
Also, how can grid search be used when an ImageDataGenerator handles the preprocessing? Suggestions for that are welcome too.
Many thanks in advance
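One quick sanity check for the 73% plateau is to compare it with the accuracy of always predicting the most frequent class, computed from the label distribution listed in the question. The sketch also shows how much a round-robin sampler that draws every class equally often (like the custom generator in the answer below) would over- or undersample each class relative to the raw data. Plain Python; the function name is hypothetical.

```python
def majority_baseline_and_balance(class_fractions):
    """class_fractions: dict mapping class label -> fraction of samples."""
    baseline = max(class_fractions.values())  # accuracy of always predicting the majority class
    share = 1.0 / len(class_fractions)        # per-class share under equal-frequency sampling
    factors = {c: share / f for c, f in class_fractions.items()}
    return baseline, factors

fractions = {0: 0.73, 2: 0.15, 1: 0.07, 3: 0.02, 4: 0.02}
baseline, factors = majority_baseline_and_balance(fractions)
print(f"majority-class baseline: {baseline:.2f}")
for c in sorted(factors):
    print(f"class {c}: sampled {factors[c]:.2f}x as often as in the raw data")
```

Because the baseline of 0.73 matches the validation accuracy in the log, the model is most likely predicting class 0 for nearly every image; equal-frequency sampling would show classes 3 and 4 about 10x as often as they appear in the raw data.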
Your problem is most likely related to the imbalance in your data: a validation accuracy of about 73% is exactly the share of class 0, which suggests the model predicts the majority class for almost every image. In addition to finding a better model, learning rate, or optimizer, you could create a custom generator that augments your data and selects it in a more balanced way.
I use custom generators for most of the models at work. I can't share the full code, but here is a simplified example of how to create one. It's actually quite fun to play around with and add more steps to. You can (and probably should) add preprocessing and postprocessing steps, but I hope this code gives you an overall idea of the process.
import random
import numpy as np

class MyCustomGenerator:
    def __init__(self, dataSet, imageHeight, imageWidth, imageChannels) -> None:
        # dataSet is a dict: each key is a class name, each value is a list of images
        # (if the data is too big, store filenames instead and load each image at runtime)
        self.dataSet = dataSet
        self.classNames = sorted(dataSet.keys())
        self.imageHeight, self.imageWidth, self.imageChannels = imageHeight, imageWidth, imageChannels

    def labelBinarizer(self, label):
        # convert a class name into a one-hot target vector Y
        y = np.zeros(len(self.classNames))
        y[self.classNames.index(label)] = 1.0
        return y

    def augment(self, image):
        # placeholder augmentation (random horizontal flip); add your own steps here
        if random.random() < 0.5:
            image = image[:, ::-1, ...]
        return image

    def yieldData(self):
        while True:  # Keras generators need to run infinitely
            # visit every class in turn, so each class is sampled equally often
            for className, data in self.dataSet.items():
                yield self.augment(random.choice(data)), self.labelBinarizer(className)

    def getEmptyBatch(self, batchSize):
        return (
            np.empty([batchSize, self.imageHeight, self.imageWidth, self.imageChannels]),
            np.empty([batchSize, len(self.classNames)]), 0)

    def getBatches(self, batchSize):
        X, Y, i = self.getEmptyBatch(batchSize)
        for image, label in self.yieldData():
            X[i, ...] = image
            Y[i, ...] = label
            i += 1
            if i == batchSize:
                yield X, Y
                X, Y, i = self.getEmptyBatch(batchSize)

# your model definition and other stuff
# ...
# with this method of defining a generator, you have to set the number of steps per epoch
generator = MyCustomGenerator(dataSet, imageHeight=256, imageWidth=256, imageChannels=3)  # dataSet: your dict of class name -> images
model.fit(
    generator.getBatches(batchSize=256),
    steps_per_epoch=500
    # other params
)

TensorFlow not running correct number of epochs with no errors

I am very much a novice at neural networks and machine learning. I am trying to learn more by using RotNet, a neural network that classifies the rotation angle of an image. I am training the network on the MNIST dataset and have changed only one line of the repo (a log directory path); other than that, it runs successfully.
Here is how I am running it based on the README:
& .../Anaconda3/envs/tflow/python.exe .../RotNet/train/train_mnist.py
and then the output:
Using TensorFlow backend.
Input shape: (28, 28, 1)
60000 train samples
10000 test samples
2020-10-16 12:18:17.031214: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 28, 28, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 26, 26, 64) 640
_________________________________________________________________
conv2d_2 (Conv2D) (None, 24, 24, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 1179776
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 360) 46440
=================================================================
Total params: 1,263,784
Trainable params: 1,263,784
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
1/468 [..............................] - ETA: 2:21 - loss: 5.8862 - angle_error: 87.1406
2020-10-16 12:18:18.337183: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
469/468 [==============================] - 61s 130ms/step - loss: 5.0338 - angle_error: 81.4492 - val_loss: 4.1144 - val_angle_error: 65.9470
Epoch 2/50
469/468 [==============================] - 61s 131ms/step - loss: 4.3072 - angle_error: 64.7485 - val_loss: 3.4630 - val_angle_error: 53.0140
Epoch 3/50
469/468 [==============================] - 63s 134ms/step - loss: 4.0303 - angle_error: 56.3245 - val_loss: 3.2241 - val_angle_error: 47.0283
Epoch 4/50
469/468 [==============================] - 63s 134ms/step - loss: 3.8824 - angle_error: 52.2043 - val_loss: 3.3227 - val_angle_error: 43.2439
Epoch 5/50
469/468 [==============================] - 63s 135ms/step - loss: 3.7982 - angle_error: 49.9996 - val_loss: 3.1930 - val_angle_error: 41.1242
Epoch 6/50
469/468 [==============================] - 73s 155ms/step - loss: 3.7288 - angle_error: 48.4027 - val_loss: 2.9600 - val_angle_error: 39.9322
Epoch 7/50
469/468 [==============================] - 63s 133ms/step - loss: 3.6781 - angle_error: 46.5616 - val_loss: 3.2243 - val_angle_error: 38.6193
Epoch 8/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6439 - angle_error: 45.2133 - val_loss: 2.8629 - val_angle_error: 38.0046
Epoch 9/50
469/468 [==============================] - 62s 132ms/step - loss: 3.6132 - angle_error: 44.7204 - val_loss: 3.0085 - val_angle_error: 37.4514
Epoch 10/50
469/468 [==============================] - 62s 132ms/step - loss: 3.5817 - angle_error: 43.8439 - val_loss: 3.0073 - val_angle_error: 35.8109
The script train_mnist.py is located here, and it specifies 50 epochs. I get no error; the program simply stops after the 8th or 10th epoch. I am at a loss for how to fix this. Any advice would be appreciated!
I took a quick look at the code. In it there is this line:
callbacks=[checkpointer, early_stopping, tensorboard]
By default, the early_stopping callback monitors the validation loss. Because it is created with patience=2, training halts as soon as the validation loss has failed to improve for 2 consecutive epochs. That is why it does not train for 50 epochs. If you want it to train for the full 50, remove early_stopping from the line above. You can confirm that early_stopping is what terminates training by changing the code in the script from
early_stopping = EarlyStopping(patience=2)
# change code to
early_stopping = EarlyStopping(patience=2, verbose=1)
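The patience rule can be replayed by hand on the val_loss column of the log in the question. The function below is a simplified re-implementation of the stopping rule (not Keras's own code), assuming lower-is-better mode for a loss and min_delta=0; the function name is hypothetical. With patience=2 it stops at epoch 10, which matches the run shown.

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None

# val_loss values copied from the training log above
losses = [4.1144, 3.4630, 3.2241, 3.3227, 3.1930, 2.9600, 3.2243, 2.8629, 3.0085, 3.0073]
print(early_stop_epoch(losses, patience=2))  # 10
```

Epochs 9 and 10 are the two consecutive epochs that fail to beat the best value of 2.8629 from epoch 8; with patience=3 the same losses would not trigger a stop within these 10 epochs.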
Judging from the training log, this model does not appear to be learning very well. I suggest trying transfer learning with MobileNet. The code below shows how to use it:
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax
# img_size, classes and lr are placeholders: define them for your own data
mobile = tf.keras.applications.mobilenet.MobileNet(include_top=False, input_shape=(img_size, img_size, 3), pooling='max', weights='imagenet', dropout=.5)
x = mobile.layers[-1].output  # the last layer of the MobileNet base: the global max pooling layer
x = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x = Dense(126, activation='relu')(x)
x = Dropout(rate=.3, seed=123)(x)
predictions = Dense(len(classes), activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
for layer in model.layers:
    layer.trainable = True  # fine-tune every layer, including the pretrained base
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
Adapt the above to your situation; it should work much better.

My validation accuracy is stuck and training accuracy decreases continuously

I am new to working with LSTM models, and my network is small. I extracted MFCC features from my audio files, flattened them, and used them as the input. However, the validation accuracy is stuck alternating between two values, and the training accuracy keeps decreasing.
I used RMSprop with a learning rate of 0.001.
I have tried changing the optimizer, adding dropout, and adding batch normalization.
The dataset is also evenly balanced.
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3460, 1) 0
_________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM) (None, 3460, 1024) 4206592
_________________________________________________________________
cu_dnnlstm_2 (CuDNNLSTM) (None, 1024) 8396800
_________________________________________________________________
dense_1 (Dense) (None, 512) 524800
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 131328
_________________________________________________________________
batch_normalization_2 (Batch (None, 256) 1024
_________________________________________________________________
dropout_2 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 257
=================================================================
Total params: 13,262,849
Trainable params: 13,261,313
Non-trainable params: 1,536
_________________________________________________________________
Train on 385 samples, validate on 165 samples
Epoch 1/10
385/385 [==============================] - 61s 160ms/step - loss: 1.0811 - accuracy: 0.5143 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 2/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7536 - accuracy: 0.5169 - val_loss: 0.6980 - val_accuracy: 0.4727
Epoch 3/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7484 - accuracy: 0.5039 - val_loss: 0.7002 - val_accuracy: 0.4727
Epoch 4/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7333 - accuracy: 0.5091 - val_loss: 0.7030 - val_accuracy: 0.5273
Epoch 5/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7486 - accuracy: 0.4675 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 6/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7222 - accuracy: 0.4935 - val_loss: 0.6917 - val_accuracy: 0.5273
Epoch 7/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7208 - accuracy: 0.4883 - val_loss: 0.6919 - val_accuracy: 0.5273
Epoch 8/10
385/385 [==============================] - 55s 142ms/step - loss: 0.7134 - accuracy: 0.4805 - val_loss: 0.6919 - val_accuracy: 0.5273
Epoch 9/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7168 - accuracy: 0.4987 - val_loss: 0.6927 - val_accuracy: 0.5273
Epoch 10/10
385/385 [==============================] - 55s 143ms/step - loss: 0.7089 - accuracy: 0.4909 - val_loss: 0.6926 - val_accuracy: 0.5273
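The two values that val_accuracy alternates between can be checked with a little arithmetic: 385 + 165 = 550 samples with validation_split=0.3 gives 165 validation samples, and 0.5273 × 165 ≈ 87 while 0.4727 × 165 ≈ 78. That is consistent with the model predicting a single class for every validation sample and flipping which class between epochs. The 87/78 class split below is inferred from the log, not known, and the function name is hypothetical.

```python
def constant_prediction_accuracies(labels):
    """Accuracy of predicting all 1s and of predicting all 0s for binary labels."""
    n = len(labels)
    ones = sum(1 for y in labels if y == 1)
    return ones / n, (n - ones) / n

# hypothetical validation split inferred from the log: 87 samples of one class, 78 of the other
val_labels = [1] * 87 + [0] * 78
all_ones, all_zeros = constant_prediction_accuracies(val_labels)
print(round(all_ones, 4), round(all_zeros, 4))  # 0.5273 0.4727
```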
Here is my code:
from keras.models import Model
from keras.layers import Input, Conv1D, AveragePooling1D, Flatten, Reshape, CuDNNLSTM, Dense, Dropout
from keras.optimizers import RMSprop
from keras.callbacks import ReduceLROnPlateau

def build_model():
    inp = Input((20*173, 1))  # flattened MFCC features, one value per step
    x = Conv1D(filters=16, kernel_size=4, activation='relu')(inp)
    x = AveragePooling1D(pool_size=2)(x)
    x = Conv1D(filters=16, kernel_size=3, activation='relu')(x)
    x = AveragePooling1D(pool_size=2)(x)
    x = Flatten()(x)
    x = Reshape((13808, 1))(x)  # 863 steps x 16 filters = 13808 values, fed as 13808 steps of 1 feature
    x = CuDNNLSTM(1024, return_sequences=True)(x)
    x = CuDNNLSTM(512)(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inp, outputs=x)
    return model

# min_lr must be below the initial learning rate (0.0001), otherwise the LR is never reduced
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.2, patience=3, min_lr=1e-6)
opt = RMSprop(lr=0.0001)
m2 = build_model()
m2.compile(loss="binary_crossentropy", metrics=['accuracy'], optimizer=opt)
m2.fit(X, y, batch_size=16, epochs=10, validation_split=0.3, callbacks=[reduce_lr])

Cannot install Keras MXNet without CUDA support on a machine with GPUs

I am explicitly trying to install a version of MXNet WITHOUT CUDA support. When installing with CUDA support, I could run this example here. I am following the Keras & MXNet installation guide here.
Steps to reproduce successful CUDA-enabled keras-mxnet:
Here is my CUDA setup, from nvcc --version:
~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Make sure you don't have mxnet installed.
pip install mxnet-cu80
pip install keras-mxnet
Running the code in Jupyter gives me:
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 512) 401920
_________________________________________________________________
activation_1 (Activation) (None, 512) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
activation_2 (Activation) (None, 512) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 10) 5130
_________________________________________________________________
activation_3 (Activation) (None, 10) 0
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
6400/60000 [==>...........................] - ETA: 39s - loss: 2.1718 - acc: 0.2587
/usr/local/lib/python3.6/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
force_init=force_init)
60000/60000 [==============================] - 6s 103us/step - loss: 1.2105 - acc: 0.6957 - val_loss: 0.5334 - val_acc: 0.8728
Epoch 2/20
60000/60000 [==============================] - 2s 27us/step - loss: 0.5280 - acc: 0.8515 - val_loss: 0.3749 - val_acc: 0.8996
Epoch 3/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.4239 - acc: 0.8786 - val_loss: 0.3213 - val_acc: 0.9098
Epoch 4/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.3740 - acc: 0.8911 - val_loss: 0.2923 - val_acc: 0.9162
Epoch 5/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.3437 - acc: 0.9008 - val_loss: 0.2704 - val_acc: 0.9218
Epoch 6/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.3195 - acc: 0.9079 - val_loss: 0.2539 - val_acc: 0.9263
Epoch 7/20
60000/60000 [==============================] - 2s 29us/step - loss: 0.2965 - acc: 0.9151 - val_loss: 0.2393 - val_acc: 0.9312
Epoch 8/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2792 - acc: 0.9190 - val_loss: 0.2264 - val_acc: 0.9342
Epoch 9/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2641 - acc: 0.9239 - val_loss: 0.2173 - val_acc: 0.9363
Epoch 10/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2520 - acc: 0.9277 - val_loss: 0.2064 - val_acc: 0.9413
Epoch 11/20
60000/60000 [==============================] - 2s 29us/step - loss: 0.2409 - acc: 0.9306 - val_loss: 0.1983 - val_acc: 0.9425
Epoch 12/20
60000/60000 [==============================] - 2s 30us/step - loss: 0.2307 - acc: 0.9331 - val_loss: 0.1894 - val_acc: 0.9447
Epoch 13/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2209 - acc: 0.9362 - val_loss: 0.1813 - val_acc: 0.9463
Epoch 14/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2106 - acc: 0.9396 - val_loss: 0.1756 - val_acc: 0.9478
Epoch 15/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.2044 - acc: 0.9410 - val_loss: 0.1687 - val_acc: 0.9501
Epoch 16/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.1963 - acc: 0.9424 - val_loss: 0.1625 - val_acc: 0.9528
Epoch 17/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.1912 - acc: 0.9436 - val_loss: 0.1576 - val_acc: 0.9542
Epoch 18/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.1842 - acc: 0.9472 - val_loss: 0.1544 - val_acc: 0.9541
Epoch 19/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.1782 - acc: 0.9482 - val_loss: 0.1490 - val_acc: 0.9553
Epoch 20/20
60000/60000 [==============================] - 2s 28us/step - loss: 0.1729 - acc: 0.9494 - val_loss: 0.1447 - val_acc: 0.9570
Test score: 0.144698123593
Test accuracy: 0.957
Steps to reproduce unsuccessful CPU-only keras-mxnet:
Do the same as before but instead of installing mxnet-cu80, install mxnet:
pip uninstall mxnet-cu80
pip install mxnet
Running the same code in a Jupyter notebook now gives:
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_4 (Dense) (None, 512) 401920
_________________________________________________________________
activation_4 (Activation) (None, 512) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
dense_5 (Dense) (None, 512) 262656
_________________________________________________________________
activation_5 (Activation) (None, 512) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 512) 0
_________________________________________________________________
dense_6 (Dense) (None, 10) 5130
_________________________________________________________________
activation_6 (Activation) (None, 10) 0
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/mxnet/symbol/symbol.py in simple_bind(self, ctx, grad_req, type_dict, stype_dict, group2ctx, shared_arg_names, shared_exec, shared_buffer, **kwargs)
1512 shared_exec_handle,
-> 1513 ctypes.byref(exe_handle)))
1514 except MXNetError as e:
/usr/local/lib/python3.6/dist-packages/mxnet/base.py in check_call(ret)
148 if ret != 0:
--> 149 raise MXNetError(py_str(_LIB.MXGetLastError()))
150
MXNetError: [04:19:54] src/storage/storage.cc:123: Compile with USE_CUDA=1 to enable GPU usage
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1c05f2) [0x7f737ac845f2]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1c0bd8) [0x7f737ac84bd8]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d7d3cd) [0x7f737d8413cd]
[bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d8141d) [0x7f737d84541d]
[bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d83206) [0x7f737d847206]
[bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27a2831) [0x7f737d266831]
[bt] (6) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27a2984) [0x7f737d266984]
[bt] (7) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27aecec) [0x7f737d272cec]
[bt] (8) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27b55f8) [0x7f737d2795f8]
[bt] (9) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27c163a) [0x7f737d28563a]
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-4-c71d8965f0f3> in <module>()
49 history = model.fit(X_train, Y_train,
50 batch_size=batch_size, epochs=nb_epoch,
---> 51 verbose=1, validation_data=(X_test, Y_test))
52 score = model.evaluate(X_test, Y_test, verbose=0)
53 print('Test score:', score[0])
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
1042 initial_epoch=initial_epoch,
1043 steps_per_epoch=steps_per_epoch,
-> 1044 validation_steps=validation_steps)
1045
1046 def evaluate(self, x=None, y=None,
/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
197 ins_batch[i] = ins_batch[i].toarray()
198
--> 199 outs = f(ins_batch)
200 if not isinstance(outs, list):
201 outs = [outs]
/usr/local/lib/python3.6/dist-packages/keras/backend/mxnet_backend.py in train_function(inputs)
4794 def train_function(inputs):
4795 self._check_trainable_weights_consistency()
-> 4796 data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'train')
4797
4798 batch = mx.io.DataBatch(data=data, label=label, bucket_key='train',
/usr/local/lib/python3.6/dist-packages/keras/backend/mxnet_backend.py in _adjust_module(self, inputs, phase)
4746 self._set_weights()
4747 else:
-> 4748 self._module.bind(data_shapes=data_shapes, label_shapes=None, for_training=True)
4749 self._set_weights()
4750 self._module.init_optimizer(kvstore=self._kvstore, optimizer=self.optimizer)
/usr/local/lib/python3.6/dist-packages/mxnet/module/bucketing_module.py in bind(self, data_shapes, label_shapes, for_training, inputs_need_grad, force_rebind, shared_module, grad_req)
341 compression_params=self._compression_params)
342 module.bind(data_shapes, label_shapes, for_training, inputs_need_grad,
--> 343 force_rebind=False, shared_module=None, grad_req=grad_req)
344 self._curr_module = module
345 self._curr_bucket_key = self._default_bucket_key
/usr/local/lib/python3.6/dist-packages/mxnet/module/module.py in bind(self, data_shapes, label_shapes, for_training, inputs_need_grad, force_rebind, shared_module, grad_req)
428 fixed_param_names=self._fixed_param_names,
429 grad_req=grad_req, group2ctxs=self._group2ctxs,
--> 430 state_names=self._state_names)
431 self._total_exec_bytes = self._exec_group._total_exec_bytes
432 if shared_module is not None:
/usr/local/lib/python3.6/dist-packages/mxnet/module/executor_group.py in __init__(self, symbol, contexts, workload, data_shapes, label_shapes, param_names, for_training, inputs_need_grad, shared_group, logger, fixed_param_names, grad_req, state_names, group2ctxs)
263 self.num_outputs = len(self.symbol.list_outputs())
264
--> 265 self.bind_exec(data_shapes, label_shapes, shared_group)
266
267 def decide_slices(self, data_shapes):
/usr/local/lib/python3.6/dist-packages/mxnet/module/executor_group.py in bind_exec(self, data_shapes, label_shapes, shared_group, reshape)
359 else:
360 self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
--> 361 shared_group))
362
363 self.data_shapes = data_shapes
/usr/local/lib/python3.6/dist-packages/mxnet/module/executor_group.py in _bind_ith_exec(self, i, data_shapes, label_shapes, shared_group)
637 type_dict=input_types, shared_arg_names=self.param_names,
638 shared_exec=shared_exec, group2ctx=group2ctx,
--> 639 shared_buffer=shared_data_arrays, **input_shapes)
640 self._total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
641 return executor
/usr/local/lib/python3.6/dist-packages/mxnet/symbol/symbol.py in simple_bind(self, ctx, grad_req, type_dict, stype_dict, group2ctx, shared_arg_names, shared_exec, shared_buffer, **kwargs)
1517 error_msg += "%s: %s\n" % (k, v)
1518 error_msg += "%s" % e
-> 1519 raise RuntimeError(error_msg)
1520
1521 # update shared_buffer
RuntimeError: simple_bind error. Arguments:
/dense_4_input1: (128, 784)
[04:19:54] src/storage/storage.cc:123: Compile with USE_CUDA=1 to enable GPU usage
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1c05f2) [0x7f737ac845f2]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x1c0bd8) [0x7f737ac84bd8]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d7d3cd) [0x7f737d8413cd]
[bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d8141d) [0x7f737d84541d]
[bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x2d83206) [0x7f737d847206]
[bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27a2831) [0x7f737d266831]
[bt] (6) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27a2984) [0x7f737d266984]
[bt] (7) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27aecec) [0x7f737d272cec]
[bt] (8) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27b55f8) [0x7f737d2795f8]
[bt] (9) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x27c163a) [0x7f737d28563a]
What exactly is this error saying? How can I fix this?
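This happens because keras-mxnet's model.compile chooses the CPU or GPU context based on whether a GPU is present on the machine. It looks like it does not check whether the installed MXNet build actually has GPU support, so the CPU-only mxnet package tries to allocate on the GPU and fails with "Compile with USE_CUDA=1 to enable GPU usage". You can force model.compile to use the CPU by specifying the context explicitly. Example: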
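from keras.optimizers import RMSprop

# Force the MXNet backend to bind on the CPU, even if a GPU is detected
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'],
              context=["cpu()"])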
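The same reasoning can be turned into an explicit check: use a GPU context only when the machine has a GPU and the installed MXNet build was compiled with CUDA. This is an illustrative sketch, not part of keras-mxnet. The helper names are hypothetical, mx.runtime.Features assumes MXNet 1.5 or newer (older builds, or a missing install, conservatively fall back to CPU), and treating nvidia-smi on the PATH as proof of GPU hardware is only a heuristic.

```python
import shutil


def mxnet_has_cuda():
    """Return True only if the installed MXNet build reports CUDA support.

    Assumes MXNet >= 1.5 (mxnet.runtime.Features). If MXNet is missing,
    too old, or fails to load, conservatively return False.
    """
    try:
        import mxnet as mx
        return bool(mx.runtime.Features().is_enabled('CUDA'))
    except (ImportError, AttributeError, RuntimeError, OSError):
        return False


def gpu_hardware_present():
    """Heuristic (assumption): nvidia-smi on PATH means an NVIDIA GPU exists."""
    return shutil.which('nvidia-smi') is not None


def choose_context(gpu_present, cuda_build):
    """Return a keras-mxnet context list; use the GPU only if BOTH hold."""
    if gpu_present and cuda_build:
        return ['gpu(0)']
    return ['cpu()']


if __name__ == '__main__':
    ctx = choose_context(gpu_hardware_present(), mxnet_has_cuda())
    print(ctx)
    # model.compile(..., context=ctx)
```

Passing the returned list as context= to model.compile keeps the decision in one place. With the CPU-only mxnet package from the question, mxnet_has_cuda() should return False, so the result is ['cpu()'] even on a machine that has a GPU.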
