How to improve the accuracy of an autoencoder? - Python

I have an autoencoder and I have checked its accuracy with different changes, such as increasing the number of conv layers, adding or removing Batch Normalization, and changing the activation function, but the accuracy is roughly the same for all of them (0.8156) and shows no improvement, which is strange. I expected the accuracy for these different setups to differ. Could you please help me figure out what the problem is? I also trained it for 10,000 epochs, but the output is the same as for 50 epochs. Is my code wrong, or can the model simply not do better?
[the accuracy graph]
I am also not sure whether the learning rate decay is working or not.
Here is my code:
from keras.layers import Input, Concatenate, GaussianNoise,Dropout,BatchNormalization
from keras.layers import Conv2D
from keras.models import Model
from keras.datasets import mnist,cifar10
from keras.callbacks import TensorBoard
from keras import backend as K
from keras import layers
import matplotlib.pyplot as plt
import tensorflow as tf
import keras as Kr
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import EarlyStopping
import numpy as np
import pylab as pl
import matplotlib.cm as cm
import keract
from matplotlib import pyplot
from keras import optimizers
from keras import regularizers
from tensorflow.python.keras.layers import Lambda
image = Input((28, 28, 1))
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
conv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2e')(conv1)
conv3 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl3e')(conv2)
#conv3 = Conv2D(8, (3, 3), activation='relu', padding='same', name='convl3e', kernel_initializer='Orthogonal',bias_initializer='glorot_uniform')(conv2)
BN=BatchNormalization()(conv3)
#DrO1=Dropout(0.25,name='Dro1')(conv3)
DrO1=Dropout(0.25,name='Dro1')(BN)
encoded = Conv2D(1, (3, 3), activation='elu', padding='same',name='encoded_I')(DrO1)
#-----------------------decoder------------------------------------------------
#------------------------------------------------------------------------------
deconv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1d')(encoded)
deconv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2d')(deconv1)
deconv3 = Conv2D(16, (3, 3), activation='elu',padding='same', name='convl3d')(deconv2)
BNd=BatchNormalization()(deconv3)
DrO2=Dropout(0.25,name='DrO2')(BNd)
#DrO2=Dropout(0.5,name='DrO2')(deconv3)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name='decoder_output')(DrO2)
#model=Model(inputs=[image,wtm],outputs=decoded)
#--------------------------------adding noise----------------------------------
#decoded_noise = GaussianNoise(0.5)(decoded)
watermark_extraction=Model(inputs=image,outputs=decoded)
watermark_extraction.summary()
#----------------------training the model--------------------------------------
#------------------------------------------------------------------------------
#----------------------Data preparation----------------------------------------
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#(x_train, _), (x_test, _) = cifar10.load_data()
#x_validation=x_train[1:10000,:,:]
#x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1)) # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))
#---------------------compile and train the model------------------------------
# is accuracy sensible metric for this model?
learning_rate = 0.1
decay_rate = learning_rate / 50
opt = optimizers.SGD(lr=learning_rate, momentum=0.9, decay=decay_rate, nesterov=False)
watermark_extraction.compile(optimizer=opt, loss=['mse'], metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)
#rlrp = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_delta=1E-7, verbose=1)
history = watermark_extraction.fit(x_train, x_train,
                                   epochs=50,
                                   batch_size=32,
                                   validation_data=(x_validation, x_validation),
                                   callbacks=[TensorBoard(log_dir='E:/output of tensorboard', histogram_freq=0, write_graph=False), es])
watermark_extraction.summary()
#--------------------visuallize the output layers------------------------------
#_, train_acc = watermark_extraction.evaluate(x_train, x_train)
#_, test_acc = watermark_extraction.evaluate([x_test[5000:5001],wt_expand], [x_test[5000:5001],wt_expand])
#print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
## plot loss learning curves
pyplot.subplot(211)
pyplot.title('MSE Loss', pad=-40)
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.subplot(212)
pyplot.title('Accuracy', pad=-40)
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='test')
pyplot.legend()
pyplot.show()

Since you stated that you are a beginner, I am going to build up the explanation from the bottom and tie it back to your code as much as possible.
Part 1: Autoencoders are made up of two parts, an encoder and a decoder. The encoder decreases the number of variables required to store the information, and the decoder tries to recover this information from the compressed form. (Note that autoencoders are not used in real data compression tasks, due to their lossy and data-dependent nature.)
Now, in your code you keep the padding as 'same':
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
This basically takes away the compression and expansion feature of the autoencoders, i.e. in each step you are using the same number of variables to represent the information.
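For contrast, a minimal sketch of an autoencoder with an actual bottleneck (assuming the same 28x28x1 MNIST input as your code; the layer sizes are illustrative, not a tuned architecture) could look like this:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

image = Input((28, 28, 1))

# Encoder: convolutions plus pooling shrink the representation.
x = Conv2D(16, (3, 3), activation='relu', padding='same')(image)
x = MaxPooling2D((2, 2), padding='same')(x)            # 28x28 -> 14x14
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)       # 14x14 -> 7x7 bottleneck

# Decoder: upsampling grows the representation back to the input size.
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)                              # 7x7 -> 14x14
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)                              # 14x14 -> 28x28
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(image, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
The point is simply that pooling (or strided convolutions) forces the network to learn a compressed representation instead of an identity mapping.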
Part 2: Now, moving on to how you train the model:
history = watermark_extraction.fit(x_train, x_train,
                                   epochs=50,
                                   batch_size=32,
                                   validation_data=(x_validation, x_validation),
                                   callbacks=[TensorBoard(log_dir='E:/PhD/thesis/deepwatermark/journal code/autoencoder_watermark/11-2-2019/output of tensorboard', histogram_freq=0, write_graph=False), es])
From this line of code I conclude that you want to reconstruct the same image that you feed in. Since the image is stored in the same number of variables at every layer, your model just has to pass the same image through each step without changing anything, which incentivizes your model to optimize each filter parameter towards 1.
Part 3: Now comes the biggest nail in the coffin: you have implemented a dropout layer. First, you should never implement dropout in a convolutional layer. This link explains why, and it discusses various ideas that I think you should check out if you are a beginner. Now let's see why the way you have used dropout is really bad here. As already explained, the best fit for your model would be all parameters in the filters learning the value 1. What happens here is that you have forced some of those filters to turn off, which, as discussed in the article, does nothing other than decrease the intensity of your images in the next layer (since CNN filters average over all input channels); see the sketch after the quoted line below.
DrO2=Dropout(0.25,name='DrO2')(BNd)
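If you want to try this, a sketch of the encoder half of your model with the dropout layer removed (reusing your own layer names and the image input tensor from your script; whether it actually changes the accuracy would need to be tested) would be:
conv1 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl1e')(image)
conv2 = Conv2D(32, (3, 3), activation='elu', padding='same', name='convl2e')(conv1)
conv3 = Conv2D(16, (3, 3), activation='elu', padding='same', name='convl3e')(conv2)
BN = BatchNormalization()(conv3)
# no Dropout here: every filter keeps contributing to the reconstruction
encoded = Conv2D(1, (3, 3), activation='elu', padding='same', name='encoded_I')(BN)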
Part 4: This is just a bit of advice, not the source of any problem.
BNd=BatchNormalization()(deconv3)
Here you have tried to normalize the data over the batch. Data normalization is extremely important in most cases; as you might know, it prevents one feature from dictating the model and gives each feature an equal say. But in image data every point is already on the same 0-255 scale, so using normalization to scale it between 0 and 1 adds no value; it just adds unnecessary computation to the model.
I would suggest that you work through this part by part, and comment below if something is unclear. Try not to make this only about autoencoders built with CNNs (they have few real applications anyway), but rather use it to understand the various intricacies of ConvNets (CNNs). The reason I have chosen to write the answer by explaining parts of your network rather than giving code is that the code you are looking for is just a Google search away. If you are intrigued by this answer and want to know how exactly CNNs work, check out https://www.youtube.com/watch?v=ArPaAX_PhIs&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF. If you have any doubts about anything in this answer or even in those videos, comment below.

Related

Why is a CNN model struggling to classify a colored MNIST?

I'm trying to classify colored MNIST digits with a basic CNN architecture in Keras. Here is the piece of code that colors each image of the original dataset purely red, green, or blue.
def load_norm_data():
    ## load basic mnist
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    train_images = np.zeros((*x_train.shape, 3))  # orig shape: (60000, 28, 28) -> rgb shape: (60000, 28, 28, 3)
    for num in range(x_train.shape[0]):
        rgb = np.random.randint(3)
        train_images[num, ..., rgb] = x_train[num] / 255
    return train_images, y_train

if __name__ == '__main__':
    ims, labels = load_norm_data()
    for num in range(10):
        plt.subplot(2, 5, num+1)
        plt.imshow(ims[num])
        plt.axis('off')
which gives, for the first couple of digits:
Then, I attempt to classify this colored dataset into the same 10 digit classes of MNIST, so the labels aren't changing, and yet the model's accuracy drops from 95% on non-colored MNIST to a wildly variable 30-70% on colored MNIST, depending heavily on weight initialization... Please find below the architecture of said model:
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Softmax())
input_shape = train_images.shape
model.build(input_shape)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(train_images, train_numbers, batch_size=12, epochs=25)
Initially, I thought that this drop in performance might be linked to data irregularity (e.g. imagine a lot of 3s in the data ended up being green, thus the model learns green = 3). So I checked the data, the counts are good and the rgb distribution for each class is near 33% for each color too. I also checked the misclassified images to see if there were many representatives of a certain color or digit, but it doesn't seem to be the case either. In any case, after reading Keras' documentation and because of the fact that Conv2D forces you to pass it a 2-dimensional kernel_size that I imagine thus operates on all channels of the input image, the model shouldn't be taking color into account for classification here.
Am I missing something here?
Please let me know if you need any further information.
Thank you in advance.
The last part of the model includes a dense -> relu -> softmax. The relu activation should be removed. In addition, you might benefit from adding non-linearities (e.g., relu) in your convolutional blocks. Otherwise, the neural network will end up being a (big) linear function and will not work as well for non-linear data.
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2)))
model.add(keras.layers.Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(keras.layers.MaxPool2D(pool_size=(2,2), padding='same'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10))
model.add(keras.layers.Softmax())
It is interesting that the original model worked well on the MNIST dataset. I cannot say for sure why, but perhaps the MNIST dataset is simple enough that the model was able to cope. Also, the relu -> softmax would clamp negative values to 0, and maybe there were not many negative values.
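To illustrate the relu -> softmax point, here is a tiny sketch with made-up logit values showing how clamping the negatives before the softmax throws away the distinction between them:
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([-2.0, -0.5, 3.0])     # hypothetical pre-softmax outputs of the final Dense layer
print(softmax(logits))                    # three distinct probabilities
print(softmax(np.maximum(logits, 0.0)))   # relu first: both negative logits collapse to 0 and get the same probability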

What is the appropriate input shape for a 2D CNN-based network?

I am having trouble passing the appropriate input shape to a CNN-based network with a Conv2D layer.
Initially, these are my train shapes. My train data is reshaped into windows:
X_train: (7,100,5185)= (number of features, window size, number of windows)
y_train= (5185, 100 ) = one labeled column that is also windowed
I then calculate some recurrence plots from this data, after which I have these shapes:
X_train_rp= (5185, 100,100, 7), 100 * 100 referring to my images
y_train = (5185, 100 ), remains unchanged
I pass these two to a conv2D-based CNN with:
model.add(layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(100, 100, 7)))
And I get this error: Data cardinality is ambiguous: x sizes: 100, 100, 100 ......... y sizes: 5185 Make sure all arrays contain the same number of samples.
I have tried many combinations of shapes, but in vain. What am I doing wrong?
EDIT:
This is the model definition I am using:
import tensorflow as tf
X_train_rp = tf.zeros((10, 100,100, 7))
y_train = tf.zeros((10, 100))
#create model
model = tf.keras.Sequential() #add model layers
model.add(tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu',
                                 data_format='channels_last', input_shape=(100, 100, 7)))
model.add(tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(2, activation='softmax'))
#compile model using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train_rp, y_train_shaped, epochs=3)
model.predict(X_train_rp)
Judging from the module aliases used, I assume you are using the TensorFlow Keras package with a sequential model definition. Your assumptions about the input shapes are actually correct, as demonstrated by this code snippet adapted from the Keras documentation:
import tensorflow as tf
input_shape = (10, 100, 100, 7)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape[1:])(x)
print(y.shape)
>>> (10, 98, 98, 64)
This means the problem lies within your sequential model definition. Please update your question and include the necessary code.
EDIT:
Using the model definition provided by the OP with a small modification yields a working training process. The issue lies in the definition of the dense layer, which takes the number of output nodes as its first positional argument, not the input dimensions.
For the sake of computational cost, I reduced the number of training examples from 5185 to 10...
import tensorflow as tf
X_train_rp = tf.zeros((10, 100,100, 7))
y_train = tf.zeros((10, 100))
#create model
model = tf.keras.Sequential() #add model layers
model.add(tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu',
                                 data_format='channels_last', input_shape=(100, 100, 7)))
model.add(tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'))
model.add(tf.keras.layers.Flatten())
# Here comes the fix:
model.add(tf.keras.layers.Dense(100, activation='softmax'))
#compile model using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train_rp, y_train, epochs=3)

Keras - Moderate Accuracy, bad predictions

I'm taking my first steps in machine learning and am trying to do a sign-language project using the Kaggle dataset. It is supposed to be able to predict characters in ASL. Here's the data presented by Kaggle.
Image of Dataset here.
My current issue is that I can achieve moderate accuracy on the data given by Kaggle using their testing data, but if I try to predict a single image, say a random letter of the alphabet, it is consistently wrong. Here's my code.
from keras.models import Sequential, load_model
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD
import numpy as np
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer
import matplotlib.pyplot as plt
trainer = read_csv("sign_mnist_train.csv")
labels = trainer["label"].values
trainer = trainer.drop(["label"], axis=1)
tester = read_csv("sign_mnist_test.csv")
testlabels = tester["label"].values
tester = tester.drop(["label"], axis=1)
def preProcessing(raw, classes):
    OH = OneHotEncoder(sparse=False)  # One-hot encodes the labels, can be replaced with LabelBinarizer
    binary = classes.reshape(len(classes), 1)
    binary = OH.fit_transform(binary)
    images = raw.values
    for c, i in enumerate(images, 0):
        image = np.reshape(i, (28, 28))
        image = image.flatten()
        images[c] = np.array(image)
    return images, binary

def defineModel():  # Builds the layers for our model
    model = Sequential()
    model.add(Conv2D(64, (3, 3), input_shape=(x_test.shape[1:]), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(y_train.shape[1], activation='softmax'))
    opt = SGD(lr=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["categorical_accuracy"])
    return model

def testModel():  # Tests a single image, predicting the class.
    model = load_model("my_model.hl5")
    img = load_img("C.jpg", color_mode="grayscale", target_size=(28, 28))
    img = img_to_array(img)
    img = np.reshape(img, (-1, 28, 28, 1))
    test = model.predict_classes(img)
    print(test)
    test_test = model.predict_proba(img)[0]
    test_test = "%.2f" % (test_test[test]*100)
    print(test_test)

if __name__ == "__main__":
    data, labels = preProcessing(trainer, labels)
    x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.33, random_state=42)
    x_train = x_train.astype('float32')
    x_train = x_train/255.0
    x_train = np.reshape(x_train, (x_train.shape[0], 28, 28, 1))
    x_test = x_test.astype('float32')
    x_test = x_test/255.0
    x_test = np.reshape(x_test, (x_test.shape[0], 28, 28, 1))
    model = defineModel()
    history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=40, verbose=1, batch_size=128)
    model.evaluate(x_test, y_test)
    model.save("my_model.hl5")
Apologies for the messy code, but essentially I break the data into usable parts using pandas and then use Keras/scikit-learn to fit the data. I wanted to look deeper and was advised to use accuracy_score from the scikit-learn library.
testStuff, testlabels = preProcessing(tester, testlabels)
testStuff = testStuff.reshape(testStuff.shape[0], 28, 28, 1)
pred = model.predict(testStuff).round()
print(accuracy_score(testlabels, pred))
This showed that my accuracy was only around 70%, compared to the 99% that model.evaluate reported. Regardless, I still get very poor accuracy on random predictions, even though some of my individual test images were snipped straight from the Kaggle example images. From there, I tried removing layers and increasing/reducing the filters on the Conv2D layers to see what happens, but nothing seems to make a difference. I used pyplot to display the training graph, and I don't see a problematic trend, but I may be looking in the wrong area.
Is it because of overfitting/underfitting? I feel that I am getting something wrong at a fundamental level and could use some tips. Looking at similar questions, they point toward possible indexing issues and other mismanagement of the dataset, but I am unsure how to test whether these issues are present in my code. This is my first time asking a question on Stack Overflow, so feel free to ask for anything, since I understand that reading my rambling code/question is confusing.
Summary: Okay accuracy, bad predictions, why?
In general this behaviour often occurs due to overfitting:
Try to tweak your network to have fewer parameters and try to add some regularization.
Further, it could be that your test set only covers part of the intended real-world domain, meaning that your training set is far away from reality, which can also lead to bad predictions.
One way to tweak your dataset is data augmentation; I assume it could work very well on this ASL dataset, but I did not take a deep look.
Data augmentation is basically an artificial way to increase the size of your dataset. It reduces overfitting and makes the model more robust to slight rotations of your hand or other "random" distortions, like a different background or different clothing.
A great article about data augmentation can be found here:
https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9
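As a rough illustration, a minimal augmentation sketch with Keras' ImageDataGenerator could look like the following; it reuses the x_train, y_train, x_test, y_test and model variables from your script, and the parameter values are assumptions that would need tuning for the ASL images (horizontal flips are deliberately left out, since mirroring a hand can change the sign):
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical augmentation settings; tune them for the ASL images.
datagen = ImageDataGenerator(
    rotation_range=10,        # small rotations of the hand
    width_shift_range=0.1,    # slight horizontal shifts
    height_shift_range=0.1,   # slight vertical shifts
    zoom_range=0.1,           # mild zooming
)
datagen.fit(x_train)

# Train on augmented batches instead of the raw arrays.
history = model.fit_generator(
    datagen.flow(x_train, y_train, batch_size=128),
    steps_per_epoch=len(x_train) // 128,
    validation_data=(x_test, y_test),
    epochs=40,
)
Whether this actually closes the gap between evaluate and single-image predictions would still need to be verified, but it typically helps generalization on small image datasets.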

ValueError: bad input shape (2, 256, 3) when trying to compute ROC curve

First, I'm new to python. Trying to build a ROC curve, I am getting an error on this code line:
fpr_keras, tpr_keras, thresholds_keras = roc_curve(Y_test.argmax(axis=1), decoded_imgs.argmax(axis=1))
error:
ValueError: bad input shape (2, 256, 3)
When I try to shape after reshape I get a second error:
TypeError: 'tuple' object is not callable
I followed this link, but I don't understand what I should do; I'm stuck on this problem. Can somebody edit my code? This is what I'm trying to do: link2
import keras
import numpy as np
from keras.datasets import mnist
from get_dataset import get_dataset
from stack import keras_model
X_train, X_test, Y_train, Y_test = get_dataset()
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Dense
from keras.models import Model
input_img = Input(shape=(256, 256, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mae',metrics=['mse', 'accuracy'])
from keras.callbacks import ModelCheckpoint, TensorBoard
checkpoints = []
from keras.preprocessing.image import ImageDataGenerator
generated_data = ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False, samplewise_std_normalization=False, zca_whitening=False, rotation_range=0, width_shift_range=0.1, height_shift_range=0.1, horizontal_flip = True, vertical_flip = False)
generated_data.fit(X_train)
epochs = 1
batch_size = 5
autoencoder.fit_generator(generated_data.flow(X_train, X_train, batch_size=batch_size), steps_per_epoch=X_train.shape[0]/batch_size, epochs=epochs, validation_data=(X_test, X_test), callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
autoencoder.fit(X_train, X_train, batch_size=batch_size, epochs=epochs, validation_data=(X_test, X_test), shuffle=True, callbacks=[TensorBoard(log_dir='/tmp/auti')])
decoded_imgs = autoencoder.predict(X_test)
from sklearn.metrics import roc_curve
#2 256 3
print(decoded_imgs.argmax(axis=1))
print(decoded_imgs.argmax(axis=1).reshape(1,3))
fpr_keras, tpr_keras, thresholds_keras = roc_curve(Y_test.argmax(axis=1), decoded_imgs.argmax(axis=1))
ValueError: bad input shape (2, 256, 3)
After editing the line to:
fpr_keras, tpr_keras, thresholds_keras = roc_curve(Y_test.argmax(axis=1), decoded_imgs.reshape(6,256,1))
I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [2, 4]
You sound a little confused regarding the very basics of both ROC curves and autoencoders...
Quoting from the scikit-learn documentation of roc_curve:
roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
Parameters:
y_true : array, shape = [n_samples]
    True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
y_score : array, shape = [n_samples]
    Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
In other words, both inputs to roc_curve should be simple one-dimensional arrays of scalar numbers, the first one containing the true classes and the second one the predicted scores.
Now, despite the fact that you don't show a sample of your own data, and while I don't doubt that your Y_test.argmax(axis=1) may conform to this specification, most certainly your decoded_imgs.argmax(axis=1) (however you reshape it) does not. Why? Because of the very nature of an autoencoder.
In sharp contrast to models like the Random Forest classifier you also attempt to use in a (now removed) part of your code, autoencoders are not classifiers: their function is to reconstruct a (denoised, compressed etc) version of their input, and not to produce class predictions (see the nice little tutorial in the Keras blog for a quick orientation). Which, in your case, means that your decoded_imgs are actually transformed images (or image-like data, in any case), and not the class scores required by roc_curve, hence the error (which, technically speaking, is actually due to decoded_imgs not being a one-dimensional array, but hopefully you get the idea).
Even if you had used a classifier instead of an autoencoder here, you would have bumped upon another issue: ROC curves are used for binary classification tasks, and not for multi-class ones, like MNIST (there are actually some approaches applying them to multi-class data too, but they are not widely used AFAIK). It's true that, superficially, scikit-learn's roc_curve will work even in a multi-class setting:
import numpy as np
from sklearn import metrics
y = np.array([0, 1, 1, 2, 2]) # 3-class problem
scores = np.array([0.05, 0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2) # works OK, no error
but actually this happens only because we have explicitly defined that pos_label=2, hence, under the hood, scikit-learn considers all labels other than 2 as the negative ones, and subsequently treats the rest of the computations as if our problem was a binary one (i.e. class 2 vs all other classes).
In your case (MNIST), you should ask yourself the question: what exactly is a "positive" in the 10-class MNIST dataset? And does this question even make sense? Hopefully, you should be able to convince yourself that the answer is not straightforward, as in the binary (0/1) case.
To wrap up: there is no coding error here to be remedied; the root cause of your issue is simply that you are attempting something meaningless and invalid, since autoencoders do not produce class predictions, hence their output cannot be used for computing a ROC curve. I kindly suggest first getting a solid understanding of the relevant notions and concepts before proceeding to applications...

Keras BatchNorm Layer giving weird results during training and inference

I am having an issue with Keras where the evaluate function gives a different training loss (way higher) and accuracy (way lower) compared to the values that I get during training. I am aware that this question has already been asked in several places (here, here), but I think my issue is different and still not answered in those forums.
Explanation of the Task
It is supposed to be a very simple task. All I am doing is overfitting to my own dataset of 256 images (29x29x3) with 256 output classes (one for each image).
Dataset
Case 1
x_train = All the pixel values in the image = i where i goes from 0 to 255.
y_train = i
Case 2
x_train = Centre 5*5 patch of the pixel values in the image = i where i goes from 0 to 255. All the other pixel values are same for all the images.
y_train = i
This gives me 256 images in total for the training data in each case. (It will be clearer if you just have a look at the code.)
Here is my code to reproduce the issue -
from __future__ import print_function
import os
import keras
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, Activation
from keras.layers.normalization import BatchNormalization
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, Callback
from keras import backend as K
from keras.regularizers import l2
import matplotlib.pyplot as plt
import PIL.Image
import numpy as np
from IPython.display import clear_output
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="1"
# To suppress the warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
## Hyperparamters
batch_size = 256
num_classes = 256
l2_reg=0.0
epochs = 500
## input image dimensions
img_rows, img_cols = 29, 29
## Train Image (I took a random image from ImageNet)
train_img_name = 'n01871265_279.JPEG'
ret = PIL.Image.open(train_img_name) #Opening the image
ret = ret.resize((img_rows, img_cols)) #Resizing the image
img = np.asarray(ret, dtype=np.uint8).astype(np.float32) #Converting it to numpy array
print(img.shape) # (29, 29, 3)
## Creating the training data
#############################
x_train = np.zeros((256, img_rows, img_cols, 3))
y_train = np.zeros((256,), dtype=int)
for i in range(len(y_train)):
    temp_img = np.copy(img)
    ## Case1 of dataset
    # temp_img[:, :, :] = i  # changing all the pixel values
    ## Case2 of dataset
    temp_img[12:16, 12:16, :] = i  # changing the centre block of 5*5 pixels
    x_train[i, :, :, :] = temp_img
    y_train[i] = i
##############################
## Common stuff in Keras
if K.image_data_format() == 'channels_first':
    print('Channels First')
    x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
    input_shape = (3, img_rows, img_cols)
else:
    print('Channels Last')
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
    input_shape = (img_rows, img_cols, 3)
## Normalizing the pixel values
x_train = x_train.astype('float32')
x_train /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
## convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
## Model definition
def model_toy(mom):
    model = Sequential()
    model.add(Conv2D(filters=64, kernel_size=(7, 7), strides=(1, 1), input_shape=input_shape, kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    # Default parameters kept same as PyTorch
    # Meaning of PyTorch momentum is different from Keras momentum.
    # PyTorch mom = 0.1 is same as Keras mom = 0.9
    model.add(Conv2D(filters=128, kernel_size=(7, 7), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    model.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    model.add(Conv2D(filters=512, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    model.add(Conv2D(filters=1024, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    model.add(Conv2D(filters=2048, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    model.add(Conv2D(filters=4096, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    # Passing it to a dense layer
    model.add(Flatten())
    model.add(Dense(1024, kernel_regularizer=l2(l2_reg)))
    model.add(Activation('relu'))
    model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
    # Output Layer
    model.add(Dense(num_classes, kernel_regularizer=l2(l2_reg)))
    model.add(Activation('softmax'))
    return model
mom = 0.9 #0
model = model_toy(mom)
model.summary()
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(lr=0.001),
              # optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=True),
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    shuffle=True,
                    )
print('Training results')
print('-------------------------------------------')
score = model.evaluate(x_train, y_train, verbose=1)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
print('-------------------------------------------')
Small note - I was able to do this task successfully in PyTorch. It is just that my actual task requires me to have a Keras model. That's why I have changed the default values of the BatchNorm layer (the root cause of the issue) to match the ones I used to train the PyTorch model.
Here is the image that I used in my code.
Here are the results of training.
Case1 of the dataset
Case2 of the dataset
If you look at these two files, you will be able to notice the discrepancy between the training loss during training and during inference.
(I have set my batch size equal to the size of my training data so as to avoid some of the reasons BatchNorm generally creates problems, as mentioned here.)
Next, I looked at the Keras source code to see if there is any way I can make the BatchNorm layer use the batch statistics instead of the running mean and variance.
Here is the update formula that Keras (backend - TF) uses to update the running mean and variance.
#running_stat -= (1 - momentum) * (running_stat - batch_stat)
So if I set the momentum value to 0, the value assigned to running_stat will always be equal to batch_stat during the training phase. Thus, the value used in inference mode will also be the same as (close to) the batch/dataset statistics.
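A quick numeric sketch of that update rule (with made-up numbers) shows why momentum = 0 makes the running statistic track the batch statistic exactly:
momentum = 0.0       # the value used in the experiment below
running_mean = 0.0   # arbitrary starting value
batch_mean = 5.0     # hypothetical batch statistic

# Keras (TF backend) update rule: running_stat -= (1 - momentum) * (running_stat - batch_stat)
running_mean -= (1 - momentum) * (running_mean - batch_mean)
print(running_mean)  # 5.0 -> with momentum = 0 the running stat equals the batch stat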
Here are the results for this little experiment with the same issue still occurring.
Case1 of the dataset
Case2 of the dataset
Programming Environment - Python-3.5.2, tensorflow-1.10.0, keras-2.2.4
I tried the same thing with tensorflow-1.12.0, keras-2.2.2 as well but it still did not solve the issue.
