Losses of NAN, Accuracy of 0, I tried everything! (CNN) - python

I'm trying to create a NN which can recognize the letters of the alphabet (26 classes). I apologize for the lengthy post, but I've included all my relevant code to be as clear as possible. In the end I've explained the issue.
The following block is where I name the paths correctly, and standardize/normalize the image and get it ready for training.
X = [] # Image data
y = [] # Labels
datagen = ImageDataGenerator(samplewise_center=True)
for path in imagepaths:
img = cv2.imread(path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.resize(img, (200, 200))
img = image.img_to_array(img)
img = datagen.standardize(img)
X.append(img)
# Processing label in image path
category = path.split("\\")[1]
#print(category)
split = (category.split("_"))
if int(split[0]) == 0:
label = int(split[1])
else:
label = int(split[0])
y.append(label)
# Turn X and y into np.array to speed up train_test_split
X = np.array(X, dtype="uint8")
X = X.reshape(len(imagepaths), 200, 200, 1)
y = np.array(y)
Creating the test set.
ts = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=ts, random_state=42)
Creating a model. Dense of 26, one output for each letter, and size 200,200,1 to match input:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))
Model compiler and fit:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=1, validation_data=(X_test, y_test))
The issue comes here where the output of my model.fit is:
Train on 54600 samples, validate on 23400 samples
Epoch 1/1
54600/54600 [==============================] - 79s 1ms/step - loss: nan - accuracy: 1.8315e-05 - val_loss: nan - val_accuracy: 0.0000e+00
I understand that it may not be high accuracy or anything from the get-go, but why are the losses nan? I posted elsewhere and I was first told to normalize my data (which I fixed for this post). Then I was told that it was possible that my dataset is corrupt or leaking - this is not the case because when I don't do 26 letters at once, it works perfectly fine. (Meaning I tested the code, using letters A-J, dense = 10, etc) and got a high accuracy of about 95%.
Any help is appreciated, I've been scratching my had at this for hours!

Related

Model get 97% accuracy on train and validation but when use custom predict it get wrong

I use a CNN model to train image classification , it got great accuracy at test and validation (98% and 97%), but when use my image to predict it alway go wrong, here is my code:
BATCH_SIZE = 30
IMG_HEIGHT = 256
IMG_WIDTH = 256
STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)
train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
batch_size=BATCH_SIZE,
shuffle=True,
target_size=(IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES))
here is prepare for dataset and data argumentation:
imgDataGen=ImageDataGenerator(
validation_split=0.2,
rescale=1/255,
horizontal_flip=True,
zoom_range=0.3,
rotation_range=15.,
width_shift_range=0.1,
height_shift_range=0.1,
)
prepare data:
train_dataset = imgDataGen.flow_from_directory(
directory=str(data_dir),
target_size = (IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_dataset = imgDataGen.flow_from_directory(
directory=str(data_dir),
target_size = (IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES),
batch_size =BATCH_SIZE,
subset = 'validation'
)
the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(256, 256, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(6, activation='sigmoid'))
complie:
model.compile(loss='binary_crossentropy',
optimizer=keras.optimizers.SGD(learning_rate=0.001,momentum=0.9),
metrics=['acc'])
train
history = model.fit_generator(
train_dataset,
validation_data = val_dataset,
workers=10,
epochs=20,
)
It get pretty high accuracy 98% on test and 97% on validation
but when i try with my code to predict
def prepare(filepath):
IMG_SIZE=256
img_array=cv2.imread(filepath)
new_array= cv2.resize(img_array,(IMG_SIZE,IMG_SIZE))
return new_array.reshape(1,IMG_SIZE,IMG_SIZE,3)
model=tf.keras.models.load_model('trained-model.h5',compile=False)
#np.set_printoptions(formatter={'float_kind':'{:f}'.format})
predict=model.predict([prepare('cat.jpg')])
pred_name = CATEGORIES[np.argmax(predict)]
print(pred_name)
it got wrong, with cat image it go for dog and dog for cat, but sometime it go right, just i think 98% is more accurate than this, if i try 5 image of cats it fail 3 or 4 images
so it because dataset or because of code?
please help, thanks
So in your second code-block you have this:
rescale=1/255
This is for normalizing your image into the range [0;1]. So every image gets rescaled (/normalized) before going through the network. But in you las code-block where you test it on an image you didnt add normalization. Try adding that to your "prepare" function:
def prepare(filepath):
IMG_SIZE = 256
img_array = cv2.imread(filepath)
# add this:
img_array = image_array / 255
new_array = cv2.resize(img_array,(IMG_SIZE,IMG_SIZE))
return new_array.reshape(1,IMG_SIZE,IMG_SIZE,3)

Unable to preduction right values from my Regression Model

Predicting head poses to get output in the form of roll pitch yawn by using the model in the image, but the result is not accurate. Dataset has a total of 5500 images
What values should be:
Roll: 0.67°
Pitch: -4.89°
Yaw: 22.57°
Values from my model:
Roll: 356.10°
Pitch: 1036.82°
Yaw: 532.35°
`
#opening the dataset
a,b,c,d = pkl.load(open('samples.pkl', 'rb'))
#concatenating the 2 arrays
x = np.concatenate((a,b), axis=0)
y = np.concatenate((c,d), axis=0)
#Assigning degrees to roll pitch and yaw
roll, pitch, yaw = y[:, 0], y[:, 1], y[:, 2]
#test and train division
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
x_val, x_test, y_val, y_test = train_test_split(x_test, y_test, test_size=0.2, random_state=2)
print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)
print(x_test.shape, y_test.shape)
#standazing the test and train sets
std = StandardScaler()
std.fit(x_train)
x_train = std.transform(x_train)
x_val = std.transform(x_val)
x_test = std.transform(x_test)
BATCH_SIZE = 64
EPOCHS = 100
#model of cnn
model = Sequential()
model.add(Dense(units=20, activation='relu', kernel_regularizer='l2', input_dim=x.shape[1]))
model.add(Dense(units=10, activation='relu', kernel_regularizer='l2'))
model.add(Dense(units=3, activation='linear'))
print(model.summary())
#compiling the model
callback_list = [EarlyStopping(monitor='val_loss', patience=25)]
model.compile(optimizer='adam', loss='mean_squared_error',metrics=['accuracy'])
hist = model.fit(x=x_train, y=y_train, validation_data=(x_val, y_val), batch_size=BATCH_SIZE, epochs=EPOCHS, callbacks=callback_list)
model.save('model.h5')
`
First of all predicting head poses from images is not a basic regression problem, it is, in fact, a computer vision problem that needs some computer vision-based approach.
Your model is a dummy MLP that will never perform well with the given problem.
I'm giving you some pointers, you still need to figure out which one works based on experimentation.
Instead of using an MLP, start with a ConvNet. Here's a simple CNN for regression.
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(3))
Get rid of the Flatten, and use Global Pooling if the model overfits.
Try to normalize your output. Instead of blindly regressing, range the roll, pitch to 0 - 1 using min-max normalization. Use sigmoid in the last layer in this case.
Here's a paper to get the basic idea and start with a baseline model: https://aip.scitation.org/doi/pdf/10.1063/1.4982509

CNN - Detecting Handwritten Smilies: ValueError: could not broadcast input array from shape (26,26,3) into shape (26)

In my root folder images I have three folders called 0, 1, 2. In folder 0 there are no smilies. In folder 1 there are happy handwritten smilies and in folder 2 there are sad handwritten smilies.
The images are jpg color images with the dimension 26x26.
This is my code
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
import numpy as np
import os
import cv2
from sklearn.model_selection import train_test_split
def getImages(path, classes):
folder = os.listdir(path)
classes_counter = 0
images = []
images_classes = []
for x in range (0,len(folder)):
myPicList = os.listdir(path+"/"+ str(classes[classes_counter]))
for pic in myPicList:
img_path = path+"/" + str(classes[classes_counter]) + "/" + pic
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
images.append(img)
images_classes.append(classes_counter)
classes_counter +=1
images = np.array(images, dtype="float") / 255
return images, images_classes
def createModel(classes, images_dimension):
classes_amount = len(np.unique(classes))
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu', input_shape=images_dimension))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes_amount, activation='softmax'))
return model
labels = [0,1,2]
images, images_classes = getImages('training-images', labels)
images_dimension=(26,26,3)
X_train, X_test, Y_train, Y_test = train_test_split(images, images_classes, test_size=0.2) # if 1000 images split will 200 for testing
X_train, X_validation, Y_train, Y_validation = train_test_split(X_train, Y_train, test_size=0.2) # if 1000 images 20% of remaining 800 will be 160 for validation
model = createModel(labels, images_dimension)
batch_size = 20
epochs = 100
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_validation, Y_validation))
model.evaluate(X_test,Y_test,verbose=0)
In the Line images = np.array(images, dtype="float") / 255 I get the error:
Traceback (most recent call last):
File "train-nn.py", line 54, in <module>
images, images_classes = getImages('training-images', labels)
File "train-nn.py", line 24, in getImages
images = np.array(images, dtype="float") / 255
ValueError: could not broadcast input array from shape (26,26) into shape (26)
I think something is wrong with the datastructure or with the array structure. I have no clue what I did wrong. Maybe someone known this problem and can give me a hint!
Here you can download the whole project as zip file.
http://fileshare.mynotiz.de/cnn-handwritten-smilies.zip
I found the problem. One of my test-data images was not in the format of 26x26 it was 26x23.

Is there a python way for reducing the training time of convolution neural network?

I'm building a keras model of convolution neural network for predicting the correct class and classify the tested objects. the model have the conv2D, activation, maxpooling, dropout, flatten, dense layers. after that I training the network on large dataset, but it take a very long time for training, it may reach to 3,4 days, What I need is to reduce the time required to training the network, Is there any way to do that in python?
I have tried to optimize the learning rate by using the LR_Finder class as follow:
from LR_Finder import LRFinder
lr_finder = LRFinder(min_lr=1e-5,max_lr=1e-2, steps_per_epoch=np.ceil(len(trainX) // BS), epochs=100)
But this also did not give me any reduction about the time required.
This is the code of my model:
class SmallerVGGNet:
#staticmethod
def build(width, height, depth, classes):
# initialize the model along with the input shape to be
# "channels last" and the channels dimension itself
model = Sequential()
inputShape = (height, width, depth)
chanDim = -1
# if we are using "channels first", update the input shape
# and channels dimension
if K.image_data_format() == "channels_first":
inputShape = (depth, height, width)
chanDim = 1
# CONV => RELU => POOL
model.add(Conv2D(32, (3, 3), padding="same",
input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
# return the constructed network architecture
return model
and after that I trained the model as following code:
EPOCHS = 100
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images("Dataset")))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
# load the image, pre-process it, and store it in the data list
image = cv2.imread(imagePath)
image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
image = img_to_array(image)
data.append(image)
label = imagePath.split(os.path.sep)[-2]
labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {:.2f}MB".format(data.nbytes / (1024 * 1000.0)))
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
labels, test_size=0.2, random_state=42)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
horizontal_flip=True, fill_mode="nearest")
# initialize the model
model = SmallerVGGNet.build(width=IMAGE_DIMS[1], height=IMAGE_DIMS[0],
depth=IMAGE_DIMS[2], classes=len(lb.classes_))
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer= opt,
metrics=["accuracy"])
print("model compiled in few minutes successfully ^_^")
# train the network
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS, verbose=1)
According to this code,I expected the output required some minutes or may be a few hours, but when it reach to training in model.fit_generator step, the actual time required is about many hours for every epoch and it requires some days to train all the network or it may be crash and stop working. Is there any way to reduce the training time?
set use_multiprocessing=True and workers>1 when you call fit_generator because the default is to execute the generator on the main thread only

Keras BatchNorm Layer giving weird results during training and inference

I am having an issue with Keras where evaluate function gives different training loss (way higher) and accuracy(way lower) value as compared to the value that I get during training. I am aware that this question has already been asked at several places (here, here), but I think my issue is different and still not answered in those forums.
Explanation of the Task
It is supposed to be a very simple task. All I am doing is to overfit to my own dataset of 256 images (29x29x3) with 256 output classes (one for each image).
Dataset
Case 1
x_train = All the pixel values in the image = i where i goes from 0 to 255.
y_train = i
Case 2
x_train = Centre 5*5 patch of the pixel values in the image = i where i goes from 0 to 255. All the other pixel values are same for all the images.
y_train = i
This gives me 256 images in total for the training data in each case. (It would be more clear if you just have a look at the code)
Here is my code to reproduce the issue -
from __future__ import print_function
import os
import keras
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, Activation
from keras.layers.normalization import BatchNormalization
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, Callback
from keras import backend as K
from keras.regularizers import l2
import matplotlib.pyplot as plt
import PIL.Image
import numpy as np
from IPython.display import clear_output
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="1"
# To suppress the warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
## Hyperparamters
batch_size = 256
num_classes = 256
l2_reg=0.0
epochs = 500
## input image dimensions
img_rows, img_cols = 29, 29
## Train Image (I took a random image from ImageNet)
train_img_name = 'n01871265_279.JPEG'
ret = PIL.Image.open(train_img_name) #Opening the image
ret = ret.resize((img_rows, img_cols)) #Resizing the image
img = np.asarray(ret, dtype=np.uint8).astype(np.float32) #Converting it to numpy array
print(img.shape) # (29, 29, 3)
## Creating the training data
#############################
x_train = np.zeros((256, img_rows, img_cols, 3))
y_train = np.zeros((256,), dtype=int)
for i in range(len(y_train)):
temp_img = np.copy(img)
## Case1 of dataset
# temp_img[:, :, :] = i # changing all the pixel values
## Case2 of dataset
temp_img[12:16, 12:16, :] = i # changing the centre block of 5*5 pixels
x_train[i, :, :, :] = temp_img
y_train[i] = i
##############################
## Common stuff in Keras
if K.image_data_format() == 'channels_first':
print('Channels First')
x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
input_shape = (3, img_rows, img_cols)
else:
print('Channels Last')
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
input_shape = (img_rows, img_cols, 3)
## Normalizing the pixel values
x_train = x_train.astype('float32')
x_train /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
## convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
## Model definition
def model_toy(mom):
model = Sequential()
model.add( Conv2D(filters=64, kernel_size=(7, 7), strides=(1,1), input_shape=input_shape, kernel_regularizer=l2(l2_reg)) )
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
#Default parameters kept same as PyTorch
#Meaning of PyTorch momentum is different from Keras momentum.
# PyTorch mom = 0.1 is same as Keras mom = 0.9
model.add( Conv2D(filters=128, kernel_size=(7, 7), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=512, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=1024, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add( Conv2D( filters=2048, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg) ) )
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=4096, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
# Passing it to a dense layer
model.add(Flatten())
model.add(Dense(1024, kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
# Output Layer
model.add(Dense(num_classes, kernel_regularizer=l2(l2_reg)))
model.add(Activation('softmax'))
return model
mom = 0.9 #0
model = model_toy(mom)
model.summary()
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(lr=0.001),
#optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=True),
metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
shuffle=True,
)
print('Training results')
print('-------------------------------------------')
score = model.evaluate(x_train, y_train, verbose=1)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
print('-------------------------------------------')
Small Note - I was able to successfully do this task in PyTorch. It is just that my actual task requires me to have a Keras model. That's why I have changed the default values of the BatchNorm layer (the root cause of the issue) according to the ones I used to train PyTorch model.
Here is the image that I used in my code.
Here are the results of training.
Case1 of the dataset
Case2 of the dataset
If you look at these two files, you would be able to notice the discrepancies in the training loss during training vs inference.
(I have set my batch size to be equal to the size of my training data so as to avoid some the reasons BatchNorm generally creates problems as mentioned here)
Next, I looked at the source code of the Keras to see if there is any way I can make the BatchNorm layer use the batch statistics instead of the running mean and variance.
Here is the update formula that Keras (backend - TF) uses to update the running mean and variance.
#running_stat -= (1 - momentum) * (running_stat - batch_stat)
So if I set the momentum value to be 0, it would mean that the value assigned to the runing_stat would always be equal to batch_stat during the training phase. Thus, the value it will use during inference mode will also be same (close) as batch/dataset statistics.
Here are the results for this little experiment with the same issue still occurring.
Case1 of the dataset
Case2 of the dataset
Programming Environment - Python-3.5.2, tensorflow-1.10.0, keras-2.2.4
I tried the same thing with tensorflow-1.12.0, keras-2.2.2 as well but it still did not solve the issue.

Categories

Resources