Is there a python way for reducing the training time of convolution neural network? - python

I'm building a keras model of convolution neural network for predicting the correct class and classify the tested objects. the model have the conv2D, activation, maxpooling, dropout, flatten, dense layers. after that I training the network on large dataset, but it take a very long time for training, it may reach to 3,4 days, What I need is to reduce the time required to training the network, Is there any way to do that in python?
I have tried to optimize the learning rate by using the LR_Finder class as follow:
from LR_Finder import LRFinder
lr_finder = LRFinder(min_lr=1e-5,max_lr=1e-2, steps_per_epoch=np.ceil(len(trainX) // BS), epochs=100)
But this also did not give me any reduction about the time required.
This is the code of my model:
class SmallerVGGNet:
#staticmethod
def build(width, height, depth, classes):
# initialize the model along with the input shape to be
# "channels last" and the channels dimension itself
model = Sequential()
inputShape = (height, width, depth)
chanDim = -1
# if we are using "channels first", update the input shape
# and channels dimension
if K.image_data_format() == "channels_first":
inputShape = (depth, height, width)
chanDim = 1
# CONV => RELU => POOL
model.add(Conv2D(32, (3, 3), padding="same",
input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
# return the constructed network architecture
return model
and after that I trained the model as following code:
EPOCHS = 100
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images("Dataset")))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
# load the image, pre-process it, and store it in the data list
image = cv2.imread(imagePath)
image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
image = img_to_array(image)
data.append(image)
label = imagePath.split(os.path.sep)[-2]
labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {:.2f}MB".format(data.nbytes / (1024 * 1000.0)))
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
labels, test_size=0.2, random_state=42)
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
horizontal_flip=True, fill_mode="nearest")
# initialize the model
model = SmallerVGGNet.build(width=IMAGE_DIMS[1], height=IMAGE_DIMS[0],
depth=IMAGE_DIMS[2], classes=len(lb.classes_))
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer= opt,
metrics=["accuracy"])
print("model compiled in few minutes successfully ^_^")
# train the network
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS, verbose=1)
According to this code,I expected the output required some minutes or may be a few hours, but when it reach to training in model.fit_generator step, the actual time required is about many hours for every epoch and it requires some days to train all the network or it may be crash and stop working. Is there any way to reduce the training time?

set use_multiprocessing=True and workers>1 when you call fit_generator because the default is to execute the generator on the main thread only

Related

CNN high false positive rate

I am trying to train a convolutional neural network but I get a quite high number of false positive classified objects. I am using two classes, each 10.000 images with quite obvious differences. I would expect a rather easy task for a CNN, also I used some hand crafted features with a random forest classifier before which worked quite well.
This is the model I am using:
def build(width, height, depth, classes):
model = Sequential()
inputShape = (height, width, depth)
chanDim = -1
# if we are using "channels first", update the input shape
# and channels dimension
if K.image_data_format() == "channels_first":
inputShape = (depth, height, width)
chanDim = 1
# CONV => RELU => POOL layer set
model.add(Conv2D(32, (3, 3), padding="same",
input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL layer set
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# (CONV => RELU) * 3 => POOL layer set
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.25))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
# return the constructed network architecture
return model
After applying data augmentation, training and validation loss look better but still I get to many false positives.
Here is a screenshot of some example images from a validation set (screenshot), green marked are correct classified, the rest are false positives. Any suggestion, how to improve my model?
Edit:
I add also the code for pre-processing the images:
import matplotlib
matplotlib.use("Agg")
from smallvggnet import SmallVGGNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
from keras.utils.np_utils import to_categorical
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images("C:/06112020_hyphae/all/")))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
image = cv2.imread(imagePath)
image = cv2.resize(image, (350, 150))
data.append(image)
label = imagePath.split(os.path.sep)[-2].split('/')[-1]
if label == 'pos':
label = 1
else:
label = 0
labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.25, random_state=42)
unique, counts = np.unique(trainY, return_counts=True)
print (dict(zip(unique, counts)))
trainY = to_categorical(trainY)
testY = to_categorical(testY)
# construct the image generator for data augmentation
#aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.2, horizontal_flip=True, fill_mode="nearest")
aug = ImageDataGenerator()
# initialize our VGG-like Convolutional Neural Network
model = SmallVGGNet.build(width=350, height=150, depth=3,
classes=2)
# initialize our initial learning rate, # of epochs to train for,
# and batch size
INIT_LR = 0.01
EPOCHS = 20
BS = 32
# initialize the model and optimizer (you'll want to use
# binary_crossentropy for 2-class classification)
print("[INFO] training network...")
opt = SGD(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
H = model.fit(x=aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS)

Model get 97% accuracy on train and validation but when use custom predict it get wrong

I use a CNN model to train image classification , it got great accuracy at test and validation (98% and 97%), but when use my image to predict it alway go wrong, here is my code:
BATCH_SIZE = 30
IMG_HEIGHT = 256
IMG_WIDTH = 256
STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)
train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
batch_size=BATCH_SIZE,
shuffle=True,
target_size=(IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES))
here is prepare for dataset and data argumentation:
imgDataGen=ImageDataGenerator(
validation_split=0.2,
rescale=1/255,
horizontal_flip=True,
zoom_range=0.3,
rotation_range=15.,
width_shift_range=0.1,
height_shift_range=0.1,
)
prepare data:
train_dataset = imgDataGen.flow_from_directory(
directory=str(data_dir),
target_size = (IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_dataset = imgDataGen.flow_from_directory(
directory=str(data_dir),
target_size = (IMG_HEIGHT, IMG_WIDTH),
classes = list(CLASS_NAMES),
batch_size =BATCH_SIZE,
subset = 'validation'
)
the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(256, 256, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(6, activation='sigmoid'))
complie:
model.compile(loss='binary_crossentropy',
optimizer=keras.optimizers.SGD(learning_rate=0.001,momentum=0.9),
metrics=['acc'])
train
history = model.fit_generator(
train_dataset,
validation_data = val_dataset,
workers=10,
epochs=20,
)
It get pretty high accuracy 98% on test and 97% on validation
but when i try with my code to predict
def prepare(filepath):
IMG_SIZE=256
img_array=cv2.imread(filepath)
new_array= cv2.resize(img_array,(IMG_SIZE,IMG_SIZE))
return new_array.reshape(1,IMG_SIZE,IMG_SIZE,3)
model=tf.keras.models.load_model('trained-model.h5',compile=False)
#np.set_printoptions(formatter={'float_kind':'{:f}'.format})
predict=model.predict([prepare('cat.jpg')])
pred_name = CATEGORIES[np.argmax(predict)]
print(pred_name)
it got wrong, with cat image it go for dog and dog for cat, but sometime it go right, just i think 98% is more accurate than this, if i try 5 image of cats it fail 3 or 4 images
so it because dataset or because of code?
please help, thanks
So in your second code-block you have this:
rescale=1/255
This is for normalizing your image into the range [0;1]. So every image gets rescaled (/normalized) before going through the network. But in you las code-block where you test it on an image you didnt add normalization. Try adding that to your "prepare" function:
def prepare(filepath):
IMG_SIZE = 256
img_array = cv2.imread(filepath)
# add this:
img_array = image_array / 255
new_array = cv2.resize(img_array,(IMG_SIZE,IMG_SIZE))
return new_array.reshape(1,IMG_SIZE,IMG_SIZE,3)

Input shape for 3d-CNN in python

I want to enter 8 images at the same time to the same CNN structure using conv3d. my CNN model is as following:
def build(sample, frame, height, width, channels, classes):
model = Sequential()
inputShape = (sample, frame, height, width, channels)
chanDim = -1
if K.image_data_format() == "channels_first":
inputShape = (sample, frame, channels, height, width)
chanDim = 1
model.add(Conv3D(32, (3, 3, 3), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling3D(pool_size=(2, 2, 2), padding="same", data_format="channels_last"))
model.add(Dropout(0.25))
model.add(Conv3D(64, (3, 3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling3D(pool_size=(2, 2, 2), padding="same", data_format="channels_last"))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128)) #(Dense(1024))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax")
The training of model is as follow:
IMAGE_DIMS = (57, 8, 60, 60, 3) # since I have 460 images so 57 sample with 8 image each
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# note: data is a list of all dataset images
(trainX, testX, trainY, testY) train_test_split(data, labels, test_size=0.2, random_state=42)
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode="nearest")
# initialize the model
model = CNN_Network.build(sample= IMAGE_DIMS[0], frame=IMAGE_DIMS[1],
height = IMAGE_DIMS[2], width=IMAGE_DIMS[3],
channels=IMAGE_DIMS[4], classes=len(lb.classes_))
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer= opt, metrics=["accuracy"])
# train the network
model.fit_generator(
aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY),
steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS, verbose=1)
I have confused with the input_shape, I know Conv3D require 5D input, the input is 4D with batch added from keras, but I have the following error:
ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (92, 60, 60, 3)
Can anyone please help me what can I do? what is the 92 resulted, I determine input_shape with (57, 8, 60, 60, 3). And what is my input_shape should become to get 8 colored images input to the same model at the same time.
In Keras Python 3, the input shape can be as follows:
input_shape = (8, 64, 64, 1)
Where:
Value 1 (8) is the number of frames
Value 2 (64) is the width
Value 3 (64) is the height
Value 4 (1) is the number of channels

Create 5-dimension input shape to 3d-CNN in python [duplicate]

This question already has answers here:
Multiple images input to the same CNN using Conv3d in keras
(2 answers)
Closed 3 years ago.
I have a dataset of 15 class with 460 images all. I want to enter every 8 sequences of images at the same time to the same CNN structure. I use conv3d to do that, but I'm confusing with input shape, it returns error.
This is my model:
IMAGE_DIMS = (8, 460, 60, 60, 3)
data = []
labels = []
# loading images...
imagePaths = "dataset\\path"
listing = os.listdir(imagePaths)
for imagePath in listing:
image_fold = os.listdir(imagePaths + "\\" + imagePath)
for file in image_fold:
im = (imagePaths + "\\" + imagePath + "\\" + file)
image = cv2.imread(im)
image = cv2.resize(image, (IMAGE_DIMS[2], IMAGE_DIMS[3]))
image = img_to_array(image)
data.append(image)
label= imagePath.split(os.path.sep)[-1]
labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.2, random_state=42)
model = Sequential()
sample= IMAGE_DIMS[0]
frame=IMAGE_DIMS[1]
height = IMAGE_DIMS[2]
width=IMAGE_DIMS[3]
channels=IMAGE_DIMS[4]
classes=len(lb.classes_)
inputShape = (sample, frame, height, width, channels)
chanDim = -1
if K.image_data_format() == "channels_first":
inputShape = (sample, frame, channels, height, width)
chanDim = 1
model.add(Conv3D(32, (3, 3, 3), padding="same", batch_input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling3D(pool_size=(2, 2, 2), padding="same", data_format="channels_last"))
model.add(Dropout(0.25))
model.add(Conv3D(64, (3, 3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling3D(pool_size=(2, 2, 2), padding="same", data_format="channels_last"))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# softmax classifier
model.add(Dense(classes))
model.add(Activation("softmax"))
model.summary()
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer= opt, metrics=["accuracy"])
H = model.fit(trainX, trainY, batch_size=BS, epochs=EPOCHS, verbose=1,validation_data (testX,testY))
and this is my model summary:
But I get the following error:
ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (368, 60, 60, 3)
How can I fix the error, can anyone please help me, I will be thankful for any help. I know the problem with the input shape, the compiler refer to the model.fit step. I thing trainX, testX, trainY, testY must be in 5-dim, but I cannot able to that.
If I understand correctly, you would like to fit your model with 8 images which is called actually batch. So when you call the method model.fit() set batch_size = 8. Another point that, I think, you confused is about the input shape. If you would like to fit images to the network, your input shape is the height x width of the image and the number of channels which is in your case RGB. So, the set input_shape = (3, 60, 60). Please be aware of that the network structure does not includes the total number of images in it. Because the NN structure does not need to know what is the training number. When you fit the training images to the network it will just take a batch of it and does the training job. Lastly, Instead of using 3D convolution layer, you need to use 2D. Think it as a 2D frame that moves over the training image and it does the movement for each channel. Therefore, the frame size need to has a 2D shape, set it (x, x). This frame is called kernel in documents.
The following code just an sample and has not been tested. I hope it helps to understand the structure:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(3, 60, 60)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(number_of_classes))
model.add(Activation('softmax'))

Keras BatchNorm Layer giving weird results during training and inference

I am having an issue with Keras where evaluate function gives different training loss (way higher) and accuracy(way lower) value as compared to the value that I get during training. I am aware that this question has already been asked at several places (here, here), but I think my issue is different and still not answered in those forums.
Explanation of the Task
It is supposed to be a very simple task. All I am doing is to overfit to my own dataset of 256 images (29x29x3) with 256 output classes (one for each image).
Dataset
Case 1
x_train = All the pixel values in the image = i where i goes from 0 to 255.
y_train = i
Case 2
x_train = Centre 5*5 patch of the pixel values in the image = i where i goes from 0 to 255. All the other pixel values are same for all the images.
y_train = i
This gives me 256 images in total for the training data in each case. (It would be more clear if you just have a look at the code)
Here is my code to reproduce the issue -
from __future__ import print_function
import os
import keras
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, Activation
from keras.layers.normalization import BatchNormalization
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, Callback
from keras import backend as K
from keras.regularizers import l2
import matplotlib.pyplot as plt
import PIL.Image
import numpy as np
from IPython.display import clear_output
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="1"
# To suppress the warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
## Hyperparamters
batch_size = 256
num_classes = 256
l2_reg=0.0
epochs = 500
## input image dimensions
img_rows, img_cols = 29, 29
## Train Image (I took a random image from ImageNet)
train_img_name = 'n01871265_279.JPEG'
ret = PIL.Image.open(train_img_name) #Opening the image
ret = ret.resize((img_rows, img_cols)) #Resizing the image
img = np.asarray(ret, dtype=np.uint8).astype(np.float32) #Converting it to numpy array
print(img.shape) # (29, 29, 3)
## Creating the training data
#############################
x_train = np.zeros((256, img_rows, img_cols, 3))
y_train = np.zeros((256,), dtype=int)
for i in range(len(y_train)):
temp_img = np.copy(img)
## Case1 of dataset
# temp_img[:, :, :] = i # changing all the pixel values
## Case2 of dataset
temp_img[12:16, 12:16, :] = i # changing the centre block of 5*5 pixels
x_train[i, :, :, :] = temp_img
y_train[i] = i
##############################
## Common stuff in Keras
if K.image_data_format() == 'channels_first':
print('Channels First')
x_train = x_train.reshape(x_train.shape[0], 3, img_rows, img_cols)
input_shape = (3, img_rows, img_cols)
else:
print('Channels Last')
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 3)
input_shape = (img_rows, img_cols, 3)
## Normalizing the pixel values
x_train = x_train.astype('float32')
x_train /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
## convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
## Model definition
def model_toy(mom):
model = Sequential()
model.add( Conv2D(filters=64, kernel_size=(7, 7), strides=(1,1), input_shape=input_shape, kernel_regularizer=l2(l2_reg)) )
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
#Default parameters kept same as PyTorch
#Meaning of PyTorch momentum is different from Keras momentum.
# PyTorch mom = 0.1 is same as Keras mom = 0.9
model.add( Conv2D(filters=128, kernel_size=(7, 7), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=256, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=512, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=1024, kernel_size=(5, 5), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add( Conv2D( filters=2048, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg) ) )
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
model.add(Conv2D(filters=4096, kernel_size=(3, 3), strides=(1, 1), kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
# Passing it to a dense layer
model.add(Flatten())
model.add(Dense(1024, kernel_regularizer=l2(l2_reg)))
model.add(Activation('relu'))
model.add(BatchNormalization(momentum=mom, epsilon=0.00001))
# Output Layer
model.add(Dense(num_classes, kernel_regularizer=l2(l2_reg)))
model.add(Activation('softmax'))
return model
mom = 0.9 #0
model = model_toy(mom)
model.summary()
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(lr=0.001),
#optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=True),
metrics=['accuracy'])
history = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
shuffle=True,
)
print('Training results')
print('-------------------------------------------')
score = model.evaluate(x_train, y_train, verbose=1)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
print('-------------------------------------------')
Small Note - I was able to successfully do this task in PyTorch. It is just that my actual task requires me to have a Keras model. That's why I have changed the default values of the BatchNorm layer (the root cause of the issue) according to the ones I used to train PyTorch model.
Here is the image that I used in my code.
Here are the results of training.
Case1 of the dataset
Case2 of the dataset
If you look at these two files, you would be able to notice the discrepancies in the training loss during training vs inference.
(I have set my batch size to be equal to the size of my training data so as to avoid some the reasons BatchNorm generally creates problems as mentioned here)
Next, I looked at the source code of the Keras to see if there is any way I can make the BatchNorm layer use the batch statistics instead of the running mean and variance.
Here is the update formula that Keras (backend - TF) uses to update the running mean and variance.
#running_stat -= (1 - momentum) * (running_stat - batch_stat)
So if I set the momentum value to be 0, it would mean that the value assigned to the runing_stat would always be equal to batch_stat during the training phase. Thus, the value it will use during inference mode will also be same (close) as batch/dataset statistics.
Here are the results for this little experiment with the same issue still occurring.
Case1 of the dataset
Case2 of the dataset
Programming Environment - Python-3.5.2, tensorflow-1.10.0, keras-2.2.4
I tried the same thing with tensorflow-1.12.0, keras-2.2.2 as well but it still did not solve the issue.

Categories

Resources