my neural network written in keras, for the problem of binary image classification, after selecting hyperparameters using the keras tuner, produces only zeros.
import keras_tuner
from kerastuner import BayesianOptimization
from keras_tuner import Objective
from tensorflow.keras.models import Model
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from torch.nn.modules import activation
def build_model(hp):
# create the base pre-trained model
base_model = Xception(include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False
x = base_model.output
x = Flatten()(x)
hp_units = hp.Int('units', min_value=32, max_value=4096, step=32)
x = Dense(units = hp_units, activation="relu")(x)
hp_rate = hp.Float('rate', min_value = 0.01, max_value=0.9, step=0.01)
x = Dropout(rate = hp_rate)(x)
predictions = Dense(1, activation='sigmoid')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
hp_learning_rate = hp.Float('learning_rate', max_value = 1e-2, min_value = 1e-7, step = 0.0005)
optimizer = hp.Choice('optimizer', ['adam', 'sgd', 'adagrad', 'rmsprop'])
model.compile(optimizer,
loss=tf.keras.losses.BinaryCrossentropy(),
metrics=['accuracy'])
return model
stop_early = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, min_delta= 0.0001)
tuner = BayesianOptimization(
hypermodel = build_model,
objective = Objective(name="val_accuracy",direction="max"),
max_trials = 10,
directory='/content/best_model_s',
overwrite=False
)
tuner.search(train_batches,
validation_data = valid_batches,
epochs = 100,
callbacks=[stop_early]
)
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(train_batches, validation_data = valid_batches ,epochs=50)
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))
best_model = tuner.hypermodel.build(best_hps)
# Retrain the model
best_model.fit(train_batches, validation_data = valid_batches , epochs=best_epoch)
test_generator.reset()
predict = best_model.predict_generator(test_generator, steps = len(test_generator.filenames))
I'm guessing that maybe the problem is that the ImageDataGenerator is fed to train with 2 batches of 16 images each, and to test the ImageDataGenerator with 2 batches of 4 images (each batch has an equal number of class representatives).I also noticed that with a small number of epochs, the neural network produces values from 0 to 1, but the more epochs, the closer the response of the neural network is to zero. For a solution, I tried to stop training as soon as the next 5 iterations do not decrease the loss on validation. Again, it seems to me that the matter is in the validation sample, it is very small.
Any advice?
Related
I have trained a vgg16 model with a total of 1000 images for 5 classes (200 images for each class). I have used data augmentation, stratified K-fold, and dropout to train the model. The train accuracy and val accuracy is good. However, when i do prediction on the trained model with test dataset, the result of confusion matrix is not compatible with the train accuracy.
[Train & Val accuracy[Classification reportConfusion Matrix](https://i.stack.imgur.com/OIX3O.png)](https://i.stack.imgur.com/MAPXC.png)
VGG model:
def create_model():
# (CNN) is a multilayered neural network with a special architecture to detect complex features in data.
# VGG16 = Visual Geometry Group
# 16 = 16 refers to it has 16 layers that have weights
# VGG16 have 128 million parameters
# 3x3 filter with stride 1
# maxpool layer of 2x2 filter of stride 2
# Conv-1 Layer has 64 number of filters
# Conv-2 has 128 filters
# Conv-3 has 256 filters
# Conv 4 and Conv 5 has 512 filters
# import library
from tensorflow.keras.models import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Conv2D
# number of species
NO_CLASSES = 5
# load the VGG16 model as the base model for training
# exclude the fully connected layer
base_model = VGG16(include_top=False, input_shape=(224, 224, 3))
# add layers
x = base_model.output
x = Conv2D(64, (3,3), activation = 'relu')(x) # output layer will be 64x64, (3x3) kernel
x = GlobalAveragePooling2D()(x) # use average pooling as we dont have min pooling which selects the darkest pixels from image (our dataset here is white background)
# add dense layers so that the model can learn more complex functions and classify for netter results
# Dense layer = a layer that is deeply connected with its preceding layer
x = Flatten()(x) # for feeding into fully connected layer as fully connected layer only accept 1D
x = Dense(1024,activation='relu')(x)
x = Dense(1024,activation='relu')(x) # dense layer 2
x = Dense(512,activation='relu')(x) # dense layer 3
x = Dropout(0.2)(x) # reduce dependency between neurons
# final layer with softmax activation for multiclass classification
preds = Dense(NO_CLASSES, activation='softmax')(x)
# layers of the VGG16 model are frozen, bcuz we dont want their weights to changes during model training
# create a new model with the base model's original input and the new model's output
model = Model(inputs = base_model.input, outputs = preds)
# don't train the first 19 layers - 0..18
for layer in model.layers[:19]:
layer.trainable=False
# train the rest of the layers - 19 onwards
for layer in model.layers[19:]:
layer.trainable=True
# compile the model
model.compile(optimizer='Adam', # Adam optimizer -- training cost (low) and performance (high)
loss='categorical_crossentropy', # for multi-class(classes are mutually exclusive) problem
metrics=['accuracy']) # calculate accuracy
return model
Stratified K fold & model fit
from sklearn.model_selection import StratifiedKFold
from statistics import mean, stdev
EPOCHS = 6
histories = []
kfold = StratifiedKFold(n_splits = 5, shuffle=True, random_state=123)
for f, (trn_ind, val_ind) in enumerate(kfold.split(train_dataset.Image_path, train_dataset.labels)):
print(); print("#"*50)
print("Fold: ",f+1)
print("#"*50)
train_ds = datagen.flow_from_dataframe(train_dataset.loc[trn_ind,:],
x_col='Image_path', y_col='labels',
target_size=(width,height),
class_mode = 'categorical', color_mode = 'rgb',
batch_size = 16, shuffle = True)
val_ds = datagen.flow_from_dataframe(train_dataset.loc[val_ind,:],
x_col='Image_path', y_col='labels',
target_size=(width,height),
class_mode = 'categorical', color_mode = 'rgb',
batch_size = 16, shuffle = True)
# Define start and end epoch for each folds
fold_start_epoch = f * EPOCHS
fold_end_epoch = EPOCHS * (f+1)
step_size_train = train_ds.n // train_ds.batch_size
# fit
history=model.fit(train_ds,
initial_epoch=fold_start_epoch ,
epochs=fold_end_epoch,
validation_data=val_ds,
shuffle=True,
steps_per_epoch=step_size_train,
verbose=1)
# store history for each folds
histories.append(history)
Does this happened is because of the dataset itself or the coding problem? I hope to find the mistake.
I am trying to implement a classification model with ResNet50. I know, CNN or transfer learning models extract the features themselves. How can I get the extracted feature vector from the image dataset in python?
Code Snippet of model training:
base_model = ResNet152V2(
input_shape = (height, width, 3),
include_top=False,
weights='imagenet'
)
from tensorflow.keras.layers import MaxPool2D, BatchNormalization, GlobalAveragePooling2D
top_model = base_model.output
top_model = GlobalAveragePooling2D()(top_model)
top_model = Dense(1072, activation='relu')(top_model)
top_model = Dropout(0.6)(top_model)
top_model = Dense(256, activation='relu')(top_model)
top_model = Dropout(0.6)(top_model)
prediction = Dense(len(categories), activation='softmax')(top_model)
model = Model(inputs=base_model.input, outputs=prediction)
for layer in model.layers:
layer_trainable=False
from tensorflow.keras.optimizers import Adam
model.compile(
optimizer = Adam(learning_rate=0.00001),
loss='categorical_crossentropy',
metrics=['accuracy']
)
import keras
from keras.callbacks import EarlyStopping
es_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
datagen = ImageDataGenerator(rotation_range= 30,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.3,
horizontal_flip=True
)
resnet152v2 = model.fit(datagen.flow(X_train, y_train),
validation_data = (X_val, y_val),
epochs = 100,
callbacks=[es_callback]
)
I followed a solution from here.
Code snippet of extracting features from the trained model:
extract = Model(model.inputs, model.layers[-3].output) #top_model = Dropout(0.6)(top_model)
features = extract.predict(X_test)
The shape of X_test is (120, 224, 224, 3)
Once you have trained your model, you can save it (or just directly use it in the same file) then cut off the top layer.
model = load_model("xxx.h5") #Or use model directly
#Use functional model
from tensorflow.keras.models import Model
featuresModel = Model(inputs=model.input,outputs=model.layers[-3].output) #Get rids of dropout too
#Inference directly
featureVector = featuresModel.predict(yourBatchData, batch_size=yourBatchLength)
If inferenced in batches, you would need to separate the output results to see the result for each input in the batch.
I have made a custom generator in which I need my model's prediction, during training, to do some calculations on it, before it is trained against the true labels. Therefore, I save the model first and then call model.predict() on the current state.
from keras.models import load_model
def custom_generator(model):
while True:
state, target_labels = next(train_it)
model.save('my_model.h5')
#pause training and do some calculations on the output of the model trained so far
print(state)
print(target_labels)
model.predict(state)
#resume training
#model = load_model('my_model.h5')
yield state, target_labels
model3.fit_generator(custom_generator(model3), steps_per_epoch=1, epochs = 10)
loss = model3.evaluate_generator(test_it, steps=1)
loss
I get the following error due to calling model.predict(model) in the custom_generator()
Error:
ValueError: Tensor Tensor("dense_2/Softmax:0", shape=(?, 200),
dtype=float32) is not an element of this graph.
Kindly, help me how to get model predictions(or last layer output) in a custom generator during training.
This is my model:
#libraries
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from keras.applications.vgg16 import VGG16
model = VGG16(include_top=False, weights='imagenet')
print(model.summary())
#add layers
z = Conv2D(1, (3, 3), activation='relu')(model.output)
z = Conv2D(1,(1,1), activation='relu')(z)
z = GlobalAveragePooling2D()(z)
predictions3 = Dense(200, activation='softmax')(z)
model3 = Model(inputs=model.input, outputs=predictions3)
for layer in model3.layers[:20]:
layer.trainable = False
for layer in model3.layers[20:]:
layer.trainable = True
model3.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
Image data generators for loading training and testing data
from keras.preprocessing.image import ImageDataGenerator
# create a data generator
datagen = ImageDataGenerator()
# load and iterate training dataset
train_it = datagen.flow_from_directory('DATA/C_Train/', class_mode='categorical', batch_size=1)
test_it = datagen.flow_from_directory('DATA/C_Test/', class_mode='categorical', batch_size=1)
Your best bet may be to write a custom train loop via train_on_batch or fit; the former's only disadvantaged if use_multiprocessing=True, or using callbacks - which isn't the case. Below is an implementation with train_on_batch - if you use fit instead (for multiprocessing, callbacks, etc), make sure you feed only one batch at a time, and provide no validation data (use model.evaluate instead) - else the control flow breaks. (Also, a custom Callback is a valid, but involved alternative)
CUSTOM TRAIN LOOP
iters_per_epoch = len(train_it) // batch_size
num_epochs = 5
outs_store_freq = 20 # in iters
print_loss_freq = 20 # in iters
iter_num = 0
epoch_num = 0
model_outputs = []
loss_history = []
while epoch_num < num_epochs:
while iter_num < iters_per_epoch:
x_train, y_train = next(train_it)
loss_history += [model3.train_on_batch(x_train, y_train)]
x_test, y_test = next(test_it)
if iter_num % outs_store_freq == 0:
model_outputs += [model3.predict(x_test)]
if iter_num % print_loss_freq == 0:
print("Iter {} loss: {}".format(iter_num, loss_history[-1]))
iter_num += 1
print("EPOCH {} FINISHED".format(epoch_num + 1))
epoch_num += 1
iter_num = 0 # reset counter
FULL CODE
from keras.models import Sequential
from keras.layers import Dense, Conv2D, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator
model = VGG16(include_top=False, weights='imagenet')
print(model.summary())
#add layers
z = Conv2D(1, (3, 3), activation='relu')(model.output)
z = Conv2D(1,(1,1), activation='relu')(z)
z = GlobalAveragePooling2D()(z)
predictions3 = Dense(2, activation='softmax')(z)
model3 = Model(inputs=model.input, outputs=predictions3)
for layer in model3.layers[:20]:
layer.trainable = False
for layer in model3.layers[20:]:
layer.trainable = True
model3.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
loss='categorical_crossentropy')
batch_size = 1
datagen = ImageDataGenerator()
train_it = datagen.flow_from_directory('DATA/C_Train/',
class_mode='categorical',
batch_size=batch_size)
test_it = datagen.flow_from_directory('DATA/C_Test/',
class_mode='categorical',
batch_size=batch_size)
[custom train loop here]
BONUS CODE: to get outputs of any layer, use below:
def get_layer_outputs(model, layer_name, input_data, learning_phase=1):
outputs = [layer.output for layer in model.layers if layer_name in layer.name]
layers_fn = K.function([model.input, K.learning_phase()], outputs)
return [layers_fn([input_data,learning_phase])][0]
outs = get_layer_outputs(model, 'dense_1', x_test, 0) # 0 == inference mode
I'm trying to implement the same model in Keras, and in Tensorflow using Keras layers, using custom data. The two models produce consistently different accuracies over many times of training (keras ~71%, tensorflow ~65%). I want tensorflow to do as well as keras so I can go into the tensorflow iterations to tweak some lower level algorithms.
Here's my original Keras code:
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,),name='text_vectors')
x = Dense(units=64, activation='relu', name = 'dense1')(main_input)
drop1 = Dropout(0.2,name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1,auxiliary_input])
x = Dense(units=64, activation='relu',name='dense2')(x)
drop2 = Dropout(0.1,name='dropout2')(x)
x = Dense(units=32, activation='relu',name='dense3')(drop2)
main_output = Dense(units=num_classes,
activation='softmax',name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics= ['accuracy'],optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x,train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x,val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x,val_x_auxiliary], val_y, verbose=0)
Here's I moved the keras layers to tensorflow following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
for epoch in range(epochs):
print('Training epoch: {}'.format(epoch + 1))
# Randomly shuffle the training data at the beginning of each epoch
x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
for iteration in range(num_tr_iter):
start = iteration * batch_size
end = (iteration + 1) * batch_size
x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
# Run optimization op (backprop)
feed_dict_batch = {x: x_batch, x_aux:x_aux_batch, y: y_batch,K.learning_phase(): 1}
optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in tensorflow, but it also is a ~65% accuracy, so I decided to try this Keras-layers-within-TF set up to identify problems.
I've looked up posts on similar problems with Keras and Tensorflow, and have tried the following which didn't help in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and Tensorflow have different variable initializations. I've tried initializing my weights in tensorflow these following 3 ways, which is supposed to be the same as Keras's weight initialization, but they also didn't affect the accuracies:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizer in the two versions are set to be exactly the same! Though it doesn't look like the accuracy depends on the optimizer - I tried using different optimizers in both keras and tf and the accuracies each converge to the same.
Help!
It seems to me that this is most probably the weight initialization problem. What I would suggest you to do is to initialize keras layers and before training get the layer weights and initialize tf layers with those values.
I have ran into that kind of problems and it solved problems for me but it was a long time ago and I don't know if they made those initializers the same. At that time tf and keras initializations were not the same obviously.
I checked with initializers,seed, parameters and hyperparameters but accuracy is different.
I checked the code for Keras and they randomly shuffle the batch of images and then fed into the network, so this shuffling is different across different engines. So we need to figure out a way in which we can fed the same set of batch images to the network in order to get same accuracy
A neural network trained on iris dataset using [4, 4] hidden layers and created separately in tensorflow and keras gives different results.
While the tensorflow model gives 96.6 % accuracy on test, keras model gives only around 50%. The various hyper parameters like learning rate, optimiser, mini batch size, etc were the same in both cases.
Keras model
model = Sequential()
model.add(Dense(units = 4, activation = 'relu', input_dim = 4))
model.add(Dropout(0.25))
model.add(Dense(units = 4, activation = 'relu'))
model.add(Dropout(0.25))
model.add(Dense(units = 3, activation = 'softmax'))
adam = Adam(epsilon = 10**(-6), lr = 0.01)
model.compile(optimizer = 'adagrad', loss = 'categorical_crossentropy', metrics = ['accuracy'])
one_hot_labels = keras.utils.to_categorical(y_train, num_classes = 3)
model.fit(X_train, one_hot_labels, epochs = 50, batch_size = 40)
Tensorflow model
feature_columns = [tf.feature_column.numeric_column(key = name,
shape = (1),
dtype = tf.float32) for name in list(X_train.columns)]
classifier = tf.estimator.DNNClassifier(hidden_units = [4, 4],
feature_columns = feature_columns,
n_classes = 3,
dropout = 0.25,
model_dir = './DNN_model')
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = X_train,
y = y_train,
batch_size = 40,
num_epochs = 50,
shuffle = False)
classifier.train(input_fn = train_input_fn, steps = None)
For the keras model, I did try changing the learning rate, increasing the number of epochs, using different optimisers, etc. As such, the accuracy remained poor. Clearly, both the models are doing different things, but on the surface, they seem identical to me for all the key aspects.
Any help is appreciated.
they have the same architecture, and that's all.
The difference in performance is coming from one or more of these factors:
You have Dropout. Therefore your networks in every start behaving differently (check how the Dropout works);
Weight initializations, which method you're using in Keras and TensorFlow?
Check all parameters of the optimizer.