I am trying to build an image classification model using an Inception Network as the base. This is a simple binary classification model.
My images are available in many smaller directories within one big directory. Each image has its own 'image id', which is also its file name. In addition to this, I have a few TSV files that map these image ids to the respective labels ('Positive' or 'Negative').
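For illustration, a made-up layout consistent with this (the real directory and file names differ):
big_dir/
    dir_a/
        sub_1/
            img_001.jpg
            img_002.jpg
    dir_b/
        sub_2/
            img_003.jpg
and a TSV row then looks like 'img_001<TAB>Positive' under the columns image_id and label_text_image.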
When I train the model, I see that my accuracy fluctuates without much progress. I was wondering if there is anything wrong with the way that I have prepared my dataset. I have written a few functions for this purpose.
Before I get to these functions, here is how I have defined my model:
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(2, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
These are the functions I have written to prepare my data:
def vectorize_img(img_path):
    img = load_img(img_path, target_size=(224, 224))  # resize to 224x224
    x = img_to_array(img)  # convert to a numpy array
    x = preprocess_input(x)  # conform to the InceptionV3 input format
    return x
def prepare_features(base_dir, limit):
    features_dict = dict()
    for dir1 in os.listdir(base_dir):
        for dir2 in os.listdir(base_dir + dir1):
            for file in os.listdir(base_dir + dir1 + '/' + dir2):
                if len(features_dict) < limit:
                    try:
                        img_path = base_dir + dir1 + '/' + dir2 + '/' + file
                        x = vectorize_img(img_path)
                        name_id = file.split('.')[0]  # take the file name and use it as the id in the dict
                        features_dict[name_id] = x
                    except Exception as e:
                        print(e)
    return features_dict
def prepare_data(file_path, features_dict):
    inputs = []
    labels = []
    df = pd.read_csv(file_path, sep='\t')
    df = df[['image_id', 'label_text_image']]
    df['class'] = df['label_text_image'].apply(lambda label: 1 if label == 'Positive' else 0)
    for index, row in df.iterrows():
        try:
            inputs.append(features_dict[row['image_id']])
            labels.append(row['class'])
        except KeyError:  # skip ids that were not vectorized
            pass
    return np.asarray(inputs), tf.one_hot(np.asarray(labels), depth=2)
These functions are then called to prepare my dataset:
features_dict = prepare_features('/path/to/img/dir', 8000)
x_train, y_train = prepare_data('/path/to/train/tsv', features_dict)
x_dev, y_dev = prepare_data('/path/to/dev/tsv', features_dict)
x_test, y_test = prepare_data('/path/to/test/tsv', features_dict)
Finally, the model is trained:
EPOCHS = 50
BATCH_SIZE = 32
STEPS_PER_EPOCH = 1
history = model.fit(x=x_train, y=y_train, validation_data=(x_dev, y_dev), epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, batch_size=BATCH_SIZE)
model.evaluate(x=x_test, y=y_test, batch_size=BATCH_SIZE)
Am I doing something wrong?
Here are the results that my model achieves:
Epoch 1/50
1/1 [==============================] - 158s 158s/step - loss: 0.8298 - accuracy: 0.5000 - val_loss: 0.7432 - val_accuracy: 0.5227
Epoch 2/50
1/1 [==============================] - 113s 113s/step - loss: 0.7775 - accuracy: 0.4688 - val_loss: 0.8225 - val_accuracy: 0.5153
Epoch 3/50
1/1 [==============================] - 113s 113s/step - loss: 0.7663 - accuracy: 0.5625 - val_loss: 0.8431 - val_accuracy: 0.5174
Epoch 4/50
1/1 [==============================] - 156s 156s/step - loss: 1.1292 - accuracy: 0.5312 - val_loss: 0.7763 - val_accuracy: 0.5227
Epoch 5/50
1/1 [==============================] - 114s 114s/step - loss: 0.7452 - accuracy: 0.5312 - val_loss: 0.7332 - val_accuracy: 0.5448
Epoch 6/50
1/1 [==============================] - 156s 156s/step - loss: 0.7884 - accuracy: 0.5312 - val_loss: 0.7072 - val_accuracy: 0.5606
Epoch 7/50
1/1 [==============================] - 114s 114s/step - loss: 0.7856 - accuracy: 0.5312 - val_loss: 0.7195 - val_accuracy: 0.5764
Epoch 8/50
1/1 [==============================] - 156s 156s/step - loss: 0.9203 - accuracy: 0.5312 - val_loss: 0.7348 - val_accuracy: 0.5616
Epoch 9/50
1/1 [==============================] - 156s 156s/step - loss: 0.8639 - accuracy: 0.4062 - val_loss: 0.7275 - val_accuracy: 0.5690
Epoch 10/50
1/1 [==============================] - 156s 156s/step - loss: 0.6170 - accuracy: 0.7188 - val_loss: 0.7125 - val_accuracy: 0.5880
Epoch 11/50
1/1 [==============================] - 156s 156s/step - loss: 0.5756 - accuracy: 0.7188 - val_loss: 0.6979 - val_accuracy: 0.6017
Epoch 12/50
1/1 [==============================] - 113s 113s/step - loss: 0.9976 - accuracy: 0.4375 - val_loss: 0.6834 - val_accuracy: 0.5933
Epoch 13/50
1/1 [==============================] - 156s 156s/step - loss: 0.7025 - accuracy: 0.5938 - val_loss: 0.6863 - val_accuracy: 0.5838
You mentioned that this is binary classification, so your labels are {0, 1}. In this case your model output should be either
predictions = Dense(2, activation='softmax')(x)
with categorical (one-hot) labels [0, 1] or [1, 0],
or
predictions = Dense(1, activation='sigmoid')(x)
with a binary label, 1 or 0.
But you are using 2 outputs with a sigmoid, i.e. predictions = Dense(2, activation='sigmoid')(x).
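For example, a minimal sketch of the second option, reusing the base_model and x from your code (the labels then stay as a plain 0/1 vector, so the tf.one_hot call would be dropped):
# single sigmoid unit for binary classification
predictions = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# labels: keep y_train as integers in {0, 1}, e.g. np.asarray(labels)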
I'm working on a multiclass text classification problem.
After splitting the data to train and validation data frames, I've performed text augmentation to balance the data (only on the train data of course).
I've ended up with balanced training data of 44325 samples.
Later on I applied a 'clean text' step (i.e. stemming and the like) to the training data:
train['text'] = train['text'].apply(clean_text)
X_train = train.iloc[:, :-1]
y_train = train.iloc[:, -1:]
X_test = valid.iloc[:, :-1]
y_test = valid.iloc[:, -1:]
y_test = pd.DataFrame(y_test).reset_index(drop=True)
tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')
tokenizer.fit_on_texts(X_train['text'])
train_seq = tokenizer.texts_to_sequences(X_train['text'])
train_padded = pad_sequences(train_seq, maxlen=max_length, padding=padding_type, truncating=trunc_type)
validation_seq = tokenizer.texts_to_sequences(X_test['text'])
validation_padded = pad_sequences(validation_seq, maxlen=max_length, padding=padding_type, truncating=trunc_type)
print('Shape of train data tensor:', train_padded.shape)
print('Shape of validation data tensor:', validation_padded.shape)
Output:
Shape of train data tensor: (44325, 200)
Shape of validation data tensor: (5466, 200)
Here's the encoding section:
encode = OneHotEncoder()
training_labels = encode.fit_transform(y_train)
validation_labels = encode.transform(y_test)
training_labels = training_labels.toarray()
validation_labels = validation_labels.toarray()
Model:
model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=train_padded.shape[1]))
model.add(Conv1D(48, len(GROUPS), activation='relu', padding='valid'))
model.add(GlobalMaxPooling1D())
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(len(GROUPS), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 100
batch_size = 32
history = model.fit(train_padded, training_labels, shuffle=True,
                    epochs=epochs, batch_size=batch_size,
                    validation_split=0.2,
                    validation_data=(validation_padded, validation_labels),
                    callbacks=[ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.0001),
                               EarlyStopping(monitor='val_loss', mode='min', patience=2, verbose=1),
                               EarlyStopping(monitor='val_accuracy', mode='max', patience=5, verbose=1)])
Model output:
Epoch 1/100
1109/1109 [==============================] - 88s 79ms/step - loss: 1.2021 - accuracy: 0.5235 - val_loss: 0.8374 - val_accuracy: 0.7232
Epoch 2/100
1109/1109 [==============================] - 87s 79ms/step - loss: 0.9505 - accuracy: 0.6645 - val_loss: 0.7488 - val_accuracy: 0.7461
Epoch 3/100
1109/1109 [==============================] - 86s 77ms/step - loss: 0.8378 - accuracy: 0.7058 - val_loss: 0.6686 - val_accuracy: 0.7663
Epoch 4/100
1109/1109 [==============================] - 88s 79ms/step - loss: 0.7391 - accuracy: 0.7382 - val_loss: 0.6134 - val_accuracy: 0.7891
Epoch 5/100
1109/1109 [==============================] - 89s 80ms/step - loss: 0.6763 - accuracy: 0.7546 - val_loss: 0.5832 - val_accuracy: 0.7997
Epoch 6/100
1109/1109 [==============================] - 87s 79ms/step - loss: 0.6185 - accuracy: 0.7760 - val_loss: 0.5529 - val_accuracy: 0.8050
Epoch 7/100
1109/1109 [==============================] - 87s 79ms/step - loss: 0.5737 - accuracy: 0.7912 - val_loss: 0.5311 - val_accuracy: 0.8153
Epoch 8/100
1109/1109 [==============================] - 88s 80ms/step - loss: 0.5226 - accuracy: 0.8080 - val_loss: 0.5268 - val_accuracy: 0.8226
Epoch 9/100
1109/1109 [==============================] - 88s 79ms/step - loss: 0.4955 - accuracy: 0.8171 - val_loss: 0.5142 - val_accuracy: 0.8285
Epoch 10/100
1109/1109 [==============================] - 88s 80ms/step - loss: 0.4665 - accuracy: 0.8265 - val_loss: 0.5035 - val_accuracy: 0.8338
Epoch 11/100
1109/1109 [==============================] - 88s 79ms/step - loss: 0.4410 - accuracy: 0.8348 - val_loss: 0.5082 - val_accuracy: 0.8399
Epoch 12/100
1109/1109 [==============================] - 88s 80ms/step - loss: 0.4190 - accuracy: 0.8407 - val_loss: 0.5160 - val_accuracy: 0.8414
Epoch 00012: early stopping
...and on to the last part, which I'm unsure of:
def evaluate_preds(y_true, y_preds):
    """
    Performs evaluation comparison on y_true labels vs. y_pred labels
    on a classification.
    """
    accuracy = accuracy_score(y_true, y_preds)
    precision = precision_score(y_true, y_preds, average='micro')
    recall = recall_score(y_true, y_preds, average='micro')
    f1 = f1_score(y_true, y_preds, average='micro')
    metric_dict = {"accuracy": round(accuracy, 2),
                   "precision": round(precision, 2),
                   "recall": round(recall, 2),
                   "f1": round(f1, 2)}
    print(f"Acc: {accuracy * 100:.2f}%")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1 score: {f1:.2f}")
    return metric_dict
predicted = model.predict(validation_padded)
evaluate_preds(np.argmax(validation_labels, axis=1), np.argmax(predicted, axis=1))
Output:
Acc: 40.16%
Precision: 0.40
Recall: 0.40
F1 score: 0.40
I can't understand what I am doing wrong.
How come the accuracy of this last method is so low compared to val_accuracy?
I created a simple LSTM model, but my validation accuracy always hovers around 50% no matter how many epochs I use. Here's how it looks compared to the training accuracy:
Epoch 15/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.9408 - accuracy: 0.7999 - val_loss: 3.5255 - val_accuracy: 0.5190
Epoch 16/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.8724 - accuracy: 0.8080 - val_loss: 3.6279 - val_accuracy: 0.5127
Epoch 17/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.8041 - accuracy: 0.8177 - val_loss: 3.6627 - val_accuracy: 0.5158
Epoch 18/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.7377 - accuracy: 0.8297 - val_loss: 3.7247 - val_accuracy: 0.5140
Epoch 19/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.6680 - accuracy: 0.8431 - val_loss: 3.8000 - val_accuracy: 0.5144
Epoch 20/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.6036 - accuracy: 0.8578 - val_loss: 3.9164 - val_accuracy: 0.5051
Epoch 21/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.5460 - accuracy: 0.8715 - val_loss: 3.9832 - val_accuracy: 0.5089
Epoch 22/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.4830 - accuracy: 0.8872 - val_loss: 4.0284 - val_accuracy: 0.5095
Epoch 23/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.4277 - accuracy: 0.9019 - val_loss: 4.1428 - val_accuracy: 0.5067
Epoch 24/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.3760 - accuracy: 0.9169 - val_loss: 4.1972 - val_accuracy: 0.5069
Epoch 25/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.3319 - accuracy: 0.9275 - val_loss: 4.2494 - val_accuracy: 0.5047
Epoch 26/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2883 - accuracy: 0.9406 - val_loss: 4.3047 - val_accuracy: 0.5075
Epoch 27/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2471 - accuracy: 0.9507 - val_loss: 4.3822 - val_accuracy: 0.5063
Epoch 28/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2131 - accuracy: 0.9592 - val_loss: 4.4553 - val_accuracy: 0.5071
I think it might be overfitting, but I have doubts about the validation_split function that Keras provides. Does it even shuffle the data?
Anyway, here's my full code from the start, including how I take input, so you can tell me how to modify it, from the batch size to the size of the last layer. Please take a look and tell me how it can be optimised so that my validation accuracy improves.
BATCH_SIZE = 64
EPOCHS = 50
LSTM_NODES = 256
NUM_SENTENCES = 3000
MAX_SENTENCE_LENGTH = 50
MAX_NUM_WORDS = 3000
EMBEDDING_SIZE = 100
input_sentences = []
output_sentences = []
output_sentences_inputs = []
count = 0
for line in open(r'/content/drive/My Drive/TEMPPP/123.txt', encoding="utf-8"):
    count += 1
    if count > NUM_SENTENCES:
        break
    if '\t' not in line:
        continue
    input_sentence, output = line.rstrip().split('\t')
    output_sentence = output + ' <eos>'
    output_sentence_input = '<sos> ' + output
    input_sentences.append(input_sentence)
    output_sentences.append(output_sentence)
    output_sentences_inputs.append(output_sentence_input)
input_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
input_tokenizer.fit_on_texts(input_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(input_sentences)
word2idx_inputs = input_tokenizer.word_index
max_input_len = max(len(sen) for sen in input_integer_seq)
output_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS, filters='')
output_tokenizer.fit_on_texts(output_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(output_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)
word2idx_outputs = output_tokenizer.word_index
num_words_output = len(word2idx_outputs) + 1
max_out_len = max(len(sen) for sen in output_integer_seq)
encoder_input_sequences = pad_sequences(input_integer_seq, maxlen=max_input_len)
decoder_input_sequences = pad_sequences(output_input_integer_seq, maxlen=max_out_len, padding='post')
import numpy as np
read_dictionary = np.load('/content/drive/My Drive/TEMPPP/hinvec.npy',allow_pickle='TRUE').item()
num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_inputs.items():
    embedding_vector = read_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector
embedding_layer = Embedding(num_words, EMBEDDING_SIZE, weights=[embedding_matrix], input_length=max_input_len)
decoder_targets_one_hot = np.zeros(
    (len(input_sentences), max_out_len, num_words_output),
    dtype='float32'
)
decoder_output_sequences = pad_sequences(output_integer_seq, maxlen=max_out_len, padding='post')
for i, d in enumerate(decoder_output_sequences):
    for t, word in enumerate(d):
        decoder_targets_one_hot[i, t, word] = 1
encoder_inputs_placeholder = Input(shape=(max_input_len,))
x = embedding_layer(encoder_inputs_placeholder)
encoder = LSTM(LSTM_NODES, return_state=True)
encoder_outputs, h, c = encoder(x)
encoder_states = [h, c]
decoder_inputs_placeholder = Input(shape=(max_out_len,))
decoder_embedding = Embedding(num_words_output, LSTM_NODES)
decoder_inputs_x = decoder_embedding(decoder_inputs_placeholder)
decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)
decoder_dense = Dense(num_words_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
import tensorflow as tf
starter_learning_rate = 0.1
end_learning_rate = 0.01
decay_steps = 2000
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    starter_learning_rate,
    decay_steps,
    end_learning_rate,
    power=0.5)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn, epsilon=1e-03, clipvalue=0.5)
model = Model([encoder_inputs_placeholder,
               decoder_inputs_placeholder],
              decoder_outputs)
model.compile(
    optimizer=opt,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
history = model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_split=0.1,
)
I tried to add a dropout layer, but I couldn't fit it between the LSTM layer and the dense layer. And I have doubts about validation_split: I tried to split the dataset into a train/test set and a validation set but couldn't make it work, and ended up sticking with validation_split. I'm pretty sure this is a case of overfitting, but I'm not able to deal with it.
I'm trying to use a CNN to classify data into 3 classes; every sample is 30*188. Class 1 has 5794 samples, class 2 has 8471, and class 3 has 9092. When I train my model, the values of accuracy, loss, val_acc and val_loss don't change.
Please help me solve this problem.
import glob
import os
import librosa
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram
import librosa.display
import sklearn
from keras.utils import to_categorical
import scipy.io as scio
path1 = 'class1_feature_array.mat'
data1 = scio.loadmat(path1)
class1_feature_array = data1['class1_feature_array']
class1_label = np.zeros((class1_feature_array.shape[0],))
class1_label=class1_label.astype(np.int32)
class1_label=class1_label.astype(np.str)
path2 = 'class2_feature_array.mat'
data2 = scio.loadmat(path2)
class2_feature_array = data2['class2_feature_array']
class2_label = np.ones((class2_feature_array.shape[0],))
class2_label=class2_label.astype(np.int32)
class2_label=class2_label.astype(np.str)
path3 = 'class3_feature_array.mat'
data3 = scio.loadmat(path3)
class3_feature_array = data3['class3_feature_array']
class3_label = np.ones((class3_feature_array.shape[0],))*2
class3_label=class3_label.astype(np.int32)
class3_label=class3_label.astype(np.str)
features, labels = np.empty((0,40,188)), np.empty(0)
features = np.append(features,class1_feature_array,axis=0)
features = np.append(features,class2_feature_array,axis=0)
features = np.append(features,class3_feature_array,axis=0)
features = np.array(features)
labels = np.append(labels,class1_label,axis=0)
labels = np.append(labels,class2_label,axis=0)
labels = np.append(labels,class3_label,axis=0)
labels = np.array(labels, dtype = np.int)
def one_hot_encode(labels):
    n_labels = len(labels)
    n_unique_labels = len(np.unique(labels))
    one_hot_encode = np.zeros((n_labels, n_unique_labels))
    print("one_hot_encode", one_hot_encode.shape)
    one_hot_encode[np.arange(n_labels), labels] = 1
    return one_hot_encode
labels = one_hot_encode(labels)
train_test_split = np.random.rand(len(features)) < 0.80
train_x = features[train_test_split]
train_y = labels[train_test_split]
test_x = features[~train_test_split]
test_y = labels[~train_test_split]
train_x = train_x.reshape(train_x.shape[0],train_x.shape[1],train_x.shape[2],1)
test_x = test_x.reshape(test_x.shape[0],test_x.shape[1],test_x.shape[2],1)
import sklearn
import keras
from keras.models import Sequential
from keras.layers import *
from keras.callbacks import LearningRateScheduler
from keras import optimizers
#LeNet
model = Sequential()
model.add(Conv2D(32,(5, 5),strides=(1,1),padding='valid',activation='relu',input_shape=(40,188,1),kernel_initializer='uniform'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,(5,5),strides=(1,1),padding='valid',activation='relu',kernel_initializer='uniform'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(100,activation='relu'))
model.add(Dense(3, activation='softmax'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary(line_length=80)
history = model.fit(train_x, train_y, epochs=100, batch_size=32, validation_data=(test_x, test_y))
The output after training is as shown below:
Train on 18625 samples, validate on 4732 samples
Epoch 1/100
18625/18625 [==============================] - 30s 2ms/step - loss: 8.0138 - accuracy: 0.5001 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 2/100
18625/18625 [==============================] - 22s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 3/100
18625/18625 [==============================] - 23s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 4/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 5/100
18625/18625 [==============================] - 23s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 6/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 7/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 8/100
18625/18625 [==============================] - 25s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 9/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 10/100
18625/18625 [==============================] - 25s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 11/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 12/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
I am building a training model for my character recognition system. During every epoch I get the same accuracy, and it doesn't improve. I currently have 4000 training images and 77 validation images.
My model is as follows:
inputs = Input(shape=(32, 32, 3))
x = Conv2D(filters=64, kernel_size=5, activation='relu')(inputs)
x = MaxPooling2D()(x)
x = Conv2D(filters=32, kernel_size=3, activation='relu')(x)
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
outputs = Dense(1, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
data_gen_train = ImageDataGenerator(rescale=1/255)
data_gen_test=ImageDataGenerator(rescale=1/255)
data_gen_valid = ImageDataGenerator(rescale=1/255)
train_generator = data_gen_train.flow_from_directory(directory=r"./drive/My Drive/train_dataset",
                                                     target_size=(32, 32), batch_size=10, class_mode="binary")
valid_generator = data_gen_valid.flow_from_directory(directory=r"./drive/My Drive/validation_dataset",
                                                     target_size=(32, 32), batch_size=2, class_mode="binary")
test_generator = data_gen_test.flow_from_directory(
    directory=r"./drive/My Drive/test_dataset",
    target_size=(32, 32),
    batch_size=6,
    class_mode="binary"
)
model.fit(
    train_generator,
    epochs=10,
    steps_per_epoch=400,
    validation_steps=37,
    validation_data=valid_generator)
The result is as follows:
Found 4000 images belonging to 2 classes.
Found 77 images belonging to 2 classes.
Found 6 images belonging to 2 classes.
Epoch 1/10
400/400 [==============================] - 14s 35ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 2/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 3/10
400/400 [==============================] - 13s 34ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 4/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 5/10
400/400 [==============================] - 18s 46ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5541
Epoch 6/10
400/400 [==============================] - 13s 34ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 7/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 8/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5946
Epoch 9/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 10/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
<tensorflow.python.keras.callbacks.History at 0x7fa3a5f4a8d0>
If you are trying to recognize characters of 2 classes, you should:
use class_mode="binary" in the flow_from_directory function
use binary_crossentropy as the loss
make your last layer 1 neuron with a sigmoid activation function
In case there are more than 2 classes:
do not use class_mode="binary" in the flow_from_directory function
use categorical_crossentropy as the loss
make your last layer n neurons with a softmax activation, where n stands for the number of classes
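As a minimal sketch of both cases, reusing the inputs and x tensors from the question (n here is a placeholder for the number of classes):
# 2 classes: one sigmoid neuron, binary_crossentropy, class_mode="binary"
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# more than 2 classes: n softmax neurons, categorical_crossentropy,
# and class_mode="categorical" in flow_from_directory
# outputs = Dense(n, activation='softmax')(x)
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])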
The model that I am using is this:
from keras.layers import (Input, MaxPooling1D, Dropout,
                          BatchNormalization, Activation, Add,
                          Flatten, Conv1D, Dense)
from keras.models import Model
import numpy as np

class ResidualUnit(object):
    """References
    ----------
    .. [1] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks,"
           arXiv:1603.05027 [cs], Mar. 2016. https://arxiv.org/pdf/1603.05027.pdf.
    .. [2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference
           on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. https://arxiv.org/pdf/1512.03385.pdf
    """

    def __init__(self, n_samples_out, n_filters_out, kernel_initializer='he_normal',
                 dropout_rate=0.8, kernel_size=17, preactivation=True,
                 postactivation_bn=False, activation_function='relu'):
        self.n_samples_out = n_samples_out
        self.n_filters_out = n_filters_out
        self.kernel_initializer = kernel_initializer
        self.dropout_rate = dropout_rate
        self.kernel_size = kernel_size
        self.preactivation = preactivation
        self.postactivation_bn = postactivation_bn
        self.activation_function = activation_function

    def _skip_connection(self, y, downsample, n_filters_in):
        """Implement skip connection."""
        # Deal with downsampling
        if downsample > 1:
            y = MaxPooling1D(downsample, strides=downsample, padding='same')(y)
        elif downsample == 1:
            y = y
        else:
            raise ValueError("Number of samples should always decrease.")
        # Deal with n_filters dimension increase
        if n_filters_in != self.n_filters_out:
            # This is one of the two alternatives presented in the ResNet paper;
            # the other option is to just fill the matrix with zeros.
            y = Conv1D(self.n_filters_out, 1, padding='same',
                       use_bias=False,
                       kernel_initializer=self.kernel_initializer)(y)
        return y

    def _batch_norm_plus_activation(self, x):
        if self.postactivation_bn:
            x = Activation(self.activation_function)(x)
            x = BatchNormalization(center=False, scale=False)(x)
        else:
            x = BatchNormalization()(x)
            x = Activation(self.activation_function)(x)
        return x

    def __call__(self, inputs):
        """Residual unit."""
        x, y = inputs
        n_samples_in = y.shape[1]
        downsample = n_samples_in // self.n_samples_out
        n_filters_in = y.shape[2]
        y = self._skip_connection(y, downsample, n_filters_in)
        # 1st layer
        x = Conv1D(self.n_filters_out, self.kernel_size, padding='same',
                   use_bias=False,
                   kernel_initializer=self.kernel_initializer)(x)
        x = self._batch_norm_plus_activation(x)
        if self.dropout_rate > 0:
            x = Dropout(self.dropout_rate)(x)
        # 2nd layer
        x = Conv1D(self.n_filters_out, self.kernel_size, strides=downsample,
                   padding='same', use_bias=False,
                   kernel_initializer=self.kernel_initializer)(x)
        if self.preactivation:
            x = Add()([x, y])  # Sum skip connection and main connection
            y = x
            x = self._batch_norm_plus_activation(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
        else:
            x = BatchNormalization()(x)
            x = Add()([x, y])  # Sum skip connection and main connection
            x = Activation(self.activation_function)(x)
            if self.dropout_rate > 0:
                x = Dropout(self.dropout_rate)(x)
            y = x
        return [x, y]
# ----- Model ----- #
kernel_size = 16
kernel_initializer = 'he_normal'
signal = Input(shape=(1000, 12), dtype=np.float32, name='signal')
age_range = Input(shape=(6,), dtype=np.float32, name='age_range')
is_male = Input(shape=(1,), dtype=np.float32, name='is_male')
x = signal
x = Conv1D(64, kernel_size, padding='same', use_bias=False,
           kernel_initializer=kernel_initializer)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x, y = ResidualUnit(512, 128, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer)([x, x])
x, y = ResidualUnit(256, 196, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer)([x, y])
x, y = ResidualUnit(64, 256, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer)([x, y])
x, _ = ResidualUnit(16, 320, kernel_size=kernel_size,
                    kernel_initializer=kernel_initializer)([x, y])
x = Flatten()(x)
diagn = Dense(2, activation='sigmoid', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.summary()
# ----- Train ----- #
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
loss = 'binary_crossentropy'
lr = 0.001
batch_size = 64
opt = Adam(learning_rate=0.001)
callbacks = [ReduceLROnPlateau(monitor='val_loss',
                               factor=0.1,
                               patience=7,
                               min_lr=lr / 100)]
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=70,
                    initial_epoch=0,
                    validation_split=0.1,
                    shuffle='batch',
                    callbacks=callbacks,
                    verbose=1)
# Save final result
model.save("./final_model_middle_one.hdf5")
When I substitute tf.keras for Keras, which I need in order to use the qkeras library, the model doesn't learn and gets stuck at a much lower accuracy on every iteration. What could be causing this?
When I use keras, the accuracy starts high at 83% and slightly increases during training:
Train on 17340 samples, validate on 1927 samples
Epoch 1/70
17340/17340 [==============================] - 33s 2ms/step - loss: 0.3908 - accuracy: 0.8314 - val_loss: 0.3283 - val_accuracy: 0.8710
Epoch 2/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3641 - accuracy: 0.8416 - val_loss: 0.3340 - val_accuracy: 0.8612
Epoch 3/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3525 - accuracy: 0.8483 - val_loss: 0.3847 - val_accuracy: 0.8550
Epoch 4/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3354 - accuracy: 0.8563 - val_loss: 0.4641 - val_accuracy: 0.8215
Epoch 5/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3269 - accuracy: 0.8590 - val_loss: 0.7172 - val_accuracy: 0.7870
Epoch 6/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3202 - accuracy: 0.8630 - val_loss: 0.3599 - val_accuracy: 0.8617
Epoch 7/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3101 - accuracy: 0.8678 - val_loss: 0.2659 - val_accuracy: 0.8934
Epoch 8/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.3058 - accuracy: 0.8688 - val_loss: 0.5683 - val_accuracy: 0.8293
Epoch 9/70
17340/17340 [==============================] - 31s 2ms/step - loss: 0.2980 - accuracy: 0.8739 - val_loss: 0.3442 - val_accuracy: 0.8643
Epoch 10/70
7424/17340 [===========>..................] - ETA: 17s - loss: 0.2966 - accuracy: 0.8707
When I use tf.keras the accuracy starts at 50% and does not increase considerably during training:
Epoch 1/70
271/271 [==============================] - 30s 110ms/step - loss: 0.9325 - accuracy: 0.5093 - val_loss: 0.6973 - val_accuracy: 0.5470 - lr: 0.0010
Epoch 2/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8424 - accuracy: 0.5157 - val_loss: 0.6660 - val_accuracy: 0.6528 - lr: 0.0010
Epoch 3/70
271/271 [==============================] - 29s 108ms/step - loss: 0.8066 - accuracy: 0.5213 - val_loss: 0.6441 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 4/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7884 - accuracy: 0.5272 - val_loss: 0.6649 - val_accuracy: 0.6559 - lr: 0.0010
Epoch 5/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7888 - accuracy: 0.5368 - val_loss: 0.6899 - val_accuracy: 0.5760 - lr: 0.0010
Epoch 6/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7617 - accuracy: 0.5304 - val_loss: 0.6641 - val_accuracy: 0.6533 - lr: 0.0010
Epoch 7/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7485 - accuracy: 0.5333 - val_loss: 0.6450 - val_accuracy: 0.6544 - lr: 0.0010
Epoch 8/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7431 - accuracy: 0.5382 - val_loss: 0.6599 - val_accuracy: 0.6539 - lr: 0.0010
Epoch 9/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7336 - accuracy: 0.5421 - val_loss: 0.6532 - val_accuracy: 0.6554 - lr: 0.0010
Epoch 10/70
271/271 [==============================] - 29s 108ms/step - loss: 0.7274 - accuracy: 0.5379 - val_loss: 0.6753 - val_accuracy: 0.6492 - lr: 0.0010
The only lines that changed between the two trials are the imports, where I added 'tensorflow.' in front of the keras modules. I don't know why the results are so different; could it be due to different default values of certain parameters?
It might be related to how the accuracy metric is computed in keras vs tf.keras. As far as I can tell the accuracy function is usually used when you have one-hot-encoded output. However, it seems that you are outputting two values [A, B] with a sigmoid function applied to each value.
Since I don't know the labels you're using, there might be two cases:
a) You want to predict A or B. If so, I would change the activation function to softmax.
b) You want to predict A vs. not A, and B vs. not B. In this case I would modify the output tensor to have two heads, each with two values: head_A = [A, not_A] and head_B = [B, not_B]. I would then one-hot-encode the labels accordingly, and then I would assume you could use the accuracy metric.
Alternatively, you can create a custom metric that is appropriate to your output shape.
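For case (a), a minimal sketch of the change, assuming the two label columns are mutually exclusive one-hot pairs; only the output layer and the matching loss change, everything else in the model stays the same:
# case (a): A vs. B are mutually exclusive -> softmax over the 2 units
diagn = Dense(2, activation='softmax', kernel_initializer=kernel_initializer)(x)
model = Model(signal, diagn)
model.compile(optimizer=opt,
              loss='categorical_crossentropy',  # labels one-hot encoded as [1, 0] / [0, 1]
              metrics=['accuracy'])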
I have a similar (same?) problem. I was manipulating some examples from Kaggle and was unable to save the model using keras. After much Googling I realised that I needed to use tensorflow.keras. This solved my problem, but the 60000 data items I was using for training dropped to a reported 1875, although the error was still 10%.
1875 * 32 = 60000.
This is my fit.
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
callbacks=[early_stopping_monitor])
1539/1875 [=======================>......] - ETA: 3s - loss: 0.4445 - accuracy: 0.8418
It turns out that fit defaults to a batch size of 32. If I increase the batch size to 64, I get half the reported count, which makes sense:
model.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), epochs=epochs, verbose=True,
callbacks=[early_stopping_monitor])
938/938 [==============================] - 16s 17ms/step - loss: 0.4568 - accuracy: 0.8388
I noticed from your code that you've set batch_size to 64, and your reported count drops from 17340 to 271, which is about a 64th. This must also affect your accuracy, given the data you are using.
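As a quick sanity check of that arithmetic (the progress bar counts batches per epoch, not samples):
import math
print(math.ceil(60000 / 32))  # 1875 batches at the default batch size of 32
print(math.ceil(60000 / 64))  # 938 batches at batch_size=64
print(math.ceil(17340 / 64))  # 271 batches, matching the 271 reported above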
From the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
batch_size
Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of a dataset, generators, or keras.utils.Sequence instances (since they generate batches).
The Keras docs (https://keras.rstudio.com/reference/fit.html) also say that the batch size defaults to 32; it must just be that the progress shown while training counts batches rather than samples.
Hope this helps.