Validation accuracy not increasing, overfitting? - python

I created a simple LSTM model but my validation accuracy always revolves around 50 no matter how many epochs I use. Here's how it looks compared to training accuracy:
Epoch 15/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.9408 - accuracy: 0.7999 - val_loss: 3.5255 - val_accuracy: 0.5190
Epoch 16/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.8724 - accuracy: 0.8080 - val_loss: 3.6279 - val_accuracy: 0.5127
Epoch 17/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.8041 - accuracy: 0.8177 - val_loss: 3.6627 - val_accuracy: 0.5158
Epoch 18/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.7377 - accuracy: 0.8297 - val_loss: 3.7247 - val_accuracy: 0.5140
Epoch 19/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.6680 - accuracy: 0.8431 - val_loss: 3.8000 - val_accuracy: 0.5144
Epoch 20/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.6036 - accuracy: 0.8578 - val_loss: 3.9164 - val_accuracy: 0.5051
Epoch 21/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.5460 - accuracy: 0.8715 - val_loss: 3.9832 - val_accuracy: 0.5089
Epoch 22/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.4830 - accuracy: 0.8872 - val_loss: 4.0284 - val_accuracy: 0.5095
Epoch 23/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.4277 - accuracy: 0.9019 - val_loss: 4.1428 - val_accuracy: 0.5067
Epoch 24/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.3760 - accuracy: 0.9169 - val_loss: 4.1972 - val_accuracy: 0.5069
Epoch 25/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.3319 - accuracy: 0.9275 - val_loss: 4.2494 - val_accuracy: 0.5047
Epoch 26/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2883 - accuracy: 0.9406 - val_loss: 4.3047 - val_accuracy: 0.5075
Epoch 27/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2471 - accuracy: 0.9507 - val_loss: 4.3822 - val_accuracy: 0.5063
Epoch 28/50
2527/2527 [==============================] - 22s 9ms/step - loss: 0.2131 - accuracy: 0.9592 - val_loss: 4.4553 - val_accuracy: 0.5071
I think it might be overfitting but I have doubts about validation_split function that Keras provides. Does it even shuffle the data?
Anyways, here's my full code from start, even how I take input so u can tell me how I can modify it, from batch size to last nodes size etc. Please take a look and tell me how it can be optimised so my validation accuracy can improve.
BATCH_SIZE = 64
EPOCHS = 50
LSTM_NODES =256
NUM_SENTENCES = 3000
MAX_SENTENCE_LENGTH = 50
MAX_NUM_WORDS = 3000
EMBEDDING_SIZE = 100
input_sentences = []
output_sentences = []
output_sentences_inputs = []
count = 0
for line in open(r'/content/drive/My Drive/TEMPPP/123.txt', encoding="utf-8"):
count += 1
if count > NUM_SENTENCES:
break
if '\t' not in line:
continue
input_sentence, output = line.rstrip().split('\t')
output_sentence = output + ' <eos>'
output_sentence_input = '<sos> ' + output
input_sentences.append(input_sentence)
output_sentences.append(output_sentence)
output_sentences_inputs.append(output_sentence_input)
input_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
input_tokenizer.fit_on_texts(input_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(input_sentences)
word2idx_inputs = input_tokenizer.word_index
max_input_len = max(len(sen) for sen in input_integer_seq)
output_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS, filters='')
output_tokenizer.fit_on_texts(output_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(output_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)
word2idx_outputs = output_tokenizer.word_index
num_words_output = len(word2idx_outputs) + 1
max_out_len = max(len(sen) for sen in output_integer_seq)
encoder_input_sequences = pad_sequences(input_integer_seq, maxlen=max_input_len)
decoder_input_sequences = pad_sequences(output_input_integer_seq, maxlen=max_out_len, padding='post')
import numpy as np
read_dictionary = np.load('/content/drive/My Drive/TEMPPP/hinvec.npy',allow_pickle='TRUE').item()
num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_inputs.items():
embedding_vector = read_dictionary.get(word)
if embedding_vector is not None:
embedding_matrix[index] = embedding_vector
embedding_layer = Embedding(num_words, EMBEDDING_SIZE, weights=[embedding_matrix], input_length=max_input_len)
decoder_targets_one_hot = np.zeros((
len(input_sentences),
max_out_len,
num_words_output
),
dtype='float32'
)
decoder_output_sequences = pad_sequences(output_integer_seq, maxlen=max_out_len, padding='post')
for i, d in enumerate(decoder_output_sequences):
for t, word in enumerate(d):
decoder_targets_one_hot[i, t, word] = 1
encoder_inputs_placeholder = Input(shape=(max_input_len,))
x = embedding_layer(encoder_inputs_placeholder)
encoder = LSTM(LSTM_NODES, return_state=True)
encoder_outputs, h, c = encoder(x)
encoder_states = [h, c]
decoder_inputs_placeholder = Input(shape=(max_out_len,))
decoder_embedding = Embedding(num_words_output, LSTM_NODES)
decoder_inputs_x = decoder_embedding(decoder_inputs_placeholder)
decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)
decoder_dense = Dense(num_words_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
import tensorflow as tf
starter_learning_rate = 0.1
end_learning_rate = 0.01
decay_steps = 2000
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
starter_learning_rate,
decay_steps,
end_learning_rate,
power=0.5)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn, epsilon=1e-03, clipvalue=0.5)
model = Model([encoder_inputs_placeholder,
decoder_inputs_placeholder],
decoder_outputs)
model.compile(
optimizer=opt,
loss='categorical_crossentropy',
metrics=['accuracy']
)
history = model.fit(
[encoder_input_sequences, decoder_input_sequences],
decoder_targets_one_hot,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_split=0.1,
)
I tried to add dropout layer but I couldn't add it between LSTM layer and dense layer. And I have doubts about validation_split. I tried to split dataset in train_test_set and valid_test_set but count make it work and ended up sticking with validation_split. Im pretty sure this is case of overfitting but not able to deal with it.

Related

tf.keras model very low accuracy and zero loss

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer(num_words = 5408)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index
vocab_size = len(tokenizer.word_index)
training_sequences = tokenizer.texts_to_sequences(training_sentences)
training_padded = pad_sequences(training_sequences, padding='post', truncating='post', maxlen = 30)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, padding='post', truncating='post', maxlen = 30)
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, 1400),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.GlobalAveragePooling1D(),
tf.keras.layers.Dense(34, activation='softmax'),
tf.keras.layers.Dense(50, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
history = model.fit(training_padded, training_labels, epochs = 5,
validation_data=(testing_padded, testing_labels), verbose=2)
Epoch 1/5
44/44 - 4s - loss: 0.0000e+00 - accuracy: 0.0579 - val_loss: 0.0000e+00 - val_accuracy: 0.0379 - 4s/epoch - 97ms/step
Epoch 2/5
44/44 - 3s - loss: 0.0000e+00 - accuracy: 0.0579 - val_loss: 0.0000e+00 - val_accuracy: 0.0379 - 3s/epoch - 77ms/step
Epoch 3/5
44/44 - 3s - loss: 0.0000e+00 - accuracy: 0.0579 - val_loss: 0.0000e+00 - val_accuracy: 0.0379 - 3s/epoch - 69ms/step
Epoch 4/5
44/44 - 3s - loss: 0.0000e+00 - accuracy: 0.0579 - val_loss: 0.0000e+00 - val_accuracy: 0.0379 - 3s/epoch - 69ms/step
Epoch 5/5
44/44 - 3s - loss: 0.0000e+00 - accuracy: 0.0579 - val_loss: 0.0000e+00 - val_accuracy: 0.0379 - 3s/epoch - 75ms/step
My dataset consists of texts with 17 classes. I have preprocessed it by doing stop word removal, punctuation removal, lowercasing. Is the extremely low accuracy due to a problem in the code?

Preprocessing for Transfer Learning Model with an Inception Network

I am trying to build an image classification model using an Inception Network as the base. This is a simple binary classification model.
My images are available in many smaller directories within one big directory. Each of them has its own 'image id' and that is how they have been named. In addition to this, I have a few tsv files which contain these image ids and the respective labels ('Positive' or 'Negative').
When I train the model, I see that my accuracy fluctuates without much progress. I was wondering if there is anything wrong with the way that I have prepared my dataset. I have written a few functions for this purpose.
Before I get to these functions, given below is how I have defined my model,
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(2, activation='sigmoid')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
These are the functions that I have written in order to prepare my data,
def vectorize_img(img_path):
img = load_img(img_path, target_size=(224, 224)) #size is 224,224 by default
x = img_to_array(img) #change to np array
x = preprocess_input(x) #make input confirm to InceptionV3 input format
return x
def prepare_features(base_dir, limit):
features_dict = dict()
for dir1 in os.listdir(base_dir):
for dir2 in os.listdir(base_dir + dir1):
for file in os.listdir(base_dir + dir1 + '/' + dir2):
if len(features_dict) < limit:
try:
img_path = base_dir + dir1 + '/' + dir2 + '/' + file
x = vectorize_img(img_path)
name_id = file.split('.')[0] #take the file name and use as id in dict
features_dict[name_id] = x
except Exception as e:
print(e)
pass
return features_dict
def prepare_data(file_path, features_dict):
inputs = []
labels = []
df = pd.read_csv(file_path, sep='\t')
df = df[['image_id', 'label_text_image']]
df['class'] = df[['image_id', 'label_text_image']].apply(lambda x: 1 if x['label_text_image'] == 'Positive' else 0, axis = 1)
for index, row in df.iterrows():
try:
inputs.append(features_dict[row['image_id']])
labels.append(row['class'])
except:
pass
return np.asarray(inputs), tf.one_hot(np.asarray(labels), depth=2)
These functions are then called to prepare my dataset,
features_dict = prepare_features('/path/to/img/dir', 8000)
x_train, y_train = prepare_data('/path/to/train/tsv', features_dict)
x_dev, y_dev = prepare_data('/path/to/dev/tsv', features_dict)
x_test, y_test = prepare_data('/path/to/test/tsv', features_dict)
Finally, the model is trained,
EPOCHS = 50
BATCH_SIZE = 32
STEPS_PER_EPOCH = 1
history = model.fit(x=x_train, y=y_train, validation_data=(x_dev, y_dev), epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, batch_size=BATCH_SIZE)
model.evaluate(x=x_test, y=y_test, batch_size=BATCH_SIZE)
Am I doing something wrong?
Here are the results that my model achieves,
Epoch 1/50
1/1 [==============================] - 158s 158s/step - loss: 0.8298 - accuracy: 0.5000 - val_loss: 0.7432 - val_accuracy: 0.5227
Epoch 2/50
1/1 [==============================] - 113s 113s/step - loss: 0.7775 - accuracy: 0.4688 - val_loss: 0.8225 - val_accuracy: 0.5153
Epoch 3/50
1/1 [==============================] - 113s 113s/step - loss: 0.7663 - accuracy: 0.5625 - val_loss: 0.8431 - val_accuracy: 0.5174
Epoch 4/50
1/1 [==============================] - 156s 156s/step - loss: 1.1292 - accuracy: 0.5312 - val_loss: 0.7763 - val_accuracy: 0.5227
Epoch 5/50
1/1 [==============================] - 114s 114s/step - loss: 0.7452 - accuracy: 0.5312 - val_loss: 0.7332 - val_accuracy: 0.5448
Epoch 6/50
1/1 [==============================] - 156s 156s/step - loss: 0.7884 - accuracy: 0.5312 - val_loss: 0.7072 - val_accuracy: 0.5606
Epoch 7/50
1/1 [==============================] - 114s 114s/step - loss: 0.7856 - accuracy: 0.5312 - val_loss: 0.7195 - val_accuracy: 0.5764
Epoch 8/50
1/1 [==============================] - 156s 156s/step - loss: 0.9203 - accuracy: 0.5312 - val_loss: 0.7348 - val_accuracy: 0.5616
Epoch 9/50
1/1 [==============================] - 156s 156s/step - loss: 0.8639 - accuracy: 0.4062 - val_loss: 0.7275 - val_accuracy: 0.5690
Epoch 10/50
1/1 [==============================] - 156s 156s/step - loss: 0.6170 - accuracy: 0.7188 - val_loss: 0.7125 - val_accuracy: 0.5880
Epoch 11/50
1/1 [==============================] - 156s 156s/step - loss: 0.5756 - accuracy: 0.7188 - val_loss: 0.6979 - val_accuracy: 0.6017
Epoch 12/50
1/1 [==============================] - 113s 113s/step - loss: 0.9976 - accuracy: 0.4375 - val_loss: 0.6834 - val_accuracy: 0.5933
Epoch 13/50
1/1 [==============================] - 156s 156s/step - loss: 0.7025 - accuracy: 0.5938 - val_loss: 0.6863 - val_accuracy: 0.5838
You mentioned that it is a binary classification hence labels are {0,1}. In this case your model output should either be
predictions = Dense(2, activation='softmax')(x)
with categorical labels [0,1] or [1,0]
or
predictions = Dense(1, activation='sigmoid')(x)
with binary label 1 or 0
but you are using output 2 with sigmoid i.e. predictions = Dense(2, activation='sigmoid')(x).

No change in train & test accuracy and loss

I'm tring to use CNN to classifiy 3 classes data, every data is 30*188. Class1 has 5794 data, class2 has 8471, class3 has 9092. When I train my model, the value of accuracy, loss , val_acc and val_loss don't change.
Please help me to solve this problem.
import glob
import os
import librosa
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram
import librosa.display
import sklearn
from keras.utils import to_categorical
import scipy.io as scio
path1 = 'class1_feature_array.mat'
data1 = scio.loadmat(path1)
class1_feature_array = data1['class1_feature_array']
class1_label = np.zeros((class1_feature_array.shape[0],))
class1_label=class1_label.astype(np.int32)
class1_label=class1_label.astype(np.str)
path2 = 'class2_feature_array.mat'
data2 = scio.loadmat(path2)
class2_feature_array = data2['class2_feature_array']
class2_label = np.ones((class2_feature_array.shape[0],))
class2_label=class2_label.astype(np.int32)
class2_label=class2_label.astype(np.str)
path3 = 'class3_feature_array.mat'
data3 = scio.loadmat(path3)
class3_feature_array = data3['class3_feature_array']
class3_label = np.ones((class3_feature_array.shape[0],))*2
class3_label=class3_label.astype(np.int32)
class3_label=class3_label.astype(np.str)
features, labels = np.empty((0,40,188)), np.empty(0)
features = np.append(features,class1_feature_array,axis=0)
features = np.append(features,class2_feature_array,axis=0)
features = np.append(features,class3_feature_array,axis=0)
features = np.array(features)
labels = np.append(labels,class1_label,axis=0)
labels = np.append(labels,class2_label,axis=0)
labels = np.append(labels,class3_label,axis=0)
labels = np.array(labels, dtype = np.int)
def one_hot_encode(labels):
n_labels = len(labels)
n_unique_labels = len(np.unique(labels))
one_hot_encode = np.zeros((n_labels,n_unique_labels))
print("one_hot_encode",one_hot_encode.shape)
one_hot_encode[np.arange(n_labels), labels] = 1
return one_hot_encode
labels = one_hot_encode(labels)
train_test_split = np.random.rand(len(features)) < 0.80
train_x = features[train_test_split]
train_y = labels[train_test_split]
test_x = features[~train_test_split]
test_y = labels[~train_test_split]
train_x = train_x.reshape(train_x.shape[0],train_x.shape[1],train_x.shape[2],1)
test_x = test_x.reshape(test_x.shape[0],test_x.shape[1],test_x.shape[2],1)
import sklearn
import keras
from keras.models import Sequential
from keras.layers import *
from keras.callbacks import LearningRateScheduler
from keras import optimizers
#LeNet
model = Sequential()
model.add(Conv2D(32,(5, 5),strides=(1,1),padding='valid',activation='relu',input_shape=(40,188,1),kernel_initializer='uniform'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,(5,5),strides=(1,1),padding='valid',activation='relu',kernel_initializer='uniform'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(100,activation='relu'))
model.add(Dense(3, activation='softmax'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary(line_length=80)
history = model.fit(train_x, train_y, epochs=100, batch_size=32, validation_data=(test_x, test_y))
The output after training is as shown below:
Train on 18625 samples, validate on 4732 samples
Epoch 1/100
18625/18625 [==============================] - 30s 2ms/step - loss: 8.0138 - accuracy: 0.5001 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 2/100
18625/18625 [==============================] - 22s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 3/100
18625/18625 [==============================] - 23s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 4/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 5/100
18625/18625 [==============================] - 23s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 6/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 7/100
18625/18625 [==============================] - 24s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 8/100
18625/18625 [==============================] - 25s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 9/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 10/100
18625/18625 [==============================] - 25s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 11/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944
Epoch 12/100
18625/18625 [==============================] - 26s 1ms/step - loss: 8.0181 - accuracy: 0.4998 - val_loss: 8.1055 - val_accuracy: 0.4944

Accuracy doesn't improve in training model character recognition

I am building a training model for my character recognition system. During every epochs, I am getting the same accuracy and it doesn't improve. I have currently 4000 training images and 77 validation images.
My model is as follows:
inputs = Input(shape=(32,32,3))
x = Conv2D(filters = 64, kernel_size = 5, activation = 'relu')(inputs)
x = MaxPooling2D()(x)
x = Conv2D(filters = 32,
kernel_size = 3,
activation = 'relu')(x)
x = MaxPooling2D()(x)
x = Flatten()(x)
x=Dense(256,
activation='relu')(x)
outputs = Dense(1, activation = 'softmax')(x)
model = Model(inputs = inputs, outputs = outputs)
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
data_gen_train = ImageDataGenerator(rescale=1/255)
data_gen_test=ImageDataGenerator(rescale=1/255)
data_gen_valid = ImageDataGenerator(rescale=1/255)
train_generator = data_gen_train.flow_from_directory(directory=r"./drive/My Drive/train_dataset",
target_size=(32,32), batch_size=10, class_mode="binary")
valid_generator = data_gen_valid.flow_from_directory(directory=r"./drive/My
Drive/validation_dataset", target_size=(32,32), batch_size=2, class_mode="binary")
test_generator = data_gen_test.flow_from_directory(
directory=r"./drive/My Drive/test_dataset",
target_size=(32, 32),
batch_size=6,
class_mode="binary"
)
model.fit(
train_generator,
epochs =10,
steps_per_epoch=400,
validation_steps=37,
validation_data=valid_generator)
The result is as follows:
Found 4000 images belonging to 2 classes.
Found 77 images belonging to 2 classes.
Found 6 images belonging to 2 classes.
Epoch 1/10
400/400 [==============================] - 14s 35ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 2/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 3/10
400/400 [==============================] - 13s 34ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 4/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 5/10
400/400 [==============================] - 18s 46ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5541
Epoch 6/10
400/400 [==============================] - 13s 34ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 7/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5676
Epoch 8/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5946
Epoch 9/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
Epoch 10/10
400/400 [==============================] - 13s 33ms/step - loss: 0.0000e+00 - accuracy: 0.5000 - val_loss: 0.0000e+00 - val_accuracy: 0.5811
<tensorflow.python.keras.callbacks.History at 0x7fa3a5f4a8d0>
If you are trying to recognize charaters of 2 classes, you should:
use class_mode="binary" in the flow_from_directory function
use binary_crossentropy as loss
your last layer must have 1 neuron with sigmoid activation function
In case there are more than 2 classes:
do not use class_mode="binary" in the flow_from_directory function
use categorical_crossentropy as loss
your last layer must have n neurons with softmax activation, where n stands for the number of classes

Different converging for different training data representations (Numpy array and TensorFlow Dataset API)

I am using Keras with TensorFlow backend to train an LSTM network for some time-sequential data sets. The performance seems pretty good when I represent my training data (as well as the validation data) in the Numpy array format:
train_x.shape: (128346, 10, 34)
val_x.shape: (7941, 10, 34)
test_x.shape: (24181, 10, 34)
train_y.shape: (128346, 2)
val_y.shape: (7941, 2)
test_y.shape: (24181, 2)
P.s., 10 is the time steps and 34 is the number of features; The labels were one-hot encoded.
model = tf.keras.Sequential()
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True,
input_shape=(_TIME_STEPS, _FEATURE_DIMENTIONS)))
model.add(layers.Dropout(0.4))
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True))
model.add(layers.Dropout(0.3))
model.add(layers.TimeDistributed(layers.Dense(_NUM_CLASSES)))
model.add(layers.Flatten())
model.add(layers.Dense(_NUM_CLASSES, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr = _LR)
model.compile(optimizer = opt, loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
And the training results are:
Train on 128346 samples, validate on 7941 samples
Epoch 1/10
128346/128346 [==============================] - 50s 390us/step - loss: 0.5883 - acc: 0.6975 - val_loss: 0.5242 - val_acc: 0.7416
Epoch 2/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4804 - acc: 0.7687 - val_loss: 0.4265 - val_acc: 0.8014
Epoch 3/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4232 - acc: 0.8076 - val_loss: 0.4095 - val_acc: 0.8096
Epoch 4/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3894 - acc: 0.8276 - val_loss: 0.3529 - val_acc: 0.8469
Epoch 5/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3610 - acc: 0.8430 - val_loss: 0.3283 - val_acc: 0.8593
Epoch 6/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3402 - acc: 0.8525 - val_loss: 0.3334 - val_acc: 0.8558
Epoch 7/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3233 - acc: 0.8604 - val_loss: 0.2944 - val_acc: 0.8741
Epoch 8/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3087 - acc: 0.8663 - val_loss: 0.2786 - val_acc: 0.8805
Epoch 9/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2969 - acc: 0.8709 - val_loss: 0.2785 - val_acc: 0.8777
Epoch 10/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2867 - acc: 0.8757 - val_loss: 0.2590 - val_acc: 0.8877
This log seems pretty normal, but when I tried to use TensorFlow Dataset API to represent my data sets, the training process performed very strange (it seems that the model turns to overfit/underfit?):
def tfdata_generator(features, labels, is_training = False, batch_size = _BATCH_SIZE, epoch = _EPOCH):
dataset = tf.data.Dataset.from_tensor_slices((features, tf.cast(labels, dtype = tf.uint8)))
if is_training:
dataset = dataset.shuffle(10000) # depends on sample size
dataset = dataset.batch(batch_size, drop_remainder = True).repeat(epoch).prefetch(batch_size)
return dataset
training_set = tfdata_generator(train_x, train_y, is_training=True)
validation_set = tfdata_generator(val_x, val_y, is_training=False)
testing_set = tfdata_generator(test_x, test_y, is_training=False)
Training on the same model and hyperparameters:
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
And the log seems much different from the previous one:
Epoch 1/10
2005/2005 [==============================] - 54s 27ms/step - loss: 0.1451 - acc: 0.9419 - val_loss: 3.2980 - val_acc: 0.4975
Epoch 2/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1675 - acc: 0.9371 - val_loss: 3.0838 - val_acc: 0.4975
Epoch 3/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1821 - acc: 0.9316 - val_loss: 3.1212 - val_acc: 0.4975
Epoch 4/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1902 - acc: 0.9287 - val_loss: 3.0032 - val_acc: 0.4975
Epoch 5/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1905 - acc: 0.9283 - val_loss: 2.9671 - val_acc: 0.4975
Epoch 6/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1867 - acc: 0.9299 - val_loss: 2.8734 - val_acc: 0.4975
Epoch 7/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1802 - acc: 0.9316 - val_loss: 2.8651 - val_acc: 0.4975
Epoch 8/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1740 - acc: 0.9350 - val_loss: 2.8793 - val_acc: 0.4975
Epoch 9/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1660 - acc: 0.9388 - val_loss: 2.7894 - val_acc: 0.4975
Epoch 10/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1613 - acc: 0.9405 - val_loss: 2.7997 - val_acc: 0.4975
The validation loss could not be reduced and the val_acc always the same value when I use the TensorFlow Dataset API to represent my data.
My questions are:
Based on the same model and parameters, why the model.fit() provides such different training results when I merely adopted tf.data.Dataset API?
What the difference between these two mechanisms?
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
vs
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
How to solve this strange problem if I have to use tf.data.Dataset API?

Categories

Resources