My accuracy is stuck at 0.5. I have already tried varying different parameters, such as the learning rate, optimizer, and loss function, but the accuracy always stays the same. Any ideas how to fix this? This is my code:
import tensorflow as tf
imdb = tf.keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    result = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        result[i, sequence] = 1.
    return result
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10000,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer= tf.keras.optimizers.Adam(learning_rate=0.0001), loss= 'binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
And this is my output:
Epoch 1/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.5047
Epoch 2/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6932 - accuracy: 0.5000
Epoch 3/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 4/10
782/782 [==============================] - 3s 4ms/step - loss: 0.6931 - accuracy: 0.4982
Epoch 5/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.4993
Epoch 6/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.4980
Epoch 7/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 8/10
782/782 [==============================] - 3s 4ms/step - loss: 0.6931 - accuracy: 0.4979
Epoch 9/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.4941
Epoch 10/10
782/782 [==============================] - 3s 3ms/step - loss: 0.6931 - accuracy: 0.4988
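One observation about the log above: a loss of 0.6931 is ln 2, which is the value binary cross-entropy takes when the model predicts 0.5 for every sample, so the network output appears to be constant. A minimal check of that arithmetic in plain Python (the helper `binary_crossentropy` and the four example labels are only for illustration, not part of the code above):

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

labels = [0, 1, 1, 0]                  # illustrative labels
constant_preds = [0.5] * len(labels)   # a model that always outputs 0.5
loss = binary_crossentropy(labels, constant_preds)
print(round(loss, 4))  # 0.6931
```

This does not say why the output is constant, only that the logged loss is consistent with it.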
I am trying to train a model on three machines using TensorFlow's MultiWorkerMirroredStrategy. The script is based on the TensorFlow tutorial "Multi-worker training with Keras" (https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#dataset_sharding_and_batch_size):
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
import os
import json
strategy = tf.distribute.MultiWorkerMirroredStrategy()
BUFFER_SIZE = 10000
BATCH_SIZE = 64
def make_datasets_unbatched():
    # scale MNIST data from (0, 255] to (0., 1.]
    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255
        return image, label
    # data download to /home/pzs/tensorflow_datasets/mnist/
    datasets, info = tfds.load(name='mnist',
                               with_info=True,
                               as_supervised=True)
    return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)
def build_and_compile_cnn_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
        metrics=['accuracy'])
    return model
NUM_WORKERS = 3
GLOBAL_BATCH_SIZE = 64 * NUM_WORKERS
train_datasets = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_datasets = make_datasets_unbatched().batch(BATCH_SIZE)
train_datasets = train_datasets.with_options(options)
with strategy.scope():
    multi_worker_model = build_and_compile_cnn_model()
multi_worker_model.fit(x=train_datasets, epochs=30, steps_per_epoch=5)
I run this script separately on three nodes:
on node 1:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 0, "type": "worker"}}' python3 multi_worker_with_keras.py
on node 2:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 1, "type": "worker"}}' python3 multi_worker_with_keras.py
on node 3:
TF_CONFIG='{"cluster": {"worker": ["192.168.4.36:12346", "192.168.4.83:12346", "192.168.4.83:12346"]}, "task": {"index": 2, "type": "worker"}}' python3 multi_worker_with_keras.py
The resulting training loss and accuracy are:
Epoch 1/30
2022-02-16 11:52:25.060362: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8201
5/5 [==============================] - 7s 195ms/step - loss: 2.3010 - accuracy: 0.0719
Epoch 2/30
5/5 [==============================] - 1s 181ms/step - loss: 2.2984 - accuracy: 0.0688
Epoch 3/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2993 - accuracy: 0.0781
Epoch 4/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2917 - accuracy: 0.0594
Epoch 5/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2987 - accuracy: 0.0969
Epoch 6/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2992 - accuracy: 0.0906
Epoch 7/30
5/5 [==============================] - 1s 181ms/step - loss: 2.2978 - accuracy: 0.1000
Epoch 8/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2887 - accuracy: 0.0969
Epoch 9/30
5/5 [==============================] - 1s 182ms/step - loss: 2.2887 - accuracy: 0.0969
Epoch 10/30
5/5 [==============================] - 1s 183ms/step - loss: 2.2930 - accuracy: 0.0844
Epoch 11/30
5/5 [==============================] - 1s 184ms/step - loss: 2.2905 - accuracy: 0.1000
Epoch 12/30
5/5 [==============================] - 1s 184ms/step - loss: 2.2884 - accuracy: 0.0812
Epoch 13/30
5/5 [==============================] - 1s 186ms/step - loss: 2.2837 - accuracy: 0.1250
Epoch 14/30
5/5 [==============================] - 1s 189ms/step - loss: 2.2842 - accuracy: 0.1094
Epoch 15/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2856 - accuracy: 0.0750
Epoch 16/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2911 - accuracy: 0.0719
Epoch 17/30
5/5 [==============================] - 1s 188ms/step - loss: 2.2805 - accuracy: 0.1031
Epoch 18/30
5/5 [==============================] - 1s 187ms/step - loss: 2.2800 - accuracy: 0.1219
Epoch 19/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2799 - accuracy: 0.1063
Epoch 20/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2769 - accuracy: 0.1187
Epoch 21/30
5/5 [==============================] - 1s 193ms/step - loss: 2.2768 - accuracy: 0.1344
Epoch 22/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2754 - accuracy: 0.1187
Epoch 23/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2821 - accuracy: 0.1187
Epoch 24/30
5/5 [==============================] - 1s 188ms/step - loss: 2.2832 - accuracy: 0.0844
Epoch 25/30
5/5 [==============================] - 1s 190ms/step - loss: 2.2793 - accuracy: 0.1125
Epoch 26/30
5/5 [==============================] - 1s 191ms/step - loss: 2.2762 - accuracy: 0.1406
Epoch 27/30
5/5 [==============================] - 1s 194ms/step - loss: 2.2696 - accuracy: 0.1344
Epoch 28/30
5/5 [==============================] - 1s 192ms/step - loss: 2.2717 - accuracy: 0.1406
Epoch 29/30
5/5 [==============================] - 1s 191ms/step - loss: 2.2680 - accuracy: 0.1500
Epoch 30/30
5/5 [==============================] - 1s 193ms/step - loss: 2.2696 - accuracy: 0.1500
All results are exactly the same on the 3 nodes.
My question is:
When using tf.distribute.MultiWorkerMirroredStrategy to train a model across multiple machines, each process does forward and backward propagation independently on a different slice of the training batch. Why, then, are the training errors identical on all 3 nodes for each epoch? I ran a different script and saw the same behavior.
This is expected. The metric values are all-reduced across the workers inside the fit method.
https://github.com/tensorflow/tensorflow/issues/39343#issuecomment-627008557
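To illustrate why every worker prints the same numbers, here is a toy simulation in plain Python. It is not the TensorFlow implementation; the per-worker loss values and the `all_reduce_mean` helper are made up for illustration. Each worker computes a loss on its own slice, but the value it reports is the result of averaging across all workers:

```python
def all_reduce_mean(local_values):
    """Return the value each worker holds after an all-reduce that averages."""
    mean = sum(local_values) / len(local_values)
    return [mean for _ in local_values]

# Hypothetical per-worker losses, each computed on a different slice of the batch.
local_losses = [2.31, 2.29, 2.30]

# After the all-reduce, every worker reports the same aggregated value.
reported = all_reduce_mean(local_losses)
print(reported)
```

The local values differ, but the list of reported values contains one number repeated three times, which matches the identical logs on the three nodes.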
I'm new to deep learning. I am trying to create a model, but I don't really understand the model.add(layers) calls. I am sure about the input shape (it's for recognition). I think the problem is in the Dropout layer, but I don't understand its value.
Can someone explain the following to me?
model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation = 'relu', input_shape = (128,128,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation = 'relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(6, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(lr=1e-4), metrics=['acc'])
-------------------------------------------------------
history = model.fit(
train_data,
train_labels,
epochs=30,
validation_data=(test_data, test_labels),
)
and here is the result :
Epoch 15/30
5/5 [==============================] - 0s 34ms/step - loss: 0.3987 - acc: 0.8536 - val_loss: 0.7021 - val_acc: 0.7143
Epoch 16/30
5/5 [==============================] - 0s 31ms/step - loss: 0.3223 - acc: 0.8891 - val_loss: 0.6393 - val_acc: 0.7778
Epoch 17/30
5/5 [==============================] - 0s 32ms/step - loss: 0.3321 - acc: 0.9082 - val_loss: 0.6229 - val_acc: 0.7460
Epoch 18/30
5/5 [==============================] - 0s 31ms/step - loss: 0.2615 - acc: 0.9409 - val_loss: 0.6591 - val_acc: 0.8095
Epoch 19/30
5/5 [==============================] - 0s 32ms/step - loss: 0.2161 - acc: 0.9857 - val_loss: 0.6368 - val_acc: 0.7143
Epoch 20/30
5/5 [==============================] - 0s 33ms/step - loss: 0.1773 - acc: 0.9857 - val_loss: 0.5644 - val_acc: 0.7778
Epoch 21/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1650 - acc: 0.9782 - val_loss: 0.5459 - val_acc: 0.8413
Epoch 22/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1534 - acc: 0.9789 - val_loss: 0.5738 - val_acc: 0.7460
Epoch 23/30
5/5 [==============================] - 0s 32ms/step - loss: 0.1205 - acc: 0.9921 - val_loss: 0.5351 - val_acc: 0.8095
Epoch 24/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0967 - acc: 1.0000 - val_loss: 0.5256 - val_acc: 0.8413
Epoch 25/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0736 - acc: 1.0000 - val_loss: 0.5493 - val_acc: 0.7937
Epoch 26/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0826 - acc: 1.0000 - val_loss: 0.5342 - val_acc: 0.8254
Epoch 27/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0687 - acc: 1.0000 - val_loss: 0.5452 - val_acc: 0.8254
Epoch 28/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0571 - acc: 1.0000 - val_loss: 0.5176 - val_acc: 0.7937
Epoch 29/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0549 - acc: 1.0000 - val_loss: 0.5142 - val_acc: 0.8095
Epoch 30/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0479 - acc: 1.0000 - val_loss: 0.5243 - val_acc: 0.8095
I had never got past about 70% on average, but with this model I reach 80%. However, I think I'm overfitting. I have of course searched through various docs, but I'm lost.
Have you tried the following in your training?
Data Augmentation
Pre-trained Model
Looking at the execution time per epoch, it looks like your dataset is pretty small. Also, it's not clear whether there is any class imbalance in your dataset. You should probably try stratified cross-validation (CV) training and analyze the per-fold results. It won't prevent overfitting, but it will give you more insight into your model, which generally helps to reduce overfitting. Preventing overfitting is a broad topic, though, so search online for further resources. You can also try this:
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['acc'])
-------------------------------------------------------
# src: https://keras.io/api/callbacks/reduce_lr_on_plateau/
# reduce the learning rate by a factor of 0.2 if val_loss
# does not improve within 5 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                                 patience=5, min_lr=0.00001)
# src: https://keras.io/api/callbacks/early_stopping/
# stop training if val_loss does not improve within 15 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history = model.fit(
train_data,
train_labels,
epochs=30,
validation_data=(test_data, test_labels),
callbacks=[reduce_lr, early_stop]
)
You may also find ModelCheckpoint or LearningRateScheduler useful. None of this guarantees that the model won't overfit, but these are approaches worth adopting.
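As a rough illustration of the stratified CV suggestion, here is a minimal sketch in plain Python. The `stratified_folds` helper and the integer labels are hypothetical (the question uses one-hot labels, which would first need converting to class indices); in practice scikit-learn's `StratifiedKFold` is the usual tool for this.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds

# Hypothetical integer labels: 6 classes with 10 samples each.
labels = [c for c in range(6) for _ in range(10)]
folds = stratified_folds(labels, n_splits=5)
for k, val_idx in enumerate(folds):
    train_idx = [i for f in folds if f is not val_idx for i in f]
    # A fresh model would be trained on train_idx and evaluated on val_idx here.
    print(k, len(train_idx), len(val_idx))
```

Comparing validation accuracy across the folds gives a sense of how much of the 70% to 84% swing in val_acc is due to the small validation set rather than the model.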
I am fairly new to deep learning and right now am trying to predict consumer choices based on EEG data. The total dataset consists of 1045 EEG recordings each with a corresponding label, indicating Like or Dislike for a product. Classes are distributed as follows (44% Likes and 56% Dislikes). I read that Convolutional Neural Networks are suitable to work with raw EEG data so I tried to implement a network based on keras with the following structure:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(full_data, target, test_size=0.20, random_state=42)
y_train = np.asarray(y_train).astype('float32').reshape((-1,1))
y_test = np.asarray(y_test).astype('float32').reshape((-1,1))
# X_train.shape = ((836, 512, 14))
# y_train.shape = ((836, 1))
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from keras.optimizers import Adam
from keras.optimizers import SGD
model = Sequential()
model.add(Conv1D(16, kernel_size=3, activation="relu", input_shape=(512,14)))
model.add(MaxPooling1D())
model.add(Conv1D(8, kernel_size=3, activation="relu"))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer=Adam(lr = 0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size = 64)
When I fit the model, however, the validation accuracy does not change at all, as the following output shows:
Epoch 1/20
14/14 [==============================] - 0s 32ms/step - loss: 292.6353 - accuracy: 0.5383 - val_loss: 0.7884 - val_accuracy: 0.5407
Epoch 2/20
14/14 [==============================] - 0s 7ms/step - loss: 1.3748 - accuracy: 0.5598 - val_loss: 0.8860 - val_accuracy: 0.5502
Epoch 3/20
14/14 [==============================] - 0s 6ms/step - loss: 1.0537 - accuracy: 0.5598 - val_loss: 0.7629 - val_accuracy: 0.5455
Epoch 4/20
14/14 [==============================] - 0s 6ms/step - loss: 0.8827 - accuracy: 0.5598 - val_loss: 0.7010 - val_accuracy: 0.5455
Epoch 5/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7988 - accuracy: 0.5598 - val_loss: 0.8689 - val_accuracy: 0.5407
Epoch 6/20
14/14 [==============================] - 0s 6ms/step - loss: 1.0221 - accuracy: 0.5610 - val_loss: 0.6961 - val_accuracy: 0.5455
Epoch 7/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7415 - accuracy: 0.5598 - val_loss: 0.6945 - val_accuracy: 0.5455
Epoch 8/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7381 - accuracy: 0.5574 - val_loss: 0.7761 - val_accuracy: 0.5455
Epoch 9/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7326 - accuracy: 0.5598 - val_loss: 0.6926 - val_accuracy: 0.5455
Epoch 10/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7338 - accuracy: 0.5598 - val_loss: 0.6917 - val_accuracy: 0.5455
Epoch 11/20
14/14 [==============================] - 0s 7ms/step - loss: 0.7203 - accuracy: 0.5610 - val_loss: 0.6916 - val_accuracy: 0.5455
Epoch 12/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7192 - accuracy: 0.5610 - val_loss: 0.6914 - val_accuracy: 0.5455
Epoch 13/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7174 - accuracy: 0.5610 - val_loss: 0.6912 - val_accuracy: 0.5455
Epoch 14/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7155 - accuracy: 0.5610 - val_loss: 0.6911 - val_accuracy: 0.5455
Epoch 15/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7143 - accuracy: 0.5610 - val_loss: 0.6910 - val_accuracy: 0.5455
Epoch 16/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7129 - accuracy: 0.5610 - val_loss: 0.6909 - val_accuracy: 0.5455
Epoch 17/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7114 - accuracy: 0.5610 - val_loss: 0.6907 - val_accuracy: 0.5455
Epoch 18/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7103 - accuracy: 0.5610 - val_loss: 0.6906 - val_accuracy: 0.5455
Epoch 19/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7088 - accuracy: 0.5610 - val_loss: 0.6906 - val_accuracy: 0.5455
Epoch 20/20
14/14 [==============================] - 0s 6ms/step - loss: 0.7075 - accuracy: 0.5610 - val_loss: 0.6905 - val_accuracy: 0.5455
Thanks in advance for any insights!
The phenomenon you have run into is called underfitting. It happens when the amount or quality of your training data is insufficient, or when your network architecture is too small to learn the problem.
Try normalizing your input data and experiment with different network architectures, learning rates and activation functions.
As @Muhammad Shahzad stated in his comment, adding some Dense layers after flattening would be a concrete architecture adaptation you should try.
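A minimal sketch of the normalization step, assuming per-channel standardization is appropriate for this data (the first-epoch training loss of 292.6 may hint that the raw inputs are on a large scale). The `standardize_per_channel` helper and the random stand-in arrays are hypothetical; only the shapes come from the question.

```python
import numpy as np

def standardize_per_channel(x_train, x_test, eps=1e-8):
    """Scale each channel to zero mean and unit variance using training statistics only."""
    mean = x_train.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, channels)
    std = x_train.std(axis=(0, 1), keepdims=True)
    return (x_train - mean) / (std + eps), (x_test - mean) / (std + eps)

# Random stand-in data with the shapes from the question: (samples, 512, 14).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=20.0, size=(836, 512, 14))
X_test = rng.normal(loc=50.0, scale=20.0, size=(209, 512, 14))

X_train_n, X_test_n = standardize_per_channel(X_train, X_test)
print(X_train_n.mean(axis=(0, 1)).round(3))  # per-channel means of the scaled training data
print(X_train_n.std(axis=(0, 1)).round(3))   # per-channel standard deviations
```

The statistics are computed on the training split only and then reused for the test split, so no information from the test data leaks into preprocessing.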
You can also increase the number of epochs, and you should increase the size of the dataset. You can also use
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    channel_shift_range=0.2,
    fill_mode='nearest'
)
to feed the model more data, which will hopefully increase the validation accuracy.
I am training a deep learning network using a pre-trained VGG-16. The loss is high, around 7-8, and the accuracy is around 50%. I want to improve the accuracy.
1. Could you tell me whether my dataset is set up correctly?
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="/Users/khand/OneDrive/Desktop/Thesis/Case_db/data", target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="/Users/khand/OneDrive/Desktop/Thesis/Case_db/data", target_size=(224,224))
This is how I set up my dataset. In the "data" folder I have 2 subfolders: one contains the main data and the other contains the labels.
I think the connections between the network and its layers are fine, since I can train the network.
from keras.callbacks import ModelCheckpoint, EarlyStopping
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=20, verbose=1, mode='auto')
hist = model.fit_generator(steps_per_epoch=10, generator=traindata, validation_data=testdata,
                           validation_steps=10, epochs=10,
                           callbacks=[ModelCheckpoint('VGG16-transferlearning.model', monitor='val_acc', save_best_only=True)])
Above is how I run training and validation, and the result is below:
Epoch 1/10
10/10 [==============================] - 253s 25s/step - loss: 8.1311 - accuracy: 0.4437 - val_loss: 7.5554 - val_accuracy: 0.4875
Epoch 2/10
C:\Users\khand\Anaconda3\envs\TensorFlow-GPU\lib\site-packages\keras\callbacks\callbacks.py:707: RuntimeWarning: Can save best model only with val_acc available, skipping.
'skipping.' % (self.monitor), RuntimeWarning)
10/10 [==============================] - 255s 26s/step - loss: 7.8576 - accuracy: 0.5000 - val_loss: 5.0369 - val_accuracy: 0.5281
Epoch 3/10
10/10 [==============================] - 263s 26s/step - loss: 8.0590 - accuracy: 0.5000 - val_loss: 8.0590 - val_accuracy: 0.5094
Epoch 4/10
10/10 [==============================] - 258s 26s/step - loss: 7.6561 - accuracy: 0.5250 - val_loss: 7.0517 - val_accuracy: 0.4765
Epoch 5/10
10/10 [==============================] - 246s 25s/step - loss: 7.9090 - accuracy: 0.4899 - val_loss: 9.0664 - val_accuracy: 0.5281
Epoch 6/10
10/10 [==============================] - 257s 26s/step - loss: 7.7065 - accuracy: 0.5219 - val_loss: 8.5627 - val_accuracy: 0.4812
Epoch 7/10
10/10 [==============================] - 244s 24s/step - loss: 7.9079 - accuracy: 0.5094 - val_loss: 8.5627 - val_accuracy: 0.5031
Epoch 8/10
10/10 [==============================] - 231s 23s/step - loss: 8.5147 - accuracy: 0.4765 - val_loss: 5.5406 - val_accuracy: 0.4966
Epoch 9/10
10/10 [==============================] - 251s 25s/step - loss: 8.3613 - accuracy: 0.4812 - val_loss: 5.5406 - val_accuracy: 0.4938
Epoch 10/10
10/10 [==============================] - 247s 25s/step - loss: 8.0087 - accuracy: 0.5031 - val_loss: 8.5627 - val_accuracy: 0.4906
If you have any suggestions, please feel free to help.
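One thing visible in the log above: the RuntimeWarning says `val_acc` is not available, while the progress bar reports the metric as `val_accuracy`, so a callback monitoring `val_acc` has nothing to track. A small sketch of that check in plain Python (the `logs` dictionary copies the epoch 1 values from the output above, and `monitor_available` is a hypothetical helper, not a Keras function):

```python
def monitor_available(monitor, logs):
    """Return True if the monitored quantity is among the logged metric names."""
    return monitor in logs

# Metric names and values as printed for epoch 1 in the output above.
logs = {"loss": 8.1311, "accuracy": 0.4437, "val_loss": 7.5554, "val_accuracy": 0.4875}

for name in ("val_acc", "val_accuracy"):
    print(name, monitor_available(name, logs))
```

Going by the names in this log, passing `monitor='val_accuracy'` to `ModelCheckpoint` and `EarlyStopping` should let them find the metric. This addresses only the warning, not the high loss.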
I am using Keras with TensorFlow backend to train an LSTM network for some time-sequential data sets. The performance seems pretty good when I represent my training data (as well as the validation data) in the Numpy array format:
train_x.shape: (128346, 10, 34)
val_x.shape: (7941, 10, 34)
test_x.shape: (24181, 10, 34)
train_y.shape: (128346, 2)
val_y.shape: (7941, 2)
test_y.shape: (24181, 2)
P.S.: 10 is the number of time steps and 34 is the number of features; the labels are one-hot encoded.
model = tf.keras.Sequential()
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True,
input_shape=(_TIME_STEPS, _FEATURE_DIMENTIONS)))
model.add(layers.Dropout(0.4))
model.add(layers.LSTM(_HIDDEN_SIZE, return_sequences=True))
model.add(layers.Dropout(0.3))
model.add(layers.TimeDistributed(layers.Dense(_NUM_CLASSES)))
model.add(layers.Flatten())
model.add(layers.Dense(_NUM_CLASSES, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr = _LR)
model.compile(optimizer = opt, loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
And the training results are:
Train on 128346 samples, validate on 7941 samples
Epoch 1/10
128346/128346 [==============================] - 50s 390us/step - loss: 0.5883 - acc: 0.6975 - val_loss: 0.5242 - val_acc: 0.7416
Epoch 2/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4804 - acc: 0.7687 - val_loss: 0.4265 - val_acc: 0.8014
Epoch 3/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.4232 - acc: 0.8076 - val_loss: 0.4095 - val_acc: 0.8096
Epoch 4/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3894 - acc: 0.8276 - val_loss: 0.3529 - val_acc: 0.8469
Epoch 5/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3610 - acc: 0.8430 - val_loss: 0.3283 - val_acc: 0.8593
Epoch 6/10
128346/128346 [==============================] - 49s 382us/step - loss: 0.3402 - acc: 0.8525 - val_loss: 0.3334 - val_acc: 0.8558
Epoch 7/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3233 - acc: 0.8604 - val_loss: 0.2944 - val_acc: 0.8741
Epoch 8/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.3087 - acc: 0.8663 - val_loss: 0.2786 - val_acc: 0.8805
Epoch 9/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2969 - acc: 0.8709 - val_loss: 0.2785 - val_acc: 0.8777
Epoch 10/10
128346/128346 [==============================] - 49s 383us/step - loss: 0.2867 - acc: 0.8757 - val_loss: 0.2590 - val_acc: 0.8877
This log looks normal, but when I used the TensorFlow Dataset API to represent my datasets, training behaved very strangely (the model seems to overfit or underfit?):
def tfdata_generator(features, labels, is_training = False, batch_size = _BATCH_SIZE, epoch = _EPOCH):
    dataset = tf.data.Dataset.from_tensor_slices((features, tf.cast(labels, dtype = tf.uint8)))
    if is_training:
        dataset = dataset.shuffle(10000)  # depends on sample size
    dataset = dataset.batch(batch_size, drop_remainder = True).repeat(epoch).prefetch(batch_size)
    return dataset
training_set = tfdata_generator(train_x, train_y, is_training=True)
validation_set = tfdata_generator(val_x, val_y, is_training=False)
testing_set = tfdata_generator(test_x, test_y, is_training=False)
Training on the same model and hyperparameters:
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
And the log seems much different from the previous one:
Epoch 1/10
2005/2005 [==============================] - 54s 27ms/step - loss: 0.1451 - acc: 0.9419 - val_loss: 3.2980 - val_acc: 0.4975
Epoch 2/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1675 - acc: 0.9371 - val_loss: 3.0838 - val_acc: 0.4975
Epoch 3/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1821 - acc: 0.9316 - val_loss: 3.1212 - val_acc: 0.4975
Epoch 4/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1902 - acc: 0.9287 - val_loss: 3.0032 - val_acc: 0.4975
Epoch 5/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1905 - acc: 0.9283 - val_loss: 2.9671 - val_acc: 0.4975
Epoch 6/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1867 - acc: 0.9299 - val_loss: 2.8734 - val_acc: 0.4975
Epoch 7/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1802 - acc: 0.9316 - val_loss: 2.8651 - val_acc: 0.4975
Epoch 8/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1740 - acc: 0.9350 - val_loss: 2.8793 - val_acc: 0.4975
Epoch 9/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1660 - acc: 0.9388 - val_loss: 2.7894 - val_acc: 0.4975
Epoch 10/10
2005/2005 [==============================] - 49s 24ms/step - loss: 0.1613 - acc: 0.9405 - val_loss: 2.7997 - val_acc: 0.4975
The validation loss does not decrease and val_acc stays at the same value when I use the TensorFlow Dataset API to represent my data.
My questions are:
Given the same model and parameters, why does model.fit() produce such different training results when I merely switch to the tf.data.Dataset API?
What is the difference between these two mechanisms?
model.fit(train_x,
train_y,
epochs=_EPOCH,
batch_size = _BATCH_SIZE,
verbose = 1,
validation_data = (val_x, val_y)
)
vs
model.fit(
training_set.make_one_shot_iterator(),
epochs = _EPOCH,
steps_per_epoch = len(train_x) // _BATCH_SIZE,
verbose = 1,
validation_data = validation_set.make_one_shot_iterator(),
validation_steps = len(val_x) // _BATCH_SIZE
)
How can I solve this strange problem if I have to use the tf.data.Dataset API?
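One thing that might be worth checking is the shuffle buffer: `shuffle(10000)` is much smaller than the 128346 training samples, and a buffer-based shuffle only mixes elements that are close together in the input. Whether this matters here depends on how `train_x` is ordered, which the question does not say, so treat it as an assumption. The sketch below is a plain-Python simulation of a fixed-size shuffle buffer (the `buffer_shuffle` helper and the sorted toy labels are made up for illustration; this is not TensorFlow code):

```python
import random

def buffer_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # and replace it with the new one
    rng.shuffle(buffer)                  # input exhausted: drain the rest randomly
    yield from buffer

# Toy labels sorted by class: 1000 samples of class 0, then 1000 of class 1.
labels = [0] * 1000 + [1] * 1000

small = list(buffer_shuffle(labels, buffer_size=100))
full = list(buffer_shuffle(labels, buffer_size=len(labels)))

# Number of class-1 samples among the first 500 emitted elements.
print(sum(small[:500]), sum(full[:500]))
```

With the small buffer, the first 500 emitted elements can only come from the first 600 inputs, so they all belong to class 0; with a buffer as large as the dataset, both classes appear from the start. If the training data were ordered by label, passing a buffer size of `len(train_x)` to `shuffle` would be the corresponding change to try.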