I'm trying to get good accuracy with Keras (TensorFlow as the backend) using categorical_crossentropy for multiclass classification problem (Heart disease dataset). My model can reach good training accuracy, but the validation accuracy is low (with high validation loss). I have tried over-fitting solutions (e.g., normalization, dropout, regularization, etc.), but I still have the same problem. I have been playing around with optimizers, losses, epochs and batchsizes with no success so far. This is the code I am using:
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.optimizers import SGD,Adam
from keras.layers import Dense, Dropout
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from keras.models import load_model
from keras.regularizers import l1,l2
# fix random seed for reproducibility
np.random.seed(5)
data = pd.read_csv('ProcessedClevelandData.csv',delimiter=',',header=None)
#Missing Values
Imp=SimpleImputer(missing_values=np.nan,strategy='mean',copy=True)
Imp=Imp.fit(data.values)
Imp.transform(data)
X = data.iloc[:, :-1].values
y=data.iloc[:,-1].values
y=to_categorical(y)
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.1)
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)
X_test_norm=scaler.transform(X_test)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, activation='relu',use_bias=True,kernel_regularizer=l2(0.0001)))
#model.add(Dropout(0.05))
model.add(Dense(9, activation='relu',use_bias=True,kernel_regularizer=l2(0.0001)))
#model.add(Dropout(0.05))
model.add(Dense(5,activation='softmax'))
sgd = SGD(lr=0.01, decay=0.01/32, nesterov=False)
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])#adam,adadelta,
print(model.summary())
history=model.fit(X_train_norm, y_train,validation_data=(X_test_norm,y_test), epochs=1200, batch_size=32,shuffle=True)
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
And this is part of the output in which you can see the aforementioned behavior:
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 13) 182
_________________________________________________________________
dense_2 (Dense) (None, 9) 126
_________________________________________________________________
dense_3 (Dense) (None, 5) 50
=================================================================
Total params: 358
Trainable params: 358
Non-trainable params: 0
_________________________________________________________________
Train on 272 samples, validate on 31 samples
Epoch 1/1200
32/272 [==>...........................] - ETA: 21s - loss: 1.9390 - acc: 0.1562
272/272 [==============================] - 3s 11ms/step - loss: 2.0505 - acc: 0.1434 - val_loss: 2.0875 - val_acc: 0.1613
Epoch 2/1200
32/272 [==>...........................] - ETA: 0s - loss: 1.6747 - acc: 0.2188
272/272 [==============================] - 0s 33us/step - loss: 1.9416 - acc: 0.1544 - val_loss: 1.9749 - val_acc: 0.1290
Epoch 3/1200
32/272 [==>...........................] - ETA: 0s - loss: 1.7708 - acc: 0.2812
272/272 [==============================] - 0s 37us/step - loss: 1.8493 - acc: 0.1801 - val_loss: 1.8823 - val_acc: 0.1290
Epoch 4/1200
32/272 [==>...........................] - ETA: 0s - loss: 1.9051 - acc: 0.2188
272/272 [==============================] - 0s 33us/step - loss: 1.7763 - acc: 0.1949 - val_loss: 1.8002 - val_acc: 0.1613
Epoch 5/1200
32/272 [==>...........................] - ETA: 0s - loss: 1.6337 - acc: 0.2812
272/272 [==============================] - 0s 33us/step - loss: 1.7099 - acc: 0.2426 - val_loss: 1.7284 - val_acc: 0.1935
Epoch 6/1200
....
32/272 [==>...........................] - ETA: 0s - loss: 0.0494 - acc: 1.0000
272/272 [==============================] - 0s 37us/step - loss: 0.0532 - acc: 1.0000 - val_loss: 4.1031 - val_acc: 0.5806
Epoch 1197/1200
32/272 [==>...........................] - ETA: 0s - loss: 0.0462 - acc: 1.0000
272/272 [==============================] - 0s 33us/step - loss: 0.0529 - acc: 1.0000 - val_loss: 4.1174 - val_acc: 0.5806
Epoch 1198/1200
32/272 [==>...........................] - ETA: 0s - loss: 0.0648 - acc: 1.0000
272/272 [==============================] - 0s 37us/step - loss: 0.0533 - acc: 1.0000 - val_loss: 4.1247 - val_acc: 0.5806
Epoch 1199/1200
32/272 [==>...........................] - ETA: 0s - loss: 0.0610 - acc: 1.0000
272/272 [==============================] - 0s 29us/step - loss: 0.0532 - acc: 1.0000 - val_loss: 4.1113 - val_acc: 0.5484
Epoch 1200/1200
32/272 [==>...........................] - ETA: 0s - loss: 0.0511 - acc: 1.0000
272/272 [==============================] - 0s 29us/step - loss: 0.0529 - acc: 1.0000 - val_loss: 4.1209 - val_acc: 0.5484
Help yourself by increasing your validation size to more like ~30%, unless you really have a large data set. Even 50/50 is often used.
Remember that good loss and acc with bad val_loss and val_acc implies overfitting.
Try this basic solution:
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
early_stop = EarlyStopping(monitor='val_loss',patience=10)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1,
patience=6, verbose=1, mode='auto',
min_delta=0.0001, cooldown=0, min_lr=1e-8)
history = model.fit(X,y,num_epochs=666,callbacks=[early_stop,reduce_lr])
Hope that helps!
A problem could be that your data is unevenly distributed in the training and testing split (as mentioned in a comment). Try to see whether the distributions are uneven and try a different seed if it is the case.
I have had a similar problem before with a small medical dataset. The smaller the dataset the higher the likelyhood that the split datasets will not accurately represent the real distribution.
Edit: depending on how you set your seed you can do
np.random.seed(my_seed) to set it for numpy, or random.seed(my_seed) to set it for the python module, or to set it for keras follow their documentation.
Related
I am new to Keras and have been practicing with resources from the web. Unfortunately, I cannot build a model without it throwing the following error:
ValueError: logits and labels must have the same shape, received ((None, 10) vs (None, 1)).
I have attempted the following:
DF = pd.read_csv("https://raw.githubusercontent.com/EpistasisLab/tpot/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope%20Data.csv")
X = DF.iloc[:,0:-1]
y = DF.iloc[:,-1]
yBin = np.array([1 if x == 'g' else 0 for x in y ])
scaler = StandardScaler()
X1 = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X1, yBin, test_size=0.25, random_state=2018)
print(X_train.__class__,X_test.__class__,y_train.__class__,y_test.__class__ )
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(10,activation="softmax"))
model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(x=X_train,
y=y_train,
epochs=600,
validation_data=(X_test, y_test), verbose=1
)
I have read my model is likely wrong in terms of input parameters, what is the correct approach?
When I look at the shape of your data
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
I see, that X is 10-dimensional and y us 1-dimensional
Therefore, you need 10-dimensional input
model.build(input_shape=(None,10))
and 1-dimensional output in the last dense layer
model.add(Dense(1,activation="softmax"))
Target variable yBin/y_train/y_test is 1D array (has a shape (None,1) for a given batch).
Your logits come from the Dense layer and the last Dense layer has 10 neurons with softmax activation. So it will give 10 outputs for each input or (batch_size,10) for each batch. This is represented formally as (None,10).
To resolve the particular shape mismatch issue in question change the neuron count of dense layer to 1 and set activation finction to "sigmoid".
model.add(Dense(1,activation="sigmoid"))
As correctly mentioned by #MSS, You need to use sigmoid activation function with 1 neuron in the last dense layer to match the logits with the labels(1,0) of your dataset which indicates binary class.
Fixed code:
model=Sequential()
model.add(Dense(6,activation="relu", input_shape=(10,)))
model.add(Dense(1,activation="sigmoid"))
#model.build(input_shape=(None,1))
model.summary()
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(x=X_train,y=y_train,epochs=10,validation_data=(X_test, y_test),verbose=1)
Output:
Epoch 1/10
446/446 [==============================] - 3s 4ms/step - loss: 0.5400 - accuracy: 0.7449 - val_loss: 0.4769 - val_accuracy: 0.7800
Epoch 2/10
446/446 [==============================] - 2s 4ms/step - loss: 0.4425 - accuracy: 0.7987 - val_loss: 0.4241 - val_accuracy: 0.8095
Epoch 3/10
446/446 [==============================] - 2s 3ms/step - loss: 0.4082 - accuracy: 0.8175 - val_loss: 0.4034 - val_accuracy: 0.8242
Epoch 4/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3934 - accuracy: 0.8286 - val_loss: 0.3927 - val_accuracy: 0.8313
Epoch 5/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3854 - accuracy: 0.8347 - val_loss: 0.3866 - val_accuracy: 0.8320
Epoch 6/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3800 - accuracy: 0.8397 - val_loss: 0.3827 - val_accuracy: 0.8364
Epoch 7/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3762 - accuracy: 0.8411 - val_loss: 0.3786 - val_accuracy: 0.8387
Epoch 8/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3726 - accuracy: 0.8432 - val_loss: 0.3764 - val_accuracy: 0.8404
Epoch 9/10
446/446 [==============================] - 2s 3ms/step - loss: 0.3695 - accuracy: 0.8466 - val_loss: 0.3724 - val_accuracy: 0.8408
Epoch 10/10
446/446 [==============================] - 2s 4ms/step - loss: 0.3665 - accuracy: 0.8478 - val_loss: 0.3698 - val_accuracy: 0.8454
<keras.callbacks.History at 0x7f68ca30f670>
Value of val_acc does not change over the epochs.
Summary:
I'm using a pre-trained (ImageNet) VGG16 from Keras;
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
Database from ISBI 2016 (ISIC) - which is a set of 900 images of skin lesion used for binary classification (malignant or benign) for training and validation, plus 379 images for testing -;
I use the top dense layers of VGG16 except the last one (that classifies over 1000 classes), and use a binary output with sigmoid function activation;
conv_base.layers.pop() # Remove last one
conv_base.trainable = False
model = models.Sequential()
model.add(conv_base)
model.add(layers.Dense(1, activation='sigmoid'))
Unlock the dense layers setting them to trainable;
Fetch the data, which are in two different folders, one named "malignant" and the other "benign", within the "training data" folder;
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
folder = 'ISBI2016_ISIC_Part3_Training_Data'
batch_size = 20
full_datagen = ImageDataGenerator(
rescale=1./255,
#rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
validation_split = 0.2, # 20% validation
horizontal_flip=True)
train_generator = full_datagen.flow_from_directory( # Found 721 images belonging to 2 classes.
folder,
target_size=(224, 224),
batch_size=batch_size,
subset = 'training',
class_mode='binary')
validation_generator = full_datagen.flow_from_directory( # Found 179 images belonging to 2 classes.
folder,
target_size=(224, 224),
batch_size=batch_size,
subset = 'validation',
shuffle=False,
class_mode='binary')
model.compile(loss='binary_crossentropy',
optimizer=optimizers.SGD(lr=0.001), # High learning rate
metrics=['accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=721 // batch_size+1,
epochs=20,
validation_data=validation_generator,
validation_steps=180 // batch_size+1,
)
Then I fine-tune it with 100 more epochs and lower learning rate, setting the last convolutional layer to trainable.
I've tried many things such as:
Changing the optimizer (RMSprop, Adam and SGD);
Removing the top dense layers of the pre-trained VGG16 and adding mine;
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Shuffle=True in validation_generator;
Changing batch size;
Varying the learning rate (0.001, 0.0001, 2e-5).
The results are similar to the following:
Epoch 1/100
37/37 [==============================] - 33s 900ms/step - loss: 0.6394 - acc: 0.7857 - val_loss: 0.6343 - val_acc: 0.8101
Epoch 2/100
37/37 [==============================] - 30s 819ms/step - loss: 0.6342 - acc: 0.8107 - val_loss: 0.6342 - val_acc: 0.8101
Epoch 3/100
37/37 [==============================] - 30s 822ms/step - loss: 0.6324 - acc: 0.8188 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 4/100
37/37 [==============================] - 31s 840ms/step - loss: 0.6346 - acc: 0.8080 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 5/100
37/37 [==============================] - 31s 833ms/step - loss: 0.6395 - acc: 0.7843 - val_loss: 0.6341 - val_acc: 0.8101
Epoch 6/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 7/100
37/37 [==============================] - 31s 834ms/step - loss: 0.6334 - acc: 0.8134 - val_loss: 0.6340 - val_acc: 0.8101
Epoch 8/100
37/37 [==============================] - 31s 829ms/step - loss: 0.6342 - acc: 0.8093 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 9/100
37/37 [==============================] - 31s 849ms/step - loss: 0.6330 - acc: 0.8147 - val_loss: 0.6339 - val_acc: 0.8101
Epoch 10/100
37/37 [==============================] - 30s 812ms/step - loss: 0.6332 - acc: 0.8134 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 11/100
37/37 [==============================] - 31s 839ms/step - loss: 0.6338 - acc: 0.8107 - val_loss: 0.6338 - val_acc: 0.8101
Epoch 12/100
37/37 [==============================] - 30s 807ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 13/100
37/37 [==============================] - 32s 852ms/step - loss: 0.6334 - acc: 0.8120 - val_loss: 0.6337 - val_acc: 0.8101
Epoch 14/100
37/37 [==============================] - 31s 826ms/step - loss: 0.6330 - acc: 0.8134 - val_loss: 0.6336 - val_acc: 0.8101
Epoch 15/100
37/37 [==============================] - 32s 854ms/step - loss: 0.6335 - acc: 0.8107 - val_loss: 0.6336 - val_acc: 0.8101
And goes on the same way, with constant val_acc = 0.8101.
When I use the test set after finishing training, the confusion matrix gives me 100% correct on benign lesions (304) and 0% on malignant, as so:
Confusion Matrix
[[304 0]
[ 75 0]]
What could I be doing wrong?
Thank you.
VGG16 was trained on RGB centered data. Your ImageDataGenerator does not enable featurewise_center, however, so you're feeding your net with raw RGB data. The VGG convolutional base can't process this to provide any meaningful information, so your net ends up universally guessing the more common class.
In general, when you see this type of problem (your net exclusively guessing the most common class), it means that there's something wrong with your data, not with the net. It can be caused by a preprocessing step like this or by a significant portion of "poisoned" anomalous training data that actively harms the training process.
I am running a simple neural network using tensorflow as in the below code. However I don't understand why the loss values decrease significantly to nearly zero after several batches, but the acc values don't really increase?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3327) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 9984
=================================================================
Total params: 9,984
Trainable params: 9,984
Non-trainable params: 0
_________________________________________________________________
None
Train on 480888 samples, validate on 53433 samples
Epoch 1/20
32/480888 [..............................] - ETA: 1092:47:40 - loss: 0.6549 - acc: 0.3125
224/480888 [..............................] - ETA: 156:05:19 - loss: 0.0936 - acc: 0.3393
576/480888 [..............................] - ETA: 60:40:08 - loss: 0.0364 - acc: 0.3490
1024/480888 [..............................] - ETA: 34:06:04 - loss: 0.0205 - acc: 0.3555
1440/480888 [..............................] - ETA: 24:14:01 - loss: 0.0146 - acc: 0.3604
1856/480888 [..............................] - ETA: 18:47:22 - loss: 0.0113 - acc: 0.3594
...
...
20960/480888 [>.............................] - ETA: 1:36:49 - loss: 9.9997e-04 - acc: 0.3468
21440/480888 [>.............................] - ETA: 1:34:34 - loss: 9.7758e-04 - acc: 0.3465
21888/480888 [>.............................] - ETA: 1:32:34 - loss: 9.5757e-04 - acc: 0.3469
22336/480888 [>.............................] - ETA: 1:30:38 - loss: 9.3837e-04 - acc: 0.3476
22784/480888 [>.............................] - ETA: 1:28:47 - loss: 9.1992e-04 - acc: 0.3477
23264/480888 [>.............................] - ETA: 1:26:53 - loss: 9.0094e-04 - acc: 0.3475
23712/480888 [>.............................] - ETA: 1:25:10 - loss: 8.8392e-04 - acc: 0.3479
The above acc value is evaluated on each current batch, is that correct?
scaler = preprocessing.MinMaxScaler()
scalerMaxAbs = preprocessing.MaxAbsScaler()
training_metadata = scaler.fit_transform(training_data[metadata].astype(np.float32))
testing_metadata = scaler.transform(testing_data[metadata].astype(np.float32))
training_scores = scalerMaxAbs.fit_transform(training_data[scores])
testing_scores = scalerMaxAbs.transform(testing_data[scores])
y_train = np_utils.to_categorical(training_data['label'], num_classes=3)
y_test = np_utils.to_categorical(testing_data['label'], num_classes=3)
training_features = np.concatenate((training_metadata, training_scores), axis=1)
testing_features = np.concatenate((testing_metadata, testing_scores), axis=1)
inputs = Input(shape=(training_features.shape[1],), dtype='float32')
hh_layer = Dense(128, activation=tf.nn.relu)(inputs)
dropout = Dropout(0.2)(hh_layer)
output = Dense(3, activation=tf.nn.softmax)(inputs)
model = Model(inputs=inputs, output=output)
print(model.summary())
early_stopping_monitor = EarlyStopping(patience=3)
adam = Adam(lr=0.001)
model.compile(optimizer=adam,
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(training_features, y_train, epochs=20, validation_split=0.1, callbacks=[early_stopping_monitor])
score = model.evaluate(testing_features, y_test)
Result of 4 Epochs:
480888/480888 [==============================] - 332s 691us/step - loss: 4.3699e-05 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 148us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 148us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
480888/480888 [==============================] - 71s 147us/step - loss: 1.1921e-07 - acc: 0.3474 - val_loss: 1.1921e-07 - val_acc: 0.3493
Final result on test set after 4 epochs with early stopping
loss, acc: [1.1920930376163597e-07, 0.34758880897839645]
Step0: save the model, load it then and check its performance on a few samples to make sure if there's any bug in computing loss or acc. If you find nothing wrong, update the question then.
When I use Keras to train a model with model.fit(), I see a progress bar that looks like this:
Epoch 1/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10
8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10
8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
To improve readability, I would like to have the epoch number on the same line as the progress bar, like this:
Epoch 1/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.9318 - acc: 0.0783 - val_loss: 0.8631 - val_acc: 0.1180
Epoch 2/10: 8000/8000 [==========] - 55s 7ms/step - loss: 0.6587 - acc: 0.1334 - val_loss: 0.7052 - val_acc: 0.1477
Epoch 3/10: 8000/8000 [==========] - 54s 7ms/step - loss: 0.5701 - acc: 0.1526 - val_loss: 0.6445 - val_acc: 0.1632
How can I make that change? I know that Keras has callbacks that can be invoked during training, but I am not familiar with how that works.
If you want to use an alternative, you could use tqdm (version >= 4.41.0):
from tqdm.keras import TqdmCallback
...
model.fit(..., verbose=0, callbacks=[TqdmCallback(verbose=2)])
This turns off keras' progress (verbose=0), and uses tqdm instead. For the callback, verbose=2 means separate progressbars for epochs and batches. 1 means clear batch bars when done. 0 means only show epochs (never show batch bars).
Yes, you can use callbacks (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback). For example:
import tensorflow as tf
class PrintLogs(tf.keras.callbacks.Callback):
def __init__(self, epochs):
self.epochs = epochs
def set_params(self, params):
params['epochs'] = 0
def on_epoch_begin(self, epoch, logs=None):
print('Epoch %d/%d' % (epoch + 1, self.epochs), end='')
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
epochs = 5
model.fit(x_train, y_train,
epochs=epochs,
validation_split=0.2,
verbose = 2,
callbacks=[PrintLogs(epochs)])
output:
Train on 48000 samples, validate on 12000 samples
Epoch 1/5 - 10s - loss: 0.0306 - acc: 0.9901 - val_loss: 0.0837 - val_acc: 0.9786
Epoch 2/5 - 9s - loss: 0.0269 - acc: 0.9910 - val_loss: 0.0839 - val_acc: 0.9788
Epoch 3/5 - 9s - loss: 0.0253 - acc: 0.9915 - val_loss: 0.0895 - val_acc: 0.9781
Epoch 4/5 - 9s - loss: 0.0201 - acc: 0.9930 - val_loss: 0.0871 - val_acc: 0.9792
Epoch 5/5 - 9s - loss: 0.0206 - acc: 0.9931 - val_loss: 0.0917 - val_acc: 0.9793
I'm working on a multi-class classification problem with Keras 2.1.3 and a Tensorflow backend. I have two numpy arrays, x and y and I'm using tf.data.Dataset like this:
dataset = tf.data.Dataset.from_tensor_slices(({"sequence": x}, y))
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(self.batch_size))
dataset = dataset.repeat()
xt, yt = dataset.make_one_shot_iterator().get_next()
Then I make my Keras model (omitted for brevity), compile, and fit:
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'],
)
model.fit(xt, yt, steps_per_epoch=100, epochs=10)
This works perfectly well. But when I add in callbacks, I run into issues. Specifically, if I do this:
callbacks = [
tf.keras.callbacks.ModelCheckpoint("model_{epoch:04d}_{val_acc:.4f}.h5",
monitor='val_acc',
verbose=1,
save_best_only=True,
mode='max'),
tf.keras.callbacks.TensorBoard(os.path.join('.', 'logs')),
tf.keras.callbacks.EarlyStopping(monitor='val_acc', patience=5, min_delta=0, mode='max')
]
model.fit(xt, yt, steps_per_epoch=10, epochs=100, callbacks=callbacks)
I get:
KeyError: 'val_acc'
Also, if I include validation_split=0.1 in my model.fit(...) call, I'm told:
ValueError: If your data is in the form of symbolic tensors, you cannot use validation_split.`
What is the normal way to use callbacks and validation splits with tf.data.Dataset (tensors)?
Thanks!
Using the tensorflow keras API, you can provide a Dataset for training and another for validation.
First some imports
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import numpy as np
define the function which will split the numpy arrays into training/val
def split(x, y, val_size=50):
idx = np.random.choice(x.shape[0], size=val_size, replace=False)
not_idx = list(set(range(x.shape[0])).difference(set(idx)))
x_val = x[idx]
y_val = y[idx]
x_train = x[not_idx]
y_train = y[not_idx]
return x_train, y_train, x_val, y_val
define numpy arrays and the train/val tensorflow Datasets
x = np.random.randn(150, 9)
y = np.random.randint(0, 10, 150)
x_train, y_train, x_val, y_val = split(x, y)
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, tf.one_hot(y_train, depth=10)))
train_dataset = train_dataset.batch(32).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, tf.one_hot(y_val, depth=10)))
val_dataset = val_dataset.batch(32).repeat()
Make the model (notice we are using the tensorflow keras API)
model = keras.models.Sequential([Dense(64, input_shape=(9,), activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
model.fit(train_dataset,
epochs=10,
steps_per_epoch=int(100/32)+1,
validation_data=val_dataset,
validation_steps=2)
and the model trains, kind of (output):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 640
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 5,450
Trainable params: 5,450
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
4/4 [==============================] - 0s 69ms/step - loss: 2.3170 - acc: 0.1328 - val_loss: 2.3877 - val_acc: 0.0712
Epoch 2/10
4/4 [==============================] - 0s 2ms/step - loss: 2.2628 - acc: 0.2500 - val_loss: 2.3850 - val_acc: 0.0712
Epoch 3/10
4/4 [==============================] - 0s 2ms/step - loss: 2.2169 - acc: 0.2656 - val_loss: 2.3838 - val_acc: 0.0712
Epoch 4/10
4/4 [==============================] - 0s 2ms/step - loss: 2.1743 - acc: 0.3359 - val_loss: 2.3830 - val_acc: 0.0590
Epoch 5/10
4/4 [==============================] - 0s 2ms/step - loss: 2.1343 - acc: 0.3594 - val_loss: 2.3838 - val_acc: 0.0590
Epoch 6/10
4/4 [==============================] - 0s 2ms/step - loss: 2.0959 - acc: 0.3516 - val_loss: 2.3858 - val_acc: 0.0590
Epoch 7/10
4/4 [==============================] - 0s 4ms/step - loss: 2.0583 - acc: 0.3750 - val_loss: 2.3887 - val_acc: 0.0590
Epoch 8/10
4/4 [==============================] - 0s 2ms/step - loss: 2.0223 - acc: 0.4453 - val_loss: 2.3918 - val_acc: 0.0747
Epoch 9/10
4/4 [==============================] - 0s 2ms/step - loss: 1.9870 - acc: 0.4609 - val_loss: 2.3954 - val_acc: 0.1059
Epoch 10/10
4/4 [==============================] - 0s 2ms/step - loss: 1.9523 - acc: 0.4609 - val_loss: 2.3995 - val_acc: 0.1059
Callbacks
Adding callbacks also works,
callbacks = [ tf.keras.callbacks.ModelCheckpoint("model_{epoch:04d}_{val_acc:.4f}.h5",
monitor='val_acc',
verbose=1,
save_best_only=True,
mode='max'),
tf.keras.callbacks.TensorBoard('./logs'),
tf.keras.callbacks.EarlyStopping(monitor='val_acc', patience=5, min_delta=0, mode='max')
]
model.fit(train_dataset, epochs=10, steps_per_epoch=int(100/32)+1, validation_data=val_dataset,
validation_steps=2, callbacks=callbacks)
Output:
Epoch 1/10
4/4
[==============================] - 0s 59ms/step - loss: 2.3274 - acc: 0.1094 - val_loss: 2.3143 - val_acc: 0.0833
Epoch 00001: val_acc improved from -inf to 0.08333, saving model to model_0001_0.0833.h5
Epoch 2/10
4/4 [==============================] - 0s 2ms/step - loss: 2.2655 - acc: 0.1094 - val_loss: 2.3204 - val_acc: 0.1389
Epoch 00002: val_acc improved from 0.08333 to 0.13889, saving model to model_0002_0.1389.h5
Epoch 3/10
4/4 [==============================] - 0s 5ms/step - loss: 2.2122 - acc: 0.1250 - val_loss: 2.3289 - val_acc: 0.1111
Epoch 00003: val_acc did not improve from 0.13889
Epoch 4/10
4/4 [==============================] - 0s 2ms/step - loss: 2.1644 - acc: 0.1953 - val_loss: 2.3388 - val_acc: 0.0556
Epoch 00004: val_acc did not improve from 0.13889
Epoch 5/10
4/4 [==============================] - 0s 2ms/step - loss: 2.1211 - acc: 0.2734 - val_loss: 2.3495 - val_acc: 0.0556
Epoch 00005: val_acc did not improve from 0.13889
Epoch 6/10
4/4 [==============================] - 0s 4ms/step - loss: 2.0808 - acc: 0.2969 - val_loss: 2.3616 - val_acc: 0.0556
Epoch 00006: val_acc did not improve from 0.13889
Epoch 7/10
4/4 [==============================] - 0s 2ms/step - loss: 2.0431 - acc: 0.2969 - val_loss: 2.3749 - val_acc: 0.0712
Epoch 00007: val_acc did not improve from 0.13889
I think your problem is type of your datasets, where xt and yt is not a list or array, so it can not to slice or split.
Let assume x is array with shape (1000,2) fill with random number using numpy and y is class with 10 classes.
# Define x and y
import numpy as np
np.random.seed(2018)
n_cat = 10
x = np.random.rand(1000,2) # Generate random 2 variables and 1000 rows
y = [np.random.randint(1,n_cat) for i in range(len(x))] # Make 10 class
Then you assemble the model and compile
model.compile(
loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'],
)
Create callbacks for checkpoint model
callbacks = [
tf.keras.callbacks.ModelCheckpoint("model_{epoch:04d}_{val_acc:.4f}.h5",
monitor='val_acc',
verbose=1,
save_best_only=True,
mode='max'),
tf.keras.callbacks.TensorBoard(os.path.join('.', 'logs')),
tf.keras.callbacks.EarlyStopping(monitor='val_acc', patience=5, min_delta=0, mode='max')
]
fit the model
model.fit(x, y, batch_size=32, epochs=100, callbacks=callbacks, validation_split=0.1)
it's works for me.