I'm trying to understand why tf.keras.callbacks.EarlyStopping isn't behaving as expected.
In the simple reproducible example below, I'd expect training to stop after validation loss has been consistently decreasing for 3 epochs. You can see that is the case from epoch 5.
import tensorflow as tf
from tensorflow.keras import models, layers
import numpy as np
tf.random.set_seed(42)
# make 100 400-dimensional vectors, representing features:
x = np.full((100, 400), 7)
x = np.expand_dims(x, 1)
# make 100 800-dimensional vectors, representing labels:
y = np.full((100, 800), 1)
y = np.expand_dims(y, 1)
model = models.Sequential([
layers.Input((None,400)),
layers.Dense(800)
])
model.compile(loss='mse', optimizer='adam')
model.summary()
history = model.fit(x=x,
y=y,
epochs=100,
validation_split=0.2,
callbacks=tf.keras.callbacks.EarlyStopping(monitor='val_loss', verbose=1, patience=3)
)
Why does the model keeps training until the 100th epoch?
Related
I'm a rookie at machine learning so please bear with me.
I have a model that trains images and classifies them in 3 different classes. I'm trying to get the confusion matrix for the test data, but either I don't understand it or it's not making any sense. When the model is done training after 200 epochs it shows that the accuracy is around 65%:
loss: 1.5386 - accuracy: 0.6583
But then when the confusion matrix is printed like this:
[[23 51 42]
[20 27 25]
[47 69 56]]
Which isn't correct because the "true" results (23, 27 and 56) don't make up for 65% of all the results. If you add up the numbers in the matrix it adds to the amount of test images so I know that part is correct.
I had a tensorflow warning just before printing the confusion matrix that says the following, but I don't really get its meaning:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 13 batches). You may need to use the repeat() function when building your dataset
This is my code:
import sys
from matplotlib import pyplot
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPooling2D, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import SGD, RMSprop, Adam
from keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix
CLASSES = 3
# define cnn model
def define_model():
# load model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
#add new classifier layers
x = base_model.output
x = Dropout(0.4)(x)
x = GlobalAveragePooling2D(name='avg_pool')(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# mark loaded layers as not trainable
for layer in base_model.layers:
layer.trainable = False
# compile model
opt = RMSprop(lr=0.0001)
#momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
return model
# plot diagnostic learning curves
def summarize_diagnostics(history):
# plot loss
pyplot.subplot(211)
pyplot.title('Cross Entropy Loss')
pyplot.plot(history.history['loss'], color='blue', label='train')
pyplot.plot(history.history['val_loss'], color='orange', label='test')
# plot accuracy
pyplot.subplot(212)
pyplot.title('Classification Accuracy')
pyplot.plot(history.history['accuracy'], color='blue', label='train')
pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
# save plot to file
filename = sys.argv[0].split('/')[-1]
pyplot.savefig(filename + '_plot.png')
pyplot.close()
# run the test harness for evaluating a model
def run_test_harness():
# define model
model = define_model()
# create data generator
datagen = ImageDataGenerator(featurewise_center=True)
# specify imagenet mean values for centering
datagen.mean = [123.68, 116.779, 103.939]
# prepare iterators
train_it = datagen.flow_from_directory('../datasetMainPruebas3ClasesQuitandoConfusosCV1/train/',
class_mode='categorical', batch_size=32, target_size=(224, 224))
test_it = datagen.flow_from_directory('../datasetMainPruebas3ClasesQuitandoConfusosCV1/test/',
class_mode='categorical', batch_size=32, target_size=(224, 224))
# fit model
history = model.fit_generator(train_it, steps_per_epoch=len(train_it),
validation_data=test_it, validation_steps=len(test_it), epochs=200, verbose=1)
# evaluate model
_, acc = model.evaluate_generator(test_it, steps=len(test_it), verbose=1)
print('> %.3f' % (acc * 100.0))
#confusion matrix
Y_pred = model.predict_generator(test_it, len(test_it) + 1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
cm = confusion_matrix(test_it.classes, y_pred)
print(cm)
# learning curves
summarize_diagnostics(history)
# entry point, run the test harness
run_test_harness()
Any help or tip is welcome, thank you
Edit: after checking one of the comments I changed my CM code to the following:
#confusion matrix
all_y_pred = []
all_y_true = []
for i in range(len(test_it)):
x, y = test_it[i]
y_pred = model.predict(x)
all_y_pred.append(y_pred)
all_y_true.append(y)
all_y_pred = np.concatenate(all_y_pred, axis=0)
all_y_true = np.concatenate(all_y_true, axis=0)
print('Confusion Matrix')
cm = confusion_matrix(all_y_true, all_y_pred)
print(cm)
And now the error I get says "Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets"
Any idea why?
my neural network written in keras, for the problem of binary image classification, after selecting hyperparameters using the keras tuner, produces only zeros.
import keras_tuner
from kerastuner import BayesianOptimization
from keras_tuner import Objective
from tensorflow.keras.models import Model
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from torch.nn.modules import activation
def build_model(hp):
# create the base pre-trained model
base_model = Xception(include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False
x = base_model.output
x = Flatten()(x)
hp_units = hp.Int('units', min_value=32, max_value=4096, step=32)
x = Dense(units = hp_units, activation="relu")(x)
hp_rate = hp.Float('rate', min_value = 0.01, max_value=0.9, step=0.01)
x = Dropout(rate = hp_rate)(x)
predictions = Dense(1, activation='sigmoid')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
hp_learning_rate = hp.Float('learning_rate', max_value = 1e-2, min_value = 1e-7, step = 0.0005)
optimizer = hp.Choice('optimizer', ['adam', 'sgd', 'adagrad', 'rmsprop'])
model.compile(optimizer,
loss=tf.keras.losses.BinaryCrossentropy(),
metrics=['accuracy'])
return model
stop_early = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, min_delta= 0.0001)
tuner = BayesianOptimization(
hypermodel = build_model,
objective = Objective(name="val_accuracy",direction="max"),
max_trials = 10,
directory='/content/best_model_s',
overwrite=False
)
tuner.search(train_batches,
validation_data = valid_batches,
epochs = 100,
callbacks=[stop_early]
)
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(train_batches, validation_data = valid_batches ,epochs=50)
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))
best_model = tuner.hypermodel.build(best_hps)
# Retrain the model
best_model.fit(train_batches, validation_data = valid_batches , epochs=best_epoch)
test_generator.reset()
predict = best_model.predict_generator(test_generator, steps = len(test_generator.filenames))
I'm guessing that maybe the problem is that the ImageDataGenerator is fed to train with 2 batches of 16 images each, and to test the ImageDataGenerator with 2 batches of 4 images (each batch has an equal number of class representatives).I also noticed that with a small number of epochs, the neural network produces values from 0 to 1, but the more epochs, the closer the response of the neural network is to zero. For a solution, I tried to stop training as soon as the next 5 iterations do not decrease the loss on validation. Again, it seems to me that the matter is in the validation sample, it is very small.
Any advice?
I don't work with Keras or TF very often so just trying to understand how it works. For example, this is a bit confusing: we generate some points of sine plot and trying to predict the remainder:
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
a = np.array([np.sin(i) for i in np.arange(0, 1000, 0.1)])
b = np.arange(0, 1000, 0.1)
x_train = a[:8000]
x_test = a[8000:]
y_train = b[:8000]
y_test = b[8000:]
model = Sequential(layers.Dense(20, activation='relu'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x=x_train, y=y_train, epochs=200, validation_split=0.2)
Now if I generate predictions either by simply calling model(x_test) or by using predict(x_test) method, the array that I get has a shape (2000, 20).
Why is this happening? Why do I get multiple predictions? And how do I get just a 1-dimensional array of predictions?
It's because, in your model, you have 20 relu activated features in your last layer. That gave 20 features of a single instance in the inference time. All you need to do (as you requested) is to use a layer with 1 unit, place it as the last layer, and probably no activation.
Try this:
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
a = np.array([np.sin(i) for i in np.arange(0, 1000, 0.1)])
b = np.arange(0, 1000, 0.1)
x_train = a[:8000]
x_test = a[8000:]
y_train = b[:8000]
y_test = b[8000:]
model = Sequential(
[
layers.Dense(20, activation='relu'),
layers.Dense(1, activation=None)
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x=x_train, y=y_train, epochs=2, validation_split=0.2)
model.predict(x_test).shape
(2000, 1)
I'm trying to train a deep classifier in Keras both with and without pretraining of the hidden layers via stacked autoencoders. My problem is that the pretraining seems to drastically degrade performance (i.e. if pretrain is set to False in the code below the training error of the final classification layer converges much faster). This seems completely outrageous to me given that pretraining should only initialize the weights of the hidden layers and I don't see how that could completely kill the models performance even if that initialization does not work very well. I can not include the specific dataset I used but the effect should occur for any appropriate dataset (e.g. minist). What is going on here and how can I fix it?
EDIT: code is now reproducible with the MNIST data, final line prints change in loss function, which is significantly lower with pre-training.
I have also slightly modified the code and added sample learning curves below:
from functools import partial
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.regularizers import l2
from keras.utils import to_categorical
(inputs_train, targets_train), _ = mnist.load_data()
inputs_train = inputs_train[:1000].reshape(1000, 784)
targets_train = to_categorical(targets_train[:1000])
hidden_nodes = [256] * 4
learning_rate = 0.01
regularization = 1e-6
epochs = 30
def train_model(pretrain):
model = Sequential()
layer = partial(Dense,
activation='sigmoid',
kernel_initializer='random_normal',
kernel_regularizer=l2(regularization))
for i, hn in enumerate(hidden_nodes):
kwargs = dict(units=hn, name='hidden_{}'.format(i + 1))
if i == 0:
kwargs['input_dim'] = inputs_train.shape[1]
model.add(layer(**kwargs))
if pretrain:
# train autoencoders
inputs_train_ = inputs_train.copy()
for i, hn in enumerate(hidden_nodes):
autoencoder = Sequential()
autoencoder.add(layer(units=hn,
input_dim=inputs_train_.shape[1],
name='hidden'))
autoencoder.add(layer(units=inputs_train_.shape[1],
name='decode'))
autoencoder.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
loss='binary_crossentropy')
autoencoder.fit(
inputs_train_,
inputs_train_,
batch_size=32,
epochs=epochs,
verbose=0)
autoencoder.pop()
model.layers[i].set_weights(autoencoder.layers[0].get_weights())
inputs_train_ = autoencoder.predict(inputs_train_)
num_classes = targets_train.shape[1]
model.add(Dense(units=num_classes,
activation='softmax',
name='classify'))
model.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
loss='categorical_crossentropy')
h = model.fit(
inputs_train,
targets_train,
batch_size=32,
epochs=epochs,
verbose=0)
return h.history['loss']
plt.plot(train_model(pretrain=False), label="Without Pre-Training")
plt.plot(train_model(pretrain=True), label="With Pre-Training")
plt.xlabel("Epoch")
plt.ylabel("Cross-Entropy")
plt.legend()
plt.show()
I have a dataset of 3000 observations. Each observation consists of 3 timeseries of length 200 samples. As the output I have 5 class labels.
So I build train as test sets as follows:
test_split = round(num_samples * 3 / 4)
X_train = X_all[:test_split, :, :] # Start upto just before test_split
y_train = y_all[:test_split]
X_test = X_all[test_split:, :, :] # From test_split to end
y_test = y_all[test_split:]
# Print shapes and class labels
print(X_train.shape)
print(y_train.shape)
> (2250, 200, 3)
> (22250, 5)
I build my network using Keras functional API:
from keras.models import Model
from keras.layers import Dense, Activation, Input, Dropout, concatenate
from keras.layers.recurrent import LSTM
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping
series_len = 200
num_RNN_neurons = 64
ch1 = Input(shape=(series_len, 1), name='ch1')
ch2 = Input(shape=(series_len, 1), name='ch2')
ch3 = Input(shape=(series_len, 1), name='ch3')
ch1_layer = LSTM(num_RNN_neurons, return_sequences=False)(ch1)
ch2_layer = LSTM(num_RNN_neurons, return_sequences=False)(ch2)
ch3_layer = LSTM(num_RNN_neurons, return_sequences=False)(ch3)
visible = concatenate([
ch1_layer,
ch2_layer,
ch3_layer])
hidden1 = Dense(30, activation='linear', name='weighted_average_channels')(visible)
output = Dense(num_classes, activation='softmax')(hidden1)
model = Model(inputs= [ch1, ch2, ch3], outputs=output)
# Compile model
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-4, patience=5, verbose=1, mode='auto')
Then, I try to fit the model:
# Fit the model
model.fit(X_train, y_train,
epochs=epochs,
batch_size=batch_size,
validation_data=(X_test, y_test),
callbacks=[monitor],
verbose=1)
and I get the following error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 3 array(s), but instead got the following list of 1 arrays...
How should I reshape my data, to solve the issue?
You magically assume a single input with 3 time series X_train will split into 4 channels and be assigned to different inputs. Well this doesn't happen and that is what the error is complaining about. You have 1 input:
ch123_in = Input(shape=(series_len, 3), name='ch123')
latent = LSTM(num_RNN_neurons)(ch123_in)
hidden1 = Dense(30, activation='linear', name='weighted_average_channels')(latent)
By merging the series together into single LSTM, the model might pickup relations across time series as well. Now your target shape has to be y_train.shape == (2250, 5), the first dimension must match X_train.shape[0].
Another point is you have Dense layer with linear activation, that is almost useless as it doesn't provide any non-linearity. You might want to use a non-linear activation function like relu.