I am running:
D.fit(X_train, y_train, nb_epoch=12,validation_data=(X_train,y_train))
But I get outputs like:
Train on 61936 samples, validate on 61936 samples
Epoch 1/12
61936/61936 [==============================] - 10s 166us/step - loss: 0.0021 - val_loss: 1.5650e-04
Epoch 2/12
61936/61936 [==============================] - 10s 165us/step - loss: 0.0014 - val_loss: 6.6482e-04
...
Epoch 10/12
61936/61936 [==============================] - 11s 170us/step - loss: 0.0104 - val_loss: 9.6666e-05
This is a known issue: https://github.com/keras-team/keras/issues/605
The other reason that the results are different is because the model
is being trained while the "loss" is being computed, whereas the model
is fixed while "val_loss" is being computed. Since the model is
training, "loss" is typically going to be larger than the true
training set loss at the end of the epoch. I.e. "loss" is the average
loss during the epoch, and "val_loss" is the average loss after the
end of the epoch. Since the model changes during the epoch, the loss
changes.
These will generally not match. Validation loss is computed over the whole validation set with the weights fixed, while training loss is the average of the per-batch losses during the epoch (the weights change after every batch). If you want the real loss on the training set, you should run model.evaluate(X_train, y_train).
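The difference between the two numbers can be reproduced without Keras. The following is a minimal sketch in plain Python, assuming a toy one-weight model y = w * x trained with per-sample SGD; the data, learning rate and function name are illustrative assumptions, not taken from the question.

```python
def one_epoch(xs, ys, w=0.0, lr=0.1):
    """Run one epoch of per-sample SGD on the toy model y = w * x."""
    batch_losses = []
    for x, y in zip(xs, ys):
        err = w * x - y
        batch_losses.append(err ** 2)   # loss measured BEFORE this update
        w -= lr * 2 * err * x           # the gradient step changes the model
    # "loss" as a progress bar would report it: mean over the epoch's batches
    running_loss = sum(batch_losses) / len(batch_losses)
    # loss re-evaluated on the same data with the final, fixed weight
    fixed_loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return running_loss, fixed_loss


xs = [i / 10 for i in range(1, 11)]   # toy inputs 0.1 .. 1.0 (assumed)
ys = [2 * x for x in xs]              # targets generated from y = 2x
running_loss, fixed_loss = one_epoch(xs, ys)
print(f"running average: {running_loss:.4f}, after epoch: {fixed_loss:.4f}")
```

Because the weight keeps improving during the epoch, the running average includes losses from earlier, worse versions of the model and ends up larger than the loss measured afterwards on the same data.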
I am doing a time series analysis using Tensorflow/ Keras in Python.
The overall LSTM model looks like,
model = keras.models.Sequential()
model.add(keras.layers.LSTM(25, input_shape = (1,1), activation = 'relu', dropout = 0.2, return_sequences = False))
model.add(keras.layers.Dense(1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['acc'])
tensorboard = keras.callbacks.TensorBoard(log_dir="logs/{}".format(time()))
es = keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=50)
mc = keras.callbacks.ModelCheckpoint('/home/sukriti/best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
history = model.fit(trainX_3d, trainY_1d, epochs=50, batch_size=10, verbose=2, validation_data = (testX_3d, testY_1d), callbacks=[mc, es, tensorboard])
I am having the following outcome,
Train on 14015 samples, validate on 3503 samples
Epoch 1/50
- 3s - loss: 0.0222 - acc: 7.1352e-05 - val_loss: 0.0064 - val_acc: 0.0000e+00
Epoch 2/50
- 2s - loss: 0.0120 - acc: 7.1352e-05 - val_loss: 0.0054 - val_acc: 0.0000e+00
Epoch 3/50
- 2s - loss: 0.0108 - acc: 7.1352e-05 - val_loss: 0.0047 - val_acc: 0.0000e+00
Now the val_acc remains unchanged. Is it normal?
What does it signify?
As signified by loss = 'mean_squared_error', you are in a regression setting, where accuracy is meaningless (it is meaningful only in classification problems).
Unfortunately, Keras will not "protect" you in such a case; it insists on computing and reporting back an "accuracy", despite the fact that it is meaningless and inappropriate for your problem. See my answer in "What function defines accuracy in Keras when the loss is mean squared error (MSE)?"
You should simply remove metrics=['acc'] from your model compilation and not worry about it. In regression settings, MSE itself can (and usually does) also serve as the performance metric.
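To see why an accuracy-style metric carries no information for continuous targets, here is a small sketch in plain Python. It assumes that "accuracy" means an exact match between the target and the rounded prediction, which is roughly how a binary accuracy behaves; the helper names and the numbers are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rounded_match_accuracy(y_true, y_pred):
    """Assumed stand-in for a classification accuracy: exact match after rounding."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == round(p))
    return hits / len(y_true)


y_true = [0.12, 0.47, 0.51, 0.88]   # made-up continuous targets
y_pred = [0.10, 0.50, 0.49, 0.90]   # made-up predictions, all close to the targets

mse = mean_squared_error(y_true, y_pred)
acc = rounded_match_accuracy(y_true, y_pred)
print(f"MSE: {mse:.6f}, accuracy: {acc:.2f}")
```

The predictions are close to the targets, so the MSE is small, yet the accuracy is zero because a continuous target almost never equals a rounded prediction.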
In my case I had validation accuracy of 0.0000e+00 throughout training (using Keras and CNTK-GPU backend) when my batch size was 64 but there were only 120 samples in my validation set (divided into three classes). After I changed the batch size to 60, I got normal accuracy values.
It will not improve by changing the batch size or the metrics. I had the same problem, but when I shuffled my training and validation data sets, the 0.0000e+00 was gone.
I have a dataset that I used to build an NN model in Keras. I took 2000 rows from that dataset to use as validation data; those 2000 rows are meant to be passed to the .predict function.
I wrote the code for a Keras NN and for now it works well, but I noticed something that seems very strange to me. It gives me very good accuracy of more than 83%, with a loss of around 0.12, but when I make a prediction on unseen data (those 2000 rows), it is correct only 65% of the time on average.
When I add a Dropout layer, it only decreases the accuracy.
Then I added EarlyStopping, and it gave me an accuracy of around 86% and a loss of around 0.10, but when I make a prediction on unseen data, I still get a final prediction accuracy of 67%.
Does this mean that the model made correct predictions in 87% of situations? My logic is that if I pass 100 samples to my .predict function, the program should make a good prediction for 87/100 samples, or somewhere in that range (let's say more than 80)? I have tried passing 100, 500, 1000, 1500 and 2000 samples to my .predict function, and it always makes a correct prediction for 65-68% of the samples.
Why is that? Am I doing something wrong?
I have tried to play with the number of layers, the number of nodes, different activation functions and different optimizers, but it only changes the results by 1-2%.
My dataset looks like this:
DataFrame shape (59249, 33)
x_train shape (47399, 32)
y_train shape (47399,)
x_test shape (11850, 32)
y_test shape (11850,)
testing_features shape (1000, 32)
This is my NN model:
model = Sequential()
model.add(Dense(64, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
# compile the model, adam gradient descent (optimized)
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
# call the function to fit the data (training the network)
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=1, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=32, validation_data=(x_test, y_test), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
These are the results:
Train on 47399 samples, validate on 11850 samples
Epoch 1/15
- 25s - loss: 0.3648 - acc: 0.8451 - val_loss: 0.2825 - val_acc: 0.8756
Epoch 2/15
- 9s - loss: 0.2949 - acc: 0.8689 - val_loss: 0.2566 - val_acc: 0.8797
Epoch 3/15
- 9s - loss: 0.2741 - acc: 0.8773 - val_loss: 0.2468 - val_acc: 0.8849
Epoch 4/15
- 9s - loss: 0.2626 - acc: 0.8816 - val_loss: 0.2416 - val_acc: 0.8845
Epoch 5/15
- 10s - loss: 0.2566 - acc: 0.8827 - val_loss: 0.2401 - val_acc: 0.8867
Epoch 6/15
- 8s - loss: 0.2503 - acc: 0.8858 - val_loss: 0.2364 - val_acc: 0.8893
Epoch 7/15
- 9s - loss: 0.2480 - acc: 0.8873 - val_loss: 0.2321 - val_acc: 0.8895
Epoch 8/15
- 9s - loss: 0.2450 - acc: 0.8886 - val_loss: 0.2357 - val_acc: 0.8888
11850/11850 [==============================] - 2s 173us/step
loss 23.57 acc 88.88
And this is for prediction:
#testing_features are 2000 rows that i extracted from dataset (these samples are not used in training, this is separate dataset thats imported)
prediction = model.predict(testing_features , batch_size=32)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn - also much lower
acc_score = accuracy_score(testing_results, res)
print("Sklearn acc", acc_score)
result_df = pd.DataFrame({"label": testing_results,
                          "prediction": res})
result_df["prediction"] = result_df["prediction"].astype(int)
s = 0
for x, y in zip(result_df["label"], result_df["prediction"]):
    if x == y:
        s += 1
print(s, "/", len(result_df))
acc = s*100/len(result_df)
print('TOTAL ACC:', round(acc, 2))
The problem is...now I get accuracy with sklearn 52% and my_acc 52%.
Why do I get such low accuracy on the unseen data, when the reported validation accuracy is much higher?
The training log you posted shows high validation accuracy, so I'm a bit confused as to where you get that 65% from, but in general, when your model performs much better on training data than on unseen data, that means you're overfitting. This is a big and recurring problem in machine learning, and there is no method guaranteed to prevent it, but there are a couple of things you can try:
regularizing the weights of your network, e.g. using l2 regularization
using stochastic regularization techniques such as drop-out during training
early stopping
reducing model complexity (but you say you've already tried this)
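As an illustration of the first item, l2 regularization adds a penalty proportional to the sum of squared weights to the data loss. The sketch below is plain Python with a hypothetical helper name and made-up numbers; in Keras the same idea is normally expressed through a layer's kernel_regularizer argument.

```python
def l2_penalized_loss(data_loss, weights, l2_strength):
    """Data loss plus l2_strength * sum of squared weights."""
    penalty = l2_strength * sum(w * w for w in weights)
    return data_loss + penalty


# made-up values: data loss 0.5, two weights, penalty strength 0.01
total_loss = l2_penalized_loss(0.5, [1.0, -2.0], l2_strength=0.01)
print(total_loss)
```

Large weights now cost extra loss, which nudges the optimizer towards smaller weights and, usually, a smoother and less overfit model.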
I will list the problems/recommendations that I see on your model.
What are you trying to predict? You are using a sigmoid activation function in the last layer, which suggests binary classification, but in your loss function you used mse, which seems strange. You can try binary_crossentropy instead of the mse loss function for your model.
Your model seems to suffer from overfitting, so you can increase the Dropout probability and also add new Dropout layers between the other hidden layers, or you can remove one of the hidden layers, because your model seems too complex.
You can change the number of neurons in your layers to something narrower, e.g. 64 -> 32 -> 16 -> 1, or try different NN architectures.
Try the adam optimizer instead of sgd.
If you have 57849 samples, you can use 47000 samples for training+validation, and the rest will be your test set.
Don't use the same sets for your evaluation and validation. First split your data into a train set and a test set. Then, when you are fitting your model, pass validation_split and Keras will automatically hold out a validation set from your training set.
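The splitting advice in the last two points can be sketched as follows. This is plain Python operating on sample indices only; the function name, the fractions and the seed are assumptions chosen for illustration.

```python
import random


def three_way_split(n_samples, test_fraction=0.2, val_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into train / validation / test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle before splitting
    n_test = int(n_samples * test_fraction)       # held out, never used in fit
    n_val = int((n_samples - n_test) * val_fraction)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]      # monitored during training
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices are only used for the final evaluation, so the reported test score is not influenced by any decision made while watching the validation metrics.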
When the training starts, only loss and acc are displayed in the run window; val_loss and val_acc are missing. These values are shown only at the end.
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))
model.compile(
    loss='categorical_crossentropy',
    optimizer="adam",
    metrics=['accuracy']
)
model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=1,
    validation_data=(x_test, y_test),
    shuffle=True
)
this is how the training starts:
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
32/50000 [..............................] - ETA: 34:53 - loss: 2.3528 - acc: 0.0938
64/50000 [..............................] - ETA: 18:56 - loss: 2.3131 - acc: 0.0938
96/50000 [..............................] - ETA: 13:45 - loss: 2.3398 - acc: 0.1146
and this is when it finishes
49984/50000 [============================>.] - ETA: 0s - loss: 1.5317 - acc: 0.4377
50000/50000 [==============================] - 231s 5ms/step - loss: 1.5317 - acc: 0.4378 - val_loss: 1.1503 - val_acc: 0.5951
I want to see the val_acc and val_loss in each line
Validation loss and accuracy are computed on epoch end, not on batch end. If you want to compute those values after each batch, you would have to implement your own callback with an on_batch_end() method and call self.model.evaluate() on the validation set. See https://keras.io/callbacks/.
But computing the validation loss and accuracy after each batch is going to slow down your training a lot and doesn't add much in terms of evaluating the network's performance.
It doesn't make much sense to compute the validation metrics at each iteration, because it would make your training process much slower and your model doesn't change that much from iteration to iteration. On the other hand it makes much more sense to compute these metrics at the end of each epoch.
In your case you have 50000 samples on the training set and 10000 samples on the validation set and a batch size of 32. If you were to compute the val_loss and val_acc after each iteration it would mean that for every 32 training samples updating your weights you would have 313 (i.e. 10000/32) iterations of 32 validation samples. Since your every epoch consists of 1563 iterations (i.e. 50000/32), you'd have to perform 489219 (i.e. 313*1563) batch predictions just for evaluating the model. This would cause your model to train several orders of magnitude slower!
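The arithmetic above can be checked with a few lines of Python. The sample counts and the batch size come from the question; the variable names are just for illustration.

```python
import math

train_samples, val_samples, batch_size = 50000, 10000, 32

# batches needed for one full pass over the validation set
val_batches = math.ceil(val_samples / batch_size)
# training iterations (weight updates) in one epoch
train_batches = math.ceil(train_samples / batch_size)
# validation batches evaluated if validation ran after every training iteration
total_val_batches = val_batches * train_batches

print(val_batches, train_batches, total_val_batches)
```

The partial last batch counts as a full iteration, which is why both divisions are rounded up.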
If you still want to compute the validation metrics at the end of each iteration (not recommended for the reasons stated above), you could simply shorten your "epoch" so that your model sees just 1 batch per epoch:
batch_size = 32
model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=len(x_train) // batch_size + 1,  # 1563 in your case
    steps_per_epoch=1,
    validation_data=(x_test, y_test),
    shuffle=True
)
This isn't exactly equivalent because the samples will be drawn at random, with replacement, from the data but it is the easiest you can get...
You should apply one-hot encoding (for example with the to_categorical method)
to your y_train and y_test variables.
I faced this problem in the model-fitting part of a text-generation NLP problem.
Once I applied the to_categorical method to both my y_test and y_validation variables,
the validation accuracy was displayed.
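For readers unfamiliar with the term, this is what one-hot encoding produces. The sketch is plain Python with a hypothetical helper name; in practice Keras's to_categorical does the same job on arrays.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0          # mark the position of the true class
        encoded.append(row)
    return encoded


y = [0, 2, 1]                     # made-up integer labels
y_onehot = to_one_hot(y, num_classes=3)
print(y_onehot)
```

Each row has the same length as the softmax output, which is the target shape that categorical_crossentropy expects.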
If the batch size is bigger than the number of samples in your validation data set, this will happen. Simply lowering the batch size solved my issue.
I'm programming a neural network in tf.keras, with 3 layers. My dataset is the MNIST dataset. I decreased the number of examples in the dataset, so the runtime is lower. This is my code:
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import pandas as pd
!git clone https://github.com/DanorRon/data
%cd data
!ls
batch_size = 32
epochs = 10
alpha = 0.0001
lambda_ = 0
h1 = 50
train = pd.read_csv('/content/first-repository/mnist_train.csv.zip')
test = pd.read_csv('/content/first-repository/mnist_test.csv.zip')
train = train.loc['1':'5000', :]
test = test.loc['1':'2000', :]
train = train.sample(frac=1).reset_index(drop=True)
test = test.sample(frac=1).reset_index(drop=True)
x_train = train.loc[:, '1x1':'28x28']
y_train = train.loc[:, 'label']
x_test = test.loc[:, '1x1':'28x28']
y_test = test.loc[:, 'label']
x_train = x_train.values
y_train = y_train.values
x_test = x_test.values
y_test = y_test.values
nb_classes = 10
targets = y_train.reshape(-1)
y_train_onehot = np.eye(nb_classes)[targets]
nb_classes = 10
targets = y_test.reshape(-1)
y_test_onehot = np.eye(nb_classes)[targets]
model = tf.keras.Sequential()
model.add(layers.Dense(784, input_shape=(784,)))
model.add(layers.Dense(h1, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.add(layers.Dense(10, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.compile(optimizer=tf.train.GradientDescentOptimizer(alpha),
loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(x_train, y_train_onehot, epochs=epochs, batch_size=batch_size)
Whenever I run it, one of 3 things happens:
The loss decreases and the accuracy increases for a few epochs, until the loss becomes NaN for no apparent reason and the accuracy plummets.
The loss and accuracy stay the same for each epoch. Usually the loss is 2.3025 and the accuracy is 0.0986.
The loss starts at NaN(and stays that way), while the accuracy stays low.
Most of the time, the model does one of these things, but sometimes it does something random. It seems like the type of erratic behavior that occurs is completely random. I have no idea what the problem is. How do I fix this problem?
Edit: Sometimes, the loss decreases, but the accuracy stays the same. Also, sometimes the loss decreases and the accuracy increases, then after a while the accuracy decreases while the loss still decreases. Or, the loss decreases and the accuracy increases, then it switches and the loss goes up fast while the accuracy plummets, eventually ending with loss: 2.3025 acc: 0.0986.
Edit 2: This is an example of something that sometimes happens:
Epoch 1/100
49999/49999 [==============================] - 5s 92us/sample - loss: 1.8548 - acc: 0.2390
Epoch 2/100
49999/49999 [==============================] - 5s 104us/sample - loss: 0.6894 - acc: 0.8050
Epoch 3/100
49999/49999 [==============================] - 4s 90us/sample - loss: 0.4317 - acc: 0.8821
Epoch 4/100
49999/49999 [==============================] - 5s 104us/sample - loss: 2.2178 - acc: 0.1345
Epoch 5/100
49999/49999 [==============================] - 5s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 6/100
49999/49999 [==============================] - 4s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 7/100
49999/49999 [==============================] - 4s 89us/sample - loss: 2.3025 - acc: 0.0986
Edit 3: I changed the loss to mean squared error and the network works well now. Is there a way to keep it in cross entropy without it converging to a local minimum?
"I changed the loss to mean squared error and the network works well now"
MSE is not the appropriate loss function for such classification problems; you should certainly stick to loss = 'categorical_crossentropy'.
Most probably, the issue is due to your MNIST data being not normalized; you should normalize your final variables as
x_train = x_train.values/255
x_test = x_test.values/255
Not normalizing input data is a known cause of exploding gradient problems, which is probably what is happening here.
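To give a feel for the scale involved, here is a rough sketch in plain Python. It assumes a single dense unit with an arbitrary constant weight of 0.05 on each of the 784 inputs; the helper names and the weight value are illustrative, not taken from the question.

```python
def scale_pixels(rows, max_value=255.0):
    """Map raw pixel intensities from [0, 255] into [0, 1]."""
    return [[v / max_value for v in row] for row in rows]


def pre_activation(row, weight):
    """Weighted sum seen by one dense unit (same weight on every input)."""
    return sum(weight * v for v in row)


raw_row = [255] * 784                      # an all-white 28x28 image
scaled_row = scale_pixels([raw_row])[0]

raw_sum = pre_activation(raw_row, 0.05)
scaled_sum = pre_activation(scaled_row, 0.05)
print(f"raw: {raw_sum:.1f}, scaled: {scaled_sum:.1f}")
```

With raw pixels the unit's input is 255 times larger, so activations and gradients are correspondingly larger, which makes divergence to NaN far more likely at the same learning rate.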
Other advice: set activation='relu' for your first dense layer, and get rid of both the regularizer & initializer arguments from all layers (the default glorot_uniform is actually a better initializer, while regularization here may actually be harmful for the performance).
As general advice, try not to reinvent the wheel - start with a Keras example using the built-in MNIST data...
The frustration you're feeling towards the seemingly random output of your code is understandable. Every time the model begins training, it randomly initializes the weights. Depending on this initialization, you see one of your three output scenarios.
The issue is most likely due to vanishing gradients. This phenomenon occurs when backpropagation repeatedly multiplies very small weights by small numbers, producing values that are vanishingly close to zero. The solution is to add a small jitter (1e-10) to each of your gradients (from within the cost function) so that they never reach zero.
There are many more detailed blog posts about vanishing gradients online, and for an implementation example check out line 217 of this TensorFlow Network
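One common way to put a small constant "inside the cost function" is to clamp the predicted probabilities away from zero before taking the logarithm. Whether this is exactly what the answer above had in mind is an assumption; the sketch below is plain Python with a hypothetical function name.

```python
import math


def categorical_cross_entropy(one_hot, probs, eps=1e-10):
    """Cross-entropy for one sample, clamping probabilities to at least eps."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


target = [0.0] * 10
target[3] = 1.0                              # true class is 3

# a model that predicts every class with equal probability
uniform_loss = categorical_cross_entropy(target, [0.1] * 10)

# a model that puts all probability on the wrong class
confident_wrong = [0.0] * 10
confident_wrong[7] = 1.0
clipped_loss = categorical_cross_entropy(target, confident_wrong)

print(f"uniform: {uniform_loss:.4f}, clipped: {clipped_loss:.4f}")
```

The uniform prediction gives ln(10), about 2.3026, which is very close to the plateau value reported in the question and suggests the network has collapsed to predicting all ten classes equally. Without the clamp, the second case would require log(0), which is undefined and is one way a NaN loss can appear.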
My model stops training after the 4th epoch even though I expect it to continue training beyond that. I've set monitor to validation loss and patience to 2, which I thought means that training stops after validation loss increases consecutively for 2 epochs. However, training seems to stop before that happens.
I've defined EarlyStopping as follows:
callbacks = [
EarlyStopping(monitor='val_loss', patience=2, verbose=0),
]
And in the fit function I use it like this:
hist = model.fit_generator(
    generator(imgIds, batch_size=batch_size, is_train=True),
    validation_data=generator(imgIds, batch_size=batch_size, is_val=True),
    validation_steps=steps_per_val,
    steps_per_epoch=steps_per_epoch,
    epochs=epoch_count,
    verbose=verbose_level,
    callbacks=callbacks)
I don't understand why training ends after the 4th epoch.
675/675 [==============================] - 1149s - loss: 0.1513 - val_loss: 0.0860
Epoch 2/30
675/675 [==============================] - 1138s - loss: 0.0991 - val_loss: 0.1096
Epoch 3/30
675/675 [==============================] - 1143s - loss: 0.1096 - val_loss: 0.1040
Epoch 4/30
675/675 [==============================] - 1139s - loss: 0.1072 - val_loss: 0.1019
Finished training intermediate1.
I think your interpretation of the EarlyStopping callback is a little off; it stops when the loss has not improved on the best loss it has ever seen for patience epochs. The best loss your model had was 0.0860 at epoch 1, and for epochs 2 and 3 the loss did not improve, so it should have stopped training after epoch 3. However, it continues to train for one more epoch due to what I would call an off-by-one error, given what the docs say about patience:
patience: number of epochs with no improvement after which training will be stopped.
From the Keras source code (edited slightly for clarity):
class EarlyStopping(Callback):
    def on_epoch_end(self, epoch, logs=None):
        current = logs.get(self.monitor)
        if np.less(current - self.min_delta, self.best):
            self.best = current
            self.wait = 0
        else:
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
            self.wait += 1
Notice how self.wait isn't incremented until after the check against self.patience, so while your model should have stopped training after epoch 3, it continued for one more epoch.
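The effect of the ordering can be traced with a small simulation. This is a plain-Python sketch, not Keras code: the function name and the check_before_increment flag are made up, while the validation losses are the ones from the log in the question.

```python
def stopping_epoch(val_losses, patience, check_before_increment, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, current in enumerate(val_losses, start=1):
        if current - min_delta < best:
            best = current            # improvement: reset the counter
            wait = 0
        elif check_before_increment:
            if wait >= patience:      # ordering shown in the excerpt above
                return epoch
            wait += 1
        else:
            wait += 1                 # increment first, then compare
            if wait >= patience:
                return epoch
    return None


val_losses = [0.0860, 0.1096, 0.1040, 0.1019]
old_stop = stopping_epoch(val_losses, patience=2, check_before_increment=True)
new_stop = stopping_epoch(val_losses, patience=2, check_before_increment=False)
print(old_stop, new_stop)
```

Checking before incrementing stops at epoch 4, matching the log in the question, while incrementing first stops at epoch 3, which is what the documented meaning of patience suggests.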
Unfortunately, it seems that if you want a callback that behaves the way you described, where training stops only after the validation loss has increased for patience consecutive epochs, you'd have to write it yourself. But I think you could just modify the EarlyStopping callback slightly to accomplish this.
Edit: The off-by-one error is fixed.