I am running code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification.
I am using EarlyStopping from TensorFlow 2.0 and the callback I have for it is:
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=3,
        min_delta=0.001
    )
]
According to the EarlyStopping page in the TensorFlow 2.0 documentation, the definition of the min_delta parameter is as follows:
min_delta: Minimum change in the monitored quantity to qualify as an
improvement, i.e. an absolute change of less than min_delta, will
count as no improvement.
The training output is:
Train on 60000 samples, validate on 10000 samples
Epoch 1/15 60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15 60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now if I look at the last three epochs, viz. epochs 6, 7, and 8, and look at the validation loss ('val_loss'), the values I recorded are:
0.0688, 0.0843 and 0.0847.
The differences between the 3 consecutive terms are: 0.0155, 0.0004. But isn't the first difference greater than the 'min_delta' defined in the callback?
The code I came up with for EarlyStopping is as follows:
import numpy as np

# list holding the last 'patience = 3' values-
pv = [0.0688, 0.0843, 0.0847]

# compute differences between consecutive elements in 'pv'-
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])

# minimum change required for the monitored metric's improvement-
min_delta = 0.001

# check whether the consecutive differences are greater than the 'min_delta' parameter-
check = differences > min_delta
check
# array([ True, False])

# condition to see whether all 'val_loss' differences are less than 'min_delta',
# in which case training should stop since EarlyStopping is triggered-
if np.all(check == False):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', NOT ALL of the differences across the last 3 epochs are greater than the 'min_delta' of 0.001. For example, the first difference is greater than 0.001 (0.0843 - 0.0688), while the second difference is less than 0.001 (0.0847 - 0.0843).
Also, according to the definition of the patience parameter of EarlyStopping:
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should only be triggered when there is no improvement in 'val_loss' for 3 consecutive epochs, where an absolute change of less than 'min_delta' does not count as improvement.
Then why is EarlyStopping called?
The code for the model definition and 'fit()' is:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model
# Specify the parameters to be used for layer-wise pruning, NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}
def pruned_nn(pruning_params):
    """
    Function to define the architecture of a neural network model
    following the 300-100 architecture for the MNIST dataset, using
    the provided parameters, which are used to prune the model.

    Input: 'pruning_params', a Python 3 dictionary containing the parameters used for pruning
    Output: Returns the designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.1))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile the pruned model-
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model
num_classes = 10    # MNIST has 10 classes
batch_size = 32
epochs = 50

# Instantiate NN-
orig_model = pruned_nn(pruning_params_unpruned)

# Train unpruned Neural Network-
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
The behaviour of the EarlyStopping callback depends on:
the metric or loss to be monitored;
min_delta, which is the minimum change between the monitored quantity in the current epoch and the best result for that metric that is considered an improvement;
patience, which is the number of epochs without improvement (taking into account that an improvement has to be a change greater than min_delta) before training is stopped.
In your case, the best val_loss is 0.0644 (epoch 5), and a subsequent value would have to be lower than 0.0634 to be registered as an improvement:
Epoch 6/15 val_loss: 0.0873 | difference from best: +0.0229
Epoch 7/15 val_loss: 0.0746 | difference from best: +0.0102
Epoch 8/15 val_loss: 0.1088 | difference from best: +0.0444
Be aware that the quantities printed in the training log are rounded, so you shouldn't base your calculations on them. You should instead use the true values from the callbacks or from the training history.
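To make the stopping rule concrete, here is a rough sketch (not the exact Keras implementation) of the best-so-far bookkeeping the callback performs, using the val_loss values from the log above; it stops after epoch 8, as observed:

# val_loss values from the training log, epochs 1-8
val_losses = [0.1117, 0.0801, 0.0709, 0.0787, 0.0644, 0.0873, 0.0746, 0.1088]
min_delta = 0.001
patience = 3

best = float('inf')
wait = 0
for epoch, current in enumerate(val_losses, start=1):
    if current < best - min_delta:   # improvement: beat the best value so far by more than min_delta
        best = current
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            print(f"Stop training after epoch {epoch} - EarlyStopping triggered")
            break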
First I read in my CSV file, which contained a matrix of 1s and 0s:
import pandas as pd

df = pd.read_csv(url)
print(df.head())
print(df.columns)
Next I gathered the pictures and resized them:
import os

image_directory = 'Directory/'
dir_list = os.listdir(image_directory)

print("Files and directories in '", image_directory, "':")
# print the list
print(dir_list)
They were saved into an X2 variable.
import cv2
import numpy as np
import tensorflow as tf
from tqdm import tqdm

SIZE = 200
X_dataset = []

for i in tqdm(range(df.shape[0])):
    img2 = cv2.imread("Cell{}.png".format(i), cv2.IMREAD_UNCHANGED)
    img = tf.keras.preprocessing.image.load_img(image_directory + df['ID'][i], target_size=(SIZE, SIZE, 3))
    # numpy array of each image at size 200, 200, 3 (color)
    img = np.array(img)
    img = img / 255.
    X_dataset.append(img)

X2 = np.array(X_dataset)
print(X2.shape)
I created the y2 data by taking the CSV data, dropping two columns, and getting a shape of (1000, 16):
y2 = np.array(df.drop(['Outcome', 'ID'], axis=1))
print(y2.shape)
I then did the train_test_split. I wonder if my random_state or test_size is not optimal:
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, random_state=10, test_size=0.3)
Next, I created a sequential model. SIZE is 200, set above where the images were resized, so the input shape is (SIZE, SIZE, 3).
model2 = Sequential()
model2.add(Conv2D(filters=16, kernel_size=(10, 10), activation="relu", input_shape=(SIZE,SIZE,3)))
model2.add(BatchNormalization())
model2.add(MaxPooling2D(pool_size=(5, 5)))
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=64, kernel_size=(5, 5), activation="relu"))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))
model2.add(BatchNormalization())
model2.add(Dropout(0.2))
model2.add(Flatten())
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.5))
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.5))
model2.add(Dense(16, activation='sigmoid'))
#Do not use softmax for multilabel classification
#Softmax is useful for mutually exclusive classes, either cat or dog but not both.
#Also, softmax outputs all add to 1. So good for multi class problems where each
#class is given a probability and all add to 1. Highest one wins.
#Sigmoid outputs probability. Can be used for non-mutually exclusive problems.
#like multi label, in this example.
#But, also good for binary mutually exclusive (cat or not cat).
model2.summary()
# Binary cross entropy of each label. So not really a binary classification problem, but
# calculating binary cross entropy for each label.
opt = tf.keras.optimizers.Adamax(
    learning_rate=0.02,
    beta_1=0.8,
    beta_2=0.9999,
    epsilon=1e-9,
    name='Adamax')
model2.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy', 'mse' ])
The model uses a custom optimizer, and the summary shows 473,632 trainable params.
I then specify the sample weights, which were calculated by dividing each class's count by the count of the most frequent class.
sample_weight = { 0:1,
1:0.5197368421,
2:0.4385964912,
3:0.2324561404,
4:0.2302631579,
5:0.399122807,
6:0.08114035088,
7:0.5723684211,
8:0.08552631579,
9:0.2061403509,
10:0.3815789474,
11:0.125,
12:0.08333333333,
13:0.1206140351,
14:0.1403508772,
15:0.4824561404
}
Finally I ran model.fit:
history = model2.fit(X_train2, y_train2, epochs=25, validation_data=(X_test2, y_test2), batch_size=64, class_weight = sample_weight, shuffle = False)
My issue was that the model was maxing out at around 30 to 40% accuracy. I looked into it, and the advice I found said tuning the learning rate was important. I also saw that raising the epochs would help to a point, as would lowering the batch size.
Is there any other thing I may have missed? I noticed the worse models only predicted one class frequently (100% normal, 0% anything else), but the better model predicted on a sliding scale where some items were at 10% and some at 70%.
I also wonder if I inverted my sample weights; my class 0 has the most items in it... Should it be inverted, where 1 sample of class 1 counts for 2 samples of class 0?
Thanks.
Things I tried.
Changing the batch size down to 16 or 8. (resulted in longer epoch times, slightly better results)
Changing the learning rate to a lower number (resulted in slightly better results, but over more epochs)
Changing it to 100 epochs (results plateaued around 20 epochs usually.)
Attempting to create more params higher filters, larger initial kernel size, larger initial pool size, more and higher value dense layers. (This resulted in it eating the RAM and not getting much better results.)
Changing the optimizer to Adam, RAdam, or Adamax. (Didn't really change much; the other optimizers were worse.) I messed with beta_1 and epsilon too.
Revising the CSV. (The data is fairly vague; I had help and it was still hard to tell.)
Removing bad data (I didn't want to get rid of too many pictures.)
Edit: Added a sample run. This one was unusually low, but it starts off well enough (accuracy is initially 25.9%):
Epoch 1/25
14/14 [==============================] - 79s 6s/step - loss: 0.4528 - accuracy: 0.2592 - mse: 0.1594 - val_loss: 261.8521 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 2/25
14/14 [==============================] - 85s 6s/step - loss: 0.2817 - accuracy: 0.3188 - mse: 0.1310 - val_loss: 22.7037 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 3/25
14/14 [==============================] - 79s 6s/step - loss: 0.2611 - accuracy: 0.3555 - mse: 0.1243 - val_loss: 11.9977 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 4/25
14/14 [==============================] - 80s 6s/step - loss: 0.2420 - accuracy: 0.3521 - mse: 0.1172 - val_loss: 6.6056 - val_accuracy: 0.3881 - val_mse: 0.1416
Epoch 5/25
14/14 [==============================] - 80s 6s/step - loss: 0.2317 - accuracy: 0.3899 - mse: 0.1151 - val_loss: 4.9567 - val_accuracy: 0.3881 - val_mse: 0.1415
Epoch 6/25
14/14 [==============================] - 80s 6s/step - loss: 0.2341 - accuracy: 0.3899 - mse: 0.1141 - val_loss: 2.7395 - val_accuracy: 0.3881 - val_mse: 0.1389
Epoch 7/25
14/14 [==============================] - 76s 5s/step - loss: 0.2277 - accuracy: 0.4128 - mse: 0.1107 - val_loss: 2.3758 - val_accuracy: 0.3881 - val_mse: 0.1375
Epoch 8/25
14/14 [==============================] - 85s 6s/step - loss: 0.2199 - accuracy: 0.4106 - mse: 0.1094 - val_loss: 1.4526 - val_accuracy: 0.3881 - val_mse: 0.1319
Epoch 9/25
14/14 [==============================] - 76s 5s/step - loss: 0.2196 - accuracy: 0.4151 - mse: 0.1086 - val_loss: 0.7962 - val_accuracy: 0.3881 - val_mse: 0.1212
Epoch 10/25
14/14 [==============================] - 80s 6s/step - loss: 0.2187 - accuracy: 0.4140 - mse: 0.1087 - val_loss: 0.6308 - val_accuracy: 0.3744 - val_mse: 0.1211
Epoch 11/25
14/14 [==============================] - 81s 6s/step - loss: 0.2175 - accuracy: 0.4071 - mse: 0.1086 - val_loss: 0.5986 - val_accuracy: 0.3242 - val_mse: 0.1170
Epoch 12/25
14/14 [==============================] - 80s 6s/step - loss: 0.2087 - accuracy: 0.3968 - mse: 0.1034 - val_loss: 0.4003 - val_accuracy: 0.3333 - val_mse: 0.1092
Epoch 13/25
12/14 [========================>.....] - ETA: 10s - loss: 0.2092 - accuracy: 0.3945 - mse: 0.1044
Here are some notes that might help:
When using Batch Normalization, avoid too small batch sizes. For more details see the Group Normalization paper by Yuxin Wu and Kaiming He.
It may be worth looking at metrics like AUC and F1 as well, since you have an imbalanced multi-label case. You can add tf.keras.metrics.AUC(curve='PR') to your metrics list.
The training loss seems to have stalled at the end of epoch 13. If the training loss does not decrease anymore, you may want to 1. use a smaller learning rate, and/or 2. reduce your dropout parameters. In particular, the relatively large dropout right before your last layer seems suspicious to me. First, try to obtain a model that fits your training dataset well (with low to no dropout); that is an important step. If your model cannot fit your training dataset well without any regularization, it may need more trainable parameters. After achieving a minimal model that fits the training dataset, you can then add regularization mechanisms to mitigate the overfitting issue.
Unless you have a good reason to do otherwise, set shuffle = True (which is also the default setting) to shuffle the training data before each epoch.
Although it is probably not the root cause of your problem, there is a debate on whether the normalization should come before the activation or after that. Some prefer to use it before the activation.
The following was not clear to me:
I then specify the sample weight which was calculated by taking the
highest sampled number and divided the other numbers by that.
Your class weights may have already been calculated correctly. I'd like to emphasize though that an under-represented class should be assigned a larger weight. Refer to this tutorial from TensorFlow as needed.
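As a rough sketch of such inverse-frequency weighting (the per-class counts below are hypothetical placeholders, not your real data), under-represented classes end up with the larger weights:

# hypothetical per-class sample counts; replace with the real counts from your CSV
class_counts = {0: 456, 1: 237, 2: 104, 3: 52}

total = sum(class_counts.values())
n_classes = len(class_counts)

# inverse-frequency weights: the rarer the class, the larger its weight
class_weight = {c: total / (n_classes * count) for c, count in class_counts.items()}
print(class_weight)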
To do some parameter tuning, I like to loop over some training function with Keras. However, I realized that when using tensorflow.keras.metrics.AUC() as a metric, for every training loop, an integer gets added to the auc metric name (e.g. auc_1, auc_2, ...). So actually the keras metrics are somehow stored even when coming out of the training function.
This first of all leads to the callbacks not recognizing the metric anymore and also makes me wonder if there are not other things stored like the model weights.
How can I reset the metrics and are there other things that get stored by keras that I need to reset to get a clean restart for training?
Below you can find a minimal working example:
edit: this example seems to only work with tensorflow 2.2
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC


def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='sigmoid'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC()])
    return model


def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)

    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)

    callbacks = [CB_lr, CB_es]

    y = [np.ones((11, 1)) for _ in range(1000)]
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()

    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)


for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
The output is:
**** Loop 0 ****
2020-06-16 14:37:46.621264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f991e541f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-16 14:37:46.621296: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/2
10/10 [==============================] - 0s 44ms/step - loss: 0.1295 - auc: 0.0000e+00 - val_loss: 0.0310 - val_auc: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - 0s 10ms/step - loss: 0.0262 - auc: 0.0000e+00 - val_loss: 0.0223 - val_auc: 0.0000e+00 - lr: 0.0010
**** Loop 1 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.4751 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 36ms/step - loss: 0.4751 - auc_1: 0.0000e+00 - val_loss: 0.3137 - val_auc_1: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.2617 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.2617 - auc_1: 0.0000e+00 - val_loss: 0.2137 - val_auc_1: 0.0000e+00 - lr: 0.0010
**** Loop 2 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.1948 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 34ms/step - loss: 0.1948 - auc_2: 0.0000e+00 - val_loss: 0.0517 - val_auc_2: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.0445 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.0445 - auc_2: 0.0000e+00 - val_loss: 0.0389 - val_auc_2: 0.0000e+00 - lr: 0.0010
Your reproducible example failed in several places for me, so I changed just a few things (I'm using TF 2.1). After getting it to run, I was able to get rid of the additional metric names by specifying metrics=[AUC(name='auc')]. Here's the full (fixed) reproducible example:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC


def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC(name='auc')])
    return model


def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)

    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)

    callbacks = [CB_lr, CB_es]

    y = tf.keras.utils.to_categorical([np.random.randint(0, 11) for _ in range(1000)])
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()

    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)


for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
Train for 10 steps, validate for 10 steps
Epoch 1/2
1/10 [==>...........................] - ETA: 6s - loss: 0.3426 - auc: 0.4530
7/10 [====================>.........] - ETA: 0s - loss: 0.3318 - auc: 0.4895
10/10 [==============================] - 1s 117ms/step - loss: 0.3301 - auc: 0.4893 - val_loss: 0.3222 - val_auc: 0.5085
This happens because in every loop you create a new metric without a specified name by doing metrics=[AUC()]. In the first iteration of the loop, TF automatically created a variable in the name space called auc, but in the second iteration of your loop the name 'auc' was already taken, so TF named it auc_1 since you didn't specify a name. But your callbacks were set to monitor auc, which is a metric this model doesn't have (it was the metric of the model from the previous loop). So you either pass name='auc', which overwrites the previous metric name, or you define the metric outside of the loop, like this:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC

auc = AUC()


def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[auc])
    return model
And don't worry about Keras resetting the metrics: it takes care of all that in the fit() method. If you want more flexibility and/or want to do it yourself, I suggest using custom training loops and resetting the metric yourself:
auc = tf.keras.metrics.AUC()
auc.update_state(np.random.randint(0, 2, 10), np.random.randint(0, 2, 10))
print(auc.result())
auc.reset_states()
print(auc.result())
Out[6]: <tf.Tensor: shape=(), dtype=float32, numpy=0.875> # state updated
Out[8]: <tf.Tensor: shape=(), dtype=float32, numpy=0.0> # state reset
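A further option, in case you want a completely clean restart between runs (my own suggestion, not part of the answer above): tf.keras.backend.clear_session() resets Keras' global state, including the counters used to uniquify autogenerated names, so each run's metric is again called auc:

import tensorflow as tf

for i in range(3):
    # clear layer/metric name counters and other global Keras state before each run
    tf.keras.backend.clear_session()
    print(f'\n\n **** Loop {i} **** \n\n')
    train()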
I want to create a neural network which can add two integer numbers. I have designed it as follows:
My question: I have a really low accuracy of 0.002. What can I do to increase it?
For creating data:
import numpy as np
import random

a = []
b = []
c = []

for i in range(1, 1001):
    a.append(random.randint(1, 999))
    b.append(random.randint(1, 999))
    c.append(a[i-1] + b[i-1])

X = np.array([a, b]).transpose()
y = np.array(c).transpose().reshape(-1, 1)
Scaling my data:
from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
minmax2 = MinMaxScaler()
X = minmax.fit_transform(X)
y = minmax2.fit_transform(y)
The network:
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
clfa = Sequential()
clfa.add(Dense(input_dim=2, output_dim=2, activation='sigmoid', kernel_initializer='he_uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=2, activation='sigmoid', kernel_initializer='uniform'))
clfa.add(Dense(output_dim=1, activation='relu'))
opt = SGD(lr=0.01)
clfa.compile(opt, loss='mean_squared_error', metrics=['acc'])
clfa.fit(X, y, epochs=140)
Outputs:
Epoch 133/140
1000/1000 [==============================] - 0s 39us/step - loss: 0.0012 - acc: 0.0020
Epoch 134/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 135/140
1000/1000 [==============================] - 0s 41us/step - loss: 0.0012 - acc: 0.0020
Epoch 136/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 137/140
1000/1000 [==============================] - 0s 41us/step - loss: 0.0012 - acc: 0.0020
Epoch 138/140
1000/1000 [==============================] - 0s 42us/step - loss: 0.0012 - acc: 0.0020
Epoch 139/140
1000/1000 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 0.0020
Epoch 140/140
1000/1000 [==============================] - 0s 42us/step - loss: 0.0012 - acc: 0.0020
That is my code with console outputs.
I have tried many different combinations of optimizers, losses, and activations, plus this data fits a linear regression perfectly.
Two mistakes, several issues.
The mistakes:
This is a regression problem, so the activation of the last layer should be linear, not relu (leaving it without specifying anything will work, since linear is the default activation in a Keras layer).
Accuracy is meaningless in regression; remove metrics=['acc'] from your model compilation - you should judge the performance of your model only with your loss.
The issues:
We don't use sigmoid activations for the intermediate layers; change all of them to relu.
Remove the kernel_initializer argument, thus leaving the default glorot_uniform, which is the recommended one.
Several Dense layers with only two nodes each is not a good idea; try reducing the number of layers and increasing the number of nodes. See here for a simple example network for the iris data.
You are trying to fit a linear function, but internally use sigmoid nodes, which map values to the range (0, 1). Sigmoid is very useful for classification, but not really for regression if the values are outside (0, 1). It could MAYBE work if you restricted your random numbers to floating-point values in the interval [0, 1]. OR input all the bits into your nodes separately, and have it learn an adder.
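A minimal sketch applying the suggestions above to the question's data (relu hidden layers, a linear output, default initializers, MSE loss with no accuracy metric); the layer sizes are arbitrary choices of mine, and X, y are the scaled arrays from the question:

from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

reg = Sequential()
reg.add(Dense(16, input_dim=2, activation='relu'))    # wider hidden layer instead of many 2-node layers
reg.add(Dense(16, activation='relu'))
reg.add(Dense(1))                                     # linear output for regression

reg.compile(SGD(lr=0.01), loss='mean_squared_error')  # judge the model by its loss only
reg.fit(X, y, epochs=140)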
I trained a model with ResNet50 and got an amazing accuracy of 95% on the training set.
I took the same training set for validation and the accuracy seems very bad (< 0.05).
from keras.preprocessing.image import ImageDataGenerator

train_set = ImageDataGenerator(horizontal_flip=True, rescale=1./255, shear_range=0.2,
                               zoom_range=0.2).flow_from_directory(
    data,
    target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland', 'commercial_area', 'dense_residential',
             'desert', 'forest', 'freeway', 'golf_course', 'ground_track_field', 'harbor',
             'industrial_area', 'intersection', 'island', 'lake', 'meadow', 'medium_residential',
             'mobile_home_park', 'mountain', 'overpass', 'parking_lot', 'railway',
             'rectangular_farmland', 'roundabout', 'runway'],
    batch_size=31)
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers, Model

conv_base = ResNet50(
    include_top=False,
    weights='imagenet')

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)
# here you will write the path for train data or if you create your val data then you can test using that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256),
    classes=['airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 'bridge',
             'chaparral', 'church', 'circular_farmland', 'commercial_area', 'dense_residential',
             'desert', 'forest', 'freeway', 'golf_course', 'ground_track_field', 'harbor',
             'industrial_area', 'intersection', 'island', 'lake', 'meadow', 'medium_residential',
             'mobile_home_park', 'mountain', 'overpass', 'parking_lot', 'railway',
             'rectangular_farmland', 'roundabout', 'runway'],
    batch_size=1, shuffle=True)

model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

model.fit_generator(train_set, steps_per_epoch=1488 // 31, epochs=10, verbose=True,
                    validation_data=test_generator,
                    validation_steps=test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!
You have configured train_set and test_datagen differently. In particular, one is augmented (flipped, sheared, zoomed) where the other isn't. As I mentioned in my comment, if it's the same data it will have the same accuracy. You can see whether a model is overfitting when you use validation correctly and use unseen data for validation. Using the same data will always give the same accuracy for training and validation.
I'm not sure what exactly is wrong, but it is NOT an overfitting issue. It is clear your validation data (the same as the training data) is not going in correctly. For one thing, you set the validation batch size to 1, but you set the validation steps as validation_steps = test_generator.samples // 31. If test_generator.samples = 1488, then you have 48 steps, but with a batch size of 1 you will only validate 48 samples. You want to set the batch size and steps so that batch_size x validation_steps equals the total number of samples; that way you go through the validation set exactly one time. I also recommend that for the test generator you set shuffle=False. Also, why do you bother entering all the class names? If your class directories are labeled 'airplane', 'airport', 'baseball_diamond', etc., then you don't need to specifically define the classes; flow_from_directory will do that for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your test data because you are using data augmentation in the training generator. That's OK; it may lead to a small difference between your training and validation accuracy, but your validation accuracy should be pretty close once you get the validation data to go in correctly.
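A minimal sketch of a validation generator set up along those lines (assuming 1488 images under the same data directory as in the question, and letting flow_from_directory infer the classes from the subdirectory names):

from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1. / 255)

test_generator = test_datagen.flow_from_directory(
    data,
    target_size=(256, 256),
    batch_size=31,        # batch_size * validation_steps should cover every sample
    shuffle=False)        # keep validation order fixed

validation_steps = test_generator.samples // 31   # 1488 // 31 = 48, so 48 * 31 = 1488 samples

model.fit_generator(train_set,
                    steps_per_epoch=1488 // 31,
                    epochs=10,
                    validation_data=test_generator,
                    validation_steps=validation_steps)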
I'm making a simple classification algo with a Keras neural network. The goal is to take 3 data points on weather and decide whether or not there's a wildfire. Here's an image of the .csv dataset that I'm using to train the model (this image shows only the top few lines and isn't the entire thing):
[image: wildfire weather dataset]
As you can see, there are 4 columns with the fourth being either a "1" which means "fire", or a "0" which means "no fire". I want the algo to predict either a 1 or a 0. This is the code that I wrote:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import csv
#THIS IS USED TO TRAIN THE MODEL
# Importing the dataset
dataset = pd.read_csv('Fire_Weather.csv')
dataset.head()
X=dataset.iloc[:,0:3]
Y=dataset.iloc[:,3]
X.head()
obj=StandardScaler()
X=obj.fit_transform(X)
X_train,X_test,y_train,y_test=train_test_split(X, Y, test_size=0.25)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=3))
# classifier.add(Dropout(p = 0.1))

# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
# classifier.add(Dropout(p = 0.1))

# Adding the output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

classifier.fit(X_train, y_train, batch_size=3, epochs=10)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
print(y_pred)
classifier.save("weather_model.h5")
The problem is that whenever I run this, my accuracy is always "0.0000e+00" and my training output looks like this:
Epoch 1/10
2146/2146 [==============================] - 2s 758us/step - loss: nan - accuracy: 0.0238
Epoch 2/10
2146/2146 [==============================] - 1s 625us/step - loss: nan - accuracy: 0.0000e+00
Epoch 3/10
2146/2146 [==============================] - 1s 604us/step - loss: nan - accuracy: 0.0000e+00
Epoch 4/10
2146/2146 [==============================] - 1s 609us/step - loss: nan - accuracy: 0.0000e+00
Epoch 5/10
2146/2146 [==============================] - 1s 624us/step - loss: nan - accuracy: 0.0000e+00
Epoch 6/10
2146/2146 [==============================] - 1s 633us/step - loss: nan - accuracy: 0.0000e+00
Epoch 7/10
2146/2146 [==============================] - 1s 481us/step - loss: nan - accuracy: 0.0000e+00
Epoch 8/10
2146/2146 [==============================] - 1s 476us/step - loss: nan - accuracy: 0.0000e+00
Epoch 9/10
2146/2146 [==============================] - 1s 474us/step - loss: nan - accuracy: 0.0000e+00
Epoch 10/10
2146/2146 [==============================] - 1s 474us/step - loss: nan - accuracy: 0.0000e+00
Does anyone know why this is happening and what I could do to my code to fix this?
Thank You!
EDIT: I realized that my earlier response was highly misleading, which was thankfully pointed out by @xdurch0 and @Timbus Calin. Here is an edited answer.
Check that all your input values are valid. Are there any nan or inf values in your training data?
Try using different activation functions. ReLU is good, but it is prone to what is known as the dying ReLU problem, where the neural network basically learns nothing since no updates are made to its weights. One possibility is to use Leaky ReLU or PReLU.
Try using gradient clipping, which is a technique used to tackle vanishing or exploding gradients (which is likely what is happening in your case). Keras allows users to configure the clipnorm and clipvalue parameters for optimizers.
There are posts on SO that report similar problems, such as this one, which might also be of interest to you.
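A minimal sketch of the first and third suggestions, assuming the dataset DataFrame and classifier model from the question, and that all CSV columns are numeric:

import numpy as np
from keras.optimizers import Adam

# 1. check the training data for invalid values before fitting
print(dataset.isnull().values.any())     # any missing values?
print(np.isinf(dataset.values).any())    # any infinite values?

# 3. gradient clipping: cap the gradient norm so exploding gradients cannot turn the loss into nan
classifier.compile(optimizer=Adam(lr=0.001, clipnorm=1.0),
                   loss='binary_crossentropy',
                   metrics=['accuracy'])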