For parameter tuning, I like to loop over a training function with Keras. However, I noticed that when using tensorflow.keras.metrics.AUC() as a metric, an integer gets appended to the AUC metric name on every training loop (e.g. auc_1, auc_2, ...). So the Keras metrics are apparently retained somewhere even after the training function returns.
This means the callbacks no longer recognize the metric, and it also makes me wonder whether other things, such as the model weights, are retained as well.
How can I reset the metrics, and is there anything else Keras stores that I need to reset to get a clean restart for training?
Below you can find a minimal working example:
Edit: this example only seems to work with TensorFlow 2.2.
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC
def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='sigmoid'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC()])
    return model
def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)
    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = [np.ones((11, 1)) for _ in range(1000)]
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=(37, 12, 1))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)

for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
The output is:
**** Loop 0 ****
2020-06-16 14:37:46.621264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f991e541f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-16 14:37:46.621296: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/2
10/10 [==============================] - 0s 44ms/step - loss: 0.1295 - auc: 0.0000e+00 - val_loss: 0.0310 - val_auc: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - 0s 10ms/step - loss: 0.0262 - auc: 0.0000e+00 - val_loss: 0.0223 - val_auc: 0.0000e+00 - lr: 0.0010
**** Loop 1 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.4751 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 36ms/step - loss: 0.4751 - auc_1: 0.0000e+00 - val_loss: 0.3137 - val_auc_1: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.2617 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.2617 - auc_1: 0.0000e+00 - val_loss: 0.2137 - val_auc_1: 0.0000e+00 - lr: 0.0010
**** Loop 2 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.1948 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 34ms/step - loss: 0.1948 - auc_2: 0.0000e+00 - val_loss: 0.0517 - val_auc_2: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.0445 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.0445 - auc_2: 0.0000e+00 - val_loss: 0.0389 - val_auc_2: 0.0000e+00 - lr: 0.0010
Your reproducible example failed in several places for me, so I changed just a few things (I'm using TF 2.1). After getting it to run, I was able to get rid of the additional metric names by specifying metrics=[AUC(name='auc')]. Here's the full (fixed) reproducible example:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC
def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC(name='auc')])
    return model
def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)
    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = tf.keras.utils.to_categorical([np.random.randint(0, 11) for _ in range(1000)])
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=(37, 12, 1))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)

for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
Train for 10 steps, validate for 10 steps
Epoch 1/2
1/10 [==>...........................] - ETA: 6s - loss: 0.3426 - auc: 0.4530
7/10 [====================>.........] - ETA: 0s - loss: 0.3318 - auc: 0.4895
10/10 [==============================] - 1s 117ms/step - loss: 0.3301 - auc: 0.4893 - val_loss: 0.3222 - val_auc: 0.5085
This happens because on every loop you created a new metric without a specified name by doing metrics=[AUC()]. On the first iteration of the loop, TF automatically created a variable in the name space called auc, but on the second iteration the name 'auc' was already taken, so TF named it auc_1 since you didn't specify a name. Your callbacks, however, were set to monitor auc, a metric this model didn't have (it belonged to the model from the previous loop). So you can either pass name='auc' so the same name is used every time, or define the metric outside of the loop, like this:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC
auc = AUC()
def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[auc])
    return model
And don't worry about Keras resetting the metrics; it takes care of all of that inside the fit() method. If you want more flexibility and/or want to do it yourself, I suggest using custom training loops and resetting the metric yourself:
auc = tf.keras.metrics.AUC()
auc.update_state(np.random.randint(0, 2, 10), np.random.randint(0, 2, 10))
print(auc.result())
auc.reset_states()
print(auc.result())
Out[6]: <tf.Tensor: shape=(), dtype=float32, numpy=0.875> # state updated
Out[8]: <tf.Tensor: shape=(), dtype=float32, numpy=0.0> # state reset
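For illustration, here is a minimal custom-training-loop sketch; the tiny model and random data are hypothetical, just to keep the example self-contained, and the metric is reset explicitly at the end of each epoch:
import numpy as np
import tensorflow as tf

# Toy setup (hypothetical): a tiny binary classifier on random data.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()
auc = tf.keras.metrics.AUC(name='auc')

x = np.random.rand(64, 4).astype('float32')
y = np.random.randint(0, 2, size=(64, 1)).astype('float32')
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

for epoch in range(3):
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            y_pred = model(x_batch, training=True)
            loss = loss_fn(y_batch, y_pred)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        auc.update_state(y_batch, y_pred)      # accumulate AUC over the epoch
    print(f'epoch {epoch}: auc = {auc.result().numpy():.4f}')
    auc.reset_states()                         # clean state for the next epoch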
Related
I am trying to train a simple convolutional network using Keras (Tensorflow 2.8.0) in Python 3.7.9 on Spyder IDE 5.2.2. The network involves Conv2D, MaxPooling2D, Flatten and Dense layers.
The model ran perfectly when I used my CPU, but training was slow. So I decided to try to run it on my GPU (GeForce GTX 1050 Ti).
I installed CUDA 11.2 and added lib, include and bin to my path. I installed CuDNN 8.1, and copied cudnn84_8.dll into the CUDA bin directory, also copied cudnn.h into CUDA header and cudnn.lib into CUDA lib.
Once I had done the above, I was able to use my GPU with Tensorflow. Tensorflow does recognize my GPU: when I run tf.config.list_physical_devices() I get:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
However, when I run the model containing Conv2D layers using my GPU, it fails as follows. This is the output I get:
2022-03-17 18:32:04.687276: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-17 18:32:05.171943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2782 MB memory: -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
Epoch 1/10
This is where you would normally see the progress bars for the training (verbose=1) but nothing further appears after "Epoch 1/10". Instead, I think the kernel restarts, because it clears all saved variables and I have to re-import all packages.
If I train a model using only Dense layers, it does work using the GPU. I have checked that it is actually using the GPU in this case, by monitoring GPU usage - it does use it. And as mentioned, the Conv2D model works fine on my CPU.
So in summary, I seem to have a problem selective for Conv2D models on my GPU. Any help in understanding this would be greatly appreciated!
My code:
from matplotlib import pyplot
import tensorflow as tf
import numpy as np
import keras
from keras.datasets import cifar10
#from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
def load_dataset():
    (trainX, trainy), (testX, testy) = cifar10.load_data()
    print('Train shape: X=%s, y=%s' % (trainX.shape, trainy.shape))
    print('Test shape: X=%s, y=%s' % (testX.shape, testy.shape))
    trainY = tf.keras.utils.to_categorical(trainy)
    testY = tf.keras.utils.to_categorical(testy)
    return trainX, trainY, testX, testY

def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))  # kernel_initializer='he_uniform',
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
def summarize_diagnostics(history):
    # plot loss
    pyplot.subplot(211)
    pyplot.title('Cross Entropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='test')
    # plot accuracy
    pyplot.subplot(212)
    pyplot.title('Classification Accuracy')
    pyplot.plot(history.history['accuracy'], color='blue', label='train')
    pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
    pyplot.show()

def run_test_harness():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    model = define_model()
    history = model.fit(trainX, trainY, epochs=10, batch_size=64, validation_data=(testX, testY))
    _, acc = model.evaluate(testX, testY)
    print('> %.3f' % (acc * 100.0))  # overall validation accuracy
    summarize_diagnostics(history)

tf.config.list_physical_devices()
run_test_harness()
This could be due to insufficient GPU memory being available for this code on your system.
The code works fine when I replicated it in Google Colab in GPU mode; CPU mode, by comparison, takes more time to run the code:
tf.config.list_physical_devices()
Output:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
To run the code:
run_test_harness()
Output:
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 2s 0us/step
170508288/170498071 [==============================] - 2s 0us/step
Train shape: X=(50000, 32, 32, 3), y=(50000, 1)
Test shape: X=(10000, 32, 32, 3), y=(10000, 1)
Epoch 1/10
782/782 [==============================] - 22s 10ms/step - loss: 1.9416 - accuracy: 0.3059 - val_loss: 1.7283 - val_accuracy: 0.3935
Epoch 2/10
782/782 [==============================] - 7s 9ms/step - loss: 1.6194 - accuracy: 0.4278 - val_loss: 1.5163 - val_accuracy: 0.4528
Epoch 3/10
782/782 [==============================] - 8s 10ms/step - loss: 1.4575 - accuracy: 0.4835 - val_loss: 1.3856 - val_accuracy: 0.5099
Epoch 4/10
782/782 [==============================] - 6s 8ms/step - loss: 1.3539 - accuracy: 0.5213 - val_loss: 1.3111 - val_accuracy: 0.5348
Epoch 5/10
782/782 [==============================] - 5s 7ms/step - loss: 1.2553 - accuracy: 0.5589 - val_loss: 1.2233 - val_accuracy: 0.5658
Epoch 6/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1763 - accuracy: 0.5865 - val_loss: 1.1691 - val_accuracy: 0.5841
Epoch 7/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1091 - accuracy: 0.6115 - val_loss: 1.1284 - val_accuracy: 0.6005
Epoch 8/10
782/782 [==============================] - 5s 6ms/step - loss: 1.0402 - accuracy: 0.6344 - val_loss: 1.0932 - val_accuracy: 0.6183
Epoch 9/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9842 - accuracy: 0.6546 - val_loss: 1.0825 - val_accuracy: 0.6255
Epoch 10/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9350 - accuracy: 0.6742 - val_loss: 1.0722 - val_accuracy: 0.6256
313/313 [==============================] - 1s 3ms/step - loss: 1.0722 - accuracy: 0.6256
> 62.560
Please check again in Google Colab by selecting the GPU
(Runtime - Change runtime type - Hardware accelerator - GPU - Save)
and let us know if the issue still persists.
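If the crash turns out to be GPU memory related, one thing worth trying (an assumption based on the 4 GB card, not verified for this setup) is enabling memory growth before calling run_test_harness():
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all upfront.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

run_test_harness()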
Some context about my project: I intend to study various parameters about bullets and how they affect the ballistics coefficient (i.e. bullet performance) of the projectile. I have different parameters, such as weight, caliber, sectional density, etc. I feel that I did this all wrong though; I am just reading through tutorials and applying what I feel could be useful and relevant in my project.
The output of my regression model looks a bit off to me; the trained model continuously outputs 0.0201 as MSE throughout the model.fit() part of my program.
Also, model.predict(X) seems to have an accuracy of 100%, which does not seem right; I borrowed some code from a tutorial on Keras models to display the model output alongside the expected output.
This is the program constructing the model and training it
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from pandas.plotting import scatter_matrix
import time
name = 'Bullet Database Analysis v2-{}'.format(int(time.time()))
tensorboard = TensorBoard(log_dir='logs/{}'.format(name))
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
df = pd.read_csv('Bullet Optimization\ShootForum Bullet DB_2.csv')
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
dataset = df.values
X = dataset[:,0:12]
X = np.asarray(X).astype(np.float32)
y = dataset[:,13]
y = np.asarray(y).astype(np.float32)
X_train, X_val_and_test, y_train, y_val_and_test = train_test_split(X, y, test_size=0.3, shuffle=True)
X_val, X_test, y_val, y_test = train_test_split(X_val_and_test, y_val_and_test, test_size=0.5)
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
model = Sequential(
    [
        # 2430 is the shape of X_train
        # BatchNormalization(axis=-1, momentum=0.1),
        Dense(2430, activation='relu'),
        Dense(32, activation='relu'),
        Dense(1),
    ]
)
model.compile(loss='mse', metrics=['mse'])
history = model.fit(X_train, y_train,
                    batch_size=64,
                    epochs=20,
                    validation_data=(X_val, y_val),
                    # callbacks=[tensorboard]
                    )
# plt.plot(history.history['loss'],'r')
# plt.plot(history.history['val_loss'],'m')
plt.plot(history.history['mse'],'b')
plt.show()
model.summary()
model.save("Bullet Optimization\Bullet Database Analysis.h5")
Here is my code, loading my previously trained model via h5
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.models import load_model
import pandas as pd
df = pd.read_csv('Bullet Optimization\ShootForum Bullet DB_2.csv')
model = load_model('Bullet Optimization\Bullet Database Analysis.h5')
dataset = df.values
X = dataset[:,0:12]
y = dataset[:,13]
model.fit(X,y, epochs=10)
#predictions = np.argmax(model.predict(X), axis=-1)
predictions = model.predict(X)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
This is the output
Epoch 1/10
2021-03-09 10:38:06.372303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-09 10:38:07.747241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
109/109 [==============================] - 2s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 2/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 3/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 4/10
109/109 [==============================] - 0s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 5/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 6/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 7/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 8/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 9/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 10/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
[0.314, 7.9756, 100.0, 100.0, 31.4, 0.00314, 318.4713376, 6.480041472000001, 0.51, 12.95400001, 4.067556004, 0.145] => 0 (expected 0)
[0.358, 9.0932, 148.0, 148.0, 52.983999999999995, 0.002418919, 413.4078212, 9.590461379, 0.635, 16.12900002, 5.774182006, 0.165] => 0 (expected 0)
[0.313, 7.9502, 83.0, 83.0, 25.979, 0.003771084, 265.1757188, 5.378434422000001, 0.504, 12.80160001, 4.006900804, 0.121] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.4, 10.16000001, 2.5501600030000002, 0.113] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.41, 10.41400001, 2.613914003, 0.113] => 0 (expected 0)
Here is a link to my training dataset. Within my code, I used train_test_split to create both the test and train dataset.
Lastly, is there a way within Tensorboard to visualize the model fitting with the dataset? I really feel that although my model is training, it is not making any significant fitting even though the MSE error is reduced.
This happens because you have NaN values in your dataset. Before splitting, you can check for them with df.isna().sum(). NaNs can have a negative impact on your network. Here I simply dropped them (df.dropna(inplace=True, axis=0)), but you can also use imputation techniques to replace them.
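For reference, a short sketch of that check and drop (the CSV path is the one from the question; the fillna line is only an illustrative alternative):
import pandas as pd

df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')

print(df.isna().sum())               # count missing values per column
df.dropna(inplace=True, axis=0)      # drop rows that contain any NaN
# Alternatively, impute instead of dropping, e.g.:
# df = df.fillna(df.mean(numeric_only=True))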
Also, 2430 neurons can be overkill for this data; start with fewer neurons.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1),
    ]
)
Here is the last epoch:
Epoch 20/20
27/27 [==============================] - 0s 8ms/step - loss: 8.2077e-04 - mse: 8.2077e-04 - val_loss: 8.5023e-04 - val_mse: 8.5023e-04
When doing regression, computing accuracy directly is not a valid option. You can use model.evaluate(X_test, y_test), or once you get predictions from model.predict, you can use other regression metrics to measure how close your predictions are.
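For example, a minimal sketch (assuming model, X_test, and y_test from the training script above) using scikit-learn's regression metrics:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test).ravel()   # flatten the (n, 1) predictions

print('MAE :', mean_absolute_error(y_test, y_pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
print('R^2 :', r2_score(y_test, y_pred))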
I am running a code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification.
I am using EarlyStopping from TensorFlow 2.0 and the callback I have for it is:
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3,
        min_delta=0.001
    )
]
According to EarlyStopping - TensorFlow 2.0 page, the definition of min_delta parameter is as follows:
min_delta: Minimum change in the monitored quantity to qualify as an
improvement, i.e. an absolute change of less than min_delta, will
count as no improvement.
Train on 60000 samples, validate on 10000 samples
Epoch 1/15 60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15 60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15 60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15 60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now, if I look at the last three epochs, viz. epochs 6, 7, and 8, and look at the validation loss ('val_loss'), the values are:
0.0688, 0.0843 and 0.0847.
The differences between the 3 consecutive terms are: 0.0155, 0.0004. But isn't the first difference greater than 'min_delta' as defined in the callback?
The code I came up with for EarlyStopping is as follows:
# numpy array to hold last 'patience = 3' values-
pv = [0.0688, 0.0843, 0.0847]
# numpy array to compute differences between consecutive elements in 'pv'-
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])
# minimum change required for monitored metric's improvement-
min_delta = 0.001
# Check whether the consecutive differences is greater than 'min_delta' parameter-
check = differences > min_delta
check
# array([ True, False])
# Condition to see whether all 3 'val_loss' differences are less than 'min_delta'
# for training to stop since EarlyStopping is called-
if np.all(check == False):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', NOT ALL of the differences between the last 3 epochs are greater than the 'min_delta' of 0.001. For example, the first difference is greater than 0.001 (0.0843 - 0.0688), while the second difference is less than 0.001 (0.0847 - 0.0843).
Also, according to patience parameter definition of "EarlyStopping":
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should be called when there is no improvement for 'val_loss' for 3 consecutive epochs where the absolute change of less than 'min_delta' does not count as improvement.
Then why is EarlyStopping called?
The code for the model definition and 'fit()' is:
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import matplotlib.pyplot as plt
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model
# Specify the parameters to be used for layer-wise pruning, NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}
def pruned_nn(pruning_params):
    """
    Function to define the architecture of a neural network model
    following 300 100 architecture for MNIST dataset and using
    provided parameter which are used to prune the model.

    Input: 'pruning_params' Python 3 dictionary containing parameters which are used for pruning
    Output: Returns designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.1))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile pruned NN-
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model
batch_size = 32
epochs = 50
# Instantiate NN-
orig_model = pruned_nn(pruning_params_unpruned)
# Train unpruned Neural Network-
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
The behaviour of the Early Stopping callback is related to:
Metric or Loss to be monitored
min_delta, which is the minimum change, relative to the best result so far in the monitored quantity, for the current epoch to be considered an improvement.
patience, which is the number of epochs without improvement (where an improvement must be a change greater than min_delta) before training is stopped.
In your case, the best val_loss is 0.0644, and a subsequent epoch would need a value lower than 0.0634 to be registered as an improvement:
Epoch 6/15 val_loss: 0.0873 | Difference is: + 0.0229
Epoch 7/15 val_loss: 0.0746 | Difference is: + 0.0102
Epoch 8/15 val_loss: 0.1088 | Difference is: + 0.0444
Be aware that the quantities printed in the training log are rounded, and you shouldn't base your assumptions on these values. Instead, take into consideration the true values from the callbacks or from the training history.
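For example, a small sketch (using the history_orig object returned by fit() in the question's code) that prints the unrounded val_loss values the callback actually compares:
import numpy as np

val_losses = history_orig.history['val_loss']
for epoch, v in enumerate(val_losses, start=1):
    print(f'epoch {epoch}: val_loss = {v:.6f}')   # full precision, not the rounded log value

best_epoch = int(np.argmin(val_losses)) + 1
print('best val_loss:', min(val_losses), 'at epoch', best_epoch)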
I'm working on an artificial intelligence project and I want to predict the Bitcoin trend, but when using the model.predict function from Keras with my test_set, the prediction is always equal to 1 and the line in my diagram is therefore always straight.
import csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cryptory import Cryptory
from keras.models import Sequential, Model, InputLayer
from keras.layers import LSTM, Dropout, Dense
from sklearn.preprocessing import MinMaxScaler
def format_to_3d(df_to_reshape):
    reshaped_df = np.array(df_to_reshape)
    return np.reshape(reshaped_df, (reshaped_df.shape[0], 1, reshaped_df.shape[1]))
crypto_data = Cryptory(from_date = "2014-01-01")
bitcoin_data = crypto_data.extract_coinmarketcap("bitcoin")
sc = MinMaxScaler()
for col in bitcoin_data.columns:
    if col != "open":
        del bitcoin_data[col]
training_set = bitcoin_data;
training_set = sc.fit_transform(training_set)
# Split the data into train, validate and test
train_data = training_set[365:]
# Split the data into x and y
x_train, y_train = train_data[:len(train_data)-1], train_data[1:]
model = Sequential()
model.add(LSTM(units=4, input_shape=(None, 1))) # 128 -- neurons**?
# model.add(Dropout(0.2))
model.add(Dense(units=1, activation="softmax")) # activation function could be different
model.compile(optimizer="adam", loss="mean_squared_error") # mse could be used for loss, look into optimiser
model.fit(format_to_3d(x_train), y_train, batch_size=32, epochs=15)
test_set = bitcoin_data
test_set = sc.transform(test_set)
test_data = test_set[:364]
input = test_data
input = sc.inverse_transform(input)
input = np.reshape(input, (364, 1, 1))
predicted_result = model.predict(input)
print(predicted_result)
real_value = sc.inverse_transform(input)
plt.plot(real_value, color='pink', label='Real Price')
plt.plot(predicted_result, color='blue', label='Predicted Price')
plt.title('Bitcoin Prediction')
plt.xlabel('Time')
plt.ylabel('Prices')
plt.legend()
plt.show()
The training set performance looks like this:
1566/1566 [==============================] - 3s 2ms/step - loss: 0.8572
Epoch 2/15
1566/1566 [==============================] - 1s 406us/step - loss: 0.8572
Epoch 3/15
1566/1566 [==============================] - 1s 388us/step - loss: 0.8572
Epoch 4/15
1566/1566 [==============================] - 1s 388us/step - loss: 0.8572
Epoch 5/15
1566/1566 [==============================] - 1s 389us/step - loss: 0.8572
Epoch 6/15
1566/1566 [==============================] - 1s 392us/step - loss: 0.8572
Epoch 7/15
1566/1566 [==============================] - 1s 408us/step - loss: 0.8572
Epoch 8/15
1566/1566 [==============================] - 1s 459us/step - loss: 0.8572
Epoch 9/15
1566/1566 [==============================] - 1s 400us/step - loss: 0.8572
Epoch 10/15
1566/1566 [==============================] - 1s 410us/step - loss: 0.8572
Epoch 11/15
1566/1566 [==============================] - 1s 395us/step - loss: 0.8572
Epoch 12/15
1566/1566 [==============================] - 1s 386us/step - loss: 0.8572
Epoch 13/15
1566/1566 [==============================] - 1s 385us/step - loss: 0.8572
Epoch 14/15
1566/1566 [==============================] - 1s 393us/step - loss: 0.8572
Epoch 15/15
1566/1566 [==============================] - 1s 397us/step - loss: 0.8572
I'm supposed to print a plot with the Real Price and the Predicted Price; the Real Price is displayed properly, but the Predicted Price is only a straight line because model.predict contains only the value 1.
Thanks in advance!
You're trying to predict a price value, that is, you're aiming at solving a regression problem and not a classification problem.
However, in the last layer of your network (model.add(Dense(units=1, activation="softmax"))), you have a single neuron (which would be adequate for a regression problem), but you've chosen to use a softmax activation function. The softmax function is used in multi-class classification problems to normalize the outputs into a probability distribution. If you have a single output neuron and you apply softmax, the final result will always be 1.0, as it is the only element of the probability distribution.
In summary, for regression problems you do not use an activation function on the output layer, as the network is intended to output the predicted value directly.
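As a sketch (reusing the question's layer sizes, not a tuned model), the output layer would simply drop the softmax:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=4, input_shape=(None, 1)))
model.add(Dense(units=1))                      # linear output, no softmax, for regression
model.compile(optimizer="adam", loss="mean_squared_error")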
I am quite new to deep learning and I was studying this RNN example.
After completing the tutorial, I decided to see the effect of various hyperparameters such as the number of nodes in each layer and dropout factor etc.
What I do is, for each value in my lists, create a new model using a set of parameters and test the performance in my dataset. Below is the basic code:
def build_model(MODELNAME, l1, l2, l3, l4, d):
    tf.global_variables_initializer()
    tf.reset_default_graph()
    model = Sequential(name=MODELNAME)
    model.reset_states
    model.add(CuDNNLSTM(l1, input_shape=(x_train.shape[1:]), return_sequences=True))
    model.add(Dropout(d))
    model.add(BatchNormalization())
    model.add(CuDNNLSTM(l2, input_shape=(x_train.shape[1:]), return_sequences=True))
    # Definition of other layers of the model ...
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=opt,
                  metrics=['accuracy'])
    history = model.fit(x_train, y_train,
                        epochs=EPOCHS,
                        batch_size=BATCH_SIZE,
                        validation_data=(x_validation, y_validation))
    return model
layer1 = [64, 128, 256]
layer2,3,4 = [...]
drop = [0.2, 0.3, 0.4]
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
for l1 in layer1:
    # for l2, l3, l4 for layer2, layer3, layer4
    for d in drop:
        sess = tf.Session(config=config)
        set_session(sess)
        MODELNAME = 'RNN-l1={}-l2={}-l3={}-l4={}-drop={} '.format(l1, l2, l3, l4, d)
        print(MODELNAME)
        model = build_model(MODELNAME, l1, l2, l3, l4, d)
        sess.close()
        print('-----> training & validation loss & accuracies')
The problem is that when a new model is built using the new parameters, it behaves as if it were the next epoch of the previous model rather than epoch 1 of the new one. Below are some of the results.
RNN-l1=64-l2=64-l3=64-l4=32-drop=0.2
Train on 90116 samples, validate on 4458 samples
Epoch 1/6
90116/90116 [==============================] - 139s 2ms/step - loss: 0.5558 - acc: 0.7116 - val_loss: 0.8857 - val_acc: 0.5213
... # results for other epochs
Epoch 6/6
RNN-l1=64-l2=64-l3=64-l4=32-drop=0.3
90116/90116 [==============================] - 140s 2ms/step - loss: 0.5233 - acc: 0.7369 - val_loss: 0.9760 - val_acc: 0.5336
Epoch 1/6
90116/90116 [==============================] - 142s 2ms/step - loss: 0.5170 - acc: 0.7403 - val_loss: 0.9671 - val_acc: 0.5310
... # results for other epochs
90116/90116 [==============================] - 142s 2ms/step - loss: 0.4953 - acc: 0.7577 - val_loss: 0.9587 - val_acc: 0.5354
Epoch 6/6
90116/90116 [==============================] - 143s 2ms/step - loss: 0.4908 - acc: 0.7614 - val_loss: 1.0319 - val_acc: 0.5397
# -------------------AFTER 31TH SET OF PARAMETERS
RNN-l1=64-l2=256-l3=128-l4=32-drop=0.2
Epoch 1/6
90116/90116 [==============================] - 144s 2ms/step - loss: 0.1080 - acc: 0.9596 - val_loss: 1.8910 - val_acc: 0.5372
As seen, the first epoch of the 31st set of parameters behaves as if it were the 181st epoch. Similarly, if I stop the run at one point and re-run it, the accuracy and loss look as if they belong to the next epoch, as below.
Epoch 1/6
90116/90116 [==============================] - 144s 2ms/step - loss: 0.1053 - acc: 0.9621 - val_loss: 1.9120 - val_acc: 0.5375
I tried a bunch of things (as you can see in the code), such as model = None, reinitializing the variables, resetting the state of the model, closing the session in each iteration, etc., but none of them helped. I searched for similar questions with no luck.
I am trying to understand what I am doing wrong.
Any help is appreciated,
Note: Title is not very explanatory, I am open to suggestions for a better title.
It looks like you are using Keras, which means you need to import the Keras backend and clear the session before you build your new model. It would be something like this:
from keras import backend as K
K.clear_session()
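A possible placement in the question's loop (a sketch following the question's variable names, with the layer2/3/4 loops elided as in the original) would be to clear the session at the top of each iteration:
from keras import backend as K
import tensorflow as tf

for l1 in layer1:
    for d in drop:
        K.clear_session()                      # discard the previous model's graph state
        sess = tf.Session(config=config)
        K.set_session(sess)
        # l2, l3, l4 would come from the elided inner loops, as in the original code
        MODELNAME = 'RNN-l1={}-drop={}'.format(l1, d)
        model = build_model(MODELNAME, l1, l2, l3, l4, d)
        sess.close()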