How to get Keras Conv2D layers to work on GPU - python

I am trying to train a simple convolutional network using Keras (Tensorflow 2.8.0) in Python 3.7.9 on Spyder IDE 5.2.2. The network involves Conv2D, MaxPooling2D, Flatten and Dense layers.
The model ran perfectly when I used my CPU, but training was slow. So I decided to try to run it on my GPU (GeForce GTX 1050 Ti).
I installed CUDA 11.2 and added its lib, include and bin directories to my path. I installed cuDNN 8.1, copied cudnn64_8.dll into the CUDA bin directory, cudnn.h into the CUDA include directory, and cudnn.lib into the CUDA lib directory.
Once I had done the above, TensorFlow was able to see my GPU: when I run tf.config.list_physical_devices() I get:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
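For completeness, the CUDA and cuDNN versions this TensorFlow binary was built against can be checked too (a generic TF 2.x snippet; for TF 2.8 it should report 11.2 and 8.1):
import tensorflow as tf

# Sanity check: which CUDA/cuDNN versions this TensorFlow build expects
build_info = tf.sysconfig.get_build_info()
print('CUDA version: ', build_info.get('cuda_version'))
print('cuDNN version:', build_info.get('cudnn_version'))
print('GPUs visible: ', tf.config.list_physical_devices('GPU'))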
However, when I run the model containing Conv2D layers on my GPU, it fails. This is the output I get:
2022-03-17 18:32:04.687276: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-17 18:32:05.171943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2782 MB memory: -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
Epoch 1/10
This is where the training progress bars (verbose=1) would normally appear, but nothing further is printed after "Epoch 1/10". Instead, I think the kernel restarts, because all saved variables are cleared and I have to re-import all packages.
If I train a model using only Dense layers, it does work on the GPU. I confirmed that it actually uses the GPU in that case by monitoring GPU usage. And, as mentioned, the Conv2D model runs fine on my CPU.
So, in summary, the problem seems to be specific to Conv2D models on my GPU. Any help in understanding this would be greatly appreciated!
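One thing I have not tried yet is enabling GPU memory growth before building the model; a minimal snippet (untested on my setup) would be:
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of reserving it all up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)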
My code:
from matplotlib import pyplot
import tensorflow as tf
import numpy as np
import keras
from keras.datasets import cifar10
#from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
def load_dataset():
    (trainX, trainy), (testX, testy) = cifar10.load_data()
    print('Train shape: X=%s, y=%s' % (trainX.shape, trainy.shape))
    print('Test shape: X=%s, y=%s' % (testX.shape, testy.shape))
    trainY = tf.keras.utils.to_categorical(trainy)
    testY = tf.keras.utils.to_categorical(testy)
    return trainX, trainY, testX, testY

def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3))) #kernel_initializer='he_uniform',
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def summarize_diagnostics(history):
    # plot loss
    pyplot.subplot(211)
    pyplot.title('Cross Entropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='test')
    # plot accuracy
    pyplot.subplot(212)
    pyplot.title('Classification Accuracy')
    pyplot.plot(history.history['accuracy'], color='blue', label='train')
    pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
    pyplot.show()

def run_test_harness():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    model = define_model()
    history = model.fit(trainX, trainY, epochs=10, batch_size=64, validation_data=(testX, testY))
    _, acc = model.evaluate(testX, testY)
    print('> %.3f' % (acc * 100.0)) # overall validation accuracy
    summarize_diagnostics(history)

tf.config.list_physical_devices()
run_test_harness()

This could be due to insufficient memory being available to run this code on the GPU in your system.
The code works fine when I replicated it in Google Colab in GPU mode; CPU mode, by comparison, takes longer to run the code:
tf.config.list_physical_devices()
Output:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
To run the code:
run_test_harness()
Output:
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 2s 0us/step
170508288/170498071 [==============================] - 2s 0us/step
Train shape: X=(50000, 32, 32, 3), y=(50000, 1)
Test shape: X=(10000, 32, 32, 3), y=(10000, 1)
Epoch 1/10
782/782 [==============================] - 22s 10ms/step - loss: 1.9416 - accuracy: 0.3059 - val_loss: 1.7283 - val_accuracy: 0.3935
Epoch 2/10
782/782 [==============================] - 7s 9ms/step - loss: 1.6194 - accuracy: 0.4278 - val_loss: 1.5163 - val_accuracy: 0.4528
Epoch 3/10
782/782 [==============================] - 8s 10ms/step - loss: 1.4575 - accuracy: 0.4835 - val_loss: 1.3856 - val_accuracy: 0.5099
Epoch 4/10
782/782 [==============================] - 6s 8ms/step - loss: 1.3539 - accuracy: 0.5213 - val_loss: 1.3111 - val_accuracy: 0.5348
Epoch 5/10
782/782 [==============================] - 5s 7ms/step - loss: 1.2553 - accuracy: 0.5589 - val_loss: 1.2233 - val_accuracy: 0.5658
Epoch 6/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1763 - accuracy: 0.5865 - val_loss: 1.1691 - val_accuracy: 0.5841
Epoch 7/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1091 - accuracy: 0.6115 - val_loss: 1.1284 - val_accuracy: 0.6005
Epoch 8/10
782/782 [==============================] - 5s 6ms/step - loss: 1.0402 - accuracy: 0.6344 - val_loss: 1.0932 - val_accuracy: 0.6183
Epoch 9/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9842 - accuracy: 0.6546 - val_loss: 1.0825 - val_accuracy: 0.6255
Epoch 10/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9350 - accuracy: 0.6742 - val_loss: 1.0722 - val_accuracy: 0.6256
313/313 [==============================] - 1s 3ms/step - loss: 1.0722 - accuracy: 0.6256
> 62.560
Please check again in Google Colab by selecting the GPU
(Runtime - Change runtime type - Hardware accelerator - GPU - Save)
and let us know if the issue still persists.
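If GPU memory is indeed the limiting factor on the GTX 1050 Ti, one possible workaround (a sketch, not verified on that card) is to cap the memory TensorFlow is allowed to allocate, or simply reduce the batch_size passed to model.fit:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to 2 GB on the first GPU (adjust to what the card can spare).
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])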

Related

Accuracy and val_accuracy don't change while training

I tried to train my convolutional neural network using the TensorFlow and Keras libraries, but the values of accuracy and val_accuracy did not change at all during training. Here is my neural network code:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))
X = X/255.0
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy",
optimizer="adam",
metrics=["accuracy"])
model.fit(X, y, batch_size=10, epochs=10, validation_split=0.1)
Here is the creation of the training data, features and labels (X - features, y - labels):
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
            except Exception as e:
                pass

create_training_data()
random.shuffle(training_data)

X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y = np.array(y)
And this is the log of training:
2023-01-15 00:36:42.368335: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/10
70/70 [==============================] - 45s 619ms/step - loss: 0.3039 - accuracy: 0.9627 - val_loss: 0.1211 - val_accuracy: 0.9744
Epoch 2/10
70/70 [==============================] - 42s 600ms/step - loss: 0.1524 - accuracy: 0.9670 - val_loss: 0.1189 - val_accuracy: 0.9744
Epoch 3/10
70/70 [==============================] - 42s 600ms/step - loss: 0.1537 - accuracy: 0.9670 - val_loss: 0.1622 - val_accuracy: 0.9744
Epoch 4/10
70/70 [==============================] - 44s 627ms/step - loss: 0.1563 - accuracy: 0.9670 - val_loss: 0.1464 - val_accuracy: 0.9744
Epoch 5/10
70/70 [==============================] - 42s 604ms/step - loss: 0.1591 - accuracy: 0.9670 - val_loss: 0.1185 - val_accuracy: 0.9744
Epoch 6/10
70/70 [==============================] - 42s 605ms/step - loss: 0.1511 - accuracy: 0.9670 - val_loss: 0.1338 - val_accuracy: 0.9744
Epoch 7/10
70/70 [==============================] - 49s 698ms/step - loss: 0.1623 - accuracy: 0.9670 - val_loss: 0.1188 - val_accuracy: 0.9744
Epoch 8/10
70/70 [==============================] - 50s 709ms/step - loss: 0.1480 - accuracy: 0.9670 - val_loss: 0.1397 - val_accuracy: 0.9744
Epoch 9/10
70/70 [==============================] - 45s 637ms/step - loss: 0.1508 - accuracy: 0.9670 - val_loss: 0.1203 - val_accuracy: 0.9744
Epoch 10/10
70/70 [==============================] - 47s 665ms/step - loss: 0.1716 - accuracy: 0.9670 - val_loss: 0.1238 - val_accuracy: 0.9744
Process finished with exit code 0
What should I do to fix this problem?
There are a couple of potential reasons why you are facing this:
Your dataset is far too small. If your validation set is tiny, there is a high probability that your model will get the same percentage of predictions correct/incorrect every epoch.
There is a great imbalance in your dataset. If one class heavily outweighs the other, your model will favor the majority class and predict it no matter what, as that is what yields the best accuracy for the model.
From what I can see, there is nothing wrong with your code; rather, modifications need to be made to the dataset itself. A quick class-balance check is sketched below.
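A quick way to check the balance (a sketch, assuming y is the label array loaded from y.pickle):
import numpy as np

# Count how many samples fall into each class; a large skew points to imbalance.
values, counts = np.unique(y, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
If one class dominates, the class_weight argument of model.fit is one way to compensate.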
Hmm, accuracy and validation accuracy are high even on the first epoch. Try using a lower learning rate in the Adam optimizer, say 0.0002. On the first epoch, pay attention to the loss and accuracy as the batches are processed; the accuracy should start low and gradually increase during the epoch.
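A minimal version of that change (assuming the rest of the model stays as posted):
from tensorflow.keras.optimizers import Adam

# Use a smaller learning rate than the Adam default of 0.001.
model.compile(loss="binary_crossentropy",
              optimizer=Adam(learning_rate=0.0002),
              metrics=["accuracy"])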

How to reset Keras metrics?

To do some parameter tuning, I like to loop over a training function with Keras. However, I realized that when using tensorflow.keras.metrics.AUC() as a metric, an integer gets appended to the AUC metric name on every training loop (e.g. auc_1, auc_2, ...). So the Keras metrics are apparently stored somewhere even after leaving the training function.
This first of all leads to the callbacks no longer recognizing the metric, and it also makes me wonder whether other things, such as the model weights, are stored as well.
How can I reset the metrics and are there other things that get stored by keras that I need to reset to get a clean restart for training?
Below you can find a minimal working example:
edit: this example seems to only work with tensorflow 2.2
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC

def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='sigmoid'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC()])
    return model

def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)
    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = [np.ones((11, 1)) for _ in range(1000)]
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)

for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
The output is:
**** Loop 0 ****
2020-06-16 14:37:46.621264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f991e541f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-16 14:37:46.621296: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/2
10/10 [==============================] - 0s 44ms/step - loss: 0.1295 - auc: 0.0000e+00 - val_loss: 0.0310 - val_auc: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - 0s 10ms/step - loss: 0.0262 - auc: 0.0000e+00 - val_loss: 0.0223 - val_auc: 0.0000e+00 - lr: 0.0010
**** Loop 1 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.4751 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 36ms/step - loss: 0.4751 - auc_1: 0.0000e+00 - val_loss: 0.3137 - val_auc_1: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.2617 - auc_1: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_1,val_loss,val_auc_1,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.2617 - auc_1: 0.0000e+00 - val_loss: 0.2137 - val_auc_1: 0.0000e+00 - lr: 0.0010
**** Loop 2 ****
Epoch 1/2
10/10 [==============================] - ETA: 0s - loss: 0.1948 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 34ms/step - loss: 0.1948 - auc_2: 0.0000e+00 - val_loss: 0.0517 - val_auc_2: 0.0000e+00 - lr: 0.0010
Epoch 2/2
10/10 [==============================] - ETA: 0s - loss: 0.0445 - auc_2: 0.0000e+00WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
WARNING:tensorflow:Early stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc_2,val_loss,val_auc_2,lr
10/10 [==============================] - 0s 10ms/step - loss: 0.0445 - auc_2: 0.0000e+00 - val_loss: 0.0389 - val_auc_2: 0.0000e+00 - lr: 0.0010
Your reproducible example failed in several places for me, so I changed just a few things (I'm using TF 2.1). After getting it to run, I was able to get rid of the additional metric names by specifying metrics=[AUC(name='auc')]. Here's the full (fixed) reproducible example:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC

def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[AUC(name='auc')])
    return model

def train():
    CB_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_auc",
        patience=3,
        verbose=1,
        mode="max",
        min_delta=0.0001,
        min_lr=1e-6)
    CB_es = tf.keras.callbacks.EarlyStopping(
        monitor="val_auc",
        min_delta=0.00001,
        verbose=1,
        patience=10,
        mode="max",
        restore_best_weights=True)
    callbacks = [CB_lr, CB_es]
    y = tf.keras.utils.to_categorical([np.random.randint(0, 11) for _ in range(1000)])
    x = [np.ones((37, 12, 1)) for _ in range(1000)]
    dummy_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size=100).repeat()
    model = dummy_network(input_shape=((37, 12, 1)))
    model.fit(dummy_dataset, validation_data=val_dataset, epochs=2,
              steps_per_epoch=len(x) // 100,
              validation_steps=len(x) // 100, callbacks=callbacks)

for i in range(3):
    print(f'\n\n **** Loop {i} **** \n\n')
    train()
Train for 10 steps, validate for 10 steps
Epoch 1/2
1/10 [==>...........................] - ETA: 6s - loss: 0.3426 - auc: 0.4530
7/10 [====================>.........] - ETA: 0s - loss: 0.3318 - auc: 0.4895
10/10 [==============================] - 1s 117ms/step - loss: 0.3301 - auc: 0.4893 - val_loss: 0.3222 - val_auc: 0.5085
This happens because on every loop you created a new metric without specifying a name, by doing this: metrics=[AUC()]. On the first iteration of the loop, TF automatically created a variable in the name space called auc, but on the second iteration that name was already taken, so TF named the new metric auc_1 since you didn't specify a name. Your callbacks, however, were set to monitor val_auc, a metric that this model didn't have (it belonged to the model from the previous loop). So you can either pass name='auc' to overwrite the previous metric name, or define the metric outside of the loop, like this:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.metrics import AUC

auc = AUC()

def dummy_network(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.Dense(10,
                                 input_shape=input_shape,
                                 activation=tf.nn.relu,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=keras.regularizers.l2(l=1e-3)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(11, activation='softmax'))
    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=[auc])
    return model
And don't worry about Keras resetting the metrics between epochs; the fit() method takes care of that. If you want more flexibility and/or want to do it yourself, I suggest using a custom training loop and resetting the metric explicitly:
auc = tf.keras.metrics.AUC()
auc.update_state(np.random.randint(0, 2, 10), np.random.randint(0, 2, 10))
print(auc.result())
auc.reset_states()
print(auc.result())
Out[6]: <tf.Tensor: shape=(), dtype=float32, numpy=0.875> # state updated
Out[8]: <tf.Tensor: shape=(), dtype=float32, numpy=0.0> # state reset

Model fitting doesn't use all of the provided data [duplicate]

This question already has answers here:
Keras not training on entire dataset
(3 answers)
TensorFlow Only running on 1/32 of the Training data provided [duplicate]
(1 answer)
Closed 2 years ago.
I ran into a problem when playing with the introductory tutorial for TensorFlow 2.0 Keras (https://www.tensorflow.org/tutorials/keras/classification).
The problem:
There should be (and there are) 60,000 images to fit the model. I checked this by printing out the lengths of train_images and train_labels.
The output when fitting the model, on the other hand, makes me believe that not all of the data was used, as it says 1875/1875. The same goes for the testing data.
I deactivated the GPU detection, which does not seem to have an effect on this.
I'm using:
Python 3.8.3
Tensorflow 2.2.0
My Code:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
data = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = data.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# preprocess the image data to have a pixel value between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
Output:
2020-05-17 17:48:07.147033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-17 17:48:10.075816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-17 17:48:10.098581: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-05-17 17:48:10.105898: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-UU9P1OG
2020-05-17 17:48:10.109837: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-UU9P1OG
2020-05-17 17:48:10.113879: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-17 17:48:10.127711: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14dc97288a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-17 17:48:10.132743: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 1/10
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4943 - accuracy: 0.8264
Epoch 2/10
1875/1875 [==============================] - 2s 938us/step - loss: 0.3747 - accuracy: 0.8649
Epoch 3/10
1875/1875 [==============================] - 2s 929us/step - loss: 0.3403 - accuracy: 0.8762
Epoch 4/10
1875/1875 [==============================] - 2s 914us/step - loss: 0.3146 - accuracy: 0.8844
Epoch 5/10
1875/1875 [==============================] - 2s 937us/step - loss: 0.2985 - accuracy: 0.8900
Epoch 6/10
1875/1875 [==============================] - 2s 923us/step - loss: 0.2808 - accuracy: 0.8964
Epoch 7/10
1875/1875 [==============================] - 2s 939us/step - loss: 0.2702 - accuracy: 0.8998
Epoch 8/10
1875/1875 [==============================] - 2s 911us/step - loss: 0.2585 - accuracy: 0.9032
Epoch 9/10
1875/1875 [==============================] - 2s 918us/step - loss: 0.2482 - accuracy: 0.9073
Epoch 10/10
1875/1875 [==============================] - 2s 931us/step - loss: 0.2412 - accuracy: 0.9106
313/313 - 0s - loss: 0.3484 - accuracy: 0.8729
Test accuracy: 0.8729000091552734
The model is being trained with a batch size of 32, hence there are 60,000 / 32 = 1875 batches.
Although the TensorFlow documentation shows batch_size=None in the fit function overview, the description of this argument says:
batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances (since they generate batches).
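In other words, the 1875 counts batches, not individual images; all 60,000 samples are used. If you want a different number of steps per epoch, pass batch_size explicitly, e.g. (a minimal illustration):
# 60,000 training images / 100 per batch = 600 steps shown per epoch
model.fit(train_images, train_labels, epochs=10, batch_size=100)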

why multi_gpu_model in tf.keras is much slower than the one in keras?

The multi_gpu_model in tf.keras seems to be much slower than the one in keras. For the example given here, it is about 12x slower when importing from tensorflow.keras instead of keras.
import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.utils import multi_gpu_model
import numpy as np

num_samples = 1000
height = 224
width = 224
num_classes = 1000

# Instantiate the base model (or "template" model).
# We recommend doing this under a CPU device scope,
# so that the model's weights are hosted on CPU memory.
# Otherwise they may end up hosted on a GPU, which would
# complicate weight sharing.
with tf.device('/cpu:0'):
    model = Xception(weights=None,
                     input_shape=(height, width, 3),
                     classes=num_classes)

# Replicates the model on 4 GPUs.
# This assumes that your machine has 4 available GPUs.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# Generate dummy data.
x = np.random.random((num_samples, height, width, 3))
y = np.random.random((num_samples, num_classes))

# This `fit` call will be distributed on 4 GPUs.
# Since the batch size is 256, each GPU will process 64 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)

# Save model via the template model (which shares the same weights):
model.save('my_model.h5')
The only changes are
from tensorflow.keras.applications import Xception
from tensorflow.keras.utils import multi_gpu_model
instead of
from keras.applications import Xception
from keras.utils import multi_gpu_model
With tf.keras
Epoch 1/20
1000/1000 [==============================] - 78s 78ms/step - loss: 3487.2197
Epoch 2/20
1000/1000 [==============================] - 37s 37ms/step - loss: 3454.2403
Epoch 3/20
1000/1000 [==============================] - 37s 37ms/step - loss: 3453.6264
Epoch 4/20
1000/1000 [==============================] - 37s 37ms/step - loss: 3452.7994
Epoch 5/20
1000/1000 [==============================] - 37s 37ms/step - loss: 3452.3592
Importing directly from keras
Epoch 1/20
1000/1000 [==============================] - 52s 52ms/step - loss: 3486.8955
Epoch 2/20
1000/1000 [==============================] - 3s 3ms/step - loss: 3454.1935
Epoch 3/20
1000/1000 [==============================] - 3s 3ms/step - loss: 3453.5585
Epoch 4/20
1000/1000 [==============================] - 3s 3ms/step - loss: 3452.8249
Epoch 5/20
1000/1000 [==============================] - 3s 3ms/step - loss: 3452.1542
That is about a 12x speed difference from the second epoch onwards.
I am using the latest Keras 2.2.4 and tensorflow-gpu 1.10.

Keras: Binary_crossentropy has negative values

I'm following this tutorial (section 6: Tying it All Together), with my own dataset. I can get the example in the tutorial working, no problem, with the sample dataset provided.
I'm getting a binary cross-entropy loss that is negative, and no improvement as the epochs progress. I'm pretty sure binary cross-entropy should always be positive and that I should see some improvement in the loss. I've truncated the sample output (and the code call) below to 5 epochs. Others seem to run into similar problems when training CNNs, but I didn't see a clear solution for my case. Does anyone know why this is happening?
Sample output:
Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
10240/10240 [==============================] - 2s - loss: -5.5378 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 2/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 3/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 4/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
Epoch 5/5
10240/10240 [==============================] - 0s - loss: -7.9712 - acc: 0.5000 - val_loss: -7.9712 - val_acc: 0.5000
My code:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import History
history = History()
seed = 7
np.random.seed(seed)
dataset = np.loadtxt('train_rows.csv', delimiter=",")
#print dataset.shape (10240, 64)
# split into input (X) and output (Y) variables
X = dataset[:, 0:(dataset.shape[1]-2)] #0:62 (63 of 64 columns)
Y = dataset[:, dataset.shape[1]-1] #column 64 counting from 0
#print X.shape (10240, 62)
#print Y.shape (10240,)
testset = np.loadtxt('test_rows.csv', delimiter=",")
#print testset.shape (2560, 64)
X_test = testset[:,0:(testset.shape[1]-2)]
Y_test = testset[:,testset.shape[1]-1]
#print X_test.shape (2560, 62)
#print Y_test.shape (2560,)
num_units_per_layer = [100, 50]
### create model
model = Sequential()
model.add(Dense(100, input_dim=(dataset.shape[1]-2), init='uniform', activation='relu'))
model.add(Dense(50, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
## Fit the model
model.fit(X, Y, validation_data=(X_test, Y_test), nb_epoch=5, batch_size=128)
I should have printed out my response variable. The categories were labelled as 1 and 2 instead of 0 and 1, which confused the classifier.
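For anyone hitting the same issue, a quick check and remap (a sketch, assuming the labels really are 1 and 2 in the last column) would be:
import numpy as np

print(np.unique(Y))   # binary_crossentropy with a sigmoid output expects labels 0 and 1
Y = Y - 1             # remap 1/2 -> 0/1
Y_test = Y_test - 1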
