I've an issue running a Keras model on a Google Cloud Platform instance.
The model is the following:
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
verbose, epochs, batch_size = 1, 1, 64 # low number of epochs just for testing purpose
with tf.device('/cpu:0'):
m = Sequential()
m.add(CuDNNLSTM(20, input_shape=(n_timesteps, n_features)))
m.add(LeakyReLU(alpha=0.1))
m.add(RepeatVector(n_outputs))
m.add(CuDNNLSTM(20, return_sequences=True))
m.add(LeakyReLU(alpha=0.1))
m.add(TimeDistributed(Dense(20)))
m.add(LeakyReLU(alpha=0.1))
m.add(TimeDistributed(Dense(1)))
self.model = multi_gpu_model(m, gpus=8)
self.model.compile(loss='mse', optimizer='adam')
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
As you can see from the code above, I run the model on machine with 8 GPUs (Nvidia Tesla K80).
Train works well, without any errors. However, the prediction fails and returns the following error:
W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cudnn_rnn_ops.cc:1336 : Unknown: CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1285): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
Here the code to run the prediction:
self.model.predict(input_x)
What I've noticed is that if I remove the code for multi-GPU data parallelism, the code works well using a single GPU.
To be more precise, if I comment this line, the code works without error
self.model = multi_gpu_model(m, gpus=8)
What am I missing?
virtualenv information
cudatoolkit - 10.0.130
cudnn - 7.6.4
keras - 2.2.4
keras-applications - 1.0.8
keras-base - 2.2.4
keras-gpu - 2.2.4
python - 3.6
UPDATE
train_x.shape = (1441, 288, 1)
train_y.shape = (1441, 288, 1)
input_x.shape = (1, 288, 1)
After Olivier Dehaene's reply I tried his suggestion and it worked.
I tried to modify the input_x shape in order to obtain (8, 288, 1).
In order to do that I also modified train_x and train_y shapes.
Here a recap:
train_x.shape = (8065, 288, 1)
train_y.shape = (8065, 288, 1)
input_x.shape = (8, 288, 1)
But now I've the same error on the training phase, on this line:
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
From the tf.keras.utils.multi_gpu_model we can see that it works in the following way:
Divide the model's input(s) into multiple sub-batches.
Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
Concatenate the results (on CPU) into one big batch.
You are triggering an error because the input of the CuDNNLSTM layer is empty for at least one of the model copy. This is because the divide operations requires that: input // n_gpus > 0
Try this code out:
input_x = np.random.randn(8, n_timesteps, n_features)
model.predict(input_x)
Related
I am novice in TensorFlow
I am traying to use BERT embeddings in LSTM model
this is my model function
def bert_tweets_model():
Bertmodel = TFAutoModel.from_pretrained(model_name,output_hidden_states=True)
input_word_ids = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
input_masks_in = tf.keras.Input(shape=(max_length,), name='masked_token', dtype='int32')
with torch.no_grad():
last_hidden_states = Bertmodel(input_word_ids, attention_mask=input_masks_in)[0]
x = tf.keras.layers.LSTM(100, dropout=0.1, activation='relu',recurrent_dropout=0.3,return_sequences = True)(last_hidden_states)
x = tf.keras.layers.LSTM(50, dropout=0.1,activation='relu', recurrent_dropout=0.3,return_sequences = True)(x)
x=tf.keras.layers.Flatten()(x)
output = tf.keras.layers.Dense(units = 2, activation='sigmoid')(x)
model = tf.keras.Model(inputs=[input_word_ids, input_masks_in], outputs = output)
return model
with strategy.scope():
model = bert_tweets_model()
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
model.compile(loss='binary_crossentropy',optimizer=adam_optimizer,metrics=['accuracy'])
model.summary()
validation_data=[dev_encoded, y_val]
train2=[input_id, attention_mask]
history = model.fit(
x=train2, y=y_train, batch_size=batch_size,
epochs=3,
validation_data=validation_data,
verbose=2)
I recieved this error in fit function when I tried to input data
"ValueError: Layer "model_1" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 512) dtype=int32>]"
also,I received these warning massages I do not know what is means.
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:tensorflow:Layer lstm_3 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
can someone help me, thanks in advance.
Regenerating your error
_input1 = tf.random.uniform((1,100), 0 , 10)
_input2 = tf.random.uniform((1,100), 0 , 10)
model(_input1, _input2)
After running this code I am getting the same error...
Layer "model" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor: shape=(1, 100), ...
#Now, the problem is you have to enclose the inputs in the set or list then you have to pass the inputs to the model like this
model((_input1, _input2))
<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.5324366, 0.3743334]], dtype=float32)>
Remember: if you are using tf.data.Dataset then encolse it then while making the dataset enclose the dataset within the set like this
tf.data.Dataset.from_tensor_slices((words_id, words_mask))
Second Problem as you asked
The warning you are getting because, you should be aware that LSTM doesn't run in CUDA GPU it uses the CPU only therefore it is slow, so TensorFlow is just telling you that LSTM will not run under GPU or parallel computing.
Very simple question. I am using tensorflow probability package to use a bijector to form a trainable_distribution from a simple distribution (let's say gaussian)
Everything works properly in jupyter notebook but as I bring it to terminal (where I am running my code file) it gives me this error:
TypeError: Cannot interpret '<KerasTensor: shape=(None, 3) dtype=float32 (created by layer 'input_1')>' as a data type
Here is my code for this part (X_data is (m,3) where m is the number of samples and trainable_distribution is already built using tensorflow_probability.distributions.TransformedDistribution(base_dist, bijector):
def train_dist_routine(X_data, trainable_distribution, n_epochs=200, batch_size=None):
x_ = tensorflow.keras.layers.Input(shape=(3,), dtype=tf.float32)
print(x_)
log_prob_ = trainable_distribution.log_prob(x_)
model = tensorflow.keras.models.Model(x_, log_prob_)
model.compile(optimizer=tf.optimizers.Adam(),
loss=lambda _, log_prob: -log_prob)
ns = X_data.shape[0]
if batch_size is None:
batch_size = ns
history = model.fit(x=X_data,
y=np.zeros((ns, 0), dtype=np.float32),
batch_size=batch_size,
epochs=n_epochs,
validation_split=0.2,
shuffle=True,
verbose=False)
return history
I have a Keras Model declared with the following code:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(units=50, activation="tanh", return_sequences=False, input_shape=(settings["past_size"], len(indicators_used))))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit_generator(
generator=batch_generator(train_x, train_y),
steps_per_epoch=n_batches_per_epoch,
epochs=settings["epochs"],
workers=5,
use_multiprocessing=True,
max_queue_size=10000)
I experimented with the workers, use_multiprocessing and the max_queue_size settings but to no avail. The input shape is (100000, 500, 27).
The Batch Generator function looks like this:
def batch_generator(x, y):
while True:
for i in range(n_batches_per_epoch):
x_train = []
y_train = []
for j in range(settings["past_size"] + settings["batch_size"] * i, settings["past_size"] + (settings["batch_size"] * (i + 1))):
x_train.append(x.iloc[j - past_size:j].to_numpy())
y_train.append(y.iloc[j].to_numpy())
return_x, return_y = np.array(x_train), np.array(y_train)
yield return_x, return_y
Execution Times:
Batch Size 256: 767ms/step
Batch Size 512: 1s/step
Batch Size 1024: 2s/step
The problem I'm facing now is that the Keras training process is incredibly slow. One Epoch is taking approximately 45 minutes. I can't use model.fit() because the data is too large for the RAM.
My understanding of the batch_generator functionality was, that the function prepares batches and loads them on the GPU / TPU, but that doesn't seem to be the case. This Code runs in Google Colab with the TPU Runtime.
In the Google colab environment, you need to explicitly convert the model to a TPU compatible version. This was my mistake when I worked with Google Colab the last time.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
tpu_model has the same interface as model.
Guide:
https://medium.com/tensorflow/tf-keras-on-tpus-on-colab-674367932aa0
Unfortunately, this doesn´t seem to work with the Sequential api.
When trying to fit Keras model, written in tensorflow.keras API with tf.Dataset induced iterator, the model is complaining about steps_per_epoch argument, even though I've set this one to a concrete value.
Here below is my model class
import tensorflow as tf
import numpy as np
from typing import Union, List
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras import layers
from tftools import TFTools
class TestServe():
def __init__(self, tfrecords: Union[List[tf.train.Example], tf.train.Example], batch_size: int = 10, input_shape: tuple = (64, 23)) -> None:
self.tfrecords = tfrecords
self.batch_size = batch_size
self.input_shape = input_shape
def get_model(self):
ins = layers.Input(shape=(64, 23))
l = layers.Reshape((*self.input_shape, 1))(ins)
l = layers.Conv2D(8, (30, 23), padding='same', activation='relu')(l)
l = layers.MaxPool2D((4, 5), strides=(4, 5))(l)
l = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(l)
l = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(l)
l = layers.MaxPool2D((2, 2), strides=(2, 2))(l)
l = layers.Flatten()(l)
out = layers.Dense(1, activation='softmax')(l)
return tf.keras.models.Model(ins, out)
def train(self):
# Create Dataset
dataset = TFTools.create_dataset(self.tfrecords)
dataset = dataset.repeat(6).batch(self.batch_size)
val_iterator = dataset.take(300).make_one_shot_iterator()
train_iterator = dataset.skip(300).make_one_shot_iterator()
model = self.get_model()
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_iterator, validation_data=val_iterator,
epochs=10, verbose=1, steps_per_epoch=20)
def predict(self, X: np.array) -> np.array:
pass
ts = TestServe(['./ok.tfrecord', './nok.tfrecord'])
ts.train()
But as soon I start the training, before the first batch is finished, I get an exception from tensorflow
2019-06-13 14:22:25.393398: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1995445000 Hz
2019-06-13 14:22:25.393681: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2f7d120 executing computations on platform Host. Devices:
2019-06-13 14:22:25.393708: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
Epoch 1/2
19/20 [===========================>..] - ETA: 0s - loss: 1.1921e-07 - acc: 1.0000Traceback (most recent call last):
File "TestServe.py", line 62, in <module>
ts.train()
File "TestServe.py", line 56, in train
epochs=2, verbose=1, callbacks=callbacks, steps_per_epoch=20) #The steps_per_epoch is typically samples_per_epoch / batch_size
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
validation_steps=validation_steps)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 364, in model_iteration
validation_in_fit=True)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 202, in model_iteration
steps_per_epoch)
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 76, in _get_num_samples_or_steps
'steps_per_epoch')
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 230, in check_num_samples
if check_steps_argument(ins, steps, steps_name):
File "/home/josef/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 960, in check_steps_argument
input_type=input_type_str, steps_name=steps_name))
ValueError: When using data tensors as input to a model, you should specify the `steps_per_epoch` argument.
The original dataset contains around 1500 samples, but I want to join multiple tfrecord files to TFRecordDataset so I wont have the information about the length.
Anyone saw something similar before? I dont know where to go for help, since the tf.keras API is relatively new. The create_dataset function just returns the dataset mapped with the right parse function.
Found the solution.
There is not only steps_per_epoch but also validation_steps parameter, which you also have to specify.
This error was reported when I tried to use a TensorFlow 2.0 model when I actually had an older version (TensorFlow 1.14) installed locally.
To upgrade to the latest TensorFlow version, run:
python -m pip install --upgrade pip
python -m pip install --upgrade tensorflow
I want to run an implementation of the YOLO algorithm (object detection) with Keras. The code I use come mostly from here.
I am trying to train my model with a sample of the Open Image Dataset V4 from Google.
The problem is that, when I try to train my model, I get the following warnings and exception:
W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 831.81MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 380.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 84.50MiB. Current allocation summary follows.
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8,64,208,208] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_3/Conv2D}} = Conv2D[T=DT_FLOAT, _class=["loc:#training/Adam/gradients/conv2d_3/Conv2D_grad/Conv2DBackpropInput"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/Conv2D/ReadVariableOp)]]
(Here I am using the tensorflow-GPU lib, but I have a similar error without the non-GPU tensorflow.)
At first I though it was because of the size of my dataset (200.000 pictures => ~60GB), but when running the code with a minimal sample (500 pictures => ~150MB), I get exactly the same error.
So I guess there is a problem with my code.
Here is a minimal example of the problematic part (I guess) :
def _main():
input_shape = [416,416]
model = ### #Create YOLO model
anchors = ### #Collection of 9 anchors
num_classes = 601
train_data = ### # A collection of the form [PathToImage, X1,X2,Y1,Y2, class], where the X,Y values define the bounding box
valid_data = ### # A collection of the form [PathToImage, X1,X2,Y1,Y2, class], where the X,Y values define the bounding box
batch_size = 8
model.fit_generator(data_generator(train_data, batch_size, input_shape, anchors, num_classes),
steps_per_epoch=max(1, len(train_data)//batch_size),
validation_data=data_generator(valid_data, batch_size, input_shape, anchors, num_classes),
validation_steps=max(1, len(valid_data)//batch_size),
epochs=50,
initial_epoch=0)
# Unfreeze and continue training, to fine-tune.
for i in range(len(model.layers)):
model.layers[i].trainable = True
model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
print('Unfreeze all of the layers.')
print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
model.fit_generator(data_generator(train_data, batch_size, input_shape, anchors, num_classes),
steps_per_epoch=max(1, len(train_data)//batch_size),
validation_data=data_generator(valid_data, batch_size, input_shape, anchors, num_classes),
validation_steps=max(1, len(valid_data)//batch_size),
epochs=100,
initial_epoch=50)
def data_generator(lines, batch_size, input_shape, anchors, num_classes):
'''data generator for fit_generator'''
n = len(lines)
i = 0
while True:
image_data = []
box_data = []
for b in range(batch_size):
if i==0:
np.random.shuffle(lines)
image, box = get_data(lines[i], input_shape) # Retrieve the image from path and return it with the bounding box (the object class is in box object)
image_data.append(image)
box_data.append(box)
i = (i+1) % n
image_data = np.array(image_data)
box_data = np.array(box_data)
y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) # For each boxes, find the best anchor
yield [image_data, *y_true], np.zeros(batch_size)
The OOM exception is raised on the second call to fit_generator()
Following answer on similar question, I added the gpu_options allow_growth ont my TensorFlow session :
K.clear_session() # get a new session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
K.set_session(sess)
But id does not solved the problem.
So I am a bit stuck here. What am I doing wrong ?
Notes :
I have a Quadro P1000 GPU with 20GB GPU Memory (according to the Windows task manager)
I have 32GB RAM
I haven't changed the model architecture, you can find it here