Keras fit_generator running very slowly

I have a Keras Model declared with the following code:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(units=50, activation="tanh", return_sequences=False, input_shape=(settings["past_size"], len(indicators_used))))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit_generator(
    generator=batch_generator(train_x, train_y),
    steps_per_epoch=n_batches_per_epoch,
    epochs=settings["epochs"],
    workers=5,
    use_multiprocessing=True,
    max_queue_size=10000)
I experimented with the workers, use_multiprocessing and max_queue_size settings, but to no avail. The input shape is (100000, 500, 27).
The batch generator function looks like this:
def batch_generator(x, y):
    while True:
        for i in range(n_batches_per_epoch):
            x_train = []
            y_train = []
            for j in range(settings["past_size"] + settings["batch_size"] * i, settings["past_size"] + (settings["batch_size"] * (i + 1))):
                # One sample is the window of past_size rows that ends just before row j
                x_train.append(x.iloc[j - settings["past_size"]:j].to_numpy())
                y_train.append(y.iloc[j].to_numpy())
            return_x, return_y = np.array(x_train), np.array(y_train)
            yield return_x, return_y
Execution Times:
Batch Size 256: 767ms/step
Batch Size 512: 1s/step
Batch Size 1024: 2s/step
The problem is that training is very slow: one epoch takes approximately 45 minutes. I can't use model.fit() because the data is too large to fit in RAM.
My understanding was that the batch_generator function prepares batches and loads them onto the GPU / TPU, but that doesn't seem to be the case. This code runs in Google Colab with the TPU runtime.

In the Google Colab environment, you need to explicitly convert the model to a TPU-compatible version. This was my mistake the last time I worked with Google Colab.
import os

# Address of the TPU worker that Colab exposes to the notebook
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
tpu_model has the same interface as model.
Guide:
https://medium.com/tensorflow/tf-keras-on-tpus-on-colab-674367932aa0
Unfortunately, this doesn't seem to work with the Sequential API.
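Independently of the TPU conversion, part of the per-step time may come from the generator in the question, which slices the DataFrame once per sample in a Python loop. Below is a minimal sketch of a vectorized alternative. It swaps the per-row iloc slicing for NumPy's sliding_window_view, and it assumes that x and y have already been converted to NumPy arrays (for example with to_numpy()) and that NumPy 1.20 or later is installed. The past_size and batch_size parameters stand in for the settings dictionary and are not part of the original code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def batch_generator(x, y, past_size, batch_size):
    """Yield (windows, labels) batches forever.

    x: array of shape (n_rows, n_features)
    y: array of shape (n_rows, n_classes)
    """
    # windows[k] is a zero-copy view of x[k:k + past_size];
    # the transpose moves the window axis in front of the feature axis.
    windows = sliding_window_view(x, past_size, axis=0).transpose(0, 2, 1)
    n_batches = (len(x) - past_size) // batch_size
    while True:
        for i in range(n_batches):
            start = i * batch_size
            stop = start + batch_size
            # Window k is paired with label y[k + past_size], as in the original loop.
            yield (np.ascontiguousarray(windows[start:stop]),
                   y[past_size + start:past_size + stop])


# Tiny made-up data, only to show the shapes.
x = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10).reshape(10, 1)
batch_x, batch_y = next(batch_generator(x, y, past_size=3, batch_size=2))
print(batch_x.shape, batch_y.shape)
```

Only the slice that forms one batch is copied, so memory use should stay close to the size of the source array plus one batch. Whether this removes the bottleneck depends on where the time is actually spent, which the question does not establish.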

Related

How to properly setup a data set for training a Keras model

I am trying to create a dataset for audio recognition with a simple Keras sequential model.
This is the function I am using to create the model:
def dnn_model(input_shape, output_shape):
    model = keras.Sequential()
    model.add(keras.Input(input_shape))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(output_shape, activation="softmax"))
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['acc'])
    model.summary()
    return model
I generate my training data with this generator function:
def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters):
    window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
    window_size_samples = 2**tools.next_pow2(window_size_samples)
    hop_size_samples = tools.sec_to_samples(parameters['hop_size'], sampling_rate)
    for i in range(len(x_dirs)):
        features = fe.compute_features_with_context(x_dirs[i], **parameters)
        praat = tools.praat_file_to_target(y_dirs[i],
                                           sampling_rate,
                                           window_size_samples,
                                           hop_size_samples,
                                           hmm)
        yield features, praat
The variables x_dirs and y_dirs contain lists of paths to the audio files and the labels, respectively. In total I have 8623 files to train my model. This is how I train it:
def train_model(model, model_dir, x_dirs, y_dirs, hmm, sampling_rate, parameters, steps_per_epoch=10, epochs=10):
    model.fit((generator(x_dirs, y_dirs, hmm, sampling_rate, parameters)),
              epochs=epochs,
              batch_size=steps_per_epoch)
    return model
My problem is that if I pass all 8623 files, Keras uses all of them in the first epoch and then complains that it needs steps_per_epoch * epochs batches to train the model.
I tested this with only 10 of the 8623 files by slicing the list, but then TensorFlow complains that 100 batches are needed.
So how should my generator yield data so that this works? I always thought that steps_per_epoch just limits the data received per epoch.

The fit function is going to exhaust your generator: once it has yielded all 8623 batches, it won't be able to yield any more.
You can solve the issue like this:
def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters, epochs=1):
    for epoch in range(epochs):  # or while True:
        window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
        window_size_samples = 2**tools.next_pow2(window_size_samples)
        hop_size_samples = tools.sec_to_samples(parameters['hop_size'], sampling_rate)
        for i in range(len(x_dirs)):
            features = fe.compute_features_with_context(x_dirs[i], **parameters)
            praat = tools.praat_file_to_target(y_dirs[i],
                                               sampling_rate,
                                               window_size_samples,
                                               hop_size_samples,
                                               hmm)
            yield features, praat
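To see why the single-pass generator runs dry, here is a small sketch in plain Python with no Keras involved. The file names and the one_pass and repeating helpers are made up for illustration; itertools.islice plays the role of the training loop asking for steps_per_epoch * epochs batches.

```python
from itertools import islice


def one_pass(items):
    # Yields each item once and then stops, like the generator in the question.
    for item in items:
        yield item


def repeating(items):
    # Starts over after the last item, like the "while True" variant.
    while True:
        for item in items:
            yield item


files = ["a.wav", "b.wav", "c.wav"]  # hypothetical file list
steps_per_epoch, epochs = 3, 2
needed = steps_per_epoch * epochs

got_once = list(islice(one_pass(files), needed))
got_loop = list(islice(repeating(files), needed))
print(len(got_once), len(got_loop))
```

The single-pass generator delivers only as many batches as there are files, while the repeating one can serve any number of epochs. With a repeating generator, steps_per_epoch is what tells Keras where one epoch ends.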

Keras model doesn't update weights

I'm trying to make a simple neural network with Keras, but my weights won't update after calling fit().
To test the model, I created a simple data set called mem. mem is a deque of tuples; mem[i][0] gives an np.array of size inp_len containing only ones or only zeros.
Here is my code:
inp_len = 5*3 + 3187*4
model = Sequential()
model.add(Dense(units=124, kernel_initializer='ones', input_shape = (inp_len,)))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(48, kernel_initializer='ones'))
model.add(LeakyReLU(alpha=0.05))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=learning_rate, decay=learning_rate_decay))
batch_size = 20
batch_old = random.sample(mem, min(len(mem), batch_size))
for i_batch in range(len(batch_old)):
    X = batch_old[i_batch][0].reshape(1, inp_len)
    y = np.array([[X[0]]])
    model.fit(X, y, epochs=1, batch_size=1)
I use 1 epoch and a batch size of 1 because I want to use model.predict() in another part of the code with a different batch size.
Can someone please explain why model.get_weights()[0] keeps returning ones after fitting the model?

Error on prediction running keras multi_gpu_model

I have an issue running a Keras model on a Google Cloud Platform instance.
The model is the following:
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
verbose, epochs, batch_size = 1, 1, 64 # low number of epochs just for testing purpose
with tf.device('/cpu:0'):
    m = Sequential()
    m.add(CuDNNLSTM(20, input_shape=(n_timesteps, n_features)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(RepeatVector(n_outputs))
    m.add(CuDNNLSTM(20, return_sequences=True))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(20)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(1)))
self.model = multi_gpu_model(m, gpus=8)
self.model.compile(loss='mse', optimizer='adam')
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
As you can see from the code above, I run the model on a machine with 8 GPUs (Nvidia Tesla K80).
Training works well, without any errors. However, prediction fails and returns the following error:
W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cudnn_rnn_ops.cc:1336 : Unknown: CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1285): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'
Here is the code to run the prediction:
self.model.predict(input_x)
What I've noticed is that if I remove the code for multi-GPU data parallelism, the code works well using a single GPU.
To be more precise, if I comment out this line, the code works without error:
self.model = multi_gpu_model(m, gpus=8)
What am I missing?
virtualenv information
cudatoolkit - 10.0.130
cudnn - 7.6.4
keras - 2.2.4
keras-applications - 1.0.8
keras-base - 2.2.4
keras-gpu - 2.2.4
python - 3.6
UPDATE
train_x.shape = (1441, 288, 1)
train_y.shape = (1441, 288, 1)
input_x.shape = (1, 288, 1)
After Olivier Dehaene's reply I tried his suggestion and it worked.
I tried to modify the input_x shape in order to obtain (8, 288, 1).
In order to do that I also modified train_x and train_y shapes.
Here is a recap:
train_x.shape = (8065, 288, 1)
train_y.shape = (8065, 288, 1)
input_x.shape = (8, 288, 1)
But now I get the same error in the training phase, on this line:
self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)

From the tf.keras.utils.multi_gpu_model documentation, we can see that it works in the following way:
Divide the model's input(s) into multiple sub-batches.
Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
Concatenate the results (on CPU) into one big batch.
You are triggering an error because the input of the CuDNNLSTM layer is empty for at least one of the model copies. This is because the split requires that the number of input samples satisfies n_samples // n_gpus > 0.
Try this code out:
input_x = np.random.randn(8, n_timesteps, n_features)
model.predict(input_x)
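The splitting rule can be sketched with plain NumPy. The split_batch function below is a simplified stand-in for the slicing that multi_gpu_model performs, written as an assumption about its behaviour rather than copied from the library, and last_batch_size is a hypothetical helper. Under that assumption, the sketch also suggests a possible reason for the training error in the update: 8065 samples with batch_size=64 leave a final batch of a single sample, which cannot be divided across 8 GPUs.

```python
import numpy as np


def split_batch(batch, n_gpus):
    # Equal slices of len(batch) // n_gpus samples; the last slice takes the remainder.
    step = len(batch) // n_gpus
    slices = []
    for i in range(n_gpus):
        start = i * step
        stop = len(batch) if i == n_gpus - 1 else start + step
        slices.append(batch[start:stop])
    return slices


def last_batch_size(n_samples, batch_size):
    # Size of the final, possibly smaller, batch of an epoch.
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


sizes_for_1 = [len(s) for s in split_batch(np.zeros((1, 288, 1)), 8)]
sizes_for_8 = [len(s) for s in split_batch(np.zeros((8, 288, 1)), 8)]
print(sizes_for_1)  # seven empty slices and one slice with the single sample
print(sizes_for_8)  # one sample per GPU
print(last_batch_size(1441, 64), last_batch_size(8065, 64))
```

If this is indeed the cause, choosing a number of training samples or a batch_size such that the final batch still holds at least 8 samples should avoid the empty slices.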

The performance of GPU still slow even by keras fit_generator method

I have a large dataset (5 GB) that I want to use to train a neural network model built with Keras. Although I am using an Nvidia Tesla P100 GPU, training is really slow: each epoch takes ~60-70 s with a batch size of 10000. After reading and searching, I found out that I can improve the training speed by using Keras fit_generator instead of the usual fit. To do so, I wrote the following:
from __future__ import print_function
import numpy as np
from keras import Sequential
from keras.layers import Dense
import keras
from sklearn.model_selection import train_test_split
def generator(C, r, batch_size):
    samples_per_epoch = C.shape[0]
    number_of_batches = samples_per_epoch / batch_size
    counter = 0
    while 1:
        X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
        y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
        counter += 1
        yield X_batch, y_batch
        # restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0

if __name__ == "__main__":
    X, y = readDatasetFromFile()
    X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)
    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1]))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    batch_size = 1000
    model.fit_generator(generator(X_tr, y_tr, batch_size), epochs=200, steps_per_epoch=X.shape[0]/ batch_size,
                        validation_data=generator(X_ts, y_ts, batch_size * 2),
                        validation_steps=X.shape[0] / batch_size * 2, verbose=2, use_multiprocessing=True)
    loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
    print(loss, accuracy)
After switching to fit_generator, the training time improved a little but is still slow (each epoch now takes ~40-50 s). When running nvidia-smi in the terminal, I found that GPU utilization is only ~15%, which makes me wonder whether my code is wrong. I am posting my code above to ask whether there is a bug that slows down GPU performance.
Thank you,

Try explicitly selecting the GPU like this:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # for more than one GPU, use a comma-separated list such as "0,1"
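Separately from device selection, the step counts in the question's fit_generator call may be worth checking. They are computed from the full X rather than from the train and test splits, and X.shape[0] / batch_size * 2 multiplies by 2 instead of dividing by the doubled batch size. The sketch below uses made-up dataset sizes and a hypothetical steps_for helper to show the difference. Whether this accounts for the low GPU utilization is not established by the question.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


# Hypothetical sizes: 100000 rows split 80/20, as with test_size=.2.
n_total, n_train, n_test = 100_000, 80_000, 20_000
batch_size = 1000

train_steps = steps_for(n_train, batch_size)   # for generator(X_tr, y_tr, batch_size)
val_steps = steps_for(n_test, batch_size * 2)  # for generator(X_ts, y_ts, batch_size * 2)

# The question's expression for validation_steps, for comparison:
question_val_steps = n_total / batch_size * 2
print(train_steps, val_steps, question_val_steps)
```

With these assumed sizes, the question's expression asks for many more validation batches per epoch than one pass over the test split requires, and every extra batch adds time to each epoch.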

Out of Memory training sequential models in for loop; previous solutions not working

I'm training a series of models in a for loop - to test a certain architecture. While doing so, I run out of memory and the system shuts down the process.
The same problem appears in this question and this question. To try their solutions, I did a test run with a similar loop to the one that is giving me problems. The code is:
def mem_test(n):
    train_data = np.random.rand(1000, 1500)
    train_labels = np.random.randint(2, size=1000)
    mem = []
    for i in range(n):
        model = keras.Sequential([keras.layers.Dense(1000, activation=tf.nn.relu),
                                  keras.layers.Dense(2, activation=tf.nn.softmax)])
        model.compile(optimizer=tf.train.AdamOptimizer(.001), loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(train_data, train_labels, epochs=1)
        mem.append(psutil.virtual_memory())
    return mem

def mem_test_clear(n):
    train_data = np.random.rand(1000, 1500)
    train_labels = np.random.randint(2, size=1000)
    mem = []
    for i in range(n):
        model = keras.Sequential([keras.layers.Dense(1000, activation=tf.nn.relu),
                                  keras.layers.Dense(2, activation=tf.nn.softmax)])
        model.compile(optimizer=tf.train.AdamOptimizer(.001), loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(train_data, train_labels, epochs=1)
        mem.append(psutil.virtual_memory())
        keras.backend.clear_session()
        tf.reset_default_graph()
    return mem
While the latter seems to do slightly better than the former, both still end up accumulating memory usage. So, for my actual application, I'm left without a solution. What do I need to do to actually free up memory in this situation? What am I doing wrong?

You only have to compile the model once.
Then you can fit it in a loop:
import numpy as np
import psutil
import keras
import tensorflow as tf

def mem_test(n):
    train_data = np.random.rand(1000, 1500)
    train_labels = np.random.randint(2, size=1000)
    mem = []
    model = keras.Sequential([keras.layers.Dense(1000, activation=tf.nn.relu),
                              keras.layers.Dense(2, activation=tf.nn.softmax)])
    model.compile(optimizer=tf.train.AdamOptimizer(.001), loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    for i in range(n):
        model.fit(train_data, train_labels, epochs=1)
        mem.append(psutil.virtual_memory())
    return mem

mem_test(50)
This way it will consume only a small amount of memory and will not accumulate anything. Furthermore, this is how your model will work correctly.
