I am training a CNN with TensorFlow for a medical imaging application.
As I don't have a lot of data, I am trying to apply random modifications to my training batch during the training loop to artificially increase my training dataset. I made the following function in a different script and call it on my training batch:
def randomly_modify_training_batch(images_train_batch, batch_size):
    for i in range(batch_size):
        image = images_train_batch[i]
        image_tensor = tf.convert_to_tensor(image)

        distorted_image = tf.image.random_flip_left_right(image_tensor)
        distorted_image = tf.image.random_flip_up_down(distorted_image)
        distorted_image = tf.image.random_brightness(distorted_image, max_delta=60)
        distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)

        with tf.Session():
            images_train_batch[i] = distorted_image.eval()  # .eval() is used to reconvert the image from Tensor type to ndarray

    return images_train_batch
The code works well for applying modifications to my images.
The problem is:
After each iteration of my training loop (feedforward + backpropagation), applying this same function to my next training batch takes steadily longer, roughly 5 seconds more than the previous time.
It starts at around 1 second of processing and reaches over a minute after a bit more than 10 iterations.
What causes this slowing?
How can I prevent it?
(I suspect something with distorted_image.eval(), but I'm not quite sure. Am I opening a new session each time? Isn't TensorFlow supposed to close the session automatically, since I use it in a "with tf.Session():" block?)
You call that code in each iteration, so on every iteration you add these operations to the graph. You don't want to do that: build the graph once at the start and, in the training loop, only execute it. Also, why do you need to convert back to an ndarray afterwards, instead of putting these ops into your TF graph once and using tensors all the way through?
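A sketch of that restructuring (the placeholder shape, num_steps and the way the numpy batch is fed are assumptions; the point is that the augmentation ops are created exactly once):
# built once, before the training loop
batch_placeholder = tf.placeholder(tf.float32, shape=[None, None, None, 3])

def distort(image):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_brightness(image, max_delta=60)
    image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
    return image

distorted_batch = tf.map_fn(distort, batch_placeholder)

sess = tf.Session()
for step in range(num_steps):
    images_train_batch = ...  # load the next numpy batch as before
    augmented = sess.run(distorted_batch,
                         feed_dict={batch_placeholder: images_train_batch})
    # feed `augmented` into the training step
Better still, feed distorted_batch directly into the network input so the images never leave the graph.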
I came across this notebook that covers forecasting. I got it through this article.
I am confused about the 2nd and 4th lines of the snippet below.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.cache().shuffle(buffer_size).batch(batch_size).repeat()
val_data = tf.data.Dataset.from_tensor_slices((x_vali, y_vali))
val_data = val_data.batch(batch_size).repeat()
I understand that we are trying to shuffle our data because we don't want to feed it to our model in serial order. On additional reading I realized that it is better to have buffer_size equal to the size of the dataset. But I am not sure what repeat is doing in this case. Could someone explain what is being done here and what the function of repeat is?
I also looked at this page and saw the text below, but it is still not clear to me.
The following methods in tf.Dataset:
repeat( count=0 ) The method repeats the dataset count number of times.
shuffle( buffer_size, seed=None, reshuffle_each_iteration=None) The method shuffles the samples in the dataset. The buffer_size is the number of samples which are randomized and returned as tf.Dataset.
batch(batch_size,drop_remainder=False) Creates batches of the dataset with batch size given as batch_size which is also the length of the batches.
The repeat call with nothing passed to the count param makes this dataset repeat infinitely.
In Python terms, Datasets are a subclass of Python iterables. If you have an object ds of type tf.data.Dataset, then you can execute iter(ds). If the dataset was generated by repeat(), it will never run out of items, i.e., it will never raise a StopIteration exception.
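A quick illustration of that difference (a minimal sketch, assuming TF 2.x eager execution; the example values are made up):
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])

it = iter(ds.batch(2))
print(next(it))  # [1 2]
print(next(it))  # [3 4]
# a further next(it) raises StopIteration: the finite dataset is exhausted

it = iter(ds.repeat().batch(2))
for _ in range(10):
    next(it)  # never raises StopIteration, keeps cycling through the data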
In the notebook you referenced, the call to tf.keras.Model.fit() is passed an argument of 100 for the param steps_per_epoch. This means the dataset must repeat indefinitely, and Keras will treat every 100 steps as one epoch, pausing training to run validation at the end of each.
tldr: leave it in.
https://github.com/tensorflow/tensorflow/blob/3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/python/data/ops/dataset_ops.py#L134-L3445
https://docs.python.org/3/library/exceptions.html
I'm training a simple VAE model on 64*64 images and I would like to see the images generated after every epoch or every couple batches to see the progress.
Currently, when I train the model, I wait until training is done and only then look at the results.
I tried to write a custom callback in Keras that generates an image and saves it, but I couldn't get it to work. Is it even possible? I couldn't find anything like it.
It would be awesome if you could refer me to a source that explains how to do this, or show me an example.
Note: I'm interested in a clean keras.callbacks solution, not in manually iterating over every epoch, training, and generating the sample myself.
If you still need it, you can define custom callback in keras as a subclass of keras.callbacks.Callback:
class CustomCallback(keras.callbacks.Callback):
    def __init__(self, save_path, VAE):
        super().__init__()
        self.save_path = save_path
        self.VAE = VAE

    def on_epoch_end(self, epoch, logs=None):
        # load the image
        # get latent_space with self.VAE.encoder.predict(image)
        # get reconstructed image with self.VAE.decoder.predict(latent_space)
        # plot reconstructed image with matplotlib.pyplot
Then instantiate the callback as image_callback = CustomCallback(...)
and place image_callback in the callbacks list passed to fit().
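For concreteness, a possible way to fill in the body and wire it up (a minimal sketch; sample_image, vae, model and the save path are made-up names, and the encoder/decoder attributes are assumed to exist on your VAE object):
import matplotlib.pyplot as plt
from tensorflow import keras

class CustomCallback(keras.callbacks.Callback):
    def __init__(self, save_path, VAE, sample_image):
        super().__init__()
        self.save_path = save_path
        self.VAE = VAE
        self.sample_image = sample_image  # e.g. shape (1, 64, 64, 3)

    def on_epoch_end(self, epoch, logs=None):
        latent_space = self.VAE.encoder.predict(self.sample_image)
        reconstructed = self.VAE.decoder.predict(latent_space)
        plt.imshow(reconstructed[0])
        plt.axis('off')
        plt.savefig('%s/epoch_%03d.png' % (self.save_path, epoch))
        plt.close()

image_callback = CustomCallback('vae_progress', vae, sample_image)
model.fit(x_train, x_train, epochs=20, callbacks=[image_callback])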
Yeah, it's actually possible, but I always use matplotlib and a self-defined function for that. For example, something like this:
import matplotlib.pyplot as plt

for steps in range(epochs):
    Train, Test = YourDataGenerator()  # load your images for one loop
    model.fit(Train, Test, batch_size=...)
    result = model.predict(Test_image)
    plt.imshow(result[0, :, :, :])  # Keras always returns [batch_nr, height, width, channels]
    filename1 = '/content/runde2/%s_generated_plot_%06d.png' % (test, (steps + 1))
    plt.savefig(filename1)
    plt.close()
I think there is also a clean keras.callbacks version, but I always used this approach because you can use other libraries for easier data augmentation per loop. But that's just my opinion; I hope I could help you at least a bit.
I'm working on a Keras model with images separated into patches.
I have a quite peculiar pipeline:
for i in range(n_iteration):
    print("Epoch:", i, "/", n_iteration)

    start = time.time()
    self.train_batch, self.validation_batch = self.get_batch()
    end = time.time()
    print("Time for loading: ", end - start)

    K.set_value(self.batch_source, self.train_batch[0][:self.batch_size])
    K.set_value(self.batch_target, self.train_batch[0][self.batch_size:])

    pred = self.model.predict(self.train_batch[0])
    K.set_value(self.gamma, self.compute_gamma(pred))

    hist = self.model.train_on_batch(self.train_batch[0], self.train_batch[1])
Based on my model's prediction at a time t (for a given batch), I need to compute a certain value named gamma. This value is then taken into account in my loss function, but it is not differentiable, so I can't integrate its computation into the loss function itself.
When measuring the necessary time for loading and training, it appears that the bottleneck is in the loading phase.
My question is: is it possible to load several batches (via self.get_batch()) while computing the prediction and gamma and training on another batch?
I guess the idea would be to create some kind of queue in which I store my batches, but I don't really know how to do that; something like the sketch below is what I have in mind.
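Something along these lines, maybe (an untested sketch; get_batch is my own function, the rest are standard-library names):
import threading
import queue

batch_queue = queue.Queue(maxsize=4)  # keeps a few batches pre-loaded

def loader():
    while True:
        batch_queue.put(self.get_batch())  # blocks while the queue is full

threading.Thread(target=loader, daemon=True).start()

for i in range(n_iteration):
    self.train_batch, self.validation_batch = batch_queue.get()  # already loaded in the background
    # ... predict, compute gamma and train_on_batch as before ...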
PS: in my get_batch function I'm accessing an HDF5 file; can that cause any trouble with multiprocessing?
Thank you in advance.
I was trying to do a simple image classification exercise using CNN and Keras.
I have a list that stores the paths of the images (train_glob) and another list with the corresponding classification labels, one-hot encoded (dummy_y).
The function load_one() takes as arguments a path and some parameters for image resizing and augmentation and returns a transformed image as a numpy array.
When I run the code in batch mode through .fit(), creating a single array holding all the images called batch_features, I achieved a decent accuracy of 0.7 after 5 epochs.
The problem appears when I try to replicate the results using a Python generator to feed the data and train with .fit_generator(): the performance is really poor, when in fact I would expect it to be slightly better since, to my understanding, more data is being fed.
Unlike in the batch function, in the generator I am randomly altering the brightness of the images and looping more times over the data, so in theory, if I understand correctly how the generator works, I would expect the results to be better.
This is my generator function
def generate_arrays_from_file(paths, cat_list, batch_size=128):
    number = 0
    max_len = len(paths)
    while True:
        batch_features = np.zeros((batch_size, 128, 64, 3), np.uint8)
        batch_labels = np.zeros((batch_size, cat_list.shape[1]), np.uint8)
        for i in range(number * batch_size, number * batch_size + batch_size):
            # choose random index in features
            # index = np.random.choice(len(paths))
            batch_features[i % batch_size] = load_one(paths[i], final_size=(64, 128), augment=True)
            batch_labels[i % batch_size] = cat_list[i]
        batch_features = normalize_data(batch_features)
        yield batch_features, batch_labels
        number += 1
        if number * batch_size + batch_size > max_len:
            number = 0
And this is the Keras call to the generator:
mod.fit_generator(generate_arrays_from_file(train_glob, dummy_y, 256),
                  samples_per_epoch=16368, nb_epoch=10)
Is this the right way of passing a generator?
Thanks
To match your accuracy you need to feed in the same data. Since with the generator you apply some transformations to the images that you didn't apply before, it is normal for the accuracies not to match.
If you think the generator is the problem, you can test it out quite easily.
Fire up a python shell, import your package, make a generator and get a few samples to see if they're what you expected.
# say you save the generator in mygenerator.py
$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mygenerator
# initialise paths, cat_list here:
>>> paths = [...]
>>> cat_list = [...]
# use a small batch_size to be able to see the results
>>> g = mygenerator.generate_arrays_from_file(paths, cat_list, batch_size = 2)
>>> batch = g.__next__()
# now check if batch is what you expect
To save an image or display it (from this tutorial):
# Save:
from scipy import misc
misc.imsave('face.png', image_array) # uses the Image module (PIL)
# Display:
import matplotlib.pyplot as plt
plt.imshow(image_array)
plt.show()
More about accuracy and data augmentation
If you test the two models (one trained with the generator and one with all the data preloaded) on different datasets, the accuracies will clearly be different. Try to use the exact same test and train data for both models and turn off augmentation completely, and you should see similar accuracies (for the same number of epochs, batch_sizes, etc.). If you don't, use the method above to fix the generator.
If there are only few data points the model will overfit (thus have high training accuracy) very quickly. Data augmentation helps reduce overfitting and makes models generalise better. This also means that the accuracy on training after very few epochs will be lower as the data is more varied.
Please note it is very easy to get image processing (data augmentation) wrong and not realise it. Crop wrongly and you get a black image. Zoom too much and you only get noise. Confuse x and y and you get a totally wrong image. And so on... Test your generator to see if the images it outputs are what you expect and that the labels match.
On brightness. If you alter the brightness on the input images you make your model agnostic to brightness. You don't improve the generalisation on other things like rotations and zoom. Make sure you do not overdo the brightness changes: do not make your images fully white or fully black - if this happens it will explain the huge drop in accuracy.
As pointed out in the comments by VMRuiz, if you have categorical data (which you do), use keras.preprocessing.image.ImageDataGenerator (docs). It will save you a lot of time. There is a very good example on the Keras blog (code here). If you are interested in your own image processing, have a look at the ImageDataGenerator source code.
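A minimal sketch of what that could look like for this task (the directory layout, parameter values and Keras-2-style fit_generator arguments are assumptions):
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,       # replaces a manual normalize_data step
    rotation_range=10,
    horizontal_flip=True)
# newer Keras versions also accept a brightness_range=(low, high) argument

train_generator = train_datagen.flow_from_directory(
    'data/train',             # one sub-folder per class
    target_size=(128, 64),
    batch_size=256,
    class_mode='categorical')  # one-hot labels, like dummy_y

mod.fit_generator(train_generator,
                  steps_per_epoch=len(train_generator),
                  epochs=10)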
By following the mnist example, I was able to build a custom network and use the inputs function of the example to load my dataset (previously encoded as a TFRecord). Just to recap it, the inputs function looks like:
def inputs(train_dir, train, batch_size, num_epochs, one_hot_labels=False):
    if not num_epochs: num_epochs = None
    filename = os.path.join(train_dir,
                            TRAIN_FILE if train else VALIDATION_FILE)

    with tf.name_scope('input'):
        filename_queue = tf.train.string_input_producer(
            [filename], num_epochs=num_epochs)

        # Even when reading in multiple threads, share the filename
        # queue.
        image, label = read_and_decode(filename_queue)

        # Shuffle the examples and collect them into batch_size batches.
        # (Internally uses a RandomShuffleQueue.)
        # We run this in two threads to avoid being a bottleneck.
        images, sparse_labels = tf.train.shuffle_batch(
            [image, label], batch_size=batch_size, num_threads=2,
            capacity=1000 + 3 * batch_size,
            # Ensures a minimum amount of shuffling of examples.
            min_after_dequeue=1000)

    return images, sparse_labels
Then, during training, I declare the training operator and run everything, and everything goes smoothly.
Now, I am trying to use the very same function to train a different network on the same data. The only (major) difference is that instead of just calling the slim.learning.train function on some train_operator, I do the training manually (evaluating the losses and updating the parameters by hand). The architecture is more complex and I'm forced to do so.
When I try to use the data generated by the inputs function, the program gets stuck; setting a queue timeout shows that it's stuck on the producer's queue.
This leads me to believe that I'm probably missing something about the use of producers in TensorFlow. I have read the tutorials but I couldn't figure out the issue. Is there some kind of initialization that calling slim.learning.train does and that I need to replicate by hand if I do my training manually? Why exactly isn't the producer producing?
For example, doing something like:
imgs, labels = inputs(...)
print imgs
prints
<tf.Tensor 'input/shuffle_batch:0' shape=(1, 128, 384, 6) dtype=float32>
which is the correct (symbolic) tensor, but if I then try to get the actual data with imgs.eval(), it hangs indefinitely.
You need to start the queue runners, or the queues will be empty and reading from them will hang. See the documentation on queue runners.
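A minimal sketch of the usual boilerplate around a manual training loop (the variable names are placeholders); the key pieces are the local-variables initializer, which string_input_producer needs when num_epochs is set, and start_queue_runners:
imgs, labels = inputs(train_dir, train=True, batch_size=1, num_epochs=10)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())  # the producer's epoch counter lives here

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            img_batch, label_batch = sess.run([imgs, labels])
            # ... manual loss evaluation / parameter updates here ...
    except tf.errors.OutOfRangeError:
        pass  # the input producer ran out of epochs
    finally:
        coord.request_stop()
        coord.join(threads)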