I'm trying to train a 2D U-Net for a segmentation task.
I execute this line of code:
model.fit(training_generator, epochs=params["nEpoches"],
          validation_data=validation_generator, verbose=1,
          use_multiprocessing=True, workers=6,
          callbacks=[callbacks_list, csv_logger])
Where
training_generator = an instance of DataGenerator(x_training, y_train_flat, **params), with the image and mask arrays as parameters of this class.
epochs = 2
validation_generator = an instance of DataGenerator(x_validation, y_validation_flat, **params), with the validation data.
checkPoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, mode='min', period=1)
callbacks_list = checkPoint
With the verbose=1 parameter I think I should see a progress bar showing the training status for each epoch, but the only thing I see is Epoch 1/2, without any bar. So I can't tell whether the training process is progressing or stuck somewhere.
According to the TensorFlow documentation:
steps_per_epoch:
Integer or None. Total number of steps (batches of samples) before
declaring one epoch finished and starting the next epoch. When
training with input tensors such as TensorFlow data tensors, the
default None is equal to the number of samples in your dataset divided
by the batch size, or 1 if that cannot be determined. If x is a
tf.data dataset, and 'steps_per_epoch' is None, the epoch will run
until the input dataset is exhausted. When passing an infinitely
repeating dataset, you must specify the steps_per_epoch argument.
validation_steps:
Only relevant if validation_data is provided and is a tf.data dataset.
Total number of steps (batches of samples) to draw before stopping
when performing validation at the end of every epoch. If
'validation_steps' is None, validation will run until the
validation_data dataset is exhausted. In the case of an infinitely
repeated dataset, it will run into an infinite loop. If
'validation_steps' is specified and only part of the dataset will be
consumed, the evaluation will start from the beginning of the dataset
at each epoch. This ensures that the same validation samples are used
every time.
In your case, training is actually going on. As rightly mentioned by @Kaveh, the model does not know how many steps make up one epoch, so it runs into an infinite loop. Check your batch size, and adding steps_per_epoch and validation_steps to model.fit() as shown below will resolve your issue.
model.fit(training_generator,
          steps_per_epoch=len(training_generator) // training_generator.batch_size,
          epochs=params["nEpoches"],
          validation_data=validation_generator,
          validation_steps=len(validation_generator) // validation_generator.batch_size,
          verbose=1,
          use_multiprocessing=True, workers=6,
          callbacks=[callbacks_list, csv_logger])
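One caveat, stated as an assumption about your custom class: the lines above treat len(training_generator) as a number of samples. If DataGenerator subclasses keras.utils.Sequence, its __len__ is expected to return the number of batches per epoch, in which case you would pass it directly instead of dividing by batch_size. A minimal sketch of both cases:

import math

# If DataGenerator is a keras.utils.Sequence, len(generator) is already the
# number of batches per epoch, so it can be passed as-is.
steps_per_epoch = len(training_generator)
validation_steps = len(validation_generator)

# If instead len(generator) returns the number of samples, round up so the
# final, smaller batch is not silently dropped.
# steps_per_epoch = math.ceil(len(training_generator) / training_generator.batch_size)
# validation_steps = math.ceil(len(validation_generator) / validation_generator.batch_size)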
For more information you can refer here
# x_train.shape[0] = 54000
model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=12,
    validation_data=(x_val, y_val)
)
When I use this fit() method to train a neural network:
batch_size = 128 means that every epoch I randomly pick 54000 // 128 batches of size 128 from my training dataset.
Are those batches chosen with replacement? I suspect from the docs they're not but I'd like confirmation.
Can I manually choose my batches? I would like to focus on specific images and not others for a given batch, by choosing them personally instead of letting randomness choose for me.
Are those batches chosen with replacement?
In each individual epoch, no. Of course the entire dataset is used again in the next epoch.
Can I manually choose my batches? I would like to focus on specific images and not others for a given batch, by choosing them personally instead of letting randomness choose for me.
You should create a custom dataset/generator for this, and leave the rest of the training loop (data loading, model, etc.) unchanged.
But be aware that the samples in a minibatch are supposed to be random.
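One way to do this in Keras is to subclass keras.utils.Sequence and return exactly the samples you want for each batch. Below is a minimal sketch under that assumption; the class name HandPickedBatches and the batch_indices argument (a list of index lists, one per batch) are illustrative, not from the original post:

import numpy as np
from tensorflow.keras.utils import Sequence

class HandPickedBatches(Sequence):
    """Yields batches whose sample indices you pick yourself."""

    def __init__(self, x, y, batch_indices):
        self.x = x
        self.y = y
        self.batch_indices = batch_indices  # e.g. [[0, 5, 7], [1, 2, 9], ...]

    def __len__(self):
        # Number of batches per epoch.
        return len(self.batch_indices)

    def __getitem__(self, i):
        # Return the i-th hand-picked batch.
        idx = np.array(self.batch_indices[i])
        return self.x[idx], self.y[idx]

# Usage sketch: model.fit(HandPickedBatches(x_train, y_train, my_batches), epochs=12)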
My main question is: does it iterate over every sample in the directory for every epoch? I have a directory with 6 classes and almost the same number of samples in each class. When I trained the model with batch_size=16 it didn't work at all; it predicted only 1 class correctly. Making batch_size=128 meant it could predict 3 classes with high accuracy, while the other 3 never appeared in the test predictions. Why did it do that? Is each of the steps_per_epoch batches generated independently, so that the generator only remembers the samples of that batch? That would mean it does not remember which samples the last batch used and creates a new random batch, possibly re-using already used samples and missing others. If so, it could miss whole classes of samples, and the only way to overcome this would be to increase batch_size until everything fits in one batch. I can't increase batch_size beyond 128 because there is not enough memory on my GPU.
So what should I do?
Here is my code for ImageDataGenerator
train_d = ImageDataGenerator(rescale=1. / 255, shear_range=0.2, zoom_range=0.1,
                             validation_split=0.2,
                             rotation_range=10.,
                             width_shift_range=0.1,
                             height_shift_range=0.1)
train_s = train_d.flow_from_directory('./images/', target_size=(width, height),
                                      class_mode='categorical',
                                      batch_size=32, subset='training')
validation_s = train_d.flow_from_directory('./images/', target_size=(width, height),
                                           class_mode='categorical',
                                           subset='validation')
And here is the code for fit_generator:
classifier.fit_generator(train_s, epochs=20, steps_per_epoch=100,
                         validation_data=validation_s,
                         validation_steps=20, class_weight=class_weights)
Yes, it iterates over every sample in each folder every epoch. This is the definition of an epoch: a complete pass over the whole dataset.
steps_per_epoch should be set to len(dataset) / batch_size. The only issue is when the batch size does not exactly divide the number of samples; in that case you round steps_per_epoch up, and the last batch is smaller than batch_size.
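For iterators returned by flow_from_directory you don't need to count files yourself: the iterator exposes .samples and .batch_size, so the steps can be derived directly. A minimal sketch using the generator names from the question above (treat the rounding-up choice as an assumption):

import math

# DirectoryIterator exposes .samples (total images found) and .batch_size.
steps_per_epoch = math.ceil(train_s.samples / train_s.batch_size)
validation_steps = math.ceil(validation_s.samples / validation_s.batch_size)

classifier.fit_generator(train_s,
                         epochs=20,
                         steps_per_epoch=steps_per_epoch,
                         validation_data=validation_s,
                         validation_steps=validation_steps,
                         class_weight=class_weights)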
I'm trying to create a stacked autoencoder with my own dataset. Everything works great, but when I try to draw the curve with TensorBoard I get these scalars:
I think the error is in steps_per_epoch; if it's not X_train.shape[0], what should it contain?
autoencoder.fit_generator(generated_data.flow(X_train, X_train, batch_size=batch_size),
                          steps_per_epoch=X_train.shape[0],
                          epochs=epochs,
                          validation_data=(X_test, X_test),
                          callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
And the other thing: how can I add accuracy?
From the documentation of fit_generator
steps_per_epoch: Integer. Total number of steps (batches of samples)
to yield from generator before declaring one epoch finished and
starting the next epoch. It should typically be equal to the number of
samples of your dataset divided by the batch size. Optional for
Sequence: if unspecified, will use the len(generator) as a number of
steps.
So you should set it roughly equal to X_train.shape[0]/batch_size
To monitor accuracy use
autoencoder.compile(optimizer='rmsprop', loss='mse', metrics=['mse', 'accuracy'])
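Putting both suggestions together, a corrected call might look like the sketch below; batch_size, epochs and generated_data are the variables from the question, and the integer division is an assumption that you prefer to drop the final partial batch rather than repeat samples:

# Batches per epoch, not samples per epoch.
steps_per_epoch = X_train.shape[0] // batch_size

autoencoder.compile(optimizer='rmsprop', loss='mse', metrics=['mse', 'accuracy'])
autoencoder.fit_generator(generated_data.flow(X_train, X_train, batch_size=batch_size),
                          steps_per_epoch=steps_per_epoch,
                          epochs=epochs,
                          validation_data=(X_test, X_test),
                          callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])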
I've trained several models in Keras. I have 39,592 samples in my training set, and 9,899 in my validation set. I used a batch size of 2.
As I was examining my code, it occurred to me that my generators may have been missing some batches of data.
This is the code for my generator:
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
val_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical')
I searched around to see how my generators behave, and found this answer:
what if steps_per_epoch does not fit into numbers of samples?
I calculated my steps_per_epoch and validation_steps this way:
steps_per_epoch = int(number_of_train_samples / batch_size)
val_steps = int(number_of_val_samples / batch_size)
Using the code in this link with my own batch size and number of samples, I got these results:
"missing the last batch" for train_generator and "weird behavior" for val_generator.
I'm afraid that I have to retrain my models again. What values should I choose for steps_per_epoch and validation_steps? Is there a way to use exact values for these variables (other than setting batch_size to 1 or removing some of the samples)? I have several other models with different numbers of samples, and I think they've all been missing some batches. Any help would be much appreciated.
Two related questions:
1- Regarding the models I already trained, are they reliable and properly trained?
2- What would happen if I set these variables using the following values:
steps_per_epoch = np.ceil(number_of_train_samples / batch_size)
val_steps = np.ceil(number_of_val_samples / batch_size)
Will my model see some of the images more than once in each epoch during training and validation? Or is this the solution to my problem?
Since a Keras data generator is meant to loop infinitely, steps_per_epoch indicates how many batches you will fetch from the generator during a single epoch. Therefore, if you simply take steps_per_epoch = int(number_of_train_samples / batch_size), your last batch would have fewer than batch_size items and would be discarded. However, in your case, it's not a big deal to lose one image per training epoch. The same goes for the validation step. To sum up: your models are trained [almost :) ] correctly, because the number of lost elements is minor.
According to the implementation of ImageDataGenerator (https://keras.io/preprocessing/image/#imagedatagenerator-class), if your number of steps is larger than expected, then after reaching the maximum number of samples you will receive new batches from the beginning, because your data is looped over. In your case, if steps_per_epoch = np.ceil(number_of_train_samples / batch_size), you would receive one additional batch per epoch which contains repeated images.
In addition to Greeser's answer, to avoid losing some training samples you could calculate your steps with this function:
def cal_steps(num_images, batch_size):
    # Calculates steps for a generator.
    steps = num_images // batch_size

    # Add 1 to the generator steps if the steps multiplied by
    # the batch size is less than the total training samples.
    return steps + 1 if (steps * batch_size) < num_images else steps
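A quick usage sketch with the sample counts from this question (the generator and model variable names are the ones defined above; the epoch count is illustrative):

batch_size = 2
steps_per_epoch = cal_steps(39592, batch_size)    # 19796 (divides exactly)
validation_steps = cal_steps(9899, batch_size)    # 4950 (rounds the half batch up)

model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=10,  # illustrative value
                    validation_data=validation_generator,
                    validation_steps=validation_steps)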
I'm currently beginning to discover the Keras library for deep learning. It seems that in the training phase a certain number of epochs is chosen, but I don't know what this choice is based on.
For the MNIST dataset the number of epochs chosen is 4:
model.fit(X_train, Y_train,
          batch_size=128, nb_epoch=4,
          show_accuracy=True, verbose=1,
          validation_data=(X_test, Y_test))
Could someone tell me why, and how we choose a correct number of epochs?
Starting with Keras 2.0, the nb_epoch argument has been renamed to epochs everywhere.
Neural networks are trained iteratively, making multiple passes over the entire dataset. Each pass over the entire dataset is referred to as an epoch.
There are two possible ways to choose an optimum number of epochs:
1) Set epochs to a large number, and stop training when validation accuracy or loss stops improving: so-called early stopping.
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=4, mode='auto')
model.fit(X_train, Y_train,
          batch_size=128, epochs=500,
          verbose=1,
          validation_data=(X_test, Y_test),
          callbacks=[early_stopping])  # show_accuracy was removed in Keras 2; use metrics in compile() instead
2) Consider the number of epochs as a hyperparameter and select the best value based on a set of trials (runs) over a grid of epoch values, as in the sketch below.
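A minimal sketch of that second approach, assuming a build_model() helper that returns a freshly compiled model (the helper name and the grid values are illustrative, not from the original answer):

best_epochs, best_val_loss = None, float('inf')

for n_epochs in [5, 10, 20, 40]:               # grid of candidate epoch counts
    model = build_model()                      # hypothetical: returns a compiled model
    history = model.fit(X_train, Y_train,
                        batch_size=128, epochs=n_epochs,
                        verbose=0,
                        validation_data=(X_test, Y_test))
    val_loss = history.history['val_loss'][-1]
    if val_loss < best_val_loss:
        best_epochs, best_val_loss = n_epochs, val_loss

print('Best number of epochs:', best_epochs, 'with val_loss:', best_val_loss)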
It seems you might be using an old version of Keras. nb_epoch refers to the number of epochs, and it has been replaced by epochs.
If you look here you will see that it has been deprecated.
One epoch means that you have trained on the whole dataset (all records) once. If you have 384 records, one epoch means that your model has been trained on all 384 records.
Batch size means the amount of data your model uses in a single iteration. In this case, a batch size of 128 means that your model takes 128 records at once and does a single forward pass and backward pass (backpropagation); this is called one iteration.
To break it down with this example: in one iteration your model takes 128 records (the 1st batch) out of your 384 and does a forward pass and a backward pass (backpropagation).
On the second batch, it takes records 129 to 256 and does another iteration.
Then on the 3rd batch, records 257 to 384, it performs the 3rd iteration.
At that point, we say that it has completed one epoch.
The number of epochs tells the model how many times it has to repeat the whole process above before it stops.
There is no single correct way to choose the number of epochs; it's something that is done by experimenting. Usually, when the model stops learning (the loss is not going down anymore) you decrease the learning rate; if the loss still doesn't go down after that and the results look more or less as you expected, then you stop at the epoch where the model stopped learning.
I hope it helps
In neural networks, an epoch is equivalent to training the network using each data point once.
The number of epochs, nb_epoch, is hence how many times you re-use your data during training.