Why is TensorFlow's `tf.data` package slowing down my code?

I'm just learning to use TensorFlow's tf.data API, and I've found that it slows my code down a lot, measured in time per epoch. I thought it was supposed to do the opposite. I wrote a simple linear regression program to test it.
TL;DR: With 100,000 training examples, tf.data increases the time per epoch by about a factor of ten with full-batch training, and by more with smaller batches. With 500 training examples, the opposite is true.
My question: What is going on? Is my implementation flawed? Other sources I've read report tf.data improving speed by about 30%.
import tensorflow as tf
import numpy as np
import timeit
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.ERROR)
n_epochs = 10
input_dimensions_list = [10]
def function_to_approximate(x):
    return np.dot(x, random_covector).astype(np.float32) + np.float32(.01) * np.random.randn(1,1).astype(np.float32)
def regress_without_tfData(n_epochs, input_dimension, training_inputs, training_labels):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))
    X = tf.placeholder(tf.float32, shape=(None, input_dimension), name='X')
    Y = tf.placeholder(tf.float32, shape=(None, 1), name='Y')
    prediction = tf.matmul(X,weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for _ in range(n_epochs):
            sess.run(loss_op, feed_dict={X: training_inputs, Y:training_labels})
def regress_with_tfData(n_epochs, input_dimension, training_inputs, training_labels, batch_size):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))
    X,Y = data_set.make_one_shot_iterator().get_next()
    prediction = tf.matmul(X, weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        while True:
            try:
                sess.run(loss_op)
            except tf.errors.OutOfRangeError:
                break
for input_dimension in input_dimensions_list:
    for data_size in [500, 100000]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
        print("Not using tf.data, with data size "
              "{}, input dimension {} and training with "
              "a full batch, it took an average of "
              "{} seconds to run {} epochs.\n".
              format(
                  data_size,
                  input_dimension,
                  timeit.timeit(
                      lambda: regress_without_tfData(
                          n_epochs, input_dimension,
                          training_inputs, training_labels
                      ),
                      number=3
                  ),
                  n_epochs))
for input_dimension in input_dimensions_list:
    for data_size, batch_size in [(500, 50), (500, 500), (100000, 50), (100000, 100000)]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
        data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
        data_set = data_set.repeat(n_epochs)
        data_set = data_set.batch(batch_size)
        print("Using tf.data, with data size "
              "{}, and input dimension {}, and training with "
              "batch size {}, it took an average of {} seconds "
              "to run {} epochs.\n".
              format(
                  data_size,
                  input_dimension,
                  batch_size,
                  timeit.timeit(
                      lambda: regress_with_tfData(
                          n_epochs, input_dimension,
                          training_inputs, training_labels,
                          batch_size
                      ),
                      number=3
                  )/3,
                  n_epochs
              ))
This outputs for me:
Not using tf.data, with data size 500, input dimension 10 and training
with a full batch, it took an average of 0.20243382899980134 seconds
to run 10 epochs.
Not using tf.data, with data size 100000, input dimension 10 and
training with a full batch, it took an average of 0.2431719040000644
seconds to run 10 epochs.
Using tf.data, with data size 500, and input dimension 10, and
training with batch size 50, it took an average of 0.09512088866661846
seconds to run 10 epochs.
Using tf.data, with data size 500, and input dimension 10, and
training with batch size 500, it took an average of
0.07286913600000844 seconds to run 10 epochs.
Using tf.data, with data size 100000, and input dimension 10, and
training with batch size 50, it took an average of 4.421892363666605
seconds to run 10 epochs.
Using tf.data, with data size 100000, and input dimension 10, and
training with batch size 100000, it took an average of
2.2555197536667038 seconds to run 10 epochs.
Edit: Fixed an important issue that Fred Guth pointed out. It didn't much affect the results, though.

I wanted to test the Dataset API, which seems really convenient for processing data. I did a lot of timing tests of this API on CPU, GPU and multi-GPU setups, for small and large neural networks with different types of data.
First, your code looks fine to me. But I need to point out that your network is just one simple layer.
The Dataset API is not well suited to a network like yours; it pays off for much more complex ones. Why? For several reasons, which I found while trying to understand the Dataset API and explain below.
Firstly, the Dataset API processes the data for each batch as it goes, whereas in the other approach the data are preprocessed up front. Therefore, if the data fit in your RAM, you can save time by preprocessing them. Here your data are just too "simple". If you want to test what I am saying, try to find a really big dataset to process. That said, the Dataset API can be tuned by prefetching data. You can take a look at this tutorial, which explains really well why it is good to process data with prefetch.
Secondly, while looking into the Dataset API for multi-GPU training, I found that, as far as I know, the old preprocessing approach is faster than the Dataset API for small neural networks. You can verify this by creating a simple stackable RNN that takes a sequence as input. You can try different stack sizes (I have tested 1, 2, 10 and 20). You will see that, using the Dataset API, on 1 GPU or on 4 GPUs, the time did not differ for small RNN stacks (1, 2 and 5).
To summarize, the Dataset API is suitable for neural networks whose data can't be preprocessed in advance. Depending on your task, it may be more convenient to preprocess the data, for example if you want to tweak your network in order to improve it. I agree that the Dataset API is really cool for batching and padding, and convenient for shuffling large amounts of data, but it's also not suitable for multi-GPU training.

First:
You are recreating the dataset unnecessarily.
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
Create the dataset prior to the loop and change the regress_with_tfData input signature to use dataset instead of training_inputs and training_labels.
Second:
The problem here is that minibatches of size 50 or even 500 are too small to compensate for the latency cost of building the tf.data pipeline. You should increase the minibatch size. Interestingly, you did so with a minibatch of size 100000, but then maybe it is too big (I am not certain of this; I think it would need more tests).
There are a couple of things you could try:
1) Increase the minibatch size to something like 10000 and see if you get an improvement
2) Change your pipeline to use an iterator, example:
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(batch_size)
iterator = data_set.make_one_shot_iterator()
....
next_element = iterator.get_next()

That is because you are comparing apples with bananas.
On one hand, when using placeholders, you are providing a monolithic tensor as is. On the other hand, when using Dataset, you are slicing the tensor into individual samples. This is very different.
The equivalent of providing a monolithic placeholder tensor with the Dataset pipeline is to use tf.data.Dataset.from_tensors. When I use from_tensors in your example, I get computation times similar to (actually smaller than) those with placeholders.
If you want to compare a more sophisticated pipeline using from_tensor_slices, you should use a fair comparison with placeholders. For example, shuffle your data. Add some preprocessing on your slices. I have no doubt you will observe the performance gain that makes people switch to this pipeline.

One possible thing you are missing is a prefetch. Add a prefetch of 1 at the end of your data pipeline like so:
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(batch_size).prefetch(1)
Adding a prefetch of 1 at the end of your dataset pipeline means you try to fetch 1 batch of data while training is happening. This way you won't be waiting around while the batch is prepared; it should be ready to go as soon as each training iteration is done.
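To illustrate the idea only (this is not how tf.data implements it), here is a minimal pure-Python sketch of a prefetch buffer: a background thread prepares the next item while the consumer works on the current one. The `prefetch` function and its `buffer_size` parameter are hypothetical names made up for this sketch, and it assumes the consumer reads the iterable to the end.

```python
import queue
import threading

def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable`, preparing up to `buffer_size` items ahead
    of the consumer in a background thread (hypothetical helper)."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()  # blocks until the next item is ready
        if item is sentinel:
            return
        yield item

# Three "batches" are produced in the background and consumed in order.
batches = list(prefetch(([i, i + 1] for i in range(0, 6, 2)), buffer_size=1))
print(batches)
```

With `buffer_size=1` the producer stays at most one item ahead, which mirrors the intent of `.prefetch(1)`: the next batch is being prepared while the current one is in use.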

The accepted answer is no longer valid, as the TF behavior has changed. Per the documentation:
from_tensors produces a dataset containing only a single element. To
slice the input tensor into multiple elements, use from_tensor_slices
instead.
This means you cannot batch it:
import numpy as np
import tensorflow as tf
X = np.arange(10)
data = tf.data.Dataset.from_tensors( X )
data = data.batch(2)
for t in data.as_numpy_iterator():
    print(t)
# only one row, whereas 5 were expected!
The documentation recommends from_tensor_slices. But this has quite some overhead when compared to numpy slicing. Slow slicing is an open issue https://github.com/tensorflow/tensorflow/issues/39750
Essentially, slicing in TF is slow and impacts input-bound or light models such as small networks (regression, word2vec).
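For comparison, batching by plain slicing looks like the following. This is a minimal sketch, assuming the data fit in memory and support slice indexing (a NumPy array, or a list as used here so the example needs no third-party imports); `iter_batches` is a hypothetical helper, not a TensorFlow API.

```python
def iter_batches(data, batch_size):
    """Yield consecutive slices of `data` with at most `batch_size` items each."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

X = list(range(10))  # stands in for np.arange(10)
batches = list(iter_batches(X, 2))
print(len(batches))  # number of batches
for batch in batches:
    print(batch)
```

Here the ten values come out as five batches of two, which is the result the `from_tensors(...).batch(2)` example above was expected to give.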

Related

Validation accuracy on MNIST differs when training data in one go vs when separated

I have a 27G dataset to analyse, and because of the size of my RAM I can't feed all my data into my Neural Network at once, and I have to import bits of it, learn on them, and then another part, so the process would look some thing like this:
import 10% of data
learn
save model
delete the data on RAM
import the next 10% and so on
To see how this would affect a known dataset, I tested it on MNIST. the following is the process/procedure:
for 35 times:
    import 1/5 of the data
    learn
    delete
    import the next 1/5
    learn
    delete
    ...
This is the code to import the dataset from tensorflow:
from tensorflow.keras.datasets import mnist
(sep, label), (sep_t, label_t) = mnist.load_data()
Then, the network:
Dense = tf.keras.layers.Dense
fc_model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28,28)),
        tf.keras.layers.Flatten(),
        Dense(128, activation='relu'),
        Dense(32, activation='relu'),
        Dense(10, activation='softmax')])
fc_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
Below is the code for partially importing and learning the MNIST data set:
for k in range(35):
    for j in range(5):
        if i == 0:
            history = fc_model.fit(sep[i*12000:(i+1)*12000-1], label[i*12000:(i+1)*12000-1], batch_size=128, validation_data=(sep_t, label_t) ,epochs=1)
            fc_model.save('Mytf.h5')
            i = i + 1
        else:
            fc_model = load_model('Mytf.h5')
            history = fc_model.fit(sep[i*12000:(i+1)*12000-1], label[i*12000:(i+1)*12000-1], batch_size=128, validation_data=(sep_t, label_t) ,epochs=1)
            fc_model.save('Mytf.h5')
        valacc.append(history.history['val_accuracy'])
    valacc_epc.append(history.history['val_accuracy'])
The following is the code to learn the data in one whole dataset:
history_new = fc_model.fit(sep, label, batch_size=128, validation_data=(sep_t, label_t) ,epochs=35)
and the graph below is the comparison between the two methods in terms of accuracy of the validation data:
Even though the difference is only about 1% (96% average vs. 95% average), does this mean that using the same save-and-reload methodology on a different dataset would also result in reduced accuracy? Would it be better to invest in a cloud computing platform instead?

For both approaches, the batches are organized differently, so there would have to be some deviations (similar to shuffling the data vs feeding it in a specific order). But we can assume that these differences would not be consistent unless we observe this on a large amount of trials.
In any case, this is the common approach to loading large datasets bit by bit: tf.keras allows you to pass a Python generator for model.fit(x) (in case you want to research tutorials or work with an older API: until recently this was a separate method called model.fit_generator, see API).
All the data generator needs to do is yield a batch of training data and labels (x,y) each time it is called. The API will take care of calling it for you, as long as you pass it with fit. The result is that everything is read batch-by-batch into the RAM. A very basic template for a generator is something like this (source):
import numpy as np
def generator(features, labels, batch_size):
    # Create empty arrays to contain a batch of features and labels
    batch_features = np.zeros((batch_size, 64, 64, 3))
    batch_labels = np.zeros((batch_size, 1))
    while True:
        for i in range(batch_size):
            # choose a random index into features
            index = np.random.choice(len(features))
            # some_processing is a placeholder for your own preprocessing function
            batch_features[i] = some_processing(features[index])
            batch_labels[i] = labels[index]
        yield batch_features, batch_labels

Training works but prediction produces constant values (cnn with pytorch)

I have a model trying to predict the class of image: cat or dog. I receive 95% accuracy in training. However when I try to predict a single image, I am stuck with almost constant output every time I run the model. There are some non-constant values, but they mostly look like catastrophic failure.
I read similar topics from forums but that hasn't helped, as it appears there is no particular solution for this problem...
I have tried the following:
Changing epochs from 5 to 15, 20, 30...
Changing lr from 0.001 to 0.01, 0.0001...
Implementing both a dropout regularization model and a batch normalization model...
Changing the testing pictures...
Changing the last activation layer from softmax to torch.sigmoid...
Reducing the batch size from 100 to 30, 75...
Trying with a batch, which results in normal accuracy, loss and predictions.
My dataset is scaled, which is mentioned in forums as a solution.
My optimizer is Adam, which is mentioned in forums as a solution.
Loading the dataset with torch.utils.data.DataLoader...
Sampling randomly...
Saving and loading the model, in case there are problems with that. However, I already checked that the state_dicts are different...
Re-preparing the data, which somehow caused the constant value to flip (dog to cat). I don't know if that's a coincidence, though.
Infos:
Dataset :
https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip
Here is all my code with predictions in Jupyter Notebook, feel free to investigate. I am really tired of this solution. Any help is highly appreciated!
https://github.com/yusuftengriverdi/neural_networks/blob/master/CNN_Last.ipynb
Similar topics around the web:
https://discuss.pytorch.org/t/rnn-predicting-a-constant-output/40397/5
https://discuss.pytorch.org/t/cnn-does-not-predict-properly-does-not-converge-as-expected/43567
https://discuss.pytorch.org/t/making-a-prediction-with-a-trained-model/2193
https://datascience.stackexchange.com/questions/46779/predict-gives-the-same-output-value-for-every-image-keras
https://github.com/keras-team/keras/issues/6447
PyTorch model prediction fail for single item
Having trouble with CNN prediction

If something works in training but fails during prediction, the most likely cause is you're not preprocessing the data the same way.
I had a look at the notebook (huge amount of code, in future please condense this to just the relevant parts here). At a glance - this is your prediction code which doesn't work as expected:
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
plt.imshow(img, cmap='gray')
x = torch.Tensor([i for i in img]).view(-1, 50, 50)
y= torch.Tensor([0,1]).to(device)
test_x = x.view(-1, 1, 50, 50)
test_x = test_x.to(device)
net.eval()
#with torch.no_grad():
yhat.append(net(test_x))
But during training you're using a dataloader
testloader = DataLoader(v_dataset, batch_size = BATCH_SIZE, sampler= test_sampler)
...
test_dt = next(iter(testloader))
X, y = test_dt[0].view(-1, 1, 50, 50), test_dt[1]
val_acc, val_loss = fwd_pass(X.view(-1, 1, 50, 50).to(device), y.to(device))
which works (since your test/validation accuracy goes up to a good level).
Figure out what the dataloader code path does which the other code path doesn't do, and you'll have the solution. Eg, load the same image in both ways and compare - same dimensions? data average / standard deviation the same? etc
For a shortcut - just use a dataloader to make predictions as well. P.S. Yes, it is okay to create a dataloader for just one image.
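One way to do that comparison is to print the same summary statistics for the input produced by each code path. The sketch below uses only the standard library on flat lists of pixel values; `summarize` is a hypothetical helper, and the raw-versus-scaled data is an invented example of the kind of mismatch to look for, not a diagnosis of the notebook. With tensors you would inspect the shape, mean and standard deviation directly instead.

```python
import statistics

def summarize(pixels):
    """Return basic statistics for a flat sequence of pixel values."""
    return {
        "count": len(pixels),
        "min": min(pixels),
        "max": max(pixels),
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),
    }

# Hypothetical example: the same four pixels, once raw and once scaled to [0, 1].
raw = [0, 64, 128, 255]
scaled = [p / 255.0 for p in raw]

a, b = summarize(raw), summarize(scaled)
for key in a:
    flag = "" if abs(a[key] - b[key]) < 1e-6 else "  <-- differs"
    print(f"{key:>5}: {a[key]:10.4f} {b[key]:10.4f}{flag}")
```

Any row flagged as differing points at a preprocessing step that one path applies and the other does not.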

tensorflow 2.0, model.fit() : Your input ran out of data

I am absolutely new to TensorFlow and Keras, and I am trying to make my way around trying out some code that I am finding online.
In particular I am using the fashion-MNIST - consisting of 60000 examples and test set of 10000 examples. Each of them is a 28x28 grayscale image.
I am following this tutorial "https://towardsdatascience.com/building-your-first-neural-network-in-tensorflow-2-tensorflow-for-hackers-part-i-e1e2f1dfe7a0", and I have no problem until the definition of
history = model.fit(
    train_dataset.repeat(),
    epochs=10,
    steps_per_epoch=500,
    validation_data=val_dataset.repeat(),
    validation_steps=2)
As far as I understand, I need to use train_dataset.repeat() as the input dataset because otherwise I won't have enough training examples for those hyperparameter values (epochs, steps_per_epoch).
My question is: how can I avoid having to use .repeat()?
How do I need to change the hyperparameters?
I am copying the code here, for simplicity:
def preprocess(x,y):
    x = tf.cast(x,tf.float32) / 255.0
    y = tf.cast(y, tf.float32)
    return x,y
def create_dataset(xs, ys, n_classes=10):
    ys = tf.one_hot(ys, depth=n_classes)
    return tf.data.Dataset.from_tensor_slices((xs, ys)).map(preprocess).shuffle(len(ys)).batch(128)
model.compile(optimizer = 'adam', loss =tf.losses.CategoricalCrossentropy(from_logits= True), metrics =['accuracy'])
history1 = model.fit(train_dataset.repeat(),
                     epochs=10,
                     steps_per_epoch=500,
                     validation_data=val_dataset.repeat(),
                     validation_steps=2)
Thanks!

If you don't want to use .repeat(), your model needs to pass through your entire dataset only once per epoch.
To do that, you need to calculate how many steps it takes for your model to pass through the entire dataset. The calculation is easy:
steps_per_epoch = len(train_dataset) // batch_size
So with a train_dataset of 60,000 samples and a batch_size of 128, you need 468 steps per epoch. (Here len(train_dataset) means the number of samples, before batching.)
Setting the parameter this way makes sure that you do not exceed the size of your dataset.
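The arithmetic can be checked in plain Python. Floor division counts only the full batches, so 468 steps never run past the data; a dataset batched without dropping the remainder additionally contains one smaller final batch. The variable names below are illustrative only.

```python
import math

num_samples = 60000  # number of training examples, as stated in the question
batch_size = 128

full_batches = num_samples // batch_size           # batches that are completely full
all_batches = math.ceil(num_samples / batch_size)  # also counts the final partial batch
leftover = num_samples % batch_size                # samples in that partial batch

print(full_batches, all_batches, leftover)  # 468 469 96
```

Using 468 steps per epoch skips the 96 leftover samples in each epoch; 469 steps would cover every sample when the partial batch is kept.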

I encountered the same problem and here is what I found.
Documentation of tf.keras.Model.fit: "If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch will run until the input dataset is exhausted."
In other words, we don't need to specify 'steps_per_epoch' if we use a tf.data dataset as the training data; TF will figure out how many steps there are. Meanwhile, TF will automatically repeat the dataset when the next epoch begins, so you can specify any number of 'epochs'.
When passing an infinitely repeating dataset (e.g. dataset.repeat()), you must specify the steps_per_epoch argument.

How to use properly Tensorflow Dataset with batch?

I am new to Tensorflow and deep learning, and I am struggling with the Dataset class. I tried a lot of things and I can’t find a good solution.
What I am trying
I have a large amount of images (500k+) to train my DNN with. This is a denoising autoencoder so I have a pair of each image. I am using the dataset class of TF to manage the data, but I think I use it really badly.
Here is how I load the filenames in a dataset:
class Data:
    def __init__(self, in_path, out_path):
        self.nb_images = 512
        self.test_ratio = 0.2
        self.batch_size = 8
        # load filenames in input and outputs
        inputs, outputs, self.nb_images = self._load_data_pair_paths(in_path, out_path, self.nb_images)
        self.size_training = self.nb_images - int(self.nb_images * self.test_ratio)
        self.size_test = int(self.nb_images * self.test_ratio)
        # split arrays in training / validation
        test_data_in, training_data_in = self._split_test_data(inputs, self.test_ratio)
        test_data_out, training_data_out = self._split_test_data(outputs, self.test_ratio)
        # transform array to tf.data.Dataset
        self.train_dataset = tf.data.Dataset.from_tensor_slices((training_data_in, training_data_out))
        self.test_dataset = tf.data.Dataset.from_tensor_slices((test_data_in, test_data_out))
I have a function to call at each epoch that will prepare the dataset. It shuffles the filenames, and transforms filenames to images and batch data.
def get_batched_data(self, seed, batch_size):
    nb_batch = int(self.size_training / batch_size)
    def img_to_tensor(path_in, path_out):
        img_string_in = tf.read_file(path_in)
        img_string_out = tf.read_file(path_out)
        im_in = tf.image.decode_jpeg(img_string_in, channels=1)
        im_out = tf.image.decode_jpeg(img_string_out, channels=1)
        return im_in, im_out
    t_datas = self.train_dataset.shuffle(self.size_training, seed=seed)
    t_datas = t_datas.map(img_to_tensor)
    t_datas = t_datas.batch(batch_size)
    return t_datas
Now during the training, at each epoch we call the get_batched_data function, make an iterator, and run it for each batch, then feed the array to the optimizer operation.
for epoch in range(nb_epoch):
    sess_iter_in = tf.Session()
    sess_iter_out = tf.Session()
    batched_train = data.get_batched_data(epoch)
    iterator_train = batched_train.make_one_shot_iterator()
    in_data, out_data = iterator_train.get_next()
    total_batch = int(data.size_training / batch_size)
    for batch in range(total_batch):
        print(f"{batch + 1} / {total_batch}")
        in_images = sess_iter_in.run(in_data).reshape((-1, 64, 64, 1))
        out_images = sess_iter_out.run(out_data).reshape((-1, 64, 64, 1))
        sess.run(optimizer, feed_dict={inputs: in_images,
                                       outputs: out_images})
What do I need ?
I need to have a pipeline that loads only the images of the current batch (otherwise it will not fit in memory) and I want to shuffle the dataset in a different way for each epoch.
Questions and problems
First question: am I using the Dataset class properly? I have seen very different approaches on the internet. For example, in this blog post the dataset is used with a placeholder and fed with the data during training. That seems strange, because the data are all in an array and therefore loaded in memory. I don't see the point of using tf.data.Dataset in that case.
I found a solution using repeat(epoch) on the dataset, like this, but then the shuffle will not be different for each epoch.
The second problem with my implementation is that I get an OutOfRangeError in some cases. With a small amount of data (512, as in the example) it works fine, but with more data the error occurs. I thought it was caused by a miscalculated number of batches due to bad rounding, or by the last batch being smaller, but it happens at batch 32 out of 115... Is there any way to know the number of batches created by a batch(n) call on a dataset?
Sorry for this loooonng question, but I've been struggling with this for a few days.

As far as I know, the official performance guide is the best material for learning how to build input pipelines.
I want to shuffle the dataset in a different way for each epoch.
Using shuffle() and repeat(), you get a different shuffle order for each epoch. You can confirm this with the following code:
dataset = tf.data.Dataset.from_tensor_slices([1,2,3,4])
dataset = dataset.shuffle(4)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()
x = iterator.get_next()
with tf.Session() as sess:
    for i in range(10):
        print(sess.run(x))
You can also use tf.contrib.data.shuffle_and_repeat, as mentioned on the official page above.
There are some problems in your code beyond the data pipeline itself. You are confusing graph construction with graph execution. You create the input pipeline again on every epoch, so you end up with as many redundant input pipelines as there are epochs. You can see the redundant pipelines in TensorBoard.
You should move your graph construction code outside the loop, as in the following pseudo code:
batched_train = data.get_batched_data()
iterator = batched_train.make_initializable_iterator()
in_data, out_data = iterator.get_next()
for epoch in range(nb_epoch):
    # reset iterator's state
    sess.run(iterator.initializer)
    try:
        while True:
            # fetch both tensors in one run call so the pair comes from the same batch
            # (each separate sess.run on the iterator's outputs advances it by one batch)
            in_images, out_images = sess.run([in_data, out_data])
            in_images = in_images.reshape((-1, 64, 64, 1))
            out_images = out_images.reshape((-1, 64, 64, 1))
            sess.run(optimizer, feed_dict={inputs: in_images,
                                           outputs: out_images})
    except tf.errors.OutOfRangeError:
        pass
There is also some less important inefficient code. You loaded the list of file paths with from_tensor_slices(), so the list was embedded in your graph. (See https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays for details.)
You would be better off using prefetch and reducing the number of sess.run calls by combining your graph.

how to read batches in one hdf5 data file for training?

I have a hdf5 training dataset with size (21760, 1, 33, 33). 21760 is the whole number of training samples. I want to use the mini-batch training data with the size 128 to train the network.
I want to ask:
How to feed 128 mini-batch training data from the whole dataset with tensorflow each time?

If your data set is so large that it can't be imported into memory like keveman suggested, you can use the h5py object directly:
import h5py
import tensorflow as tf
data = h5py.File('myfile.h5py', 'r')
data_size = data['data_set'].shape[0]
batch_size = 128
sess = tf.Session()
train_op = ...  # tf.something_useful()
input = ...  # tf.placeholder or something
for i in range(0, data_size, batch_size):
    # read only the current batch from the HDF5 file
    current_data = data['data_set'][i:i+batch_size]
    sess.run(train_op, feed_dict={input: current_data})
You can also run through a huge number of iterations and randomly select a batch if you want to:
import random
for i in range(iterations):
    pos = random.randint(0, int(data_size/batch_size)-1) * batch_size
    current_data = data['data_set'][pos:pos+batch_size]
    sess.run(train_op, feed_dict={input: current_data})
Or sequentially:
for i in range(iterations):
    pos = (i % int(data_size / batch_size)) * batch_size
    current_data = data['data_set'][pos:pos+batch_size]
    sess.run(train_op, feed_dict={input: current_data})
You probably want to write some more sophisticated code that goes through all data randomly, but keeps track of which batches have been used, so you don't use any batch more often than others. Once you've done a full run through the training set you enable all batches again and repeat.
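A minimal sketch of that bookkeeping, assuming it is enough to shuffle the batch start offsets once per pass: every batch is then used exactly once before any batch repeats. The function name `shuffled_batch_starts` and its `epochs` and `seed` parameters are hypothetical; only the offsets are generated here, and the HDF5 read stays as in the snippets above.

```python
import random

def shuffled_batch_starts(data_size, batch_size, epochs, seed=None):
    """Yield batch start offsets in a fresh random order for each epoch,
    visiting every batch exactly once per epoch (hypothetical helper)."""
    rng = random.Random(seed)
    starts = list(range(0, data_size, batch_size))
    for _ in range(epochs):
        rng.shuffle(starts)  # all batches become available again, in a new order
        for start in starts:
            yield start

# 21760 samples in batches of 128 give 170 batches per epoch.
offsets = list(shuffled_batch_starts(21760, 128, epochs=2, seed=0))
print(len(offsets))
```

Each yielded offset `pos` would then be used as `data['data_set'][pos:pos+batch_size]` in the training loop.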

You can read the hdf5 dataset into a numpy array, and feed slices of the numpy array to the TensorFlow model. Pseudo code like the following would work :
import numpy, h5py
f = h5py.File('somefile.h5','r')
data = f.get('path/to/my/dataset')
data_as_array = numpy.array(data)
for i in range(0, 21760, 128):
    sess.run(train_op, feed_dict={input:data_as_array[i:i+128, :, :, :]})

alkamen's approach seems logically right but I have not gotten any positive results using it. My best guess is this: Using code sample 1 above, in every iteration, the network trains afresh, forgetting all that has been learned in the previous loop. So if we are fetching at 30 samples or batches per iteration, at every loop/iteration, only 30 data samples are being used, then at the next loop, everything is overwritten.
Find below a screenshot of this approach
As can be seen, the loss and accuracy always start afresh. I will be happy if anyone could share a possible way around this, please.
