I have about 60 thousand samples of size 200x870; they are all numpy arrays, and I want to build a four-dimensional tensor out of them (with one singleton dimension) and train a CNN on them in TensorFlow. Up to this point, I was using data that I could just load into memory and create batches from, as below:
with tf.Graph().as_default():
    data_train = tf.to_float(getInput.data_train)
    phase, lr = tf.placeholder(tf.bool), tf.placeholder(tf.float32)
    global_step = tf.Variable(0, trainable=False)
    image_train, label_train = tf.train.slice_input_producer([data_train, labels_train], num_epochs=args.num_epochs)
    images_train, batch_labels_train = tf.train.batch([image_train, label_train], batch_size=args.bsize)
This no longer works, because the full dataset does not fit into memory. Can someone suggest a way to work around that?
I wanted to split the dataset into subsets and, within one epoch, train on one after the other, using a queue for the paths of these files:
import scipy.io as sc
import numpy as np
import threading
import time
import tensorflow as tf
from tensorflow.python.client import timeline
def testQueues():
    paths = ['data1', 'data2', 'data3', 'data4', 'data5']
    queue_capacity = 6
    bsize = 10
    num_epochs = 2

    filename_queue = tf.FIFOQueue(
        #min_after_dequeue=0,
        capacity=queue_capacity,
        dtypes=tf.string,
        shapes=[[]]
    )
    filenames_placeholder = tf.placeholder(dtype='string', shape=(None))
    filenames_enqueue_op = filename_queue.enqueue_many(filenames_placeholder)
    data_train, phase = tf.placeholder(tf.float32), tf.placeholder(tf.bool)

    sess = tf.Session()
    sess.run(filenames_enqueue_op, feed_dict={filenames_placeholder: paths})
    for i in range(len(paths)):
        train_set_batch_name = sess.run(filename_queue.dequeue())
        train_set_batch_name = train_set_batch_name.decode('utf-8')
        train_set_batch = np.load(train_set_batch_name + '.npy')
        train_set_batch = tf.cast(train_set_batch, tf.float32)
        init_op = tf.group(tf.initialize_all_variables(), tf.initialize_local_variables())
        sess.run(init_op)
        run_one_epoch(train_set_batch, sess)
        size = sess.run(filename_queue.size())
        print(size)
        print(train_set_batch)

def run_one_epoch(train_set, sess):
    image_train = tf.train.slice_input_producer([train_set], num_epochs=1)
    images_train = tf.train.batch(image_train, batch_size=10)
    x = tf.nn.relu(images_train)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            sess.run(x)
    except tf.errors.OutOfRangeError:
        pass
    finally:
        # When done, ask the threads to stop.
        coord.request_stop()
        coord.join(threads)

testQueues()
However, I get the following error:
FailedPreconditionError: Attempting to use uninitialized value input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs
[[Node: input_producer/input_producer/fraction_of_32_full/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](input_producer/input_producer/fraction_of_32_full/limit_epochs/epochs)]]
Also, it seems I can't feed the dictionary with a tf.Tensor, only with a numpy array, and casting it to a tf.Tensor later is also troublesome.
Have a look at the Dataset API.
"The tf.data API enables you to build complex input pipelines from simple, reusable pieces."
In this approach, you model your graph so that it handles the data for you, pulling in a limited amount at a time to train your model on.
If memory issues still persist, you might want to look into using a generator to create your tf.data.Dataset. Your next step could be to speed up the process by preparing TFRecords to create your Dataset.
Follow all the links to learn more, and feel free to comment if you don't understand something.
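For example, a minimal sketch of the generator route for this case (the chunk/label file naming below is an assumption; the 200x870 shape and singleton channel dimension come from the question):
import numpy as np
import tensorflow as tf

def sample_generator(paths):
    # Hypothetical layout: each chunk is an .npy file plus a matching '<name>_labels.npy'.
    for path in paths:
        chunk = np.load(path + '.npy')            # e.g. shape (N, 200, 870)
        labels = np.load(path + '_labels.npy')    # e.g. shape (N,)
        for sample, label in zip(chunk, labels):
            yield sample[..., np.newaxis], label  # add the singleton channel dimension

paths = ['data1', 'data2', 'data3', 'data4', 'data5']
dataset = tf.data.Dataset.from_generator(
    lambda: sample_generator(paths),
    output_types=(tf.float32, tf.int32),
    output_shapes=((200, 870, 1), ()))
dataset = dataset.shuffle(1000).batch(32).prefetch(1)
images, labels = dataset.make_one_shot_iterator().get_next()
Only one chunk is ever held in memory at a time; a TFRecord-based pipeline would replace the generator with tf.data.TFRecordDataset.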
For data that doesn't fit into memory, the standard solution is to use queues. You can set up ops that read directly from files (CSV files, image files) and feed them into TensorFlow -- https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html
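A minimal sketch of that queue-based approach for CSV files (TF 1.x API; the file names and column layout below are made up):
import tensorflow as tf

# Assumed layout: each CSV row is "label,f1,f2,f3,f4".
filename_queue = tf.train.string_input_producer(['train_0.csv', 'train_1.csv'], num_epochs=2)
reader = tf.TextLineReader()
_, line = reader.read(filename_queue)
label, f1, f2, f3, f4 = tf.decode_csv(line, record_defaults=[[0]] + [[0.0]] * 4)
features = tf.stack([f1, f2, f3, f4])
feature_batch, label_batch = tf.train.shuffle_batch(
    [features, label], batch_size=32, capacity=1000, min_after_dequeue=500)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # ... train on feature_batch / label_batch here ...
    coord.request_stop()
    coord.join(threads)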
Related
I just started using the GPU version of TensorFlow, hoping that it would speed up the training of my feed-forward neural networks. I am able to train on my GPU (GTX 1080 Ti), but unfortunately it is not notably faster than the same training on my CPU (i7-8700K), at least not the way I've currently implemented it. During training, the GPU appears to be barely utilized at all, which makes me suspect that the bottleneck in my implementation is how the data is copied from the host to the device using feed_dict.
I've heard that TensorFlow has something called the "tf.data" pipeline, which is supposed to make it easier and faster to feed data to the GPU. However, I have not been able to find any simple example where this concept is implemented for multilayer perceptron training as a replacement for feed_dict.
Is anyone aware of such an example and can point me to it? Preferably as simple as possible since I’m new to TensorFlow in general. Or is there something else I should change in my current implementation to make it more efficient? I’m pasting the code I have here:
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import time

tf.reset_default_graph()

# Function for iris dataset.
def get_iris_data():
    iris = datasets.load_iris()
    data = iris["data"]
    target = iris["target"]
    # Convert to one-hot vectors
    num_labels = len(np.unique(target))
    all_Y = np.eye(num_labels)[target]
    return train_test_split(data, all_Y, test_size=0.33, random_state=89)

# Function which initializes tensorflow weights & biases for feed-forward NN.
def InitWeights(LayerSizes):
    with tf.device('/gpu:0'):
        # Make tf placeholders for network inputs and outputs.
        X = tf.placeholder(shape=(None, LayerSizes[0]),
                           dtype=tf.float32,
                           name='InputData')
        y = tf.placeholder(shape=(None, LayerSizes[-1]),
                           dtype=tf.float32,
                           name='OutputData')
        # Initialize weights and biases.
        W = {}
        b = {}
        for ii in range(len(LayerSizes) - 1):
            layername = 'layer%s' % ii
            with tf.variable_scope(layername):
                ny = LayerSizes[ii]
                nx = LayerSizes[ii + 1]
                # Weights (initialized with Xavier initialization).
                W['Weights_' + layername] = tf.get_variable(
                    name='Weights_' + layername,
                    shape=(ny, nx),
                    initializer=tf.contrib.layers.xavier_initializer(),
                    dtype=tf.float32
                )
                # Bias (initialized with Xavier initialization).
                b['Bias_' + layername] = tf.get_variable(
                    name='Bias_' + layername,
                    shape=(nx),
                    initializer=tf.contrib.layers.xavier_initializer(),
                    dtype=tf.float32
                )
    return W, b, X, y

# Function for forward propagation of NN.
def FeedForward(X, W, b):
    with tf.device('/gpu:0'):
        # Initialize 'a' of first layer to the placeholder of the network input.
        a = X
        # Loop over all layers of the network.
        for ii in range(len(W)):
            # Use name of each layer as index.
            layername = 'layer%s' % ii
            ## Weighted sum: z = input*W + b
            z = tf.add(tf.matmul(a, W['Weights_' + layername], name='WeightedSum_z_' + layername), b['Bias_' + layername])
            ## Passed through activation fcn: a = h(z)
            if ii == len(W) - 1:
                a = z
            else:
                a = tf.nn.relu(z, name='activation_a_' + layername)
    return a

if __name__ == "__main__":
    # Import data
    train_X, test_X, train_y, test_y = get_iris_data()
    # Define network size [ninputs-by-256-by-outputs]
    LayerSizes = [4, 256, 3]
    # Initialize weights and biases.
    W, b, X, y = InitWeights(LayerSizes)
    # Define loss function to optimize.
    yhat = FeedForward(X, W, b)
    loss = tf.reduce_sum(tf.square(y - yhat), reduction_indices=[0])
    # Define optimizer to use when minimizing loss function.
    all_variables = tf.trainable_variables()
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)
    train_op = optimizer.minimize(loss, var_list=all_variables)
    # Start tf session and initialize variables.
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    # Train 10000 minibatches and time how long it takes.
    t0 = time.time()
    for i in range(10000):
        ObservationsToUse = np.random.choice(len(train_X), 32)
        X_minibatch = train_X[ObservationsToUse, :]
        y_minibatch = train_y[ObservationsToUse, :]
        sess.run(train_op, feed_dict={X: X_minibatch, y: y_minibatch})
    t1 = time.time()
    print('Training took %0.2f seconds' % (t1 - t0))
    sess.close()
The speed might be low because:
You are creating placeholders. Using numpy, you insert the data into the placeholders, and it is thereby converted into tensors of the graph on every step.
By using tf.data.Dataset, you can create a direct pipeline which makes the data directly flow into the graph without the need of placeholders. They are fast, scalable and have a number of functions to play around with.
# Load features and labels (assuming an .npz archive, since np.load is used as a context manager).
with np.load("/var/data/training_data.npz") as data:
    features = data["features"]
    labels = data["labels"]

# Assume that each row of `features` corresponds to the same row of `labels`.
assert features.shape[0] == labels.shape[0]
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
Some useful functions:
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)  # Create batches
dataset = dataset.repeat(num_epochs)  # Repeat the dataset 'N' times
iterator = dataset.make_one_shot_iterator()  # Create an iterator to retrieve batches of data
X, Y = iterator.get_next()
Here, 32 is the batch size.
In your case,
dataset = tf.data.Dataset.from_tensor_slices((data, targets))
Hence, there is no need for placeholders. Directly run
session.run(train_op)  # no feed_dict!
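For the script above, a hedged sketch of that change (reusing the InitWeights/FeedForward functions already defined, so only the input plumbing and the training loop differ) could look like this:
# Build an input pipeline from the numpy arrays instead of feeding placeholders.
dataset = tf.data.Dataset.from_tensor_slices(
    (train_X.astype(np.float32), train_y.astype(np.float32)))
dataset = dataset.shuffle(buffer_size=len(train_X)).repeat().batch(32)
X_batch, y_batch = dataset.make_one_shot_iterator().get_next()

# Reuse W and b from InitWeights, but feed the iterator output instead of the placeholders.
yhat = FeedForward(X_batch, W, b)
loss = tf.reduce_sum(tf.square(y_batch - yhat))
train_op = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)  # no feed_dict; the iterator supplies each minibatch
For a model this small the GPU may still be underutilized, but at least the per-step feed_dict copy is gone.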
When following the TensorFlow image classification tutorial, it first caches the bottleneck of each image (see the cache_bottlenecks() function).
I have rewritten the training using tensorflow's Estimator. This really simplified all the code. However I want to cache the bottleneck features here.
Here is my model_fn. I want to cache the results of the dense layer so I can make changes to the actual training without having to compute the bottlenecks each time.
How can I accomplish that?
def model_fn(features, labels, mode, params):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    num_classes = len(params['label_vocab'])
    module = hub.Module(params['module_spec'], trainable=is_training and params['train_module'])
    bottleneck_tensor = module(features['image'])

    with tf.name_scope('final_retrain_ops'):
        logits = tf.layers.dense(bottleneck_tensor, units=num_classes, trainable=is_training)  # save this?

    def train_op_fn(loss):
        optimizer = tf.train.AdamOptimizer()
        return optimizer.minimize(loss, global_step=tf.train.get_global_step())

    head = tf.contrib.estimator.multi_class_head(n_classes=num_classes, label_vocabulary=params['label_vocab'])
    return head.create_estimator_spec(
        features, mode, logits, labels, train_op_fn=train_op_fn
    )
TF cannot cache intermediate results with the graph written this way. You should:
Export the bottlenecks from the raw net to a file.
Use the bottleneck results as input, and train another net on your data.
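As a rough illustration of those two steps (the helper name and image shape below are assumptions, not part of the original code):
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Step 1 (run once): push every image through the module and save the bottlenecks.
def export_bottlenecks(image_batches, module_spec, out_path):
    images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])  # shape depends on the module
    bottlenecks = hub.Module(module_spec)(images)
    outputs = []
    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        for batch in image_batches:  # batches of already-preprocessed images
            outputs.append(sess.run(bottlenecks, feed_dict={images: batch}))
    np.save(out_path, np.concatenate(outputs))

# Step 2: train only the final dense layer / head on the saved arrays,
# e.g. via an input_fn built with tf.data.Dataset.from_tensor_slices(np.load(out_path)).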
To expand on what @Feng said:
see TFRecords and TFExamples and Load Images
Something like this should work (untested):
# Serialize the data into two tfrecord files
tf.enable_eager_execution()
feature_extractor = ...

features_file = tf.python_io.TFRecordWriter('features.tfrec')
label_file = tf.python_io.TFRecordWriter('labels.tfrec')

for images, labels in dataset:
    features = feature_extractor(images)
    # TFRecordWriter expects bytes, hence .numpy()
    features_file.write(tf.serialize_tensor(features).numpy())
    label_file.write(tf.serialize_tensor(labels).numpy())

# Parse the files and zip them together
def parse(dtype, shape):
    def _parse(x):
        result = tf.parse_tensor(x, out_type=dtype)
        result = tf.reshape(result, shape)
        return result
    return _parse

features_ds = tf.data.TFRecordDataset('features.tfrec')
features_ds = features_ds.map(parse(tf.float32, FEATURE_SHAPE), num_parallel_calls=AUTOTUNE)
labels_ds = tf.data.TFRecordDataset('labels.tfrec')
labels_ds = labels_ds.map(parse(tf.float32, FEATURE_SHAPE), num_parallel_calls=AUTOTUNE)

ds = tf.data.Dataset.zip((features_ds, labels_ds))
ds = ds.unbatch().shuffle().repeat().batch().prefetch()...
You might also be able to do it using Dataset.cache, but I'm not 100% sure of the details.
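For reference, the Dataset.cache route (equally untested) would look roughly like this, reusing the parse helper and names from the snippet above; the first pass through the data fills the cache file and later epochs read from it:
ds = tf.data.TFRecordDataset('features.tfrec')
ds = ds.map(parse(tf.float32, FEATURE_SHAPE), num_parallel_calls=AUTOTUNE)
ds = ds.cache('/tmp/feature_cache')  # hypothetical cache path; omit the argument to cache in memory
ds = ds.shuffle(1000).repeat().batch(32).prefetch(1)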
I am trying to properly read my own binary data into TensorFlow, based on the Fixed length records section of this tutorial and by looking at the read_cifar10 function here. Mind you, I am new to TensorFlow, so my understanding may be off.
My Data
My files are binary with float32 values. The first 32-bit value is the label, and the remaining 256 values are the data. I want to reshape the data at the end into a [2, 128] matrix.
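So each record is 1 + 256 float32 values (1028 bytes). For reference, a dummy file with this layout could be generated like this (illustrative only):
import numpy as np

with open('train_0000.dat', 'wb') as f:
    for i in range(100):  # 100 dummy records
        label = np.array([i % 10], dtype=np.float32)   # first float32 is the label
        data = np.random.rand(256).astype(np.float32)  # remaining 256 float32 samples
        np.concatenate([label, data]).tofile(f)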
My Code So far:
import tensorflow as tf
import os

def read_data(filename_queue):
    item_type = tf.float32
    label_items = 1
    data_items = 256

    label_bytes = label_items * item_type.size
    data_bytes = data_items * item_type.size
    record_bytes = label_bytes + data_bytes

    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    key, value = reader.read(filename_queue)
    record_data = tf.decode_raw(value, item_type)

    # labels = tf.cast(tf.strided_slice(record_data, [0], [label_items]), tf.int32)
    label = tf.strided_slice(record_data, [0], [label_items])
    data0 = tf.strided_slice(record_data, [label_items], [label_items + data_items])
    data = tf.reshape(data0, [2, data_items // 2])
    return data, label

if __name__ == '__main__':
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Set GPU device

    datafiles = ['train_0000.dat', 'train_0001.dat']
    num_epochs = 2
    filename_queue = tf.train.string_input_producer(datafiles, num_epochs=num_epochs, shuffle=True)
    data, label = read_data(filename_queue)

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        (x, y) = read_data(filename_queue)
        print(y.eval())
This code hangs at print(y.eval()), but I fear I have much bigger issues than that.
Question:
When I execute this, I get a data and a label tensor back. The problem is that I don't quite understand how to actually read the data from these tensors. For example, I understand the autoencoder example here; however, that one has a mnist.train.next_batch(batch_size) function that is called to read the next batch. Do I need to write such a function myself, or is it handled by something internal to my read_data() function? If I need to write that function, what does it look like?
Are there any other obvious things I'm missing? My goal in using this method is to reduce I/O overhead and to avoid storing all of the data in memory, since my files are quite large.
Thanks in advance.
Yes. You are pretty much done. At this point you need to:
1) Write your neural network model, which is supposed to take your data and return a label.
2) Write your cost function C which takes the network prediction and the true label and gives you a cost.
3) Choose an optimizer.
4) Put everything together:
opt = tf.train.AdamOptimizer(learning_rate=0.001)
datafiles = ['train_0000.dat', 'train_0001.dat']
num_epochs = 2
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    filename_queue = tf.train.string_input_producer(datafiles, num_epochs=num_epochs, shuffle=True)
    data, label = read_data(filename_queue)
    # capacity and min_after_dequeue are required arguments; the values here are illustrative.
    example_batch, label_batch = tf.train.shuffle_batch(
        [data, label], batch_size=128, capacity=2000, min_after_dequeue=1000)
    y_pred = model(example_batch)
    loss = C(label_batch, y_pred)
After which you iterate and minimize the loss with:
opt.minimize(loss)
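Note that opt.minimize(loss) only builds the training op; you still need to run it in a loop, and because string_input_producer (with num_epochs) and shuffle_batch use queues, you must also initialize local variables and start the queue runners, which is also why the original script hangs at print(y.eval()). A hedged sketch, continuing inside the with tf.Session() as sess: block above:
    train_op = opt.minimize(loss)
    # Initialize again after the optimizer has created its slot variables,
    # and initialize local variables for the num_epochs counter.
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    # Start the threads that fill the filename queue and the shuffle queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            sess.run(train_op)
    except tf.errors.OutOfRangeError:
        pass  # input exhausted after num_epochs
    finally:
        coord.request_stop()
        coord.join(threads)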
See also tf.train.string_input_producer behavior in a loop for related information.
I'm struggling to solve this issue, and I believe it is due to my data. I'm thinking about this as a few-to-many regression problem, but there could be a better approach in TensorFlow.
Training Data
I have some data generated from a video sequence. For each frame of video, I have a distribution of x,y positions for each cluster. There are 157,110 frames and 200,000 clusters. The frames and clusters are the inputs, which are integers and I think could be considered labels (I'll be using another network to learn the sequences of clusters later on). As each histogram is related to both a frame and a clusterID, the input is not "one hot". The histograms (outputs) have 19+8 (x+y) bins, where each count is rarely above 10 and could be normalized.
A subset of the training data is available here: The first two columns are the frame and clusterID (inputs) and the remaining 19+8 columns are the histograms (outputs).
What is the best network to learn to generate the appropriate histogram for a given frame/clusterID pair?
The following code is my current attempt using an MLP. It does not converge; in fact cost does not decrease at all. Is there something wrong in my implementation, or my choice of MLP, or a lack of scaling in my input data?
#!/usr/bin/python
# This program uses tensorflow to learn cluster probabilities and associate them with frame and cluster IDs

# Arguments
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("clusterProbabilityfile", help="CSV file containing cluster probabilities")
parser.add_argument("trainingIterations", type=int, help="Number of training iterations")
args = parser.parse_args()

# Imports for ML
import tensorflow as tf
import numpy as np
from tensorflow.python.framework import dtypes

# Imports for loading CSV file
from tensorflow.python.platform import gfile
import csv

# Global vars
numInputUnits = 2
numOutputUnits = 19 + 8
numHiddenUnits = (numOutputUnits - numInputUnits) // 2
workingDirectory = args.clusterProbabilityfile.split('/')[0] + "/"
columnSplit = 2  # Column number that splits inputs from outputs

# Shuffle training set
def shuffleTrainingSet(trainingSet):
    trainingIndices = np.arange(len(trainingSet.data))  # assumes len(data) == len(target)
    np.random.shuffle(trainingIndices)  # shuffle indices
    data = trainingSet.data[trainingIndices]
    target = trainingSet.target[trainingIndices]
    training_set = tf.contrib.learn.datasets.base.Dataset(data=data, target=target)
    return training_set

# Load training data from CSV file, convert to numpy arrays and construct Dataset
# Modified from tf.contrib.learn.datasets.base.load_csv_without_header
# Should these be randomized???
with gfile.Open(args.clusterProbabilityfile) as csv_file:
    data_file = csv.reader(csv_file)
    data, target = [], []
    for row in data_file:
        target.append(row[columnSplit+1:])  # All elements past the split column.
        data.append(row[:columnSplit])  # All elements before and including the split column.
target = np.array(target, dtype=int)
data = np.array(data, dtype=int)
training_set = tf.contrib.learn.datasets.base.Dataset(data=data, target=target)
training_set = shuffleTrainingSet(training_set)

# Construct computation graph
# MLP approach (from https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py)
# Single hidden layer!
inputVec = tf.placeholder(tf.float32, [None, numInputUnits])
outputVec = tf.placeholder(tf.float32, [None, numOutputUnits])

# Weights
hiddenWeights = tf.Variable(tf.random_normal([numInputUnits, numHiddenUnits]))  # inputUnits -> hiddenUnits
outputWeights = tf.Variable(tf.random_normal([numHiddenUnits, numOutputUnits]))  # hiddenUnits -> outputUnits

# Biases
hiddenBiases = tf.Variable(tf.random_normal([numHiddenUnits]))
outputBiases = tf.Variable(tf.random_normal([numOutputUnits]))

# Construct MLP from layers
hiddenLayer = tf.add(tf.matmul(inputVec, hiddenWeights), hiddenBiases)  # input * weight + bias = hidden
hiddenLayer = tf.nn.relu(hiddenLayer)  # ReLU activation function for hidden layer.
outputLayer = tf.add(tf.matmul(hiddenLayer, outputWeights), outputBiases)  # hidden * weight + bias = output

# Loss and optimizer
#cross_entropy = -(outputVec * tf.log(outputLayer) + (1 - outputVec) * tf.log(1 - outputLayer))
#cost = tf.reduce_mean(cross_entropy)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=outputLayer, labels=outputVec))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

# Compute graph
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for epoch in range(args.trainingIterations):
    training_set = shuffleTrainingSet(training_set)  # Reshuffle for each epoch.
    epochCost = sess.run(cost, feed_dict={inputVec: training_set.data, outputVec: training_set.target})
    print("{:d}\t{:f}".format(epoch, epochCost))

# Evaluate model
correct_prediction = tf.equal(tf.argmax(outputLayer, 1), tf.argmax(outputVec, 1))  # compare output layer with target output vector.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Cost:", sess.run(cost, feed_dict={inputVec: training_set.data, outputVec: training_set.target}))
print("Accuracy:", sess.run(accuracy, feed_dict={inputVec: training_set.data, outputVec: training_set.target}))
I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Update: I have a working example based on @mrry's example.
import numpy as np
import tensorflow as tf
import threading

BATCH_SIZE = 5
TRAINING_ITERS = 4100

feature_input = tf.placeholder(tf.float32, shape=[128])
label_input = tf.placeholder(tf.float32, shape=[128])

q = tf.FIFOQueue(200, [tf.float32, tf.float32], shapes=[[128], [128]])
enqueue_op = q.enqueue([label_input, feature_input])

label_batch, feature_batch = q.dequeue_many(BATCH_SIZE)
c = tf.reshape(feature_batch, [BATCH_SIZE, 128]) + tf.reshape(label_batch, [BATCH_SIZE, 128])

sess = tf.Session()

def load_and_enqueue(sess, enqueue_op, coord):
    with open('dummy_data/features.bin', 'rb') as feature_file, open('dummy_data/labels.bin', 'rb') as label_file:
        while not coord.should_stop():
            feature_array = np.fromfile(feature_file, np.float32, 128)
            if feature_array.shape[0] == 0:
                print('reached end of file, resetting with seek(0, 0)')
                feature_file.seek(0, 0)
                label_file.seek(0, 0)
                continue
            label_value = np.fromfile(label_file, np.float32, 128)
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})

coord = tf.train.Coordinator()
t = threading.Thread(target=load_and_enqueue, args=(sess, enqueue_op, coord))
t.start()

for i in range(TRAINING_ITERS):
    sum = sess.run(c)
    print('train_iter=' + str(i))
    print(sum)

coord.request_stop()
coord.join([t])
This is a common use case, and most implementations use TensorFlow's queues to decouple the preprocessing code from the training code. There is a tutorial on how to use queues, but the main steps are as follows:
Define a queue, q, that will buffer the preprocessed data. TensorFlow supports the simple tf.FIFOQueue that produces elements in the order they were enqueued, and the more advanced tf.RandomShuffleQueue that produces elements in a random order. A queue element is a tuple of one or more tensors (which can have different types and shapes). All queues support single-element (enqueue, dequeue) and batch (enqueue_many, dequeue_many) operations, but to use the batch operations you must specify the shapes of each tensor in a queue element when constructing the queue.
Build a subgraph that enqueues preprocessed elements into the queue. One way to do this would be to define some tf.placeholder() ops for tensors corresponding to a single input example, then pass them to q.enqueue(). (If your preprocessing produces a batch at once, you should use q.enqueue_many() instead.) You might also include TensorFlow ops in this subgraph.
Build a subgraph that performs training. This will look like a regular TensorFlow graph, but will get its input by calling q.dequeue_many(BATCH_SIZE).
Start your session.
Create one or more threads that execute your preprocessing logic, then execute the enqueue op, feeding in the preprocessed data. You may find the tf.train.Coordinator and tf.train.QueueRunner utility classes useful for this.
Run your training graph (optimizer, etc.) as normal.
EDIT: Here's a simple load_and_enqueue() function and code fragment to get you started:
# Features are length-100 vectors of floats
feature_input = tf.placeholder(tf.float32, shape=[100])
# Labels are scalar integers.
label_input = tf.placeholder(tf.int32, shape=[])

# Alternatively, could do:
# feature_batch_input = tf.placeholder(tf.float32, shape=[None, 100])
# label_batch_input = tf.placeholder(tf.int32, shape=[None])

q = tf.FIFOQueue(100, [tf.float32, tf.int32], shapes=[[100], []])
enqueue_op = q.enqueue([feature_input, label_input])
# For batch input, do:
# enqueue_op = q.enqueue_many([feature_batch_input, label_batch_input])

feature_batch, label_batch = q.dequeue_many(BATCH_SIZE)
# Build rest of model taking label_batch, feature_batch as input.
# [...]
train_op = ...

sess = tf.Session()

def load_and_enqueue():
    with open(...) as feature_file, open(...) as label_file:
        while True:
            feature_array = numpy.fromfile(feature_file, numpy.float32, 100)
            if feature_array.size == 0:
                return
            label_value = numpy.fromfile(label_file, numpy.int32, 1)[0]
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})

# Start a thread to enqueue data asynchronously, and hide I/O latency.
t = threading.Thread(target=load_and_enqueue)
t.start()

for _ in range(TRAINING_EPOCHS):
    sess.run(train_op)
In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Yes, it is. mrry's solution works, but a simpler one exists.
Fetching data
tf.py_func wraps a Python function and uses it as a TensorFlow operator, so we can load the data at each sess.run(). The problem with this approach is that the data is loaded during sess.run() on the main thread.
A minimal example:
def get_numpy_tensor():
    return np.array([[1, 2], [3, 4]], dtype=np.float32)
tensorflow_tensor = tf.py_func(get_numpy_tensor, [], tf.float32)
A more complex example:
def get_numpy_tensors():
    # Load data from the disk into numpy arrays.
    input = np.array([[1, 2], [3, 4]], dtype=np.float32)
    target = np.int32(1)
    return input, target
tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])

tensorflow_input, tensorflow_target = 2*tensorflow_input, 2*tensorflow_target

sess = tf.InteractiveSession()
numpy_input, numpy_target = sess.run([tensorflow_input, tensorflow_target])
assert np.all(numpy_input == np.array([[2, 4], [6, 8]])) and numpy_target == 2
Prefetching data in another thread
To queue our data in another thread (so that sess.run() won't have to wait for the data), we can use tf.train.batch() on our operators from tf.py_func().
A minimal example:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32, shapes=[tensor_shape])
# Run `tf.train.start_queue_runners()` once session is created.
We can omit the argument shapes if tensorflow_tensor has its shape specified:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensor.set_shape(tensor_shape)
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32)
# Run `tf.train.start_queue_runners()` once session is created.
A more complex example:
input_shape, target_shape = (2, 2), ()
def get_numpy_tensors():
    input = np.random.rand(*input_shape).astype(np.float32)
    target = np.random.randint(10, dtype=np.int32)
    print('f', end='')
    return input, target
tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])

batch_size = 2
tensorflow_inputs, tensorflow_targets = tf.train.batch([tensorflow_input, tensorflow_target], batch_size, shapes=[input_shape, target_shape], capacity=2)
# Internal queue will contain at most `capacity=2` times `batch_size=2` elements `[tensorflow_input, tensorflow_target]`.

tensorflow_inputs, tensorflow_targets = 2*tensorflow_inputs, 2*tensorflow_targets

sess = tf.InteractiveSession()
tf.train.start_queue_runners()  # Internally, `tf.train.batch` uses a QueueRunner, so we need to ask TF to start it.
for _ in range(10):
    numpy_inputs, numpy_targets = sess.run([tensorflow_inputs, tensorflow_targets])
    assert numpy_inputs.shape == (batch_size, *input_shape) and numpy_targets.shape == (batch_size, *target_shape)
    print('r', end='')
# Prints `fffffrrffrfrffrffrffrffrffrffrf`.
In case get_numpy_tensor() returns a batch of tensors, then tf.train.batch(..., enqueue_many=True) will help.
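A minimal sketch of that batched variant (again with a made-up loader that now returns a whole batch per call):
def get_numpy_batch():
    # Hypothetical loader returning several examples at once (leading batch dimension).
    inputs = np.random.rand(4, 2, 2).astype(np.float32)
    targets = np.random.randint(10, size=4).astype(np.int32)
    return inputs, targets

tf_inputs, tf_targets = tf.py_func(get_numpy_batch, [], [tf.float32, tf.int32])
tf_inputs.set_shape((None, 2, 2))
tf_targets.set_shape((None,))
batch_inputs, batch_targets = tf.train.batch(
    [tf_inputs, tf_targets], batch_size=2, enqueue_many=True, capacity=8)
# As before, run `tf.train.start_queue_runners()` once the session is created;
# each sess.run() then dequeues batch_size of the individually enqueued examples.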