Situation
I want to train a specific network architecture (a GAN) that needs inputs from different sources during training.
One input source is examples loaded from disk. The other source is a generator sub-network creating examples.
To choose which kind of input to feed to the network I use tf.cond. There is one caveat though that has already been explained: tf.cond evaluates the inputs to both conditional branches even though only one of those will ultimately be used.
Enough setup, here is a minimal working example:
import numpy as np
import tensorflow as tf
BATCH_SIZE = 32
def load_input_data():
# Normally this data would be read from disk
data = tf.reshape(np.arange(10 * BATCH_SIZE, dtype=np.float32), shape=(10 * BATCH_SIZE, 1))
return tf.train.batch([data], BATCH_SIZE, enqueue_many=True)
def generate_input_data():
# Normally this data would be generated by a much bigger sub-network
return tf.random_uniform(shape=[BATCH_SIZE, 1])
def main():
# A bool to choose between loaded or generated inputs
load_inputs_pred = tf.placeholder(dtype=tf.bool, shape=[])
# Variant 1: Call "load_input_data" inside tf.cond
data_batch = tf.cond(load_inputs_pred, load_input_data, generate_input_data)
# Variant 2: Call "load_input_data" outside tf.cond
#loaded_data = load_input_data()
#data_batch = tf.cond(load_inputs_pred, lambda: loaded_data, generate_input_data)
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init_op)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(threads)
# Get generated input data
data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: False})
print(data_batch_values)
# Get input data loaded from disk
data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: True})
print(data_batch_values)
if __name__ == '__main__':
main()
Problem
Variant 1 does not work at all since the queue runner threads don't seem to run. print(threads) outputs something like [<Thread(Thread-1, stopped daemon 140165838264064)>, ...].
Variant 2 does work and print(threads) outputs something like [<Thread(Thread-1, started daemon 140361854863104)>, ...]. But since load_input_data() has been called outside of tf.cond, batches of data will be loaded from disk even when load_inputs_pred is False.
Is it possible to make Variant 1 work, so that input data is only loaded when load_inputs_pred is True and not for every call to session.run()?
If you load your data through a queue and follow it up with tf.train.batch, this shouldn't be a problem, because you can cap how much is loaded or stored in the queue by setting its capacity.
filename_queue = tf.train.string_input_producer(somefilelist)  # or another way to load data
reader = tf.WholeFileReader()
_, file_contents = reader.read(filename_queue)
return tf.train.batch([file_contents], batch_size=10, capacity=100)
See here for more details:
https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#batch
Also, there's an alternative approach that skips tf.cond completely. Just define two losses: one that follows the data through the autoencoder and discriminator, and another that follows the data through just the discriminator.
Then it just becomes a matter of calling
sess.run(auto_loss,feed_dict)
or
sess.run(real_img_loss,feed_dict)
This way, the graph will only run through whichever loss you call. Let me know if this needs more explanation.
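Sketched out, that might look something like this (just a sketch: discriminator is a placeholder for your own sub-network, and the sigmoid losses are illustrative, not taken from your code):
import tensorflow as tf

real_images = load_input_data()        # queue-backed batch read from disk (from the question)
fake_images = generate_input_data()    # stands in for the autoencoder/generator sub-network

d_real = discriminator(real_images)                # loaded data through just the discriminator
d_fake = discriminator(fake_images, reuse=True)    # generated data through the full path

# One loss per path; each depends only on the inputs it actually uses.
real_img_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=d_real, labels=tf.ones_like(d_real)))
auto_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=d_fake, labels=tf.zeros_like(d_fake)))

# sess.run(auto_loss, feed_dict=...) never dequeues from the disk queue;
# sess.run(real_img_loss, feed_dict=...) is the only call that consumes loaded batches.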
Lastly, I think that to make Variant 1 work you need to do something like the following, if you're using preloaded data.
https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#preloaded-data
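For reference, the preloaded-data pattern from that guide looks roughly like this (a sketch reusing the question's toy data; not tested against the rest of your setup):
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32

# The full dataset goes once into a non-trainable variable kept out of the usual
# variable collections, so later session.run calls never touch the disk.
all_data = np.arange(10 * BATCH_SIZE, dtype=np.float32).reshape(10 * BATCH_SIZE, 1)

data_initializer = tf.placeholder(dtype=tf.float32, shape=all_data.shape)
input_data = tf.Variable(data_initializer, trainable=False, collections=[])

with tf.Session() as sess:
    sess.run(input_data.initializer, feed_dict={data_initializer: all_data})
    # Slices of input_data can now be batched or used inside tf.cond without queue runners.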
Otherwise I'm not sure what the issue is to be honest.
Related
I'm changing my TensorFlow code from the old queue interface to the new Dataset API. With the old interface I could specify the num_threads argument to the tf.train.shuffle_batch queue. However, the only way to control the amount of threads in the Dataset API seems to be in the map function using the num_parallel_calls argument. However, I'm using the flat_map function instead, which doesn't have such an argument.
Question: Is there a way to control the number of threads/processes for the flat_map function? Or is there a way to use map in combination with flat_map and still specify the number of parallel calls?
Note that it is of crucial importance to run multiple threads in parallel, as I intend to run heavy pre-processing on the CPU before data enters the queue.
There are two (here and here) related posts on GitHub, but I don't think they answer this question.
Here is a minimal code example of my use-case for illustration:
with tf.Graph().as_default():
data = tf.ones(shape=(10, 512), dtype=tf.float32, name="data")
input_tensors = (data,)
def pre_processing_func(data_):
# normally I would do data-augmentation here
results = (tf.expand_dims(data_, axis=0),)
return tf.data.Dataset.from_tensor_slices(results)
dataset_source = tf.data.Dataset.from_tensor_slices(input_tensors)
dataset = dataset_source.flat_map(pre_processing_func)
# do something with 'dataset'
To the best of my knowledge, at the moment flat_map does not offer parallelism options.
Given that the bulk of the computation is done in pre_processing_func, what you might use as a workaround is a parallel map call followed by some buffering, and then using a flat_map call with an identity lambda function that takes care of flattening the output.
In code:
NUM_THREADS = 5
BUFFER_SIZE = 1000
def pre_processing_func(data_):
# data-augmentation here
# generate new samples starting from the sample `data_`
artificial_samples = generate_from_sample(data_)
return artificial_samples
dataset_source = (tf.data.Dataset.from_tensor_slices(input_tensors).
map(pre_processing_func, num_parallel_calls=NUM_THREADS).
prefetch(BUFFER_SIZE).
flat_map(lambda *x : tf.data.Dataset.from_tensor_slices(x)).
shuffle(BUFFER_SIZE)) # my addition, probably necessary though
Note (to myself and whoever will try to understand the pipeline):
Since pre_processing_func generates an arbitrary number of new samples starting from the initial sample (organised in matrices of shape (?, 512)), the flat_map call is necessary to turn all the generated matrices into Datasets containing single samples (hence the tf.data.Dataset.from_tensor_slices(x) in the lambda) and then flatten all these datasets into one big Dataset containing individual samples.
It's probably a good idea to .shuffle() that dataset, or generated samples will be packed together.
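To sanity-check the pipeline, you can pull a few elements out with a one-shot iterator (assuming input_tensors and generate_from_sample are defined as above):
iterator = dataset_source.make_one_shot_iterator()
next_sample = iterator.get_next()

with tf.Session() as sess:
    # Each run fetches one individual (already flattened and shuffled) sample.
    for _ in range(3):
        print(sess.run(next_sample))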
I'm a noob to TF so go easy on me.
I have to train a simple CNN from a bunch of images in a directory with labels. After looking around a lot, I cooked up this code that prepares a TF input pipeline and I was able to print the image array.
import tensorflow as tf
from tensorflow.python.framework import ops, dtypes

image_list, label_list = load_dataset()
imagesq = ops.convert_to_tensor(image_list, dtype=dtypes.string)
labelsq = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
# Makes an input queue
input_q = tf.train.slice_input_producer([imagesq, labelsq],
shuffle=True)
file_content = tf.read_file(input_q[0])
train_image = tf.image.decode_png(file_content,channels=3)
train_label = input_q[1]
train_image.set_shape([120,120,3])
# collect batches of images before processing
train_image_batch, train_label_batch = tf.train.batch(
[train_image, train_label],
batch_size=5
# ,num_threads=1
)
with tf.Session() as sess:
# initialize the variables
sess.run(tf.global_variables_initializer())
# initialize the queue threads to start to shovel data
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
# print "from the train set:"
for i in range(len(image_list)):
print sess.run(train_image_batch)
# sess.run(train_image)
# sess.run(train_label)
# classifier.fit(input_fn=lambda: (train_image, train_label),
# steps=100,
# monitors=[logging_hook])
# stop our queue threads and properly close the session
coord.request_stop()
coord.join(threads)
sess.close()
But looking at the MNIST example given in the TF docs, I see they use a cnn_model_fn along with the Estimator class.
I have defined my own cnn_model_fn and would like to combine the two. Please help me figure out how to move forward with this. This code doesn't work:
classifier = learn.Estimator(model_fn=cnn_model_fn, model_dir='./test_model')
classifier.fit(input_fn=lambda: (train_image, train_label),
steps=100,
monitors=[logging_hook])
It seems the pipeline is populated only when the session is run; otherwise it's empty, and it gives a ValueError: 'Input graph and Layer graph are not the same'.
Please help me.
I'm new to tensorflow myself so take this with a grain of salt.
AFAICT, when you call any of the tf APIs that create "tensors" or "operations", they are created in a context called a Graph.
Further, I believe that when the Estimator runs, it creates a new, empty Graph for each run. It populates that Graph by running model_fn and input_fn, which are supposed to call tf APIs that add the "tensors" and "operations" in the context of this fresh Graph.
The return values of model_fn and input_fn just provide references so that the parts can be connected correctly - the Graph already contains them.
However, in this example the input operations were created before the Estimator created its Graph, so they were added to the implicit default Graph (one is created automatically, I believe). So when the Estimator creates a new Graph and populates the model via model_fn, the inputs and the model end up on two different graphs.
To fix this you need to change the input_fn. Don't just wrap the (image, labels) pair in a lambda; instead, wrap the entire construction of the input in a function, so that when the Estimator runs input_fn, all the input operations and tensors are created, as a side effect of the API calls, in the context of the correct Graph.
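Roughly like the following, reusing the pipeline code from the question (just a sketch; I haven't run it against your cnn_model_fn):
def train_input_fn():
    # Everything is constructed inside the function, so it lands in the Graph
    # that the Estimator builds when it calls input_fn.
    image_list, label_list = load_dataset()
    imagesq = ops.convert_to_tensor(image_list, dtype=dtypes.string)
    labelsq = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
    input_q = tf.train.slice_input_producer([imagesq, labelsq], shuffle=True)

    file_content = tf.read_file(input_q[0])
    train_image = tf.image.decode_png(file_content, channels=3)
    train_image.set_shape([120, 120, 3])
    train_label = input_q[1]

    images, labels = tf.train.batch([train_image, train_label], batch_size=5)
    return images, labels

classifier = learn.Estimator(model_fn=cnn_model_fn, model_dir='./test_model')
classifier.fit(input_fn=train_input_fn, steps=100, monitors=[logging_hook])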
Here, I was reading a TensorFlow implementation of style transfer.
Specifically, it defines the loss that is then optimized. One of the loss functions reads:
def sum_style_losses(sess, net, style_imgs):
total_style_loss = 0.
weights = args.style_imgs_weights
for img, img_weight in zip(style_imgs, weights):
sess.run(net['input'].assign(img))
style_loss = 0.
for layer, weight in zip(args.style_layers, args.style_layer_weights):
a = sess.run(net[layer])
x = net[layer]
a = tf.convert_to_tensor(a)
style_loss += style_layer_loss(a, x) * weight
style_loss /= float(len(args.style_layers))
total_style_loss += (style_loss * img_weight)
The optimizer is called with the current session:
optimizer.minimize(sess)
So the session is up and running, yet during the run it issues further sess.run calls inside the for loop. Can anyone explain the TensorFlow logic here, especially why x ends up containing the feature vector of the input image (and not of the style image)? To me, there seem to be two runs happening in parallel.
The code from the github repo is as follows:
init_op = tf.global_variables_initializer()
sess.run(init_op)
sess.run(net['input'].assign(init_img))
optimizer.minimize(sess)
A session schedules operations to be run on devices and holds some variables. It can be used to schedule many operations across different (sub)graphs. So in the code above, the session:
Initializes variables.
Assigns an image to the symbolic input tensor. Note that you can also use feeds for this.
Minimizes using scipy's optimizer (which has already been passed a loss in its constructor... more details can be found here).
At each of these stages the session is responsible for scheduling execution of the subgraphs that allow for the computations that occur at that stage.
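To make that concrete for the loop in the question (a sketch of how I read those lines, using the names from the snippet):
# Inside sum_style_losses, after net['input'] has been assigned the style image:
a_value = sess.run(net[layer])      # evaluates the subgraph now -> NumPy array of style features
a = tf.convert_to_tensor(a_value)   # frozen into a constant node in the graph
x = net[layer]                      # still symbolic; nothing is computed here

# style_layer_loss(a, x) therefore combines a constant (the style features) with a
# symbolic tensor. When optimizer.minimize(sess) later evaluates the loss, x is
# recomputed from whatever net['input'] holds at that point - the image being
# optimized (init_img) - not the style image.
style_loss = style_layer_loss(a, x)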
By following the mnist example, I was able to build a custom network and use the inputs function of that example to load my dataset (previously encoded as a TFRecord). Just to recap, the inputs function looks like this:
def inputs(train_dir, train, batch_size, num_epochs, one_hot_labels=False):
if not num_epochs: num_epochs = None
filename = os.path.join(train_dir,
TRAIN_FILE if train else VALIDATION_FILE)
with tf.name_scope('input'):
filename_queue = tf.train.string_input_producer(
[filename], num_epochs=num_epochs)
# Even when reading in multiple threads, share the filename
# queue.
image, label = read_and_decode(filename_queue)
# Shuffle the examples and collect them into batch_size batches.
# (Internally uses a RandomShuffleQueue.)
# We run this in two threads to avoid being a bottleneck.
images, sparse_labels = tf.train.shuffle_batch(
[image, label], batch_size=batch_size, num_threads=2,
capacity=1000 + 3 * batch_size,
# Ensures a minimum amount of shuffling of examples.
min_after_dequeue=1000)
return images, sparse_labels
Then, during training, I declare the training operator and run everything, and everything goes smoothly.
Now I am trying to use the very same function to train a different network on the same data. The only (major) difference is that instead of just calling the slim.learning.train function on some train_operator, I do the training manually (evaluating the losses and updating the parameters myself). The architecture is more complex and I'm forced to do so.
When I try to use the data generated by the inputs function, the program gets stuck; setting a queue timeout indeed shows that it's stuck on the producer's queue.
This leads me to believe that I'm probably missing something about the use of producers in TensorFlow. I have read the tutorials but couldn't figure out the issue. Is there some kind of initialization that slim.learning.train performs and that I need to replicate by hand when I do my training manually? Why exactly isn't the producer producing?
For example, doing something like:
imgs, labels = inputs(...)
print imgs
prints
<tf.Tensor 'input/shuffle_batch:0' shape=(1, 128, 384, 6) dtype=float32>
which is the correct (symbolic?) tensor, but if I then try to get the actual data with imgs.eval(), it hangs indefinitely.
You need to start the queue runners, or the queues will be empty and reading from them will hang. See the documentation on queue runners.
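In code, that means something along these lines around the evaluation (a sketch; the arguments to inputs(...) are illustrative):
imgs, labels = inputs(train_dir, train=True, batch_size=1, num_epochs=None)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())  # needed if num_epochs is set in string_input_producer

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # With the runner threads filling the queues, this no longer hangs.
    print(sess.run(imgs).shape)

    coord.request_stop()
    coord.join(threads)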
I am training a neural network and have been running this code without any problems, but sometimes (twice so far) I get the error Not Found: FetchOutputs node not found at the line y_1 = sess.run(get_labels(step)) (see below).
get_labels(step) is a function that returns the correct labels for my training images, which are stored in a text file.
def get_labels(step):
with open('labels.txt','r') as fin:
reader = csv.reader(fin)
c = [[int(s) for s in row] for i,row in enumerate(reader) if i==step]
label_numbers = np.array(c)
# Convert to one-hot vectors
numpy_label = np.zeros((BATCH_SIZE,5))
for i in range(BATCH_SIZE):
numpy_label[i,label_numbers[0][i]-1] = 1
# Convert to tensor
y_label = tf.convert_to_tensor(numpy_label,dtype=tf.float32)
return y_label
This is my main function:
def main():
# Placeholder for correct labels
y_label = tf.placeholder(tf.float32,shape=[BATCH_SIZE,5])
< Other functions etc. >
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
for step in range(1000):
# Get labels for current batch
y_1 = sess.run(get_labels(step))
# Train
sess.run([train_step],feed_dict={y_label:y_1})
< Other stuff like writing summaries, saving variables etc. >
sess.close()
From reading some of the issues on GitHub, I know this has to do with the fact that I call y_1 = sess.run(get_labels(step)) after tf.train.start_queue_runners(sess=sess), but I don't understand:
Why does it work most of the time, but occasionally not?
Is y_1 = sess.run(get_labels(step)) adding or modifying nodes in the graph? I thought I was just running a node get_labels(step) that was already defined in the graph. I tried finalizing the graph before starting the queue runners but that gave me the error that finalized graphs cannot be modified.
What would be the proper way to write the code? Usually I just restart my program and it is fine - but clearly I am not doing it the proper way.
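For reference, would something along these lines be the proper pattern - doing the NumPy work outside the graph and feeding the placeholder directly? (get_label_array here is just a hypothetical variant of get_labels that stops before the tf.convert_to_tensor call.)
def get_label_array(step):
    # Same as get_labels, but returns the NumPy one-hot matrix directly.
    with open('labels.txt', 'r') as fin:
        reader = csv.reader(fin)
        c = [[int(s) for s in row] for i, row in enumerate(reader) if i == step]
    label_numbers = np.array(c)
    numpy_label = np.zeros((BATCH_SIZE, 5))
    for i in range(BATCH_SIZE):
        numpy_label[i, label_numbers[0][i] - 1] = 1
    return numpy_label

for step in range(1000):
    y_1 = get_label_array(step)                        # no new graph nodes per step
    sess.run([train_step], feed_dict={y_label: y_1})   # the placeholder accepts NumPy directly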
Thank you!
EDIT:
I think it might be important to mention that this happens when I am trying to run a TensorFlow script in a separate screen on a server, i.e. I have one screen running a TensorFlow script and I then create a new screen to run a different TensorFlow script. I just started using screen, so I might be missing something fundamental about how it works.