I have been reading a TensorFlow implementation of style transfer. Specifically, it defines the loss which is then optimized. One of the loss functions looks like this:
def sum_style_losses(sess, net, style_imgs):
    total_style_loss = 0.
    weights = args.style_imgs_weights
    for img, img_weight in zip(style_imgs, weights):
        sess.run(net['input'].assign(img))
        style_loss = 0.
        for layer, weight in zip(args.style_layers, args.style_layer_weights):
            a = sess.run(net[layer])
            x = net[layer]
            a = tf.convert_to_tensor(a)
            style_loss += style_layer_loss(a, x) * weight
        style_loss /= float(len(args.style_layers))
        total_style_loss += (style_loss * img_weight)
The optimizer is called with the current session:
optimizer.minimize(sess)
So the session is up and running, but during the run it makes further sess.run calls inside the for loop. Can anyone explain the TensorFlow logic here, especially why x contains the feature vector of the input image (and not of the style image)? To me, there seem to be two runs happening in parallel.
The code from the github repo is as follows:
init_op = tf.global_variables_initializer()
sess.run(init_op)
sess.run(net['input'].assign(init_img))
optimizer.minimize(sess)
A session schedules operations to be run on devices and holds some variables. It can be used to schedule many operations across different (sub)graphs. So in the code above the session:

1. Initializes variables.
2. Assigns an image to the symbolic input tensor. Note you can also use feeds for this.
3. Minimizes using scipy's optimizer (which has already been passed a loss in its constructor... More details found here).

At each of these stages the session is responsible for scheduling execution of the subgraphs that allow for the computations that occur at that stage. As for why x holds the features of the input image: a = sess.run(net[layer]) is evaluated while the style image is assigned to net['input'], so a is a fixed numpy array (then wrapped back into a constant tensor), whereas x = net[layer] stays symbolic, so when the optimizer later evaluates the loss, x is computed from whatever is currently assigned to net['input'], i.e. the image being optimized. There is only one graph and one session; the extra sess.run calls simply happen while the loss is being built, before the optimization starts.
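For context, here is a minimal sketch of how that scipy-based optimizer is typically wired up in TF 1.x (total_loss and max_iterations are illustrative names, not taken from the repo):

# Sketch: the ScipyOptimizerInterface receives the loss at construction time,
# which is why minimize() later only needs the session.
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    total_loss, method='L-BFGS-B', options={'maxiter': max_iterations})

sess.run(tf.global_variables_initializer())
sess.run(net['input'].assign(init_img))  # the variable the optimizer will update
optimizer.minimize(sess)                 # runs the whole minimization inside one call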
It is such a long title, but hopefully I will be able to explain myself properly in a few sentences:
I am trying to minimize a given score function using TensorFlow, inspired by what was published in Minimize a function of one variable in Tensorflow. The value of the score function is obtained by calling a Matlab script, which needs to be provided with only one parameter (related to the input variable, a tensor).
To do so I am using the beta version of TensorFlow 2.0, which includes a feature known as eager execution that allows access to the contents of each tensor without needing to run any session whatsoever.
Here you may find a scratch of my code:
import tensorflow as tf
import numpy as np
import matlab.engine

eng = matlab.engine.start_matlab()

def costFunction():
    z = tf.add(x, y).numpy()
    # There are other parameters (Python lists) to be passed as arguments to my
    # Matlab script alongside them, not included for the sake of simplicity
    H = np.asarray(eng.matlabfunction(matlab.double(z.tolist()), ...))
    # Here I retrieve those elements from matrix H which I actually aim to maximize
    h = tf.convert_to_tensor(...)
    return h

x = tf.Variable(initial_value=tf.zeros([6, N], tf.float64), trainable=True)
opt = tf.optimizers.Adam(learning_rate=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
iters = 1000

for i in range(iters):
    train = opt.minimize(costFunction, tunedPhases)
    if i % 100 == 0:
        print("Iteration {}, loss: {}".format(i + 1, costFunction()))
Sadly, this solution still does not work, as I receive the following error message as output:
ValueError: No gradients provided for any variable: ['Variable:0'].
After an exhaustive search, I think this problem is related to this old post (TensorFlow: 'ValueError: No gradients provided for any variable'), which was solved by performing the operations of the cost function directly on the tensors. However, I have no other option but to invoke this matlabfunction and use its output as the output of my cost function.
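If I understand that post correctly, the gradient chain is broken the moment the value leaves TensorFlow. A minimal sketch of what I believe is happening (with a NumPy stand-in for my Matlab call):

import numpy as np
import tensorflow as tf

x = tf.Variable([1.0, 2.0])

with tf.GradientTape() as tape:
    z = x.numpy()                    # leaves TensorFlow: plain NumPy array from here on
    h = np.sum(z ** 2)               # external computation (stand-in for matlabfunction)
    loss = tf.convert_to_tensor(h)   # converting back does not reconnect it to x

print(tape.gradient(loss, x))        # None -> "No gradients provided for any variable"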
Do you have any ideas about how to overcome this?
Many thanks in advance, and may you all have a nice week!
Recently, I have been trying to learn how to use TensorFlow on multiple GPUs to accelerate training. I found an official tutorial about training a classification model on the CIFAR-10 dataset. However, this tutorial reads images using a queue. Out of curiosity, how can I use multiple GPUs by feeding values into the Session? It seems hard for me to feed different values from the same dataset to different GPUs. Thank you, everybody! The following code is part of the official tutorial.
images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                loss = tower_loss(scope, image_batch, label_batch)
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
                # Retain the summaries from the final tower.
                summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
                # Calculate the gradients for the batch of data on this CIFAR tower.
                grads = opt.compute_gradients(loss)
                # Keep track of the gradients across all towers.
                tower_grads.append(grads)
The core idea of the multi-GPU example is that you explicitly assign operations to a tf.device. The example loops over FLAGS.num_gpus devices and creates a replica for each of the GPUs.
If you create placeholder ops inside the for loop, they will get assigned to their respective devices. All you need to do is keep handles to the created placeholders and then feed them all independently in a single session.run call.
placeholders = []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        plc = tf.placeholder(tf.int32)
        placeholders.append(plc)

with tf.Session() as sess:
    fd = {plc: i for i, plc in enumerate(placeholders)}
    # This should give you the sum of all numbers from 0 to FLAGS.num_gpus - 1
    sess.run(sum(placeholders), feed_dict=fd)
To address your specific example, it should suffice to replace the batch_queue.dequeue() call with the construction of two placeholders (for the image_batch and label_batch tensors), store those placeholders somewhere, and then feed the values you need into them.
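A rough sketch of what that could look like (image_plcs/label_plcs, image_chunks/label_chunks, train_op and the placeholder shapes are illustrative, not from the tutorial):

image_plcs, label_plcs = [], []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
            # Placeholders instead of batch_queue.dequeue(); shapes are examples.
            image_batch = tf.placeholder(tf.float32, [None, 24, 24, 3])
            label_batch = tf.placeholder(tf.int32, [None])
            image_plcs.append(image_batch)
            label_plcs.append(label_batch)
            loss = tower_loss(scope, image_batch, label_batch)
            tf.get_variable_scope().reuse_variables()

# Later, feed a different chunk of data to each tower in a single run call:
feed = {}
for i in range(FLAGS.num_gpus):
    feed[image_plcs[i]] = image_chunks[i]   # hypothetical per-GPU numpy arrays
    feed[label_plcs[i]] = label_chunks[i]
sess.run(train_op, feed_dict=feed)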
Another (somewhat hacky) way is to override the image_batch and label_batch tensors directly in the session.run call, because feed_dict accepts any tensor (not just placeholders). You will still need to store the tensors somewhere to be able to reference them from the run call.
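For instance (my_images/my_labels being whatever numpy batches you prepared yourself):

# Overriding the dequeued tensors directly; no placeholders involved.
sess.run(train_op, feed_dict={image_batch: my_images, label_batch: my_labels})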
The QueueRunner and queue-based APIs are relatively outdated; this is clearly stated in the TensorFlow docs:
Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API
As a result, it is recommended to use the tf.data API. It is optimized for multi-GPU and TPU purposes.
How to use it?
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(32)  # batch so the model sees [batch, features] tensors
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()

# define your model
logits = tf.layers.dense(x, 2)  # use x directly in your model
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)

with tf.Session() as sess:
    sess.run(train_step)
You can create multiple iterators, one for each GPU, with Dataset.shard(), or more easily use the Estimator API.
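A rough sketch of the Dataset.shard() variant (tower construction elided; num_gpus and the batch size are illustrative):

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        # Each GPU gets every num_gpus-th example of the dataset.
        shard = dataset.shard(num_shards=num_gpus, index=i).batch(64)
        x_i, y_i = shard.make_one_shot_iterator().get_next()
        # ... build the tower for GPU i from (x_i, y_i) ...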
For a complete tutorial see here.
I am trying to train a network with TensorFlow using multiple towers. I had set reuse = True for all the towers. But in the CIFAR-10 multi-GPU training example from the TensorFlow tutorials, reuse is only enabled after the first tower has been created:
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                # Actually the logits (whole network) is defined in tower_loss
                loss = tower_loss(scope, image_batch, label_batch)
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
Does it make any difference? What happens if we set reuse=True beforehand?
You need reuse=False (the default) for the first run so that the variables get created in the first place; it is an error to use reuse=True when the variable has not been constructed yet.
If you use a newer version of TensorFlow (>1.4, I think) you can use reuse=tf.AUTO_REUSE and it will do the magic for you.
I'm not sure how this interacts with the multi-device setup you have. Double-check whether the variable names become prefixed by the device; in that case there is no reuse, and each device has a different variable.
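A small sketch of the difference (scope and variable names are just examples):

import tensorflow as tf

# reuse=True on the very first use fails, because there is nothing to reuse yet.
try:
    with tf.variable_scope("model", reuse=True):
        w = tf.get_variable("w", shape=[10])
except ValueError as e:
    print(e)  # Variable model/w does not exist ...

# reuse=tf.AUTO_REUSE (TF >= 1.4) creates the variable on first use and reuses it afterwards.
with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
    w1 = tf.get_variable("w", shape=[10])
with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
    w2 = tf.get_variable("w", shape=[10])
print(w1 is w2)  # True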
There are two ways to share variables.
Either version 1:
with tf.variable_scope("model"):
output1 = my_image_filter(input1)
with tf.variable_scope("model", reuse=True):
output2 = my_image_filter(input2)
or version 2:
with tf.variable_scope("model") as scope:
output1 = my_image_filter(input1)
scope.reuse_variables()
output2 = my_image_filter(input2)
Both methods share the variables. The second method is used in the CIFAR-10 tutorial because it is much cleaner (and that's only my opinion). You can try to rebuild it with version 1; the code will probably be less readable.
All code is assuming Tensorflow 1.3 and Python 3.x
We are working on a GAN algorithm which has an interesting loss function.
Stage 1 - Compute only the completion/generator loss portion of the network. Iterates over the completion portion of the GAN for X iterations.
Stage 2 - Compute only the discriminator loss portion of the network. Iterates over the discriminator portion for Y iterations (but don't train on Stage 1).
Stage 3 - Compute the full loss on the network. Iterates over both completion and discriminator for Z iterations (training on the entire network).
We have this working on a single GPU. We want to make it work on multiple GPUs since training times are long.
We have looked at Tensorflow/models/tutorials/Images/cifar10/cifar10_multi_gpu_train.py, which talks about tower loss, averaging the towers together, computing the gradients on the GPUs and then applying them on the CPU. This is a great start. However, since our loss is more complicated, it has complicated everything a bit for us.
The code is decently complicated, but is roughly similar to this: https://github.com/timsainb/Tensorflow-MultiGPU-VAE-GAN (that won't run because it was written around Tensorflow 0.1, so it has some oddities that I haven't gotten working, but it should give you an idea of what we're doing).
When we compute gradients, it looks something like this (pseudocode to try to highlight the important portions):
for i in range(num_gpus):
    with tf.device('/gpu:%d' % gpus[i]):
        with tf.name_scope('Tower_%d' % gpus[i]) as scope:
            with tf.variable_scope("generator"):
                generator = build_generator()

            with tf.variable_scope("discriminator"):
                with tf.variable_scope("real_discriminator"):
                    real_discriminator = build_discriminator(x)
                with tf.variable_scope("fake_discriminator", reuse=True):
                    fake_discriminator = build_discriminator(generator)

            gen_only_loss, discm_only_loss, full_loss = build_loss(
                generator, real_discriminator, fake_discriminator)

            tf.get_variable_scope().reuse_variables()

            gen_only_grads = gen_only_opt.compute_gradients(gen_only_loss)
            tower_gen_only_grads.append(gen_only_grads)

            discm_only_train_vars = tf.get_collection(
                tf.GraphKeys.TRAINABLE_VARIABLES, "discriminator")
            discm_only_train_vars = discm_only_train_vars + tf.get_collection(
                tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "discriminator")

            discm_only_grads = discm_only_opt.compute_gradients(
                discm_only_loss, var_list=discm_only_train_vars)
            tower_discm_only_grads.append(discm_only_grads)

            full_grads = full_opt.compute_gradients(full_loss)
            tower_full_grads.append(full_grads)

# average_gradients is the same code from the cifar10_multi_gpu_train.py.
# We haven't changed it. Just iterates over gradients and averages
# them...this is part of the problem...
gen_only_grads = average_gradients(tower_gen_only_grads)
gen_only_train = gen_only_opt.apply_gradients(gen_only_grads,
                                              global_step=global_step)

discm_only_grads = average_gradients(tower_discm_only_grads)
discm_only_train = discm_only_opt.apply_gradients(discm_only_grads,
                                                  global_step=global_step)

full_grads = average_gradients(tower_full_grads)
full_train = full_opt.apply_gradients(full_grads, global_step=global_step)
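The training loop then runs the three stages one after another, roughly like this (X_iters/Y_iters/Z_iters are just the iteration counts mentioned above; feeds omitted):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(X_iters):       # Stage 1: completion/generator only
        sess.run(gen_only_train)
    for _ in range(Y_iters):       # Stage 2: discriminator only
        sess.run(discm_only_train)
    for _ in range(Z_iters):       # Stage 3: full network
        sess.run(full_train)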
If we call only "compute_gradients(full_loss)", the algorithm works properly on multiple GPUs. This is pretty equivalent to the code in the cifar10_multi_gpu_train.py example. The tricky part comes when need to restrict the network in stage 1 or 2.
Compute_gradients(full_loss), has a var_list parameter with a default value of None, which means it trains all the variables. How does it know to not train Tower_0 variables when in Tower_1? I ask, because when we deal with the compute_gradients( discm_only_loss, var_list = discm_only_train_vars), I need to know how to gather up the correct variables to restrict training to that portion of the network. I found one thread talking about this, but found it to be inaccurate/incomplete - "freeze" some variables/scopes in tensorflow: stop_gradient vs passing variables to minimize.
The reason being, that if you look at the code in compute_gradients, var_list is filled out with is a combination of trainable variables and trainable resource variables when None is passed in. So that's how I've limited it as well. This all works properly if we don't attempt to split across multiple GPUs.
Question 1:
Now that I've split the network by towers, am I responsible for gathering up the current tower's variables as well? Do I need to add lines like these
discm_only_train_vars = tf.get_collection(
    tf.GraphKeys.TRAINABLE_VARIABLES, "Tower_{}/discriminator".format(i))
discm_only_train_vars = discm_only_train_vars + tf.get_collection(
    tf.GraphKeys.TRAINABLE_RESOURCE_VARIABLES, "Tower_{}/discriminator".format(i))
in order to train the proper variables for each tower (and to make sure I don't miss training those variables)?
Question 2:
Probably the same answer as question 1. Getting compute_gradients(gen_only_loss) is a bit harder... In the non-towered version, gen_only_loss never touched the discriminator, so it activated only the tensors in the graph that it needed and everything was fine. However, in the towered version, when I call compute_gradients, it returns gradients for tensors it hasn't activated yet - so some of the entries are [(None, tf.Variable), (None, tf.Variable)]. This causes average_gradients to crash because it can't convert a None value to a Tensor. This makes me think I need to restrict these as well.
The confusing thing about all of this is that the CIFAR example, and my full_loss example, do not care about training on specific towers, but I'm guessing that once I specify a var_list, whatever magic compute_gradients was using to know which variables to train on which towers disappears? Do I need to worry about grabbing any other variables?
For question 1: yes, if you split manually you are responsible for gathering the variables yourself.
For question 2: you may want to restrict the call to compute_gradients (e.g. via var_list) or filter its result.
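For the filtering option, a small sketch (drop the pairs that compute_gradients returns with a None gradient before averaging):

# Keep only variables the loss actually depends on, so average_gradients
# never sees a (None, var) pair.
gen_only_grads = [(g, v)
                  for g, v in gen_only_opt.compute_gradients(gen_only_loss)
                  if g is not None]
tower_gen_only_grads.append(gen_only_grads)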
Situation
I want to train a specific network architecture (a GAN) that needs inputs from different sources during training.
One input source is examples loaded from disk. The other source is a generator sub-network creating examples.
To choose which kind of input to feed to the network I use tf.cond. There is one caveat though that has already been explained: tf.cond evaluates the inputs to both conditional branches even though only one of those will ultimately be used.
Enough setup, here is a minimal working example:
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32

def load_input_data():
    # Normally this data would be read from disk
    data = tf.reshape(np.arange(10 * BATCH_SIZE, dtype=np.float32), shape=(10 * BATCH_SIZE, 1))
    return tf.train.batch([data], BATCH_SIZE, enqueue_many=True)

def generate_input_data():
    # Normally this data would be generated by a much bigger sub-network
    return tf.random_uniform(shape=[BATCH_SIZE, 1])

def main():
    # A bool to choose between loaded or generated inputs
    load_inputs_pred = tf.placeholder(dtype=tf.bool, shape=[])

    # Variant 1: Call "load_input_data" inside tf.cond
    data_batch = tf.cond(load_inputs_pred, load_input_data, generate_input_data)

    # Variant 2: Call "load_input_data" outside tf.cond
    #loaded_data = load_input_data()
    #data_batch = tf.cond(load_inputs_pred, lambda: loaded_data, generate_input_data)

    init_op = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_op)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        print(threads)

        # Get generated input data
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: False})
        print(data_batch_values)

        # Get input data loaded from disk
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: True})
        print(data_batch_values)

if __name__ == '__main__':
    main()
Problem
Variant 1 does not work at all since the queue runner threads don't seem to run. print(threads) outputs something like [<Thread(Thread-1, stopped daemon 140165838264064)>, ...].
Variant 2 does work and print(threads) outputs something like [<Thread(Thread-1, started daemon 140361854863104)>, ...]. But since load_input_data() has been called outside of tf.cond, batches of data will be loaded from disk even when load_inputs_pred is False.
Is it possible to make Variant 1 work, so that input data is only loaded when load_inputs_pred is True and not for every call to session.run()?
If you're using a queue when loading your data and follow it up with a batch input, this shouldn't be a problem, as you can specify the maximum amount to be loaded or stored in the queue.
filename_queue = tf.train.string_input_producer(somefilelist)
reader = tf.WholeFileReader()
_, input = reader.read(filename_queue)  # or another way to load data
return tf.train.batch([input], batch_size=10, capacity=100)
See here for more details:
https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#batch
Also, there's an alternative approach that skips tf.cond completely. Just define two losses: one that follows the data through the autoencoder and discriminator, and one that follows the data through just the discriminator.
Then it just becomes a matter of calling
sess.run(auto_loss,feed_dict)
or
sess.run(real_img_loss,feed_dict)
In this way the graph will only run through whichever loss was called upon. Let me know if this needs more explanation.
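A toy, self-contained sketch of that idea (layer sizes and names are made up):

import numpy as np
import tensorflow as tf

z_plc = tf.placeholder(tf.float32, [None, 64])      # generator/autoencoder input
real_plc = tf.placeholder(tf.float32, [None, 784])  # real images loaded from disk

def generator(z):
    with tf.variable_scope("gen", reuse=tf.AUTO_REUSE):
        return tf.layers.dense(z, 784)

def discriminator(x):
    with tf.variable_scope("disc", reuse=tf.AUTO_REUSE):
        return tf.layers.dense(x, 1)

# Two losses over the same graph.
auto_loss = tf.reduce_mean(discriminator(generator(z_plc)))
real_img_loss = tf.reduce_mean(discriminator(real_plc))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Fetching auto_loss never touches real_plc (and vice versa),
    # so only the relevant input has to be fed.
    print(sess.run(auto_loss, feed_dict={z_plc: np.zeros((4, 64), np.float32)}))
    print(sess.run(real_img_loss, feed_dict={real_plc: np.zeros((4, 784), np.float32)}))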
Lastly, I think that to make variant one work you need to do something like this if you're using preloaded data:
https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#preloaded-data
Otherwise I'm not sure what the issue is to be honest.