I know this question has been asked more than once, but I couldn't understand the code or the logic behind it.
In my data set, I first created a sigmoid layer, then connected this layer to the output layer, where I used the softmax function.
fl = tf.layers.dense(x, 10,activation=tf.sigmoid)
output = tf.layers.dense(fl, 2,activation=tf.nn.softmax)
I created the loss and accuracy ops, initialized the variables, set up the optimizer and train ops, and then started running on my data:
loss = tf.losses.softmax_cross_entropy(onehot_labels=y,logits=output)
accuracy = tf.metrics.accuracy(tf.argmax(y_train,1),tf.argmax(output,1))
# inits
init_local = tf.local_variables_initializer()
init_global = tf.global_variables_initializer()
sess.run(init_global)
sess.run(init_local)
optimizer = tf.train.GradientDescentOptimizer(rate)
train = optimizer.minimize(loss)
for i in range(1000):
    _, lv = sess.run((train, loss))
    if i % 5 == 0:
        print("L: " + str(lv))
        print("Accuracy: " + str(sess.run(accuracy)))
I can see that my loss value decreases every time I run on the training set. And my accuracy is ~0.93.
The problem is that, from here, I don't know how to test this model on real data.
Also, how can I draw a histogram of my real data? I have correct labels for my real data as well.
I will assume that you use Dataset to feed your training data and you want to run on test data immediately after training (since you don't have checkpoints in your code).
When using Dataset, you would create an iterator and call get_next() on it. Then, you would use the return values of get_next() as inputs to your model.
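For example, a minimal sketch of that wiring (assuming x_train and y_train are numpy arrays, and reusing the layer sizes from your question):

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
x, y = dataset.make_one_shot_iterator().get_next()

# the model consumes the iterator's output tensors directly
fl = tf.layers.dense(x, 10, activation=tf.sigmoid)
output = tf.layers.dense(fl, 2, activation=tf.nn.softmax)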
To run your model on the test data, you can use two high-level approaches:
If your test data has the same format as your train data, create a Dataset that reads your test data. Then, create another copy (sometimes called a "tower") of your model (the operations will be new but the variables will be shared) that uses the test Dataset. Then, use sess.run() similarly to how you use it for training; you might not need to compute loss or train, but only accuracy (see the sketch after the next option).
If your test data has a different format, you can feed it directly, using the feed_dict argument to sess.run(). You would feed your test data as values for the tensors returned from get_next(). Usually one feeds placeholders, but TensorFlow allows you to feed any tensor.
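A minimal sketch of the first approach (train_x and test_x are hypothetical get_next() outputs of the train and test Datasets; naming the layers and scope is what lets the test tower share the training variables):

def model(x):
    fl = tf.layers.dense(x, 10, activation=tf.sigmoid, name="fl")
    return tf.layers.dense(fl, 2, activation=tf.nn.softmax, name="out")

with tf.variable_scope("net"):
    train_output = model(train_x)
with tf.variable_scope("net", reuse=True):
    test_output = model(test_x)  # new ops, same variables

For the second approach, since any tensor can be fed, something like this should work (test_images and test_labels are hypothetical numpy arrays):

x, y = iterator.get_next()
# ... model, loss and accuracy are built on x and y ...
acc = sess.run(accuracy, feed_dict={x: test_images, y: test_labels})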
As for histograms, Tensorboard has a nice way of visualizing them: https://www.tensorflow.org/programmers_guide/tensorboard_histograms.
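As a quick start, something like this should get a histogram of your outputs into TensorBoard (a minimal sketch; 'logs' is an arbitrary output directory):

tf.summary.histogram('output_activations', output)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('logs', sess.graph)
summary_val = sess.run(merged)
writer.add_summary(summary_val, 0)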
There are ways to get and update the weights directly outside of training, but what I am looking to do is this: after each training step, once the gradients have updated a particular variable or layer, I would like to save those weights to file and then replace the layer or variable weights with brand new ones. Then I want to continue with the next step of training (a forward pass using the new values in the variable or layer, then a backward pass with the calculated loss/gradients).
I have thought about just calling each training step individually, but I am wondering if that is somehow very time/memory inefficient.
You can try to use a Callback to do that.
Define the function you want:
from keras import backend as K

def afterBatch(batch, logs):
    model.save_weights('weights' + str(batch))  # maybe you also want to save the current epoch....

    # option 1
    model.layers[select_a_layer].set_weights(listOfNumpyWeights)  # not sure this works

    # option 2
    K.set_value(model.layers[select_a_layer].kernel, newValueInNumpy)
    # depending on the layer, you have kernel, bias and maybe other weight names;
    # take a look at the source code of the layer whose weights you are changing
Use a LambdaCallback:
from keras.callbacks import LambdaCallback

batchCallback = LambdaCallback(on_batch_end=afterBatch)
model.fit(x, y, callbacks=[batchCallback, ....])
There are weight updates after every batch (this might be too much if you are saving weights every time). You can also try on_epoch_end instead of on_batch_end.
Recently, I have been trying to learn how to use TensorFlow on multiple GPUs to accelerate training. I found an official tutorial about training a classification model on the CIFAR-10 dataset. However, this tutorial reads images using a queue. Out of curiosity, how can I use multiple GPUs by feeding values into a Session? It seems hard to feed different values from the same dataset to different GPUs. Thank you, everybody! The following code is part of the official tutorial.
images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                loss = tower_loss(scope, image_batch, label_batch)
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
                # Retain the summaries from the final tower.
                summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
                # Calculate the gradients for the batch of data on this CIFAR tower.
                grads = opt.compute_gradients(loss)
                # Keep track of the gradients across all towers.
                tower_grads.append(grads)
The core idea of the multi-GPU example is that you explicitly assign operations to a tf.device. The example loops over FLAGS.num_gpus devices and creates a replica for each of the GPUs.
If you create placeholder ops inside the for loop, they will get assigned to their respective devices. All you need to do is keep handles to the created placeholders and then feed them all independently in a single session.run call.
placeholders = []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        plc = tf.placeholder(tf.int32)
        placeholders.append(plc)

with tf.Session() as sess:
    fd = {plc: i for i, plc in enumerate(placeholders)}
    # this should give you the sum of all numbers from 0 to FLAGS.num_gpus - 1
    sess.run(sum(placeholders), feed_dict=fd)
To address your specific example, it should suffice to replace the batch_queue.dequeue() call with the construction of two placeholders (for image_batch and label_batch tensors), store these placeholders somewhere, and then feed the values you need to those.
Another (somewhat hacky) way is to override the image_batch and label_batch tensors directly in the session.run call, because you can feed_dict any tensor (not just a placeholder). You will still need to store the tensors somewhere to be able to reference them from the run call.
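A sketch of the placeholder variant, reusing the names from the tutorial snippet above (the 24x24x3 shape is an assumption based on the tutorial's distorted inputs; train_op is the op built from tower_grads as in the tutorial):

image_placeholders = []
label_placeholders = []
for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
            # placeholders take the place of batch_queue.dequeue()
            image_batch = tf.placeholder(tf.float32, [None, 24, 24, 3])
            label_batch = tf.placeholder(tf.int32, [None])
            image_placeholders.append(image_batch)
            label_placeholders.append(label_batch)
            loss = tower_loss(scope, image_batch, label_batch)
            tf.get_variable_scope().reuse_variables()
            tower_grads.append(opt.compute_gradients(loss))

# later, feed one numpy batch per GPU in a single run call
feed = {}
for i in xrange(FLAGS.num_gpus):
    feed[image_placeholders[i]] = numpy_images[i]  # hypothetical per-GPU numpy batches
    feed[label_placeholders[i]] = numpy_labels[i]
sess.run(train_op, feed_dict=feed)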
The QueueRunner and queue-based API is relatively outdated; this is clearly stated in the TensorFlow docs:

Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API

As a result, it is recommended to use the tf.data API. It is optimized for multi-GPU and TPU use.
How to use it?
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()

# define your model
logits = tf.layers.dense(x, 2)  # use x directly in your model
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)

with tf.Session() as sess:
    sess.run(train_step)
You can create a separate iterator for each GPU with Dataset.shard(), or more easily use the Estimator API.
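A rough sketch of the shard-per-GPU idea (build_tower is a hypothetical function that constructs your per-GPU model):

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        shard = dataset.shard(FLAGS.num_gpus, i)
        x_i, y_i = shard.make_one_shot_iterator().get_next()
        build_tower(x_i, y_i)  # each GPU consumes its own shard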
For a complete tutorial see here.
I've found the Dataset.map() functionality pretty nice for setting up pipelines to preprocess image/audio data before feeding into the network for training, but one issue I have is accessing the raw data before the preprocessing to send to tensorboard as a summary.
For example, say I have a function that loads audio data, does some framing, makes a spectrogram, and returns this.
import tensorflow as tf

def load_audio_examples(label, path):
    # loads audio, converts to spectrogram
    pcm = ...  # this is what I'd like to put into tf.summary.audio() !
    # creates one-hot encoded labels, etc
    return labels, examples

# create dataset
training = tf.data.Dataset.from_tensor_slices((
    tf.constant(labels),
    tf.constant(paths)
))
training = training.map(load_audio_examples, num_parallel_calls=4)

# create ops for training
train_step = # ...
accuracy = # ...

# create iterator
iterator = training.repeat().make_one_shot_iterator()
next_element = iterator.get_next()

# ready session
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
train_writer = # ...

# test iterator
test_iterator = testing.make_one_shot_iterator()
test_next_element = test_iterator.get_next()

# train loop
for i in range(100):
    batch_ys, batch_xs, path = sess.run(next_element)
    summary, train_acc, _ = sess.run([summaries, accuracy, train_step],
                                     feed_dict={x: batch_xs, y: batch_ys})
    train_writer.add_summary(summary, i)
It appears as though this does not become part of the graph that is plotted in the "Graph" tab of tensorboard (see screenshot below).
As you can see, it's just X (the output of the preprocessing map() function).
How would I better structure this to get the raw audio into a tf.summary.audio()? Right now the things inside map() aren't accessible as Tensors inside my training loop.
Also, why isn't my graph showing up on TensorBoard? It worries me that I won't be able to export my model or use TensorFlow Serving to put it into production because I'm using the new Dataset API. Maybe I should go back to doing things manually (with queues, etc.)?
I think your use of the Dataset API doesn't make much sense. In fact, you have two disconnected subgraphs: one for reading data and the other for running your training step.
batch_ys, batch_xs, path = sess.run(next_element)
summary, train_acc, _ = sess.run([summaries, accuracy, train_step],
                                 feed_dict={x: batch_xs, y: batch_ys})
The first line in the code above runs session and fetches data items from it. It transfers data from Tensorflow backend into Python.
The next line feeds data using feed_dict and that is said to be inefficient. This time TensorFlow transfers data from Python to runtime.
This has the following consequences:
- Your graph looks disconnected.
- TensorFlow wastes time doing unnecessary data transfers to and from Python.
To have a single graph (without disconnected subgraphs) you need to build your model on top of tensors returned by Dataset API. Please note that it is possible to switch between training and testing datasets without manual fetching of batches (see Dataset guide)
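For example, with a reinitializable iterator (a sketch following the Dataset guide; training and testing are your two Datasets):

iterator = tf.data.Iterator.from_structure(training.output_types,
                                           training.output_shapes)
next_labels, next_examples = iterator.get_next()  # build the model on these tensors
train_init_op = iterator.make_initializer(training)
test_init_op = iterator.make_initializer(testing)

sess.run(train_init_op)  # subsequent get_next() calls yield training data
sess.run(test_init_op)   # switch: get_next() now yields test data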
As for the summary defined in map_fn, I believe you can retrieve it from the SUMMARIES collection (the default collection for summaries). You can also pass your own collection name when adding the summary operation.
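A minimal sketch of that retrieval:

# merge every summary op registered in the default collection
summaries = tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES))
# tf.summary.merge_all() is shorthand for the same thing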
I have a pipeline to read train and validation datasets from tfrecords.
I build batches using tf.train.batch. During training, I want to switch between training and evaluation on the validation dataset.
Here is a simplified snippet of how I implement it now.
is_training_pl = tf.placeholder(tf.bool)
images_train, labels_train = tf.train.batch([img_train, label_train])
images_val, labels_val = tf.train.batch([img_val, label_val])
data = tf.cond(is_training_pl, lambda: [images_train, labels_train], lambda: [images_val, labels_val])
loss = my_model(input=data)
I know that one can do it with tf.cond, but the problem with it is that both train and val batch ops would be executed when tf.cond is called.
On GitHub, ebrevdo said (link to the comment) that it's possible to use tf.train.maybe_batch for this purpose instead, which is more efficient.
Can anyone give an example of how to use tf.train.maybe_batch in my case, please?
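For reference, the signature I'm looking at gates enqueueing with a boolean keep_input tensor; a minimal sketch of a single call (not a working train/val switch):

keep = tf.constant(True)  # some condition tensor in practice
images, labels = tf.train.maybe_batch(
    [img_train, label_train], keep_input=keep, batch_size=32)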
I think I'm losing my mind at this point.
I'm using Lasagne for a small convolutional neural network. It trains perfectly, and I can compute the error on the training and validation sets as well, but I cannot save the trained model to disk. More precisely, I can save and load it, but I cannot use it to predict on new data.
This is what I do after training
model = {'network': network, 'params': get_all_params(network), 'params_values': get_all_param_values(network)}
pickle.dump(model, open('models/model_1.pkl', 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
And this is what I do to load the model
with open('models/model.pkl', 'rb') as pickle_file:
    model = pickle.load(pickle_file)

network = model['network']
values = model['params_values']
set_all_param_values(network, values)

T_input = T.tensor4('input', dtype='float32')
T_target = T.ivector('target')

predictions = get_output(network, deterministic=True)
loss = (cross_entropy(predictions, T_target)).mean()
acc = T.mean(T.eq(T.argmax(predictions, axis=1), T_target), dtype=config.floatX)
test_fn = function([T_input, T_target], [loss, acc])
I cannot even pass real numpy input without getting this error:
theano.compile.function_module.UnusedInputError: theano.function was asked to create a
function computing outputs given certain inputs, but the provided input variable at index 0
is not part of the computational graph needed to compute the outputs: input.
To make this error into a warning, you can pass the parameter
on_unused_input='warn' to theano.function. To disable it completely, use
on_unused_input='ignore'.
I tried to set the parameter on_unused_input='warn' then, and this is the result
theano.gof.fg.MissingInputError: An input of the graph, used to compute (..)
was not provided and not given a value.Use the Theano flag
exception_verbosity='high',for more information on this error.
The problem is that your T_input is not tied to the network's input layer, and hence Theano can't compile the function. Use the input variable of the network's input layer instead:
T_input = lasagne.layers.get_all_layers(network)[0].input_var
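With that change, the compiled function operates on a variable that is actually part of the computational graph. A minimal usage sketch (X_test and y_test are hypothetical numpy arrays):

test_fn = function([T_input, T_target], [loss, acc])
test_loss, test_acc = test_fn(X_test, y_test)  # numpy arrays in, scalars out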