I've been digging around on this for a while. I have found a ton of articles, but none really show plain TensorFlow inference as just that: plain inference. It's always "use the serving engine" or a graph that is pre-coded/defined.
Here is the problem: I have a device which occasionally checks for updated models. It then needs to load that model and run input predictions through the model.
In Keras this was simple: build a model, train the model, and then call model.predict(). Same thing in scikit-learn.
I am able to grab a new model and load it, and I can print out all of the weights, but how in the world do I run inference against it?
Code to load model and print weights:
with tf.Session() as sess:
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
new_saver.restore(sess, MODEL_PATH)
for var in tf.trainable_variables():
print(sess.run(var))
I printed out all of my collections and I have:
['queue_runners', 'variables', 'losses', 'summaries', 'train_op', 'cond_context', 'trainable_variables']
I tried using sess.run(train_op); however, that just kicked off a full training session, which is not what I want. I just want to run inference against a different set of inputs that I provide, which are not TFRecords.
Just a little more detail:
The device can use C++ or Python, as long as I can produce a .exe. I can set up a feed dict if I want to feed the system. I trained with TFRecords, but in production I'm not going to use TFRecords; it's a real/near-real-time system.
Thanks for any input. I am posting sample code to this repo: https://github.com/drcrook1/CIFAR10/TensorFlow which does all the training and sample inference.
Any hints are greatly appreciated!
------------EDITS-----------------
I rebuilt the model to be as below:
def inference(images):
'''
Portion of the compute graph that takes an input and converts it into a Y output
'''
with tf.variable_scope('Conv1') as scope:
C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
with tf.variable_scope('Dense1') as scope:
P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
dim = P_1.get_shape()[1].value
D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
with tf.variable_scope('Dense2') as scope:
D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
H = tf.nn.softmax(D_2, name='prediction')
return H
Notice that I give the softmax op the name 'prediction' so I can retrieve it later.
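For reference, here is how I expect to grab it back out later. Note that get_operation_by_name returns an Operation (fetching an Operation in sess.run just returns None); the actual softmax values live on the op's output tensor, and the full name should carry the enclosing variable scope, so something like this (the 'Dense2' prefix is my guess based on the scopes above):
graph = tf.get_default_graph()
pred_tensor = graph.get_tensor_by_name('Dense2/prediction:0')  # tensor: sess.run returns the softmax values
pred_op = graph.get_operation_by_name('Dense2/prediction')     # op: fetching it returns None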
When training, I used the TFRecords input pipeline and input queues:
GRAPH = tf.Graph()
with GRAPH.as_default():
examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
batch_size=CONSTANTS.BATCH_SIZE,
img_shape=CONSTANTS.IMAGE_SHAPE,
num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
logits = Vgg3CIFAR10.inference(examples)
loss = Vgg3CIFAR10.loss(logits, labels)
OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
I am attempting to use feed_dict on the loaded operation in the graph; however, it now simply hangs...
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
#sess.run(tf.global_variables_initializer())
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
new_saver.restore(sess, MODEL_PATH)
pred = tf.get_default_graph().get_operation_by_name('prediction')
rand = np.random.rand(1, 32, 32, 3)
print(rand)
print(pred)
print(sess.run(pred, feed_dict={images: rand}))
print('done')
run_inference()
I believe this is not working because the original network was trained using TFRecords. In the sample CIFAR data set the data is small; our real data set is huge, and my understanding is that TFRecords are the default best practice for training a network. The feed_dict makes perfect sense from a productionizing perspective; we can spin up some threads and populate it from our input systems.
So I guess I have a trained network and I can get the prediction operation, but how do I tell it to stop using the input queues and start using the feed_dict? Remember that from the production perspective I do not have access to whatever the scientists did to make it. They do their thing, and we stick it in production using whatever agreed-upon standard.
-------INPUT OPS--------
tf.Operation 'input/input_producer/Const' type=Const, tf.Operation 'input/input_producer/Size' type=Const, tf.Operation 'input/input_producer/Greater/y' type=Const, tf.Operation 'input/input_producer/Greater' type=Greater, tf.Operation 'input/input_producer/Assert/Const' type=Const, tf.Operation 'input/input_producer/Assert/Assert/data_0' type=Const, tf.Operation 'input/input_producer/Assert/Assert' type=Assert, tf.Operation 'input/input_producer/Identity' type=Identity, tf.Operation 'input/input_producer/RandomShuffle' type=RandomShuffle, tf.Operation 'input/input_producer' type=FIFOQueueV2, tf.Operation 'input/input_producer/input_producer_EnqueueMany' type=QueueEnqueueManyV2, tf.Operation 'input/input_producer/input_producer_Close' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Close_1' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Size' type=QueueSizeV2, tf.Operation 'input/input_producer/Cast' type=Cast, tf.Operation 'input/input_producer/mul/y' type=Const, tf.Operation 'input/input_producer/mul' type=Mul, tf.Operation 'input/input_producer/fraction_of_32_full/tags' type=Const, tf.Operation 'input/input_producer/fraction_of_32_full' type=ScalarSummary, tf.Operation 'input/TFRecordReaderV2' type=TFRecordReaderV2, tf.Operation 'input/ReaderReadV2' type=ReaderReadV2,
------END INPUT OPS-----
----UPDATE 3----
I believe what I need to do is kill the input section of the graph that was trained with TFRecords and rewire the input of the first layer to a new input. It's kind of like performing surgery, but this is the only way I can find to do inference if I trained using TFRecords, as crazy as it sounds...
Full graph / section to kill: (graph screenshots not reproduced here)
So I think the question becomes: How does one kill the input section of the graph and replace it with a feed_dict?
A follow-up to this would be: is this really the right way to do it? This seems bonkers.
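The closest thing I have found so far is the input_map argument of tf.train.import_meta_graph, which should let me splice my new placeholder in wherever the saved graph used a given tensor. Roughly like this, where MODEL_PATH and the images placeholder are the same as above, and 'Reshape:0' / 'Dense2/prediction:0' are only guesses at names that would have to be read out of the saved graph:
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Splice the new placeholder in place of the saved graph's (assumed) reshaped-input tensor.
    new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta',
                                           clear_devices=True,
                                           input_map={'Reshape:0': images})
    new_saver.restore(sess, MODEL_PATH)
    pred = tf.get_default_graph().get_tensor_by_name('Dense2/prediction:0')
    print(sess.run(pred, feed_dict={images: np.random.rand(1, 32, 32, 3)}))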
----END UPDATE 3----
---link to checkpoint files---
https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015-12-11&sr=b&sig=oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D
--end link to checkpoint files---
-----UPDATE 4 -----
I gave in and just took a shot at the 'normal' way of performing inference, assuming I could have the scientists simply pickle their models so that we could grab the pickle, unpack it, and then run inference on it. So, to test, I tried the normal way assuming we had already unpacked it... It doesn't work worth beans either...
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
from scipy import misc
import time
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
new_saver.restore(sess, MODEL_PATH)
pred = tf.get_default_graph().get_operation_by_name('prediction')
enq = sess.graph.get_operation_by_name(enqueue_op)
#tf.train.start_queue_runners(sess)
print(rand)
print(pred)
print(enq)
for i in range(1, 25):
img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
img = img.reshape(1, 32, 32, 3)
print(sess.run(logits, feed_dict={images : img}))
time.sleep(3)
print('done')
run_inference()
TensorFlow ends up building a new graph with the inference function, and then appends everything from the loaded graph onto the end of it. So when I populate a feed_dict expecting to get inferences back, I just get a bunch of random garbage, as if it were the first pass through the network...
Again, this seems nuts; do I really need to write my own framework for serializing and deserializing random networks? This must have been done before...
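To at least sanity-check that theory, it is easy to dump what actually ends up in the graph after the restore; if the Conv1/Dense1/Dense2 scopes show up twice, there are two copies of the model, only the imported copy received the checkpoint weights, and logits is wired to the other (randomly initialized) copy. A small diagnostic, run right after new_saver.restore(...) above:
# Every variable in the default graph; duplicated scopes mean two copies of the model.
print([v.name for v in tf.global_variables()])
# Anything listed here never received a value from either the initializer or the restore.
print(sess.run(tf.report_uninitialized_variables()))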
-----END UPDATE 4-----
Again; thanks!
Alright, this took way too much time to figure out, so here is the answer for the rest of the world.
Quick reminder: I needed to persist a model that can be dynamically loaded and inferred against without any knowledge of its underpinnings or internals.
Step 1: Create a model as a class, and ideally use an interface definition
class Vgg3Model:
NUM_DENSE_NEURONS = 50
DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)
def inference(self, images):
'''
Portion of the compute graph that takes an input and converts it into a Y output
'''
with tf.variable_scope('Conv1') as scope:
C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
with tf.variable_scope('Dense1') as scope:
P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
dim = P_1.get_shape()[1].value
D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
with tf.variable_scope('Dense2') as scope:
D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
H = tf.nn.softmax(D_2, name='prediction')
return H
def loss(self, logits, labels):
'''
Adds Loss to all variables
'''
cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
cross_entr = tf.reduce_mean(cross_entr)
tf.summary.scalar('cost', cross_entr)
tf.add_to_collection('losses', cross_entr)
return tf.add_n(tf.get_collection('losses'), name='total_loss')
Step 2: Train your network with whatever inputs you want; in my case I used queue runners and TFRecords. Note that this step is done by a different team, which iterates, builds, designs, and optimizes models. This can also change over time. The output they produce must be pullable from a remote location so we can dynamically load the updated models onto devices (reflashing hardware is a pain, especially if it is geographically distributed). In this instance the team drops the three files associated with a graph saver, plus a pickle of the model class used for that training session.
model = vgg3.Vgg3Model()
def create_sess_ops():
'''
Creates and returns operations needed for running
a tensorflow training session
'''
GRAPH = tf.Graph()
with GRAPH.as_default():
examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
batch_size=CONSTANTS.BATCH_SIZE,
img_shape=CONSTANTS.IMAGE_SHAPE,
num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]], name='infer/input')
logits = model.inference(examples)
loss = model.loss(logits, labels)
OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
gradients = OPTIMIZER.compute_gradients(loss)
apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
gradients_summary(gradients)
summaries_op = tf.summary.merge_all()
return [apply_gradient_op, summaries_op, loss, logits], GRAPH
def main():
'''
Run and Train CIFAR 10
'''
print('starting...')
ops, GRAPH = create_sess_ops()
total_duration = 0.0
with tf.Session(graph=GRAPH) as SESSION:
COORDINATOR = tf.train.Coordinator()
THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
SESSION.run(tf.global_variables_initializer())
SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
GRAPH_SAVER = tf.train.Saver()
for EPOCH in range(CONSTANTS.EPOCHS):
duration = 0
error = 0.0
start_time = time.time()
for batch in range(CONSTANTS.MINI_BATCHES):
_, summaries, cost_val, prediction = SESSION.run(ops)
error += cost_val
duration += time.time() - start_time
total_duration += duration
SUMMARY_WRITER.add_summary(summaries, EPOCH)
print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
print(
'Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration)
)
break
GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
pickle.dump(model, output)
COORDINATOR.request_stop()
COORDINATOR.join(THREADS)
Step 3: Run some inference. Load your pickled model, build a fresh graph by piping the new placeholder into the model's inference function to get the logits, and then restore the session. DO NOT RESTORE THE WHOLE GRAPH; JUST THE VARIABLES.
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
with open('models/vgg3.pkl', 'rb') as model_in:
model = pickle.load(model_in)
logits = model.inference(images)
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
new_saver = tf.train.Saver()
new_saver.restore(sess, MODEL_PATH)
print("Starting...")
for i in range(20, 30):
print(str(i) + '.png')
img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
img = img.reshape(1, 32, 32, 3)
pred = sess.run(logits, feed_dict={images : img})
max_node = np.argmax(pred)
print('predicted label: ' + str(max_node))
print('done')
run_inference()
There are definitely ways to improve on this using interfaces and maybe packaging everything up better, but this is working and sets the stage for how we will be moving forward.
FINAL NOTE: When we finally pushed this to production, we ended up having to ship the stupid `mymodel_model.py` file down with everything needed to build up the graph. So we now enforce a naming convention for all models, and there is also a coding standard for production model runs so we can do this properly.
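For anyone wondering why the .py file has to come along: pickle stores an instance by reference to its class, so the defining module must be importable when you unpickle. With the naming convention in place, the production loader can stay dumb; a sketch of the idea (package, module, and model names here are illustrative, not our actual code):
import importlib
import pickle

MODEL_NAME = 'vgg3'  # hypothetical; comes from whatever manifest ships with the checkpoint

# Convention: models/<name>_model.py is shipped alongside the checkpoint and the pickle.
# Importing it up front gives a clear error if the file did not make it onto the device.
importlib.import_module('models.' + MODEL_NAME + '_model')

with open('models/' + MODEL_NAME + '.pkl', 'rb') as model_in:
    model = pickle.load(model_in)

# From here it is the same as Step 3 above: logits = model.inference(images), restore, run.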
Good Luck!
While it's not as cut and dried as model.predict(), it's still really trivial.
In your model you should have a tensor that computes the final output you're interested in; let's name that tensor output. You may currently just have a loss function. If so, create another tensor (variable in the model) that actually computes the output you want.
For example, if your loss function is:
tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=last_layer_activation)
And you expect your outputs to be in the range [0,1] per class, create another variable:
output = tf.sigmoid(last_layer_activation)
Now, when you call sess.run(...), just request the output tensor. Don't request the optimization op you normally would to train it. When you request this tensor, TensorFlow will do the minimum work necessary to produce the value (e.g. it won't bother with backprop, loss functions, and all that, because a simple feed-forward pass is all that's necessary to compute output).
So if you're creating a service to return inferences of the model you'll want to keep the model loaded in memory/gpu, and repeat:
sess.run(output, feed_dict={X: input_data})
You won't need to feed it the labels, because TensorFlow won't bother to compute ops that aren't needed to produce the output you are requesting. You don't have to change your model or anything.
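Putting that together, a minimal sketch; last_layer_activation, the input placeholder X, and an already-trained session sess are assumed to exist in your program, and output is the extra tensor described above:
# The extra tensor: the same activations the loss uses, squashed to per-class probabilities.
output = tf.sigmoid(last_layer_activation, name='output')

def predict(sess, input_data):
    # No labels and no training op in the fetch list, so TensorFlow only runs the
    # feed-forward ops needed to produce `output`.
    return sess.run(output, feed_dict={X: input_data})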
While this approach might not be as obvious as model.predict(...) I'd argue that it's vastly more flexible. If you start playing with more complex models you'll probably learn to love this approach. model.predict() is like "thinking inside the box."
Related
I'm trying to produce adversarial examples for a semantic segmentation classifier, which involves optimising an image using the gradients of the loss with respect to the input image variable (where the loss is between the current and goal network outputs).
However, no matter what I've tried, I can't seem to create the graph in a way that allows these gradients to be calculated. I need to ensure that the calculated network output for each iteration of the image is not disconnected from the loss.
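For context, this is the general pattern I am trying to reproduce: the network output has to be computed from the image variable itself, not from a separate placeholder, otherwise there is no differentiable path from the loss back to the image. A toy sketch with a stand-in model (not the model_builder code from the suite):
import tensorflow as tf

def toy_model(x):
    # Stand-in for the real segmentation network: any differentiable function of its input.
    return tf.layers.conv2d(x, filters=2, kernel_size=1)

img = tf.Variable(tf.zeros((1, 8, 8, 3)), name='img')    # the thing being optimised
target = tf.placeholder(tf.float32, (1, 8, 8, 2))        # goal output

logits = toy_model(img)                                   # built ON the image variable
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=target, logits=logits))
step = tf.train.GradientDescentOptimizer(0.1).minimize(loss, var_list=[img])
# Repeatedly running sess.run(step, ...) then nudges img toward producing the goal output.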
Here is the code. I've not included absolutely everything as it would be nightmarishly long. The model builder is a method from the code suite I'm trying to adapt. I'm sure that this must be some kind of trivial misunderstanding on my part.
#From elsewhere - x is the processed input image and yg is calculated using argmin on the output
#of a previous run through the network.
x = self.xclean
self.get_ygoal()
yg = self.ygoal
yg = tf.convert_to_tensor(yg)
tf.reset_default_graph()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
#sess.run(tf.global_variables_initializer())
net_input = tf.placeholder(tf.float32,shape=[None,None,None,3])
net_output = tf.placeholder(tf.float32,shape=[None,None,None,self.num_classes])
network, _ = model_builder.build_model(self.model, net_input=net_input,
num_classes=self.num_classes,
crop_width=self.dims[0],
crop_height=self.dims[1],
is_training=True)
print('Loading model checkpoint weights')
checkpoint_path = 'checkpoints/latest_model_'+self.model+'_'+self.dataset+'.ckpt'
saver=tf.train.Saver(max_to_keep=1000)
saver.restore(sess, checkpoint_path)
img = tf.Variable(tf.zeros(shape=(1,self.dims[0], self.dims[1], 3)),name='img')
assign = tf.assign(img,net_input)
learning_rate = tf.constant(lr,dtype=tf.float32)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=network, labels=net_output)
optim_step = tf.compat.v1.train.GradientDescentOptimizer(learning_rate).minimize(loss, var_list=[img])
epsilon_ph = tf.placeholder(tf.float32, ())
below = net_input - epsilon_ph
above = net_input + epsilon_ph
projected = tf.clip_by_value(tf.clip_by_value(img, below, above), 0, 1)
with tf.control_dependencies([projected]):
project_step = tf.assign(img, projected)
sess.run(assign, feed_dict={net_input: x})
for i in range(steps):
print('Starting...')
# gradient descent step
_, loss_value = sess.run([optim_step], feed_dict={net_input:x,net_output:yg})
# project step
sess.run(project_step, feed_dict={net_input: x, epsilon_ph: epsilon})
if (i+1) % 10 == 0:
print('step %d, loss=%g' % (i+1, loss_value))
adv = img.eval() # retrieve the adversarial example
Here's the error message I get:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'img:0' shape=(1, 512, 512, 3) dtype=float32_ref>"] and loss Tensor("softmax_cross_entropy_with_logits/Reshape_2:0", shape=(?, ?, ?), dtype=float32).
I should mention that this is using TensorFlow 1.14, as the code suite is built around it.
Thanks in advance.
I have trained a model and saved all the files (meta, index, checkpoint, etc.) using the saver = tf.compat.v1.train.Saver() function, and now I want to re-load that model in order to test on new data. It works fine, but my question is, every time I run the restored model on the same dataset (i.e. run it once on a testing dataset, and then start it over and run it again on that same dataset) I get very different results. I'm hoping to be able to run it over and over again on the same dataset, but get the same results.
I have two separate .py files, one for training and one for testing/loading the model to test on the dataset. My training variables/placeholders look something like this in the training.py file (in case it's relevant):
# set some tensorflow variables and placeholders, etc.
self.X = tf.compat.v1.placeholder(tf.float32, (None, self.state_size))
self.REWARDS = tf.compat.v1.placeholder(tf.float32, (None))
self.ACTIONS = tf.compat.v1.placeholder(tf.int32, (None))
feed_forward = tf.layers.dense(self.X, self.LAYER_SIZE, activation = tf.nn.relu)
self.logits = tf.layers.dense(feed_forward, self.OUTPUT_SIZE, activation = tf.nn.softmax)
input_y = tf.one_hot(self.ACTIONS, self.OUTPUT_SIZE)
loglike = tf.math.log((input_y * (input_y - self.logits) + (1 - input_y) * (input_y + self.logits)) + 1) # tf.log
rewards = tf.tile(tf.reshape(self.REWARDS, (-1,1)), [1, self.OUTPUT_SIZE])
self.cost = -tf.reduce_mean(loglike * (rewards + 1)) # leave this as a negative, so that the minimize function of the Adam optimizer will keep improving
# Adam Optimizer
self.optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate = self.LEARNING_RATE).minimize(self.cost) # minimize(self.cost)
# Start the Tensorflow session
self.sess = tf.compat.v1.InteractiveSession()
self.sess.run(tf.compat.v1.global_variables_initializer())
...
saver = tf.compat.v1.train.Saver()
save_path = saver.save(self.sess, "./agent_output/" + name + "_model")
And in the testing.py file, it looks something like this:
...
# Start the Tensorflow session
self.sess = tf.compat.v1.InteractiveSession()
new_saver = tf.train.import_meta_graph('./agent_output/' + name + '_model.meta')
new_saver.restore(self.sess, tf.train.latest_checkpoint('./agent_output/'))
print('Model loaded step 1')
#saver = tf.compat.v1.train.Saver()
#saver.restore(self.sess, "./agent_output/" + name + "_model")
#print('Model Restored!')
self.sess.run(tf.compat.v1.global_variables_initializer())
Just to give you an idea of what I'm working with. As you can see, I've tried import_meta_graph and the commented-out saver.restore method, but I think I'm missing something, or maybe it's not even possible in my case?
I'm just hoping someone can point me in the right direction. What I've discovered on my own is that there should be a way to load not only the variables but also the graph? Or maybe I need to implement that during training? I'm running Python 3.6 and TensorFlow 1.14 (I believe? Not 2.0).
Your problem is probably running self.sess.run(tf.compat.v1.global_variables_initializer()) after restoring the model. You only need to run this for a fresh model, not a restored one; running it after the restore overwrites the restored weights with fresh random values, which is why each run gives different results. Try it without this line.
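In other words, the testing.py block would look something like this; it mirrors your code, just without the initializer (the commented tensor lookups at the end are assumptions about how you fetch your ops back):
self.sess = tf.compat.v1.InteractiveSession()
new_saver = tf.train.import_meta_graph('./agent_output/' + name + '_model.meta')
new_saver.restore(self.sess, tf.train.latest_checkpoint('./agent_output/'))
# Do NOT run tf.compat.v1.global_variables_initializer() here: it would
# re-randomize the weights that restore() just loaded.
graph = tf.compat.v1.get_default_graph()
# e.g. self.X = graph.get_tensor_by_name('Placeholder:0')           # name is an assumption
#      self.logits = graph.get_tensor_by_name('dense_1/Softmax:0')  # name is an assumption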
I am trying to properly read my own binary data into TensorFlow, based on the Fixed Length Records section of this tutorial and by looking at the read_cifar10 function here. Mind you, I am new to TensorFlow, so my understanding may be off.
My Data
My files are binary with float32 values. The first 32-bit sample is the label, and the remaining 256 samples are the data. I want to reshape the data at the end into a [2, 128] matrix.
My Code So far:
import tensorflow as tf
import os
def read_data(filename_queue):
item_type = tf.float32
label_items = 1
data_items = 256
label_bytes = label_items * item_type.size
data_bytes = data_items * item_type.size
record_bytes = label_bytes + data_bytes
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
key, value = reader.read(filename_queue)
record_data = tf.decode_raw(value, item_type)
# labels = tf.cast(tf.strided_slice(record_data, [0], [label_items]), tf.int32)
label = tf.strided_slice(record_data, [0], [label_items])
data0 = tf.strided_slice(record_data, [label_items], [label_items + data_items])
data = tf.reshape(data0, [2, data_items/2])
return data, label
if __name__ == '__main__':
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Set GPU device
datafiles = ['train_0000.dat', 'train_0001.dat']
num_epochs = 2
filename_queue = tf.train.string_input_producer(datafiles, num_epochs=num_epochs, shuffle=True)
data, label = read_data(filename_queue)
with tf.Session() as sess:
init = tf.global_variables_initializer()
sess.run(init)
(x, y) = read_data(filename_queue)
print(y.eval())
This code hangs at print(y.eval()), but I fear I have much bigger issues than that.
Question:
When I execute this, I get a data and a label tensor returned. The problem is I don't quite understand how to actually read the data from those tensors. For example, I understand the autoencoder example here; however, that has a mnist.train.next_batch(batch_size) function that is called to read the next batch. Do I need to write that for my data, or is it handled by something internal to my read_data() function? If I need to write that function, what does it look like?
Are there any other obvious things I'm missing? My goal in using this method is to reduce I/O overhead and to avoid storing all of the data in memory, since my files are quite large.
Thanks in advance.
Yes. You are pretty much done. At this point you need to:
1) Write your neural network model, which is supposed to take your data and return a label.
2) Write your cost function C, which takes the network prediction and the true label and gives you a cost.
3) Choose an optimizer.
4) Put everything together:
opt = tf.train.AdamOptimizer(learning_rate=0.001)
datafiles = ['train_0000.dat', 'train_0001.dat']
num_epochs = 2
with tf.Session() as sess:
init = tf.global_variables_initializer()
sess.run(init)
filename_queue = tf.train.string_input_producer(datafiles, num_epochs=num_epochs, shuffle=True)
data, label = read_data(filename_queue)
example_batch, label_batch = tf.train.shuffle_batch(
[data, label], batch_size=128)
y_pred = model(data)
loss = C(label, y_pred)
After which you build the training op once and run it repeatedly inside your training loop:
train_op = opt.minimize(loss)
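One more practical detail: string_input_producer and shuffle_batch both create queues, so nothing flows (and y.eval() hangs) until the queue runners are started. Continuing inside the with tf.Session() as sess: block above, roughly:
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])   # local init is required because of num_epochs
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            _, loss_val = sess.run([train_op, loss])   # batches are pulled from the queue for you
    except tf.errors.OutOfRangeError:
        print('done -- input queue exhausted after num_epochs')
    finally:
        coord.request_stop()
        coord.join(threads)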
See also tf.train.string_input_producer behavior in a loop for related information.
I'm new to generative networks and I decided to first try it on my own before looking at existing code. These are the steps I used to train my GAN.
[lib: tensorflow]
1) Train a discriminator on the dataset. (I used a dataset of 2 features with labels of either 'meditating' or 'not meditating'; dataset: https://drive.google.com/open?id=0B5DaSp-aTU-KSmZtVmFoc0hRa3c )
2) Once the discriminator is trained, save it.
3) Make another file for another feed-forward network (or any other, depending on your dataset). This feed-forward network is the generator.
4) Once the generator is constructed, restore the discriminator and define a loss function for the generator such that it learns to fool the discriminator. (This didn't work in TensorFlow because sess.run() doesn't return a tf tensor, so the path between G and D breaks, but it should work when done from scratch.)
d_output = sess.run(graph.get_tensor_by_name('ol:0'), feed_dict={graph.get_tensor_by_name('features_placeholder:0'): g_output})
print(d_output)
optimize_for = tf.constant([[0.0]*10]) #not meditating
g_loss = -tf.reduce_mean((d_output - optimize_for)**2)
train = tf.train.GradientDescentOptimizer(learning_rate).minimize(g_loss)
Why don't we train a generator like this? It seems so much simpler. It's true I couldn't manage to run this in TensorFlow, but it should be possible if I do it from scratch.
Full code:
Discriminator:
import pandas as pd
import tensorflow as tf
from sklearn.utils import shuffle
data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_train.csv")
learning_rate = 0.001
batch_size = 1
n_epochs = 1000
n_examples = 999 # This is highly unsatisfying >:3
n_iteration = int(n_examples/batch_size)
features = tf.placeholder('float', [None, 2], name='features_placeholder')
labels = tf.placeholder('float', [None, 1], name = 'labels_placeholder')
weights = {
'ol': tf.Variable(tf.random_normal([2, 1]), name = 'w_ol')
}
biases = {
'ol': tf.Variable(tf.random_normal([1]), name = 'b_ol')
}
ol = tf.nn.sigmoid(tf.add(tf.matmul(features, weights['ol']), biases['ol']), name = 'ol')
loss = tf.reduce_mean((labels - ol)**2, name = 'loss')
train = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for epoch in range(n_epochs):
ptr = 0
data = shuffle(data)
data_f = data.drop("lbl", axis = 1)
data_l = data.drop(["f1", "f2"], axis = 1)
for iteration in range(n_iteration):
epoch_x = data_f[ptr: ptr + batch_size]
epoch_y = data_l[ptr: ptr + batch_size]
ptr = ptr + batch_size
_, lss = sess.run([train, loss], feed_dict={features: epoch_x, labels:epoch_y})
print("Loss # epoch ", epoch, " = ", lss)
print("\nTesting...\n")
data = pd.read_csv("E:/workspace_py/datasets/simdata/linear_data_eval.csv")
test_data_l = data.drop(["f1", "f2"], axis = 1)
test_data_f = data.drop("lbl", axis = 1)
print(sess.run(ol, feed_dict={features: test_data_f}))
print(test_data_l)
print("Saving model...")
saver = tf.train.Saver()
saver.save(sess, save_path="E:/workspace_py/saved_models/meditation_disciminative_model.ckpt")
sess.close()
Generator:
import tensorflow as tf
# hyper parameters
learning_rate = 0.1
# batch_size = 1
n_epochs = 100
from numpy import random
noise = random.rand(10, 2)
print(noise)
# Model
input_placeholder = tf.placeholder('float', [None, 2])
weights = {
'hl1': tf.Variable(tf.random_normal([2, 3]), name = 'w_hl1'),
'ol': tf.Variable(tf.random_normal([3, 2]), name = 'w_ol')
}
biases = {
'hl1': tf.Variable(tf.zeros([3]), name = 'b_hl1'),
'ol': tf.Variable(tf.zeros([2]), name = 'b_ol')
}
hl1 = tf.add(tf.matmul(input_placeholder, weights['hl1']), biases['hl1'])
ol = tf.add(tf.matmul(hl1, weights['ol']), biases['ol'])
sess = tf.Session()
sess.run(tf.global_variables_initializer())
g_output = sess.run(ol, feed_dict={input_placeholder: noise})
# restoring discriminator
saver = tf.train.import_meta_graph("E:/workspace_py/saved_models/meditation_disciminative_model.ckpt.meta")
saver.restore(sess, tf.train.latest_checkpoint('E:/workspace_py/saved_models/'))
graph = tf.get_default_graph()
d_output = sess.run(graph.get_tensor_by_name('ol:0'), feed_dict={graph.get_tensor_by_name('features_placeholder:0'): g_output})
print(d_output)
optimize_for = tf.constant([[0.0]*10])
g_loss = -tf.reduce_mean((d_output - optimize_for)**2)
train = tf.train.GradientDescentOptimizer(learning_rate).minimize(g_loss)
The discriminator's purpose isn't to classify your original data, or really to discriminate anything about your original data. Its sole purpose is to discriminate your generator's output from the original data.
Think of an example of an art forger. Your dataset is all original paintings. Your generator network G is an art forger, and your discriminator D is a detective whose sole purpose in life is to find forgeries made by G.
D can't learn much just by looking at original paintings. What's really important for him is to figure out what sets G's forgeries apart from everything else. G can't make any money selling forgeries if all his pieces are discovered and marked as such by D, so he must learn how to thwart D.
This creates an environment where G is constantly trying to make his pieces look more "like" original artwork, and D is constantly getting better at finding the nuances to G's forgery style. The better D gets, the better G needs to be in order to make a living. They each get better at their task until they (theoretically) reach some Nash equilibrium defined by the complexity of the networks and the data they're trying to forge.
That's why D needs to be trained back-and-forth with G, because it needs to know and adapt to G's particular nuances (which change over time as G learns and adapts), not just find some average definition of "not forged". By making D hunt G specifically, you force G to become a better forger, and thus end up with a better generator network. If you just train D once, then G can learn some easy, obvious, unimportant way to beat D and never actually produce very good forgeries.
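In code, that back-and-forth is just two training ops driven alternately, each restricted to its own variables, with G's output feeding D as a tensor inside a single graph (not passed through an intermediate sess.run). A bare-bones sketch; the losses, placeholders, scope names, and batch helpers here are assumptions rather than your exact network:
import tensorflow as tf

# Assumes the discriminator was built under variable scope 'D', the generator under 'G',
# and that d_loss / g_loss, the placeholders `features` / `noise_in`, and the batch
# helpers real_batch() / noise_batch() are defined on that one combined graph.
d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='D')
g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='G')

d_train = tf.train.GradientDescentOptimizer(0.001).minimize(d_loss, var_list=d_vars)
g_train = tf.train.GradientDescentOptimizer(0.001).minimize(g_loss, var_list=g_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # D gets better at spotting the current G's forgeries...
        sess.run(d_train, feed_dict={features: real_batch(), noise_in: noise_batch()})
        # ...then G adapts to the updated D, and the cycle repeats.
        sess.run(g_train, feed_dict={noise_in: noise_batch()})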
I have written a program with TensorFlow that identifies a number of figures in an image. The model is trained with one function and then used by another function to label the figures. The training was done on my computer and the resulting model uploaded to AWS along with the solve function.
On my computer it works well, but when I create a Lambda in AWS it behaves strangely and starts giving different answers for the same test data.
The model in the solve function is this:
# Recreate neural network from model file generated during training
# input
x = tf.placeholder(tf.float32, [None, size_of_image])
# weights
W = tf.Variable(tf.zeros([size_of_image, num_chars]))
# biases
b = tf.Variable(tf.zeros([num_chars]))
The solve function code to label the figures is this:
for testi in range(captcha_letters_num):
# load model from file
saver = tf.train.import_meta_graph(model_path + '.meta',
clear_devices=True)
saver.restore(sess, model_path)
# Data to label
test_x = np.asarray(char_imgs[testi], dtype=np.float32)
predict_op = model(test_x, W, b)
op = sess.run(predict_op, feed_dict={x: test_x})
# find max probability from the probability distribution returned by softmax
max_probability = op[0][0]
max_probability_index = -1
for i in range(num_chars):
if op[0][i] > max_probability:
max_probability = op[0][i]
max_probability_index = i
# append it to final output
final_text += char_map_list[max_probability_index]
# Reset the model so it can be used again
tf.reset_default_graph()
With the same test data it gives different answers; I don't know why.
Solved!
What I finally did was keep the Session outside the loop and initialize the variables. After ending the loop, I reset the graph.
saver = tf.train.Saver()
sess = tf.Session()
# Initialize variables
sess.run(tf.global_variables_initializer())
.
.
.
# passing each of the 5 characters through the NNet
for testi in range(captcha_letters_num):
# Data to label
test_x = np.asarray(char_imgs[testi], dtype=np.float32)
predict_op = model(test_x, W, b)
op = sess.run(predict_op, feed_dict={x: test_x})
# find max probability from the probability distribution returned by softmax
max_probability = op[0][0]
max_probability_index = -1
for i in range(num_chars):
if op[0][i] > max_probability:
max_probability = op[0][i]
max_probability_index = i
# append it to final output
final_text += char_map_list[max_probability_index]
# Reset the model so it can be used again
tf.reset_default_graph()
sess.close()