Can't fix torch autograd runtime error: UNet inplace operation

Can't fix torch autograd runtime error: UNet inplace operation - python

I can't fix the runtime error "one of the variables needed for gradient computation has been modified by an inplace operation.
I know, that if I comment out loss.backward() the code will run, but I don't get in which order should I call the functions to avoid this error
When I call it my wrapper with Resnet50 I don't experience any problems, but with Unet the RuntimeError occurs
for i, (x, y) in batch_iter:
with torch.autograd.set_detect_anomaly(True):
input, target = x.to(self.device), y.to(self.device)
self.optimizer.zero_grad()
if self.box_training:
out = self.model(input)
else:
out = self.model(input).clamp(0,1)
loss = self.criterion(out, target)
loss_value = loss.item()
train_losses.append(loss_value)
loss.backward()
self.optimizer.step()
batch_iter.set_description(f'Training: (loss {loss_value:.4f})')
self.training_loss.append(np.mean(train_losses))
self.learning_rate.append(self.optimizer.param_groups[0]['lr'])
As the comments pointed out, I should provide a model
And by looking at it, I actually found what was the problem:
model = UNet(in_channels=1,
num_encoding_blocks = 6,
out_classes = 1,
padding=1,
dimensions = 2,
out_channels_first_layer = 32,
normalization = None,
pooling_type = 'max',
upsampling_type = 'conv',
preactivation = False,
#residual = True,
padding_mode = 'zeros',
activation = 'ReLU',
initial_dilation = None,
dropout = 0,
monte_carlo_dropout = 0
)
It is residual = True which I has commented out. I will look into the docs, what is going on. Maybe if you have an idea, you can enlighten me

Explanation:
It looks like the UNet library you are using includes a += (in-place tensor addition) in the residual branch of the encoder:
if self.residual:
connection = self.conv_residual(x)
x = self.conv1(x)
x = self.conv2(x)
x += connection # <------- !!!
In-place operations like += may overwrite information that is needed for gradient computation during loss.backward(). PyTorch detects when this necessary information has been overwritten, and complains.
Fix:
If you want to train this network with residual enabled, you would need to replace this += with a not-in-place add:
if self.residual:
connection = self.conv_residual(x)
x = self.conv1(x)
x = self.conv2(x)
x = x + connection # <-------
A similar edit is needed in the decoder. If you installed this unet library via pip, you would want to download it directly from github instead so you can make these edits (and uninstall the pip version to avoid confusion).
For more information about why in-place operations can cause problems, see this blog post or this section of the PyTorch docs.

Related

Problems porting Tensorflow code to Keras

I have been bashing my head against the wall for the past few days - and I simply cannot figure it out.
Would some of you good people perhaps let me know what I am doing wrong?
I am trying to port code from https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Deep%20Q%20Learning/Doom/Deep%20Q%20learning%20with%20Doom.ipynb (written in Tensorflow) to Keras. Here is the original part of the code:
class DQNetwork:
def __init__(self, state_size, action_size, learning_rate, name='DQNetwork'):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
with tf.variable_scope(name):
self.inputs_ = tf.placeholder(tf.float32, [None, *state_size], name="inputs")
self.actions_ = tf.placeholder(tf.float32, [None, 3], name="actions_")
self.target_Q = tf.placeholder(tf.float32, [None], name="target")
#First convnet: CNN => BatchNormalization => ELU; Input is 84x84x4
self.conv1 = tf.layers.conv2d(inputs = self.inputs_,
filters = 32, kernel_size = [8,8],strides = [4,4],padding = "VALID",
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(), name = "conv1")
self.conv1_batchnorm = tf.layers.batch_normalization(self.conv1,training = True,
epsilon = 1e-5,name = 'batch_norm1')
self.conv1_out = tf.nn.elu(self.conv1_batchnorm, name="conv1_out")
## --> [20, 20, 32]
#Second convnet: CNN => BatchNormalization => ELU
self.conv2 = tf.layers.conv2d(inputs = self.conv1_out,
filters = 64,kernel_size = [4,4],strides = [2,2],padding = "VALID",
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(),name = "conv2")
self.conv2_batchnorm = tf.layers.batch_normalization(self.conv2,training = True,
epsilon = 1e-5,name = 'batch_norm2')
self.conv2_out = tf.nn.elu(self.conv2_batchnorm, name="conv2_out")
## --> [9, 9, 64]
#Third convnet: CNN => BatchNormalization => ELU
self.conv3 = tf.layers.conv2d(inputs = self.conv2_out,
filters = 128,kernel_size = [4,4],strides = [2,2],padding = "VALID",
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(),name = "conv3")
self.conv3_batchnorm = tf.layers.batch_normalization(self.conv3,training = True,
epsilon = 1e-5,name = 'batch_norm3')
self.conv3_out = tf.nn.elu(self.conv3_batchnorm, name="conv3_out")
## --> [3, 3, 128]
self.flatten = tf.layers.flatten(self.conv3_out)
## --> [1152]
self.fc = tf.layers.dense(inputs = self.flatten,
units = 512, activation = tf.nn.elu,
kernel_initializer=tf.contrib.layers.xavier_initializer(),name="fc1")
self.output = tf.layers.dense(inputs = self.fc, kernel_initializer=tf.contrib.layers.xavier_initializer(),
units = 3, activation=None)
# Q is our predicted Q value.
self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)
# The loss is the difference between our predicted Q_values and the Q_target
# Sum(Qtarget - Q)^2
self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)
# farther below...
Qs_next_state = sess.run(DQNetwork.output, feed_dict = {DQNetwork.inputs_: next_states_mb})
# Set Q_target = r if the episode ends at s+1, otherwise set Q_target = r + gamma*maxQ(s', a')
for i in range(0, len(batch)):
terminal = dones_mb[i]
# If we are in a terminal state, only equals reward
if terminal:
target_Qs_batch.append(rewards_mb[i])
else:
target = rewards_mb[i] + gamma * np.max(Qs_next_state[i])
target_Qs_batch.append(target)
targets_mb = np.array([each for each in target_Qs_batch])
loss, _ = sess.run([DQNetwork.loss, DQNetwork.optimizer],
feed_dict={DQNetwork.inputs_: states_mb,
DQNetwork.target_Q: targets_mb,
DQNetwork.actions_: actions_mb})
And here is my conversion:
class DQNetworkA:
def __init__(self, state_size, action_size, learning_rate):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
self.model = keras.models.Sequential()
self.model.add(keras.layers.Conv2D(32, (8, 8), strides=(4, 4), padding = "VALID", input_shape=state_size))#, kernel_initializer='glorot_normal'))
self.model.add(keras.layers.BatchNormalization(epsilon = 1e-5))
self.model.add(keras.layers.Activation('elu'))
self.model.add(keras.layers.Conv2D(64, (4, 4), strides=(2, 2), padding = "VALID"))#, kernel_initializer='glorot_normal'))
self.model.add(keras.layers.BatchNormalization(epsilon = 1e-5))
self.model.add(keras.layers.Activation('elu'))
self.model.add(keras.layers.Conv2D(128, (4, 4), strides=(2, 2), padding = "VALID"))#, kernel_initializer='glorot_normal'))
self.model.add(keras.layers.BatchNormalization(epsilon = 1e-5))
self.model.add(keras.layers.Activation('elu'))
self.model.add(keras.layers.Flatten())
self.model.add(keras.layers.Dense(512))
self.model.add(keras.layers.Activation('elu'))
self.model.add(keras.layers.Dense(action_size))
self.model.compile(loss="mse", optimizer=keras.optimizers.RMSprop(lr=self.learning_rate))
print(self.model.summary())
# farther below...
Qs = DQNetwork.predict(states_mb)
Qs_next_state = DQNetwork.predict(next_states_mb)
# Set Q_target = r if the episode ends at s+1, otherwise set Q_target = r + gamma*maxQ(s', a')
for i in range(0, len(batch)):
terminal = dones_mb[i]
t = np.copy(Qs[i])
a = np.argmax(actions_mb[i])
# If we are in a terminal state, only equals reward
if terminal:
t[a] = rewards_mb[i]
else:
t[a] = rewards_mb[i] + gamma * np.max(Qs_next_state[i])
target_Qs_batch.append(t)
dbg_target_Qs_batch.append(t[a])
targets_mb = np.array([each for each in target_Qs_batch])
loss = DQNetwork.train_on_batch(states_mb, targets_mb)
Everything else is the same. I have even tried to mess around with a custom loss function to minimize differences in the code – and it simply does not work! While the original code quickly converges my Keras doodlings simply does not seem to want to work!
Does anyone have a clue? Any hints or help would be highly appreciated...
A little further explanation:
This is a simple DQN playing Doom - so the after about 100 episodes (games), the model seems to be able to shoot the target without a problem every episode. Loss goes down, rewards per game go up - as one would expect... However, in the Keras model loss graph is flat, reward graph is flat - it almost seems not to be able to learn anything. (see the graphs linked below)
Here is how it works. In TF code, model outputs a tensor [a, b, c] where a, b and c give probability of each action the main character might take (ie: [left, right, shoot]). Model is then given reward for every action, so it is passed a target value (target_mb, f.ex. 10) along with which action this is for (one-hot encoded in actions_mb, ie [0,1,0] - if this is a target for moving right). Loss is then computed with a simple MSE over difference between target and predicted value of the model for the given action.
I have done two things:
1) I tried to use the standard "mse" loss as I have seen in other models of this type. To make the loss behave the same way, I pass the model its own input apart from target value. So if model predicts [3,4,5] and the target is 10 for [0,1,0] - we pass [3,10,5] as the truth to the model. This should be equivalent to the actions of the TF model. ie, difference between 10 and 4, squared and then mean over all differences from the batch.
2) When 1) did not work, I tried to make a custom loss function that basically attempts to mimick behaviour of the TF model as closely as possible. So if model predicts [3,4,5] and the target is 10 for [0,1,0] (as above) - we pass [0,10,0] as the truth to the model. Then the custom loss function through some finicky multiplication and division arrives at difference between 10 and 4 - squares it and takes mean of all squared errors as below:
def custom_loss(y_true, y_pred):
isolated_truths = tf.reduce_sum(y_true, axis=1)
isolated_predictions = tf.divide(tf.reduce_sum(tf.multiply(y_true, y_pred), axis=1), isolated_truths)
delta = isolated_predictions - isolated_truths
return tf.reduce_mean(tf.square(delta))
# when training, this small modification is made to targets:
loss = DQN_Keras.train_on_batch(states_mb, targets_mb.reshape(len(targets_mb),1) * actions_mb)
And it still does not work (although you can see on the graphs that the loss seems to behave far more reasonably!).
Take a look at the graphs:
tf model: https://pasteboard.co/IN1b5MN.png
keras model with mse loss: https://pasteboard.co/IN1kH6P.png
keras model with custom loss: https://pasteboard.co/IN17ktg.png
edit #2 - runnable code
Original TF code - copy pasted from tutorial above, working:
=> https://pastebin.com/QLb7nWZi
My code with custom loss in full:
=> https://pastebin.com/3HiYg6t7

Well, I have made work - by removing BatchNormalization layers. Now I am completely mystified... so does batch normalization work differently in Keras and Tensorflow? Or is the missing clue this mysterious "training=True" parameter in TF (not present in Keras)?
PS.
While digging into the issue, I also found this very useful article describing how to create advanced Keras models with several inputs like masks (like in the original TF code!):
https://becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26

How to re-load a saved model (with graph?) to create the same results on future testing data?

I have trained a model and saved all the files (meta, index, checkpoint, etc.) using the saver = tf.compat.v1.train.Saver() function, and now I want to re-load that model in order to test on new data. It works fine, but my question is, every time I run the restored model on the same dataset (i.e. run it once on a testing dataset, and then start it over and run it again on that same dataset) I get very different results. I'm hoping to be able to run it over and over again on the same dataset, but get the same results.
I have two separate .py files, one for training and one for testing/loading the model to test on the dataset. My training variables/placeholders look something like this in the training.py file (in case it's relevant):
# set some tensorflow variables and placeholders, etc.
self.X = tf.compat.v1.placeholder(tf.float32, (None, self.state_size))
self.REWARDS = tf.compat.v1.placeholder(tf.float32, (None))
self.ACTIONS = tf.compat.v1.placeholder(tf.int32, (None))
feed_forward = tf.layers.dense(self.X, self.LAYER_SIZE, activation = tf.nn.relu)
self.logits = tf.layers.dense(feed_forward, self.OUTPUT_SIZE, activation = tf.nn.softmax)
input_y = tf.one_hot(self.ACTIONS, self.OUTPUT_SIZE)
loglike = tf.math.log((input_y * (input_y - self.logits) + (1 - input_y) * (input_y + self.logits)) + 1) # tf.log
rewards = tf.tile(tf.reshape(self.REWARDS, (-1,1)), [1, self.OUTPUT_SIZE])
self.cost = -tf.reduce_mean(loglike * (rewards + 1)) # leave this as a negative, so that the minimize function of the Adam optimizer will keep improving
# Adam Optimizer
self.optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate = self.LEARNING_RATE).minimize(self.cost) # minimize(self.cost)
# Start the Tensorflow session
self.sess = tf.compat.v1.InteractiveSession()
self.sess.run(tf.compat.v1.global_variables_initializer())
...
saver = tf.compat.v1.train.Saver()
save_path = saver.save(self.sess, "./agent_output/" + name + "_model")
And in the testing.py file, it looks something like this:
...
# Start the Tensorflow session
self.sess = tf.compat.v1.InteractiveSession()
new_saver = tf.train.import_meta_graph('./agent_output/' + name + '_model.meta')
new_saver.restore(self.sess, tf.train.latest_checkpoint('./agent_output/'))
print('Model loaded step 1')
#saver = tf.compat.v1.train.Saver()
#saver.restore(self.sess, "./agent_output/" + name + "_model")
#print('Model Restored!')
self.sess.run(tf.compat.v1.global_variables_initializer())
Just to give you an idea of what I'm working with. As you can see, I've tried the import_meta_graph and the commented out saver.restore method, but I think I'm missing something, or if it's even possible in my case?
I'm just hoping someone can point me in the right direction. What I've discovered on my own is that there should be a way to not only load the variables, but also the graph? Or maybe I need to implement that during the training? I'm running Python 3.6 and Tensorflow 1.14 (I believe? Not 2.0).

Your problem is probably running self.sess.run(tf.compat.v1.global_variables_initializer()) after restoring the model. You only need to run this for a fresh model not a restored one. Try it without this line.

How to get rid of Variable API in PyTorch.autograd?

I am forwarding, and backpropping tensor data X through two simple nn.Module PyTorch models instances, model1 and model2.
I can't get this process to work without usage of the depreciated Variable API.
So this works just fine:
y1 = model1(X)
v = Variable(y1.data, requires_grad=training) # Its all about this line!
y2 = model2(v)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(v.grad)
self.step()
But this will throw an error:
y1 = model1(X)
y2 = model2(y1)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(y1.grad) # it breaks here
self.step()
>>> RuntimeError: grad can be implicitly created only for scalar outputs
I just can't seem to find a relevant difference between v in the first implementation, and y1 in the second. In both cases requires_grad is set to True. The only thing I could find was that y1.grad_fn=<ThnnConv2DBackward> and v.grad_fn=<ThnnConv2DBackward>
What am I missing here? What (tensor attributes?) do I not know about, and if Variable is depreciated, what other implementation would work?

[UPDATED]
You are not correctly passing the y1.grad into y1.backward in the second example. After the first backward all the intermediate gradient will be destroyed, you need a special hook to extract that gradients. And in your case you are passing the None value. Here is small example to reproduce your case:
Code:
import torch
import torch.nn as nn
torch.manual_seed(42)
class Model1(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
return x.pow(3)
class Model2(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x):
return x / 2
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
# We are going to backprop 2 times, so we need to
# retain_graph=True while first backward
loss.backward(retain_graph=True)
try:
y1.backward(y1.grad)
except RuntimeError as err:
print(err)
print('y1.grad: ', y1.grad)
Output:
grad can be implicitly created only for scalar outputs
y1.grad: None
So you need to extract them correctly:
Code:
def extract(V):
"""Gradient extractor.
"""
def hook(grad):
V.grad = grad
return hook
model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
y1.register_hook(extract(y1))
loss.backward(retain_graph=True)
print('y1.grad', y1.grad)
y1.backward(y1.grad)
Output:
y1.grad: tensor([[-0.1763, -0.2114, -0.0266, -0.3293, 0.0534]])

After some investigation I came to the following two solutions.
The solution provided elsewhere in this thread retained the computation graph manually, without an option the free them, thus running fine initially, but causing OOM errors later on.
The first solution is to tie the models together using the built in torch.nn.Sequential as such:
model = torch.nn.Sequential(Model1(), Model2())
it's as easy as that. It looks clean and behaves exactly like an ordinary model would.
The alternative is to simply tie them together manually:
model1 = Model1()
model2 = Model2()
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
loss.backward()
My fear that this would only backpropagate model2 turned out to be unsubstantiated, since model1 is also stored in the computation graph that is back propagated over.
This implementation enabled inceased transparancy of the interface between the two models, compared to the previous implementation.

PyTorch optimizer.step() function doesn't update weights

The code can be seen below.
The problem is, that the optimizer.step() part doesn't work. I'm printing model.parameters() before and after the training, and the weights don't change.
I'm trying to make a perceptron that can solve the AND-problem. I've been successful in doing this with my own tiny library, where I've implemented a perceptron with the two functions predict() and train().
Just to clarify, I've just started learning deep learning using PyTorch, so it's probably a very newbie problem. I've tried searching for a solution, but without luck. I've also compared my code with other codes that work, but I don't know what I'm doing wrong.
import torch
from torch import nn, optim
from random import randint
class NeuralNet(nn.Module):
def __init__(self):
super(NeuralNet, self).__init__()
self.layer1 = nn.Linear(2, 1)
def forward(self, input):
out = input
out = self.layer1(out)
out = torch.sign(out)
out = torch.clamp(out, 0, 1) # 0=false, 1=true
return out
data = torch.Tensor([[0, 0], [0, 1], [1, 0], [1, 1]])
target = torch.Tensor([0, 0, 0, 1])
model = NeuralNet()
epochs = 1000
lr = 0.01
print(list(model.parameters()))
print() # Print parameters before training
loss_func = nn.L1Loss()
optimizer = optim.Rprop(model.parameters(), lr)
for epoch in range(epochs + 1):
optimizer.zero_grad()
rand_int = randint(0, len(data) - 1)
x = data[rand_int]
y = target[rand_int]
pred = model(x)
loss = loss_func(pred, y)
loss.backward()
optimizer.step()
# Print parameters again
# But they haven't changed
print(list(model.parameters()))

Welcome to stackoverflow!
The issue here is you are trying to perform back-propagation through a non-differentiable function. Non-differentiable means that no gradients can flow back through them, implying that all trainable weights applied before them will not be updated by your optimizer. Such functions are easy to spot; they are discrete, sharp operations that resemble 'if' statements. In your case it is the sign() function.
Unfortunately, PyTorch does not do any hand-holding in this regard and will not point you to the issue. What you could do to alleviate the issue would be to transform the range of your output to [-1,1] and apply a Tanh() non-linearity instead of the sign() and clamp() operators.

TensorFlow Inference

I've been digging around on this for a while. I have found a ton of articles; but none really show just tensorflow inference as a plain inference. Its always "use the serving engine" or using a graph that is pre-coded/defined.
Here is the problem: I have a device which occasionally checks for updated models. It then needs to load that model and run input predictions through the model.
In keras this was simple: build a model; train the model and the call model.predict(). In scikit-learn same thing.
I am able to grab a new model and load it; I can print out all of the weights; but how in the world do I run inference against it?
Code to load model and print weights:
with tf.Session() as sess:
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
new_saver.restore(sess, MODEL_PATH)
for var in tf.trainable_variables():
print(sess.run(var))
I printed out all of my collections and I have:
['queue_runners', 'variables', 'losses', 'summaries', 'train_op', 'cond_context', 'trainable_variables']
I tried using sess.run(train_op); however that just started kicking up a full training session; which is not what I want to do. I just want to run inference against a different set of inputs that I provide which are not TF Records.
Just a little more detail:
The device can use C++ or Python; as long as I can produce a .exe. I can set up a feed dict if I want to feed the system. I trained with TFRecords; but in production I'm not going to use TFRecords; its a real/near real time system.
Thanks for any input. I am posting sample code to this repo: https://github.com/drcrook1/CIFAR10/TensorFlow which does all the training and sample inference.
Any hints are greatly appreciated!
------------EDITS-----------------
I rebuilt the model to be as below:
def inference(images):
'''
Portion of the compute graph that takes an input and converts it into a Y output
'''
with tf.variable_scope('Conv1') as scope:
C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
with tf.variable_scope('Dense1') as scope:
P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
dim = P_1.get_shape()[1].value
D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
with tf.variable_scope('Dense2') as scope:
D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
H = tf.nn.softmax(D_2, name='prediction')
return H
notice I add the name 'prediction' to the TF operation so I can retrieve it later.
When training I used the input pipeline for tfrecords and input queues.
GRAPH = tf.Graph()
with GRAPH.as_default():
examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
batch_size=CONSTANTS.BATCH_SIZE,
img_shape=CONSTANTS.IMAGE_SHAPE,
num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
logits = Vgg3CIFAR10.inference(examples)
loss = Vgg3CIFAR10.loss(logits, labels)
OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
I am attempting to use feed_dict on the loaded operation in the graph; however now it is just simply hanging....
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
#sess.run(tf.global_variables_initializer())
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
new_saver.restore(sess, MODEL_PATH)
pred = tf.get_default_graph().get_operation_by_name('prediction')
rand = np.random.rand(1, 32, 32, 3)
print(rand)
print(pred)
print(sess.run(pred, feed_dict={images: rand}))
print('done')
run_inference()
I believe this is not working because the original network was trained using TFRecords. In the sample CIFAR data set the data is small; our real data set is huge and it is my understanding TFRecords the the default best practice for training a network. The feed_dict makes great perfect sense from a productionizing perspective; we can spin up some threads and populate that thing from our input systems.
So I guess I have a network that is trained, I can get the predict operation; but how do I tell it to stop using the input queues and start using the feed_dict? Remember that from the production perspective I do not have access to whatever the scientists did to make it. They do their thing; and we stick it in production using whatever agreed upon standard.
-------INPUT OPS--------
tf.Operation 'input/input_producer/Const' type=Const, tf.Operation 'input/input_producer/Size' type=Const, tf.Operation 'input/input_producer/Greater/y' type=Const, tf.Operation 'input/input_producer/Greater' type=Greater, tf.Operation 'input/input_producer/Assert/Const' type=Const, tf.Operation 'input/input_producer/Assert/Assert/data_0' type=Const, tf.Operation 'input/input_producer/Assert/Assert' type=Assert, tf.Operation 'input/input_producer/Identity' type=Identity, tf.Operation 'input/input_producer/RandomShuffle' type=RandomShuffle, tf.Operation 'input/input_producer' type=FIFOQueueV2, tf.Operation 'input/input_producer/input_producer_EnqueueMany' type=QueueEnqueueManyV2, tf.Operation 'input/input_producer/input_producer_Close' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Close_1' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Size' type=QueueSizeV2, tf.Operation 'input/input_producer/Cast' type=Cast, tf.Operation 'input/input_producer/mul/y' type=Const, tf.Operation 'input/input_producer/mul' type=Mul, tf.Operation 'input/input_producer/fraction_of_32_full/tags' type=Const, tf.Operation 'input/input_producer/fraction_of_32_full' type=ScalarSummary, tf.Operation 'input/TFRecordReaderV2' type=TFRecordReaderV2, tf.Operation 'input/ReaderReadV2' type=ReaderReadV2,
------END INPUT OPS-----
----UPDATE 3----
I believe what I need to do is to kill the input section of the graph trained with TF Records and rewire the input to the first layer to a new input. Its kinda like performing surgery; but this is the only way I can find to do inference if I trained using TFRecords as crazy as it sounds...
Full Graph:
Section to kill:
So I think the question becomes: How does one kill the input section of the graph and replace it with a feed_dict?
A follow up to this would be: is this really the right way to do it? This seems bonkers.
----END UPDATE 3----
---link to checkpoint files---
https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015-12-11&sr=b&sig=oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D
--end link to checkpoint files---
-----UPDATE 4 -----
I gave in and just gave a shot at the 'normal' way of performing inference assuming I could have the scientists simply just pickle their models and we could grab the model pickle; unpack it and then run inference on it. So to test I tried the normal way assuming we already unpacked it...It doesn't work worth a beans either...
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
from scipy import misc
import time
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
new_saver.restore(sess, MODEL_PATH)
pred = tf.get_default_graph().get_operation_by_name('prediction')
enq = sess.graph.get_operation_by_name(enqueue_op)
#tf.train.start_queue_runners(sess)
print(rand)
print(pred)
print(enq)
for i in range(1, 25):
img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
img = img.reshape(1, 32, 32, 3)
print(sess.run(logits, feed_dict={images : img}))
time.sleep(3)
print('done')
run_inference()
Tensorflow ends up building a new graph with the inference function from the loaded model; then it appends all the other stuff from the other graph to the end of it. So then when I populate a feed_dict expecting to get inferences back; I just get a bunch of random garbage as if it were the first pass through the network...
Again; this seems nuts; do I really need to write my own framework for serializing and deserializing random networks? This has had to have been done before...
-----UPDATE 4 -----
Again; thanks!

Alright, this took way too much time to figure out; so here is the answer for the rest of the world.
Quick Reminder: I needed to persist a model that can be dynamically loaded and inferred against without knowledge as to the under pinnings or insides of how it works.
Step 1: Create a model as a Class and ideally use an interface definition
class Vgg3Model:
NUM_DENSE_NEURONS = 50
DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)
def inference(self, images):
'''
Portion of the compute graph that takes an input and converts it into a Y output
'''
with tf.variable_scope('Conv1') as scope:
C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
with tf.variable_scope('Dense1') as scope:
P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
dim = P_1.get_shape()[1].value
D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
with tf.variable_scope('Dense2') as scope:
D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
H = tf.nn.softmax(D_2, name='prediction')
return H
def loss(self, logits, labels):
'''
Adds Loss to all variables
'''
cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
cross_entr = tf.reduce_mean(cross_entr)
tf.summary.scalar('cost', cross_entr)
tf.add_to_collection('losses', cross_entr)
return tf.add_n(tf.get_collection('losses'), name='total_loss')
Step 2: Train your network with whatever inputs you want; in my case I used Queue Runners and TF Records. Note that this step is done by a different team which iterates, builds, designs and optimizes models. This can also change over time. The output they produce must be able to be pulled from a remote location so we can dynamically load the updated models on devices (reflashing hardware is a pain especially if it is geographically distributed). In this instance; the team drops the 3 files associated with a graph saver; but also a pickle of the model used for that training session
model = vgg3.Vgg3Model()
def create_sess_ops():
'''
Creates and returns operations needed for running
a tensorflow training session
'''
GRAPH = tf.Graph()
with GRAPH.as_default():
examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
batch_size=CONSTANTS.BATCH_SIZE,
img_shape=CONSTANTS.IMAGE_SHAPE,
num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]], name='infer/input')
logits = model.inference(examples)
loss = model.loss(logits, labels)
OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
gradients = OPTIMIZER.compute_gradients(loss)
apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
gradients_summary(gradients)
summaries_op = tf.summary.merge_all()
return [apply_gradient_op, summaries_op, loss, logits], GRAPH
def main():
'''
Run and Train CIFAR 10
'''
print('starting...')
ops, GRAPH = create_sess_ops()
total_duration = 0.0
with tf.Session(graph=GRAPH) as SESSION:
COORDINATOR = tf.train.Coordinator()
THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
SESSION.run(tf.global_variables_initializer())
SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
GRAPH_SAVER = tf.train.Saver()
for EPOCH in range(CONSTANTS.EPOCHS):
duration = 0
error = 0.0
start_time = time.time()
for batch in range(CONSTANTS.MINI_BATCHES):
_, summaries, cost_val, prediction = SESSION.run(ops)
error += cost_val
duration += time.time() - start_time
total_duration += duration
SUMMARY_WRITER.add_summary(summaries, EPOCH)
print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
print(
'Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration)
)
break
GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
pickle.dump(model, output)
COORDINATOR.request_stop()
COORDINATOR.join(THREADS)
Step 3: Run some Inference. Load your pickled model; create a new graph by piping in the new placeholder to the logits; and then call session restore. DO NOT RESTORE THE WHOLE GRAPH; JUST THE VARIABLES.
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
with open('models/vgg3.pkl', 'rb') as model_in:
model = pickle.load(model_in)
logits = model.inference(images)
def run_inference():
'''Runs inference against a loaded model'''
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
new_saver = tf.train.Saver()
new_saver.restore(sess, MODEL_PATH)
print("Starting...")
for i in range(20, 30):
print(str(i) + '.png')
img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
img = img.reshape(1, 32, 32, 3)
pred = sess.run(logits, feed_dict={images : img})
max_node = np.argmax(pred)
print('predicted label: ' + str(max_node))
print('done')
run_inference()
There definitely ways to improve on this using interfaces and maybe packaging up everything better; but this is working and sets the stage for how we will be moving forward.
FINAL NOTE When we finally pushed this to production, we ended up having to ship the stupid `mymodel_model.py file down with everything to build up the graph. So we now enforce a naming convention for all models and there is also a coding standard for production model runs so we can do this properly.
Good Luck!

While it's not as cut and dry as model.predict(), it's still really trivial.
In your model you should have a tensor that computes the final output you're interested in, let's name that tensor output. You may currently just have a loss function. If so create another tensor (variable in the model) that actually computes the output you want.
For example, if your loss function is:
tf.nn.sigmoid_cross_entropy_with_logits(last_layer_activation, labels)
And you expect your outputs to be in the range [0,1] per class, create another variable:
output = tf.sigmoid(last_layer_activation)
Now, when you call sess.run(...) just request the output tensor. Don't request the optimization OP you normally would to train it. When you request this variable tensorflow will do the minimum work necessary to produce the value (e.g. it won't bother with backprop, loss functions, and all that because a simple feed forward pass is all that's necessary to compute output.
So if you're creating a service to return inferences of the model you'll want to keep the model loaded in memory/gpu, and repeat:
sess.run(output, feed_dict={X: input_data})
You won't need to feed it the labels because tensorflow won't bother to compute ops that aren't needed to produce the output you are requesting. You don't have to change your model or anything.
While this approach might not be as obvious as model.predict(...) I'd argue that it's vastly more flexible. If you start playing with more complex models you'll probably learn to love this approach. model.predict() is like "thinking inside the box."

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can't fix torch autograd runtime error: UNet inplace operation - python

Related

Problems porting Tensorflow code to Keras

How to re-load a saved model (with graph?) to create the same results on future testing data?

How to get rid of Variable API in PyTorch.autograd?

PyTorch optimizer.step() function doesn't update weights

TensorFlow Inference

Categories

Resources