I am using TensorFlow to implement handwritten digit recognition. I want the logits argument of softmax_cross_entropy_with_logits to be a placeholder, and to feed the computed network output into that placeholder at training time. However, TensorFlow reports ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients. I know it works if I pass outputs directly as the logits, but if the logits have to be a placeholder first, how should I solve it?
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/home/as/downloads/resnet-152_mnist-master/mnist_dataset", one_hot=True)
from tensorflow.contrib.layers import fully_connected
x = tf.placeholder(dtype=tf.float32,shape=[None,784])
y = tf.placeholder(dtype=tf.float32,shape=[None,10])
hidden1 = fully_connected(x,100,activation_fn=tf.nn.elu,
                          weights_initializer=tf.random_normal_initializer())
hidden2 = fully_connected(hidden1,200,activation_fn=tf.nn.elu,
                          weights_initializer=tf.random_normal_initializer())
hidden3 = fully_connected(hidden2,200,activation_fn=tf.nn.elu,
                          weights_initializer=tf.random_normal_initializer())
outputs = fully_connected(hidden3,10,activation_fn=None,
                          weights_initializer=tf.random_normal_initializer())
a = tf.placeholder(tf.float32,[None,10])
loss = tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=a)
reduce_mean_loss = tf.reduce_mean(loss)
equal_result = tf.equal(tf.argmax(outputs,1),tf.argmax(y,1))
cast_result = tf.cast(equal_result,dtype=tf.float32)
accuracy = tf.reduce_mean(cast_result)
train_op = tf.train.AdamOptimizer(0.001).minimize(reduce_mean_loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(30000):
        xs,ys = mnist.train.next_batch(128)
        result = outputs.eval(feed_dict={x:xs})
        sess.run(train_op,feed_dict={a:result,y:ys})
        print(i)
In brief, the logits in your loss can't be a placeholder; they need to be a tensor that is computed inside the graph from the trainable variables. A placeholder has no connection to any variable, so the optimizer can't calculate a gradient w.r.t. any of them (see the error message).
Operations are "a graph node that performs computation on tensors", whereas a placeholder is a tensor that needs to be fed when evaluating the graph.
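To make the point concrete, here is a minimal pure-Python sketch of reverse-mode differentiation. It is not TensorFlow code, and Node, mul and gradient are hypothetical names made up for this illustration. It only shows why a value that is copied into a fresh leaf (which is what feeding a placeholder amounts to) has no path back to the variable, so its gradient comes out as None.

```python
class Node:
    """A scalar graph node that remembers what it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each parent entry is (parent_node, d(self)/d(parent)).
        self.parents = parents


def mul(a, b):
    """Multiply two nodes and record the local derivatives."""
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def gradient(loss, wrt):
    """Return d(loss)/d(wrt), or None if loss does not depend on wrt."""
    grads = {}

    def backprop(node, upstream):
        # Accumulate the contribution of every path from loss to node.
        grads[id(node)] = grads.get(id(node), 0.0) + upstream
        for parent, local in node.parents:
            backprop(parent, upstream * local)

    backprop(loss, 1.0)
    return grads.get(id(wrt))


w = Node(3.0)        # stands in for a trainable variable
x = Node(2.0)        # stands in for the input batch
outputs = mul(w, x)  # computed inside the graph, so it depends on w

# Case 1: the loss is built on outputs, so a path back to w exists.
loss_connected = mul(outputs, outputs)
grad_connected = gradient(loss_connected, w)

# Case 2: the value of outputs is copied into a fresh leaf,
# which is what feeding it through a placeholder does.
fed = Node(outputs.value)
loss_fed = mul(fed, fed)
grad_fed = gradient(loss_fed, w)

print(grad_connected)  # 24.0
print(grad_fed)        # None
```

Both losses have the same value (36.0), but only the first one carries a gradient for w.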
I don't really understand why you don't pass the outputs tensor directly as the logits, like so:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=outputs)
I could try to help further if you describe your specific use case.
I have a pre-trained Keras model "model_keras" and I want to use it in a loss function. The input of "model_keras" is the output of another TensorFlow model, "model_tf" (a generative model). I'm trying to update the weights of "model_tf" by minimizing the loss. During the optimization, "model_keras" is only used for inference and will not be updated. My problem is that I'm not able to get the correct inference result from "model_keras". Because of this, I'm not able to update "model_tf" correctly. The code is shown below:
def loss_func(input, target, model_keras): # the input is an output of another Tensorflow model.
    inference_res = model_keras(input)
    loss = tf.reduce_mean(inference_res-target)
    return loss
train_phase = tf.placeholder(tf.bool)
z = tf.placeholder(tf.float32, [None, 128])
y = tf.placeholder(tf.int32, [None])
t = tf.placeholder(tf.float32, [None, 10])
model_tf = Generator("generator") # Building the Tensorflow model "model_tf"
fake_img = model_tf(z, train_phase, y, NUMS_CLASS) # fake_img is the output of "model_tf" and will be served as the input of "model_keras"
model_keras = MyKerasModel("Vgg19") # Loading the pretrained Keras model
G_loss = loss_func(fake_img, t, model_keras)
G_opt = tf.train.AdamOptimizer(4e-4, beta1=0., beta2=0.9).minimize(G_loss, var_list=model_tf.var_list())
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(G_opt, feed_dict={z: Z, train_phase: True, y: Y, t: target}) # Z, Y and target are numpy arrays.
I also tried to use model.predict(input) but got the ValueError: "When feeding symbolic tensors to a model, we expect the tensors to have a static batch size". The reason is that model.predict() expects actual data (for example a numpy array) instead of a symbolic tensor. However, since I want to update the weights of "model_tf", I need to keep the loss function differentiable and compute the gradients. Therefore, I cannot just pass a numpy array to "model_keras".
How can I get the correct output (inference_res) of "model_keras" in this case? The TensorFlow and Keras versions I'm using are 1.15 and 2.2.5, respectively.
If I understood your question, here is an idea. Pass your input to model_keras and name the output keras_y. Then freeze model_keras and append it to the end of model_tf, so that you have one big model which is model_tf followed by model_keras (the second part being frozen). Next, give the inputs to this combined model and name the output model_y. Now you can compute the loss as loss_func(keras_y, model_y).
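As a rough illustration of what "frozen second part" means for the gradients, here is a pure-Python sketch with scalar stand-ins. It is not TensorFlow or Keras code; generator_weight, frozen_weight and the squared-error loss are assumptions chosen only for this example. The gradient flows through the frozen stage via the chain rule, but only the first stage's parameter is updated.

```python
# Hypothetical scalar stand-ins: "model_tf" is u = generator_weight * z,
# "model_keras" is out = frozen_weight * u. Only generator_weight is trained.
generator_weight = 0.0
frozen_weight = 2.0
z = 1.5        # input to the first stage
target = 3.0   # desired output of the second stage
learning_rate = 0.05


def forward(gen_w):
    """Run both stages and return the squared-error loss."""
    out = frozen_weight * (gen_w * z)
    return (out - target) ** 2


initial_loss = forward(generator_weight)

for _ in range(100):
    out = frozen_weight * (generator_weight * z)
    # Chain rule through the frozen stage:
    # d(loss)/d(gen_w) = 2 * (out - target) * frozen_weight * z
    grad = 2.0 * (out - target) * frozen_weight * z
    generator_weight -= learning_rate * grad  # frozen_weight is never touched

final_loss = forward(generator_weight)
print(generator_weight, frozen_weight, final_loss)
```

In TensorFlow 1.x the same effect is what var_list=model_tf.var_list() in the question is meant to achieve: the loss depends on both models, but the optimizer only updates the listed variables.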
If I define this simple Keras model
import tensorflow as tf
from tensorflow import keras
import numpy as np
l1 = keras.layers.Input(shape=(32,))
l2 = keras.layers.Dense(10)(l1)
model = keras.Model(inputs=l1, outputs=l2)
model.compile(loss='mse', optimizer='adam')
Let's say I have the input and label values stored in train_examples and train_labels respectively.
Suppose I also define a variable some_var that depends on that model's loss (I just use model.total_loss here for the sake of this example):
some_var = model.total_loss
How do I evaluate the value for some_var? I know it should be something like:
with keras.backend.get_session() as sess:
    sess.run(some_var, feed_dict={ ?: train_examples, ?: train_labels })
What should go in place of the question marks?
I don't want to modify the model's loss function, just use whatever has been defined there in the definition of another variable.
Thank you in advance.
How do I evaluate the value for some_var?
some_var = model.total_loss
train_examples = np.ones((1,32))
train_labels=np.ones((1,10))
with keras.backend.get_session() as sess:
    loss = sess.run(some_var, feed_dict={ 'input_1:0': train_examples,
                                          'dense_target:0' : train_labels })
    print(f'loss is {loss}')
loss is 3.1767444610595703
Regarding the feed_dict keys:
For a Keras input layer the key is layer_name:0, which is input_1:0 in the example above (the layer names assigned by Keras can be read from model.summary()).
When a required key:value pair is not provided in feed_dict, TensorFlow throws an error that names the missing placeholder:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'dense_target' with dtype float and shape [?,?]
For a long answer on why ':0' is needed in the tensor name, see http://stackoverflow.com/a/37870634/419116
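In short, a TensorFlow tensor name has the form op_name:output_index, so 'input_1:0' means "output 0 of the op named input_1". A small helper makes the structure explicit. Note that split_tensor_name is a hypothetical function written for this illustration, not part of TensorFlow or Keras.

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:output_index' into (op_name, output_index)."""
    op_name, _, index = tensor_name.rpartition(":")
    return op_name, int(index)


print(split_tensor_name("input_1:0"))       # ('input_1', 0)
print(split_tensor_name("dense_target:0"))  # ('dense_target', 0)
```

This is also why the error message above mentions 'dense_target' without the suffix: it names the placeholder op, while feed_dict needs the tensor that the op produces.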
Since you want to "evaluate" and not use it in some further graph calculations, you can simply use a callback:
from keras.callbacks import LambdaCallback
def getLoss(epoch, logs):
    print(logs['loss']) #or val_loss (print the keys of logs if in doubt)
callback = LambdaCallback(on_epoch_end = getLoss)
Use this callback when fitting:
model.fit(x, y, callbacks = [callback])
If you can wait until training finishes, you can simply get the history at the end:
historyCallback = model.fit(x, y)
print(historyCallback.history['loss'])
For practice, I wanted to implement a model in TensorFlow that gives me back the square of the input. My code works correctly, but when I look at the computation graph in TensorBoard, the LOSS operation is connected neither to the Gradients subgraph nor to Adam. Why is this? As I understand it, to compute the gradients, TensorFlow has to differentiate the loss.
Here is my code:
import numpy as np
import tensorflow as tf
np_inp = np.array([3, 6, 4, 2, 9, 11, 0.48, 22, -2.3, -0.48])
np_outp = np.power(np_inp, 2)
inputs = tf.Variable(np_inp, name='input', trainable=False)
outputs = tf.Variable(np_outp, name='output', trainable=False)
multiplier = tf.Variable(0.1,
                         dtype=tf.float64, trainable=True, name='multiplier')
mul = inputs * multiplier
predict = tf.square(mul, name='prediction')
loss = tf.math.reduce_sum(tf.math.square(predict-outputs), name='LOSS')
optimizer = tf.train.AdamOptimizer(0.1)
to_minimize = optimizer.minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
logs_path = "./logs/unt" # path to the folder that we want to save the logs for Tensorboard
train_writer = tf.summary.FileWriter(logs_path, sess.graph)
for i in range(100):
    sess.run(to_minimize)
print(sess.run({'mult':multiplier}))
Tensorboard:
https://gofile.io/?c=jxbWiG
Thanks in advance!
This can be counterintuitive, but the actual value of the loss is not used for the training itself (although it can be useful to plot it to track progress). What optimizers generally use is the gradient, that is, how a change in each variable would affect the loss value. To compute this, a tensor with the same shape as LOSS but filled with ones is created, and the gradient of each operation is computed from it through back-propagation. If you open the gradients box in the graph, you will see a LOSS_grad box representing this.
That box is a couple of nodes that build the tensor of ones, because the gradient of something with respect to itself is always one. From there, the rest of the gradients are computed.
I used tf.keras to build a fully-connected ANN, "my_model". Now I'm trying to minimize a function f(x) = my_model.predict(x) - 0.5 + g(x) using the Adam optimizer from TensorFlow. I tried the code below:
x = tf.get_variable('x', initializer = np.array([1.5, 2.6]))
f = my_model.predict(x) - 0.5 + g(x)
optimizer = tf.train.AdamOptimizer(learning_rate=.001).minimize(f)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(50):
        print(sess.run([x,f]))
        sess.run(optimizer)
However, I'm getting the following error when my_model.predict(x) is executed:
If your data is in the form of symbolic tensors, you should specify the steps argument (instead of the batch_size argument)
I understand what the error is but I'm unable to figure out how to make my_model.predict(x) work in the presence of symbolic tensors. If my_model.predict(x) is removed from the function f(x), the code runs without any error.
I checked a related question where TensorFlow optimizers are used to minimize an arbitrary function, but I think my problem is with the usage of the underlying Keras model.predict() function. I appreciate any help. Thanks in advance!
I found the answer!
Basically, I was trying to optimize a function involving a trained ANN w.r.t the input variables to the ANN. So, all I wanted was to know how to call my_model and put it in f(x). Digging a bit into the Keras documentation here: https://keras.io/getting-started/functional-api-guide/, I found that all Keras models are callable just like the layers of the models! Quoting the information from the link,
..you can treat any model as if it were a layer, by calling it on a
tensor. Note that by calling a model you aren't just reusing the
architecture of the model, you are also reusing its weights.
Meanwhile, the model.predict(x) part expects x to be numpy arrays or evaluated tensors and does not take tensorflow variables as inputs (https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict).
So the following code worked:
## initializations
sess = tf.InteractiveSession()
x_init_value = np.array([1.5, 2.6])
x_placeholder = tf.placeholder(tf.float32)
x_var = tf.Variable(x_init_value, dtype=tf.float32)
# Check calling my_model
assign_step = tf.assign(x_var, x_placeholder)
sess.run(assign_step, feed_dict={x_placeholder: x_init_value})
model_output = my_model(x_var) # This simple step is all I wanted!
sess.run(model_output) # This outputs my_model's predicted value for input x_init_value
# Now, define the objective function that has to be minimized
f = my_model(x_var) - 0.5 + g(x_var) # g(x_var) is some function of x_var
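# Define the optimizer and the training op (only x_var is updated)
optimizer = tf.train.AdamOptimizer(learning_rate=.001)
train_op = optimizer.minimize(f, var_list=[x_var])
# Initialize only Adam's own state, so that the trained weights of my_model are not reset
sess.run(tf.variables_initializer(optimizer.variables()))
# Run the optimization steps
for i in range(50): # for 50 steps
    _, loss = sess.run([train_op, f])
    print("step: ", i+1, ", loss: ", loss, ", X: ", x_var.eval())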
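For readers who want to see what the optimization w.r.t. the inputs does numerically, here is a self-contained plain-Python sketch of the Adam update applied to an input vector. It does not use TensorFlow or Keras: model and g below are hypothetical stand-ins (simple quadratics with hand-written gradients) for the trained network and for g(x), and the step count and learning rate are chosen only for this example.

```python
import math


def model(x):
    """Hypothetical stand-in for the trained, fixed network."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def g(x):
    """Hypothetical stand-in for g(x)."""
    return 0.1 * (x[0] ** 2 + x[1] ** 2)


def f(x):
    """Objective from the question: model(x) - 0.5 + g(x)."""
    return model(x) - 0.5 + g(x)


def grad_f(x):
    """Hand-written gradient of f with respect to the inputs."""
    return [2.0 * (x[0] - 1.0) + 0.2 * x[0],
            2.0 * (x[1] + 2.0) + 0.2 * x[1]]


def adam_minimize(x0, steps=5000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update, applied to the inputs instead of weights."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimates
    v = [0.0] * len(x)  # second-moment estimates
    for t in range(1, steps + 1):
        grad = grad_f(x)
        for i, g_i in enumerate(grad):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_start = [1.5, 2.6]
x_final = adam_minimize(x_start)
print(x_final, f(x_final))
```

Only x changes during the loop; the stand-in model is never modified, which mirrors passing var_list=[x_var] in the TensorFlow version.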
After going through some Stack Overflow questions and the Keras documentation, I managed to write some code that tries to evaluate the gradient of the output of a neural network w.r.t. its inputs. The purpose is a simple exercise: approximating a bivariate function (f(x,y) = x^2+y^2), using as the loss the difference between analytical and automatic differentiation.
Combining answers from two questions ("Keras custom loss function: Accessing current input pattern" and "Getting gradient of model output w.r.t weights using Keras"), I came up with this:
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Activation, Input
def custom_loss(input_tensor):
    outputTensor = model.output
    listOfVariableTensors = model.input
    gradients = K.gradients(outputTensor, listOfVariableTensors)
    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())
    evaluated_gradients = sess.run(gradients,feed_dict={model.input:input_tensor})
    grad_pred = K.add(evaluated_gradients[0], evaluated_gradients[1])
    grad_true = K.add(K.scalar_mul(2, model.input[0][0]), K.scalar_mul(2, model.input[0][1]))
    return K.square(K.subtract(grad_pred, grad_true))
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
out = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, out)
model.compile(loss=custom_loss(input_tensor), optimizer='adam')
Which yields the error: TypeError: The value of a feed cannot be a tf.Tensor object. because of feed_dict={model.input:input_tensor}. I understand the error, I just don't know how to fix it.
From what I gathered, I can't simply pass input data into the loss function, it must be a tensor. I realized Keras would 'understand' it when I call input_tensor. This all just leads me to think I'm doing things the wrong way, trying to evaluate the gradient like that. Would really appreciate some enlightenment.
I don't really understand why you want this loss function, but I will provide an answer anyway. Also, there is no need to evaluate the gradient within the function (in fact, you would be "disconnecting" the computational graph). The loss function could be implemented as follows:
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Input
def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        gradients = K.gradients(output_tensor, input_tensor)
        grad_pred = K.sum(gradients, axis=-1)
        grad_true = K.sum(2*input_tensor, axis=-1)
        return K.square(grad_pred - grad_true)
    return loss
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
output_tensor = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor, output_tensor), optimizer='adam')
A Keras loss must have y_true and y_pred as inputs. You can try adding your input object as both x and y during the fit:
def custom_loss(y_true,y_pred):
    ...
    return K.square(K.subtract(grad_true, grad_pred))
...
model.compile(loss=custom_loss, optimizer='adam')
model.fit(X, X, ...)
This way, y_true will be the batch being processed at each iteration from the input X, while y_pred will be the output of the model for that particular batch.