I am new to tensorflow and I am building a network but failing to compute/apply the gradients for it. I get the error:
ValueError: No gradients provided for any variable: ((None, tensorflow.python.ops.variables.Variable object at 0x1025436d0), ... (None, tensorflow.python.ops.variables.Variable object at 0x10800b590))
I tried using a tensorboard graph to see if there`s was something that made it impossible to trace the graph and get the gradients but I could not see anything.
Here`s part of the code:
sess = tf.Session()
X = tf.placeholder(type, [batch_size,feature_size])
W = tf.Variable(tf.random_normal([feature_size, elements_size * dictionary_size]), name="W")
target_probabilties = tf.placeholder(type, [batch_size * elements_size, dictionary_size])
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * number_of_layers)
initial_state = state = stacked_lstm.zero_state(batch_size, type)
output, state = stacked_lstm(X, state)
pred = tf.matmul(output,W)
pred = tf.reshape(pred, (batch_size * elements_size, dictionary_size))
# instead of calculating this, I will calculate the difference between the target_W and the current W
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(target_probabilties, pred)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
sess.run(optimizer, feed_dict={X:my_input, target_probabilties:target_prob})
I will appreciate any help on figuring this out.
I always have the tf.nn.softmax_cross_entropy_with_logits() used so that I have the logits as first argument and the labels as second. Can you try this?
Related
I'm trying to produce adversarial examples for a semantic segmentation classifier, which involves optimising an image based using the gradients of the loss with respect to the input image variable (where loss is between the current and goal network outputs).
However, no matter what I've tried, I can't seem to create the graph in a way that allows these gradients to be calculated. I need to ensure that the calculated network output for each iteration of the image is not disconnected from the loss.
Here is the code. I've not included absolutely everything as it would be nightmarishly long. The model builder is a method from the code suite I'm trying to adapt. I'm sure that this must be some kind of trivial misunderstanding on my part.
#From elsewhere - x is the processed input image and yg is calculated using argmin on the output
#of a previous run through the network.
x = self.xclean
self.get_ygoal()
yg = self.ygoal
yg = tf.convert_to_tensor(yg)
tf.reset_default_graph()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
#sess.run(tf.global_variables_initializer())
net_input = tf.placeholder(tf.float32,shape=[None,None,None,3])
net_output = tf.placeholder(tf.float32,shape=[None,None,None,self.num_classes])
network, _ = model_builder.build_model(self.model, net_input=net_input,
num_classes=self.num_classes,
crop_width=self.dims[0],
crop_height=self.dims[1],
is_training=True)
print('Loading model checkpoint weights')
checkpoint_path = 'checkpoints/latest_model_'+self.model+'_'+self.dataset+'.ckpt'
saver=tf.train.Saver(max_to_keep=1000)
saver.restore(sess, checkpoint_path)
img = tf.Variable(tf.zeros(shape=(1,self.dims[0], self.dims[1], 3)),name='img')
assign = tf.assign(img,net_input)
learning_rate = tf.constant(lr,dtype=tf.float32)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=network, labels=net_output)
optim_step = tf.compat.v1.train.GradientDescentOptimizer(learning_rate).minimize(loss, var_list=[img])
epsilon_ph = tf.placeholder(tf.float32, ())
below = net_input - epsilon_ph
above = net_input + epsilon_ph
projected = tf.clip_by_value(tf.clip_by_value(img, below, above), 0, 1)
with tf.control_dependencies([projected]):
project_step = tf.assign(img, projected)
sess.run(assign, feed_dict={net_input: x})
for i in range(steps):
print('Starting...')
# gradient descent step
_, loss_value = sess.run([optim_step], feed_dict={net_input:x,net_output:yg})
# project step
sess.run(project_step, feed_dict={net_input: x, epsilon_ph: epsilon})
if (i+1) % 10 == 0:
print('step %d, loss=%g' % (i+1, loss_value))
adv = img.eval() # retrieve the adversarial example
Here's the error message I get:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'img:0' shape=(1, 512, 512, 3) dtype=float32_ref>"] and loss Tensor("softmax_cross_entropy_with_logits/Reshape_2:0", shape=(?, ?, ?), dtype=float32).
I should mention that this is using Tensorflow 1.14 - as the code suite is built around it.
Thanks in advance.
I have a pre-trained model that I load and it effectively works (i.e. I can make predictions). I want to get the gradients of the model for a certain parameter, however I cannot manage to get any meaningful results. Always a Noneoutput.
My code:
sess = tf.Session()
K.set_session(sess)
x = X_test[0].reshape(1,100)
y = np.reshape(Y_test[0], (1,1))
tf_y = tf.convert_to_tensor(y,dtype=np.float32)
model2 = ClassificationModel(config, logging).model
model2.load_weights("class_models/model.382-0.46-0.87.h5")
# predict real x_test
y_hat = model2.predict(x)
tf_y_hat = tf.convert_to_tensor(y_hat, dtype=np.float32)
loss = keras.losses.binary_crossentropy(tf_y,tf_y_hat)
grad, = K.gradients(loss,x)
print(grad)
And the output I get for the print is None. What am I doing wrong? How do I get the gradient given my model?
With your current code, tensorflow cannot connect x to the computational graph of loss since loss is created from a numpy array (y_hat) and x is also just a numpy array. The following code should work instead:
tf_x = tf.convert_to_tensor(x, dtype=np.float32)
loss = tf.keras.losses.binary_crossentropy(tf_y, model2(tf_x))
grad, = K.gradients(loss, tf_x)
I've wrote my first Tensorflow program ( using my own data) . It works well at least it doesn't crash! but I'm getting a wired accuracy values either 0 oder 1 ?
.................................
the previous part of the code, is only about handeling csv file an getting Data in correct format / shapes
......................................................
# Tensoflow
x = tf.placeholder(tf.float32,[None,len(Training_Data[0])],name='Train_data')# each input has a 457 lenght
y_ = tf.placeholder(tf.float32,[None, numberOFClasses],name='Labels')#
#w = tf.Variable(tf.zeros([len(Training_Data[0]),numberOFClasses]),name='Weights')
w = tf.Variable(tf.truncated_normal([len(Training_Data[0]),numberOFClasses],stddev=1./10),name='Weights')
b = tf.Variable(tf.zeros([numberOFClasses]),name='Biases')
model = tf.add(tf.matmul(x,w),b)
y = tf.nn.softmax(model)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for j in range(len(train_data)):
if(np.shape(train_data) == (batchSize,numberOFClasses)):
sess.run(train_step,feed_dict={x:train_data[j],y_:np.reshape(train_labels[j],(batchSize,numberOFClasses)) })
correct_prediction = tf.equal(tf.arg_max(y,1),tf.arg_max(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,"float"))
accuracy_vector= []
current_class =[]
for i in range(len(Testing_Data)):
if( np.shape(Testing_Labels[i]) == (numberOFClasses,)):
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
current_class.append(int(Test_Raw[i][-1]))
ploting theaccuracy_vector delivers the following :
[]
any idea what I'm missing here ?
thanks a lot for any hint !
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
tf.nn.softmax_cross_entropy_with_logits wants unscaled logits.
From the doc:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
this means that the line y = tf.nn.softmax(model) is wrong.
Instead, you want to pass unscaled logits to that function, thus:
y = model
Moreover, once you fix this problem, if the network doesn't work, try to lower the learning rate from 0.01 to something about 1e-3 or 1e-4. (I tell you this because 1e-2 usually is an "high" learning rate)
You're testing on batches of size 1, so either the prediction is good or it's false, so you can only get 0 or 1 accuracy:
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i],(1,457)),y_:np.reshape(Testing_Labels[i],(1,19))}))#,i)#,Test_Labels[i])
Just use a bigger batch size :
accuracy_vector.append(sess.run(accuracy,feed_dict={x:np.reshape(Testing_Data[i:i+batch_size],(batch_size,457)),y_:np.reshape(Testing_Labels[i:i+batch_size],(batch_size,19))}))
Suppose I want to train a model over a number of samples (and also variables) that are known only at run time.
Case study: PCA (X W = Y)
(this is only a simplification of a much more complex model)
Take for example this simple PCA model, where only the features dimensions (Din and Dout) are known and fixed.
W = tf.Variable(tf.zeros([D_in, D_out]), name='weights', trainable=True)
X = tf.placeholder(tf.float32, [None, D_in], name='placeholder_latent')
Y_est = tf.matmul(X, W)
loss = tf.reduce_sum((Y_tf-Y_est)**2)
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
Suppose now we generate some data
W_true = np.random.randn(D_in, D_out)
X_true = np.random.randn(N, D_in)
Y_true = np.dot(X_true, W_true)
Y_tf = tf.constant(Y_true.astype(np.float32))
As soon as I know the dimension of my training data, I can declare the latent variable that will be fed to the placeholder X to be optimised.
latent = tf.Variable(tf.zeros([N, D_in]), name='latent', trainable=True)
init_op = tf.global_variables_initializer()
After that, what I would like to do is to feed this latent variable to the placeholder X and run the optimisation.
with tf.Session() as sess:
sess.run(init_op)
for n in range(10000):
sess.run(train_step, feed_dict={X : sess.run(latent)})
if (n+1) % 1000 == 0:
print('iter %i, %f' % (n+1, sess.run(loss, feed_dict={X : sess.run(latent)})))
The problem is that the optimiser does not optimise W and latent, but only W. I have tried also to feed the variable directly without evaluation but I get this error:
ValueError: setting an array element with a sequence.
Have you ever encountered this kind of issue? Do you know how to overcome this problem? Are there any possible workaround to optimise on a placeholder?
By the way, I am using TensorFlow 1.1.0rc0 with Python 2.7.13
I have implemented a bi-directional RNN in TensorFlow using a BasicLSTMCell and rnn.bidirectional_rnn. I am calculating the loss using seq2seq.sequence_loss_by_example after concatenating the outputs I receive. My application is a next character predictor.
I getting an extremely low cost, (~50 times lesser than the unidirectional RNN). I suspect I've made a mistake in the seq2seq.sequence_loss_by_example step.
Here is my model -
# Model begins
cell_fn = rnn_cell.BasicLSTMCell
cell = fw_cell = cell_fn(args.rnn_size, state_is_tuple=True)
cell2 = bw_cell = cell_fn(args.rnn_size, state_is_tuple=True)
input_data = tf.placeholder(tf.int32, [args.batch_size, args.seq_length])
targets = tf.placeholder(tf.int32, [args.batch_size, args.seq_length])
initial_state = fw_cell.zero_state(args.batch_size, tf.float32)
initial_state2 = bw_cell.zero_state(args.batch_size, tf.float32)
with tf.variable_scope('rnnlm'):
softmax_w = tf.get_variable("softmax_w", [2*args.rnn_size, args.vocab_size])
softmax_b = tf.get_variable("softmax_b", [args.vocab_size])
with tf.device("/cpu:0"):
embedding = tf.get_variable("embedding", [args.vocab_size, args.rnn_size])
input_embeddings = tf.nn.embedding_lookup(embedding, input_data)
inputs = tf.unpack(input_embeddings, axis=1)
outputs, last_state, last_state2 = rnn.bidirectional_rnn(fw_cell,
bw_cell,
inputs,
initial_state_fw=initial_state,
initial_state_bw=initial_state2,
dtype=tf.float32)
output = tf.reshape(tf.concat(1, outputs), [-1, 2*args.rnn_size])
logits = tf.matmul(output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)
loss = seq2seq.sequence_loss_by_example([logits],
[tf.reshape(targets, [-1])],
[tf.ones([args.batch_size * args.seq_length])],
args.vocab_size)
cost = tf.reduce_sum(loss) / args.batch_size / args.seq_length
lr = tf.Variable(0.0, trainable=False)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
args.grad_clip)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads, tvars))
I think there is no any mistake in your code.
The problem is the objective function with the Bi-RNN model in your application (next character predictor).
The unidirectional RNN (such as ptb_word_lm or char-rnn-tensorflow), it is really a model used for the prediction, for example, if raw_text is 1,3,5,2,4,8,9,0, then, your inputs and target will be:
inputs: 1,3,5,2,4,8,9
target: 3,5,2,4,8,9,0
and the prediction is (1)->3, (1,3)->5, ..., (1,3,5,2,4,8,9)->0
But in Bi-RNN, the first prediction is really not just (1)->3, because the output[0] in your code contians the reverse information of the raw_text by use bw_cell (also not (1,3)->5, ..., (1,3,5,2,4,8,9)->0). A similar example is: I tell you that flower is a rose, and than I let you to predict what the flower is? I think you can give me the right answer very easy, and this is also the reason why you getting an extremely low loss in your Bi-RNN model for the application.
In fact, I think Bi-RNN (or Bi-LSTM) is not an appropriate model for the application of next character predictor. Bi-RNN need the full sequence when it works, you will find you can't use this model easily when you want to predict the next character.