Variable has 'None' for gradient

Variable has 'None' for gradient - python

I am currently using TensorFlow version 1.14.
In the code below, I am trying to create a dummy model that takes in 2 inputs and provides two outputs, with all weights set to ones and biases to zeros (Single layered perceptron). I am defining a custom loss function that computes the jacobian of the input layer wrt the output layer.
def my_loss(y_pred, y_true):
jacobian_tf = jacobian_tensorflow3(sim.output, sim.input)
loss = tf.abs(tf.linalg.det(jacobian_tf))
return K.mean(loss)
def jacobian_tensorflow3(x,y, verbose=False):
jacobian_matrix = []
it = tqdm(range(ndim)) if verbose else range(ndim)
for o in it:
grad_func = tf.gradients(x[:,o], y)
jacobian_matrix.append(grad_func[0])
jacobian_matrix = tf.stack(jacobian_matrix)
jacobian_matrix1 = tf.transpose(jacobian_matrix, perm=[1,0,2])
return jacobian_matrix1
sim = Sequential()
sim.add(Dense(2, kernel_initializer='ones', bias_initializer='zeros', activation='linear', input_dim=2))
sim.compile(optimizer='adam', loss=my_loss)
sim.fit(B, np.random.random(B.shape), batch_size=100, epochs=2)
While this model works in giving the result of the Jacobian matrix and also has no issues with compilation, but when I run sim.fit I get the following error:
ValueError: Variable <tf.Variable 'dense_14/bias:0' shape=(2,) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I am stuck at this step for a long time, and I am not able to proceed ahead. Any help/suggestions would be beneficial.

Related

Subset model outputs in custom loss function in tensorflow/keras

I am interested in using a neural network to estimate the parameters of a linear regression. To do this I am creating a network that makes two-parameter prediction, and I am trying to write a custom loss function that will determine how well the two parameters do as a slope and intercept in a logistic regression model, using a third dataset as a predictor in the logistic regression.
So I have a matrix of predictors X, with dimensions 10,000 by 20, and a binary outcome variable y. Additionally, I have a 10,000 observations linear_predictor that I want to use to use in the custom loss function evaluate the two outputs of the model.
import numpy as np
from tensorflow.keras import Model, Input
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense
import tensorflow as tf
# create some dummy data
X = np.random.rand(10_000, 20)
y = (np.random.rand(10_000) > 0.8).astype(int)
linear_predictor = np.random.rand(10_000)
# define custom loss function
def CustomLoss(y_true, y_pred, input_):
y_estim = y_pred[:,0]*input_ + y_pred[:,1]
y_estim = tf.gather(y_pred, 0, axis=1)*input_ + tf.gather(y_pred, 1, axis=1)
return tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, y_estim)
# create inputs to model
lp_input = Input(shape=linear_predictor.shape)
X_input = Input(shape=X.shape)
y_input = Input(shape=y.shape)
# create network
hidden1 = Dense(32, activation='relu')(X_input)
hidden2 = Dense(8, activation='relu')(hidden1)
output = Dense(2, activation='linear')(hidden2)
model = Model([y_input, X_input, lp_input], output)
# add loss function
model.add_loss(CustomLoss(y_input, output, lp_input))
# fit model
model.fit(x=X_input, y=y_input, epochs=3)
However, I am unable to get the CustomLoss function to work. Something is going wrong with subsetting the model's two-parameter output to get one parameter to use as a scalar as the slope and another to use as the intercept.
The error I am getting is:
ValueError: Exception encountered when calling layer "tf.math.multiply_1" (type TFOpLambda).
Dimensions must be equal, but are 2 and 10000 for '{{node tf.math.multiply_1/Mul}} = Mul[T=DT_FLOAT](
Placeholder, Placeholder_1)' with input shapes: [?,2], [?,10000].
Call arguments received by layer "tf.math.multiply_1" (type TFOpLambda):
• x=tf.Tensor(shape=(None, 2), dtype=float32)
• y=tf.Tensor(shape=(None, 10000), dtype=float32)
• name=None
This suggests that the variable y_pred is not being subset, even though I have tried using the method recommended here with numpy-like indexing (y_pred[:1]) as well as the gather_nd method here, among others.
I think this should be possible, any help is appreciated.

Using Gradient Tape for Jacobian of LSTM model - Python

I am building a sequence to one model prediction using LSTM. My data has 4 input variables and 1 output variable which needs to be predicted. The data is a time series data. The total length of the data is 38265 (total number of timesteps). The total data is in a Data Frame of size 38265 *5
I want to use the previous 20 timesteps data of the 4 input variables to make prediction of my output variable. I am using the below code for this purpose.
model = Sequential()
model.add(LSTM(units = 120, activation ='relu', return_sequences = False,input_shape =
(train_in.shape[1],5)))
model.add(Dense(100,activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(1))
I want to calculate the Jacobian of the output variable w.r.t the LSTM model function using tf.Gradient Tape .. Can anyone help me out with this??

The solution to segregate the Jacobian of the output with respect to the LSTM input can be done as follows:
Using tf.GradientTape(), we can compute the Jacobian arising from the gradient flow.
However for getting the Jacobian , the input needs to be in the form of tf.EagerTensor which is usually available when we want to see the Jacobian of the output (after executing y=model(x)). The following code snippet shares this idea:
#Get the Jacobian for each persistent gradient evaluation
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu'))
model.add(tf.keras.layers.Dense(2,activation='relu'))
x = tf.constant([[5., 6., 3.]])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
y = model(x)
loss = tf.reduce_mean(y**2)
print('Gradients\n')
jacobian_wrt_loss=tape.jacobian(loss,x)
print(f'{jacobian_wrt_loss}\n')
jacobian_wrt_y=tape.jacobian(y,x)
print(f'{jacobian_wrt_y}\n')
But for getting intermediate outputs ,such as in this case, there have been many samples which use Keras. When we separate the outputs coming out from model.layers.output, we get the type to be a Keras.Tensor instead of an EagerTensor.
However for creating the Jacobian, we need the Eager Tensor. (After many failed attempts with #tf.function wrapping as eager execution is already present in TF>2.0)
So alternatively, an auxiliary model can be created with the layers required (in this case, just the Input and LSTM layers).The output of this model will be a tf.EagerTensor which will be useful for the Jacobian tensor creation. The following has been shown in this snippet:
#General Syntax for getting jacobians for each layer output
import numpy as np
import tensorflow as tf
tf.executing_eagerly()
x=tf.constant([[15., 60., 32.]])
x_inp = tf.keras.layers.Input(tensor=tf.constant([[15., 60., 32.]]))
model=tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_2'))
aux_model=tf.keras.Sequential()
aux_model.add(tf.keras.layers.Dense(2,activation='relu',name='dense_1'))
#model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
# Forward pass
tape.watch(x)
x_y = model(x)
act_y=aux_model(x)
print(x_y,type(x_y))
ops=[layer.output for layer in model.layers]
# ops=[layer.output for layer in model.layers]
# inps=[layer.input for layer in model.layers]
print('Jacobian of Full FFNN\n')
jacobian=tape.jacobian(x_y,x)
print(f'{jacobian[0]}\n')
print('Jacobian of FFNN with Just first Dense\n')
jacobian=tape.jacobian(act_y,x)
print(f'{jacobian[0]}\n')
Here I have used a simple FFNN consisting of 2 Dense layers, but I want to evaluate w.r.t the output of the first Dense layer. Hence I created an auxiliary model having just 1 Dense layer and determined the output of the Jacobian from it.
The details can be found here.

With the help from #Abhilash Majumder, I have done it this way. I am posting it here so that it might help someone in the future.
import numpy as np
import pandas as pd
import tensorflow as tf
tf.compat.v1.enable_eager_execution() #This will enable eager execution which is must.
tf.executing_eagerly() #check if eager execution is enabled or not. Should give "True"
data = pd.read_excel("FileName or Location ")
#My data is in the from of dataframe with 127549 rows and 5 columns(127549*5)
a = data[:20] #shape is (20,5)
b = data[50:70] # shape is (20,5)
A = [a,b] # making a list
A = np.array(A) # convert into array size (2,20,5)
At = tf.convert_to_tensor(A, np.float32) #convert into tensor
At.shape # TensorShape([Dimension(2), Dimension(20), Dimension(5)])
model = load_model('EKF-LSTM-1.h5') # Load the trained model
# I have a trained model which is shown in the question above.
# Output of this model is a single value
with tf.GradientTape(persistent=True,watch_accessed_variables=True) as tape:
tape.watch(At)
y1 = model(At) #defining your output as a function of input variables
print(y1,type(y1)
#output
tf.Tensor([[0.04251503],[0.04634088]], shape=(2, 1), dtype=float32) <class
'tensorflow.python.framework.ops.EagerTensor'>
jacobian=tape.jacobian(y1,At) #jacobian of output w.r.t both inputs
jacobian.shape
Outupt
TensorShape([Dimension(2), Dimension(1), Dimension(2), Dimension(20), Dimension(5)])
Here I calculated Jacobian w.r.t 2 inputs each of size (20,5). If you want to calculate w.r.t to only one input of size (20,5), then use this
jacobian=tape.jacobian(y1,At[0]) #jacobian of output w.r.t only 1st input in 'At'
jacobian.shape
Output
TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(20), Dimension(5)])

For those looking to compute the Jacobian over a series of inputs and outputs that are independent of each other for input[i], output[j], i != j, consider the batch_jacobian method.
This will reduce the number of dimensions in your computed Jacobian tensor by one and could be the difference between running out of memory and not.
See: batch_jacobian in the TensorFlow GradientTape docs.

Keras loss function value error: ValueError: An operation has `None` for gradient. on LSTM network

So I'm trying to train my LSTM network language model, and use a perplexity function as my loss function but i get the following error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My loss function looks as follows:
from keras import backend as K
def perplexity_raw(y_true, y_pred):
"""
The perplexity metric. Why isn't this part of Keras yet?!
https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
https://github.com/keras-team/keras/issues/8267
"""
# cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
cross_entropy = K.cast(K.equal(K.max(y_true, axis=-1),
K.cast(K.argmax(y_pred, axis=-1), K.floatx())),
K.floatx())
perplexity = K.exp(cross_entropy)
return perplexity
and I create my model as follows:
# define model
model = Sequential()
model.add(Embedding(vocab_size, 500, input_length=max_length-1))
model.add(LSTM(750))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss=perplexity_raw, optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=150, verbose=2)
The error occurs when I try to fit my model. Does anyone know what causes the error and how to fix it?

These are the culprits: K.argmax and K.max. They don't have a gradient. I also think you just straight up don't need them in your loss metric! That's because maxing and argmaxing something removes the information on how much the prediction is wrong.
I don't know what kind of loss you want to measure, but I think you are looking for something like tf.exp(tf.nn.sigmoid_cross_entropy_with_logits(y_true, y_pred)) or tf.exp(tf.softmax_cross_entopy_with_logits(y_true, y_pred)). You might need to convert your logits to one hot encodings using tf.one_hot.

How is embedding matrix being trained in this code snippet?

I'm following the code of a coursera assignment which implements a NER tagger using a bidirectional LSTM.
But I'm not able to understand how the embedding matrix is being updated. In the following code, build_layers has a variable embedding_matrix_variable which acts an input the the LSTM. However it's not getting updated anywhere.
Can you help me understand how embeddings are being trained?
def build_layers(self, vocabulary_size, embedding_dim, n_hidden_rnn, n_tags):
initial_embedding_matrix = np.random.randn(vocabulary_size, embedding_dim) / np.sqrt(embedding_dim)
embedding_matrix_variable = tf.Variable(initial_embedding_matrix, name='embedding_matrix', dtype=tf.float32)
forward_cell = tf.nn.rnn_cell.DropoutWrapper(
tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn, forget_bias=3.0),
input_keep_prob=self.dropout_ph,
output_keep_prob=self.dropout_ph,
state_keep_prob=self.dropout_ph
)
backward_cell = tf.nn.rnn_cell.DropoutWrapper(
tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn, forget_bias=3.0),
input_keep_prob=self.dropout_ph,
output_keep_prob=self.dropout_ph,
state_keep_prob=self.dropout_ph
)
embeddings = tf.nn.embedding_lookup(embedding_matrix_variable, self.input_batch)
(rnn_output_fw, rnn_output_bw), _ = tf.nn.bidirectional_dynamic_rnn(
cell_fw=forward_cell, cell_bw=backward_cell,
dtype=tf.float32,
inputs=embeddings,
sequence_length=self.lengths
)
rnn_output = tf.concat([rnn_output_fw, rnn_output_bw], axis=2)
self.logits = tf.layers.dense(rnn_output, n_tags, activation=None)
def compute_loss(self, n_tags, PAD_index):
"""Computes masked cross-entopy loss with logits."""
ground_truth_tags_one_hot = tf.one_hot(self.ground_truth_tags, n_tags)
loss_tensor = tf.nn.softmax_cross_entropy_with_logits(labels=ground_truth_tags_one_hot, logits=self.logits)
mask = tf.cast(tf.not_equal(self.input_batch, PAD_index), tf.float32)
self.loss = tf.reduce_mean(tf.reduce_sum(tf.multiply(loss_tensor, mask), axis=-1) / tf.reduce_sum(mask, axis=-1))

In TensorFlow, variables are not usually updated directly (i.e. by manually setting them to a certain value), but rather they are trained using an optimization algorithm and automatic differentiation.
When you define a tf.Variable, you are adding a node (that maintains a state) to the computational graph. At training time, if the loss node depends on the state of the variable that you defined, TensorFlow will compute the gradient of the loss function with respect to that variable by automatically following the chain rule through the computational graph. Then, the optimization algorithm will make use of the computed gradients to update the values of the trainable variables that took part in the computation of the loss.
Concretely, the code that you provide builds a TensorFlow graph in which the loss self.loss depends on the weights in embedding_matrix_variable (i.e. there is a path between these nodes in the graph), so TensorFlow will compute the gradient with respect to this variable, and the optimizer will update its values when minimizing the loss. It might be useful to inspect the TensorFlow graph using TensorBoard.

TensorFlow session inside Keras custom loss function

After going through some Stack questions and the Keras documentation, I manage to write some code trying to evaluate the gradient of the output of a neural network w.r.t its inputs, the purpose being a simple exercise of approximating a bivariate function (f(x,y) = x^2+y^2) using as loss the difference between analytical and automatic differentiation.
Combining answers from two questions (Keras custom loss function: Accessing current input pattern
and Getting gradient of model output w.r.t weights using Keras
), I came up with this:
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Activation, Input
def custom_loss(input_tensor):
outputTensor = model.output
listOfVariableTensors = model.input
gradients = K.gradients(outputTensor, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:input_tensor})
grad_pred = K.add(evaluated_gradients[0], evaluated_gradients[1])
grad_true = k.add(K.scalar_mul(2, model.input[0][0]), K.scalar_mul(2, model.input[0][1]))
return K.square(K.subtract(grad_pred, grad_true))
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
out = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, out)
model.compile(loss=custom_loss_wrapper(input_tensor), optimizer='adam')
Which yields the error: TypeError: The value of a feed cannot be a tf.Tensor object. because of feed_dict={model.input:input_tensor}. I understand the error, I just don't know how to fix it.
From what I gathered, I can't simply pass input data into the loss function, it must be a tensor. I realized Keras would 'understand' it when I call input_tensor. This all just leads me to think I'm doing things the wrong way, trying to evaluate the gradient like that. Would really appreciate some enlightenment.

I don't really understand why you want this loss function, but I will provide an answer anyway. Also, there is no need to evaluate the gradient within the function (in fact, you would be "disconnecting" the computational graph). The loss function could be implemented as follows:
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Input
def custom_loss(input_tensor, output_tensor):
def loss(y_true, y_pred):
gradients = K.gradients(output_tensor, input_tensor)
grad_pred = K.sum(gradients, axis=-1)
grad_true = K.sum(2*input_tensor, axis=-1)
return K.square(grad_pred - grad_true)
return loss
input_tensor = Input(shape=(2,))
hidden = Dense(10, activation='relu')(input_tensor)
output_tensor = Dense(1, activation='sigmoid')(hidden)
model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor, output_tensor), optimizer='adam')

A Keras loss must have y_true and y_pred as inputs. You can try adding your input object as both x and y during the fit:
def custom_loss(y_true,y_pred):
...
return K.square(K.subtract(grad_true, grad_pred))
...
model.compile(loss=custom_loss, optimizer='adam')
model.fit(X, X, ...)
This way, y_true will be the batch being processed at each iteration from the input X, while y_pred will be the output of the model for that particular batch.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Variable has 'None' for gradient - python

Related

Subset model outputs in custom loss function in tensorflow/keras

Using Gradient Tape for Jacobian of LSTM model - Python

Keras loss function value error: ValueError: An operation has `None` for gradient. on LSTM network

How is embedding matrix being trained in this code snippet?

TensorFlow session inside Keras custom loss function

Categories

Resources