Keras Custom Layer ValueError: An operation has `None` for gradient.


I have created a custom Keras Conv2D layer as follows:
class CustConv2D(Conv2D):
    def __init__(self, filters, kernel_size, kernelB=None, activation=None, **kwargs):
        self.rank = 2
        self.num_filters = filters
        self.kernel_size = conv_utils.normalize_tuple(kernel_size, self.rank, 'kernel_size')
        self.kernelB = kernelB
        self.activation = activations.get(activation)
        super(CustConv2D, self).__init__(self.num_filters, self.kernel_size, **kwargs)
    def build(self, input_shape):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1
        if input_shape[channel_axis] is None:
            raise ValueError('The channel dimension of the inputs '
                             'should be defined. Found `None`.')
        input_dim = input_shape[channel_axis]
        num_basis = K.int_shape(self.kernelB)[-1]
        kernel_shape = (num_basis, input_dim, self.num_filters)
        self.kernelA = self.add_weight(shape=kernel_shape,
                                       initializer=RandomUniform(minval=-1.0,
                                                                 maxval=1.0, seed=None),
                                       name='kernelA',
                                       regularizer=self.kernel_regularizer,
                                       constraint=self.kernel_constraint)
        self.kernel = K.sum(self.kernelA[None, None, :, :, :] * self.kernelB[:, :, :, None, None], axis=2)
        # Set input spec.
        self.input_spec = InputSpec(ndim=self.rank + 2, axes={channel_axis: input_dim})
        self.built = True
        super(CustConv2D, self).build(input_shape)
I use CustConv2D as the first convolutional layer of my model:
img = Input(shape=(width, height, 1))
l1 = CustConv2D(filters=64, kernel_size=(11, 11), kernelB=basis_L1, activation='relu')(img)
The model compiles fine, but training fails with the following error:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Is there a way to figure out which operation is throwing the error? Also, is there any implementation error in the way I am writing the custom layer?

Calling the original Conv2D build at the end of your own build undoes your work: the parent build replaces self.kernel with a fresh weight, so self.kernelA is never used in the output and backpropagation never reaches it.
The parent build also expects a bias and the rest of the regular Conv2D setup:
class CustConv2D(Conv2D):
    def __init__(self, filters, kernel_size, kernelB=None, activation=None, **kwargs):
        #...
        #...
        #don't use bias if you're not defining it:
        super(CustConv2D, self).__init__(self.num_filters, self.kernel_size,
                                         activation=activation,
                                         use_bias=False, **kwargs)
        #bonus: don't forget to add the activation to the call above
        #it will also replace all your `self.anything` defined before this call
    def build(self, input_shape):
        #...
        #...
        #don't use bias:
        self.bias = None
        #consider the layer built
        self.built = True
        #do not destroy your build
        #comment out: super(CustConv2D, self).build(input_shape)

It may be because some weights in your code are defined but not used in the calculation of the output. Their gradient with respect to the loss is then None (undefined).
A worked example can be found here: https://github.com/keras-team/keras/issues/12521#issuecomment-496743146
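To make the point about unused weights concrete, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution, which is not the Keras version used in the question, so treat it as an illustration rather than a reproduction of the asker's setup. The names `used` and `unused` are hypothetical. A variable that never contributes to the loss comes back with `None` as its gradient, which is the condition the error message above complains about.

```python
import tensorflow as tf

# Two hypothetical weights; only `used` takes part in the loss.
used = tf.Variable(2.0)
unused = tf.Variable(3.0)
x = tf.constant(5.0)

with tf.GradientTape() as tape:
    loss = used * x  # `unused` never appears in this computation

grad_used, grad_unused = tape.gradient(loss, [used, unused])

print(grad_used)    # gradient of loss w.r.t. `used`, equal to x
print(grad_unused)  # None: there is no path from `unused` to the loss
```

Checking each trainable weight for a `None` gradient in this way is one possible route to finding which weight is disconnected from the output.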

Related

An op outside of the function building code is being passed a "Graph" tensor

I've implemented a Tf2 Keras Layer but when I'm training I'm getting the following error:
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: ada_cos_layer_1/truediv:0
I've seen some similar posts but their issue was with the Lambda layer, which I'm not using. I believe that, in my case, it has to do with an assignment to an attribute which is not a tf.Variable (self.s). However, I've already tried setting it as such or using add_weight without any help. My layer is the following:
class AdaCos(tf.keras.layers.Layer):
    def __init__(self, n_classes, margin=None, logit_scale=None, **kwargs):
        super().__init__(**kwargs)
        self.n_classes = n_classes
        self.s = math.sqrt(2)*math.log(n_classes-1)
    def build(self, input_shape):
        super().build(input_shape[0])
        self.w = self.add_weight(name='weights',
                                 shape=(input_shape[0][-1], self.n_classes),
                                 initializer='glorot_uniform',
                                 trainable=True)
    @staticmethod
    def get_median(v):
        v = tf.reshape(v, [-1])
        mid = v.get_shape()[0]//2 + 1
        return tf.nn.top_k(v, mid).values[-1]
    def call(self, inputs):
        x, y = inputs
        # normalize feature
        x = tf.nn.l2_normalize(x, axis=1, name='normed_embd')
        # normalize weights
        w = tf.nn.l2_normalize(self.w, axis=0, name='normed_weights')
        # dot product
        logits = tf.matmul(x, w, name='cos_t')
        # add margin
        # clip logits to prevent zero division when backward
        theta = tf.acos(tf.clip_by_value(logits, -1.0 + 1e-5, 1.0 - 1e-5))
        B_avg = tf.where(tf.expand_dims(y, 1) < 1, tf.exp(self.s*logits), tf.zeros_like(logits))
        B_avg = tf.reduce_mean(tf.reduce_sum(B_avg, axis=1), name='B_avg')
        theta_class = tf.gather_nd(theta, tf.expand_dims(tf.cast(y, tf.int32), 1), 1, name='theta_class')
        theta_med = self.get_median(theta_class)
        with tf.control_dependencies([theta_med, B_avg]):
            self.s = tf.math.log(B_avg) / tf.cos(tf.minimum(math.pi/4, theta_med))
            out = tf.multiply(logits, self.s, 'arcface_logist')
        return out
    def compute_output_shape(self, input_shape):
        return (None, self.n_classes)

Why does tf.executing_eagerly() return False in TensorFlow 2?

Let me explain my set up. I am using TensorFlow 2.1, the Keras version shipped with TF, and TensorFlow Probability 0.9.
I have a function get_model that creates (with the functional API) and returns a model using Keras and custom layers. In the __init__ method of these custom layers A, I call a method A.m, which executes the statement print(tf.executing_eagerly()), but it returns False. Why?
To be more precise, this is roughly my setup
def get_model():
    inp = Input(...)
    x = A(...)(inp)
    x = A(...)(x)
    ...
    model = Model(inp, out)
    model.compile(...)
    return model
class A(tfp.layers.DenseFlipout): # TensorFlow Probability
    def __init__(...):
        self.m()
    def m(self):
        print(tf.executing_eagerly()) # Prints False
The documentation of tf.executing_eagerly says
Eager execution is enabled by default and this API returns True in most of cases. However, this API might return False in the following use cases.
Executing inside tf.function, unless under tf.init_scope or tf.config.experimental_run_functions_eagerly(True) is previously called.
Executing inside a transformation function for tf.dataset.
tf.compat.v1.disable_eager_execution() is called.
None of these cases applies to me, so tf.executing_eagerly() should return True, but it does not. Why?
Here's a simple complete example (in TF 2.1) that illustrates the problem.
import tensorflow as tf
class MyLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        tf.print("tf.executing_eagerly() =", tf.executing_eagerly())
        return inputs
def get_model():
    inp = tf.keras.layers.Input(shape=(1,))
    out = MyLayer(8)(inp)
    model = tf.keras.Model(inputs=inp, outputs=out)
    model.summary()
    return model
def train():
    model = get_model()
    model.compile(optimizer="adam", loss="mae")
    x_train = [2, 3, 4, 1, 2, 6]
    y_train = [1, 0, 1, 0, 1, 1]
    model.fit(x_train, y_train)
if __name__ == '__main__':
    train()
This example prints tf.executing_eagerly() = False.
See the related Github issue.

As far as I know, when the input to a custom layer is a symbolic tensor, the layer is executed in graph (non-eager) mode. However, if the input to the custom layer is an eager tensor (as in example #1 below), the layer is executed in eager mode. So your model's output tf.executing_eagerly() = False is expected.
Example #1
import tensorflow as tf
from tensorflow.keras import layers
class Linear(layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units),
                                                  dtype='float32'),
                             trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(initial_value=b_init(shape=(units,),
                                                  dtype='float32'),
                             trainable=True)
    def call(self, inputs):
        print("tf.executing_eagerly() =", tf.executing_eagerly())
        return tf.matmul(inputs, self.w) + self.b
x = tf.ones((1, 2)) # returns tf.executing_eagerly() = True
#x = tf.keras.layers.Input(shape=(2,)) #tf.executing_eagerly() = False
linear_layer = Linear(4, 2)
y = linear_layer(x)
print(y)
#output in graph mode: Tensor("linear_9/Identity:0", shape=(None, 4), dtype=float32)
#output in Eager mode: tf.Tensor([[-0.03011466 0.02563028 0.01234017 0.02272708]], shape=(1, 4), dtype=float32)
Here is another example that uses a custom layer with the Keras functional API (similar to yours). This model is executed in graph mode and prints tf.executing_eagerly() = False, as in your case.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class CustomDense(layers.Layer):
    def __init__(self, units=32):
        super(CustomDense, self).__init__()
        self.units = units
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
    def call(self, inputs):
        print("tf.executing_eagerly() =", tf.executing_eagerly())
        return tf.matmul(inputs, self.w) + self.b
inputs = keras.Input((4,))
outputs = CustomDense(10)(inputs)
model = keras.Model(inputs, outputs)
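The eager versus graph distinction described in this answer can also be observed without Keras. The following is a minimal sketch, assuming TensorFlow 2.x with default settings: code called directly from Python runs eagerly, while the body of a tf.function runs while a graph is being traced, and tf.executing_eagerly() reports False there. The helper `check` and the list `observed` are hypothetical names used only for this illustration.

```python
import tensorflow as tf

observed = []  # records what tf.executing_eagerly() reports in each context

def check(tag):
    observed.append((tag, tf.executing_eagerly()))

check("plain Python")  # ordinary TF2 code runs eagerly

@tf.function
def traced(x):
    check("inside tf.function")  # executed while the graph is being traced
    return x + 1.0

traced(tf.constant(1.0))
print(observed)  # [('plain Python', True), ('inside tf.function', False)]
```

As far as I understand, a layer called on a symbolic Keras input is in a comparable situation to the traced function here, which would explain the False printed by the model in the question.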

You might be running in a Colab. If so, try the following immediately after importing Tensorflow:
tf.compat.v1.enable_v2_behavior()
More generally, check the docs at https://www.tensorflow.org/api_docs/python/tf/executing_eagerly for more information on eager execution.

How to convert this code from Keras to Tensorflow?

I am trying to convert some code from Keras to TensorFlow. I am a TensorFlow user and don't know the Keras API well. Here is the Keras code:
rawmeta = layers.Input(shape=(1,), dtype="string")
emb = elmolayer()(rawmeta)
d1 = layers.Dense(256, activation='relu')(emb)
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
model = Model(inputs=[rawmeta], outputs=yhat)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
where elmolayer is defined as follows:
class elmolayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable=True
        super(elmolayer, self).__init__(**kwargs)
    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(elmolayer, self).build(input_shape)
    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['default']
        return result
    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.dimensions)
My TensorFlow implementation of this code is:
class Base_model(object):
    def __init__(self, elmo_embedding_matrix):
        tf.reset_default_graph()
        # define placeholders
        sentences = tf.placeholder(tf.int32, [None, None], name='sentences')
        y_true = tf.placeholder(tf.int32, [None, None], name='labels' )
        self.elmo = tf.get_variable(name="relation_embedding", shape=[elmo_embedding_matrix.shape[0],elmo_embedding_matrix.shape[1]],
                                    initializer=tf.constant_initializer(np.array(elmo_embedding_matrix)),
                                    trainable=True,dtype=tf.float32)
        embedding_lookup = tf.nn.embedding_lookup(self.elmo,sentences)
        d1 = tf.layers.dense(embedding_lookup, 256, tf.nn.relu)
        y_pred = tf.layers.dense(d1, 31, tf.nn.softmax)
        matches = tf.equal(tf.argmax(y_pred,1),tf.argmax(y_true,1))
        acc = tf.reduce_mean(tf.cast(matches,tf.float32))
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true,logits=y_pred))
        train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cross_entropy)
My confusion is that the last dense layer in the Keras model is:
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
In the TensorFlow code I am using tf.nn.softmax_cross_entropy_with_logits_v2, so should the second dense layer still apply softmax, e.g.
y_pred = tf.layers.dense(d1, 31, tf.nn.softmax)
If I use softmax here, then tf.nn.softmax_cross_entropy_with_logits_v2 will apply softmax a second time to what it receives as logits.
How should I convert that Keras code to TensorFlow?

Posting this as an answer, even though it already appears in the comments, for the benefit of the community.
The TensorFlow equivalent of the Keras output layer
yhat = layers.Dense(31, activation='softmax', name = "output_node")(d1)
is a dense layer without activation, whose raw outputs (logits) go to the loss, with softmax applied separately only to obtain predictions. tf.nn.softmax_cross_entropy_with_logits_v2 applies softmax internally, so the layer that feeds it must not:
y_logits = tf.layers.dense(d1, 31)
y_pred = tf.nn.softmax(y_logits)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true,logits=y_logits))
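To illustrate why the logits rather than the softmax output should go into a loss that applies softmax itself, here is a small sketch in plain Python (no TensorFlow). The helper `cross_entropy_from_logits` is a hypothetical stand-in that mimics the behaviour of a "with logits" loss, and the label and logit values are made up for the example.

```python
import math

def softmax(values):
    # Subtract the max for numerical stability before exponentiating.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(labels, logits):
    # Applies softmax internally, like the *_with_logits loss functions.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs))

labels = [0.0, 1.0, 0.0]  # one-hot target (hypothetical)
logits = [1.0, 4.0, 0.5]  # raw output of the last dense layer (hypothetical)

correct = cross_entropy_from_logits(labels, logits)
double = cross_entropy_from_logits(labels, softmax(logits))  # softmax applied twice

print(correct, double)
```

With these values the doubly normalised loss is noticeably larger than the correct one. Because probabilities lie between 0 and 1, a second softmax can never push the target class above e / (e + 2) for three classes, so the loss cannot approach zero however confident the network becomes.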
Hope this helps. Happy Learning!

Get batch size in Keras custom layer and use tensorflow operations (tf.Variable)

I would like to write a Keras custom layer with tensorflow operations, that require the batch size as input. Apparently I'm struggling in every nook and cranny.
Suppose a very simple layer:
(1) get batch size
(2) create a tf.Variable (let's call it my_var) based on the batch size, then some tf.random ops to alter my_var
(3) finally, return input multiplied with my_var
What I tried so far:
class TestLayer(Layer):
    def __init__(self, **kwargs):
        self.num_batch = None
        self.my_var = None
        super(TestLayer, self).__init__(**kwargs)
    def build(self, input_shape):
        self.batch_size = input_shape[0]
        var_init = tf.ones(self.batch_size, dtype = x.dtype)
        self.my_var = tf.Variable(var_init, trainable=False, validate_shape=False)
        # some tensorflow random operations to alter self.my_var
        super(TestLayer, self).build(input_shape) # Be sure to call this at the end
    def call(self, x):
        return self.my_var * x
    def compute_output_shape(self, input_shape):
        return input_shape
Now creating a very simple model:
# define model
input_layer = Input(shape = (2, 2, 3), name = 'input_layer')
x = TestLayer()(input_layer)
# connect model
my_mod = Model(inputs = input_layer, outputs = x)
my_mod.summary()
Unfortunately, whatever I try or change in the code, I get multiple errors, most of them with very cryptic tracebacks (ValueError: Cannot convert a partially known TensorShape to a Tensor: or ValueError: None values not supported.).
Any general suggestions? Thanks in advance.

You need to specify the batch size if you want to create a variable of size batch_size. Additionally, if you want to print a summary, the tf.Variable must have a fixed shape (validate_shape=True), and it must be broadcastable against the input for the multiplication to succeed:
import tensorflow as tf
from tensorflow.keras.layers import Layer, Input
from tensorflow.keras.models import Model
class TestLayer(Layer):
    def __init__(self, **kwargs):
        self.num_batch = None
        self.my_var = None
        super(TestLayer, self).__init__(**kwargs)
    def build(self, input_shape):
        self.batch_size = input_shape[0]
        var_init = tf.ones(self.batch_size, dtype=tf.float32)[..., None, None, None]
        self.my_var = tf.Variable(var_init, trainable=False, validate_shape=True)
        super(TestLayer, self).build(input_shape) # Be sure to call this at the end
    def call(self, x):
        res = self.my_var * x
        return res
    def compute_output_shape(self, input_shape):
        return input_shape
# define model
input_layer = Input(shape=(2, 2, 3), name='input_layer', batch_size=10)
x = TestLayer()(input_layer)
# connect model
my_mod = Model(inputs=input_layer, outputs=x)
my_mod.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_layer (InputLayer)     (10, 2, 2, 3)             0
_________________________________________________________________
test_layer (TestLayer)       (10, 2, 2, 3)             0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
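The broadcasting requirement mentioned above can be checked in isolation. This sketch uses NumPy as a stand-in for the TensorFlow tensors (an assumption made for brevity; the two libraries follow the same broadcasting rules), with the shapes taken from the model above. The variable names are hypothetical.

```python
import numpy as np

batch = np.zeros((10, 2, 2, 3))  # same shape as the Input above with batch_size=10

per_sample = np.ones(10)                      # one value per batch element, shape (10,)
reshaped = per_sample[..., None, None, None]  # shape (10, 1, 1, 1)

scaled = reshaped * batch  # broadcasts over the last three axes
print(reshaped.shape, scaled.shape)

try:
    per_sample * batch  # (10,) against (10, 2, 2, 3): trailing axes are 10 and 3
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```

Without the three added axes the per-sample vector is aligned against the channel axis rather than the batch axis, which is why the indexing with `[..., None, None, None]` is needed in build.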

Including deviation from expected model output in a cost function

I'm trying to write a simple model to do the following. I want my model to take an input and compute a normal feedforward output with linear activation (K.bias_add(K.dot(x, self.kernel), self.bias)), with the kernel and bias vectors as trainable parameters. I also want the model to include a trainable parameter x_out holding the estimated output. The goal is for the model to return x_out as its output, but to include norm(W*x + b - x_out) as a penalty term in the cost function. I've tried doing this with a wrapper function for a custom loss, but have not succeeded so far.
from keras import backend as K
from keras.layers import Layer
class MyFeedforward(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyFeedforward, self).__init__(**kwargs)
    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        self.bias = self.add_weight(name='bias',
                                    shape=(self.output_dim,),
                                    initializer='uniform',
                                    trainable=True)
        super(MyFeedforward, self).build(input_shape) # Be sure to call this at the end
    def call(self, x):
        #return K.bias_add(K.dot(x, self.zeros), self.output_estimate)
        return K.bias_add(K.dot(x, self.kernel), self.bias)
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
from keras import backend as K
from keras.layers import Layer
class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)
    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.zeros = self.add_weight(name='zeros',
                                     shape=(input_shape[1], self.output_dim),
                                     initializer='Zeros',
                                     trainable=False)
        self.output_estimate = self.add_weight(name='output_estimate',
                                               shape=(self.output_dim,),
                                               initializer='uniform',
                                               trainable=True)
        super(MyLayer, self).build(input_shape) # Be sure to call this at the end
    def call(self, x):
        return K.bias_add(K.dot(x, self.zeros), self.output_estimate)
        #return K.bias_add(K.dot(x, self.kernel), self.bias)
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
from keras.models import Model
from keras import layers
from keras import Input
input_tensor = layers.Input(shape=(784,))
output_estimate = MyLayer(10,)(input_tensor)
calculated_output = MyFeedforward(10,)(input_tensor)
model = Model(input_tensor, [calculated_output, output_estimate])
model.summary()
My hope is to have both the MyFeedforward layer (which computes K.bias_add(K.dot(x, kernel), bias)) and MyLayer (which simply outputs the estimated output x_out) used in the cost function. The ultimate goal is to optimize something like categorical_crossentropy between the training data and x_out, plus the squared deviation between x_out and the real output (K.sum(K.square(x_out - K.bias_add(K.dot(x, kernel), bias)))). If I can do this without setting up separate layers, that would work equally well.
I've tried setting this up with a wrapper for a custom loss function and a multi-head model (one for the x_out layer, one for the normal layer) but it appears that I end up not having access to the normal layer's output when I do this. (I need y_pred from another layer, or I need access to the input so that I can grab layer.kernel and layer.bias and apply them to the input for the computation).
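For reference, the objective described above can be written out numerically. This is a sketch in plain Python for a single sample, not a Keras implementation, and it rests on two assumptions that are not stated in the question: x_out is treated as logits and passed through softmax before the cross-entropy, and the penalty is scaled by a hypothetical factor `penalty_weight`. All function names and values are made up for the illustration.

```python
import math

def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def affine(x, kernel, bias):
    # Computes x @ kernel + bias for a single input vector x.
    return [sum(xi * kernel[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))]

def combined_loss(y_true, x, x_out, kernel, bias, penalty_weight=1.0):
    # Cross-entropy between the target and softmax(x_out) ...
    probs = softmax(x_out)
    cross_entropy = -sum(t * math.log(p) for t, p in zip(y_true, probs))
    # ... plus the squared deviation of x_out from the feedforward output.
    feedforward = affine(x, kernel, bias)
    deviation = sum((o - f) ** 2 for o, f in zip(x_out, feedforward))
    return cross_entropy + penalty_weight * deviation

x = [1.0, 2.0]
kernel = [[0.5, -0.5], [0.25, 0.75]]
bias = [0.0, 0.1]
y_true = [0.0, 1.0]

matching = affine(x, kernel, bias)     # x_out equal to W*x + b: zero penalty
shifted = [v + 1.0 for v in matching]  # same softmax, but penalised

loss_matching = combined_loss(y_true, x, matching, kernel, bias)
loss_shifted = combined_loss(y_true, x, shifted, kernel, bias)
print(loss_matching, loss_shifted)
```

Shifting every entry of x_out by the same constant leaves the softmax unchanged, so the difference between the two printed losses comes entirely from the penalty term.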
