Keras Custom Layer with advanced calculations - python

I want to write some custom Keras layers and do some advanced calculations inside the layer, for example with NumPy, scikit-learn, OpenCV...
I know there are some math functions in keras.backend that can operate on tensors, but I need some more advanced functions.
However, I have no clue how to implement this correctly. I get the error message:
You must feed a value for placeholder tensor 'input_1' with dtype float and shape [...]
Here is my custom layer:
class MyCustomLayer(Layer):
    def __init__(self, **kwargs):
        super(MyCustomLayer, self).__init__(**kwargs)
    def call(self, inputs):
        """
        How to implement this correctly in Keras?
        """
        nparray = K.eval(inputs)  # <-- does not work
        # do some calculations here with nparray
        # for example with Numpy, Scipy, Scikit, OpenCV...
        result = K.variable(nparray, dtype='float32')
        return result
    def compute_output_shape(self, input_shape):
        output_shape = tuple([input_shape[0], 256, input_shape[3]])
        return output_shape  # (batch, 256, channels)
The error appears here in this dummy model:
inputs = Input(shape=(96, 96, 3))
x = MyCustomLayer()(inputs)
x = Flatten()(x)
x = Activation("relu")(x)
x = Dense(1)(x)
predictions = Activation("sigmoid")(x)
model = Model(inputs=inputs, outputs=predictions)
Thanks for all hints...

TL;DR: You should not mix NumPy into Keras layers. Keras runs on TensorFlow underneath, and TensorFlow has to track every computation to be able to compute the gradients in the backward pass.
If you dig into TensorFlow, you will see that it covers almost all of the NumPy functionality (or even extends it), and if I remember correctly, TensorFlow functionality can be accessed through the Keras backend (K).
Which advanced calculations/functions do you need?

I think this kind of processing should happen before the model, because it contains no variables and therefore cannot be optimized.
K.eval(inputs) does not work because you are trying to evaluate a placeholder, not a variable; a placeholder has no value to evaluate until it is fed. If you want the values, you should feed the placeholder, or you can build a list from the tensors one by one with tf.unstack():
nparray = tf.unstack(tf.unstack(tf.unstack(inputs,96,0),96,0),3,0)
Your call function is also wrong because it returns a variable; you should return a constant:
result = K.constant(nparray, dtype='float32')
return result
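If the NumPy-based step really has to live inside the model, one option is to wrap it with tf.numpy_function, which runs a Python function as an op. The following is only a minimal sketch under assumptions: numpy_op is a hypothetical stand-in for the real NumPy/SciPy/OpenCV routine, and NumpyLayer is a name made up for illustration. No gradient flows through tf.numpy_function, so this only suits fixed, non-trainable processing, in line with the advice above.

```python
import numpy as np
import tensorflow as tf


def numpy_op(batch):
    # Hypothetical stand-in for any NumPy/SciPy/OpenCV computation.
    # Flattens the spatial dimensions, sorts along them, keeps 256 rows.
    n, h, w, c = batch.shape
    flat = batch.reshape(n, h * w, c)
    return np.sort(flat, axis=1)[:, :256, :].astype(np.float32)


class NumpyLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        # Run the NumPy code as an op; gradients are NOT propagated through it.
        out = tf.numpy_function(numpy_op, [inputs], tf.float32)
        # The static shape is lost inside numpy_function, so restore it by hand.
        out.set_shape([None, 256, inputs.shape[-1]])
        return out


# Eager usage on a concrete batch
out = NumpyLayer()(tf.random.uniform((2, 96, 96, 3)))
print(out.shape)
```

When the computation can be expressed with TensorFlow ops instead, that is usually preferable, because the layer then stays differentiable.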

Related

Pytorch submodules output shape

How is the output shape of submodules in PyTorch determined? Why is the output shape of a certain submodule modified in the code below?
When I separate the head of a classical classifier from its backbone in the following way:
import torch, torchvision
from torchsummary import summary
effnet = torchvision.models.efficientnet_b0(num_classes = 2)
backbone = torch.nn.Sequential(*(list(effnet.children())[0]))
adaptive_pool = list(effnet.children())[1]
head = list(effnet.children())[2]
model = torch.nn.Sequential(*[backbone, adaptive_pool, head])
summary(model, (3,256,256), device = 'cpu') # <== Error
I get the following error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x1 and 1280x2)
This error is due to the modified output shape of the submodule adaptive_pool. To work around this problem, flatten can be used as follows:
class flatten(torch.nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
model = torch.nn.Sequential(*[backbone, adaptive_pool, flatten(), head])
summary(model, (3,256,256), device = 'cpu')
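Why is the output shape of the submodule adaptive_pool modified?
The output of an nn.AdaptiveAvgPool2d is 4D even if the average is computed globally, i.e. output_size=1. In other words, the output shape of your global pooling layer is (N, C, 1, 1), so the linear head treats the trailing dimension of size 1 as its feature dimension (hence the 2560x1 operand in the error, which is consistent with a batch of 2 and 1280 channels). This means you do need to flatten it before the fully connected layer.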
In the referenced original efficient net classification network, the implementation of the flattening operation is done directly in the forward logic without the use of a dedicated layer. See this line.
Instead of implementing your own flattening layer, you can use the built-in nn.Flatten. More details about this module can be found here.
>>> model = nn.Sequential(backbone, adaptive_pool, nn.Flatten(1), head)
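The shapes involved can be checked in isolation. This is a small sketch that does not use the EfficientNet model itself; the feature tensor of shape (2, 1280, 8, 8) is an assumed stand-in for the backbone output.

```python
import torch
from torch import nn

# Assumed stand-in for the backbone output: batch of 2, 1280 channels, 8x8 map
features = torch.randn(2, 1280, 8, 8)

pooled = nn.AdaptiveAvgPool2d(1)(features)  # still 4D: (2, 1280, 1, 1)
flat = nn.Flatten(1)(pooled)                # flattened from dim 1: (2, 1280)
logits = nn.Linear(1280, 2)(flat)           # the linear head now sees 1280 features

print(pooled.shape, flat.shape, logits.shape)
```

Passing pooled straight to the linear layer would fail, because its last dimension has size 1 rather than 1280.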

How can I implement a custom PCA layer in my model using Model Subclassing API?

I am trying to implement a custom PCA layer for my model being developed using Model Subclassing API. This is how I have defined the layer.
class PCALayer(tf.keras.layers.Layer):
    def __init__(self):
        super(PCALayer, self).__init__()
        self.pc = pca
    def call(self, input_tensor, training=False):
        x = K.constant(self.pc.transform(input_tensor))
        return x
The pca itself is from sklearn.decomposition.PCA and has been fit with the needed data and not transformed.
Now, this is how I have added the layer to my model
class ModelSubClassing(tf.keras.Model):
    def __init__(self, initizlizer):
        super(ModelSubClassing, self).__init__()
        # define all layers in init
        # Layer of Block 1
        self.pca_layer = PCALayer()
        self.dense1 = tf.keras.layers.Dense(...)
        self.dense2 = tf.keras.layers.Dense(...)
        self.dense3 = tf.keras.layers.Dense(...)
    def call(self, input_tensor, training=False):
        # forward pass: block 1
        x = self.pca_layer(input_tensor)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)
When I compile the model there is no error. However, when I fit the model, I get the following error:
NotImplementedError: Cannot convert a symbolic Tensor (model_sub_classing_1/Cast:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
Can anyone help me please...
self.pc.transform, which comes from sklearn, expects a NumPy array, but you provide a TF tensor. When the layer is built, Keras passes a symbolic tensor through it to construct the graph, and this tensor cannot be converted to a NumPy array. The answer is in the error message:
you're trying to pass a Tensor to a NumPy call, which is not supported
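One way around this is to apply the projection with TensorFlow ops. For a PCA fitted without whitening, the transform amounts to (X - mean) @ components.T, so the fitted arrays can be stored as constants. This is a sketch under assumptions: the mean and components are computed here with a NumPy SVD purely to keep the example self-contained, whereas with a fitted sklearn PCA they would be pca.mean_ and pca.components_; the constructor arguments are made up for illustration.

```python
import numpy as np
import tensorflow as tf


class PCALayer(tf.keras.layers.Layer):
    """Applies a fixed, already fitted PCA projection using TF ops only."""

    def __init__(self, mean, components, **kwargs):
        super().__init__(**kwargs)
        self.mean = tf.constant(mean, dtype=tf.float32)              # (n_features,)
        self.components = tf.constant(components, dtype=tf.float32)  # (n_components, n_features)

    def call(self, inputs):
        centered = tf.cast(inputs, tf.float32) - self.mean
        # Same as centered @ components.T
        return tf.matmul(centered, self.components, transpose_b=True)


# Stand-in for a fitted PCA: principal axes from an SVD of the centered data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)).astype(np.float32)
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:2]  # keep two principal components

projected = PCALayer(mean, components)(tf.constant(X[:4]))
print(projected.shape)
```

Because the layer uses only tensor operations, it works with symbolic tensors during model.fit.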

Tensorflow 2: How can I use the shape of tensor y_true in custom loss?

I pass a list a to my custom function and I want to tf.tile it after converting it to a constant tensor. The times I tile it depends on the shape of y_true. I don't know how I can get the shape of y_true as integers. Here's the code:
def getloss(a):
    a = tf.constant(a, tf.float32)
    def loss(y_true, y_pred):
        a = tf.reshape(a, [1,1,-1])
        ytrue_shape = y_true.get_shape().as_list() #????
        multiples = tf.constant([ytrue_shape[0], ytrue_shape[1], 1], tf.int32)
        a = tf.tile(a, multiples)
        #...
    return loss
I have tried y_true.get_shape().as_list() but it reports an error because the first dimension (batch size) is None when compiling the model. Is there any way I can use the shape of y_true here?
When you need the shape of a tensor while the model is being built and not all dimensions are known yet, it is best to use tf.shape. It is evaluated when the model is run, as stated in the docs:
tf.shape and Tensor.shape should be identical in eager mode. Within tf.function or within a compat.v1 context, not all dimensions may be known until execution time. Hence when defining custom layers and models for graph mode, prefer the dynamic tf.shape(x) over the static x.shape.
ytrue_shape = tf.shape(y_true)
This will yield a Tensor, so use TF ops to get what you want:
multiples = tf.concat((ytrue_shape[:2], [1]), axis=0)
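Put together, the loss factory could look like the sketch below. It makes two assumptions: the weighted squared error in the last line is a hypothetical placeholder for the elided loss body, and the reshaped constant gets its own name (a_const) because reassigning a inside the nested function would make it a local variable and raise an error.

```python
import tensorflow as tf


def getloss(a):
    # Reshape once, outside the inner function, under a separate name
    a_const = tf.reshape(tf.constant(a, tf.float32), [1, 1, -1])

    def loss(y_true, y_pred):
        # Dynamic shape: resolved at run time, even when the batch size is None
        multiples = tf.concat([tf.shape(y_true)[:2], [1]], axis=0)
        tiled = tf.tile(a_const, multiples)
        # Hypothetical loss body: per-channel weighted squared error
        return tf.reduce_mean(tiled * tf.square(y_true - y_pred))

    return loss


# Trace with unknown leading dimensions to mimic what happens at compile time
spec = tf.TensorSpec([None, None, 3], tf.float32)
graph_loss = tf.function(getloss([1.0, 2.0, 3.0]), input_signature=[spec, spec])
value = graph_loss(tf.zeros((4, 5, 3)), tf.ones((4, 5, 3)))
print(float(value))
```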

Extract intermediate variable from a custom Tensorflow/Keras layer during inference (TF 2.0)

A bit of background:
I've implemented an NLP classification model using mostly Keras functional model bits of Tensorflow 2.0. The model architecture is a pretty straightforward LSTM network with the addition of an Attention layer between the LSTM and the Dense output layer. The Attention layer comes from this Kaggle kernel (starting around line 51).
I wrapped the trained model in a simple Flask app and get reasonably accurate predictions. In addition to predicting a class for a specific input I also output the value of the attention weight vector "a" from the aforementioned Attention layer so I can visualize the weights applied to the input sequence.
My current method of extracting the attention weights variable works, but seems incredibly inefficient as I'm predicting the output class and then manually calculating the attention vector using an intermediate Keras model. In the Flask app, inference looks something like this:
# Load the trained model
model = tf.keras.models.load_model('saved_model.h5')
# Extract the trained weights and biases of the trained attention layer
attention_weights = model.get_layer('attention').get_weights()
# Create an intermediate model that outputs the activations of the LSTM layer
intermediate_model = tf.keras.Model(inputs=model.input, outputs=model.get_layer('bi-lstm').output)
# Predict the output class using the trained model
model_score = model.predict(input)
# Obtain LSTM activations by predicting the output again using the intermediate model
lstm_activations = intermediate_model.predict(input)
# Use the intermediate LSTM activations and the trained model attention layer weights and biases to calculate the attention vector.
# Maths from the custom Attention Layer (heavily modified for the sake of brevity)
eij = tf.keras.backend.dot(lstm_activations, attention_weights)
a = tf.keras.backend.exp(eij)
attention_vector = a
I think I should be able to include the attention vector as part of the model output, but I'm struggling with figuring out how to accomplish this. Ideally I'd extract the attention vector from the custom attention layer in a single forward pass rather than extracting the various intermediate model values and calculating a second time.
For example:
model_score = model.predict(input)
model_score[0] # The predicted class label or probability
model_score[1] # The attention vector, a
I think I'm missing some basic knowledge around how Tensorflow/Keras throw variables around and when/how I can access those values to include as model output. Any advice would be appreciated.
After a little more research I've managed to cobble together a working solution. I'll summarize here for any future weary internet travelers that come across this post.
The first clues came from this github thread. The attention layer defined there seems to build on the attention layer in the previously mentioned Kaggle kernel. The github user adds a return_attention flag to the layer init which, when enabled, includes the attention vector in addition to the weighted RNN output vector in the layer output.
I also added a get_config function suggested by this user in the same github thread which enables us to save and reload trained models. I had to add the return_attention flag to get_config, otherwise TF would throw a list iteration error when trying to load a saved model with return_attention=True.
With those changes made, the model definition needed to be updated to capture the additional layer outputs.
inputs = Input(shape=(max_sequence_length,))
lstm = Bidirectional(LSTM(lstm1_units, return_sequences=True))(inputs)
# Added 'attention_vector' to capture the second layer output
attention, attention_vector = Attention(max_sequence_length, return_attention=True)(lstm)
x = Dense(dense_units, activation="softmax")(attention)
The final, and most important piece of the puzzle came from this Stackoverflow answer. The method described there allows us to output multiple results while only optimizing on one of them. The code changes are subtle, but very important. I've added comments below in the spots I made changes to implement this functionality.
model = Model(
inputs=inputs,
outputs=[x, attention_vector] # Original value: outputs=x
)
model.compile(
loss=['categorical_crossentropy', None], # Original value: loss='categorical_crossentropy'
optimizer=optimizer,
metrics=[BinaryAccuracy(name='accuracy')])
With those changes in place, I retrained the model and voila! The output of model.predict() is now a list containing the score and its associated attention vector.
The results of the change were pretty dramatic. Running inference on 10k examples took about 20 minutes using this new method. The old method utilizing intermediate models took ~33 minutes to perform inference on the same dataset.
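For readers who want to try the two-output pattern without the full model, here is a toy sketch. It is not the network described above: the input shape, layer sizes, and the simple softmax scoring are all assumptions chosen to keep the example short.

```python
import numpy as np
import tensorflow as tf

# Toy sequence input: 10 time steps, 4 features (assumed sizes)
inputs = tf.keras.Input(shape=(10, 4))
scores = tf.keras.layers.Dense(1)(inputs)                        # (batch, 10, 1)
attention_vector = tf.keras.layers.Softmax(axis=1)(scores)       # weights over time steps
context = tf.keras.layers.Dot(axes=1)([attention_vector, inputs])  # weighted sum: (batch, 1, 4)
context = tf.keras.layers.Flatten()(context)                     # (batch, 4)
probs = tf.keras.layers.Dense(3, activation="softmax")(context)

# Expose both the prediction and the attention weights as model outputs
model = tf.keras.Model(inputs=inputs, outputs=[probs, attention_vector])

batch = np.random.rand(2, 10, 4).astype("float32")
preds, attn = model.predict(batch, verbose=0)
print(preds.shape, attn.shape)
```

A single call to predict returns both arrays, so no second forward pass through an intermediate model is needed.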
And for anyone that's interested, here is my modified Attention layer:
from tensorflow.keras.layers import Layer
from tensorflow.keras import initializers, regularizers, constraints
from tensorflow.keras import backend as K
class Attention(Layer):
    def __init__(self, step_dim,
                 W_regularizer=None, b_regularizer=None,
                 W_constraint=None, b_constraint=None,
                 bias=True, return_attention=True, **kwargs):
        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')
        self.W_regularizer = regularizers.get(W_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)
        self.W_constraint = constraints.get(W_constraint)
        self.b_constraint = constraints.get(b_constraint)
        self.bias = bias
        self.step_dim = step_dim
        self.features_dim = 0
        self.return_attention = return_attention
        super(Attention, self).__init__(**kwargs)
    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = self.add_weight(shape=(input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        self.features_dim = input_shape[-1]
        if self.bias:
            self.b = self.add_weight(shape=(input_shape[1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)
        else:
            self.b = None
        self.built = True
    def compute_mask(self, input, input_mask=None):
        return None
    def call(self, x, mask=None):
        features_dim = self.features_dim
        step_dim = self.step_dim
        eij = K.reshape(K.dot(K.reshape(x, (-1, features_dim)),
                              K.reshape(self.W, (features_dim, 1))), (-1, step_dim))
        if self.bias:
            eij += self.b
        eij = K.tanh(eij)
        a = K.exp(eij)
        if mask is not None:
            a *= K.cast(mask, K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        a = K.expand_dims(a)
        weighted_input = x * a
        result = K.sum(weighted_input, axis=1)
        if self.return_attention:
            return [result, a]
        return result
    def compute_output_shape(self, input_shape):
        if self.return_attention:
            return [(input_shape[0], self.features_dim),
                    (input_shape[0], input_shape[1])]
        else:
            return input_shape[0], self.features_dim
    def get_config(self):
        config = {
            'step_dim': self.step_dim,
            'W_regularizer': regularizers.serialize(self.W_regularizer),
            'b_regularizer': regularizers.serialize(self.b_regularizer),
            'W_constraint': constraints.serialize(self.W_constraint),
            'b_constraint': constraints.serialize(self.b_constraint),
            'bias': self.bias,
            'return_attention': self.return_attention
        }
        base_config = super(Attention, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Tensorflow 2.0 doesn't compute the gradient

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map, and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using TensorFlow 2.0 I have a GradientTape that follows the function and then computes the gradient, but the gradient comes back as None. Why is it unable to compute the gradient?
import tensorflow as tf
import matplotlib.pyplot as plt
import time
import numpy as np
from tensorflow.keras.applications import vgg16
class maxFeatureMap():
    def __init__(self, model):
        self.model = model
        self.optimizer = tf.keras.optimizers.Adam()
    def getNumLayers(self, layer_name):
        for layer in self.model.layers:
            if layer.name == layer_name:
                weights = layer.get_weights()
                num = weights[1].shape[0]
        return ("There are {} feature maps in {}".format(num, layer_name))
    def getGradient(self, layer, feature_map):
        pic = vgg16.preprocess_input(np.random.uniform(size=(1,96,96,3))) ## Creates values between 0 and 1
        pic = tf.convert_to_tensor(pic)
        model = tf.keras.Model(inputs=self.model.inputs,
                               outputs=self.model.layers[layer].output)
        with tf.GradientTape() as tape:
            ## predicts the output of the model and only chooses the feature_map indicated
            predictions = model.predict(pic, steps=1)[0][:,:,feature_map]
            loss = tf.reduce_mean(predictions)
        print(loss)
        gradients = tape.gradient(loss, pic[0])
        print(gradients)
        self.optimizer.apply_gradients(zip(gradients, pic))
model = vgg16.VGG16(weights='imagenet', include_top=False)
x = maxFeatureMap(model)
x.getGradient(1, 24)
This is a common pitfall with GradientTape; the tape only traces tensors that are set to be "watched" and by default tapes will watch only trainable variables (meaning tf.Variable objects created with trainable=True). To watch the pic tensor, you should add tape.watch(pic) as the very first line inside the tape context.
Also, I'm not sure if the indexing (pic[0]) will work, so you might want to remove that -- since pic has just one entry in the first dimension it shouldn't matter anyway.
Furthermore, you cannot use model.predict because this returns a numpy array, which basically "destroys" the computation graph chain so gradients won't be backpropagated. You should simply use the model as a callable, i.e. predictions = model(pic).
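The three points above can be combined into a short sketch. To keep it self-contained it uses a single randomly initialized Conv2D layer as an assumed stand-in for the truncated VGG16 model, and the step size of 0.1 is arbitrary.

```python
import tensorflow as tf

# Assumed stand-in for the truncated VGG16 model: one small conv layer
conv = tf.keras.layers.Conv2D(8, 3, padding="same")
feature_map = 5

pic = tf.random.uniform((1, 96, 96, 3))

with tf.GradientTape() as tape:
    tape.watch(pic)          # pic is a plain tensor, so it must be watched explicitly
    features = conv(pic)     # call the layer/model directly, not .predict()
    loss = tf.reduce_mean(features[..., feature_map])

grad = tape.gradient(loss, pic)  # differentiate w.r.t. the whole tensor, not pic[0]

# One gradient ascent step on the input image
pic = pic + 0.1 * grad
print(grad.shape)
```

If the image should be updated through an optimizer instead, it would need to be a tf.Variable, which the tape watches automatically.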
Did you define your own loss function? Did you convert a tensor to NumPy in your loss function?
As a beginner, I ran into the same problem:
When using tape.gradient(loss, variables), the result was None because I converted a tensor to a NumPy array in my own loss function. It is a silly but common beginner mistake.
FYI: When GradientTape is not working, a TensorFlow issue is also a possibility. One troubleshooting step is to check the TF GitHub for known issues with the TF functions being used, for example:
Gradients do not exist for variables after tf.concat(). #37726.
