Calling forward function without .forward() - python

While looking at some PyTorch code for pose estimation (AlphaPose), I noticed some unfamiliar syntax:
Basically, we define a Darknet class which inherits nn.Module properties like so: class Darknet(nn.Module)
This re-constructs the neural net from some config file and also defines functions to load pre-trained weights and a forward pass
Now, forward pass takes the following parameters:
def forward(self, x, CUDA)
I should note that in the class definition, forward is the only method that has a CUDA parameter (this will become important later on)
In the forward pass we get the predictions:
for i in range(number_of_modules):
    x = self.module[i](x)
where module[i] was constructed as:
module = nn.Sequential()
conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
module.add_module("conv_{0}".format(index), conv)
We then invoke this model and (I presume) its forward method like so:
self.det_model = Darknet("yolo/cfg/yolov3-spp.cfg")
self.det_model.load_weights('models/yolo/yolov3-spp.weights')
self.det_model.cpu()
self.det_model.eval()
image = image.cpu()
prediction = self.det_model(img, CUDA = False)
I assume that the last line is the call to the forward pass, but why not use .forward explicitly? Is this PyTorch-specific syntax, or am I missing some basic Python principle?

The call syntax itself is nothing torch-specific. When you call an object as class_object(fn params), Python invokes the __call__ method of its class.
If you dig into the torch source, specifically nn.Module, you will see that __call__ internally invokes forward while also taking care of the hooks and state that PyTorch supports. So when you call self.det_model(img, CUDA=False) you are still calling forward.
See the code for nn.module here.
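The dispatch described above can be sketched in plain Python. This is only a toy: TinyModule, Doubler and the single hook list are hypothetical stand-ins, not PyTorch's actual implementation, which handles several kinds of hooks and more state.

```python
class TinyModule:
    """Toy stand-in for nn.Module: __call__ wraps forward() with hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, *args, **kwargs):
        raise NotImplementedError

    def __call__(self, *args, **kwargs):
        # Calling the instance routes to forward(), then runs any hooks.
        result = self.forward(*args, **kwargs)
        for hook in self._forward_hooks:
            hook(self, args, result)
        return result


class Doubler(TinyModule):
    def forward(self, x, CUDA=False):
        return 2 * x


seen = []
model = Doubler()
model.register_forward_hook(lambda module, inputs, output: seen.append(output))

via_call = model(3, CUDA=False)   # goes through __call__, so the hook runs
via_forward = model.forward(3)    # bypasses __call__, so the hook is skipped
```

Both calls return the same value, but only the first one triggers the hook, which is why calling the instance is generally preferred over calling .forward() directly.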

Related

Accessing 'training' attribute in TensorFlow functional (functional API) Model

As the title states I'm wondering how I could access the privileged 'training' argument when I'm using the functional API.
So if I use subclassing, I can write something like:
class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        ...
        self.BN = tf.keras.layers.BatchNormalization()
    def call(self, inputs, training=None):
        return self.BN(inputs, training=training)
So I can control how my batchnorm behaves during training and prediction. But If I want to use the functional API:
input = tf.keras.Input(someshape)
normalized = tf.keras.layers.BatchNormalization()(input)
tf.keras.Model(inputs=input, outputs=normalized)
Now I can't really set the privileged 'training' argument for my batch norm anymore. I love the functional API, it's just so much fun to use, but having to build around this is quite often a dealbreaker. I feel like I must be missing some important idea on how one would solve this.
I'm aware that I could create a tf.keras.Input to hold the 'training' argument. But this would change it from a keyword arg to an element of a list, which makes for very inconsistent code. Is there any smarter solution to this?
Edit: Should make it clear that I'm looking for a general idea that can be used for the 'training' arg, not just tackling the BatchNormalization in particular.
When you instantiate the model with model = tf.keras.Model(inputs=input, outputs=normalized), the model has not yet been built. It gets built when you call the build method (usually when you do everything by hand using the gradient tape) or when you first call the fit method. At that point, the weights are initialized. The training flag is then set at run time rather than at construction time: if you use the fit method, or call your model as output_tensors = mymodel(input_tensors, training=True), the flag is True; conversely, if you use the predict method, or call output_tensors = mymodel(input_tensors, training=False), the flag is False. When you call the model directly, the flag is simply whatever you pass.
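As a general picture of how a single training keyword given at the model call can reach every layer that cares about it, here is a pure-Python toy. ToyScale, ToyNoise and ToyModel are hypothetical classes, not Keras code; the assumption is only that a functional Keras model forwards the flag to its layers in a broadly similar way.

```python
import inspect


class ToyScale:
    """Layer whose call() has no training argument."""

    def call(self, x):
        return [2.0 * v for v in x]


class ToyNoise:
    """Layer that behaves differently in training and inference."""

    def call(self, x, training=None):
        # Add a fixed offset only while training (stand-in for dropout or BN).
        return [v + 1.0 for v in x] if training else list(x)


class ToyModel:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x, training=None):
        for layer in self.layers:
            # Forward the flag only to layers whose call() accepts it.
            params = inspect.signature(layer.call).parameters
            if "training" in params:
                x = layer.call(x, training=training)
            else:
                x = layer.call(x)
        return x


model = ToyModel([ToyScale(), ToyNoise()])
train_out = model([1.0, 2.0], training=True)
infer_out = model([1.0, 2.0], training=False)
```

The flag stays a keyword argument at the outermost call, so no extra input tensor is needed to carry it.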

Can I access the inner layer outputs of DeepLab in pytorch?

Using PyTorch, I am trying to implement a network that uses the pre-trained DeepLab ResNet-101.
I found two possible methods for using this network:
this one
or
torchvision.models.segmentation.deeplabv3_resnet101(
pretrained=False, progress=True, num_classes=21, aux_loss=None, **kwargs)
However, I might not only need this network's output, but also several inside layers' outputs.
Is there a way to access the inner layer outputs using one of these methods?
If not - Is it possible to manually copy the trained resnet's parameters so I can manually recreate it and add those outputs myself? (Hopefully the first option is possible so I won't need to do this)
Thanks!
You can achieve this without too much trouble using forward hooks.
The idea is to loop over the modules of your model, find the layers you're interested in, hook a callback function onto them. When called, those layers will trigger the hook. We will take advantage of this to save the intermediate outputs.
For example, let's say you want to get the outputs of layer classifier.0.convs.3.1:
layers = ['classifier.0.convs.3.1']
activations = {}
def forward_hook(name):
    def hook(module, x, y):
        activations[name] = y
    return hook
for name, module in model.named_modules():
    if name in layers:
        module.register_forward_hook(forward_hook(name))
*The closure around hook() made by forward_hook's scope is used to enclose the module's name which you wouldn't otherwise have access to at this point.
Everything is ready, we can call the model
>>> model = torchvision.models.segmentation.deeplabv3_resnet101(
pretrained=True, progress=True, num_classes=21, aux_loss=None)
>>> model(torch.rand(16, 3, 100, 100))
And as expected, after inference, activations will have a new entry 'classifier.0.convs.3.1' which - in this case - will contain a tensor of shape (16, 256, 13, 13).
Not so long ago, I wrote an answer about a similar question which goes a little bit more in detail on how hooks can be used to inspect the intermediate output shapes.

Does the way I create and store layers in subclassed Keras `Model` have any effect?

AKA Keras Model subclassing magic.
While playing with Keras, I noticed that ResNetBlock.layers gets populated as I put new layer instances into collections that I had previously assigned to my custom model.
class ResNetBlock(Model):
    PART_COUNT = 3
    def __init__(self, kernel_size, filters):
        super().__init__()
        self.convs = []
        self.batchNorms = []
        for part in range(ResNetBlock.PART_COUNT):
            if part == 1:
                conv = Conv2D(filters[part], kernel_size=kernel_size, padding="same")
            else:
                conv = Conv2D(filters[part], kernel_size=(1,1))
            self.convs.append(conv)
            self.batchNorms.append(BatchNormalization())
resnet = ResNetBlock(1, [1, 2, 3])
print(resnet.layers)  # actually prints non-empty list
                      # filled with Conv2Ds and BNs from above
Adapted from the official tutorial: https://www.tensorflow.org/beta/tutorials/eager/custom_layers
A bit of digging into the TensorFlow source showed that some kind of tracking is done via __setattr__ in the Network class.
Now, the code is not trivial and the documentation is lacking, so it is unclear whether the order of creating new layers or adding them to their respective collections matters at all. E.g. if I first fill in the convs collection, and only then the batchNorms collection, would it still be the same model?
In most tutorials each layer is actually put into its own attribute.
Bonus question is: why is it done so implicitly? This kind of magic kinda breaks the motto to prefer explicit over implicit. What if for some reason I'd need to use a custom collection type not derived from list? How would I ensure these magic operations are done properly?
The order won't matter. What really changes your model is the call method. This stores the order of the operations (even if the order of the weights were variable, they would be applied in the same graph with the same functions)
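Now, if you suspect that storing the layers in some other kind of container, rather than directly as attributes, might fail to register them, you can double check with:
print(len(resnet.weights))
Once the model has been called on an input (so that its weights exist), the count should be 6 * PART_COUNT:
2 tensors for each conv layer (kernel and bias)
4 tensors for each BatchNormalization layer (moving mean, moving variance, scale and offset)
Of these, only the scale and offset of BatchNormalization are trainable, so len(resnet.trainable_weights) should be 4 * PART_COUNT.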
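The kind of __setattr__ tracking mentioned in the question can be pictured with a pure-Python toy. ToyLayer and ToyModel are hypothetical, and this sketch finds layers lazily by scanning the tracked attributes; it is not how Keras implements tracking, only an illustration of the idea.

```python
class ToyLayer:
    def __init__(self, name):
        self.name = name


class ToyModel:
    def __init__(self):
        # Bypass our own __setattr__ while creating the bookkeeping list.
        object.__setattr__(self, "_tracked", [])

    def __setattr__(self, name, value):
        # Remember attributes that are layers, or containers that may hold layers.
        if isinstance(value, (ToyLayer, list, tuple)):
            self._tracked.append(value)
        object.__setattr__(self, name, value)

    @property
    def layers(self):
        # Flatten lazily so layers appended after assignment are still found.
        found = []
        for item in self._tracked:
            if isinstance(item, ToyLayer):
                found.append(item)
            else:
                found.extend(v for v in item if isinstance(v, ToyLayer))
        return found


model = ToyModel()
model.convs = []
model.norms = []
for i in range(2):
    model.convs.append(ToyLayer(f"conv_{i}"))
    model.norms.append(ToyLayer(f"bn_{i}"))

names = [layer.name for layer in model.layers]
```

In this toy the listing order follows the order in which the attributes were assigned, not the order in which layers are used, and a custom container type that is neither a list nor a tuple would go unnoticed. That is why checking the weight count is a useful sanity test.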

How to apply Optimizer on Variable in Chainer?

Here is an example in Pytorch:
optimizer = optim.Adam([modifier_var], lr=0.0005)
And here in Tensorflow:
self.train = self.optimizer.minimize(self.loss, var_list=[self.modifier])
But Chainer's optimizers can only be used on a 'Link'. How can I apply an Optimizer to a Variable in Chainer?
In short, there is no way to directly assign a chainer.Variable (nor even a chainer.Parameter) to a chainer.Optimizer.
The following is some redundant explanation.
First, I re-define Variable and Parameter to avoid confusion.
Variable is (1) torch.Tensor in PyTorch v0.4, (2) torch.autograd.Variable in PyTorch v0.3, and (3) chainer.Variable in Chainer v4.
Variable is an object that holds two tensors: .data and .grad. That is the necessary and sufficient condition, so a Variable is not necessarily a learnable parameter, which is what the optimizer targets.
In both libraries there is another class, Parameter, which is similar to but not the same as Variable. Parameter is torch.nn.Parameter in PyTorch and chainer.Parameter in Chainer.
Parameter must be a learnable parameter and should be optimized.
Therefore, there should be no case for registering a Variable (as opposed to a Parameter) with an Optimizer (although PyTorch allows registering a Variable with an Optimizer, this is just for backward compatibility).
Second, in PyTorch torch.optim.Optimizer directly optimizes Parameter, but in Chainer chainer.Optimizer DOES NOT optimize Parameter: instead, chainer.UpdateRule does. The Optimizer just registers UpdateRules to Parameters in a Link.
Therefore, it is only natural that chainer.Optimizer does not receive Parameter as its arguments, because it is just a "delivery-man" of UpdateRule.
If you want to attach a different UpdateRule to each Parameter, you should directly create an instance of an UpdateRule subclass and attach it to the Parameter.
Below is an example that learns a regression task with the MyChain MLP model using the Adam optimizer in Chainer.
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain, Variable
# Prepare your model (neural network) as `Link` or `Chain`
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(None, 30),
            l2=L.Linear(None, 30),
            l3=L.Linear(None, 1)
        )
    def __call__(self, x):
        h = self.l1(x)
        h = self.l2(F.sigmoid(h))
        return self.l3(F.sigmoid(h))
model = MyChain()
# Then you can instantiate optimizer
optimizer = chainer.optimizers.Adam()
# Register model to optimizer (to indicate which parameter to update)
optimizer.setup(model)
# Calculate loss, and update parameter as follows.
def lossfun(x, y):
    loss = F.mean_squared_error(model(x), y)
    return loss
# this iteration is "training", to fit the model into desired function.
# x and y are assumed to be the training inputs and targets, defined elsewhere.
for i in range(300):
    optimizer.update(lossfun, x, y)
So in summary, you need to set up the optimizer with the model; after that you can use the update function to calculate the loss and update the model's parameters.
The above code comes from here
Also, there is another way to write training code, using the Trainer module. For more detailed Chainer tutorials, please refer to the following:
chainer-handson
deep-learning-tutorial-with-chainer

When calling a class like Activation('relu')(X), how does it actually work?

While learning Keras, I keep seeing syntax like Activation('relu')(X). I looked at the source code and found that Activation is a class, so it makes no sense to me how syntax like Class(...)(...) works.
Here is an example and use case of it: A = Add()([A1, A2])
In Keras, it's a bit more convoluted than vanilla Python. Let's break down what happens when you call Activation('relu')(X):
Activation('relu') creates a new object of that class by calling the class __init__ method. This creates the object with 'relu' as parameter.
All objects in Python can be callable by implementing __call__ allowing you to call it like a function. Activation('relu')(X) now calls that function with X as parameter.
But wait, Activation doesn't directly implement it; in fact it is the base class method Layer.__call__ that gets called, which does some checks such as shape matching.
Then Layer.__call__ actually calls self.call(X) which then invokes the Activation.call method which applies the activation to the tensor and returns the result.
Hope that clarifies that line of code, a similar process happens when creating other layers and calling them with the functional API.
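The steps above can be mimicked in a few lines of plain Python. ToyLayer and ToyActivation are hypothetical names for illustration, and the input check is only a placeholder for the shape checks that the real Layer.__call__ performs.

```python
class ToyLayer:
    """Base class: __call__ validates the input, then delegates to call()."""

    def __call__(self, x):
        if not isinstance(x, list):
            raise TypeError("expected a list of numbers")
        return self.call(x)

    def call(self, x):
        raise NotImplementedError


class ToyActivation(ToyLayer):
    def __init__(self, kind):
        # Step 1: the constructor only stores the configuration.
        self.kind = kind

    def call(self, x):
        # Step 2: the actual computation, reached via ToyLayer.__call__.
        if self.kind == "relu":
            return [max(0.0, v) for v in x]
        raise ValueError(f"unknown activation: {self.kind}")


# Create and call in a single expression ...
one_step = ToyActivation("relu")([-1.0, 2.0])

# ... which is the same as creating the object first and calling it afterwards.
layer = ToyActivation("relu")
two_step = layer([-1.0, 2.0])
```

The subclass never defines __call__ itself; it inherits it from the base class and only overrides call, which mirrors the division of work described for Keras layers.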
In python, classes may have the __call__ method, meaning that class instances are callable.
So, it's totally ok to call Activation(...)(...).
The first step creates an instance of Activation, and the second calls that instance with some parameters.
It's exactly the same as doing:
activationLayer = Activation('relu')
outputTensor = activationLayer(inputTensor) #where inputTensor == X in your example
With this, you can also reuse the same layers with different input tensors:
activationLayer = Activation('relu')
out1 = activationLayer(X1)
out2 = activationLayer(X2)
This doesn't make a big difference with a standard activation layer, but it starts getting very interesting with certain trained layers.
Example: you want to use a standard trained VGG16 model to process two images and then join the images:
vgg16 = keras.applications.VGG16(......)
img1 = Input(imageShape1)
img2 = Input(imageShape2)
out1 = vgg16(img1) #a model is also a layer by inheritance
out2 = vgg16(img2)
... continue the model ....
Are you expecting the new keyword? Python does not use that keyword, instead uses "function notation":
Class instantiation uses function notation. Just pretend that the class
object is a parameterless function that returns a new instance of the
class. For example (assuming the above class):
x = MyClass()
