As the title states, I'm wondering how I can access the privileged 'training' argument when I'm using the functional API.
So if I use subclassing, I can write something like:
class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        ...
        self.BN = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=None):
        return self.BN(inputs, training=training)
So I can control how my batch norm behaves during training and prediction. But if I want to use the functional API:
input = tf.keras.Input(someshape)
normalized = tf.keras.layers.BatchNormalization()(input)
model = tf.keras.Model(inputs=input, outputs=normalized)
Now I can't really set the privileged 'training' argument for my batch norm anymore. I love the functional API; it's just so much fun to use, but having to build around this is quite often a dealbreaker. I feel like I must be missing some important idea on how one would solve this.
I'm aware that I could create a tf.keras.Input which could hold the 'training' argument. But this would change it from a keyword arg to some element of a list, which creates very inconsistent code. Any smarter solution to this?
Edit: I should make it clear that I'm looking for a general idea that can be used for the 'training' arg, not just for tackling BatchNormalization in particular.
When you instantiate the model with model = tf.keras.Model(inputs=input, outputs=normalized), the model has not yet been built. You will need to call the build method, which usually happens the first time you call fit, or when you do everything by hand with a gradient tape; at that point the weights are initialized. Now, if you use the fit method or call your model directly with output_tensors = mymodel(input_tensors, training=True), the training flag is set to True; conversely, if you use the predict method or call output_tensors = mymodel(input_tensors, training=False), it is set to False (which is obvious if you call the model directly). Keras propagates that flag down to every layer inside the functional graph, including your BatchNormalization, so you don't need to wire it up yourself.
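A minimal sketch of that behaviour (toy shapes, purely illustrative):

import tensorflow as tf

inputs = tf.keras.Input((4,))
normalized = tf.keras.layers.BatchNormalization()(inputs)
model = tf.keras.Model(inputs=inputs, outputs=normalized)

x = tf.random.normal((2, 4))
out_train = model(x, training=True)   # BN uses batch statistics, updates the moving averages
out_infer = model(x, training=False)  # BN uses the learned moving averages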
I need to set the attribute out_activation_ = 'logistic' in an MLPRegressor from sklearn. Supposedly this attribute can take the names of the relevant activation functions ('relu', 'logistic', 'tanh', etc.). The problem is that I cannot find a way to control this attribute and set it to the preferred function. Please, if someone has faced this problem before or knows something more, I want some help.
I have tried to set the attribute on MLPRegressor(): error. I have tried the set_params() method: error. I have tried changing it manually through the Variable Explorer: error. Finally, I used MLPName.out_activation_ = 'logistic', but again, when I used the fit() method it changed back to 'identity'.
CODE:
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    signals_final, masks, test_size=0.05, random_state=17)

scaler2 = MinMaxScaler()
X_train2 = scaler2.fit_transform(X_train2)
X_test2 = scaler2.transform(X_test2)

MatchingNetwork = MLPRegressor(alpha=1e-15, hidden_layer_sizes=(300,),
                               random_state=1, max_iter=20000,
                               activation='logistic', batch_size=64)
MatchingNetwork.out_activation_ = 'logistic'
You cannot. The output activation is determined by the problem type at fit time. For regression, the identity activation is used; see the User Guide.
Here is the relevant bit of source code. You might be able to hack around it by fitting for one iteration, changing the attribute, and then using partial_fit, since the _initialize method won't be called again; but the gradient computation still assumes the identity output, so it's likely to break when back-propagating. A sketch of that hack follows.
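A minimal sketch of the hack, on toy data (note that out_activation_ only changes the forward pass; training still back-propagates as if the output were the identity):

import numpy as np
from sklearn.neural_network import MLPRegressor

X, y = np.random.rand(100, 5), np.random.rand(100)

reg = MLPRegressor(hidden_layer_sizes=(300,), max_iter=1)
reg.fit(X, y)                     # _initialize runs here, forcing out_activation_ = 'identity'
reg.out_activation_ = 'logistic'  # override after initialization
reg.partial_fit(X, y)             # does not re-run _initialize, so the override survives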
Generally, I think the sklearn neural networks aren't designed to be super flexible: other packages play that role, are more efficient (use GPUs), etc.
AKA Keras Model subclassing magic.
While playing with Keras, I noticed that ResNetBlock.layers gets populated as I put new instances of layers into collections I previously attached to my custom model.
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, BatchNormalization

class ResNetBlock(Model):
    PART_COUNT = 3

    def __init__(self, kernel_size, filters):
        super().__init__()
        self.convs = []
        self.batchNorms = []
        for part in range(ResNetBlock.PART_COUNT):
            if part == 1:
                conv = Conv2D(filters[part], kernel_size=kernel_size, padding="same")
            else:
                conv = Conv2D(filters[part], kernel_size=(1, 1))
            self.convs.append(conv)
            self.batchNorms.append(BatchNormalization())

resnet = ResNetBlock(1, [1, 2, 3])
print(resnet.layers)  # actually prints a non-empty list,
                      # filled with the Conv2Ds and BNs from above
Adapted from the official tutorial: https://www.tensorflow.org/beta/tutorials/eager/custom_layers
A bit of digging into the TensorFlow source showed that some kind of tracking is done via __setattr__ in the Network class.
The code is not trivial and the documentation is lacking, and it seems unclear whether the order of creating new layers and adding them to their respective collections matters at all. E.g. if I first fill the convs collection, and only then the batchNorms collection, would it still be the same model?
In most tutorials each layer is actually put into its own attribute.
Bonus question is: why is it done so implicitly? This kind of magic kinda breaks the motto to prefer explicit over implicit. What if for some reason I'd need to use a custom collection type not derived from list? How would I ensure these magic operations are done properly?
The order won't matter. What really defines your model is the call method: it stores the order of the operations (even if the order of the weights in the collections varied, they would still be applied in the same graph with the same functions).
Now, if you suspect that using some other kind of storage for the layers, rather than an attribute, might fail to register a layer for some reason, you can double-check with:
print(len(resnet.trainable_weights))
The count should be 4 * PART_COUNT:
2 trainable tensors for each conv layer (kernel and bias)
2 trainable tensors for each BatchNormalization layer (scale and offset)
The moving mean and variance of each BatchNormalization layer are updated during the forward pass rather than trained, so they only show up in resnet.weights, which brings that total to 6 * PART_COUNT. Also note that a subclassed model creates its weights lazily: both lists stay empty until the model is built by a first call on real inputs.
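To actually see those numbers you need a forward pass, and the snippet above defines no call method. A sketch with a hypothetical call() added just so the block can be built:

import tensorflow as tf

class BuildableResNetBlock(ResNetBlock):
    # hypothetical forward pass, added only so the weights get created;
    # the original snippet defines no call()
    def call(self, inputs):
        x = inputs
        for conv, bn in zip(self.convs, self.batchNorms):
            x = bn(conv(x))
        return x

block = BuildableResNetBlock(1, [1, 2, 3])
block(tf.zeros((1, 8, 8, 1)))  # the first call builds all the weights

print(len(block.trainable_weights))  # 12 == 4 * PART_COUNT
print(len(block.weights))            # 18 == 6 * PART_COUNT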
While looking at some PyTorch code on pose estimation (AlphaPose), I noticed some unfamiliar syntax:
Basically, we define a Darknet class which inherits nn.Module properties like so: class Darknet(nn.Module)
This re-constructs the neural net from some config file and also defines functions to load pre-trained weights and a forward pass
Now, forward pass takes the following parameters:
def forward(self, x, CUDA)
I should note that in the class definition, forward is the only method that has a CUDA parameter (this will become important later on).
In the forward pass we get the predictions:
for i in range(number_of_modules):
    x = self.module[i](x)
where module[i] was constructed as:
module = nn.Sequential()
conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
module.add_module("conv_{0}".format(index), conv)
We then invoke this model and (I presume) a forward method like so:
self.det_model = Darknet("yolo/cfg/yolov3-spp.cfg")
self.det_model.load_weights('models/yolo/yolov3-spp.weights')
self.det_model.cpu()
self.det_model.eval()
image = image.cpu()
prediction = self.det_model(image, CUDA=False)
I assume the last line calls the forward pass, but why not use .forward? Is this PyTorch-specific syntax, or am I missing some basic Python principle?
This is nothing Torch-specific. When you call something as class_object(fn_params), it invokes the __call__ method of that class.
If you dig into the code of torch, specifically nn.Module, you will see that __call__ internally invokes forward, while also taking care of the hooks and states that PyTorch allows. So when you call self.det_model(img, CUDA=False), you are still calling forward.
See the code for nn.Module here.
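For intuition, a stripped-down sketch of the same pattern (not the real nn.Module code; hooks and bookkeeping omitted):

class Module:
    def __call__(self, *args, **kwargs):
        # the real nn.Module also runs registered hooks around this call
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError

class Doubler(Module):
    def forward(self, x):
        return 2 * x

print(Doubler()(21))  # 42 -- calling the instance dispatches to forward()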
While learning Keras, I always see syntax like Activation('relu')(X). I looked at the source code and found that Activation is a class, so it makes no sense to me how syntax like Class(...)(...) works.
Here is an example and use case of it: A = Add()([A1, A2])
In Keras it's a bit more convoluted than in vanilla Python. Let's break down what happens when you call Activation('relu')(X):
Activation('relu') creates a new object of that class by calling the class's __init__ method. This creates the object with 'relu' as its parameter.
All objects in Python can be made callable by implementing __call__, allowing you to call them like functions. Activation('relu')(X) now calls that object with X as the parameter.
But wait: Activation doesn't implement __call__ directly; in fact it is the base class's Layer.__call__ that gets called, which does some checks like shape matching.
Then Layer.__call__ actually calls self.call(X), which invokes the Activation.call method, which applies the activation to the tensor and returns the result.
Hope that clarifies that line of code; a similar process happens when creating other layers and calling them with the functional API.
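A toy illustration of the same two-step pattern, using a hypothetical class (not from Keras):

class Scale:
    def __init__(self, factor):
        # step 1: Scale(3) runs __init__ and stores the configuration
        self.factor = factor

    def __call__(self, x):
        # step 2: the trailing (7) invokes __call__ on that instance
        return self.factor * x

print(Scale(3)(7))  # 21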
In python, classes may have the __call__ method, meaning that class instances are callable.
So, it's totally ok to call Activation(...)(...).
The first step creates an instance of Activation, and the second calls that instance with some parameters.
It's exactly the same as doing:
activationLayer = Activation('relu')
outputTensor = activationLayer(inputTensor) #where inputTensor == X in your example
With this, you can also reuse the same layers with different input tensors:
activationLayer = Activation('relu')
out1 = activationLayer(X1)
out2 = activationLayer(X2)
This doesn't make a big difference with a standard activation layer, but it starts getting very interesting with certain trained layers.
Example: you want to use a standard trained VGG16 model to process two images and then join the results:
vgg16 = keras.applications.VGG16(......)
img1 = Input(imageShape1)
img2 = Input(imageShape2)
out1 = vgg16(img1) #a model is also a layer by inheritance
out2 = vgg16(img2)
... continue the model ....
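A runnable sketch along those lines (the input shapes, pooling, and head are illustrative choices, not from the example above):

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Concatenate, Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

vgg16 = VGG16(include_top=False, weights="imagenet")  # one shared feature extractor

img1 = Input((224, 224, 3))
img2 = Input((224, 224, 3))

feat1 = GlobalAveragePooling2D()(vgg16(img1))  # the same weights process both inputs
feat2 = GlobalAveragePooling2D()(vgg16(img2))

merged = Concatenate()([feat1, feat2])
output = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[img1, img2], outputs=output)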
Are you expecting the new keyword? Python does not use that keyword; instead it uses "function notation":
Class instantiation uses function notation. Just pretend that the class
object is a parameterless function that returns a new instance of the
class. For example (assuming the above class):
x = MyClass()
I have a tensorflow contrib.learn.DNNRegressor that I have trained as part of the following code snippet:
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=fc,
    hidden_units=hu_array,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.001),
    enable_centered_bias=False,
    activation_fn=tf.tanh,
    model_dir="./models/my_model/",
)
regressor.fit(x=training_features, y=training_labels, steps=10000)
The trained network performs quite well, and I'd like to use it as a part of some other code, on another machine. I have tried copying over the models/my_model directory, and constructing a new DNNRegressor pointing just at the model_dir, but it requires that I supply feature_columns and hidden_units definitions. Shouldn't that information be available via the snapshots stored in model_dir? Is there a better way to save/recover a trained model which is performing well, to be used as a predictor, without having to separately save the feature_columns and hidden_units?
I came up with something workable. Not ideal, but it gets the job done. If anyone has a better idea, I am all ears.
I converted my kwargs for DNNRegressor into a dict and used the ** operator. Then I was able to pickle the kwargs dict and reconstruct the DNNRegressor from it. E.g.:
reg_args = {'feature_columns': fc, 'hidden_units': hu_array, ...}
regressor = tf.contrib.learn.DNNRegressor(**reg_args)
pickle.dump(reg_args, open('reg_args.pkl', 'wb'))
Later on, I reconstruct via:
reg_args = pickle.load(open('reg_args.pkl', 'rb'))
# On another machine, where my model dir path changed:
reg_args['model_dir'] = NEW_MODEL_DIR
regressor = tf.contrib.learn.DNNRegressor(**reg_args)
It worked well. I'm sure there must be a better way, but for now, if someone is trying to figure out a workaround for tf.contrib.learn, this is one solution.
When training
You call DNNRegressor(..., model_dir) and then call the fit() and evaluate() methods.
When testing
You call DNNRegressor(..., model_dir) and can then call the predict() method. Your model will find the trained model in model_dir and load the trained model params.
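A sketch of that flow (assuming fc and hu_array are rebuilt exactly as they were at training time):

# training machine
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=fc, hidden_units=hu_array, model_dir="./models/my_model/")
regressor.fit(x=training_features, y=training_labels, steps=10000)

# prediction machine: same constructor arguments, same model_dir contents
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=fc, hidden_units=hu_array, model_dir="./models/my_model/")
predictions = regressor.predict(x=test_features)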
Reference
Issue #3340 of TF