TF Gradient Tape has issues with cross products? - python

I'm trying to use TF gradient tape as an autograd tool for root finding via Newton's method. But when I'm trying to compute the Jacobian matrix, it seems that tf.GradientTape.jacobian can't handle cross products:
import numpy as np
import tensorflow as tf
x = tf.convert_to_tensor(np.array([1., 2., 3.]))
Wx = np.ones((3))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.linalg.cross(x, Wx)
print(tape.jacobian(y, x))
gives the error below:
StagingError: in converted code:
relative to /Users/xinzhang/anaconda3/lib/python3.7/site-packages:
tensorflow_core/python/ops/parallel_for/control_flow_ops.py:184 f *
return _pfor_impl(loop_fn, iters, parallel_iterations=parallel_iterations)
tensorflow_core/python/ops/parallel_for/control_flow_ops.py:257 _pfor_impl
outputs.append(converter.convert(loop_fn_output))
tensorflow_core/python/ops/parallel_for/pfor.py:1231 convert
output = self._convert_helper(y)
tensorflow_core/python/ops/parallel_for/pfor.py:1395 _convert_helper
if flags.FLAGS.op_conversion_fallback_to_while_loop:
tensorflow_core/python/platform/flags.py:84 __getattr__
wrapped(_sys.argv)
absl/flags/_flagvalues.py:633 __call__
name, value, suggestions=suggestions)
UnrecognizedFlagError: Unknown command line flag 'f'
Whereas if I replace the call to jacobian with a simple gradient:
x = tf.convert_to_tensor(np.array([1., 2., 3.]))
Wx = np.ones((3))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.linalg.cross(x, Wx)
print(tape.gradient(y, x))
gives the expected result:
tf.Tensor([0. 0. 0.], shape=(3,), dtype=float64)
Is this a bug, or am I doing something wrong with the tape.jacobian method?
P.S. Python version 3.7.4; TF version 2.0.0. Everything was installed with conda.

This appears to be a bug in TensorFlow 2.0 that is fixed in TensorFlow 2.1.
Upgrading TensorFlow to 2.1 or 2.2 should resolve the issue.
Working code is shown below:
!pip install tensorflow==2.2
import tensorflow as tf
import numpy as np
print(tf.__version__)
x = tf.convert_to_tensor(np.array([1., 2., 3.]))
Wx = np.ones((3))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.linalg.cross(x, Wx)
print(tape.jacobian(y, x))
Output is shown below:
2.2.0
tf.Tensor(
[[ 0. 1. -1.]
[-1. 0. 1.]
[ 1. -1. 0.]], shape=(3, 3), dtype=float64)
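As a framework-independent sanity check on that output, the Jacobian of x × w with respect to x can be written in closed form: it is the negated skew-symmetric matrix of w. The NumPy sketch below reproduces the matrix above and also suggests why tape.gradient returned zeros, since it sums the Jacobian over the output index. The helper name cross_jacobian_wrt_x is made up for illustration and is not a TensorFlow API.

```python
import numpy as np

def cross_jacobian_wrt_x(w):
    """Jacobian d(x × w)/dx for a fixed 3-vector w (hypothetical helper)."""
    w0, w1, w2 = w
    # Row i holds the partial derivatives of component i of x × w.
    return np.array([
        [0.0,  w2, -w1],
        [-w2, 0.0,  w0],
        [ w1, -w0, 0.0],
    ])

x = np.array([1.0, 2.0, 3.0])
w = np.ones(3)

J = cross_jacobian_wrt_x(w)
print(J)

# x × w is linear in x, so J @ x must equal the cross product itself.
print(np.allclose(J @ x, np.cross(x, w)))

# Summing over the output index gives what tape.gradient(y, x) reports.
print(J.sum(axis=0))
```

With w set to all ones every column of J sums to zero, which is consistent with the all-zero gradient in the question.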

Related

Tensorflow sort changes values in output list to 0 when the tensor datatype is tf.float32, but not tf.float64

Actions speak louder than words so here is the MWE:
import tensorflow as tf
n=17
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
When n<=16 it sorts the values just fine, but when n>16 it sorts the list and then sets the values at index 16 and higher to -0. Example output:
tf.Tensor(
[0.41191268 0.48915362 0.65293264 0.6125376 0.00088847 0.03644979
0.13768506 0.528106 0.27231824 0.4003389 0.5799836 0.83420205
0.06494105 0.39109504 0.8135816 0.153288 0.07945895], shape=(17,), dtype=float32)
tf.Tensor(
[ 0.00088847 0.03644979 0.06494105 0.07945895 0.13768506 0.153288
0.27231824 0.39109504 0.4003389 0.41191268 0.48915362 0.528106
0.5799836 0.6125376 0.65293264 0.8135816 -0. ], shape=(17,), dtype=float32)
But when I make the dtype tf.float64 there is no problem for seemingly arbitrary n:
tf.Tensor(
[0.91347295 0.60086058 0.0271204 0.83564393 0.49664206 0.96215479
0.60472639 0.64395121 0.58394402 0.93489432 0.50379539 0.14087138
0.51662724 0.29758834 0.5657154 0.08638131 0.47912787], shape=(17,), dtype=float64)
tf.Tensor(
[0.0271204 0.08638131 0.14087138 0.29758834 0.47912787 0.49664206
0.50379539 0.51662724 0.5657154 0.58394402 0.60086058 0.60472639
0.64395121 0.83564393 0.91347295 0.93489432 0.96215479], shape=(17,), dtype=float64)
Not sure if this is a bug or expected behavior. It does not depend on eager execution. I first noticed the problem when the TensorFlow Probability percentile function gave me -0.0 as the value, so I wrote my own percentile function and observed the same issue (which I suspect is due to this underlying issue with tf.sort). NumPy sorting works fine regardless of the data type, but I was trying to keep things within TensorFlow.
Is there any reason why this might be happening, or should I file a bug report?
Hardware: I am on an M1 MacBook Air using TensorFlow 2.5.0.
I ran the same code with tensorflow==2.5.0, the version you mentioned, and could not reproduce the issue.
import tensorflow as tf
print(tf.__version__)
Output:
2.5.0
and then
import tensorflow as tf
n=17
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
Output:
tf.Tensor(
[0.7946081 0.84397626 0.04671419 0.276353 0.8124876 0.66761124
0.21016991 0.28140187 0.22393394 0.20382321 0.667526 0.1714747
0.9672215 0.17870915 0.9914366 0.32059753 0.5422765 ], shape=(17,), dtype=float32)
tf.Tensor(
[0.04671419 0.1714747 0.17870915 0.20382321 0.21016991 0.22393394
0.276353 0.28140187 0.32059753 0.5422765 0.667526 0.66761124
0.7946081 0.8124876 0.84397626 0.9672215 0.9914366 ], shape=(17,), dtype=float32)
Another run, with n=18:
import tensorflow as tf
n=18
a = tf.random.uniform(shape=[n], dtype=tf.float32)
print(a)
print(tf.sort(a))
Output:
tf.Tensor(
[0.1922586 0.6136733 0.7517139 0.3762852 0.52895963 0.7804493
0.9869323 0.08194113 0.3963052 0.6049119 0.9553219 0.18031311
0.58210933 0.92059183 0.4442644 0.91004515 0.4451145 0.8300687 ], shape=(18,), dtype=float32)
tf.Tensor(
[0.08194113 0.18031311 0.1922586 0.3762852 0.3963052 0.4442644
0.4451145 0.52895963 0.58210933 0.6049119 0.6136733 0.7517139
0.7804493 0.8300687 0.91004515 0.92059183 0.9553219 0.9869323 ], shape=(18,), dtype=float32)
Could you check again and let us know whether the issue still persists?

Calculating jacobians and gradients using tensor flow

I'm trying to solve 2D Darcy equation which is a mixed formulation. Suppose I have a target vector and source vector as follows:
u = [u1,u2,p]
x = [x,y].
grad(u,x) =
[du1/dx, du2/dx, dp/dx;
du1/dy, du2/dy, dp/dy]
I'm not sure whether this is what tf.gradients(u,x) computes.
tf.gradients(u,x) doesn't return what you want.
From https://www.tensorflow.org/api_docs/python/tf/gradients:
gradients() adds ops to the graph to output the derivatives of ys with
respect to xs. It returns a list of Tensor of length len(xs) where
each tensor is the sum(dy/dx) for y in ys and for x in xs.
Here is how you can get the Jacobian:
import tensorflow as tf
x=tf.constant([3.0,4.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    u1=x[0]**2+x[1]**2
    u2=x[0]**2
    u3=x[1]**3
    u=tf.stack([u1,u2,u3])
J = tape.jacobian(u, x)
print(J)
'''
tf.Tensor(
[[ 6. 8.]
[ 6. 0.]
[ 0. 48.]], shape=(3, 2), dtype=float32)
'''
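To see the difference between the two results without TensorFlow, the same three functions can be differentiated numerically. This is only a sketch: numerical_jacobian is a hypothetical helper based on central differences, not a library function. The full matrix should match tape.jacobian, while summing over the outputs gives the length-2 vector that tf.gradients(u, x) would return according to the documentation quoted above.

```python
import numpy as np

def u(x):
    # Same three outputs as in the answer above.
    return np.array([x[0]**2 + x[1]**2, x[0]**2, x[1]**3])

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian with shape (len(f(x)), len(x))."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j: how every output changes when only input j moves.
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([3.0, 4.0])
J = numerical_jacobian(u, x)
print(np.round(J, 4))        # rows: u1, u2, u3; columns: d/dx[0], d/dx[1]

summed = J.sum(axis=0)       # derivative of u1 + u2 + u3 w.r.t. each input
print(np.round(summed, 4))
```

Up to finite-difference error, the summed vector is [12, 56], which carries less information than the 3x2 Jacobian.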

Tensorflow aggregates scalar-tensor multiplication gradient

Suppose I multiply a vector with a scalar, e.g.:
a = tf.Variable(3.)
b = tf.Variable([1., 0., 1.])
with tf.GradientTape() as tape:
    c = a*b
grad = tape.gradient(c, a)
The resulting gradient I get is a scalar,
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>
whereas we would expect the vector:
<tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([1., 0., 1.], dtype=float32)>
Looking at other examples, it appears that TensorFlow sums the expected vector, and does the same for scalar-matrix multiplication and so on.
Why does TensorFlow do this? This can probably be avoided using @custom_gradient, but is there another, less cumbersome way to get the correct gradient?
There appear to be some related questions, but these all seem to consider the gradient of a loss function that aggregates over a training batch. No loss function or aggregation is used here, so I think the issue is something else?
You're getting a scalar value because you took the gradient with respect to a scalar. You would get a vector if you took the gradient with respect to a vector. Take a look at the following example:
import tensorflow as tf
a = tf.Variable(3., trainable=True)
b = tf.Variable([1., 0, 1.], trainable=True)
c = tf.Variable(2., trainable=True)
d = tf.Variable([2., 1, 2.], trainable=True)
with tf.GradientTape(persistent=True) as tape:
    e = a*b*c*d  # elementwise: e[i] = a * b[i] * c * d[i]
    tf.print(e)
grad = tape.gradient(e, [a, b, c, d])
grad[0].numpy(), grad[1].numpy(), grad[2].numpy(), grad[3].numpy()
[12 0 12]
(8.0,
array([12., 6., 12.], dtype=float32),
12.0,
array([6., 0., 6.], dtype=float32))
Formally, what I was looking for was the differential of the vector field that is a function of the variable a. For a vector field, the differential is the same as the Jacobian. It turns out that what I was looking for can be done with tape.jacobian.
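One way to see where the scalar comes from: reverse-mode differentiation computes a vector-Jacobian product, and when no upstream vector is supplied the outputs are weighted by ones, which amounts to summing them. The NumPy sketch below illustrates that idea for c = a*b. It is not TensorFlow code, and vjp is a made-up helper name.

```python
import numpy as np

a = 3.0
b = np.array([1.0, 0.0, 1.0])
c = a * b

# c[i] = a * b[i], so dc[i]/da = b[i]: the Jacobian w.r.t. the scalar a is b.
jacobian = b.copy()

def vjp(upstream):
    """Weight each output's derivative by `upstream` and add them up."""
    return float(np.dot(upstream, jacobian))

print(vjp(np.ones(3)))                 # every output weighted by 1: 2.0
print(vjp(np.array([1.0, 0.0, 0.0])))  # only c[0] contributes: 1.0
print(jacobian)                        # the vector the question expected
```

The first call corresponds to the scalar 2.0 reported in the question, and the unweighted Jacobian is the vector [1, 0, 1] that tape.jacobian returns.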

Can Convolution2D work on rectangular images?

Let's say I have a 360px by 240px image. Instead of cropping my (already small) image to 240x240, can I create a convolutional neural network that operates on the full rectangle? Specifically using the Convolution2D layer.
I ask because every paper I've read doing CNNs seems to have square input sizes, so I wonder if what I propose will be OK, and if so, what disadvantages I may run into. Are all the settings (like border_mode='same') going to work the same?
There are no issues with a rectangular image. Everything will work the same as for square images.
Yes.
But why don't you give it a try?
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
model_network.layers[1].set_weights(w)
print("Weights after change:")
print(model_network.layers[1].get_weights())
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
Build a sample model:
inp = Input(shape=(1,None,None))
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
model_network = Model(input=inp, output=output)
Give it some weights, chosen so that you can predict the output, say:
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
model_network.layers[1].set_weights(w)
So that the convolution would simply double your input.
Give it your rectangular image:
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
And check the output to see if it works:
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 2. 4. 6. 20.]
[ 8. 10. 12. 22.]
[ 14. 16. 18. 24.]]]]
original post with some changes
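The same experiment can be reproduced without Keras. The sketch below is a plain NumPy 3x3 filter with zero padding so that the output keeps the input size. The function name filter_same is invented for this example, the kernel size is hard-coded to 3x3, and strictly speaking it computes a cross-correlation, which coincides with convolution for this symmetric kernel. Nothing in the loop assumes that height equals width.

```python
import numpy as np

def filter_same(image, kernel):
    """Slide a 3x3 kernel over a 2-D image with zero padding ('same' size)."""
    height, width = image.shape
    padded = np.pad(image, 1, mode="constant")
    out = np.zeros_like(image, dtype=float)
    for i in range(height):
        for j in range(width):
            # The 3x3 window of the padded image centred on pixel (i, j).
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

kernel = np.array([[0., 0., 0.],
                   [0., 2., 0.],
                   [0., 0., 0.]])

image = np.array([[1., 2., 3., 10.],
                  [4., 5., 6., 11.],
                  [7., 8., 9., 12.]])  # 3 rows by 4 columns

result = filter_same(image, kernel)
print(result)
```

The result has the same 3x4 shape as the input and every value is doubled, matching the Keras output above.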

Reset weights in Keras layer

I'd like to reset (randomize) the weights of all layers in my Keras (deep learning) model. The reason is that I want to be able to train the model several times with different data splits without having to do the (slow) model recompilation every time.
Inspired by this discussion, I'm trying the following code:
# Reset weights
for layer in KModel.layers:
    if hasattr(layer,'init'):
        input_dim = layer.input_shape[1]
        new_weights = layer.init((input_dim, layer.output_dim),name='{}_W'.format(layer.name))
        layer.trainable_weights[0].set_value(new_weights.get_value())
However, it only partly works.
Partly, because I've inspected some layer.get_weights() values, and they seem to change. But when I restart the training, the cost values are much lower than the initial cost values on the first run. It's almost as if I've succeeded in resetting some of the weights, but not all of them.
Save the initial weights right after compiling the model but before training it:
model.save_weights('model.h5')
and then after training, "reset" the model by reloading the initial weights:
model.load_weights('model.h5')
This gives you an apples to apples model to compare different data sets and should be quicker than recompiling the entire model.
Reset all layers by checking for initializers:
def reset_weights(model):
    import keras.backend as K
    session = K.get_session()
    for layer in model.layers:
        if hasattr(layer, 'kernel_initializer'):
            layer.kernel.initializer.run(session=session)
        if hasattr(layer, 'bias_initializer'):
            layer.bias.initializer.run(session=session)
Update: kernel_initializer is kernel.initializer now.
If you want to truly re-randomize the weights, and not merely restore the initial weights, you can do the following. The code is slightly different depending on whether you're using TensorFlow or Theano.
from keras.initializers import glorot_uniform # Or your initializer of choice
import keras.backend as K
initial_weights = model.get_weights()
backend_name = K.backend()
if backend_name == 'tensorflow':
    k_eval = lambda placeholder: placeholder.eval(session=K.get_session())
elif backend_name == 'theano':
    k_eval = lambda placeholder: placeholder.eval()
else:
    raise ValueError("Unsupported backend")
new_weights = [k_eval(glorot_uniform()(w.shape)) for w in initial_weights]
model.set_weights(new_weights)
I have found the clone_model function that creates a cloned network with the same architecture but new model weights.
Example of use:
model_cloned = tensorflow.keras.models.clone_model(model_base)
Comparing the weights:
original_weights = model_base.get_weights()
print("Original weights", original_weights)
print("========================================================")
print("========================================================")
print("========================================================")
model_cloned = tensorflow.keras.models.clone_model(model_base)
new_weights = model_cloned.get_weights()
print("New weights", new_weights)
If you execute this code several times, you will notice that the cloned model receives new weights each time.
Tensorflow 2 answer:
for ix, layer in enumerate(model.layers):
    if hasattr(model.layers[ix], 'kernel_initializer') and \
            hasattr(model.layers[ix], 'bias_initializer'):
        weight_initializer = model.layers[ix].kernel_initializer
        bias_initializer = model.layers[ix].bias_initializer
        old_weights, old_biases = model.layers[ix].get_weights()
        model.layers[ix].set_weights([
            weight_initializer(shape=old_weights.shape),
            bias_initializer(shape=old_biases.shape)])
Original weights:
model.layers[1].get_weights()[0][0]
array([ 0.4450057 , -0.13564804, 0.35884023, 0.41411972, 0.24866664,
0.07641453, 0.45726687, -0.04410008, 0.33194816, -0.1965386 ,
-0.38438258, -0.13263905, -0.23807487, 0.40130925, -0.07339832,
0.20535922], dtype=float32)
New weights:
model.layers[1].get_weights()[0][0]
array([-0.4607593 , -0.13104361, -0.0372932 , -0.34242013, 0.12066692,
-0.39146423, 0.3247317 , 0.2635846 , -0.10496247, -0.40134245,
0.19276887, 0.2652442 , -0.18802321, -0.18488845, 0.0826562 ,
-0.23322225], dtype=float32)
K.get_session().close()
K.set_session(tf.Session())
K.get_session().run(tf.global_variables_initializer())
Try set_weights.
For example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
np.random.seed(1234)
from keras.layers import Input
from keras.layers.convolutional import Convolution2D
from keras.models import Model
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Input:")
print(input_mat)
print("Output:")
print(model_network.predict(input_mat))
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Build a model with, say, two convolutional layers:
print("Building Model...")
inp = Input(shape=(1,None,None))
x = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(inp)
output = Convolution2D(1, 3, 3, border_mode='same', init='normal',bias=False)(x)
model_network = Model(input=inp, output=output)
Then define your weights (I'm using a simple w, but you could use np.random.uniform or anything similar if you want):
w = np.asarray([
[[[
[0,0,0],
[0,2,0],
[0,0,0]
]]]
])
Take a peek at the layers inside the model:
for layer_i in range(len(model_network.layers)):
    print(model_network.layers[layer_i])
Set the weights of each convolutional layer (you'll see that the first layer is actually the input, which you don't want to change; that's why the range starts from 1, not zero):
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w)
Generate some input for your test and predict the output from your model
input_mat = np.asarray([
[[
[1.,2.,3.,10.],
[4.,5.,6.,11.],
[7.,8.,9.,12.]
]]
])
print("Output:")
print(model_network.predict(input_mat))
You could change it again if you want and check again for the output:
w2 = np.asarray([
[[[
[0,0,0],
[0,3,0],
[0,0,0]
]]]
])
for layer_i in range(1,len(model_network.layers)):
    model_network.layers[layer_i].set_weights(w2)
print("Output:")
print(model_network.predict(input_mat))
Sample output:
Using Theano backend.
Building Model...
<keras.engine.topology.InputLayer object at 0x7fc0c619fd50>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6166250>
<keras.layers.convolutional.Convolution2D object at 0x7fc0c6150a10>
Weights after change:
[array([[[[ 0., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 0.]]]], dtype=float32)]
Input:
[[[[ 1. 2. 3. 10.]
[ 4. 5. 6. 11.]
[ 7. 8. 9. 12.]]]]
Output:
[[[[ 4. 8. 12. 40.]
[ 16. 20. 24. 44.]
[ 28. 32. 36. 48.]]]]
Output:
[[[[ 9. 18. 27. 90.]
[ 36. 45. 54. 99.]
[ 63. 72. 81. 108.]]]]
From your peek at .layers you can see that the first layer is input and the others your convolutional layers.
For tf2 the simplest way to actually reset weights would be:
tf_model.set_weights(
clone_model(tf_model).get_weights()
)
clone_model(), as mentioned by @danielsaromo, returns a new model with trainable parameters initialized from scratch. We use its weights to reinitialize our model, so no model compilation (knowledge of its loss or optimizer) is needed.
There are two caveats though. The first is mentioned in clone_model()'s documentation:
clone_model will not preserve the uniqueness of shared objects within the model (e.g. a single variable attached to two distinct layers will be restored as two separate variables).
The other caveat is that for large models cloning might fail due to memory limits.
To randomly re-initialize the weights of a compiled, untrained model in TF 2.0 (tf.keras):
weights = [glorot_uniform(seed=random.randint(0, 1000))(w.shape) if w.ndim > 1 else w for w in model.get_weights()]
Note the "if w.ndim > 1 else w". You don't want to re-initialize the biases (they stay 0 or 1).
Use keras.backend.clear_session().
