I'm trying to solve the 2D Darcy equation in its mixed formulation. Suppose I have a target vector and a source vector as follows:
u = [u1, u2, p]
x = [x, y]
grad(u, x) =
[du1/dx, du2/dx, dp/dx;
 du1/dy, du2/dy, dp/dy]
I don't understand whether this is what happens if I do tf.gradients(u, x).
tf.gradients(u, x) doesn't return what you want. From https://www.tensorflow.org/api_docs/python/tf/gradients:

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs.
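You can see that summing behavior directly with the TF2 GradientTape, whose gradient method (unlike its jacobian method) also sums over the components of the target. A minimal sketch:

import tensorflow as tf

x = tf.constant([3.0, 4.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    u = tf.stack([x[0]**2 + x[1]**2, x[0]**2, x[1]**3])

# Like tf.gradients, this sums du_i/dx over all outputs u_i:
print(tape.gradient(u, x))  # [12. 56.], i.e. the column sums of the Jacobian below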
Here is how you can get the Jacobian instead:
import tensorflow as tf

x = tf.constant([3.0, 4.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    u1 = x[0]**2 + x[1]**2
    u2 = x[0]**2
    u3 = x[1]**3
    u = tf.stack([u1, u2, u3])

J = tape.jacobian(u, x)
print(J)
'''
tf.Tensor(
[[ 6. 8.]
[ 6. 0.]
[ 0. 48.]], shape=(3, 2), dtype=float32)
'''
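If you evaluate u at a whole batch of (x, y) collocation points, as is typical for this kind of PDE problem, tape.batch_jacobian gives one derivative matrix per point, without the cross-point zero blocks that tape.jacobian would produce. A minimal sketch, assuming a small hypothetical network net that maps (N, 2) points to (N, 3) outputs [u1, u2, p]:

import tensorflow as tf

# Hypothetical small network mapping (x, y) -> (u1, u2, p); yours will differ
net = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(3),
])

xy = tf.random.normal((5, 2))  # batch of 5 collocation points (x, y)
with tf.GradientTape() as tape:
    tape.watch(xy)
    u = net(xy)  # shape (5, 3): [u1, u2, p] at each point

J = tape.batch_jacobian(u, xy)  # shape (5, 3, 2): J[n, i, j] = du_i/dx_j at point n
print(J.shape)  # each J[n] is the transpose of the 2x3 layout written in the question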
Can someone please explain the dimensionality logic of input X and class Y for the sparse_categorical_crossentropy loss function?
I checked both the Keras and TF2 docs and examples, plus this post: Cross Entropy vs Sparse, but one point is still not clear to me.
Does the Y vector need to be expanded to the same number of columns as the number of classes the model outputs (if I use a softmax output), or does Keras expand Y automatically?
In my case, the input images are 32x32 and Y is a number between 0 and 10, so the input is (batch_size, h, w) and Y is (batch_size, one integer from 0 to 10):
X = (73257, 32, 32)
Y = (73257, 1)
model.fit(X, Y, epochs=30, validation_split=0.10, batch_size=1, verbose=True)
The model itself is just a Sequential stack of Dense layers with a softmax output.
model = Sequential()
model.add(Dense(32, activation='relu',
                input_shape=input_shape,
                kernel_initializer='he_uniform',
                bias_initializer='ones'))
# bunch of Dense layers, then the softmax output
model.add(Dense(10, activation='softmax'))
The error is about dimensionality:
ValueError: Shape mismatch: The shape of labels (received (1, 1)) should equal the shape of logits except for the last dimension (received (1, 32, 10)).
Thank you.
As mentioned in that post, both categorical cross-entropy (cce) and sparse categorical cross-entropy (scc) compute the same loss; they differ only in the expected format of the true label Y. Simply put: if Y is an integer, you use scc, whereas if Y is one-hot encoded, you use cce. So for scc the ground truth Y is typically 1D, whereas for cce it is typically 2D. For the ground truth:
- (num_of_samples, n_class_one_hot_encode) <- for cce (2D)
- (num_of_samples, n_class_int) <- for scc (1D)
For example, if we use the cifar10 data set, we can do
import tensorflow as tf

(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data()

# train set / data
x_train = x_train.astype('float32') / 255

# train set / target, in both formats
sparse = y_train
onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)

print(sparse[:5])  # <--- (num_of_samples, n_class_int)
print(onehot[:5])  # <--- (num_of_samples, n_class_one_hot_encode)
[[6]
[9]
[9]
[4]
[1]]
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
Now, let's define a simple model, train it with both label formats, and see what happens.
def net():
    input = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(input)
    x = tf.keras.layers.MaxPooling2D(3)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)
    x = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(input, x)
    return model
Using cce
model = net()
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')

his = model.train_on_batch(x_train, onehot, return_dict=True)
print(his)
{'loss': 2.376708984375, 'accuracy': 0.09651999920606613}
one_hot_pred = model.predict(x_train)
print(onehot[0])
print(one_hot_pred[0])
print(onehot[0].shape)
print(one_hot_pred[0].shape)
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0.1516315 0.1151238 0.11732318 0.10644271 0.08946694 0.1398355
0.05046898 0.04249624 0.11813554 0.06907552]
(10,)
(10,)
Now, using scc
model = net()
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics='accuracy',
    optimizer='adam')

his = model.train_on_batch(x_train, sparse, return_dict=True)
print(his)
{'loss': 2.331458806991577, 'accuracy': 0.10066000372171402}
sparse_pred = model.predict(x_train)
print(sparse[0])
print(sparse_pred[0])
print(sparse[0].shape)
print(sparse_pred[0].shape)
[6]
[0.07184976 0.08837385 0.06910037 0.12347631 0.09542189 0.09981853
0.11247937 0.06707954 0.14902702 0.12337337]
(1,)
(10,)
Observe that the gt and pred shapes for scc are (1,) and (10,). In this case, the loss computes the logarithm only at the output index that the ground truth points to. For example, the gt here is 6, so from pred the loss will compute only the logarithm of pred[6]. Here are some more details.
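You can verify that claim numerically; a small check, assuming the predictions are already probabilities that sum to one:

import tensorflow as tf

pred = tf.constant([[0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.5, 0.05, 0.1, 0.05]])
y_true = tf.constant([6])

scc = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(scc(y_true, pred)))         # ~0.6931
print(float(-tf.math.log(pred[0, 6])))  # -log(0.5) = 0.6931, the same value

Incidentally, the ValueError in the question comes from the model rather than the loss: Dense layers applied to (32, 32) inputs produce (batch, 32, 10) logits, so adding a tf.keras.layers.Flatten() before the Dense stack (or reshaping X to (num_samples, 1024)) makes the label and logit shapes match.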
It's easy to make a whole tensor untrainable with trainable=False, but can I make only part of a tensor untrainable?
Suppose I have a 2x2 tensor and I want one element untrainable and the other three trainable, like this (I want the (1,1) element to always be zero, and the other three elements updated by the optimizer):
untrainable trainable
trainable trainable
Thanks.
Short answer: you can't.
Longer answer: you can mimic that effect by setting part of the gradient to zero after the computation of the gradient so that part of the variable is never updated.
Here is an example:
import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,), name="first"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
X = tf.random.normal((1000, 2))
y = tf.reduce_sum(X, axis=1)
ds = tf.data.Dataset.from_tensor_slices((X, y))
In this example, the first layer has the following weight W:
>>> model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[ 0.13573623, -0.68269 ],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
We then write a custom loop that only updates the first row of that weight W:
loss = tf.losses.MSE
opt = tf.optimizers.SGD(1.)  # high learning rate to see the change

for xx, yy in ds.batch(1).take(1):  # batch so the model sees rank-2 input
    with tf.GradientTape() as tape:
        l = loss(yy, model(xx))
    g = tape.gradient(l, model.get_layer("first").trainable_weights[0])
    gradient_slice = g[:1]  # keep the gradient of the first row
    new_grad = tf.concat([gradient_slice, tf.zeros((1, 2), dtype=tf.float32)], axis=0)  # replace the second row with zeros
    opt.apply_gradients(zip([new_grad], [model.get_layer("first").trainable_weights[0]]))
And then, after running that loop, we can inspect the weights again:
model.get_layer("first").trainable_weights[0]
<tf.Variable 'first/kernel:0' shape=(2, 2) dtype=float32, numpy=
array([[-0.08515069, -0.51738167],
[ 0.8938798 , 0.6792033 ]], dtype=float32)>
And only the first row changed.
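For the exact case in the question (freezing a single entry of a 2x2 variable), the same trick works with a binary mask on the gradient. A minimal sketch, assuming a raw tf.Variable instead of a layer weight:

import tensorflow as tf

w = tf.Variable([[0.0, 1.0], [2.0, 3.0]])     # w[0, 0] should stay fixed at zero
mask = tf.constant([[0.0, 1.0], [1.0, 1.0]])  # zero where the entry is frozen
opt = tf.optimizers.SGD(0.1)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)
g = tape.gradient(loss, w)
opt.apply_gradients([(g * mask, w)])  # the masked entry receives a zero update
print(w.numpy())                      # w[0, 0] is unchanged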
I want to calculate a Jacobian matrix with TensorFlow.
What I have:
def compute_grads(fn, vars, data_num):
    grads = []
    for n in range(0, data_num):
        for v in vars:
            grads.append(tf.gradients(tf.slice(fn, [n, 0], [1, 1]), v)[0])
    return tf.reshape(tf.stack(grads), shape=[data_num, -1])
fn is a loss function, vars are all the trainable variables, and data_num is the number of data points.
But if we increase the number of data points, compute_grads takes a tremendous amount of time to run.
Any ideas?
Assuming that X and Y are TensorFlow tensors and that Y depends on X:
from tensorflow.python.ops.parallel_for.gradients import jacobian
J = jacobian(Y, X)
The result has the shape Y.shape + X.shape and provides the partial derivative of each element of Y with respect to each element of X.
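For example, a quick graph-mode sketch (TF 1.x assumed, since parallel_for.gradients predates eager execution):

import tensorflow as tf
from tensorflow.python.ops.parallel_for.gradients import jacobian

X = tf.constant([1.0, 2.0, 3.0])
Y = X ** 2
J = jacobian(Y, X)  # shape Y.shape + X.shape = (3, 3), J[i, j] = dY[i]/dX[j]

with tf.Session() as sess:
    print(sess.run(J))  # [[2. 0. 0.], [0. 4. 0.], [0. 0. 6.]]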
Assuming you are using TensorFlow 2, or TensorFlow 1.x with eager mode enabled, you can use GradientTape and its built-in jacobian method:
with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)
    y = x * x
jacobian = g.jacobian(y, x)
# jacobian value is [[2., 0.], [0., 4.]]
Check the official documentation for more details.
I want to make a neural network layer similar to Conv2D using TensorFlow.
Below is what I want to implement: a layer that uses a kernel just like a convolution layer, but whose output is larger than its input.
[Image: the layer I want to implement]
However, it seems there is no way I can implement it using only TensorFlow operations.
I managed to implement the code below by converting TensorFlow tensors to NumPy arrays, but I still have no idea how to merge the 4D output array into a 2D array.
input = [[a, b],
         [c, d]]
kernel = [[1, -1],
          [2, 1]]
output = [[input[0][0] * kernel, input[0][1] * kernel],
          [input[1][0] * kernel, input[1][1] * kernel]]
# since "input[0][0] * kernel" is 2D, "output" becomes a 4D array
Is there any way I can implement this using only tensorflow?
If not, what method should I use instead?
import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    def __init__(self, kernel):
        super(MyLayer, self).__init__()
        self.k = tf.constant(kernel)

    def build(self, input_shape):
        self.i = input_shape

    def call(self, input):
        # flatten the input, then multiply each scalar with the kernel
        x = tf.reshape(input, [-1])
        return tf.map_fn(lambda s: tf.scalar_mul(s, self.k), x)

mylayer = MyLayer([[1.0, -1.0], [-1.0, 1.0]])
x = tf.constant([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
r = mylayer(x)  # r was missing in the original snippet

with tf.Session() as sess:
    print(sess.run(r))
Output:
[[[ 1. -1.]
[-1. 1.]]
[[ 2. -2.]
[-2. 2.]]
[[ 3. -3.]
[-3. 3.]]
[[ 3. -3.]
[-3. 3.]]
[[ 4. -4.]
[-4. 4.]]
[[ 5. -5.]
[-5. 5.]]]
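For completeness: the operation in the question is exactly the Kronecker product of the input with the kernel, so the 4D block array can be merged into the 2D layout with one transpose and one reshape, using nothing but TensorFlow ops. A sketch (TF2 eager assumed; the helper name kron2d is mine, not a TensorFlow API):

import tensorflow as tf

def kron2d(a, k):
    """Kronecker product of 2D tensors: out[i*kh + p, j*kw + q] = a[i, j] * k[p, q]."""
    ah, aw = a.shape
    kh, kw = k.shape
    # (ah, aw, 1, 1) * (1, 1, kh, kw) broadcasts to 4D blocks of shape (ah, aw, kh, kw)
    blocks = a[:, :, None, None] * k[None, None, :, :]
    # interleave the block axes, then flatten to the merged 2D output (ah*kh, aw*kw)
    return tf.reshape(tf.transpose(blocks, [0, 2, 1, 3]), (ah * kh, aw * kw))

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
k = tf.constant([[1.0, -1.0], [2.0, 1.0]])
print(kron2d(a, k))
# [[ 1. -1.  2. -2.]
#  [ 2.  1.  4.  2.]
#  [ 3. -3.  4. -4.]
#  [ 6.  3.  8.  4.]]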
My task is to predict the five most probable tags for a sentence. Right now I've got unscaled logits from the output (dense) layer:
with tf.name_scope("output"):
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    predictions = tf.nn.top_k(self.scores, 5)  # should be the k highest scores
with tf.name_scope("accuracy"):
    labels = input_y  # its shape is (batch_size, num_classes)
    # calculate the top-k accuracy
Now predictions looks like [3,1,2,50,12] (3, 1, ... are the indexes of the highest scores), while labels are in "multi-hot" form: [0,1,0,1,1,0,...].
In Python, I can simply write:
correct_preds = [input_y[i] == 1 for i in predictions]
weighted = np.dot(correct_preds, [5, 4, 3, 2, 1])  # weighted by rank
recall = sum(correct_preds) / sum(input_y)
precision = sum(correct_preds) / len(correct_preds)
But in TensorFlow, what form should I use to accomplish this task?
Solution
I've coded up an example of how to do the calculations. All of the inputs in this example are coded as tf.constant, but of course you can substitute your own variables.
The main trick is the two matrix multiplications. The first is input_y, reshaped to 2D, times a [1x5] ones matrix called to_top5. The second is correct_preds times the weighted_matrix.
Code
import tensorflow as tf

input_y = tf.constant([5, 2, 9, 1], dtype=tf.int32)
predictions = tf.constant([[9, 3, 5, 2, 1], [8, 9, 0, 6, 5], [1, 9, 3, 4, 5], [1, 2, 3, 4, 5]])

to_top5 = tf.constant([[1, 1, 1, 1, 1]], dtype=tf.int32)
input_y_for_top5 = tf.matmul(tf.reshape(input_y, [-1, 1]), to_top5)
correct_preds = tf.cast(tf.equal(input_y_for_top5, predictions), dtype=tf.float16)

weighted_matrix = tf.constant([[5.], [4.], [3.], [2.], [1.]], dtype=tf.float16)
weighted = tf.matmul(correct_preds, weighted_matrix)

recall = tf.reduce_sum(correct_preds) / tf.cast(tf.reduce_sum(input_y), tf.float16)
precision = tf.reduce_sum(correct_preds) / tf.constant(5.0, dtype=tf.float16)
# Run the graph and print the results
with tf.Session() as sess:
    print("\n\n=============\n\n")
    print("\ninput_y_for_top5")
    print(sess.run(input_y_for_top5))
    print("\ncorrect_preds")
    print(sess.run(correct_preds))
    print("\nweighted")
    print(sess.run(weighted))
    print("\nrecall")
    print(sess.run(recall))
    print("\nprecision")
    print(sess.run(precision))
    print("\n\n=============\n\n")
Output
=============
input_y_for_top5
[[5 5 5 5 5]
[2 2 2 2 2]
[9 9 9 9 9]
[1 1 1 1 1]]
correct_preds
[[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]]
weighted
[[ 3.]
[ 0.]
[ 4.]
[ 5.]]
recall
0.17651
precision
0.6001
=============
Summary
The above example uses a batch size of 4.
The first sample has a y_label of 5, which means that the element with index 5 is the correct label for that sample. Furthermore, the prediction for the first sample is [9,3,5,2,1], which means the model considers element 9 the most likely, element 3 the next most likely, and so on.
If we want an example with a batch size of 3, we can use the following code instead:
input_y = tf.constant([5, 2, 9], dtype=tf.int32)
predictions = tf.constant([[9, 3, 5, 2, 1], [8, 9, 0, 6, 5], [1, 9, 3, 4, 5]])
If we substitute these lines into the program above, we can see that it indeed calculates everything correctly for a batch size of 3.
Inspired by @wontonimo's answer above, I implemented a method using matrix ops, tf.reshape and tf.gather. The label tensor is "multi-hot", e.g. [[0,1,0,1],[1,0,0,1]]. The prediction tensor is obtained from tf.nn.top_k and looks like [[3,1],[0,1]]. Here is the code:
top_k_pred = tf.nn.top_k(logits, 5)
tmp1 = tf.reshape(tf.range(batch_size) * num_classes, (-1, 1))
idx_incre = top_k_pred[1] + tf.concat([tmp1] * 5, 1)  # offset each row into the flattened labels
correct_preds = tf.gather(tf.reshape(y_label, (-1,)), tf.reshape(idx_incre, (-1,)))
correct_preds = tf.reshape(correct_preds, (batch_size, 5))
weighted = tf.cast(correct_preds, tf.float32) * [[5., 4., 3., 2., 1.]]  # weighted by rank
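To finish the metric the way the question's NumPy snippet does, recall and precision follow from correct_preds with two reductions. A self-contained sketch of the same indexing trick, using k = 2 and a tiny batch so the numbers are easy to check by hand (all names here are illustrative):

import tensorflow as tf

batch_size, num_classes, k = 2, 4, 2
y_label = tf.constant([[0., 1., 0., 1.], [1., 0., 0., 1.]])  # multi-hot labels
top_idx = tf.constant([[3, 1], [0, 1]])                      # e.g. tf.nn.top_k(logits, k)[1]

tmp1 = tf.reshape(tf.range(batch_size) * num_classes, (-1, 1))
idx_incre = top_idx + tf.concat([tmp1] * k, 1)               # flat indices into y_label
correct_preds = tf.gather(tf.reshape(y_label, (-1,)), tf.reshape(idx_incre, (-1,)))
correct_preds = tf.reshape(correct_preds, (batch_size, k))   # [[1., 1.], [1., 0.]]

recall = tf.reduce_sum(correct_preds) / tf.reduce_sum(y_label)                          # 3/4 = 0.75
precision = tf.reduce_sum(correct_preds) / tf.cast(tf.size(correct_preds), tf.float32)  # 3/4 = 0.75
print(float(recall), float(precision))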