I'm trying to use a custom loss function. I'm now using TF 2.x where eager execution is turned on by default. I gave this a go with TF 1.x, but ran into too many problems. Is there any alternative to wrapping my function with tf.py_function()? If not, how would I wrap this?
General purpose: an autoencoder with a custom loss function built around unusual ranked differences. For now I'm just using scipy.stats.rankdata, but that will change in the future.
Tensor shape: n, x, x, 1
n images, each of dim x, x.
Therefore, I want to run this custom loss function on each pair of orig, pred for all n images.
General algorithm:
import scipy.stats as ss
def rank_loss(orig, pred):
    orig_arr = orig.numpy()  # want shape (x, x, 1)
    pred_arr = pred.numpy()
    orig_rank = ss.rankdata(orig_arr)  # returns a flat array whose length is the size of the input
    pred_rank = ss.rankdata(pred_arr)
    distance_diff = 0
    for i in range(len(orig_rank)):  # accumulates the sum of rank differences
        distance_diff += abs(orig_rank[i] - pred_rank[i])
    return distance_diff
If I can't do this, am I limited to the available tf.* functions, or how else can I pull the tensor out as some form of array so that I can run comparison computations across the two tensors?
I also looked at tf.make_ndarray, but that doesn't seem applicable.
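For what it's worth, here is a minimal sketch of how the wrapping could look, assuming batched tensors of shape (n, x, x, 1) and rank_loss defined as above (rank_loss_batch is an illustrative name, not an established API). One caveat: gradients will not flow through the scipy call, so this only works where the loss does not need to be differentiated:

import tensorflow as tf

def rank_loss_batch(orig, pred):
    # map the per-image eager loss over the batch dimension
    per_image = tf.map_fn(
        lambda pair: tf.py_function(rank_loss, [pair[0], pair[1]], tf.float64),
        (orig, pred),
        dtype=tf.float64)
    return tf.reduce_mean(per_image)

model.compile(optimizer='adam', loss=rank_loss_batch)  # model assumed to exist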
Related
I am trying to do SVD using a neural network. My input is a matrix (let's say 4x4 matrices only) and the output is a vector representing the decomposed form (given that the input is 4x4 this would be a 36 element vector with 16 elements for U, 4 elements for S, and 16 elements for V.T).
I am trying to define a custom loss function instead of using something like MSE on the decomposed form. So instead of comparing the 36 length vectors for loss, I want to compute the loss between the reconstructed matrices. So if A = U * S * V.T (actual) and A' = U' * S' * V.T' (predicted), I want to compute the loss between A and A'.
I am pretty new to tensorflow and keras, so I may be doing some naive things, but here is what I have so far. While the logic seems okay to me, I get a TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn. I am not sure why this is the case and how to fix it? Also, do I need to flatten the output from the reconstruct_matrix, as I am currently doing, or should I just leave it as is?
import numpy as np
from keras import backend as K

# This function takes the decomposed matrix (vector of U, S, V.T)
# and reconstructs the original matrix
def reconstruct_matrix(decomposed_vector):
    example = decomposed_vector
    s = np.zeros((4, 4))
    for en, i in enumerate(example[16:20]):
        s[en, en] = i
    u = example[:16].reshape(4, 4)
    vt = example[20:].reshape(4, 4)
    orig = np.matmul(u, s)
    orig = np.matmul(orig, vt)
    return orig.flatten()  # Given that matrices are 4x4, this will be a length-16 vector

# Custom loss that essentially computes MSE on reconstructed matrices
def custom_loss(y_true, y_pred):
    Y = reconstruct_matrix(y_true)
    Y_prime = reconstruct_matrix(y_pred)
    return K.mean(K.square(Y - Y_prime))

model.compile(optimizer='adam',
              loss=custom_loss)
Note: My keras version is 2.2.4 and my tensorflow version is 1.14.0.
In tf1.x eager execution is disabled by default (it's on for version 2 onwards).
You have to enable it by calling at the top of your script:
import tensorflow as tf
tf.enable_eager_execution()
This mode allows you to use Python-like abstractions for flow control (e.g. the if statements and for loops you've been using in your code). If it's disabled, you need to use the TensorFlow control-flow ops (tf.cond and tf.while_loop in place of if and for, respectively).
More information about it in the docs.
BTW, I'm not sure about the flatten, but remember that your y_true and y_pred need the same shape and the samples have to correspond to each other; if that's fulfilled you should be fine.
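For completeness, here is a rough sketch of a graph-mode version of the reconstruction that avoids numpy entirely and therefore does not need eager execution (assuming y_true/y_pred arrive batched with shape (batch, 36); reconstruct_matrix_tf is an illustrative name):

import tensorflow as tf
from keras import backend as K

def reconstruct_matrix_tf(decomposed_vector):
    # split the 36-vector into U (16), S (4) and V.T (16) per sample
    u = tf.reshape(decomposed_vector[:, :16], (-1, 4, 4))
    s = tf.linalg.diag(decomposed_vector[:, 16:20])  # (batch, 4, 4) diagonal matrices
    vt = tf.reshape(decomposed_vector[:, 20:], (-1, 4, 4))
    return tf.reshape(tf.matmul(tf.matmul(u, s), vt), (-1, 16))

def custom_loss(y_true, y_pred):
    return K.mean(K.square(reconstruct_matrix_tf(y_true) - reconstruct_matrix_tf(y_pred)))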
Given a TensorFlow tf.while_loop, how can I calculate the gradient of x_out with respect to all weights of the network for each time step?
import tensorflow as tf

network_input = tf.placeholder(tf.float32, [None])
steps = tf.constant(0.0)
weight_0 = tf.Variable(1.0)
layer_1 = network_input * weight_0

def condition(steps, x):
    return steps <= 5

def loop(steps, x_in):
    weight_1 = tf.Variable(1.0)
    x_out = x_in * weight_1
    steps += 1
    return [steps, x_out]

_, x_final = tf.while_loop(
    condition,
    loop,
    [steps, layer_1]
)
Some notes
In my network the condition is dynamic. Different runs are going to run the while loop a different amount of times.
Calling tf.gradients(x, tf.trainable_variables()) crashes with AttributeError: 'WhileContext' object has no attribute 'pred'. It seems like the only possibility to use tf.gradients within the loop is to calculate the gradient with respect to weight_1 and the current value of x_in / time step only without backpropagating through time.
In each time step, the network is going to output a probability distribution over actions. The gradients are then needed for a policy gradient implementation.
You can't call tf.gradients inside tf.while_loop in Tensorflow, based on this and this; I found this out the hard way when I was trying to build conjugate gradient descent entirely inside the Tensorflow graph.
But if I understand your model correctly, you could make your own version of an RNNCell and wrap it in tf.nn.dynamic_rnn, although the actual cell implementation will be a little complex since you need to evaluate a condition dynamically at runtime.
For starters, you can take a look at Tensorflow's dynamic_rnn code here.
Alternatively, dynamic graphs have never been Tensorflow's strong suit, so consider using other frameworks like PyTorch, or you can try out eager execution and see if that helps.
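As a rough illustration (not a drop-in solution), a cell mirroring the loop body above might look like the sketch below; MulCell is a made-up name, and your dynamic stopping condition would still need to be handled, e.g. via dynamic_rnn's sequence_length argument:

import tensorflow as tf

class MulCell(tf.nn.rnn_cell.RNNCell):
    # toy cell: multiplies the carried state by a learned weight each step,
    # mirroring the body of the while loop above
    @property
    def state_size(self):
        return 1

    @property
    def output_size(self):
        return 1

    def build(self, input_shape):
        self.weight_1 = self.add_weight("weight_1", shape=[])
        self.built = True

    def call(self, inputs, state):
        x_out = state * self.weight_1
        return x_out, x_out

cell = MulCell()
inputs = tf.zeros([1, 5, 1])                # dummy inputs, one per time step
state_0 = tf.reshape(layer_1[:1], [1, 1])   # initial state taken from the first layer
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, initial_state=state_0)
grads = tf.gradients(outputs, tf.trainable_variables())  # backprop through time works here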
I am attempting to gather the indices of specific tensors/(vectors/matrices) within a tensor in keras. Therefore, I attempted to use tf.gather with tf.where to get the indices to use in the gather function.
However, tf.where provides element-wise indices for the matching values when testing for equality. I would like the ability to find the indices (rows) of the tensors (vectors) which are equal to another vector.
This is especially useful for finding the one-hot vectors within a tensor which match a set of one-hot vectors of interest.
I have some code to illustrate the shortcoming so far:
# standard
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import LabelBinarizer
sess = tf.Session()
# one-hot vector encoding labels
l = LabelBinarizer()
l.fit(['a','b','c'])
# input tensor
t = tf.constant(l.transform(['a','a','c','b', 'a']))
# find the indices where 'c' is label
# ***THIS WORKS***
np.all(t.eval(session = sess) == l.transform(['c']), axis = 1)
# We need to do everything in tensorflow and then wrap in Lambda layer for keras so...
from keras import backend as K
# ***THIS DOES NOT WORK***
K.all(t.eval(session = sess) == l.transform(['c']), axis = 1)
# go on from here to get a smaller subset of vectors from another tensor, passing the indices to `tf.gather`
The code above shows that I have tried to get this axis-wise condition to work; it works fine in numpy, but the tensorflow version is not as easily ported from numpy.
Is there a better way to do this?
Similarly to what you do, we can use tf.reduce_all which is the tensorflow equivalent of np.all:
tf.reduce_all(t.eval(session = sess) == l.transform(['c']), axis = 1)
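To keep everything inside the graph (no .eval()) and get the row indices for tf.gather, one possible sketch looks like this (other_tensor here stands in for whatever tensor you want to subset):

target = tf.constant(l.transform(['c']), dtype=t.dtype)
row_match = tf.reduce_all(tf.equal(t, target), axis=1)  # boolean per row
indices = tf.where(row_match)[:, 0]                     # row indices of matches
subset = tf.gather(other_tensor, indices)               # gather the matching rows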
I am using a function consisting of compound Tensorflow operations. However, instead of letting Tensorflow automatically compute its derivatives with respect to one of the inputs, I would like to replace the gradients with a different computation on the same input. Moreover, some of the calculation is shared between the forward and backward pass. For example:
def func(in1, in2):
    # do something with inputs using only tf operations
    shared_rep = tf.op1(tf.op2(tf.op3(in1, in2)))  # same computation for both forward and gradient pass
    # return output of forward computation
    return tf.op4(shared_rep)

def func_grad(in1, in2):
    shared_rep = tf.op1(tf.op2(tf.op3(in1, in2)))
    # explicitly calculate gradients with respect to in1, with the intention
    # of replacing the gradients computed by Tensorflow
    mygrad1 = tf.op5(tf.op6(shared_rep))
    return mygrad1

in1 = tf.Variable([1, 2, 3])
in2 = tf.Variable([2.5, 0.01])
func_val = func(in1, in2)
my_grad1 = func_grad(in1, in2)
tf_grad1 = tf.gradients(func_val, in1)

with tf.Session() as sess:
    # would like tf_grad1 to equal my_grad1
    val, my1, tf1 = sess.run([func_val, my_grad1, tf_grad1])
    tf.assert_equal(my1, tf1)
NOTE: This is similar to question How to replace or modify gradient? with one key difference: I am not interested in Tensorflow computing gradients of a different function in the backward pass; rather I would like to supply the gradients myself based on alternate tensorflow operations on the input.
I am trying to use the ideas proposed in the solution to the above question and in the following post, that is using tf.RegisterGradient and gradient_override_map to override the gradient of the identity function wrapping the forward function.
This fails because inside the registered alternate grad for identity, I have no access to the input to func_grad:
@tf.RegisterGradient("CustomGrad")
def alternate_identity_grad(op, grad):
    # op.inputs[0] is the output of func(in1, in2)
    # grad is of no use, because I would like to replace it with func_grad(in1, in2)
    ...

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    out_grad = tf.identity(input, name="Identity")
EDIT After additional research, I believe this question is similar to the following question. I managed to obtain the desired solution by combining gradient_override_map with the hack suggested here.
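For future readers, here is a sketch of the closure-based variant that the EDIT alludes to (my reconstruction, not necessarily the exact hack): build the replacement gradient tensor first and capture it in the registered gradient function, then wrap the input rather than the output in the overridden identity, so the inputs are accessible.

my_grad1 = func_grad(in1, in2)

@tf.RegisterGradient("CustomGrad")
def alternate_identity_grad(op, grad):
    # ignore the upstream grad and substitute the precomputed tensor
    return my_grad1

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    in1_wrapped = tf.identity(in1)

func_val = func(in1_wrapped, in2)
tf_grad1 = tf.gradients(func_val, in1)  # now evaluates to my_grad1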
I want to take a closer look at the Jacobians of each layer in a fully connected neural network, i.e. ∂y/∂x where x is the input vector (activations previous layer) to the layer and y is the output vector (activations this layer) of that layer.
In an online learning scheme, this could be easily done as follows:
import theano
import theano.tensor as T
import numpy as np
x = T.vector('x')
w = theano.shared(np.random.randn(10, 5))
y = T.tanh(T.dot(w, x))
# computation of Jacobian
j = T.jacobian(y, x)
When learning on batches, you need an additional scan to get the Jacobian for each sample:
x = T.matrix('x')
...
# computation of Jacobian
j, updates = theano.scan(lambda i, a, b: T.jacobian(b[i], a)[:, i],
                         sequences=T.arange(y.shape[0]),
                         non_sequences=[x, y])
This works perfectly well for toy examples, but when learning a network with multiple layers with 1000 hidden units and for thousands of samples, this approach leads to a massive slowdown of the computations. (The idea behind indexing the result of the Jacobian can be found in this question)
The thing is that I believe there is no need for this explicit Jacobian computation when we are already computing the derivative of the loss. After all, the gradient of the loss with regard to e.g. the inputs of the network, can be decomposed as
∂L(y, y_L)/∂x = ∂L(y, y_L)/∂y_L · ∂y_L/∂y_{L-1} · ∂y_{L-1}/∂y_{L-2} · … · ∂y_2/∂y_1 · ∂y_1/∂x
i.e. the gradient of the loss w.r.t. x is the product of derivatives of each layer (L would be the number of layers here).
My question is thus whether (and how) it is possible to avoid the extra computation and use the decomposition discussed above. I assume it should be possible, because automatic differentiation is practically an application of the chain rule (for as far I understood it). However, I don't seem to find anything that could back this idea. Any suggestions, hints or pointers?
T.jacobian is very inefficient because it uses scan internally. If you plan to multiply the Jacobian matrix with something, you should use T.Lop or T.Rop for left / right multiplication respectively. Currently a "smart" Jacobian does not exist in Theano's gradient module; you have to hand-craft it if you want an optimized Jacobian.
Instead of using theano.scan, use a batched Op such as T.batched_dot when possible. theano.scan will always result in a CPU loop.
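A small sketch of the Jacobian-free products, following the layer setup from the question (v and u are arbitrary vectors of the matching sizes):

import theano
import theano.tensor as T
import numpy as np

x = T.vector('x')
w = theano.shared(np.random.randn(10, 5))
y = T.tanh(T.dot(w, x))

v = T.vector('v')      # length 10, like y
vJ = T.Lop(y, x, v)    # v^T * (dy/dx) without ever forming the Jacobian

u = T.vector('u')      # length 5, like x
Ju = T.Rop(y, x, u)    # (dy/dx) * u, likewise Jacobian-free

f = theano.function([x, v], vJ)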