Why does Tensorflow Reshape tf.reshape() break the flow of gradients?

I create a tf.Variable(), define a simple function of that variable, flatten the original variable with tf.reshape(), and then call tf.gradients() on the function with respect to the flattened variable. Why does that return [None]?
import numpy as np
import tensorflow as tf
var = tf.Variable(np.ones((5,5)), dtype = tf.float32)
f = tf.reduce_sum(tf.reduce_sum(tf.square(var)))
var_f = tf.reshape(var, [-1])
print(tf.gradients(f, var_f))
The code block above returns [None] when executed. Is this a bug?

You are taking the derivative of f with respect to var_f, but f is a function of var, not of var_f. var_f is a separate tensor computed from var, and nothing that f is built from depends on it. That's why you are getting [None]. Now if you change the code to:
var = tf.Variable(np.ones((5,5)), dtype = tf.float32)
var_f = tf.reshape(var, [-1])
f = tf.reduce_sum(tf.reduce_sum(tf.square(var_f)))
grad = tf.gradients(f,var_f)
print(grad)
your gradients will be defined:
[<tf.Tensor 'gradients_28/Square_32_grad/mul_1:0' shape=(25,) dtype=float32>]
The visualization of the graphs for the following code is given below:
var = tf.Variable(np.ones((5,5)), dtype = tf.float32, name='var')
f = tf.reduce_sum(tf.reduce_sum(tf.square(var)), name='f')
var_f = tf.reshape(var, [-1], name='var_f')
grad_1 = tf.gradients(f,var_f, name='grad_1')
grad_2 = tf.gradients(f,var, name='grad_2')
grad_1 is not defined ([None]), while grad_2 is defined.
[Figure: back-propagation (gradient) graphs for grad_1 and grad_2]
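The dependency argument can be illustrated without TensorFlow. The following is a minimal sketch, assuming a toy graph in which each node lists the nodes it is computed from; the node names and the `depends_on` helper are hypothetical and are not TensorFlow's implementation. A gradient of f with respect to some node can only exist if f is reachable from that node.

```python
# Toy dataflow graph: each node maps to the list of nodes it is computed from.
# The names mirror the example above but are purely illustrative.
graph = {
    "var": [],
    "square": ["var"],
    "f": ["square"],
    "var_f": ["var"],  # reshape reads var, but nothing downstream reads var_f
}


def depends_on(graph, target, source):
    """Return True if `target` is computed, directly or indirectly, from `source`."""
    stack = [target]
    seen = set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


print(depends_on(graph, "f", "var"))    # f is built from var: a gradient can be defined
print(depends_on(graph, "f", "var_f"))  # f never reads var_f: this is the [None] case
```

Building f from var_f instead (so that "square" reads "var_f") makes the second check succeed, which matches the corrected code in the answer.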

Related

How to plot keras activation functions in a notebook

I wanted to plot all Keras activation functions, but some of them are not working. For example, linear throws an error:
AttributeError: 'Series' object has no attribute 'eval'
which is weird. How can I plot the rest of my activation functions?
points = 100
zeros = np.zeros((points,1))
df = pd.DataFrame({"activation": np.linspace(-1.2,1.2,points)})
df["softmax"] = K.eval(activations.elu(df["activation"]))
#df["linear"] = K.eval(activations.linear(df["activation"]))
df["tanh"] = K.eval(activations.tanh(df["activation"]))
df["sigmoid"] = K.eval(activations.sigmoid(df["activation"]))
df["relu"] = K.eval(activations.relu(df["activation"]))
#df["hard_sigmoid"] = K.eval(activations.hard_sigmoid(df["activation"]))
#df["exponential"] = K.eval(activations.exponential(df["activation"]))
df["softsign"] = K.eval(activations.softsign(df["activation"]))
df["softplus"] = K.eval(activations.softplus(df["activation"]))
#df["selu"] = K.eval(activations.selu(df["activation"]))
df["elu"] = K.eval(activations.elu(df["activation"]))
df.plot(x="activation", figsize=(15,15))

That's because the linear activation returns the input without any modifications:
def linear(x):
    """Linear (i.e. identity) activation function.
    """
    return x
Since you are passing a Pandas Series as input, the same Pandas Series will be returned and therefore you don't need to use K.eval():
df["linear"] = activations.linear(df["activation"])
As for the selu activation, you need to reshape the input to (n_samples, n_output):
df["selu"] = K.eval(activations.selu(df["activation"].values.reshape(-1,1)))
And as for the hard_sigmoid activation, its input should be explicitly a Tensor which you can create using K.variable():
df["hard_sigmoid"] = K.eval(activations.hard_sigmoid(K.variable(df["activation"].values)))
Further, exponential activation works as you have written and there is no need for modifications.
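To cross-check the plotted curves without a Keras backend, several of these activations can be written with the standard library alone. This is a minimal sketch using the textbook formulas; the function names and the `reference_activations` dictionary are assumptions for illustration and are not the Keras implementations. It also shows why linear needs no evaluation: it hands back the very object it was given.

```python
import math


def linear(x):
    # Identity: returns its argument unchanged, so there is nothing to evaluate.
    return x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def softsign(x):
    return x / (1.0 + abs(x))


def softplus(x):
    return math.log1p(math.exp(x))


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


# Same grid as in the question: 100 points between -1.2 and 1.2.
points = 100
xs = [-1.2 + 2.4 * i / (points - 1) for i in range(points)]

reference_activations = {
    "linear": linear,
    "tanh": math.tanh,
    "sigmoid": sigmoid,
    "relu": relu,
    "softsign": softsign,
    "softplus": softplus,
    "elu": elu,
}

# One list of y-values per activation, ready to compare against the DataFrame columns.
curves = {name: [fn(x) for x in xs] for name, fn in reference_activations.items()}
print(sorted(curves))
```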

How to convert tensorflow variable to numpy array

I am trying to create a model graph whose input is a TensorFlow variable that I feed from my Java program.
My code uses NumPy methods, so I need to convert the TensorFlow input to a NumPy array.
Here is my code snippet:
import tensorflow as tf
import numpy as np
eps = np.finfo(float).eps
EXPORT_DIR = './model'
def standardize(x):
    med0 = np.median(x)
    mad0 = np.median(np.abs(x - med0))
    x1 = (x - med0) / (mad0 + eps)
    return x1
#tensorflow input variable
a = tf.placeholder(tf.float32, name="input")
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    #Converting the input variable to numpy array
    tensor = a.eval()
    #calling standardize method
    numpyArray = standardize(tensor)
    #converting numpy array to tf
    tf.convert_to_tensor(numpyArray)
    #creating graph
    graph = tf.get_default_graph()
    tf.train.write_graph(graph, EXPORT_DIR, 'model_graph.pb', as_text=False)
I am getting this error at the line tensor = a.eval(): InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input' with dtype float
When I use a constant value in place of the placeholder, it works and generates the graph. But I want to feed the input from my Java code.
Is there any way to do that, or do I need to convert all my NumPy methods to TensorFlow methods?

A placeholder is just an empty variable in TensorFlow to which you can feed NumPy values at run time. What you are trying to do does not make sense: you cannot get a value out of a placeholder that has not been fed.
If you want to standardize your tensor, why convert it to a NumPy array first? You can do this directly in TensorFlow.
The following median helper is taken from a Stack Overflow answer:
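def get_median(v):
    v = tf.reshape(v, [-1])
    # top_k sorts in descending order, so the median of n values is the
    # (n // 2 + 1)-th largest; for even n this picks the lower middle value
    # rather than the average that np.median returns.
    m = v.get_shape()[0] // 2 + 1
    return tf.nn.top_k(v, m).values[m - 1]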
Now, you can implement your function as
def standardize(x):
    med0 = get_median(x)
    mad0 = get_median(tf.abs(x - med0))
    x1 = (x - med0)/(mad0 + eps)
    return x1
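To check a graph-based standardize against known-good numbers, a plain-Python reference can help. This is a minimal sketch, assuming the standard library's `statistics.median` as ground truth; `median_via_top_k` is a hypothetical helper that mimics a sort-descending-and-index approach. For an odd number of elements n, the median is the (n // 2 + 1)-th largest value. For even n that index gives the lower of the two middle values, whereas `statistics.median` (like np.median) averages them.

```python
import statistics
import sys

eps = sys.float_info.epsilon  # same value as np.finfo(float).eps


def median_via_top_k(values):
    """Mimic a top_k-style median: the (n // 2 + 1)-th largest element."""
    ordered = sorted(values, reverse=True)
    return ordered[len(ordered) // 2]


def standardize(values):
    """Median/MAD standardization, mirroring the NumPy version in the question."""
    med0 = statistics.median(values)
    mad0 = statistics.median([abs(v - med0) for v in values])
    return [(v - med0) / (mad0 + eps) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(median_via_top_k(data), statistics.median(data))
print(standardize(data))
```

The outlier 100.0 barely affects the median or the MAD, which is the usual reason for preferring this over mean and standard deviation.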

How to implement indicator function in tensorflow?

I want to implement a function like this: f(x) = 1 if x == k, else f(x) = 0 (k is a parameter). So I used tf.equal and tf.cast, and my code looks like this:
import tensorflow as tf
a = range(12)
a = tf.Variable(a)
b = 6
b = tf.Variable(b)
a = tf.reshape(a, [3, 4])
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
c = tf.equal(a, b)
d = tf.cast(c, tf.int32)
print(sess.run(c))
print(sess.run(d))
It seems fine, but the problem is that tf.gradients(d, a) and tf.gradients(d, b) are None. I tried tf.gradients(c, a) and got a TypeError. Is there a decent way to implement this function?

I'm not sure the gradient is even defined here.
The indicator function is f(a,b) = 1 if a=b, 0 otherwise. Away from a=b, this function is constant (zero) and so has zero derivative. At any point where a=b the function is discontinuous, so it doesn't have a derivative there.
More intuitively: derivatives don't exist where you have a 'jump' in your function.
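The same point can be seen numerically. This is a minimal sketch with hypothetical helper names, using finite differences on a scalar indicator: away from k every slope estimate is exactly zero, while at k the one-sided difference quotient equals 1/h and grows without bound as h shrinks, so no finite derivative exists there.

```python
def indicator(x, k):
    return 1.0 if x == k else 0.0


def central_difference(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


def backward_difference(f, x, h):
    return (f(x) - f(x - h)) / h


k = 6.0


def f(x):
    return indicator(x, k)


# Away from k the function is flat, so the slope estimate is zero.
print(central_difference(f, 2.0, 1e-3))

# At k the one-sided quotient is 1/h: it keeps growing as h gets smaller.
for h in (1e-1, 1e-2, 1e-3):
    print(h, backward_difference(f, k, h))
```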

One option is to use the PDF of a normal distribution to approximate the indicator function. I am also new to TensorFlow, so feel free to point out any issues.
##I am using tensorflow2
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import tensorflow_probability as tfp
a = tf.range(12)
a = tf.Variable(a)
b = 6
b = tf.Variable(b)
a = tf.reshape(a, [3, 4])
## Define the PDF of a normal distribution to approximate the indicator function
dist = tfp.distributions.Normal(0., 0.1)
scalar = dist.prob(0) # a normalization constant, since the PDF at zero is not one
## Implement the approximate indicator function
a = tf.cast(a, dtype= tf.float32)
b = tf.cast(b, dtype= tf.float32)
c = dist.prob(a-b)/scalar
#d = tf.cast(c, tf.int32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(c))
## calculate the gradient
c_a = tf.gradients(c, a)
print(sess.run(c_a))
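The normalized normal PDF has a simple closed form, which makes the behaviour of this approximation easy to inspect by hand. This is a minimal sketch in plain Python, assuming the same scale of 0.1 as above; the function names are hypothetical. Since pdf(x) / pdf(0) = exp(-x^2 / (2 * sigma^2)), the derivative with respect to a is -(a - b) / sigma^2 times that value.

```python
import math

SIGMA = 0.1  # same scale as Normal(0., 0.1) in the answer above


def soft_indicator(a, b, sigma=SIGMA):
    """pdf(a - b) / pdf(0) for a zero-mean normal; equals 1 when a == b."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def soft_indicator_grad_a(a, b, sigma=SIGMA):
    """Analytic derivative of soft_indicator with respect to a."""
    return -(a - b) / sigma ** 2 * soft_indicator(a, b, sigma)


b = 6.0
for a in (5.0, 5.9, 6.0, 6.1, 7.0):
    print(a, soft_indicator(a, b), soft_indicator_grad_a(a, b))
```

One caveat worth checking in practice: with a scale this small, integer inputs that differ from b by 1 or more give values and gradients that are negligibly small, and the gradient is exactly zero at a == b, so a larger scale may be needed if the gradient is meant to drive learning.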

Merging tensor rowwise with a vector in keras

I was hoping to implement a variation of PointNet (https://arxiv.org/pdf/1612.00593.pdf) in Keras, but I'm having trouble repeating the context vector (g) a variable number of times so that I can concatenate it row-wise with a previous layer that lacks context (pre). I tried Repeat() and keras.backend.Tile().
input = Input(shape=(None,3))
x = TimeDistributed(Dense(128, activation = 'relu'))(input)
pre = TimeDistributed(Dense(256, activation = 'relu'))(x)
g = GlobalMaxPooling1D()(pre)
x = Lambda(merge_on_single, output_shape=(None,512))([pre,g])
print(x.shape)
This is the lambda definition I came up with.
def merge_on_single(v):
    #v[0] is variable length tensor, v[1] is the single vector
    return Concatenate()([K.repeat(v[1],K.get_variable_shape(v[0])),v[0]])
However the following error occurs:
TypeError: Tensors in list passed to 'values' of 'Pack' Op have types [int32, , int32] that don't all match.
UPDATE:
So I was able to get the layers to not give errors by doing the following:
input = Input(shape=(None,3))
num_point = K.placeholder(input.get_shape()[1].value, dtype=tf.int32)
#first global feature layer
x = TimeDistributed(Dense(512, activation = 'relu'))(input)
x = TimeDistributed(Dense(256, activation = 'relu'))(x)
g = GlobalMaxPooling1D()(x)
g = K.reshape(g,(-1,1,256))
g = K.tile(x, [1,num_point,1])
concat_feat = K.concatenate([x, g])
but now, I get the following error:
AttributeError: 'Tensor' object has no attribute '_keras_history'

I suspect the culprit is K.get_variable_shape(v[0]). Since v[0] is of type int32 (as specified by your error), when you get the shape it returns None. Concatenate wants all inputs to be of the same type.
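Independent of the Keras API, the intended shape arithmetic can be sketched with nested lists. This is only an illustration with hypothetical function names, not a Keras layer: pool over the points axis to get one global vector, then append that vector to every row, so the number of points never has to be known in advance.

```python
def global_max_pool(points):
    """Column-wise maximum over the points axis: (n_points, d) -> (d,)."""
    return [max(column) for column in zip(*points)]


def concat_global_rowwise(points):
    """Append the pooled vector to every row: (n_points, d) -> (n_points, 2 * d)."""
    g = global_max_pool(points)
    return [list(row) + g for row in points]


# Two point clouds with different numbers of points and d = 2 features each.
cloud_a = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]
cloud_b = [[7.0, 1.0]]

for cloud in (cloud_a, cloud_b):
    print(concat_global_rowwise(cloud))
```

In Keras, the equivalent steps would presumably need to live inside a layer such as Lambda, with the repeat count read from the tensor's dynamic shape. Calling backend functions directly on layer outputs, outside any layer, is a common cause of the '_keras_history' error quoted in the question.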

Tensorflow: Incompatible shapes when making a custom activation function?

I am trying to build a neural network using custom activation functions. I followed the solution given here, and it works when the input and output vectors have the same size, but not when using different sizes (like in a pooling function). Here is my problem so far:
I am trying to generalize this to the case when the input and the output have different sizes. In my code the input 'x' is of size (2,4), the output 'y' is of size (1,2), and the activation function MEX(.) does the mapping y = MEX(x). I have computed the gradient of MEX() as d_MEX(), where d_MEX(x) has the same size as 'x', that is (2,4). Nevertheless, I get this error
InvalidArgumentError (see above for traceback): Incompatible shapes: [1,2] vs. [2,4]
Shouldn't the gradient of MEX(x) be of the same size as x? Here is my complete code:
import tensorflow as tf
import numpy as np
# This is our target function
def MEX(x):
    '''
    :param x: is a row vector which is the concatenation of [input, beta]
    :return MEX_{beta}(x): scalar output
    '''
    # lenx = np.size(x) # Number of columns (ROW vector)
    lenx = x.shape[1]
    N = x.shape[0]
    out = np.zeros((1,N))
    for ii in range(N):
        c = x[ii,0:lenx-1]
        beta = x[ii,lenx-1]
        out[0,ii] = 1./beta * np.log( np.mean( np.exp(beta*c) ))
    return np.array(out)
# Now we should write its derivative.
def d_MEX(x):
    # lenx = np.size(x) # Number of
    lenx = x.shape[1]
    N = x.shape[0]
    out = np.zeros((N,lenx))
    for ii in range(N):
        c = x[ii,0:lenx-1]
        beta = x[ii,lenx-1]
        d_beta = np.array([0.])
        d_beta[0] = -1./beta*( MEX(np.array([x[ii,:]])) - np.mean( np.multiply( c, np.exp(beta*c)))/np.mean( np.exp(beta*c)) )
        d_c = 1./lenx*np.exp(beta*c) /np.mean( np.exp(beta*c))
        out[ii,:] = np.concatenate((d_c,d_beta), axis=0)
    return out
# The first step is making it into a numpy function, this is easy:
np_MEX = np.vectorize(MEX, excluded=['x']) # IMPORTANT!! Otherwise np.vectorize() doesn't work
np_d_MEX = np.vectorize(d_MEX, excluded=['x']) # IMPORTANT!! Otherwise np.vectorize() doesn't work
# Now we make a tensorflow function
'''
Making a numpy fct to a tensorflow fct: We will start by making np_d_MEX_32 into a tensorflow function.
There is a function in tensorflow tf.py_func(func, inp, Tout, stateful=stateful, name=name) [doc]
which transforms any numpy function to a tensorflow function, so we can use it:
'''
np_d_MEX_32 = lambda x: np_d_MEX(x=x).astype(np.float32)
def tf_d_MEX(x,name=None):
    with tf.name_scope(name, "d_MEX", [x]) as name:
        y = tf.py_func(np_d_MEX_32,
                       [x],
                       [tf.float32],
                       name=name,
                       stateful=False)
        return y[0]
'''
tf.py_func acts on lists of tensors (and returns a list of tensors), that is why we have [x] (and return y[0]).
The stateful option is to tell tensorflow whether the function always gives the same output for the same input (stateful = False),
in which case tensorflow can simplify the tensorflow graph; this is our case and will probably be the case in most situations.
One thing to be careful of at this point is that numpy uses float64 but tensorflow uses float32, so you need to convert
your function to use float32 before you can convert it to a tensorflow function, otherwise tensorflow will complain.
This is why we need to make np_d_MEX_32 first.
What about the Gradients? The problem with only doing the above is that even though we now have tf_d_MEX which is the
tensorflow version of np_d_MEX, we couldn't use it as an activation function if we wanted to because tensorflow doesn't
know how to calculate the gradients of that function.
Hack to get Gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function
using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc]. Copying the code from harpone we can modify
the tf.py_func function to make it define the gradient at the same time:
'''
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
'''
Now we are almost done, the only thing is that the grad function we need to pass to the above py_func function needs to
take a special form. It needs to take in an operation, and the previous gradients before the operation and propagate
the gradients backward after the operation.
Gradient Function: So for our MEX activation function that is how we would do it:
'''
def MEXgrad(op, grad):
    x = op.inputs[0]
    # x = op
    n_gr = tf_d_MEX(x)
    return grad * n_gr
'''
The activation function has only one input, that is why x = op.inputs[0]. If the operation had many inputs, we would
need to return a tuple, one gradient for each input. For example, if the operation was a - b, the gradient with respect to a
is +1 and with respect to b is -1, so we would have return +1*grad, -1*grad. Notice that we need to return tensorflow
functions of the input, that is why we need tf_d_MEX; np_d_MEX would not have worked because it cannot act on
tensorflow tensors. Alternatively we could have written the derivative using tensorflow functions:
'''
# Combining it all together: Now that we have all the pieces, we can combine them all together:
np_MEX_32 = lambda x: np_MEX(x=x).astype(np.float32)
def tf_MEX(x, name=None):
    with tf.name_scope(name, "MEX",[x]) as name:
        y = py_func(np_MEX_32,
                    [x],
                    [tf.float32],
                    name=name,
                    grad=MEXgrad) # <-- here's the call to the gradient
        return y[0]
with tf.Session() as sess:
    x = tf.constant([[0.2,0.7,1.2,1.7],[0.2,0.7,1.2,1.7]])
    y = tf_MEX(x)
    tf.global_variables_initializer().run()
    print(x.eval(), y.eval(), tf.gradients(y, [x])[0].eval())
In the console, I have checked that the variables have the "correct" shapes:
x.eval()
Out[9]:
array([[ 0.2 , 0.69999999, 1.20000005, 1.70000005],
[ 0.2 , 0.69999999, 1.20000005, 1.70000005]], dtype=float32)
y.eval()
Out[10]: array([[ 0.83393127, 0.83393127]], dtype=float32)
tf_d_MEX(x).eval()
Out[11]:
array([[ 0.0850958 , 0.19909413, 0.46581003, 0.07051659],
[ 0.0850958 , 0.19909413, 0.46581003, 0.07051659]], dtype=float32)
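A hand-written derivative like d_MEX can be sanity-checked against finite differences before it is wired into TensorFlow. This is a minimal sketch in plain Python for a single row [c_1, ..., c_n, beta]; the function names are hypothetical. It normalizes the c-part by the number of c entries (lenx - 1), which is an assumption about the intended formula. With that choice the c-gradients sum to 1, as expected for a log-mean-exp, whereas the first three entries of each row printed above sum to about 0.75.

```python
import math


def mex(row):
    """row = [c_1, ..., c_n, beta]; returns (1/beta) * log(mean(exp(beta * c)))."""
    *c, beta = row
    return math.log(sum(math.exp(beta * ci) for ci in c) / len(c)) / beta


def d_mex(row):
    """Analytic gradient with respect to every entry of row (c first, then beta)."""
    *c, beta = row
    weights = [math.exp(beta * ci) for ci in c]
    mean_w = sum(weights) / len(c)
    # Derivative w.r.t. each c_i: exp(beta * c_i) / (n * mean(exp(beta * c))).
    d_c = [w / (len(c) * mean_w) for w in weights]
    # Derivative w.r.t. beta, same expression as d_beta in the question.
    mean_cw = sum(ci * w for ci, w in zip(c, weights)) / len(c)
    d_beta = -(mex(row) - mean_cw / mean_w) / beta
    return d_c + [d_beta]


def numeric_grad(f, row, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(row)):
        up = list(row)
        down = list(row)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


row = [0.2, 0.7, 1.2, 1.7]
print(mex(row))
print(d_mex(row))
print(numeric_grad(mex, row))
```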

My bad, I just found the mistake.
It's here:
def MEXgrad(op, grad):
    x = op.inputs[0]
    # x = op
    n_gr = tf_d_MEX(x)
    return n_gr
I wonder whether there is a typo in the post I followed, since the same mistake appears there.
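A note on shapes, as a hedged sketch rather than a statement about the original post: the incoming grad has the shape of y, here (1, N), while the local derivative has the shape of x, here (N, lenx), so an element-wise product of the two cannot broadcast. Returning n_gr alone matches the full chain rule only when the incoming gradient is all ones, which is the case when tf.gradients(y, [x]) is called directly with its default. Inside a larger network, each output's incoming gradient would presumably need to scale its own row, as illustrated below with plain lists and a hypothetical helper name.

```python
def backprop_rowwise(upstream, local):
    """Scale row i of `local` (shape (N, L)) by entry i of `upstream` (shape (1, N))."""
    (upstream_row,) = upstream
    if len(upstream_row) != len(local):
        raise ValueError("upstream needs one entry per row of local")
    return [[g * d for d in row] for g, row in zip(upstream_row, local)]


# Made-up local derivatives for N = 2 rows with L = 4 entries each.
local = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8]]

ones = [[1.0, 1.0]]     # default incoming gradient when differentiating y directly
scaled = [[2.0, -1.0]]  # an incoming gradient from some later layer

print(backprop_rowwise(ones, local))    # identical to local
print(backprop_rowwise(scaled, local))  # each row scaled by its own incoming gradient
```

In TensorFlow this would correspond to something like multiplying n_gr by the transposed grad, which is an assumption about the intended shapes and has not been tested against the code above.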
