TensorFlow 2 gradient gives NaN results for pow - Python

The following simplified code outputs NaN for derivatives when x=0. I'm running TensorFlow 2.0.0.
import tensorflow as tf

x = tf.Variable([[-1.0], [0.0], [1.0]])
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    # case 1: y = x^4
    # y = tf.reduce_sum(tf.pow(x, 4), axis=1)  # gives nan only for the 5th derivative at x=0
    # case 2: y = x + x^2 + x^3 + x^4
    y = tf.reduce_sum(tf.pow(x, [[1, 2, 3, 4]]), axis=1)  # gives nan for 2nd to 5th derivatives at x=0
    dy_dx = t.gradient(y, x)
    d2y_dx2 = t.gradient(dy_dx, x)
    d3y_dx3 = t.gradient(d2y_dx2, x)
    d4y_dx4 = t.gradient(d3y_dx3, x)
    d5y_dx5 = t.gradient(d4y_dx4, x)
del t
tf.print(y)
tf.print(tf.transpose(dy_dx))  # transpose only to fit on one line when printed
tf.print(tf.transpose(d2y_dx2))
tf.print(tf.transpose(d3y_dx3))
tf.print(tf.transpose(d4y_dx4))
tf.print(tf.transpose(d5y_dx5))
This outputs correct values except when x=0:
[0 0 4]
[[-2 1 10]]
[[8 -nan(ind) 20]]
[[-18 -nan(ind) 30]]
[[24 -nan(ind) 24]]
[[0 -nan(ind) 0]]
If you run the tf.pow(x, 4) case instead, the nan only shows up for the 5th derivative:
[1 0 1]
[[-4 0 4]]
[[12 0 12]]
[[-24 0 24]]
[[24 24 24]]
[[-0 -nan(ind) 0]]
So my questions are:
The TensorFlow documentation doesn't explicitly say that tf.pow supports two arguments of different shapes (i.e. broadcasting), but the first output y is correct. Does anyone have experience with this? I'm expecting a matrix of all 3 input x values raised to all 4 powers.
Is the NaN value returned from the gradient a bug I should report? I did find this earlier, possibly related issue, but it was fixed: https://github.com/tensorflow/tfjs/issues/346
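Not an answer from the thread, but a workaround sketch worth noting: the NaNs appear to come from how tf.pow's gradient is expressed (repeated differentiation eventually evaluates x ** (negative power) at x = 0, and 0 * inf gives NaN). Building the polynomial from explicit multiplications keeps the derivatives finite. The sketch below stops at the 4th derivative, because that result is a true constant and a 5th t.gradient call would return None unless unconnected_gradients=tf.UnconnectedGradients.ZERO is passed:
import tensorflow as tf

x = tf.Variable([[-1.0], [0.0], [1.0]])
with tf.GradientTape(persistent=True) as t:
    # x + x^2 + x^3 + x^4 built from multiplications instead of tf.pow;
    # product-rule gradients never produce x ** (negative power).
    x2 = x * x
    y = tf.reduce_sum(x + x2 + x2 * x + x2 * x2, axis=1)
    dy_dx = t.gradient(y, x)
    d2y_dx2 = t.gradient(dy_dx, x)
    d3y_dx3 = t.gradient(d2y_dx2, x)
    d4y_dx4 = t.gradient(d3y_dx3, x)
del t
tf.print(tf.transpose(d2y_dx2))  # [[8 2 20]] -- finite at x = 0
tf.print(tf.transpose(d3y_dx3))  # [[-18 6 30]]
tf.print(tf.transpose(d4y_dx4))  # [[24 24 24]]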

Related

Numpy Covariance

When zero mean has been applied to the NumPy matrix, is there a difference expected between the following two pieces of code? I was taking Andrew Ng's ML course, and he suggested using X @ X^T to find the covariance matrix (given that zero mean has been applied). When I visually examined the matrices, I found that it gives a different result from the np.cov function. Please help.
import numpy as np

X = np.random.randint(0, 9, (3, 3))
m = X.shape[0]  # number of observations
print(X)
[[2 1 5]
 [7 4 8]
 [4 7 6]]
X = X - X.mean(axis=0)  # <- zero mean
print(X)
[[-2.33333333 -3.         -1.33333333]
 [ 2.66666667  0.          1.66666667]
 [-0.33333333  3.         -0.33333333]]
cov1 = (X @ X.T) / m  # <- find covariance manually, as suggested in the course
print(cov1)
[[ 5.40740741 -2.81481481 -2.59259259]
 [-2.81481481  3.2962963  -0.48148148]
 [-2.59259259 -0.48148148  3.07407407]]
cov2 = np.cov(X, bias=True)  # <- find covariance with np.cov
print(cov2)
[[ 0.7037037   0.59259259 -1.2962963 ]
 [ 0.59259259  1.81481481 -2.40740741]
 [-1.2962963  -2.40740741  3.7037037 ]]
If your observations are in rows and variables are in columns (set rowvar to False), then it must be x.T @ x:
import numpy as np

x0 = np.array([[2, 1, 5], [7, 4, 8], [4, 7, 6]])
x = x0 - x0.mean(axis=0)
cov1 = x.T @ x / 3
cov2 = np.cov(x, rowvar=False, bias=True)
assert np.allclose(cov1, cov2)
x @ x.T is for the case when the observations are in columns and the variables are in the rows:
x = x0 - x0.mean(axis=1)[:, None]
cov1 = x @ x.T / 3
cov2 = np.cov(x, bias=True)  # rowvar=True by default
assert np.allclose(cov1, cov2)
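To connect this back to the question's own numbers, here is a quick check (my own sketch, not part of the original answer): since the asker centered along axis=0, the columns are the variables, so the manual formula that matches np.cov is X.T @ X / m:
import numpy as np

X = np.array([[2, 1, 5], [7, 4, 8], [4, 7, 6]])
m = X.shape[0]             # number of observations (rows)
Xc = X - X.mean(axis=0)    # zero mean per column (variable)

cov_manual = Xc.T @ Xc / m
cov_np = np.cov(X, rowvar=False, bias=True)
assert np.allclose(cov_manual, cov_np)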

How to efficiently compute logsumexp of upper triangle in a nested loop?

I have a nested for loop that iterates over the rows of a weight matrix and applies logsumexp to the upper-triangular portion of the outer-addition matrix built from those weight rows. It is very slow, so I'm trying to speed it up by vectorizing it or replacing the loops with matrix operations.
'''
Wm: weights matrix, n x k
W: updated weights matrix, n x n
triu_inds: upper-triangular indices of the Wxy outer matrix
'''
from scipy.special import logsumexp

for x in range(n - 1):
    wx = Wm[x, :]
    for y in range(x + 1, n):
        wy = Wm[y, :]
        Wxy = np.add.outer(wx, wy)
        Wxy = Wxy[triu_inds]
        W[x, y] = logsumexp(Wxy)
logsumexp: computes the log of the sum of exponentials of an input array
a: [1, 2, 3]
logsumexp(a) = log( exp(1) + exp(2) + exp(3) )
The input data Wm is a weights matrix with dimensions n x k. k represents a patient's sensor locations and n represents all possible sensor locations. The values in Wm are basically how close a patient's sensor is to a known sensor.
example:
Wm = [[ 1  2  3]
      [ 4  5  6]
      [ 7  8  9]
      [10 11 12]]
wx = [1 2 3]
wy = [4 5 6]
Wxy = [[5 6 7]
       [6 7 8]
       [7 8 9]]
triu_inds = ([0, 0, 1], [1, 2, 2])
Wxy[triu_inds] = [6, 7, 8]
logsumexp(Wxy[triu_inds]) = log(exp(6) + exp(7) + exp(8))
You can perform the outer addition on the full matrix Wm and then swap the axes corresponding to the columns of operand 1 and the rows of operand 2, so that the triangle indices can be applied to the last two axes. The resulting matrix is filled for all combinations of rows, so you need to select the upper-triangular part afterwards.
W = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1,  # perform the summation over the last axis
)
W = np.triu(W, k=1)
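As a quick sanity check (my own test harness, not from the thread), the vectorized expression can be compared with the original nested loop on small random data:
import numpy as np
from scipy.special import logsumexp

n, k = 4, 3
rng = np.random.default_rng(0)
Wm = rng.standard_normal((n, k))
triu_inds = np.triu_indices(k, k=1)

# Original nested loop.
W_loop = np.zeros((n, n))
for x in range(n - 1):
    for y in range(x + 1, n):
        Wxy = np.add.outer(Wm[x, :], Wm[y, :])
        W_loop[x, y] = logsumexp(Wxy[triu_inds])

# Vectorized version from the answer.
W_vec = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1,
)
W_vec = np.triu(W_vec, k=1)

assert np.allclose(W_loop, W_vec)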

What does tf.reduce_sum do with axis = -1?

I don't understand why the output of the following code is [7 56].
import tensorflow as tf

x = tf.constant([[1, 2, 4], [8, 16, 32]])
a = tf.reduce_sum(x, -1)  # expected [9 18 36], actual output: [7 56]
with tf.Session() as sess:
    output_a = sess.run(a)
    print(output_a)
I get that row-wise addition has been done. But can someone shed some light on why -1 in reduce_sum is interpreted as summing all the values in a row?
-1 means the last axis; since you have a rank-2 tensor, the last axis is the second axis, that is, the one running along the rows. tf.reduce_sum with axis=-1 will thus reduce (sum) over the second dimension.
I ran your code and it actually gave me a different answer from the one in your comment:
import tensorflow as tf
x = tf.constant([[1, 2, 4], [8, 16, 32]])
a = tf.reduce_sum(x, -1)
tf.print(a)
The answer is [7, 56]: 1 + 2 + 4 = 7 and 8 + 16 + 32 = 56.
From the documentation: "axis: The dimensions to reduce."
My understanding:
'-1' means the last axis (dimension).
tf.reduce_sum(x, -1) is equal to tf.reduce_sum(x, 1) here, since there are only 2 dimensions.
The shape is 2*3 (2 rows * 3 cols). The goal is to remove the last dimension, '3', so the result should be 2 rows:
[[7]
 [56]]
Since keepdims=True is not set here, the extra brackets are removed and we get the result [7 56].
You can test with this example. Note: '-1' and '1' are different here.
y = tf.constant([[[1, 2, 4], [1, 0, 3]], [[1, 2, 3], [2, 2, 1]]])
c = tf.reduce_sum(y, 1)  # with axis=-1 the result would be [[7, 4], [6, 5]]
tf.print(c)  # [[2 2 7], [3 4 4]]
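To make the axis behavior concrete, here is a small sketch (TF 2 eager style, my own example) contrasting axis=-1, keepdims, and axis=0:
import tensorflow as tf

x = tf.constant([[1, 2, 4], [8, 16, 32]])
tf.print(tf.reduce_sum(x, -1))                 # [7 56]      row sums
tf.print(tf.reduce_sum(x, -1, keepdims=True))  # [[7] [56]]  rank is kept
tf.print(tf.reduce_sum(x, 0))                  # [9 18 36]   column sums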

Tensorflow scan multiple matrix rows with offset

Question
I want to scan a matrix analogously to TensorFlow's tf.scan(), but using multiple rows at a time. So given an [m, n] matrix, I want to be able to iterate over the m rows (each with n elements) from i + j to m, giving m - j slices of shape [i - j, n].
How can this be achieved?
I know tf.scan does something like this, returning the accumulated value of each iteration. But I don't think shifting the matrix and passing it as multiple inputs solves this, since the values at an offset cannot be precomputed.
Example
To give an example for n = 3 and m = 5, let's say I have a matrix that looks like the following:
# [[1 0 0]
#  [1 1 0]
#  [0 0 0]   row 3
#  [0 0 0]   row 4
#  [0 0 0]]  row 5
matrix_shape = [5, 3]
matrix_idx = tf.constant([[0, 0], [1, 0], [1, 1]])
matrix = tf.scatter_nd(matrix_idx,
                       tf.ones(tf.shape(matrix_idx)[0],
                               dtype=tf.int32),
                       matrix_shape)
I want to apply the following function from row 3 to row 5:
# [[ 1  0  0] ┌ a
#  [ 1  1  0] ├ b
#  [ 6  4  2] <─┴ output / current line
#  [16 12  6]
#  [46 34 18]]
def compute(x):
    a = x[0]
    b = x[1]
    return (a + b + 1) * 2
Does TensorFlow have a function specific to this problem?
The following code I wrote does exactly what I wanted.
The important part here is the return value of the function used by tf.scan, which not only gives back the current computation c but also the row from the previous step, b. It is therefore important to cut off this excess from the computation afterwards by selecting only the latter tensor in this tuple with [1].
#!/usr/bin/env python3
import tensorflow as tf

def compute(x, _):
    a = x[0]
    b = x[1]
    c = (a + b + 1) * 2
    return (b, c)

matrix_shape = tf.constant([3, 3])
init_data = [[1, 0, 0], [1, 1, 0]]
initializer = (
    tf.constant(init_data[0]),
    tf.constant(init_data[1]),
)
matrix = tf.zeros(matrix_shape, dtype=tf.int32)
computation = tf.scan(compute, matrix, initializer)[1]
result = tf.concat((tf.constant(init_data), computation), axis=0)

with tf.Session() as sess:
    print(sess.run(result))
Since I still lack experience: might this solution be bad for performance because the function returns a tuple and therefore doesn't use TensorFlow's speed optimizations?
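As a quick sanity check of the recurrence outside TensorFlow (my own sketch, not from the thread) - each new row is (a + b + 1) * 2 elementwise over the two previous rows:
rows = [[1, 0, 0], [1, 1, 0]]
for _ in range(3):
    a, b = rows[-2], rows[-1]
    rows.append([(ai + bi + 1) * 2 for ai, bi in zip(a, b)])
print(rows)
# [[1, 0, 0], [1, 1, 0], [6, 4, 2], [16, 12, 6], [46, 34, 18]]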

Getting time frequency of number in array in Python?

Let's say I have a time series represented in a numpy array, where every 3 seconds, I get a data point. It looks something like this (but with many more data points):
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
I want to find a threshold where, on average, every y seconds a data point will surpass that threshold (x).
Maybe my question would be easier to understand in this sense: let's say I've gathered some data on how many ants are leaving their mound every 3 seconds. Using this data, I want to create a threshold (x) so that in the future if the number of ants leaving at one time exceeds x, my beeper will go off. Now this is the key part - I want my beeper to go off roughly every 4 seconds. I'd like to use Python to figure out what x should be given some y amount of time based on an array of data I've already collected.
Is there a way to do this in Python?
I think it is easiest to first think about this in terms of statistics. What I think you are really asking for is the 100*(1 - m/n)th percentile, that is, the number such that the values fall below it a fraction (1 - m/n) of the time, where m is your sampling period and n is your desired interval. In your example, that is the 100*(1 - 3/4)th, or 25th, percentile. In other words, you want the value that is exceeded 75% of the time.
To calculate that on your data, you can use scipy.stats.scoreatpercentile. For your case you can do something like:
>>> import numpy as np
>>> import scipy.stats
>>> z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
>>> m = 3.
>>> n = 4.
>>> x = scipy.stats.scoreatpercentile(z, 100*(1-m/n))
>>> print(x)
1.05
>>> print((z>x).sum()/len(z)) # test, should be about 0.75
0.714285714286
Of course if you have a lot of values this estimate will be better.
Edit: Originally I had the percentile backwards. It should be 1-m/n, but I originally had just m/n.
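The same number can also be obtained with plain NumPy's np.percentile (equivalent here, though not what the original answer used):
import numpy as np

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
m, n = 3.0, 4.0
x = np.percentile(z, 100 * (1 - m / n))
print(x)  # 1.05, same as scoreatpercentile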
Assuming that one-second resolution for the trigger is OK...
import numpy as np
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3
Divide each sample point by the period (in seconds) and create an array of one-second data - this assumes a linear distribution(?) within each sample:
y = np.array([[n] * period for n in z / period])
y = y.flatten()
Reshape the data into four-second periods (lossy):
h = len(y) % 4
x = y[:-h] if h else y  # drop the trailing seconds that don't fill a window
w = x.reshape((-1, 4))  # one row per four-second period
Find the sum of each four-second period and take the minimum of these intervals:
v = w.sum(axis=-1)
# use the min value of these sums
threshold = v.min()  # ~1.53
This gives a gross threshold for non-overlapping four-second chunks; however, it is lossy (the trailing h seconds are dropped) and it only examines 10 fixed windows within the 42 seconds of data that z represents.
Use overlapping, rolling windows to find the minimum value of the sums of each four-second window (lossless)
def rolling(a, window, step=1):
    """
    Examples
    --------
    >>> a = np.arange(10)
    >>> print(rolling(a, 3))
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]
     [5 6 7]
     [6 7 8]
     [7 8 9]]
    >>> print(rolling(a, 4))
    [[0 1 2 3]
     [1 2 3 4]
     [2 3 4 5]
     [3 4 5 6]
     [4 5 6 7]
     [5 6 7 8]
     [6 7 8 9]]
    >>> print(rolling(a, 4, 2))
    [[0 1 2 3]
     [2 3 4 5]
     [4 5 6 7]
     [6 7 8 9]]

    from http://stackoverflow.com/a/12498122/2823755
    """
    shape = ((a.size - window) // step + 1, window)  # integer division for Python 3
    strides = (a.itemsize * step, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
t = rolling(y, 4)
s = t.sum(axis=-1)
threshold = s.min()  # 1.3999999
This will produce 8 triggers for z.
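On NumPy >= 1.20, the custom rolling() helper can be replaced by np.lib.stride_tricks.sliding_window_view; a sketch reproducing the same threshold:
import numpy as np

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3
y = np.repeat(z / period, period)  # the same one-second array as above

windows = np.lib.stride_tricks.sliding_window_view(y, 4)
threshold = windows.sum(axis=-1).min()
print(threshold)  # ~1.4, matching rolling(y, 4)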
