Softmax and its derivative along an axis - python

I'm trying to implement a Softmax activation that can be applied to arrays of any dimension, with the softmax taken along a specified axis.
Let's suppose I have an array [[1,2],[3,4]]. If I need the softmax along the rows, I extract each row and apply softmax to it individually through np.apply_along_axis with axis=1. So for the example above, applying softmax to each of [1,2] and [3,4] gives softmax = [[0.26894142, 0.73105858], [0.26894142, 0.73105858]]. So far so good.
Now for the backward pass. Suppose the gradient from the upper layer is upper_grad = [[1,1],[1,1]]. For each 1D array of shape (2,) in softmax I compute the Jacobian jacobian = [[0.19661193, -0.19661193],[-0.19661193, 0.19661193]] of shape (2,2), then np.dot it with the corresponding 1D array in upper_grad of shape (2,), so each dot product yields an array of shape (2,). The final derivative is grads = [[0., 0.], [0., 0.]].
I definitely know I'm going wrong somewhere, because while doing gradient checking I get ~0.90, which is absolutely bonkers. Could someone please help me see what is wrong in my approach and how I can resolve it?
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result
def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    counter = 0
    upper_grad_slices = []

    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)

    def backward(arr_1d, upper_grad_slices, counter):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices[counter])  # grads is a 1d array because (2,2) dot (2,)
        counter += 1  # increment the counter to access the next slice of upper_grad_slices
        return grads

    # since apply_along_axis doesn't give the index of the 1d array,
    # we take the 1d slices of upper_grad and store them in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)

    # Iterate over each 1d array in result along axis, calculate its local_grad (jacobian)
    # and np.dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices, counter)
    return grads

a = np.array([[1, 2], [3, 4]])
result = softmax(a, 1)
print("Result")
print(result)

upper_grad = np.array([[1, 1], [1, 1]])
grads = softmax_backward(a, 1, upper_grad)
print("Gradients")
print(grads)
apply_along_axis documentation - https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html

I'm so dumb. I was using the counter to get the next slice of upper_grad, but the counter was only being updated locally, so I got the same slice of upper_grad on every call, which produced invalid gradients. Resolved it by using the pop method on upper_grad_slices.
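For anyone wondering why the counter never advanced, here is a minimal sketch (the bump function is just an illustration, not part of the code above): an integer passed as an extra argument to np.apply_along_axis is rebound locally on every call, so the caller's value never changes.

import numpy as np

def bump(arr_1d, counter):
    counter += 1  # rebinds the local name only; the caller's integer is untouched
    return arr_1d

c = 0
np.apply_along_axis(bump, 1, np.ones((2, 2)), c)
print(c)  # still 0 -- every call to bump saw counter == 0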
Updated code
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result

def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    upper_grad_slices = []

    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)

    def backward(arr_1d, upper_grad_slices):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices.pop(0))  # grads is a 1d array because (2,2) dot (2,)
        return grads

    # since apply_along_axis doesn't give the index of the 1d array,
    # we take the 1d slices of upper_grad and store them in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)

    # Iterate over each 1d array in result along axis, calculate its local_grad (jacobian)
    # and np.dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices)
    return grads

a = np.array([[1, 2], [3, 4]])
result = softmax(a, 1)
print("Result")
print(result)

upper_grad = np.array([[1, 1], [1, 1]])
grads = softmax_backward(a, 1, upper_grad)
print("Gradients")
print(grads)
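Note that for softmax specifically you never need to build the (n, n) Jacobian explicitly: since J = diag(s) - s s^T, the Jacobian-vector product J.g collapses to s * (g - sum(s * g)). A minimal vectorized sketch of that identity (an alternative formulation, not the code from the post above):

import numpy as np

def softmax_backward_vectorized(softmax_out, upper_grad, axis):
    # J @ g with J = diag(s) - s s^T reduces to s * (g - sum(s * g)),
    # so no Jacobian matrix is ever materialized
    inner = np.sum(softmax_out * upper_grad, axis=axis, keepdims=True)
    return softmax_out * (upper_grad - inner)

a = np.array([[1, 2], [3, 4]], dtype=float)
s = np.exp(a - a.max(axis=1, keepdims=True))
s /= s.sum(axis=1, keepdims=True)
g = np.ones_like(s)
print(softmax_backward_vectorized(s, g, axis=1))  # [[0. 0.] [0. 0.]], matching the result above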

Related

Centering matrix

I want to write a function for centering an input data matrix by multiplying it with the centering matrix. The function shall subtract the row-wise mean from the input.
My code:
import numpy as np

def centering(data):
    n = data.shape[0]
    centeringMatrix = np.identity(n) - 1/n * (np.ones(n) @ np.ones(n).T)
    data = centeringMatrix @ data

data = np.array([[1, 2, 3], [3, 4, 5]])
centering(data)
But I get a wrong result matrix, it is not centered.
Thanks!
The centering matrix is
    np.eye(n) - np.ones((n, n)) / n
Here is a list of issues in your original formulation:
1. np.ones(n).T is the same as np.ones(n). The transpose of a 1D array is a no-op in numpy. If you want to turn a row vector into a column vector, add the dimension explicitly:
    np.ones((n, 1))
OR
    np.ones(n)[:, None]
2. The normal definition is to subtract the column-wise mean, not the row-wise one, so you will have to transpose and right-multiply the input to get the row-wise operation:
    n = data.shape[1]
    ...
    data = (centeringMatrix @ data.T).T
3. Your function creates a new array for the output but does not currently return anything. You can either return the result, or perform the assignment in-place:
    return (centeringMatrix @ data.T).T
OR
    data[:] = (centeringMatrix @ data.T).T
OR
    np.matmul(centeringMatrix, data.T, out=data.T)
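Putting those fixes together, a corrected version could look something like this (just a sketch, checked against subtracting the row-wise mean directly):

import numpy as np

def centering(data):
    # center each row by subtracting its row-wise mean
    n = data.shape[1]
    centering_matrix = np.eye(n) - np.ones((n, n)) / n
    return (centering_matrix @ data.T).T

data = np.array([[1, 2, 3], [3, 4, 5]], dtype=float)
print(centering(data))                          # [[-1.  0.  1.] [-1.  0.  1.]]
print(data - data.mean(axis=1, keepdims=True))  # same result without building the matrix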

Iterating over successive 1-D slices along arbitrary axis of numpy array

I am writing a python package that performs various complex statistical analysis tasks along an arbitrary axis of an arbitrarily-shaped numpy array.
Currently, so that the array shape and axis can be arbitrary, I just permute the array so the axis of interest is placed on the far RHS, and squash the LHS axes into one. For example, if the array shape is (3,4,5) and we want to perform some operation along axis 1, it is transformed into the shape (15,4), the operation is performed along axis -1, then it is transformed back into the shape (3,4,5) and returned by the function.
I feel this approach may be unnecessarily slow because of all these array manipulations. Is there a way that I can cleanly iterate over all but one dimension of the array? That is, in the above example this would go [0,:,0], [0,:,1], ..., [2,:,3], [2,:,4], but again this should work for arbitrary array shape and axis position.
Maybe np.ndenumerate, np.ndindex, and np.take can be used for this somehow?
Edit: Is there a way to do this with np.nditer? Perhaps this can match the speed of permuting/reshaping.
Turns out just transposing and reshaping is indeed faster. So I guess the answer is: don't do that; it is preferable to permute and reshape, as I was already doing.
Here's the code from my project.
# Benchmark
f = lambda x: x  # can change this to any arbitrary function

def test1(data, axis=-1):
    # Test the lead-flatten approach
    data, shape = lead_flatten(permute(data, axis))
    output = np.empty(data.shape)
    for i in range(data.shape[0]):  # iterate along first dimension; each row is an autocor
        output[i, :] = f(data[i, :])  # arbitrary complex equation
    return unpermute(lead_unflatten(output, shape), axis)

def test2(data, axis=-1):
    # Test the new approach
    output = np.empty(data.shape)
    for d, o in zip(iter_1d(data, axis), iter_1d(output, axis)):
        o[...] = f(d)
    return output
# Iterator class
class iter_1d(object):
    def __init__(self, data, axis=-1):
        axis = (axis % data.ndim)  # e.g. for a 3D array, -1 becomes 2
        self.data = data
        self.axis = axis
    def __iter__(self):
        shape = (s for i, s in enumerate(self.data.shape) if i != self.axis)
        self.iter = np.ndindex(*shape)
        return self
    def __next__(self):
        idx = next(self.iter)  # use the builtin next(); .next() is Python 2 only
        idx = [*idx]
        idx.insert(self.axis, slice(None))
        return self.data[tuple(idx)]  # index with a tuple, not a list
# Permute and reshape functions
def lead_flatten(data, nflat=None):
    shape = list(data.shape)
    if nflat is None:
        nflat = data.ndim - 1  # all but last dimension
    if nflat <= 0:  # just add a singleton dimension
        return data[None, ...], shape
    return np.reshape(data, (np.prod(data.shape[:nflat]).astype(int), *data.shape[nflat:]), order='C'), shape  # 'C' order is row-major

def lead_unflatten(data, shape, nflat=None):
    if nflat is None:
        nflat = len(shape) - 1  # all but last dimension
    if nflat <= 0:  # we artificially added a singleton dimension; remove it
        return data[0, ...]
    if data.shape[0] != np.prod(shape[:nflat]):
        raise ValueError(f'Number of leading elements {data.shape[0]} does not match leading shape {shape[:nflat]}.')
    if not all(s1 == s2 for s1, s2 in zip(data.shape[1:], shape[nflat:])):
        raise ValueError(f'Trailing dimensions on data, {data.shape[1:]}, do not match trailing dimensions on new shape, {shape[nflat:]}.')
    return np.reshape(data, shape, order='C')

def permute(data, source=-1, destination=-1):
    return np.moveaxis(data, source, destination)

def unpermute(data, source=-1, destination=-1):
    return np.moveaxis(data, destination, source)
And here are results from some %timeit operations.
import numpy as np
a = np.random.rand(10,20,30,40)
%timeit -r10 -n10 test1(a, axis=2) # around 12ms
%timeit -r10 -n10 test2(a, axis=2) # around 22ms
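For reference, the permute-and-reshape pattern that the benchmark favors can be written fairly compactly with np.moveaxis; this is just a sketch (the helper name apply_along is my own, and it assumes f returns a 1-D array of the same length as its input):

import numpy as np

def apply_along(func, data, axis=-1):
    # move the target axis last, collapse the leading axes into one,
    # apply func to each 1-D row, then restore the original layout
    moved = np.moveaxis(data, axis, -1)
    flat = moved.reshape(-1, moved.shape[-1])
    out = np.empty_like(flat)
    for i in range(flat.shape[0]):
        out[i, :] = func(flat[i, :])
    return np.moveaxis(out.reshape(moved.shape), -1, axis)

a = np.random.rand(3, 4, 5)
print(apply_along(lambda x: x - x.mean(), a, axis=1).shape)  # (3, 4, 5)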

How to optimize this function calculating the categorical crossentropy of two numpy arrays

I want to calculate the categorical crossentropy of two numpy arrays. Both arrays have the same length.
y_true contains around 10000 2D arrays, which are the labels
y_pred contains 10000 2D arrays, which are my predictions
The result should be a 1D numpy array which contains all the categorical crossentropy values for the arrays. The formula is:
loss = -sum_i( x_true_i * log(x_pred_i) )
Here x_true_i is the i-th element of one true vector and x_pred_i is the i-th element of the prediction vector.
My implementation looks like this, but it is very slow. The reshaping is done to convert the 2D arrays to 1D arrays so I can simply iterate over them.
import math
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    losses = np.zeros(len(y_true))
    for i in range(len(y_true)):
        single_sequence = y_true[i].reshape(y_true.shape[1] * y_true.shape[2])
        single_pred = y_pred[i].reshape(y_pred.shape[1] * y_pred.shape[2])
        sum = 0
        for j in range(len(single_sequence)):
            log = math.log(single_pred[j])
            sum = sum + single_sequence[j] * log
        sum = sum * (-1)
        losses[i] = sum
    return losses
A conversion to tensors is not possible, since tf.constant(y_pred) fails with a MemoryError, because every 2D array in y_true and y_pred has roughly the dimensions 190 x 190. So any ideas?
You can use scipy.special.xlogy. For example,
In [10]: import numpy as np
In [11]: from scipy.special import xlogy
Create some data:
In [12]: y_true = np.random.randint(1, 10, size=(8, 200, 200))
In [13]: y_pred = np.random.randint(1, 10, size=(8, 200, 200))
Compute the result using xlogy:
In [14]: -xlogy(y_true, y_pred).sum(axis=(1, 2))
Out[14]:
array([-283574.67634307, -283388.18672431, -284720.65206688,
-285517.06983709, -286383.26148469, -282200.33634505,
-285781.78641736, -285862.91148953])
Verify the result by computing it with your function:
In [15]: categorical_cross_entropy(y_true, y_pred)
Out[15]:
array([-283574.67634309, -283388.18672432, -284720.65206689,
-285517.0698371 , -286383.2614847 , -282200.33634506,
-285781.78641737, -285862.91148954])
If you don't want the dependence on scipy, you can do the same thing with np.log, but you might get a warning if any value in y_pred is 0:
In [20]: -(y_true*np.log(y_pred)).sum(axis=(1, 2))
Out[20]:
array([-283574.67634307, -283388.18672431, -284720.65206688,
-285517.06983709, -286383.26148469, -282200.33634505,
-285781.78641736, -285862.91148953])

Reducing two tensors in Tensorflow

I have two tensors.
A tensor of shape (1,N)
A tensor of shape (N,T)
What I want to calculate is the following scalar:
sum over n = 1..N of ( z[n, t] / l[n] + (1 - z[n, t]) / (1 - l[n]) )
tf.reduce_sum seemed helpful, but I couldn't get my head around combining the two tensors and reduce functions to get what I want. Can someone help me how to write the above equation in tensorflow?
Does this work?
import tensorflow as tf
import numpy as np

N = 10
T = 20
l = tf.constant(np.random.randn(1, N), dtype=tf.float32)
z = tf.constant(np.random.randn(N, T), dtype=tf.float32)

with tf.Session() as sess:
    # swap axes for broadcasting to work
    l = tf.transpose(l, [1, 0])
    z_div_l = tf.divide(z, l)
    z_div_l_2 = tf.divide(1.0 - z, 1.0 - l)
    result = tf.reduce_sum(tf.add(z_div_l, z_div_l_2), axis=0)
    eval_result = sess.run(result)
    print('{}\n{}'.format(eval_result.shape, eval_result))
This evaluates the above expression for every t from 0 to T-1, so the result is not a scalar but a vector of shape (T,). Your question says you want a single scalar, but the sum runs only over N and not over T, so I assumed you just want this expression evaluated for every t.
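For a quick sanity check outside of TensorFlow, the same reduction can be written in plain numpy (just a sketch with random data, kept away from 0 and 1 so the divisions are safe):

import numpy as np

N, T = 10, 20
l = np.random.rand(1, N) * 0.8 + 0.1  # values in (0.1, 0.9)
z = np.random.rand(N, T)

# broadcast l as an (N, 1) column against z of shape (N, T), then sum over N
result = (z / l.T + (1.0 - z) / (1.0 - l.T)).sum(axis=0)
print(result.shape)  # (T,)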

mapping a numpy array to a function, passing along the indices

I have trained per-pixel models on many images and want to evaluate them on new images.
What I'd like to do is for each image of shape (N, M, 3), apply a function in this fashion:
myfunc(array[i, j, :], i, j)

# Takes (3,1) input and indices
def myfunc(input, i, j):
    ret1, ret2 = model[i, j].predict(input)
    # returns a single float value
    return ret1[1]
Here i, j are the pixel indices, which myfunc uses to look up the correct model parameters to apply. If it helps, model can be a numpy array of objects with the same dimensions as the original input's first two dimensions (N, M).
I was looking at ufuncs and vectorize, and I wasn't really sure whether they do what I want. Is there a provided interface for doing this, or will I have to loop through the array myself (which is ugly and possibly slower, since it runs in pure Python)?
Alternatively, what about applying the same function to each value?
e.g.
myfunc(array[i, j, :])

# Takes (3,1) input
def myfunc(input):
    ret1, ret2 = model.predict(input)
    # returns a single float value
    return ret1[1]
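For what it's worth, one straightforward baseline is an explicit Python loop over np.ndindex (a sketch, assuming model is an (N, M) object array whose entries expose a predict method returning two values, as in the snippet above):

import numpy as np

def apply_per_pixel(array, model):
    # array: (N, M, 3) image; model: (N, M) object array of per-pixel models
    N, M = array.shape[:2]
    out = np.empty((N, M))
    for i, j in np.ndindex(N, M):
        ret1, ret2 = model[i, j].predict(array[i, j, :])  # mirrors myfunc above
        out[i, j] = ret1[1]
    return out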
