np.sum for row axis not working in Numpy - python

I wrote a softmax regression function def softmax_1(x) that essentially takes in an m x n matrix, exponentiates the matrix, then sums the exponentials of each column.
import numpy as np
import pandas as pd

x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])
# scores shape is (3, 80)

def softmax_1(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
To convert it into a DataFrame I have to transpose:
DF_activation_1 = pd.DataFrame(softmax_1(scores).T,index=x,columns=["x","1.0","0.2"])
So I wanted to make a version of the softmax function that takes in the transposed scores and computes the softmax along the rows instead:
scores_T = scores.T
# scores_T shape is (80, 3)

def softmax_2(y):
    return(np.exp(y/np.sum(np.exp(y),axis=1)))
DF_activation_2 = pd.DataFrame(softmax_2(scores_T),index=x,columns=["x","1.0","0.2"])
Then I get this error:
Traceback (most recent call last):
  File "softmax.py", line 22, in <module>
    DF_activation_2 = pd.DataFrame(softmax_2(scores_T),index=x,columns=["x","1.0","0.2"])
  File "softmax.py", line 18, in softmax_2
    return(np.exp(y/np.sum(np.exp(y),axis=1)))
ValueError: operands could not be broadcast together with shapes (80,3) (80,)
Why doesn't this work when I transpose and switch the axis in the np.sum method?

Change
np.exp(y/np.sum(np.exp(y),axis=1))
to
np.exp(y)/np.sum(np.exp(y),axis=1, keepdims=True)
This will mean that np.sum returns an array of shape (80, 1) rather than (80,), which broadcasts correctly for the division. Also note the corrected bracket placement: in your version the division happens inside np.exp.
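As a quick check (a minimal sketch reusing the scores array from the question), the keepdims version applied to the transposed scores matches the original column-wise softmax:

import numpy as np

x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])   # (3, 80)

def softmax_1(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmax_2(y):
    return np.exp(y) / np.sum(np.exp(y), axis=1, keepdims=True)

print(np.allclose(softmax_1(scores).T, softmax_2(scores.T)))   # True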

Related

Softmax and its derivative along an axis

I'm trying to implement a Softmax activation that can be applied to arrays of any dimension, with the softmax computed along a specified axis.
Let's suppose I have an array [[1,2],[3,4]]. If I need the softmax along the rows, I extract each row and apply softmax individually on it through np.apply_along_axis with axis=1. So for the example above, applying softmax to each of [1,2] and [3,4] gives the output softmax = [[0.26894142, 0.73105858], [0.26894142, 0.73105858]]. So far so good.
Now for the backward pass, suppose the gradient from the upper layer is upper_grad = [[1,1],[1,1]]. I compute the Jacobian jacobian = [[0.19661193, -0.19661193],[-0.19661193, 0.19661193]] of shape (2,2) for each of the 1D arrays of shape (2,) in softmax, then np.dot it with the corresponding 1D array in upper_grad of shape (2,). Each dot product is an array of shape (2,), so the final derivative is grads = [[0. 0.],[0. 0.]].
I definitely know I'm going wrong somewhere, because gradient checking gives ~0.90, which is absolutely bonkers. Could someone please help with what is wrong in my approach and how I can resolve it?
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result

def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    counter = 0
    upper_grad_slices = []
    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)
    def backward(arr_1d, upper_grad_slices, counter):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices[counter])  # grads is 1d array because (2,2) dot (2,)
        counter += 1  # increment the counter to access the next slice of upper_grad_slices
        return grads
    # since apply_along_axis doesnt give the index of the 1d array,
    # we take the slices of 1d array of upper_grad and store them in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)
    # Iterate over each 1d array in result along axis, calculate its local_grad (jacobian)
    # and np.dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices, counter)
    return grads
a = np.array([[1,2],[3,4]])
result = softmax(a, 1)
print("Result")
print(result)
upper_grad = np.array([[1,1],[1,1]])
grads = softmax_backward(a, 1, upper_grad)
print("Gradients")
print(grads)
apply_along_axis documentation - https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html
I'm so dumb. I was using the counter to get the next slice of upper_grad, but the counter was only getting updated locally, so I kept getting the same slice of upper_grad each time, which gave invalid gradients. Resolved it by using the pop method on upper_grad_slices.
Updated code
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result

def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    upper_grad_slices = []
    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)
    def backward(arr_1d, upper_grad_slices):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices.pop(0))  # grads is 1d array because (2,2) dot (2,)
        return grads
    # since apply_along_axis doesnt give the index of the 1d array,
    # we take the slices of 1d array of upper_grad and store them in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)
    # Iterate over each 1d array in result along axis, calculate its local_grad (jacobian)
    # and np.dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices)
    return grads
a = np.array([[1,2],[3,4]])
result = softmax(a, 1)
print("Result")
print(result)
upper_grad = np.array([[1,1],[1,1]])
grads = softmax_backward(a, 1, upper_grad)
print("Gradients")
print(grads)
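As an aside (not part of the original answer), the same Jacobian-vector product can be computed without apply_along_axis or an explicit Jacobian: for a softmax output s and upstream gradient g, the input gradient is s * (g - sum(s * g)) along the softmax axis. A minimal vectorized sketch under that assumption:

import numpy as np

def softmax_backward_vectorized(arr, axis, upper_grad):
    # numerically stabilized softmax along the given axis
    shifted = arr - np.max(arr, axis=axis, keepdims=True)
    s = np.exp(shifted) / np.sum(np.exp(shifted), axis=axis, keepdims=True)
    # Jacobian-vector product: s * (g - sum(s * g)) along the same axis
    inner = np.sum(s * upper_grad, axis=axis, keepdims=True)
    return s * (upper_grad - inner)

a = np.array([[1, 2], [3, 4]], dtype=float)
upper_grad = np.ones_like(a)
print(softmax_backward_vectorized(a, 1, upper_grad))   # [[0. 0.] [0. 0.]], matching the loop version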

Question about tensor product between four-dimensional arrays

I'm trying to multiply together some 4 dimensional arrays (block matrices) in the following way:
H = C^T Q C + R
where C has shape (50,50,12,6), Q has shape (50,50,12,12), and R has shape (50,50,6,6).
I wonder how I should choose the correct axes to carry out the tensor products. I tried doing the matrix product in the following way:
H = np.tensordot(C_block.T,Q_block) # C_block
But a value error is returned:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2976/3668270968.py in <module>
----> 1 H = np.tensordot(C_block.T,Q_block) # C_block
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (6,12,12,12)->(6,12,newaxis,newaxis) (50,50,12,6)->(50,50,newaxis,newaxis) and requested shape (12,6)
Create some arrays with the right mix of shapes, using 10 instead of 50 for the batch dimensions. That is, treat the first 2 dimensions as a batch that is repeated across all arrays, including the result.
The sum-of-products dimension is size 12, on the right and on the left of Q.
This is most easily expressed with einsum.
In [71]: N=10; C=np.ones((N,N,12,6)); Q=np.ones((N,N,12,12)); R=np.ones((N,N,6,6))
In [73]: res = np.einsum('ijkl,ijkm,ijmn->ijln',C,Q,C)+R
In [74]: res.shape
Out[74]: (10, 10, 6, 6)
dot does not handle 'batches' right, hence the memory error in the other answer. np.matmul does though.
In [75]: res1 = C.transpose(0,1,3,2)@Q@C + R
In [76]: res1.shape
Out[76]: (10, 10, 6, 6)
With all ones, the value test isn't very diagnostic, still:
In [77]: np.allclose(res,res1)
Out[77]: True
matmul/@ treats the first 2 dimensions as 'batch' and does a dot on the last 2. The C.T in the equation should just swap the last 2 dimensions, not all of them.
Based on the sizes of your arrays, it looks like you are trying to do a regular matrix multiply on 50x50 arrays of matrices.
CT = np.swapaxes(C, 2, 3)
H = CT @ Q @ C + R
The documentation for np.matmul (which can be written using the @ operator) specifically mentions this case.
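A quick shape check of that approach (a minimal sketch with random arrays of the shapes from the question):

import numpy as np

C = np.random.rand(50, 50, 12, 6)
Q = np.random.rand(50, 50, 12, 12)
R = np.random.rand(50, 50, 6, 6)

CT = np.swapaxes(C, 2, 3)   # (50, 50, 6, 12): transpose only the matrix axes
H = CT @ Q @ C + R          # @ batches over the first two axes
print(H.shape)              # (50, 50, 6, 6)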

Dot product of two numpy arrays with 3D Vectors

My goal is finding the closest Segment (in an array of segments) to a single point.
Getting the dot product between arrays of 2D coordinates works, but using 3D coordinates gives the following error:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)
A = np.array([[1,1,1],[2,2,2]])
B = np.array([[3,3,3], [4,4,4]])
dp = np.dot(A,B)
dp should return 2 values:
the dot product of [1,1,1]@[3,3,3] and [2,2,2]@[4,4,4].
Thanks everyone.
Here is the final solution to find the closest line segment to a single point.
Any optimization is welcome.
import numpy as np
import time
#find closest segment to single point
then = time.time()
#random line segment
l1 = np.random.rand(1000000, 3)*10
l2 = np.random.rand(1000000, 3)*10
#single point
p = np.array([5,5,5]) #only single point
#set to origin
line = l2-l1
pv = p-l1
#length of line squared
len_sq = np.sum(line**2, axis = 1) #len_sq = numpy.einsum("ij,ij->i", line, line)
#dot product of 3D vectors with einsum
dot = np.einsum('ij,ij->i',line,pv) #np.sum(line*pv,axis=1)
#percentage of line the pv vector travels in
param = np.array([dot/len_sq])
#param<0 projected point=l1, param>1 pp=l2
clamped_param = np.clip(param,0,1)
#add line fraction to l1 to get projected point
pp = l1+(clamped_param.T*line)
##distance vector between single point and projected point
pp_p = pp-p
#sort by smallest distance between projected point and l1
index_of_mininum_dist = np.sum(pp_p**2, axis = 1).argmin()
print(index_of_mininum_dist)
print("FINISHED IN: ", time.time()-then)
np.dot only does what you expect here for 1D vectors, not for 2D arrays. When you pass 2D arrays it performs a matrix multiplication, which fails because of the dimensions passed.
On single vectors it works as you expected:
np.dot(A[0,:],B[0,:])
np.dot(A[1,:],B[1,:])
To do it in one go:
np.sum(A*B,axis=1)
Do you mean this:
np.einsum('ij,ij->i',A,B)
output:
[ 9 24]
However, if you want the dot product of every row in A with every row in B, you should do:
A@B.T
output:
[[ 9 12]
[18 24]]
The dot product in numpy is apparently not designed to be used with arrays of vectors. It's pretty easy to write a wrapper around it, like this for example:
def array_dot(A, B):
    return [A[i]@B[i] for i in range(A.shape[0])]
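For the arrays from the question, this wrapper gives the expected row-wise dot products (a quick usage check, with array_dot as defined above):

import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2]])
B = np.array([[3, 3, 3], [4, 4, 4]])
print(array_dot(A, B))   # [9, 24]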
In [265]: A = np.array([[1,1,1],[2,2,2]])
...: B = np.array([[3,3,3], [4,4,4]])
Element wise multiplication followed by sum works fine:
In [266]: np.sum(A*B, axis=1)
Out[266]: array([ 9, 24])
einsum also makes expressing this easy:
In [267]: np.einsum('ij,ij->i',A,B)
Out[267]: array([ 9, 24])
dot with 2d arrays (here (2,3) shaped) performs matrix multiplication, the classic across-rows, down-columns product. In einsum notation this is 'ij,jk->ik'.
In [268]: np.dot(A,B)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-268-189f80e2c351> in <module>
----> 1 np.dot(A,B)
<__array_function__ internals> in dot(*args, **kwargs)
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
With a transpose, dimensions match, (2,3) with (3,2), but the result is (2,2):
In [269]: np.dot(A,B.T)
Out[269]:
array([[ 9, 12],
[18, 24]])
The desired values are on the diagonal.
One way to think of the problem is that we want to do a batch of 1d products. matmul/@ was added to perform batch matrix multiplication (which dot can't do). But the arrays have to be expanded to 3d, so the batch dimension is the leading one (and the 3 is on the respective last and 2nd-to-last dimensions):
In [270]: A[:,None,:]@B[:,:,None]   # (2,1,3) with (2,3,1)
Out[270]:
array([[[ 9]],
[[24]]])
But the result is (2,1,1) shaped. The right numbers are there, but we have to squeeze out the extra dimensions.
Overall then, the first 2 solutions are simplest - the sum of an elementwise product, or its einsum equivalent.
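To round that off (a small sketch, not part of the original answer), squeezing the batched matmul result recovers the same (2,) vector as the elementwise-product and einsum approaches:

import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2]])
B = np.array([[3, 3, 3], [4, 4, 4]])
# (2,1,3) @ (2,3,1) -> (2,1,1); squeeze drops the singleton dimensions
print((A[:, None, :] @ B[:, :, None]).squeeze())   # [ 9 24]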

matrix to the power of a column of a dense matrix using numpy in Python

I'm trying to raise all the values in a matrix beta (VxK) to the power of the values in a column (Vx1) that is part of a dense matrix (VxN). So each value in beta should be raised to the power of the corresponding row of the column, and this should be done for all K columns in beta. When I use np.power in Python on a practice numpy array for beta:
np.power(head_beta.T, head_matrix[:,0])
I am able to obtain the results I want. The dimensions are (3, 10) for beta and (10,) for head_matrix[:,0] where in this case 3=K and 10=V.
However, if I do this on my actual matrix, which was obtained by using
matrix=csc_matrix((data,(row,col)), shape=(30784,72407) ).todense()
where data, row, and col are arrays, I am unable to do the same operation:
np.power(beta.T, matrix[:,0])
where the dimensions are (10, 30784) for beta and (30784, 1) for matrix where in this case 10=K and 30784=V. I get the following error
ValueError Traceback (most recent call last)
<ipython-input-29-9f55d4cb9c63> in <module>()
----> 1 np.power(beta.T, matrix[:,0])
ValueError: operands could not be broadcast together with shapes (10,30784) (30784,1)
It seems that the difference is that matrix is a matrix (length,1) and head_matrix is actually a numpy array (length,) that I created. How can I do this same operation with the column of a dense matrix?
In the problem case it can't broadcast (10,30784) and (30784,1). As you note it works when (10,N) is used with (N,). That's because it can expand the (N,) to (1,N) and on to (10,N).
M = sparse.csr_matrix(...).todense()
is an np.matrix, which is always 2d, so M[:,0] is (N,1). There are several solutions:
np.power(beta.T, M[:,0].T) # change to a (1,N)
np.power(beta, M[:,0]) # line up the expandable dimensions
convert the sparse matrix to an array:
A = sparse.....toarray()
np.power(beta.T, A[:,0])
M[:,0].squeeze() and M[:,0].ravel() both produce a (1,N) matrix. So does M[:,0].reshape(-1). That 2d quality is persistent, as long as it returns a matrix.
M[:,0].A1 produces a (N,) array
From a while back: Numpy matrix to array
You can use the squeeze method on arrays to get rid of this extra dimension.
So
np.power(beta.T, matrix[:,0].squeeze()) should do the trick.
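A small reproduction of the different fixes (a minimal sketch with stand-in sizes K=3 and V=5 in place of 10 and 30784):

import numpy as np
from scipy.sparse import csc_matrix

K, V = 3, 5
beta_T = np.random.rand(K, V)                              # stands in for beta.T, shape (K, V)
M = csc_matrix(np.random.randint(1, 4, (V, 2))).todense()  # np.matrix, always 2d; M[:, 0] is (V, 1)

# np.power(beta_T, M[:, 0])                 # ValueError: (K, V) vs (V, 1)
print(np.power(beta_T, M[:, 0].T).shape)    # (K, V): (1, V) broadcasts
print(np.power(beta_T.T, M[:, 0]).shape)    # (V, K): (V, 1) lines up with (V, K)
print(np.power(beta_T, M[:, 0].A1).shape)   # (K, V): .A1 gives a plain (V,) array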

Getting around numpy objects mismatch error in python

I'm having a problem with multiplying two big matrices in python using numpy.
I have a (15,7) matrix and I want to multiply it by its transpose, i.e. AT(7,15)*A(15,7), and mathematically this should work, but I get an error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
I'm using numpy in Python. How can I get around this? Please help!
You've probably represented the matrices as arrays. You can either convert them to matrices with np.asmatrix, or use np.dot to do the matrix multiplication:
>>> X = np.random.rand(15 * 7).reshape((15, 7))
>>> X.T * X
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (7,15) (15,7)
>>> np.dot(X.T, X).shape
(7, 7)
>>> X = np.asmatrix(X)
>>> (X.T * X).shape
(7, 7)
One difference between arrays and matrices is that * on a matrix is matrix product, while on an array it's an element-wise product.
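As a related aside (not part of the original answer), with plain arrays the @ operator gives the same matrix product without converting to np.matrix:

import numpy as np

X = np.random.rand(15, 7)
print((X.T @ X).shape)   # (7, 7): @ on 2d arrays is matrix multiplication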
