I have the following function that calculates the Euclidean distance between all combinations of the vectors in matrix A and matrix B:
import numpy as np

def distance_matrix(A, B):
    n = A.shape[1]
    m = B.shape[1]
    C = np.zeros((n, m))
    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T):
            C[ai][bi] = np.linalg.norm(a - b)
    return C
This works fine and creates an n×m matrix from a d×n matrix and a d×m matrix, containing the Euclidean distance between every pair of column vectors.
>>> print(A)
[[-1 -1 1 1 2]
[ 1 -1 2 -1 1]]
>>> print(B)
[[-2 -1 1 2]
[-1 2 1 -1]]
>>> print(distance_matrix(A,B))
[[2.23606798 1. 2. 3.60555128]
[1. 3. 2.82842712 3. ]
[4.24264069 2. 1. 3.16227766]
[3. 3.60555128 2. 1. ]
[4.47213595 3.16227766 1. 2. ]]
I spent some time looking for a NumPy or SciPy function to do this more efficiently. Is there such a function, or what would be the vectorized way to do this?
You can use:
np.linalg.norm(A[:,:,None]-B[:,None,:],axis=0)
or (totally equivalent, but without the built-in function)
((A[:,:,None]-B[:,None,:])**2).sum(axis=0)**0.5
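We need a 5×4 final array, so we add a new axis to each array and let broadcasting expand them:
A[:,:,None]                -> shape (2, 5, 1)
B[:,None,:]                -> shape (2, 1, 4)
A[:,:,None] - B[:,None,:]  -> shape (2, 5, 4)
The size-1 axes are stretched to match (to 4 for A and to 5 for B), and reducing over axis 0 (the coordinate axis) finally gives a (5, 4) ndarray.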
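A quick way to check the shape bookkeeping above is to print the intermediate shapes and compare the broadcast result with the loop-based distance_matrix. This sketch re-declares that function so it runs on its own and uses the A and B from the question; the names C_broadcast and C_manual are just illustrative.

```python
import numpy as np

def distance_matrix(A, B):
    # Reference implementation: explicit double loop over column vectors
    n, m = A.shape[1], B.shape[1]
    C = np.zeros((n, m))
    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T):
            C[ai, bi] = np.linalg.norm(a - b)
    return C

A = np.array([[-1, -1, 1, 1, 2], [1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])

diff = A[:, :, None] - B[:, None, :]  # (2, 5, 1) - (2, 1, 4) broadcasts to (2, 5, 4)
print(A[:, :, None].shape, B[:, None, :].shape, diff.shape)  # (2, 5, 1) (2, 1, 4) (2, 5, 4)

C_broadcast = np.linalg.norm(diff, axis=0)  # reduce over the coordinate axis -> (5, 4)
C_manual = (diff ** 2).sum(axis=0) ** 0.5   # the same computation spelled out
print(np.allclose(C_broadcast, distance_matrix(A, B)))  # True
print(np.allclose(C_manual, C_broadcast))               # True
```

Note that the broadcast version materializes the full (d, n, m) difference array, which is fine for small inputs but can use a lot of memory for large n and m.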
Yes, you can broadcast your vectors:
A = np.array([[-1, -1, 1, 1, 2], [ 1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])
C = np.linalg.norm(A.T[:, None, :] - B.T[None, :, :], axis=-1)
print(C)
array([[2.23606798, 1. , 2. , 3.60555128],
[1. , 3. , 2.82842712, 3. ],
[4.24264069, 2. , 1. , 3.16227766],
[3. , 3.60555128, 2. , 1. ],
[4.47213595, 3.16227766, 1. , 2. ]])
You can get an explanation of how it works here:
https://sparrow.dev/pairwise-distance-in-numpy/
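SciPy also provides scipy.spatial.distance.cdist, which expects observations in rows, so for this layout it would be called as cdist(A.T, B.T). If you want to stay in NumPy and avoid the (d, n, m) intermediate, a common alternative (a different technique from the broadcasting answers above) expands the squared distance as ||a||² + ||b||² - 2·a·b and computes the cross term with one matrix product. The helper name pairwise_distances_gram is hypothetical; this is a sketch, not a library function.

```python
import numpy as np

def pairwise_distances_gram(A, B):
    # A is d x n, B is d x m; the vectors are the columns, as in the question
    sq_a = (A ** 2).sum(axis=0)[:, None]  # (n, 1): squared norms of A's columns
    sq_b = (B ** 2).sum(axis=0)[None, :]  # (1, m): squared norms of B's columns
    sq = sq_a + sq_b - 2 * (A.T @ B)      # (n, m): squared pairwise distances
    # Floating-point cancellation can leave tiny negative values; clip before sqrt
    return np.sqrt(np.maximum(sq, 0))

A = np.array([[-1, -1, 1, 1, 2], [1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])
print(pairwise_distances_gram(A, B))
```

The trade-off is memory versus precision: only (n, m) arrays are created, but for nearly identical vectors the subtraction can lose a few digits compared with differencing directly.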
I have a 3-D array as shown below and would like to keep the max value along axis 1 and set all the non-max values to zero.
A = np.random.rand(3,3,2)
[[[0.34444547, 0.50260393],
[0.93374423, 0.39021899],
[0.94485653, 0.9264881 ]],
[[0.95446736, 0.335068 ],
[0.35971558, 0.11732342],
[0.72065402, 0.36436023]],
[[0.56911013, 0.04456443],
[0.17239996, 0.96278067],
[0.26004909, 0.06767436]]]
Desired result:
[[[0.        , 0.        ],
  [0.        , 0.        ],
  [0.94485653, 0.9264881 ]],
 [[0.95446736, 0.        ],
  [0.        , 0.        ],
  [0.        , 0.36436023]],
 [[0.56911013, 0.        ],
  [0.        , 0.96278067],
  [0.        , 0.        ]]]
I have tried:
B = np.zeros_like(A)  # array of zeros with the same shape as A
max_idx = np.argmax(A, axis=1)  # index of the max value along axis 1
# max_idx:
array([[2, 2],
       [0, 2],
       [0, 1]])
C = np.max(A, axis=1, keepdims=True)  # (3, 1, 2) array of the max values along axis 1
# C:
array([[[0.94485653, 0.9264881 ]],
       [[0.95446736, 0.36436023]],
       [[0.56911013, 0.96278067]]])
But I can't figure out how to combine these ideas to get the desired output.
You can get the 3-dimensional index of your max values from max_idx. The values in max_idx are the indices along axis 1 of your max values. There are six of them, since your other axes have lengths 3 and 2 (3 × 2 = 6). You just have to work out the order in which NumPy goes through them to recover the index along each of the other axes. In NumPy's default (C) order, the last axis varies fastest:
d0, d1, d2 = A.shape
a0 = [i for i in range(d0) for _ in range(d2)] # [0, 0, 1, 1, 2, 2]
a1 = max_idx.flatten() # [2, 2, 0, 2, 0, 1]
a2 = [k for _ in range(d0) for k in range(d2)] # [0, 1, 0, 1, 0, 1]
B[a0, a1, a2] = A[a0, a1, a2]
Output:
array([[[0. , 0. ],
[0. , 0. ],
[0.94485653, 0.9264881 ]],
[[0.95446736, 0. ],
[0. , 0. ],
[0. , 0.36436023]],
[[0.56911013, 0. ],
[0. , 0.96278067],
[0. , 0. ]]])
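The two list comprehensions build the axis-0 and axis-2 indices explicitly. A more compact alternative (not part of the original answer) is to let np.ogrid create open index grids of shape (3, 1) and (1, 2), which broadcast against max_idx of shape (3, 2), so no flattening is needed. The sketch below hard-codes the A printed in the question.

```python
import numpy as np

A = np.array([[[0.34444547, 0.50260393],
               [0.93374423, 0.39021899],
               [0.94485653, 0.9264881 ]],
              [[0.95446736, 0.335068  ],
               [0.35971558, 0.11732342],
               [0.72065402, 0.36436023]],
              [[0.56911013, 0.04456443],
               [0.17239996, 0.96278067],
               [0.26004909, 0.06767436]]])

max_idx = np.argmax(A, axis=1)  # shape (3, 2): position of the max along axis 1
d0, d1, d2 = A.shape
i0, i2 = np.ogrid[:d0, :d2]     # shapes (3, 1) and (1, 2)

# i0, max_idx and i2 broadcast together to (3, 2): one index triple per max value
B = np.zeros_like(A)
B[i0, max_idx, i2] = A[i0, max_idx, i2]
print(B)
```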
Suppose I have a matrix / array / list like a=[1,2,3,4,5], and I want to zero out all entries except the max, so that it becomes a=[0,0,0,0,5].
I'm using b = [val if idx == np.argmax(a) else 0 for idx,val in enumerate(a)], but is there a better (and faster) way, especially for arrays with more than one dimension?
You can use NumPy for an in-place solution. Note that the method below keeps every element equal to the max value (so ties are all preserved) and sets everything else to 0.
import numpy as np
a = np.array([1,2,3,4,5])
a[np.where(a != a.max())] = 0
# array([0, 0, 0, 0, 5])
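If you would rather leave the original array untouched, np.where builds a new array from the same condition. This is a small sketch of that non-mutating variant; it keeps ties just like the masked assignment.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
# Keep elements equal to the max, use 0 everywhere else; a itself is not modified
b = np.where(a == a.max(), a, 0)
print(b)  # [0 0 0 0 5]
print(a)  # [1 2 3 4 5]
```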
For unique maxima, see cᴏʟᴅsᴘᴇᴇᴅ's solution.
Rather than masking, you can create an array of zeros and copy the max value into it at the right index.
1-D (optimised) Solution
(Setup) Convert a to a 1D array: a = np.array([1,2,3,4,5]).
To keep just one instance of the max
b = np.zeros_like(a)
i = np.argmax(a)
b[i] = a[i]
To keep all instances of the max
b = np.zeros_like(a)
m = a == a.max()
b[m] = a[m]
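The two 1-D variants only differ when the maximum is repeated. This sketch uses a hypothetical input with a tie to show that np.argmax keeps only the first occurrence, while the boolean mask keeps all of them.

```python
import numpy as np

a = np.array([1, 5, 3, 5])  # hypothetical input with a repeated maximum

# argmax version: keeps only the first occurrence of the max
b_one = np.zeros_like(a)
i = np.argmax(a)
b_one[i] = a[i]

# mask version: keeps every occurrence of the max
b_all = np.zeros_like(a)
m = a == a.max()
b_all[m] = a[m]

print(b_one)  # [0 5 0 0]
print(b_all)  # [0 5 0 5]
```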
N-D solution
import numpy as np
np.random.seed(0)
a = np.random.randn(5, 5)
b = np.zeros_like(a)
m = a == a.max(1, keepdims=True)
b[m] = a[m]
b
array([[0. , 0. , 0. , 2.2408932 , 0. ],
[0. , 0.95008842, 0. , 0. , 0. ],
[0. , 1.45427351, 0. , 0. , 0. ],
[0. , 1.49407907, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 2.26975462]])
This keeps every occurrence of the max in each row.
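If you need exactly one max per row in the N-D case (the analogue of the argmax 1-D solution), one option is np.take_along_axis / np.put_along_axis (available since NumPy 1.15). This is a sketch of that alternative, not part of the original answer; ties resolve to the first occurrence, as with np.argmax.

```python
import numpy as np

np.random.seed(0)
a = np.random.randn(5, 5)

# Index of the first max in each row, kept 2-D so it lines up with axis 1
idx = np.expand_dims(np.argmax(a, axis=1), axis=1)  # shape (5, 1)

b = np.zeros_like(a)
# Copy exactly one max value per row into the zero array
np.put_along_axis(b, idx, np.take_along_axis(a, idx, axis=1), axis=1)
print(b)
```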
I want to write a single function that calculates the "true mean" of each row or column of a matrix, i.e. the mean that ignores zero elements. I try to choose between by-row and by-column calculation with the axis parameter, set to 1 or 0 respectively.
This is the function for by-column calculation
import numpy as np

def true_mean(matrix, axis):
    countnonzero = (matrix != 0).sum(axis)
    mask = countnonzero != 0
    output_mat = np.zeros(matrix.T.shape[axis])
    output_mat[mask] = matrix[:, mask].sum(axis) / countnonzero[mask]  # line4
    return output_mat
Test the function
eachPSM = np.ones([5,4])
eachPSM[0] = 0
eachPSM[2,2:4] = 5
print(eachPSM)
> [[ 0. 0. 0. 0.]
[ 1. 1. 1. 1.]
[ 1. 1. 5. 5.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
ans = true_mean(eachPSM,0)
print(ans)
> [ 1. 1. 2. 2.]
However, if I want to calculate by row (axis = 1), only line4 has to change to
output_mat[mask] = matrix[mask,:].sum(axis)/countnonzero[mask]
Is there a way to flip matrix[:,mask] to matrix[mask,:] using only the axis value (0 or 1), so that I can have a single function that computes the true mean of both rows and columns?
You can use the fact that the [] operator accepts a tuple as its index:
indexer = [slice(None), slice(None)]
indexer[axis] = mask
print(x[tuple(indexer)])
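slice(None) is equivalent to :, so we build an index equivalent to [:, :] that selects the full matrix, replace the entry for the desired axis with the mask, and convert the list to a tuple before indexing (recent NumPy versions no longer accept a plain list of slices as a multidimensional index).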
Complete example:
import numpy as np

x = np.arange(9).reshape(3, 3)
mask = np.array([True, False, True])
for axis in [0, 1]:
    indexer = [slice(None)] * x.ndim
    indexer[axis] = mask
    print(x[tuple(indexer)])
prints
[[0 1 2]
[6 7 8]]
and
[[0 2]
[3 5]
[6 8]]
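Applied to the question, one detail matters: in true_mean, axis is the axis being reduced, while the mask selects along the other axis (columns for axis=0, rows for axis=1). So the mask goes into position 1 - axis of the indexer. A sketch of the single function under that reading, tested on the eachPSM matrix from the question:

```python
import numpy as np

def true_mean(matrix, axis):
    # Mean along `axis` (0 = per column, 1 = per row), ignoring zero entries
    countnonzero = (matrix != 0).sum(axis)
    mask = countnonzero != 0  # columns/rows with at least one non-zero entry
    output_mat = np.zeros(matrix.shape[1 - axis])
    # axis=0 -> matrix[:, mask]; axis=1 -> matrix[mask, :]
    indexer = [slice(None), slice(None)]
    indexer[1 - axis] = mask
    output_mat[mask] = matrix[tuple(indexer)].sum(axis) / countnonzero[mask]
    return output_mat

eachPSM = np.ones([5, 4])
eachPSM[0] = 0
eachPSM[2, 2:4] = 5
print(true_mean(eachPSM, 0))  # per column
print(true_mean(eachPSM, 1))  # per row; the all-zero row stays 0
```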
I've been trying to implement a watershed algorithm, and since all the examples seem to be in Python, I've run into a bit of a wall. I've been searching the NumPy documentation for what this line means:
matrixVariable[A==255] = 0
but have had no luck. Could anyone explain what that operation does?
For context, here is the line in action: label [lbl == -1] = 0
The expression A == 255 creates a boolean array that is True wherever an element of A equals 255 and False otherwise.
The expression matrixVariable[A==255] = 0 then sets every element of matrixVariable at a position where A == 255 is True to 0.
For example:
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.zeros([3, 3])
print('before:')
print(B)
B[A>5] = 5
print('after:')
print(B)
Output:
before:
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
after:
[[ 0. 0. 0.]
[ 0. 0. 5.]
[ 5. 5. 5.]]
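The assignment above modifies B in place. If you want a new array instead, np.where expresses the same selection without mutating anything; this is a small sketch using the same A and B.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.zeros([3, 3])

mask = A > 5  # boolean array with the same shape as A
print(mask)

# Non-mutating equivalent of B[A > 5] = 5: take 5 where mask is True, else B
C = np.where(mask, 5, B)
print(C)
print(B)  # B is still all zeros
```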
I assumed that matrixVariable and A are NumPy arrays. If so, the expression "matrixVariable[A==255] = 0" first finds the positions where the values of A are equal to 255, then selects the elements of matrixVariable at those positions and sets them to 0.
Example:
import numpy as np
matrixVariable = np.array([(1, 3),
(2, 2),
(3,1)])
A = np.array([255, 1,255])
# So A[0] and A[2] are equal to 255
matrixVariable[A==255]=0 #then sets matrixVariable[0] and matrixVariable[2] to zero
print(matrixVariable) # this would print
[[0 0]
[2 2]
[0 0]]
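In this example the mask is 1-D while matrixVariable is 2-D, so the mask is matched against axis 0 and selects whole rows. The mask's shape has to match the leading dimensions it indexes; the sketch below shows the row selection and that a mask of the wrong length is rejected with an IndexError (the name bad_mask is just illustrative).

```python
import numpy as np

matrixVariable = np.array([(1, 3), (2, 2), (3, 1)])
A = np.array([255, 1, 255])

mask = A == 255  # [ True False  True], length 3
print(mask.shape, matrixVariable.shape)  # (3,) (3, 2)
# A length-3 mask is matched against axis 0, so it selects rows 0 and 2
print(matrixVariable[mask])

# A mask whose length does not match axis 0 is rejected
bad_mask = np.array([True, False])
try:
    matrixVariable[bad_mask] = 0
except IndexError as exc:
    print("IndexError:", exc)
```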