Euclidean distance matrix between two matrices - python

I have the following function that calculates the Euclidean distance between all combinations of the vectors in matrix A and matrix B:
import numpy as np
def distance_matrix(A, B):
    n = A.shape[1]
    m = B.shape[1]
    C = np.zeros((n, m))
    # compare every column of A with every column of B
    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T):
            C[ai, bi] = np.linalg.norm(a - b)
    return C
This works fine and creates an n*m matrix from a d*n matrix and a d*m matrix, containing the Euclidean distance between all combinations of the column vectors.
>>> print(A)
[[-1 -1 1 1 2]
[ 1 -1 2 -1 1]]
>>> print(B)
[[-2 -1 1 2]
[-1 2 1 -1]]
>>> print(distance_matrix(A,B))
[[2.23606798 1. 2. 3.60555128]
[1. 3. 2.82842712 3. ]
[4.24264069 2. 1. 3.16227766]
[3. 3.60555128 2. 1. ]
[4.47213595 3.16227766 1. 2. ]]
I spent some time looking for a numpy or scipy function to achieve this in a more efficient way. Is there such a function, or what would be the vectorized way to do this?

You can use:
np.linalg.norm(A[:,:,None]-B[:,None,:],axis=0)
or (totally equivalent, but without the built-in function)
((A[:,:,None]-B[:,None,:])**2).sum(axis=0)**0.5
We need a 5x4 final array, so we add a length-1 axis to each array so that the two broadcast against each other:
A[:,:,None] -> 2,5,1
B[:,None,:] -> 2,1,4
A[:,:,None] - B[:,None,:] -> 2,5,4
Then we apply the sum over axis 0 to finally get a 5,4 ndarray.
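The shape bookkeeping above can be checked directly. This is a minimal sketch using the A and B from the question; the variable name diff is only an illustration.

```python
import numpy as np

A = np.array([[-1, -1, 1, 1, 2], [1, -1, 2, -1, 1]])  # shape (2, 5)
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])        # shape (2, 4)

# Length-1 axes are stretched by broadcasting during the subtraction.
diff = A[:, :, None] - B[:, None, :]
print(A[:, :, None].shape)  # (2, 5, 1)
print(B[:, None, :].shape)  # (2, 1, 4)
print(diff.shape)           # (2, 5, 4)

# Summing the squared differences over axis 0 removes the coordinate axis.
C = np.sqrt((diff ** 2).sum(axis=0))
print(C.shape)              # (5, 4)
```

Note that the intermediate array holds d*n*m elements, so for large inputs this approach trades memory for speed.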

Yes, you can broadcast your vectors:
A = np.array([[-1, -1, 1, 1, 2], [ 1, -1, 2, -1, 1]])
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])
C = np.linalg.norm(A.T[:, None, :] - B.T[None, :, :], axis=-1)
print(C)
[[2.23606798 1.         2.         3.60555128]
 [1.         3.         2.82842712 3.        ]
 [4.24264069 2.         1.         3.16227766]
 [3.         3.60555128 2.         1.        ]
 [4.47213595 3.16227766 1.         2.        ]]
You can get an explanation of how it works here:
https://sparrow.dev/pairwise-distance-in-numpy/
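Since the question also asks for a ready-made scipy function, scipy.spatial.distance.cdist is one candidate. The sketch below assumes the same column-vector layout as the question, so both inputs are transposed because cdist expects one vector per row.

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[-1, -1, 1, 1, 2], [1, -1, 2, -1, 1]])  # d x n
B = np.array([[-2, -1, 1, 2], [-1, 2, 1, -1]])        # d x m

# cdist compares rows, so pass the transposes to compare columns.
C = cdist(A.T, B.T, metric="euclidean")
print(C.shape)  # (5, 4)
```

Unlike the broadcasting version, this does not require you to build the d x n x m difference array yourself.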

Related

Python: Find max value along one axis in 3d matrix, make non-max values zero

I have a 3-d matrix as shown below and would like to take the max value along axis 1, and set all non-max values to zero.
A = np.random.rand(3,3,2)
[[[0.34444547, 0.50260393],
[0.93374423, 0.39021899],
[0.94485653, 0.9264881 ]],
[[0.95446736, 0.335068 ],
[0.35971558, 0.11732342],
[0.72065402, 0.36436023]],
[[0.56911013, 0.04456443],
[0.17239996, 0.96278067],
[0.26004909, 0.06767436]]]
Desired result:
[[[0 , 0 ],
[0 , 0 ],
[0.94485653, 0.9264881]],
[[0.95446736, 0 ],
[0 , 0 ],
[0 , 0.36436023]],
[[0.56911013, 0 ],
[0 , 0.96278067],
[0 , 0 ]]]
I have tried:
B = np.zeros_like(A) # matrix of zeros with the same shape as A
max_idx = np.argmax(A, axis=1) # index along axis 1 of each max value
array([[2, 2],
[0, 2],
[0, 1]])
C = np.max(A, axis=1, keepdims=True) # gives a (3,1,2) matrix of the max values along axis 1
array([[[0.94485653, 0.9264881 ]],
[[0.95446736, 0.36436023]],
[[0.56911013, 0.96278067]]])
But I can't figure out how to combine these ideas together to get my desired output. Please help!!
You can get the 3-dimensional index of each max value from max_idx. The values in max_idx are the indices along axis 1 of your max values. There are six values since your other axes have lengths 3 and 2 (3 x 2 = 6). You just have to work out the order in which numpy goes through them to get the matching index for each of the other axes. When max_idx is flattened, the last axis varies fastest:
d0, d1, d2 = A.shape
a0 = [i for i in range(d0) for _ in range(d2)] # [0, 0, 1, 1, 2, 2]
a1 = max_idx.flatten() # [2, 2, 0, 2, 0, 1]
a2 = [k for _ in range(d0) for k in range(d2)] # [0, 1, 0, 1, 0, 1]
B[a0, a1, a2] = A[a0, a1, a2]
Output:
array([[[0. , 0. ],
[0. , 0. ],
[0.94485653, 0.9264881 ]],
[[0.95446736, 0. ],
[0. , 0. ],
[0. , 0.36436023]],
[[0.56911013, 0. ],
[0. , 0.96278067],
[0. , 0. ]]])
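An alternative that avoids building the index lists by hand is np.put_along_axis together with np.take_along_axis (both available since NumPy 1.15). This is a sketch using the array values from the question; the name idx is only an illustration.

```python
import numpy as np

A = np.array([[[0.34444547, 0.50260393],
               [0.93374423, 0.39021899],
               [0.94485653, 0.9264881]],
              [[0.95446736, 0.335068],
               [0.35971558, 0.11732342],
               [0.72065402, 0.36436023]],
              [[0.56911013, 0.04456443],
               [0.17239996, 0.96278067],
               [0.26004909, 0.06767436]]])

# Position of the max along axis 1, with that axis kept as length 1.
idx = np.expand_dims(np.argmax(A, axis=1), axis=1)  # shape (3, 1, 2)

B = np.zeros_like(A)
# Copy only the max values into the zero array at their original positions.
np.put_along_axis(B, idx, np.take_along_axis(A, idx, axis=1), axis=1)
print(B)
```

If ties should all be kept rather than only the first max, np.where(A == A.max(axis=1, keepdims=True), A, 0) is a shorter option.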

How to compute a spatial distance matrix from a given value

I've been looking for a way to (efficiently) compute a distance matrix from a target value and an input matrix.
If you consider an input array as:
[0 0 1 2 5 2 1]
[0 0 2 3 5 2 1]
[0 1 1 2 5 4 1]
[1 1 1 2 5 4 0]
How do you compute the spatial distance matrix associated with the target value 0?
i.e. what is the distance from each pixel to the closest 0 value?
Thanks in advance
You are looking for scipy.ndimage.morphology.distance_transform_edt. It operates on a binary array and computes, for each TRUE position, the Euclidean distance to the nearest background (FALSE) position. In our case we want the distances to the nearest 0s, so the background is 0. Under the hood, it converts the input to a binary array, treating 0 as the background, so we can just use it with the default parameters. Hence, it is as simple as -
In [179]: a
Out[179]:
array([[0, 0, 1, 2, 5, 2, 1],
[0, 0, 2, 3, 5, 2, 1],
[0, 1, 1, 2, 5, 4, 1],
[1, 1, 1, 2, 5, 4, 0]])
In [180]: from scipy import ndimage
In [181]: ndimage.distance_transform_edt(a)
Out[181]:
array([[0. , 0. , 1. , 2. , 3. , 3.16, 3. ],
[0. , 0. , 1. , 2. , 2.83, 2.24, 2. ],
[0. , 1. , 1.41, 2.24, 2.24, 1.41, 1. ],
[1. , 1.41, 2.24, 2.83, 2. , 1. , 0. ]])
Solving for generic case
Now, say we want the distances to the nearest 1s instead; then it would be -
In [183]: background = 1 # element from which distances are to be computed
# compare this with original array, a to verify
In [184]: ndimage.distance_transform_edt(a!=background)
Out[184]:
array([[2. , 1. , 0. , 1. , 2. , 1. , 0. ],
[1.41, 1. , 1. , 1.41, 2. , 1. , 0. ],
[1. , 0. , 0. , 1. , 2. , 1. , 0. ],
[0. , 0. , 0. , 1. , 2. , 1.41, 1. ]])
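The generic case can be wrapped in a small helper. This is a sketch only: distance_to_value is a hypothetical name, not a SciPy function, and it assumes the target value occurs at least once in the array.

```python
import numpy as np
from scipy import ndimage

def distance_to_value(a, target):
    """Euclidean distance from each cell to the nearest cell equal to target."""
    a = np.asarray(a)
    # Cells equal to target become background (False); all others are foreground.
    return ndimage.distance_transform_edt(a != target)

a = np.array([[0, 0, 1, 2, 5, 2, 1],
              [0, 0, 2, 3, 5, 2, 1],
              [0, 1, 1, 2, 5, 4, 1],
              [1, 1, 1, 2, 5, 4, 0]])

d0 = distance_to_value(a, 0)  # distances to the nearest 0
d1 = distance_to_value(a, 1)  # distances to the nearest 1
print(np.round(d0, 2))
print(np.round(d1, 2))
```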

Interpolate 2D matrix along columns using Python

I am trying to interpolate a 2D numpy matrix with the dimensions (5, 3) to a matrix with the dimensions (7, 3), interpolating down each column so that the number of rows grows from 5 to 7. Obviously, the wrong approach would be to insert interpolated rows at arbitrary positions in the original matrix, as in the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Target (terrible interpolation -> not wanted!):
[[0, 1, 1]
[0, 1.5, 0.5]
[0, 2, 0]
[0, 3, 1]
[0, 3.5, 0.5]
[0, 4, 0]
[0, 5, 1]]
The correct approach would be to take every row into account and interpolate between all of them to expand the source matrix to a (7, 3) matrix. I am aware of the scipy.interpolate.interp1d and scipy.interpolate.interp2d methods, but could not get them to work by following other Stack Overflow posts or websites. Any tips or tricks would be appreciated.
Update #1: The expected values should be equally spaced.
Update #2:
What I want to do is basically use the separate columns of the original matrix, expand the length of the column to 7 and interpolate between the values of the original column. See the following example:
Source:
[[0, 1, 1]
[0, 2, 0]
[0, 3, 1]
[0, 4, 0]
[0, 5, 1]]
Split into 3 separate Columns:
[0 [1 [1
0 2 0
0 3 1
0 4 0
0] 5] 1]
Expand length to 7 and interpolate between them, example for second column:
[1
1.66
2.33
3
3.66
4.33
5]
It seems like each column can be treated completely independently, but for each column you need to define essentially an "x" coordinate so that you can fit some function "f(x)" from which you generate your output matrix.
Unless the rows in your matrix are associated with some other data structure (e.g. a vector of timestamps), an obvious set of x values is just the row number:
x = numpy.arange(0, Source.shape[0])
You can then construct an interpolating function:
fit = scipy.interpolate.interp1d(x, Source, axis=0)
and use that to construct your output matrix:
Target = fit(numpy.linspace(0, Source.shape[0]-1, 7))
which produces:
array([[ 0. , 1. , 1. ],
[ 0. , 1.66666667, 0.33333333],
[ 0. , 2.33333333, 0.33333333],
[ 0. , 3. , 1. ],
[ 0. , 3.66666667, 0.33333333],
[ 0. , 4.33333333, 0.33333333],
[ 0. , 5. , 1. ]])
By default, scipy.interpolate.interp1d uses piecewise-linear interpolation. There are many more exotic options within scipy.interpolate, based on higher order polynomials, etc. Interpolation is a big topic in itself, and unless the rows of your matrix have some particular properties (e.g. being regular samples of a signal with a known frequency range), there may be no "truly correct" way of interpolating. So the choice of interpolation scheme will be somewhat arbitrary.
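As an illustration of one of those other options, interp1d accepts a kind argument. The sketch below compares the default linear scheme with kind="cubic" on the matrix from the question; whether cubic is appropriate for your data is an assumption you would need to check.

```python
import numpy as np
from scipy.interpolate import interp1d

Source = np.array([[0, 1, 1],
                   [0, 2, 0],
                   [0, 3, 1],
                   [0, 4, 0],
                   [0, 5, 1]])

x = np.arange(Source.shape[0])                   # row numbers 0..4
xnew = np.linspace(0, Source.shape[0] - 1, 7)    # 7 equally spaced positions

linear = interp1d(x, Source, axis=0)             # default: piecewise linear
cubic = interp1d(x, Source, axis=0, kind="cubic")

print(linear(xnew))
print(cubic(xnew))
```

Both interpolants pass through the original rows; they differ only in the values they produce between them.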
You can do this as follows:
from scipy.interpolate import interp1d
import numpy as np
a = np.array([[0, 1, 1],
[0, 2, 0],
[0, 3, 1],
[0, 4, 0],
[0, 5, 1]])
x = np.arange(a.shape[0])
# define new x range, we need 7 equally spaced values
xnew = np.linspace(x.min(), x.max(), 7)
# build the interpolator; axis=0 interpolates each column independently
f = interp1d(x, a, axis=0)
# get final result
print(f(xnew))
This will print
[[ 0. 1. 1. ]
[ 0. 1.66666667 0.33333333]
[ 0. 2.33333333 0.33333333]
[ 0. 3. 1. ]
[ 0. 3.66666667 0.33333333]
[ 0. 4.33333333 0.33333333]
[ 0. 5. 1. ]]
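If you would rather not depend on SciPy, the same piecewise-linear result can be sketched with np.interp, which works on one 1-D column at a time. The variable names here are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 1],
              [0, 2, 0],
              [0, 3, 1],
              [0, 4, 0],
              [0, 5, 1]])

x = np.arange(a.shape[0])                  # original row positions
xnew = np.linspace(0, a.shape[0] - 1, 7)   # 7 equally spaced positions

# Interpolate every column separately, then stack the columns back together.
result = np.column_stack([np.interp(xnew, x, a[:, j]) for j in range(a.shape[1])])
print(result)
```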

Divide one column in array by another numpy

I am trying to get
[[ 4. 0. 0. ]
[ 8. 0. 0. ]]
out of this:
[[ 2. 0.5 0. ]
[ 2. 0.25 0. ]]
So I want to divide the first column by the second one:
div = arr[:,0]/arr[:,1]
but I don't know the best way to reshape and add zeros to get the result.
Thanks in advance.
If you want to do it in place, you could do
a[:, 0] = a[:, 0] / a[:, 1]
a[:, 1] = 0
If not, build a new array of zeros and fill in its first column:
b = np.zeros(a.shape)
b[:, 0] = a[:, 0] / a[:, 1]
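For completeness, here is a self-contained sketch of the non-in-place variant using the values from the question. It assumes the array has a float dtype; with an integer array, np.zeros_like would create an integer result and the quotient would be truncated on assignment.

```python
import numpy as np

arr = np.array([[2.0, 0.5, 0.0],
                [2.0, 0.25, 0.0]])

# Start from zeros with the same shape and dtype, then fill the first column.
result = np.zeros_like(arr)
result[:, 0] = arr[:, 0] / arr[:, 1]
print(result)
```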

Matlab / Octave bwdist() in Python or C

Does anyone know of a Python replacement for the Matlab / Octave bwdist() function? This function returns the Euclidean distance of each cell to the closest non-zero cell for a given matrix. I saw an Octave C implementation and a pure Matlab implementation, and I was wondering if anyone has implemented this in ANSI C (without any Matlab / Octave headers, so I can integrate it from Python easily) or in pure Python.
Both links I mentioned are below:
C++
Matlab M-File
As a test, a Matlab code / output looks something like this:
bw= [0 1 0 0 0;
1 0 0 0 0;
0 0 0 0 1;
0 0 0 0 0;
0 0 1 0 0]
D = bwdist(bw)
D =
1.00000 0.00000 1.00000 2.00000 2.00000
0.00000 1.00000 1.41421 1.41421 1.00000
1.00000 1.41421 2.00000 1.00000 0.00000
2.00000 1.41421 1.00000 1.41421 1.00000
2.00000 1.00000 0.00000 1.00000 2.00000
I tested a recommended distance_transform_edt call in Python, which gave this result:
import numpy as np
from scipy import ndimage
a = np.array(([0,1,0,0,0],
              [1,0,0,0,0],
              [0,0,0,0,1],
              [0,0,0,0,0],
              [0,0,1,0,0]))
res = ndimage.distance_transform_edt(a)
print(res)
[[ 0. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]]
This result does not seem to match the Octave / Matlab output.
While Matlab's bwdist returns distances to the closest non-zero cell, Python's distance_transform_edt returns distances “to the closest background element”. The SciPy documentation is not clear about what it considers to be the “background”, and there is some type conversion machinery behind it; in practice, 0 is the background and non-zero is the foreground.
So if we have matrix a:
>>> a = np.array(([0,1,0,0,0],
...               [1,0,0,0,0],
...               [0,0,0,0,1],
...               [0,0,0,0,0],
...               [0,0,1,0,0]))
then to calculate the same result we need to replace ones with zeros and zeros with ones, e.g. consider the matrix 1-a:
>>> a
array([[0, 1, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0]])
>>> 1 - a
array([[1, 0, 1, 1, 1],
[0, 1, 1, 1, 1],
[1, 1, 1, 1, 0],
[1, 1, 1, 1, 1],
[1, 1, 0, 1, 1]])
In this case scipy.ndimage.morphology.distance_transform_edt gives the expected results:
>>> from scipy.ndimage import distance_transform_edt
>>> distance_transform_edt(1-a)
array([[ 1. , 0. , 1. , 2. , 2. ],
[ 0. , 1. , 1.41421356, 1.41421356, 1. ],
[ 1. , 1.41421356, 2. , 1. , 0. ],
[ 2. , 1.41421356, 1. , 1.41421356, 1. ],
[ 2. , 1. , 0. , 1. , 2. ]])
Does scipy.ndimage.morphology.distance_transform_edt meet your needs?
There is no need to compute 1-a; you can compare against zero instead:
>>> distance_transform_edt(a==0)
array([[ 1. , 0. , 1. , 2. , 2. ],
[ 0. , 1. , 1.41421356, 1.41421356, 1. ],
[ 1. , 1.41421356, 2. , 1. , 0. ],
[ 2. , 1.41421356, 1. , 1.41421356, 1. ],
[ 2. , 1. , 0. , 1. , 2. ]])
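Putting the answers above together, a bwdist-like helper might look like the sketch below. The function name bwdist is hypothetical here (it is not part of SciPy), and only the plain Euclidean behaviour shown in the question is covered.

```python
import numpy as np
from scipy import ndimage

def bwdist(bw):
    """Distance from each cell to the nearest non-zero cell of bw."""
    bw = np.asarray(bw)
    # Non-zero cells become background (False), so distances are measured to them.
    return ndimage.distance_transform_edt(bw == 0)

bw = np.array([[0, 1, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0],
               [0, 0, 1, 0, 0]])

D = bwdist(bw)
print(np.round(D, 5))
```

Comparing against zero rather than computing 1 - bw also works for inputs whose non-zero values are not exactly 1.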
I think you can use distanceTransform() from OpenCV, which calculates the distance to the closest zero pixel for each pixel of the source image.
Check this link: https://docs.opencv.org/3.4/d7/d1b/group__imgproc__misc.html#ga8a0b7fdfcb7a13dde018988ba3a43042
