calculate distance of 2 list of points in numpy - python

I have 2 lists of points as numpy.ndarray, each row is the coordinate of a point, like:
a = np.array([[1,0,0],[0,1,0],[0,0,1]])
b = np.array([[1,1,0],[0,1,1],[1,0,1]])
Here I want to calculate the euclidean distance between all pairs of points in the 2 lists: for each point p_a in a, I want the distance between it and every point p_b in b. So the result is
d = np.array([[1,sqrt(3),1],[1,1,sqrt(3)],[sqrt(3),1,1]])
How to use matrix multiplication in numpy to compute the distance matrix?

Using direct numpy broadcasting, you can do this:
dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
Alternatively, scipy has a routine that will compute this slightly more efficiently (particularly for large matrices)
from scipy.spatial.distance import cdist
dist = cdist(a, b)
I would avoid solutions that depend on factoring out matrix products (of the form A^2 + B^2 - 2AB), because they can be numerically unstable due to floating-point roundoff errors.
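As a minimal sketch of that instability (my own toy example, NumPy only): take two nearly coincident points and compare the direct computation with the factored form; the factored expression cancels catastrophically and can even go slightly negative.
import numpy as np

a = np.array([[1.0, 1.0, 1.0]])
b = a + 1e-9                                   # nearly coincident point
direct = np.sqrt(((a - b) ** 2).sum(-1))       # ~1.732e-09, accurate
factored = (a**2).sum(1)[:, None] + (b**2).sum(1) - 2 * a.dot(b.T)
# factored is ~0 up to roundoff, so the sqrt loses the answer (or yields nan)
print(direct, np.sqrt(np.maximum(factored, 0)))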

To compute the squared euclidean distance for each pair of rows Xi and Yj, we need to find:
(Xik-Yjk)**2 = Xik**2 + Yjk**2 - 2*Xik*Yjk
and then sum along k to get the squared distance for the corresponding pair as dist(Xi,Yj).
Splitting the summation term by term, it reduces to:
dist(Xi,Yj) = sum_k(Xik**2) + sum_k(Yjk**2) - 2*sum_k(Xik*Yjk)
Bringing in matrix-multiplication for the last part, we would have all the distances, like so -
dist = sum_rows(X^2) + sum_rows(Y^2) - 2*matrix_multiplication(X, Y.T)
Hence, putting into NumPy terms, we would end up with the euclidean distances for our case with a and b as the inputs, like so -
np.sqrt((a**2).sum(1)[:,None] + (b**2).sum(1) - 2*a.dot(b.T))
Leveraging np.einsum, we could replace the first two summation-reductions with -
np.einsum('ij,ij->i',a,a)[:,None] + np.einsum('ij,ij->i',b,b)
More info can be found on the eucl_dist package's wiki page (disclaimer: I am its author).
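Putting the pieces together, a small helper along these lines works (a sketch; the function name is mine, and np.maximum guards the sqrt against tiny negative values from roundoff, per the caveat in the answer above):
import numpy as np

def euclidean_matrix(a, b):
    aa = np.einsum('ij,ij->i', a, a)        # sum_k a[i,k]**2
    bb = np.einsum('ij,ij->i', b, b)        # sum_k b[j,k]**2
    d_sq = aa[:, None] + bb - 2 * a.dot(b.T)
    return np.sqrt(np.maximum(d_sq, 0))     # clamp roundoff negatives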

If you have two 1-dimensional arrays, x and y, you can convert the arrays into matrices with repeating columns, transpose, and apply the distance formula. This assumes that x and y hold coordinate pairs, i.e. point i is (x[i], y[i]). The result is a symmetric distance matrix.
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]
xx = np.repeat(x, 3, axis=0).reshape(3, 3)
yy = np.repeat(y, 3, axis=0).reshape(3, 3)
dist = np.sqrt((xx - xx.T)**2 + (yy - yy.T)**2)
dist
Out[135]:
array([[0.        , 1.41421356, 2.82842712],
       [1.41421356, 0.        , 1.41421356],
       [2.82842712, 1.41421356, 0.        ]])
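For what it's worth, the same matrix can be computed without materializing the repeated matrices, using broadcasting directly (a sketch under the same coordinate-pair assumption):
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
dist = np.sqrt((x[:, None] - x)**2 + (y[:, None] - y)**2)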

L2 distance = (a^2 + b^2 - 2ab)^0.5
import numpy as np

a = np.random.randn(5, 3)
b = np.random.randn(2, 3)
a2 = np.sum(np.square(a), axis=1)[..., None]     # (5, 1)
b2 = np.sum(np.square(b), axis=1)[None, ...]     # (1, 2)
ab = -2 * np.dot(a, b.T)                         # (5, 2)
dist = np.sqrt(np.maximum(a2 + b2 + ab, 0))      # clamp tiny roundoff negatives before sqrt
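A quick sanity check against scipy's cdist (assuming scipy is installed):
from scipy.spatial.distance import cdist
assert np.allclose(dist, cdist(a, b))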

Related

Finding basis of affine space in python?

Suppose I am given an affine space as some conjunction of equalities, say:
x + y + z = 2 && x - 3z = 4
which I represent in python as:
[ [1,1,1,2] , [1,0,-3,4] ]
I would like to find the basis of this affine set. Does python have any such library?
Little bit of mathematics:
Let the affine space be given by the matrix equation Ax = b. Let the k vectors {x_1, x_2, .. x_k } be the basis of the nullspace of A i.e. the space represented by Ax = 0. Let y be any particular solution of Ax = b. Then the basis of the affine space represented by Ax = b is given by the (k+1) vectors {y, y + x_1, y + x_2, .. y + x_k }. If there is no particular solution for Ax = b, then return some error message, as the set represented is empty.
For the above equation the matrix equation is:
Ax = b where
A = [[1, 1, 1], [1, 0, -3]]
x = [x , y , z]^T
b = [2, 4]^T
If you are looking for a numerical (i.e. approximate) solution then you can try this:
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 1, 1], [1, 0, -3]])
b = np.array([2, 4])
null_sp = null_space(A)                                # basis of the nullspace of A
x0 = np.linalg.lstsq(A, b, rcond=None)[0][..., None]   # a particular solution
aff_basis = np.c_[np.zeros((A.shape[1], 1)), null_sp] + x0
print(aff_basis)
It gives:
[[ 1.69230769  1.10395929]
 [ 1.07692308  1.86138762]
 [-0.76923077 -0.9653469 ]]
You can find two points on the line by fixing one coordinate at a time and solving for the remaining unknowns:
x = 1 -> (1, 2, -1)
z = 0 -> (4, -2, 0)
This requires solving two easy 2x2 systems, no need for a library.
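If you do want to script it, here is a minimal sketch of that two-points idea with plain numpy solves (my own arrangement, not a library routine):
import numpy as np

# fix x = 1: the remaining 2x2 system is y + z = 1, -3z = 3
p1 = np.array([1.0, *np.linalg.solve([[1, 1], [0, -3]], [1, 3])])
# fix z = 0: the remaining 2x2 system is x + y = 2, x = 4
p2 = np.array([*np.linalg.solve([[1, 1], [1, 0]], [2, 4]), 0.0])
print(p1, p2)   # (1, 2, -1) and (4, -2, 0); the direction vector is p2 - p1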

Fast way to calculate min distance between two numpy arrays of 3D points

I would like to know if there is a fast way to calculate the Euclidean distance between all points of a numpy array of 3D points A (shape [N, 3]) and all points of a second array B (shape [M, 3]).
I should then get an array C of shape [N, M] with all distances from points of A to points of B, to then use np.min() along the specified axis to get all the minimum distances from points of set A to points of set B.
This is the way I have done the implementation so far :
distances = np.repeat(9999.0, len(A))   # float dtype, so assigning float distances doesn't truncate
for i, point in enumerate(A):
    min_distance = np.min(np.sqrt(np.sum(np.square(point - B), axis=1)))
    distances[i] = min_distance
Is there any way to get rid of the for loop...?
Thanks in advance :)
If the scipy method doesn't work or if you do have other reasons, here is a numpy way-
import numpy as np

x = np.random.random((200, 3))
y = np.random.random((100, 3))
x = x.reshape((-1, 1, 3))                 # [200x1x3]
y = np.expand_dims(y, axis=0)             # [1x100x3]
y = y.repeat(x.shape[0], axis=0)          # [200x100x3] (optional: broadcasting would handle this)
distance = np.linalg.norm(y - x, axis=2)  # difference is [200x100x3]; the norm gives [200x100]
import numpy as np

# arrays with the xyz coordinates of all points
a = np.asarray([[xa1, ya1, za1], ..., [xan, yan, zan]])
b = np.asarray([[xb1, yb1, zb1], ..., [xbn, ybn, zbn]])

# reshaping to be able to calculate the distance matrix
a_reshaped = a.reshape(a.shape[0], 1, 3)
b_reshaped = b.reshape(1, b.shape[0], 3)

# calculation of all distances between all points - creates a len(a) x len(b) matrix
distance = np.sqrt(np.sum((a_reshaped - b_reshaped)**2, axis=2))
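Either way, the minimum-distance reduction the question asked for is then a single call on the resulting matrix (using the distance array from either snippet above):
distances = distance.min(axis=1)   # min distance from each point of a to the set b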

How to write Z[i,k] = sqrt(sum_j((X[i,j] - Y[k,j])**2)) in einsum notation? [duplicate]

This question duplicates the first question in this collection; the broadcasting, cdist, and einsum-factorization answers above apply unchanged.

How to do n-D distance and nearest neighbor calculations on numpy arrays

This question is intended to be a canonical duplicate target
Given two arrays X and Y of shapes (i, n) and (j, n), representing lists of n-dimensional coordinates,
import numpy as np

def test_data(n, i, j, r=100):
    X = np.random.rand(i, n) * r - r / 2
    Y = np.random.rand(j, n) * r - r / 2
    return X, Y

X, Y = test_data(3, 1000, 1000)
what are the fastest ways to find:
The distance D with shape (i,j) between every point in X and every point in Y
The indices k_i and distances k_d of the k nearest neighbors in X for every point in Y
The indices r_i, r_j and distances r_d of every point in X within distance r of every point in Y
Given the following sets of restrictions:
Only using numpy
Using any python package
Including the special case:
Y is X
In all cases distance primarily means Euclidean distance, but feel free to highlight methods that allow other distance calculations.
#1. All Distances
only using numpy
The naive method is:
D = np.sqrt(np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1))
However this takes up a lot of memory, creating an (i, j, n)-shaped intermediate array, and is very slow.
However, thanks to a trick from @Divakar (eucl_dist package, wiki), we can use a bit of algebra and np.einsum to decompose as such: (X - Y)**2 = X**2 - 2*X*Y + Y**2
D = np.sqrt(                                   # (X - Y) ** 2
    np.einsum('ij, ij ->i', X, X)[:, None] +   # = X ** 2
    np.einsum('ij, ij ->i', Y, Y) -            # + Y ** 2
    2 * X.dot(Y.T))                            # - 2 * X * Y
Y is X
Similar to above:
XX = np.einsum('ij, ij ->i', X, X)
D = np.sqrt(XX[:, None] + XX - 2 * X.dot(X.T))
Beware that floating-point imprecision can make the diagonal terms deviate very slightly from zero with this method. If you need them to be exactly zero, you'll need to set them explicitly:
np.einsum('ii->i', D)[:] = 0
Any Package
scipy.spatial.distance.cdist is the most intuitive builtin function for this, and far faster than bare numpy
from scipy.spatial.distance import cdist
D = cdist(X, Y)
cdist can also deal with many, many distance measures as well as user-defined distance measures (although these are not optimized). Check the documentation linked above for details.
Y is X
For self-referring distances, scipy.spatial.distance.pdist works similarly to cdist, but returns a 1-D condensed distance array, saving space on the symmetric distance matrix by storing each term only once. You can convert this to a square matrix using squareform:
from scipy.spatial.distance import pdist, squareform
D_cond = pdist(X)
D = squareform(D_cond)
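As a quick consistency sketch (small random data, my own check), the condensed and square forms agree with cdist:
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

X = np.random.rand(5, 3)
assert np.allclose(squareform(pdist(X)), cdist(X, X))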
#2. K Nearest Neighbors (KNN)
Only using numpy
We could use np.argpartition to get the k-nearest indices and use those to get the corresponding distance values. So, with D as the array holding the distance values obtained above, we would have -
if k == 1:
    k_i = D.argmin(0)[None]   # keep 2-D so take_along_axis works
else:
    k_i = D.argpartition(k, axis=0)[:k]
k_d = np.take_along_axis(D, k_i, axis=0)
However we can speed this up a bit by not taking the square roots until we have reduced our dataset. np.sqrt is the slowest part of calculating the Euclidean norm, so we don't want to do that until the end.
D_sq = np.einsum('ij, ij ->i', X, X)[:, None] + \
       np.einsum('ij, ij ->i', Y, Y) - 2 * X.dot(Y.T)
if k == 1:
    k_i = D_sq.argmin(0)[None]   # keep 2-D so take_along_axis works
else:
    k_i = D_sq.argpartition(k, axis=0)[:k]
k_d = np.sqrt(np.take_along_axis(D_sq, k_i, axis=0))
Now, np.argpartition performs an indirect partition: it doesn't necessarily give us the elements in sorted order, it only makes sure that the first k elements are the smallest ones. So, for a sorted output, we need to use argsort on the output from the previous step -
sorted_idx = k_d.argsort(axis=0)
k_i_sorted = np.take_along_axis(k_i, sorted_idx, axis=0)
k_d_sorted = np.take_along_axis(k_d, sorted_idx, axis=0)
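Here is a small self-contained check of the partition-then-sort pattern against a full argsort (my own toy example; single query column, assuming no tied distances):
import numpy as np

D = np.random.rand(100, 1)
k = 5
k_i = D.argpartition(k, axis=0)[:k]
k_d = np.take_along_axis(D, k_i, axis=0)
order = k_d.argsort(axis=0)
k_i_sorted = np.take_along_axis(k_i, order, axis=0)
assert np.array_equal(k_i_sorted.ravel(), D.ravel().argsort()[:k])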
If you only need k_i, you never need the square root at all:
D_sq = np.einsum('ij, ij ->i', X, X)[:, None] + \
       np.einsum('ij, ij ->i', Y, Y) - 2 * X.dot(Y.T)
if k == 1:
    k_i = D_sq.argmin(0)[None]
else:
    k_i = D_sq.argpartition(k, axis=0)[:k]
k_d_sq = np.take_along_axis(D_sq, k_i, axis=0)
sorted_idx = k_d_sq.argsort(axis=0)
k_i_sorted = np.take_along_axis(k_i, sorted_idx, axis=0)
X is Y
In the above code, replace:
D_sq = np.einsum('ij, ij ->i', X, X)[:, None] + \
       np.einsum('ij, ij ->i', Y, Y) - 2 * X.dot(Y.T)
with:
XX = np.einsum('ij, ij ->i', X, X)
D_sq = XX[:, None] + XX - 2 * X.dot(X.T)
Any Package
A KD-Tree is a much faster method to find neighbors and constrained distances. Be aware that while a KDTree is usually much faster than the brute-force solutions above in 3D (as long as you have more than about 8 points), in n dimensions it only scales well if you have more than 2**n points. For discussion of high-dimensional cases and more advanced methods, see here.
The most recommended way to use a KDTree is scipy's scipy.spatial.KDTree or scipy.spatial.cKDTree:
from scipy.spatial import KDTree
X_tree = KDTree(X)
k_d, k_i = X_tree.query(Y, k = k)
Unfortunately scipy's KDTree implementation is slow and has a tendency to segfault for larger data sets. As pointed out by @HansMusgrave here, pykdtree increases the performance a lot, but is not as common an include as scipy and currently handles only Euclidean distance (while the KDTree in scipy can handle Minkowski p-norms of any order).
X is Y
Use instead:
k_d, k_i = X_tree.query(X, k = k)
Arbitrary metrics
A BallTree has similar algorithmic properties to a KDTree. I'm not aware of a parallel/vectorized/fast BallTree in Python, but using sklearn we can still have reasonable KNN queries for user-defined metrics. If available, builtin metrics will be much faster.
import numpy as np
from sklearn.neighbors import BallTree

def d(a, b):
    return max(np.abs(a - b))

tree = BallTree(X, metric=d)
k_d, k_i = tree.query(Y)
This answer will be wrong if d() is not a metric. The only reason a BallTree is faster than brute force is because the properties of a metric allow it to rule out some solutions. For truly arbitrary functions, brute force is actually necessary.
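For instance, the d() above is just the Chebyshev distance, for which a builtin (and much faster) metric exists; a sketch:
import numpy as np
from sklearn.neighbors import BallTree

X = np.random.rand(1000, 3)
Y = np.random.rand(100, 3)
tree = BallTree(X, metric='chebyshev')   # same as max(np.abs(a - b)), but optimized
k_d, k_i = tree.query(Y, k=3)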
#3. Radius search
Only using numpy
The simplest method is just to use boolean indexing on the squared distances D_sq from #1:
mask = D_sq < r**2
r_i, r_j = np.where(mask)
r_d = np.sqrt(D_sq[mask])
Any Package
Similar to above, you can use scipy.spatial.KDTree.query_ball_point
r_ij = X_tree.query_ball_point(Y, r = r)
or scipy.spatial.KDTree.query_ball_tree
Y_tree = KDTree(Y)
r_ij = X_tree.query_ball_tree(Y_tree, r = r)
Unfortunately r_ij ends up being a list of index arrays that are a bit difficult to untangle for later use.
Much easier is to use cKDTree's sparse_distance_matrix, which can output a coo_matrix
from scipy.spatial import cKDTree
X_cTree = cKDTree(X)
Y_cTree = cKDTree(Y)
D_coo = X_cTree.sparse_distance_matrix(Y_cTree, r=r, output_type='coo_matrix')
r_i = D_coo.row
r_j = D_coo.col
r_d = D_coo.data
r_d = D_coo.data
This is an extraordinarily flexible format for the distance matrix, as it remains an actual (sparse) matrix and, converted to csr, can also be used for many vectorized operations.
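For example (a sketch; note that sparse_distance_matrix may also store explicit zeros for coincident points), the csr form gives per-point neighbor counts almost for free:
D_csr = D_coo.tocsr()
counts = np.diff(D_csr.indptr)   # number of points of Y within r of each X[i]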

Euler Angles and Rotation Matrix from two 3D points

I am trying to find the Euler angles that allow the transformation from point A to point B in 3D space.
Consider the normalized vectors A = [1, 0, 0] and B = [0.32, 0.88, -0.34].
I understand that by computing the cross product A × B I get the rotation axis. The angle between A and B is given by atan2(||A × B||, A·B), where A·B is the dot product between A and B.
This gives me the rotation vector rotvec = [0, 0.36, 0.93, 1.24359531111], which is rotvec = [A × B; angle] (the cross product is normalized).
Now my question is: How do I move from here to get the Euler angles that correspond to the transformation from A to B?
In MATLAB the function vrrotvec2mat receives as input a rotation vector and outputs a rotation matrix. Then the function rotm2eul should return the corresponding Euler angles. I get the following result (in radians): [0.2456 0.3490 1.2216], according to the XYZ convention. Yet, this is not the expected result.
The correct answer is [0 0.3490 1.2216] that corresponds to a rotation of 20° and 70° in Y and Z, respectively.
When I use eul2rot([0 0.3490 1.2216]) (with eul2rot taken from here) to verify the resulting rotation matrix, it differs from the one I obtain with vrrotvec2mat(rotvec).
I also have a Python snippet that yields exactly the same results as described above.
--- Python (2.7) using transforms3d ---
import math
import numpy as np
import sklearn.preprocessing
import transforms3d

A = np.array([1.0, 0.0, 0.0])
B = np.array([0.32, 0.88, -0.34])

cross = np.cross(A, B)
dot = np.dot(A, B.transpose())
angle = math.atan2(np.linalg.norm(cross), dot)
rotation_axes = sklearn.preprocessing.normalize(cross[None, :])   # normalize expects a 2-D array
rotation_m = transforms3d.axangles.axangle2mat(rotation_axes[0], angle, True)
rotation_angles = transforms3d.euler.mat2euler(rotation_m, 'sxyz')
What am I missing here? What should I be doing instead?
Thank you
A rotation matrix has 3 degrees of freedom but the constraints of your problem only constrain 2 of those degrees.
This can be made more concrete by considering the case where we have a rotation matrix R which rotates from A to B so R*A == B. If we then construct another rotation matrix RB which rotates about vector B then applying this rotation to R*A won't have any effect, i.e. B == R*A == RB*R*A. It will, however, produce a different rotation matrix RB*R with different Euler angles.
Here's an example in MATLAB:
A = [1; 0; 0];
B = [0.32; 0.88; -0.34];
A = A / norm(A);
B = B / norm(B);
ax = cross(A, B);
ang = atan2(norm(ax), dot(A, B)); % ang = acos(dot(A, B)) works too
R = axang2rotm([ax; ang].');
ang_arbitrary = rand()*2*pi;
RB = axang2rotm([B; ang_arbitrary].');
R*A - B
RB*R*A - B
rotm2eul(R)
rotm2eul(RB*R)
Result
ans =
1.0e-15 *
-0.0555
0.1110
0
ans =
1.0e-15 *
0.2220
0.7772
-0.2776
ans =
1.2220 0.3483 0.2452
ans =
1.2220 0.3483 0.7549
I will give you a solution based on Euler's rotation theorem.
This constructs the rotation matrix from a single axis-angle pair; the Euler angles can then be derived from that matrix.
import numpy as np

a_vec = np.array([1, 0, 0]) / np.linalg.norm([1, 0, 0])
b_vec = np.array([0.32, 0.88, -0.34]) / np.linalg.norm([0.32, 0.88, -0.34])

cross = np.cross(a_vec, b_vec)
cross = cross / np.linalg.norm(cross)   # Rodrigues' formula needs a unit axis
ab_angle = np.arccos(np.dot(a_vec, b_vec))

vx = np.array([[0, -cross[2], cross[1]],
               [cross[2], 0, -cross[0]],
               [-cross[1], cross[0], 0]])
R = np.identity(3) * np.cos(ab_angle) \
    + (1 - np.cos(ab_angle)) * np.outer(cross, cross) \
    + np.sin(ab_angle) * vx
validation = np.matmul(R, a_vec)        # should recover b_vec
This uses the normalized cross product as the common axis of rotation (the eigenvector of the rotation matrix, in this case).
The matrix R is then the rotation matrix.
This is a general way of doing it, and very simple.
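For reference, here is an alternative sketch of the same axis-angle pipeline using scipy.spatial.transform (assuming scipy >= 1.2):
import numpy as np
from scipy.spatial.transform import Rotation

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.32, 0.88, -0.34])
b /= np.linalg.norm(b)

axis = np.cross(a, b)
angle = np.arctan2(np.linalg.norm(axis), np.dot(a, b))
axis /= np.linalg.norm(axis)

R = Rotation.from_rotvec(axis * angle)
print(R.apply(a))          # ~ b
print(R.as_euler('xyz'))   # one of many valid Euler decompositions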
