I'm looking for efficient alternative ways to compute the angle between 2D vectors. Your insights on this problem would be much appreciated.
Problem Statement:
vectors is a 2D array in which the vectors are stored. Its shape is (N, 2), where N is the number of vectors; vectors[:, 0] holds the x-components and vectors[:, 1] the y-components.
I have to find the angle between every pair of vectors in vectors. For example, if vectors contains three vectors A, B, and C, I need the angles between A and B, B and C, and A and C.
I have implemented it and want to know about alternative approaches.
Current Implementation:
import numpy as np

vectors = np.array([[1, 3], [2, 4], [3, 5]])
vec_x = vectors[:, 0]
vec_y = vectors[:, 1]
a1 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_x
a2 = np.ones([vec_x.shape[0], vec_x.shape[0]]) * vec_y
a1b1 = a1 * a1.T
a2b2 = a2 * a2.T
mask = np.triu_indices(a1b1.shape[0], 0)  # zero out the upper triangle (incl. diagonal); only the lower triangle is of interest
a1b1[mask] = 0
a2b2[mask] = 0
numer = a1b1 + a2b2
denom = np.ones([vec_x.shape[0], vec_x.shape[0]]) * np.sqrt(np.square(a1) + np.square(a2))
denom = denom * denom.T
denom[mask] = 0
eps = 1e-7
dot_res = np.rad2deg(np.arccos(np.divide(numer, denom + eps)))
dot_res[mask] = 0
print(dot_res)
Output:
[[ 0.          0.          0.        ]
 [ 8.13010519  0.          0.        ]
 [12.52880911  4.39870821  0.        ]]
Questions:
Is there an alternative way to do this more efficiently?
Can we improve the speed of the current version in some way?
Use scipy.spatial.distance.pdist:
import numpy as np
import scipy.spatial.distance
vectors = np.array([[1, 3], [2, 4], [3, 5]])
# Compute cosine distance
dist = scipy.spatial.distance.pdist(vectors, 'cosine')
# Compute angles
angle = np.rad2deg(np.arccos(1 - dist))
# Make it into a matrix
angle_matrix = scipy.spatial.distance.squareform(angle)
print(angle_matrix)
# [[ 0. 8.13010235 12.52880771]
# [ 8.13010235 0. 4.39870535]
# [12.52880771 4.39870535 0. ]]
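If you want to stay in plain NumPy, the same pairwise angles can be obtained with a single matrix product: normalize each row to unit length, and the Gram matrix of the normalized rows then holds every pairwise cosine. A minimal sketch (np.clip guards against floating-point values slightly outside [-1, 1] before arccos):
import numpy as np

vectors = np.array([[1, 3], [2, 4], [3, 5]], dtype=float)

# Normalize every row to unit length; the Gram matrix of the
# unit rows then contains all pairwise cosines at once.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
cos = unit @ unit.T
angles = np.rad2deg(np.arccos(np.clip(cos, -1.0, 1.0)))
print(np.tril(angles, k=-1))  # keep only the lower triangle, as in the original
# [[ 0.          0.          0.        ]
#  [ 8.13010235  0.          0.        ]
#  [12.52880771  4.39870535  0.        ]]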
# Table with weight 2/3
a = np.array(
[[0, 0],
[12, 12]]
)
# Table with weight 1/3
b = np.array(
[[12, 6],
[9, 3]]
)
# Returned table
c = np.array(
[[4, 2],
[11, 9]]
)
I have a and b (each holding some points), and given their weights I want to efficiently compute the matrix c holding the pairwise weighted averages of the points, something like their weighted centroid.
How can I do that?
Thanks
You can use basic elementwise NumPy operations:
c = a * weight_a + b * weight_b
# With your example :
c = a * 2 / 3 + b * 1 / 3
# array([[ 4., 2.],
# [11., 9.]])
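Equivalently, and generalizing to any number of tables, you could let np.average apply the weights for you. A small sketch with the same data:
import numpy as np

a = np.array([[0, 0], [12, 12]])
b = np.array([[12, 6], [9, 3]])

# Stack the tables along a new leading axis and average along it,
# weighting each table accordingly.
c = np.average(np.stack([a, b]), axis=0, weights=[2 / 3, 1 / 3])
# array([[ 4.,  2.],
#        [11.,  9.]])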
I'm trying to multiply a row of a matrix by a number.
This is my code:
def multiply_rows(m, r, x):
    for i in range(r, r + 1):
        for j in range(0, m.shape[1]):
            m[i, j] = m[i, j] * (1 / float(x))
    return m
This is what the console gave me:
multiply_rows(numpy.array([[1,2],[3,4]]),1, 4)
array([[1, 2],
[0, 1]])
I can't understand why it shows a 0 instead of 0.75. All help is welcome.
You can just do m * (1 / float(x)) and NumPy will take care of multiplying every element (note this scales the whole matrix; see the sketch below for a single row). The 0 shows up because your array has an integer dtype: when the float result 0.75 is assigned back into m[i, j], it is truncated to 0.
m = np.array([[1, 2], [3, 4]])
m * (1 / 4)
# Output:
# array([[0.25, 0.5 ],
#        [0.75, 1.  ]])
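If you want to keep a function that scales only row r, a sketch that avoids both the loops and the truncation is to convert the array to floating point first (m.astype(float) returns a copy, so the caller's array is left untouched):
import numpy as np

def multiply_row(m, r, x):
    # Work in floating point so values like 0.75 are not
    # truncated by an integer dtype.
    m = m.astype(float)
    m[r] *= 1.0 / x  # scale only row r, no explicit loops
    return m

print(multiply_row(np.array([[1, 2], [3, 4]]), 1, 4))
# [[1.   2.  ]
#  [0.75 1.  ]]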
I have an array A whose shape is (N, N, K) and I would like to compute another array B with the same shape where B[:, :, i] = np.linalg.inv(A[:, :, i]).
As solutions, I see map and for loops, but I am wondering if NumPy provides a function to do this (I have tried np.apply_over_axes but it seems it can only handle 1D arrays).
with a for loop:
B = np.zeros(shape=A.shape)
for i in range(A.shape[2]):
B[:, :, i] = np.linalg.inv(A[:, :, i])
with map:
# list() is needed on Python 3, where map returns a lazy iterator
B = np.asarray(list(map(np.linalg.inv, np.squeeze(np.dsplit(A, A.shape[2]))))).transpose(1, 2, 0)
For an invertible matrix M we have inv(M).T == inv(M.T) (the transpose of the inverse is equal to the inverse of the transpose).
Since np.linalg.inv operates on stacks of matrices (it treats the last two axes as the matrix dimensions), your problem can be solved by simply transposing A, calling inv, and transposing the result:
B = np.linalg.inv(A.T).T
For example:
>>> N, K = 2, 3
>>> A = np.random.randint(1, 5, (N, N, K))
>>> A
array([[[4, 2, 3],
[2, 3, 1]],
[[3, 3, 4],
[4, 4, 4]]])
>>> B = np.linalg.inv(A.T).T
>>> B
array([[[ 0.4 , -4. , 0.5 ],
[-0.2 , 3. , -0.125]],
[[-0.3 , 3. , -0.5 ],
[ 0.4 , -2. , 0.375]]])
You can check the values of B match the inverses of the arrays in A as expected:
>>> all(np.allclose(B[:, :, i], np.linalg.inv(A[:, :, i])) for i in range(K))
True
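An equivalent variant, sketched here, does not rely on the inverse/transpose identity: move the stacking axis to the front, where np.linalg.inv expects it, and move it back afterwards (np.moveaxis requires NumPy 1.11+):
# (K, N, N) -> invert each matrix -> back to (N, N, K)
B = np.moveaxis(np.linalg.inv(np.moveaxis(A, -1, 0)), 0, -1)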
It's a Python question: let's say I have an (m+1)-dimensional NumPy array a of non-negative numbers, and I would like to obtain an array b of the same shape in which the entries along the last axis are normalized so that they sum to 1, or to zero in case all of them were zeros. For example, if m = 2, my code would be as follows:
import numpy as np
a = np.array([[[ 0.34 , 0.66],
[ 0.75 , 0.25]],
[[ 0. , 0. ],
[ 1. , 0. ]]])
for i1 in range(len(a)):
    for i2 in range(len(a[i1])):
        s = a[i1][i2].sum()
        if s > 0:
            a[i1][i2] = a[i1][i2] / s
However, I find this method sloppy. Also, it works only for fixed m.
This can be done by broadcasting. There are several ways to take into account the zero-sum exception. Without taking it into account, you could write
import numpy as np
shape = (2, 3, 4)
X = np.random.randn(*shape) ** 2
sums = X.sum(-1)
Y = X / sums[..., np.newaxis]
Now, to take into account that some rows may sum to zero, we set one row of the data to 0:
X[0, 0, :] = 0
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
You will observe that Y.sum(axis=-1) has the entry 0 at coordinate (0, 0), reflecting the all-zero row.
EDIT: Application to the concrete example
X = np.array([[[ 0.34, 0.66],
               [ 0.75, 0.25]],
              [[ 0.  , 0.  ],
               [ 1.  , 0.  ]]])
sums = X.sum(-1)
nnz = sums != 0
Y = np.zeros_like(X)
Y[nnz, :] = X[nnz, :] / sums[nnz, np.newaxis]
yields Y == X (because along the last axis the sums are already one or zero).
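The same zero-safe normalization can also be written in a single call using the out and where arguments of the divide ufunc, a variant sketched here:
sums = X.sum(-1, keepdims=True)
# Divide only where the row sum is non-zero; the remaining entries
# keep the zeros supplied through `out`.
Y = np.divide(X, sums, out=np.zeros_like(X), where=sums != 0)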
I'm doing a project that involves a lot of matrix computation.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100M x 1M with around 10M non-zero values. The example below just illustrates my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
import numpy as np
import scipy.sparse
from scipy.sparse import coo_matrix

v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
# [1, 0, 1]])
Currently I'm doing:
result = np.zeros_like(v)
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
#matrix([[ 0., 10., 10.],
# [ 20., 0., 20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    # I only want to do the subtraction on non-zero elements
    result += x_i.data * (c[x_i.indices] - x_i.data)
print(result)
# array([-430, -380])
My problem is the for loop, and especially the subtraction.
I would like to find a way to vectorize this operation, subtracting only on the non-zero elements, and to obtain the result of the subtraction directly as a sparse matrix:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a smart way to do this?
You don't need to loop over the rows to do what you are already doing, and you can use a similar trick to multiply the rows by the first vector. Note the code below assumes X is in CSR format (which provides the indptr attribute used here); convert with X.tocsr() if needed:
import numpy as np
import scipy.sparse as sps
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and, after running the code above:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
The way you are accumulating your result requires that there be the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want, you could get that value with something like:
scaled = X.data * np.repeat(v, nnz_per_row)  # the non-zero values of the scaled matrix
result = np.sum((scaled * Y.data).reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...
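As an aside, if what you actually want is one total per row rather than one per column position, the loop can also be replaced with np.add.reduceat. A sketch, assuming X is CSR and every row has at least one non-zero (reduceat misbehaves on empty segments):
scaled = X.data * np.repeat(v, np.diff(X.indptr))   # row-scaled non-zeros
terms = scaled * (np.take(c, X.indices) - scaled)   # data * (c - data)
row_totals = np.add.reduceat(terms, X.indptr[:-1])  # one sum per row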