I have in my code the following expression:
a = (b / x[:, np.newaxis]).sum(axis=1)
where b is an ndarray of shape (M, N), and x is an ndarray of shape (M,). Now, b is actually sparse, so for memory efficiency I would like to substitute in a scipy.sparse.csc_matrix or csr_matrix. However, broadcasting in this way is not implemented and raises a NotImplementedError, even though this division is guaranteed to maintain sparsity (the entries of x are non-zero). Is there a sparse function I'm not aware of that would do what I want? (dot() would sum along the wrong axis.)
If b is in CSC format, then b.data has the non-zero entries of b, and b.indices has the row index of each of the non-zero entries, so you can do your division as:
b.data /= np.take(x, b.indices)
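To make that concrete, here is a minimal sketch (not part of the original answer; it uses the same 3x3 matrix that appears in the examples further down) of what b.data and b.indices hold and how the one-liner uses them:

import numpy as np
import scipy.sparse as sps

b = sps.csc_matrix([[1., 0., 2.],
                    [0., 3., 0.],
                    [4., 0., 5.]])
print(b.data)     # [ 1.  4.  3.  2.  5.]  -- stored values, column by column
print(b.indices)  # [0 2 1 0 2]            -- row index of each stored value

x = np.array([2., 3., 4.])
b.data /= np.take(x, b.indices)  # divide each stored value by x[its row]
print(b.toarray())               # [[0.5, 0, 1], [0, 1, 0], [1, 0, 1.25]]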
It's hackier than Warren's elegant solution, but it will probably also be faster in most settings:
import numpy as np
import scipy.sparse as sps

b = sps.rand(1000, 1000, density=0.01, format='csc')
x = np.random.rand(1000)

def row_divide_col_reduce(b, x):
    # Divide each stored entry by the x value for its row, without modifying b.
    data = b.data.copy() / np.take(x, b.indices)
    ret = sps.csc_matrix((data, b.indices.copy(), b.indptr.copy()),
                         shape=b.shape)
    return ret.sum(axis=1)

def row_divide_col_reduce_bis(b, x):
    # Multiply on the left by a sparse diagonal matrix holding 1/x.
    d = sps.spdiags(1.0/x, 0, len(x), len(x))
    return (d * b).sum(axis=1)
In [2]: %timeit row_divide_col_reduce(b, x)
1000 loops, best of 3: 210 us per loop
In [3]: %timeit row_divide_col_reduce_bis(b, x)
1000 loops, best of 3: 697 us per loop
In [4]: np.allclose(row_divide_col_reduce(b, x),
...: row_divide_col_reduce_bis(b, x))
Out[4]: True
You can cut the time almost in half in the above example if you do the division in-place, i.e.:
def row_divide_col_reduce(b, x):
    b.data /= np.take(x, b.indices)  # note: modifies b in place
    return b.sum(axis=1)
In [2]: %timeit row_divide_col_reduce(b, x)
10000 loops, best of 3: 131 us per loop
To implement a = (b / x[:, np.newaxis]).sum(axis=1), you can use a = b.sum(axis=1).A1 / x. The A1 attribute returns the 1D ndarray, so the result is a 1D ndarray rather than a matrix. This concise expression works because every entry in row i is divided by the same value x[i], so you can just as well sum each row first and divide afterward. For example:
In [190]: b
Out[190]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [191]: b.A
Out[191]:
array([[ 1., 0., 2.],
[ 0., 3., 0.],
[ 4., 0., 5.]])
In [192]: x
Out[192]: array([ 2., 3., 4.])
In [193]: b.sum(axis=1).A1 / x
Out[193]: array([ 1.5 , 1. , 2.25])
More generally, if you want to scale the rows of a sparse matrix with a vector x, you could multiply b on the left with a sparse matrix containing 1.0/x on the diagonal. The function scipy.sparse.spdiags can be used to create such a matrix. For example:
In [71]: from scipy.sparse import csc_matrix, spdiags
In [72]: b = csc_matrix([[1,0,2],[0,3,0],[4,0,5]], dtype=np.float64)
In [73]: b.A
Out[73]:
array([[ 1., 0., 2.],
[ 0., 3., 0.],
[ 4., 0., 5.]])
In [74]: x = np.array([2., 3., 4.])
In [75]: d = spdiags(1.0/x, 0, len(x), len(x))
In [76]: d.A
Out[76]:
array([[ 0.5 , 0. , 0. ],
[ 0. , 0.33333333, 0. ],
[ 0. , 0. , 0.25 ]])
In [77]: p = d * b
In [78]: p.A
Out[78]:
array([[ 0.5 , 0. , 1. ],
[ 0. , 1. , 0. ],
[ 1. , 0. , 1.25]])
In [79]: a = p.sum(axis=1)
In [80]: a
Out[80]:
matrix([[ 1.5 ],
[ 1. ],
[ 2.25]])
Say I have a 2D pytorch tensor and a 2D numpy boolean array as follows:
a = torch.tensor([[ 0.,  1.,  2.],
                  [ 3.,  4.,  5.],
                  [ 6.,  7.,  8.],
                  [ 9., 10., 11.],
                  [12., 13., 14.]])
m = numpy.array([[False,  True, False],
                 [ True, False,  True],
                 [False,  True,  True],
                 [False, False, False],
                 [ True, False, False]])
They have the same dimension and the number of True's in each column of m is the same.
I need to get the 2x3 tensor that is
a.transpose(0,1).masked_select(torch.from_numpy(m.transpose())).reshape(a.shape[1],-1).transpose(0,1)
which is
tensor([[ 3., 1., 5.],
[12., 7., 8.]])
The actual tensor is very large, and the operation needs to be performed many times, so I want to ask: what is an efficient (ideally the most efficient) way of doing this?
In my benchmarks, a jitted numba solution is the fastest I could find.
My benchmarks for a, m with shape (10000, 200) (equal result tensors):

1  @numba.jit           13.2 ms  (3.46x)
2  list comprehension   31.3 ms  (1.46x)
3  baseline             45.7 ms  (1.00x)
Generation of sufficiently large sample data for benchmarking:

import torch
import numpy as np

def generate_data(rows=500, columns=100):
    a = torch.from_numpy(np.random.uniform(1, 10, (rows, columns)).astype(np.float32))
    # argsort trick by @divakar https://stackoverflow.com/a/55317373/14277722
    def shuffle_along_axis(a, axis):
        idx = np.random.rand(*a.shape).argsort(axis=axis)
        return np.take_along_axis(a, idx, axis=axis)
    # The same binary pattern shuffled per row, so every column of m has the same number of Trues.
    m = shuffle_along_axis(np.full((columns, rows), np.random.randint(2, size=rows)), 1).astype('bool').T
    return a, np.ascontiguousarray(m)

a, m = generate_data(10000, 200)
A jitted numba implementation:

import numba as nb

@nb.njit
def gather2d(arr1, arr2):
    # One output row per True in the first mask column (every column has the same count).
    res = np.zeros((np.count_nonzero(arr2[:, 0]), arr1.shape[1]), np.float32)
    counter = np.zeros(arr1.shape[1], dtype=np.intp)
    for i in range(arr1.shape[0]):
        for j in range(arr1.shape[1]):
            if arr2[i, j]:
                res[counter[j], j] = arr1[i, j]
                counter[j] += 1
    return res

torch.from_numpy(gather2d(a.numpy(), m))
Output
# %timeit 10 loops, best of 5: 13.2 ms per loop
tensor([[2.1846, 7.8890, 8.8218, ..., 4.8309, 9.2853, 6.4404],
[5.8842, 3.7332, 6.7436, ..., 1.2914, 3.2983, 3.5627],
[9.5128, 2.4283, 2.2152, ..., 4.9512, 9.7335, 9.6252],
...,
[7.3193, 7.8524, 9.6654, ..., 3.3665, 8.8926, 4.7660],
[1.3829, 1.3347, 6.6436, ..., 7.1956, 4.0446, 6.4633],
[6.4264, 3.6283, 3.6385, ..., 8.4152, 5.8498, 5.0281]])
Against a vectorized baseline solution
# %timeit 10 loops, best of 5: 45.7 ms per loop
a.gather(0, torch.from_numpy(np.nonzero(m.T)[1].reshape(-1, m.shape[1], order='F')))
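For readers unpacking that one-liner, here is the same computation split into commented steps (a sketch; the intermediate names are mine):

rows = np.nonzero(m.T)[1]                      # row indices of the True entries, grouped per column of m
idx = rows.reshape(-1, m.shape[1], order='F')  # column j now lists the rows where column j of m is True
out = a.gather(0, torch.from_numpy(idx))       # out[i, j] = a[idx[i, j], j]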
A Python list comprehension turns out to be surprisingly fast:
def g(arr1, arr2):
    # For each (values, mask) column pair, keep the masked values.
    return np.array([i[j] for i, j in zip(arr1.T, arr2.T)]).T
# %timeit 10 loops, best of 5: 31.3 ms per loop
torch.from_numpy(g(a.numpy(), m))
You can try it this way, using only NumPy and PyTorch:
b,c = m.nonzero()
b = torch.tensor(b)
c = torch.tensor(c)
a[b,c].reshape(2,3)
#output
tensor([[ 1., 3., 5.],
[ 7., 8., 12.]]) # True values are taken on axis=1
I used the same example provided by @Michael Szczesny to measure the time:
import numpy as np
import timeit
import torch
rows, columns = (10000,200)
a = torch.from_numpy(np.random.uniform(1,10, (rows,columns)).astype(np.float32))
m = np.random.choice([False, True], size=(rows, columns))
starttime = timeit.default_timer()
b,c = m.nonzero()
b = torch.tensor(b)
c = torch.tensor(c)
a[b,c]
print(f"The time difference is :{(timeit.default_timer() - starttime)*1000} ms")
#output
The time difference is : 26.4 ms
It is better than the second and third approaches of @Michael Szczesny.
I have two 1D numpy arrays A and B of size (n,) and (m,) respectively, which correspond to the x positions of points on a line. I want to calculate the distance from every point in A to every point in B. I then need to use these distances at a set y distance, d, to work out the potential at each point in A.
I'm currently using the following:
V = numpy.zeros(n)
for i in range(n):
    xdist = A[i] - B
    r = numpy.sqrt(xdist**2 + d**2)
    dV = 1/r
    V[i] = numpy.sum(dV)
This works, but for large data sets it can take a while. I would like to use a function similar to scipy.spatial.distance.cdist, but it doesn't work for 1D arrays, and I don't want to add another dimension to the arrays as they become too large.
Vectorized approach
One vectorized approach after extending A to 2D with the introduction of a new axis using np.newaxis/None and thus making use of broadcasting would be -
(1/(np.sqrt((A[:,None] - B)**2 + d**2))).sum(1)
Hybrid approach for large arrays
Now, for large arrays, we might have to divide the data into chunks.
Thus, with BSZ as the block size, we would have a hybrid approach, like so -
# Note: assumes BSZ divides n evenly.
dsq = d**2
V = np.zeros((n//BSZ, BSZ))
for i in range(n//BSZ):
    V[i] = (1/(np.sqrt((A[i*BSZ:(i+1)*BSZ, None] - B)**2 + dsq))).sum(1)
Runtime test
Approaches -
def original_app(A, B, d):
    V = np.zeros(n)
    for i in range(n):
        xdist = A[i] - B
        r = np.sqrt(xdist**2 + d**2)
        dV = 1/r
        V[i] = np.sum(dV)
    return V

def vectorized_app1(A, B, d):
    return (1/(np.sqrt((A[:,None] - B)**2 + d**2))).sum(1)

def vectorized_app2(A, B, d, BSZ=100):
    dsq = d**2
    V = np.zeros((n//BSZ, BSZ))
    for i in range(n//BSZ):
        V[i] = (1/(np.sqrt((A[i*BSZ:(i+1)*BSZ, None] - B)**2 + dsq))).sum(1)
    return V.ravel()
Timings and verification -
In [203]: # Setup inputs
...: n,m = 10000,2000
...: A = np.random.rand(n)
...: B = np.random.rand(m)
...: d = 10
...:
In [204]: out1 = original_app(A,B,d)
...: out2 = vectorized_app1(A,B,d)
...: out3 = vectorized_app2(A,B,d, BSZ = 100)
...:
...: print np.allclose(out1, out2)
...: print np.allclose(out1, out3)
...:
True
True
In [205]: %timeit original_app(A,B,d)
10 loops, best of 3: 133 ms per loop
In [206]: %timeit vectorized_app1(A,B,d)
10 loops, best of 3: 138 ms per loop
In [207]: %timeit vectorized_app2(A,B,d, BSZ = 100)
10 loops, best of 3: 65.2 ms per loop
We can play around with the parameter block size BSZ -
In [208]: %timeit vectorized_app2(A,B,d, BSZ = 200)
10 loops, best of 3: 74.5 ms per loop
In [209]: %timeit vectorized_app2(A,B,d, BSZ = 50)
10 loops, best of 3: 67.4 ms per loop
Thus, the best one seems to be giving a 2x speedup with a block size of 100 at my end.
EDIT: My answer turned out to be nearly identical to Divakar's after a closer look. However, you can save some memory by doing the operations in-place. Taking the sum along the second axis is more efficient than along the first.
import numpy
d = 10.  # assumed value for the fixed y distance from the question
a = numpy.random.randint(0, 10, 10) * 1.
b = numpy.random.randint(0, 10, 10) * 1.
xdist = a[:,None] - b
xdist **= 2
xdist += d**2
xdist **= -0.5   # in-place 1/sqrt(xdist**2 + d**2)
V = numpy.sum(xdist, axis=1)
which gives the same solution as your code.
I would like to use a function similar to scipy.spatial.distance.cdist which doesn't work for 1D arrays and I don't want to add another dimension to the arrays as they become too large.
cdist works fine; you just have to reshape the arrays to have shape (n, 1) instead of (n,). You can add another dimension to a one-dimensional array A without copying the underlying data by using A[:, None] or A.reshape(-1, 1).
For example,
In [56]: from scipy.spatial.distance import cdist
In [57]: A
Out[57]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [58]: B
Out[58]: array([0, 2, 4, 6, 8])
In [59]: A[:, None]
Out[59]:
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
In [60]: cdist(A[:, None], B[:, None])
Out[60]:
array([[ 0., 2., 4., 6., 8.],
[ 1., 1., 3., 5., 7.],
[ 2., 0., 2., 4., 6.],
[ 3., 1., 1., 3., 5.],
[ 4., 2., 0., 2., 4.],
[ 5., 3., 1., 1., 3.],
[ 6., 4., 2., 0., 2.],
[ 7., 5., 3., 1., 1.],
[ 8., 6., 4., 2., 0.],
[ 9., 7., 5., 3., 1.]])
To compute V as shown in your code, you can use cdist with metric='sqeuclidean', as follows:
In [72]: d = 3.
In [73]: r = np.sqrt(cdist(A[:,None], B[:,None], metric='sqeuclidean') + d**2)
In [74]: V = (1/r).sum(axis=1)
So I'm trying to take the dot product of two arrays using numpy's dot product function.
import numpy as np
MWFrPos_Hydro1 = subPos1[submaskFirst1]
x = MWFrPos_Hydro1
MWFrVel_Hydro1 = subVel1[submaskFirst1]
y = MWFrVel_Hydro1
MWFrPosMag_Hydro1 = [np.linalg.norm(i) for i in MWFrPos_Hydro1]
np.dot(x, y)
returns
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-135-9ef41eb4235d> in <module>()
6
7
----> 8 np.dot(x, y)
ValueError: shapes (1220,3) and (1220,3) not aligned: 3 (dim 1) != 1220 (dim 0)
Am I using this function improperly?
The arrays look like this
print x
[[ 51.61872482 106.19775391 69.64765167]
[ 33.86419296 11.75729942 11.84990311]
[ 12.75009823 58.95491028 38.06708527]
...,
[ 99.00266266 96.0210495 18.79844856]
[ 27.18083954 74.35041809 78.07577515]
[ 19.29788399 82.16114044 1.20453501]]
print y
[[ 40.0402298 -162.62153625 -163.00158691]
[-359.41983032 -115.39328766 14.8419466 ]
[ 95.92044067 -359.26425171 234.57330322]
...,
[ 130.17840576 -7.00977898 42.09699249]
[ 37.37852478 -52.66002655 -318.15155029]
[ 126.1726532 121.3104248 -416.20855713]]
Would a for loop with np.vdot be more optimal in this circumstance?
You can't take the dot product of two n * m matrices unless m == n -- when multiplying two matrices A and B, A needs to have as many columns as B has rows. (So you can multiply an n * m matrix with an m * n matrix.)
See this article on multiplying matrices.
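As a quick sketch of that rule with the shapes from the question (dummy data, since the original arrays aren't available):

import numpy as np

x = np.random.rand(1220, 3)
y = np.random.rand(1220, 3)

# np.dot(x, y) fails: the inner dimensions 3 and 1220 don't match.
print(np.dot(x, y.T).shape)   # (1220, 1220) -- inner dimensions 3 == 3
print(np.dot(x.T, y).shape)   # (3, 3)       -- inner dimensions 1220 == 1220

# If what you actually want is one scalar per row pair, compute a row-wise dot product:
print(np.einsum('ij,ij->i', x, y).shape)   # (1220,)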
Some possible products for (n,3) arrays (here I'll use just one):
In [434]: x=np.arange(12.).reshape(4,3)
In [435]: x
Out[435]:
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]])
Element-by-element product, summed across the columns, giving n values. This is a magnitude-like number.
In [436]: (x*x).sum(axis=1)
Out[436]: array([ 5., 50., 149., 302.])
Same thing with einsum, which gives more control over which axes are multiplied, and which are summed.
In [437]: np.einsum('ij,ij->i',x,x)
Out[437]: array([ 5., 50., 149., 302.])
dot requires the last axis of the 1st argument and the 2nd-to-last axis of the 2nd to have the same size, so I have to use x.T (transpose). The diagonal matches the above.
In [438]: np.dot(x,x.T)
Out[438]:
array([[ 5., 14., 23., 32.],
[ 14., 50., 86., 122.],
[ 23., 86., 149., 212.],
[ 32., 122., 212., 302.]])
np.einsum('ij,kj',x,x) does the same thing.
There is a new matmul product (the @ operator), but with 2d arrays like this it is just dot. I have to turn them into 3d arrays to get the 4 values; and even then I have to squeeze out the excess dimensions:
In [450]: x[:,None,:] @ x[:,:,None]
Out[450]:
array([[[ 5.]],
[[ 50.]],
[[ 149.]],
[[ 302.]]])
In [451]: np.squeeze(_)
Out[451]: array([ 5., 50., 149., 302.])
I create an arbitrary 2x2 matrix:
In [87]: mymat = np.matrix([[2,4],[5,3]])
In [88]: mymat
Out[88]:
matrix([[2, 4],
[5, 3]])
I attempt to calculate eigenvectors using numpy.linalg.eig:
In [91]: np.linalg.eig(mymat)
Out[91]:
(array([-2., 7.]),
matrix([[-0.70710678, -0.62469505],
[ 0.70710678, -0.78086881]]))
In [92]: eigvec = np.linalg.eig(mymat)[1][0].T
In [93]: eigvec
Out[93]:
matrix([[-0.70710678],
[-0.62469505]])
I multiply one of my eigenvectors with my matrix expecting the result to be a vector that is a scalar multiple of my eigenvector.
In [94]: mymat * eigvec
Out[94]:
matrix([[-3.91299375],
[-5.40961905]])
However it is not. Can anyone explain to me what is going wrong here?
From the documentation for linalg.eig:
v : (..., M, M) array
The normalized (unit "length") eigenvectors, such that the
column v[:,i] is the eigenvector corresponding to the
eigenvalue w[i].
You want the columns, not the rows.
>>> mymat = np.matrix([[2,4],[5,3]])
>>> vals, vecs = np.linalg.eig(mymat)
>>> vecs[:,0]
matrix([[-0.70710678],
[ 0.70710678]])
>>> (mymat * vecs[:,0])/vecs[:,0]
matrix([[-2.],
[-2.]])
>>> vecs[:,1]
matrix([[-0.62469505],
[-0.78086881]])
>>> (mymat * vecs[:,1])/vecs[:,1]
matrix([[ 7.],
[ 7.]])
It can indeed look as though numpy does not work correctly. Example:
A
Out[194]:
matrix([[-3, 3, 2],
[ 1, -1, -2],
[-1, -3, 0]])
E = np.linalg.eig(A)
E
Out[196]:
(array([ 2., -4., -2.]),
matrix([[ -2.01889132e-16, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, -3.71551690e-16],
[ -8.32050294e-01, 2.73252305e-17, 4.47213595e-01]]))
A*E[1] / E[1]
Out[205]:
matrix([[ 6.59900617, -4.        , -2.        ],
        [ 2.        , -4.        , -3.88449298],
        [ 2.        ,  8.125992  , -2.        ]])
But look at where the outliers fall: each one lines up with an eigenvector component that is numerically zero (on the order of 1e-16). Dividing by those components just amplifies round-off, so this element-wise check does not show that numpy's eigenvectors are wrong.
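A more reliable way to verify the eigenpairs is to multiply instead of divide, which avoids the numerically-zero components entirely. A minimal sketch:

import numpy as np

A = np.array([[-3,  3,  2],
              [ 1, -1, -2],
              [-1, -3,  0]], dtype=float)
w, v = np.linalg.eig(A)

# If the eigenpairs are correct, A @ v scales column i of v by w[i]:
print(np.allclose(A @ v, v * w))   # True -- the eigenpairs check out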
My goal is to assign the values of an existing 2D array, or create a new array, using two 2D arrays of the same shape, one with values and one with indices to assign the corresponding value to.
X = np.array([range(5),range(5)])
X
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
Y= np.array([range(5), [2,3,4,1,0]])
Y
array([[0, 1, 2, 3, 4],
[2, 3, 4, 1, 0]])
My desired output is an array of the same shape as X and Y, with the values of X placed at the indices given by the corresponding row of Y. This result can be achieved by looping through each row in the following way:
output = np.zeros(X.shape)
for i in range(X.shape[0]):
    output[i][Y[i]] = X[i]
output
array([[ 0., 1., 2., 3., 4.],
[ 4., 3., 0., 1., 2.]])
Is there a more efficient way to apply this sort of assignment?
np.take(output, Y)
will return the items in the output array that I would like to assign the values of X to, but I believe np.take does not produce a reference into the original array, and instead produces a new array.
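(np.take does indeed return a copy; a small sketch to verify, using the arrays above:

import numpy as np

output = np.zeros((2, 5))
Y = np.array([[0, 1, 2, 3, 4], [2, 3, 4, 1, 0]])

taken = np.take(output, Y)   # indexes into output's flattened view, returning a new array
taken[:] = 99.0              # modifying the result...
print(output.sum())          # ...leaves output unchanged: 0.0
)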
for i in range(X.shape[0]):
    output[i][Y[i]] = X[i]
is equivalent to
I = np.arange(X.shape[0])[:, np.newaxis]
output[I, Y] = X
For example,
X = np.array([range(5),range(5)])
Y = np.array([range(5), [2,3,4,1,0]])
output = np.zeros(X.shape)
I = np.arange(X.shape[0])[:, np.newaxis]
output[I, Y] = X
yields
>>> output
array([[ 0., 1., 2., 3., 4.],
[ 4., 3., 0., 1., 2.]])
There is not much difference in performance when the loop has few iterations.
But if X.shape[0] is large, then using indexing is much faster:
def using_loop(X, Y):
    output = np.zeros(X.shape)
    for i in range(X.shape[0]):
        output[i][Y[i]] = X[i]
    return output

def using_indexing(X, Y):
    output = np.zeros(X.shape)
    I = np.arange(X.shape[0])[:, np.newaxis]
    output[I, Y] = X
    return output
X2 = np.tile(X, (100,1))
Y2 = np.tile(Y, (100,1))
In [77]: %timeit using_loop(X2, Y2)
1000 loops, best of 3: 376 µs per loop
In [78]: %timeit using_indexing(X2, Y2)
100000 loops, best of 3: 15.2 µs per loop