I have two HxW matrices A and B. I'd like to get an NxHxW matrix C such that C[0]=A, C[-1]=B, and each of the remaining N-2 slices are linearly interpolated between A and B. Is there a single numpy function I can do this with, without needing a for loop?
Just use linspace if you are looking for linear interpolation between just 2 points.
A = np.array([[0,1],
[2,3]])
B = np.array([[1, 3],
[-1,-2]])
C = np.linspace(A,B,4) #<- Change 4 to N: this gives A, B and the N-2 linearly interpolated slices between them
C
array([[[ 0. , 1. ], #<-- A matrix is C[0]
[ 2. , 3. ]],
[[ 0.33333333, 1.66666667],
[ 1. , 1.33333333]], #
#<-- Elementwise equally spaced values
[[ 0.66666667, 2.33333333], #
[ 0. , -0.33333333]],
[[ 1. , 3. ], #<-- B matrix is C[-1]
[-1. , -2. ]]])
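To tie this back to the N in the question, here is a minimal sketch (the names `N`, `t` and `C_manual` are mine, not from the post). Passing arrays as the endpoints of `np.linspace` needs NumPy 1.16 or later; the broadcasting version below it should give the same result on older releases.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 3]])
B = np.array([[1, 3],
              [-1, -2]])
N = 4  # total number of slices, including A and B themselves

# Array-valued endpoints: the new axis is added in front, so C has shape (N, H, W)
C = np.linspace(A, B, N)

# Same thing spelled out with broadcasting
t = np.linspace(0.0, 1.0, N)[:, None, None]   # shape (N, 1, 1)
C_manual = A + t * (B - A)                    # shape (N, H, W)
```

Both give `C[0]` equal to A and `C[-1]` equal to B, with the N-2 slices in between evenly spaced elementwise.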
Suppose a 2d array is given as:
arr = array([[1, 1, 1],
[4, 5, 8],
[2, 6, 9]])
if point=array([1,1]) is given then I want to calculate the euclidean distance from all indices of arr to point (1,1). The result should be
array([[1.41 , 1. , 1.41],
[1. , 0. , 1. ],
[1.41 , 1. , 1.41]])
A for loop is too slow for these computations. Is there a faster way to do this using numpy or scipy?
Thanks!!!
Approach #1
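You can use scipy.ndimage.distance_transform_edt (older SciPy versions also expose it under scipy.ndimage.morphology) -
from scipy.ndimage import distance_transform_edt
def distmat(a, index):
    mask = np.ones(a.shape, dtype=bool)
    mask[index[0],index[1]] = False
    return distance_transform_edt(mask)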
Approach #2
Another with NumPy-native tools -
def distmat_v2(a, index):
    i,j = np.indices(a.shape, sparse=True)
    return np.sqrt((i-index[0])**2 + (j-index[1])**2)
Sample run -
In [60]: a
Out[60]:
array([[1, 1, 1],
[4, 5, 8],
[2, 6, 9]])
In [61]: distmat(a, index=[1,1])
Out[61]:
array([[1.41421356, 1. , 1.41421356],
[1. , 0. , 1. ],
[1.41421356, 1. , 1.41421356]])
In [62]: distmat_v2(a, index=[1,1])
Out[62]:
array([[1.41421356, 1. , 1.41421356],
[1. , 0. , 1. ],
[1.41421356, 1. , 1.41421356]])
Benchmarking
Other proposed solution(s) :
# https://stackoverflow.com/a/61629292/3293881 #Ehsan
def norm_method(arr, point):
    point = np.asarray(point)
    return np.linalg.norm(np.indices(arr.shape, sparse=True)-point)
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
In [66]: import benchit
In [76]: funcs = [distmat, distmat_v2, norm_method]
In [77]: inputs = {n:(np.random.rand(n,n),[1,1]) for n in [3,10,50,100,500,1000,2000,5000]}
In [83]: T = benchit.timings(funcs, inputs, multivar=True, input_name='Length')
In [84]: T.plot(logx=True, colormap='Dark2', savepath='plot.png')
So distmat_v2 seems to be doing really well. We can improve on it further by leveraging numexpr.
Extend to array of indices
We could extend the listed solutions to the more general case of a list/array of indices, where each position gets the euclidean distance to its nearest listed index, like so -
def distmat_indices(a, indices):
    indices = np.atleast_2d(indices)
    mask = np.ones(a.shape, dtype=bool)
    mask[indices[:,0],indices[:,1]] = False
    return distance_transform_edt(mask)
def distmat_indices_v2(a, indices):
    indices = np.atleast_2d(indices)
    i,j = np.indices(a.shape, sparse=True)
    return np.sqrt(((i-indices[:,0])[...,None])**2 + (j-indices[:,1,None])**2).min(1)
Sample run -
In [143]: a = np.random.rand(4,5)
In [144]: distmat_indices(a, indices=[[2,2],[0,3]])
Out[144]:
array([[2.82842712, 2. , 1. , 0. , 1. ],
[2.23606798, 1.41421356, 1. , 1. , 1.41421356],
[2. , 1. , 0. , 1. , 2. ],
[2.23606798, 1.41421356, 1. , 1.41421356, 2.23606798]])
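The sample run only exercises distmat_indices, so here is a small sketch that checks the broadcasting in distmat_indices_v2 against a plain double loop. The helper `brute_force` and the names `pts` and `d` are mine and only there for the comparison; `sparse=True` needs NumPy 1.17+.

```python
import numpy as np

def distmat_indices_v2(a, indices):
    indices = np.atleast_2d(indices)               # shape (K, 2)
    i, j = np.indices(a.shape, sparse=True)        # shapes (H, 1) and (1, W)
    di = (i - indices[:, 0])[..., None]            # (H, K, 1): row offsets
    dj = j - indices[:, 1, None]                   # (K, W): column offsets
    return np.sqrt(di**2 + dj**2).min(axis=1)      # (H, K, W) -> nearest over K

def brute_force(a, indices):
    # Slow reference: distance from every cell to its nearest listed index
    out = np.empty(a.shape)
    for r in range(a.shape[0]):
        for c in range(a.shape[1]):
            out[r, c] = min(np.hypot(r - p, c - q) for p, q in indices)
    return out

a = np.zeros((4, 5))
pts = [[2, 2], [0, 3]]
d = distmat_indices_v2(a, pts)
```

Note that the intermediate array has shape (H, K, W), so memory grows with the number of indices; for many indices the distance_transform_edt version is probably the safer choice.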
On top of @Divakar's good solutions, if you are looking for something abstract, you can use:
np.linalg.norm(np.indices(arr.shape, sparse=True)-point)
Note that it works with numpy 1.17+ (the sparse argument was added in numpy 1.17). Upgrade your numpy and enjoy.
If your numpy is older than 1.17, you can add dimensions to your point instead:
np.linalg.norm(np.indices(arr.shape)-point[:,None,None], axis=0)
output for point=np.array([1,1]) and given array in question:
[[1.41421356 1. 1.41421356]
[1. 0. 1. ]
[1.41421356 1. 1.41421356]]
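One caveat, as far as I can tell: with `sparse=True`, `np.indices` returns a tuple of differently shaped arrays, and subtracting `point` from that tuple depends on NumPy building an array out of it, which recent releases may refuse to do. A sketch that sidesteps this by unpacking the tuple (the function name `dist_from_point` is mine):

```python
import numpy as np

def dist_from_point(arr, point):
    # Open (sparse) grids of shape (H, 1) and (1, W); broadcasting fills in the rest
    i, j = np.indices(arr.shape, sparse=True)
    return np.hypot(i - point[0], j - point[1])

arr = np.array([[1, 1, 1],
                [4, 5, 8],
                [2, 6, 9]])
point = np.array([1, 1])

d = dist_from_point(arr, point)

# Dense-grid version from the answer above, for comparison
d_dense = np.linalg.norm(np.indices(arr.shape) - point[:, None, None], axis=0)
```

The dense-grid form allocates a full (2, H, W) index array, while the unpacked sparse form only ever materialises the (H, W) result.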
I have this delta function, which has 3 cases: mask1, mask2, and, if neither of them is satisfied, delta = 0 (since res = np.zeros).
def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
        - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
        /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res
Then I have this other function where I call the former and I construct an array, E
def matrix_E(nk,X,Y,xhi,eta,dx,dy):
    rx = abs(X[np.newaxis,:] - xhi[:,np.newaxis])
    ry = abs(Y[np.newaxis,:] - eta[:,np.newaxis])
    deltx = delta(rx,dx)
    delty = delta(ry,dy)
    E = deltx*delty
    return E
The thing is that most of the elements of E, about 99%, fall into the third case of delta, i.e. they are 0.
So I would like to have a sparse matrix instead of a dense one, and not store the 0 elements, in order to save memory.
Any ideas on how I could do it?
The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.
The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.
Here's a sample construction using 2 masks
In [107]: x=np.arange(5)
In [108]: i,j,data=[],[],[]
In [110]: mask1=x%2==0
In [111]: mask2=x%2!=0
In [112]: i.append(x[mask1])
In [113]: j.append((x*2)[mask1])
In [114]: i.append(x[mask2])
In [115]: j.append(x[mask2])
In [116]: i=np.concatenate(i)
In [117]: j=np.concatenate(j)
In [118]: i
Out[118]: array([0, 2, 4, 1, 3])
In [119]: j
Out[119]: array([0, 4, 8, 1, 3])
In [120]: M=sparse.coo_matrix((x,(i,j)))
In [121]: print(M)
(0, 0) 0
(2, 4) 1
(4, 8) 2
(1, 1) 3
(3, 3) 4
In [122]: M.A
Out[122]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 2]])
The coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.
I can work on adapting this to your case, but this may be enough to get you started.
It looks like X,Y,xhi,eta are 1d arrays. rx and ry are then 2d. delta returns a result the same shape as its input. E = deltx*delty suggests that deltx and delty are the same shape (or at least broadcastable).
Since a sparse matrix has a .multiply method to do elementwise multiplication, we can focus on producing sparse delta matrices.
If you can afford the memory to make rx and a couple of masks, then you can also afford to make deltx (all the same size). Even though deltx has lots of zeros, it is probably fastest to make it dense.
But let's try to cast the delta calculation as a sparse build.
This looks like the essence of what you are doing in delta, at least with one mask:
start with a 2d array:
In [138]: r = np.arange(24).reshape(4,6)
In [139]: mask1 = (r>=8) & (r<=16)
In [140]: res1 = r[mask1]*0.2
In [141]: I,J = np.where(mask1)
the resulting vectors are:
In [142]: I
Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [143]: J
Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [144]: res1
Out[144]: array([ 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2])
Make a sparse matrix:
In [145]: M=sparse.coo_matrix((res1,(I,J)), r.shape)
In [146]: M.A
Out[146]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I could make another sparse matrix with mask2, and add the two.
In [147]: mask2 = (r>=17) & (r<=22)
In [148]: res2 = r[mask2]*-0.4
In [149]: I,J = np.where(mask2)
In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
In [151]: M2.A
Out[151]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
...
In [153]: (M1+M2).A
Out[153]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Or I could concatenate the res1 and res2, etc and make one sparse matrix:
In [156]: I1,J1 = np.where(mask1)
In [157]: I2,J2 = np.where(mask2)
In [158]: res12=np.concatenate((res1,res2))
In [159]: I12=np.concatenate((I1,I2))
In [160]: J12=np.concatenate((J1,J2))
In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
In [162]: M12.A
Out[162]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, -6.8],
[-7.2, -7.6, -8. , -8.4, -8.8, 0. ]])
Here I chose the masks so the nonzero values don't overlap, but both methods would work if they did. It's a deliberate design feature of the coo format that values for repeated indices are summed. It's a very handy feature when creating sparse matrices for finite element problems.
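A tiny illustration of that summing behaviour, with made-up numbers (none of these values come from the question):

```python
import numpy as np
from scipy import sparse

# Three of the four entries target the same position (0, 1)
rows = np.array([0, 0, 1, 0])
cols = np.array([1, 1, 2, 1])
vals = np.array([1.0, 2.0, 5.0, 4.0])

M = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 3))

# Duplicates are added together on conversion: (0, 1) becomes 1 + 2 + 4
dense = M.toarray()
M_csr = M.tocsr()
```

This is what lets you concatenate contributions from several masks (or several finite elements) without first checking whether they touch the same cell.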
I can also get index arrays by creating a sparse matrix from the mask:
In [179]: rmask1=sparse.coo_matrix(mask1)
In [180]: rmask1.row
Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
In [181]: rmask1.col
Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
Out[184]:
array([[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1.6, 1.8, 2. , 2.2],
[ 2.4, 2.6, 2.8, 3. , 3.2, 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
I can't, though, create a mask such as (r>=8) & (r<=16) from a sparse version of r. That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.
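Putting the pieces together for the question's delta, a rough sketch could look like the following. The names `delta_sparse` and `delta_dense`, and the random test data, are mine; this assumes r is non-negative, as it is in matrix_E where it comes from abs(...).

```python
import numpy as np
from scipy import sparse

def delta_dense(r, dr):
    # The question's delta, kept only as a reference for the comparison below
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5 - 3*np.abs(r[mask1])/dr
                  - np.sqrt(-3*(1 - np.abs(r[mask1])/dr)**2 + 1)) / (6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1 + np.sqrt(-3*(r[mask2]/dr)**2 + 1)) / (3*dr)
    return res

def delta_sparse(r, dr):
    # Same two cases, but only the selected values and their coordinates are kept
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    mask2 = ~mask1 & (r <= 0.5*dr)
    r1, r2 = np.abs(r[mask1]), r[mask2]
    v1 = (5 - 3*r1/dr - np.sqrt(-3*(1 - r1/dr)**2 + 1)) / (6*dr)
    v2 = (1 + np.sqrt(-3*(r2/dr)**2 + 1)) / (3*dr)
    i1, j1 = np.nonzero(mask1)
    i2, j2 = np.nonzero(mask2)
    data = np.concatenate((v1, v2))
    rows = np.concatenate((i1, i2))
    cols = np.concatenate((j1, j2))
    return sparse.coo_matrix((data, (rows, cols)), shape=r.shape).tocsr()

# Made-up 1d inputs, just to exercise the functions
rng = np.random.default_rng(0)
X, xhi = rng.random(50), rng.random(20)
Y, eta = rng.random(50), rng.random(20)
dx = dy = 0.05

rx = np.abs(X[np.newaxis, :] - xhi[:, np.newaxis])
ry = np.abs(Y[np.newaxis, :] - eta[:, np.newaxis])

E_sparse = delta_sparse(rx, dx).multiply(delta_sparse(ry, dy))
E_dense = delta_dense(rx, dx) * delta_dense(ry, dy)
```

As noted above, rx and the masks are still dense, so this only saves memory in what you keep afterwards, not in the peak usage while building it.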
I create an arbitrary 2x2 matrix:
In [87]: mymat = np.matrix([[2,4],[5,3]])
In [88]: mymat
Out[88]:
matrix([[2, 4],
[5, 3]])
I attempt to calculate eigenvectors using numpy.linalg.eig:
In [91]: np.linalg.eig(mymat)
Out[91]:
(array([-2., 7.]),
matrix([[-0.70710678, -0.62469505],
[ 0.70710678, -0.78086881]]))
In [92]: eigvec = np.linalg.eig(mymat)[1][0].T
In [93]: eigvec
Out[93]:
matrix([[-0.70710678],
[-0.62469505]])
I multiply one of my eigenvectors with my matrix expecting the result to be a vector that is a scalar multiple of my eigenvector.
In [94]: mymat * eigvec
Out[94]:
matrix([[-3.91299375],
[-5.40961905]])
However it is not. Can anyone explain to me what is going wrong here?
From the documentation for linalg.eig:
v : (..., M, M) array
The normalized (unit "length") eigenvectors, such that the
column v[:,i] is the eigenvector corresponding to the
eigenvalue w[i].
You want the columns, not the rows.
>>> mymat = np.matrix([[2,4],[5,3]])
>>> vals, vecs = np.linalg.eig(mymat)
>>> vecs[:,0]
matrix([[-0.70710678],
[ 0.70710678]])
>>> (mymat * vecs[:,0])/vecs[:,0]
matrix([[-2.],
[-2.]])
>>> vecs[:,1]
matrix([[-0.62469505],
[-0.78086881]])
>>> (mymat * vecs[:,1])/vecs[:,1]
matrix([[ 7.],
[ 7.]])
No, it's true. numpy does not work correctly. Example:
A
Out[194]:
matrix([[-3, 3, 2],
[ 1, -1, -2],
[-1, -3, 0]])
E = np.linalg.eig(A)
E
Out[196]:
(array([ 2., -4., -2.]),
matrix([[ -2.01889132e-16, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, -3.71551690e-16],
[ -8.32050294e-01, 2.73252305e-17, 4.47213595e-01]]))
A*E[1] / E[1]
Out[205]:
matrix([[ 6.59900617, -4. , -2. ],
[ 2. , -4. , -3.88449298],
[ 2. , 8.125992 , -2. ]])
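Looking at that output more closely, the three entries that are off, at positions (0,0), (1,2) and (2,1), are exactly the positions where E[1] holds values around 1e-16, i.e. zeros up to rounding. Dividing by those is not meaningful, which suggests the elementwise division rather than eig is what goes wrong here. A sketch of a check that avoids the division (variable names are mine):

```python
import numpy as np

A = np.array([[-3, 3, 2],
              [1, -1, -2],
              [-1, -3, 0]], dtype=float)

w, V = np.linalg.eig(A)

# A v_k = w_k v_k for every column k; V * w scales column k by w[k]
residual = A @ V - V * w
ok = np.allclose(residual, 0.0)
```

Comparing A @ V with V * w up to a tolerance stays well defined even when an eigenvector has components that are zero.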
I'm working on a project that involves a lot of matrix computation.
I'm looking for a smart way to speed up my code. In my project, I'm dealing with a sparse matrix of size 100Mx1M with around 10M non-zero values. The example below is just to illustrate my point.
Let's say I have:
A vector v of size (2)
A vector c of size (3)
A sparse matrix X of size (2,3)
v = np.asarray([10, 20])
c = np.asarray([ 2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data,(row,col)), shape=(2,3))
X.todense()
# matrix([[0, 1, 1],
# [1, 0, 1]])
Currently I'm doing:
result = np.zeros(v.shape)  # float, so the in-place += below accepts float data
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
#matrix([[ 0., 10., 10.],
# [ 20., 0., 20.]])
# At this point tmp is csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    result += x_i.data * ( c[x_i.indices] - x_i.data)
    # I only want to do the subtraction on non-zero elements
print(result)
# array([-430., -380.])
And my problem is the for loop and especially the subtraction.
I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.
Something to get directly the sparse matrix on the subtraction:
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
Is there a way to do this smartly ?
You don't need to loop over the rows to do what you are already doing. And you can use a similar trick to perform the multiplication of the rows by the first vector:
import scipy.sparse as sps
# X needs to be in CSR format for indptr and indices to exist; use X = X.tocsr() if it is not
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
To see that it works, set up some dummy data
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
and after running the code above:
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
The way you are accumulating your result requires that there are the same number of non-zero entries in every row, which seems a pretty weird thing to do. Are you sure that is what you are after? If that's really what you want you could get that value with something like:
result = np.sum(Y.data.reshape(Y.shape[0], -1), axis=0)
but I have trouble believing that is really what you are after...
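For what it's worth, here is a sketch of the same idea applied to the small example from the question, so the numbers can be compared directly. The names `scaled`, `diff` and `D` are mine; X is converted to CSR first, and the last line only makes sense when every row has the same number of non-zero entries.

```python
import numpy as np
from scipy import sparse

v = np.array([10, 20])
c = np.array([2, 3, 4])
X = sparse.coo_matrix((np.ones(4), ([0, 0, 1, 1], [1, 2, 0, 2])),
                      shape=(2, 3)).tocsr()

nnz_per_row = np.diff(X.indptr)
scaled = X.data * np.repeat(v, nnz_per_row)   # stored entries of diag(v) * X
diff = c[X.indices] - scaled                  # c[j] - value, on stored entries only

# Sparse matrix holding the subtraction, same sparsity pattern as X
D = sparse.csr_matrix((diff, X.indices, X.indptr), shape=X.shape)

# What the loop in the question accumulates
result = (scaled * diff).reshape(X.shape[0], -1).sum(axis=0)
```

Here D.toarray() reproduces the matrix asked for in the question, and result matches the loop's [-430, -380].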