In my code I have a 2D array D, and for each column I want to take the k neighboring columns on each side and sum them up. A naive approach would be a for loop, but to speed this up I tried slicing the 2D matrix for each column to get a submatrix and summing it column-wise. Surprisingly, the naive approach is faster than the slicing version for k > 6. Any suggestion on how I can make the implementation efficient?
Naive implementation:
k = 64
index = np.arange(D.shape[1])
# windows are truncated where fewer than k neighbors exist,
# e.g. near the beginning and the end of the array
index_kn = np.clip(index - k, 0, None)               # window start (inclusive)
index_kp = np.clip(index + k + 1, None, len(index))  # window end (exclusive)
Dsmear = np.empty_like(D)  # stores the summation of neighboring columns for each col
for i in range(len(index)):
    Dsmear[:, i] = np.sum(D[:, index_kn[i]:index_kp[i]], axis=1)
Slicing implementation:
D1 = np.concatenate((np.repeat(D[:,0].reshape(-1,1),k,axis=1), D, np.repeat(D[:,-1].reshape(-1,1),k,axis=1)),axis=1) #padding the edges with k columns
idx = np.asarray([np.arange(i-k,i+k+1) for i in range(k, D.shape[1]+k)], dtype=np.int32)
D_broadcast = D1[:, idx] # 3D array; is a bottleneck
Dsmear = np.sum(D_broadcast, axis=2)
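One way to sidestep the 3D intermediate entirely is a prefix sum over the columns: each window sum then costs a single subtraction, independent of k. A minimal sketch, assuming truncated (not padded) edge windows; smear_cumsum is a hypothetical helper name:
import numpy as np

def smear_cumsum(D, k):
    # prefix[:, j] holds the sum of columns 0..j-1 of D
    n = D.shape[1]
    prefix = np.zeros((D.shape[0], n + 1), dtype=D.dtype)
    prefix[:, 1:] = np.cumsum(D, axis=1)
    lo = np.clip(np.arange(n) - k, 0, n)      # window start (inclusive)
    hi = np.clip(np.arange(n) + k + 1, 0, n)  # window end (exclusive)
    return prefix[:, hi] - prefix[:, lo]      # sum over each window
This does O(1) work per column regardless of k, so it should stay fast where both versions above slow down.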
Using python/numpy, I would like to create a 2D matrix M whose components are M_{i,j} = \sum_{k=g(i,j)}^{h(i,j)} f(i,j,k) for given functions g, h and f.
I know I can do this with a bunch of for loops but is there a better way to do this by using numpy (not using for loops)?
This is how I tried it, which ends up giving me a ValueError.
I tried to first define a function that takes the sum over k:
def sum_function(i, j):
    initial_array = np.arange(g(i, j), h(i, j) + 1)
    applied_array = f(i, j, initial_array)
    return applied_array.sum()
then I tried to create the M matrix with np.mgrid as follows:
ii, jj = np.mgrid[start:fin, start:fin]
M_matrix = sum_function(ii, jj)
Edit:
Let me write down the concrete form of a matrix as an example:
M_{i,j} = \sum_{k=\min(i,j)}^{i+j} \sin\left((i+j)^k\right)
If i, j = 0, 1, then this matrix is 2 by 2 and its form will be
\bigl(\begin{smallmatrix} \sin(0) & \sin(1) \\ \sin(1) & \sin(2)+\sin(4) \end{smallmatrix}\bigr)
Now if the matrix gets really big, how would I create this matrix without using for loops?
To simplify thinking, let's ravel the i,j dimensions into one ij dimension. Can we evaluate 3 arrays:
G = g(ij) # for all ij values
H = h(ij)
F = f(ij, kk) # for all ij, and all kk
In other words, can g,h,f be evaluated at multiple values, to produce whole-arrays?
If the G and H values were the same for all ij, or subsets (preferably slices), then
F[:, G:H].sum(axis=1)
would be the value for all ij.
If the H-G difference, the size of each slice, were the same for all ij, then we could construct a 2D indexing array GH such that
F[:, GH].sum(axis=1)
In other words, we are summing constant-size windows of the rows of F.
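A sketch of that constant-width case, assuming G and H are integer arrays as above (note the explicit row indices, which pair each ij with its own window):
w = (H - G)[0]                         # assumes H - G is the same for every ij
GH = G[:, None] + np.arange(w)         # (nij, w) array of window indices
rows = np.arange(F.shape[0])[:, None]  # one row of F per ij
vals = F[rows, GH].sum(axis=1)         # window sum for every ij at once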
But if the H-G differences vary across ij, I think we are stuck with doing the sum for each ij element separately, with Python-level loops or ones compiled with numba or cython.
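For example, a minimal numba version of the variable-width case, assuming F, G, H as above, could look like:
import numpy as np
from numba import njit

@njit
def varwin_sum(F, G, H):
    # compiled per-ij loop: sum F[ij, G[ij]:H[ij]] for every ij
    out = np.empty(F.shape[0])
    for ij in range(F.shape[0]):
        out[ij] = F[ij, G[ij]:H[ij]].sum()
    return out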
I think I found an answer to this myself. I first create a 3D array F_{i,j,k} = f(i,j,k), and then a mask array whose component is True if g(i,j) <= k <= h(i,j) and False otherwise. Then I compute the element-wise multiplication of these two arrays, F*mask_array, and take the sum over the k axis.
For example, this matrix can be efficiently created by the following code.
M_{i,j} = \sum_{k=\min(i,j)}^{i+j} \sin\left((i+j)^k\right)
import numpy as np

# In this example, g(i,j) = min(i,j), h(i,j) = i+j and f(i,j,k) = sin((i+j)^k),
# with 0 <= i, j <= 2.
# kk should range from min g(i,j) to max h(i,j)
ii, jj, kk = np.mgrid[0:3, 0:3, 0:5]
# k >= g(i,j), i.e. k >= min(i,j)
frm1 = kk >= jj
frm2 = kk >= ii
frm = np.logical_or(frm1, frm2)
# k <= h(i,j)
to = kk <= ii + jj
# mask
k_mask = np.logical_and(frm, to)

def f(i, j, k):
    return np.sin((i + j) ** k)

M_before_mask = f(ii, jj, kk)
# Matrix created
M_matrix = (M_before_mask * k_mask).sum(axis=2)
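As a quick sanity check, the masked result can be compared against the direct triple loop (both rely on the convention 0**0 == 1):
M_loop = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        for k in range(min(i, j), i + j + 1):
            M_loop[i, j] += np.sin((i + j) ** k)
assert np.allclose(M_matrix, M_loop)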
This is the case of a 3D camera streaming a depth map of a scene. The resolution of the camera is known and equal to (w, h) which is set to (3, 2) for this example.
I try to compare each new frame with a bag of samples. Each pixel has the same number of samples to be compared with which is known and equal to 4 for this example. The bag of samples has the following shape (w, h, nb_sample) which is equal to (3, 2, 4) for this example.
I loop from 0 to nb_sample to compare the new frame with the samples. If the difference is below a threshold R, a counter is incremented.
The question is: Is there a way to optimize the loop?
import numpy as np
w = 3
h = 2
nb_sample = 4
R = 0.5
new_matrix = np.random.rand(w,h)
sample = np.random.rand(w, h, nb_sample)
count = np.zeros((w,h))
for index in range(0, nb_sample):
    distance = np.abs(new_matrix - sample[:, :, index])
    count[distance < R] += 1
print(count)
Try this two-line solution:
distance = np.abs(sample - new_matrix[:, :, np.newaxis])
count = np.sum(distance < R, axis=-1)
Explanation:
By adding a dimension to new_matrix with np.newaxis, NumPy can broadcast the subtraction over every sample with a single - operation.
Then distance < R is computed as in your code. True and False are represented as 1 and 0 in Python, which is why they can simply be summed along the last axis.
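With the variables from the question, the two versions can be checked against each other:
distance = np.abs(sample - new_matrix[:, :, np.newaxis])
count_vec = np.sum(distance < R, axis=-1)
assert np.array_equal(count, count_vec)  # same result as the loop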
In my problem I have a vector containing n elements. Given a window size k I want to efficiently create a matrix size n x 2k+1 which contains the banded diagonal. For example:
a = [a_1, a_2, a_3, a_4]
k = 1
b = [[0,   a_1, a_2],
     [a_1, a_2, a_3],
     [a_2, a_3, a_4],
     [a_3, a_4, 0  ]]
The naive way to implement this would be using for loops:
out_data = mx.ndarray.zeros((n, 2*k + 1))
for i in range(0, n):
    for j in range(0, 2*k + 1):
        index = i - k + j
        if not (index < 0 or index >= n):
            out_data[i][j] = in_data[index]
This is very slow.
Creating the full matrix would be easy by just using tile and reshape, however the masking part is not clear.
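For what it's worth, in plain NumPy the padded banded matrix falls out of a sliding window view directly, with no masking step (a sketch assuming zeros at the edges; the result is a read-only view, so no copy is made):
import numpy as np

a = np.array([1., 2., 3., 4.])
k = 1
padded = np.concatenate((np.zeros(k), a, np.zeros(k)))
b = np.lib.stride_tricks.sliding_window_view(padded, 2*k + 1)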
Update
I found a faster, yet still very slow, implementation:
window = 2*self.windowSize + 1
in_data_flat = in_data.reshape((seq_len,))
out_data = mx.ndarray.zeros((seq_len * window))
for i in range(0, seq_len):
    copy_from_start = max(i - self.windowSize, 0)
    copy_from_end = min(seq_len, i + 1 + self.windowSize)
    copy_length = copy_from_end - copy_from_start
    # left-truncated windows start later within their row
    copy_to_start = i*window + max(self.windowSize - i, 0)
    copy_to_end = copy_to_start + copy_length
    out_data[copy_to_start:copy_to_end] = in_data_flat[copy_from_start:copy_from_end]
out_data = out_data.reshape((seq_len, window))
If k and n are constant in your operation, you can do what you want using a combination of mxnet.nd.gather_nd() and mx.nd.scatter_nd(). Even though generating the indices tensor is just as inefficient, you need to do it only once, so that wouldn't be a problem. You would use gather_nd to effectively "duplicate" your data from the original array and then use scatter_nd to scatter them into the final matrix shape. Alternatively, you can simply concatenate a 0 element to your input array (for example [a_1, a_2, a_3] would turn into [0, a_1, a_2, a_3]) and then use only mxnet.nd.gather_nd() to duplicate elements into your final matrix.
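The concatenated-zero trick is easiest to sketch in plain NumPy; the idx table built here is what you would feed to the gather (the names are illustrative):
import numpy as np

a = np.array([1., 2., 3., 4.])                      # stands in for in_data
n, k = len(a), 1
padded = np.concatenate(([0.], a))                  # slot 0 holds the fill value
raw = np.arange(n)[:, None] + np.arange(-k, k + 1)  # (n, 2k+1) raw positions
idx = np.where((raw >= 0) & (raw < n), raw + 1, 0)  # out-of-range -> slot 0
b = padded[idx]                                     # banded matrix with zeroed edges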
I have a list X containing data from N different users, indexed i = 0, 1, ..., N-1. Each entry X[i] has a different length.
I want to normalize the values of each user X[i] over the global dataset X.
This is what I am doing. First of all I create a 1D list containing all the data, so:
tmp = list()
for i in range(0, len(X)):
    tmp.extend(X[i])
then I convert it to an array and I remove outliers and NaN.
A = np.array(tmp)
A = A[~np.isnan(A)] #remove NaN
tr = np.percentile(A,95)
A = A[A < tr] #remove outliers
and then I create the histogram of this dataset
p, x = np.histogram(A, bins=10)  # bin it into 10 bins
finally I normalize the values of each user over the histogram I created, so:
Xn = list()
for i in range(0, len(X)):
    tmp = np.array(X[i])
    tmp = tmp[tmp < tr]
    tmp = np.histogram(tmp, x)
    Xn.append(tmp[0] / sum(tmp[0]))
My data set is very large, so this process can take a while. I am wondering if there is a better way to do this, or a package for it.
For the first part, if each element X[i] of X is a list, you may be able to use sum, and then convert directly to an array, or use concatenate:
# Example X
X = [list(range(i)) for i in range(3, 19)] + [[2., np.nan]]
# Build array with sum
A = np.array(sum(X, []))
# Build array with concatenate
A = np.concatenate(X)
The latter is more readable, and avoids the quadratic cost of repeatedly concatenating lists with sum.
For the second part, I would store indices of the user to which each data point belongs.
idx = np.concatenate([np.full(len(x), i, int) for i,x in enumerate(X)])
tr = np.nanpercentile(A,95)
ok = A < tr # this excludes outliers, +Inf and NaN
idx = idx[ok]
A = A[ok]
You can then compute x from the range of A, use digitize on A to get the bin of each element, and note that each pair (idx, bin-1) identifies the datum of a given user belonging to a given bin. You can sum all these contributions using the at method of the ufunc add (see documentation). Finally, you divide by the sum over bins to normalize.
x = np.linspace(A.min(), A.max(), 10+1)
bin = np.digitize(A, x)
Xn = np.zeros((len(X), len(x)))
np.add.at(Xn, (idx,bin-1), 1)
Xn /= Xn.sum(axis=1)[:,np.newaxis]
I'm looking for a fast way to interconvert between linear and multidimensional indexing in Numpy.
To make my usage concrete, I have a large collection of N particles, each assigned 5 float values (dimensions) giving an Nx5 array. I then bin each dimension using numpy.digitize with an appropriate choice of bin boundaries to assign each particle a bin in the 5 dimensional space.
import numpy

N = 10
ndims = 5
p = numpy.random.normal(size=(N, ndims))
bbnds = ndims*[None]
for idim in range(ndims):
    bbnds[idim] = numpy.array([-float('inf')] + [-2., -1., 0., 1., 2.] + [float('inf')])
binassign = ndims*[None]
for idim in range(ndims):
    binassign[idim] = numpy.digitize(p[:, idim], bbnds[idim]) - 1
binassign then contains rows that correspond to the multidimensional index. If I then want to convert the multidimensional index to a linear index, I think I would want to do something like:
linind = numpy.arange(6**5).reshape(6,6,6,6,6)
This would give a look-up for each multidimensional index to map it to a linear index. You could then go back using:
mindx = numpy.unravel_index(x,linind.shape)
Where I'm having difficulty is figuring out how to take binassign (the Nx5 array) containing the multidimensional index in each row, and converting that to a 1d linear index by using it to slice the linear indexing array linind.
If anyone has a one (or several) line indexing trick to go back and forth between the multidimensional index and the linear index in a way that vectorizes the operation for all N particles, I would appreciate your insight.
You can simply calculate the index of each bin:
box_indices = numpy.dot(6**numpy.arange(ndims), binassign)
The scalar product simply does 1*x0 + 6*x1 + 6*6*x2 + … (6 is the number of bins along each dimension). This is done very efficiently through NumPy's dot().
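Note that NumPy can also do this interconversion directly, which may be worth comparing (assuming 6 bins per dimension and C order):
binassign_arr = numpy.asarray(binassign)                  # shape (ndims, N)
lin = numpy.ravel_multi_index(binassign_arr, (6,)*ndims)  # multi -> linear
multi = numpy.unravel_index(lin, (6,)*ndims)              # linear -> multi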
Although I very much like EOL's answer, I wanted to generalize it a bit for non-uniform numbers of bins along each direction, and also to highlight the differences between C and F styles of ordering. Here is an example solution:
ndims = 5
N = 10
# Define bin boundaries
binbnds = ndims*[None]
nbins = []
for idim in range(ndims):
    binbnds[idim] = numpy.linspace(-10.0, 10.0, numpy.random.randint(2, 15))
    binbnds[idim][0] = -float('inf')
    binbnds[idim][-1] = float('inf')
    nbins.append(binbnds[idim].shape[0] - 1)
nstates = numpy.cumprod(nbins)[-1]
# Define variable values for N particles in ndims dimensions
p = numpy.random.normal(size=(N, ndims))
# Assign to bins along each dimension
binassign = ndims*[None]
for idim in range(ndims):
    binassign[idim] = numpy.digitize(p[:, idim], binbnds[idim]) - 1
binassign = numpy.array(binassign)
# multidimensional array with elements mapping from multidim to linear index
# Two different arrays for C vs F ordering
linind_C = numpy.arange(nstates).reshape(nbins, order='C')
linind_F = numpy.arange(nstates).reshape(nbins, order='F')
and now make the conversion
# Fast conversion to linear index
b_F = numpy.cumprod([1] + nbins)[:-1]
b_C = numpy.cumprod([1] + nbins[::-1])[:-1][::-1]
box_index_F = numpy.dot(b_F,binassign)
box_index_C = numpy.dot(b_C,binassign)
and to check for correctness:
# Check
print('Checking correct mapping for each particle, F order')
for k in range(N):
    ii = box_index_F[k]
    jj = linind_F[tuple(binassign[:, k])]
    print('particle %d %s (%d %d)' % (k, ii == jj, ii, jj))
print('Checking correct mapping for each particle, C order')
for k in range(N):
    ii = box_index_C[k]
    jj = linind_C[tuple(binassign[:, k])]
    print('particle %d %s (%d %d)' % (k, ii == jj, ii, jj))
And for completeness, if you want to go back from the 1d index to the multidimensional index in a fast, vectorized-style way:
print('Convert C-style from linear to multi')
x = box_index_C.reshape(-1, 1)
bassign_rev_C = x // b_C % nbins
print('Convert F-style from linear to multi')
x = box_index_F.reshape(-1, 1)
bassign_rev_F = x // b_F % nbins
and again to check:
print('Check C-order')
for k in range(N):
    ii = tuple(binassign[:, k])
    jj = tuple(bassign_rev_C[k, :])
    print(ii == jj, ii, jj)
print('Check F-order')
for k in range(N):
    ii = tuple(binassign[:, k])
    jj = tuple(bassign_rev_F[k, :])
    print(ii == jj, ii, jj)