Given a 2D NumPy array a and a list of index pairs stored in index, there must be a way of extracting the values at those indices very efficiently. Using a for loop as follows takes about 5 ms, which seems extremely slow for extracting 2000 elements:
import numpy as np
import time
# generate dummy array
a = np.arange(4000).reshape(1000, 4)
# generate dummy list of indices
r1 = np.random.randint(1000, size=2000)
r2 = np.random.randint(3, size=2000)
index = np.concatenate([[r1], [r2]]).T
start = time.time()
result = [a[i, j] for [i, j] in index]
print(time.time() - start)
How can I increase the extraction speed? np.take does not seem appropriate here because it would return a 2D array instead of a 1D array.
You can use advanced indexing, which here means extracting the row and column indices from the index array and using them to index a directly, i.e. a[index[:,0], index[:,1]]:
%timeit a[index[:,0], index[:,1]]
# 12.1 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit [a[i, j] for [i, j] in index]
# 2.22 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Another option is numpy.ravel_multi_index, which converts the (row, col) pairs into flat indices without splitting the index array by hand:
np.ravel_multi_index(index.T, a.shape)
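Note that ravel_multi_index only returns the flat positions; to get the values you still need one indexing step, for example through a.ravel() or np.take. A minimal sketch, reusing the a and index defined above:
flat = np.ravel_multi_index(index.T, a.shape)  # flat positions into a
result = a.ravel()[flat]                       # equivalently: np.take(a, flat)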
Related
I would like to generate bootstrap samples of each row's nonzero indices. E.g. for this array:
m = np.array([[1,1,0,0], [1,1,0,1]])
I want to select two indices from the first row, and three from the second, with replacement. The non-vectorized solution is a for loop over the rows:
for row in m:
    idx = np.nonzero(row)[0]
    boot_idx = np.random.choice(idx, len(idx), replace=True)
    print(boot_idx)
To clarify the need, the array m is actually a mask of a 3D tensor, and I want to take bootstrap averages of that tensor based on the indices selected here.
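To make the goal concrete, here is a minimal sketch of that use, assuming a hypothetical tensor t of shape (2, 4, 5) whose first two axes are masked by m (the tensor is made up purely for illustration):
import numpy as np

rng = np.random.default_rng(0)
m = np.array([[1, 1, 0, 0], [1, 1, 0, 1]])
t = rng.normal(size=(2, 4, 5))  # hypothetical 3D tensor masked by m

boot_means = []
for k, row in enumerate(m):
    idx = np.nonzero(row)[0]
    boot_idx = rng.choice(idx, len(idx), replace=True)  # resample with replacement
    boot_means.append(t[k, boot_idx].mean(axis=0))      # bootstrap average for row k
boot_means = np.stack(boot_means)                       # shape (2, 5)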
If speed is the concern, you could use numba:
import numpy as np
import numba as nb
@nb.njit
def func(m):
    for row in m:
        idx = np.nonzero(row)[0]
        boot_idx = np.random.choice(idx, len(idx), replace=True)
    return  # return what you want
This results in significant speed increases:
def func_op(m):
    for row in m:
        idx = np.nonzero(row)[0]
        boot_idx = np.random.choice(idx, len(idx), replace=True)
    return
func(m) #Run once to JIT
%timeit func(m)
%timeit func_op(m)
Output:
706 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
30.2 µs ± 859 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I need to generate a random binary matrix with dimensions m x n whose rows are all distinct from one another. Using numpy I tried
import numpy as np
import random
n = 512
m = 1000
a = random.sample(range(0, 2**n), m)
a = np.array(a)
a = np.reshape(a, (m, 1))
np.unpackbits(a.view(np.uint8), axis=1)
But it is not suitable for my case, because the code above only works for rows of at most 62 bits, while I need n > 128 and m > 1000. Could you help me, please?
You could generate a random array of 0's and 1's with numpy.random.choice and then make sure that the rows are different through numpy.unique:
import numpy as np
m = 1000
n = 512
while True:
    a = np.random.choice([0, 1], size=(m, n))
    if len(np.unique(a, axis=0)) == m:
        break
I would try creating one row at a time and checking whether it already exists via a set, which has O(1) membership testing. If the row exists, simply generate another one; if not, add it to the array and move on to the next row until you are done. This principle can be made faster by:
1. Setting a unique-row counter to 0.
2. Generating m - counter rows and adding the unique ones to the solution.
3. Increasing counter by the number of unique rows added.
4. If counter == m you are done, else return to step 2.
The implementation is as follows:
import numpy as np
n = 128
m = 1000
a = np.zeros((m,n))
rows = set()
counter = 0
while counter < m:
    temp = np.random.randint(0, 2, (m - counter, n))
    for row in temp:
        if tuple(row) not in rows:
            rows.add(tuple(row))
            a[counter] = row
            counter += 1
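As before, you can check that the result really contains m distinct rows:
assert len(np.unique(a, axis=0)) == m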
Runtime comparison
Generating the whole matrix at once and checking whether all rows are unique saves a lot of time, but only if n >> log2(m).
Example 1
with the following:
n = 128
m = 1000
I ran my suggestion and the solution mentioned in the other answer, resulting in:
# my suggestion
17.7 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# generating the whole matrix at once and checking if all rows are unique
4.62 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This is because the probability of generating m different rows is very high in this situation.
Example 2
When changing to:
n = 10
m = 1024
I ran my suggestion and the solution mentioned in the other answer, resulting in:
# my suggestion
26.3 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The suggestion of generating the whole matrix at once and checking whether all rows are unique did not finish running. This is because when math.log2(m) == n there are exactly m possible rows, so a valid matrix must contain every one of them exactly once; the probability of hitting that by random generation approaches 0 as the matrix grows.
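A rough birthday-problem estimate makes the contrast concrete (a sketch, assuming rows are drawn uniformly from the 2**n possibilities):
import math

def p_all_unique(m, n):
    # approximate probability that m uniformly drawn n-bit rows are all distinct
    return math.exp(-m * (m - 1) / (2 * 2**n))

print(p_all_unique(1000, 128))  # ~1.0: a single draw almost always succeeds
print(p_all_unique(1024, 10))   # ~0.0: the retry loop essentially never finishes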
You could create a matrix with unique rows and shuffle the rows:
n = 512
m = 1000
d = np.arange(m) # m unique numbers
d = ((d[:, None] & (1 << d[:n])) > 0).astype(np.uint8) # convert to binary array
i = np.random.randn(m).argsort() # indices used for shuffling rows
a = d[i] # output
all rows are unique:
assert len(np.unique(a, axis=0)) == m
Timings
n=128, m=1000:
271 µs ± 6.06 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
n=2**10, m=2**14:
50.9 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This works best for n <= m, otherwise you need to swap d[:n] with np.arange(n), resulting in longer runtime.
I have a two dimensional array Y of size (N,M), say for instance:
N, M = 200, 100
Y = np.random.normal(0,1,(N,M))
For each of the N rows, I want to compute the dot product of the (M,1) vector with its transpose, which returns an (M,M) matrix. One way to do it inefficiently is:
Y = Y[:,:,np.newaxis]
[Y[i,:,:] @ Y[i,:,:].T for i in range(N)]
which is quite slow: timeit on the second line returns
11.7 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
I thought a much better way to do it is the use the einsum numpy function (https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html):
np.einsum('ijk,imk->ijm', Y, Y, optimize=True)
(which means: for each row i, create a (j, m) matrix whose elements result from a dot product over the last dimension k)
The two methods do return the exact same result, but the runtime of this new version is disappointing (only a bit more than twice as fast):
3.82 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
One would expect much more improvement from the vectorized einsum function, since the first method is very inefficient... Do you have an explanation for this? Is there a better way to do this calculation?
In [60]: N, M = 200, 100
...: Y = np.random.normal(0,1,(N,M))
In [61]: Y1 = Y[:,:,None]
Your iteration, 200 steps to produce (100,100) arrays:
In [62]: timeit [Y1[i,:,:] @ Y1[i,:,:].T for i in range(N)]
18.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
einsum only modestly faster:
In [64]: timeit np.einsum('ijk,imk->ijm', Y1,Y1)
14.5 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
but you could apply the @ in full 'batch' mode with:
In [65]: timeit Y[:,:,None]#Y[:,None,:]
7.63 ms ± 224 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But as Divakar notes, the sum axis is size 1, so you could use plain broadcasted multiply. This is an outer product, not a matrix one.
In [66]: timeit Y[:,:,None]*Y[:,None,:]
8.2 ms ± 64.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
'vectorizing' gives big gains when doing many iterations on a simple operation. For fewer operations on a more complex operation, the gain isn't as great.
This is an old post, yet it covers the subject in much detail: efficient outer product.
In particular, if you don't mind adding a numba dependency, that may be your fastest option.
Updating part of the numba code from the original post and adding the multi outer product:
import numpy as np
from numba import jit
from numba.typed import List
@jit(nopython=True)
def outer_numba(a, b):
    m = a.shape[0]
    n = b.shape[0]
    result = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            result[i, j] = a[i] * b[j]
    return result

@jit(nopython=True)
def multi_outer_numba(Y):
    all_result = List()
    for k in range(Y.shape[0]):
        y = Y[k]
        n = y.shape[0]
        tmp_res = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                tmp_res[i, j] = y[i] * y[j]
        all_result.append(tmp_res)
    return all_result
r = [outer_numba(Y[i],Y[i]) for i in range(N)]
r = multi_outer_numba(Y)
I have a square matrix that is NxN (N is usually >500). It is constructed using a numpy array.
I need to extract a new matrix that has the i-th column and row removed from this matrix. The new matrix is (N-1)x(N-1).
I am currently using the following code to extract this matrix:
new_mat = np.delete(old_mat,idx_2_remove,0)
new_mat = np.delete(new_mat,idx_2_remove,1)
I have also tried to use:
row_indices = [i for i in range(0,idx_2_remove)]
row_indices += [i for i in range(idx_2_remove+1,N)]
col_indices = row_indices
rows = [i for i in row_indices for j in col_indices]
cols = [j for i in row_indices for j in col_indices]
old_mat[(rows, cols)].reshape(len(row_indices), len(col_indices))
But I found this to be slower than the np.delete() approach above, which itself is still quite slow for my application.
Is there a faster way to accomplish what I want?
Edit 1:
It seems the following is even faster than the above two, but not by much:
new_mat = old_mat[row_indices,:][:,col_indices]
Here are 3 alternatives I quickly wrote:
Repeated delete:
def foo1(arr, i):
    return np.delete(np.delete(arr, i, axis=0), i, axis=1)
Maximal use of slicing (may need some edge checks):
def foo2(arr, i):
    N = arr.shape[0]
    res = np.empty((N-1, N-1), arr.dtype)
    res[:i, :i] = arr[:i, :i]
    res[:i, i:] = arr[:i, i+1:]
    res[i:, :i] = arr[i+1:, :i]
    res[i:, i:] = arr[i+1:, i+1:]
    return res
Advanced indexing:
def foo3(arr, i):
    N = arr.shape[0]
    idx = np.r_[:i, i+1:N]
    return arr[np.ix_(idx, idx)]
Test that they work:
In [874]: x = np.arange(100).reshape(10,10)
In [875]: np.allclose(foo1(x,5),foo2(x,5))
Out[875]: True
In [876]: np.allclose(foo1(x,5),foo3(x,5))
Out[876]: True
Compare timings:
In [881]: timeit foo1(arr,100).shape
4.98 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [882]: timeit foo2(arr,100).shape
526 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [883]: timeit foo3(arr,100).shape
2.21 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the slicing is fastest, even if the code is longer. It looks like np.delete works like foo3, but one dimension at a time.
I have point coordinates stored in a 3-dimensional array:
(UPD. the array is actually numpy-derived ndarray, sorry for the confusion in the initial version)
a = [ [[11,12]], [[21,22]], [[31,32]], [[41,42]] ]
You can see that each coordinate pair is stored as a nested 2-d array like [[11,12]], while I would like it to be just [11,12], i.e. my array should have this content:
b = [ [11,12], [21,22], [31,32], [41,42] ]
So, how to get from a to b form? For now my solution is to create a list and then convert it to an array with numpy:
b = numpy.array([p[0] for p in a])
This works but I assume there must be a simpler and cleaner way...
UPD. Originally I tried a simple comprehension: b = [p[0] for p in a] - but then b turned out to be a list, not an array; I assume that's because the original a is a numpy ndarray.
If you do want to use numpy:
b = np.array(a)[:, 0, :]
This will be faster than a comprehension.
Well... I certainly thought it would be
a = np.random.random((100_000, 1, 2)).tolist()
%timeit np.array([x[0] for x in a])
41.1 ms ± 304 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.array(a)[:, 0, :]
57.6 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x = np.array(a); x.shape = len(a), 2
58.2 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
edit
Oh, if it's a numpy array then definitely use this method. Or use .squeeze() if you're sure it's not empty.
Here is another solution using list comprehension:
b = [x[0] for x in a]
If you are going to use numpy later, then it's best to avoid the list comprehension. Also, it's good practice to automate things as much as possible, so instead of manually selecting the singleton dimension, just let numpy take care of it:
b = numpy.array(a).squeeze()
Unless there are other singleton dimensions that you need to keep.
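If there may be other singleton dimensions you want to preserve, you can restrict the squeeze to the axis you know is redundant:
b = numpy.array(a).squeeze(axis=1)  # only drops the middle dimension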
In order to flatten each "nested 2-d array", as you call them, you just need to take its first element, e[0].
You can apply this concept in several ways:
list comprehension (best performing): flatter_a_compr = [e[0] for e in a]
iterating (second best performing):
b = []
for e in a:
    b.append(e[0])
lambda (un-Pythonic): flatter_a = list(map(lambda e: e[0], a))
numpy (worst performing): flatter_a_numpy = np.array(a)[:, 0, :]