Order of elements in a numpy array


I have a 2-D array of shape (n, 3), say arr1. Now consider a second array, arr2, with the same shape and the same rows as arr1, but with the rows in a different order. I want to get, for each row in arr2, its index in arr1. I am looking for the fastest Pythonic way to do this, as n is of the order of 10,000.
For example:
arr1 = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = numpy.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
ind = [1, 2, 0]
Note that the row elements need not be integers. In fact they are floats.
I have found related answers that use numpy.searchsorted but they work for 1-D arrays only.

If you are sure that arr2 is a permutation of arr1, you can use sorting to get the index:
import numpy as np
n = 100000
a1 = np.random.randint(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))]
idx1 = np.lexsort(a1.T)
idx2 = np.lexsort(a2.T)
idx = idx2[np.argsort(idx1)]
np.all(a1 == a2[idx])
If they don't have exactly the same values, you can use a k-d tree (cKDTree) from scipy:
n = 100000
a1 = np.random.uniform(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))] + np.random.normal(0, 1e-8, size=(n, 3))
from scipy import spatial
tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)
np.allclose(a1, a2[idx])

Before we begin, you should mention whether duplicates can exist in your arrays.
That said, the method I would use is numpy's where function within a list comprehension, like so:
[numpy.where((arr1 == x).all(axis=1))[0][0] for x in arr2]
This is unlikely to be the fastest way, since it scans all of arr1 for every row of arr2. Another method would be to build a dictionary from the rows of arr1 and then look up each row of arr2 in it.
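The dictionary idea mentioned above could look roughly like the following. This is only a sketch: the name find_rows_dict is made up for illustration, and it assumes the rows match exactly (bit-for-bit equal floats) and that arr1 has no duplicate rows.

```python
import numpy as np

def find_rows_dict(arr1, arr2):
    """Return, for each row of arr2, the index of the identical row in arr1."""
    # Map each row of arr1 (as a hashable tuple) to its position.
    # Assumption: rows are unique; with duplicates the last occurrence wins.
    lookup = {tuple(row): i for i, row in enumerate(arr1.tolist())}
    # A KeyError here means arr2 contains a row that arr1 does not.
    return np.array([lookup[tuple(row)] for row in arr2.tolist()])

arr1 = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
arr2 = np.array([[4., 5., 6.], [7., 8., 9.], [1., 2., 3.]])
ind = find_rows_dict(arr1, arr2)
print(ind)  # [1 2 0]
```

Each lookup is a hash probe, so the total work grows linearly with n instead of quadratically as in the list comprehension.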

This is very similar to "Find indexes of matching rows in two 2-D arrays", but I don't have the reputation to leave a comment.
Based on that question, there appear to be two clear possibilities for a large matrix like yours, a searchsorted-based lookup and a brute-force search (two variants below):
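def find_rows_searchsorted(a, b):
    # View each row as a single opaque (void) element so rows compare as scalars
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    a_view = np.ascontiguousarray(a).view(dt).ravel()
    b_view = np.ascontiguousarray(b).view(dt).ravel()
    sort_b = np.argsort(b_view)
    where_in_b = np.searchsorted(b_view, a_view, sorter=sort_b)
    return np.take(sort_b, where_in_b)
def find_rows_iterative(a, b):
    answer = np.empty(a.shape[0], dtype=int)
    for idx, row in enumerate(a):
        # Index of the first row of b that equals this row in every column
        answer[idx] = np.where(np.equal(b, row).all(1))[0][0]
    return answer
def find_rows_list_comprehension(a, b):
    # Caution: b == x is elementwise, so this returns the first row of b that
    # shares any element with x; that is adequate for random floats only.
    return [np.where(b == x)[0][0] for x in a]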
A little timing with a matrix of 10000 rows shows that the searchsorted-based method is significantly faster than the brute-force methods:
arr1 = np.random.randn(10000, 3)
new_inds = np.arange(arr1.shape[0])
np.random.shuffle(new_inds)
arr2 = arr1[new_inds, :]
np.array_equal(find_rows_searchsorted(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_iterative(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_list_comprehension(arr2, arr1), new_inds)
>> True
%timeit find_rows_iterative(arr2, arr1)
>> 1 loops, best of 3: 2.62 s per loop
%timeit find_rows_list_comprehension(arr2, arr1)
>> 1 loops, best of 3: 1.61 s per loop
%timeit find_rows_searchsorted(arr2, arr1)
>> 100 loops, best of 3: 6.53 ms per loop
Based on HYRY's great answer, I also added lexsort and k-d tree (kdball) tests, as well as a test of argsort for structured arrays.
def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]
def find_rows_argsort(a, b):
    # np.rec is the public alias of np.core.records
    a_rec = np.rec.fromarrays(a.transpose())
    b_rec = np.rec.fromarrays(b.transpose())
    idx1 = a_rec.argsort(order=a_rec.dtype.names).argsort()
    return b_rec.argsort(order=b_rec.dtype.names)[idx1]
def find_rows_kdball(a, b):
    from scipy import spatial
    tree = spatial.cKDTree(b)
    _, idx = tree.query(a)
    return idx
%timeit find_rows_lexsort(arr2, arr1)
>> 100 loops, best of 3: 4.63 ms per loop
%timeit find_rows_argsort(arr2, arr1)
>> 100 loops, best of 3: 7.37 ms per loop
%timeit find_rows_kdball(arr2, arr1)
>> 100 loops, best of 3: 18.5 ms per loop

Related

Matrix element wise multiplication with shifted columns

Say I have two arrays, A and B.
An element wise multiplication is defined as follows:
I want to do an element-wise multiplication in a convolutional-like manner, i.e., move every column one step right, for example, column 1 will be now column 2 and column 3 will be now column 1.
This should yield a ( 2 by 3 by 3 ) array (2x3 matrix for all 3 possibilities)

We can concatenate A with one of its own slices and then get sliding windows over the result. To get those windows, we can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided. Then, multiply those windows with B for the final output.
Hence, we will have one vectorized solution like so -
In [70]: from skimage.util.shape import view_as_windows
In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]:
array([[[1, 0, 3],
        [0, 0, 6]],
       [[2, 0, 1],
        [0, 0, 4]],
       [[3, 0, 2],
        [0, 0, 5]]])
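For readers without scikit-image, the same windows can probably be obtained with NumPy's own sliding_window_view (available from NumPy 1.20). The sketch below is not part of the original answer, and the values of A and B are assumptions, inferred from the printed result above because the question's sample arrays did not survive.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed inputs, inferred from the printed result above.
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 0, 1], [0, 0, 1]])

# Append all but the last column so every cyclic shift is a contiguous window.
A1 = np.concatenate((A, A[:, :-1]), axis=1)      # shape (2, 5)

# Windows of width A.shape[1] along axis 1: shape (2, 3, 3) = (row, shift, col).
windows = sliding_window_view(A1, A.shape[1], axis=1)

# Move the shift axis to the front, then broadcast-multiply with B.
out = windows.transpose(1, 0, 2) * B             # shape (3, 2, 3)
print(out[1])
```

The windows are read-only views, so only the final multiplication allocates new memory.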
We can also leverage multi-cores with numexpr module for the final step of broadcasted-multiplication, which should be better on larger arrays. Hence, for the sample case, it would be -
In [53]: import numexpr as ne
In [54]: w = view_as_windows(A1,A.shape)[0]
In [55]: ne.evaluate('w*B')
Out[55]:
array([[[1, 0, 3],
        [0, 0, 6]],
       [[2, 0, 1],
        [0, 0, 4]],
       [[3, 0, 2],
        [0, 0, 5]]])
Timings on large arrays comparing the proposed two methods -
In [56]: A = np.random.rand(500,500)
...: B = np.random.rand(500,500)
In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
...: w = view_as_windows(A1,A.shape)[0]
In [58]: %timeit w*B
...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop
Squeezing the most out of the strided-based method
If you really want to squeeze the most out of the strided-view-based approach, go with the original np.lib.stride_tricks.as_strided to avoid the function-call overhead of view_as_windows -
def vaw_with_as_strided(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B
Comparing against @Paul Panzer's array-assignment based one, the crossover seems to be at 19x19 shaped arrays -
In [33]: n = 18
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [34]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop
In [35]: n = 19
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [36]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop
So, for anything smaller than 19x19, array-assignment seems to be better, and for anything larger the strided-based one should be the way to go.

Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is a comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach by me.
As the benchmark plot produced by the code below shows, vaw is not very fast on small to medium sized operands and only begins to shine above array size 30x30.
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows
B = BenchmarkBuilder()
@B.add_function()
def vaw(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B
@B.add_function()
def pp(A,B):
    m,n = A.shape
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B
@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])
r = B.run()
r.plot()
import pylab
pylab.savefig('vaw.png')

Run a for loop over the number of columns and use np.roll() along axis=1 to shift the columns, doing the element-wise multiplication at each step.
Hope this helps.
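A minimal sketch of that np.roll loop might look like this. The sample values of A and B are assumptions (the question's arrays are not shown), and the shift direction follows the other answers here, with columns moving left; use k instead of -k to move them right.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # assumed sample values
B = np.array([[1, 0, 1], [0, 0, 1]])   # assumed sample values

# One shifted product per column offset; a negative shift moves columns left.
out = np.stack([np.roll(A, -k, axis=1) * B for k in range(A.shape[1])])
print(out.shape)  # (3, 2, 3)
```

Each np.roll call copies A, so this is simple rather than fast; it is mostly useful as a reference to check the strided variants against.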

I can actually pad the array on both sides with 2 columns (to get a 2x5 array) and run conv2 with B as the kernel; I think that is more efficient.

Stepping with multiple values while slicing an array in Python

I am trying to get m values while stepping through every n elements of an array. For example, for m = 2 and n = 5, and given
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I want to retrieve
b = [1, 2, 6, 7]
Is there a way to do this using slicing? I can do this using a nested list comprehension, but I was wondering if there was a way to do this using the indices only. For reference, the list comprehension way is:
b = [k for j in [a[i:i+2] for i in range(0,len(a),5)] for k in j]

I agree with wim that you can't do it with just slicing. But you can do it with just one list comprehension:
>>> [x for i,x in enumerate(a) if i%n < m]
[1, 2, 6, 7]

No, that is not possible with slicing. Slicing only supports start, stop, and step - there is no way to represent stepping with "groups" of size larger than 1.

In short, no, you cannot. But you can use itertools to remove the need for intermediary lists:
from itertools import chain, islice
res = list(chain.from_iterable(islice(a, i, i+2) for i in range(0, len(a), 5)))
print(res)
[1, 2, 6, 7]
Borrowing @Kevin's logic, if you want a vectorised solution to avoid a for loop, you can use the 3rd party library numpy:
import numpy as np
m, n = 2, 5
a = np.array(a) # convert to numpy array
res = a[np.where(np.arange(a.shape[0]) % n < m)]

There are other ways to do it, which all have advantages for some cases, but none are "just slicing".
The most general solution is probably to group your input, slice the groups, then flatten the slices back out. One advantage of this solution is that you can do it lazily, without building big intermediate lists, and you can do it to any iterable, including a lazy iterator, not just a list.
# from itertools recipes in the docs
import itertools
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
groups = grouper(a, 5)
truncated = (group[:2] for group in groups)
b = [elem for group in truncated for elem in group]
And you can convert that into a pretty simple one-liner, although you still need the grouper function:
b = [elem for group in grouper(a, 5) for elem in group[:2]]
Another option is to build a list of indices, and use itemgetter to grab all the values. This might be more readable for a more complicated function than just "the first 2 of every 5", but it's probably less readable for something as simple as your use:
import operator
indices = [i for i in range(len(a)) if i%5 < 2]
b = operator.itemgetter(*indices)(a)
… which can be turned into a one-liner:
b = operator.itemgetter(*[i for i in range(len(a)) if i%5 < 2])(a)
And you can combine the advantages of the two approaches by writing your own version of itemgetter that takes a lazy index iterator—which I won't show, because you can go even better by writing one that takes an index filter function instead:
def indexfilter(pred, a):
    return [elem for i, elem in enumerate(a) if pred(i)]
b = indexfilter((lambda i: i%5<2), a)
(To make indexfilter lazy, just replace the brackets with parens.)
… or, as a one-liner:
b = [elem for i, elem in enumerate(a) if i%5<2]
I think this last one might be the most readable. And it works with any iterable rather than just lists, and it can be made lazy (again, just replace the brackets with parens). But I still don't think it's simpler than your original comprehension, and it's not just slicing.
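To illustrate the remark about laziness, here is a small sketch of the generator form of indexfilter applied to an endless iterator. The use of itertools.count and islice is my own choice for the demonstration, not something the answer prescribes.

```python
from itertools import count, islice

def indexfilter(pred, iterable):
    """Lazily yield the elements whose position satisfies pred."""
    return (elem for i, elem in enumerate(iterable) if pred(i))

m, n = 2, 5
# Works on an endless iterator because nothing is materialised up front.
lazy = indexfilter(lambda i: i % n < m, count(1))
first_six = list(islice(lazy, 6))
print(first_six)  # [1, 2, 6, 7, 11, 12]
```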

The question says "array", and if we are talking about NumPy arrays, we can use a few obvious NumPy tricks and a few not-so-obvious ones. Under certain conditions we can indeed use slicing to get a 2D view into the input.
Now, based on the array length, let's call it l, together with m and n, we have three scenarios:
Scenario #1: l is divisible by n
We can use slicing and reshaping to get a view into the input array and hence get constant runtime.
Verify the view concept :
In [108]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In [109]: m = 2; n = 5
In [110]: a.reshape(-1,n)[:,:m]
Out[110]:
array([[1, 2],
       [6, 7]])
In [111]: np.shares_memory(a, a.reshape(-1,n)[:,:m])
Out[111]: True
Check timings on a very large array and hence constant runtime claim :
In [118]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In [119]: %timeit a.reshape(-1,n)[:,:m]
1000000 loops, best of 3: 563 ns per loop
In [120]: a = np.arange(10000000)
In [121]: %timeit a.reshape(-1,n)[:,:m]
1000000 loops, best of 3: 564 ns per loop
To get flattened version :
If we have to get a flattened array as output, we just need to use a flattening operation with .ravel(), like so -
In [127]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
In [128]: m = 2; n = 5
In [129]: a.reshape(-1,n)[:,:m].ravel()
Out[129]: array([1, 2, 6, 7])
Timings show that it's not too bad when compared with the other looping and vectorized numpy.where versions from other posts -
In [143]: a = np.arange(10000000)
# @Kevin's soln
In [145]: %timeit [x for i,x in enumerate(a) if i%n < m]
1 loop, best of 3: 1.23 s per loop
# @jpp's soln
In [147]: %timeit a[np.where(np.arange(a.shape[0]) % n < m)]
10 loops, best of 3: 145 ms per loop
In [144]: %timeit a.reshape(-1,n)[:,:m].ravel()
100 loops, best of 3: 16.4 ms per loop
Scenario #2: l is not divisible by n, but the last group still holds a complete set of m elements
We go to the non-obvious NumPy methods with np.lib.stride_tricks.as_strided, which allows going beyond the memory block bounds (hence we need to be careful not to write into those) to facilitate a solution using slicing. The implementation would look something like this -
def select_groups(a, m, n):
    a = np.asarray(a)
    strided = np.lib.stride_tricks.as_strided
    # Get params defining the lengths for slicing and output array shape
    nrows = len(a)//n
    add0 = len(a)%n
    s = a.strides[0]
    out_shape = nrows+int(add0!=0),m
    # Finally stride: step n elements between rows, 1 element within a row
    return strided(a, shape=out_shape, strides=(s*n,s))
A sample run to verify that the output is a view -
In [151]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
In [152]: m = 2; n = 5
In [153]: select_groups(a, m, n)
Out[153]:
array([[ 1,  2],
       [ 6,  7],
       [11, 12]])
In [154]: np.shares_memory(a, select_groups(a, m, n))
Out[154]: True
To get flattened version, append with .ravel().
Let's get some timings comparison -
In [158]: a = np.arange(10000003)
In [159]: m = 2; n = 5
# @Kevin's soln
In [161]: %timeit [x for i,x in enumerate(a) if i%n < m]
1 loop, best of 3: 1.24 s per loop
# @jpp's soln
In [162]: %timeit a[np.where(np.arange(a.shape[0]) % n < m)]
10 loops, best of 3: 148 ms per loop
In [160]: %timeit select_groups(a, m=m, n=n)
100000 loops, best of 3: 5.8 µs per loop
If we need a flattened version, it's still not too bad -
In [163]: %timeit select_groups(a, m=m, n=n).ravel()
100 loops, best of 3: 16.5 ms per loop
Scenario #3: l is not divisible by n, and the last group holds fewer than m elements
For this case, we would need an extra slicing at the end on top of what we had in the previous method, like so -
def select_groups_generic(a, m, n):
    a = np.asarray(a)
    strided = np.lib.stride_tricks.as_strided
    # Get params defining the lengths for slicing and output array shape
    nrows = len(a)//n
    add0 = len(a)%n
    lim = m*(nrows) + add0
    s = a.strides[0]
    out_shape = nrows+int(add0!=0),m
    # Finally stride, flatten with reshape and slice
    return strided(a, shape=out_shape, strides=(s*n,s)).reshape(-1)[:lim]
Sample run -
In [166]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [167]: m = 2; n = 5
In [168]: select_groups_generic(a, m, n)
Out[168]: array([ 1, 2, 6, 7, 11])
Timings -
In [170]: a = np.arange(10000001)
In [171]: m = 2; n = 5
# @Kevin's soln
In [172]: %timeit [x for i,x in enumerate(a) if i%n < m]
1 loop, best of 3: 1.23 s per loop
# @jpp's soln
In [173]: %timeit a[np.where(np.arange(a.shape[0]) % n < m)]
10 loops, best of 3: 145 ms per loop
In [174]: %timeit select_groups_generic(a, m, n)
100 loops, best of 3: 12.2 ms per loop
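Since the as_strided version in Scenario #3 builds a view that extends past the end of the buffer before trimming, a more conservative variant may be preferable when safety matters more than the last bit of speed. The following is a sketch of one such alternative, not the answer's method; the name select_groups_safe is made up, and it assumes m <= n.

```python
import numpy as np

def select_groups_safe(a, m, n):
    """First m of every n elements, without reading outside the buffer."""
    a = np.asarray(a)
    full = (len(a) // n) * n                       # length covered by complete groups
    head = a[:full].reshape(-1, n)[:, :m].ravel()  # selected columns, flattened (a copy)
    tail = a[full:full + m]                        # leftover partial group, if any
    return np.concatenate((head, tail))

print(select_groups_safe(np.arange(1, 11), 2, 5))  # [1 2 6 7]
print(select_groups_safe(np.arange(1, 12), 2, 5))  # [ 1  2  6  7 11]
print(select_groups_safe(np.arange(1, 14), 2, 5))  # [ 1  2  6  7 11 12]
```

It always returns a flattened copy, so it covers all three scenarios at the cost of giving up the view.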

I realize that recursion isn't popular, but would something like this work? I'm also not sure whether adding recursion to the mix counts as just using slices.
def get_elements(A, m, n):
    if len(A) < m:
        return A
    else:
        return A[:m] + get_elements(A[n:], m, n)
A is the list (with a NumPy array, + would add elementwise instead of concatenating), and m and n are defined as in the question. The if branch covers the base case, where fewer elements remain than you are trying to retrieve, and the else branch is the recursive case. I'm somewhat new to Python, so please forgive my poor understanding of the language if this doesn't work properly, though I tested it and it seems to work fine.

With itertools you could get an iterator with:
from itertools import compress, cycle
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = 5
m = 2
it = compress(a, cycle([1] * m + [0] * (n - m)))  # selector pattern: m ones, then n - m zeros
res = list(it)

Iterative subtraction of elements in array in Python

I have a large numpy array. Is there a way to subtract from each element the elements that follow it, and store the results in a new list/array, without using a loop?
A simple example of what I mean:
a = numpy.array([4,3,2,1])
result = [4-3, 4-2, 4-1, 3-2, 3-1, 2-1] = [1, 2, 3, 1, 2 ,1]
Note that the 'real' array I am working with doesn't contain numbers in sequence. This is just to make the example simple.
I know the result should have n(n-1)/2 elements, where n is the size of the array.
Is there a way to do this without using a loop, but by repeating the array in a 'smart' way?
Thanks!

temp = a[:, None] - a
result = temp[np.triu_indices(len(a), k=1)]
Perform all pairwise subtractions to produce temp, including subtracting elements from themselves and subtracting earlier elements from later elements, then use triu_indices to select the results we want. (a[:, None] adds an extra length-1 axis to a.)
Note that almost all of the runtime is spent constructing result from temp (because triu_indices is slow and using indices to select the upper triangle of an array is slow). If you can use temp directly, you can save a lot of time:
In [13]: a = numpy.arange(2000)
In [14]: %%timeit
....: temp = a[:, None] - a
....:
100 loops, best of 3: 6.99 ms per loop
In [15]: %%timeit
....: temp = a[:, None] - a
....: result = temp[numpy.triu_indices(len(a), k=1)]
....:
10 loops, best of 3: 51.7 ms per loop
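As a quick check of the approach on the question's own example, the sketch below spells out which entries of the difference matrix are kept. It only restates the technique above on a 4-element array.

```python
import numpy as np

a = np.array([4, 3, 2, 1])

# temp[i, j] == a[i] - a[j] for every pair (i, j), via broadcasting.
temp = a[:, None] - a

# Keep only the entries strictly above the diagonal, i.e. i < j,
# read row by row: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
rows, cols = np.triu_indices(len(a), k=1)
result = temp[rows, cols]
print(result)  # [1 2 3 1 2 1]
```

The result has n(n-1)/2 entries (6 for n = 4), one for each pair with i < j.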

Here's a masking-based approach for the extraction after the broadcasted subtraction; the mask itself is also created with broadcasting (double broadcasting, so to speak) -
r = np.arange(a.size)
out = (a[:, None] - a)[r[:,None] < r]
Runtime test
Vectorized approaches -
# @user2357112's solution
def pairwise_diff_triu_indices_based(a):
    return (a[:, None] - a)[np.triu_indices(len(a), k=1)]
# Proposed in this post
def pairwise_diff_masking_based(a):
    r = np.arange(a.size)
    return (a[:, None] - a)[r[:,None] < r]
Timings -
In [109]: a = np.arange(2000)
In [110]: %timeit pairwise_diff_triu_indices_based(a)
10 loops, best of 3: 36.1 ms per loop
In [111]: %timeit pairwise_diff_masking_based(a)
100 loops, best of 3: 11.8 ms per loop
Closer look at involved performance parameters
Let's dig a bit deeper into the timings on this setup to study how much the mask-based approach helps. There are two parts to compare: mask creation vs. index creation, and boolean-mask indexing vs. integer-based indexing.
How much mask creation helps?
In [37]: r = np.arange(a.size)
In [38]: %timeit np.arange(a.size)
1000000 loops, best of 3: 1.88 µs per loop
In [39]: %timeit r[:,None] < r
100 loops, best of 3: 3 ms per loop
In [40]: %timeit np.triu_indices(len(a), k=1)
100 loops, best of 3: 14.7 ms per loop
About 5x improvement on mask creation over index setup.
How much boolean indexing helps against integer based indexing?
In [41]: mask = r[:,None] < r
In [42]: idx = np.triu_indices(len(a), k=1)
In [43]: subs = a[:, None] - a
In [44]: %timeit subs[mask]
100 loops, best of 3: 4.15 ms per loop
In [45]: %timeit subs[idx]
100 loops, best of 3: 10.9 ms per loop
About 2.5x improvement here.

a = [4, 3, 2, 1]
differences = ((x - y) for i, x in enumerate(a) for y in a[i+1:])
for diff in differences:
    # do something with difference.
    pass

Check out itertools.combinations:
from itertools import combinations
l = [4, 3, 2, 1]
result = []
for n1, n2 in combinations(l, 2):
    result.append(n1 - n2)
print(result)
Results in:
[1, 2, 3, 1, 2, 1]
combinations returns a lazy iterator, so the pairs are never all held in memory at once, which is good for very large lists :)

Is there a faster way to add two 2-d numpy array

Let's say I have two large 2-D numpy arrays of the same dimensions (say 2000x2000). I want to sum them element-wise. I was wondering if there is a faster way than np.add().
Edit: I am adding an example similar to what I am using now. Is there a way to speed this up?
# a and b are the two matrices I already have. Dimension is 2000x2000
# shift is also a list that is previously known
for j in range(100000):
    b = np.roll(b, shift[j], axis=0)
    a = np.add(a, b)

Approach #1 (Vectorized)
We can use modulus to simulate the circulating behavior of roll/circshift and with broadcasted indices to cover all rows, we would have a fully vectorized approach, like so -
n = b.shape[0]
idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
a += b[idx].sum(0)
Approach #2 (Loopy one)
n = b.shape[0]
b_ext = np.vstack((b, b[:-1]))
start_idx = n-1 - np.mod(shift.cumsum()-1,n)
for j in range(start_idx.size):
    a += b_ext[start_idx[j]:start_idx[j]+n]
Colon notation vs using indices for slicing
The idea here is to do minimal work once we are inside the loop. We pre-compute the start row index of each iteration before entering the loop, so all we need to do inside the loop is slice with colon notation, which gives a view into the array, and add it up. This should be much better than rolling, which has to compute all of those row indices and produces an expensive copy.
Here's a bit more on the view and copy concepts when slicing with colon and indices -
In [11]: a = np.random.randint(0,9,(10))
In [12]: a
Out[12]: array([8, 0, 1, 7, 5, 0, 6, 1, 7, 0])
In [13]: a[3:8]
Out[13]: array([7, 5, 0, 6, 1])
In [14]: a[[3,4,5,6,7]]
Out[14]: array([7, 5, 0, 6, 1])
In [15]: np.may_share_memory(a, a[3:8])
Out[15]: True
In [16]: np.may_share_memory(a, a[[3,4,5,6,7]])
Out[16]: False
Runtime test
Function definitions -
def original_loopy_app(a,b):
    for j in range(shift.size):
        b = np.roll(b, shift[j], axis=0)
        a += b
def vectorized_app(a,b):
    n = b.shape[0]
    idx = n-1 - np.mod(shift.cumsum()[:,None]-1 - np.arange(n), n)
    a += b[idx].sum(0)
def modified_loopy_app(a,b):
    n = b.shape[0]
    b_ext = np.vstack((b, b[:-1]))
    start_idx = n-1 - np.mod(shift.cumsum()-1,n)
    for j in range(start_idx.size):
        a += b_ext[start_idx[j]:start_idx[j]+n]
Case #1:
In [5]: # Setup input arrays
...: N = 200
...: M = 1000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [6]: original_loopy_app(a1,b1)
...: vectorized_app(a2,b2)
...: modified_loopy_app(a3,b3)
...:
In [7]: np.allclose(a1, a2) # Verify results
Out[7]: True
In [8]: np.allclose(a1, a3) # Verify results
Out[8]: True
In [9]: %timeit original_loopy_app(a1,b1)
...: %timeit vectorized_app(a2,b2)
...: %timeit modified_loopy_app(a3,b3)
...:
10 loops, best of 3: 107 ms per loop
10 loops, best of 3: 137 ms per loop
10 loops, best of 3: 48.2 ms per loop
Case #2:
In [13]: # Setup input arrays (datasets are exactly 1/10th of original sizes)
...: N = 200
...: M = 10000
...: a = np.random.randint(11,99,(N,N))
...: b = np.random.randint(11,99,(N,N))
...: shift = np.random.randint(0,N,M)
...:
In [14]: %timeit original_loopy_app(a1,b1)
...: %timeit modified_loopy_app(a3,b3)
...:
1 loops, best of 3: 1.11 s per loop
1 loops, best of 3: 481 ms per loop
So, we are looking at 2x+ speedup there with the modified loopy approach!
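To see why the slice-based loop matches the cumulative np.roll loop, here is a small self-contained check. It is a sketch on tiny, arbitrarily chosen sizes; the start index is written as (-cumsum) % N, which is algebraically the same as the n-1 - mod(cumsum-1, n) expression used above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 20                         # small sizes, chosen for illustration
a = rng.integers(0, 9, (N, N))
b = rng.integers(0, 9, (N, N))
shift = rng.integers(0, N, M)

# Reference: roll b cumulatively and accumulate.
ref, rolled = a.copy(), b.copy()
for s in shift:
    rolled = np.roll(rolled, s, axis=0)
    ref += rolled

# Slice-based version: row i of the rolled array is b[(i - total_shift) % N],
# which is a contiguous block of b stacked on top of itself.
b_ext = np.vstack((b, b[:-1]))
start = (-shift.cumsum()) % N
out = a.copy()
for s0 in start:
    out += b_ext[s0:s0 + N]

print(np.array_equal(ref, out))  # True
```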

Efficient way to check high dimensional arrays are overlapped in two ndarray in Python

For example, I have two ndarrays: the shape of train_dataset is (10000, 28, 28) and the shape of val_dateset is (2000, 28, 28).
Other than iterating, is there an efficient way to use numpy array functions to find the overlap between the two ndarrays?

One trick I learnt from Jaime's excellent answer is to use an np.void dtype in order to view each row in the input arrays as a single element. This allows you to treat them as 1D arrays, which can then be passed to np.in1d or one of the other set routines.
import numpy as np
def find_overlap(A, B):
    if not A.dtype == B.dtype:
        raise TypeError("A and B must have the same dtype")
    if not A.shape[1:] == B.shape[1:]:
        raise ValueError("the shapes of A and B must be identical apart from "
                         "the row dimension")
    # reshape A and B to 2D arrays. force a copy if necessary in order to
    # ensure that they are C-contiguous.
    A = np.ascontiguousarray(A.reshape(A.shape[0], -1))
    B = np.ascontiguousarray(B.reshape(B.shape[0], -1))
    # void type that views each row in A and B as a single item
    t = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    # use in1d to find rows in A that are also in B
    return np.in1d(A.view(t), B.view(t))
For example:
gen = np.random.RandomState(0)
A = gen.randn(1000, 28, 28)
dupe_idx = gen.choice(A.shape[0], size=200, replace=False)
B = A[dupe_idx]
A_in_B = find_overlap(A, B)
print(np.all(np.where(A_in_B)[0] == np.sort(dupe_idx)))
# True
This method is much more memory-efficient than Divakar's, since it doesn't require broadcasting out to an (m, n, ...) boolean array. In fact, if A and B are row-major then no copying is required at all.
For comparison I've slightly adapted Divakar and B. M.'s solutions.
def divakar(A, B):
    A.shape = A.shape[0], -1
    B.shape = B.shape[0], -1
    return (B[:,None] == A).all(axis=(2)).any(0)
def bm(A, B):
    t = 'S' + str(A.size // A.shape[0] * A.dtype.itemsize)
    ma = np.frombuffer(np.ascontiguousarray(A), t)
    mb = np.frombuffer(np.ascontiguousarray(B), t)
    return (mb[:, None] == ma).any(0)
Benchmarks:
In [1]: na = 1000; nb = 200; rowshape = 28, 28
In [2]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
divakar(A, B)
....:
1 loops, best of 3: 244 ms per loop
In [3]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
bm(A, B)
....:
100 loops, best of 3: 2.81 ms per loop
In [4]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
find_overlap(A, B)
....:
100 loops, best of 3: 15 ms per loop
As you can see, B. M.'s solution is slightly faster than mine for small n, but np.in1d scales better than testing equality for all elements (O(n log n) rather than O(n²) complexity).
In [5]: na = 10000; nb = 2000; rowshape = 28, 28
In [6]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
bm(A, B)
....:
1 loops, best of 3: 271 ms per loop
In [7]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
find_overlap(A, B)
....:
10 loops, best of 3: 123 ms per loop
Divakar's solution is intractable on my laptop for arrays of this size, since it requires generating a 15GB intermediate array whereas I only have 8GB RAM.

Memory permitting you could use broadcasting, like so -
val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]
Sample run -
In [55]: train_dataset
Out[55]:
array([[[1, 1],
        [1, 1]],
       [[1, 0],
        [0, 0]],
       [[0, 0],
        [0, 1]],
       [[0, 1],
        [0, 0]],
       [[1, 1],
        [1, 0]]])
In [56]: val_dateset
Out[56]:
array([[[0, 1],
        [1, 0]],
       [[1, 1],
        [1, 1]],
       [[0, 0],
        [0, 1]]])
In [57]: val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]
Out[57]:
array([[[1, 1],
        [1, 1]],
       [[0, 0],
        [0, 1]]])
If the elements are integers, you could collapse every block of axis=(1,2) in the input arrays into a scalar assuming them as linearly index-able numbers and then efficiently use np.in1d or np.intersect1d to find the matches.

Full broadcasting here generates a 10000*2000*28*28 boolean array, about 15.7 GB.
For efficiency, you can:
Pack the data, so that the comparison result is a 10000*2000 array (about 20 MB):
from pylab import *
N = 10000
a = rand(N, 28, 28)
b = a[randint(0, N, N//5)]
packedtype = 'S' + str(a.size//a.shape[0]*a.dtype.itemsize)  # 'S6272'
ma = frombuffer(a, packedtype)  # ma.shape = (10000,)
mb = frombuffer(b, packedtype)  # mb.shape = (2000,)
%timeit a[:,None]==b  # 102 s
%timeit ma[:,None]==mb  # 800 ms
allclose((a[:,None]==b).all((2,3)), (ma[:,None]==mb))  # True
The lower memory use is helped here by lazy string comparison, which stops at the first difference:
In [31]: %timeit a[:100]==b[:100]
10000 loops, best of 3: 175 µs per loop
In [32]: %timeit a[:100]==a[:100]
10000 loops, best of 3: 133 µs per loop
In [34]: %timeit ma[:100]==mb[:100]
100000 loops, best of 3: 7.55 µs per loop
In [35]: %timeit ma[:100]==ma[:100]
10000 loops, best of 3: 156 µs per loop
The solutions are then given by (ma[:,None]==mb).nonzero().
Use in1d, for (Na+Nb) ln(Na+Nb) complexity, against Na*Nb for the full comparison:
%timeit in1d(ma,mb).nonzero()  # 590 ms
Not a big gain here, but asymptotically better.

Solution
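def overlap(a,b):
    """
    returns a boolean index array for input array b representing
    elements in b that are also found in a
    """
    aa = a.repeat(b.shape[0],axis=0)
    bb = b.repeat(a.shape[0],axis=0)
    c = aa == bb
    c = c[::a.shape[0]]
    return c.all(axis=1)[:,0]
You can use the returned index array to index b to extract the elements which are also found in a
b[overlap(a,b)]
Note that, as written, each b[i] ends up compared against a single element of a rather than all of them, and [:,0] inspects only the first column of each sub-array, so a match is detected only when the shared element sits at the corresponding position in both arrays (as in the example below).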
Explanation
For simplicity's sake I assume you have imported everything from numpy for this example:
from numpy import *
So, for example, given two ndarrays
a = arange(4*2*2).reshape(4,2,2)
b = arange(3*2*2).reshape(3,2,2)
we repeat a and b so that they have the same shape
aa = a.repeat(b.shape[0],axis=0)
bb = b.repeat(a.shape[0],axis=0)
we can then simply compare the elements of aa and bb
c = aa == bb
Next, we get the entries that correspond to each element of b by looking at every 4th, or actually, every shape(a)[0]th element of c
c = c[::a.shape[0]]
Finally, we extract an index array with only the elements where all elements in the sub-arrays are True
c.all(axis=1)[:,0]
In our example we get
array([True, True, True], dtype=bool)
To check, change the first element of b
b[0] = array([[50,60],[70,80]])
and we get
array([False, True, True], dtype=bool)

Does this question come from Google's online deep learning course?
The following is my solution:
count = 0  # number of overlapping rows
for i in range(val_dataset.shape[0]):  # iterate over all rows of val_dataset
    overlap = (train_dataset == val_dataset[i,:,:]).all(axis=1).all(axis=1).sum()
    if overlap:
        count += 1
print(count)
Broadcasting replaces the inner loop over train_dataset, so only the loop over val_dataset remains. You may test the performance difference.
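A different technique from the broadcasting above, which avoids comparing every pair, is to hash the raw bytes of each item and test membership in a set. The sketch below is an illustration under assumptions: the name overlap_mask is hypothetical, both arrays must share dtype and item shape, and equality is exact at the byte level (so 0.0 and -0.0 count as different).

```python
import numpy as np

def overlap_mask(train, val):
    """Boolean mask over val: True where an identical item exists in train."""
    # tobytes() gives a hashable key; equality is exact, byte for byte.
    seen = {item.tobytes() for item in train}
    return np.array([item.tobytes() in seen for item in val])

rng = np.random.default_rng(0)
train = rng.random((100, 28, 28))
# First 5 items of val are copied from train, the rest are fresh.
val = np.concatenate((train[:5], rng.random((15, 28, 28))))
mask = overlap_mask(train, val)
print(mask.sum())  # 5
```

The work is roughly proportional to the total number of items, and val[mask] then gives the overlapping items themselves.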
