Say I have two arrays, A and B.
An element wise multiplication is defined as follows:
I want to do an element-wise multiplication in a convolution-like manner, i.e., move every column one step to the right with wrap-around, so that, for example, column 1 becomes column 2 and column 3 becomes column 1.
This should yield a ( 2 by 3 by 3 ) array (2x3 matrix for all 3 possibilities)
We can concatenate A with a slice of itself and then get those sliding windows. To get those windows, we can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows. Then, multiply those windows with B for the final output. More info on use of as_strided based view_as_windows.
Hence, we will have one vectorized solution like so -
In [70]: from skimage.util.shape import view_as_windows
In [71]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
In [74]: view_as_windows(A1,A.shape)[0]*B
Out[74]:
array([[[1, 0, 3],
[0, 0, 6]],
[[2, 0, 1],
[0, 0, 4]],
[[3, 0, 2],
[0, 0, 5]]])
We can also leverage multi-cores with numexpr module for the final step of broadcasted-multiplication, which should be better on larger arrays. Hence, for the sample case, it would be -
In [53]: import numexpr as ne
In [54]: w = view_as_windows(A1,A.shape)[0]
In [55]: ne.evaluate('w*B')
Out[55]:
array([[[1, 0, 3],
[0, 0, 6]],
[[2, 0, 1],
[0, 0, 4]],
[[3, 0, 2],
[0, 0, 5]]])
Timings on large arrays comparing the proposed two methods -
In [56]: A = np.random.rand(500,500)
...: B = np.random.rand(500,500)
In [57]: A1 = np.concatenate((A,A[:,:-1]),axis=1)
...: w = view_as_windows(A1,A.shape)[0]
In [58]: %timeit w*B
...: %timeit ne.evaluate('w*B')
1 loop, best of 3: 422 ms per loop
1 loop, best of 3: 228 ms per loop
Squeezing the best out of the strided-based method
If you really want to squeeze the best out of the strided-view-based approach, go with the original np.lib.stride_tricks.as_strided based one to avoid the function-call overhead of view_as_windows -
def vaw_with_as_strided(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    s0,s1 = A1.strides
    S = (A.shape[1],)+A.shape
    w = np.lib.stride_tricks.as_strided(A1,shape=S,strides=(s1,s0,s1))
    return w*B
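To see what the strided view actually holds, here is a small check against an explicit loop: window k should equal A with its columns rotated k steps to the left. The np.roll reference and the 4x6 test shape are my own choices for illustration, not part of the answer above.

```python
import numpy as np

def vaw_with_as_strided(A, B):
    # Append all but the last column so every length-n window wraps around
    A1 = np.concatenate((A, A[:, :-1]), axis=1)
    s0, s1 = A1.strides
    S = (A.shape[1],) + A.shape
    # w[k, i, j] aliases A1[i, k + j]; no data is copied at this point
    w = np.lib.stride_tricks.as_strided(A1, shape=S, strides=(s1, s0, s1))
    return w * B

rng = np.random.default_rng(0)
A = rng.random((4, 6))
B = rng.random((4, 6))

# Reference loop (assumed equivalent): rotate columns left by k for window k
expected = np.stack([np.roll(A, -k, axis=1) * B for k in range(A.shape[1])])
result = vaw_with_as_strided(A, B)
```

The view itself is free; the memory cost only appears in the final broadcasted multiplication, which materializes an (n, m, n) array.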
Comparing against @Paul Panzer's array-assignment based one, the crossover seems to be at 19x19 shaped arrays -
In [33]: n = 18
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [34]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 22.4 µs per loop
10000 loops, best of 3: 21.4 µs per loop
In [35]: n = 19
...: A = np.random.rand(n,n)
...: B = np.random.rand(n,n)
In [36]: %timeit vaw_with_as_strided(A,B)
...: %timeit pp(A,B)
10000 loops, best of 3: 24.5 µs per loop
10000 loops, best of 3: 24.5 µs per loop
So, for anything smaller than 19x19, array-assignment seems to be better, and for anything larger the strided-based one should be the way to go.
Just a note on view_as_windows/as_strided. Neat as these functions are, it is useful to know that they have a rather pronounced constant overhead. Here is a comparison between @Divakar's view_as_windows based solution (vaw) and a copy-reshape based approach by me.
As you can see vaw is not very fast on small to medium sized operands and only begins to shine above array size 30x30.
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from skimage.util.shape import view_as_windows
B = BenchmarkBuilder()
@B.add_function()
def vaw(A,B):
    A1 = np.concatenate((A,A[:,:-1]),axis=1)
    w = view_as_windows(A1,A.shape)[0]
    return w*B
@B.add_function()
def pp(A,B):
    m,n = A.shape
    aux = np.empty((n,m,2*n),A.dtype)
    AA = np.concatenate([A,A],1)
    aux.reshape(-1)[:-n].reshape(n,-1)[...] = AA.reshape(-1)[:-1]
    return aux[...,:n]*B
@B.add_arguments('array size')
def argument_provider():
    for exp in range(4, 16):
        dim_size = int(1.4**exp)
        a = np.random.rand(dim_size,dim_size)
        b = np.random.rand(dim_size,dim_size)
        yield dim_size, MultiArgument([a,b])
r = B.run()
r.plot()
import pylab
pylab.savefig('vaw.png')
Run a for loop over the number of columns and use np.roll() along axis=1 to shift your columns, then do the element-wise multiplication.
refer to the accepted answer in this reference.
Hope this helps.
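A minimal sketch of that loop could look like the following. The values of A and B are my reconstruction from the output printed in the answer above (they are not given in the question text), and rolled_products is just an illustrative name.

```python
import numpy as np

# Assumed sample inputs, inferred from the printed results earlier in the thread
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0, 1],
              [0, 0, 1]])

def rolled_products(A, B):
    # Slice k holds (A with every column moved k steps to the right) * B
    return np.stack([np.roll(A, k, axis=1) * B for k in range(A.shape[1])])

out = rolled_products(A, B)
```

Note that this shifts columns to the right, as the question describes, so slices 1 and 2 come out in the opposite order compared to the sliding-window answer, which effectively shifts to the left.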
I can actually pad the array from its two sides with 2 columns (to get 2x5 array)
and run a conv2 with 'b' as a kernel, I think it's more efficient
I'm using numpy einsum to calculate the dot products of an array of column vectors pts, of shape (3,N), with itself, resulting in a matrix dotps, of shape (N,N), with all the dot products. This is the code I use:
dotps = np.einsum('ij,ik->jk', pts, pts)
This works, but I only need the values above the main diagonal, i.e., the upper triangular part of the result without the diagonal. Is it possible to compute only these values with einsum, or in any other way that is faster than using einsum to compute the whole matrix?
My pts array can be quite large so if I could calculate only the values I need that would double my computation speed.
You can slice relevant columns and then use np.einsum -
R,C = np.triu_indices(N,1)
out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
Sample run -
In [109]: N = 5
...: pts = np.random.rand(3,N)
...: dotps = np.einsum('ij,ik->jk', pts, pts)
...:
In [110]: dotps
Out[110]:
array([[ 0.26529103, 0.30626052, 0.18373867, 0.13602931, 0.51162729],
[ 0.30626052, 0.56132272, 0.5938057 , 0.28750708, 0.9876753 ],
[ 0.18373867, 0.5938057 , 0.84699103, 0.35788749, 1.04483158],
[ 0.13602931, 0.28750708, 0.35788749, 0.18274288, 0.4612556 ],
[ 0.51162729, 0.9876753 , 1.04483158, 0.4612556 , 1.82723949]])
In [111]: R,C = np.triu_indices(N,1)
...: out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
...:
In [112]: out
Out[112]:
array([ 0.30626052, 0.18373867, 0.13602931, 0.51162729, 0.5938057 ,
0.28750708, 0.9876753 , 0.35788749, 1.04483158, 0.4612556 ])
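As a quick sanity check, the sliced einsum can be compared against the strict upper triangle of the full matrix. This is only a sketch with a small N and a seeded generator of my choosing:

```python
import numpy as np

N = 6
pts = np.random.default_rng(0).random((3, N))

# Full (N, N) matrix of dot products, computed for reference only
dotps = np.einsum('ij,ik->jk', pts, pts)

# Dot products for column pairs (j, k) with j < k only
R, C = np.triu_indices(N, 1)
out = np.einsum('ij,ij->j', pts[:, R], pts[:, C])

# out is ordered row by row over the strict upper triangle
matches = np.allclose(out, dotps[R, C])
```

Keep in mind that pts[:,R] and pts[:,C] are copies of shape (3, N*(N-1)/2) each, so the saving is in arithmetic, not in memory.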
Optimizing further -
Let's time our approach and see if there's any scope for improvement performance-wise.
In [126]: N = 5000
In [127]: pts = np.random.rand(3,N)
In [128]: %timeit np.triu_indices(N,1)
1 loops, best of 3: 413 ms per loop
In [129]: R,C = np.triu_indices(N,1)
In [130]: %timeit np.einsum('ij,ij->j',pts[:,R],pts[:,C])
1 loops, best of 3: 1.47 s per loop
Staying within the memory constraints, it doesn't look like we can do much about optimizing np.einsum. So, let's shift the focus to np.triu_indices.
For N = 4, we have :
In [131]: N = 4
In [132]: np.triu_indices(N,1)
Out[132]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
It seems to be creating a regular pattern, sort of like a shifting one though. This could be written with a cumulative sum that has shifts at those 3 and 5 positions. Thinking generically, we would end up coding it something like this -
def triu_indices_cumsum(N):
    # Length of R and C index arrays (integer division keeps L an int)
    L = (N*(N-1))//2
    # Positions along the R and C arrays that indicate
    # shifting to the next row of the full array
    shifts_idx = np.arange(2,N)[::-1].cumsum()
    # Initialize "shift" arrays for finally leading to R and C
    shifts1_arr = np.zeros(L,dtype=int)
    shifts2_arr = np.ones(L,dtype=int)
    # At shift positions along the shifts array set appropriate values,
    # such that when cumulative summed would lead to desired R and C arrays.
    shifts1_arr[shifts_idx] = 1
    shifts2_arr[shifts_idx] = -np.arange(N-2)[::-1]
    # Finally cumsum to give R, C
    R_arr = shifts1_arr.cumsum()
    C_arr = shifts2_arr.cumsum()
    return R_arr, C_arr
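Before timing it, it is worth confirming that the cumsum construction reproduces np.triu_indices(N,1). The sketch below restates the function in self-contained form (with integer division so it runs on Python 3) and checks a range of N; the range itself is arbitrary.

```python
import numpy as np

def triu_indices_cumsum(N):
    L = (N * (N - 1)) // 2
    # Flat positions where the row index steps to the next row
    shifts_idx = np.arange(2, N)[::-1].cumsum()
    shifts1_arr = np.zeros(L, dtype=int)
    shifts2_arr = np.ones(L, dtype=int)
    # Row index goes up by one at each shift position
    shifts1_arr[shifts_idx] = 1
    # Column index jumps back from N-1 to (row + 1) at each shift position
    shifts2_arr[shifts_idx] = -np.arange(N - 2)[::-1]
    return shifts1_arr.cumsum(), shifts2_arr.cumsum()

all_match = True
for N in range(3, 40):
    R_ref, C_ref = np.triu_indices(N, 1)
    R, C = triu_indices_cumsum(N)
    all_match &= np.array_equal(R, R_ref) and np.array_equal(C, C_ref)
```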
Let's time it for various N's!
In [133]: N = 100
In [134]: %timeit np.triu_indices(N,1)
10000 loops, best of 3: 122 µs per loop
In [135]: %timeit triu_indices_cumsum(N)
10000 loops, best of 3: 61.7 µs per loop
In [136]: N = 1000
In [137]: %timeit np.triu_indices(N,1)
100 loops, best of 3: 17 ms per loop
In [138]: %timeit triu_indices_cumsum(N)
100 loops, best of 3: 16.3 ms per loop
Thus, it looks like for decent N's, the customized cumsum based triu_indices might be worth a look!
For example, I have two ndarrays, the shape of train_dataset is (10000, 28, 28) and the shape of val_dateset is (2000, 28, 28).
Except for using iterations, is there any efficient way to use the numpy array functions to find the overlap between two ndarrays?
One trick I learnt from Jaime's excellent answer here is to use an np.void dtype in order to view each row in the input arrays as a single element. This allows you to treat them as 1D arrays, which can then be passed to np.in1d or one of the other set routines.
import numpy as np
def find_overlap(A, B):
    if not A.dtype == B.dtype:
        raise TypeError("A and B must have the same dtype")
    if not A.shape[1:] == B.shape[1:]:
        raise ValueError("the shapes of A and B must be identical apart from "
                         "the row dimension")
    # reshape A and B to 2D arrays. force a copy if necessary in order to
    # ensure that they are C-contiguous.
    A = np.ascontiguousarray(A.reshape(A.shape[0], -1))
    B = np.ascontiguousarray(B.reshape(B.shape[0], -1))
    # void type that views each row in A and B as a single item
    t = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    # use in1d to find rows in A that are also in B
    return np.in1d(A.view(t), B.view(t))
For example:
gen = np.random.RandomState(0)
A = gen.randn(1000, 28, 28)
dupe_idx = gen.choice(A.shape[0], size=200, replace=False)
B = A[dupe_idx]
A_in_B = find_overlap(A, B)
print(np.all(np.where(A_in_B)[0] == np.sort(dupe_idx)))
# True
This method is much more memory-efficient than Divakar's, since it doesn't require broadcasting out to an (m, n, ...) boolean array. In fact, if A and B are row-major then no copying is required at all.
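If the void view feels opaque, the underlying idea (treat each row as one opaque item and compare raw bytes) can be illustrated with plain Python hashing. This is a different, slower technique that I am swapping in purely for illustration; find_overlap_bytes is a hypothetical name and the array sizes are arbitrary.

```python
import numpy as np

def find_overlap_bytes(A, B):
    # Flatten every item to one row so each row maps to one bytes object
    A2 = np.ascontiguousarray(A).reshape(A.shape[0], -1)
    B2 = np.ascontiguousarray(B).reshape(B.shape[0], -1)
    # Raw bytes of each row of B, collected in a set for O(1) lookups
    seen = {row.tobytes() for row in B2}
    return np.array([row.tobytes() in seen for row in A2])

gen = np.random.RandomState(0)
A = gen.randn(100, 4, 4)
dupe_idx = gen.choice(A.shape[0], size=20, replace=False)
B = A[dupe_idx]
A_in_B = find_overlap_bytes(A, B)
```

As with the void view, equality here is byte-wise, so values such as 0.0 and -0.0 are treated as different and identical NaN bit patterns as equal.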
For comparison I've slightly adapted Divakar and B. M.'s solutions.
def divakar(A, B):
    A.shape = A.shape[0], -1
    B.shape = B.shape[0], -1
    return (B[:,None] == A).all(axis=(2)).any(0)
def bm(A, B):
    t = 'S' + str(A.size // A.shape[0] * A.dtype.itemsize)
    ma = np.frombuffer(np.ascontiguousarray(A), t)
    mb = np.frombuffer(np.ascontiguousarray(B), t)
    return (mb[:, None] == ma).any(0)
Benchmarks:
In [1]: na = 1000; nb = 200; rowshape = 28, 28
In [2]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
divakar(A, B)
....:
1 loops, best of 3: 244 ms per loop
In [3]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
bm(A, B)
....:
100 loops, best of 3: 2.81 ms per loop
In [4]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
find_overlap(A, B)
....:
100 loops, best of 3: 15 ms per loop
As you can see, B. M.'s solution is slightly faster than mine for small n, but np.in1d scales better than testing equality for all elements (O(n log n) rather than O(n²) complexity).
In [5]: na = 10000; nb = 2000; rowshape = 28, 28
In [6]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
bm(A, B)
....:
1 loops, best of 3: 271 ms per loop
In [7]: %%timeit A = gen.randn(na, *rowshape); idx = gen.choice(na, size=nb, replace=False); B = A[idx]
find_overlap(A, B)
....:
10 loops, best of 3: 123 ms per loop
Divakar's solution is intractable on my laptop for arrays of this size, since it requires generating a 15GB intermediate array whereas I only have 8GB RAM.
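The size of that intermediate follows directly from the shapes. A back-of-the-envelope sketch, assuming one byte per boolean element:

```python
na, nb, rowshape = 10000, 2000, (28, 28)

# Elementwise comparison broadcasts to (nb, na, 28, 28) booleans
full_bytes = nb * na * rowshape[0] * rowshape[1]

# Comparing one packed item per row only needs an (nb, na) boolean array
packed_bytes = nb * na

full_gb = full_bytes / 1e9      # about 15.7
packed_mb = packed_bytes / 1e6  # about 20
```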
Memory permitting you could use broadcasting, like so -
val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]
Sample run -
In [55]: train_dataset
Out[55]:
array([[[1, 1],
[1, 1]],
[[1, 0],
[0, 0]],
[[0, 0],
[0, 1]],
[[0, 1],
[0, 0]],
[[1, 1],
[1, 0]]])
In [56]: val_dateset
Out[56]:
array([[[0, 1],
[1, 0]],
[[1, 1],
[1, 1]],
[[0, 0],
[0, 1]]])
In [57]: val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]
Out[57]:
array([[[1, 1],
[1, 1]],
[[0, 0],
[0, 1]]])
If the elements are integers, you could collapse every block of axis=(1,2) in the input arrays into a scalar assuming them as linearly index-able numbers and then efficiently use np.in1d or np.intersect1d to find the matches.
Full broadcasting generates a 10000*2000*28*28 boolean array here, which is about 15.7 GB.
For efficiency, you can :
pack the data, so that the comparison yields only a 10000*2000 boolean array (about 20 MB):
from pylab import *
N=10000
a=rand(N,28,28)
b=a[randint(0,N,N//5)]
packedtype='S'+ str(a.size//a.shape[0]*a.dtype.itemsize) # 'S6272'
ma=frombuffer(a,packedtype) # ma.shape=10000
mb=frombuffer(b,packedtype) # mb.shape=2000
%timeit a[:,None]==b # 102 s
%timeit ma[:,None]==mb # 800 ms
allclose((a[:,None]==b).all((2,3)),(ma[:,None]==mb)) # True
Besides using less memory, this is helped by lazy string comparison, which stops at the first difference :
In [31]: %timeit a[:100]==b[:100]
10000 loops, best of 3: 175 µs per loop
In [32]: %timeit a[:100]==a[:100]
10000 loops, best of 3: 133 µs per loop
In [34]: %timeit ma[:100]==mb[:100]
100000 loops, best of 3: 7.55 µs per loop
In [35]: %timeit ma[:100]==ma[:100]
10000 loops, best of 3: 156 µs per loop
Solutions are given here with (ma[:,None]==mb).nonzero().
use in1d, for a (Na+Nb) ln(Na+Nb) complexity, against
Na*Nb on full comparison :
%timeit in1d(ma,mb).nonzero() : 590ms
Not a big gain here, but asymptotically better.
Solution
def overlap(a,b):
    """
    returns a boolean index array for input array b representing
    elements in b that are also found in a
    """
    aa = a.repeat(b.shape[0],axis=0)
    bb = b.repeat(a.shape[0],axis=0)
    c = aa == bb
    c = c[::a.shape[0]]
    return c.all(axis=1)[:,0]
You can use the returned index array to index b to extract the elements which are also found in a
b[overlap(a,b)]
Explanation
For simplicity's sake I assume you have imported everything from numpy for this example:
from numpy import *
So, for example, given two ndarrays
a = arange(4*2*2).reshape(4,2,2)
b = arange(3*2*2).reshape(3,2,2)
we repeat a and b so that they have the same shape
aa = a.repeat(b.shape[0],axis=0)
bb = b.repeat(a.shape[0],axis=0)
we can then simply compare the elements of aa and bb
c = aa == bb
Next, we get the entries for the elements in b which are also found in a by looking at every 4th, or actually every shape(a)[0]th, element of c
c = c[::a.shape[0]]
Finally, we extract an index array with only the elements where all elements in the sub-arrays are True
c.all(axis=1)[:,0]
In our example we get
array([True, True, True], dtype=bool)
To check, change the first element of b
b[0] = array([[50,60],[70,80]])
and we get
array([False, True, True], dtype=bool)
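One caveat I noticed when tracing the shapes: repeating both arrays element-wise lines up a[i] with b[i] after the strided slice, so an element of b that sits at a different position in a is not detected. A sketch that compares every pair instead, using np.tile for one operand (my substitution, not the code from this answer), could look like this; overlap_all_pairs is a hypothetical name.

```python
import numpy as np

def overlap_all_pairs(a, b):
    na, nb = a.shape[0], b.shape[0]
    # b0,b0,...,b1,b1,...  (each element of b repeated na times)
    bb = b.repeat(na, axis=0)
    # a0,a1,...,a0,a1,...  (all of a tiled nb times)
    aa = np.tile(a, (nb,) + (1,) * (a.ndim - 1))
    # One flag per (element of b, element of a) pair
    equal = (aa == bb).reshape(nb, na, -1).all(axis=2)
    # True where an element of b equals at least one element of a
    return equal.any(axis=1)

a = np.arange(4 * 2 * 2).reshape(4, 2, 2)
# b holds a[3], a[0] and one block that does not occur in a
b = np.concatenate([a[[3, 0]], np.full((1, 2, 2), 99)])
mask = overlap_all_pairs(a, b)
```

Like any full pairwise comparison, this still allocates na*nb copies of the element shape, so it is only practical for small inputs.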
This question comes from Google's online deep learning course?
The following is my solution:
count = 0 # number of overlapping rows
for i in range(val_dataset.shape[0]): # iterate over all rows of val_dataset
    overlap = (train_dataset == val_dataset[i,:,:]).all(axis=1).all(axis=1).sum()
    if overlap:
        count += 1
print(count)
Broadcasting is used to compare each item of val_dataset against all of train_dataset at once, so only the outer loop remains. You may test the performance difference.
I have a ndarray of shape(z,y,x) containing values. I am trying to index this array with another ndarray of shape(y,x) that contains the z-index of the value I am interested in.
import numpy as np
val_arr = np.arange(27).reshape(3,3,3)
z_indices = np.array([[1,0,2],
[0,0,1],
[2,0,1]])
Since my arrays are rather large I tried to use np.take to avoid unnecessary copies of the array but just can't wrap my head around indexing 3-dimensional arrays with it.
How do I have to index val_arr with z_indices to get the values at the desired z-axis position? The expected outcome would be:
result_arr = np.array([[9,1,20],
[3,4,14],
[24,7,17]])
You can use choose to make the selection:
>>> z_indices.choose(val_arr)
array([[ 9, 1, 20],
[ 3, 4, 14],
[24, 7, 17]])
The function choose is incredibly useful, but can be somewhat tricky to make sense of. Essentially, given an array (val_arr) we can make a series of choices (z_indices) from each n-dimensional slice along the first axis.
Also: any fancy indexing operation will create a new array rather than a view of the original data. It is not possible to index val_arr with z_indices without creating a brand new array.
With readability, np.choose definitely looks great.
If performance is of essence, you can calculate the linear indices and then use np.take or use a flattened version with .ravel() and extract those specific elements from val_arr. The implementation would look something like this -
def linidx_take(val_arr,z_indices):
    # Get number of rows and columns in values array
    _,nR,nC = val_arr.shape
    # Get linear indices and thus extract elements with np.take
    idx = nR*nC*z_indices + nC*np.arange(nR)[:,None] + np.arange(nC)
    return np.take(val_arr,idx) # Or val_arr.ravel()[idx]
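A small check of the linear-index idea on a non-square array, with the flat index written out as z*nR*nC + r*nC + c for a C-ordered (nZ, nR, nC) array. The (5, 3, 4) test shape and the ogrid-based reference are my choices for illustration.

```python
import numpy as np

def linidx_take(val_arr, z_indices):
    # val_arr has shape (nZ, nR, nC); z_indices has shape (nR, nC)
    _, nR, nC = val_arr.shape
    # Flat index of element (z, r, c) in C order
    idx = nR * nC * z_indices + nC * np.arange(nR)[:, None] + np.arange(nC)
    # np.take without an axis indexes into the flattened array
    return np.take(val_arr, idx)

rng = np.random.default_rng(0)
val_arr = rng.random((5, 3, 4))
z_indices = rng.integers(0, 5, size=(3, 4))

# Reference result via broadcasted fancy indexing
rows, cols = np.ogrid[0:3, 0:4]
reference = val_arr[z_indices, rows, cols]
result = linidx_take(val_arr, z_indices)
```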
Runtime tests and verify results -
Ogrid based solution from here is made into a generic version for these tests, like so :
In [182]: def ogrid_based(val_arr,z_indices):
...: v_shp = val_arr.shape
...: y,x = np.ogrid[0:v_shp[1], 0:v_shp[2]]
...: return val_arr[z_indices, y, x]
...:
Case #1: Smaller datasize
In [183]: val_arr = np.random.rand(30,30,30)
...: z_indices = np.random.randint(0,30,(30,30))
...:
In [184]: np.allclose(z_indices.choose(val_arr),ogrid_based(val_arr,z_indices))
Out[184]: True
In [185]: np.allclose(z_indices.choose(val_arr),linidx_take(val_arr,z_indices))
Out[185]: True
In [187]: %timeit z_indices.choose(val_arr)
1000 loops, best of 3: 230 µs per loop
In [188]: %timeit ogrid_based(val_arr,z_indices)
10000 loops, best of 3: 54.1 µs per loop
In [189]: %timeit linidx_take(val_arr,z_indices)
10000 loops, best of 3: 30.3 µs per loop
Case #2: Bigger datasize
In [191]: val_arr = np.random.rand(300,300,300)
...: z_indices = np.random.randint(0,300,(300,300))
...:
In [192]: z_indices.choose(val_arr) # Seems like there is some limitation here with bigger arrays.
Traceback (most recent call last):
File "<ipython-input-192-10c3bb600361>", line 1, in <module>
z_indices.choose(val_arr)
ValueError: Need between 2 and (32) array objects (inclusive).
In [194]: np.allclose(linidx_take(val_arr,z_indices),ogrid_based(val_arr,z_indices))
Out[194]: True
In [195]: %timeit ogrid_based(val_arr,z_indices)
100 loops, best of 3: 3.67 ms per loop
In [196]: %timeit linidx_take(val_arr,z_indices)
100 loops, best of 3: 2.04 ms per loop
If you have numpy >= 1.15.0 you could use numpy.take_along_axis. In your case:
result_array = numpy.take_along_axis(val_arr, z_indices[None, :, :], axis=0)[0]
That should give you the result you want in one neat line of code. Note the shape of the indices array. It needs to have the same number of dimensions as your val_arr (here a length-1 axis along z, and the same size in the other two dimensions), and the trailing [0] drops that length-1 axis again.
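For the sample arrays from the question, selecting along axis 0 (the z axis) can be checked like this. A minimal sketch, assuming the (z, y, x) layout stated in the question:

```python
import numpy as np

val_arr = np.arange(27).reshape(3, 3, 3)
z_indices = np.array([[1, 0, 2],
                      [0, 0, 1],
                      [2, 0, 1]])

# Add a leading length-1 axis so the indices have 3 dimensions,
# pick along axis 0 (z), then drop the length-1 axis again
result_arr = np.take_along_axis(val_arr, z_indices[None, :, :], axis=0)[0]
```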
Inspired by this thread, using np.ogrid:
y,x = np.ogrid[0:3, 0:3]
print([z_indices, y, x])
[array([[1, 0, 2],
[0, 0, 1],
[2, 0, 1]]),
array([[0],
[1],
[2]]),
array([[0, 1, 2]])]
print(val_arr[z_indices, y, x])
[[ 9 1 20]
[ 3 4 14]
[24 7 17]]
I have to admit that multidimensional fancy indexing can be messy and confusing :)
I have a 2-d array of shape(nx3), say arr1. Now consider a second array, arr2, of same shape as arr1 and has the same rows. However, the rows are not in the same order. I want to get the indices of each row in arr2 as they are in arr1. I am looking for fastest Pythonic way to do this as n is of the order of 10,000.
For example:
arr1 = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = numpy.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
ind = [1, 2, 0]
Note that the row elements need not be integers. In fact they are floats.
I have found related answers that use numpy.searchsorted but they work for 1-D arrays only.
If you are sure that arr2 is a permutation of arr1, you can use sort to get the index:
import numpy as np
n = 100000
a1 = np.random.randint(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))]
idx1 = np.lexsort(a1.T)
idx2 = np.lexsort(a2.T)
idx = idx2[np.argsort(idx1)]
np.all(a1 == a2[idx])
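Applied to the small example from the question, the two lexsort orders can be combined in either direction. The sketch below is my own walk-through; which direction you need depends on whether you want positions in arr2 or in arr1.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])

idx1 = np.lexsort(arr1.T)  # order that sorts the rows of arr1
idx2 = np.lexsort(arr2.T)  # order that sorts the rows of arr2

# np.argsort(idx1)[i] is the rank of arr1[i]; idx2[rank] is the arr2 row
# holding that rank, so arr1 == arr2[idx]
idx = idx2[np.argsort(idx1)]

# Swapping the roles gives the positions in arr1 of each row of arr2,
# which is the `ind` asked for in the question: arr2 == arr1[ind]
ind = idx1[np.argsort(idx2)]
```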
If they don't have exactly the same values, you can use cKDTree from scipy:
n = 100000
a1 = np.random.uniform(0, 100, size=(n, 3))
a2 = a1[np.random.permutation(np.arange(n))] + np.random.normal(0, 1e-8, size=(n, 3))
from scipy import spatial
tree = spatial.cKDTree(a2)
dist, idx = tree.query(a1)
np.allclose(a1, a2[idx])
Before we begin, you should mention whether duplicates can exist in your list.
That said, the method I would use is numpy's where function within a list comprehension like so:
[numpy.where((arr1 == x).all(axis=1))[0][0] for x in arr2]
Though this might not be the fastest way. Another method might include building a dictionary from the rows in arr1 somehow and then looking them up with arr2.
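The dictionary idea could be sketched as follows, assuming rows match exactly (no floating-point tolerance) and that each row occurs once in arr1. The name lookup is just for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])

# Map each row of arr1 (as a hashable tuple) to its row index
lookup = {tuple(row.tolist()): i for i, row in enumerate(arr1)}

# Look up every row of arr2; raises KeyError if a row is missing from arr1
ind = [lookup[tuple(row.tolist())] for row in arr2]
```

With duplicate rows in arr1 the dictionary keeps the last index seen, which is one reason the question about duplicates matters.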
While this is very similar to: Find indexes of matching rows in two 2-D arrays I don't have the reputation to leave a comment.
However, based on that comment there appear to be two clear possibilities for a large matrix like yours:
def find_rows_searchsorted(a, b):
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    a_view = np.ascontiguousarray(a).view(dt).ravel()
    b_view = np.ascontiguousarray(b).view(dt).ravel()
    sort_b = np.argsort(b_view)
    where_in_b = np.searchsorted(b_view, a_view, sorter=sort_b)
    return np.take(sort_b, where_in_b)
def find_rows_iterative(a, b):
    answer = np.empty(a.shape[0], dtype=int)
    for idx, row in enumerate(a):
        # take the first matching row index as a scalar
        answer[idx] = np.where(np.equal(b, row).all(1))[0][0]
    return answer
def find_rows_list_comprehension(a, b):
    return [np.where(b == x)[0][0] for x in a]
However, a little timing with a matrix of 10000 rows shows that the searchsorted based method is significantly faster than the brute force iterative method:
arr1 = np.random.randn(10000, 3)
new_inds = np.arange(arr1.shape[0])
np.random.shuffle(new_inds)
arr2 = arr1[new_inds, :]
np.array_equal(find_rows_searchsorted(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_iterative(arr2, arr1), new_inds)
>> True
np.array_equal(find_rows_list_comprehension(arr2, arr1), new_inds)
>> True
%timeit find_rows_iterative(arr2, arr1)
>> 1 loops, best of 3: 2.62 s per loop
%timeit find_rows_list_comprehension(arr2, arr1)
>> 1 loops, best of 3: 1.61 s per loop
%timeit find_rows_searchsorted(arr2, arr1)
>> 100 loops, best of 3: 6.53 ms per loop
Based off of HYRY's great responses I also added lexsort and kdball tests as well as a test of argsort for structured arrays.
def find_rows_lexsort(a, b):
    idx1 = np.lexsort(a.T)
    idx2 = np.lexsort(b.T)
    return idx2[np.argsort(idx1)]
def find_rows_argsort(a, b):
    # np.rec is the public namespace for np.core.records
    a_rec = np.rec.fromarrays(a.transpose())
    b_rec = np.rec.fromarrays(b.transpose())
    idx1 = a_rec.argsort(order=a_rec.dtype.names).argsort()
    return b_rec.argsort(order=b_rec.dtype.names)[idx1]
def find_rows_kdball(a, b):
    from scipy import spatial
    tree = spatial.cKDTree(b)
    _, idx = tree.query(a)
    return idx
%timeit find_rows_lexsort(arr2, arr1)
>> 100 loops, best of 3: 4.63 ms per loop
%timeit find_rows_argsort(arr2, arr1)
>> 100 loops, best of 3: 7.37 ms per loop
%timeit find_rows_kdball(arr2, arr1)
>> 100 loops, best of 3: 18.5 ms per loop
The problem is: given an arbitrary 1-d vector y of length d, expand it into d basis vectors of dimension n.
The rule of the expansion is: each element in y is the index of a column in the n*n identity matrix.
For example:
y = [3, 0, 1]
n = 4
Since n = 4, we have the 4*4 identity matrix:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
Expanding each element of y using the rule, we have:
[0, 1, 0]
[0, 0, 1]
[0, 0, 0]
[1, 0, 0]
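In plain NumPy terms, the expansion above amounts to picking columns of the identity matrix. A minimal sketch for the small example (the variable name expanded is mine):

```python
import numpy as np

y = np.array([3, 0, 1])
n = 4

# Column k of the result is column y[k] of the n x n identity matrix
expanded = np.eye(n, dtype=int)[:, y]
```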
I want to solve this problem using theano, with very large n (>50k) and very long y (>10k), so efficiency is important.
The solution using numpy is trivial, but the numpy.eye function may cost too much, so we may use another method to make it faster. Comparing the following methods:
import numpy as np
import theano
import theano.tensor as T
n = 25500
y_value = np.asarray([2, 0, 10, 4], dtype='int32')
# method 1
%timeit np.eye(n)[y_value]
# 10 loops, best of 3: 56.9 ms per loop
# method 2
def vec(i):
    e = np.zeros(n)
    e[i] = 1
    return e
%timeit np.vstack([vec(i) for i in y_value])
# 100 loops, best of 3: 16.3 ms per loop
However, the second method may not work in theano, since looping over a symbolic variable may not be trivial. Is there a method which can avoid using T.eye?
y_value can be an arbitrary 1-d vector.
You can try another approach. On my computer:
>>> %timeit np.eye(n)[y_value]
1 loops, best of 3: 544 ms per loop
However, you don't need to create the whole array if you know in advance the rows you want. You can do this:
>>> n = 25500
>>> n_rows = y_value.size
>>> r = np.zeros((n_rows, n))
>>> r[range(n_rows), y_value] = 1
This creates a much smaller array, only y x n where y is the size of your index vector, and sets a single entry in every row. The timing on my computer is:
>>> %%timeit
..: r = np.zeros((n_rows, n))
..: r[range(n_rows), y_value] = 1
100 loops, best of 3: 3.8 ms per loop
Roughly a 140x speedup (544 ms vs. 3.8 ms) on my laptop.
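To convince yourself that the smaller array matches the rows picked out of the full identity matrix, here is a quick check with a small n. The value n = 12 is my choice so that the np.eye reference stays tiny.

```python
import numpy as np

n = 12
y_value = np.asarray([2, 0, 10, 4], dtype='int32')

n_rows = y_value.size
r = np.zeros((n_rows, n))
# One assignment per row: row k gets a 1 in column y_value[k]
r[np.arange(n_rows), y_value] = 1

# Reference: the same rows taken from the full identity matrix
reference = np.eye(n)[y_value]
```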
Additionally, if you don't want an array full of zeros at the rear (x-axis), you could do:
>>> %%timeit
..: r = np.zeros((n_rows, y_value.max()+1))
..: r[range(n_rows), y_value] = 1
100000 loops, best of 3: 16 µs per loop
This is even faster, but the resulting array is y x (ymax+1), in this case 99 x 100, which might not be what you want.