I have two arrays, one is a list of values and one is a list of IDs corresponding to each value. Some IDs have multiple values. I want to create a new array that contains the maximum value recorded for each id, which will have a length equal to the number of unique ids.
Example using a for loop:
import numpy as np
values = np.array([5, 3, 2, 6, 3, 4, 8, 2, 4, 8])
ids = np.array([0, 1, 3, 3, 3, 3, 5, 6, 6, 6])
uniq_ids = np.unique(ids)
maximums = np.ones_like(uniq_ids) * np.nan
for i, id in enumerate(uniq_ids):
    maximums[i] = np.max(values[np.where(ids == id)])
print(uniq_ids)
print(maximums)
[0 1 3 5 6]
[5. 3. 6. 8. 8.]
Is it possible to vectorize this so it runs fast? I'm imagining a one-liner that can create the "maximums" array using only NumPy functions, but I haven't been able to come up with anything that works.
Use np.lexsort to sort both lists simultaneously:
idx = np.lexsort([values, ids])
The indices (into the sorted order) of the last element of each unique ID are then given by
last = np.r_[np.flatnonzero(np.diff(ids[idx])), len(ids) - 1]
You can use this to get the maxima of each group:
values[idx[last]]
This is the same as values[idx][last], but faster because you only need to extract len(last) elements this way, instead of rearranging the whole array and then extracting.
Keep in mind that np.unique is basically doing sort and flatnonzero when you do return_index=True.
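Not part of the answer above, but here is the whole recipe put together on the sample data from the question, as a minimal runnable sketch:

import numpy as np

values = np.array([5, 3, 2, 6, 3, 4, 8, 2, 4, 8])
ids = np.array([0, 1, 3, 3, 3, 3, 5, 6, 6, 6])

# Sort by id (primary key), breaking ties by value, so the last element
# of each id-group in the sorted order is that group's maximum.
idx = np.lexsort([values, ids])

# Positions (in the sorted order) of the last element of each group.
last = np.r_[np.flatnonzero(np.diff(ids[idx])), len(ids) - 1]

print(ids[idx[last]])     # [0 1 3 5 6]
print(values[idx[last]])  # [5 3 6 8 8]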
Here's a collection of all the solutions so far, along with a benchmark. I suggest a modified version of @bb1's pandas recipe, implemented below.
import numpy as np
from pandas import DataFrame, Series

def Shannon(values, ids):
    uniq_ids = np.unique(ids)
    maxima = np.ones_like(uniq_ids) * np.nan
    for i, id in enumerate(uniq_ids):
        maxima[i] = np.max(values[np.where(ids == id)])
    return maxima

def richardec(values, ids):
    return [a.max() for a in np.split(values, np.arange(1, ids.shape[0])[(np.diff(ids) != 0)])]

def MadPhysicist(values, ids):
    idx = np.lexsort([values, ids])
    return values[idx[np.r_[np.flatnonzero(np.diff(ids[idx])), len(ids) - 1]]]

def PeptideWitch(values, ids):
    return np.vectorize(lambda id: np.max(values[np.where(ids == id)]))(np.unique(ids))

def mathfux(values, ids):
    idx = np.argsort(ids)
    return np.maximum.reduceat(values[idx], np.r_[0, np.flatnonzero(np.diff(ids[idx])) + 1])

def bb1(values, ids):
    return DataFrame({'ids': ids, 'vals': values}).groupby('ids')['vals'].max().to_numpy()

def bb1_modified(values, ids):
    return Series(values).groupby(ids).max().to_numpy()
values = np.array([5, 3, 2, 6, 3, 4, 8, 2, 4, 8])
ids = np.array([0, 1, 3, 3, 3, 3, 5, 6, 6, 6])
%timeit Shannon(values, ids)
42.1 µs ± 561 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit richardec(values, ids)
27.7 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit MadPhysicist(values, ids)
19.3 µs ± 268 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit PeptideWitch(values, ids)
55.9 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit mathfux(values, ids)
20.9 µs ± 308 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit bb1(values, ids)
865 µs ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit bb1_modified(values, ids)
537 µs ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
values = np.random.randint(10000, size=10000)
ids = np.random.randint(100, size=10000)
%timeit Shannon(values, ids)
1.76 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit richardec(values, ids)
29.1 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit MadPhysicist(values, ids)
904 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit PeptideWitch(values, ids)
1.74 ms ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit mathfux(values, ids)
372 µs ± 3.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit bb1(values, ids)
964 µs ± 9.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit bb1_modified(values, ids)
679 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
@mathfux's suggestion of np.maximum.reduceat is by far the fastest all around, followed by the modified pandas solution and then lexsorting. The pandas solution likely does something very similar to the reduceat one internally, just with a lot more overhead, which is why the modification to the pandas recipe only buys a modest improvement.
For large arrays (the only ones where speed really matters), PeptideWitch's np.vectorize approach is comparable to the loop in the question, while richardec's comprehension is much slower, even though it never sorts the input values by ID.
np.lexsort sorts by multiple keys, but that is not required here: you can sort by ids alone and then take the maximum of each resulting group with numpy.maximum.reduceat.
def mathfux(values, ids, return_groups=False):
    argidx = np.argsort(ids)                                      # 70% of the time
    ids_sort, values_sort = ids[argidx], values[argidx]           # 4% of the time
    div_points = np.r_[0, np.flatnonzero(np.diff(ids_sort)) + 1]  # 11% of the time (mostly np.flatnonzero)
    if return_groups:
        return ids_sort[div_points], np.maximum.reduceat(values_sort, div_points)
    else:
        return np.maximum.reduceat(values_sort, div_points)
mathfux(values, ids, return_groups=True)
>>> (array([0, 1, 3, 5, 6]), array([5, 3, 6, 8, 8]))
mathfux(values, ids)
>>> array([5, 3, 6, 8, 8])
Parts of numpy code can often be optimised further with numba. Note, though, that np.argsort is the bottleneck in the majority of groupby problems and can't be replaced by any other method; it is unlikely to be improved soon in either numba or numpy. So you are already close to optimal performance here, and there is not much room left for further optimisation.
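If you do reach for numba, one option is a jitted scan over the sorted arrays in place of reduceat (a rough sketch of mine, not part of the answer; group_max_sorted is a hypothetical helper, and the argsort remains the dominant cost):

import numpy as np
from numba import njit

@njit
def group_max_sorted(values_sort, ids_sort):
    # values_sort / ids_sort are assumed to already be sorted by id
    n_groups = 1
    for i in range(1, len(ids_sort)):
        if ids_sort[i] != ids_sort[i - 1]:
            n_groups += 1
    out = np.empty(n_groups, dtype=values_sort.dtype)
    g = 0
    out[0] = values_sort[0]
    for i in range(1, len(ids_sort)):
        if ids_sort[i] != ids_sort[i - 1]:
            g += 1
            out[g] = values_sort[i]
        elif values_sort[i] > out[g]:
            out[g] = values_sort[i]
    return out

idx = np.argsort(ids)
maxima = group_max_sorted(values[idx], ids[idx])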
Here's a solution which, although not 100% vectorized, (per my benchmarks) takes about half the time that yours does on your sample data. The performance improvement probably becomes more drastic with more data:
maximums = [a.max() for a in np.split(values, np.arange(1, ids.shape[0])[(np.diff(ids) != 0)])]
Output:
>>> maximums
[5, 3, 6, 8, 8]
In trying to visualize the problem:
In [82]: [np.where(ids==id) for id in uniq_ids]
Out[82]:
[(array([0]),),
(array([1]),),
(array([2, 3, 4, 5]),),
(array([6]),),
(array([7, 8, 9]),)]
unique can also return:
In [83]: np.unique(ids, return_inverse=True)
Out[83]: (array([0, 1, 3, 5, 6]), array([0, 1, 2, 2, 2, 2, 3, 4, 4, 4]))
Which is a variant of what richardec produced:
In [88]: [a for a in np.split(ids, np.arange(1, ids.shape[0])[(np.diff(ids) !=
...: 0)])]
Out[88]: [array([0]), array([1]), array([3, 3, 3, 3]), array([5]), array([6, 6, 6])]
That inverse is also produced by comparing ids against all of the unique ids at once and taking nonzero:
In [90]: ids[:,None] == uniq_ids
Out[90]:
array([[ True, False, False, False, False],
[False, True, False, False, False],
[False, False, True, False, False],
[False, False, True, False, False],
[False, False, True, False, False],
[False, False, True, False, False],
[False, False, False, True, False],
[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, False, True]])
In [91]: np.nonzero(ids[:,None] == uniq_ids)
Out[91]: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 2, 2, 2, 3, 4, 4, 4]))
I'm still thinking through this ...
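One way to finish that thought (my own sketch, not part of the original answer): the inverse index from np.unique can drive np.maximum.at, which accumulates a per-group running maximum without touching the values in sorted order:

import numpy as np

values = np.array([5, 3, 2, 6, 3, 4, 8, 2, 4, 8])
ids = np.array([0, 1, 3, 3, 3, 3, 5, 6, 6, 6])

uniq_ids, inv = np.unique(ids, return_inverse=True)

# Start from the smallest representable value so every group gets overwritten.
maxima = np.full(uniq_ids.shape, np.iinfo(values.dtype).min, dtype=values.dtype)
np.maximum.at(maxima, inv, values)

print(uniq_ids)  # [0 1 3 5 6]
print(maxima)    # [5 3 6 8 8]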
EDIT: I'll leave this up as an example of why we can't always use np.vectorize() to make everything magically faster:
One solution is to use numpy's vectorize function:
import numpy as np
values = np.array([5, 3, 2, 6, 3, 4, 8, 2, 4, 8])
ids = np.array([0, 1, 3, 3, 3, 3, 5, 6, 6, 6])
def my_func(id):
    return np.max(values[np.where(ids == id)])

vector_func = np.vectorize(my_func)
maximums = vector_func(np.unique(ids))
which returns
array([5, 3, 6, 8, 8])
But as for speed, your version has about the same performance when we use
import random

values = np.array([random.randint(1, 100) for i in range(1000000)])
ids = []
for i in range(100000):
    r = random.randint(1, 4)
    if r == 3:
        for x in range(3):
            ids.append(i)
    elif r == 2:
        for x in range(4):
            ids.append(i)
    else:
        ids.append(i)
ids = np.array(ids)
It's about 12 seconds per execution.
With pandas:
import pandas as pd

def with_pandas(ids, vals):
    df = pd.DataFrame({'ids': ids, 'vals': vals})
    return df.groupby('ids')['vals'].max().to_numpy()
Timing:
import numpy as np
values = np.random.randint(10000, size=10000)
ids = np.random.randint(100, size=10000)
%timeit with_pandas(ids, values)
692 µs ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Related
For example I want to convert this list
x=[False, True, True, True, True, False, True, True, False, True]
to ranges (start and end locations) of True values
[[1,4],
[6,7],
[9,9]]
This is obviously possible using a for loop. However, I am looking for other options that are faster and better (one-liners are welcome, e.g. maybe a list comprehension). Ideally, I am looking for something that could also be applied to a pandas Series.
A solution with Pandas only:
s = pd.Series(x)
grp = s.eq(False).cumsum()
arr = grp.loc[s.eq(True)] \
         .groupby(grp) \
         .apply(lambda x: [x.index.min(), x.index.max()])
Output:
>>> arr
1 [1, 4]
2 [6, 7]
3 [9, 9]
dtype: object
>>> arr.tolist()
[[1, 4], [6, 7], [9, 9]]
Alternative:
np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values,
s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
# Output:
array([[1, 4],
[6, 7],
[9, 9]])
Performance
# Solution 1
>>> %timeit s.eq(False).cumsum().loc[s.eq(True)].groupby(s.eq(False).cumsum()).apply(lambda x: [x.index.min(), x.index.max()])
1.22 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Solution 2
>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
477 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Solution #psidom
>>> %timeit np_vec2ran(x)
29.2 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For 1,000,000 records:
x = np.random.choice([True, False], 1000000)
s = pd.Series(x)
>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
18.2 ms ± 247 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np_vec2ran(x)
5.03 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Option with numpy. If the previous value is False and the current value is True, that marks the start of a True sequence; conversely, if the previous value is True and the current value is False, that marks the end of one.
z = np.concatenate(([False], x, [False]))
start = np.flatnonzero(~z[:-1] & z[1:])
end = np.flatnonzero(z[:-1] & ~z[1:])
np.column_stack((start, end-1))
array([[1, 4],
[6, 7],
[9, 9]], dtype=int32)
A little benchmark against the faster pandas solution:
def np_vec2ran(x):
    z = np.concatenate(([False], x, [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])
    end = np.flatnonzero(z[:-1] & ~z[1:])
    return np.column_stack((start, end - 1))
np_vec2ran(x)
array([[1, 4],
[6, 7],
[9, 9]], dtype=int32)
def pd_vec2ran(x):
    s = pd.Series(x)
    return list(zip(s[s.eq(True) & s.shift(1).eq(False)].index,
                    s[s.eq(True) & s.shift(-1, fill_value=False).eq(False)].index))
pd_vec2ran(x)
[(1, 4), (6, 7), (9, 9)]
from timeit import timeit

timeit('pd_vec2ran(x)', number=10, globals=globals())
0.040585001000181364
timeit('np_vec2ran(x)', number=10, globals=globals())
0.0011799999992945231
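Since the question also asks about pandas Series: the same numpy function can be applied to a Series by handing it the underlying array (my sketch, reusing np_vec2ran from above):

import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)

# np_vec2ran only needs a boolean array, so pass the Series' values.
np_vec2ran(s.to_numpy())
# array([[1, 4],
#        [6, 7],
#        [9, 9]])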
Here's a solution that uses scipy and pandas:
import pandas as pd
from scipy import ndimage

def boolean_vector2ranges(x):
    df1 = pd.DataFrame({'location': range(len(x)),
                        'bool': x,
                        })
    df1['group'] = ndimage.label(df1['bool'].astype(int))[0]
    return df1.loc[df1['group'] != 0, :].groupby('group')['location'].agg([min, max])

boolean_vector2ranges(x=[False, True, True, True, True, False, True, True, False, True])
returns

       min  max
group
1        1    4
2        6    7
3        9    9
Just wondering if there is a way to construct a tumbling window in Python. For example, if I have a list/ndarray listA = [3,2,5,9,4,6,3,8,7,9], how could I find the maximum of the first 3 items (3,2,5) -> 5, then of the next 3 items (9,4,6) -> 9, and so on... Sort of like breaking it up into sections and finding the max of each. So the final result would be the list [5,9,8,9].
Approach #1: One-liner for windowed-max using np.maximum.reduceat -
In [118]: np.maximum.reduceat(listA,np.arange(0,len(listA),3))
Out[118]: array([5, 9, 8, 9])
Becomes more compact with np.r_ -
np.maximum.reduceat(listA,np.r_[:len(listA):3])
Approach #2: Generic ufunc way
Here's a function that takes a generic ufunc and the window length as parameters -
def windowed_ufunc(a, ufunc, W):
    a = np.asarray(a)
    n = len(a)
    L = W * (n // W)
    out = ufunc(a[:L].reshape(-1, W), axis=1)
    if n > L:
        out = np.hstack((out, ufunc(a[L:])))
    return out
Sample run -
In [81]: a = [3,2,5,9,4,6,3,8,7,9]
In [82]: windowed_ufunc(a, ufunc=np.max, W=3)
Out[82]: array([5, 9, 8, 9])
On other ufuncs -
In [83]: windowed_ufunc(a, ufunc=np.min, W=3)
Out[83]: array([2, 4, 3, 9])
In [84]: windowed_ufunc(a, ufunc=np.sum, W=3)
Out[84]: array([10, 19, 18, 9])
In [85]: windowed_ufunc(a, ufunc=np.mean, W=3)
Out[85]: array([3.33333333, 6.33333333, 6. , 9. ])
Benchmarking
Timings on NumPy solutions on array data with sample data scaled up by 10000x -
In [159]: a = [3,2,5,9,4,6,3,8,7,9]
In [160]: a = np.tile(a, 10000)
# #yatu's soln
In [162]: %timeit moving_maxima(a, w=3)
435 µs ± 8.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# From this post - app#1
In [167]: %timeit np.maximum.reduceat(a,np.arange(0,len(a),3))
353 µs ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# From this post - app#2
In [165]: %timeit windowed_ufunc(a, ufunc=np.max, W=3)
379 µs ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
If you want a one-liner, you can use list comprehension:
listA = [3,2,5,9,4,6,3,8,7,9]
listB=[max(listA[i:i+3]) for i in range(0,len(listA),3)]
print (listB)
it returns:
[5, 9, 8, 9]
Of course the codes can be written more dynamically: if you want a different window size, just change 3 to any integer.
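For instance, wrapping it in a small helper with the window size as an argument (just a sketch of the same idea):

def tumbling_max(lst, w):
    # max of each consecutive, non-overlapping chunk of length w
    return [max(lst[i:i + w]) for i in range(0, len(lst), w)]

tumbling_max([3, 2, 5, 9, 4, 6, 3, 8, 7, 9], 4)
# [9, 8, 9]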
Using numpy, you can extend the list with zeroes so its length is divisible by the window size, then reshape and compute the max along the second axis:
def moving_maxima(a, w):
    mod = len(a) % w
    d = w if mod else mod
    x = np.r_[a, [0] * (d - mod)]
    return x.reshape(-1, w).max(1)
Some examples:
moving_maxima(listA,2)
# array([3., 9., 6., 8., 9.])
moving_maxima(listA,3)
#array([5, 9, 8, 9])
moving_maxima(listA,4)
#array([9, 8, 9])
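One caveat with zero-padding (my note, not from the answer): if the data can contain negative values, the padded zeros could win the last window. A variant that pads with the smallest representable value instead would look roughly like this:

def moving_maxima_safe(a, w):
    a = np.asarray(a)
    pad = (-len(a)) % w  # number of slots needed to complete the last window
    fill = -np.inf if np.issubdtype(a.dtype, np.floating) else np.iinfo(a.dtype).min
    x = np.r_[a, np.full(pad, fill, dtype=a.dtype)]
    return x.reshape(-1, w).max(1)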
I have two numpy arrays, A and B. A contains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])
Just for completeness: If the values in A are non negative and reasonably small:
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]
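For example, with the arrays from the question (a quick check of the lookup-table idea):

import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
B = np.array([1, 7, 10])

lookup = np.empty(np.max(A) + 1, dtype=int)  # one slot per possible value
lookup[A] = np.arange(len(A))                # value -> position in A
lookup[B]
# array([0, 6, 9])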
I had the same question recently. However, timing performance is very critical for me, so I guess a timing comparison of the different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where or np.nonzero. Moreover, you can use np.in1d(A, B) with np.intersect1d (based on this page). Also, np.searchsorted is another useful approach for sorted arrays.
I want to add another simple solution: a list comprehension. It may take longer than the previous ones; however, if you take advantage of the Numba package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit
In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])
In [5]: np.where(np.in1d(a, b))[0]
Out[5]: array([0, 6, 9])
In [6]: np.nonzero(np.in1d(a, b))[0]
Out[6]: array([0, 6, 9])
In [7]: np.searchsorted(a, b)
Out[7]: array([0, 6, 9])
In [8]: np.searchsorted(a, np.intersect1d(a, b))
Out[8]: array([0, 6, 9])
In [9]: [i for i, x in enumerate(a) if x in b]
Out[9]: [0, 6, 9]
In [10]: @njit
    ...: def func(a, b):
    ...:     return [i for i, x in enumerate(a) if x in b]
In [11]: func(a, b)
Out[11]: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I'm having a hard time trying to solve this problem. The main issue is that I'm running a simulation, so for loops are mostly forbidden. I have a 2-D numpy array, in this case about (10000, 20).
stoploss = 19.9 # condition to apply
monte_carlo_simulation(20,1.08,10000,20) #which gives me that 10000x20 np array
mask_trues = np.where(np.any((simulation <= stoploss) == True, axis=1)) # boolean mask
I need some code that produces a new vector of length 10000 which, for every row, contains the positions of the True entries. For example, suppose:
function([[False,True,True],[False,False,True]])
output = [[1,2],[2]]
Again, the main problem resides in not using loops.
Simply this:
list(map(np.where, my_array))
performance comparison against Kasrâmvd's solution:
def f(a):
    return list(map(np.where, a))

def g(a):
    x, y = np.where(a)
    return np.split(y, np.where(np.diff(x) != 0)[0] + 1)
a = np.random.randint(2, size=(10000,20))
%timeit f(a)
%timeit g(a)
7.66 ms ± 38.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.3 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
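On the toy example from the question (a quick check I'd add): note that np.where returns a one-element tuple per row, so the entries may need unpacking:

a = np.array([[False, True, True], [False, False, True]])
list(map(np.where, a))
# [(array([1, 2]),), (array([2]),)]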
For completeness I'll demonstrate a sparse matrix approach:
In [57]: A = np.array([[False,True,True],[False,False,True]])
In [58]: A
Out[58]:
array([[False, True, True],
[False, False, True]])
In [59]: M = sparse.lil_matrix(A)
In [60]: M
Out[60]:
<2x3 sparse matrix of type '<class 'numpy.bool_'>'
with 3 stored elements in LInked List format>
In [61]: M.data
Out[61]: array([list([True, True]), list([True])], dtype=object)
In [62]: M.rows
Out[62]: array([list([1, 2]), list([2])], dtype=object)
And to make a large sparse one:
In [63]: BM = sparse.random(10000,20,.05, 'lil')
In [64]: BM
Out[64]:
<10000x20 sparse matrix of type '<class 'numpy.float64'>'
with 10000 stored elements in LInked List format>
In [65]: BM.rows
Out[65]:
array([list([3]), list([]), list([6, 15]), ..., list([]), list([11]),
list([])], dtype=object)
Rough time tests:
In [66]: arr = BM.A
In [67]: timeit sparse.lil_matrix(arr)
19.5 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [68]: timeit list(map(np.where,arr))
11 ms ± 55.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [69]: %%timeit
...: x,y = np.where(arr)
...: np.split(y, np.where(np.diff(x) != 0)[0] + 1)
...:
13.8 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Generating a csr sparse format matrix is faster:
In [70]: timeit sparse.csr_matrix(arr)
2.68 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [71]: Mr = sparse.csr_matrix(arr)
In [72]: Mr.indices
Out[72]: array([ 3, 6, 15, ..., 8, 16, 11], dtype=int32)
In [73]: Mr.indptr
Out[73]: array([ 0, 1, 1, ..., 9999, 10000, 10000], dtype=int32)
In [74]: np.where(arr)[1]
Out[74]: array([ 3, 6, 15, ..., 8, 16, 11])
Its indices attribute is just like the column result of where, while indptr is like the split indices.
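To make the connection concrete (my own sketch, reusing the small boolean A from above): slicing indices with indptr reproduces the per-row column lists:

M2 = sparse.csr_matrix(A)
# indptr[i]:indptr[i+1] delimits the column indices belonging to row i,
# which is exactly the split that np.split performs on the where() result.
[M2.indices[M2.indptr[i]:M2.indptr[i + 1]] for i in range(M2.shape[0])]
# [array([1, 2]), array([2])]  (integer dtype may vary)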
Here is one way using np.split() and np.diff():
x, y = np.where(boolean_array)
np.split(y, np.where(np.diff(x) != 0)[0] + 1)
Demo:
In [12]: a = np.array([[False,True,True],[False,False,True]])
In [13]: x, y = np.where(a)
In [14]: np.split(y, np.where(np.diff(x) != 0)[0] + 1)
Out[14]: [array([1, 2]), array([2])]
Say I have a NumPy array:
>>> X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> X
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
and an array of indexes that I want to select for each row:
>>> ixs = np.array([[1, 3], [0, 1], [1, 2]])
>>> ixs
array([[1, 3],
[0, 1],
[1, 2]])
How do I index the array X so that for every row in X I select the two indices specified in ixs?
So for this case, I want to select element 1 and 3 for the first row, element 0 and 1 for the second row, and so on. The output should be:
array([[2, 4],
[5, 6],
[10, 11]])
A slow solution would be something like this:
output = np.array([row[ix] for row, ix in zip(X, ixs)])
however this can get kinda slow for extremely long arrays. Is there a faster way to do this without a loop using NumPy?
EDIT: Some very approximate speed tests on a 2.5K * 1M array with 2K wide ixs (10GB):
np.array([row[ix] for row, ix in zip(X, ixs)]) 0.16s
X[np.arange(len(ixs)), ixs.T].T 0.175s
X.take(idx+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None]) 33s
np.fromiter((X[i, j] for i, row in enumerate(ixs) for j in row), dtype=X.dtype).reshape(ixs.shape) 2.4s
You can use this:
X[np.arange(len(ixs)), ixs.T].T
Here is the reference for NumPy's advanced indexing.
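Checking it against the example from the question (a quick sketch of why the transposes are there):

import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ixs = np.array([[1, 3], [0, 1], [1, 2]])

# The (3,) row index broadcasts against the (2, 3) transposed column indices,
# producing a (2, 3) result that is transposed back to (3, 2).
X[np.arange(len(ixs)), ixs.T].T
# array([[ 2,  4],
#        [ 5,  6],
#        [10, 11]])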
I believe you can use .take thusly:
In [185]: X
Out[185]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
In [186]: idx
Out[186]:
array([[1, 3],
[0, 1],
[1, 2]])
In [187]: X.take(idx + (np.arange(X.shape[0]) * X.shape[1]).reshape(-1, 1))
Out[187]:
array([[ 2, 4],
[ 5, 6],
[10, 11]])
If your array dimensions are massive, it might be faster, albeit uglier, to do:
idx+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None]
Just for fun, see how the following performs:
np.fromiter((X[i, j] for i, row in enumerate(ixs) for j in row), dtype=X.dtype, count=ixs.size).reshape(ixs.shape)
Edit to add timings
In [15]: X = np.arange(1000*10000, dtype=np.int32).reshape(1000,-1)
In [16]: ixs = np.random.randint(0, 10000, (1000, 2))
In [17]: ixs.sort(axis=1)
In [18]: ixs
Out[18]:
array([[2738, 3511],
[3600, 7414],
[7426, 9851],
...,
[1654, 8252],
[2194, 8200],
[5497, 8900]])
In [19]: %timeit np.array([row[ix] for row, ix in zip(X, ixs)])
928 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [20]: %timeit X[np.arange(len(ixs)), ixs.T].T
23.6 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [21]: %timeit X.take(idx+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None])
20.6 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [22]: %timeit np.fromiter((X[i, j] for i, row in enumerate(ixs) for j in row), dtype=X.dtype, count=ixs.size).reshape(ixs.shape)
1.42 ms ± 9.94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
@mxbi I've added some timings, and my results aren't really consistent with yours; you should check it out.
Here's a larger array:
In [33]: X = np.arange(10000*100000, dtype=np.int32).reshape(10000,-1)
In [34]: ixs = np.random.randint(0, 100000, (10000, 2))
In [35]: ixs.sort(axis=1)
In [36]: X.shape
Out[36]: (10000, 100000)
In [37]: ixs.shape
Out[37]: (10000, 2)
With some results:
In [42]: %timeit np.array([row[ix] for row, ix in zip(X, ixs)])
11.4 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [43]: %timeit X[np.arange(len(ixs)), ixs.T].T
596 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [44]: %timeit X.take(ixs+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None])
540 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Now, using 500 column indices per row instead of two, we see the list comprehension start to win out:
In [45]: ixs = np.random.randint(0, 100000, (10000, 500))
In [46]: ixs.sort(axis=1)
In [47]: %timeit np.array([row[ix] for row, ix in zip(X, ixs)])
93 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [48]: %timeit X[np.arange(len(ixs)), ixs.T].T
133 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [49]: %timeit X.take(ixs+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None])
87.5 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The usual suggestion for indexing items from rows is:
X[np.arange(X.shape[0])[:,None], ixs]
That is, make a row index of shape (n,1) (column vector), which will broadcast with the (n,m) shape of ixs to give a (n,m) solution.
This is basically the same as:
X[np.arange(len(ixs)), ixs.T].T
which broadcasts a (n,) index against a (m,n), and transposes.
Timings are essentially the same:
In [299]: X = np.ones((1000,2000))
In [300]: ixs = np.random.randint(0,2000,(1000,200))
In [301]: timeit X[np.arange(len(ixs)), ixs.T].T
6.58 ms ± 71.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [302]: timeit X[np.arange(X.shape[0])[:,None], ixs]
6.57 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
and for comparison:
In [307]: timeit np.array([row[ix] for row, ix in zip(X, ixs)])
6.63 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I'm a little surprised that this list comprehension does so well. I wonder how the relative advantages compare when the dimensions change, particularly in the relative shape of X and ixs (long, wide etc).
The first solution is the style of indexing produced by ix_:
In [303]: np.ix_(np.arange(3), np.arange(2))
Out[303]:
(array([[0],
[1],
[2]]), array([[0, 1]]))
This should work
[X[i][y] for i, y in enumerate(ixs)]
EDIT: I just noticed you wanted a no-loop solution.