I have a numpy array with some integers, e.g.,
a = numpy.array([1, 6, 6, 4, 1, 1, 4])
I would now like to put all items into "bins" of equal values such that the bin with label 1 contains all indices of a that have the value 1. For the above example:
bins = {
1: [0, 4, 5],
6: [1, 2],
4: [3, 6],
}
A combination of unique and where does the trick,
uniques = numpy.unique(a)
bins = {u: numpy.where(a == u)[0] for u in uniques}
but this doesn't seem ideal since the number of unique entries may be large.
Defaultdict with append would do the trick:
from collections import defaultdict
d = defaultdict(list)
for ix, val in enumerate(a):
d[val].append(ix)
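For the sample a above, this yields roughly the following (the keys are numpy integer scalars and the values plain Python lists):
dict(d)
# {1: [0, 4, 5], 6: [1, 2], 4: [3, 6]}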
Here is one way, using broadcasting, np.where(), and np.split():
In [66]: unique = np.unique(a)
In [67]: rows, cols = np.where(unique[:, None] == a)
In [68]: indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)
In [69]: dict(zip(unique, indices))
Out[69]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
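Wrapped into a reusable helper, the same steps might look like this (a small sketch; the name group_indices_bcast is mine, and note the intermediate boolean mask has shape (len(unique), len(a)), so memory use grows with the number of unique values):
import numpy as np

def group_indices_bcast(a):
    # compare every unique value against every element of a at once
    unique = np.unique(a)
    rows, cols = np.where(unique[:, None] == a)
    # rows is sorted, so split cols wherever the unique value changes
    indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)
    return dict(zip(unique, indices))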
Here's one approach -
def groupby_uniqueness_dict(a):
sidx = a.argsort()
b = a[sidx]
cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
parts = np.split(sidx, cut_idx)
out = dict(zip(b[np.r_[0,cut_idx]], parts))
return out
A more efficient one, avoiding the use of np.split -
def groupby_uniqueness_dict_v2(a):
sidx = a.argsort() # use .tolist() for output dict values as lists
b = a[sidx]
cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
idxs = np.r_[0,cut_idx, len(b)+1]
out = {b[i]:sidx[i:j] for i,j in zip(idxs[:-1], idxs[1:])}
return out
Sample run -
In [161]: a
Out[161]: array([1, 6, 6, 4, 1, 1, 4])
In [162]: groupby_uniqueness_dict(a)
Out[162]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
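For the "dict values as lists" timings under Case #2 further down, a plausible variant just follows the hint in the v2 comment and converts the sorted indices to a Python list up front (my own sketch, not necessarily the exact code that was timed):
def groupby_uniqueness_dict_v2_lists(a):
    sidx = a.argsort().tolist()   # Python list instead of an array
    b = a[sidx]
    cut_idx = np.flatnonzero(b[1:] != b[:-1]) + 1
    idxs = np.r_[0, cut_idx, len(b)]
    return {b[i]: sidx[i:j] for i, j in zip(idxs[:-1], idxs[1:])}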
Runtime test
Other approach(es) -
from collections import defaultdict
def defaultdict_app(a): # @Grisha's soln
d = defaultdict(list)
for ix, val in enumerate(a):
d[val].append(ix)
return d
Timings -
Case #1 : Dict values as arrays
In [226]: a = np.random.randint(0,1000, 10000)
In [227]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.06 ms per loop
100 loops, best of 3: 3.06 ms per loop
100 loops, best of 3: 2.02 ms per loop
In [228]: a = np.random.randint(0,10000, 100000)
In [229]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 43.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
100 loops, best of 3: 19.9 ms per loop
Case #2 : Dict values as lists
In [238]: a = np.random.randint(0,1000, 10000)
In [239]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.15 ms per loop
100 loops, best of 3: 4.5 ms per loop
100 loops, best of 3: 2.44 ms per loop
In [240]: a = np.random.randint(0,10000, 100000)
In [241]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 57.5 ms per loop
10 loops, best of 3: 54.6 ms per loop
10 loops, best of 3: 34 ms per loop
A simple example:
a = array([[[1, 0, 0],
[0, 2, 0],
[0, 0, 3]],
[[1, 0, 0],
[0, 1, 0],
[0, 0, 1]]])
result = []
for i in a:
result.append(i.sum())
result = [6, 3]
Is there a numpy function that does this faster? If it helps: a contains only diagonal matrices.
Edit:
I just realized that a contains scipy csc_sparse matrices, i.e., it's a numpy 1D array containing matrices, and I cannot apply the sum function with axis=(1, 2).
A proper use of the axis parameter of np.sum() would do:
import numpy as np
np.sum(a, axis=(1, 2))
# [6, 3]
While the above should be the generally preferred method, if your input is actually diagonal over axes 1 and 2, then summing all the zeros is bound to be inefficient (read O(n²k), with the same n and k as in the gen_a() function below). Using np.sum() after np.diag() inside a loop can be much better (read O(nk), with the same n and k as before). Possibly, using a list comprehension is the way to go:
import numpy as np
np.array([np.sum(np.diag(x)) for x in a])
# [6, 3]
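As an aside of mine (not part of the timings below), for the dense 3-D case the per-matrix diagonal sums can also be obtained in a single vectorized call that only reads the diagonals:
np.einsum('kii->k', a)           # sum of each matrix's diagonal
np.trace(a, axis1=1, axis2=2)    # same result via trace
# [6, 3]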
To give some idea of the relative speed, let's write a function to generate inputs of arbitrary size:
def gen_a(n, k):
return np.array([
np.diag(np.ones(n, dtype=int))
if i % 2 else
np.diag(np.arange(1, n + 1, dtype=int))
for i in range(k)])
print(gen_a(3, 2))
# [[[1 0 0]
# [0 2 0]
# [0 0 3]]
# [[1 0 0]
# [0 1 0]
# [0 0 1]]]
Now, we can time things for different input sizes. I have also included a list comprehension without the np.diag() call, which is essentially a slightly more concise version of your approach.
a = gen_a(3, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100000 loops, best of 3: 16 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100000 loops, best of 3: 4.51 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100000 loops, best of 3: 10 µs per loop
a = gen_a(3000, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 10000 loops, best of 3: 20.5 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 17.8 ms per loop
a = gen_a(3, 2000)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100 loops, best of 3: 14.8 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 10000 loops, best of 3: 34 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 8.93 ms per loop
a = gen_a(300, 200)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 1000 loops, best of 3: 1.67 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 19.3 ms per loop
And we observe that, depending on the values of n and k, one or the other solution is faster.
For larger n, the list comprehension gets faster, but only if np.diag() is used.
On the contrary, for smaller n and larger k, np.sum()'s raw speed can outperform the explicit looping.
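For the scipy sparse case mentioned in the question's edit (a 1-D object array holding csc matrices), a per-matrix loop is hard to avoid; a hedged sketch:
np.array([m.diagonal().sum() for m in a])   # or simply m.sum(), since the matrices are diagonal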
Is there a more numpythonic way to do this?
#example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
#get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin:
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample.
I have a sorted integer array, e.g., [0, 0, 1, 1, 1, 2, 4, 4], and I would like to determine where the integer blocks start and how long the blocks are. The block sizes are small but the array itself can be very large, so efficiency is important. The total number of blocks is also known.
numpy.unique does the trick:
import numpy
a = numpy.array([0, 0, 1, 1, 1, 2, 4, 4])
num_blocks = 4
print(a)
_, idx_start, count = numpy.unique(a, return_index=True, return_counts=True)
print(idx_start)
print(count)
[0 0 1 1 1 2 4 4]
[0 2 5 6]
[2 3 1 2]
but is slow. I would assume that, given the specific structure of the input array, there's a more efficient solution.
For example, something as simple as
import numpy
a = numpy.array([0, 0, 1, 1, 1, 2, 3, 3])
num_blocks = 4
k = 0
z = a[k]
block_idx = 0
counts = numpy.empty(num_blocks, dtype=int)
count = 0
while k < len(a):
if z == a[k]:
count += 1
else:
z = a[k]
counts[block_idx] = count
count = 1
block_idx += 1
k += 1
counts[block_idx] = count
print(counts)
gives the block sizes, and a simple numpy.cumsum would give idx_start. Using a Python loop is slow, of course.
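For instance, the start indices would follow from the counts with a small cumsum (sketch):
idx_start = numpy.concatenate([[0], numpy.cumsum(counts)[:-1]])
# array([0, 2, 5, 6]) for the example above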
Any hints?
Here's one with some masking and slicing -
def grp_start_len(a):
m = np.r_[True,a[:-1] != a[1:],True] #np.concatenate for a bit more boost
idx = np.flatnonzero(m)
return idx[:-1], np.diff(idx)
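The np.concatenate variant hinted at in the comment above would look like this (a small sketch of the same masking idea):
def grp_start_len_concat(a):
    m = np.concatenate(([True], a[:-1] != a[1:], [True]))
    idx = np.flatnonzero(m)
    return idx[:-1], np.diff(idx)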
Sample run -
In [18]: a
Out[18]: array([0, 0, 1, 1, 1, 2, 4, 4])
In [19]: grp_start_len(a)
Out[19]: (array([0, 2, 5, 6]), array([2, 3, 1, 2]))
Timings (setup from @AGN Gazer's solution) -
In [24]: np.random.seed(0)
In [25]: a = np.sort(np.random.randint(1, 10000, 10000))
In [26]: %timeit _, idx_start, count = np.unique(a, return_index=True, return_counts=True)
1000 loops, best of 3: 411 µs per loop
# @AGN Gazer's solution
In [27]: %timeit st = np.where(np.ediff1d(a, a[-1] + 1, a[0] + 1))[0]; idx = st[:-1]; cnt = np.ediff1d(st)
10000 loops, best of 3: 81.2 µs per loop
In [28]: %timeit grp_start_len(a)
10000 loops, best of 3: 60.1 µs per loop
Bumping up the sizes 10x more -
In [40]: np.random.seed(0)
In [41]: a = np.sort(np.random.randint(1, 100000, 100000))
In [42]: %timeit _, idx_start, count = np.unique(a, return_index=True, return_counts=True)
...: %timeit st = np.where(np.ediff1d(a, a[-1] + 1, a[0] + 1))[0]; idx = st[:-1]; cnt = np.ediff1d(st)
...: %timeit grp_start_len(a)
100 loops, best of 3: 5.34 ms per loop
1000 loops, best of 3: 792 µs per loop
1000 loops, best of 3: 463 µs per loop
np.where(np.ediff1d(a, None, a[0]))[0]
If you want to have the first "0" as in your answer, add a non-zero number to a[0]:
np.where(np.ediff1d(a, None, a[0] + 1))[0]
EDIT (Block length):
Ah, just noticed that you also want to get block length. Then, modify the above code:
st = np.where(np.ediff1d(a, a[-1] + 1, a[0] + 1))[0]
idx = st[:-1]
cnt = np.ediff1d(st)
Then,
>>> print(idx)
[0 2 5 6]
>>> print(cnt)
[2 3 1 2]
EDIT 2 (Timing tests)
In [69]: a = np.sort(np.random.randint(1, 10000, 10000))
In [70]: %timeit _, idx_start, count = np.unique(a, return_index=True, return_counts=True)
240 µs ± 7.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [71]: %timeit st = np.where(np.ediff1d(a, a[-1] + 1, a[0] + 1))[0]; idx = st[:-1]; cnt = np.ediff1d(st)
74.3 µs ± 816 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I have an array a. I want to create a new array of double the size, where the items are x*2 and x*3 for each x in a.
For example: a = [1,10,100]
The result must be b = [2, 3, 20, 30, 200, 300]
I know this (ugly and very slow) way: b = sum([[x*2,x*3] for x in a], [])
Is there another way? (Ideally, I want the shortest one :)
This can be done using a list comprehension with nested loops
In [4]: [y for x in a for y in (x * 2, x * 3)]
Out[4]: [2, 3, 20, 30, 200, 300]
It seems to outperform the other answers, but loses to the numpy solution when a is large.
You can perform the multiplications in a list comprehension, then zip and flatten the resulting lists.
>>> a = [1,10,100]
>>> b = [j for i in zip([i*2 for i in a], [i*3 for i in a]) for j in i]
>>> b
[2, 3, 20, 30, 200, 300]
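The flattening step could equivalently use itertools (my own aside, not from the original answer):
>>> from itertools import chain
>>> list(chain.from_iterable(zip([i*2 for i in a], [i*3 for i in a])))
[2, 3, 20, 30, 200, 300]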
You can do this in several ways. Below is one of them, using numpy (In [4]):
In [1]: import numpy as np
In [2]: a = [1, 10, 100]
In [3]: %timeit sum([[x*2,x*3] for x in a], [])
1000000 loops, best of 3: 632 ns per loop
In [4]: %timeit x = np.array(a); np.array([x*2,x*3]).T.ravel()
100000 loops, best of 3: 3.25 µs per loop
Your way is faster! But this is because a is small. When it's larger, numpy becomes much better.
In [5]: a = range(1000)
In [6]: %timeit sum([[x*2,x*3] for x in a], [])
100 loops, best of 3: 2.37 ms per loop
In [7]: %timeit x = np.array(a); np.array([x*2,x*3]).T.ravel()
10000 loops, best of 3: 39.6 µs per loop
I've included timeit results for @CoryKramer's answer above, which is fastest for small arrays but also loses to numpy for large arrays:
In [10]: a = [1, 10, 100]
In [11]: %timeit [j for i in zip([i*2 for i in a], [i*3 for i in a]) for j in i]
1000000 loops, best of 3: 853 ns per loop
In [12]: a = range(1000)
In [13]: %timeit [j for i in zip([i*2 for i in a], [i*3 for i in a]) for j in i]
1000 loops, best of 3: 252 µs per loop
Generally, using tuples is faster than using lists:
>>> timeit.timeit("sum([[x*2,x*3] for x in (1,10,100)], [])", number=10000)
0.023060083389282227
>>> timeit.timeit("sum(((x*2,x*3) for x in (1,10,100)), ())", number=10000)
0.01667189598083496
I have an M by N array I, each row of which is an index into an N-dimensional array A. I want a vectorized expression to get the 1-D array of the M indexed values from A. I found that A[tuple(I.T)] does the right thing, but profiling shows it to be very expensive despite being vectorized. It is also not particularly elegant or "natural", and A[I] and A[I.T] do something completely different.
What is the right way to do this?
It should also work for assignment, like
A[tuple(I.T)] = 1
I think you are talking about something like:
In [398]: A=np.arange(24).reshape(4,6)
In [401]: I=np.array([[0,1],[1,2],[3,4],[0,0],[2,5]])
In [402]: tuple(I.T)
Out[402]: (array([0, 1, 3, 0, 2]), array([1, 2, 4, 0, 5]))
In [403]: A[tuple(I.T)]
Out[403]: array([ 1, 8, 22, 0, 17])
This is http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing - purely integer array advanced indexing.
This is always going to be slower than basic indexing, which returns a view. Basic indexing picks contiguous blocks of data, or values that can be selected with strides. That isn't possible with your indexing.
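A quick way to see that difference (my own illustration, using np.shares_memory from modern numpy):
v = A[1:3, 2:5]              # basic slice: a strided view into A's buffer
np.shares_memory(A, v)       # True
w = A[tuple(I.T)]            # advanced indexing gathers scattered values, so it copies
np.shares_memory(A, w)       # False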
Look at some timings:
In [404]: timeit tuple(I.T)
100000 loops, best of 3: 3.4 µs per loop
In [405]: timeit A[tuple(I.T)]
100000 loops, best of 3: 10 µs per loop
In [406]: %%timeit i,j=tuple(I.T)
.....: A[i,j]
.....:
100000 loops, best of 3: 4.86 µs per loop
Constructing the tuple takes about 1/3 of the time; i,j = I.T is just a bit faster. But the indexing itself is the largest piece.
A[i,j] is the same as A[(i,j)] (as is A.__getitem__((i,j))). So wrapping I.T in tuple just produces the 2 indexing arrays, one for each dimension.
It is faster to index on a flattened version of the array:
In [420]: J= np.ravel_multi_index(tuple(I.T),A.shape)
In [421]: J
Out[421]: array([ 1, 8, 22, 0, 17], dtype=int32)
In [422]: A.flat[J]
Out[422]: array([ 1, 8, 22, 0, 17])
In [425]: timeit A.flat[J]
1000000 loops, best of 3: 1.56 µs per loop
In [426]: %%timeit
.....: J= np.ravel_multi_index(tuple(I.T),A.shape)
.....: A.flat[J]
.....:
100000 loops, best of 3: 11.2 µs per loop
So being able to precompute and reuse the indices will save you time, but there's no way of getting around the fact that selecting a bunch of individual values from A will take extra time.
Just for fun, compare the time it takes to index A with each row of I:
In [442]: timeit np.array([A[tuple(i)] for i in I])
100000 loops, best of 3: 17.3 µs per loop
In [443]: timeit np.array([A[i,j] for i,j in I])
100000 loops, best of 3: 15.7 µs per loop
You can use linear indexing another way, like so -
def ravel_einsum(A,I):
# Get A's shape and calculate cummulative dimensions based on it
shp = np.asarray(A.shape)
cumdims = np.append(1,shp[::-1][:-1].cumprod())[::-1]
# Use linear indexing of A to extract elements from A corresponding
# to linear indexing of it with I
return A.ravel()[np.einsum('ij,j->i',I,cumdims)]
Runtime tests
Case #1:
In [84]: # Inputs
...: A = np.random.randint(0,10,(3,5,2,4,5,2,6,8,2,5,3,4,3))
...: I = np.mod(np.random.randint(0,10,(5,A.ndim)),A.shape)
...:
In [85]: %timeit A[tuple(I.T)]
10000 loops, best of 3: 27.7 µs per loop
In [86]: %timeit ravel_einsum(A,I)
10000 loops, best of 3: 48.3 µs per loop
Case #2:
In [87]: # Inputs
...: A = np.random.randint(0,10,(3,5,4,2))
...: I = np.mod(np.random.randint(0,5,(10000,A.ndim)),A.shape)
...:
In [88]: %timeit A[tuple(I.T)]
1000 loops, best of 3: 357 µs per loop
In [89]: %timeit ravel_einsum(A,I)
1000 loops, best of 3: 240 µs per loop
Case #3:
In [90]: # Inputs
...: A = np.random.randint(0,10,(30,50,40,20))
...: I = np.mod(np.random.randint(0,50,(5000,A.ndim)),A.shape)
...:
In [91]: %timeit A[tuple(I.T)]
1000 loops, best of 3: 220 µs per loop
In [92]: %timeit ravel_einsum(A,I)
10000 loops, best of 3: 168 µs per loop