I have a M by N array I, each row of which is an index an N dimensional array A. I want a vectorized expression to get the 1-d array of the M indexed values from A. I found that A[tuple(I.T)] does the right thing, but profiling shows it to be very expensive despite being vectorized. It is also not particularly elegant or "natural" and A[I] and A[I.T] do something completely different
What is the right way to do this?
It should also works for assignment like
A[tuple(I.T)] = 1
I think you are talking about something like:
In [398]: A=np.arange(24).reshape(4,6)
In [401]: I=np.array([[0,1],[1,2],[3,4],[0,0],[2,5]])
In [402]: tuple(I.T)
Out[402]: (array([0, 1, 3, 0, 2]), array([1, 2, 4, 0, 5]))
In [403]: A[tuple(I.T)]
Out[403]: array([ 1, 8, 22, 0, 17])
This is http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing - purely integer array advanced indexing.
This is always going to be slower than basic indexing, which returns a view. Basic indexing pickes contiguous blocks of data, or values that can be selected with strides. That isn't possible with your indexing.
Look at some timings:
In [404]: timeit tuple(I.T)
100000 loops, best of 3: 3.4 µs per loop
In [405]: timeit A[tuple(I.T)]
100000 loops, best of 3: 10 µs per loop
In [406]: %%timeit i,j=tuple(I.T)
.....: A[i,j]
.....:
100000 loops, best of 3: 4.86 µs per loop
Constructing the tuple takes about 1/3 of the time. i,j=I.T is just a bit faster. But that indexing is the largest piece.
A[i,j] is the same as A[(i,j)] (as is A.__getitem__((i,j))). So wrapping I.T in tuple just produces the 2 indexing arrays, one for each dimension.
It is faster to index on a flattened version of the array:
In [420]: J= np.ravel_multi_index(tuple(I.T),A.shape)
In [421]: J
Out[421]: array([ 1, 8, 22, 0, 17], dtype=int32)
In [422]: A.flat[J]
Out[422]: array([ 1, 8, 22, 0, 17])
In [425]: timeit A.flat[J]
1000000 loops, best of 3: 1.56 µs per loop
In [426]: %%timeit
.....: J= np.ravel_multi_index(tuple(I.T),A.shape)
.....: A.flat[J]
.....:
100000 loops, best of 3: 11.2 µs per loop
So being able to precompute and reuse the indexes will save you time, but there's no way of getting about fact that selecting a bunch of individual values from A will takes extra time.
Just for fun, compare the time it takes to index A with each row of I:
In [442]: timeit np.array([A[tuple(i)] for i in I])
100000 loops, best of 3: 17.3 µs per loop
In [443]: timeit np.array([A[i,j] for i,j in I])
100000 loops, best of 3: 15.7 µs per loop
You can use linear indexing another way, like so -
def ravel_einsum(A,I):
# Get A's shape and calculate cummulative dimensions based on it
shp = np.asarray(A.shape)
cumdims = np.append(1,shp[::-1][:-1].cumprod())[::-1]
# Use linear indexing of A to extract elements from A corresponding
# to linear indexing of it with I
return A.ravel()[np.einsum('ij,j->i',I,cumdims)]
Runtime tests
Case #1:
In [84]: # Inputs
...: A = np.random.randint(0,10,(3,5,2,4,5,2,6,8,2,5,3,4,3))
...: I = np.mod(np.random.randint(0,10,(5,A.ndim)),A.shape)
...:
In [85]: %timeit A[tuple(I.T)]
10000 loops, best of 3: 27.7 µs per loop
In [86]: %timeit ravel_einsum(A,I)
10000 loops, best of 3: 48.3 µs per loop
Case #2:
In [87]: # Inputs
...: A = np.random.randint(0,10,(3,5,4,2))
...: I = np.mod(np.random.randint(0,5,(10000,A.ndim)),A.shape)
...:
In [88]: %timeit A[tuple(I.T)]
1000 loops, best of 3: 357 µs per loop
In [89]: %timeit ravel_einsum(A,I)
1000 loops, best of 3: 240 µs per loop
Case #3:
In [90]: # Inputs
...: A = np.random.randint(0,10,(30,50,40,20))
...: I = np.mod(np.random.randint(0,50,(5000,A.ndim)),A.shape)
...:
In [91]: %timeit A[tuple(I.T)]
1000 loops, best of 3: 220 µs per loop
In [92]: %timeit ravel_einsum(A,I)
10000 loops, best of 3: 168 µs per loop
Related
simple example:
a = array([[[1, 0, 0],
[0, 2, 0],
[0, 0, 3]],
[[1, 0, 0],
[0, 1, 0],
[0, 0, 1]]])
result = []
for i in a:
result.append(i.sum())
result = [6, 3]
Is there a numpy function doing this faster? If it helps: a contains only diagonal matrices.
Edit:
I just realized that a contains scipy csc_sparse matrices, i.e. its a numpy 1D array containing matrices and i can not apply the sum function with axis=(1, 2)
A proper use of the axis parameter of np.sum() would do:
import numpy as np
np.sum(a, axis=(1, 2))
# [6, 3]
While the above should be generic preferred method, if your input is actually diagonal over axis 1 and 2, then summing all the zeros is bound to be inefficient (read O(n² k) with same n and k as the gen_a() function below). Using np.sum() after np.diag() inside a loop can be much better (read O(n k) with same n and k as before). Possibly, using a list comprehension is the way to go:
import numpy as np
np.array([np.sum(np.diag(x)) for x in a])
# [3, 6]
To give some idea of the relative speed, let's write a function to generate inputs of arbitrary size:
def gen_a(n, k):
return np.array([
np.diag(np.ones(n, dtype=int))
if i % 2 else
np.diag(np.arange(1, n + 1, dtype=int))
for i in range(k)])
print(gen_a(3, 2))
# [[[1 0 0]
# [0 2 0]
# [0 0 3]]
# [[1 0 0]
# [0 1 0]
# [0 0 1]]]
Now, we can time for different input size. I have also included a list comprehension without the np.diag() call, which is fundamentally a slightly more concise version of your approach.
a = gen_a(3, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100000 loops, best of 3: 16 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100000 loops, best of 3: 4.51 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100000 loops, best of 3: 10 µs per loop
a = gen_a(3000, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 10000 loops, best of 3: 20.5 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 17.8 ms per loop
a = gen_a(3, 2000)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100 loops, best of 3: 14.8 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 10000 loops, best of 3: 34 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 8.93 ms per loop
a = gen_a(300, 200)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 1000 loops, best of 3: 1.67 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 19.3 ms per loop
And we observe that depending on the value of n and k one or the other solution gets faster.
For larger n, the list comprehension gets faster, but only if np.diag() is used.
On the contrary, for smaller n and larger k, np.sum() raw speed can outperform the explicit looping.
I have a NumPy array:
arr = [[1, 2],
[3, 4]]
I want to create a new array that contains powers of arr up to a power order:
# arr_new = [arr^0, arr^1, arr^2, arr^3,...arr^order]
arr_new = [[1, 1, 1, 2, 1, 4, 1, 8],
[1, 1, 3, 4, 9, 16, 27, 64]]
My current approach uses for loops:
# Pre-allocate an array for powers
arr = np.array([[1, 2],[3,4]])
order = 3
rows, cols = arr.shape
arr_new = np.zeros((rows, (order+1) * cols))
# Iterate over each exponent
for i in range(order + 1):
arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
print(arr_new)
Is there a faster (i.e. vectorized) approach to creating powers of an array?
Benchmarking
Thanks to #hpaulj and #Divakar and #Paul Panzer for the answers. I benchmarked the loop-based and broadcasting-based operations on the following test arrays.
arr = np.array([[1, 2],
[3,4]])
order = 3
arrLarge = np.random.randint(0, 10, (100, 100)) # 100 x 100 array
orderLarge = 10
The loop_based function is:
def loop_based(arr, order):
# pre-allocate an array for powers
rows, cols = arr.shape
arr_new = np.zeros((rows, (order+1) * cols))
# iterate over each exponent
for i in range(order + 1):
arr_new[:, (i * cols) : (i + 1) * cols] = arr**i
return arr_new
The broadcast_based function using hstack is:
def broadcast_based_hstack(arr, order):
# Create a 3D exponent array for a 2D input array to force broadcasting
powers = np.arange(order + 1)[:, None, None]
# Generate values (third axis contains array at various powers)
exponentiated = arr ** powers
# Reshape and return array
return np.hstack(exponentiated) # <== using hstack function
The broadcast_based function using reshape is:
def broadcast_based_reshape(arr, order):
# Create a 3D exponent array for a 2D input array to force broadcasting
powers = np.arange(order + 1)[:, None]
# Generate values (3-rd axis contains array at various powers)
exponentiated = arr[:, None] ** powers
# reshape and return array
return exponentiated.reshape(arr.shape[0], -1) # <== using reshape function
The broadcast_based function using cumulative product cumprod and reshape:
def broadcast_cumprod_reshape(arr, order):
rows, cols = arr.shape
# Create 3D empty array where the middle dimension is
# the array at powers 0 through order
out = np.empty((rows, order + 1, cols), dtype=arr.dtype)
out[:, 0, :] = 1 # 0th power is always 1
a = np.broadcast_to(arr[:, None], (rows, order, cols))
# Cumulatively multiply arrays so each multiplication produces the next order
np.cumprod(a, axis=1, out=out[:,1:,:])
return out.reshape(rows, -1)
On Jupyter notebook, I used the timeit command and got these results:
Small arrays (2x2):
%timeit -n 100000 loop_based(arr, order)
7.41 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit -n 100000 broadcast_based_hstack(arr, order)
10.1 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit -n 100000 broadcast_based_reshape(arr, order)
3.31 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit -n 100000 broadcast_cumprod_reshape(arr, order)
11 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Large arrays (100x100):
%timeit -n 1000 loop_based(arrLarge, orderLarge)
261 µs ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit -n 1000 broadcast_based_hstack(arrLarge, orderLarge)
225 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit -n 1000 broadcast_based_reshape(arrLarge, orderLarge)
223 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit -n 1000 broadcast_cumprod_reshape(arrLarge, orderLarge)
157 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Conclusions:
It seems that the broadcast based approach using reshape is faster for smaller arrays. However, for large arrays, the cumprod approach scales better and is faster.
Extend arrays to higher dims and let broadcasting do its magic with some help from reshaping -
In [16]: arr = np.array([[1, 2],[3,4]])
In [17]: order = 3
In [18]: (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
Out[18]:
array([[ 1, 1, 1, 2, 1, 4, 1, 8],
[ 1, 1, 3, 4, 9, 16, 27, 64]])
Note that arr[:,None] is essentially arr[:,None,:], but we can skip the trailing ellipsis for brevity.
Timings on a bigger dataset -
In [40]: np.random.seed(0)
...: arr = np.random.randint(0,9,(100,100))
...: order = 10
# #hpaulj's soln with broadcasting and stacking
In [41]: %timeit np.hstack(arr **np.arange(order+1)[:,None,None])
1000 loops, best of 3: 734 µs per loop
In [42]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop
That reshaping part is practically free and that's where we gain performance here alongwith the broadcasting part of course, as seen in the breakdown below -
In [52]: %timeit (arr[:,None]**np.arange(order+1)[:,None])
1000 loops, best of 3: 390 µs per loop
In [53]: %timeit (arr[:,None]**np.arange(order+1)[:,None]).reshape(arr.shape[0],-1)
1000 loops, best of 3: 401 µs per loop
Use broadcasting to generate the values, and reshape or rearrange the values as desired:
In [34]: arr **np.arange(4)[:,None,None]
Out[34]:
array([[[ 1, 1],
[ 1, 1]],
[[ 1, 2],
[ 3, 4]],
[[ 1, 4],
[ 9, 16]],
[[ 1, 8],
[27, 64]]])
In [35]: np.hstack(_)
Out[35]:
array([[ 1, 1, 1, 2, 1, 4, 1, 8],
[ 1, 1, 3, 4, 9, 16, 27, 64]])
Here is a solution using cumulative multiplication which scales better than power based approaches, especially if the input array is of float dtype:
import numpy as np
def f_mult(a, k):
m, n = a.shape
out = np.empty((m, k, n), dtype=a.dtype)
out[:, 0, :] = 1
a = np.broadcast_to(a[:, None], (m, k-1, n))
a.cumprod(axis=1, out=out[:, 1:])
return out.reshape(m, -1)
Timings:
int up to power 9
divakar: 0.4342731796205044 ms
hpaulj: 0.794165057130158 ms
pp: 0.20520629966631532 ms
float up to power 39
divakar: 29.056487752124667 ms
hpaulj: 31.773792404681444 ms
pp: 1.0329263447783887 ms
Code for timings, thks #Divakar:
def f_divakar(a, k):
return (a[:,None]**np.arange(k)[:,None]).reshape(a.shape[0],-1)
def f_hpaulj(a, k):
return np.hstack(a**np.arange(k)[:,None,None])
from timeit import timeit
np.random.seed(0)
a = np.random.randint(0,9,(100,100))
k = 10
print('int up to power 9')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp: ', timeit(lambda: f_mult(a, k), number=1000), 'ms')
a = np.random.uniform(0.5,2.0,(100,100))
k = 40
print('float up to power 39')
print('divakar:', timeit(lambda: f_divakar(a, k), number=1000), 'ms')
print('hpaulj: ', timeit(lambda: f_hpaulj(a, k), number=1000), 'ms')
print('pp: ', timeit(lambda: f_mult(a, k), number=1000), 'ms')
You are creating a Vandermonde matrix with a reshape, so it is probably best to use numpy.vander to make it, and let someone else take care of the best algorithm.
This way your code is just:
np.vander(arr.ravel(), order).reshape((arr.shape[0], -1))
That said, it seems like they use something like Paul Panzer's cumprod method under the hood so it should scale well.
This question already has answers here:
Grouping indices of unique elements in numpy
(6 answers)
Closed 5 years ago.
I have a numpy array with some integers, e.g.,
a = numpy.array([1, 6, 6, 4, 1, 1, 4])
I would now like to put all items into "bins" of equal values such that the bin with label 1 contains all indices of a that have the value 1. For the above example:
bins = {
1: [0, 4, 5],
6: [1, 2],
4: [3, 6],
}
A combination of unique and wheres does the trick,
uniques = numpy.unique(a)
bins = {u: numpy.where(a == u)[0] for u in uniques}
but this doesn't seem ideal since the number of unique entries may be large.
Defaultdict with append would do the trick:
from collections import defaultdict
d = defaultdict(list)
for ix, val in enumerate(a):
d[val].append(ix)
Here is one way by utilizing the broadcasting, np.where(), and np.split():
In [66]: unique = np.unique(a)
In [67]: rows, cols = np.where(unique[:, None] == a)
In [68]: indices = np.split(cols, np.where(np.diff(rows) != 0)[0] + 1)
In [69]: dict(zip(unique, indices))
Out[69]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
Here's one approach -
def groupby_uniqueness_dict(a):
sidx = a.argsort()
b = a[sidx]
cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
parts = np.split(sidx, cut_idx)
out = dict(zip(b[np.r_[0,cut_idx]], parts))
return out
More efficient one by avoiding the use of np.split -
def groupby_uniqueness_dict_v2(a):
sidx = a.argsort() # use .tolist() for output dict values as lists
b = a[sidx]
cut_idx = np.flatnonzero(b[1:] != b[:-1])+1
idxs = np.r_[0,cut_idx, len(b)+1]
out = {b[i]:sidx[i:j] for i,j in zip(idxs[:-1], idxs[1:])}
return out
Sample run -
In [161]: a
Out[161]: array([1, 6, 6, 4, 1, 1, 4])
In [162]: groupby_uniqueness_dict(a)
Out[162]: {1: array([0, 4, 5]), 4: array([3, 6]), 6: array([1, 2])}
Runtime test
Other approach(es) -
from collections import defaultdict
def defaultdict_app(a): # #Grisha's soln
d = defaultdict(list)
for ix, val in enumerate(a):
d[val].append(ix)
return d
Timings -
Case #1 : Dict values as arrays
In [226]: a = np.random.randint(0,1000, 10000)
In [227]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.06 ms per loop
100 loops, best of 3: 3.06 ms per loop
100 loops, best of 3: 2.02 ms per loop
In [228]: a = np.random.randint(0,10000, 100000)
In [229]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 43.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
100 loops, best of 3: 19.9 ms per loop
Case #2 : Dict values as lists
In [238]: a = np.random.randint(0,1000, 10000)
In [239]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
100 loops, best of 3: 4.15 ms per loop
100 loops, best of 3: 4.5 ms per loop
100 loops, best of 3: 2.44 ms per loop
In [240]: a = np.random.randint(0,10000, 100000)
In [241]: %timeit defaultdict_app(a)
...: %timeit groupby_uniqueness_dict(a)
...: %timeit groupby_uniqueness_dict_v2(a)
10 loops, best of 3: 57.5 ms per loop
10 loops, best of 3: 54.6 ms per loop
10 loops, best of 3: 34 ms per loop
I've got a numpy array of strictly increasing "cutoff" values of length m, and a pandas series of values (thought the index isn't important and this could be cast to a numpy array) of values of length n.
I need to come up with an efficient way of spitting out a length m vector of counts of the number of elements in the pandas series less than the jth element of the "cutoff" array.
I could do this via a list iterator:
output = array([(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar])
but I was wondering if there were any way to do this that leveraged more of numpy's magic speed, as I have to do this quite a few times inside multiple loops and it keeps crasshing my computer.
Thanks!
Is this what you are looking for?
In [36]: a = np.random.random(20)
In [37]: a
Out[37]:
array([ 0.68574307, 0.15743428, 0.68006876, 0.63572484, 0.26279663,
0.14346269, 0.56267286, 0.47250091, 0.91168387, 0.98915746,
0.22174062, 0.11930722, 0.30848231, 0.1550406 , 0.60717858,
0.23805205, 0.57718675, 0.78075297, 0.17083826, 0.87301963])
In [38]: b = np.array((0.3,0.7))
In [39]: np.sum(a[:,None]<b[None,:], axis=0)
Out[39]: array([ 8, 16])
In [40]: np.sum(a[:,None]<b, axis=0) # b's new axis above is unnecessary...
Out[40]: array([ 8, 16])
In [41]: (a[:,None]<b).sum(axis=0) # even simpler
Out[41]: array([ 8, 16])
Timings are always well received (for a longish, 2E6 elements array)
In [47]: a = np.random.random(2000000)
In [48]: %timeit (a[:,None]<b).sum(axis=0)
10 loops, best of 3: 78.2 ms per loop
In [49]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
1 loop, best of 3: 448 ms per loop
For a smaller array
In [50]: a = np.random.random(2000)
In [51]: %timeit (a[:,None]<b).sum(axis=0)
10000 loops, best of 3: 89 µs per loop
In [52]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 141 µs per loop
Edit
Divakar says that things may be different for lenghty bs, let's see
In [71]: a = np.random.random(2000)
In [72]: b =np.random.random(200)
In [73]: %timeit (a[:,None]<b).sum(axis=0)
1000 loops, best of 3: 1.44 ms per loop
In [74]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
10000 loops, best of 3: 172 µs per loop
quite different indeed! Thank you for prompting my curiosity.
Probably the OP should test for his use case, very long sample with respect to cutoff sequences or not? and where there is a balance?
Edit #2
I made a blooper in my timings, I forgot the axis=0 argument to .sum()...
I've edited the timings with the corrected statement and, of course, the corrected timing. My apologies.
You can use np.searchsorted for some NumPy magic -
# Convert to numpy array for some "magic"
pan_series_arr = np.array(pan_series)
# Let the magic begin!
sortidx = pan_series_arr.argsort()
out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
Explanation
You are performing [(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar] i.e. for each
element in cutoff_ar, we are counting the number of pan_series elements that are lesser than it. Now with np.searchsorted, we are looking for cutoff_ar to be put in a sorted pan_series_arr and get the indices of such positions compared to whom the current element in cutoff_ar is at 'right' position . These indices essentially represent the number of pan_series elements below the current cutoff_ar element, thus giving us our desired output.
Sample run
In [302]: cutoff_ar
Out[302]: array([ 1, 3, 9, 44, 63, 90])
In [303]: pan_series_arr
Out[303]: array([ 2, 8, 69, 55, 97])
In [304]: [(pan_series_arr < cutoff_val).sum() for cutoff_val in cutoff_ar]
Out[304]: [0, 1, 2, 2, 3, 4]
In [305]: sortidx = pan_series_arr.argsort()
...: out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
...:
In [306]: out
Out[306]: array([0, 1, 2, 2, 3, 4])
In numpy, is there a nice idiomatic way of testing if all rows are equal in a 2d array?
I can do something like
np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
This seems to mix python lists with numpy arrays which is ugly and presumably also slow.
Is there a nicer/neater way?
One way is to check that every row of the array arr is equal to its first row arr[0]:
(arr == arr[0]).all()
Using equality == is fine for integer values, but if arr contains floating point values you could use np.isclose instead to check for equality within a given tolerance:
np.isclose(a, a[0]).all()
If your array contains NaN and you want to avoid the tricky NaN != NaN issue, you could combine this approach with np.isnan:
(np.isclose(a, a[0]) | np.isnan(a)).all()
Simply check if the number if unique items in the array are 1:
>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> len(np.unique(arr)) == 1
True
A solution inspired from unutbu's answer:
>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> np.all(np.all(arr == arr[0,:], axis = 1))
True
One problem with your code is that you're creating an entire list first before applying np.all() on it. Due to that there's no short-circuiting happening in your version, instead of that it would be better if you use Python's all() with a generator expression:
Timing comparisons:
>>> M = arr = np.array([[3]*100] + [[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 272 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 596 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 10.6 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100000 loops, best of 3: 11.3 µs per loop
>>> M = arr = np.array([[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 330 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 594 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 9.51 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100 loops, best of 3: 9.44 ms per loop
It is worth mentioning that the above version will not work for multidimensional arrays.
For example: for a three-dimensional square image tensor img [256, 256, 3] , we need to check whether the same RGB [256, 256] layers in the image or not.
In this case, we need to use broadcasting
(img == img[:, :, 0, np.newaxis]).all()
Because simple img[:, :, 0] gives us [256, 256], but we need [256, 256, 1] to broadcast through layers.
For Alex's answer about nan, we have now,
np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
np.allclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)