How can I check whether a numpy array is empty or not?
I used the following code, but this fails if the array contains a zero.
if not self.Definition.all():
Is this the solution?
if self.Definition == array([]):
You can always take a look at the .size attribute. It is defined as an integer, and is zero (0) when there are no elements in the array:
import numpy as np
a = np.array([])
if a.size == 0:
# Do something when `a` is empty
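Since size is a plain integer and 0 is falsy, its truth value can be used directly as well (a minimal sketch, assuming np is imported as above):
>>> a = np.array([])
>>> if not a.size:
...     print('a is empty')
...
a is empty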
https://numpy.org/devdocs/user/quickstart.html (2020.04.08)
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
(...) NumPy’s array class is called ndarray. (...) The more important attributes of an ndarray object are:
ndarray.ndim
the number of axes (dimensions) of the array.
ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
One caveat, though.
Note that np.array(None).size returns 1!
This is because a.size is equivalent to np.prod(a.shape),
np.array(None).shape is (), and an empty product is 1.
>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0
Therefore, I use the following to test if a numpy array has elements:
>>> def elements(array):
... return array.ndim and array.size
>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24
Why would we want to check if an array is empty? Arrays don't grow or shrink in the same way that lists do. Starting with an 'empty' array and growing it with np.append is a frequent novice error.
Using a list in if alist: hinges on its boolean value:
In [102]: bool([])
Out[102]: False
In [103]: bool([1])
Out[103]: True
But trying to do the same with an array produces (in version 1.18):
In [104]: bool(np.array([]))
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value
of an empty array is ambiguous. Returning False, but in
future this will result in an error. Use `array.size > 0` to
check that an array is not empty.
#!/usr/bin/python3
Out[104]: False
In [105]: bool(np.array([1]))
Out[105]: True
and bool(np.array([1,2])) produces the infamous ambiguity error.
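For the record, that error looks roughly like this (a sketch, not verbatim output):
In [106]: bool(np.array([1,2]))
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()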
edit
The accepted answer suggests size:
In [11]: x = np.array([])
In [12]: x.size
Out[12]: 0
But I (and most others) check the shape more than the size:
In [13]: x.shape
Out[13]: (0,)
Another thing in its favor is that it 'maps' on to an empty list:
In [14]: x.tolist()
Out[14]: []
But there are other arrays with 0 size that aren't 'empty' in that last sense:
In [15]: x = np.array([[]])
In [16]: x.size
Out[16]: 0
In [17]: x.shape
Out[17]: (1, 0)
In [18]: x.tolist()
Out[18]: [[]]
In [19]: bool(x.tolist())
Out[19]: True
np.array([[],[]]) is also size 0, but shape (2,0) and len 2.
While the concept of an empty list is well defined, an empty array is not well defined. One empty list is equal to another. The same can't be said for a size 0 array.
The answer really depends on
what do you mean by 'empty'?
what are you really testing for?
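If by 'empty' you mean 'no elements at all', a minimal sketch of one possible test (the helper name is mine):
>>> def is_empty(a):
...     return np.asarray(a).size == 0
>>> is_empty(np.array([]))
True
>>> is_empty(np.array([[]]))    # shape (1, 0), size 0
True
>>> is_empty(np.array(None))    # 0d, size 1
False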
Related
I just came across this piece of code:
x = np.load(lc_path, allow_pickle=True)[()]
And I've never seen this pattern before: [()]. What does it do, and why is it syntactically correct?
>>> a = np.load(lc_path, allow_pickle=True)
>>> a
array({'train_ppls': [1158.359413193576, 400.54333992093854, ...],
       'val_ppls': [493.0056070137404, 326.53203520368623, ...],
       'train_losses': [340.40905952453613, 675.6475067138672, ...],
       'val_losses': [217.46258735656738, 438.86770486831665, ...],
       'times': [19.488852977752686, 20.147733449935913, ...]}, dtype=object)
So I guess a is a dict wrapped in an array, for some reason, by the person who saved it.
It's a way (the only way) of indexing a 0d array:
In [475]: x=np.array(21)
In [476]: x
Out[476]: array(21)
In [477]: x.shape
Out[477]: ()
In [478]: x[()]
Out[478]: 21
In effect it pulls the element out of the array. item() is another way:
In [479]: x.item()
Out[479]: 21
In [480]: x.ndim
Out[480]: 0
In
x = np.load(lc_path, allow_pickle=True)[()]
most likely np.save was given a non-array, which it wrapped in a 0d object-dtype array in order to save it. Indexing with [()] is a way of recovering that object.
In [481]: np.save('test.npy', {'a':1})
In [482]: x = np.load('test.npy', allow_pickle=True)
In [483]: x
Out[483]: array({'a': 1}, dtype=object)
In [484]: x.ndim
Out[484]: 0
In [485]: x[()]
Out[485]: {'a': 1}
In general when we index a nd array, e.g. x[1,2] we are really doing x[(1,2)], that is, using a tuple that corresponds to the number of dimensions. If x is 0d, the only tuple that works is an empty one, ().
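A quick sketch of that equivalence, continuing the session:
In [486]: a2 = np.arange(6).reshape(2, 3)
In [487]: a2[1, 2] == a2[(1, 2)]
Out[487]: True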
That's indexing the array with a tuple of 0 indices. For most arrays, this just produces a view of the whole array, but for a 0-dimensional array, it extracts the array's single element as a scalar.
In this case, it looks like someone made the weird choice to dump a non-NumPy object to an array with numpy.save, resulting in NumPy saving a 0-dimensional array of object dtype wrapping the original object. The use of allow_pickle=True and the empty tuple index extracts the object from the 0-dimensional array.
They probably should have picked something other than numpy.save to save this object.
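For instance, pickle handles a plain dict directly, with no 0d array wrapper and no [()] needed afterwards (a minimal sketch; the filename is made up):
>>> import pickle
>>> with open('lc.pkl', 'wb') as f:
...     pickle.dump({'a': 1}, f)
>>> with open('lc.pkl', 'rb') as f:
...     x = pickle.load(f)
>>> x
{'a': 1}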
I am doing some quick calculations on a scalar value from a numpy array. As it says in the documentation,
The primary advantage of using array scalars is that they preserve the
array type (Python may not have a matching scalar type available, e.g.
int16)...
But is there a better (faster and more concise) way of assigning a new value to an existing array scalar than this:
>>> x = np.array(2.0, dtype='float32')
which works but is not that convenient (I am doing other arithmetic and want to preserve the type throughout).
This doesn't work for obvious reasons:
>>> x = np.array(1.0, dtype='float32')
>>> print(x, type(x))
1.0 <class 'numpy.ndarray'>
>>> x = 2.0
>>> print(x, type(x))
2.0 <class 'float'>
Neither does this:
>>> x = np.array(1.0, dtype='float32')
>>> x[] = 2.0
File "<ipython-input-319-7f36071ff81d>", line 2
x[] = 2.0
^
SyntaxError: invalid syntax
Nor this:
>>> x = np.array(1.0, dtype='float32')
>>> x[:] = 2.0
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-24-62cd4ca238ce> in <module>()
1 x = np.array(1.0, dtype='float32')
----> 2 x[:] = 2.0
IndexError: too many indices for array
UPDATE:
Based on comments below (thanks) I have now realised that I am not actually using array scalars. x is a zero-dimensional array.
Here is how to create an array scalar:
>>> a = np.array((1.0, 2.0, 3.0), dtype='float32')
>>> x = a[0]
>>> print(x, type(x))
1.0 <class 'numpy.float32'>
Or simply:
>>> x = np.float32(1.0)
>>> print(x, type(x))
1.0 <class 'numpy.float32'>
A 0d array can be modified, but an array scalar cannot:
In [199]: x = np.array(1.0, 'float32')
In [200]: x
Out[200]: array(1., dtype=float32)
In [201]: x.shape
Out[201]: ()
In [202]: x[...] = 2
In [203]: x
Out[203]: array(2., dtype=float32)
In [204]: x[()] = 3
In [205]: x
Out[205]: array(3., dtype=float32)
You have to mutate x, not assign a new object to the variable.
That said, I don't see why one would want, or need, to do this.
This 0d array is not quite the same as an array scalar:
In [207]: y = np.float32(1)
In [208]: y[...] = 2
....
TypeError: 'numpy.float32' object does not support item assignment
Extracting an element from an array with indexing produces an array scalar:
In [210]: type(x[()])
Out[210]: numpy.float32
The float32 object has many of the array attributes, even methods, but it isn't quite the same:
In [211]: x.shape
Out[211]: ()
In [212]: y.shape
Out[212]: ()
An array can be indexed with a tuple the same size as its shape: arr[1,2] is the same as arr[(1,2)]. The shape of x is (), so it can only be indexed with an empty tuple, x[()]. Similarly, arr[:,:] works for a 2d array but not for a 1d one. ... means 'as many slices as needed', so x[...] works as well.
Enough of __getitem__ has been defined for np.generic class objects to allow indexing like [...] and [()]. But the corresponding assignment has not been defined.
It might be useful to look at the class hierarchy of classes like np.ndarray, np.int_, np.float32, np.float, and np.int.
fuller quote
From your link: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html#array-scalars
NumPy generally returns elements of arrays as array scalars (a scalar with an associated dtype). Array scalars differ from Python scalars, but for the most part they can be used interchangeably (the primary exception is for versions of Python older than v2.x, where integer array scalars cannot act as indices for lists and tuples). There are some exceptions, such as when code requires very specific attributes of a scalar or when it checks specifically whether a value is a Python scalar. Generally, problems are easily fixed by explicitly converting array scalars to Python scalars, using the corresponding Python type function (e.g., int, float, complex, str, unicode).
The primary advantage of using array scalars is that they preserve the array type (Python may not have a matching scalar type available, e.g. int16). Therefore, the use of array scalars ensures identical behaviour between arrays and scalars, irrespective of whether the value is inside an array or not. NumPy scalars also have many of the same methods arrays do.
The 2nd paragraph is written in the context of the 1st. It attempts to explain why elements of an array are returned as array scalars. That is, why arr[0,1] returns a np.float32 object, as opposed to a Python float.
It is not suggesting that we create an array scalar directly.
I first wrote this answer glossing over the difference between a 0d array and what this quote is calling array scalars.
I want to understand the NumPy behavior.
When I try to get a reference to an inner array of a NumPy array, and then compare it to the object itself, I get False as the returned value.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
While on the other hand, with native Python objects, the returned value is True.
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question: is this the intended behavior of NumPy? If so, what should I do if I want to create a reference to an inner object of a NumPy array?
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.
In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)
In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False) # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:] # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.
A common way of implementing 2d arrays in C is as an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, using shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides, as well as a pointer to the shared databuffer.
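A small sketch of those parameters (the exact stride values assume a C-ordered int32 array):
In [89]: m = np.arange(6, dtype=np.int32).reshape(2, 3)
In [90]: m.strides        # 12 bytes to the next row, 4 to the next element
Out[90]: (12, 4)
In [91]: m[0, :].strides  # the row view keeps just the column stride
Out[91]: (4,)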
1d array indexing
I'll try to illustrate what is going on when you index an array:
In [51]: arr = np.arange(4)
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
In [52]: np.info(arr)
class: ndarray
shape: (4,)
strides: (4,)
itemsize: 4
aligned: True
contiguous: True
fortran: True
data pointer: 0xa84f8d8
byteorder: little
byteswap: False
type: int32
or
In [53]: arr.__array_interface__
Out[53]:
{'data': (176486616, False),
'descr': [('', '<i4')],
'shape': (4,),
'strides': None,
'typestr': '<i4',
'version': 3}
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
If I index an element, I get a new object:
In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]:
{'__ref': array(1),
'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352
It has some properties of an array, but not all. For example, you can't assign to it. Notice also that its 'data' value is totally different.
Make another selection from the same place - different id and different data:
In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)
Also if I change the array at this point, it does not affect the earlier selections:
In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10, 2, 3])
In [63]: x1
Out[63]: 1
x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.
With slicing it is possible to get a view of the original array:
In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]:
{'data': (176486620, False),
'descr': [('', '<i4')],
'shape': (1,),
....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1
Its data pointer is 4 bytes larger than arr's - that is, it points into the same buffer, just at a different spot. And changing y does change arr (but not the independent x1).
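The base attribute records that link (a quick check, continuing the session):
In [70]: y.base is arr
Out[70]: True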
I could even make a 0d view of this item:
In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])
In Python code we normally don't work with objects like this. When we use the c-api or cython it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low-level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
elementwise ==
In response to a comment, Comparing NumPy object references:
np.array([1]) == np.array([2]) will return array([False], dtype=bool)
== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
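For example (np.array_equal bundles the shape and value checks into one call):
>>> a = np.array([1, 2])
>>> b = np.array([1, 2])
>>> (a == b).all()
True
>>> np.array_equal(a, b)
True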
The is test compares object id's (not just for numpy objects). It has limited value in practical coding. I used it most often in expressions like is None, where None is an object with a unique id, and which does not play nicely with equality tests.
I think you have a misunderstanding about NumPy arrays. You are assuming that the sub-arrays of a multidimensional array are separate objects, the way the inner lists of a nested Python list are; they are not.
A NumPy array, regardless of its dimension, is just one object. That's because NumPy creates the array at the C level, as a single block of data, and when it is loaded as a Python object it can't be broken down into multiple objects. Instead, Python has to create a new object each time you pull out a part of the array, whether via split(), __getitem__, take(), etc.; as a matter of fact, that is just the way Python abstracts list-like behavior for NumPy arrays.
You can also check this yourself:
In [7]: x
Out[7]:
array([[1, 2, 3],
[4, 5, 6]])
In [8]: x[0] is x[0]
Out[8]: False
So as soon as you want an array, or any mutable object, to hold separate Python objects inside it, you are back to an ordinary Python mutable object, and you lose the performance and all the other cool features of NumPy arrays.
Also, as @Imanol mentioned in the comments, you may want to use NumPy view objects if you want a memory-efficient and flexible way to modify an array through a reference. View objects can be constructed in the following two ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of
the array’s memory with a different data-type. This can cause a
reinterpretation of the bytes of memory.
a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns
an instance of ndarray_subclass that looks at the same array (same
shape, dtype, etc.) This does not cause a reinterpretation of the
memory.
For a.view(some_dtype), if some_dtype has a different number of bytes
per entry than the previous dtype (for example, converting a regular
array to a structured array), then the behavior of the view cannot be
predicted just from the superficial appearance of a (shown by
print(a)). It also depends on exactly how a is stored in memory.
Therefore if a is C-ordered versus fortran-ordered, versus defined as
a slice or transpose, etc., the view may give different results.
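A small sketch of the first kind of view (the int16 values shown assume a little-endian machine):
>>> a = np.array([1, 2], dtype=np.int32)
>>> b = a.view(np.int16)    # same 8 bytes, reinterpreted
>>> b
array([1, 0, 2, 0], dtype=int16)
>>> b[0] = 7                # writes through to a's buffer
>>> a
array([7, 2], dtype=int32)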
Not sure if it's useful at this point, but numpy.ndarray.ctypes seems to have useful bits:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html
Used something like this (it skips the dtype check, but meh):
def is_same_array(a, b):
    # same shape, same values, and same underlying data pointer
    return (a.shape == b.shape) and (a == b).all() and a.ctypes.data == b.ctypes.data
here:
https://github.com/EricCousineau-TRI/repro/blob/a60daf899e9726daf2ca1259bb80ad2c7c9b3e3f/python/namedlist_alt.py#L111
So here's an oddity I noticed recently. In the code below I am creating either a 1D or 2D numpy array, extracting one of the values from the array (position_3), and then assigning a different value to the position I extracted from.
In the 1D case position_3 matches the originally assigned value in the array (i.e. it looks like position_3 is a copy from the 1D array), while in the 2D case position_3 changes upon changing the array (i.e. it looks like position_3 is a reference into the 2D array).
import numpy as np

print("Testing 1D array")
D1_array = np.array([0,1,2,3,4])
position_3 = D1_array[3]
D1_array[3] = 0
print("Value at position 3 in array: %i" % D1_array[3])  #: 0
print("Value at position_3 variable: %i" % position_3)   #: 3

print("Testing 2D array")
D2_array = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11],[12,13,14]])
position_3 = D2_array[3]
D2_array[3] = [0,0,0]
print("Value at position 3 in array: %s" % str(D2_array[3]))  #: [0 0 0]
print("Value at position_3 variable: %s" % str(position_3))   #: [0 0 0]
I understand that everything is a reference in Python etc., but what I don't understand is why is this behavior inconsistent between 1D and 2D arrays? It's worth noting that the same comparison with Python lists yields the copy-like behaviour in both cases (i.e. in the 2D array - AKA a nested list - the position_3 variable remains [6,7,8]).
Per the docs on Basic Slicing and Indexing:
The simplest case of indexing with N integers returns an array scalar
representing the corresponding item....
All arrays generated by basic slicing are always views of the original array. (my emphasis)
So indexing a 1D array with an integer returns an array scalar:
In [32]: D1_array = np.array([0,1,2,3,4])
In [33]: D2_array = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11],[12,13,14]])
In [36]: D1_array[3]
Out[36]: 3
In [37]: type(D1_array[3]) # an array scalar
Out[37]: numpy.int64
If you assign position_3 = D1_array[3] then position_3 holds the value of this array scalar. Modifying the original array with D1_array[3] = 0 does not affect the value of position_3.
In contrast, indexing a 2D array with an integer returns a view:
In [38]: D2_array[3]
Out[38]: array([ 9, 10, 11])
In [40]: type(D2_array[3])
Out[40]: numpy.ndarray
In [42]: D2_array[3].base is D2_array
Out[42]: True
Modifying a view also alters the original array, and modifying the original array also affects the view.
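A quick sketch with a fresh array:
>>> arr2 = np.array([[0, 1, 2], [3, 4, 5]])
>>> row = arr2[1]      # a view of the second row
>>> row[0] = 99
>>> arr2
array([[ 0,  1,  2],
       [99,  4,  5]])
>>> arr2[1, 2] = -1
>>> row
array([99,  4, -1])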
Say I have a 3 dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to:
In [226]: for i in range(5):
     ...:     for j in second:
     ...:         for k in third:
     ...:             print(A[i][j][k])
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
Numpy uses multiple indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
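A quick sketch of that zip-like pairing (using the a, second, and third defined in the example just below):
>>> a[:, second, third].shape    # pairs (1,3) and (2,4) rather than the outer product
(5, 2)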
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second, into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.
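Continuing the session, a quick demonstration (the element values follow from np.arange):
In [17]: s2[0, 0, 0] = -1
In [18]: a[0, 1, 3]    # the slice-based view wrote through to a
Out[18]: -1
In [19]: s[0, 0, 0]    # the fancy-indexed copy is unchanged
Out[19]: 8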
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
Downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function.
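A minimal sketch of such a wrapper (the function name and the axis-to-indices dict convention are mine):
>>> def ix_select(a, axis_indices):
...     # axes not mentioned in axis_indices get their full range
...     full = [axis_indices.get(ax, range(a.shape[ax])) for ax in range(a.ndim)]
...     return a[np.ix_(*full)]
>>> ix_select(A, {1: second, 2: third}).shape
(5, 2, 2)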
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])