What does "[()]" mean when called upon a numpy array? - python

I just came across this piece of code:
x = np.load(lc_path, allow_pickle=True)[()]
And I've never seen this pattern before: [()]. What does it do and why is this syntactically correct?
a = np.load(lc_path, allow_pickle=True)
>>> array({'train_ppls': [1158.359413193576, 400.54333992093854, ...],
'val_ppls': [493.0056070137404, 326.53203520368623, ...],
'train_losses': [340.40905952453613, 675.6475067138672, ...],
'val_losses': [217.46258735656738, 438.86770486831665, ...],
'times': [19.488852977752686, 20.147733449935913, ...]}, dtype=object)
So I guess a is a dict that the person who saved it wrapped in an array for some reason.

It's a way (the only way) of indexing a 0d array:
In [475]: x=np.array(21)
In [476]: x
Out[476]: array(21)
In [477]: x.shape
Out[477]: ()
In [478]: x[()]
Out[478]: 21
In effect it pulls the element out of the array. item() is another way:
In [479]: x.item()
Out[479]: 21
In [480]: x.ndim
Out[480]: 0
In
x = np.load(lc_path, allow_pickle=True)[()]
np.save was most likely given a non-array, which it wrapped in a 0d object-dtype array in order to save it. Indexing with [()] is a way of recovering that object.
In [481]: np.save('test.npy', {'a':1})
In [482]: x = np.load('test.npy', allow_pickle=True)
In [483]: x
Out[483]: array({'a': 1}, dtype=object)
In [484]: x.ndim
Out[484]: 0
In [485]: x[()]
Out[485]: {'a': 1}
In general, when we index an nd array, e.g. x[1,2], we are really doing x[(1,2)], that is, using a tuple whose length corresponds to the number of dimensions. If x is 0d, the only tuple that works is an empty one, ().
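As a small sketch of the tuple equivalence described above (the array names m and z are made up for illustration):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

# m[1, 2] is shorthand for indexing with the tuple (1, 2)
same = m[1, 2] == m[(1, 2)]

# A 0d array has no axes, so the matching index tuple is the empty one
z = np.array(21)
elem = z[()]

# On an array with dimensions, the empty tuple selects everything:
# the result is a new array object that views the same data
whole = m[()]
print(same, elem, whole.shape, np.shares_memory(whole, m))
```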

That's indexing the array with a tuple of 0 indices. For most arrays, this just produces a view of the whole array, but for a 0-dimensional array, it extracts the array's single element as a scalar.
In this case, it looks like someone made the weird choice to dump a non-NumPy object to an array with numpy.save, resulting in NumPy saving a 0-dimensional array of object dtype wrapping the original object. allow_pickle=True lets numpy.load unpickle that object, and the empty tuple index extracts it from the 0-dimensional array.
They probably should have picked something other than numpy.save to save this object.
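One possible alternative, as a sketch only: since numpy.save pickles a non-array object anyway, the standard pickle module can store the dict directly, with no array wrapper to undo on load. The dict contents and the file name below are invented for the example.

```python
import os
import pickle
import tempfile

# Made-up stand-in for the training history dict from the question
history = {'train_losses': [340.4, 675.6], 'val_losses': [217.5, 438.9]}

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'history.pkl')

with open(path, 'wb') as f:
    pickle.dump(history, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)   # comes back as a plain dict, no [()] needed

print(type(restored), restored == history)
```

As with allow_pickle=True, only unpickle files that come from a source you trust.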

Related

What's the difference between np.array(int) and np.array([int])?

What's the difference between np.array(100) and np.array([100])? I understand that the latter is a 1D array containing a single value (100) but what is the former called?
This is a 0d array. It can be used in many of the same ways as other arrays, subject of course to shape and dtype compatibilities.
In [545]: x=np.array(3)
In [546]: x.shape
Out[546]: () # empty tuple
In [547]: x.ndim
Out[547]: 0
In [548]: x.ravel()
Out[548]: array([3]) # as with other arrays, ravel makes a 1d array
In [549]: x.reshape(1,1,1) # reshape to 3d
Out[549]: array([[[3]]])
In [550]: x.item() # extracting that element
Out[550]: 3
In [551]: x[()] # another extracting
Out[551]: 3
In [552]: type(_)
Out[552]: numpy.int64
In [553]: type(x.item())
Out[553]: int
There's a subtle difference between item() and [()]. item() returns a plain Python object, while [()] returns a "numpy scalar".
More on numpy scalars:
https://numpy.org/doc/stable/reference/arrays.scalars.html#methods
A common case where we encounter a 0d array is when an object gets wrapped in an array, such as via np.save.
In [556]: d = np.array({'foo':'bar'})
In [557]: d
Out[557]: array({'foo': 'bar'}, dtype=object)
In [558]: d.shape
Out[558]: ()
In [559]: d.item()['foo']
Out[559]: 'bar'
The value of a 0d array can be changed in place:
In [562]: x[...] = 4
In [563]: x
Out[563]: array(4)

Mapping an integer to array (Python): ValueError: setting an array element with a sequence

I have a defaultdict which maps certain integers to a numpy array of size 20.
In addition, I have an existing array of indices. I want to turn that array of indices into a 2D array, where each original index is converted into an array via my defaultdict.
Finally, in the case that an index isn't found in the defaultdict, I want to create an array of zeros for that index.
Here's what I have so far
converter = lambda x: np.zeros((d), dtype='float32') if x == -1 else cVf[x]
vfunc = np.vectorize(converter)
cvf = vfunc(indices)
np.zeros((d), dtype='float32') and cVf[x] are identical data types/ shapes:
(Pdb) np.shape(cVf[0])
(20,)
Yet I get the error in the title (*** ValueError: setting an array element with a sequence.) when I try to run this code.
Any ideas?
You should give us some sample arrays or dictionaries (in the case of cVf), so we can make a test run.
Read what vectorize has to say about the return value. Since you don't define otypes, it makes a test calculation to determine the dtype of the returned array. My first thought was that the test calc and subsequent one might be returning different things. But you claim converter will always be returning the same dtype and shape array.
But let's try something simpler:
In [609]: fv = np.vectorize(lambda x: np.array([x,x]))
In [610]: fv([1,2,3])
...
ValueError: setting an array element with a sequence.
It's having trouble with returning any array.
But if I give an otypes, it works
In [611]: fv = np.vectorize(lambda x: np.array([x,x]), otypes=[object])
In [612]: fv([1,2,3])
Out[612]: array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
In fact in this case I could use frompyfunc, which returns object dtype, and is the underlying function for vectorize (and a bit faster).
In [613]: fv = np.frompyfunc(lambda x: np.array([x,x]), 1,1)
In [614]: fv([1,2,3])
Out[614]: array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
vectorize and frompyfunc are designed for functions that are scalar in, scalar out. That scalar may be an object, even an array, but it is still treated as a scalar.
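If the goal is a regular 2D float array rather than an object array, one option is to skip vectorize and stack the per-index results. This is only a sketch: cVf, d and indices below are small stand-ins for the asker's data, which was not posted.

```python
import numpy as np
from collections import defaultdict

d = 20
# Stand-in for the asker's mapping from integers to length-20 arrays
cVf = defaultdict(lambda: np.zeros(d, dtype='float32'))
cVf[0] = np.ones(d, dtype='float32')
cVf[3] = np.full(d, 2.0, dtype='float32')

indices = np.array([0, -1, 3])   # -1 marks "not found", as in the question

def converter(x):
    # Same logic as the lambda in the question
    return np.zeros(d, dtype='float32') if x == -1 else cVf[x]

# Each call returns a (20,) array; np.stack joins them into shape (3, 20)
cvf = np.stack([converter(int(i)) for i in indices])
print(cvf.shape, cvf.dtype)
```

The Python-level loop is the same amount of work that vectorize would do, since vectorize is a convenience wrapper and not a speed-up.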

Comparing NumPy object references

I want to understand the NumPy behavior.
When I take a reference to an inner array of a NumPy array and then compare it to the same indexing expression, the result is False.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
On the other hand, with native Python objects, the result is True.
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question: is that the intended behavior of NumPy? If so, what should I do if I want to create a reference to the inner objects of NumPy arrays?
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.
In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)
In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False) # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:] # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.
A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, and uses shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides as well as a pointer to the shared databuffer.
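A short sketch of the strided layout just described, with the dtype fixed to int32 so that the byte counts are predictable:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
row = arr[0]

# One flat buffer: step 12 bytes to reach the next row, 4 bytes for the next column
print(arr.shape, arr.strides)
# The row has its own shape and strides but points into the same buffer
print(row.shape, row.strides)

shares = np.shares_memory(arr, row)
row[0] = 99                      # writing through the view changes arr
print(shares, arr[0, 0])

# Each indexing operation builds a fresh view object, hence the False from `is`
print(arr[0] is arr[0])
```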
1d array indexing
I'll try to illustrate what is going on when you index an array:
In [51]: arr = np.arange(4)
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
In [52]: np.info(arr)
class: ndarray
shape: (4,)
strides: (4,)
itemsize: 4
aligned: True
contiguous: True
fortran: True
data pointer: 0xa84f8d8
byteorder: little
byteswap: False
type: int32
or
In [53]: arr.__array_interface__
Out[53]:
{'data': (176486616, False),
'descr': [('', '<i4')],
'shape': (4,),
'strides': None,
'typestr': '<i4',
'version': 3}
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
If I index an element, I get a new object:
In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]:
{'__ref': array(1),
'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352
It has some properties of an array, but not all. For example, you can't assign to it. Notice also that its 'data' value is totally different.
Make another selection from the same place - different id and different data:
In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)
Also if I change the array at this point, it does not affect the earlier selections:
In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10, 2, 3])
In [63]: x1
Out[63]: 1
x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.
With slicing it is possible to get a view of the original array:
In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]:
{'data': (176486620, False),
'descr': [('', '<i4')],
'shape': (1,),
....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1
Its data pointer is 4 bytes larger than that of arr; that is, it points to the same buffer, just at a different spot. And changing y does change arr (but not the independent x1).
I could even make a 0d view of this item
In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])
In Python code we normally don't work with objects like this. When we use the c-api or cython, it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
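As a rough illustration of the nditer remark, here is a minimal sketch in which each iteration variable is a 0d array viewing one element, modified in place with the same [...] assignment used on z above:

```python
import numpy as np

a = np.arange(4)
ndims = set()

# op_flags=['readwrite'] makes the per-element 0d views writable
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        ndims.add(x.ndim)   # each x is a 0d array, not a Python int
        x[...] = 2 * x      # writes through to a

print(a, ndims)
```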
elementwise ==
In response to comment, Comparing NumPy object references
np.array([1]) == np.array([2]) will return array([False], dtype=bool)
== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
The is test compares object ids (not just for numpy objects). It has limited value in practical coding. I use it most often in expressions like is None, where None is an object with a unique id, and which does not play nicely with equality tests.
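To make the three kinds of comparison concrete, a small sketch with made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

elementwise = (a == b)                    # boolean array, one entry per element
all_equal = bool(np.all(elementwise))     # reduce to one value for use in an `if`
any_equal = bool(np.any(elementwise))

same_values = np.array_equal(a, a.copy())   # value comparison of whole arrays
same_object = a is a.copy()                 # identity: a copy is a different object

print(elementwise, all_equal, any_equal, same_values, same_object)
```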
I think that you have a misunderstanding about NumPy arrays. You think that the sub-arrays of a multidimensional NumPy array are separate objects (as they are in Python lists). Well, they're not.
A NumPy array, regardless of its dimension, is just one object. That's because NumPy creates the array at the C level, and when it is exposed as a Python object it can't be broken down into multiple objects. So a new Python object has to be created to hold the selected part whenever you use operations such as split(), __getitem__ or take(), which is simply how the list-like behavior of NumPy arrays is provided.
You can also check this interactively, like the following:
In [7]: x
Out[7]:
array([[1, 2, 3],
[4, 5, 6]])
In [8]: x[0] is x[0]
Out[8]: False
If the sub-arrays were instead stored as separate mutable Python objects inside a container, you would just have an ordinary Python container of objects, and you would lose the performance and the other nice features of NumPy arrays.
Also, as @Imanol mentioned in the comments, you may want to use NumPy view objects if you want memory-efficient and flexible operations that modify an array through references. View objects can be constructed in the following two ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of
the array’s memory with a different data-type. This can cause a
reinterpretation of the bytes of memory.
a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns
an instance of ndarray_subclass that looks at the same array (same
shape, dtype, etc.) This does not cause a reinterpretation of the
memory.
For a.view(some_dtype), if some_dtype has a different number of bytes
per entry than the previous dtype (for example, converting a regular
array to a structured array), then the behavior of the view cannot be
predicted just from the superficial appearance of a (shown by
print(a)). It also depends on exactly how a is stored in memory.
Therefore if a is C-ordered versus fortran-ordered, versus defined as
a slice or transpose, etc., the view may give different results.
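A minimal sketch of the first kind of view, a.view(some_dtype). The values are chosen so that the result does not depend on byte order; the names a and b are just for the example.

```python
import numpy as np

a = np.zeros(2, dtype=np.int32)

# Reinterpret the same 8 bytes as unsigned 8-bit integers: shape goes from (2,) to (8,)
b = a.view(np.uint8)
print(b.shape, np.shares_memory(a, b))

# Setting every byte to 0xFF turns each int32 into -1 (all bits set)
b[:] = 255
print(a)
```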
Not sure if it's useful at this point, but numpy.ndarray.ctypes seems to have useful bits:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html
I used something like this (it doesn't compare dtype, but meh):
def is_same_array(a, b):
    # same shape, same values, and same starting address of the data buffer
    return (a.shape == b.shape) and (a == b).all() and a.ctypes.data == b.ctypes.data
here:
https://github.com/EricCousineau-TRI/repro/blob/a60daf899e9726daf2ca1259bb80ad2c7c9b3e3f/python/namedlist_alt.py#L111
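A possible variant, sketched under the assumption that "same array" should mean "same view of the same buffer": compare dtype and strides as well, and skip the value comparison, since two arrays with identical pointer, shape, strides and dtype necessarily hold the same values. The name is_same_view is made up here.

```python
import numpy as np

def is_same_view(a, b):
    """True if a and b describe exactly the same region of the same buffer."""
    return (a.shape == b.shape
            and a.dtype == b.dtype
            and a.strides == b.strides
            and a.__array_interface__['data'][0] == b.__array_interface__['data'][0])

arr = np.arange(6).reshape(2, 3)

print(is_same_view(arr[0], arr[0]))          # two distinct objects, same view
print(is_same_view(arr[0], arr[1]))          # different rows
print(is_same_view(arr[0], arr[0].copy()))   # equal values, separate buffer
```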

How can I create an array of 1-element arrays from an array?

I would like to be able to convert arrays, such as
a = np.array([[1,2], [3,4]])
into the same array BUT each element as a 1-element array instead of a number.
The desired output would be:
np.array([[np.array([1]), np.array([2])], [np.array([3]), np.array([4])]])
The operation you describe is very rarely useful. More likely, it would be a better idea to add an extra dimension of length 1 to the end of your array:
a = a[..., np.newaxis]
# or
a = a.reshape(a.shape + (1,))
Then a[0, 1] will be a 1D array, but all the nice NumPy features like broadcasting and ufuncs will work right. Note that this creates a view of the original array; if you need an independent copy, you can call the copy() method.
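A brief sketch of what the extra axis gives you, including the view-versus-copy point (variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = a[..., np.newaxis]          # shape (2, 2, 1), a view of a

print(b.shape, b[0, 1])         # b[0, 1] is a 1D array of length 1

doubled = b * 2                 # ufuncs and broadcasting work as usual

b[0, 1, 0] = 20                 # writing through the view changes a as well
print(a[0, 1])

c = a[..., np.newaxis].copy()   # independent copy
c[0, 0, 0] = -1
print(a[0, 0])                  # a is unaffected by changes to c
```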
If you actually want a 2D array whose elements are 1D arrays, NumPy doesn't make that easy for you. (It's almost never a good way to organize your data, so there isn't much reason for the NumPy developers to provide an easy way to do it.) Most of the things you might expect to create such an array will instead create a 3D array. The most straightforward way to do it I know of is to create an empty array of object dtype and fill the cells one by one, using ordinary Python loops:
import numpy
b = numpy.empty(a.shape, dtype=object)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        # wrap each number in its own 1-element array
        b[i, j] = numpy.array([a[i, j]])
np.array([ [ [x] for x in row ] for row in [[1,2], [3,4]] ])
You are fighting np.array's attempt to create a high dimensional array.
For example, your desired expression, when entered, produces a (2,2,1) array:
a2 = np.array([[np.array([1]), np.array([2])], [np.array([3]), np.array([4])]])
In [172]: a2
Out[172]:
array([[[1],
[2]],
[[3],
[4]]])
Which might actually be what you want (despite print appearances), since
In [180]: a2[0,0]
Out[180]: array([1])
Often the best way to fight numpy's tendency to make a higher-dimensional array is to initialize an empty array with the right shape and dtype, and then fill it:
In [183]: a1=np.empty(a.shape,dtype=object)
In [184]: a1.flat=[np.array(x) for x in a.flatten()]
In [185]: a1
Out[185]:
array([[array(1), array(2)],
[array(3), array(4)]], dtype=object)
It has the right shape (2,2), and dtype. The elements are arrays - but their dimensions are 'wrong', () instead of (1,). flat and flatten are handy tools for iterating over nd arrays as though they were 1d.
a1.flat=[np.array([x]) for x in a.flatten()] does not work - it's an array of ints like a, but with object dtype.
A more explicit iteration does work:
In [214]: a1=np.empty(a.shape,dtype=object)
In [215]: for i in np.ndindex(2,2):
              a1[i]=np.array([a[i]])
In [216]: a1
Out[216]:
array([[array([1]), array([2])],
[array([3]), array([4])]], dtype=object)

How can I flatten a 2d numpy array which has different lengths in the second axis?

I have a numpy array which looks like:
myArray = np.array([[1,2],[3]])
But I can not flatten it,
In: myArray.flatten()
Out: array([[1, 2], [3]], dtype=object)
If I change the array to the same length in the second axis, then I can flatten it.
In: myArray2 = np.array([[1,2],[3,4]])
In: myArray2.flatten()
Out: array([1, 2, 3, 4])
My Question is:
Can I use something like myArray.flatten() regardless of the dimension of the array and the length of its elements, and get the output: array([1,2,3])?
myArray is a 1-dimensional array of objects. Your list objects will simply remain in the same order with flatten() or ravel(). You can use hstack to stack the arrays in sequence horizontally:
>>> np.hstack(myArray)
array([1, 2, 3])
Note that for 1-D inputs like these, this is basically equivalent to using concatenate along axis 0 (the inputs have only one axis, so axis=1 would be out of bounds):
>>> np.concatenate(myArray, axis=0)
array([1, 2, 3])
If you don't have this issue however and can merge the items, it is always preferable to use flatten() or ravel() for performance:
In [1]: u = timeit.Timer('np.hstack(np.array([[1,2],[3,4]]))'\
....: , setup = 'import numpy as np')
In [2]: print(u.timeit())
11.0124390125
In [3]: u = timeit.Timer('np.array([[1,2],[3,4]]).flatten()'\
....: , setup = 'import numpy as np')
In [4]: print(u.timeit())
3.05757689476
Iluengo's answer also has you covered for further information as to why you cannot use flatten() or ravel() given your array type.
Well, I agree with the other answers when they say that hstack or concatenate do the job in this case. However, I would like to point out that even if this 'fixes' the problem, the problem is not addressed properly.
The problem is that even if it looks like the second axis has different length, this is not true in practice. If you try:
>>> myArray.shape
(2,)
>>> myArray.dtype
dtype('O') # stands for Object
>>> myArray[0]
[1, 2]
It shows you that your array is not a 2D array with variable size (as you might think); it is just a 1D array of objects. In your case, the elements are lists: the first element of your array is a 2-element list and the second is a 1-element list.
So, flatten and ravel won't work, because transforming a 1D array to a 1D array results in exactly the same 1D array. If you have an object numpy array, it won't care about what you put inside; it will treat the individual items as unknown objects and can't decide how to merge them.
What you should consider is whether this is the behaviour you want for your application. Numpy arrays are especially efficient with fixed-size numeric matrices. If you are playing with arrays of objects, I don't see why you would want to use Numpy instead of regular Python lists.
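Following that suggestion, here is a sketch that keeps the ragged data as a plain list of lists and flattens it before handing it to NumPy. The name rows is made up. Recent NumPy versions also refuse to build the ragged array from the question unless dtype=object is passed explicitly, which is another reason to start from a list.

```python
from itertools import chain

import numpy as np

rows = [[1, 2], [3]]            # ragged data kept as ordinary Python lists

# Flatten in pure Python, then build a regular 1D integer array
flat_list = list(chain.from_iterable(rows))
flat = np.array(flat_list)

# np.concatenate also accepts the list of lists directly
flat2 = np.concatenate(rows)

print(flat, flat.dtype.kind, flat2)
```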
np.hstack works in this case
In [69]: np.hstack(myArray)
Out[69]: array([1, 2, 3])
