I want to hash numpy arrays without copying the data into a bytearray first.
Specifically, I have a contiguous read-only two-dimensional int64 numpy array A with unique rows. To be concrete, let's say:
A = np.array([[1, 2], [3, 4], [5, 6]])
A.setflags(write=False)
I want to make a constant-time function that maps an arbitrary array ap that's identical in value to a slice of A, e.g. A[i], to its index i, e.g.
foo(np.array([1, 2])) == 0
foo(np.array([3, 4])) == 1
foo(np.array([5, 6])) == 2
The natural choice is to make a dictionary like:
lookup = {a: i for i, a in enumerate(A)}
Unfortunately numpy arrays are not hashable. There are ways to hash numpy arrays, but ideally I'd like the equality to be preserved so I can use it in a dictionary without writing manual collision detection.
The referenced article does point out that I could do:
lookup = {a.data.tobytes(): i for i, a in enumerate(A)}
def foo(ap):
    return lookup[ap.data.tobytes()]
However, the tobytes method returns a copy of the data pointed to by a.data, hence doubling the memory usage.
What I'd love to do is something like:
lookup = {a.data: i for i, a in enumerate(A)}
def foo(ap):
    return lookup[ap.data]
This would ideally use a pointer to the underlying memory instead of the array object or a copy of its bytes, but since a.dtype == int, this fails with:
ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'
That's fine; we can cast it with Aview = A.view(np.byte). Now we have:
>>> Aview.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : False
# ALIGNED : True
# UPDATEIFCOPY : False
>>> Aview.data.format
# 'b'
However, when trying to hash this, it still errors with:
TypeError: unhashable type: 'numpy.ndarray'
A possible solution (inspired by this) would be to define:
class _wrapper(object):
    def __init__(self, array):
        self._array = array
        self._hash = hash(array.data.tobytes())
    def __hash__(self):
        return self._hash
    def __eq__(self, other):
        return self._hash == other._hash and np.all(self._array == other._array)
lookup = {_wrapper(a): i for i, a in enumerate(A)}
def foo(ap):
    return lookup[_wrapper(ap)]
But this seems inelegant. Is there a way to tell python to just interpret the memoryview as a bytearray and hash it normally, without having to make a copy to a bytestring or having numpy intercede and abort the hash?
Other things I've tried:
The format of A does allow me to map each row into a distinct integer, but for very large A the space of possible arrays is larger than np.iinfo(int).max, and while I can use python's integer types, this is ~100x slower than just hashing the memory.
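For reference, here is a minimal sketch of that row-to-integer encoding, assuming non-negative entries and that base ** ncols stays below np.iinfo(np.int64).max:

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
base = int(A.max()) + 1                  # assumes entries in [0, base)
weights = base ** np.arange(A.shape[1])  # positional weight per column
lookup = {int(k): i for i, k in enumerate(A @ weights)}

def foo(ap):
    return lookup[int(ap @ weights)]

assert foo(np.array([3, 4])) == 1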
I also tried doing something like:
Aview = A.view(np.dtype((np.void, A.shape[1] * A.itemsize))).squeeze()
However, even though A.flags.writeable == False, A[0].flags.writeable == True. When trying hash(A[0]) python raises TypeError: unhashable type: 'writeable void-scalar'. I'm unsure if it's possible to mark scalars as read-only, or otherwise hash a void scalar, even though most other scalars seem hashable.
I can't make sense of what you are trying to do.
When I create an array:
In [111]: A=np.array([1,0,1,2,3,0,2])
In [112]: A.__array_interface__
Out[112]:
{'data': (173599672, False),
'descr': [('', '<i4')],
'shape': (7,),
'strides': None,
'typestr': '<i4',
'version': 3}
In [113]: A.nbytes
Out[113]: 28
In [114]: id(A)
Out[114]: 2984144632
I get an ndarray object with a unique id, and attributes like shape, strides, and a data buffer. This buffer is 28 bytes starting at 173599672.
There isn't an A[3] object in A; I have to create it:
In [115]: x=A[3]
In [116]: type(x)
Out[116]: numpy.int32
In [117]: id(x)
Out[117]: 2984723472
In [118]: x.__array_interface__
Out[118]:
{'__ref': array(2),
'data': (179546048, False),
'descr': [('', '<i4')],
'shape': (),
'strides': None,
'typestr': '<i4',
'version': 3}
This x is in many ways just a 0d, one-element array (it displays differently). Notice that its data pointer is unrelated to that of A, so it isn't even sharing memory.
A slice does share memory:
In [119]: y=A[3:4]
In [120]: y.__array_interface__
Out[120]:
{'data': (173599684, False), # 173599672+3*4
'descr': [('', '<i4')],
'shape': (1,),
'strides': None,
'typestr': '<i4',
'version': 3}
====================
What exactly do you mean by mapping arbitrary A[i] to its B[i]? Are you using the value at A[i] as the key, or the location as the key? In my example, the elements of A are not unique. I can uniquely access A[0] or A[2] (by index), but in both cases I get a value of 1.
But consider this situation. There is a relatively fast way of finding a value in a 1d array - in1d.
In [121]: np.in1d(A,1)
Out[121]: array([ True, False, True, False, False, False, False], dtype=bool)
Make the B array:
In [122]: B=np.arange(A.shape[0])
All the elements in B corresponding to a 1 value in A:
In [123]: B[np.in1d(A,1)]
Out[123]: array([0, 2])
In [124]: B[np.in1d(A,0)] # to 0
Out[124]: array([1, 5])
In [125]: B[np.in1d(A,2)] # to 2
Out[125]: array([3, 6])
A dictionary created from A gives the same (last) values:
In [134]: dict(zip(A,B))
Out[134]: {0: 5, 1: 2, 2: 6, 3: 4}
=====================
The paragraph about hashable in the Python docs talks about needing to have a __hash__ method.
So I checked a few objects:
In [200]: {}.__hash__ # None
In [201]: [].__hash__ # None
In [202]: ().__hash__
Out[202]: <method-wrapper '__hash__' of tuple object at 0xb729302c>
In [204]: class MyClass(object):
...: pass
...:
In [205]: MyClass().__hash__
Out[205]: <method-wrapper '__hash__' of MyClass object at 0xb3008c4c>
A numpy integer - int with a np.int32 wrapper:
In [206]: x
Out[206]: 2
In [207]: x.__hash__
Out[207]: <method-wrapper '__hash__' of numpy.int32 object at 0xb1e748c0>
In [208]: x.__hash__()
Out[208]: 2
A numpy array
In [209]: A
Out[209]:
array(['one', 'two'],
dtype='<U3')
In [210]: A.__hash__ # None
In [212]: np.float(12.232).__hash__()
Out[212]: 1219767578
So at a minimum the key for a dictionary must have a method of generating a hash, a unique identifier. It may be the instance id (default case), or maybe something derived from the values of the object (a checksum of some sort?). The dictionary maintains a table of these hashes, presumably with a pointer to both the key and the value. When I do a dictionary get, it generates the hash for the object I give it, looks that up, and returns the corresponding value - if present.
Classes that aren't hashable don't have a __hash__ method (or the method is None). They can't generate this unique id. Apparently by design an object of class np.ndarray does not have a __hash__. And playing with the writability flags does not change that.
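A minimal sketch of that contract, using a hypothetical Key class: the dictionary hashes the key first, then falls back to == to resolve collisions:

class Key:
    def __init__(self, values):
        self.values = tuple(values)   # tuples of ints are hashable
    def __hash__(self):
        return hash(self.values)
    def __eq__(self, other):
        return self.values == other.values

d = {Key([1, 2]): 0}
assert d[Key([1, 2])] == 0            # equal value, different instance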
The big problem with trying to hash or make a dictionary of the rows of an array is that you aren't interested in hashing a particular instance of an array (the object created by a slicing view), but a hash based on the values of that row.
So to take your 2d array:
In [236]: A
Out[236]:
array([[1, 2],
[3, 4],
[5, 6]])
you want A[1,:], and np.array([3,4]) to both generate the same __hash__() value. And A[0,:]+2, and maybe A.mean(axis=0) (except that's a float array).
And since you are worried about memory, you must be dealing with very large arrays, say (1000,1000) - which implies a hash value based on 1000 different numbers, and somehow unique.
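One value-based workaround, sketched here, is tuple(row) as the key; it hashes by value, so any array with equal contents finds the same entry, at the cost of copying each row into a tuple:

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
lookup = {tuple(row): i for i, row in enumerate(A)}
assert lookup[tuple(np.array([3, 4]))] == 1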
Related
Consider the following example:
>>> a=np.array([1,2,3,4])
>>> a
array([1, 2, 3, 4])
>>> a[np.newaxis,:,np.newaxis]
array([[[1],
[2],
[3],
[4]]])
How is it possible for Numpy to use the : (normally used for slicing arrays) as an index when using comma-separated subscripting?
If I try to use comma-separated subscripting with either a Python list or a Python list-of-lists, I get a TypeError:
>>> [[1,2],[3,4]][0,:]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not tuple
Define a simple class with a __getitem__ indexing method:
In [128]: class Foo():
...: def __getitem__(self, arg):
...: print(type(arg), arg)
...:
In [129]: f = Foo()
And look at what different indexes produce:
In [130]: f[:]
<class 'slice'> slice(None, None, None)
In [131]: f[1:2:3]
<class 'slice'> slice(1, 2, 3)
In [132]: f[:, [1,2,3]]
<class 'tuple'> (slice(None, None, None), [1, 2, 3])
In [133]: f[:, :3]
<class 'tuple'> (slice(None, None, None), slice(None, 3, None))
In [134]: f[(slice(1,None),3)]
<class 'tuple'> (slice(1, None, None), 3)
For builtin classes like list, a tuple argument raises an error. But that's a class-dependent issue, not a syntax one. numpy.ndarray accepts a tuple, as long as it's compatible with its shape.
The syntax for a tuple index was added to Python to meet the needs of numpy. I don't think there are any builtin classes that use it.
The numpy.lib.index_tricks module has several classes that take advantage of this behavior. Look at its code for more ideas.
In [137]: np.s_[3:]
Out[137]: slice(3, None, None)
In [139]: np.r_['0,2,1',[1,2,3],[4,5,6]]
Out[139]:
array([[1, 2, 3],
[4, 5, 6]])
In [140]: np.c_[[1,2,3],[4,5,6]]
Out[140]:
array([[1, 4],
[2, 5],
[3, 6]])
other "indexing" examples:
In [141]: f[...]
<class 'ellipsis'> Ellipsis
In [142]: f[[1,2,3]]
<class 'list'> [1, 2, 3]
In [143]: f[10]
<class 'int'> 10
In [144]: f[{1:12}]
<class 'dict'> {1: 12}
I don't know of any class that makes use of a dict argument, but the syntax allows it.
Lists are 1D, and you can pass either a single index or a slice. The :, when used as an index, is short notation for creating a slice. In [[1,2],[3,4]][0,:] you are passing 0,:, which is the tuple (0, slice(None)). That is, the parentheses for creating a tuple are optional here; just having values separated by a comma creates the tuple.
But numpy is different: since it is an N-dimensional array and not just 1D like lists, you can pass multiple indexes to index the different dimensions. Therefore, passing a tuple to index numpy is allowed, as long as the number of elements in the tuple is not greater than the number of dimensions of the indexed array. Consider the array below:
arr = np.random.randn(5,10, 3)
It has 3 dimensions, and we can index it like arr[0,1,0], but this is the same as arr[(0,1,0)]. That is, we are passing a tuple to index the array. Each tuple element can itself be an integer or a slice, and numpy will do the appropriate indexing. Numpy accepts tuples for indexing, but lists don't.
However, when you write a[np.newaxis,:,np.newaxis], it is more than just indexing. First, note that np.newaxis is just None. When you use None to index a dimension in numpy, what it does is create that dimension. The a array in your example is 1D, but a[np.newaxis,:,np.newaxis] is a special type of indexing understood by numpy as short notation for "give me an array with an extra axis wherever I index with np.newaxis, whose elements come from my original array indexed as specified".
So, the TLDR answer is numpy indexing is more general and powerful than list indexing.
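A small sketch of the np.newaxis point, using only the example above:

import numpy as np

a = np.array([1, 2, 3, 4])
assert np.newaxis is None                        # it is literally None
assert a[np.newaxis, :, np.newaxis].shape == (1, 4, 1)
assert a[None, :, None].shape == (1, 4, 1)       # identical spelling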
I am new to python and learning Numpy. What I have read and tested is that np.array has a single data type. When I use it in normal code, it works and behaves well, i.e.
import numpy as np
np1 = np.array([1,'2' , True])
for i in np1:
    print(type(i))
Answer is
<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>
But when my code is
np2 = np.array([{1:1 , 2:2 }, 1 , True , '1'])
for i in np2:
    print(type(i))
Answer is
<class 'dict'>
<class 'int'>
<class 'bool'>
<class 'str'>
This shows that the elements are not of a numpy class, whereas in the answer above they were <class 'numpy.str_'>.
When I printed print(type(np2)), the answer was <class 'numpy.ndarray'>.
Can you explain why they are not of the same data type? Thanks.
If the desired datatype for the array is not given, then the type "will be determined as the minimum type required to hold the objects in the sequence."
In the first case, the minimum type is str, because each item can be converted to a string. The new array holds strings.
In the second case, the minimum type is object, because a dict cannot be converted to a string. The new array holds references to objects. Each object has its own type.
You can force np1 to be an array of objects:
np1 = np.array([1, '2' , True], dtype=object)
type(np1[0])
#<class 'int'>
type(np1[1])
#<class 'str'>
type(np1[2])
#<class 'bool'>
In an interactive ipython session, objects such as arrays are shown with their repr representation. I find this to be quite informative:
In [41]: np1 = np.array([1,'2' , True])
In [42]: np1
Out[42]: array(['1', '2', 'True'], dtype='<U21')
Note the quotes and the U21 dtype. Both show that the array contains strings: the number and the boolean have been converted to the common string dtype.
In [43]: np2 = np.array([{1:1 , 2:2 }, 1 , True , '1'])
In [44]: np2
Out[44]: array([{1: 1, 2: 2}, 1, True, '1'], dtype=object)
In [45]: [{1:1 , 2:2 }, 1 , True , '1']
Out[45]: [{1: 1, 2: 2}, 1, True, '1']
Note the object dtype. And the element display is basically the same as for a list. Such an array is practically a list. There are some differences, but for many purposes it can be regarded as a list. It has few advantages over a list, and some disadvantages. It does not have the computational speed of a numeric numpy array.
The databuffer of an object dtype array is similar to the underlying buffer of a list. Both contain pointers or references to objects stored elsewhere in memory. In that sense it does have a single data type - a reference.
===
If I make a list, and then make an object dtype array from that list:
In [48]: alist = [{1:1 , 2:2 }, 1 , True , '1']
In [49]: arr = np.array(alist)
In [50]: arr
Out[50]: array([{1: 1, 2: 2}, 1, True, '1'], dtype=object)
I can show that the dictionary in the array is the same dictionary as in the list. They have the same id:
In [51]: id(arr[0])
Out[51]: 140602595005568
In [52]: id(alist[0])
Out[52]: 140602595005568
and modifications to the list, show up in the array:
In [53]: alist[0][3]=3
In [54]: arr
Out[54]: array([{1: 1, 2: 2, 3: 3}, 1, True, '1'], dtype=object)
Please refer to the documentation. The first feature it lists is that it's
a powerful N-dimensional array object
So, it can deal with any element as an object.
Another thing:
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
So, the NumPy array tries efficiently to store its elements as the same data type if possible to optimize the performance.
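A few promotion examples illustrating that (the exact integer width is platform dependent):

import numpy as np

print(np.array([1, 2]).dtype)        # int64
print(np.array([1, 2.0]).dtype)      # float64
print(np.array([1, '2']).dtype)      # <U21, all converted to strings
print(np.array([1, {1: 1}]).dtype)   # object, no common conversion exists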
Two python objects have the same id but "is" operation returns false as shown below:
a = np.arange(12).reshape(2, -1)
c = a.reshape(12, 1)
print("id(c.data)", id(c.data))
print("id(a.data)", id(a.data))
print(c.data is a.data)
print(id(c.data) == id(a.data))
Here is the actual output:
id(c.data) 241233112
id(a.data) 241233112
False
True
My question is... why does "c.data is a.data" return False even though they have the same id, and thus should point to the same object? I thought two names point to the same object if they have the same id, or am I wrong? Thank you!
a.data and c.data both produce a transient object, with no reference to it. As such, both are immediately garbage-collected. The same id can be used for both.
In the is comparison, the objects have to co-exist while is checks whether they are identical, which they are not.
In the id comparison, each object is released as soon as id returns its id.
If you save references to both objects, keeping them alive, you can see they are not the same object.
r0 = a.data
r1 = c.data
assert r0 is not r1
In [62]: a = np.arange(12).reshape(2,-1)
...: c = a.reshape(12,1)
.data returns a memoryview object. id just gives the id of that object; it's not the value of the object, or any indication of where a databuffer is located.
In [63]: a.data
Out[63]: <memory at 0x7f672d1101f8>
In [64]: c.data
Out[64]: <memory at 0x7f672d1103a8>
In [65]: type(a.data)
Out[65]: memoryview
https://docs.python.org/3/library/stdtypes.html#memoryview
If you want to verify that a and c share a data buffer, I find the __array_interface__ to be a better tool.
In [66]: a.__array_interface__['data']
Out[66]: (50988640, False)
In [67]: c.__array_interface__['data']
Out[67]: (50988640, False)
It even shows the offset produced by slicing - here 24 bytes, 3*8
In [68]: c[3:].__array_interface__['data']
Out[68]: (50988664, False)
I haven't seen much use of a.data. It can be used as the buffer object when creating a new array with ndarray:
In [70]: d = np.ndarray((2,6), dtype=a.dtype, buffer=a.data)
In [71]: d
Out[71]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
In [72]: d.__array_interface__['data']
Out[72]: (50988640, False)
But normally we create new arrays that share memory via slicing or np.array(..., copy=False).
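A quick sketch of those two idioms; np.shares_memory confirms the overlap:

import numpy as np

a = np.arange(12)
b = a[::2]                    # a basic slice is always a view
c = np.array(a, copy=False)   # no copy when dtype and layout already fit
assert np.shares_memory(a, b) and np.shares_memory(a, c)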
I wanted to create an array to hold mixed types - string and int.
The following code did not work as desired - all elements got typed as String.
>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>
All elements of the array were typed as 'numpy.string_'
But, oddly enough, if I pass one of the elements as "None", the types turn out as desired:
>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>
Thus, including a "None" element provides me with a workaround, but I am wondering why this should be the case.
Even if I don't pass one of the elements as None, shouldn't the elements be typed as they are passed?
Mixed types in NumPy are strongly discouraged. You lose the benefits of vectorised computations. In this instance:
For your first array, NumPy decides to convert your array to a uniform array of strings of 3 or fewer characters.
For your second array, None is not permitted as a "stringable" value in NumPy, so NumPy uses the standard object dtype. object dtype represents a collection of pointers to arbitrary types.
You can see this when you print the dtype attributes of your arrays:
print(np.array(["Str",1,2,3,4]).dtype) # <U3
print(np.array(["Str",None,2,3,4]).dtype) # object
This should be entirely expected. NumPy has a strong preference for homogeneous types, as indeed you should have for any meaningful computations. Otherwise, a Python list may be a more appropriate data structure.
For more detailed descriptions of how NumPy prioritises dtype choice, see:
How does numpy determine the object-array's dtype and what it means?
NumPy array/matrix of mixed types
An alternative to adding the None is to make the dtype explicit:
In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)
Creating a object dtype array and filling it from a list is another option:
In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)
Here it isn't needed, but it matters when you want an array of lists.
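A sketch of that array-of-lists case: np.array flattens equal-length lists into a 2x2 object array, while empty-then-fill keeps each list intact:

import numpy as np

direct = np.array([[1, 2], [3, 4]], dtype=object)
print(direct.shape)        # (2, 2): the lists were absorbed as rows
res = np.empty(2, object)
res[:] = [[1, 2], [3, 4]]  # fill: each element stays a Python list
print(res.shape)           # (2,)
print(type(res[0]))        # <class 'list'>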
I want to understand the NumPy behavior.
When I take a reference to an inner array of a NumPy array, and then compare it to the object itself, the returned value is False.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
With native Python objects, on the other hand, the returned value is True.
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question: is this the intended behavior of NumPy? If so, what should I do if I want to create a reference to an inner object of a NumPy array?
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.
In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)
In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False) # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:] # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.
A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, and uses shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides as well as a pointer to the shared databuffer.
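A small sketch of that strided model: a row slice is just new shape and strides over the shared buffer, and the view keeps a reference to its base array:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
row = arr[1]
print(arr.strides)                 # (24, 8) for int64: bytes per step
assert np.shares_memory(arr, row)  # same flat buffer
assert row.base is arr             # the view remembers its base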
1d array indexing
I'll try to illustrate what is going on when you index an array:
In [51]: arr = np.arange(4)
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
In [52]: np.info(arr)
class: ndarray
shape: (4,)
strides: (4,)
itemsize: 4
aligned: True
contiguous: True
fortran: True
data pointer: 0xa84f8d8
byteorder: little
byteswap: False
type: int32
or
In [53]: arr.__array_interface__
Out[53]:
{'data': (176486616, False),
'descr': [('', '<i4')],
'shape': (4,),
'strides': None,
'typestr': '<i4',
'version': 3}
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
If I index an element, I get a new object:
In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]:
{'__ref': array(1),
'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352
It has some properties of an array, but not all. For example, you can't assign to it. Notice also that its 'data' value is totally different.
Make another selection from the same place - different id and different data:
In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)
Also if I change the array at this point, it does not affect the earlier selections:
In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10, 2, 3])
In [63]: x1
Out[63]: 1
x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.
With slicing it is possible get a view of the original array,
In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]:
{'data': (176486620, False),
'descr': [('', '<i4')],
'shape': (1,),
....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1
Its data pointer is 4 bytes larger than arr's - that is, it points to the same buffer, just at a different spot. And changing y does change arr (but not the independent x1).
I could even make a 0d view of this item
In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])
In Python code we normally don't work with objects like this. When we use the c-api or cython, it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low-level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
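A small nditer sketch in pure Python: it yields 0d views, and with readwrite flags assignments go straight into the buffer of arr:

import numpy as np

arr = np.arange(4)
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x * 2   # writes through the 0d view
print(arr)               # [0 2 4 6]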
elementwise ==
In response to the comment Comparing NumPy object references:
np.array([1]) == np.array([2]) will return array([False], dtype=bool)
== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
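A sketch of that reduction step:

import numpy as np

a, b = np.array([1, 2]), np.array([1, 3])
print(a == b)                 # [ True False], elementwise
print(np.all(a == b))         # False
print(np.any(a == b))         # True
print(np.array_equal(a, b))   # False; also checks shape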
The is test compares object id's (not just for numpy objects). It has limited value in practical coding. I used it most often in expressions like is None, where None is an object with a unique id, and which does not play nicely with equality tests.
I think that you have a misunderstanding about Numpy arrays. You think that subarrays in a multidimensional Numpy array are separate objects (like in Python lists); well, they're not.
A Numpy array, regardless of its dimension, is just one object. That's because Numpy creates its arrays at the C level, and when it loads them up as a Python object they can't be broken down into multiple objects. This makes Python create a new object to hold the requested parts whenever you use attributes like split(), __getitem__, take(), etc., which is, as a matter of fact, just the way Python abstracts list-like behavior for Numpy arrays.
You can also check this in real time, as follows:
In [7]: x
Out[7]:
array([[1, 2, 3],
[4, 5, 6]])
In [8]: x[0] is x[0]
Out[8]: False
So as soon as you have an array, or any mutable object that can hold other objects in it, you have a Python mutable object, and therefore you lose the performance and all the other cool features of Numpy arrays.
Also, as @Imanol mentioned in the comments, you may want to use Numpy view objects if you want memory-optimized and flexible operations when modifying an array through references. View objects can be constructed in the following two ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array's memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.). This does not cause a reinterpretation of the memory.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
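A minimal sketch of both view flavors from the quoted docs (MyArray is a throwaway subclass just for the demo):

import numpy as np

a = np.arange(4, dtype=np.int64)
b = a.view(np.uint8)           # dtype view: same 32 bytes, reinterpreted
print(b.shape)                 # (32,)

class MyArray(np.ndarray):     # minimal ndarray subclass
    pass

c = a.view(MyArray)            # type view: same data, new wrapper class
print(type(c).__name__)        # MyArray
assert np.shares_memory(a, c)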
Not sure if it's useful at this point, but numpy.ndarray.ctypes seems to have useful bits:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html
Used something like this (missing dtype, but meh):
def is_same_array(a, b):
    return (a.shape == b.shape) and (a == b).all() and a.ctypes.data == b.ctypes.data
here:
https://github.com/EricCousineau-TRI/repro/blob/a60daf899e9726daf2ca1259bb80ad2c7c9b3e3f/python/namedlist_alt.py#L111
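For the buffer-overlap part of that check, np.shares_memory is a library-provided alternative to comparing ctypes.data pointers by hand:

import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)
assert np.shares_memory(a, b)   # same underlying buffer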