Some confusion about how numpy arrays are stored in Python

I am confused by some behaviour I see when playing with numpy arrays in Python.
Question 1
I execute the following statements in the Python interpreter:
>>> import numpy as np
>>> L = [1000,2000,3000]
>>> A = np.array(L)
>>> B = A
Then I check the following things:
>>> A is B
True
>>> id(A) == id(B)
True
>>> id(A[0]) == id(B[0])
True
That's fine. But some strange things happened then.
>>> A[0] is B[0]
False
But how can A[0] and B[0] be different things? They have the same id!
For List in python, we have
>>> LL = [1000,2000,3000]
>>> SS = LL
>>> LL[0] is SS[0]
True
Is the way numpy arrays are stored totally different from lists? We also have
>>> A[0] = 1001
>>> B[0]
1001
It seems that A[0] and B[0] are the identical object.
Question 2
I make a copy of A.
>>> C = A[:]
>>> C is A
False
>>> C[0] is A[0]
False
That is fine. A and C seem to be independent of each other. But
>>> A[0] = 1002
>>> C[0]
1002
It seems that A and C are not independent? I am totally confused.

You are asking two completely independent questions, so here are two answers.
The data of Numpy arrays is internally stored as a contiguous C array. Each entry in the array is just a number. Python objects on the other hand require some housekeeping data, e.g. the reference count and a pointer to the type object. You can't simply have a raw pointer to a number in memory. For this reason, Numpy "boxes" a number in a Python object if you access an individual element. This happens every time you access an element, so even A[0] and A[0] are different objects:
>>> A[0] is A[0]
False
This is at the heart of why Numpy can store arrays in a more memory-efficient way: It does not store a full Python object for each entry, and only creates these objects on the fly when needed. It is optimised for vectorised operations on the array, not for individual element access.
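As a rough illustration of the boxing described above (a sketch, not part of the original answer; the names x and y are made up for the example), holding on to two boxed elements at the same time shows that they are distinct objects with equal values. It also suggests why id(A[0]) == id(B[0]) could come out True in the question: there, each temporary was discarded before the next one was created, so the same id could be reused.

```python
import numpy as np

A = np.array([1000, 2000, 3000])
B = A  # B is just another name for the same array object

# Each element access creates a fresh numpy scalar object.
x = A[0]
y = B[0]

print(type(x))          # a numpy integer scalar type, not a plain int
print(x is y)           # False: two separate boxes
print(x == y)           # True: both boxes hold the same value
print(id(x) == id(y))   # False: both objects are alive, so their ids differ
```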
When you execute C = A[:] you are creating a new view for the same data. You are not making a copy. You will then have two different wrapper objects, pointed to by A and C respectively, but they are backed by the same buffer. The base attribute of an array refers to the array object it was originally created from:
>>> A.base is None
True
>>> C.base is A
True
New views on the same data are particularly useful when combined with indexing, since you can get views that only include some slice of the original array, but are backed by the same memory.
To actually make a copy of an array, use the copy() method.
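The difference between a view and a copy can be checked directly. The following is a minimal sketch using np.shares_memory and the base attribute; the name D is only introduced here for the copy.

```python
import numpy as np

A = np.array([1000, 2000, 3000])
C = A[:]       # a view: new array object, same buffer
D = A.copy()   # a real copy: new array object, new buffer

A[0] = 1002
print(C[0])    # 1002, the view sees the change
print(D[0])    # 1000, the copy does not

print(np.shares_memory(A, C))  # True
print(np.shares_memory(A, D))  # False
print(C.base is A)             # True
print(D.base is None)          # True
```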
As a more general remark, you should not read too much into object identity in Python. In general, if x is y is true, you know that they are really the same object. However, if this returns false, they can still be two different proxies to the same object.

Related

Numpy arrays: although A = B[:] creates a numpy array object A with id(A) != id(B), it still gets updated, unlike lists

I was trying to make a shallow copy of a numpy array using [:], like I normally would with lists. However, I found that this does not behave the same as with lists. I know that I can use the .copy() method to solve this, but I wanted to understand what's happening under the hood with numpy here. Can anyone please elaborate?
import numpy as np
a = np.array([1,2,3,4,5])
b = a[:]
print(id(b) == id(a)) # Ids are different, So different objects right ?
b[3] = 10
print(a, b) # Both a and b got updated
From the documentation on slicing (emphasis mine):
Note that slices of arrays do not copy the internal array data but
only produce new views of the original data. This is different from
list or tuple slicing and an explicit copy() is recommended if the
original data is not required anymore.
So just do:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = a.copy()
b[3] = 10
print(a, b)
Output
[1 2 3 4 5] [ 1 2 3 10 5]
Notice that the reason the ids are different is because b is a view of a and indeed a different object.
The operator [:] in numpy does not copy the data; it creates a new array object that refers to the same data. So that kind of behavior is expected. As far as the data is concerned, it is the same as doing b = a directly. Use np.copy(array) to copy the values.
Suppose a = np.array([1, 2]). The behavior of assignments of numpy arrays can then be summarized like this:
These kinds of assignment share the data, so if you change b later, it will also change a.
b = a
b[1] = 10
print(a) # [ 1 10]
b = a[:]
b[1] = 20
print(a) # [ 1 20]
This kind of assignment will copy the values, so if you change b later, a won't change.
b = np.copy(a)
b[1] = 30
print(a) # [ 1 20]
...and if dtype=object, using deepcopy() from the copy module makes sure the stored objects themselves are copied as well.
b = copy.deepcopy(a)
b[1] = 40
print(a) # [ 1 20]

numpy value shows same id result after changing the assigned value

I have a question (Python version: 3.9.7).
I ran the code below, and I cannot understand what happens.
Please let me know why it behaves this way.
(As far as I know, a number is immutable, so when a new number is assigned, the object should be at a different address.)
a = np.array([[0,1,2],[3,4,5],[6,7,8]])
id(a[0]) #1977043162384*
A = [0,0,0]; a[0] = A
id(a[0]) #1977043162384 (I cannot understand this part)***
b = [1,2,3]
a[0] = b
id(a[0]) #1977290465808
The ID number on Line 4 should be changed, shouldn't it?
Every time you access an array like that, NumPy wraps the underlying information in a new object, so:
In [3]: a = np.array([[0,1,2],[3,4,5],[6,7,8]])
In [4]: a[0]
Out[4]: array([0, 1, 2])
In [5]: a[0] is a[0]
Out[5]: False
Or perhaps visualized another way:
In [6]: id(a[0])
Out[6]: 140266652673680
In [7]: id(a[0])
Out[7]: 140268012281648
In [8]: id(a[0])
Out[8]: 140267734662960
In [9]: id(a[0])
Out[9]: 140266652673680
You shouldn't expect id(a[0]) to be different. It's free to re-use the same id because the lifetimes of those objects are not overlapping.
Of course, whether it re-uses that ID is an implementation detail. But why did you expect the ID to change? It is important to understand,
A = [0,0,0]; a[0] = A
does not in any way put that list in the array. Instead, the values are copied into the primitive, underlying buffer.
(As far as I know, number is immutable, so when something new as a
number is assigned, the object address should point out different
address including the number)
There are no Python objects in your array; you are using a fixed-size integer dtype such as numpy.int64. This is crucial to understand.
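A small sketch of this point (the names row, first and second are made up for the example): assigning a list to a row copies its values into the buffer, so later changes to the list do not reach the array, and two row accesses that are alive at the same time are distinct view objects over the same memory.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

row = [9, 9, 9]
a[0] = row      # copies the three values into the array's buffer
row[0] = -1     # changing the list afterwards does not touch the array
print(a[0])     # [9 9 9]
print(a.dtype.kind)  # 'i': a fixed-size integer dtype, not Python objects

# Two accesses kept alive at once are different view objects...
first = a[0]
second = a[0]
print(first is second)                   # False
# ...backed by the same memory.
print(np.shares_memory(first, second))   # True
```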

What happens internally when concatenating two lists in Python?

When concatenating two lists,
a = [0......, 10000000]
b = [0......, 10000000]
a = a + b
does the Python runtime allocate a bigger array and loop through both arrays and put the elements of a and b into the bigger array?
Or does it loop through the elements of b and append them to a and resize as necessary?
I am interested in the CPython implementation.
In CPython, two lists are concatenated in function list_concat.
You can see in the linked source code that that function allocates the space needed to fit both lists.
size = Py_SIZE(a) + Py_SIZE(b);
np = (PyListObject *) list_new_prealloc(size);
Then it copies the items from both lists to the new list.
for (i = 0; i < Py_SIZE(a); i++) {
...
}
...
for (i = 0; i < Py_SIZE(b); i++) {
...
}
You can find out by looking at the id of a before and after concatenating b:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025874463112
>>> a = a + b
>>> id(a)
140025874467144
Here, since the id is different, we see that the interpreter has created a new list and bound it to the name a. The old list is freed once nothing references it any more.
However, the behaviour can be different when using the augmented assignment operator +=:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025844068296
>>> a += b
>>> id(a)
140025844068296
Here, since the id is the same, we see that the interpreter has reused the same list object a and appended the values of b to it.
For more detailed information, see these questions:
Why does += behave unexpectedly on lists?
Does list concatenation with the `+` operator always return a new `list` instance?
You can see the implementation in listobject.c::list_concat. Python will get the size of a and b and create a new list object of that size. It will then loop through the values of a and b, which are C pointers to python objects, increment their ref counts and add those pointers to the new list.
It will create a new list with a shallow copy of the items in the first list, followed by a shallow copy of the items in the second list. The + operator calls the object.__add__(self, other) method. For example, for the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called. You can read more in the documentation.
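The shallow-copy behaviour of + and the in-place behaviour of += can be observed from Python itself. This is only an illustrative sketch; the nested lists and the name alias are chosen for the example.

```python
a = [[1], [2]]
b = [[3]]

c = a + b             # new list object holding the same element references
print(c is a)         # False
print(c[0] is a[0])   # True: the items themselves were not copied

alias = a             # second name for the original list
a += b                # extends the existing list in place
print(a is alias)     # True
print(alias)          # [[1], [2], [3]]
```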

Does slice operation allocate a new object always?

I am confused about the slice operation.
>>> s = "hello world"
>>> y = s[::]
>>> id(s)
4507906480
>>> id(y)
4507906480 # they are the same - no new object was created
>>> z = s[:2]
>>> z
'he'
>>> id(z)
4507835488 # z is a new object
What allocation rule does slice operation follow?
For most built-in types, slicing is always a shallow copy... in the sense that modifying the copy will not modify the original. This means that for immutable types, an object counts as a copy of itself. The copy module also uses this concept of "copy":
>>> import copy
>>> t = (1, 2, 3)
>>> copy.copy(t) is t
True
Objects are free to use whatever allocation strategy they choose, as long as they implement the semantics they document. y can be the same object as s, but z cannot, because s and z store different values.
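To make the contrast concrete, here is a hedged sketch comparing an immutable string with a mutable list. Whether s[:] returns the very same object is an implementation detail (CPython typically does), so the example only relies on the guarantee for lists; the names lst and list_copy are made up.

```python
s = "hello world"
lst = [1, 2, 3]

# Immutable: returning the same object is allowed, and CPython typically does.
print(s[:] is s)

# Mutable: a full slice must be a new list, otherwise changes would leak.
list_copy = lst[:]
print(list_copy is lst)   # False
list_copy.append(4)
print(lst)                # [1, 2, 3]
print(list_copy)          # [1, 2, 3, 4]
```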

Variables and memory allocated for them?

I have a question about how Python deals with memory when copying variables.
For example, I have a list (or string, tuple, dictionary, set) variable
A = [1,2,3]
then I assign the value of A to another variable B
B = A
then if I do "some changes" to A, e.g.,
A.pop(0)
then B also changes, i.e.,
print(A,B) will give me ([2,3], [2,3])
I read some material and they say "B=A did not copy the value of A to a new place in memory labelled by B. It just made the name B point to the same position in memory as A." Can I interpret this as we still only have one place of memory, but now it has 2 names?
However, I found that if I did some other changes to A, such as
A = [5,6] # I reassign A value,
Then I found
print(A,B)
gives me ([5,6],[1,2,3])
So I am confused here. It seems that now we have two places in memory?
Your first understanding was correct. When you do
B = A
you now have two names pointing to the same object in memory.
Your misunderstanding is what happens when you do
A = [5, 6]
This doesn't copy [5, 6] to that location in memory. It allocates a new list [5, 6] and then changes the name A to point to this. But B still points to the same list that it pointed to before.
Basically, every time you do
A = <something>
you're changing where A points, not changing the thing that it points to.
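The difference between mutating an object and rebinding a name can be sketched as follows, reusing the names from the question (C is added here only to show an explicit copy).

```python
A = [1, 2, 3]
B = A            # two names, one list object

A.pop(0)         # mutates the shared object
print(A, B)      # [2, 3] [2, 3]

A = [5, 6]       # rebinds A to a new list; B is untouched
print(A, B)      # [5, 6] [2, 3]
print(A is B)    # False

C = list(B)      # explicit shallow copy: a separate list object
C.append(99)
print(B, C)      # [2, 3] [2, 3, 99]
```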
Lists are objects, and names in Python hold references to objects. When you write B = A, B gets a reference (a C pointer, in CPython) to the object behind A, not to the name A itself, so, as your code already shows, A is B evaluates to True. If you then rebind A with A = [5,6], only the address stored in A changes. The old list still has a reference from B, so it is kept alive; without that reference it would be reclaimed.
If you then, however, reassign B = A, B will be [5,6].
You assign a new object to a the second time:
>>> a= [1,2,3]
>>> id(a)
4353139632
>>> b = a
>>> id(b)
4353139632
>>> a= [4,5]
>>> id(a)
4353139776
>>> id(b)
4353139632
Lists, tuples, and all other objects are accessed through references in Python. You can think of variable names as pointers in C. So A is a pointer to some location storing a list; when you did B = A, you copied the reference to that location (the address) to B.
So when you changed the contents at that location via A, you see the changed contents whether you access them via A or via B.
However, if you would like to copy the elements, you can use
B = [i for i in A]
or something like that.
And when you assigned some other value to A with A = [5,6], the reference in A started pointing to some other memory location, while the one in B still points to the original location, so B stays the same.
