numpy value shows same id result after changing the assigned value - python

I have a question (Python version: 3.9.7).
I ran the code below, but I cannot understand what is happening.
Please let me know why it happens.
(As far as I know, numbers are immutable, so when a new number is assigned, the name should point to a different address containing that number.)
import numpy as np
a = np.array([[0,1,2],[3,4,5],[6,7,8]])
id(a[0])  # 1977043162384
A = [0,0,0]; a[0] = A
id(a[0])  # 1977043162384 (I cannot understand this part)
b = [1,2,3]
a[0] = b
id(a[0])  # 1977290465808
The id returned by the second id(a[0]) call should have changed, shouldn't it?

Every time you index an array like that, NumPy creates a new array object (a view) that wraps the same underlying data, so:
In [3]: a = np.array([[0,1,2],[3,4,5],[6,7,8]])
In [4]: a[0]
Out[4]: array([0, 1, 2])
In [5]: a[0] is a[0]
Out[5]: False
Or perhaps visualized another way:
In [6]: id(a[0])
Out[6]: 140266652673680
In [7]: id(a[0])
Out[7]: 140268012281648
In [8]: id(a[0])
Out[8]: 140267734662960
In [9]: id(a[0])
Out[9]: 140266652673680
You shouldn't expect id(a[0]) to be different each time. Python is free to reuse the same id because the lifetimes of those temporary objects do not overlap.
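A minimal sketch of the flip side: if you keep references to two results of a[0] so that their lifetimes overlap, they are guaranteed to be distinct objects with distinct ids, even though both describe the same memory. The names first and second are introduced here for illustration; the actual id values vary between runs.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

# Keep both results alive so their lifetimes overlap.
first = a[0]
second = a[0]

print(first is second)          # False: two separate view objects
print(id(first) == id(second))  # False: live objects never share an id
print(first.base is a and second.base is a)  # True: both are views of a
print(np.shares_memory(first, second))       # True: same underlying data
```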
Of course, whether it reuses that id is an implementation detail. But why did you expect the id to change? It is important to understand that
A = [0,0,0]; a[0] = A
does not in any way put that list into the array. Instead, the list's values are written into the primitive, underlying buffer.
(As far as I know, numbers are immutable, so when a new number is assigned, the name should point to a different address containing that number.)
There are no Python objects in your array: it stores raw numbers with a fixed-width integer dtype, such as numpy.int64. This is crucial to understand.
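A short sketch of what the assignment actually does, under the assumption that you only care about observable behaviour: the list's values are copied into the array's existing buffer, so mutating the list afterwards changes nothing, and the buffer address reported by __array_interface__ stays the same across assignments. The variable names buffer_before and buffer_after are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
buffer_before = a.__array_interface__["data"][0]  # address of the raw data

A = [0, 0, 0]
a[0] = A          # the list's values are converted and written into a's buffer
A[0] = 99         # changing the list afterwards does not touch the array
print(a[0])       # [0 0 0]

a[0] = [1, 2, 3]  # same story: new values, same buffer
buffer_after = a.__array_interface__["data"][0]
print(buffer_before == buffer_after)  # True

print(a.dtype != object)                # True: no Python objects are stored
print(isinstance(a[0, 0], np.integer))  # True: element access boxes a NumPy scalar
```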

Related

difference in variable assigning in Python between integer and list

I am studying Wes McKinney's 'Python for Data Analysis'.
At some point he says:
"When assigning a variable (or name) in Python, you are creating a reference to the object on the righthand side of the equals sign. In practical terms, consider a list of integers:
In [8]: a = [1, 2, 3]
In [9]: b = a
In [11]: a.append(4)
In [12]: b
output will be:
Out[12]: [1, 2, 3, 4]
He reasons as such:
"In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In Python, a and b actually now refer to the same object, the original list"
My question is: why doesn't the same thing happen in the case below?
In [8]: a = 5
In [9]: b = a
In [11]: a +=1
In [12]: b
Where I still get
Out[12]: 5
for b?
In the first case, you're creating a list, and both a and b point to that list. When you modify the list in place, both names see the change, because they refer to the same list.
But if you increase the value of a variable that points to an integer, 5 is still 5: you're not changing the integer. a += 1 produces the integer 6 and changes which object the name a refers to. So a now points to 6, while b still points to 5. You're not changing the thing that a points to; you're changing WHAT a points to, and b doesn't care about that.
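A minimal sketch contrasting rebinding with in-place mutation. The list names xs and ys are introduced here to keep the two cases separate; note that += on a list mutates it in place, while a plain + builds a new list.

```python
a = 5
b = a
a += 1            # ints are immutable: a is rebound to a different int object
print(a, b)       # 6 5

xs = [1, 2, 3]
ys = xs
xs += [4]         # list += mutates the list in place (like extend)
print(xs, ys)     # [1, 2, 3, 4] [1, 2, 3, 4]

xs = xs + [5]     # plain + builds a new list and rebinds xs; ys is untouched
print(xs, ys)     # [1, 2, 3, 4, 5] [1, 2, 3, 4]
print(xs is ys)   # False
```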

"is" operation returns false even though two objects have same id [duplicate]

This question already has an answer here:
id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?
Two python objects have the same id but "is" operation returns false as shown below:
import numpy as np
a = np.arange(12).reshape(2, -1)
c = a.reshape(12, 1)
print("id(c.data)", id(c.data))
print("id(a.data)", id(a.data))
print(c.data is a.data)
print(id(c.data) == id(a.data))
Here is the actual output:
id(c.data) 241233112
id(a.data) 241233112
False
True
My question is: why does c.data is a.data return False even though both have the same id, and should therefore point to the same object? I thought two things are the same object if they have the same id. Am I wrong? Thank you!
a.data and c.data each produce a transient object that nothing else references. In CPython each one is freed as soon as it is no longer needed, so the same id can be reused for both.
In your first comparison, c.data is a.data, both objects have to coexist while is checks whether they are identical, which they are not.
In your second comparison, id(c.data) == id(a.data), each object is released as soon as id returns its id.
If you save references to both objects, keeping them alive, you can see they are not the same object.
r0 = a.data
r1 = c.data
assert r0 is not r1
In [62]: a = np.arange(12).reshape(2,-1)
...: c = a.reshape(12,1)
.data returns a memoryview object. id just gives the identity of that object; it is not the value of the object, nor any indication of where the data buffer is located.
In [63]: a.data
Out[63]: <memory at 0x7f672d1101f8>
In [64]: c.data
Out[64]: <memory at 0x7f672d1103a8>
In [65]: type(a.data)
Out[65]: memoryview
https://docs.python.org/3/library/stdtypes.html#memoryview
If you want to verify that a and c share a data buffer, I find the __array_interface__ to be a better tool.
In [66]: a.__array_interface__['data']
Out[66]: (50988640, False)
In [67]: c.__array_interface__['data']
Out[67]: (50988640, False)
It even shows the offset produced by slicing - here 24 bytes, 3*8
In [68]: c[3:].__array_interface__['data']
Out[68]: (50988664, False)
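A small sketch of the same check done two ways: np.shares_memory asks directly whether two arrays overlap in memory, and __array_interface__ gives the raw start address, with slicing offsetting it by whole elements. Using a.itemsize instead of a hard-coded 8 is an assumption made so the arithmetic holds whatever the default integer size is on your platform.

```python
import numpy as np

a = np.arange(12).reshape(2, -1)
c = a.reshape(12, 1)

# np.shares_memory answers "do these arrays overlap in memory?" directly.
print(np.shares_memory(a, c))         # True
print(np.shares_memory(a, a.copy()))  # False

# The raw buffer addresses agree, and slicing offsets by whole elements.
start = a.__array_interface__["data"][0]
print(c.__array_interface__["data"][0] == start)                       # True
print(c[3:].__array_interface__["data"][0] - start == 3 * a.itemsize)  # True
```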
I haven't seen much use of a.data. It can be used as the buffer object when creating a new array with ndarray:
In [70]: d = np.ndarray((2,6), dtype=a.dtype, buffer=a.data)
In [71]: d
Out[71]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
In [72]: d.__array_interface__['data']
Out[72]: (50988640, False)
But normally we create new arrays with shared memory with slicing or np.array (copy=False).

Some confusions on how numpy array stored in Python

I have some confusion when experimenting with NumPy arrays in Python.
Question 1
I execute the following script in the Python interpreter:
>>> import numpy as np
>>> L = [1000,2000,3000]
>>> A = np.array(L)
>>> B = A
Then I check the following things:
>>> A is B
True
>>> id(A) == id(B)
True
>>> id(A[0]) == id(B[0])
True
That's fine. But some strange things happened then.
>>> A[0] is B[0]
False
But how can A[0] and B[0] be different things? They have the same id!
For a Python list, we have
>>> LL = [1000,2000,3000]
>>> SS = LL
>>> LL[0] is SS[0]
True
Is the way NumPy stores an array totally different from a list? We also have
>>> A[0] = 1001
>>> B[0]
1001
It seems that A[0] and B[0] are identical objects.
Question2
I make a copy of A.
>>> C = A[:]
>>> C is A
False
>>> C[0] is A[0]
False
That is fine. A and C seem to be independent of each other. But
>>> A[0] = 1002
>>> C[0]
1002
It seems that A and C are not independent? I am totally confused.
You are asking two completely independent questions, so here are two answers.
The data of Numpy arrays is internally stored as a contiguous C array. Each entry in the array is just a number. Python objects, on the other hand, require some housekeeping data, e.g. the reference count and a pointer to the type object. You can't simply have a raw pointer to a number in memory. For this reason, Numpy "boxes" a number in a Python object when you access an individual element. This happens every time you access an element, so even A[0] and A[0] are different objects:
>>> A[0] is A[0]
False
This is at the heart of why Numpy can store arrays in a more memory-efficient way: It does not store a full Python object for each entry, and only creates these objects on the fly when needed. It is optimised for vectorised operations on the array, not for individual element access.
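A minimal sketch of the boxing described above, reusing the question's A, B, LL and SS: each indexing operation on the array returns a fresh NumPy scalar object, whereas a list hands back the very int object it holds a reference to. The names x and y are introduced for illustration.

```python
import numpy as np

A = np.array([1000, 2000, 3000])
B = A

x = A[0]
y = B[0]
print(type(x))   # a NumPy scalar type, e.g. <class 'numpy.int64'>
print(x is y)    # False: each access boxes the number in a new object
print(x == y)    # True: the values are equal

LL = [1000, 2000, 3000]
SS = LL
print(LL[0] is SS[0])  # True: a list stores references to existing int objects
```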
When you execute C = A[:] you are creating a new view for the same data. You are not making a copy. You will then have two different wrapper objects, pointed to by A and C respectively, but they are backed by the same buffer. The base attribute of an array refers to the array object it was originally created from:
>>> A.base is None
True
>>> C.base is A
True
New views on the same data are particularly useful when combined with indexing, since you can get views that only include some slice of the original array, but are backed by the same memory.
To actually make a copy of an array, use the copy() method.
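A short sketch of the view-versus-copy distinction from the second question: C = A[:] shares A's buffer, while A.copy() allocates its own. The name D is introduced here for the copy.

```python
import numpy as np

A = np.array([1000, 2000, 3000])
C = A[:]         # a view: new wrapper object, same buffer
D = A.copy()     # a real copy: new wrapper and new buffer

A[0] = 1002
print(C[0], D[0])                                      # 1002 1000
print(C.base is A, D.base is None)                     # True True
print(np.shares_memory(A, C), np.shares_memory(A, D))  # True False
```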
As a more general remark, you should not read too much into object identity in Python. In general, if x is y is true, you know that they are really the same object. However, if it returns false, they can still be two different wrapper objects around the same underlying data.

Variables and memory allocated for them?

I have a question about how Python handles memory when copying variables.
For example, I have a list (or string, tuple, dictionary, set) variable
A = [1,2,3]
then I assign the value of A to another variable B
B = A
then if I do "some changes" to A, e.g.,
A.pop(0)
then B also changes, i.e.,
print(A,B) will give me ([2,3], [2,3])
I read some material and they say "B=A did not copy the value of A to a new place in memory labelled by B. It just made the name B point to the same position in memory as A." Can I interpret this as we still only have one place of memory, but now it has 2 names?
However, I found that if I did some other changes to A, such as
A = [5,6] # I reassign A value,
Then I found
print(A,B)
gives me ([5,6],[1,2,3])
So I am confused here. It seems that now we have two places in memory.
Your first understanding was correct. When you do
B = A
you now have two names pointing to the same object in memory.
Your misunderstanding is what happens when you do
A = [5, 6]
This doesn't copy [5, 6] to that location in memory. It allocates a new list [5, 6] and then changes the name A to point to this. But B still points to the same list that it pointed to before.
Basically, every time you do
A = <something>
you're changing where A points, not changing the thing that it points to.
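A minimal sketch of both steps from the question in one run: pop mutates the list that both names share, while A = [5, 6] only rebinds A. Because the steps run in sequence here, B ends up as [2, 3] rather than the question's [1, 2, 3].

```python
A = [1, 2, 3]
B = A                 # one list, two names
A.pop(0)              # mutates the shared list
print(A, B, A is B)   # [2, 3] [2, 3] True

A = [5, 6]            # rebinds A to a brand-new list; B still names the old one
print(A, B, A is B)   # [5, 6] [2, 3] False
```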
Lists are objects, and names hold references to them (this is often loosely called 'call-by-reference'). When you write B=A you get a reference (in CPython, a C pointer) to the object behind A (not to A itself!), so, as your code is already telling you, A is B evaluates to True. The reference is not to A but to the object that A points to, so if you rebind A with A = [5,6], the old list is still referenced by B and is therefore kept alive (otherwise it would be reclaimed by the garbage collector). Only the address stored in A changes.
If you then reassign B=A, B will be [5,6] as well.
You assign a new object to a the second time:
>>> a= [1,2,3]
>>> id(a)
4353139632
>>> b = a
>>> id(b)
4353139632
>>> a= [4,5]
>>> id(a)
4353139776
>>> id(b)
4353139632
In Python, lists, tuples, and all other objects are accessed through references. You can think of variable names as pointers in C. So A is a pointer to some location storing a list, and when you did B = A you copied the reference to that location (the address) to B.
So when you changed the contents at that location via A, you see the change whether you access it via A or B.
However, if you would like to copy the elements, you can use
B = [i for i in A]
or, more idiomatically, B = A.copy(), B = list(A) or B = A[:].
and when you assigned some other value to A, A = [5,6], the reference at A is now pointing to some other memory location, and that at B to the original location, so B stays same.
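A minimal sketch of copying instead of aliasing, using list.copy() (equivalent to list(A) or A[:]). The second half shows an assumption worth keeping in mind: these copies are shallow, so nested objects are still shared; copy.deepcopy is the standard-library tool when they must be duplicated too.

```python
A = [1, 2, 3]
B = A.copy()      # a new list holding the same elements
A.append(4)
print(A, B)       # [1, 2, 3, 4] [1, 2, 3]

# The copy is shallow: nested lists are still shared between A and B.
A = [[1, 2], [3]]
B = A.copy()
A[0].append(99)
print(B)          # [[1, 2, 99], [3]]
```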

How does python assign values after assignment operator [duplicate]

This question already has answers here:
How do I pass a variable by reference?
Okay a very silly question I'm sure. But how does python assign value to variables?
Say there is a variable a that is assigned the value 2 (a=2). So Python assigns a memory location to the variable, and a now points to the memory location that contains the value 2. Now, if I assign b=a, the variable b also points to the same location as a.
Now, if I assign c=2, it still points to the same memory location as a instead of a new one. So how does Python work? Does it first check all the previously assigned variables to see whether any of them hold the same value, and then reuse that memory location?
Also, it doesn't work the same way with lists. If I assign a=[2,3] and then b=[2,3] and check their memory locations with the id function, I get two different memory locations. But c=b gives me the same location. Can someone explain how this works and why?
Edit:
Basically, I'm asking because I've just started learning about the is operator, and apparently it returns True only if both names point to the same location. So if a=1000 and b=1000, a is b is False, but with a="world" and b="world" it is True.
I've faced this problem before and understand that it gets confusing. There are two concepts here:
some data structures are mutable, while others are not
Python works off pointers... most of the time
So let's consider the case of a list (you accidentally stumbled on interning and peephole optimizations when you used ints - I'll get to that later)
So let's create two identical lists (remember lists are mutable)
In [42]: a = [1,2]
In [43]: b = [1,2]
In [44]: id(a) == id(b)
Out[44]: False
In [45]: a is b
Out[45]: False
See, despite the fact that the lists are identical, a and b refer to different memory locations. This is because Python computes [1,2], stores it at a memory location, and then binds the name a (or b) to it. It would take quite a long time for Python to check every allocated memory location to see whether [1,2] already exists, just to bind b to the same memory location as a.
And that's not to mention that lists are mutable, i.e. you can do the following:
In [46]: a = [1,2]
In [47]: id(a)
Out[47]: 4421968008
In [48]: a.append(3)
In [49]: a
Out[49]: [1, 2, 3]
In [50]: id(a)
Out[50]: 4421968008
See that? The value that a holds has changed, but the memory location has not. Now, what if a bunch of other variable names were assigned to the same memory location? They would be changed as well, which would be a flaw in the language. To avoid this, Python would have to copy the entire list into a new memory location just because I wanted to change the value of a.
This is true even of empty lists:
In [51]: a = []
In [52]: b = []
In [53]: a is b
Out[53]: False
In [54]: id(a) == id(b)
Out[54]: False
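A small sketch of the distinction the examples rely on: == compares contents, while is compares identity. The extra name c is introduced here to show an alias next to an equal-but-separate list.

```python
a = [1, 2]
b = [1, 2]
print(a == b)    # True: same contents
print(a is b)    # False: two separate list objects

c = a
print(c is a)    # True: c is just another name for a's list
a.append(3)
print(b, c)      # [1, 2] [1, 2, 3]
```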
Now, let's talk about that stuff I said about pointers:
Let's say you want two variables to actually talk about the same memory location. Then, you could assign your second variable to your first:
In [55]: a = [1,2,3,4]
In [56]: b = a
In [57]: id(a) == id(b)
Out[57]: True
In [58]: a is b
Out[58]: True
In [59]: a[0]
Out[59]: 1
In [60]: b[0]
Out[60]: 1
In [61]: a
Out[61]: [1, 2, 3, 4]
In [62]: b
Out[62]: [1, 2, 3, 4]
In [63]: a.append(5)
In [64]: a
Out[64]: [1, 2, 3, 4, 5]
In [65]: b
Out[65]: [1, 2, 3, 4, 5]
In [66]: a is b
Out[66]: True
In [67]: id(a) == id(b)
Out[67]: True
In [68]: b.append(6)
In [69]: a
Out[69]: [1, 2, 3, 4, 5, 6]
In [70]: b
Out[70]: [1, 2, 3, 4, 5, 6]
In [71]: a is b
Out[71]: True
In [72]: id(a) == id(b)
Out[72]: True
Look what happened there! a and b are both assigned to the same memory location. Therefore, any changes you make to one, will be reflected on the other.
Lastly, let's talk briefly about that peephole stuff I mentioned before. Python tries to save space. So, it loads a few small things into memory when it starts up (small integers, for example). As a result, when you assign a variable to a small integer (like 5), python doesn't have to compute 5 before assigning the value to a memory location, and assigning a variable name to it (unlike it did in the case of your lists). Since it already knows what 5 is, and has it stashed away in some memory location, all it does is assign that memory location a variable name. However, for much larger integers, this is no longer the case:
In [73]: a = 5
In [74]: b = 5
In [75]: id(a) == id(b)
Out[75]: True
In [76]: a is b
Out[76]: True
In [77]: a = 1000000000
In [78]: b = 1000000000
In [79]: id(a) == id(b)
Out[79]: False
In [80]: a is b
Out[80]: False
This is an optimization that CPython performs for small integers. In general, you can't count on a and c pointing to the same location. If you try this experiment with progressively larger integers, you'll see that it stops working at some point: CPython caches the integers from -5 to 256 inclusive, so 1000 is already outside that range. This is an implementation detail and may differ between Python implementations and versions.
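A minimal sketch of the cache boundary, assuming CPython (the behaviour is an implementation detail). The ints are built at run time with int() because equal literals written in the same script can be folded into one shared constant by the compiler, which would hide the effect.

```python
# Create the ints at run time with int() so the compiler cannot
# fold equal literals into one shared constant.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

print(small_a is small_b)  # True on CPython: 256 comes from the small-int cache
print(big_a is big_b)      # False on CPython: two separate int objects
print(big_a == big_b)      # True: compare values with ==, not is
```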
Your understanding is generally correct, but it's worth noting that python lists are totally different animals compared to arrays in C or C++. From the documentation:
id(obj)
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
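The simple answer to your question is that names in Python hold references to list objects, and each [2,3] you write creates a new list object. Their ids are therefore different: id() reports the identity of the object itself (in CPython, its memory address), and two separately created lists are two different objects, whereas c=b just adds another name for an existing one.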
