Reference Assignment in Python - Why isn't the value changed?

I know that "variable assignment" in python is, in fact, a binding / re-binding of a name (the variable) to an object.
b = [1,2,3]
a = b[2] # binding a to b[2] ?
a = 1000
b is still [1, 2, 3]
After this change, why is b not changed?
Here is another example:
b = [1,2,3]
a = b
a[0] = 1000
In this case b is [1000, 2, 3].
Isn't assignment in Python reference binding?
Thank you

a = b[2] # binding a to b[2] ?
Specifically, this binds the name a to the same value referenced by b[2]. It does not bind a to the element in the list b at index 2. The name a is entirely independent of the list where it got its value from.
a = 1000
Now you bind the name a to a new value, the integer 1000. Since a has no association with b, b does not change.
In your second example:
a = b
Now a is bound to the same list value that b is bound to. So when you do
a[0] = 1000
You modify an element in the underlying list that has two different names. When you access the list by either name, you will see the same value.
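A minimal sketch of both cases from the question, using the is operator to check whether two names refer to the same object:

```python
b = [1, 2, 3]
a = b[2]        # a is bound to the object that b[2] currently refers to
a = 1000        # rebinds the name a only; the list is untouched
print(b)        # [1, 2, 3]

a = b           # a and b are now two names for one list object
a[0] = 1000     # mutates that shared list
print(a is b)   # True
print(b)        # [1000, 2, 3]
```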

a[0] = ... is a special form of assignment that desugars to a method call,
a.__setitem__(0, ...)
You aren't reassigning the name a; you are assigning to a "slot" in the object referenced by a (and b).
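To make the method call visible, here is a small sketch. Subscript assignment is handled by the object's __setitem__ method; LoggingList is a hypothetical subclass invented for this illustration that records each call before delegating to the normal list behaviour.

```python
class LoggingList(list):
    """Hypothetical list subclass that records subscript assignments."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __setitem__(self, index, value):
        self.calls.append((index, value))   # remember what was requested
        super().__setitem__(index, value)   # then do the real assignment


b = LoggingList([1, 2, 3])
a = b            # second name for the same object
a[0] = 1000      # routed through __setitem__, the name a is not rebound

print(b.calls)   # [(0, 1000)]
print(b)         # [1000, 2, 3]
print(a is b)    # True
```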

Lists are mutable objects, ints aren't. 4 can't become 5 in python, but a list can change its contents.

Related

What happens internally when concatenating two lists in Python?

When concatenating two lists,
a = [0......, 10000000]
b = [0......, 10000000]
a = a + b
does the Python runtime allocate a bigger array and loop through both arrays and put the elements of a and b into the bigger array?
Or does it loop through the elements of b and append them to a and resize as necessary?
I am interested in the CPython implementation.
In CPython, two lists are concatenated in function list_concat.
You can see in the linked source code that that function allocates the space needed to fit both lists.
size = Py_SIZE(a) + Py_SIZE(b);
np = (PyListObject *) list_new_prealloc(size);
Then it copies the items from both lists to the new list.
for (i = 0; i < Py_SIZE(a); i++) {
...
}
...
for (i = 0; i < Py_SIZE(b); i++) {
...
}
You can find out by looking at the id of a before and after concatenating b:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025874463112
>>> a = a + b
>>> id(a)
140025874467144
Here, since the id is different, we see that the interpreter has created a new list and bound it to the name a. The old a list will be garbage collected eventually.
However, the behaviour can be different when using the augmented assignment operator +=:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025844068296
>>> a += b
>>> id(a)
140025844068296
Here, since the id is the same, we see that the interpreter has reused the same list object a and appended the values of b to it.
For more detailed information, see these questions:
Why does += behave unexpectedly on lists?
Does list concatenation with the `+` operator always return a new `list` instance?
You can see the implementation in listobject.c::list_concat. Python will get the size of a and b and create a new list object of that size. It will then loop through the values of a and b, which are C pointers to python objects, increment their ref counts and add those pointers to the new list.
It will create a new list with a shallow copy of the items in the first list, followed by a shallow copy of the items in the second list. The + operator calls the object.__add__(self, other) method. For example, for the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called. You can read more in the documentation.
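The points made in these answers can be checked from Python itself. The sketch below keeps a second name for each list so that identity can be compared after the operation, and it also shows that the copy is shallow (the new list holds the same element objects).

```python
a = [1, 2, 3]
b = [4, 5, 6]
original = a

a = a + b                 # list.__add__ builds a new list
print(a is original)      # False
print(original)           # [1, 2, 3]

c = [1, 2, 3]
alias = c
c += b                    # list.__iadd__ extends the existing list in place
print(c is alias)         # True
print(alias)              # [1, 2, 3, 4, 5, 6]

inner = [0]
joined = [inner] + [inner]
# Shallow copy: the new list refers to the same inner object twice.
print(joined[0] is inner and joined[1] is inner)   # True
```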

Variables and memory allocated for them?

I have a question about how Python deals with memory when copying variables.
For example, I have a list (or string, tuple, dictionary, set) variable
A = [1,2,3]
then I assign the value of A to another variable B
B = A
then if I do "some changes" to A, e.g.,
A.pop(0)
then B also changes, i.e.,
print(A,B) will give me ([2,3], [2,3])
I read some material and they say "B=A did not copy the value of A to a new place in memory labelled by B. It just made the name B point to the same position in memory as A." Can I interpret this as we still only have one place of memory, but now it has 2 names?
However, I found that if I did some other changes to A, such as
A = [5,6] # I reassign A value,
Then I found
print(A,B)
gives me ([5,6],[1,2,3])
So I am confused here. It seems that now we have two places of memory
Your first understanding was correct. When you do
B = A
you now have two names pointing to the same object in memory.
Your misunderstanding is what happens when you do
A = [5, 6]
This doesn't copy [5, 6] to that location in memory. It allocates a new list [5, 6] and then changes the name A to point to this. But B still points to the same list that it pointed to before.
Basically, every time you do
A = <something>
you're changing where A points, not changing the thing that it points to.
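As a quick sketch of the difference, the two operations from the question can be run one after the other on the same names:

```python
A = [1, 2, 3]
B = A            # two names, one list object
A.pop(0)         # mutates the shared list
print(A, B)      # [2, 3] [2, 3]
print(A is B)    # True

A = [5, 6]       # rebinds A to a new list; B is untouched
print(A, B)      # [5, 6] [2, 3]
print(A is B)    # False
```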
Lists are objects, and names hold references to objects. When you write B = A, B gets a reference (a C pointer, in CPython) to the object behind A, not to the name A itself, so, as your code already shows, A is B evaluates to True. If you then rebind A with A = [5,6], only the reference stored in A changes. The old list still has another reference (B), so it stays alive instead of being garbage collected.
If you then reassign B = A, B will also refer to [5,6].
The second time, you assign a new object to a:
>>> a= [1,2,3]
>>> id(a)
4353139632
>>> b = a
>>> id(b)
4353139632
>>> a= [4,5]
>>> id(a)
4353139776
>>> id(b)
4353139632
Every value in Python, including lists and tuples, is an object accessed through a reference. You can think of variable names as pointers in C. So A is a pointer to some location that stores the list, and when you did B = A you copied the reference to that location (the address) to B.
So, when you change the contents at that location via A, you see the change whether you access it via A or B.
However, if you would like to copy the elements, you can use
B = [i for i in A]
or something like that.
When you assign some other value to A, as in A = [5,6], the reference in A now points to some other memory location, while B still points to the original location, so B stays the same.

Understanding python's name binding

I am trying to clarify for myself Python's rules for 'assigning' values
to variables.
Is the following comparison between Python and C++ valid?
In C/C++ the statement int a=7 means, memory is allocated for an integer variable called a (the quantity on the LEFT of the = sign)
and only then the value 7 is stored in it.
In Python the statement a=7 means, a nameless integer object with value 7 (the quantity on the RIGHT side of the =) is created first and stored somewhere in memory. Then the name a is bound to this object.
The output of the following C++ and Python programs seem to bear this out, but I would like some feedback whether I am right.
C++ produces different memory locations for a and b
while a and b seem to refer to the same location in Python
(going by the output of the id() function)
C++ code
#include <iostream>
using namespace std;
int main(void)
{
    int a = 7;
    int b = a;
    cout << &a << " " << &b << endl; // a and b point to different locations in memory
    return 0;
}
Output: 0x7ffff843ecb8 0x7ffff843ecbc
Python: code
a = 7
b = a
print id(a), ' ' , id(b) # a and b seem to refer to the same location
Output: 23093448 23093448
Yes, you're basically correct. In Python, a variable name can be thought of as a binding to a value. This is one of those "a ha" moments people tend to experience when they truly start to grok (deeply understand) Python.
Assigning to a variable name in Python makes the name bind to a different value from what it currently was bound to (if indeed it was already bound), rather than changing the value it currently binds to:
a = 7 # Create 7, bind a to it.
# a -> 7
b = a # Bind b to the thing a is currently bound to.
# a
# \
# *-> 7
# /
# b
a = 42 # Create 42, bind a to it, b still bound to 7.
# a -> 42
# b -> 7
I say "create" but that's not necessarily so - if a value already exists somewhere, it may be re-used.
Where the underlying data is immutable (cannot be changed), that usually makes Python look as if it's behaving identically to the way other languages do (C and C++ come to mind). That's because the 7 (the actual object that the names are bound to) cannot be changed.
But, for mutable data (same as using pointers in C or references in C++), people can sometimes be surprised because they don't realise that the value behind it is shared:
>>> a = [1,2,3] # a -> [1,2,3]
>>> print(a)
[1, 2, 3]
>>> b = a # a,b -> [1,2,3]
>>> print(b)
[1, 2, 3]
>>> a[1] = 42 # a,b -> [1,42,3]
>>> print(a) ; print(b)
[1, 42, 3]
[1, 42, 3]
You need to understand that a[1] = 42 is different to a = [1, 42, 3]. The latter is an assignment, which would result in a being re-bound to a different object, and therefore independent of b.
The former is simply changing the mutable data that both a and b are bound to, which is why it affects both.
There are ways to get independent copies of a mutable value, with things such as:
b = a[:]
b = [item for item in a]
b = list(a)
These will work to one level (b = a can be thought of as working to zero levels) meaning if the a list contains other mutable things, those will still be shared between a and b:
>>> a = [1, [2, 3, 4], 5]
>>> b = a[:]
>>> a[0] = 8 # This is independent.
>>> a[1][1] = 9 # This is still shared.
>>> print(a) ; print(b) # Shared bit will 'leak' between a and b.
[8, [2, 9, 4], 5]
[1, [2, 9, 4], 5]
For a truly independent copy, you can use deepcopy, which will work down to as many levels as needed to separate the two objects.
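A short sketch comparing the two levels of copying, using copy.copy and copy.deepcopy from the standard library on the same nested list as above:

```python
import copy

a = [1, [2, 3, 4], 5]
shallow = copy.copy(a)       # new outer list, inner list still shared
deep = copy.deepcopy(a)      # inner list is copied as well

a[1][1] = 9                  # change the nested list through a

print(shallow)               # [1, [2, 9, 4], 5]
print(deep)                  # [1, [2, 3, 4], 5]
print(shallow[1] is a[1])    # True
print(deep[1] is a[1])       # False
```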
In your example code, int is a built-in type in C++, so its operator = cannot be overloaded and always copies the value. In Python, however, = does not create a new object; both names can refer to the same object. The Python object model is similar to Java's in this respect: most variables hold a reference to an object rather than a copy.
You can also try this:
a = 7
b = 7
print id(a), ' ' , id(b)
It outputs the same id for both, because CPython caches small integers (an implementation detail), so a and b end up bound to the same object.

How does append method work in python?

I study programming languages and a quiz question and solution was this:
def foo(x):
    x.append(3)
    x = [8]
    return x
x = [1, 5]
y = foo(x)
print(x)
print(y)
Why does this print the following:
[1, 5, 3]
[8]
Why doesn't x equal [8]?
The other two answers are great. I suggest you try id() to see each object's address.
See the following example
def foo(x):
    x.append(3)
    print("global", id(x))
    x = [8]
    print("local ", id(x))
    return x
x = [1, 5]
print("global", id(x))
y = foo(x)
print("global", id(x))
print(x)
print(y)
And the output
global 140646798391920
global 140646798391920
local 140646798392928
global 140646798391920
[1, 5, 3]
[8]
As you can see, the id of the object that x refers to stays the same when you mutate it with append, but changes when you rebind x with =. Assigning to x inside the function rebinds only the function's local name x; the global x is not affected.
You have a lot of things going on there. So let's go step by step.
x = [1, 5]: you are assigning a list containing 1 and 5 to x.
y = foo(x): you are calling foo, passing in x, and assigning whatever foo returns to y.
Inside foo you call x.append(3), which appends 3 to the list that was passed in.
You then set x = [8], which rebinds the local name x to a new list. That new list is returned from foo, so y ends up as [8].
Object References
The key to understanding this is that Python passes variables around using object references. These are similar to pointers in a language like c++, but are different in very key ways.
When an assignment is made (using the assignment operator, =):
x = [1, 5]
Actually TWO things have been created. First is the object itself, which is the list [1, 5]. This object is a separate entity from the second thing, which is the (global) object reference to the object.
Object(s)     Object Reference(s)
[1, 5]        x (global)                          <--- New object created
In python, objects are passed into functions by object reference; they are not passed "by reference" or "by value" like in c++. This means that when x is passed into the foo function, there is a new, local object reference to the object created.
Object(s)     Object Reference(s)
[1, 5]        x (global), x (local to foo)        <--- Now two object references
Now inside of foo we call x.append(3), which directly changes the object (referred to by the foo-local x object reference) itself:
Object(s)     Object Reference(s)
[1, 5, 3]     x (global), x (local to foo)        <--- Still two object references
Next, we do something different. We assign the local-foo x object reference (or re-assign, since the object reference already existed previously) to a NEW LIST.
Object(s)     Object Reference(s)
[1, 5, 3]     x (global)                          <--- One object reference remains
[8]           x (local to foo)                    <--- New object created
Notice that the global object reference, x, is still there! It has not been impacted. We only re-assigned the local-foo x object reference to a NEW list.
And finally, it should be clear that when the function returns, we have this:
Object(s)     Object Reference(s)
[1, 5, 3]     x (global)                          <--- Unchanged
[8]           y (global)                          <--- New object reference created; foo's local x is gone
Sidebar
Default argument values behave differently from ordinary locals, and this is important behavior to understand. If we do something like this:
def f(a=[]):
    a.append(1)
    return a
f()
f()
f()
We will NOT get:
[1]
[1]
[1]
Instead, we will get:
[1]
[1, 1]
[1, 1, 1]
The default value [] is evaluated only ONCE, when the def statement is executed, and the resulting list object is stored with the function. It stays alive for as long as the function does (unless we delete the function itself).
As a result, when f() is called without an argument, the local name a is not bound to a fresh []; it is bound to that same stored list, which "remembers" everything appended to it by earlier calls. The local name itself disappears when each call returns, but the function still holds a reference to the default list, so the list does not get garbage collected between calls.
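The stored default can be inspected through the function's __defaults__ attribute. The sketch below shows the shared list growing, followed by the common None-sentinel idiom for getting a fresh list on every call; g is a hypothetical name used only for this illustration.

```python
def f(a=[]):
    a.append(1)
    return a


f()
f()
print(f.__defaults__)   # ([1, 1],)  the one list created at def time


def g(a=None):
    # A new list is created on each call that omits the argument.
    if a is None:
        a = []
    a.append(1)
    return a


print(g(), g())         # [1] [1]
```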
Contrast with Pointers
One of the ways object references are different from pointers is in assignment. If Python used actual pointers, you would expect behavior such as the following:
a = 1
b = a
b = 2
assert a == 2
However, this assertion produces an error. The reason is that b = 2 does NOT impact the object "pointed to" by the object reference, b. It creates a NEW object (2) and re-assigns b to that object.
Another way object references are different from pointers is in deletion:
a = 1
b = a
del a
assert b is None
This assertion also produces an error. The reason is the same as in the example above; del a does NOT impact the object "pointed to" by the object reference, b. It simply deletes the object reference, a. The object reference, b, and the object it points to, 1, are not impacted.
You might be asking, "Well then how do I delete the actual object?" The answer is YOU CAN'T! You can only delete all the references to that object. Once there are no longer any references to an object, the object becomes eligible for garbage collection and it will be deleted for you (in CPython this normally happens as soon as the last reference disappears, and the gc module can be used to force collection of reference cycles). This feature is known as automatic memory management; it is one of the primary strengths of Python, and one of the reasons why Python uses object references in the first place.
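One way to observe this is with a weak reference, which does not keep its target alive. This is only a sketch: Node is a hypothetical placeholder class (plain ints and lists cannot be weakly referenced), and the gc.collect() call is there so the result does not depend on CPython's immediate reference counting.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder class used only for this demonstration."""


a = Node()
b = a                      # second reference to the same object
probe = weakref.ref(a)     # weak reference: does not keep the object alive

del a                      # removes one name; the object survives via b
print(probe() is b)        # True

del b                      # last strong reference is gone
gc.collect()               # not needed in CPython, harmless elsewhere
print(probe())             # None
```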
Mutability
Another subject that needs to be understood is that there are two types of objects: mutable, and immutable. Mutable objects can be changed, while immutable objects cannot be changed.
A list, such as [1, 5], is mutable. A tuple or int, on the other hand, is immutable.
The append Method
Now that all of this is clear, we should be able to intuit the answer to the question "How does append work in Python?" The append() method operates on the mutable list object itself. The list is changed "in place", so to speak. No new list is created and assigned to the foo-local x. This is in contrast to the statement x = [8], where the expression on the right creates a NEW object and the assignment operator = binds the object reference x to it.
The append function modifies the x that was passed into the function, whereas assigning something new to x changed the locally scoped value and returned it.
The scope of x inside foo is specific to the function and is independent from the main calling context. x inside foo starts out referencing the same object as x in the main context because that's the parameter that was passed in, but the moment you use the assignment operator to set it to [8] you have allocated a new object to which x inside foo points, which is now totally different from x in the main context. To illustrate further, try changing foo to this:
def foo(x):
    print("Inside foo(): id(x) = " + str(id(x)))
    x.append(3)
    print("After appending: id(x) = " + str(id(x)))
    x = [8]
    print("After setting x to [8], id(x) = " + str(id(x)))
    return x
When I executed, I got this output:
Inside foo(): id(x) = 4418625336
After appending: id(x) = 4418625336
After setting x to [8], id(x) = 4418719896
[1, 5, 3]
[8]
(the IDs you see will vary, but the point will still be clear I hope)
You can see that append just mutates the existing object - no need to allocate a new one. But once the = operator executes, a new object gets allocated (and eventually returned to the main context, where it is assigned to y).

Python Variable Scope (passing by reference or copy?)

Why does the variable L get manipulated in the sorting(L) function call? In other languages, a copy of L would be passed to sorting(), so that any changes to x would not change the original variable.
def sorting(x):
    A = x  # Passed by reference?
    A.sort()
def testScope():
    L = [5, 4, 3, 2, 1]
    sorting(L)  # Passed by reference?
    return L
>>> print(testScope())
[1, 2, 3, 4, 5]
Long story short: Python uses pass-by-value, but the things that are passed by value are references. The actual objects have 0 to infinity references pointing at them, and for purposes of mutating that object, it doesn't matter who you are and how you got a reference to the object.
Going through your example step by step:
L = [...] creates a list object somewhere in memory, the local variable L stores a reference to that object.
sorting (strictly speaking, the callable object pointed to by the global name sorting) gets called with a copy of the reference stored by L, and stores it in a local called x.
The method sort of the object pointed to by the reference contained in x is invoked. It gets a reference to the object (in the self parameter) as well. It somehow mutates that object (the object, not some reference to the object, which is little more than a memory address).
Now, since references were copied, but not the object the references point to, all the other references we discussed still point to the same object. The one object that was modified "in-place".
testScope then returns another reference to that list object.
print uses it to request a string representation (calls the __str__ method) and outputs it. Since it's still the same object, of course it's printing the sorted list.
So whenever you pass an object anywhere, you share it with whoever receives it. Functions can (but usually won't) mutate the objects (pointed to by the references) they are passed, from calling mutating methods to assigning members. Note though that assigning a member is different from assigning a plain ol' name, which merely means mutating your local scope, not any of the caller's objects. So you can't mutate the caller's locals (this is why it's not pass-by-reference).
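The distinction between mutating a passed object and rebinding a local name can be sketched with two small functions; mutate and rebind are hypothetical names chosen for this illustration.

```python
def mutate(items):
    items.append(99)     # changes the object the caller also refers to


def rebind(items):
    items = [0]          # rebinds the local name only
    return items


data = [1, 2]
mutate(data)
print(data)              # [1, 2, 99]

result = rebind(data)
print(data)              # [1, 2, 99]  the caller's name is unaffected
print(result)            # [0]
```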
Further reading: A discussion on effbot.org why it's not pass-by-reference and not what most people would call pass-by-value.
Python has the concept of Mutable and Immutable objects. An object like a string or integer is immutable - every change you make creates a new string or integer.
Lists are mutable and can be manipulated in place. See below.
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print a is b, a is c
# False True
print a, b, c
# [1, 2, 3] [1, 2, 3] [1, 2, 3]
a.reverse()
print a, b, c
# [3, 2, 1] [1, 2, 3] [3, 2, 1]
print a is b, a is c
# False True
Note how c was reversed, because c "is" a. There are many ways to copy a list to a new object in memory. An easy method is to slice: c = a[:]
The documentation specifically mentions that the .sort() method mutates the list in place. If you want to iterate over a sorted collection without changing the original, use sorted(L) instead. This returns a new sorted list and leaves L untouched.
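A brief sketch of the difference between the two, using the list from the question:

```python
L = [5, 4, 3, 2, 1]

ordered = sorted(L)      # returns a new list
print(ordered)           # [1, 2, 3, 4, 5]
print(L)                 # [5, 4, 3, 2, 1]
print(ordered is L)      # False

result = L.sort()        # sorts in place and returns None
print(result)            # None
print(L)                 # [1, 2, 3, 4, 5]
```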
a = 1
b = a
a = 2
print b
References are not the same as separate objects.
.sort() also mutates the collection.
