Understanding python's name binding - python

I am trying to clarify for myself Python's rules for 'assigning' values
to variables.
Is the following comparison between Python and C++ valid?
In C/C++ the statement int a=7 means, memory is allocated for an integer variable called a (the quantity on the LEFT of the = sign)
and only then the value 7 is stored in it.
In Python the statement a=7 means, a nameless integer object with value 7 (the quantity on the RIGHT side of the =) is created first and stored somewhere in memory. Then the name a is bound to this object.
The output of the following C++ and Python programs seems to bear this out, but I would like some feedback on whether I am right.
C++ produces different memory locations for a and b
while a and b seem to refer to the same location in Python
(going by the output of the id() function)
C++ code
#include <iostream>
using namespace std;
int main(void)
{
    int a = 7;
    int b = a;
    cout << &a << " " << &b << endl; // a and b point to different locations in memory
    return 0;
}
Output: 0x7ffff843ecb8 0x7ffff843ecbc
Python code
a = 7
b = a
print(id(a), id(b)) # a and b seem to refer to the same location
Output: 23093448 23093448

Yes, you're basically correct. In Python, a variable name can be thought of as a binding to a value. This is one of those "a ha" moments people tend to experience when they truly start to grok (deeply understand) Python.
Assigning to a variable name in Python makes the name bind to a different value from what it currently was bound to (if indeed it was already bound), rather than changing the value it currently binds to:
a = 7 # Create 7, bind a to it.
# a -> 7
b = a # Bind b to the thing a is currently bound to.
# a
# \
# *-> 7
# /
# b
a = 42 # Create 42, bind a to it, b still bound to 7.
# a -> 42
# b -> 7
I say "create" but that's not necessarily so - if a value already exists somewhere, it may be re-used.
Where the underlying data is immutable (cannot be changed), that usually makes Python look as if it's behaving identically to the way other languages do (C and C++ come to mind). That's because the 7 (the actual object that the names are bound to) cannot be changed.
But, for mutable data (same as using pointers in C or references in C++), people can sometimes be surprised because they don't realise that the value behind it is shared:
>>> a = [1,2,3] # a -> [1,2,3]
>>> print(a)
[1, 2, 3]
>>> b = a # a,b -> [1,2,3]
>>> print(b)
[1, 2, 3]
>>> a[1] = 42 # a,b -> [1,42,3]
>>> print(a) ; print(b)
[1, 42, 3]
[1, 42, 3]
You need to understand that a[1] = 42 is different to a = [1, 42, 3]. The latter is an assignment, which would result in a being re-bound to a different object, and therefore independent of b.
The former is simply changing the mutable data that both a and b are bound to, which is why it affects both.
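The difference between mutating and re-binding can be checked directly with the is operator. This is only a small sketch; the helper names mutated_view, still_same_object and still_equal are made up for the illustration:

```python
# Two names bound to one list object.
a = [1, 2, 3]
b = a
assert a is b                 # same object, not merely equal

# Mutation: changes the shared object, so b sees it too.
a[1] = 42
mutated_view = list(b)        # snapshot of what b sees after the mutation

# Re-binding: a now names a brand-new list; b keeps the old one.
a = [1, 42, 3]
still_same_object = a is b    # identity comparison
still_equal = a == b          # value comparison

print(mutated_view, still_same_object, still_equal)
```

After the re-binding the two lists compare equal but are no longer the same object, so later changes made through a will not show up in b.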
There are ways to get independent copies of a mutable value, with things such as:
b = a[:]
b = [item for item in a]
b = list(a)
These will work to one level (b = a can be thought of as working to zero levels) meaning if the a list contains other mutable things, those will still be shared between a and b:
>>> a = [1, [2, 3, 4], 5]
>>> b = a[:]
>>> a[0] = 8 # This is independent.
>>> a[1][1] = 9 # This is still shared.
>>> print(a) ; print(b) # Shared bit will 'leak' between a and b.
[8, [2, 9, 4], 5]
[1, [2, 9, 4], 5]
For a truly independent copy, you can use deepcopy, which will work down to as many levels as needed to separate the two objects.
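As a rough sketch of the difference, assuming the standard library copy module (the names shallow and deep are just for illustration):

```python
import copy

a = [1, [2, 3, 4], 5]
shallow = a[:]             # new outer list, inner list still shared with a
deep = copy.deepcopy(a)    # new outer list and a new inner list

a[1][1] = 9                # mutate the nested list through a

print(shallow)             # the nested change leaks into the shallow copy
print(deep)                # the deep copy is unaffected
```

The shallow copy shares the nested list with a, while the deep copy owns its own nested list.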

In your example code, int is a built-in type in C++, so its "=" operator cannot be overloaded and always copies the value. In Python, "=" does not necessarily create a new object; it can also make a second name refer to the same object. Python's object model is somewhat like Java's object references: a variable holds a reference to an object, not a copy of it.
You can also try this:
a = 7
b = 7
print(id(a), id(b))
This prints the same id twice, because CPython keeps a cache of small integers, so both a and b end up bound to the same 7 object.

Related

difference in variable assigning in Python between integer and list

I am studying Wes McKinney's 'Python for data analysis'.
At some point he says:
"When assigning a variable (or name) in Python, you are creating a reference to the object on the righthand side of the equals sign. In practical terms, consider a list of integers:
In [8]: a = [1, 2, 3]
In [9]: b = a
In [11]: a.append(4)
In [12]: b
output will be:
Out[12]: [1, 2, 3, 4]
He reasons as such:
"In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In Python, a and b actually now refer to the same object, the original list"
My question is: why does the same thing not happen in the case below?
In [8]: a = 5
In [9]: b = a
In [11]: a +=1
In [12]: b
Where I still get
Out[12]: 5
for b?
In the first case, you create a list, and both a and b refer to it. When you change the list, both names still refer to that same list, so both see the change.
In the second case, the name refers to an integer, and 5 is still 5: you are not changing the integer. You are changing which object a refers to. So a now points at the value 6, while b still points at 5. You are not changing the thing that a points to; you are changing WHAT a points to. b is unaffected by that.
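Both cases can be put side by side. This is a minimal sketch; int_result and list_result are names invented here just to collect the values:

```python
# Integer case: += cannot change the int 5, so it builds 6 and re-binds a.
a = 5
b = a
a += 1
int_result = (a, b)

# List case: append changes the one list that both names refer to.
x = [1, 2, 3]
y = x
x.append(4)
list_result = (x, y, x is y)

print(int_result)
print(list_result)
```

In the integer case only a moves on to a new object; in the list case there is still a single object, seen through two names.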

What happens internally when concatenating two lists in Python?

When concatenating two lists,
a = [0......, 10000000]
b = [0......, 10000000]
a = a + b
does the Python runtime allocate a bigger array and loop through both arrays and put the elements of a and b into the bigger array?
Or does it loop through the elements of b and append them to a and resize as necessary?
I am interested in the CPython implementation.
In CPython, two lists are concatenated in function list_concat.
You can see in the linked source code that that function allocates the space needed to fit both lists.
size = Py_SIZE(a) + Py_SIZE(b);
np = (PyListObject *) list_new_prealloc(size);
Then it copies the items from both lists to the new list.
for (i = 0; i < Py_SIZE(a); i++) {
...
}
...
for (i = 0; i < Py_SIZE(b); i++) {
...
}
You can find out by looking at the id of a before and after concatenating b:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025874463112
>>> a = a + b
>>> id(a)
140025874467144
Here, since the id is different, we see that the interpreter has created a new list and bound it to the name a. The old list that a referred to will be reclaimed once nothing else references it.
However, the behaviour can be different when using the augmented assignment operator +=:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> id(a)
140025844068296
>>> a += b
>>> id(a)
140025844068296
Here, since the id is the same, we see that the interpreter has reused the same list object a and appended the values of b to it.
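The practical consequence shows up when a second name refers to the list. A small sketch, where alias and after_iadd are names made up for the example:

```python
a = [1, 2, 3]
alias = a                 # second name for the same list

a += [4]                  # in place: the shared list grows
after_iadd = list(alias)  # snapshot of what alias sees

a = a + [5]               # builds a new list and re-binds a only

print(after_iadd)
print(alias)
print(a)
print(a is alias)
```

The += change is visible through alias, whereas a = a + [5] leaves alias pointing at the old four-element list.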
For more detailed information, see these questions:
Why does += behave unexpectedly on lists?
Does list concatenation with the `+` operator always return a new `list` instance?
You can see the implementation in listobject.c::list_concat. Python will get the size of a and b and create a new list object of that size. It will then loop through the values of a and b, which are C pointers to python objects, increment their ref counts and add those pointers to the new list.
It will create a new list with a shallow copy of the items in the first list, followed by a shallow copy of the items in the second list. The + operator calls the object.__add__(self, other) method. For example, for the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called. You can read more in the documentation.
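To see the __add__ dispatch in isolation, here is a sketch with a hypothetical Bag class (not part of Python or any library) that mimics what list concatenation does: build a new object and leave both operands alone.

```python
class Bag:
    """Hypothetical list-like wrapper, used only for illustration."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Called for `self + other`; returns a new Bag and
        # leaves both operands unchanged.
        return Bag(self.items + other.items)


x = Bag([1, 2])
y = Bag([3])
z = x + y                 # equivalent to x.__add__(y)

print(z.items, x.items, y.items)
```

Because __add__ returns a fresh Bag, z is a different object from x, just as a + b on lists gives a list with a new id.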

What's the point of assignment to slice?

I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand the line above is doing the same as below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Another thing is that the first case is two times slower than the second:
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
I think the reason is that, with slice assignment, the references in a are copied to a new list and then copied back into the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
"With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing."
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
"Another thing is that the first case is two times slower than the second:"
Not really. Slice assignment performs a copy. Performing a copy is an O(n) operation while performing a name assignment is O(1). In other words, the bigger the list, the slower the copy; whereas the name assignment always takes the same (short) time.
Your assumptions are very good!
In Python a variable is a name that has been set to point to an object in memory, which is in essence what allows Python to be a dynamically typed language: the same variable can refer to a number and later be reassigned to a string, and so on.
As shown here, whenever you assign a new value to a variable, you are just pointing the name at a different object in memory:
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(In CPython the id is the object's address in memory.)
Now for your question: sys.path is a list, and a Python list is a mutable type, meaning that the object itself can be changed in place, e.g.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
Even though I modified the list by adding items, l still refers to the same object. In keeping with the rest of Python, a list's elements are themselves only references to objects elsewhere in memory (the elements aren't the objects; they act like variables bound to the objects held there), as shown here:
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After assigning to l[0], the id of that element has changed, but once again the id of the list has not.
Having seen that assigning to an index only changes what that list element points to, you can see that when I assign to l itself I do not modify the list; I just change what the name l points to:
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
But if I assign to all of l's indexes with a slice, l stays the same object and only its elements point to different places:
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That also explains why it is slower: Python is reassigning the elements of the list, not just pointing the name somewhere else.
One more small point, in case you are wondering about the line ending with
sys.path[:] = ... + sys.path
It follows the same idea: Python first creates the object on the right side of the = and only then performs the assignment. While Python is still building the new list on the right side, sys.path is the original list; Python takes all of its elements and then stores the elements of the newly created list into the original sys.path object (since we used [:]).
As for why pip uses [:] instead of rebinding, I don't really know, but I believe the benefit is that the same list object in memory is reused for sys.path.
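A sketch of that evaluation order, using a made-up search_path list as a stand-in for sys.path (the paths are invented for the example):

```python
search_path = ["/usr/lib", "/opt/lib"]   # hypothetical stand-in for sys.path
alias = search_path                       # another reference held elsewhere

# The right-hand side is built first as a temporary list; its items
# then replace the contents of the existing list object.
search_path[:] = ["/wheels/a.whl"] + search_path

print(alias)
print(alias is search_path)
```

Any code that grabbed a reference to the list earlier (alias here) sees the new entries, which would not be the case after a plain search_path = ... assignment.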
Python itself also reuses objects for small integers, for example
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory even though each assignment asked for a 1, since Python expects small numbers to be used a lot in programs (for example in for loops), so it creates them once and reuses them throughout.
(You might also find this with file handles, which Python may recycle instead of creating new ones.)
You are right: slice assignment does not rebind the name. A slice is itself an object in Python, and you can use it both to get and to set items.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]

Why do python lists act like this when using the = operator

How come the following code:
a = [1,2,3]
b = a
b[0] = 3
print(a)
print the list as altered through b, [3, 2, 3]?
And why, given that, does the following code:
a = [1,2,3]
b = a
b = [0,0,0]
print(a,b)
print [1, 2, 3] [0, 0, 0]? This seems inconsistent. If the first behaviour holds, then shouldn't the second code print [0, 0, 0] [0, 0, 0]? Can someone please provide an explanation for this?
In Python there are two kinds of data: mutable and immutable. Numbers, strings, booleans, tuples, and other simple types are immutable. Dicts, lists, sets, most user-defined objects and classes, and other complex types are mutable.
When you say:
a = [1,2,3]
b = a
You've created a single mutable list in memory, assigned a to point to it, and then assigned b to point to it. It's the same thing in memory.
Therefore when you mutate it (modify it):
b[0] = 3
It is a modification (mutation) of the index [0] of the value which b points to at that same memory location.
However, when you replace it:
b = [0,0,0]
It is creating a new mutable list in memory and assigning b to point at it.
Check out the id() function. It will tell you the "address" of any variable. You can see which names are pointing to the same memory location with id(varname).
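For the two snippets from the question, the id() check might look like this. It is only a sketch, and the same_* names are invented for the example:

```python
a = [1, 2, 3]
b = a
same_before = id(a) == id(b)            # one list, two names

b[0] = 3                                # mutate the list through b
same_after_mutation = id(a) == id(b)    # still the same list
a_after_mutation = list(a)              # a sees the change

b = [0, 0, 0]                           # re-bind b to a new list
same_after_rebinding = id(a) == id(b)   # now two different lists

print(same_before, same_after_mutation, same_after_rebinding)
print(a_after_mutation, a, b)
```

Mutation leaves the ids equal, while re-binding gives b a different id and leaves a alone.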
Bonus: Every value in python is passed by reference... meaning that when you assign it to a variable it simply causes that variable to point to that value where it was in memory. Having immutable types allows python to "reuse" the same memory location for common immutable types.
Consider some common values when the interpreter starts up:
>>> import sys
>>> sys.getrefcount('abc')
68
>>> sys.getrefcount(100)
110
>>> sys.getrefcount(2)
6471
However, a value that is definitely not present would return 2. This has to do with the fact that a couple of references to that value were in-use during the call to sys.getrefcount
>>> sys.getrefcount('nope not me. I am definitely not here already.')
2
Notice that an empty tuple has a lot of references:
>>> sys.getrefcount(tuple())
34571
But an empty list has no extra references:
>>> sys.getrefcount(list())
1
Why is this? Because tuple is immutable so it is fine to share that value across any number of variables. However, lists are mutable so they MUST NOT be shared across arbitrary variables or changes to one would affect the others.
Incidentally, this is also why you must NEVER use mutable types as default argument values to functions. Consider this innocent little function:
>>> def foo(value=[]):
...     value.append(1)
...     print(value)
...
When you call it you might expect to get [1] printed...
>>> foo()
[1]
However, when you call it again, you probably won't expect to get [1, 1] out:
>>> foo()
[1, 1]
And on and on...
>>> foo()
[1, 1, 1]
>>> foo()
[1, 1, 1, 1]
WHY IS THIS? Because default arguments to functions are evaluated once during function definition, and not at function run time. That way if you use a mutable value as a default argument value, then you will be stuck with that one value, mutating in unexpected ways as the function is called multiple times.
The proper way to do it is this:
>>> def foo(value=None):
...     if value is None:
...         value = []
...     value.append(1)
...     print(value)
...
>>> foo()
[1]
>>> foo()
[1]
>>> foo()
[1]
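The "evaluated once" behaviour of the mutable default can be made visible through the function's __defaults__ attribute. This is a sketch that returns the list instead of printing it, so the objects can be compared; default_list, first and second are names made up here:

```python
def foo(value=[]):
    value.append(1)
    return value


# The single list object created when the def statement ran.
default_list = foo.__defaults__[0]

first = foo()
second = foo()

print(first is second)        # both calls returned the same object
print(second is default_list) # and it is the stored default itself
print(default_list)
```

Every call without an argument appends to that one stored list, which is why the output keeps growing.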

How to create a new data object in memory, rather than pointing to one? (in Python 3)

As an illustration of my question, say I want to swap two elements in an array:
# Array Integer Integer -> Array
# I want to swap the values at locations i1 and i2.
# I want to return the array with values swapped.
def swap(A, i1, i2):
    newA = A
    newA[i1] = A[i2]
    newA[i2] = A[i1]
    return newA
Run this code, and an array is returned with only one value changed:
> testArray = [1, 2, 3, 4]
> swap(testArray, 0, 1)
[2, 2, 3, 4]
Also, if I now check what testArray is (I want it to still be [1, 2, 3, 4]):
> testArray
[2, 2, 3, 4]
So my questions are:
I guess newA = A uses a pointer to A. I'm used to programming in a style where I return a new data structure each time. I'd like to create a whole new array, newA, which just has the same values as A. Then I can let garbage collection take care of newA later. Can I do this in python?
What is newA = A really doing?
Why would someone create a new variable (like newA) to point to the old one (A)? Why wouldn't they just mutate A directly?
And why does the syntax behave differently for atomic data?
i.e.
a = 1
b = a # this same syntax doesn't seem to be a pointer.
b = 2
> a
1
If it is a list of integers then you can do:
def swap(A, i1, i2):
    temp = A[i1]
    A[i1] = A[i2]
    A[i2] = temp
    return A
or, in a more pythonic way:
def swap(A, i1, i2):
    A[i1], A[i2] = A[i2], A[i1]
    return A
-
newA = A
This creates an "alias": both variables refer to the same list in memory. When you change a value through A, the change is visible through newA too.
See the visualization on PythonTutor.com (it is a long link containing the Python code):
http://pythontutor.com/visualize.html#code=A+%3D+%5B1,+2,+3,+4%5D%0A%0AnewA+%3D+A&mode=display&origin=opt-frontend.js&cumulative=false&heapPrimitives=false&textReferences=false&py=2&rawInputLstJSON=%5B%5D&curInstr=2
-
To create a copy you can use slicing
newA = A[:] # Python 2 & 3
or the copy module
import copy
newA = copy.copy(A) # shallow copy
newA = copy.deepcopy(A) # deep copy
or, on Python 3,
newA = A.copy()
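Putting a copy together with the swap gives the "return a new array" style the question asks for. A minimal sketch, assuming a flat list; swapped is a name invented here to distinguish it from the in-place swap:

```python
def swapped(A, i1, i2):
    """Return a new list with positions i1 and i2 exchanged; A is left untouched."""
    newA = A[:]                                  # shallow copy of the input
    newA[i1], newA[i2] = newA[i2], newA[i1]      # swap inside the copy only
    return newA


testArray = [1, 2, 3, 4]
result = swapped(testArray, 0, 1)

print(result)
print(testArray)
```

Since only the copy is modified, testArray keeps its original order and the caller gets a separate list back.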
-
Every variable in Python holds a reference to an object, whether that object is an integer, a float or a list; assignment never copies the data. Integers and floats are immutable, so sharing them is never noticeable, while a list can be changed in place and the change is visible through every name bound to it. Passing a reference (to a function or class) is also cheaper than cloning all the data.
a = 1
b = a # b refers to the same int object as a
a = [1,2,3] # a is rebound to a list object
b = a # b refers to the same list as a
How to create a new list?
Some ways and timings for large and small lists:
>>> from timeit import timeit
>>> for way in 'A[:]', 'list(A)', 'A.copy()':
...     print(way, timeit(way, 'A = list(range(100000))', number=10000))
A[:] 7.3193273699369
list(A) 7.248674272188737
A.copy() 7.366528860679182
>>> for way in 'A[:]', 'list(A)', 'A.copy()':
...     print(way, timeit(way, 'A = list(range(10))', number=10000000))
A[:] 4.324301856050852
list(A) 7.022488782549999
A.copy() 4.61609732160332
What is newA = A really doing?
Makes variable newA reference the same object A references.
Why would someone create a new variable (like newA) to point to the old one (A)? Why wouldn't they just mutate A directly?
Just an example:
if <something>:
    now = A
    ...
else:
    now = B
    ...
<modify now>
And why does the syntax behave differently for atomic data?
It doesn't. It does make the new variable reference the same object, also for ints. You just don't notice it, because ints can't be changed. But you can see it by testing with is:
>>> a = 1234
>>> b = a
>>> b is a
True <== See?
>>> b = 1234
>>> b is a
False <== And now it's a different 1234 object
