Python identity operators [duplicate]

When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:
If the evaluation takes place in the function it returns True, if it is done outside it returns False.
>>> def func():
...     a = 1000
...     b = 1000
...     return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)
Since the is operator evaluates the id()'s for the objects involved, this means that a and b point to the same int instance when declared inside of function func but, on the contrary, they point to a different object when outside of it.
Why is this so?
Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. In addition, I'm also aware of the caching that Python performs for the integers in range [-5, 256] as described in "is" operator behaves unexpectedly with integers.
This isn't the case here since the numbers are outside that range and I do want to evaluate identity and not equality.

tl;dr:
As the reference manual states:
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class definition.
Each command typed interactively is a block.
This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal 1000, so id(a) == id(b) will yield True.
In the second case, you have two distinct code objects each with their own different object for the literal 1000 so id(a) != id(b).
Note that this behavior isn't limited to int literals; you'll get similar results with, for example, float literals.
Of course, comparing objects (except for explicit is None tests) should always be done with the equality operator == and not is.
Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ so no assumptions should be made when using them.
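As a minimal sketch of the point above, assuming CPython: the helper names bind_in_one_block and bind_in_two_blocks are made up for illustration. The same two assignments are compiled once as a single block and once as two separate blocks, and the resulting objects are compared by identity. The outcome of the second case is an implementation detail, so it is printed rather than relied upon.

```python
def bind_in_one_block():
    """Compile both assignments as ONE block, then compare identities."""
    namespace = {}
    code = compile("a = 1000\nb = 1000", "<one-block>", "exec")
    exec(code, namespace)
    return namespace["a"] is namespace["b"]


def bind_in_two_blocks():
    """Compile each assignment as its OWN block, then compare identities."""
    namespace = {}
    for source in ("a = 1000", "b = 1000"):
        exec(compile(source, "<separate-block>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


same_block = bind_in_one_block()
separate_blocks = bind_in_two_blocks()
print("one block:", same_block)
print("two blocks:", separate_blocks)  # implementation-dependent
```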
Longer Answer:
To get a little clearer view and additionally verify this seemingly odd behaviour we can look directly in the code objects for each of these cases using the dis module.
For the function func:
Along with all other attributes, function objects also have a __code__ attribute that allows you to peek into the compiled bytecode for that function. Using dis.code_info we can get a nice pretty view of all stored attributes in a code object for a given function:
>>> import dis
>>> print(dis.code_info(func))
Name: func
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 2
Stack size: 2
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
1: 1000
Variable names:
0: a
1: b
We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We only have a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.
Accessing this value is easy via func.__code__.co_consts[1] and so, another way to view our a is b evaluation in the function would be like so:
>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1])
Which, of course, will evaluate to True because we're referring to the same object.
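The same check can be sketched without hard-coding the index into co_consts, since the position of a constant in that tuple may differ between CPython versions. This is only an illustration under that assumption; the variable name matches is made up here.

```python
def func():
    a = 1000
    b = 1000
    return a is b


# Collect every int constant equal to 1000 stored in the function's code object.
consts = func.__code__.co_consts
matches = [c for c in consts if type(c) is int and c == 1000]

print(consts)
print(len(matches))  # how many distinct entries hold the literal 1000
```

Searching by value rather than by position keeps the sketch independent of where the compiler happens to place None in the constants tuple.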
For each interactive command:
As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.
We can get the code objects for each command via the compile built-in:
>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")
For each assignment statement, we will get a similar looking code object which looks like the following:
>>> print(dis.code_info(com1))
Name: <module>
Filename:
Argument count: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: NOFREE
Constants:
0: 1000
1: None
Names:
0: a
The same command for com2 produces output that looks the same, but there is a fundamental difference: each of the code objects com1 and com2 has its own int instance representing the literal 1000. This is why, in this case, when we do a is b via the co_consts attribute, we actually get:
>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False
Which agrees with what we actually got.
Different code objects, different contents.
Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.
During the compilation phase, the constants that end up in the co_consts attribute are tracked in a dictionary object. In compile.c we can actually see the initialization:
/* snippet for brevity */
u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();
/* snippet for brevity */
During compilation this dictionary is checked for already existing constants. See Raymond Hettinger's answer below for a bit more on this.
Caveats:
Chained statements will evaluate to an identity check of True
It should be more clear now why exactly the following evaluates to True:
>>> a = 1000; b = 1000;
>>> a is b
True
In this case, by chaining the two assignment commands together we tell the interpreter to compile these together. As in the case for the function object, only one object for the literal 1000 will be created resulting in a True value when evaluated.
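A small sketch of the chained case, again assuming CPython: compiling the chained line in "single" mode (the mode the interactive prompt uses) yields one code object, and its constants can be inspected directly. The names chained, int_consts and shared are illustrative only.

```python
# Compile the chained assignments the way the interactive prompt would.
chained = compile("a = 1000; b = 1000", "<chained>", "single")

# Keep only the int constants stored in the single code object.
int_consts = [c for c in chained.co_consts if type(c) is int]

namespace = {}
exec(chained, namespace)
shared = namespace["a"] is namespace["b"]

print(int_consts)
print(shared)
```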
Execution on a module level yields True again:
As previously mentioned, the reference manual states that:
... The following are blocks: a module ...
So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.
The same doesn't apply for mutable objects:
Meaning that unless we explicitly initialize to the same mutable object (for example with a = b = []), the identity of the objects will never be equal, for example:
a = []; b = []
a is b # always evaluates to False
Again, in the documentation, this is specified:
after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
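The guarantee for mutable objects quoted above can be illustrated with a short sketch (the variable names are arbitrary). Unlike the integer case, this behavior is specified by the language and does not depend on the implementation.

```python
# Two list displays always create two distinct objects.
a = []; b = []
distinct = a is not b

# A chained assignment binds both names to the SAME list object.
c = d = []
c.append(1)          # mutating through one name...
seen_through_d = d   # ...is visible through the other

print(distinct, c is d, seen_through_d)
```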

At the interactive prompt, entries are compiled in "single" mode, which processes one complete statement at a time. The compiler itself (in Python/compile.c) tracks the constants in a dictionary called u_consts that maps the constant object to its index.
In the compiler_add_o() function, you see that before adding a new constant (and incrementing the index), the dict is checked to see whether the constant object and index already exist. If so, they are reused.
In short, that means that repeated constants in one statement (such as in your function definition) are folded into one singleton. In contrast, your a = 1000 and b = 1000 are two separate statements, so no folding takes place.
FWIW, this is all just a CPython implementation detail (i.e. not guaranteed by the language). This is why the references given here are to the C source code rather than the language specification which makes no guarantees on the subject.
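The lookup-before-add idea described above can be modeled in a few lines of Python. This is a toy model, not CPython's actual C code: the class ConstantTable and its (type, value) key are assumptions made for illustration.

```python
class ConstantTable:
    """Toy model of a per-code-block constant table (hypothetical)."""

    def __init__(self):
        self._index = {}   # maps a constant key to its position
        self.consts = []   # the stored constant objects, in order

    def add(self, value):
        # Include the type in the key so that 1000 and 1000.0 stay separate.
        key = (type(value), value)
        index = self._index.get(key)
        if index is None:
            # Not seen before: store the object and hand out a new index.
            index = len(self.consts)
            self._index[key] = index
            self.consts.append(value)
        return index


table = ConstantTable()
first = table.add(1000)
second = table.add(int("1000"))   # equal value, possibly a different object
third = table.add(1000.0)         # same numeric value, different type

print(first, second, third, table.consts)
```

Because both requests for 1000 resolve to the same index, every use of that literal within the block ends up referring to the one stored object, which is the effect observed in the function body.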
Hope you enjoyed this insight into how CPython works under the hood :-)

Related

understanding python id() uniqueness

Python documentation for id() function states the following:
This is an integer which is guaranteed to be unique and constant for
this object during its lifetime. Two objects with non-overlapping
lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
Although, the snippet below shows that id's are repeated. Since I didn't explicitly del the objects, I presume they are all alive and unique (I do not know what non-overlapping means).
>>> g = [0, 1, 0]
>>> for h in g:
...     print(h, id(h))
...
0 10915712
1 10915744
0 10915712
>>> a=0
>>> b=1
>>> c=0
>>> d=[a, b,c]
>>> for e in d:
...     print(e, id(e))
...
0 10915712
1 10915744
0 10915712
>>> id(a)
10915712
>>> id(b)
10915744
>>> id(c)
10915712
>>>
How can the id values for different objects be the same? Is it so because the value 0 (object of class int) is a constant and the interpreter/C compiler optimizes?
If I were to do a = c, then I understand c to have the same id as a since c would just be a reference to a (alias). I expected the objects a and c to have different id values otherwise, but, as shown above, they have the same values.
What's happening? Or am I looking at this the wrong way?
I would expect the id's for user-defined class' objects to ALWAYS be unique even if they have the exact same member values.
Could someone explain this behavior? (I looked at the other questions that ask uses of id(), but they steer in other directions)
EDIT (09/30/2019):
To extend what I already wrote: I ran Python interpreters in separate terminals and checked the id of 0 in each of them. Separate instances of the same interpreter all reported exactly the same id for 0. Python 2 and Python 3 reported different values from each other, but every Python 2 instance reported the same value.
My question is because the id()'s documentation doesn't state any such optimizations, which seems misleading (I don't expect every quirk to be noted, but some note alongside the CPython note would be nice)...
EDIT 2 (09/30/2019):
The question stems from wanting to understand this behavior and to know whether there are any hooks to optimize user-defined classes in a similar way (by modifying the __eq__ method to identify whether two objects are the same; perhaps they would then point to the same address in memory, i.e. have the same id? Or by using some metaclass properties).
Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.
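A short sketch of what "non-overlapping lifetimes" means in practice. Objects that are alive at the same time must have distinct ids; objects whose lifetimes do not overlap may or may not share one, so that result is printed rather than assumed. The variable names are illustrative.

```python
# All of these objects are alive at once, so their ids must be distinct.
live = [object() for _ in range(1000)]
live_ids = {id(obj) for obj in live}
all_unique = len(live_ids) == len(live)

# Each temporary object below can be freed before the next one is created,
# so the two ids are allowed (but not required) to be equal.
first_id = id(object())
second_id = id(object())
reused = first_id == second_id

print(all_unique)
print(reused)  # implementation-dependent
```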
CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.
There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.
a is b or a == b # This is OK
If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you would only avoid the overhead of calling the .__eq__().
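A minimal sketch of that shortcut, using sys.intern from the standard library to force interning. The helper name fast_equal is made up for illustration.

```python
import sys


def fast_equal(a, b):
    """Identity check first, falling back to a normal equality test."""
    return a is b or a == b


# Build the strings at run time, then intern them explicitly.
left = sys.intern("".join(["some", "thing"]))
right = sys.intern("".join(["some", "thing"]))
interned_same = left is right

print(interned_same)
print(fast_equal(left, right))
print(fast_equal("abc", "abd"))
```

The result of fast_equal always matches a == b for strings; the identity check only changes how quickly the answer is reached.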
Thanks for the answer, would you elaborate around the uniqueness for user-defined objects, are they always unique?
The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.
>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False
Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naively think one should, but this does not in any way violate the uniqueness guarantee of IDs. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: If they are live at the same time, and their IDs are the same, then they are the same object.
Caching is not unique to builtin types. You can implement caching for your own objects.
>>> def the_one(it=object()):
...     return it
...
>>> the_one() is the_one()
True
Even user-defined classes can cache instances. For example, this class only makes one instance of itself.
>>> class TheOne:
...     _the_one = None
...     def __new__(cls):
...         if not cls._the_one:
...             cls._the_one = super().__new__(cls)
...         return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True
Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.
The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.
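As a sketch of caching more than one instance, the hypothetical class below keeps one instance per distinct pair of constructor arguments. The class name CachedPoint and its _instances dictionary are assumptions made for illustration, not an established recipe.

```python
class CachedPoint:
    """Hypothetical value class that hands out one instance per (x, y)."""

    _instances = {}

    def __new__(cls, x, y):
        key = (x, y)
        instance = cls._instances.get(key)
        if instance is None:
            # First request for this pair: create and remember the instance.
            instance = super().__new__(cls)
            instance.x = x
            instance.y = y
            cls._instances[key] = instance
        return instance


p = CachedPoint(1, 2)
q = CachedPoint(1, 2)
r = CachedPoint(3, 4)

print(p is q, p is r)
```

Note that a plain dict keeps every cached instance alive for the life of the program; weakref.WeakValueDictionary is one alternative if instances should be collectable once unused.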

why some types of data refer to the same memory location

I have the following question: why do only str and bool values, when assigned identically to two variables, refer to the same memory location?
a = 'something'
b = 'something'
if a is b: print('True') # True
But we did not write a = b anywhere. Hence the interpreter saw that the strings are equal to each other and made both variables refer to one memory location.
Of course, if we assign a new value to either of these two variables, there will be no conflict, so now the variable will refer to another memory location
b = 'something more'
if a is b: print('True') # False
The same thing happens with the bool type:
a = True
b = True
if a is b: print('True') # True
I first thought that this happens with all immutable types. But no: there is one more immutable type, tuple, and it behaves differently. When we assign the same values to two variables, they refer to different memory locations. Why does this happen only with tuples among the immutable types?
a = (1,9,8)
b = (1,9,8)
if a is b: print('True') # False
In Python, == checks for value equality, while is checks whether both operands are the same object, which is essentially: id(object) == id(object)
Python has some builtin singletons which it starts off with (I'm guessing lower integers and some commonly used strings)
So, if you dig deeper into your statement
a = 'something'
b = 'something'
id(a)
# 139702804094704
id(b)
# 139702804094704
a is b
# True
But if you change it a bit:
a = 'something else'
b = 'something else'
id(a)
# 139702804150640
id(b)
# 139702804159152
a is b
# False
We're getting False because Python uses different memory location for a and b this time, unlike before.
My guess is with tuples (and someone correct me if I'm mistaken) Python allocates different memory every time you create one.
Why do some types cache values? Because you shouldn't be able to notice the difference!
is is a very specialized operator. Nearly always you should use == instead, which will do exactly what you want.
The cases where you want to use is instead of == basically are when you're dealing with objects that have overloaded the behavior of == to not mean what you want it to mean, or where you're worried that you might be dealing with such objects.
If you're not sure whether you're dealing with such objects or not, you're probably not, which means that == is always right and you don't have to ever use is.
It can be a matter of "style points" to use is with known singleton objects, like None, but there's nothing wrong with using == there (again, in the absence of a pathological implementation of ==).
If you're dealing with potentially untrustworthy objects, then you should never do anything that may invoke a method that they control.... and that's a good place to use is. But almost nobody is doing that, and those who do should be aware of the zillion other ways a malicious object could cause problems.
If an object implements == incorrectly then you can get all kinds of weird problems. In the course of debugging those problems, of course you can and should use is! But that shouldn't be your normal way of comparing objects in code you write.
The one other case where you might want to use is rather than == is as a performance optimization, if the object you're dealing with implements == in a particularly expensive way. This is not going to be the case very often at all, and most of the time there are better ways to reduce the number of times you have to compare two objects (e.g. by comparing hash codes instead) which will ultimately have a much better effect on performance without bringing correctness into question.
If you use == wherever you semantically want an equality comparison, then you will never even notice when some types sneakily reuse instances on you.
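A small sketch of that advice: the hypothetical helper compare reports both results side by side. Only the equality results are dependable; the identity results for strings and tuples depend on the implementation and on how the objects were created, so they are printed without any expectation.

```python
def compare(x, y):
    """Return both the equality and the identity result for two objects."""
    return {"equal": x == y, "identical": x is y}


# Objects built at run time, so no compile-time constant sharing is involved.
tuple_result = compare(tuple([1, 9, 8]), tuple([1, 9, 8]))
str_result = compare("".join(["some", "thing"]), "something")
none_result = compare(None, None)

print(tuple_result)
print(str_result)
print(none_result)
```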

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that the Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In C, for example, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value has been assigned to it yet. When C runs i = 1000, it changes the value stored in the memory location i to 1000.
In Python, the memory location and size are irrelevant at the language level. The closest Python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells Python to create an integer object with a value of 1000, if it does not already exist, and bind it to the name 'i'. An object can be bound to multiple names quite easily, e.g.:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand; for example, I've always been confused by it even though I have a somewhat accurate understanding of how Python (and CPython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
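To make the "array of pointers" picture concrete, here is a small sketch (the names inner, outer, a and b are only for illustration). The list stores references to objects, not copies of them, so a change made through one reference is visible through all of them.

```python
inner = [0]
outer = [inner, inner]        # both slots refer to the same list object

inner.append(1)               # mutate the object, not the names
print(outer)                  # [[0, 1], [0, 1]]
print(outer[0] is outer[1])   # True

a = outer
b = a                         # after this, a and b point to the same object
print(a is b)                 # True
```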
There are a couple of tricky parts where this simple model of Python semantics seems to fail, for example with the augmented assignment operator += on lists. For that it's important to note that a += b in Python is not the same as a = a + b; it's a special in-place operation (which can also be defined for user types with the __iadd__ method; when that method exists, a += b is effectively a = a.__iadd__(b)).
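A quick sketch of the difference, using a second name (alias, chosen just for this example) to watch what happens to the original list object:

```python
a = [1, 2]
alias = a            # second name bound to the same list

a += [3]             # list.__iadd__ extends the list in place and returns it
print(alias)         # [1, 2, 3]
print(a is alias)    # True

a = a + [4]          # list.__add__ builds a new list; a is rebound to it
print(alias)         # [1, 2, 3]
print(a)             # [1, 2, 3, 4]
print(a is alias)    # False
```

For immutable types such as int or str there is no __iadd__, so a += b falls back to a = a + b and simply rebinds the name.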
Another important thing to understand is that while all variables in Python are indeed pointers, there is still no pointer concept. In other words, you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
    x += 1;
}
or in C by
void increment(int *x) {
    *x += 1;
}
cannot be defined in Python because there's no way to pass "a variable"; you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
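As a sketch of what such a callback closure could look like (increment, demo, get and set_ are hypothetical names for this example, not a standard API): the caller hands over a getter and a setter that close over its local variable, and the function works only through those.

```python
def increment(get, set_):
    # The function never sees the variable itself, only the callbacks.
    set_(get() + 1)

def demo():
    counter = 10

    def get():
        return counter

    def set_(value):
        nonlocal counter      # rebind the name in the enclosing scope
        counter = value

    increment(get, set_)
    return counter

print(demo())  # 11
```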
Who said you should? Unless you are discussing issues that are directly related to name binding operations, it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because the Python language reference uses it: being as specific and precise as possible helps resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what the difference is between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type. In Python it isn't: the type of the object that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.
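The first point can be sketched the same way (the name value is just for illustration). The type belongs to the object, so rebinding the name to a different object also changes the type you see through that name:

```python
value = 1000
first_type = type(value).__name__    # the name currently refers to an int
print(first_type)                    # int

value = "one thousand"               # rebind the same name to a str object
second_type = type(value).__name__
print(second_type)                   # str
```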
