Python's Passing by References [duplicate] - python

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
Closed 5 years ago.
Hello I am trying to understand how Python's pass by reference works.
I have an example:
>>>a = 1
>>>b = 1
>>>id(a);id(b)
140522779858088
140522779858088
This makes perfect sense since a and b are both referencing the same value that they would have the identity. What I dont quite understand is how this example:
>>>a = 4.4
>>>b = 1.0+3.4
>>>id(a);id(b)
140522778796184
140522778796136
Is different from this example:
>>>a = 2
>>>b = 2 + 0
>>>id(a);id(b)
140522779858064
140522779858064
Is it because in the 3rd example the 0 int object is being viewed as "None" by the interpreter and is not being recognized as needing a different identity from the object which variable "a" is referencing(2)? Whereas in the 2nd example "b" is adding two different int objects and the interpreter is allocating memory for both of those objects to be added, which gives variable "a", a different identity from variable "b"?

In your first example, the names a and b are both "referencing" the same object because of interning. The assignment statement resulted in an integer with the same id only because it has reused a preexisting object that happened to be hanging around in memory already. That's not a reliable behavior of integers:
>>> a = 257
>>> b = 257
>>> id(a), id(b)
(30610608, 30610728)
As demonstrated above, if you pick a big enough integer then it will behave as the floats in your second example behaved. And interning small integers is optional in the Python language anyway, this happens to be a CPython implementation detail: it's a performance optimization intended to avoid the overhead of creating a new object. We can speed things up by caching commonly used integer instances, at the cost of a higher memory footprint of the Python interpreter.
Don't think about "reference" and "value" when dealing with Python, the model that works for C doesn't really work well here. Instead think of "names" and "objects".
The diagram above illustrates your third example. 2 is an object, a and b are names. We can have different names pointing at the same object. And objects can exist without any name.
Assigning a variable only attaches a nametag. And deleting a variable only removes a nametag. If you keep this idea in mind, then the Python object model will never surprise you again.

As stated here, CPython caches integers from -5 to 256. So all variables within this range with the same value share the same id (I wouldn't bet on that for future versions, though, but that's the current implementation)
There's no such thing for floats probably because there's an "infinity" of possible values (well not infinite but big because of floating point), so the chance of computing the same value by different means is really low compared to integers.
>>> a=4.0
>>> b=4.0
>>> a is b
False

Python variables are always references to objects. These objects can be divided into mutable and immutable objects.
A mutable type can be modified without its id being modified, thus every variable that points to this object will get updated. However, an immutable object can not be modified this way, so Python generates a new object with the changes and reassigns the variable to point to this new object, thus the id of the variable before the change will not match the id of the variable after the change.
Integers and floats are immutable, so after a change they will point to a different object and thus have different ids.
The problem is that CPython "caches" some common integer values so that there are not multiple objects with the same value in order to save memory, and 2 is one of those cache-ed integers, so every time a variable points to the integer 2 it will have the same id (its value will be different for different python executions).

Related

Variable assignement and the relationship with the fact objects are mutable or unmutable [duplicate]

This question already has answers here:
Immutable vs Mutable types
(18 answers)
How do I pass a variable by reference?
(39 answers)
Closed 2 years ago.
Consider the following code:
aa=2
bb=aa
aa=4
print(bb) # displays 2
list1=[0]
list2=list1
list1[0]=5
print(list2) # displays [5]
As we see, the behavior is different from the variable containing number and list. I know that in the case of the numbers, when I do:
aa=4
I am telling to Python to create a new memory slot and putting the integer 4 in it. So before this line, bb and aa were referring to the same memory slot but not anymore after this line. bb still refers to the original memory slot containing the value 2 and it is why it hasn't been "updated".
For the list on the other hand, everything is the same before the following line
list1[0]=5
But right after this line, the value contained in the memory slot where list1 and list2 both refers to is changed. This is why list2 will also be changed after.
My questions:
Is the notion of mutable/unmutable object exactly related to this behavior ? In practice, for any mutable object, the same kind of behavior as the one for the list here would occur: if I change the value of an object a, then all object that were assigned to it will also be updated. But for unmutable object, if I change the value of this object a, all other object that were assigned to it before will not have been updated or it is simply impossible to change the value of a at all (like for tuples for instance).
Second but very highly related question. What confused me is that while being unmutable, numbers can be "modified" in the sense that the following code works:
a=2
a=3
Even though I know in practice a new memory is created at the second line (and the first line creates "an orphan"), this is a kind of specific behavior. For tuples it is simply impossible to modify them. Is there a logic behind this "dissymetry" in the behavior depending on the kind of variable within the unmutable ones ?

python atomic data types

It was written here that Python has both atomic and reference object types. Atomic objects are: int, long, complex.
When assigning atomic object, it's value is copied, when assigning reference object it's reference is copied.
My question is:
why then, when i do the code bellow i get 'True'?
a = 1234
b = a
print id(a) == id(b)
It seems to me that i don't copy value, i just copy reference, no matter what type it is.
Assignment (binding) in Python NEVER copies data. It ALWAYS copies a reference to the value being bound.
The interpreter computes the value on the right-hand side, and the left-hand side is bound to the new value by referencing it. If expression on the right-hand side is an existing value (in other words, if no operators are required to compute its value) then the left-hand side will be a reference to the same object.
After
a = b
is executed,
a is b
will ALWAYS be true - that's how assignment works in Python. It's also true for containers, so x[i].some_attribute = y will make x[i].some_attribute is y true.
The assertion that Python has atomic types and reference types seems unhelpful to me, if not just plain untrue. I'd say it has atomic types and container types. Containers are things like lists, tuples, dicts, and instances with private attributes (to a first approximation).
As #BallPointPen helpfully pointed out in their comment, mutable values can be altered without needing to re-bind the reference. Since immutable values cannot be altered, references must be re-bound in order to refer to a different value.
Edit: Recently reading the English version of the quoted page (I'm afraid I don't understand Russian) I see "Python uses dynamic typing, and a combination of reference counting and a cycle-detecting garbage collector for memory management." It's possible the Russian page has mistranslated this to give a false impression of the language, or that it was misunderstood by the OP. But Python doesn't have "reference types" except in the most particular sense for weakrefs and similar constructs.
int types are immutable.
what you see is the reference for the number 1234 and that will never change.
for mutable object like list, dictionary you can use
import copy
a = copy.deepcopy(b)
Actually like #spectras said there are only references but there are immutable objects like floats, ints, tuples. For immutable objects (apart from memory consumption) it just does not matter if you pass around references or create copies.
The interpreter even does some optimizations making use of numbers with the same value being interchangeable making checking numbers for identity interesting because eg for
a=1
b=1
c=2/2
d=12345
e=12345*1
a is b is true and a is c is also true but d is e is false (== works normally as expected)
Immutable objects are atomic the way that changing them is threadsafe because you do not actually change the object itself but just put a new reference in a variable (which is threadsafe).

What is an object reference in Python?

A introductory Python textbook defined 'object reference' as follows, but I didn't understand:
An object reference is nothing more than a concrete representation of the object’s identity (the memory address where the object is stored).
The textbook tried illustrating this by using an arrow to show an object reference as some sort of relation going from a variable a to an object 1234 in the assignment statement a = 1234.
From what I gathered off of Wikipedia, the (object) reference of a = 1234 would be an association between a and 1234 were a was "pointing" to 1234 (feel free to clarify "reference vs. pointer"), but it has been a bit difficult to verify as (1) I'm teaching myself Python, (2) many search results talk about references for Java, and (3) not many search results are about object references.
So, what is an object reference in Python? Thanks for the help!
Whatever is associated with a variable name has to be stored in the program's memory somewhere. An easy way to think of this, is that every byte of memory has an index-number. For simplicity's sake, lets imagine a simple computer, these index-numbers go from 0 (the first byte), upwards to however many bytes there are.
Say we have a sequence of 37 bytes, that a human might interpret as some words:
"The Owl and the Pussy-cat went to sea"
The computer is storing them in a contiguous block, starting at some index-position in memory. This index-position is most often called an "address". Obviously this address is absolutely just a number, the byte-number of the memory these letters are residing in.
#12000 The Owl and the Pussy-cat went to sea
So at address 12000 is a T, at 12001 an h, 12002 an e ... up to the last a at 12037.
I am labouring the point here because it's fundamental to every programming language. That 12000 is the "address" of this string. It's also a "reference" to it's location. For most intents and purposes an address is a pointer is a reference. Different languages have differing syntactic handling of these, but essentially they're the same thing - dealing with a block of data at a given number.
Python and Java try to hide this addressing as much as possible, where languages like C are quite happy to expose pointers for exactly what they are.
The take-away from this, is that an object reference is the number of where the data is stored in memory. (As is a pointer.)
Now, most programming languages distinguish between simple types: characters and numbers, and complex types: strings, lists and other compound-types. This is where the reference to an object makes a difference.
So when performing operations on simple types, they are independent, they each have their own memory for storage. Imagine the following sequence in python:
>>> a = 3
>>> b = a
>>> b
3
>>> b = 4
>>> b
4
>>> a
3 # <-- original has not changed
The variables a and b do not share the memory where their values are stored. But with a complex type:
>>> s = [ 1, 2, 3 ]
>>> t = s
>>> t
[1, 2, 3]
>>> t[1] = 8
>>> t
[1, 8, 3]
>>> s
[1, 8, 3] # <-- original HAS changed
We assigned t to be s, but obviously in this case t is s - they share the same memory. Wait, what! Here we have found out that both s and t are a reference to the same object - they simply share (point to) the same address in memory.
One place Python differs from other languages is that it considers strings as a simple type, and these are independent, so they behave like numbers:
>>> j = 'Pussycat'
>>> k = j
>>> k
'Pussycat'
>>> k = 'Owl'
>>> j
'Pussycat' # <-- Original has not changed
Whereas in C strings are definitely handled as complex types, and would behave like the Python list example.
The upshot of all this, is that when objects that are handled by reference are modified, all references-to this object "see" the change. So if the object is passed to a function that modifies it (i.e.: the content of memory holding the data is changed), the change is reflected outside that function too.
But if a simple type is changed, or passed to a function, it is copied to the function, so the changes are not seen in the original.
For example:
def fnA( my_list ):
my_list.append( 'A' )
a_list = [ 'B' ]
fnA( a_list )
print( str( a_list ) )
['B', 'A'] # <-- a_list was changed inside the function
But:
def fnB( number ):
number += 1
x = 3
fnB( x )
print( x )
3 # <-- x was NOT changed inside the function
So keeping in mind that the memory of "objects" that are used by reference is shared by all copies, and memory of simple types is not, it's fairly obvious that the two types operate differently.
Objects are things. Generally, they're what you see on the right hand side of an equation.
Variable names (often just called "names") are references to the actual object. When a name is on the right hand side of an equation1, the object that it references is automatically looked up and used in the equation. The result of the expression on the right hand side is an object. The name on the left hand side of the equation becomes a reference to this (possibly new) object.
Note, you can have object references that aren't explicit names if you are working with container objects (like lists or dictionaries):
a = [] # the name a is a reference to a list.
a.append(12345) # the container list holds a reference to an integer object
In a similar way, multiple names can refer to the same object:
a = []
b = a
We can demonstrate that they are the same object by looking at the id of a and b and noting that they are the same. Or, we can look at the "side-effects" of mutating the object referenced by a or b (if we mutate one, we mutate both because they reference the same object).
a.append(1)
print a, b # look mom, both are [1]!
1More accurately, when a name is used in an expression
In python, strictly speaking, the language has only naming references to the objects, that behave as labels. The assignment operator only binds to the name. The objects will stay in the memory until they are garbage collected
Ok, first things first.
Remember, there are two types of objects in python.
Mutable : Whose values can be changed. Eg: dictionaries, lists and user defined objects(unless defined immutable)
Immutable : Whose values can't be changed. Eg: tuples, numbers, booleans and strings.
Now, when python says PASS BY OBJECT REFERENECE, just remember that
If the underlying object is mutable, then any modifications done will persist.
and,
If the underlying object is immutable, then any modifications done will not persist.
If you still want examples for clarity, scroll down or click here .

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that The Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In python, the memory location and size is irrelevant to the interpreter. The closest python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells python to create an integer object with a value of 1000, if it does not already exist, and bind to to the name 'i'. An object can be bound to multiple names quite easily, e.g:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand, for example I've always been confused by it even if I've a somewhat accurate understanding of how Python (and cpython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
There are a couple of tricky parts where this simple model of Python semantic seems to fail, for example with list augmented operator += but for that it's important to note that a += b in Python is not the same as a = a + b but it's a special increment operation (that can also be defined for user types with the __iadd__ method; a += b is indeed a = a.__iadd__(b)).
Another important thing to understand is that while in Python all variables are indeed pointers still there is no pointer concept. In other words you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
x += 1;
}
or in C by
void increment(int *x) {
*x += 1;
}
in Python cannot be defined because there's no way to pass "a variable", you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
who said you should? Unless you are discussing issues that are directly related to name binding operations; it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because Python language reference uses it: to be as specific and precise as possible, to help resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type - in Python it isn't the type that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.

Why do Python variables take a new address (id) every time they're modified?

Just wondering what the logic behind this one is? On the surface it seems kind of inefficient, that every time you do something simple like "x=x+1" that it has to take a new address and discard the old one.
The Python variable (called an identifier or name, in Python) is a reference to a value. The id() function says something for that value, not the name.
Many values are not mutable; integers, strings, floats all do not change in place. When you add 1 to another integer, you return a new integer that then replaces the reference to the old value.
You can look at Python names as labels, tied to values. If you imagine values as balloons, you are retying the label a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often used and small values, Python actually uses a process called interning, where it will cache a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the intern() function to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
Because the basic types are immutable, so every time you modify it, it needs to be instantiated again
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned at least in cPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
In python "primitive" types like ints and strings are immutable, which means they can not be modified.
Python is actually quite efficient, because, as #Wooble commented, «Very short strings and small integers are interned.»: if two variables reference the same (small) immutable value their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
The reason behind the use of immutable types is a safe approach to the concurrent access on those values.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, assignment x=x+1 doesn't modify the object that x is referencing, it just lets the x point to another object whose value is x+1.
To understand the logic behind, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics means only its value matters, not its identity. While an object with reference semantics focuses on its identity(in Python, identity can be returned from id(obj)).
Typically, value semantics implies immutability of the object. Or conversely, if an object is mutable(i.e. in-place change), that means it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value(not its identity) is significant, when a change is needed, it's safe to swap it to another identity with different value. This is called referential transparency. Be noted that this is impossible for the objects with reference semantics.
Secondly, this is beneficial in practice. As the OP thought, it seems inefficient to discard the old objects each time when it's changed , but most time it's more efficient than not. For one thing, Python(or any other language) has intern/cache scheme to make less objects to be created. What's more, if objects of value-semantics were designed to be mutable, it would take much more space in most cases.
For example, Date has a value semantics. If it's designed to be mutable, any method that returning a date from internal field will exposes the handle to outside world, which is risky(e.g. outside can directly modify this internal field without resorting to public interface). Similarly, if one passes any date object by reference to some function/method, this object could be modified in that function/method, which may be not as expected. To avoid these kinds of side-effect, one has to do defensive programming: instead of directly returning the inner date field, he returns a clone of it; instead of passing by reference, he passes by value which means extra copies are made. As one could imagine, there are more chances to create more objects than necessary. What's worse, code becomes more complicated with these extra cloning.
In a word, immutability enforces the value-semantics, it usually involves less object creation, has less side-effects and less hassles, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means less locks and better efficiency in multithreading environment.
That's the reason why basic data types of value-semantics like number, string, date, time are all immutable(well, string in C++ is an exception, that's why there're so many const string& stuffs to avoid string being modified unexpectedly). As a lesson, Java made mistakes on designing value-semantic class Date, Point, Rectangle, Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.

Categories

Resources