I am puzzled by an issue with variable assignment in Python. I tried looking for pages on the internet about this, but I still do not understand it.
Let's say I assign a certain value to the variable:
v = 2
Python reserves a place in memory holding the value 2. We now know how to access that particular memory location through the variable name we bound to it: v
If then, we do this:
v = v + 4
A new memory location will be filled with the value 6 (2+4), and to access this value we now bind the variable v to it.
Is this correct, or completely incorrect?
And what happens with the initial value 2 in the memory location? It is no longer bound to any variable, so this value can not be accessed any more? Does this mean that in the next couple of minutes, this memory location will be freed by the garbage collector?
Thank you for the reply.
EDIT:
The v = 2 and v = v + 4 are just simple examples. In my case, instead of 2 and v + 4, I have non-native Python objects which can weigh from tens of megabytes up to 400 megabytes.
I apologize for not making this clear at the very start. I did not know there was a difference between integers and objects of other types.
You're correct, as far as this goes. Actually, what happens is that Python constructs integer objects 2 and 4. In the first assignment, the interpreter makes v point to the 2 object.
In the second assignment, we construct an object 6 and make v point to that, as you suspected.
What happens next is dependent on the implementation. Most interpreters and compilers that use objects for even atomic values will recognize the 2 object as a small integer, likely to be used again, and very cheap to maintain -- so it won't get collected. In general, a literal appearing in the code will not be turned over for garbage collection.
Thanks for updating the question. Yes, your large object, with no references to it, is now available for GC.
The example with small integers (-5 to 256) is not the best one because these are represented in a special way (see this for example).
You are generally correct about the flow of things, so if you had something like
v = f(v)
the following things happen:
a new object gets created as a result of evaluating f(v) (modulo the small integer or literal optimization)
this new object is bound to variable v
whatever was previously in v is now inaccessible and, generally, may get garbage collected soon
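A minimal sketch of that flow, assuming CPython's reference counting (Big and f are hypothetical stand-ins for your large object and whatever function produces a new one):
class Big:
    """Stand-in for a large, expensive object (hypothetical example)."""
    def __del__(self):
        print("old object reclaimed")

def f(v):
    return Big()   # evaluating f(v) creates a brand-new object

v = Big()          # v is bound to the first object
v = f(v)           # v is rebound to the new object; the first one now has
                   # no references left, so CPython frees it immediately and
                   # prints "old object reclaimed" (other implementations may
                   # defer collection; the second object is reclaimed the same
                   # way when the program exits)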
Related
As follow is my understanding of types & parameters passing in java and python:
In java, there are primitive types and non-primitive types. Former are not object, latter are objects.
In python, they are all objects.
In java, arguments are passed by value because:
Primitive types are copied and then passed, so they are passed by value for sure. Non-primitive types are passed as references, but the reference (pointer) is itself a value, so they are also passed by value.
In Python, the only difference is that 'primitive types' (for example, numbers) are not copied, but are simply treated as objects.
Based on the official docs, arguments are passed by assignment. What does 'passed by assignment' mean? Do objects in Java work the same way as in Python? What accounts for the difference (passed by value in Java vs. passed by assignment in Python)?
And is anything in my understanding above wrong?
tl;dr: You're right that Python's semantics are essentially Java's semantics, without any primitive types.
"Passed by assignment" is actually making a different distinction than the one you're asking about.1 The idea is that argument passing to functions (and other callables) works exactly the same way assignment works.
Consider:
def f(x):
    pass
a = 3
b = a
f(a)
b = a means that the target b, in this case a name in the global namespace, becomes a reference to whatever value a references.
f(a) means that the target x, in this case a name in the local namespace of the frame built to execute f, becomes a reference to whatever value a references.
The semantics are identical. Whenever a value gets assigned to a target (which isn't always a simple name—e.g., think lst[0] = a or spam.eggs = a), it follows the same set of assignment rules—whether it's an assignment statement, a function call, an as clause, or a loop iteration variable, there's just one set of rules.
But overall, your intuitive idea that Python is like Java but with only reference types is accurate: You always "pass a reference by value".
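A small sketch of what "pass a reference by value" means in practice (the names f and a are just illustrative):
a = [1, 2, 3]

def f(x):
    x.append(4)   # mutates the object that both x and a refer to
    x = []        # rebinds only the local name x; a is unaffected

f(a)
print(a)          # [1, 2, 3, 4]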
Arguing over whether that counts as "pass by reference" or "pass by value" is pointless. Trying to come up with a new unambiguous name for it that nobody will argue about is even more pointless. Liskov invented the term "call by object" three decades ago, and if that never caught on, anything someone comes up with today isn't likely to do any better.
You understand the actual semantics, and that's what matters.
And yes, this means there is no copying. In Java, only primitive values are copied, and Python doesn't have primitive values, so nothing is copied.
the only difference is that 'primitive types'(for example, numbers) are not copied, but simply taken as objects
It's much better to see this as "the only difference is that there are no 'primitive types' (not even simple numbers)", just as you said at the start.
It's also worth asking why Python has no primitive types—or why Java does.2
Making everything "boxed" can be very slow. Adding 2 + 3 in Python means dereferencing the 2 and 3 objects, getting the native values out of them, adding them together, and wrapping the result up in a new 5 object (or looking it up in a table because you already have an existing 5 object). That's a lot more work than just adding two ints.3
While a good JIT like Hotspot—or like PyPy for Python—can often automatically do those optimizations, sometimes "often" isn't good enough. That's why Java has native types: to let you manually optimize things in those cases.
Python, instead, relies on third-party libraries like Numpy, which let you pay the boxing costs just once for a whole array, instead of once per element. Which keeps the language simpler, but at the cost of needing Numpy.4
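As a rough sketch of the idea (assuming numpy is installed; the array sizes are arbitrary):
import numpy as np

xs = list(range(100_000))
ys = list(range(100_000))

# Pure Python: every addition unboxes two int objects and boxes a new one.
slow = [x + y for x, y in zip(xs, ys)]

# NumPy: the boxing cost is paid once per array; the element-wise loop runs
# over native machine integers in C.
fast = np.array(xs) + np.array(ys)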
1. As far as I know, "passed by assignment" appears a couple times in the FAQs, but is not actually used in the reference docs or glossary. The reference docs already lean toward intuitive over rigorous, but the FAQ, like the tutorial, goes much further in that direction. So, asking what a term in the FAQ means, beyond the intuitive idea it's trying to get across, may not be a meaningful question in the first place.
2. I'm going to ignore the issue of Java's lack of operator overloading here. There's no reason they couldn't include special language rules for a handful of core classes, even if they didn't let you do the same thing with your own classes—e.g., Go does exactly that for things like range, and people rarely complain.
3. … or even than looping over two arrays of 30-bit digits, which is what Python actually does. The cost of working on unlimited-size "bigints" is tiny compared to the cost of boxing, so Python just always pays that extra, barely-noticeable cost. Python 2 did, like Java, have separate fixed and bigint types, but a couple decades of experience showed that it wasn't getting any performance benefits out of the extra complexity.
4. The implementation of Numpy is of course far from simple. But using it is pretty simple, and a lot more people need to use Numpy than need to write Numpy, so that turns out to be a pretty decent tradeoff.
Similar to passing reference types by value in C#.
Docs: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/passing-reference-type-parameters#passing-reference-types-by-value
Code demo:
# mutable object
l = [9, 8, 7]

def createNewList(l1: list):
    # l1 + [0] creates a new list object; the local name l1 is rebound to it
    # without affecting the outer variable l
    l1 = l1 + [0]

def changeList(l1: list):
    # append() adds an element to the end of the list in place; because l1 and l
    # refer to the same object, l changes too
    l1.append(0)

print(l)            # [9, 8, 7]
createNewList(l)
print(l)            # [9, 8, 7] - unchanged
changeList(l)
print(l)            # [9, 8, 7, 0]

# immutable object
num = 9

def changeValue(val: int):
    # int is an immutable type; rebinding val makes it point to the new object 8
    # and does not change num
    val = 8

print(num)          # 9
changeValue(num)
print(num)          # 9 - unchanged
Hello, I am trying to understand how Python's pass-by-reference works.
I have an example:
>>>a = 1
>>>b = 1
>>>id(a);id(b)
140522779858088
140522779858088
This makes perfect sense: since a and b are both referencing the same value, they have the same identity. What I don't quite understand is how this example:
>>>a = 4.4
>>>b = 1.0+3.4
>>>id(a);id(b)
140522778796184
140522778796136
Is different from this example:
>>>a = 2
>>>b = 2 + 0
>>>id(a);id(b)
140522779858064
140522779858064
Is it because in the 3rd example the 0 int object is being viewed as "None" by the interpreter and is not recognized as needing a different identity from the object that variable a is referencing (2)? Whereas in the 2nd example b is adding two different int objects and the interpreter is allocating memory for both of those objects to be added, which gives variable a a different identity from variable b?
In your first example, the names a and b are both "referencing" the same object because of interning. The assignment statement resulted in an integer with the same id only because it has reused a preexisting object that happened to be hanging around in memory already. That's not a reliable behavior of integers:
>>> a = 257
>>> b = 257
>>> id(a), id(b)
(30610608, 30610728)
As demonstrated above, if you pick a big enough integer then it will behave as the floats in your second example behaved. And interning small integers is optional in the Python language anyway; it happens to be a CPython implementation detail: a performance optimization intended to avoid the overhead of creating a new object. Caching commonly used integer instances speeds things up, at the cost of a slightly higher memory footprint for the interpreter.
Don't think about "reference" and "value" when dealing with Python, the model that works for C doesn't really work well here. Instead think of "names" and "objects".
Picture your third example as a diagram: 2 is an object, a and b are names. We can have different names pointing at the same object, and objects can exist without any name at all.
Assigning a variable only attaches a nametag. And deleting a variable only removes a nametag. If you keep this idea in mind, then the Python object model will never surprise you again.
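A quick illustration of the nametag idea:
x = [1, 2, 3]
y = x            # attach a second nametag to the same object
del x            # remove one nametag; the object itself survives
print(y)         # [1, 2, 3]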
As stated here, CPython caches integers from -5 to 256. So all variables within this range with the same value share the same id (I wouldn't bet on that for future versions, though, but that's the current implementation)
There's no such cache for floats, probably because there is a near-"infinity" of possible values (not truly infinite, but very large, given floating point), so the chance of computing the same value by different means is really low compared to integers.
>>> a=4.0
>>> b=4.0
>>> a is b
False
Python variables are always references to objects. These objects can be divided into mutable and immutable objects.
A mutable type can be modified without its id changing, so every variable that points to the object sees the update. An immutable object cannot be modified this way, so Python creates a new object with the changes and rebinds the variable to point to it; thus the id of the variable before the change will not match the id after the change.
Integers and floats are immutable, so after a change they will point to a different object and thus have different ids.
The catch is that CPython caches some common integer values so that there are not multiple objects with the same value, in order to save memory; 2 is one of those cached integers, so every time a variable points to the integer 2 it will have the same id (though the id itself will differ between Python runs).
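A small demo of the two cases (plain CPython, no extra assumptions):
lst = [1, 2, 3]
old_id = id(lst)
lst.append(4)            # in-place change of a mutable object
print(id(lst) == old_id) # True - same object, same id

n = 2
old_id = id(n)
n = n + 1                # "changing" an immutable int rebinds the name
print(id(n) == old_id)   # False - n now points at a different object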
x = 3
x = 4
Is the second line an assignment statement or a new variable binding?
In (very basic) C terms, when you assign a new value to a variable this is what happens:
x = malloc(some object struct)
If I'm interpreting your question correctly, you're asking what happens when you reassign x - this:
A. *x = some other value
or this:
B. x = malloc(something else)
The correct answer is B, because the object the variable points to can be also referred to somewhere else and changing it might affect other parts of the program in an unpredictable way. Therefore, Python unbinds the variable name from the old structure (decreasing its "reference counter"), allocates a new structure and binds the name to this new one. Once reference counter of a structure becomes zero, it becomes garbage and will be freed at some point.
Of course, this workflow is highly optimized internally, and the details may vary depending on the object itself, the specific interpreter (CPython, Jython, etc.), and the version. As userland Python programmers, we only have a guarantee that
x = old_object
and then
x = new_object
doesn't affect "old_object" in any way.
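In other words (a tiny sketch of that guarantee):
old_object = [1, 2, 3]
x = old_object       # two names, one list object
x = [4, 5, 6]        # rebinding x allocates a new list...
print(old_object)    # [1, 2, 3] - ...and leaves the old one untouched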
There is no difference. Assigning to a name in Python is the same whether or not the name already existed.
Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that the Python grammar talks about "assignments" as a kind of statement, not "bindings", and at least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In Python, the memory location and size are irrelevant to the interpreter. The closest Python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens via i = 1000. This tells Python to create an integer object with a value of 1000, if it does not already exist, and bind it to the name 'i'. An object can be bound to multiple names quite easily, e.g.:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand; for example, I've always been confused by it even though I have a reasonably accurate understanding of how Python (and CPython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are in fact pointers to objects, and that, for example, a list object is an array of pointers to values. After a = b, both a and b point to the same object.
There are a couple of tricky parts where this simple model of Python's semantics seems to fail, for example with the augmented assignment operator += on lists. For that, it's important to note that a += b in Python is not the same as a = a + b but a special in-place operation (which can also be defined for user types with the __iadd__ method; a += b is effectively a = a.__iadd__(b)).
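For example, with lists (just an illustration of the difference):
a = [1]
b = a
a += [2]        # list.__iadd__ extends the list in place
print(b)        # [1, 2] - b sees the change; a and b are still the same object

a = [1]
b = a
a = a + [2]     # builds a new list and rebinds a
print(b)        # [1]    - b still points at the old list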
Another important thing to understand is that while in Python all variables are indeed pointers, there is still no pointer concept. In other words, you cannot pass a "pointer to a variable" to a function so that the function can change the variable; what in C++ is written as
void increment(int &x) {
    x += 1;
}
or in C by
void increment(int *x) {
    *x += 1;
}
cannot be written in Python, because there is no way to pass "a variable"; you can only pass values. The usual way to get a generic writable place in Python is to use a mutable container or a callback closure.
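The usual workarounds look something like this (the function and variable names here are made up for illustration):
# 1) Return the new value and rebind at the call site.
def incremented(x):
    return x + 1

n = 5
n = incremented(n)

# 2) Pass a mutable container and change its contents.
def increment_cell(cell):
    cell[0] += 1

box = [5]
increment_cell(box)      # box is now [6]

# 3) Use a closure over a writable place (the "callback closure" mentioned above).
def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

counter = make_counter()
counter()                # 1
counter()                # 2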
Who said you should? Unless you are discussing issues that are directly related to name-binding operations, it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally, the precise meaning differs between programming languages.
If you are debugging an issue connected with "naming and binding", then use that terminology, because the Python language reference uses it: being as specific and precise as possible helps resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type; in Python it isn't, and the type of object a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.
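If an independent copy is actually wanted, it has to be requested explicitly, for example:
>>> import copy
>>> a = [4, 5, 6]
>>> b = a[:]               # shallow copy: a new list with the same elements
>>> b[1] = 0
>>> a
[4, 5, 6]
>>> c = copy.deepcopy(a)   # deep copy, for when the elements are themselves mutable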
Just wondering what the logic behind this is? On the surface it seems kind of inefficient that every time you do something simple like "x = x + 1", it has to take a new address and discard the old one.
The Python variable (called an identifier or name, in Python) is a reference to a value. The id() function says something about that value, not the name.
Many values are not mutable; integers, strings, and floats do not change in place. When you add 1 to an integer, you get a new integer object, and the name is rebound to it, replacing the reference to the old value.
You can look at Python names as labels, tied to values. If you imagine values as balloons, you are retying the label to a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many frequently used and small values, Python actually uses a process called interning, where it caches a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the sys.intern() function (a builtin intern() in Python 2) to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
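For example, string interning (in Python 3 the function lives in the sys module):
>>> import sys
>>> a = sys.intern("some fairly long string used a lot")
>>> b = sys.intern("some fairly long string used a lot")
>>> a is b
True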
Because the basic types are immutable, every time you "modify" one, a new object has to be created
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned, at least in CPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
In Python, "primitive" types like ints and strings are immutable, which means they cannot be modified.
Python is actually quite efficient because, as @Wooble commented, «Very short strings and small integers are interned»: if two variables reference the same (small) immutable value, their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
The reason behind the use of immutable types is that they offer a safe approach to concurrent access to those values.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, the assignment x = x + 1 doesn't modify the object that x is referencing; it just makes x point to another object whose value is x + 1.
To understand the logic behind, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics means only its value matters, not its identity, while an object with reference semantics focuses on its identity (in Python, the identity can be obtained from id(obj)).
Typically, value semantics implies immutability of the object. Or, conversely, if an object is mutable (i.e., it can be changed in place), that means it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value (not the identity) is significant, when a change is needed it is safe to swap in another identity with a different value. This is called referential transparency. Note that this is impossible for objects with reference semantics.
Secondly, this is beneficial in practice. As the OP suspected, it seems inefficient to discard the old object each time a value changes, but most of the time it is more efficient than the alternative. For one thing, Python (like other languages) has an intern/cache scheme so that fewer objects get created. What's more, if objects with value semantics were designed to be mutable, they would take much more space in most cases.
For example, Date has value semantics. If it were designed to be mutable, any method returning a date from an internal field would expose a handle to the outside world, which is risky (e.g., outside code could directly modify that internal field without going through the public interface). Similarly, if one passes a date object by reference to some function/method, that object could be modified inside it, which may not be expected. To avoid these kinds of side effects one has to program defensively: instead of directly returning the inner date field, return a clone of it; instead of passing by reference, pass by value, which means extra copies are made. As one can imagine, this creates more objects than necessary, and the code becomes more complicated with all the extra cloning.
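A hedged Python sketch of that defensive-programming cost (the class and field names are invented for illustration):
import copy
import datetime

class Meeting:
    def __init__(self, when, attendees):
        self._when = when              # datetime.date is immutable
        self._attendees = attendees    # list is mutable

    def when(self):
        # Safe to return directly: callers cannot modify an immutable date.
        return self._when

    def attendees(self):
        # Must return a defensive copy, or callers could mutate our internal list.
        return copy.copy(self._attendees)

m = Meeting(datetime.date(2024, 1, 1), ["alice", "bob"])
m.attendees().append("mallory")    # mutates only the copy
print(m.attendees())               # ['alice', 'bob']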
In a word, immutability enforces value semantics; it usually involves less object creation, fewer side effects, and fewer hassles, and it is more test-friendly. Besides, immutable objects are inherently thread-safe, which means fewer locks and better efficiency in multithreaded environments.
That's why basic value-semantics data types like number, string, date, and time are all immutable (well, string in C++ is an exception; that's why there are so many const std::string& parameters around, to avoid strings being modified unexpectedly). As a lesson, Java made the mistake of designing value-semantic classes like Date, Point, Rectangle, and Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior, and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects (i.e., those with reference semantics). This is a good hint for distinguishing value semantics from reference semantics.