x = 3
x = 4
Is the second line an assignment statement or a new variable binding?
In (very basic) C terms, when you assign a new value to a variable this is what happens:
x = malloc(some object struct)
If I'm interpreting your question correctly, you're asking what happens when you reassign x - this:
A. *x = some other value
or this:
B. x = malloc(something else)
The correct answer is B, because the object the variable points to can be also referred to somewhere else and changing it might affect other parts of the program in an unpredictable way. Therefore, Python unbinds the variable name from the old structure (decreasing its "reference counter"), allocates a new structure and binds the name to this new one. Once reference counter of a structure becomes zero, it becomes garbage and will be freed at some point.
Of course, this workflow is highly optimized internally, and details may vary depending on the object itself, specific interpreter (CPython, Jython etc) and from version to version. As userland python programmers, we only have a guarantee that
x = old_object
and then
x = new_object
doesn't affect "old_object" in any way.
There is no difference. Assigning to a name in Python is the same whether or not the name already existed.
Related
I am puzzled with an issue of a variable assignment in python. I tried looking for pages on internet about this issue, but I still do not understand it.
Let's say I assign a certain value to the variable:
v = 2
Python reserved a place in some memory location with a value 2. We now know how to access that particular memory location by assigning to it a variable name: v
If then, we do this:
v = v + 4
A new memory location will be filed with the value 6 (2+4), so now to access this value, we are binding the variable v to it.
Is this correct, or completely incorrect?
And what happens with the initial value 2 in the memory location? It is no longer bound to any variable, so this value can not be accessed any more? Does this mean that in the next couple of minutes, this memory location will be freed by the garbage collector?
Thank you for the reply.
EDIT:
The v = 2 and v = v + 2 are just simple examples. In my case instead of 2 and v+2 I have a non-python native objects which can weight from tens of megabytes, up to 400 megabytes.
I apologize for not clearing this on the very start. I did not know there is a difference between integers and objects of other types.
You're correct, as far as this goes. Actually, what happens is that Python constructs integer objects 2 and 4. In the first assignment, the interpreter makes v point to the 2 object.
In the second assignment, we construct an object 6 and make v point to that, as you suspected.
What happens next is dependent on the implementation. Most interpreters and compilers that use objects for even atomic values, will recognize the 2 object as a small integer, likely to be used again, and very cheap to maintain -- so it won't get collected. In general, a literal appearing in the code will not be turned over for garbage collection.
Thanks for updating the question. Yes, your large object, with no references to it, is now available for GC.
The example with small integers (-5 to 256) is not the best one because these are represented in a special way (see this for example).
You are generally correct about the flow of things, so if you had something like
v = f(v)
the following things happen:
a new object gets created as a result of evaluating f(v) (modulo the small integer or literal optimization)
this new object is bound to variable v
whatever was previously in v is now inaccessible and, generally, may get garbage collected soon
After getting in touch with the more deeper workings of Python, I've come to the understanding that assigning a variable equals to the creation of a new object which has their own specific address regardless of the variable-name that was assigned to the object.
In that case though, it makes me wonder what happens to an object that was created and modified later on. Does it sit there and consumes memory?
The scenarion in mind looks something like this:
# Creates object with id 10101001 (random numbers)
x = 5
# Creates object with id 10010010 using the value from object 10101001.
x += 10
What happens to object with the id 10101001?
Out of curiosity too, why do objects need an ID AND a refrence that is the variable name, wouldn't it be better to just assign the address with the variable name?
I apologize in advance for the gringe this question might invoke in someone.
Here is a great talk that was given at PyCon by Ned Batchelder this year about how Python manages variables.
https://www.youtube.com/watch?v=_AEJHKGk9ns
I think it will help clear up some of your confusion.
First of all Augmented assignment statements states:
An augmented assignment expression like x += 1 can be rewritten x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
So depending on the type of x this might not create a new object.
Python is reference counted. So the reference count of the object with id 10101001 decremented. If this count hits zero, the is freed almost immediately. But most low range integers are cached anyways. Refer to Objects, Types and Reference Counts for all the details.
Regarding the id of an object:
CPython implementation detail: This is the address of the object in memory.
So basically id and reference are the same. The variable name is just a binding to the object itself.
Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that The Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In python, the memory location and size is irrelevant to the interpreter. The closest python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells python to create an integer object with a value of 1000, if it does not already exist, and bind to to the name 'i'. An object can be bound to multiple names quite easily, e.g:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand, for example I've always been confused by it even if I've a somewhat accurate understanding of how Python (and cpython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
There are a couple of tricky parts where this simple model of Python semantic seems to fail, for example with list augmented operator += but for that it's important to note that a += b in Python is not the same as a = a + b but it's a special increment operation (that can also be defined for user types with the __iadd__ method; a += b is indeed a = a.__iadd__(b)).
Another important thing to understand is that while in Python all variables are indeed pointers still there is no pointer concept. In other words you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
x += 1;
}
or in C by
void increment(int *x) {
*x += 1;
}
in Python cannot be defined because there's no way to pass "a variable", you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
who said you should? Unless you are discussing issues that are directly related to name binding operations; it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because Python language reference uses it: to be as specific and precise as possible, to help resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type - in Python it isn't the type that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.
There is a lot of confusion with python names in the web and documentation doesn't seem to be that clear about names. Below are several things I read about python names.
names are references to objects (where are they? heap?) and what name holds is an address. (like Java).
names in python are like C++ references ( int& b) which means that it is another alias for a memory location; i.e. for int a , a is a memory location. if int& b = a means that b is another name the for same memory location
names are very similar to automatically dereferenced pointers variables in C.
Which of the above statements is/are correct?
Does Python names contain some kind of address in them or is it just a name to a memory location (like C++ & references)?
Where are python names stored, Stack or heap?
EDIT:
Check out the below lines from http://etutorials.org/Programming/Python.+Text+processing/Appendix+A.+A+Selective+and+Impressionistic+Short+Review+of+Python/A.2+Namespaces+and+Bindings/#
Whenever a (possibly qualified) name occurs on the right side of an assignment, or on a line by itself, the name is dereferenced to the object itself. If a name has not been bound inside some accessible scope, it cannot be dereferenced; attempting to do so raises a NameError exception. If the name is followed by left and right parentheses (possibly with comma-separated expressions between them), the object is invoked/called after it is dereferenced. Exactly what happens upon invocation can be controlled and overridden for Python objects; but in general, invoking a function or method runs some code, and invoking a class creates an instance. For example:
pkg.subpkg.func() # invoke a function from a namespace
x = y # deref 'y' and bind same object to 'x'
This makes sense.Just want to cross check how true it is.Comments and answers please
names are references to objects
Yes. You shouldn't care where the objects live if you just want to understand Python variables' semantics; they're somewhere in memory and Python implementations manage memory for you. How they do that depends on the implementation (CPython, Jython, PyPy...).
names in python are like C++ references
Not exactly. Reassigning a reference in C++ actually reassigns the memory location referenced, e.g. after
int i = 0;
int &r = i;
r = 1;
it is true that i == 1. You can't do this in Python except by using a mutable container object. The closest you can get to the C++ reference behavior is
i = [0] # single-element list
r = i # r is now another reference to the object referenced by i
r[0] = 1 # sets i[0]
are very similar to automatically dereferenced pointers variables in C
No, because then they'd be similar to C++ references in the above regard.
Does Python names contain some kind of address in them or is it just a name to a memory location?
The former is closer to the truth, assuming a straightforward implementation (again, PyPy might do things differently than CPython). In any case, Python variables are not storage locations, but rather labels/names for objects that may live anywhere in memory.
Every object in a Python process has an identity that can be obtained using the id function, which in CPython returns its memory address. You can check whether two variables (or expressions more generally) reference the same object by checking their id, or more directly by using is:
>>> i = [1, 2]
>>> j = i # a new reference
>>> i is j # same identity?
True
>>> j = [1, 2] # a new list
>>> i == j # same value?
True
>>> i is j # same identity?
False
Python names are, well, names. You have objects and names, that's it.
Creating an object, say [3, 4, 5] creates an object somewhere on the heap. You don't need to know how. Now you can put names to target this object, by assigning it to names:
x = [3, 4, 5]
That is, the assignment operator assigns names rather than values. x isn't [3, 4, 5], no, it's simply a name pointing to the [3, 4, 5] object. So doing this:
x = 1
Doesn't change the original [3, 4, 5] object, instead it assigns the object 1 to the name x. Also note that most expressions like [3, 4, 5], but also 8 + 3 create temporaries. Unless you assign a name to that temporary it will immediately die. There is no (except, for example in CPython for small numbers, but that aside) mechanism to keep objects alive that aren't referenced, and cache them. For example, this fails:
>>> x = [3, 4, 5]
>>> x is [3, 4, 5] # must be some object, right? no!
False
However, that's merely assignment (which is not overloadable in Python). In fact, objects and names in Python very well behave like automatically dereferencing pointers, except that they are automatically reference counted and die after they're not referenced anymore (in CPython, at least) and that they do not automatically dereference on assignment.
Thanks to this memory model, for example the C++ way of overloading index operations doesn't work. Instead, Python uses __setitem__ and __getitem__ because you can't return anything that's "assignable". Furthermore, the operators +=, *=, etc... work by creating temporaries and assigning that temporary back to the name.
Python objects are stored on a heap and are garbage collected via reference counting.
Variables are references to objects like in Java, and thus point 1 applies. I am not familiar with either C++ or automatically dereferenced pointer variables in C, to make a call on those.
Ultimately, it's the python interpreter that does the looking up of items in the interpreter structures, which usually are python lists and dictionaries and other such abstract containers; namespaces use dict (a hash table) for example, where the names and values are pointers to other python objects. These are managed explicitly by the mapping protocol.
To the python programmer, this is all hidden; you don't need to know where your objects live, just that they are still alive as long as you have something referencing them. You pass around these references when coding in python.
I have a question about the map function in Python.
From what I understand, the function does not mutate the list it's operating on, but rather create a new one and return it. Is this correct ?
Additionally, I have the following piece of code
def reflect(p,dir):
if(dir == 'X'):
func = lambda (a,b) : (a * -1, b)
else:
func = lambda (a,b) : (a, b * -1)
p = map(func,p)
print 'got', p
points is an array of tuples such as: [(1, 1), (-1, 1), (-1, -1), (1, -1)]
If I call the above function in such a manner:
print points
reflect(points,'X')
print points
the list points does not change. Inside the function though, the print function properly prints what I want.
Could someone maybe point me in some direction where I could learn how all this passing by value / reference etc works in python, and how I could fix the above ? Or maybe I'm trying too hard to emulate Haskell in python...
Thanks
edit:
Say instead of p = map(func,p) I do
for i in range(len(p)):
p[i] = func(p[i])
The value of the list is updated outside of the function, as if working by reference. Ugh, hope this is clear :S
You misunderstand how references work in Python. Here, all names are references, there are no "values". Names are bound to objects. But = doesn't modify the object that's pointed to by the name — it rebinds the name to a different object:
x = 42
y = x
# now:
# 'is' is a identity operator — it checks whether two names point to the
# exact same object
print x is y # => True
print x, y # 42 42
y = 69
# now y has been rebound, but that does not change the '42' object, nor rebinds x
print x is y # => False
print x, y # 42 69
To modify the object itself, it needs to be mutable — i.e. expose members that mutate it or have a modifiable dict. The same thing as above happens when you rebind p — it doesn't touch points at all, it simply modifies the meaning of local p name.
If you want to simulate C++-like references, you need to encapsulate the object into a mutable container, e.g. a list.
reflect([points], 'X')
# inside reflect:
p[0] = ...
But you shouldn't, at least in this case — you should just return the new object instead.
points = reflect(points, 'X')
# inside reflect, instead of p = ...
return map(func, p)
Well, now that I think about it, you can also do
p[:] = map(func, p)
But again, returning new object is usually better.
The data model of Python is based on a trilogy:
identifier - reference - object
.
The identifier is a string written in the code.
The reference is a variable stricto sensu, that is to say "a chunk of
memory whose content can change". The value of a reference is the adress of the object.
The object has an
implementation based on the structures of the langage C that is the
foundations of Python.
Other words are also used to designate the 'identifier' :
1) name
2) variable ; because this word is used by metonymy in mathematics to designate the symbols that represent the real mathematical variables, and the variables in a computer have conceptually the same functioning as the mathematical variables (their values can change).
This use in Python is a very bad habit in my opinion : it creates ambiguities and confusion with what is called 'variable' in computer science: "chunk of memory whose content can change".
The best is to use the word : identifier
.
An identifier and an object are binded in a certain namespace. Namespaces are displayed under the form of Python's dictionaries, but they ARE NOT dictionaries.
The binding of the identifier and the object is indirect, via the reference.
The binding of the identifier and the reference is direct, and realized in the SYMBOL TABLE (or symbol-table).
In computer science, a symbol table is a data structure used by a
language translator such as a compiler or interpreter, where each
identifier in a program's source code is associated with information
relating to its declaration or appearance in the source, such as its
type, scope level and sometimes its location.
http://en.wikipedia.org/wiki/Symbol_table
They say: identifiers. Precisely.
I rarely see allusions to symbol table, though it's the crucial thing that sheds light on the functioning of the Python's data model IMO.
In my opinion, the word
binding
doesn't designate a precise and unique mechanism but globally the set of all the mechanisms concerning the trilogy identifier - reference - object
.
I dont' pretend that I perfectly understood all the matters concerning the data and execution models of Python, and that the above considerations are the more exact and precisely expressed that can be done.
However, they allow me to have an operational understanding of what happens during executions of code.
I would be very happy to be corrected on some points if I am wrong (for exemple, I am very satisfied to have learnt from Michael Foord that the nature of namespaces is not being dictionary, which is only the way they are represented)
.
That said, I don't know what is called value and reference when the subject of passing something as argument in Python is discussed, and I have the impression that a lot of the persons that have expressed on the subject in numerous esoteric discussions don't know more than me.
I think that there is no best and lucid opinion on the subject that this one of Alex Martelli:
"Trying to reuse terminology that is more generally applied to
languages where "variables are boxes" to a language where "variables
are post-it tags" is, IMHO, more likely to confuse than to help."
Alex Martelli
http://bytes.com/topic/python/answers/37219-value-reference
I want to bump this. The answers don't really answer the question - they disregard the fact that the OP was able to achieve the desired result by iterating. The question comes down to the behavior of map. Here is a more direct example:
f=(lambda pair: pair[0].append(pair[1]))
a = ([],1)
f(a)
print(a) #prints ([1],1)
a=([],1)
map(f,[a])
print(a) #prints ([0],1)
So map isn't mutating objects in the way the OP is expecting. I have the same confusion.
Can anyone comment on exactly what is going on here? I think that'd be a good answer to the OP's question.
Note that we have different behavior if we assign the output of map as follows (as per Cat Plus Plus' answer)
f=(lambda pair: pair[0].append(pair[1]))
a = ([],1)
x = [a]
x[:] = map(f,x)
print(x) #prints None
print(a) # prints [1]
Please note that in the first example, we simply called f(a), not a=f(a). Why do we need assignment when using map and not when working outside of map?