There is a lot of confusion about Python names on the web, and the documentation doesn't seem to be that clear about them. Below are several things I have read about Python names.
Names are references to objects (where do the objects live? the heap?), and what a name holds is an address (like Java).
Names in Python are like C++ references (int& b), which means a name is another alias for a memory location; i.e. for int a, a is a memory location, and int& b = a means that b is another name for the same memory location.
Names are very similar to automatically dereferenced pointer variables in C.
Which of the above statements is/are correct?
Do Python names contain some kind of address, or is a name just a label for a memory location (like C++ & references)?
Where are Python names stored, on the stack or the heap?
EDIT:
Check out the below lines from http://etutorials.org/Programming/Python.+Text+processing/Appendix+A.+A+Selective+and+Impressionistic+Short+Review+of+Python/A.2+Namespaces+and+Bindings/#
Whenever a (possibly qualified) name occurs on the right side of an assignment, or on a line by itself, the name is dereferenced to the object itself. If a name has not been bound inside some accessible scope, it cannot be dereferenced; attempting to do so raises a NameError exception. If the name is followed by left and right parentheses (possibly with comma-separated expressions between them), the object is invoked/called after it is dereferenced. Exactly what happens upon invocation can be controlled and overridden for Python objects; but in general, invoking a function or method runs some code, and invoking a class creates an instance. For example:
pkg.subpkg.func() # invoke a function from a namespace
x = y # deref 'y' and bind same object to 'x'
This makes sense. I just want to cross-check how true it is. Comments and answers, please.
names are references to objects
Yes. You shouldn't care where the objects live if you just want to understand Python variables' semantics; they're somewhere in memory and Python implementations manage memory for you. How they do that depends on the implementation (CPython, Jython, PyPy...).
names in python are like C++ references
Not exactly. Assigning through a reference in C++ actually writes to the memory location it refers to, e.g. after
int i = 0;
int &r = i;
r = 1;
it is true that i == 1. You can't do this in Python except by using a mutable container object. The closest you can get to the C++ reference behavior is
i = [0] # single-element list
r = i # r is now another reference to the object referenced by i
r[0] = 1 # sets i[0]
names are very similar to automatically dereferenced pointer variables in C
No, because then they'd be similar to C++ references in the above regard.
Do Python names contain some kind of address in them or is it just a name to a memory location?
The former is closer to the truth, assuming a straightforward implementation (again, PyPy might do things differently than CPython). In any case, Python variables are not storage locations, but rather labels/names for objects that may live anywhere in memory.
Every object in a Python process has an identity that can be obtained using the id function, which in CPython returns its memory address. You can check whether two variables (or expressions more generally) reference the same object by checking their id, or more directly by using is:
>>> i = [1, 2]
>>> j = i # a new reference
>>> i is j # same identity?
True
>>> j = [1, 2] # a new list
>>> i == j # same value?
True
>>> i is j # same identity?
False
Python names are, well, names. You have objects and names, that's it.
Creating an object, say [3, 4, 5] creates an object somewhere on the heap. You don't need to know how. Now you can put names to target this object, by assigning it to names:
x = [3, 4, 5]
That is, the assignment operator assigns names rather than values. x isn't [3, 4, 5], no, it's simply a name pointing to the [3, 4, 5] object. So doing this:
x = 1
Doesn't change the original [3, 4, 5] object; instead, it binds the name x to the object 1. Also note that most expressions, such as [3, 4, 5] or 8 + 3, create temporaries. Unless you bind a name to such a temporary (or store a reference to it somewhere else), it dies once nothing references it (immediately, in CPython). Apart from implementation-specific caches (CPython, for example, caches small integers), there is no mechanism that keeps unreferenced objects alive. For example, this comparison is False:
>>> x = [3, 4, 5]
>>> x is [3, 4, 5] # must be some object, right? no!
False
However, that's merely assignment (which is not overloadable in Python). In fact, objects and names in Python behave very much like automatically dereferencing pointers, except that the objects are automatically reference counted and die once they're no longer referenced (in CPython, at least), and that names are not automatically dereferenced on assignment.
Thanks to this memory model, the C++ way of overloading index operations, for example, doesn't work. Instead, Python uses __setitem__ and __getitem__, because you can't return anything that's "assignable". Furthermore, the operators +=, *=, etc. work on immutable objects by creating a temporary and binding it back to the name; mutable types such as list may instead modify the object in place.
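As a rough illustration of the point about __getitem__ and __setitem__, the sketch below uses a hypothetical Recorder class (not from the answer above) that logs which hook Python calls. It suggests that an indexed augmented assignment such as box["a"] += 1 is carried out as a read followed by a write, rather than through an assignable reference:

```python
class Recorder:
    """Hypothetical container that logs calls to its item hooks."""

    def __init__(self):
        self.data = {}
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))
        self.data[key] = value


box = Recorder()
box["a"] = 1      # calls __setitem__
box["a"] += 1     # calls __getitem__, adds 1, then calls __setitem__

print(box.calls)
print(box.data)
```

The logged calls show one "get" followed by one "set" for the += line, which is why no "assignable" return value is needed.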
Python objects are stored on a heap and, in CPython, are garbage collected primarily via reference counting (with an additional cycle collector for reference cycles).
Variables are references to objects like in Java, and thus point 1 applies. I am not familiar with either C++ or automatically dereferenced pointer variables in C, to make a call on those.
Ultimately, it's the python interpreter that does the looking up of items in the interpreter structures, which usually are python lists and dictionaries and other such abstract containers; namespaces use dict (a hash table) for example, where the names and values are pointers to other python objects. These are managed explicitly by the mapping protocol.
To the python programmer, this is all hidden; you don't need to know where your objects live, just that they are still alive as long as you have something referencing them. You pass around these references when coding in python.
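To make the "namespaces use dict" remark a little more concrete, here is a minimal sketch. It builds a throwaway module object (the name "demo" and the attribute names are made up for illustration) and looks at its namespace through vars(), which returns the module's __dict__:

```python
import types

# A throwaway module object; "demo" is just an illustrative name.
mod = types.ModuleType("demo")
mod.numbers = [3, 4, 5]

namespace = vars(mod)   # the module's namespace, a plain dict
print(type(namespace).__name__)

# The dict value and the attribute are the same object, not copies.
print(namespace["numbers"] is mod.numbers)

# Adding a key to the dict binds a new name in the module.
namespace["alias"] = mod.numbers
print(mod.alias is mod.numbers)
```

This only shows the module case; function locals are typically stored differently in CPython, so treat it as an illustration rather than a description of every namespace.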
Related
I am trying these in the python shell and am getting quite confusing results.
>>> p = [1, 2, 3, 4, 5, 6, 7, 8]
>>> p
[1, 2, 3, 4, 5, 6, 7, 8]
>>> p[2:8:2]
[3, 5, 7]
>>> id(p[2:8:2])
37798416
>>> id(p[2:8:2])
37798416
>>> id(p[2:8:2])
50868392
Note how the id changed the 3rd time!
>>> id(p[2:8:2])
37798336
And changed again!
Question #1: How and why did that happen?
Question #2:
>>> p[2:8:2] = [33,55,77]
>>> p
[1, 2, 33, 4, 55, 6, 77, 8]
How exactly does Python "store" p[2:8:2]? (Maybe "store" is not the right word, but I hope you get the idea.) It does not look like it is a distinct list from the original list (though it is made up of non-sequential immutable items from the original list), as changes to this list are reflected in the original list!
Slicing, with rare exception, makes brand new copies of whatever you're slicing. So all the id checks are telling you is that sometimes the new list reuses the memory from last time, and sometimes it uses a different bit of memory. The exact behavior is pure implementation detail. In CPython (the reference interpreter) id happens to correspond to memory addresses, so all you're seeing is a behavioral artifact of the allocator, not some deep meaning to slicing.
On your question #2: When used in an assignment context, slicing modifies the original sequence; it doesn't create a new list at all. Don't try to draw meaningful parallels between slicing (read oriented, makes new sequences) and slice assignment (write oriented, modifies existing sequences); the behaviors under the hood are different in almost every way.
For question 1:
The id of an object is guaranteed to both be unique and stay constant during the lifetime of that object. See here in the Python library docs:
id(object) - return the identity of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Since you're creating and destroying objects with your slicing, the id is actually following the rules.
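The lifetime rule can be checked in a way that does not depend on the allocator. In this small sketch (the names first and second are just for illustration), both slices are kept alive by binding them to names, so their ids are guaranteed to differ:

```python
p = [1, 2, 3, 4, 5, 6, 7, 8]

first = p[2:8:2]
second = p[2:8:2]

# Both slices are alive at the same time, so their ids must differ.
print(id(first) != id(second))

# The id of a live object does not change.
print(id(first) == id(first))

# Equal values, but two distinct objects.
print(first == second, first is second)
```

In the question's session the temporary lists were discarded right after each id() call, which is what allowed an address to be reused.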
If you're using the reference (and I suspect most common) implementation, CPython, it simply gives you the memory address of the object. The source code can be found in Python/bltinmodule.c, simplified and annotated below:
static PyObject *builtin_id(PyModuleDef *self, PyObject *v) {
    PyObject *id = PyLong_FromVoidPtr(v); // Turn object address into
    return id;                            // long and return it.
}
That ensures that it's unique and the vagaries and order of memory allocation calls also explain why it can repeat and/or be different.
For question 2:
Assigning to the "slice" does not actually involve creating the sliced object and assigning to it. It simply sets certain values in the already existing object as specified by the slice notation to those given on the right hand side of the assignment.
More detail can be found in the sliceobject files in the CPython source code, specifically Objects/sliceobject.c and Include/sliceobject.h. These involve the creation of a PySliceObject which consists of a {start, stop, step} tuple.
When you apply this tuple to an object on the right hand side of an assignment, such as x = y[2:8:2], it uses the PySliceObject to create a new list x based on y, getting only the relevant elements.
When used on the left hand side, such as x[2:8:2] = [33,55,77], it uses the PySliceObject to decide which elements of x are set to the values on the right.
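The two uses of a slice can be sketched at the Python level with the built-in slice type, which carries the same start, stop and step. This is only an illustration of the behaviour described above, not of the C implementation; the name piece is made up:

```python
p = [1, 2, 3, 4, 5, 6, 7, 8]
s = slice(2, 8, 2)          # what p[2:8:2] builds behind the scenes

# Read context: a new list is created from the selected elements.
piece = p[s]
piece[0] = -1               # changing the copy...
print(p)                    # ...leaves p untouched

# Write context: the selected positions of p itself are overwritten.
p[s] = [33, 55, 77]
print(p)

# The slice only describes which indices are involved.
print(s.indices(len(p)))
```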
I'm passing the value i to th = threading.Thread(target=func, args=(i,)) and start the thread immediately by th.start().
Because this is executed inside a loop with a changing index i, I'm wondering whether i inside the thread retains its value from the time the thread was created, or whether the thread is working on a reference to i. In the latter case the value wouldn't necessarily be the same as it was at the creation of th.
Are values passed by reference or value?
I would say, passing mutable objects to functions is calling by reference.
You are not alone in wanting to say that, but thinking that way limits your ability to communicate with others about how Python programs actually work.
Consider the following Python fragment:
a = [1, 2, 3]
b = a
foobar(a)
if b is not a:
    print('Impossible!')
If Python was a "pass-by-reference" programming language, then the foobar(a) call could cause this program to print Impossible! by changing local variable a to refer to some different object.
But, Python does not do pass-by-reference. It only does pass-by-value, and that means there is no definition of foobar that could make the fragment execute the print call. The foobar function can mutate the object to which local variable a refers, but it has no ability to modify a itself.
See https://en.wikipedia.org/wiki/Evaluation_strategy
Pass by value and pass by reference are two terms that can be misleading sometimes, and they don't always mean the same in every language. I'm going to assume we're talking about what the two terms mean in C (where passing by reference means passing a pointer to the variable).
Python is really neither of those, I'll give you an example I took from an article (all credit to the original writer, I'll link the article at the end)
def spam(eggs):
    eggs.append(1)
    eggs = [2, 3]
ham = [0]
spam(ham)
print(ham)
When spam is called, both ham and eggs point to the same value ([0]), to the same object. So, when eggs.append(1) is executed, [0] becomes [0, 1]. That sounds like pass by reference.
However, when eggs = [2, 3] is executed, both eggs and ham should become the new list under pass by reference. But that does not happen; now eggs points to a list in memory containing [2, 3], but ham still points to the original list with the 1 appended to it. That bit sounds more like pass by value.
EDIT
As explained above, if a parameter of the thread is modified inside it, the changes will be seen in the original thread as long as the parameter is mutable. That way, passing a list to the thread and appending something to it will be reflected in the caller thread, for example.
However, an immutable object can't be modified. If you do i += 1, you didn't modify the integer, integers are immutable in Python. You are assigning to i a new integer with a value one unit higher than the one before. It's the same thing that happened with eggs = [2, 3]. So, in that particular example, changes will not be reflected in the original thread.
Hope this helps!
Here's the article I promised, it has a much better explanation of the matter. http://stupidpythonideas.blogspot.com/2013/11/does-python-pass-by-value-or-by.html?m=1
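Applied to the loop from the question, a minimal sketch might look like the following. The names func, th and i come from the question; results, lock and threads are assumptions added so the outcome can be inspected. Each thread receives the integer object that i referred to when threading.Thread(...) was called, so later rebinding of i by the loop does not affect it:

```python
import threading

results = []
lock = threading.Lock()


def func(i):
    # i is the object that was placed in the args tuple at creation time.
    with lock:
        results.append(i)


threads = []
for i in range(5):
    th = threading.Thread(target=func, args=(i,))
    th.start()
    threads.append(th)

for th in threads:
    th.join()

# The order of arrival may vary, but every value shows up exactly once.
print(sorted(results))
```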
An introductory Python textbook defined 'object reference' as follows, but I didn't understand:
An object reference is nothing more than a concrete representation of the object’s identity (the memory address where the object is stored).
The textbook tried illustrating this by using an arrow to show an object reference as some sort of relation going from a variable a to an object 1234 in the assignment statement a = 1234.
From what I gathered off of Wikipedia, the (object) reference of a = 1234 would be an association between a and 1234 where a was "pointing" to 1234 (feel free to clarify "reference vs. pointer"), but it has been a bit difficult to verify as (1) I'm teaching myself Python, (2) many search results talk about references for Java, and (3) not many search results are about object references.
So, what is an object reference in Python? Thanks for the help!
Whatever is associated with a variable name has to be stored in the program's memory somewhere. An easy way to think of this is that every byte of memory has an index-number. For simplicity's sake, let's imagine a simple computer where these index-numbers go from 0 (the first byte) upwards to however many bytes there are.
Say we have a sequence of 37 bytes, that a human might interpret as some words:
"The Owl and the Pussy-cat went to sea"
The computer is storing them in a contiguous block, starting at some index-position in memory. This index-position is most often called an "address". Obviously this address is absolutely just a number, the byte-number of the memory these letters are residing in.
#12000 The Owl and the Pussy-cat went to sea
So at address 12000 is a T, at 12001 an h, 12002 an e ... up to the last a at 12036.
I am labouring the point here because it's fundamental to every programming language. That 12000 is the "address" of this string. It's also a "reference" to its location. For most intents and purposes an address is a pointer is a reference. Different languages have differing syntactic handling of these, but essentially they're the same thing - dealing with a block of data at a given number.
Python and Java try to hide this addressing as much as possible, where languages like C are quite happy to expose pointers for exactly what they are.
The take-away from this, is that an object reference is the number of where the data is stored in memory. (As is a pointer.)
Now, most programming languages distinguish between simple types: characters and numbers, and complex types: strings, lists and other compound-types. This is where the reference to an object makes a difference.
Start with a simple type. Imagine the following sequence in Python, where assigning a new value to b leaves a untouched:
>>> a = 3
>>> b = a
>>> b
3
>>> b = 4
>>> b
4
>>> a
3 # <-- original has not changed
After b = 4, the names a and b no longer refer to the same object, and a is unaffected: b = 4 rebinds the name b rather than changing the 3. But with a mutable type such as a list:
>>> s = [ 1, 2, 3 ]
>>> t = s
>>> t
[1, 2, 3]
>>> t[1] = 8
>>> t
[1, 8, 3]
>>> s
[1, 8, 3] # <-- original HAS changed
We assigned t to be s, but obviously in this case t is s - they share the same memory. Wait, what! Here we have found out that both s and t are a reference to the same object - they simply share (point to) the same address in memory.
One place Python differs from other languages is that its strings are immutable, so in this kind of example they behave like numbers:
>>> j = 'Pussycat'
>>> k = j
>>> k
'Pussycat'
>>> k = 'Owl'
>>> j
'Pussycat' # <-- Original has not changed
Whereas in C strings are definitely handled as complex types, and would behave like the Python list example.
The upshot of all this, is that when objects that are handled by reference are modified, all references-to this object "see" the change. So if the object is passed to a function that modifies it (i.e.: the content of memory holding the data is changed), the change is reflected outside that function too.
But if an immutable object such as a number is passed to a function, the function cannot change it; an operation like += only rebinds the function's local name to a new object, so the change is not seen in the original.
For example:
def fnA( my_list ):
    my_list.append( 'A' )
a_list = [ 'B' ]
fnA( a_list )
print( str( a_list ) )
['B', 'A'] # <-- a_list was changed inside the function
But:
def fnB( number ):
    number += 1
x = 3
fnB( x )
print( x )
3 # <-- x was NOT changed inside the function
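So keep in mind that a mutable object is shared by every name that references it, so in-place changes are visible through all of them, while immutable objects such as numbers and strings can only be replaced by rebinding a name; that is why the two kinds of types appear to operate differently.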
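One way to check what the names refer to at each step is the is operator. The sketch below (the variable shared_after_assignment is made up for illustration) suggests that even for the integer no copy is made by b = a; the difference between the two examples comes from rebinding a name versus mutating the object in place:

```python
a = 3
b = a
shared_after_assignment = a is b   # both names refer to the same int object
print(shared_after_assignment)

b = 4                              # rebinds b; the object 3 is not modified
print(a, b, a is b)

s = [1, 2, 3]
t = s
t[1] = 8                           # mutates the one list both names refer to
print(s, t, s is t)
```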
Objects are things. Generally, they're what you see on the right hand side of an equation.
Variable names (often just called "names") are references to the actual object. When a name is on the right hand side of an equation [1], the object that it references is automatically looked up and used in the equation. The result of the expression on the right hand side is an object. The name on the left hand side of the equation becomes a reference to this (possibly new) object.
Note, you can have object references that aren't explicit names if you are working with container objects (like lists or dictionaries):
a = [] # the name a is a reference to a list.
a.append(12345) # the container list holds a reference to an integer object
In a similar way, multiple names can refer to the same object:
a = []
b = a
We can demonstrate that they are the same object by looking at the id of a and b and noting that they are the same. Or, we can look at the "side-effects" of mutating the object referenced by a or b (if we mutate one, we mutate both because they reference the same object).
a.append(1)
print(a, b) # look mom, both are [1]!
[1] More accurately, when a name is used in an expression.
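The id comparison and the container case mentioned above can be sketched together. The name marker is hypothetical; it is just an arbitrary object whose identity is easy to test:

```python
a = []
b = a
print(id(a) == id(b))       # the two names refer to one list

marker = object()
a.append(marker)            # the list now holds a reference to marker

# The reference stored in the list leads to the very same object,
# and it is visible through either name.
print(b[0] is marker)
print(len(a), len(b))
```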
In Python, strictly speaking, the language has only names that reference objects and behave as labels. The assignment operator only binds an object to a name. The objects stay in memory until they are garbage collected.
Ok, first things first.
Remember, there are two types of objects in python.
Mutable: Whose values can be changed. E.g. dictionaries, lists and user-defined objects (unless defined immutable)
Immutable: Whose values can't be changed. E.g. tuples, numbers, booleans and strings.
Now, when Python says PASS BY OBJECT REFERENCE, just remember that
If the underlying object is mutable, then any modifications done will persist.
and,
If the underlying object is immutable, then any modifications done will not persist.
If you still want examples for clarity, scroll down.
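A minimal sketch of the two cases, using hypothetical functions add_item and add_one. Note that in the immutable case nothing is really "modified": the += inside the function only rebinds the function's local name.

```python
def add_item(items):
    items.append("new")    # mutates the object the caller also references


def add_one(number):
    number += 1            # int is immutable: this rebinds the local name
    return number


shopping = ["old"]
add_item(shopping)
print(shopping)            # the change persists for the caller

count = 3
result = add_one(count)
print(count, result)       # count is unchanged; only result is 4
```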
>>> d
{1: 1, 2: 2, 3: 3}
>>> lst = [d, d]
>>> c=lst[0]
>>> c[1]=5
>>> lst
[{1: 5, 2: 2, 3: 3}, {1: 5, 2: 2, 3: 3}]
When lst = [d, d], are lst[0] and lst[1] both references to the memory block of d, instead of creating two memory blocks and copying the content of d to them respectively?
When c=lst[0], is c just a reference to the memory occupied by lst[0], instead of creating a new memory block and copying the content from lst[0]?
In Python, when is a reference created to point to an existing memory block, and when is a new memory block allocated and the content copied into it?
This language feature of Python is different from C. What is the name of this language feature?
Thanks.
All variables (and other containers, such as dictionaries, lists, and object attributes) hold references to objects. Memory allocation occurs when the object is instantiated. Simple assignment always creates another reference to the existing object. For example, if you have:
a = [1, 2, 3]
b = a
Then b and a point to the same object, a list. You can verify this using the is operator:
print(b is a) # True
If you change a, then b changes too, because they are two names for the same object.
a.append(4)
print(b[3] == 4) # True
print(b[3] is a[3]) # also True
If you want to create a copy, you must do so explicitly. Here are some ways of doing this:
For lists, use a slice: b = a[:].
For many types, you can use the type name to copy an existing object of that type: b = list(a). When creating your own classes, this is a good approach to take if you need copy functionality.
The copy module has methods that can be used to copy objects (either shallowly or deeply).
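The copying options listed above can be compared side by side. In this sketch the nested list and the variable names are made up for illustration; it shows that the slice, the type call and copy.copy all produce shallow copies, while copy.deepcopy also copies the inner lists:

```python
import copy

a = [[1, 2], [3, 4]]

by_slice = a[:]
by_type = list(a)
shallow = copy.copy(a)
deep = copy.deepcopy(a)

# Every copy is a new outer list...
print(by_slice is a, by_type is a, shallow is a, deep is a)

# ...but shallow copies still share the inner lists with a.
print(by_slice[0] is a[0], by_type[0] is a[0], shallow[0] is a[0])
print(deep[0] is a[0])

a[0].append(99)
print(shallow[0])   # sees the change made through a
print(deep[0])      # does not
```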
For immutable types, such as strings, numbers, and tuples, there is never any need to make a copy. You can only "change" these kinds of values by referencing different ones.
The best way of describing this is probably "everything's an object." In C, "primitive" types like integers are treated differently from arrays. In Python, they are not: all values are stored as references to objects—even integers.
This paragraph from the Python tutorial should help clear things up for you:
Objects have individuality, and multiple names (in multiple scopes)
can be bound to the same object. This is known as aliasing in other
languages. This is usually not appreciated on a first glance at
Python, and can be safely ignored when dealing with immutable basic
types (numbers, strings, tuples). However, aliasing has a possibly
surprising effect on the semantics of Python code involving mutable
objects such as lists, dictionaries, and most other types. This is
usually used to the benefit of the program, since aliases behave like
pointers in some respects. For example, passing an object is cheap
since only a pointer is passed by the implementation; and if a
function modifies an object passed as an argument, the caller will see
the change — this eliminates the need for two different argument
passing mechanisms as in Pascal.
To answer your individual questions in more detail:
When lst = [d, d], are lst[0] and lst[1] both references to the memory block of d, instead of creating two memory blocks and copy the content of d to them respectively?
No. They don't refer to the memory block of d. lst[0] and lst[1] are aliasing the same object as d, at that point in time. Proof: If you assign d to a new object after initializing the list, lst[0] and lst[1] will be unchanged. If you mutate the object aliased by d, then the mutation is visible through lst[0] and lst[1], because they alias the same object.
When c=lst[0], is c just a reference to the memory occupied by lst[0], instead of creating a new memory block and copy the content from lst[0]?
Again no. It's not a reference to the memory occupied by lst[0]. Proof: if you assign lst[0] to a new object, c will be unchanged. If you modify a mutable object (like the dictionary that lst[0] points to) you will see the change in c, because c is referring to the same object, the original dictionary.
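Both "proofs" can be run directly. This sketch reuses d, lst and c from the question; original is an extra name introduced here only so the first dictionary can still be tested after d is rebound:

```python
d = {1: 1, 2: 2, 3: 3}
lst = [d, d]
c = lst[0]
original = d

# Rebinding d does not touch the list elements.
d = {"new": True}
print(lst[0] is original, lst[1] is original)

# Mutating the shared dictionary is visible through every alias.
c[1] = 5
print(lst[1][1], original[1])

# Rebinding lst[0] does not touch c.
lst[0] = "replaced"
print(c is original, lst[1] is original)
```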
In Python, when is a reference created to point to an existing memory block, and when is a new memory block allocated and then copy?
Python doesn't really work with "memory blocks" in the same way that C does. It is an abstraction away from that. Whenever you create a new object, and assign it to a variable, you've obviously got memory allocated for that object. But you will never work with that memory directly, you work with references to the objects in that memory.
Those references are the values that get assigned to symbolic names, AKA variables, AKA aliases. "pass-by-reference" is a concept from pointer-based languages like C and C++, and does not apply to Python. There is a blog post which I believe covers this topic the best.
It is often argued whether Python is pass-by-value, pass-by-reference, or pass-by-object-reference. The truth is that it doesn't matter how you think of it, as long as you understand that the entire language specification is just an abstraction for working with names and objects. Java and Ruby have similar execution models, but the Java docs call it pass-by-value while the Ruby docs call it pass-by-reference. The Python docs remain neutral on the subject, so it's best not to speculate and just see things for what they are.
This language feature of Python is different from C. What is the name of this language feature?
Associating names with objects is known as name binding. Allowing multiple names (in potentially multiple scopes) to be bound to the same object is known as aliasing. You can read more about aliasing in the Python tutorial and on Wikipedia.
It might also be helpful for you to read the execution model documentation, where it talks about name binding and scopes in more detail.
In short; Python is pass-by-reference. Objects are created and memory allocated upon their construction. Referencing objects does not allocate more memory unless you are either creating new objects or expanding existing objects (list.append())
This post Is Python pass-by-reference or pass-by-value covers it very well.
As a side note: if you are worried about how memory is allocated in a managed programming language like Python, then you're probably using the wrong language and/or prematurely optimizing. Also, how memory is managed in Python is implementation specific, as there are many implementations of Python: CPython (what you are probably using), Jython, IronPython, PyPy, MicroPython, etc.
Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that the Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In Python, the memory location and size are irrelevant to the programmer. The closest Python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells Python to create an integer object with a value of 1000, if it does not already exist, and bind it to the name 'i'. An object can be bound to multiple names quite easily, e.g.:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand, for example I've always been confused by it even if I've a somewhat accurate understanding of how Python (and cpython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
There are a couple of tricky parts where this simple model of Python semantic seems to fail, for example with list augmented operator += but for that it's important to note that a += b in Python is not the same as a = a + b but it's a special increment operation (that can also be defined for user types with the __iadd__ method; a += b is indeed a = a.__iadd__(b)).
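The difference between a += b and a = a + b for lists can be observed through a second name bound to the same object. This is a small sketch; alias, n and m are illustrative names:

```python
a = [1, 2]
alias = a

a += [3]            # list.__iadd__ extends the list in place
print(alias, a is alias)

a = a + [4]         # builds a new list and rebinds a
print(alias, a, a is alias)

n = 1
m = n
n += 1              # int defines no __iadd__, so this falls back to n = n + 1
print(n, m)
```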
Another important thing to understand is that while in Python all variables are indeed pointers still there is no pointer concept. In other words you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
    x += 1;
}
or in C by
void increment(int *x) {
    *x += 1;
}
in Python cannot be defined because there's no way to pass "a variable", you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
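One possible reading of the "callback closure" idea is sketched below. The helper make_cell and the getter/setter pair are hypothetical names, not part of any library; the point is only that increment receives two callables that know how to read and write one particular place:

```python
def make_cell(value):
    """Hypothetical helper: returns (get, set_) closures over one variable."""

    def get():
        return value

    def set_(new_value):
        nonlocal value
        value = new_value

    return get, set_


def increment(get, set_):
    # Rough counterpart of the C++ increment(int &x), expressed with callbacks.
    set_(get() + 1)


get_x, set_x = make_cell(41)
increment(get_x, set_x)
print(get_x())
```

In everyday code the same effect is more often achieved by returning the new value, or by passing a mutable container that holds it.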
Who said you should? Unless you are discussing issues that are directly related to name binding operations, it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because Python language reference uses it: to be as specific and precise as possible, to help resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type. In Python it isn't; the type of the object that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.