When we look at the following code,
my_var = "Hello World"
id(my_var)
The statement id(my_var) returns the address/location of the string-object "Hello World"
I was wondering if we have any command, with which I can get the address/location of my_var
I am trying to understand memory in Python. For example, in C programming I can get the address of a variable and of a pointer in the following way
int var;
int *var_ptr = &var;
printf("%p", (void *)var_ptr);  /* Prints address of var */
printf("%p", (void *)&var_ptr); /* Prints address of var_ptr */
You can’t, but the reason is so fundamental that I think it is worth posting anyway. In C, a pointer can be formed to any variable, including another pointer. (Another pointer variable, that is: you can write &p, but not &(p+1).) In Python, every variable is a pointer but every pointer is to an object.
A variable, not being an object, cannot be the referent of a pointer. Variables can however be parts of objects, accessed either as o.foo or o[bar] (where bar might be an index or a dictionary key). In fact, every variable is such an object component except a local variable; as a corollary, it is impossible to assign to a local variable from any other (non-nested) function. By contrast, C does that regularly by passing &local to whatever other function.
This distinction is readily illustrated by C++ containers: they typically provide operator[] to return a reference (a pointer that, like a Python reference, is automatically dereferenced) to an element to which = can be applied, whereas the Python equivalent is to provide both __getitem__ (to return a reference) and __setitem__ (which implements []= all at once to store to a variable).
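A minimal sketch of the Python side of that comparison, using a hypothetical Grid class (the class name and its dict storage are assumptions made for illustration). Reading g[key] goes through __getitem__, while g[key] = value goes through __setitem__; nothing returned by __getitem__ can be assigned through with a plain =.

```python
class Grid:
    """Hypothetical container used only to show the two hooks."""

    def __init__(self):
        self._cells = {}
        self.log = []  # records which hook ran, in order

    def __getitem__(self, key):
        self.log.append(("get", key))
        return self._cells[key]      # hands back a reference to the stored object

    def __setitem__(self, key, value):
        self.log.append(("set", key))
        self._cells[key] = value     # stores into the slot: "[]=" in one step


g = Grid()
g["a"] = 1       # dispatches to __setitem__
cell = g["a"]    # dispatches to __getitem__
cell = 99        # rebinds the local name only; the slot in g is untouched
stored = g._cells["a"]
```

After this runs, stored is still 1: assigning to cell never reaches back into the container.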
In CPython’s implementation, of course, each Python variable is a PyObject* variable, and a PyObject** to one can be used for internal purposes, but those are always temporary and do not even conceptually exist at the Python level. As such, there is no equivalent for id for them.
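As a rough illustration of the points above (the names other, namespace, and the result variables are made up for this sketch): id() reports the identity of the object a name refers to, and a module-level variable is just an entry in the module's namespace dictionary rather than something with an identity of its own.

```python
my_var = "Hello World"
other = my_var

# id() identifies the object, so both names report the same identity.
same_object = id(my_var) == id(other)

# A module-level variable is a component of an object: the namespace dict.
namespace = globals()
found_in_namespace = namespace["my_var"] is my_var

# Rebinding the name changes the dict entry, not the string object.
my_var = "Goodbye"
rebound = namespace["my_var"]
still_hello = other
```

There is no call here that returns "the address of my_var" itself; the closest thing is the pair (namespace, "my_var").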
The docs state that:
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module’s namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time
I understand the first part: scopes are determined textually. But what does it mean that the actual search for names is done dynamically at run time? As opposed to what?
Let's try to compare this to what happens in C for instance, as I understand that this is the opposite of what happens in Python.
In C, consider the following code:
int a = 5;
printf("The value of a is: %d\n", a);
So in C, the actual search for names is done at compile time - that means that the compiled machine code for the printf function will contain reference to the memory address of a whereas in Python
a = 5
print(a)
The compiled code of the print(a) will contain instructions for going looking in the namespace dictionary for what is pointed to by a and then access it.
Is that correct?
It means that a name can suddenly start resolving to something else, because it was redefined during the execution of the program. The alternative would be to resolve names when the program is read and parsed, and stick to this interpretation. (Which would be somewhat faster and allow considerable additional optimization, e.g. by "knowing" things about the default behavior of Python built-in functions; but it is not how the language was designed.)
Here's an example that suddenly changes behavior:
for n in range(3):
    print(max([10, 20, 30]))
    max = min
This loop will print what you expect on the first iteration, but from then on the identifier max will refer to the local variable max, and will resolve to the builtin min(). Silly, but realistic use cases are a different question...
As opposed to being done statically at compile-time.
For instance in a language like C or Rust, by default symbols are looked up at compile-time, and at runtime the code just goes to whatever was resolved during compilation.
In Python however, every time you call a function the interpreter will look for that name in the relevant scope(s), then will use whatever's bound to that name at that point in time. Semantically if not necessarily technically.
So if e.g. you swap the object assigned to that name, then the code will call the replacement instead of the original. Even if the replacement is not callable at all.
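A small sketch of that behavior, with made-up function names (helper, caller, replacement): the body of caller only records the name helper, and the lookup happens each time the call runs.

```python
def helper():
    return "original"


def caller():
    # The name "helper" is looked up every time this line executes.
    return helper()


first = caller()


def replacement():
    return "replacement"


helper = replacement   # rebind the global name to a different function
second = caller()

helper = 42            # rebind it to something that is not callable
try:
    caller()
    third = "no error"
except TypeError:
    third = "TypeError"
```

Here first is "original", second is "replacement", and the last call fails with a TypeError because caller tries to call the integer now bound to helper.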
I can't figure out how to change the value of a parameter passed from Python to C.
PyArg_ParseTuple (args, "Os", &file_handle, &filename)
will let me get file_handle as a PyObject *. Is there a way to change the value file_handle represents? I know I can return multiple values to a Python function call, but that isn't what I want to do in this case. Just for consistency with the C API I am making a module to represent.
You can't change what the caller's parameter refers to in the caller, all you can do is perform mutations of the object itself using its API. Basically, you received a copy of the caller's pointer, not a C++-style reference (nor a C-style double pointer that would give you access to a pointer declared in the caller), so you can't reassign the argument in the caller.
In general, you don't want to try to perfectly reproduce C APIs (I'm assuming your C API uses double-pointers to allow reassigning the value in the caller?) in Python APIs. That's how PHP operates, and it makes for terribly inconsistent APIs that often take no advantage of being in a high level language.
This case is doubly-fraught because, when used properly with with statements, file-like objects actually have multiple references (not C++ meaning) to them, the named variable (that was passed to your function) and one or more hidden references held inside the interpreter (to ensure the with statement has a consistent __exit__ to call, even if the caller deletes their own binding for the object). Even if you could somehow reassign the caller's argument, the with statement would still refer to the original file object, and it wouldn't be obvious to the caller that they needed to close (implicitly using with or explicitly calling close) the result again because your function replaced their object.
Return multiple results (Py_BuildValue makes this easy), and the caller can replace their value if they want to.
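From the Python caller's point of view, the pattern might look like the sketch below. The function reopen is a hypothetical pure-Python stand-in for the extension function (the real one would build its result tuple in C), and io.StringIO is used only as a placeholder for a real file.

```python
import io


def reopen(file_handle, filename):
    """Hypothetical stand-in for the extension function.

    Rather than trying to overwrite the caller's variable, it returns a
    status code together with the new handle.
    """
    file_handle.close()
    new_handle = io.StringIO("contents of " + filename)  # placeholder for a real open
    return 0, new_handle


handle = io.StringIO("old contents")
old_handle = handle
status, handle = reopen(handle, "data.txt")  # the caller chooses to rebind its name
```

The rebinding is visible at the call site, so it is clear to the caller that handle now refers to a different object that will need closing.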
I am coding in Python trying to decide whether I should return a numpy array (the result of a diff on some other array) or return numpy.where(diff)[0], which is a smaller array but requires that little extra work to create. Let's call the method where this happens methodB.
I call methodB from methodA. The rub is that I won't necessarily always need the where() result in methodA, but I might. So is it worth doing this work inside methodB, or should I pass back the (much larger memory-wise) diff itself and then only process it further in methodA if needed? That would be the more efficient choice assuming methodA just gets a reference to the result.
So, are function results ever not copied when they are passed back to the code that called that function?
I believe that when methodB finishes, all the memory in its frame will be reclaimed by the system, so methodA has to actually copy anything returned by methodB in to its own frame in order to be able to use it. I would call this "return by value". Is this correct?
Yes, you are correct. In Python, arguments are always passed by value, and return values are always returned by value. However, the value being returned (or passed) is a reference to a potentially shared, potentially mutable object.
There are some types for which the value being returned or passed may be the actual object itself, e.g. this is the case for integers, but the difference between the two can only be observed for mutable objects which integers aren't, and de-referencing an object reference is completely transparent, so you will never notice the difference. To simplify your mental model, you may just assume that arguments and return values are always passed by value (this is true anyhow), and that the value being passed is always a reference (this is not always true, but you cannot tell the difference, you can treat it as a simple performance optimization).
Note that passing / returning a reference by value is in no way similar (and certainly not the same thing) as passing / returning by reference. In particular, it does not allow you to mutate the name binding in the caller / callee, as pass-by-reference would allow you to.
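A short sketch of the difference, with illustrative function names (rebind, mutate): the callee can change the shared object, but it cannot change which object the caller's name is bound to.

```python
def rebind(items):
    items = [0]        # binds the local name to a new list; the caller is unaffected


def mutate(items):
    items.append(4)    # changes the shared object; the caller sees this


data = [1, 2, 3]
rebind(data)
after_rebind = list(data)   # snapshot: [1, 2, 3]
mutate(data)
after_mutate = list(data)   # snapshot: [1, 2, 3, 4]
```

With true pass-by-reference, rebind would have replaced data in the caller; here it has no visible effect.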
This particular flavor of pass-by-value, where the value is typically a reference, is the same in e.g. ECMAScript, Ruby, Smalltalk, and Java, and is sometimes called "call by object sharing" (coined by Barbara Liskov, I believe), "call by sharing", "call by object", and specifically within the Python community "call by assignment" (thanks to @timgeb) or "call by name-binding" (thanks to @Terry Jan Reedy) (not to be confused with call by name, which is again a different thing).
Assignment never copies data. If you have a function foo that returns a value, then an assignment like result = foo(arg) never copies any data. (You could, of course, have copy-operations in the function's body.) Likewise, return x does not copy the object x.
Your question lacks a specific example, so I can't go into more detail.
edit: You should probably watch the excellent Facts and Myths about Python names and values talk.
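One way to check the claim that neither return nor assignment copies is to compare identities. In this sketch the list stands in for a large array, and make_diff, passthrough, and seen_ids are names invented for the example.

```python
seen_ids = []


def make_diff():
    diff = list(range(1000))   # stands in for a large array
    seen_ids.append(id(diff))  # remember the identity inside the function
    return diff                # returns a reference; nothing is copied


def passthrough(x):
    return x


result = make_diff()
same_after_return = id(result) == seen_ids[0]
same_after_pass = passthrough(result) is result
```

Both flags come out True: the object created inside make_diff is the very object the caller ends up holding.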
So roughly your code is:
def methodA(arr):
    x = methodB(arr)
    ...

def methodB(arr):
    diff = somefn(arr)
    # return diff or
    # return np.where(diff)[0]
arr is a (large) array that is passed by reference to methodA and methodB. No copies are made.
diff is a similar-sized array that is generated in methodB. If that is returned, it will be referenced in the methodA namespace by x. No copy is made in returning it.
If the where array is returned, diff disappears when methodB returns. Assuming it doesn't share a data buffer with some other array (such as arr), all the memory that it occupied is recovered.
But as long as memory isn't tight, returning diff instead of the where result won't be more expensive. Nothing is copied during the return.
A numpy array consists of a small object wrapper with attributes like shape and dtype. It also has a pointer to a potentially large data buffer. Where possible numpy tries to share buffers, but readily makes new ndarray objects. Thus there's an important distinction between view and copy.
I see what I missed now: Objects are created on the heap, but function frames are on the stack. So when methodB finishes, its frame will be reclaimed, but that object will still exist on the heap, and methodA can access it with a simple reference.
Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that the Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In Python, the memory location and size are irrelevant to the interpreter. The closest Python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells Python to create an integer object with a value of 1000, if it does not already exist, and bind it to the name 'i'. An object can be bound to multiple names quite easily, e.g.:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand, for example I've always been confused by it even if I've a somewhat accurate understanding of how Python (and cpython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
There are a couple of tricky parts where this simple model of Python semantic seems to fail, for example with list augmented operator += but for that it's important to note that a += b in Python is not the same as a = a + b but it's a special increment operation (that can also be defined for user types with the __iadd__ method; a += b is indeed a = a.__iadd__(b)).
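A brief sketch of that difference (the alias names are only there to observe identity): for lists, += mutates in place and keeps the same object, whereas a = a + b builds a new list and rebinds the name. Tuples define no __iadd__, so += on a tuple falls back to building a new object.

```python
a = [1, 2]
alias_a = a
a += [3]                     # list.__iadd__ extends in place and returns the same list
same_after_iadd = a is alias_a

b = [1, 2]
alias_b = b
b = b + [3]                  # builds a new list and rebinds b
same_after_add = b is alias_b

t = (1, 2)
alias_t = t
t += (3,)                    # no __iadd__ on tuples: equivalent to t = t + (3,)
same_tuple = t is alias_t
```

After this, alias_a sees the appended element while alias_b and alias_t still refer to the original two-element objects.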
Another important thing to understand is that while in Python all variables are indeed pointers still there is no pointer concept. In other words you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
    x += 1;
}
or in C by
void increment(int *x) {
    *x += 1;
}
in Python cannot be defined because there's no way to pass "a variable", you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
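A minimal sketch of the callback-closure approach, assuming Python 3 (it relies on nonlocal). The names increment, get_x, set_x, and demo are invented for the example: the "writable place" is represented by a getter and a setter that close over the variable.

```python
def increment(get, set_):
    """Increment a 'place' described by a getter and a setter callback."""
    set_(get() + 1)


def demo():
    x = 10

    def get_x():
        return x

    def set_x(value):
        nonlocal x      # lets the closure rebind demo's local variable
        x = value

    increment(get_x, set_x)
    return x


result = demo()
```

Here result is 11. The function increment never receives x itself, only two callables that know how to read and rebind it.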
Who said you should? Unless you are discussing issues that are directly related to name binding operations, it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because Python language reference uses it: to be as specific and precise as possible, to help resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type; in Python it isn't, and the type of the object that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.
I am trying to write a program to illustrate to A level students the difference between call by reference and call by value using Python. I had succeeded by passing mutable objects as variables to functions, but found I could also do the same using the ctypes library.
I don't quite understand how it works because there is a function byref() in the ctype library, but it didn't work in my example. However, by calling a function without byref() it did work!
My working code:
"""
Program to illustrate call by ref
"""
from ctypes import *  # allows call by ref

test = c_int(56)  # Python call by reference eg address
t = 67  # Python call by value eg copy

# expects a ctypes argument
def byRefExample(x):
    x.value = x.value + 2

# expects a normal Python variable
def byValueExample(x):
    x = x + 2

if __name__ == "__main__":
    print("Before call test is", test)
    byRefExample(test)
    print("After call test is", test)
    print("Before call t is", t)
    byValueExample(t)
    print("After call t is", t)
Question
When passing a normal Python variable to byValueExample() it works as expected. The copy of the function argument t changes but the variable t in the header does not. However, when I pass the ctypes variable test both the local and the header variable change, thus it is acting like a C pointer variable. Although my program works, I am not sure how and why the byref() function doesn't work when used like this:
byRefExample(byref(test))
You're actually using terminology that's not exactly correct, and potentially very misleading. I'll explain at the end. But first I'll answer in terms of your wording.
I had succeeded by passing mutable objects as variables to functions but found I could also do the same using the ctypes library.
That's because those ctypes objects are mutable objects, so you're just doing the same thing you already did. In particular, a ctypes.c_int is a mutable object holding an integer value, which you can mutate by setting its value member. So you're already doing the exact same thing you'd done without ctypes.
In more detail, compare these:
def by_ref_using_list(x):
    x[0] += 1

value = [10]
by_ref_using_list(value)
print(value[0])

def by_ref_using_dict(x):
    x['value'] += 1

value = {'value': 10}
by_ref_using_dict(value)
print(value['value'])

class ValueHolder(object):
    def __init__(self, value):
        self.value = value

def by_ref_using_int_holder(x):
    x.value += 1

value = ValueHolder(10)
by_ref_using_int_holder(value)
print(value.value)
You'd expect all three of those to print out 11, because they're just three different ways of passing different kinds of mutable objects and mutating them.
And that's exactly what you're doing with c_int.
You may want to read the FAQ How do I write a function with output parameters (call by reference)?, although it seems like you already know the answers there, and just wanted to know how ctypes fits in…
So, what is byref even for, then?
It's used for calling a C function that takes values by reference C-style: by using explicit pointer types. For example:
void by_ref_in_c(int *x) {
    *x += 1;
}
You can't pass this a c_int object, because it needs a pointer to a c_int. And you can't pass it an uninitialized POINTER(c_int), because then it's just going to be writing to random memory. You need to get the pointer to an actual c_int. Which you can do like this:
x = c_int(10)
xp = pointer(x)
by_ref_in_c(xp)
print(x)
That works just fine. But it's overkill, because you've created an extra Python ctypes object, xp, that you don't really need for anything. And that's what byref is for: it gives you a lightweight pointer to an object, that can only be used for passing that object by reference:
x = c_int(10)
by_ref_in_c(byref(x))
print(x)
And that explains why this doesn't work:
byRefExample(byref(test))
That call is making a lightweight pointer to test, and passing that pointer to byRefExample. But byRefExample doesn't want a pointer to a c_int, it wants a c_int.
Of course this is all in Python, not C, so there's no static type checking going on. The function call works just fine, and your code doesn't care what type it gets, so long as it has a value member that you can increment. But the object returned by byref doesn't have a value member. (Neither does a full POINTER object, which has a contents member instead.) So, you get an AttributeError trying to access x.value.
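A short check of the two ctypes helpers discussed here, done entirely in Python with no C library loaded (the variable names are illustrative). It shows a full pointer() object writing through to the original c_int, and confirms that the object byref() returns exposes no value attribute.

```python
from ctypes import c_int, pointer, byref

x = c_int(10)

p = pointer(x)           # a full ctypes pointer object
p.contents.value += 1    # writes through the pointer into x's storage

ref = byref(x)           # lightweight argument object meant for foreign calls
has_value = hasattr(ref, "value")
```

After this, x.value is 11 and has_value is False, which is why byRefExample(byref(test)) fails as soon as it touches x.value.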
So, how do you do this kind of thing?
Well, using a single-element-list is a well-known hack to get around the fact that you need to share something mutable but you only have something immutable. If you use it, experienced Python programmers will know what you're up to.
That being said, if you think you need this, you're usually wrong. Often the right answer is to just return the new value. It's easier to reason about functions that don't mutate anything. You can string them together in any way you want, turn them inside-out with generators and iterators, ship them off to child processes to take advantage of those extra cores in your CPU, etc. And even if you don't do any of that stuff, it's usually faster to return a new value than to modify one in-place, even in cases where you wouldn't expect that (e.g., deleting 75% of the values in a list).
And often, when you really do need mutable values, there's already an obvious place for them to live, such as instance attributes of a class.
But sometimes you do need the single-element list hack, so it's worth having in your repertoire; just don't use it when you don't need it.
So, what's wrong with your terminology?
In a sense (the sense Ruby and Lisp programmers use), everything in Python is pass-by-reference. In another sense (the sense many Java and VB programmers use), it's all pass-by-value. But really, it's best to not call it either.* What you're passing is neither a copy of the value of a variable, nor a reference to a variable, but a reference to a value. When you call that byValueExample(t) function, you're not passing a new integer with the value 67 the way you would in C, you're passing a reference to the same integer 67 that's bound to the name t. If you could mutate 67 (you can't, because ints are immutable), the caller would see the change.
Second, Python names are not even variables in the sense you're thinking of. In C, a variable is an lvalue. It has a type and, more importantly, an address. So, you can pass around a reference to the variable itself, rather than to its value. In Python, a name is just a name (usually a key in a module, local, or object dictionary). It doesn't have a type or an address. It's not a thing you can pass around. So, there is no way to pass the variable x by reference.**
Finally, = in Python isn't an assignment operator that copies a value to a variable; it's a binding operator that gives a value a name. So, in C, when you write x = x + 1, that copies the value x + 1 to the location of the variable x, but in Python, when you write x = x + 1, that just rebinds the local variable x to refer to the new value x + 1. That won't have any effect on whatever value x used to be bound to. (Well, if it was the only reference to that value, the garbage collector might clean it up… but that's it.)
This is actually a lot easier to understand if you're coming from C++, which really forces you to understand rvalues and lvalues and different kinds of references and copy construction vs. copy assignment and so on… In C, it's all deceptively simple, which makes it harder to realize how very different it is from the equally-simple Python.
* Some people in the Python community like to call it "pass-by-sharing". Some researchers call it "pass-by-object". Others choose to first differentiate between value semantics and reference semantics, before describing calling styles, so you can call this "reference-semantics pass-by-copy". But, while at least those names aren't ambiguous, they also aren't very well known, so they're not likely to help anyone. I think it's better to describe it than to try to figure out the best name for it…
** Of course, because Python is fully reflective, you can always pass the string x and the context in which it's found, directly or indirectly… If your byRefExample did globals()['x'] = x + 2, that would affect the global x. But… don't do that.
Python uses neither "call-by-reference" nor "call-by-value" but "call-by-object". Assignment gives names to objects.
test = c_int(56)
t = 67
test is a name given to a ctypes.c_int object that internally has a value name assigned to an int object.
t is a name given to an int object.
When calling byRefExample(test), x is another name given to the ctypes.c_int object referenced by test.
x.value = x.value + 2
The above reassigns the 'value' name stored in the ctypes.c_int object to a completely new int object with a different value. Since value is an attribute of the same ctypes.c_int object referred by the names test and x, x.value and test.value are referring to the same value.
When calling byValueExample(t), x is another name given to the int object referenced by t.
x = x + 2
The above reassigns the name x to a completely new int object with a different value. x and t no longer refer to the same object, so t will not observe the change. It still refers to the original int object.
You can observe this by printing the id() of the objects at different points in time:
from ctypes import *

test = c_int(56)
t = 67

print('test id =', id(test))
print('t id =', id(t))

# expects a ctypes argument
def byRefExample(x):
    print('ByRef x', x, id(x))
    print('ByRef x.value', x.value, id(x.value))
    x.value = x.value + 2
    print('ByRef x.value', x.value, id(x.value))
    print('ByRef x', x, id(x))

# expects a normal Python variable
def byValueExample(x):
    print('ByVal x', x, id(x))
    x = x + 2
    print('ByVal x', x, id(x))

print("Before call test is", test, id(test))
print("Before call test.value is", test.value, id(test.value))
byRefExample(test)
print("After call test.value is", test.value, id(test.value))
print("After call test is", test, id(test))
print("Before call t is", t, id(t))
byValueExample(t)
print("After call t is", t, id(t))
Output (with comments):
test id = 80548680
t id = 507083328
Before call test is c_long(56) 80548680
Before call test.value is 56 507082976
ByRef x c_long(56) 80548680 # same id as test
ByRef x.value 56 507082976
ByRef x.value 58 507083040 # x.value is new object!
ByRef x c_long(58) 80548680 # but x is still the same.
After call test.value is 58 507083040 # test.value sees new object because...
After call test is c_long(58) 80548680 # test is same object as x.
Before call t is 67 507083328
ByVal x 67 507083328 # same id as t
ByVal x 69 507083392 # x is new object!
After call t is 67 507083328 # t is still the same old object.