How to change function parameter values in a Python 3 C extension? - python

I can't figure out how to change the value of a parameter passed from Python to C.
PyArg_ParseTuple (args, "Os", &file_handle, &filename)
will let me get file_handle as a PyObject *. Is there a way to change the value file_handle represents? I know I can return multiple values to a Python function call, but that isn't what I want to do in this case. Just for consistency with the C API I am making a module to represent.

You can't change what the caller's parameter refers to in the caller, all you can do is perform mutations of the object itself using its API. Basically, you received a copy of the caller's pointer, not a C++-style reference (nor a C-style double pointer that would give you access to a pointer declared in the caller), so you can't reassign the argument in the caller.
In general, you don't want to try to perfectly reproduce C APIs (I'm assuming your C API uses double-pointers to allow reassigning the value in the caller?) in Python APIs. That's how PHP operates, and it makes for terribly inconsistent APIs that often take no advantage of being in a high level language.
This case is doubly-fraught because, when used properly with with statements, file-like objects actually have multiple references (not C++ meaning) to them, the named variable (that was passed to your function) and one or more hidden references held inside the interpreter (to ensure the with statement has a consistent __exit__ to call, even if the caller deletes their own binding for the object). Even if you could somehow reassign the caller's argument, the with statement would still refer to the original file object, and it wouldn't be obvious to the caller that they needed to close (implicitly using with or explicitly calling close) the result again because your function replaced their object.
Return multiple results (Py_BuildValue makes this easy), and the caller can replace their value if they want to.

Related

What does it mean that "actual search of names is done at runtime" in python?

The docs states that:
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module’s namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time
I understand the first part: scopes are determined textually. But what does it mean that the actual search for names is done dynamically at run time? As opposed to what?
Let's try to compare this to what happens in C for instance, as I understand that this is the opposite of what happens in Python.
In C, consider the following code:
int a = 5
printf("The value of a is: %d\n", a);
So in C, the actual search for names is done at compile time - that means that the compiled machine code for the printf function will contain reference to the memory address of a whereas in Python
a = 5
print(a)
The compiled code of the print(a) will contain instructions for going looking in the namespace dictionary for what is pointed to by a and then access it.
Is that correct?
It means that a name can suddenly start resolving to something else, because it was redefined during the execution of the program. The alternative would be to resolve names when the program is read and parsed, and stick to this interpretation. (Which would be somewhat faster allow considerable additional optimization, e.g. by "knowing" things about the default behavior of Python built-in functions; but it is not how the language was designed.)
Here's an example that suddenly changes behavior:
for n in range(3):
print(max([10, 20, 30]))
max = min
This loop will print what you expect on the first iteration, but from then on the identifier max will refer to the local variable max, and will resolve to the builtin min(). Silly, but realistic use cases are a different question...
As opposed to being done statically at compile-time.
For instance in a language like C or Rust, by default symbols are looked up at compile-time, and at runtime the code just goes to whatever was resolved during compilation.
In Python however, every time you call a function the interpreter will look for that name in the relevant scope(s), then will use whatever's bound to that name at that point in time. Semantically if not necessarily technically.
So if e.g. you swap the object assigned to that name, then the code will call the remplacement instead of the original. Even if the replacement is not callable at all.

how to get memory location of a variable in Python

When we look at the following code,
my_var = "Hello World"
id(my_var)
The statement id(my_var) returns the address/location of the string-object "Hello World"
I was wondering if we have any command, with which I can get the address/location of my_var
I am trying to understand memory in Python. For example, in C-programming I can get the address of variable and pointer in following way
int var;
int *var_ptr = &var;
printf ("%d", var_ptr); % Prints address of var
printf ("%d", &var_ptr); % Prints address of var_ptr
You can’t, but the reason is so fundamental that I think it worth posting anyway. In C, a pointer can be formed to any variable, including another pointer. (Another pointer variable, that is: you can write &p, but not &(p+1).) In Python, every variable is a pointer but every pointer is to an object.
A variable, not being an object, cannot be the referent of a pointer. Variables can however be parts of objects, accessed either as o.foo or o[bar] (where bar might be an index or a dictionary key). In fact, every variable is such an object component except a local variable; as a corollary, it is impossible to assign to a local variable from any other (non-nested) function. By contrast, C does that regularly by passing &local to whatever other function.
This distinction is readily illustrated by C++ containers: they typically provide operator[] to return a reference (a pointer that, like a Python reference, is automatically dereferenced) to an element to which = can be applied, whereas the Python equivalent is to provide both __getitem__ (to return a reference) and __setitem__ (which implements []= all at once to store to a variable).
In CPython’s implementation, of course, each Python variable is a PyObject* variable, and a PyObject** to one can be used for internal purposes, but those are always temporary and do not even conceptually exist at the Python level. As such, there is no equivalent for id for them.

Are all objects returned by value rather than by reference?

I am coding in Python trying to decide whether I should return a numpy array (the result of a diff on some other array) or return numpy.where(diff)[0], which is a smaller array but requires that little extra work to create. Let's call the method where this happens methodB.
I call methodB from methodA. The rub is that I won't necessarily always need the where() result in methodA, but I might. So is it worth doing this work inside methodB, or should I pass back the (much larger memory-wise) diff itself and then only process it further in methodA if needed? That would be the more efficient choice assuming methodA just gets a reference to the result.
So, are function results ever not copied when they are passed back the the code that called that function?
I believe that when methodB finishes, all the memory in its frame will be reclaimed by the system, so methodA has to actually copy anything returned by methodB in to its own frame in order to be able to use it. I would call this "return by value". Is this correct?
Yes, you are correct. In Python, arguments are always passed by value, and return values are always returned by value. However, the value being returned (or passed) is a reference to a potentially shared, potentially mutable object.
There are some types for which the value being returned or passed may be the actual object itself, e.g. this is the case for integers, but the difference between the two can only be observed for mutable objects which integers aren't, and de-referencing an object reference is completely transparent, so you will never notice the difference. To simplify your mental model, you may just assume that arguments and return values are always passed by value (this is true anyhow), and that the value being passed is always a reference (this is not always true, but you cannot tell the difference, you can treat it as a simple performance optimization).
Note that passing / returning a reference by value is in no way similar (and certainly not the same thing) as passing / returning by reference. In particular, it does not allow you to mutate the name binding in the caller / callee, as pass-by-reference would allow you to.
This particular flavor of pass-by-value, where the value is typically a reference is the same in e.g. ECMAScript, Ruby, Smalltalk, and Java, and is sometimes called "call by object sharing" (coined by Barbara Liskov, I believe), "call by sharing", "call by object", and specifically within the Python community "call by assignment" (thanks to #timgeb) or "call by name-binding" (thanks to #Terry Jan Reedy) (not to be confused with call by name, which is again a different thing).
Assignment never copies data. If you have a function foo that returns a value, then an assignment like result = foo(arg) never copies any data. (You could, of course, have copy-operations in the function's body.) Likewise, return x does not copy the object x.
Your question lacks a specific example, so I can't go into more detail.
edit: You should probably watch the excellent Facts and Myths about Python names and values talk.
So roughly your code is:
def methodA(arr):
x = methodB(arr)
....
def methodB(arr):
diff = somefn(arr)
# return diff or
# return np.where(diff)[0]
arr is a (large) array, that is passed a reference to methodA and methodB. No copies are made.
diff is a similar size array that is generated in methodB. If that is returned, it be referenced in the methodA namespace by x. No copy is made in returning it.
If the where array is returned, diff disappears when methodB returns. Assuming it doesn't share a data buffer with some other array (such as arr), all the memory that it occupied is recovered.
But as long as memory isn't tight, returning diff instead of the where result won't be more expensive. Nothing is copied during the return.
A numpy array consists of small object wrapper with attributes like shape and dtype. It also has a pointer to a potentially large data buffer. Where possible numpy tries to share buffers, but readily makes new ndarray objects. Thus there's an important distinction between view and copy.
I see what I missed now: Objects are created on the heap, but function frames are on the stack. So when methodB finishes, its frame will be reclaimed, but that object will still exist on the heap, and methodA can access it with a simple reference.

What happens to a Python input value if it's not assigned to a variable?

Was just wondering this. So sometimes programmers will insert an input() into a block of code without assigning its value to anything for the purpose of making the program wait for an input before continuing. Usually when it runs, you're expected to just hit enter without typing anything to move forward, but what if you do type something? What happens to that string if its not assigned to any variable? Is there any way to read its value after the fact?
TL;DR: If you don't immediately assign the return value of input(), it's lost.
I can't imagine how or why you would want to retrieve it afterwards.
If you have any callable (as all callables have return values, default is None), call it and do not save its return value, there's no way to get that again. You have one chance to capture the return value, and if you miss it, it's gone.
The return value gets created inside the callable of course, the code that makes it gets run and some memory will be allocated to hold the value. Inside the callable, there's a variable name referencing the value (except if you're directly returning something, like return "unicorns".upper(). In that case there's of course no name).
But after the callable returns, what happens? The return value is still there and can be assigned to a variable name in the calling context. All names that referenced the value inside the callable are gone though. Now if you don't assign the value to a name in your call statement, there are no more names referencing it.
What does that mean? It's gets on the garbage collector's hit list and will be nuked from your memory on its next garbage collection cycle. Of course the GC implementation may be different for different Python interpreters, but the standard CPython implementation uses reference counting.
So to sum it up: if you don't assign the return value a name in your call statement, it's gone for your program and it will be destroyed and the memory it claims will be freed up any time afterwards, as soon as the GC handles it in background.
Now of course a callable might do other stuff with the value before it finally returns it and exits. There are a few possible ways how it could preserve a value:
Write it to an existing, global variable
Write it through any output method, e.g. store it in a file
If it's an instance method of an object, it can also write it to the object's instance variables.
But what for? Unless there would be any benefit from storing the last return value(s), why should it be implemented to hog memory unnecessarily?
There are a few cases where caching the return values makes sense, i.e. for functions with determinable return values (means same input always results in same output) that are often called with the same arguments and take long to calculate.
But for the input function? It's probably the least determinable function existing, even if you call random.random() you can be more sure of the result than when you ask for user input. Caching makes absolutely no sense here.
The value is discarded. You can't get it back. It's the same as if you just had a line like 2 + 2 or random.rand() by itself; the result is gone.

How to bind a name with multiple objects or values in python

I saw in a book about language description that says
On the other hand, a name can be bound to no object (a dangling pointer),
one object (the usual case), or several objects (a parameter name in a
recursive function).
How can we bind a name to several objects? Isnt that what we call an array for example where all elements have the same name but with index? For a recursive function like the example here:
x = 0
def f(y):
global x
x += 1
if x < 4 :
y +=100
f(y)
else: return
f(100)
Is the name y binded with multiple values that are created recursively since the nametable has already the y name binded to an initial value which is being reproduced with recursion?
EDITED Just press here Visualizer and see what it generates. :)
No.
A name is bound to one single object . When we are talking about Python - it is either bound to a single object in a given context, or do not exist at all.
What happens, is that the inner workings may have the name defined in several "layers" - but your code will only see one of those.
If a name is a variable in a recursive function, you will only see whatver is bound to it in the current running context - each time there is a function call in Python, the execution frame, which is an object which holds several attributes of the running code, including a reference to the local variables, is frozen. On the called function, a new execuciton frame is created, and there, the variable names are bound again to whatever new values they have in the called context. Your code just "see" this instance.
Then, there is the issue of global variables and builtin objects in Python: if a name is not a local variable in the function execution context, it is searched in the globals variables for the module (again, just one of those will be visible).ANd if the name is not defiend in the globals, them, Python looks for it in globals().__builtins__ that is your last call.
If I understand you correctly, you're asking about what rules Python has for creating variables in different scopes. Python uses lexical scoping on the function level.
It's hard to tell exactly what you're getting at with the code you've written, but, while there may be a different value associated with y in different scopes (with a value of y defined at each level of recursion), your code will only ever be able to see one at a time (the value defined at the scope in which you're operating).
To really understand scoping rules in Python, I would have a look at PEP 227. Also, have a look at this Stack Overflow question.
Finally, to be able to speak intelligently about what a "name" is in Python, I suggest you read about how Python is a "Call-By-Object" language.
At this point, we are capable of understanding that, instead of a "nametable", python uses a dictionary to hold what is accessible in a given scope. See this answer for a little more detail. The implication of this is that you can never have two of the same name in a single scope (for the same reason you can't have two of the same key in a python dictionary). So, while y may exist in a dictionary for a different scope, you have no way of accessing it, since you can only access the variables in the current scope's dictionary.
The key is:
several objects (a parameter name in a recursive function).
The passage is almost certainly not referring to arrays, but simply to the fact that in a recursive function (or any function, but a recursive function is likely to have multiple activations at one time), a parameter may be bound to a different value in each recursive call.
This does not mean that you can access each such object in every stack frame; indeed the point of the technique is to ensure that only one such value is accessible in each stack frame.
Firstly, you should mention in the question that the sentence from the book is not related explicitly to Python (as jsbueno wrote, one name is bound to exactly one object in Python).
Anyway, name bound to no object is a bit inaccurate. Generally, names are related to variables, and name related to a dangling pointer is the name of that pointer variable.
When speaking about the variable scope (i.e. the part of code where the variable is used), one variable name can be used only for a single value at a time. However, there may be other parts of code, independent on the one where we think about that variable. In the other part of code, the same name can be used; however, the two variables with the same name are totally isolated. This is the case of local variables also in the case of function bodies. If the language allows recursion, it must be capable to create another isolated space of local variable even for another call of the same function.
In Python, each function can also access outer variables, but it is more usual to use the inner, local variables. Whenever you assign a name some value, it is created in the local space.

Categories

Resources