My apologies for what some may regard as a fundamental question. In the following simple code:
def greet(name):
    def say_hi():
        print('Preparing to greet...')
        print('Hi', name, '!')
        print('Greeting given.')
    return say_hi
What is the sequence of events when greet is called with an argument and the interpreter encounters the say_hi function? I see that a reference to it is returned (forming a closure, I assume), but is the inner function executed, or simply 'read' and not called until the programmer writes code like the following?
f = greet('Caroline')
f()
Since almost everything in Python happens at runtime (apart from compile-time tasks such as the peephole optimizer), Python doesn't call your function unless you call it.
You can see this behavior by using the dis function from the dis module, which shows the corresponding bytecode of your function:
>>> def greet(name):
...     def say_hi():
...         print('Preparing to greet...')
...         print('Hi', name, '!')
...         print('Greeting given.')
...     return say_hi
...
>>> import dis
>>>
>>> dis.dis(greet)
2 0 LOAD_CLOSURE 0 (name)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object say_hi at 0x7fdacc12c8b0, file "<stdin>", line 2>)
9 MAKE_CLOSURE 0
12 STORE_FAST 1 (say_hi)
6 15 LOAD_FAST 1 (say_hi)
18 RETURN_VALUE
As you can see, Python just loads the body of say_hi as a code object constant (LOAD_CONST) and builds a closure around it (MAKE_CLOSURE); the body itself is never executed here.
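To confirm that nothing runs until the call, here is what an interactive session would look like (the output shown is what CPython would print, given the greet definition above):
>>> f = greet('Caroline')   # no output: say_hi is only defined, not executed
>>> f()                     # now the body actually runs
Preparing to greet...
Hi Caroline !
Greeting given.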
I'm trying to learn about lambda functions in Python by messing with the code to observe what errors Python throws in various scenarios.
Can anyone please explain what exactly the following message displayed by IDLE means and how to interpret it?
def myfunc(n):
return lambda a:a*n
myfunc(2)
error:
<function myfunc.<locals>.<lambda> at 0x037800B8>
Your function returns a lambda function. This isn't an error message; IDLE is just printing the repr of the lambda function object.
I'll assume you mis-typed your indentation, and that the code snippet was
def myfunc(n):
    return lambda a: a*n

myfunc(2)
The lambda expression here creates a function computing a -> a*n. So myfunc is essentially a function that returns another function.
Your code is legitimate, and Python's output is just the description of the object that myfunc(2) returned. Let's break it down:
the leading function confirms that you get a function
the myfunc.<locals>.<lambda> part is the name of the object. It explains that this function was created inside myfunc, using some of myfunc's parameters (note that you use n in the lambda definition). This is called a closure.
the 0x037800B8 is the memory address this Python object is stored at
You should interpret this as a function object or a functor.
myfunc.<locals>.<lambda> just says that it is local to myfunc and that it is a lambda function.
myfunc(2) # is a function object
print(myfunc(2)(3)) # prints 6 which is 3 * 2
The returned function remembers the value n = 2 and uses it when called. This concept is called a closure.
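You can even see the stored n by peeking at the closure cell (a quick check, assuming myfunc as defined above):
g = myfunc(2)
print(g.__closure__[0].cell_contents)  # 2: the value of n captured by the lambda
print(g(3))  # 6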
You can get into more detail by disassembling it and diving deeper, if you want.
from dis import dis
def myfunc(n):
    return lambda a: a*n
dis(myfunc)
OUTPUT
➜ codebase git:(master) ✗ python temp.py
3 0 LOAD_CLOSURE 0 (n)
2 BUILD_TUPLE 1
4 LOAD_CONST 1 (<code object <lambda> at 0x7fc5f671f5d0, file "temp.py", line 3>)
6 LOAD_CONST 2 ('myfunc.<locals>.<lambda>')
8 MAKE_FUNCTION 8
10 RETURN_VALUE
Disassembly of <code object <lambda> at 0x7fc5f671f5d0, file "temp.py", line 3>:
3 0 LOAD_FAST 0 (a)
2 LOAD_DEREF 0 (n)
4 BINARY_MULTIPLY
6 RETURN_VALUE
This means that your command
myfunc(2)
returned a function (namely, a lambda function in myfunc scope). 0x037800B8 is its address in memory.
Looks like you're returning the lambda function itself rather than its result. To return the lambda function's value, call it immediately by surrounding it with parentheses, like so:
>>> def myfunc(n):
...     return (lambda a: a*n)(n)
...
>>> myfunc(2)
4
You are simply returning a reference to a function. Lambdas exist to make short functions quick to write, avoiding the boilerplate of a full "def function(): ..." statement.
Try looking at the following tutorial to get the basics:
https://www.w3schools.com/python/python_lambda.asp
I'm trying to use types.MethodType to modify the behaviour of some iterator.
import types
import _io

def parse(line):
    return line.upper()

def reader(f):
    f.__next__ = types.MethodType(lambda x: parse(_io.TextIOWrapper.readline(x)), f)
    f.__iter__ = types.MethodType(lambda x: x, f)
    return f
I guess I'm using types.MethodType correctly, because running the following code I get the expected result:
>>> with open("myfile.txt") as f:
...     x = reader(f)
...     print(f.__next__())
NORMAL LINE
However, as soon as I use a for loop, it seems that the parse() function is not called.
>>> with open("myfile.txt") as f:
...     for line in reader(f):
...         print(line)
normal line
It's as if the for-loop was using the original next() method of my object instead of the overwritten one.
What am I missing here? I know I could achieve the same results in a simpler way, for instance yielding parsed lines in reader(), but I would really prefer to return this 'decorated' file object instead.
Thanks in advance.
There is a huge difference between your two examples. In the first one you are calling the __next__ method explicitly, while in the latter you are letting the iterator protocol call it for you. In fact you can see that even in the first case the behaviour is not what you wanted:
In [5]: with open('myfile.txt') as f:
...: print(next(reader(f))) # next here calls the original implementation!
normal line
In [6]: with open('myfile.txt') as f:
...: print(reader(f).__next__())
NORMAL LINE
You can see what the interpreter is doing by checking the bytecode using the dis module.
For example:
In [8]: import dis
In [9]: def f():
...: for x in iterable:
...: pass
In [10]: dis.dis(f)
2 0 SETUP_LOOP 14 (to 17)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 6 (to 16)
10 STORE_FAST 0 (x)
3 13 JUMP_ABSOLUTE 7
>> 16 POP_BLOCK
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
Note how there is a call to GET_ITER, but no call to LOAD_ATTR. If you explicitly mention the attribute however:
In [11]: def f():
...: for x in iterable.__iter__():
...: pass
In [12]: dis.dis(f)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (iterable)
6 LOAD_ATTR 1 (__iter__)
9 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (x)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
Note that LOAD_ATTR bytecode.
When you see a LOAD_ATTR bytecode it means that the interpreter is going to perform a full-blown attribute lookup on the instance (and thus finds the attribute you just set).
However bytecodes like GET_ITER perform a special method lookup, which avoids the instance attribute lookup.
When the interpreter invokes the special methods as a result of a statement, it does not look them up on the instance, but on the class. This means that it will not see the __iter__ attribute you just created.
This is documented in some places. For example under object.__getattribute__, which is the method used to implement attribute lookups, there is a note:
Note: This method may still be bypassed when looking up special
methods as the result of implicit invocation via language syntax or
built-in functions. See Special method lookup.
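A minimal illustration of this class-level lookup, using a hypothetical class C just to show the effect:
class C:
    def __iter__(self):
        return iter([1, 2, 3])

c = C()
c.__iter__ = lambda: iter([4, 5, 6])  # instance attribute, ignored by the protocol
print(list(c))             # [1, 2, 3]: iter() looks on type(c), not the instance
print(list(c.__iter__()))  # [4, 5, 6]: explicit attribute access does find it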
AFAIK, since file objects are implemented in C, you cannot modify the attributes of their class, so you simply cannot achieve what you wanted.
However it's extremely easy to simply create a new wrapper class:
class Wrapper:
    def __init__(self, fobj):
        self.fobj = fobj

    def __iter__(self):
        return self

    def __next__(self):
        return parse(next(self.fobj))
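Used like this, it gives the behavior you wanted from the for loop (a quick sketch, assuming the parse function and myfile.txt from the question):
with open("myfile.txt") as f:
    for line in Wrapper(f):
        print(line)  # NORMAL LINE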
An alternative would be to create a subclass of the file type. In Python 3 this is a bit complex because you have to subclass io.TextIOWrapper, whose constructor takes a buffer instead of a filename, so it's slightly more involved than in Python 2.
However if you did create a subclass it would work fine. There may be some problems when you pass its instances to some functions, which may decide to call the original file methods, however the interpreter itself would call the __next__ and __iter__ methods that you defined.
Modifying methods of instances looks tricky to me; I would try to avoid that. If you just need some kind of pre-processing of the text file, you can do it in a separate generator function, like:
def preprocess(f):
    for l in f:
        yield parse(l)

with open("myfile.txt") as f:
    for line in preprocess(f):
        print(line)
First of all, let me say that I read the many threads with similar topics on creating dynamically named variables, but they mostly relate to Python 2 or they assume you are working with classes. And yes, I read Behavior of exec function in Python 2 and Python 3.
I'm also aware that creating dynamically named variables is a bad idea 99% of the time and that dictionaries are the way to go, but I just want to know whether it is still possible, and how exactly exec and locals work in Python 3.
I'd like to show a bit of sample code illustrating my question (fibonacci calculates Fibonacci numbers, ListOfLetters provides ["A", "B", ...]):
def functionname():
    for index, buchstabe in enumerate(ListOfLetters.create_list("A", "K"), 1):
        exec("{} = {}".format(buchstabe, fibonacci(index)))  # A = 1, B = 1, C = 2, D = 3, E = 5, ...
        print(index, buchstabe, eval(buchstabe))  # works nicely, e.g. prints "4 D 3"
    print(locals())  # prints all locals: {'B': 1, 'A': 1, 'index': 11, 'C': 2, 'H': 21, 'K': 89, ...
    print(locals()['K'])  # prints 89 as it should
    print(eval("K"))  # prints 89 as it should
    print(K)  # NameError: name 'K' is not defined
So at least on my current understanding, there is some inconsistency in the behaviour of locals(), since it contains the variable names added by exec() but those variables are not available in the function.
I would be grateful if someone could explain this and tell me whether it is by design or a real inconsistency in the language. Yes, I know that locals() should not be modified, but I'm not modifying it, I'm calling exec()...
When you're not sure why something works the way it does in Python, it can often help to put the behavior that confuses you in a function and then disassemble its bytecode with the dis module.
Let's start with a simpler version of your code:
def foo():
    exec("K = 89")
    print(K)
If you run foo(), you'll get the same exception you're seeing with your more complicated function:
>>> foo()
Traceback (most recent call last):
File "<pyshell#167>", line 1, in <module>
foo()
File "<pyshell#166>", line 3, in foo
print(K)
NameError: name 'K' is not defined
Let's disassemble it and see why:
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_GLOBAL 0 (exec)
3 LOAD_CONST 1 ('K = 89')
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
3 10 LOAD_GLOBAL 1 (print)
13 LOAD_GLOBAL 2 (K)
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 POP_TOP
20 LOAD_CONST 0 (None)
23 RETURN_VALUE
The operation that you need to pay attention to is the one labeled "13". This is where the compiler handles looking up K within the last line of the function (print(K)). It uses the LOAD_GLOBAL opcode, which fails because "K" is not a global variable name; rather, it's a key in our locals() dict (added by the exec call).
What if we persuaded the compiler to see K as a local variable (by giving it a value before running the exec), so it will know not to look for a global variable that doesn't exist?
def bar():
    K = None
    exec("K = 89")
    print(K)
This function won't give you an error if you run it, but you won't get the expected value printed out:
>>> bar()
None
Let's disassemble it to see why:
>>> dis.dis(bar)
2 0 LOAD_CONST 0 (None)
3 STORE_FAST 0 (K)
3 6 LOAD_GLOBAL 0 (exec)
9 LOAD_CONST 1 ('K = 89')
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
15 POP_TOP
4 16 LOAD_GLOBAL 1 (print)
19 LOAD_FAST 0 (K)
22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
25 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Note the opcodes used at "3" and "19". The Python compiler uses STORE_FAST and LOAD_FAST to put the value for the local variable K into slot 0 and later fetch it back out. Using numbered slots is significantly faster than inserting and fetching values from a dictionary like locals(), which is why the Python compiler does it for all local variable access in a function. You can't overwrite a local variable in a slot by modifying the dictionary returned by locals() (as exec does, if you don't pass it a dict to use for its namespace).
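A quick demonstration of that point (a minimal sketch; this is CPython-specific behavior):
def demo():
    x = 1
    locals()['x'] = 99  # writes into a snapshot dict, not the fast-slot storage
    print(x)            # still 1

demo()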
Indeed, let's try a third version of our function, where we peek into locals() again once we have K defined as a regular local variable:
def baz():
    K = None
    exec("K = 89")
    print(locals())
You won't see 89 in the output this time either!
>>> baz()
{"K": None}
The reason you see the old K value in locals() is explained in the function's documentation:
Update and return a dictionary representing the current local symbol table.
The slot that the local variable K's value is stored in was not changed by the exec call, which only modifies the locals() dict. When you call locals() again, Python "update[s]" the dictionary with the value from the slot, replacing the value stored there by exec.
This is why the docs go on to say:
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
Your exec call is modifying the locals() dict, and you're finding how its changes are not always seen by your later code.
On the exec/eval/locals question
At least on the CPython implementation modifications to the locals() dictionary do not actually change the names in the local scope, which is why it's meant to be used read-only. You can change it, and you can see your changes in the dictionary object, but the actual local scope is not changed.
exec() takes two optional dictionary arguments, a global scope and a local scope. It defaults to globals() and locals(), but since changes to locals() aren't "real" outside of the dictionary, exec() only affects the "real" local scope when globals() is locals(), i.e. in a module outside of any function. (So in your case it's failing because it's inside a function scope).
The "better" way to use exec() in this case is to pass in your own dictionary, then operate on the values in that.
def foo():
    exec_scope = {}
    exec("y = 2", exec_scope)
    print(exec_scope['y'])

foo()
In this case, exec_scope is used as the global and local scope for the exec, and after the exec it will contain {'y': 2, '__builtins__': __builtins__} (the builtins are inserted for you if not present)
If you want access to more globals you could do exec_scope = dict(globals()).
Passing in different global and local scope dictionaries can produce "interesting" behavior.
If you pass the same dictionary into successive calls to exec or eval, then they have the same scope, which is why your eval worked (it implicitly used the locals() dictionary).
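For example, sharing one dict between exec and eval (a small sketch with a hypothetical name z):
scope = {}
exec("z = 40", scope)
print(eval("z + 2", scope))  # 42: both calls see the same namespace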
On dynamic variable names
If you set the name from a string, what's so wrong about getting the value as a string (i.e. what a dictionary does)? In other words, why would you want to set locals()['K'] and then access K? If K is in your source it's not really a dynamically set name... hence, dictionaries.
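In other words, something like this (a minimal sketch, assuming the fibonacci and ListOfLetters helpers from the question):
values = {}
for index, letter in enumerate(ListOfLetters.create_list("A", "K"), 1):
    values[letter] = fibonacci(index)
print(values["K"])  # 89, looked up by string, no exec needed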
While I was hanging out in the Python chatroom, someone dropped in and reported the following exception:
NameError: free variable 'var' referenced before assignment in enclosing scope
I'd never seen that error message before, and the user provided only a small code fragment that couldn't have caused the error by itself, so off I went googling for information, and ... there doesn't seem to be much. While I was searching, the user reported their problem solved as a "whitespace issue", and then left the room.
After playing around a bit, I've only been able to reproduce the exception with toy code like this:
def multiplier(n):
    def multiply(x):
        return x * n
    del n
    return multiply
Which gives me:
>>> triple = multiplier(3)
>>> triple(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in multiply
NameError: free variable 'n' referenced before assignment in enclosing scope
All well and good, but I'm having a hard time working out how this exception could occur in the wild, given that my example above is pretty stupid and unlikely to happen by accident... but obviously it does occur, given the report I mentioned at the start of this question.
So - how can this specific exception occur in real code?
Think of a more complex function where n is bound, or not, depending on some condition. You don't have to del the name in question; it also happens if the compiler sees an assignment, making the name local, but the code path is never taken and the name never gets assigned anything. Another stupid example:
def f():
    def g(x):
        return x * n
    if False:
        n = 10
    return g
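Calling the returned function reproduces the same error (output from CPython 3; the exact wording varies slightly across versions):
>>> f()(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in g
NameError: free variable 'n' referenced before assignment in enclosing scope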
Too late to answer, but I think I can add some detailed information about this situation. It may help future readers see what's going on here.
So the error message says:
NameError: free variable 'var' referenced before assignment in enclosing scope
When we talk about free variables, we're dealing with nested functions. Python has done some "magic" in order to give nested functions the ability to access the variables defined inside their parent scope. If we have:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

outer()()  # 10
Normally we shouldn't have access to foo in the inner function. Why? Because after the outer function's body has been called and executed, its namespace is destroyed. Basically, any local variable defined inside a function is no longer available after the function terminates.
But we have access...
That magic happens with the help of the "Cell object":
“Cell” objects are used to implement variables referenced by multiple scopes. For each such variable, a cell object is created to store the value; the local variables of each stack frame that references the value contains a reference to the cells from outer scopes which also use that variable. When the value is accessed, the value contained in the cell is used instead of the cell object itself.
Just to see that hidden stored value in the cells (we'll talk about __closure__ a bit later):
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

print(outer().__closure__[0].cell_contents)  # 10
How does it work?
In "compile" time,
when Python sees a function within another function, it takes note of the name of the variables referenced inside the nested function which are actually defined in the outer function. This information is stored in both functions' code objects. co_cellvars for outer function and co_freevars for inner function:
def outer():
    foo = 10
    def inner():
        print(foo)
    return inner

print(outer.__code__.co_cellvars)    # ('foo',)
print(outer().__code__.co_freevars)  # ('foo',)
Now execution time... (see the code below):
When Python wants to execute the outer function, it creates a "cell object" for each variable (co_cellvars) that it has taken note of.
Then, as it goes through the lines, whenever it sees an assignment to such a variable, it fills the corresponding cell object with that value (remember, cells contain the actual values indirectly).
When the execution reaches the line that creates the inner function, Python takes all the created cell objects and makes a tuple out of them. This tuple is then assigned to the inner function's __closure__.
The point is that when this tuple is created, some of the cells may not have a value yet. They are empty (see the output)!
At this point, if you call the inner function, any access to a cell without a value will raise the error mentioned above!
def outer():
    foo = 10
    def inner():
        print(foo)
        try:
            print(boo)
        except NameError as e:
            print(e)

    # Take a look at inner's __closure__ cells
    print(inner.__closure__)
    # The cell for boo is still empty, so this raises the error
    inner()
    # Now let's look at inner's __closure__ cells one more time (they're filled now)
    boo = 20
    print(inner.__closure__)
    # This works fine now
    inner()

outer()
output from Python 3.10:
(<cell at 0x7f14a5b62710: empty>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
free variable 'boo' referenced before assignment in enclosing scope
(<cell at 0x7f14a5b62710: int object at 0x7f14a6f00350>, <cell at 0x7f14a5b62830: int object at 0x7f14a6f00210>)
10
20
The error free variable 'boo' referenced before assignment in enclosing scope makes sense now.
Note: This error is reworded in Python 3.11 to:
cannot access free variable 'boo' where it is not associated with a value in enclosing scope
But the idea is the same.
If you look at the bytecode of the outer function, you'd see the steps I mentioned in the "execution time" section in action:
from dis import dis

def outer():
    foo = 10
    def inner():
        print(foo)
        print(boo)
    boo = 20
    return inner

dis(outer)
output from Python 3.11:
0 MAKE_CELL 1 (boo)
2 MAKE_CELL 2 (foo)
3 4 RESUME 0
4 6 LOAD_CONST 1 (10)
8 STORE_DEREF 2 (foo)
5 10 LOAD_CLOSURE 1 (boo)
12 LOAD_CLOSURE 2 (foo)
14 BUILD_TUPLE 2
16 LOAD_CONST 2 (<code object inner at 0x7fb6d4731a30, file "", line 5>)
18 MAKE_FUNCTION 8 (closure)
20 STORE_FAST 0 (inner)
8 22 LOAD_CONST 3 (20)
24 STORE_DEREF 1 (boo)
9 26 LOAD_FAST 0 (inner)
28 RETURN_VALUE
MAKE_CELL is new in Python 3.11.
STORE_DEREF stores the value inside the cell object.
So AFAIK in CPython, function definitions are compiled into function objects when executed at parse time. But what about inner functions? Do they get compiled into function objects at parse time or do they get compiled (or interpreted) every single time the function is called? Do inner functions incur any performance penalty at all?
To give a general explanation - assuming you have the following code in a module:
def outer(x=1):
    def inner(y=2):
        return x + y
When the file is parsed by Python via compile(), the above text is turned into bytecode describing how to execute the module. In the module bytecode, there are two "code objects", one for the bytecode of outer() and one for the bytecode of inner(). Note that I said code objects, not functions - a code object contains little more than the bytecode used by the function and any information that could be known at compile time, such as the bytecode for outer() containing a reference to the bytecode for inner().
When the module actually loads, by evaluating the code object associated with the module, one thing that happens is that an actual "function object" is created for outer() and stored in the module's outer attribute. The function object acts as a collection of the bytecode and all the context-relevant things needed to call the function (e.g. which globals dict it should pull from) that can't be known at compile time. In a way, a code object is a template for a function, which is in turn a template for execution of the actual bytecode with all variables filled in.
None of this involves inner()-as-a-function yet. Each time you actually get around to calling outer(), that's when a new inner() function object is created for that invocation of outer, binding the already-created inner code object to its context, including a closure cell for the x passed into that call to outer. As you can imagine, this is pretty fast, since no parsing is needed, just filling in a quick struct with some pointers to other already-existing objects.
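You can observe this split directly: one shared code object, but a fresh function object per call. A small sketch (return inner is added here so the created functions can be inspected):
def outer(x=1):
    def inner(y=2):
        return x + y
    return inner  # added so we can get at the function objects

f1, f2 = outer(), outer()
print(f1 is f2)                    # False: a new function object per call
print(f1.__code__ is f2.__code__)  # True: both share the one compiled code object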
Easy test: the default arguments to a function are evaluated once, at definition time.
>>> def foo():
...     def bar(arg=count()):
...         pass
...     pass
...
>>> def count():
...     print "defined"
...
>>> foo()
defined
>>> foo()
defined
So yes: this is a minor (very very! minor) performance hit.
>>> import dis
>>> def foo():
...     def bar():
...         print "stuff"
...     return bar
...
>>> b = foo()
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object bar at 0x20bf738, file "<stdin>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (bar)
4 9 LOAD_FAST 0 (bar)
12 RETURN_VALUE
>>> dis.dis(b)
3 0 LOAD_CONST 1 ('stuff')
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
I suspect this is heavily implementation dependent, but that was CPython 2.6.6, and the inner function looks like it was compiled. Here's another example:
>>> def foo():
...     def bar():
...         return 1
...     return dis.dis(bar)
...
>>> foo()
>>> foo()
3 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
So we can conclude that they are compiled. As for their performance characteristics, use them. If you start having performance issues, profile. I know it's not really an answer, but it almost never matters and when it does, general answers don't cut it. Function calls incur some overhead and it looks like inner functions are just like functions.
To extend nmichaels' answer: inner functions are compiled at compile time, as he guessed, and their bytecode is saved in foo.func_code.co_consts and accessed using the opcode LOAD_CONST, as you can see in the disassembly of the function.
Example:
>>> def foo():
...     def inner():
...         pass
...
>>> print foo.func_code.co_consts
(None, <code object inner at 0x249c6c0, file "<ipython console>", line 2>)
I'm late on this, but as a little experimental complement to these thorough answers: you may use the builtin function id to verify whether a new object is created or not:
In []: # inner version
def foo():
    def bar():
        return id(bar)
    return bar()

foo(), foo()
Out[]: (4352951432, 4352952752)
The actual numbers may differ, but their difference indicates that two distinct instances of bar are indeed created.
In []: # outer version
def bar():
    return id(bar)

def foo():
    return bar()

foo(), foo()
Out[]: (4352950952, 4352950952)
This time, as expected, the two ids are the same.
Now for some timeit measurements. Inner first, outer second:
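The measurement commands aren't shown here; presumably each version of foo above was timed with IPython's %timeit, e.g.:
In []: %timeit foo()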
100000 loops, best of 3: 1.93 µs per loop
1000000 loops, best of 3: 1.25 µs per loop
So, on my machine, it seems that the inner version is 50% slower (Python 2.7, IPython Notebook).