Let's say I have the following script, test.py:

    import my_library

    bar = 12

    def foo():
        nested_bar = 21
        my_library.do_things()

        def nested_foo():
            nested_bar += 11

        not_a_variable += 1
        {$ invalid_syntax

    bar = 13
    foo()
    bar = 14
I'm curious as to what exactly happens when I run python test.py. Obviously Python doesn't just read programs line-by-line - otherwise it wouldn't catch syntax errors before actually executing the program. But this makes the workings of the interpreter seem somewhat nebulous. I was wondering if someone would help clear things up for me. In particular, I would like to know:
1. At what point does Python realize there is a syntax error on line 13?
2. At what point does Python read the nested functions and add them to the scope of foo?
3. Similarly, how does Python add the function foo to its namespace when it encounters it, without executing it?
4. Suppose my_library were an invalid import. Would Python necessarily raise an ImportError before executing any other commands?
5. Suppose my_library were a valid module, but it had no function do_things. At what point would Python realize this: during execution of foo(), or before?
If anyone could point me to documentation on how Python parses and executes scripts it would be very much appreciated.
There's some information in the tutorial's section on modules, but I don't think the documentation has a complete reference for this. So, here's what happens.
When you first run a script or import a module, Python parses the syntax into an AST and then compiles that into bytecode. It hasn't executed anything yet; it's just compiled your code into instructions for a little stack-based machine. This is where syntax errors are caught. (You can see the guts of all this in the ast module, the token module, the compile builtin, the grammar reference, and sprinkled around various other places.)
You can actually compile a module independently of running the generated code; that's what the standard-library compileall module does.
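As a minimal sketch, you can see the two phases separated by handing a source string to the compile builtin: the SyntaxError surfaces during compilation, before a single statement has run.

```python
# Compile a source string without running it. The assignment on line 1
# never executes; the SyntaxError is raised at compile time.
source = "bar = 12\n{$ invalid_syntax\n"
try:
    compile(source, "test.py", "exec")
except SyntaxError as exc:
    print("syntax error at line", exc.lineno)  # syntax error at line 2
```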
So that's the first phase: compiling. Python only has one other phase, which is actually running the code. Every statement in your module, except those contained within def or lambda, is executed in order. That means that imports happen at runtime, wherever you happen to put them in your module. Which is part of the reason it's good hygiene to put them all at the top. Same for def and class: these are just statements that create a specific type of object, and they're executed as they're encountered, like anything else.
The only tricky bit here is that the phases can happen more than once — for example, an import is only executed at runtime, but if you've never imported that module before, then it has to be compiled, and now you're back in compile time. But "outside" the import it's still runtime, which is why you can catch a SyntaxError thrown by an import.
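A quick illustration of that last point, using a throwaway module written to a temporary directory (the module name broken_mod is made up for the example):

```python
import os
import sys
import tempfile

# Write a module containing a syntax error, then import it at runtime.
# The import statement itself runs at run time, but compiling the new
# module happens inside it, so the SyntaxError is catchable like any
# other exception "outside" the import.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "broken_mod.py"), "w") as f:
    f.write("{$ invalid_syntax\n")
sys.path.insert(0, tmpdir)

try:
    import broken_mod
except SyntaxError:
    print("caught a SyntaxError raised by the import")
```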
Anyway, to answer your specific questions:
1. At compile time. When you run this as a script, or when you import it as a module, or when you compile it with compileall, or otherwise ask Python to make any sense of it. In practical terms, this can happen at any time: if you tried to import this module within a function, you'd only get a SyntaxError when calling that function, which might be halfway through your program.
2. During the execution of foo, because def and class just create a new object and assign it to a name. But Python still knows how to create the nested function, because it's already compiled all the code within it.
3. The same way it would add foo = lambda: 1 + 2 to a namespace without executing it. A function is just an object that contains a "code" attribute — literally just a block of Python bytecode. You can manipulate the code type as data, because it is data, independently of executing it. Try looking at a function's .__code__, read the "code objects" section of the data model, or even play around with the disassembler. (You can even execute a code object directly with custom locals and globals using exec, or change the code object a function uses!)
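A short sketch of poking at a function's code object:

```python
import dis

def add():
    return 1 + 2

# The function object wraps a code object that was produced at compile
# time, before add() was ever called.
print(type(add.__code__).__name__)  # code
print(3 in add.__code__.co_consts)  # True: 1 + 2 was folded to 3 at compile time
dis.dis(add)                        # human-readable bytecode listing
```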
4. Yes, because import is a plain old statement like any other, executed in order. But if there were other code before the import, that would run first. And if it were in a function, you wouldn't get an error until that function ran. Note that import, just like def and class, is just a fancy form of assignment.
5. Only during the execution of foo(). Python has no way of knowing whether other code will add a do_things to your module before that point, or even change my_library to some other object entirely. Attribute lookups are always done just-in-time, when you ask for them, never in advance.
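A sketch of that just-in-time behavior, using a stand-in module object rather than a real import:

```python
import types

# A stand-in for my_library with no do_things attribute yet.
my_library = types.ModuleType("my_library")

def foo():
    return my_library.do_things()

# Defining foo raised nothing: the attribute lookup happens at call time.
try:
    foo()
except AttributeError:
    print("do_things missing at call time")

# Add the attribute afterwards and the very same call now succeeds.
my_library.do_things = lambda: "done"
print(foo())  # done
```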
As a general rule, Python first parses the file, compiles the abstract syntax tree to bytecode, and then executes that bytecode sequentially, statement by statement. This means:
1. Syntax errors are caught at parse time, before anything is executed. If you add a side effect to the script, e.g. creating a file, you will see that it never happens.
2. A function becomes defined in the enclosing scope only after its def statement runs. If you tried to call nested_foo right before def nested_foo(), it would fail because nested_foo has not been defined at that point.
3. Same as 2.
4. If Python cannot import a library (and import means executing the module), it fails with an ImportError as soon as the import statement runs.
5. Since you don't access do_things at import time (i.e. you are not doing from my_library import do_things), the error only occurs when you attempt to call foo().
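The difference in point 5 can be sketched with an empty stand-in module registered directly in sys.modules (the name my_library is assumed, matching the question):

```python
import sys
import types

# Register an empty stand-in module so the imports below resolve.
sys.modules["my_library"] = types.ModuleType("my_library")

import my_library                     # fine: nothing looks up do_things yet
try:
    from my_library import do_things  # fails here, at the import itself
except ImportError:
    print("from-import failed immediately")
```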
I'm running a python script that imports a function, and then imports a class from a module with the same name as the function. For example:
    from antioch import parrot
    from antioch.parrot import Spam
If I print help(antioch) after the first import statement, it shows parrot() listed under FUNCTIONS; however, if I print help(antioch) after the second import statement, the FUNCTIONS list no longer includes the parrot() function.
This causes a problem later in my code when trying to call the function, as I get a 'module object is not callable' error.
I realise that I could probably avoid this issue by renaming the parrot module so its name differs from the function's, but this would involve editing quite a lot of code and seems like a workaround that shouldn't be necessary.
Is there a better way around this problem?
I was wondering, for debugging purposes, if it is possible to see what namespaces and modules you are operating with once you do an import and furthermore to see where a function was called.
If I have a function f(x) and a rather complicated structure in my code, is there a way to see where f(x) is being called without adding prints all over the place?
Something like f.print_occurance()
"f was called in function integrate"
"f was called in function linspace"
"f was called in function enumerate"
Something along those lines.
As for the first question, suppose I import a module "import somemodule"
Now if that module imports other modules, can I see what namespaces and modules have been imported/used without looking up somemodule.py (or its header file, if one existed as in C/C++)?
Sorry if this is a newbie question; these just seem like basic tricks I should know for error handling and debugging, but googling returned nothing useful.
You could possibly write your own f.print_occurance() attribute. Create a variable that flags True when the function starts; then f.print_occurance() will recognize the flag and print accordingly.
You should definitely look at the traceback and inspect modules.
For a simple way to do this:
    traceback.print_stack(limit=2)
This will be ugly, but tell you which function is being called and what called it. You can look at the modules for how to use them to fit your needs.
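A rough sketch of the idea using traceback.extract_stack, which gives you the frames as data instead of printing them (the function names here are made up to mirror the question):

```python
import traceback

def f():
    # With limit=2, extract_stack returns this frame and its caller,
    # outermost first.
    stack = traceback.extract_stack(limit=2)
    return "f was called in function " + stack[0].name

def integrate():
    print(f())

integrate()  # f was called in function integrate
```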
You can look at the imported modules with sys.modules
Is it possible to call a function without first fully defining it? When attempting this I get the error: "function_name is not defined". I am coming from a C++ background so this issue stumps me.
Declaring the function before the call works:

    def Kerma():
        return "energy / mass"

    print(Kerma())
However, attempting to call the function before defining it gives trouble:

    print(Kerma())

    def Kerma():
        return "energy / mass"
In C++, you can define a function after the call site as long as you declare its prototype before it.
Am I missing something here?
One way that is sort of idiomatic in Python is writing:

    def main():
        print(Kerma())

    def Kerma():
        return "energy / mass"

    if __name__ == '__main__':
        main()

This allows you to write your code in whatever order you like, as long as the call to main comes at the end.
When a Python module (.py file) is run, the top level statements in it are executed in the order they appear, from top to bottom (beginning to end). This means you can't reference something until you've defined it. For example the following will generate the error shown:
    c = a + b  # -> NameError: name 'a' is not defined
    a = 13
    b = 17
Unlike in many other languages, def and class statements in Python are executable, not merely declarative, so you can't reference either a or b until the assignments happen and the names are defined. This is why your second example has trouble: you're referencing Kerma() before its def statement has executed, its body has been compiled, and the resulting function object has been bound to the name, so the name is not defined at that point in the script.
Programs in languages like C++ are compiled before being run: during compilation the entire program, along with any #include files it refers to, is read and processed all at once. Unlike Python, such languages have declarative statements that allow the name and calling signature of a function (or the static type of a variable) to be declared, but not defined, before use. When the compiler then encounters the name, it has enough information to check its usage, which primarily means type checking and type conversions, neither of which requires the actual code body to have been defined yet.
This isn't possible in Python, but quite frankly you will soon find you don't need it at all. The Pythonic way to write code is to divide your program into modules that define classes and functions, and a single "main module" that imports all the others and runs.
For simple throw-away scripts get used to placing the "executable portion" at the end, or better yet, learn to use an interactive Python shell.
If you are willing to put everything inside functions, C++-style, you can call the first function from the bottom of the file, like this:
    def main():
        print("I'm in main")
        # calling a() although it is defined further down
        a()

    def b():
        print("I'm in b")

    def a():
        print("I'm in a")
        b()

    main()
That way Python first 'reads' the whole file and only then starts the execution.
Python is a dynamic programming language, and the interpreter always takes the state of names (variables, functions, ...) as they are at the moment they're used. You could even redefine functions in if-blocks and have each call behave differently. That's why you have to define things before calling them.
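A minimal sketch of that conditional redefinition:

```python
# Names are resolved when they're used, so a function can be chosen
# (or redefined) at run time before the first call.
verbose = True

if verbose:
    def greet():
        return "hello, verbosely"
else:
    def greet():
        return "hello"

print(greet())  # hello, verbosely
```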
I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. EG
    context = {'a': 1, 'b': 2}

    import types
    test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
    test_context_module.__dict__.update(context)

    import sys
    sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
    import timeit
    timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the timeit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
    import timeit
    timeit.a = 1
    timeit.b = 2
    timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
    module = imp.new_module(name)
    execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
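On a modern Python, the whole approach can be sketched end to end, sidestepping the import * restriction by using the globals= keyword that timeit.Timer accepts since Python 3.5:

```python
import sys
import timeit
import types

# Build a module from a plain dict and register it, as in the question.
context = {'a': 1, 'b': 2}
mod = types.ModuleType('TestContext', 'Module created to provide a context for tests')
mod.__dict__.update(context)
sys.modules['TestContext'] = mod

# Hand the module's namespace to the Timer directly instead of relying
# on `from TestContext import *` in the setup string.
t = timeit.Timer('a + b', globals=vars(mod))
elapsed = t.timeit(number=1000)
print(elapsed >= 0.0)  # True: the statement ran without a NameError
```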
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
    print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
    {'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.
I know this does not sound Pythonic, but bear with me for a second.
I am writing a module that depends on some external closed-source module. That module needs to get instantiated to be used (using module.create()).
My module attempts to figure out if my user already loaded that module (easy to do), but then needs to figure out if the module was instantiated. I understand that checking the type() of each variable can tell me this, but I am not sure how I can get the names of variables defined by the main program. The reason for this is that when one instantiates the module, they also set a bunch of parameters that I do not want to overwrite for any reason.
My attempts so far involved using sys._getframe().f_globals and iterating through the elements, but in my testing it doesn't work. If I instantiate the module as modInst and then call the function in my module, it fails to show the modInst variable. Is there another solution to this? Sample code provided below.
    import sys

    if moduleName not in sys.modules:
        import moduleName
        modInst = moduleName.create()
    else:
        globalVars = sys._getframe().f_globals
        for key, value in globalVars.items():
            if value == "Module Name Instance":
                return key
        return moduleName.create()
EDIT: Sample code included.
Looks like your code assumes that the .create() function was called, if at all, by the immediate/direct caller of your function (which you show only partially, making it pretty hard to be sure about what's going on) and the results placed in a global variable (of the module where the caller of your function resides). It all seems pretty fragile. Doesn't that third-party module have some global variables of its own that are affected by whether the module's create has been called or not? I imagine it would -- where else is it keeping the state-changes resulting from executing the create -- and I would explore that.
To address a specific issue you raise:

    I am not sure how I can get the names of variables defined by the main program
that's easy -- the main program is found, as a module, in sys.modules['__main__'], so just use vars(sys.modules['__main__']) to get the global dictionary of the main program (the variable names are the keys in that dictionary, along of course with names of functions, classes, etc -- the module, like any other module, has exactly one top-level/global namespace, not one for variables, a separate one for functions, etc).
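For example:

```python
import sys

# The main program's top-level namespace is just a dict: names of
# variables, functions and classes all live side by side in it.
main_ns = vars(sys.modules['__main__'])
print(sorted(main_ns)[:5])  # a peek at the first few names
```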
Suppose the external closed-sourced module is called extmod.
Create my_extmod.py:
    import extmod

    INSTANTIATED = False

    def create(*args, **kw):
        global INSTANTIATED
        INSTANTIATED = True
        return extmod.create(*args, **kw)
Then require your users to import my_extmod instead of extmod directly.
To test if the create function has been called, just check the value of my_extmod.INSTANTIATED.
Edit: If you open up an IPython session and type import extmod, then type
extmod.[TAB], then you'll see all the top-level variables in the extmod namespace. This might help you find some parameter that changes when extmod.create is called.
Barring that, and barring the possibility of training users to import my_extmod, perhaps you could use something like the function below. find_instance searches through all modules in sys.modules.
    import sys

    def find_instance(cls):
        for modname in sys.modules:
            module = sys.modules[modname]
            for value in vars(module).values():
                if isinstance(value, cls):
                    return value

    x = find_instance(extmod.ExtmodClass) or extmod.create()