Recently I have been trying to dig deeper into the core of python. Currently I am look into pythons module system and how "global", "local", and "nonlocal" variables are stored. More specifically, my question is how does the interpreter treat the file being run? Is it treated as its own module in the modules (or something similar)?
The top-level script is treated as a module, but with a few differences.
Instead of its name being the script name minus a .py extension, its name is __main__.
The top-level script does not get looked up in the .pyc cache, nor compiled and cached there.
Other than that, it's mostly the same: the interpreter compiles your script as a module, builds a types.ModuleType out of it, stores it in sys.modules['__main__'], etc.
Also look at runpy, which explains how both python spam.py and python-m spam work. (As of, I think, 3.4, runpy.run_path should do exactly the same thing as running a script, not just something very similar.) And notice that the docs link to the source, so if you need to look up any specifics of the internals, you can.
The first difference is why you often see this idiom:
if __name__ == '__main__':
import sys
main(sys.argv) # or test() or similar
That allows the same file spam.py to be used as a module (in which case its __name__ will be spam) or as a script (in which case its __name__ will be __main__), with code that you only want to be run in the script case.
If you're curious whether standard input to the interactive interpreter is treated the same way as a script, there are a lot more differences there. Most importantly, each statement is compiled and run as a statement with exec, rather than the whole script/module being compiled and run as a module.
Yes, that's essentially what happens. It's the __main__ module. You can see this by running something like the following:
x = 3
import __main__
print(__main__.x)
Either run as a script file, or on the interpreter, this will print:
3
Related
I always structure my repos in the following way.
repo/main.py
repo/scripts/script.py
…
In main.py, I import script.py in the following manner:
from scripts import script as sc
This always works unless I decide to make changes to script.py. After making changes, if I run main.py from shell, it still imports code from the older script.py without the current changes. Until now, what I would then is just create another branch. However, this leads to a lot of branches in the development process. Is there any way to avoid this? How else should I be importing from the scripts directory to avoid this?
Help would be highly appreciated.
UPDATE
From the answers, I can see that I have caused some confusion. What I mean when I say that I run main.py from shell, I mean executing it with python main.py from the terminal. One can think of main.py as a script that does some math and outputs the answer. In doing those math, it imports script.py from scripts that has additional functions that it (main.py) uses. After running main.py N times, if I choose to update script.py, and then I execute main.py again (in the terminal), it imports the old script.py again and does the math with the older code. The answer does not reflect the changes I just made to script.py. Until now, what I have done, when I have had to go through something like this is just create another new branch and literally copy-paste the old files and the newer script.py in the new branch and execute main.py in the shell. It does import the newer script.py then. One more thing I have noticed is that if I just create a new file as say script2.py and then import it in main.py as
from scripts import script2 as sc
it imports script2.py just as it should - it reflects all the changes made to script.py.
There’s no second import statement in main.py.
On the surface this question sounds like we're repeatedly running
$ python main.py,
which quickly executes and exits.
But from the symptom, it must be the case that
we have a long-lived REPL prompt repeatedly executing the main.py code.
The module you're looking for is
importlib.
Do this:
from importlib import reload
from scripts import script as sc
# [do stuff, then edit script.py]
reload(sc)
# [do more stuff, and see the effect of the edit]
What is going on here?
Well, if you repeatedly execute
import scripts.script
import scripts.script
it turns out that the 2nd and subsequent import does nothing.
It consults sys.modules, finds the module has already
been loaded, and reports "cache hit!" instead of doing
the hard work of pulling in that text file.
The purpose of the reload() function is to
do a cache invalidate and repeat the import, so it actually
pulls in python text from the (presumably edited) source file.
Suppose you have a short-lived $ python main.py process
that runs repeatedly, sometimes after a script.py edit.
So the in-memory sys.module cache is not relevant,
having been discarded each time the process exits.
There is another level of caching at work here.
Typically the cPython interpreter will read script.py,
parse, produce bytecode from the parse tree, and write
the bytecode to script.pyc. More than one output cache
directory is possible, depending on system details.
Upon being asked to read script.py, the interpreter
can look to see if the corresponding .pyc bytecode file
is available and fresh, and then declare "cache hit!",
in which case the .py source file is not even read.
Normally this works great, because source file updates
are infrequent (human editing speed), and the freshness
comparison of file timestamps is effective. If there's
something wrong with those timestamps the whole mechanism
won't work properly, and we might fail to notice the
source was recently edited. Perhaps you suffer from this.
First, identify the relevant .pyc bytecode file.
Then, run main, make an edit, delete the .pyc file,
re-run main, and notice that the edit took effect.
Let us know
how it goes.
I am studying python, both 3. and 2. I started a few days ago.
I want to know the differences between site module and interpreter.
I got this question from
Python exit commands - why so many and when should each be used?
Those explanations are very clear but it's still hard to me.
If I am understanding your question correctly,site is a module in Python. A module is a file containing Python definitions and statements. In order to use the functions (for ex: exit() or quit(), you need to import the site module as those respective functions are defined in there.
The Python interpreter is the program that reads and executes Python code. This includes source code, pre-compiled code, scripts - in this case you reference, you would need to import the site module into your current Python interpreter session, in order to use say exit() or quit() in that given session.
So, the process of this particular question would be:
* Activate the Python interpreter by typing into your respective terminal the version of Python you have installed on your computer, ex. python3.
* In the Python interpreter, type import site
Hope that helps Hwan.
I assume you are stuck on understanding:
"Nevertheless, quit should not be used in production code. This is
because it only works if the site module is loaded. Instead, this
function should only be used in the interpreter."
Basically, what that is saying is that quit is a part of a module loaded in the python interpreter. That module's name, is site.
Firstly, the python interpreter is what you use to run python scripts or environments. It interprets python commands. For example, if you write a = 1 in a script or python environment, the interpreter takes that command and executes it without compiling it. (If it was a language like c you would need to compile the script before you run it).
Secondly, a module is a pre-written file that can define functions, classes and variables. When you write import numpy into python, you are importing the module, numpy. Therefore when they say "this only works if the site module is loaded", that means that import site must be executed.
When you start a python interpreter (by typing python into your command shell), it automatically imports site, which has sys, venv and main etc. which are all required to run an active interpreter session.
I had a strange situation:
In my folder /home/Komponenten/ were a lot of python scripts
When I started
cd /home/Kompontenen
/home/Kompontenen>python urlfilter.py
resulted in the execution of another script, i found out that it was in my case it was queue.py from the same folder
I though ok, there might be some code in urlfilter were I used the queue.py. Queue.py contained a little test with multithreading but nothing special
So I simply tried to move the queue.py file
After that urlfilter.py was executed normally and no error
So I still have no clue why the python interpreter executed queue.py instead of urlfilter.py
In Python the import path contains . (working directory). Importing a module basically means executing it. That is why your importing of queue from urlfilter.py resulted in queue being executed. To avoid accidental execution of scripts by importing, you can check the __name__ variable for the value '__main__'.
if __name__ == '__main__':
do_not_execute_this_during_import()
Every time I make a change to a python script I have to reload python and re-import the module. Please advise how I can modify my scripts and run then without having to relaunch python in the terminal.
Thanks.
I've got a suggestion, based on your comment describing your work flow:
first, i run python3.1 in terminal second, i do "import module" then, i run a method from the module lets say "module.method(arg)" every time, i try to debug the code, i have to do this entire sequence, even though the change is minor. it is highly inefficient
Instead of firing up the interactive Python shell, make the module itself executable. The easiest way to do this is to add a block to the bottom of the module like so:
if __name__ == '__main__':
method(arg) # matches what you run manually in the Python shell
Then, instead of running python3.1, then importing the module, then calling the method, you can do something like this:
python3.1 modulename.py
and Python will run whatever code is in the if __name__ == '__main__' block. But that code will not be run if the module is imported by another Python module. More information on this common Python idiom can be found in the Python tutorial.
The advantage of this is that when you make a change to your code, you can usually just re-run the module by pressing the up arrow and hitting enter. No messy reloading necessary.
If it's just some module that's changing, you can call reload(module) in your script.
Do you mean that you enter script directly into the interactive python shall, or are you executing your .py file from the terminal by running something like python myscript.py?
I found a bypass online. Magic commands exist. It works like a charm for me. The Reload option didn't work for me. Got it from here and point 3 of this link.
Basically all you have to do is the following: and changes you make are reflected automatically after you save.
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: Import MODULE
In [4]: my_class = Module.class()
my_class.printham()
Out[4]: ham
In [5]: #make changes to printham and save
In [6]: my_class.printham()
Out[6]: hamlet
You can use reload to re-import a module. I use it frequently when using the interactive mode to debug code.
However, from a higher level, I would be hesitant to use that in a production version of a program. Unless you will have very strict control over how sub-modules are changing, it would not be hard to have the reloaded module change in some way that breaks your program. Unless your program really needs 100% up-time, it would make sense to stop it and start it again when there is some kind of version change.
I am developing a Python package using a text editor and IPython. Each time I change any of the module code I have to restart the interpreter to test this. This is a pain since the classes I am developing rely on a context that needs to be re-established on each reload.
I am aware of the reload() function, but this appears to be frowned upon (also since it has been relegated from a built-in in Python 3.0) and moreover it rarely works since the modules almost always have multiple references.
My question is - what is the best/accepted way to develop a Python module/package so that I don't have to go through the pain of constantly re-establishing my interpreter context?
One idea I did think of was using the if __name__ == '__main__': trick to run a module directly so the code is not imported. However this leaves a bunch of contextual cruft (specific to my setup) at the bottom of my module files.
Ideas?
A different approach may be to formalise your test driven development, and instead of using the interpreter to test your module, save your tests and run them directly.
You probably know of the various ways to do this with python, I imagine the simplest way to start in this direction is to copy and paste what you do in the interpreter into the docstring as a doctest and add the following to the bottom of your module:
if __name__ == "__main__":
import doctest
doctest.testmod()
Your informal test will then be repeated every time the module is called directly. This has a number of other benefits. See the doctest docs for more info on writing doctests.
Ipython does allow reloads see the magic function %run iPython doc
or if modules under the one have changed the recursive dreloadd() function
If you have a complex context is it possible to create it in another module? or assign it to a global variable which will stay around as the interpreter is not restarted
How about using nose with nosey to run your tests automatically in a separate terminal every time you save your edits to disk? Set up all the state you need in your unit tests.
you could create a python script that sets up your context and run it with
python -i context-setup.py
-i When a script is passed as first argument or the -c option is
used, enter interactive mode after executing the script or the
command. It does not read the $PYTHONSTARTUP file. This can be
useful to inspect global variables or a stack trace when a
script raises an exception.