At which point does changing a file change the execution's outcome? - python

Say I have a module foo.py with some code, and a script main.py that imports foo and is executed with python -m main.
At what point does changing the code of foo affect the outcome of python -m main?
Specifically, does calling import "freeze" the file, in the sense that future execution is not affected by changing it?
Example of main.py:
input()
import foo
input()
import foo
print(foo.f())
Under which circumstances can modifying a module file affect the outcome of the execution?
My question is related to the following:
If I have code under version control and run it, then check out a different branch, the code from the different branch will be run if some import is executed lazily (e.g. inside a function, to avoid circular dependencies). Is this true?

From the documentation:
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement.
So changing the module on disk will not have any effect once the module has been imported once. You can see this yourself: have a file foo.py that prints "foo" when imported:
print("foo")
and a file main.py that imports foo multiple times:
import foo
import foo
import foo
and you can see that when you run main.py, the output is only one foo, so foo.py only runs once.
(Note that there is a function importlib.reload that attempts to reload the module, but it is not guaranteed to replace all references to the old module.)
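That caveat is easy to demonstrate. A minimal sketch using the standard-library json module as a stand-in for foo (any module behaves the same way):

```python
import importlib
import json

f = json.dumps           # keep a reference to a function from the module
importlib.reload(json)   # re-executes the module's source in place

# The module object was updated, but our old reference still points at
# the function object created by the first import:
print(f is json.dumps)   # False: json.dumps was rebound to a new function
```

Any code that grabbed a reference before the reload keeps using the old objects, which is exactly why reload is not a full substitute for restarting the process.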
With regard to your edit, yes, that's correct.

From the Python documentation:
When a module is first imported, Python searches for the module and, if found, it creates a module object, initializing it.
Once this object is created, it will be used even if another module re-imports it; Python keeps track of already-imported modules.
If you want to reload one, you have to do it manually: check the built-in reload for Python 2 or importlib.reload for Python 3.
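That bookkeeping lives in sys.modules, a dict mapping module names to the already-created module objects. A minimal sketch, using the standard-library json module as an example:

```python
import sys
import json                       # first import: the module object is created

assert 'json' in sys.modules      # Python recorded the import
import json as j2                 # re-import: no re-execution, just a lookup
assert j2 is sys.modules['json']  # the same object is handed back
```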

Python import with modification

I always structure my repos in the following way.
repo/main.py
repo/scripts/script.py
…
In main.py, I import script.py in the following manner:
from scripts import script as sc
This always works unless I decide to make changes to script.py. After making changes, if I run main.py from the shell, it still imports the code from the older script.py, without the current changes. Until now, what I would do then is just create another branch. However, this leads to a lot of branches in the development process. Is there any way to avoid this? How else should I be importing from the scripts directory to avoid this?
Help would be highly appreciated.
UPDATE
From the answers, I can see that I have caused some confusion. When I say that I run main.py from the shell, I mean executing it with python main.py from the terminal. One can think of main.py as a script that does some math and outputs the answer. In doing that math, it imports script.py from scripts, which has additional functions that main.py uses. After running main.py N times, if I update script.py and then execute main.py again in the terminal, it imports the old script.py again and does the math with the older code; the answer does not reflect the changes I just made to script.py. Until now, whenever this has happened, I have just created a new branch, literally copy-pasted the old files and the newer script.py into it, and executed main.py in the shell. It does import the newer script.py then. One more thing I have noticed is that if I create a new file, say script2.py, and then import it in main.py as
from scripts import script2 as sc
it imports script2.py just as it should - it reflects all the changes made to script.py.
There’s no second import statement in main.py. On the surface this question sounds like we're repeatedly running $ python main.py, which quickly executes and exits. But from the symptom, it must be the case that we have a long-lived REPL prompt repeatedly executing the main.py code. The module you're looking for is importlib.
Do this:
from importlib import reload
from scripts import script as sc
# [do stuff, then edit script.py]
reload(sc)
# [do more stuff, and see the effect of the edit]
What is going on here? Well, if you repeatedly execute
import scripts.script
import scripts.script
it turns out that the second and subsequent imports do nothing. Python consults sys.modules, finds the module has already been loaded, and reports "cache hit!" instead of doing the hard work of pulling in that text file. The purpose of the reload() function is to invalidate that cache entry and repeat the import, so it actually pulls in Python text from the (presumably edited) source file.
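A sketch of that cache behaviour, using the standard-library os module in place of scripts.script:

```python
import sys
import os

first = sys.modules['os']
import os                  # cache hit: nothing is re-read or re-executed
assert os is first

# reload() invalidates and repeats the import; a cruder way to see the
# cache at work is to drop the entry so the next import starts over:
del sys.modules['os']
import os                  # cache miss: the module is loaded afresh
assert os is not first
```

Deleting entries from sys.modules is only for illustration here; other modules still hold references to the old object, which is the messiness reload() tries to manage for you.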
Suppose you have a short-lived $ python main.py process that runs repeatedly, sometimes after a script.py edit. Then the in-memory sys.modules cache is not relevant, having been discarded each time the process exits. But there is another level of caching at work here. Typically the CPython interpreter will read script.py, parse it, produce bytecode from the parse tree, and write the bytecode to script.pyc. More than one output cache directory is possible, depending on system details. Upon being asked to read script.py, the interpreter can check whether the corresponding .pyc bytecode file is available and fresh, and then declare "cache hit!", in which case the .py source file is not even read.
Normally this works great, because source file updates are infrequent (human editing speed) and the freshness comparison of file timestamps is effective. If there's something wrong with those timestamps, the whole mechanism won't work properly, and we might fail to notice the source was recently edited. Perhaps you suffer from this. First, identify the relevant .pyc bytecode file. Then run main, make an edit, delete the .pyc file, re-run main, and see whether the edit took effect. Let us know how it goes.
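For the first step, identifying the relevant .pyc file, importlib can compute where CPython would cache the bytecode for a given source path (scripts/script.py is the layout from the question):

```python
import importlib.util

# Compute where CPython caches the bytecode for a source file,
# e.g. scripts/__pycache__/script.cpython-312.pyc (tag varies by version)
pyc = importlib.util.cache_from_source('scripts/script.py')
print(pyc)
```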

Testing an interactive module with pytest

I have a Python module which is designed to be used by non-programmers, interactively. It is used within IPython, and on load it asks for a couple of user inputs, prints some ASCII art, that sort of thing. This is implemented in the __init__.py of the module.
The module also contains a utils file, tld (containing setup.py)/utils.py, with some functions in it. I would like to test these. The utils file does not import anything from the rest of the module and contains only pure functions. The tests live in tld (containing setup.py)/tests/test_utils.py. Currently, trying to run pytest results in a failure, as the module's __init__.py is run, which hangs while awaiting the above-mentioned user input.
Is there a way of getting pytest to run the tests in test_utils.py, without running the __init__.py of the python module? Or some other way of getting around this issue?
I have tried running pytest tests/test_utils.py, but that had the same effect.
I solved this by defining an initialize function inside the __init__.py of the module, which encapsulates all of the user interaction. If module.initialize() is not explicitly called, no user input is requested, so the pytest tests can run as expected.
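A sketch of that pattern; the prompt text and the function body are illustrative placeholders, not the original module's code:

```python
# Sketch of the package's __init__.py with interaction moved out of import time
def initialize():
    """All user interaction lives here instead of running on import."""
    name = input("Who is running this session? ")  # placeholder prompt
    print(f"Welcome, {name}!")

# Nothing interactive runs at import time, so pytest can import the
# package and collect tests for the pure utils functions without hanging.
```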

How are basic .py files treated by the python interpreter?

Recently I have been trying to dig deeper into the core of Python. Currently I am looking into Python's module system and how "global", "local", and "nonlocal" variables are stored. More specifically, my question is: how does the interpreter treat the file being run? Is it treated as its own module in sys.modules (or something similar)?
The top-level script is treated as a module, but with a few differences.
Instead of its name being the script name minus a .py extension, its name is __main__.
The top-level script does not get looked up in the .pyc cache, nor compiled and cached there.
Other than that, it's mostly the same: the interpreter compiles your script as a module, builds a types.ModuleType out of it, stores it in sys.modules['__main__'], etc.
Also look at runpy, which explains how both python spam.py and python -m spam work. (As of, I think, 3.4, runpy.run_path should do exactly the same thing as running a script, not just something very similar.) And notice that the docs link to the source, so if you need to look up any specifics of the internals, you can.
The first difference is why you often see this idiom:
if __name__ == '__main__':
    import sys
    main(sys.argv)  # or test() or similar
That allows the same file spam.py to be used as a module (in which case its __name__ will be spam) or as a script (in which case its __name__ will be __main__), with code that you only want to be run in the script case.
If you're curious whether standard input to the interactive interpreter is treated the same way as a script, there are a lot more differences there. Most importantly, each statement is compiled and run as a statement with exec, rather than the whole script/module being compiled and run as a module.
Yes, that's essentially what happens. It's the __main__ module. You can see this by running something like the following:
x = 3
import __main__
print(__main__.x)
Either run as a script file, or on the interpreter, this will print:
3

reload (update) a module file in the interpreter

Let's say I have this python script script.py and I load it in the interpreter by typing
import script
and then I execute my function by typing:
script.testFunction(testArgument)
OK, so far so good, but when I change script.py, importing it again doesn't pick up the update. I have to exit and restart the interpreter, and then import the new version of the script for it to work.
What should I do instead?
You can issue reload(script), but that will not update your existing objects and will not go deep inside other modules.
Fortunately this is solved by IPython - a better python shell which supports auto-reloading.
To use autoreloading in IPython, you'll have to type import ipy_autoreload first, or put it permanently in your ~/.ipython/ipy_user_conf.py.
Then run:
%autoreload 1
%aimport script
%autoreload 1 means that every module loaded with %aimport will be reloaded before executing code from the prompt. This will not update any existing objects, however.
See http://ipython.org/ipython-doc/dev/config/extensions/autoreload.html for more fun things you can do.
http://docs.python.org/library/functions.html#reload
reload(module)
Reload a previously imported module. The argument must
be a module object, so it must have been successfully imported before.
This is useful if you have edited the module source file using an
external editor and want to try out the new version without leaving
the Python interpreter. The return value is the module object (the
same as the module argument).
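That entry describes Python 2's built-in; in Python 3 the same function lives in importlib. A minimal sketch, with the standard-library json module standing in for your own script module:

```python
import importlib
import json                      # must already have been imported once

json = importlib.reload(json)    # re-executes the source, returns the module
```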
An alternative solution that has helped me greatly is to maintain a copy of sys.modules keys and pop the new modules after the import to force re-imports of deep imports:
>>> oldmods = set(sys.modules.keys())
>>> import script
>>> # Do stuff
>>> for mod in set(sys.modules.keys()).difference(oldmods): sys.modules.pop(mod)
>>> import script

imports while starting an interactive shell

When I start the interactive django shell through manage.py, by executing
python -v manage.py shell
from the project directory, I see a lot of modules of the format django.package.module getting imported in the verbose output, but I still have to import them to use them in the shell.
The same happens when I just run the Python shell (with the -v argument). For example, I see this in the verbose output:
import os # precompiled from /usr/local/gdp/lib/python2.4/os.pyc
but I still have to do import os to import and use the os module. What is being imported that I am seeing in the verbose output, and why do I have to import these modules explicitly again to use them in the shell? Does Python load some essential modules while starting the shell, or is it some kind of behind-the-scenes magic?
-v traces the first import of a module -- the one that actually loads the module (executes its code, and so may take a bit of time) and sticks it into sys.modules.
That has nothing to do with whether your interactive session (module __main__) gets the module injected into its namespace, of course. To ensure module 'goo' gets into the namespace of module 'X' (for any X, so of course including __main__, among many, many others), module 'X' just needs to import goo itself (a very fast operation indeed, if sys.modules['goo'] is already defined!).
Python loads the site module implicitly when starting up, which may in turn import other modules for its own use. You can pass -S to disable this behavior.
They are getting imported (look at sys.modules) and references to the module are created in whichever modules have imported it.
When you do an import in your shell, if the module has already been imported, you will just get a new reference to the existing module object in sys.modules.
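A sketch of that: after interpreter startup, os is typically already loaded (the site machinery imports it), so a later import is just a name binding into your namespace:

```python
import sys

print('os' in sys.modules)     # usually True even before we import it here

import os                      # no loading happens; this just binds a name
assert os is sys.modules['os']
```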
