I have to code something in Jython for CCPS (a program that uses Jython as its scripting interface). However, Jython does not pick up changes to the submodules when I edit them in an editor unless I restart the program, and the startup time is prohibitive. So testing and adjusting is relatively slow.
I googled and found a post indicating that one should import or reload the submodules. The basic outline is thus:
import sys
def loader(module, part=None):
    if module not in sys.modules:
        if part is None:
            exec("import " + module)
        else:
            exec("from %s import %s" % (module, part))
    else:
        exec("reload(" + module + ")")
However, I have an issue with this: the module is bound locally, meaning I can access the module within the loader() function but not in my main code.
Two questions:
What is the right way to test something with submodules in Jython without restarting Jython after each submodule change?
Is there a way to generate globals dynamically so I can import into the global namespace?
(e.g. exec("global %(mod)s = %(mod)s" % ({'mod':module}))
How about just unloading all modules so they are reloaded on the next import:
import sys
sys.modules.clear()
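Regarding the second question: instead of building import statements as strings, a helper can import or reload the module through importlib and return it, so the caller decides where the result is bound. The sketch below assumes CPython 3.4+ (on a 2.7-based interpreter such as Jython, the builtin reload() would take the place of importlib.reload()); the namespace parameter is an illustrative addition, not part of the original loader().

```python
import importlib
import sys


def loader(module, part=None, namespace=None):
    """Import `module`, or reload it if it was imported before.

    Returns the module, or its attribute `part` when one is given.
    If `namespace` is a dict such as the caller's globals(), the result is
    also bound there: under `part`, or under the last dotted component of
    `module` (like `from a import b`). Binding in the caller's namespace is
    what exec("import ...") inside a function cannot do.
    """
    if module in sys.modules:
        mod = importlib.reload(sys.modules[module])
    else:
        mod = importlib.import_module(module)
    result = getattr(mod, part) if part is not None else mod
    if namespace is not None:
        name = part if part is not None else module.rsplit(".", 1)[-1]
        namespace[name] = result
    return result
```

At module level you would then write either `json = loader("json")` or `loader("json", namespace=globals())`; calling it again after editing the module re-executes the module's code.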
I have a python program that loads quite a bit of data before running. As such, I'd like to be able to reload code without reloading data. With regular python, importlib.reload has been working fine. Here's an example:
setup.py:
from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize
extensions = [
    Extension("foo.bar", ["foo/bar.pyx"],
              language="c++",
              extra_compile_args=["-std=c++11"],
              extra_link_args=["-std=c++11"])
]
setup(
    name="system2",
    ext_modules=cythonize(extensions, compiler_directives={'language_level': "3"}),
)
foo/bar.pyx:
cpdef say_hello():
    print('Hello!')
runner.py:
import pyximport
pyximport.install(reload_support=True)
import foo.bar
import subprocess
from importlib import reload
if __name__ == '__main__':
    def reload_bar():
        p = subprocess.Popen('python setup.py build_ext --inplace',
                             shell=True,
                             cwd='<your directory>')
        p.wait()
        reload(foo.bar)
        foo.bar.say_hello()
But this doesn't seem to work. If I edit bar.pyx and run reload_bar, I don't see my changes. I also tried pyximport.build_module() with no luck: the module was rebuilt but not reloaded. I'm running in a "normal" Python shell, not IPython, if it makes a difference.
I was able to get a solution working for Python 2.x a lot more easily than for Python 3.x. For whatever reason, Cython seems to cache the shared object (.so) file it imports your module from: even after rebuilding, and deleting the old file while running, it still imports from the old shared object file. However, that file isn't needed anyway (when you import foo.bar, it doesn't create one), so we can just skip this step.
The largest problem was that Python kept a reference to the old module, even after reloading. Normal Python modules seem to work fine, but not anything Cython-related. To fix this, I execute two statements in place of reload(foo.bar):
del sys.modules['foo.bar']
import foo.bar
This successfully (though probably less efficiently) reloads the Cython module. The only remaining issue in Python 3.x is that running that subprocess creates problematic shared objects. Instead, skip that altogether and let import foo.bar work its magic with the pyximport module, which recompiles for you. I also added an option to the pyximport.install call to specify the language level, matching what you specified in setup.py:
pyximport.install(reload_support=True, language_level=3)
So all together:
runner.py:
import sys
import pyximport
pyximport.install(reload_support=True, language_level=3)
import foo.bar
if __name__ == '__main__':
    def reload_bar():
        del sys.modules['foo.bar']
        import foo.bar
    foo.bar.say_hello()
    input("  press enter to proceed  ")
    reload_bar()
    foo.bar.say_hello()
The other two files remain unchanged.
Running:
Hello!
press enter to proceed
-replace "Hello!" in foo/bar.pyx with "Hello world!", and press Enter.
Hello world!
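The del sys.modules[...] plus re-import trick from this answer can be wrapped into a small helper that returns the freshly imported module. This is a sketch for plain Python modules (reimport is a made-up name); according to the answer above, with pyximport installed using reload_support=True, this same re-import is what triggers the rebuild. Unlike importlib.reload(), it creates a new module object, so anything still referencing the old module keeps seeing the old code.

```python
import importlib
import sys


def reimport(name):
    """Forget module `name` and import it again from scratch.

    Equivalent to `del sys.modules[name]; import name`, but it returns the
    new module object instead of binding a (possibly local) name. For a
    dotted name such as "foo.bar", the import machinery also rebinds the
    attribute on the parent package, so `foo.bar` refers to the new module.
    """
    sys.modules.pop(name, None)
    importlib.invalidate_caches()  # let the path finders notice changed files
    return importlib.import_module(name)
```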
Cython extensions are not the usual Python modules, so the behavior of the underlying OS shows through. This answer is about Linux, but other OSes have similar behavior/problems (OK, Windows wouldn't even allow you to rebuild the extension).
A Cython extension is a shared object. When importing it, CPython opens this shared object via dlopen and calls the init function, i.e. PyInit_<module_name> in Python 3, which among other things registers the functions/functionality provided by the extension.
Once a shared object is loaded, it can no longer be unloaded, because some Python objects might still be alive, and they would then have dangling pointers instead of function pointers to the functionality from the original shared object. See for example this CPython issue.
Another important thing: when dlopen loads a shared object with the same path as an already loaded shared object, it will not read it from disk, but just reuse the already loaded version, even if there is a different version on disk.
And this is the problem with our approach: as long as the resulting shared object has the same name as the old one, you will never see the new functionality in the interpreter without restarting it.
What are your options?
A: Use pyximport with reload_support=True
Let's assume your Cython (foo.pyx) module looks as follows:
def doit():
    print(42)
# called when loaded:
doit()
Now import it with pyximport:
>>> import pyximport
>>> pyximport.install(reload_support=True)
>>> import foo
42
>>> foo.doit()
42
foo.pyx was built and loaded (we can see it prints 42 while loading, as expected). Let's take a look at the file of foo:
>>> foo.__file__
'/home/XXX/.pyxbld/lib.linux-x86_64-3.6/foo.cpython-36m-x86_64-linux-gnu.so.reload1'
You can see the additional reload1 suffix compared to the case built with reload_support=False. Seeing the file name, we also verify that there is no other foo.so lying somewhere in the path and being wrongly loaded.
Now, let's change 42 to 21 in the foo.pyx and reload the file:
>>> import importlib
>>> importlib.reload(foo)
21
>>> foo.doit()
42
>>> foo.__file__
'/home/XXX/.pyxbld/lib.linux-x86_64-3.6/foo.cpython-36m-x86_64-linux-gnu.so.reload2'
What happened? pyximport built an extension with a different suffix (reload2) and loaded it. This succeeded because the name/path of the new extension differs thanks to the new suffix, and we can see 21 printed while loading.
However, foo.doit() is still the old version! If we look up the reload-documentation, we see:
When reload() is executed:
Python module’s code is recompiled and the module-level code re-executed, defining a new set of objects which are bound to names in the module’s dictionary by reusing the loader which originally loaded the module. The init function of extension modules is not called a second time.
The init function (i.e. PyInit_<module_name>) isn't executed again for extensions (and that includes Cython extensions), so PyModuleDef_Init with the foo module definition isn't called and one is stuck with the old definition bound to foo.doit. This behavior is sane, because for some extensions the init function isn't supposed to be called twice.
To fix it we have to import the module foo once again:
>>> import foo
>>> foo.doit()
21
Now foo is reloaded as well as it gets, which means there might still be old objects in use. But I trust you to know what you are doing.
B: Change the name of your extensions with every version
Another strategy could be to build the module foo.pyx as foo_prefix1.so and then foo_prefix2.so and so on and load it as
>>> import foo_prefixX as foo
This is the strategy used by the %%cython magic in IPython, which uses the SHA-1 hash of the Cython code as part of the module name.
One can emulate IPython's approach using imp.load_dynamic (or its implementation with the help of importlib, as imp is deprecated and was removed in Python 3.12):
from importlib._bootstrap import _load
def load_dynamic(name, path, file=None):
    """
    Load an extension module.
    """
    import importlib.machinery
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    # Issue #24748: Skip the sys.modules check in _load_module_shim;
    # always load new extension
    spec = importlib.machinery.ModuleSpec(
        name=name, loader=loader, origin=path)
    return _load(spec)
Now, by putting the .so files into different folders (or adding a suffix) so that dlopen sees them as different from the previous version, we can use it:
# first argument (name="foo") tells how the init-function
# of the extension (i.e. `PyInit_<module_name>`) is called
foo = load_dynamic("foo", "1/foo.cpython-37m-x86_64-linux-gnu.so")
# now foo has new functionality:
foo = load_dynamic("foo", "2/foo.cpython-37m-x86_64-linux-gnu.so")
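On Python 3.5+ the same effect can probably be achieved without the private importlib._bootstrap._load, using the public importlib.util functions from the "importing a source file directly" recipe in the importlib documentation. This is a sketch, not IPython's actual code, and load_from_path is a made-up name. spec_from_file_location() picks ExtensionFileLoader for paths ending in an extension-module suffix, so for a .so file the name argument again has to match the PyInit_<name> symbol, and each version still needs a distinct path so that dlopen reads the new file.

```python
import importlib.util
import sys


def load_from_path(name, path):
    """Load the file at `path` as module `name`, always as a new module object.

    Works for source files and for extension modules; sys.modules is not
    consulted first, so an already imported `name` does not short-circuit it.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None:
        raise ImportError("cannot create a module spec for %r" % (path,))
    module = importlib.util.module_from_spec(spec)  # dlopen happens here for .so files
    sys.modules[name] = module  # replace any earlier version, as _load() does
    spec.loader.exec_module(module)
    return module
```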
Even though reloading in general, and reloading extensions in particular, is kind of hacky, for prototyping purposes I would probably go with the pyximport solution... or use IPython and the %%cython magic.
I would like to find out the disadvantages of using exec for imports. One of the files serves as an interface to the real implementations of specific functionality, depending on the chosen project (the framework is intended to work on several projects).
First use-case goes like this:
exec ("from API.%s.specific_API_%s import *" % (project, project))
This way I don't have to hard code anything except the variable project which is injected in the interface-module itself.
This is the other way:
if project == 'project_one':
    from API.project_one.specific_API_project_one import *
elif project == 'project_two':
    from API.project_two.specific_API_project_two import *
elif project == 'project_three':
    from API.project_three.specific_API_project_three import *
This way I have to alter this interface file each time a new project is added.
If you need a programmatic way to import modules, please use importlib (or __import__ for really specific cases). The reason: don't reinvent the wheel; there's a way to do what you want without exec. If your project variable comes from the outside world, exec is a huge security issue.
Wildcard imports are considered bad practice: they make your codebase harder to maintain.
Oversimplified example of issues with exec by executing arbitrary code:
module = 'request'
func = 'urlopen'
exec("from urllib.%s import %s" % (module, func))
func = 'urlopen; print("hello python")'
exec("from urllib.%s import %s" % (module, func))
Yes, your example is harder to forge, but the problem stays: giving Python arbitrary code to execute is overkill (with a potential security gap) when you have a tool built exactly for your purpose, namely importing modules programmatically.
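Applied to the question, an importlib-based version might look like the sketch below. It assumes the API.<project>.specific_API_<project> layout from the question; the identifier check and the optional namespace argument (a rough, explicit stand-in for import *) are illustrative additions, not something importlib does for you.

```python
import importlib


def load_project_api(project, namespace=None):
    """Import API.<project>.specific_API_<project> without exec.

    `project` must be a plain identifier, so a value such as
    'x; import os' is rejected instead of being executed.
    If `namespace` is given (e.g. globals()), the module's public names are
    copied into it, roughly what `from ... import *` would do.
    """
    if not project.isidentifier():
        raise ValueError("invalid project name: %r" % (project,))
    module = importlib.import_module(
        "API.{0}.specific_API_{0}".format(project))
    if namespace is not None:
        names = getattr(module, "__all__", None)
        if names is None:
            names = [n for n in vars(module) if not n.startswith("_")]
        namespace.update((n, getattr(module, n)) for n in names)
    return module
```

Adding a project then only means adding its package; the interface file itself stays untouched.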
I'm debugging from the python console and would like to reload a module every time I make a change so I don't have to exit the console and re-enter it. I'm doing:
>>> from project.model.user import *
>>> reload(user)
but I receive:
NameError: name 'user' is not defined
What is the proper way to reload the entire user class? Is there a better way to do this, perhaps auto-updating while debugging?
Thanks.
As asked, the best you can do is
>>> from project.model.user import *
>>> import project # get module reference for reload
>>> reload(project.model.user) # reload step 1
>>> from project.model.user import * # reload step 2
It would be better and cleaner if you used the user module directly, rather than doing import * (which is almost never the right way to do it). Then it would just be
>>> from project.model import user
>>> reload(user)
This would do what you want. But, it's not very nice. If you really need to reload modules so often, I've got to ask: why?
My suspicion (backed up by previous experience with people asking similar questions) is that you're testing your module. There are lots of ways to test a module out, and doing it by hand in the interactive interpreter is among the worst ways. Save one of your sessions to a file and use doctest, for a quick fix. Alternatively, write it out as a program and use python -i. The only really great solution, though, is using the unittest module.
If that's not it, hopefully it's something better, not worse. There's really no good use of reload (in fact, it is no longer a builtin in 3.x). It doesn't work effectively: you might reload a module but leave leftovers from previous versions. It doesn't even work on all kinds of modules: extension modules will not reload properly, or sometimes even break horribly, when reloaded.
The context of using it in the interactive interpreter doesn't leave a lot of choices as to what you are doing, and what the real best solution would be. Outside it, sometimes people used reload() to implement plugins etc. This is dangerous at best, and can frequently be done differently using either exec (ah the evil territory we find ourselves in), or a segregated process.
For Python 3.4+, reload has been moved to the importlib module, so you can use importlib.reload(). You can refer to this post.
>>> import importlib
>>> import project # get module reference for reload
>>> importlib.reload(project.model.user) # reload step 1
>>> from project.model.user import * # reload step 2
For Python 3 versions before 3.4, the module to import is imp (instead of importlib).
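Here is a self-contained sketch of the two-step reload on Python 3.4+, using a throwaway module written to a temporary directory (reload_demo_user is a made-up stand-in for project.model.user). It shows why step 2 is needed: importlib.reload() re-executes the module in place, but a name bound earlier with from ... import still points at the old function until you repeat the from-import.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module (hypothetical name) into a temporary directory.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
path = os.path.join(tmpdir, "reload_demo_user.py")
with open(path, "w") as f:
    f.write("def greet():\n    return 'v1'\n")

importlib.invalidate_caches()
import reload_demo_user
from reload_demo_user import greet

# "Edit" the module on disk; the new source has a different size,
# so the cached bytecode is not reused.
with open(path, "w") as f:
    f.write("def greet():\n    return 'version 2'\n")

importlib.reload(reload_demo_user)   # step 1: re-execute the module
stale = greet                        # still the old function object
from reload_demo_user import greet   # step 2: rebind the name

print(stale(), greet())  # v1 version 2
```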
IPython can reload modules before executing every new line:
%load_ext autoreload
%autoreload 2
Where %autoreload 2 reloads "all modules (except those excluded by %aimport) every time before executing the Python code typed."
See the docs:
https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
You can't use reload() in an effective way.
Python does not provide effective support for reloading or unloading previously imported modules; module references make it impractical to reload a module, because references could exist in many places in your program.
Python 3 has removed the reload() builtin; it is available as imp.reload() and, since Python 3.4, as importlib.reload().
Unfortunately you've got to use:
>>> from project.model import user
>>> reload(user)
I don't know offhand of anything that will automatically reload modules at the interactive prompt… But I don't see any reason one shouldn't exist (in fact, it wouldn't be too hard to implement, either…)
Now, you could do something like this:
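from types import ModuleType
import sys
try:
    _reload_builtin = reload  # Python 2 builtin
except NameError:
    from importlib import reload as _reload_builtin  # Python 3.4+
def reload(thing):
    if isinstance(thing, ModuleType):
        return _reload_builtin(thing)
    elif hasattr(thing, '__module__') and thing.__module__:
        # Reload the module that defines `thing`; `thing` itself is not
        # updated, so fetch it from the module again afterwards.
        module = sys.modules[thing.__module__]
        return _reload_builtin(module)
    else:
        raise TypeError("reload() argument must be a module or have an __module__")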
You could also try twisted.python.rebuild.rebuild.
from test_reload import add_test
where test_reload is a module and add_test is a function.
If you change the function add_test, you of course need to reload it.
Then you can do this:
import importlib
import test_reload
importlib.reload(test_reload)
from test_reload import add_test
This will refresh the function add_test, so add these lines to your code wherever you need the new version (the import test_reload line is needed because from test_reload import add_test alone does not bind the name test_reload).
As of Python 3.4 you can use importlib.reload(module)
>>> from importlib import reload
>>> from project.model import user
>>> reload(user)
When writing Python modules, is there a way to prevent them from being imported twice by client code? Just like C/C++ header files do:
#ifndef XXX
#define XXX
...
#endif
Thanks very much!
Python modules aren't imported multiple times. Just running import twice will not reload the module. If you want it to be reloaded, you have to call the reload() function (importlib.reload() in Python 3). Here's a demo:
foo.py is a module with the single line
print("Hello, I am being imported")
And here is a screen transcript of multiple import attempts.
>>> import foo
Hello, I am being imported
>>> import foo # Will not print the statement
>>> reload(foo) # Will print it again
Hello, I am being imported
Imports are cached, and only run once. Additional imports only cost the lookup time in sys.modules.
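The same behavior can be checked on Python 3, where reload() lives in importlib. This sketch writes the foo.py from the demo into a temporary directory under a made-up name (import_once_demo) and counts how often its top-level code runs.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Throwaway module that announces itself whenever its top-level code runs.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "import_once_demo.py"), "w") as f:
    f.write('print("Hello, I am being imported")\n')
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

out = io.StringIO()
with contextlib.redirect_stdout(out):
    import import_once_demo             # executes the module code
    import import_once_demo             # found in sys.modules: nothing runs
    importlib.reload(import_once_demo)  # executes the module code again

print(out.getvalue().count("being imported"))  # 2
```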
As specified in other answers, Python generally doesn't reload a module when encountering a second import statement for it. Instead, it returns its cached version from sys.modules without executing any of its code.
However there are several pitfalls worth noting:
Importing the main module as an ordinary module effectively creates two instances of the same module under different names.
This occurs because during program startup the main module is set up with the name __main__. Thus, when importing it as an ordinary module, Python doesn't detect it in sys.modules and imports it again, but with its proper name the second time around.
Consider the file /tmp/a.py with the following content:
# /tmp/a.py
import sys
print("%s executing as %s, recognized as %s in sys.modules" % (__file__, __name__, sys.modules[__name__]))
import b
Another file /tmp/b.py has a single import statement for a.py (import a).
Executing /tmp/a.py results in the following output:
root#machine:/tmp$ python a.py
a.py executing as __main__, recognized as <module '__main__' from 'a.py'> in sys.modules
/tmp/a.pyc executing as a, recognized as <module 'a' from '/tmp/a.pyc'> in sys.modules
Therefore, it is best to keep the main module fairly minimal and export most of its functionality to an external module, as advised here.
This answer specifies two more possible scenarios:
Slightly different import statements utilizing different entries in sys.path leading to the same module.
Attempting another import of a module after a previous one failed halfway through.
Summary: when a certain python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit tests before I start some enhancement/refactoring. However, the code imports a certain module which will fail in a unit test setting (because of a database server dependency).
Pseudocode:
from LegacyDataLoader import load_me_data
...
def do_something():
    data = load_me_data()
So, ideally, when Python executes the import line above in a unit test, an alternative class, say MockDataLoader, is loaded instead.
I am still using 2.4.3. I suppose there is an import hook I can manipulate.
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulation of PYTHONPATH. It does not work in my case. So I will elaborate my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other
def test1():
    o = Other()
    o.do_something()
if __name__ == "__main__":
    test1()
When the CI server runs the above test, the test fails. This is because class Other uses LegacyDataLoader, and LegacyDataLoader cannot establish a database connection to the db server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason
File "unit_test/test.py", line 1, in <module>
from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way python resolves classes/attributes in a module
You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__
realimp = __builtin__.__import__
def my_import(name, globals={}, locals={}, fromlist=[]):
    print 'importing', name, fromlist
    return realimp(name, globals, locals, fromlist)
__builtin__.__import__ = my_import
from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?
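On Python 3, __builtin__ is called builtins and __import__ takes an extra level argument. Below is a minimal sketch of the same hook, wrapped in a context manager so the original function is always restored. redirect_import is a made-up helper, and it only handles a top-level module name as target (for dotted names, __import__ normally returns the top-level package, which this sketch does not replicate).

```python
import builtins
import contextlib
import types


@contextlib.contextmanager
def redirect_import(target, replacement):
    """While active, `import target` and `from target import x` use `replacement`."""
    real_import = builtins.__import__

    def hooked_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name == target and level == 0:
            return replacement  # never touches sys.modules
        return real_import(name, globals, locals, fromlist, level)

    builtins.__import__ = hooked_import
    try:
        yield
    finally:
        builtins.__import__ = real_import  # always undo the hook


# Usage with a fake module built in memory (hypothetical contents).
fake_loader = types.ModuleType("LegacyDataLoader")
fake_loader.load_me_data = lambda: "mock data"

with redirect_import("LegacyDataLoader", fake_loader):
    from LegacyDataLoader import load_me_data

print(load_me_data())  # mock data
```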
There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternately, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test
def mock_load_me_data():
    pass  # do mock stuff here
module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
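On Python 3.3+, the standard library's unittest.mock.patch.dict can do the sys.modules insertion for you and undo it afterwards, which keeps one test's fake from leaking into the next. A sketch, with legacy_consumer as a made-up stand-in for the legacy module under test; it is written to a temporary directory only so the example runs on its own.

```python
import importlib
import os
import sys
import tempfile
import types
from unittest import mock

# Stand-in for the legacy code: it imports load_me_data at module level,
# just like the question's pseudocode.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "legacy_consumer.py"), "w") as f:
    f.write(
        "from LegacyDataLoader import load_me_data\n"
        "def do_something():\n"
        "    return load_me_data()\n"
    )
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# Fake LegacyDataLoader module built in memory.
fake_loader = types.ModuleType("LegacyDataLoader")
fake_loader.load_me_data = lambda: ["fake", "rows"]

# patch.dict inserts the fake into sys.modules and, on exit, restores the
# original contents (modules imported inside the block are dropped too).
with mock.patch.dict(sys.modules, {"LegacyDataLoader": fake_loader}):
    import legacy_consumer
    result = legacy_consumer.do_something()

print(result)                             # ['fake', 'rows']
print("LegacyDataLoader" in sys.modules)  # False
```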
Well, if the import fails by raising an exception, you could put it in a try...except block:
try:
    from LegacyDataLoader import load_me_data
except ImportError:  # replace with the error that actually occurs, so as not to mask real problems
    from MockDataLoader import load_me_data
Is that what you're looking for? If it fails but doesn't raise an exception, you could run the unit test with a special command-line flag, such as --unittest, like this:
import sys
if "--unittest" in sys.argv:
    from MockDataLoader import load_me_data
else:
    from LegacyDataLoader import load_me_data