reload module with pyximport? - python

I have a python program that loads quite a bit of data before running. As such, I'd like to be able to reload code without reloading data. With regular python, importlib.reload has been working fine. Here's an example:
setup.py:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
extensions = [
Extension("foo.bar", ["foo/bar.pyx"],
language="c++",
extra_compile_args=["-std=c++11"],
extra_link_args=["-std=c++11"])
]
setup(
name="system2",
ext_modules=cythonize(extensions, compiler_directives={'language_level' : "3"}),
)
foo/bar.py
cpdef say_hello():
print('Hello!')
runner.py:
import pyximport
pyximport.install(reload_support=True)
import foo.bar
import subprocess
from importlib import reload
if __name__ == '__main__':
def reload_bar():
p = subprocess.Popen('python setup.py build_ext --inplace',
shell=True,
cwd='<your directory>')
p.wait()
reload(foo.bar)
foo.bar.say_hello()
But this doesn't seem to work. If I edit bar.pyx and run reload_bar I don't see my changes. I also tried pyximport.build_module() with no luck -- the module rebuilt but didn't reload. I'm running in a "normal" python shell, not IPython if it makes a difference.

I was able to get a solution working for Python 2.x a lot easier than Python 3.x. For whatever reason, Cython seems to be caching the shareable object (.so) file it imports your module from, and even after rebuilding and deleting the old file while running, it still imports from the old shareable object file. However, this isn't necessary anyways (when you import foo.bar, it doesn't create one), so we can just skip this anyways.
The largest problem was that python kept a reference to the old module, even after reloading. Normal python modules seem to work find, but not anything cython related. To fix this, I run execute two statements in place of reload(foo.bar)
del sys.modules['foo.bar']
import foo.bar
This successfully (though probably less efficiently) reloads the cython module. The only issue that remains in in Python 3.x running that subprocess creates a problematic shareable objects. Instead, skip that all together and let the import foo.bar work its magic with the pyximporter module, and recompile for you. I also added an option to the pyxinstall command to specify the language level to match what you've specified in the setup.py
pyximport.install(reload_support=True, language_level=3)
So all together:
runner.py
import sys
import pyximport
pyximport.install(reload_support=True, language_level=3)
import foo.bar
if __name__ == '__main__':
def reload_bar():
del sys.modules['foo.bar']
import foo.bar
foo.bar.say_hello()
input(" press enter to proceed ")
reload_bar()
foo.bar.say_hello()
Other two files remained unchanged
Running:
Hello!
press enter to proceed
-replace "Hello!" in foo/bar.pyx with "Hello world!", and press Enter.
Hello world!

Cython-extensions are not the usual python-modules and thus the behavior of the underlying OS shimmers through. This answer is about Linux, but also other OSes have similar behavior/problems (ok, Windows wouldn't even allow you to rebuild the extension).
A cython-extension is a shared object. When importing, CPython opens this shared object via ldopen and calls the init-function, i.e. PyInit_<module_name> in Python3, which among other things registers the functions/functionality provided by the extension.
Is a shared-object loaded, we no longer can unload it, because there might be some Python objects alive, which would then have dangling pointers instead of function-pointers to the functionality from the original shared-object. See for example this CPython-issue.
Another important thing: When ldopen loads a shared object with the same path as one already loaded shared object, it will not read it from the disc, but just reuse the already loaded version - even if there is a different version on the disc.
And this is the problem with our approach: As long as the resulting shared object has the same name as the old one, you will never get to see the new functionality in the interpreter without restarting it.
What are your options?
A: Use pyximport with reload_support=True
Let's assume your Cython (foo.pyx) module looks as follows:
def doit():
print(42)
# called when loaded:
doit()
Now import it with pyximport:
>>> import pyximport
>>> pyximport.install(reload_support=True)
>>> import foo
42
>>> foo.doit()
42
foo.pyx was built and loaded (we can see, it prints 42 while loading, as expected). Let's take a look at the file of foo:
>>> foo.__file__
'/home/XXX/.pyxbld/lib.linux-x86_64-3.6/foo.cpython-36m-x86_64-linux-gnu.so.reload1'
You can see the additional reload1-suffix compared to the case built with reload_support=False. Seeing the file-name, we also verify that there is no other foo.so lying in the path somewhere and being wrongly loaded.
Now, let's change 42 to 21 in the foo.pyx and reload the file:
>>> import importlib
>>> importlib.reload(foo)
21
>>> foo.doit()
42
>>> foo.__file__
'/home/XXX/.pyxbld/lib.linux-x86_64-3.6/foo.cpython-36m-x86_64-linux-gnu.so.reload2'
What happened? pyximport built an extension with a different prefix (reload2) and loaded it. It was successful, because the name/path of the new extension is different due to the new prefix and we can see 21 printed while loaded.
However, foo.doit() is still the old version! If we look up the reload-documentation, we see:
When reload() is executed:
Python module’s code is recompiled and the module-level code re-executed,
defining a new set of objects which are bound to names in
the module’s dictionary by reusing the loader which originally loaded
the module. The init function of extension modules is not called a
second time.
init (i.e. PyInit_<module_name>) isn't executed for extension (that means also for Cython-extensions), thus PyModuleDef_Init with foo-module-definition isn't called and one is stuck with the old definition bound to foo.doit. This behavior is sane, because for some extension, init-function isn't supposed to be called twice.
To fix it we have to import the module foo once again:
>>> import foo
>>> foo.doit()
21
Now foo is reloaded as good as it gets - which means there might be still old objects being in use. But I trust you to know what you do.
B: Change the name of your extensions with every version
Another strategy could be to build the module foo.pyx as foo_prefix1.so and then foo_prefix2.so and so on and load it as
>>> import foo_perfixX as foo
This is strategy used by %%cython-magic in IPython, which uses sha1-hash of the Cython-code as prefix.
One can emulate IPython's approach using imp.load_dynamic (or its implementation with help of importlib, as imp is deprecated):
from importlib._bootstrap _load
def load_dynamic(name, path, file=None):
"""
Load an extension module.
"""
import importlib.machinery
loader = importlib.machinery.ExtensionFileLoader(name, path)
# Issue #24748: Skip the sys.modules check in _load_module_shim;
# always load new extension
spec = importlib.machinery.ModuleSpec(
name=name, loader=loader, origin=path)
return _load(spec)
And now putting so-files e.g. into different folders (or adding some suffix), so dlopen sees them as different from previous version we can use it:
# first argument (name="foo") tells how the init-function
# of the extension (i.e. `PyInit_<module_name>`) is called
foo = load_dynamic("foo", "1/foo.cpython-37m-x86_64-linux-gnu.so")
# now foo has new functionality:
foo = load_dynamic("foo", "2/foo.cpython-37m-x86_64-linux-gnu.so")
Even if reloading and reloading of extension in particular is kind of hacky, for prototyping purposes I would probably go with pyximport-solution... or use IPython and %%cython-magic.

Related

Execute bytecode .pyc from python code?

I have a bytecode document that declares functions and a logo. I also have a .py file where I call the bytecode to output the logo and strings in the functions. How do I go about actually executing the bytecode? I was able to dissemble it and see the assembly code. How can I actually run it?
question.py
import dis
import logo
def work_here():
# execute the bytecode
def main():
work_here()
if __name__ == '__main__':
main()
Try something like:
import dis
code = 'some byte code'
b_code = dis.Bytecode(code)
exec(b.codeobj)
To import a .pyc file, you just do the same thing you do with a .py file: import spam will find an appropriately-placed spam.pyc (or rather, something like __pycache__/spam.cpython-36.pyc) just as it will find an appropriately-placed spam.py. Its top-level code gets run, any functions and classes get defined so you can call them, etc., exactly the same as with a .py file; the only difference is that there isn't source text to show for things like tracebacks or debugger stepping.
If you want to programmatically import a .pyc file by explicit path, or execute one without importing it, you again do the same thing you do with a .py file.
Look at the Examples in importlib. For example:
path = 'bytecoderepo/myfile.pyc'
spec = importlib.util.spec_from_file('myfile', path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
And now, the code in bytecoderepo/myfile.pyc has been executed, and the resulting module is available in the variable mod, but it isn't in sys.modules or stored as a global.
If you actually need to dig into the .pyc format and, e.g., extract the bytecode of some function so you can exec it (or build a function object out of it) without executing the main module code, the details are only documented in the source, and subject to change between Python versions. Start with importlib; being able to (validate and) skip over the header and marshal.loads the body may be as far as you need to learn, but probably not (since ultimately, that's what the module loader already does for you in the sample code above, so if that's not good enough, you need to get deeper into the internals).

In Python, can one get the path of a relative import imported after and before chdir calls?

I'm looking to get the path of a module after os.chdir has been called.
In this example:
import os
os.chdir('/some/location')
import foo # foo is located in the current directory.
os.chdir('/other/location')
# How do I get the path of foo now? ..is this impossible?
..the foo.__file__ variable will be 'foo.py', as will inspect.stack()[0][1] -- yet, there's no way to know where 'foo.py' is located now, right?
What could I use, outside (or inside, without storing it as a variable at import time) of 'foo', which would allow me to discover the location of foo?
I'm attempting to build a definitive method to determine which file a module is executing from. Since I use IPython as a shell, this is something I could actually run into.
Example usage:
I have two versions of a project I'm working on, and I'm comparing their behavior during the process of debugging them. ..let's say they're in the directories 'proj1' and 'proj2'. ..which foo do I have loaded in the IPython interpreter again?
The ideal:
In [242]: from my_tools import loc
In [243]: loc(foo)
'/home/blah/projects/proj2/foo.py'
** As abarnert noted, that is not possible, as python does not record the base directory location of relative imports. This will, however, work with normal (non-relative) imports.
** Also, regular python (as opposed to IPython) does not allow imports from the current directory, but rather only from the module directory.
The information isn't available anymore, period. Tracebacks, the debugger, ipython magic, etc. can't get at it. For example:
# foo.py
def bar():
1/0
$ ipython
In [1]: import foo
In [2]: os.chdir('/tmp')
In [3]: foo.baz()
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-5-a70d319d0d05> in <module>()
----> 1 foo.baz()
/private/tmp/foo.pyc in baz()
ZeroDivisionError: integer division or modulo by zero
So:
the foo.__file__ variable will be 'foo.py', as will inspect.stack()[0][1] -- yet, there's no way to know where 'foo.py' is located now, right?
Right. As you can see, Python treats it as a relative path, and (incorrectly) resolves it according to the current working directory whenever it needs an absolute path.
What could I use, outside (or inside, without storing it as a variable at import time) of 'foo', which would allow me to discover the location of foo?
Nothing. You have to store it somewhere.
The obvious thing to do is to store os.path.abspath(foo.__file__) from outside, or os.path.abspath(__file__) from inside, at import time. Not what you were hoping for, but I can't think of anything better.
If you want to get tricky, you can build an import hook that modifies modules as they're imported, adding a new __abspath__ attribute or, more simply, changing __file__ to always been an abspath. This is easier with the importlib module Python 3.1+.
As a quick proof of concept, I slapped together abspathimporter. After doing an import imppath, every further import you do that finds a normal .py file or package will absify its __file__.
I don't know whether it works for .so/.pyd modules, or .pyc modules without source. It definitely doesn't work for modules inside zipfiles, frozen modules, or anything else that doesn't use the stock FileFinder. It won't retroactively affect the paths of anything imported before it. It requires 3.3+, and is horribly fragile (most seriously, the FileFinder class or its hook function has to be the last thing in sys.path_hooks—which it is by default in CPython 3.3.0-3.3.1 on four Mac and linux boxes I tested, but certainly isn't guaranteed).
But it shows what you can do if you want to. And honestly, for playing around in iPython for the past 20 minutes or so, it's kind of handy.
import os
import foo
foodir = os.getcwd()
os.chdir('/other/location')
foodir now has original directory stored in it...

Proper way to reload a python module from the console

I'm debugging from the python console and would like to reload a module every time I make a change so I don't have to exit the console and re-enter it. I'm doing:
>>> from project.model.user import *
>>> reload(user)
but I receive:
>>>NameError: name 'user' is not defined
What is the proper way to reload the entire user class? Is there a better way to do this, perhaps auto-updating while debugging?
Thanks.
As asked, the best you can do is
>>> from project.models.user import *
>>> import project # get module reference for reload
>>> reload(project.models.user) # reload step 1
>>> from project.models.user import * # reload step 2
it would be better and cleaner if you used the user module directly, rather than doing import * (which is almost never the right way to do it). Then it would just be
>>> from project.models import user
>>> reload(user)
This would do what you want. But, it's not very nice. If you really need to reload modules so often, I've got to ask: why?
My suspicion (backed up by previous experience with people asking similar questions) is that you're testing your module. There are lots of ways to test a module out, and doing it by hand in the interactive interpreter is among the worst ways. Save one of your sessions to a file and use doctest, for a quick fix. Alternatively, write it out as a program and use python -i. The only really great solution, though, is using the unittest module.
If that's not it, hopefully it's something better, not worse. There's really no good use of reload (in fact, it's removed in 3.x). It doesn't work effectively-- you might reload a module but leave leftovers from previous versions. It doesn't even work on all kinds of modules-- extension modules will not reload properly, or sometimes even break horribly, when reloaded.
The context of using it in the interactive interpreter doesn't leave a lot of choices as to what you are doing, and what the real best solution would be. Outside it, sometimes people used reload() to implement plugins etc. This is dangerous at best, and can frequently be done differently using either exec (ah the evil territory we find ourselves in), or a segregated process.
For python3.4+, reload has been moved to the importlib module. you can use importlib.reload(). You can refer to this post.
>>> import importlib
>>> import project # get module reference for reload
>>> importlib.reload(project.models.user) # reload step 1
>>> from project.models.user import * # reload step 2
For python3 versions before 3.4, the module to import is imp (instead of importlib)
IPython can reload modules before executing every new line:
%load_ext autoreload
%autoreload 2
Where %autoreload 2reloads "all modules (except those excluded by %aimport) every time before executing the Python code typed."
See the docs:
https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
You can't use reload() in a effective way.
Python does not provide an effective support for reloading or unloading of previously imported
modules; module references makes it impractical to reload a module because references could exist in many places of your program.
Python 3 has removed reload() feature entirely.
Unfortunately you've got to use:
>>> from project.model import user
>>> reload(user)
I don't know off the top of my head of something which will automatically reload modules at the interactive prompt… But I don't see any reason one shouldn't exist (in fact, it wouldn't be too hard to implement, either…)
Now, you could do something like this:
from types import ModuleType
import sys
_reload_builtin = reload
def reload(thing):
if isinstance(thing, ModuleType):
_reload_builtin(thing)
elif hasattr(thing, '__module__') and thing.__module__:
module = sys.modules[thing.__module__]
_reload_builtin(module)
else:
raise TypeError, "reload() argument must be a module or have an __module__"
You could also try twisted.python.rebuild.rebuild.
from test_reload import add_test
where test_reload is a module, and add_test is a function
if you changed the function add_test, of course you need to reload this function.
then you can do this:
import imp
imp.reload(test_reload)
from test_reload import add_test
this will refresh the function add_test.
so you need to add
imp.reload(test_reload)
from test_reload import add_test --add this line in your code
As of Python 3.4 you can use importlib.reload(module)
>>> from importlib import reload
>>> from project.model import user
>>> reload(user)

How to prevent a module from being imported twice?

When writing python modules, is there a way to prevent it being imported twice by the client codes? Just like the c/c++ header files do:
#ifndef XXX
#define XXX
...
#endif
Thanks very much!
Python modules aren't imported multiple times. Just running import two times will not reload the module. If you want it to be reloaded, you have to use the reload statement. Here's a demo
foo.py is a module with the single line
print("I am being imported")
And here is a screen transcript of multiple import attempts.
>>> import foo
Hello, I am being imported
>>> import foo # Will not print the statement
>>> reload(foo) # Will print it again
Hello, I am being imported
Imports are cached, and only run once. Additional imports only cost the lookup time in sys.modules.
As specified in other answers, Python generally doesn't reload a module when encountering a second import statement for it. Instead, it returns its cached version from sys.modules without executing any of its code.
However there are several pitfalls worth noting:
Importing the main module as an ordinary module effectively creates two instances of the same module under different names.
This occurs because during program startup the main module is set up with the name __main__. Thus, when importing it as an ordinary module, Python doesn't detect it in sys.modules and imports it again, but with its proper name the second time around.
Consider the file /tmp/a.py with the following content:
# /tmp/a.py
import sys
print "%s executing as %s, recognized as %s in sys.modules" % (__file__, __name__, sys.modules[__name__])
import b
Another file /tmp/b.py has a single import statement for a.py (import a).
Executing /tmp/a.py results in the following output:
root#machine:/tmp$ python a.py
a.py executing as __main__, recognized as <module '__main__' from 'a.py'> in sys.modules
/tmp/a.pyc executing as a, recognized as <module 'a' from '/tmp/a.pyc'> in sys.modules
Therefore, it is best to keep the main module fairly minimal and export most of its functionality to an external module, as advised here.
This answer specifies two more possible scenarios:
Slightly different import statements utilizing different entries in sys.path leading to the same module.
Attempting another import of a module after a previous one failed halfway through.

python refresh/reload

This is a very basic question - but I haven't been able to find an answer by searching online.
I am using python to control ArcGIS, and I have a simple python script, that calls some pre-written code.
However, when I make a change to the pre-written code, it does not appear to result in any change. I import this module, and have tried refreshing it, but nothing happens.
I've even moved the file it calls to another location, and the script still works fine. One thing I did yesterday was I added the folder where all my python files are to the sys path (using sys.append('path') ), and I wonder if that made a difference.
Thanks in advance, and sorry for the sloppy terminology.
It's unclear what you mean with "refresh", but the normal behavior of Python is that you need to restart the software for it to take a new look on a Python module and reread it.
If your changes isn't taken care of even after restart, then this is due to one of two errors:
The timestamp on the pyc-file is incorrect and some time in the future.
You are actually editing the wrong file.
You can with reload re-read a file even without restarting the software with the reload() command. Note that any variable pointing to anything in the module will need to get reimported after the reload. Something like this:
import themodule
from themodule import AClass
reload(themodule)
from themodule import AClass
One way to do this is to call reload.
Example: Here is the contents of foo.py:
def bar():
return 1
In an interactive session, I can do:
>>> import foo
>>> foo.bar()
1
Then in another window, I can change foo.py to:
def bar():
return "Hello"
Back in the interactive session, calling foo.bar() still returns 1, until I do:
>>> reload(foo)
<module 'foo' from 'foo.py'>
>>> foo.bar()
'Hello'
Calling reload is one way to ensure that your module is up-to-date even if the file on disk has changed. It's not necessarily the most efficient (you might be better off checking the last modification time on the file or using something like pyinotify before you reload), but it's certainly quick to implement.
One reason that Python doesn't read from the source module every time is that loading a module is (relatively) expensive -- what if you had a 300kb module and you were just using a single constant from the file? Python loads a module once and keeps it in memory, until you reload it.
If you are running in an IPython shell, then there are some magic commands that exist.
The IPython docs cover this feature called the autoreload extension.
Originally, I found this solution from Jonathan March's blog posting on this very subject (see point 3 from that link).
Basically all you have to do is the following, and changes you make are reflected automatically after you save:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: Import MODULE
In [4]: my_class = Module.class()
my_class.printham()
Out[4]: ham
In [5]: #make changes to printham and save
In [6]: my_class.printham()
Out[6]: hamlet
I used the following when importing all objects from within a module to ensure web2py was using my current code:
import buttons
import table
reload(buttons)
reload(table)
from buttons import *
from table import *
I'm not really sure that is what you mean, so don't hesitate to correct me. You are importing a module - let's call it mymodule.py - in your program, but when you change its contents, you don't see the difference?
Python will not look for changes in mymodule.py each time it is used, it will load it a first time, compile it to bytecode and keep it internally. It will normally also save the compiled bytecode (mymodule.pyc). The next time you will start your program, it will check if mymodule.py is more recent than mymodule.pyc, and recompile it if necessary.
If you need to, you can reload the module explicitly:
import mymodule
[... some code ...]
if userAskedForRefresh:
reload(mymodule)
Of course, it is more complicated than that and you may have side-effects depending on what you do with your program regarding the other module, for example if variables depends on classes defined in mymodule.
Alternatively, you could use the execfile function (or exec(), eval(), compile())
I had the exact same issue creating a geoprocessing script for ArcGIS 10.2. I had a python toolbox script, a tool script and then a common script. I have a parameter for Dev/Test/Prod in the tool that would control which version of the code was run. Dev would run the code in the dev folder, test from test folder and prod from prod folder. Changes to the common dev script would not run when the tool was run from ArcCatalog. Closing ArcCatalog made no difference. Even though I selected Dev or Test it would always run from the prod folder.
Adding reload(myCommonModule) to the tool script resolved this issue.
The cases will be different for different versions of python.
Following shows an example of python 3.4 version or above:
hello import hello_world
#Calls hello_world function
hello_world()
HI !!
#Now changes are done and reload option is needed
import importlib
importlib.reload(hello)
hello_world()
How are you?
For earlier python versions like 2.x, use inbuilt reload function as stated above.
Better is to use ipython3 as it provides autoreload feature.

Categories

Resources