While developing a largish project (split into several files and folders) in Python with IPython, I run into the trouble of cached imported modules.
The problem is that an import module statement only reads the module once, even if that module has changed in the meantime! So each time I change something in my package, I have to quit and restart IPython. Painful.
Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?
I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...
The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.
A good alternative would be for IPython to somehow restart, or to restart the Python interpreter itself.
So, if you develop in Python, what solution have you found to this problem?
Edit
To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force-reload a module without all sorts of strange errors happening?
More specifically, if I have my whole module in one file module.py then the following works fine:
import sys

try:
    del sys.modules['module']
except KeyError:
    pass

import module
obj = module.my_class()
This piece of code works beautifully and I can develop without quitting IPython for months.
However, whenever my module is made of several submodules, hell breaks loose:
import sys

for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?
Why does it make such a difference to Python whether my module is in one big file or in several submodules? Why would that approach not work?
import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
There is the reload builtin function which will, given a module object, reload it from disk, and the result will be placed in sys.modules. Edit -- actually, it will recompile the code from the file on disk and then re-evaluate it in the existing module's __dict__, which is potentially very different from creating a new module object.
Mike Graham is right though; getting reloading right is hard if you have even a few live objects that reference the contents of the module you no longer want. That existing objects will still reference the classes they were instantiated from is an obvious issue, but all references created by means of from module import symbol will also still point to the object from the old version of the module. Many subtly wrong things are possible.
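To see the distinction, here is a quick sketch (mymod and SomeClass are placeholders for your own module and one of its module-level classes):

import importlib   # Python 3; in Python 2, reload() is a builtin
import mymod
from mymod import SomeClass

before = mymod
importlib.reload(mymod)

print(mymod is before)               # True: the existing module object was re-populated
print(SomeClass is mymod.SomeClass)  # False: this name still points at the old class object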
Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy (otherwise) with module loading in your package, it might be useful.
import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print 'reloading %s' % key
        # __import__ returns the top-level package, so fetch the freshly
        # loaded submodule from sys.modules instead
        __import__(key)
        newmodule = sys.modules[key]
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)
Which I very briefly tested like so:
import email, email.mime, email.mime.application
reload_package(email)
printing:
reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Quitting and restarting the interpreter is the best solution. Any sort of live-reloading or no-caching strategy will not work seamlessly, because objects from no-longer-existing modules can linger, because modules sometimes store state, and because even if your use case really does allow hot reloading, it's too complicated to reason about to be worth it.
IPython comes with the autoreload extension, which automatically reloads modules that have changed before executing the code you type. It works at least in simple cases, but don't rely on it too much: in my experience, an interpreter restart is still required from time to time, especially when the changes occur only in indirectly imported code.
Usage example from the linked page:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
For Python 3.4 and above:
import importlib
import <package_name>   # reload() needs the already-imported module object
importlib.reload(<package_name>)
from <package_name> import <method_name>
Refer to the importlib.reload documentation for details.
There are some really good answers here already, but it is worth knowing about dreload, a function available in IPython that does a "deep reload". From the documentation:
The IPython.lib.deepreload module allows you to recursively reload a
module: changes made to any of its dependencies will be reloaded
without having to exit. To start using it, do:
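As far as I recall from the linked docs, the import it refers to looks like the following (mymodule below is just a placeholder for one of your own modules):

from IPython.lib.deepreload import reload as dreload

import mymodule
dreload(mymodule)   # recursively reloads mymodule and the modules it depends on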
http://ipython.org/ipython-doc/dev/interactive/reference.html#dreload
It is available as a "global" in the IPython notebook (at least in my version, which is running v2.0).
HTH
You can use the import hook machinery described in PEP 302 to load not the modules themselves but some kind of proxy object that lets you do anything you want with the underlying module object: reload it, drop the reference to it, and so on.
An additional benefit is that your existing code will not require any change, and this extra module functionality can be torn off from a single point in the code: wherever you actually add your finder to sys.meta_path.
Some thoughts on implementing it: create a finder that agrees to find any module except built-ins (you have nothing to do with built-in modules), then create a loader that returns a proxy object subclassed from types.ModuleType instead of the real module object. Note that the loader is not forced to insert explicit references to loaded modules into sys.modules, but it is strongly encouraged, because, as you have already seen, things may otherwise fail unexpectedly. The proxy object should catch and forward all __getattr__, __setattr__ and __delattr__ calls to the underlying real module it keeps a reference to. You probably won't need to define __getattribute__, because you are not hiding the real module's contents behind your proxy methods. Now you need some way to communicate with the proxy: you can give it a special method that drops the underlying reference; then, to reload, import the module, extract the reference from the returned proxy, drop the proxy, and hold a reference to the reloaded module. Phew, it looks scary, but it should fix your problem without restarting Python each time.
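A minimal sketch of the proxy part only (Python 3; not a full PEP 302 finder/loader, it only forwards reads, and mymodule is a placeholder):

import importlib
import sys
import types

class ModuleProxy(types.ModuleType):
    """Stands in for a real module and forwards attribute lookups to it."""

    def __init__(self, module):
        super(ModuleProxy, self).__init__(module.__name__)
        # keep the wrapped module in our own __dict__ so that lookups for
        # anything else fall through to __getattr__ below
        self.__dict__['_wrapped'] = module

    def __getattr__(self, name):
        return getattr(self.__dict__['_wrapped'], name)

    def reload(self):
        wrapped = self.__dict__['_wrapped']
        # importlib.reload() insists that sys.modules holds the real module,
        # so swap it back in temporarily, reload, then restore the proxy
        sys.modules[wrapped.__name__] = wrapped
        try:
            self.__dict__['_wrapped'] = importlib.reload(wrapped)
        finally:
            sys.modules[wrapped.__name__] = self

# usage sketch:
import mymodule
sys.modules['mymodule'] = proxy = ModuleProxy(mymodule)
# ... later, after editing mymodule on disk ...
proxy.reload()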
I am using PythonNet in my project. Fortunately, I found that there is a call which solves this problem perfectly.
using (Py.GIL())
{
    dynamic mod = Py.Import(this.moduleName);
    if (mod == null)
        throw new Exception(string.Format(
            "Cannot find module {0}. Python script may not be compiled successfully or module name is illegal.",
            this.moduleName));

    // This command works perfectly for me!
    PythonEngine.ReloadModule(mod);

    dynamic instance = mod.ClassName();
}
Think twice before quitting and restarting in production.
The easy solution without quitting and restarting is to use reload from imp:
import moduleA, moduleB
from imp import reload
reload(moduleB)
Related
Sorry for the confusing title; let me explain what I mean. I came across a piece of code similar to the following, using Google's PrettyTensor API, which allows custom functions to be added to the PrettyTensor class through its @prettytensor.Register() decorator.
(located in custom_ops.py)
import prettytensor as pt

@pt.Register(...)
def custom_foo(bar):
    ...
(located in main.py)
import prettytensor as pt
import custom_ops
x = pt.custom_foo(bar)
This code accesses prettytensor through two separate files, and I don't understand why the changes made in one file carry over to the other. What's also interesting is that the order of the imports doesn't matter:
import custom_ops
import prettytensor as pt
x = pt.custom_foo(bar)
The code above still works fine. I would like help finding an explanation for this phenomenon, as I could not find documentation for it anywhere. It seems to me that the Python interpreter is caching the module in memory, so that when it is altered by the custom_ops file, the change persists when it is imported again. If anyone knows why this happens, how would you stop it from occurring?
The reason both your modules see the same version of the prettytensor module is that Python caches the module objects it creates when it loads a module for the first time. The same module object can then be imported any number of times in different places (or even several times within the same module, if you had a reason to do that) without being reloaded from its file.
You can see all the modules that have been loaded in the dictionary sys.modules. Whenever you do an import of a module that's already been loaded, Python will see it in sys.modules and you'll get a reference to the module object that already exists instead of a new module loaded from the .py file.
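You can verify this directly (json here is just a stand-in for prettytensor or any other module):

import sys
import json

print('json' in sys.modules)   # True: the loaded module object is cached here

import json as json_again
print(json is json_again)      # True: the second import returned the cached object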
In general, this is what you want. It's usually a very bad thing if two different parts of the code can get a reference to a module loaded from the same file via two different module names. For instance, you can have two objects that both claim to be instances of class foo.Foo, but they could be instances of two different foo.Foo classes if foo can be accessed two different ways. This can make debugging a real nightmare.
Duplicated modules can happen if your Python module search path is messed up (so that the modules inside a package are also exposed at the top level). It can also happen with the __main__ module (created from the file you're running as a script), which can also be imported using its normal name (e.g. main in your example with main.py).
You can also manually reload a module using the reload function. In Python 2 this was a builtin, but it's stashed away in importlib now in Python 3.
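A version-agnostic way to spell it (foo stands for whichever module you want to reload):

import foo

try:
    from importlib import reload   # Python 3: reload lives in importlib
except ImportError:
    pass                           # Python 2: reload is already a builtin

reload(foo)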
I have a Python module that I am testing, and because of the way that the module works (it does some initialization upon import) have been reloading the module during each unittest that is testing the initialization. The reload is done in the setUp method, so all tests are actually reloading the module, which is fine.
This all works great if I am only running tests in that file during any given Python session because I never required a reference to the previous instance of the module. But when I use Pydev or unittest's discover I get errors as seen here because other tests which import this module have lost their reference to objects in the module since they were imported before all of the reloading business in my tests.
There are similar questions around SO like this one, but those all deal with updating objects after reloads have occurred. What I would like to do is save the state of the module after the initial import, run my tests that do all of the reloading, and then in the test tearDown to put the initial reference to the module back so that tests that run downstream that use the module still have the correct reference. Note that I am not making any changes to the module, I am only reloading it to test some initialization pieces that it does.
There are also some solutions that include hooks in the module code which I am not interested in. I don't want to ask developers to push things into the codebase just so tests can run. I am using Python 2.6 and unittest. I see that some projects exist like process-isolation, and while I am not sure if that does entirely what I am asking for, it does not work for Python 2.6 and I don't want to add new packages to our stack if possible. Stub code follows:
import unittest

import mypackage.mymodule

saved_module = mypackage.mymodule

class SomeTestThatReloads(unittest.TestCase):

    def setUp(self):
        reload(mypackage.mymodule)

    def tearDown(self):
        # What to do here with saved_module?
        pass

    def test_initialization(self):
        # testing scenario code
        pass
Unfortunately, there is no simple way to do that. If your module's initialization has side effects (and by the looks of it, it does: hooks, etc.), there is no automated way to undo them, short of entirely restarting the Python process.
Similarly, if anything in your code imports something from your module rather than the module itself (e.g. from my_package.my_module import some_object instead of import my_package.my_module), reloading the module won't do anything to the imported objects (some_object will refer to whatever my_package.my_module.some_object referred to when the import statement was executed, regardless of what you reload and what's on the disk).
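A quick illustration of that pitfall, using the Python 2 reload() to match the question (the names are placeholders):

import my_package.my_module
from my_package.my_module import some_object

reload(my_package.my_module)

# some_object is still the object created when the module body first ran;
# only a fresh attribute lookup sees the reloaded version
print(some_object is my_package.my_module.some_object)   # False once reload re-creates it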
The problem this all comes down to is that Python's module system works by executing the modules (which is full of side effects, the definition of classes/functions/variables being only one of many) and then exposing the top-level variables they created, and the Python VM itself treats modules as one big chunk of global state with no isolation.
Therefore, the general solution to your problem is to restart a new Python process after each test (which sucks :( ).
If your modules' initialization side effects are limited, you can try running your tests with Nose instead of Unittest (the tests are compatible, you don't have to rewrite anything), whose Isolate plugin attempts to do what you want: http://nose.readthedocs.org/en/latest/plugins/isolate.html
But it's not guaranteed to work in the general case, because of what I said above.
For efficiency's sake I am trying to figure out how Python works with its heap of objects (and its system of namespaces, but that is more or less clear). So, basically, I am trying to understand when objects are loaded into the heap, how many of them there are, how long they live, etc.
And my question is when I work with a package and import something from it:
from pypackage import pymodule
what objects get loaded into the memory (into the object heap of the python interpreter)? And more generally: what happens? :)
I guess the above example does something like:
Some object for the package pypackage was created in memory (containing some information about the package, but not too much); the module pymodule was loaded into memory and a reference to it was created in the local namespace. The important thing here is: no other modules of pypackage (nor other objects) were created in memory, unless stated explicitly (in the module itself, or somewhere in the package initialization tricks and hooks, which I am not familiar with). In the end, the only big thing in memory is pymodule (i.e. all the objects that were created when the module was imported). Is that so? I would appreciate it if someone clarified this matter a little. Maybe you could also recommend a useful article about it? (The documentation covers more particular things.)
I have found the following answer to a similar question about module imports:
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Otherwise, Python does something like this:
Create a new, empty module object (this is essentially a dictionary)
Insert that module object in the sys.modules dictionary
Load the module code object (if necessary, compile the module first)
Execute the module code object in the new module’s namespace. All variables assigned by the code will be available via the module object.
I would also be grateful for the same kind of explanation about packages.
By the way, with packages, a module name is added to sys.modules somewhat oddly:
>>> import sys
>>> from pypacket import pymodule
>>> "pymodule" in sys.modules.keys()
False
>>> "pypacket" in sys.modules.keys()
True
There is also a practical question concerning the same matter.
When I build a set of tools that might be used in different processes and programs, I put them in modules. I then have no choice but to load a full module even when all I want is to use a single function declared there. As I see it, one can make this problem less painful by making small modules and putting them into a package (provided a package doesn't load all of its modules when you import only one of them).
Is there a better way to make such libraries in Python? (With mere functions that don't have any dependencies within their module.) Is it possible with C extensions?
PS sorry for such a long question.
You have a few different questions here. . .
About importing packages
When you import a package, the sequence of steps is the same as when you import a module. The only difference is that the package's code (i.e., the code that creates the "module code object") is the code of the package's __init__.py.
So yes, the sub-modules of the package are not loaded unless the __init__.py does so explicitly. If you do from package import module, only module is loaded, unless of course it imports other modules from the package.
sys.modules names of modules loaded from packages
When you import a module from a package, the name that is added to sys.modules is the "qualified name": the module name together with the dot-separated names of the packages you imported it from. So if you do from package.subpackage import mod, what is added to sys.modules is "package.subpackage.mod".
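For instance (package, subpackage and mod are placeholders for your own layout):

import sys
from package.subpackage import mod

print('mod' in sys.modules)                      # False: the bare name is not a key
print('package.subpackage.mod' in sys.modules)   # True: the qualified name is the key
print('package' in sys.modules)                  # True: parent packages are loaded too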
Importing only part of a module
It is usually not a big concern to have to import the whole module instead of just one function. You say it is "painful" but in practice it almost never is.
If, as you say, the functions have no external dependencies, then they are just pure Python and loading them will not take much time. Usually, if importing a module takes a long time, it's because it loads other modules, which means it does have external dependencies and you have to load the whole thing.
If your module has expensive operations that happen on module import (i.e., they are global module-level code and not inside a function), but aren't essential for use of all functions in the module, then you could, if you like, redesign your module to defer that loading until later. That is, if your module does something like:
def simpleFunction():
    pass

# open files, read huge amounts of data, do slow stuff here
you can change it to
def simpleFunction():
    pass

def loadData():
    # open files, read huge amounts of data, do slow stuff here
    pass
and then tell people "call someModule.loadData() when you want to load the data". Or, as you suggested, you could put the expensive parts of the module into their own separate module within a package.
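A common refinement of that idea caches the result so repeated calls stay cheap (a sketch; _read_huge_files is a hypothetical helper):

_data = None

def loadData():
    # load lazily on first use and cache the result for later calls
    global _data
    if _data is None:
        _data = _read_huge_files()
    return _data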
I've never found it to be the case that importing a module caused a meaningful performance impact unless the module was already large enough that it could reasonably be broken down into smaller modules. Making tons of tiny modules that each contain one function is unlikely to gain you anything except maintenance headaches from having to keep track of all those files. Do you actually have a specific situation where this makes a difference for you?
Also, regarding your last point, as far as I'm aware, the same all-or-nothing load strategy applies to C extension modules as for pure Python modules. Obviously, just like with Python modules, you could split things up into smaller extension modules, but you can't do from someExtensionModule import someFunction without also running the rest of the code that was packaged as part of that extension module.
The approximate sequence of steps that occurs when a module is imported is as follows:
Python tries to locate the module in sys.modules and does nothing else if it is found. Packages are keyed by their full name, so while pymodule is missing from sys.modules, pypacket.pymodule will be there (and can be obtained as sys.modules["pypacket.pymodule"]).
Python locates the file that implements the module. If the module is part of the package, as determined by the x.y syntax, it will look for directories named x that contain both an __init__.py and y.py (or further subpackages). The bottom-most file located will be either a .py file, a .pyc file, or a .so/.pyd file. If no file that fits the module is found, an ImportError will be raised.
An empty module object is created, and the code in the module is executed with the module's __dict__ as the execution namespace.1
The module object is placed in sys.modules, and injected into the importer's namespace.
Step 3 is the point at which "objects get loaded into memory": the objects in question are the module object, and the contents of the namespace contained in its __dict__. This dict typically contains top-level functions and classes created as a side effect of executing all the def, class, and other top-level statements normally contained in each module.
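The core of steps 1, 3 and 4 can be sketched in plain Python (heavily simplified: step 2, locating the file, is taken as given via the path_to_source placeholder, and real imports also handle packages, bytecode caching and import hooks):

import sys
import types

def manual_import(name, path_to_source):
    # step 1: reuse the module if it is already in sys.modules
    if name in sys.modules:
        return sys.modules[name]

    # step 3: create an empty module object and execute the module's
    # code with the module's __dict__ as the namespace
    module = types.ModuleType(name)
    sys.modules[name] = module   # step 4, done early so circular imports work
    with open(path_to_source) as f:
        code = compile(f.read(), path_to_source, 'exec')
    exec(code, module.__dict__)

    return module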
Note that the above only describes the default implementation of import. There are a number of ways one can customize import behavior, for example by overriding the __import__ built-in or by implementing import hooks.
1 If the module file is a .py source file, it will be compiled into memory first, and the code objects resulting from the compilation will be executed. If it is a .pyc, the code objects will be obtained by deserializing the file contents. If the module is a .so or a .pyd shared library, it will be loaded using the operating system's shared-library loading facility, and the init<module> C function will be invoked to initialize the module.
During testing I added a reload command to my test cases so I could change code in several different places without having to reload everything manually and I noticed the reloads seemed to affect the results of the tests.
Here's what I did:
import mymodule
import mymodule.rules as rules

def testcase():
    reload(mymodule)
    reload(rules)
    # The rest of the test case
Everything works fine like this, or when both reloads are commented out, but when I comment out the second reload the results of the test are different. Is there something that happens during the reload process that I'm not aware of that necessitates reloading all scripts from a module once the module is reloaded? Is there some other explanation?
I'm not sure if this is relevant, but rules is a separate script inside the package that includes this line:
from mymodule import Rule
The information in your question is rather vague, and your terminology rather non-standard. From
rules is a separate script inside mymodule.
I infer that mymodule is actually a package, and it seems it does not automatically import rules upon import. This implies that after executing
import mymodule
there will be no mymodule.rules, but after executing
import mymodule.rules as rules
the module rules will be imported into the namespace of mymodule. (Side note: The latter code line is usually written as from mymodule import rules.)
After executing the first reload() statement, you will get a fresh copy of mymodule, which will not contain mymodule.rules – this will only be recreated after the second reload() statement.
I had to do a lot of guessing for this answer, so I might have gotten it wrong. The reload() statement has lots of subtleties, as can be seen in its documentation, so it's better to only use it if you are closely familiar with Python's import machinery.
(Another side note: If rules.py resides inside the package mymodule, as your setup suggests, you should use a relative import there. Instead of
from mymodule import Rule
you should do
from . import Rule
I also recommend from __future__ import absolute_import for more transparent import rules.)
I'm not sure exactly what is causing your problem, but I think you may be misusing reload().
According to the docs, reload() will
Reload a previously imported module.
However, if you're running this in a testcase, there won't be any changes to the module between when you import it and when you reload it, right? In order for there to be changes, I think you would have to be changing those files as the test case runs, which is probably not a good idea.
I'm running a website using Django, and I import ipdb at the beginning of almost all of my scripts to make debugging easier. However, most of the time I never use the functions from the module (only when I'm debugging).
Just wondering, will this decrease my performance? It's just that when I want to create a breakpoint I prefer to write:
ipdb.set_trace()
as opposed to:
import ipdb; ipdb.set_trace()
But I've seen the second example done in several places, which makes me wonder if it's more efficient...
I just don't know how importing python modules relates to efficiency (assuming you're not using the module methods within your script).
As @wRAR mentioned, loading a module may imply executing any amount of code, which can take any amount of time. On the other hand, the module will only be loaded once: any subsequent attempt to import it will find the module present in sys.modules and get a reference to that.
In a Django environment in debugging mode, modules are removed from Django's AppCache and actually re-imported only when they are changed, which you will probably not do with ipdb, so in your case it should not be an issue.
However, in cases where it would be an issue, there are some ways around it. Suppose you have a custom module that you load anyway; you can add a function to it that imports ipdb only when you require it:
# much used module: mymodule
def set_trace():
    import ipdb
    ipdb.set_trace()
In the module where you want to use ipdb.set_trace:
import mymodule
mymodule.set_trace()
or, at the top of your module, use the cross-module __debug__ variable:
if __debug__:
    from ipdb import set_trace
else:
    def set_trace(): return
Short answer: Not usually
Long answer:
It will take time to load the module. This may be noticeable if you are loading Python off a network drive or another slow source, but if you are running directly off a hard drive you'll never notice.
As @wRAR points out, importing a module can execute any amount of code. You can have whatever code you want executed at module startup. However, most modules avoid executing unreasonable amounts of code during startup, so that in itself probably isn't a huge cost.
However, importing very large modules, especially those that also result in importing a large number of C modules, will take time.
So importing will take time, but only once per module imported. If you import modules at the top of your modules (as opposed to inside functions), it only applies to startup time anyway. Basically, you aren't going to get much optimisation mileage out of avoiding imports.
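If you want to check this on your own machine, a crude measurement looks like this (json is just a stand-in for whichever module you care about):

import time

start = time.time()
import json                      # first import: loads the module (unless something already did)
print('first import:  %.6f s' % (time.time() - start))

start = time.time()
import json                      # second import: found in sys.modules, effectively free
print('second import: %.6f s' % (time.time() - start))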
Importing a module but not using it decreases (the system) performance:
It takes time to import the module
Imported modules use up memory
While the first point makes your program slower to start, the second point might make ALL your programs slower, depending on the total amount of memory you have on your system.