Are Python modules ever garbage collected?

If I load a module in Python, will it ever be garbage collected? Another way of framing this question is, where does Python keep references to Python modules? As I assume if there are no longer any references, the garbage collector will remove a module.
Here's an example I tried in the Python interpreter:
>>> from importlib import import_module
>>> import sys
>>> import gc
>>> x = import_module('math')
>>> 'math' in sys.modules
This outputs:
True
So let's delete the reference to the module in the script.
>>> del x
>>> gc.collect()
>>> 'math' in sys.modules
Python still keeps track of the math module, as the output is still:
True
But now if I delete math from sys.modules, I am no longer aware of any further references:
>>> del sys.modules['math']
>>> gc.collect()
However, the output of gc.collect() is:
0
Nothing was garbage collected, even though the module is no longer referenced from sys.modules or from my script. Why was it not garbage collected?

In general, at least in 3.4 and later, module objects shouldn’t be anything special in this regard. Of course normally there’s a reference to every loaded module in sys.modules, but if you’ve explicitly deleted that, a module should be able to go away.
That being said, there have definitely been problems in the past that prevent that from happening in some cases, and I wouldn’t promise that there aren’t any such problems left as of 3.7.
Unfortunately, your test is not actually testing anything. Presumably you're using CPython. In CPython, objects are managed primarily by reference counting: a count is stored directly on each object, incremented and decremented as references are created and destroyed, and the object is deleted immediately when the count drops to 0. The thing in the gc module is a cycle collector, which is needed to handle the special case where two (or more) objects refer to each other but nobody else refers to them. If the module isn't part of such a cycle, it'll be deleted before you ever call gc.collect(), so of course that call will return 0. But that 0 tells you nothing.
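To see why that 0 proves nothing, here is a minimal sketch (the Node class is made up for illustration). An object outside a cycle dies the moment its refcount hits 0; only the cycle needs gc.collect():

import gc

class Node:
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a        # a and b now refer to each other: a reference cycle

del a, b             # refcounts never reach 0, so the pair is not freed yet
print(gc.collect())  # nonzero: the cycle collector reclaims them here

c = Node()
del c                # no cycle: refcount hits 0, c is freed immediately
print(gc.collect())  # 0: nothing left for the cycle collector to find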
There are other problems with your test.
First, you should not test garbage collection in the interactive interpreter. All kinds of extra stuff gets kept around there, in ways that are complicated to explain. It's much better to write a test script.
Second, you shouldn’t be using math as your test. It’s an extension module (that is, written in C rather than Python), and even after the major changes in 3.5, they still don’t work the same. It’s also a core module that may be part of startup or otherwise needed by other parts of the interpreter, even if you aren’t referencing it from your code. So, far better to use something else.
Anyway, I think there may be a way to test this directly, without using the debugger, but no promises on whether it’ll work.
First, you need to create a subclass of types.ModuleType, which has a __del__ method that prints out some message. Then, you just need to import a module (a .py one, not an extension module) and set its __class__ to that subclass. Which may be as simple as __class__ = MyModuleSubclass in the .py file. Now, when it gets collected, its destructor will run, and you’ll have proof that it was collected. (Well, proof that it was collected unless the destructor revived it, but if your destructor doesn’t do anything but print a static string, that hopefully isn’t a worry.)

Based on the answer from abarnert, I created the following run-it-yourself example that demonstrates the behaviour I was trying to understand:
from types import ModuleType
from importlib import import_module
import sys

class MyModule(ModuleType):
    def __del__(self):
        print('I am being deleted')

if __name__ == '__main__':
    x = import_module('urllib3')
    x.__class__ = MyModule
    del x
    del sys.modules['urllib3']  # Comment this out and urllib3 will NOT be garbage collected before the script finishes
    print('finishing')
Output when run as is:
I am being deleted
finishing
Output with the del sys.modules['urllib3'] line commented out:
finishing
I am being deleted
It is clear that modules are garbage collected as one would expect once all references to them have been deleted, and that unless the module in question is somehow special, this happens once both the references in the application and the entry in sys.modules have been deleted.

Related

Reload function fails to erase removed variables

I am trying to access variables from a module that is being modified while the main script runs, by using reload. But the reload function fails to erase the variables that have been removed from the module. How can I force Python to erase them?
Here is my code
My module.py:
a = 1
b = 2
My main script.py:
import time
from importlib import reload
import module
while True:
    reload(module)
    print('------- module reloaded')
    try:
        print('a: ', module.a)
    except AttributeError:
        print('a: ', 'undefined')
    try:
        print('b: ', module.b)
    except AttributeError:
        print('b: ', 'undefined')
    try:
        print('c: ', module.c)
    except AttributeError:
        print('c: ', 'undefined')
    time.sleep(5)
As expected, if I run my script with python (3.5.1) I get the output:
------- module reloaded
a: 1
b: 2
c: undefined
But I get an unexpected behavior, when I change the module.py as following:
# a = 1
b = 3
c = 4
I have the following output:
------- module reloaded
a: 1
b: 3
c: 4
This means that reload correctly updated the value of b and added the new variable c. But it failed to erase the variable a that has been removed from the module. It seems to only perform updates on the variables that are found in the new version of the module. How to force the reload function to erase removed values ?
Thank you for your help
The issue here is that reload is implemented with code that, roughly, execs the current version of the module code in the existing cached module's namespace. The reason for doing this is that reload is intended to have a global effect: reload(module) doesn't just reload it for you, it changes every other module's personally imported copy of module. If it created a new module namespace, other modules would still have the old module cached. Erasing the contents of the old module before execing might help, but it would risk breaking already-imported submodules of a package, trigger race conditions (a thread might see module.a before and after the reload, but it would mysteriously disappear for a moment during the reload), etc.
As the docs note:
When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains.
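A minimal sketch of that mechanism (the module name 'demo' is made up): reloading amounts to exec-ing the new source into the same dict, so names the new source doesn't assign simply survive:

import sys
import types

mod = types.ModuleType('demo')
sys.modules['demo'] = mod
exec('a = 1', mod.__dict__)  # first "import": run the source in the module's dict
exec('b = 2', mod.__dict__)  # "reload": run the new source in the SAME dict
print(mod.a, mod.b)          # 1 2 -- the old name a survives the reload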
There are workarounds that bypass this safety mechanism if you absolutely must do it. The simplest is to simply remove the module from the module cache and reimport it, rather than reloading it:
import sys # At top of file
del sys.modules['module']
import module
That won't update any other importers of the module (they'll keep the stale cache), but if the module is only used in your module, that'll work.
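A sketch of that stale-cache caveat, assuming a hypothetical other_mod that itself did import module earlier:

import sys
import other_mod       # hypothetical: other_mod did "import module" at import time

del sys.modules['module']
import module          # fresh copy, bound in this namespace only

print(module is other_mod.module)  # False: other_mod still holds the old object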
Another approach that might work (untested, and it's kind of insane) would be to explicitly delete all the public names from the module before reloading with something like:
# Intentionally done as list comprehension so no modification to module's globals dict
# occurs while we're iterating it
# Might make sense to use dir or iterate module.__all__ if available instead of using vars;
# depends on design
for name in [n for n in vars(module) if not n.startswith('_')]:
    try:
        delattr(module, name)
    except Exception:
        pass  # Undeletable attribute for whatever reason; just ignore it and let reload deal with it
reload(module)  # Now exec occurs in a clean namespace
That would avoid the stale cache issue, in exchange for insanity.
Really, the answer is "don't use a design that depends on production reloading"; if the module is just data, store it as a JSON file or the like and just reparse it (which is generally much cheaper than what the Python import machinery goes through to import a module).
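For instance, a minimal sketch of that data-file approach (config.json and get_config are made-up names). Unlike reload, this replaces the whole mapping on each change, so names removed from the file really disappear:

import json
import os

_mtime = None
_data = {}

def get_config(path='config.json'):
    # Reread the file only when its modification time changes
    global _mtime, _data
    mtime = os.stat(path).st_mtime
    if mtime != _mtime:
        with open(path) as f:
            _data = json.load(f)
        _mtime = mtime
    return _data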
This is by design. From the docs:
When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains.
Generally, reload is meant for convenience within interactive interpreter sessions; it is not really meant to be used in real scripts. You should probably re-think your design. (e.g. is there a reason that module can't be a regular text file?)

import inside a function: is memory reclaimed upon function exit?

Linked questions:
python - import at top of file vs inside a function
Should Python import statements always be at the top of a module?
If an import statement is inside a function, will the memory occupied by it get reclaimed once the function exits? If yes, is the timing of the reclamation deterministic (or even -ish)?
def func():
    import os
    ...
    # function about to exit; will memory occupied by `os` be freed?
If anyone has knowledge on the behavior of micropython on this topic, bonus points.
The first import executes the code in the module. It creates the module object's attributes. Each subsequent import just references the module object created by the first import.
Module objects in Python are effectively singletons. For this to work, the Python implementation has to keep the one and only module instance around after the first import, regardless of the name the module was bound to (if it was bound to a name at all, since there are also imports of the form from some_module import some_name, which never bind the module itself to a name).
So no, the memory isn't reclaimed.
No idea about Micropython, but I would be surprised if it changes semantics here that drastically. You can simply test this yourself:
some_module.py:
value = 0
some_other_module.py:
def f():
    import some_module
    some_module.value += 1
    print(some_module.value)

f()
f()
This should print the numbers 1 and 2.
To second what @BlackJack wrote: per Python semantics, an import statement adds a module reference to sys.modules, and that alone keeps the module object from being garbage collected.
You can try to do del sys.modules["some_module"], but there's no guarantee that all memory taken by the module would be reclaimed. (That issue popped up previously, but I don't remember the current state of it, e.g. if bytecode objects can be garbage-collected).
If yes, is the timing of the reclamation deterministic (or even -ish)?
In MicroPython, "reclamation time" is guaranteed to be non-deterministic, because it uses a pure garbage collection scheme with no reference counting. That means that any resource-consuming objects (files, sockets) should be closed explicitly.
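A small sketch of what "close explicitly" means in practice (data.txt is a made-up file name); a with block releases the file deterministically instead of waiting for a collection that may never come:

with open('data.txt') as f:
    data = f.read()
# f is closed here, regardless of when (or whether) the object is collected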
Otherwise, function-level imports are a valid and useful idiom in Python, and especially useful in MicroPython. They allow a module to be imported only if a particular code path is hit. E.g. if the user never calls some function, the module will not be imported, saving memory for the tasks the user actually needs in this particular application/invocation.
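A sketch of that idiom, in CPython terms (export_csv is a made-up function):

def export_csv(rows, path):
    import csv  # only imported (and only taking memory) if this is ever called
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)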

dynamic module creation

I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. E.g.
context = {'a': 1, 'b': 2}
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the timeit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
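For readers on newer Pythons: since 3.5, Timer also accepts a globals= mapping, which sidesteps the fake-module trick entirely (this wasn't available when this answer was written):

import timeit

context = {'a': 1, 'b': 2}
print(timeit.Timer('a + b', globals=context).timeit())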
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
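A sketch of that do-it-yourself importer in modern terms (the source string is just for illustration): exec the code into the new module's own dict, so its functions see their own module's globals, which also avoids the namespace surprise described in the question's edits:

import sys
import types

source = "a = 1\nb = 2\ndef add():\n    return a + b\n"
mod = types.ModuleType('TestContext', 'Dynamically created test context')
exec(source, mod.__dict__)   # the module code runs in its own namespace
sys.modules['TestContext'] = mod

import TestContext
print(TestContext.add())     # 3 -- add() resolves a and b in its own module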
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.

module level garbage collection in python

Let's say I have a module mod_x like the following:
class X:
    pass

x = X()
Now, let's say I have another module that just performs import mod_x, and goes about its business. The module variable x will not be referenced further during the lifecycle of the interpreter.
Will the class instance x get garbage collected at any point except at the termination of the interpreter?
No, the variable will never get garbage-collected (until the end of the process), because the module object will stay in sys.modules['mod_x'] and it will have a reference to mod_x.x -- the reference count will never drop to 0 (until all modules are removed at the end of the program) and it's not an issue of "cyclical garbage" -- it's a perfectly valid live reference, and proving that nobody ever does (e.g.) a getattr(sys.modules[a], b) where string variables a and b happen to be worth 'mod_x' and 'x' respectively is at least as hard as solving the halting problem;-). ("At least" since more code may be about to be dynamically loaded at any time...!-).
Only if something else does a del mod_x.x or a rebind at some point, or if the module itself becomes fully deleted.
Once the module is imported it will be in the sys.modules dict so unless it is removed from there (which is possible though not standard practice) it will not be garbage collected.
So if you have a reason for wanting a module that has been loaded to be garbage collected you have to mess with sys.modules.
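A minimal sketch of that, assuming nothing else holds a reference to mod_x or its contents:

import sys
import mod_x

del sys.modules['mod_x']  # drop the interpreter's reference
del mod_x                 # drop our own; the module (and mod_x.x) can now be reclaimed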

How to re import an updated package while in Python Interpreter? [duplicate]

This question already has answers here:
How do I unload (reload) a Python module?
(22 answers)
Closed 5 years ago.
I often test my module in the Python Interpreter, and when I see an error, I quickly update the .py file. But how do I make it reflect on the Interpreter ? So, far I have been exiting and reentering the Interpreter because re importing the file again is not working for me.
Update for Python3: (quoted from the linked answer, since the last edit/comment here suggested a deprecated method)
In Python 3, reload was moved to the imp module. In 3.4, imp was deprecated in favor of importlib, and reload was added to the latter. When targeting 3 or later, either reference the appropriate module when calling reload or import it.
Takeaway:
Python3 >= 3.4: importlib.reload(packagename)
Python3 < 3.4: imp.reload(packagename)
Python2: continue below
Use the reload builtin function:
https://docs.python.org/2/library/functions.html#reload
When reload(module) is executed:
Python modules’ code is recompiled and the module-level code reexecuted, defining a new set of objects which are bound to names in the module’s dictionary. The init function of extension modules is not called a second time.
As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero.
The names in the module namespace are updated to point to any new or changed objects.
Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.
Example:
# Make a simple function that prints "version 1"
shell1$ echo 'def x(): print "version 1"' > mymodule.py
# Run the module
shell2$ python
>>> import mymodule
>>> mymodule.x()
version 1
# Change mymodule to print "version 2" (without exiting the python REPL)
shell2$ echo 'def x(): print "version 2"' > mymodule.py
# Back in that same python session
>>> reload(mymodule)
<module 'mymodule' from 'mymodule.pyc'>
>>> mymodule.x()
version 2
All the answers above about reload() or imp.reload() are deprecated.
reload() is no longer a builtin function in python 3 and imp.reload() is marked deprecated (see help(imp)).
It's better to use importlib.reload() instead.
So, far I have been exiting and reentering the Interpreter because re importing the file again is not working for me.
Yes, just saying import again gives you the existing copy of the module from sys.modules.
You can say reload(module) to update sys.modules and get a new copy of that single module, but if any other modules have a reference to the original module or any object from the original module, they will keep their old references and Very Confusing Things will happen.
So if you've got a module a, which depends on module b, and b changes, you have to ‘reload b’ followed by ‘reload a’. If you've got two modules which depend on each other, which is extremely common when those modules are part of the same package, you can't reload them both: if you reload p.a it'll get a reference to the old p.b, and vice versa. The only way to do it is to unload them both at once by deleting their items from sys.modules, before importing them again. This is icky and has some practical pitfalls to do with module entries being None as a failed-relative-import marker.
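A sketch of that unload-both-at-once approach, for a hypothetical package p with submodules p.a and p.b:

import sys

# Collect the names first so we don't mutate sys.modules while iterating it
for name in [n for n in sys.modules if n == 'p' or n.startswith('p.')]:
    del sys.modules[name]  # purge the package and all its submodules together
import p                   # p.a and p.b are now rebuilt against each other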
And if you've got a module which passes references to its objects to system modules — for example it registers a codec, or adds a warnings handler — you're stuck; you can't reload the system module without confusing the rest of the Python environment.
In summary: for all but the simplest case of one self-contained module being loaded by one standalone script, reload() is very tricky to get right; if, as you imply, you are using a ‘package’, you will probably be better off continuing to cycle the interpreter.
In Python 3, the behaviour changes.
>>> import my_stuff
... do something with my_stuff, then later:
>>> import imp
>>> imp.reload(my_stuff)
and you get a brand new, reloaded my_stuff.
No matter how many times you import a module, you'll get the same copy of the module from sys.modules - which was loaded at first import mymodule
I am answering this late, as each of the previous answers has a bit of the answer, so I am attempting to sum it all up in a single answer.
Using built-in function:
For Python 2.x - Use the built-in reload(mymodule) function.
For Python 3.x - Use imp.reload(mymodule).
For Python 3.4+ - imp has been deprecated in favor of importlib, so use importlib.reload(mymodule).
A few caveats:
It is generally not very useful to reload built-in or dynamically loaded modules. Reloading sys, __main__, builtins and other key modules is not recommended.
In many cases extension modules are not designed to be initialized more than once, and may fail in arbitrary ways when reloaded. If a module imports objects from another module using from ... import ..., calling reload() for the other module does not redefine the objects imported from it — one way around this is to re-execute the from statement (see the sketch after these caveats), another is to use import and qualified names (module.name) instead.
If a module instantiates instances of a class, reloading the module that defines the class does not affect the method definitions of the instances — they continue to use the old class definition. The same is true for derived classes.
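To illustrate the from ... import ... caveat, a small sketch (mymodule is assumed to define a function x):

from importlib import reload
import mymodule
from mymodule import x

reload(mymodule)        # rebinds mymodule.x to the new function object ...
x()                     # ... but this local name still calls the old one
from mymodule import x  # re-execute the from-import to pick up the new x
x()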
External packages:
reimport - Reimport currently supports Python 2.4 through 2.7.
xreload - This works by executing the module in a scratch namespace, and then patching classes, methods and functions in place. This avoids the need to patch instances. New objects are copied into the target namespace.
livecoding - Code reloading allows a running application to change its behaviour in response to changes in the Python scripts it uses. When the library detects a Python script has been modified, it reloads that script and replaces the objects it had previously made available for use with newly reloaded versions. As a tool, it allows a programmer to avoid interruption to their workflow and a corresponding loss of focus. It enables them to remain in a state of flow. Where previously they might have needed to restart the application in order to put changed code into effect, those changes can be applied immediately.
Short answer:
try using reimport: a full featured reload for Python.
Longer answer:
It looks like this question was asked/answered prior to the release of reimport, which bills itself as a "full featured reload for Python":
This module intends to be a full featured replacement for Python's reload function. It is targeted towards making a reload that works for Python plugins and extensions used by longer running applications.
Reimport currently supports Python 2.4 through 2.6.
By its very nature, this is not a completely solvable problem. The goal of this module is to make the most common sorts of updates work well. It also allows individual modules and package to assist in the process. A more detailed description of what happens is on the overview page.
Note: Although the reimport explicitly supports Python 2.4 through 2.6, I've been trying it on 2.7 and it seems to work just fine.
Basically reload as in allyourcode's answer. But it won't change the underlying code of already-instantiated objects or referenced functions. Extending from his answer:
# Make a simple function that prints "version 1"
shell1$ echo 'def x(): print "version 1"' > mymodule.py
# Run the module
shell2$ python
>>> import mymodule
>>> mymodule.x()
version 1
>>> x = mymodule.x
>>> x()
version 1
>>> x is mymodule.x
True
# Change mymodule to print "version 2" (without exiting the python REPL)
shell2$ echo 'def x(): print "version 2"' > mymodule.py
# Back in that same python session
>>> reload(mymodule)
<module 'mymodule' from 'mymodule.pyc'>
>>> mymodule.x()
version 2
>>> x()
version 1
>>> x is mymodule.x
False
Not sure if this does everything you'd expect, but you can do it just like that:
>>> del mymodule
>>> import mymodule
import sys
del sys.modules['module_name']
See here for a good explanation of how your dependent modules won't be reloaded and the effects that can have:
http://pyunit.sourceforge.net/notes/reloading.html
The way pyunit solved it was to track dependent modules by overriding __import__ then to delete each of them from sys.modules and re-import. They probably could've just reload'ed them, though.
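A rough sketch of that dependency-tracking idea (mine, not pyunit's actual code):

import builtins
import sys

tracked = set()
_real_import = builtins.__import__

def tracking_import(name, *args, **kwargs):
    tracked.add(name)  # remember every module imported while the hook is active
    return _real_import(name, *args, **kwargs)

builtins.__import__ = tracking_import
# ... import and exercise the code under test here ...
builtins.__import__ = _real_import  # always restore the real hook

for name in tracked:
    sys.modules.pop(name, None)  # purge so the next import starts fresh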
dragonfly's answer worked for me (python 3.4.3).
import sys
del sys.modules['module_name']
Here is a lower-level solution:
exec(open("MyClass.py").read(), globals())
