Let's say I have a module mod_x like the following:
class X:
    pass

x = X()
Now, let's say I have another module that just performs import mod_x, and goes about its business. The module variable x will not be referenced further during the lifecycle of the interpreter.
Will the class instance x get garbage collected at any point except at the termination of the interpreter?
No, the instance will never get garbage-collected (until the end of the process), because the module object stays alive in sys.modules['mod_x'] and that module object holds a reference to x -- the reference count will never drop to 0 (until all modules are removed at the end of the program). And it's not an issue of "cyclical garbage" -- it's a perfectly valid live reference, and proving that nobody ever does (e.g.) a getattr(sys.modules[a], b), where string variables a and b happen to hold 'mod_x' and 'x' respectively, is at least as hard as solving the halting problem;-). ("At least" since more code may be dynamically loaded at any time...!-).
Only if something else does a del mod_x.x or a rebind at some point, or if the module itself becomes fully deleted.
Once the module is imported it will be in the sys.modules dict, so unless it is removed from there (which is possible, though not standard practice) it will not be garbage collected.
So if you have a reason for wanting a module that has been loaded to be garbage collected you have to mess with sys.modules.
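For example, a minimal sketch of forcing that collection (using the mod_x module from the question; this is CPython behavior):

import sys
import gc

import mod_x               # mod_x.x is kept alive via sys.modules['mod_x']

del sys.modules['mod_x']   # drop the interpreter's cached reference
del mod_x                  # drop our own name for the module

gc.collect()               # mop up any cycles; in CPython plain refcounting
                           # usually frees the module before this call
# Nothing reachable refers to the X instance any more, so it can be reclaimed.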
Related
I am trying to access variables from a module that is modified while the main script runs, using reload. But the reload function fails to erase variables that have been removed from the module. How can I force Python to erase them?
Here is my code
My module.py:
a = 1
b = 2
My main script.py:
import time
from importlib import reload
import module
while True:
    reload(module)
    print('------- module reloaded')
    try:
        print('a: ', module.a)
    except AttributeError:
        print('a: ', 'undefined')
    try:
        print('b: ', module.b)
    except AttributeError:
        print('b: ', 'undefined')
    try:
        print('c: ', module.c)
    except AttributeError:
        print('c: ', 'undefined')
    time.sleep(5)
As expected, if I run my script with Python (3.5.1) I get the output:
------- module reloaded
a: 1
b: 2
c: undefined
But I get an unexpected behavior when I change module.py as follows:
# a = 1
b = 3
c = 4
I have the following output:
------- module reloaded
a: 1
b: 3
c: 4
This means that reload correctly updated the value of b and added the new variable c. But it failed to erase the variable a that was removed from the module. It seems to only update the variables found in the new version of the module. How can I force the reload function to erase removed values?
Thank you for your help.
The issue here is that reload is implemented with code that, roughly, execs the current version of the module code in the existing cached module's namespace. The reason for doing this is that reload is intended to have a global effect: reload(module) doesn't just reload it for you, it changes every other module's personally imported copy of module. If it created a new module namespace, other modules would still have the old module cached. And while erasing the contents of the old module before exec-ing might help, it would risk breaking already-imported submodules of a package, trigger race conditions (a thread might see module.a before and after the reload, but it would mysteriously disappear for a moment during the reload), etc.
As the docs note:
When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains.
There are workarounds that bypass this safety mechanism if you absolutely must do it. The simplest is to simply remove the module from the module cache and reimport it, rather than reloading it:
import sys  # At top of file

del sys.modules['module']  # Drop the cached module object
import module              # Fresh import: builds a brand-new namespace
That won't update any other importers of the module (they'll keep the stale cache), but if the module is only used in your module, that'll work.
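If you need this in more than one place, the idiom can be wrapped in a small helper (a sketch; hard_reload is a made-up name, not part of importlib):

import importlib
import sys

def hard_reload(name):
    # Drop the cached module (if present) and import it from scratch.
    # Unlike importlib.reload, names deleted from the source do not survive;
    # other modules that already imported it keep the stale object, though.
    sys.modules.pop(name, None)
    return importlib.import_module(name)

module = hard_reload('module')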
Another approach that might work (untested, and it's kind of insane) would be to explicitly delete all the public names from the module before reloading with something like:
# Intentionally done as a list comprehension so no modification to the module's
# globals dict occurs while we're iterating it.
# Might make sense to use dir or to iterate module.__all__ if available instead
# of using vars; depends on design.
for name in [n for n in vars(module) if not n.startswith('_')]:
    try:
        delattr(module, name)
    except Exception:
        pass  # Undeletable attribute for whatever reason; ignore it and let reload deal with it

reload(module)  # Now the exec occurs in a clean namespace
That would avoid the stale cache issue, in exchange for insanity.
Really, the answer is "don't use a design that depends on production reloading"; if the module is just data, store it as a JSON file or the like and just reparse it (which is generally much cheaper than what the Python import machinery goes through to import a module).
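As a sketch of that suggestion (the file name config.json and its keys are illustrative, mirroring the question's a/b/c):

import json
import time

while True:
    # Re-read the data file instead of reloading a module. Keys removed
    # from the file really are gone from the freshly parsed dict.
    with open('config.json') as f:
        data = json.load(f)
    print('a: ', data.get('a', 'undefined'))
    print('b: ', data.get('b', 'undefined'))
    print('c: ', data.get('c', 'undefined'))
    time.sleep(5)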
This is by design. From the docs:
When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains.
Generally, reload is meant for convenience within interactive interpreter sessions; it is not really meant to be used in real scripts. You should probably re-think your design. (e.g. is there a reason that module can't be a regular text file?)
If I load a module in Python, will it ever be garbage collected? Another way of framing this question is: where does Python keep references to Python modules? I assume that if there are no longer any references, the garbage collector will remove the module.
Here's an example I tried in the Python interpreter:
>>> from importlib import import_module
>>> import sys
>>> import gc
>>> x = import_module('math')
>>> 'math' in sys.modules
This outputs:
True
So let's delete the reference to the module in the script.
>>> del x
>>> gc.collect()
>>> 'math' in sys.modules
Python still keeps track of the math module, as the output is still:
True
But now if I delete math from sys.modules, I am no longer aware of any further references:
>>> del sys.modules['math']
>>> gc.collect()
However, the output of gc.collect() is:
0
Nothing was garbage collected, even though the module is no longer referenced from sys.modules or from my script. Why was it not garbage collected?
In general, at least in 3.4 and later, module objects shouldn’t be anything special in this regard. Of course normally there’s a reference to every loaded module in sys.modules, but if you’ve explicitly deleted that, a module should be able to go away.
That being said, there have definitely been problems in the past that prevent that from happening in some cases, and I wouldn’t promise that there aren’t any such problems left as of 3.7.
Unfortunately, your test is not actually testing anything. Presumably you're using CPython. In CPython, the primary garbage collector uses reference counting: it stores a count directly on each object, incrementing and decrementing the count every time a name is bound to or unbound from the object, and immediately deleting the object if the count drops to 0. The thing in the gc module is a cycle collector, which is needed to handle special cases where two (or more) objects refer to each other but nobody else refers to them. If the module isn't part of such a cycle, it'll already have been deleted before you call gc.collect(), so of course that call returns 0. But that 0 tells you nothing.
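A quick way to see the difference between the two mechanisms (a sketch; Node is just a throwaway class):

import gc

class Node:
    pass

n = Node()
del n                     # refcount drops to 0: freed immediately, gc not involved

a, b = Node(), Node()
a.other, b.other = b, a   # create a reference cycle
del a, b                  # refcounts never reach 0 (they point at each other)
print(gc.collect())       # non-zero: only the cycle collector could reclaim these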
There are other problems with your test.
First, you should not test garbage collection in the interactive interpreter. All kinds of extra stuff gets kept around there, in ways that are complicated to explain. It’s much better to write a test script.
Second, you shouldn’t be using math as your test. It’s an extension module (that is, written in C rather than Python), and even after the major changes in 3.5, they still don’t work the same. It’s also a core module that may be part of startup or otherwise needed by other parts of the interpreter, even if you aren’t referencing it from your code. So, far better to use something else.
Anyway, I think there may be a way to test this directly, without using the debugger, but no promises on whether it’ll work.
First, you need to create a subclass of types.ModuleType, which has a __del__ method that prints out some message. Then, you just need to import a module (a .py one, not an extension module) and set its __class__ to that subclass. Which may be as simple as __class__ = MyModuleSubclass in the .py file. Now, when it gets collected, its destructor will run, and you’ll have proof that it was collected. (Well, proof that it was collected unless the destructor revived it, but if your destructor doesn’t do anything but print a static string, that hopefully isn’t a worry.)
Based on the answer from abarnert, I created the following run-it-yourself example that demonstrates the behaviour I was trying to understand:
from types import ModuleType
from importlib import import_module
import sys

class MyModule(ModuleType):
    def __del__(self):
        print('I am being deleted')

if __name__ == '__main__':
    x = import_module('urllib3')
    x.__class__ = MyModule
    del x
    del sys.modules['urllib3']  # Comment this out and urllib3 will NOT be garbage collected before the script finishes
    print('finishing')
Output when run as is:
I am being deleted
finishing
Output with the del sys.modules['urllib3'] line commented out:
finishing
I am being deleted
It is clear that modules are garbage collected, as one would expect, when all references to them have been deleted; unless the module in question is somehow special, this happens once the references in the application and in sys.modules have both been deleted.
Linked questions:
python - import at top of file vs inside a function
Should Python import statements always be at the top of a module?
If an import statement is inside a function, will the memory occupied by it get reclaimed once the function exits? If yes, is the timing of the reclamation deterministic (or even -ish)?
def func():
    import os
    ...
    # function about to exit; will memory occupied by `os` be freed?
If anyone has knowledge on the behavior of micropython on this topic, bonus points.
The first import executes the code in the module. It creates the module object's attributes. Each subsequent import just references the module object created by the first import.
Module objects in Python are effectively singletons. For this to work, the Python implementation has to keep the one and only module instance around after the first import, regardless of the name the module was bound to -- indeed, it may not be bound to a name at all, since there are also imports of the form from some_module import some_name.
So no, the memory isn't reclaimed.
No idea about Micropython, but I would be surprised if it changes semantics here that drastically. You can simply test this yourself:
some_module.py:
value = 0
some_other_module.py:
def f():
    import some_module
    some_module.value += 1
    print(some_module.value)

f()
f()
This should print the numbers 1 and 2.
To second what @BlackJack wrote: per Python semantics, an import statement adds a module reference to sys.modules, and that alone keeps the module object from being garbage collected.
You can try to do del sys.modules["some_module"], but there's no guarantee that all memory taken by the module will be reclaimed. (That issue popped up previously, but I don't remember its current state, e.g. whether bytecode objects can be garbage-collected.)
If yes, is the timing of the reclamation deterministic (or even -ish)?
In MicroPython, "reclamation time" is guaranteed to be non-deterministic, because it uses a pure garbage-collection scheme with no reference counting. That means that any resource-consuming objects (files, sockets) should be closed explicitly.
Otherwise, function-level imports are a valid and useful idiom in Python, and especially useful in MicroPython. They allow a module to be imported only if a particular code path is hit. E.g. if the user never calls some function, the module will not be imported, saving memory for the tasks the user actually needs in this particular application/invocation.
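A sketch of the idiom (the function name and the imported module are just examples):

def render_report(data):
    # json is imported (and its memory spent) only if this function
    # is actually called during this run of the program.
    import json
    return json.dumps(data, indent=2)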
I see that there are other questions related to cross module variables, but they don't really fully answer my question.
I have an application that I have split into 3 modules + 1 main application, mainly for ease of readability and maintainability.
2 of these modules have threads with variables that need to be modified from other modules and other module threads.
Whilst I can modify a module's variable from the main code, I don't appear to be able to modify one module's variable from another module unless I import every module into every other module.
The example below, where a & b are imported into main and module a needs to access a variable in module b:

main
    module a
        var a
    module b
        var a

main:
    a.a = 1
    b.a = 2
module a:
    b.a = 3
module b:
    a.a = 0
Without importing module a into module b and importing module b into module a, can this be achieved globally through the main program?
If I do have to import a and b into main, and then import a into b and b into a, what are the implications in terms of memory and resource usage / speed etc.?
I tried the suggestion from @abarnert:
#moda
vara = 10
#modb
print(str(vara))
#main
import moda
from moda import vara
import modb
however I get "name error vara is not defined"
If the code in the modules are defined as classes, and the main program creates instances of these classes, the main program can pass an instance of one module class to another, and changes to that instance will be reflected everywhere. There would be no need to import a or b into each other, because they would simply have references to each other.
If I do have to import a and b into main, and then import a into b and b into a, what are the implications in terms of memory and resource usage / speed etc.?
Absolutely none for memory—every module that imports a will get a reference to the exact same a module object. All you're doing is increasing its refcount, not creating new objects.
For speed, the time to discover that you're trying to import a module that already exists is almost nothing (it's just looking up the module name in a dictionary). It is slightly slower to access a.a than to just access a. But this is very rarely an issue. If it is, you're almost certainly going to want to copy that value into the locals of whatever function is accessing it over and over, at which point it won't matter which globals it came from.
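That is, in a hot loop you would hoist the attribute lookup into a local once (a sketch; module a is the one from the question):

import a

def crunch(n):
    value = a.a            # one attribute lookup, copied into a local
    total = 0
    for _ in range(n):
        total += value     # local-variable access is cheaper than a.a each time
    return total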
Without importing module a into module b and importing module b into module a, can this be achieved globally through the main program?
Sure. All you have to do is import (with from a import a, or from a import a as aa, or whatever) or copy the variables from module a into main.
Note that just makes a new name for each value; it doesn't make references to the variables. There is no such thing as a reference to a variable in Python.
This works if the variables are holding constants, or if they're holding mutable values that you modify. It just doesn't do anything useful if the variables are names that you want to rebind to new values. (If you do need to do that, just wrap the values in something mutable—e.g., turn each variable into a 1-item list, so you can rebind a[0] instead of a, which means anyone else who has a reference to a can see your new a[0] value.)
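A sketch of that 1-item-list trick (the module name shared is illustrative):

# shared.py
a = [0]                   # wrap the value in a mutable container

# any other module
from shared import a      # every importer gets a reference to the same list
a[0] = 42                 # "rebind" by mutating; all importers see 42 in a[0]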
If you insist on a "true global", even that isn't impossible. See builtins for details. But you almost certainly don't want this.
If you want to be able to modify a module-level variable from a different module then yes, you will need to import the other module. I would question why you need to do this. Perhaps you should be breaking your code into classes instead of separate modules.
For example you could choose to encapsulate the variables that need to be modified by both modules inside a separate class and pass a single instance of that class to all classes (or modules but you should really use classes) that need it.
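A sketch of that design (SharedState, Worker, and Reader are illustrative names):

class SharedState:
    def __init__(self):
        self.a = 0        # the variable both sides need to modify

class Worker:
    def __init__(self, state):
        self.state = state
    def bump(self):
        self.state.a += 1

class Reader:
    def __init__(self, state):
        self.state = state
    def read(self):
        return self.state.a

state = SharedState()           # created once, in the main program
w, r = Worker(state), Reader(state)
w.bump()
print(r.read())                 # 1: both hold the same instance, no cross-imports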
See Circular (or cyclic) imports in Python for more information about cyclical imports.
I know this does not sound Pythonic, but bear with me for a second.
I am writing a module that depends on some external closed-source module. That module needs to get instantiated to be used (using module.create()).
My module attempts to figure out if my user already loaded that module (easy to do), but then needs to figure out if the module was instantiated. I understand that checking the type() of each variable can tell me this, but I am not sure how I can get the names of variables defined by the main program. The reason for this is that when one instantiates the module, they also set a bunch of parameters that I do not want to overwrite for any reason.
My attempts so far involved using sys._getframe().f_globals and iterating through the elements, but in my testing it doesn't work. If I instantiate the module as modInst and then call the function in my module, it fails to show the modInst variable. Is there another solution to this? Sample code provided below.
import sys

if moduleName not in sys.modules:
    import moduleName
    modInst = moduleName.create()
else:
    globalVars = sys._getframe().f_globals
    for key, value in globalVars.items():  # .items() needed to unpack pairs
        if value == "Module Name Instance":
            return key
    return moduleName.create()
EDIT: Sample code included.
Looks like your code assumes that the .create() function was called, if at all, by the immediate/direct caller of your function (which you show only partially, making it pretty hard to be sure about what's going on) and the results placed in a global variable (of the module where the caller of your function resides). It all seems pretty fragile. Doesn't that third-party module have some global variables of its own that are affected by whether the module's create has been called or not? I imagine it would -- where else is it keeping the state-changes resulting from executing the create -- and I would explore that.
To address a specific issue you raise,
I am not sure how I can get the names of variables defined by the main program
that's easy -- the main program is found, as a module, in sys.modules['__main__'], so just use vars(sys.modules['__main__']) to get the global dictionary of the main program (the variable names are the keys in that dictionary, along of course with names of functions, classes, etc -- the module, like any other module, has exactly one top-level/global namespace, not one for variables, a separate one for functions, etc).
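For example (a sketch; run it as a script rather than interactively):

import sys

some_variable = 123

main_ns = vars(sys.modules['__main__'])
print('some_variable' in main_ns)   # True: top-level names of the main script live here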
Suppose the external closed-sourced module is called extmod.
Create my_extmod.py:
import extmod

INSTANTIATED = False

def create(*args, **kw):
    global INSTANTIATED
    INSTANTIATED = True
    return extmod.create(*args, **kw)
Then require your users to import my_extmod instead of extmod directly.
To test whether the create function has been called, just check the value of my_extmod.INSTANTIATED.
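Usage would then look like this (a sketch, assuming extmod really exposes create):

import my_extmod

print(my_extmod.INSTANTIATED)   # False: nothing created yet
inst = my_extmod.create()
print(my_extmod.INSTANTIATED)   # True: the wrapper recorded the call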
Edit: If you open up an IPython session and type import extmod, then type
extmod.[TAB], then you'll see all the top-level variables in the extmod namespace. This might help you find some parameter that changes when extmod.create is called.
Barring that, and barring the possibility of training users to import my_extmod, perhaps you could use something like the function below: find_instance searches through all modules in sys.modules.
import sys

def find_instance(cls):
    # Scan every loaded module's namespace for an instance of cls
    for modname in sys.modules:
        module = sys.modules[modname]
        for value in vars(module).values():
            if isinstance(value, cls):
                return value

x = find_instance(extmod.ExtmodClass) or extmod.create()