Linked questions:
python - import at top of file vs inside a function
Should Python import statements always be at the top of a module?
If an import statement is inside a function, will the memory occupied by it get reclaimed once the function exits? If yes, is the timing of the reclamation deterministic (or even -ish)?
def func():
import os
...
# function about to exit; will memory occupied by `os` be freed?
If anyone has knowledge on the behavior of micropython on this topic, bonus points.
The first import executes the code in the module. It creates the module object's attributes. Each subsequent import just references the module object created by the first import.
Module objects in Python are effectively singletons. For this to work, the Python implementation has to keep the one and only module instance around after the first import, regardless of the name the module was bound to. If it was bound to a name anyway, as there are also imports of the form from some_module import some_name.
So no, the memory isn't reclaimed.
No idea about Micropython, but I would be surprised if it changes semantics here that drastically. You can simply test this yourself:
some_module.py:
value = 0
some_other_module.py:
def f():
import some_module
some_module.value += 1
print(some_module.value)
f()
f()
This should print the numbers 1 and 2.
To second what #BlackJack wrote, per Python semantics, an "import" statement adds module reference to sys.modules, that alone does keep the module object from being garbage collected.
You can try to do del sys.modules["some_module"], but there's no guarantee that all memory taken by the module would be reclaimed. (That issue popped up previously, but I don't remember the current state of it, e.g. if bytecode objects can be garbage-collected).
If yes, is the timing of the reclamation deterministic (or even -ish)?
In MicroPython, "reclamation time" is guaranteedly non-deterministic, because it uses purely garbage collection scheme, no reference counting. That means that any resource-consuming objects (files, sockets) should be closed explicitly.
Otherwise, function-level imports are valid and useful idiom in Python, and especially useful in MicroPython. It allows to import some module only if a particular code path is hit. E.g. if user never calls some function, a module will not be imported, saving more memory for tasks user needs more in this particular application/invocation.
Related
If I load a module in Python, will it ever be garbage collected? Another way of framing this question is, where does Python keep references to Python modules? As I assume if there are no longer any references, the garbage collector will remove a module.
Here's an example I tried in the in Python interpreter:
>>> from importlib import import_module
>>> import sys
>>> import gc
>>> x = import_module('math')
>>> 'math' in sys.modules
This outputs:
True
So let's delete the reference to the module in the script.
>>> del x
>>> gc.collect()
>>> 'math' in sys.modules
Python still keeps track of the math module, as the output is still:
True
But now if I delete math from sys.modules, I no longer am aware of any further references:
>>> del sys.modules['math']
>>> gc.collect()
Howver, the output of gc.collect() is:
0
Nothing was garbage collected, so the module is no longer in sys.modules or my script. Why was it not garbage collected?
In general, at least in 3.4 and later, module objects shouldn’t be anything special in this regard. Of course normally there’s a reference to every loaded module in sys.modules, but if you’ve explicitly deleted that, a module should be able to go away.
That being said, there have definitely been problems in the past that prevent that from happening in some cases, and I wouldn’t promise that there aren’t any such problems left as of 3.7.
Unfortunately, your test is not actually testing anything. Presumably you’re using CPython. In CPython, the garbage collector uses reference counting—it stores a count directly on each object, incrementing and decrementing count every time a new name is bound to it, and immediately deleting it if the count goes to 0. The thing in the gc module is a cycle collector, which is needed to handle some special cases where two (or more) objects refer to each other but nobody else refers to them. If the module isn’t part of such a cycle, it’ll be deleted before you call gc.collect(), so of course that will return 0. But that 0 tells you nothing.
There are other problems with your test.
First, you should not test garbage in the interactive interpreter. All kinds of extra stuff gets kept around there, in ways that are complicated to explain. It’s much better to write a test script.
Second, you shouldn’t be using math as your test. It’s an extension module (that is, written in C rather than Python), and even after the major changes in 3.5, they still don’t work the same. It’s also a core module that may be part of startup or otherwise needed by other parts of the interpreter, even if you aren’t referencing it from your code. So, far better to use something else.
Anyway, I think there may be a way to test this directly, without using the debugger, but no promises on whether it’ll work.
First, you need to create a subclass of types.ModuleType, which has a __del__ method that prints out some message. Then, you just need to import a module (a .py one, not an extension module) and set its __class__ to that subclass. Which may be as simple as __class__ = MyModuleSubclass in the .py file. Now, when it gets collected, its destructor will run, and you’ll have proof that it was collected. (Well, proof that it was collected unless the destructor revived it, but if your destructor doesn’t do anything but print a static string, that hopefully isn’t a worry.)
Based on the answer from abarnert, I created the following run-it-yourself example that demonstrates the behaviour I was trying to understand:
from types import ModuleType
from importlib import import_module
import sys
class MyModule(ModuleType):
def __del__(self):
print('I am being deleted')
if __name__ == '__main__':
x = import_module('urllib3')
x.__class__ = MyModule
del x
del sys.modules['urllib3'] # Comment this out and urllib3 will NOT be garbage collected before the script finishes
print('finishing')
Output when run as is:
I am being deleted
finishing
Output with the del sys.modules['urllib3'] line commented out:
finishing
I am being deleted
It is clear that modules are garbage collected as one would expect when all references to them have been deleted, and that unless the module in question is somewhat particular, this occurs when references in the application and in sys.modules have been deleted.
I am trying to better understand Pythons modules, coming from C background mostly.
I have main.py with the following:
def g():
print obj # Need access to the object below
if __name__ == "__main__":
obj = {}
import child
child.f()
And child.py:
def f():
import main
main.g()
This particular structure of code may seem strange at first, but rest assured this is stripped from a larger project I am working on, where delegation of responsibility and decoupling forces the kind of inter-module function call sequence you see.
I need to be able to access the actual object I create when first executing main python main.py. Is this possible without explicitly sending obj as parameter around? Because I will have other variables and I don't want to send these too. If desperate, I can create a "state" object for the entire main module that I need access to, and send it around, but even that is to me a last resort. This is global variables at its simplest in C, but in Python this is a different beast I suppose (module global variables only?)
One of the solutions, excluding parameter passing at least, has turned to revolve around the fact that when executing the main Python module main as such - via f.e. python main.py where if clause suceeds and subsequently, obj is bound - the main module and its state exist and are referenced as __main__ (inspected using sys.modules dictionary). So when the child module needs the actual instance of the main module, it is not main it needs to import but __main__, otherwise two distinct copies would exist, with their own distinct states.
'Fixed' child.py:
def f():
import __main__
__main__.g()
I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. EG
context = { a: 1, b: 2 }
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the testit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.
I know this does not sound Pythonic, but bear with me for a second.
I am writing a module that depends on some external closed-source module. That module needs to get instantiated to be used (using module.create()).
My module attempts to figure out if my user already loaded that module (easy to do), but then needs to figure out if the module was instantiated. I understand that checking out the type() of each variable can tell me this, but I am not sure how I can get the names of variables defined by the main program. The reason for this is that when one instantiates the model, they also set a bunch of parameters that I do not want to overwrite for any reason.
My attempts so far involved using sys._getframe().f_globals and iterating through the elements, but in my testing it doesn't work. If I instantiate the module as modInst and then call the function in my module, it fails to show the modInst variable. Is there another solution to this? Sample code provided below.
import sys
if moduleName not in sys.modules:
import moduleName
modInst = moduleName.create()
else:
globalVars = sys._getframe().f_globals
for key, value in globalVars:
if value == "Module Name Instance":
return key
return moduleName.create()
EDIT: Sample code included.
Looks like your code assumes that the .create() function was called, if at all, by the immediate/direct caller of your function (which you show only partially, making it pretty hard to be sure about what's going on) and the results placed in a global variable (of the module where the caller of your function resides). It all seems pretty fragile. Doesn't that third-party module have some global variables of its own that are affected by whether the module's create has been called or not? I imagine it would -- where else is it keeping the state-changes resulting from executing the create -- and I would explore that.
To address a specific issue you raise,
I am not sure how I can get the names
of variables defined by the main
program
that's easy -- the main program is found, as a module, in sys.modules['__main__'], so just use vars(sys.modules['__main__']) to get the global dictionary of the main program (the variable names are the keys in that dictionary, along of course with names of functions, classes, etc -- the module, like any other module, has exactly one top-level/global namespace, not one for variables, a separate one for functions, etc).
Suppose the external closed-sourced module is called extmod.
Create my_extmod.py:
import extmod
INSTANTIATED=False
def create(*args,**kw):
global INSTANTIATED
INSTANTIATED=True
return extmod.create(*args,**kw)
Then require your users to import my_extmod instead of extmod directly.
To test if the create function has been called, just check the value of extmod.INSTANTIATED.
Edit: If you open up an IPython session and type import extmod, then type
extmod.[TAB], then you'll see all the top-level variables in the extmod namespace. This might help you find some parameter that changes when extmod.create is called.
Barring that, and barring the possibility of training users to import my_extmod, then perhaps you could use something like the function below. find_extmod_instance searches through all modules in sys.modules.
def find_instance(cls):
for modname in sys.modules:
module=sys.modules[modname]
for value in vars(module).values():
if isinstance(value,cls):
return value
x=find_instance(extmod.ExtmodClass) or extmod.create()
Let's say I have a module mod_x like the following:
class X:
pass
x=X()
Now, let's say I have another module that just performs import mod_x, and goes about its business. The module variable x will not be referenced further during the lifecycle of the interpreter.
Will the class instance x get garbage collected at any point except at the termination of the interpreter?
No, the variable will never get garbage-collected (until the end of the process), because the module object will stay in sys.modules['mod_x'] and it will have a reference to mod_x.x -- the reference count will never drop to 0 (until all modules are removed at the end of the program) and it's not an issue of "cyclycal garbage" -- it's a perfectly valid live reference, and proving that nobody every does (e.g.) a getattr(sys.modules[a], b) where string variables a and b happen to be worth 'mod_x' and 'x' respectively is at least as hard as solving the halting problem;-). ("At least" since more code may be about to be dynamically loaded at any time...!-).
Only if something else does a del mod_x.x or a rebind at some point, or if the module itself becomes fully deleted.
Once the module is imported it will be in the sys.modules dict so unless it is removed from there (which is possible though not standard practice) it will not be garbage collected.
So if you have a reason for wanting a module that has been loaded to be garbage collected you have to mess with sys.modules.