Lazy loading python sub-modules, importlib fails first time - python

I'm experimenting with the idea of lazy-loading of symbols in a package's __init__.py by subclassing ModuleType and defining properties for each of the submodules. Accessing the symbol in the package namespace would trigger the import. I've got it working, but for some reason, my call to import_module fails on the first attempt and I don't understand why.
I have a minimal example. Assume a package like this:
my_package/
    __init__.py
    m1.py
This is the __init__.py:
import sys
import importlib
from types import ModuleType

class MyModule(ModuleType):
    @property
    def m1(self):
        try:
            _m1 = importlib.import_module('.m1', __package__)
        except AttributeError:
            print('second try ...')
            _m1 = importlib.import_module('.m1', __package__)
        return _m1

old = sys.modules[__name__]
new = MyModule(__name__)
new.__path__ = old.__path__
for k, v in list(old.__dict__.items()):
    new.__dict__[k] = v
sys.modules[__name__] = new
The import_module call always fails with an AttributeError: module 'my_package' has no attribute 'm1'. However, the second call always succeeds. In other words, when I do my_package.m1 I always get m1, but it always prints 'second try ...'.
Note, the behavior depends on the Python version. The call to import_module works fine the first time on Python 2.7.

Here is the difference between Python 2 and Python 3.
In Python 3, the importlib.import_module call ultimately ends up here, which is a call to setattr. Since you didn't define a .setter for your property, you get the AttributeError.
In Python 2, the importlib.import_module call ends up here, which is a call to the builtin __import__, which presumably operates directly on the module __dict__.
The only question is how in the world it ever works in Python 3. I would have thought it would always result in an AttributeError.
Your code works fine as long as you make a .setter:
@m1.setter
def m1(self, mod):
    self.__dict__['m1'] = mod
It actually turns out that the .setter can do anything at all, including pass, since you are unconditionally making the call to import_module.
I would consider using the .setter above and changing the getter to:
@property
def m1(self):
    if not self.__dict__.get('m1'):
        self.__dict__['m1'] = importlib.import_module('.m1', __package__)
    return self.__dict__['m1']
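For reference, a minimal consolidated sketch of the whole __init__.py with both the getter and the setter in place (same my_package/m1.py layout as in the question; this is an assembled sketch, not the poster's exact code):
# my_package/__init__.py -- sketch combining the property and its setter
import sys
import importlib
from types import ModuleType

class MyModule(ModuleType):
    @property
    def m1(self):
        if 'm1' not in self.__dict__:
            self.__dict__['m1'] = importlib.import_module('.m1', __package__)
        return self.__dict__['m1']

    @m1.setter
    def m1(self, mod):
        # import_module assigns the submodule onto the package via setattr;
        # stash it so later accesses hit the cached value above
        self.__dict__['m1'] = mod

old = sys.modules[__name__]
new = MyModule(__name__)
new.__path__ = old.__path__
new.__dict__.update(old.__dict__)
sys.modules[__name__] = new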

Related

Python global variable in import * [duplicate]

I've run into a bit of a wall importing modules in a Python script. I'll do my best to describe the error, why I run into it, and why I'm trying this particular approach to solve my problem (which I will describe in a second):
Let's suppose I have a module in which I've defined some utility functions/classes, which refer to entities defined in the namespace into which this auxiliary module will be imported (let "a" be such an entity):
module1:
def f():
    print a
And then I have the main program, where "a" is defined, into which I want to import those utilities:
import module1
a=3
module1.f()
Executing the program will trigger the following error:
Traceback (most recent call last):
  File "Z:\Python\main.py", line 10, in <module>
    module1.f()
  File "Z:\Python\module1.py", line 3, in f
    print a
NameError: global name 'a' is not defined
Similar questions have been asked in the past (two days ago, d'uh) and several solutions have been suggested, however I don't really think these fit my requirements. Here's my particular context:
I'm trying to make a Python program which connects to a MySQL database server and displays/modifies data with a GUI. For cleanliness' sake, I've defined a bunch of auxiliary/utility MySQL-related functions in a separate file. However, they all have a common variable, which I had originally defined inside the utilities module, and which is the cursor object from the MySQLdb module.
I later realised that the cursor object (which is used to communicate with the db server) should be defined in the main module, so that both the main module and anything that is imported into it can access that object.
End result would be something like this:
utilities_module.py:
def utility_1(args):
    # code which references a variable named "cur"
def utility_n(args):
    # etcetera
And my main module:
program.py:
import MySQLdb, Tkinter
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
And then, as soon as I try to call any of the utilities functions, it triggers the aforementioned "global name not defined" error.
A particular suggestion was to have a "from program import cur" statement in the utilities file, such as this:
utilities_module.py:
from program import cur
#rest of function definitions
program.py:
import Tkinter, MySQLdb
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
But that's cyclic import or something like that and, bottom line, it crashes too. So my question is:
How in hell can I make the "cur" object, defined in the main module, visible to those auxiliary functions which are imported into it?
Thanks for your time and my deepest apologies if the solution has been posted elsewhere. I just can't find the answer myself and I've got no more tricks in my book.
Globals in Python are global to a module, not across all modules. (Many people are confused by this, because in, say, C, a global is the same across all implementation files unless you explicitly make it static.)
There are different ways to solve this, depending on your actual use case.
Before even going down this path, ask yourself whether this really needs to be global. Maybe you really want a class, with f as an instance method, rather than just a free function? Then you could do something like this:
import module1
thingy1 = module1.Thingy(a=3)
thingy1.f()
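A sketch of what module1.py could look like for that (Thingy is just the illustrative name used above):
# module1.py -- sketch of the class-based variant
class Thingy(object):
    def __init__(self, a):
        self.a = a

    def f(self):
        print(self.a)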
If you really do want a global, but it's just there to be used by module1, set it in that module.
import module1
module1.a=3
module1.f()
On the other hand, if a is shared by a whole lot of modules, put it somewhere else, and have everyone import it:
import shared_stuff
import module1
shared_stuff.a = 3
module1.f()
… and, in module1.py:
import shared_stuff
def f():
    print shared_stuff.a
Don't use a from import unless the variable is intended to be a constant. from shared_stuff import a would create a new a variable initialized to whatever shared_stuff.a referred to at the time of the import, and this new a variable would not be affected by assignments to shared_stuff.a.
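To illustrate that binding behaviour (assuming shared_stuff.a starts out as 0, a made-up value):
import shared_stuff
from shared_stuff import a   # copies the *current* value of shared_stuff.a into a new name

shared_stuff.a = 3
print(shared_stuff.a)   # 3
print(a)                # still 0 -- the from-imported name is unaffected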
Or, in the rare case that you really do need it to be truly global everywhere, like a builtin, add it to the builtin module. The exact details differ between Python 2.x and 3.x. In 3.x, it works like this:
import builtins
import module1
builtins.a = 3
module1.f()
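For completeness, in 2.x the module is named __builtin__ (no 's'), so the equivalent would be:
import __builtin__
import module1

__builtin__.a = 3
module1.f()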
As a workaround, you could consider setting environment variables in the outer layer, like this.
main.py:
import os
os.environ['MYVAL'] = str(myintvariable)
mymodule.py:
import os
myval = None
if 'MYVAL' in os.environ:
    myval = os.environ['MYVAL']
As an extra precaution, handle the case when MYVAL is not defined inside the module.
This post is just an observation about Python behaviour I encountered. Maybe the advice above doesn't work for you if you did the same thing I did below.
Namely, I have a module which contains global/shared variables (as suggested above):
#sharedstuff.py
globaltimes_randomnode=[]
globalist_randomnode=[]
Then I had the main module which imports the shared stuff with:
import sharedstuff as shared
and some other modules that actually populated these arrays. These are called by the main module. When exiting these other modules I can clearly see that the arrays are populated. But when reading them back in the main module, they were empty. This was rather strange for me (well, I am new to Python). However, when I change the way I import the sharedstuff.py in the main module to:
from sharedstuff import *
it worked (the arrays were populated).
Just sayin'
A function uses the globals of the module it's defined in. Instead of setting a = 3, for example, you should be setting module1.a = 3. So, if you want cur available as a global in utilities_module, set utilities_module.cur.
A better solution: don't use globals. Pass the variables you need into the functions that need it, or create a class to bundle all the data together, and pass it when initializing the instance.
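As a minimal sketch of the "pass it in" style (function and argument names are illustrative):
# utilities_module.py -- the cursor arrives as a parameter instead of a global
def utility_1(cur, args):
    cur.execute("SELECT 1")   # placeholder query; real work goes here

# program.py
# cur = db.cursor()
# utility_1(cur, some_args)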
The easiest solution to this particular problem would have been to add another function within the module that would have stored the cursor in a variable global to the module. Then all the other functions could use it as well.
module1:
cursor = None
def setCursor(cur):
    global cursor
    cursor = cur
def method(some, args):
    global cursor
    do_stuff(cursor, some, args)
main program:
import module1
cursor = get_a_cursor()
module1.setCursor(cursor)
module1.method("some", "args")
Since globals are module-specific, you can add the following function to all imported modules, and then use it to:
add singular variables (in dictionary format) as globals for those modules, or
transfer your main module's globals to them.
addglobals = lambda x: globals().update(x)
Then all you need to pass on current globals is:
import module
module.addglobals(globals())
Since I haven't seen it in the answers above, I thought I would add my simple workaround, which is just to add a global_dict argument to the function requiring the calling module's globals, and then pass the dict into the function when calling; e.g.:
# external_module
def imported_function(global_dict=None):
    print(global_dict["a"])

# calling_module
a = 12
from external_module import imported_function
imported_function(global_dict=globals())
>>> 12
The OOP way of doing this would be to make your module a class instead of a set of unbound methods. Then you could use __init__ or a setter method to set the variables from the caller for use in the module methods.
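A rough sketch of that idea (class and method names are made up for illustration):
# utilities.py -- the "module as an object" style
class Utilities(object):
    def __init__(self, cur=None):
        self.cur = cur            # set at construction time ...

    def set_cursor(self, cur):    # ... or later, via a setter
        self.cur = cur

    def utility_1(self, args):
        self.cur.execute("SELECT 1")   # placeholder query
The caller then creates one Utilities instance, hands it the cursor, and calls the methods on that instance.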
Update
To test the theory, I created a module and put it on pypi. It all worked perfectly.
pip install superglobals
Short answer
This works fine in Python 2 or 3:
import inspect

def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals
save as superglobals.py and employ in another module thusly:
from superglobals import *
superglobals()['var'] = value
Extended Answer
You can add some extra functions to make things more attractive.
def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals

def getglobal(key, default=None):
    """
    getglobal(key[, default]) -> value

    Return the value for key if key is in the global dictionary, else default.
    """
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals.get(key, default)

def setglobal(key, value):
    _globals = superglobals()
    _globals[key] = value

def defaultglobal(key, value):
    """
    defaultglobal(key, value)

    Set the value of global variable `key` if it is not otherwise set.
    """
    _globals = superglobals()
    if key not in _globals:
        _globals[key] = value
Then use thusly:
from superglobals import *
setglobal('test', 123)
defaultglobal('test', 456)
assert(getglobal('test') == 123)
Justification
The "python purity league" answers that litter this question are perfectly correct, but in some environments (such as IDAPython) which is basically single threaded with a large globally instantiated API, it just doesn't matter as much.
It's still bad form and a bad practice to encourage, but sometimes it's just easier. Especially when the code you are writing isn't going to have a very long life.

Import modules that don't exist (yet)

I wish to create my own variation of amoffat's sh module, where it can import pretty much any command from the user's UNIX path, such as:
from sh import hg
However, I am having a hard time finding a way to intercept / override python's own import [...] and from [...] import [...]. At this point I simply need a way to at least get [the name of] the object of the from import, at which point I can simply setattr() and partial() my way from there, I hope. I'm at a complete loss of how to do this at the moment, however, and hence, have no code to show for it.
The gist of what I'm going for:
from test import t # Even though "t" doesn't exist in the module (yet)
Any help with the full code would be greatly appreciated!
Final Answer, consolidated:
def __getattr__(name):
    if name == '__path__':
        raise AttributeError
    print(name)
There is actually a straightforward way if you are on Python 3.7+, PEP-562, which allows you to define __getattr__ at the module level:
def __getattr__(name):
    if name == "t":
        return "magic"
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
There is also a function __dir__ that you can define to declare what the builtin dir() will say about names in your module.
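For example, a matching module-level __dir__ could look like this (a sketch; it just returns the list of names to advertise):
def __dir__():
    # advertise the lazily provided name alongside the module's real globals
    return sorted(list(globals()) + ["t"])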
What sh does is more sophisticated, as they want to support versions below 3.7: Modifying sys.modules and replacing the module with a special object that pretends to be a module.
As @L3viathan pointed out, this is easy starting with Python 3.7: just define a __getattr__ function in your special module. So, for example, you could create an "echo" module (which just returns the name of the object you requested) like this:
echo.py (Python >=3.7)
def __getattr__(name):
    return name
Then you could use it like this:
from echo import x
print(repr(x))
# 'x'
On earlier versions of Python, you have to subclass the module, as hinted in PEP-562. This also works in Python 3.7.
echo.py (Python >=2)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        return name

sys.modules[__name__] = EchoModule(__name__)
You would use this the same way as the 3.7 version: from echo import something.
Update
For some reason Python tries to retrieve the attribute twice for each from echo import <x> call. It also calls __getattr__('__path__') when the module is loaded. You can avoid side effects in these cases with the following code:
echo.py (only define attributes once)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        # don't define __path__ attribute
        if name == '__path__':
            raise AttributeError
        print("importing {}".format(name))
        # create the attribute in case it's required again
        setattr(self, name, name)
        # return the new attribute
        return getattr(self, name)

sys.modules[__name__] = EchoModule(__name__)
This code creates an attribute in the echo module each time a previously unused attribute is imported (sort of like collections.defaultdict). Then, if Python tries to import that same attribute again later, it will pull it directly from the module instead of calling __getattr__ (this is normal behavior for object attributes).
There is also some code here to avoid setting a spurious __path__ attribute; this also avoids running your code when __path__ is requested. Note that this may actually be the most important part; when I tested, just raising AttributeError for __path__ was enough to prevent the double-access to the named attribute.

How to debug a sys.modules entry that gets overwritten?

I'm trying to debug an issue that causes sys.modules['numpy'] to get overwritten. I've added some print statements to numpy.__init__, and when I try to import numpy, I get this output:
numpy.__init__ running
id(sys.modules) = 89034704
id(sys.modules['numpy']) = 161528304
numpy.__init__ running
id(sys.modules) = 89034704
id(sys.modules['numpy']) = 177135864
Numpy has a number of circular imports, which should work as described in this answer. But in my case, instead of getting the partially initialized numpy module from sys.modules, numpy gets imported again, and numpy.__init__ gets executed a second time, leading to a crash.
How can I instrument sys.modules to get some visibility into who is overwriting sys.modules['numpy'] and when? Normally I would write a dict subclass, but I don't think it's safe to change sys.modules to point to my own object. I tried overriding sys.modules.__setattr__, but that's a read-only attribute.
Context: I'm trying to debug this issue in PyCall, a Julia library. PyCall embeds a Python interpreter in a running Julia process, and delegates the import to PyImport_ImportModule from cpython. The problem above happens inside a single call to PyImport_ImportModule, so I hope this question should be answerable with knowledge of python / cpython, but without knowledge of Julia / PyCall.
You can change sys.modules from a plain dict to one that prints out assignments, e.g:
import sys
import traceback

class noisydict(dict):
    def __setitem__(self, key, value):
        print('ASSIGNED: key={!r} value={!r} at:'.format(key, value))
        traceback.print_stack()
        return dict.__setitem__(self, key, value)

sys.modules = noisydict(sys.modules)
This may or may not work if the overwriting happens in C code (such code may directly access the underlying dict.__setitem__ rather than just do a sys.modules[name] = newmodule as Python code would) but it's worth a try!
Thanks to @BrenBarn for pointing me to https://stackoverflow.com/a/14778568/744071. The following worked for my purposes:
importhack.py:
import traceback

old_import = __import__

def my_import(module, *args, **kwargs):
    print "my_import({}) caused by:".format(module)
    traceback.print_stack()
    return old_import(module, *args, **kwargs)

__builtins__['__import__'] = my_import
Usage:
>>> import importhack
>>> import numpy
I believe the original problem in PyCall.jl was caused by calling PyImport_ImportModule before the Python interpreter was fully initialized.

How to mock an import

Module A includes import B at its top. However under test conditions I'd like to mock B in A (mock A.B) and completely refrain from importing B.
In fact, B isn't installed in the test environment on purpose.
A is the unit under test. I have to import A with all its functionality. B is the module I need to mock. But how can I mock B within A and stop A from importing the real B, if the first thing A does is import B?
(The reason B isn't installed is that I use pypy for quick testing and unfortunately B isn't compatible with pypy yet.)
How could this be done?
You can assign to sys.modules['B'] before importing A to get what you want:
test.py:
import sys
sys.modules['B'] = __import__('mock_B')
import A
print(A.B.__name__)
A.py:
import B
Note B.py does not exist, but when running test.py no error is returned and print(A.B.__name__) prints mock_B. You still have to create a mock_B.py where you mock B's actual functions/variables/etc. Or you can just assign a Mock() directly:
test.py:
import sys
from unittest.mock import Mock   # or: from mock import Mock

sys.modules['B'] = Mock()
import A
The builtin __import__ can be mocked with the 'mock' library for more control:
import mock

# Store original __import__
orig_import = __import__
# This will be the B module
b_mock = mock.Mock()

def import_mock(name, *args):
    if name == 'B':
        return b_mock
    return orig_import(name, *args)

with mock.patch('__builtin__.__import__', side_effect=import_mock):
    import A
Say A looks like:
import B

def a():
    return B.func()
A.a() returns b_mock.func() which can be mocked also.
b_mock.func.return_value = 'spam'
A.a() # returns 'spam'
Note for Python 3:
As stated in the changelog for 3.0, __builtin__ is now named builtins:
Renamed module __builtin__ to builtins (removing the underscores, adding an 's').
The code in this answer works fine if you replace __builtin__ by builtins for Python 3.
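For example, the Python 3 version of the patch above targets builtins.__import__ and everything else stays the same (this assumes the same A/B layout as above):
from unittest import mock

orig_import = __import__
b_mock = mock.Mock()

def import_mock(name, *args):
    if name == 'B':
        return b_mock
    return orig_import(name, *args)

with mock.patch('builtins.__import__', side_effect=import_mock):
    import A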
How to mock an import, (mock A.B)?
Module A includes import B at its top.
Easy, just mock the library in sys.modules before it gets imported:
if wrong_platform():
    sys.modules['B'] = mock.MagicMock()
and then, so long as A doesn't rely on specific types of data being returned from B's objects:
import A
should just work.
You can also mock import A.B:
This works even if you have submodules, but you'll want to mock each module. Say you have this:
from foo import This, That, andTheOtherThing
from foo.bar import Yada, YadaYada
from foo.baz import Blah, getBlah, boink
To mock, simply do the below before the module that contains the above is imported:
sys.modules['foo'] = MagicMock()
sys.modules['foo.bar'] = MagicMock()
sys.modules['foo.baz'] = MagicMock()
(My experience: I had a dependency that works on one platform, Windows, but didn't work on Linux, where we run our daily tests.
So I needed to mock the dependency for our tests. Luckily it was a black box, so I didn't need to set up a lot of interaction.)
Mocking Side Effects
Addendum: Actually, I needed to simulate a side-effect that took some time. So I needed an object's method to sleep for a second. That would work like this:
sys.modules['foo'] = MagicMock()
sys.modules['foo.bar'] = MagicMock()
sys.modules['foo.baz'] = MagicMock()
# setup the side-effect:
from time import sleep
def sleep_one(*args):
    sleep(1)
# this gives us the mock objects that will be used
from foo.bar import MyObject
my_instance = MyObject()
# mock the method!
my_instance.method_that_takes_time = mock.MagicMock(side_effect=sleep_one)
And then the code takes some time to run, just like the real method.
Aaron Hall's answer works for me.
Just want to mention one important thing,
if in A.py you do
from B.C.D import E
then in test.py you must mock every module along the path, otherwise you get ImportError
sys.modules['B'] = mock.MagicMock()
sys.modules['B.C'] = mock.MagicMock()
sys.modules['B.C.D'] = mock.MagicMock()
I realize I'm a bit late to the party here, but here's a somewhat insane way to automate this with the mock library:
(here's an example usage)
import contextlib
import collections
import mock
import sys

def fake_module(**args):
    return (collections.namedtuple('module', args.keys())(**args))

def get_patch_dict(dotted_module_path, module):
    patch_dict = {}
    module_splits = dotted_module_path.split('.')
    # Add our module to the patch dict
    patch_dict[dotted_module_path] = module
    # We add the rest of the fake modules in backwards
    while module_splits:
        # This adds the next level up into the patch dict which is a fake
        # module that points at the next level down
        patch_dict['.'.join(module_splits[:-1])] = fake_module(
            **{module_splits[-1]: patch_dict['.'.join(module_splits)]}
        )
        module_splits = module_splits[:-1]
    return patch_dict

with mock.patch.dict(
    sys.modules,
    get_patch_dict('herp.derp', fake_module(foo='bar'))
):
    import herp.derp
    # prints bar
    print herp.derp.foo
The reason this is so ridiculously complicated is that when an import occurs, Python basically does this (take, for example, from herp.derp import foo):
Does sys.modules['herp'] exist? Else import it. If still not ImportError
Does sys.modules['herp.derp'] exist? Else import it. If still not ImportError
Get attribute foo of sys.modules['herp.derp']. Else ImportError
foo = sys.modules['herp.derp'].foo
There are some downsides to this hacked-together solution: if something else relies on other stuff in the module path, this kind of screws it over. Also, this only works for stuff that is being imported inline, such as
def foo():
    import herp.derp
or
def foo():
    __import__('herp.derp')
I found a fine way to mock imports in Python: Eric Zaadi's solution found here, which I just use inside my Django application.
I've got a class SeatInterface, which is an interface to the Seat model class.
So inside my seat_interface module I have such an import:
from ..models import Seat

class SeatInterface(object):
    (...)
I wanted to create isolated tests for the SeatInterface class with the Seat class mocked as FakeSeat. The problem was - how to run the tests offline, where the Django application is down. I had the below error:
ImproperlyConfigured: Requested setting BASE_DIR, but settings are not
configured. You must either define the environment variable
DJANGO_SETTINGS_MODULE or call settings.configure() before accessing
settings.
Ran 1 test in 0.078s
FAILED (errors=1)
The solution was:
import unittest
from mock import MagicMock, patch

class FakeSeat(object):
    pass

class TestSeatInterface(unittest.TestCase):
    def setUp(self):
        models_mock = MagicMock()
        models_mock.Seat.return_value = FakeSeat
        modules = {'app.app.models': models_mock}
        patch.dict('sys.modules', modules).start()

    def test1(self):
        from app.app.models_interface.seat_interface import SeatInterface
And then test magically runs OK :)
.
Ran 1 test in 0.002s
OK
If you do an import ModuleB you are really calling the builtin method __import__ as:
ModuleB = __import__('ModuleB', globals(), locals(), [], -1)
You could overwrite this method by importing the __builtin__ module and making a wrapper around the __builtin__.__import__ method. Or you could play with the NullImporter hook from the imp module, catching the exception and mocking your module/class in the except block.
Pointer to the relevant docs:
docs.python.org: __import__
Accessing Import internals with the imp Module
I hope this helps. Be advised that you are stepping into the more arcane areas of Python programming here, and that a) a solid understanding of what you really want to achieve and b) a thorough understanding of the implications are important.
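A rough sketch of the "catch the exception and mock in the except block" idea mentioned above (assuming B may simply be missing in the test environment):
# A.py -- fall back to a mock when the real dependency is unavailable
try:
    import B
except ImportError:
    from unittest import mock
    B = mock.MagicMock()   # stand-in used only when the real B cannot be imported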
I know this is a fairly old question, but I have found myself returning to it a few times recently, and wanted to share a concise solution to this.
import sys
from unittest import mock

def mock_module_import(module):
    """Source: https://stackoverflow.com/a/63584866/3972558"""
    def _outer_wrapper(func):
        def _inner_wrapper(*args, **kwargs):
            orig = sys.modules.get(module)  # get the original module, if present
            sys.modules[module] = mock.MagicMock()  # patch it
            try:
                return func(*args, **kwargs)
            finally:
                if orig is not None:  # if the module was installed, restore the patch
                    sys.modules[module] = orig
                else:  # if the module never existed, remove the key
                    del sys.modules[module]
        return _inner_wrapper
    return _outer_wrapper
It works by temporarily patching the key for the module in sys.modules, and then restoring the original module after the decorated function is called. This can be used in scenarios where a package may not be installed in the testing environment, or a more complex scenario where the patched module might actually perform some of its own internal monkey-patching (which was the case I was facing).
Here's an example of use:
@mock_module_import("some_module")
def test_foo():
    # use something that relies upon "some_module" here
    assert True
I found myself facing a similar problem today, and I've decided to solve it a bit differently. Rather than hacking on top of Python's import machinery, you can simply add the mocked module into sys.path, and have Python prefer it over the original module.
Create the replacement module in a subdirectory, e.g.:
mkdir -p test/mocked-lib
${EDITOR} test/mocked-lib/B.py
Before A is imported, insert this directory to sys.path. I'm using pytest, so in my test/conftest.py, I've simply done:
import os.path
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "mocked-lib"))
Now, when the test suite is run, the mocked-lib subdirectory is prepended into sys.path and import A uses B from mocked-lib.
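The stand-in test/mocked-lib/B.py only needs to expose whatever A actually uses; for example (func is a hypothetical name):
# test/mocked-lib/B.py -- minimal stand-in for the real B
def func(*args, **kwargs):
    return "stubbed result"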

Lazy module variables--can it be done?

I'm trying to find a way to lazily load a module-level variable.
Specifically, I've written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won't tell you where its download folder is, so I've written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the "Downloads" directory.
This takes a second or two, so I'd like to have it evaluated lazily, rather than at module import time.
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
You can't do it with modules, but you can disguise a class "as if" it was a module, e.g., in itun.py, code...:
import sys

class _Sneaky(object):
    def __init__(self):
        self.download = None

    @property
    def DOWNLOAD_PATH(self):
        if not self.download:
            self.download = heavyComputations()
        return self.download

    def __getattr__(self, name):
        return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()
Now anybody can import itun... and in fact get your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object than inside _Sneaky.
It turns out that as of Python 3.7, it's possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562 and documented in the data model chapter in the Python reference documentation.
# mymodule.py
from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ...  # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
I used Alex's implementation on Python 3.3, but this crashes miserably:
The code
def __getattr__(self, name):
    return globals()[name]
is not correct because an AttributeError should be raised, not a KeyError.
This crashed immediately under Python 3.3, because a lot of introspection is done
during the import, looking for attributes like __path__, __loader__ etc.
Here is the version that we use now in our project to allow for lazy imports
in a module. The __init__ of the module is delayed until the first attribute access that does not have a special name:
""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)
Lazy module variables--can it be done?
class _Sneaky(object):
def __init__(self, name):
self.module = sys.modules[name]
sys.modules[name] = self
self.initializing = True
def __getattr__(self, name):
# call module.__init__ after import introspection is done
if self.initializing and not name[:2] == '__' == name[-2:]:
self.initializing = False
__init__(self.module)
return getattr(self.module, name)
_Sneaky(__name__)
The module now needs to define an __init__ function. This function can be used to import modules that might import ourselves:
def __init__(module):
    ...
    # do something that imports config.py again
    ...
The code can be put into another module, and it can be extended with properties
as in the examples above.
Maybe that is useful for somebody.
For Python 3.5 and 3.6, the proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module's __class__. So, here's a solution loosely based on Christian Tismer's answer, but probably not resembling it much at all:
import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path

sys.modules[__name__].__class__ = _Sneaky
For Python 3.7 and later, you can define a module-level __getattr__() function. See PEP 562 for details.
Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:
Inside your module, put something like:
def _long_function():
    # print() function to show this is called only once
    print("Determining DOWNLOAD_FOLDER_PATH...")
    # Determine the module-level variable
    path = "/some/path/here"
    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path
    # ... and return it
    return path

def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()
    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
From this it should be clear that the _long_function() isn't executed when you import your module, e.g.:
print("-- before import --")
import somemodule
print("-- after import --")
results in just:
-- before import --
-- after import --
But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.
For example, with the first block above inside the module "somemodule.py", the following code:
import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
produces:
--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--
or, more clearly:
# LINE OF CODE # OUTPUT
import somemodule # (nothing)
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # Determining DOWNLOAD_FOLDER_PATH...
# /some/path/here
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # /some/path/here
print("--") # --
Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
I think you are correct in saying that a function is the best solution to your problem here.
I will give you a brief example to illustrate.
# myfile.py - an example module with some expensive module-level code.
import os
# expensive operation to crawl up in directory structure
The expensive operation will be executed on import if it is at module level. There is not a way to stop this, short of lazily importing the entire module!!
# myfile2.py - a module with expensive code placed inside a function.
import os

def getdownloadsfolder(curdir=None):
    """a function that will search upward from the user's current directory
    to find the 'Downloads' folder."""
    # expensive operation now here.
You will be following best practice by using this method.
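If you also want the expensive search to run at most once, a simple cache inside the module works; a sketch (the path value is a placeholder):
# myfile3.py -- compute lazily on first call, then reuse the cached result
_downloads_folder = None

def getdownloadsfolder(curdir=None):
    global _downloads_folder
    if _downloads_folder is None:
        _downloads_folder = "/path/found/by/expensive/search"   # placeholder for the real crawl
    return _downloads_folder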
Recently I came across the same problem, and have found a way to do it.
class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        # print 'initializing'
        pass

    def __len__(self): return len(self.data)
    def __repr__(self): return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)
        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)
With this LazyObject, you can define an init method for the object, and the object will be initialized lazily; example code looks like:
import time

o = LazyObject()

def slow_init(self):
    time.sleep(1)  # simulate slow initialization
    self.data = 'done'

o.init = slow_init
The o object above will have exactly the same methods as the 'done' object; for example, you can do:
# o will be initialized, then apply the `len` method
assert len(o) == 4
complete code with tests (works in 2.7) can be found here:
https://gist.github.com/observerss/007fedc5b74c74f3ea08
If that variable lived in a class rather than a module, then you could overload __getattr__ or, better yet, populate it in __init__.
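A sketch of that class-based idea (compute_download_path is a hypothetical helper; the value is computed on first access and then cached on the instance):
class Settings(object):
    def __getattr__(self, name):              # only called when normal lookup fails
        if name == 'DOWNLOAD_FOLDER_PATH':
            value = compute_download_path()   # hypothetical expensive helper
            setattr(self, name, value)        # cache so __getattr__ isn't hit again
            return value
        raise AttributeError(name)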
SPEC 1
Probably the best-known recipe for lazy loading of module attributes (and modules) is in SPEC 1 (Draft) at scientific-python.org. SPECs are operational guidelines for projects in the Scientific Python ecosystem. There is discussion around SPEC 1 at the Scientific Python Discourse, and the solution is offered as a package on PyPI as lazy_loader. The lazy_loader implementation relies on the module __getattr__ support introduced in Python 3.7 (PEP 562), and it is used in scikit-image, NetworkX, and partially in SciPy.
Example usage:
The following example is using the lazy_loader PyPI package. You could also just copy-paste the source code to be part of your project.
# mypackage/__init__.py
import lazy_loader

__getattr__, __dir__, __all__ = lazy_loader.attach(
    __name__,
    submodules=['bar'],
    submod_attrs={
        'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
        'grok.spam': ['spam_a', 'spam_b', 'spam_c']
    }
)
this is the lazy import equivalent of
from . import bar
from .foo.morefoo import FooFilter, do_foo, MODULE_VARIABLE
from .grok.spam import (spam_a, spam_b, spam_c)
Short explanation on lazy_loader.attach
If you want to lazy-load a module, you list it in submodules (which is a list)
If you want to lazy-load something from a module (function, class, etc.), you list it in submod_attrs (which is a dict)
Type checking
Static type checkers and IDEs cannot infer type information from lazily loaded imports. As workaround, you may use type stubs (.pyi files), like this:
# mypackage/__init__.pyi
from .foo.morefoo import FooFilter as FooFilter, do_foo as do_foo, MODULE_VARIABLE as MODULE_VARIABLE
from .grok.spam import spam_a as spam_a, spam_b as spam_b, spam_c as spam_c
The SPEC 1 mentions that this X as X syntax is necessary due to PEP484.
Side notes
There was recently a PEP for Lazy Imports, PEP 690, but it was rejected.
In Tensorflow, there is lazy loading class at util.lazyloader.
There is one blog post from Brett Cannon (a Python core developer), where he showed in 2018 a module __getattr__-based implementation of lazy_loader and provided it in a package called modutil, but the project is marked as archived on GitHub. This has been an inspiration for the scientific-python lazy_loader.
