Lazy module variables--can it be done?

I'm trying to find a way to lazily load a module-level variable.
Specifically, I've written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won't tell you where its download folder is, so I've written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the "Downloads" directory.
This takes a second or two, so I'd like to have it evaluated lazily, rather than at module import time.
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?

You can't do it with modules, but you can disguise a class "as if" it were a module, e.g., in itun.py, code...:
import sys

class _Sneaky(object):
    def __init__(self):
        self.download = None

    @property
    def DOWNLOAD_PATH(self):
        if not self.download:
            self.download = heavyComputations()
        return self.download

    def __getattr__(self, name):
        return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()
Now anybody can import itun... and in fact get your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object than inside _Sneaky!
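For illustration, client code then uses it like any other module (assuming the itun.py above is importable and heavyComputations() is defined there):
import itun

print(itun.DOWNLOAD_PATH)  # first access runs heavyComputations()
print(itun.DOWNLOAD_PATH)  # cached: no recomputation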

It turns out that as of Python 3.7, it's possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562 and documented in the data model chapter in the Python reference documentation.
# mymodule.py
from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ...  # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

I used Alex's implementation on Python 3.3, but it crashed miserably:
The code
def __getattr__(self, name):
    return globals()[name]
is not correct, because an AttributeError should be raised, not a KeyError.
This crashed immediately under Python 3.3, because a lot of introspection is done
during the import, looking for attributes like __path__, __loader__, etc.
Here is the version that we use now in our project to allow for lazy imports
in a module. The __init__ of the module is delayed until the first attribute access
that does not have a special name:
""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)
Lazy module variables--can it be done?
class _Sneaky(object):
def __init__(self, name):
self.module = sys.modules[name]
sys.modules[name] = self
self.initializing = True
def __getattr__(self, name):
# call module.__init__ after import introspection is done
if self.initializing and not name[:2] == '__' == name[-2:]:
self.initializing = False
__init__(self.module)
return getattr(self.module, name)
_Sneaky(__name__)
The module now needs to define an __init__ function. This function can be used
to import modules that might import ourselves:
def __init__(module):
    ...
    # do something that imports config.py again
    ...
The code can be put into another module, and it can be extended with properties
as in the examples above.
Maybe that is useful for somebody.
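For instance, a hypothetical __init__ that breaks a circular import could look like this (settings.py is an invented name for a module that itself does import config):
# config.py (continued) -- hypothetical example, module names invented

def __init__(module):
    # safe to import here: this runs on the first real attribute access,
    # long after both modules have finished their import statements
    import settings          # settings.py itself does `import config`
    module.DEBUG = settings.DEBUG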

For Python 3.5 and 3.6, the proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module's __class__. So, here's a solution loosely based on Christian Tismer's answer but probably not resembling it much at all:
import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path

sys.modules[__name__].__class__ = _Sneaky
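Usage then looks like ordinary attribute access (assuming the block above is saved as mymodule.py):
import mymodule

# computed and cached on first access
print(mymodule.DOWNLOAD_FOLDER_PATH)  # -> /dev/block/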
For Python 3.7 and later, you can define a module-level __getattr__() function. See PEP 562 for details.

Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:
Inside your module, put something like:
def _long_function():
    # print() function to show this is called only once
    print("Determining DOWNLOAD_FOLDER_PATH...")
    # Determine the module-level variable
    path = "/some/path/here"
    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path
    # ... and return it
    return path

def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()
    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
From this it should be clear that the _long_function() isn't executed when you import your module, e.g.:
print("-- before import --")
import somemodule
print("-- after import --")
results in just:
-- before import --
-- after import --
But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.
For example, with the first block above inside the module "somemodule.py", the following code:
import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
produces:
--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--
or, more clearly:
# LINE OF CODE # OUTPUT
import somemodule # (nothing)
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # Determining DOWNLOAD_FOLDER_PATH...
# /some/path/here
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # /some/path/here
print("--") # --
Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.
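For instance, a minimal module-level __dir__ for this example could be:
def __dir__():
    # advertise the lazily computed name alongside the module's real globals
    return sorted(set(globals()) | {"DOWNLOAD_FOLDER_PATH"})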

Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
I think you are correct in saying that a function is the best solution to your problem here.
I will give you a brief example to illustrate.
# myfile.py - an example module with some expensive module-level code.
import os

# expensive operation to crawl up in the directory structure
The expensive operation will be executed on import if it is at module level. There is no way to stop this, short of lazily importing the entire module!
# myfile2.py - a module with the expensive code placed inside a function.
import os

def getdownloadsfolder(curdir=None):
    """a function that will search upward from the user's current directory
    to find the 'Downloads' folder."""
    # expensive operation now here: walk up until a 'Downloads' ancestor is found
    curdir = curdir or os.getcwd()
    while curdir and os.path.basename(curdir) != 'Downloads':
        parent = os.path.dirname(curdir)
        curdir = parent if parent != curdir else None
    return curdir
You will be following best practice by using this method.
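For instance (assuming the module above is saved as myfile2.py):
from myfile2 import getdownloadsfolder

# the directory crawl happens only now, when the caller asks for it
downloads = getdownloadsfolder()
print(downloads)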

Recently I came across the same problem, and have found a way to do it.
class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        # print('initializing')
        pass

    def __len__(self): return len(self.data)
    def __repr__(self): return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)
        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)
With this LazyObject, you can define an init method for the object, and the object will be initialized lazily. Example code looks like:
import time

o = LazyObject()

def slow_init(self):
    time.sleep(1)  # simulate slow initialization
    self.data = 'done'

o.init = slow_init
The o object above will then have exactly the same methods as the 'done' object has; for example, you can do:
# o will be initialized, then the `len` call is applied to its data
assert len(o) == 4
complete code with tests (works in 2.7) can be found here:
https://gist.github.com/observerss/007fedc5b74c74f3ea08

If that variable lived in a class rather than a module, then you could overload __getattr__, or better yet, populate it in __init__.
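A minimal sketch of that class-based idea (the class and the slow helper are invented for illustration):
def _find_download_folder():
    # stand-in for the slow directory crawl described in the question
    return "/some/path/Downloads"

class LazySettings:
    def __getattr__(self, name):
        # __getattr__ only fires when normal lookup fails, so caching the
        # value as an instance attribute disables it after the first access
        if name == "DOWNLOAD_FOLDER_PATH":
            self.DOWNLOAD_FOLDER_PATH = _find_download_folder()
            return self.DOWNLOAD_FOLDER_PATH
        raise AttributeError(name)

settings = LazySettings()
print(settings.DOWNLOAD_FOLDER_PATH)  # computed only on first access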

SPEC 1
Probably the best known recipe for lazy loading of module attributes (and modules) is in SPEC 1 (Draft) at scientific-python.org. SPECs are operational guidelines for projects in the Scientific Python ecosystem. There is discussion around SPEC 1 at the Scientific Python Discourse, and the solution is offered as a package on PyPI as lazy_loader. The lazy_loader implementation relies on the module __getattr__ support introduced in Python 3.7 (PEP 562), and it is used in scikit-image, NetworkX, and partially in SciPy.
Example usage:
The following example is using the lazy_loader PyPI package. You could also just copy-paste the source code to be part of your project.
# mypackage/__init__.py
import lazy_loader

__getattr__, __dir__, __all__ = lazy_loader.attach(
    __name__,
    submodules=['bar'],
    submod_attrs={
        'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
        'grok.spam': ['spam_a', 'spam_b', 'spam_c'],
    }
)
This is the lazy-import equivalent of:
from . import bar
from .foo.morefoo import FooFilter, do_foo, MODULE_VARIABLE
from .grok.spam import (spam_a, spam_b, spam_c)
A short explanation of lazy_loader.attach:
If you want to lazy-load a module, you list it in submodules (which is a list)
If you want to lazy-load something from a module (function, class, etc.), you list it in submod_attrs (which is a dict)
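Under the hood, attach essentially returns a module-level __getattr__ that imports on first access. A hand-rolled sketch of the same idea (not the actual lazy_loader source) could look like:
# mypackage/__init__.py -- hand-rolled sketch, not the real lazy_loader
import importlib

_submodules = {'bar'}
_submod_attrs = {
    'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
    'grok.spam': ['spam_a', 'spam_b', 'spam_c'],
}
_attr_to_mod = {attr: mod for mod, attrs in _submod_attrs.items() for attr in attrs}

def __getattr__(name):
    if name in _submodules:      # lazy `from . import bar`
        return importlib.import_module(f'.{name}', __name__)
    if name in _attr_to_mod:     # lazy `from .foo.morefoo import ...`
        module = importlib.import_module(f'.{_attr_to_mod[name]}', __name__)
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")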
Type checking
Static type checkers and IDEs cannot infer type information from lazily loaded imports. As a workaround, you may use type stubs (.pyi files), like this:
# mypackage/__init__.pyi
from .foo.morefoo import FooFilter as FooFilter, do_foo as do_foo, MODULE_VARIABLE as MODULE_VARIABLE
from .grok.spam import spam_a as spam_a, spam_b as spam_b, spam_c as spam_c
SPEC 1 mentions that this X as X syntax is necessary due to PEP 484.
Side notes
There was recently a PEP for Lazy Imports, PEP 690, but it was rejected.
In TensorFlow, there is a lazy loading class at util.lazyloader.
There is one blog post from Brett Cannon (a Python core developer), where he showed in 2018 a module __getattr__ based implementation of lazy_loader, and provided it in a package called modutil, but the project is marked archived in GitHub. This has been an inspiration for the scientific-python lazy_loader.

Related

How do I dynamically generate module contents in Python?

I know there are ways to perform dynamic import of Python modules themselves, but I would like to know if there's a way to write a module such that it can dynamically create its own module contents on demand. I am imagining a module hook that looks something like:
# In some_module.py:
def __import_name__(name):
return some_object
Such that if I were to write from some_module import foo in a script, Python will call some_module.__import_name__("foo") and let me dynamically create and return the contents.
I haven't found anything that works like this exactly in the documentation, though there are references to an "import protocol" with "finders" and "loaders" and "meta hooks" and "import path hooks" that permit customization of the import logic, and I imagine that such a thing is possible.
I discovered you can modify the behavior of a module from within itself in arbitrary ways by setting sys.modules[__name__].__class__ to a class that implements whatever behavior you choose.
import sys
import types

class DynamicModule(types.ModuleType):
    # This function is what gets called on `from this_module import whatever`
    # or `this_module.whatever` accesses.
    def __getattr__(self, name):
        # This check ensures we don't intercept special values like __path__
        # if they're not set elsewhere.
        if name.startswith("__") and name.endswith("__"):
            return self.__getattribute__(name)
        return make_object(name)

    # Helpful to define this here if you need to dynamically construct the
    # full set of available attributes.
    @property
    def __all__(self):
        return get_all_objects()

# This ensures the DynamicModule class is used to define the behavior of
# this module.
sys.modules[__name__].__class__ = DynamicModule
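Client code can then import names that are synthesized on demand (assuming the block above is this_module.py and make_object is defined there):
from this_module import widget      # calls DynamicModule.__getattr__('widget')

import this_module
gadget = this_module.gadget         # same mechanism for attribute access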
Something about this feels like it may not be the intended path to do something like this, though, and that I should be hooking into the importlib machinery.

Import modules that don't exist (yet)

I wish to create my own variation of amoffat's sh module, where it can import pretty much any command from the user's UNIX path, such as:
from sh import hg
However, I am having a hard time finding a way to intercept / override Python's own import [...] and from [...] import [...]. At this point I simply need a way to at least get [the name of] the object of the from import, at which point I can simply setattr() and partial() my way from there, I hope. I'm at a complete loss as to how to do this at the moment, however, and hence have no code to show for it.
The gist of what I'm going for:
from test import t # Even though "t" doesn't exist in the module (yet)
Any help with the full code would be greatly appreciated!
Final Answer, consolidated:
def __getattr__(name):
    if name == '__path__':
        raise AttributeError
    print(name)
There is actually a straightforward way if you are on Python 3.7+, PEP-562, which allows you to define __getattr__ at the module level:
def __getattr__(name):
    if name == "t":
        return "magic"
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
There is also a function __dir__ that you can define to declare what the builtin dir() will say about names in your module.
What sh does is more sophisticated, as they want to support versions below 3.7: modifying sys.modules and replacing the module with a special object that pretends to be a module.
As @L3viathan pointed out, this is easy starting with Python 3.7: just define a __getattr__ function in your special module. So, for example, you could create an "echo" module (just returns the name of the object you requested) like this:
echo.py (Python >=3.7)
def __getattr__(name):
    return name
Then you could use it like this:
from echo import x
print(repr(x))
# 'x'
On earlier versions of Python, you have to subclass the module, as hinted in PEP-562. This also works in Python 3.7.
echo.py (Python >=2)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        return name

sys.modules[__name__] = EchoModule(__name__)
You would use this the same way as the 3.7 version: from echo import something.
Update
For some reason Python tries to retrieve the attribute twice for each from echo import <x> call. It also calls __getattr__('__path__') when the module is loaded. You can avoid side effects in these cases with the following code:
echo.py (only define attributes once)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        # don't define a __path__ attribute
        if name == '__path__':
            raise AttributeError
        print("importing {}".format(name))
        # create the attribute in case it's required again
        setattr(self, name, name)
        # return the new attribute
        return getattr(self, name)

sys.modules[__name__] = EchoModule(__name__)
This code creates an attribute in the echo module each time a previously unused attribute is imported (sort of like collections.defaultdict). Then, if Python tries to import that same attribute again later, it will pull it directly from the module instead of calling __getattr__ (this is normal behavior for object attributes).
There is also some code here to avoid setting a spurious __path__ attribute; this also avoids running your code when __path__ is requested. Note that this may actually be the most important part; when I tested, just raising AttributeError for __path__ was enough to prevent the double-access to the named attribute.

Export decorator that manages __all__

A proper Python module will list all its public symbols in a list called __all__. Managing that list can be tedious, since you'll have to list each symbol twice. Surely there are better ways, probably using decorators so one would merely annotate the exported symbols as @export.
How would you write such a decorator? I'm certain there are different ways, so I'd like to see several answers with enough information that users can compare the approaches against one another.
In Is it a good practice to add names to __all__ using a decorator?, Ed L suggests the following, to be included in some utility library:
import sys

def export(fn):
    """Use a decorator to avoid retyping function/class names.

    * Based on an idea by Duncan Booth:
      http://groups.google.com/group/comp.lang.python/msg/11cbb03e09611b8a
    * Improved via a suggestion by Dave Angel:
      http://groups.google.com/group/comp.lang.python/msg/3d400fb22d8a42e1
    """
    mod = sys.modules[fn.__module__]
    if hasattr(mod, '__all__'):
        name = fn.__name__
        all_ = mod.__all__
        if name not in all_:
            all_.append(name)
    else:
        mod.__all__ = [fn.__name__]
    return fn
We've adapted the name to match the other examples. With this in a local utility library, you'd simply write
from .utility import export
and then start using @export. Just one line of idiomatic Python; you can't get much simpler than this. On the downside, the decorator does require access to the module by using the __module__ property and the sys.modules cache, both of which may be problematic in some of the more esoteric setups (like custom import machinery, or wrapping functions from another module to create functions in this module).
The Python part of the atpublic package by Barry Warsaw does something similar to this. It offers some keyword-based syntax, too, but the decorator variant relies on the same patterns used above.
This great answer by Aaron Hall suggests something very similar, with two more lines of code as it doesn't use __dict__.setdefault. It might be preferable if manipulating the module __dict__ is problematic for some reason.
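For reference, the __dict__.setdefault variant condenses the same bookkeeping into a couple of lines (a sketch, assuming the decorated object lives in an ordinary module):
import sys

def export(fn):
    # create __all__ on first use, then append each name at most once
    all_ = sys.modules[fn.__module__].__dict__.setdefault('__all__', [])
    if fn.__name__ not in all_:
        all_.append(fn.__name__)
    return fn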
You could simply declare the decorator at the module level like this:
__all__ = []

def export(obj):
    __all__.append(obj.__name__)
    return obj
This is perfect if you only use this in a single module. At 4 lines of code (plus probably some empty lines for typical formatting practices) it's not overly expensive to repeat this in different modules, but it does feel like code duplication in those cases.
You could define the following in some utility library:
def exporter():
    all = []
    def decorator(obj):
        all.append(obj.__name__)
        return obj
    return decorator, all

export, __all__ = exporter()
export(exporter)

# possibly some other utilities, decorated with @export as well
Then inside your public library you'd do something like this:
from . import utility

export, __all__ = utility.exporter()

# start using @export
Using the library takes two lines of code here. It combines the definition of __all__ and the decorator. So people searching for one of them will find the other, thus helping readers to quickly understand your code. The above will also work in exotic environments, where the module may not be available from the sys.modules cache or where the __module__ property has been tampered with or some such.
https://github.com/russianidiot/public.py has yet another implementation of such a decorator. Its core file is currently 160 lines long! The crucial points appear to be the fact that it uses the inspect module to obtain the appropriate module based on the current call stack.
This is not a decorator approach, but provides the level of efficiency I think you're after.
https://pypi.org/project/auto-all/
You can use the two functions provided with the package to "start" and "end" capturing the module objects that you want included in the __all__ variable.
from auto_all import start_all, end_all

# Imports outside the start and end functions won't be externally available.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects
end_all(globals())
The functions in the package are trivial (a few lines), so could be copied into your code if you want to avoid external dependencies.
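For a sense of how little machinery is involved, a rough sketch of such start/end capturing (not the actual auto_all source) might be:
_seen = {}

def start_all(globs):
    # remember which names exist before the public section begins
    _seen[id(globs)] = set(globs)

def end_all(globs):
    # everything defined since start_all becomes public
    before = _seen.pop(id(globs))
    globs['__all__'] = [name for name in globs if name not in before]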
While other variants are technically correct to a certain extent, one might also want to be sure that:
if the target module already has __all__ declared, it is handled correctly;
target appears in __all__ only once:
# utils.py
import sys
from typing import Any

def export(target: Any) -> Any:
    """
    Mark a module-level object as exported.
    Simplifies tracking of objects available via wildcard imports.
    """
    mod = sys.modules[target.__module__]
    __all__ = getattr(mod, '__all__', None)
    if __all__ is None:
        __all__ = []
        setattr(mod, '__all__', __all__)
    elif not isinstance(__all__, list):
        __all__ = list(__all__)
        setattr(mod, '__all__', __all__)
    target_name = target.__name__
    if target_name not in __all__:
        __all__.append(target_name)
    return target

Python 3 import hooks

I'm trying to implement an "import hook" in Python 3. The hook is supposed to add an attribute to every class that is imported. (Not really every class, but for the sake of simplifying the question, let's assume so.)
I have a loader defined as follows:
import sys

class ConfigurableImports(object):
    def find_module(self, fullname, path):
        return self

    def create_module(self, spec):
        ...  # ???

    def exec_module(self, module):
        ...  # ???

sys.meta_path = [ConfigurableImports()]
The documentation states that as of 3.6, loaders will have to implement both create_module and exec_module. However, the documentation also gives little indication of what one should do to implement them, and no examples. My use case is very simple because I'm only loading Python modules, and the behavior of the loader is supposed to be almost exactly the same as the default behavior.
If I could, I'd just use importlib.import_module and then modify the module contents accordingly; however, since importlib leverages the import hook, I get an infinite recursion.
EDIT: I've also tried using the imp module's load_module, but this is deprecated.
Is there any easy way to implement this functionality with import hooks, or am I going about this the wrong way?
Imho, if you only need to alter the module, that is, play with it after it has been found and loaded, there's no need to actually create a full hook that finds, loads and returns the module; just patch __import__.
This can easily be done in a few lines:
import builtins
from inspect import getmembers, isclass

old_imp = builtins.__import__

def add_attr(mod):
    for name, val in getmembers(mod):
        if isclass(val):
            setattr(val, 'a', 10)

def custom_import(*args, **kwargs):
    m = old_imp(*args, **kwargs)
    add_attr(m)
    return m

builtins.__import__ = custom_import
Here, __import__ is replaced by your custom import that calls the original __import__ to get the loaded module and then calls a function add_attr that does the actual modification of the classes in a module (with getmembers and isclass from inspect) before returning the module.
Of course, this is written so that the changes take effect as soon as you import the script.
You can, and probably should, create auxiliary functions that restore and reapply it as needed, i.e. things like:
def revert():
    builtins.__import__ = old_imp

def apply():
    builtins.__import__ = custom_import
A context-manager would also make this implementation cleaner.
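For instance, a small context manager built on the names above:
from contextlib import contextmanager

@contextmanager
def patched_import():
    # install the custom __import__ only for the duration of the block
    builtins.__import__ = custom_import
    try:
        yield
    finally:
        builtins.__import__ = old_imp

# usage:
# with patched_import():
#     import some_module   # its classes receive the extra attribute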

Importing modules inside python class

I'm currently writing a class that needs os, stat and some others.
What's the best way to import these modules in my class?
I'm thinking about when others will use it, I want the 'dependency' modules to be already
imported when the class is instantiated.
Now I'm importing them in my methods, but maybe there's a better solution.
If your module will always import another module, always put it at the top as PEP 8 and the other answers indicate. Also, as @delnan mentions in a comment, sys, os, etc. are being used anyway, so it doesn't hurt to import them globally.
However, there is nothing wrong with conditional imports, if you really only need a module under certain runtime conditions.
If you only want to import them if the class is defined, like if the class is in a conditional block or another class or method, you can do something like this:
condition = True

if condition:
    class C(object):
        os = __import__('os')
        def __init__(self):
            print(self.os.listdir)

C.os  # the module is accessible as a class attribute
c = C()
If you only want it to be imported if the class is instantiated, do it in __new__ or __init__.
import sys
from importlib import import_module

class Foo():
    def __init__(self):
        # Depends on the configuration of the application.
        self.condition = True  # "True" or "False"
        if self.condition:
            self.importedModule = import_module('moduleName')

    # --- elsewhere, guard the call either way:
    def use_module(self, params):  # illustrative method name
        if 'moduleName' in sys.modules:
            self.importedModule.callFunction(params)
        # or
        if self.condition:
            self.importedModule.callFunction(params)
    # ---
PEP 8 on imports:
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
This makes it easy to see all modules used by the file at hand, and avoids having to replicate the import in several places when a module is used in more than one place. Everything else (e.g. function-/method-level imports) should be an absolute exception and needs to be justified well.
This official document (search for the section "Imports") states that imports should normally be put at the top of your source file. I would abide by this rule, apart from special cases.
