I'm trying to implement an "import hook" in Python 3. The hook is supposed to add an attribute to every class that is imported. (Not really every class, but for the sake of simplifying the question, let's assume so.)
I have a loader defined as follows:
import sys

class ConfigurableImports(object):

    def find_module(self, fullname, path):
        return self

    def create_module(self, spec):
        # ???

    def exec_module(self, module):
        # ???

sys.meta_path = [ConfigurableImports()]
The documentation states that as of 3.6, loaders will have to implement both create_module and exec_module. However, the documentation gives little indication of what one should do to implement these, and no examples. My use case is very simple because I'm only loading Python modules and the behavior of the loader is supposed to be almost exactly the same as the default behavior.
If I could, I'd just use importlib.import_module and then modify the module contents accordingly; however, since importlib leverages the import hook, I get an infinite recursion.
EDIT: I've also tried using the imp module's load_module, but this is deprecated.
Is there any easy way to implement this functionality with import hooks, or am I going about this the wrong way?
IMHO, if you only need to alter the module, that is, play with it after it has been found and loaded, there's no need to create a full hook that finds, loads and returns the module; just patch __import__.
This can easily be done in a few lines:
import builtins
from inspect import getmembers, isclass

old_imp = builtins.__import__

def add_attr(mod):
    for name, val in getmembers(mod):
        if isclass(val):
            setattr(val, 'a', 10)

def custom_import(*args, **kwargs):
    m = old_imp(*args, **kwargs)
    add_attr(m)
    return m

builtins.__import__ = custom_import
Here, __import__ is replaced by your custom import that calls the original __import__ to get the loaded module and then calls a function add_attr that does the actual modification of the classes in a module (with getmembers and isclass from inspect) before returning the module.
Of course, the patch takes effect as soon as you import this script. You can, and probably should, create auxiliary functions that restore the original import and re-apply the patch when needed, i.e. things like:
def revert(): builtins.__import__ = old_imp
def apply(): builtins.__import__ = custom_import
A context-manager would also make this implementation cleaner.
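For example, a minimal sketch of such a context manager, assuming the old_imp and custom_import names from the snippet above:

import builtins
from contextlib import contextmanager

@contextmanager
def patched_import():
    # Swap in the custom __import__ for the duration of the with-block.
    builtins.__import__ = custom_import
    try:
        yield
    finally:
        builtins.__import__ = old_imp

with patched_import():
    import some_module  # classes defined in some_module get the extra attribute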
Related
I know there are ways to perform dynamic import of Python modules themselves, but I would like to know if there's a way to write a module such that it can dynamically create its own module contents on demand. I am imagining a module hook that looks something like:
# In some_module.py:
def __import_name__(name):
    return some_object
Such that if I were to write from some_module import foo in a script, Python will call some_module.__import_name__("foo") and let me dynamically create and return the contents.
I haven't found anything that works like this exactly in the documentation, though there are references to an "import protocol" with "finders" and "loaders" and "meta hooks" and "import path hooks" that permit customization of the import logic, and I imagine that such a thing is possible.
I discovered you can modify the behavior of a module from within itself in arbitrary ways by setting sys.modules[__name__].__class__ to a class that implements whatever behavior you choose.
import sys
import types

class DynamicModule(types.ModuleType):
    # This function is what gets called on `from this_module import whatever`
    # or `this_module.whatever` accesses.
    def __getattr__(self, name):
        # This check ensures we don't intercept special values like __path__
        # if they're not set elsewhere.
        if name.startswith("__") and name.endswith("__"):
            return self.__getattribute__(name)
        return make_object(name)

    # Helpful to define this here if you need to dynamically construct the
    # full set of available attributes.
    @property
    def __all__(self):
        return get_all_objects()

# This ensures the DynamicModule class is used to define the behavior of
# this module.
sys.modules[__name__].__class__ = DynamicModule
Something about this feels like it may not be the intended path to do something like this, though, and that I should be hooking into the importlib machinery.
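For comparison, here is a rough sketch of what hooking the importlib machinery could look like for this use case: a meta path finder/loader that serves a module whose attributes are created on demand. The make_object helper and the "some_module" name are the hypothetical ones from the question.

import importlib.abc
import importlib.machinery
import sys

class DynamicFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if fullname != "some_module":  # only handle our hypothetical module
            return None
        return importlib.machinery.ModuleSpec(fullname, self)

    def create_module(self, spec):
        return None  # None means "use the default module creation"

    def exec_module(self, module):
        # Install a module-level __getattr__ (PEP 562) so attributes
        # are fabricated on first access.
        module.__getattr__ = lambda name: make_object(name)

sys.meta_path.insert(0, DynamicFinder())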
I am trying to create a dynamic method executor, where I have a list that will always contain two elements. The first element is the name of the file, the second element is the name of the method to execute.
How can I achieve this?
My below code unfortunately doesn't work, but it will give you a good indication of what I am trying to achieve.
from logic.intents import CenterCapacity

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    # process method call
    return call_reference[0].call_reference[1]
Thanks!
You can use __import__ to look up the module by name and then use getattr to find the method. For example, if the following code is in a file called exec.py, then
def dummy():
    print("dummy")

def lookup(mod, func):
    module = __import__(mod)
    return getattr(module, func)

if __name__ == "__main__":
    lookup("exec", "dummy")()
will output
dummy
Addendum
Alternatively importlib.import_module can be used, which although a bit more verbose, may be easier to use.
The most important difference between these two functions is that import_module() returns the specified package or module (e.g. pkg.mod), while __import__() returns the top-level package or module (e.g. pkg).
def lookup(mod, func):
    import importlib
    module = importlib.import_module(mod)
    return getattr(module, func)
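A quick illustration of that difference, using a dotted name from the standard library:

import importlib

decoder = importlib.import_module("json.decoder")  # the json.decoder submodule
top = __import__("json.decoder")                   # the top-level json package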
starting from:
from logic.intents import CenterCapacity

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    # process method call
    return call_reference[0].call_reference[1]
Option 1
We have several options. The first one uses a class reference and getattr. For this we have to remove the quotes around the class name and instantiate the class before calling the reference (you do not have to instantiate the class when the method is a staticmethod).
def method_executor(event):
    call_reference = [CenterCapacity, 'get_capacity']  # We now store a class reference
    # process method call
    return getattr(call_reference[0](), call_reference[1])
Option 2
A second option is based on this answer. It revolves around using getattr twice. We first get the module using sys.modules[__name__] and then get the class from it using getattr.
import sys

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    class_ref = getattr(sys.modules[__name__], call_reference[0])
    return getattr(class_ref, call_reference[1])
Option 3
A third option could be based on a full import path and use __import__('module.class'), take a look at this SO post.
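As a rough sketch of that idea (with a hypothetical dotted path), you can split the path and resolve the final attribute via importlib:

import importlib

def resolve(dotted_path):
    # 'pkg.module.ClassName' -> the ClassName object from pkg.module
    module_path, _, attr_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)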
(Note: This answer assumes that the necessary imports have already happened, and you just need a mechanism to invoke the functions of the imported modules. If you also want the import to be done by some program code, I will have to add that part, using the importlib library.)
You can do this:
globals()[call_reference[0]].__dict__[call_reference[1]]()
Explanation:
globals() returns a mapping between global variable names and their referenced objects. The imported module's name counts as one of these global variables of the current module.
Indexing this mapping object with call_reference[0] returns the module object containing the function to be called.
The module object's __dict__ maps each attribute-name of the module to the object referenced by that attribute. Functions defined in the module also count as attributes of the module.
Thus, indexing __dict__ with the function name call_reference[1] returns the function object.
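For instance, the same pattern with a standard-library module in place of the question's names:

import math

call_reference = ['math', 'sqrt']
func = globals()[call_reference[0]].__dict__[call_reference[1]]
assert func(9) == 3.0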
A proper Python module will list all its public symbols in a list called __all__. Managing that list can be tedious, since you'll have to list each symbol twice. Surely there are better ways, probably using decorators so one would merely annotate the exported symbols as @export.
How would you write such a decorator? I'm certain there are different ways, so I'd like to see several answers with enough information that users can compare the approaches against one another.
In Is it a good practice to add names to __all__ using a decorator?, Ed L suggests the following, to be included in some utility library:
import sys

def export(fn):
    """Use a decorator to avoid retyping function/class names.

    * Based on an idea by Duncan Booth:
      http://groups.google.com/group/comp.lang.python/msg/11cbb03e09611b8a
    * Improved via a suggestion by Dave Angel:
      http://groups.google.com/group/comp.lang.python/msg/3d400fb22d8a42e1
    """
    mod = sys.modules[fn.__module__]
    if hasattr(mod, '__all__'):
        name = fn.__name__
        all_ = mod.__all__
        if name not in all_:
            all_.append(name)
    else:
        mod.__all__ = [fn.__name__]
    return fn
We've adapted the name to match the other examples. With this in a local utility library, you'd simply write
from .utility import export
and then start using @export. Just one line of idiomatic Python, you can't get much simpler than this. On the downside, the decorator does require access to the module via the __module__ attribute and the sys.modules cache, both of which may be problematic in some of the more esoteric setups (like custom import machinery, or wrapping functions from another module to create functions in this module).
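Usage then looks like this (spam and Ham are placeholder names):

from .utility import export

@export
def spam():
    ...

@export
class Ham:
    ...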
The python part of the atpublic package by Barry Warsaw does something similar to this. It offers some keyword-based syntax, too, but the decorator variant relies on the same patterns used above.
This great answer by Aaron Hall suggests something very similar, with two more lines of code as it doesn't use __dict__.setdefault. It might be preferable if manipulating the module __dict__ is problematic for some reason.
You could simply declare the decorator at the module level like this:
__all__ = []

def export(obj):
    __all__.append(obj.__name__)
    return obj
This is perfect if you only use this in a single module. At 4 lines of code (plus probably some empty lines for typical formatting practices) it's not overly expensive to repeat this in different modules, but it does feel like code duplication in those cases.
You could define the following in some utility library:
def exporter():
    all = []
    def decorator(obj):
        all.append(obj.__name__)
        return obj
    return decorator, all

export, __all__ = exporter()
export(exporter)

# possibly some other utilities, decorated with @export as well
Then inside your public library you'd do something like this:
from . import utility

export, __all__ = utility.exporter()

# start using @export
Using the library takes two lines of code here. It combines the definition of __all__ and the decorator. So people searching for one of them will find the other, thus helping readers to quickly understand your code. The above will also work in exotic environments, where the module may not be available from the sys.modules cache or where the __module__ property has been tampered with or some such.
https://github.com/russianidiot/public.py has yet another implementation of such a decorator. Its core file is currently 160 lines long! The crucial point appears to be that it uses the inspect module to obtain the appropriate module based on the current call stack.
This is not a decorator approach, but provides the level of efficiency I think you're after.
https://pypi.org/project/auto-all/
You can use the two functions provided with the package to "start" and "end" capturing the module objects that you want included in the __all__ variable.
from auto_all import start_all, end_all

# Imports outside the start and end functions won't be externally available.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects
end_all(globals())
The functions in the package are trivial (a few lines), so could be copied into your code if you want to avoid external dependencies.
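For a sense of the mechanism, here is a minimal sketch of how such start/end capturing could be written (an assumption on my part, not the package's actual source):

def start_all(globs):
    # Remember which names exist before the public section begins.
    globs['_names_before'] = set(globs.keys())

def end_all(globs):
    # Everything added since start_all, minus private names, becomes public.
    added = set(globs.keys()) - globs.pop('_names_before')
    globs.setdefault('__all__', []).extend(
        name for name in added if not name.startswith('_'))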
While the other variants are technically correct to a certain extent, one might also want to make sure that:
if the target module already has __all__ declared, it is handled correctly;
the target appears in __all__ only once:
# utils.py
import sys
from typing import Any

def export(target: Any) -> Any:
    """
    Mark a module-level object as exported.
    Simplifies tracking of objects available via wildcard imports.
    """
    mod = sys.modules[target.__module__]
    __all__ = getattr(mod, '__all__', None)
    if __all__ is None:
        __all__ = []
        setattr(mod, '__all__', __all__)
    elif not isinstance(__all__, list):
        __all__ = list(__all__)
        setattr(mod, '__all__', __all__)
    target_name = target.__name__
    if target_name not in __all__:
        __all__.append(target_name)
    return target
I'm currently writing a class that needs os, stat and some others.
What's the best way to import these modules in my class?
I'm thinking about when others will use it; I want the 'dependency' modules to be already imported when the class is instantiated.
Now I'm importing them in my methods, but maybe there's a better solution.
If your module will always import another module, always put it at the top as PEP 8 and the other answers indicate. Also, as @delnan mentions in a comment, sys, os, etc. are being used anyway, so it doesn't hurt to import them globally.
However, there is nothing wrong with conditional imports, if you really only need a module under certain runtime conditions.
If you only want to import them if the class is defined, like if the class is in a conditional block or another class or method, you can do something like this:
condition = True

if condition:
    class C(object):
        os = __import__('os')
        def __init__(self):
            print(self.os.listdir)

C.os
c = C()
If you only want it to be imported if the class is instantiated, do it in __new__ or __init__.
import sys
from importlib import import_module

class Foo():
    def __init__(self):
        # Depends on the configuration of the application.
        self.condition = True  # "True" or "False"
        if self.condition:
            self.importedModule = import_module('moduleName')

    # Then, inside some method (callIt is an illustrative name),
    # guard the call either way:
    def callIt(self, params):
        if 'moduleName' in sys.modules:
            self.importedModule.callFunction(params)
        # or
        if self.condition:
            self.importedModule.callFunction(params)
PEP 8 on imports:
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
This makes it easy to see all modules used by the file at hand, and avoids having to replicate the import in several places when a module is used in more than one place. Everything else (e.g. function-/method-level imports) should be an absolute exception and needs to be justified well.
This official document (search for the section "Imports") states that imports should normally be put at the top of your source file. I would abide by this rule, apart from special cases.
I'm trying to find a way to lazily load a module-level variable.
Specifically, I've written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won't tell you where its download folder is, so I've written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the "Downloads" directory.
This takes a second or two, so I'd like to have it evaluated lazily, rather than at module import time.
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
You can't do it with modules, but you can disguise a class "as if" it was a module, e.g., in itun.py, code...:
import sys

class _Sneaky(object):
    def __init__(self):
        self.download = None

    @property
    def DOWNLOAD_PATH(self):
        if not self.download:
            self.download = heavyComputations()
        return self.download

    def __getattr__(self, name):
        return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()
Now anybody can import itun... and in fact get your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object than inside _Sneaky!
It turns out that as of Python 3.7, it's possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562 and documented in the data model chapter in the Python reference documentation.
# mymodule.py
from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ...  # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
I used Alex's implementation on Python 3.3, but this crashes miserably:
The code
def __getattr__(self, name):
    return globals()[name]
is not correct because an AttributeError should be raised, not a KeyError.
This crashed immediately under Python 3.3, because a lot of introspection is done
during the import, looking for attributes like __path__, __loader__ etc.
Here is the version that we use now in our project to allow for lazy imports in a module. The __init__ of the module is delayed until the first attribute access that does not have a special name:
""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)
Lazy module variables--can it be done?
class _Sneaky(object):
def __init__(self, name):
self.module = sys.modules[name]
sys.modules[name] = self
self.initializing = True
def __getattr__(self, name):
# call module.__init__ after import introspection is done
if self.initializing and not name[:2] == '__' == name[-2:]:
self.initializing = False
__init__(self.module)
return getattr(self.module, name)
_Sneaky(__name__)
The module now needs to define an __init__ function. This function can be used to import modules that might import ourselves:
def __init__(module):
    ...
    # do something that imports config.py again
    ...
The code can be put into another module, and it can be extended with properties
as in the examples above.
Maybe that is useful for somebody.
For Python 3.5 and 3.6, the proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module's __class__. So, here's a solution loosely based on Christian Tismer's answer but probably not resembling it much at all:
import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path

sys.modules[__name__].__class__ = _Sneaky
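Usage is then transparent to importers; assuming the code above lives in a module called mymod:

import mymod

print(mymod.DOWNLOAD_FOLDER_PATH)  # computed on first access
print(mymod.DOWNLOAD_FOLDER_PATH)  # cached thereafter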
For Python 3.7 and later, you can define a module-level __getattr__() function. See PEP 562 for details.
Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:
Inside your module, put something like:
def _long_function():
    # print() function to show this is called only once
    print("Determining DOWNLOAD_FOLDER_PATH...")
    # Determine the module-level variable
    path = "/some/path/here"
    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path
    # ... and return it
    return path

def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()
    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
From this it should be clear that the _long_function() isn't executed when you import your module, e.g.:
print("-- before import --")
import somemodule
print("-- after import --")
results in just:
-- before import --
-- after import --
But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.
For example, with the first block above inside the module "somemodule.py", the following code:
import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
produces:
--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--
or, more clearly:
# LINE OF CODE # OUTPUT
import somemodule # (nothing)
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # Determining DOWNLOAD_FOLDER_PATH...
# /some/path/here
print("--") # --
print(somemodule.DOWNLOAD_FOLDER_PATH) # /some/path/here
print("--") # --
Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.
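A minimal sketch of such a __dir__, placed in the same module:

def __dir__():
    # Advertise the lazy attribute alongside the module's real globals.
    return sorted(set(globals()) | {"DOWNLOAD_FOLDER_PATH"})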
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
I think you are correct in saying that a function is the best solution to your problem here.
I will give you a brief example to illustrate.
# myfile.py - an example module with some expensive module-level code.
import os

# expensive operation to crawl up in directory structure
The expensive operation will be executed on import if it is at module level. There is no way to stop this, short of lazily importing the entire module!
# myfile2.py - a module with expensive code placed inside a function.
import os

def getdownloadsfolder(curdir=None):
    """a function that will search upward from the user's current directory
    to find the 'Downloads' folder."""
    # expensive operation now here.
You will be following best practice by using this method.
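For illustration, one possible implementation of that upward search (my own sketch, not the answer's code):

import os

def getdownloadsfolder(curdir=None):
    """Search upward from curdir (default: the current directory) for 'Downloads'."""
    curdir = os.path.abspath(curdir or os.getcwd())
    while True:
        candidate = os.path.join(curdir, "Downloads")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(curdir)
        if parent == curdir:  # reached the filesystem root
            return None
        curdir = parent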
Recently I came across the same problem, and have found a way to do it.
class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        # print('initializing')
        pass

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)
        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)
With this LazyObject, you can define an init method for the object, and the object will be initialized lazily. Example code looks like:
import time

o = LazyObject()

def slow_init(self):
    time.sleep(1)  # simulate slow initialization
    self.data = 'done'

o.init = slow_init
The o object above will then have exactly the same methods as the 'done' string has; for example, you can do:
# o will be initialized, then apply the `len` method
assert len(o) == 4
complete code with tests (works in 2.7) can be found here:
https://gist.github.com/observerss/007fedc5b74c74f3ea08
If that variable lived in a class rather than a module, then you could overload __getattr__, or better yet, populate it in __init__.
SPEC 1
Probably the best known recipe for lazy loading module attributes (and modules) is in SPEC 1 (Draft) at scientific-python.org. SPECs are operational guidelines for projects in the Scientific Python ecosystem. There is discussion around SPEC 1 at the Scientific Python Discourse, and the solution is offered as a package on PyPI as lazy_loader. The lazy_loader implementation relies on the module __getattr__ support introduced in Python 3.7 (PEP 562), and it is used in scikit-image, NetworkX, and partially in SciPy.
Example usage:
The following example is using the lazy_loader PyPI package. You could also just copy-paste the source code to be part of your project.
# mypackage/__init__.py
import lazy_loader

__getattr__, __dir__, __all__ = lazy_loader.attach(
    __name__,
    submodules=['bar'],
    submod_attrs={
        'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
        'grok.spam': ['spam_a', 'spam_b', 'spam_c'],
    },
)
This is the lazy import equivalent of:
from . import bar
from .foo.morefoo import FooFilter, do_foo, MODULE_VARIABLE
from .grok.spam import (spam_a, spam_b, spam_c)
Short explanation on lazy_loader.attach
If you want to lazy-load a module, you list it in submodules (which is a list)
If you want to lazy-load something from a module (function, class, etc.), you list it in submod_attrs (which is a dict)
Type checking
Static type checkers and IDEs cannot infer type information from lazily loaded imports. As a workaround, you may use type stubs (.pyi files), like this:
# mypackage/__init__.pyi
from .foo.morefoo import FooFilter as FooFilter, do_foo as do_foo, MODULE_VARIABLE as MODULE_VARIABLE
from .grok.spam import spam_a as spam_a, spam_b as spam_b, spam_c as spam_c
SPEC 1 mentions that this X as X syntax is necessary due to PEP 484.
Side notes
There was recently a PEP for Lazy Imports, PEP 690, but it was rejected.
In TensorFlow, there is a lazy loading class at util.lazyloader.
There is one blog post by Brett Cannon (a Python core developer), where in 2018 he showed a module __getattr__ based implementation of lazy_loader and provided it in a package called modutil, but the project is marked archived on GitHub. This has been an inspiration for the scientific-python lazy_loader.