Creating a pseudo-module that creates submodules at runtime

To support extensions in my Python project, I'm trying to create a pseudo-module that will serve "extension modules" as its submodules. I'm having a problem treating the submodules as modules: it seems I need to access them using from..import on the main pseudo-module and can't just import them by their full path.
Here is a minimal working example:
import sys
from types import ModuleType

class Foo(ModuleType):
    @property
    def bar(self):
        # Here I would actually find the location of `bar.py` and load it
        bar = ModuleType('foo.bar')
        sys.modules['foo.bar'] = bar
        return bar

sys.modules['foo'] = Foo('foo')
from foo import bar # without this line the next line fails
import foo.bar
This works, but if I comment out the from foo import bar line, it'll fail with:
ImportError: No module named bar
on Python2, and on Python3 it'll fail with:
ModuleNotFoundError: No module named 'foo.bar'; 'foo' is not a package
If I add the fields to make it a package:
class Foo(ModuleType):
    __all__ = ('bar',)
    __package__ = 'foo'
    __path__ = []
    __file__ = __file__
It'll fail on:
ModuleNotFoundError: No module named 'foo.bar'
From what I understand, the problem is that I did not set sys.modules['foo.bar'] yet. But... to fill sys.modules I need to load the module first, and I don't want to do it unless the user of my project explicitly imports it.
Is there any way to make Python realize that when it sees import foo.bar it needs to load foo first (or I can just guarantee foo will already be loaded at that point) and take bar from it?

This post does NOT answer "This is how you do it."
If you want to know how to do this yourself look at PEP 302 or Idan Arye's solution.
This post instead presents a recipe that makes it easy to write. The recipe is at the end of this answer.
The recipe defines two classes intended for use: PseudoModule and PseudoPackage. They differ only in whether import foo.x raises an error stating that foo isn't a package, or tries to load x and checks that it is a module. Several example uses are outlined below.
PseudoModule
PseudoModule can be used as a decorator on a function. It creates a new module object that, the first time an attribute is accessed, calls the decorated function with the name of the attribute and the namespace of previously defined elements.
For example, this will make a module that assigns a new integer to each attribute accessed:
@PseudoModule
def access_tracker(attr, namespace):
    namespace["_count"] = namespace.get("_count", -1) + 1
    return namespace["_count"]
    #PseudoModule will set `namespace[attr] = <return value>` for you
    #this can be overridden by passing `remember_results=False` to the constructor

sys.modules["access_tracker"] = access_tracker
from access_tracker import zero, one, two, three
assert zero == 0 and one == 1 and two == 2 and three == 3
PseudoPackage
PseudoPackage is used the same way as PseudoModule. However, if the decorated function returns a module (or package), its name is corrected to be qualified as a subpackage and sys.modules is updated as needed. (The top-level package still needs to be added to sys.modules manually.)
Here is an example use of PseudoPackage:
spam_submodules = {"bacon"}
spam_attributes = {"eggs", "ham"}

@PseudoPackage
def spam(name, namespace):
    print("getting a component of spam:", name)
    if name in spam_submodules:
        @PseudoModule
        def submodule(attr, nested_namespace):
            print("getting a component of submodule {}: {}".format(name, attr))
            return attr #use the string of the attribute
        return submodule #PseudoPackage will rename the module to be spam.bacon for us
    elif name in spam_attributes:
        return "supported attribute"
    else:
        raise AttributeError("spam doesn't have any {!r}.".format(name))

sys.modules["spam"] = spam
import spam.bacon
#prints "getting a component of spam: bacon"
assert spam.bacon.something == "something"
#prints "getting a component of submodule bacon: something"
from spam import eggs
#prints "getting a component of spam: eggs"
assert eggs == "supported attribute"
import spam.ham #ham isn't a submodule, raises error!
The way PseudoPackage is set up also makes arbitrary-depth packages very easy, although this specific example doesn't accomplish much:
def make_abstract_package(qualname = ""):
    "makes a PseudoPackage that has arbitrary nesting of subpackages"
    def gen_func(attr, namespace):
        print("getting {!r} from package {!r}".format(attr, qualname))
        return make_abstract_package("{}.{}".format(qualname, attr))
    #can pass the name of the module as second argument if needed
    return PseudoPackage(gen_func, qualname)

sys.modules["foo"] = make_abstract_package("foo")
from foo.bar.baz import thing_I_want
##prints:
# getting 'bar' from package 'foo'
# getting 'baz' from package 'foo.bar'
# getting 'thing_I_want' from package 'foo.bar.baz'
print(thing_I_want)
#prints "<module 'foo.bar.baz.thing_I_want' from '<PseudoPackage>'>"
Few notes on implementation
As general guidelines:
The function that computes attributes of the module should not import the module it's defining the attributes for
If you want a package or module to be available for import, you need to put it in sys.modules yourself.
PseudoPackage assumes each submodule is unique; don't reuse module objects.
It is also worth noting that sys.modules is only updated with submodules of PseudoPackages when an import statement that requires the name to be a module is executed. For example, if foo is a package already in sys.modules but foo.x has not been referenced yet, then all these assertions will pass:
assert "foo.x" not in sys.modules and not hasattr(foo,"x")
import foo; foo.x #foo.x is computed but not added to sys.modules
assert "foo.x" not in sys.modules and hasattr(foo,"x")
from foo import x #x is retrieved from namespace but sys.modules is still not affected
assert "foo.x" not in sys.modules
import foo.x #if x is a module then "foo.x" is added to sys.modules
assert "foo.x" in sys.modules
Also, in the above case, if foo.x isn't a module then the statement import foo.x raises a ModuleNotFoundError.
Finally, while the problematic edge cases I have identified can be avoided by following the guidelines above, the docstring for _PseudoPackageLoader describes the implementation details responsible for unwanted behaviour for possible future modifications.
The recipe
import sys
from types import ModuleType
import importlib.abc #uses Loader and MetaPathFinder, more for inspection purposes than use

class RawPseudoModule(ModuleType):
    """
    see PseudoModule for documentation, this class is not intended for direct use.
    RawPseudoModule does not handle __path__ so the generating function of direct
    instances are expected to make and return an appropriate value for __path__
    *** if you do not know what an appropriate value for __path__ is
    then use PseudoModule instead ***
    """
    #using slots keeps these two variables out of the module dictionary
    __slots__ = ["__generating_func", "__remember_results"]
    def __init__(self, func, name=None, remember_results = True):
        name = name or func.__name__
        super(RawPseudoModule, self).__init__(name)
        self.__file__ = "<{0.__class__.__name__}>".format(self)
        self.__generating_func = func
        self.__remember_results = remember_results

    def __getattr__(self, attr):
        value = self.__generating_func(attr, vars(self))
        if self.__remember_results:
            setattr(self, attr, value)
        return value
class PseudoModule(RawPseudoModule):
    """
    A module that has attributes generated from a specified function
    The generating function passed to the constructor should have the signature:
    f(attr:str, namespace:dict) -> object:
     - attr is the name of the attribute accessed
     - namespace is the currently defined values in the module
    the function should return a value for the attribute or raise an AttributeError if it doesn't exist.
    by default the result is then saved to the namespace so you don't
    have to explicitly do "namespace[attr] = <value>" however this behaviour
    can be overridden by specifying "remember_results = False" in the constructor.
    If no name is specified in the constructor the function name will be
    used for the module name instead, this allows the class to be used as a decorator
    Note: the PseudoModule class is set up so that "import foo.bar"
          when foo is a PseudoModule will fail stating "'foo' is not a package".
     - to allow importing submodules use PseudoPackage.
     - to handle the internal __path__ manually use RawPseudoModule.
    Note: the module is NOT added to sys.modules automatically.
    """
    def __getattr__(self, attr):
        #to not have submodules then __path__ must not exist
        if attr == "__path__":
            msg = "{0.__name__} is a PseudoModule, it is not a package so it doesn't have a __path__"
            #this error message would only be seen by people who explicitly access __path__
            raise AttributeError(msg.format(self))
        return super(PseudoModule, self).__getattr__(attr)
class PseudoPackage(RawPseudoModule):
    """
    A version of PseudoModule that sets itself up to allow importing subpackages
    When a submodule is imported from a PseudoPackage:
     - it is evaluated with the generating function.
     - the name of the submodule is overridden to be correctly qualified
     - and it is added to sys.modules to allow repeated imports.
    Note: the top level package still needs to be added to sys.modules manually
    Note: A RecursionError will be raised if the code that generates submodules
          attempts to import another submodule from the PseudoPackage.
    """
    #IMPLEMENTATION DETAIL: technically this doesn't deal with adding submodules to
    # sys.modules, that is handled in _PseudoPackageLoader
    # which explicitly checks for instances of PseudoPackage
    __path__ = [] #packages must have a __path__ to be recognized as packages.
    def __getattr__(self, attr):
        value = super(PseudoPackage, self).__getattr__(attr)
        if isinstance(value, ModuleType):
            #I'm just going to say if it's a module then the name must be in this format.
            value.__name__ = self.__name__ + "." + attr
        return value
class _PseudoPackageLoader(importlib.abc.Loader, importlib.abc.MetaPathFinder):
    """
    Singleton finder and loader for pseudo packages
    Whenever a subpackage of a PseudoPackage (that is already in sys.modules) is imported
    this will handle loading it and adding the subpackage to sys.modules
    Note that although PEP 302 states the finder should not depend on the parent
    being loaded in sys.modules, this is implemented under the understanding that
    the user of PseudoPackage will add their module to sys.modules manually themselves
    so this will work only when the parent is present in sys.modules
    Also PEP 302 indicates the module should be added to sys.modules first in case
    it is imported during its execution, however this is impossible due to the
    nature of how the module actually gets loaded.
    So for heaven's sake don't try to import a pseudo package or a module that uses
    a pseudo package from within the code that generates it.
    I have only tested this when the sub module is either PseudoModule or PseudoPackage
    and it was created new from the generating function, ideally there would be a way
    to allow the generating function to return an unexecuted module and this would
    properly handle executing it but I don't know how to deal with that.
    """
    def find_module(self, fullname, path):
        #this will only support loading if the parent package is a PseudoPackage
        base,_,_ = fullname.rpartition(".")
        if isinstance(sys.modules.get(base), PseudoPackage):
            return self
        #I found that `if path is PseudoPackage.__path__` worked the same way for all the cases I tested
        #however since load_module will fail if the base part isn't in sys.modules
        # it seems safer to just check for that.

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        base,_,sub = fullname.rpartition(".")
        parent = sys.modules[base]
        try:
            submodule = getattr(parent, sub)
        except AttributeError:
            #when we just access `foo.x` it raises an AttributeError
            #but `import foo.x` should instead raise an ImportError
            raise ImportError("cannot import name {!r}".format(sub))
        if not isinstance(submodule, ModuleType):
            #match the format of error raised when the submodule isn't a module
            #example: `import sys.path` raises the same format of error.
            raise ModuleNotFoundError("No module named {}".format(fullname))
        #fill all the fields as described in PEP 302 except __name__
        submodule.__loader__ = self
        submodule.__package__ = base
        submodule.__file__ = getattr(submodule, "__file__", "<submodule of PseudoPackage>")
        #if there was a way to do this before the module was made that'd be nice
        sys.modules[fullname] = submodule
        #if we needed to execute the body of an unloaded module it'd be done here.
        return submodule

#add the loader to sys.meta_path so it will handle our pseudo packages
sys.meta_path.append(_PseudoPackageLoader())

Thanks to the link @TadhgMcDonald-Jensen provided I managed to solve it:
import sys
from types import ModuleType

class FooImporter(object):
    module = ModuleType('foo')
    module.__path__ = [module.__name__]

    def find_module(self, fullname, path):
        if fullname == self.module.__name__:
            return self
        if path == [self.module.__name__]:
            return self

    def load_module(self, fullname):
        if fullname == self.module.__name__:
            return sys.modules.setdefault(fullname, self.module)
        assert fullname.startswith(self.module.__name__ + '.')
        try:
            return sys.modules[fullname]
        except KeyError:
            submodule = ModuleType(fullname)
            name = fullname[len(self.module.__name__) + 1:]
            setattr(self.module, name, submodule)
            sys.modules[fullname] = submodule
            return submodule

sys.meta_path.append(FooImporter())
from foo import bar
@TadhgMcDonald-Jensen - please make an answer so that I can approve it.
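For what it's worth, find_module and load_module are the legacy PEP 302 protocol, and the import system stopped falling back to them in Python 3.12. A minimal sketch of the same idea on the find_spec / exec_module protocol might look like the following. The class name LazyFooFinder and the loaded_by attribute are made up for illustration, and the exec_module body is only a placeholder for whatever would actually locate and run an extension.

```python
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec


class LazyFooFinder(MetaPathFinder, Loader):
    """Hypothetical finder/loader that serves 'foo' and 'foo.<name>' on demand."""

    root = "foo"

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.root:
            # is_package=True gives the module an (empty) __path__,
            # which is what makes `import foo.bar` legal.
            return ModuleSpec(fullname, self, is_package=True)
        if fullname.startswith(self.root + "."):
            return ModuleSpec(fullname, self)
        return None  # not ours; let the other finders try

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Placeholder: a real implementation would find and execute
        # the extension's source here.
        module.loaded_by = type(self).__name__


# Assumption: inserted first so it takes precedence over any real `foo`.
sys.meta_path.insert(0, LazyFooFinder())

assert "foo.bar" not in sys.modules  # nothing is loaded up front
import foo.bar
print(foo.bar.loaded_by)
```

The finder is inserted at the front of sys.meta_path here so that it wins over any real foo on sys.path; appending it, as in the code above, would be the less intrusive choice.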

Related

How can I redirect module imports with modern Python?

I am maintaining a python package in which I did some restructuring. Now, I want to support clients who still do from my_package.old_subpackage.foo import Foo instead of the new from my_package.new_subpackage.foo import Foo, without explicitly reintroducing many files that do the forwarding. (old_subpackage still exists, but no longer contains foo.py.)
I have learned that there are "loaders" and "finders", and my impression was that I should implement a loader for my purpose, but I only managed to implement a finder so far:
import importlib.util
import sys

RENAMED_PACKAGES = {
    'my_package.old_subpackage.foo': 'my_package.new_subpackage.foo',
}

# TODO: ideally, we would not just implement a "finder", but also a "loader"
# (using the importlib.util.module_for_loader decorator); this would enable us
# to get module contents that also pass identity checks
class RenamedFinder:
    @classmethod
    def find_spec(cls, fullname, path, target=None):
        renamed = RENAMED_PACKAGES.get(fullname)
        if renamed is not None:
            sys.stderr.write(
                f'WARNING: {fullname} was renamed to {renamed}; please adapt import accordingly!\n')
            return importlib.util.find_spec(renamed)
        return None

sys.meta_path.append(RenamedFinder())
https://docs.python.org/3.5/library/importlib.html#importlib.util.module_for_loader and related functionality, however, seem to be deprecated. I know it's not a very pythonic thing I am trying to achieve, but I would be glad to learn that it's achievable.

When your package's __init__.py is imported, you can place whatever objects you want into sys.modules; the values you put there will be returned by import statements:
from . import new_package
from .new_package import module1, module2
import sys
sys.modules["my_lib.old_package"] = new_package
sys.modules["my_lib.old_package.module1"] = module1
sys.modules["my_lib.old_package.module2"] = module2
If someone now uses import my_lib.old_package or import my_lib.old_package.module1, the import resolves to my_lib.new_package or my_lib.new_package.module1 respectively. Since the import machinery already finds the keys in the sys.modules dictionary, it never even begins looking for the old files.
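To see the aliasing in isolation, here is a small self-contained sketch. It fakes the package with bare ModuleType objects instead of real files, so the module objects and the answer attribute are stand-ins invented for the demo, not part of any real my_lib.

```python
import importlib
import sys
from types import ModuleType

# Stand-ins for a real package and its renamed subpackage (demo only).
my_lib = ModuleType("my_lib")
my_lib.__path__ = []  # an (empty) __path__ marks a module as a package
new_package = ModuleType("my_lib.new_package")
new_package.__path__ = []
new_package.answer = 42
my_lib.new_package = new_package

sys.modules["my_lib"] = my_lib
sys.modules["my_lib.new_package"] = new_package
# The alias: the old dotted name points at the very same module object.
sys.modules["my_lib.old_package"] = new_package

old = importlib.import_module("my_lib.old_package")
new = importlib.import_module("my_lib.new_package")
from my_lib.old_package import answer

print(old is new, answer)
```

Because both names map to one object, identity checks and isinstance checks on anything defined in the module keep working across the old and new import paths.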
If you want to avoid importing all the submodules immediately, you can emulate a bit of lazy loading by placing a module with a __getattr__ in sys.modules:
from types import ModuleType
import importlib
import sys

class LazyModule(ModuleType):
    def __init__(self, name, mod_name):
        super().__init__(name)
        # remember the name of the real module to import on first use
        self.__mod_name = mod_name

    def __getattr__(self, attr):
        if "_lazy_module" not in self.__dict__:
            self._lazy_module = importlib.import_module(self.__mod_name, package="my_lib")
        return getattr(self._lazy_module, attr)

sys.modules["my_lib.old_package"] = LazyModule("my_lib.old_package", "my_lib.new_package")

In the init file of the old module, have it import from the newer modules
Old (package.oldpkg):
foo = __import__("Path to new module")
New (package.newpkg):
class foo:
    bar = "thing"
so package.oldpkg.foo.bar is the same as package.newpkg.foo.bar.
Hope this helps!

I think that this is what you are looking for:
import importlib
import importlib.util
import sys

RENAMED_PACKAGES = {
    'my_package.old_subpackage.foo': 'my_package.new_subpackage.foo',
}

class RenamedFinder:
    @classmethod
    def find_spec(cls, fullname, path, target=None):
        renamed = RENAMED_PACKAGES.get(fullname)
        if renamed is not None:
            sys.stderr.write(
                f'WARNING: {fullname} was renamed to {renamed}; please adapt import accordingly!\n')
            spec = importlib.util.find_spec(renamed)
            spec.loader = cls
            return spec
        return None

    @staticmethod
    def create_module(spec):
        # spec.name is the new name, so this returns the real module
        return importlib.import_module(spec.name)

    @staticmethod
    def exec_module(module):
        pass

sys.meta_path.append(RenamedFinder())
Still, IMO the approach that manipulates sys.modules is preferable as it is more readable, more explicit, and provides you much more control. It might become useful especially in further versions of your package when my_package.new_subpackage.foo starts to diverge from my_package.old_subpackage.foo while you would still need to provide the old one for backward compatibility. For that reason, you would maybe need to preserve the code of both anyway.

Consolidate all the old package names into my_package.
Old packages (old_package):
image_processing (class) Will be deleted and replaced by better_image_processing
text_recognition (class) Will be deleted and replaced by better_text_recognition
foo (variable) Will be moved to better_text_recognition
still_there (class) Will not move
New packages:
super_image_processing
better_text_recognition
Redirector (class of my_package):
class old_package:
    image_processing = super_image_processing # Will be replaced
    text_recognition = better_text_recognition # Will be replaced
Your main new module (my_package):
#imports here
class super_image_processing:
    def its(gets,even,better):
        pass

class better_text_recognition:
    def now(better,than,ever):
        pass

class old_package:
    #Links
    image_processing = super_image_processing
    text_recognition = better_text_recognition
    still_there = __import__("path to unchanged module")
This allows you to delete some files and keep the rest. If you want to redirect variables you would do:
class super_image_processing:
    def its(gets,even,better):
        pass

class better_text_recognition:
    def now(better,than,ever):
        pass

class old_package:
    #Links
    image_processing = super_image_processing
    text_recognition = better_text_recognition
    foo = text_recognition.foo
    still_there = __import__("path to unchanged module")
Would this work?

How to search for a main module using pyclbr in Python3?

I want to get all the functions and classes in module: __main__ of the source code directory: /tmp/rebound/rebound.
When I use the pyclbr.readmodule_ex API:
source_code_data = pyclbr.readmodule_ex(source_code_module, path=source_code_path)
I pass it the module and its path:
DEBUG:root:Source code module: __main__, Source code path: ['/tmp/rebound/rebound/rebound']
I then get this error:
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/util.py", line 69, in _find_spec_from_path
raise ValueError('{}.__spec__ is None'.format(name))
ValueError: __main__.__spec__ is None
I then tried to use the function that is not supposed to be used by the public: _readmodule:
source_code_data = pyclbr._readmodule(source_code_module, source_code_path, )
But I could not decide what should be the value of the parameter: inpackage.
Upon tracing the code via debugger, I spotted a mistake:
def _find_spec_from_path(name, path=None):
    """Return the spec for the specified module.
    First, sys.modules is checked to see if the module was already imported. If
    so, then sys.modules[name].__spec__ is returned. If that happens to be
    set to None, then ValueError is raised. If the module is not in
    sys.modules, then sys.meta_path is searched for a suitable spec with the
    value of 'path' given to the finders. None is returned if no spec could
    be found.
    Dotted names do not have their parent packages implicitly imported. You will
    most likely need to explicitly import all parent packages in the proper
    order for a submodule to get the correct spec.
    """
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        module = sys.modules[name]
        if module is None:
            return None
        try:
            spec = module.__spec__
        except AttributeError:
            raise ValueError('{}.__spec__ is not set'.format(name)) from None
        else:
            if spec is None:
                raise ValueError('{}.__spec__ is None'.format(name))
            return spec
This is the function in the module: python3.8/importlib/util.py and it evaluates __main__ as a built-in module as it falls in the else block.
How do I differentiate __main__ of my target source code to read from the built-in __main__? In other words, how do I read the module __main__ of the codebase: rebound?

TL;DR
Try:
source_code_data = pyclbr.readmodule_ex("rebound.__main__", path=source_code_path)
Explanation
As you already know: _find_spec_from_path will search for name in sys.modules and
__main__ is always present there.
If you inspect sys.modules.keys() you'll notice that it contains dot separated module names.
Example from Ipython shell:
'IPython.display',
'IPython.extensions',
'IPython.extensions.storemagic',
'IPython.lib',
'IPython.lib.backgroundjobs',
'IPython.lib.clipboard',
'IPython.lib.display',
'IPython.lib.pretty',
'IPython.lib.security',
'IPython.paths',
Once you realize you are looking for rebound.__main__ and not __main__, the fix becomes obvious: to step into the if block, the name must not be in sys.modules. As a last remark, _find_spec_from_path has no bugs.
# python3.8/importlib/util.py
def _find_spec_from_path(name, path=None):
    # ...
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        #...
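As a rough illustration of the dotted-name approach, the sketch below builds a throwaway package and reads its __main__ with pyclbr. The package name demo_pkg and its contents are invented for the demo. Note the assumption that path lists the directory that contains the package directory, since pyclbr resolves demo_pkg first and then looks for __main__ inside it.

```python
import os
import pyclbr
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a throwaway package: root/demo_pkg/{__init__.py, __main__.py}
    package_dir = os.path.join(root, "demo_pkg")
    os.mkdir(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w"):
        pass
    with open(os.path.join(package_dir, "__main__.py"), "w") as f:
        f.write(
            "def main():\n"
            "    pass\n"
            "\n"
            "\n"
            "class Runner:\n"
            "    def run(self):\n"
            "        pass\n"
        )

    # "demo_pkg.__main__" is not in sys.modules, so the lookup goes through
    # the finders instead of hitting the interpreter's own __main__.
    tree = pyclbr.readmodule_ex("demo_pkg.__main__", path=[root])

for name, descriptor in sorted(tree.items()):
    print(name, type(descriptor).__name__, descriptor.lineno)
```

The source is only parsed, never executed, so this works even when the target __main__ has side effects at import time.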

How to use sys.path_hooks for customized loading of modules?

I hope the following question is not too long, but otherwise I cannot explain my problem and what I want:
Based on what I learned from How to use importlib to import modules from arbitrary sources? (my question from yesterday),
I have written a specific loader for a new file type (.xxx).
(In fact the xxx is an encrypted version of a pyc to protect code from being stolen).
I would like just to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.
Now, the loader is ModuleLoader, inheriting from importlib.machinery.SourcelessFileLoader.
Using sys.path_hooks the loader shall be added as a hook:
myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: This is activated once by calling modloader.activateLoader()
Upon loading a module named test (which is a test.xxx) I get:
>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>
However, when I delete content of sys.path_hooks before adding the hook:
sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
>>> modloader.activateLoader()
>>> import test
using xxx class
in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...
GENERATE CODE OBJECT ...
2 0 LOAD_CONST 0
3 LOAD_CONST 1 ('foo2')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (foo2)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>> test
<module 'test' from '.\\test.xxx'>
The module is imported correctly after conversion of the files content to a code object.
However I cannot load the same module from a package: import pack.test
Note: __init__.py is of course as an empty file in pack directory.
>>> import pack.test
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>
Worse still, I can no longer load plain *.py modules from that package; I get the same error as above:
>>> import pack.testpy
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>
As I understand it, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without clearing sys.path_hooks) not recognize the new extension "xxx", while the second variant (clearing sys.path_hooks) does?
It looks like the machinery is throwing an exception rather than traversing further to the next entry, when an entry of sys.path_hooks is not able to recognize "xxx".
And why is the second version working for py, pyc and xxx modules in the current directory, but not working in the package pack? I would expect that py and pyc is not even working in the current dir, because sys.path_hooks contains only a hook for "xxx"...

The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!
Quick Breakdown
sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder in particular will then take the contents of sys.path and pass it to the factory functions in sys.path_hooks. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache. From then on PathFinder will only ask those cached finder instances if they can provide the requested module.
If you look at the contents of sys.path_importer_cache, you'll see all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc) will be mapped to other finders.
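A quick way to check this on your own interpreter is to print the hooks and the cache. This is just an inspection sketch; the exact hooks and paths you see depend on your installation, so no particular output is implied.

```python
import json  # any import served from a sys.path directory populates the cache
import sys
from importlib.machinery import FileFinder

# The factory functions PathFinder consults, in order.
hook_names = [getattr(hook, "__name__", repr(hook)) for hook in sys.path_hooks]
print("path hooks:", hook_names)

# Which sys.path entries ended up owned by a FileFinder?
file_finder_dirs = [
    entry
    for entry, finder in sys.path_importer_cache.items()
    if isinstance(finder, FileFinder)
]
for entry in file_finder_dirs:
    print("FileFinder ->", entry)

# Entries that no hook accepted are cached as None.
unclaimed = [e for e, f in sys.path_importer_cache.items() if f is None]
print("unclaimed entries:", unclaimed)
```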
Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.
Alternatively, if you insert your new factory to sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal with what it will accept, this would lead to only your loader being used, as you've already observed.
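The "liberal" behaviour is easy to reproduce in isolation. In the sketch below, SourceFileLoader merely stands in for a custom loader, and the file names demo.xxx and other.py are made up. The hook accepts the directory without looking at what is in it, and the resulting finder then only knows about the suffixes it was built with.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

# A hook for a made-up ".xxx" suffix; SourceFileLoader is a stand-in loader.
hook = FileFinder.path_hook((SourceFileLoader, [".xxx"]))

with tempfile.TemporaryDirectory() as directory:
    for filename in ("demo.xxx", "other.py"):
        with open(os.path.join(directory, filename), "w") as f:
            f.write("value = 1\n")

    # Any existing directory is accepted, whatever it contains.
    finder = hook(directory)

    xxx_spec = finder.find_spec("demo")   # found: suffix is registered
    py_spec = finder.find_spec("other")   # None: ".py" is unknown to this finder

    # Only non-directories are refused, via ImportError.
    try:
        hook(os.path.join(directory, "demo.xxx"))
        rejected = False
    except ImportError:
        rejected = True

print(xxx_spec.origin, py_spec, rejected)
```

So whichever FileFinder hook comes first in sys.path_hooks claims every directory, and modules with suffixes it does not know are simply not found there.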
Making it Work
So you can either try to adjust that existing factory to also support your file extension and importer (which is difficult as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is add a new meta path finder.
For example, from my own project:
import sys
from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename
from sibilant.module import prep_module, exec_module

SOURCE_SUFFIXES = [".lspy", ".sibilant"]
_path_importer_cache = {}
_path_hooks = []

class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """
    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()

    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None

    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder

class SibilantSourceFileLoader(FileLoader):
    def create_module(self, spec):
        return None

    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")

    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))
        prep_module(module)
        exec_module(module, source, filename=filename)

def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)

def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())

def _install():
    done = False

    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True

    return install

_install = _install()
_install()
The SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hooks and sys.path_importer_cache with similar implementations which instead look in a _path_hooks and _path_importer_cache which are local to this module.
During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse the sys.path and try to find a match with one of my own file extensions.
Figuring More Out
I ended up delving into the source for the _bootstrap_external module
https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install function and the PathFinder.find_spec method are the best starting points to seeing why things work the way they do.

@obriencj's analysis of the situation is correct. But I came up with a different solution to this problem that doesn't require putting anything in sys.meta_path. Instead, it installs a special hook in sys.path_hooks that acts as a sort of middleware between the PathFinder in sys.meta_path and the hooks in sys.path_hooks. Rather than just using the first hook that says "I can handle this path!", it tries all matching hooks in order until it finds one that actually returns a useful ModuleSpec from its find_spec method:
import os
import sys
from importlib.abc import PathEntryFinder

@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.
    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory. So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.
    This class provides its own sys.path_hooks hook as follows: If inserted
    on sys.path_hooks (it should be inserted early so that it can supersede
    anything else). Its find_spec method then calls each hook on
    sys.path_hooks after itself and, for each hook that can handle the given
    sys.path entry, it calls the hook to create a finder, and calls that
    finder's find_spec. So each sys.path_hooks entry is tried until a spec is
    found or all finders are exhausted.
    """
    class hook:
        """
        Use this little internal class rather than a function with a closure
        or a classmethod or anything like that so that it's easier to
        identify our hook and skip over it while processing sys.path_hooks.
        """
        def __init__(self, basepath=None):
            # os.path.abspath(None) raises TypeError, so keep None as-is
            self.basepath = (
                os.path.abspath(basepath) if basepath is not None else None)

        def __call__(self, path):
            if not os.path.isdir(path):
                raise ImportError('only directories are supported', path=path)
            elif not self.handles(path):
                raise ImportError(
                    'only directories under {} are supported'.format(
                        self.basepath), path=path)
            return MetaFileFinder(path)

        def handles(self, path):
            """
            Return whether this hook will handle the given path, depending on
            what its basepath is.
            """
            path = os.path.abspath(path)
            return (self.basepath is None or
                    os.path.commonpath([self.basepath, path]) == self.basepath)

    def __init__(self, path):
        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None
        last = len(sys.path_hooks) - 1
        for idx, hook in enumerate(sys.path_hooks):
            if isinstance(hook, self.__class__.hook):
                continue
            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass
            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass
                try:
                    self._finder_cache[hook] = finder
                except TypeError:
                    # The hook is unhashable for some reason so we don't bother
                    # caching it
                    pass
            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if (spec is not None and
                        (spec.loader is not None or idx == last)):
                    # If no __init__.<suffix> was found by any Finder,
                    # we may be importing a namespace package (which
                    # FileFinder.find_spec returns in this case). But we
                    # only want to return the namespace ModuleSpec if we've
                    # exhausted every other finder first.
                    return spec
        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
finder.invalidate_caches()
#classmethod
def install(cls, basepath=None):
"""
Install the MetaFileFinder in the front sys.path_hooks, so that
it can support any existing sys.path_hooks and any that might
be appended later.
If given, only support paths under and including basepath. In this
case it's not necessary to invalidate the entire
sys.path_importer_cache, but only any existing entries under basepath.
"""
if basepath is not None:
basepath = os.path.abspath(basepath)
hook = cls.hook(basepath)
sys.path_hooks.insert(0, hook)
if basepath is None:
sys.path_importer_cache.clear()
else:
for path in list(sys.path_importer_cache):
if hook.handles(path):
del sys.path_importer_cache[path]
This is still, depressingly, far more complicated than should be necessary. I feel like on Python 2, before the import system rewrite, it was much simpler to do this, since less of the support for the built-in module types (.py, etc.) was built on top of the import hooks themselves, so it was harder to break importing normal modules by adding hooks to import new module types. I'm going to start a discussion on python-ideas to see if there's any way we can improve this situation.

I came up with yet another alternative tweak. I won't say it is beautiful, as it wraps a closure around an already existing one, but at least it is short :)
It adds loaders to the default FileFinder objects through a new hook. The original path_hook_for_FileFinder is wrapped in a closure and the loaders are injected into the FileFinder objects returned by the original hook.
After the new hook is added, the path_importer_cache is cleared, as it is already filled with the original FileFinder objects. Those could also be updated dynamically, but I did not bother for now.
Disclaimer: not extensively tested yet. It does what I need in the easiest possible way I know, but the import system is complicated enough to produce funny side-effects for a tweak like this.
import importlib
import importlib.machinery
import sys
def extend_path_hook_for_FileFinder(*loader_details):
    orig_hook, orig_pos = None, None
    for i, hook in enumerate(sys.path_hooks):
        # Not every hook is a function, so don't assume __name__ exists
        if getattr(hook, '__name__', None) == 'path_hook_for_FileFinder':
            orig_hook, orig_pos = hook, i
            break
    if orig_hook is None:
        raise RuntimeError('FileFinder path hook not found in sys.path_hooks')
    sys.path_hooks.remove(orig_hook)
    def extended_path_hook_for_FileFinder(path):
        orig_finder = orig_hook(path)
        loaders = []
        for loader, suffixes in loader_details:
            loaders.extend((suffix, loader) for suffix in suffixes)
        # _loaders is a private attribute of FileFinder
        orig_finder._loaders.extend(loaders)
        return orig_finder
    sys.path_hooks.insert(orig_pos, extended_path_hook_for_FileFinder)
MY_SUFFIXES = ['.pymy']
class MySourceFileLoader(importlib.machinery.SourceFileLoader):
    pass
loader_detail = (MySourceFileLoader, MY_SUFFIXES)
extend_path_hook_for_FileFinder(loader_detail)
# empty cache as it is already filled with simple FileFinder
# objects for the most common path elements
sys.path_importer_cache.clear()
# sys.path_importer_cache is a plain dict and has no invalidate_caches()
importlib.invalidate_caches()
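The (loader, suffixes) pairs used above are the same loader details that FileFinder accepts in its constructor. As a rough, self-contained sketch of how a finder with an extra suffix behaves, without touching sys.path_hooks at all, the following builds a FileFinder for a temporary directory. The '.pymy' suffix comes from the answer above; the file name greeting.pymy and its contents are made up for illustration.

```python
import importlib.util
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

MY_SUFFIXES = [".pymy"]

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical module file using the custom suffix.
    with open(os.path.join(tmpdir, "greeting.pymy"), "w") as f:
        f.write("MESSAGE = 'hello from a .pymy file'\n")

    # A FileFinder that only knows about the custom suffix.
    finder = FileFinder(tmpdir, (SourceFileLoader, MY_SUFFIXES))
    spec = finder.find_spec("greeting")

    # Load the module manually from the spec instead of using `import`.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(module.MESSAGE)
```

Because nothing is registered globally here, this is a convenient way to check that a loader and its suffixes work before wiring them into the path hooks.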

Dynamically reload a class definition in Python

I've written an IRC bot using Twisted and now I've gotten to the point where I want to be able to dynamically reload functionality.
In my main program, I do from bots.google import GoogleBot and I've looked at how to use reload to reload modules, but I still can't figure out how to do dynamic re-importing of classes.
So, given a Python class, how do I dynamically reload the class definition?

Reload is unreliable and has many corner cases where it may fail. It is suitable for reloading simple, self-contained scripts. If you want to dynamically reload your code without restarting, consider using forkloop instead:
http://opensourcehacker.com/2011/11/08/sauna-reload-the-most-awesomely-named-python-package-ever/

You cannot reload the module using reload(module) when using the from X import Y form. You'd have to do something like reload(sys.modules['module']) in that case.
This might not necessarily be the best way to do what you want, but it works!
import bots.google
from twisted.words.protocols import irc
class BotClass(irc.IRCClient):
    def __init__(self):
        global plugins
        plugins = [bots.google.GoogleBot()]
    def privmsg(self, user, channel, msg):
        global plugins
        parts = msg.split(' ')
        trigger = parts[0]
        if trigger == '!reload':
            reload(bots.google)
            plugins = [bots.google.GoogleBot()]
            print "Successfully reloaded plugins"

I figured it out, here's the code I use:
import re
def reimport_class(self, cls):
    """
    Reload and reimport class "cls". Return the new definition of the class.
    """
    # Get the fully qualified name of the class.
    from twisted.python import reflect
    full_path = reflect.qual(cls)
    # Naively parse the module name and class name.
    # Can be done much better...
    match = re.match(r'(.*)\.([^\.]+)', full_path)
    module_name = match.group(1)
    class_name = match.group(2)
    # This is where the good stuff happens.
    mod = __import__(module_name, fromlist=[class_name])
    reload(mod)
    # The (reloaded definition of the) class itself is returned.
    return getattr(mod, class_name)

Better yet, run the plugins in a subprocess and supervise that subprocess; when the files change, restart the plugin process.
Edit: cleaned up.

You can use sys.modules to dynamically reload modules based on user input.
Say that you have a folder with multiple plugins such as:
module/
    cmdtest.py
    urltitle.py
    ...
You can use sys.modules in this way to load/reload modules based on user input:
import sys
name = 'module.' + userinput
# Use a membership test: indexing sys.modules raises KeyError if the
# module has not been loaded yet.
if name in sys.modules:
    reload(sys.modules[name])
else:
    print('Module not loaded. Cannot reload.')
    try:
        __import__(name)
        module = sys.modules[name]
    except ImportError:
        print('Error when trying to load %s' % userinput)
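On Python 3, reload is no longer a builtin; it lives in importlib. A minimal sketch of the same load-or-reload idea follows. The helper name load_or_reload is made up for this example, and the stdlib module colorsys merely stands in for a plugin module.

```python
import importlib
import sys

def load_or_reload(module_name):
    """Import module_name, or reload it if it is already in sys.modules."""
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)

# First call imports (or reloads, if something already imported it);
# the second call is guaranteed to take the reload branch.
mod = load_or_reload("colorsys")
mod = load_or_reload("colorsys")
print(mod.__name__)
```

importlib.reload re-executes the module's code in the existing module object, so references to the module itself stay valid, while names previously copied out of it with from ... import still point at the old objects.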

When you do a from ... import ... it binds the object into the local namespace, so all you need to do is re-import it. However, since the module is already loaded, it will just re-import the same version of the class, so you need to reload the module too. Note that from bots.google import GoogleBot does not bind the name bots, so look the module up in sys.modules. This should do it:
from bots.google import GoogleBot
...
# do stuff
...
import sys
reload(sys.modules['bots.google'])
from bots.google import GoogleBot
If for some reason you don't know the module name, you can get it from GoogleBot.__module__.
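The rebinding behaviour described above can be observed with a small Python 3 sketch. It writes a throwaway module to a temporary directory; the module name demo_plugin and the class Bot are hypothetical stand-ins for bots.google and GoogleBot.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc files so the demo never picks up stale bytecode.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
module_path = os.path.join(tmpdir, "demo_plugin.py")

def write_plugin(version):
    # (Re)write the hypothetical plugin module with a new class attribute.
    with open(module_path, "w") as f:
        f.write("class Bot:\n    version = %d\n" % version)

write_plugin(1)
importlib.invalidate_caches()
import demo_plugin
from demo_plugin import Bot
old_bot = Bot

write_plugin(2)
importlib.reload(demo_plugin)
print(Bot.version)           # still the old class: 1

from demo_plugin import Bot  # re-bind the name to the reloaded class
print(Bot.version)           # now 2
```

Instances created from the old class before the reload keep using the old definition; only objects created after re-binding the name see the new one.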

import sys
def reload_class(class_obj):
    module_name = class_obj.__module__
    module = sys.modules[module_name]
    # __file__ may point at the compiled .pyc; map it back to the source
    modulepath = module.__file__.replace(".pyc", ".py")
    # Compile first so that a syntax error is raised before reloading
    with open(modulepath, 'rU') as source:
        code = source.read()
    compile(code, module_name, "exec")
    module = reload(module)
    return getattr(module, class_obj.__name__)
There is a lot of error checking you could add to this. If you're using global variables, you will probably have to figure out what happens to them on reload.

How can I get a list of all classes within current module in Python?

I've seen plenty of examples of people extracting all of the classes from a module, usually something like:
# foo.py
class Foo:
    pass
# test.py
import inspect
import foo
for name, obj in inspect.getmembers(foo):
    if inspect.isclass(obj):
        print obj
Awesome.
But I can't find out how to get all of the classes from the current module.
# foo.py
import inspect
class Foo:
    pass
def print_classes():
    for name, obj in inspect.getmembers(???): # what do I do here?
        if inspect.isclass(obj):
            print obj
# test.py
import foo
foo.print_classes()
This is probably something really obvious, but I haven't been able to find anything. Can anyone help me out?

Try this:
import sys
current_module = sys.modules[__name__]
In your context:
import sys, inspect
def print_classes():
    for name, obj in inspect.getmembers(sys.modules[__name__]):
        if inspect.isclass(obj):
            print(obj)
And even better:
clsmembers = inspect.getmembers(sys.modules[__name__], inspect.isclass)
Because inspect.getmembers() takes a predicate.

I don't know if there's a 'proper' way to do it, but your snippet is on the right track: just add import foo to foo.py, do inspect.getmembers(foo), and it should work fine.

What about
g = globals().copy()
for name, obj in g.iteritems():
?

I was able to get all I needed from the dir built in plus getattr.
# Works on pretty much everything, but be mindful that
# you get lists of strings back
print dir(myproject)
print dir(myproject.mymodule)
print dir(myproject.mymodule.myfile)
print dir(myproject.mymodule.myfile.myclass)
# But, the string names can be resolved with getattr, (as seen below)
It does come out looking like a hairball, though:
def list_supported_platforms():
    """
    List supported platforms (to match sys.platform)
    Returns:
        list str: platform names
    """
    return list(itertools.chain(
        *list(
            # Get the class's constant
            getattr(
                # Get the module's first class, which we wrote
                getattr(
                    # Get the module
                    getattr(platforms, item),
                    dir(
                        getattr(platforms, item)
                    )[0]
                ),
                'SYS_PLATFORMS'
            )
            # For each include in platforms/__init__.py
            for item in dir(platforms)
            # Ignore magic, ourselves (index.py) and a base class.
            if not item.startswith('__') and item not in ['index', 'base']
        )
    ))

import pyclbr
print(pyclbr.readmodule(__name__).keys())
Note that the stdlib's Python class browser module uses static source analysis, so it only works for modules that are backed by a real .py file.
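As a small illustration of that static analysis, the sketch below points pyclbr at a stdlib module instead of `__name__`, so that it runs anywhere. The choice of json.decoder is arbitrary.

```python
import pyclbr

# readmodule parses the source file without importing the module and
# returns a dict mapping class names to descriptor objects.
classes = pyclbr.readmodule("json.decoder")

for name, descriptor in classes.items():
    # Each descriptor records where the class is defined.
    print(name, descriptor.file, descriptor.lineno)
```

Since nothing is imported or executed, classes created dynamically at runtime will not show up in the result.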

If you want all the classes that belong to the current module, you could use this:
import sys, inspect
def print_classes():
    is_class_member = lambda member: inspect.isclass(member) and member.__module__ == __name__
    clsmembers = inspect.getmembers(sys.modules[__name__], is_class_member)
    for name, obj in clsmembers:
        print(obj)
If you use Nadia's answer and your module imports other classes, those imported classes will be listed too.
That's why member.__module__ == __name__ is added to the predicate is_class_member. This check ensures that the class really was defined in this module.
A predicate is a function (callable) that returns a boolean value.
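To make the effect of the `__module__` check concrete, here is a minimal sketch that compares the two predicates on a throwaway module. The module name "demo", the class Foo and the helper classes_defined_in are all made up for this example; in a real module you would pass sys.modules[__name__] instead.

```python
import inspect
import types

def classes_defined_in(module):
    """Return (name, class) pairs for classes defined in `module` itself."""
    def is_local_class(member):
        return inspect.isclass(member) and member.__module__ == module.__name__
    return inspect.getmembers(module, is_local_class)

# Build a hypothetical module that defines one class and imports another.
demo = types.ModuleType("demo")
exec("from collections import OrderedDict\nclass Foo:\n    pass\n", demo.__dict__)

all_classes = [name for name, _ in inspect.getmembers(demo, inspect.isclass)]
local_classes = [name for name, _ in classes_defined_in(demo)]

print(all_classes)    # ['Foo', 'OrderedDict']
print(local_classes)  # ['Foo']
```

The imported OrderedDict passes inspect.isclass but reports 'collections' as its `__module__`, so the stricter predicate drops it.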

This is the line that I use to get all of the classes that have been defined in the current module (i.e. not imported). It's a little long according to PEP-8 but you can change it as you see fit. Note that the module names are compared with ==, since identity comparison of strings with is is not reliable.
import sys
import inspect
classes = [name for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass)
           if obj.__module__ == __name__]
This gives you a list of the class names. If you want the class objects themselves just keep obj instead.
classes = [obj for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isclass)
           if obj.__module__ == __name__]
This has been more useful in my experience.

Another solution which works in Python 2 and 3:
# foo.py
import sys
class Foo(object):
    pass
def print_classes():
    current_module = sys.modules[__name__]
    for key in dir(current_module):
        if isinstance(getattr(current_module, key), type):
            print(key)
# test.py
import foo
foo.print_classes()

If you only need your own classes, I think you can do something like this:
class custom(object):
    __custom__ = True
class Alpha(custom):
    something = 3
def GetClasses():
    return [x for x in globals() if hasattr(globals()[str(x)], '__custom__')]
print(GetClasses())

I frequently find myself writing command line utilities wherein the first argument is meant to refer to one of many different classes. For example ./something.py feature command --arguments, where Feature is a class and command is a method on that class. Here's a base class that makes this easy.
The assumption is that this base class resides in a directory alongside all of its subclasses. You can then call ArgBaseClass(foo = bar).load_subclasses() which will return a dictionary. For example, if the directory looks like this:
arg_base_class.py
feature.py
Assuming feature.py implements class Feature(ArgBaseClass), then the above invocation of load_subclasses will return { 'feature' : <Feature object> }. The same kwargs (foo = bar) will be passed into the Feature class.
#!/usr/bin/env python3
import os, pkgutil, importlib, inspect
class ArgBaseClass():
    # Assign all keyword arguments as properties on self, and keep the kwargs for later.
    def __init__(self, **kwargs):
        self._kwargs = kwargs
        for (k, v) in kwargs.items():
            setattr(self, k, v)
        ms = inspect.getmembers(self, predicate=inspect.ismethod)
        self.methods = dict([(n, m) for (n, m) in ms if not n.startswith('_')])
    # Add the names of the methods to a parser object.
    def _parse_arguments(self, parser):
        parser.add_argument('method', choices=list(self.methods))
        return parser
    # Instantiate one of each of the subclasses of this class.
    def load_subclasses(self):
        module_dir = os.path.dirname(__file__)
        module_name = os.path.basename(os.path.normpath(module_dir))
        parent_class = self.__class__
        modules = {}
        # Load all the modules in the package:
        for (module_loader, name, ispkg) in pkgutil.iter_modules([module_dir]):
            modules[name] = importlib.import_module('.' + name, module_name)
        # Instantiate one of each class, passing the keyword arguments.
        ret = {}
        for cls in parent_class.__subclasses__():
            path = cls.__module__.split('.')
            ret[path[-1]] = cls(**self._kwargs)
        return ret

import Foo
dir(Foo)
import collections
dir(collections)

The following can be placed at the top of the file:
def get_classes():
    import inspect, sys
    return dict(inspect.getmembers(
        sys.modules[__name__],
        lambda member: inspect.isclass(member) and member.__module__ == __name__
    ))
Note, this can be placed at the top of the module because we've wrapped the logic in a function definition. If you want the dictionary to exist as a top-level object you will need to place the definition at the bottom of the file to ensure all classes are included.

Open the Python interpreter, type help('module_name'), then press Enter.
e.g. help('os').
Here, I've pasted one part of the output below:
class statvfs_result(__builtin__.object)
| statvfs_result: Result from statvfs or fstatvfs.
|
| This object may be accessed either as a tuple of
| (bsize, frsize, blocks, bfree, bavail, files, ffree, favail, flag, namemax),
| or via the attributes f_bsize, f_frsize, f_blocks, f_bfree, and so on.
|
| See os.statvfs for more information.
|
| Methods defined here:
|
| __add__(...)
| x.__add__(y) <==> x+y
|
| __contains__(...)
| x.__contains__(y) <==> y in x
