How to use sys.path_hooks for customized loading of modules? - python

I hope the following question is not too long, but otherwise I cannot explain my problem and what I want.
Following up on How to use importlib to import modules from arbitrary sources? (my question from yesterday),
I have written a specific loader for a new file type (.xxx).
(In fact the xxx is an encrypted version of a pyc to protect code from being stolen).
I would like just to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.
Now, the loader is ModuleLoader, inheriting from importlib.machinery.SourcelessFileLoader.
Using sys.path_hooks, the loader is added as a hook:
myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: This is activated once by calling modloader.activateLoader()
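For reference, the loader is essentially the following sketch (decrypt_bytes is a placeholder for my actual decryption routine; the decrypted payload is the marshalled code object of the original pyc):

import importlib.machinery
import marshal

class ModuleLoader(importlib.machinery.SourcelessFileLoader):
    def get_code(self, fullname):
        path = self.get_filename(fullname)
        with open(path, 'rb') as fh:
            data = fh.read()
        # decrypt_bytes() stands in for the real decryption step
        return marshal.loads(decrypt_bytes(data))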
Upon loading a module named test (which is a test.xxx) I get:
>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>
However, when I delete the content of sys.path_hooks before adding the hook:
sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
>>> modloader.activateLoader()
>>> import test
using xxx class
in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...
GENERATE CODE OBJECT ...
2 0 LOAD_CONST 0
3 LOAD_CONST 1 ('foo2')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (foo2)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>>>>> test
<module 'test' from '.\\test.xxx'>
The module is imported correctly after conversion of the file's content to a code object.
However I cannot load the same module from a package: import pack.test
Note: __init__.py is of course present as an empty file in the pack directory.
>>> import pack.test
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>
Worse, I cannot load plain *.py modules from that package anymore either; I get the same error as above:
>>> import pack.testpy
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>
To my understanding, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without deleting sys.path_hooks) not recognize the new extension "xxx", while the second variant (with sys.path_hooks deleted) does?
It looks like the machinery throws an exception rather than moving on to the next entry when an entry of sys.path_hooks cannot handle "xxx".
And why does the second version work for py, pyc and xxx modules in the current directory, but not in the package pack? I would expect py and pyc not to work even in the current directory, because sys.path_hooks then contains only a hook for "xxx"...

The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!
Quick Breakdown
sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder in particular will then take the contents of sys.path and pass it to the factory functions in sys.path_hooks. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache. From then on PathFinder will only ask those cached finder instances if they can provide the requested module.
If you look at the contents of sys.path_importer_cache, you'll see all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc) will be mapped to other finders.
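You can see this for yourself from a fresh interpreter (output abbreviated; the exact entries depend on your installation):

>>> import sys, collections.abc  # trigger a few imports so the cache fills up
>>> for path, finder in list(sys.path_importer_cache.items())[:2]:
...     print(path, '->', finder)
...
/usr/lib/python3.5 -> FileFinder('/usr/lib/python3.5')
/usr/lib/python3.5/collections -> FileFinder('/usr/lib/python3.5/collections')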
Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.
Alternatively, if you insert your new factory to sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal with what it will accept, this would lead to only your loader being used, as you've already observed.
Making it Work
So you can either try to adjust the existing factory to also support your file extension and importer (which is difficult, as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is to add a new meta path finder.
So, e.g., from my own project:
import sys
from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename
from sibilant.module import prep_module, exec_module

SOURCE_SUFFIXES = [".lspy", ".sibilant"]

_path_importer_cache = {}
_path_hooks = []

class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """

    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()

    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None

    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder

class SibilantSourceFileLoader(FileLoader):

    def create_module(self, spec):
        return None

    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")

    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))
        prep_module(module)
        exec_module(module, source, filename=filename)

def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)

def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())

def _install():
    done = False

    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True

    return install

_install = _install()
_install()
The SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hooks and sys.path_importer_cache with similar implementations that instead look in the _path_hooks and _path_importer_cache storage local to this module.
During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse the sys.path and try to find a match with one of my own file extensions.
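Adapted to the question's .xxx files, the registration would look roughly like this (an untested sketch; ModuleLoader is the loader class from the question):

from importlib.machinery import FileFinder

# register the hook in this module's private list rather than sys.path_hooks,
# and let the fallback meta path finder above drive it
_path_hooks.append(FileFinder.path_hook((ModuleLoader, ['.xxx'])))
sys.meta_path.append(SibilantPathFinder)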
Figuring More Out
I ended up delving into the source for the _bootstrap_external module
https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install function and the PathFinder.find_spec method are the best starting points to seeing why things work the way they do.

@obriencj's analysis of the situation is correct. But I came up with a different solution to this problem that doesn't require putting anything in sys.meta_path. Instead, it installs a special hook in sys.path_hooks that acts almost as a sort of middleware between the PathFinder in sys.meta_path and the hooks in sys.path_hooks, where, rather than just using the first hook that says "I can handle this path!", it tries all matching hooks in order until it finds one that actually returns a useful ModuleSpec from its find_spec method:
import os
import sys
from importlib.abc import PathEntryFinder

@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory. So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook, to be inserted early so
    that it can supersede anything else. Its find_spec method then calls each
    hook on sys.path_hooks after itself and, for each hook that can handle the
    given sys.path entry, it calls the hook to create a finder, and calls that
    finder's find_spec. So each sys.path_hooks entry is tried until a spec is
    found or all finders are exhausted.
    """

    class hook:
        """
        Use this little internal class rather than a function with a closure
        or a classmethod or anything like that so that it's easier to
        identify our hook and skip over it while processing sys.path_hooks.
        """

        def __init__(self, basepath=None):
            # abspath(None) would raise, so only normalize a real path
            self.basepath = (os.path.abspath(basepath)
                             if basepath is not None else None)

        def __call__(self, path):
            if not os.path.isdir(path):
                raise ImportError('only directories are supported', path=path)
            elif not self.handles(path):
                raise ImportError(
                    'only directories under {} are supported'.format(
                        self.basepath), path=path)

            return MetaFileFinder(path)

        def handles(self, path):
            """
            Return whether this hook will handle the given path, depending on
            what its basepath is.
            """
            path = os.path.abspath(path)
            return (self.basepath is None or
                    os.path.commonpath([self.basepath, path]) == self.basepath)

    def __init__(self, path):
        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        last = len(sys.path_hooks) - 1
        for idx, hook in enumerate(sys.path_hooks):
            if isinstance(hook, self.__class__.hook):
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

            try:
                self._finder_cache[hook] = finder
            except TypeError:
                # The hook is unhashable for some reason so we don't bother
                # caching it
                pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if (spec is not None and
                        (spec.loader is not None or idx == last)):
                    # If no __init__.<suffix> was found by any Finder,
                    # we may be importing a namespace package (which
                    # FileFinder.find_spec returns in this case). But we
                    # only want to return the namespace ModuleSpec if we've
                    # exhausted every other finder first.
                    return spec

        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            if finder is not None:
                finder.invalidate_caches()

    @classmethod
    def install(cls, basepath=None):
        """
        Install the MetaFileFinder in the front of sys.path_hooks, so that
        it can support any existing sys.path_hooks and any that might
        be appended later.

        If given, only support paths under and including basepath. In this
        case it's not necessary to invalidate the entire
        sys.path_importer_cache, but only any existing entries under basepath.
        """
        if basepath is not None:
            basepath = os.path.abspath(basepath)

        hook = cls.hook(basepath)
        sys.path_hooks.insert(0, hook)

        if basepath is None:
            sys.path_importer_cache.clear()
        else:
            for path in list(sys.path_importer_cache):
                if hook.handles(path):
                    del sys.path_importer_cache[path]
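With this in place, wiring up the question's .xxx loader would look roughly like this (an untested sketch; ModuleLoader comes from the question, and test.xxx is assumed to sit in a directory on sys.path):

import sys
from importlib.machinery import FileFinder

# append the new suffix hook, then put the middleware in front of everything
sys.path_hooks.append(FileFinder.path_hook((ModuleLoader, ['.xxx'])))
MetaFileFinder.install()

import test  # each sys.path_hooks entry is now tried until one yields a spec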
This is still, depressingly, far more complicated than it should be. I feel like on Python 2, before the import system rewrite, it was much simpler to do this, since less of the support for the built-in module types (.py, etc.) was built on top of the import hooks themselves, so it was harder to break importing of normal modules by adding hooks for new module types. I'm going to start a discussion on python-ideas to see if there's any way we can improve this situation.

I came up with yet another alternative tweak. I won't say it is beautiful, as it puts a closure on top of an already existing one, but at least it's short :)
It adds loaders to the default FileLoader objects through a new hook. The original path_hook_for_FileFinder is wrapped in a closure and the loaders are injected into the FileFinder objects returned by the original hook.
After the new hook is added, path_importer_cache is cleared, as it is already filled with the original FileFinder objects. Those could also be updated dynamically, but I did not bother for now.
Disclaimer: not extensively tested yet. It does what I need in the easiest possible way I know, but the import system is complicated enough to produce funny side-effects for a tweak like this.
import sys
import importlib.machinery

def extend_path_hook_for_FileFinder(*loader_details):
    orig_hook, orig_pos = None, None
    for i, hook in enumerate(sys.path_hooks):
        if hook.__name__ == 'path_hook_for_FileFinder':
            orig_hook, orig_pos = hook, i
            break
    sys.path_hooks.remove(orig_hook)

    def extended_path_hook_for_FileFinder(path):
        orig_finder = orig_hook(path)
        loaders = []
        for loader, suffixes in loader_details:
            loaders.extend((suffix, loader) for suffix in suffixes)
        orig_finder._loaders.extend(loaders)
        return orig_finder

    sys.path_hooks.insert(orig_pos, extended_path_hook_for_FileFinder)

MY_SUFFIXES = ['.pymy']

class MySourceFileLoader(importlib.machinery.SourceFileLoader):
    pass

loader_detail = (MySourceFileLoader, MY_SUFFIXES)
extend_path_hook_for_FileFinder(loader_detail)

# empty the cache as it is already filled with plain FileFinder
# objects for the most common path entries, then tell the finders
# on sys.meta_path to drop their caches as well
sys.path_importer_cache.clear()
importlib.invalidate_caches()
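A quick smoke test, assuming the snippet above has run and the current directory is on sys.path (a .pymy file containing ordinary Python source should now import like any module):

with open("hello.pymy", "w") as fh:
    fh.write("greeting = 'hi'\n")

import hello
print(hello.greeting)  # -> hi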

Related

How to search for a __main__ module using pyclbr in Python3?

I want to get all the functions and classes in module: __main__ of the source code directory: /tmp/rebound/rebound.
When I use the pyclbr.readmodule_ex API:
source_code_data = pyclbr.readmodule_ex(source_code_module, path=source_code_path)
I specify the module and its path:
DEBUG:root:Source code module: __main__, Source code path: ['/tmp/rebound/rebound/rebound']
I then get this error:
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/util.py", line 69, in _find_spec_from_path
raise ValueError('{}.__spec__ is None'.format(name))
ValueError: __main__.__spec__ is None
I then tried to use the function that is not supposed to be used by the public: _readmodule:
source_code_data = pyclbr._readmodule(source_code_module, source_code_path, )
But I could not decide what the value of the inpackage parameter should be.
Upon tracing the code via debugger, I spotted a mistake:
def _find_spec_from_path(name, path=None):
    """Return the spec for the specified module.

    First, sys.modules is checked to see if the module was already imported. If
    so, then sys.modules[name].__spec__ is returned. If that happens to be
    set to None, then ValueError is raised. If the module is not in
    sys.modules, then sys.meta_path is searched for a suitable spec with the
    value of 'path' given to the finders. None is returned if no spec could
    be found.

    Dotted names do not have their parent packages implicitly imported. You will
    most likely need to explicitly import all parent packages in the proper
    order for a submodule to get the correct spec.
    """
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        module = sys.modules[name]
        if module is None:
            return None
        try:
            spec = module.__spec__
        except AttributeError:
            raise ValueError('{}.__spec__ is not set'.format(name)) from None
        else:
            if spec is None:
                raise ValueError('{}.__spec__ is None'.format(name))
            return spec
This is the function in the module python3.8/importlib/util.py, and it evaluates __main__ as a built-in module, as execution falls into the else block.
How do I differentiate the __main__ of my target source code from the built-in __main__? In other words, how do I read the module __main__ of the codebase rebound?
TL;DR
Try:
source_code_data = pyclbr.readmodule_ex("rebound.__main__", path=source_code_path)
Explanation
As you already know, _find_spec_from_path will search for name in sys.modules, and __main__ is always present there.
If you inspect sys.modules.keys() you'll notice that it contains dot separated module names.
Example from an IPython shell:
'IPython.display',
'IPython.extensions',
'IPython.extensions.storemagic',
'IPython.lib',
'IPython.lib.backgroundjobs',
'IPython.lib.clipboard',
'IPython.lib.display',
'IPython.lib.pretty',
'IPython.lib.security',
'IPython.paths',
And once you realize you are looking for rebound.__main__ and not __main__, it becomes obvious: in order to step into the if block, the name can't be in sys.modules. The last remark is that _find_spec_from_path has no bugs.
# python3.8/importlib/util.py
def _find_spec_from_path(name, path=None):
    # ...
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        # ...
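Putting it together (the path value comes from the question; the entries in path must be the directories that contain the rebound package for the dotted lookup to work):

import pyclbr

source_code_path = ['/tmp/rebound/rebound']
source_code_data = pyclbr.readmodule_ex("rebound.__main__", path=source_code_path)
for name, descriptor in source_code_data.items():
    print(name, descriptor)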

Call a function from different file where the file name and function name are read from a list

I have multiple functions stored in different files; both the file names and the function names are stored in lists. Is there a way to call the required function without conditional statements?
For example, file1 has the functions function11 and function12:

def function11():
    pass

def function12():
    pass

file2 has the functions function21 and function22:

def function21():
    pass

def function22():
    pass

and I have the lists:

file_name = ["file1", "file2", "file1"]
function_name = ["function12", "function22", "function12"]
I will get the list index from a different function; based on that, I need to call the function and get the output.
If the other function will give you a list index directly, then you don't need to deal with the function names as strings. Instead, directly store (without calling) the functions in the list:
import file1, file2
functions = [file1.function12, file2.function22, file1.function12]
And then call them once you have the index:
functions[index]()
There are ways to do what is called "reflection" in Python and get from the string to a matching-named function (see the sketch below). But they solve a problem that is more advanced than what you describe, and they are more difficult (especially if you also have to work with the module names).
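For completeness, that reflection approach would look roughly like this (shown only for illustration; the whitelist dict below is the safer choice):

import importlib

index = 0  # wherever the other function's result lands
module = importlib.import_module(file_name[index])   # e.g. "file1"
func = getattr(module, function_name[index])         # e.g. "function12"
func()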
If you have a "whitelist" of functions and modules that are allowed to be called from the config file, but still need to find them by string, you can explicitly create the mapping with a dict:
allowed_functions = {
    'file1': {
        'function11': file1.function11,
        'function12': file1.function12
    },
    'file2': {
        'function21': file2.function21,
        'function22': file2.function22
    }
}
And then invoke the function:
try:
    func = allowed_functions[module_name][function_name]
except KeyError:
    raise ValueError("this function/module name is not allowed")
else:
    func()
The most advanced approach is if you need to load code from a "plugin" module created by the author. You can use the standard library importlib package to use the string name to find a file to import as a module, and import it dynamically. It looks something like:
import os
from importlib.util import spec_from_file_location, module_from_spec

# Look for the file at the specified path, figure out the module name
# from the base file name, import it and make a module object.
def load_module(path):
    folder, filename = os.path.split(path)
    basename, extension = os.path.splitext(filename)
    spec = spec_from_file_location(basename, path)
    module = module_from_spec(spec)
    spec.loader.exec_module(module)
    assert module.__name__ == basename
    return module
This is still unsafe, in the sense that it can look anywhere on the file system for the module. It's better if you specify the folder yourself and only allow a filename in the config file; but then you still have to protect against hacking the path with things like ".." and "/" in the "filename".
(I have a project that does something like this. It chooses the paths from a whitelist that is also under the user's control, so I have to warn my users not to trust the path-whitelist file from each other. I also search the directories for modules, and then make a whitelist of plugins that may be used, based only on plugins that are in the directory - so no funny games with "..". And I'm still worried I forgot something.)
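A minimal guard against that sort of path trickery might look like this (a sketch; resolve_plugin and plugins_dir are illustrative names, not part of any library):

import os

def resolve_plugin(plugins_dir, filename):
    plugins_dir = os.path.abspath(plugins_dir)
    candidate = os.path.abspath(os.path.join(plugins_dir, filename))
    # reject anything that escapes plugins_dir via ".." or an absolute path
    if os.path.commonpath([plugins_dir, candidate]) != plugins_dir:
        raise ValueError("illegal plugin filename: {!r}".format(filename))
    return candidate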
Once you have the module, you can get a function from it by name like:
dynamic_module = load_module(some_path)
try:
    func = getattr(dynamic_module, function_name)
except AttributeError:
    raise ValueError("function not in module")
At any rate, there is no reason to eval anything, or to generate and import code based on user input. That is the most unsafe approach of all.
Another alternative. This is not much safer than an eval(), however: someone with access to the lists you read from the config file could inject malicious code into the lists you import, e.g.

'from subprocess import call; call(["rm", "-rf", "./*"], shell=True)'
Code:
import re

# You must first create a directory named "test_module"
# You can do this with code if needed.
# Python recognizes a "module" as a module by the existence of an __init__.py
# It will load that __init__.py at the "import" command, and you can access the methods it imports

m = ["os", "sys", "subprocess"]  # Modules to import from
f = ["getcwd", "exit", "call; call('do', '---terrible-things')"]  # Methods to import

# Create an __init__.py
with open("./test_module/__init__.py", "w") as FH:
    for count in range(0, len(m), 1):
        # Writes "from module import method" to __init__.py
        line = "from {} import {}\n".format(m[count], f[count])
        # !!!! SANITIZE THE LINE !!!!!
        if not re.match("^from [a-zA-Z0-9._]+ import [a-zA-Z0-9._]+$", line):
            print("The line '{}' is suspicious. Will not be entered into __init__.py!!".format(line))
            continue
        FH.write(line)

import test_module
print(test_module.getcwd())
OUTPUT:
The line 'from subprocess import call; call('do', '---terrible-things')' is suspicious. Will not be entered into __init__.py!!
/home/rightmire/eclipse-workspace/junkcode
I'm not 100% sure I'm understanding the need. Maybe more detail in the question.
Is something like this what you're looking for?
m = ["os"]
f = ["getcwd"]
command = ''.join([m[0], ".", f[0], "()"])
# Put in some minimum sanity checking and sanitization!!!
if ";" in command or <other dangerous string> in command:
    print("The line '{}' is suspicious. Will not run".format(command))
    sys.exit(1)
print("This will error if the method isnt imported...")
print(eval(''.join([m[0], ".", f[0], "()"])))
OUTPUT:
This will error if the method isnt imported...
/home/rightmire/eclipse-workspace/junkcode
As pointed out by @KarlKnechtel, having commands come in from an external file is a gargantuan security risk!

Creating a pseudo-module that creates submodules at runtime

To support extensions in my Python project, I'm trying to create a pseudo-module that will serve "extension modules" as its submodules. I'm having a problem treating the submodules as modules: it seems like I need to access them using from..import on the main pseudo-module and can't just access their full path.
Here is a minimal working example:
import sys
from types import ModuleType

class Foo(ModuleType):
    @property
    def bar(self):
        # Here I would actually find the location of `bar.py` and load it
        bar = ModuleType('foo.bar')
        sys.modules['foo.bar'] = bar
        return bar

sys.modules['foo'] = Foo('foo')

from foo import bar  # without this line the next line fails
import foo.bar
This works, but if I comment out the from foo import bar line, it'll fail with:
ImportError: No module named bar
on Python 2, and on Python 3 it'll fail with:
ModuleNotFoundError: No module named 'foo.bar'; 'foo' is not a package
If I add the fields to make it a package:
class Foo(ModuleType):
    __all__ = ('bar',)
    __package__ = 'foo'
    __path__ = []
    __file__ = __file__
It'll fail with:
ModuleNotFoundError: No module named 'foo.bar'
From what I understand, the problem is that I have not yet set sys.modules['foo.bar']. But to fill sys.modules I need to load the module first, and I don't want to do that unless the user of my project explicitly imports it.
Is there any way to make Python realize that when it sees import foo.bar it needs to load foo first (or I can just guarantee foo will already be loaded at that point) and take bar from it?
This post does NOT answer "This is how you do it."
If you want to know how to do this yourself look at PEP 302 or Idan Arye's solution.
This post instead presents a recipe that makes it easy to write. The recipe is at the end of this answer.
The block of code below defines two classes intended for use: PseudoModule and PseudoPackage. Their behaviour differs only in whether import foo.x should raise an error stating that foo isn't a package, or should try to load x and make sure it's a module. Several example uses are outlined below.
PseudoModule
PseudoModule can be used as a decorator on a function; it creates a new module object that, when an attribute is accessed for the first time, calls the decorated function with the name of the attribute and the namespace of previously defined elements.
For example, this will make a module that assigns a new integer to each attribute accessed:
@PseudoModule
def access_tracker(attr, namespace):
    namespace["_count"] = namespace.get("_count", -1) + 1
    return namespace["_count"]

# PseudoModule will set `namespace[attr] = <return value>` for you;
# this can be overridden by passing `remember_results=False` to the constructor

sys.modules["access_tracker"] = access_tracker

from access_tracker import zero, one, two, three
assert zero == 0 and one == 1 and two == 2 and three == 3
PseudoPackage
PseudoPackage is used the same way as PseudoModule; however, if the decorated function returns a module (or package), it will correct the name to be qualified as a subpackage, and sys.modules is updated as needed. (The top-level package still needs to be added to sys.modules manually.)
Here is an example use of PseudoPackage:
spam_submodules = {"bacon"}
spam_attributes = {"eggs", "ham"}

@PseudoPackage
def spam(name, namespace):
    print("getting a component of spam:", name)
    if name in spam_submodules:
        @PseudoModule
        def submodule(attr, nested_namespace):
            print("getting a component of submodule {}: {}".format(name, attr))
            return attr  # use the string of the attribute
        return submodule  # PseudoPackage will rename the module to be spam.bacon for us
    elif name in spam_attributes:
        return "supported attribute"
    else:
        raise AttributeError("spam doesn't have any {!r}.".format(name))

sys.modules["spam"] = spam

import spam.bacon
# prints "getting a component of spam: bacon"

assert spam.bacon.something == "something"
# prints "getting a component of submodule bacon: something"

from spam import eggs
# prints "getting a component of spam: eggs"

assert eggs == "supported attribute"

import spam.ham  # ham isn't a submodule, raises error!
The way PseudoPackage is set up also makes arbitrary-depth packages very easy, although this specific example doesn't accomplish much:
def make_abstract_package(qualname=""):
    "makes a PseudoPackage that has arbitrary nesting of subpackages"
    def gen_func(attr, namespace):
        print("getting {!r} from package {!r}".format(attr, qualname))
        return make_abstract_package("{}.{}".format(qualname, attr))
    # can pass the name of the module as second argument if needed
    return PseudoPackage(gen_func, qualname)

sys.modules["foo"] = make_abstract_package("foo")

from foo.bar.baz import thing_I_want
## prints:
#   getting 'bar' from package 'foo'
#   getting 'baz' from package 'foo.bar'
#   getting 'thing_I_want' from package 'foo.bar.baz'

print(thing_I_want)
# prints "<module 'foo.bar.baz.thing_I_want' from '<PseudoPackage>'>"
A few notes on implementation
As general guidelines:
The function that computes attributes of the module should not import the module it's defining the attributes for
If you want a package or module to be available for import, you need to put it in sys.modules yourself.
PseudoPackage assumes each submodule is unique, don't reuse module objects.
It is also worth noting that sys.modules is only updated with submodules of PseudoPackages by an import statement that requires the name to be a module. For example, if foo is a package already in sys.modules but foo.x has not been referenced yet, then all these assertions will pass:
assert "foo.x" not in sys.modules and not hasattr(foo,"x")
import foo; foo.x #foo.x is computed but not added to sys.modules
assert "foo.x" not in sys.modules and hasattr(foo,"x")
from foo import x #x is retrieved from namespace but sys.modules is still not affected
assert "foo.x" not in sys.modules
import foo.x #if x is a module then "foo.x" is added to sys.modules
assert "foo.x" in sys.modules
Likewise, in the above case, if foo.x isn't a module then the statement import foo.x raises a ModuleNotFoundError.
Finally, while the problematic edge cases I have identified can be avoided by following the guidelines above, the docstring for _PseudoPackageLoader describes the implementation details responsible for the unwanted behaviour, for possible future modifications.
The recipe
import sys
from types import ModuleType
import importlib.abc  # uses Loader and MetaPathFinder, more for inspection purposes than use

class RawPseudoModule(ModuleType):
    """
    see PseudoModule for documentation, this class is not intended for direct use.

    RawPseudoModule does not handle __path__, so the generating functions of direct
    instances are expected to make and return an appropriate value for __path__

    *** if you do not know what an appropriate value for __path__ is
        then use PseudoModule instead ***
    """
    # using slots keeps these two variables out of the module dictionary
    __slots__ = ["__generating_func", "__remember_results"]

    def __init__(self, func, name=None, remember_results=True):
        name = name or func.__name__
        super(RawPseudoModule, self).__init__(name)
        self.__file__ = "<{0.__class__.__name__}>".format(self)
        self.__generating_func = func
        self.__remember_results = remember_results

    def __getattr__(self, attr):
        value = self.__generating_func(attr, vars(self))
        if self.__remember_results:
            setattr(self, attr, value)
        return value

class PseudoModule(RawPseudoModule):
    """
    A module that has attributes generated from a specified function

    The generating function passed to the constructor should have the signature:

        f(attr: str, namespace: dict) -> object

    - attr is the name of the attribute accessed
    - namespace is the currently defined values in the module

    The function should return a value for the attribute or raise an
    AttributeError if it doesn't exist.

    By default the result is then saved to the namespace so you don't
    have to explicitly do "namespace[attr] = <value>"; however, this behaviour
    can be overridden by specifying "remember_results=False" in the constructor.

    If no name is specified in the constructor, the function name will be
    used for the module name instead; this allows the class to be used as a decorator.

    Note: the PseudoModule class is set up so that "import foo.bar"
          when foo is a PseudoModule will fail stating "'foo' is not a package".
        - to allow importing submodules use PseudoPackage.
        - to handle the internal __path__ manually use RawPseudoModule.
    Note: the module is NOT added to sys.modules automatically.
    """
    def __getattr__(self, attr):
        # to not have submodules, __path__ must not exist
        if attr == "__path__":
            msg = "{0.__name__} is a PseudoModule, it is not a package so it doesn't have a __path__"
            # this error message would only be seen by people who explicitly access __path__
            raise AttributeError(msg.format(self))
        return super(PseudoModule, self).__getattr__(attr)

class PseudoPackage(RawPseudoModule):
    """
    A version of PseudoModule that sets itself up to allow importing subpackages

    When a submodule is imported from a PseudoPackage:
    - it is evaluated with the generating function.
    - the name of the submodule is overridden to be correctly qualified
    - and it is added to sys.modules to allow repeated imports.

    Note: the top level package still needs to be added to sys.modules manually
    Note: A RecursionError will be raised if the code that generates submodules
          attempts to import another submodule from the PseudoPackage.
    """
    # IMPLEMENTATION DETAIL: technically this doesn't deal with adding submodules
    #   to sys.modules, that is handled in _PseudoPackageLoader
    #   which explicitly checks for instances of PseudoPackage

    __path__ = []  # packages must have a __path__ to be recognized as packages.

    def __getattr__(self, attr):
        value = super(PseudoPackage, self).__getattr__(attr)
        if isinstance(value, ModuleType):
            # I'm just going to say if it's a module then the name must be in this format.
            value.__name__ = self.__name__ + "." + attr
        return value

class _PseudoPackageLoader(importlib.abc.Loader, importlib.abc.MetaPathFinder):
    """
    Singleton finder and loader for pseudo packages

    Whenever a subpackage of a PseudoPackage (that is already in sys.modules) is imported,
    this will handle loading it and adding the subpackage to sys.modules

    Note that although PEP 302 states the finder should not depend on the parent
    being loaded in sys.modules, this is implemented under the understanding that
    the user of PseudoPackage will add their module to sys.modules manually themselves,
    so this will work only when the parent is present in sys.modules

    Also PEP 302 indicates the module should be added to sys.modules first in case
    it is imported during its execution, however this is impossible due to the
    nature of how the module actually gets loaded.
    So for heaven's sake don't try to import a pseudo package or a module that uses
    a pseudo package from within the code that generates it.

    I have only tested this when the sub module is either PseudoModule or PseudoPackage
    and it was created new from the generating function; ideally there would be a way
    to allow the generating function to return an unexecuted module and this would
    properly handle executing it, but I don't know how to deal with that.
    """
    def find_module(self, fullname, path):
        # this will only support loading if the parent package is a PseudoPackage
        base, _, _ = fullname.rpartition(".")
        if isinstance(sys.modules.get(base), PseudoPackage):
            return self
        # I found that `if path is PseudoPackage.__path__` worked the same way
        # for all the cases I tested, however since load_module will fail if the
        # base part isn't in sys.modules it seems safer to just check for that.

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        base, _, sub = fullname.rpartition(".")
        parent = sys.modules[base]
        try:
            submodule = getattr(parent, sub)
        except AttributeError:
            # when we just access `foo.x` it raises an AttributeError
            # but `import foo.x` should instead raise an ImportError
            raise ImportError("cannot import name {!r}".format(sub))
        if not isinstance(submodule, ModuleType):
            # match the format of error raised when the submodule isn't a module
            # example: `import sys.path` raises the same format of error.
            raise ModuleNotFoundError("No module named {}".format(fullname))
        # fill all the fields as described in PEP 302 except __name__
        submodule.__loader__ = self
        submodule.__package__ = base
        submodule.__file__ = getattr(submodule, "__file__", "<submodule of PseudoPackage>")
        # if there was a way to do this before the module was made that'd be nice
        sys.modules[fullname] = submodule
        # if we needed to execute the body of an unloaded module it'd be done here.
        return submodule

# add the loader to sys.meta_path so it will handle our pseudo packages
sys.meta_path.append(_PseudoPackageLoader())
Thanks to the link @TadhgMcDonald-Jensen provided, I managed to solve it:
import sys
from types import ModuleType

class FooImporter(object):
    module = ModuleType('foo')
    module.__path__ = [module.__name__]

    def find_module(self, fullname, path):
        if fullname == self.module.__name__:
            return self
        if path == [self.module.__name__]:
            return self

    def load_module(self, fullname):
        if fullname == self.module.__name__:
            return sys.modules.setdefault(fullname, self.module)
        assert fullname.startswith(self.module.__name__ + '.')
        try:
            return sys.modules[fullname]
        except KeyError:
            submodule = ModuleType(fullname)
            name = fullname[len(self.module.__name__) + 1:]
            setattr(self.module, name, submodule)
            sys.modules[fullname] = submodule
            return submodule

sys.meta_path.append(FooImporter())

from foo import bar
from foo import bar
@TadhgMcDonald-Jensen - please make an answer so that I can approve it.

pylint on in-memory file/stream

I'd like to embed pylint in a program. The user enters Python programs (in Qt, in a QTextEdit, although that is not relevant) and in the background I call pylint to check the text he enters. Finally, I print the errors in a message box.
There are thus two questions: First, how can I do this without writing the entered text to a temporary file and giving it to pylint? I suppose at some point pylint (or astroid) handles a stream and not a file anymore.
And, more importantly, is it a good idea? Would it cause problems for imports or other stuff? Intuitively I would say no, since it seems to spawn a new process (with epylint), but I'm no Python expert so I'm really not sure. And if I use this to launch pylint, is it okay too?
Edit:
I tried tinkering with pylint's internals, even fought with them, but finally got stuck at some point.
Here is the code so far:
from astroid.builder import AstroidBuilder
from astroid.exceptions import AstroidBuildingException
from logilab.common.interface import implements
from pylint.interfaces import IRawChecker, ITokenChecker, IAstroidChecker
from pylint.lint import PyLinter
from pylint.reporters.text import TextReporter
from pylint.utils import PyLintASTWalker

class Validator():
    def __init__(self):
        self._messagesBuffer = InMemoryMessagesBuffer()
        self._validator = None
        self.initValidator()

    def initValidator(self):
        self._validator = StringPyLinter(reporter=TextReporter(output=self._messagesBuffer))
        self._validator.load_default_plugins()
        self._validator.disable('W0704')
        self._validator.disable('I0020')
        self._validator.disable('I0021')
        self._validator.prepare_import_path([])

    def destroyValidator(self):
        self._validator.cleanup_import_path()

    def check(self, string):
        return self._validator.check(string)

class InMemoryMessagesBuffer():
    def __init__(self):
        self.content = []

    def write(self, st):
        self.content.append(st)

    def messages(self):
        return self.content

    def reset(self):
        self.content = []

class StringPyLinter(PyLinter):
    """Does what PyLinter does but sets checkers once
    and redefines get_astroid to call build_string"""
    def __init__(self, options=(), reporter=None, option_groups=(), pylintrc=None):
        super(StringPyLinter, self).__init__(options, reporter, option_groups, pylintrc)
        self._walker = None
        self._used_checkers = None
        self._tokencheckers = None
        self._rawcheckers = None
        self.initCheckers()

    def __del__(self):
        self.destroyCheckers()

    def initCheckers(self):
        self._walker = PyLintASTWalker(self)
        self._used_checkers = self.prepare_checkers()
        self._tokencheckers = [c for c in self._used_checkers
                               if implements(c, ITokenChecker) and c is not self]
        self._rawcheckers = [c for c in self._used_checkers if implements(c, IRawChecker)]
        # notify global begin
        for checker in self._used_checkers:
            checker.open()
            if implements(checker, IAstroidChecker):
                self._walker.add_checker(checker)

    def destroyCheckers(self):
        self._used_checkers.reverse()
        for checker in self._used_checkers:
            checker.close()

    def check(self, string):
        modname = "in_memory"
        self.set_current_module(modname)
        astroid = self.get_astroid(string, modname)
        self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
        self._add_suppression_messages()
        self.set_current_module('')
        self.stats['statement'] = self._walker.nbstatements

    def get_astroid(self, string, modname):
        """return an astroid representation for a module"""
        try:
            return AstroidBuilder().string_build(string, modname)
        except SyntaxError as ex:
            self.add_message('E0001', line=ex.lineno, args=ex.msg)
        except AstroidBuildingException as ex:
            self.add_message('F0010', args=ex)
        except Exception as ex:
            import traceback
            traceback.print_exc()
            self.add_message('F0002', args=(ex.__class__, ex))

if __name__ == '__main__':
    code = """
a = 1
print(a)
"""
    validator = Validator()
    print(validator.check(code))
The traceback is the following:
Traceback (most recent call last):
File "validator.py", line 16, in <module>
main()
File "validator.py", line 13, in main
print(validator.check(code))
File "validator.py", line 30, in check
self._validator.check(string)
File "validator.py", line 79, in check
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
File "c:\Python33\lib\site-packages\pylint\lint.py", line 659, in check_astroid_module
tokens = tokenize_module(astroid)
File "c:\Python33\lib\site-packages\pylint\utils.py", line 103, in tokenize_module
print(module.file_stream)
AttributeError: 'NoneType' object has no attribute 'file_stream'
# And sometimes this is added :
File "c:\Python33\lib\site-packages\astroid\scoped_nodes.py", line 251, in file_stream
return open(self.file, 'rb')
OSError: [Errno 22] Invalid argument: '<?>'
I'll continue digging tomorrow. :)
I got it running.
The first one (NoneType ...) is really easy and a bug in your code:
Encountering an exception can make get_astroid "fail", i.e. send one syntax error message and return None!
But for the second one... such bullshit in pylint's/logilab's API... Let me explain: your astroid object here is of type astroid.scoped_nodes.Module.
It’s also created by a factory, AstroidBuilder, which sets astroid.file = '<?>'.
Unfortunately, the Module class has the following property:
@property
def file_stream(self):
    if self.file is not None:
        return open(self.file, 'rb')
    return None
And there’s no way to skip that except for subclassing (Which would render us unable to use the magic in AstroidBuilder), so… monkey patching!
We replace the ill-defined property with one that checks an instance for a reference to our code bytes (e.g. astroid._file_bytes) before engaging in above default behavior.
from io import BytesIO

def _monkeypatch_module(module_class):
    if module_class.file_stream.fget.__name__ == 'file_stream_patched':
        return  # only patch if the patch isn't already applied
    old_file_stream_fget = module_class.file_stream.fget
    def file_stream_patched(self):
        if hasattr(self, '_file_bytes'):
            return BytesIO(self._file_bytes)
        return old_file_stream_fget(self)
    module_class.file_stream = property(file_stream_patched)
That monkey patching can be applied just before calling check_astroid_module. But one more thing has to be done. See, there's more implicit behavior: some checkers expect and use astroid's file_encoding field. So we now have this code in the middle of check:
astroid = self.get_astroid(string, modname)
if astroid is not None:
    _monkeypatch_module(astroid.__class__)
    astroid._file_bytes = string.encode('utf-8')
    astroid.file_encoding = 'utf-8'
    self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
One could say that no amount of linting creates actually good code. Unfortunately pylint unites enormous complexity with a specialization in being called on files. Really good code has a nice native API and wraps that with a CLI interface. Don't ask me why file_stream exists if, internally, Module gets built from the source code but then forgets it.
PS: I had to change something else in your code: load_default_plugins has to come before some other stuff (maybe prepare_checkers, maybe something else).
PPS: I suggest subclassing BaseReporter and using that instead of your InMemoryMessagesBuffer.
PPPS: this just got pulled (3.2014), and will fix this: https://bitbucket.org/logilab/astroid/pull-request/15/astroidbuilderstring_build-was/diff
4PS: this is now in the official version, so no monkey patching is required: astroid.scoped_nodes.Module now has a file_bytes property (without a leading underscore).
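If you are on such a version, the monkey patching should drop out and the middle of check presumably reduces to something like this (untested, assuming file_bytes can simply be assigned):

astroid = self.get_astroid(string, modname)
if astroid is not None:
    astroid.file_bytes = string.encode('utf-8')
    astroid.file_encoding = 'utf-8'
    self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)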
Working with an unlocatable stream may definitely cause problems in case of relative imports, since the location is then needed to find the actually imported module.
Astroid supports building an AST from a stream, but this is not used/exposed through Pylint, which is a level higher and designed to work with files. So while you may achieve this, it will need a bit of digging into the low-level APIs.
The easiest way is definitely to save the buffer to a file and then use the SO answer to start pylint programmatically if you wish (totally forgot this other account of mine found in other responses ;). Another option is to write a custom reporter to gain more control.
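That custom-reporter route would start from something like this (a sketch; it assumes a pylint version where BaseReporter is importable from pylint.reporters and receives each message through a handle_message method):

from pylint.reporters import BaseReporter

class ListReporter(BaseReporter):
    """Collect pylint messages in a list instead of writing them to a stream."""
    name = "list"

    def __init__(self):
        super(ListReporter, self).__init__()
        self.collected = []

    def handle_message(self, msg):
        self.collected.append(msg)

    def _display(self, layout):
        pass  # no textual report layout needed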

python windows directory mtime: how to detect package directory new file?

I'm working on an auto-reload feature for WHIFF
http://whiff.sourceforge.net
(so you have to restart the HTTP server less often, ideally never).
I have the following code to reload a package module "location" if a file is added to the package directory. It doesn't work on Windows XP. How can I fix it? I think the problem is that getmtime(dir) doesn't change on Windows when the directory content changes.
I'd really rather not compare an os.listdir(dir) with the last directory content every time I access the package...
if not do_reload and hasattr(location, "__path__"):
    path0 = location.__path__[0]
    if os.path.exists(path0):
        dir_mtime = int(os.path.getmtime(path0))
        if fn_mtime < dir_mtime:
            print "dir change: reloading package root", location
            do_reload = True
            md_mtime = dir_mtime
In the code, "fn_mtime" is the recorded mtime from the last (re)load.
... added comment: I came up with the following workaround, which I think may work, but I don't care for it too much since it involves code generation. I dynamically generate a code fragment to load a module, and if it fails it tries again after a reload. Not tested yet.
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""
def my_import(partname, parent):
    f = None  # for pychecker
    parentname = parent.__name__
    defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
    #pr "executing"
    #pr defn
    try:
        exec(defn)  # defines function f()
    except SyntaxError:
        raise ImportError, "bad function name " + repr(partname) + "?"
    partmodule = f()
    #pr "got", partmodule
    setattr(parent, partname, partmodule)
    #pr "setattr", parent, ".", partname, "=", getattr(parent, partname)
    return partmodule
Other suggestions welcome. I'm not happy about this...
Long time no see. I'm not sure exactly what you're doing, but the equivalent of your code:
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""
to be execed with:
defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
exec(defn)
is (per the docs), assuming parentname names a package and partname names a module in that package (if partname is a top-level name of the parentname package, such as a function or class, you'll have to use a getattr at the end):
import sys

def f(parentname, partname):
    name = '%s.%s' % (parentname, partname)
    try:
        __import__(name)
    except ImportError:
        parent = __import__(parentname)
        reload(parent)
        __import__(name)
    return sys.modules[name]
without exec or anything weird, just call this f appropriately.
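For example (hypothetical names):

part = f("mypackage", "mymodule")
print(part.__name__)   # -> "mypackage.mymodule"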
You can try using getatime() instead.
I'm not understanding your question completely...
Are you calling getmtime() on a directory or an individual file?
There are two things about your first code snippet that concern me:
You cast the float from getmtime to int. Depending on how frequently this code is run, you might get unreliable results.
At the end of the code you assign dir_mtime to a variable md_mtime, but fn_mtime, which you check against, seems never to be updated.
