I want to get all the functions and classes in the module __main__ of the source code directory /tmp/rebound/rebound.
When I use the pyclbr.readmodule_ex API:
source_code_data = pyclbr.readmodule_ex(source_code_module, path=source_code_path)
I pass it the module and its path:
DEBUG:root:Source code module: __main__, Source code path: ['/tmp/rebound/rebound/rebound']
I then get this error:
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/util.py", line 69, in _find_spec_from_path
raise ValueError('{}.__spec__ is None'.format(name))
ValueError: __main__.__spec__ is None
I then tried the private function _readmodule, which is not meant for public use:
source_code_data = pyclbr._readmodule(source_code_module, source_code_path, )
But I could not work out what value the inpackage parameter should take.
Upon tracing the code via debugger, I spotted a mistake:
def _find_spec_from_path(name, path=None):
    """Return the spec for the specified module.
    First, sys.modules is checked to see if the module was already imported. If
    so, then sys.modules[name].__spec__ is returned. If that happens to be
    set to None, then ValueError is raised. If the module is not in
    sys.modules, then sys.meta_path is searched for a suitable spec with the
    value of 'path' given to the finders. None is returned if no spec could
    be found.
    Dotted names do not have their parent packages implicitly imported. You will
    most likely need to explicitly import all parent packages in the proper
    order for a submodule to get the correct spec.
    """
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        module = sys.modules[name]
        if module is None:
            return None
        try:
            spec = module.__spec__
        except AttributeError:
            raise ValueError('{}.__spec__ is not set'.format(name)) from None
        else:
            if spec is None:
                raise ValueError('{}.__spec__ is None'.format(name))
            return spec
This function is in python3.8/importlib/util.py, and it treats __main__ as the built-in module because execution falls into the else block.
How do I differentiate __main__ of my target source code to read from the built-in __main__? In other words, how do I read the module __main__ of the codebase: rebound?
TL;DR
Try:
source_code_data = pyclbr.readmodule_ex("rebound.__main__", path=source_code_path)
Explanation
As you already know, _find_spec_from_path looks up name in sys.modules, and __main__ is always present there.
If you inspect sys.modules.keys() you'll notice that it contains dot-separated module names.
Example from an IPython shell:
'IPython.display',
'IPython.extensions',
'IPython.extensions.storemagic',
'IPython.lib',
'IPython.lib.backgroundjobs',
'IPython.lib.clipboard',
'IPython.lib.display',
'IPython.lib.pretty',
'IPython.lib.security',
'IPython.paths',
Once you realize that you are looking for rebound.__main__ and not __main__, the fix becomes obvious: to step into the if block, the name must not be in sys.modules. As a last remark, _find_spec_from_path has no bugs.
# python3.8/importlib/util.py
def _find_spec_from_path(name, path=None):
    # ...
    if name not in sys.modules:
        return _find_spec(name, path)
    else:
        # ...
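As a rough, self-contained check of the dotted-name approach, the sketch below builds a throwaway package and reads its __main__ with pyclbr. The package name demo_pkg and its contents are made up for illustration, and it assumes that path should list the directory that contains the package rather than the package directory itself.

```python
import os
import pyclbr
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for the rebound package.
    package_dir = os.path.join(root, "demo_pkg")
    os.mkdir(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "__main__.py"), "w") as handle:
        handle.write(
            "def main():\n"
            "    pass\n"
            "\n"
            "class Runner:\n"
            "    def run(self):\n"
            "        pass\n"
        )

    # "demo_pkg.__main__" is not a key in sys.modules, so the lookup goes
    # through the finders with the given path instead of returning the
    # __main__ of the running program.
    source_code_data = pyclbr.readmodule_ex("demo_pkg.__main__", path=[root])
    names = sorted(source_code_data)

print(names)
```

The keys of the returned dictionary are the top-level functions and classes of the file; methods are reachable through each class descriptor.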
I solved my last problem, but now I get pretty much this:
Traceback (most recent call last):
File "C:\Users\Qihua Huang\AppData\Local\Programs\Python\Python310\Lib\site-packages\win32\lib\pywintypes.py", line 115, in <module>
__import_pywin32_system_module__("pywintypes", globals())
File "C:\Users\Qihua Huang\AppData\Local\Programs\Python\Python310\Lib\site-packages\win32\lib\pywintypes.py", line 104, in __import_pywin32_system_module__
old_mod = sys.modules[modname]
KeyError: 'pywintypes'
I opened pywintypes.py and inserted print statements to track down the error:
# Magic utility that "redirects" to pywintypesxx.dll
import importlib.util, importlib.machinery, sys, os
def __import_pywin32_system_module__(modname, globs):
# This has been through a number of iterations. The problem: how to
# locate pywintypesXX.dll when it may be in a number of places, and how
# to avoid ever loading it twice. This problem is compounded by the
# fact that the "right" way to do this requires win32api, but this
# itself requires pywintypesXX.
# And the killer problem is that someone may have done 'import win32api'
# before this code is called. In that case Windows will have already
# loaded pywintypesXX as part of loading win32api - but by the time
# we get here, we may locate a different one. This appears to work, but
# then starts raising bizarre TypeErrors complaining that something
# is not a pywintypes type when it clearly is!
# So in what we hope is the last major iteration of this, we now
# rely on a _win32sysloader module, implemented in C but not relying
# on pywintypesXX.dll. It then can check if the DLL we are looking for
# lib is already loaded.
# See if this is a debug build.
suffix = "_d" if "_d.pyd" in importlib.machinery.EXTENSION_SUFFIXES else ""
filename = "%s%d%d%s.dll" % (modname,sys.version_info[0],sys.version_info[1],suffix)
if hasattr(sys, "frozen"):
# If we are running from a frozen program (py2exe, McMillan, freeze)
# then we try and load the DLL from our sys.path
# XXX - This path may also benefit from _win32sysloader? However,
# MarkH has never seen the DLL load problem with py2exe programs...
for look in sys.path:
# If the sys.path entry is a (presumably) .zip file, use the
# directory
if os.path.isfile(look):
look = os.path.dirname(look)
found = os.path.join(look, filename)
if os.path.isfile(found):
break
else:
raise ImportError(
"Module '%s' isn't in frozen sys.path %s" % (modname, sys.path)
)
else:
# First see if it already in our process - if so, we must use that.
from win32 import _win32sysloader
found = _win32sysloader.GetModuleFilename(filename)
if found is None:
# We ask Windows to load it next. This is in an attempt to
# get the exact same module loaded should pywintypes be imported
# first (which is how we are here) or if, eg, win32api was imported
# first thereby implicitly loading the DLL.
# Sadly though, it doesn't quite work - if pywintypesxx.dll
# is in system32 *and* the executable's directory, on XP SP2, an
# import of win32api will cause Windows to load pywintypes
# from system32, where LoadLibrary for that name will
# load the one in the exe's dir.
# That shouldn't really matter though, so long as we only ever
# get one loaded.
found = _win32sysloader.LoadModule(filename)
if found is None:
# Windows can't find it - which although isn't relevent here,
# means that we *must* be the first win32 import, as an attempt
# to import win32api etc would fail when Windows attempts to
# locate the DLL.
# This is most likely to happen for "non-admin" installs, where
# we can't put the files anywhere else on the global path.
# If there is a version in our Python directory, use that
if os.path.isfile(os.path.join(sys.prefix, filename)):
found = os.path.join(sys.prefix, filename)
if found is None:
# Not in the Python directory? Maybe we were installed via
# easy_install...
if os.path.isfile(os.path.join(os.path.dirname(__file__), filename)):
found = os.path.join(os.path.dirname(__file__), filename)
print(found)
found='C:\\Users\\Qihua Huang\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\pywin32_system32\\pywintypes310.dll'
print(found)
# There are 2 site-packages directories - one "global" and one "user".
# We could be in either, or both (but with different versions!). Factors include
# virtualenvs, post-install script being run or not, `setup.py install` flags, etc.
# In a worst-case, it means, say 'python -c "import win32api"'
# will not work but 'python -c "import pywintypes, win32api"' will,
# but it's better than nothing.
# We prefer the "user" site-packages if it exists...
if found is None:
import site
maybe = os.path.join(site.USER_SITE, "pywin32_system32", filename)
print(maybe)
if os.path.isfile(maybe):
found = maybe
print(found)
# Or the "global" site-packages.
if found is None:
import sysconfig
maybe = os.path.join(
sysconfig.get_paths()["platlib"], "pywin32_system32", filename
)
print(maybe)
if os.path.isfile(maybe):
found = maybe
print(found)
if found is None:
# give up in disgust.
raise ImportError("No system module '%s' (%s)" % (modname, filename))
# After importing the module, sys.modules is updated to the DLL we just
# loaded - which isn't what we want. So we update sys.modules to refer to
# this module, and update our globals from it.
old_mod = sys.modules[modname]
# Load the DLL.
loader = importlib.machinery.ExtensionFileLoader(modname, found)
spec = importlib.machinery.ModuleSpec(name=modname, loader=loader, origin=found)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
# Check the sys.modules[] behaviour we describe above is true...
assert sys.modules[modname] is mod
# as above - re-reset to the *old* module object then update globs.
sys.modules[modname] = old_mod
globs.update(mod.__dict__)
__import_pywin32_system_module__("pywintypes", globals())
The output printed before the error was:
C:\Users\*user*\AppData\Local\Programs\Python\Python310\Lib\site-packages\win32\lib\pywintypes310.dll
C:\Users\*user*\AppData\Local\Programs\Python\Python310\Lib\site-packages\pywin32_system32\pywintypes310.dll
To support extensions in my Python project, I'm trying to create a pseudo-module that will serve "extension modules" as its submodules. I'm having a problem treating the submodules as modules: it seems I need to access them with from ... import on the main pseudo-module and can't just import them by their full dotted path.
Here is a minimal working example:
import sys
from types import ModuleType
class Foo(ModuleType):
    @property
    def bar(self):
        # Here I would actually find the location of `bar.py` and load it
        bar = ModuleType('foo.bar')
        sys.modules['foo.bar'] = bar
        return bar
sys.modules['foo'] = Foo('foo')
from foo import bar # without this line the next line fails
import foo.bar
This works, but if I comment out the from foo import bar line, it'll fail with:
ImportError: No module named bar
on Python2, and on Python3 it'll fail with:
ModuleNotFoundError: No module named 'foo.bar'; 'foo' is not a package
If I add the fields to make it a package:
class Foo(ModuleType):
    __all__ = ('bar',)
    __package__ = 'foo'
    __path__ = []
    __file__ = __file__
It'll fail on:
ModuleNotFoundError: No module named 'foo.bar'
From what I understand, the problem is that I did not set sys.modules['foo.bar'] yet. But... to fill sys.modules I need to load the module first, and I don't want to do it unless the user of my project explicitly imports it.
Is there any way to make Python realize that when it sees import foo.bar it needs to load foo first (or I can just guarantee that foo will already be loaded at that point) and take bar from it?
This post does NOT answer "This is how you do it."
If you want to know how to do this yourself look at PEP 302 or Idan Arye's solution.
This post instead presents a recipe that makes it easy to write. The recipe is at the end of this answer.
The block of code below defines two classes intended for use: PseudoModule and PseudoPackage. Their behaviour differs only in whether import foo.x should raise an error stating that foo isn't a package, or try to load x and make sure it's a module. Several example uses are outlined below.
PseudoModule
PseudoModule can be used as a decorator on a function. It creates a new module object which, the first time an attribute is accessed, calls the decorated function with the name of the attribute and the namespace of previously defined elements.
For example, this will make a module that assigns a new integer to each attribute accessed:
@PseudoModule
def access_tracker(attr, namespace):
    namespace["_count"] = namespace.get("_count", -1) + 1
    return namespace["_count"]
#PseudoModule will set `namespace[attr] = <return value>` for you
#this can be overridden by passing `remember_results=False` to the constructor
sys.modules["access_tracker"] = access_tracker
from access_tracker import zero, one, two, three
assert zero == 0 and one == 1 and two == 2 and three == 3
PseudoPackage
PseudoPackage is used the same way as PseudoModule. However, if the decorated function returns a module (or package), it corrects the name so that it is qualified as a subpackage, and sys.modules is updated as needed. (The top-level package still needs to be added to sys.modules manually.)
Here is an example use of PseudoPackage:
spam_submodules = {"bacon"}
spam_attributes = {"eggs", "ham"}
@PseudoPackage
def spam(name, namespace):
    print("getting a component of spam:", name)
    if name in spam_submodules:
        @PseudoModule
        def submodule(attr, nested_namespace):
            print("getting a component of submodule {}: {}".format(name, attr))
            return attr #use the string of the attribute
        return submodule #PseudoPackage will rename the module to be spam.bacon for us
    elif name in spam_attributes:
        return "supported attribute"
    else:
        raise AttributeError("spam doesn't have any {!r}.".format(name))
sys.modules["spam"] = spam
import spam.bacon
#prints "getting a component of spam: bacon"
assert spam.bacon.something == "something"
#prints "getting a component of submodule bacon: something"
from spam import eggs
#prints "getting a component of spam: eggs"
assert eggs == "supported attribute"
import spam.ham #ham isn't a submodule, raises error!
The way PseudoPackage is set up also makes arbitrary-depth packages very easy, although this specific example doesn't accomplish much:
def make_abstract_package(qualname = ""):
    "makes a PseudoPackage that has arbitrary nesting of subpackages"
    def gen_func(attr, namespace):
        print("getting {!r} from package {!r}".format(attr, qualname))
        return make_abstract_package("{}.{}".format(qualname, attr))
    #can pass the name of the module as second argument if needed
    return PseudoPackage(gen_func, qualname)
sys.modules["foo"] = make_abstract_package("foo")
from foo.bar.baz import thing_I_want
##prints:
# getting 'bar' from package 'foo'
# getting 'baz' from package 'foo.bar'
# getting 'thing_I_want' from package 'foo.bar.baz'
print(thing_I_want)
#prints "<module 'foo.bar.baz.thing_I_want' from '<PseudoPackage>'>"
A few notes on implementation
As general guidelines:
The function that computes attributes of the module should not import the module it's defining the attributes for
If you want a package or module to be available for import, you need to put it in sys.modules yourself.
PseudoPackage assumes each submodule is unique; don't reuse module objects.
It is also worth noting that sys.modules is only updated with submodules of a PseudoPackage when an import statement requires the name to be a module. For example, if foo is a package already in sys.modules but foo.x has not been referenced yet, then all of these assertions pass:
assert "foo.x" not in sys.modules and not hasattr(foo,"x")
import foo; foo.x #foo.x is computed but not added to sys.modules
assert "foo.x" not in sys.modules and hasattr(foo,"x")
from foo import x #x is retrieved from namespace but sys.modules is still not affected
assert "foo.x" not in sys.modules
import foo.x #if x is a module then "foo.x" is added to sys.modules
assert "foo.x" in sys.modules
Also, in the above case, if foo.x isn't a module then the statement import foo.x raises a ModuleNotFoundError.
Finally, while the problematic edge cases I have identified can be avoided by following the guidelines above, the docstring for _PseudoPackageLoader describes the implementation details responsible for unwanted behaviour for possible future modifications.
The recipe
import sys
from types import ModuleType
import importlib.abc #uses Loader and MetaPathFinder, more for inspection purposes than use
class RawPseudoModule(ModuleType):
    """
    see PseudoModule for documentation, this class is not intended for direct use.
    RawPseudoModule does not handle __path__ so the generating function of direct
    instances are expected to make and return an appropriate value for __path__
    *** if you do not know what an appropriate value for __path__ is
    then use PseudoModule instead ***
    """
    #using slots keeps these two variables out of the module dictionary
    __slots__ = ["__generating_func", "__remember_results"]
    def __init__(self, func, name=None, remember_results = True):
        name = name or func.__name__
        super(RawPseudoModule, self).__init__(name)
        self.__file__ = "<{0.__class__.__name__}>".format(self)
        self.__generating_func = func
        self.__remember_results = remember_results
    def __getattr__(self, attr):
        value = self.__generating_func(attr, vars(self))
        if self.__remember_results:
            setattr(self, attr, value)
        return value
class PseudoModule(RawPseudoModule):
    """
    A module that has attributes generated from a specified function
    The generating function passed to the constructor should have the signature:
    f(attr:str, namespace:dict) -> object:
    - attr is the name of the attribute accessed
    - namespace is the currently defined values in the module
    the function should return a value for the attribute or raise an AttributeError if it doesn't exist.
    by default the result is then saved to the namespace so you don't
    have to explicitly do "namespace[attr] = <value>" however this behaviour
    can be overridden by specifying "remember_results = False" in the constructor.
    If no name is specified in the constructor the function name will be
    used for the module name instead, this allows the class to be used as a decorator
    Note: the PseudoModule class is set up so that "import foo.bar"
    when foo is a PseudoModule will fail stating "'foo' is not a package".
    - to allow importing submodules use PseudoPackage.
    - to handle the internal __path__ manually use RawPseudoModule.
    Note: the module is NOT added to sys.modules automatically.
    """
    def __getattr__(self, attr):
        #to not have submodules then __path__ must not exist
        if attr == "__path__":
            msg = "{0.__name__} is a PseudoModule, it is not a package so it doesn't have a __path__"
            #this error message would only be seen by people who explicitly access __path__
            raise AttributeError(msg.format(self))
        return super(PseudoModule, self).__getattr__(attr)
class PseudoPackage(RawPseudoModule):
    """
    A version of PseudoModule that sets itself up to allow importing subpackages
    When a submodule is imported from a PseudoPackage:
    - it is evaluated with the generating function.
    - the name of the submodule is overridden to be correctly qualified
    - and it is added to sys.modules to allow repeated imports.
    Note: the top level package still needs to be added to sys.modules manually
    Note: A RecursionError will be raised if the code that generates submodules
    attempts to import another submodule from the PseudoPackage.
    """
    #IMPLEMENTATION DETAIL: technically this doesn't deal with adding submodules to
    # sys.modules, that is handled in _PseudoPackageLoader
    # which explicitly checks for instances of PseudoPackage
    __path__ = [] #packages must have a __path__ to be recognized as packages.
    def __getattr__(self, attr):
        value = super(PseudoPackage, self).__getattr__(attr)
        if isinstance(value, ModuleType):
            #I'm just going to say if it's a module then the name must be in this format.
            value.__name__ = self.__name__ + "." + attr
        return value
class _PseudoPackageLoader(importlib.abc.Loader, importlib.abc.MetaPathFinder):
    """
    Singleton finder and loader for pseudo packages
    Whenever a subpackage of a PseudoPackage (that is already in sys.modules) is imported
    this will handle loading it and adding the subpackage to sys.modules
    Note that although PEP 302 states the finder should not depend on the parent
    being loaded in sys.modules, this is implemented under the understanding that
    the user of PseudoPackage will add their module to sys.modules manually themselves
    so this will work only when the parent is present in sys.modules
    Also PEP 302 indicates the module should be added to sys.modules first in case
    it is imported during its execution, however this is impossible due to the
    nature of how the module actually gets loaded.
    So for heaven's sake don't try to import a pseudo package or a module that uses
    a pseudo package from within the code that generates it.
    I have only tested this when the sub module is either PseudoModule or PseudoPackage
    and it was created new from the generating function, ideally there would be a way
    to allow the generating function to return an unexecuted module and this would
    properly handle executing it but I don't know how to deal with that.
    """
    def find_module(self, fullname, path):
        #this will only support loading if the parent package is a PseudoPackage
        base,_,_ = fullname.rpartition(".")
        if isinstance(sys.modules.get(base), PseudoPackage):
            return self
        #I found that `if path is PseudoPackage.__path__` worked the same way for all the cases I tested
        #however since load_module will fail if the base part isn't in sys.modules
        # it seems safer to just check for that.
    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        base,_,sub = fullname.rpartition(".")
        parent = sys.modules[base]
        try:
            submodule = getattr(parent, sub)
        except AttributeError:
            #when we just access `foo.x` it raises an AttributeError
            #but `import foo.x` should instead raise an ImportError
            raise ImportError("cannot import name {!r}".format(sub))
        if not isinstance(submodule, ModuleType):
            #match the format of error raised when the submodule isn't a module
            #example: `import sys.path` raises the same format of error.
            raise ModuleNotFoundError("No module named {}".format(fullname))
        #fill all the fields as described in PEP 302 except __name__
        submodule.__loader__ = self
        submodule.__package__ = base
        submodule.__file__ = getattr(submodule, "__file__", "<submodule of PseudoPackage>")
        #if there was a way to do this before the module was made that'd be nice
        sys.modules[fullname] = submodule
        #if we needed to execute the body of an unloaded module it'd be done here.
        return submodule
#add the loader to sys.meta_path so it will handle our pseudo packages
sys.meta_path.append(_PseudoPackageLoader())
Thanks to the link @TadhgMcDonald-Jensen provided I managed to solve it:
import sys
from types import ModuleType
class FooImporter(object):
    module = ModuleType('foo')
    module.__path__ = [module.__name__]
    def find_module(self, fullname, path):
        if fullname == self.module.__name__:
            return self
        if path == [self.module.__name__]:
            return self
    def load_module(self, fullname):
        if fullname == self.module.__name__:
            return sys.modules.setdefault(fullname, self.module)
        assert fullname.startswith(self.module.__name__ + '.')
        try:
            return sys.modules[fullname]
        except KeyError:
            submodule = ModuleType(fullname)
            name = fullname[len(self.module.__name__) + 1:]
            setattr(self.module, name, submodule)
            sys.modules[fullname] = submodule
            return submodule
sys.meta_path.append(FooImporter())
from foo import bar
@TadhgMcDonald-Jensen - please post an answer so that I can accept it.
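The find_module/load_module pair used above is the legacy PEP 302 protocol; as far as I know, the import system stopped calling it in Python 3.12. Below is a minimal sketch of the same lazy-submodule idea written against find_spec, create_module and exec_module. The names VirtualPackageFinder, demo_virtual_pkg and loaded_by are made up for illustration, and the sketch only handles one level of submodules.

```python
import sys
import importlib.abc
import importlib.machinery


class VirtualPackageFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve `prefix` and any `prefix.<name>` as lazily created modules."""

    def __init__(self, prefix):
        self.prefix = prefix

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.prefix:
            # is_package=True gives the module an (empty) __path__,
            # which is what makes `import prefix.x` legal.
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        if fullname.startswith(self.prefix + "."):
            return importlib.machinery.ModuleSpec(fullname, self)
        return None  # not ours, let the other finders try

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # A real project would locate and execute the extension code here.
        module.loaded_by = type(self).__name__


sys.meta_path.append(VirtualPackageFinder("demo_virtual_pkg"))

# No `from demo_virtual_pkg import bar` is needed first.
import demo_virtual_pkg.bar

print(demo_virtual_pkg.bar.loaded_by)
```

Nothing is created until the import statement runs, and the import machinery itself registers the submodule in sys.modules and binds it as an attribute of the parent.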
I hope the following question is not too long, but otherwise I cannot explain my problem and what I want:
Learned from How to use importlib to import modules from arbitrary sources? (my question of yesterday)
I have written a specific loader for a new file type (.xxx).
(In fact the xxx is an encrypted version of a pyc to protect code from being stolen).
I would like just to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.
Now, the loader is ModuleLoader, inheriting from importlib.machinery.SourcelessFileLoader.
Using sys.path_hooks the loader shall be added as a hook:
myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: This is activated once by calling modloader.activateLoader()
Upon loading a module named test (which is a test.xxx) I get:
>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>
However, when I delete content of sys.path_hooks before adding the hook:
sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
>>> modloader.activateLoader()
>>> import test
using xxx class
in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...
GENERATE CODE OBJECT ...
2 0 LOAD_CONST 0
3 LOAD_CONST 1 ('foo2')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (foo2)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>>
>>> test
<module 'test' from '.\\test.xxx'>
The module is imported correctly after conversion of the file's content to a code object.
However I cannot load the same module from a package: import pack.test
Note: __init__.py is of course present as an empty file in the pack directory.
>>> import pack.test
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>
Worse, I can no longer load plain *.py modules from that package either; I get the same error as above:
>>> import pack.testpy
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>
As I understand it, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without deleting sys.path_hooks) not recognize the new extension "xxx", while the second variant (deleting sys.path_hooks) does?
It looks like the machinery is throwing an exception rather than traversing further to the next entry, when an entry of sys.path_hooks is not able to recognize "xxx".
And why does the second version work for py, pyc and xxx modules in the current directory, but not in the package pack? I would expect py and pyc not to work even in the current directory, because sys.path_hooks contains only a hook for "xxx"...
The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!
Quick Breakdown
sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder in particular will then take the contents of sys.path and pass it to the factory functions in sys.path_hooks. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache. From then on PathFinder will only ask those cached finder instances if they can provide the requested module.
If you look at the contents of sys.path_importer_cache, you'll see all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc) will be mapped to other finders.
Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.
Alternatively, if you insert your new factory to sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal with what it will accept, this would lead to only your loader being used, as you've already observed.
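A small sketch of that behaviour, run against a temporary directory so that sys.path_hooks itself is left alone. The file name demo_mod.xxx is made up; the hook is built the same way as in the question, except that it only knows about ".py".

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

# A path hook like the default one, but aware of ".py" files only.
hook = FileFinder.path_hook((SourceFileLoader, [".py"]))

with tempfile.TemporaryDirectory() as directory:
    # A module file with an extension this finder was never told about.
    with open(os.path.join(directory, "demo_mod.xxx"), "w") as handle:
        handle.write("VALUE = 1\n")

    # The hook accepts the entry simply because it is an existing directory...
    finder = hook(directory)
    # ...even though the finder it returns cannot locate the module.
    spec = finder.find_spec("demo_mod")

    # Only a non-directory entry makes the hook raise ImportError,
    # which is the signal PathFinder needs to move on to the next hook.
    try:
        hook(os.path.join(directory, "demo_mod.xxx"))
        rejected = False
    except ImportError:
        rejected = True

print(type(finder).__name__, spec, rejected)
```

Because the first hook wins for every directory, a second FileFinder hook appended afterwards is never asked about that directory.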
Making it Work
So you can either try to adjust that existing factory to also support your file extension and importer (which is difficult as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is add a new meta path finder.
For example, from my own project:
import sys
from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename
from sibilant.module import prep_module, exec_module
SOURCE_SUFFIXES = [".lspy", ".sibilant"]
_path_importer_cache = {}
_path_hooks = []
class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """
    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()
    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None
    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder
class SibilantSourceFileLoader(FileLoader):
    def create_module(self, spec):
        return None
    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")
    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))
        prep_module(module)
        exec_module(module, source, filename=filename)
def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)
def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())
def _install():
    done = False
    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True
    return install
_install = _install()
_install()
The SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hooks and sys.path_importer_cache with similar implementations which instead look in a _path_hooks and _path_importer_cache which are local to this module.
During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse the sys.path and try to find a match with one of my own file extensions.
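For comparison, the other route mentioned earlier, a single FileFinder that knows the default suffixes plus the new one, can be sketched roughly as follows. XxxLoader and demo_xxx_mod are hypothetical names, and the loader simply treats .xxx files as plain Python source. The hook is called directly on a temporary directory here; actually installing it would mean replacing the default FileFinder hook in sys.path_hooks and clearing sys.path_importer_cache, which this sketch deliberately does not do.

```python
import os
import tempfile
import importlib.util
from importlib.machinery import (
    BYTECODE_SUFFIXES,
    EXTENSION_SUFFIXES,
    SOURCE_SUFFIXES,
    ExtensionFileLoader,
    FileFinder,
    SourceFileLoader,
    SourcelessFileLoader,
)


class XxxLoader(SourceFileLoader):
    """Hypothetical loader: treats .xxx files as ordinary Python source."""


# The usual loaders first, then the extra one for the new suffix.
loader_details = [
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
    (XxxLoader, [".xxx"]),
]
combined_hook = FileFinder.path_hook(*loader_details)

with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "demo_xxx_mod.xxx"), "w") as handle:
        handle.write("ANSWER = 42\n")

    finder = combined_hook(directory)
    spec = finder.find_spec("demo_xxx_mod")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(type(spec.loader).__name__, module.ANSWER)
```

The trade-off is that this touches the global default hook once installed, whereas the separate meta path finder above leaves the standard machinery untouched.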
Figuring More Out
I ended up delving into the source for the _bootstrap_external module
https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install function and the PathFinder.find_spec method are the best starting points for seeing why things work the way they do.
@obriencj's analysis of the situation is correct, but I came up with a different solution that doesn't require putting anything in sys.meta_path. Instead, it installs a special hook in sys.path_hooks that acts as a sort of middleware between the PathFinder in sys.meta_path and the hooks in sys.path_hooks. Rather than just using the first hook that says "I can handle this path!", it tries all matching hooks in order until it finds one that actually returns a useful ModuleSpec from its find_spec method:
@PathEntryFinder.register
class MetaFileFinder:
"""
A 'middleware', if you will, between the PathFinder sys.meta_path hook,
and sys.path_hooks hooks--particularly FileFinder.
The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
it will handle *any* directory. So if one wants to insert another
FileFinder.path_hook into sys.path_hooks, that will totally take over
importing for any directory, and previous path hooks will be ignored.
This class provides its own sys.path_hooks hook as follows: If inserted
on sys.path_hooks (it should be inserted early so that it can supersede
anything else). Its find_spec method then calls each hook on
sys.path_hooks after itself and, for each hook that can handle the given
sys.path entry, it calls the hook to create a finder, and calls that
finder's find_spec. So each sys.path_hooks entry is tried until a spec is
found or all finders are exhausted.
"""
class hook:
"""
Use this little internal class rather than a function with a closure
or a classmethod or anything like that so that it's easier to
identify our hook and skip over it while processing sys.path_hooks.
"""
def __init__(self, basepath=None):
self.basepath = os.path.abspath(basepath)
def __call__(self, path):
if not os.path.isdir(path):
raise ImportError('only directories are supported', path=path)
elif not self.handles(path):
raise ImportError(
'only directories under {} are supported'.format(
self.basepath), path=path)
return MetaFileFinder(path)
def handles(self, path):
"""
Return whether this hook will handle the given path, depending on
what its basepath is.
"""
path = os.path.abspath(path)
return (self.basepath is None or
os.path.commonpath([self.basepath, path]) == self.basepath)
def __init__(self, path):
self.path = path
self._finder_cache = {}
def __repr__(self):
return '{}({!r})'.format(self.__class__.__name__, self.path)
def find_spec(self, fullname, target=None):
if not sys.path_hooks:
return None
last = len(sys.path_hooks) - 1
for idx, hook in enumerate(sys.path_hooks):
if isinstance(hook, self.__class__.hook):
continue
finder = None
try:
if hook in self._finder_cache:
finder = self._finder_cache[hook]
if finder is None:
# We've tried this finder before and got an ImportError
continue
except TypeError:
# The hook is unhashable
pass
if finder is None:
try:
finder = hook(self.path)
except ImportError:
pass
try:
self._finder_cache[hook] = finder
except TypeError:
# The hook is unhashable for some reason so we don't bother
# caching it
pass
if finder is not None:
spec = finder.find_spec(fullname, target)
if (spec is not None and
(spec.loader is not None or idx == last)):
# If no __init__.<suffix> was found by any Finder,
# we may be importing a namespace package (which
# FileFinder.find_spec returns in this case). But we
# only want to return the namespace ModuleSpec if we've
# exhausted every other finder first.
return spec
# Module spec not found through any of the finders
return None
def invalidate_caches(self):
for finder in self._finder_cache.values():
finder.invalidate_caches()
#classmethod
def install(cls, basepath=None):
"""
Install the MetaFileFinder in the front sys.path_hooks, so that
it can support any existing sys.path_hooks and any that might
be appended later.
If given, only support paths under and including basepath. In this
case it's not necessary to invalidate the entire
sys.path_importer_cache, but only any existing entries under basepath.
"""
if basepath is not None:
basepath = os.path.abspath(basepath)
hook = cls.hook(basepath)
sys.path_hooks.insert(0, hook)
if basepath is None:
sys.path_importer_cache.clear()
else:
for path in list(sys.path_importer_cache):
if hook.handles(path):
del sys.path_importer_cache[path]
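The "promiscuous" behaviour of FileFinder.path_hook mentioned in the docstring above can be checked with a few lines. This is only a small sketch: the '.pymy' suffix and the helper name claims_directory are made up for illustration.

```python
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

# Hypothetical suffix, used only for illustration.
hook = FileFinder.path_hook((SourceFileLoader, ['.pymy']))


def claims_directory(path_hook, directory):
    """Return True if path_hook agrees to build a finder for directory."""
    try:
        path_hook(directory)
    except ImportError:
        return False
    return True


with tempfile.TemporaryDirectory() as empty_dir:
    # The directory contains no .pymy files at all, yet the hook still
    # claims it, so hooks placed after it would never be consulted.
    print(claims_directory(hook, empty_dir))
```

The hook only refuses paths that are not directories, which is why a second FileFinder hook placed first in sys.path_hooks shadows every hook after it.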
This is still, depressingly, far more complicated than should be necessary. I feel like on Python 2, before the import system rewrite, this was much simpler, since less of the support for the built-in module types (.py, etc.) was built on top of the import hooks themselves, so it was harder to break importing normal modules by adding hooks for new module types. I'm going to start a discussion on python-ideas to see if there's any way we can improve this situation.
I came up with yet another alternative tweak. I won't say it is beautiful, as it wraps a closure around an already existing one, but at least it is short :)
It adds loaders to the default FileLoader objects through a new hook. The original path_hook_for_FileFinder is wrapped in a closure and the loaders are injected into the FileFinder objects returned by the original hook.
After the new hook is added, sys.path_importer_cache is cleared, as it is already filled with the original FileFinder objects. Those could also be updated dynamically, but I did not bother for now.
Disclaimer: not extensively tested yet. It does what I need in the easiest possible way I know, but the import system is complicated enough to produce funny side-effects for a tweak like this.
import importlib
import importlib.machinery
import sys


def extend_path_hook_for_FileFinder(*loader_details):
    orig_hook, orig_pos = None, None
    for i, hook in enumerate(sys.path_hooks):
        # not every hook has a __name__ (e.g. callable instances)
        if getattr(hook, '__name__', None) == 'path_hook_for_FileFinder':
            orig_hook, orig_pos = hook, i
            break
    if orig_hook is None:
        raise RuntimeError('no FileFinder path hook found in sys.path_hooks')
    sys.path_hooks.remove(orig_hook)

    def extended_path_hook_for_FileFinder(path):
        orig_finder = orig_hook(path)
        loaders = []
        for loader, suffixes in loader_details:
            loaders.extend((suffix, loader) for suffix in suffixes)
        # _loaders is a private attribute of FileFinder
        orig_finder._loaders.extend(loaders)
        return orig_finder

    sys.path_hooks.insert(orig_pos, extended_path_hook_for_FileFinder)


MY_SUFFIXES = ['.pymy']


class MySourceFileLoader(importlib.machinery.SourceFileLoader):
    pass


loader_detail = (MySourceFileLoader, MY_SUFFIXES)
extend_path_hook_for_FileFinder(loader_detail)
# empty cache as it is already filled with simple FileFinder
# objects for the most common path elements
sys.path_importer_cache.clear()
# sys.path_importer_cache is a plain dict; cache invalidation
# goes through importlib
importlib.invalidate_caches()
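To see how a FileFinder maps suffixes to loaders without touching sys.path_hooks at all, one can build a finder by hand. This is only a sketch under assumptions: load_from_directory is a hypothetical helper, and the '.pymy' suffix and demo file exist purely for illustration.

```python
import importlib.util
import os
import tempfile
from importlib.machinery import (
    FileFinder, SourceFileLoader, SOURCE_SUFFIXES)


class MySourceFileLoader(SourceFileLoader):
    """Placeholder loader for the hypothetical '.pymy' suffix."""


def load_from_directory(directory, name):
    """Load module 'name' from 'directory', accepting .py and .pymy files."""
    finder = FileFinder(
        directory,
        (SourceFileLoader, SOURCE_SUFFIXES),   # the usual source suffixes
        (MySourceFileLoader, ['.pymy']),       # the extra, made-up suffix
    )
    spec = finder.find_spec(name)
    if spec is None:
        raise ImportError('no module named {!r} in {}'.format(name, directory))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, 'demo_mod.pymy'), 'w') as fh:
        fh.write('VALUE = 42\n')
    demo_module = load_from_directory(demo_dir, 'demo_mod')
    print(demo_module.VALUE, type(demo_module.__loader__).__name__)
```

The module is not registered in sys.modules here, so this shows only the finder and loader step, not a full import.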
I am trying to write a static documentation generator for a UI-library for Python 3 (toga).
Within the project there are the subdirectories :
iOS
    setup.py
    toga_iOS
        __init__.py
        app.py
mac
    setup.py
    toga_mac
        __init__.py
        app.py
I want to iterate through the directories and get the value of the __all__ attribute in the toga_x module. The issue I have is that each module is designed to be installed on that platform, e.g. the Windows one requires a Python package that installs on Windows, the Mac on Mac etc.
If I use importlib or __import__ it fails because within each of the __init__.py files it will import the platform-specific packages, e.g.
PLATFORM_LIST = {
    'android': 'Android',
    'cocoa': 'Mac OS cocoa',
    'gtk': 'GTK +'
}
for module, label in PLATFORM_LIST.items():
    print(module)
    sys.path.append(os.path.join('../src', module))
    module = importlib.import_module('toga_'+module)
    sys.modules[module] = module
    _all = getattr(module, '__all__')
This fails with "ImportError: No module named 'android'".
There are lots of options: ast, pylint, compile, inspectlib. Which would be the best approach for getting the value of __all__ without having to install all the dependent modules?
You can use ast to find a static assignment node in a Python source file:
import ast

def get_declaration_from_source(text, name="__all__"):
    """gets a single declaration from python source code"""
    tree = ast.parse(text)
    #walk through each statement (more or less) in the module
    for node in tree.body:
        #if assigning to a single target (a = b= 5 is multiple)
        if isinstance(node, ast.Assign) and len(node.targets)==1:
            target = node.targets[0]
            #if assigning to the name we are looking for
            if isinstance(target, ast.Name) and target.id == name:
                #use literal_eval to get the actual value, can raise ValueError if not a literal value.
                return ast.literal_eval(node.value)
    raise NameError("name %r was not found"%(name,))
You can use the source file of the random module as an example:
import random
with open(random.__file__, "r") as f:
    names = get_declaration_from_source(f.read())
>>> names
['Random', 'seed', 'random', 'uniform', 'randint', 'choice', 'sample', 'randrange', 'shuffle', 'normalvariate', 'lognormvariate', 'expovariate', 'vonmisesvariate', 'gammavariate', 'triangular', 'gauss', 'betavariate', 'paretovariate', 'weibullvariate', 'getstate', 'setstate', 'getrandbits', 'choices', 'SystemRandom']
Note that ast.parse raises SyntaxError if the source code contains a syntax error (and may raise other errors while compiling), and ast.literal_eval raises ValueError if the value isn't a Python literal, which, as you have commented, shouldn't be a problem here. Cheers.
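If some of the __init__.py files build __all__ in more than one step, the same idea can be extended. The following is only a sketch under that assumption: read_dunder_all is a hypothetical helper name, and it handles just plain assignments and `+=` with literal values.

```python
import ast


def read_dunder_all(source):
    """Collect __all__ from module source without importing the module.

    Handles `__all__ = [...]` and `__all__ += [...]` at module level;
    returns None if __all__ is never assigned.
    """
    names = None
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            if any(isinstance(t, ast.Name) and t.id == "__all__"
                   for t in node.targets):
                # literal_eval raises ValueError for non-literal values
                names = list(ast.literal_eval(node.value))
        elif isinstance(node, ast.AugAssign):
            if (isinstance(node.target, ast.Name)
                    and node.target.id == "__all__"
                    and isinstance(node.op, ast.Add)):
                names = (names or []) + list(ast.literal_eval(node.value))
    return names


# The platform-specific import is only parsed, never executed.
example_source = "import android\n__all__ = ['App']\n__all__ += ['Window']\n"
print(read_dunder_all(example_source))
```

Since nothing is executed, the missing platform packages never come into play; the trade-off is that an __all__ computed dynamically cannot be recovered this way.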
I'm working on an auto-reload feature for WHIFF
http://whiff.sourceforge.net
(so you have to restart the HTTP server less often, ideally never).
I have the following code to reload a package module "location"
if a file is added to the package directory. It doesn't work on Windows XP.
How can I fix it? I think the problem is that getmtime(dir) doesn't
change on Windows when the directory content changes?
I'd really rather not compare an os.listdir(dir) with the last directory
content every time I access the package...
if not do_reload and hasattr(location, "__path__"):
    path0 = location.__path__[0]
    if os.path.exists(path0):
        dir_mtime = int( os.path.getmtime(path0) )
        if fn_mtime<dir_mtime:
            print "dir change: reloading package root", location
            do_reload = True
            md_mtime = dir_mtime
In the code the "fn_mtime" is the recorded mtime from the last (re)load.
... added comment: I came up with the following work around, which I think
may work, but I don't care for it too much since it involves code generation.
I dynamically generate a code fragment to load a module and if it fails
it tries again after a reload. Not tested yet.
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""

def my_import(partname, parent):
    f = None # for pychecker
    parentname = parent.__name__
    defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
    #pr "executing"
    #pr defn
    try:
        exec(defn) # defines function f()
    except SyntaxError:
        raise ImportError, "bad function name "+repr(partname)+"?"
    partmodule = f()
    #pr "got", partmodule
    setattr(parent, partname, partmodule)
    #pr "setattr", parent, ".", partname, "=", getattr(parent, partname)
    return partmodule
Other suggestions welcome. I'm not happy about this...
Long time no see. I'm not sure exactly what you're doing, but the equivalent of your code:
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""
to be execed with:
defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
exec(defn)
is (per the docs), assuming parentname names a package and partname names a module in that package (if partname is a top-level name of the parentname package, such as a function or class, you'll have to use a getattr at the end):
import sys

def f(parentname, partname):
    name = '%s.%s' % (parentname, partname)
    try:
        __import__(name)
    except ImportError:
        parent = __import__(parentname)
        reload(parent)
        __import__(name)
    return sys.modules[name]
No exec or anything weird is needed; just call this f appropriately.
You can try using getatime() instead.
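For readers on Python 3, where reload is no longer a builtin, roughly the same retry logic might look like the following. This is a sketch, not part of the original answer: the function name import_child is made up, and the call to importlib.invalidate_caches() is an added assumption so that the finders notice newly created files.

```python
import importlib


def import_child(parentname, partname):
    """Import parentname.partname, retrying once after reloading the parent."""
    name = '%s.%s' % (parentname, partname)
    try:
        return importlib.import_module(name)
    except ImportError:
        # One more time: refresh the parent package and the finder caches,
        # then retry. A second failure propagates to the caller.
        parent = importlib.import_module(parentname)
        importlib.reload(parent)
        importlib.invalidate_caches()
        return importlib.import_module(name)


print(import_child('json', 'decoder').__name__)
```

As in the answer above, this returns the submodule itself; if partname is a function or class defined in the package, a getattr on the parent would be needed instead.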
I'm not understanding your question completely...
Are you calling getmtime() on a directory or an individual file?
There are two things about your first code snippet that concern me:
You cast the float from getmtime to int. Depending on how frequently this code is run, you might get unreliable results.
At the end of the code you assign dir_mtime to a variable md_mtime. fn_mtime, which you check against, seems not to be updated.
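The first concern above can be illustrated with a few lines. This is a sketch in Python 3 with made-up helper names; the timestamps are invented values chosen to fall within the same second.

```python
import os


def changed_truncated(recorded, current):
    """Comparison as in the question: both mtimes are cut down to whole seconds."""
    return int(recorded) < int(current)


def changed_full(recorded, current):
    """Comparison on the float values as returned by os.path.getmtime."""
    return recorded < current


def mtime_ns(path):
    """Integer nanosecond mtime, which avoids float rounding altogether."""
    return os.stat(path).st_mtime_ns


# A change 0.7 seconds after the recorded load, within the same second:
recorded, current = 100.2, 100.9
print(changed_truncated(recorded, current), changed_full(recorded, current))
```

With the int cast, any change that lands in the same second as the last (re)load goes unnoticed. Whether the directory mtime is updated at all on a given platform and filesystem is a separate question that this sketch does not address.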