I'm trying to write a Python script/function to help determine where to import
something from. I want this functionality specifically for PyQt4 but it could
be extended and useful for the Python standard library.
To accomplish this I need to dynamically search modules, shared objects,
classes, and possibly more 'types' of things for a given search term.
I want to pass a module/class/etc. as a string, import it dynamically and then
search it.
The interface of the function would be something like this:
search(object_to_search, search_term)
Examples:
search('datetime', 'today') -> 'datetime.datetime.today'
search('PyQt4', 'NonModal') -> 'PyQt4.QtCore.Qt.NonModal'
search('PyQt4', 'QIcon') -> 'PyQt4.QtGui.QIcon'
I suppose this could also support wildcards like '*' to search all of
sys.path; however, this is beyond the scope of what I'm concerned about at
this time.
This type of script would be especially useful for PyQt4. There are a lot of
'enum-like' things like NonModal above and finding the location to import
or reference them from can be cumbersome.
I tend to use the C++ Qt documentation quite a bit for PyQt4 information since
it's typically easy to translate the examples into Python. The C++ Qt
documentation is more comprehensive. It's also much easier to find sample code
for Qt proper. However, the import locations of some of these 'enum-like'
attributes are hard to find in PyQt4 without manually searching the docs or
guessing. The goal is to automate this process and just have the script tell
me the location of the thing I'm looking for.
I have tried a few different ways of implementing this including just searching
the object recursively with dir(), using the inspect module, and a hacky
version of import/discovery for shared objects. Below are my various
implementations, none of which really work quite right.
Is this type of thing even possible? If so, what is the best way to do this?
Also, I've tried to trim down the number of things I search by omitting
built-in things, functions, etc., but I'm not convinced this is necessary or
even correct.
Attempted solutions:
def search_obj_inspect(obj, search_term):
    """Search given object for 'search_term' with 'inspect' module"""
    import inspect

    def searchable_type(obj):
        is_class = inspect.isclass(obj)
        root_type_or_obj = is_class and obj in (object, type)
        abstract_class = is_class and inspect.isabstract(obj)
        method_or_func = inspect.ismethod(obj) or inspect.isfunction(obj)
        try:
            hidden_attribute = obj.__name__.startswith('__')
        except AttributeError:
            hidden_attribute = False

        searchable = True
        # Avoid infinite recursion when looking through 'object' and 'type'
        # b/c they are members of each other
        if any([abstract_class, root_type_or_obj, hidden_attribute,
                method_or_func, inspect.isbuiltin(obj)]):
            searchable = False
        return searchable

    # FIXME: Search obj itself for search term? obj.__name__ == search_term?
    for n, v in inspect.getmembers(obj, predicate=searchable_type):
        if search_term in n:
            try:
                return v.__name__
            except AttributeError:
                return str(v)
    return None
def search_obj_dir(obj, search_term):
    """Search given object and all its Python submodules for attr"""
    import re
    import types

    SEARCHABLE_TYPES = (types.ModuleType, types.ClassType, types.InstanceType,
                        types.DictProxyType)

    for item in dir(obj):
        if item.startswith('__'):
            continue
        if re.search(search_term, item):
            return obj.__dict__[item].__module__
        else:
            try:
                submod = obj.__dict__[item]
            except (KeyError, AttributeError):
                # Nothing left to recursively search
                continue
            if not isinstance(submod, SEARCHABLE_TYPES):
                continue
            #mod = search_obj_dir(submod, search_term)
            #if mod:
            #    return '.'.join([mod, submod.__name__])
    return None
def search_so(module, attr):
    """Search given module's shared objects for attr"""
    import os
    import re

    try:
        path = module.__path__[0]
    except (AttributeError, IndexError):
        return False

    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.endswith('.so'):
                py_module_name = filename.split('.')[0]
                so_module = __import__(module.__name__, globals(), locals(),
                                       [py_module_name])
                if re.search(attr, py_module_name):
                    return ''.join([module.__name__, '.', py_module_name])
                try:
                    found = search_obj_dir(
                        so_module.__dict__[py_module_name], attr)
                except KeyError:
                    # This isn't a top-level .so, might be a subpackage
                    # FIXME: Could recursively search .so files as well,
                    # but imports might get weird
                    continue
                if found:
                    return found
Thanks to @JonClements for the hint to look into the pydoc module. I found a way to achieve pretty much what I wanted with a combination of the pkgutil and inspect modules.
The 'trick' is to go through all the modules in a package (like PyQt4) with pkgutil.iter_modules and then all non-package objects can be searched with inspect.getmembers. The general idea is the following:
for pkg in given_pkg:
    for module_or_class in pkg:
        for attribute in module_or_class:
            ...
You can see the full solution here.
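For reference, here is a minimal sketch of that pkgutil/inspect approach (the one-level descent into classes and the error handling are illustrative only; a full solution needs more care):

import importlib
import inspect
import pkgutil

def search(package_name, search_term):
    """Yield dotted paths under 'package_name' whose names contain 'search_term'."""
    package = importlib.import_module(package_name)
    # iter_modules only applies to packages; plain modules have no __path__
    for _, module_name, _ in pkgutil.iter_modules(getattr(package, '__path__', [])):
        try:
            module = importlib.import_module('.'.join([package_name, module_name]))
        except ImportError:
            continue
        for name, value in inspect.getmembers(module):
            if search_term in name:
                yield '.'.join([package_name, module_name, name])
            # Descending one level into classes catches 'enum-like'
            # attributes such as NonModal on PyQt4's Qt class
            if inspect.isclass(value):
                for attr in dir(value):
                    if search_term in attr:
                        yield '.'.join([package_name, module_name, name, attr])

# e.g. next(search('PyQt4', 'NonModal')) -> 'PyQt4.QtCore.Qt.NonModal'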
I need to do some maintenance on a system that basically looks like:
(Complicated legacy Python program) -> binary pickle file -> (Another complicated legacy Python program)
Which requires figuring out exactly what is in the intermediate pickle file. I suspect the file format is much simpler than the codes that generate and consume it, and it would help if I could verify that by eyeballing the file itself instead of having to figure out exactly what all the code does.
Is there a way to take the binary pickle file and convert it, not to live objects in memory (which is what every page I could find with a Google search reckons 'unpickle' means) but to some readable text format? JSON, XML, whatever, I'm not fussy about the exact format, anything would do as long as it is a complete and readable representation of the contents that I can load up in a text editor and look at.
Python native types are just readable enough. The hard part in your way is that unpickling will automatically try to import any modules with classes for any instances defined in code in your file.
Fortunately, Python is flexible enough that it's possible to temporarily hack the import machinery in order to fool the unpickling and give it fake classes to fill in with the pickled attributes.
Then, it is a matter of converting the dictionaries of the instances that were unpickled in this way back to a human-readable form.
Fortunately, I maintain a pet project that performs this "temporary import system hacking", so I could lift a couple of lines of code from there to do the same here.
In order to test this thing, I ended up creating a stand-alone script. As the comments in it spell out: don't try to incorporate this in a larger program - it will break the running Python program as it is, by creating fake modules - but it should be enough for you to visualize what is pickled in there. It would be impossible to match all the corner cases you can have there, so you will have to work from here, mostly on the "pythonize" function below:
import re, pickle, pprint, sys

from types import ModuleType
from collections.abc import Sequence, Mapping, Set
from contextlib import contextmanager


def pythonize(obj):
    if isinstance(obj, (str, bytes)):
        return obj
    if isinstance(obj, (Sequence, Set)):
        container = []
        for element in obj:
            container.append(pythonize(element))
        return container
    elif isinstance(obj, Mapping):
        container = {}
    else:
        container = {"$CLS": obj.__class__.__qualname__}
        if not hasattr(obj, "__dict__"):
            return repr(obj)
        obj = obj.__dict__
    for key, value in obj.items():
        container[key] = pythonize(value)
    return container


class FakeModule:
    def __getattr__(self, attr):
        cls = type(attr, (), {})
        setattr(self, attr, cls)
        return cls


def fake_importer(name, globals, locals, fromlist, level):
    module = sys.modules[name] = FakeModule()
    return module


@contextmanager
def fake_import_system():
    # With code lifted from https://github.com/jsbueno/extradict - MapGetter functionality
    builtins = __builtins__ if isinstance(__builtins__, dict) else __builtins__.__dict__
    original_import = builtins["__import__"]
    builtins["__import__"] = fake_importer
    yield None
    builtins["__import__"] = original_import


def unpickle_to_text(stream: bytes):
    # WARNING: this example will wreak havoc in loaded modules!
    # Do not use as part of a complex system!!
    with fake_import_system():
        result = pickle.loads(stream)
    pythonized = pythonize(result)
    return pprint.pformat(pythonized)


if __name__ == "__main__":
    print(unpickle_to_text(open(sys.argv[1], "rb").read()))
Update: as this might have some use for more people, I just made a gist out of this code. Maybe it is even worth putting on pip: https://gist.github.com/jsbueno/b72a20cba121926bec19163780390b92
If the application is old enough it might use pickle protocol 0, which is human-readable.
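A quick way to check: protocol 0 pickles are printable ASCII, so keys and string values show up directly in the raw bytes. For example:

import pickle

data = {'name': 'example', 'values': [1, 2, 3]}
print(pickle.dumps(data, protocol=0))
# prints something like b'(dp0\nVname\np1\nVexample\np2\ns...' --
# the keys and string values are readable as text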
You could try the pickletools module found in Python 3.2+.
Using python3 -m pickletools <file> will "disassemble" the pickle file for you.
Alternatively, you could try loading the data using data = pickle.load(f) and then immediately dumping it using print(json.dumps(data)). Note that this might fail, because pickle can represent more things than JSON can.
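A small sketch combining both suggestions ('data.pkl' is a placeholder filename; default=repr makes json.dumps fall back to repr() for objects JSON can't represent):

import json
import pickle
import pickletools

with open('data.pkl', 'rb') as f:  # placeholder filename
    raw = f.read()

pickletools.dis(raw)  # opcode-level disassembly; no objects are constructed

data = pickle.loads(raw)  # only do this if you trust the file!
print(json.dumps(data, indent=2, default=repr))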
A proper Python module will list all its public symbols in a list called __all__. Managing that list can be tedious, since you'll have to list each symbol twice. Surely there are better ways, probably using decorators so one would merely annotate the exported symbols as @export.
How would you write such a decorator? I'm certain there are different ways, so I'd like to see several answers with enough information that users can compare the approaches against one another.
In Is it a good practice to add names to __all__ using a decorator?, Ed L suggests the following, to be included in some utility library:
import sys


def export(fn):
    """Use a decorator to avoid retyping function/class names.

    * Based on an idea by Duncan Booth:
      http://groups.google.com/group/comp.lang.python/msg/11cbb03e09611b8a
    * Improved via a suggestion by Dave Angel:
      http://groups.google.com/group/comp.lang.python/msg/3d400fb22d8a42e1
    """
    mod = sys.modules[fn.__module__]
    if hasattr(mod, '__all__'):
        name = fn.__name__
        all_ = mod.__all__
        if name not in all_:
            all_.append(name)
    else:
        mod.__all__ = [fn.__name__]
    return fn
We've adapted the name to match the other examples. With this in a local utility library, you'd simply write
from .utility import export
and then start using @export. Just one line of idiomatic Python, you can't get much simpler than this. On the downside, the decorator does require access to the module by using the __module__ property and the sys.modules cache, both of which may be problematic in some of the more esoteric setups (like custom import machinery, or wrapping functions from another module to create functions in this module).
The python part of the atpublic package by Barry Warsaw does something similar to this. It offers some keyword-based syntax, too, but the decorator variant relies on the same patterns used above.
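Usage looks roughly like this (a sketch based on atpublic's documented interface; check the package docs for the current API):

from public import public  # pip install atpublic

@public
def foo():
    pass

# this module's __all__ now contains 'foo'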
This great answer by Aaron Hall suggests something very similar, with two more lines of code as it doesn't use __dict__.setdefault. It might be preferable if manipulating the module __dict__ is problematic for some reason.
You could simply declare the decorator at the module level like this:
__all__ = []

def export(obj):
    __all__.append(obj.__name__)
    return obj
This is perfect if you only use this in a single module. At 4 lines of code (plus probably some empty lines for typical formatting practices) it's not overly expensive to repeat this in different modules, but it does feel like code duplication in those cases.
You could define the following in some utility library:
def exporter():
    all = []
    def decorator(obj):
        all.append(obj.__name__)
        return obj
    return decorator, all

export, __all__ = exporter()
export(exporter)

# possibly some other utilities, decorated with @export as well
Then inside your public library you'd do something like this:
from . import utility

export, __all__ = utility.exporter()

# start using @export
Using the library takes two lines of code here. It combines the definition of __all__ and the decorator. So people searching for one of them will find the other, thus helping readers to quickly understand your code. The above will also work in exotic environments, where the module may not be available from the sys.modules cache or where the __module__ property has been tampered with or some such.
https://github.com/russianidiot/public.py has yet another implementation of such a decorator. Its core file is currently 160 lines long! The crucial point appears to be that it uses the inspect module to obtain the appropriate module based on the current call stack.
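The call-stack idea can be sketched in a few lines (an illustration of the technique, not public.py's actual code):

import inspect

def export(obj):
    # Find the globals of the module that invoked the decorator
    caller_globals = inspect.stack()[1].frame.f_globals
    caller_globals.setdefault('__all__', [])
    if obj.__name__ not in caller_globals['__all__']:
        caller_globals['__all__'].append(obj.__name__)
    return obj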
This is not a decorator approach, but provides the level of efficiency I think you're after.
https://pypi.org/project/auto-all/
You can use the two functions provided with the package to "start" and "end" capturing the module objects that you want included in the __all__ variable.
from auto_all import start_all, end_all

# Imports outside the start and end functions won't be externally available.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects
end_all(globals())
The functions in the package are trivial (a few lines), so could be copied into your code if you want to avoid external dependencies.
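In principle the pair can be as simple as snapshotting the module globals and diffing (a sketch of the idea, not the package's exact source; '_pre_all' is a name invented here):

def start_all(globs):
    # remember which names already existed before the public section
    globs['_pre_all'] = set(globs)

def end_all(globs):
    # everything defined since start_all() becomes public
    globs['__all__'] = [name for name in globs
                        if name not in globs['_pre_all'] and name != '_pre_all']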
While the other variants are technically correct to a certain extent, one might also want to be sure that:
if the target module already has __all__ declared, it is handled correctly;
target appears in __all__ only once:
# utils.py
import sys
from typing import Any


def export(target: Any) -> Any:
    """
    Mark a module-level object as exported.
    Simplifies tracking of objects available via wildcard imports.
    """
    mod = sys.modules[target.__module__]
    __all__ = getattr(mod, '__all__', None)
    if __all__ is None:
        __all__ = []
        setattr(mod, '__all__', __all__)
    elif not isinstance(__all__, list):
        __all__ = list(__all__)
        setattr(mod, '__all__', __all__)
    target_name = target.__name__
    if target_name not in __all__:
        __all__.append(target_name)
    return target
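A usage sketch (mylib.py is an illustrative module name):

# mylib.py
from utils import export

@export
def public_api():
    pass

def _private_helper():
    pass

# after import: mylib.__all__ == ['public_api']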
I apologize for the mess that is the title.
I'm tackling a problem in which I want to have a module in a subdirectory below my main.py. I'd like to have any number of .py files in that subdirectory. From there, I'd like to take in user input, for example the string "foo", then search through all the methods in this module and call the matching one if it exists. I'm looking at some sort of Frankenstein combination of either dir() or the inspect module and the getattr/hasattr methods, but haven't had any luck figuring out a way that works.
inspect.getmembers(module_name, inspect.ismethod)
This returns me a large mess of pre-defined methods that I'm unsure how to sort through. If there's a better way of going about that, TYIA. Otherwise, how would I go about the situation described above?
For your concrete case, this should work. Loop through all files in your subdirectory, try to import them as modules and try to find and execute the function whose name you are given.
import importlib, os

pkg = 'some_pkg'  # this must be a Python package

for x in os.listdir(pkg):
    try:
        module = importlib.import_module(pkg + '.' + x.replace('.py', ''))
        fnc = getattr(module, 'foo')
        fnc()
        # break in case you want to stop at the first 'foo' fnc you find
    except:
        print('no foo here, or not a module!')
Quick background: writing a module. One of my objects has methods that may or may not be successfully completed - depending on the framework used underneath my module. So a few methods first need to check what framework they actually have under their feet. Current way of tackling this is:
def framework_dependent_function():
    try:
        import module.that.may.not.be.available
    except ImportError:
        # the required functionality is not available
        # this function can not be run
        raise WrongFramework
        # or should I just leave the previous exception reach higher levels?
    # ... and so on ...
Yet something in my mind keeps telling me that doing imports in the middle of a file is a bad thing. Can't remember why, can't even come up with a reason - apart from slightly messier code, I guess.
So, is there anything downright wrong about doing what I'm doing here? Are there other ways of scouting out what environment the module is running in, perhaps somewhere near __init__?
This version may be faster, because not every call to the function needs to try to import the necessary functionality:
try:
    import module.that.may.not.be.available

    def framework_dependent_function():
        # whatever
        ...
except ImportError:
    def framework_dependent_function():
        # the required functionality is not available
        # this function can not be run
        raise NotImplementedError
This also allows you to do a single attempt to import the module, then define all of the functions that might not be available in a single block, perhaps even as
def notimplemented(*args, **kwargs):
    raise NotImplementedError

fn1 = fn2 = fn3 = notimplemented
Put this at the top of your file, near the other imports, or in a separate module (my current project has one called utils.fixes). If you don't like function definitions in a try/except block, then do
try:
    from module.that.may.not.be.available import what_we_need
except ImportError:
    what_we_need = notimplemented
If these functions need to be methods, you can then add them to your class later:
class Foo(object):
    # assuming you've added a self argument to the previous function
    framework_dependent_method = framework_dependent_function
Similar to larsmans' suggestion, but with a slight change: the function is defined unconditionally first, and the try/except around the import replaces it only when the framework is missing:

def NotImplemented():
    # note: this shadows the builtin NotImplemented
    raise NotImplementedError

def framework_dependent_function():
    # whatever
    return

try:
    import something.external
except ImportError:
    framework_dependent_function = NotImplemented

I don't like the idea of function definitions inside the try/except block around the import.
You could also use imp.find_module (see here) in order to check for the presence of a specific module.
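For example (note that imp is deprecated since Python 3.4; importlib.util.find_spec is the modern equivalent):

import imp

def framework_available(name):
    try:
        imp.find_module(name)  # raises ImportError if the module is absent
        return True
    except ImportError:
        return False

# modern equivalent:
# import importlib.util
# available = importlib.util.find_spec(name) is not None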
I'm trying to find a way to lazily load a module-level variable.
Specifically, I've written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won't tell you where its download folder is, so I've written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the "Downloads" directory.
This takes a second or two, so I'd like to have it evaluated lazily, rather than at module import time.
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
You can't do it with modules, but you can disguise a class "as if" it was a module, e.g., in itun.py, code...:
import sys

class _Sneaky(object):
    def __init__(self):
        self.download = None

    @property
    def DOWNLOAD_PATH(self):
        if not self.download:
            self.download = heavyComputations()
        return self.download

    def __getattr__(self, name):
        return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()
Now anybody can import itun... and get, in fact, your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object than inside _Sneaky.
It turns out that as of Python 3.7, it's possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562 and documented in the data model chapter in the Python reference documentation.
# mymodule.py

from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ...  # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
I used Alex' implementation on Python 3.3, but this crashes miserably:
The code
def __getattr__(self, name):
    return globals()[name]
is not correct because an AttributeError should be raised, not a KeyError.
This crashed immediately under Python 3.3, because a lot of introspection is done
during the import, looking for attributes like __path__, __loader__ etc.
Here is the version that we use now in our project to allow for lazy imports
in a module. The __init__ of the module is delayed until the first access to
an attribute that does not have a special name:
""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)
Lazy module variables--can it be done?
class _Sneaky(object):
def __init__(self, name):
self.module = sys.modules[name]
sys.modules[name] = self
self.initializing = True
def __getattr__(self, name):
# call module.__init__ after import introspection is done
if self.initializing and not name[:2] == '__' == name[-2:]:
self.initializing = False
__init__(self.module)
return getattr(self.module, name)
_Sneaky(__name__)
The module now needs to define an __init__ function. This function can be used
to import modules that might import ourselves:

def __init__(module):
    ...
    # do something that imports config.py again
    ...
The code can be put into another module, and it can be extended with properties
as in the examples above.
Maybe that is useful for somebody.
For Python 3.5 and 3.6, the proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module's __class__. So, here's a solution loosely based on Christian Tismer's answer, but probably not resembling it much at all:
import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path

sys.modules[__name__].__class__ = _Sneaky
For Python 3.7 and later, you can define a module-level __getattr__() function. See PEP 562 for details.
Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:
Inside your module, put something like:
def _long_function():
    # print() call to show this is executed only once
    print("Determining DOWNLOAD_FOLDER_PATH...")

    # Determine the module-level variable
    path = "/some/path/here"

    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path

    # ... and return it
    return path

def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()

    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
From this it should be clear that the _long_function() isn't executed when you import your module, e.g.:
print("-- before import --")
import somemodule
print("-- after import --")
results in just:
-- before import --
-- after import --
But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.
For example, with the first block above inside the module "somemodule.py", the following code:
import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
produces:
--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--
or, more clearly:
# LINE OF CODE                             # OUTPUT
import somemodule                          # (nothing)
print("--")                                # --
print(somemodule.DOWNLOAD_FOLDER_PATH)     # Determining DOWNLOAD_FOLDER_PATH...
                                           # /some/path/here
print("--")                                # --
print(somemodule.DOWNLOAD_FOLDER_PATH)     # /some/path/here
print("--")                                # --
Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.
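A minimal sketch of that, alongside the module-level __getattr__ above:

def __dir__():
    # advertise the lazy attribute in addition to the real module contents
    return sorted(list(globals()) + ["DOWNLOAD_FOLDER_PATH"])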
Is there any way to lazily assign a module variable when it's first accessed or will I have to rely on a function?
I think you are correct in saying that a function is the best solution to your problem here.
I will give you a brief example to illustrate.
# myfile.py - an example module with some expensive module-level code.
import os

# expensive operation to crawl up in directory structure
The expensive operation will be executed on import if it is at module level. There is no way to stop this, short of lazily importing the entire module!
# myfile2.py - a module with expensive code placed inside a function.
import os

def getdownloadsfolder(curdir=None):
    """A function that will search upward from the user's current directory
    to find the 'Downloads' folder."""
    # expensive operation now here.
You will be following best practice by using this method.
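For illustration, the elided body might look something like this (a sketch only; the real logic depends on how the iTunes library is laid out):

import os

def getdownloadsfolder(curdir=None):
    """Search upward from curdir for a 'Downloads' directory."""
    path = os.path.abspath(curdir or os.getcwd())
    while True:
        candidate = os.path.join(path, 'Downloads')
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent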
Recently I came across the same problem, and have found a way to do it.
class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        # print('initializing')
        pass

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)
        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)
With this LazyObject, you can define an init method for the object, and the object will be initialized lazily. Example code looks like:
import time

o = LazyObject()

def slow_init(self):
    time.sleep(1)  # simulate slow initialization
    self.data = 'done'

o.init = slow_init
The o object above will have exactly the same methods as the 'done' object; for example, you can do:
# o will be initialized, then apply the `len` method
assert len(o) == 4
complete code with tests (works in 2.7) can be found here:
https://gist.github.com/observerss/007fedc5b74c74f3ea08
If that variable lived in a class rather than a module, then you could overload __getattr__, or better yet, populate it in __init__.
SPEC 1
Probably the best-known recipe for lazy loading of module attributes (and modules) is in SPEC 1 (Draft) at scientific-python.org. SPECs are operational guidelines for projects in the Scientific Python ecosystem. There is discussion around SPEC 1 at the Scientific Python Discourse, and the solution is offered as a package on PyPI as lazy_loader. The lazy_loader implementation relies on the module __getattr__ support introduced in Python 3.7 (PEP 562), and it is used in scikit-image, NetworkX, and partially in SciPy.
Example usage:
The following example is using the lazy_loader PyPI package. You could also just copy-paste the source code to be part of your project.
# mypackage/__init__.py
import lazy_loader

__getattr__, __dir__, __all__ = lazy_loader.attach(
    __name__,
    submodules=['bar'],
    submod_attrs={
        'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
        'grok.spam': ['spam_a', 'spam_b', 'spam_c'],
    },
)
This is the lazy-import equivalent of:
from . import bar
from .foo.morefoo import FooFilter, do_foo, MODULE_VARIABLE
from .grok.spam import (spam_a, spam_b, spam_c)
Short explanation on lazy_loader.attach
If you want to lazy-load a module, you list it in submodules (which is a list)
If you want to lazy-load something from a module (function, class, etc.), you list it in submod_attrs (which is a dict)
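With that attach() call in place, the imports are deferred until first use (the names here match the hypothetical layout above):

import mypackage            # cheap: nothing under mypackage is imported yet
mypackage.do_foo()          # first access triggers the import of mypackage.foo.morefoo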
Type checking
Static type checkers and IDEs cannot infer type information from lazily loaded imports. As a workaround, you may use type stubs (.pyi files), like this:
# mypackage/__init__.pyi
from .foo.morefoo import FooFilter as FooFilter, do_foo as do_foo, MODULE_VARIABLE as MODULE_VARIABLE
from .grok.spam import spam_a as spam_a, spam_b as spam_b, spam_c as spam_c
SPEC 1 mentions that this X as X syntax is necessary due to PEP 484.
Side notes
There was recently a PEP for Lazy Imports, PEP 690, but it was rejected.
In TensorFlow, there is a lazy-loading class at util.lazyloader.
There is a blog post from Brett Cannon (a Python core developer), where he showed in 2018 a module __getattr__ based implementation of lazy_loader and provided it in a package called modutil, but the project is marked archived on GitHub. This has been an inspiration for the scientific-python lazy_loader.