Export decorator that manages __all__ (Python)

A proper Python module will list all its public symbols in a list called __all__. Managing that list can be tedious, since you'll have to list each symbol twice. Surely there are better ways, probably using decorators, so one would merely annotate the exported symbols as @export.
How would you write such a decorator? I'm certain there are different ways, so I'd like to see several answers with enough information that users can compare the approaches against one another.

In Is it a good practice to add names to __all__ using a decorator?, Ed L suggests the following, to be included in some utility library:
import sys

def export(fn):
    """Use a decorator to avoid retyping function/class names.

    * Based on an idea by Duncan Booth:
      http://groups.google.com/group/comp.lang.python/msg/11cbb03e09611b8a
    * Improved via a suggestion by Dave Angel:
      http://groups.google.com/group/comp.lang.python/msg/3d400fb22d8a42e1
    """
    mod = sys.modules[fn.__module__]
    if hasattr(mod, '__all__'):
        name = fn.__name__
        all_ = mod.__all__
        if name not in all_:
            all_.append(name)
    else:
        mod.__all__ = [fn.__name__]
    return fn
We've adapted the name to match the other examples. With this in a local utility library, you'd simply write
from .utility import export
and then start using @export. Just one line of idiomatic Python; you can't get much simpler than this. On the downside, the decorator does require access to the module it is modifying, via the __module__ attribute and the sys.modules cache, both of which may be problematic in some of the more esoteric setups (like custom import machinery, or wrapping functions from another module to create functions in this module).
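For illustration, a consuming module might look like this (module and function names here are hypothetical):

# shapes.py
from .utility import export

@export
def area(r):
    return 3.14159 * r * r

def _helper():  # undecorated, so it never enters __all__
    pass

After import, shapes.__all__ is ['area'].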
The Python part of the atpublic package by Barry Warsaw does something similar to this. It offers some keyword-based syntax, too, but the decorator variant relies on the same patterns used above.
This great answer by Aaron Hall suggests something very similar, with two more lines of code as it doesn't use __dict__.setdefault. It might be preferable if manipulating the module __dict__ is problematic for some reason.

You could simply declare the decorator at the module level like this:
__all__ = []

def export(obj):
    __all__.append(obj.__name__)
    return obj
This is perfect if you only use this in a single module. At 4 lines of code (plus probably some empty lines for typical formatting practices) it's not overly expensive to repeat this in different modules, but it does feel like code duplication in those cases.
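A minimal sketch of that single-module pattern in action (names hypothetical):

# mymodule.py
__all__ = []

def export(obj):  # the four-line decorator from above
    __all__.append(obj.__name__)
    return obj

@export
def greet():
    return "hello"

Now from mymodule import * brings in greet, while export itself stays out of __all__ because it was never decorated.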

You could define the following in some utility library:
def exporter():
    all_ = []
    def decorator(obj):
        all_.append(obj.__name__)
        return obj
    return decorator, all_

export, __all__ = exporter()
export(exporter)
# possibly some other utilities, decorated with @export as well
Then inside your public library you'd do something like this:
from . import utility

export, __all__ = utility.exporter()

# start using @export
Using the library takes two lines of code here. It combines the definition of __all__ and the decorator. So people searching for one of them will find the other, thus helping readers to quickly understand your code. The above will also work in exotic environments, where the module may not be available from the sys.modules cache or where the __module__ property has been tampered with or some such.
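Put together, a consuming module might look like this (module and function names hypothetical):

# library.py
from . import utility

export, __all__ = utility.exporter()

@export
def useful():
    ...

def _internal():
    ...

After this runs, library.__all__ is ['useful'].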

https://github.com/russianidiot/public.py has yet another implementation of such a decorator. Its core file is currently 160 lines long! The crucial point appears to be that it uses the inspect module to find the appropriate module based on the current call stack.
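For comparison, a minimal sketch of the call-stack idea (an assumption about the approach, not the actual public.py code):

import inspect

def export(obj):
    # When applied as a decorator, the caller's frame is the module body
    # that is defining the decorated object.
    caller_globals = inspect.currentframe().f_back.f_globals
    all_ = caller_globals.setdefault('__all__', [])
    if obj.__name__ not in all_:
        all_.append(obj.__name__)
    return obj

This avoids both sys.modules and the __module__ attribute, at the cost of relying on frame introspection, which is CPython-specific.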

This is not a decorator approach, but provides the level of efficiency I think you're after.
https://pypi.org/project/auto-all/
You can use the two functions provided with the package to "start" and "end" capturing the module objects that you want included in the __all__ variable.
from auto_all import start_all, end_all

# Imports outside the start and end functions won't be externally available.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects.
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects.
end_all(globals())
The functions in the package are trivial (a few lines), so could be copied into your code if you want to avoid external dependencies.
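As a rough sketch of how such start/end capturing could work (an assumption based on the package description, not the actual auto_all source):

def start_all(global_vars):
    # Remember which names exist before the public section begins.
    global_vars['_capture_start'] = set(global_vars)

def end_all(global_vars):
    # Everything added since start_all() is considered public.
    start = global_vars.pop('_capture_start')
    global_vars.setdefault('__all__', []).extend(
        name for name in global_vars
        if name not in start and name != '__all__')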

While the other variants are technically correct to a certain extent, one might also want to make sure that:
- if the target module already has __all__ declared, it is handled correctly;
- the target appears in __all__ only once:
# utils.py
import sys
from typing import Any

def export(target: Any) -> Any:
    """
    Mark a module-level object as exported.
    Simplifies tracking of objects available via wildcard imports.
    """
    mod = sys.modules[target.__module__]
    __all__ = getattr(mod, '__all__', None)
    if __all__ is None:
        __all__ = []
        setattr(mod, '__all__', __all__)
    elif not isinstance(__all__, list):
        __all__ = list(__all__)
        setattr(mod, '__all__', __all__)
    target_name = target.__name__
    if target_name not in __all__:
        __all__.append(target_name)
    return target
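Usage then looks the same as with the earlier variants (module name hypothetical):

# mylib.py
from utils import export

@export
def ping():
    return "pong"

Here mylib.__all__ is created on first use and ends up as ['ping'].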

Related

How do I dynamically generate module contents in Python?

I know there are ways to perform dynamic import of Python modules themselves, but I would like to know if there's a way to write a module such that it can dynamically create its own module contents on demand. I am imagining a module hook that looks something like:
# In some_module.py:
def __import_name__(name):
    return some_object
Such that if I were to write from some_module import foo in a script, Python will call some_module.__import_name__("foo") and let me dynamically create and return the contents.
I haven't found anything that works like this exactly in the documentation, though there are references to an "import protocol" with "finders" and "loaders" and "meta hooks" and "import path hooks" that permit customization of the import logic, and I imagine that such a thing is possible.
I discovered you can modify the behavior of a Module from within itself in arbitrary ways by setting sys.modules[__name__].__class__ to a class that implements whatever your chosen behavior.
import sys
import types

class DynamicModule(types.ModuleType):
    # This function is what gets called on `from this_module import whatever`
    # or `this_module.whatever` accesses.
    def __getattr__(self, name):
        # This check ensures we don't intercept special values like __path__
        # if they're not set elsewhere.
        if name.startswith("__") and name.endswith("__"):
            return self.__getattribute__(name)
        return make_object(name)

    # Helpful to define this here if you need to dynamically construct the
    # full set of available attributes.
    @property
    def __all__(self):
        return get_all_objects()

# This ensures the DynamicModule class is used to define the behavior of
# this module.
sys.modules[__name__].__class__ = DynamicModule
Something about this feels like it may not be the intended path to do something like this, though, and that I should be hooking into the importlib machinery.
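Worth noting: since Python 3.7 (PEP 562), a module can define a module-level __getattr__ (and __dir__) directly, without swapping out its class. A minimal sketch, reusing the hypothetical make_object and get_all_objects helpers from above:

# In some_module.py (Python 3.7+):
def __getattr__(name):
    # Called only when normal attribute lookup on the module fails.
    return make_object(name)

def __dir__():
    return list(get_all_objects())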

Neat way to define __all__?

Hi, I'm building my own package and I have a question about __all__.
Is there a neat way to define __all__, other than explicitly typing each and every function in the module?
I find it very tedious...
I'm trying to write code that wraps frequently used libraries such as numpy, pytorch, and os. The problem is that the libraries I use to build my modules also get imported when I import my package.
I want to import every function / class that I defined, but I don't want the third-party libraries that I used in the process to get imported.
I use from .submodule import * in my __init__.py so that I can access my functions inside the submodule directly (just like we can access functions directly from the top package, like np.sum() or torch.sum()).
My submodule has a lot of functions, and I want to import all of them to __init__.py, except for the third-party packages that I used.
I see that __all__ defines what to import when from package import * is called.
For example,
utils.py
__all__ = ['a']

def a():
    pass

def b():
    pass
__init__.py
from .utils import *
and
>>> import package
>>> package.a()
>>> package.b()
Traceback (most recent call last):
  ...
AttributeError: module 'package' has no attribute 'b'
What I want is something like
__all__ = Some_neat_fancy_method()
I tried locals() and dir(), but got lost along the way.
Any suggestions?
As others have pointed out, the whole point of __all__ is to explicitly specify what gets exposed to star-imports. By default everything is. If you really want to specify what doesn't get exposed instead, you can do a little trick and include all modules in __all__ and then remove the ones you want to exclude.
For example:
def _exclude(exclusions: list) -> list:
    import types

    # Add everything as long as it's not a module and not prefixed with _.
    functions = [name for name, function in globals().items()
                 if not (name.startswith('_') or isinstance(function, types.ModuleType))]

    # Remove the exclusions from the functions.
    for exclusion in exclusions:
        if exclusion in functions:
            functions.remove(exclusion)

    del types  # deleting types from scope, introduced from the import
    return functions

# The _ prefix is important, to not add these to __all__.
_exclusions = ["function1", "function2"]
__all__ = _exclude(_exclusions)
You can of course repurpose this to simply include everything that's not a module or prefixed with _, but that serves little use, since everything is included in star-imports if you don't specify __all__; so I thought it was better to include the exclusion idea. This way you can simply tell it to exclude specific functions.
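For example, if utils.py defines function1 through function3 (hypothetical names) and ends with the lines above, a star-import picks up everything except the excluded names:

>>> from utils import *
>>> function3()
>>> function1()
Traceback (most recent call last):
  ...
NameError: name 'function1' is not defined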
Is there any neat way to define __all__, other than explicitly typing each and every function in the module?
Not built-in, no. But defining __all__ by hand is basically the entire point; if you want to include everything in __all__, you can just do nothing at all:
If __all__ is not defined, the statement from sound.effects import * [...] ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package.
The entire point of __all__ is to restrict what gets "exported" by star-imports. There's no real way for Python to know that except by having you tell it, for each symbol, whether it should be there or not.
One easy workaround is to alias all of your imports with a leading underscore. Anything with a leading underscore is excluded from from x import * style imports.
import numpy as _np
import pandas as _pd

def my_fn():
    ...

How should I perform imports in a python module without polluting its namespace?

I am developing a Python package for dealing with some scientific data. There are multiple frequently-used classes and functions from other modules and packages, including numpy, that I need in virtually every function defined in any module of the package.
What would be the Pythonic way to deal with them? I have considered multiple variants, but each has its own drawbacks.
Import the classes at module-level with from foreignmodule import Class1, Class2, function1, function2
Then the imported functions and classes are easily accessible from every function. On the other hand, they pollute the module namespace, making dir(package.module) and help(package.module) cluttered with imported functions.
Import the classes at function-level with from foreignmodule import Class1, Class2, function1, function2
The functions and classes are easily accessible and do not pollute the module, but imports from up to a dozen modules in every function look like a lot of duplicate code.
Import the modules at module-level with import foreignmodule
The lack of pollution comes at the cost of prepending the module name to every function or class call.
Use some artificial workaround like using a function body for all these manipulations and returning only the objects to be exported... like this
def _export():
    from foreignmodule import Class1, Class2, function1, function2

    def myfunc(x):
        return function1(x, function2(x))

    return myfunc

myfunc = _export()
del _export
This manages to solve both problems, module namespace pollution and ease of use for functions... but it seems to be not Pythonic at all.
So what solution is the most Pythonic? Is there another good solution I overlooked?
Go ahead and do your usual from W import X, Y, Z and then use the __all__ special symbol to define what actual symbols you intend people to import from your module:
__all__ = ('MyClass1', 'MyClass2', 'myvar1', …)
This defines the symbols that will be imported into a user's module if they import * from your module.
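A small illustration (module and names hypothetical): only what is listed in __all__ survives the star-import, while helper imports stay hidden:

# yourmodule.py
from math import sqrt  # needed internally, not exported

__all__ = ('area',)

def area(r):
    return sqrt(r) * r  # arbitrary example computation

# client.py
from yourmodule import *
area(4.0)   # works
sqrt(4.0)   # NameError: name 'sqrt' is not defined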
In general, Python programmers should not be using dir() to figure out how to use your module, and if they are doing so it might indicate a problem somewhere else. They should be reading your documentation or typing help(yourmodule) to figure out how to use your library. Or they could browse the source code themselves, in which case (a) the difference between things you import and things you define is quite clear, and (b) they will see the __all__ declaration and know which toys they should be playing with.
If you try to support dir() in a situation like this for a task for which it was not designed, you will have to place annoying limitations on your own code, as I hope is clear from the other answers here. My advice: don't do it! Take a look at the Standard Library for guidance: it does from … import … whenever code clarity and conciseness require it, and provides (1) informative docstrings, (2) full documentation, and (3) readable code, so that no one ever has to run dir() on a module and try to tell the imports apart from the stuff actually defined in the module.
One technique I've seen used, including in the standard library, is to use import module as _module or from module import var as _var, i.e. assigning imported modules/variables to names starting with an underscore.
The effect is that other code, following the usual Python convention, treats those members as private. This applies even for code that doesn't look at __all__, such as IPython's autocomplete function.
An example from Python 3.3's random module:
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from collections.abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
Another technique is to perform imports in function scope, so that they become local variables:
"""Some module"""
# imports conventionally go here
def some_function(arg):
"Do something with arg."
import re # Regular expressions solve everything
...
The main rationale for doing this is that it is effectively lazy, delaying the importing of a module's dependencies until they are actually used. Suppose one function in the module depends on a particular huge library. Importing the library at the top of the file would mean that importing the module would load the entire library. This way, importing the module can be quick, and only client code that actually calls that function incurs the cost of loading the library. Further, if the dependency library is not available, client code that doesn't need the dependent feature can still import the module and call the other functions. The disadvantage is that using function-level imports obscures what your code's dependencies are.
Example from Python 3.3's os.py:
def get_exec_path(env=None):
    """[...]"""
    # Use a local import instead of a global import to limit the number of
    # modules loaded at startup: the os module is always loaded at startup by
    # Python. It may also avoid a bootstrap issue.
    import warnings
Import the module as a whole: import foreignmodule. What you claim as a drawback is actually a benefit. Namely, prepending the module name makes your code easier to maintain and makes it more self-documenting.
Six months from now when you look at a line of code like foo = Bar(baz) you may ask yourself which module Bar came from, but with foo = cleverlib.Bar it is much less of a mystery.
Of course, the fewer imports you have, the less of a problem this is. For small programs with few dependencies it really doesn't matter all that much.
When you find yourself asking questions like this, ask yourself what makes the code easier to understand, rather than what makes the code easier to write. You write it once but you read it a lot.
For this situation I would go with an all_imports.py file which has all the

from foreignmodule import .....
from anothermodule import .....

and then in your working modules

import all_imports as fgn  # or whatever you want to prepend
...
something = fgn.Class1()
Another thing to be aware of:

__all__ = ['func1', 'func2', 'this', 'that']

Now, any functions/classes/variables/etc. that are in your module but not in your module's __all__ will not show up in help(), and won't be imported by from mymodule import *. See Making python imports more structured? for more info.
I would compromise and just pick a short alias for the foreign module:
import foreignmodule as fm
It saves you completely from the pollution (probably the bigger issue) and at least reduces the prepending burden.
I know this is an old question. It may not be 'Pythonic', but the cleanest way I've discovered for exporting only certain module definitions is, really as you've found, to wrap the entire module in a function. But instead of returning the exported names, you can simply declare them global (global thus in essence becomes a kind of 'export' keyword):
def module():
    global MyPublicClass, ExportedModule

    import somemodule as ExportedModule
    import anothermodule as PrivateModule

    class MyPublicClass:
        def __init__(self):
            pass

    class MyPrivateClass:
        def __init__(self):
            pass

module()
del module
I know it's not much different than your original conclusion, but frankly to me this seems to be the cleanest option. The other advantage is, you can group any number of modules written this way into a single file, and their private terms won't overlap:
def module():
    global A
    i, j, k = 1, 2, 3

    class A:
        pass

module()
del module

def module():
    global B
    i, j, k = 7, 8, 9  # doesn't overwrite the previous declarations

    class B:
        pass

module()
del module
Though, keep in mind their public definitions will, of course, overlap.

Python namespaces: How to make unique objects accessible in other modules?

I am writing a moderate-sized (a few KLOC) PyQt app. I started out writing it in nice modules for ease of comprehension but I am foundering on the rules of Python namespaces. At several points it is important to instantiate just one object of a class as a resource for other code.
For example: an object that represents Aspell attached as a subprocess, offering a check(word) method. Another example: the app features a single QTextEdit and other code needs to call on methods of this singular object, e.g. "if theEditWidget.document().isEmpty()..."
No matter where I instantiate such an object, it can only be referenced from code in that module and no other. So e.g. the code of the edit widget can't call on the Aspell gateway object unless the Aspell object is created in the same module. Fine except it is also needed from other modules.
In this question the bunch class is offered, but it seems to me a bunch has exactly the same problem: it's a unique object that can only be used in the module where it's created. Or am I completely missing the boat here?
OK, as suggested elsewhere, this seems like a simple answer to my problem. I just tested the following:
junk_main.py:
import junk_A
singularResource = junk_A.thing()
import junk_B
junk_B.handle = singularResource
print junk_B.look()

junk_A.py:

class thing():
    def __init__(self):
        self.member = 99

junk_B.py:

def look():
    return handle.member
When I run junk_main it prints 99. So the main code can inject names into modules just by assignment. I am trying to think of reasons this is a bad idea.
You can access objects in a module with the . operator just like with a function. So, for example:
# Module a.py
a = 3
>>> import a
>>> print a.a
3
This is a trivial example, but you might want to do something like:
# Module EditWidget.py
theEditWidget = EditWidget()
...
# Another module
import EditWidget
if EditWidget.theEditWidget.document().isEmpty():
Or...
from EditWidget import *
if theEditWidget.document().isEmpty():
If you do go the from EditWidget import * route, you can even define a list named __all__ in your modules with a list of the names (as strings) of all the objects you want your module to export to *. So if you wanted only theEditWidget to be exported, you could do:
# Module EditWidget.py
__all__ = ["theEditWidget"]
theEditWidget = EditWidget()
...
It turns out the answer is simpler than I thought. As I noted in the question, the main module can add names to an imported module. And any code can add members to an object. So the simple way to create an inter-module communication area is to create a very basic object in the main, say IMC (for inter-module communicator) and assign to it as members, anything that should be available to other modules:
IMC.special = A.thingy()
IMC.important_global_constant = 0x0001
etc. After importing any module, just assign IMC to it:
import B
B.IMC = IMC
Now, this is probably not the greatest idea from a software design standpoint. If you just limit IMC to holding named constants, it acts like a C header file. If it's just to give access to singular resources, it's like a link extern. But because of Python's liberal rules, code in any module can modify or add members to IMC. Used in an undisciplined way, "who changed that" could be a debugging issue. If there are multiple processes, race conditions are a danger.
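A minimal runnable sketch of the IMC idea (module and attribute names hypothetical; types.SimpleNamespace is just one convenient "very basic object"):

# main.py
import types

IMC = types.SimpleNamespace()  # the inter-module communicator
IMC.important_global_constant = 0x0001

import B
B.IMC = IMC  # inject it after importing B

print(B.use_constant())  # -> 1

# B.py
def use_constant():
    return IMC.important_global_constant  # IMC is injected by main.py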
At several points it is important to instantiate just one object of a class as a resource for other code.
Instead of trying to create some sort of singleton factory, can you not create the single-use object somewhere between the main point of entry for the program and instantiating the object that needs it? The single-use object can just be passed as a parameter to the other object. Logically, then, you won't create the single-use object more than once.
For example:
def main(...):
    aspell_instance = ...
    myapp = MyAppClass(aspell_instance)

or...

class SomeWidget(...):
    def __init__(self, edit_widget):
        self.edit_widget = edit_widget

    def onSomeEvent(self, ...):
        if self.edit_widget.document().isEmpty():
            ....
I don't know if that's clear enough, or if it's applicable to your situation. But to be honest, the only time I've found I can't do this is in a CherryPy-based webserver, where the points of entry were pretty much everywhere.

Is it a good practice to add names to __all__ using a decorator?

Is this a good practice in Python (from Active State Recipes -- Public Decorator)?
import sys

def public(f):
    """Use a decorator to avoid retyping function/class names.

    * Based on an idea by Duncan Booth:
      http://groups.google.com/group/comp.lang.python/msg/11cbb03e09611b8a
    * Improved via a suggestion by Dave Angel:
      http://groups.google.com/group/comp.lang.python/msg/3d400fb22d8a42e1
    """
    all = sys.modules[f.__module__].__dict__.setdefault('__all__', [])
    if f.__name__ not in all:  # Prevent duplicates if run from an IDE.
        all.append(f.__name__)
    return f

public(public)  # Emulate decorating ourself
The general idea would be to define a decorator that takes a function or class
and adds its name to the __all__ of the current module.
The more idiomatic way to do this in Python is to mark the private functions as private by starting their name with an underscore:
def public(x):
    ...

def _private_helper(y):
    ...
More people will be familiar with this style (which is also supported by the language: _private_helper will not be exported even if you do not use __all__) than with your public decorator.
Yes, it's a good practice. This decorator allows you to state your intentions right at function or class definition, rather than directly afterwards. That makes your code more readable.
@public
def foo():
    pass

@public
class bar():
    pass

class helper():  # not part of the module's public interface!
    pass
Note: helper is still accessible to a user of the module by modulename.helper. It's just not imported with from modulename import *.
I think the question is a bit subjective, but I like the idea. I usually use __all__ in my modules but I sometimes forget to add a new function that I intended to be part of the public interface of the module. Since I usually import modules by name and not by wildcards, I don't notice the error until someone else in my team (who uses the wildcard syntax to import the entire public interface of a module) starts to complain.
Note: the title of the question is misleading as others have already noticed among the answers.
This doesn't automatically add names to __all__; it simply allows you to add a function to __all__ by decorating it with @public. Seems like a nice idea to me.
