I want my_module to export __all__ as empty list, i.e.
from my_module import *
assert '__all__' in dir() and __all__ == []
I can export __all__ like this (in 'my_module.py'):
__all__ = ['__all__']
However, it predictably binds __all__ to itself, so that
from my_module import *
assert '__all__' in dir() and __all__ == ['__all__']
How can I export __all__ as an empty list? Failing that, how can I hook into the import process to put __all__ into the importing module's __dict__ on every top-level import my_module statement, circumventing module caching?
I'll start by saying this is, in my mind, a terrible idea. You really should not implicitly alter what is exported from a module; this goes counter to the Zen of Python: Explicit is better than implicit.
I also agree with the highest-voted answer on the question you cite; Python already has a mechanism to mark functions 'private': by convention we use a leading underscore to indicate a function should not be considered part of the module API. This approach works with existing tools, vs. the decorator dynamically setting __all__, which certainly breaks static code analysers.
That out of the way, here is a shotgun pointing at your foot. Use it with care.
What you want here is a way to detect when names are imported. You cannot normally do this; there are no hooks for import statements. Once a module has been imported from source, a module object is added to sys.modules and re-used for subsequent imports, but that object is not notified of imports.
What you can do is hook into attribute access. Not with the default module object, but you can stuff any object into sys.modules and it'll be treated as a module. You could just subclass the module type even, then add a __getattribute__ method to that. It'll be called when importing any name with from module import name, for all names listed in __all__ when using from module import *, and in Python 3, __spec__ is accessed for all import forms, even when doing just import module.
You can then use this to hack your way into the calling frame globals, via sys._getframe():
import sys
import types
class AttributeAccessHookModule(types.ModuleType):
    def __getattribute__(self, name):
        if name == '__all__':
            # assume we are being imported with from module import *
            g = sys._getframe(1).f_globals
            if '__all__' not in g:
                g['__all__'] = []
        return super(AttributeAccessHookModule, self).__getattribute__(name)
# replace *this* module with our hacked-up version
# this part goes at the *end* of your module.
replacement = sys.modules[__name__] = AttributeAccessHookModule(__name__, __doc__)
# list() takes a snapshot; on Python 3 the loop variables would otherwise resize globals() mid-iteration
for name, obj in list(globals().items()):
    setattr(replacement, name, obj)
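To see the hook fire without creating files on disk, here is a minimal sketch, assuming CPython 3. The names StarImportHookModule, hooked_demo and importer_globals are made up for the demo, and exec() with its own globals dict stands in for a separate importing module.

```python
import sys
import types


class StarImportHookModule(types.ModuleType):
    """Module subclass that reacts whenever __all__ is looked up."""

    def __getattribute__(self, name):
        if name == '__all__':
            # The lookup is triggered from C code in the import machinery,
            # so frame 1 is the code that executed the star import.
            caller_globals = sys._getframe(1).f_globals
            caller_globals.setdefault('__all__', [])
        return super().__getattribute__(name)


# Build a throwaway module by hand instead of loading one from disk.
hooked = StarImportHookModule('hooked_demo')
hooked.__all__ = ['answer']
hooked.answer = 42
sys.modules['hooked_demo'] = hooked

# Simulate a separate importing module with its own globals.
importer_globals = {}
exec('from hooked_demo import *', importer_globals)
```

After this runs, importer_globals holds both answer and an empty __all__ list. The usual caveat applies: sys._getframe() is a CPython implementation detail.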
The guy there sets __all__ on first decorator application, so not explicitly exporting anything causes it to implicitly export everything. I am trying to improve on this design: if the decorator is imported, then export nothing by default, regardless of its usage.
Just set __all__ to an empty list at the start of your module, e.g.:
# this is my_module.py
from utilitymodule import public
__all__ = []
# and now you could use your @public decorator to optionally add names to it
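For completeness, a rough sketch of what such a decorator could look like. This is not the implementation from the linked question; public, MODULE_SOURCE and my_module_demo are illustrative names, and the module is built from a string only so the example runs standalone.

```python
import sys
import types


def public(obj):
    """Register obj's name in the __all__ of the module that defines it."""
    exported = sys.modules[obj.__module__].__dict__.setdefault('__all__', [])
    if obj.__name__ not in exported:
        exported.append(obj.__name__)
    return obj


# Source of a hypothetical my_module.py, kept as a string for the demo.
MODULE_SOURCE = '''
__all__ = []

@public
def visible():
    return "visible"

def hidden():
    return "hidden"
'''

my_module = types.ModuleType('my_module_demo')
my_module.public = public  # stands in for "from utilitymodule import public"
sys.modules['my_module_demo'] = my_module
exec(MODULE_SOURCE, my_module.__dict__)

# Simulate "from my_module import *" in some other module.
importer_globals = {}
exec('from my_module_demo import *', importer_globals)
```

Because __all__ starts out empty, a module that never applies the decorator exports nothing through a star import, which is the behaviour asked for.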
Related
Hi I'm building my own package and I have a question on __all__.
Is there any neat way to define __all__, other than explicitly typing each and every function in the module?
I find it very tedious...
I'm trying to write some code that wraps frequently used libraries such as numpy, pytorch, os. The problem is, the libraries I used to create my modules also get imported when I import my package.
I want to import every function / class that I defined, but I don't want the third-party libraries that I used in the process to get imported.
I use from .submodule import * in my __init__.py so that I can access my functions inside the submodule directly. (Just like we can access functions directly from the top package like np.sum(), torch.sum() )
My submodule has a lot of functions, and I want to import all of them to __init__.py, except for the third-party packages that I used.
I see that __all__ defines what to import when from package import * is called.
For example,
utils.py
__all__ = ['a']
def a():
    pass
def b():
    pass
__init__.py
from .utils import *
and
>>> import package
>>> package.a()
None
>>> package.b()
NameError: 'package.b' is not defined
What I want is something like
__all__ = Some_neat_fancy_method()
I tried locals() and dir(), but got lost along the way.
Any suggestions?
As others have pointed out, the whole point of __all__ is to explicitly specify what gets exposed to star-imports. By default, every name without a leading underscore is. If you really want to specify what doesn't get exposed instead, you can do a little trick: include all names in __all__ and then remove the ones you want to exclude.
For example:
def _exclude(exclusions: list) -> list:
    import types
    # add everything as long as it's not a module and not prefixed with _
    functions = [name for name, function in globals().items()
                 if not (name.startswith('_') or isinstance(function, types.ModuleType))]
    # remove the exclusions from the functions
    for exclusion in exclusions:
        if exclusion in functions:
            functions.remove(exclusion)
    del types # deleting types from scope, introduced from the import
    return functions
# the _ prefix is important, to not add these to the __all__
_exclusions = ["function1", "function2"]
__all__ = _exclude(_exclusions)
You can of course repurpose this to simply include everything that's not a module or prefixed with _, but that serves little purpose since everything without a leading underscore is included in star-imports if you don't specify __all__, so I thought it was better to include the exclusion idea. This way you can simply tell it to exclude specific functions.
Is there any neat way to define __all__, other than explicitly typing each and every function in the module?
Not built in, no. But defining __all__ by hand is basically the entire point; if you want to include everything in __all__ you can just do nothing at all:
If __all__ is not defined, the statement from sound.effects import * [...] ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package.
The entire point of __all__ is to restrict what gets "exported" by star-imports. There's no real way for Python to know that except by having you tell it, for each symbol, whether it should be there or not.
One easy workaround is to alias all of your imports with a leading underscore. Anything with a leading underscore is excluded from from x import * style imports.
import numpy as _np
import pandas as _pd
def my_fn():
    ...
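A quick way to convince yourself of this, as a sketch: underscore_demo and circle_area are invented names, and the module source is a string only so that no extra file is needed.

```python
import sys
import types

# Source of a hypothetical helper module that aliases its import with "_".
MODULE_SOURCE = '''
import math as _math

def circle_area(radius):
    return _math.pi * radius ** 2
'''

helpers = types.ModuleType('underscore_demo')
sys.modules['underscore_demo'] = helpers
exec(MODULE_SOURCE, helpers.__dict__)

# Simulate "from underscore_demo import *" in another module.
importer_globals = {}
exec('from underscore_demo import *', importer_globals)

# exec() adds __builtins__ to the dict, so filter it out of the listing.
star_imported = sorted(n for n in importer_globals if n != '__builtins__')
```

Here star_imported contains only circle_area; _math stays behind, yet the function keeps working because it still resolves _math through its own module's globals.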
Problem
Consider the following layout:
package/
main.py
math_helpers/
mymath.py
__init__.py
mymath.py contains:
import math
def foo():
    pass
In main.py I want to be able to use code from mymath.py like so:
import math_helpers
math_helpers.foo()
In order to do so, __init__.py contains:
from .mymath import *
However, modules imported in mymath.py are now in the math_helpers namespace, e.g. math_helpers.math is accessible.
Current approach
I'm adding the following at the end of mymath.py.
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
This seems to work, but is it the correct approach?
On the one hand there are many good reasons not to do star imports, but on the other hand, python is for consenting adults.
__all__ is the recommended approach to determining what shows up in a star import. Your approach is correct, and you can further sanitize the namespace when finished:
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
del types
While less recommended, you can also sanitize elements directly out of the module, so that they don't show up at all. This will be a problem if you need to use them in a function defined in the module, since every function object has a __globals__ reference that is bound to its parent module's __dict__. But if you only import math_helpers to call math_helpers.foo(), and don't require a persistent reference to it elsewhere in the module, you can simply unlink it at the end:
del math_helpers
Long Version
A module import runs the code of the module in the namespace of the module's __dict__. Any names that are bound at the top level, whether by class definition, function definition, direct assignment, or other means, live in that dictionary. Sometimes, it is desirable to clean up intermediate variables, as I suggested doing with types.
Let's say your module looks like this:
test_module.py
import math
import numpy as np
def x(n):
    return math.sqrt(n)
class A(np.ndarray):
    pass
import types
__all__ = [name for name, thing in globals().items()
           if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
In this case, __all__ will be ['x', 'A']. However, the module itself will contain the following names: 'math', 'np', 'x', 'A', 'types', '__all__'.
If you run del types at the end, it will remove that name from the namespace. Clearly this is safe because types is not referenced anywhere once __all__ has been constructed.
Similarly, if you wanted to remove np by adding del np, that would be OK. The class A is fully constructed by the end of the module code, so it does not require the global name np to reference its parent class.
Not so with math. If you were to do del math at the end of the module code, the function x would not work. If you import your module, you can see that x.__globals__ is the module's __dict__:
import test_module
test_module.__dict__ is test_module.x.__globals__
If you delete math from the module dictionary and call test_module.x, you will get
NameError: name 'math' is not defined
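The failure is easy to reproduce. In this sketch the module body is a string executed into a hand-made module object (del_demo is an arbitrary name), which is close enough to what a real import does for this purpose.

```python
import types

# A module body that deletes an import its function still needs.
MODULE_SOURCE = '''
import math

def x(n):
    return math.sqrt(n)

del math  # removes the name from the module's __dict__
'''

test_module = types.ModuleType('del_demo')
exec(MODULE_SOURCE, test_module.__dict__)

# The function looks up globals in the very dict the module code ran in.
shares_globals = test_module.x.__globals__ is test_module.__dict__

try:
    test_module.x(4)
except NameError as exc:
    error_message = str(exc)
else:
    error_message = None
```

shares_globals is True, and the call fails only when x actually runs, not when the module is imported, which is what makes this kind of cleanup easy to get wrong.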
So under some very special circumstances you may be able to sanitize the namespace of mymath.py, but that is not the recommended approach as it only applies to certain cases.
In conclusion, stick to using __all__.
A Story That's Sort of Relevant
One time, I had two modules that implemented similar functionality, but for different types of end users. There were a couple of functions that I wanted to copy out of module a into module b. The problem was that I wanted the functions to work as if they had been defined in module b. Unfortunately, they depended on a constant that was defined in a. b defined its own version of the constant. For example:
a.py
value = 1
def x():
    return value
b.py
from a import x
value = 2
I wanted b.x to access b.value instead of a.value. I pulled that off by adding the following to b.py (based on https://stackoverflow.com/a/13503277/2988730):
import functools, types
x = functools.update_wrapper(types.FunctionType(x.__code__, globals(), x.__name__, x.__defaults__, x.__closure__), x)
x.__kwdefaults__ = x.__wrapped__.__kwdefaults__
x.__module__ = __name__
del functools, types
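Here is the same trick wrapped in a helper, as a self-contained sketch. rebind_globals is a name I made up, and the two plain dicts stand in for the namespaces of a.py and b.py.

```python
import functools
import types


def rebind_globals(func, new_globals):
    """Return a copy of func whose global lookups go through new_globals."""
    clone = types.FunctionType(
        func.__code__, new_globals, func.__name__,
        func.__defaults__, func.__closure__,
    )
    clone = functools.update_wrapper(clone, func)
    clone.__kwdefaults__ = func.__kwdefaults__
    return clone


# Stand-ins for the namespaces of modules a and b.
a_globals = {'value': 1}
exec('def x():\n    return value\n', a_globals)
b_globals = {'value': 2}

original = a_globals['x']
rebound = rebind_globals(original, b_globals)
```

original() still returns 1 while rebound() returns 2: both share one code object, only the dict used for global lookups differs.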
Why am I telling you all this? Well, you can make a version of your module that does not have any stray names in its namespace. You won't be able to see changes to global variables in your functions though. This is just an exercise in pushing python beyond its normal usage. I strongly recommend against doing this, but here is a sample module that effectively freezes its __dict__ as far as the functions are concerned. This has the same members as test_module above, but with no modules in the global namespace:
import math
import numpy as np
def x(n):
    return math.sqrt(n)
class A(np.ndarray):
    pass
import functools, types, sys
def wrap(obj):
    """ Written this way to be able to handle classes """
    for name in dir(obj):
        if name.startswith('_'):
            continue
        thing = getattr(obj, name)
        if isinstance(thing, types.FunctionType) and thing.__module__ == __name__:
            # rebuild the function on top of the frozen copy of the globals
            setattr(obj, name,
                    functools.update_wrapper(types.FunctionType(thing.__code__, d, thing.__name__, thing.__defaults__, thing.__closure__), thing))
            getattr(obj, name).__kwdefaults__ = thing.__kwdefaults__
        elif isinstance(thing, type) and thing.__module__ == __name__:
            wrap(thing)
d = globals().copy()
wrap(sys.modules[__name__])
del d, wrap, sys, math, np, functools, types
So yeah, please don't ever do this! But if you do, stick it in a utility class somewhere.
What is the equivalent of import * in Python using functions (presumably from importlib)?
I know that you can import a module with mod = __import__(...), which will delegate to whatever the currently configured implementation is. You can also do something like
mod_spec = importlib.util.spec_from_file_location(...)
mod = importlib.util.module_from_spec(mod_spec)
mod_spec.loader.exec_module(mod)
which allows you to do crazy things like injecting things into the module by inserting them before the call to exec_module. (Courtesy of https://stackoverflow.com/a/67692/2988730 and https://stackoverflow.com/a/38650878/2988730)
However, my question remains. How does import * work in function form? What function determines which names to load from a module depending on the presence/contents of __all__?
There's no function for from whatever import *. In fact, there's no function for import whatever, either! When you do
mod = __import__(...)
the __import__ function is only responsible for part of the job. It provides you with a module object, but you have to assign that module object to a variable separately. There's no function that will import a module and assign it to a variable the way import whatever does.
In from whatever import *, there are two parts:
prepare the module object for whatever
assign variables
The "prepare the module object" part is almost identical to the one in import whatever, and it can be handled by the same function, __import__. There's a minor difference in that import * will load any not-yet-loaded submodules in a package's __all__ list; __import__ will handle this for you if you provide fromlist=['*']:
module = __import__('whatever', fromlist=['*'])
The part about assigning names is where the big differences occur, and again, you have to handle that yourself. It's fairly straightforward, as long as you're at global scope:
if hasattr(module, '__all__'):
    all_names = module.__all__
else:
    all_names = [name for name in dir(module) if not name.startswith('_')]
globals().update({name: getattr(module, name) for name in all_names})
Function scopes don't support assigning variables determined at runtime.
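If you want that packaged up, here is a sketch of a helper that copies the names into any dict you hand it. star_import is not a standard-library function, just a name for this example, and it uses importlib.import_module, so it does not load unloaded submodules named in a package's __all__ the way fromlist=['*'] does.

```python
import importlib


def star_import(module_name, namespace):
    """Copy into namespace roughly what 'from module_name import *' would bind."""
    module = importlib.import_module(module_name)
    if hasattr(module, '__all__'):
        names = list(module.__all__)
    else:
        # Without __all__, every name not starting with "_" is taken.
        names = [name for name in vars(module) if not name.startswith('_')]
    namespace.update({name: getattr(module, name) for name in names})
    return names


# json defines __all__, so only its public API ends up in the target dict.
target = {}
imported = star_import('json', target)
```

Passing globals() as the namespace gives the module-level behaviour; passing a plain dict keeps the caller's namespace clean.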
Let's say I have the following 2 classes in module a:
class Real(object):
    ...
    def print_stuff(self):
        print('real')
class Fake(Real):
    def print_stuff(self):
        print('fake')
Module b uses the Real class:
from a import Real
Real().print_stuff()
How do I monkey patch so that when b imports Real it's actually swapped with Fake?
I was trying to do like this in initialize script but it doesn't work.
if env == 'dev':
    from a import Real, Fake
    Real = Fake
My purpose is to use the Fake class in development mode.
You can use patch from the mock module. Here is an example:
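from mock import patch  # on Python 3: from unittest.mock import patch
from a import Fake
from yourpackage import b
with patch('yourpackage.b.Real') as fake_real:
    fake_real.return_value = Fake()
    foo = b.someClass()
    foo.somemethod()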
The issue is that when you do -
from a import Real, Fake
You are basically importing those two classes into your initialize script's namespace and creating the names Real and Fake there. Then you make the name Real in the initialize script point to Fake, but that does not change anything in the actual a module.
If the initialize script is another .py module/script that runs at the start of your original program, then you can use the below -
if env == 'dev':
    import a
    a.Real = a.Fake
Please note, this would make a.Real refer to the Fake class whenever you use Real from the a module after the above line is executed.
Though I would suggest that a better way would be to do this in your a module itself, by making it possible to check the env in that module, as -
if <someothermodule>.env == 'dev':
    Real = Fake
As was asked in the comments -
Doesn't import a also import into initialize script's namespace? What's the difference between importing modules and classes?
The thing is that when you import just the class using from a import class, what you actually do is create that variable, class, in your module namespace (in the module that you import it into). Changing that variable to point to something new in that module namespace does not affect the original class in its original module object; it is only affected in the module in which it is changed.
But when you do import a, you are just importing the module a. While importing, the module object also gets cached in the sys.modules dictionary, so any other import of a from any other module gets this cached version from sys.modules. (Another note: from a import something also internally imports a and caches it in sys.modules, but let's not get into those details as I think that is not necessary here.)
And then when you do a.Real = <something>, you are changing the Real attribute of the a module object, which points to the class, to something else. This mutates the a module directly, hence the change is also reflected when the module a gets imported from some other module.
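The difference between the two kinds of rebinding can be shown in a few lines. This is only a sketch: a_demo stands in for module a, the methods return strings instead of printing, and exec() with separate dicts plays the role of the initialize script and of module b.

```python
import sys
import types

# Stand-in for module a.
A_SOURCE = '''
class Real(object):
    def print_stuff(self):
        return "real"

class Fake(Real):
    def print_stuff(self):
        return "fake"
'''

a = types.ModuleType('a_demo')
sys.modules['a_demo'] = a
exec(A_SOURCE, a.__dict__)

# Rebinding a from-imported name only changes the importing namespace.
init_globals = {}
exec('from a_demo import Real, Fake\nReal = Fake\n', init_globals)
still_real = a.Real().print_stuff()

# Assigning through the module object changes what later importers see.
a.Real = a.Fake
b_globals = {}
exec('from a_demo import Real\nresult = Real().print_stuff()\n', b_globals)
```

still_real is "real", while the simulated module b gets "fake". Note the ordering: the assignment has to happen before b runs its from-import, otherwise b has already bound the original class.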
I have a system that collects all classes that derive from certain base classes and stores them in a dictionary. I want to avoid having to specify which classes are available (I would like to discover them programmatically), so I have used a from ModuleName import * statement. The user is then directed to place all tests to be collected in the ModuleName module. However, I cannot find a way to programmatically determine what symbols were imported with that import statement. I have tried using dir() and __dict__ as indicated in the following example, but to no avail. How does one programmatically find symbols imported in this manner (with import *)?
testTypeFigureOuterrer.py:
from testType1 import *
from testType2 import *
class TestFigureOuterrer(object):
    def __init__(self):
        self.existingTests = {'type1':{},'type2':{}}
    def findAndSortTests(self):
        for symbol in dir(): # Also tried: dir(self) and __dict__
            try:
                thing = self.__getattribute__(symbol)
            except AttributeError:
                continue
            if issubclass(thing,TestType1):
                self.existingTests['type1'].update( dict(symbol,thing) )
            elif issubclass(thing,TestType3):
                self.existingTests['type2'].update( dict(symbol,thing) )
            else:
                continue
if __name__ == "__main__":
    testFigureOuterrer = TestFigureOuterrer()
    testFigureOuterrer.findAndSortTests()
testType1.py:
class TestType1(object):
    pass
class TestA(TestType1):
    pass
class TestB(TestType1):
    pass
testType2.py:
class TestType2:
    pass
class TestC(TestType2):
    pass
class TestD(TestType2):
    pass
Since you know the imports yourself, you should just import the module manually again, and then check the contents of the module. If an __all__ property is defined, its contents are imported as names when you do from module import *. Otherwise, just use all its members:
def getImportedNames (module):
    names = module.__all__ if hasattr(module, '__all__') else dir(module)
    return [name for name in names if not name.startswith('_')]
This has the benefit that you do not need to go through the globals, and filter everything out. And since you know the modules you import from at design time, you can also check them directly.
from testType1 import *
from testType2 import *
import testType1, testType2
print(getImportedNames(testType1))
print(getImportedNames(testType2))
Alternatively, you can also look up the module by its module name from sys.modules, so you don’t actually need the extra import:
import sys
def getImportedNames (moduleName):
    module = sys.modules[moduleName]
    names = module.__all__ if hasattr(module, '__all__') else dir(module)
    return [name for name in names if not name.startswith('_')]
print(getImportedNames('testType1'))
print(getImportedNames('testType2'))
Take a look at this SO answer, which describes how to determine the names of loaded classes. You can get the names of all the classes defined within the context of the module:
import sys, inspect
clsmembers = inspect.getmembers(sys.modules['testType1'], inspect.isclass)
which is now defined as
[('TestA', testType1.TestA),
('TestB', testType1.TestB),
('TestType1', testType1.TestType1)]
You can also replace testType1 with __name__ when you're within the function of interest.
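Building on that, a sketch of how the classes could be sorted by base class, which is what the question is ultimately after. collect_subclasses is a hypothetical helper, and testType1_demo is a stand-in module built in memory so the example runs without the asker's files.

```python
import inspect
import types


def collect_subclasses(module, base):
    """Map class name -> class for every strict subclass of base in module."""
    return {
        name: cls
        for name, cls in inspect.getmembers(module, inspect.isclass)
        if issubclass(cls, base) and cls is not base
    }


# Stand-in for testType1.py.
test_type1 = types.ModuleType('testType1_demo')
exec(
    'class TestType1(object):\n    pass\n'
    'class TestA(TestType1):\n    pass\n'
    'class TestB(TestType1):\n    pass\n',
    test_type1.__dict__,
)

found = collect_subclasses(test_type1, test_type1.TestType1)
```

found maps 'TestA' and 'TestB' to their classes and leaves out the base class itself. With real modules you would pass the imported module object (or sys.modules[name]) instead of the stand-in.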
Don't use the * form of import. This dumps the imported names into your script's global namespace. Not only could they clobber some important bit of data by using the same name, you don't have any easy way to fish out the names you just imported. (Easiest way is probably to take a snapshot of globals().keys() before and after.)
Instead, import just the module:
import testType1
import testType2
Now you can easily get a list of what's in each module:
tests = dir(testType1)
And access each using getattr() on the module object:
for testname in tests:
    test = getattr(testType1, testname)
    if callable(test):
        pass  # do something with it
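If you are stuck with the star import anyway, the before/after snapshot mentioned above could look roughly like this. names_added_by is a made-up helper, and the json module is used only because it ships with Python.

```python
def names_added_by(statement, namespace):
    """Run statement in namespace and return the names it newly bound."""
    before = set(namespace)
    exec(statement, namespace)
    # exec() itself inserts __builtins__, so leave that out of the result.
    return sorted(set(namespace) - before - {'__builtins__'})


scratch = {}
added = names_added_by('from json import *', scratch)
```

One limitation of this approach: a name that already existed and was overwritten by the import does not show up in the difference.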