Problem
Consider the following layout:
package/
    main.py
    math_helpers/
        mymath.py
        __init__.py
mymath.py contains:
import math

def foo():
    pass
In main.py I want to be able to use code from mymath.py like so:
import math_helpers
math_helpers.foo()
In order to do so, __init__.py contains:
from .mymath import *
However, modules imported in mymath.py are now in the math_helpers namespace, e.g. math_helpers.math is accessible.
Current approach
I'm adding the following at the end of mymath.py.
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
This seems to work, but is it the correct approach?
On the one hand there are many good reasons not to do star imports, but on the other hand, python is for consenting adults.
__all__ is the recommended approach to determining what shows up in a star import. Your approach is correct, and you can further sanitize the namespace when finished:
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
del types
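To check what the filter actually exports, here is a minimal sketch. It builds a stand-in for mymath.py in memory so it runs without any files on disk; the module name mymath_demo and the make_module helper are hypothetical and exist only for this illustration.

```python
import sys
import types

def make_module(name, source):
    """Build a throwaway module from source text and register it (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

SOURCE = '''
import math

def foo():
    pass

import types
__all__ = [name for name, thing in globals().items()
           if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
del types
'''

make_module("mymath_demo", SOURCE)

# Star-import into a fresh namespace and list what arrived.
namespace = {}
exec("from mymath_demo import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only foo should arrive in the importing namespace; math stays behind because it is not listed in __all__.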
While less recommended, you can also sanitize elements directly out of the module, so that they don't show up at all. This will be a problem if you need to use them in a function defined in the module, since every function object has a __globals__ reference that is bound to its parent module's __dict__. But if you only import math_helpers to call math_helpers.foo(), and don't require a persistent reference to it elsewhere in the module, you can simply unlink it at the end:
del math_helpers
Long Version
A module import runs the code of the module in the namespace of the module's __dict__. Any names that are bound at the top level, whether by class definition, function definition, direct assignment, or other means, live in that dictionary. Sometimes, it is desirable to clean up intermediate variables, as I suggested doing with types.
Let's say your module looks like this:
test_module.py
import math
import numpy as np

def x(n):
    return math.sqrt(n)

class A(np.ndarray):
    pass

import types
__all__ = [name for name, thing in globals().items()
           if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
In this case, __all__ will be ['x', 'A']. However, the module itself will contain the following names: 'math', 'np', 'x', 'A', 'types', '__all__'.
If you run del types at the end, it will remove that name from the namespace. Clearly this is safe because types is not referenced anywhere once __all__ has been constructed.
Similarly, if you wanted to remove np by adding del np, that would be OK. The class A is fully constructed by the end of the module code, so it does not require the global name np to reference its parent class.
Not so with math. If you were to do del math at the end of the module code, the function x would not work. If you import your module, you can see that x.__globals__ is the module's __dict__:
import test_module
test_module.__dict__ is test_module.x.__globals__
If you delete math from the module dictionary and call test_module.x, you will get
NameError: name 'math' is not defined
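The difference between the two deletions can be reproduced with a small sketch. To keep it self-contained I am assuming a standard-library base class (collections.OrderedDict) in place of np.ndarray, and the module name test_module_demo is hypothetical.

```python
import types

SOURCE = '''
import math
from collections import OrderedDict as Base

def x(n):
    return math.sqrt(n)

class A(Base):
    pass

del Base   # safe: A already holds a reference to its base class
del math   # unsafe: x looks up "math" in the module globals at call time
'''

test_module = types.ModuleType("test_module_demo")
exec(SOURCE, test_module.__dict__)

# The function's globals are the module's own dictionary.
same_dict = test_module.x.__globals__ is test_module.__dict__

# The class still works even though the name Base is gone.
instance = test_module.A()

try:
    test_module.x(4)
    error = None
except NameError as exc:
    error = str(exc)

print(same_dict, error)
```

The class survives because it keeps its own reference to the base, while the function fails because it resolves math by name every time it runs.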
So under some very special circumstances you may be able to sanitize the namespace of mymath.py, but that is not the recommended approach, as it only applies to certain cases.
In conclusion, stick to using __all__.
A Story That's Sort of Relevant
One time, I had two modules that implemented similar functionality, but for different types of end users. There were a couple of functions that I wanted to copy out of module a into module b. The problem was that I wanted the functions to work as if they had been defined in module b. Unfortunately, they depended on a constant that was defined in a. b defined its own version of the constant. For example:
a.py
value = 1

def x():
    return value
b.py
from a import x
value = 2
I wanted b.x to access b.value instead of a.value. I pulled that off by adding the following to b.py (based on https://stackoverflow.com/a/13503277/2988730):
import functools, types
x = functools.update_wrapper(types.FunctionType(x.__code__, globals(), x.__name__, x.__defaults__, x.__closure__), x)
x.__kwdefaults__ = x.__wrapped__.__kwdefaults__
x.__module__ = __name__
del functools, types
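Here is the same trick as a self-contained sketch that you can run in a single file. The helper name rebind_globals and the dictionary b_globals (standing in for module b's namespace) are my own hypothetical names, not part of the original modules.

```python
import functools
import types

def rebind_globals(func, new_globals):
    """Return a copy of func that resolves global names in new_globals."""
    clone = types.FunctionType(func.__code__, new_globals, func.__name__,
                               func.__defaults__, func.__closure__)
    clone = functools.update_wrapper(clone, func)
    clone.__kwdefaults__ = func.__kwdefaults__
    return clone

value = 1

def x():
    return value

# Stand-in for module b's globals; it defines its own "value".
b_globals = {"value": 2}
b_x = rebind_globals(x, b_globals)

print(x(), b_x())
```

The original x keeps reading value from where it was defined, while the clone reads it from the dictionary it was rebuilt with.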
Why am I telling you all this? Well, you can make a version of your module that does not have any stray names in its namespace. You won't be able to see changes to global variables in your functions though. This is just an exercise in pushing Python beyond its normal usage. I strongly recommend against doing this, but here is a sample module that effectively freezes its __dict__ as far as the functions are concerned. This has the same members as test_module above, but with no modules in the global namespace:
import math
import numpy as np

def x(n):
    return math.sqrt(n)

class A(np.ndarray):
    pass

import functools, types, sys

def wrap(obj):
    """ Written this way to be able to handle classes """
    for name in dir(obj):
        if name.startswith('_'):
            continue
        thing = getattr(obj, name)
        if isinstance(thing, types.FunctionType) and thing.__module__ == __name__:
            # rebuild the function so that its globals are the frozen copy d
            setattr(obj, name,
                    functools.update_wrapper(types.FunctionType(thing.__code__, d, thing.__name__, thing.__defaults__, thing.__closure__), thing))
            getattr(obj, name).__kwdefaults__ = thing.__kwdefaults__
        elif isinstance(thing, type) and thing.__module__ == __name__:
            wrap(thing)

d = globals().copy()
wrap(sys.modules[__name__])
del d, wrap, sys, math, np, functools, types
So yeah, please don't ever do this! But if you do, stick it in a utility class somewhere.
Related
Hi I'm building my own package and I have a question on __all__.
Is there any neat way to define __all__, other than explicitly typing each and every function in the module?
I find it very tedious...
I'm trying to write some code which wraps frequently used libraries such as numpy, pytorch, os. The problem is, the libraries I used to create my modules also get imported when I import my package.
I want to import every function / class that I defined, but I don't want the third-party libraries that I used in the process to get imported.
I use from .submodule import * in my __init__.py so that I can access my functions inside the submodule directly. (Just like we can access functions directly from the top package like np.sum(), torch.sum() )
My submodule has a lot of functions, and I want to import all of them to __init__.py, except for the third-party packages that I used.
I see that __all__ defines what to import when from package import * is called.
For example,
utils.py
__all__ = ['a']

def a():
    pass

def b():
    pass
__init__.py
from .utils import *
and
>>> import package
>>> package.a()
>>> package.b()
AttributeError: module 'package' has no attribute 'b'
What I want is something like
__all__ = Some_neat_fancy_method()
I tried locals() and dir(), but got lost along the way.
Any suggestions?
As others have pointed out, the whole point of __all__ is to explicitly specify what gets exposed to star-imports. By default, every name that does not start with an underscore is. If you really want to specify what doesn't get exposed instead, you can do a little trick: put all names in __all__ and then remove the ones you want to exclude.
For example:
def _exclude(exclusions: list) -> list:
    import types

    # add everything as long as it's not a module and not prefixed with _
    functions = [name for name, function in globals().items()
                 if not (name.startswith('_') or isinstance(function, types.ModuleType))]

    # remove the exclusions from the functions
    for exclusion in exclusions:
        if exclusion in functions:
            functions.remove(exclusion)

    del types  # deleting types from scope, introduced from the import
    return functions

# the _ prefix is important, to not add these to the __all__
_exclusions = ["function1", "function2"]
__all__ = _exclude(_exclusions)
You can of course repurpose this to simply include everything that's not a module or prefixed with _, but without __all__ a star-import already exposes every name that lacks a leading underscore (imported modules included), so I thought it was better to include the exclusion idea. This way you can simply tell it to exclude specific functions.
Is there any neat way to define __all__, other than explicitly typing each and every function in the module?
Not built in, no. But defining __all__ by hand is basically the entire point. If you want to include everything in __all__, you can just do nothing at all:
If __all__ is not defined, the statement from sound.effects import * [...] ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package.
The entire point of __all__ is to restrict what gets "exported" by star-imports. There's no real way for Python to know that except by having you tell it, for each symbol, whether it should be there or not.
One easy workaround is to alias all of your imports with a leading underscore. Anything with a leading underscore is excluded from from x import * style imports.
import numpy as _np
import pandas as _pd
def my_fn():
    ...
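A quick way to convince yourself that the underscore aliases stay private is the sketch below. I am assuming standard-library modules (math and json) in place of numpy and pandas so that it runs anywhere, and underscore_demo is a hypothetical module name built in memory.

```python
import sys
import types

SOURCE = '''
import math as _math
import json as _json

def my_fn():
    return _math.sqrt(16)
'''

# "underscore_demo" is a hypothetical module name used only for this sketch.
module = types.ModuleType("underscore_demo")
exec(SOURCE, module.__dict__)
sys.modules["underscore_demo"] = module

# No __all__ is defined, so the star-import falls back to the underscore rule.
namespace = {}
exec("from underscore_demo import *", namespace)
public = sorted(name for name in namespace if not name.startswith("_"))
print(public, namespace["my_fn"]())
```

The function still works after the import because it finds _math through its own module's globals, not through the importing namespace.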
What is the equivalent of import * in Python using functions (presumably from importlib)?
I know that you can import a module with mod = __import__(...), which will delegate to whatever the currently configured implementation is. You can also do something like
mod_spec = importlib.util.spec_from_file_location(...)
mod = importlib.util.module_from_spec(mod_spec)
mod_spec.loader.exec_module(mod)
which allows you to do crazy things like injecting things into the module by inserting them before the call to exec_module. (Courtesy of https://stackoverflow.com/a/67692/2988730 and https://stackoverflow.com/a/38650878/2988730)
However, my question remains. How does import * work in function form? What function determines which names to load from a module depending on the presence/contents of __all__?
There's no function for from whatever import *. In fact, there's no function for import whatever, either! When you do
mod = __import__(...)
the __import__ function is only responsible for part of the job. It provides you with a module object, but you have to assign that module object to a variable separately. There's no function that will import a module and assign it to a variable the way import whatever does.
In from whatever import *, there are two parts:
prepare the module object for whatever
assign variables
The "prepare the module object" part is almost identical to what happens in import whatever, and it can be handled by the same function, __import__. There's a minor difference in that import * will load any not-yet-loaded submodules in a package's __all__ list; __import__ will handle this for you if you provide fromlist=['*']:
module = __import__('whatever', fromlist=['*'])
The part about assigning names is where the big differences occur, and again, you have to handle that yourself. It's fairly straightforward, as long as you're at global scope:
if hasattr(module, '__all__'):
    all_names = module.__all__
else:
    all_names = [name for name in dir(module) if not name.startswith('_')]

globals().update({name: getattr(module, name) for name in all_names})
Function scopes don't support assigning variables determined at runtime.
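Putting the two parts together, a rough sketch of a reusable helper could look like this. The name star_import and its namespace parameter are my own invention; there is no such function in the standard library. It writes into whatever dictionary you hand it, so passing globals() at module level gives the closest match to the real statement.

```python
def star_import(module_name, namespace):
    """Copy the names that "from module_name import *" would bind into namespace."""
    # Part 1: prepare the module object (fromlist=['*'] mirrors the star form).
    module = __import__(module_name, fromlist=['*'])

    # Part 2: decide which names to bind, honouring __all__ when present.
    if hasattr(module, '__all__'):
        names = module.__all__
    else:
        names = [name for name in dir(module) if not name.startswith('_')]

    namespace.update({name: getattr(module, name) for name in names})
    return list(names)

# The standard-library "string" module defines __all__, so only those names arrive.
target = {}
bound = star_import('string', target)
print('ascii_letters' in target, 'Template' in target)
```

Calling it with locals() inside a function would not create usable local variables, which is the limitation mentioned above.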
I want my_module to export __all__ as empty list, i.e.
from my_module import *
assert '__all__' in dir() and __all__ == []
I can export __all__ like this (in 'my_module.py'):
__all__ = ['__all__']
However, it predictably binds __all__ to itself, so that
from my_module import *
assert '__all__' in dir() and __all__ == ['__all__']
How can I export __all__ as an empty list? Failing that, how can I hook into import process to put __all__ into importing module's __dict__ on every top level import my_module statement, circumventing module caching.
I'll start by saying this is, in my mind, a terrible idea. You really should not implicitly alter what is exported from a module; this runs counter to the Zen of Python: Explicit is better than implicit.
I also agree with the highest-voted answer on the question you cite; Python already has a mechanism to mark functions 'private', by convention we use a leading underscore to indicate a function should not be considered part of the module API. This approach works with existing tools, vs. the decorator dynamically setting __all__ which certainly breaks static code analysers.
That out of the way, here is a shotgun pointing at your foot. Use it with care.
What you want here is a way to detect when names are imported. You cannot normally do this; there are no hooks for import statements. Once a module has been imported from source, a module object is added to sys.modules and re-used for subsequent imports, but that object is not notified of imports.
What you can do is hook into attribute access. Not with the default module object, but you can stuff any object into sys.modules and it'll be treated as a module. You could just subclass the module type even, then add a __getattribute__ method to that. It'll be called when importing any name with from module import name, for all names listed in __all__ when using from module import *, and in Python 3, __spec__ is accessed for all import forms, even when doing just import module.
You can then use this to hack your way into the calling frame globals, via sys._getframe():
import sys
import types
class AttributeAccessHookModule(types.ModuleType):
    def __getattribute__(self, name):
        if name == '__all__':
            # assume we are being imported with from module import *
            g = sys._getframe(1).f_globals
            if '__all__' not in g:
                g['__all__'] = []
        return super(AttributeAccessHookModule, self).__getattribute__(name)

# replace *this* module with our hacked-up version
# this part goes at the *end* of your module.
replacement = sys.modules[__name__] = AttributeAccessHookModule(__name__, __doc__)
# iterate over a snapshot: the loop variables themselves become new globals
for name, obj in list(globals().items()):
    setattr(replacement, name, obj)
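If you want to watch the attribute access happen without touching the caller's frame, here is a stripped-down sketch. RecordingModule, hooked_demo and the accessed list are hypothetical names of mine, and I have only reasoned this through for CPython, so treat it as an illustration rather than a guarantee for every interpreter.

```python
import sys
import types

accessed = []

class RecordingModule(types.ModuleType):
    """Hypothetical module subclass that records every attribute lookup."""
    def __getattribute__(self, name):
        accessed.append(name)
        return super().__getattribute__(name)

# Build the module object by hand and register it so that import can find it.
hooked = RecordingModule("hooked_demo")
hooked.__all__ = ["answer"]
hooked.answer = 42
sys.modules["hooked_demo"] = hooked

# Start from a clean log, then run a star-import against the hooked module.
accessed.clear()
namespace = {}
exec("from hooked_demo import *", namespace)
print(namespace["answer"], "__all__" in accessed)
```

The log shows that the star-import reads __all__ and then each listed name through the hook, which is the opening the hack above exploits.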
The guy there sets __all__ on first decorator application, so not explicitly exporting anything causes it to implicitly export everything. I am trying to improve on this design: if the decorator is imported, then export nothing by default, regardless of its usage.
Just set __all__ to an empty list at the start of your module, e.g.:
# this is my_module.py
from utilitymodule import public
__all__ = []
# and now you could use your @public decorator to optionally add names to it
I've literally been trying to understand Python imports for about a year now, and I've all but given up programming in Python because it just seems too obfuscated. I come from a C background, and I assumed that import worked like #include, yet if I try to import something, I invariably get errors.
If I have two files like this:
foo.py:
a = 1
bar.py:
import foo
print foo.a
input()
WHY do I need to reference the module name? Why not just be able to write import foo, print a? What is the point of this confusion? Why not just run the code and have stuff defined for you as if you wrote it in one big file? Why can't it work like C's #include directive where it basically copies and pastes your code? I don't have import problems in C.
To do what you want, you can use (not recommended, read further for explanation):
from foo import *
This will import everything to your current namespace, and you will be able to call print a.
However, the issue with this approach is the following. Consider the case when you have two modules, moduleA and moduleB, each having a function named GetSomeValue().
When you do:
from moduleA import *
from moduleB import *
you have a namespace resolution issue*, because what function are you actually calling with GetSomeValue(), the moduleA.GetSomeValue() or the moduleB.GetSomeValue()?
In addition to this, you can use the Import As feature:
from moduleA import GetSomeValue as AGetSomeValue
from moduleB import GetSomeValue as BGetSomeValue
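Or
import moduleA as A
import moduleB as B
and then call A.GetSomeValue() or B.GetSomeValue(). (The plain import statement only accepts modules, so import moduleA.GetSomeValue as AGetSomeValue would fail when GetSomeValue is a function.)
This approach resolves the conflict manually.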
I am sure you can appreciate from these examples the need for explicit referencing.
* Python has its namespace resolution mechanisms, this is just a simplification for the purpose of the explanation.
Imagine you have a function in your module which chooses some object from a list:
def choice(somelist):
    ...
Now imagine further that, either in that function or elsewhere in your module, you are using randint from the random library:
a = randint(1, x)
Therefore we
import random
Your suggestion, that this does what is now accessed by from random import *, means that we now have two different functions called choice, as random includes one too. Only one will be accessible, but you have introduced ambiguity as to what choice() actually refers to elsewhere in your code.
This is why it is bad practice to import everything; either import what you need:
from random import randint
...
a = randint(1, x)
or the whole module:
import random
...
a = random.randint(1, x)
This has two benefits:
You minimise the risks of overlapping names (now and in future additions to your imported modules); and
When someone else reads your code, they can easily see where external functions come from.
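The overlap is easy to reproduce. In this sketch a plain dictionary stands in for your module's namespace (my assumption, so that the star-import can be run and inspected from one file), and choice is a deliberately simple placeholder.

```python
import random

def choice(somelist):
    """Our own helper: always pick the first element."""
    return somelist[0]

# A dictionary standing in for the module namespace that defines choice.
namespace = {"choice": choice}

# Simulate "from random import *" running later in the same module.
exec("from random import *", namespace)

shadowed = namespace["choice"] is not choice
print(shadowed, namespace["choice"] is random.choice)
```

After the star-import, the name choice silently refers to the version from random, and nothing in the code warns you about it.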
There are a few good reasons. The module provides a sort of namespace for the objects in it, which allows you to use simple names without fear of collisions -- coming from a C background you have surely seen libraries with long, ugly function names to avoid colliding with anybody else.
Also, modules themselves are also objects. When a module is imported in more than one place in a python program, each actually gets the same reference. That way, changing foo.a changes it for everybody, not just the local module. This is in contrast to C where including a header is basically a copy+paste operation into the source file (obviously you can still share variables, but the mechanism is a bit different).
As mentioned, you can say from foo import * or better from foo import a, but understand that the underlying behavior is actually different, because you are taking a and binding it to your local module.
If you use something often, you can always use the from syntax to import it directly, or you can rename the module to something shorter, for example
import itertools as it
When you do import foo, the module is loaded (if it has not been already) and the name foo is bound to the module object in the current namespace.
So, to use anything inside foo, you have to address it via the module.
However, if you use from foo import something, you don't have to prepend the module name, since it will load something from the module and assign it to the name something. (Not a recommended practice)
import importlib.util

# works like C's #include, you always call it with include(<path>, __name__)
def include(file, module_name):
    spec = importlib.util.spec_from_file_location(module_name, file)
    mod = importlib.util.module_from_spec(spec)
    # spec.loader.exec_module(mod)
    o = spec.loader.get_code(module_name)
    exec(o, globals())
For example:
#### file a.py ####
a = 1
#### file b.py ####
b = 2
if __name__ == "__main__":
    print("Hi, this is b.py")
#### file main.py ####
# assuming you have `include` in scope
include("a.py", __name__)
print(a)
include("b.py", __name__)
print(b)
the output will be:
1
Hi, this is b.py
2
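For comparison, here is a variant built on the standard library's runpy.run_path instead of the loader calls. This is a swapped-in technique, not the approach above: the explicit namespace parameter is my own design choice so that the names land in a dictionary you pick, and the temporary a.py only exists to make the sketch runnable.

```python
import os
import runpy
import tempfile

def include(path, namespace):
    """Run the file at path and copy its top-level names into namespace."""
    run_name = namespace.get("__name__", "<run_path>")
    result = runpy.run_path(path, run_name=run_name)
    # Skip dunder entries such as __name__ and __builtins__.
    namespace.update({k: v for k, v in result.items() if not k.startswith("__")})

# Write a small stand-in for a.py so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.py")
    with open(a_path, "w") as handle:
        handle.write("a = 1\n")
    target = {"__name__": "demo"}
    include(a_path, target)

print(target["a"])
```

At module level you would pass globals() as the namespace to get the #include-like effect.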
I'm new to Python and programming in general (a couple of weeks at most).
Concerning Python and using modules, I realise that functions can be imported using from a import *.
So instead of typing
a.sayHi()
a.sayBye()
I can say
sayHi()
sayBye()
which I find simplifies things a great deal. Now, say I have a bunch of variables that I want to use across modules and I have them all defined in one python module. How can I, using a similar method as mentioned above or an equally simple one, import these variables. I don't want to use import a and then be required to prefix all my variables with a..
The following situation would be ideal:
a.py
name = "Michael"
age = 15
b.py
some_function
if name == "Michael":
if age == 15:
print("Simple!")
Output:
Simple!
You gave the solution yourself: from a import * will work just fine. Python does not differentiate between functions and variables in this respect.
>>> from a import *
>>> if name == "Michael" and age == 15:
...     print('Simple!')
...
Simple!
Just for some context, most linters will flag from module import * with a warning, because it's prone to namespace collisions that will cause headaches down the road.
Nobody has noted yet that, as an alternative, you can use the
from a import name, age
form and then use name and age directly (without the a. prefix). The from [module] import [identifiers] form is more future proof because you can easily see when one import will be overriding another.
Also note that "variables" aren't different from functions in Python in terms of how they're addressed -- every identifier like name or sayBye is pointing at some kind of object. The identifier name is pointing at a string object, sayBye is pointing at a function object, and age is pointing at an integer object. When you tell Python:
from a import name, age
you're saying "take those objects pointed at by name and age within module a and point at them in the current scope with the same identifiers".
Similarly, if you want to point at them with different identifiers on import, you can use the
from a import sayBye as bidFarewell
form. The same function object gets pointed at, except in the current scope the identifier pointing at it is bidFarewell whereas in module a the identifier pointing at it is sayBye.
Like others have said,
from module import *
will also import the module's variables.
However, you need to understand that you are not importing variables, just references to objects. Assigning something else to the imported names in the importing module won't affect the other modules.
Example: assume you have a module module.py containing the following code:
a= 1
b= 2
Then you have two other modules, mod1.py and mod2.py which both do the following:
from module import *
In each module, two names, a and b are created, pointing to the objects 1 and 2, respectively.
Now, if somewhere in mod1.py you assign something else to the global name a:
a= 3
the name a in module.py and the name a in mod2.py will still point to the object 1.
So from module import * will work if you want read-only globals, but it won't work if you want read-write globals. If the latter, you're better off just using import module and then either getting the value (module.a) or setting the value (module.a = …) prefixed by the module.
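The behaviour can be checked with a small sketch. Here shared_demo stands in for module.py, and the two dictionaries mod1 and mod2 stand in for the namespaces of mod1.py and mod2.py; all three names are hypothetical and only exist so the example runs in one file.

```python
import sys
import types

# "shared_demo" stands in for module.py; the name is hypothetical.
shared = types.ModuleType("shared_demo")
shared.a = 1
shared.b = 2
sys.modules["shared_demo"] = shared

# Two independent namespaces that both star-import the shared module.
mod1 = {}
mod2 = {}
exec("from shared_demo import *", mod1)
exec("from shared_demo import *", mod2)

# Rebinding the name inside "mod1" touches only that namespace.
mod1["a"] = 3
print(shared.a, mod2["a"])

# Assigning through the module object is visible to everyone who
# looks the value up as shared_demo.a.
exec("import shared_demo\nshared_demo.a = 10", mod1)
print(shared.a, mod2["a"])
```

Note that mod2 keeps its own star-imported binding even after the module attribute changes, which is why the prefixed form is the one to use for shared state.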
You didn't say this directly, but I'm assuming you're having trouble with manipulating these global variables.
If you manipulate global variables from inside a function, you must declare them global:
a = 10

def x():
    global a
    a = 15

print(a)
x()
print(a)
If you don't do that, then a = 15 will just create a local variable and assign it 15, while the global a stays 10.
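To see both cases side by side, here is a minimal sketch. The function names rebind_local and rebind_global are hypothetical and only serve the comparison.

```python
a = 10

def rebind_local():
    a = 15          # creates a new local name; the global is untouched
    return a

def rebind_global():
    global a
    a = 15          # rebinds the module-level name
    return a

local_result = rebind_local()
after_local = a     # value of the global after the local assignment
rebind_global()
after_global = a    # value of the global after the declared assignment
print(local_result, after_local, after_global)
```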