Managing optional dependencies in __init__.py (Python)

I am developing a python package K (so it has an __init__.py).
This package contains different sub-packages, each about a different part of my work. Let us call one of these M (so it also has its own __init__.py).
Now M has two modules, A and B, each containing one or more functions (how many is not important), with one difference: all functions in A depend on an optional dependency opt_dep_A, and analogously the functions in B depend on opt_dep_B.
Both optional dependencies can be installed when installing K with pip as e.g. pip install 'K[opt_dep_A]'.
Now I am looking to make the user experience "modular", meaning that users interested in the B functions should be able to use them (provided opt_dep_B is installed) without worrying about A.
Here's the problem: A belongs to the same sub-package as B (because their functions relate to the same high-level "group").
How can I deal with this clash when importing/exposing from K's or M's __init__.py?
In both cases, when I do e.g. in M/__init__.py
from .A import func_A
from .B import func_B
if opt_dep_A is not installed but the user does from K import func_B or from M import func_B, then any caught/uncaught ImportError or warning raised from the A module will get triggered, even though I never wanted to import anything from A.
I'd still like A and B to live at the same level, though: how can I maintain the modularity and still keep the same package structure? Is it possible?
I tried try/except clauses and importlib.util.find_spec, but the problem is really that I am importing one part from the same level.

So, here is a basic way to handle this. In K/M/A.py, you can have something like this:
import whatever  # not an optional dependency

def func_A():
    import opt_dep_A
    ...

def some_other_func():
    # Doesn't require opt_dep_A
    ...

def func_A_frobnicate():
    # requires opt_dep_A as well, but don't worry, imports are cached
    import opt_dep_A
    ...
Essentially, you don't actually import the optional dependencies until the function (or class or whatever) is actually used. This way, it doesn't matter what namespace you move func_A into.
This is the way that, for example, pandas handles this, see:
https://github.com/pandas-dev/pandas/blob/8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7/pandas/io/pytables.py#L559
Note, they wrap the import in a function that provides some error handling and a useful error message, as well as some other book-keeping logic, but the basic idea is the same.
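For illustration, here is a minimal sketch of that wrapper idea (the helper name _import_optional is made up, not pandas' actual function):
import importlib

def _import_optional(name):
    # Convert a missing optional dependency into an actionable error message.
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            "{0} is required for this feature; install it with "
            "pip install 'K[{0}]'".format(name)
        ) from err

def func_A():
    opt_dep_A = _import_optional("opt_dep_A")
    ...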
Note: an approach where you handle the ImportError at the top level would also work, but then you would get random NameErrors whenever you try to use the functions that need the dependency, since the name won't exist in the module if the import failed. So I don't think wrapping the top-level import in a try... except ImportError saves you much work.

Related

How to properly deal with optional features in python

I'm working on python packages that implement scientific models and I'm wondering what is the best way to handle optional features.
Here's the behavior I'd like:
If some optional dependencies can't be imported (the plotting module on a headless machine, for example), I'd like to disable the functions using these modules in my classes, warn users if they try to use them, and do all that without breaking the execution.
so the following script would work in any case:
mymodel.dostuff()
mymodel.plot()        # <= only plots if possible, else logs an error
mymodel.domorestuff() # <= executed regardless of the result of the previous statement
So far the options I see are the following:
- check in the __init__.py for available modules and keep a list of them (but how to properly use it in the rest of the package?)
- for each function relying on optional dependencies, have a try: import ... except ... statement
- put functions depending on a particular module in a separate file
These options should work, but they all seem rather hacky and hard to maintain. What if we want to drop a dependency completely? Or make it mandatory?
The easiest solution, of course, is to simply import the optional dependencies in the body of the function that requires them. But the always-right PEP 8 says:
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
Not wanting to go against the best wishes of the python masters, I take the following approach, which has several benefits...
First, import with a try-except
Say one of my functions foo needs numpy, and I want to make it an optional dependency. At the top of the module, I put:
try:
    import numpy as _numpy
except ImportError:
    _has_numpy = False
else:
    _has_numpy = True
Here (in the except block) would be the place to print a warning, preferably using the warnings module.
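For example, a variant of the block above that emits the warning from the except branch:
import warnings

try:
    import numpy as _numpy
except ImportError:
    warnings.warn("numpy could not be imported; foo() will be unavailable.")
    _has_numpy = False
else:
    _has_numpy = True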
Then throw the exception in the function
What if the user calls foo and doesn't have numpy? I throw the exception there and document this behaviour.
def foo(x):
    """Requires numpy."""
    if not _has_numpy:
        raise ImportError("numpy is required to do this.")
    ...
Alternatively you can use a decorator and apply it to any function requiring that dependency:
@requires_numpy
def foo(x):
    ...
This has the benefit of preventing code duplication.
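A minimal sketch of such a decorator (it assumes the _has_numpy flag defined above):
import functools

def requires_numpy(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        if not _has_numpy:
            raise ImportError("numpy is required to call {}.".format(func.__name__))
        return func(*args, **kwargs)
    return wrapper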
And add it as an optional dependency to your install script
If you're distributing code, look up how to add the extra dependency to the setup configuration. For example, with setuptools, I can write:
install_requires=["networkx"],
extras_require={
    "numpy": ["numpy"],
    "sklearn": ["scikit-learn"],
}
This specifies that networkx is absolutely required at install time, but that the extra functionality of my module requires numpy and sklearn, which are optional.
Using this approach, here are the answers to your specific questions:
What if we want to make a dependency mandatory?
We can simply add our optional dependency to our setup tool's list of required dependencies. In the example above, we move numpy to install_requires. All of the code checking for the existence of numpy can then be removed, but leaving it in won't cause your program to break.
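For example, the setup configuration above would become:
install_requires=["networkx", "numpy"],
extras_require={
    "sklearn": ["scikit-learn"],
}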
What if we want to drop a dependency completely?
Simply remove the check for the dependency in any function that previously required it. If you implemented the dependency check with a decorator, you could just change it so that it simply passes the original function through unchanged.
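For example, with the requires_numpy decorator sketched earlier, dropping the dependency reduces to:
def requires_numpy(func):
    # dependency dropped: no check needed, pass the function through unchanged
    return func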
This approach has the benefit of placing all of the imports at the top of the module so that I can see at a glance what is required and what is optional.
I would use the mixin style of composing a class. Keep optional behaviour in separate classes and subclass those classes in your main class. If you detect that the optional behaviour is not possible then create a dummy mixin class instead. For example:
model.py
import numpy
import plotting

class Model(PrimaryBaseclass, plotting.Plotter):
    def do_something(self):
        ...

plotting.py
from your_util_module import headless as _headless

__all__ = ["Plotter"]

if _headless:
    import warnings

    class Plotter:
        def plot(self):
            warnings.warn("Attempted to plot in a headless environment")
else:
    class Plotter:
        """Expects an attribute called `data' when plotting."""
        def plot(self):
            ...
Or, as an alternative, use decorators to describe when a function might be unavailable.
e.g.:
import warnings

class unavailable:
    def __init__(self, *, when):
        self.when = when

    def __call__(self, func):
        if self.when:
            def dummy(self, *args, **kwargs):
                warnings.warn("{} unavailable with current setup"
                              .format(func.__qualname__))
            return dummy
        else:
            return func

class Model:
    @unavailable(when=headless)
    def plot(self):
        ...

Python module: how to prevent importing modules called by the new module

I am new to Python and I am creating a module to re-use some code.
My module (impy.py) looks like this (it has one function so far)...
import numpy as np

def read_image(fname):
    ....
and it is stored in the following directory:
custom_modules/
    __init__.py
    impy.py
As you can see, it uses the module numpy. The problem is that when I import it from another script, like this...
import custom_modules.impy as im
...and then type im., I get completion not only for the function read_image() but also for the module np.
How can I make only the functions I write in my module available, and not the modules that my module imports (numpy in this case)?
Thank you very much for your help.
I've got a proposition that could answer the following concern: "I do not want to mix class/module attributes with class/module imports". IDLE, too, proposes access to imported modules within a class or module.
It simply consists of giving the import the conventional name that coders normally don't want to access and IDEs don't propose: a name starting with an underscore. This is also known as a "weak internal use indicator", as described in PEP 8 / Naming styles.
class C(object):
    import numpy as _np  # <-- here

    def __init__(self):
        # whatever we need
        ...

    def do(self, arg):
        # something useful
        ...
Now, in IDLE, auto-completion will only propose the do function; the imported module is not proposed.
By the way, you should change the title of your question: you do not want to prevent imports of your imported modules (that would make them unusable), so it should rather be "how to prevent the IDE from showing imported modules of an imported module" or something similar.
You could import numpy inside your function
def read_image(fname):
    import numpy as np
    ....
making it locally available to the read_image code, but not globally available.
A warning, though: the import statement still executes every time read_image runs, but after the first call this is just a cached lookup in sys.modules rather than a full import, so the overhead is small (it can still matter if read_image is called in a tight loop).
If you really want to hide it, then I suggest creating a new directory such that your structure looks like this:
custom_modules/
    __init__.py
    impy/
        __init__.py
        impy.py
and let the new impy/__init__.py contain
from .impy import read_image
This way, you can control what ends up in the custom_modules.impy namespace.

Experiment trying to avoid Python circular dependencies

I have a testing environment to try to understand how Python circular dependencies can be avoided by importing modules with an import x statement instead of a from x import y:
test/
    __init__.py
    testing.py
    a/
        __init__.py
        m_a.py
    b/
        __init__.py
        m_b.py
The files have the following content:
testing.py:
from a.m_a import A

m_a.py:
import b.m_b
print b.m_b

class A:
    pass

m_b.py:
import a.m_a
print a.m_a

class B:
    pass
There is a situation which I can't understand:
If I remove the print statements from both m_a.py and m_b.py, or only from m_b.py, this works OK; but if the print is present in m_b.py, then the following error is thrown:
File "testing.py", line 1, in <module>
from a.m_a import A
File "/home/enric/test/a/m_a.py", line 1, in <module>
import b.m_b
File "/home/enric/test/b/m_b.py", line 3, in <module>
print a.m_a
AttributeError: 'module' object has no attribute 'm_a'
Do you have any ideas?
It only "works" with the print statements removed because you're not actually doing anything that depends on the imports. It's still a broken circular import.
Either run this in the debugger, or add a print statement after each line, and you'll see what happens:
testing.py: from a.m_a import A
a.m_a: import b.m_b
b.m_b: import a.m_a
b.m_b: print a.m_a
It's clearly trying to access a.m_a before the module finished importing. (In fact, you can see the rest of a.m_a on the stack in your backtrace.)
If you dump out sys.modules at this point, you'll find two partial modules named a and a.m_a, but if you dir(a), there's no m_a there yet.
As far as I can tell, the fact that m_a doesn't get added to a until m_a.py finishes evaluating is not documented anywhere in the Python 2.7 documentation. (3.x has a much more complete specification of the import process, but it's also a very different import process.) So, you can't rely on this either failing or succeeding; either one is perfectly legal for an implementation. (But it happens to fail in at least CPython and PyPy…)
More generally, using import foo instead of from foo import bar doesn't magically solve all circular-import problems. It just solves one particular class of circular-import problems (or, rather, makes that class moot). (I realize there is some misleading text in the FAQ about this.)
There are various tricks to work around circular imports while still letting you have circular top-level dependencies. But really, it's almost always simpler to get rid of the circular top-level dependencies.
In this toy case, there's really no reason for a.m_a to depend on b.m_b at all. If you need something that prints out a.m_a, there are better ways to get it than from a completely independent package!
In real-life code, there probably is some stuff in m_a that m_b needs and vice-versa. But usually, you can separate it out into two levels: stuff in m_a that needs m_b, and stuff in m_a that's needed by m_b. So, just split it into two modules. It's really the same thing as the common fix for a bunch of modules that try to reach back up and import main: split a utils off main.
What if there really is something that m_b needs from m_a, that also needs m_b? Well, in that case, you may have to insert a level of indirection. For example, maybe you can pass the thing-from-m_b into the function/constructor/whatever from m_a, so it can access it as a local parameter value instead of as a global. (It's hard to be more specific without a more specific problem.)
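A sketch of that indirection (b_obj is a made-up parameter name):
# a/m_a.py
class A:
    def __init__(self, b_obj):
        # the caller passes in whatever b.m_b previously had to be
        # imported for, so m_a no longer imports b.m_b at the top level
        self.b_obj = b_obj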
If worst comes to worst, and you can't remove the import via indirection, you have to move the import out of the way. That may again mean doing an import inside a function call, etc. (as explained in the FAQ immediately after the paragraph that set you off), or just moving some code above the import, or all kinds of other possibilities. But consider these last-ditch solutions to something which just can't be designed cleanly, not a roadmap to follow for your designs.
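To make the "import inside a function call" option concrete, here is a sketch in the same Python 2 style as the example above:
# b/m_b.py
def show_m_a():
    import a.m_a  # deferred: by the time this runs, a.m_a has finished importing
    print a.m_a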

Control python imports to reduce size and overhead

I have created a number of personal libraries to help with my daily coding. Best practice is to put imports at the beginning of Python programs. But say I import my library, or even just a function or class from it: all of the library's imports are executed as well (even those used only in classes or functions I never call). I assume this increases the overhead of the program?
One example. I have a library called pytools which looks something like this
import difflib

def foo():
    # uses difflib.SequenceMatcher
    ...

def bar():
    # benign function, e.g.
    print "Hello!"
    return True

class foobar:
    def __init__(self):
        print "New foobar"
    def ret_true(self):
        return True
The function foo uses difflib. Now say I am writing a new program that needs to use bar and foobar. I could either write
import pytools
...
item = pytools.foobar()
vals = pytools.bar()
or I could do
from pytools import foobar, bar
...
item = foobar()
vals = bar()
Does either choice reduce overhead or preclude the import of foo and its dependencies on difflib? What if the import to difflib was inside of the foo function?
The problem I am running into is when converting simple programs that use only one or two classes or functions from my libraries into executables: the executable ends up being 50 MB or so.
I have read through py2exe's optimizing size page and can optimize using some of its suggestions.
http://www.py2exe.org/index.cgi/OptimizingSize
I guess I am really asking for best practice here. Is there some way to preclude the import of libraries whose dependencies are in unused functions or classes? I've watched import statements execute using a debugger and it appears that python only "picks up" the line with "def somefunction" before moving on. Is the rest of the import not completed until the function/class is used? This would mean putting high volume imports at the beginning of a function or class could reduce overhead for the rest of the library.
The only way to effectively reduce your dependencies is to split your tool box into smaller modules, and to only import the modules you need.
Putting imports at the beginning of unused functions will prevent loading these modules at run-time, but is discouraged because it hides the dependencies. Moreover, your Python-to-executable converter will likely need to include these modules anyway, since Python's dynamic nature makes it impossible to statically determine which functions are actually called.
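For instance, the pytools library above could be split like this (a sketch; the module names are made up):
pytools/
    __init__.py   # keep this empty so importing the package stays cheap
    textdiff.py   # foo() lives here, along with the difflib import
    misc.py       # bar() and foobar live here, with no heavy imports
Then from pytools.misc import bar runs only pytools/__init__.py and misc.py; difflib is never imported.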

Why is "import *" bad?

It is recommended not to use import * in Python.
Can anyone please share the reason for that, so that I can avoid doing it next time?
Because it puts a lot of stuff into your namespace (might shadow some other object from previous import and you won't know about it).
Because you don't know exactly what is imported and can't easily find from which module a certain thing was imported (readability).
Because you can't use cool tools like pyflakes to statically detect errors in your code.
According to the Zen of Python:
Explicit is better than implicit.
... can't argue with that, surely?
You don't pass **locals() to functions, do you?
Since Python lacks an "include" statement, and the self parameter is explicit, and scoping rules are quite simple, it's usually very easy to point a finger at a variable and tell where that object comes from -- without reading other modules and without any kind of IDE (which are limited in the way of introspection anyway, by the fact the language is very dynamic).
The import * breaks all that.
Also, it has a concrete possibility of hiding bugs.
import os, sys, foo, sqlalchemy, mystuff
from bar import *
Now, if the bar module has any of the "os", "mystuff", etc... attributes, they will override the explicitly imported ones, and possibly point to very different things. Defining __all__ in bar is often wise -- this states what will implicitly be imported - but still it's hard to trace where objects come from, without reading and parsing the bar module and following its imports. A network of import * is the first thing I fix when I take ownership of a project.
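For example, a sketch of such a bar module:
# bar.py
import os  # helper import: not exported, because it is not listed in __all__

__all__ = ["baz"]  # "from bar import *" now brings in only baz

def baz():
    return os.getcwd()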
Don't misunderstand me: if the import * were missing, I would cry to have it. But it has to be used carefully. A good use case is to provide a facade interface over another module.
Likewise, the use of conditional import statements, or imports inside function/class namespaces, requires a bit of discipline.
I think in medium-to-big projects, or small ones with several contributors, a minimum of hygiene is needed in terms of statical analysis -- running at least pyflakes or even better a properly configured pylint -- to catch several kind of bugs before they happen.
Of course, since this is Python, feel free to break rules and to explore -- but be wary in projects that could grow tenfold: if the source code is missing discipline, it will be a problem.
That is because you are polluting the namespace. You will import all the functions and classes in your own namespace, which may clash with the functions you define yourself.
Furthermore, I think using a qualified name is clearer for maintenance: you see on the code line itself where a function comes from, so you can check the docs much more easily.
In module foo:
def myFunc():
    print 1
In your code:
from foo import *

def doThis():
    myFunc()  # Which myFunc is called?

def myFunc():
    print 2
It is OK to do from ... import * in an interactive session.
Say you have the following code in a module called foo:
from xml.etree import ElementTree as etree
and then in your own module you have:
from lxml import etree
from foo import *
You now have a difficult-to-debug module that looks like it has lxml's etree in it, but really has ElementTree instead.
I understand the valid points people have made here. However, I do have one argument that "star import" may not always be bad practice:
When I want to structure my code so that all constants go into a module called const.py:
If I do import const, then for every constant I have to refer to it as const.SOMETHING, which is probably not the most convenient way.
If I do from const import SOMETHING_A, SOMETHING_B, ..., then obviously it's way too verbose and defeats the purpose of the structuring.
Thus I feel that in this case, doing from const import * may be a better choice.
http://docs.python.org/tutorial/modules.html
Note that in general the practice of importing * from a module or package is frowned upon, since it often causes poorly readable code.
These are all good answers. I'm going to add that when teaching new people to code in Python, dealing with import * is very difficult. Even if you or they didn't write the code, it's still a stumbling block.
I teach children (about 8 years old) to program in Python to manipulate Minecraft. I like to give them a helpful coding environment to work with (the Atom editor) and teach REPL-driven development (via bpython). In Atom I find that the hints/completion work just as effectively as in bpython. Luckily, unlike some other static analysis tools, Atom is not fooled by import *.
However, let's take this example... In this wrapper they from local_module import * a bunch of modules, including this list of blocks. Let's ignore the risk of namespace collisions. By doing from mcpi.block import *, they make this entire list of obscure block types something you have to go look up to know what is available. If they had instead used from mcpi import block, then you could type walls = block. and an autocomplete list would pop up.
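For example (block.STONE is one of the block constants in mcpi):
from mcpi import block  # namespaced import keeps completion useful

walls = block.STONE     # typing "block." pops up the list of block types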
It is a very BAD practice, for two reasons:
1. Code readability
2. Risk of overriding variables/functions etc.
For point 1:
Let's see an example:
from module1 import *
from module2 import *
from module3 import *

a = b + c - d
Here, on seeing the code, no one can tell which modules b, c, and d actually come from.
On the other hand, if you do it like this:
from module1 import b, c  # way 1: clearly from module1
import module2            # way 2

a = b + c - module2.d     # d clearly comes from module2
it is much cleaner for you, and a new person joining your team will also have a better idea of what is going on.
For point 2: Let's say both module1 and module2 have a variable b. When I do:
from module1 import *
from module2 import *

print b  # will print the value from module2
the value from module1 is lost. It will be hard to debug why the code is not working, even though b is declared in module1 and I wrote the code expecting it to use module1.b.
If you have same variables in different modules, and you do not want to import entire module, you may even do:
from module1 import b as mod1b
from module2 import b as mod2b
As a test, I created a module test.py with two functions A and B, which respectively print "A 1" and "B 1". After importing test.py with:
import test
...I can run the two functions as test.A() and test.B(), and "test" shows up as a module in the namespace, so if I edit test.py I can reload it with:
import importlib
importlib.reload(test)
But if I do the following:
from test import *
there is no reference to "test" in the namespace, so there is no way to reload it after an edit (as far as I can tell), which is a problem in an interactive session. Whereas either of the following:
import test
import test as tt
will add "test" or "tt" (respectively) as module names in the namespace, which will allow re-loading.
If I do:
from test import *
the names "A" and "B" show up in the namespace as functions. If I edit test.py, and repeat the above command, the modified versions of the functions do not get reloaded.
And the following command elicits an error message.
importlib.reload(test) # Error - name 'test' is not defined
If someone knows how to reload a module loaded with "from module import *", please post. Otherwise, this would be another reason to avoid the form:
from module import *
As suggested in the docs, you should (almost) never use import * in production code.
While importing * from a module is bad, importing * from a package is probably even worse.
By default, from package import * imports whatever names are defined by the package's __init__.py, including any submodules of the package that were loaded by previous import statements.
If a package’s __init__.py code defines a list named __all__, it is taken to be the list of submodule names that should be imported when from package import * is encountered.
Now consider this example (assuming there's no __all__ defined in sound/effects/__init__.py):
# anywhere in the code before import *
import sound.effects.echo
import sound.effects.surround
# in your module
from sound.effects import *
The last statement will import the echo and surround modules into the current namespace (possibly overriding previous definitions) because they are defined in the sound.effects package when the import statement is executed.
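For contrast, a minimal sketch: if sound/effects/__init__.py contained
__all__ = ["echo", "surround"]
then from sound.effects import * would import exactly those two submodules, no matter what had been loaded earlier.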
