Automatic replacement of indirect imports - python

I have a small module that defines a function:
# small_file.py
def func():
    ...
I have a larger module that does a wildcard import of that one:
# giant_file.py
from small_file import *
...
I have hundreds of other files that indirectly import the function via giant_file:
# file1.py
from giant_file import func
...
# file2.py
import giant_file  # func is then accessed as giant_file.func
...
# file3.py
from giant_file import func, something_not_in_small_file
...
I would like to automatically change all these other files to import directly from small_file.py. This is to avoid the overhead of loading all of giant_file.py.
My question is this: is there a good way to automatically change all these files to import directly?
My default plan is to write a redbaron-based tool, but I'm hoping there might be a more lightweight approach utilizing an IDE's refactoring capabilities.

I solved this problem by writing a static analysis tool that analyzes my entire codebase and constructs an AST for each file. Using the ASTs, the tool traverses imports and attributes each symbol read to an original assignment. It then rewrites all the imports in each file as direct imports (eliminating unused imports along the way).
There are some potential pathological cases that the tool misses, such as dynamically constructing strings and then assigning to them via globals(). But it works for most "normal" code.
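For example, an assignment like this one is invisible to a purely static pass (the names here are made up):
name = 'fu' + 'nc'
globals()[name] = lambda: None  # binds func() with no import or def for the tool to see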
I used RedBaron for this. If I could do it again, I would use LibCST instead: RedBaron has various bugs and unsupported Python language elements, while LibCST appears to be more mature and actively maintained.
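For reference, a minimal sketch of the core rewrite using LibCST; this is an assumption-laden toy rather than the tool itself, and it only handles the simple case where every name imported from giant_file actually lives in small_file:
import libcst as cst

class ImportRewriter(cst.CSTTransformer):
    def leave_ImportFrom(self, original_node, updated_node):
        # from giant_file import ...  ->  from small_file import ...
        if isinstance(updated_node.module, cst.Name) and updated_node.module.value == "giant_file":
            return updated_node.with_changes(module=cst.Name("small_file"))
        return updated_node

source = open("file1.py").read()
tree = cst.parse_module(source)
print(tree.visit(ImportRewriter()).code)
A real tool also has to split mixed imports like the one in file3.py and decide, per name, which module actually defines it.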

Related

Using __future__ style imports for module specific features in Python

The Python future statement from __future__ import feature provides a nice way to ease the transition to new language features. Is it possible to implement a similar feature for Python libraries: from myproject.__future__ import feature?
It's straightforward to set module-wide constants on an import statement. What isn't obvious to me is how you could ensure these constants don't propagate to code executed in imported modules -- they should also require a future import to enable the new feature.
This came up recently in a discussion of possible indexing changes in NumPy. I don't expect it will actually be used in NumPy, but I can see it being useful for other projects.
As a concrete example, suppose that we do want to change how indexing works in some future version of NumPy. This would be a backwards incompatible change, so we decide to use a future statement to ease the transition. A script using this new feature looks something like this:
import numpy as np
from numpy.__future__ import orthogonal_indexing
x = np.random.randn(5, 5)
print(x[[0, 1], [0, 1]]) # should use the "orthogonal indexing" feature
# prints a 2x2 array of random numbers
# we also want to use a legacy project that uses indexing, but
# hasn't been updated to use the "orthogonal indexing" feature
from legacy_project import do_something
do_something(x) # should *not* use "orthogonal indexing"
If this isn't possible, what's the closest we can get for enabling local options? For example, is it possible to write something like:
from numpy import future
future.enable_orthogonal_indexing()
Using something like a context manager would be fine, but the problem is that we don't want to propagate options to nested scopes:
with numpy.future.enable_orthogonal_indexing():
    print(x[[0, 1], [0, 1]])  # should use the "orthogonal indexing" feature
    do_something(x)  # should *not* use "orthogonal indexing" inside do_something
The way Python itself does this is pretty simple:
In the importer, when you try to import from a .py file, the code first scans the module for future statements.
Note that the only things allowed before a future statement are strings, comments, blank lines, and other future statements, which means it doesn't need to fully parse the code to do this. That's important, because future statements can change the way the code is parsed (in fact, that's the whole point of having them…); strings, comments, and blank lines can be handled by the lexer step, and future statements can be parsed with a very simple special-purpose parser.
Then, if any future statements are found, Python sets a corresponding flag bit, then re-seeks to the top of the file and calls compile with those flags. For example, for from __future__ import unicode_literals, it does flags |= __future__.unicode_literals.compiler_flag, which changes flags from 0 to 0x20000.
In this "real compile" step, the future statements are treated as normal imports, and you will end up with a __future__._Feature value in the variable unicode_literals in the module's globals.
Now, you can't quite do the same thing, because you're not going to reimplement or wrap the compiler. But what you can do is use your future-like statements to signal an AST transform step. Something like this:
import ast

# f is the module's source file object, opened by your import hook
flags = []
for line in f:
    flag = parse_future(line)   # 0 for a blank/comment/string line, a flag, or None
    if flag is None:
        break
    if flag:
        flags.append(flag)
f.seek(0)
contents = f.read()
tree = ast.parse(contents, f.name)
for flag in flags:
    tree = transformers[flag](tree)
code = compile(tree, f.name, 'exec')
Of course you have to write that parse_future function to return 0 for a blank line, comment, or string, a flag for a recognized future import (which you can look up dynamically if you want), or None for anything else. And you have to write the AST transformers for each flag. But they can be pretty simple—e.g., you can transform Subscript nodes into different Subscript nodes, or even into Call nodes that call different functions based on the form of the index.
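For instance, here is a bare-bones transformer in that spirit, written against the Python 3.9+ AST layout; the _orthogonal_getitem helper is hypothetical and would have to exist at run time, and only plain load subscripts are rewritten:
import ast

class OrthogonalIndexing(ast.NodeTransformer):
    def visit_Subscript(self, node):
        self.generic_visit(node)
        # leave assignment/deletion targets and real slices like x[1:2] alone
        if not isinstance(node.ctx, ast.Load) or any(
                isinstance(n, ast.Slice) for n in ast.walk(node.slice)):
            return node
        return ast.copy_location(
            ast.Call(func=ast.Name(id="_orthogonal_getitem", ctx=ast.Load()),
                     args=[node.value, node.slice], keywords=[]),
            node)

tree = ast.parse("y = x[[0, 1], [0, 1]]")
tree = ast.fix_missing_locations(OrthogonalIndexing().visit(tree))
print(ast.unparse(tree))  # y = _orthogonal_getitem(x, ([0, 1], [0, 1]))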
To hook this into the import system, see PEP 302. Note that this gets simpler in Python 3.3, and simpler again in Python 3.4, so if you can require one of those versions, instead read the import system docs for your minimum version.
For a great example of import hooks and AST transformers being used in real life, see MacroPy. (Note that it's using the old 2.3-style import hook mechanism; again, your own code can be simpler if you can use 3.3+ or 3.4+. And of course your code isn't generating the transforms dynamically, which is the most complicated part of MacroPy…)
The __future__ in Python is both a module and not quite one. Feature names in a real future statement are not actually imported from anywhere - they are flags recognized by the Python bytecode compiler, and the import syntax was deliberately chosen so that no new syntax had to be created. There is also a __future__.py in the library directory; it can be imported as such: import __future__; and then you can, for example, access __future__.print_function to find out in which Python version the feature became optionally available and in which version it is on by default.
It is possible to make a __future__ module that knows what is being imported. Here is an example of myproject/__future__.py that can intercept feature imports on per module basis:
import sys
import inspect

class FutureMagic(object):
    inspect = inspect  # keep a reference that survives the module replacement

    @property
    def more_magic(self):
        importing_frame = self.inspect.getouterframes(
            self.inspect.currentframe())[1][0]
        module = importing_frame.f_globals['__name__']
        print("more magic imported in %s" % module)

sys.modules[__name__] = FutureMagic()
At load time the module replaces itself in sys.modules with a FutureMagic() instance. Whenever more_magic is imported from myproject.__future__, the more_magic property getter is called, and it prints out the name of the module that imported the feature:
>>> from myproject.__future__ import more_magic
more magic imported in __main__
Now, you could keep a record of which modules have imported this feature. Doing import myproject.__future__; myproject.__future__.more_magic would trigger the same machinery, but you could also require that the more_magic import be at the beginning of the file - at that point the importing module's globals shouldn't contain anything except values returned from this fake module; otherwise the value is only being accessed for inspection.
However, the real question is how you could use this: finding out which module a function is being called from is quite expensive, and that would limit the usefulness of the feature.
Thus a possibly more fruitful approach could be to use import hooks to do source translation on the abstract syntax trees of modules that do from mypackage.__future__ import more_magic, possibly changing every object[index] into __newgetitem__(operand, index).
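To make that concrete, here is a rough sketch of such a hook; it is an assumption-heavy toy rather than production code, and the opt-in line, the __newgetitem__ helper, and the transformer are all hypothetical:
import ast
import importlib.abc
import importlib.machinery
import sys

OPT_IN = b"from mypackage.__future__ import more_magic"

class RewriteIndexing(ast.NodeTransformer):
    # same idea as the transformer sketched above: rewrite loads of obj[index]
    def visit_Subscript(self, node):
        self.generic_visit(node)
        if not isinstance(node.ctx, ast.Load) or any(
                isinstance(n, ast.Slice) for n in ast.walk(node.slice)):
            return node
        return ast.copy_location(
            ast.Call(func=ast.Name(id="__newgetitem__", ctx=ast.Load()),
                     args=[node.value, node.slice], keywords=[]),
            node)

class FutureLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.fix_missing_locations(RewriteIndexing().visit(ast.parse(data, path)))
        return compile(tree, path, "exec")

class FutureFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not str(spec.origin or "").endswith(".py"):
            return None
        with open(spec.origin, "rb") as src:
            if OPT_IN not in src.read():
                return None  # module did not opt in; fall back to the normal import
        spec.loader = FutureLoader(fullname, spec.origin)
        return spec

sys.meta_path.insert(0, FutureFinder())
Beware that a previously written .pyc can short-circuit source_to_code, so a real hook also has to deal with bytecode caching.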
No, you can't. The real __future__ import is special in that its effects are local to the individual file where it occurs. But ordinary imports are global: once one module does import blah, blah is executed and is available globally; other modules that later do import blah just retrieve the already-imported module. This means that if from numpy.__future__ changes something in numpy, everything that does import numpy will see the change.
As an aside, I don't think this is what that mailing list message is suggesting. I read it as suggesting an effect that is global, equivalent to setting a flag like numpy.useNewIndexing = True. This would mean that you should only set that flag at the top level of your application if you know that all parts of your application will work with that.
No, there is no reasonable way to do this. Let's go through the requirements.
First, you need to figure out which modules have your custom future statement enabled. Standard imports aren't up to this, but you could require them to e.g. call some enabling function and pass __name__ as a parameter. This is somewhat ugly:
from numpy.future import new_indexing
new_indexing(__name__)
This falls apart in the face of importlib.reload(), but meh.
Next, you need to figure out whether your caller is running in one of these modules. You'd start by pulling out the stack via inspect.stack() (which won't work under all Python implementations, misses C extension modules, etc.) and then goof around with inspect.getmodule() and the like.
Frankly, this is just a Bad Idea.
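For concreteness, a sketch of the machinery this answer is warning against (all names invented):
import inspect

_ENABLED = set()  # names of modules that called new_indexing(__name__)

def new_indexing(module_name):
    _ENABLED.add(module_name)

def _caller_wants_new_indexing():
    # walk the stack and inspect the module of the first frame outside this
    # library; slow, and unreliable on implementations without frame support
    for frame_info in inspect.stack()[1:]:
        module = inspect.getmodule(frame_info.frame)
        if module is not None and not module.__name__.startswith("numpy"):
            return module.__name__ in _ENABLED
    return False
Every indexing operation would have to pay for that stack walk, which is exactly why it is a Bad Idea.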
If the "feature" that you want to control can be boiled down to changing a name, then this is easy to do, like
from module.new_way import something
vs
from module.old_way import something
The feature you suggested is not, of course, but I would argue that this is the only Pythonic way of having different behavior in different scopes (and I do think you mean scope, not module - e.g., what if someone does an import inside a function definition?), since scoping of names is controlled and well supported by the interpreter itself.

Python module: how to prevent importing modules called by the new module

I am new to Python and I am creating a module to re-use some code.
My module (impy.py) looks like this (it has one function so far)...
import numpy as np
def read_image(fname):
    ...
and it is stored in the following directory:
custom_modules/
    __init__.py
    impy.py
As you can see it uses the module numpy. The problem is that when I import it from another script, like this...
import custom_modules.impy as im
and I type im., I get completions not only for the function read_image() but also for the module np.
How can I make only the functions I write in my module available, and not the modules that my module itself imports (numpy in this case)?
Thank you very much for your help.
I have a suggestion that may address the following concern: "I do not want to mix class/module attributes with class/module imports", since IDLE also proposes access to modules imported within a class or module.
This simply consists of using the conventional naming that coders normally don't want to access and IDEs don't propose: a name starting with an underscore. This is also known as the "weak internal use indicator", as described in PEP 8, Naming styles.
class C(object):
    import numpy as _np  # <-- here

    def __init__(self):
        # whatever we need
        ...

    def do(self, arg):
        # something useful
        ...
Now, in IDLE, auto-completion will only propose the do function; the imported module is not proposed.
By the way, you should change the title of your question: you do not want to prevent the imports of your imported modules (that would make them unusable), so it should rather be "how to prevent the IDE from showing imported modules of an imported module" or something similar.
You could import numpy inside your function
def read_image(fname):
    import numpy as np
    ...
making it locally available to the read_image code, but not globally available.
A word of warning though: this can cause a small performance hit, because the import statement is executed every time read_image runs. After the first call numpy is already cached in sys.modules, so it is only a name lookup rather than a full re-import, but it still adds up if you call read_image many times.
If you really want to hide it, then I suggest creating a new directory such that your structure looks like this:
custom_modules/
    __init__.py
    impy/
        __init__.py
        impy.py
and let the new impy/__init__.py contain
from .impy import read_image
This way, you can control what ends up in the custom_modules.impy namespace.
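Client code keeps the same import but now only sees the re-exported name (the file name below is just an example):
import custom_modules.impy as im

img = im.read_image("photo.png")
# numpy is no longer visible as im.np; it only exists inside the inner im.impy module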

Monkeypatching hardcoded global configuration loaded from a .py file

A co-worker has a library that uses a hard-coded configuration defined in its own file. For instance:
constants.py:
API_URL="http://example.com/bogus"
Throughout the rest of the library, the configuration is accessed in the following manner.
from constants import API_URL
As you can imagine, this is not very flexible and causes problems during testing. If I want to change the configuration, I have to modify constants.py, which is in source code management.
Naturally, I'd much rather load the configuration from a JSON or YAML file. I can read the configuration into an object with no problems. Is there a way I can override the constants.py module without breaking the code, so that each global, e.g. API_URL is replaced by a value provided by my file?
I was thinking that after each from constants import ... I could add something like this:
from constants import *  # existing configuration import
import constants
import json

new_config = json.load(open('config.json'))  # load my config file into a dictionary
constants.__dict__.update(new_config)  # override any constants with what I've loaded
The problem with this, of course, is that it's not very "DRY" and looks like it might be brittle.
Does anyone have a suggestion for doing this more cleanly? Thanks!
EDIT: looks like my approach doesn't work anyway. I guess from constants import * copies the values from the module into the current module's global scope?
DOUBLE EDIT: no, it does work; I'm just confused. But rather than doing this in X different files I'd like to have it work transparently if possible.
from module import <name> creates a reference in the importing module's global namespace to the imported object. If that object is immutable, you can't update it through the original module; you would have to monkeypatch the name in each module that imported it.
Your only hope is to be the first to import constants and monkeypatch the names in that module. Subsequent imports will then use your monkeypatched values.
To patch the original module early, the following is enough:
import constants

for name, value in new_config.items():  # .iteritems() on Python 2
    setattr(constants, name, value)
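For example, this could live at the very top of a test entry point or application launcher, before anything else gets a chance to run from constants import API_URL (config.json and mylib are illustrative names):
import json
import constants

with open('config.json') as f:
    for name, value in json.load(f).items():
        setattr(constants, name, value)

from mylib import client  # this import now binds the patched constants.API_URL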

Control python imports to reduce size and overhead

I have created a number of personal libraries to help with my daily coding. Best practice is to put imports at the beginning of Python programs. But say I import my library, or even just a function or class from it. All of the library's module-level imports are executed, even if those modules are only used by classes or functions I never call. I assume this increases the overhead of the program?
One example. I have a library called pytools which looks something like this
import difflib

def foo(a, b):
    # uses difflib.SequenceMatcher
    return difflib.SequenceMatcher(None, a, b).ratio()

def bar():
    # benign function, e.g.
    print("Hello!")
    return True

class foobar:
    def __init__(self):
        print("New foobar")

    def ret_true(self):
        return True
The function foo uses difflib. Now say I am writing a new program that needs to use bar and foobar. I could either write
import pytools
...
item = pytools.foobar()
vals = pytools.bar()
or I could do
from pytools import foobar, bar
...
item = foobar()
vals = bar()
Does either choice reduce overhead or preclude the import of foo and its dependency on difflib? What if the import of difflib were inside the foo function?
The problem I am running into is that when I convert simple programs that only use one or two classes or functions from my libraries into executables, the executable ends up being 50 MB or so.
I have read through py2exe's optimizing size page and can optimize using some of its suggestions.
http://www.py2exe.org/index.cgi/OptimizingSize
I guess I am really asking for best practice here. Is there some way to preclude the import of libraries whose dependencies are only in unused functions or classes? I've watched import statements execute using a debugger, and it appears that Python only "picks up" the line with "def somefunction" before moving on. Is the rest of the import not completed until the function/class is used? This would mean putting heavyweight imports at the beginning of a function or class could reduce overhead for the rest of the library.
The only way to effectively reduce your dependencies is to split your tool box into smaller modules, and to only import the modules you need.
Putting imports at the beginning of unused functions will prevent loading these modules at run-time, but it is discouraged because it hides the dependencies. Moreover, your Python-to-executable converter will likely need to include these modules anyway, since Python's dynamic nature makes it impossible to statically determine which functions are actually called.
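For example, the pytools library from the question could be split roughly like this (module names are invented):
pytools/
    __init__.py   # keep this empty so importing the package stays cheap
    matching.py   # defines foo and does `import difflib` at the top
    misc.py       # defines bar and the foobar class, with no heavy imports

from pytools.misc import bar, foobar  # never triggers the difflib import
A script that only imports pytools.misc never executes matching.py, and a dependency scanner like py2exe's then has a much better chance of leaving difflib and its dependencies out of the executable.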

How should I perform imports in a python module without polluting its namespace?

I am developing a Python package for dealing with some scientific data. There are multiple frequently-used classes and functions from other modules and packages, including numpy, that I need in virtually every function defined in any module of the package.
What would be the Pythonic way to deal with them? I have considered multiple variants, but each has its own drawbacks.
Import the classes at module-level with from foreignmodule import Class1, Class2, function1, function2
Then the imported functions and classes are easily accessible from every function. On the other hand, they pollute the module namespace, making dir(package.module) and help(package.module) cluttered with imported functions.
Import the classes at function-level with from foreignmodule import Class1, Class2, function1, function2
The functions and classes are easily accessible and do not pollute the module, but imports from up to a dozen modules in every function look like a lot of duplicate code.
Import the modules at module-level with import foreignmodule
There is not much namespace pollution, but that is offset by the need to prepend the module name to every function or class call.
Use some artificial workaround like using a function body for all these manipulations and returning only the objects to be exported... like this
def _export():
    from foreignmodule import Class1, Class2, function1, function2
    def myfunc(x):
        return function1(x, function2(x))
    return myfunc
myfunc = _export()
del _export
This manages to solve both problems, module namespace pollution and ease of use for functions... but it seems to be not Pythonic at all.
So what solution is the most Pythonic? Is there another good solution I overlooked?
Go ahead and do your usual from W import X, Y, Z and then use the __all__ special symbol to define what actual symbols you intend people to import from your module:
__all__ = ('MyClass1', 'MyClass2', 'myvar1', …)
This defines the symbols that will be imported into a user's module if they import * from your module.
In general, Python programmers should not be using dir() to figure out how to use your module, and if they are doing so it might indicate a problem somewhere else. They should be reading your documentation or typing help(yourmodule) to figure out how to use your library. Or they could browse the source code themselves, in which case (a) the difference between things you import and things you define is quite clear, and (b) they will see the __all__ declaration and know which toys they should be playing with.
If you try to support dir() in a situation like this for a task for which it was not designed, you will have to place annoying limitations on your own code, as I hope is clear from the other answers here. My advice: don't do it! Take a look at the Standard Library for guidance: it does from … import … whenever code clarity and conciseness require it, and provides (1) informative docstrings, (2) full documentation, and (3) readable code, so that no one ever has to run dir() on a module and try to tell the imports apart from the stuff actually defined in the module.
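As a sketch of what that looks like in practice (module and names invented for illustration):
# shapes.py
from math import pi, sqrt      # implementation details

__all__ = ('Circle', 'area')   # the public surface

class Circle:
    def __init__(self, radius):
        self.radius = radius

def area(circle):
    return pi * circle.radius ** 2
from shapes import * binds only Circle and area in the caller's namespace; pi and sqrt are still reachable as shapes.pi for anyone who goes looking, but they are not advertised.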
One technique I've seen used, including in the standard library, is to use import module as _module or from module import var as _var, i.e. assigning imported modules/variables to names starting with an underscore.
The effect is that other code, following the usual Python convention, treats those members as private. This applies even for code that doesn't look at __all__, such as IPython's autocomplete function.
An example from Python 3.3's random module:
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from collections.abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
Another technique is to perform imports in function scope, so that they become local variables:
"""Some module"""
# imports conventionally go here
def some_function(arg):
"Do something with arg."
import re # Regular expressions solve everything
...
The main rationale for doing this is that it is effectively lazy, delaying the importing of a module's dependencies until they are actually used. Suppose one function in the module depends on a particular huge library. Importing the library at the top of the file would mean that importing the module would load the entire library. This way, importing the module can be quick, and only client code that actually calls that function incurs the cost of loading the library. Further, if the dependency library is not available, client code that doesn't need the dependent feature can still import the module and call the other functions. The disadvantage is that using function-level imports obscures what your code's dependencies are.
Example from Python 3.3's os.py:
def get_exec_path(env=None):
    """[...]"""
    # Use a local import instead of a global import to limit the number of
    # modules loaded at startup: the os module is always loaded at startup by
    # Python. It may also avoid a bootstrap issue.
    import warnings
Import the module as a whole: import foreignmodule. What you claim as a drawback is actually a benefit. Namely, prepending the module name makes your code easier to maintain and makes it more self-documenting.
Six months from now when you look at a line of code like foo = Bar(baz) you may ask yourself which module Bar came from, but with foo = cleverlib.Bar it is much less of a mystery.
Of course, the fewer imports you have, the less of a problem this is. For small programs with few dependencies it really doesn't matter all that much.
When you find yourself asking questions like this, ask yourself what makes the code easier to understand, rather than what makes the code easier to write. You write it once but you read it a lot.
For this situation I would go with an all_imports.py file which has all the
from foreignmodule import .....
from anothermodule import .....
and then in your working modules
import all_imports as fgn # or whatever you want to prepend
...
something = fgn.Class1()
Another thing to be aware of
__all__ = ['func1', 'func2', 'this', 'that']
Now, any functions/classes/variables/etc. that are in your module but not in your module's __all__ will not show up in help(), and won't be imported by from mymodule import *. See Making python imports more structured? for more info.
I would compromise and just pick a short alias for the foreign module:
import foreignmodule as fm
It saves you completely from the pollution (probably the bigger issue) and at least reduces the prepending burden.
I know this is an old question. It may not be 'Pythonic', but the cleanest way I've discovered for exporting only certain module definitions is, much as you've found, to wrap the whole module in a function. But instead of returning the names you want to export, you can simply globalize them (global thus in essence becomes a kind of 'export' keyword):
def module():
    global MyPublicClass, ExportedModule

    import somemodule as ExportedModule
    import anothermodule as PrivateModule

    class MyPublicClass:
        def __init__(self):
            pass

    class MyPrivateClass:
        def __init__(self):
            pass

module()
del module
del module
I know it's not much different than your original conclusion, but frankly to me this seems to be the cleanest option. The other advantage is, you can group any number of modules written this way into a single file, and their private terms won't overlap:
def module():
    global A
    i, j, k = 1, 2, 3
    class A:
        pass

module()
del module

def module():
    global B
    i, j, k = 7, 8, 9  # doesn't overwrite previous declarations
    class B:
        pass

module()
del module
Though, keep in mind their public definitions will, of course, overlap.
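A quick check of the effect, run after the two module() calls above:
print(A, B)              # both classes ended up in the module's globals
print('i' in globals())  # False: the throwaway locals i, j, k never leaked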
