Control python imports to reduce size and overhead

Control python imports to reduce size and overhead - python

I have created a number of personal libraries to help with my daily coding. Best practice is to put imports at the beginning of python programs. But say I import my library, or even just a function or class from the library. All of the modules are imported (even if those modules are used in other unused classes or functions). I assume this increases the overhead of the program?
One example. I have a library called pytools which looks something like this
import difflib
def foo():
# uses difflib.SequenceMatcher
def bar():
# benign function ie
print "Hello!"
return True
class foobar:
def __init__():
print "New foobar"
def ret_true():
return True
The function foo uses difflib. Now say I am writing a new program that needs to use bar and foobar. I could either write
import pytools
...
item = pytools.foobar()
vals = pytools.bar()
or I could do
from pytools import foobar, bar
...
item = foobar()
vals = bar()
Does either choice reduce overhead or preclude the import of foo and its dependencies on difflib? What if the import to difflib was inside of the foo function?
The problem I am running into is when converting simple programs into executables that only use one or two classes or functions from my libraries, The executable ends up being 50 mb or so.
I have read through py2exe's optimizing size page and can optimize using some of its suggestions.
http://www.py2exe.org/index.cgi/OptimizingSize
I guess I am really asking for best practice here. Is there some way to preclude the import of libraries whose dependencies are in unused functions or classes? I've watched import statements execute using a debugger and it appears that python only "picks up" the line with "def somefunction" before moving on. Is the rest of the import not completed until the function/class is used? This would mean putting high volume imports at the beginning of a function or class could reduce overhead for the rest of the library.

The only way to effectively reduce your dependencies is to split your tool box into smaller modules, and to only import the modules you need.
Putting imports at the beginning of unused functions will prevent loading these modules at run-time, but is discouraged because it hides the dependecies. Moreover, your Python-to-executable converter will likely need to include these modules anyway, since Python's dynamic nature makes it impossible to statically determine which functions are actually called.

Related

How to override or replace an imported function call on a different module

I am leveraging an existing framework for a tool build activity based on python. Let me get into my issue straight :
Let's say the framework I am using is having a module named m1.py where I am having below function
def func_should_not_run(*args,**kwargs):
<doing something >
And I have a another module named m2.py where I am having below class :
from m1 import
class JustAClass:
def __init__(self,*args,*kwargs):
<All kind of initialisation..>
def run_something(self,*args,*kwargs):
<lots of code before>
func_should_not_run(*args,*kwargs)
<lots of code after>
Now my code module my_mod.py is having below class where I am creating an instance of above class from framework and calling m2.JustAClass.run_something inside another method as below
class JustAnotherClass:
def __init__(self,*args,*kwargs):
<All kind of initialisation..>
self.obj1=JustAClass(*some_args,**some_kwargs)
def run(self):
<some code before>
self.obj1.run_something(*some_other_args,**some_other_kwargs)
<some code after>
Now due to some implementation issue with m1.func_should_not_run which is getting called inside m2.JustAClass.run_something , I need to replace it with my own function func_should_run so that when func_should_not_run will be called inside m2.JustAClass.run_something, it should instead execute func_should_run from my module.
How can I achieve this?
Is there any way if I can override the import statement "from m1 import" on m2.py from my_mod.py?

This solution is risky for some aspects and could potentially fail given its side-effects, but worth to be mentioned in my opinion.
The idea is to replace (or better reload) the module that depends on the module you want to change, after some adjustment. I am going to start from the code and then I will show you the problems and limits of this approach:
from m2 import JustAClass
def func_should_run():
print('This is the function you want to call')
class JustAnotherClass:
def __init__(self,*args, **kwargs):
self.obj1 = JustAClass(*args, **kwargs)
def run(self):
self.obj1.run_something()
if __name__ == '__main__':
import importlib
importlib.import_module('m1').func_should_not_run = func_should_run
importlib.reload(importlib.import_module('m2'))
janc = JustAnotherClass()
janc.run()
Output:
This is the function you want to call
After importing importlib:
importlib.import_module('m1').func_should_not_run = func_should_run: I am importing module m1 and changing the func_should_not_run reference to func_should_run. This means that, for all the following calls to func_should_not_run, the code executed is the one of func_should_run. Obviously, this is also not valid for objects referencing the old func_should_not_run, like m2.JustAClass, so
importlib.reload(importlib.import_module('m2')): here I am reloading the module m2, that is going to use the new version of func_should_not_run because the module m1 is already loaded in the cache (i.e. sys.module) and therefore is not going to reload it (for this reason Transitive reloading can't occur, unless you explicitly do that).
From now on, every instance of JustAnotherClass correctly calls func_should_run
Should you use importlib.reload() for this?
Typically, reloading a module is useful when you have applied changes to a certain module and you do not want to restart the whole system to see those changes. In your case, unless you have clear in mind all the risks of this approach, you are kind of abusing the reload().
What are the main side-effects of this solution?
For start, reloading has its costs, especially if inside the module you have some initialization code that you do not want to re-execute. This means:
You are inevitably going to execute the module code twice (at least)
Be sure to comment your code explaining that every occurence of func_should_not_run is actually replace with func_should_run, but this is definitely not a good practice and not maintainable if used in many places.
To conclude, it is a simple as much as risky solution that can be adopted taking all the necessary precautions, with the awareness that it is just a hack and not a reasonable design decision.

Automatic replacement of indirect imports

I have a small module that defines a function:
# small_file.py
def func():
...
I have a larger module that does a wildcard import of that one:
# giant_file.py
from small_file import *
...
I have hundreds of other files that indirectly import the function via giant_file:
# file1.py
from giant_file import func
...
# file2.py
import giant_file.func
...
# file3.py
from giant_file import func, something_not_in_small_file
...
I would like to automatically change all these other files to import directly from small_file.py. This is to avoid the overhead of loading all of giant_file.py.
My question is this: is there a good way to automatically change all these files to import directly?
My default plan is to write a redbaron-based tool, but I'm hoping there might be a more lightweight approach utilizing an IDE's refactoring capabilities.

I solved this problem by writing a static analysis tool that analyzes my entire codebase and constructs an AST for each file. Using the AST's, the tool traverses imports and attributes each symbol read to an original assignment. It then rewrites all the imports in each file with direct-imports (eliminating unused imports along the way).
There are some potential pathological cases that the tool misses, such as dynamically constructing strings and then assigning to them via globals(). But it works for most "normal" code.
I used redbaron for this. If I could do it again, I would use LibCST instead, as redbaron has various bugs and unsupported python language elements, while LibCST appears to be more mature and actively maintained.

Should I always use the most pythonic way to import modules?

I am making a tiny framework for games with pygame, on which I wish to implement basic code to quickly start new projects. This will be a module that whoever uses should just create a folder with subfolders for sprite classes, maps, levels, etc.
My question is, how should my framework module load these client modules? I was considering to design it so the developer could just pass to the main object the names of the directories, like:
game = Game()
game.scenarios = 'scenarios'
Then game will append 'scenarios' to sys.path and use __import__(). I've tested and it works.
But then I researched a little more to see if there were already some autoloader in python, so I could avoid to rewrite it, and I found this question Python modules autoloader?
Basically, it is not recommended to use a autoloader in python, since "explicit is better than implicit" and "Readability counts".
That way, I think, I should compel the user of my module to manually import each of his/her modules, and pass these to the game instance, like:
import framework.Game
import scenarios
#many other imports
game = Game()
game.scenarios = scenarios
#so many other game.whatever = whatever
But this doesn't looks good to me, not so confortable. See, I am used to work with php, and I love the way it works with it's autoloader.
So, the first exemple has some problability to crash or be some trouble, or is it just not 'pythonic'?
note: this is NOT an web application

I wouldn't consider letting a library import things from my current path or module good style. Instead I would only expect a library to import from two places:
Absolute imports from the global modules space, like things you have installed using pip. If a library does this, this library must also be found in its install_requires=[] list
Relative imports from inside itself. Nowadays these are explicitly imported from .:
from . import bla
from .bla import blubb
This means that passing an object or module local to my current scope must always happen explicitly:
from . import scenarios
import framework
scenarios.sprites # attribute exists
game = framework.Game(scenarios=scenarios)
This allows you to do things like mock the scenarios module:
import types
import framework
# a SimpleNamespace looks like a module, as they both have attributes
scenarios = types.SimpleNamespace(sprites='a', textures='b')
scenarios.sprites # attribute exists
game = framework.Game(scenarios=scenarios)
Also you can implement a framework.utils.Scenario() class that implements a certain interface to provide sprites, maps etc. The reason being: Sprites and Maps are usually saved in separate files: What you absolutely do not want to do is look at the scenarios's __file__ attribute and start guessing around in its files. Instead implement a method that provides a unified interface to that.
class Scenario():
def __init__(self):
...
def sprites(self):
# optionally load files from some default location
# If no such things as a default location exists, throw a NotImplemented error
...
And your user-specific scenarios will derive from it and optionally overload the loading methods
import framework.utils
class Scenario(framework.utils.Scenario):
def __init__(self):
...
def sprites(self):
# this method *must* load files from location
# accessing __file__ is OK here
...
What you can also do is have framework ship its own framework.contrib.scenarios module that is used in case no scenarios= keyword arg was used (i.e. for a square default map and some colorful default textures)
from . import contrib
class Game()
def __init__(self, ..., scenarios=None, ...):
if scenarios is None:
scenarios = contrib.scenarios
self.scenarios = scenarios

Using future style imports for module specific features in Python

The Python future statement from __future__ import feature provides a nice way to ease the transition to new language features. Is it is possible to implement a similar feature for Python libraries: from myproject.__future__ import feature?
It's straightforward to set a module wide constants on an import statement. What isn't obvious to me is how you could ensure these constants don't propagate to code executed in imported modules -- they should also require a future import to enable the new feature.
This came up recently in a discussion of possible indexing changes in NumPy. I don't expect it will actually be used in NumPy, but I can see it being useful for other projects.
As a concrete example, suppose that we do want to change how indexing works in some future version of NumPy. This would be a backwards incompatible change, so we decide we to use a future statement to ease the transition. A script using this new feature looks something like this:
import numpy as np
from numpy.__future__ import orthogonal_indexing
x = np.random.randn(5, 5)
print(x[[0, 1], [0, 1]]) # should use the "orthogonal indexing" feature
# prints a 2x2 array of random numbers
# we also want to use a legacy project that uses indexing, but
# hasn't been updated to the use the "orthogonal indexing" feature
from legacy_project import do_something
do_something(x) # should *not* use "orthogonal indexing"
If this isn't possible, what's the closest we can get for enabling local options? For example, is to possible to write something like:
from numpy import future
future.enable_orthogonal_indexing()
Using something like a context manager would be fine, but the problem is that we don't want to propagate options to nested scopes:
with numpy.future.enable_orthogonal_indexing():
print(x[[0, 1], [0, 1]]) # should use the "orthogonal indexing" feature
do_something(x) # should *not* use "orthogonal indexing" inside do_something

The way Python itself does this is pretty simple:
In the importer, when you try to import from a .py file, the code first scans the module for future statements.
Note that the only things allowed before a future statement are strings, comments, blank lines, and other future statements, which means it doesn't need to fully parse the code to do this. That's important, because future statements can change the way the code is parsed (in fact, that's the whole point of having them…); strings, comments, and blank lines can be handled by the lexer step, and future statements can be parsed with a very simple special-purpose parser.
Then, if any future statements are found, Python sets a corresponding flag bit, then re-seeks to the top of the file and calls compile with those flags. For example, for from __future__ import unicode_literals, it does flags |= __future__.unicode_literals.compiler_flag, which changes flags from 0 to 0x20000.
In this "real compile" step, the future statements are treated as normal imports, and you will end up with a __future__._Feature value in the variable unicode_literals in the module's globals.
Now, you can't quite do the same thing, because you're not going to reimplement or wrap the compiler. But what you can do is use your future-like statements to signal an AST transform step. Something like this:
flags = []
for line in f:
flag = parse_future(line)
if flag is None:
break
flags.append(flag)
f.seek(0)
contents = f.read()
tree = ast.parse(contents, f.name)
for flag in flags:
tree = transformers[flag](tree)
code = compile(tree, f.name)
Of course you have to write that parse_future function to return 0 for a blank line, comment, or string, a flag for a recognized future import (which you can look up dynamically if you want), or None for anything else. And you have to write the AST transformers for each flag. But they can be pretty simple—e.g., you can transform Subscript nodes into different Subscript nodes, or even into Call nodes that call different functions based on the form of the index.
To hook this into the import system, see PEP 302. Note that this gets simpler in Python 3.3, and simpler again in Python 3.4, so if you can require one of those versions, instead read the import system docs for your minimum version.
For a great example of import hooks and AST transformers being used in real life, see MacroPy. (Note that it's using the old 2.3-style import hook mechanism; again, your own code can be simpler if you can use 3.3+ or 3.4+. And of course your code isn't generating the transforms dynamically, which is the most complicated part of MacroPy…)

The __future__ in Python is both a module and also not. The Python __future__ is actually not imported from anywhere - it is a construct used by the Python bytecode compiler, deliberately chosen so that no new syntax needs to be created. There is also a __future__.py in the library directory; it can be imported as such: import __future__; and then you can for example access the __future__.print_function to find out which Python version makes the feature optionally available and in which version the feature is on by default.
It is possible to make a __future__ module that knows what is being imported. Here is an example of myproject/__future__.py that can intercept feature imports on per module basis:
import sys
import inspect
class FutureMagic(object):
inspect = inspect
#property
def more_magic(self):
importing_frame = self.inspect.getouterframes(
self.inspect.currentframe())[1][0]
module = importing_frame.f_globals['__name__']
print("more magic imported in %s" % module)
sys.modules[__name__] = FutureMagic()
On load time the module is replaced with a FutureMagic() instance. Whenever more_magic is imported from myproject.FutureMagic, the more_magic property method will be called, and it will print out the name of the module that imported the feature:
>>> from myproject.__future__ import more_magic
more magic imported in __main__
Now, you could have a bookkeeping of the modules that have imported this feature. Doing import myproject.__future__; myproject.__future__.more_magic would trigger the same machinery, but you could also ensure that the more_magic import be at the beginning of the file - its global variables at that point shouldn't contain anything else except values returned from this fake module; otherwise the value is being accessed for inspection only.
However the real question is: how could you use this - to find out from which module the function is being called is quite expensive, and would limit the usefulness of this feature.
Thus a possibly more fruitful approach could be to use import hooks to do source translation on abstract syntax trees on modules that do from mypackage.__future__ import more_magic, possibly changing all object[index] into __newgetitem__(operand, index).

No, you can't. The real __future__ import is special in that its effects are local to the individual file where it occurs. But ordinary imports are global: once one module does import blah, blah is executed and is available globally; other modules that later do import blah just retrieve the already-imported module. This means that if from numpy.__future__ changes something in numpy, everything that does import numpy will see the change.
As an aside, I don't think this is what that mailing list message is suggesting. I read it as suggesting an effect that is global, equivalent to setting a flag like numpy.useNewIndexing = True. This would mean that you should only set that flag at the top level of your application if you know that all parts of your application will work with that.

No, there is no reasonable way to do this. Let's go through the requirements.
First, you need to figure out which modules have your custom future statement enabled. Standard imports aren't up to this, but you could require them to e.g. call some enabling function and pass __name__ as a parameter. This is somewhat ugly:
from numpy.future import new_indexing
new_indexing(__name__)
This falls apart in the face of importlib.reload(), but meh.
Next, you need to figure out whether your caller is running in one of these modules. You'd start by pulling out the stack via inspect.stack() (which won't work under all Python implementations, misses C extension modules, etc.) and then goof around with inspect.getmodule() and the like.
Frankly, this is just a Bad Idea.

If the "feature" that you want to control can be boiled down to changing a name, then this is easy to do, like
from module.new_way import something
vs
from module.old_way import something
The feature you suggested is not, of course, but I would argue that this is the only Pythonic way of having different behavior in different scopes (and I do think you mean scope, not module, e.g., what if someone does an import inside a function definition), since scoping names is controlled and well supported by the interpreter itself.

How should I perform imports in a python module without polluting its namespace?

I am developing a Python package for dealing with some scientific data. There are multiple frequently-used classes and functions from other modules and packages, including numpy, that I need in virtually every function defined in any module of the package.
What would be the Pythonic way to deal with them? I have considered multiple variants, but every has its own drawbacks.
Import the classes at module-level with from foreignmodule import Class1, Class2, function1, function2
Then the imported functions and classes are easily accessible from every function. On the other hand, they pollute the module namespace making dir(package.module) and help(package.module) cluttered with imported functions
Import the classes at function-level with from foreignmodule import Class1, Class2, function1, function2
The functions and classes are easily accessible and do not pollute the module, but imports from up to a dozen modules in every function look as a lot of duplicate code.
Import the modules at module-level with import foreignmodule
Not too much pollution is compensated by the need to prepend the module name to every function or class call.
Use some artificial workaround like using a function body for all these manipulations and returning only the objects to be exported... like this
def _export():
from foreignmodule import Class1, Class2, function1, function2
def myfunc(x):
return function1(x, function2(x))
return myfunc
myfunc = _export()
del _export
This manages to solve both problems, module namespace pollution and ease of use for functions... but it seems to be not Pythonic at all.
So what solution is the most Pythonic? Is there another good solution I overlooked?

Go ahead and do your usual from W import X, Y, Z and then use the __all__ special symbol to define what actual symbols you intend people to import from your module:
__all__ = ('MyClass1', 'MyClass2', 'myvar1', …)
This defines the symbols that will be imported into a user's module if they import * from your module.
In general, Python programmers should not be using dir() to figure out how to use your module, and if they are doing so it might indicate a problem somewhere else. They should be reading your documentation or typing help(yourmodule) to figure out how to use your library. Or they could browse the source code yourself, in which case (a) the difference between things you import and things you define is quite clear, and (b) they will see the __all__ declaration and know which toys they should be playing with.
If you try to support dir() in a situation like this for a task for which it was not designed, you will have to place annoying limitations on your own code, as I hope is clear from the other answers here. My advice: don't do it! Take a look at the Standard Library for guidance: it does from … import … whenever code clarity and conciseness require it, and provides (1) informative docstrings, (2) full documentation, and (3) readable code, so that no one ever has to run dir() on a module and try to tell the imports apart from the stuff actually defined in the module.

One technique I've seen used, including in the standard library, is to use import module as _module or from module import var as _var, i.e. assigning imported modules/variables to names starting with an underscore.
The effect is that other code, following the usual Python convention, treats those members as private. This applies even for code that doesn't look at __all__, such as IPython's autocomplete function.
An example from Python 3.3's random module:
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from collections.abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
Another technique is to perform imports in function scope, so that they become local variables:
"""Some module"""
# imports conventionally go here
def some_function(arg):
"Do something with arg."
import re # Regular expressions solve everything
...
The main rationale for doing this is that it is effectively lazy, delaying the importing of a module's dependencies until they are actually used. Suppose one function in the module depends on a particular huge library. Importing the library at the top of the file would mean that importing the module would load the entire library. This way, importing the module can be quick, and only client code that actually calls that function incurs the cost of loading the library. Further, if the dependency library is not available, client code that doesn't need the dependent feature can still import the module and call the other functions. The disadvantage is that using function-level imports obscures what your code's dependencies are.
Example from Python 3.3's os.py:
def get_exec_path(env=None):
"""[...]"""
# Use a local import instead of a global import to limit the number of
# modules loaded at startup: the os module is always loaded at startup by
# Python. It may also avoid a bootstrap issue.
import warnings

Import the module as a whole: import foreignmodule. What you claim as a drawback is actually a benefit. Namely, prepending the module name makes your code easier to maintain and makes it more self-documenting.
Six months from now when you look at a line of code like foo = Bar(baz) you may ask yourself which module Bar came from, but with foo = cleverlib.Bar it is much less of a mystery.
Of course, the fewer imports you have, the less of a problem this is. For small programs with few dependencies it really doesn't matter all that much.
When you find yourself asking questions like this, ask yourself what makes the code easier to understand, rather than what makes the code easier to write. You write it once but you read it a lot.

For this situation I would go with an all_imports.py file which had all the
from foreignmodule import .....
from another module import .....
and then in your working modules
import all_imports as fgn # or whatever you want to prepend
...
something = fgn.Class1()
Another thing to be aware of
__all__ = ['func1', 'func2', 'this', 'that']
Now, any functions/classes/variables/etc that are in your module, but not in your modules's __all__ will not show up in help(), and won't be imported by from mymodule import * See Making python imports more structured? for more info.

I would compromise and just pick a short alias for the foreign module:
import foreignmodule as fm
It saves you completely from the pollution (probably the bigger issue) and at least reduces the prepending burden.

I know this is an old question. It may not be 'Pythonic', but the cleanest way I've discovered for exporting only certain module definitions is, really as you've found, to globally wrap the module in a function. But instead of returning them, to export names, you can simply globalize them (global thus in essence becomes a kind of 'export' keyword):
def module():
global MyPublicClass,ExportedModule
import somemodule as ExportedModule
import anothermodule as PrivateModule
class MyPublicClass:
def __init__(self):
pass
class MyPrivateClass:
def __init__(self):
pass
module()
del module
I know it's not much different than your original conclusion, but frankly to me this seems to be the cleanest option. The other advantage is, you can group any number of modules written this way into a single file, and their private terms won't overlap:
def module():
global A
i,j,k = 1,2,3
class A:
pass
module()
del module
def module():
global B
i,j,k = 7,8,9 # doesn't overwrite previous declarations
class B:
pass
module()
del module
Though, keep in mind their public definitions will, of course, overlap.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Control python imports to reduce size and overhead - python

Related

How to override or replace an imported function call on a different module

Automatic replacement of indirect imports

Should I always use the most pythonic way to import modules?

Using future style imports for module specific features in Python

How should I perform imports in a python module without polluting its namespace?

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Control python imports to reduce size and overhead - python

Related

How to override or replace an imported function call on a different module

Automatic replacement of indirect imports

Should I always use the most pythonic way to import modules?

Using __future__ style imports for module specific features in Python

How should I perform imports in a python module without polluting its namespace?

Categories

Resources

Using future style imports for module specific features in Python