Python - optimize by not importing at module level? - python

In a framework such as Django, I'd imagine that if a user lands on a page (running a view function called "some_page"), and you have 8 imports at the top of module that are irrelevant to that view, you're wasting cycles on those imports. My questions are:
Is it a large enough amount of resources to make an impact on a high-traffic website?
Is it such a bad practice to import inside of a function for this purpose that it should be avoided at said impact?
Note: This could be considered premature optimization, but I'm not interested in that argument. Let's assume, for the sake of practical theory, that this is a completed site with loads of traffic, needing to be optimized in every way possible, and the application code, as well as DB have been fully optimized by 50 PhD database admins and developers, and these imports are the only thing left.

No, don't do this. In a normal python execution environment on the web (mod_wsgi, gunicorn, etc.) when your process starts those imports will be executed, and then all subsequent requests will not re-execute the script. If you put the imports inside the functions they'll have to be processed every time the function is called.

Yes, it is a bad practice to import at the function level. By using smarter imports at the top of the module, you create a one time, small cost. However, if you place an import in a function you will suffer the cost of the import each time that function is run. So, rather than import in the function, just import at the top of the module.
A few things you can do to clean up and improve your imports:
Don't use wild imports e.g. from x import *
Where possible, just use a normal import e.g. import x
Try to split your code up into smaller modules that can be called separately, so that fewer imports are made
Also, placing imports at the top of the module is a matter of style. There's a reason why PEP 8 says that modules need to be imported at the top. It's far more readable and maintainable that way.
Finally, some imports at function level will cause compatibility issues in the future, as from x import * is not valid Python 3.x at function level.

1) The answer is no. Django/Python is not like PHP. Your whole module will not be reinterpreted with each pageview like happens with PHP includes. The module will be in memory and each page view will make a simple function call to your view.
2) Yes, it will be a counter-optimization to make imports at the view level.

No. Same reason as other answers.
Yes. Same reason as other answes.
BTW, you can also do import lazily.
For example, Importing toolkit can "import" a module in top level code, but the module is not actually loaded until one of the attributes is accessed.

Sometimes following boiler-plate makes sense:
foo = None
def foorify():
global foo
if not foo: from xxx import foo
foo.bar()
This makes sense when foorification is conditional on something that rarely changes, e.g. one server foorifies while another never does or if you don't want or cannot safely import foo during application startup or most tests.

Related

Circular imports hell

Python is extremely elegant language. Well, except... except imports. I still can't get it work the way it seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many question here regarding this issue, but none seems to give satisfactory answer. From what I can read in all the answers:
Reorganize your modules, you are doing it wrong! But I do not know
how better to organize my modules even in such a simple case as I
presented.
Try just import ... rather than from ... import ...
(personally I hate to write and potentially refactor all the full
name qualifiers; I love to see what exactly I am importing into
module from the outside world). Would that help? I am not sure,
still there are circular imports.
Do hacks like import something in the inner scope of a function body just one line before you use something from other module.
I am still hoping there is solution number 4) which would be Pythonic in the sense of being functional and elegant and simple and working. Or is there not?
Note: I am primarily a C++ programmer, the example above is so much easily solved by including corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body, it's an absolutely valid pattern:
def some_function():
import logging
do_some_logging()
Usually ImportErrors are only raised because of the way import() evaluates top level statements of the entire file when called.
In case you do not have a logic circular dependency...
, nothing is impossible in python...
There is a way around it if you positively want your imports on top:
From David Beazleys excellent talk Modules and Packages: Live and Let Die! - PyCon 2015, 1:54:00, here is a way to deal with circular imports in python:
try:
from images.serializers import SimplifiedImageSerializer
except ImportError:
import sys
SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and if ImportError is raised (due to a circular import error or the it not existing) it will pull it from the importcache.
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils to your main module, it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta the code from that module is being executed and you don't need to import anything to your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by allowing the innermost import to complete without finishing. Thus, the utils import completes without executing the rest of the file, and your import statement fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
Pythonistas frown upon importing from a function. Pythonistas usually frown upon global variables. Yet, I saw both and don't think the projects that used them were any worse than others done by some strict Pythhonistas. The feature does exist, not going into a long argument over its utility.
There's an alternative to the problem of importing from a function: when you import from the top of a file (or the bottom, really), this import will take some time (some small time, but some time), but Python will cache the entire file and if another file needs the same import, Python can retrieve the module quickly without importing. Whereas, if you import from a function, things get complicated: Python will have to process the import line each time you call the function, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently. Okay, this uses imports inside function bodies AND global variables. Wow!
_MODULEA = None
def util1():
if _MODULEA is None:
from mymodule import modulea as _MODULEA
obj = _MODULEA.ClassYouWant
return obj
I saw this strategy adopted with a project using a flat API. Whether you like it or not (and I'm not sure about that myself), it works and is fast, because the import line is executed only once (when the function first executes). Still, I would recommend restructuring: problems with circular imports show a problem in structure, usually, and this is always worth fixing. I do agree, though, it would be nice if Python provided more useful errors when this kind of situation happens.

python workaround for circular import

Ok so it is like this.
I'd rather not give away my code but if you really need it I will. I have two modules that need a bit from each other. the modules are called webhandler and datahandler.
In webhandler I have a line:
import datahandler
and in datahandler I have another line:
import webhandler
Now I know this is terrible code and a circular import like this causes the code to run twice (which is what im trying to avoid).
However the datahandler module needs to access several functions from the webhandler module, and the webhandler module needs access to several variables that are generated in the datahandler module. I dont see any workaround other than moving functions to different modules but that would ruin the organisation of my program and make no logical sense with the module naming.
Any help?
Circular dependencies are a form of code smell. If you have two modules that depend on each other, then that’s a very bad sign, and you should restructure your code.
There are a few different ways to do this; which one is best depends on what you are doing, and what parts of each module are actually used by another.
A very simple solution would be to just merge both modules, so you only have a single module that only depends on itself, or rather on its own contents. This is simple, but since you had separated modules before, it’s likely that you are introducing new problems that way because you no longer have a separation of concerns.
Another solution would be to make sure that the dependencies are actually required. If there are only a few parts of a module that depend on the other, maybe you could move those bits around in a way that the circular dependency is no longer required, or utilize the way imports work to make the circular dependencies no longer a problem.
The better solution would probably be to move the dependencies into a separate new module. If naming is really the hardest problem about that, then you’re probably doing it right. It might “ruin the organisation of [your] program” but since you have circular dependencies, there is something inherently wrong with your setup anyway.
What others have said about not doing circular imports is the best solution, but if you end up absolutely needing them (possibly for backwards compatibility or clarity of code), it's usually within just one method or function of one of the modules. Thus you can safely do this:
# modA.py
import modB
# modB.py
def functionDependingOnA():
import modA
...
There's a slight overhead to doing the import each time the function is called, but it is rather low unless it's called all the time. (about 400ns in my testing).
You could also do like this to avoid even that lookup:
# modA -- same as above.
# modB.py
_imports = {}
def _importA():
import modA
_imports['modA'] = modA
return modA
def functionDependingOnA():
modA = _imports.get('modA') or _importA()
This version only added 40ns of time on the second and subsequent calls, or about the same amount of time as an empty local function call.
SqlAlchemy uses the Dependency Injection pattern, where the required module is passed to a function by a decorator:
#util.dependencies("sqlalchemy.orm.util")
def identity_key(cls, orm_util, *args, **kwargs):
return orm_util.identity_key(*args, **kwargs)
This approach is basically the same as doing an import inside a function, but has slightly better performance.
the webhandler module needs access to several variables that are generated in the datahandler module
It might make sense to push any "generated" data to a third location. So datahandler functions call config.setvar( name, value ) when appropriate and webhandler functions call config.getvar( name ) when they need to. config would be a third sub-module, containing simple setvar and getvar functions that you write (wrappers around setting/getting elements of a global dictionary would be the simplest approach).
Then the datahandler code would import webhandler, config but webhandler would only need to import config.
I agree with poke however, that the need for such a question betrays the fact that you probably haven't yet got the design finalized as neatly and logically as you thought. If it were me, I would re-think the way modules are divided up.

How are imports in Python handled? [duplicate]

PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.

Does python import all the listed libraries?

I'm just wondering, I often have really long python files and imports tend to stack quite quickly.
PEP8 says that the imports should always be written at the beginning of the file.
Do all the imported libraries get imported when calling a function coded in the file? Or do only the necessary libraries get called?
Does it make sense to worry about this? Is there no reason to import libraries within the functions or classes that need them?
Every time Python hits an import statement, it checks to see if that module has already been imported, and if not, imports it. So the imports at the top of your file will happen as soon as your file is run or imported by another module.
There is some overhead to this, so it's generally best to keep imports at the top of your file so that cost gets taken care of up front.
The best place for imports is at the top of your file. That documents the dependencies in one place and makes errors from their absence appear earlier. The import itself actually occurs at the time of the import statement, but this seldom matters much.
It is not typical that you have anything to gain by not importing a library until you are in a function or method that needs it. (There is never anything to gain by doing so inside the body of a class.) It is rare that you want optional dependencies and even rarer that this is the right technique to get them, though. Perhaps you can share a compelling use case?
Does it make sense to worry about
this?
No
There no reason to import libraries within the functions or classes that need them.
It's just slow because the import statement has to check to see if it's been imported once, and realize that it has been imported.
If you put this in a function that's called frequently, you can waste some time with all the import checking.
Imports happen when the module that contains the imports gets executed or imported, not when the functions are called.
Ordinarily, I wouldn't worry about it. If you are encountering slowdowns, you might profile to see if your problem is related to this. If it is, you can check to see if your module can divided up into smaller modules.
But if all the files are getting used by the same program, you'll just end up importing everything anyway.
If a function inside a module is the only one to import a given other module (say you have a function sending tweets, only if some configuration option is on), then it makes sense to import that specific module in the function.
Unless I see some profiling data proving otherwise, my guess is that the overhead of an import statement in a function is completely negligible.

Should import statements always be at the top of a module?

PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.

Categories

Resources