Performance between "from package import *" and "import package" - python

Is there any performance difference between from package import * and import package?

No, the difference is not a question of performance. In both cases, the entire module must be parsed, and any module-level code will be executed. The only difference is in namespaces: in the first, all the names in the imported module will become names in the current module; in the second, only the package name is defined in the current module.
That said, there's very rarely a good reason to use from foo import *. Either import the module, or import specific names from it.

Related

“from module import *” VS “import module”

from module import * VS import module
What I know
I know the difference between the 2, the difference is when you are using from module import *, you can just refer the classes, functions etc. in the module just like they are defined in the file they are imported in itself.
But when you are just usingimport module, you have to use module. before the name of the object to refer it.
The problem
So what I don’t know is why is it sometimes considered bad practice to use from module import * instead of import module?
PEP 8 states that
Wildcard imports (from <module> import *) should be avoided, as they
make it unclear which names are present in the namespace, confusing
both readers and many automated tools. There is one defensible use
case for a wildcard import, which is to republish an internal
interface as part of a public API (for example, overwriting a pure
Python implementation of an interface with the definitions from an
optional accelerator module and exactly which definitions will be
overwritten isn't known in advance).

How to find if given python import statement is internal or external?

I have an excel sheet having thousands of import statements.
eg.,
from XYZ.loghelper import LogHelper,
import os,
from models import CustomUser, VerticalApp,
from django.http import HttpResponse,
Some of them are built-in and some are user defined.
Now I have to find whether they are user defined or built-in.
How can I do that?
I assume that by builtin you mean "part of Python's stdlib" (python's "builtin" features, being "builtin", don't need to be imported at all). The definition of "user defined" is much more vague - is a 3rd part package like Django "builtin" or "user-defined" ?
But anyway: the short answer is that technically you CAN NOT tell just from the import statement. Modules are looked up in sys.path and the first matching module will be selected, so if you have a module named "os.py" in a local directory that comes before your Python installation's stdlib directory in sys.path then "import os" will indeed import your own "os.py" module instead of the stdlib's one. IOW, you need to use the exact same environment, import the module, and check the module's __file__ attribute to find out where it's been imported from.
Now most python devs try to avoid shadowing stdlib's module names (for obvious reasons) so you can also just build a list (technically you want a set for better perfs but anyway) of Python's stdlibs modules names, parse your imports statements, and check if the name of the module to be imported belongs to the set of stdlib's names. This should yield correct results in most cases, but it's not garanteed to be 100% accurate.

Is there a module cache in python? [duplicate]

If a large module is loaded by some submodule of your code, is there any benefit to referencing the module from that namespace instead of importing it again?
For example:
I have a module MyLib, which makes extensive use of ReallyBigLib. If I have code that imports MyLib, should I dig the module out like so
import MyLib
ReallyBigLib = MyLib.SomeModule.ReallyBigLib
or just
import MyLib
import ReallyBigLib
Python modules could be considered as singletons... no matter how many times you import them they get initialized only once, so it's better to do:
import MyLib
import ReallyBigLib
Relevant documentation on the import statement:
https://docs.python.org/2/reference/simple_stmts.html#the-import-statement
Once the name of the module is known (unless otherwise specified, the term “module” will refer to both packages and modules), searching for the module or package can begin. The first place checked is sys.modules, the cache of all modules that have been imported previously. If the module is found there then it is used in step (2) of import.
The imported modules are cached in sys.modules:
This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.
As others have pointed out, Python maintains an internal list of all modules that have been imported. When you import a module for the first time, the module (a script) is executed in its own namespace until the end, the internal list is updated, and execution of continues after the import statement.
Try this code:
# module/file a.py
print "Hello from a.py!"
import b
# module/file b.py
print "Hello from b.py!"
import a
There is no loop: there is only a cache lookup.
>>> import b
Hello from b.py!
Hello from a.py!
>>> import a
>>>
One of the beauties of Python is how everything devolves to executing a script in a namespace.
It makes no substantial difference. If the big module has already been loaded, the second import in your second example does nothing except adding 'ReallyBigLib' to the current namespace.
WARNING: Python does not guarantee that module will not be initialized twice.
I've stubled upon such issue. See discussion:
http://code.djangoproject.com/ticket/8193
The internal registry of imported modules is the sys.modules dictionary, which maps module names to module objects. You can look there to see all the modules that are currently imported.
You can also pull some useful tricks (if you need to) by monkeying with sys.modules - for example adding your own objects as pseudo-modules which can be imported by other modules.
It is the same performancewise. There is no JIT compiler in Python yet.

How should I perform imports in a python module without polluting its namespace?

I am developing a Python package for dealing with some scientific data. There are multiple frequently-used classes and functions from other modules and packages, including numpy, that I need in virtually every function defined in any module of the package.
What would be the Pythonic way to deal with them? I have considered multiple variants, but every has its own drawbacks.
Import the classes at module-level with from foreignmodule import Class1, Class2, function1, function2
Then the imported functions and classes are easily accessible from every function. On the other hand, they pollute the module namespace making dir(package.module) and help(package.module) cluttered with imported functions
Import the classes at function-level with from foreignmodule import Class1, Class2, function1, function2
The functions and classes are easily accessible and do not pollute the module, but imports from up to a dozen modules in every function look as a lot of duplicate code.
Import the modules at module-level with import foreignmodule
Not too much pollution is compensated by the need to prepend the module name to every function or class call.
Use some artificial workaround like using a function body for all these manipulations and returning only the objects to be exported... like this
def _export():
from foreignmodule import Class1, Class2, function1, function2
def myfunc(x):
return function1(x, function2(x))
return myfunc
myfunc = _export()
del _export
This manages to solve both problems, module namespace pollution and ease of use for functions... but it seems to be not Pythonic at all.
So what solution is the most Pythonic? Is there another good solution I overlooked?
Go ahead and do your usual from W import X, Y, Z and then use the __all__ special symbol to define what actual symbols you intend people to import from your module:
__all__ = ('MyClass1', 'MyClass2', 'myvar1', …)
This defines the symbols that will be imported into a user's module if they import * from your module.
In general, Python programmers should not be using dir() to figure out how to use your module, and if they are doing so it might indicate a problem somewhere else. They should be reading your documentation or typing help(yourmodule) to figure out how to use your library. Or they could browse the source code yourself, in which case (a) the difference between things you import and things you define is quite clear, and (b) they will see the __all__ declaration and know which toys they should be playing with.
If you try to support dir() in a situation like this for a task for which it was not designed, you will have to place annoying limitations on your own code, as I hope is clear from the other answers here. My advice: don't do it! Take a look at the Standard Library for guidance: it does from … import … whenever code clarity and conciseness require it, and provides (1) informative docstrings, (2) full documentation, and (3) readable code, so that no one ever has to run dir() on a module and try to tell the imports apart from the stuff actually defined in the module.
One technique I've seen used, including in the standard library, is to use import module as _module or from module import var as _var, i.e. assigning imported modules/variables to names starting with an underscore.
The effect is that other code, following the usual Python convention, treats those members as private. This applies even for code that doesn't look at __all__, such as IPython's autocomplete function.
An example from Python 3.3's random module:
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from collections.abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
Another technique is to perform imports in function scope, so that they become local variables:
"""Some module"""
# imports conventionally go here
def some_function(arg):
"Do something with arg."
import re # Regular expressions solve everything
...
The main rationale for doing this is that it is effectively lazy, delaying the importing of a module's dependencies until they are actually used. Suppose one function in the module depends on a particular huge library. Importing the library at the top of the file would mean that importing the module would load the entire library. This way, importing the module can be quick, and only client code that actually calls that function incurs the cost of loading the library. Further, if the dependency library is not available, client code that doesn't need the dependent feature can still import the module and call the other functions. The disadvantage is that using function-level imports obscures what your code's dependencies are.
Example from Python 3.3's os.py:
def get_exec_path(env=None):
"""[...]"""
# Use a local import instead of a global import to limit the number of
# modules loaded at startup: the os module is always loaded at startup by
# Python. It may also avoid a bootstrap issue.
import warnings
Import the module as a whole: import foreignmodule. What you claim as a drawback is actually a benefit. Namely, prepending the module name makes your code easier to maintain and makes it more self-documenting.
Six months from now when you look at a line of code like foo = Bar(baz) you may ask yourself which module Bar came from, but with foo = cleverlib.Bar it is much less of a mystery.
Of course, the fewer imports you have, the less of a problem this is. For small programs with few dependencies it really doesn't matter all that much.
When you find yourself asking questions like this, ask yourself what makes the code easier to understand, rather than what makes the code easier to write. You write it once but you read it a lot.
For this situation I would go with an all_imports.py file which had all the
from foreignmodule import .....
from another module import .....
and then in your working modules
import all_imports as fgn # or whatever you want to prepend
...
something = fgn.Class1()
Another thing to be aware of
__all__ = ['func1', 'func2', 'this', 'that']
Now, any functions/classes/variables/etc that are in your module, but not in your modules's __all__ will not show up in help(), and won't be imported by from mymodule import * See Making python imports more structured? for more info.
I would compromise and just pick a short alias for the foreign module:
import foreignmodule as fm
It saves you completely from the pollution (probably the bigger issue) and at least reduces the prepending burden.
I know this is an old question. It may not be 'Pythonic', but the cleanest way I've discovered for exporting only certain module definitions is, really as you've found, to globally wrap the module in a function. But instead of returning them, to export names, you can simply globalize them (global thus in essence becomes a kind of 'export' keyword):
def module():
global MyPublicClass,ExportedModule
import somemodule as ExportedModule
import anothermodule as PrivateModule
class MyPublicClass:
def __init__(self):
pass
class MyPrivateClass:
def __init__(self):
pass
module()
del module
I know it's not much different than your original conclusion, but frankly to me this seems to be the cleanest option. The other advantage is, you can group any number of modules written this way into a single file, and their private terms won't overlap:
def module():
global A
i,j,k = 1,2,3
class A:
pass
module()
del module
def module():
global B
i,j,k = 7,8,9 # doesn't overwrite previous declarations
class B:
pass
module()
del module
Though, keep in mind their public definitions will, of course, overlap.

Why is "import *" bad?

It is recommended to not to use import * in Python.
Can anyone please share the reason for that, so that I can avoid it doing next time?
Because it puts a lot of stuff into your namespace (might shadow some other object from previous import and you won't know about it).
Because you don't know exactly what is imported and can't easily find from which module a certain thing was imported (readability).
Because you can't use cool tools like pyflakes to statically detect errors in your code.
According to the Zen of Python:
Explicit is better than implicit.
... can't argue with that, surely?
You don't pass **locals() to functions, do you?
Since Python lacks an "include" statement, and the self parameter is explicit, and scoping rules are quite simple, it's usually very easy to point a finger at a variable and tell where that object comes from -- without reading other modules and without any kind of IDE (which are limited in the way of introspection anyway, by the fact the language is very dynamic).
The import * breaks all that.
Also, it has a concrete possibility of hiding bugs.
import os, sys, foo, sqlalchemy, mystuff
from bar import *
Now, if the bar module has any of the "os", "mystuff", etc... attributes, they will override the explicitly imported ones, and possibly point to very different things. Defining __all__ in bar is often wise -- this states what will implicitly be imported - but still it's hard to trace where objects come from, without reading and parsing the bar module and following its imports. A network of import * is the first thing I fix when I take ownership of a project.
Don't misunderstand me: if the import * were missing, I would cry to have it. But it has to be used carefully. A good use case is to provide a facade interface over another module.
Likewise, the use of conditional import statements, or imports inside function/class namespaces, requires a bit of discipline.
I think in medium-to-big projects, or small ones with several contributors, a minimum of hygiene is needed in terms of statical analysis -- running at least pyflakes or even better a properly configured pylint -- to catch several kind of bugs before they happen.
Of course since this is python -- feel free to break rules, and to explore -- but be wary of projects that could grow tenfold, if the source code is missing discipline it will be a problem.
That is because you are polluting the namespace. You will import all the functions and classes in your own namespace, which may clash with the functions you define yourself.
Furthermore, I think using a qualified name is more clear for the maintenance task; you see on the code line itself where a function comes from, so you can check out the docs much more easily.
In module foo:
def myFunc():
print 1
In your code:
from foo import *
def doThis():
myFunc() # Which myFunc is called?
def myFunc():
print 2
It is OK to do from ... import * in an interactive session.
Say you have the following code in a module called foo:
import ElementTree as etree
and then in your own module you have:
from lxml import etree
from foo import *
You now have a difficult-to-debug module that looks like it has lxml's etree in it, but really has ElementTree instead.
Understood the valid points people put here. However, I do have one argument that, sometimes, "star import" may not always be a bad practice:
When I want to structure my code in such a way that all the constants go to a module called const.py:
If I do import const, then for every constant, I have to refer it as const.SOMETHING, which is probably not the most convenient way.
If I do from const import SOMETHING_A, SOMETHING_B ..., then obviously it's way too verbose and defeats the purpose of the structuring.
Thus I feel in this case, doing a from const import * may be a better choice.
http://docs.python.org/tutorial/modules.html
Note that in general the practice of importing * from a module or package is frowned upon, since it often causes poorly readable code.
These are all good answers. I'm going to add that when teaching new people to code in Python, dealing with import * is very difficult. Even if you or they didn't write the code, it's still a stumbling block.
I teach children (about 8 years old) to program in Python to manipulate Minecraft. I like to give them a helpful coding environment to work with (Atom Editor) and teach REPL-driven development (via bpython). In Atom I find that the hints/completion works just as effectively as bpython. Luckily, unlike some other statistical analysis tools, Atom is not fooled by import *.
However, lets take this example... In this wrapper they from local_module import * a bunch modules including this list of blocks. Let's ignore the risk of namespace collisions. By doing from mcpi.block import * they make this entire list of obscure types of blocks something that you have to go look at to know what is available. If they had instead used from mcpi import block, then you could type walls = block. and then an autocomplete list would pop up.
It is a very BAD practice for two reasons:
Code Readability
Risk of overriding the variables/functions etc
For point 1:
Let's see an example of this:
from module1 import *
from module2 import *
from module3 import *
a = b + c - d
Here, on seeing the code no one will get idea regarding from which module b, c and d actually belongs.
On the other way, if you do it like:
# v v will know that these are from module1
from module1 import b, c # way 1
import module2 # way 2
a = b + c - module2.d
# ^ will know it is from module2
It is much cleaner for you, and also the new person joining your team will have better idea.
For point 2: Let say both module1 and module2 have variable as b. When I do:
from module1 import *
from module2 import *
print b # will print the value from module2
Here the value from module1 is lost. It will be hard to debug why the code is not working even if b is declared in module1 and I have written the code expecting my code to use module1.b
If you have same variables in different modules, and you do not want to import entire module, you may even do:
from module1 import b as mod1b
from module2 import b as mod2b
As a test, I created a module test.py with 2 functions A and B, which respectively print "A 1" and "B 1". After importing test.py with:
import test
. . . I can run the 2 functions as test.A() and test.B(), and "test" shows up as a module in the namespace, so if I edit test.py I can reload it with:
import importlib
importlib.reload(test)
But if I do the following:
from test import *
there is no reference to "test" in the namespace, so there is no way to reload it after an edit (as far as I can tell), which is a problem in an interactive session. Whereas either of the following:
import test
import test as tt
will add "test" or "tt" (respectively) as module names in the namespace, which will allow re-loading.
If I do:
from test import *
the names "A" and "B" show up in the namespace as functions. If I edit test.py, and repeat the above command, the modified versions of the functions do not get reloaded.
And the following command elicits an error message.
importlib.reload(test) # Error - name 'test' is not defined
If someone knows how to reload a module loaded with "from module import *", please post. Otherwise, this would be another reason to avoid the form:
from module import *
As suggested in the docs, you should (almost) never use import * in production code.
While importing * from a module is bad, importing * from a package is probably even worse.
By default, from package import * imports whatever names are defined by the package's __init__.py, including any submodules of the package that were loaded by previous import statements.
If a package’s __init__.py code defines a list named __all__, it is taken to be the list of submodule names that should be imported when from package import * is encountered.
Now consider this example (assuming there's no __all__ defined in sound/effects/__init__.py):
# anywhere in the code before import *
import sound.effects.echo
import sound.effects.surround
# in your module
from sound.effects import *
The last statement will import the echo and surround modules into the current namespace (possibly overriding previous definitions) because they are defined in the sound.effects package when the import statement is executed.

Categories

Resources