How does importing work in Python [duplicate] - python

I have two related Python 'import' questions. They are easily testable, but I want answers that are language-defined and not implementation-specific, and I'm also interested in style/convention, so I'm asking here instead.
1)
If module A imports module B, and module B imports module C, can code in module A reference module C without an explicit import? If so, am I correct in assuming this is bad practice?
2)
If I import module A.B.C, does that import modules A and A.B as well? If so, is it by convention better to explicitly import A; import A.B; import A.B.C?

The first thing you should know is that the Python language is NOT an ISO standard. This is rather different from C/C++, and it means that there's no "proper" way to define a language behaviour - CPython might do something just because it was coded that way, and Jython might do the other way round.
about your questions, remember that "importing" a module is a two-part operation: first the module is loaded - if it had never been, e.g. if it wasn't available in sys.modules, then a name is bound to that module in the local namespace.
hence:
1) Yes, you can reference whatever you want from module a by providing the proper namespace, e.g. you'll have to do something like
B.C.name = "something"
And I think this is very rarely done in Python programs and could be considered bad practice since it forces a "transitive dep" - if some module B implementation is refactored and doesn't depend on C anymore, it should continue offering the C module just for satisfying A deps.
Of course setting __ all __ can prevent this, and a good practice may be to put __ all __ in all your modules, and export just the symbols you want to be really public.
2) Yes and no. Doing
import a.b.c.d
performs the first import phase (loading) on all modules, but the second just on a (and, recursively, in b with respect to c, etc) but all the modules in the chain must be referenced by full namespace; after such an import, you can do
a.something
a.b.something
a.b.c.something
but you can't do
c.something
b.something
I must admit that kind of usage is pretty rare as well; I generally prefer the "from module import something" way-to-import, and generally you just ask for what you need - such nesting is neither common in libraries, nor its usage is that common.
Many times there're "outer packages", just used for organization, which hold modules with classes. It's very likely that a, b, c above are just packages, and d is a module which truly holds classes, functions and other objects. So the proper usage would be:
from a.b.c.d import name1, name2, name3
I hope this satifies your curiosity.

Alan's given a great answer, but I wanted to add that for your question 1 it depends on what you mean by 'imports'.
If you use the from C import x syntax, then x becomes available in the namespace of B. If in A you then do import B, you will have access to x from A as B.x.
It's not so much bad practice as potentially confusing, and will make debugging etc harder as you won't necessarily know where the objects have come from.

Related

Python cross module variables and imports

I see that there are other questions related to cross module variables, but they don't really fully answer my question.
I have an application that I have split into 3 modules + 1 main application, mainly for ease of readability and maintainability.
2 of these modules have threads with variables that need to be modified from other modules and other module threads.
Whilst I can modify a module's variable from the main code, I don't appear to be able to modify one module's variable from another module unless I import every module into every other module.
The example below where a&b are imported into main and a module a needs to access a variable in module b:
main
module a
var a
module b
var a
main
a.a = 1
b.a = 2
module a
b.a = 3
module b
a.a = 0
without importing module a into module b and importing module b into module a, can this be achieved globally through the main program ?
If I do have to import a and b into main, and then import a into b and b into a, what are the implications in terms of memory and resource usage / speed etc ?
I tried the suggestion from #abarnert:
#moda
vara = 10
#modb
print(str(vara))
#main
import moda
from moda import vara
import modb
however I get "name error vara is not defined"
If the code in the modules are defined as classes, and the main program creates instances of these classes, the main program can pass an instance of one module class to another, and changes to that instance will be reflected everywhere. There would be no need to import a or b into each other, because they would simply have references to each other.
If I do have to import a and b into main, and then import a into b and b into a, what are the implications in terms of memory and resource usage / speed etc ?
Absolutely none for memory—every module that imports a will get a reference to the exact same a module object. All you're doing is increasing its refcount, not creating new objects.
For speed, the time to discover that you're trying to import a module that already exists is almost nothing (it's just looking up the module name in a dictionary). It is slightly slower to access a.a than to just access a. But this is very rarely an issue. If it is, you're almost certainly going to want to copy that value into the locals of whatever function is accessing it over and over, at which point it won't matter which globals it came from.
without importing module a into module b and importing module b into module a, can this be achieved globally through the main program ?
Sure. All you have to do is import (with from a import a or import a.a as aa or whatever) or copy the variables from module a into main.
Note that just makes a new name for each value; it doesn't make references to the variables. There is no such thing as a reference to a variable in Python.
This works if the variables are holding constants, or if they're holding mutable values that you modify. It just doesn't do anything useful if the variables are names that you want to rebind to new values. (If you do need to do that, just wrap the values in something mutable—e.g., turn each variable into a 1-item list, so you can rebind a[0] instead of a, which means anyone else who has a reference to a can see your new a[0] value.)
If you insist on a "true global", even that isn't impossible. See builtins for details. But you almost certainly don't want this.
If you want to be able to modify a module-level variable from a different module then yes, you will need to import the other module. I would question why you need to do this. Perhaps you should be breaking your code into classes instead of separate modules.
For example you could choose to encapsulate the variables that need to be modified by both modules inside a separate class and pass a single instance of that class to all classes (or modules but you should really use classes) that need it.
See Circular (or cyclic) imports in Python for more information about cyclical imports.

Why import when you need to use the full name?

In python, if you need a module from a different package you have to import it. Coming from a Java background, that makes sense.
import foo.bar
What doesn't make sense though, is why do I need to use the full name whenever I want to use bar? If I wanted to use the full name, why do I need to import? Doesn't using the full name immediately describe which module I'm addressing?
It just seems a little redundant to have from foo import bar when that's what import foo.bar should be doing. Also a little vague why I had to import when I was going to use the full name.
The thing is, even though Python's import statement is designed to look similar to Java's, they do completely different things under the hood. As you know, in Java an import statement is really little more than a hint to the compiler. It basically sets up an alias for a fully qualified class name. For example, when you write
import java.util.Set;
it tells the compiler that throughout that file, when you write Set, you mean java.util.Set. And if you write s.add(o) where s is an object of type Set, the compiler (or rather, linker) goes out and finds the add method in Set.class and puts in a reference to it.
But in Python,
import util.set
(that is a made-up module, by the way) does something completely different. See, in Python, packages and modules are not just names, they're actual objects, and when you write util.set in your code, that instructs Python to access an object named util and look for an attribute on it named set. The job of Python's import statement is to create that object and attribute. The way it works is that the interpreter looks for a file named util/__init__.py, uses the code in it to define properties of an object, and binds that object to the name util. Similarly, the code in util/set.py is used to initialize an object which is bound to util.set. There's a function called __import__ which takes care of all of this, and in fact the statement import util.set is basically equivalent to
util = __import__('util.set')
The point is, when you import a Python module, what you get is an object corresponding to the top-level package, util. In order to get access to util.set you need to go through that, and that's why it seems like you need to use fully qualified names in Python.
There are ways to get around this, of course. Since all these things are objects, one simple approach is to just bind util.set to a simpler name, i.e. after the import statement, you can have
set = util.set
and from that point on you can just use set where you otherwise would have written util.set. (Of course this obscures the built-in set class, so I don't recommend actually using the name set.) Or, as mentioned in at least one other answer, you could write
from util import set
or
import util.set as set
This still imports the package util with the module set in it, but instead of creating a variable util in the current scope, it creates a variable set that refers to util.set. Behind the scenes, this works kind of like
_util = __import__('util', fromlist='set')
set = _util.set
del _util
in the former case, or
_util = __import__('util.set')
set = _util.set
del _util
in the latter (although both ways do essentially the same thing). This form is semantically more like what Java's import statement does: it defines an alias (set) to something that would ordinarily only be accessible by a fully qualified name (util.set).
You can shorten it, if you would like:
import foo.bar as whateveriwant
Using the full name prevents two packages with the same-named submodules from clobbering each other.
There is a module in the standard library called io:
In [84]: import io
In [85]: io
Out[85]: <module 'io' from '/usr/lib/python2.6/io.pyc'>
There is also a module in scipy called io:
In [95]: import scipy.io
In [96]: scipy.io
Out[96]: <module 'scipy.io' from '/usr/lib/python2.6/dist-packages/scipy/io/__init__.pyc'>
If you wanted to use both modules in the same script, then namespaces are a convenient way to distinguish the two.
In [97]: import this
The Zen of Python, by Tim Peters
...
Namespaces are one honking great idea -- let's do more of those!
in Python, importing doesn't just indicate you might use something. The import actually executes code at the module level. You can think of the import as being the moment where the functions are 'interpreted' and created. Any code that is in the _____init_____.py level or not inside a function or class definition happens then.
The import also makes an inexpensive copy of the whole module's namespace and puts it inside the namespace of the file / module / whatever where it is imported. An IDE then has a list of the functions you might be starting to type for command completion.
Part of the Python philosophy is explicit is better than implicit. Python could automatically import the first time you try to access something from a package, but that's not explicit.
I'm also guessing that package initialization would be much more difficult if the imports were automatic, as it wouldn't be done consistently in the code.
You're a bit confused about how Python imports work. (I was too when I first started.) In Python, you can't simply refer to something within a module by the full name, unlike in Java; you HAVE to import the module first, regardless of how you plan on referring to the imported item. Try typing math.sqrt(5) in the interpreter without importing math or math.sqrt first and see what happens.
Anyway... the reason import foo.bar has you required to use foo.bar instead of just bar is to prevent accidental namespace conflicts. For example, what if you do import foo.bar, and then import baz.bar?
You could, of course, choose to do import foo.bar as bar (i.e. aliasing), but if you're doing that you may as well just use from foo import bar. (EDIT: except when you want to import methods and variables. Then you have to use the from ... import ... syntax. This includes instances where you want to import a method or variable without aliasing, i.e. you can't simply do import foo.bar if bar is a method or variable.)
Other than in Java, in Python import foo.bar declares, that you are going to use the thing referred to by foo.bar.
This matches with Python's philosophy that explicit is better than implicit. There are more programming languages that make inter-module dependencies more explicit than Java, for example Ada.
Using the full name makes it possible to disambiguate definitions with the same name coming from different modules.
You don't have to use the full name. Try one of these
from foo import bar
import foo.bar as bar
import foo.bar
bar = foo.bar
from foo import *
A few reasons why explicit imports are good:
They help signal to humans and tools what packages your module depends on.
They avoid the overhead of dynamically determining which packages have to be loaded (and possibly compiled) at run time.
They (along with sys.path) unambiguously distinguish symbols with conflicting names from different namespaces.
They give the programmer some control of what enters the namespace within which he is working.

dynamic module creation

I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. EG
context = { a: 1, b: 2 }
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the testit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.

Python import mechanics

I have two related Python 'import' questions. They are easily testable, but I want answers that are language-defined and not implementation-specific, and I'm also interested in style/convention, so I'm asking here instead.
1)
If module A imports module B, and module B imports module C, can code in module A reference module C without an explicit import? If so, am I correct in assuming this is bad practice?
2)
If I import module A.B.C, does that import modules A and A.B as well? If so, is it by convention better to explicitly import A; import A.B; import A.B.C?
The first thing you should know is that the Python language is NOT an ISO standard. This is rather different from C/C++, and it means that there's no "proper" way to define a language behaviour - CPython might do something just because it was coded that way, and Jython might do the other way round.
about your questions, remember that "importing" a module is a two-part operation: first the module is loaded - if it had never been, e.g. if it wasn't available in sys.modules, then a name is bound to that module in the local namespace.
hence:
1) Yes, you can reference whatever you want from module a by providing the proper namespace, e.g. you'll have to do something like
B.C.name = "something"
And I think this is very rarely done in Python programs and could be considered bad practice since it forces a "transitive dep" - if some module B implementation is refactored and doesn't depend on C anymore, it should continue offering the C module just for satisfying A deps.
Of course setting __ all __ can prevent this, and a good practice may be to put __ all __ in all your modules, and export just the symbols you want to be really public.
2) Yes and no. Doing
import a.b.c.d
performs the first import phase (loading) on all modules, but the second just on a (and, recursively, in b with respect to c, etc) but all the modules in the chain must be referenced by full namespace; after such an import, you can do
a.something
a.b.something
a.b.c.something
but you can't do
c.something
b.something
I must admit that kind of usage is pretty rare as well; I generally prefer the "from module import something" way-to-import, and generally you just ask for what you need - such nesting is neither common in libraries, nor its usage is that common.
Many times there're "outer packages", just used for organization, which hold modules with classes. It's very likely that a, b, c above are just packages, and d is a module which truly holds classes, functions and other objects. So the proper usage would be:
from a.b.c.d import name1, name2, name3
I hope this satifies your curiosity.
Alan's given a great answer, but I wanted to add that for your question 1 it depends on what you mean by 'imports'.
If you use the from C import x syntax, then x becomes available in the namespace of B. If in A you then do import B, you will have access to x from A as B.x.
It's not so much bad practice as potentially confusing, and will make debugging etc harder as you won't necessarily know where the objects have come from.

Why is "import *" bad?

It is recommended to not to use import * in Python.
Can anyone please share the reason for that, so that I can avoid it doing next time?
Because it puts a lot of stuff into your namespace (might shadow some other object from previous import and you won't know about it).
Because you don't know exactly what is imported and can't easily find from which module a certain thing was imported (readability).
Because you can't use cool tools like pyflakes to statically detect errors in your code.
According to the Zen of Python:
Explicit is better than implicit.
... can't argue with that, surely?
You don't pass **locals() to functions, do you?
Since Python lacks an "include" statement, and the self parameter is explicit, and scoping rules are quite simple, it's usually very easy to point a finger at a variable and tell where that object comes from -- without reading other modules and without any kind of IDE (which are limited in the way of introspection anyway, by the fact the language is very dynamic).
The import * breaks all that.
Also, it has a concrete possibility of hiding bugs.
import os, sys, foo, sqlalchemy, mystuff
from bar import *
Now, if the bar module has any of the "os", "mystuff", etc... attributes, they will override the explicitly imported ones, and possibly point to very different things. Defining __all__ in bar is often wise -- this states what will implicitly be imported - but still it's hard to trace where objects come from, without reading and parsing the bar module and following its imports. A network of import * is the first thing I fix when I take ownership of a project.
Don't misunderstand me: if the import * were missing, I would cry to have it. But it has to be used carefully. A good use case is to provide a facade interface over another module.
Likewise, the use of conditional import statements, or imports inside function/class namespaces, requires a bit of discipline.
I think in medium-to-big projects, or small ones with several contributors, a minimum of hygiene is needed in terms of statical analysis -- running at least pyflakes or even better a properly configured pylint -- to catch several kind of bugs before they happen.
Of course since this is python -- feel free to break rules, and to explore -- but be wary of projects that could grow tenfold, if the source code is missing discipline it will be a problem.
That is because you are polluting the namespace. You will import all the functions and classes in your own namespace, which may clash with the functions you define yourself.
Furthermore, I think using a qualified name is more clear for the maintenance task; you see on the code line itself where a function comes from, so you can check out the docs much more easily.
In module foo:
def myFunc():
print 1
In your code:
from foo import *
def doThis():
myFunc() # Which myFunc is called?
def myFunc():
print 2
It is OK to do from ... import * in an interactive session.
Say you have the following code in a module called foo:
import ElementTree as etree
and then in your own module you have:
from lxml import etree
from foo import *
You now have a difficult-to-debug module that looks like it has lxml's etree in it, but really has ElementTree instead.
Understood the valid points people put here. However, I do have one argument that, sometimes, "star import" may not always be a bad practice:
When I want to structure my code in such a way that all the constants go to a module called const.py:
If I do import const, then for every constant, I have to refer it as const.SOMETHING, which is probably not the most convenient way.
If I do from const import SOMETHING_A, SOMETHING_B ..., then obviously it's way too verbose and defeats the purpose of the structuring.
Thus I feel in this case, doing a from const import * may be a better choice.
http://docs.python.org/tutorial/modules.html
Note that in general the practice of importing * from a module or package is frowned upon, since it often causes poorly readable code.
These are all good answers. I'm going to add that when teaching new people to code in Python, dealing with import * is very difficult. Even if you or they didn't write the code, it's still a stumbling block.
I teach children (about 8 years old) to program in Python to manipulate Minecraft. I like to give them a helpful coding environment to work with (Atom Editor) and teach REPL-driven development (via bpython). In Atom I find that the hints/completion works just as effectively as bpython. Luckily, unlike some other statistical analysis tools, Atom is not fooled by import *.
However, lets take this example... In this wrapper they from local_module import * a bunch modules including this list of blocks. Let's ignore the risk of namespace collisions. By doing from mcpi.block import * they make this entire list of obscure types of blocks something that you have to go look at to know what is available. If they had instead used from mcpi import block, then you could type walls = block. and then an autocomplete list would pop up.
It is a very BAD practice for two reasons:
Code Readability
Risk of overriding the variables/functions etc
For point 1:
Let's see an example of this:
from module1 import *
from module2 import *
from module3 import *
a = b + c - d
Here, on seeing the code no one will get idea regarding from which module b, c and d actually belongs.
On the other way, if you do it like:
# v v will know that these are from module1
from module1 import b, c # way 1
import module2 # way 2
a = b + c - module2.d
# ^ will know it is from module2
It is much cleaner for you, and also the new person joining your team will have better idea.
For point 2: Let say both module1 and module2 have variable as b. When I do:
from module1 import *
from module2 import *
print b # will print the value from module2
Here the value from module1 is lost. It will be hard to debug why the code is not working even if b is declared in module1 and I have written the code expecting my code to use module1.b
If you have same variables in different modules, and you do not want to import entire module, you may even do:
from module1 import b as mod1b
from module2 import b as mod2b
As a test, I created a module test.py with 2 functions A and B, which respectively print "A 1" and "B 1". After importing test.py with:
import test
. . . I can run the 2 functions as test.A() and test.B(), and "test" shows up as a module in the namespace, so if I edit test.py I can reload it with:
import importlib
importlib.reload(test)
But if I do the following:
from test import *
there is no reference to "test" in the namespace, so there is no way to reload it after an edit (as far as I can tell), which is a problem in an interactive session. Whereas either of the following:
import test
import test as tt
will add "test" or "tt" (respectively) as module names in the namespace, which will allow re-loading.
If I do:
from test import *
the names "A" and "B" show up in the namespace as functions. If I edit test.py, and repeat the above command, the modified versions of the functions do not get reloaded.
And the following command elicits an error message.
importlib.reload(test) # Error - name 'test' is not defined
If someone knows how to reload a module loaded with "from module import *", please post. Otherwise, this would be another reason to avoid the form:
from module import *
As suggested in the docs, you should (almost) never use import * in production code.
While importing * from a module is bad, importing * from a package is probably even worse.
By default, from package import * imports whatever names are defined by the package's __init__.py, including any submodules of the package that were loaded by previous import statements.
If a package’s __init__.py code defines a list named __all__, it is taken to be the list of submodule names that should be imported when from package import * is encountered.
Now consider this example (assuming there's no __all__ defined in sound/effects/__init__.py):
# anywhere in the code before import *
import sound.effects.echo
import sound.effects.surround
# in your module
from sound.effects import *
The last statement will import the echo and surround modules into the current namespace (possibly overriding previous definitions) because they are defined in the sound.effects package when the import statement is executed.

Categories

Resources