Python import precedence: packages or modules? - python

I wasn't clear how to correctly name this question.
Case 1
Assume that I have the following directory structure.
foo
|
+- bar/__init__.py
|
+- bar.py
If I have
from foo import bar
How do I know which bar (bar.py or bar/__init__.py) is being imported? Is there an easy way to automatically detect when this occurs?
Case 2
foo
|
+- foo.py
|
+- other.py
If other.py has the line
import foo
How do I know which foo (foo or foo.foo) is being imported? Again, is there an easy way to automatically detect when this occurs?

TLDR; a package takes precedence over a module of the same name if they are in the same directory.
From the docs:
"When a module named spam is imported, the interpreter searches for a file named spam.py in the current directory, and then in the list of directories specified by the environment variable PYTHONPATH. This has the same syntax as the shell variable PATH, that is, a list of directory names."
This is a bit misleading because the interpreter will also look for a package called spam (a directory called spam containing an __init__.py file). Within each directory on the search path, the import machinery checks for a package before it checks for module files, so a package takes precedence over a module with the same name if they are in the same directory.
Note that "current directory" is relative to the main script path (the one where __name__ == '__main__' is True). So if you are at /home/billg calling /foo/bar.py, "current directory" refers to /foo.
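A quick way to check the precedence claim yourself is to build both candidates in a scratch directory and see which one gets loaded. This is only a sketch, assuming Python 3; the name shadow_demo and the KIND marker are made up for the illustration and are not part of the question.

```python
import importlib
import os
import sys
import tempfile


def which_wins():
    """Create shadow_demo/ and shadow_demo.py side by side and import the name."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "shadow_demo")  # hypothetical name
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("KIND = 'package'\n")
    with open(os.path.join(root, "shadow_demo.py"), "w") as f:
        f.write("KIND = 'module'\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("shadow_demo")
        return mod.KIND, mod.__file__
    finally:
        # Leave the interpreter as we found it.
        sys.path.remove(root)
        sys.modules.pop("shadow_demo", None)


kind, path = which_wins()
print(kind, path)
```

On the interpreters I would expect this to be run on, the package's __init__.py is the file that is reported.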

From a Python shell:
from foo import bar
print(bar.__file__)
This should tell you which file has been imported.
Rob
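If you would rather not execute the module just to find out where it lives, importlib.util.find_spec can report the file that an import would pick. A minimal sketch, assuming Python 3.4 or later; where_is is a helper name invented here.

```python
import importlib.util


def where_is(name):
    """Return the file a top-level import of `name` would load, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package this is the path of its __init__.py;
    # for a plain module it is the module file itself.
    return spec.origin


print(where_is("json"))
```

Note that for a dotted name such as foo.bar, find_spec has to import the parent package first, so this is only side-effect free for top-level names.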

Packages (directories with __init__.py) take precedence over modules. The documentation of this fact is difficult to find, but you can see it in the source: python 2.7, python 3.6 (thanks @qff for the find).
You will also need an __init__.py within the foo directory for your example to work.
If other.py is inside of foo/ then it will load foo.py (not the directory foo/) because it will look in the current directory first (unless you've played with PYTHONPATH or sys.path).

I would like to complement the accepted answer. For Python 3.3+, namespace packages have been introduced and the import order according to PEP 420 follows:
During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path:
If <directory>/foo/__init__.py is found, a regular package is imported and returned.
If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative.
If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
Otherwise the scan continues with the next directory in the parent path.
If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created.
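The steps quoted above can be sketched as a small function. This is a simplified model for illustration, not the real import machinery: scan and SUFFIXES are names invented here, and the suffix list is the representative one from the PEP rather than what any given platform actually uses.

```python
import os
import tempfile

SUFFIXES = (".py", ".pyc", ".so", ".pyd")  # representative list, as in the PEP


def scan(name, parent_path):
    """Mimic the PEP 420 scan for `name` over the directories in `parent_path`."""
    recorded = []
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        # 1. A directory with __init__.py is a regular package and wins at once.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular package", candidate)
        # 2. Otherwise a module file in the same directory is returned.
        for suffix in SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        # 3. A bare directory is only recorded; the scan goes on.
        if os.path.isdir(candidate):
            recorded.append(candidate)
    # 4. Nothing was returned: recorded directories form a namespace package.
    if recorded:
        return ("namespace package", recorded)
    return None


first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
os.mkdir(os.path.join(first, "foo"))               # bare directory, no __init__.py
open(os.path.join(second, "foo.py"), "w").close()  # plain module further down the path

result = scan("foo", [first, second])
print(result[0])
```

The point of the demo is that a bare foo directory earlier on the path does not stop a foo.py later on the path from being imported.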

In the first case you're trying to import the function bar from the file 'foo.py'.
In the second you're trying to import the file 'foo.py'.

Related

File not found on import when the same script is imported onto two other python scripts [duplicate]

This is a python newbie question:
I have the following directory structure:
test
-- test_file.py
a
-- b
-- module.py
where test, a and b are folders. Both test and a are on the same level.
module.py has a class called shape, and I want to instantiate an instance of it in test_file.py. How can I do so?
I have tried:
from a.b import module
but I got:
ImportError: No module named a.b
What you want is a relative import like:
from ..a.b import module
The problem with this is that it doesn't work if you are calling test_file.py as your main module. As stated here:
Note that both explicit and implicit relative imports are based on the name of the current module. Since the name of the main module is always "__main__", modules intended for use as the main module of a Python application should always use absolute imports.
So, if you want to call test_file.py as your main module, then you should consider changing the structure of your modules and using an absolute import, else just use the relative import from above.
The directory a needs to be a package. Add an __init__.py file to make it a package, which is a step up from being a simple directory.
The directory b also needs to be a subpackage of a. Add an __init__.py file.
The directory test should probably also be a package. Hard to say if this is necessary or not. It's usually a good idea for every directory of Python modules to be a formal package.
In order to import, the package needs to be on sys.path; this is built from the PYTHONPATH environment variable. By default the installed site-packages and the current working directory are (effectively) the only two places where a package can be found.
That means that a must either be installed, or, your current working directory must also be a package one level above a.
OR, you need to set your PYTHONPATH environment variable to include a.
http://docs.python.org/tutorial/modules.html#the-module-search-path
http://docs.python.org/using/cmdline.html#envvar-PYTHONPATH
Also, http://docs.python.org/library/site.html for complete information on how sys.path is built.
The first thing to do would be to quickly browse the official docs on this.
To make a directory a package, you'll have to add a __init__.py file. This means that you'll have such a file in the a and b directories. Then you can directly do an
import a.b.module
But you'll have to refer to it as a.b.module which is tedious so you can use the as form of the import like so
import a.b.module as mod  # shorter name
and refer to it as mod.
Then you can instantiate things inside mod using the regular conventions like mod.shape().
There are a few other subtleties. Please go through the docs for details.
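Putting the pieces from these answers together, here is a rough sketch that builds the a/b/module.py layout in a temporary directory, adds the parent of a to sys.path, and instantiates shape. The temporary directory and the sides attribute are assumptions made for the demo; in a real project you would put the directory that contains a on PYTHONPATH instead.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()  # stands in for the directory that contains a/
os.makedirs(os.path.join(root, "a", "b"))

# __init__.py turns a and a/b into regular packages.
for package in ("a", os.path.join("a", "b")):
    open(os.path.join(root, package, "__init__.py"), "w").close()

with open(os.path.join(root, "a", "b", "module.py"), "w") as f:
    f.write("class shape(object):\n    sides = 4\n")  # `sides` is made up

# The parent of a must be on sys.path for `import a...` to succeed.
sys.path.insert(0, root)
importlib.invalidate_caches()

import a.b.module as mod

instance = mod.shape()
print(type(instance).__name__)
```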

Why can I import successfully without __init__.py?

What exactly is the use of __init__.py? Yes, I know this file makes a directory into an importable package. However, consider the following example:
project/
    foo/
        __init__.py
        a.py
    bar/
        b.py
If I want to import a into b, I have to add the following statements:
import sys
sys.path.append('/path_to_foo')
import foo.a
This will run successfully with or without __init__.py. However, if there is no sys.path.append statement, a "no module" error will occur, with or without __init__.py. This makes it seem like only the system path matters, and that __init__.py does not have any effect.
Why would this import work without __init__.py?
__init__.py has nothing to do with whether Python can find your package. You've run your code in such a way that your package isn't on the search path by default, but if you had run it differently or configured your PYTHONPATH differently, the sys.path.append would have been unnecessary.
__init__.py used to be necessary to create a package, and in most cases, you should still provide it. Since Python 3.3, though, a folder without an __init__.py can be considered part of an implicit namespace package, a feature for splitting a package across multiple directories.
During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path:
If <directory>/foo/__init__.py is found, a regular package is imported and returned.
If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative.
If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
Otherwise the scan continues with the next directory in the parent path.
If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created.
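To see the namespace-package case in action, the following sketch imports from a directory that has no __init__.py at all. It assumes Python 3.3 or later; ns_demo and VALUE are hypothetical names used only here.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "ns_demo"))  # note: no __init__.py is created
with open(os.path.join(root, "ns_demo", "a.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

pkg = importlib.import_module("ns_demo")
mod = importlib.import_module("ns_demo.a")

# A namespace package has no file of its own, only a __path__.
print(getattr(pkg, "__file__", None))
print(list(pkg.__path__))
print(mod.VALUE)
```

The import only works because root was put on sys.path, which matches the observation in the question that the search path decides whether the package is found.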
If you really want to avoid __init__.py for some reason, you don't need to modify sys.path. Rather, create a module object and set its __path__ to a list of directories.
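One way to read the module-object suggestion is sketched below: a hand-made module with a __path__ is registered in sys.modules, and submodules are then looked up in the listed directories. This is my interpretation rather than a documented recipe, it assumes Python 3, and handmade_pkg and VALUE are invented names.

```python
import importlib
import os
import sys
import tempfile
import types

src_dir = tempfile.mkdtemp()  # any directory holding .py files
with open(os.path.join(src_dir, "a.py"), "w") as f:
    f.write("VALUE = 42\n")

# Build the "package" by hand instead of relying on __init__.py.
pkg = types.ModuleType("handmade_pkg")  # hypothetical package name
pkg.__path__ = [src_dir]                # where submodules are searched
sys.modules["handmade_pkg"] = pkg

importlib.invalidate_caches()
sub = importlib.import_module("handmade_pkg.a")
print(sub.VALUE)
```

Nothing is added to sys.path here; the submodule lookup uses the package's __path__ only.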
if I want to import a into b, I have to add following statement:
No! You'd just say: import foo.a. All this is provided you run the entire package at once using python -m main.module where main.module is the entry point to your entire application. It imports all other modules, and the modules that import more modules will try to look for them from the root of this project. For instance, foo.bar.c will import as foo.bar.b
Then it seems that only the system path matters and __init__.py does not have any effect.
You need to modify sys.path only when you are importing modules from locations that are not in your project, or the places where python looks for libraries. __init__.py not only makes a folder look like a package, it also does a few more things like "export" objects to outside world (__all__)
When you import something it has to either:
Retrieve an already loaded module or
Load the module that was imported
When you do import foo and python finds a folder called foo in a folder on your sys.path then it will look in that folder for an __init__.py to be considered the top level module.
(Note that if the package is not on your sys.path then you would need to append its location to be able to import it.)
If that is not present it will look for a __init__.pyc version possibly in the __pycache__ folder, if that is also missing then that folder foo is not considered a loadable python package. If no other options for foo are found then an ImportError is raised.
If you try deleting the __init__.pyc file as well you will see that the initializer script for a package is indeed necessary.

What's the order Python used to import module?

I encountered something strange about the Python import statement.
Let's say that I have a file structure like below:
foo/
    __init__.py
    bar.py
    os.py
Code in bar.py (other files are empty):
import os
print os.__file__
The strange thing is when I run python -m foo.bar, it prints
foo/os.pyc
However, when I change directory to foo and run python -m bar, it prints
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc
What's the difference between the two ways I run script?
In a word, what's the order Python used to import module?
In the official documentation, I found several passages about this problem (they made me even more confused):
6.1.2. The Module Search Path
the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path.
sys.path
the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first.
6.4.2. Intra-package References
In fact, such references are so common that the import statement first looks in the containing package before looking in the standard module search path.
...
If the imported module is not found in the current package (the package of which the current module is a submodule), the import statement looks for a top-level module with the given name.
What's the difference between the two ways I run script?
The difference is if foo is (from python's view) a loaded module or not.
If you run python -m foo.bar, foo is a valid module. Even with Python 2.7, import os is still a relative import and hence os gets resolved against the containing module (i.e. foo), first:
https://docs.python.org/2/tutorial/modules.html#intra-package-references:
The submodules often need to refer to each other. For example, the surround module might use the echo module. In fact, such references are so common that the import statement first looks in the containing package before looking in the standard module search path.
When you run python -m bar, bar is a top level module, i.e. it has no containing module. In that case import os goes through sys.path.
The default module search for an import bla is
1. If a containing module exists, do a relative import against the containing module.
2. Go into sys.path and use the first successful import.
To disable (1), you can
from __future__ import absolute_import
at the very top of a module.
Confusing? Absolutely.
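The effect of the __future__ import can be reproduced with a throwaway package that contains its own os.py. The sketch below runs on Python 3, where absolute imports are already the default, so the __future__ line only matters on 2.x; shadow_pkg, STDLIB_OS and SIBLING_OS are names made up for the demo.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_pkg")  # hypothetical package
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
open(os.path.join(pkg_dir, "os.py"), "w").close()  # sibling that shadows the name

with open(os.path.join(pkg_dir, "bar.py"), "w") as f:
    f.write(
        "from __future__ import absolute_import\n"
        "import os\n"                      # absolute: resolved through sys.path
        "from . import os as sibling_os\n"  # explicit relative: the sibling file
        "STDLIB_OS = os\n"
        "SIBLING_OS = sibling_os\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()
bar = importlib.import_module("shadow_pkg.bar")

print(bar.STDLIB_OS.__name__, bar.SIBLING_OS.__name__)
```

With absolute imports in force, the sibling os.py is reachable only through the explicit relative form.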

Python path explained: import from a subpackage

This question details a behavior that I can't explain to myself.
src/package/__init__.py is empty but present.
src/package/subpackage/__init__.py:
pink = 'It works'
src/package/test/test.py:
import package.subpackage as subpackage
# I also tried `import package.subpackage as subpackage`
print subpackage.pink
Calling from src: python package/test/test.py just fails with ImportError: No module named subpackage. Please note that import package doesn't work either.
NB: Running an interpreter from src and typing the import statement works perfectly well.
Should I understand that I'm not supposed to call a file inside a package directly? In my project it's a test file, so it seems logical to me to have it there.
Why is the current working directory not in the import path?
Many thanks for those who reads and those who answers.
Because your package is not in $PYTHONPATH. If you want to call test.py, you can move your test.py file to the src/ directory, or add src to $PYTHONPATH:
PYTHONPATH="/path/to/src:$PYTHONPATH"
export PYTHONPATH
From Documentation
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path
>>> import sys
>>> sys.path
The output is like this
['.', '/usr/bin', ...
This means that the current directory is in sys.path as well. If you want to import a module, please make sure that the module path is in sys.path, by adding your package directory to the environment variable PYTHONPATH, or changing your current directory or script directory to the package directory.
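Whether a PYTHONPATH entry really ends up in sys.path can be checked by starting a child interpreter with the variable set. A small sketch, assuming Python 3; extra_dir is just an empty temporary directory standing in for /path/to/src.

```python
import json
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()  # stands in for /path/to/src

# Copy the current environment and set PYTHONPATH for the child only.
env = dict(os.environ, PYTHONPATH=extra_dir)
raw = subprocess.check_output(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
    env=env,
)
child_path = json.loads(raw.decode())

# Compare resolved paths so symlinked temp directories do not confuse the check.
found = os.path.realpath(extra_dir) in [os.path.realpath(p) for p in child_path]
print(found)
```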
As for why python package/test/test.py fails even though it is also run from src:
When you start an interpreter from src, '' is in sys.path, so packages under src can be found.
When you run python package/test/test.py from src, '' is missing from sys.path. Although os.path.abspath('.') shows that the current directory is "<xxx>\\src", "<xxx>\\src" is not in sys.path, while "<xxx>\\src\\package\\test" is. In other words, Python adds the directory of the script to sys.path, not the directory from which you run it.
See what the docs say:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.
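The quoted behaviour of path[0] is easy to observe by running a script that lives in one directory while the working directory is another. This is a sketch assuming Python 3 with default settings; show_path.py is a hypothetical script name.

```python
import os
import subprocess
import sys
import tempfile

script_dir = tempfile.mkdtemp()
script = os.path.join(script_dir, "show_path.py")  # hypothetical script
with open(script, "w") as f:
    f.write("import sys\nprint(sys.path[0])\n")

other_cwd = tempfile.mkdtemp()  # run from somewhere else on purpose
out = subprocess.check_output([sys.executable, script], cwd=other_cwd)
reported = out.decode().strip()

# path[0] follows the script's directory, not the working directory.
print(os.path.realpath(reported) == os.path.realpath(script_dir))
print(os.path.realpath(reported) == os.path.realpath(other_cwd))
```

This is why python package/test/test.py run from src cannot see package: src itself never enters sys.path.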

Shouldn't the imports be absolute by default in python27?

Imagine the directory structure:
/
    a/
        __init__.py
        b.py
        c.py
    c.py
File /a/b.py looks like:
import c
should_be_absolute = c
All the other files (including __init__) are empty.
When running a test script (using python 2.7):
import a.b
print a.b.should_be_absolute
with PYTHONPATH=/ from an empty directory (so nothing is added to PYTHONPATH from current directory) I get
<module 'a.c' from '/a/c.py'>
where according to PEP 328 and the statement import <> is always absolute I would expect:
<module 'c' from '/c.py'>
The output is as expected when I remove the /a/c.py file.
What am I missing? And if this is the correct behavior - how to import the c module from b (instead of a.c)?
Update:
According to python dev mailing list it appears to be a bug in the documentation. The imports are not absolute by default in python27.
You need to add from __future__ import absolute_import or use importlib.import_module('c') on Python 2.7.
It is the default on Python 3.
There was a bug in Python: __future__.py and its documentation claim absolute imports became mandatory in 2.7, but they didn't.
If you are only adding / to your PYTHONPATH, then the search order could still be looking for c in the current directory. It would be a lot better if you placed everything under a root package, and referred to it absolutely:
/myPackage
    a/
        __init__.py
        b.py
        c.py
    __init__.py
    c.py
And a PYTHONPATH like: export PYTHONPATH=/:$PYTHONPATH
So in your a.c you would do any of these:
from myPackage import c
from myPackage.c import Foo
import myPackage.c
This way, it is always relative to your package.
"Absolute" doesn't mean the one that you think it does; instead, it means that the "usual" package resolving procedure takes place: first, it looks in the directory of the package, then in all the elements of sys.path; which includes the elements from PYTHONPATH.
If you really want to, you can use tools like the imp module, but I'd recommend against it for something like this, because in general you shouldn't ever have to create a module with the same name as one in the standard Python distribution.
