I'm reading PEP 338. Some words confused me:
If the module is found, and is of type PY_SOURCE or PY_COMPILED, then the command line is effectively reinterpreted from python <options> -m <module> <args> to python <options> <filename> <args>.
Do modules have types in Python?
Modules can be loaded from different sources. The author refers to two specific sources a module can be loaded from; see the imp module documentation:
imp.PY_SOURCE
The module was found as a source file.
[...]
imp.PY_COMPILED
The module was found as a compiled code object file.
[...]
imp.C_EXTENSION
The module was found as dynamically loadable shared library.
These values are used in the return value of the imp.get_suffixes() function, among others.
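For illustration, here is a minimal sketch that prints those triples (this assumes a pre-3.12 interpreter, since imp has been deprecated since Python 3.4 and was removed in 3.12; the exact suffixes vary by platform):

import imp  # deprecated since Python 3.4, removed in 3.12

# Each triple is (suffix, mode, type); type is one of the constants above.
names = {imp.PY_SOURCE: "PY_SOURCE",
         imp.PY_COMPILED: "PY_COMPILED",
         imp.C_EXTENSION: "C_EXTENSION"}
for suffix, mode, mod_type in imp.get_suffixes():
    print(suffix, mode, names[mod_type])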
The PEP states that only modules loaded from source (.py files) and from a bytecode cache file (.pyc) are supported; the -m switch does not support C extension modules (typically .so or .dll dynamically loaded libraries).
The resulting module object is still just a module object; the word type in the text you found does not refer to Python's type system.
Quoting from PEP 338:
Proposed Semantics
The semantics proposed are fairly simple: if -m is
used to execute a module the PEP 302 import mechanisms are used to
locate the module and retrieve its compiled code, before executing the
module in accordance with the semantics for a top-level module.
Now let us refer to the documentation of imp (the import mechanism) and determine the different types of modules that can be imported:
imp.get_suffixes()
Return a list of 3-element tuples, each describing
a particular type of module. Each triple has the form (suffix, mode,
type), where suffix is a string to be appended to the module name to
form the filename to search for, mode is the mode string to pass to
the built-in open() function to open the file (this can be 'r' for
text files or 'rb' for binary files), and type is the file type,
which has one of the values PY_SOURCE, PY_COMPILED, or C_EXTENSION,
described below.
and subsequently it explains what the different types are
imp.PY_SOURCE The module was found as a source file.
imp.PY_COMPILED The module was found as a compiled code object file.
imp.C_EXTENSION The module was found as dynamically loadable shared
library.
So, the types mentioned in PEP 338 are simply the types of modules that can be imported, and of the three above, only for PY_SOURCE and PY_COMPILED is the command line effectively reinterpreted from python <options> -m <module> <args> to python <options> <filename> <args>.
The type of a module means the type of the file in which the module is stored, since Python files come in several types (and extensions).
The most common are compiled Python files (.pyc extension) and regular Python source files (.py).
There are many other py file extensions, see the (almost) full list here: https://stackoverflow.com/a/18032741/6575931.
Related
Recently I have been reading some official documentation about modules and the import system in Python:
https://docs.python.org/3/reference/import.html
https://docs.python.org/3/tutorial/modules.html
https://docs.python.org/3/reference/simple_stmts.html#import
I noticed sys.modules, which holds all the modules that have been loaded.
If I run a script like
import sys
print(sys.modules.keys())
I should get the names of the modules that have been loaded by default, with the exception of the sys module (since I imported sys explicitly; though it may also have been loaded by default, because the import statement first checks whether the module to be imported has already been loaded, and only performs the loading and initialization if it has not).
I found that there is a set of modules called built-in modules, but when I checked sys.builtin_module_names I found that only some of them are loaded by default. I also noticed that some of the modules loaded by default come from the Python standard library: https://docs.python.org/3/tutorial/modules.html#standard-modules https://docs.python.org/3/library/. (Maybe the standard library should also be considered to contain all the built-in modules.)
So I want to know what determines the set of modules that are loaded by default. Is there any official explanation of it?
The answer is in the documentation for "standard modules" that you linked to:
Python comes with a library of standard modules, described in a
separate document, the Python Library Reference (“Library Reference”
hereafter). Some modules are built into the interpreter; these provide
access to operations that are not part of the core of the language but
are nevertheless built in, either for efficiency or to provide access
to operating system primitives such as system calls.
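A quick way to see this split on your own interpreter is to compare the two sets; this is only a sketch, and the exact output varies by Python version and platform:

import sys

loaded = set(sys.modules)                # everything loaded so far
builtin = set(sys.builtin_module_names)  # compiled into the interpreter binary

print(sorted(loaded - builtin))  # default-loaded modules that came from files
print(sorted(builtin - loaded))  # built-ins available but not yet loaded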
As far as I can make out, its build system generates a shared object, and also a Python module as a proxy to that shared object. But how does the Python runtime end up doing a dlopen of the generated shared object and binding Python method calls to corresponding functions in the shared library?
I also found a reference to shared libraries in Python's import system documentation, but nothing beyond that. Does CPython treat .so files in the module path as Python modules?
When doing import module, Python will look for files with various extensions that could be Python modules: that can be module.py, but also module.so (on Linux) or module.pyd (on Windows).
When loading a shared object, Python will load it like any dynamic library and then call the module's init function: it must be named PyInit_{module_name_here} and be exported by the shared library.
You can read more about it here.
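You can also ask importlib.machinery which suffixes your interpreter recognizes; the exact lists are platform- and version-specific:

import importlib.machinery

# e.g. ['.cpython-311-x86_64-linux-gnu.so', '.abi3.so', '.so'] on Linux,
# or '.pyd' variants on Windows.
print(importlib.machinery.EXTENSION_SUFFIXES)
print(importlib.machinery.SOURCE_SUFFIXES)    # ['.py']
print(importlib.machinery.BYTECODE_SUFFIXES)  # ['.pyc']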
Specific example:
In /tmp/test, I create a file itertools.py with the following content:
def f():
    print("it is my own itertools.")
I know there is a built-in module named itertools.
Then I run python3 from /tmp/test:
>>> import itertools
>>> itertools.__dict__
{'_tee': <class 'itertools._tee'>, 'groupby': <class 'itertools.groupby'>, ...}
So import itertools loads the built-in module itertools instead of /tmp/test/itertools.py.
It seems that Python searches for a built-in module before searching for a non-builtin module.
This is contrary to the behavior of Python modules with identical names (i.e., reusing standard module names in packages). Why?
General rules:
From Python in a Nutshell
When a module is loaded, __import__ first checks whether the
module is built-in. The tuple sys.builtin_module_names names
all built-in modules, but rebinding that tuple does not affect
module loading.
The search for built-in modules also looks for modules in
platform-specific locations, such as the Registry in Windows.
If module M is not built-in, __import__ looks for M ’s code as a file on the filesystem.
__import__ looks at the strings, which are the items of list sys.path, in order.
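You can check this yourself; on CPython 3, itertools is compiled into the interpreter while collections is an ordinary .py file (the exact set of built-ins depends on the build):

import sys

print("itertools" in sys.builtin_module_names)    # True: found before sys.path is consulted
print("collections" in sys.builtin_module_names)  # False: found via sys.path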
From Learning Python
In many cases, you can rely on the automatic nature of the module
import search path and won’t need to configure this path at all. If
you want to be able to import user-defined files across directory
boundaries, though, you will need to know how the search path works in
order to customize it. Roughly, Python’s module search path is
composed of the concatenation of these major components, some of which
are preset for you and some of which you can tailor to tell Python
where to look:
1. The home directory of the program
2. PYTHONPATH directories (if set)
3. Standard library directories
4. The contents of any .pth files (if present)
5. The site-packages home of third-party extensions
Are the five places in Learning Python stored in sys.path?
Are the five places in Learning Python searched only after failing to find a builtin module in sys.builtin_module_names?
Is "3. Standard library directories" not including the builtin modules? Where are the builtin modules stored? What are the relations between "3. Standard library directories" and the builtin modules?
Thanks.
This is only a partial answer but it may help clear up some concepts.
Builtin modules are typically implemented in C (at least for CPython) and compiled. These are listed in sys.builtin_module_names. Examples of such modules are sys, itertools and _collections (note the leading underscore).
Then there are the standard library modules. These are normal Python files; on Windows they are located in a folder lib inside your Python installation. For example collections (without underscore) or copy...
Then there are installed extension modules. These can be compiled modules, normal Python files, etc. On Windows these are found in the site-packages folder inside the lib folder.
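To see where each category lives on your machine (paths differ per installation; built-in modules have no __file__ attribute at all):

import sys   # built-in: compiled into the interpreter
import copy  # standard library: a plain .py file

print(hasattr(sys, "__file__"))  # False for built-ins
print(copy.__file__)             # e.g. ...\Python\lib\copy.py
print("_collections" in sys.builtin_module_names)  # True on CPython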
If I look at my sys.path:
['', # current directory
'...\\Python\\python35.zip', # no idea
'...\\Python\\DLLs', # compiled modules (not builtins)
'...\\Python\\lib', # standard library modules
'...\\Python', # Python installation folder
'...\\Python\\lib\\site-packages', # installed modules
...]
It seems like 1, 3 and 5 are included in my sys.path, so maybe 2 and 4 (if set) would be included there as well. But that could also be something Windows-specific.
As for your title question:
Does Python search for a builtin module before searching for a nonbuiltin module?
Yes! The builtins are searched first, before it progresses to look for a module in the current directory or the standard library (or in the installed modules).
For example, if you have a sys.py and a copy.py file in your current working directory and you try:
>>> import sys # a builtin module
>>> sys
<module 'sys' (built-in)> ... not the one from the current working directory
>>> import copy # a standard library module
>>> copy.__file__
... current working directory ... not the one from the standard library
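If you want to see where a name would be resolved without importing it, importlib.util.find_spec is useful (the second path shown here is illustrative):

>>> import importlib.util
>>> importlib.util.find_spec("sys").origin
'built-in'
>>> importlib.util.find_spec("copy").origin
'...copy.py'

The second origin points at the copy.py in the current working directory rather than the standard library one, matching the behavior above.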
If you're really interested in the specifics it would probably be a good idea to consult the importlib documentation and the referred PEPs. Personally, I wouldn't go down that path just out of curiosity.
In Python (CPython) we can import a module:
import module, where module can be just a *.py file (with Python code) or a module written in C/C++ (extending Python). So such a module is just a compiled object file (like *.so/*.o on Unix).
I would like to know how exactly it is executed by the interpreter.
I think that a Python module is compiled to bytecode and then interpreted. In the case of a C/C++ module, functions from such a module are just executed: jump to the address and start execution.
Please correct me if I am wrong, and please say more.
When you import a C extension, Python uses the platform's shared library loader to load the library and then, as you say, jumps to a function in the library. But you can't load just any library or jump to any function this way. It only works for libraries specifically implemented to support Python, and for functions that are exported by the library as Python objects. The library must understand Python objects and use those objects to communicate.
Alternatively, instead of importing, you can use a foreign-function library like ctypes to load the library and convert data to the C view of data to make calls.
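For example, here is a minimal ctypes sketch calling a plain C function from the standard C library; it assumes a Unix-like system where find_library("c") resolves:

import ctypes
import ctypes.util

# Load libc as an ordinary shared library (no Python-specific support needed).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts the argument and result.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5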
Can you recommend a well-structured Python module combining both compiled C code (e.g. using distutils) and interpreted source code? I gather that "packages" can roll up interpreted modules and compiled modules, but I'm at a loss as to whether it's possible to combine both compiled and interpreted sources into a single module. Does such a thing exist?
If not, is The Right Thing (TM) to have a package with from-import statements loading the public symbols from separated compiled and interpreted submodules?
You cannot have one module with both Python and C. Every .py file is a module, and C files are compiled and built into .so or .pyd files, each of which is a module. You can import the compiled module into the Python file and use them together.
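A minimal sketch of that pattern; mypkg, pure, _speedups, helper and fast_path are all hypothetical names used only for illustration:

# mypkg/__init__.py
# Hypothetical package layout:
#   mypkg/pure.py       - plain Python submodule
#   mypkg/_speedups.so  - compiled C extension submodule (.pyd on Windows)

# Re-export the public names so callers can simply `import mypkg`.
from mypkg.pure import helper
from mypkg._speedups import fast_path

__all__ = ["helper", "fast_path"]

From the caller's point of view, mypkg then behaves like a single module even though its implementation is split across interpreted and compiled files.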
If you want some ultra-simple examples, you might like A Whirlwind Excursion through Python C Extensions.