Why are so many modules initially loaded in CPython? - python

The python docs state:
A complete Python program is executed in a minimally initialized environment: all built-in and standard modules are available, but none have been initialized, except for sys (various system services), builtins (built-in functions, exceptions and None) and __main__.
This would suggest that only those three modules should be listed as loaded modules with the following code snippet:
import sys
print(sys.modules.keys())
However, running the code snippet using CPython v3.10 (with -S option) returns the following on my PC:
dict_keys(['sys', 'builtins', '_frozen_importlib', '_imp', '_thread', '_warnings', '_weakref', '_io', 'marshal', 'nt', 'winreg', '_frozen_importlib_external', 'time', 'zipimport', '_codecs', 'codecs', 'encodings.aliases', 'encodings', 'encodings.utf_8', 'encodings.cp1252', '_signal', '_abc', 'abc', 'io', '__main__'])
Why are there 22 extra modules loaded at runtime as compared to the "minimally initialized environment" mentioned in the docs?
I am updating my understanding of CPython's extra loaded modules with my own answer below.

What I have found so far:
The majority of the extra modules are active to provide the import keyword functionality, and for text encodings.
During interpreter initialisation, the import functionality is provided by importing _frozen_importlib and _imp (during pycore_interp_init, within init_importlib)
_setup() in importlib's _bootstrap.py then imports _thread, _warnings, and _weakref because they are builtin modules that are explicitly imported during bootstrap, and hence not really extra modules.
Interpreter initialisation then imports _frozen_importlib_external (during init_interp_main, within init_importlib_external)
importlib's _bootstrap_external.py imports four new module dependencies: _io, marshal, nt, and winreg. If not on Windows, posix is imported instead of nt and winreg.
_io is imported because it is a builtin module explicitly imported during bootstrap, and hence not really an extra module.
marshal is imported because it is used to load/dump bytecode from/to .pyc files.
nt/posix is imported because it is used for operating system functions such as reading the current working directory.
winreg is imported because it is used to find modules declared in the windows registry.
As part of importing _frozen_importlib_external, the interpreter initialization then imports zipimport, presumably to allow for opening zip-format python archives
As part of importing zipimport, the only new dependency is the time module which is imported. The only use is time.mktime() to "convert the date/time values found in the Zip archive to a value that's compatible with the time stamp stored in .pyc files"
After _frozen_importlib_external (and thus after import keyword functionality is sorted), the interpreter initialization then imports encodings, presumably for decoding source text.
encodings.aliases is imported because it provides a dictionary of names to map to known encodings.
codecs is imported as it is a dependency of encodings
_codecs is imported, presumably because it is the C version of codecs?
encodings.utf_8 is then imported, presumably because it is the default encoding.
Because we are on Windows, encodings.cp1252 is also imported (encodings.latin_1 is imported instead if on Linux).
The interpreter initialization then imports _signal, presumably for the interpreter to deal with signal handling.
io is then fully imported, presumably to open source files?
abc is then imported, as it is a dependency of io?
_abc is then imported, presumably because it is the C version of abc?
(On Linux, readline is also imported)
And thus 22 extra 'modules' are loaded when using CPython.
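A rough way to see where each of these modules comes from is to group sys.modules by the origin recorded in each module's __spec__. This is only a sketch: the helper name classify_loaded_modules is made up for illustration, and modules without a spec (such as __main__ when run as a script) are simply lumped under "other".

```python
import sys


def classify_loaded_modules():
    """Group the currently loaded modules by the origin stored in their spec."""
    groups = {"built-in": [], "frozen": [], "file": [], "other": []}
    # sorted() takes a snapshot of the names, so the dict is not iterated directly.
    for name in sorted(sys.modules):
        module = sys.modules.get(name)
        spec = getattr(module, "__spec__", None)
        origin = getattr(spec, "origin", None)
        if origin in ("built-in", "frozen"):
            groups[origin].append(name)
        elif origin is not None:
            groups["file"].append(name)  # origin is normally a path on disk
        else:
            groups["other"].append(name)  # no spec or no origin recorded
    return groups


if __name__ == "__main__":
    for origin, names in classify_loaded_modules().items():
        print(f"{origin}: {names}")
```

Running this with python -S should make it easier to compare the built-in and frozen modules against the list above; the exact output depends on the platform and Python version.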

Related

Using Pylint with PyModule generated using PyO3 and maturin

Pylint will not recognize any of the functions from a PyModule that I created using PyO3 and maturin. All of the functions import and run fine in the python code base, but for some reason Pylint is throwing E1101: no-member warnings.
Below is a (likely) incomplete dummy example, but is provided in order to show the way I am decorating using pymodule and pyfunction:
#[pyfunction]
fn add_nums(
    _py: Python<'_>,
    a: f32,
    b: f32,
) -> PyResult<f32> {
    let res: f32 = a + b;
    Ok(res)
}
#[pymodule]
fn my_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(add_nums, m)?)?;
    Ok(())
}
Then if I build that using maturin build --release and install the module, from the resulting wheelfile, into my python environment and import into a script:
import my_module
my_module.add_nums(5, 6) # ignore that these are not f32 - irrelevant this is a dummy example
If I then run pylint on that file (from the terminal; the VS Code pylint extension actually does not complain about this), I end up with something like: E1101: Module 'my_module' has no 'add_nums' member (no-member), even though the code (not this code, but the real code, which I cannot include here) runs just fine.
Has anyone successfully built wheelfiles using maturin, used them in another project, then had Pylint play nicely with that project and recognize that the methods do actually exist?
Pylint has an extension-pkg-allow-list setting which you can use to inspect non-python modules. It will need to load the extension into pylint's interpreter though, which is why it's not enabled by default.
There are also requests to support (and lint) pyi, but AFAIK that's not supported yet, cf #2873 and #4987.
Before Pylint 2.8, the setting is extension-pkg-whitelist.
Similar to the answer by @Masklinn, except that it looks like the term 'extension-pkg-whitelist' exists in older versions, where the later 'extension-pkg-allow-list' does not (it was introduced for obvious societal reasons).
Add the following to the [MASTER] section of your .pylintrc:
[MASTER]
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-allow-list=my_module
For versions where this is not supported (before Pylint 2.8, per the answer above), use extension-pkg-whitelist instead.

Invoking python3 with no arguments results in the interpreter opening a script called dis.py in the current directory. How to avoid similar problems?

Invoking the python 3.10.6 interpreter with no arguments produces the following output in the presence of a (possibly empty) file called dis.py in the working directory.
Python 3.10.6 (main, Aug 30 2022, 04:58:14) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Failed calling sys.__interactivehook__
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site.py", line 447, in register_readline
    import rlcompleter
  File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/rlcompleter.py", line 34, in <module>
    import inspect
  File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/inspect.py", line 59, in <module>
    for k, v in dis.COMPILER_FLAG_NAMES.items():
AttributeError: module 'dis' has no attribute 'COMPILER_FLAG_NAMES'
>>>
Clearly I shouldn't have my own files called dis.py lying around in the working directory! Are the names of all of the other modules included by inspect.py also "reserved"? Shouldn't the modules inspect.py imports be somehow fully qualified to stop this from happening? Any suggestions on what best practice might be would be helpful. This is a surprising behaviour and I am not aware of any warnings about it.
I was equally surprised and alarmed after reading this question. I often run python with no arguments just to do some arithmetic or call help(), and I've never stopped to worry about what directory I'm in and whether I trust the code in it.
I did some digging and found that it's because of the order of initialization of the interpreter: A bunch of stuff is imported, then an empty string (meaning the current working directory) is prepended to sys.path, then site.py is imported. And for interactive shells, site.py imports readline and rlcompleter, which imports inspect, which imports many things, like dis and ast. If the current working directory contains readline.py or rlcompleter.py or ast.py or dis.py etc. it will be executed. In Python 3.8 and earlier rlcompleter did not import inspect, and so the number of problematic names was smaller (just readline and rlcompleter as far as I can tell).
This has been addressed in Python 3.11.0 beta 1, in two ways:
gh-92345: pymain_run_python() now imports readline and rlcompleter before sys.path is extended to include the current working directory of an interactive interpreter.
gh-57684: Add the -P command line option and the PYTHONSAFEPATH environment variable to not prepend a potentially unsafe path to sys.path.
Note that -P and PYTHONSAFEPATH address a broader issue. Even non-interactive python interpreters normally prepend something to sys.path, either the current working directory or the directory containing the main script, and that can have malicious or accidental effects. Disabling that sys.path modification avoids those pitfalls but is backward-incompatible with some existing code. Still, I hope -P can become the default behavior someday.
What can you do today, with Python 3.10 (or earlier)? One idea would be to set your PYTHONSTARTUP environment variable (which affects only interactive interpreters) to point to a script that removes any empty string from the front of sys.path, like so:
import sys
if sys.path and not sys.path[0]:
    sys.path.pop(0)
On occasions where you need to interactively import things from the current directory, you can either start the python shell with PYTHONSTARTUP= python3 or interactively enter sys.path.insert(0,'').

Reloading Python Packages

The main module sits within the runner package and executes code in the other packages. The main module can also update the other packages, and when that happens I want to reload them in order to pick up the new functions/modules that were added to those packages.
Project Structure
|--runner
|----main.py
|--core
|----module_1.py
|--configurations
|--utils
But that doesn't work.
I tried the following commands:
importlib.reload - only reloads a single module; using it recursively with sys.modules didn't add the new modules to the import tree. Example: if, after the update, "core" received a new module "module_new.py" and it's imported in "module_1.py", it's not recognized after the reload.
I tried using IPython.lib.deepreload - it didn't work either.
I've been stuck with this issue for some time, and haven't found any working solution yet.
Suggestions? Thanks
I fixed the issue by restarting the entire program using a while loop from an outer execution script.
# Exit code 2: update required
do {
    $process = Start-Process python -ArgumentList $CommandLine -Verb RunAs -PassThru -WindowStyle Minimized -Wait
} while ($process.ExitCode -eq 2)
Modules will be reloaded by the import statement if they are not in the sys.modules dict:
import sys
# import some standard (non-updatable) modules
import numpy as np
# save the set of non-reloadable modules on the first run,
# and delete the reloadable modules on later runs
if 'init_modules' not in globals():
    init_modules = set(sys.modules.keys())
else:
    modules = list(sys.modules.keys())
    for m in modules:
        if m not in init_modules:
            del sys.modules[m]
# import reloadable packages and modules
import MyPackage
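The same idea can be narrowed to specific packages rather than "everything imported after the first run". The following is a minimal sketch under that assumption; reimport_packages is a made-up helper name, and any code that still holds references to the old module objects (for example via from core import something) will keep using the old versions.

```python
import importlib
import sys


def reimport_packages(package_names):
    """Drop the named packages and their submodules from sys.modules, then import them again."""
    prefixes = tuple(name + "." for name in package_names)
    for mod_name in list(sys.modules):
        if mod_name in package_names or mod_name.startswith(prefixes):
            del sys.modules[mod_name]
    # Make files created since the last import visible to the path finders.
    importlib.invalidate_caches()
    return {name: importlib.import_module(name) for name in package_names}
```

With the project layout from the question, a call might look like reimport_packages(["core", "utils"]) right after the update step, followed by re-binding any names the runner imported from those packages.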

Where is the _imp module located in Python 3.4?

I was trying to understand how relative imports work with regular packages, and I was looking at my Python 3.4.3 folder for examples. In a file called machinery.py that I found in C:\Python34\Lib\importlib (I installed 3.4 in C:\Python34), I found this line:
import _imp
Where is this module located? Just curious about it. I tried doing the cmd search dir _imp.py /s on my Windows laptop from C:\Python34, but I didn't find anything. Out of a wild guess, I was thinking it may be some low level C library, so I tried searching with dir _imp.lib /s, but I didn't get anything there either.
As help(_imp) says:
_imp - (Extremely) low-level import machinery bits as used by importlib and imp.
It's a built-in module:
>>> _imp
<module '_imp' (built-in)>
and the source can be found in Python/import.c.
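A quick way to confirm that a module is compiled into the interpreter, and therefore has no file to search for, is to look it up in sys.builtin_module_names. The helper name below is just for illustration.

```python
import sys


def is_builtin_module(name):
    """Return True if `name` is compiled into the interpreter itself."""
    return name in sys.builtin_module_names


if __name__ == "__main__":
    for name in ("_imp", "sys", "json"):
        print(name, is_builtin_module(name))
```

Which modules are built in varies between platforms and builds, so results for names other than core ones like sys and _imp may differ.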

How do I find the location of Python module sources?

How do I learn where the source file for a given Python module is installed? Is the method different on Windows than on Linux?
I'm trying to look for the source of the datetime module in particular, but I'm interested in a more general answer as well.
For a pure python module you can find the source by looking at themodule.__file__.
The datetime module, however, is written in C, and therefore datetime.__file__ points to a .so file (there is no datetime.__file__ on Windows), and therefore, you can't see the source.
If you download a python source tarball and extract it, the modules' code can be found in the Modules subdirectory.
For example, if you want to find the datetime code for python 2.6, you can look at
Python-2.6/Modules/datetimemodule.c
You can also find the latest version of this file on github on the web at
https://github.com/python/cpython/blob/main/Modules/_datetimemodule.c
Running python -v from the command line should tell you what is being imported and from where. This works for me on Windows and Mac OS X.
C:\>python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# C:\Python24\lib\site.pyc has bad mtime
import site # from C:\Python24\lib\site.py
# wrote C:\Python24\lib\site.pyc
# C:\Python24\lib\os.pyc has bad mtime
import os # from C:\Python24\lib\os.py
# wrote C:\Python24\lib\os.pyc
import nt # builtin
# C:\Python24\lib\ntpath.pyc has bad mtime
...
I'm not sure what those bad mtime's are on my install!
I realize this answer is 4 years late, but the existing answers are misleading people.
The right way to do this is never __file__, or trying to walk through sys.path and search for yourself, etc. (unless you need to be backward compatible beyond 2.1).
It's the inspect module—in particular, getfile or getsourcefile.
Unless you want to learn and implement the rules (which are documented, but painful, for CPython 2.x, and not documented at all for other implementations, or 3.x) for mapping .pyc to .py files; dealing with .zip archives, eggs, and module packages; trying different ways to get the path to .so/.pyd files that don't support __file__; figuring out what Jython/IronPython/PyPy do; etc. In which case, go for it.
Meanwhile, every Python version's source from 2.0+ is available online at http://hg.python.org/cpython/file/X.Y/ (e.g., 2.7 or 3.3). So, once you discover that inspect.getfile(datetime) is a .so or .pyd file like /usr/local/lib/python2.7/lib-dynload/datetime.so, you can look it up inside the Modules directory. Strictly speaking, there's no way to be sure of which file defines which module, but nearly all of them are either foo.c or foomodule.c, so it shouldn't be hard to guess that datetimemodule.c is what you want.
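As a small illustration of the inspect approach, the sketch below prefers getsourcefile, falls back to getfile for compiled extension modules, and treats the TypeError that inspect raises for built-in modules as "no file". The function name and the returned placeholder string are assumptions made for this example.

```python
import inspect


def module_file_or_builtin(module):
    """Return the source path if available, else the binary path, else a marker for built-ins."""
    try:
        # getsourcefile returns None when there is no Python source (e.g. a .so/.pyd).
        return inspect.getsourcefile(module) or inspect.getfile(module)
    except TypeError:
        # inspect raises TypeError for modules built into the interpreter.
        return "built-in (no file)"


if __name__ == "__main__":
    import datetime
    import json
    import sys

    for mod in (json, datetime, sys):
        print(mod.__name__, "->", module_file_or_builtin(mod))
```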
If you're using pip to install your modules, just run pip show $module and the location is returned.
The sys.path list contains the list of directories which will be searched for modules at runtime:
python -v
>>> import sys
>>> sys.path
['', '/usr/local/lib/python25.zip', '/usr/local/lib/python2.5', ... ]
From the standard library, try imp.find_module:
>>> import imp
>>> imp.find_module('fontTools')
(None, 'C:\\Python27\\lib\\site-packages\\FontTools\\fontTools', ('', '', 5))
>>> imp.find_module('datetime')
(None, 'datetime', ('', '', 6))
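Note that the imp module has been deprecated since Python 3.4 and was removed in Python 3.12. On current versions, a comparable lookup can be sketched with importlib.util.find_spec; the wrapper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return where the import system would load `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is a file path for most modules, or 'built-in' / 'frozen' otherwise.
    return spec.origin


if __name__ == "__main__":
    for name in ("json", "sys", "no_such_module_xyz_123"):
        print(name, "->", module_origin(name))
```

For dotted names, find_spec imports the parent packages as a side effect, which may matter if importing them is expensive.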
datetime is a builtin module, so there is no (Python) source file.
For modules coming from .py (or .pyc) files, you can use mymodule.__file__, e.g.
> import random
> random.__file__
'C:\\Python25\\lib\\random.pyc'
Here's a one-liner to get the filename for a module, suitable for shell aliasing:
echo 'import sys; t=__import__(sys.argv[1],fromlist=[\".\"]); print(t.__file__)' | python -
Set up as an alias:
alias getpmpath="echo 'import sys; t=__import__(sys.argv[1],fromlist=[\".\"]); print(t.__file__)' | python - "
To use:
$ getpmpath twisted
/usr/lib64/python2.6/site-packages/twisted/__init__.pyc
$ getpmpath twisted.web
/usr/lib64/python2.6/site-packages/twisted/web/__init__.pyc
In the python interpreter you could import the particular module and then type help(module). This gives details such as Name, File, Module Docs, Description et al.
Ex:
import os
help(os)
Help on module os:
NAME
os - OS routines for Mac, NT, or Posix depending on what system we're on.
FILE
/usr/lib/python2.6/os.py
MODULE DOCS
http://docs.python.org/library/os
DESCRIPTION
This exports:
- all functions from posix, nt, os2, or ce, e.g. unlink, stat, etc.
- os.path is one of the modules posixpath, or ntpath
- os.name is 'posix', 'nt', 'os2', 'ce' or 'riscos'
et al
On Windows you can find the location of a Python module as shown below, e.g. to find the rest_framework module:
New in Python 3.2, you can now use e.g. code_info() from the dis module:
http://docs.python.org/dev/whatsnew/3.2.html#dis
Check out this nifty "cdp" command to cd to the directory containing the source for the indicated Python module:
cdp () {
    cd "$(python -c "import os.path as _, ${1}; \
        print _.dirname(_.realpath(${1}.__file__[:-1]))"
    )"
}
Just updating the answer in case anyone needs it now, I'm at Python 3.9 and using Pip to manage packages. Just use pip show, e.g.:
pip show numpy
It will give you all the details with the location of where pip is storing all your other packages.
On Ubuntu 12.04, for example numpy package for python2, can be found at:
/usr/lib/python2.7/dist-packages/numpy
Of course, this is not generic answer
Another way to check if you have multiple python versions installed, from the terminal.
$ python3 -m pip show pyperclip
Location: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-
$ python -m pip show pyperclip
Location: /Users/umeshvuyyuru/Library/Python/2.7/lib/python/site-packages
Not all python modules are written in python. Datetime happens to be one of them that is not, and (on linux) is datetime.so.
You would have to download the source code to the python standard library to get at it.
For those who prefer a GUI solution: if you're using a gui such as Spyder (part of the Anaconda installation) you can just right-click the module name (such as "csv" in "import csv") and select "go to definition" - this will open the file, but also on the top you can see the exact file location ("C:....csv.py")
If you are not using interpreter then you can run the code below:
import site
print(site.getsitepackages())
Output:
['C:\\Users\\<your username>\\AppData\\Local\\Programs\\Python\\Python37', 'C:\\Users\\<your username>\\AppData\\Local\\Programs\\Python\\Python37\\lib\\site-packages']
The second element in the list will be your package location. In this case:
C:\Users\<your username>\AppData\Local\Programs\Python\Python37\lib\site-packages
In an IDE like Spyder, import the module and then run the module individually.
As written above, in Python just use help(module), i.e.:
import fractions
help(fractions)
If your module (fractions in this example) is installed, help will tell you its location and information about it; if it's not installed, it says the module is not available.
If it's not available, it doesn't come with Python by default, in which case you can check where you found it for download information.
