Python import iteration, sys.path

I'm currently reading a book about Python, and it explains the following:
The import algorithm
To truly understand namespace packages, we have to look under the hood to see how the import operation works in 3.3. During imports, Python still iterates over each directory in the module search path, sys.path, just as in 3.2 and earlier.
My question is: how is Python able to iterate through sys.path when sys has not been imported? And if Python can see sys without importing it, why do we need to import sys in our own code?
>>> sys
NameError: name 'sys' is not defined
>>> import sys
>>> sys
<module 'sys' (built-in)>

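There's no contradiction. Python's sys module exposes to Python code the search path configuration that governs how import behaves, but even without importing sys in your code, the interpreter knows about its own configuration. In fact, sys is created while the interpreter starts up and is already registered in sys.modules before any of your code runs; import sys merely binds the name sys in your module's namespace.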
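A minimal sketch to check this from Python itself: inside a function, the name sys is unbound until the import statement runs, yet the object it binds is the one already cached in sys.modules, and sys is listed among the modules compiled into the interpreter. The function name show_sys_binding is just illustrative.

```python
def show_sys_binding():
    # No local name "sys" exists in this function's namespace yet.
    bound_before = "sys" in locals()

    # The import statement binds a local name to the module object that the
    # interpreter created during startup; nothing new is loaded here.
    import sys

    bound_after = "sys" in locals()
    # The bound object is the very one cached in sys.modules.
    same_object = sys.modules["sys"] is sys
    # sys is compiled into the interpreter rather than read from a .py file.
    is_builtin = "sys" in sys.builtin_module_names
    return bound_before, bound_after, same_object, is_builtin


print(show_sys_binding())  # (False, True, True, True)
```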
The CPython source code contains this comment:
/* _PyMem_SetDefaultAllocator() is needed to get a known memory allocator,
since Py_SetPath(), Py_SetPythonHome() and Py_SetProgramName() can be
called before Py_Initialize() which can changes the memory allocator. */
This means that Py_SetPath(), which sets the module search path, can be called so early (before any Python code, such as an import statement, can be interpreted) that it needs a known memory allocator before the interpreter's own allocator takes over.
By the time the Python interpreter's main() function runs, it can already read the path configuration with Py_GetPath(), which calls the internal function _PyPathConfig_Init() if necessary; this is safe to do even before the interpreter is ready to execute Python code.
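The iteration the book describes can be observed with importlib.machinery.PathFinder, the finder on sys.meta_path that searches a list of directories (sys.path when no list is given). In this minimal sketch, the module name hypothetical_demo_mod and its contents are made up for the demonstration: find_spec() tries each directory in order and returns a spec only when one of them holds the file.

```python
import importlib
import importlib.machinery
import os
import tempfile


def find_in(paths, name):
    # PathFinder tries each directory in `paths` in order and returns the
    # first matching module spec, or None if no directory has the module.
    return importlib.machinery.PathFinder.find_spec(name, paths)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module used only for this demonstration.
    with open(os.path.join(tmp, "hypothetical_demo_mod.py"), "w") as f:
        f.write("VALUE = 42\n")
    empty_dir = os.path.join(tmp, "empty")
    os.mkdir(empty_dir)
    importlib.invalidate_caches()

    # empty_dir is searched first (no match), then tmp (match).
    found = find_in([empty_dir, tmp], "hypothetical_demo_mod")
    # With only empty_dir in the list, the search comes up empty.
    missing = find_in([empty_dir], "hypothetical_demo_mod")

    print(found.origin)  # path of hypothetical_demo_mod.py inside tmp
    print(missing)       # None
```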

Why does Python select this unexpected module to import

I am working with a project that has a user-written module called types.py buried in a second-level package (its path from the project root is package/subpackage/types.py).
This is causing problems because the Python library also has a types module. When enum.py, another Python library module, attempts to import types, the user-written version is imported instead, wreaking havoc.
What's puzzling me is that the import inside enum.py does not qualify types with any package names:
# line 10 of enum.py:
from types import MappingProxyType, DynamicClassAttribute
So why is Python selecting the user-written types, which is two levels down in a subpackage? It seems to me the user-written types would only be imported if one uses
# what I expect an 'import' would have to be like to access the user-written types.py
from package.subpackage.types import ...
Another possible explanation would be that sys.path contained the package/subpackage directory, but this is not the case when I print its content right before the enum.py import:
enum.py: Path:
/home/me/PycharmProjects/myproject
/home/me/anaconda3/envs/myproject/lib/python37.zip
/home/me/anaconda3/envs/myproject/lib/python3.7
/home/me/anaconda3/envs/myproject/lib/python3.7/lib-dynload
/home/me/anaconda3/envs/myproject/lib/python3.7/site-packages
So, how can the importing of the user-written types.py module be explained?
UPDATE: the first comment suggests this happens because my project's path is the first item in sys.path. However, I set up a really simple project in which a module called mymodule is in package.subpackage:
Importing from mymodule without using the package and subpackage names does not work:
# main.py
# Works:
from package.subpackage.mymodule import my_module_field
# Does not work:
# from mymodule import my_module_field
So I still do not understand how the from types import in enum.py can find the user-written types.py without the package names.
UPDATE 2: printing out more information, I see that when I print sys.path as soon as enum.py starts (I modified the standard library file to print it), the package/subpackage directory is in sys.path, even though it was not there at the beginning of execution. So this explains why the user-written types.py is being used.
The issue now is why the package/subpackage directory gets added to sys.path. I searched all occurrences of sys.path in my code, and even though the current directory is appended to it at some points, it is never the package/subpackage directory. Where can this be happening?
Not sure this counts as a real answer because it would not be possible to answer it based on the question information by itself (and adding all the details to the question is impractical). In any case, here's the solution.
Basically, on closer inspection I found out that a script invokes another script as an external process, and this latter script is in the package/subpackage directory, which is added to sys.path in the new process. This is standard behavior: when Python runs a script, it prepends the directory containing that script (not the current working directory) to sys.path, so modules in that directory can shadow same-named standard library modules such as types.
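A minimal sketch reproducing this, assuming a hypothetical child_script.py placed in a temporary package/subpackage directory: the child interpreter reports the script's own directory as sys.path[0]. PYTHONSAFEPATH (Python 3.11+) disables this behavior, so the sketch removes it from the child's environment.

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry_of_script(script_dir):
    """Run a tiny script stored in script_dir in a fresh interpreter and
    return that child process's sys.path[0]."""
    script = os.path.join(script_dir, "child_script.py")  # hypothetical name
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")
    # Drop PYTHONSAFEPATH, which would stop the script dir being prepended.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, script],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    subpackage = os.path.join(tmp, "package", "subpackage")
    os.makedirs(subpackage)
    first = first_path_entry_of_script(subpackage)
    # Compare resolved paths, since temp dirs may sit behind symlinks.
    matches = os.path.realpath(first) == os.path.realpath(subpackage)
    print(matches)  # True
```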

How to use globally imported packages in script

I want to put my interactive commands in a script, but I can't run the same commands in the script.
We are using a heavily packaged version of Python for our tests. We usually run tests in interactive mode, but now I want to place all the commands in a script. Below is an example using the time module.
In interactive mode:
>>> import time
>>> import myscript
In my script:
time.sleep(5)
I expected the script to refer to the globally imported packages and allow me to run sleep, but it says NameError: global name 'time' is not defined
How do I get my script to recognize all packages imported into the interactive terminal? We use thousands of packages in our toolkit, and I can't import them all into my script.
You also have to import these libraries in the .py file where you are going to use them. Python does not make names imported by a higher-level module (or by the interactive session) visible inside your module, and that's the way it should be. In a way, Python forces you to be a better programmer. Your script should be something like this:
import time
time.sleep(5)
If I have Module A:
import time
and Module B:
import A
then in module B I do have access to libraries that A imported, but they must be qualified thus:
A.time.sleep(5)
In short, when you import a module, the names at its top level (including the modules it imported) become accessible to the importer as attributes of that module. But what you are attempting to do is quite different. In essence, you have Module A as:
import time
import B
And module B as:
time.sleep(5)
Module B imports neither the time module nor module A, and therefore has no access to time. Being imported by a module that does have access to time does not give the imported module that access.
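To make the namespace separation concrete without creating files, here is a sketch that builds two in-memory modules with types.ModuleType and exec(). These stand in for real A.py and B.py files, which is an assumption of the example; the function name nap is hypothetical.

```python
import types

# Hypothetical module "A": it imports time at top level.
module_a = types.ModuleType("A")
exec("import time", module_a.__dict__)

# Hypothetical module "B": it uses time without importing it.
module_b = types.ModuleType("B")
exec("def nap():\n    return time.monotonic()\n", module_b.__dict__)

# A's import created the global name "time" in A's namespace only.
print(hasattr(module_a, "time"))  # True
print(hasattr(module_b, "time"))  # False

try:
    module_b.nap()
except NameError as exc:
    print("B fails:", exc)  # B fails: name 'time' is not defined

# The fix: B imports what it needs itself. time is already cached in
# sys.modules, so both modules end up sharing one module object.
exec("import time", module_b.__dict__)
print(module_b.time is module_a.time)     # True
print(isinstance(module_b.nap(), float))  # True
```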

Difference between "import foo.bar" and "import foo"?

I just found that when I do
import pygame.joystick
I not only have access to joystick, but also to display, i.e. I can for example do
pygame.display.init()
just as if I had simply imported pygame.
What is the difference?
What's happening is that importing pygame.joystick triggers additional imports: the pygame package itself, pygame.joystick, or one of the pygame.* modules these two import happens to import pygame.display somewhere.
So the fact that you can now reference pygame.display is an accident of implementation details. You may not be able to in future versions (if the project no longer needs to import pygame.display to load pygame.joystick, for example).
It is better to stick to an explicit import in your own project.
On import, the module is added to sys.modules, its top-level code is executed, and it is made available for use. Depending on what the source file contains, anything can happen.
In your case, either pygame/__init__.py or pygame/joystick.py contains something like:
import pygame.display
Hence the availability of the module you weren't even trying to import.
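A minimal sketch of the mechanism, using a throwaway package named hypothetical_pkg as a stand-in for pygame (its layout is assumed for illustration; pygame's real structure may differ): only the joystick submodule is requested, but because it imports a sibling, display becomes an attribute of the package.

```python
import importlib
import os
import sys
import tempfile

PKG = "hypothetical_pkg"  # stands in for pygame

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, PKG)
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        # Like the joystick module, this submodule imports a sibling.
        "joystick.py": f"import {PKG}.display\n",
        "display.py": "def init():\n    return 'display ready'\n",
    }
    for name, source in files.items():
        with open(os.path.join(pkg_dir, name), "w") as f:
            f.write(source)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        namespace = {}
        # Only the joystick submodule is requested explicitly...
        exec(f"import {PKG}.joystick", namespace)
        pkg = namespace[PKG]
        # ...but display is reachable because joystick imported it.
        result = pkg.display.init()
        print(result)  # display ready
    finally:
        sys.path.remove(tmp)
        for mod in [m for m in sys.modules if m == PKG or m.startswith(PKG + ".")]:
            del sys.modules[mod]
```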
In the source code of joystick, they import pygame.display, pygame and the like.
In the C version of the source code, the file joystick.c includes joystick.h:
#include <joystick.h>

How does the Python interpreter find the module search path?

I'm new to Python, and I found that to see the import search paths you have to import the sys module and then access the list of paths through sys.path. If this list is not available until I explicitly import the sys module, how does the interpreter figure out where this module resides?
Thanks for any explanation.
The module search path always exists, even before you import the sys module. The sys module just makes it available for you.
It is built from the directory of the script being run, the directories listed in the environment variable $PYTHONPATH (if you have set it), and an installation-dependent default.
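A minimal sketch of this: start fresh interpreters with and without PYTHONPATH pointing at a temporary directory and compare their sys.path lists. The helper name child_sys_path is hypothetical; the child uses -c, so no script directory is involved.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath=None):
    """Start a fresh interpreter and return its sys.path as a list."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    code = "import json, sys; print(json.dumps(sys.path))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def resolved(paths):
    # Skip the empty entry that -c contributes; resolve symlinks for comparison.
    return [os.path.realpath(p) for p in paths if p]


with tempfile.TemporaryDirectory() as extra_dir:
    target = os.path.realpath(extra_dir)
    found_default = target in resolved(child_sys_path())
    found_with_env = target in resolved(child_sys_path(extra_dir))
    print(found_default, found_with_env)  # False True
```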
There is a default search path within the interpreter (https://docs.python.org/2/install/#modifying-python-s-search-path):
A default value for the path is configured into the Python binary when the interpreter is built.
BTW, sys is built into the Python interpreter (https://docs.python.org/2/tutorial/modules.html#standard-modules):
One particular module deserves some attention: sys, which is built into every Python interpreter.

Using imported modules in more than one file

This question is a bit dumb but I have to know it. Is there any way to use imported modules inside other imported modules?
I mean, if I do this:
-main file-
import os
import othermodule
othermodule.a()
-othermodule-
def a():
    return os.path.join('/', 'example')  # Without reimporting the os module
The os module is not recognized by the file. Is there any way to "reuse" the os module?
There's no need to do that; Python only loads modules once (unless you unload them).
But if you really have a situation in which a module can't access the standard library (care to explain?), you can reach the os module through the main module instead (e.g. mainfile.os, provided othermodule imports mainfile; modules are just variables once imported into a module namespace).
If the os module is already loaded, you can also access it with sys.modules["os"].
You have to put import os in othermodule.py as well (or instead, if "main file" doesn't need os itself). This is a feature; it means othermodule doesn't have to care what junk is in "main file". Python will not read the files for os twice, so don't worry about that.
If you need to get at the variables in the main file for some reason, you can do that with import __main__, but it's considered a thing to be avoided.
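If you need a module to be re-read after it has already been imported, you probably should be using execfile rather than import. (execfile exists only in Python 2; in Python 3, runpy.run_path() or importlib.reload() serves that purpose.)
Python only imports a module once. Any subsequent import statements just fetch the existing module object from sys.modules.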
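A minimal sketch of this caching, using a throwaway module (the name once_demo_mod is hypothetical) whose top-level code prints a marker: importing it twice executes the body once and returns the same object, while importlib.reload() executes the body again.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

MOD = "once_demo_mod"  # hypothetical module name


def count_module_executions():
    with tempfile.TemporaryDirectory() as tmp:
        # The module body prints a marker each time it is executed.
        with open(os.path.join(tmp, MOD + ".py"), "w") as f:
            f.write('print("module body executed")\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                first = importlib.import_module(MOD)   # executes the body
                second = importlib.import_module(MOD)  # served from sys.modules
                after_imports = buf.getvalue().count("module body executed")
                importlib.reload(first)                # re-executes the body
                after_reload = buf.getvalue().count("module body executed")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MOD, None)
    return first is second, after_imports, after_reload


outcome = count_module_executions()
print(outcome)  # (True, 1, 2)
```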
