Recently I have been reading the official documentation on Python's modules and import system:
https://docs.python.org/3/reference/import.html
https://docs.python.org/3/tutorial/modules.html
https://docs.python.org/3/reference/simple_stmts.html#import
I noticed that sys.modules holds all the modules that have been loaded.
If I run a script like
import sys
print(sys.modules.keys())
I get the names of the modules that are loaded by default, apart from sys. (I imported sys explicitly, but it may already have been loaded by default: the import statement first checks whether the requested module is already loaded, and only loads and initializes it if it is not.)
I found that there is a set of modules called built-in modules, but when I checked against sys.builtin_module_names I found that only some of them are loaded by default. I also noticed that some of the modules loaded by default come from the Python standard library: https://docs.python.org/3/tutorial/modules.html#standard-modules https://docs.python.org/3/library/. (Presumably the standard library also contains all the built-in modules.)
So I want to know what kind of modules these are that get loaded by default. Is there an official explanation?
The answer is in the documentation for "standard modules" that you linked to:
Python comes with a library of standard modules, described in a
separate document, the Python Library Reference (“Library Reference”
hereafter). Some modules are built into the interpreter; these provide
access to operations that are not part of the core of the language but
are nevertheless built in, either for efficiency or to provide access
to operating system primitives such as system calls.
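To see how the two sets from the question relate on a given interpreter, they can be compared directly. This is only a small sketch; the variable names are made up for illustration, and the exact contents differ between Python versions, platforms and builds.

```python
import sys

# Names of everything that has been imported so far in this interpreter.
loaded = set(sys.modules)
# Names of the modules compiled into the interpreter binary itself.
builtin = set(sys.builtin_module_names)

loaded_builtin = sorted(loaded & builtin)    # built in and already loaded
unloaded_builtin = sorted(builtin - loaded)  # built in, not loaded yet
loaded_other = sorted(loaded - builtin)      # loaded, but not built in

print("built-in and loaded:    ", loaded_builtin)
print("built-in but not loaded:", unloaded_builtin)
print("loaded but not built-in:", loaded_other)
```

The third list is where the standard library modules that the interpreter imports during startup show up.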
I can make out this much: its build system generates a shared object, and also a Python module that acts as a proxy to that shared object. But how does the Python runtime end up calling dlopen on the generated shared object and binding Python method calls to the corresponding functions in the shared library?
I also found a reference in Python's import system about shared libraries, but nothing beyond that. Does CPython treat .so files in the module path as Python modules?
When doing import module Python will look for files with various extensions that could be python modules. That can be module.py, but also module.so (on Linux) or module.pyd (on Windows).
When loading a shared object, Python will load it like any dynamic library and then call the module's init function. It must be named PyInit_{module_name_here} (init{module_name_here} in Python 2) and be exported from the shared library.
You can read more about it here.
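The file extensions the import system recognises can be inspected from Python 3 itself. The following is a rough sketch, not part of the original answer: describe_module is a hypothetical helper, and the result for any particular module depends on how the interpreter was built (math, for example, may be built in on one platform and a separate extension file on another).

```python
import importlib.machinery
import importlib.util

# Suffixes the import system tries for source files and for
# extension modules on this platform.
print(importlib.machinery.SOURCE_SUFFIXES)
print(importlib.machinery.EXTENSION_SUFFIXES)


def describe_module(name):
    """Classify a top-level module by where the import system finds it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.origin in ("built-in", "frozen"):
        return spec.origin
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "source"
    return "other"


for name in ("sys", "json", "math"):
    print(name, "->", describe_module(name))
```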
I'm writing a set of Python modules that are essentially utility modules for other code that is contained in dynamic libraries. To make packaging and use easier, what I'd like to do is bundle them inside the library (on Windows as a string resource, on Linux I'm not sure yet - probably export a function that returns the string). Now I'm wondering if there is a way to import a Python module as a string literal of its source code. So essentially the equivalent of
mymod = "def func():\n return 1"
import(mymod)
Any imports in the imported module itself should also work. Ideally I'm thinking of some way of providing a callback function that is passed the name of a module and returns a string with its contents, so that I can recursively have my modules be loaded; but as a backup I can also live with the situation where upon doing the import of the main module, I would do the same thing in the init in python (i.e. dynamically load any dependencies manually - of course I don't like doing things manually, hence why this is a fallback :) )
Oh and I'd like this to work in Python 2.7 and 3, if that makes a difference...
If you are looking for a way to execute code from a string (which is not recommended; you might be better off going through the whole process of writing a setup.py to make your library installable), you can use exec (a statement in Python 2, a function in Python 3) as follows
exec(mymod)
This will parse the string as it would normal Python source and execute it, leaving you with its side effects (such as defining functions and variables). This will work in both Python 2.7 and 3.x. See the documentation here and here for more details.
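A bare exec(mymod) defines func in the current namespace rather than in a module called mymod. One way to get closer to what the question asks for is to execute the source inside a fresh module object and register it in sys.modules. This is a sketch, written for Python 3; the two-argument exec form used here should also be accepted by Python 2.7, but that has not been tested.

```python
import sys
import types

source = "def func():\n    return 1\n"

# Create an empty module object and run the source in its namespace.
module = types.ModuleType("mymod")
exec(source, module.__dict__)

# Register it so that a later "import mymod" finds it without
# searching the file system.
sys.modules["mymod"] = module

import mymod
print(mymod.func())  # prints 1
```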
Alternatively, in Python 2.7 only, execfile does the same thing as exec but for a text file
execfile("path/to/my/mod")
The documentation explains what it does and doesn't do.
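For the callback idea from the question (a function that is handed a module name and returns its source, so that nested imports are resolved the same way), the import system lets you register your own finder on sys.meta_path. The following is a minimal Python 3 sketch and does not work on 2.7 as written. StringImporter, SOURCES and myhelper are hypothetical names invented for the example, and packages are not handled.

```python
import importlib.abc
import importlib.util
import sys


class StringImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that asks a callback for module source code."""

    def __init__(self, get_source):
        # get_source(name) returns the source string, or None if unknown.
        self._get_source = get_source

    def find_spec(self, fullname, path=None, target=None):
        if self._get_source(fullname) is None:
            return None  # let the other finders try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._get_source(module.__name__)
        code = compile(source, "<string:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


# Stand-in for sources that would come from the dynamic library.
SOURCES = {
    "mymod": "import myhelper\n\ndef func():\n    return myhelper.VALUE\n",
    "myhelper": "VALUE = 1\n",
}

sys.meta_path.append(StringImporter(SOURCES.get))

import mymod  # also triggers the import of myhelper via the callback
print(mymod.func())  # prints 1
```

Appending to sys.meta_path (rather than inserting at the front) means modules found on disk take precedence over the string sources; that ordering is a design choice, not a requirement.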
Can you recommend a well-structured Python module combining both compiled C code (e.g. using distutils) and interpreted source code? I gather that "packages" can roll up interpreted modules and compiled modules, but I'm at a loss if it's possible to combine both compiled and interpreted sources into a single module. Does such a thing exist?
If not, is The Right Thing (TM) to have a package with from-import statements loading the public symbols from separated compiled and interpreted submodules?
You cannot have one module with both Python and C. Every .py file is a module, and C files are compiled and built into .so or .pyd files, each of which is a module. You can import the compiled module into the Python file and use them together.
If you want some ultra-simple examples, you might like A Whirlwind Excursion through Python C Extensions.
I'm trying to write a software plug-in that embeds Python. On Windows the plug-in is technically a DLL (this may be relevant). The Python Windows FAQ says:
1. Do not build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to pythonNN.dll; it is typically installed in C:\Windows\System. NN is the Python version, a number such as “23” for Python 2.3.
My question is: why exactly must Python be a DLL? If, as in my case, the host application is not an .exe but also a DLL, could I build Python into it? Or, perhaps, this note means that third-party C extensions rely on pythonN.N.dll being present and another DLL won't do? Assuming that I really want to have a single DLL, what should I do?
I see there's the dynload_win.c file, which appears to be the module to import C extensions on Windows and, as far as I can see, it scans the extension file to find which pythonX.X.dll it imports; but I'm not experienced with Windows and I don't quite understand all the code there.
You need to link to pythonXY.dll as a DLL, instead of linking the relevant code directly into your executable, because otherwise the Python runtime can't load other DLLs (the extension modules it relies on.) If you make your own DLL you could theoretically link all the Python code in that DLL directly, since it doesn't end up in the executable but still in a DLL. You'll have to take care to do the linking correctly, however, as pretty much none of the standard tools (like distutils) will do this for you.
However, regardless of how you embed Python, you can't make do with just the DLL, nor can you make do with just any DLL. The ABI changes between Python versions, so if you compiled your code against Python 2.6, you need python26.dll; you can't use python25.dll or python27.dll. And Python isn't just a DLL; it also needs its standard library, which includes extension modules (which are DLLs themselves, although they have the .pyd extension.) The code in dynload_win.c you ran into is for loading those DLLs, and is not related to loading of pythonXY.dll.
In short, in order to embed Python in your plugin, you need to either ship Python with the plugin, or require that the right Python version is already installed.
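To check which DLL name a given interpreter version corresponds to under the pythonXY.dll convention described above, something like the following can be used. expected_dll_name is a hypothetical helper written for this illustration; it only builds the name and does not check that the file exists.

```python
import sys


def expected_dll_name(version_info=sys.version_info):
    """Build the pythonXY.dll name for a (major, minor, ...) version tuple."""
    return "python%d%d.dll" % (version_info[0], version_info[1])


print(expected_dll_name((2, 6)))  # prints python26.dll
print(expected_dll_name())        # name for the running interpreter
```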
(Sorry, I did a stupid thing, I first wrote the question, and then registered, and now I cannot alter it or comment on the replies, because StackOverflow's engine doesn't think I'm the author. I cannot even properly thank those who replied :( So this is actually an update to the question and comments.)
Thanks for all the advice, it's very valuable. As far as I understand with some effort I can link Python statically into a custom DLL, provided that I compile other dynamically loaded extensions myself and link them against the same DLL. (I know I need to ship the standard library too; my plan was to append a zipped archive to the DLL file. As far as I understand, I will even be able to import pure Python modules from it.)
I also found an interesting place in dynload_win.c. (I understand it loads dynamic extensions that use Python C API, e.g. _ctypes.) As far as I can see it not only looks for init_ctypes symbol or whatever the extension name is, but also scans the .pyd file's import table looking for (regex) python\d+\. and then compares the found symbol with known pythonNN. string to make sure the extension was compiled for this version of Python. If the import table doesn't have such a symbol or it refers to another version, it raises an error.
For me it means that:
If I link an extension against pythonNN.dll and try to load it from my custom DLL that includes a statically linked Python, it will pass the check, but here I'm not sure: will it fail because there's no pythonNN.dll (i.e. even before getting to the check), or will it happily load the symbols?
And if I link it against my custom DLL, it will find symbols, but won't pass the check :)
I guess I could rewrite this piece to suit my needs... Are there any other such places, I wonder.
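The version check as described in this post can be modelled in a few lines of Python. This is only an illustration of the logic as the post describes it, not a translation of dynload_win.c; the real C code may differ in details. The function names and the list of imported DLL names are made up for the example.

```python
import re
import sys

# Same pattern as quoted in the post: "python", digits, then a dot.
PYTHON_DLL_RE = re.compile(r"^python\d+\.", re.IGNORECASE)


def find_python_import(imported_dlls):
    """Return the first imported DLL name that looks like pythonNN.*, or None."""
    for name in imported_dlls:
        if PYTHON_DLL_RE.match(name):
            return name
    return None


def is_compatible(imported_dlls, version_info=sys.version_info):
    """True if the extension imports the pythonNN DLL of this version."""
    expected = "python%d%d." % (version_info[0], version_info[1])
    found = find_python_import(imported_dlls)
    return found is not None and found.lower().startswith(expected)


# Hypothetical import table of an extension built against Python 2.6.
imports = ["KERNEL32.dll", "python26.dll"]
print(is_compatible(imports, (2, 6)))  # prints True
print(is_compatible(imports, (2, 7)))  # prints False
```

Under this model, an extension linked against a custom DLL with a different name would be rejected simply because no matching name is found, which is the second case in the list above.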
Python needs to be a dll (with a standard name) such that your application, and the plugin, can use the same instance of python.
Plugin dlls are already going to expect to be loading (and using python from) a python26.dll (or whichever version) - if your python is statically embedded in your exe, then two different instances of the python library would be managing the same data structures.
If the python libraries use no static variables at all, and the compile settings are exactly the same this should not be a problem. However, generally it's far safer to simply ensure that only one instance of the python interpreter is being used.
On *nix, all shared objects in a process, including the executable, contribute their exported names into a common pool; any of the shared objects can then pull any of the names from the pool and use them as they like. This allows e.g. cStringIO.so to pull the relevant Python library functions from the main executable when the Python library is statically-linked.
On Windows, each shared object has its own independent pool of names it can use. This means that it must explicitly name the other shared objects it needs functions from. Since it is a lot of work to get all the names from the main executable, the Python functions are separated out into their own DLL.
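The same difference is visible from Python through ctypes. ctypes.pythonapi gives access to the C API of the running interpreter: on Windows it is bound to the handle of the loaded Python DLL (sys.dllhandle), while on other platforms it is opened against the process's own symbol namespace. The snippet below is a small sketch that looks up one C API function, Py_GetVersion, by name.

```python
import ctypes
import sys

# Look up a C API function by name in the running interpreter.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.restype = ctypes.c_char_p

version = get_version().decode("ascii")
print(version)

# Only Windows builds expose the handle of the Python DLL.
print(getattr(sys, "dllhandle", "no sys.dllhandle on this platform"))
```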