How to prevent multiple initialization of dynamic library - python

I am working on Python version 2.7.
I have an extension module for Python written in C.
The module initialization function, PyMODINIT_FUNC initmymodule, contains some code that initializes the OpenSSL library. My module is built as a shared library and loaded via imp.load_dynamic.
This module may be loaded many times, and I can't control that; Django and Python do it. When it is loaded twice, the OPENSSL_config function is called twice too, and that crashes the process.
I can't control it from C code, and I can't control it from Python code.
Look at the docs:
http://docs.python.org/2.7/library/imp.html
It says:
imp.load_dynamic Load and initialize a module implemented as a
dynamically loadable shared library and return its module object. If
the module was already initialized, it will be initialized again.
Nice.
I found that a similar problem was solved in Python 3.4:
http://hg.python.org/cpython/file/ad51ed93377c/Python/import.c#l459
Modules which do support multiple initialization set their m_size
field to a non-negative number (indicating the size of the
module-specific state). They are still recorded in the extensions
dictionary, to avoid loading shared libraries twice.
But what shall I do in Python 2.7?

One possible workaround is to register your own custom import hook, where you can control the case that causes the problem (i.e. prevent double initialization). Some references for writing custom import hooks:
Python import hooks article
PEP-302 New Import Hooks - python 2.3+
create and register custom import/reload functions - example of implementation in project lazy_reload
This is a hackish solution, so I suggest extra caution if you use it in production systems.
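A minimal sketch of the idea for Python 2.7, assuming the repeated loads go through imp.load_dynamic as the question describes: wrap the function so that a second load of the same file returns the cached module instead of re-running its init function. load_once is a hypothetical helper, not part of any library. It only affects callers that look up imp.load_dynamic at call time (not code that did `from imp import load_dynamic` before the patch), and it cannot help when another library initializes OpenSSL on its own.

```python
import functools
import os


def load_once(loader):
    """Hypothetical wrapper: initialize each shared library at most once.

    Memoizes on the resolved file path, so a second call for the same
    .so/.pyd returns the first module object instead of running the
    module's init function (and OPENSSL_config) again.
    """
    cache = {}

    @functools.wraps(loader)
    def wrapper(name, pathname, *args):
        key = os.path.realpath(pathname)
        if key not in cache:
            cache[key] = loader(name, pathname, *args)
        return cache[key]

    return wrapper


# Python 2.7 only (the imp module was removed in Python 3.12):
#     import imp
#     imp.load_dynamic = load_once(imp.load_dynamic)
```

Keying on the real path rather than the module name is deliberate: the crash comes from initializing the same library twice per process, regardless of the name it is loaded under.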

I have found the cause of my problem. My Django application uses a driver to connect to PostgreSQL, and this driver loads the OpenSSL library. That leads to a conflict, just as user315052 showed in this comment.
I think I have to take out all crypto-functionality of my application to a separate process.

Related

Which modules are loaded by default in Python?

Recently I have been reading the official docs about Python's modules and import system.
https://docs.python.org/3/reference/import.html
https://docs.python.org/3/tutorial/modules.html
https://docs.python.org/3/reference/simple_stmts.html#import
I noticed that sys.modules holds all the modules that have been loaded.
If I run a script like
import sys
print(sys.modules.keys())
I get the names of the modules that have been loaded by default, apart from the sys module itself (I imported sys explicitly, but it may be loaded by default anyway, because the import statement first checks whether the module has already been loaded, and only loads and initializes it if it has not).
I found that there is a set of modules called built-in modules, but when I checked with sys.builtin_module_names I found that only some of them are loaded by default. I also noticed that some of the modules loaded by default come from the Python Standard Library: https://docs.python.org/3/tutorial/modules.html#standard-modules https://docs.python.org/3/library/. (Presumably the standard library also includes all the built-in modules.)
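A quick way to see this split is to compare sys.modules with sys.builtin_module_names. A minimal sketch (classify_loaded_modules is a name made up for this example):

```python
import sys


def classify_loaded_modules():
    """Split the names in sys.modules into built-in and non-built-in modules."""
    builtin = set(sys.builtin_module_names)  # compiled into the interpreter
    loaded = set(sys.modules)                # everything imported so far
    return {
        "loaded_builtin": sorted(loaded & builtin),
        "loaded_other": sorted(loaded - builtin),
        "builtin_not_loaded": sorted(builtin - loaded),
    }


if __name__ == "__main__":
    for group, names in classify_loaded_modules().items():
        print(group, len(names), names)
```

The exact lists depend on the Python version, the platform, and whether the site module runs at startup (compare the output under python -S). On Python 3.7+, python -X importtime -c pass also reports every module imported during startup.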
So I want to know what determines which modules are loaded by default. Is there any official explanation of it?
The answer is in the documentation for "standard modules" that you linked to:
Python comes with a library of standard modules, described in a
separate document, the Python Library Reference (“Library Reference”
hereafter). Some modules are built into the interpreter; these provide
access to operations that are not part of the core of the language but
are nevertheless built in, either for efficiency or to provide access
to operating system primitives such as system calls.

What issues with future 3rd-party packages can be expected if unneeded Python 3 core modules are removed?

I have an environment with some extreme constraints that require me to reduce the size of a planned Python 3.8.1 installation. The OS is not connected to the internet, and a user will never open an interactive shell or attach a debugger.
There are of course lots of ways to do this, and one of the ways I am exploring is to remove some core modules, for example python3-email. I am concerned that future developers may include 3rd-party packages in their apps that have unused but required dependencies on core Python features. For example, if python3-email is missing, which 3rd-party packages that one would expect to work might not? If a developer decides to use a logging package that contains an unreferenced EmailLogger class in a referenced module, it will break, simply because import email appears at the top.
Do package design requirements or guidelines exist that address this?
It's an interesting question, but it is too broad to be cleanly answered here. In short, the Python standard library is expected to always be there, even though some distributions (Debian, for example) split it into multiple packages. But as you say yourself, you don't know what your requirements are, since you don't yet know which future packages will run on this interpreter, so this is impossible to answer. One thing you could do is run something like modulefinder on the future code before letting it run on that constrained Python interpreter.
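A minimal sketch of the modulefinder idea: list every module under a given package (here email) that a script would import, directly or transitively. find_imports_of is a hypothetical helper name. Run it with a full Python installation; on the stripped interpreter the removed packages cannot be found and end up in finder.badmodules instead.

```python
import modulefinder


def find_imports_of(script_path, packages):
    """Return the modules under `packages` that `script_path` would import."""
    finder = modulefinder.ModuleFinder()
    finder.run_script(script_path)  # follows imports recursively
    wanted = set(packages)
    # finder.modules maps dotted module names to Module objects
    return sorted(name for name in finder.modules
                  if name.split(".")[0] in wanted)


if __name__ == "__main__":
    import sys
    # usage: python find_email_imports.py app.py
    for name in find_imports_of(sys.argv[1], {"email"}):
        print(name)
```

modulefinder inspects bytecode statically, so imports whose names are computed at runtime (importlib.import_module with a variable, __import__) are not detected.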
I was able to get to a solution. The issue was best described to me as cascading imports. It is possible to stop a module from being loaded by adding an entry to sys.modules. For example, importing the asyncio module also loads the ssl and _ssl modules, even though they are not needed unless SSL is actually used. Setting sys.modules['ssl'] to None prevents this: import ssl then raises ImportError, which asyncio handles. This can be verified both by seeing that the Python process is 3MB smaller and by using module load hooks to watch each module as it loads:
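import sys

import importhook  # third-party package from PyPI

# A None entry makes "import ssl" raise ImportError; asyncio catches it
# and simply runs without SSL support.
sys.modules['ssl'] = None

@importhook.on_import(importhook.ANY_MODULE)
def on_any_import(module):
    # Log every module as it is imported and fail loudly if ssl slips in.
    print(module.__spec__.name)
    assert module.__spec__.name not in ['ssl', '_ssl']

import asyncio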
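Since importhook is a third-party package, a stdlib-only way to check the same effect is to import asyncio in a fresh interpreter and look for _ssl in sys.modules. A minimal sketch, assuming Python 3.7+ (for subprocess.run with capture_output); ssl_loaded_by_asyncio is a name made up for illustration:

```python
import subprocess
import sys

PROBE = """
import sys
if {block}:
    sys.modules['ssl'] = None  # 'import ssl' now raises ImportError
import asyncio
print('_ssl' in sys.modules)
"""


def ssl_loaded_by_asyncio(block_ssl):
    """Import asyncio in a fresh interpreter and report whether _ssl got loaded."""
    result = subprocess.run(
        # -S skips the site module, so sitecustomize cannot import ssl first
        [sys.executable, "-S", "-c", PROBE.format(block=block_ssl)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("default:", ssl_loaded_by_asyncio(False))
    print("ssl blocked:", ssl_loaded_by_asyncio(True))
```

Using a subprocess keeps the measurement independent of whatever the current process has already imported.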
For my original question about 3rd-party design guidelines, some recommend placing the import statements inside the class (or the function that needs them) rather than at the module level; however, this is not routinely done.
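A sketch of that deferred-import pattern, using a hypothetical EmailLogger handler like the one described in the question: the email import moves into the method that needs it, so importing the module that defines the class works even when the email package has been removed.

```python
import logging


class EmailLogger(logging.Handler):
    """Hypothetical handler that needs the email package only when it emits."""

    def emit(self, record):
        # Deferred import: defining or importing this class never touches
        # the email package; an ImportError can only surface on first use.
        from email.message import EmailMessage

        msg = EmailMessage()
        msg["Subject"] = "[%s] %s" % (record.levelname, record.name)
        msg.set_content(self.format(record))
        self.last_message = msg  # a real handler would hand this to smtplib
```

The trade-off is that a missing dependency is reported on first use rather than at import time; after the first call, the import statement costs little more than a lookup in sys.modules.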

GAE Managed VMs: Possible to use C-based Python libraries with standard runtime?

I'm building a background module for my app in Python 2.7, but it needs to use C-based external libraries such as OpenCV. While GAE only "directly" supports pure Python libraries, I understand that using a managed VM removes that constraint. What I'm not quite clear on, after reading the documentation, is whether I would need to use a custom runtime, or whether a standard Python runtime (for which there's a ready-made Docker file and built-in API support for Datastore, Task Queue, etc.) would be sufficient.
Thanks in advance for any insight!
The standard runtime is fine, you just need to add your extra dependencies to the Dockerfile that gets created. The tutorial in the docs (specifically Step 6) shows an example of building a python app that uses a C-extension.

boost module in Python 2.7?

I am trying to debug a file for a project I am working on, and the first thing I made sure to do was install/build all of the modules that the file imports. This is the first line of the file:
from scitbx.array_family import flex
which in turn reads from flex.py,
from __future__ import division
import boost.optional # import dependency
import boost.std_pair # import dependency
import boost.python
I entered the commands in ipython one at a time and got stuck importing boost.optional. Since they all come from the same package, I searched for a module named boost.
I found the site: http://www.boost.org/doc/libs/1_57_0/more/getting_started/unix-variants.html
and installed the related .bz2 file in the same directory as my other modules to make sure it is within sys.path. However I still can't get ipython to import anything. Am I completely off base in my approach or is there some other boost module that I can't find? I should mention that I am a complete novice with computers, and am learning as I go along. Any advice is much appreciated!
The library you have installed is called Boost. This is a collection of C++ libraries, one of which is Boost.Python. However this library doesn't provide Python modules that you can import directly - it doesn't provide boost.optional. Instead it enables interoperability between Python and C++ - you can write a C++ library using Boost.Python that can then be used in a normal Python interpreter.
In your case boost.optional is provided by the CCTBX collection of software, which does depend on Boost and Boost.Python, so you are not too far off. This thread on the mailing list covers your error message and some potential solutions.
Essentially you need to use the custom cctbx.python command (or scitbx.python, they are equivalent) to run python as this sets the PYTHONPATH correctly for their requirements. It's also documented on this page.
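To check which boost package (if any) a given interpreter would pick up, for example when comparing plain python with cctbx.python, a small diagnostic like the following can help. It is a sketch that uses importlib.util.find_spec, so it needs Python 3; on Python 2.7 the closest counterpart is imp.find_module, which does not accept dotted names. locate is a hypothetical helper name.

```python
import importlib.util
import sys


def locate(name):
    """Return where `name` would be imported from, or None if it is not importable."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # The parent package is missing, e.g. "boost" for "boost.optional".
        return None
    if spec is None:
        return None
    return spec.origin or "namespace package"


if __name__ == "__main__":
    print(locate("boost.optional"))
    print("\n".join(sys.path))  # the directories that were searched
```

Running the same script under both interpreters makes the PYTHONPATH difference visible.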

Embedding Python on Windows: why does it have to be a DLL?

I'm trying to write a software plug-in that embeds Python. On Windows the plug-in is technically a DLL (this may be relevant). The Python Windows FAQ says:
1. Do not build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to pythonNN.dll; it is typically installed in C:\Windows\System. NN is the Python version, a number such as “23” for Python 2.3.
My question is why exactly Python must be a DLL? If, as in my case, the host application is not an .exe, but also a DLL, could I build Python into it? Or, perhaps, this note means that third-party C extensions rely on pythonN.N.dll to be present and other DLL won't do? Assuming that I'd really want to have a single DLL, what should I do?
I see there's the dynload_win.c file, which appears to be the module to import C extensions on Windows and, as far as I can see, it scans the extension file to find which pythonX.X.dll it imports; but I'm not experienced with Windows and I don't quite understand all the code there.
You need to link to pythonXY.dll as a DLL, instead of linking the relevant code directly into your executable, because otherwise the Python runtime can't load other DLLs (the extension modules it relies on.) If you make your own DLL you could theoretically link all the Python code in that DLL directly, since it doesn't end up in the executable but still in a DLL. You'll have to take care to do the linking correctly, however, as pretty much none of the standard tools (like distutils) will do this for you.
However, regardless of how you embed Python, you can't make do with just the DLL, nor can you make do with just any DLL. The ABI changes between Python versions, so if you compiled your code against Python 2.6, you need python26.dll; you can't use python25.dll or python27.dll. And Python isn't just a DLL; it also needs its standard library, which includes extension modules (which are DLLs themselves, although they have the .pyd extension). The code in dynload_win.c that you ran into is for loading those DLLs, and is not related to loading pythonXY.dll.
In short, in order to embed Python in your plugin, you need to either ship Python with the plugin, or require that the right Python version is already installed.
(Sorry, I did a stupid thing, I first wrote the question, and then registered, and now I cannot alter it or comment on the replies, because StackOverflow's engine doesn't think I'm the author. I cannot even properly thank those who replied :( So this is actually an update to the question and comments.)
Thanks for all the advice, it's very valuable. As far as I understand with some effort I can link Python statically into a custom DLL, provided that I compile other dynamically loaded extensions myself and link them against the same DLL. (I know I need to ship the standard library too; my plan was to append a zipped archive to the DLL file. As far as I understand, I will even be able to import pure Python modules from it.)
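The plan of appending a zipped standard library to the DLL relies on zipimport accepting an archive that has other bytes in front of it. A minimal sketch that checks this with placeholder bytes instead of a real DLL (append_and_import, plugin.dll and the module name are made up for illustration). Note that zipimport only loads pure Python modules (.py/.pyc); extension modules (.pyd) cannot be imported from a ZIP this way.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def append_and_import(host_path, module_name, source):
    """Append a ZIP holding one module to an existing file, then import it."""
    # Mode "a" on a file that is not already a ZIP appends a new archive
    # after the existing bytes instead of overwriting them.
    with zipfile.ZipFile(host_path, "a") as zf:
        zf.writestr(module_name + ".py", source)
    sys.path.insert(0, host_path)  # zipimport handles this path entry
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(host_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        host = os.path.join(d, "plugin.dll")
        with open(host, "wb") as f:
            f.write(b"MZ" + b"\0" * 1024)  # placeholder for real DLL bytes
        mod = append_and_import(host, "hello_from_zip", "GREETING = 'hi'\n")
        print(mod.GREETING)
```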
I also found an interesting place in dynload_win.c. (I understand it loads dynamic extensions that use Python C API, e.g. _ctypes.) As far as I can see it not only looks for init_ctypes symbol or whatever the extension name is, but also scans the .pyd file's import table looking for (regex) python\d+\. and then compares the found symbol with known pythonNN. string to make sure the extension was compiled for this version of Python. If the import table doesn't have such a symbol or it refers to another version, it raises an error.
For me it means that:
If I link an extension against pythonNN.dll and try to load it from my custom DLL that includes a statically linked Python, it will pass the check, but — well, here I'm not sure: will it fail because there's no pythonNN.dll (i.e. even before getting to the check) or it will happily load the symbols?
And if I link it against my custom DLL, it will find symbols, but won't pass the check :)
I guess I could rewrite this piece to suit my needs... Are there any other such places, I wonder.
Python needs to be a dll (with a standard name) such that your application, and the plugin, can use the same instance of python.
Plugin dlls are already going to expect to be loading (and using python from) a python26.dll (or whichever version) - if your python is statically embedded in your exe, then two different instances of the python library would be managing the same data structures.
If the Python libraries used no static variables at all, and the compile settings were exactly the same, this would not be a problem. However, it's generally far safer to simply ensure that only one instance of the Python interpreter is in use.
On *nix, all shared objects in a process, including the executable, contribute their exported names into a common pool; any of the shared objects can then pull any of the names from the pool and use them as they like. This allows e.g. cStringIO.so to pull the relevant Python library functions from the main executable when the Python library is statically-linked.
On Windows, each shared object has its own independent pool of names it can use. This means that each DLL must import the functions it needs from the specific shared objects that provide them. Since it is a lot of work to get all the names from the main executable, the Python functions are separated out into their own DLL.
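The difference can be observed from Python with ctypes. A minimal sketch, assuming a standard CPython build: on POSIX, ctypes.CDLL(None) opens the process-wide symbol namespace, where the interpreter's C API is visible; on Windows, the symbols have to be looked up in a specific DLL, which ctypes.pythonapi is bound to. The function name is made up; Py_IsInitialized is just a convenient C-API function to look for.

```python
import ctypes
import sys


def python_api_symbol_visible(name="Py_IsInitialized"):
    """Check whether a Python C-API symbol can be looked up like an extension would."""
    if sys.platform == "win32":
        # Windows: names come from one specific DLL; ctypes.pythonapi is
        # bound to the running interpreter's pythonXY.dll.
        handle = ctypes.pythonapi
    else:
        # POSIX: dlopen(NULL) returns the global namespace, which on typical
        # builds exports the C API whether Python is statically linked into
        # the executable or loaded from libpython.
        handle = ctypes.CDLL(None)
    return hasattr(handle, name)  # a missing symbol raises AttributeError


if __name__ == "__main__":
    print(python_api_symbol_visible())
```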