How to safely create and import .so files in Python 3? - python

I am trying to use Cython to build .so binaries from our .py files and share them with our team.
However, even though we all use Python 3, most of the time the revision has to match exactly (say, 3.7.8); otherwise importing them fails with an error.
Is this behavior expected?
Some revisions are compatible. For example, a .so built with Python 3.5.2 imports fine in 3.6.8, but not in 3.7.8.
Where does this mess come from, and what is the safest way to do this?

To follow up on my comments:
On the same platform, extension modules should work within a "minor version" (i.e. modules built with 3.7.2 and 3.7.3 should be compatible). I'm struggling to find a source for this though. Beyond that, some effort has been made in the past to ensure compatibility between releases, but not so much any more, so you may get lucky and things work.
distutils/setuptools and other similar build mechanisms tag extension modules with a suffix indicating the version and some other details. For example, an extension would be called foo.cpython-37m.so instead of just foo.so. These tags prevent the module from being used with other Python versions and are a good thing. If you are removing these tags then this mess is entirely on you.
Python now defines a more limited stable ABI that should be compatible across Python versions. Cython is working towards supporting that but at the moment it is not in a usable state. In a year or so it should be a good solution.
In summary, .so files are not expected to be portable between different Python versions. You should either standardise on a Python version or build the .so files locally.
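To see which tag the running interpreter expects, the standard `sysconfig` module can be queried. This is only a minimal sketch; `extension_tag_info` is a hypothetical helper name, and the values differ by platform and build, so no output is shown.

```python
import sys
import sysconfig

def extension_tag_info():
    """Return the version and extension-module suffix of the running interpreter."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Suffix the build tools append to compiled modules,
        # for example ".cpython-37m-x86_64-linux-gnu.so" on Linux.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # The ABI tag on its own; may be None on some platforms.
        "soabi": sysconfig.get_config_var("SOABI"),
    }

if __name__ == "__main__":
    for key, value in extension_tag_info().items():
        print(key, "=", value)
```

Running this on every machine in the team shows quickly whether the interpreters agree on the suffix.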

Related

Importing cython generated *.so-module with another python-version or on another OS

How should a file myModule.cpython-35m-x86_64-linux-gnu.so be imported in python? Is it possible?
I tried the regular way:
import myModule
and the interpreter says:
`ModuleNotFoundError: No module named 'myModule'`
This is a software that I can't install in the cluster that I am working at so I just extracted the .deb package and it does not have a wheel file or structure to install.
It is problematic to use a C-extension built for one Python version in another Python version. Normally (at least for Python3) there is a mechanism in place to differentiate C-extensions for different Python versions, so they can co-exist in the same directory.
In your example, the suffix is cpython-35m-x86_64-linux-gnu, so this C-extension will be picked up by CPython 3.5 on x86_64 Linux. If you try to import this extension with another Python version or on another platform, the module isn't visible and ModuleNotFoundError is raised.
You can see which suffixes are accepted by the current Python version, e.g. via:
>>> import _imp
>>> _imp.extension_suffixes()
['.cpython-36m-x86_64-linux-gnu.so', '.abi3.so', '.so']
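The same list is also available as `importlib.machinery.EXTENSION_SUFFIXES`. As a rough sketch of why the 3.5 file is invisible to a 3.6 interpreter, the hypothetical helper `visible_as` below mimics only the file-name matching, not the full import machinery:

```python
from importlib.machinery import EXTENSION_SUFFIXES

def visible_as(module, filename, suffixes=EXTENSION_SUFFIXES):
    """True if `import module` would consider `filename`, judging by name alone."""
    # The finder looks for "<module><suffix>" for each accepted suffix.
    return any(filename == module + suffix for suffix in suffixes)

# Suffix list copied from the Python 3.6 session shown above.
py36 = ['.cpython-36m-x86_64-linux-gnu.so', '.abi3.so', '.so']

print(visible_as("myModule", "myModule.cpython-36m-x86_64-linux-gnu.so", py36))  # True
print(visible_as("myModule", "myModule.cpython-35m-x86_64-linux-gnu.so", py36))  # False
print(visible_as("myModule", "myModule.so", py36))                               # True
```

The last line shows why an untagged file is found by every version, which is exactly what makes it unsafe.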
One possibility is to use the stable C-API, which can be used with multiple Python versions without recompilation. Cython started to support it in version 3.0 (see this PR); see also this SO-post about setuptools and the stable C-API.
One might want to be clever and rename the extension to a plain .so, so it can be picked up by the Finder. This can and does work for some Python-version combinations on some platforms for some extensions, yet this approach cannot be sustained in the long run and is not the right thing to do.
The right thing to do is to build the C-extension for/with the right Python version on the right OS/platform, or to use the right wheel (or use the stable C-API).
In general, a C-extension built for a python-version (let's say PythonA.B) cannot be used by another Python version (let's say PythonC.D), because those extensions/modules are linked against a special Python-library and the needed functionality might no longer/not yet be present in the library of another version.
This is different from *.py-files and more similar to *.pyc-files, which cannot be used with a different version.
While PEP-3147 regulates the suffixes of *.pyc-files, PEP-3149 does the same for C-extensions. PEP-3149 is however not the state of the art, as some of the problems were fixed only in Python 3.5; the whole discussion can be found here.

Managing Python 3 code with SCons

At work I have the task of converting a large library of Python 2.7 code to Python 3.x.
This library contains a lot of scripts and extensions made with boost python for C++.
All of this is built with SCons, which does not work with a Python 3.x interpreter, and my supervisor and I want to know if there is a way around this.
The SConstruct file contains expressions with sys.version to determine the correct module directories to import (numpy etc.). I do not know how to use SCons or its syntax, so I cannot give a lot of information about this topic.
Can we use SCons to build Python 3 code with the given extensions, or do we have to wait until SCons is compatible with Python 3?
At the time of writing this, there are plans to support both Python 2.7 and 3.x in a single branch/version. Work on this feature has started, but it will take some more time to reach this goal.
So it looks as if your best bet would be to start right away. SCons itself should run fine under Python 2.7 for compiling the Boost extensions. The problem in your case is the added checks and detection mechanisms that derive paths and module names from the version of the current Python interpreter.
Since you can't give any more detail about this process, my answer is somewhat vague here, sorry. In principle you'd have to find the place in the SConstructs/SConscripts where the version of the currently running Python interpreter is determined. Just hardcode this to the 3.x version that you have installed on the machine additionally, and keep your fingers crossed that the rest will work automatically.
Note how there is a clear separation here between "compiling code for a Python version" vs "compiling code under a Python version".
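A minimal sketch of that separation, written as plain Python of the kind that can live in an SConstruct. The helper name `python_paths`, the `TARGET_PYTHON` value, the `/usr/local` prefix and the directory layout are all assumptions for illustration; the real directories depend on how Python 3 is installed on the build machine (some installations add an ABI flag such as `m` to the include directory name).

```python
import sys

# Version of the interpreter that happens to RUN the build tool (e.g. 2.7).
running = sys.version_info[:2]

# Hypothetical: the version the extensions are compiled FOR, hardcoded
# instead of being derived from sys.version.
TARGET_PYTHON = (3, 6)

def python_paths(prefix, version):
    """Build include and site-packages paths for an explicit target version."""
    tag = "python{}.{}".format(*version)
    return {
        "include": "{}/include/{}".format(prefix, tag),
        "site_packages": "{}/lib/{}/site-packages".format(prefix, tag),
    }

paths = python_paths("/usr/local", TARGET_PYTHON)
print(paths["include"])        # /usr/local/include/python3.6
print(paths["site_packages"])  # /usr/local/lib/python3.6/site-packages
```

The point is only that nothing in `python_paths` looks at `running`; the target version is passed in explicitly.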
In general, a better understanding of SCons internal workings and basic principles might be helpful. If you find the time, check out the UserGuide ( http://scons.org/doc/production/HTML/scons-user.html ) or consult our user mailing list ( see http://scons.org/lists.php ) for larger questions and discussions.

What are the limitations of distributing .pyc files?

I've started working on a commercial application in Python, and I'm weighing my options for how to distribute the application.
Aside from the obvious (distribute sources with an appropriate commercial license), I'm considering distributing just the .pyc files without their corresponding .py sources. But I'm not familiar enough with Python's compatibility guarantees to know if this is even workable, much less whether it's a good idea or not.
Are .pyc files independent of the underlying OS? For example, would a .pyc file generated on a 64-bit Linux machine work on a 32-bit Windows machine?
I've found that .pyc files should be compatible across bugfix releases, but what about major and minor releases? For example, would a file generated with Python 3.1.5 be compatible with Python 3.2.x? Or would a .pyc file generated with Python 2.7.3 be compatible with a Python 3.x release?
Edit:
Primarily, I may have to appease stakeholders who are uncomfortable distributing sources. Distributing .pyc's without sources may give them some level of comfort, since it would require the extra step of decompiling to get at the sources, even if that step is somewhat trivial. Just enough of a barrier to keep honest people honest.
For example, would a file generated with Python 3.1.5 be compatible with Python 3.2.x?
No.
Or would a .pyc file generated with Python 2.7.3 be compatible with a Python 3.x release?
Doubly no.
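The mechanism behind this: every .pyc file starts with a magic number that identifies the bytecode format, and an interpreter refuses files whose magic number it does not recognise. A small sketch of checking this by hand, where `pyc_matches_interpreter` is a hypothetical helper name:

```python
import importlib.util
import os
import py_compile
import tempfile

def pyc_matches_interpreter(pyc_path):
    """True if the .pyc header carries the magic number of the running interpreter."""
    with open(pyc_path, "rb") as handle:
        return handle.read(4) == importlib.util.MAGIC_NUMBER

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "demo.py")
    compiled = os.path.join(tmp, "demo.pyc")
    with open(source, "w") as handle:
        handle.write("x = 1\n")
    # Compile with the running interpreter, so the magic number must match.
    py_compile.compile(source, cfile=compiled, doraise=True)
    print(pyc_matches_interpreter(compiled))  # True
```

A file compiled by a different minor version would make the function return False.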
I'm considering distributing just the .pyc files without their corresponding .py sources.
Python bytecode is high-level and trivially decompilable.
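To get a feeling for how much survives compilation, the standard `dis` module can be pointed at any function. A short sketch with a made-up function; the names and the constant are purely illustrative:

```python
import dis

def price_with_margin(cost):
    margin = 0.35  # a "secret" business constant
    return cost * (1 + margin)

code = price_with_margin.__code__
print(code.co_varnames)        # ('cost', 'margin')
print(0.35 in code.co_consts)  # True

# Full human-readable listing of the bytecode (output varies by Python version).
dis.dis(price_with_margin)
```

Variable names and constants are stored verbatim in the code object, which is what makes decompilers effective.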
You certainly could distribute the .pyc files only. As Cat mentioned, no, they would not be compatible with a different major version of Python. It might prevent some people from viewing the source code, but .pyc files are very easy to decompile. Basically, if you can compile it, you can decompile it.
You could use a binary packager like py2exe / py2app / freeze. I've never tried them but someone could still decompile them if they wanted to.
As Cat said, pyc files are not cross version safe. Though what you're trying to hide from the users determines what you need to do.
As for source code, there is no good way to hide Python source code in a distributed application. If you are just trying to hide specific details, you could pack those into a C extension, which would be much harder to decompile.
So if you're worried about code use, put a license attached to the code for no-use or translate the sections you don't want stolen to a compiled language. If you just want code to not be obviously Python, you can create a binary executable that wraps the Python code (though doesn't hide the actual details if someone extracts them from the file).

Cross-compiled Python can't find basic modules (math, operator, etc)

I can't seem to import any of the basic modules located in the "lib-dynload" directory. They are all there, but I get the error: "ImportError: No module named X" when trying to import them.
I checked my sys.path and it includes the directory where all of these modules are located and my PYTHONHOME environment variable is set correctly. I'm at a bit of a loss as to what the problem could be. Some background info: This is cross-compiled from Python 2.6.6 source and installed onto an ARM embedded Linux board with Angstrom.
It did have python on there before, I had tried to bit-bake it into the image but it was missing a lot of stuff. I ended up doing my best to clean the directory tree of anything to do with the previous python before loading on my cross compiled version.
An strace of a simple script that just attempts to import math: http://pastebin.com/3XgJ3nPR
I see no checks in that trace for filenames like math.so or mathmodule.so which might indicate that shared-object modules are turned off entirely — that the version of Python you have compiled cannot load binary modules dynamically.
More: looking over the config.out from my most recent Python build, I see several lines where Python is investigating whether the platform will let it dynamically load binary modules that end in .so:
checking for dlopen... yes
checking DYNLOADFILE... dynload_shlib.o
checking MACHDEP_OBJS... MACHDEP_OBJS
What do these lines say on your cross-compile?
I have recently run across a similar issue building Python 2.7.13, and I believe it is this bug which is being fixed for Python 3 but not ported back to 2. The build process (setup.py) generates a list of modules to build, and then subtracts the list of built-in modules (sys.builtin_module_names); however, setup.py is run (from the Makefile) using python2.7 which in my case picked up the system (Ubuntu) binary rather than the one built, so it subtracts off modules that are built-in for the system python (including operator and collections) but not for the one being built, so they are neither built-in nor built as external modules.
I was able to use a suggestion from the bug and prepend the built python, in the source directory, to the path (and add a symlink from python2.7 -> python). This worked because I was building an x86 python on a multi-arch x64 machine; if you are building for another system like ARM you may need to apply the patch from the bug to get the list of built-in modules from earlier in the build process rather than the host python.
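When debugging this kind of problem it helps to ask the interpreter itself how it would resolve a module. A minimal sketch using Python 3's `importlib` (the builds discussed above are Python 2, where `sys.builtin_module_names` exists as well but `importlib.util.find_spec` does not); `describe_module` is a hypothetical helper:

```python
import importlib.util
import sys

def describe_module(name):
    """Report whether `name` is compiled in, found on disk, or missing."""
    if name in sys.builtin_module_names:
        return "built-in"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # For extension modules this is the path of the shared object.
    return spec.origin

for name in ("sys", "math", "operator", "collections"):
    print(name, "->", describe_module(name))
```

Run on the target with the cross-compiled interpreter, a module that is reported as "missing" is one that was neither compiled in nor built as a shared object.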

Embedding Python on Windows: why does it have to be a DLL?

I'm trying to write a software plug-in that embeds Python. On Windows the plug-in is technically a DLL (this may be relevant). The Python Windows FAQ says:
1. Do not build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to pythonNN.dll; it is typically installed in C:\Windows\System. NN is the Python version, a number such as “23” for Python 2.3.
My question is why exactly Python must be a DLL? If, as in my case, the host application is not an .exe, but also a DLL, could I build Python into it? Or, perhaps, this note means that third-party C extensions rely on pythonN.N.dll to be present and other DLL won't do? Assuming that I'd really want to have a single DLL, what should I do?
I see there's the dynload_win.c file, which appears to be the module to import C extensions on Windows and, as far as I can see, it scans the extension file to find which pythonX.X.dll it imports; but I'm not experienced with Windows and I don't quite understand all the code there.
You need to link to pythonXY.dll as a DLL, instead of linking the relevant code directly into your executable, because otherwise the Python runtime can't load other DLLs (the extension modules it relies on.) If you make your own DLL you could theoretically link all the Python code in that DLL directly, since it doesn't end up in the executable but still in a DLL. You'll have to take care to do the linking correctly, however, as pretty much none of the standard tools (like distutils) will do this for you.
However, regardless of how you embed Python, you can't make do with just the DLL, nor can you make do with just any DLL. The ABI changes between Python versions, so if you compiled your code against Python 2.6, you need python26.dll; you can't use python25.dll or python27.dll. And Python isn't just a DLL; it also needs its standard library, which includes extension modules (which are DLLs themselves, although they have the .pyd extension). The code in dynload_win.c you ran into is for loading those DLLs, and is not related to the loading of pythonXY.dll.
In short, in order to embed Python in your plugin, you need to either ship Python with the plugin, or require that the right Python version is already installed.
(Sorry, I did a stupid thing, I first wrote the question, and then registered, and now I cannot alter it or comment on the replies, because StackOverflow's engine doesn't think I'm the author. I cannot even properly thank those who replied :( So this is actually an update to the question and comments.)
Thanks for all the advice, it's very valuable. As far as I understand with some effort I can link Python statically into a custom DLL, provided that I compile other dynamically loaded extensions myself and link them against the same DLL. (I know I need to ship the standard library too; my plan was to append a zipped archive to the DLL file. As far as I understand, I will even be able to import pure Python modules from it.)
I also found an interesting place in dynload_win.c. (I understand it loads dynamic extensions that use Python C API, e.g. _ctypes.) As far as I can see it not only looks for init_ctypes symbol or whatever the extension name is, but also scans the .pyd file's import table looking for (regex) python\d+\. and then compares the found symbol with known pythonNN. string to make sure the extension was compiled for this version of Python. If the import table doesn't have such a symbol or it refers to another version, it raises an error.
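Expressed in Python, the check as I understand it looks roughly like the sketch below. This is only a paraphrase of the description above, not a port of the C code in dynload_win.c; the helper names are made up, and the list of imported DLL names is assumed to have been read from the .pyd's import table by some other means.

```python
import re

# Pattern from the description above: "python", digits, then a dot.
PYTHON_DLL = re.compile(r"^python\d+\.", re.IGNORECASE)

def expected_dll(version):
    """DLL name for a (major, minor) version, e.g. (2, 6) -> 'python26.dll'."""
    return "python{}{}.dll".format(version[0], version[1])

def check_extension_imports(imported_dlls, expected):
    """Classify an extension by the Python DLL named in its import table."""
    for name in imported_dlls:
        if PYTHON_DLL.match(name):
            return "match" if name.lower() == expected.lower() else "mismatch"
    return "not found"

expected = expected_dll((2, 6))
print(check_extension_imports(["KERNEL32.dll", "python26.dll"], expected))  # match
print(check_extension_imports(["KERNEL32.dll", "python27.dll"], expected))  # mismatch
print(check_extension_imports(["KERNEL32.dll", "mycustom.dll"], expected))  # not found
```

The third case is the one that matters for a custom DLL with a statically linked Python: an extension linked against it names no pythonNN.dll at all.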
For me it means that:
If I link an extension against pythonNN.dll and try to load it from my custom DLL that includes a statically linked Python, it will pass the check, but — well, here I'm not sure: will it fail because there's no pythonNN.dll (i.e. even before getting to the check) or it will happily load the symbols?
And if I link it against my custom DLL, it will find symbols, but won't pass the check :)
I guess I could rewrite this piece to suit my needs... Are there any other such places, I wonder.
Python needs to be a dll (with a standard name) such that your application, and the plugin, can use the same instance of python.
Plugin dlls are already going to expect to be loading (and using python from) a python26.dll (or whichever version) - if your python is statically embedded in your exe, then two different instances of the python library would be managing the same data structures.
If the python libraries use no static variables at all, and the compile settings are exactly the same, this should not be a problem. However, generally it's far safer to simply ensure that only one instance of the python interpreter is being used.
On *nix, all shared objects in a process, including the executable, contribute their exported names into a common pool; any of the shared objects can then pull any of the names from the pool and use them as they like. This allows e.g. cStringIO.so to pull the relevant Python library functions from the main executable when the Python library is statically-linked.
On Windows, each shared object has its own independent pool of names it can use. This means that it must explicitly name the shared objects it needs functions from. Since it is a lot of work to get all the names from the main executable, the Python functions are separated out into their own DLL.
