Programmatically obtain Python install paths in prefix, without distutils - python

As distutils is being removed from Python in versions after 3.10, and setuptools will not be added to the stdlib, I want to replace an existing setup.py recipe for building/installing a Cython extension for a C++ library (i.e. not primarily a Python package, not run in a venv, etc.) with some custom code.
The Cython part is working fine, and I just about managed to construct a call to the C++ compiler equivalent to the one previously executed by distutils, using config-var info from sysconfig... though that was largely trial and error, with no documentation and little consistency to the config-var collection as far as I could tell.
But I am now stuck on identifying which directory to install my built extension .so into, within the target prefix of my build. Depending on the platform, the path scheme in use and the prefix itself, the subdirectory could sit under lib or lib64, include a pythonX.Y component of some sort, and end in site-packages, dist-packages or something else. This decision was previously made by distutils, but I can't find any equivalent code that returns such path decisions elsewhere in the stdlib.
Any suggestions of answers or best-practice approaches? (Other than "use setuptools", please!)
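One stdlib-only possibility (a sketch, assuming sysconfig's install schemes match what distutils would have chosen; the prefix path is made up) is to ask sysconfig for the platlib path of the current scheme while overriding the base/platbase variables with the target prefix:
import sysconfig

# Sketch: where do platform-specific (compiled) modules go under a given prefix?
# "platlib" is the scheme path used for extension modules; overriding the
# base/platbase variables substitutes our own prefix into the scheme template.
prefix = "/opt/mylib"  # hypothetical install prefix

ext_dir = sysconfig.get_path(
    "platlib",
    vars={"base": prefix, "platbase": prefix},
)
print(ext_dir)  # e.g. /opt/mylib/lib/python3.11/site-packages on many Linux systems
Note that the scheme sysconfig selects by default can differ between platforms and distributions (lib vs lib64, site-packages vs dist-packages), which is exactly the variation distutils used to hide; on Python 3.10+ you can inspect it with sysconfig.get_default_scheme() or pass an explicit scheme= argument.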

Related

Packaging executable, shared library, and Python bindings not finding library

I have a project, cloudgen, to which I would like to add Python bindings so I can access some of the underlying functions. I have stubbed out the initial work on a branch. Because the main executable is built with cmake, I decided to use scikit-build to manage the build and pybind11 to handle the bindings (following this example repo).
When I run pip install . in a virtual environment, everything appears to work as expected. I find the executable is installed to <prefix>/bin, the library goes into <prefix>/lib, and the module goes into <prefix>/lib/pythonX.Y/site-packages/cloudgen. In fact, if I run pip uninstall cloudgen, all of the correct files are uninstalled. However, my problems arise when I start to test the Python bindings. I find two separate but related problems.
If I installed into an Anaconda environment, the module is able to resolve the path to the shared library and pass the tests, but the executable does not resolve the path to the library.
On the other hand, if I installed into a virtual environment using python -m venv, both the module and the executable are unable to resolve the path to the shared library.
Searching around, I came across this question, which notes I could manipulate LD_LIBRARY_PATH (or equivalently DYLD_LIBRARY_PATH on macOS or PATH on Windows), but that is normally frowned upon. That question references an open issue about including additional build products (which, as I said, does not appear to be my problem) but doesn't address the library path resolution. I also came across this question about distributing the build products using scikit-build and this question using setuptools directly. Neither the questions nor their answers address the library path resolution.
My question is: What is the correct way to distribute a package that contains an executable, shared library, and Python binding module and have the path resolution Just Work™?
A minimal working example is a bit much, but I created a gist to demonstrate the behavior.
After a bit more digging (and carefully reading the CMake documentation on RPATH), the correct answer appears to be explicitly setting RPATH on installation. The relevant change to the linked gist is to add the following to the CMakeLists.txt after creating the targets (adapted from the linked Wiki):
if (SKBUILD)
    find_package(PythonExtensions REQUIRED)
    set(lib_path "${PYTHON_PREFIX}/${CMAKE_INSTALL_LIBDIR}")
else()
    set(lib_path "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}")
endif()
list(FIND CMAKE_PLATFORM_IMPLICIT_LINK_DIRECTORIES "${lib_path}" is_system)
if ("${is_system}" STREQUAL "-1")
    set_target_properties(mwe.exe PROPERTIES
        INSTALL_RPATH_USE_LINK_PATH TRUE
        INSTALL_RPATH "${lib_path}")
    # The following is necessary for installation in a virtual
    # environment created with `python -m venv env`
    set_target_properties(_mwe PROPERTIES
        INSTALL_RPATH_USE_LINK_PATH TRUE
        INSTALL_RPATH "${lib_path}")
endif()
This does require following the rest of the details about setting the RPATH, such as including the following (taken literally from the linked CMake Wiki):
# use, i.e. don't skip the full RPATH for the build tree
set(CMAKE_SKIP_BUILD_RPATH FALSE)
# when building, don't use the install RPATH already
# (but later on when installing)
set(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE)
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
# add the automatically determined parts of the RPATH
# which point to directories outside the build tree to the install RPATH
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
earlier in the CMakeLists.txt. The net result is that the installed executable and Python module locate the shared library through their embedded RPATH, so no dynamic searching of the library path (LD_LIBRARY_PATH or DYLD_LIBRARY_PATH as appropriate) is needed.

Compile python wheel with SWIG for manylinux without linking to libpython

I'm trying to dig a bit deeper into binary packages with python with the wheel format, and I thought I might try to give wheelifying one of my dependencies as an exercise.
To maximize the reusability of the resulting package I want to build it using the manylinux containers, which compiles the binaries included in the package with the lowest common denominator that I'm likely to encounter. However I'm unsure how to avoid linking against a specific libpython.so. The docs describe this in a rather abstract way:
Note that libpythonX.Y.so.1 is not on the list of libraries that a manylinux1 extension is allowed to link to. Explicitly linking to libpythonX.Y.so.1 is unnecessary in almost all cases: the way ELF linking works, extension modules that are loaded into the interpreter automatically get access to all of the interpreter's symbols, regardless of whether or not the extension itself is explicitly linked against libpython. Furthermore, explicit linking to libpython creates problems in the common configuration where Python is not built with --enable-shared. In particular, on Debian and Ubuntu systems, apt install pythonX.Y does not even install libpythonX.Y.so.1, meaning that any wheel that did depend on libpythonX.Y.so.1 could fail to import.
So I'm wondering: SWIG generates the Python wrappers for the C functions and obviously uses libpython functions, so the linker will want to link in libpython, which manylinux doesn't provide. If I understand correctly, I am supposed to somehow leave those references dangling/unresolved and the interpreter will provide the symbols at runtime when loading the shared library, but how am I supposed to compile this in the first place?
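For what it's worth, here is a hedged sketch of what that looks like in practice on Linux (the file and module names example_wrap.c, example.c and _example are made up, and a gcc toolchain in a manylinux-style environment is assumed): simply omit -lpythonX.Y from the link line; a shared object is allowed to contain undefined symbols, and the dynamic loader resolves them from the already-running interpreter when the module is imported.
# Sketch only: compile a SWIG-generated wrapper without linking libpython.
import subprocess
import sysconfig

include_dir = sysconfig.get_path("include")           # directory containing Python.h
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")   # e.g. ".cpython-38-x86_64-linux-gnu.so"

subprocess.check_call([
    "gcc", "-shared", "-fPIC",
    "example_wrap.c", "example.c",
    "-I", include_dir,
    # deliberately no "-lpython3.X": the unresolved Python symbols are
    # satisfied by the interpreter process at import time
    "-o", "_example" + ext_suffix,
])
(On macOS the same trick requires passing -undefined dynamic_lookup to the linker, but for manylinux wheels the Linux behaviour above is all that is needed.)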

How to properly use 3rd party dependencies with Sublime Text plugins?

I'm trying to write a plugin for Sublime Text 3.
I have to use several third party packages in my code. I have managed to get the code working by manually copying the packages into /home/user/.config/sublime-text-3/Packages/User/, then I used relative imports to get to the needed code. How would I distribute the plugin to the end users? Telling them to copy the needed dependencies to the appropriate location is certainly not the way to go. How are 3rd party modules supposed to be used properly with Sublime Text plugins? I can't find any documentation online; all I see is the recommendation to put the modules in the folder.
Sublime uses its own embedded Python interpreter (currently Python 3.3.6, although the next version will also support Python 3.8), and as such it will completely ignore any version of Python that you may or may not have installed on your system, as well as any libraries that are installed for that version.
For that reason, if you want to use external modules (hereafter dependencies) you need to do extra work. There are a variety of ways to accomplish this, each with their own set of pros and cons.
The following lists the various ways that you can achieve this; all of them require a bit of an understanding about how modules work in Python in order to understand what's going on. By and large except for the paths involved there's nothing too "Sublime Text" about the mechanisms at play.
NOTE: The below is accurate as of the time of this answer. However, there are forthcoming plans to change how Package Control works with dependencies, which may change some aspects of this.
This is related to the upcoming version of Sublime supporting multiple versions of Python (and the manner in which it supports them) which the current Package Control mechanism does not support.
It's unclear at the moment if the change will bring a new way to specify dependencies or if only the inner workings of how the dependencies are installed will change. The existing mechanism may remain in place regardless just for backwards compatibility, however.
All roads to accessing a Python dependency from a Sublime plugin involve putting the code for it in a place where the Python interpreter is going to look for it. This is similar to how standard Python would do things, except that locations that are checked are contained within the area that Sublime uses to store your configuration (referred to as the Data directory) and instead of a standalone Python interpreter, Python is running in the plugin host.
Populate the library into the Lib folder
Since version 3.0 (build 3143), Sublime will create a folder named Lib in the data directory and inside of it a directory based on the name of the Python version. If you use Preferences > Browse Packages and go up one folder level, you'll see Lib, and inside of it a folder named e.g. python3.3 (or if you're using a newer build, python33 and python38).
Those directories are directly on the Python sys.path by default, so anything placed inside of them will be immediately available to any plugin just as a normal Python library (or any of those built in) would be. You could consider these folders to be something akin to the site-packages folder in standard Python.
So, any method by which you could install a standard Python library can be used so long as the result is files ending up in this folder. You could for example install a library via pip and then manually copy the files to that location from site-packages, manually install from sources, etc.
Lib/python3.3/
|-- librarya
| `-- file1.py
|-- libraryb
| `-- file2.py
`-- singlefile.py
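As one concrete (and hedged) way to end up with that layout, you can point pip's --target option at the Lib folder; the Data directory path below is an example for Linux, and the dependency name is made up.
# Sketch: install a pure-Python dependency straight into Sublime's Lib folder
# using pip's --target option. Adjust the path for your OS and for the Python
# version folder that your build of Sublime uses.
import subprocess
import sys

lib_dir = "/home/user/.config/sublime-text-3/Lib/python3.3"

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--target", lib_dir,          # copy the package tree directly into Lib
    "some_pure_python_library",   # hypothetical dependency
])
The version caveat described next applies in full to anything installed this way.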
Version restrictions apply here; the dependency that you want to use must support the version of Python that Sublime is using, or it won't work. This is particularly important for Python libraries with a native component (e.g. a .dll, .so or .dylib), which may require hand compiling the code.
This method is not automatic; you would need to do it to use your package locally, and anyone who wants to use your package would need to do it as well. Since Sublime is currently using an older version of Python, it can also be problematic to obtain a compatible version of a library in the first place.
In the future, Package Control will install dependencies in this location (Will added the folder specifically for this purpose during the run up to version 3.0), but as of the time I'm writing this answer that is not currently the case.
Vendor your dependencies directly inside of your own package
The Packages folder is on the sys.path by default as well; this is how Sublime finds and loads packages. This is true both of the physical Packages folder and of the "virtual" packages folder that contains the contents of sublime-package files.
For example, one can access the class that provides the exec command via:
from Default.exec import ExecCommand
This will work even though the exec.py file is actually stored in Default.sublime-package in the Sublime Text install folder and not physically present in the Packages folder.
As a result of this, you can vendor any dependencies that you require directly inside of your own package. Here this could be the User package or any other package that you're creating.
It's important to note that Sublime will treat any Python file in the top level of a package as a plugin and try to load it as one. Hence it's important that if you go this route you create a sub-folder in your package and put the library in there.
MyPackage/
|-- alibrary
| `-- code.py
`-- my_plugin.py
With this structure, you can access the module directly:
import MyPackage.alibrary
from MyPackage.alibrary import someSymbol
Not all Python modules lend themselves to this method directly without modification; some code changes in the dependency may be required to allow different parts of the library to see other parts of itself, for example if it doesn't use relative imports to get at sibling files. License restrictions may also get in the way, depending on the library that you're using.
On the other hand, this directly locks the version of the library that you're using to exactly the version that you tested with, which ensures that you won't be in for any undue surprises further on down the line.
Using this method, anything you do to distribute your package will automatically also distribute the vendored library that's contained inside. So if you're distributing by Package Control, you don't need to do anything special and it will Just Work™.
Modify the sys.path to point to a custom location
The Python that's embedded into Sublime is still standard Python, so if desired you can manually manipulate the sys.path that describes what folders to look for packages in so that it will look in a place of your choosing in addition to the standard locations that Sublime sets up automatically.
This is generally not a good idea, since if done incorrectly things can go pear shaped quickly. It also still requires you to install libraries somewhere yourself first, and in that case you're better off using the Lib folder as outlined above, which is already on the sys.path.
I would consider this method an advanced solution and one you might use for testing purposes during development but otherwise not something that would be user facing. If you plan to distribute your package via Package Control, the review of your package would likely kick back a manipulation of the sys.path with a request to use another method.
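For completeness, here is a minimal sketch of what such a manipulation looks like (the libs folder name is hypothetical, and this assumes your package is installed as a loose folder under Packages); it would go at the top of your plugin, before the dependency is imported.
import os
import sys

# Sketch: make a "libs" folder shipped inside this package importable.
# Prepending keeps the vendored copy ahead of any same-named module elsewhere.
libs_path = os.path.join(os.path.dirname(__file__), "libs")
if libs_path not in sys.path:
    sys.path.insert(0, libs_path)

# import your_dependency  # now resolvable from MyPackage/libs/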
Use Package Control's Dependency system (and the dependency exists)
Package Control contains a dependency mechanism that uses a combination of the two prior methods to provide a way to install a dependency automatically. There is a list of available dependencies as well, though the list may not be complete.
If the dependency that you're interested in using is already available, you're good to go. There are two different ways to go about declaring that your package needs one or more dependencies.
NOTE: Package Control doesn't currently support dependencies of dependencies; if a dependency requires that another library also be installed, you need to explicitly mention them both yourself.
The first involves adding a dependencies key to the entry for your package in the package control channel file. This is a step that you'd take at the point where you're adding your package to Package Control, which is something that's outside the scope of this answer.
While you're developing your package (or if you decide that you don't want to distribute your package via Package Control when you're done), then you can instead add a dependencies.json file into the root of your package (an example dependencies.json file is available to illustrate this).
Once you do that, you can choose Package Control: Satisfy Dependencies from the Command Palette to have Package Control download and install the dependency for you (if needed).
This step is automatic if your package is being distributed and installed by Package Control; otherwise you need to tell your users to take this step once they install the package.
Use Package Control's Dependency system (but the dependency does not exist)
The method that Package Control uses to install dependencies is, as outlined in the note near the top of this answer, subject to change at some point in the (possibly near) future. This may affect the instructions here. The overall mechanism may remain the same as far as setup is concerned, with only the installation locations changing, but that remains to be seen.
Package Control installs dependencies via a special combination of vendoring and also manipulation of the sys.path to allow things to be found. In order to do so, it requires that you lay out your dependency in a particular way and provide some extra metadata as well.
The layout for the package that contains the dependency when you're building it would have a structure similar to the following:
Packages/my_dependency/
├── .sublime-dependency
└── prefix
└── my_dependency
└── file.py
Package Control installs a dependency as a Package, and since Sublime treats every Python file in the root of a package as a plugin, the code for the dependency is not kept in the top level of the package. As seen above, the actual content of the dependency is stored inside of the folder labeled as prefix above (more on that in a second).
When the dependency is installed, Package Control adds an entry to its special 0_package_control_loader package that causes the prefix folder to be added to the sys.path, which makes everything inside of it available to import statements as normal. This is why there's an inherent duplication of the name of the library (my_dependency in this example).
Regarding the prefix folder: it is not actually named that; instead it has a special name that determines what combination of Sublime Text version, platform and architecture the dependency is available on (important for libraries that contain binaries, for example).
The name of the prefix folder actually follows the form {st_version}_{os}_{arch}, {st_version}_{os}, {st_version} or all. {st_version} can be st2 or st3, {os} can be windows, linux or osx and {arch} can be x32 or x64.
Thus you could say that your dependency supports only st3, st3_linux, st3_windows_x64 or any combination thereof. For something with native code you may specify several different versions by having multiple folders, though commonly all is used when the dependency contains pure Python code that will work regardless of the Sublime version, OS or architecture.
In this example, if we assume that the prefix folder is named all because my_dependency is pure Python, then the result of installing this dependency would be that Packages/my_dependency/all would be added to the sys.path, meaning that if you import my_dependency you're getting the code from inside of that folder.
During development (or if you don't want to distribute your dependency via Package Control), you create a .sublime-dependency file in the root of the package as shown above. This should be a text file with a single line that contains a 2 digit number (e.g. 01 or 50). This controls in what order each installed dependency will get added to the sys.path. You'd typically pick a lower number if your dependency has no other dependencies and a higher value if it does (so that it gets injected after those).
Once you have the initial dependency laid out in the correct format in the Packages folder, you would use the command Package Control: Install Local Dependency from the Command Palette, and then select the name of your dependency.
This causes Package Control to "install" the dependency (i.e. update the 0_package_control_loader package) to make the dependency active. This step would normally be taken by Package Control automatically when it installs a dependency for the first time, so if you are also manually distributing your dependency you need to provide instructions to take this step.

How does setuptools add modules to the runtime?

I'm new to Python and overwhelmed with documentation regarding modules, package modules and imports.
Best I can understand, there are no "packages", only modules and import mechanisms, and the officially recommended high-level API is provided by setuptools, whose Distribution class is the key element that drives what and how modules are added to the runtime module import mechanism (sys.path?), but I'm lost in the details of things.
My understanding is that setup.py is a normal Python module that indirectly uses runtime API to add more modules while being executed before actual "user" code. As a normal module it could download more code (like installers), sort dependencies topologically, call system linker etc.
But specifically, I'm having difficulty in understanding:
How setup.py decides whether to shadow system modules (like math) or not
How setup.py configures the import mechanism so that above doesn't happen by accident
How does -m interpreter flag work in comparison?
For conciseness, let's assume the most recent CPython and standard library only.
EDIT:
I see that packages are real, and so are globals and levels as seen in the __import__ function. It appears that setuptools has control over them.
setuptools helps package the code into the right formats (example: setup.py build sdist bdist_wheel to build the archives that can be downloaded from PyPI) and then unpack this code into the right locations (example: setup.py install, or indirectly pip install) so that later on it can be picked up automatically by Python's own import mechanisms (typically yes, the location is one of the directories pointed at by sys.path).
I might be wrong but I wouldn't say that:
setup.py decides whether to shadow system modules (like math) or not
nor that:
setup.py configures the import mechanism so that above doesn't happen by accident
As far as I know this can very well happen; there's nothing to prevent it, and it is not necessarily a bad thing.
How does -m interpreter flag work in comparison?
Instead of instructing the Python interpreter to execute the code from a Python file given by its path (example: python ./setup.py), it is possible to instruct the Python interpreter to execute a Python module (example: python -m setup, python -m pip, python -m http.server). In this case Python will look for the named module in the directories pointed at by sys.path. This is why setuptools/setup.py and pip install packages in a location that, if everything is configured right, is on sys.path.
I recommend you have a look at the values stored in sys.path by yourself, with for example the following command: python -c "import sys; print(sys.path)".
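To make that concrete, here is a hedged, minimal setup.py sketch (the project name is made up); running pip install . against it simply copies the discovered packages into one of those sys.path directories (typically site-packages), after which a plain import example_project works.
# Minimal illustrative setup.py. There is no import-machinery magic here:
# "install" just means copying example_project/ into a directory that is
# already on sys.path, such as site-packages.
from setuptools import setup, find_packages

setup(
    name="example-project",     # hypothetical distribution name
    version="0.1.0",
    packages=find_packages(),   # picks up example_project/__init__.py etc.
)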

Python + setuptools: distributing a pre-compiled shared library with boost.python bindings

I have a C++ library (we'll call it Example in the following) for which I wrote Python bindings using the boost.python library. This Python-wrapped library will be called pyExample. The entire project is built using CMake and the resulting Python-wrapped library is a file named libpyExample.so.
When I use the Python bindings from a Python script located in the same directory as libpyExample.so, I simply have to write:
import libpyExample
libpyExample.hello_world()
and this executes a hello_world() function exposed by the wrapping process.
What I want to do
For convenience, I would like my pyExample library to be available from anywhere simply using the command
import pyExample
I also want pyExample to be easily installable in any virtualenv in just one command. So I thought a convenient process would be to use setuptools to make that happen. That would therefore imply:
Making libpyExample.so visible for any Python script
Changing the name under which the module is accessed
I did find many things about compiling C++ extensions with setuptools, but nothing about packaging a pre-compiled C++ extension. Is what I want to do even possible?
What I do not want to do
I don't want to build the pyExample library with setuptools, I would like to avoid modifying the existing project too much. The CMake build is just fine, I can retrieve the libpyExample.so file very easily.
If I understand your question correctly, you have the following situation:
you have an existing CMake-based build of a C++ library with Python bindings
you want to package this library with setuptools
The latter then allows you to call python setup.py install --user, which installs the lib into the site-packages directory and makes it importable from anywhere on your system.
What you want is possible, if you overload the classes that setuptools uses to build extensions, so that those classes actually call your CMake build system. This is not trivial, but you can find a working example here, provided by the pybind11 project:
https://github.com/pybind/cmake_example
Have a look into setup.py, you will see how the classes build_ext and Extension are inherited from and modified to execute the CMake build.
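The core idea, condensed into a hedged sketch (the real cmake_example setup.py handles more corner cases, and the names below are illustrative): subclass Extension so that it carries no sources, and subclass build_ext so that "building" the extension means invoking CMake and pointing its output at the directory where setuptools expects the compiled module to appear.
# Condensed sketch of the cmake_example approach; not a drop-in setup.py.
import os
import subprocess
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext


class CMakeExtension(Extension):
    def __init__(self, name, sourcedir=""):
        # No sources listed: the actual compilation is delegated to CMake.
        super().__init__(name, sources=[])
        self.sourcedir = os.path.abspath(sourcedir)


class CMakeBuild(build_ext):
    def build_extension(self, ext):
        # Directory where setuptools expects the built module to appear.
        extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
        os.makedirs(self.build_temp, exist_ok=True)
        subprocess.check_call([
            "cmake", ext.sourcedir,
            "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=" + extdir,
            # add any extra -D flags your CMake build needs here
        ], cwd=self.build_temp)
        subprocess.check_call(["cmake", "--build", "."], cwd=self.build_temp)


setup(
    name="pyExample",
    version="0.1.0",
    ext_modules=[CMakeExtension("pyExample")],
    cmdclass={"build_ext": CMakeBuild},
)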
This should work out of the box for you, or with little modification if your build requires special -D flags to be set.
I hope this helps!
