How to use precompiled headers with python wheels?

I'm writing a Python package which I would like to distribute as a wheel, so it can be easily installed via pip install.
As part of the functionality of the package itself I compile C++ code. For that, I distribute with the package a set of header files for the C++ code to include. Now, in order to speed up those compilation operations, I'd like to provide a precompiled header as part of the package.
I am able to do this if the package is installed via python setup.py install because I can add a step after the installation that generates the precompiled header in the installation directory (some-virtualenv/lib/python3.5/site-packages/...) directly.
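Roughly, that extra step looks like this (a minimal sketch; "mypkg", the header path, and the use of g++ -x c++-header are illustrative assumptions, not my exact code):

from setuptools import setup
from setuptools.command.install import install
import os
import subprocess

class InstallWithPCH(install):
    def run(self):
        install.run(self)  # do the normal installation first
        # generate the precompiled header next to the installed headers;
        # "mypkg" and "mypkg.hpp" are placeholder names
        header = os.path.join(self.install_lib, "mypkg", "include", "mypkg.hpp")
        subprocess.check_call(["g++", "-x", "c++-header", header, "-o", header + ".gch"])

setup(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],
    package_data={"mypkg": ["include/*.hpp"]},
    cmdclass={"install": InstallWithPCH},
)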
But now I can't figure out how to do this when I distribute a wheel. It seems that the installation process of a wheel is supposed to be a simple unpack-and-copy, and it provides no way to perform extra configuration on the installed package (which would generate that precompiled header, for example).
As part of my search for how to do this I stumbled across this, but no solution is offered there.
Is there any way around this or am I forced to use a source distribution for my package?

Related

Python packaging: Boost library as dependency

Assume that someone wants to package a Python (Cython) library that depends on the C++ boost library.
What is the best way to configure the setup.py so that the user is properly informed that the boost library is required (e.g., apt-get install libboost-dev in Ubuntu, etc. in other OSes)? Or is it better practice to include the boost library in the Python package distribution?
The question is better asked as: what is the best way to distribute a Python extension that includes an external library dependency?
This is best dealt with using binary wheel packages.
The user does not need to know anything about setup.py, which is used for building and installing source code; they just need to download and install a binary wheel package.
Including just the header files does not solve the problem of needing the library to build against and link to. It also opens up issues with version incompatibilities.
So setup.py needs nothing special for any of this; it just needs to know where to find the headers (which will be a sub-directory in your project if the library is bundled) and which libraries to link against.
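For illustration, a minimal setup.py along those lines might look like this (the "vendor/boost" path, the module name, and boost_system are placeholders; use the headers and boost libraries your extension actually needs):

from setuptools import setup, Extension

ext = Extension(
    "mypkg._core",                  # placeholder extension name
    sources=["src/core.cpp"],
    include_dirs=["vendor/boost"],  # bundled boost headers in a project sub-dir
    libraries=["boost_system"],     # whichever boost libraries you link against
)

setup(name="mypkg", version="0.1", ext_modules=[ext])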
The documentation should include instructions on how to build from source, for which more than just boost is needed (python header files, appropriate compilers etc).
Tools like auditwheel then take care of bundling external library dependencies into the binary wheel, so end-users need not have the library installed to use your package.
See also manylinux for distributing binary Python extensions and this demo project.
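For example, the auditwheel step is roughly this (the wheel filename here is a placeholder):

pip install auditwheel
auditwheel repair dist/mypkg-0.1-cp35-cp35m-linux_x86_64.whl -w wheelhouse/

The repaired wheel in wheelhouse/ has the external shared libraries copied in and is retagged for manylinux.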

pip package: proper way of compiling code that depends on libclang

I am building a Python library that I want to be installable via pip. The installation process requires a cpp file to be compiled, and that cpp file depends on libclang (in particular, it includes some of the clang-c header files and needs to be linked against libclang.so).
I am assuming that the end user has clang++ installed. However, I don't know where that installation is. For example, when I installed clang++ locally, even though it installed all the headers and the library I need, compiling a blank C++ file that has
#include <clang-c/CXCompilationDatabase.h>
won't find the header; I need to explicitly provide the path via a command-line argument or CPLUS_INCLUDE_PATH.
Now I need some way for the script that pip invokes to find those headers. I could obviously ask the user to set CPLUS_INCLUDE_PATH and LD_LIBRARY_PATH to include the clang++ paths before running the installation, but that seems ugly. I could add the headers and the library to my package, but I would rather build against the version that the user has. Is there a way to find a clang++ installation if the user has one (or at least if they installed it via apt-get or another package manager), or in general, what is the correct way to solve this issue when building a pip package?
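One approach I'm considering is to query the user's llvm-config at build time, roughly like this (a sketch assuming llvm-config is on PATH; the module and source names are placeholders):

import subprocess
from setuptools import setup, Extension

def llvm_config(flag):
    # ask the user's LLVM installation for its paths; assumes llvm-config is on PATH
    return subprocess.check_output(["llvm-config", flag]).decode().strip()

ext = Extension(
    "myclangtool",                               # placeholder module name
    sources=["src/tool.cpp"],                    # placeholder source file
    include_dirs=[llvm_config("--includedir")],  # where the clang-c/ headers live
    library_dirs=[llvm_config("--libdir")],      # where libclang.so lives
    libraries=["clang"],                         # link against libclang
)

setup(name="myclangtool", version="0.1", ext_modules=[ext])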

Should I prepare eggs and zip in addition to tarball for PyPI?

I'm preparing my first package distribution for PyPI using setuptools etc. I've gotten the source distribution (.tar.gz) working well and now I'm wondering whether I should provide any other formats, like .zip and .egg.
On a random walk through PyPI I noticed some projects that provided eggs for different Python versions along with the source tarball. And I noticed one or two that provided the source distribution in .zip form.
What's the best practice for which formats to upload to PyPI? And do the various installation programs (easy_install, pip) have a preference for which one they use?
Current best practice is to add source distribution (sdist) and wheel (bdist_wheel). Look at what numpy does, for example.
If you have things to be built, you might save your users major headaches by providing a wheel. Compare installing numpy with a wheel and without one (e.g. by installing a version for which no wheel was created for your Python version): without the wheel, you need a Fortran compiler and BLAS libraries installed and usable; with the wheel, you don't even have to know that numpy uses Fortran.
If you have a pure Python package, the wheel format still has the advantage that it can be installed simply by copying the code to the correct location. I'm not sure how big this advantage is, though.
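For reference, both artifacts can be built in one step (assuming setuptools and the wheel package are installed), and both end up in dist/:

pip install wheel
python setup.py sdist bdist_wheel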
See also: https://packaging.python.org/guides/distributing-packages-using-setuptools/#wheels
If you build no C extensions, just ship the source tarball in tar.gz format. It can be installed on all platforms. A binary egg (installable by easy_install, not pip) is only necessary if you build C extensions and your software runs on Windows, as people seldom have a C compiler on their machines.

What exactly does distutils do?

I have read the documentation but I don't understand.
Why do I have to use distutils to install Python modules?
Why can't I just save the modules in the Python path?
You don't have to use distutils. You can install modules manually, just like you can compile a C++ library manually (compile every implementation file, then link the .obj files) or install an application manually (compile, put it into its own directory, add a shortcut for launching). It just gets tedious and error-prone, like every repetitive task done manually.
Moreover, the manual steps I listed for the examples are pretty optimistic; often you want to do more. For example, PyQt adds the .ui-to-.py compiler to the path so you can invoke it via the command line.
So you end up with a stack of work that could be automated. This alone is a good argument.
Also, the devs would have to write installation instructions. With distutils etc., you only have to specify what your project consists of (and fancy extras if and only if you need them); for example, you don't need to tell it to put everything in a new folder in site-packages, because it already knows this.
So in the end, it's easier for developers and for users.
What Python modules? To install a Python package that exists on PyPI, you should do:
pip install <name_of_package>
If not, you should download the .tar.gz (or whatever archive it comes in), see if you find a setup.py, and run it like this:
python setup.py install
or, if you want to install it in development mode (so you can change the package and see the result without installing it again):
python setup.py develop
This is the usual way to distribute a Python package (the setup.py), and this setup.py is what calls distutils.
To summarize: distutils is a Python package that helps developers create a package installer that will build and install a given package by just running python setup.py install.
So basically, this is what distutils does (listing only the important parts):
it searches for the package's dependencies (and installs them automatically);
it copies the package modules into site-packages, or just creates a symlink if it's in develop mode;
it can create an egg of your package;
it can also run tests over your package;
you can use it to upload your package to PyPI.
If you want more detail, see http://docs.python.org/library/distutils.html
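For illustration, a minimal setup.py is little more than a call to distutils (the names here are placeholders):

from distutils.core import setup

setup(
    name="mymodule",          # placeholder distribution name
    version="1.0",
    py_modules=["mymodule"],  # installs mymodule.py into site-packages
)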
You don't have to use distutils to get your own modules working on your own machine; saving them in your python path is sufficient.
When you decide to publish your modules for other people to use, distutils provides a standard way for them to install your modules on their machines. (The "dist" in "distutils" means distribution, as in distributing your software to others.)

Python packages installation in Windows

I recently began learning Python, and I am a bit confused about how packages are distributed and installed.
I understand that the official way of installing packages is distutils: you download the source tarball, unpack it, and run python setup.py install; the module then automagically installs itself.
I also know about setuptools, which comes with the easy_install helper script. It uses eggs for distribution and, from what I understand, is built on top of distutils and does the same thing as above, plus it takes care of any required dependencies, all fetched from PyPI.
Then there is also pip, which I'm still not sure how it differs from the others.
Finally, as I am on a Windows machine, a lot of packages also offer binary builds through a Windows installer, especially the ones that require compiling C/Fortran code, which would otherwise be a nightmare to compile manually on Windows (assuming you have an MSVC or MinGW/Cygwin dev environment with all the necessary libraries set up; nonetheless, try to build numpy or scipy yourself and you will understand!).
So can someone help me make sense of all this and explain the differences and pros/cons of each method? I'd like to know how each keeps track of packages (Windows registry, config files, ...), and in particular how to manage all my third-party libraries (list installed packages, disable/uninstall, etc.).
I use pip, and not on Windows, so I can't provide comparison with the Windows-installer option, just some information about pip:
Pip is built on top of setuptools, and requires it to be installed.
Pip is a replacement (improvement) for setuptools' easy_install. It does everything easy_install does, plus a lot more (make sure all desired distributions can be downloaded before actually installing any of them to avoid broken installs, list installed distributions and versions, uninstall, search PyPI, install from a requirements file listing multiple distributions and versions...).
Pip currently does not support installing any form of precompiled or binary distributions, so any distributions with extensions requiring compilation can only be installed if you have the appropriate compiler available. Supporting installation from Windows binary installers is on the roadmap, but it's not clear when it will happen.
Until recently, pip's Windows support was flaky and untested. Thanks to a lot of work from Dave Abrahams, pip trunk now passes all its tests on Windows (and there's a continuous integration server helping us ensure it stays that way), but a release has not yet been made including that work. So more reliable Windows support should be coming with the next release.
All the standard Python package installation mechanisms store all metadata about installed distributions in a file or files next to the actual installed package(s). Distutils uses a distribution_name-X.X-pyX.X.egg-info file, pip uses a similarly-named directory with multiple metadata files in it. Easy_install puts all the installed Python code for a distribution inside its own zipfile or directory, and places an EGG-INFO directory inside that directory with metadata in it. If you import a Python package from the interactive prompt, check the value of package.__file__; you should find the metadata for that package's distribution nearby.
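For example (with a hypothetical installed package; the path will differ on your system):

>>> import somepackage
>>> somepackage.__file__
'/usr/lib/python2.7/site-packages/somepackage/__init__.py'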
Info about installed distributions is only stored in any kind of global registry by OS-specific packaging tools such as Windows installers, Apt, or RPM. The standard Python packaging tools don't modify or pay attention to these listings.
Pip (or, in my opinion, any Python packaging tool) is best used with virtualenv, which allows you to create isolated per-project Python mini-environments into which you can install packages without affecting your overall system. Every new virtualenv automatically comes with pip installed in it.
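A typical workflow looks like this (the environment name is a placeholder):

virtualenv myproject-env
source myproject-env/bin/activate    # on Windows: myproject-env\Scripts\activate
pip install some-distribution        # installs into the isolated environment only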
A couple other projects you may want to be aware of as well (yes, there's more!):
distribute is a fork of setuptools which has some additional bugfixes and features.
distutils2 is intended to be the "next generation" of Python packaging. It is (hopefully) adopting the best features of distutils/setuptools/distribute/pip. It is being developed independently and is not ready for use yet, but eventually should replace distutils in the Python standard library and become the de facto Python packaging solution.
Hope all that helped clarify something! Good luck.
I use Windows and Python. It is somewhat frustrating, because pip doesn't always manage to install things. Python is moving to pip, so I still use it. Pip is nice because you can uninstall items and use:
pip freeze > requirements.txt
pip install -r requirements.txt
Another reason I like pip is for virtual environments, like venv with Python 3.4. I have found venv a lot easier to use on Windows than virtualenv.
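For example, on Windows with Python 3.4 (the environment name is a placeholder):

python -m venv myenv
myenv\Scripts\activate
pip install some-package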
If you cannot install a package, you have to find the binary for it: http://www.lfd.uci.edu/~gohlke/pythonlibs/
I have found these binaries to be very useful.
Pip is adopting something called a wheel for binary installations:
pip install wheel
wheel convert path\to\binary.exe
pip install converted_wheel.whl
You will also have to do this for any required libraries that do not install and are required for that package.
