Include run-time dependencies in Python wheels

I'd like to distribute a whole virtualenv, or a bunch of Python wheels of exact versions with their runtime dependencies, for example:
pycurl
pycurl.so
libcurl.so
libz.so
libssl.so
libcrypto.so
libgssapi_krb5.so
libkrb5.so
libresolv.so
I suppose I could rely on the system to have libssl.so installed, but surely not libcurl.so of the correct version and probably not Kerberos.
What is the easiest way to package one library in a wheel together with all of its run-time dependencies?
Or is that a fool's errand and should I package the entire virtualenv?
How to do that reliably?
P.S. Compiling on the fly is not an option; some modules are patched.

AFAIK, there is no good standard way to portably install dependencies with your package. Continuum has made conda for precisely this purpose. The numpy guys wrote their own distutils submodule in their package to install some complicated dependencies, and now at least some of them advocate conda as a solution. Unfortunately, you may have to make conda packages for some of these dependencies yourself.
If you're fine without portability, then targeting the package manager of the target machines will obviously work. Otherwise, for a portable package manager, conda is the only option I know of.
Alternatively, from your post ("compiling on the fly is not an option") it sounds like portability may not be an issue for you, in which case you could also install all the requirements into a prefix directory (most installers I've come across support a configure --prefix=/some/dir/ option). If you have a guaranteed single architecture, you could probably prefix-install all your dependencies into a single directory and pass that directory around like a file. The conda approach would probably be cleaner, but I've used prefix installs quite a bit and they tend to be one of the easiest solutions to get going.
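To give a rough idea of the prefix-install route, here is a minimal launcher sketch. It assumes the patched shared libraries (libcurl.so, libssl.so, ...) have been prefix-installed into a bundle/lib directory shipped next to the script; the directory layout and names are purely illustrative, not a drop-in solution:

# launcher.py - minimal sketch; "bundle/lib" is a hypothetical location for
# the prefix-installed, patched shared libraries shipped with the app.
import os
import sys

BUNDLE_LIB = os.path.join(os.path.dirname(os.path.abspath(__file__)), "bundle", "lib")

if os.environ.get("_BUNDLE_LIBS_READY") != "1":
    # The dynamic loader reads LD_LIBRARY_PATH at process start-up, so prepend
    # the bundled lib directory and re-exec the interpreter exactly once.
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = BUNDLE_LIB + os.pathsep + env.get("LD_LIBRARY_PATH", "")
    env["_BUNDLE_LIBS_READY"] = "1"
    os.execve(sys.executable, [sys.executable] + sys.argv, env)

# From here on, importing the extension resolves libcurl.so and friends
# from BUNDLE_LIB instead of the system copies.
import pycurl
print(pycurl.version)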
Edit:
As for conda, it is simultaneously a package manager and a "virtualenv"-like environment/Python install. While virtualenv is added on top of an existing Python install, conda takes over the whole install, so you can be more sure that all the dependencies are accounted for. Compared to pip, it is designed for adding generalized non-Python dependencies, instead of just compiling C/C++ extensions. For more info, see:
pip vs conda (also recommends buildout as a possibility)
conda as a python install
As for how to use conda for your purpose, the docs explain how to create a recipe:
Conda build framework
Building a package requires a recipe. A recipe is a flat directory which contains the following files:
meta.yaml (metadata file)
build.sh (Unix build script which is executed using bash)
bld.bat (Windows build script which is executed using cmd)
run_test.py (optional Python test file)
patches to the source (optional, see below)
other resources, which are not included in the source and cannot be
generated by the build scripts.
The same recipe should be used to build a package on all platforms.
When building a package, the following steps are invoked:
read the metadata
download the source (into a cache)
extract the source in a source directory
apply the patches
create a build environment (build dependencies are installed here)
run the actual build script. The current working directory is the source
directory with environment variables set. The build script installs into
the build environment
do some necessary post processing steps: shebang, rpath, etc.
add conda metadata to the build environment
package up the new files in the build environment into a conda package
test the new conda package:
create a test environment with the package (and its dependencies)
run the test scripts
There are example recipes for many conda packages in the conda-recipes repo (https://github.com/continuumio/conda-recipes).
The conda skeleton command can help to make skeleton recipes for common repositories, such as PyPI (https://pypi.python.org/pypi).
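For example, the optional run_test.py mentioned in the list above is just an ordinary Python script that conda build runs in the test environment after packaging; a minimal sketch for a hypothetical pycurl recipe (the package name and checks are assumptions) could be:

# run_test.py - illustrative only; conda build executes this in a fresh
# test environment with the new package installed.
import pycurl

# The import alone proves the compiled extension and its linked libcurl load.
print("pycurl:", pycurl.version)
assert "libcurl" in pycurl.version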
Then, as a client, you would install the package similarly to how you would install from pip.
Lastly, docker may also be interesting to you, though I haven't seen it much used for Python.

You may want to look into PEX: https://pex.readthedocs.io/en/stable/whatispex.html
'Files with the .pex extension – “PEX files” or ”.pex files” – are self-contained executable Python virtual environments. PEX files make it easy to deploy Python applications: the deployment process becomes simply scp.'

Related

How to create a python package built with all dependencies [isolation]

TL;DR:
The package ColorMe depends on mypy 3.
The package ColorMe is installed in project Rainbow, and Rainbow depends on mypy 2.
How can I distribute ColorMe with all of its dependencies, so those who use ColorMe don't need to follow its dependency versions?
Complete Story:
I have created a project validator (code linter, env checker, etc.) in my company, which needs to be installed as a dependency in your Python solution (normally via pip or poetry). Currently, it's packaged and distributed as a whl file via poetry build.
However, it's a project that keeps growing and has a lot of dependencies, and it's starting to conflict with the packages already present in the developers' solutions.
I don't want my package to influence their project, as it is meant to be used as a stand-alone CLI and not as source code for their project.
Is it possible to create a build or something that isolates the dependencies?
So that when devs install my CLI tool (with everything needed for it), which depends on mypy, flake8, etc., the main project doesn't become dependent on those packages as well?

Build conda package upon installation

So I have published a conda package (link).
This package contains .c extensions (coming from cython code), which need to be compiled when the package is installed. My problem is that none of the extensions are compiled when running the install command
conda install -c nicolashug scikit-surprise
Compiling the extensions can be done by simply running
python setup.py install
which is exactly what pip does. The package is on PyPI and works fine.
As far as I understand, this setup.py command is only called when I build the conda package using conda build: the meta.yaml file (created with conda skeleton) contains
build:
  script: python setup.py install --single-version-externally-managed --record=record.txt
But I need this to be done when the package is installed, not built.
Reading the conda docs, it looks like the install process is merely a matter of copying files:
Installing the files of a conda package into an environment can be thought of as changing the directory to an environment, and then downloading and extracting the .zip file and its dependencies
That would mean I would have to build the package for every platform and architecture, and then upload them all to Anaconda Cloud... which is impossible for me.
So, is there a way to build the package when it is installed, just like pip does?
As far as I know, there is no way to have the compilation happen on the user's machine when installing a conda package. Indeed, the whole idea of a conda package is that you do the compiling so that I don't have to on my machine, and all that's distributed is the compiled library. On Windows in particular, setting up compilers so they work properly (with Python) is a big big PITA, which is one of the biggest reasons for conda (and also wheels installed by pip).
If you don't have access to a particular OS directly, you can use Continuous Integration (CI) services, such as Appveyor (Windows), Travis CI (Linux/macOS), or CircleCI (Linux/macOS) to build packages and upload them to Anaconda cloud (or to PyPI for that matter). These services integrate directly with GitHub and other code-sharing services, and are generally free for FOSS projects. That way, you can build packages on each commit, on each tag, or some other variation that you desire.
In the end, you may save more time by setting up these services, because you won't have to provide compiler support for users who can't install a source package from PyPI.

What exactly does distutils do?

I have read the documentation but I don't understand.
Why do I have to use distutils to install Python modules?
Why can't I just save the modules in the Python path?
You don't have to use distutils. You can install modules manually, just like you can compile a C++ library manually (compile every implementation file, then link the .obj files) or install an application manually (compile, put it into its own directory, add a shortcut for launching). It just gets tedious and error-prone, as does every repetitive task done manually.
Moreover, the manual steps I listed for the examples are pretty optimistic - often, you want to do more. For example, PyQt adds the .ui-to-.py-compiler to the path so you can invoke it via command line.
So you end up with a stack of work that could be automated. This alone is a good argument.
Also, the devs would have to write installation instructions. With distutils etc., you only have to specify what your project consists of (and fancy extras only if you need them) - for example, you don't need to tell it to put everything into a new folder in site-packages, because it already knows this.
So in the end, it's easier for developers and for users.
What Python modules? For installing a Python package, if it exists on PyPI you should do:
pip install <name_of_package>
If not, download the .tar.gz (or whatever archive it comes in), check that it contains a setup.py, and run it like this:
python setup.py install
Or, if you want to install it in development mode (you can change the package and see the result without installing it again):
python setup.py develop
This is the usual way to distribute a Python package (the setup.py), and this setup.py is the one that calls distutils.
To summarize: distutils is a Python package that helps developers create a package installer which builds and installs a given package by just running python setup.py install.
So basically, what distutils does (listing only the important parts):
it searches for the package's dependencies (and installs them automatically);
it copies the package modules into site-packages, or just creates a symlink if it's in develop mode;
you can create an egg of your package;
it can also run tests over your package;
you can use it to upload your package to PyPI.
If you want more detail, see http://docs.python.org/library/distutils.html
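To make that concrete, here is a minimal setup.py sketch (the project and module names are placeholders, not anything standard):

# setup.py - minimal distutils sketch; "mypackage" is a hypothetical package
# directory containing an __init__.py.
from distutils.core import setup

setup(
    name="mypackage",
    version="0.1",
    description="Example package",
    packages=["mypackage"],
)

With that file in place, python setup.py install copies mypackage into site-packages, and python setup.py sdist produces a source tarball you can hand to others.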
You don't have to use distutils to get your own modules working on your own machine; saving them in your python path is sufficient.
When you decide to publish your modules for other people to use, distutils provides a standard way for them to install your modules on their machines. (The "dist" in "distutils" means distribution, as in distributing your software to others.)
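As a small illustration of the "just put it on the path" approach (the directory and module names here are made up):

import sys

# Make a directory containing mymodule.py importable without installing anything.
sys.path.insert(0, "/home/me/my_modules")
import mymodule

This works fine on your own machine; distutils only starts to pay off when other people need to install the same code.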

Python packages installation in Windows

I recently began learning Python, and I am a bit confused about how packages are distributed and installed.
I understand that the official way of installing packages is distutils: you download the source tarball, unpack it, and run python setup.py install; then the module will automagically install itself.
I also know about setuptools, which comes with the easy_install helper script. It uses eggs for distribution, and from what I understand, is built on top of distutils and does the same thing as above, plus it takes care of any required dependencies, all fetched from PyPI.
Then there is also pip, which I'm still not sure how it differs from the others.
Finally, as I am on a Windows machine, a lot of packages also offer binary builds through a Windows installer, especially the ones that require compiling C/Fortran code, which would otherwise be a nightmare to compile manually on Windows (assuming you have an MSVC or MinGW/Cygwin dev environment with all the necessary libraries set up... nonetheless, try to build numpy or scipy yourself and you will understand!).
So can someone help me make sense of all this and explain the differences and pros/cons of each method? I'd like to know how each keeps track of packages (Windows Registry, config files, ...). In particular, how would you manage all your third-party libraries (be able to list installed packages, disable/uninstall, etc.)?
I use pip, and not on Windows, so I can't provide comparison with the Windows-installer option, just some information about pip:
Pip is built on top of setuptools, and requires it to be installed.
Pip is a replacement (improvement) for setuptools' easy_install. It does everything easy_install does, plus a lot more (make sure all desired distributions can be downloaded before actually installing any of them to avoid broken installs, list installed distributions and versions, uninstall, search PyPI, install from a requirements file listing multiple distributions and versions...).
Pip currently does not support installing any form of precompiled or binary distributions, so any distributions with extensions requiring compilation can only be installed if you have the appropriate compiler available. Supporting installation from Windows binary installers is on the roadmap, but it's not clear when it will happen.
Until recently, pip's Windows support was flaky and untested. Thanks to a lot of work from Dave Abrahams, pip trunk now passes all its tests on Windows (and there's a continuous integration server helping us ensure it stays that way), but a release has not yet been made including that work. So more reliable Windows support should be coming with the next release.
All the standard Python package installation mechanisms store all metadata about installed distributions in a file or files next to the actual installed package(s). Distutils uses a distribution_name-X.X-pyX.X.egg-info file, pip uses a similarly-named directory with multiple metadata files in it. Easy_install puts all the installed Python code for a distribution inside its own zipfile or directory, and places an EGG-INFO directory inside that directory with metadata in it. If you import a Python package from the interactive prompt, check the value of package.__file__; you should find the metadata for that package's distribution nearby.
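For example, using pip itself as the distribution to inspect (any installed package works; the exact paths will differ from system to system):

import os
import pip

print(pip.__file__)  # e.g. .../site-packages/pip/__init__.py
# The parent of the package directory is site-packages; a pip-X.X...egg-info
# file or directory holding the metadata should be sitting right next to it.
print(os.listdir(os.path.dirname(os.path.dirname(pip.__file__))))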
Info about installed distributions is only stored in any kind of global registry by OS-specific packaging tools such as Windows installers, Apt, or RPM. The standard Python packaging tools don't modify or pay attention to these listings.
Pip (or, in my opinion, any Python packaging tool) is best used with virtualenv, which allows you to create isolated per-project Python mini-environments into which you can install packages without affecting your overall system. Every new virtualenv automatically comes with pip installed in it.
A couple other projects you may want to be aware of as well (yes, there's more!):
distribute is a fork of setuptools which has some additional bugfixes and features.
distutils2 is intended to be the "next generation" of Python packaging. It is (hopefully) adopting the best features of distutils/setuptools/distribute/pip. It is being developed independently and is not ready for use yet, but eventually should replace distutils in the Python standard library and become the de facto Python packaging solution.
Hope all that helped clarify something! Good luck.
I use Windows and Python. It is somewhat frustrating, because pip doesn't always work to install things. Python is moving to pip, so I still use it. Pip is nice because you can uninstall packages and use
pip freeze > requirements.txt
pip install -r requirements.txt
Another reason I like pip is for virtual environments, like venv with Python 3.4. I have found venv a lot easier to use on Windows than virtualenv.
If you cannot install a package with pip, you have to find a binary for it: http://www.lfd.uci.edu/~gohlke/pythonlibs/
I have found these binaries to be very useful.
Pip is moving towards something called a wheel for binary installations.
pip install wheel
wheel convert path\to\binary.exe
pip install converted_wheel.whl
You will also have to do this for any required libraries that fail to install on their own and that the package needs.

using setuptools with post-install and python dependencies

This is somewhat related to this question. Let's say I have a package that I want to deploy via rpm because I need to do some file copying on post-install and I have some non-python dependencies I want to declare. But let's also say I have some python dependencies that are easily available in PyPI. It seems like if I just package as an egg, an unzip followed by python setup.py install will automatically take care of my python dependencies, at the expense of losing any post-install functionality and non-python dependencies.
Is there any recommended way of doing this? I suppose I could specify this in a pre-install script, but then I'm getting into information duplication and not really using setuptools for much of anything.
(My current setup involves passing install_requires = ['dependency_name'] to setup, which works for python setup.py bdist_egg and unzip my_package.egg; python my_package/setup.py install, but not for python setup.py bdist_rpm --post-install post-install.sh and rpm --install my_package.rpm.)
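(For reference, a stripped-down sketch of the kind of setup.py described above; the names are placeholders:

# setup.py - illustrative only.
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="1.0",
    packages=find_packages(),
    # Honoured when installing the egg via easy_install / setup.py install,
    # but not turned into RPM Requires: tags by bdist_rpm.
    install_requires=["dependency_name"],
)

As far as I know, bdist_rpm takes its own requires list, separate from install_requires, which is where the duplication mentioned above creeps in.)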
I think it would be best if your python dependencies were available as RPMs also, and declared as dependencies in the RPM. If they aren't available elsewhere, create them yourself, and put them in your yum repository.
Running PyPI installations as a side effect of RPM installation is evil, as it won't support proper uninstallation (i.e. uninstalling your RPM will remove your package, but leave the dependencies behind, with no proper removal procedure).
