How to construct a Python package relying on large system libraries

How to construct a Python package relying on large system libraries - python

What's the appropriate way of constructing a Python package via disutils when that Python package relies on a large system library?
I found this similar question, but it refers to an installable Python app, not a generic package.
I've written a package that relies on OpenCV. I'm only concerned with supporting Linux distros, but most distros either don't provide OpenCV or provide a version that's too old to use. Unfortunately, OpenCV is to large and cumbersome (and depends on several other system libraries) to include in the package and compile during the build step.
My current approach is to simply do nothing special in my setup.py and just import its Python modules in a try/except, showing a detailed error message if the import fails. Is there a better way?

You could use zc.buildout: http://www.buildout.org/
You should be able to extend the buildout config for your project from this one: https://github.com/cpsaltis/opencv-buildout

Related

Importing cython generated *.so-module with another python-version or on another OS

How should a file myModule.cpython-35m-x86_64-linux-gnu.so be imported in python? Is it possible?
I tried the regular way:
import myModule
and the interpreter says:
`ModuleNotFoundError: No module named 'myModule'`
This is a software that I can't install in the cluster that I am working at so I just extracted the .deb package and it does not have a wheel file or structure to install.

It is problematic to use a C-extension built for one Python version in another Python version. Normally (at least for Python3) there is a mechanism in place to differentiate C-extensions for different Python versions, so they can co-exist in the same directory.
In your example, the suffix is cpython-35m-x86_64-linux-gnu so this C-extension will be picked up by a CPython3.5 on a x86_64 Linux. If you try to import this extension with another Python-version or on another plattform, the module isn't visible and ModuleNotFoundError is raised.
It is possible to see, which suffixes are accepted by the current Python version, e.g. via:
>>> import _imp
>>>_imp.extension_suffixes()
['.cpython-36m-x86_64-linux-gnu.so', '.abi3.so', '.so']
A possibility is to use the stable C-API which could be used with multiple Python versions without recompilation. Cython start to support it in version 3.0 (see this PR), see also this SO-post about setuptools and stable C-API.
One might want to be clever and rename the extension to simple .so, so it can be picked up by the Finder - this can/does work for some Python-version combinations on some platforms for some extension - yet this approach cannot be sustained in the long run and is not the right thing to do.
The right thing to do, is to build the C-extension for/with the right Python-version on the right OS/platform or to use the right wheel (or use stable C-API).
In general, a C-extension built for a python-version (let's say PythonA.B) cannot be used by another Python version (let's say PythonC.D), because those extensions/modules are linked against a special Python-library and the needed functionality might no longer/not yet be present in the library of another version.
This different to *.py-files and more similar to *.pyc-files which cannot be used with a different version.
While PEP-3147 regulates the suffices of *.pyc-files, PEP-3149 does the same for the C-extensions. PEP-3149 is however not the state-of-the-art, as some of the problems where fixed only in Python3.5, the whole discussion can be found here.

Distributing a Python module with ffmpeg dependency

Rookie software developer here. I’m working on a Python module that harnesses some functionality from the FFmpeg framework - specifically, the ebur128 filter function. Ideally the module will stand on its own as an independent, platform agnostic tool for verifying that audio clips comply with EBU loudness standards. It’s being designed so that end users need only perform one simple, (hopefully!) painless installation procedure, which will encompass the installation of both the FFmpeg libraries and my Python wrapper/GUI.
I apologize for the rather vague question, but does anyone have general advice for creating Python module with external dependencies, or specific advice for standardizing the FFmpeg installation across platforms? Distutils seems pretty helpful – are there other guidelines or standard practices for developing a neatly packaged Python tool? I want to minimize any installation headaches for end users.
Thanks very much.

For Windows
I think it will be easy to find ffmpeg binaries that work on any system, just like for Qt or whatever GUI library you are using. You can ship these binaries with your project and things will work (you may want to distinguish 32 bit and 64 bit systems, though).
It looks like you want to create a software that is self-contained and easily installable for end-users. Inkscape is such an example -- its installer contains Python and all other dependencies, in binary form (if required). That is, for Windows, you do not need to create a real Python package (which would allow installation with pip), and you do not need to look into distutils (which supports building C extensions). Both you do not need/want, I guess.
Maybe it will be enough for you to assemble a good directory structure and to distribute a ZIP archive with your software. This is enough if you do not need to interact with the Windows registry, for instance. Such programs are usually called "standalone", in the Windows world. However, you might still want to have a real Windows installer (even if it is just a self-extracting archive). The following article covers your requirements, I believe: http://cyrille.rossant.net/create-a-standalone-windows-installer-for-your-python-application/
It suggests using http://www.jrsoftware.org/isinfo.php for creating such an installer.
Other platforms
On other operating systems it will be more difficult. For instance, I think it will be almost impossible to create ffmpeg binaries that run on every Linux system, because ffmpeg itself has so many binary dependencies. I do not know whether you can statically build ffmpeg at all.

How can I build an RPM for an earlier version of python?

How can I build a python distribution RPM that is only dependent on an earlier version of python?
Why? I'm trying to build a distribution RPMs for RHEL6/CentOS 6, which only includes Python 2.6, but I am building usually on machines with Python 2.7.
This is an open source project, and I have already ensured that it shouldn't be including any libraries/APIs that are not in 2.6.
I am building the RPMs with:
python setup.py bdist_rpm
setup.py file:
from distutils.core import setup
setup(name='pyresttest',
version='0.1',
description=Text',
maintainer='Not listing here',
maintainer_email='no,just no',
url='project url here',
keywords='rest web http testing',
packages=['pyresttest'],
license='Apache License, Version 2.0',
requires=['yaml','pycurl']
)
(Specifics removed for the url, maintainer, email and description).
The RPM appears to be valid, but when I try to install on RHEL6, I get this error:
python(abi) = 2.7 is needed by pyresttest-0.1-1.noarch
There should be some way to get it to override the default python version to require, or supply a custom SPEC file, but after several hours of fiddling with it, I'm stuck. Ideas?
EDIT: I suppose I should clarify why I'm doing a RPM for python code, instead of just using setuptools or pip: this will hopefully go to production at work, where all deployments are RPM-based and most VMs are still RHEL6. Asking them to adopt another packaging tool is likely to be a non-starter, since our company is closely tied to the RPM format.

Re-organized the answer.
Actually, there's no "rpm-package". There're rpm-packages for RHEL6, rpm-packages for FedoraNN, rpm-packagse for OpenSUSE-X.Y and so on. And besides there're Debian, Ubuntu, Arch and Gentoo :)
You have the following possibilities with your Python package:
You may completely avoid rpm-, deb- and other "native linux packaging systems", and may opt to use a "python-native" packaging system like PIP. Thus you completely avoid the complexity and lack of compatibility between packaging systems in various versions and various flavours of Linux. And for a package which doesn't "infiltrate" deeply into "core system", this could be the best solution.
You may continue to use RPM as an archive format for your package but completely turn off automatic dependency calculations. This can be done with AutoReqProv: no directive in the spec. To be able to work with a customized spec one may use --spec-only and --spec-file distutils options. But remember that a package built this way is even worse than a zip from p.1: without proper dependencies it contains less necessary metainformation and thus "defames" the whole idea behind Linux packaging systems which were invented to built consistent systems, to avoid problems like "DLL hell" and to be suitable for automatic maintainance and updates. Actually you may add dependency information manually, via Requires: <something> tag but this may become even more hard and bporing if you target several Linux platforms at once.
In order to take into account all those complex and boring details and nuances of a particular package system you may create "build sandboxes" with appropriate versions of necessary Linux flavours. My preferred way to create such sandboxes is to use pre-created "OpenVZ templates", but without OpenVZ per se: simply unpack a given archive into a subdirectory (being root to preserve permissions), then chroot into the subdirectory, and voila! you've got Debian, RHEL etc... Fedora people have created Mock for the same purposes and likely Mock would be a more elaborated solution. As #BobMcGee suggests in the comment one also may consider Jenkins Docker plugin
Once you have a build sandbox with python distribution specific to that system, distutils etc you may automate the build process using simple scripting, bash or python.
That's it.

I do not do very much python work but have done some RPM packaging. You probably need to somehow do what one would normally do in the RPM's spec file and specify and require a particular release of your python package like so ...
# this would be in your spec file
requires: python <= 2.6
Take a look here for more info:
http://ftp.rpm.org/max-rpm/s1-rpm-depend-manual-dependencies.html

Deploy Python programs on Windows and fetch big library dependencies

I have some small Python programs which depend on several big libraries, such as:
NumPy & SciPy
matplotlib
PyQt
OpenCV
PIL
I'd like to make it easier to install these programs for Windows users. Currently I have two options:
either create huge executable bundles with PyInstaller, py2exe or similar tool,
or write step-by-step manual installation instructions.
Executable bundles are way too big. I always feel like there is some magic happening, which may or may not work the next time I use a different library or a new library version. I dislike wasted space too. Manual installation is too easy to do wrong, there are too many steps: download this particular interpreter version, download numpy, scipy, pyqt, pil binaries, make sure they all are built for the same python version and the same platform, install one after another, download and unpack OpenCV, copy its .pyd file deep inside Python installation, setup environment variables and file asssociations... You see, few users will have the patience and self-confidence to do all this.
What I'd like to do: distribute only a small Python source and, probably, an installation script, which fetches and installs all the missing dependencies (correct versions, correct platform, installs them in the right order). That's a trivial task with any Linux package manager, but I just don't know which tools can accomplish it on Windows.
Are there simple tools which can generate Windows installers from a list of URLs of dependencies1?
1 As you may have noticed, most of the libraries I listed are not installable with pip/easy_install, but require to run their own installers and modify some files and environment variables.

npackd exists http://code.google.com/p/windows-package-manager/ It could be done through here or use distribute (python 3.x) or setuptools (python 2.x) with easy_install, possibly pip (don't know it's windows compatibility). But I would choose npackd because PyQt and it's unusual setup for pip/easy_install (doesn't play with them nicely, using a configure.py instead of setup.py). Though you would have to create your own repo for npackd to use for some of them. I forget what is contributed in total for python libs with it.

AFAIK there is no tool (and I'd assume you googled), so you must make one yourself.
Fetching the proper library versions seems simple enough -- using python's ftplib you can fetch the proper installers for every library. How would you know which version is compatible with the user's python? You can store different lists of download URLs, each for a different python version (this method came off the top of my head and there is probably a better way; not that it matters much if it's simple and it works).
After you figure out how to make each installer run, you can py2exe your installer script, and even use it to fetch the program itself.
EDIT
Some Considerations
There are a couple of things that popped into my mind just as I posted:
First, some pseudocode (how I would approach it, anyway)
#first, we check modules
try:
import numpy
except ImportError:
#flag numpy for installation
#lather, rinse repeat for all dependencies
#next we check version compatibility -- note that if a library version you need
#is not backwards-compatible, you're in DLL hell, and there is little we can do.
<insert version-checking code here>
#once you have your unavailable dependencies, you install them
import ftplib
<all your file-downloading here>
#now you install. sorry I can't help you here.
There are a few things you can do to make your utility reusable --
put all URL lists, minimum version numbers, required library names etc in config files
Write a script which helps you set up an installer
Py2exe the installer-maker-script
Sell it
Even better, release it under GPL so we can all feast upon fruits of your labours.

I have a similar need as you, but in addition I need the packaged application to work on several platforms. I'm currently exploring the currently available solutions, here are a few interesting ones:
Use SnakeBasket, which wraps around Pip and add a recursive dependency resolution plus a heuristic to choose the right version when there are conflicts.
Package all dependencies as an egg, but not your sourcecode which will still be editable: https://stackoverflow.com/a/528064/1121352
Package all dependencies in a zip file and directly import the modules on the fly: Cross-platform alternative to py2exe or http://davidf.sjsoft.com/mirrors/mcmillan-inc/install1.html
Using buildout: http://www.buildout.org/en/latest/install.html
Using virtualenv with virtualenv-tools (instead of "relocate")
If your main problem when freezing your code using PyInstaller or similar is that you end up with a big single file, you can customize the process so that you get several files, one for each dependency, instead of one big executable.
I will update here if I find something that fills my bill.

Should I bundle C libraries with my Python application?

If I have a Python package that depends on some C libraries (like say the Gnu Scientific Library (GSL) for numerical computations), is it a good idea to bundle the library with my code?
I'd like to make my package as easy to install as possible for users and I don't want them to have to download C libraries by hand and supply include-paths. Also I could always ensure that the version of the library that I ship is compatible with my code.
However, is it possible that there are clashes if the user has the library installed already, or ar there any other reasons why I shouldn't do this?
I know that I can make it easier for users by just providing a binary distribution, but I'd like to avoid having to maintain binary distributions for all possible OSs. So, I'd like to stick to a source distribution, but for the user (who proudly owns a C compiler) installation should be as easy as python setup.py install.

Distribution is one of the hard parts for any software project. Java and .NET lift part of this burden by defining a standard runtime and then just saying "just distribute everything else." Of course there's a drawback: everything must be rewritten in a language supported by the runtime - as soon as you want to use native code, you lose all the advantages.
That's harder in Python, as it is in Ruby, C, C++ and other languages, as they usually leverage existing native libraries.
Generally speaking:
Make it possible to get a source sdist, via pypi.python.org as an example. Correctly set your install_requires (probably you'll require python bindings for GSL, not GSL itself). Use standard setuptools/distribute layout. This will let anyone - let's say a package maintainer for any distro - to pick up your software and package it.
Additionally, consider providing a full-blown installable package for your audience. You don't have to support all the distros and operating system; pick one or two that you consider will be used most. Tools like PyInstaller will let you create an installable, runnable package for many operating systems, but especially for linux you might want the user to install the distribution's own version of transitive deps (libgsl?) - you'll need a full-blown deb or rpm package to satisfy that - again, don't try supporting any and all the distro, you'll turn out mad. Support something you most use, and let other users to help you with other packaging needs.
Also take a look at Python Packaging Guide

You could have two separate branches of the src, one containing the libraries and another that doesn't. That way you can explicitly warn your users in case they have installed the libraries. Another solution could be (if the licences of the libraries allow you) is to wrap 'em up in a single file.
I think there's no unique solution, but this are the ideas I could think so far.
Good luck

You can use virtualenv to create a private Python environment for your application. This avoids conflicts with other libraries. It is best if you package modules and dependencies such as your libraries using Distribute. Distutils is something else that is worth researching.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.