If I have a Python package that depends on some C libraries (like say the Gnu Scientific Library (GSL) for numerical computations), is it a good idea to bundle the library with my code?
I'd like to make my package as easy to install as possible for users and I don't want them to have to download C libraries by hand and supply include-paths. Also I could always ensure that the version of the library that I ship is compatible with my code.
However, is it possible that there are clashes if the user already has the library installed, or are there any other reasons why I shouldn't do this?
I know that I can make it easier for users by just providing a binary distribution, but I'd like to avoid having to maintain binary distributions for all possible OSs. So, I'd like to stick to a source distribution, but for the user (who proudly owns a C compiler) installation should be as easy as python setup.py install.
Distribution is one of the hard parts for any software project. Java and .NET lift part of this burden by defining a standard runtime, so you only have to distribute your own code on top of it. Of course there's a drawback: everything must be written in a language supported by the runtime - as soon as you want to use native code, you lose all those advantages.
That's harder in Python, as in Ruby, C, C++ and other languages that usually leverage existing native libraries.
Generally speaking:
Make a source distribution (sdist) available, for example via pypi.python.org. Correctly set your install_requires (you'll probably require Python bindings for GSL, not GSL itself), and use the standard setuptools/distribute layout; a minimal setup.py sketch follows after these points. This will let anyone - say, a package maintainer for any distro - pick up your software and package it.
Additionally, consider providing a full-blown installable package for your audience. You don't have to support all distros and operating systems; pick one or two that you expect to be used most. Tools like PyInstaller will let you create an installable, runnable package for many operating systems, but especially on Linux you might want the user to install the distribution's own version of transitive dependencies (libgsl?) - you'll need a full-blown deb or rpm package to satisfy that. Again, don't try to support every distro, or you'll go mad; support the ones you use most, and let other users help you with the remaining packaging needs.
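For the first point, a minimal setup.py sketch might look like the following (the package name and the pygsl dependency are placeholders for whatever bindings you actually require):
from setuptools import setup, find_packages

setup(
    name="yourpackage",        # placeholder project name
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "pygsl",               # hypothetical: depend on the Python bindings, not on GSL itself
    ],
)
With that in place, installing from the sdist pulls the declared dependencies in automatically, and distro packagers can map them onto their own packages.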
Also take a look at the Python Packaging Guide.
You could have two separate branches of the source, one that contains the libraries and one that doesn't. That way you can explicitly warn your users in case they have the libraries installed already. Another solution (if the licences of the libraries allow it) is to wrap them up in a single file.
I don't think there's a single solution, but these are the ideas I could come up with so far.
Good luck
You can use virtualenv to create a private Python environment for your application. This avoids conflicts with other libraries. It is best if you package modules and dependencies such as your libraries using Distribute. Distutils is something else that is worth researching.
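For example (the environment and package names are just placeholders; python -m venv works much the same way on Python 3):
$ virtualenv myapp-env
$ myapp-env/bin/pip install yourpackage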
I am greatly confused with the process of building a python package that I want to distribute on pypi.
There are some specific, basic things that I did not understand:
What exactly is it that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, OS-specific builds from the same codebase?
How do I build the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many Python versions?
I am using a .toml file for the setup configuration.
I found some answers online, but they all refer to procedures involving either a setup.py or a setup.cfg.
What exactly is it that gets published? Binaries? Source code?
Yes, and yes. It depends on the details of your project and your package config. Arbitrary commands can be run during a package build. You might, for example, run a Fortran compiler locally and ship binaries, or you might insist that each person installing the package run their own local Fortran compiler.
We usually expect full *.py source code to appear on pypi.org. "Binaries" here usually means compiled machine code, not *.pyc bytecode files.
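For example, with a pyproject.toml-based project, the standard build frontend produces both forms in one go (the project name and version below are hypothetical):
$ python -m pip install build
$ python -m build
$ ls dist/
yourpackage-1.0.0.tar.gz  yourpackage-1.0.0-py3-none-any.whl
The .tar.gz sdist is the source form; the .whl wheel is the built form, and both can be uploaded to pypi.org (e.g. with twine).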
How do I build multiple platform-specific, OS-specific builds from the same codebase?
I have only done this via git pull on a target platform, followed by a local build, but there is certainly support for cross-target toolchains if you need that.
How do I build the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
Is it necessary if I want to support many python versions?
Typically the answer is "no". Pick a minimum required interpreter version, such as 3.7, and do all your development / testing / release work in that. Backward compatibility of interpreters is excellent; folks running 3.8 or 3.11 should have no trouble with your package.
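If you do pin a minimum version, you can declare it so installers enforce it. A setuptools sketch (the name is a placeholder; with a pyproject.toml the equivalent is the requires-python field):
from setuptools import setup

setup(
    name="yourpackage",
    python_requires=">=3.7",   # pip will refuse to install on older interpreters
)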
There can be a fly in the ointment. Suppose your project depends on library X, or depends on X which depends on Y, and one of them stopped being updated a few years ago, or went through a big change like a rename. Your users who are on 3.11 might find it inconvenient to obtain a compatible version of X or Y. This might motivate you to do split releases, for example via major version number or by slightly altering your project name. Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem is reasonably mature. It has tried to fix many of the rough edges surrounding the python packaging practices of the last few decades. I recommend that you prefer modern over ancient practices, and that you adopt poetry for your project.
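A typical poetry workflow is roughly this (the project name is a placeholder):
$ pip install poetry
$ poetry new yourpackage
$ poetry build      # produces an sdist and a wheel in dist/
$ poetry publish    # uploads them to PyPI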
If that won't fly for some reason, and especially if binaries are a big deal for your project, consider publishing via conda. There are many pip pitfalls with target systems needing compilers and libraries; conda does an excellent job of ensuring that conda install ... will Just Work.
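For example (environment and package names are placeholders):
$ conda create -n myproject python=3.11
$ conda activate myproject
$ conda install yourpackage    # pulls in compiled dependencies as prebuilt binaries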
In my work I need a mix of C++, Fortran and Python codes to interact, in the latter case mainly via Cython and SWIG. They are a mix of widely-used libraries (typically available via some package management system), ones specific to my field but not written by me (mostly not packaged beyond source tarballs), and ones for which I'm the developer.
For a long time we've been able to get by without worrying too much about Python 3, and so I operated a ~/local install area with a mix of the compiled and Python2-based software in the usual bin, lib, lib/python2.*/site-packages, etc. structure, and with the relevant subdir paths included in my .bashrc PATH, LD_LIBRARY_PATH, PYTHONPATH environment variables. But in particular with the rise of Python 3 for machine learning, and an increase in incompatible ML packages, I've had to start operating virtualenv directories for some projects. This, and also the switch of system tools like meld to Python 3, mean that my single, global, Python 2 environment isn't fit for purpose anymore.
At the same time, I've become aware that conda and conda-forge are now being pushed for a lot of relevant software. So there are now going to be system packages and Python versions, potentially conda environments (for specific Pythons), pip packages (in virtualenvs or not), and then my personal builds. This is quite a lot to operate consistently, and there doesn't seem to be much information out there about best practice, especially when sharing some code between multiple projects, and mixing in non-Python libraries with these Python-focused tools. Installing a whole chain of manual dependencies in an independent conda or virtualenv environment for each project would be very difficult to manage and wasteful in terms of duplication of large libraries, but on the other hand there seems to be at least a need for separate Python 2/3 environments, perhaps with more project-specific virtualenvs within them.
So, scene set -- apologies for the length, but it's intrinsically complex. Is anyone else wrestling with this problem, and is there an emerging standard or best-practice way to manage the mix of system, conda, pip, and manual package dependencies for development of many projects, without undue duplication?
PS. I appreciate answers may be opinion-based to some extent, although good ones will evidence and justify their recommendations. On the other hand, it's definitely about software development rather than just software management. So I hope it's an appropriate question for SO, since I don't see a better fit within the SX network.
Rookie software developer here. I’m working on a Python module that harnesses some functionality from the FFmpeg framework - specifically, the ebur128 filter function. Ideally the module will stand on its own as an independent, platform agnostic tool for verifying that audio clips comply with EBU loudness standards. It’s being designed so that end users need only perform one simple, (hopefully!) painless installation procedure, which will encompass the installation of both the FFmpeg libraries and my Python wrapper/GUI.
I apologize for the rather vague question, but does anyone have general advice for creating a Python module with external dependencies, or specific advice for standardizing the FFmpeg installation across platforms? Distutils seems pretty helpful - are there other guidelines or standard practices for developing a neatly packaged Python tool? I want to minimize any installation headaches for end users.
Thanks very much.
For Windows
I think it will be easy to find ffmpeg binaries that work on any system, just like for Qt or whatever GUI library you are using. You can ship these binaries with your project and things will work (you may want to distinguish 32 bit and 64 bit systems, though).
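To give a feel for it, here is a rough sketch of how a wrapper might locate and call a bundled binary like that (the bin/ layout and function names are just one possible arrangement, not a prescription):
import os
import subprocess
import sys

def bundled_ffmpeg():
    # Find the ffmpeg executable shipped alongside this module.
    here = os.path.dirname(os.path.abspath(__file__))
    exe = "ffmpeg.exe" if sys.platform == "win32" else "ffmpeg"
    return os.path.join(here, "bin", exe)

def measure_loudness(clip_path):
    # Run the ebur128 filter over the clip; ffmpeg writes the loudness
    # summary to stderr and no output file is needed.
    cmd = [bundled_ffmpeg(), "-nostats", "-i", clip_path,
           "-filter_complex", "ebur128", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stderr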
It looks like you want to create software that is self-contained and easily installable for end users. Inkscape is such an example -- its installer contains Python and all other dependencies, in binary form (where required). That is, for Windows, you do not need to create a real Python package (which would allow installation with pip), and you do not need to look into distutils (which supports building C extensions). I guess you need and want neither.
Maybe it will be enough for you to assemble a good directory structure and to distribute a ZIP archive with your software. This is enough if you do not need to interact with the Windows registry, for instance. Such programs are usually called "standalone", in the Windows world. However, you might still want to have a real Windows installer (even if it is just a self-extracting archive). The following article covers your requirements, I believe: http://cyrille.rossant.net/create-a-standalone-windows-installer-for-your-python-application/
It suggests using http://www.jrsoftware.org/isinfo.php for creating such an installer.
Other platforms
On other operating systems it will be more difficult. For instance, I think it will be almost impossible to create ffmpeg binaries that run on every Linux system, because ffmpeg itself has so many binary dependencies. I do not know whether you can statically build ffmpeg at all.
For unixy stuff, I usually tend to install things via the package manager. However, when I programmed a lot of Perl, I would use CPAN, to get newer versions and all that.
In general, I used to install system stuff via the package manager, and language stuff via its own package manager (gem / easy_install|pip / cpan).
Now using python primarily, I am wondering what best practice is?
The system python version and its libraries are often used by software in the distribution. As long as the software you are using is happy with the same versions of python and all the libraries as your distribution is, using the distribution packages will work just fine.
However, quite often you need development version of packages, or newer version, or older versions. And then it doesn't work any more.
It is therefore usually recommended to install your own Python version that you use for development, and create development environments with buildout or virtualenv or both, to isolate the system python and the development environment from each other.
There are two completely opposing camps: one in favor of system-provided packages, and one in favor of separate installation. I'm personally in the "system packages" camp. I'll provide arguments from each side below.
Pro system packages: the system packager already cares about dependencies and compliance with overall system policies (such as file layout). System packages provide security updates while still caring about not breaking compatibility - so they sometimes backport security fixes that the upstream authors did not backport. System packages are "safe" wrt. system upgrades: after a system upgrade, you probably also have a new Python version, but all your Python modules are still there if they come from a system packager. That's all personal experience with Debian.
Con system packages: not all software may be provided as a system package, or not in the latest version; installing stuff yourself into the system may break system packages. Upgrades may break your application.
Pro separate installation: Some people (in particular web application developers) argue that you absolutely need a repeatable setup, with just the packages you want, and completely decoupled from the system Python. This goes beyond self-installed vs. system packages, since even for self-installed, you might still modify the system python; with the separate installation, you won't. As Lennart discusses, there are now dedicated tool chains to support this setup. People argue that only this approach can guarantee repeatable results (a typical recipe is sketched below, after the con side).
Con separate installation: you need to deal with bug fixes yourself, and you need to make sure all your users use the separate installation. In the case of web applications, the latter is typically easy to achieve.
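For completeness, the typical recipe for that repeatable setup is a virtualenv plus a pinned requirements file (names and versions below are illustrative):
$ virtualenv appenv
$ appenv/bin/pip install somelib==1.2.3       # hypothetical pinned dependency
$ appenv/bin/pip freeze > requirements.txt    # record the exact versions in use
$ appenv/bin/pip install -r requirements.txt  # what everyone else runs to reproduce the environment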
I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the python interpreter, provide a clean space to install eggs) then installs all required eggs, including installing any native shared library dependencies, installs the ruby project, the java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?
Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
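In setup.py terms that looks roughly like this (the URL and version are made up; dependency_links is the old setuptools mechanism described here, and newer pip versions no longer honour it by default):
from setuptools import setup

setup(
    name="ourproject",
    install_requires=["lxml==2.2.8"],    # exact pin, hypothetical version
    dependency_links=[
        "http://example.com/eggs/lxml-2.2.8-py2.5-macosx-10.5-i386.egg",
    ],
)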
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.
I always create a develop.py file at the top level of the project, and also have a packages directory with all of the .tar.gz files from PyPI that I want to install, as well as an unpacked copy of virtualenv that is ready to run right from there. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.
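A sketch of what such a develop.py can boil down to (directory names follow the layout described above; the details are illustrative, not the poster's actual script):
import glob
import subprocess
import sys

# Create the environment using the virtualenv copy that is checked in.
subprocess.check_call([sys.executable, "virtualenv/virtualenv.py", "env"])

pip = "env/bin/pip" if sys.platform != "win32" else r"env\Scripts\pip.exe"

# Install every vendored tarball from packages/ without touching PyPI,
# so this keeps working even when PyPI is down.
for archive in sorted(glob.glob("packages/*.tar.gz")):
    subprocess.check_call([pip, "install", "--no-index", "--find-links=packages", archive])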
Basically, you're looking for a cross-platform software/package installer (on the lines of apt-get/yum/etc.) I'm not sure something like that exists?
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?
I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.
You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.
Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/), which has an open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.