Distributing a Python package based on CUDA availability

I have written a python package that does stuff (the specifics are irrelevant). When using the package, the user can specify whether or not to make use of available GPUs. In my code, I use the CuPy package to utilize the available GPUs.
I would now like to distribute the package - ideally package it with setuptools, put it up on PyPI and have users install it using pip.
The issue:
I want it to be possible to install it on systems regardless of whether they have CUDA available or not. The problem is, if the system does not have CUDA, the installation (and consequently the import) of CuPy will fail. Therefore, I gather that I need to have two separate versions of the package, one that supports CUDA (and imports CuPy etc) and one that does not. Moreover, I need some mechanism that will choose the appropriate version during installation.
What I was hoping to find is some sort of tool that will check (during the installation process) whether CUDA is available (or, more practically - whether CuPy can be installed), and accordingly choose the proper version of my package to download and install.
Any ideas whether something of the sort exists, and if not, what the right way to implement it would be?
NOTE: I assume I could list CuPy in the dependencies in setuptools regardless of whether the system has CUDA or not, and assuming it does not, have my package catch any exceptions that arise due to trying to install/import/use it. But I feel this is rather barbaric, and was hoping to find something more elegant.
NOTE 2: Of course, I could let the user choose the right version for his system, but I would rather not, if possible.
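To make the first note concrete: the fallback import would look like the try/except below, and one alternative to listing CuPy unconditionally would be to declare it as an optional extra. This is only a minimal sketch - mypackage and the gpu extra are placeholder names, and numpy as the CPU fallback is an assumption about my own code:

# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1",
    packages=find_packages(),
    install_requires=["numpy"],
    extras_require={"gpu": ["cupy"]},  # only `pip install mypackage[gpu]` pulls in CuPy
)

# inside the package: fall back to NumPy when CuPy is unavailable
try:
    import cupy as xp
except ImportError:
    import numpy as xp  # CPU-only fallback

The obvious drawback is that the extra still asks the user to choose at install time, which is exactly what NOTE 2 says I would rather avoid.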

Related

How to Install the Minimum Version of a Dependency

Question
How can I install the lowest possible version of a dependency using pip (or any other tool for that matter) given some set of version specifiers. So if I were to specify requests >=2.0, <3.0 I would actually want to install requests==2.0.
Motivation
When creating Python packages you probably want to enable users to interact with them in a variety of environments. This means that you'll often claim to support many versions of a given dependency. For example, one might claim that their package supports requests>=2.0. The problem here is that to responsibly make this claim, we need some way of testing that our package works with requests==2.0, with the latest version, and with everything in between.
Unfortunately I don't think there's a good solution for testing one's support for all possible combinations of dependencies, but you can at least try to test the maximum and minimum versions of your dependencies.
To solve this problem I usually create a requirements.min.txt file that contains the lowest version of all my dependencies, so that I can install those when I want to test that my changes haven't broken my support for older versions. Thus, if I claimed to support requests>=2.0, my requirements.min.txt file would have requests==2.0.
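For example, if setup.py declared install_requires = ["requests>=2.0,<3.0", "numpy>=1.15"] (numpy is just an illustrative second dependency here), the matching requirements.min.txt would pin the lower bounds:

requests==2.0
numpy==1.15

and the minimum-version test run is simply pip install -r requirements.min.txt followed by the test suite.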
This solution to the problem of testing against one's minimum dependency set feels messy though. Has anyone also found a good solution?
not a full answer, but something that might help you get started...
pip install pkg_name_here==
this command will fail, but its error message lists all versions available on PyPI, which you can capture into a string/list, split out the versions, and then install any that fall between your min/max supported versions using some kind of conditional statement
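As a rough sketch of that idea (the exact wording of pip's error message differs between pip versions, so the parsing below is an assumption; the packaging library does the version comparison):

import re
import subprocess

from packaging.specifiers import SpecifierSet
from packaging.version import Version

def lowest_matching_version(package, spec):
    # `pip install package==` fails, but the error message lists every available version
    result = subprocess.run(
        ["pip", "install", f"{package}=="], capture_output=True, text=True
    )
    listed = re.search(r"from versions: ([^)]+)\)", result.stderr)
    versions = [Version(v.strip()) for v in listed.group(1).split(",")]
    candidates = [v for v in versions if v in SpecifierSet(spec)]
    return min(candidates)

# e.g. lowest_matching_version("requests", ">=2.0,<3.0")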
Said feature does not yet exist in pip (at the time of writing).
However, there is an ongoing discussion https://github.com/pypa/pip/issues/8085 and PR https://github.com/pypa/pip/pull/11336 to implement it as --version-selection=min.
I hope it will be merged eventually.

Django: requirements.txt

So far I have known requirements.txt entries like this: Django==2.0. Now I have seen this style of writing: Django>=1.8,<2.1.99
Can you explain to me what it means?
requirements.txt is a file where one specifies dependencies. For example, your program here depends on Django (well, you probably do not want to implement Django yourself).
In case one only writes a custom application, and does not plan to export it (for example as a library) to other programmers, one can pin the version of the library, for example Django==2.0.1. Then you can always assume (given that pip manages to install the correct package) that your environment will have the correct version, and thus that if you follow the corresponding documentation, no problems will (well, should) arise.
If you however implement a library, for example mygreatdjangolibrary, then you probably do not want to pin the version: it would mean that everybody who wants to use your library would have to install Django==2.0.1. Imagine that they want a feature that is only available in django-2.1: then they cannot do this while strictly following the dependencies, since your library requires 2.0.1. This is of course not manageable.
So typically in a library, one aims to give as much freedom as possible to the user of the library. It would be ideal if your library worked regardless of the Django version the user installed.
Unfortunately this would result in a lot of trouble for the library developer. Imagine having to take into account that a user may be running anything from Django-1.1 up to django-2.1. Through the years, several features have been introduced that the library then cannot use, since the programmer has to be conservative and take into account that these features might not exist in the Django version the user installed.
It becomes even worse since Django went through some refactoring: some features were later removed, so we cannot simply program against django-1.1 and hope that everything works out.
So in that case, it makes sense to specify a range of versions we support. For example, we can read the documentation of django-2.0, look at the release notes to see whether something relevant changed in django-2.1, and let tox test both versions with the tests we write. We can then specify a range like Django>=2.0,<2.1.99.
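A minimal tox.ini for that kind of matrix could look roughly like this (the Python version, the use of pytest, and the exact upper bounds are assumptions, not something taken from the question):

[tox]
envlist = py36-django{20,21}

[testenv]
deps =
    pytest
    django20: Django>=2.0,<2.1
    django21: Django>=2.1,<2.2
commands = pytest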
This is also important if you depend on several libraries that each have a common requirement. Say for example you want to install a library liba and a library libb, which both depend on Django, but the two have different ranges, for example:
liba:
Django>=1.10, <2.1
libb:
Django>=1.9, <1.11
This thus means that we can only install a Django version that is both >=1.10 and <1.11.
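That intersection can be checked mechanically with the packaging library; a small illustration (not something pip itself runs):

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# combine liba's and libb's constraints on Django
combined = SpecifierSet(">=1.10,<2.1") & SpecifierSet(">=1.9,<1.11")
print(Version("1.10.8") in combined)  # True: a 1.10.x release satisfies both ranges
print(Version("1.11") in combined)    # False: ruled out by libb's <1.11
print(Version("2.0") in combined)     # False: also ruled out by libb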
The above easily gets even more complex, since liba and libb of course have versions as well, for example:
liba-0.1:
Django>=1.10, <2.1
liba-0.2:
Django>=1.11, <2.1
liba-0.3:
Django>=1.11, <2.2
libb-0.1:
Django>=1.5, <1.8
libb-0.2:
Django>=1.10, <2.0
So if we now want to install any liba and any libb, we need to find a version of liba and a version of libb that "allow" us to install a Django version, and that is not trivial: if, for example, we were to pick libb-0.1, then there is no version of liba that supports an "overlapping" Django version.
To the best of my knowledge, pip currently has no dependency resolution algorithm. It looks at the specification, each time aims to pick the most recent version that satisfies the constraints, and recursively installs the dependencies of those packages.
Therefore it is up to the user to make sure that (sub)dependencies do not conflict: if we were to specify liba libb==0.1, then pip would probably install Django-2.1, and only then find out that libb cannot work with it.
There are some dependency resolution programs, but the problem turns out to be quite hard (it is NP-hard if I recall correctly). That means that for a given dependency tree, it can take years to find a valid configuration.

How do I access different python module versions?

So I am working on a shared computer. It is a workhorse for computations across the department. The problem we have run into is controlling the versions of imported modules. Take for example Biopython. Some people require an older version of biopython - 1.58 - and yet others require the latest biopython - 1.61. How would I have both versions of the module installed side by side, and how does one specifically access a particular version? I ask because sometimes these APIs change and break old scripts for other people (or they expect certain functionality that is no longer there).
I understand that one could locally (i.e. per user) install the module and specifically direct Python to that module. Is there another way to handle this? Or would everyone have to export a suitable PYTHONPATH before using it?
I'm not sure if it's possible to change the active installed versions of a given module. Given my understanding of how imports and site-packages work, I'm leaning towards no.
Have you considered using virtualenv, though?
With virtualenv, you could create multiple shared environments -- one for biopython 1.58 , another for 1.61 , another for whatever other special situations you need. They don't need to be locked down to a particular user, so while it would take more space than what you desired, it could take less space than everyone having their own python environment.
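As an illustration, an administrator could create those shared environments once with something along these lines (the /shared/envs paths are made up; this sketch uses the standard library's venv module, but the virtualenv tool works the same way):

import subprocess
import venv

# one shared environment per Biopython release
for release in ("1.58", "1.61"):
    env_dir = f"/shared/envs/biopython-{release}"
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.check_call([f"{env_dir}/bin/pip", "install", f"biopython=={release}"])

# users then pick an environment by sourcing its activate script, e.g.
#   source /shared/envs/biopython-1.58/bin/activate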
It sounds like you're doing scientific computing. You should use Anaconda, and make particular note of the conda tool, documented here.
Conda uses hard links whenever possible to avoid copies of the same files. It also manages non-python binary modules in a much better way than virtualenv (virtualenv chokes on VTK, for example).

Deploy Python programs on Windows and fetch big library dependencies

I have some small Python programs which depend on several big libraries, such as:
NumPy & SciPy
matplotlib
PyQt
OpenCV
PIL
I'd like to make it easier to install these programs for Windows users. Currently I have two options:
either create huge executable bundles with PyInstaller, py2exe or a similar tool,
or write step-by-step manual installation instructions.
Executable bundles are way too big. I always feel like there is some magic happening, which may or may not work the next time I use a different library or a new library version. I dislike the wasted space too. Manual installation is too easy to get wrong; there are too many steps: download this particular interpreter version, download the numpy, scipy, pyqt and pil binaries, make sure they are all built for the same Python version and the same platform, install them one after another, download and unpack OpenCV, copy its .pyd file deep inside the Python installation, set up environment variables and file associations... You see, few users will have the patience and self-confidence to do all this.
What I'd like to do: distribute only a small Python source and, probably, an installation script, which fetches and installs all the missing dependencies (correct versions, correct platform, installs them in the right order). That's a trivial task with any Linux package manager, but I just don't know which tools can accomplish it on Windows.
Are there simple tools which can generate Windows installers from a list of URLs of dependencies [1]?
[1] As you may have noticed, most of the libraries I listed are not installable with pip/easy_install, but require running their own installers and modifying some files and environment variables.
Npackd exists (http://code.google.com/p/windows-package-manager/). It could be done through that, or by using distribute (python 3.x) or setuptools (python 2.x) with easy_install, possibly pip (I don't know its Windows compatibility). But I would choose npackd because of PyQt and its unusual setup for pip/easy_install (it doesn't play nicely with them, using a configure.py instead of a setup.py). Though you would have to create your own npackd repository for some of them. I forget how much is contributed in total for Python libs with it.
AFAIK there is no tool (and I'd assume you googled), so you must make one yourself.
Fetching the proper library versions seems simple enough -- using python's ftplib you can fetch the proper installers for every library. How would you know which version is compatible with the user's python? You can store different lists of download URLs, each for a different python version (this method came off the top of my head and there is probably a better way; not that it matters much if it's simple and it works).
After you figure out how to make each installer run, you can py2exe your installer script, and even use it to fetch the program itself.
EDIT
Some Considerations
There are a couple of things that popped into my mind just as I posted:
First, some pseudocode (how I would approach it, anyway)
# first, we check modules
try:
    import numpy
except ImportError:
    numpy = None  # flag numpy for installation
# lather, rinse, repeat for all dependencies

# next we check version compatibility -- note that if a library version you need
# is not backwards-compatible, you're in DLL hell, and there is little we can do.
# <insert version-checking code here>

# once you have your list of unavailable dependencies, you download them
import ftplib
# <all your file-downloading here>

# now you install. sorry I can't help you here.
There are a few things you can do to make your utility reusable --
Put all URL lists, minimum version numbers, required library names etc. in config files (see the config sketch after this list)
Write a script which helps you set up an installer
Py2exe the installer-maker-script
Sell it
Even better, release it under the GPL so we can all feast upon the fruits of your labours.
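The config file from the first bullet could be as simple as a per-Python-version table; every URL and version number below is invented purely for illustration:

# installers.py -- download locations keyed by Python version
INSTALLERS = {
    "2.7": {
        "numpy": "http://example.com/installers/numpy-1.6.2-win32-py2.7.exe",
        "scipy": "http://example.com/installers/scipy-0.11.0-win32-py2.7.exe",
        "PyQt4": "http://example.com/installers/PyQt-Py2.7-x86-gpl-4.9.5-1.exe",
    },
    "3.3": {
        "numpy": "http://example.com/installers/numpy-1.7.1-win32-py3.3.exe",
    },
}
MIN_VERSIONS = {"numpy": "1.6", "scipy": "0.10"}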
I have a similar need to yours, but in addition I need the packaged application to work on several platforms. I'm currently exploring the available solutions; here are a few interesting ones:
Use SnakeBasket, which wraps around pip and adds recursive dependency resolution plus a heuristic to choose the right version when there are conflicts.
Package all dependencies as an egg, but not your source code, which will still be editable: https://stackoverflow.com/a/528064/1121352
Package all dependencies in a zip file and directly import the modules on the fly (see the sketch after this list): Cross-platform alternative to py2exe or http://davidf.sjsoft.com/mirrors/mcmillan-inc/install1.html
Using buildout: http://www.buildout.org/en/latest/install.html
Using virtualenv with virtualenv-tools (instead of "relocate")
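Regarding the zip-file option above: Python can import pure-Python modules straight from a zip archive placed on sys.path (compiled extension modules generally cannot be loaded this way), roughly like this:

import sys

# deps.zip is assumed to contain the pure-Python dependencies at its top level
sys.path.insert(0, "deps.zip")

import some_bundled_module  # hypothetical module packed into deps.zip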
If your main problem when freezing your code using PyInstaller or similar is that you end up with a big single file, you can customize the process so that you get several files, one for each dependency, instead of one big executable.
I will update here if I find something that fills my bill.

Should I bundle C libraries with my Python application?

If I have a Python package that depends on some C libraries (like say the Gnu Scientific Library (GSL) for numerical computations), is it a good idea to bundle the library with my code?
I'd like to make my package as easy to install as possible for users and I don't want them to have to download C libraries by hand and supply include-paths. Also I could always ensure that the version of the library that I ship is compatible with my code.
However, is it possible that there are clashes if the user has the library installed already, or are there any other reasons why I shouldn't do this?
I know that I can make it easier for users by just providing a binary distribution, but I'd like to avoid having to maintain binary distributions for all possible OSs. So, I'd like to stick to a source distribution, but for the user (who proudly owns a C compiler) installation should be as easy as python setup.py install.
Distribution is one of the hard parts for any software project. Java and .NET lift part of this burden by defining a standard runtime and then just saying "just distribute everything else." Of course there's a drawback: everything must be rewritten in a language supported by the runtime - as soon as you want to use native code, you lose all the advantages.
That's harder in Python, as it is in Ruby, C, C++ and other languages, since they usually leverage existing native libraries.
Generally speaking:
Make it possible to get a source distribution (sdist), for example via pypi.python.org. Correctly set your install_requires (probably you'll require the Python bindings for GSL, not GSL itself). Use the standard setuptools/distribute layout. This will let anyone - say, a package maintainer for any distro - pick up your software and package it.
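A sketch of such a setup.py (the project name is made up, and pygsl is only one example of a Python binding for GSL that one might depend on):

# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="mynumerics",
    version="0.1",
    packages=find_packages(),
    # depend on the Python bindings rather than on GSL itself;
    # the GSL C library is left to the user's (or the distro's) package manager
    install_requires=["pygsl"],
)

Running python setup.py sdist then produces a source tarball that can be uploaded to PyPI or picked up by a distro packager.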
Additionally, consider providing a full-blown installable package for your audience. You don't have to support all the distros and operating systems; pick one or two that you expect will be used most. Tools like PyInstaller will let you create an installable, runnable package for many operating systems, but especially for Linux you might want the user to install the distribution's own version of the transitive deps (libgsl?) - for that you'll need a full-blown deb or rpm package. Again, don't try to support each and every distro or you'll go mad. Support what you use most, and let other users help you with the other packaging needs.
Also take a look at the Python Packaging Guide.
You could have two separate branches of the source, one that contains the libraries and one that doesn't. That way you can explicitly warn your users in case they have the libraries installed already. Another solution could be (if the licences of the libraries allow it) to wrap them up in a single file.
I think there's no unique solution, but these are the ideas I could think of so far.
Good luck
You can use virtualenv to create a private Python environment for your application. This avoids conflicts with other libraries. It is best if you package modules and dependencies such as your libraries using Distribute. Distutils is something else that is worth researching.
