Versioning Python extension that builds against upstream library


I am really confused on how to properly combine versioning for my own extension with the versioning of the upstream library I am interfacing to.
Background in a nutshell
There is an upstream static library (say, version 2_2_0_5), and I am building a Cython extension named foobar to provide an interface to it (say, version 0.5.3). Because the upstream library is huge, I won't be able to cover its interface in a single shot, so I need to version my own extension as well.
Once a new version of upstream is released, I take the latest version of my extension for the previous upstream version and start adjusting it to the changes in the upstream release, increasing my own version number.
Requirements
I have to provide an easy way to restrict the version of my extension being installed to the given upstream version, something like pip install foobar[upstream==2.2.0.5].
Options I have considered
foobar-2.2.0.5+0.5.3
I have considered using local version labels, like 2.2.0.5+0.5.3, but because the local version label has no semantics of its own, I cannot install a specific version of my extension without explicitly mentioning the version of upstream: there is seemingly no way to tell pip to do something like pip install foobar>=*+0.5.3.
foobar-0.5.3+2.2.0.5
An approach where my version is the "primary version" and the upstream version is my local version label suffers from the inability to pin an exact upstream version.
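To make the limitation of both local-label layouts concrete, here is a minimal sketch, assuming version strings of the simple form public+local made of dot-separated integers (real PEP 440 parsing is more involved). The helper names split_local and pick_by_extension_version are hypothetical; they only illustrate the kind of filtering that would have to live in custom tooling, since pip specifiers cannot express it.

```python
def _to_tuple(part):
    """Convert '0.5.3' to (0, 5, 3); an empty string becomes ()."""
    return tuple(int(p) for p in part.split(".")) if part else ()


def split_local(version):
    """Split 'public+local' into two integer tuples (local is () if absent)."""
    public, _, local = version.partition("+")
    return _to_tuple(public), _to_tuple(local)


def pick_by_extension_version(candidates, min_extension):
    """Keep candidates whose local label (the extension version) is >= min_extension.

    This emulates the imaginary 'foobar>=*+0.5.3' query from the question.
    """
    return [v for v in candidates if split_local(v)[1] >= min_extension]


# Hypothetical release list using the 'upstream+extension' layout.
releases = ["2.2.0.5+0.5.3", "2.2.1.9+0.6.8", "2.2.0.5+0.4.0"]
matching = pick_by_extension_version(releases, (0, 5, 3))
```

Swapping the two halves (extension version first, upstream version as the label) only moves the problem: whichever part ends up in the local label can no longer be selected on its own.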
foobar_2_2_0_5-0.5.3
Another option I have considered is to inline the upstream version into the name of the package itself, so I would have foobar_2_2_0_5-0.5.3 and foobar_2_2_1_9-0.6.8, but this seems extremely ugly to me and does not offer any filtering capabilities: I cannot install foobar for upstream~=2.2.
foobar[upstream.2.2.0.5]-0.5.3
There are "extras" in setuptools, so a package can be installed as foobar[hello,there], but there is no way to version extras; they are mere flags. I am also not sure whether having an extra like foobar[upstream.2.2.0.5] would actually help, and readability suffers anyway.
I could theoretically make foobar a "virtual package" that would install the relevant dependency given the correct extra:
from setuptools import setup

setup(
    name='foobar',
    extras_require={
        '2.2.0.5': ['foobar_2_2_0_5'],
        '2.2.1.9': ['foobar_2_2_1_9'],
    },
)
In practice this would effectively remove any way to specify the foobar version, and it is essentially no different from the foobar_2_2_0_5-0.5.3 approach.
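If the "virtual package" route were tried anyway, the extras mapping could be generated from a list of supported upstream versions instead of being maintained by hand. This is only a sketch: the helper upstream_extras and the naming scheme (dots replaced by underscores) are assumptions based on the package names used in the question, and it does nothing to fix the drawback just described.

```python
def upstream_extras(upstream_versions, base_name="foobar"):
    """Map each upstream version to the per-upstream distribution name.

    '2.2.0.5' -> ['foobar_2_2_0_5'], mirroring the naming used in the question.
    """
    return {
        version: ["{}_{}".format(base_name, version.replace(".", "_"))]
        for version in upstream_versions
    }


# The result could be passed as extras_require=... to setup().
extras = upstream_extras(["2.2.0.5", "2.2.1.9"])
```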
What next?
I also have to keep in mind that upstream developers may use my extension with private builds, so it must be possible to keep build tags in the upstream version. I don't know how to accomplish that.
Also, from reading explanations of the purpose of local version labels, it is clear that my use contradicts the intended use. However, I don't see any other mechanism I could make use of.
Given all this, I am really confused about what to do next: is there a way to solve this without rewriting half of setuptools, or am I missing some obvious solution after all?

Related

Building a python package to publish in pypi

I am greatly confused by the process of building a Python package that I want to distribute on PyPI.
There are some specific, basic things that I do not understand:
What exactly is it that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, OS-specific builds from the same codebase?
How do I build the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many Python versions?
I am using a .toml file for the setup configuration.
I found only a few answers, and all of them refer to procedures with either a setup.py or a setup.cfg.

What exactly is it that gets published? Binaries? Source code?
Yes, and yes.
It depends on details of your project and your package config.
Arbitrary commands can be run during a package build.
You might, for example, run a fortran compiler locally
and ship binaries, or you might insist that each person
installing the package run their own local fortran compiler.
We usually expect full *.py source code will appear on pypi.org.
"Binaries" here will usually describe compiled machine code,
and not *.pyc bytecode files.
How do I build multiple platform-specific, OS-specific builds from the same codebase?
I have only done this via git pull on a target platform, followed
by a local build, but there is certainly support for cross target
toolchains if you need that.
How do I build the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
Is it necessary if I want to support many python versions?
Typically the answer is "no".
Pick a minimum required interpreter version, such as 3.7,
and do all your development / testing / release work in that.
Backward compatibility of interpreters is excellent.
Folks running 3.8 or 3.11 should have no trouble
with your package.
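As a small illustration of the "pick a minimum interpreter version" advice, the following sketch checks an interpreter version against an assumed floor of 3.7 (the number is taken from the example above, not a recommendation). In a real project the floor is normally declared in the package metadata, for instance via requires-python in pyproject.toml, so that installers enforce it; the function name is_supported is hypothetical.

```python
import sys

MINIMUM_PYTHON = (3, 7)  # assumed floor, matching the example in the text


def is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version meets the floor.

    Defaults to the running interpreter when no version is passed.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if not is_supported():
    raise SystemExit("This package needs Python %d.%d or newer" % MINIMUM_PYTHON)
```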
There can be a fly in the ointment.
Suppose your project depends on library X,
or depends on X which depends on Y.
And one of them stopped being updated a few years ago,
or went through a big change like a rename.
Your users who are on 3.11 might find it
inconvenient to obtain a compatible version of X or Y.
This might motivate you to do split releases,
for example via major version number or by
slightly altering your project name.
Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem
is reasonably mature. It has tried to fix many of the
rough edges surrounding the python packaging practices
of the last few decades. I recommend that you prefer
modern over ancient practices, and that you adopt poetry
for your project.
If that won't fly for some reason, and especially if
binaries are a big deal for your project, consider publishing via
conda.
There are many pip pitfalls with target system
needing compilers and libraries. Conda does an
excellent job of ensuring that conda install ...
will Just Work.

Trying to make a Python project requirements version-free

Imagine a project MyLibrary which used to have its own requirements.txt file specifying all the versions needed by each of the dependencies...
lib_a==0.1
lib_b==0.11
lib_c==0.1.1
lib_d==0.1.2
lib_e==0.1.8
And a project ChildProject which happens to have the same kind of setup, with its own requirements.txt file and everything.
ChildProject uses MyLibrary because it needs some of its common functionality. The problem with these two is that ChildProject has a dependency that is also specified in MyLibrary, but with a different version, which causes a conflict and makes the build fail.
What I've done to get rid of the problem is to erase the pinned dependencies in MyLibrary and specify the minimum and maximum versions for each of the libraries, specifying those in the install_requires property within the setup() call...
from setuptools import setup

setup(
    setup_requires=['pbr', 'pytest-runner'],
    install_requires=[
        'lib_a>=0,<1',
        'lib_b>=0,<2',
        'lib_c>=0,<3',
        'lib_d>=0,<4',
        'lib_e>=0,<5',
    ],
    pbr=True,
)
And here is where I get lost...
Should I remove requirements.txt in MyLibrary and leave all the versioning to the child projects using it?
If so, how do I know that ChildProject is specifying all of the needed dependencies? What if I forget to specify lib_a in ChildProject?
Does the latest version that complies with the install_requires constraints get installed automatically, or how does it work? (I ask this because AFAIK, install_requires just specifies the constraints but doesn't include any library whatsoever in the project.)

General suggestions for managing dependency versions:
Libraries don't pin versions (i.e. install_requires either has no version at all or has loose restrictions, e.g. <4). That's what you have already.
Applications can do whatever is needed. In practice, it's highly recommended to pin your dependencies to exact versions (and better yet, provide hashes, to protect yourself from forged packages). The reason is that you cannot guarantee that third-party libraries follow semver. That means having >2, <3 in your requirements.txt may lead to a broken build or deployment because a third-party library released 2.5, which turns out to be backward-incompatible with 2.4. So you must do your best to avoid breaking builds merely by rebuilding at a different time. In other words, your build should be reproducible regardless of the current state of PyPI.
In general, you pin versions to some state, test your application, and commit/save/build/however you deliver. Some time later, you revise the versions (e.g. to update a framework or pick up a security patch): you update the versions in requirements.txt and test your app with the new dependency state; if there are no conflicts or broken parts, you "freeze" that state with pinned versions and build/deploy/etc. This kind of loop gives you room to update your requirements occasionally and stay up to date, while your code will not be broken just by re-installing dependencies.
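As a rough aid for the pin/test/freeze loop, an application could check that every line of its requirements.txt is pinned exactly. The sketch below uses a simplified regular expression rather than a full requirement parser, so it is an approximation (for instance, it treats a wildcard such as ==1.* as pinned); the function name unpinned_requirements is hypothetical.

```python
import re

# 'name==version', optionally with extras; a deliberately simplified pattern.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;#]+")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for line in lines:
        stripped = line.strip()
        # Skip blanks, comments, and pip options such as '-r other.txt'.
        if not stripped or stripped.startswith(("#", "-")):
            continue
        if not _PINNED.match(stripped):
            loose.append(stripped)
    return loose


requirements = ["lib_a==0.1", "lib_b>=0,<2", "# comment", "lib_c==0.1.1", "lib_d"]
loose = unpinned_requirements(requirements)
```

Here loose lists the two entries (the ranged lib_b and the bare lib_d) that would let a rebuild pick up a different version later.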
If you're looking for easier dependency management with versions, I suggest taking a look at pipenv.

Django: requirements.txt

So far I know requirements.txt like this: Django==2.0. Now I saw this style of writing Django>=1.8,<2.1.99
Can you explain to me what it means?

requirements.txt is a file where one specifies dependencies. For example your program will here depend on Django (well you probably do not want to implement Django yourself).
If one only writes a custom application and does not plan to export it (for example as a library) to other programmers, one can pin the version of the library, for example Django==2.0.1. Then you can always assume (given pip manages to install the correct package) that your environment will have the correct version, and thus that if you follow the corresponding documentation, no problems will (well, should) arise.
If, however, you implement a library, for example mygreatdjangolibrary, then you probably do not want to pin the version: it would mean that everybody who wants to use your library would have to install Django==2.0.1. Imagine that they want a feature that is only available in django-2.1; if they follow the dependencies strictly, they cannot have it, because your library requires 2.0.1. This is of course not manageable.
So typically in a library, one aims to give as much freedom to a user of a library. It would be ideal if regardless of the Django version the user installed, your library could work.
Unfortunately this would result in a lot of trouble for the library developer. Imagine that you have to take into account that a user can use Django-1.1 up to django-2.1. Through the years, several features have been introduced that the library then can not use, since the programmer should be conservative, and take into account that it is possible that these features do not exist in the library the user installed.
It becomes even worse since Django went through some refactoring: some features have later been removed, so we can not simply program on django-1.1 and hope that everything works out.
So in that case, it makes sense to specify a range of versions we support. For example we can read the documentation of django-2.0, and look to the release notes to see if something relevant changed in django-2.1, and let tox test both versions for the tests we write. We thus then can specify a range like Django>=2.0,<2.1.99.
This is also important if you depend on several libraries that each have a common requirement. Say, for example, you want to install a library liba and a library libb that both depend on Django, but the two have different ranges, for example:
liba:
Django>=1.10, <2.1
libb:
Django>=1.9, <1.11
This means that we can only install a Django version in the range >=1.10 and <1.11.
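The intersection of the two ranges can be sketched in a few lines. The example assumes half-open ranges written as (lower, upper) strings and plain dotted integer versions; the helper names parse and intersect are hypothetical. Note that versions have to be compared as tuples of integers, because as strings "1.10" sorts before "1.9".

```python
def parse(version):
    """Turn '1.10' into (1, 10) so that comparison is numeric, not textual."""
    return tuple(int(part) for part in version.split("."))


def intersect(range_a, range_b):
    """Intersect two half-open ranges (lower, upper); return None if empty."""
    lower = max(parse(range_a[0]), parse(range_b[0]))
    upper = min(parse(range_a[1]), parse(range_b[1]))
    return (lower, upper) if lower < upper else None


liba = ("1.10", "2.1")   # Django>=1.10, <2.1
libb = ("1.9", "1.11")   # Django>=1.9, <1.11
common = intersect(liba, libb)  # the range >=1.10, <1.11 as integer tuples
```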
The above even easily gets more complex. Since liba and libb of course have versions as well, for example:
liba-0.1:
Django>=1.10, <2.1
liba-0.2:
Django>=1.11, <2.1
liba-0.3:
Django>=1.11, <2.2
libb-0.1:
Django>=1.5, <1.8
libb-0.2:
Django>=1.10, <2.0
So if we now want to install any liba, and any libb, we need to find a version of liba and libb that "allows" us to install a Django version, and that is not that trivial since for example if we would pick libb-0.1, then there is no version of liba that supports an "overlapping" Django version.
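For a table this small, the search can simply be brute-forced. The sketch below copies the ranges from the example as half-open (lower, upper) integer tuples and tries every liba/libb pair; the function name compatible_pairs is hypothetical, and real resolvers use far smarter strategies than exhaustive enumeration.

```python
from itertools import product

# Supported Django ranges per release, as half-open (lower, upper) tuples.
LIBA = {
    "0.1": ((1, 10), (2, 1)),
    "0.2": ((1, 11), (2, 1)),
    "0.3": ((1, 11), (2, 2)),
}
LIBB = {
    "0.1": ((1, 5), (1, 8)),
    "0.2": ((1, 10), (2, 0)),
}


def compatible_pairs(first, second):
    """Return (version_a, version_b, lower, upper) for every overlapping pair."""
    pairs = []
    for (ver_a, (low_a, high_a)), (ver_b, (low_b, high_b)) in product(
        first.items(), second.items()
    ):
        low, high = max(low_a, low_b), min(high_a, high_b)
        if low < high:  # a non-empty overlap means some Django version fits both
            pairs.append((ver_a, ver_b, low, high))
    return pairs


pairs = compatible_pairs(LIBA, LIBB)
```

Every pair that survives uses libb-0.2, which matches the observation that libb-0.1 overlaps with no liba release.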
To the best of my knowledge, pip had no dependency resolution algorithm when this was written (a backtracking resolver became the default in pip 20.3). It looked at the specification, each time aimed to pick the most recent version satisfying the constraints, and recursively installed the dependencies of these packages.
With that behavior, it is up to the user to make sure that (sub)dependencies do not conflict: if we specified liba libb==0.1, then pip would probably install Django-2.1 and only then find out that libb cannot work with it.
There are some dependency resolution programs, but the problem turns out to be quite hard (it is NP-hard, if I recall correctly). That means that for a given dependency tree, it can take years to find a valid configuration.

How can I build an RPM for an earlier version of python?

How can I build a python distribution RPM that is only dependent on an earlier version of python?
Why? I'm trying to build distribution RPMs for RHEL6/CentOS 6, which only include Python 2.6, but I usually build on machines with Python 2.7.
This is an open source project, and I have already ensured that it shouldn't be including any libraries/APIs that are not in 2.6.
I am building the RPMs with:
python setup.py bdist_rpm
setup.py file:
from distutils.core import setup

setup(
    name='pyresttest',
    version='0.1',
    description='Text',
    maintainer='Not listing here',
    maintainer_email='no,just no',
    url='project url here',
    keywords='rest web http testing',
    packages=['pyresttest'],
    license='Apache License, Version 2.0',
    requires=['yaml', 'pycurl'],
)
(Specifics removed for the url, maintainer, email and description).
The RPM appears to be valid, but when I try to install on RHEL6, I get this error:
python(abi) = 2.7 is needed by pyresttest-0.1-1.noarch
There should be some way to get it to override the default python version to require, or supply a custom SPEC file, but after several hours of fiddling with it, I'm stuck. Ideas?
EDIT: I suppose I should clarify why I'm doing a RPM for python code, instead of just using setuptools or pip: this will hopefully go to production at work, where all deployments are RPM-based and most VMs are still RHEL6. Asking them to adopt another packaging tool is likely to be a non-starter, since our company is closely tied to the RPM format.

Re-organized the answer.
Actually, there's no single "rpm-package". There are rpm-packages for RHEL6, rpm-packages for FedoraNN, rpm-packages for OpenSUSE-X.Y and so on. And besides those there are Debian, Ubuntu, Arch and Gentoo :)
You have the following possibilities with your Python package:
You may completely avoid rpm, deb and other "native Linux packaging systems" and opt for a "Python-native" packaging system like pip. You thereby avoid the complexity and lack of compatibility between packaging systems across versions and flavours of Linux. For a package which doesn't "infiltrate" deeply into the "core system", this could be the best solution.
You may continue to use RPM as an archive format for your package but completely turn off automatic dependency calculation. This can be done with the AutoReqProv: no directive in the spec. To be able to work with a customized spec one may use the --spec-only and --spec-file distutils options. But remember that a package built this way is even worse than a zip from the first option: without proper dependencies it contains less of the necessary meta-information and thus "defames" the whole idea behind Linux packaging systems, which were invented to build consistent systems, to avoid problems like "DLL hell", and to be suitable for automatic maintenance and updates. You may in fact add dependency information manually via the Requires: <something> tag, but this may become even harder and more boring if you target several Linux platforms at once.
In order to take into account all those complex and boring details and nuances of a particular packaging system, you may create "build sandboxes" with appropriate versions of the necessary Linux flavours. My preferred way to create such sandboxes is to use pre-created "OpenVZ templates", but without OpenVZ per se: simply unpack a given archive into a subdirectory (as root, to preserve permissions), then chroot into the subdirectory, and voila! You've got Debian, RHEL, etc. Fedora people have created Mock for the same purpose, and Mock is likely a more elaborate solution. As @BobMcGee suggests in the comments, one may also consider the Jenkins Docker plugin.
Once you have a build sandbox with the Python distribution specific to that system, distutils, etc., you may automate the build process using simple scripting in bash or Python.
That's it.

I do not do very much python work but have done some RPM packaging. You probably need to somehow do what one would normally do in the RPM's spec file and specify and require a particular release of your python package like so ...
# this would be in your spec file
requires: python <= 2.6
Take a look here for more info:
http://ftp.rpm.org/max-rpm/s1-rpm-depend-manual-dependencies.html

How might I handle development versions of Python packages without relying on SCM?

One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. The reason is that I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal with the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix into Pinax, the current process is simply to make a minor release of the app, assuming we have control of the app. For apps we don't control, we just deal with the release cycle of the app author or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes, as in some cases I'd like to be working on new features for apps as well. Of course, branching the older version is what we do, and then we do backports as needed.
I'd love to hear some thoughts on this.

Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?

I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.

Most open source distributors (the Debians, Ubuntu's, MacPorts, et al) use some sort of patch management mechanism. So something like: import the base source code for each package as released, as a tar ball, or as a SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial's Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.

EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OS's are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would interact there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option may be you keeping a mirror of the apps you don't control, in a consistent vcs, and then distributing your mirrored versions. This would take away the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are doing, there isn't a hassle-free way that I've been able to find.
