Tips for interacting with debian based repositories

Tips for interacting with debian based repositories - python

I am planning on writing a small program that interacts with a Debian-based repository - namely doing a partial mirror**. I am planning to write it in Python.
What are some tips for working with the repository including already constructed 'wheels' (to save the invention of yet another one)?
Some issues I have identified
As it is going to be a partial mirror, I will need to regenerate the package lists (Release, Contents*, Packages.{bz2,gz}). (Maybe debian-installer can do it for me?)
How to get changes to package list (I already know that packages do not change, but that the lists only link to the latest file)?
** Already looked into apt-mirror and debmirror. Debmirror is the closest to what I want, however lacking in some features. If apt can deal with multiple releases and architectures then I will consider apt.

debian-installer doesn't generate repository metadata. For that, you want a tool like reprepro or mini-dinstall. They'll also handle the second point you raised.
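To illustrate what "regenerating the package lists" involves for a partial mirror, here is a minimal sketch in Python. It relies only on the fact that a Packages index is a series of "Field: value" stanzas separated by blank lines. The function names and the sample data are made up for illustration; in practice a tool such as reprepro does this (plus signing and the Release file) for you.

```python
import hashlib
import re

# Hypothetical sample of an uncompressed Packages index.
SAMPLE = """\
Package: foo
Version: 1.0-1
Architecture: amd64
Filename: pool/main/f/foo/foo_1.0-1_amd64.deb

Package: bar
Version: 2.3-4
Architecture: amd64
Filename: pool/main/b/bar/bar_2.3-4_amd64.deb
"""


def split_stanzas(text):
    """Return the raw text of each stanza (stanzas are separated by blank lines)."""
    return [block for block in re.split(r"\n\s*\n", text.strip()) if block.strip()]


def field(stanza, name):
    """Return the value of a single-line field in a stanza, or None if absent."""
    prefix = name + ":"
    for line in stanza.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def partial_index(text, wanted):
    """Keep only the stanzas whose Package field is in `wanted`."""
    kept = [s for s in split_stanzas(text) if field(s, "Package") in wanted]
    return "\n\n".join(kept) + "\n" if kept else ""


def digest_and_size(data):
    """SHA256 digest and byte size of an index file, as a Release file would list them."""
    return hashlib.sha256(data).hexdigest(), len(data)
```

For example, partial_index(SAMPLE, {"bar"}) yields an index containing only the bar stanza, and the Filename field of each kept stanza tells the mirror which .deb to download.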

Related

Pip Whl naming conventions for git branches

I feel like I am doing something wrong. We have some projects that produce pip packages in CI whenever we push a commit. I am using setuptools_scm to produce a version number based upon the last tag. I have two problems that I am struggling to solve.
Let's say we have a scenario where two developers are working in two different feature branches. Whenever either of them commits their code, our CI produces a new pip package and pushes it to a development pypi server. The version contains information about the previous tag and the commit hash, but it doesn't contain any information about the feature branch that produced it. If I look at the pypi server I will see packages from both developers. As far as I can see, I can't tell which packages came from which feature branch without significant effort.
If someone wants to test out the feature branch, then they need to figure out the exact version number produced by setuptools_scm - something like package-0.1.dev41+gabcdef12. This is painful to communicate every time someone pushes a new commit. It would be nice if the branch name was somehow part of the version. (Something like package-0.1.branch.dev41+gabcdef12. Then the user could do a pip install package==0.1.branch to get the latest from my branch. But I see that this is not a valid version.)
I've looked at https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/specification.html and the various PEPs that it references. The only place where I could reasonably put a branch name would be in the local section. This would solve the first problem. I could easily see which feature branch each package came from. But it doesn't help me with testing out a feature branch.
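As a rough sketch of the "branch name in the local section" idea: PEP 440 local version labels may contain only ASCII letters, digits and periods, so a branch name has to be sanitized before it is appended. The helper names below are hypothetical, and this is plain string handling rather than anything setuptools_scm provides.

```python
import re


def branch_label(branch):
    """Turn a git branch name into a PEP 440-safe local label fragment."""
    # Collapse every run of other characters into a single period.
    return re.sub(r"[^A-Za-z0-9]+", ".", branch).strip(".").lower()


def with_branch(version, branch):
    """Append the branch label to the local segment of a version string."""
    label = branch_label(branch)
    if not label:
        return version
    # A version that already has a local segment ("+...") gets another component.
    separator = "." if "+" in version else "+"
    return f"{version}{separator}{label}"
```

For example, with_branch("0.1.dev41+gabcdef12", "feature/login-form") gives "0.1.dev41+gabcdef12.feature.login.form".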
I know that I could produce an alpha/beta/rc tag and use that. But this doesn't map to their intended use. An rc would normally have several commits from many feature branches that were merged since the last release, not a new rc for every commit on a feature branch.
I know that I'm not the only one using git and pip packages. Since I can't find a solution to the problem, I worry that I might be thinking about it wrong. Are there commonly used or standardized ways to handle these issues?

For those who may come across this in the future, I think the best solution is to not package feature branches. Pip allows us to install from a feature branch via the pip install git+${REPO_URL}@branch syntax. This syntax works on the command line, in requirements.txt files, and with tools like pip-compile. The user can tie themselves to the head of a particular branch or a specific commit.
The syntax is not the easiest to remember, but it does a very effective job of allowing me to share a feature branch. When I want to make a release, I can then tag the repo and create a package that is more publicly consumable.
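Since the syntax is hard to remember, a tiny helper can generate it. This sketch builds a requirement line in the "name @ URL" direct-reference form, with the git ref after an "@". The function name, package name and URL are placeholders, not part of pip.

```python
def git_requirement(name, repo_url, ref=None):
    """Build a pip requirement line that installs `name` from a git repository.

    `ref` may be a branch name, a tag, or a commit hash; if omitted, pip uses
    the repository's default branch.
    """
    spec = f"{name} @ git+{repo_url}"
    if ref:
        spec += f"@{ref}"
    return spec


# Hypothetical package and repository URL, for illustration only.
line = git_requirement("package", "https://example.com/org/package.git", "my-feature")
```

The resulting line can be passed to pip install or placed in a requirements.txt file. Pinning to a commit hash instead of a branch name makes the install reproducible.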

Reading setup.py via API to test requirements

I'm working on a plugin system and was thinking of simply using a setup.py file for each 'plugin' since it's already an existing dependency. The thing is, I need a way to test requirements.
Is there already an existing API in place for this, or would it make more sense just to roll a custom system and check it manually?

setup.py is a script, and you can't generally parse it to figure out requirements, especially since some setup scripts change the requirements depending on the Python version used to run them.
There is an upcoming standard that will fix this: PEP 345. At this point in time very few packages make use of it, though. For more information you can look at the distutils-sig list archives, where this topic has come up several times.
Have you looked at egg entry points? They basically implement a plugin system that you can use directly. This Stack Overflow question has some information that might be interesting.
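For a sense of what entry-point based plugin discovery looks like, here is a minimal sketch. It uses the standard library's importlib.metadata rather than the pkg_resources API that egg entry points originally shipped with, and the group name "myapp.plugins" is a made-up example that each plugin would declare in its own packaging metadata.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {name: entry_point} for every installed plugin in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions return a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


# "myapp.plugins" is a hypothetical group name.
plugins = discover_plugins("myapp.plugins")
```

Calling ep.load() on an entry gives the object the plugin registered. Because a plugin is an ordinary installed distribution, its requirements are checked by the installer when it is installed, so the host application does not need to parse setup.py.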

Tracking global migration to Python 3.x

Python 3.x is looking ever more tempting with cleaned up syntax (I like it, others may not) new features and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by lack of 3rd party support. Important packages like Django, Twisted, etc. are not ported. It's hard to get an overview of where the bottlenecks in the migration are, how far it has come, and if it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.

Georg Brandl has made a script that generates a graph of the number of packages supporting Python 3.
The link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also a (pretty crummy) list of unported packages ordered by how many packages depend on them: http://onpython3yet.com/ Why do I say it's crummy? Well, because it is done entirely without manual fixing up, resulting in things like listing Python as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to list any sort of random dependency; it should be used to list the packages that should be auto-installed when you use easy_install/pip. But in the Django world, for example, they don't know that, so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now this is not anything "global"; it's just the packages on PyPI, and as such they tend to be mostly Python modules, not separate applications. But I think the trend in general is visible there anyway.

The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.
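PyPI's Python 3 listing is driven by the "Programming Language :: Python :: 3" trove classifiers that packages declare. As a rough sketch of how a single data point for such a graph could be computed, the helpers below measure the share of packages declaring Python 3 support, given each package's classifier list. The function names are hypothetical, and collecting the classifier lists over time is left out.

```python
from importlib.metadata import distributions

PY3_PREFIX = "Programming Language :: Python :: 3"


def declares_python3(classifiers):
    """True if any classifier names Python 3 (or a 3.x minor version)."""
    return any(c.strip().startswith(PY3_PREFIX) for c in classifiers)


def python3_share(classifier_lists):
    """Fraction of packages whose classifiers declare Python 3 support."""
    lists = list(classifier_lists)
    if not lists:
        return 0.0
    return sum(declares_python3(c) for c in lists) / len(lists)


def installed_classifiers():
    """Yield the classifier list of every distribution installed locally."""
    for dist in distributions():
        yield dist.metadata.get_all("Classifier") or []
```

Running python3_share(installed_classifiers()) reports the share for the local environment only. A PyPI-wide figure would need the same calculation over the index's metadata, and it depends on authors actually setting the classifiers.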

Organizing Python projects with shared packages

What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
    libs
        python
        utilities
    projects
        projA
        projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that, often one project (e.g. projA) will be running, and projB will need to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as inputs for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, check out another copy of the trunk to a different location, and run off there. But then I need to remember to change my PYTHONPATH to point to the second version of lib/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using Subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be moving towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).

A much better solution is to not store all your projects and their shared dependencies in the same repository.
Use one repository for each project, and svn:externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
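A minimal sketch of the "one isolated environment per app" idea, using the standard library's venv module as a stand-in for virtualenv (the two are similar but not identical tools). The function name and directory layout are assumptions for illustration.

```python
import venv
from pathlib import Path


def make_app_env(base_dir, app_name):
    """Create an isolated environment for one app and return its directory."""
    env_dir = Path(base_dir) / app_name
    # with_pip=False keeps creation fast; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    return env_dir
```

Each app (projA, projB, ...) would get its own environment, into which the tagged version of the shared library it needs is installed. A long-running projA is then unaffected when the library is updated for projB.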

If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.

How might I handle development versions of Python packages without relying on SCM?

One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. The reason is that I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal with the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix into Pinax, the current process is to simply make a minor release of the app, assuming we have control of the app. For apps we don't control, we just deal with the release cycle of the app author or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes, as in some cases I'd like to be working on new features for apps as well. Of course, we branch the older version and backport fixes as needed.
I'd love to hear some thoughts on this.

Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?

I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.

Most open source distributors (Debian, Ubuntu, MacPorts, et al.) use some sort of patch management mechanism. So something like: import the base source code for each package as released, as a tarball or as an SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
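To make the "apply patches during installation" step concrete, here is a small sketch that reads a quilt-style series file (one patch name per line, optionally followed by a strip option such as -p0, with # starting a comment) and builds the patch commands to run in order. The directory name and function names are assumptions, and the commands are only constructed, not executed.

```python
import shlex


def read_series(text):
    """Parse a quilt-style series file into (patch_name, options) pairs."""
    patches = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *options = shlex.split(line)
        # Assume strip level 1 when the series file gives no option.
        patches.append((name, options or ["-p1"]))
    return patches


def patch_commands(series_text, patch_dir="patches"):
    """Build one `patch` invocation per series entry, in application order."""
    return [
        ["patch", *options, "-i", f"{patch_dir}/{name}"]
        for name, options in read_series(series_text)
    ]
```

Each command list could be passed to subprocess.run with cwd set to the unpacked base release. The series file and the patches themselves are what would live in the single version-controlled patch repository.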

EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OSes are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would behave there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
Another option may be to keep a mirror of the apps you don't control in a single, consistent VCS, and then distribute your mirrored versions. This would take away the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are already doing. There isn't a hassle-free way that I've been able to find.
