What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
  libs
    python
    utilities
  projects
    projA
    projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that one project (e.g. projA) will often be running while projB needs to be updated, which can leave projA seeing a mixed-revision working copy at runtime. (The code takes hours to run, and can be used as input for processes that take days to run, hence the strong traceability goal.)
My current workaround is to check out another copy of the trunk to a different location when necessary and run from there. But then I have to remember to change my PYTHONPATH to point at the second tree's libs/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
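(As an aside on the traceability point: a minimal sketch, assuming the Subversion command-line tools are on the PATH, of how a script could record its working-copy revision at startup; the function name is just illustrative.)

    import subprocess

    def working_copy_revision(path="."):
        # svnversion prints e.g. "4168" for a clean checkout, "4123:4168" for a
        # mixed-revision one, or "4168M" if there are local modifications; anything
        # other than a single clean number means the run is not cleanly traceable.
        return subprocess.check_output(["svnversion", path]).strip()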
A much better solution is to not store all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
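For instance, pinning a consumer project's external to a tagged library version might look something like this (the repository URL and tag are invented; the "local-dir URL" form shown is the classic externals syntax):

    $ cd projA
    $ svn propset svn:externals 'libs/python https://svn.example.com/shared/tags/1.2/libs/python' .
    $ svn update    # pulls the tagged library into libs/python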
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.
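With virtualenvwrapper sourced into your shell as its documentation describes, the day-to-day workflow looks roughly like this (the project name is a placeholder):

    $ pip install virtualenv virtualenvwrapper
    $ mkvirtualenv projA                      # create and activate an isolated environment
    (projA)$ pip install -r requirements.txt  # install that project's dependencies into it
    (projA)$ deactivate
    $ workon projA                            # jump back into the environment later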
Related
I have written a small Python script that I want to share with other users. (I want to keep it as a script rather than an exe so that users can edit the code if they need to.)
My script uses several external Python libraries that don't come with a basic Python installation.
But the other users don't have Python or the required libraries installed on their PCs.
For convenience, I am wondering whether there's any way to automate installing Python and the external libraries they need.
To make things clearer, what I mean is combining all the installers into one single big installer.
For your information, all the installers are Windows x86 MSI installers, and there are about 5 or 6 of them.
Is this possible? Are there any drawbacks to doing this?
EDIT:
All the users are running Windows XP Pro 32-bit
Python 2.7
I would suggest using NSIS. You can bundle all the MSI installers (including python) into one executable, and install them in "silent mode" in whatever order you want. NSIS also has a great script generator you can download.
Also, you might be interested in ActivePython. It comes with pip and automatically adds everything to your PATH, so you can just pip install most of your dependencies from a batch script.
what I mean is combining all the installers into one single big installer.
I am not sure if you mean to make one MSI out of several. If you have built the MSIs yourself, this can be made to work, but in most situations there were reasons for the separation.
For now I will assume, as the others did, that you want a setup which combines all the MSI setups into one, e.g. with a packing/self-extracting part, but probably with some logic of its own.
This is a very common setup pattern; some call it a "bootstrapper". Unfortunately, the maturity of most bootstrapping tools is nowhere near that of the MSI creation tools, so most companies I know write their own bootstrapper with the dialogs and the control logic they want. This can be a very expensive job.
If your requirements are modest, it may sound like a simple job: just start a number of processes one after another. But what about a seamless progress bar? What about uninstallation (single or bundled)? What about repair and modify? What if one of the installers fails or needs a reboot, also with respect to repair/uninstall/modify/update? And so on.
As mentioned, one of the first issues with bundling several setups into one is deciding how many uninstall entries the user should see, and which ones, and whether it is OK that your bootstrapper does not create a combined entry of its own.
If this is not an issue for you, then you have a good chance of finding an easy solution.
I know of at least three tools for building bootstrappers (some call them suites or bundles); I can only mention them here:
WiX has something called "Burn". Google for WiX Burn and you will find it. I haven't used it yet, so I can't say much about it.
InstallShield Premier, which is not what most people would call a cheap product, supports setup "Suites", which are the same idea. I won't comment on the quality here.
In the Windows SDK there is (or used to be?) a kind of template setup.exe showing how to launch MSI installations from a program. I have never really looked into that example, so I can't say more about it.
I suggest putting all the files into a self-extracting .exe archive and getting the users to run it. Have it extract everything to %temp% and run a batch script that installs python.msi and copies the libraries from %temp% to the Python library directory. If you want to install Python 2.7.5, grab a "Ninite" installer from http://ninite.com/
So, I am working on a shared computer. It is a workhorse for computations across the department. The problem we have run into is controlling the versions of imported modules. Take Biopython, for example: some people require an older version, 1.58, while others require the latest, 1.61. How would I have both versions of the module installed side by side, and how does one specifically access a particular version? I ask because sometimes these APIs change and break old scripts for other people (or the scripts expect certain functionality that is no longer there).
I understand that one could install the module locally (i.e. per user) and specifically direct Python to that copy. Is there another way to handle this, or would everyone have to export a suitable PYTHONPATH before use?
I'm not sure if it's possible to change the active installed versions of a given module. Given my understanding of how imports and site-packages work, I'm leaning towards no.
Have you considered using virtualenv, though?
With virtualenv, you could create multiple shared environments: one for Biopython 1.58, another for 1.61, another for whatever other special situations you need. They don't need to be locked down to a particular user, so while this would take more space than you'd like, it could take less space than everyone having their own Python environment.
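As a rough sketch (the paths are just examples), an admin could create the shared environments once and each user would activate whichever one their script needs:

    $ virtualenv /shared/envs/biopython-1.58
    $ /shared/envs/biopython-1.58/bin/pip install biopython==1.58
    $ virtualenv /shared/envs/biopython-1.61
    $ /shared/envs/biopython-1.61/bin/pip install biopython==1.61
    $ source /shared/envs/biopython-1.58/bin/activate   # per-user, per-session choice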
It sounds like you're doing scientific computing. You should use Anaconda, and make particular note of the conda tool, documented here.
Conda uses hard links whenever possible to avoid copies of the same files. It also manages non-python binary modules in a much better way than virtualenv (virtualenv chokes on VTK, for example).
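For example (the environment names are arbitrary), each Biopython version can live in its own conda environment:

    $ conda create -n biopython-1.58 biopython=1.58
    $ conda create -n biopython-1.61 biopython=1.61
    $ source activate biopython-1.58    # pick whichever one a given script needs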
I have a Windows program that I made with python and py2exe. I'd like to create an updating feature so that the software can be readily updated.
What are common ways of going about this?
If you think your code might benefit others, you could put it up on PyPI. Then providing different versions is just a matter of updating your package and telling your clients to use easy_install to get the latest version. This doesn't push updates, though.
You can try Esky, which is an auto-update framework for managing different versions, including fetching new versions and rolling back partial updates. It can be found on PyPI.
That said, I haven't used Esky. If you wish to roll your own auto-update feature, you might want to look at Boxed Dice to see how they went about it.
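For orientation, the runtime side of an Esky-based updater looks roughly like the sketch below; this follows Esky's documented usage pattern, the update URL is a placeholder, and you should check the project's README for the exact API:

    import sys
    import esky

    if getattr(sys, "frozen", False):
        # Only attempt a self-update when running as a frozen executable.
        app = esky.Esky(sys.executable, "https://updates.example.com/myapp/")
        try:
            app.auto_update()   # find, fetch and install the newest available version
        finally:
            app.cleanup()       # remove leftovers from older versions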
When you package an app with py2exe, the result is usually a single executable (perhaps with some data files). The simplest way to update this is to prompt the user to download and install a new version every once in a while (how you check with a server that such a new version exists is a different question).
If you want to reduce the size of the download, applications commonly resort to splitting themselves into multiple DLLs and updating only the relevant DLLs. A Python application doesn't have DLLs, but you have an even easier option: you can keep most of your app's logic outside the exe in .pyc files and update just some of those .pyc files.
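A minimal sketch of that idea (the directory and module names are made up): keep the bulk of the logic in an updatable directory next to the exe and put it on sys.path before importing, so that pushing new .pyc files into that directory is all an update has to do:

    import os
    import sys

    # Directory containing the py2exe executable, with an updatable "applib" folder beside it.
    app_dir = os.path.dirname(os.path.abspath(sys.executable))
    sys.path.insert(0, os.path.join(app_dir, "applib"))

    import mainlogic   # hypothetical module shipped as applib/mainlogic.pyc, replaceable by updates
    mainlogic.run()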
Now, mind you, .pyc files are easily "decompilable" into Python (a somewhat obfuscated version of your original code), but having an exe made with py2exe isn't much safer, because py2exe is open-source software and packs all the same files inside the exe anyway.
To conclude, my suggestion is don't bother. How large can your application be? With today's fast connections, it's easier to just make the user download a whole new version than to invest a lot of time into building partial-update functionality into your program.
One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. The reason is that I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal with the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug report comes in for an external app that we'd like to get resolved. To get that bug fix into Pinax, the current process is simply to make a minor release of the app, assuming we have control of the app. For apps we don't control, we just deal with the app author's release cycle, or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes, as in some cases I'd like to be working on new features for those apps as well. Of course, what we do is branch the older version and then backport as we need.
I'd love to hear some thoughts on this.
Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?
I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.
Most open source distributors (the Debians, Ubuntus, MacPorts, et al.) use some sort of patch management mechanism. So something like: import the base source code for each package as released, as a tarball or as an SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
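With Mercurial's mq extension enabled, the day-to-day mechanics look roughly like this (the patch name is invented):

    $ hg qnew fix-unicode-bug.patch   # start a new patch on top of the imported release
    $ hg qrefresh                     # record your edits into the current patch
    $ hg qseries                      # list the stack of patches
    $ hg qpop -a                      # unapply everything, back to the pristine release
    $ hg qpush -a                     # reapply the full patch stack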
EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze/bundle features. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OSes are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would behave there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option might be for you to keep a mirror of the apps you don't control in a consistent VCS, and then distribute your mirrored versions. This would remove the need for everyone to have many different version control programs installed.
Other than that, it seems the only real solution is what you guys are already doing; I haven't been able to find a hassle-free way.
I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem, but both seem to concentrate primarily on the Python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous Python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 Leopard on PPC, 1 Leopard on Intel, 1 Ubuntu and 1 Windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copies or symlinks the Python interpreter and provides a clean space to install eggs), then installs all required eggs, including any native shared-library dependencies, and then installs the Ruby project, the Java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?
Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
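A hedged sketch of what that can look like in setup.py (the project name, version pins and URL are placeholders):

    from setuptools import setup

    setup(
        name="ourproject",
        version="0.1",
        install_requires=[
            "lxml==2.2.6",   # resolved via the dependency link below when not taken from PyPI
        ],
        dependency_links=[
            # A link to an egg or sdist we host ourselves; easy_install/setuptools
            # will consult this location when resolving the pinned requirement.
            "http://eggs.example.com/lxml-2.2.6-py2.6-macosx-10.5-i386.egg",
        ],
    )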
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.
I always create a develop.py file at the top level of the project, and have also a packages directory with all of the .tar.gz files from PyPI that I want to install, and also included an unpacked copy of virtualenv that is ready to run right from that file. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.
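A rough sketch of what such a develop.py can boil down to (the file and directory names follow the description above, but the details, including the use of a requirements.txt and a Unix-style env/bin layout, are my own assumptions):

    import os
    import subprocess
    import sys

    HERE = os.path.dirname(os.path.abspath(__file__))

    # Create the virtual environment using the unpacked copy of virtualenv in the tree.
    subprocess.check_call([sys.executable,
                           os.path.join(HERE, "virtualenv", "virtualenv.py"), "env"])

    # Install every pinned dependency from the local packages/ directory, never from PyPI.
    pip = os.path.join(HERE, "env", "bin", "pip")
    subprocess.check_call([pip, "install", "--no-index",
                           "--find-links", os.path.join(HERE, "packages"),
                           "-r", os.path.join(HERE, "requirements.txt")])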
Basically, you're looking for a cross-platform software/package installer (along the lines of apt-get/yum/etc.). I'm not sure something like that exists.
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?
I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.
You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.
Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/), which has an open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-Rails developers) as deployment automation tools, to pretty good effect. The downside, again, is that it expects a somewhat homogeneous environment.