Tool (or combination of tools) for reproducible environments in Python

Tool (or combination of tools) for reproducible environments in Python - python

I used to be a java developer and we used tools like ant or maven to manage our development/testing/UAT environments in a standardized way. This allowed us to handle library dependencies, setting OS variables, compiling, deploying, running unit tests, and all the required tasks. Also, the scripts generated guaranteed that all the environments were almost equally configured, and all the task were performed in the same way by all the members of the team.
I'm starting to work in Python now and I'd like your advice in which tools should I use to accomplish the same as described for java.

virtualenv to create a contained virtual environment (prevent different versions of Python or Python packages from stomping on each other). There is increasing buzz from people moving to this tool. The author is the same as the older working-env.py mentioned by Aaron.
pip to install packages inside a virtualenv. The traditional is easy_install as answered by S. Lott, but pip works better with virtualenv. easy_install still has features not found in pip though.
scons as a build tool, although you won't need this if you stay purely Python.
Fabric paste, or paver for deployment.
buildbot for continuous integration.
Bazaar, mercurial, or git for version control.
Nose as an extension for unit testing.
PyFit for FIT testing.

I also work with both java and python.
For python development the maven equivalent is setuptools (http://peak.telecommunity.com/DevCenter/setuptools). For web application development I use this in combination with paster (http://pythonpaste.org/) for the deployment process

Other than easy_install?
For our Linux servers, we use easy_install and yum.
For our Windows development laptops, we use easy_install and a few MSI's for some projects.
Most of the Python libraries we use are source-only, so we can use the same distribution on all boxes. If we could have a network shared device, we'd put them all there. Sadly, our infrastructure is kind of scattered, so we have to either move .TAR files around or redo the installs to rebuild the environments.
In a few cases (e.g., PIL), we have to recompile and check the version numbers.

You will want easy_setup to get the eggs (roughly what Maven calls an artifact).
For setting up your environment, have a look at working-env.py
Python is not compiled but you can put all files for a project in an egg. This is done with setuptools
For CI, check this answer.

We would be remiss not to also mention Paver, which was created by Kevin Dangoor of TurboGears fame. The project is still in alpha, but it appears very promising. A snippet from the project page:
Paver is a Python-based build/distribution/deployment scripting tool along the lines of Make or Rake. What makes Paver unique is its integration with commonly used Python libraries. Common tasks that were easy before remain easy. More importantly, dealing with your applications specific needs and requirements is now much easier.

I do exactly this with a combination of setuptools and Hudson. I know Hudson is a java app, but it can run Python stuff just fine.

You might want to check our Devenv. It allows you to standardize the build environments for development, QA and UAT. It's free as in "free beer".
HTH

Related

Building a python package to publish in pypi

I am greatly confused with the process of building a python package that I want to distribute on pypi.
There are some specific, basic things that I did not understand:
What exactly is that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, os-specific build from the same codebase?
How do I build a the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many python versions?
I am using a .toml file for the setup configuration.
I found some answers only, but all refer to procedures with either a setup.py or a setup.cfg.

What exactly is that gets published? Binaries? Source code?
Yes, and yes.
It depends on details of your project and your package config.
Arbitrary commands can be run during a package build.
You might, for example, run a fortran compiler locally
and ship binaries, or you might insist that each person
installing the package run their own local fortran compiler.
We usually expect full *.py source code will appear on pypi.org.
"Binaries" here will usually describe compiled machine code,
and not *.pyc bytecode files.
How do I build multiple platform-specific, os-specific build from the same codebase?
I have only done this via git pull on a target platform, followed
by a local build, but there is certainly support for cross target
toolchains if you need that.
How do I build a the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
Is it necessary if I want to support many python versions?
Typically the answer is "no".
Pick a minimum required interpreter version, such as 3.7,
and do all your development / testing / release work in that.
Backward compatibility of interpreters is excellent.
Folks running 3.8 or 3.11 should have no trouble
with your package.
There can be a fly in the ointment.
Suppose your project depends on library X,
or depends on X which depends on Y.
And one of them stopped being updated a few years ago,
or went through a big change like a rename.
Your users who are on 3.11 might find it
inconvenient to obtain a compatible version of X or Y.
This might motivate you to do split releases,
for example via major version number or by
slightly altering your project name.
Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem
is reasonably mature. It has tried to fix many of the
rough edges surrounding the python packaging practices
of the last few decades. I recommend that you prefer
modern over ancient practices, and that you adopt poetry
for your project.
If that won't fly for some reason, and especially if
binaries are a big deal for your project, consider publishing via
conda.
There are many pip pitfalls with target system
needing compilers and libraries. Conda does an
excellent job of ensuring that conda install ...
will Just Work.

Distributing python code with virtualenv?

I want to distribute some python code, with a few external dependencies, to machines with only core python installed (and users that unfamiliar with easy_install etc.).
I was wondering if perhaps virtualenv can be used for this purpose? I should be able to write some bash scripts that trigger the virtualenv (with the suitable packages) and then run my code.. but this seems somewhat messy, and I'm wondering if I'm re-inventing the wheel?
Are there any simple solutions to distributing python code with dependencies, that ideally doesn't require sudo on client machines?

Buildout - http://pypi.python.org/pypi/zc.buildout
As sample look at my clean project: http://hg.jackleo.info/hyde-0.5.3-buildout-enviroment/src its only 2 files that do the magic, more over Makefile is optional but then you'll need bootstrap.py (Make file downloads it, but it runs only on Linux). buildout.cfg is the main file where you write dependency's and configuration how project is laid down.
To get bootstrap.py just download from http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py
Then run python bootstap.py and bin/buildout. I do not recommend to install buildout locally although it is possible, just use the one bootstrap downloads.
I must admit that buildout is not the easiest solution but its really powerful. So learning is worth time.
UPDATE 2014-05-30
Since It was recently up-voted and used as an answer (probably), I wan to notify of few changes.
First of - buildout is now downloaded from github https://raw.githubusercontent.com/buildout/buildout/master/bootstrap/bootstrap.py
That hyde project would probably fail due to buildout 2 breaking changes.
Here you can find better samples http://www.buildout.org/en/latest/docs/index.html also I want to suggest to look at "collection of links related to Buildout" part, it might contain info for your project.
Secondly I am personally more in favor of setup.py script that can be installed using python. More about the egg structure can be found here http://peak.telecommunity.com/DevCenter/PythonEggs and if that looks too scary - look up google (query for python egg). It's actually more simple in my opinion than buildout (definitely easier to debug) as well as it is probably more useful since it can be distributed more easily and installed anywhere with a help of virtualenv or globally where with buildout you have to provide all of the building scripts with the source all of the time.

You can use a tool like PyInstaller for this purpose. Your application will appear as a single executable on all platforms, and include dependencies. The user doesn't even need Python installed!
See as an example my logview package, which has dependencies on PyQt4 and ZeroMQ and includes distributions for Linux, Mac OSX and Windows all created using PyInstaller.

You don't want to distribute your virtualenv, if that's what you're asking. But you can use pip to create a requirements file - typically called requirements.txt - and tell your users to create a virtualenv then run pip install -r requirements.txt, which will install all the dependencies for them.
See the pip docs for a description of the requirements file format, and the Pinax project for an example of a project that does this very well.

How to install python modules on a per project basis, rather than system wide?

I am trying to define a process for migrating django projects from my development server to my production server using git, and it's driving me crazy that distutils installs python modules system-wide. I've read the documentation but unless I'm missing something it seems to be mostly about how to change the installation directory. I need to be able to use different versions of the same module in different projects running on the same server, and deploy projects from git without having to download and install dependencies.
tl;dr: I need to know how to install python modules, using distutils, into my project's source tree for version control without compromising other projects using different versions of the same module.
I'm new to python, so I apologize in advance if this is common knowledge.

Besides the already mentioned virtualenv which is a good option but has the potential drawback of requiring a third-party module, Distutils itself has options to install modules into arbitrary locations. In particular, there is the home scheme which allows you to "build and maintain a personal stash of Python modules". It's described in the Python documentation set here.

Perhaps you are looking for virtualenv. It will allow you to install packages into a separate virtual Python "root".

for completeness sake, virtualenvwrapper makes every day work with virtualenv a lot quicker and simpler once you are working on multiple projects and/or on multiple development platforms at the same time.

If you are looking something akin to npm or yarn of the JavaScript world or composer of the PHP world, then you may want to look at pipenv (not to be confused with pip). Here's a guide to get you started.
Alternatively there is also Poetry, which some people say is even better, but I haven't used it yet.

Deploying python app to Mac and Windows users

I've written an app in python that depends on wxPython and some other python libraries. I know about pyexe for making python scripts executable on Windows, but what would be the easiest way to share this with my Mac using friends who wouldn't know how to install the required dependencies? One option would be to bundle my dependencies in the same package, but that seems kind of clunky. How do people usually deploy such apps? For once I miss Java...

You could check out py2app, which is similar to py2exe

How do people usually deploy such apps?
2 choices.
With instructions.
All bundled up.
You write simple instructions like this. Folks can follow these pretty reliably, unless they don't have enough privileges. Sometimes they need to sudo in linux environments.
Download easy_install (or pip)
easy_install this, easy_install that (or pip this, pip that)
easy_install whatever package you wrote.
It works really well. If you download some Python packages you'll see this in action.
Sphinx requires docutils. Django requires docutils and PIL. It works out really well to simply document the dependencies. Other folks seem to do it without serious problems. Follow their lead.
Bundling things up means you have to
(a) provide the entire original distribution (as required by most open source licenses)
(b) provide a compatible open source license with the licenses of the things you bundled. This can be easy if you depend on things that all of the same license. Otherwise, you basically can't redistribute them and have to resort to installation instructions.

Are there any other good alternatives to zc.buildout and/or virtualenv for installing non-python dependencies?

I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the python interpreter, provide a clean space to install eggs) then installs all required eggs, including installing any native shared library dependencies, installs the ruby project, the java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?

Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.

I always create a develop.py file at the top level of the project, and have also a packages directory with all of the .tar.gz files from PyPI that I want to install, and also included an unpacked copy of virtualenv that is ready to run right from that file. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.

Basically, you're looking for a cross-platform software/package installer (on the lines of apt-get/yum/etc.) I'm not sure something like that exists?
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?

I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.

You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.

Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/) which has a open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.