Continuous Integration System for a Python Codebase

I am starting to work on a hobby project with a Python codebase and I would like to set up some form of continuous integration (i.e. running a battery of test-cases each time a check-in is made and sending nag e-mails to responsible persons when the tests fail) similar to CruiseControl or TeamCity.
I realize I could do this with hooks in most VCSes, but that requires that the tests run on the same machine as the version control server, which isn't as elegant as I would like. Does anyone have any suggestions for a small, user-friendly, open-source continuous integration system suitable for a Python codebase?
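To make the requirement concrete, here is a minimal standard-library sketch of the job a build machine would run for each check-in: run the tests, and build a nag e-mail if they fail. The function names, the `ci@example.com` sender, and the default `unittest discover` command are illustrative assumptions, not part of any particular CI tool.

```python
import subprocess
import sys
from email.message import EmailMessage


def run_tests(command, timeout=600):
    """Run the test command; return (passed, combined stdout/stderr)."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout


def build_nag_email(revision, recipients, output, sender="ci@example.com"):
    """Build (but do not send) the failure notification."""
    msg = EmailMessage()
    msg["Subject"] = f"Tests failed at revision {revision}"
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    # Keep only the tail of the log so the mail stays readable.
    msg.set_content(output[-5000:] or "(no output)")
    return msg


def check_revision(revision, recipients, command=None):
    """Return None if the tests pass, else the e-mail that should go out."""
    if command is None:
        # Assumed default: discover unittest tests in the working copy.
        command = [sys.executable, "-m", "unittest", "discover"]
    passed, output = run_tests(command)
    return None if passed else build_nag_email(revision, recipients, output)
```

The returned message could then be handed to `smtplib.SMTP(...).send_message(msg)`. The tools suggested in the answers add what this sketch lacks: watching the repository, distributing builds across machines, and keeping history.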

We run Buildbot - Trac at work. I haven't used it too much since my codebase isn't part of the release cycle yet. But we run the tests on different environments (OSX/Linux/Win) and it sends emails — and it's written in Python.

One possibility is Hudson. It's written in Java, but there's integration with Python projects:
Hudson embraces Python
I've never tried it myself, however.
(Update, Sept. 2011: After a trademark dispute Hudson has been renamed to Jenkins.)

Second the Buildbot - Trac integration. You can find more information about the integration on the Buildbot website. At my previous job, we wrote and used the plugin they mention (tracbb).
The plugin rewrites all of the Buildbot URLs so you can use Buildbot from within Trac (http://example.com/tracbb).
The really nice thing about Buildbot is that the configuration is written in Python. You can integrate your own Python code directly to the configuration. It's also very easy to write your own BuildSteps to execute specific tasks.
We used BuildSteps to get the source from SVN, pull the dependencies, publish test results to WebDAV, etcetera.
I wrote an X10 interface so we could send signals with build results. When the build failed, we switched on a red lava lamp. When the build succeeded, a green lava lamp switched on. Good times :-)

We use both Buildbot and Hudson for Jython development. Both are useful, but have different strengths and weaknesses.
Buildbot's configuration is pure Python and quite simple once you get the hang of it (look at the epydoc-generated API docs for the most current info). Buildbot makes it easier to define non-testing tasks and distribute the testers. However, it really has no concept of individual tests, just textual, HTML, and summary output, so if you want to have multi-level browsable test output and so forth you'll have to build it yourself, or just use Hudson.
Hudson has terrific support for drilling down from overall results into test suites and individual tests; it also is great for comparing test output between builds, but the distributed (master/slave) stuff is comparatively more complicated because you need a Java environment on the slaves too; also, Hudson is less tolerant of flaky network links between the master and slaves.
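As a rough illustration of the difference between summary output and per-test results, here is a standard-library sketch that records an outcome for each individual test. It is not Buildbot's or Hudson's API; `PerTestResult`, `run_checks`, and the two example checks are hypothetical names for illustration only.

```python
import unittest


def check_addition():
    assert 1 + 1 == 2


def check_broken():
    # Deliberately failing, so the report has something to show.
    assert 1 + 1 == 3, "deliberately failing"


class PerTestResult(unittest.TestResult):
    """Record an outcome for every individual test, not just totals."""

    def __init__(self):
        super().__init__()
        self.outcomes = {}

    def addSuccess(self, test):
        super().addSuccess(test)
        self.outcomes[test.id()] = "pass"

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self.outcomes[test.id()] = "fail"

    def addError(self, test, err):
        super().addError(test, err)
        self.outcomes[test.id()] = "error"


def run_checks(functions):
    """Wrap plain functions as test cases and run them with PerTestResult."""
    suite = unittest.TestSuite(unittest.FunctionTestCase(f) for f in functions)
    result = PerTestResult()
    suite.run(result)
    return result
```

A summary-only view corresponds to `result.testsRun` and `len(result.failures)`. The `outcomes` mapping is the kind of per-test data you would have to collect and render yourself to get browsable results.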
So, to get the benefits of both tools, we run a single instance of Hudson, which catches the common test failures, then we do multi-platform regression with Buildbot.
Here are our instances:
Jython Hudson
Jython buildbot

We are using Bitten, which is integrated with Trac. It is also Python-based.

TeamCity has some Python integration.
But TeamCity:
is not open-source
is not small, but rather feature-rich
is free for small to mid-sized teams.

I have very good experiences with Travis-CI for smaller code bases.
The main advantages are:
setup is done in less than half a screen of config file
you can do your own installation or just use the free hosted version
semi-automatic setup for github repositories
no account needed on website; login via github
Some limitations:
Python is not supported as a first class language (as of time of writing; but you can use pip and apt-get to install python dependencies; see this tutorial)
code has to be hosted on github (at least when using the official version)

Related

Extracting utilities into libraries for python microservice applications

TL;DR: Any advice or resources on extracting code to reusable, well-structured and maintainable libraries?
I'm working on Python applications in a microservice-style architecture, where we'll be developing and deploying a bunch of small applications, each solving a specific issue, maybe (or maybe not) by interacting with other applications or external services.
We just started moving to that microservice architecture, so we already have quite a bit of code in a monolithic project. As we're adding new microservices, it's obvious that we need to extract common code (e.g. utilities, base classes, ...) into libraries to avoid reimplementing or copy-pasting code that will then have to be maintained separately. As I'm trying to do that (which I've never really done before), I'm realizing it's not trivial and can become complicated pretty quickly, and I could easily spend too much time overthinking it.
So I'm looking for advice, or pointers to resources on best practices related to this situation, i.e. writing well-structured Python libraries, packaging and distributing libraries, sharing code in a microservice architecture, and avoiding mistakes that might put me in problematic situations.
Concrete problems/challenges I'm facing:
* How best to group/separate code in version control. Like, one repository per package? The number of repositories can explode pretty quickly...

For shared libraries, you can publish each one to its own Git repository and set them up so that a Python package manager such as pip can install them into your projects.
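As a small sketch of what "install from a Git repository" can look like, pip accepts direct-reference requirements of the form `name @ git+URL@ref`. The helper below only builds those strings; the library names, the `git.example.com` URLs, and the tags are made up for illustration.

```python
def git_requirement(name, repo_url, ref):
    """Build a direct-reference requirement pinned to a Git tag, branch or commit."""
    return f"{name} @ git+{repo_url}@{ref}"


# Hypothetical shared libraries, one repository each, pinned to a tag.
SHARED_LIBS = [
    ("acme-utils", "https://git.example.com/acme/acme-utils.git", "v1.2.0"),
    ("acme-baseclasses", "https://git.example.com/acme/acme-baseclasses.git", "v0.4.1"),
]


def requirements_lines(libs):
    """Return lines suitable for a service's requirements file."""
    return [git_requirement(name, url, ref) for name, url, ref in libs]
```

Pinning each service to a tag rather than a branch lets every microservice upgrade a shared library on its own schedule.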
As for application deployments, service dependencies, etc., I would advise you to take a look at Docker for containerization, docker-compose for running service dependencies locally, Artifactory or ECR as Docker image registries, and container orchestration platforms like Kubernetes.
Containers are similar to virtual machines but work at a more granular level, the process level. This effectively allows you to run services together locally, both for testing and for deploying them. It no longer matters that each service is in a different repository.
If you don't have too many microservices, you could definitely use a mono-repo, but if your engineering organization is large, it's pretty costly to download all the updates for all the services. As an alternative, you could group the services that belong to the same bounded context into a single repo to reduce that cost. Long story short, it really depends on what you find beneficial. At the end of the day, the largest problems are never how many Git repositories you have; they are how you define the bounds of your services, the service-to-service communication, and the infrastructure for deploying the services.

How to build a web service with one sandboxed Python (VM) per request

As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?

Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started

[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so "published" IPython Notebooks can have some knobs on them for variable input control, then they can be "invoked" in a sandboxed state, and then the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".

I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment. Then you no longer have to worry about sandboxing Python itself :-D
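A minimal sketch of "one container per request", assuming the `docker` CLI is installed on the web server. The image name `python:3-slim`, the limits, and the function names are illustrative assumptions; a real scikit-image service would need an image with that library installed.

```python
import subprocess


def build_docker_command(snippet, image="python:3-slim", memory="256m", cpus="1"):
    """Build a `docker run` command for one throwaway, resource-limited container."""
    return [
        "docker", "run",
        "--rm",                  # delete the container when it exits
        "--network", "none",     # no network access for untrusted code
        "--memory", memory,      # cap RAM
        "--cpus", cpus,          # cap CPU share
        "--pids-limit", "64",    # limit the number of processes (fork bombs)
        "--read-only",           # read-only root filesystem
        image,
        "python", "-c", snippet,
    ]


def run_snippet(snippet, timeout=10):
    """Run the snippet in a fresh container; return (exit code, stdout, stderr).

    The timeout stops the docker client process; a production service
    would also need to make sure the container itself is stopped.
    """
    proc = subprocess.run(
        build_docker_command(snippet),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Because the root filesystem is read-only here, the snippet would have to write its output image to stdout (or to an explicitly mounted scratch directory) for the service to return it.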

I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox, it is broken by design and has severe security risks.

Alternatives to Hadoop / Map-reduce framework for win32 platform

I'm finding Hadoop on Windows somewhat frustrating: I want to know if there are any serious alternatives to Hadoop for Win32 users. The features I most value are:
Ease of initial setup & deployment on a smallish network (I'd be astonished if we ever got more than 20 worker-PCs assigned to this project)
Ease of management - the ideal framework should have web/GUI based administration system so that I do not have to write one myself.
Something popular & stable. Bonuses depend on us getting this project delivered in time.
BACKGROUND:
The company I work for wants to build a new grid system to run some financial calculations.
The first framework I have been evaluating is Hadoop. This seemed to do exactly what was intended except that it's very UNIX oriented. I was able to get all of the tutorials up & running on an Ubuntu VirtualBox. Unfortunately nothing seems to run easily on Win32.
Yes... Win32: Our company has a policy that everything has to run on Windows. None of the server admins (or anybody outside of select few developers) know anything about Linux. I'd probably get in trouble if they found my virtual Ubuntu environment! The sad fact is that our grid needs to be hosted on Win32 (since all the test PCs run Windows XP 32bit), with an option to upgrade to Win64 at sometime in the future.
To complicate matters - 95% of what we want to run are Python scripts with C++ Windows 32bit DLL add ons. Our calculation library is overwhelmingly written in Python. Our calculation libraries will not run on anything other than Windows... I do not really have a choice

For python there is:
disco
bigtempo
celery - not really a map-reduce framework, but it's a good start if you want something very customized
And you can find a bunch of hadoop clients/integrations on pypi
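For readers weighing these options, here is a single-machine sketch of the map / shuffle / reduce shape that such frameworks distribute. It uses only the standard library, with a thread pool standing in for the worker PCs; it is not the API of disco, bigtempo, or celery, and the function names are made up for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(inputs, mapper, reducer, workers=4):
    """Apply mapper to every input, group the pairs by key, then reduce each group."""
    # Map phase: each input is handled independently, so it can be farmed out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, inputs))

    # Shuffle phase: collect all values that share a key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: collapse each group to a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_words(line):
    """Example mapper: emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def total(key, values):
    """Example reducer: sum the counts for one word."""
    return sum(values)
```

For example, `map_reduce(["a b a", "b c"], count_words, total)` groups the counts per word. A real framework replaces the thread pool with remote workers and adds scheduling, data transfer, and fault handling.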

You could try MPI. It is a standard for message-passing concurrent applications. We are running it on our Linux cluster but it is cross-platform. The most popular implementation is mpich2, written in C. There are python bindings for MPI through the mpi4py library.

IPython has some parallel computing features that are simple and work on windows. It may be enough for your needs. Here's a good place to start:
http://showmedo.com/videotutorials/video?name=7200100&fromSeriesID=720

I've compiled a list of available MapReduce/Hadoop offerings in the cloud (hosted services, PaaS-level), this might be of help as well.

Many distributed computing frameworks can be used for many-task computing. If you don't need the MapReduce paradigm, but rather the ability to distribute the tasks of a job across separate computers, communication and resource management, then you could take a look at other platforms in this area like Condor, or even Boinc; both run on Windows.
You could also run Hadoop on Linux virtual machines.

Jython, Jepp or Pylons for the performance

I'm trying to incorporate server-based code diff and highlighting in my GWT (Java) project. I managed to incorporate Pygments and difflib into my code using Jython. The basic idea is to generate complete markup on the server and then simply inject code into the page as innerHTML.
I found Jython completely inadequate: even for relatively small files (2K-3K lines), Pygments or difflib takes forever (minutes, not seconds) to process them. Difflib reliably causes OOM errors in a process with a dedicated 500M of memory.
So I'm wondering if my current setup is wrong or Jython is simply unsuitable for this purpose?
If so, what's next? I discovered Jepp, but then I would have to build my project for each platform, and it has little documentation and doesn't seem very stable. Another possibility would be to run Pylons as a separate web service on the same host and either send the markup directly to the client or channel it through the server. Yet another way is to have Java execute a Python script as a separate process and capture the output.
I would be very interested to hear solid suggestion on the matter.

Having a separate service sounds like the best way to go. For Pygments, there is already a service available (on Google App Engine). The source for the app is BSD open source and on GitHub here. You could adapt this to add difflib functionality too, of course.

I'm going to accept the answer above since it coincides with my findings, but just to let anyone who reads this know: running a separate web service for Pygments using a Python-native solution such as Bottle performs many times better than embedded Jython, especially on Linux.
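For the difflib half of such a service, a minimal sketch of the function a separate CPython process could expose is below. It uses only the standard library's `difflib.HtmlDiff`; the function name and the defaults are illustrative assumptions, and Pygments highlighting is left out.

```python
import difflib


def diff_markup(old_text, new_text, context_lines=3):
    """Return an HTML table showing the differences between two texts."""
    differ = difflib.HtmlDiff(wrapcolumn=100)
    return differ.make_table(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc="old",
        todesc="new",
        context=True,            # show only changed regions plus context
        numlines=context_lines,
    )
```

The returned string is an HTML `<table>` fragment, so the GWT client can inject it as innerHTML as described in the question. It still needs CSS for the `diff_*` classes that difflib emits.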

Tornado and Python 3.x

I really like Tornado and I would like to use it with Python 3, though it is written for Python versions 2.5 and 2.6.
Unfortunately, it seems like the project's source doesn't come with a test suite. If I understand correctly, the WSGI part of it wouldn't be that easy to port, as its spec is not ready for Python 3 yet (?), but I am mostly interested in Tornado's async features, so WSGI compatibility is not my main concern, even if it would be nice.
Basically, I would like to know what to look into and pay attention to when trying to port, or whether there are already ports or forks (I could not find any using Google or browsing GitHub, though I might have missed something).

First of all, I want to apologize for answering an outdated topic,
but since I found this topic through Google, I want to add an important update:
Tornado 2.0 adds support for Python 3.2!
https://github.com/facebook/tornado/blob/master/setup.py
http://groups.google.com/group/python-tornado/browse_thread/thread/69415c13d129578b

Software without a decent test suite is legacy software -- even if it was released yesterday!-) -- so the first important step is to start building a test suite; I recommend Feathers' book in the URL, but you can start with this PDF, which is an essay, also by Feathers, preceding the book and summarizing one of the book's core ideas and practices.
Once you do have the start of a test suite, run it with Python 2.6 and a -3 flag to warn you of things 2to3 may stumble on; once those are fixed, it's time to try 2to3 and try the test suite with Python 3. You'll no doubt have to keep beefing up the test suite as you go, and I recommend regularly submitting all the improvements to the upstream Tornado open source project -- those tests will be useful to anybody who needs to maintain or port Tornado, after all, not just to people interested in Python 3, so, with luck, you might gain followers and more and more contributors to the test suite.
I can't believe that people are releasing major open source projects, in 2009!!!, without decent test suites, but I'm trusting you that this is indeed what the Tornadoers have done...

Tornado is a good web framework over something that kind of looks like twisted, but doesn't have twisted's bug fixes or features. I did a port to twisted a while back that essentially just removed code.
Some of these features are very important. For example, if you're doing WSGI, you're blocking a non-blocking web framework. Bad Things will happen. Twisted's async web framework also has a WSGI container, but it uses deferToThread to prevent it from blocking other requests. Still not the right way to scale an app, but it falls apart much more slowly.
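To illustrate the idea of pushing blocking work onto a thread so the event loop stays responsive, here is a sketch that uses the standard library's asyncio rather than Twisted. `run_in_executor` plays a role analogous to `deferToThread`; the function names and the sleep-based "blocking call" are stand-ins, not real WSGI code.

```python
import asyncio
import time


def blocking_call(n):
    """Stand-in for a blocking WSGI application call."""
    time.sleep(0.05)
    return n * 2


async def heartbeat(ticks):
    """Other work on the event loop; it should keep running meanwhile."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def handle_requests():
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.ensure_future(heartbeat(ticks))
    # Each blocking call runs in the default thread pool, not on the loop.
    results = await asyncio.gather(
        *(loop.run_in_executor(None, blocking_call, n) for n in range(3))
    )
    await beat
    return results, len(ticks)
```

Calling `asyncio.run(handle_requests())` runs the three blocking calls in worker threads while the heartbeat keeps ticking. As noted above, this only delays the problem: the thread pool is finite, so it is still not the right way to scale an app.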
