Python 3.x is looking ever more tempting, with cleaned-up syntax (I like it, others may not), new features, and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by the lack of third-party support. Important packages like Django, Twisted, etc. have not been ported. It's hard to get an overview of where the bottlenecks in the migration are, how far it has come, and whether it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.
George Brandl has made a script that generates a graph of the number of packages supporting Python 3.
The link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also a (pretty crummy) list of unported packages ordered by how many packages depend on them: http://onpython3yet.com/ Why do I say it's crummy? Because it is compiled entirely without manual fixing up, resulting in things like listing Python itself as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to list any sort of random dependency; it should be used to list the packages that should be auto-installed when you use easy_install/pip. In the Django world, for example, people don't know that, so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now, this is not anything "global"; it's just the packages on PyPI, and as such they tend to be mostly Python modules, not separate applications. But I think the general trend is visible there anyway.
The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.
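For a one-off snapshot (rather than the history), a rough sketch along these lines against PyPI's XML-RPC interface should do the counting; I haven't run it, and it assumes the browse() method returns (name, version) pairs for a given classifier:

    # Untested sketch: count packages carrying the Python 3 trove classifier
    # via PyPI's XML-RPC API.
    import xmlrpclib  # xmlrpc.client on Python 3

    client = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
    releases = client.browse(['Programming Language :: Python :: 3'])

    # browse() lists releases, so count distinct package names.
    packages = set(name for name, version in releases)
    print('%d packages declare Python 3 support' % len(packages))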
I've googled and googled, but have found almost nothing in the way of discussions or best practices in managing larger enterprise codebases in Python. Here, I'm simply soliciting any and all pointers to such information. Here's some background and some of the questions I'm looking to answer.
We're long-time Java developers, who have solved similar problems to those mentioned below largely using well established Java best practices, as well as Maven, Ant and a Sonatype Nexus repo.
I'm talking internal software only here. We're not looking to distribute anything Python-based. We've got multiple development groups using Python, each developing sharable utility code libraries, final web applications and stand-alone tools, all in pure Python. Each group has its own Github source repository.
How do we manage our shareable code, both within a group and across groups? Do we create eggs (or something similar) and distribute and install them into the Python system? If so, would we store them in our Nexus repo like our Java jars, or is there a more Python-specific method of internal package distribution? Or, do we just share raw code, checking out sources from multiple Github repos?
If we share raw code, how do we manage getting the Python searchpath right as we bring together code from multiple repositories?
How do we manage package namespaces when we want our packages to all live in a com.ourcompany base namespace? It seems like Python isn't too happy when you bring together source trees with overlapping namespaces.
How do we manage third party package versioning? I've never seen easy_install or pip passed a version number. How do we lock down third party package versions?
Do tools exist to aid in Python code reviews, CI, regression testing, etc.?
We're relative newbies to Python code, so some of these questions may have fairly obvious answers. Still, I find it surprising that I can't find more information on managing larger Python codebases.
What issues will we encounter that I haven't thought to ask about, or don't yet know enough to even know to ask about?
Any valuable pointers will be greatly appreciated.
Well, I won't even try to answer all those (excellent) questions, but here are a few opinionated pointers which will hopefully help (as someone who works in both worlds, though more Java).
Packaging
If so, would we store them in our Nexus repo like our Java jars, or is there a more Python-specific method of internal package distribution? Or, do we just share raw code, checking out sources from multiple Github repos?
Packaging in Python is historically a bit of a mess IMHO, though it feels like it's improving. Distutils is the major / native tool here - I've not used it much; it feels slightly scary in places. In general, also check the recommended tools.
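For what it's worth, a minimal setup.py for an internal library looks something like this (all names here are made up, and I'm using setuptools rather than raw distutils):

    # Minimal sketch of a setup.py for an internal shared library.
    # The package name and metadata are hypothetical.
    from setuptools import setup, find_packages

    setup(
        name='ourcompany-commonutils',
        version='0.3.1',
        description='Shared utility code for internal projects',
        packages=find_packages(),
    )

Running python setup.py sdist then gives you an artifact you can drop on a network share or an internal index and install from with pip.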
Pip has all but won the mindshare war, especially for installing third-party libraries. I've not solved the local-library problem myself (maybe someone else reading has), but if I had to, I'd probably opt for pip with local/network-disk repos, e.g. by installing from wheels.
Another option (which can cause all sorts of hassles itself) is to package in your OS's native packager, be it Debian-style apt packages, RPMs, etc. Of course, on Windows, not so much.
Versioning etc
How do we manage third party package versioning? I've never seen easy_install or pip passed a version number.
Pip
Pip definitely supports version specifiers. Turns out Easy Install does too. I suppose many people / smaller projects opt for latest-and-greatest, which of course isn't always as "appropriate" in the enterprise...
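To illustrate (package names and versions are purely illustrative), the same specifier syntax works on the pip command line, in a requirements file, or in a project's install_requires:

    # Sketch: pinning third-party versions via install_requires; pip honours
    # these specifiers when it installs the package. Names/versions are made up.
    from setuptools import setup

    setup(
        name='ourcompany-webapp',
        version='1.0',
        install_requires=[
            'Django==1.4.5',        # exact pin
            'requests>=1.2,<2.0',   # bounded range
        ],
    )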
Virtualenv
No discussion of versioning and Python would miss a Python2/3 reference, but I'm sure you're aware of all this already.
More important, then, would be to mention virtualenv. It truly frees you from the mess you can get into when testing multiple versions, bearing in mind especially that your (*NIX) operating systems typically rely heavily on Python themselves. It's a big subject, so have a look at the docs.
Developer Tooling
Do tools exist to aid in Python code reviews, CI, regression testing, etc.?
Code Review
Very much so. Most code review tools are multi-language (it's just a formatting issue really), so just pick your favourite enterprise-friendly one, be it Crucible, Github's one (Barkeep?), Gerrit, or whatever.
CI
For CI you have almost as many options again. Running Python apps is usually less involved than Java ones, so most CI systems, though often Java-focused, support Python. (FWIW, we use drone.io for Quod Libet.) Jenkins should have no problem doing this, and it seems people have done so with TeamCity.
However, the "original" or "most Pythonic" option is probably Buildbot, though I've not used it personally. It looks a lot newer than I remember, and it has quite a lot of support in the Python community, I think...
Testing
For testing, though not quite as mature as JUnit / TestNG, check out the de-facto, JUnit-like unittest module, but also (nicer?) alternatives like nose.
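A trivial unittest sketch, just to show the JUnit-ish flavour (the function under test is a stand-in):

    # test_sample.py - minimal unittest example
    import unittest

    def add(a, b):
        # Stand-in for real library code.
        return a + b

    class AddTests(unittest.TestCase):
        def test_adds_integers(self):
            self.assertEqual(add(2, 3), 5)

        def test_adds_strings(self):
            self.assertEqual(add('py', 'thon'), 'python')

    if __name__ == '__main__':
        unittest.main()

Nose will happily discover and run tests written this way too.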
For higher-level (BDD) testing, try something like Lettuce - as the name implies, heavily inspired by Cucumber - or maybe Behave. I've not tried them, but common opinion is they're less mature than Cucumber / JBehave / Concordion / RSpec etc.
I am new at writing APIs in Python - in any language, for that matter. I was hoping to get pointers on how I can create an API that can be installed using the setup.py method and used in other Python projects, something similar to the twitterapi.
I have already created and coded all the methods I want to include in the API. I just need to know how to implement the installation so others can use my code to leverage ideas they may have, or whether I need to format the code a certain way to facilitate installation.
I learn best with examples or tutorials.
Thanks so much.
It's worth noting that this part of Python is undergoing some changes right now. It's all a bit messy. The most current overview I know of is the Hitchhiker's Guide to Packaging: http://guide.python-distribute.org/
The current state of packaging section is important: http://guide.python-distribute.org/introduction.html#current-state-of-packaging
The Python packaging world is a mess (like poswald said). Here's a brief overview along with a bunch of pointers. Your basic problem (using setup.py etc.) is solved by reading the distutils guide which msw has mentioned in his comment.
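In its simplest form, a setup.py is only a few lines; here's a rough sketch (the package name is a placeholder for your own):

    # Bare-bones setup.py using distutils, as described in the guide.
    # 'mytwitterapi' is a placeholder for your package directory.
    from distutils.core import setup

    setup(
        name='mytwitterapi',
        version='0.1',
        description='My API wrapper',
        packages=['mytwitterapi'],
    )

With that in place, python setup.py sdist builds a tarball that others can unpack and install with python setup.py install (or with pip/easy_install).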
Now for the dirt. The basic infrastructure for distribution in the Python standard library is distutils, referred to above. It's limited in some ways, and so a series of extensions was written on top of it called setuptools. Setuptools, along with actually increasing the functionality, provided a command-line "installer" called "easy_install".
Setuptools maintenance was not too great, and so it was forked and a more active branch called "distribute" was set up; it is the preferred alternative right now. In addition to this, a replacement for easy_install named pip was created, which was more modular and useful.
Now there's a huge project under way which attempts to fold the changes from distribute and friends into a unified library that will go into the stdlib. It's tentatively called "distutils2".
What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
    libs
        python
        utilities
    projects
        projA
        projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that, often, one project (e.g. projA) will be running while projB needs to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as inputs for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, to check out another copy of the trunk to a different location and run from there. But then I need to remember to change my PYTHONPATH to point to the second version of lib/python, not the one in the first tree.
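Concretely, the workaround amounts to something like this at the top of a project's entry script, instead of (or in addition to) fiddling with PYTHONPATH; the path is purely illustrative:

    import sys
    # Prefer the second checkout's shared code over whatever is on PYTHONPATH.
    sys.path.insert(0, '/scratch/trunk-copy2/libs/python')  # illustrative path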
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
The much better solution involves not storing all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.
Suppose I've developed a general-purpose end user utility written in Python. Previously, I had just one version available which was suitable for Python later than version 2.3 or so. It was sufficient to say, "download Python if you need to, then run this script". There was just one version of the script in source control (I'm using Git) to keep track of.
With Python 3, this is no longer necessarily true. For the foreseeable future, I will need to simultaneously develop two different versions, one suitable for Python 2.x and one suitable for Python 3.x. From a development perspective, I can think of a few options:
1. Maintain two different scripts in the same branch, making improvements to both simultaneously.
2. Maintain two separate branches, and merge common changes back and forth as development proceeds.
3. Maintain just one version of the script, plus check in a patch file that converts the script from one version to the other. When enough changes have been made that the patch no longer applies cleanly, resolve the conflicts and create a new patch.
I am currently leaning toward option 3, as the first two would involve a lot of error-prone tedium. But option 3 seems messy and my source control system is supposed to be managing patches for me.
For distribution packaging, there are more options to choose from:
1. Offer two different download packages, one suitable for Python 2 and one suitable for Python 3 (the user will have to know to download the correct one for whatever version of Python they have).
2. Offer one download package, with two different scripts inside (and then the user has to know to run the correct one).
3. One download package with two version-specific scripts, and a small stub loader that can run in both Python versions, that runs the correct script for the Python version installed.
Again I am currently leaning toward option 3 here, although I haven't tried to develop such a stub loader yet.
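Something along these lines is what I have in mind for the stub (untested, and the script names are placeholders):

    #!/usr/bin/env python
    # Stub loader sketch: run the script matching the current Python version.
    # 'mytool_py2.py' / 'mytool_py3.py' are placeholder names.
    import os
    import subprocess
    import sys

    HERE = os.path.dirname(os.path.abspath(__file__))

    if sys.version_info[0] >= 3:
        script = os.path.join(HERE, 'mytool_py3.py')
    else:
        script = os.path.join(HERE, 'mytool_py2.py')

    # Re-invoke the matching script with the same interpreter and arguments.
    sys.exit(subprocess.call([sys.executable, script] + sys.argv[1:]))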
Any other ideas?
Edit: my original answer was based on the state of 2009, with Python 2.6 and 3.0 as the current versions. Now, with Python 2.7 and 3.3, there are other options. In particular, it is now quite feasible to use a single code base for Python 2 and Python 3.
See Porting Python 2 Code to Python 3
Original answer:
The official recommendation says:
For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:

(Prerequisite:) Start with excellent test coverage.

Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.

(Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.

Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.

It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you'd have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.
Ideally, you would end up with a single version that is 2.6-compatible and can be translated to 3.0 using 2to3. In practice, you might not be able to achieve this goal completely, so you might need some manual modifications to get it to work under 3.0.
I would maintain these modifications in a branch, like your option 2. However, rather than maintaining the final 3.0-compatible version in this branch, I would consider applying the manual modifications before the 2to3 translation, and putting this modified 2.6 code into your branch. The advantage of this method is that the difference between this branch and the 2.6 trunk would be rather small, consisting only of the manual changes rather than the changes made by 2to3. This way, the separate branches should be easier to maintain and merge, and you should be able to benefit from future improvements in 2to3.
Alternatively, take a bit of a "wait and see" approach. Proceed with your porting only so far as you can go with a single 2.6 version plus 2to3 translation, and postpone the remaining manual modification until you really need a 3.0 version. Maybe by this time, you don't need any manual tweaks anymore...
For development, option 3 is too cumbersome. Maintaining two branches is the easiest way, although how you do that will vary between VCSes. Many DVCSes will be happier with separate repos (with a common ancestry to help merging), while a centralized VCS will probably be easier to work with using two branches. Option 1 is possible but you may miss something to merge, and it's a bit more error-prone IMO.
For distribution, I'd use option 3 as well if possible. All three options are valid anyway, and I have seen variations on these models from time to time.
I don't think I'd take this path at all. It's painful whichever way you look at it. Really, unless there's strong commercial interest in keeping both versions simultaneously, this is more headache than gain.
I think it makes more sense to just keep developing for 2.x for now, at least for a few months, up to a year. At some point it will simply be time to declare a final, stable version for 2.x and develop the next ones for 3.x+.
For example, I won't switch to 3.x until some of the major frameworks go that way: PyQt, matplotlib, numpy, and some others. And I don't really mind if at some point they stop 2.x support and just start developing for 3.x, because I'll know that in a short time I'll be able to switch to 3.x too.
I would start by migrating to 2.6, which is very close to Python 3.0. You might even want to wait for 2.7, which will be even closer to Python 3.0.
And then, once you have migrated to 2.6 (or 2.7), I suggest you simply keep just one version of the script, with things like "if PY3K:... else:..." in the rare places where it will be mandatory. Of course it's not the kind of code we developers like to write, but then you don't have to worry about managing multiple scripts or branches or patches or distributions, which will be a nightmare.
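For example, a minimal sketch of that style (the specific imports are just a common illustration):

    import sys

    PY3K = sys.version_info[0] >= 3

    if PY3K:
        from io import StringIO
        text_type = str
    else:
        from StringIO import StringIO
        text_type = unicode  # only evaluated on Python 2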
Whatever you choose, make sure you have thorough tests with 100% code coverage.
Good luck!
Whichever option is chosen for development, most potential issues could be alleviated with thorough unit testing to ensure that the two versions produce matching output. That said, option 2 seems most natural to me: applying changes from one source tree to another is a task (most) version control systems were designed for, so why not take advantage of the tools they provide to ease this?
For distribution, it is difficult to say without 'knowing your audience'. Power Python users would probably appreciate not having to download two copies of your software, yet for a more general user base it should probably 'just work'.
One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. The reason is that I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal with the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix into Pinax, the current process is simply to make a minor release of the app, assuming we have control of the app. For apps we don't control, we just deal with the release cycle of the app author, or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes, as in some cases I'd like to be working on new features for apps as well. Of course, branching the older version is what we do, and then we backport as we need.
I'd love to hear some thoughts on this.
Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?
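If it helps, one way to express this in a consuming project's setup.py looks roughly like the following (the app name and URL are placeholders; the same URL-with-#egg form can also go straight into a pip requirements file):

    # Rough sketch: depending on an in-development app via the '==dev' trick.
    # 'someapp' and the GitHub tarball URL are placeholders.
    from setuptools import setup

    setup(
        name='pinax-something',
        version='0.1',
        install_requires=['someapp==dev'],
        dependency_links=[
            'http://github.com/someuser/someapp/tarball/master#egg=someapp-dev',
        ],
    )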
I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.
Most open source distributors (the Debians, Ubuntus, MacPorts, et al.) use some sort of patch management mechanism. So, something like: import the base source code for each package as released, as a tarball or as an SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial's Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OS's are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would interact there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option may be for you to keep a mirror of the apps you don't control in a consistent VCS, and then distribute your mirrored versions. This would take away the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are doing; there isn't a hassle-free way that I've been able to find.