Python 3 development and distribution challenges


Suppose I've developed a general-purpose end user utility written in Python. Previously, I had just one version available which was suitable for Python later than version 2.3 or so. It was sufficient to say, "download Python if you need to, then run this script". There was just one version of the script in source control (I'm using Git) to keep track of.
With Python 3, this is no longer necessarily true. For the foreseeable future, I will need to simultaneously develop two different versions, one suitable for Python 2.x and one suitable for Python 3.x. From a development perspective, I can think of a few options:
1. Maintain two different scripts in the same branch, making improvements to both simultaneously.
2. Maintain two separate branches, and merge common changes back and forth as development proceeds.
3. Maintain just one version of the script, plus check in a patch file that converts the script from one version to the other. When enough changes have been made that the patch no longer applies cleanly, resolve the conflicts and create a new patch.
I am currently leaning toward option 3, as the first two would involve a lot of error-prone tedium. But option 3 seems messy and my source control system is supposed to be managing patches for me.
For distribution packaging, there are more options to choose from:
1. Offer two different download packages, one suitable for Python 2 and one suitable for Python 3 (the user will have to know to download the correct one for whatever version of Python they have).
2. Offer one download package, with two different scripts inside (and then the user has to know to run the correct one).
3. One download package with two version-specific scripts, and a small stub loader that can run in both Python versions, that runs the correct script for the Python version installed.
Again I am currently leaning toward option 3 here, although I haven't tried to develop such a stub loader yet.
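A stub loader along these lines might look like the sketch below. It is only an illustration: the file names myutil_py2.py and myutil_py3.py are hypothetical, and it assumes both scripts sit next to the stub. The stub itself sticks to syntax that both Python 2.7 and Python 3 accept.

```python
import os
import runpy
import sys

# Hypothetical file names; the real package would use its own.
SCRIPTS = {2: "myutil_py2.py", 3: "myutil_py3.py"}


def pick_script(major=None):
    """Return the script name for the given (or running) major version."""
    if major is None:
        major = sys.version_info[0]
    try:
        return SCRIPTS[major]
    except KeyError:
        raise RuntimeError("unsupported Python major version: %d" % major)


def main():
    """Run the version-specific script that lives next to this stub."""
    here = os.path.dirname(os.path.abspath(__file__))
    target = os.path.join(here, pick_script())
    # run_path executes the file as if it had been started directly.
    runpy.run_path(target, run_name="__main__")
```

The stub would end with a call to main(). Because only the stub is parsed before the version check, the two real scripts are free to use version-specific syntax.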
Any other ideas?

Edit: my original answer was based on the state of 2009, with Python 2.6 and 3.0 as the current versions. Now, with Python 2.7 and 3.3, there are other options. In particular, it is now quite feasible to use a single code base for Python 2 and Python 3.
See Porting Python 2 Code to Python 3
Original answer:
The official recommendation says:
For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:
0. (Prerequisite:) Start with excellent test coverage.
1. Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.
2. (Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.
3. Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.
It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you’d have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.
Ideally, you would end up with a single version, that is 2.6 compatible and can be translated to 3.0 using 2to3. In practice, you might not be able to achieve this goal completely. So you might need some manual modifications to get it to work under 3.0.
I would maintain these modifications in a branch, like your option 2. However, rather than maintaining the final 3.0-compatible version in this branch, I would consider applying the manual modifications before the 2to3 translation, and putting this modified 2.6 code into your branch. The advantage of this method is that the difference between this branch and the 2.6 trunk would be rather small, and would consist only of manual changes, not the changes made by 2to3. This way, the separate branches should be easier to maintain and merge, and you should be able to benefit from future improvements in 2to3.
Alternatively, take a bit of a "wait and see" approach. Proceed with your porting only so far as you can go with a single 2.6 version plus 2to3 translation, and postpone the remaining manual modification until you really need a 3.0 version. Maybe by this time, you don't need any manual tweaks anymore...

For development, option 3 is too cumbersome. Maintaining two branches is the easiest way, although how you do that will vary between VCSes. Many DVCSes will be happier with separate repos (with a common ancestry to help merging), and a centralized VCS will probably be easier to work with using two branches. Option 1 is possible, but you may miss something that needs merging, and it is a bit more error-prone IMO.
For distribution, I'd use option 3 as well if possible. All 3 options are valid anyway, and I have seen variations on these models from time to time.

I don't think I'd take this path at all. It's painful whichever way you look at it. Really, unless there's strong commercial interest in keeping both versions simultaneously, this is more headache than gain.
I think it makes more sense to just keep developing for 2.x for now, at least for a few months, up to a year. At some point it will be time to declare a final, stable version for 2.x and develop the next ones for 3.x+.
For example, I won't switch to 3.x until some of the major frameworks go that way: PyQt, matplotlib, numpy, and some others. And I don't really mind if at some point they stop 2.x support and just start developing for 3.x, because I'll know that in a short time I'll be able to switch to 3.x too.

I would start by migrating to 2.6, which is very close to python 3.0. You might even want to wait for 2.7, which will be even closer to python 3.0.
And then, once you have migrated to 2.6 (or 2.7), I suggest you simply keep just one version of the script, with things like "if PY3K:... else:..." in the rare places where it will be mandatory. Of course it's not the kind of code we developers like to write, but then you don't have to worry about managing multiple scripts or branches or patches or distributions, which will be a nightmare.
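As a rough sketch of what the "if PY3K: ... else: ..." style can look like, here is one way to isolate the differences in a few small helpers. The names text_type, iteritems and describe are made up for this example; they are not from any particular library.

```python
import sys

# True when running under Python 3 or later.
PY3K = sys.version_info[0] >= 3

if PY3K:
    text_type = str

    def iteritems(mapping):
        """Iterate over (key, value) pairs on Python 3."""
        return iter(mapping.items())
else:
    text_type = unicode  # this name only exists on Python 2

    def iteritems(mapping):
        """Iterate over (key, value) pairs on Python 2."""
        return mapping.iteritems()


def describe(mapping):
    """Format a mapping the same way on both major versions."""
    pairs = sorted(iteritems(mapping))
    return ", ".join("%s=%s" % (key, value) for key, value in pairs)
```

Keeping the version checks in one place like this means the rest of the script can call describe() or iteritems() without caring which interpreter is running.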
Whatever you choose, make sure you have thorough tests with 100% code coverage.
Good luck!

Whichever option for development is chosen, most potential issues could be alleviated with thorough unit testing to ensure that the two versions produce matching output. That said, option 2 seems most natural to me: applying changes from one source tree to another source tree is a task (most) version control systems were designed for, so why not take advantage of the tools they provide to ease this?
For distribution, it is difficult to say without 'knowing your audience'. Power Python users would probably appreciate not having to download two copies of your software, yet for a more general user base it should probably 'just work'.

Related

Checking Python standard library function/method calls for compatibility with old Python versions

I have a set of scripts and utility modules that were written for a recent version of Python 3. Now suddenly, I have a need to make sure that all this code works properly under an older version of Python 3. I can't get the user to update to a more recent Python version -- that's not an option. So I need to identify all the instances where I've used some functionality that was introduced since the old version they have installed, so I can remove it or develop workarounds.
Approach #1: eyeball all the code and compare against documentation. Not ideal when there's this much code to look at.
Approach #2: create a virtual environment locally based on the old version in question using pyenv, run everything, see where it fails, and make fixes. I'm doing this anyway, because backporting to the older Python will also mean going backwards in a number of needed third-party modules from PyPI, and I'll need to make sure that the suite still functions properly. But I don't think it's a good way to identify all my version incompatibilities, because much of the code is only exercised based on particular characteristics of input data, and it'd be hard to make sure I exercise all the code (I don't yet have good unit tests that ensure every line will get executed).
Approach #3: in my virtual environment based on the older version, I used pyenv to install the pylint module, then used this pylint module to check my code. It ran; but it didn't identify issues with standard library calls. For example, I know that several of my functions call subprocess.run() with the "capture_output=" Boolean argument, which didn't become available until version 3.7. I expected the 3.6 pylint run to spot this and yell at me; but it didn't. Does pylint not check standard library calls against definitions?
Anyway, this is all I've thought of so far. Any ideas gratefully appreciated. Thanks.

If you want to use pylint to check 3.6 code, the most effective way is to use a 3.6 interpreter and environment and run pylint in it. If you want to use the latest pylint version, you can set the py-version option to 3.6, but this will probably catch fewer issues, because pylint will not check against what you would actually have in Python 3.6, only some known "hard-coded" issues (for example, f-strings for Python 3.5, but not missing arguments in subprocess.run).
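For the subprocess.run() case specifically, a workaround that avoids the newer keyword is to pass the pipes explicitly. The sketch below also includes a small runtime check built on inspect.signature; the helper names run_captured and accepts_keyword are invented for this example, and the check only sees parameters that are named in the signature, not ones swallowed by **kwargs.

```python
import inspect
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and capture its output without the 3.7-only keyword."""
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )


def accepts_keyword(func, name):
    """Report whether func names this parameter in its signature."""
    return name in inspect.signature(func).parameters


# Example: run a child interpreter and look at what it printed.
result = run_captured([sys.executable, "-c", "print('hello')"])
```

Running accepts_keyword(subprocess.run, "capture_output") inside the old environment is a quick way to confirm what that interpreter really supports.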

As noted in the comments, the real issue is that you do not have a proper test suite, so the question is how can you get one cheaply.
Adding unit tests can be time consuming. Before doing that, you can add actual end-to-end tests (which cost more computation time and give slower feedback, but are easier to implement): simply run the program with the version of Python it currently works with, store the results, and then add a test that checks you reproduce the same results.
This kind of test is usually expensive to maintain (each time you change the behavior, you have to update the stored results). However, such tests are a safeguard against regression, and allow you to perform some heavy refactoring on legacy code in order to move to a more testable structure.
In your case, these end-to-end tests will allow you to test the actual application (not only parts of it) against several versions of Python.
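A minimal sketch of such a record-and-compare test is shown below. The function names and the idea of keeping one "golden" file per command are assumptions made for illustration; a real suite would probably plug this into its test runner.

```python
import os
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return everything it wrote, stderr included."""
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)


def check_against_golden(cmd, golden_path):
    """Record output on the first run; compare against it afterwards."""
    actual = run_and_capture(cmd)
    if not os.path.exists(golden_path):
        # First run: store the reference output.
        with open(golden_path, "wb") as handle:
            handle.write(actual)
        return True
    with open(golden_path, "rb") as handle:
        return handle.read() == actual
```

The golden files would be recorded once with the Python version that is known to work, and the same check would then be run under the older interpreter.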
Once you have a better test suite, you can then decide whether these heavy end-to-end tests are worth keeping, based on the maintenance burden of the test suite (let's not forget that the test suite should not slow you down in your development, so if it is the bottleneck, you should rethink your testing).
What will take time is generating good input data for your end-to-end tests. To help with that, you should use a coverage tool (you might even spot unreachable code thanks to it). If there are parts of your code that you can't manage to reach, I would not bother about them at first, as they are unlikely to be reached by your client (and in case one is reached and fails at your client, be sure to have proper logging implemented so you can add the test case to your test suite).

Checking code for compatibility with Python 2 and 3

Is there any automated way to test that code is compatible with both Python 2 and 3? I've seen plenty of documentation on how to write code that is compatible with both, but nothing on automatically checking. Basically a kind of linting for compatibility between versions rather than syntax/style.
I have thought of either running tests with both interpreters or running a tool like six or 2to3 and checking that nothing is output; unfortunately, the former requires that you have 100% coverage with your tests and I would assume the latter requires that you have valid Python 2 code and would only pick up issues in compatibility with Python 3.
Is there anything out there that will accomplish this task?

There is no "fool-proof" way of doing this other than running the code on both versions and finding inconsistencies. With that said, CPython 2.7 has a -3 flag which (according to the man page):
Warn about Python 3.x incompatibilities that 2to3 cannot trivially fix.
As for the case where you have valid Python 3 code and you want to backport it to Python 2.x: you likely don't actually want to do this. Python 3.x is the future. This is likely to be a very painful problem in the general case. A lot of the reason to start using Python 3.x is that you gain access to all sorts of cool new features. Trying to rewrite code that already relies on cool new features is frequently going to be very difficult. You're much better off upgrading Python 2.x packages to work on Python 3.x than doing things the other way around.
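Running the code on both versions can be partly automated. The sketch below only checks that a file compiles under a given interpreter, using the standard py_compile module; it catches syntax-level problems (such as a bare print statement) but says nothing about runtime behaviour. The function name compiles_under is made up for this example.

```python
import subprocess


def compiles_under(interpreter, path):
    """Return True if the interpreter can byte-compile the file."""
    # py_compile exits with a non-zero status when compilation fails.
    status = subprocess.call([interpreter, "-m", "py_compile", path])
    return status == 0
```

Calling it once per installed interpreter (for instance with the paths to a Python 2 and a Python 3 executable) gives a quick first pass before running the real test suite under each.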

Could you not look at the output from 2to3 to see if any code changes may be necessary?

Python's freeze.py doesn't install on Windows

I have been looking for the freeze.py utility which is supposed to come bundled with Python 3 in a Python 3.3 Windows install (albeit with distribute and pip installed) and haven't found it. The utility can be downloaded directly out of the Python svn repository here, but I'm wondering: does freeze come with a standard Windows Python 3 install?

It looks like Windows binary installations of Python don't come with the freeze tool. And there's apparently a good reason for this. According to the freeze README in the source tree:
Under Windows 95 or NT, you must use the -p option and point it to the top of the Python source tree.
If you read the whole section, it comes down to this: On Windows, freeze only works if you've built Python from source, and have the resulting tree sitting around to be used for freezing. So, there's no good reason to give you freeze in binary installations.
Meanwhile, I probably should have asked this in the first place, but… are you sure you want freeze in the first place?
The freeze utility is very out of date (you might have guessed that from the README talking about requiring VC++ 5.0, Windows 95 or NT 4.0, etc.). It also never worked that well on Windows (as you can tell from the documentation describing it as a utility "… to compile executables for Unix systems"). And there are just a lot of things it can't handle, or handles badly. At this point it should probably be considered more as example code than as a useful tool.
There are a number of third-party alternatives out there: cx_freeze, py2exe, PyInstaller, etc. If you search PyPI for "freeze" (and other terms that seem reasonable), you will find a bunch of these alternatives. If your goal is to create a standalone executable out of your Python script (which, btw, freeze can never do on Windows anyway), experiment with a few of these and pick the one you like best.
If your goal is something different, the right tool will be different—you might be better off using venv or just zipping up a user site-packages directory or creating a local PyPI server.
In the comments, you said:
What I was actually looking for is a tool to convert Python code to C code. Apparently, that's impossible.
It's not impossible, it's just not what freeze (or its successors/competitors) does. Cython compiles almost a strict superset of Python to C code, although it's C code that uses Python runtime objects (except where you explicitly statically declare variables and functions with C types). If C++ is an acceptable alternative to C, Shed Skin compiles a restricted subset of Python 2.6 (using native C++ objects, and using type inference so you don't have to statically declare your types).
The question is why you want to compile Python code to C.
If you're looking to optimize some slow code, Cython is great at speeding up small pieces of bottleneck code. It takes a bit of effort (deciding what to move to Cython, what static type declarations to put in, etc.), but the curve of payoff to effort is pretty solid. Shed Skin takes a lot less effort—if it works, it just speeds up everything, automatically—but it also means you can't write a lot of idiomatic Python code in the first place. But really, before looking at either, you should consider PyPy, a complete implementation of Python 2.7.3 (and hopefully 3.3 soon) in a JIT-compiling interpreter, that often offers similar speedups, with pretty much no tradeoffs at all. Or, alternatively, you may just need to rewrite slow code to take advantage of already-optimized libraries (numpy instead of mapping over lists, itertools instead of explicit loops, lxml instead of html.parse, …).
If you're looking to write Python code that can interact directly with C code, without all the headaches of ctypes (or manually building Python bindings), Cython scores again. Cython code can effectively natively call both Python code and C code, and the compiler makes it all work like magic.
If you're looking to get C code that you can read, maintain, and improve on… there, you're out of luck. And this one may actually be impossible. Idiomatic Python code is just so different from idiomatic C code that it's hard to imagine how you could translate one into the other.
If you're wondering what the underlying problem is:
As far as I can tell, freeze makes a lot of assumptions about how things are laid out. It should be enough to have any Python installation that can build C extension modules and embedding apps, but it's not, because freeze goes under the covers and expects that building to work in specific ways. A standard binary installation on almost every *nix platform ends up looking like what freeze expects,* but a standard binary installation on Windows looks completely different.
It's not impossible to hack things up using Windows symlinks (at least if you have Vista or later and a drive with a modern version of NTFS) to get everything organized the way freeze expects (I found a blog where someone did that with 2.7.1…), but really, I don't think it's worth trying. It will be a lot of work (especially if you're just learning this stuff), and there's no guarantee you won't immediately run into another problem.
* This isn't actually true. On a Mac, both Apple's pre-installed Python and the binary installers at python.org actually give you the files organized as a Mac framework—but they provide a bunch of symlinks that simulate the traditional layout, which is good enough. On most linux distros, and many other platforms, the binary python package doesn't include any of the development files at all—but once you install an add-on binary package named something like python-devel, then you've got the right layout. Anyway, none of this matters to you, because if you wanted to learn about dpkg dependencies or framework builds you wouldn't be using Windows, right?

Tracking global migration to Python 3.x

Python 3.x is looking ever more tempting with cleaned up syntax (I like it, others may not) new features and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by lack of 3rd party support. Important packages like Django, Twisted, etc. are not ported. It's hard to get an overview of where the bottlenecks in the migration are, how far it has come, and if it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.

Georg Brandl has made a script that generates a graph of the number of packages supporting Python 3:
The link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also a (pretty crummy) list of unported packages ordered by how many packages depend on them: http://onpython3yet.com/ Why do I say it's crummy? Well, because it is done entirely without manual fixing up, resulting in things like listing Python as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to just list any sort of random dependencies; it should be used to list the packages that should be auto-installed when you use easy_install/pip. But for example in the Django world, they don't know that, so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now this is not anything "global" it's just the packages on PyPI, and as such tend to be mostly Python modules, not separate applications. But I think the trend in general is visible there anyway.

The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.

Is it still Python 2.6 versus Python 3?

G'day,
I'm wanting to go back to Python after not using it for a while and I saw this question "Python Version for a Newbie" while wondering about getting back into Python 2.6 or Python 3.
Almost all of the questions' answers were along the lines that most of the code out there, libraries, legacy systems, etc., is 2.5 or 2.6 rather than 3 so start with 2.x now and then head towards 3 later on.
Given that the question and all answers date from early December 2008 I was wondering is this still the case?
Should someone who wants to get back into Python maybe start off with 2.6 and then head towards 3 later on?

Yes. Virtually all live production systems will use 2.5/2.6 for a long time yet. There's no point learning 3.0, only to have to downgrade it because your host doesn't support it.
95% of what you will learn in 2.5/2.6 is applicable to 3 anyway.

Depends on the number of libraries you're going to use.
Raw Python, or all libs are available for Py3k - go for it without any doubts.
Python code distributed as standalone app (using PyInstaller), relying on some GUI lib, XML-lib, win32api etc - double check if all libs are available at least as betas for Py3k. Chances are still quite high that some older lib is not available for Python 3.x, and either you port it by yourself to new Python version, or you switch to some other lib or - stick to Python 2.6 for a while.

If you want to use only the standard library, then try Python 3.1. If you want to use other libraries/frameworks, then they dictate the version to use. For example, the web2py framework will work best on 2.5.

I would say that Python 2.4 is the safest to learn, but the changes from 2.4->2.5->2.6 make some small progress towards Python 3.x, even if they may never make it (if I recall there will be some more steps?).
Python 3.1 can be used if you own a dedicated server and intend to build your own applications from the ground up. WSGI does support this, but I wouldn't recommend it.
As has already been said, I would learn the Python 2.5 or Python 2.6 style, but I would make a few changes.
Look at the Python 3 style regarding brackets.
e.g. The print statement in 2.x has always been just
print "Hello World"
Whereas in 3.x print is a function, so you need to enclose the argument in parentheses
print("Hello World")
This is probably a good practice to pick up on, but things like Exceptions will cause issues if you use 3.x in 2.x. I know it's probably a bit confusing, but if you make sure you wrap your functions (additional brackets shouldn't really hurt most things) so that nothing is bare (bare like the first code snippet above), then it'll help with the transition.
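To make that concrete, here is a small sketch that runs unchanged on Python 2.6+ and Python 3. It relies on two things: print with a single parenthesized argument behaves the same in both (with several comma-separated arguments Python 2 would print a tuple instead), and the "except ... as ..." form is accepted by both. The function name parse_port is just an example.

```python
def parse_port(text):
    """Convert text to an integer port, or return None if it is invalid."""
    try:
        return int(text)
    except ValueError as exc:  # "except ValueError, exc" is Python 2 only
        # A single parenthesized argument prints the same on 2.x and 3.x.
        print("bad port: %s" % exc)
        return None
```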

The problem is, if you started with 2.4 or later, it is better to continue from there, so you'll get on track faster. After some time, when you feel comfortable with your code, you can try 3.0, find out what changed, and learn the new style.
I for one still code in 2.6 style and follow those guidelines; I still haven't looked at the changes in 3.0.
