I've read in a few places that, generally, Python doesn't provide backward compatibility, which means that any newer version of Python may break code that worked fine on earlier versions. If so, how can I, as a developer, know which versions of Python can execute my code successfully? Is there any set of rules or guarantees regarding this? Or should I just tell my users: run this with Python 3.8 (for example), no more, no less?
99% of the time, if it works on Python 3.x, it'll work on 3.y where y >= x. Enabling warnings when running your code on the older version should pop DeprecationWarnings when you use a feature that's deprecated (and therefore likely to change/be removed in later Python versions). Aside from that, you can read the What's New docs for each version between the known good version and the later versions, in particular the Deprecated and Removed sections of each.
Beyond that, the only solution is good unit and component tests (you are using those, right? 😉) that you rerun on newer releases to verify stuff still works & behavior doesn't change.
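As a rough illustration of the "enable warnings" advice above, here is a minimal sketch that turns DeprecationWarning into an error for the duration of a check. The old_api function is hypothetical and only stands in for a deprecated library call; real test suites would usually configure this through the test runner instead.

```python
import warnings


def old_api():
    # Hypothetical function standing in for a deprecated library call.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def uses_deprecated_feature(func):
    """Return True if calling func() emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate deprecations to exceptions inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


print(uses_deprecated_feature(old_api))
print(uses_deprecated_feature(lambda: 42))
```

For a whole run, the interpreter's own -W option does the same thing without code changes, e.g. python -W error::DeprecationWarning your_script.py.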
According to PEP 387, section "Making Incompatible Changes", before an incompatible change is made, a deprecation warning should appear in at least two minor Python versions of the same major version, or one minor version in an older major version. After that, it's fair game, in principle. This made me cringe with regard to safety. Who knows whether people run airplanes on Python without always reading the python-dev list? So if you have something that passes 100% coverage unit tests without deprecation warnings, your code should be safe for the next two minor releases.
You can avoid this issue and many others by containerizing your deployments.
tox is great for running unit tests against multiple Python versions. That’s useful for at least 2 major cases:
You want to ensure compatibility for a certain set of Python versions, say 3.7+, and to be told if you make any breaking changes.
You don’t really know what versions your code supports, but want to establish a baseline of supported versions for future work.
I don’t use it for internal projects where I control the environment my code will run in. It’s lovely for people publishing apps or libraries to PyPI, though.
I'm looking at some software that wants to bring in Python 3.6 for use in an environment where 3.5 is the standard. Reading Python's documentation, I can't find anything about whether:
3.5 is representative of a semantic version number
3.6 would represent a forwards compatible upgrade (ie: code written for a 3.5 runtime is guaranteed to work in a 3.6 runtime)
The fact that this page about porting to 3.7 exists makes me think strongly no but I can't see official docs on what the version numbers mean (if anything, ala Linux kernel versioning)
In the more general sense - is there a PEP around compatibility standards within the 3.X release stream?
The short answer is "No", the long answer is "They strive for something close to it".
As a rule, micro versions match semantic versioning rules; they're not supposed to break anything or add features, just fix bugs. This isn't always the case (e.g. 3.5.1 broke vars() on a namedtuple, because it caused a bug that was worse than the break when it came up), but it's very rare for code (especially Python level stuff, as opposed to C extensions) to break across a micro boundary.
Minor versions mostly "add features", but they will also make backwards incompatible changes with prior warning. For example, async and await became keywords in Python 3.7, which meant code using them as variable names broke, but with warnings enabled, you would have seen a DeprecationWarning in 3.6. Many syntax changes are initially introduced as optional imports from the special __future__ module, with documented timelines for becoming the default behavior.
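To make the async/await example concrete, here is a small sketch that checks identifiers against the running interpreter's keyword list using the standard keyword module. The helper name find_keyword_clashes and the sample identifiers are made up for illustration.

```python
import keyword
import sys


def find_keyword_clashes(names):
    """Return the names that are reserved words on this interpreter."""
    return [name for name in names if keyword.iskeyword(name)]


# Hypothetical identifiers collected from some older code base.
identifiers = ["async", "await", "result", "callback"]

print(sys.version_info[:2], find_keyword_clashes(identifiers))
```

On Python 3.7 or later this reports async and await; on 3.6 the same names were still legal identifiers, which is why code like the pika example broke only after the upgrade.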
None of the changes made in minor releases are broad changes; I doubt any individual deprecation or syntax change has affected even 1% of existing source code, but it does happen. If you've got a hundred third party dependencies, and you're jumping a minor version or two, there is a non-trivial chance that one of them will be broken by the change (example: pika prior to 0.12 used async as a variable name, and broke on Python 3.7; they released new versions that fixed the bug, but of course, moving from 0.11 and lower to 0.12 and higher changed their own API in ways that might break your code).
Major versions are roughly as you'd expect; backwards incompatible changes are expected/allowed (though they're generally not made frivolously; the bigger the change, the bigger the benefit).
Point is, it's close to semantic versioning, but in the interests of not having major releases every few years, while also not letting the language stagnate due to strict compatibility constraints, minor releases are allowed to break small amounts of existing code as long as there is warning (typically in the form of actual warnings from code using deprecated behavior, notes on the What's New documentation, and sometimes __future__ support to ease the migration path).
This is all officially documented (with slightly less detail) in their Development Cycle documentation:
To clarify terminology, Python uses a major.minor.micro nomenclature for production-ready releases. So for Python 3.1.2 final, that is a major version of 3, a minor version of 1, and a micro version of 2.
new major versions are exceptional; they only come when strongly incompatible changes are deemed necessary, and are planned very long in advance;
new minor versions are feature releases; they get released annually, from the current in-development branch;
new micro versions are bugfix releases; they get released roughly every 2 months; they are prepared in maintenance branches.
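The major.minor.micro split quoted above can be sketched in a few lines. split_version is a hypothetical helper that only handles plain three-part version strings (no alpha, beta, or rc suffixes); sys.version_info exposes the same three fields for the running interpreter.

```python
import sys


def split_version(text):
    """Split a 'major.minor.micro' string into three integers."""
    major, minor, micro = (int(part) for part in text.split("."))
    return major, minor, micro


major, minor, micro = split_version("3.1.2")
print(major, minor, micro)

# The running interpreter reports the same three components.
info = sys.version_info
print(info.major, info.minor, info.micro)

# Tuples compare component by component, which suits "at least 3.6" checks.
print(split_version("3.6.0") >= (3, 6, 0))
```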
Here's the document on updating to 3.6.
If you had, for example, open(apath, 'U+') in your code in 3.5, it would fail in 3.6. So, clearly, Python 3.6 is not entirely backwards compatible to every usage in 3.5.
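A quick way to see this on your own interpreter is the sketch below. It assumes Python 3.6 or later, where open() rejects the 'U+' mode with a ValueError; the helper name open_rejects_mode is made up, and the temporary file exists only so that a valid mode has something to open.

```python
import os
import tempfile


def open_rejects_mode(mode):
    """Return True if open() raises ValueError for the given mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode):
            return False
    except ValueError:
        return True
    finally:
        # Clean up the temporary file whether or not open() succeeded.
        os.remove(path)


print(open_rejects_mode("U+"))
print(open_rejects_mode("r"))
```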
Realistically, you will need to test, although I feel fairly comfortable telling the average Stack Overflow reader, in almost any area, that they can do this upgrade with confidence.
As for Semantic Versioning, specifically, Python does not follow it, but it isn't entirely agnostic to the meaning of major, minor and bugfix releases. Python guidelines for its developers can be found here.
To clarify terminology, Python uses a major.minor.micro nomenclature for production-ready releases. So for Python 3.1.2 final, that is a major version of 3, a minor version of 1, and a micro version of 2.
new major versions are exceptional; they only come when strongly incompatible changes are deemed necessary, and are planned very long in advance;
new minor versions are feature releases; they get released annually, from the current in-development branch;
new micro versions are bugfix releases; they get released roughly every 2 months; they are prepared in maintenance branches.
Also read PEP 440, which covers version numbering for modules rather than releases of Python itself, but is still relevant to the philosophy of the ecosystem.
Basically, I am a Java programmer who wants to learn the Python language. I want to understand why some Python libraries are distributed in a non-portable manner.
Let me explain my thoughts. If someone creates a regular library using Java he prepares 1 (one) JAR file which can be used on different platforms:
my-great-lib-1.2.4.jar
I can use this lib (the same file) on any version of Windows or Linux.
In contrast to Java, python libraries may look like this:
bsdiff4-1.1.4.win-amd64-py2.5.exe
bsdiff4-1.1.4.win-amd64-py2.6.exe
bsdiff4-1.1.4.win-amd64-py2.7.exe
bsdiff4-1.1.4.win-amd64-py3.2.exe
bsdiff4-1.1.4.win-amd64-py3.3.exe
bsdiff4-1.1.4.win32-py2.5.exe
bsdiff4-1.1.4.win32-py2.6.exe
bsdiff4-1.1.4.win32-py2.7.exe
bsdiff4-1.1.4.win32-py3.2.exe
bsdiff4-1.1.4.win32-py3.3.exe
See full list on page.
It looks very strange to me. Even 32-bit and 64-bit platforms require different installers. Installers! Why do I need an installer in order to use one library? Moreover, the installers listed are only for Windows. Each of them is bound to a particular Python version. Where is the portability?
Could anyone explain the need for the 10 different files above?
In general, Python libraries are portable across platforms. Problems appear between different major Python versions (3 introduced some big changes from 2, but 2.7 is backwards compatible with 2.6) or when you use C code to optimize CPU-intensive code. On Linux, compiling it yourself is not a problem: when you call pip install package, it will do it for you. The problem is on Windows, where it is much more difficult to compile a C program, especially because not everybody has a compiler. So, for Windows packages that need something in C, you usually get an installer.
Also, installers are used because they set up everything nicely, look in the registry for the appropriate place to put everything, offer a standard way to uninstall them (the ones from Christoph Gohlke's site can be removed using Add/Remove Programs in Control Panel) and because that's the standard on Windows: most programs on Windows are installed via an exe, because it doesn't have a standard and widespread package manager.
All these libraries are then portable: you can use them from any platform, but installing them is what differs.
There are many complications. In Java, your code is compiled to bytecode that the JVM executes, so the underlying computer architecture plays little role as long as the JVM handles your code correctly. In fact, that is one of the primary reasons Java became so popular: your code only has to run correctly on the JVM.
However, in Python the situation is different. I will try to summarize some of the reasons that I think are important:
The language itself is evolving (although it has been around for a long time, if you think about it!) and changes are happening inside the language. New features are added and sometimes even some remodeling of the language is done (Python 2.x to Python 3.x).
Python relies heavily on its C extensions, and so do the applications written in Python. If you write a Python program and have some CPU-intensive code, you can choose to write it in C. This also adds the need to build a number of library variants for the various distributions.
For one, Python versions jump around. In Python 3, some builtins were renamed. For example:
raw_input()
changed to:
input()
Also, a lot of the standard library has changed, even in the alpha of 3.4. As for the 32/64-bit question, I cannot fully answer. I know that certain platforms have trouble when trying to run 32/64, and that may be the point there.
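For the raw_input() to input() rename mentioned above, a common workaround is a small shim that picks whichever name exists. This is only a sketch: read_line and ask are made-up names, and the demonstration passes a fake reader so that nothing waits on the keyboard.

```python
try:
    read_line = raw_input  # Python 2 name; raises NameError on Python 3
except NameError:
    read_line = input      # Python 3 name


def ask(prompt, reader=read_line):
    """Prompt for a line of text and strip surrounding whitespace."""
    return reader(prompt).strip()


# Demonstration with a fake reader instead of real keyboard input.
print(ask("Name: ", reader=lambda prompt: "  Ada  "))
```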
I have been looking for the freeze.py utility which is supposed to come bundled with Python 3 in a Python 3.3 Windows install (albeit with distribute and pip installed) and haven't found it. The utility can be downloaded directly out of the Python svn repository here, but I'm wondering: does freeze come with a standard Windows Python 3 install?
It looks like Windows binary installations of Python don't come with the freeze tool. And there's apparently a good reason for this. According to the freeze README in the source tree:
Under Windows 95 or NT, you must use the -p option and point it to the top of the Python source tree.
If you read the whole section, it comes down to this: On Windows, freeze only works if you've built Python from source, and have the resulting tree sitting around to be used for freezing. So, there's no good reason to give you freeze in binary installations.
Meanwhile, I probably should have asked this in the first place, but… are you sure you want freeze in the first place?
The freeze utility is very out of date (you might have guessed that from the README talking about requiring VC++ 5.0, Windows 95 or NT 4.0, etc.). It also never worked that well on Windows (as you can tell from the documentation describing it as a utility "… to compile executables for Unix systems"). And there are just a lot of things it can't handle, or handles badly. At this point it should probably be considered more as example code than as a useful tool.
There are a number of third-party alternatives out there: cx_freeze, py2exe, PyInstaller, etc. If you search PyPI for "freeze" (and other terms that seem reasonable), you will find a bunch of these alternatives. If your goal is to create a standalone executable out of your Python script (which, btw, freeze can never do on Windows anyway), experiment with a few of these and pick the one you like best.
If your goal is something different, the right tool will be different—you might be better off using venv or just zipping up a user site-packages directory or creating a local PyPI server.
In the comments, you said:
What I was actually looking for is a tool to convert Python code to C code. Apparently, that's impossible.
It's not impossible, it's just not what freeze (or its successors/competitors) does. Cython compiles almost a strict superset of Python to C code, although it's C code that uses Python runtime objects (except where you explicitly statically declare variables and functions with C types). If C++ is an acceptable alternative to C, Shed Skin compiles a restricted subset of Python 2.6 (using native C++ objects, and using type inference so you don't have to statically declare your types).
The question is why you want to compile Python code to C.
If you're looking to optimize some slow code, Cython is great at speeding up small pieces of bottleneck code. It takes a bit of effort (deciding what to move to Cython, what static type declarations to put in, etc.), but the curve of payoff to effort is pretty solid. Shed Skin takes a lot less effort (if it works, it just speeds up everything, automatically), but it also means you can't write a lot of idiomatic Python code in the first place. But really, before looking at either, you should consider PyPy, a complete implementation of Python 2.7.3 (and hopefully 3.3 soon) in a JIT-compiling interpreter, that often offers similar speedups, with pretty much no tradeoffs at all. Or, alternatively, you may just need to rewrite slow code to take advantage of already-optimized libraries (numpy instead of mapping over lists, itertools instead of explicit loops, lxml instead of html.parser, …).
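As one small instance of "itertools instead of explicit loops", here is a running total written both ways. The function names are made up, and no timing is claimed here; whether the itertools version is actually faster for your data is something to measure rather than assume.

```python
from itertools import accumulate


def running_total_loop(values):
    """Running total with an explicit Python-level loop."""
    total = 0
    out = []
    for value in values:
        total += value
        out.append(total)
    return out


def running_total_itertools(values):
    """Same result, with the loop pushed into itertools.accumulate."""
    return list(accumulate(values))


data = [1, 2, 3, 4]
print(running_total_loop(data))
print(running_total_itertools(data))
```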
If you're looking to write Python code that can interact directly with C code, without all the headaches of ctypes (or manually building Python bindings), Cython scores again. Cython code can effectively natively call both Python code and C code, and the compiler makes it all work like magic.
If you're looking to get C code that you can read, maintain, and improve on… there, you're out of luck. And this one may actually be impossible. Idiomatic Python code is just so different from idiomatic C code that it's hard to imagine how you could translate one into the other.
If you're wondering what the underlying problem is:
As far as I can tell, freeze makes a lot of assumptions about how things are laid out. It should be enough to have any Python installation that can build C extension modules and embedding apps, but it's not, because freeze goes under the covers and expects that building to work in specific ways. A standard binary installation on almost every *nix platform ends up looking like what freeze expects,* but a standard binary installation on Windows looks completely different.
It's not impossible to hack things up using Windows symlinks (at least if you have Vista or later and a drive with a modern version of NTFS) to get everything organized the way freeze expects (I found a blog where someone did that with 2.7.1…), but really, I don't think it's worth trying. It will be a lot of work (especially if you're just learning this stuff), and there's no guarantee you won't immediately run into another problem.
* This isn't actually true. On a Mac, both Apple's pre-installed Python and the binary installers at python.org actually give you the files organized as a Mac framework—but they provide a bunch of symlinks that simulate the traditional layout, which is good enough. On most linux distros, and many other platforms, the binary python package doesn't include any of the development files at all—but once you install an add-on binary package named something like python-devel, then you've got the right layout. Anyway, none of this matters to you, because if you wanted to learn about dpkg dependencies or framework builds you wouldn't be using Windows, right?
Suppose I've developed a general-purpose end user utility written in Python. Previously, I had just one version available which was suitable for Python later than version 2.3 or so. It was sufficient to say, "download Python if you need to, then run this script". There was just one version of the script in source control (I'm using Git) to keep track of.
With Python 3, this is no longer necessarily true. For the foreseeable future, I will need to simultaneously develop two different versions, one suitable for Python 2.x and one suitable for Python 3.x. From a development perspective, I can think of a few options:
1. Maintain two different scripts in the same branch, making improvements to both simultaneously.
2. Maintain two separate branches, and merge common changes back and forth as development proceeds.
3. Maintain just one version of the script, plus check in a patch file that converts the script from one version to the other. When enough changes have been made that the patch no longer applies cleanly, resolve the conflicts and create a new patch.
I am currently leaning toward option 3, as the first two would involve a lot of error-prone tedium. But option 3 seems messy and my source control system is supposed to be managing patches for me.
For distribution packaging, there are more options to choose from:
1. Offer two different download packages, one suitable for Python 2 and one suitable for Python 3 (the user will have to know to download the correct one for whatever version of Python they have).
2. Offer one download package, with two different scripts inside (and then the user has to know to run the correct one).
3. One download package with two version-specific scripts, and a small stub loader that can run in both Python versions, that runs the correct script for the Python version installed.
Again I am currently leaning toward option 3 here, although I haven't tried to develop such a stub loader yet.
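The stub loader idea from the last packaging option could look roughly like the sketch below. The script file names are hypothetical, and the code deliberately sticks to syntax that both Python 2 and Python 3 accept. It only builds the command line; it does not launch anything.

```python
import os
import sys

# Hypothetical names for the two version-specific scripts in the package.
SCRIPTS = {2: "myutil_py2.py", 3: "myutil_py3.py"}


def pick_script(version_info=sys.version_info):
    """Return the script file name matching the interpreter's major version."""
    major = version_info[0]
    if major not in SCRIPTS:
        raise RuntimeError("unsupported Python major version: %d" % major)
    return SCRIPTS[major]


def build_command(argv, base_dir=".", version_info=sys.version_info):
    """Build the command that re-runs the right script with the same arguments."""
    script = os.path.join(base_dir, pick_script(version_info))
    return [sys.executable, script] + list(argv[1:])


print(build_command(["loader.py", "--help"]))
```

A real loader would pass the resulting list to something like subprocess.call and exit with the returned status.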
Any other ideas?
Edit: my original answer was based on the state of 2009, with Python 2.6 and 3.0 as the current versions. Now, with Python 2.7 and 3.3, there are other options. In particular, it is now quite feasible to use a single code base for Python 2 and Python 3.
See Porting Python 2 Code to Python 3
Original answer:
The official recommendation says:
For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:
0. (Prerequisite:) Start with excellent test coverage.
1. Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.
2. (Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.
3. Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.
It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you’d have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.
Ideally, you would end up with a single version, that is 2.6 compatible and can be translated to 3.0 using 2to3. In practice, you might not be able to achieve this goal completely. So you might need some manual modifications to get it to work under 3.0.
I would maintain these modifications in a branch, like your option 2. However, rather than maintaining the final 3.0-compatible version in this branch, I would consider applying the manual modifications before the 2to3 translation, and putting this modified 2.6 code into your branch. The advantage of this method is that the difference between this branch and the 2.6 trunk would be rather small, and would consist only of manual changes, not the changes made by 2to3. This way, the separate branches should be easier to maintain and merge, and you should be able to benefit from future improvements in 2to3.
Alternatively, take a bit of a "wait and see" approach. Proceed with your porting only so far as you can go with a single 2.6 version plus 2to3 translation, and postpone the remaining manual modification until you really need a 3.0 version. Maybe by this time, you don't need any manual tweaks anymore...
For development, option 3 is too cumbersome. Maintaining two branches is the easiest way, although how to do that will vary between VCSes. Many DVCSes will be happier with separate repos (with a common ancestry to help merging), and a centralized VCS will probably be easier to work with using two branches. Option 1 is possible, but you may miss something that needs merging, and it is a bit more error-prone IMO.
For distribution, I'd use option 3 as well if possible. All 3 options are valid anyway, and I have seen variations on these models from time to time.
I don't think I'd take this path at all. It's painful whichever way you look at it. Really, unless there's strong commercial interest in keeping both versions simultaneously, this is more headache than gain.
I think it makes more sense to just keep developing for 2.x for now, at least for a few months, up to a year. At some point it will be time to declare a final, stable version for 2.x and develop the next ones for 3.x+.
For example, I won't switch to 3.x until some of the major frameworks go that way: PyQt, matplotlib, numpy, and some others. And I don't really mind if at some point they stop 2.x support and just start developing for 3.x, because I'll know that in a short time I'll be able to switch to 3.x too.
I would start by migrating to 2.6, which is very close to python 3.0. You might even want to wait for 2.7, which will be even closer to python 3.0.
And then, once you have migrated to 2.6 (or 2.7), I suggest you simply keep just one version of the script, with things like "if PY3K:... else:..." in the rare places where it will be mandatory. Of course it's not the kind of code we developers like to write, but then you don't have to worry about managing multiple scripts or branches or patches or distributions, which will be a nightmare.
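A single-source script with "if PY3K: ... else: ..." branches might look like this sketch. The flag is spelled PY3 here, and text_type, iteritems, and describe are hypothetical names chosen for illustration; the point is that the version check lives in one place and the rest of the code stays version-neutral.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str

    def iteritems(mapping):
        return iter(mapping.items())
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)

    def iteritems(mapping):
        return mapping.iteritems()


def describe(mapping):
    """Format a mapping as 'key=value' pairs, identically on 2.x and 3.x."""
    return ", ".join("%s=%s" % (k, v) for k, v in sorted(iteritems(mapping)))


print(describe({"b": 2, "a": 1}))
```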
Whatever you choose, make sure you have thorough tests with 100% code coverage.
Good luck!
Whichever option for development is chosen, most potential issues could be alleviated with thorough unit testing to ensure that the two versions produce matching output. That said, option 2 seems most natural to me: applying changes from one source tree to another is a task (most) version control systems were designed for, so why not take advantage of the tools they provide to ease this?
For distribution, it is difficult to say without 'knowing your audience'. Power Python users would probably appreciate not having to download two copies of your software, yet for a more general user base it should probably 'just work'.