Related
I've been hearing a lot about the PyPy project. They claim it is 6.3 times faster than the CPython interpreter on their site.
Whenever we talk about dynamic languages like Python, speed is one of the top issues. To solve this, they say PyPy is 6.3 times faster.
The second issue is parallelism, the infamous Global Interpreter Lock (GIL). For this, PyPy says it can give GIL-less Python.
If PyPy can solve these great challenges, what are its weaknesses that are preventing wider adoption? That is to say, what's preventing someone like me, a typical Python developer, from switching to PyPy right now?
NOTE: PyPy is more mature and better supported now than it was in 2013, when this question was asked. Avoid drawing conclusions from out-of-date information.
PyPy, as others have been quick to mention, has tenuous support for C extensions. It has support, but typically at slower-than-Python speeds and it's iffy at best. Hence a lot of modules simply require CPython. PyPy doesn't support numpy. Some extensions are still not supported (Pandas, SciPy, etc.), take a look at the list of supported packages before making the change. Note that many packages marked unsupported on the list are now supported.
Python 3 support is experimental at the moment. has just reached stable! As of 20th June 2014, PyPy3 2.3.1 - Fulcrum is out!
PyPy sometimes isn't actually faster for "scripts", which a lot of people use Python for. These are the short-running programs that do something simple and small. Because PyPy is a JIT compiler its main advantages come from long run times and simple types (such as numbers). PyPy's pre-JIT speeds can be bad compared to CPython.
Inertia. Moving to PyPy often requires retooling, which for some people and organizations is simply too much work.
Those are the main reasons that affect me, I'd say.
That site does not claim PyPy is 6.3 times faster than CPython. To quote:
The geometric average of all benchmarks is 0.16 or 6.3 times faster than CPython
This is a very different statement to the blanket statement you made, and when you understand the difference, you'll understand at least one set of reasons why you can't just say "use PyPy". It might sound like I'm nit-picking, but understanding why these two statements are totally different is vital.
To break that down:
The statement they make only applies to the benchmarks they've used. It says absolutely nothing about your program (unless your program is exactly the same as one of their benchmarks).
The statement is about an average of a group of benchmarks. There is no claim that running PyPy will give a 6.3 times improvement even for the programs they have tested.
There is no claim that PyPy will even run all the programs that CPython runs at all, let alone faster.
Because pypy is not 100% compatible, takes 8 gigs of ram to compile, is a moving target, and highly experimental, where cpython is stable, the default target for module builders for 2 decades (including c extensions that don't work on pypy), and already widely deployed.
Pypy will likely never be the reference implementation, but it is a good tool to have.
The second question is easier to answer: you basically can use PyPy as a drop-in replacement if all your code is pure Python. However, many widely used libraries (including some of the standard library) are written in C and compiled as Python extensions. Some of these can be made to work with PyPy, some can't. PyPy provides the same "forward-facing" tool as Python --- that is, it is Python --- but its innards are different, so tools that interface with those innards won't work.
As for the first question, I imagine it is sort of a Catch-22 with the first: PyPy has been evolving rapidly in an effort to improve speed and enhance interoperability with other code. This has made it more experimental than official.
I think it's possible that if PyPy gets into a stable state, it may start getting more widely used. I also think it would be great for Python to move away from its C underpinnings. But it won't happen for a while. PyPy hasn't yet reached the critical mass where it is almost useful enough on its own to do everything you'd want, which would motivate people to fill in the gaps.
I did a small benchmark on this topic. While many of the other posters have made good points about compatibility, my experience has been that PyPy isn't that much faster for just moving around bits. For many uses of Python, it really only exists to translate bits between two or more services. For example, not many web applications are performing CPU intensive analysis of datasets. Instead, they take some bytes from a client, store them in some sort of database, and later return them to other clients. Sometimes the format of the data is changed.
The BDFL and the CPython developers are a remarkably intelligent group of people and have a managed to help CPython perform excellent in such a scenario. Here's a shameless blog plug: http://www.hydrogen18.com/blog/unpickling-buffers.html . I'm using Stackless, which is derived from CPython and retains the full C module interface. I didn't find any advantage to using PyPy in that case.
Q: If PyPy can solve these great challenges (speed, memory consumption, parallelism) in comparison to CPython, what are its weaknesses that are preventing wider adoption?
A: First, there is little evidence that the PyPy team can solve the speed problem in general. Long-term evidence is showing that PyPy runs certain Python codes slower than CPython and this drawback seems to be rooted very deeply in PyPy.
Secondly, the current version of PyPy consumes much more memory than CPython in a rather large set of cases. So PyPy didn't solve the memory consumption problem yet.
Whether PyPy solves the mentioned great challenges and will in general be faster, less memory hungry, and more friendly to parallelism than CPython is an open question that cannot be solved in the short term. Some people are betting that PyPy will never be able to offer a general solution enabling it to dominate CPython 2.7 and 3.3 in all cases.
If PyPy succeeds to be better than CPython in general, which is questionable, the main weakness affecting its wider adoption will be its compatibility with CPython. There also exist issues such as the fact that CPython runs on a wider range of CPUs and OSes, but these issues are much less important compared to PyPy's performance and CPython-compatibility goals.
Q: Why can't I do drop in replacement of CPython with PyPy now?
A: PyPy isn't 100% compatible with CPython because it isn't simulating CPython under the hood. Some programs may still depend on CPython's unique features that are absent in PyPy such as C bindings, C implementations of Python object&methods, or the incremental nature of CPython's garbage collector.
CPython has reference counting and garbage collection, PyPy has garbage collection only.
So objects tend to be deleted earlier and __del__ is called in a more predictable way in CPython. Some software relies on this behavior, thus they are not ready for migrating to PyPy.
Some other software works with both, but uses less memory with CPython, because unused objects are freed earlier. (I don't have any measurements to indicate how significant this is and what other implementation details affect the memory use.)
For a lot of projects, there is actually 0% difference between the different pythons in terms of speed. That is those that are dominated by engineering time and where all pythons have the same amount of library support.
To make this simple: PyPy provides the speed that's lacked by CPython but sacrifices its compatibility. Most people, however, choose Python for its flexibility and its "battery-included" feature (high compatibility), not for its speed (it's still preferred though).
I've found examples, where PyPy is slower than Python.
But: Only on Windows.
C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop
C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop
So, if you think of PyPy, forget Windows.
On Linux, you can achieve awesome accelerations.
Example (list all primes between 1 and 1,000,000):
from sympy import sieve
primes = list(sieve.primerange(1, 10**6))
This runs 10(!) times faster on PyPy than on Python.
But not on windows. There it is only 3x as fast.
PyPy has had Python 3 support for a while, but according to this HackerNoon post by Anthony Shaw from April 2nd, 2018, PyPy3 is still several times slower than PyPy (Python 2).
For many scientific calculations, particularly matrix calculations, numpy is a better choice (see FAQ: Should I install numpy or numpypy?).
Pypy does not support gmpy2. You can instead make use of gmpy_cffi though I haven't tested its speed and the project had one release in 2014.
For Project Euler problems, I make frequent use of PyPy, and for simple numerical calculations often from __future__ import division is sufficient for my purposes, but Python 3 support is still being worked on as of 2018, with your best bet being on 64-bit Linux. Windows PyPy3.5 v6.0, the latest as of December 2018, is in beta.
Supported Python Versions
To cite the Zen of Python:
Readability counts.
For example, Python 3.8 introduced fstring =.
There might be other features in Python 3.8+ which are more important to you. PyPy does not support Python 3.8+ at the moment.
Shameless self-advertisement: Killer Features by Python version - if you want to know more things you miss by using older Python versions
I am looking to bring speed improvements to an existing application and I'm looking for advice on my possible options. The application is written in Python, uses wxPython, and is packaged with py2exe (I only target windows platforms). Parts of the application are computationally intensive and run too slowly in interpreted Python. I am not familiar with C, so porting parts of the code over is not really an option for me.
So my question is basically do I have a clear picture of my options as I outline below, or am I approaching this from the wrong direction?
Running with pypy: Today I started experimenting with Pypy - the results are exciting, in that I can run large parts of the code from the pypy interpreter and I'm seeing 5x+ speed improvements with no code changes. However, if I understand correctly, (a) Pypy with wxpython support is still a work in progress, and (b) I cannot compile it down to an exe for distribution anyway. So unless I'm mistaken, this seems like a no-go for me? There's no way to package things up so parts of it are executed with pypy?
Converting code to RPython, translating with pypy So the next option seems to be actually rewriting parts of the code to the pypy restricted language, which seems like a pretty large job. But if I do that, parts of the code can then be compiled to an executable (?) and then I can access the code through ctypes (?).
Other restricted options Shedskin seems to be a popular alternative here, does this fit my requirements better? Other options seem to be Cpython, Psyco, and Unladen, but they are all superseded or no longer maintained.
Using PyPy indeed rules out py2exe and similar tools, at least until one is ported (AFAIK there is no active work on that). Still, as PyPy binaries do not need to be installed, you might get away with a more complicated distribution that includes both your Python source code and a PyPy binary+stdlib and uses a small wrapper (batch file, executable) to ease launching. I can't comment on whether WxPython on PyPy is mature enough to be used, but perhaps someone on pypy-dev, wxpython-dev or either one's IRC channel can give a recommendation if you describe your situation.
Translating your code into RPython does not seem viable to me. The translation toolchain is not really a tool for general purpose development, and producing a C dll for embedding/ctypes seems nontrivial. Also, RPython code really is low-level, making your Python code restricted enough may amount to rewriting half of it.
As for other restricted options: You seem to mix up CPython (the original Python interpreter written in C) with Cython (a compiler for a Python-like language that emits C code suitable for CPython extension modules). Both projects are active. I'm not very familiar with Shedskin, but it seems to be a tool for developing whole programs, with little or no interaction with non-restricted Python code. Cython seems a much better fit: Although it requires manual type annotations and lower-level code to achieve really good performance, it's trivial to use from Python: The very purpose of the project is producing extension modules.
I would definitely look into Cython, I've been playing with it some and have seen speedups of ~100x over pure python. Use the profile module to find the bottlenecks first. Usually the loops are the biggest chances to increase speed when going to Cython. You should also look into seeing if you can use array/vector operations in Numpy instead of loops, if so that can also give extreme performance boosts. For instance:
a = range(1000000)
for i in range(len(a)):
a[i] += 5
is slow, real slow. On the other hand:
a = numpy.arange(10000000)
a = a +5
is fast, real fast.
Correction: shedskin can be used to generare extention modules, as well as whole programs.
I'm starting to study Python, and for now I like it very much. But, if you could just answer a few questions for me, which have been troubling me, and I can't find any definite answers to them:
What is the relationship between Python's C implementation (main version from python.org) and IronPython, in terms of language compatibility ? Is it the same language, and do I by learning one, will be able to smoothly cross to another, or is it Java to JavaScript ?
What is the current status to IronPython's libraries ? How much does it lags behind CPython libraries ? I'm mostly interested in numpy/scipy and f2py. Are they available to IronPython ?
What would be the best way to access VB from Python and the other way back (connecting some python libraries to Excel's VBA, to be exact) ?
1) IronPython and CPython share nearly identical language syntax. There is very little difference between them. Transitioning should be trivial.
2) The libraries in IronPython are very different than CPython. The Python libraries are a fair bit behind - quite a few of the CPython-accessible libraries will not work (currently) under IronPython. However, IronPython has clean, direct access to the entire .NET Framework, which means that it has one of the most extensive libraries natively accessible to it, so in many ways, it's far ahead of CPython. Some of the numpy/scipy libraries do not work in IronPython, but due to the .NET implementation, some of the functionality is not necessary, since the perf. characteristics are different.
3) Accessing Excel VBA is going to be easier using IronPython, if you're doing it from VBA. If you're trying to automate excel, IronPython is still easier, since you have access to the Execl Primary Interop Assemblies, and can directly automate it using the same libraries as C# and VB.NET.
What is the relationship between
Python's C implementation (main
version from python.org) and
IronPython, in terms of language
compatibility ? Is it the same
language, and do I by learning one,
will be able to smoothly cross to
another, or is it Java to JavaScript ?
Same language (at 2.5 level for now -- IronPython's not 2.6 yet AFAIK).
What is the current status to
IronPython's libraries ? How much does
it lags behind CPython libraries ? I'm
mostly interested in numpy/scipy and
f2py. Are they available to IronPython
?
Standard libraries are in a great state in today's IronPython, huge third-party extensions like the ones you mention far from it. numpy's starting to get feasible thanks to ironclad, but not production-level as numpy is from IronPython (as witnessed by the 0.5 version number for ironclad, &c). scipy is huge and sprawling and chock full of C-coded and Fortran-coded extensions: I have no first-hand experience but I suspect less than half will even run, much less run flawlessly, under any implementation except CPython.
What would be the best way to access
VB from Python and the other way back
(connecting some python libraries to
Excel's VBA, to be exact) ?
IronPython should make it easier via .NET approaches, but CPython is not that far via COM implementation in win32all.
Last but not least, by all means check out the book IronPython in Action -- as I say every time I recommend it, I'm biased (by having been a tech reviewer for it AND by friendship with one author) but I think it's objectively the best intro to Python for .NET developers AND at the same time the best intro to .NET for Pythonistas.
If you need all of scipy (WOW, but that's some "Renaissance Man" computational scientist!-), CPython is really the only real option today. I'm sure other large extensions (PyQt, say, or Mayavi) are in a similar state. For deep integration to today's Windows, however, I think IronPython may have an edge. For general-purpose uses, CPython may be better (esp. thanks to the many new features in 2.6), unless you're really keen to use many cores to the hilt within a single process, in which case IronPython (which is GIL-less) may prove advantageous again.
One way or another (or even on the JVM via Jython, or in peculiar environments via PyPy) Python is surely an awesome language, whatever implementation(s) you pick for a given application!-) Note that you don't need to stick with ONE implementation (though you should probably pick one VERSION -- 2.5 for maximal compatibility with IronPython, Jython, Google App Engine, etc; 2.6 if you don't care about any deployment options except "CPython on a machine under my own sole or virtual control";-).
IronPython version 2.0.2, the current release, supports Python 2.5 syntax. With the next release, 2.6, which is expected sometime over the next month or so (though I'm not sure the team have set a hard release date; here's the beta), they will support Python 2.6 syntax. So, less Java to JavaScript and more Java to J# :-)
All of the libraries that are themselves written in Python work pretty much perfectly. The ones that are written in C are more problematic; there is an open source project called Ironclad (full disclosure: developed and supported by my company), currently at version 0.8.5, which supports numpy pretty well but doesn't cover all of scipy. I don't think we've tested it with f2py. (The version of Ironclad mentioned below by Alex Martelli is over a year old, please avoid that one!)
Calling regular VB.NET from IronPython is pretty easy -- you can instantiate .NET classes and call methods (static or instance) really easily. Calling existing VBA code might be trickier; there are open source projects called Pyinex and XLW that you might want to take a look at. Alternatively, if you want a spreadsheet you can code in Python, then there's always Resolver One (another one from my company... :-)
1) The language implemented by CPython and IronPython are the same, or at most a version or two apart. This is nothing like the situation with Java and Javascript, which are two completely different languages given similar names by some bone-headed marketing decision.
2) 3rd-party libraries implemented in C (such as numpy) will have to be evaluated carefully. IronPython has a facility to execute C extensions (I forget the name), but there are many pitfalls, so you need to check with each library's maintainer
3) I have no idea.
CPython is implemented by C for corresponding platform, such as Windows, Linux or Unix; IronPython is implemented by C# and Windows .Net Framework, so it can only run on Windows Platform with .Net Framework.
For gramma, both are same. But we cannot say they are one same language. IronPython can use .Net Framework essily when you develop on windows platform.
By far, July 21, 2009 - IronPython 2.0.2, our latest stable release of IronPython, was released. you can refer to http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython.
You can access VB with .Net Framework function by IronPython. So, if you want to master IronPython, you'd better learn more .Net Framework.
How does IronPython stack up to the default Windows implementation of Python from python.org? If I am learning Python, will I be learning a subtley different language with IronPython, and what libraries would I be doing without?
Are there, alternatively, any pros to IronPython (not including .NET IL compiled classes) that would make it more attractive an option?
There are a number of important differences:
Interoperability with other .NET languages. You can use other .NET libraries from an IronPython application, or use IronPython from a C# application, for example. This interoperability is increasing, with a movement toward greater support for dynamic types in .NET 4.0. For a lot of detail on this, see these two presentations at PDC 2008.
Better concurrency/multi-core support, due to lack of a GIL. (Note that the GIL doesn't inhibit threading on a single-core machine---it only limits performance on multi-core machines.)
Limited ability to consume Python C extensions. The Ironclad project is making significant strides toward improving this---they've nearly gotten Numpy working!
Less cross-platform support; basically, you've got the CLR and Mono. Mono is impressive, though, and runs on many platforms---and they've got an implementation of Silverlight, called Moonlight.
Reports of improved performance, although I have not looked into this carefully.
Feature lag: since CPython is the reference Python implementation, it has the "latest and greatest" Python features, whereas IronPython necessarily lags behind. Many people do not find this to be a problem.
There are some subtle differences in how you write your code, but the biggest difference is in the libraries you have available.
With IronPython, you have all the .Net libraries available, but at the expense of some of the "normal" python libraries that haven't been ported to the .Net VM I think.
Basically, you should expect the syntax and the idioms to be the same, but a script written for IronPython wont run if you try giving it to the "regular" Python interpreter. The other way around is probably more likely, but there too you will find differences I think.
Well, it's generally faster.
Can't use modules, and only has a subset of the library.
Here's a list of differences.
See the blog post IronPython is a one-way gate. It summarizes some things I've learned about IronPython from asking questions on StackOverflow.
Python is Python, the only difference is that IronPython was designed to run on the CLR (.NET Framework), and as such, can inter-operate and consume .NET assemblies written in other .NET languages. So if your platform is Windows and you also use .NET or your company does then should consider IronPython.
One of the pros of IronPython is that, unlike CPython, IronPython doesn't use the Global Interpreter Lock, thus making threading more effective.
In the standard Python implementation, threads grab the GIL on each object access. This limits parallel execution, which matters especially if you expect to fully utilize multiple CPUs.
Pro: You can run IronPython in a browser if SilverLight is installed.
It also depends on whether you want your code to work on Linux. Dunno if IronPython will work on anything beside windows platforms.
I am relatively new to Python, and I have always used the standard cpython (v2.5) implementation.
I've been wondering about the other implementations though, particularly Jython and IronPython. What makes them better? What makes them worse? What other implementations are there?
I guess what I'm looking for is a summary and list of pros and cons for each implementation.
Jython and IronPython are useful if you have an overriding need to interface with existing libraries written in a different platform, like if you have 100,000 lines of Java and you just want to write a 20-line Python script. Not particularly useful for anything else, in my opinion, because they are perpetually a few versions behind CPython due to community inertia.
Stackless is interesting because it has support for green threads, continuations, etc. Sort of an Erlang-lite.
PyPy is an experimental interpreter/compiler that may one day supplant CPython, but for now is more of a testbed for new ideas.
An additional benefit for Jython, at least for some, is it lacks the GIL (the Global Interpreter Lock) and uses Java's native threads. This means that you can run pure Python code in parallel, something not possible with the GIL.
All of the implementations are listed here:
https://wiki.python.org/moin/PythonImplementations
CPython is the "reference implementation" and developed by Guido and the core developers.
Pros: Access to the libraries available for JVM or CLR.
Cons: Both naturally lag behind CPython in terms of features.
IronPython and Jython use the runtime environment for .NET or Java and with that comes Just In Time compilation and a garbage collector different from the original CPython. They might be also faster than CPython thanks to the JIT, but I don't know that for sure.
A downside in using Jython or IronPython is that you cannot use native C modules, they can be only used in CPython.
PyPy is a Python implementation written in RPython wich is a Python subset.
RPython can be translated to run on a VM or, unlike standard Python, RPython can be statically compiled.