AoT Compiler for Python

I want to get my Python script working on a bare metal device like microcontroller WITHOUT the need for an interpreter. I know there are already JIT compilers for Python like PyPy, and interpreters like CPython.
However, existing interpreters I've seen (such as CPython) take up a large amount of memory (in the MB range).
Is there an AOT compiler for Python (i.e., one that compiles directly to native code, possibly through an intermediary like LLVM)?
I assume such a compiler would enable Python to run much faster compared to existing implementations AND with lower memory footprint. If there is, I wonder why that solution hasn't been popularized.

As you already mentioned, Cython is an option (however, it is true that the result is big, since the C runtime needs to implement the Python functionality together with your program).
With regard to LLVM, there was a project by Google named Unladen Swallow. However, that project is mostly abandoned. You can find some information about it here.
Basically it was an attempt to bring LLVM optimizations into the runtime of CPython, e.g. JIT-compiling Python code.
Another old alternative was Shed Skin, which compiles Python to C++. Some information about it can be found here.
Yet another option, similar to Shed Skin, is to restrict yourself to a subset of the Python language and use MicroPython.
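To stay within the subset that ports cleanly, it helps to write code that uses only integers, bytes and plain loops. A sketch (the function and constants are my own illustration, not from the MicroPython docs) that runs unchanged under both CPython and MicroPython:

```python
# CRC-16/MODBUS in restricted, interpreter-agnostic Python: only ints,
# bytes and loops, so the same file runs under CPython and MicroPython.
def crc16(data, poly=0xA001):
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
    return crc

print(hex(crc16(b"123456789")))
```

Code in this style avoids the dynamic features (metaclasses, runtime imports, reflection) that the smaller runtimes omit, which is exactly what makes the restricted-subset approach practical.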

An alternative would be to use GraalVM with Truffle to AOT-compile Python.
It's basically Python running ahead-of-time compiled on the JVM.
The project looks promising. You can check this link:
https://www.graalvm.org/22.2/graalvm-as-a-platform/language-implementation-framework/AOT/

Recently I came across Codon:
Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 10-100x or more, on a single thread. Codon's performance is typically on par with (and sometimes better than) that of C/C++.
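For illustration, Codon tends to do best on Python that is already statically typable. This sketch (the file name and build command are my assumption; check the Codon docs for exact flags) is ordinary Python that such a compiler can lower to native code:

```python
# fib.py - plain Python, but monomorphic enough for an AOT compiler
# such as Codon to type-infer and compile to native code.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(30))
```

With Codon installed, something like `codon build -release fib.py` should produce a standalone native executable; the same file also runs under CPython, which keeps the two easy to compare.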

Related

Speeding up Parts of Existing Python App with PyPy or Shedskin

I am looking to bring speed improvements to an existing application and I'm looking for advice on my possible options. The application is written in Python, uses wxPython, and is packaged with py2exe (I only target Windows platforms). Parts of the application are computationally intensive and run too slowly in interpreted Python. I am not familiar with C, so porting parts of the code over is not really an option for me.
So my question is basically do I have a clear picture of my options as I outline below, or am I approaching this from the wrong direction?
Running with PyPy: Today I started experimenting with PyPy - the results are exciting, in that I can run large parts of the code from the PyPy interpreter and I'm seeing 5x+ speed improvements with no code changes. However, if I understand correctly, (a) PyPy with wxPython support is still a work in progress, and (b) I cannot compile it down to an exe for distribution anyway. So unless I'm mistaken, this seems like a no-go for me? There's no way to package things up so parts of it are executed with PyPy?
Converting code to RPython, translating with PyPy: So the next option seems to be actually rewriting parts of the code in PyPy's restricted language, which seems like a pretty large job. But if I do that, parts of the code can then be compiled to an executable (?) and then I can access the code through ctypes (?).
Other restricted options: Shedskin seems to be a popular alternative here; does it fit my requirements better? Other options seem to be CPython, Psyco, and Unladen Swallow, but they are all superseded or no longer maintained.
Using PyPy indeed rules out py2exe and similar tools, at least until one is ported (AFAIK there is no active work on that). Still, as PyPy binaries do not need to be installed, you might get away with a more complicated distribution that includes both your Python source code and a PyPy binary+stdlib and uses a small wrapper (batch file, executable) to ease launching. I can't comment on whether WxPython on PyPy is mature enough to be used, but perhaps someone on pypy-dev, wxpython-dev or either one's IRC channel can give a recommendation if you describe your situation.
Translating your code into RPython does not seem viable to me. The translation toolchain is not really a tool for general purpose development, and producing a C dll for embedding/ctypes seems nontrivial. Also, RPython code really is low-level, making your Python code restricted enough may amount to rewriting half of it.
As for other restricted options: You seem to mix up CPython (the original Python interpreter written in C) with Cython (a compiler for a Python-like language that emits C code suitable for CPython extension modules). Both projects are active. I'm not very familiar with Shedskin, but it seems to be a tool for developing whole programs, with little or no interaction with non-restricted Python code. Cython seems a much better fit: Although it requires manual type annotations and lower-level code to achieve really good performance, it's trivial to use from Python: The very purpose of the project is producing extension modules.
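To make the "manual type annotations" point concrete: with recent Cython the annotations can be ordinary PEP-484 hints, so a hot function can stay valid Python (the function below is my own example, not from the Cython docs):

```python
# integrate_f.py - runs as-is under CPython; when compiled with Cython
# (which, in recent versions, reads the type annotations below) the loop
# is lowered to typed C arithmetic instead of generic object operations.
def integrate_f(a: float, b: float, n: int) -> float:
    """Midpoint-rule approximation of the integral of x^2 over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += x * x
    return total * dx

print(integrate_f(0.0, 1.0, 1_000_000))  # converges to 1/3
```

Because the file is still plain Python, you can profile and test it with CPython and only compile the module once it is known to be the bottleneck.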
I would definitely look into Cython; I've been playing with it some and have seen speedups of ~100x over pure Python. Use the profile module to find the bottlenecks first. Usually the loops are the biggest chances to increase speed when going to Cython. You should also look into whether you can use array/vector operations in NumPy instead of loops; if so, that can also give extreme performance boosts. For instance:
a = list(range(1000000))
for i in range(len(a)):
    a[i] += 5
is slow, real slow. On the other hand:
import numpy
a = numpy.arange(1000000)
a = a + 5
is fast, real fast.
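The "profile first" advice above can be sketched with the stdlib cProfile module (slow_sum is a made-up stand-in for your real bottleneck):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # deliberately slow pure-Python loop: a typical Cython/NumPy candidate
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

# print the five most expensive calls; the loop should dominate
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Once the profile confirms where the time goes, that one function is the candidate to vectorize with NumPy or move into Cython, rather than rewriting the whole application.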
Correction: Shedskin can be used to generate extension modules, as well as whole programs.

Use Cython as Python to C Converter

I have huge Python modules (8000+ lines). They basically have tons of functions for interacting with a hardware platform via serial port by reading and writing to hardware registers.
They are not numerical algorithms, so the application is just reading/writing hardware registers/memory. I use these libraries to write custom scripts. Eventually, I need to move all this stuff to run on an embedded processor on my hardware to have finer control; then I just kick off the event from the PC and the rest is done in hardware.
So I need to convert them to C. If I can have my scripts converted to C by an automatic tool, that would save me a huge amount of time. This is why I got attracted to Cython. Efficiency is not important; my code is not number-crunching. But the generated code should be relatively small to fit in my limited memory (a few hundred kilobytes).
Can I use Cython as a converter for my custom Python scripts to C? My guess is yes, in which case can I use those .c files to run on my hardware? My guess is no, since I would have to have Cython on my hardware as well for them to run. But if it just creates some .c files, I could go through them and make them standalone, since the code doesn't use many Python features; it just uses Python as a quick-to-implement scripting language.
Yes, at its core this is what Cython does. But ...
You don't need Cython, however, you do need libpython. You may feel like it doesn't use that many Python features, but I think if you try this you'll find it's not true -- you won't be able to separate your program from its dependence on libpython while still using the Python language.
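One quick way to see how little separation there is: even before a script does anything, CPython has already imported a pile of runtime machinery, all of which lives in libpython and the standard library. A small probe (illustrative only):

```python
import sys

# Modules CPython loaded just to reach the first line of "your" code;
# each one is functionality the C runtime would have to carry along.
startup_modules = sorted(sys.modules)
print(len(startup_modules), "modules loaded before user code runs")
print(startup_modules[:8])
```

Every string operation, integer, exception and print call in "simple" scripts goes through that same runtime, which is why stripping it out by hand rarely works.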
Another option is PyPy, specifically its translation toolchain, NOT the PyPy Python interpreter. It lets you translate RPython, a subset of the Python language, into C. If you really aren't using many Python language features or libraries, this may work.
PyPy is mostly known as an alternative Python implementation, but it is also a set of tools for compiling dynamic languages into various forms. This is what allows the PyPy implementation of Python, written in (R)Python, to be compiled to machine code.
If C++ is available, Nuitka is a Python to C++ compiler that works for regular Python, not just RPython (which is what shedskin and PyPy use).
If C++ is available for that embedded platform, there is Shed Skin, which converts Python into C++.

LLVM, Parrot, JVM, PyPy + python

What is the problem with developing some languages, for example Python, using optimization techniques from LLVM / Parrot?
PyPy, LLVM, and Parrot are the main technologies for common platform development.
I see this like:
PyPy - a framework to build VMs, with a built-in optimized VM for Python.
So it is quite a general solution. The process goes as follows:
dynamic_language_code ->
PyPy frontend ->
PyPy internal code (bytecode) ->
PyPy optimization ->
leaving PyPy code, and either:
a. PyPy backend for some VM (like the JVM)
b. some kit to make your own VM
c. processing/running the PyPy internal code
Am I right about this process? Is there an optimized VM for Python? In particular, is there by default a built-in VM for optimized PyPy code (step c), which is for Python, and can every language's processing stop there and be run by it?
Parrot - much like PyPy, but without steps a and b? Plus some internal improvements for dynamic processing (Parrot Magic Cookies).
Both Parrot and PyPy are designed to create a platform providing a common dynamic-language runtime, but PyPy wants more - also to create more VMs.
What is the point of PyPy? Why do we need to create more VMs? Wouldn't it be better to focus on one VM (as with Parrot), since there is one common code level - either PyPy's internal bytecode or Parrot's?
I think we gain nothing by translating PyPy bytecode to VMs newly created with PyPy.
LLVM - I see this as very similar to PyPy, but without the VM generator.
It is a mature, well-designed environment with similar targets to PyPy (but without a VM generator), working at a low level with great optimization/JIT techniques implemented.
I see it as: LLVM is general-purpose, while Parrot and PyPy are designed for dynamic languages. Is it easier to introduce complicated techniques in PyPy / Parrot - because they are higher level than LLVM - like a sophisticated compiler that can better understand high-level code and produce better assembly code (which humans can't write in reasonable time) than LLVM can?
Questions:
Am I right? Is there any reason that porting some dynamic language to LLVM would be better than to, for example, Parrot?
I haven't seen much activity on developing Python on Parrot. Is it because Python C extensions don't work on Parrot? The same problem exists in PyPy.
Why don't other VMs move to LLVM / Parrot? E.g. Ruby -> Parrot, CLR / JVM -> LLVM. Wouldn't it be better for them to move to a more sophisticated solution? LLVM is under heavy development and has big companies investing in it.
I know the problem might be the resources needed to recompile if the bytecode has to change - but that is not obligatory, as we can try to port old bytecode to the new one, and new compilers can produce the new bytecode (nevertheless, Java still needs to interpret its own bytecode - so the frontend could check it and translate it to the new bytecode)?
What are the problems with linking, for example, JVM libraries inside LLVM (if we somehow port Java/JVM/Scala to LLVM)?
Can you correct me if I'm wrong somewhere?
Some additions:
How does Parrot compare to other virtual machines
What's the benefit of Parrot VM for end-users
What are the differences between LLVM and java/jvm
=============
CLARIFICATION
I want to figure out how all this software fits together - and what the problems are in porting one to another.
That's not something anybody can possibly answer in a Stack Overflow question, but I'll give it a minimal shot.
First what problems do the 3 projects solve?
pypy allows you to implement an interpreter in a high-level language, and you get a generated JIT for free. The good thing about this is that you don't have a dependency mismatch between the language and the platform. That's the reason why pypy-clr is faster than IronPython.
More info here: http://codespeak.net/pypy/dist/pypy/doc/extradoc.html --> High performance implementation of Python for CLI/.NET with JIT compiler generation for dynamic languages
llvm is a low-level infrastructure for compilers. The general idea is to have one "high level assembly". All the optimizations work on that language. Then there is tons of infrastructure around to help you build compilers (JIT or AOT). Implementing a dynamic language on llvm is possible but needs more work than implementing it on pypy or parrot. You, for example, can't get a GC for free (there are GCs you can use together with LLVM; see http://llvm.org/devmtg/2009-10/ --> the VMKit video). There are attempts to build a platform better suited for dynamic languages based on llvm: http://www.ffconsultancy.com/ocaml/hlvm/
I don't know that much about parrot, but as I understand it they want to build one standard VM specialized for dynamic languages (Perl, PHP, Python, ...). The problem here is the same as with compiling to the JVM/CLR: there is a dependency mismatch, just a much smaller one. The VM still does not know the semantics of your language. As I understand it, parrot is still pretty slow for user code. (http://confreaks.net/videos/118-elcamp2010-parrot)
The answer to your question:
Am I right? Is there any reason that porting some dynamic language to LLVM would be better than to, for example, Parrot?
That's a question of effort. Building everything yourself, specialized for your needs, will eventually be faster, but it's a LOT more effort.
I haven't seen much activity on developing Python on Parrot. Is it because Python C extensions don't work on Parrot? The same problem exists in PyPy.
Targeting parrot would (at this point) not likely have a benefit over pypy. Why nobody else does it I don't know.
Why don't other VMs move to LLVM / Parrot? E.g. Ruby -> Parrot, CLR / JVM -> LLVM. Wouldn't it be better for them to move to a more sophisticated solution? LLVM is under heavy development and has big companies investing in it.
Ok there is a lot of stuff in that question.
Like I said, LLVM is hard to move to, and parrot is not that fast (correct me if I'm wrong).
Ruby has Rubinius, which tries to do a lot in Ruby and JITs to llvm (http://llvm.org/devmtg/2009-10/ --> Accelerating Ruby with LLVM).
There are implementations of the CLR/JVM on LLVM, but both already have very mature implementations with big investments behind them.
LLVM is not high level.
I know the problem might be the resources needed to recompile if the bytecode has to change - but that is not obligatory, as we can try to port old bytecode to the new one, and new compilers can produce the new bytecode (nevertheless, Java still needs to interpret its own bytecode - so the frontend could check it and translate it to the new bytecode)?
I have no idea what the question is.
What are the problems with linking, for example, JVM libraries inside LLVM (if we somehow port Java/JVM/Scala to LLVM)?
Watch the video of VMKit I linked above that show how far they got and what the problem is (and how they solved it).
Can you correct me if I'm wrong somewhere?
Lots of stuff you wrote is wrong, or I just don't understand what you mean, but the stuff I linked should make a lot of it clearer.
Some examples:
Clojure
The creator didn't want all the work of implementing his own VM and all the libraries. So where to go? Since Clojure is a new language, you can build it in a way that works well on a platform like the JVM by restricting a lot of the dynamic stuff a language like Python or Ruby would have.
Python
The language can't (practically) be changed to work well on the JVM/CLR, so implementing Python on those won't bring massive speedups. A static compiler won't work very well either, because there are not many static guarantees. Writing a JIT in C will be fast but very hard to change (see the Psyco project). Using the llvm JIT could work and is explored by the Unladen Swallow project (again http://llvm.org/devmtg/2009-10/ --> Unladen Swallow: Python on LLVM). Some people wanted to have Python in Python, so they started pypy, and their idea seems to work really well (see above). Parrot could work as well, but I have not seen anybody try (feel free).
On everything:
I think you're confused, and I can understand that. Take your time and read, listen, and watch everything you can get. Don't stress yourself. There are a lot of parts to this, and eventually you'll see how it all fits together and what makes sense; even when you know a lot, there is still plenty to discuss. The question of where to implement a new language, or how to speed up an old one, has many answers, and if you ask three people you're likely to get three different answers.
What are you trying to implement? Your question is very confusingly worded (I realize English is likely not your first language).
LLVM and PyPy are both mature, useful projects, but really don't overlap much at this point. (At one point, PyPy could generate LLVM bytecode—which was statically compiled to an interpreter—as opposed to C code, but it didn't provide much of a performance benefit and is no longer supported.)
PyPy lets you write an interpreter in RPython and use that as a description to generate a native code interpreter or JIT; LLVM is a C++ framework for building a compiler toolchain which can also be used to implement a JIT. LLVM's optimizers, code generation and platform support are significantly more advanced than those of PyPy, but it isn't as well suited to building a dynamic language runtime (see the Unladen Swallow retrospective for some examples of why). In particular, it is not as effective at collecting/using runtime feedback (which is absolutely essential for making dynamic languages perform well) as PyPy's trace-based JIT. Also, LLVM's garbage collection support is still somewhat primitive, and it lacks PyPy's unique ability to automatically generate a JIT.
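The "runtime feedback" point can be illustrated with a toy inline cache: record the argument type seen at a call site and specialize once the site proves monomorphic. Real JITs (including PyPy's tracing JIT) do this at the machine-code level; this Python sketch only shows the idea:

```python
# A toy "inline cache": remember the argument type seen at a call site and
# take a specialized path once the site proves monomorphic.
class CallSite:
    def __init__(self, generic_op):
        self.generic_op = generic_op
        self.seen_type = None
        self.fast_path = None

    def call(self, a, b):
        t = type(a)
        if self.fast_path is not None and t is self.seen_type:
            return self.fast_path(a, b)        # specialized, no dispatch
        if self.seen_type is None:
            self.seen_type = t                 # first observation: cache it
            if t is int:
                self.fast_path = int.__add__   # stand-in for a compiled path
        return self.generic_op(a, b)           # generic fallback

site = CallSite(lambda a, b: a + b)
print(site.call(1, 2))      # generic path; records that it saw int
print(site.call(3, 4))      # now hits the specialized int path
print(site.call("a", "b"))  # different type: falls back to the generic op
```

A static compiler has no equivalent of this observation step, which is why runtime feedback matters so much for dynamic languages.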
Incidentally two Java implementations are built on LLVM—J3/VMKit and Shark.
You might consider watching the PyPy talk from Stanford last week; it provides a pretty decent overview of how PyPy works. Carl Friedrich Bolz's presentation also provides a good overview of the state of VM implementation.
The main reason? Because VM design is not a settled technology, and having a variety of VMs with different goals and objectives allows a variety of mechanisms to be tried in parallel rather than all having to be tried in series.
The JVM, CLR, PyPy, Parrot, LLVM and the rest all target different kinds of problems in different ways. It's similar to the reasons why Chrome, Firefox, Safari and IE all use their own Javascript engines.
Unladen Swallow attempted to apply LLVM to CPython, and spent more of their time fixing issues in LLVM than they did in doing anything Python specific.
Python-on-Parrot suffered from semantic differences between Perl 6 and Python causing problems with the front-end compilation process, so future efforts in this area are likely to use the PyPy front-end to target the Parrot VM.
Different VM developers certainly keep an eye on what the others are doing, but even when they lift good ideas they will put their own spin on them before incorporating them.

Python - IronPython dilemma

I'm starting to study Python, and so far I like it very much. But could you answer a few questions which have been troubling me, and to which I can't find any definite answers:
What is the relationship between Python's C implementation (the main version from python.org) and IronPython, in terms of language compatibility? Is it the same language, and by learning one will I be able to smoothly cross to the other, or is it Java to JavaScript?
What is the current status of IronPython's libraries? How much do they lag behind the CPython libraries? I'm mostly interested in numpy/scipy and f2py. Are they available for IronPython?
What would be the best way to access VB from Python and the other way around (connecting some Python libraries to Excel's VBA, to be exact)?
1) IronPython and CPython share nearly identical language syntax. There is very little difference between them. Transitioning should be trivial.
2) The libraries in IronPython are very different than CPython. The Python libraries are a fair bit behind - quite a few of the CPython-accessible libraries will not work (currently) under IronPython. However, IronPython has clean, direct access to the entire .NET Framework, which means that it has one of the most extensive libraries natively accessible to it, so in many ways, it's far ahead of CPython. Some of the numpy/scipy libraries do not work in IronPython, but due to the .NET implementation, some of the functionality is not necessary, since the perf. characteristics are different.
3) Accessing Excel VBA is going to be easier using IronPython, if you're doing it from VBA. If you're trying to automate Excel, IronPython is still easier, since you have access to the Excel Primary Interop Assemblies and can directly automate it using the same libraries as C# and VB.NET.
What is the relationship between Python's C implementation (main version from python.org) and IronPython, in terms of language compatibility? Is it the same language, and by learning one will I be able to smoothly cross to the other, or is it Java to JavaScript?
Same language (at 2.5 level for now -- IronPython's not 2.6 yet AFAIK).
What is the current status of IronPython's libraries? How much do they lag behind the CPython libraries? I'm mostly interested in numpy/scipy and f2py. Are they available for IronPython?
Standard libraries are in a great state in today's IronPython; huge third-party extensions like the ones you mention, far from it. numpy is starting to get feasible from IronPython thanks to Ironclad, but not yet at production level (as witnessed by Ironclad's 0.5 version number, &c). scipy is huge and sprawling and chock full of C-coded and Fortran-coded extensions: I have no first-hand experience, but I suspect less than half will even run, much less run flawlessly, under any implementation except CPython.
What would be the best way to access VB from Python and the other way around (connecting some Python libraries to Excel's VBA, to be exact)?
IronPython should make it easier via .NET approaches, but CPython is not that far behind via the COM support in win32all.
Last but not least, by all means check out the book IronPython in Action -- as I say every time I recommend it, I'm biased (by having been a tech reviewer for it AND by friendship with one author) but I think it's objectively the best intro to Python for .NET developers AND at the same time the best intro to .NET for Pythonistas.
If you need all of scipy (WOW, but that's some "Renaissance Man" computational scientist!-), CPython is really the only real option today. I'm sure other large extensions (PyQt, say, or Mayavi) are in a similar state. For deep integration to today's Windows, however, I think IronPython may have an edge. For general-purpose uses, CPython may be better (esp. thanks to the many new features in 2.6), unless you're really keen to use many cores to the hilt within a single process, in which case IronPython (which is GIL-less) may prove advantageous again.
One way or another (or even on the JVM via Jython, or in peculiar environments via PyPy) Python is surely an awesome language, whatever implementation(s) you pick for a given application!-) Note that you don't need to stick with ONE implementation (though you should probably pick one VERSION -- 2.5 for maximal compatibility with IronPython, Jython, Google App Engine, etc; 2.6 if you don't care about any deployment options except "CPython on a machine under my own sole or virtual control";-).
IronPython version 2.0.2, the current release, supports Python 2.5 syntax. With the next release, 2.6, which is expected sometime over the next month or so (though I'm not sure the team have set a hard release date; here's the beta), they will support Python 2.6 syntax. So, less Java to JavaScript and more Java to J# :-)
All of the libraries that are themselves written in Python work pretty much perfectly. The ones that are written in C are more problematic; there is an open source project called Ironclad (full disclosure: developed and supported by my company), currently at version 0.8.5, which supports numpy pretty well but doesn't cover all of scipy. I don't think we've tested it with f2py. (The version of Ironclad mentioned below by Alex Martelli is over a year old, please avoid that one!)
Calling regular VB.NET from IronPython is pretty easy -- you can instantiate .NET classes and call methods (static or instance) really easily. Calling existing VBA code might be trickier; there are open source projects called Pyinex and XLW that you might want to take a look at. Alternatively, if you want a spreadsheet you can code in Python, then there's always Resolver One (another one from my company... :-)
1) The language implemented by CPython and IronPython are the same, or at most a version or two apart. This is nothing like the situation with Java and Javascript, which are two completely different languages given similar names by some bone-headed marketing decision.
2) 3rd-party libraries implemented in C (such as numpy) will have to be evaluated carefully. IronPython has a facility to execute C extensions (I forget the name), but there are many pitfalls, so you need to check with each library's maintainer.
3) I have no idea.
CPython is implemented in C for each corresponding platform, such as Windows, Linux, or Unix; IronPython is implemented in C# on the .NET Framework, so it can only run on platforms with the .NET Framework.
Grammatically, both are the same. But we cannot say they are exactly the same language: IronPython can use the .NET Framework easily when you develop on the Windows platform.
As of July 21, 2009, IronPython 2.0.2 is the latest stable release of IronPython. You can refer to http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython.
You can access VB via .NET Framework functions from IronPython. So if you want to master IronPython, you'd better learn more about the .NET Framework.

Python or IronPython

How does IronPython stack up to the default Windows implementation of Python from python.org? If I am learning Python, will I be learning a subtly different language with IronPython, and what libraries would I be doing without?
Are there, alternatively, any pros to IronPython (not including .NET IL compiled classes) that would make it more attractive an option?
There are a number of important differences:
Interoperability with other .NET languages. You can use other .NET libraries from an IronPython application, or use IronPython from a C# application, for example. This interoperability is increasing, with a movement toward greater support for dynamic types in .NET 4.0. For a lot of detail on this, see these two presentations at PDC 2008.
Better concurrency/multi-core support, due to lack of a GIL. (Note that the GIL doesn't inhibit threading on a single-core machine---it only limits performance on multi-core machines.)
Limited ability to consume Python C extensions. The Ironclad project is making significant strides toward improving this---they've nearly gotten Numpy working!
Less cross-platform support; basically, you've got the CLR and Mono. Mono is impressive, though, and runs on many platforms---and they've got an implementation of Silverlight, called Moonlight.
Reports of improved performance, although I have not looked into this carefully.
Feature lag: since CPython is the reference Python implementation, it has the "latest and greatest" Python features, whereas IronPython necessarily lags behind. Many people do not find this to be a problem.
There are some subtle differences in how you write your code, but the biggest difference is in the libraries you have available.
With IronPython, you have all the .Net libraries available, but at the expense of some of the "normal" python libraries that haven't been ported to the .Net VM I think.
Basically, you should expect the syntax and the idioms to be the same, but a script written for IronPython won't run if you try giving it to the "regular" Python interpreter. The other way around is probably more likely, but there too you will find differences, I think.
Well, it's generally faster.
It can't use CPython extension modules, and only has a subset of the standard library.
Here's a list of differences.
See the blog post IronPython is a one-way gate. It summarizes some things I've learned about IronPython from asking questions on StackOverflow.
Python is Python; the only difference is that IronPython was designed to run on the CLR (.NET Framework), and as such can inter-operate with and consume .NET assemblies written in other .NET languages. So if your platform is Windows and you also use .NET, or your company does, then you should consider IronPython.
One of the pros of IronPython is that, unlike CPython, IronPython doesn't use the Global Interpreter Lock, thus making threading more effective.
In the standard Python implementation, a thread must hold the GIL to execute bytecode, so only one thread runs Python code at a time. This limits parallel execution, which matters especially if you expect to fully utilize multiple CPUs.
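A small demonstration of why that matters (the workload is made up): CPU-bound work split across threads stays correct under CPython, but because the GIL serializes bytecode execution, wall-clock time barely improves; a GIL-less implementation such as IronPython can actually run the chunks in parallel.

```python
import threading

def count_primes(lo, hi):
    # naive CPU-bound work; CPython threads running this take turns
    # holding the GIL instead of using multiple cores at once
    found = 0
    for n in range(lo, hi):
        if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found

results = []
lock = threading.Lock()

def worker(lo, hi):
    r = count_primes(lo, hi)
    with lock:                   # protect the shared results list
        results.append(r)

threads = [threading.Thread(target=worker, args=(i * 5000, (i + 1) * 5000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), "primes below 20000")
```

On CPython you would reach for the multiprocessing module (separate processes, separate GILs) to get a real speedup from this pattern.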
Pro: You can run IronPython in a browser if SilverLight is installed.
It also depends on whether you want your code to work on Linux. Dunno if IronPython will work on anything besides Windows platforms.
