What does translate mean in pypy? - python

I'm reading pypy's document, which has a section called Translating the PyPy Python interpreter. But i don't understand what does the word translate mean. Is it the same as compile?
The document says:
First download a pre-built PyPy for your architecture which you will use to translate your Python interpreter.
Is the pre-built PyPy here refer to the source code? Because there is no pypy/goal directory in the binary I have downloaded. If so, there is something wrong with the document. It is misleading.
Is the pypy-c created in the translation the same thing as bin/pypy in the binary?

i don't understand what does the word translate mean. Is it the same as compile?
What "translation" means is described in detail in The RPython Toolchain. There's also some higher-level introductory information in the Coding Guide and FAQ.
Summarizing their summary:
Compile and import the complete RPython program.
Dynamically analyze the program and annotate it with flow graphs.
Compile the flow graphs into lower-level flow graphs.
Optimize the compiled flow graphs.
Analyze the compiled and optimized flow graphs.
Generate C source from the flow graphs and analysis.
Compile and link the C source into a native executable.
So, step 1 uses the normal Python compiler, step 7 uses the normal C compiler (and linker), and steps 3 and 4 are similar to the kind of thing an optimizing compiler normally does. But calling the overall process "compilation" would be misleading. (Also, people would probably interpret it to mean something akin to what Shedskin does, which is definitely not right.)
Is the pypy-c created in the translation the same thing as bin/pypy in the binary?
What ends up in a binary distribution is basically the same as if you run the install process on the translation goal. So, yes, goal/pypy-c and bin/pypy are effectively the same thing.
Is the pre-built PyPy here refer to the source code?
No. It refers to a bin/pypy from a binary distribution. As the docs say, you can actually use any Python 2.6+, including CPython, or a goal/pypy-c left over from a previous build, etc. However, the translator will probably run fastest on the standard PyPy binary distribution, so that's what you should use unless you have a good reason to do otherwise.

Let me give you what I can - PyPy is several things:
A fast implementation of Python using a Just-In-Time compiler (written in RPython)
The RPython JIT compiler compiler
When the docs talk about translating the interpreter they are talking about generating a JIT compiler for Python out of the RPython implementation of the Python compiler.
Python Compiler (Written in RPython)
|--[RPython to JIT compiler compiler]-->
PyPy (JIT'ed Python Interpreter)
The key thing to note is that "compiler compiler" is not a typo. RPython is part of a toolchain used to generate JIT compilers. Rather than writing a compiler for your language and then writing a JIT layer for your compiler (which can be difficult and time consuming) instead you implement your language in RPython and the RPython translation toolchain writes you a JIT compiler for your language.
The easiest way to think about this is to imagine that the PyPy team hadn't written their own JIT compiler compiler. Imagine instead that Topaz (JIT Ruby) came first and that team had written a JIT compiler compiler in Ruby (we'll call it RRuby). The PyPy team would have then written the PyPy compiler in RRuby (instead, since PyPy came first, the Topaz team is implementing their JIT Ruby compiler in RPython).

Related

Python vs Cpython

What's all this fuss about Python and CPython (Jython,IronPython), I don't get it:
python.org mentions that CPython is:
The "traditional" implementation of Python (nicknamed CPython)
yet another Stack Overflow question mentions that:
CPython is the default byte-code interpreter of Python, which is written in C.
Honestly I don't get what both of those explanations practically mean but what I thought was that, if I use CPython does that mean when I run a sample python code, it compiles it to C language and then executes it as if it were C code
So what exactly is CPython and how does it differ when compared with python and should I probably use CPython over Python and if so what are its advantages?
So what is CPython?
CPython is the original Python implementation. It is the implementation you download from Python.org. People call it CPython to distinguish it from other, later, Python implementations, and to distinguish the implementation of the language engine from the Python programming language itself.
The latter part is where your confusion comes from; you need to keep Python-the-language separate from whatever runs the Python code.
CPython happens to be implemented in C. That is just an implementation detail, really. CPython compiles your Python code into bytecode (transparently) and interprets that bytecode in a evaluation loop.
CPython is also the first to implement new features; Python-the-language development uses CPython as the base; other implementations follow.
What about Jython, etc.?
Jython, IronPython and PyPy are the current "other" implementations of the Python programming language; these are implemented in Java, C# and RPython (a subset of Python), respectively. Jython compiles your Python code to Java bytecode, so your Python code can run on the JVM. IronPython lets you run Python on the Microsoft CLR. And PyPy, being implemented in (a subset of) Python, lets you run Python code faster than CPython, which rightly should blow your mind. :-)
Actually compiling to C
So CPython does not translate your Python code to C by itself. Instead, it runs an interpreter loop. There is a project that does translate Python-ish code to C, and that is called Cython. Cython adds a few extensions to the Python language, and lets you compile your code to C extensions, code that plugs into the CPython interpreter.
You need to distinguish between a language and an implementation. Python is a language,
According to Wikipedia, "A programming language is a notation for writing programs, which are specifications of a computation or algorithm". This means that it's simply the rules and syntax for writing code. Separately we have a programming language implementation which in most cases, is the actual interpreter or compiler.
Python is a language.
CPython is the implementation of Python in C. Jython is the implementation in Java, and so on.
To sum up: You are already using CPython (if you downloaded from here).
Even I had the same problem understanding how are CPython, JPython, IronPython, PyPy are different from each other.
So, I am willing to clear three things before I begin to explain:
Python: It is a language, it only states/describes how to convey/express yourself to the interpreter (the program which accepts your python code).
Implementation: It is all about how the interpreter was written, specifically, in what language and what it ends up doing.
Bytecode: It is the code that is processed by a program, usually referred to as a virtual machine, rather than by the "real" computer machine, the hardware processor.
CPython is the implementation, which was
written in C language. It ends up producing bytecode (stack-machine
based instruction set) which is Python specific and then executes it.
The reason to convert Python code to a bytecode is because it's easier to
implement an interpreter if it looks like machine instructions. But,
it isn't necessary to produce some bytecode prior to execution of the
Python code (but CPython does produce).
If you want to look at CPython's bytecode then you can. Here's how you can:
>>> def f(x, y): # line 1
... print("Hello") # line 2
... if x: # line 3
... y += x # line 4
... print(x, y) # line 5
... return x+y # line 6
... # line 7
>>> import dis # line 8
>>> dis.dis(f) # line 9
2 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 ('Hello')
4 CALL_FUNCTION 1
6 POP_TOP
3 8 LOAD_FAST 0 (x)
10 POP_JUMP_IF_FALSE 20
4 12 LOAD_FAST 1 (y)
14 LOAD_FAST 0 (x)
16 INPLACE_ADD
18 STORE_FAST 1 (y)
5 >> 20 LOAD_GLOBAL 0 (print)
22 LOAD_FAST 0 (x)
24 LOAD_FAST 1 (y)
26 CALL_FUNCTION 2
28 POP_TOP
6 30 LOAD_FAST 0 (x)
32 LOAD_FAST 1 (y)
34 BINARY_ADD
36 RETURN_VALUE
Now, let's have a look at the above code. Lines 1 to 6 are a function definition. In line 8, we import the 'dis' module which can be used to view the intermediate Python bytecode (or you can say, disassembler for Python bytecode) that is generated by CPython (interpreter).
NOTE: I got the link to this code from #python IRC channel: https://gist.github.com/nedbat/e89fa710db0edfb9057dc8d18d979f9c
And then, there is Jython, which is written in Java and ends up producing Java byte code. The Java byte code runs on Java Runtime Environment, which is an implementation of Java Virtual Machine (JVM). If this is confusing then I suspect that you have no clue how Java works. In layman terms, Java (the language, not the compiler) code is taken by the Java compiler and outputs a file (which is Java byte code) that can be run only using a JRE. This is done so that, once the Java code is compiled then it can be ported to other machines in Java byte code format, which can be only run by JRE. If this is still confusing then you may want to have a look at this web page.
Here, you may ask if the CPython's bytecode is portable like Jython, I suspect not. The bytecode produced in CPython implementation was specific to that interpreter for making it easy for further execution of code (I also suspect that, such intermediate bytecode production, just for the ease the of processing is done in many other interpreters).
So, in Jython, when you compile your Python code, you end up with Java byte code, which can be run on a JVM.
Similarly, IronPython (written in C# language) compiles down your Python code to Common Language Runtime (CLR), which is a similar technology as compared to JVM, developed by Microsoft.
This article thoroughly explains the difference between different implementations of Python. Like the article puts it:
The first thing to realize is that ‘Python’ is an interface. There’s a
specification of what Python should do and how it should behave (as
with any interface). And there are multiple implementations (as with
any interface).
The second thing to realize is that ‘interpreted’ and ‘compiled’ are
properties of an implementation, not an interface.
Python is a language: a set of rules that can be used to write programs. There are several implementaions of this language.
No matter what implementation you take, they do pretty much the same thing: take the text of your program and interpret it, executing its instructions. None of them compile your code into C or any other language.
CPython is the original implementation, written in C. (The "C" part in "CPython" refers to the language that was used to write Python interpreter itself.)
Jython is the same language (Python), but implemented using Java.
IronPython interpreter was written in C#.
There's also PyPy - a Python interpreter written in Python. Make your pick :)
As python is open source that's why we can customize python as per our requirements. After customization, we can name that version as we want. That's why multiple flavours of python are available. Each flavour is a customized version of python to fulfill a special requirement. It is similar to the fact that there are multiple flavours of UNIX like, Ubuntu, Linux, RedHat Linux etc. Below are some of the flavours of python :
CPython
Default implementation of python programming language which we download from python.org, provided by python software foundation. It is written in C and python. It does not allow us to write any C code, only python code are allowed. CPython can be called as both an interpreter and a compiler as here our python code first gets compiled to python bytecode then the bytecode gets interpreted into platform-specific operations by PVM or Python Virtual Machine. Keep in mind interpreters have language syntaxes predefined that's why it does not need to translate into low level machine code. Here Interpreter just executes bytecode on the fly during runtime and results in platform-specific operations.
Old versions of JavaScript, Ruby, Php were fully interpreted languages as their interpreters would directly translate each line source code into platform-specific operations, no bytecode was involved. Bytecode is there in Java, Python, C++ (.net), C# to decouple the language from execution environment, i.e. for portability, write once, run anywhere. Since 2008, Google Chrome's V8 JavaScript engine has come up with Just-In-Time Compiler for JavaScript. It executes JavaScript code line-by-line like an interpreter to reduce start up time, but if encounters with a hot section with repeatedly executed line of code then optimizes that code using baseline or optimizing compiler.
Cython
Cython is a programming language which is a superset of python and C. It is written in C and python. It is designed to give C-like performance with python syntax and optional C-syntax. Cython is a compiled language as it generates C code and gets compiled by C compiler. We can write similar code in Cython as in default python or CPython, the differences are :
Cython allows us to write optional additional C code and,
In Cython, our python code gets translated into C code internally so that it can get compiled by C compiler. Although Cython results in considerably faster execution, but falls short of original C language execution. This is because Cython has to make calls to the CPython interpreter and CPython standard libraries to understand the written CPython code
JPython / Jython
Java implementation of python programming language. It is written in Java and python. Here our python code first gets compiled to Java bytecode and then that bytecode gets interpreted to platform-specific operations by JVM or Java Virtual Machine. This is similar to how Java code gets executed : Java code first gets compiled to intermediate bytecode then that bytecode gets interpreted to platform-specific operations by JVM
PyPy
RPython implementation of python programming language. It is written in a restricted subset of python called Restricted Python (RPython). PyPy runs faster than CPython because to interpret bytecode, PyPy has a Just-in-time Compiler (Interpreter + Compiler) while CPython has an Interpreter. So JIT Compiler in PyPy can execute Python bytecode line-by-line like an Interpreter to reduce start up time, but if encounters with a hot section with repeatedly executed line of code then optimizes that code using Baseline or Optimizing Compiler.
JIT Compiler in a Nutshell: Compiler in Python translates our high level source code into bytecode and to execute bytecode, some implementations have normal Interpreter, some have Just-in-time Compiler. To execute a loop which runs say, million times i.e. a very hot code, initially Interpreter will run it for some time and Monitor of JIT Compiler will watch that code. Then when it gets repeated some times i.e. the code becomes warm* then JIT compiler will send that code to Baseline Compiler which will make some assumptions on variable types etc. based on data gathered by Monitor while watching the code. From next iterations if assumptions turn out to be valid, then no need to retranslate bytecode into machine code, i.e. steps can be skipped for faster execution. If the code repeats a lot of times i.e. code becomes very hot then JIT compiler will send that code to Optimizing Compiler which will make more assumptions and will skip more steps for very fast execution.
JIT Compiler Drawbacks: initial slower execution when the code gets analysed, and if assumptions turn out to be false then optimized compiled code gets thrown out i.e. Deoptimization or Bailing out happens which can make code execution slower, although JIT compilers has limit for optimization/deoptimization cycle. After certain number of deoptimization happens, JIT Compiler just does not try to optimize anymore. Whereas normal Interpreter, in each iteration, repeatedly translates bytecode into machine code thereby taking more time to complete a loop which runs say, million times
IronPython
C# implementation of python, targeting the .NET framework
Ruby Python
works with Ruby platform
Anaconda Python
Distribution of python and R programming languages for scientific computing like, data science, machine learning, artificial intelligence, deep learning, handling large volume of data etc. Numerous number of libraries like, scikit-learn, tensorflow, pytorch, numba, pandas, jupyter, numpy, matplotlib etc. are available with this package
Stackless
Python for Concurrency
To test speed of each implementation, we write a program to call integrate_f 500 times using an N value of 50,000, and record the execution time over several runs. Below table shows the benchmark results :
Implementation
Execution Time (seconds)
Speed Up
CPython
9.25
CPython + Cython
0.21
44x
PyPy
0.57
16x
implementation means what language was used to implement Python and not how python Code would be implemented. The advantage of using CPython is the availability of C Run-time as well as easy integration with C/C++.
So CPython was originally implemented using C. There were other forks to the original implementation which enabled Python to lever-edge Java (JYthon) or .NET Runtime (IronPython).
Based on which Implementation you use, library availability might vary, for example Ctypes is not available in Jython, so any library which uses ctypes would not work in Jython. Similarly, if you want to use a Java Class, you cannot directly do so from CPython. You either need a glue (JEPP) or need to use Jython (The Java Implementation of Python)
You should know that CPython doesn't really support multithreading (it does, but not optimal) because of the Global Interpreter Lock. It also has no Optimisation mechanisms for recursion, and has many other limitations that other implementations and libraries try to fill.
You should take a look at this page on the python wiki.
Look at the code snippets on this page, it'll give you a good idea of what an interpreter is.
The original, and standard, implementation of Python is usually called CPython when
you want to contrast it with the other options (and just plain “Python” otherwise). This
name comes from the fact that it is coded in portable ANSI C language code. This is
the Python that you fetch from http://www.python.org, get with the ActivePython and
Enthought distributions, and have automatically on most Linux and Mac OS X machines.
If you’ve found a preinstalled version of Python on your machine, it’s probably
CPython, unless your company or organization is using Python in more specialized
ways.
Unless you want to script Java or .NET applications with Python or find the benefits
of Stackless or PyPy compelling, you probably want to use the standard CPython system.
Because it is the reference implementation of the language, it tends to run the
fastest, be the most complete, and be more up-to-date and robust than the alternative
systems.
A programming language implementation is a system for executing computer programs.
There are two general approaches to programming language implementation:
Interpretation: An interpreter takes as input a program in some language, and performs the actions written in that language on some machine.
Compilation: A compiler takes as input a program in some language, and translates that program into some other language, which may serve as input to another interpreter or another compiler.
Python is an interpreted high-level programming language created by Guido van Rossum in 1991.
CPython is reference version of the Python computing language, which is written in C created by Guido van Rossum too.
Other list of Python Implementations
Source
Cpython is the default implementation of Python and the one which we get onto our system when we download Python from its official website.
Cpython compiles the python source code file with .py extension into an intermediate bytecode which is usually given the .pyc extension, and gets executed by the Cpython Virtual Machine. This implementation of Python provides maximum compatibility with the Python packages and C extension modules.
There are many other Python implementations such as IronPython, Jython, PyPy, CPython, Stackless Python and many more.

Game Library with Support for RPython

Are there any Python game libraries (Pygame, Pyglet, etc.) with support for RPython? Or game libraries specifically made for RPython? Or bindings for a game library for RPython?
Yes, check out the gameboy interpreter written in RPython, pygirl. https://bitbucket.org/pypy/lang-gameboy
RPython is not Python. They are different languages even though it just so happens that you can execute RPython code with a Python interpreter and get similar effects (much more slowly) to running the compiled RPython program. It is extremely unlikely for there to ever be any reasonable Python library (game library or any other kind) that also works in RPython. It would have to be specifically written for RPython, and if it worked in RPython it would be so inflexible when considered as a regular Python library that nobody would use it in that context.
If you want to program in a lower level compiled language with game libraries, use C# or Java. RPython is not a good competitor (in large part because it has very few libraries for anything, not even much of a standard library).
If you want to program in Python, use Python (possibly use the PyPy implementation of Python, which can be faster than the standard Python interpreter if it supports all the libraries you're using). RPython will not help you write Python code that goes faster.
If your goal is not to specifically produce a game, but rather to do some project in RPython because you want to learn RPython specifically, then that's a perfectly reasonable aim. But writing a game in will probably not be the most helpful project for learning to write RPython; you should probably consider writing some sort of interpreter instead, since that's what RPython is designed to do.

python compiler

I have a few queries regarding python
Why is there no python compiler to create native code? I have found py2exe etc but they just pack a python interpreter along with them and hence, it is again the interpreter executing the code.
Is it not possible to create a python compiler like a LISP compiler and hence the code will execute faster(compared to C++)?
Thanks,
Vinay
Nuitka – Python Compiler
What it is
I thought there ought to be possible to use a compiler for Python, a better compiler than what CPython already has with its bytecode. This is what Nuitka is supposed to be.
It is my attempt to translate pure Python not into bytecode, but into machine code (via C++ compiler), while using libpython at run time. And then to do compile time and also run time analysis to speculatively execute things in a faster mode if certain expectations are met.
Question 1:
Nuitka (Direct Python code to C++)
ShedSkin (Compiles implicitly statically typed Python to C++, stand-alone programs or
extension modules)
Cython (From a superset of Python to C
extensions. Cython comes from Pyrex)
Question 2:
Not sure if I get it correctly but maybe the answer is:
psyco (A Just in time compiler (JIT) for Python Code, the
predecessor of the PyPy JIT )
The nearest equivalents for Python are cython and pypy.
There is, sort of.
See Cython -- I haven't had a chance to fully explore it yet, but as best as I can tell, it directly compiles Python code. You can also use (optional) static typing -- it won't be vanilla Python anymore, but it can lead to a speed boost, if you do it right. Also, see this: Can Cython compile to an EXE?
It might be because I don't have much experience with Lisp, but I'm not entirely sure by what you mean by 'create a Python compiler like a Lisp compiler'.
Numba is a newer Python compiler based on NumPy & LLVM, which falls back to CPython.

Use Cython as Python to C Converter

I have huge Python modules(+8000 lines) .They basically have tons of functions for interacting with a hardware platform via serial port by reading and writing to hardware registers.
They are not numerical algorithms. So application is just reading/writing to hardware registers/memory. I use these libraries to write custom scripts. Eventually, I need to move all these stuff to be run in an embedded processor on my hardware to have finer control, then I just kick off the event from PC and the rest is in hardware.
So I need to convert them to C.If I can have my scripts be converted to C by an automatic tool, that would save me a huge time. This is why I got attracted to Cython. Efficiency is not important my codes are not number crunchers. But generated code should be relatively small to fit in my limited memory (few hundreds of Kilobytes).
Can I use Cython as converter for my custom Python scripts to C? My guess is yes, in which case can I use these .c files to be run in my hardware? My guess is not since I have to have Cython in my hardware as well for them to run. But if just creates some .c files, I can go through and make it standalone since code is not using much of features of Python it just use it as a fast to implement script.
Yes, at its core this is what Cython does. But ...
You don't need Cython, however, you do need libpython. You may feel like it doesn't use that many Python features, but I think if you try this you'll find it's not true -- you won't be able to separate your program from its dependence on libpython while still using the Python language.
Another option is PyPy, specifically it's translation toolchain, NOT the PyPy Python interpreter. It lets you translate RPython, a subset of the Python language, into C. If you really aren't using many Python language features or libraries, this may work.
PyPy is mostly known as an alternative Python implementation, but it is also a set of tools for compiling dynamic languages into various forms. This is what allows the PyPy implementation of Python, written in (R)Python, to be compiled to machine code.
If C++ is available, Nuitka is a Python to C++ compiler that works for regular Python, not just RPython (which is what shedskin and PyPy use).
If C++ is available for that embedded platform, there is shed skin, it converts python into c++.

Is it possible to compile Python natively (beyond pyc byte code)?

I wonder if it is possible to create an executable module from a Python script. I need to have the most performance and the flexibility of Python script, without needing to run in the Python environment. I would use this code to load on demand user modules to customize my application.
There's pyrex that compiles python like source to python extension modules
rpython which allows you to compile python with some restrictions to various backends like C, LLVM, .Net etc.
There's also shed-skin which translates python to C++, but I can't say if it's any good.
PyPy implements a JIT compiler which attempts to optimize runtime by translating pieces of what's running at runtime to machine code, if you write for the PyPy interpreter that might be a feasible path.
The same author that is working on JIT in PyPy wrote psyco previously which optimizes python in the CPython interpreter.
You can use something like py2exe to compile your python script into an exe, or Freeze for a linux binary.
see: How can I create a directly-executable cross-platform GUI app using Python?
I've had a lot of success using Cython, which is based on and extends pyrex:
Cython is a language that makes
writing C extensions for the Python
language as easy as Python itself.
Cython is based on the well-known
Pyrex, but supports more cutting edge
functionality and optimizations.
The Cython language is very close to
the Python language, but Cython
additionally supports calling C
functions and declaring C types on
variables and class attributes. This
allows the compiler to generate very
efficient C code from Cython code.
This makes Cython the ideal language
for wrapping for external C libraries,
and for fast C modules that speed up
the execution of Python code.
I think you can use jython to compile python to Java bytecode, and then compile that with GCJ.

Categories

Resources