Is it possible to compile Python natively (beyond .pyc bytecode)?

I wonder whether it is possible to create an executable module from a Python script. I need maximum performance and the flexibility of a Python script, without needing to run inside the Python environment. I would use this code to load user modules on demand to customize my application.

There's Pyrex, which compiles Python-like source to Python extension modules.
RPython lets you compile Python (with some restrictions) to various backends such as C, LLVM, and .NET.
There's also Shed Skin, which translates Python to C++, but I can't say whether it's any good.
PyPy implements a JIT compiler that tries to optimize at runtime by translating pieces of the running program into machine code; if you write for the PyPy interpreter, that might be a feasible path.
The same author who is working on the JIT in PyPy previously wrote Psyco, which optimizes Python inside the CPython interpreter.
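For a sense of how Psyco was used, here is a historical sketch (Psyco only ever supported 32-bit Python 2.x and is long unmaintained, so treat this as illustrative):

# Historical sketch: enabling Psyco in a Python 2 program.
import psyco
psyco.full()  # ask Psyco to JIT-specialize every function as it is called

def checksum(data):
    total = 0
    for ch in data:
        total = (total + ord(ch)) & 0xFF
    return total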

You can use something like py2exe to package your Python script into an .exe, or Freeze for a Linux binary.
See: How can I create a directly-executable cross-platform GUI app using Python?
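As a rough sketch (assuming the classic distutils-based py2exe interface; myscript.py is a placeholder name), the setup.py looks like:

# setup.py -- minimal classic py2exe configuration (sketch)
from distutils.core import setup
import py2exe  # importing py2exe registers the "py2exe" distutils command

setup(console=['myscript.py'])  # build a console .exe around myscript.py

# Build with:  python setup.py py2exe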

I've had a lot of success using Cython, which is based on and extends Pyrex:
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting-edge functionality and optimizations.
The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
This makes Cython the ideal language for wrapping external C libraries, and for fast C modules that speed up the execution of Python code.
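To give a flavour of the build workflow, here is a sketch assuming Cython's setuptools integration (fib.pyx is a hypothetical module name):

# setup.py -- compile fib.pyx into a C extension module (sketch)
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="fib",
    ext_modules=cythonize("fib.pyx"),  # generates fib.c, then compiles it
)

# Build in place with:  python setup.py build_ext --inplace
# Afterwards, `import fib` loads the compiled extension instead of Python source.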

I think you can use Jython to compile Python to Java bytecode, and then compile that with GCJ.

Related

Please explain to me how the Python interpreter executes modules written in C/C++?

I'm trying to understand how this works. I know that the Python interpreter translates Python source code into a bytecode representation for a virtual machine (the Python interpreter is a virtual machine) and executes those instructions. Where exactly does C/C++ code come in here? Can this virtual machine (the Python interpreter) also compile and execute C/C++ code?
I don't even know exactly what the right questions to ask are; I just want a good explanation of how it works.
My background: I have programmed in Python for a long time (mostly analytics/ML), and I have a basic understanding of computer systems, the C compilation process, memory, and processors, but I am not even close to being an expert.
I just want a good understanding, not so much practical tips on how to create a Python module in C.
Thank you, I really appreciate your help!
It's all about a predictable entry point. The CPython reference interpreter (and other interpreters, like PyPy, that support C extensions of this sort), when told to import a module of a given name and finding a file that matches the naming convention for extension modules in one of the sys.path directories (e.g. for the spam module built for CPython 3.10 on x86-64 Linux, it would look for spam.cpython-310-x86_64-linux-gnu.so):
Uses the OS standard method for loading a dynamic library (aka shared object), e.g. LoadLibrary on Windows, dlopen on POSIX systems
Loads the entry point in it (using GetProcAddress on Windows, dlsym on POSIX) that matches the specified naming convention, e.g. for the module named spam, it looks for a function named PyInit_spam, following C name-mangling rules
Invokes that function, which is then wholly responsible for all other setup (calling PyModule_Create, performing any modifications to said module object, and returning it). The various APIs it invokes are what publish information for use by the user.
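You can see that naming convention from the Python side with a small illustrative sketch (not part of the answer above; the exact suffixes and paths depend on your interpreter and platform):

# Inspect the filename suffixes the import system accepts for C extension modules.
import importlib.machinery
import importlib.util

# On CPython 3.10 / x86-64 Linux this typically includes
# '.cpython-310-x86_64-linux-gnu.so'.
print(importlib.machinery.EXTENSION_SUFFIXES)

# _ssl is usually shipped as a shared-object extension module; its spec shows the
# loader and the full path that gets dlopen()'d on import.
spec = importlib.util.find_spec("_ssl")
if spec is not None:
    print(spec.loader, spec.origin)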
CPython, the "standard" Python interpreter, is written in C. It provides an extension API in C, so extensions written in C or C++ can register themselves to be called like normal Python modules. The Python interpreter cannot compile C or C++; it is the extension writer's responsibility to compile the module. But Python can run arbitrary C and C++ code through the help of that API.

Does WebAssembly run faster if written in C as opposed to Python?

There's a long list of languages that can be compiled into Wasm. Is there any performance gain from writing in something like C or Rust over Python? Or is it all the same since it is being compiled to Wasm?
Short answer: yes, because it is not Python the language that is compiled to Wasm, but its interpreter.
Saying that Python supports Wasm does not always mean the same thing. Firstly, Python is NOT a compiled language; it's a scripting language. Don't expect a scripting language to be compiled to native (or Wasm) code, because it is not meant to work that way.
So how does Python support Wasm? Python interpreters/runtimes like CPython, which is written in C, are compiled to Wasm. Two popular runtimes support Python this way: Pyodide and the Wasm port of MicroPython (and there are many other efforts to run Python in a browser besides those two). Both of them are interpreters that translate Python to their own bytecode and then execute that bytecode in Wasm. Of course there will be big performance penalties, just as with CPython in a native environment.
Compiling to WebAssembly is basically just targeting a special form of assembly for virtual hardware. When you read that language X "can be compiled" to Wasm, it doesn't always mean the language literally compiles directly to Wasm. In the case of Python, to my knowledge, it means "they compiled Python interpreters to Wasm" (e.g. CPython, PyPy), so the whole Python interpreter is Wasm, but it still interprets Python source code files normally; it doesn't convert them to special Wasm modules or anything. That means all the overhead of the Python interpreter is there, on top of the overhead of the Wasm engine, etc.
So yes, C and Rust (which can target Wasm directly by swapping out the compiler backend) will still run faster than Python code targeting CPython compiled to Wasm, for the same reasons. Tools that speed up Python when run natively (e.g. Cython, raw CPython C extensions, etc.) may also work in Wasm to get the same speed ups, but it's not a free "Compile slow interpreted language to Wasm and become fast compiled language"; computers aren't that smart yet.

What does translate mean in pypy?

I'm reading PyPy's documentation, which has a section called Translating the PyPy Python interpreter, but I don't understand what the word translate means. Is it the same as compile?
The document says:
First download a pre-built PyPy for your architecture which you will use to translate your Python interpreter.
Does the pre-built PyPy here refer to the source code? I ask because there is no pypy/goal directory in the binary I downloaded. If so, there is something wrong with the documentation; it is misleading.
Is the pypy-c created in the translation the same thing as bin/pypy in the binary?
I don't understand what the word translate means. Is it the same as compile?
What "translation" means is described in detail in The RPython Toolchain. There's also some higher-level introductory information in the Coding Guide and FAQ.
Summarizing their summary:
Compile and import the complete RPython program.
Dynamically analyze the program and annotate it with flow graphs.
Compile the flow graphs into lower-level flow graphs.
Optimize the compiled flow graphs.
Analyze the compiled and optimized flow graphs.
Generate C source from the flow graphs and analysis.
Compile and link the C source into a native executable.
So, step 1 uses the normal Python compiler, step 7 uses the normal C compiler (and linker), and steps 3 and 4 are similar to the kind of thing an optimizing compiler normally does. But calling the overall process "compilation" would be misleading. (Also, people would probably interpret it to mean something akin to what Shedskin does, which is definitely not right.)
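For concreteness, the "complete RPython program" handed to step 1 looks roughly like this minimal sketch (the names and the exact translation command follow the usual RPython tutorial conventions, but treat them as illustrative):

# hello_rpython.py -- a minimal RPython translation target (sketch)
# RPython is a Python-2-flavoured, statically analyzable subset of Python.
import os

def entry_point(argv):
    os.write(1, "hello from translated RPython\n")
    return 0

def target(driver, args):
    # The toolchain calls target() to discover the program's entry point.
    return entry_point, None

# Translate to a native executable with something like:
#   python rpython/bin/rpython hello_rpython.py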
Is the pypy-c created in the translation the same thing as bin/pypy in the binary?
What ends up in a binary distribution is basically the same as if you run the install process on the translation goal. So, yes, goal/pypy-c and bin/pypy are effectively the same thing.
Does the pre-built PyPy here refer to the source code?
No. It refers to a bin/pypy from a binary distribution. As the docs say, you can actually use any Python 2.6+, including CPython, or a goal/pypy-c left over from a previous build, etc. However, the translator will probably run fastest on the standard PyPy binary distribution, so that's what you should use unless you have a good reason to do otherwise.
Let me give you what I can - PyPy is several things:
A fast implementation of Python using a Just-In-Time compiler (written in RPython)
The RPython JIT compiler compiler
When the docs talk about translating the interpreter they are talking about generating a JIT compiler for Python out of the RPython implementation of the Python compiler.
Python Compiler (Written in RPython)
|--[RPython to JIT compiler compiler]-->
PyPy (JIT'ed Python Interpreter)
The key thing to note is that "compiler compiler" is not a typo. RPython is part of a toolchain used to generate JIT compilers. Rather than writing a compiler for your language and then writing a JIT layer on top of it (which can be difficult and time-consuming), you implement your language in RPython, and the RPython translation toolchain writes a JIT compiler for your language.
The easiest way to think about this is to imagine that the PyPy team hadn't written their own JIT compiler compiler. Imagine instead that Topaz (JIT Ruby) came first and that team had written a JIT compiler compiler in Ruby (we'll call it RRuby). The PyPy team would have then written the PyPy compiler in RRuby (instead, since PyPy came first, the Topaz team is implementing their JIT Ruby compiler in RPython).

Python compiler

I have a few queries regarding Python.
Why is there no Python compiler that creates native code? I have found py2exe etc., but they just pack a Python interpreter along with the script, and hence it is again the interpreter executing the code.
Is it not possible to create a Python compiler like a Lisp compiler, so that the code will execute faster (compared to C++)?
Thanks,
Vinay
Nuitka – Python Compiler
What it is
I thought it ought to be possible to use a compiler for Python, a better compiler than what CPython already has with its bytecode. This is what Nuitka is supposed to be.
It is my attempt to translate pure Python not into bytecode, but into machine code (via a C++ compiler), while using libpython at run time, and then to do compile-time and also run-time analysis to speculatively execute things in a faster mode if certain expectations are met.
Question 1:
Nuitka (direct Python code to C++)
ShedSkin (compiles implicitly statically typed Python to C++, as stand-alone programs or extension modules)
Cython (from a superset of Python to C extensions; Cython comes from Pyrex)
Question 2:
Not sure if I understand it correctly, but maybe the answer is:
Psyco (a just-in-time (JIT) compiler for Python code, the predecessor of the PyPy JIT)
The nearest equivalents for Python are Cython and PyPy.
There is, sort of.
See Cython -- I haven't had a chance to fully explore it yet, but as best as I can tell, it directly compiles Python code. You can also use (optional) static typing (see the sketch after this answer) -- it won't be vanilla Python anymore, but it can lead to a speed boost if you do it right. Also see this: Can Cython compile to an EXE?
It might be because I don't have much experience with Lisp, but I'm not entirely sure what you mean by "create a Python compiler like a Lisp compiler".
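Here is a sketch of that optional static typing, using Cython's "pure Python mode" annotations (the module name and function are made up; how much it helps depends entirely on the code):

# fastmath.py -- optional static typing via Cython's pure Python mode (sketch)
import cython

def harmonic(n: cython.int) -> cython.double:
    total: cython.double = 0.0
    i: cython.int
    for i in range(1, n + 1):
        total += 1.0 / i
    return total

# Compiled with cythonize(), the typed loop becomes plain C arithmetic;
# run under the normal interpreter, the annotations are simply ignored.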
Numba is a newer Python compiler based on NumPy & LLVM, which falls back to CPython.
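A hedged sketch of what that looks like in practice (assuming Numba is installed; the function is just an example):

from numba import njit

@njit  # nopython mode: the whole function is compiled to machine code via LLVM
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(10_000_000))  # the first call triggers compilation; later calls reuse it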

Use Cython as Python to C Converter

I have huge Python modules (8000+ lines). They basically contain tons of functions for interacting with a hardware platform over a serial port by reading and writing hardware registers.
They are not numerical algorithms; the application just reads and writes hardware registers/memory. I use these libraries to write custom scripts. Eventually, I need to move all this stuff to run on an embedded processor on my hardware to have finer control; then I just kick off the event from the PC and the rest happens in hardware.
So I need to convert them to C. If my scripts could be converted to C by an automatic tool, that would save me a huge amount of time. This is why I got attracted to Cython. Efficiency is not important; my code is not a number cruncher. But the generated code should be relatively small to fit in my limited memory (a few hundred kilobytes).
Can I use Cython as a converter from my custom Python scripts to C? My guess is yes, in which case, can I run those .c files on my hardware? My guess is no, since I would need Cython on my hardware as well for them to run. But if it just creates some .c files, I could go through them and make them standalone, since the code does not use many features of Python; it just uses Python as a quick-to-implement scripting language.
Yes, at its core this is what Cython does. But ...
You don't need Cython on the target; however, you do need libpython. You may feel like your code doesn't use that many Python features, but I think if you try this you'll find that's not true -- you won't be able to separate your program from its dependence on libpython while still using the Python language.
Another option is PyPy, specifically its translation toolchain, NOT the PyPy Python interpreter. It lets you translate RPython, a subset of the Python language, into C. If you really aren't using many Python language features or libraries, this may work.
PyPy is mostly known as an alternative Python implementation, but it is also a set of tools for compiling dynamic languages into various forms. This is what allows the PyPy implementation of Python, written in (R)Python, to be compiled to machine code.
If C++ is available, Nuitka is a Python-to-C++ compiler that works for regular Python, not just RPython (which is what Shed Skin and PyPy use).
If C++ is available for that embedded platform, there is also Shed Skin, which converts Python into C++.
