Speeding up Parts of Existing Python App with PyPy or Shedskin

Speeding up Parts of Existing Python App with PyPy or Shedskin - python

I am looking to bring speed improvements to an existing application and I'm looking for advice on my possible options. The application is written in Python, uses wxPython, and is packaged with py2exe (I only target windows platforms). Parts of the application are computationally intensive and run too slowly in interpreted Python. I am not familiar with C, so porting parts of the code over is not really an option for me.
So my question is basically do I have a clear picture of my options as I outline below, or am I approaching this from the wrong direction?
Running with pypy: Today I started experimenting with Pypy - the results are exciting, in that I can run large parts of the code from the pypy interpreter and I'm seeing 5x+ speed improvements with no code changes. However, if I understand correctly, (a) Pypy with wxpython support is still a work in progress, and (b) I cannot compile it down to an exe for distribution anyway. So unless I'm mistaken, this seems like a no-go for me? There's no way to package things up so parts of it are executed with pypy?
Converting code to RPython, translating with pypy So the next option seems to be actually rewriting parts of the code to the pypy restricted language, which seems like a pretty large job. But if I do that, parts of the code can then be compiled to an executable (?) and then I can access the code through ctypes (?).
Other restricted options Shedskin seems to be a popular alternative here, does this fit my requirements better? Other options seem to be Cpython, Psyco, and Unladen, but they are all superseded or no longer maintained.

Using PyPy indeed rules out py2exe and similar tools, at least until one is ported (AFAIK there is no active work on that). Still, as PyPy binaries do not need to be installed, you might get away with a more complicated distribution that includes both your Python source code and a PyPy binary+stdlib and uses a small wrapper (batch file, executable) to ease launching. I can't comment on whether WxPython on PyPy is mature enough to be used, but perhaps someone on pypy-dev, wxpython-dev or either one's IRC channel can give a recommendation if you describe your situation.
Translating your code into RPython does not seem viable to me. The translation toolchain is not really a tool for general purpose development, and producing a C dll for embedding/ctypes seems nontrivial. Also, RPython code really is low-level, making your Python code restricted enough may amount to rewriting half of it.
As for other restricted options: You seem to mix up CPython (the original Python interpreter written in C) with Cython (a compiler for a Python-like language that emits C code suitable for CPython extension modules). Both projects are active. I'm not very familiar with Shedskin, but it seems to be a tool for developing whole programs, with little or no interaction with non-restricted Python code. Cython seems a much better fit: Although it requires manual type annotations and lower-level code to achieve really good performance, it's trivial to use from Python: The very purpose of the project is producing extension modules.

I would definitely look into Cython, I've been playing with it some and have seen speedups of ~100x over pure python. Use the profile module to find the bottlenecks first. Usually the loops are the biggest chances to increase speed when going to Cython. You should also look into seeing if you can use array/vector operations in Numpy instead of loops, if so that can also give extreme performance boosts. For instance:
a = range(1000000)
for i in range(len(a)):
a[i] += 5
is slow, real slow. On the other hand:
a = numpy.arange(10000000)
a = a +5
is fast, real fast.

Correction: shedskin can be used to generare extention modules, as well as whole programs.

Related

AoT Compiler for Python

I want to get my Python script working on a bare metal device like microcontroller WITHOUT the need for an interpreter. I know there are already JIT compilers for Python like PyPy, and interpreters like CPython.
However, existing interpreters I've seen (such as CPython) take up large memory (in MB range).
Is there an AOT compiler for Python (i.e. compiling directly to native hardware through intermediary like LLVM)?
I assume such a compiler would enable Python to run much faster compared to existing implementations AND with lower memory footprint. If there is, I wonder why that solution hasn't been popularized.

As you already mentioned Cython is an option (However, it is true that the result is big due since the C runtime need to implement the Python functionality together with your program).
With regards to LLVM there was a project by Google named unladen swallow. However, that project is mostly abandoned. You can find some information about it here
Basically it was an attempt to bring LLVM optimizations into the runtime of Cython. E.g JITTING Python code.
Another old alternative was shed skin which compiles Python to C++. Some information about it can be found here.
Yet another option similar to shed skin is to restrict yourself to a subset of the Python language and use micropython.

An alternative would be to use GraalVM with Truffle AOT with Python.
It's basically Python running on an optimized AOT for jvm.
The project looks promising. You can chek this link here:
https://www.graalvm.org/22.2/graalvm-as-a-platform/language-implementation-framework/AOT/

Recently came across codon,
Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 10-100x or more, on a single thread. Codon's performance is typically on par with (and sometimes better than) that of C/C++.

Is it possible to make a Python script its own OS?

I have written a Python script that I want to be able to run separately from the OS. As a result, I want it to be able to run as pretty much its own OS. Everything is shell based and there are no GUI components. How would I go about making a bootable version of this Python script, and is it even possible? Or would i have to put a bare-bones OS with the Python script installed as an addition?

For all practical purposes, the answer is no, it's not possible. The bare-bones os with a python install is your best option.

It is theoretically possible to implement a stand-alone Python interpreter, which runs as the OS of a system. That has been done with several interpreter languages. First of all, BASIC, then APL, Forth, Smalltalk, Java, and most likely some I'm not aware of. There is no reason why this couldn't be done with Python, it merely needs a bit of implementation work.
To give a rough estimation of the amount of work: a caffeine-driven expert coder takes about two days to implement a stand alone Forth interpreter, doubling as operating system, from scratch, on a platform (s)he's familiar with. By then the system will be far from finished or complete, but it will compile source, and allow extending itself with the results of that compilation. Other languages would differ. In case of Java, only porting an assembly implementation from one to an entirely different CPU took a team of 4 people about 2 weeks, while the back end of Java, the JVM, has a considerable resemblance to Forth. I'm not knowledgeable enough about Python inner workings to be able to give a meaningful estimate of the effort to implement a stand alone Python interpreter, but it could be roughly comparable to the effort of implementing a Java VM.

Performance differences between python from package and python compiled from source

I would like to know if there are any documented performance differences between a Python interpreter that I can install from an rpm (or using yum) and a Python interpreter compiled from sources (with a priori well set flags for compilations).
I am using a Redhat 6.3 machine as Django/Apache/Mod_WSGI production server. I have already properly compiled everything in different setups and in different orders. However, I usually keep the build-dev dependencies on such machine. For some various ego-related (and more or less practical) reasons, I would like to use Python-2.7.3. By default, Redhat comes with Python-2.6.6. I think I could go with it but it would hurt me somehow (I would have to drop and find a replacement for a few libraries and my ego).
However, besides my ego and dependencies, I would like to know what would be the impact in terms of performance for a Django server.

If you compile with the exact same flags that were used to compile the RPM version, you will get a binary that's exactly as fast. And you can get those flags by looking at the RPM's spec file.
However, you can sometimes do better than the pre-built version. For example, you can let the compiler optimize for your specific CPU, instead of for "general 386 compatible" (or whatever the RPM was optimized for). Of course if you don't know what you're doing (or are doing it on purpose), it's always possible to build something slower than the pre-built version, too.
Meanwhile, 2.7.3 is faster in a few areas than 2.6.6. Most of them usually won't affect you, but if they do, they'll probably be a big win.
Finally, for the vast majority of Python code, the speed of the Python interpreter itself isn't relevant to your overall performance or scalability. (And when it is, you probably want to try PyPy, Jython, or IronPython to replace CPython.) This is especially true for a WSGI service. If you're not doing anything slow, Apache will probably be the bottleneck. If you are doing anything slow, it's probably something I/O bound and well outside of Python's control (like reading files).
Ultimately, the only way you can know how much gain you get is by trying it both ways and performance testing. But if you just want a rule of thumb, I'd say expect a 0% gain, and be pleasantly surprised if you get lucky.

Developing PyPy's Rpython as a general programming language [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Is there any interest in developing Rpython (Restricted Python) from the PyPy project as a general purpose programming language? Perhaps it could be a fork from the PyPy project. Does such a project exist? Since the programs are compiled, one could simply contribute modules written in Rpython, and it could compete with other python implementations including CPython and PyPy.

I can't speak for everyone else, but I personally am extremely interested in using RPython as a general-purpose language. To answer some other people's questions:
Why? Because Cython is a pain to figure out how to use. If you don't put in a lot of tricky type declarations just right, you don't get any speedup. With RPython, it will run fast or it won't run at all.
Using PyPy offers a good speedup, but currently not nearly as much as RPython.
RPython might be a good way to get super-fast, somewhat Pythonic code. Here's an example to help you get started. I'm not aware of any large projects to do this, unfortunately.

Yes, there is already a project to use the translation tool chain of PyPy to create standalone executables and libraries using RPython. It is called RPythonic.

A general purpose RPython would not be a competitor for CPython and PyPy. It wouldn't even be a competitor for things like Cython.
Why? Because RPython is not Python!! RPython is merely a language which shares some syntax with Python; enough that a Python interpreter can execute RPython code, but you cannot take Python code and compile it as RPython.
The restrictions on Python that are added to enable RPython to be statically typed and compiled (indeed, to have static type inference) are so severe that they completely change the language. If you want to write an RPython program, you have to decide to do so up front and write it in RPython. You can't write a program in Python and then decide to compile it as RPython, and you can't even tweak a realistic Python program a bit to make it RPython. Normal Python code is nothing like normal RPython code; writing RPython is more similar to writing Java or C#.
So if you want to write general programs in Python but you want them to go faster, RPython has very little to offer you. That's the niche PyPy's Python interpreter is trying to fill.
If you want to write general programs in a lower level compiled language because you need your program to run faster than Python, then there are existing very mature languages and libraries for that, like Java or C#.
The reason to code in RPython would be to do something that is particularly made better by RPython. Like writing interpreters to which you can add garbage collectors and JIT compilers without having to write them by hand! Here RPython shines, and yes I would be very interested in a more polished and usable RPython interpreter-writing environment. But as a general purpose programming language for writing programs that don't particularly benefit from RPython's specialities, it would simply be a monstrous amount of work to get it to the point where it could compete with existing languages that already fill that role better. It barely even has a standard library now (because almost none of Python's extensive standard library is usable in RPython), for example.

From the looks of it, the restrictions are quite severe and on the whole it's a lot less to program in, I imagine. That's necessary for implementing PyPy, but generally if you want fast compiled code that can interact with Python, you'd use Cython (which is targeted at CPython extensions and supports pretty much all of Python seamlessly) or write code in one of the more common languages that can do this. And if you just want fast, compiled code... well, RPython may be more pleasant than e.g. C, but still, I don't see a significant advantage here (at least none that would warrant the effort to create a usable, stable language).

Why would I want to write directly in RPython?
It seems so much simpler to Python code and run PyPy.
Why would I want to write C code?
It seems so much simpler to write Python and have PyPy be implemented in C
Why would I want to write assembler code?
It seems so much simpler to write Python and have PyPy implemented in C and C implemented in Assembler.
I guess it really is turtles all the way down.
Why would I want to stop using the most convenient language and switch to a less convenient language?
What's the value in giving up a nice language?

how to speed up code?

i want to speed my code compilation..I have searched the internet and heard that psyco is a very tool to improve the speed.i have searched but could get a site for download.
i have installed any additional libraries or modules till date in my python..
can psyco user,tell where we can download the psyco and its installation and using procedures??
i use windows vista and python 2.6 does this work on this ??

I suggest to not rely on this tools, anyway psycho is being replaced by the new python implementations as PyPy and unladen swallow. To speed up "for free" you can use Cython and Shedskin. Anyway this is not the right way to speedup the code in my opinion.
If you are looking for speed here are some hints:
Profiling
Profiling
Profiling
You should use the cProfile module and find the bottlenecks, then proceed with the optimization.
If the optimization in python isn't enough, rewrite the relevant parts in Cython and you're ok.

Psyco does not speed up compilation (in fact, it would slow it down). However, if your problem is compilation speed in Python, there are some serious problems with your code.
If you are trying to improve runtime performance, Psyco does work with 32bit operating systems and Python version 2.5. The latest version is the first Google result for Psyco: http://psyco.sourceforge.net/
Psyco is no longer an "interesting" project as Python 3.x has gained Unladen Swallow, and most of the developer attention is divided between that and PyPy.
There are other ways of improving performance, not limited to Cython and Shed Skin

So it seems you don't want to speed up the compile but want to speed up the execution.
If that is the case, my mantra is "do less." Save off results and keep them around, don't re-read the same file(s) over and over again. Read a lot of data out of the file at once and work with it.
On files specifically, your performance will be pretty miserable if you're reading a little bit of data out of each file and switching between a number of files while doing it. Just read in each file in completion, one at a time, and then work with them.

Use the appropriate data structures. If you see that you are doing a lot
if element in list
#or
list.index(element)
then you might be better off with sets and dictionaries.
Don't create a list only to iterate over it, use generators or the itertools module.
Read Python Performance Tips
As already mentioned, do profiling.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.