Python: Using modules which support different Python versions in one project

I have two Python modules where one only supports Python 2.x and the other only 3.x.
Unfortunately I need both for a project.
My workaround for now is to run them as separate programs and build up their communication via the socket module.
I will end up with two executables, which I would like to avoid.
The "connection" between the two modules has to be as fast as possible.
So my question is whether there is a way to somehow combine both into one executable in the end, and whether there is a better solution for fast communication than the client-server construction I have now.

There really is no good way to avoid that workaround.
Conceptually, there's no reason that you couldn't embed two interpreters into the same process. But practically, the CPython interpreter depends on some static/global state. While 3.7 is much better about that than, say, 3.0 or 2.6 was, that state still hasn't been fully eliminated.[1] And, the way C linkage works, there's no way to get around that without changing the interpreter.
Also, embedding CPython isn't hard, but it's not trivial, in the way that running an interpreter as a subprocess is trivial—and it may be harder than coming up with an efficient way to pass or share state between subprocesses.
Of course there are other interpreters besides CPython. But the other major implementation with both 2.7 and 3.x versions isn't easily embeddable (PyPy), and the two that are easily embeddable don't have 3.x versions, and also can only be embedded in another VM, and can't run C extension modules (Jython and IronPython). It is possible to do something like using JEP to embed CPython 3.7 via JNI in a JVM while also using Jython 2.7 natively in that same JVM, but I doubt that approach will work for you.
Meanwhile, I mentioned that passing or sharing data between processes generally isn't that hard.
If you don't have that much data, you can usually just pass it pickled over a pipe.
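For the pipe option, a minimal sketch (the worker script name and the python2 executable on PATH are assumptions for illustration):

# Parent (Python 3) side: sends one pickled object to a hypothetical
# Python 2 worker script over stdin and reads a pickled reply.
import pickle
import subprocess

proc = subprocess.Popen(
    ["python2", "py2_worker.py"],   # hypothetical worker script
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# Protocol 2 is the highest pickle protocol Python 2.x understands.
out, _ = proc.communicate(pickle.dumps({"numbers": [1, 2, 3]}, protocol=2))
result = pickle.loads(out)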
If you do have a ton of data, it usually is, or could be, stored in memory in some structured form—numpy arrays, big hunks of ASCII or UTF-8 text, arrays of ctypes structs, etc.—that you can overlay on an mmap or shared memory segment.
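For example, here's a minimal sketch of overlaying a numpy array on a memory-mapped file so two independent interpreters can see the same data without copying (the file name and shape are made up for illustration):

import numpy as np

# Writer side: creates the backing file and fills it in place.
shared = np.memmap("shared.dat", dtype="float64", mode="w+", shape=(1000,))
shared[:] = np.arange(1000)
shared.flush()

# Reader side (could be a separate Python 2 or 3 process): maps the
# same file read-only; the data is never copied through a pipe.
view = np.memmap("shared.dat", dtype="float64", mode="r", shape=(1000,))
print(view[:5])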
Or, of course, you can come up with your own protocol and communicate with it over a (UNIX or IP) socket. But you don't necessarily have to jump right to that option.
Notice that multiprocessing supports both of the first two—although to take advantage of it with independent interpreters, you have to dig into its source and pull out the bits you need. And there are also third-party libraries that can help. (For example, if you need to pickle things that don't pickle natively, the answer is often as simple as "replace pickle with dill".)
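As a quick illustration of that dill swap (dill's dumps/loads mirror pickle's):

import dill

# The stdlib pickle can't serialize a lambda; dill can.
payload = dill.dumps(lambda x: x + 1)
func = dill.loads(payload)
print(func(41))  # 42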
[1] Running multiple subinterpreters in various restricted ways does sort of work with things like mod_wsgi, and PEP 554 aims to get things to the state where you can easily and cleanly run multiple 3.7 subinterpreters in the same process, but still nothing like completely independent embeddings of CPython—the subinterpreters share a GIL, a cycle collector, an atexit handler, etc.

Related

Isolated Sub-Interpreters in Python without GIL

There are PEP 554 and PEP 684. Both are designed to support multiple interpreters at the thread level.
Does anyone know if these PEPs are implemented somewhere, at least in experimental or pre-release versions of Python like 3.11?
I found out that Python 3.10 (maybe even 3.9) has these features in an experimental build, if you build CPython by configuring with the following flag:
./configure --with-experimental-isolated-subinterpreters
or by adding this define to the compile command when compiling all .c files:
#define EXPERIMENTAL_ISOLATED_SUBINTERPRETERS 1
I posted a request to enable this feature in one famous project; see the issue here.
After enabling this feature, as I understand it, I will be able to create separate interpreters inside multiple threads (not processes), meaning that I won't need multiprocessing anymore.
More than that, according to this feature's description, with multiple interpreters there is no need for a single GIL: every interpreter in its own thread has its own GIL. This means that even though the interpreters are created inside threads, all CPU cores are still used, just like with multiprocessing. Current Python suffers from the GIL only because it forces everything onto a single CPU core; that is why people use multiprocessing to overcome this and use all cores.
In the description of these features it was said that the authors had to modify around 1500 static and global variables by hand, moving them all into a per-thread local table inside the thread state structure.
Presumably all these new features can currently be used only from the Python C API.
If someone here knows how to use these isolated sub-interpreter features, can you provide some Python code or C API code with a detailed example of how to use them?
Specifically, I'm interested in how to use the interpreters in such a way that all CPU cores are used, i.e. how to avoid a single GIL and instead use multiple GILs (actually local interpreter locks, LILs). And of course I want this inside threads, without using multiprocessing.
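For reference, a minimal sketch using CPython's private, experimental subinterpreter module (shipped in some 3.8+ builds; the module name and API have changed between versions, so treat this as illustrative only, not a stable interface):

# _xxsubinterpreters is a private CPython module; its name and API
# are experimental and vary across versions.
import _xxsubinterpreters as interpreters

interp_id = interpreters.create()
try:
    # Code is passed as a string; objects are not shared between
    # interpreters, so any state must be serialized explicitly.
    interpreters.run_string(interp_id, "print('hello from a subinterpreter')")
finally:
    interpreters.destroy(interp_id)

Note that per-interpreter GILs are exactly what PEP 684 targets; whether a given build actually gives each subinterpreter its own GIL depends on the version and build flags.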

Interface Jython script from Python3 module?

If I have a script, or in this case just a function or two, written in Jython -- is there a way to interact with that code from my Python3 project?
No, not until Jython catches up with CPython enough for your whole Python 3 project to run in Jython. You can't run part of a Python application with one interpreter and the rest with another. You might be able to juggle multiple processes communicating via remote procedure calls and pickle, but it'll be complex and brittle, not to mention slow (all the data involved has to be copied). If it's pure Python, just port those two functions to Python 3 (likely easy), or port your project to Python 2.5 (probably much harder). If the code uses Jython's JVM interop, there are alternatives that work with CPython, though possibly less mature ones. Depending on what you need Java for, there might be a pure Python alternative.

Cocoa: how safe is it to make an app that is coupled with python subroutines?

I want to make a Cocoa OS X app. I would prefer to use Python scripts at its core. However, I'm not sure how safe that is. I know that Python's penetration is quite high, but what about version conflicts and migrations? Is it worth bundling the whole Python runtime into the OS X app?
Thanks.
So... what this really boils down to is compatibility issues across versions, something that scripting languages are notoriously bad at maintaining. Python does better than most, but it is still quite problematic.
Apple has generally shipped legacy versions of interpreters on the system for exactly this reason. Thus, if you do rely on the system installed Python, I would recommend locking to a particular version. I.e. use /usr/bin/python2.6 and not the generic /usr/bin/python.
The alternative is, as you state, to bundle the Python interpreter and any needed resources into your app. That is a bit of a pain in the butt to do, but it addresses the compatibility issue. More or less; the reality is that Python is, effectively, an interface to the OS and, thus, is quite large, with the potential to break across any release. Not much you can do about that, though.
Another possibility is to go the route that @kindall proposes: use PyObjC and implement your Cocoa application entirely or mostly in Python. Works fine. Been there, done that, and wouldn't do it again, frankly, as the maintenance/debugging issues of large-scale scripted applications are nasty.
As an alternative, you might want to investigate using Lua (http://www.lua.org) as it is very much designed to be embedded in applications. Lua has a tiny interpreter and you can fully control exactly what features of your app are accessible at runtime. For example, World of Warcraft's UI is mostly implemented as Lua gluing together a set of fast UI primitives. Fully customizable on the client side, which is really impressive when you consider the security implications.
You should use py2app. It will bundle a Python executable, all the libraries you need, and your script together into a single executable. You can then add other executables (e.g. your Objective-C parts) into that app bundle.
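The standard py2app recipe is a small setup.py next to your script (the script name here is just a placeholder):

# setup.py for py2app; build with: python setup.py py2app
from setuptools import setup

setup(
    app=["main.py"],  # placeholder: your top-level script
    options={"py2app": {"argv_emulation": True}},
    setup_requires=["py2app"],
)

Running python setup.py py2app then produces a self-contained .app bundle in dist/.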

Performance differences between python from package and python compiled from source

I would like to know if there are any documented performance differences between a Python interpreter installed from an RPM (or via yum) and a Python interpreter compiled from source (with a priori well-set compilation flags).
I am using a Red Hat 6.3 machine as a Django/Apache/mod_wsgi production server. I have already properly compiled everything in different setups and in different orders. However, I usually keep the build-dev dependencies on such a machine. For various ego-related (and more or less practical) reasons, I would like to use Python 2.7.3. By default, Red Hat comes with Python 2.6.6. I think I could go with it, but it would hurt me somehow (I would have to drop and find replacements for a few libraries, and my ego).
However, besides my ego and dependencies, I would like to know what would be the impact in terms of performance for a Django server.
If you compile with the exact same flags that were used to compile the RPM version, you will get a binary that's exactly as fast. And you can get those flags by looking at the RPM's spec file.
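One quick way to see the flags a given interpreter was actually built with is to ask it directly (sysconfig is in the standard library in 2.7+ and 3.x; the exact variables available depend on the platform):

# Print the compiler flags recorded at build time for this interpreter.
import sysconfig

print(sysconfig.get_config_var("CFLAGS"))
print(sysconfig.get_config_var("OPT"))

Running that under both the RPM build and your own build makes the comparison concrete.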
However, you can sometimes do better than the pre-built version. For example, you can let the compiler optimize for your specific CPU, instead of for "general 386 compatible" (or whatever the RPM was optimized for). Of course if you don't know what you're doing (or are doing it on purpose), it's always possible to build something slower than the pre-built version, too.
Meanwhile, 2.7.3 is faster in a few areas than 2.6.6. Most of them usually won't affect you, but if they do, they'll probably be a big win.
Finally, for the vast majority of Python code, the speed of the Python interpreter itself isn't relevant to your overall performance or scalability. (And when it is, you probably want to try PyPy, Jython, or IronPython to replace CPython.) This is especially true for a WSGI service. If you're not doing anything slow, Apache will probably be the bottleneck. If you are doing anything slow, it's probably something I/O bound and well outside of Python's control (like reading files).
Ultimately, the only way you can know how much gain you get is by trying it both ways and performance testing. But if you just want a rule of thumb, I'd say expect a 0% gain, and be pleasantly surprised if you get lucky.

Are Python threads buggy?

A reliable coder friend told me that Python's current multi-threading implementation is seriously buggy, enough to avoid using it altogether. What can be said about this rumor?
Python threads are good for concurrent I/O programming. Threads are swapped out of the CPU as soon as they block waiting for input from a file, the network, etc. This allows other Python threads to use the CPU in the meantime. This would allow you to write a multi-threaded web server or web crawler, for example.
However, Python threads are serialized by the GIL when they enter the interpreter core. This means that if two threads are crunching numbers, only one can run at any given moment. It also means that you can't take advantage of multi-core or multi-processor architectures.
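A small illustration of the I/O-bound case where threads do help (modern Python 3 syntax; the URLs are just placeholders):

# Each thread blocks on a network read; the GIL is released while it
# waits, so the downloads overlap.
import threading
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()))

urls = ["https://example.com", "https://example.org"]  # placeholders
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()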
There are solutions like running multiple Python interpreters concurrently, using a C based threading library. This is not for the faint of heart and the benefits might not be worth the trouble. Let's hope for an all Python solution in a future release.
The standard implementation of Python (generally known as CPython as it is written in C) uses OS threads, but since there is the Global Interpreter Lock, only one thread at a time is allowed to run Python code. But within those limitations, the threading libraries are robust and widely used.
If you want to be able to use multiple CPU cores, there are a few options. One is to use multiple python interpreters concurrently, as mentioned by others. Another option is to use a different implementation of Python that does not use a GIL. The two main options are Jython and IronPython.
Jython is written in Java, and is now fairly mature, though some incompatibilities remain. For example, the web framework Django does not run perfectly yet, but is getting closer all the time. Jython is great for thread safety, comes out better in benchmarks and has a cheeky message for those wanting the GIL.
IronPython uses the .NET framework and is written in C#. Compatibility is reaching the stage where Django can run on IronPython (at least as a demo) and there are guides to using threads in IronPython.
The GIL (Global Interpreter Lock) might be a problem, but the API is quite OK. Try out the excellent processing module, which implements the threading API for separate processes. I am using it right now (albeit on OS X; I have yet to do some testing on Windows) and am really impressed. The Queue class is really saving my bacon in terms of managing complexity!
EDIT: it seems the processing module is being included in the standard library as of version 2.6 (import multiprocessing). Joy!
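A sketch of that Queue-based pattern with the stdlib multiprocessing module (the square-the-numbers workload is just a stand-in):

# One worker process consumes tasks from an input queue and pushes
# results to an output queue; None acts as a shutdown sentinel.
import multiprocessing

def worker(in_q, out_q):
    for item in iter(in_q.get, None):
        out_q.put(item * item)

if __name__ == "__main__":
    in_q, out_q = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(in_q, out_q))
    p.start()
    for i in range(5):
        in_q.put(i)
    in_q.put(None)  # tell the worker to stop
    print(sorted(out_q.get() for _ in range(5)))
    p.join()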
As far as I know there are no real bugs, but the performance of threading in CPython is really bad (compared to most other threading implementations, but usually good enough if most of what the threads do is block) due to the GIL (Global Interpreter Lock), so it is really implementation-specific rather than language-specific. Jython, for example, does not suffer from this because it uses the Java thread model.
See this post on why it is not really feasible to remove the GIL from the CPython implementation, and this one for some practical elaboration and workarounds.
Do a quick Google search for "Python GIL" for more information.
If you want to code in Python and get great threading support, you might want to check out IronPython or Jython. Since Python code in IronPython and Jython runs on the .NET CLR and the Java VM respectively, they enjoy the great threading support built into those platforms. In addition to that, IronPython doesn't have the GIL, an issue that prevents CPython threads from taking full advantage of multi-core architectures.
I've used threading in several applications and have never had, nor heard of, it being anything other than 100% reliable, as long as you know its limits. You can't spawn 1000 threads at the same time and expect your program to run properly on Windows; however, you can easily write a worker pool and just feed it 1000 operations, keeping everything nice and under control (see the sketch below).
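A sketch of that fixed worker pool fed from a queue (the 1000 no-op tasks are placeholders for real work):

# A fixed number of threads drain a task queue; None is the shutdown
# sentinel that tells each worker to exit.
import queue
import threading

NUM_WORKERS = 8
tasks = queue.Queue()

def worker():
    for task in iter(tasks.get, None):
        task()  # each task is a callable

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(1000):
    tasks.put(lambda i=i: None)  # placeholder operations

for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()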
