Does anybody know the fate of the Global Interpreter Lock in Python 3.1 with respect to C++ multithreading integration?
The GIL is still there in CPython 3.1. Among many other performance boosts, the Unladen Swallow project aims to eventually remove it. However, it's still some way from its goals, and it is working on 2.6 first, with the intent of eventually porting to 3.x for whatever x is current by the time the 2.y version is considered done. For now, multiprocessing (instead of threading) remains the way of choice for using multiple cores in CPython. (IronPython and Jython are fine too, but they don't currently support Python 3, and they don't make C++ integration all that easy ;-).)
Significant changes will occur in the GIL for Python 3.2. Take a look at What's New in Python 3.2, and at the mailing-list thread that initiated the change.
While the changes don't signify the end of the GIL, they herald potentially enormous performance gains.
Update
The new GIL in 3.2, written by Antoine Pitrou, brought negligible general performance gains. Instead, it focused on improving contention issues that arise in certain corner cases.
David Beazley made an admirable effort to implement a scheduler that significantly improved performance when CPU-bound and IO-bound threads are mixed, but it was unfortunately shot down.
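One visible result of the 3.2 rewrite is the switching policy. The old instruction-count "check interval" (sys.setcheckinterval) gave way to a time-based switch interval. That interval defaults to 5 ms and is exposed as sys.getswitchinterval() and sys.setswitchinterval(). The following is a minimal sketch, assuming CPython 3.2 or later, of how to inspect the interval and change it temporarily. The switch_interval helper is hypothetical and written only for this illustration:

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change the GIL switch interval (CPython 3.2+).

    The interval is roughly how long a waiting thread lets the GIL holder
    run before asking it to release the lock. The setting is
    interpreter-wide, so the previous value is restored on exit.
    """
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current switch interval:", sys.getswitchinterval())
    with switch_interval(0.001):
        print("inside the block:", sys.getswitchinterval())
    print("restored:", sys.getswitchinterval())
```

Note that a shorter interval makes threads more responsive to each other. It does not let two threads run Python bytecode at the same time.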
The Unladen Swallow work was proposed for merging in Python 3.3, but this has been withdrawn due to lack of results in that project. PyPy is now the preferred project and is currently requesting funding to add Python3k support. There's very little chance that PyPy will become the default at present.
Efforts to remove the GIL from CPython have been made for the last 15 years, but for the foreseeable future it is here to stay.
The GIL will not affect code that does not use Python objects. In NumPy, we release the GIL for computational code (linear algebra calls, etc.). The underlying code can then use multithreading freely; in fact, it generally lives in third-party libraries that know nothing about Python.
The GIL is a good thing.
Just make your C++ application release the GIL while it is doing its multithreaded work. Python code will continue to run in the other threads, unspoiled. Only acquire the GIL when you have to touch Python objects.
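Pure Python code can't release the GIL by itself, but C code it calls can. The standard library has one documented case: hashlib releases the GIL while hashing data larger than 2047 bytes, so several threads can hash large buffers in parallel. The following is a minimal sketch built on that behavior. hash_in_threads is a hypothetical helper, and the buffer sizes and thread count are arbitrary:

```python
import hashlib
import threading


def hash_in_threads(buffers):
    """Hash each buffer in its own thread; return hex digests in input order.

    The heavy work runs inside hashlib's C code, which drops the GIL for
    large inputs. The GIL is needed again only to build the digest string
    and store it in the shared list (each thread writes its own slot).
    """
    results = [None] * len(buffers)

    def worker(index, data):
        results[index] = hashlib.sha256(data).hexdigest()

    threads = [threading.Thread(target=worker, args=(i, buf))
               for i, buf in enumerate(buffers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]  # 4 x 8 MiB
    for digest in hash_in_threads(chunks):
        print(digest)
```

A C++ extension follows the same pattern: release the GIL around the native work, and reacquire it only to hand results back as Python objects.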
I guess there will always be a GIL.
The reason is performance. Making all the low-level access thread-safe means putting a mutex around each hash operation, and that is heavy. Remember that a simple statement like
self.foo(self.bar, 3, val)
might already involve at least 3 hashtable lookups (if val is a global). It may involve many more if the method cache is not hot, depending on the inheritance depth of the class.
It's expensive. That's why Java dropped the idea and introduced hashtables that do not use a monitor call, to get rid of its "Java Is Slow" trademark.
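To make the lookup count concrete, the sketch below counts the dictionary probes needed to evaluate self.foo(self.bar, 3, val). It uses a two-level class hierarchy, with val as a module global. This is a deliberately simplified model, an assumption for illustration rather than CPython's real code path. It checks the instance dict before the class MRO, and it ignores data descriptors, __getattr__, __slots__ and the per-type method cache. All names (ProbeCounter, model_getattr, model_getglobal, Base, Child) are hypothetical:

```python
import builtins

_MISSING = object()  # sentinel for "key not present"


class ProbeCounter:
    """Counts hashtable (dict) probes made by the simplified model below."""

    def __init__(self):
        self.count = 0

    def probe(self, mapping, name):
        self.count += 1
        return mapping.get(name, _MISSING)


def model_getattr(obj, name, counter):
    """Simplified attribute lookup: instance __dict__, then each class in the MRO.

    Ignores data descriptors, __getattr__, __slots__ and the method cache.
    """
    value = counter.probe(vars(obj), name)
    if value is not _MISSING:
        return value
    for klass in type(obj).__mro__:
        value = counter.probe(klass.__dict__, name)
        if value is not _MISSING:
            # Functions found on a class become bound methods.
            if hasattr(value, "__get__"):
                return value.__get__(obj, type(obj))
            return value
    raise AttributeError(name)


def model_getglobal(name, module_globals, counter):
    """Module globals first, then builtins (a second probe on a miss)."""
    value = counter.probe(module_globals, name)
    if value is _MISSING:
        value = counter.probe(vars(builtins), name)
    if value is _MISSING:
        raise NameError(name)
    return value


class Base:
    def foo(self, a, b, c):
        return (a, b, c)


class Child(Base):  # one extra level of inheritance
    def __init__(self):
        self.bar = "bar"


val = 42  # the module-level global from the statement above

if __name__ == "__main__":
    obj = Child()  # plays the role of `self`
    counter = ProbeCounter()
    # Model of self.foo(self.bar, 3, val), evaluated left to right.
    method = model_getattr(obj, "foo", counter)          # 3 probes: obj, Child, Base
    bar = model_getattr(obj, "bar", counter)             # 1 probe: obj.__dict__
    value = model_getglobal("val", globals(), counter)   # 1 probe: globals
    print(method(bar, 3, value), "probes:", counter.count)
```

In this model, every extra level of inheritance adds one more probe to the foo lookup. Each probe is a hashtable access that a GIL-free interpreter would also have to make thread-safe.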
As I understand it, the "Brain Fuck Scheduler" (BFS) will replace the GIL from Python 3.2.
BFS (Brain Fuck Scheduler)
If the GIL is getting in the way, just use the multiprocessing module. It spawns new processes but uses the threading model and (most of) the API. In other words, you can do process-based parallelism in a thread-like way.
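The thread-like API is easiest to see with Pool. multiprocessing.Pool runs tasks in worker processes, each with its own interpreter and GIL. multiprocessing.dummy.Pool exposes the same interface but is backed by threads. The following is a minimal sketch. factorials is a hypothetical helper, and math.factorial merely stands in for a CPU-bound function. It is a built-in, so it can be pickled and sent to the worker processes:

```python
import math
import multiprocessing
import multiprocessing.dummy  # same Pool API, backed by threads


def factorials(pool_factory, inputs, workers=4):
    """Compute math.factorial for each input using the given Pool factory.

    multiprocessing.Pool uses worker processes (parallel for CPU-bound work).
    multiprocessing.dummy.Pool uses threads, which share a single GIL.
    """
    with pool_factory(workers) as pool:
        return pool.map(math.factorial, inputs)


if __name__ == "__main__":  # needed where processes are started with "spawn"
    inputs = [5000, 5001, 5002, 5003]
    by_process = factorials(multiprocessing.Pool, inputs)
    by_thread = factorials(multiprocessing.dummy.Pool, inputs)
    print(by_process == by_thread)
```

Switching between the two is a one-argument change. That makes it easy to measure whether processes actually pay off for a given workload, once the cost of pickling arguments and results between processes is included.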
Related
I'm hoping someone can provide some insight as to what's fundamentally different about the Java Virtual Machine that allows it to implement threads nicely without the need for a Global Interpreter Lock (GIL), while Python necessitates such an evil.
Python (the language) doesn't need a GIL (which is why it can perfectly be implemented on JVM [Jython] and .NET [IronPython], and those implementations multithread freely). CPython (the popular implementation) has always used a GIL for ease of coding (esp. the coding of the garbage collection mechanisms) and of integration of non-thread-safe C-coded libraries (there used to be a ton of those around;-).
The Unladen Swallow project, among other ambitious goals, does plan a GIL-free virtual machine for Python -- to quote that site, "In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC system, something like IBM's Recycler (Bacon et al, 2001)."
The JVM (at least HotSpot) does have a concept similar to the "GIL"; it's just much finer in its lock granularity. Most of this comes from HotSpot's GCs, which are more advanced.
In CPython it's one big lock (probably not entirely true, but good enough for argument's sake). In the JVM it's more spread out, with different concepts depending on where it is used.
Take a look at, for example, vm/runtime/safepoint.hpp in the hotspot code, which is effectively a barrier. Once at a safepoint the entire VM has stopped with regard to java code, much like the python VM stops at the GIL.
In the Java world, such VM pausing events are known as "stop-the-world" events. At these points, only native code that meets certain criteria runs freely; the rest of the VM has been stopped.
Also, the lack of a coarse lock in Java makes JNI much more difficult to write, because the JVM makes fewer guarantees about its environment for FFI calls. This is one of the things that CPython makes fairly easy (although not as easy as using ctypes).
There is a comment down below in this blog post http://www.grouplens.org/node/244 that hints at why it was so easy to dispense with a GIL for IronPython or Jython: CPython uses reference counting, whereas the other two VMs have garbage collectors.
I don't get the exact mechanics of why this is so, but it does sound like a plausible reason.
In this link they have the following explanation:
... "Parts of the Interpreter aren't threadsafe, though mostly because making them all threadsafe by massive lock usage would slow single-threaded extremely (source). This seems to be related to the CPython garbage collector using reference counting (the JVM and CLR don't, and therefore don't need to lock/release a reference count every time). But even if someone thought of an acceptable solution and implemented it, third party libraries would still have the same problems."
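The per-object bookkeeping the quote refers to is easy to observe from Python. Every new reference to an object increments its reference count, and dropping one decrements it. Without the GIL, each of those updates would have to be atomic or locked. The following is a small sketch using sys.getrefcount, which is CPython-specific. The absolute numbers include temporary references and vary between versions, so only the difference is meaningful. refcount_delta is a hypothetical helper:

```python
import sys


def refcount_delta():
    """Return (before, while_held, after_release) reference counts for a
    fresh object as one extra reference is added and then removed."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]          # storing obj in a list adds one reference
    while_held = sys.getrefcount(obj)
    del holder              # freeing the list drops that reference again
    after_release = sys.getrefcount(obj)
    return before, while_held, after_release


if __name__ == "__main__":
    before, during, after = refcount_delta()
    print("extra references while held:", during - before)
    print("back to the original count:", after == before)
```

Every such increment and decrement is a write to shared memory. This is why a lock-free CPython would need atomic reference counts or a different memory-management scheme.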
CPython lacks a JIT/AOT compiler, and at the time it was written multicore processors didn't exist. Alternatively, you could rewrite everything in Julia, which has no GIL, and gain some speed over your Python code. Also, Jython kind of sucks: it's slower than CPython and Java. If you want to stick with Python, consider using parallel plugins. You won't gain an instant speed boost, but you can do parallel programming with the right plugin.
I'm rephrasing my question because I think many people thought it was the question "does Python have threads?". It does, but CPython also has the GIL, which never lets more than one thread execute Python bytecode at any given time. That makes CPython threads useless for CPU-intensive computations.
I need to use threads; process parallelism won't work for me because of the IPC costs (I have large shared objects).
I'm currently using Jython (no GIL) with JyNI so that I can use numpy. JyNI is alpha, but it does now support numpy. I got this to work. However, JyNI is alpha and buggy, and the whole process is slow.
I've read a bunch of old threads. I wonder whether a viable option has appeared since then. I'm forced to use Python 2.7.
Thanks.
At the moment, Jython is still considerably slower than CPython. Depending on the program and how well the JIT can optimize it, multithreading might or might not pay off. Jython's primary design goal is compatibility, before performance. It is mainly intended for glue code and there is still a lot of potential for efficiency improvements. See e.g. zippy for a blazingly fast Python implementation in Java, however it is experimental and lacks Jython's compatibility level. In a way this represents the opposite design goal.
Adding JyNI to Jython does not exactly make it faster. So far, though, I have found that performance optimization in JyNI would be premature, and usually the Jython part dominates the runtime anyway. Also, for NumPy, for example, the native numerics workload vastly dominates the glue-code cost.
Finally, note that JyNI must emulate a GIL on the C side. For details, have a look at the paper https://arxiv.org/abs/1607.00825. Maybe it will be possible to operate certain extensions without a GIL; it depends on implementation details, i.e. on how sensitive an extension is to that. Currently the C-side GIL is mandatory. That's why you might not benefit from Java multithreading when using NumPy. C extensions have the option to explicitly release the GIL, e.g. during computationally intense operations that don't interact with the interpreter. I don't know if NumPy makes use of this.
JyNI is alpha and buggy
Please make sure to report bugs at the issue tracker.
CPython uses the GIL to provide mutual exclusion inside the interpreter and so prevent problems such as race conditions. However, the consequence is that the interpreter is not able to take advantage of a multi-core CPU. I also learnt that Jython does not require a GIL because its implementation is already thread-safe.
Does it mean that Jython is a superior implementation when it comes to concurrent programming and utilizing a multi-core CPU?
Yes, Jython uses Java threads (even if you're using Python's threading module), so it has no GIL. But this isn't the answer (otherwise it would have to be 42, because the question is unclear :^) ).
The better question is what criteria you have, and whether CPython or Jython would be better for them.
If you want real multithreading, Jython is your thing.
If you want to use Java and Python together, use Jython.
If you want fast execution times... then other languages may be better. You can try measuring the time of a threaded task in Python and of the same code in Jython, but I guess CPython would be faster, even with the GIL.
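If you want to run that comparison, a small harness like the following can be executed unchanged under CPython and under Jython 2.7, which is why it uses Python 2-compatible syntax. The busy-loop workload, the helper names and the thread counts are arbitrary choices for illustration. Absolute timings depend entirely on your machine and interpreter:

```python
from __future__ import print_function

import threading
import time


def count_down(n):
    """Pure-Python CPU-bound work; on CPython only one thread at a time
    can execute this bytecode, because of the GIL."""
    while n > 0:
        n -= 1


def time_threads(num_threads, work=200000):
    """Run count_down(work) in num_threads threads; return wall-clock seconds."""
    threads = [threading.Thread(target=count_down, args=(work,))
               for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    for n in (1, 2, 4):
        print("%d thread(s): %.3f s" % (n, time_threads(n)))
```

On CPython, the total time tends to grow with the thread count. An interpreter without a GIL can spread the threads across cores.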
Greets,
Zonk
The difference between Java and Python threads is:
Java is designed to lock on the resources
Python is designed to lock the thread itself(GIL)
So Python's implementation performs better on a machine with a single-core processor. This was fine 10-20 years ago. With today's increased computing capacity, however, the same piece of code performs very badly when we run it on a multiprocessor machine.
Is there any hack to disable the GIL and use resource locking in Python (like the Java implementation)?
P.S. My application is currently running on Python 2.7.12. It is compute intensive with less I/O and network blocking. Assume that I can't use multiprocessing for my use case.
I think the most straightforward way for you, which will also give you a nice performance increase, is to use Cython.
Cython is a Python superset that compiles Python-like code to C code (which makes use of the CPython API), and from there to native code. It lets you optionally type variables, which can then use native C types instead of Python objects. It also gives you direct control of the GIL.
It supports a "with nogil:" statement, in which the with block runs with the GIL released. If other threads are running (you use the normal Python threading library), they won't be blocked while code is running in the marked with block.
Just keep in mind that the GIL is there for a reason: thanks to it, global complex objects like lists and dictionaries work without the danger of getting into an inconsistent state between threads. But if your "nogil" blocks restrict themselves to local data structures, you should have no problems.
Check out the Cython project. Here is a specific example of turning off the GIL:
https://lbolla.info/blog/2013/12/23/python-threads-cython-gil
What are some good guidelines to follow when deciding to use threads or multiprocessing when speaking in terms of efficiency and code clarity?
Many of the differences between threading and multiprocessing are not really Python-specific, and some differences are specific to a certain Python implementation.
For CPython, I would use the multiprocessing module in any of the following cases:
I need to make use of multiple cores simultaneously for performance reasons. The global interpreter lock (GIL) would prevent any speedup when using threads. (Sometimes you can get away with threads in this case anyway, for example when the main work is done in C code called via ctypes, or when using Cython and explicitly releasing the GIL where appropriate. Of course, the latter requires extra care.) Note that this case is actually rather rare. Most applications are not limited by processor time, and if they really are, you usually don't use Python.
I want to turn my application into a real distributed application later. This is a lot easier to do for a multiprocessing application.
There is very little shared state needed between the tasks to be performed.
In almost all other circumstances, I would use threads. (This includes making GUI applications responsive.)
For code clarity, one of the biggest things is to learn to know and love the Queue object for talking between threads (or processes, if using multiprocessing... multiprocessing has its own Queue object). Queues make things a lot easier and I think enable a lot cleaner code.
I had a look for some decent Queue examples, and this one has some great examples of how to use them and how useful they are (with the exact same logic applying for the multiprocessing Queue):
http://effbot.org/librarybook/queue.htm
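Here is a minimal sketch of that pattern using Python 3 naming (the module is queue; in Python 2 it is Queue). A producer and a consumer communicate only through a queue.Queue. None serves as a sentinel meaning "no more work", and the squaring stands in for real work. The function names are hypothetical. The put()/get() calls look the same with multiprocessing.Queue, although results would then also have to travel back through a queue rather than a shared list:

```python
import queue
import threading

SENTINEL = None  # tells the consumer that no more items will arrive


def producer(q, items):
    for item in items:
        q.put(item)      # blocks if the bounded queue is full
    q.put(SENTINEL)


def consumer(q, results):
    while True:
        item = q.get()   # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * item)  # stand-in for real work


def run_pipeline(items):
    q = queue.Queue(maxsize=10)  # bounded, so a fast producer can't run ahead forever
    results = []
    threads = [threading.Thread(target=producer, args=(q, items)),
               threading.Thread(target=consumer, args=(q, results))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5)))  # [0, 1, 4, 9, 16]
```

No explicit locks are needed: the queue does all the synchronization between the two threads.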
For efficiency, the details and outcome may not noticeably affect most people. However, for Python <= 3.1, the CPython implementation has some interesting (and potentially brutal) efficiency issues on multicore machines that you may want to know about. These issues involve the GIL. David Beazley did a video presentation on them a while back, and it is definitely worth watching. More info here, including a follow-up about significant improvements on this front in Python 3.2.
Basically, here is my cheap summary of the GIL-related multicore issue. If you expect to get full multi-processor use out of CPython <= 2.7 by using multiple threads, don't be surprised if performance is not great, or even worse than on a single core. But if your threads do a lot of I/O (file read/write, DB access, socket read/write, etc.), you may not even notice the problem.
The multiprocessing module avoids this potential GIL problem entirely by creating a separate Python interpreter (and GIL) per process rather than per thread.
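The I/O-bound case from the summary above is easy to check. A blocking call such as time.sleep releases the GIL while it waits, so several I/O-bound threads overlap almost perfectly. Here, time.sleep stands in for a socket read or a DB query, which is an assumption made for the sake of a self-contained example. The following is a minimal sketch; the helper names, the 0.2 s delay and the four threads are arbitrary:

```python
import threading
import time


def fake_io(delay):
    time.sleep(delay)  # waits without holding the GIL, like a blocking I/O call


def elapsed_for(num_threads, delay=0.2):
    """Run num_threads simulated I/O waits concurrently; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Expect roughly 0.2 s in total rather than 4 x 0.2 s, because the waits overlap.
    print("4 threads: %.2f s" % elapsed_for(4))
```

Replacing the sleep with a pure-Python loop makes the overlap disappear on CPython. That is the CPU-bound case, where multiprocessing is the better fit.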