I was reading about PyPy's stackless feature. My question is simple: does this get around the GIL? The page says it allows coding in "massively concurrent style". Does this also mean massively parallel style, taking advantage of multiple cores?
No. The microthreads are more lightweight and convenient to program, but still can't execute in parallel for the same reason a "stackful" Python can't just run threads in parallel. Nothing about the microthreads solves the problems addressed by the GIL, and in fact they're not intended to provide parallelism.
Note that the same is true for the original CPython-based Stackless (see Stackless python and multicores?).
Since CPU-bound parallelization is not achievable in CPython due to the GIL, the official documentation recommends using multiprocessing instead of multithreading.
So, is the use of multiple processes more resource-intensive than multiple threads, compared to the multiprocessing/multithreading performance of a language like Java or C++ that supports true parallelization in both?
There is little inherent additional cost to multiprocessing in Python beyond the cost of forking (on Unix-like systems) or respawning a process. The expense comes when data or state needs to be shared among the processes; this could be anything from the iterable given to Pool.map to the proxies in Manager. As long as those costs are kept low compared to the per-process workload, it's a wash. (Note that Python is usually slower than Java and C++ for other reasons unrelated to multiprocessing.)
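For a rough feel of that trade-off, here's a minimal sketch (names and numbers are my own) that times Pool.map against a plain serial loop; as long as each call does enough work to amortize pickling the arguments and results across process boundaries, the pool pays for itself:

```python
import time
from multiprocessing import Pool

def cpu_work(n):
    # Simulate a CPU-bound task; the per-call cost should dwarf
    # the cost of pickling `n` and the result to and from a worker.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    start = time.perf_counter()
    with Pool(processes=4) as pool:
        # Each item in `jobs` is pickled, sent to a worker process,
        # and the result is pickled back -- that transfer is the
        # main per-item overhead of multiprocessing.
        pool.map(cpu_work, jobs)
    print(f"pool:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    for n in jobs:
        cpu_work(n)
    print(f"serial: {time.perf_counter() - start:.2f}s")
```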
I have had no performance problems with Python's Global Interpreter Lock. I've had to make a few things thread-safe - despite common advice, the GIL does NOT automatically guarantee thread-safety - but I've got a program commonly running upwards of 10 threads, where all of them can be active at any time, including together. It is a somewhat complex asynchronous messaging system.
I understand multiprocessing and am even using Celery in this program, but the solution would have to be very convoluted to work through multiprocessing for this problem set.
I'm running Python 2.7 and using recursive locks despite their performance penalty.
My question is this: will I run into scaling problems with the GIL? I have seen no performance problems with it so far. Measuring this is...problematic. Is there a number of threads or something similar that you hit and it just starts choking? Does GIL performance differ significantly from executing multi-threaded code on a single-core CPU?
Thanks!
The GIL is a complex topic, and the exact behavior in your case is hard to explain without seeing your code, so I can't tell you whether you will run into trouble in the future. I can only advise bringing your project to a recent version of Python 3 if possible; many improvements have been made to the GIL in Python 3.
There is no magic number of threads at which Python breaks. The general rule is simply: the more threads, the more problems. The most complicated step is going from one to two.
The GIL is released in some situations, especially when C code is executed or I/O is done, which allows code to run in parallel. Given the advanced features of modern CPUs, it wouldn't be wise to limit your code to just one core.
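As a small illustration of that last point, here's a sketch using time.sleep as a stand-in for blocking I/O; because sleep (like most blocking calls) releases the GIL, the waits overlap instead of adding up:

```python
import threading
import time

def blocking_io():
    # time.sleep releases the GIL, just like most blocking I/O calls,
    # so the four waits below overlap instead of running back to back.
    time.sleep(1)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Roughly 1 second, not 4: the threads waited in parallel.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```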
In their arXiv paper, the original authors of Julia mention the following:
2.14 Parallelism.
Parallel execution is provided by a message-based multi-processing system implemented in Julia in the standard library. The language design supports the implementation of such libraries by providing symmetric coroutines, which can also be thought of as cooperatively scheduled threads. This feature allows asynchronous communication to be hidden inside libraries, rather than requiring the user to set up callbacks. Julia does not currently support native threads, which is a limitation, but has the advantage of avoiding the complexities of synchronized use of shared memory.
What do they mean by saying that Julia does not support native threads? What is a native thread?
Do other interpreted languages such as Python or R support this type of parallelism? Is Julia alone in this?
Update
When this question was asked in 2013, Julia indeed had no support for multithreading. Today, however, Julia supports native threading with what has emerged as the best language model for composable thread programming. This model was pioneered by Cilk and Intel's Threading Building Blocks, pushed further at the language level by Go, and is now also used by Julia. Statements in the original answer below about other dynamic languages remain true: they still do not support native threading with support for parallel execution of user code. The history of adding threading capabilities to Julia progressed in the following high-level stages:
Julia 0.5: support for native threads with an OpenMP-like compute model (i.e. parallel for loops). This functionality was highly limited: no I/O or networking was allowed in parallel code.
Julia 1.3: fully composable high performance M:N threading. This threading model is the same as Go (and Cilk and Intel TBB), where tasks are used to express potential concurrency, and those tasks are run on threads from a pool of native threads by a scheduler.
Julia 1.7: support for migration of tasks between native threads. This allows a task to begin execution on one native thread, get suspended, and then resume on a different thread, allowing full utilization of available compute resources.
Original Answer
"Native threads" are separate contexts of execution, managed by the operating system kernel, accessing a shared memory space and potentially executing concurrently on separate cores. Compare this with separate processes, which may execute concurrently on multiple cores but have separate memory spaces. Making sure that processes interact nicely is easy since they can only communicate with each other via the kernel. Ensuring that threads don't interact in unpredictable, buggy ways is very hard since they can read and write to the same memory in an unrestricted manner.
The R situation is fairly straightforward: R is not multithreaded. Python is a little more complicated: Python does support threading, but due to the global interpreter lock (GIL), no actual concurrent execution of Python code is possible. Other popular open source dynamic languages are in various mixed states with respect to native threading (Ruby: no/kinda/yes?; Node.js: no), but in general, the answer is no, they do not support fully concurrent native threading, so Julia is not alone in this.
When we do add shared-memory parallelism to Julia, as we plan to – whether using native threads or multiple processes with shared memory – it will be true concurrency and there will be no GIL preventing simultaneous execution of Julia code. However, this is an incredibly tricky feature to add to a language, as attested by the non-existent or limited support in other very popular, mature dynamic languages. Adding a shared-memory concurrency model is technically difficult, but the real problem is designing a programming model that will allow programmers to make effective use of hardware concurrency in a productive and safe way. This problem is generally unsolved and is a very active area of research and experimentation – there is no "gold standard" to copy. We could just add POSIX threads support, but that programming model is generally considered dangerous and incredibly difficult to use correctly and effectively. Go has an excellent concurrency story, but it is designed for writing highly concurrent servers, not for concurrently operating on large data, so it's not at all clear that simply copying Go's model is a good idea for Julia.
I'm relatively new to threading and asynchronous programming in general, but I'm trying to understand the distinction between the two as it relates to the GIL in CPython.
From the reading that I've done, I understand that threads have their own stack and that the two models are different programming paradigms. But given that they cannot run concurrently because of the GIL, are Python threads a type of asynchronous execution underneath? I'd really like to get a better understanding of how the Python interpreter implements threading, specifically how it determines when one thread is blocking and another can be executed.
The GIL only plays a role when executing Python code; calling functions that are implemented in C, for example, shouldn't be hindered by the GIL, as far as I know. Also, downloading files or moving files on disk can be made to work concurrently with Python threads.
Quote from the Python Wiki:
Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
You might have a look at the multiprocessing module, which allows you to overcome the GIL and use multiple cores on the machine. Also, there is work going on to make PyPy, an alternative Python interpreter, GIL-less some day (just search for STM/AME).
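If you want to see the difference yourself, here's a small sketch (my own, with made-up numbers) that runs the same pure-Python loop on a thread pool and a process pool; on CPython the thread version is serialized by the GIL, while the process version actually uses multiple cores:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def crunch(n):
    # A pure-Python bytecode loop: it holds the GIL the whole time,
    # so threads cannot run it in parallel on CPython.
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    work = [10_000_000] * 4

    for name, pool_cls in [("threads", ThreadPoolExecutor),
                           ("processes", ProcessPoolExecutor)]:
        start = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(crunch, work))
        # Expect the process pool to finish roughly 4x faster here.
        print(f"{name}: {time.perf_counter() - start:.2f}s")
```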
So, I'm toying around with Stackless Python, and a question popped into my head; maybe this is "assumed" or "common" knowledge, but I couldn't find it actually written anywhere on the Stackless site.
Does Stackless Python take advantage of multicore CPUs? In normal Python you have the GIL being constantly present and to make (true) use of multiple cores you need to use several processes, is this true for Stackless also?
Stackless Python does not make use of any kind of multi-core environment it runs on.
This is a common misconception about Stackless, as it allows the programmer to take advantage of thread-based programming. For many people these two are closely intertwined, but they are, in fact, two separate things.
Internally Stackless uses a round-robin scheduler to schedule every tasklet (microthread), but no tasklet can run concurrently with another one. This means that if one tasklet is busy, the others must wait until that tasklet relinquishes control. By default the scheduler will not stop a tasklet and give processor time to another; it is the tasklet's responsibility to schedule itself back at the end of the schedule queue by calling stackless.schedule(), or by finishing its calculations.
All tasklets are thus executed in a sequential manner, even when multiple cores are available.
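Here's a minimal sketch of that cooperative behaviour (it assumes you're running the Stackless interpreter itself): the two tasklets interleave on a single core, each handing control back via stackless.schedule():

```python
import stackless  # requires the Stackless Python interpreter

def worker(name):
    for i in range(3):
        print(name, i)
        # Cooperatively yield: put this tasklet at the back of the
        # round-robin queue so the other tasklet can run.
        stackless.schedule()

# Calling a tasklet binds its arguments and inserts it into the scheduler.
stackless.tasklet(worker)("A")
stackless.tasklet(worker)("B")

# Runs the queue until empty; output interleaves A 0, B 0, A 1, B 1, ...
stackless.run()
```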
The reason Stackless does not have multi-core support is that it makes threads a whole lot easier, and that is just what Stackless is all about.
From the official Stackless website:
Stackless Python is an enhanced version of the Python programming language. It allows programmers to reap the benefits of thread-based programming without the performance and complexity problems associated with conventional threads. The microthreads that Stackless adds to Python are a cheap and lightweight convenience which can, if used properly, give the following benefits:
Improved program structure.
More readable code.
Increased programmer productivity.
Here is a link to some more information about multiple cores and Stackless.