Stackless Python and multicores?

So, I'm toying around with Stackless Python and a question popped up in my head, maybe this is "assumed" or "common" knowledge, but I couldn't find it actually written anywhere on the stackless site.
Does Stackless Python take advantage of multicore CPUs? In normal Python you have the GIL being constantly present and to make (true) use of multiple cores you need to use several processes, is this true for Stackless also?

Stackless Python does not take advantage of multiple cores, no matter what kind of multi-core environment it runs on.
This is a common misconception about Stackless, as it allows the programmer to take advantage of thread-based programming. For many people these two are closely intertwined, but they are, in fact, two separate things.
Internally, Stackless uses a round-robin scheduler to schedule its tasklets (microthreads), but no tasklet can run concurrently with another one. This means that if one tasklet is busy, the others must wait until that tasklet relinquishes control. By default the scheduler will not preempt a tasklet and give processor time to another. It is the tasklet's responsibility to schedule itself back at the end of the schedule queue by calling stackless.schedule(), or by finishing its calculations.
All tasklets are thus executed in a sequential manner, even when multiple cores are available.
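To make the cooperative scheduling concrete, here is a minimal sketch (it assumes a Stackless build of CPython, which provides the stackless module):

    import stackless

    def worker(name):
        for i in range(3):
            print(name, "step", i)
            stackless.schedule()  # voluntarily move to the back of the schedule queue

    # Create two tasklets; calling the tasklet binds its arguments and queues it.
    stackless.tasklet(worker)("A")
    stackless.tasklet(worker)("B")

    stackless.run()  # round-robin on a single core until every tasklet finishes

The output interleaves A and B, but every step runs strictly one after another; nothing ever executes in parallel.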
The reason Stackless does not have multi-core support is that this restriction makes threads a whole lot easier to work with. And that is just what Stackless is all about:
From the official Stackless website:
Stackless Python is an enhanced version of the Python programming language. It allows programmers to reap the benefits of thread-based programming without the performance and complexity problems associated with conventional threads. The microthreads that Stackless adds to Python are a cheap and lightweight convenience which can, if used properly, give the following benefits:
Improved program structure.
More readable code.
Increased programmer productivity.
Here is a link to some more information about multiple cores and Stackless.

Related

Parallel threading python GIL vs Java

I know that Python has a GIL that makes threads unable to run at the same time, so threading is just context switching. Why is Java different? Threads on the same CPU cannot run in parallel in any language.
1. Does creating a new thread in Java utilize cores on a multi-core machine?
2. Can Python only spawn threads on the same CPU, in contrast to Java?
3. If 1. is the case, when using more threads than CPUs, does even Java come back to context switching for several of them?
4. If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?
5. Isn't the whole point of threading being able to use the same memory space? If Java does run some of them in multiple threads for parallelism, how do they really share memory?
Thank you
Why is Java different?
Because it is able to effectively use multiple cores at the same time.
Does creating a new thread in Java utilize cores on a multi-core machine?
Yes.
Python can only spawn threads on the same CPU, in contrast to Java?
Java can spawn multiple threads which will run on different CPUs. Java is not responsible for the actual thread scheduling; that is handled by the OS, and the OS may reschedule a thread to a different CPU from the one that it started on.
I am not sure about the precise details for Python, but I think the GIL is an implementation detail rather than something that is intrinsic to the language itself¹. But either way, in a Python implementation, the GIL means that you would get little performance benefit in spawning threads on multiple cores. As this page says:
"The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter."
If 1. is the case, when using more threads than CPUs, does it come back to context switching in Java?
It depends. When switching a CPU between threads belonging to different processes, a full context switch is involved. But when switching between threads in the same process, only the (user) registers need to be switched. (The virtual memory registers and caches don't need to be switched / flushed because the threads share the same virtual address space.)
If 1. is the case, then how does it differ from multiprocessing? Because utilizing multiple cores isn't guaranteed?
The key difference between multi-threading and multi-processing is that processes do not share any memory. By contrast, one thread in a process can see the memory of all of the others ... modulo issues of when changes are visible.
This difference has a variety of consequences.
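A small standard-library sketch of that difference: a change made by a thread is visible to the whole process, while a child process only modifies its own copy of the variable:

    import multiprocessing
    import threading

    counter = 0

    def bump():
        global counter
        counter += 1

    if __name__ == "__main__":
        t = threading.Thread(target=bump)
        t.start(); t.join()
        print("after thread:", counter)   # 1 -- threads share the address space

        counter = 0
        p = multiprocessing.Process(target=bump)
        p.start(); p.join()
        print("after process:", counter)  # still 0 -- the child changed its own copy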
Isn't the whole point of threading being able to use the same memory space?
Yes, that is the main point ... when you compare multi-threading with multi-processing.
If Java does run some of them in multiple threads for parallelism ...
Java supports threads for many reasons. Parallelism is only one of those reasons. Others include multiplexing I/O and simplifying certain kinds of programming problem. These other reasons are also relevant to Python.
... how do [Java threads] really share memory?
The hardware deals with the issues of making the physical memory visible to all of the threads, and propagation of changes via the memory caches. It is complicated.
In Java the onus is on the programmer to "do the right thing" when threads make use of shared variables / objects. You need to use volatile variables, or synchronized blocks / methods, or something else that ensures that there is a happens before chain between a write and subsequent read. (Otherwise you can get issues with changes not being visible.)
This transfer of responsibility to the programmer allows the compiler to generate code with fewer main memory operations ... and hence that is faster. The downside is that if an application doesn't obey the rules, it is liable to behave in unexpected ways.
By contrast, in Python the memory model is unspecified, but there is an expectation (by millions of Python programmers) that it will behave in an intuitive fashion; i.e. a shared variable write performed by one thread will immediately be visible to other threads. This is hard to achieve efficiently while also allowing Python threads to run in parallel.
¹ While the GIL is not formally part of the Python spec, the influence of the GIL on the (unspecified!) Python memory model and on Python programmers' assumptions makes it more than merely an implementation detail. It remains to be seen if Python can successfully evolve into a language where multi-threading can use multiple cores effectively.
Not a complete answer here, but just adding a couple of things that Stephen C didn't already say:
Python can only spawn threads on the same CPU, in contrast to Java?
That would be an optimization, not an essential fact. There's no reason in principle why Python could not simply allow the OS to schedule its threads on whatever CPU happened to be available at any given time.
OTOH, given that no two Python threads can do significant work at the same time, it potentially could improve performance if the threads all had affinity for the same CPU. (See what Stephen C said about "full context switch" vs. "only the (user) registers".)
Giving user-mode processes control over processor affinity is a relatively new feature in some operating systems. I have no idea of whether or not any Python version actually uses that feature.
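For what it's worth, CPython does expose the kernel's affinity API where the OS provides one; a hedged sketch (os.sched_setaffinity is only available on Linux and a few other platforms):

    import os

    print("eligible CPUs:", os.sched_getaffinity(0))  # 0 means the current process
    os.sched_setaffinity(0, {0})                      # pin this process to CPU 0
    print("now pinned to:", os.sched_getaffinity(0))

Whether any interpreter ever sets affinity on its own is a separate question; this only shows that a program can opt in.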
If java does run...multiple threads for parallelism...?
Java doesn't "run multiple threads for parallelism." Your Java program creates multiple threads for whatever reason you happen to want them. Most modern OSs provide threads. Java simply makes that ability available to application programmers in a way that is tightly integrated with the language itself. You are free to use them (or not) however you see fit.

Is Python multiprocessing intensive on resources?

Since CPU-bound parallelization is not achievable in CPython due to the GIL, the official documentation recommends using multiprocessing instead of multithreading.
So, is the use of multiple processes more intensive on resources than multiple threads, when compared to the multiprocessing/multithreading performance of another programming language like Java or C++ that supports true parallelization in both?
There is little inherent additional cost to multiprocessing in Python beyond the cost of forking (Unix-like systems) or respawning a process. The expense comes when data or state needs to be shared among the processes. This could be anything from the iterable given to Pool.map to the proxies in Manager. As long as those costs are kept low compared to the per-process workload, it's a wash. (Note that Python is usually slower than Java and C++ for other reasons unrelated to multiprocessing.)
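For instance, a minimal Pool.map sketch where the per-item work dwarfs the cost of pickling the arguments and results (cpu_heavy here is just an illustrative stand-in for your real workload):

    from multiprocessing import Pool

    def cpu_heavy(n):
        # Enough work per item that transferring one int each way is negligible.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:  # defaults to os.cpu_count() worker processes
            results = pool.map(cpu_heavy, [2_000_000] * 8)
        print(results[:2])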

Does Pypy's stackless thread option support parallel execution?

I was reading about PyPy's stackless feature. My question is simple: does this get around the GIL? The page says it allows coding in "massively concurrent style". Does this also mean massively parallel style, taking advantage of multiple cores?
No. The microthreads are more lightweight and convenient to program, but still can't execute in parallel for the same reason a "stackful" Python can't just run threads in parallel. Nothing about the microthreads solves the problems addressed by the GIL, and in fact they're not intended to provide parallelism.
Note that the same is true for the original CPython-based Stackless (see Stackless python and multicores?).

Parallelism in Julia: Native Threading Support

In their arXiv paper, the original authors of Julia mention the following:
2.14 Parallelism.
Parallel execution is provided by a message-based multi-processing system implemented in Julia in the standard library. The language design supports the implementation of such libraries by providing symmetric coroutines, which can also be thought of as cooperatively scheduled threads. This feature allows asynchronous communication to be hidden inside libraries, rather than requiring the user to set up callbacks. Julia does not currently support native threads, which is a limitation, but has the advantage of avoiding the complexities of synchronized use of shared memory.
What do they mean by saying that Julia does not support native threads? What is a native thread?
Do other interpreted languages such as Python or R support this type of parallelism? Is Julia alone in this?
Update
When this question was asked in 2013, Julia indeed had no support for multithreading. Today, however, Julia supports native threading with what has emerged as the best language model for composable thread programming. This model was pioneered by Cilk and Intel's Threading Building Blocks, pushed further at the language level by Go, and is now also used by Julia. Statements in the original answer below about other dynamic languages remain true: they still do not support native threading with support for parallel execution of user code. The history of adding threading capabilities to Julia progressed in the following high-level stages:
Julia 0.5: support for native threads with an OpenMP-like compute model (i.e. parallel for loops). This functionality was highly limited: no I/O or networking was allowed in parallel code.
Julia 1.3: fully composable high performance M:N threading. This threading model is the same as Go (and Cilk and Intel TBB), where tasks are used to express potential concurrency, and those tasks are run on threads from a pool of native threads by a scheduler.
Julia 1.7: support for migration of tasks between native threads. This allows a task to begin execution on one native thread, get suspended, and then resume on a different thread, allowing full utilization of available compute resources.
Original Answer
"Native threads" are separate contexts of execution, managed by the operating system kernel, accessing a shared memory space and potentially executing concurrently on separate cores. Compare this with separate processes, which may execute concurrently on multiple cores but have separate memory spaces. Making sure that processes interact nicely is easy since they can only communicate with each other via the kernel. Ensuring that threads don't interact in unpredictable, buggy ways is very hard since they can read and write to the same memory in an unrestricted manner.
The R situation is fairly straightforward: R is not multithreaded. Python is a little more complicated: Python does support threading, but due to the global interpreter lock (GIL), no actual concurrent execution of Python code is possible. Other popular open source dynamic languages are in various mixed states with respect to native threading (Ruby: no/kinda/yes?; Node.js: no), but in general, the answer is no, they do not support fully concurrent native threading, so Julia is not alone in this.
When we do add shared-memory parallelism to Julia, as we plan to – whether using native threads or multiple processes with shared memory – it will be true concurrency and there will be no GIL preventing simultaneous execution of Julia code. However, this is an incredibly tricky feature to add to a language, as attested by the non-existent or limited support in other very popular, mature dynamic languages. Adding a shared-memory concurrency model is technically difficult, but the real problem is designing a programming model that will allow programmers to make effective use of hardware concurrency in a productive and safe way. This problem is generally unsolved and is a very active area of research and experimentation – there is no "gold standard" to copy. We could just add POSIX threads support, but that programming model is generally considered to be dangerous and incredibly difficult to use correctly and effectively. Go has an excellent concurrency story, but it is designed for writing highly concurrent servers, not for concurrently operating on large data, so it's not at all clear that simply copying Go's model is a good idea for Julia.

Threads vs Asynchronous Execution in Python

I'm relatively new to threading and asynchronous programming in general, but I'm trying to understand the distinction between the two as it relates to the GIL in CPython.
From the reading that I've done, I understand that threads have their own stack and that the two models are different programming paradigms. But given that they cannot run in parallel because of the GIL, are Python threads a type of asynchronous execution underneath? I'd really like to get a better understanding of how the Python interpreter implements threading, specifically how it determines when one thread is blocking and another can be executed.
The GIL only plays a role when executing Python code. When calling functions that are implemented in C, for example, the GIL shouldn't interfere, as far as I know. Also, downloading files or moving files on disk can be made to work concurrently with Python threads.
Quote from the Python Wiki:
Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
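As a hedged illustration of that point, four blocking one-second waits (a stand-in for real I/O) overlap almost perfectly across threads, because the GIL is released while each thread is blocked:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def io_like_task(_):
        time.sleep(1)  # blocking C-level call; the GIL is released while waiting

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(io_like_task, range(4)))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1 s, not ~4 s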
You might have a look at the multiprocessing module, which allows you to overcome the GIL and use multiple cores on the machine. Also, there is work going on to make PyPy, an alternative Python interpreter, become GIL-less some day (just search for STM/AME).
