(I don't know if this should be asked here on SO or on one of the other Stack Exchange sites.)
When doing heavy I/O-bound tasks, e.g. API calls or database fetches, I wonder: since Python only uses one process for multithreading, can we create even more threads by combining multiprocessing and multithreading, as in the pseudo-code below,
for process in processes:
    for thread in threads:
        fetch_api_results(thread)
or does Python do this automatically?
I do not think there would be any point in doing this: spinning up a new process has a relatively high cost, and spinning up a new thread has a pretty high cost. Serialising tasks to those threads or processes costs again, and synchronising state costs... again.
What I would do if I had two sets of problems:
I/O bound problems (e.g. fetching data over a network)
CPU bound problems related to those I/O bound problems
is to combine multiprocessing with asyncio. This has a much lower overhead: asyncio runs in a single thread, and we pay only for the scheduler (no serialisation), so it doesn't involve spinning up a gazillion processes (each of which uses around as much virtual memory as the parent process) or threads (each of which still uses a fair chunk of memory).
However, I would not run asyncio inside the worker processes; I'd run asyncio in the main process and offload the CPU-intensive tasks to a pool of worker processes when needed.
I suspect you probably can use threading inside multiprocessing, but it is very unlikely to bring you any speed boost.
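To make that concrete, here's a minimal sketch of the combination I mean, using only the standard library; fetch_api_results and cpu_bound are made-up stand-ins for a real API call and real post-processing:

import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(data):
    # made-up CPU-heavy post-processing; runs in a worker process
    return sum(x * x for x in data)

async def fetch_api_results(i):
    # stand-in for a real API call; sleep simulates network latency
    await asyncio.sleep(0.1)
    return list(range(i))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # all the I/O runs concurrently on one thread via the event loop
        results = await asyncio.gather(*(fetch_api_results(i) for i in range(10)))
        # only the CPU-bound work is offloaded to the process pool
        processed = await asyncio.gather(
            *(loop.run_in_executor(pool, cpu_bound, r) for r in results))
    print(processed)

if __name__ == "__main__":
    asyncio.run(main())

The event loop handles all the I/O concurrency in a single thread; only the CPU-heavy part pays the process-pool serialisation cost.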
Related
I have a set of instruments my program communicates with and I'd like to put the communication in a separate thread.
The I/O is pretty slow (~100 ms per item per instrument), and I need to record the resulting values from them in a shared array (of the last N values) and save them to a file, with repeated measurements taken as quickly as possible. Some instruments are slow to 'formulate' a response, so some readings could be done concurrently, but the readings need to be approximately synchronised (i.e. one timestamp per row of readings).
I'd like this all to be done in a separate thread (or threads) so the timing can't be interrupted by computation etc. happening in the main thread, but the main thread should be able to access the array.
Ideally I should be able to run some daq.start() and it gets going without further interaction.
What's the 'pythonic' way to do this? I've been reading about asyncio, threading, and multiprocessing and it's unclear to me which is appropriate.
In C++ I would have started thread1, which would just record measurements sequentially into a cache array. thread2 would flush that cache into the main shared array whenever it could get a lock, and at the same time write it to the output file. By keeping track of the index ranges being read, lock conflicts would be rare (and, importantly, wouldn't interrupt the DAQ when they happen).
The correct answer here is threads; this is not an opinion.
Communicating with hardware is done using drivers implemented in DLLs, and when Python calls into a DLL it drops the GIL, so the threads can execute in the background with as little overhead as possible on the Python interpreter itself.
Proper synchronization is needed, which includes thread locks if writing to the file is done from the threads; but the threads also drop the GIL when writing to files, so they will still have little to no overhead on your Python interpreter.
Neither of the above holds for asyncio, which is designed for asynchronous networking, not hardware.
For the implementation, a thread pool is usually the most Pythonic way to go about this: you just spawn as many workers as the number of instruments you connect to and let them do their work.
Since you are not using any asyncio features, you should be using multiprocessing.pool.ThreadPool with apply_async or imap_unordered, together with the locks from the threading module for writing to disk; there is also threading.Barrier if you want to synchronize each frame across all threads.
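A minimal sketch of that setup, recording a single synchronized frame; the instrument count, the read_instrument body, and the file name are all made up:

import time
from multiprocessing.pool import ThreadPool
from threading import Lock, Barrier

NUM_INSTRUMENTS = 4                       # made-up instrument count
write_lock = Lock()                       # guards the output file
frame_barrier = Barrier(NUM_INSTRUMENTS)  # one timestamp per row of readings

def read_instrument(instrument_id):
    # stand-in for a slow driver call (~100 ms); a real call would drop the GIL
    time.sleep(0.1)
    value = instrument_id * 2.0           # pretend measurement
    frame_barrier.wait()                  # wait until every instrument has a reading
    with write_lock:
        with open("readings.csv", "a") as f:
            f.write(f"{time.time()},{instrument_id},{value}\n")
    return value

with ThreadPool(NUM_INSTRUMENTS) as pool:
    async_results = [pool.apply_async(read_instrument, (i,))
                     for i in range(NUM_INSTRUMENTS)]
    row = [r.get() for r in async_results]
print(row)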
I was learning about multi-processing and multi-threading.
From what I understand, threads run on the same core, so I was wondering: if I create multiple processes inside a child thread, will they be limited to that single core too?
I'm using Python, so this is a question about that specific language, but I would like to know if it is the same in other languages.
I'm not a Python expert, but I expect this works like it does in other languages, because it's an OS feature in general.
Process
A process is executed by the OS and owns at least one thread which will be executed. This is in general your program. You can start more threads inside your process to do some heavy calculations or whatever else you have to do.
But they belong to the process.
Thread
One or more threads are owned by a process and execution will be distributed across all cores.
Now to your question
When you create a given number of threads, these threads should in general be distributed across all your cores. They're not limited to the core that's executing the Python interpreter.
Even when you create a subprocess from your Python code, that process can and should run on other cores.
You can read more about the general concept here:
Preemptive multitasking
There are some libraries in different languages that abstract a thread into something often called a Task or similar.
For these special cases it's possible that they just run inside the thread they were created in.
For example, in the .NET world there are both a Thread and a Task. People often misuse the term thread when they're talking about a Task, which in general runs inside the thread it was created in.
Every running program is represented by a process. A process is the execution context that one or multiple threads operate in. All threads in one process share the same tranche of virtual memory assigned to the process.
Python (referring to CPython; Jython and IronPython, for example, have no GIL) is special because it has the global interpreter lock (GIL), which prevents threaded Python code from running on multiple cores in parallel. Only code that releases the GIL can operate truly in parallel (I/O operations and some C extensions like numpy). That's why you have to use the multiprocessing module for CPU-bound Python code you need to run in parallel. Processes started with the multiprocessing module each run their own Python interpreter instance, so code can run truly in parallel.
Note that even a single-threaded Python application can run on different cores, not in parallel but sequentially, in case the OS re-schedules execution to another core after a context switch.
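As a minimal sketch of that escape hatch (square is a made-up CPU-bound function):

from multiprocessing import Pool

def square(n):
    # made-up CPU-bound work; each call runs in a separate process
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # four interpreters, four GILs
        print(pool.map(square, range(10)))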
Back to your question:
if I create multiple processes inside a child thread will they be limited to that single core too?
You don't create processes inside a thread; you spawn new independent Python processes with the same limitations as the original Python process, and which cores the threads of the new processes execute on is up to the OS (...as long as you don't manipulate the core affinity of a process, but let's not go there).
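A small sketch demonstrating the point: a process spawned from a child thread is just another independent OS process, scheduled on whatever core the OS picks:

import multiprocessing
import os
import threading

def child():
    # an independent process; the OS decides which core it runs on
    print("child process PID:", os.getpid())

def spawn_from_thread():
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()

if __name__ == "__main__":
    print("parent process PID:", os.getpid())
    t = threading.Thread(target=spawn_from_thread)
    t.start()
    t.join()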
I was wondering: do Python threads run concurrently or in parallel?
For example, if I have two tasks and run them inside two threads, will they be running simultaneously, or will they be scheduled to run concurrently?
I'm aware of the GIL and that the threads use just one CPU core.
This is a complicated question that needs a lot of explanation. I'm going to stick with CPython, simply because it's the most widely used implementation and the one I have experience with.
A Python thread is a system thread that requires the Python interpreter to execute its bytecode. The GIL is an interpreter-specific (in this case, CPython) lock that forces each thread to acquire a lock on the interpreter before running, preventing two threads from running interpreter code at the same time, no matter what cores they're on.
No CPU core can run more than one thread at a time. You need multiple cores to even talk sensibly about parallelism. Concurrency is not the same as parallelism: the former means the operations of two threads can be interleaved before either is finished, though neither thread need start at the same time, while the latter means operations that literally run at the same time. If that confuses you, better descriptions of the difference are here.
There are ways to introduce concurrency in a single-core CPU - namely, have threads that suspend (put themselves to sleep) and resume when needed - but there is no way to introduce parallelism with a single core.
Because of these facts, the answer to your question is: it depends.
System threads are inherently designed to be concurrent - there wouldn't be much point in having an operating system otherwise. Whether or not they are actually executed this way depends on the task: is there an atomic lock somewhere? (As we shall see, there is!)
Threads that execute CPU-bound computations - where a lot of Python code is being executed and the interpreter is invoked continuously - hold the lock on the GIL, which prevents other threads from doing the same. So, in that circumstance, only one thread works at a time across all cores, because no other thread can acquire the interpreter.
That being said, threads don't need to keep the GIL until they are finished, instead acquiring and releasing the lock as/when needed. It is possible for two threads to interleave their operations, because the GIL could be released at the end of a code block, grabbed by the other thread, released at the end of that code block, and so on. They won't run in parallel - but they can certainly be run concurrently.
I/O-bound threads, on the other hand, spend a large amount of their time simply waiting for requests to complete. These threads don't hold the GIL while they wait - why would they, when there's nothing to interpret? - so you can certainly have multiple I/O-waiting threads run in parallel, one core per thread. The minute Python code needs to execute again, however (maybe you need to handle your response?), up goes the GIL again.
Processes in Python sidestep the GIL, because they're a collection of resources bundled with their own threads. Each process has its own interpreter, and therefore each thread in a process only has to compete with its immediate process siblings for the GIL. That is why process-based parallelism is the recommended way to go in Python, even though it consumes more resources overall.
The Upshot
So two tasks in two threads can run in parallel provided they don't need the CPython interpreter. This can happen when they are waiting for I/O requests, or when they make use of an extension in another suitable language (say, C) that doesn't require the Python interpreter, called through a foreign function interface.
All threads can run concurrently in the sense of interleaved atomic operations. Exactly how fine-grained those interleavings are - is the GIL released after a code block? after every line? - depends on the task and the thread. Python threads don't have to execute serially (one thread finishes, and then the other starts), so there is concurrency in that sense.
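A rough timing experiment illustrates the upshot (the numbers are machine-dependent approximations):

import threading
import time

def cpu_task():
    # holds the GIL while computing, so two of these cannot overlap
    sum(i * i for i in range(10_000_000))

def io_task():
    # releases the GIL while sleeping, so two of these do overlap
    time.sleep(1)

def timed(target):
    threads = [threading.Thread(target=target) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"two CPU-bound threads: {timed(cpu_task):.2f}s")  # roughly 2x one task
print(f"two I/O-bound threads: {timed(io_task):.2f}s")   # roughly 1s, in parallel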
In CPython, the threads are real OS threads, and are scheduled to run concurrently by the operating system. However, as you noted the GIL means that only one thread will be executing instructions at a time.
Let me explain what all that means. Threads run inside the same virtual machine, and hence run on the same physical machine. Processes can run on the same physical machine or in another physical machine. If you architect your application around threads, you’ve done nothing to access multiple machines. So, you can scale to as many cores are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.
From the gevent docs:
The greenlets all run in the same OS thread and are scheduled cooperatively.
From asyncio docs:
This module provides infrastructure for writing single-threaded concurrent code using coroutines.
Try as I might, I haven't come across any major Python libraries that implement multi-threaded or multi-process coroutines i.e. spreading coroutines across multiple threads so as to increase the number of I/O connections that can be made.
I understand that coroutines essentially allow the main thread to pause executing one I/O-bound task and move on to the next, handling an I/O operation only once it finishes. If that is the case, then distributing I/O tasks across several threads, each of which could be running on a different core, should obviously increase the number of requests you could make.
Maybe I'm misunderstanding how coroutines work or are meant to work, so my question is in two parts:
Is it possible to even have a coroutine library that operates over multiple threads (possibly on different cores) or multiple processes?
If so, is there such a library?
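For what it's worth, the closest I've gotten with the standard library alone is running an independent event loop per thread; fetch and the batch sizes below are made up:

import asyncio
import threading

async def fetch(n):
    await asyncio.sleep(0.1)  # stand-in for a real I/O request
    return n

async def gather_batch(batch):
    return await asyncio.gather(*(fetch(n) for n in batch))

def run_loop(batch, out):
    # each thread runs its own, completely independent event loop
    out.extend(asyncio.run(gather_batch(batch)))

results = []
threads = [threading.Thread(target=run_loop, args=(range(i, i + 100), results))
           for i in range(0, 400, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 400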
As Wikipedia states:
Green threads emulate multi-threaded environments without relying on any native OS capabilities, and they are managed in user space instead of kernel space, enabling them to work in environments that do not have native thread support.
Python's threads are implemented as pthreads (kernel threads),
and because of the global interpreter lock (GIL), a Python process only runs one thread at a time.
But in the case of green threads (so-called greenlets or tasklets): does the GIL affect them? Can there be more than one greenlet running at a time?
What are the pitfalls of using greenlets or tasklets?
If I use greenlets, how many of them can a process handle? (I am wondering because in a single process you can only open threads up to the ulimit(-s, -v) set on your *ix system.)
I need a little insight, and it would help if someone could share their experience, or guide me to the right path.
You can think of greenlets more like cooperative threads. What this means is that there is no scheduler pre-emptively switching between your threads at any given moment - instead your greenlets voluntarily/explicitly give up control to one another at specified points in your code.
Does the GIL affect them? Can there be more than one greenlet running
at a time?
Only one code path is running at a time - the advantage is you have ultimate control over which one that is.
What are the pitfalls of using greenlets or tasklets?
You need to be more careful - a badly written greenlet will not yield control to other greenlets. On the other hand, since you know when a greenlet will context switch, you may be able to get away with not creating locks for shared data-structures.
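A tiny sketch of that explicit hand-off, using the greenlet package (assuming pip install greenlet):

from greenlet import greenlet

def ping():
    print("ping")
    gr_pong.switch()   # explicitly hand control to the other greenlet
    print("ping again")

def pong():
    print("pong")
    gr_ping.switch()   # hand control back; ping resumes after its switch()

gr_ping = greenlet(ping)
gr_pong = greenlet(pong)
gr_ping.switch()       # prints: ping, pong, ping again

Nothing pre-empts a greenlet: control moves only at the switch() calls you wrote.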
If I use greenlets, how many of them can a process handle? (I am wondering because in a single process you can open threads up to the ulimit(-s, -v) set on your *ix system.)
With regular threads, the more you have, the more scheduler overhead you have. Regular threads also have a relatively high context-switch cost. Greenlets have neither of these overheads. From the Bottle documentation:
Most servers limit the size of their worker pools to a relatively low
number of concurrent threads, due to the high overhead involved in
switching between and creating new threads. While threads are cheap
compared to processes (forks), they are still expensive to create for
each new connection.
The gevent module adds greenlets to the mix. Greenlets behave similar
to traditional threads, but are very cheap to create. A gevent-based
server can spawn thousands of greenlets (one for each connection) with
almost no overhead. Blocking individual greenlets has no impact on the
server's ability to accept new requests. The number of concurrent
connections is virtually unlimited.
There's also some further reading here if you're interested:
http://sdiehl.github.io/gevent-tutorial/
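And a minimal gevent sketch of the "thousands of greenlets" claim from the quote above (assuming pip install gevent; the worker body is made up):

from gevent import monkey
monkey.patch_all()  # make blocking stdlib calls cooperative

import time
import gevent

def worker(n):
    time.sleep(1)  # patched: yields to the event loop instead of blocking
    return n

jobs = [gevent.spawn(worker, n) for n in range(10_000)]
gevent.joinall(jobs)  # ten thousand greenlets, one OS thread, ~1 second total
print(sum(job.value for job in jobs))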
I assume you're talking about eventlet/gevent greenlets.
1) There can be only one greenlet running at a time.
2) It's cooperative multithreading, which means that if a greenlet gets stuck in an infinite loop, your entire program is stuck; typically, greenlets are scheduled either explicitly or during I/O.
3) A lot more than threads; it depends on the amount of RAM available.