Fast IO operations in a seperate thread

Fast IO operations in a seperate thread - python

I have a set of instruments my program communicates with and I'd like to put the communication in a separate thread.
The IO is pretty slow (~100 ms per item per instrument), and I need to record the resulted values from them in a shared array (of the last N values) and saved to a file, with repeated measurments taken as quickly as possible. Some instruments are slow to 'formulate' a responce, so some readings could be done concurrently, but the readings need to be approximately syncronised (i.e. 1 timestamp per row of readings)
I'd like this all to be done in a seperate thread(s) so the timing can't be interupted by computation etc happening in the main thread, but the main thread should be able to access the array.
Ideally I should be able to run some daq.start() and it gets going without further interaction.
What's the 'pythonic' way to do this? I've been reading about asyncio, threading, and multiprocessing and it's unclear to me which is appropriate.
In c++ I would have start thread1, which would just record measurments sequentially into a cache array. thread2 would flush that cache into the main shared array whenever it could get a lock. At the same time it would write that to the output file. By keeping track of the index ranges which were being read, lock conflicts would be rare (but importantly wouldn't interupt the DAQ when they happen)

the correct answer here is threads, this is not an opinion.
communicating with hardware is done using drivers implemented in DLLs, and when python calls a DLL it drops the GIL, therefore threads can execute in the background with as little overhead as possible on the python interpreter itself.
proper synchronization should be done, which include thread locks if writing to file is done using the threads, but also when writing to files the threads will drop the GIL and they will still have little to no overhead on your python interpreter.
both of the above are not the case for asyncio, which is designed for asynchronous networking, not hardware.
for the implementation, a threadpool is usually the most pythonic way to go about this, you just spawn as much workers as the number of instruments that you connect to and let them do their work.
since you are not using any of asyncio features, you should be using multiprocesing.threadpool with apply_async or imap_unordered and the thread locks from the threading module for writing to disk, there is also a barrier if you want to synchronize each frame across all threads.

Related

Optimizing number of threads in Python

I am writing a program that analyses csv files in a directory, initially one file at a time. This could be several hundred files, but all of them are relatively small. My main runtime limitation was I/O, so I turned to multithreading using the threading library, which is a first for me.
I created a thread for each function call, following this guide, where each function call opens a csv in the desired directory. As a result, I have a list of threads for each file (i.e. hundreds of threads). However, my program still ran slowly, with the bulk of its time spent method 'acquire' of '_thread.lock' objects according to cProfile. I believe that this is because of the large number of threads resulting in lots of threads waiting for others to finish their tasks - is this correct?
How would you recommend I resolve this? My current idea is to split my list of files into equally sized chunks and to assign a thread to each chunk, rather than a thread to each file, and for each thread to iterate through the files in each chunk.

Python has something called the Global Interpreter Lock which seriously hurts your performance with that many threads, as each one is waiting to hold the "interpreter lock." I would recommend using Processes which if I remember are similar to Python thread objects in their use but do not suffer the same performance penalty of waiting for a lock. A thread and a process are different, but for your application, it sounds like it should not matter.
It is worth noting that the GIL can be released when performing I/O such as reading from a file, and therefore using threads might be fine - you just need to use fewer of them. In fact, with the number of threads/processes you are looking to create it might be a better idea to use a fixed pool of workers.

Combine multiprocess and multithread in Python

(I don't know if this should be asked here at SO, or one of the other stackexchange)
When doing heav I/O bound tasks e.g API-calls or database fetching, I wonder, if Python only uses one process for multithreading, i.e can we create even more threads by combining multiprocessing and multithreading, like the pseudo-code below
for process in Processes:
for thread in threads:
fetch_api_resuls(thread)
or does Python do this automatically?

I do not think there would be any point doing this: spinning up a new process has a relatively high cost, and spinning up a new thread has a pretty high cost. Serialising tasks to those threads or processes costs again, and synchronising state costs...again.
What I would do if I had two sets of problems:
I/O bound problems (e.g. fetching data over a network)
CPU bound problems related to those I/O bound problems
is to combine multiprocessing with asyncio. This has a much lower overhead---we only have one thread, and we pay for the scheduler only (but no serialisation), doesn't involve spinning up a gazillion processes (each of which uses around as much virtual memory as the parent process) or threads (each of which still uses a fair chunk of memory).
However, I would not use asyncio within the multiprocessing threads---I'd use asyncio in the main thread, and offload cpu-intensive tasks to a pool of worker threads when needed.
I suspect you probably can use threading inside multiprocessing, but it is very unlikely to bring you any speed boost.

Why python does (not) use more CPUs? [duplicate]

I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.

The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Does Python support multithreading? Can it speed up execution time?

Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.

Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.

I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Python: Interruptable threading in wx

My wx GUI shows thumbnails, but they're slow to generate, so:
The program should remain usable while the thumbnails are generating.
Switching to a new folder should stop generating thumbnails for the old folder.
If possible, thumbnail generation should make use of multiple processors.
What is the best way to do this?

Putting the thumbnail generation in a background thread with threading.Thread will solve your first problem, making the program usable.
If you want a way to interrupt it, the usual way is to add a "stop" variable which the background thread checks every so often (e.g., once per thumbnail), and the GUI thread sets when it wants to stop it. Ideally you should protect this with a threading.Condition. (The condition isn't actually necessary in most cases—the same GIL that prevents your code from parallelizing well also protects you from certain kinds of race conditions. But you shouldn't rely on that.)
For the third problem, the first question is: Is thumbnail generation actually CPU-bound? If you're spending more time reading and writing images from disk, it probably isn't, so there's no point trying to parallelize it. But, let's assume that it is.
First, if you have N cores, you want a pool of N threads, or N-1 if the main thread has a lot of work to do too, or maybe something like 2N or 2N-1 to trade off a bit of best-case performance for a bit of worst-case performance.
However, if that CPU work is done in Python, or in a C extension that nevertheless holds the Python GIL, this won't help, because most of the time, only one of those threads will actually be running.
One solution to this is to switch from threads to processes, ideally using the standard multiprocessing module. It has built-in APIs to create a pool of processes, and to submit jobs to the pool with simple load-balancing.
The problem with using processes is that you no longer get automatic sharing of data, so that "stop flag" won't work. You need to explicitly create a flag in shared memory, or use a pipe or some other mechanism for communication instead. The multiprocessing docs explain the various ways to do this.
You can actually just kill the subprocesses. However, you may not want to do this. First, unless you've written your code carefully, it may leave your thumbnail cache in an inconsistent state that will confuse the rest of your code. Also, if you want this to be efficient on Windows, creating the subprocesses takes some time (not as in "30 minutes" or anything, but enough to affect the perceived responsiveness of your code if you recreate the pool every time a user clicks a new folder), so you probably want to create the pool before you need it, and keep it for the entire life of the program.
Other than that, all you have to get right is the job size. Hopefully creating one thumbnail isn't too big of a job—but if it's too small of a job, you can batch multiple thumbnails up into a single job—or, more simply, look at the multiprocessing API and change the way it batches jobs when load-balancing.
Meanwhile, if you go with a pool solution (whether threads or processes), if your jobs are small enough, you may not really need to cancel. Just drain the job queue—each worker will finish whichever job it's working on now, but then sleep until you feed in more jobs. Remember to also drain the queue (and then maybe join the pool) when it's time to quit.
One last thing to keep in mind is that if you successfully generate thumbnails as fast as your computer is capable of generating them, you may actually cause the whole computer—and therefore your GUI—to become sluggish and unresponsive. This usually comes up when your code is actually I/O bound and you're using most of the disk bandwidth, or when you use lots of memory and trigger swap thrash, but if your code really is CPU-bound, and you're having problems because you're using all the CPU, you may want to either use 1 fewer core, or look into setting thread/process priorities.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.