While using the multiprocessing module in Python, is there a way to prevent the OS from switching away from a process to another process for a certain time?
I have ~50 different child processes spawned to retrieve data from a database (each process = each table in the DB), and after querying and filtering the data, I try to write the output to an Excel file.
Since all the processes follow similar steps, they all reach the writing step at similar times, and since I am writing to a single file, I hold a lock that prevents multiple processes from writing to the file at once.
The problem, though, is that the writing now seems to take very long compared to when I had the same amount of data written by a single process (slower by at least 10x).
I am guessing one of the reasons could be that while writing, the CPU is constantly switching to other processes, which are all stuck at the mutex lock, only to come back to the one process that is actually active. I am guessing the context switching is a significant waste of time since there are a lot of processes to switch back and forth between.
I was wondering if there is a way to lock a process such that, for a certain part of the code, no context switching between processes happens.
Or any other suggestions to speed up this process?
Don't use locking and don't write from multiple processes; let the child processes return the output to the parent (e.g. via standard output), and have it wait for the processes to join to read it. I'm not 100% sure about the multiprocessing API, but you could just have the parent process sleep, wait for a SIGCHLD, and only then read data from an exited child's standard output and write it to your output file.
This way only one process is writing to the file and you don't need any busy looping or whatever. It will be much simpler and much more efficient.
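For illustration, here is a minimal sketch of that single-writer pattern using a multiprocessing.Pool instead of raw SIGCHLD handling; fetch_and_filter and write_rows are placeholders for the real per-table query and the real Excel writer:

from multiprocessing import Pool

def fetch_and_filter(table):
    # placeholder for the real per-table query + filtering
    rows = [(table, i) for i in range(3)]
    return table, rows

def write_rows(out, table, rows):
    # placeholder for the real Excel writer (e.g. openpyxl / xlsxwriter)
    for row in rows:
        out.write("%s\t%r\n" % (table, row))

if __name__ == "__main__":
    tables = ["table_%02d" % i for i in range(50)]
    with Pool(processes=8) as pool:
        # children only return data; they never touch the output file
        results = pool.map(fetch_and_filter, tables)
    with open("output.tsv", "w") as out:
        # single writer: the parent writes everything sequentially, no lock needed
        for table, rows in results:
            write_rows(out, table, rows)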
You can raise the priority of your process (go to Task Manager, right-click on the process and raise its priority). However, the OS will context switch no matter what; your process has no better claim to the CPU than other processes.
Related
I have a set of instruments my program communicates with and I'd like to put the communication in a separate thread.
The I/O is pretty slow (~100 ms per item per instrument), and I need to record the resulting values from them in a shared array (of the last N values) and save them to a file, with repeated measurements taken as quickly as possible. Some instruments are slow to 'formulate' a response, so some readings could be done concurrently, but the readings need to be approximately synchronised (i.e. one timestamp per row of readings).
I'd like this all to be done in a separate thread (or threads) so the timing can't be interrupted by computation etc. happening in the main thread, but the main thread should be able to access the array.
Ideally I should be able to run some daq.start() and it gets going without further interaction.
What's the 'pythonic' way to do this? I've been reading about asyncio, threading, and multiprocessing and it's unclear to me which is appropriate.
In C++ I would start thread1, which would just record measurements sequentially into a cache array. thread2 would flush that cache into the main shared array whenever it could get a lock, and at the same time write it to the output file. By keeping track of the index ranges being read, lock conflicts would be rare (and, importantly, wouldn't interrupt the DAQ when they do happen).
The correct answer here is threads; this is not an opinion.
Communicating with hardware is done using drivers implemented in DLLs, and when Python calls into a DLL it drops the GIL, so threads can execute in the background with as little overhead as possible on the Python interpreter itself.
Proper synchronization is still needed, which includes thread locks if writing to the file is done from the threads; but when writing to files the threads also drop the GIL, so they will still have little to no overhead on your Python interpreter.
Neither of the above is the case for asyncio, which is designed for asynchronous networking, not hardware.
For the implementation, a thread pool is usually the most Pythonic way to go about this: you just spawn as many workers as the number of instruments you connect to and let them do their work.
Since you are not using any asyncio features, you should be using multiprocessing.pool.ThreadPool with apply_async or imap_unordered, plus the locks from the threading module for writing to disk; there is also a Barrier if you want to synchronize each frame across all threads.
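A rough sketch of that layout, assuming hypothetical instrument names and a read_instrument() stand-in for the slow driver call:

from multiprocessing.pool import ThreadPool
import threading
import time

instruments = ["dmm", "scope", "psu"]                 # placeholder instrument handles
write_lock = threading.Lock()                         # serialises writes to the file
frame_barrier = threading.Barrier(len(instruments))  # optional: sync each frame

def read_instrument(name):
    time.sleep(0.1)             # stands in for the slow driver/DLL call (drops the GIL)
    return name, time.time()

def worker(name):
    for _ in range(10):         # ten frames, just for the example
        frame_barrier.wait()    # start each frame together across all instruments
        reading = read_instrument(name)
        with write_lock:
            with open("readings.log", "a") as f:
                f.write(repr(reading) + "\n")

pool = ThreadPool(len(instruments))
for name in instruments:
    pool.apply_async(worker, (name,))
pool.close()
pool.join()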
I am writing a program that analyses csv files in a directory, initially one file at a time. This could be several hundred files, but all of them are relatively small. My main runtime limitation was I/O, so I turned to multithreading using the threading library, which is a first for me.
I created a thread for each function call, following this guide, where each function call opens a CSV in the desired directory. As a result, I have a list of threads, one per file (i.e. hundreds of threads). However, my program still ran slowly, with the bulk of its time spent in the 'acquire' method of '_thread.lock' objects, according to cProfile. I believe that this is because of the large number of threads, resulting in lots of threads waiting for others to finish their tasks - is this correct?
How would you recommend I resolve this? My current idea is to split my list of files into equally sized chunks and to assign a thread to each chunk, rather than a thread to each file, and for each thread to iterate through the files in each chunk.
Python has something called the Global Interpreter Lock, which seriously hurts your performance with that many threads, as each one is waiting to hold the interpreter lock. I would recommend using processes which, if I remember correctly, are similar to Python thread objects in their use but do not suffer the same performance penalty of waiting for a lock. A thread and a process are different, but for your application, it sounds like it should not matter.
It is worth noting that the GIL can be released when performing I/O such as reading from a file, and therefore using threads might be fine - you just need to use fewer of them. In fact, with the number of threads/processes you are looking to create it might be a better idea to use a fixed pool of workers.
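As an illustration of the fixed-pool idea, here is a minimal sketch with a small thread pool instead of one thread per file; analyse_csv and the glob pattern are placeholders for the real analysis code:

import csv
import glob
from concurrent.futures import ThreadPoolExecutor

def analyse_csv(path):
    with open(path, newline="") as f:       # the I/O here releases the GIL
        rows = list(csv.reader(f))
    return path, len(rows)                  # stand-in for the real analysis

files = glob.glob("data/*.csv")             # placeholder directory
with ThreadPoolExecutor(max_workers=8) as pool:   # 8 workers, not hundreds of threads
    for path, n_rows in pool.map(analyse_csv, files):
        print(path, n_rows)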
I am using the multiprocessing module to fork child processes. Since on forking the child process gets the address space of the parent process, I am getting the same logger for parent and child. I want to clear the address space of the child process of any values carried over from the parent. I got to know that multiprocessing does fork() at a lower level but not exec(). I want to know whether it is good to use multiprocessing in my situation, or should I go for an os.fork() and os.exec() combination, or is there any other solution?
Thanks.
Since multiprocessing is running a function from your program as if it were a thread function, it definitely needs a full copy of your process' state. That means doing fork().
Using a higher-level interface provided by multiprocessing is generally better. At least you should not care about the fork() return code yourself.
os.fork() is a lower level function providing less service out-of-the-box, though you certainly can use it for anything multiprocessing is used for... at the cost of partial reimplementation of multiprocessing code. So, I think, multiprocessing should be ok for you.
However, if your process' memory footprint is too large to duplicate (or if you have other reasons to avoid forking -- open connections to databases, open log files, etc.), you may have to make the function you want to run in a new process a separate Python program. Then you can run it using subprocess, pass parameters to its stdin, capture its stdout and parse the output to get results.
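A small sketch of that subprocess route; worker.py is a hypothetical separate script that reads its parameters from stdin and prints its result to stdout:

import json
import subprocess

params = {"table": "events", "limit": 100}          # example parameters
proc = subprocess.run(
    ["python", "worker.py"],                        # hypothetical separate program
    input=json.dumps(params),                       # parameters go in via stdin
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)                    # parse whatever the child printed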
UPD: the os.exec... family of functions is hard to use for most purposes since it replaces your process with a spawned one (if you run the same program that is already running, it will restart from the very beginning, not keeping any in-memory data). However, if you really do not need to continue parent process execution, exec() may be of some use.
From my personal experience: os.fork() is used very often to create daemon processes on Unix; I often use subprocess (with communication through stdin/stdout); I have almost never used multiprocessing; and not a single time in my life have I needed os.exec...().
You can just rebind the logger in the child process to its own. I don't know about other OSes, but on Linux forking doesn't duplicate the entire memory footprint (as Ellioh mentioned); it uses the "copy-on-write" concept. So until you change something in the child process, it stays in the memory scope of the parent process. For instance, you can fork 100 child processes (that don't write into memory, only read) and check the overall memory usage. It will not be parent_memory_usage * 100, but much less.
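A minimal sketch of rebinding the logger in the child; the logger name and the file handler are illustrative, not a fixed recipe:

import logging
import multiprocessing
import os

def child_main():
    # create/fetch a logger that belongs to this child only
    logger = logging.getLogger("child.%d" % os.getpid())
    logger.handlers = []                                        # drop anything inherited
    logger.addHandler(logging.FileHandler("child_%d.log" % os.getpid()))
    logger.setLevel(logging.INFO)
    logger.info("running with my own logger")

if __name__ == "__main__":
    p = multiprocessing.Process(target=child_main)
    p.start()
    p.join()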
I need to write a very specific data processing daemon.
Here is how I thought it could work with multiprocessing:
Process #1: one process to fetch some vital metadata; it can be fetched every second, but the data must be available to process #2. Process #1 writes the data, and process #2 reads it.
Process #2: Two processes which will fetch the real data based on what has been received in process #1. Fetched data will be stored into a (big) queue to be processed "later"
Process #3: Two (or more) processes which poll the queue created in Process #2 and process those data. Once done, a new queue is filled up to be used in Process #4
Process #4 : Two processes which will read the queue filled by Process(es) #3 and send the result back over HTTP.
The idea behind all these different processes is to specialize them as much as possible and to make them as independent as possible.
All those processes will be wrapped in a main daemon, which is implemented here:
http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I am wondering if what I have imagined is relevant/stupid/overkill/etc., especially if I run daemon multiprocessing.Process(es) within a main parent process which will itself be daemonized.
Furthermore, I am a bit concerned about potential locking problems. In theory, the processes that read and write data use different variables/structures, so that should avoid a few problems, but I am still concerned.
Maybe using multiprocessing for my context is not the right thing to do. I would love to get your feedback about this.
Notes:
I can not use Redis as a data structure server
I thought about using ZeroMQ for IPC but I would avoid using another extra library if multiprocessing can do the job as well.
Thanks in advance for your feedback.
Generally, your division into different workers with different tasks, as well as your plan to let them communicate, already looks good. However, one thing you should be aware of is whether a processing step is I/O or CPU bound. If you are I/O bound, I'd go for the threading module whenever you can: the memory footprint of your application will be smaller and the communication between threads can be more efficient, as shared memory is allowed. Only if you need additional CPU power, go for multiprocessing. In your system, you can use both (it looks like process #3 (or more) will do some heavy computing, while the other workers will predominantly be I/O bound).
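To make the plumbing concrete, here is a stripped-down sketch of the stages connected by multiprocessing queues; fetcher/processor/sender are placeholders, with one worker per stage and simplified sentinel handling:

from multiprocessing import Process, Queue

def fetcher(raw_q):
    for i in range(10):             # stands in for polling the real data source
        raw_q.put({"id": i})
    raw_q.put(None)                 # sentinel: no more data

def processor(raw_q, done_q):
    while True:
        item = raw_q.get()
        if item is None:
            done_q.put(None)
            break
        item["processed"] = True    # stands in for the real processing
        done_q.put(item)

def sender(done_q):
    while True:
        item = done_q.get()
        if item is None:
            break
        print("would send over HTTP:", item)

if __name__ == "__main__":
    raw_q, done_q = Queue(), Queue()
    stages = [Process(target=fetcher, args=(raw_q,)),
              Process(target=processor, args=(raw_q, done_q)),
              Process(target=sender, args=(done_q,))]
    for p in stages:
        p.start()
    for p in stages:
        p.join()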
I'm new to Python and making some headway with threading: I'm doing some music file conversion and want to be able to utilize the multiple cores on my machine (one active conversion thread per core).
import subprocess
import threading

class EncodeThread(threading.Thread):
    # this is hacked together a bit, but should give you an idea
    def run(self):
        decode = subprocess.Popen(["flac", "--decode", "--stdout", self.src],
                                  stdout=subprocess.PIPE)
        encode = subprocess.Popen(["lame", "--quiet", "-", self.dest],
                                  stdin=decode.stdout)
        encode.communicate()

# some other code puts these threads with various src/dest pairs in a list

for proc in threads:  # `threads` is my list of `threading.Thread` objects
    proc.start()
Everything works, all the files get encoded, bravo! ... however, all the processes spawn immediately, yet I only want to run two at a time (one for each core). As soon as one finishes, I want the next one on the list to start, and so on until the list is done, then continue with the rest of the program.
How do I do this?
(I've looked at the thread pool and queue functions but I can't find a simple answer.)
Edit: maybe I should add that each of my threads is using subprocess.Popen to run a separate command line decoder (flac) piped to stdout which is fed into a command line encoder (lame/mp3).
If you want to limit the number of parallel threads, use a semaphore:
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)

class EncodeThread(threading.Thread):
    def run(self):
        threadLimiter.acquire()
        try:
            <your code here>
        finally:
            threadLimiter.release()
Start all threads at once. All but maximumNumberOfThreads will wait in threadLimiter.acquire() and a waiting thread will only continue once another thread goes through threadLimiter.release().
"Each of my threads is using subprocess.Popen to run a separate command line [process]".
Why have a bunch of threads manage a bunch of processes? That's exactly what an OS does for you. Why micro-manage what the OS already manages?
Rather than fool around with threads overseeing processes, just fork off processes. Your process table probably can't handle 2000 processes, but it can handle a few dozen (maybe a few hundred) pretty easily.
You want to have more work queued up than your CPUs can possibly handle. The real question is one of memory -- not processes or threads. If the sum of all the active data for all the processes exceeds physical memory, then data has to be swapped, and that will slow you down.
If your processes have a fairly small memory footprint, you can have lots and lots running. If your processes have a large memory footprint, you can't have very many running.
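For example, a sketch of managing the encoders with no threads at all: spawn the flac/lame pipelines directly and poll them, keeping at most max_jobs running (jobs() is a placeholder for whatever produces the src/dest pairs, and flac and lame are assumed to be installed):

import subprocess
import time

def jobs():
    yield ("in1.flac", "out1.mp3")      # placeholder work list
    yield ("in2.flac", "out2.mp3")

def start(src, dest):
    decode = subprocess.Popen(["flac", "--decode", "--stdout", src],
                              stdout=subprocess.PIPE)
    encode = subprocess.Popen(["lame", "--quiet", "-", dest],
                              stdin=decode.stdout)
    decode.stdout.close()               # lame now owns the pipe
    return encode

max_jobs = 2
pending = list(jobs())
running = []
while pending or running:
    while pending and len(running) < max_jobs:
        running.append(start(*pending.pop(0)))
    running = [p for p in running if p.poll() is None]   # drop finished encoders
    time.sleep(0.1)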
If you're using the default "cpython" implementation then this won't help you, because only one thread can execute at a time; look up the Global Interpreter Lock. Instead, I'd suggest looking at the multiprocessing module in Python 2.6 -- it makes parallel programming a cinch. You can create a Pool object with 2*num_threads processes and give it a bunch of tasks to do. It will execute up to 2*num_threads tasks at a time, until all are done.
At work I have recently migrated a bunch of Python XML tools (a differ, xpath grepper, and bulk xslt transformer) to use this, and have had very nice results with two processes per processor.
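A minimal sketch of that Pool approach; encode() is just a stand-in for the flac-to-lame pipeline from the question, and the 2 * cpu_count() sizing follows the suggestion above:

import multiprocessing
import time

def encode(pair):
    src, dest = pair
    time.sleep(0.5)                 # stands in for running flac | lame on src -> dest
    return dest

if __name__ == "__main__":
    pairs = [("a.flac", "a.mp3"), ("b.flac", "b.mp3"), ("c.flac", "c.mp3")]
    pool = multiprocessing.Pool(2 * multiprocessing.cpu_count())
    for done in pool.imap_unordered(encode, pairs):
        print("finished", done)
    pool.close()
    pool.join()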
It looks to me that what you want is a pool of some sort, and in that pool you would like to have n threads where n == the number of processors on your system. You would then have another thread whose only job is to feed jobs into a queue which the worker threads can pick up and process as they become free (so for a dual core machine, you'd have three threads, but the main thread would be doing very little).
As you are new to Python, though, I'll assume you don't know about the GIL and its side-effects with regard to threading. If you read the article I linked you will soon understand why traditional multithreading solutions are not always the best in the Python world. Instead you should consider using the multiprocessing module (new in Python 2.6; in 2.5 you can use this backport) to achieve the same effect. It side-steps the issue of the GIL by using multiple processes as if they were threads within the same application. There are some restrictions about how you share data (you are working in different memory spaces), but actually this is no bad thing: they just encourage good practice such as minimising the contact points between threads (or processes in this case).
In your case you are probably interested in using a pool as specified here.
Short answer: don't use threads.
For a working example, you can look at something I've recently tossed together at work. It's a little wrapper around ssh which runs a configurable number of Popen() subprocesses. I've posted it at: Bitbucket: classh (Cluster Admin's ssh Wrapper).
As noted, I don't use threads; I just spawn off the children, loop over them calling their .poll() methods, check for timeouts (also configurable) and replenish the pool as I gather the results. I've played with different sleep() values, and in the past I've written a version (before the subprocess module was added to Python) which used the signal module (SIGCHLD and SIGALRM) and the os.fork() and os.execve() functions, with my own pipe and file descriptor plumbing, etc.
In my case I'm incrementally printing results as I gather them ... and remembering all of them to summarize at the end (when all the jobs have completed or been killed for exceeding the timeout).
I ran that, as posted, on a list of 25,000 internal hosts (many of which are down, retired, located internationally, not accessible to my test account, etc.). It completed the job in just over two hours and had no issues. (About 60 of them were timeouts due to systems in degenerate/thrashing states -- proving that my timeout handling works correctly.)
So I know this model works reliably. Running 100 concurrent ssh processes with this code doesn't seem to cause any noticeable impact. (It's a moderately old FreeBSD box.) I used to run the old (pre-subprocess) version with 100 concurrent processes on my old 512MB laptop without problems, too.
(BTW: I plan to clean this up and add features to it; feel free to contribute or to clone off your own branch of it; that's what Bitbucket.org is for).
I am not an expert in this, but I have read something about Locks. This article might help you out.
Hope this helps
I would like to add something, just as a reference for others looking to do something similar, but who might have coded things different from the OP. This question was the first one I came across when searching and the chosen answer pointed me in the right direction. Just trying to give something back.
import threading
import time

maximumNumberOfThreads = 2
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)

def simulateThread(a, b):
    threadLimiter.acquire()
    try:
        # do some stuff
        c = a + b
        print('a + b = ', c)
        time.sleep(3)
    except NameError:  # or some other type of error
        print('some error')
    finally:
        # always release here, whether or not an error occurred
        # (releasing in both except and finally would over-release the
        #  BoundedSemaphore and raise ValueError)
        threadLimiter.release()

threads = []
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(len(sample)):
    thread = threading.Thread(target=simulateThread, args=(sample[i], 2))
    thread.daemon = True
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
This basically follows what you will find on this site:
https://www.kite.com/python/docs/threading.BoundedSemaphore