How to spawn parallel child processes on a multi-processor system?

How to spawn parallel child processes on a multi-processor system? - python

I have a Python script that I want to use as a controller to another Python script. I have a server with 64 processors, so want to spawn up to 64 child processes of this second Python script. The child script is called:
$ python create_graphs.py --name=NAME
where NAME is something like XYZ, ABC, NYU etc.
In my parent controller script I retrieve the name variable from a list:
my_list = [ 'XYZ', 'ABC', 'NYU' ]
So my question is, what is the best way to spawn off these processes as children? I want to limit the number of children to 64 at a time, so need to track the status (if the child process has finished or not) so I can efficiently keep the whole generation running.
I looked into using the subprocess package, but rejected it because it only spawns one child at a time. I finally found the multiprocessor package, but I admit to being overwhelmed by the whole threads vs. subprocesses documentation.
Right now, my script uses subprocess.call to only spawn one child at a time and looks like this:
#!/path/to/python
import subprocess, multiprocessing, Queue
from multiprocessing import Process
my_list = [ 'XYZ', 'ABC', 'NYU' ]
if __name__ == '__main__':
processors = multiprocessing.cpu_count()
for i in range(len(my_list)):
if( i < processors ):
cmd = ["python", "/path/to/create_graphs.py", "--name="+ my_list[i]]
child = subprocess.call( cmd, shell=False )
I really want it to spawn up 64 children at a time. In other stackoverflow questions I saw people using Queue, but it seems like that creates a performance hit?

What you are looking for is the process pool class in multiprocessing.
import multiprocessing
import subprocess
def work(cmd):
return subprocess.call(cmd, shell=False)
if __name__ == '__main__':
count = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=count)
print pool.map(work, ['ls'] * count)
And here is a calculation example to make it easier to understand. The following will divide 10000 tasks on N processes where N is the cpu count. Note that I'm passing None as the number of processes. This will cause the Pool class to use cpu_count for the number of processes (reference)
import multiprocessing
import subprocess
def calculate(value):
return value * 10
if __name__ == '__main__':
pool = multiprocessing.Pool(None)
tasks = range(10000)
results = []
r = pool.map_async(calculate, tasks, callback=results.append)
r.wait() # Wait on the results
print results

Here is the solution I came up, based on Nadia and Jim's comments. I am not sure if it is the best way, but it works. The original child script being called needs to be a shell script because I need to use some 3rd party apps including Matlab. So I had to take it out of Python and code it in bash.
import sys
import os
import multiprocessing
import subprocess
def work(staname):
print 'Processing station:',staname
print 'Parent process:', os.getppid()
print 'Process id:', os.getpid()
cmd = [ "/bin/bash" "/path/to/executable/create_graphs.sh","--name=%s" % (staname) ]
return subprocess.call(cmd, shell=False)
if __name__ == '__main__':
my_list = [ 'XYZ', 'ABC', 'NYU' ]
my_list.sort()
print my_list
# Get the number of processors available
num_processes = multiprocessing.cpu_count()
threads = []
len_stas = len(my_list)
print "+++ Number of stations to process: %s" % (len_stas)
# run until all the threads are done, and there is no data left
for list_item in my_list:
# if we aren't using all the processors AND there is still data left to
# compute, then spawn another thread
if( len(threads) < num_processes ):
p = multiprocessing.Process(target=work,args=[list_item])
p.start()
print p, p.is_alive()
threads.append(p)
else:
for thread in threads:
if not thread.is_alive():
threads.remove(thread)
Does this seem like a reasonable solution? I tried to use Jim's while loop format, but my script just returned nothing. I am not sure why that would be. Here is the output when I run the script with Jim's 'while' loop replacing the 'for' loop:
hostname{me}2% controller.py
['ABC', 'NYU', 'XYZ']
Number of processes: 64
+++ Number of stations to process: 3
hostname{me}3%
When I run it with the 'for' loop, I get something more meaningful:
hostname{me}6% controller.py
['ABC', 'NYU', 'XYZ']
Number of processes: 64
+++ Number of stations to process: 3
Processing station: ABC
Parent process: 1056
Process id: 1068
Processing station: NYU
Parent process: 1056
Process id: 1069
Processing station: XYZ
Parent process: 1056
Process id: 1071
hostname{me}7%
So this works, and I am happy. However, I still don't get why I can't use Jim's 'while' style loop instead of the 'for' loop I am using. Thanks for all the help - I am impressed with the breadth of knowledge # stackoverflow.

I would definitely use multiprocessing rather than rolling my own solution using subprocess.

I don't think you need queue unless you intend to get data out of the applications (Which if you do want data, I think it may be easier to add it to a database anyway)
but try this on for size:
put the contents of your create_graphs.py script all into a function called "create_graphs"
import threading
from create_graphs import create_graphs
num_processes = 64
my_list = [ 'XYZ', 'ABC', 'NYU' ]
threads = []
# run until all the threads are done, and there is no data left
while threads or my_list:
# if we aren't using all the processors AND there is still data left to
# compute, then spawn another thread
if (len(threads) < num_processes) and my_list:
t = threading.Thread(target=create_graphs, args=[ my_list.pop() ])
t.setDaemon(True)
t.start()
threads.append(t)
# in the case that we have the maximum number of threads check if any of them
# are done. (also do this when we run out of data, until all the threads are done)
else:
for thread in threads:
if not thread.isAlive():
threads.remove(thread)
I know that this will result in 1 less threads than processors, which is probably good, it leaves a processor to manage the threads, disk i/o, and other things happening on the computer. If you decide you want to use the last core just add one to it
edit: I think I may have misinterpreted the purpose of my_list. You do not need my_list to keep track of the threads at all (as they're all referenced by the items in the threads list). But this is a fine way of feeding the processes input - or even better: use a generator function ;)
The purpose of my_list and threads
my_list holds the data that you need to process in your function
threads is just a list of the currently running threads
the while loop does two things, start new threads to process the data, and check if any threads are done running.
So as long as you have either (a) more data to process, or (b) threads that aren't finished running.... you want to program to continue running. Once both lists are empty they will evaluate to False and the while loop will exit

Related

How to run n processes simultaneously in Python

I am trying to execute n processes simultaneously. The example below works with 2 processes that are supplied externally.
At the moment it is all hard-coded for just these 2 processes but I would need to come up with the generic solution how to accomplish the same - i.e. run n processes at the same time.
My code is as follows:
import multiprocessing
'''
The first process: print 'aa'
The second Process: print 'BB'
'''
def TR1():
print 'aaaaaaaaa'
def TR2():
print 'BBBBBBBB'
if __name__ == '__main__':
process_1 = multiprocessing.Process(name='process_1', target=TR1)
process_2 = multiprocessing.Process(name='process_2', target=TR2)
process_1.start()
process_2.start()
Thanks for your suggestions!

You can either spawn processes in a loop, or use executor pool.
In real life, later one is often preferred approach, as you can limit pool size and have easy result gathering.
If you're using python 2, there's backport including ProcessPoolExecutor

python multiprocessing .join() deadlock depends on worker function

I am using the multiprocessing python library to spawn 4 Process() objects to parallelize a cpu intensive task. The task (inspiration and code from this great article) is to compute the prime factors for every integer in a list.
main.py:
import random
import multiprocessing
import sys
num_inputs = 4000
num_procs = 4
proc_inputs = num_inputs/num_procs
input_list = [int(1000*random.random()) for i in xrange(num_inputs)]
output_queue = multiprocessing.Queue()
procs = []
for p_i in xrange(num_procs):
print "Process [%d]"%p_i
proc_list = input_list[proc_inputs * p_i:proc_inputs * (p_i + 1)]
print " - num inputs: [%d]"%len(proc_list)
# Using target=worker1 HANGS on join
p = multiprocessing.Process(target=worker1, args=(p_i, proc_list, output_queue))
# Using target=worker2 RETURNS with success
#p = multiprocessing.Process(target=worker2, args=(p_i, proc_list, output_queue))
procs.append(p)
p.start()
for p in jobs:
print "joining ", p, output_queue.qsize(), output_queue.full()
p.join()
print "joined ", p, output_queue.qsize(), output_queue.full()
print "Processing complete."
ret_vals = []
while output_queue.empty() == False:
ret_vals.append(output_queue.get())
print len(ret_vals)
print sys.getsizeof(ret_vals)
Observation:
If the target for each process is the function worker1, for an input list larger than 4000 elements the main thread gets stuck on .join(), waiting for the spawned processes to terminate and never returns.
If the target for each process is the function worker2, for the same input list the code works just fine and the main thread returns.
This is very confusing to me, as the only difference between worker1 and worker2 (see below) is that the former inserts individual lists in the Queue whereas the latter inserts a single list of lists for each process.
Why is there deadlock using worker1 and not using worker2 target?
Shouldn't both (or neither) go beyond the Multiprocessing Queue maxsize limit is 32767?
worker1 vs worker2:
def worker1(proc_num, proc_list, output_queue):
'''worker function which deadlocks'''
for num in proc_list:
output_queue.put(factorize_naive(num))
def worker2(proc_num, proc_list, output_queue):
'''worker function that works'''
workers_stuff = []
for num in proc_list:
workers_stuff.append(factorize_naive(num))
output_queue.put(workers_stuff)
There are a lot of similar questions on SO, but I believe the core of this questions is clearly distinct from all of them.
Related Links:
https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
python multiprocessing issues
python multiprocessing - process hangs on join for large queue
Process.join() and queue don't work with large numbers
Python 3 Multiprocessing queue deadlock when calling join before the queue is empty
Script using multiprocessing module does not terminate
Why does multiprocessing.Process.join() hang?
When to call .join() on a process?
What exactly is Python multiprocessing Module's .join() Method Doing?

The docs warn about this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
While a Queue appears to be unbounded, under the covers queued items are buffered in memory to avoid overloading inter-process pipes. A process cannot end normally before those memory buffers are flushed. Your worker1() puts a lot more items on the queue than your worker2(), and that's all there is to it. Note that the number of items that can queued before the implementation resorts to buffering in memory isn't defined: it can vary across OS and Python release.
As the docs suggest, the normal way to avoid this is to .get() all the items off the queue before you attempt to .join() the processes. As you've discovered, whether it's necessary to do so depends in an undefined way on how many items have been put on the queue by each worker process.

Running Python on multiple cores

I have created a (rather large) program that takes quite a long time to finish, and I started looking into ways to speed up the program.
I found that if I open task manager while the program is running only one core is being used.
After some research, I found this website:
Why does multiprocessing use only a single core after I import numpy? which gives a solution of os.system("taskset -p 0xff %d" % os.getpid()),
however this doesn't work for me, and my program continues to run on a single core.
I then found this:
is python capable of running on multiple cores?,
which pointed towards using multiprocessing.
So after looking into multiprocessing, I came across this documentary on how to use it https://docs.python.org/3/library/multiprocessing.html#examples
I tried the code:
from multiprocessing import Process
def f(name):
print('hello', name)
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()
p.join()
a = input("Finished")
After running the code (not in IDLE) It said this:
Finished
hello bob
Finished
Note: after it said Finished the first time I pressed enter
So after this I am now even more confused and I have two questions
First: It still doesn't run with multiple cores (I have an 8 core Intel i7)
Second: Why does it input "Finished" before its even run the if statement code (and it's not even finished yet!)

To answer your second question first, "Finished" is printed to the terminal because a = input("Finished") is outside of your if __name__ == '__main__': code block. It is thus a module level constant which gets assigned when the module is first loaded and will execute before any code in the module runs.
To answer the first question, you only created one process which you run and then wait to complete before continuing. This gives you zero benefits of multiprocessing and incurs overhead of creating the new process.
Because you want to create several processes, you need to create a pool via a collection of some sort (e.g. a python list) and then start all of the processes.
In practice, you need to be concerned with more than the number of processors (such as the amount of available memory, the ability to restart workers that crash, etc.). However, here is a simple example that completes your task above.
import datetime as dt
from multiprocessing import Process, current_process
import sys
def f(name):
print('{}: hello {} from {}'.format(
dt.datetime.now(), name, current_process().name))
sys.stdout.flush()
if __name__ == '__main__':
worker_count = 8
worker_pool = []
for _ in range(worker_count):
p = Process(target=f, args=('bob',))
p.start()
worker_pool.append(p)
for p in worker_pool:
p.join() # Wait for all of the workers to finish.
# Allow time to view results before program terminates.
a = input("Finished") # raw_input(...) in Python 2.
Also note that if you join workers immediately after starting them, you are waiting for each worker to complete its task before starting the next worker. This is generally undesirable unless the ordering of the tasks must be sequential.
Typically Wrong
worker_1.start()
worker_1.join()
worker_2.start() # Must wait for worker_1 to complete before starting worker_2.
worker_2.join()
Usually Desired
worker_1.start()
worker_2.start() # Start all workers.
worker_1.join()
worker_2.join() # Wait for all workers to finish.
For more information, please refer to the following links:
https://docs.python.org/3/library/multiprocessing.html
Dead simple example of using Multiprocessing Queue, Pool and Locking
https://pymotw.com/2/multiprocessing/basics.html
https://pymotw.com/2/multiprocessing/communication.html
https://pymotw.com/2/multiprocessing/mapreduce.html

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.
I have tried to use the multiprocessing module which offers a method to run of pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.
Is it possible to have an individual timeout after which only a single process that takes too long is killed and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?
Here's a simple example to illustrate my problem:
def Check(n):
import time
if n % 2 == 0: # select some (arbitrary) subset of processes
print "%d timeout" % n
while 1:
# loop forever to simulate some process getting stuck
pass
print "%d done" % n
return 0
from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)
After the timeout all workers are killed and the program exits. I would like instead that it continues with the next subtask. Do I have to implement this behavior myself or are there existing solutions?
Update
It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:
jobs = pool.map_async(Check, range(10))
while 1:
try:
print "Waiting for result"
result = jobs.get(timeout=1)
break # all clear
except multiprocessing.TimeoutError:
# kill all processes
for c in multiprocessing.active_children():
c.terminate()
print result
The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.

The pebble Pool module has been built for solving these types of issue. It supports timeout on given tasks allowing to detect them and easily recover.
from pebble import ProcessPool
from concurrent.futures import TimeoutError
with ProcessPool() as pool:
future = pool.schedule(function, args=[1,2], timeout=5)
try:
result = future.result()
except TimeoutError:
print "Function took longer than %d seconds" % error.args[1]
For your specific example:
from pebble import ProcessPool
from concurrent.futures import TimeoutError
results = []
with ProcessPool(max_workers=4) as pool:
future = pool.map(Check, range(10), timeout=5)
iterator = future.result()
# iterate over all results, if a computation timed out
# print it and continue to the next result
while True:
try:
result = next(iterator)
results.append(result)
except StopIteration:
break
except TimeoutError as error:
print "function took longer than %d seconds" % error.args[1]
print results

Currently the Python does not provide native means to the control execution time of each distinct task in the pool outside the worker itself.
So the easy way is to use wait_procs in the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement own Pool on base of subprocess module having the working cycle in the main process, poll() - ing the execution of each worker and performing required actions.
As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (it is the bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes nonjoinable.
You have to terminate all the pool and then recreate it again for another tasks:
from multiprocessing import Pool
while True:
pool = Pool(processes=4)
jobs = pool.map_async(Check, range(10))
print "Waiting for result"
try:
result = jobs.get(timeout=1)
break # all clear
except multiprocessing.TimeoutError:
# kill all processes
pool.terminate()
pool.join()
print result
UPDATE
Pebble is an excellent and handy library, which solves the issue. Pebble is designed for the asynchronous execution of Python functions, where is PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.
One more aspect is when 3dparty dependencies are not desirable, then PyExPool can be a good choice, which is a single-file lightweight implementation of Multi-process Execution Pool with per-Job and global timeouts, opportunity to group Jobs into Tasks and other features.
PyExPool can be embedded into your sources and customized, having permissive Apache 2.0 license and production quality, being used in the core of one high-loaded scientific benchmarking framework.

Try the construction where each process is being joined with a timeout on a separate thread. So the main program never gets stuck and as well the processes which if gets stuck, would be killed due to timeout. This technique is a combination of threading and multiprocessing modules.
Here is my way to maintain the minimum x number of threads in the memory. Its an combination of threading and multiprocessing modules. It may be unusual to other techniques like respected fellow members have explained above BUT may be worth considerable. For the sake of explanation, I am taking a scenario of crawling a minimum of 5 websites at a time.
so here it is:-
#importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading
# Crawler function
def crawler(domain):
# define crawler technique here.
output.write(scrapeddata + "\n")
pass
Next is threadController function. This function will control the flow of threads to the main memory. It will keep activating the threads to maintain the threadNum "minimum" limit ie. 5. Also it won't exit until, all Active threads(acitveCount) are finished up.
It will maintain a minimum of threadNum(5) startProcess function threads (these threads will eventually start the Processes from the processList while joining them with a time out of 60 seconds). After staring threadController, there would be 2 threads which are not included in the above limit of 5 ie. the Main thread and the threadController thread itself. thats why threading.activeCount() != 2 has been used.
def threadController():
print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
# staring first thread. This will make the activeCount=3
Thread(target = startProcess).start()
# loop while thread List is not empty OR active threads have not finished up.
while len(processList) != 0 or threading.activeCount() != 2:
if (threading.activeCount() < (threadNum + 2) and # if count of active threads are less than the Minimum AND
len(processList) != 0): # processList is not empty
Thread(target = startProcess).start() # This line would start startThreads function as a seperate thread **
startProcess function, as a separate thread, would start Processes from the processlist. The purpose of this function (**started as a different thread) is that It would become a parent thread for Processes. So when It will join them with a timeout of 60 seconds, this would stop the startProcess thread to move ahead but this won't stop threadController to perform. So this way, threadController will work as required.
def startProcess():
pr = processList.pop(0)
pr.start()
pr.join(60.00) # joining the thread with time out of 60 seconds as a float.
if __name__ == '__main__':
# a file holding a list of domains
domains = open("Domains.txt", "r").read().split("\n")
output = open("test.txt", "a")
processList = [] # thread list
threadNum = 5 # number of thread initiated processes to be run at one time
# making process List
for r in range(0, len(domains), 1):
domain = domains[r].strip()
p = Process(target = crawler, args = (domain,))
processList.append(p) # making a list of performer threads.
# starting the threadController as a seperate thread.
mt = Thread(target = threadController)
mt.start()
mt.join() # won't let go next until threadController thread finishes.
output.close()
print "Done"
Besides maintaining a minimum number of threads in the memory, my aim was to also have something which could avoid stuck threads or processes in the memory. I did this using the time out function. My apologies for any typing mistake.
I hope this construction would help anyone in this world.
Regards,
Vikas Gautam

Multiprocessing with python3 only runs once

I have a problem running multiple processes in python3 .
My program does the following:
1. Takes entries from an sqllite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process,Queue
import multiprocessing
from threading import Thread
## This is the class that would be passed to the multi_processing function
class Processor:
def __init__(self,out_queue):
self.out_queue = out_queue
def __call__(self,in_queue):
data_entry = in_queue.get()
result = data_entry*2
self.out_queue.put(result)
#Performs the multiprocessing
def perform_distributed_processing(dbList,threads,processor_factory,output_queue):
input_queue = Queue()
# Create the Data processors.
for i in range(threads):
processor = processor_factory(output_queue)
data_proc = Process(target = processor,
args = (input_queue,))
data_proc.start()
# Push entries to the queue.
for entry in dbList:
input_queue.put(entry)
# Push stop markers to the queue, one for each thread.
for i in range(threads):
input_queue.put(None)
data_proc.join()
output_queue.put(None)
if __name__ == '__main__':
output_results = Queue()
def output_results_reader(queue):
while True:
item = queue.get()
if item is None:
break
print(item)
# Establish results collecting thread.
results_process = Thread(target = output_results_reader,args = (output_results,))
results_process.start()
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
# Perform multi processing
perform_distributed_processing(dbList,10,Processor,output_results)
# Wait for it all to finish.
results_process.join()

A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simply Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor
def Processor(data_entry):
return data_entry*2
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
yield from executor.map(processor_factory, dbList)
if __name__ == '__main__':
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
for result in perform_distributed_processing(dbList, 8, Processor):
print(result)
Or, if you want to handle them as they come instead of in order:
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
fs = (executor.submit(processor_factory, db) for db in dbList)
yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.

Don't try to rewrite the whole multiprocessing library again. I think you can use any of multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to input queue, you need to write a generator that yields input to the threads.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to spawn parallel child processes on a multi-processor system? - python

I would definitely use multiprocessing rather than rolling my own solution using subprocess.

Related

How to run n processes simultaneously in Python

python multiprocessing .join() deadlock depends on worker function

Running Python on multiple cores

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

Multiprocessing with python3 only runs once

Categories

Resources