Python multi-process with time-out kill for individual processes - python

I'm having trouble coming up with a piece of code that
1) spawns multiple processes, and
2) for any individual process, kills the process if it is still alive after 5 seconds.
I know how to handle 1) and 2) individually, but I don't know how to combine them. Any suggestions would be helpful. Thanks!
For 1), I know how to write a simple multi-process program with a return dictionary, from here:
import multiprocessing

def worker(procnum, return_dict):
    '''worker function'''
    print(str(procnum) + ' represent!')
    return_dict[procnum] = procnum

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict.values())
For 2), my program hangs on some data because a function (an external C++ extension for Python) never returns. Since there are one million data points to handle, I need a time-out killer that kills this function when it runs too long and moves on to the next iteration. Currently I allow a wait time of 5 seconds before killing the process. I know how to write that code from here:
import multiprocessing
import time

# bar
def bar():
    for i in range(100):
        print("Tick")
        time.sleep(1)

if __name__ == '__main__':
    # Start bar as a process
    p = multiprocessing.Process(target=bar)
    p.start()
    # Wait for 10 seconds or until process finishes
    p.join(10)
    # If the process is still alive
    if p.is_alive():
        print("running... let's kill it...")
        # Terminate
        p.terminate()
        p.join()
But, as I mentioned, I am not sure how to combine these two pieces of code, mainly because I don't know where to put the p.join() and the if p.is_alive() check. Also, what's the use of p.join() when we already have p.start()?
Thanks!
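A minimal sketch of one way to combine the two snippets: start every process first, then walk the list joining each one against a shared 5-second deadline and terminating whatever is still alive. (Using a single deadline for all workers is an assumption; since they all start at roughly the same time it approximates 5 seconds per process.)

import multiprocessing
import time

def worker(procnum, return_dict):
    '''worker function (stand-in for the real, possibly hanging, C++ call)'''
    return_dict[procnum] = procnum

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i, return_dict))
        jobs.append(p)
        p.start()

    deadline = time.time() + 5              # every worker gets 5 seconds from now
    for p in jobs:
        # join() accepts a timeout, so only wait for whatever is left of the budget
        p.join(max(0, deadline - time.time()))
        if p.is_alive():
            print('killing', p.name)
            p.terminate()
            p.join()                        # reap the terminated process

    print(return_dict.values())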

Related

Run function in parallel and grab outputs using Queue

I would like to run a function using different arguments. For each different argument, I would like to run the function in parallel and then get the output of each run. It seems that the multiprocessing module can help here. I am not sure about the right steps to make this work.
Do I start all the processes, then get all the queues and then join all the processes in this order? Or do I get the results after I have joined? Or do I get the ith result after I have joined the ith process?
from numpy.random import uniform
from multiprocessing import Process, Queue

def function(x):
    return uniform(0.0, x)

if __name__ == "__main__":
    queue = Queue()
    processes = []
    x_values = [1.0, 10.0, 100.0]

    # Start all processes
    for x in x_values:
        process = Process(target=function, args=(x, queue, ))
        processes.append(process)
        process.start()

    # Grab results of the processes?
    outputs = [queue.get() for _ in range(len(x_values))]

    # Not even sure what this does but apparently it's needed
    for process in processes:
        process.join()
So let's make a simple example for multiprocessing pools, with a loaded function that sleeps for 3 seconds and returns both the value passed to it (your parameter) and the result of the function, which is just doubling it.
IIRC there's some issue with stopping pools cleanly.
from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception):
    '''raised by workers so a Ctrl-C can propagate out of the pool cleanly'''
    pass

def time_waster(val):
    try:
        time.sleep(3)
        return (val, val*2)  # return a tuple here, but you can use a dict as well with all your parameters
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

if __name__ == '__main__':
    x = list(range(5))  # values to pass to the function
    results = []
    try:
        with Pool(2) as p:  # I use 2 but you can use as many as you have cores
            results.append(p.map(time_waster, x))
    except KeyboardInterrupt:
        p.terminate()
    except Exception as e:
        p.terminate()
    finally:
        p.join()
    print(results)
As an extra service I added some KeyboardInterrupt handlers, as IIRC there are some issues with interrupting pools: https://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
proc.join() blocks until the process has ended. queue.get() blocks until there is something in the queue. Because your processes don't put anything into the queue (in this example), this code will never get beyond the queue.get() part. If your processes put something in the queue at the very end, then it doesn't matter whether you first join() or get(), because they happen at about the same time.
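For what it's worth, a small sketch of how the question's example might look once the worker actually puts its result on the queue (the extra queue parameter is the piece the original function was missing):

from numpy.random import uniform
from multiprocessing import Process, Queue

def function(x, queue):
    # put the result on the queue instead of returning it
    queue.put(uniform(0.0, x))

if __name__ == "__main__":
    queue = Queue()
    x_values = [1.0, 10.0, 100.0]

    processes = [Process(target=function, args=(x, queue)) for x in x_values]
    for process in processes:
        process.start()

    # one get() per started process; each call blocks until a result arrives
    outputs = [queue.get() for _ in x_values]

    for process in processes:
        process.join()
    print(outputs)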

python multiprocessing - child process blocking parent process

I am trying to learn multiprocessing and created an example; however, it's behaving unexpectedly.
The parent process runs and creates a child process, but control doesn't come back to the parent until the child is done.
code:
from multiprocessing import Process
import time

def f():
    newTime = time.time() + 7
    while(time.time() < newTime):
        print("inside child process")
        time.sleep(int(5))

if __name__ == '__main__':
    bln = True
    while(True):
        newTime = time.time() + 4
        while(time.time() < newTime):
            print("printing fillers")
        if(bln):
            p = Process(target=f)
            p.start()
            p.join()
            bln = False
result
"inside child process"
(wait for 5 sec)
"inside child process"
"printing fillers"
"printing fillers"
[...]
If I remove 'p.join()' then it will work. But from my understanding, p.join() is to tell the program to wait for this thread/process to finish before ending the program.
Can someone tell me why this is happening?
But from my understanding, p.join() is to tell the program to wait for
this thread/process to finish before ending the program.
Nope. It blocks the main thread right then and there until the thread / process finishes. By doing that right after you start the process, you don't let the loop continue until each process completes.
It would be better to collect all the Process objects you create into a list, so they can be accessed after the loop creating them. Then in a new loop, wait for them to finish only after they are all created and started.
# for example
processes = []
for i in whatever:
    p = Process(target=foo)
    p.start()
    processes.append(p)

for p in processes:
    p.join()
If you want to be able to do things in the meantime (while waiting for join), it is most common to use yet another thread or process. You can also choose to only wait a short time on join by giving it a timeout value; join() simply returns when that time is up, and you can then call is_alive() to check whether the process actually finished and decide whether to do something else before trying to join again.
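A small sketch of that timeout pattern (foo here is just a placeholder target): join() returns when the timeout expires, and is_alive() tells you whether the process actually finished.

from multiprocessing import Process
import time

def foo():
    time.sleep(3)                       # stand-in for real work

if __name__ == '__main__':
    p = Process(target=foo)
    p.start()
    while True:
        p.join(timeout=1)               # wait at most 1 second
        if not p.is_alive():
            break                       # the process finished
        print("still waiting, doing other things in the meantime...")
    print("child done")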
p.join() isn't for ending the program, it's for waiting for a subprocess to finish. If you need to end the program, use something like sys.exit(0) or raise SystemExit('your reason here')

Python multiprocessing NOT using available Cores

I ran the simple Python program below to do 4 processes separately. I expected the program to complete execution in 4 seconds (as you can see in the code), but it takes 10 seconds, meaning it does not do parallel processing. I have more than one core in my CPU, but the program seems to use just one. Please guide me on how I can achieve parallel processing here. Thanks.
import multiprocessing
import time
from datetime import datetime

def foo(i):
    print(datetime.now())
    time.sleep(i)
    print(datetime.now())
    print("=========")

if __name__ == '__main__':
    for i in range(4,0,-1):
        p = multiprocessing.Process(target=foo, args=(i,))
        p.start()
        p.join()
    print("Done main")
Whenever you call join on a process, your execution blocks and waits for that process to finish. So in your code, you always wait for the current process to finish before starting the next one. You need to keep a reference to the processes and join them after all of them have started, e.g. like this:
if __name__ == '__main__':
    processes = [multiprocessing.Process(target=foo, args=(i,))
                 for i in range(4,0,-1)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Done main")

python multiprocessing pool: how can I know when all the workers in the pool have finished?

I am running a multiprocessing pool in Python, where I have ~2000 tasks being mapped to 24 workers with the pool.
Each task creates a file based on some data analysis and web services.
I want to run a new task when all the tasks in the pool have finished. How can I tell when all the processes in the pool have finished?
You want to use the join method, which halts the main process thread from moving forward until all sub-processes end:
Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    processes = []
    for i in range(10):
        p = Process(target=f, args=('bob',))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    # only get here once all processes have finished.
    print('finished!')
EDIT:
To use join with pools
pool = Pool(processes=4) # start 4 worker processes
result = pool.apply_async(f, (10,)) # do some work
pool.close()
pool.join() # block at this line until all processes are done
print("completed")
You can use the wait() method of the ApplyResult object (which is what pool.apply_async returns).
import multiprocessing

def create_file(i):
    open(f'{i}.txt', 'a').close()

if __name__ == '__main__':
    # The default for the number of processes is the detected number of CPUs
    with multiprocessing.Pool() as pool:
        # Launch the first round of tasks, building a list of ApplyResult objects
        results = [pool.apply_async(create_file, (i,)) for i in range(50)]
        # Wait for every task to finish
        [result.wait() for result in results]
        # {start your next task... the pool is still available}
    # {when you reach here, the pool is closed}
This method works even if you're planning on using your pool again and don't want to close it yet; for example, you might want to keep it around for the next iteration of your algorithm. Use a with statement, or call pool.close() and pool.join() manually when you're done using it; otherwise the worker processes are left hanging around.
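For instance, a rough sketch of reusing the same pool for a second round of work before it is shut down (the second batch of file names here is just an illustration):

import multiprocessing

def create_file(i):
    open(f'{i}.txt', 'a').close()

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # first round of tasks
        first = [pool.apply_async(create_file, (i,)) for i in range(50)]
        [r.wait() for r in first]

        # the pool is still open here, so launch a second round
        second = [pool.apply_async(create_file, (i,)) for i in range(50, 100)]
        [r.wait() for r in second]
    # leaving the with-block shuts the pool down for us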

how to fetch process from python process pool

I want to create many processes, where each process starts 5 seconds after the previous one; that is, the time interval between process starts is 5 seconds, so that:
run process 1
wait 5 seconds
run process 2
wait 5 seconds
run process 3
wait 5 seconds
.....
like:
for i in range(10):
    p = multiprocessing.Process(target=func)
    p.start()
    sleep(5)

# after all child processes exit
do_something()
but I want to call do_something() after all the processes exit, and I don't know how to do the synchronization here.
With the Python pool library, I can have:
pool = multiprocessing.Pool(processes=4)
for i in range(500):
    pool.apply_async(func, (i,))
pool.close()
pool.join()
do_something()
but in this way, 4 processes will run simultaneously, and I can't control the time interval between process starts.
Is it possible to create a process pool and then fetch each process? Something like:
pool = multiprocessing.Pool(processes=4)
for i in range(500):
    process = pool.fetch_one()   # hypothetical API
    process(func, i)
    time.sleep(5)
pool.close()
pool.join()
do_something()
Is there such a library, or a code snippet, that satisfies my needs? Thanks!
Just to put suggestions together, you could do something like:
plist = []
for i in range(10):
    p = multiprocessing.Process(target=func)
    p.start()
    plist.append(p)
    sleep(5)

for p in plist:
    p.join()
do_something()
You could give a timeout argument to join() in order to handle stuck processes; in that case you'd have to keep iterating through the list, removing terminated processes until the list is empty.
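A rough sketch of that join-with-timeout loop, with stub versions of func and do_something() standing in for the question's real functions and an assumed 60-second limit for stuck processes:

import multiprocessing
import time

def func():
    time.sleep(1)                        # stand-in for the question's real work

def do_something():
    print("all child processes have exited")

if __name__ == '__main__':
    plist = []
    for i in range(10):
        p = multiprocessing.Process(target=func)
        p.start()
        plist.append(p)
        time.sleep(5)

    deadline = time.time() + 60          # assumed overall patience for stuck processes
    while plist:
        for p in plist[:]:               # iterate over a copy so removal is safe
            p.join(timeout=1)            # brief wait, then check the next process
            if not p.is_alive():
                plist.remove(p)          # finished normally
            elif time.time() > deadline:
                p.terminate()            # presumed stuck: kill it
                p.join()
                plist.remove(p)

    do_something()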
