This problem has been eluding me: all the solutions I've found are more like workarounds and add quite a bit of complexity to the code.
Since it's been a good while since anything was posted about this, is there a simple solution to the following: upon detecting a keyboard interrupt, cleanly exit all the child processes and terminate the program?
The code below is a snippet of my multiprocess structure. I'd like to preserve as much of it as possible while adding the needed functionality:
from multiprocessing import Pool, Lock
import time

def multiprocess_init(l):
    global lock
    lock = l

def synchronous_print(i):
    with lock:
        print i
        time.sleep(1)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    for i in range(1,20):
        pool.map_async(synchronous_print, [i])
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
The short answer is to move to Python 3. Python 2 has multiple problems with thread/process synchronization that have been fixed in Python 3.
In your case, multiprocessing will doggedly recreate your child processes every time you send a keyboard interrupt, and pool.close will get stuck and never exit. You can reduce the problem by explicitly exiting the child process with os._exit and by waiting for the individual results returned by map_async so that you don't get stuck in pool.close prison.
from multiprocessing import Pool, Lock
import time
import os

def multiprocess_init(l):
    global lock
    lock = l
    print("initialized child")

def synchronous_print(i):
    try:
        with lock:
            print i
            time.sleep(1)
    except KeyboardInterrupt:
        print("exit child")
        os._exit(2)  # os.exit does not exist; os._exit ends the process immediately

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(processes=5, initializer=multiprocess_init, initargs=(lock, ))
    results = []
    for i in range(1,20):
        results.append(pool.map_async(synchronous_print, [i]))
    for result in results:
        print('wait result')
        result.wait()
    pool.close() #necessary to prevent zombies
    pool.join() #wait for all processes to finish
    print("Join completes")
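For Python 3, one commonly used pattern (a sketch, not the only way) is to have the workers ignore SIGINT so that only the parent sees Ctrl-C, and to call pool.terminate() from the parent when that happens. The names ignore_sigint, square and run_pool are made up for illustration, and the start_method parameter is only there so the start method can be chosen explicitly.

```python
import multiprocessing as mp
import signal


def ignore_sigint():
    # Runs once in every worker: Ctrl-C is then handled by the parent only.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def square(x):
    # Stand-in for the real task (hypothetical).
    return x * x


def run_pool(items, processes=2, timeout=60, start_method=None):
    ctx = mp.get_context(start_method)  # None means the platform default
    pool = ctx.Pool(processes=processes, initializer=ignore_sigint)
    try:
        # get() with a timeout keeps the parent responsive to Ctrl-C
        results = pool.map_async(square, items).get(timeout)
    except BaseException:
        # KeyboardInterrupt, timeouts, task errors: stop the workers now
        pool.terminate()
        raise
    else:
        pool.close()  # normal path: no more tasks, let workers finish
    finally:
        pool.join()   # valid after either close() or terminate()
    return results
```

Call run_pool from under an `if __name__ == '__main__':` guard. On Ctrl-C the parent terminates the workers, joins them, and re-raises KeyboardInterrupt so the program ends.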
I'm trying to launch a function (my_function) and stop its execution once a certain time limit is reached.
So I tried the multiprocessing library, and everything works well. Here is the code, where my_function() has been changed to only create a dummy message.
from multiprocessing import Queue, Process
from multiprocessing.queues import Empty
import time

timeout=1
# timeout=3

def my_function(something):
    time.sleep(2)
    return f'my message: {something}'

def wrapper(something, queue):
    message ="too late..."
    try:
        message = my_function(something)
        return message
    finally:
        queue.put(message)

try:
    queue = Queue()
    params = ("hello", queue)
    child_process = Process(target=wrapper, args=params)
    child_process.start()
    output = queue.get(timeout=timeout)
    print(f"ok: {output}")
except Empty:
    timeout_message = f"Timeout {timeout}s reached"
    print(timeout_message)
finally:
    if 'child_process' in locals():
        child_process.kill()
You can test and verify that, depending on whether timeout=1 or timeout=3, I can trigger the timeout or not.
My main problem is that the real my_function() is a torch model inference for which I would like to limit the number of threads (to 4, let's say).
One can easily do so if my_function were in the main process, but in my example I tried a lot of tricks to limit it in the child process without any success (using threadpoolctl.threadpool_limits(4), torch.set_num_threads(4), os.environ["OMP_NUM_THREADS"]="4", os.environ["MKL_NUM_THREADS"]="4").
I'm completely open to other solutions that can monitor the execution time of a function while limiting the number of threads used by this function.
thanks
Regards
You can limit the number of simultaneous processes with Pool (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
You can also set the maximum number of tasks each child completes before it is replaced (maxtasksperchild). Check it out.
Here is a sample from SuperFastPython by Jason Brownlee:
# SuperFastPython.com
# example of limiting the number of tasks per child in the process pool
from time import sleep
from multiprocessing.pool import Pool
from multiprocessing import current_process

# task executed in a worker process
def task(value):
    # get the current process
    process = current_process()
    # report a message
    print(f'Worker is {process.name} with {value}', flush=True)
    # block for a moment
    sleep(1)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool(2, maxtasksperchild=3) as pool:
        # issue tasks to the process pool
        for i in range(10):
            pool.apply_async(task, args=(i,))
        # close the process pool
        pool.close()
        # wait for all tasks to complete
        pool.join()
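If the goal is specifically to cap the threads used inside the child, one thing that may be worth trying is to set the limits in the child itself, before the numeric library is imported there. Below is a minimal sketch using only the standard library. limited_worker and run_with_timeout are made-up names, the commented-out torch lines only mark where the real inference would go, and none of this has been checked against torch. Note that with the fork start method the child inherits whatever the parent has already imported, so this assumes torch is first imported in the child.

```python
import multiprocessing as mp
import os
from queue import Empty


def limited_worker(queue, n_threads):
    # Set the limits first, before any numeric library is imported here.
    os.environ["OMP_NUM_THREADS"] = str(n_threads)  # values must be strings
    os.environ["MKL_NUM_THREADS"] = str(n_threads)
    # Hypothetical placement of the real work (assumption, not tested here):
    #   import torch
    #   torch.set_num_threads(n_threads)
    #   message = my_function(something)
    queue.put(os.environ["OMP_NUM_THREADS"])  # report what the child sees


def run_with_timeout(n_threads=4, timeout=5.0, start_method=None):
    ctx = mp.get_context(start_method)  # None means the platform default
    queue = ctx.Queue()
    child = ctx.Process(target=limited_worker, args=(queue, n_threads))
    child.start()
    try:
        return queue.get(timeout=timeout)
    except Empty:
        return None  # the child did not answer in time
    finally:
        if child.is_alive():
            child.kill()
        child.join()
```

Call run_with_timeout from under an `if __name__ == '__main__':` guard. It returns the value the child reported, or None if the timeout was reached first.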
I wrote a little multi-host online scanner. My question is: is the code correct? The program does what it should, but when I watch it with the top command in a terminal, it shows a lot of zombie threads from time to time.
The code for the scan and threader functions:
def scan(queue):
    conf.verb = 0
    while True:
        try:
            host = queue.get()
            if host is None:
                sys.exit(1)
            icmp = sr1(IP(dst=host)/ICMP(),timeout=3)
            if icmp:
                with Print_lock:
                    print("Host: {} online".format(host))
                    saveTofile(RngFile, host)
        except Empty:
            print("Done")
        queue.task_done()

def threader(queue, hostlist):
    threads = []
    max_threads = 223
    for i in range(max_threads):
        thread = Thread(target=scan, args=(queue,))
        thread.start()
        threads.append(thread)
    for ip in hostlist:
        queue.put(ip)
    queue.join()
    for i in range(max_threads):
        queue.put(None)
    for thread in threads:
        thread.join()
P.S. Sorry for my terrible english
If you have a lot more threads than you have cores then you aren't really getting any benefit from spawning them, and even worse python has the global interpreter lock so you don't get real multithreading unless you use multiple processes. Use multiprocessing and set max_threads to multiprocessing.cpu_count().
Even better, you could use a pool.
from multiprocessing import Pool, cpu_count

with Pool(cpu_count()) as p:
    # change scan to take host directly instead of from a queue
    results = p.map(scan, hostlist)
And that's it, no messing with queues and filling it with None to make sure you kill all your processes.
I should add, make sure you create your process pool inside the main module! Your code in its entirety should look like this:
from multiprocessing import Pool, cpu_count

def scan(host):
    # whatever
    pass

if __name__ == "__main__":
    with Pool(cpu_count()) as p:
        results = p.map(scan, hostlist)
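As a hedged alternative: the probes spend most of their time waiting for a reply, and blocking I/O releases the GIL, so a thread pool from concurrent.futures may also do the job without hand-rolled queues and None sentinels. This is a sketch, not the asker's scanner: scan_hosts is a made-up name, and probe is a hypothetical callable that takes a host and returns something truthy when it is online (for example a wrapper around the sr1 call).

```python
from concurrent.futures import ThreadPoolExecutor


def scan_hosts(hosts, probe, max_workers=32):
    """Return the hosts for which probe(host) is truthy, in input order."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps the input order; each probe call runs in a worker thread
        flags = list(executor.map(probe, hosts))
    # Leaving the with-block waits for the worker threads to finish.
    return [host for host, online in zip(hosts, flags) if online]
```

Because the results come back to the calling thread, printing and saving to a file can happen there, so no print lock is needed.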
I am new to Python's multiprocessing module, and I wrote the tiny script below:
import multiprocessing
import os

def task(queue):
    print(100)

def run(pool):
    queue = multiprocessing.Queue()
    for i in range(os.cpu_count()):
        pool.apply_async(task, args=(queue, ))

if __name__ == '__main__':
    multiprocessing.freeze_support()
    pool = multiprocessing.Pool()
    run(pool)
    pool.close()
    pool.join()
I am wondering why the task() method is not executed and there is no output after running this script. Could anyone help me?
It is running, but it's dying with an error outside the main thread, and so you don't see the error. For that reason, it's always good to .get() the result of an async call, even if you don't care about the result: the .get() will raise the error that's otherwise invisible.
For example, change your loop like so:
tasks = []
for i in range(os.cpu_count()):
    tasks.append(pool.apply_async(task, args=(queue,)))
for t in tasks:
    t.get()
Then the new t.get() will blow up, ending with:
RuntimeError: Queue objects should only be shared between processes through inheritance
In short, passing Queue objects to Pool methods isn't supported.
But you can pass them to multiprocessing.Process(), or to a Pool initialization function. For example, here's a way to do the latter:
import multiprocessing
import os

def pool_init(q):
    global queue # make queue global in workers
    queue = q

def task():
    # can use `queue` here if you like
    print(100)

def run(pool):
    tasks = []
    for i in range(os.cpu_count()):
        tasks.append(pool.apply_async(task))
    for t in tasks:
        t.get()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=pool_init, initargs=(queue,))
    run(pool)
    pool.close()
    pool.join()
On Linux-y systems, you can - as the original error message suggested - use process inheritance instead (but that's not possible on Windows).
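A rough sketch of the inheritance route, assuming the fork start method is available (so not Windows): the queue is created at module level before the pool exists, and the forked workers simply see their inherited copy. The names ctx, task and run are illustrative only.

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # assumption: a platform that supports fork
queue = ctx.Queue()           # created before the pool, so workers inherit it


def task(i):
    # `queue` is the inherited module-level object; it is never pickled
    queue.put(i * 10)


def run(n=4):
    pool = ctx.Pool(2)
    try:
        pool.map(task, range(n))  # blocks until every task has run
    finally:
        pool.close()
        pool.join()               # workers flush their queue data on exit
    return sorted(queue.get(timeout=5) for _ in range(n))
```

Only the integer i travels through the pool's task pipe here; the queue itself is never passed as an argument, which is what avoids the RuntimeError.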
Below is the code which demonstrates the problem. Please note that this is only an example. I am using the same logic in a more complicated application, where I can't use sleep because the time it takes for the child process to modify the variable depends on the speed of the internet connection.
from multiprocessing import Process

code = False

def func():
    global code
    code = True

pro = Process(target=func)
pro.start()
while code == False:
    pass
pro.terminate()
pro.join()
print('Done!')
On running this nothing appears on the screen. When I terminate the program, by pressing CTRL-C, the stack trace shows that the while loop was being executed.
Python has a few concurrency libraries: threading, multiprocessing and asyncio (and more).
multiprocessing is a library which uses subprocesses to work around Python's inability to run CPU-intensive tasks concurrently in threads. Each process has its own copy of the module-level variables, so setting code = True in the child does not change code in the parent. To share state between different multiprocessing.Process objects, one option is to create it via a multiprocessing.Manager() instance. For example:
import multiprocessing
import time

def func(event):
    print("> func()")
    time.sleep(1)
    print("setting event")
    event.set()
    time.sleep(1)
    print("< func()")

def main():
    print("In main()")
    manager = multiprocessing.Manager()
    event = manager.Event()
    p = multiprocessing.Process(target=func, args=(event,))
    p.start()
    while not event.is_set():
        print("waiting...")
        time.sleep(0.2)
    print("OK! joining func()...")
    p.join()
    print('Done!')

if __name__ == "__main__":
    main()
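If a single flag is all that needs to be shared, a multiprocessing.Value may be enough without a Manager. The sketch below mirrors the question's structure under that assumption. wait_for_child and the start_method parameter are made up for illustration, and the loop sleeps briefly instead of spinning.

```python
import multiprocessing as mp
import time


def func(code):
    # `code` lives in shared memory, so the parent sees this write
    with code.get_lock():
        code.value = 1


def wait_for_child(timeout=5.0, start_method=None):
    ctx = mp.get_context(start_method)  # None means the platform default
    code = ctx.Value("i", 0)            # shared C int, initially 0
    pro = ctx.Process(target=func, args=(code,))
    pro.start()
    deadline = time.monotonic() + timeout
    # Poll the shared flag, giving up once the deadline has passed
    while code.value == 0 and time.monotonic() < deadline:
        time.sleep(0.01)
    pro.join()
    return code.value
```

Unlike the plain global in the question, the Value is backed by shared memory, so the change made in the child is visible to the parent's loop.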
I've gone through this SO thread, "Synchronization issue using Python's multiprocessing module", but it doesn't provide the answer.
The following code:
from multiprocessing import Process, Lock
def f(l, i):
    l.acquire()
    print 'hello world', i
    l.release()
    # do something else

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
How do I get the processes to execute in order? I want to hold a lock for a few seconds and then release it, so that P1 moves forward and P2 takes the lock, then P2 moves forward and P3 takes the lock. How would I get the processes to execute in order?
It sounds like you just want to delay the start of each successive process. If that's the case, you can use a multiprocessing.Event to delay starting the next child in the main process. Just pass the event to the child, and have the child set the Event when it's done doing whatever should run prior to starting the next child. The main process can wait on that Event, and once it's signalled, clear it and start the next child.
from multiprocessing import Process, Event

def f(e, i):
    print 'hello world', i
    e.set()
    # do something else

if __name__ == '__main__':
    event = Event()
    for num in range(10):
        p = Process(target=f, args=(event, num))
        p.start()
        event.wait()
        event.clear()
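The same idea as a Python 3 sketch, with the ordered part recorded in a queue so the order can be checked afterwards. run_in_order, the order queue and the timeout are additions for illustration and are not part of the answer above.

```python
import multiprocessing as mp


def f(event, order, i):
    order.put(i)  # the part that must happen in order
    event.set()   # tell the parent it may start the next child
    # anything after this point runs concurrently with later children


def run_in_order(n=5, timeout=10.0, start_method=None):
    ctx = mp.get_context(start_method)  # None means the platform default
    event = ctx.Event()
    order = ctx.SimpleQueue()  # put() writes straight to the pipe
    procs = []
    for i in range(n):
        p = ctx.Process(target=f, args=(event, order, i))
        p.start()
        if not event.wait(timeout):  # wait for this child's ordered part
            raise RuntimeError(f"child {i} did not signal in time")
        event.clear()
        procs.append(p)
    for p in procs:
        p.join()
    return [order.get() for _ in range(n)]
```

Each child is started only after the previous one has set the event, so the recorded order matches the start order.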
This is not the purpose of locks. Your code architecture is a poor fit for your use case. I think you should refactor your code to this:
from multiprocessing import Process

def f(i):
    # do something here
    pass

if __name__ == '__main__':
    for num in range(10):
        print 'hello world', num
        Process(target=f, args=(num,)).start()
In this case it will print in order and then do the remaining part asynchronously.