I want to write a service that launches multiple workers that run indefinitely and then quit when the main process is Ctrl+C'd. However, I do not understand how to handle Ctrl+C correctly.
I have the following test code:
import os
import multiprocessing as mp

def g():
    print(os.getpid())
    while True:
        pass

def main():
    with mp.Pool(1) as pool:
        try:
            s = pool.starmap(g, [[]] * 1)
        except KeyboardInterrupt:
            print('Done')

if __name__ == "__main__":
    print(os.getpid())
    main()
When I Ctrl+C it, I expect the process(es) running g to just receive SIGTERM and silently terminate; however, I get something like this instead:
Process ForkPoolWorker-1:
Done
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "test.py", line 8, in g
pass
KeyboardInterrupt
This clearly means that the parent and child processes both raise KeyboardInterrupt on Ctrl+C, which is further suggested by tests with kill -2. Why does this happen, and how do I deal with it to achieve what I want?
The SIGINT that triggers KeyboardInterrupt is delivered to the whole foreground process group, so the pool's worker processes receive it as well and raise KeyboardInterrupt just like the parent.
The easiest solution here is:
Disable the SIGINT handling in each worker on creation
Ensure the parent terminates the workers when it catches KeyboardInterrupt
You can do this easily by passing an initializer function that the Pool runs in each worker before the worker begins doing work:
import signal
import multiprocessing as mp

# Use initializer to ignore SIGINT in child processes
with mp.Pool(1, initializer=signal.signal, initargs=(signal.SIGINT, signal.SIG_IGN)) as pool:
    try:
        s = pool.starmap(g, [[]] * 1)
    except KeyboardInterrupt:
        print('Done')
The initializer replaces the default SIGINT handler with one that ignores the signal in the children, leaving it to the parent process to kill them. The with statement in the parent handles that automatically (exiting the with block implicitly calls pool.terminate()), so all you're responsible for is catching the KeyboardInterrupt in the parent and converting the ugly traceback into a simple message.
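Putting the pieces together, a complete runnable sketch might look like the following; it reuses g and main from the question and swaps in a named initializer function (init_worker, a name introduced here) instead of passing signal.signal directly:
import os
import signal
import multiprocessing as mp

def g():
    # The worker loops forever and no longer reacts to Ctrl+C itself.
    print(os.getpid())
    while True:
        pass

def init_worker():
    # Runs once in each pool process before any task: ignore SIGINT
    # so that only the parent handles Ctrl+C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def main():
    with mp.Pool(1, initializer=init_worker) as pool:
        try:
            pool.starmap(g, [[]] * 1)
        except KeyboardInterrupt:
            # Exiting the with block calls pool.terminate(), killing the workers.
            print('Done')

if __name__ == "__main__":
    print(os.getpid())
    main()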
Related
When I run the below code:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

q = Queue()

def my_task(x, queue):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i, q) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
I get this error:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/nn.py", line 14, in <module>
print(task.result())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "/usr/lib/python3.10/multiprocessing/context.py", line 373, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
What is the purpose of multiprocessing.Queue if I cannot use it for multiprocessing? How can I make this work? In my real code, I need every worker to frequently report task status to a queue, so that another thread can read from that queue to feed a progress bar.
Short Explanation
Why can't you pass a multiprocessing.Queue as a worker function argument? The short answer is that submitted tasks are placed on an internal input queue from which the pool processes fetch the next task to perform. The task's arguments must be serializable with pickle, and a multiprocessing.Queue is not, in general, picklable. It is picklable only in the special case of being passed as an argument to a child multiprocessing.Process: the arguments are stored as an attribute of the instance when it is created, and when start is called on the instance, that state must be serialized into the new address space before the run method is called there. Why this serialization works for that case but not the general case is unclear to me; I would have to spend a lot of time looking at the source for the interpreter to come up with a definitive answer.
See what happens when I try to put a queue instance onto a queue:
>>> from multiprocessing import Queue
>>> q1 = Queue()
>>> q2 = Queue()
>>> q1.put(q2)
>>> Traceback (most recent call last):
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 239, in _feed
obj = _ForkingPickler.dumps(obj)
File "C:\Program Files\Python38\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>> import pickle
>>> b = pickle.dumps(q2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python38\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 359, in assert_spawning
raise RuntimeError(
RuntimeError: Queue objects should only be shared between processes through inheritance
>>>
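By contrast, the special case mentioned above does work: a Queue passed directly in the args of a multiprocessing.Process is transferred to the child correctly. A minimal sketch (with a hypothetical worker function) to illustrate:
from multiprocessing import Process, Queue

def worker(q):
    # The queue arrives in the child as a usable object; no pickling error.
    q.put("hello from the child")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # prints: hello from the child
    p.join()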
How to Pass the Queue via Inheritance
First of all, your code will run more slowly using multiprocessing than if you had just called my_task in a loop, because multiprocessing introduces additional overhead (starting processes and moving data across address spaces); what you gain from running my_task in parallel has to more than offset that overhead. In your case it doesn't, because my_task is not sufficiently CPU-intensive to justify multiprocessing.
That said, when you want your pool processes to use a multiprocessing.Queue instance, it cannot be passed as an argument to a worker function (unlike the case where you are explicitly using multiprocessing.Process instances instead of a pool). Instead, you must initialize a global variable in each pool process with the queue instance.
If you are running on a platform that uses fork to create new processes, you can just create the queue as a global and it will be inherited by each pool process:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

queue = Queue()

def my_task(x):
    queue.put("Task Complete")
    return x

with ProcessPoolExecutor() as executor:
    tasks = [executor.submit(my_task, i) for i in range(10)]
    for task in as_completed(tasks):
        print(task.result())
    # This queue must be read before the pool terminates:
    for _ in range(10):
        print(queue.get())
Prints:
1
0
2
3
6
5
4
7
8
9
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
Task Complete
If you need portability with platforms that do not use the fork method to create processes, such as Windows (which uses the spawn method), then you cannot allocate the queue as a global, since each pool process would create its own queue instance. Instead, the main process must create the queue and then initialize each pool process's global queue variable using the initializer and initargs arguments:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Queue

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    queue.put("Task Complete")
    return x

# Windows compatibility
if __name__ == '__main__':
    q = Queue()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(10)]
        for task in as_completed(tasks):
            print(task.result())
        # This queue must be read before the pool terminates:
        for _ in range(10):
            print(q.get())
If you want to advance a progress bar as each task completes (you haven't precisely stated how the bar is to advance; see my comment to your question), then the following shows that a queue is not necessary. If instead each submitted task consisted of N parts (for a total of 10 * N parts, since there are 10 tasks) and you would like to see a single progress bar advance as each part is completed, then a queue is probably the most straightforward way of signaling a part completion back to the main process; see the sketch after the next example.
from concurrent.futures import ProcessPoolExecutor, as_completed
from tqdm import tqdm

def my_task(x):
    return x

# Windows compatibility
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        with tqdm(total=10) as bar:
            tasks = [executor.submit(my_task, i) for i in range(10)]
            for _ in as_completed(tasks):
                bar.update()
        # To get the results in task submission order:
        results = [task.result() for task in tasks]
        print(results)
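For the per-part case, a hedged sketch of what that could look like, combining the initializer technique above with a queue that feeds the bar (N_PARTS is a hypothetical number of sub-steps per task, and the body of each part is left as a placeholder):
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Queue
from tqdm import tqdm

N_TASKS = 10
N_PARTS = 4  # hypothetical number of sub-steps per task

def init_pool_processes(q):
    global queue
    queue = q

def my_task(x):
    for _ in range(N_PARTS):
        # ... do one part of the work here ...
        queue.put("part done")  # signal one completed part to the main process
    return x

if __name__ == '__main__':
    q = Queue()
    with ProcessPoolExecutor(initializer=init_pool_processes, initargs=(q,)) as executor:
        tasks = [executor.submit(my_task, i) for i in range(N_TASKS)]
        with tqdm(total=N_TASKS * N_PARTS) as bar:
            for _ in range(N_TASKS * N_PARTS):
                q.get()       # one message per completed part
                bar.update()
        results = [task.result() for task in tasks]
    print(results)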
Given this example scenario:
from multiprocessing import Pool

def _callback(result):
    if result == 2:
        # introduce an exception into one of the callbacks
        raise Exception("foo")
    print(result)

def _target(v):
    return v

worker_pool = Pool()
for i in range(10):
    worker_pool.apply_async(_target, args=(i,), callback=_callback)
worker_pool.close()
worker_pool.join()
I was hoping to see each value of i printed except for i=2, which would instead have yielded an exception.
Instead I see something like the following:
0
1
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 479, in _handle_results
cache[job]._set(i, obj)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 649, in _set
self._callback(self._value)
File "test3.py", line 6, in _callback
raise Exception("foo")
Exception: foo
... and then execution just hangs.
I'm aware that Pool handles callbacks on a separate thread, but why would execution hang and how can I reliably guard against errors in a task's callback?
This happens because the exception raised inside the callback kills the thread that handles the Pool's results: there is no except block around the callback invocation to handle this kind of situation. Once that thread is dead, the worker_pool can no longer be joined, so your application hangs.
I believe that's a deliberate decision by the Python maintainers, so the best way to handle this is to wrap your callback code in a try/except block and deal with the error there, instead of letting it bubble up and kill the thread.
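A minimal sketch of that advice, wrapping the callback body so that an error is contained rather than escaping into the pool's result-handling thread:
import traceback
from multiprocessing import Pool

def _callback(result):
    try:
        if result == 2:
            raise Exception("foo")
        print(result)
    except Exception:
        # Contain the error so the result-handling thread stays alive
        # and close()/join() can still complete.
        traceback.print_exc()

def _target(v):
    return v

if __name__ == '__main__':
    worker_pool = Pool()
    for i in range(10):
        worker_pool.apply_async(_target, args=(i,), callback=_callback)
    worker_pool.close()
    worker_pool.join()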
I am trying to use the example Pika Async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class subclass multiprocessing.Process). However, I'm running into some issues with gracefully shutting down everything.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
    proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I start the processes the normal Python way:
try:
    for proc in self.consumers:
        proc.start()
    for proc in self.consumers:
        proc.join()
except KeyboardInterrupt:
    for proc in self.consumers:
        proc.terminate()
        proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the terminate method to call the consumer's stop method and then continue with the normal Process terminate. With this structure, I am getting some strange attribute errors:
Traceback (most recent call last):
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
main()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
mybot.start()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
self.__stop_consumers__()
File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
self.consumers[0].terminate()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
self.stop()
File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In this particular case, _connection is initialized as None but then gets set when the consumer is started. However, by the time the stop method is called, it has already reverted to None (with nothing set to do so). I'm also observing other strange behavior, such as things apparently being called twice (even though stop is called once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
Thanks!
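For what it's worth, the "disappearing" attribute is consistent with the general fact that a multiprocessing.Process object exists in two address spaces: attributes set inside run() are set only on the child's copy, while a method called from the parent (such as an overridden terminate()) sees the parent's copy, where _connection is still None. A small hedged sketch of that behavior, unrelated to Pika itself:
import multiprocessing
import time

class Worker(multiprocessing.Process):
    def __init__(self):
        super(Worker, self).__init__()
        self._connection = None           # starts as None in the parent

    def run(self):
        self._connection = "connected"    # set only on the child's copy
        time.sleep(2)

if __name__ == "__main__":
    w = Worker()
    w.start()
    time.sleep(1)
    # The parent's copy never sees the assignment made in run():
    print(w._connection)                  # prints: None
    w.join()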
I have a Python process which spawns 5 other Python processes using the multiprocessing module. Let's call the parent process P0 and the others P1-P5. The requirement is, if we send a SIGTERM to P0, it should shut down P1 to P5 first and then exit itself.
The catch is that P1 and P5 are waiting on semaphores, so when I send SIGTERM to these processes, they invoke the signal handler and exit; but since they are waiting on a semaphore, they throw an exception. Is there any way to catch that exception before exit, so that P0 to P5 can make a graceful exit?
Traceback:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
Process Process-2:
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
Process Process-5:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/opt/fireeye/scripts/mip/StaticAnalysisRunner.py", line 45, in run
qsem.acquire()
You can install a signal handler that raises an exception, which is then caught in the subprocess so it can exit gracefully.
Here is an example of a script that waits on a semaphore in a subprocess and terminates gracefully when sent a SIGTERM.
#!/usr/bin/env python
import signal
import time
import multiprocessing

class GracefulExit(Exception):
    pass

def signal_handler(signum, frame):
    raise GracefulExit()

def subprocess_function():
    try:
        sem = multiprocessing.Semaphore()
        print "Acquiring semaphore"
        sem.acquire()
        print "Semaphore acquired"
        print "Blocking on semaphore - waiting for SIGTERM"
        sem.acquire()
    except GracefulExit:
        print "Subprocess exiting gracefully"

if __name__ == "__main__":
    # Use signal handler to throw exception which can be caught to allow
    # graceful exit.
    signal.signal(signal.SIGTERM, signal_handler)

    # Start a subprocess and wait for it to terminate.
    p = multiprocessing.Process(target=subprocess_function)
    p.start()
    print "Subprocess pid: %d" % p.pid
    p.join()
An example run of this script is as follows:
$ ./test.py
Subprocess pid: 7546
Acquiring semaphore
Semaphore acquired
Blocking on semaphore - waiting for SIGTERM
----> Use another shell to kill -TERM 7546
Subprocess exiting gracefully
There is no traceback from the subprocess, and the flow shows that the subprocess exits gracefully. This is because the SIGTERM is caught by the subprocess's signal handler, which raises a normal Python exception that can be handled inside the process.
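The question also asks about P0 shutting down P1 to P5 first when it receives SIGTERM. A hedged, untested Python 3 sketch of one way to extend the same GracefulExit idea to the parent (it relies on Python 3 behavior, where an exception raised by a signal handler propagates out of a blocked join(), and on fork so the children inherit the handler):
#!/usr/bin/env python3
import multiprocessing
import signal

class GracefulExit(Exception):
    pass

def signal_handler(signum, frame):
    raise GracefulExit()

def child_function():
    try:
        sem = multiprocessing.Semaphore(0)
        print("Child waiting on semaphore")
        sem.acquire()                 # blocks until SIGTERM interrupts it
    except GracefulExit:
        print("Child exiting gracefully")

if __name__ == "__main__":
    # Installed before the children start, so (with fork) they inherit it too.
    signal.signal(signal.SIGTERM, signal_handler)
    children = [multiprocessing.Process(target=child_function) for _ in range(5)]
    for p in children:
        p.start()
    try:
        for p in children:
            p.join()
    except GracefulExit:
        # P0 received SIGTERM: shut the children down first, then exit itself.
        for p in children:
            p.terminate()             # sends SIGTERM to each child
        for p in children:
            p.join()
        print("Parent exiting after children")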
I'm using the multiprocessing module to do parallel processing in my program. I want to share a dict object between multiple processes. I can read it when the processes finish normally, but I cannot get it when I press Ctrl+C. How can I achieve my goal?
My code is as follows:
#!/usr/bin/python
from multiprocessing import Process, Manager, Pool
import os
import signal
import time

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run_worker(i, my_d):
    print 'Work Started: %d %d' % (os.getpid(), i)
    for j in range(5):
        print j
        tmp1 = str(i) + str(j)
        my_d[tmp1] = j
        time.sleep(1)

def main():
    print "Initializing 3 workers"
    pool = Pool(3, init_worker)
    manager = Manager()
    my_d = manager.dict()
    try:
        for i in range(3):
            pool.apply_async(run_worker, args=(i, my_d))
        pool.close()
        pool.join()
        print my_d
        # When the pool finishes normally, I can get my_d successfully
    except KeyboardInterrupt:
        print "Caught KeyboardInterrupt, terminating workers"
        pool.terminate()
        pool.join()
        print my_d
        # When interrupted by Ctrl+C, why can't I get my_d?

if __name__ == "__main__":
    main()
You need a shared dictionary from the Manager object in multiprocessing.
See this similar question here (first answer):
Python multiprocessing: How do I share a dict among multiple processes?
Look at the error that is produced when you interrupt the parent process:
Caught KeyboardInterrupt, terminating workers
<DictProxy object, typeid 'dict' at 0x801abe150; '__str__()' failed>
Try changing print "Caught KeyboardInterrupt, terminating workers" to print len(my_d) and you can see in detail what happens. Note that this is before you try to terminate/join the pool of workers:
Traceback (most recent call last):
File "manager-test.py", line 39, in <module>
main()
File "manager-test.py", line 33, in main
print len(my_d)
File "<string>", line 2, in __len__
File "/usr/local/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/local/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/local/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/local/lib/python2.7/multiprocessing/connection.py", line 293, in SocketClient
s.connect(address)
File "/usr/local/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 2] No such file or directory
When you interrupt the main program, the manager process is interrupted as well and the connection to it is broken. This leaves the manager (and the objects it manages) in an unusable state: the socket connection the proxy uses to talk to the manager no longer works, so the proxy cannot get the data.
If you want to interrupt long-running processes without losing data, you should do it more gently, I think. Something like this:
import select
import sys

print 'Type q<enter> if you want to quit...'
while True:
    r, foo, bla = select.select([sys.stdin], [], [], 1)
    if len(r):
        what = sys.stdin.readline()
        if 'q' in what:
            print 'bye!'
            break
    # E.g. check on the progress of your calculation here

# Close and join the pool here, and do other clean-up.
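Alternatively, building on the explanation above (the manager process is taken down by the Ctrl+C as well), a hedged, untested sketch of another approach: start the manager explicitly through SyncManager with an initializer that makes its server process ignore SIGINT, so the managed dict is still reachable from the except block. This is a variant of the asker's code, not part of the answer above:
#!/usr/bin/python
from multiprocessing import Pool
from multiprocessing.managers import SyncManager
import signal

def ignore_sigint():
    # Runs in the pool workers and in the manager's server process,
    # so only the main process reacts to Ctrl+C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run_worker(i, my_d):
    my_d[i] = i * i

if __name__ == "__main__":
    manager = SyncManager()
    manager.start(ignore_sigint)      # manager server ignores SIGINT
    my_d = manager.dict()

    pool = Pool(3, ignore_sigint)
    try:
        for i in range(3):
            pool.apply_async(run_worker, args=(i, my_d))
        pool.close()
        pool.join()
    except KeyboardInterrupt:
        pool.terminate()
        pool.join()
    print(dict(my_d))                 # still readable after Ctrl+C
    manager.shutdown()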