I have written the following code to understand how sys.excepthook behaves in a multiprocessing environment. I am using Python 3. I have created 2 processes which each print a message and then busy-wait until I press Ctrl+C.
from multiprocessing import Process
import multiprocessing
import sys
from time import sleep


class foo:
    def f(self, name):
        try:
            raise ValueError("test value error")
        except ValueError as e:
            print(e)
        print('hello', name)
        while True:
            pass


def myexcepthook(exctype, value, traceback):
    print("Value: {}".format(value))
    for p in multiprocessing.active_children():
        p.terminate()


def main(name):
    a = foo()
    a.f(name)


sys.excepthook = myexcepthook

if __name__ == '__main__':
    for i in range(2):
        p = Process(target=main, args=('bob', ))
        p.start()
I was expecting the following result when I press Ctrl+C:
python /home/test/test.py
test value error
hello bob
test value error
hello bob
Value: <KeyboardInterrupt>
But unfortunately, I got the following result.
/home/test/venvPython3/bin/python /home/test/test.py
test value error
hello bob
test value error
hello bob
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
Process Process-1:
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/test/test.py", line 26, in main
a.f(name)
File "/home/test/test.py", line 15, in f
pass
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/test/test.py", line 26, in main
a.f(name)
File "/home/test/test.py", line 15, in f
pass
KeyboardInterrupt
Process finished with exit code 0
It would be a great help if somebody could point out what I am doing wrong. Also, please let me know how to get the expected output.
You almost did it.
First, use exctype in the print:
def myexcepthook(exctype, value, traceback):
    print("Value: {}".format(exctype))
    for p in multiprocessing.active_children():
        p.terminate()
And join() the created processes, to prevent a premature exit:
if __name__ == '__main__':
    pr = []
    for i in range(2):
        p = Process(target=main, args=('bob', ))
        p.start()
        pr.append(p)
    for p in pr:
        p.join()
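Without the join() calls, the parent falls off the end of the script and begins interpreter shutdown while the children are still alive; the KeyboardInterrupt is then raised inside multiprocessing's atexit handler as it polls the children, which is exactly the "Error in atexit._run_exitfuncs" noise in the output above. With the joins in place the parent blocks in a normal frame, so Ctrl+C surfaces as a regular uncaught exception and sys.excepthook fires.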
I run the following code, and I want the child processes to continue to complete their jobs when the parent process is terminated.
I need the child processes to complete each submitted job.
The parent process doesn't matter.
import signal
from concurrent.futures import ProcessPoolExecutor
import time


def sub_task(bulks):
    def signal_handler(signal, frame):
        pass
    signal.signal(signal.SIGINT, signal_handler)
    for i in bulks:
        time.sleep(0.1)
    print("Success")
    return True


def main():
    # it is 40 cores
    pool = ProcessPoolExecutor(4)
    bulks = []
    try:
        for i in range(10000):
            time.sleep(0.05)
            bulks.append(i)
            if len(bulks) >= 100:
                print("Send")
                pool.submit(sub_task, bulks)
                bulks = []
        if bulks:
            pool.submit(sub_task, bulks)
    except:
        import traceback
        traceback.print_exc()
    pool.shutdown(wait=True)
    return True


main()
But I got this exception:
Send
Send
Success
Send
^CProcess ForkProcess-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/concurrent/futures/process.py", line 233, in _process_worker
call_item = call_queue.get(block=True)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 97, in get
res = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
File "test1.py", line 37, in main
time.sleep(0.05)
KeyboardInterrupt
Can anyone help me out?
Thanks in advance
My reproduction is wrong, as noted in Rugnar's answer. I'm leaving the code mostly as-is as I'm not sure where this falls between clarifying and changing the meaning.
I have some thousands of jobs that I need to run and would like any errors to halt execution immediately.
I wrap the task in a try / except … raise so that I can log the error (without all the multiprocessing/threading noise), then reraise.
This does not kill the main process.
What's going on, and how can I get the early exit I'm looking for?
sys.exit(1) in the child deadlocks; wrapping the try / except … raise function in yet another function doesn't work either.
$ python3 mp_reraise.py
(0,)
(1,)
(2,)
(3,)
(4,)
(5,)
(6,)
(7,)
(8,)
(9,)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "mp_reraise.py", line 5, in f_reraise
raise Exception(args)
Exception: (0,)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mp_reraise.py", line 14, in <module>
test_reraise()
File "mp_reraise.py", line 12, in test_reraise
p.map(f_reraise, range(10))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
Exception: (0,)
mp_reraise.py
import multiprocessing


def f_reraise(*args):
    try:
        raise Exception(args)
    except Exception as e:
        print(e)
        raise


def test_reraise():
    with multiprocessing.Pool() as p:
        p.map(f_reraise, range(10))


test_reraise()
If I don't catch and reraise, execution stops early as expected:
[this actually does not stop, as per Rugnar's answer]
$ python3 mp_raise.py
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "mp_raise.py", line 4, in f_raise
raise Exception(args)
Exception: (0,)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mp_raise.py", line 10, in <module>
test_raise()
File "mp_raise.py", line 8, in test_raise
p.map(f_raise, range(10))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
Exception: (0,)
mp_raise.py
import multiprocessing


def f_raise(*args):
    # missing print, which would demonstrate that
    # this actually does not stop early
    raise Exception(args)


def test_raise():
    with multiprocessing.Pool() as p:
        p.map(f_raise, range(10))


test_raise()
In your mp_raise.py you don't print anything, so you can't see how many jobs were done. I added a print and found out that the pool sees an exception from a child only when the jobs iterator is exhausted. So it never stops early.
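For illustration, here is mp_raise.py with that print added (my sketch of the experiment just described, not code from the original posts):

import multiprocessing


def f_raise(i):
    # the print shows that later chunks keep executing after an earlier
    # one has already failed; map() only raises once all results are in
    print(i)
    raise Exception(i)


if __name__ == '__main__':
    with multiprocessing.Pool() as p:
        p.map(f_raise, range(10))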
If you need to stop early after an exception, try this:
import time
import multiprocessing as mp


def f_reraise(i):
    if abort.is_set():  # cancel the job if abort has happened
        return
    time.sleep(i / 1000)  # add sleep so jobs are not instant, like in real life
    if abort.is_set():  # we probably also need to stop a job mid-execution if abort has happened
        return
    print(i)
    try:
        raise Exception(i)
    except Exception as e:
        abort.set()
        print('error:', e)
        raise


def init(a):
    global abort
    abort = a


def test_reraise():
    _abort = mp.Event()

    # jobs should stop being fed to the pool once abort has happened,
    # so we wrap the jobs iterator this way
    def pool_args():
        for i in range(100):
            if not _abort.is_set():
                yield i

    # initializer and init are a way to share the event between processes,
    # thanks to https://stackoverflow.com/questions/25557686/python-sharing-a-lock-between-processes
    with mp.Pool(8, initializer=init, initargs=(_abort,)) as p:
        p.map(f_reraise, pool_args())


if __name__ == '__main__':
    test_reraise()
I'm trying to use a multiprocessing.Array in two separate processes in Python 3.7.4 (macOS 10.14.6). I start off by creating a new process using the spawn context, passing an Array object to it as an argument:
import multiprocessing, time, ctypes


def fn(c):
    time.sleep(1)
    print("value:", c.value)


def main():
    ctx = multiprocessing.get_context("spawn")
    arr = multiprocessing.Array(ctypes.c_char, 32)
    p = ctx.Process(target=fn, args=(arr,))
    p.start()
    arr.value = b"hello"
    p.join()


if __name__ == "__main__":
    main()
However, when I try to read it, I get the following error:
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Users/federico/Workspace/test/test.py", line 6, in fn
print("value:", c.value)
File "<string>", line 3, in getvalue
OSError: [Errno 9] Bad file descriptor
The expected output, however, is value: hello. Anyone know what could be going wrong here? Thanks.
The array should also be created with the same context that you use for the process, like so:
import multiprocessing, time
import ctypes


def fn(arr):
    time.sleep(1)
    print("value:", arr.value)


def main():
    ctx = multiprocessing.get_context("spawn")
    arr = ctx.Array(ctypes.c_char, 32)
    p = ctx.Process(target=fn, args=(arr,))
    p.start()
    arr.value = b'hello'
    p.join()


if __name__ == "__main__":
    main()
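The likely mechanism, as far as I can tell: multiprocessing.Array created at module level builds its underlying lock with the default start method, while ctx.Process pickles the array over to a freshly spawned interpreter; the child ends up holding a handle to a semaphore it never inherited, hence the bad file descriptor. Creating the array from the same ctx keeps the lock and the start method consistent.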
My coworker asked for my help with a problem he was having with a daemon script he is working on. He was having a strange error involving a multiprocessing.Manager, which I managed to reproduce with the following five lines:
import multiprocessing, os, sys

mgr = multiprocessing.Manager()
pid = os.fork()
if pid > 0:
    sys.exit(0)
When run on CentOS 6 Linux and Python 2.6, I get the following error:
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/util.py", line 235, in _run_finalizers
finalizer()
File "/usr/lib64/python2.6/multiprocessing/util.py", line 174, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib64/python2.6/multiprocessing/managers.py", line 576, in _finalize_manager
if process.is_alive():
File "/usr/lib64/python2.6/multiprocessing/process.py", line 129, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib64/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 117, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib64/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/lib64/python2.6/multiprocessing/process.py", line 117, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
I suspect the error is due to some interaction between os.fork and the multiprocessing.Manager, and that he should use the multiprocessing module to create new processes instead of os.fork. Can anyone confirm this and/or explain what is going on? If my hunch is correct, why is this the wrong place to use os.fork?
The issue is that Manager creates a process and tries to stop it at sys.exit. Since the memory of the process is copied (lazily) during the fork, both the parent and the child try to stop that process and wait for it to stop. However, as the exception says, only the parent process can do that. If instead of os.fork you use multiprocessing.Process, it will spawn a new process that won't try to close the Manager at sys.exit.
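A minimal sketch of that fix (my illustration; child_work stands in for the daemon logic, which the question does not show):

import multiprocessing


def child_work():
    # whatever the fork()ed child used to do goes here
    pass


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    p = multiprocessing.Process(target=child_work)
    p.start()
    p.join()
    # Only the process that created the Manager remains to finalize it at
    # exit, so the "can only join a child process" assertions go away.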
I have a multiprocessing demo here, and I ran into some problems with it. I researched it for a night but cannot work out the reason.
Can anyone help me?
I want one parent process to act as a producer: when tasks come in, the parent can fork some children to consume them. The parent monitors the children; if any one exits with an exception, it can be restarted by the parent.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from multiprocessing import Process, Queue
from Queue import Empty
import sys, signal, os, random, time
import traceback

child_process = []
child_process_num = 4
queue = Queue(0)


def work(queue):
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    time.sleep(10)  # demo sleep


def kill_child_processes(signum, frame):
    # terminate all children
    pass


def restart_child_process(signum, frame):
    global child_process
    for i in xrange(len(child_process)):
        child = child_process[i]
        try:
            if child.is_alive():
                continue
        except OSError, e:
            pass
        child.join()  # join this process to make sure there is no zombie process
        new_child = Process(target=work, args=(queue,))
        new_child.start()
        child_process[i] = new_child  # restart with one new process
        child = None
    return


if __name__ == '__main__':
    reload(sys)
    sys.setdefaultencoding("utf-8")
    for i in xrange(child_process_num):
        child = Process(target=work, args=(queue,))
        child.start()
        child_process.append(child)
    signal.signal(signal.SIGINT, kill_child_processes)
    signal.signal(signal.SIGTERM, kill_child_processes)  # hook the SIGTERM
    signal.signal(signal.SIGCHLD, restart_child_process)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
When this program runs, I get the errors below:
Error in atexit._run_exitfuncs:
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 30, in _run_exitfuncs
traceback.print_exc()
File "/usr/local/python/lib/python2.6/traceback.py", line 227, in print_exc
print_exception(etype, value, tb, limit, file)
File "/usr/local/python/lib/python2.6/traceback.py", line 124, in print_exception
_print(file, 'Traceback (most recent call last):')
File "/usr/local/python/lib/python2.6/traceback.py", line 12, in _print
def _print(file, str='', terminator='\n'):
File "test.py", line 42, in restart_child_process
new_child.start()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 99, in start
_cleanup()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
if p._popen.poll() is not None:
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
If I send a signal to one child, kill -SIGINT {child_pid}, I get:
[root@mail1 mail]# kill -SIGINT 32545
[root@mail1 mail]# Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
res = self._popen.wait(timeout)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
return self.poll(0)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
res = self._popen.wait(timeout)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
return self.poll(0)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
The main process waits for all child processes to terminate before it exits, so there is a blocking call (i.e. wait4) registered as an atexit handler. The signal you sent interrupts that blocking call, hence the stack trace.
The thing I'm not clear about is whether the signal sent to the child gets redirected to the parent process, which then interrupts that wait4 call. This is something related to Unix process-group behavior.
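To see the process-group relationship concretely, here is a small sketch I added (not from the original thread); it prints the process-group ids, which are the same for the parent and a fork()ed child:

import multiprocessing, os


def child():
    # A fork()ed child stays in the parent's process group, so a
    # terminal-generated SIGINT (Ctrl+C) reaches parent and children alike,
    # while `kill -SIGINT <pid>` targets that single process only.
    print("child  pid=%d pgid=%d" % (os.getpid(), os.getpgid(0)))


if __name__ == '__main__':
    print("parent pid=%d pgid=%d" % (os.getpid(), os.getpgid(0)))
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()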