How to complete a childprocess's job when the parent is interrupted? - python

I run the following code, and I want the child process to continue to complete the job when the parent process is terminated.
I need the child process to complete each submitted job.
The parent process doesn't matter.
import signal
from concurrent.futures import ProcessPoolExecutor
import time
def sub_task(bulks):
def signal_handler(signal, frame):
pass
signal.signal(signal.SIGINT, signal_handler)
for i in bulks:
time.sleep(0.1)
print("Success")
return True
def main():
# it is 40 cores
pool = ProcessPoolExecutor(4)
bulks = []
try:
for i in range(10000):
time.sleep(0.05)
bulks.append(i)
if len(bulks) >= 100:
print("Send")
pool.submit(sub_task, bulks)
bulks = []
if bulks:
pool.submit(sub_task, bulks)
except:
import traceback
traceback.print_exc()
pool.shutdown(wait=True)
return True
main()
But got this exception:
Send
Send
Success
Send
^CProcess ForkProcess-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/concurrent/futures/process.py", line 233, in _process_worker
call_item = call_queue.get(block=True)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 97, in get
res = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
File "test1.py", line 37, in main
time.sleep(0.05)
KeyboardInterrupt
Can anyone help me out?
Thanks in advance

Related

How to properly setup sys.excepthook

I have written the following code to understand the sys.excepthook on a multiprocess environment. I am using python 3. I have created 2 processes which would print and wait for getting ctrl+c.
from multiprocessing import Process
import multiprocessing
import sys
from time import sleep
class foo:
def f(self, name):
try:
raise ValueError("test value error")
except ValueError as e:
print(e)
print('hello', name)
while True:
pass
def myexcepthook(exctype, value, traceback):
print("Value: {}".format(value))
for p in multiprocessing.active_children():
p.terminate()
def main(name):
a = foo()
a.f(name)
sys.excepthook = myexcepthook
if __name__ == '__main__':
for i in range(2):
p = Process(target=main, args=('bob', ))
p.start()
I was expecting the following result when I press ctrl+C
python /home/test/test.py
test value error
hello bob
test value error
hello bob
Value: <KeyboardInterrupt>
But unfortunately, I got the following result.
/home/test/venvPython3/bin/python /home/test/test.py
test value error
hello bob
test value error
hello bob
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
Process Process-1:
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/test/test.py", line 26, in main
a.f(name)
File "/home/test/test.py", line 15, in f
pass
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/test/test.py", line 26, in main
a.f(name)
File "/home/test/test.py", line 15, in f
pass
KeyboardInterrupt
Process finished with exit code 0
It will be a great help if somebody could point out what am I doing wrong. Also, please let me know how to get the expected output.
You almost did it.
At first, use exctype to print:
def myexcepthook(exctype, value, traceback):
print("Value: {}".format(exctype))
for p in multiprocessing.active_children():
p.terminate()
And join() created processes, to prevent premature exit
if __name__ == '__main__':
pr = []
for i in range(2):
p = Process(target=main, args=('bob', ))
p.start()
pr.append(p)
for p in pr:
p.join()

Python multiprocessing pool map_async freezes

I have a list of 80,000 strings that I am running through a discourse parser, and in order to increase the speed of this process I have been trying to use the python multiprocessing package.
The parser code requires python 2.7 and I am currently running it on a 2-core Ubuntu machine using a subset of the strings. For short lists, i.e. 20, the process runs without an issue on both cores, however if I run a list of about 100 strings, both workers will freeze at different points (so in some cases worker 1 won't stop until a few minutes after worker 2). This happens before all the strings are finished and anything is returned. Each time the cores stop at the same point given the same mapping function is used, but these points are different if I try a different mapping function, i.e. map vs map_async vs imap.
I have tried removing the strings at those indices, which did not have any affect and those strings run fine in a shorter list. Based on print statements I included, when the process appears to freeze the current iteration seems to finish for the current string and it just does not move on to the next string. It takes about an hour of run time to reach the spot where both workers have frozen and I have not been able to reproduce the issue in less time. The code involving the multiprocessing commands is:
def main(initial_file, chunksize = 2):
entered_file = pd.read_csv(initial_file)
entered_file = entered_file.ix[:, 0].tolist()
pool = multiprocessing.Pool()
result = pool.map_async(discourse_process, entered_file, chunksize = chunksize)
pool.close()
pool.join()
with open("final_results.csv", 'w') as file:
writer = csv.writer(file)
for listitem in result.get():
writer.writerow([listitem[0], listitem[1]])
if __name__ == '__main__':
main(sys.argv[1])
When I stop the process with Ctrl-C (which does not always work), the error message I receive is:
^CTraceback (most recent call last):
File "Combined_Script.py", line 94, in <module>
main(sys.argv[1])
File "Combined_Script.py", line 85, in main
pool.join()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 474, in join
p.join()
File "/usr/lib/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 117, in worker
put((job, i, result))
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
^CProcess PoolWorker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 117, in worker
put((job, i, result))
File "/usr/lib/python2.7/multiprocessing/queues.py", line 392, in put
return send(obj)
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
_run_finalizers(0)
File "/usr/lib/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
finalizer()
File "/usr/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 500, in _terminate_pool
outqueue.put(None) # sentinel
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
_run_finalizers(0)
File "/usr/lib/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
finalizer()
File "/usr/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 500, in _terminate_pool
outqueue.put(None) # sentinel
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
When I look at the memory in another command window using htop, memory is at <3% once the workers freeze. This is my first attempt at parallel processing and I am not sure what else I might be missing?
I was not able to solve the issue with multiprocessing pool, however I came across the loky package and was able to use it to run my code with the following lines:
executor = loky.get_reusable_executor(timeout = 200, kill_workers = True)
results = executor.map(discourse_process, entered_file)
You could define a time to your process to return a result, otherwise it would raise an error:
try:
result.get(timeout = 1)
except multiprocessing.TimeoutError:
print("Error while retrieving the result")
Also you could verify if your process is succesful with
import time
while True:
try:
result.succesful()
except Exception:
print("Result is not yet succesful")
time.sleep(1)
Finally, checking out https://docs.python.org/2/library/multiprocessing.html ,is helpful.

Python Multiprocessing Processes: Delay in recognizing Event when using Manager Queue

I have the following codes:
from multiprocessing import Process, Manager, Event
manager = Manager()
shared_Queue = manager.Queue(10)
ev = Event()
def do_this(shared_queue, ev):
while not ev.is_set():
if not shared_Queue.__getattribute__('empty')():
item = shared_queue.get()
print item
print 'released!'
subprocs = []
for i in xrange(10):
subproc = Process(target=do_this, args=(shared_Queue, ev, ))
subprocs.append(subproc)
subproc.start()
now, if I run this, and I ask whether these processes are alive:
for subproc in subprocs: print subproc.is_alive()
of course I get all Trues.
After couple of doing these: * there is no error if I don't do these!
shared_Queue.put(3)
shared_Queue.put(5)
Now I want to set the Event to kill all of them using:
ev.set()
But then instead of seeing 'released!' 10 times, I get varying number of these prints, and after about 2 to 5 seconds, I get a bunch of errors:
released!
released!
released!
released!
released!
released!
released!
Process Process-10:
Traceback (most recent call last):
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 258, in _bootstrap
self.run()
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Process Process-5:
Traceback (most recent call last):
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 258, in _bootstrap
self.run()
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Process Process-7:
Traceback (most recent call last):
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 258, in _bootstrap
self.run()
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Why is it that some processes are unable to recognize the Event set and show up as errors later? Is there a better way to signal them to die?
Thanks for the comment stovfl, you are right, ev.set() does not kill anything I was carelessly using the word.
As for the issue I was having, I learned that multiprocessing Queue is process and thread safe, meaning, my process will halt just before writing something into the Queue if the Queue is already full.
If I try to set the event while some of the processes are still waiting to write something to the full Queue, they will not recognize the event set.
The key was to empty all the Queue, let the process write to it, and the get to the first line where it can check on the event!

Python multiprocessing on For Loop

First of all, I know there are quite some threads about multiprocessing on python already, but none of these seems to solve my problem.
Here is my problem:
I want to implement Random Forest Algorithm, and a naive way to do so would be like this:
def random_tree(Data):
tree = calculation(Data)
forest.append(tree)
forest = list()
for i in range(300):
random_tree(Data)
And theforest with 300 "trees" inside would be my final result. In this case, how do I turn this code into a multiprocessing version?
Update:
I just tried Mukund M K's method, in a very simplified script:
from multiprocessing import Pool
def f(x):
return 2*x
data = np.array([1,2,5])
pool = Pool(processes=4)
forest = pool.map(f, (data for i in range(4)))
# I use range() instead of xrange() because I am using Python 3.4
And now....the script is running like forever.....I open a python shell and enter the script line by line, and this is the messages I've got:
> Process SpawnPoolWorker-1:
> Process SpawnPoolWorker-2:
> Traceback (most recent call last):
> Process SpawnPoolWorker-3:
> Traceback (most recent call last):
> Process SpawnPoolWorker-4:
> Traceback (most recent call last):
> Traceback (most recent call last):
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
> AttributeError: Can't get attribute 'f' on
File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
Update: I edited my sample code according to some other example code like this:
from multiprocessing import Pool
import numpy as np
def f(x):
return 2*x
if __name__ == '__main__':
data = np.array([1,2,3])
with Pool(5) as p:
result = p.map(f, (data for i in range(300)))
And it works now. What I need to do now is to fill in this with more sophisticated algorithm now..
Yet another question in my mind is: why could this code work, while the previous version couldn't?
You can do it with multiprocessing this way:
from multiprocessing import Pool
def random_tree(Data):
return calculation(Data)
pool = Pool(processes=4)
forest = pool.map(random_tree, (Data for i in range(300)))
Package processing might help you. Check it out here.

One python multiprocess errors

I have one multprocess demo here, and I met some problems with it. Researched for a night, I cannot resolve the reason.
Any one can help me?
I want to have one parent process acts as producer, when there are tasks come, the parent can fork some children to consume these tasks. The parent monitors the child, if any one exits with exception, it can be restarted by parent.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from multiprocessing import Process, Queue from Queue import Empty import sys, signal, os, random, time import traceback
child_process = []
child_process_num = 4
queue = Queue(0)
def work(queue):
signal.signal(signal.SIGINT, signal.SIG_DFL)
signal.signal(signal.SIGTERM, signal.SIG_DFL)
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
time.sleep(10) #demo sleep
def kill_child_processes(signum, frame):
#terminate all children
pass
def restart_child_process(signum, frame):
global child_process
for i in xrange(len(child_process)):
child = child_process[i]
try:
if child.is_alive():
continue
except OSError, e:
pass
child.join() #join this process to make sure there is no zombie process
new_child = Process(target=work, args=(queue,))
new_child.start()
child_process[i] = new_child #restart one new process
child = None
return
if __name__ == '__main__':
reload(sys)
sys.setdefaultencoding("utf-8")
for i in xrange(child_process_num):
child = Process(target=work, args=(queue,))
child.start()
child_process.append(child)
signal.signal(signal.SIGINT, kill_child_processes)
signal.signal(signal.SIGTERM, kill_child_processes) #hook the SIGTERM
signal.signal(signal.SIGCHLD, restart_child_process)
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
When this program runs, there will be errors as below:
Error in atexit._run_exitfuncs:
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 30, in _run_exitfuncs
traceback.print_exc()
File "/usr/local/python/lib/python2.6/traceback.py", line 227, in print_exc
print_exception(etype, value, tb, limit, file)
File "/usr/local/python/lib/python2.6/traceback.py", line 124, in print_exception
_print(file, 'Traceback (most recent call last):')
File "/usr/local/python/lib/python2.6/traceback.py", line 12, in _print
def _print(file, str='', terminator='\n'):
File "test.py", line 42, in restart_child_process
new_child.start()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 99, in start
_cleanup()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
if p._popen.poll() is not None:
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
If I send signal to one child:kill –SIGINT {child_pid} I will get:
[root#mail1 mail]# kill -SIGINT 32545
[root#mail1 mail]# Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
res = self._popen.wait(timeout)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
return self.poll(0)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/local/python/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/local/python/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
p.join()
File "/usr/local/python/lib/python2.6/multiprocessing/process.py", line 119, in join
res = self._popen.wait(timeout)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 117, in wait
return self.poll(0)
File "/usr/local/python/lib/python2.6/multiprocessing/forking.py", line 106, in poll
pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
Main proc is waiting for all child procs to be terminated before exits itself so there's a blocking call (i.e. wait4) registered as at_exit handles. The signal you sent interrupts that blocking call thus the stack trace.
The thing I'm not clear about is that if the signal sent to child would be redirected to the parent process, which then interrupted that wait4 call. This is something related to the Unix process group behaviors.

Categories

Resources