Gracefully terminate a multiprocessing-based program - python

I am working on a Python service that spawns Processes to handle the workload. Since I don't know at the start of the service how many workers I need, I chose not to use a Pool. The following is a simplified version:
import multiprocessing as mp
import time
from datetime import datetime


def _print(s):  # just my cheap logging utility
    print(f'{datetime.now()} - {s}')


def run_in_process(q, evt):
    _print(f'starting process job')
    while not evt.is_set():  # True
        try:
            x = q.get(timeout=2)
            _print(f'received {x}')
        except:
            _print(f'timed-out')
if __name__ == '__main__':
    with mp.Manager() as manager:
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=run_in_process, args=(q, evt))
        p.start()
        time.sleep(2)

        data = 100
        while True:
            try:
                q.put(data)
                time.sleep(0.5)
                data += 1
                if data > 110:
                    break
            except KeyboardInterrupt:
                _print('finishing...')
                # p.terminate()
                break

        time.sleep(3)
        _print('setting event 0')
        evt.set()
        _print('joining process')
        p.join()
        _print('done')
The program works and exits gracefully, without any error messages. However, if I press Ctrl-C before all the queued items have been processed, I get the following error before it exits.
2022-04-01 12:41:06.866484 - received 101
2022-04-01 12:41:07.367628 - received 102
^C2022-04-01 12:41:07.507805 - timed-out
2022-04-01 12:41:07.507886 - finishing...
Process Process-2:
Traceback (most recent call last):
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "mp.py", line 10, in run_in_process
while not evt.is_set(): # True
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1088, in is_set
return self._callmethod('is_set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 819, in _callmethod
kind, result = conn.recv()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-01 12:41:10.511334 - setting event 0
Traceback (most recent call last):
File "mp.py", line 42, in <module>
evt.set()
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1090, in set
return self._callmethod('set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 818, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
A few observations:
The double error message looks exactly the same as what I see when I press Ctrl-C in my actual project, so I think this is a good representation of my problem.
If I add p.terminate(), it doesn't change the behavior when the program is left to finish by itself. But if I press Ctrl-C halfway through, I encounter the error message only once; I guess it comes from the main thread/process.
If I change while not evt.is_set(): in run_in_process to an infinite loop (while True:) and let the program finish its course, I continue to see periodic timed-out prints, which makes sense. What I don't understand is that, if I press Ctrl-C, the terminal starts spewing timed-out messages with no time gap between them. What happened?
My ultimate question is: what is the correct way of constructing this program so that when Ctrl-C is pressed (or a termination signal is sent to the program, for that matter), it stops gracefully?

I found a solution to this problem myself by using the signal module.
The idea is to set up a signal handler to catch specific signals, such as signal.SIGINT and signal.SIGTERM.
import multiprocessing as mp
from threading import Event
import signal

if __name__ == '__main__':
    main_evt = Event()

    def stop_main_handler(signum, frame):
        if not main_evt.is_set():
            main_evt.set()

    signal.signal(signal.SIGINT, stop_main_handler)

    with mp.Manager() as manager:
        # creating mp queue, event and process
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=..., args=(q, evt))
        p.start()

        while not main_evt.is_set():
            ...  # processing data

        # cleanup
        evt.set()
        p.join()
Or you can wrap it in an object-oriented fashion:
import time
import signal
from threading import Event

class SignalCatcher(object):
    def __init__(self):
        self._main_evt = Event()
        signal.signal(signal.SIGINT, self._stop_handler)
        signal.signal(signal.SIGTERM, self._stop_handler)

    def _stop_handler(self, signum, frame):
        if not self._main_evt.is_set():
            self._main_evt.set()

    def block_until_signaled(self):
        while not self._main_evt.is_set():
            time.sleep(2)
Then you can use it as follows:
if __name__ == '__main__':
    sc = SignalCatcher()
    # this has to be outside the with-context. It seems that there is another
    # process created by the multiprocessing library; if you put the sc
    # creation inside the with-context, it would fail to signal each process.
    with mp.Manager() as manager:
        # creating process and starting it
        # ...
        sc.block_until_signaled()
        # cleanup
        # ...
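For completeness, here is a minimal end-to-end sketch that combines this SignalCatcher idea with the worker from the question. It is my own assembly, assumes a fork-based start method (Linux), and adds a small signaled() helper instead of block_until_signaled so the producer loop can poll it:
import multiprocessing as mp
import signal
import time
from queue import Empty
from threading import Event


class SignalCatcher(object):
    def __init__(self):
        self._main_evt = Event()
        # install handlers in the main process, before the manager is started
        signal.signal(signal.SIGINT, self._stop_handler)
        signal.signal(signal.SIGTERM, self._stop_handler)

    def _stop_handler(self, signum, frame):
        self._main_evt.set()

    def signaled(self):
        return self._main_evt.is_set()


def run_in_process(q, evt):
    # same worker loop as in the question: drain the queue until evt is set
    while not evt.is_set():
        try:
            x = q.get(timeout=2)
            print('received', x)
        except Empty:
            pass  # timed out, poll the event again


if __name__ == '__main__':
    sc = SignalCatcher()            # must be created outside the with-context
    with mp.Manager() as manager:
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=run_in_process, args=(q, evt))
        p.start()

        data = 100
        while not sc.signaled() and data <= 110:
            q.put(data)
            data += 1
            time.sleep(0.5)

        evt.set()   # ask the worker to finish its loop
        p.join()
        print('done')
With a fork start method the worker and the manager's child process inherit the installed handlers, so Ctrl-C no longer kills them abruptly and the shutdown stays orderly.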

Related

terminate python multiprocessing pool cleanly

I am using multiprocessing.Pool to work with an HTTP server in Python. It works great, but when I terminate, I get a slew of errors from all the SpawnPoolWorkers, and I'm just wondering how to avoid this.
My main code:
def run(self):
    global pool
    port = self.arguments.port
    pool = multiprocessing.Pool(processes=self.arguments.threads)
    with http.server.HTTPServer(("", port), Handler) as daemon:
        print(f"serving on port {port}")
        while True:
            try:
                daemon.handle_request()
            except KeyboardInterrupt:
                print("\nexiting")
                pool.terminate()
                pool.join()
                return 0
I've tried doing nothing to the pool, I've tried pool.close(), and I've tried not joining. But even if I just run that code and never access the port or submit anything to the pool, I still get a random list of tracebacks like this when I press Ctrl-C:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/opt/homebrew/Cellar/python@3.10/3.10.1/F
how do I exit the pool cleanly, with no errors, and with no output?
OK, I'm stupid: the Ctrl-C was also interrupting all the child processes. This fixed it:
def ignore_control_c():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

pool = multiprocessing.Pool(processes=self.arguments.threads, initializer=ignore_control_c)
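For reference, here is a minimal self-contained sketch of the same pattern (my own example, not the asker's server code): each pool worker installs SIG_IGN for SIGINT when it starts, so Ctrl-C is only seen by the parent, which can then terminate the pool without the worker tracebacks.
import multiprocessing
import signal
import time


def ignore_control_c():
    # runs once in every worker process: ignore SIGINT there, so only the
    # parent process gets the KeyboardInterrupt and can shut down cleanly
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def work(n):
    time.sleep(1)
    return n * n


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4, initializer=ignore_control_c)
    try:
        for result in pool.imap_unordered(work, range(20)):
            print(result)
        pool.close()
    except KeyboardInterrupt:
        print("\nexiting")
        pool.terminate()
    finally:
        pool.join()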

Python APSCheduler throwing exception after removing job

I am adding a job to a Redis jobstore, and on completion of the job an event handler runs.
In the event handler I check the job's return value and, based on it, remove the job id from the jobstore. The job is removed successfully, but immediately afterwards an exception is thrown.
Code
import time
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.events import EVENT_JOB_EXECUTED
import logging

logging.basicConfig()

scheduler = BackgroundScheduler()
scheduler.add_jobstore('redis')
scheduler.start()


def tick():
    print('Tick! The time is: %s' % datetime.now())
    return 'success'


def removing_jobs(event):
    if event.retval == 'success':
        scheduler.remove_job(event.job_id)


scheduler.add_listener(removing_jobs, EVENT_JOB_EXECUTED)

try:
    count = 0
    while True:
        count += 1
        time.sleep(10)
        job_ret = scheduler.add_job(tick, 'interval', id=str(count), seconds=10)
except (KeyboardInterrupt, SystemExit):
    scheduler.shutdown()
Exception
Exception in thread APScheduler:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/blocking.py", line 30, in _main_loop
wait_seconds = self._process_jobs()
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/base.py", line 995, in _process_jobs
jobstore.update_job(job)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/jobstores/redis.py", line 91, in update_job
raise JobLookupError(job.id)
apscheduler.jobstores.base.JobLookupError: 'No job by the id of 1 was found'
In short: you are removing the job while it is being processed, so you should remove the job outside of its execution.
That's because the scheduler doesn't know what the job's execution will do: it runs tick and then sends the job object back to the Redis jobstore, assuming it will be executed again. Before that update happens, the EVENT_JOB_EXECUTED listener runs removing_jobs.
The problem is that when the Redis jobstore then fetches the job to update its status, it has already been deleted, so it raises the JobLookupError.
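One possible way to act on that advice is to make the listener only record the job id and do the actual remove_job call from your own loop, after the scheduler has had a chance to update the job in the jobstore. This is a rough sketch of mine (the Redis jobstore is omitted so it runs as-is, and jobs_to_remove is just a hypothetical helper queue); there is still a small window in which the scheduler may be mid-update, so treat it as an illustration rather than a guaranteed fix:
import time
import queue
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.events import EVENT_JOB_EXECUTED

scheduler = BackgroundScheduler()
scheduler.start()

jobs_to_remove = queue.Queue()  # hypothetical helper, not part of APScheduler


def tick():
    print('Tick! The time is: %s' % datetime.now())
    return 'success'


def note_finished_job(event):
    # only record the id here; the removal happens outside the job execution
    if event.retval == 'success':
        jobs_to_remove.put(event.job_id)


scheduler.add_listener(note_finished_job, EVENT_JOB_EXECUTED)
scheduler.add_job(tick, 'interval', id='tick-1', seconds=10)

try:
    while True:
        time.sleep(1)
        try:
            scheduler.remove_job(jobs_to_remove.get_nowait())
        except queue.Empty:
            pass
except (KeyboardInterrupt, SystemExit):
    scheduler.shutdown()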

How to prevent BrokenPipeErrors after receiving a SIGINT while using process shared objects in Python?

This Python program:
import concurrent.futures
import multiprocessing
import time


class A:
    def __init__(self):
        self.event = multiprocessing.Manager().Event()

    def start(self):
        try:
            while True:
                if self.event.is_set():
                    break
                print("processing")
                time.sleep(1)
        except BaseException as e:
            print(type(e).__name__ + " (from pool thread):", e)

    def shutdown(self):
        self.event.set()


if __name__ == "__main__":
    try:
        a = A()
        pool = concurrent.futures.ThreadPoolExecutor(1)
        future = pool.submit(a.start)
        while not future.done():
            concurrent.futures.wait([future], timeout=0.1)
    except BaseException as e:
        print(type(e).__name__ + " (from main thread):", e)
    finally:
        a.shutdown()
        pool.shutdown()
outputs:
processing
processing
processing
KeyboardInterrupt (from main thread):
BrokenPipeError (from pool thread): [WinError 232] The pipe is being closed
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 788, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\foo.py", line 34, in <module>
a.shutdown()
File ".\foo.py", line 21, in shutdown
self.event.set()
File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 1067, in set
return self._callmethod('set')
File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 792, in _callmethod
self._connect()
File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 779, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 490, in Client
c = PipeClient(address)
File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 691, in PipeClient
_winapi.WaitNamedPipe(address, 1000)
FileNotFoundError: [WinError 2] The system cannot find the file specified
when it is run and a SIGINT signal is sent after three seconds (by pressing Ctrl+C).
Analysis. — The SIGINT signal is sent to the main thread of each process. In this case there are two processes: the main process and the manager's child process.
In the main thread of the main process: after receiving the SIGINT signal, the default SIGINT signal handler raises the KeyboardInterrupt exception, which is caught and printed.
In the main thread of the manager's child process: in the meantime, after receiving the SIGINT signal, the default SIGINT signal handler raises a KeyboardInterrupt exception, which terminates the child process. Consequently, all subsequent uses of the manager's shared objects by other processes raise a BrokenPipeError exception.
In the pool's child thread of the main process: in this case, a BrokenPipeError exception is raised at the line if self.event.is_set():.
In the main thread of the main process: Finally, the flow of control reaches the line a.shutdown(), which raises the AttributeError and FileNotFoundError exceptions.
How to prevent this BrokenPipeError exception?
A solution to this issue is to override the default SIGINT signal handler with a handler that ignores the signal, for instance the standard signal.SIG_IGN handler. This can be done by calling the signal.signal function at the start of the manager's child process:
import concurrent.futures
import multiprocessing.managers
import signal
import time


def init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)


class A:
    def __init__(self):
        manager = multiprocessing.managers.SyncManager()
        manager.start(init)
        self.event = manager.Event()

    def start(self):
        try:
            while True:
                if self.event.is_set():
                    break
                print("processing")
                time.sleep(1)
        except BaseException as e:
            print(type(e).__name__ + " (from pool thread):", e)

    def shutdown(self):
        self.event.set()


if __name__ == "__main__":
    try:
        a = A()
        pool = concurrent.futures.ThreadPoolExecutor(1)
        future = pool.submit(a.start)
        while not future.done():
            concurrent.futures.wait([future], timeout=0.1)
    except BaseException as e:
        print(type(e).__name__ + " (from main thread):", e)
    finally:
        a.shutdown()
        pool.shutdown()
Note. — This program also works with a concurrent.futures.ProcessPoolExecutor.

Why does multiprocessing.Process.join() hang?

I am using multiprocessing in this manner:
import os
import shutil
import Queue
import multiprocessing as mp

# process_pdf, log, cursor and query are defined elsewhere in the real program


def worker(thread_id, tasks, results):
    tmp_dir = 'temp_for_{}'.format(thread_id)
    os.makedirs(tmp_dir)
    try:
        while not tasks.empty():
            data = tasks.get()
            response = process_pdf(data, tmp_dir)
            results.put(response)
    except (KeyboardInterrupt, SystemExit):
        log.info('Interrupt signal received in thread %s.', thread_id)
    except Queue.Empty:
        pass
    except Exception:
        log.error("Unexpected error in %s", thread_id, exc_info=True)
    finally:
        shutil.rmtree(tmp_dir)
        log.info("Thread %s exit", thread_id)
if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    for record in cursor.select(query):
        tasks.put(record)
    manager = mp.Manager()
    workers = [mp.Process(target=worker, args=(i, tasks, results)) for i in xrange(8)]
    for worker in workers:
        worker.start()
    try:
        for worker in workers:
            worker.join()
    except (KeyboardInterrupt, SystemExit):
        log.info('Interrupt signal received in main. Cleaning up main')
    finally:
        log.info('Got %s results. Saving', results.qsize())
        while not results.empty():
            cursor.update_one('documents', 'id', results.get())
        cursor.close()
Here's the output when I run this code:
14:34:04 15/10 INFO: Thread 6 exit
14:34:04 15/10 INFO: Thread 7 exit
14:34:21 15/10 INFO: Thread 3 exit
14:34:24 15/10 INFO: Thread 2 exit
14:34:24 15/10 INFO: Thread 1 exit
14:34:29 15/10 INFO: Thread 5 exit
14:34:36 15/10 INFO: Thread 0 exit
14:35:37 15/10 INFO: Thread 4 exit
Then I enter ^C after waiting for a while with no progress, and get this output:
^C14:37:16 15/10 INFO: Interrupt signal received in main. Cleaning up main
14:37:16 15/10 INFO: Got 16 results. Saving
And I get this traceback for all threads:
Process Process-9:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 261, in _bootstrap
util._exit_function()
File "/usr/lib64/python2.7/multiprocessing/util.py", line 328, in _exit_function
_run_finalizers()
File "/usr/lib64/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
finalizer()
File "/usr/lib64/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/queues.py", line 218, in _finalize_join
thread.join()
File "/usr/lib64/python2.7/threading.py", line 952, in join
self.__block.wait()
File "/usr/lib64/python2.7/threading.py", line 340, in wait
waiter.acquire()
KeyboardInterrupt
Why is this hanging? If it's important, I can add that process_pdf() runs a few subprocesses with subprocess.Popen().
Big thanks to dano for his hint. The fix for this issue is to create the queues using Manager(). A plain multiprocessing.Queue pushes its data through a pipe from a background feeder thread, and (as documented under "Joining processes that use queues") a process that has put items on such a queue will not finish until that buffered data has been consumed, which is what the join was hanging on; manager queues don't have this behaviour:
manager = mp.Manager()
tasks, results = manager.Queue(), manager.Queue()
Edit
Thanks to ShadowRanger. It looks like the exception-dispatch issue was fixed in 2.7.10, so now we can use multiprocessing.Pool with imap_unordered and don't need to write a wall of code for a simple job :) But I haven't tried it yet.
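For what it's worth, a rough sketch of what that Pool-based version might look like (my own guess, untested; it still relies on the asker's process_pdf, cursor and query objects and drops the per-worker temp directory handling):
import multiprocessing as mp

# process_pdf, cursor and query are assumed to exist (from the asker's program)


def handle_record(record):
    return process_pdf(record, 'temp_dir')


if __name__ == "__main__":
    records = list(cursor.select(query))
    pool = mp.Pool(processes=8)
    try:
        for response in pool.imap_unordered(handle_record, records):
            cursor.update_one('documents', 'id', response)
        pool.close()
    except KeyboardInterrupt:
        pool.terminate()
    finally:
        pool.join()
        cursor.close()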

How to Quit program when all the thread have been finished?

#!/usr/bin/env python
import threading
import urllib, sys, os
import Queue

concurrent = 200
queue = Queue.Queue(concurrent * 2)
try:
    aim = sys.argv[1].lower()
    dic = open(sys.argv[2], 'r')
except:
    print "Usage: %s url wordlist" % sys.argv[0]
    sys.exit(1)


class Scanner(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            self.path = self.queue.get()
            self.geturl = urllib.urlopen(aim + '/' + self.path)
            self.status = self.geturl.getcode()
            self.url = aim + self.path
            self.result = self.url + '=>' + str(self.status)
            print self.result
            self.writeresult(self.result)
            self.queue.task_done()

    def writeresult(self, result):
        fp = open('result.txt', 'a+')
        fp.write(result + '\n')
        fp.close()


def main():
    for i in range(concurrent):
        t = Scanner(queue)
        t.setDaemon(True)
        t.start()
    for path in dic.readlines():
        queue.put(path.strip())
    queue.join()


if __name__ == '__main__':
    main()
It is a Python program to scan the directories of a website. When the scanning finishes, it does not quit, not even with Ctrl-C.
I want to know how to make the program quit automatically when it has finished scanning.
Also, while it is running, it sometimes produces errors like this:
Exception in thread Thread-130:
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "tt.py", line 28, in run
self.geturl = urllib.urlopen(aim+'/'+self.path)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 86, in urlopen
return opener.open(url)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 207, in open
return getattr(self, name)(url)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 344, in open_http
h.endheaders(data)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 776, in send
self.connect()
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 757, in connect
self.timeout, self.source_address)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known
I wanted some practice so I tried this out and changed a lot. Does it get you a full set of results? You will need to replace paths with your original argument reading.
With those threads, maybe you are getting unhandled exceptions resulting in missing results? I added a mechanism to catch any errors during reading and pass those to the result writer.
I guess appending to a file from multiple threads is OK, but I added a writer thread to manage the file more cleanly.
Most of the assignments to self were unnecessary.
If you still get socket errors, check the paths in the result file and decide how you want to handle those results, if at all.
I'm no expert, so don't take this as best practice.
import threading
import urllib
import Queue

concurrent = 5
aim = 'http://edition.cnn.com'
paths = ['2013/10/12/opinion/kazin-tea-party/index.html?hpt=hp_t5',
         '2013/10/11/opinion/opinion-hay-nobel-opcw/index.html?hpt=hp_t5',
         '2013/10/11/opinion/rosin-women-in-charge/index.html?hpt=hp_t5',
         'some invalid path',
         '2013']  # also an invalid path


def main():
    work_q = Queue.Queue()
    result_q = Queue.Queue()

    # start the scanners and the result writer
    scanners = [Scanner(work_q, result_q) for i in range(concurrent)]
    for s in scanners:
        s.start()
    results_file_path = 'results.txt'
    result_writer = ResultWriter(result_q, 'results.txt')
    result_writer.start()

    # send all the work and wait for it to be completed
    for path in paths:
        work_q.put(path.strip())
    work_q.join()

    # tell everyone to stop
    # you could just kill the threads but your writer needs to close the file
    for s in scanners:
        work_q.put(Scanner.STOP_TOKEN)
    result_q.put(ResultWriter.STOP_TOKEN)  # make sure file gets closed

    # wait for everyone to actually stop
    for s in scanners:
        s.join()
    result_writer.join()

    print 'the scan has finished and results are in {}'.format(results_file_path)


class Scanner(threading.Thread):
    STOP_TOKEN = '<<stop>>'

    def __init__(self, work_q, result_q):
        threading.Thread.__init__(self)
        self.work_q = work_q
        self.result_q = result_q

    def run(self):
        while True:
            path = status = None  # reset in case of error
            try:
                try:
                    path = self.work_q.get(timeout=0.00001)
                except Queue.Empty:
                    continue
                if path == self.STOP_TOKEN:
                    break  # stop looking for work
                get_url = urllib.urlopen(aim + '/' + path)
                status = get_url.getcode()
            except Exception as e:
                status = 'unhandled error ({})'.format(e)
            self.result_q.put((path, status))
            self.work_q.task_done()


class ResultWriter(threading.Thread):
    STOP_TOKEN = '<<stop>>'

    def __init__(self, result_q, results_file_path):
        threading.Thread.__init__(self)
        self.result_q = result_q
        self.results_file_path = results_file_path

    def run(self):
        with open(self.results_file_path, 'w') as results_file:
            while True:
                try:
                    result = self.result_q.get(timeout=0.00001)
                except Queue.Empty:
                    continue
                if result == self.STOP_TOKEN:
                    break  # stop looking for results
                path, status = result
                results_file.write('{}=>{}\n'.format(path, status))


if __name__ == '__main__':
    main()
The program, as it is, will close when all the threads have finished.
But to easily get rid of all those errors, in the run method of your class, wrap everything after the while True: line in a try/except clause, like this:
try:
    code
except:
    pass
It's not exactly the cleanest way to do it, but considering what you are after, it will do the job and get rid of those exceptions, which, by the way, mean that some of the URLs timed out.
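Applied to the Scanner class from the question, that could look roughly like this (my own rendering of the suggestion; I also moved task_done() into a finally block so that queue.join() in main() still returns, and the program can exit, even when a request fails):
# drop-in replacement for Scanner.run; aim, writeresult and the queue
# come from the question's code
def run(self):
    while True:
        path = self.queue.get()
        try:
            geturl = urllib.urlopen(aim + '/' + path)
            status = geturl.getcode()
            result = aim + path + '=>' + str(status)
            print result
            self.writeresult(result)
        except:
            pass  # ignore failed requests (DNS errors, timeouts, ...)
        finally:
            self.queue.task_done()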
