Why does multiprocessing.Process.join() hang?

I am using multiprocessing in this manner:
import os
import shutil
import Queue  # Python 2: needed for Queue.Empty
import multiprocessing as mp

# log, cursor, query and process_pdf are defined elsewhere in the program

def worker(thread_id, tasks, results):
    tmp_dir = 'temp_for_{}'.format(thread_id)
    os.makedirs(tmp_dir)
    try:
        while not tasks.empty():
            data = tasks.get()
            response = process_pdf(data, tmp_dir)
            results.put(response)
    except (KeyboardInterrupt, SystemExit):
        log.info('Interrupt signal received in thread %s.', thread_id)
    except Queue.Empty:
        pass
    except Exception:
        log.error("Unexpected error in %s", thread_id, exc_info=True)
    finally:
        shutil.rmtree(tmp_dir)
        log.info("Thread %s exit", thread_id)

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    for record in cursor.select(query):
        tasks.put(record)
    manager = mp.Manager()
    workers = [mp.Process(target=worker, args=(i, tasks, results)) for i in xrange(8)]
    for w in workers:
        w.start()
    try:
        for w in workers:
            w.join()
    except (KeyboardInterrupt, SystemExit):
        log.info('Interrupt signal received in main. Cleaning up main')
    finally:
        log.info('Got %s results. Saving', results.qsize())
        while not results.empty():
            cursor.update_one('documents', 'id', results.get())
        cursor.close()
Here's the output when I run this code:
14:34:04 15/10 INFO: Thread 6 exit
14:34:04 15/10 INFO: Thread 7 exit
14:34:21 15/10 INFO: Thread 3 exit
14:34:24 15/10 INFO: Thread 2 exit
14:34:24 15/10 INFO: Thread 1 exit
14:34:29 15/10 INFO: Thread 5 exit
14:34:36 15/10 INFO: Thread 0 exit
14:35:37 15/10 INFO: Thread 4 exit
Then I enter ^C after waiting for a while with no progress, and get this output:
^C14:37:16 15/10 INFO: Interrupt signal received in main. Cleaning up main
14:37:16 15/10 INFO: Got 16 results. Saving
And I get this traceback for all threads:
Process Process-9:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 261, in _bootstrap
    util._exit_function()
  File "/usr/lib64/python2.7/multiprocessing/util.py", line 328, in _exit_function
    _run_finalizers()
  File "/usr/lib64/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
    finalizer()
  File "/usr/lib64/python2.7/multiprocessing/util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 218, in _finalize_join
    thread.join()
  File "/usr/lib64/python2.7/threading.py", line 952, in join
    self.__block.wait()
  File "/usr/lib64/python2.7/threading.py", line 340, in wait
    waiter.acquire()
KeyboardInterrupt
Why is this hanging? If it's important, I can add that process_pdf() runs a few subprocesses with subprocess.Popen().

Big thanks to dano for his hint. The fix for this issue is to create the queues using Manager():
manager = mp.Manager()
tasks, results = manager.Queue(), manager.Queue()
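For context on why the original version hangs: a process that has put items on a plain mp.Queue keeps a feeder thread alive until everything it buffered has been flushed to the pipe, so joining the workers before draining results can deadlock (the "joining processes that use queues" caveat in the multiprocessing docs). Below is a minimal standalone sketch (hypothetical, not the original program) of the alternative pattern that keeps mp.Queue but drains results before joining:
# Hypothetical sketch: drain the results queue until every worker has sent a
# sentinel, and only then join the workers, so their feeder threads can flush.
import multiprocessing as mp

SENTINEL = None

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)   # stand-in for the real work
    results.put(SENTINEL)          # signal that this worker is done

if __name__ == '__main__':
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(100):
        tasks.put(i)
    for _ in workers:
        tasks.put(SENTINEL)        # one stop marker per worker
    finished = 0
    while finished < len(workers):
        item = results.get()
        if item is SENTINEL:
            finished += 1
        # else: save the result here
    for w in workers:
        w.join()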
Edit
Thanks to ShadowRanger: it looks like the exceptions-in-dispatch issue was fixed in 2.7.10, so we can now use multiprocessing.Pool with imap_unordered and don't need to write a wall of code for a simple job :) I haven't tried it yet, though.
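For reference, a rough sketch of that Pool-based version (untested; process_record is a hypothetical wrapper, and process_pdf, cursor and query come from the original program):
import os
import multiprocessing as mp

def process_record(record):
    # hypothetical wrapper around the original process_pdf(); each pool worker
    # reuses its own temp directory, keyed by the worker process name
    tmp_dir = 'temp_for_{}'.format(mp.current_process().name)
    if not os.path.isdir(tmp_dir):
        os.makedirs(tmp_dir)
    return process_pdf(record, tmp_dir)

if __name__ == "__main__":
    records = list(cursor.select(query))
    pool = mp.Pool(processes=8)
    try:
        for response in pool.imap_unordered(process_record, records):
            cursor.update_one('documents', 'id', response)
    finally:
        pool.close()
        pool.join()
        cursor.close()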

Related

Gracefully terminate multiprocessing based program

I am working on a Python service that spawns Processes to handle the workload. Since I don't know at the start of the service how many workers I will need, I chose not to use Pool. The following is a simplified version:
import multiprocessing as mp
import time
from datetime import datetime

def _print(s):  # just my cheap logging utility
    print(f'{datetime.now()} - {s}')

def run_in_process(q, evt):
    _print(f'starting process job')
    while not evt.is_set():  # True
        try:
            x = q.get(timeout=2)
            _print(f'received {x}')
        except:
            _print(f'timed-out')

if __name__ == '__main__':
    with mp.Manager() as manager:
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=run_in_process, args=(q, evt))
        p.start()
        time.sleep(2)

        data = 100
        while True:
            try:
                q.put(data)
                time.sleep(0.5)
                data += 1
                if data > 110:
                    break
            except KeyboardInterrupt:
                _print('finishing...')
                # p.terminate()
                break

        time.sleep(3)
        _print('setting event 0')
        evt.set()
        _print('joining process')
        p.join()
        _print('done')
The program works and exits gracefully, without any error messages. However, if I use Ctrl-C before I have all 10 events processed, I get the following error before it exits.
2022-04-01 12:41:06.866484 - received 101
2022-04-01 12:41:07.367628 - received 102
^C2022-04-01 12:41:07.507805 - timed-out
2022-04-01 12:41:07.507886 - finishing...
Process Process-2:
Traceback (most recent call last):
  File "/<path-omitted>/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/<path-omitted>/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "mp.py", line 10, in run_in_process
    while not evt.is_set(): # True
  File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1088, in is_set
    return self._callmethod('is_set')
  File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-01 12:41:10.511334 - setting event 0
Traceback (most recent call last):
  File "mp.py", line 42, in <module>
    evt.set()
  File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1090, in set
    return self._callmethod('set')
  File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 818, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
A few observations:
The double error message looks exactly the same when I press Ctrl-C with my actual project. I think this is a good representation of my problem.
If I add p.terminate(), it doesn't change the behavior when the program is left to finish by itself. But if I press Ctrl-C halfway through, I encounter the error message only once; I guess it comes from the main thread/process.
If I change while not evt.is_set(): in run_in_process to an infinite loop (while True:) and let the program finish its course, I continue to see periodic time-out prints, which makes sense. What I don't understand is that if I press Ctrl-C, the terminal starts spewing time-out messages with no time gap between them. What happened?
My ultimate question is: what is the correct way to construct this program so that when Ctrl-C is pressed (or, for that matter, when a termination signal is sent to the program), it stops gracefully?
I found a solution to this problem myself using the signal module.
The idea is to set up a signal catcher for specific signals, such as signal.SIGINT and signal.SIGTERM.
import multiprocessing as mp
import signal
from threading import Event

if __name__ == '__main__':
    main_evt = Event()

    def stop_main_handler(signum, frame):
        if not main_evt.is_set():
            main_evt.set()

    signal.signal(signal.SIGINT, stop_main_handler)

    with mp.Manager() as manager:
        # creating mp queue, event and process
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=..., args=(q, evt))
        p.start()

        while not main_evt.is_set():
            # processing data
            ...

        # cleanup
        evt.set()
        p.join()
Or you can wrap it in an object-oriented fashion:
import signal
import time
from threading import Event

class SignalCatcher(object):
    def __init__(self):
        self._main_evt = Event()
        # register the handler on construction (add SIGTERM too if needed)
        signal.signal(signal.SIGINT, self._stop_handler)

    def _stop_handler(self, signum, frame):
        if not self._main_evt.is_set():
            self._main_evt.set()

    def block_until_signaled(self):
        while not self._main_evt.is_set():
            time.sleep(2)
Then you can use it as follows:
if __name__ == '__main__':
    # This has to happen outside the with-block. The multiprocessing library
    # creates another process for the manager, and if you create the
    # SignalCatcher inside the with-context it fails to signal each process.
    sc = SignalCatcher()

    with mp.Manager() as manager:
        # create the process(es) and start them
        # ...
        sc.block_until_signaled()
        # cleanup
        # ...
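Putting the pieces together, here is a minimal end-to-end sketch based on the question's run_in_process worker (untested, and assuming the default fork start method on Linux, so the manager's child process and the worker inherit the SIGINT handler registered before they are created):
import multiprocessing as mp
import signal
import time
from threading import Event

class SignalCatcher(object):
    def __init__(self):
        self._main_evt = Event()
        signal.signal(signal.SIGINT, self._stop_handler)

    def _stop_handler(self, signum, frame):
        self._main_evt.set()

    def is_signaled(self):
        return self._main_evt.is_set()

def run_in_process(q, evt):
    while not evt.is_set():
        try:
            print('received', q.get(timeout=2))
        except Exception:
            pass  # queue.Empty timeout; keep polling

if __name__ == '__main__':
    sc = SignalCatcher()  # register the handler before the manager is created
    with mp.Manager() as manager:
        q = manager.Queue()
        evt = manager.Event()
        p = mp.Process(target=run_in_process, args=(q, evt))
        p.start()

        data = 100
        while not sc.is_signaled() and data <= 110:
            q.put(data)
            data += 1
            time.sleep(0.5)

        evt.set()  # ask the worker to stop
        p.join()
With the handler registered before the manager and the worker are created, Ctrl-C just sets an event in each process instead of raising KeyboardInterrupt, so the manager stays alive long enough for a clean shutdown.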

terminate python multiprocessing pool cleanly

I am using multiprocessing.Pool to work with an HTTP server in Python. It works great, but when I terminate, I get a slew of errors from all the SpawnPoolWorkers, and I'm just wondering how I avoid this.
My main code:
def run(self):
    global pool
    port = self.arguments.port
    pool = multiprocessing.Pool(processes=self.arguments.threads)
    with http.server.HTTPServer(("", port), Handler) as daemon:
        print(f"serving on port {port}")
        while True:
            try:
                daemon.handle_request()
            except KeyboardInterrupt:
                print("\nexiting")
                pool.terminate()
                pool.join()
                return 0
I've tried doing nothing to the pool, I've tried pool.close(), and I've tried not joining. But even if I just run that, never even accessing the port or submitting anything to the pool, I still get a random list of errors like this when I press Ctrl-C:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/opt/homebrew/Cellar/python@3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 365, in get
    with self._rlock:
  File "/opt/homebrew/Cellar/python@3.10/3.10.1/F
How do I exit the pool cleanly, with no errors and no output?
OK - I'm stupid - the Ctrl-C was also interrupting all the child processes. This fixed it:
import signal

def ignore_control_c():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

pool = multiprocessing.Pool(processes=self.arguments.threads, initializer=ignore_control_c)
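For context, a rough sketch of how that fits into the original run() method (untested; Handler and self.arguments come from the original program):
import http.server
import multiprocessing
import signal

def ignore_control_c():
    # runs once in every pool worker at startup, so SIGINT (Ctrl-C)
    # is only handled by the main process
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run(self):
    global pool
    port = self.arguments.port
    pool = multiprocessing.Pool(processes=self.arguments.threads,
                                initializer=ignore_control_c)
    with http.server.HTTPServer(("", port), Handler) as daemon:
        print(f"serving on port {port}")
        while True:
            try:
                daemon.handle_request()
            except KeyboardInterrupt:
                print("\nexiting")
                pool.terminate()
                pool.join()
                return 0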

Python APScheduler throwing exception after removing job

I am adding a job to a Redis jobstore, and on completion of the job I have added an event handler.
In the event handler I check the job's return value and, based on it, remove the job id from the jobstore. It is removed successfully, but immediately afterwards an exception is thrown.
Code
import time
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.events import EVENT_JOB_EXECUTED
import logging

logging.basicConfig()

scheduler = BackgroundScheduler()
scheduler.add_jobstore('redis')
scheduler.start()

def tick():
    print('Tick! The time is: %s' % datetime.now())
    return 'success'

def removing_jobs(event):
    if event.retval == 'success':
        scheduler.remove_job(event.job_id)

scheduler.add_listener(removing_jobs, EVENT_JOB_EXECUTED)

try:
    count = 0
    while True:
        count += 1
        time.sleep(10)
        job_ret = scheduler.add_job(tick, 'interval', id=str(count), seconds=10)
except (KeyboardInterrupt, SystemExit):
    scheduler.shutdown()
Exception
Exception in thread APScheduler:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/blocking.py", line 30, in _main_loop
    wait_seconds = self._process_jobs()
  File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/base.py", line 995, in _process_jobs
    jobstore.update_job(job)
  File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/jobstores/redis.py", line 91, in update_job
    raise JobLookupError(job.id)
apscheduler.jobstores.base.JobLookupError: 'No job by the id of 1 was found'
In short: you are removing the job while it is being processed, so you should remove the job outside of its execution.
The scheduler doesn't know what the job's execution will do, so it launches tick and then sends the job object to the Redis jobstore, expecting it to be executed again. Before that happens, the EVENT_JOB_EXECUTED listener launches removing_jobs.
The problem is that by the time the Redis jobstore fetches the job to update its status, the job has already been deleted, so it raises the JobLookupError.
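One possible way to do the removal outside the job's execution path (a sketch based on the question's code, not from the original answer; the tick job and the add_job loop are unchanged and omitted here) is to let the listener only record finished job ids and have the main loop remove them a moment later:
import queue
import time

from apscheduler.events import EVENT_JOB_EXECUTED
from apscheduler.jobstores.base import JobLookupError
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.add_jobstore('redis')
scheduler.start()

finished_jobs = queue.Queue()

def mark_finished(event):
    # runs in the scheduler's worker thread; only record the id here
    if event.retval == 'success':
        finished_jobs.put(event.job_id)

scheduler.add_listener(mark_finished, EVENT_JOB_EXECUTED)

try:
    while True:
        time.sleep(1)
        # remove finished jobs from the main thread, outside job processing
        while not finished_jobs.empty():
            try:
                scheduler.remove_job(finished_jobs.get())
            except JobLookupError:
                pass  # job already removed
except (KeyboardInterrupt, SystemExit):
    scheduler.shutdown()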

How to prevent BrokenPipeErrors after receiving a SIGINT while using process shared objects in Python?

This Python program:
import concurrent.futures
import multiprocessing
import time

class A:
    def __init__(self):
        self.event = multiprocessing.Manager().Event()

    def start(self):
        try:
            while True:
                if self.event.is_set():
                    break
                print("processing")
                time.sleep(1)
        except BaseException as e:
            print(type(e).__name__ + " (from pool thread):", e)

    def shutdown(self):
        self.event.set()

if __name__ == "__main__":
    try:
        a = A()
        pool = concurrent.futures.ThreadPoolExecutor(1)
        future = pool.submit(a.start)
        while not future.done():
            concurrent.futures.wait([future], timeout=0.1)
    except BaseException as e:
        print(type(e).__name__ + " (from main thread):", e)
    finally:
        a.shutdown()
        pool.shutdown()
outputs:
processing
processing
processing
KeyboardInterrupt (from main thread):
BrokenPipeError (from pool thread): [WinError 232] The pipe is being closed
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 788, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\foo.py", line 34, in <module>
    a.shutdown()
  File ".\foo.py", line 21, in shutdown
    self.event.set()
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 1067, in set
    return self._callmethod('set')
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 792, in _callmethod
    self._connect()
  File "C:\Program Files\Python37\lib\multiprocessing\managers.py", line 779, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 490, in Client
    c = PipeClient(address)
  File "C:\Program Files\Python37\lib\multiprocessing\connection.py", line 691, in PipeClient
    _winapi.WaitNamedPipe(address, 1000)
FileNotFoundError: [WinError 2] The system cannot find the file specified
when it is run and a SIGINT signal is sent after three seconds (by pressing Ctrl+C).
Analysis. — The SIGINT signal is sent to the main thread of each process. In this case there are two processes: the main process and the manager's child process.
In the main thread of the main process: after receiving the SIGINT signal, the default SIGINT signal handler raises the KeyboardInterrupt exception, which is caught and printed.
In the main thread of the manager's child process: in the mean time, after receiving the SIGINT signal, the default SIGINT signal handler raises a KeyboardInterrupt exception, which terminates the child process. Consequently all subsequent uses of the manager's shared objects by other processes raise a BrokenPipeError exception.
In the pool's child thread of the main process: in this case, a BrokenPipeError exception is raised at the line if self.event.is_set():.
In the main thread of the main process: Finally, the flow of control reaches the line a.shutdown(), which raises the AttributeError and FileNotFoundError exceptions.
How to prevent this BrokenPipeError exception?
A solution to this issue is to override the default SIGINT signal handler with a handler that ignores the signal, for instance the standard signal.SIG_IGN handler. This can be done by calling the signal.signal function at the start of the manager's child process:
import concurrent.futures
import multiprocessing.managers
import signal
import time

def init():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

class A:
    def __init__(self):
        manager = multiprocessing.managers.SyncManager()
        manager.start(init)
        self.event = manager.Event()

    def start(self):
        try:
            while True:
                if self.event.is_set():
                    break
                print("processing")
                time.sleep(1)
        except BaseException as e:
            print(type(e).__name__ + " (from pool thread):", e)

    def shutdown(self):
        self.event.set()

if __name__ == "__main__":
    try:
        a = A()
        pool = concurrent.futures.ThreadPoolExecutor(1)
        future = pool.submit(a.start)
        while not future.done():
            concurrent.futures.wait([future], timeout=0.1)
    except BaseException as e:
        print(type(e).__name__ + " (from main thread):", e)
    finally:
        a.shutdown()
        pool.shutdown()
Note. — This program also works with a concurrent.futures.ProcessPoolExecutor.
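A sketch of that variant (assumed, not from the original post): only the pool construction changes, and the same init function can also be passed as the worker initializer (supported since Python 3.7) so the pool's worker process ignores SIGINT as well:
if __name__ == "__main__":
    try:
        a = A()
        # process pool instead of a thread pool; init() (defined above) is
        # reused as the worker initializer so the worker also ignores SIGINT
        pool = concurrent.futures.ProcessPoolExecutor(1, initializer=init)
        future = pool.submit(a.start)
        while not future.done():
            concurrent.futures.wait([future], timeout=0.1)
    except BaseException as e:
        print(type(e).__name__ + " (from main thread):", e)
    finally:
        a.shutdown()
        pool.shutdown()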

How to use Popen to run background process and avoid zombie?

I have a listener server that runs a new thread for each client handler. Each handler can use:
proc = subprocess.Popen(argv, executable="./Main.py", stdout=_stdout,
                        stderr=subprocess.STDOUT, close_fds=False)
to run a new process in the background, after which the handler thread ends.
After the background process ends, it is kept in the Z (zombie) state. Is it possible to ask subprocess.Popen() to handle SIGCHLD so as to avoid this zombie?
I don't want to read the process state using proc.wait(), since for that I'd have to keep a list of all running background processes...
UPD
I need to run some processes in the background while avoiding zombies, and also to run some processes with .communicate() to read data from them. In that case, using the signal trick from koblas, I get an error:
File "./PyZWServer.py", line 115, in IsRunning
return (subprocess.Popen(["pgrep", "-c", "-x", name], stdout=subprocess.PIPE).communicate()[0] == "0")
File "/usr/lib/python2.6/subprocess.py", line 698, in communicate
self.wait()
File "/usr/lib/python2.6/subprocess.py", line 1170, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib/python2.6/subprocess.py", line 465, in _eintr_retry_call
return func(*args)
OSError: [Errno 10] No child processes
Error happened during handling of client
If you set the SIGCHLD disposition to SIG_IGN, the kernel will handle the wait/reap piece for you.
Specifically, the line:
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
will take care of your zombies.
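As the UPD notes, SIG_IGN makes the kernel reap every child automatically, so .wait()/.communicate() on the children you do care about can fail with ECHILD. An alternative sketch (assumed, not from the original answer) is to leave SIGCHLD alone and reap only the fire-and-forget children with a small watcher thread:
import subprocess
import threading

def popen_background(argv, **kwargs):
    # hypothetical helper: start a background process and reap it in a
    # daemon thread, so no zombie is left and no process list is needed
    proc = subprocess.Popen(argv, **kwargs)
    t = threading.Thread(target=proc.wait)
    t.daemon = True  # don't block interpreter exit
    t.start()
    return proc
Processes started the normal way (for example the pgrep call above) are unaffected and can still be waited on with .communicate().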
