terminate python multiprocessing pool cleanly

I am using multiprocessing.Pool alongside the HTTP server in Python. It works great, but when I terminate the program I get a slew of errors from all the SpawnPoolWorker processes, and I'm wondering how to avoid this.
My main code:
    def run(self):
        global pool
        port = self.arguments.port
        pool = multiprocessing.Pool(processes=self.arguments.threads)
        with http.server.HTTPServer(("", port), Handler) as daemon:
            print(f"serving on port {port}")
            while True:
                try:
                    daemon.handle_request()
                except KeyboardInterrupt:
                    print("\nexiting")
                    pool.terminate()
                    pool.join()
                    return 0
I've tried doing nothing to the pool, I've tried pool.close(), and I've tried not joining. But even if I just run that, without ever accessing the port or submitting anything to the pool, I still get a random list of things like this when I press Ctrl-C:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/opt/homebrew/Cellar/python#3.10/3.10.1/F
How do I exit the pool cleanly, with no errors and no output?

OK, I'm stupid: the Ctrl-C was also interrupting all the child processes. This fixed it:
    def ignore_control_c():
        signal.signal(signal.SIGINT, signal.SIG_IGN)

    pool = multiprocessing.Pool(processes=self.arguments.threads, initializer=ignore_control_c)
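For reference, here is a minimal, self-contained sketch of the same fix. The names ignore_sigint and main, the worker count and the abs workload are placeholders, not the original server code. The terminal delivers Ctrl-C to every process in the foreground process group, so each worker gets its own KeyboardInterrupt unless it ignores SIGINT. With the initializer in place, only the parent sees the interrupt and can shut the pool down quietly.

```python
import multiprocessing
import signal


def ignore_sigint():
    """Pool initializer: runs once in every worker process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def main():
    pool = multiprocessing.Pool(processes=2, initializer=ignore_sigint)
    try:
        # Placeholder workload; a real program would run its server loop here.
        print(pool.map(abs, [-1, -2, 3]))
    except KeyboardInterrupt:
        # Only the parent process reaches this branch on Ctrl-C.
        print("\nexiting")
    finally:
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    main()
```

The try/finally is a design choice of this sketch: the pool is torn down whether the work finishes normally or is interrupted.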

Related

Raise Exception if thread hangs

I have scripts running 24/7 that sometimes get stuck when a thread in concurrent.futures gets no response for a request.
The hanging-threads 2.0.5 module prints out which thread hangs and why.
The print looks something like this:
Thread 139646566659840 "ThreadPoolExecutor-666849_1" hangs -
File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
work_item.run()
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
How can I, instead of just printing out the hanging threads and files, raise an exception when a thread is not responding in a given time? The script should just restart itself if hanging occurs, instead of waiting for a response.
I have tried with timeout, but concurrent futures can not be cancelled while running.

concurrent futures can not be cancelled while running
This is your problem. A hanging thread is still 'running'. Cancelling it from outside is not possible.
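A small sketch of what that means in practice, using only the standard library. The names demo and blocked_call are made up, and the Event stands in for a request that never returns. The timeout on result() raises in the caller, but the worker thread keeps running and cancel() reports failure:

```python
import concurrent.futures
import threading


def demo():
    started = threading.Event()
    release = threading.Event()

    def blocked_call():
        started.set()
        release.wait()  # stands in for a request that never gets a response
        return "done"

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(blocked_call)
        started.wait()  # make sure the call is really running
        try:
            future.result(timeout=0.2)
            timed_out = False
        except concurrent.futures.TimeoutError:
            timed_out = True  # raised in the caller only, the thread lives on
        cancelled = future.cancel()  # False: a running call cannot be cancelled
        release.set()  # unblock the thread so the executor can shut down
    return timed_out, cancelled


if __name__ == "__main__":
    print(demo())
```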
Thus you have two options:
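switch to something which can be stopped from outside, like a separate process (a worker process can be terminated, which a thread cannot; note that Future.cancel() does not stop a call that is already running in a ProcessPoolExecutor either), or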
rewrite the blocking code so it fails.
Since you say 'response to a request': if this is a network request and you are early enough (or frustrated enough) in the dev cycle, I thoroughly recommend switching to an asynchronous concurrency framework like asyncio. This is exactly what such frameworks were developed for. In particular, you may be interested in trio's implementation of cancel scopes.
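To illustrate the asyncio route, here is a minimal sketch using asyncio.wait_for. The coroutine names and the events list are invented for the example, and asyncio.sleep stands in for a hanging network request. Unlike a thread, the pending coroutine really is cancelled when the deadline passes:

```python
import asyncio

events = []  # records what happened, for illustration only


async def slow_request():
    try:
        await asyncio.sleep(60)  # stands in for a request that hangs
        return "response"
    except asyncio.CancelledError:
        events.append("request cancelled")
        raise  # always re-raise so the cancellation completes


async def fetch_with_deadline(timeout):
    try:
        return await asyncio.wait_for(slow_request(), timeout=timeout)
    except asyncio.TimeoutError:
        events.append("timed out")
        return None  # a real script could raise or restart here


if __name__ == "__main__":
    print(asyncio.run(fetch_with_deadline(0.1)), events)
```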

Gracefully terminate multiprocessing based program

I am working on a python service that spawns Process to handle the workload. Since I don't know at the start of the service how many workers I need, I chose to not use Pool. The following is a simplified version:
    import multiprocessing as mp
    import time
    from datetime import datetime

    def _print(s): # just my cheap logging utility
        print(f'{datetime.now()} - {s}')

    def run_in_process(q, evt):
        _print(f'starting process job')
        while not evt.is_set(): # True
            try:
                x = q.get(timeout=2)
                _print(f'received {x}')
            except:
                _print(f'timed-out')

    if __name__ == '__main__':
        with mp.Manager() as manager:
            q = manager.Queue()
            evt = manager.Event()
            p = mp.Process(target=run_in_process, args=(q, evt))
            p.start()
            time.sleep(2)
            data = 100
            while True:
                try:
                    q.put(data)
                    time.sleep(0.5)
                    data += 1
                    if data > 110:
                        break
                except KeyboardInterrupt:
                    _print('finishing...')
                    #p.terminate()
                    break
            time.sleep(3)
            _print('setting event 0')
            evt.set()
            _print('joining process')
            p.join()
            _print('done')
The program works and exits gracefully, without any error messages. However, if I use Ctrl-C before I have all 10 events processed, I get the following error before it exits.
2022-04-01 12:41:06.866484 - received 101
2022-04-01 12:41:07.367628 - received 102
^C2022-04-01 12:41:07.507805 - timed-out
2022-04-01 12:41:07.507886 - finishing...
Process Process-2:
Traceback (most recent call last):
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "mp.py", line 10, in run_in_process
while not evt.is_set(): # True
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1088, in is_set
return self._callmethod('is_set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 819, in _callmethod
kind, result = conn.recv()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-01 12:41:10.511334 - setting event 0
Traceback (most recent call last):
File "mp.py", line 42, in <module>
evt.set()
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1090, in set
return self._callmethod('set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 818, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
A few observations:
The double error message looks exactly the same when I press Ctrl-C with my actual project. I think this is a good representation of my problem.
If I add p.terminate(), it doesn't change the behavior if the program is left to finish by itself. But if I press Ctrl-C halfway, I encounter the error message only once; I guess it's from the main thread/process.
If I change while not evt.is_set(): in run_in_process to an infinite loop, while True:, and let the program finish its course, I continue to see periodic timed-out prints, which makes sense. What I don't understand is that, if I press Ctrl-C, the terminal starts spewing timed-out without any time gap between them. What happened?
My ultimate question is: what is the correct way of constructing this program so that when Ctrl-C is used (or a termination signal is sent to the program, for that matter), the program stops gracefully?

I found a solution to this problem myself by using signal.
The idea is to set up a signal catcher to catch specific signals, such as signal.SIGINT, signal.SIGTERM.
    import multiprocessing as mp
    from threading import Event
    import signal
    import time

    if __name__ == '__main__':
        main_evt = Event()

        def stop_main_handler(signum, frame):
            if not main_evt.is_set():
                main_evt.set()

        signal.signal(signal.SIGINT, stop_main_handler)

        with mp.Manager() as manager:
            # creating mp queue, event and process
            q = manager.Queue()
            evt = manager.Event()
            # target is your worker function
            p = mp.Process(target=..., args=(q, evt))
            p.start()
            while not main_evt.is_set():
                # processing data
                time.sleep(0.5)  # placeholder so the loop has a body
            # cleanup
            evt.set()
            p.join()
Or you can wrap it in an object-oriented fashion:
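    class SignalCatcher(object):
        def __init__(self):
            self._main_evt = Event()
            # register the handler, otherwise the catcher never sees a signal
            signal.signal(signal.SIGINT, self._stop_handler)
            signal.signal(signal.SIGTERM, self._stop_handler)

        def _stop_handler(self, signum, frame):
            if not self._main_evt.is_set():
                self._main_evt.set()

        def block_until_signaled(self):
            while not self._main_evt.is_set():
                time.sleep(2)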
Then you can use it as follows:
    if __name__ == '__main__':
        sc = SignalCatcher()
        # This has to be outside the with-block. It seems that the
        # multiprocessing library creates another process; if sc is created
        # inside the with-context, it fails to signal each process.
        with mp.Manager() as manager:
            # creating process and starting it
            # ...
            sc.block_until_signaled()
            # cleanup
            # ...
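Putting the pieces together, a complete version might look like the sketch below. It deliberately differs from the code above in a few ways, all of which are assumptions of this example rather than part of the original answer: it uses mp.Queue and mp.Event instead of Manager objects (so there is no separate manager process to lose the connection to), the worker ignores SIGINT itself, and the worker catches only queue.Empty. The bare except in the question's worker also swallows KeyboardInterrupt and connection errors, which probably explains the burst of timed-out lines after Ctrl-C. The helper names worker_loop and log_item are made up.

```python
import multiprocessing as mp
import queue
import signal
import threading
import time


def worker_loop(q, stop_evt, handle, poll=0.5):
    """Consume items until stop_evt is set. Only queue.Empty is swallowed."""
    while not stop_evt.is_set():
        try:
            item = q.get(timeout=poll)
        except queue.Empty:
            continue  # nothing to do yet, check stop_evt again
        handle(item)


def log_item(item):
    print(f"received {item}")


def run_in_process(q, stop_evt):
    # The worker leaves Ctrl-C to the parent and keeps the default SIGTERM
    # behaviour, so p.terminate() still works as a last resort.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    worker_loop(q, stop_evt, log_item)


def main():
    main_evt = threading.Event()

    def stop_main_handler(signum, frame):
        main_evt.set()

    signal.signal(signal.SIGINT, stop_main_handler)
    signal.signal(signal.SIGTERM, stop_main_handler)

    q = mp.Queue()
    stop_evt = mp.Event()
    p = mp.Process(target=run_in_process, args=(q, stop_evt))
    p.start()

    for data in range(100, 105):
        if main_evt.is_set():
            break  # Ctrl-C or SIGTERM arrived, stop producing
        q.put(data)
        time.sleep(0.2)

    stop_evt.set()  # ask the worker to finish its current item and return
    p.join()
    print("done")


if __name__ == "__main__":
    main()
```

Because worker_loop only needs get(timeout=...) and is_set(), it can be exercised in a single process with queue.Queue and threading.Event, which makes the shutdown logic easy to test.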

python multiprocessing pool and additional queues

I would like to pass messages out from my function running in a process pool while the function is still running.
My application uses asyncio and multiprocessing queues to receive and distribute messages to a worker pool using the event loop's run_in_executor(). I manually created the pool so I could provide an initializer.
The problem I have is that I would like the functions that are running in the executor pool to be able to send messages out to the asyncio loop. This is how I started my new application process:
self._application = Application(self.outgoing_queue, self.incoming_queue, application_cores, log_level=logging.INFO)
self._application_process = mp.Process(target=self._application.run)
self._application_process.start()
The queues are created with:
self.outgoing_queue = mp.Queue()
self.incoming_queue = mp.Queue()
I can't use my asyncio queue or a multiprocessing queue, since those can't be passed to the worker process by this method:
    async def run_operation():
        kwargs = {
            'out_queue': self._work_pool_queue
        }
        func = functools.partial(attribute, *args, **kwargs)
        result = await self._loop.run_in_executor(self._work_pool, func)
        result_msg = common.messages.MessageResult(result, msg.reply_id, msg.cpu_cost)
        await self.outgoing_send(result_msg)

    asyncio.create_task(run_operation())
My self._work_pool is created with:
self._work_pool = concurrent.futures.ProcessPoolExecutor(max_workers=self._cores, initializer=_work_pool_init)
Passing the queue as an argument results in the following traceback:
Task exception was never retrieved
future: <Task finished coro=<Application.message_router.<locals>.run_operation() done, defined at c:\users\brian\gitlab\rf-applications\rfapplications\common\application.py:95> exception=RuntimeError('Queue objects should only be shared between processes through inheritance')>
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\Program Files\Python37\lib\multiprocessing\queues.py", line 236, in _feed
obj = _ForkingPickler.dumps(obj)
File "c:\Program Files\Python37\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "c:\Program Files\Python37\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "c:\Program Files\Python37\lib\multiprocessing\context.py", line 356, in assert_spawning
' through inheritance' % type(obj).__name__
RuntimeError: Queue objects should only be shared between processes through inheritance
"""
I was looking at using a Manager().Queue() (https://docs.python.org/3/library/multiprocessing.html#managers), since those can be sent to the process pool (see "Python multiprocessing Pool Queues communication"). However, these queues seem to open up the possibility of remote connections, which I would like to avoid (so far I use secure websockets to communicate between remote machines).

Python APSCheduler throwing exception after removing job

I am adding a job to a Redis job store, and I have added an event handler that fires when the job completes.
The job returns a value, and based on that value the event handler removes the job from the job store. The job is removed successfully, but an exception is thrown immediately afterwards.
Code
    from datetime import datetime
    from apscheduler.schedulers.background import BackgroundScheduler
    from apscheduler.events import EVENT_JOB_EXECUTED
    import logging
    import time  # needed for time.sleep below

    logging.basicConfig()
    scheduler = BackgroundScheduler()
    scheduler.add_jobstore('redis')
    scheduler.start()

    def tick():
        print('Tick! The time is: %s' % datetime.now())
        return 'success'

    def removing_jobs(event):
        if event.retval == 'success':
            scheduler.remove_job(event.job_id)

    scheduler.add_listener(removing_jobs, EVENT_JOB_EXECUTED)

    try:
        count = 0
        while True:
            count += 1
            time.sleep(10)
            job_ret = scheduler.add_job(tick, 'interval', id = str(count), seconds=10)
    except (KeyboardInterrupt, SystemExit):
        scheduler.shutdown()
Exception
Exception in thread APScheduler:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/blocking.py", line 30, in _main_loop
wait_seconds = self._process_jobs()
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/base.py", line 995, in _process_jobs
jobstore.update_job(job)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/jobstores/redis.py", line 91, in update_job
raise JobLookupError(job.id)
apscheduler.jobstores.base.JobLookupError: 'No job by the id of 1 was found'

In short: you are removing the job while it is still being processed, so you should remove the job outside its execution.
That's because the scheduler doesn't know what the job's execution will do, so it launches tick and then sends the job object to the Redis job store to be updated, expecting it to be executed again. Before that update happens, the EVENT_JOB_EXECUTED listener launches removing_jobs.
The problem is that by the time the Redis job store receives the job to update its status, the job has already been deleted, so it raises the JobLookupError.
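One possible way to move the removal outside the job's execution is to have the listener only record the job id and to do the actual removal later from the main loop. The sketch below shows that pattern without importing APScheduler; finished_job_ids, on_job_executed and remove_finished_jobs are hypothetical names, and whether this fully avoids the exception in a given APScheduler version is an assumption that would need to be checked.

```python
import queue

finished_job_ids = queue.Queue()  # thread-safe hand-off to the main loop


def on_job_executed(event):
    """Listener: only record the id, do not touch the job store here."""
    if event.retval == "success":
        finished_job_ids.put(event.job_id)


def remove_finished_jobs(remove_job):
    """Call from the main loop; remove_job is e.g. scheduler.remove_job."""
    removed = []
    while True:
        try:
            job_id = finished_job_ids.get_nowait()
        except queue.Empty:
            return removed
        remove_job(job_id)
        removed.append(job_id)


# Wiring with APScheduler (not executed in this sketch):
#   scheduler.add_listener(on_job_executed, EVENT_JOB_EXECUTED)
#   ...and inside the while loop of the main thread:
#   remove_finished_jobs(scheduler.remove_job)
```

If the job is only ever meant to run once, a one-shot trigger may be simpler than adding an interval job and removing it afterwards.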

How to use Popen to run background process and avoid zombie?

I have a listener server that starts a new thread for each client handler. Each handler can use:
proc = subprocess.Popen(argv, executable = "./Main.py", stdout = _stdout, stderr = subprocess.STDOUT, close_fds=False)
to run a new process in the background, after which the handler thread ends.
After the background process ends, it stays in the Z (zombie) state. Is it possible to ask subprocess.Popen() to handle SIGCHLD to avoid this zombie?
I don't want to read the process state using proc.wait(), since for that I would have to keep a list of all running background processes...
Update
I need to run some processes in the background while avoiding zombies, and to run other processes with .communicate() to read data from them. In that case, using the signal trick from koblas, I get an error:
File "./PyZWServer.py", line 115, in IsRunning
return (subprocess.Popen(["pgrep", "-c", "-x", name], stdout=subprocess.PIPE).communicate()[0] == "0")
File "/usr/lib/python2.6/subprocess.py", line 698, in communicate
self.wait()
File "/usr/lib/python2.6/subprocess.py", line 1170, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib/python2.6/subprocess.py", line 465, in _eintr_retry_call
return func(*args)
OSError: [Errno 10] No child processes
Error happened during handling of client

If you set SIGCHLD to be ignored, the kernel handles the wait/reap piece for you.
Specifically, the line:
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
will take care of your zombies.
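Ignoring SIGCHLD applies to every child of the process, which is why a later wait() or communicate() can fail with "No child processes", as in the update to the question. An alternative that keeps communicate() usable elsewhere is to reap each background process from a small daemon thread. This is a different technique from the one-line fix above, and spawn_background is a made-up helper name.

```python
import subprocess
import sys
import threading


def spawn_background(argv, **popen_kwargs):
    """Start a process and reap it from a daemon thread when it exits."""
    proc = subprocess.Popen(argv, **popen_kwargs)
    # proc.wait() collects the exit status, so no zombie is left behind.
    reaper = threading.Thread(target=proc.wait, daemon=True)
    reaper.start()
    return proc, reaper


if __name__ == "__main__":
    # Placeholder child: a Python interpreter that exits immediately.
    proc, reaper = spawn_background([sys.executable, "-c", "pass"])
    reaper.join(timeout=30)
    print(proc.returncode)
```

The handler thread can return right after calling spawn_background; nothing needs to keep a list of running processes, because each reaper thread owns exactly one child.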
