I am using multiprocessing.pool to work with the http server in python - it works great, but when I terminate, I get a slew of errors from all the spawnpoolworkers - and I'm just wondering how I avoid this.
My main code:
def run(self):
global pool
port = self.arguments.port
pool = multiprocessing.Pool( processes= self.arguments.threads)
with http.server.HTTPServer( ("", port), Handler ) as daemon:
print(f"serving on port {port}")
while True:
try:
daemon.handle_request()
except KeyboardInterrupt:
print("\nexiting")
pool.terminate()
pool.join()
return 0
I've tried doing nothing to the pool, I've tried doing pool.close() - I've tried not joining. But even if I just run that - never even access the port or call anything onto the pool, I still get a random list of things like this when I press control-c
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/opt/homebrew/Cellar/python#3.10/3.10.1/F
how do I exit the pool cleanly, with no errors, and with no output?
ok - I'm stupid - the control-c was also interrupting all the child processes. This fixed it:
def ignore_control_c():
signal.signal(signal.SIGINT, signal.SIG_IGN)
pool = multiprocessing.Pool( processes = self.arguments.threads, initializer = ignore_control_c )
Related
I have scripts running 24/7 that sometimes get stuck when a thread in concurrent.futures gets no response for a request.
The hanging-threads 2.0.5 module prints out which thread hangs and why.
The print looks something like this:
Thread 139646566659840 "ThreadPoolExecutor-666849_1" hangs -
File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
work_item.run()
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
How can I, instead of just printing out the hanging threads and files, raise an exception when a thread is not responding in a given time? The script should just restart itself if hanging occurs, instead of waiting for a response.
I have tried with timeout, but concurrent futures can not be cancelled while running.
concurrent futures can not be cancelled while running
This is your problem. A hanging thread is still 'running'. Cancelling it from outside is not possible.
Thus you have two options:
switch to something which can be cancelled, like a ProcessPoolExecutor, or
rewrite the blocking code so it fails.
Since you say 'response to a request'---if this is a network request and you are early enough/frustrated enough in the dev cycle I thoroughly recommend switching to a concurrent multiprocessing framework like asyncio. This is exactly what they were developed for. In particular you may be interested in trios implementation of cancel scopes.
I am working on a python service that spawns Process to handle the workload. Since I don't know at the start of the service how many workers I need, I chose to not use Pool. The following is a simplified version:
import multiprocessing as mp
import time
from datetime import datetime
def _print(s): # just my cheap logging utility
print(f'{datetime.now()} - {s}')
def run_in_process(q, evt):
_print(f'starting process job')
while not evt.is_set(): # True
try:
x = q.get(timeout=2)
_print(f'received {x}')
except:
_print(f'timed-out')
if __name__ == '__main__':
with mp.Manager() as manager:
q = manager.Queue()
evt = manager.Event()
p = mp.Process(target=run_in_process, args=(q, evt))
p.start()
time.sleep(2)
data = 100
while True:
try:
q.put(data)
time.sleep(0.5)
data += 1
if data > 110:
break
except KeyboardInterrupt:
_print('finishing...')
#p.terminate()
break
time.sleep(3)
_print('setting event 0')
evt.set()
_print('joining process')
p.join()
_print('done')
The program works and exits gracefully, without any error messages. However, if I use Ctrl-C before I have all 10 events processed, I get the following error before it exits.
2022-04-01 12:41:06.866484 - received 101
2022-04-01 12:41:07.367628 - received 102
^C2022-04-01 12:41:07.507805 - timed-out
2022-04-01 12:41:07.507886 - finishing...
Process Process-2:
Traceback (most recent call last):
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/<path-omitted>/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "mp.py", line 10, in run_in_process
while not evt.is_set(): # True
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1088, in is_set
return self._callmethod('is_set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 819, in _callmethod
kind, result = conn.recv()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-01 12:41:10.511334 - setting event 0
Traceback (most recent call last):
File "mp.py", line 42, in <module>
evt.set()
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 1090, in set
return self._callmethod('set')
File "/<path-omitted>/python3.7/multiprocessing/managers.py", line 818, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/<path-omitted>/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
A few observations:
The double error message looks exactly the same when I press Ctrl-C with my actual project. I think this is a good representation of my problem.
If I add p.terminate(), it doesn't change the behavior if the program is left to finish by itself. But if I press Ctrl-C halfway, I encounter the error message only once, I guess it's from the main thread/process.
If I change while not evt.is_set(): in run_in_process to an infinite loop: while Tre: and let the program finish its course I would continue to see periodic time-out prints which make sense. What I don't understand is that, if I press Ctrl-C, then the terminal will start spewing time-out without any time gap between them. What happened?
My ultimate question is: what is the correct way of construct this program so that when Ctrl-C is used (or a termination signal is generated to the program for that matter), the program stops gracefully?
I found out a solution to this problem myself by using signal.
The idea is to set up a signal catcher to catch specific signals, such as signal.SIGINT, signal.SIGTERM.
import multiprocessing as mp
from threading import Event
import signal
if __name__ == '__main__':
main_evt = Event()
def stop_main_handler(signum, frame):
if not main_evt.is_set():
main_evt.set()
signal.signal(signal.SIGINT, stop_main_handler)
with mp.Manager() as manager:
# creating mp queue, event and process
q = manager.Queue()
evt = manager.Event()
p = mp.Process(target=..., args=(q, evt))
p.start()
while not main_evt.is_set():
# processing data
# cleanup
evt.set()
p.join()
Or you can wrap it in an object-oriented fashion:
class SignalCatcher(object):
def __init__(self):
self._main_evt = Event()
def _stop_handler(self, signum, frame):
if not self._main_evt.is_set():
self._main_evt.set()
def block_until_signaled(self):
while not self._main_evt.is_set()
time.sleep(2)
Then you can use it as follows:
if __name__ == '__main__':
sc = SignalCatcher()
# this has to be outside. It seems that there is another process
# created by multiprocessing library, if you put sc creation in
# with-context, it would fail to signal each process.
with mp.Manager() as manager:
# creating process and starting it
# ...
sc.block_until_signaled()
# cleanup
# ...
I would like to pass messages out from my function running in a process pool while the function is still running.
My application uses asyncio and multiprocessing queues to receive and distribute messages to a worker pool using asyncio.run_in_executor(). I manually created the pool so I could provide an initializer.
The problem I have is that I would like the functions that are running in the executor pool to be able to send messages out to the asyncio loop. This is how I started my new application process:
self._application = Application(self.outgoing_queue, self.incoming_queue, application_cores, log_level=logging.INFO)
self._application_process = mp.Process(target=self._application.run)
self._application_process.start()
the queues are from:
self.outgoing_queue = mp.Queue()
self.incoming_queue = mp.Queue()
I can't use my asyncio queue, or multiprocessing queue since those can't be passed to the process by this method:
async def run_operation():
kwargs = {
'out_queue': self._work_pool_queue
}
func = functools.partial(attribute, *args, **kwargs)
result = await self._loop.run_in_executor(self._work_pool, func)
result_msg = common.messages.MessageResult(result, msg.reply_id, msg.cpu_cost)
await self.outgoing_send(result_msg)
asyncio.create_task(run_operation())
My self._work_pool is created with:
self._work_pool = concurrent.futures.ProcessPoolExecutor(max_workers=self._cores, initializer=_work_pool_init)
since the following traceback results:
Task exception was never retrieved
future: <Task finished coro=<Application.message_router.<locals>.run_operation() done, defined at c:\users\brian\gitlab\rf-applications\rfapplications\common\application.py:95> exception=RuntimeError('Queue objects should only be shared between processes through inheritance')>
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\Program Files\Python37\lib\multiprocessing\queues.py", line 236, in _feed
obj = _ForkingPickler.dumps(obj)
File "c:\Program Files\Python37\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "c:\Program Files\Python37\lib\multiprocessing\queues.py", line 58, in __getstate__
context.assert_spawning(self)
File "c:\Program Files\Python37\lib\multiprocessing\context.py", line 356, in assert_spawning
' through inheritance' % type(obj).__name__
RuntimeError: Queue objects should only be shared between processes through inheritance
"""
I was looking at using a Manager().Queue() (https://docs.python.org/3/library/multiprocessing.html#managers) since those can be sent to the process pool (Python multiprocessing Pool Queues communication). However, these queues seem to open up the possibility of remote connections, which I would like to avoid (I use secure websockets to communicate between remote machines right so far).
I am adding job in redis and on completion of job I have added an event handler.
In eventhandler I am returning value based on which I am removing job id from jobstore. It is removed successfully but immediately it throws an exception.
Code
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.events import EVENT_JOB_EXECUTED
import logging
logging.basicConfig()
scheduler = BackgroundScheduler()
scheduler.add_jobstore('redis')
scheduler.start()
def tick():
print('Tick! The time is: %s' % datetime.now())
return 'success'
def removing_jobs(event):
if event.retval == 'success':
scheduler.remove_job(event.job_id)
scheduler.add_listener(removing_jobs, EVENT_JOB_EXECUTED)
try:
count = 0
while True:
count += 1
time.sleep(10)
job_ret = scheduler.add_job(tick, 'interval', id = str(count), seconds=10)
except (KeyboardInterrupt, SystemExit):
scheduler.shutdown()
Exception
Exception in thread APScheduler:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/blocking.py", line 30, in _main_loop
wait_seconds = self._process_jobs()
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/schedulers/base.py", line 995, in _process_jobs
jobstore.update_job(job)
File "/.virtualenvs/py3/lib/python3.5/site-packages/apscheduler/jobstores/redis.py", line 91, in update_job
raise JobLookupError(job.id)
apscheduler.jobstores.base.JobLookupError: 'No job by the id of 1 was found'
In short: you are removing the job while it is processed;
so you should remove the job outside its execution.
That's because the scheduler doesn't know what the job's execution will do; so it launches tick and sends a job object to the redis jobstore thinking it will be executed again. Before that, the EVENT_JOB_LISTENER launches removing_jobs.
The problem is that when the redis' jobstore gets the job for update its status, it is already deleted so it raises the JobLookupError.
I've a listener server running new thread to for each client handler. Each handler can use:
proc = subprocess.Popen(argv, executable = "./Main.py", stdout = _stdout, stderr = subprocess.STDOUT, close_fds=False)
to run new process in background, after what the handler thread is ended.
After the background process is ended, it is kept in Z state. Is it possible to ask subprocess.Popen() to handle SIG_CHILD to avoid this zombie?
I don't want to read process state using proc.wait(), since for this I've to save the list of all running background processes...
UPD
I need to run some processes in background avoiding zombies and to run some processes with .communicate() to read data from these processes. In that case using signal trick from koblas I get an error:
File "./PyZWServer.py", line 115, in IsRunning
return (subprocess.Popen(["pgrep", "-c", "-x", name], stdout=subprocess.PIPE).communicate()[0] == "0")
File "/usr/lib/python2.6/subprocess.py", line 698, in communicate
self.wait()
File "/usr/lib/python2.6/subprocess.py", line 1170, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib/python2.6/subprocess.py", line 465, in _eintr_retry_call
return func(*args)
OSError: [Errno 10] No child processes
Error happened during handling of client
If you add a signal handler for SIGCHLD you will have the kernel handle the wait/reap piece.
Specifically the line:
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
Will take care of your Zombies.