I've been using Python multiprocessing for some task handling. The dev environment is Windows Server 2016 and Python 3.7.0.
Sometimes child processes stayed in the task list even though they seemed to have completed (the data had been written to the database). The impact is that logging got stuck and could no longer append the latest logs.
Here is the code. The main function starts a listener process and several worker processes:
queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer))
listener.start()
...
workers = []
for loop:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, plist))
    workers.append(worker)
    worker.start()
for w in workers:
    w.join()
...
queue.put_nowait(None)
listener.join()
The listener process ends when it gets None, which ends the whole task.
def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:
                break
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            Logger.error(str(e), exc_info=True)
The task is scheduled to run by Windows Task Scheduler.
Any idea why some multiprocessing processes were 'stuck' there?
It's been bothering me for some time. Thanks in advance.
Can I say for sure what is your problem? No. Can I say for sure you are doing something that can lead to a deadlock? Yes.
If you read the documentation carefully on multiprocessing.Queue, you will see the following warning:
Warning:
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
This means that, to be completely safe, you must join the listener process (which issues gets from the queue) before joining the worker processes (which issue puts to the queue). That ensures every message put on the queue has been read off it before you attempt to join the processes that did the puts.
But then how will the listener process know when to terminate? Currently it waits for the main process to write a None sentinel to the queue to signal quitting time, but in the new design the main process waits for the listener to terminate before it waits for the workers. Presumably you control the source of the process_start function, which produces the messages written to the queue, and presumably something triggers its decision to terminate. When these processes terminate, each of them must write a None sentinel to the queue to signal that it will produce no more messages. The function listener_process must then be passed an additional argument, the number of message producers, so that it knows how many sentinels to expect.
Unfortunately, I can't discern from what you have coded (i.e. for loop:) what that number of processes is, and it appears that you are instantiating each process with identical arguments. For the sake of clarity I will modify your code to something more explicit:
queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer, len(plist)))
listener.start()
...
workers = []
# There will be len(plist) producers of messages:
for p in plist:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, p))
    workers.append(worker)
    worker.start()
listener.join() # join the listener first
for w in workers:
    w.join()
....
def listener_process(queue, configurer, n_producers):
    configurer()
    sentinel_count = 0
    while True:
        try:
            record = queue.get()
            if record is None:
                sentinel_count += 1
                if sentinel_count == n_producers:
                    break # we are done
                continue
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            # Report through the root logger: Logger is unbound if the failure
            # happens before the first record has been handled.
            logging.getLogger().error(str(e), exc_info=True)
Update
Here is a complete example. To avoid the complexities of configuring various loggers with handlers, I am just using a simple print statement, but as you can see, everything is "logged":
import multiprocessing

def process_start(queue, p):
    for i in range(3):
        queue.put(p)
    queue.put(None) # Sentinel

def listener_process(queue, n_producers):
    sentinel_count = 0
    while True:
        try:
            record = queue.get()
            if record is None:
                sentinel_count += 1
                if sentinel_count == n_producers:
                    break # we are done
                continue
            if type(record) is not int:
                print(record)
        except Exception as e:
            print(e)

class Record:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f'name={self.name}, value={self.value}'

def main():
    plist = [Record('basic', 'A'), Record('basic', 'B'), Record('basic', 'C')]
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process, args=(queue, len(plist)))
    listener.start()
    workers = []
    # There will be len(plist) producer of messages:
    for p in plist:
        worker = multiprocessing.Process(target=process_start, args=(queue, p))
        workers.append(worker)
        worker.start()
    listener.join() # join the listener first
    for w in workers:
        w.join()

# Required for Windows:
if __name__ == '__main__':
    main()
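Prints (the relative order of lines from different workers may vary between runs, since the workers run concurrently):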
name=basic, value=A
name=basic, value=A
name=basic, value=A
name=basic, value=B
name=basic, value=B
name=basic, value=B
name=basic, value=C
name=basic, value=C
name=basic, value=C
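The example above swaps logging for print to stay short. As a rough sketch of how the same sentinel-counting listener might look with real log records, the version below assumes each worker logs through logging.handlers.QueueHandler, which the original code does not show. The handler argument, the logger names, and the StreamHandler default are hypothetical choices made only for illustration:

```python
import logging
import logging.handlers
import multiprocessing


def process_start(queue, name):
    """Worker: send log records through the queue, then one None sentinel."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # records go to the queue only
    handler = logging.handlers.QueueHandler(queue)
    logger.addHandler(handler)
    try:
        for i in range(3):
            logger.info("message %d from %s", i, name)
    finally:
        logger.removeHandler(handler)
        queue.put(None)  # sentinel: this producer will send nothing more


def listener_process(queue, n_producers, handler=None):
    """Listener: handle records until every producer has sent its sentinel."""
    if handler is None:
        handler = logging.StreamHandler()  # assumed destination: stderr
    sentinel_count = 0
    while sentinel_count < n_producers:
        record = queue.get()
        if record is None:
            sentinel_count += 1
            continue
        handler.handle(record)


def main():
    names = ["worker-A", "worker-B", "worker-C"]
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process, args=(queue, len(names)))
    listener.start()
    workers = [multiprocessing.Process(target=process_start, args=(queue, n)) for n in names]
    for w in workers:
        w.start()
    listener.join()  # join the listener first, as in the answer
    for w in workers:
        w.join()


# Required for Windows:
if __name__ == "__main__":
    main()
```

Putting the sentinel in a finally block means a worker that fails partway through still tells the listener it is done, so the listener does not wait forever for a missing sentinel.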
Related
My English is not good.
I'm reading the source code of Ansible 1.1.
The following is taken from "ansible-1.1/lib/ansible/runner/__init__.py":
def _executor_hook(job_queue, result_queue):
    # attempt workaround of https://github.com/newsapps/beeswithmachineguns/issues/17
    # this function also not present in CentOS 6
    if HAS_ATFORK:
        atfork()
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while not job_queue.empty():
        try:
            host = job_queue.get(block=False)
            result_queue.put(multiprocessing_runner._executor(host))
        except Queue.Empty:
            pass
        except:
            traceback.print_exc()

class Runner(object):
    # ...
    def _parallel_exec(self, hosts):
        ''' handles mulitprocessing when more than 1 fork is required '''
        manager = multiprocessing.Manager()
        job_queue = manager.Queue()
        for host in hosts:
            job_queue.put(host)
        result_queue = manager.Queue()
        workers = []
        for i in range(self.forks):
            prc = multiprocessing.Process(target=_executor_hook,
                args=(job_queue, result_queue))
            prc.start()
            workers.append(prc)
        try:
            for worker in workers:
                worker.join()
        except KeyboardInterrupt:
            for worker in workers:
                worker.terminate()
                worker.join()
When KeyboardInterrupt is caught, the terminate method is also called on each worker.
What is the difference between this and simply using pass?
try:
    for worker in workers:
        worker.join()
except KeyboardInterrupt:
    pass
This lets the workers do their thing until they're done:
try:
    for worker in workers:
        worker.join()
This gets called when you press Ctrl+C:
except KeyboardInterrupt:
    for worker in workers:
        worker.terminate()
        worker.join()
You are basically telling the program: "Don't let the workers finish their stuff, shut them down and get me out of here NOW"
On SIGINT, it will tell each worker to terminate, but then it still waits for them all to exit (completing whatever they do on termination) before it exits itself.
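A small sketch may make the difference concrete. It is not taken from Ansible; sleepy_worker and stop_workers are made-up names. With force=True each child is terminated and then joined, so the call returns quickly. With force=False the call only returns once every child has finished on its own, which, as far as I understand, is also what eventually happens with a bare pass, since non-daemonic children are joined when the parent exits:

```python
import multiprocessing
import time


def sleepy_worker(seconds):
    """Stand-in for a long-running job."""
    time.sleep(seconds)


def stop_workers(workers, force):
    """Join the workers; with force=True, terminate each one first."""
    for w in workers:
        if force:
            w.terminate()  # ask the OS to stop the child right away
        w.join()           # then wait until it has actually exited
    return [w.exitcode for w in workers]


def main():
    workers = [multiprocessing.Process(target=sleepy_worker, args=(2,)) for _ in range(2)]
    for w in workers:
        w.start()
    start = time.time()
    codes = stop_workers(workers, force=True)
    # With force=True the exit codes are non-zero and little time has passed;
    # with force=False the children would sleep for the full 2 seconds first.
    print(codes, round(time.time() - start, 1))


if __name__ == "__main__":
    main()
```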
I am using the Python multiprocessing library. Whenever one of the processes throws a timeout error, my application ends itself. I want to keep the processes up.
I have a function that subscribes to a queue and listens to incoming messages:
def process_msg(i):
    #get new message from the queue
    #process it
    import time
    time.sleep(10)
    return True
I have created a Pool that creates 6 processes and executes the process_msg() function above.
When the function times out, I want the Pool to call the function again and wait for new messages instead of exiting:
if __name__ == "__main__":
    import multiprocessing
    from multiprocessing import Pool
    pool = Pool(processes=6)
    collection = range(6)
    val = pool.map_async(process_msg, collection)
    try:
        res = val.get(5)
    except TimeoutError:
        print('timeout here')
    pool.close()
    pool.terminate()
    pool.join()
The code runs and when I get a timeout, the application terminates itself.
What I want it to do is print that the timeout has occurred and call the same function again.
What's the right approach?
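Here's a skeleton for a program that works. The main issue you had is the use of pool.terminate, which "Stops the worker processes immediately without completing outstanding work" (see the documentation). Note also that val.get() raises multiprocessing.TimeoutError, which is a different class from the built-in TimeoutError, so it has to be imported for the except clause to catch it.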
from multiprocessing import Pool, TimeoutError

def process_msg(i):
    #get new message from the queue
    #process it
    import time
    print(f"Starting to sleep, proxess # {i}")
    time.sleep(10)
    return True

def main():
    print("in main")
    pool = Pool(processes=6)
    collection = range(6)
    print("About to spawn sub processes")
    val = pool.map_async(process_msg, collection)
    while True:
        try:
            print("Waiting for results")
            res = val.get(3)
            print(f"Res is {res}")
            break
        except TimeoutError:
            print("Timeout here")
    print("Closing pool")
    pool.close()
    # pool.terminate() # do not terminate - it kills the child processes
    print("Joining pool")
    pool.join()
    print("exiting main")

if __name__ == "__main__":
    main()
The output of this code is:
in main
About to spawn sub processes
Waiting for results
Starting to sleep, proxess # 0
Starting to sleep, proxess # 1
Starting to sleep, proxess # 2
Starting to sleep, proxess # 3
Starting to sleep, proxess # 4
Starting to sleep, proxess # 5
Timeout here
Waiting for results
Timeout here
Waiting for results
Timeout here
Waiting for results
Res is [True, True, True, True, True, True]
Closing pool
Joining pool
exiting main
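The waiting loop can also be bounded so that it does not spin forever if a worker never returns. The sketch below is one possible variant, not part of the original answer: wait_with_retries and its max_tries parameter are hypothetical, the sleep is shortened, and ThreadPool (which offers the same map_async/get interface as Pool) is used only to keep the example light. It also relies on multiprocessing.TimeoutError being a separate class from the built-in TimeoutError, hence the explicit import:

```python
import time
from multiprocessing import TimeoutError as MPTimeoutError
from multiprocessing.pool import ThreadPool


def process_msg(i):
    """Stand-in for the real message handler."""
    time.sleep(0.5)
    return True


def wait_with_retries(async_result, timeout, max_tries):
    """Poll an AsyncResult, reporting each timeout instead of giving up."""
    for attempt in range(1, max_tries + 1):
        try:
            return async_result.get(timeout)
        except MPTimeoutError:
            print(f"Timeout {attempt}, still waiting")
    raise MPTimeoutError(f"no result after {max_tries} tries")


def main():
    # ThreadPool stands in for Pool here; the calls used are the same.
    pool = ThreadPool(processes=2)
    val = pool.map_async(process_msg, range(2))
    print(wait_with_retries(val, timeout=0.2, max_tries=10))
    pool.close()  # no terminate(): let outstanding work finish
    pool.join()


if __name__ == "__main__":
    main()
```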
I'm using Python 3. I have the following code that tries to catch CTRL+C when running a pool of async workers. Each worker just runs in an infinite loop waiting for messages to show up in a queue for processing. What I don't understand is why the log.info inside the except block and print message are not printed when I press CTRL+C. It just drops me back into the xterm. Is the with block doing something to prevent this?
def worker(q):
    """Worker to retrieve item from queue and process it.

    Args:
        q -- Queue
    """
    # Run in an infinite loop. Get an item from the queue to process it. We MUST call q.task_done() to indicate
    # that item is processed to prevent deadlock.
    while True:
        try:
            # item = q.get()
            q.get()
            # TODO: We'll do work here.
            log.info('Processed message')
        finally:
            q.task_done()

def some_func():
    ...
    # Run in an infinite loop unless killed by user.
    try:
        log.info('Create pool with worker=%d to process messages', args.workers)
        with mp.Pool(processes=4) as pool:
            p = pool.apply_async(worker, (queue,))
            p.get()
    except KeyboardInterrupt:
        log.info('Received CTRL-C ... exiting')
        pass
    print('got here')
    return 0
Use asyncio, not multiprocessing
Depending on the nature of work to be done (whether CPU intensive or IO intensive) you might try asyncio, which has a simple pattern for graceful shutdown:
def main():
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()
    try:
        loop.create_task(publish(queue))
        loop.create_task(consume(queue))
        loop.run_forever()
    except KeyboardInterrupt:
        logging.info("Process interrupted")
    finally:
        loop.close()
^ from https://www.roguelynn.com/words/asyncio-graceful-shutdowns/
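The snippet above does not define publish or consume. A self-contained sketch might look like the following, with hypothetical versions of both coroutines. It uses asyncio.run (Python 3.7+) instead of get_event_loop/run_forever, and a None sentinel so the run ends by itself rather than waiting for Ctrl+C:

```python
import asyncio


async def publish(queue, n):
    """Hypothetical producer: put n items, then a None sentinel."""
    for i in range(n):
        await queue.put(i)
    await queue.put(None)


async def consume(queue, results):
    """Hypothetical consumer: collect items until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(item)


async def run(n):
    queue = asyncio.Queue()
    results = []
    await asyncio.gather(publish(queue, n), consume(queue, results))
    return results


def main():
    try:
        print(asyncio.run(run(5)))
    except KeyboardInterrupt:
        # asyncio.run cancels the remaining tasks before re-raising
        print("Process interrupted")


if __name__ == "__main__":
    main()
```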
If you must use multiprocessing
This question has been answered here: Python multiprocessing : Killing a process gracefully
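One commonly used pattern is sketched here, under the assumption that the workers should ignore Ctrl+C and leave the decision to stop to the parent. The names init_worker and work are hypothetical, and the 30-second timeout is an arbitrary choice:

```python
import multiprocessing
import signal
import time


def init_worker():
    """Runs once in each pool worker: ignore SIGINT so only the parent sees Ctrl+C."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def work(i):
    """Stand-in for the real job."""
    time.sleep(0.1)
    return i * 2


def main():
    pool = multiprocessing.Pool(processes=2, initializer=init_worker)
    try:
        result = pool.map_async(work, range(4))
        print(result.get(timeout=30))
    except KeyboardInterrupt:
        print("Received CTRL-C ... exiting")
    finally:
        # Either all results are in or we are aborting, so stopping the workers is safe.
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    main()
```

Because the workers ignore SIGINT, the KeyboardInterrupt is raised only in the parent, which can log its message before the pool is torn down.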
In many cases I have a worker thread which pops data from a Queue and acts on it. At some kind of event I want my worker thread to stop. The simple solution is to add a timeout to the get call and check the Event/flag every time the get times out. This, however, has two problems:
Causes an unnecessary context switch
Delays the shutdown until a timeout occurs
Is there any better way to listen both to a stop event and to new data in the Queue? Is it possible to listen to two Queues at the same time and block until there's data in either one? (In this case one can use a second Queue just to trigger the shutdown.)
The solution I'm currently using:
from queue import Queue, Empty
from threading import Event, Thread
from time import sleep

def worker(exit_event, queue):
    print("Worker started.")
    while not exit_event.is_set():  # is_set() replaces the deprecated isSet()
        try:
            data = queue.get(timeout=10)
            print("got {}".format(data))
        except Empty:
            pass
    print("Worker quit.")

if __name__ == "__main__":
    exit_event = Event()
    queue = Queue()
    th = Thread(target=worker, args=(exit_event, queue))
    th.start()
    queue.put("Testing")
    queue.put("Hello!")
    sleep(2)
    print("Asking worker to quit")
    exit_event.set()
    th.join()
    print("All done..")
I guess you could easily reduce the timeout to somewhere between 0.1 and 0.01 sec. A slightly different solution is to use the queue to send both data and control commands to the thread:
import queue
import threading
import time

THREADSTOP = 0

class ThreadControl:
    def __init__(self, command):
        self.command = command

def worker(q):
    print("Worker started.")
    while True:
        data = q.get()
        if isinstance(data, ThreadControl):
            if data.command == THREADSTOP:
                break
        print("got {}".format(data))
    print("Worker quit.")

if __name__ == '__main__':
    q = queue.Queue()
    th = threading.Thread(target=worker, args=(q,))
    th.start()
    q.put("Testing")
    q.put("Hello!")
    time.sleep(2)
    print("Asking worker to quit")
    q.put(ThreadControl(command=THREADSTOP)) # sending command
    th.join()
    print("All done..")
Another option is to use sockets instead of queues.
I have a program that has two threads, the main thread and one additional that works on handling jobs from a FIFO queue.
Something like this:
import queue
import threading

q = queue.Queue()

def _worker():
    while True:
        msg = q.get(block=True)
        print(msg)
        q.task_done()

t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
What I want to accomplish is basically to make sure the queue is emptied before the main thread exits.
If I set t.daemon to True, the program will exit before the queue is emptied; if it's set to False, the program will never exit. Is there some way to make sure the thread running the _worker() method clears the queue on main thread exit?
The comments touch on using .join(), but depending on your use case, using a join may make threading pointless.
I assume that your main thread will be doing things other than adding items to the queue - and may be shut down at any point, you just want to ensure that your queue is empty before shutting down is complete.
At the end of your main thread, you could add a simple empty check in a loop.
while not q.empty():
    sleep(1)
If you don't set t.daemon = True then the thread will never finish. Setting the thread as a daemon thread will mean that the thread does not cause your program to stay running when the main thread finishes.
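Polling q.empty() can report an empty queue while the worker is still processing the last item it took. Since the worker in the question already calls q.task_done(), a possible alternative is q.join(), which blocks until every item that was put has been marked done. A minimal sketch, assuming the worker may run as a daemon thread and using a hypothetical processed list to record what was handled:

```python
import queue
import threading

q = queue.Queue()
processed = []  # hypothetical record of handled items, for illustration only


def _worker():
    while True:
        msg = q.get(block=True)
        try:
            processed.append(msg)  # do the real work here
        finally:
            q.task_done()  # lets q.join() know this item is finished


# Daemon thread: it will not keep the program alive once the main thread ends.
t = threading.Thread(target=_worker, daemon=True)
t.start()

for item in ('asdf-1', 'asdf-2', 'asdf-3'):
    q.put(item)

q.join()  # returns only after task_done() has been called for every item
print(processed)
```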
Put a special item (e.g. None) in the queue, that signals the worker thread to stop.
import queue
import threading

q = queue.Queue()

def _worker():
    while True:
        msg = q.get(block=True)
        if msg is None:
            return
        print(msg) # do your stuff here

t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
q.put(None)
t.join()