Deadlock with logging in a multiprocess/multithread Python script

I am having trouble collecting logs from the following script.
Once I set SLEEP_TIME to too "small" a value, the LoggingThread
threads somehow block the logging module. The script freezes on a logging call
in the action function. If SLEEP_TIME is around 0.1, the script collects
all log messages as I expect.
I tried to follow this answer, but it does not solve my problem.
import multiprocessing
import threading
import logging
import time

SLEEP_TIME = 0.000001

logger = logging.getLogger()
ch = logging.StreamHandler()
ch.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
ch.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(ch)


class LoggingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        while True:
            logger.debug('LoggingThread: {}'.format(self))
            time.sleep(SLEEP_TIME)


def action(i):
    logger.debug('action: {}'.format(i))


def do_parallel_job():
    processes = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=processes)
    for i in range(20):
        pool.apply_async(action, args=(i,))
    pool.close()
    pool.join()


if __name__ == '__main__':
    logger.debug('START')

    #
    # multithread part
    #
    for _ in range(10):
        lt = LoggingThread()
        lt.setDaemon(True)
        lt.start()

    #
    # multiprocess part
    #
    do_parallel_job()

    logger.debug('FINISH')
How should the logging module be used in multiprocess and multithread scripts?

This is probably bug 6721.
The problem is common in any situation where you have locks, threads, and forks. If thread 1 holds a lock while thread 2 calls fork, then in the forked process only thread 2 exists and the lock is held forever. In your case, that lock is logging.StreamHandler.lock.
A fix for the logging module can be found here (permalink). Note that you need to take care of any other locks, too.
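On Python 3.7+, one way to apply the same idea yourself is os.register_at_fork(): give every handler a fresh lock in the forked child, so a lock that was copied in an acquired state can never deadlock it. A minimal sketch (recent Python versions already do something similar internally):

import logging
import os

logger = logging.getLogger()
logger.addHandler(logging.StreamHandler())


def _reinit_handler_locks():
    # Runs in the child right after fork(): replace each handler's lock
    # with a brand-new, unlocked one via the public createLock() hook.
    for handler in logger.handlers:
        handler.createLock()


# POSIX-only; has no effect on platforms that never fork.
os.register_at_fork(after_in_child=_reinit_handler_locks)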

I ran into a similar issue recently while using the logging module together with the Pathos multiprocessing library. I'm still not 100% sure, but it seems that in my case the problem may have been caused by the logging handler trying to reuse a lock object from different processes.
I was able to fix it with a simple wrapper around the default logging handler:
import threading
from collections import defaultdict
from multiprocessing import current_process

import colorlog


class ProcessSafeHandler(colorlog.StreamHandler):
    def __init__(self):
        super().__init__()
        self._locks = defaultdict(lambda: threading.RLock())

    def acquire(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].acquire()

    def release(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].release()
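For completeness, a usage sketch (assuming colorlog is installed; the wrapper attaches like any other handler):

import logging

logger = logging.getLogger()
handler = ProcessSafeHandler()
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(processName)s: %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('configured with a per-process handler lock')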

By default, multiprocessing will fork() the process in the pool when running on Linux. The resulting subprocess loses all running threads except the main one. So if you're on Linux, that's the problem.
First action item: you shouldn't ever use the fork()-based pool; see https://pythonspeed.com/articles/python-multiprocessing/ and https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods.
On Windows, and I think on newer versions of Python on macOS, the "spawn"-based pool is used. This is also what you ought to use on Linux. In this setup, a new Python process is started. As you would expect, the new process doesn't have any of the threads from the parent process, because it's a new process.
Second action item: you'll want logging setup done in each subprocess in the pool; the logging setup in the parent process isn't sufficient to get logs from the worker processes. You do this with the initializer keyword argument to Pool, e.g. write a function called setup_logging() and then do pool = multiprocessing.Pool(initializer=setup_logging) (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
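A minimal sketch of both action items together, assuming a hypothetical setup_logging() helper and the handler/format from the question:

import logging
import multiprocessing


def setup_logging():
    # Runs once in each worker process (and once in the parent); spawned
    # workers start without the parent's handlers, so configure them here.
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)


def action(i):
    logging.getLogger().debug('action: {}'.format(i))


if __name__ == '__main__':
    setup_logging()
    ctx = multiprocessing.get_context('spawn')  # avoid the fork()-based pool
    with ctx.Pool(initializer=setup_logging) as pool:
        pool.map(action, range(20))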

Related

Why does multiprocessing logging work without a QueueListener?

I have a multiprocessing setup, with a worker function and a parent function. Here is a simplified version:
from functools import partial
from typing import List
from logging import FileHandler, getLogger
from logging.handlers import QueueHandler, QueueListener
from multiprocessing import Manager, Pool, Queue


def parent_func(log_file: str, task_list: List[str]):
    # Setup the root logger
    logger = getLogger()
    logger.addHandler(FileHandler(log_file))

    # Setup the logging listener, which should log to the log file and to stdout
    manager = Manager()
    queue = manager.Queue()
    # Copy all handlers from the root logger
    listener = QueueListener(queue, *logger.handlers)
    listener.start()

    # Fork into workers
    with Pool() as pool:
        func = partial(
            worker_func,
            logging_queue=queue,
        )
        pool.map(func, task_list)

    listener.stop()


def worker_func(
    data: str,
    logging_queue: Queue,
):
    # Note: this logger is totally disconnected from the parent logger since it shares no handlers
    logger = getLogger(data)
    logger.handlers = []
    logger.addHandler(QueueHandler(logging_queue))

    logger.info("Processing {}".format(data))
The problems are:
Using this code, each logger.info call in the worker function logs everything to stdout twice, and
If I remove the listener variable entirely, I somehow still receive logs from the worker functions (even though they're running in a different process!)
This implies that I'm doing something very wrong, because the logging queue doesn't even seem to be necessary. What am I doing wrong here?
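For reference, a minimal sketch of the commonly recommended queue-based pattern (not the code above): the real handlers live only behind the listener in the parent, and each worker's root logger gets just a QueueHandler via the pool initializer. The file name run.log is only illustrative.

import logging
from logging.handlers import QueueHandler, QueueListener
from multiprocessing import Manager, Pool


def worker_init(queue):
    # Each worker's root logger forwards everything to the queue;
    # it has no FileHandler/StreamHandler of its own.
    root = logging.getLogger()
    root.handlers = [QueueHandler(queue)]
    root.setLevel(logging.INFO)


def work(item):
    logging.getLogger(__name__).info("Processing %s", item)


if __name__ == '__main__':
    queue = Manager().Queue()
    # The real handlers exist only here, behind the listener.
    listener = QueueListener(queue, logging.FileHandler('run.log'), logging.StreamHandler())
    listener.start()
    with Pool(initializer=worker_init, initargs=(queue,)) as pool:
        pool.map(work, ['a', 'b', 'c'])
    listener.stop()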

How can I inherit parent logger when using Python's multiprocessing? Especially for paramiko

I'm using Python's multiprocessing. I have set up the logger in the parent process, but I can't simply inherit the parent's logging settings.
I don't worry about logs getting mixed up, because I use multiprocessing not to run jobs concurrently but for time control, so only one subprocess is running at a time.
My code without multiprocessing:
from multiprocessing import Process

import paramiko
import logging
import sys


def sftp_read():
    # log.debug("Child process started")  # This line will cause exception if it is run in sub process.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    timeout = 60
    ssh.connect('my_server', username='my_user', password='my_password', timeout=timeout, auth_timeout=timeout,
                banner_timeout=timeout)
    sftp = ssh.open_sftp()
    fp = sftp.file('/home/my_user/my_file.txt')
    lines = fp.readlines()
    print ''.join(lines)
    fp.close()
    ssh.close()


def main():
    sftp_read()  # Call this function without multiprocessing


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout,
                        format='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
    log = logging.getLogger()
    log.setLevel(logging.DEBUG)
    main()
The above code works properly; paramiko prints its log normally, like below:
[2018-11-20 10:38:45,051] {transport.py:1746} DEBUG - starting thread (client mode): 0x3052208L
[2018-11-20 10:38:45,051] {transport.py:1746} DEBUG - Local version/idstring: SSH-2.0-paramiko_2.4.2
[2018-11-20 10:38:45,405] {transport.py:1746} DEBUG - Remote version/idstring: SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.6
[2018-11-20 10:38:45,405] {transport.py:1746} INFO - Connected (version 2.0, client OpenSSH_7.2p2)
But when I change the main function to the following code to control the time (limit the maximum running time of the SFTP read to 15 seconds):
def main():
    # Use multiprocessing to limit the running time to at most 15 seconds.
    p = Process(target=sftp_read)
    try:
        log.debug("About to start SSH")
        p.start()
        log.debug('Process started')
        p.join(15)
    finally:
        if p.is_alive():
            p.terminate()
            log.debug('Terminated')
        else:
            log.debug("Finished normally")
Paramiko no longer prints its log. Now I want to make the logging config the same as the parent's; how can I do that?
I don't want an answer telling me to get a logger again, because on my production server there is a global logging setting that might change from time to time; I can't configure my own logging setting outside the control of that global setting.
So I wonder if there is a way to configure my subprocess's logging settings to be the same as the parent's.
On POSIX systems, Python starts subprocesses with the fork system call. A child process created using fork is essentially a copy of everything in the parent process's memory. In your case, the child process will have access to the parent's logger.
Warning: fork copies everything, but it does not copy threads. Any threads running in the parent process do not exist in the child process.
import logging
from multiprocessing import Pool
from os import getpid


def runs_in_subprocess():
    logging.info(
        "I am the child, with PID {}".format(getpid()))


if __name__ == '__main__':
    logging.basicConfig(
        format='GADZOOKS %(message)s', level=logging.DEBUG)

    logging.info(
        "I am the parent, with PID {}".format(getpid()))

    with Pool() as pool:
        pool.apply(runs_in_subprocess)
Output:
GADZOOKS I am the parent, with PID 3884
GADZOOKS I am the child, with PID 3885
Notice how child processes in your pool inherit the parent process's logging configuration.
You might run into deadlocks, though, so beware of the following sequence:
Whenever a thread in the parent process writes a log message, it adds it to a Queue. That involves acquiring a lock.
If the fork() happens at the wrong time, the lock is copied in an acquired state.
The child process copies the parent’s logging configuration—including the queue.
Whenever the child process writes a log message, it tries to write it to the queue.
That means acquiring the lock, but the lock is already acquired.
The child process now waits for the lock to be released.
The lock will never be released, because the thread that would release it wasn’t copied over by the fork().
In Python 3, you can avoid this by using get_context:
from multiprocessing import get_context


def your_func():
    with get_context("spawn").Pool() as pool:
        # ... everything else is unchanged
Advice:
Use get_context to create a new Pool, and use processes from this pool to do the job for you.
Every process from the pool will have access to the parent process's log config.
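One caveat worth noting with "spawn": handlers configured under the if __name__ == '__main__': guard are not re-created in spawned children, so a common pattern is to pass the logging setup along explicitly via the pool's initializer. A sketch, assuming a hypothetical configure_logging() helper:

import logging
from multiprocessing import get_context
from os import getpid


def configure_logging():
    # Hypothetical helper: runs once in every worker, because a spawned
    # child starts fresh and does not inherit the parent's handlers.
    logging.basicConfig(format='GADZOOKS %(message)s', level=logging.DEBUG)


def runs_in_subprocess():
    logging.info("I am the child, with PID {}".format(getpid()))


if __name__ == '__main__':
    configure_logging()
    with get_context("spawn").Pool(initializer=configure_logging) as pool:
        pool.apply(runs_in_subprocess)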

Python - multiprocessing - processes became zombies

For a couple of weeks I have been trying to solve a problem with the multiprocessing module in Python (2.7.x).
Idea:
We have a message queue (RabbitMQ in our case). Create a listener on that queue and, on each message, spawn a task which processes that message.
Problem:
Everything works fine, but after a couple of hundred tasks, some subprocesses become zombies, which is the main problem.
We also have some limitations (such as a maximum number of tasks per machine), which in the end leads to the machine no longer processing any tasks.
Current implementation:
I created minimal code which should explain our approach:
# -*- coding: utf-8 -*-
from multiprocessing import Process
import signal
from threading import Lock


class Task(Process):
    def __init__(self, data):
        super(Task, self).__init__()
        self.data = data

    def run(self):
        # ignore sigchild signals in subprocess
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        self.do_job()  # long job there

    def do_job(self):
        # very long job
        pass


class MQListener(object):
    def __init__(self):
        self.tasks = []
        self.tasks_lock = Lock()
        self.register_signal_handler()
        mq = RabbitMQ()  # our own wrapper class, not shown in this snippet
        mq.listen("task_queue", self.on_message)

    def register_signal_handler(self):
        signal.signal(signal.SIGCHLD, self.on_signal_received)

    def on_signal_received(self, *_):
        self._check_existing_processes()

    def on_message(self, message):
        # ack message and create task
        task = Task(message)
        with self.tasks_lock:
            self.tasks.append(task)
        task.start()

    def _check_existing_processes(self):
        """
        go over all created tasks; if some are not alive, remove them from the tasks collection
        """
        try:
            with self.tasks_lock:
                running_tasks = []
                for w in self.tasks:
                    if not w.is_alive():
                        w.join()
                    else:
                        running_tasks.append(w)
                self.tasks = running_tasks
        except Exception:
            # log
            pass


if __name__ == '__main__':
    m = MQListener()
I'm quite open to using some library for this; if you can recommend one, that would be great as well.
Using SIGCHLD to catch child process termination has quite a few gotchas. The signal handler runs asynchronously, and multiple SIGCHLD signals might get aggregated.
In short, it's better not to use it unless you're really aware of how it works.
Your program also has another issue: what happens if you get 10000 messages at once? You'll spawn 10000 processes all at once and kill your machine.
You could use a process Pool and let it handle all these issues for you.
from multiprocessing import Pool


class MQListener(object):
    def __init__(self):
        self.pool = Pool()
        self.rabbitclient = RabbitMQ()

    def new_message(self, message):
        self.pool.apply_async(do_job, args=(message, ))

    def run(self):
        self.rabbitclient.listen("task_queue", self.new_message)


app = MQListener()
app.run()

Allow Greenlet to finish work at end of main module execution

I'm making a library that uses gevent to do some work asynchronously. I'd like to guarantee that the work is completed, even if the main module finishes execution.
class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        from gevent import spawn
        spawn(self._do)

    def _do(self):
        from gevent import sleep
        sleep(1)
        print 'Done!'


if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()
If you run this, you'll notice the program ends immediately, and Done! doesn't get printed.
Now, the main module doesn't know, or care, how separate_library actually accomplishes the work (or even that gevent is being used), so it's unreasonable to require joining there.
Is there any way separate_library can detect certain types of program exits, and stall until the work is done? Keyboard interrupts, SIGINTs, and sys.exit() should end the program immediately, as that is probably the expected behaviour.
Thanks!
Try spawning your gevent greenlets from a new thread that is not a daemon thread. Your program will not exit while this non-daemon thread is still running.
import gevent
import threading


class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        t = threading.Thread(target=self.spawn_gthreads)
        t.setDaemon(False)
        t.start()

    def spawn_gthreads(self):
        from gevent import spawn
        gthreads = [spawn(self._do, x) for x in range(10)]
        gevent.joinall(gthreads)

    def _do(self, sec):
        from gevent import sleep
        sleep(sec)
        print 'Done!'


if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()

Adding objects to queue without interruption

I would like to put two objects into two queues, but I've got to be sure the objects end up in both queues at the same time; therefore, the operation must not be interrupted in between - something like an atomic block. Does someone have a solution? Many thanks...
queue_01.put(car)
queue_02.put(bike)
You could use a Condition object. You can tell the threads to wait with cond.wait(), and signal when the queues are ready with cond.notify_all(). See, for example, Doug Hellmann's wonderful Python Module of the Week blog. His code uses multiprocessing; here I've adapted it for threading:
import threading
import Queue
import time


def stage_1(cond, q1, q2):
    """perform first stage of work, then notify stage_2 to continue"""
    with cond:
        q1.put('car')
        q2.put('bike')
        print 'stage_1 done and ready for stage 2'
        cond.notify_all()


def stage_2(cond, q):
    """wait for the condition telling us stage_1 is done"""
    name = threading.current_thread().name
    print 'Starting', name
    with cond:
        cond.wait()
        print '%s running' % name


def run():
    # http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html#synchronizing-threads-with-a-condition-object
    condition = threading.Condition()
    queue_01 = Queue.Queue()
    queue_02 = Queue.Queue()

    s1 = threading.Thread(name='s1', target=stage_1, args=(condition, queue_01, queue_02))
    s2_clients = [
        threading.Thread(name='stage_2[1]', target=stage_2, args=(condition, queue_01)),
        threading.Thread(name='stage_2[2]', target=stage_2, args=(condition, queue_02)),
    ]

    # Notice the stage_2 threads are started before the stage_1 thread, and yet they wait
    # until stage_1 finishes
    for c in s2_clients:
        c.start()
    time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_clients:
        c.join()


run()
Running the script yields
Starting stage_2[1]
Starting stage_2[2]
stage_1 done and ready for stage 2 <-- Notice that stage2 is prevented from running until the queues have been packed.
stage_2[2] running
stage_2[1] running
To atomically add to two different queues, acquire the locks for both queues first. That's easiest to do by making a subclass of Queue that uses a recursive lock (and that rebuilds its internal condition variables around that lock, so put() and get() actually use it).
import Queue  # Note: module renamed to "queue" in Python 3
import threading


class MyQueue(Queue.Queue):
    "Make a queue that uses a recursive lock instead of a regular lock"
    def __init__(self):
        Queue.Queue.__init__(self)
        self.mutex = threading.RLock()
        # Queue.Queue builds its condition variables around the original
        # lock in __init__, so rebuild them around the new RLock; otherwise
        # put()/get() would keep using the old lock, and holding self.mutex
        # from outside would not actually block other producers.
        self.not_empty = threading.Condition(self.mutex)
        self.not_full = threading.Condition(self.mutex)
        self.all_tasks_done = threading.Condition(self.mutex)


queue_01 = MyQueue()
queue_02 = MyQueue()

with queue_01.mutex:
    with queue_02.mutex:
        queue_01.put(1)
        queue_02.put(2)
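The same idea under Python 3, where the module is named queue, as a short sketch:

import queue
import threading


class MyQueue(queue.Queue):
    """Queue variant whose internal lock is a recursive lock."""
    def __init__(self):
        super().__init__()
        self.mutex = threading.RLock()
        # Rebuild the condition variables around the new lock, as above.
        self.not_empty = threading.Condition(self.mutex)
        self.not_full = threading.Condition(self.mutex)
        self.all_tasks_done = threading.Condition(self.mutex)


queue_01 = MyQueue()
queue_02 = MyQueue()

with queue_01.mutex, queue_02.mutex:
    queue_01.put('car')
    queue_02.put('bike')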
