Python logging handler to append to list

Somewhat similar to this (unanswered) question here:
https://stackoverflow.com/questions/33136398/python-logging-handler-that-appends-messages-to-a-list
I am looking to use a handler to simply append anything (passing a filter) to a python list. I would envisage some kind of code like this:
import logging
import sys
mylog = logging.getLogger()
mylog.setLevel(logging.DEBUG)
log_list = []
lh = logging.SomeHandler(log_list)
lh.setLevel(logging.DEBUG)
mylog.addHandler(lh)
mylog.warning('argh')
print log_list[0]
The question is therefore - how can I implement this? What SomeHandler can I use?

Here is a naive, non thread-safe implementation:
import logging
class ListHandler(logging.Handler):  # Inherit from logging.Handler
    def __init__(self, log_list):
        # run the regular Handler __init__
        logging.Handler.__init__(self)
        # Our custom argument
        self.log_list = log_list

    def emit(self, record):
        # record.msg is the raw, unformatted log message
        self.log_list.append(record.msg)
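A quick usage sketch of the handler above, wired into the snippet from the question (the names mirror the question's code):

import logging

log_list = []
lh = ListHandler(log_list)      # the handler defined above
lh.setLevel(logging.DEBUG)

mylog = logging.getLogger()
mylog.setLevel(logging.DEBUG)
mylog.addHandler(lh)

mylog.warning('argh')
print(log_list[0])              # prints: argh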

@imriqwe's answer is correct for a non-thread-safe implementation, but if you need to be thread-safe, one solution is to use a queue.Queue() instead of a list. Here is some code I am using in an in-process project to generate a tkinter log window.
import logging
import queue
class QueuingHandler(logging.Handler):
    """A thread safe logging.Handler that writes messages into a queue object.

    Designed to work with LoggingWidget so log messages from multiple
    threads can be shown together in a single ttk.Frame.

    The standard logging.QueueHandler/logging.QueueListener can not be used
    for this because the QueueListener runs in a private thread, not the
    main thread.

    Warning: If multiple threads are writing into this Handler, all threads
    must be joined before calling logging.shutdown() or any other log
    destinations will be corrupted.
    """

    def __init__(self, *args, message_queue, **kwargs):
        """Initialize by copying the queue and sending everything else to superclass."""
        logging.Handler.__init__(self, *args, **kwargs)
        self.message_queue = message_queue

    def emit(self, record):
        """Add the formatted log message (sans newlines) to the queue."""
        self.message_queue.put(self.format(record).rstrip('\n'))
To use, create a queue, create the handler using the queue, then add it to the logger (this example also creates a log file in the current directory):
LOG_FORMAT = '%(asctime)s: %(name)8s: %(levelname)8s: %(message)s'
# Setup root logger to write to a log file.
logging.basicConfig(filename='gui-test.log',
                    filemode='w',
                    format=LOG_FORMAT,
                    level=logging.DEBUG
                    )
# Get a child logger
logger = logging.getLogger(name='gui')
# Build our QueuingHandler
message_queue = queue.Queue()
handler = QueuingHandler(message_queue=message_queue, level=logging.DEBUG)
# Change the date/time format for the GUI to drop the date
formatter = logging.Formatter(LOG_FORMAT)
formatter.default_time_format = '%H:%M:%S'
handler.setFormatter(formatter)
# Add our QueuingHandler into the logging hierarchy at the lower level
logger.addHandler(handler)
Now all you have to do is read your messages from the queue.
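For example, a minimal sketch (not part of the original answer) of draining the queue; in a tkinter app this could run from an after() callback, and the drain() helper name is made up:

import queue

def drain(message_queue):
    """Return all log lines currently waiting in the queue."""
    lines = []
    while True:
        try:
            lines.append(message_queue.get_nowait())
        except queue.Empty:
            return lines

for line in drain(message_queue):   # message_queue from the setup above
    print(line)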


Python logging module always uses the same file

I have two logging classes that should write logs to two different files.
First class
import logging
class firstLog:
    def __init__(self):
        self.logger = logging.getLogger("my_logger1")
        logging.basicConfig(format="%(asctime)s.%(msecs)06d:\n%(message)s\n", level=logging.DEBUG,
                            datefmt="%d/%m/%Y %H:%M:%S",
                            filename="filename1.log")

    def printlog(self, msg, *args, **kwargs):
        self.logger.debug(msg, *args, **kwargs)
Second class
import logging
class secondLog:
    def __init__(self):
        self.logger = logging.getLogger("my_logger2")
        logging.basicConfig(format="%(asctime)s.%(msecs)06d:\n%(message)s\n", level=logging.DEBUG,
                            datefmt="%d/%m/%Y %H:%M:%S",
                            filename="filename2.log")

    def printlog(self, msg, *args, **kwargs):
        self.logger.debug(msg, *args, **kwargs)
I instantiate the two classes, but only filename1.log is written
log1 = firstLog()
log2 = secondLog()
log1.printlog("text1")
log2.printlog("text2")
What I obtain is that a new file, called filename1.log is created, and contains
04/06/2021 08:57:04.000988:
text1
04/06/2021 08:57:04.000990:
text2
but I need two separate files
logging.basicConfig() is only applied once.
If logging has been configured already, basicConfig() will do nothing unless force is set. (source)
In your case you should get separate loggers with logging.getLogger(name) and different names. Then configure the loggers manually.
logging.basicConfig() is meant to be run once at your app's startup time, and will configure the root logger.
You will need to configure the two other loggers manually if you need their output to go elsewhere (or maybe configure a "routing" handler on the root logger).
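A minimal sketch of that manual configuration, reusing the format string from the question (the make_logger helper is a name made up for this example):

import logging

def make_logger(name, filename):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(filename)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s.%(msecs)06d:\n%(message)s\n",
        datefmt="%d/%m/%Y %H:%M:%S"))
    logger.addHandler(handler)
    return logger

log1 = make_logger("my_logger1", "filename1.log")
log2 = make_logger("my_logger2", "filename2.log")
log1.debug("text1")   # ends up in filename1.log
log2.debug("text2")   # ends up in filename2.log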

Change log-level via mocking

I want to change the log-level temporarily.
My current strategy is to use mocking.
with mock.patch(...):
    my_method_which_does_log()
All logging.info() calls inside the method should get ignored and not logged to the console.
How to implement the ... to make logs of level INFO get ignored?
The code is single-process and single-thread and executed during testing only.
I want to change the log-level temporarily.
A way to do this without mocking is logging.disable
class TestSomething(unittest.TestCase):
    def setUp(self):
        logging.disable(logging.WARNING)

    def tearDown(self):
        logging.disable(logging.NOTSET)
This example suppresses messages of level WARNING and below for each test in the TestSomething class, so only ERROR and CRITICAL records get through (which also silences the INFO calls the question asks about). (Rather than calling disable at the start and end of each test by hand, using setUp and tearDown seems a bit cleaner.)
To unset this temporary throttling, call logging.disable(logging.NOTSET):
If logging.disable(logging.NOTSET) is called, it effectively removes this overriding level, so that logging output again depends on the effective levels of individual loggers.
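If you want the throttling scoped to a block instead of setUp/tearDown, a small convenience wrapper around logging.disable could look like this (a sketch, not part of the original answer):

import contextlib
import logging

@contextlib.contextmanager
def suppress_logs(level=logging.INFO):
    """Temporarily suppress records at `level` and below."""
    logging.disable(level)
    try:
        yield
    finally:
        logging.disable(logging.NOTSET)

with suppress_logs(logging.INFO):
    my_method_which_does_log()   # its logging.info() calls are ignored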
I don't think mocking is going to do what you want. The loggers are presumably already instantiated in this scenario, and level is an instance variable for each of the loggers (and also any of the handlers that each logger has).
You can create a custom context manager. That would look something like this:
Context Manager
import logging
class override_logging_level():
    "A context manager for temporarily setting the logging level"

    def __init__(self, level, process_handlers=True):
        self.saved_level = {}
        self.level = level
        self.process_handlers = process_handlers

    def __enter__(self):
        # Save the root logger
        self.save_logger('', logging.getLogger())
        # Iterate over the other loggers
        for name, logger in logging.Logger.manager.loggerDict.items():
            # Skip PlaceHolder entries, which have no level of their own
            if isinstance(logger, logging.PlaceHolder):
                continue
            self.save_logger(name, logger)

    def __exit__(self, exception_type, exception_value, traceback):
        # Restore the root logger
        self.restore_logger('', logging.getLogger())
        # Iterate over the loggers
        for name, logger in logging.Logger.manager.loggerDict.items():
            self.restore_logger(name, logger)

    def save_logger(self, name, logger):
        # Save off the level
        self.saved_level[name] = logger.level
        # Override the level
        logger.setLevel(self.level)
        if not self.process_handlers:
            return
        # Iterate over the handlers for this logger
        for handler in logger.handlers:
            # No reliable name. Just use the id of the object
            self.saved_level[id(handler)] = handler.level

    def restore_logger(self, name, logger):
        # It's possible that some intervening code added one or more loggers...
        if name not in self.saved_level:
            return
        # Restore the level for the logger
        logger.setLevel(self.saved_level[name])
        if not self.process_handlers:
            return
        # Iterate over the handlers for this logger
        for handler in logger.handlers:
            # Reconstruct the key for this handler
            key = id(handler)
            # Again, we could have possibly added more handlers
            if key not in self.saved_level:
                continue
            # Restore the level for the handler
            handler.setLevel(self.saved_level[key])
Test Code
# Setup for basic logging
logging.basicConfig(level=logging.ERROR)
# Create some loggers - the root logger and a couple others
lr = logging.getLogger()
l1 = logging.getLogger('L1')
l2 = logging.getLogger('L2')
# Won't see this message due to the level
lr.info("lr - msg 1")
l1.info("l1 - msg 1")
l2.info("l2 - msg 1")
# Temporarily override the level
with override_logging_level(logging.INFO):
    # Will see
    lr.info("lr - msg 2")
    l1.info("l1 - msg 2")
    l2.info("l2 - msg 2")
# Won't see, again...
lr.info("lr - msg 3")
l1.info("l1 - msg 3")
l2.info("l2 - msg 3")
Results
$ python ./main.py
INFO:root:lr - msg 2
INFO:L1:l1 - msg 2
INFO:L2:l2 - msg 2
Notes
The code would need to be enhanced to support multithreading; for example, logging.Logger.manager.loggerDict is a shared variable that's guarded by locks in the logging code.
Using @cryptoplex's approach of using context managers, here's the official version from the logging cookbook:
import logging
import sys
class LoggingContext(object):
    def __init__(self, logger, level=None, handler=None, close=True):
        self.logger = logger
        self.level = level
        self.handler = handler
        self.close = close

    def __enter__(self):
        if self.level is not None:
            self.old_level = self.logger.level
            self.logger.setLevel(self.level)
        if self.handler:
            self.logger.addHandler(self.handler)

    def __exit__(self, et, ev, tb):
        if self.level is not None:
            self.logger.setLevel(self.old_level)
        if self.handler:
            self.logger.removeHandler(self.handler)
        if self.handler and self.close:
            self.handler.close()
        # implicit return of None => don't swallow exceptions
You could use dependency injection to pass the logger instance to the method you are testing. It is a bit more invasive, since you are changing the method's signature, but it gives you more flexibility.
Add the logger parameter to your method signature, something along the lines of:
def my_method(your_other_params, logger):
    pass
In your unit test file:
if __name__ == "__main__":
    # define the logger you want to use:
    logging.basicConfig(stream=sys.stderr)
    logging.getLogger("MyTests.test_my_method").setLevel(logging.DEBUG)
    ...

def test_my_method(self):
    test_logger = logging.getLogger("MyTests.test_my_method")
    # pass your logger to your method
    my_method(your_normal_parameters, test_logger)
python logger docs: https://docs.python.org/3/library/logging.html
I use this pattern to write all logs to a list. It ignores logs of level INFO and below.
import logging
from unittest import mock

logs = []

def my_log(logger_self, level, *args, **kwargs):
    if level > logging.INFO:
        logs.append((args, kwargs))

with mock.patch('logging.Logger._log', my_log):
    my_method_which_does_log()

Python logging (logutils) with QueueHandler and QueueListener

Logging is needed for a multiprocess python app. Using queues appears to be the best solution. The logutils library provides this.
Is it possible to set the log levels for the two handlers independently? For example, in the test below I would like STREAM 1 to show WARNING messages and STREAM 2 to show INFO messages. In the test log at the end of the code, an INFO message is created that should not be output to the console by the STREAM 1 handler (WARNING). However, it is output by both handlers.
For reference, I have been using this page http://plumberjack.blogspot.co.uk/2010/09/improved-queuehandler-queuelistener.html by Vinay Sajip who is the author of the library.
# System imports
import logging
import logging.handlers
try:
    import Queue as queue
except ImportError:
    import queue
# Custom imports
from logutils.queue import QueueHandler, QueueListener
# Get queue
q = queue.Queue(-1)
# Setup stream handler 1 to output WARNING to console
h1 = logging.StreamHandler()
f1 = logging.Formatter('STREAM 1 WARNING: %(threadName)s: %(message)s')
h1.setFormatter(f1)
h1.setLevel(logging.WARNING) # NOT WORKING. This should log >= WARNING
# Setup stream handler 2 to output INFO to console
h2 = logging.StreamHandler()
f2 = logging.Formatter('STREAM 2 INFO: %(threadName)s: %(message)s')
h2.setFormatter(f2)
h2.setLevel(logging.INFO) # NOT WORKING. This should log >= INFO
# Start queue listener using the stream handler above
ql = QueueListener(q, h1, h2)
ql.start()
# Create log and set handler to queue handle
root = logging.getLogger()
root.setLevel(logging.DEBUG) # Log level = DEBUG
qh = QueueHandler(q)
root.addHandler(qh)
root.info('Look out!') # Create INFO message
ql.stop()
You can use the respect_handler_level argument that was added to the QueueListener initializer in Python 3.5, so that the listener respects each handler's individual level.
From QueueListener's doc:
If respect_handler_level is True, a handler’s level is respected (compared with the level for the message) when deciding whether to pass messages to that handler; otherwise, the behaviour is as in previous Python versions - to always pass each message to each handler.
In your case you should replace the initialization of the QueueListener to:
ql = QueueListener(q, h1, h2, respect_handler_level=True)
This is a limitation in the implementation of the QueueListener.handle() method. This is currently:
def handle(self, record):
    record = self.prepare(record)
    for handler in self.handlers:
        handler.handle(record)
To do what you want, it should be
def handle(self, record):
    record = self.prepare(record)
    for handler in self.handlers:
        # CHANGED HERE TO ADD A CONDITION TO CHECK THE HANDLER LEVEL
        if record.levelno >= handler.level:
            handler.handle(record)
I will fix this at some point because I think this is better, but for now you can subclass QueueListener and override the handle method in the subclass.
I used this subclass to override the queue listener class. The two other methods addHandler and removeHandler allow for addition and removal of handlers. The CustomQueueListener should be used just like QueueListener. Follow the other logging examples for how to use addHandler() and removeHandler().
class CustomQueueListener(QueueListener):
    def __init__(self, queue, *handlers):
        """
        Initialise an instance with the specified queue and
        handlers.
        """
        super(CustomQueueListener, self).__init__(queue, *handlers)
        # Changing this to a list from tuple in the parent class
        self.handlers = list(handlers)

    def handle(self, record):
        """
        Override handle a record.

        This just loops through the handlers offering them the record
        to handle.

        :param record: The record to handle.
        """
        record = self.prepare(record)
        for handler in self.handlers:
            if record.levelno >= handler.level:  # This check is not in the parent class
                handler.handle(record)

    def addHandler(self, hdlr):
        """
        Add the specified handler to this logger.
        """
        if not (hdlr in self.handlers):
            self.handlers.append(hdlr)

    def removeHandler(self, hdlr):
        """
        Remove the specified handler from this logger.
        """
        if hdlr in self.handlers:
            hdlr.close()
            self.handlers.remove(hdlr)

How can you make logging module functions use a different logger?

I can create a named child logger, so that all the logs output by that logger are marked with its name. I can use that logger exclusively in my function/class/whatever.
However, if that code calls out to functions in another module that uses logging via the module-level logging functions (which proxy to the root logger), how can I ensure that those log messages go through the same logger (or are at least logged in the same way)?
For example:
main.py
import logging
import other
def do_stuff(logger):
    logger.info("doing stuff")
    other.do_more_stuff()

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("stuff")
    do_stuff(logger)
other.py
import logging
def do_more_stuff():
    logging.info("doing other stuff")
Outputs:
$ python main.py
INFO:stuff:doing stuff
INFO:root:doing other stuff
I want to be able to cause both log lines to be marked with the name 'stuff', and I want to be able to do this only changing main.py.
How can I cause the logging calls in other.py to use a different logger without changing that module?
This is the solution I've come up with:
Using thread-local data to store the contextual information, and using a Filter on the root logger's handlers to add this information to LogRecords before they are emitted.
import logging
import threading

context = threading.local()
context.name = None

class ContextFilter(logging.Filter):
    def filter(self, record):
        if context.name is not None:
            record.name = "%s.%s" % (context.name, record.name)
        return True
This is fine for me, because I'm using the logger name to indicate what task was being carried out when this message was logged.
I can then use context managers or decorators to make logging from a particular passage of code all appear as though it was logged from a particular child logger.
import contextlib
import functools

@contextlib.contextmanager
def logname(name):
    old_name = context.name
    if old_name is None:
        context.name = name
    else:
        context.name = "%s.%s" % (old_name, name)
    try:
        yield
    finally:
        context.name = old_name

def as_logname(name):
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            with logname(name):
                return f(*args, **kwargs)
        return wrapper
    return decorator
So then, I can do:
with logname("stuff"):
    logging.info("I'm doing stuff!")
    do_more_stuff()
or:
@as_logname("things")
def do_things():
    logging.info("Starting to do things")
    do_more_stuff()
The key thing being that any logging that do_more_stuff() does will be logged as if it were logged with either a "stuff" or "things" child logger, without having to change do_more_stuff() at all.
This solution would have problems if you were going to have different handlers on different child loggers.
Use logging.setLoggerClass so that all loggers used by other modules use your logger subclass (emphasis mine):
Tells the logging system to use the class klass when instantiating a logger. The class should define __init__() such that only a name argument is required, and the __init__() should call Logger.__init__(). This function is typically called before any loggers are instantiated by applications which need to use custom logger behavior.
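A minimal sketch of the mechanism (the prefixing behaviour is only illustrative, not part of the docs):

import logging

class PrefixedLogger(logging.Logger):
    """Illustrative Logger subclass returned by getLogger() after setLoggerClass()."""
    def info(self, msg, *args, **kwargs):
        super().info("stuff: %s" % msg, *args, **kwargs)

logging.setLoggerClass(PrefixedLogger)   # call this before other modules create their loggers
log = logging.getLogger("example")       # now an instance of PrefixedLogger
log.info("hello")

# Caveat: the module-level logging.info()/logging.warning() functions use the root
# logger, which is created when the logging module is imported, so it is unaffected.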
This is what handlers (logging.handlers) are for. In addition to creating your logger, you create one or more handlers to send the logging information to various places and add them to the root logger. Most modules that do logging create a logger that they use for their own purposes but depend on the controlling script to create the handlers. Some frameworks decide to be super helpful and add handlers for you.
Read the logging docs, it's all there.
(edit)
logging.basicConfig() is a helper function that adds a single handler to the root logger. You can control the format string it uses with the 'format=' parameter. If all you want to do is have all modules display "stuff", then use logging.basicConfig(level=logging.INFO, format="%(levelname)s:stuff:%(message)s").
The logging.{info,warning,…} functions just call the respective methods on the root Logger object (cf. the logging module source), so if you know the other module only calls the functions exported by the logging module, you can overwrite the logging name in other's namespace with your logger object:
import logging
import other
def do_stuff(logger):
    logger.info("doing stuff")
    other.do_more_stuff()

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("stuff")
    # Overwrite other.logging with your just-generated logger object:
    other.logging = logger
    do_stuff(logger)

How should I log while using multiprocessing in Python?

Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level multiprocessing-aware logger, LOG = multiprocessing.get_logger(). Per the docs, this logger does not have process-shared locks, so it is possible to garble things up in sys.stderr (or whatever filehandle) by having multiple processes writing to it simultaneously.
The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make all dependencies on this central module use multiprocessing-aware logging. That's annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?
I just now wrote a log handler of my own that just feeds everything to the parent process via a pipe. I've only been testing it for ten minutes but it seems to work pretty well.
(Note: This is hardcoded to RotatingFileHandler, which is my own use case.)
Update: @javier now maintains this approach as a package available on PyPI - see multiprocessing-logging on PyPI, GitHub at https://github.com/jruere/multiprocessing-logging
Update: Implementation!
This now uses a queue for correct handling of concurrency, and also recovers from errors correctly. I've now been using this in production for several months, and the current version below works without issue.
from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback
class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified. Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)
The only way to deal with this non-intrusively is to:
Spawn each worker process such that its log goes to a different file descriptor (to disk or to pipe.) Ideally, all log entries should be timestamped.
Your controller process can then do one of the following:
If using disk files: Coalesce the log files at the end of the run, sorted by timestamp (see the merge sketch after this list)
If using pipes (recommended): Coalesce log entries on-the-fly from all pipes into a central log file. (E.g., periodically select from the pipes' file descriptors, perform merge-sort on the available log entries, and flush to the centralized log. Repeat.)
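For the disk-file case, a rough merge sketch (the file-name pattern and the timestamp-first line format are assumptions); heapq.merge does a lazy merge-sort over inputs that are already individually sorted:

import glob
import heapq

def merge_worker_logs(pattern="worker-*.log", out_path="combined.log"):
    """Merge per-process log files whose lines begin with a sortable timestamp."""
    files = [open(path) for path in sorted(glob.glob(pattern))]
    try:
        with open(out_path, "w") as out:
            for line in heapq.merge(*files):   # lines compare by their leading timestamp
                out.write(line)
    finally:
        for f in files:
            f.close()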
QueueHandler is native in Python 3.2+, and does exactly this. It is easily replicated in previous versions.
Python docs have two complete examples: Logging to a single file from multiple processes
For those using Python < 3.2, just copy QueueHandler into your own code from: https://gist.github.com/vsajip/591589 or alternatively import logutils.
Each process (including the parent process) puts its logging on the Queue, and then a listener thread or process (one example is provided for each) picks those up and writes them all to a file - no risk of corruption or garbling.
Below is another solution with a focus on simplicity for anyone else (like me) who gets here from Google. Logging should be easy! This works only on Python 3.2 or higher.
import multiprocessing
import logging
from logging.handlers import QueueHandler, QueueListener
import time
import random
def f(i):
    time.sleep(random.uniform(.01, .05))
    logging.info('function called with {} in worker thread.'.format(i))
    time.sleep(random.uniform(.01, .05))
    return i

def worker_init(q):
    # all records from worker processes go to qh and then into q
    qh = QueueHandler(q)
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(qh)

def logger_init():
    q = multiprocessing.Queue()
    # this is the handler for all log records
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s: %(asctime)s - %(process)s - %(message)s"))

    # ql gets records from the queue and sends them to the handler
    ql = QueueListener(q, handler)
    ql.start()

    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    # add the handler to the logger so records from this process are handled
    logger.addHandler(handler)

    return ql, q

def main():
    q_listener, q = logger_init()

    logging.info('hello from main thread')
    pool = multiprocessing.Pool(4, worker_init, [q])
    for result in pool.map(f, range(10)):
        pass
    pool.close()
    pool.join()
    q_listener.stop()

if __name__ == '__main__':
    main()
Yet another alternative might be the various non-file-based logging handlers in the logging package:
SocketHandler
DatagramHandler
SyslogHandler
(and others)
This way, you could easily have a logging daemon somewhere that you could write to safely and would handle the results correctly. (E.g., a simple socket server that just unpickles the message and emits it to its own rotating file handler.)
The SyslogHandler would take care of this for you, too. Of course, you could use your own instance of syslog, not the system one.
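For instance, each worker would only need a SocketHandler pointed at the daemon (a minimal sketch; the host and port are placeholders, and the logging cookbook contains a matching socket-receiver example):

import logging
import logging.handlers

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
# SocketHandler pickles each LogRecord and sends it over TCP; the receiving
# daemon unpickles the records and passes them to its own handlers.
socket_handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)
logger.addHandler(socket_handler)
logger.info("handled by the logging daemon, not written locally")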
As of 2020 it seems there is a simpler way of logging with multiprocessing.
This function will create the logger. You can set the format here and where you want your output to go (file, stdout):
def create_logger():
    import multiprocessing, logging
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        '[%(asctime)s| %(levelname)s| %(processName)s] %(message)s')
    handler = logging.FileHandler('logs/your_file_name.log')
    handler.setFormatter(formatter)

    # this bit will make sure you won't have
    # duplicated messages in the output
    if not len(logger.handlers):
        logger.addHandler(handler)
    return logger
In the init you instantiate the logger:
if __name__ == '__main__':
    from multiprocessing import Pool

    logger = create_logger()
    logger.info('Starting pooling')
    p = Pool()
    # rest of the code
Now, you only need to add this reference in each function where you need logging:
logger = create_logger()
And output messages:
logger.info(f'My message from {something}')
Hope this helps.
A variant of the others that keeps the logging and queue thread separate.
"""sample code for logging in subprocesses using multiprocessing
* Little handler magic - The main process uses loggers and handlers as normal.
* Only a simple handler is needed in the subprocess that feeds the queue.
* Original logger name from subprocess is preserved when logged in main
process.
* As in the other implementations, a thread reads the queue and calls the
handlers. Except in this implementation, the thread is defined outside of a
handler, which makes the logger definitions simpler.
* Works with multiple handlers. If the logger in the main process defines
multiple handlers, they will all be fed records generated by the
subprocesses loggers.
tested with Python 2.5 and 2.6 on Linux and Windows
"""
import os
import sys
import time
import traceback
import multiprocessing, threading, logging, sys
DEFAULT_LEVEL = logging.DEBUG

formatter = logging.Formatter("%(levelname)s: %(asctime)s - %(name)s - %(process)s - %(message)s")

class SubProcessLogHandler(logging.Handler):
    """handler used by subprocesses

    It simply puts items on a Queue for the main process to log.
    """

    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        self.queue.put(record)

class LogQueueReader(threading.Thread):
    """thread to write subprocesses log records to main process log

    This thread reads the records written by subprocesses and writes them to
    the handlers defined in the main process's handlers.
    """

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.daemon = True

    def run(self):
        """read from the queue and write to the log handlers

        The logging documentation says logging is thread safe, so there
        shouldn't be contention between normal logging (from the main
        process) and this thread.

        Note that we're using the name of the original logger.
        """
        # Thanks Mike for the error checking code.
        while True:
            try:
                record = self.queue.get()
                # get the logger for this record
                logger = logging.getLogger(record.name)
                logger.callHandlers(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

class LoggingProcess(multiprocessing.Process):
    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue

    def _setupLogger(self):
        # create the logger to use.
        logger = logging.getLogger('test.subprocess')
        # The only handler desired is the SubProcessLogHandler. If any others
        # exist, remove them. In this case, on Unix and Linux the StreamHandler
        # will be inherited.
        for handler in logger.handlers:
            # just a check for my sanity
            assert not isinstance(handler, SubProcessLogHandler)
            logger.removeHandler(handler)
        # add the handler
        handler = SubProcessLogHandler(self.queue)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        # On Windows, the level will not be inherited. Also, we could just
        # set the level to log everything here and filter it in the main
        # process handlers. For now, just set it from the global default.
        logger.setLevel(DEFAULT_LEVEL)
        self.logger = logger

    def run(self):
        self._setupLogger()
        logger = self.logger
        # and here goes the logging
        p = multiprocessing.current_process()
        logger.info('hello from process %s with pid %s' % (p.name, p.pid))

if __name__ == '__main__':
    # queue used by the subprocess loggers
    queue = multiprocessing.Queue()
    # Just a normal logger
    logger = logging.getLogger('test')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(DEFAULT_LEVEL)
    logger.info('hello from the main process')
    # This thread will read from the subprocesses and write to the main log's
    # handlers.
    log_queue_reader = LogQueueReader(queue)
    log_queue_reader.start()
    # create the processes.
    for i in range(10):
        p = LoggingProcess(queue)
        p.start()
    # The way I read the multiprocessing warning about Queue, joining a
    # process before it has finished feeding the Queue can cause a deadlock.
    # Also, Queue.empty() is not reliable, so just make sure all processes
    # are finished.
    # active_children joins subprocesses when they're finished.
    while multiprocessing.active_children():
        time.sleep(.1)
All current solutions are too coupled to the logging configuration by using a handler. My solution has the following architecture and features:
You can use any logging configuration you want
Logging is done in a daemon thread
Safe shutdown of the daemon by using a context manager
Communication to the logging thread is done by multiprocessing.Queue
In subprocesses, logging.Logger (and already defined instances) are patched to send all records to the queue
New: format traceback and message before sending to queue to prevent pickling errors
Code with usage example and output can be found at the following Gist: https://gist.github.com/schlamar/7003737
Since we can represent multiprocess logging as many publishers and one subscriber (listener), using ZeroMQ to implement PUB-SUB messaging is indeed an option.
Moreover, the PyZMQ module, the Python bindings for ZMQ, implements PUBHandler, an object for publishing logging messages over a zmq.PUB socket.
There's a solution on the web for centralized logging from a distributed application using PyZMQ and PUBHandler, which can easily be adapted for working locally with multiple publishing processes.
formatters = {
    logging.DEBUG: logging.Formatter("[%(name)s] %(message)s"),
    logging.INFO: logging.Formatter("[%(name)s] %(message)s"),
    logging.WARN: logging.Formatter("[%(name)s] %(message)s"),
    logging.ERROR: logging.Formatter("[%(name)s] %(message)s"),
    logging.CRITICAL: logging.Formatter("[%(name)s] %(message)s")
}

# This one will be used by publishing processes
class PUBLogger:
    def __init__(self, host, port=config.PUBSUB_LOGGER_PORT):
        self._logger = logging.getLogger(__name__)
        self._logger.setLevel(logging.DEBUG)
        self.ctx = zmq.Context()
        self.pub = self.ctx.socket(zmq.PUB)
        self.pub.connect('tcp://{0}:{1}'.format(socket.gethostbyname(host), port))
        self._handler = PUBHandler(self.pub)
        self._handler.formatters = formatters
        self._logger.addHandler(self._handler)

    @property
    def logger(self):
        return self._logger

# This one will be used by listener process
class SUBLogger:
    def __init__(self, ip, output_dir="", port=config.PUBSUB_LOGGER_PORT):
        self.output_dir = output_dir
        self._logger = logging.getLogger()
        self._logger.setLevel(logging.DEBUG)

        self.ctx = zmq.Context()
        self._sub = self.ctx.socket(zmq.SUB)
        self._sub.bind('tcp://*:{1}'.format(ip, port))
        self._sub.setsockopt(zmq.SUBSCRIBE, "")

        handler = handlers.RotatingFileHandler(os.path.join(output_dir, "client_debug.log"), "w", 100 * 1024 * 1024, 10)
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter("%(asctime)s;%(levelname)s - %(message)s")
        handler.setFormatter(formatter)
        self._logger.addHandler(handler)

    @property
    def sub(self):
        return self._sub

    @property
    def logger(self):
        return self._logger

# And that's the way we actually run things:

# Listener process will forever listen on SUB socket for incoming messages
def run_sub_logger(ip, event):
    sub_logger = SUBLogger(ip)
    while not event.is_set():
        try:
            topic, message = sub_logger.sub.recv_multipart(flags=zmq.NOBLOCK)
            log_msg = getattr(logging, topic.lower())
            log_msg(message)
        except zmq.ZMQError as zmq_error:
            if zmq_error.errno == zmq.EAGAIN:
                pass

# Publisher processes loggers should be initialized as follows:
class Publisher:
    def __init__(self, stop_event, proc_id):
        self.stop_event = stop_event
        self.proc_id = proc_id
        self._logger = pub_logger.PUBLogger('127.0.0.1').logger

    def run(self):
        self._logger.info("{0} - Sending message".format(self.proc_id))

def run_worker(event, proc_id):
    worker = Publisher(event, proc_id)
    worker.run()

# Starting subscriber process so we won't lose publisher's messages
sub_logger_process = Process(target=run_sub_logger,
                             args=('127.0.0.1', stop_event,))
sub_logger_process.start()

# Starting publisher processes
for i in range(MAX_WORKERS_PER_CLIENT):
    processes.append(Process(target=run_worker,
                             args=(stop_event, i,)))
for p in processes:
    p.start()
I also like zzzeek's answer but Andre is correct that a queue is required to prevent garbling. I had some luck with the pipe, but did see garbling which is somewhat expected. Implementing it turned out to be harder than I thought, particularly due to running on Windows, where there are some additional restrictions about global variables and stuff (see: How's Python Multiprocessing Implemented on Windows?)
But, I finally got it working. This example probably isn't perfect, so comments and suggestions are welcome. It also does not support setting the formatter or anything other than the root logger. Basically, you have to reinit the logger in each of the pool processes with the queue and set up the other attributes on the logger.
Again, any suggestions on how to make the code better are welcome. I certainly don't know all the Python tricks yet :-)
import multiprocessing, logging, sys, re, os, StringIO, threading, time, Queue
class MultiProcessingLogHandler(logging.Handler):
    def __init__(self, handler, queue, child=False):
        logging.Handler.__init__(self)

        self._handler = handler
        self.queue = queue

        # we only want one of the loggers to be pulling from the queue.
        # If there is a way to do this without needing to be passed this
        # information, that would be great!
        if child == False:
            self.shutdown = False
            self.polltime = 1
            t = threading.Thread(target=self.receive)
            t.daemon = True
            t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        #print "receive on"
        while (self.shutdown == False) or (self.queue.empty() == False):
            # so we block for a short period of time so that we can
            # check for the shutdown cases.
            try:
                record = self.queue.get(True, self.polltime)
                self._handler.emit(record)
            except Queue.Empty, e:
                pass

    def send(self, s):
        # send just puts it in the queue for the server to retrieve
        self.queue.put(s)

    def _format_record(self, record):
        ei = record.exc_info
        if ei:
            dummy = self.format(record)  # just to get traceback text into record.exc_text
            record.exc_info = None  # to avoid Unpickleable error

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        time.sleep(self.polltime+1)  # give some time for messages to enter the queue.
        self.shutdown = True
        time.sleep(self.polltime+1)  # give some time for the server to time out and see the shutdown

    def __del__(self):
        self.close()  # hopefully this aids in orderly shutdown when things are going poorly.

def f(x):
    # just a logging command...
    logging.critical('function number: ' + str(x))
    # to make some calls take longer than others, so the output is "jumbled" as real MP programs are.
    time.sleep(x % 3)

def initPool(queue, level):
    """
    This causes the logging module to be initialized with the necessary info
    in pool threads to work correctly.
    """
    logging.getLogger('').addHandler(MultiProcessingLogHandler(logging.StreamHandler(), queue, child=True))
    logging.getLogger('').setLevel(level)

if __name__ == '__main__':
    stream = StringIO.StringIO()
    logQueue = multiprocessing.Queue(100)
    handler = MultiProcessingLogHandler(logging.StreamHandler(stream), logQueue)
    logging.getLogger('').addHandler(handler)
    logging.getLogger('').setLevel(logging.DEBUG)

    logging.debug('starting main')

    # when building the pool on a Windows machine we also have to init the logger in all the instances with the queue and the level of logging.
    pool = multiprocessing.Pool(processes=10, initializer=initPool, initargs=[logQueue, logging.getLogger('').getEffectiveLevel()])  # start worker processes
    pool.map(f, range(0, 50))
    pool.close()

    logging.debug('done')
    logging.shutdown()
    print "stream output is:"
    print stream.getvalue()
I'd like to suggest using the logger_tt library: https://github.com/Dragon2fly/logger_tt
The multiprocessing_logging library does not work on my macOS, while logger_tt does.
Just publish your logger instance somewhere. That way, the other modules and clients can use your API to get the logger without having to import multiprocessing.
I liked zzzeek's answer. I would just substitute a Queue for the Pipe, since if multiple threads/processes use the same pipe end to generate log messages, they will get garbled.
The concurrent-log-handler seems to do the job perfectly. Tested on Windows. It also supports POSIX systems.
Main idea
Create a separate file with a function that returns a logger. The logger must have a fresh instance of ConcurrentRotatingFileHandler for each process. An example function, get_logger(), is given below.
Creating loggers is done at the initialization of the process. For a multiprocessing.Process subclass it would mean the beginning of the run() method.
Detailed instructions
In this example, I will use the following file structure
.
│-- child.py <-- For a child process
│-- logs.py <-- For setting up the logs for the app
│-- main.py <-- For a main process
│-- myapp.py <-- For starting the app
│-- somemodule.py <-- For an example, a "3rd party module using standard logging"
Code
Child process
# child.py
import multiprocessing as mp
import time
from somemodule import do_something
class ChildProcess(mp.Process):
    def __init__(self):
        self.logger = None
        super().__init__()

    def run(self):
        from logs import get_logger
        self.logger = get_logger()

        while True:
            time.sleep(1)
            self.logger.info("Child process")
            do_something()
A simple child process that inherits from multiprocessing.Process and simply logs the text "Child process" to the file.
Important: get_logger() is called inside run(), or elsewhere inside the child process (not at module level or in __init__()). This is required because get_logger() creates a ConcurrentRotatingFileHandler instance, and a new instance is needed for each process.
The do_something is used just to demonstrate that this works with 3rd party library code which does not have any clue that you are using concurrent-log-handler.
Main Process
# main.py
import logging
import multiprocessing as mp
import time
from child import ChildProcess
from somemodule import do_something
class MainProcess(mp.Process):
    def __init__(self):
        self.logger = logging.getLogger()
        super().__init__()

    def run(self):
        from logs import get_logger
        self.logger = get_logger()
        self.child = ChildProcess()
        self.child.daemon = True
        self.child.start()

        while True:
            time.sleep(0.5)
            self.logger.critical("Main process")
            do_something()
The main process, which logs "Main process" into the file twice a second, also inheriting from multiprocessing.Process.
Same comments for get_logger() and do_something() apply as for the child process.
Logger setup
# logs.py
import logging
import os
from concurrent_log_handler import ConcurrentRotatingFileHandler
LOGLEVEL = logging.DEBUG
def get_logger():
    logger = logging.getLogger()

    if logger.handlers:
        return logger

    # Use an absolute path to prevent file rotation trouble.
    logfile = os.path.abspath("mylog.log")

    logger.setLevel(LOGLEVEL)

    # Rotate log after reaching 512K, keep 5 old copies.
    filehandler = ConcurrentRotatingFileHandler(
        logfile, mode="a", maxBytes=512 * 1024, backupCount=5, encoding="utf-8"
    )
    filehandler.setLevel(LOGLEVEL)

    # create also handler for displaying output in the stdout
    ch = logging.StreamHandler()
    ch.setLevel(LOGLEVEL)

    formatter = logging.Formatter(
        "%(asctime)s - %(module)s - %(levelname)s - %(message)s [Process: %(process)d, %(filename)s:%(funcName)s(%(lineno)d)]"
    )

    # add formatter to ch
    ch.setFormatter(formatter)
    filehandler.setFormatter(formatter)

    logger.addHandler(ch)
    logger.addHandler(filehandler)
    return logger
This uses the ConcurrentRotatingFileHandler from the concurrent-log-handler package. Each process needs a fresh ConcurrentRotatingFileHandler instance.
Note that all the arguments for the ConcurrentRotatingFileHandler should be the same in every process.
Example app
# myapp.py
if __name__ == "__main__":
    from main import MainProcess

    p = MainProcess()
    p.start()
Just a simple example on how to start the multiprocess application
Example of 3rd party module using standard logging
# somemodule.py
import logging
logger = logging.getLogger("somemodule")
def do_something():
    logging.info("doing something")
Just a simple example to test if loggers from 3rd party code will work normally.
Example output
2021-04-19 19:02:29,425 - main - CRITICAL - Main process [Process: 103348, main.py:run(23)]
2021-04-19 19:02:29,427 - somemodule - INFO - doing something [Process: 103348, somemodule.py:do_something(7)]
2021-04-19 19:02:29,929 - main - CRITICAL - Main process [Process: 103348, main.py:run(23)]
2021-04-19 19:02:29,931 - somemodule - INFO - doing something [Process: 103348, somemodule.py:do_something(7)]
2021-04-19 19:02:30,133 - child - INFO - Child process [Process: 76700, child.py:run(18)]
2021-04-19 19:02:30,137 - somemodule - INFO - doing something [Process: 76700, somemodule.py:do_something(7)]
2021-04-19 19:02:30,436 - main - CRITICAL - Main process [Process: 103348, main.py:run(23)]
2021-04-19 19:02:30,439 - somemodule - INFO - doing something [Process: 103348, somemodule.py:do_something(7)]
2021-04-19 19:02:30,944 - main - CRITICAL - Main process [Process: 103348, main.py:run(23)]
2021-04-19 19:02:30,946 - somemodule - INFO - doing something [Process: 103348, somemodule.py:do_something(7)]
2021-04-19 19:02:31,142 - child - INFO - Child process [Process: 76700, child.py:run(18)]
2021-04-19 19:02:31,145 - somemodule - INFO - doing something [Process: 76700, somemodule.py:do_something(7)]
2021-04-19 19:02:31,449 - main - CRITICAL - Main process [Process: 103348, main.py:run(23)]
2021-04-19 19:02:31,451 - somemodule - INFO - doing something [Process: 103348, somemodule.py:do_something(7)]
How about delegating all the logging to another process that reads all log entries from a Queue?
import logging
import multiprocessing

LOG_QUEUE = multiprocessing.JoinableQueue()

class CentralLogger(multiprocessing.Process):
    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        self.log = logging.getLogger('some_config')
        self.log.info("Started Central Logging process")

    def run(self):
        while True:
            log_level, message = self.queue.get()
            if log_level is None:
                self.log.info("Shutting down Central Logging process")
                break
            else:
                self.log.log(log_level, message)

central_logger_process = CentralLogger(LOG_QUEUE)
central_logger_process.start()
Simply share LOG_QUEUE via any of the multiprocess mechanisms or even inheritance and it all works out fine!
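A worker then logs by putting (level, message) tuples on the shared queue, and the logger is shut down with a None level as the sentinel (a sketch of the protocol used by CentralLogger above):

import logging

def worker(log_queue, n):
    # workers never touch handlers directly; they only enqueue tuples
    log_queue.put((logging.INFO, "processing item %d" % n))

# ... after all workers are done:
LOG_QUEUE.put((None, None))   # tells CentralLogger.run() to shut down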
Below is a class that can be used in a Windows environment; it requires ActivePython.
You can also subclass other logging handlers (StreamHandler, etc.) in the same way.
class SyncronizedFileHandler(logging.FileHandler):
    MUTEX_NAME = 'logging_mutex'

    def __init__(self, *args, **kwargs):
        self.mutex = win32event.CreateMutex(None, False, self.MUTEX_NAME)
        return super(SyncronizedFileHandler, self).__init__(*args, **kwargs)

    def emit(self, *args, **kwargs):
        try:
            win32event.WaitForSingleObject(self.mutex, win32event.INFINITE)
            ret = super(SyncronizedFileHandler, self).emit(*args, **kwargs)
        finally:
            win32event.ReleaseMutex(self.mutex)
        return ret
And here is an example that demonstrates usage:
import logging
import random , time , os , sys , datetime
from string import letters
import win32api , win32event
from multiprocessing import Pool
def f(i):
    time.sleep(random.randint(0, 10) * 0.1)
    ch = random.choice(letters)
    logging.info(ch * 30)

def init_logging():
    '''
    initialize the loggers
    '''
    formatter = logging.Formatter("%(levelname)s - %(process)d - %(asctime)s - %(filename)s - %(lineno)d - %(message)s")
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    file_handler = SyncronizedFileHandler(sys.argv[1])
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)

    logger.addHandler(file_handler)

# must be called in the parent and in every worker process
init_logging()

if __name__ == '__main__':
    # multiprocessing stuff
    pool = Pool(processes=10)
    imap_result = pool.imap(f, range(30))
    for i, _ in enumerate(imap_result):
        pass
I have a solution that's similar to ironhacker's, except that I use logging.exception in some of my code and found that I needed to format the exception before passing it back over the Queue, since tracebacks aren't picklable:
class QueueHandler(logging.Handler):
    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        if record.exc_info:
            # can't pass exc_info across processes so just format now
            record.exc_text = self.formatException(record.exc_info)
            record.exc_info = None
        self.queue.put(record)

    def formatException(self, ei):
        sio = cStringIO.StringIO()
        traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
        s = sio.getvalue()
        sio.close()
        if s[-1] == "\n":
            s = s[:-1]
        return s
If you have deadlocks occurring in a combination of locks, threads and forks in the logging module, that is reported in bug report 6721 (see also related SO question).
There is a small fixup solution posted here.
However, that will just fix any potential deadlocks in logging. It will not fix the fact that output may still get garbled. See the other answers presented here.
Here's my simple hack/workaround... not the most comprehensive, but easily modifiable and simpler to read and understand I think than any other answers I found before writing this:
import logging
import multiprocessing
class FakeLogger(object):
    def __init__(self, q):
        self.q = q

    def info(self, item):
        self.q.put('INFO - {}'.format(item))

    def debug(self, item):
        self.q.put('DEBUG - {}'.format(item))

    def critical(self, item):
        self.q.put('CRITICAL - {}'.format(item))

    def warning(self, item):
        self.q.put('WARNING - {}'.format(item))

def some_other_func_that_gets_logger_and_logs(num):
    # notice the name gets discarded
    # of course you can easily add this to your FakeLogger class
    local_logger = logging.getLogger('local')
    local_logger.info('Hey I am logging this: {} and working on it to make this {}!'.format(num, num*2))
    local_logger.debug('hmm, something may need debugging here')
    return num*2

def func_to_parallelize(data_chunk):
    # unpack our args
    the_num, logger_q = data_chunk
    # since we're now in a new process, let's monkeypatch the logging module
    logging.getLogger = lambda name=None: FakeLogger(logger_q)
    # now do the actual work that happens to log stuff too
    new_num = some_other_func_that_gets_logger_and_logs(the_num)
    return (the_num, new_num)

if __name__ == '__main__':
    multiprocessing.freeze_support()
    m = multiprocessing.Manager()
    logger_q = m.Queue()

    # we have to pass our data to be parallel-processed
    # we also need to pass the Queue object so we can retrieve the logs
    parallelable_data = [(1, logger_q), (2, logger_q)]

    # set up a pool of processes so we can take advantage of multiple CPU cores
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=4)
    worker_output = pool.map(func_to_parallelize, parallelable_data)
    pool.close()  # no more tasks
    pool.join()   # wrap up current tasks

    # get the contents of our FakeLogger object
    while not logger_q.empty():
        print logger_q.get()
    print 'worker output contained: {}'.format(worker_output)
There is this great package
Package:
https://pypi.python.org/pypi/multiprocessing-logging/
code:
https://github.com/jruere/multiprocessing-logging
Install:
pip install multiprocessing-logging
Then add:
import multiprocessing_logging
# This enables logs inside process
multiprocessing_logging.install_mp_handler()
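A minimal end-to-end sketch of using it with a pool (the file name and worker count are made up; install_mp_handler() wraps whatever handlers are already configured):

import logging
import multiprocessing
import multiprocessing_logging

def work(i):
    logging.info("working on %d", i)
    return i * i

if __name__ == '__main__':
    logging.basicConfig(filename='mp.log', level=logging.INFO)
    multiprocessing_logging.install_mp_handler()   # call after configuring logging, before creating the pool
    with multiprocessing.Pool(4) as pool:
        pool.map(work, range(10))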
For whoever might need this, I wrote a decorator for the multiprocessing_logging package that adds the current process name to logs, so it becomes clear who logs what.
It also runs install_mp_handler(), so there is no need to run it yourself before creating a pool.
This allows me to see which worker creates which log messages.
Here's the blueprint with an example:
import sys
import logging
from functools import wraps
import multiprocessing
import multiprocessing_logging

# Setup basic console logger as 'logger'
logger = logging.getLogger()
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(logging.Formatter(u'%(asctime)s :: %(levelname)s :: %(message)s'))
logger.setLevel(logging.DEBUG)
logger.addHandler(console_handler)

# Create a decorator for functions that are called via multiprocessing pools
def logs_mp_process_names(fn):
    class MultiProcessLogFilter(logging.Filter):
        def filter(self, record):
            try:
                process_name = multiprocessing.current_process().name
            except BaseException:
                process_name = __name__
            record.msg = f'{process_name} :: {record.msg}'
            return True

    multiprocessing_logging.install_mp_handler()
    f = MultiProcessLogFilter()

    # Wraps is needed here so apply / apply_async know the function name
    @wraps(fn)
    def wrapper(*args, **kwargs):
        logger.removeFilter(f)
        logger.addFilter(f)
        return fn(*args, **kwargs)

    return wrapper

# Create a test function and decorate it
@logs_mp_process_names
def test(argument):
    logger.info(f'test function called via: {argument}')

# You can also redecorate undecorated functions
def undecorated_function():
    logger.info('I am not decorated')

@logs_mp_process_names
def redecorated(*args, **kwargs):
    return undecorated_function(*args, **kwargs)

# Enjoy
if __name__ == '__main__':
    with multiprocessing.Pool() as mp_pool:
        # Also works with apply_async
        mp_pool.apply(test, ('mp pool',))
        mp_pool.apply(redecorated)
    logger.info('some main logs')
    test('main program')
One of the alternatives is to write the multiprocessing logging to a known file and register an atexit handler to join those processes and read it back on stderr; however, you won't get a real-time flow of the output messages on stderr that way.
Simplest idea as mentioned (a minimal sketch follows the list):
Grab the filename and the process id of the current process.
Set up a WatchedFileHandler. The reasons for this handler are discussed in detail here, but in short there are certain worse race conditions with the other logging handlers. This one has the shortest window for the race condition.
Choose a path to save the logs to such as /var/log/...
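A minimal sketch of that per-process setup (the directory and file naming are assumptions):

import logging
import logging.handlers
import os

def setup_process_logging(log_dir="/var/log/myapp"):
    """Attach a WatchedFileHandler writing to a pid-specific file."""
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    path = os.path.join(log_dir, "worker-%d.log" % os.getpid())
    handler = logging.handlers.WatchedFileHandler(path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(process)d %(levelname)s %(message)s"))
    logger.addHandler(handler)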
