Python Logging vs performance

Python Logging vs performance - python

I'm using python Logger in one of my programs.
The program is a solver for an np-hard problem and therefore uses deep iterations that run several times.
My question is if the Logger can be an issue in the performance of my program and if there are better ways to log information maintaining performance.

Depending on your Logger configuration and the amount of logs your program produces, yes, logging can be a performance bottleneck because of the blocking Logger operation. For example when directly logging to an NFS file from a NFS server with slow response time. One possible approach to improve performance in such case would be switching to use of a logserver able to buffer and possibly batch logging operations - the blocking would be limited to the communication with the logserver, not to the (slow) logfile access, which is often better from the performance prospective.

I had very good experience using two different logfiles.
The server.log file is for the operator and receives only important messages, usually INFO, WARNING, ERROR, CRITICAL.
The debug.log file for the developer to analyze errors. It contains up to 100 DEBUG message from the thread of the time before an ERROR occurred.
For the second file, I use thread-local ring-buffers that are only written to a file when the program detects an error. Thus the server.log file remains small
but the developers get enough debug messages to analyze problems later. If not problem occurs, then both files are totally empty, and thus do not harm the performance. Of course, the buffers cost memory and a little CPU power, but that can be accepted.
This is an example implementation which I am using in Odoo (which is a Python application):
import logging, collections, time
class LogBuffer(logging.Handler):
"""Buffer debug messages per thread and write them out when an error (or warning) occurs"""
def __init__(self, target_handler, threshold, max_buffered_messages, max_buffer_seconds):
logging.Handler.__init__(self, logging.DEBUG)
self.tread_buffers = dict() # stores one buffer for each thread (key=thread number)
self.target_handler = target_handler
self.threshold = threshold
self.max_buffered_messages = max_buffered_messages
self.last_check_time = time.time()
self.max_buffer_seconds = max_buffer_seconds
def emit(self, record):
"""Do whatever it takes to actually log the specified logging record."""
# Create a thread local buffer, if not already exists
if record.thread not in self.tread_buffers:
thread_buffer = self.tread_buffers[record.thread] = collections.deque()
else:
thread_buffer = self.tread_buffers[record.thread]
# Put the log record into the buffer
thread_buffer.append(record)
# If the buffer became to large, then remove the oldest entry
if len(thread_buffer) > self.max_buffered_messages:
thread_buffer.popleft()
# produce output if the log level is high enough
if record.levelno >= self.threshold:
for r in thread_buffer:
self.target_handler.emit(r)
thread_buffer.clear()
# remove very old messages from all buffers once per minute
now = time.time()
elapsed = now - self.last_check_time
if elapsed > 60:
# Iterate over all buffers
for key, buffer in list(self.tread_buffers.items()):
# Iterate over the content of one buffer
for r in list(buffer):
age = now - r.created
if age > self.max_buffer_seconds:
buffer.remove(r)
# If the buffer is now empty, then remove it
if not buffer:
del self.tread_buffers[key]
self.last_check_time = now
Example how to create/configure such a logger:
import logging
from . import logbuffer
"""
Possible placeholders for the formatter:
%(name)s Name of the logger (logging channel)
%(levelno)s Numeric logging level for the message (DEBUG, INFO,
WARNING, ERROR, CRITICAL)
%(levelname)s Text logging level for the message ("DEBUG", "INFO",
"WARNING", "ERROR", "CRITICAL")
%(pathname)s Full pathname of the source file where the logging
call was issued (if available)
%(filename)s Filename portion of pathname
%(module)s Module (name portion of filename)
%(lineno)d Source line number where the logging call was issued
(if available)
%(funcName)s Function name
%(created)f Time when the LogRecord was created (time.time()
return value)
%(asctime)s Textual time when the LogRecord was created
%(msecs)d Millisecond portion of the creation time
%(relativeCreated)d Time in milliseconds when the LogRecord was created,
relative to the time the logging module was loaded
(typically at application startup time)
%(thread)d Thread ID (if available)
%(threadName)s Thread name (if available)
%(process)d Process ID (if available)
%(message)s The result of record.getMessage(), computed just as
the record is emitted
"""
# Log levels are: CRITICAL, ERROR, WARNING, INFO, DEBUG
# Specify the output format
formatter = logging.Formatter('%(asctime)-15s %(thread)20d %(levelname)-8s %(name)s %(message)s')
# Create server.log
server_log = logging.FileHandler('../log/server.log')
server_log.setLevel(logging.INFO)
server_log.setFormatter(formatter)
logging.root.addHandler(server_log)
# Create debug.log
debug_log = logging.FileHandler('../log/debug.log')
debug_log.setFormatter(formatter)
memory_handler = logbuffer.LogBuffer(debug_log, threshold=logging.ERROR, max_buffered_messages=100, max_buffer_seconds=600)
logging.root.addHandler(memory_handler)
# Specify log levels for individual packages
logging.getLogger('odoo.addons').setLevel(logging.DEBUG)
# The default log level for all other packages
logging.root.setLevel(logging.INFO)
Please let me know if you find this helpful. Im on a very beginner level regarding Python, but I have the same thing in Java and C++ already running successfully for years.

Related

How To Get All Python Logs Only On Error?

I am using pythons built in logging class. I'd like to only log out when an error occurs, but when it does, to log out everything up until that point for debugging purposes.
It would be nice if I could reset this as well, so a long running process doesn't contain gigabytes of logs.
As an example. I have a process that processes one million widgets. Processing a widget can be complicated and involve several steps. If processing fails, knowing all of the logs for that widget up to that point would be helpful.
from random import randrange
logger = logging.getLogger()
for widget in widgetGenerator():
logger.reset()
widget.process(logger)
class Widget():
def process(self, logger):
logger.info('doing stuff')
logger.info('do more stuff')
if randrange(0, 10) == 5:
logger.error('something bad happened')
1 out of 10 times the following would be printed:
doing stuff
doing more stuff
something bad happened
But the normal logs would not be printed otherwise.
Can this be done with the logger as is or do I need to roll my own implementation?

Use a MemoryHandler to buffer records using a threshold of e.g. ERROR, and make the MemoryHandler's target attribute point to a handler which writes to e.g. console or file. Then output should only occur if the threshold (e.g. ERROR) is hit during program execution.

Unable to set different logging levels for two logging handlers in python

As described in python's logging cookbook, I want to display logging.INFO on the console, while simultaneously writing logging.WARNING to a log file.
However, I see logging.INFO on the console as well as in the log file when using this code:
import logging
def initialize_logger():
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO) # <--- ADDING THIS LINE SOLVED IT
fh = logging.FileHandler('error.log') # create file handler which logs WARNING
fh.setLevel(logging.WARNING)
ch = logging.StreamHandler() # create console handler which logs INFO
ch.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)s - %(message)s') # create formatter
ch.setFormatter(formatter) # add formatter to handlers
fh.setFormatter(formatter) # add formatter to handlers
logger.addHandler(ch) # add the handlers to logger
logger.addHandler(fh) # add the handlers to logger
return logger
logger = initialize_logger()
Why do I see the same logging level for FileHandler and StreamHandler?
How to properly set different logging levels for two simultaneously running handlers?
EDIT: added colon after function definition

Okay, to answer your first question,
Logger objects have a threefold job. First, they expose several
methods to application code so that applications can log messages at
runtime. Second, logger objects determine which log messages to act
upon based upon severity (the default filtering facility) or filter
objects. Third, logger objects pass along relevant log messages to all
interested log handlers.
So in your application, you need that some level messages get logged to a file, and some are displayed on console. So to do that, you would first need to create a logger object, specify the lowest severity (ie. the default filtering facility as mentioned in doc above) that will be dispatched to the appropriate destination (ie. first to the handler, then right handlers' destination).
It is like you are saying to the logger object, your handlers will be handling log messages only above this level. In case you do not specify it, or you give a level above which the handlers are going to dispatch, then that log message may not be dispatched to the handler because the logger object did not receive it in the first place. Make sense?
So that means if you are going to be using handlers, you are required to setLevel() for logger first, since that is the initial filter/point where the log messages are dispatched to. Later, the logger dispatches it to respective handlers.
For your next question,
I ran your code after adding the following lines at the bottom:
logger.debug('Quick zephyrs blow, vexing daft Jim.')
logger.info('How quickly daft jumping zebras vex.')
logger.warning('Jail zesty vixen who grabbed pay from quack.')
logger.error('The five boxing wizards jump quickly.')
and I am getting the last three (from logger.info()) in my console and the last two (from logger.warning()) in my file at temp/error.log. This is the expected correct behavior. So I am not getting the error here.
Are you sure you are adding the logger.setLevel(logging.INFO) at the beginning? If you don't, then you will get the same level for both file and console. Do check that and let me know. Hope this helps!

Add Custom Function Output to Python Logging Handler

Using logging to handle logging across multiple modules within a simulation framework with it's own 'time'.
Basically, I'm getting things like:
WARNING:Node[n0].App:RoutingTest:No Packet Count List set up yet; fudging it with an broadcast first
INFO:Node[n0].Layercake.ALOHA:Transmit to Any
INFO:Node[n0].Layercake.ALOHA:The timeout is 16.0910738255
WARNING:Node[n1].App:RoutingTest:No Packet Count List set up yet; fudging it with an broadcast first
INFO:Node[n1].Layercake.ALOHA:Transmit to Any
And while these happen more or less instantaneously in 'real' time it's tough to tell what that means in machine time.
Within the framework, there's a globally accessible Sim.now() that returns the current run time.
While I could go through all my logging uses and add this as an additional tail field, I'd rather add it as part of the base logging handler, however a scan through the relevant documentation and searches here and google haven't turned up anything directly relevant. There was one guy asking almost the same question but didn't get an appropriate response
In essence, I want to up date the base handler to prefix all log calls with a call to this function, effectively
logline="[{T}]:{msg}".format(T=Sim.now(), msg=logmsg)
Any pointers?

You could write a custom Formatter:
import logging
from sim import Sim
class SimNowPrefixFormatter(logging.Formatter):
def format(self, record):
log_message = super(SimNowPrefixFormatter, self).format(record)
return "[{}]:{}".format(Sim.now(), log_message)
# Your base logging handler
handler = logging.StreamHandler()
handler.setFormatter(SimNowPrefixFormatter("%(levelname)s:%(message)s"))
root_logger = logging.getLogger()
root_logger.addHandler(handler)

Python 2.7: log displayed twice when `logging` module is used in two python scripts

Context:
Python 2.7.
Two files in the same folder:
First: main script.
Second: custom module.
Goal:
Possibility to use the logging module without any clash (see output below).
Files:
a.py:
import logging
from b import test_b
def test_a(logger):
logger.debug("debug")
logger.info("info")
logger.warning("warning")
logger.error("error")
if __name__ == "__main__":
# Custom logger.
logger = logging.getLogger("test")
formatter = logging.Formatter('[%(levelname)s] %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
# Test A and B.
print "B"
test_b()
print "A"
test_a(logger)
b.py:
import logging
def test_b():
logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error("error")
Output:
As one could see below, the log is displayed twice.
python a.py
B
WARNING:root:warning
ERROR:root:error
A
[DEBUG] debug
DEBUG:test:debug
[INFO] info
INFO:test:info
[WARNING] warning
WARNING:test:warning
[ERROR] error
ERROR:test:error
Would anyone have a solution to this?
EDIT: not running test_b() will cause no log duplication and correct log formatting (expected).

I'm not sure I understand your case, because the description doesn't match the output… but I think I know what your problem is.
As the docs explain:
Note: If you attach a handler to a logger and one or more of its ancestors, it may emit the same record multiple times. In general, you should not need to attach a handler to more than one logger - if you just attach it to the appropriate logger which is highest in the logger hierarchy, then it will see all events logged by all descendant loggers, provided that their propagate setting is left set to True. A common scenario is to attach handlers only to the root logger, and to let propagation take care of the rest.
And that "common scenario" usually works great, but I assume you need to attach a custom handler to "test", without affecting the root logger.
So, if you want a custom handler on "test", and you don't want its messages also going to the root handler, the answer is simple: turn off its propagate flag:
logger.propagate = False
The reason this only happens if you call test_b is that otherwise, the root logger never gets initialized. The first time you log to any logger that hasn't been configured, it effectively does a basicConfig() on that logger. So, calling logging.getLogger().info(msg) or logging.info(msg) will configure the root logger. But propagating from a child logger will not.
I believe this is explained somewhere in the logging HOWTO or cookbook, both under HOWTOs, but in the actual module docs, it's buried in the middle of a note about threading under logging.log:
Note: The above module-level functions which delegate to the root logger should not be used in threads, in versions of Python earlier than 2.7.1 and 3.2, unless at least one handler has been added to the root logger before the threads are started. These convenience functions call basicConfig() to ensure that at least one handler is available; ; in earlier versions of Python, this can (under rare circumstances) lead to handlers being added multiple times to the root logger, which can in turn lead to multiple messages for the same event.
It's pretty easy to see how you could have missed that!

What is the proper way to do logging to a CSV file?

I want to log some information of every single request sent to a busy http server in a formatted form. Using the logging module would create some thing I don't want to:
[I 131104 15:31:29 Sys:34]
I thought of the CSV format, but I don't know how to customize it. Python has the csv module, but I read in the manual
import csv
with open('some.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(someiterable)
Since it would open and close a file each time, I am afraid this way would slow down the whole server performance. What could I do?

Just use python's logging module.
You can adjust the output the way you want; take a look at Changing the format of displayed messages:
To change the format which is used to display messages, you need to specify the format you want to use:
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
logging.debug('This message should appear on the console')
logging.info('So should this')
logging.warning('And this, too')
and Formatters:
Formatter objects configure the final order, structure, and contents of the log message.
You'll find a list of the attribtus you can use here: LogRecord attributes.
If you want to produce a valid csv-file, use python's csv module, too.
Here's a simple example:
import logging
import csv
import io
class CsvFormatter(logging.Formatter):
def __init__(self):
super().__init__()
self.output = io.StringIO()
self.writer = csv.writer(self.output, quoting=csv.QUOTE_ALL)
def format(self, record):
self.writer.writerow([record.levelname, record.msg])
data = self.output.getvalue()
self.output.truncate(0)
self.output.seek(0)
return data.strip()
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logging.root.handlers[0].setFormatter(CsvFormatter())
logger.debug('This message should appear on the console')
logger.info('So should "this", and it\'s using quoting...')
logger.warning('And this, too')
Output:
"DEBUG","This message should appear on the console"
"INFO","So should ""this"", and it's using quoting..."
"WARNING","And this, too"

As sloth suggests, you can easily edit the delimiter of the log to a comma, thus producing a CSV file.
Working example:
import logging
# create logger
lgr = logging.getLogger('logger name')
lgr.setLevel(logging.DEBUG) # log all escalated at and above DEBUG
# add a file handler
fh = logging.FileHandler('path_of_your_log.csv')
fh.setLevel(logging.DEBUG) # ensure all messages are logged to file
# create a formatter and set the formatter for the handler.
frmt = logging.Formatter('%(asctime)s,%(name)s,%(levelname)s,%(message)s')
fh.setFormatter(frmt)
# add the Handler to the logger
lgr.addHandler(fh)
# You can now start issuing logging statements in your code
lgr.debug('a debug message')
lgr.info('an info message')
lgr.warn('A Checkout this warning.')
lgr.error('An error writen here.')
lgr.critical('Something very critical happened.')

I would agree that you should use the logging module, but you can't really do it properly with just a format string like some of the other answers show, as they do not address the situation where you log a message that contains a comma.
If you need a solution that will properly escape any special characters in the message (or other fields, I suppose), you would have to write a custom formatter and set it.
logger = logging.getLogger()
formatter = MyCsvFormatter()
handler = logging.FileHandler(filename, "w")
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(level)
You'll obviously have to implement the MyCsvFormatter class, which should inherit from logging.Formatter and override the format() method
class MyCsvFormatter(logging.Formatter):
def __init__(self):
fmt = "%(levelname)s,%(message)s" # Set a format that uses commas, like the other answers
super(MyCsvFormatter, self).__init__(fmt=fmt)
def format(self, record):
msg = record.getMessage()
# convert msg to a csv compatible string using your method of choice
record.msg = msg
return super(MyCsvFormatter, self).format(self, record)
Note: I've done something like this before, but haven't tested this particular code sample
As far as doing the actual escaping of the message, here's one possible approach:
Python - write data into csv format as string (not file)

I don't think that is the best idea, but it is doable, and quite simple.
Manually buffer your log. Store log entries in some place, and write them to file from time to time.
If you know that your server will be constantly busy, flush your buffer when it reaches some size. If there may be big gaps in usage, I'd say that new thread (or better process, check yourself why threads suck and slow down apps) with endless (theoretically of course) loop of sleep/flush would be better call.
Also, remember to create some kind of hook that will flush buffer when server is interrupted or fails (maybe signals? or just try/except on main function - there are even more ways to do it), so you don't lose unflushed buffer data on unexpected exit.
I repeat - this is not the best idea, it's the first thing that came to my mind. You may want to consult logging implementations from Flask or some other webapp framework (AFAIR Flask has CSV logging too).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.