What is the proper way to do logging to a CSV file? - python

I want to log some information about every single request sent to a busy HTTP server, in a formatted form. Using the logging module would create something I don't want:
[I 131104 15:31:29 Sys:34]
I thought of the CSV format, but I don't know how to customize it. Python has the csv module, but this is what I read in the manual:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Since this would open and close the file for each request, I am afraid this approach would slow down the whole server. What can I do?

Just use Python's logging module.
You can adjust the output the way you want; take a look at Changing the format of displayed messages:
To change the format which is used to display messages, you need to specify the format you want to use:
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
logging.debug('This message should appear on the console')
logging.info('So should this')
logging.warning('And this, too')
and Formatters:
Formatter objects configure the final order, structure, and contents of the log message.
You'll find a list of the attributes you can use here: LogRecord attributes.
If you want to produce a valid CSV file, use Python's csv module, too.
Here's a simple example:
import logging
import csv
import io

class CsvFormatter(logging.Formatter):
    def __init__(self):
        super().__init__()
        self.output = io.StringIO()
        self.writer = csv.writer(self.output, quoting=csv.QUOTE_ALL)

    def format(self, record):
        # use getMessage() so %-style arguments get interpolated into the message
        self.writer.writerow([record.levelname, record.getMessage()])
        data = self.output.getvalue()
        self.output.truncate(0)
        self.output.seek(0)
        return data.strip()
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logging.root.handlers[0].setFormatter(CsvFormatter())
logger.debug('This message should appear on the console')
logger.info('So should "this", and it\'s using quoting...')
logger.warning('And this, too')
Output:
"DEBUG","This message should appear on the console"
"INFO","So should ""this"", and it's using quoting..."
"WARNING","And this, too"

As sloth suggests, you can easily set the delimiter of the log format to a comma, thus producing a CSV file.
Working example:
import logging
# create logger
lgr = logging.getLogger('logger name')
lgr.setLevel(logging.DEBUG) # log all escalated at and above DEBUG
# add a file handler
fh = logging.FileHandler('path_of_your_log.csv')
fh.setLevel(logging.DEBUG) # ensure all messages are logged to file
# create a formatter and set the formatter for the handler.
frmt = logging.Formatter('%(asctime)s,%(name)s,%(levelname)s,%(message)s')
fh.setFormatter(frmt)
# add the Handler to the logger
lgr.addHandler(fh)
# You can now start issuing logging statements in your code
lgr.debug('a debug message')
lgr.info('an info message')
lgr.warning('Check out this warning.')
lgr.error('An error was written here.')
lgr.critical('Something very critical happened.')
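One caveat: the default %(asctime)s format itself contains a comma before the milliseconds (e.g. 2013-11-04 15:31:29,123), which would split the timestamp into two CSV fields. If that matters, pass an explicit datefmt when creating the formatter:
frmt = logging.Formatter('%(asctime)s,%(name)s,%(levelname)s,%(message)s',
                         datefmt='%Y-%m-%d %H:%M:%S')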

I would agree that you should use the logging module, but you can't really do it properly with just a format string, as some of the other answers show: they do not address the situation where the message you log contains a comma.
If you need a solution that properly escapes any special characters in the message (or other fields, I suppose), you will have to write a custom formatter and set it:
logger = logging.getLogger()
formatter = MyCsvFormatter()
handler = logging.FileHandler(filename, "w")
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(level)
You'll obviously have to implement the MyCsvFormatter class, which should inherit from logging.Formatter and override the format() method
class MyCsvFormatter(logging.Formatter):
    def __init__(self):
        fmt = "%(levelname)s,%(message)s"  # a format that uses commas, like the other answers
        super(MyCsvFormatter, self).__init__(fmt=fmt)

    def format(self, record):
        msg = record.getMessage()
        # convert msg to a CSV-compatible string using your method of choice
        record.msg = msg
        record.args = None  # args are already merged into msg
        return super(MyCsvFormatter, self).format(record)
Note: I've done something like this before, but haven't tested this particular code sample
As far as doing the actual escaping of the message, here's one possible approach:
Python - write data into csv format as string (not file)
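For illustration, here is one possible way to do that escaping, reusing the csv module as in sloth's answer (csv_escape is a hypothetical helper, a sketch rather than tested code):
import csv
import io

def csv_escape(value):
    # Hypothetical helper: render one value as a single, properly quoted CSV field
    output = io.StringIO()
    csv.writer(output, quoting=csv.QUOTE_MINIMAL).writerow([value])
    return output.getvalue().strip()

# Inside MyCsvFormatter.format(), one could then do:
#     record.msg = csv_escape(msg)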

I don't think that is the best idea, but it is doable, and quite simple.
Manually buffer your log. Store log entries in some place, and write them to file from time to time.
If you know that your server will be constantly busy, flush the buffer when it reaches some size. If there may be big gaps in usage, a separate thread (or better, a separate process; look up how the GIL can make threads slow down CPU-bound Python apps) running an endless (theoretically, of course) sleep/flush loop would be the better call.
Also, remember to create some kind of hook that flushes the buffer when the server is interrupted or fails (signals, maybe? or just try/except around your main function; there are even more ways to do it), so you don't lose unflushed buffer data on an unexpected exit.
I repeat: this is not the best idea, just the first thing that came to my mind. You may want to consult the logging implementations of Flask or some other webapp framework (AFAIR Flask has CSV logging too).
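For what it's worth, the standard library already ships a handler that implements this buffer-and-flush pattern: logging.handlers.MemoryHandler. A minimal sketch (capacity and filename are arbitrary examples):
import logging
import logging.handlers

# The slow target handler that does the actual file I/O
file_handler = logging.FileHandler('server.csv')
file_handler.setFormatter(logging.Formatter('%(asctime)s,%(levelname)s,%(message)s',
                                            datefmt='%Y-%m-%d %H:%M:%S'))

# Buffer up to 1000 records in memory; flush to the file when the buffer
# is full or when a record of ERROR or higher arrives.
memory_handler = logging.handlers.MemoryHandler(
    capacity=1000,
    flushLevel=logging.ERROR,
    target=file_handler,
)

logger = logging.getLogger('requests')
logger.setLevel(logging.DEBUG)
logger.addHandler(memory_handler)

logger.info('request handled')  # buffered, not yet written
logging.shutdown() flushes the remaining records on a clean exit; the crash/signal hook mentioned above is still your job.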

Related

Unable to set different logging levels for two logging handlers in python

As described in Python's logging cookbook, I want to display logging.INFO on the console, while simultaneously writing logging.WARNING to a log file.
However, I see logging.INFO on the console as well as in the log file when using this code:
import logging
def initialize_logger():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)  # <--- ADDING THIS LINE SOLVED IT

    fh = logging.FileHandler('error.log')  # create file handler which logs WARNING
    fh.setLevel(logging.WARNING)

    ch = logging.StreamHandler()  # create console handler which logs INFO
    ch.setLevel(logging.INFO)

    formatter = logging.Formatter('%(name)s - %(message)s')  # create formatter
    ch.setFormatter(formatter)  # add formatter to handlers
    fh.setFormatter(formatter)

    logger.addHandler(ch)  # add the handlers to logger
    logger.addHandler(fh)

    return logger

logger = initialize_logger()
Why do I see the same logging level for FileHandler and StreamHandler?
How to properly set different logging levels for two simultaneously running handlers?
Okay, to answer your first question,
Logger objects have a threefold job. First, they expose several methods to application code so that applications can log messages at runtime. Second, logger objects determine which log messages to act upon based upon severity (the default filtering facility) or filter objects. Third, logger objects pass along relevant log messages to all interested log handlers.
So in your application, you need some messages to be logged to a file and some to be displayed on the console. To do that, you first create a logger object and specify the lowest severity (i.e. the default filtering facility mentioned in the doc above) that will be dispatched to the appropriate destination (i.e. first to the handlers, then to each handler's destination).
It is as if you are telling the logger object: your handlers will only handle log messages at or above this level. If you do not specify it, or you set a level above what a handler expects, the log message may never reach the handler, because the logger object never dispatched it in the first place. Makes sense?
That means if you are going to use handlers, you must call setLevel() on the logger first, since the logger is the initial filter point from which log records are dispatched; only afterwards does the logger hand them to the respective handlers.
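A minimal sketch of that two-stage filtering (handler levels and the filename are examples):
import logging

logger = logging.getLogger('demo')
logger.setLevel(logging.INFO)    # gate 1: the logger drops anything below INFO

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)        # gate 2a: the console shows INFO and above
fh = logging.FileHandler('demo.log')
fh.setLevel(logging.WARNING)     # gate 2b: the file gets WARNING and above

logger.addHandler(ch)
logger.addHandler(fh)

logger.debug('dropped at the logger')  # fails gate 1, goes nowhere
logger.info('console only')            # passes gate 1, fails gate 2b
logger.warning('console and file')     # passes all gates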
For your next question,
I ran your code after adding the following lines at the bottom:
logger.debug('Quick zephyrs blow, vexing daft Jim.')
logger.info('How quickly daft jumping zebras vex.')
logger.warning('Jail zesty vixen who grabbed pay from quack.')
logger.error('The five boxing wizards jump quickly.')
and I get the last three (from logger.info() onward) in my console and the last two (from logger.warning() onward) in my file at temp/error.log. This is the expected, correct behavior, so I cannot reproduce the error here.
Are you sure you are adding the logger.setLevel(logging.INFO) at the beginning? If you don't, then you will get the same level for both file and console. Do check that and let me know. Hope this helps!

Cancel Python log entries?

Is it possible and what would be the best way to cancel some log entries added via log.debug() after they were added?
I'm trying to not log anything if nothing noteworthy happened in my application (which I only know by the end of my main() loop).
You can use this piece of code:
logger = logging.getLogger()
logger.disabled = True
## content that WON'T appear in your log file ##
logger.disabled = False
## content that WILL appear in your log file ##
You should take a look here.
Maybe you'll find something useful.
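Another option, if the decision can wait until the end of main(): buffer records in a logging.handlers.MemoryHandler and clear the buffer instead of flushing it when nothing noteworthy happened. A sketch (flushOnClose needs Python 3.6+; names and capacity are illustrative):
import logging
import logging.handlers

file_handler = logging.FileHandler('app.log')
# Never auto-flush: the flush level is set above CRITICAL,
# and flushOnClose=False keeps shutdown from flushing leftovers.
buffer_handler = logging.handlers.MemoryHandler(
    capacity=100000,
    flushLevel=logging.CRITICAL + 1,
    target=file_handler,
    flushOnClose=False,
)

logger = logging.getLogger('app')
logger.setLevel(logging.DEBUG)
logger.addHandler(buffer_handler)

def main():
    logger.debug('might be discarded later')
    noteworthy = True  # placeholder for your "did anything happen?" check
    if noteworthy:
        buffer_handler.flush()         # write all buffered records
    else:
        buffer_handler.buffer.clear()  # silently drop them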

Python Logging vs performance

I'm using python Logger in one of my programs.
The program is a solver for an NP-hard problem and therefore uses deep iterations that run several times.
My question is whether the Logger can be a performance issue in my program, and whether there are better ways to log information while maintaining performance.
Depending on your Logger configuration and the amount of logs your program produces, yes, logging can be a performance bottleneck, because Logger operations block. For example, when logging directly to a file on NFS from a server with slow response times, every logging call waits on the slow file access. One possible approach to improve performance in such a case is to switch to a log server that can buffer and possibly batch logging operations: the blocking is then limited to the communication with the log server, not to the slow logfile access, which is often better performance-wise.
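One concrete stdlib option in that spirit (Python 3.2+) is logging.handlers.QueueHandler paired with a QueueListener: application threads only enqueue records, and a separate listener thread does the slow I/O. A minimal sketch (the filename is an example):
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue

# Handlers attached to loggers only enqueue records: cheap and non-blocking.
queue_handler = logging.handlers.QueueHandler(log_queue)
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(queue_handler)

# A listener thread performs the slow I/O (e.g. the NFS-backed logfile).
file_handler = logging.FileHandler('server.log')
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

root.info('this call returns immediately; the listener thread writes it')

listener.stop()  # flushes the queue and joins the thread on shutdown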
I have had very good experience using two different logfiles.
The server.log file is for the operator and receives only important messages: usually INFO, WARNING, ERROR, CRITICAL.
The debug.log file is for the developer, to analyze errors. It contains up to 100 DEBUG messages from the same thread, from the time before an ERROR occurred.
For the second file, I use thread-local ring buffers that are only written to a file when the program detects an error. Thus the server.log file remains small, but the developers get enough debug messages to analyze problems later. If no problem occurs, then both files are totally empty and thus do not harm performance. Of course, the buffers cost memory and a little CPU power, but that can be accepted.
This is an example implementation which I am using in Odoo (which is a Python application):
import logging, collections, time

class LogBuffer(logging.Handler):
    """Buffer debug messages per thread and write them out when an error (or warning) occurs"""

    def __init__(self, target_handler, threshold, max_buffered_messages, max_buffer_seconds):
        logging.Handler.__init__(self, logging.DEBUG)
        self.thread_buffers = dict()  # stores one buffer for each thread (key=thread number)
        self.target_handler = target_handler
        self.threshold = threshold
        self.max_buffered_messages = max_buffered_messages
        self.last_check_time = time.time()
        self.max_buffer_seconds = max_buffer_seconds

    def emit(self, record):
        """Do whatever it takes to actually log the specified logging record."""
        # Create a thread-local buffer, if it does not already exist
        if record.thread not in self.thread_buffers:
            thread_buffer = self.thread_buffers[record.thread] = collections.deque()
        else:
            thread_buffer = self.thread_buffers[record.thread]

        # Put the log record into the buffer
        thread_buffer.append(record)

        # If the buffer became too large, remove the oldest entry
        if len(thread_buffer) > self.max_buffered_messages:
            thread_buffer.popleft()

        # Produce output if the log level is high enough
        if record.levelno >= self.threshold:
            for r in thread_buffer:
                self.target_handler.emit(r)
            thread_buffer.clear()

        # Remove very old messages from all buffers once per minute
        now = time.time()
        elapsed = now - self.last_check_time
        if elapsed > 60:
            # Iterate over all buffers
            for key, buffer in list(self.thread_buffers.items()):
                # Iterate over the content of one buffer
                for r in list(buffer):
                    age = now - r.created
                    if age > self.max_buffer_seconds:
                        buffer.remove(r)
                # If the buffer is now empty, remove it
                if not buffer:
                    del self.thread_buffers[key]
            self.last_check_time = now
An example of how to create and configure such a logger:
import logging
from . import logbuffer
"""
Possible placeholders for the formatter:
%(name)s Name of the logger (logging channel)
%(levelno)s Numeric logging level for the message (DEBUG, INFO,
WARNING, ERROR, CRITICAL)
%(levelname)s Text logging level for the message ("DEBUG", "INFO",
"WARNING", "ERROR", "CRITICAL")
%(pathname)s Full pathname of the source file where the logging
call was issued (if available)
%(filename)s Filename portion of pathname
%(module)s Module (name portion of filename)
%(lineno)d Source line number where the logging call was issued
(if available)
%(funcName)s Function name
%(created)f Time when the LogRecord was created (time.time()
return value)
%(asctime)s Textual time when the LogRecord was created
%(msecs)d Millisecond portion of the creation time
%(relativeCreated)d Time in milliseconds when the LogRecord was created,
relative to the time the logging module was loaded
(typically at application startup time)
%(thread)d Thread ID (if available)
%(threadName)s Thread name (if available)
%(process)d Process ID (if available)
%(message)s The result of record.getMessage(), computed just as
the record is emitted
"""
# Log levels are: CRITICAL, ERROR, WARNING, INFO, DEBUG
# Specify the output format
formatter = logging.Formatter('%(asctime)-15s %(thread)20d %(levelname)-8s %(name)s %(message)s')
# Create server.log
server_log = logging.FileHandler('../log/server.log')
server_log.setLevel(logging.INFO)
server_log.setFormatter(formatter)
logging.root.addHandler(server_log)
# Create debug.log
debug_log = logging.FileHandler('../log/debug.log')
debug_log.setFormatter(formatter)
memory_handler = logbuffer.LogBuffer(debug_log, threshold=logging.ERROR, max_buffered_messages=100, max_buffer_seconds=600)
logging.root.addHandler(memory_handler)
# Specify log levels for individual packages
logging.getLogger('odoo.addons').setLevel(logging.DEBUG)
# The default log level for all other packages
logging.root.setLevel(logging.INFO)
Please let me know if you find this helpful. I'm at a very beginner level regarding Python, but I have had the same thing running successfully in Java and C++ for years.

Add Custom Function Output to Python Logging Handler

I'm using logging to handle logging across multiple modules within a simulation framework with its own 'time'.
Basically, I'm getting things like:
WARNING:Node[n0].App:RoutingTest:No Packet Count List set up yet; fudging it with an broadcast first
INFO:Node[n0].Layercake.ALOHA:Transmit to Any
INFO:Node[n0].Layercake.ALOHA:The timeout is 16.0910738255
WARNING:Node[n1].App:RoutingTest:No Packet Count List set up yet; fudging it with an broadcast first
INFO:Node[n1].Layercake.ALOHA:Transmit to Any
And while these happen more or less instantaneously in 'real' time, it's tough to tell what that means in machine time.
Within the framework, there's a globally accessible Sim.now() that returns the current run time.
While I could go through all my logging uses and add this as an additional tail field, I'd rather add it as part of the base logging handler. However, a scan through the relevant documentation and searches here and on Google hasn't turned up anything directly relevant. There was one person asking almost the same question, but they didn't get an appropriate response.
In essence, I want to update the base handler to prefix all log calls with a call to this function, effectively:
logline="[{T}]:{msg}".format(T=Sim.now(), msg=logmsg)
Any pointers?
You could write a custom Formatter:
import logging
from sim import Sim
class SimNowPrefixFormatter(logging.Formatter):
    def format(self, record):
        log_message = super(SimNowPrefixFormatter, self).format(record)
        return "[{}]:{}".format(Sim.now(), log_message)
# Your base logging handler
handler = logging.StreamHandler()
handler.setFormatter(SimNowPrefixFormatter("%(levelname)s:%(message)s"))
root_logger = logging.getLogger()
root_logger.addHandler(handler)
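An alternative sketch: instead of wrapping the formatted message, a logging.Filter can attach the simulation time to every record, so it becomes available as a normal format placeholder (the Sim class below is a stand-in for the framework's real global clock):
import logging

class Sim:
    """Hypothetical stand-in for the framework's global simulation clock."""
    @staticmethod
    def now():
        return 42.0

class SimTimeFilter(logging.Filter):
    """Attach the current simulation time to every record as 'sim_time'."""
    def filter(self, record):
        record.sim_time = Sim.now()
        return True

handler = logging.StreamHandler()
handler.addFilter(SimTimeFilter())
handler.setFormatter(logging.Formatter('[%(sim_time)s]:%(levelname)s:%(message)s'))

root_logger = logging.getLogger()
root_logger.addHandler(handler)
root_logger.setLevel(logging.INFO)
root_logger.info('Transmit to Any')  # -> [42.0]:INFO:Transmit to Any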

Suppress newline in Python logging module

I'm trying to replace an ad-hoc logging system with Python's logging module. I'm using the logging system to output progress information for a long task on a single line, so you can tail the log or watch it in a console. I've done this by having a flag on my logging function which suppresses the newline for that log message and builds the line piece by piece.
All the logging is done from a single thread so there's no serialisation issues.
Is it possible to do this with Python's logging module? Is it a good idea?
If you want to do this, you can change the logging handler's terminator attribute. I'm using Python 3.4; this attribute was introduced in Python 3.2, as stated by Ninjakannon.
handler = logging.StreamHandler()
handler.terminator = ""
When the StreamHandler writes, it writes the terminator last.
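A complete runnable sketch of that approach (logger name and messages are examples):
import logging

handler = logging.StreamHandler()
handler.terminator = ''  # suppress the automatic newline
logger = logging.getLogger('progress')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info('working...')
logger.info(' step 1 done,')
logger.info(' step 2 done\n')  # add the final newline yourself to finish the line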
Let's start with your last question: No, I do not believe it's a good idea.
IMO, it hurts the readability of the logfile in the long run.
I suggest sticking with the logging module and using the '-f' option on your 'tail' command to watch the output from the console. You will probably end up using the FileHandler. Note that the default argument for 'delay' is False, meaning the file is opened immediately rather than deferred until the first log message.
If you really need to suppress newlines, I would recommend creating your own Handler.
The new line, \n, is inserted inside the StreamHandler class.
If you're really set on changing this behaviour, here's an example of how I solved it by monkey-patching the emit(self, record) method of the logging.StreamHandler class.
A monkey patch is a way to extend or modify the run-time code of dynamic languages without altering the original source code. This process has also been termed duck punching.
Here is the custom implementation of emit() that omits line breaks (note that this snippet dates from Python 2; types.UnicodeType does not exist in Python 3):
import types

def customEmit(self, record):
    # Monkey-patched emit function to avoid new lines between records
    try:
        msg = self.format(record)
        if not hasattr(types, "UnicodeType"):  # if no unicode support...
            self.stream.write(msg)
        else:
            try:
                if getattr(self.stream, 'encoding', None) is not None:
                    self.stream.write(msg.encode(self.stream.encoding))
                else:
                    self.stream.write(msg)
            except UnicodeError:
                self.stream.write(msg.encode("UTF-8"))
        self.flush()
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        self.handleError(record)
Then you would make a custom logging class (in this case, subclassing from TimedRotatingFileHandler).
from logging import StreamHandler
from logging.handlers import TimedRotatingFileHandler

class SniffLogHandler(TimedRotatingFileHandler):
    def __init__(self, filename, when, interval, backupCount=0,
                 encoding=None, delay=0, utc=0):
        # Monkey patch the 'emit' method
        setattr(StreamHandler, StreamHandler.emit.__name__, customEmit)
        TimedRotatingFileHandler.__init__(self, filename, when, interval,
                                          backupCount, encoding, delay, utc)
Some people might argue that this type of solution is not Pythonic, or whatever. It might be so, so be careful.
Also, be aware that this will globally patch StreamHandler.emit(...), so if you are using multiple logging classes, this patch will affect the other logging classes as well!
Check out these for further reading:
What is monkey-patching?
Is monkeypatching considered good programming practice?
Monkeypatching For Humans
Hope that helps.
Python 3.5.9
class MFileHandler(logging.FileHandler):
    """Handler that controls the writing of the newline character"""

    special_code = '[!n]'

    def emit(self, record) -> None:
        if self.special_code in record.msg:
            record.msg = record.msg.replace(self.special_code, '')
            self.terminator = ''
        else:
            self.terminator = '\n'
        return super().emit(record)
Then
fHandler = MFileHandler(...)
Example:
# without \n
log.info( 'waiting...[!n]' )
...
log.info( 'OK' )
# with \n
log.info( 'waiting...' )
...
log.info( 'OK' )
log.txt:
waiting...OK
waiting...
OK
I encountered a need to log a certain section on a single line as I iterated through a tuple, but wanted to retain the overall logger.
I collected the output into a single string first and sent it to the logger once I was out of the section. An example of the concept:
debugLine = ''
for fld in obj._fields:
    strX = ' {} --> {} '.format(fld, formattingFunction(getattr(obj, fld)))
    debugLine += strX
logger.debug(debugLine)
