I use the standard logging module for logging inside an AWS Lambda with the Python 3.7 runtime.
I would like to perform certain manipulations on log statements before they are flushed to stdout, e.g. wrap the message as JSON and add tracing data, so that they are parseable by our Kibana parser.
I don't want to write my own decorator for that, because it wouldn't apply to the underlying dependencies.
Ideally, it should be something like a configured callback for the logger
so that it would do the following work for me:
import json
import sys

log_statement = {}
log_statement['message'] = 'this is the message'
log_statement['X-B3-TraceId'] = "76b85f5e32ce7b46"
log_statement['level'] = 'INFO'
sys.stdout.write(json.dumps(log_statement) + '\n')
while still being able to call logger.info('this is the message').
How can I do that?
Answering my own question:
I had to use LoggerAdapter, which is quite a good fit for the purpose of pre-processing log statements:
import logging

class CustomAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        log_statement = '{"X-B3-TraceId":"%s", "message":"%s"}' % (self.extra['X-B3-TraceId'], msg) + '\n'
        return log_statement, kwargs
See: https://docs.python.org/3/howto/logging-cookbook.html#using-loggeradapters-to-impart-contextual-information
In general, the next step would be just plugging in the adapter like:
import logging
...
logging.basicConfig(format='%(message)s')
logger = logging.getLogger()
logger.setLevel(LOG_LEVEL)
custom_logger = CustomAdapter(logger, {'X-B3-TraceId': "test"})
...
custom_logger.info("test")
Note: I had to set the format to the message only because I need the whole statement to be a JSON string. Unfortunately, this means I lose some predefined log record fields, e.g. aws_request_id. This is a limitation of LoggerAdapter.process, as it handles only the message part. If anyone has a better approach here, please suggest it.
It also appears that the AWS Lambda Python runtime interferes with the logging facility, so changing the format as above did not work. I additionally had to do this:
FORMAT = "%(message)s"
logger = logging.getLogger()
for h in logger.handlers:
    h.setFormatter(logging.Formatter(FORMAT))
See: https://gist.github.com/niranjv/fb95e716151642e8ca553b0e38dd152e
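As a possible improvement on the LoggerAdapter approach, here is a hedged sketch of a custom Formatter that serializes each record to JSON. The aws_request_id and X-B3-TraceId fields are read defensively with getattr, since they exist on the record only if something (the runtime, a Filter, or an adapter's extra) has put them there:

import json
import logging

class JsonFormatter(logging.Formatter):
    """Sketch: render each LogRecord as a single JSON line."""

    def format(self, record):
        log_statement = {
            "message": record.getMessage(),
            "level": record.levelname,
            # These attributes are assumptions: they are present only if
            # injected by the runtime, a Filter, or extra={...} on the call.
            "aws_request_id": getattr(record, "aws_request_id", None),
            "X-B3-TraceId": getattr(record, "X-B3-TraceId", None),
        }
        if record.exc_info:
            log_statement["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_statement)

# Attach it to the handlers the runtime already registered, as in the gist above.
for h in logging.getLogger().handlers:
    h.setFormatter(JsonFormatter())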
Related
I monitor my script with the logging module of the Python standard library, and I send the log records both to the console with a StreamHandler and to a file with a FileHandler.
I would like to have the option to disable a handler for a LogRecord independently of its severity. For example, for a specific LogRecord I would like the option not to send it to the file destination or to the console (by passing a parameter).
I have found that the library has the Filter class for that purpose (described as a finer-grained way to filter records), but I haven't figured out how to do it.
Any ideas how to do this in a consistent way?
Finally, it is quite easy. I used a function as a Handler filter, as suggested in the comments.
This is a working example:
from pathlib import Path
import logging
from logging import LogRecord

def build_handler_filters(handler: str):
    def handler_filter(record: LogRecord):
        if hasattr(record, 'block'):
            if record.block == handler:
                return False
        return True
    return handler_filter

ch = logging.StreamHandler()
ch.addFilter(build_handler_filters('console'))
fh = logging.FileHandler(Path('/tmp/test.log'))
fh.addFilter(build_handler_filters('file'))

mylogger = logging.getLogger(__name__)
mylogger.setLevel(logging.DEBUG)
mylogger.addHandler(ch)
mylogger.addHandler(fh)
When the logger is called, the message is sent to both the console and the file, i.e.
mylogger.info('msg').
To block, for example, the file destination, the logger should be called with the extra argument, like this:
mylogger.info('msg only to console', extra={'block': 'file'})
Disabling console is analogous.
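For completeness, blocking the console instead (so the record goes only to the file) would look like:

mylogger.info('msg only to file', extra={'block': 'console'})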
Currently I use a custom logging system for logging, which works as follows:
I have a Log class that resembles the following:
import datetime

class Log:
    def __init__(self):
        self.script = ""
        self.datetime = datetime.datetime.now().replace(second=0, microsecond=0)
        self.mssg = ""
        self.mssg_detail = ""
        self.err = ""
        self.err_detail = ""
I created a function decorator that performs a try/except around the function call, and adds a message either to .mssg or to .err on the Log object accordingly:
import functools

def logging(fun):
    @functools.wraps(fun)
    def inner(self, *args):
        try:
            f = fun(self, *args)
            self.logger.mssg += fun.__name__ + " :ok, "
            return f
        except Exception as e:
            self.logger.err += fun.__name__ + ": error: " + str(e.args)
    return inner
So usually a script is a class that is composed of multiple methods that are run sequentially.
I hence run those methods (decorated as mentioned above), and lastly I upload the Log object into a MySQL db.
This works quite well. But now I want to modify these pieces so that they integrate with the "official" logging module of Python.
What I don't like about that module is that it does not seem possible to "save" the messages onto one log object in order to upload/save the log only at the end of the run. Rather, each logging call writes/sends the message to a file etc. immediately, which sometimes creates performance issues. I could use handlers.MemoryHandler, but it still doesn't seem to behave like my original system: it is said to collect messages and flush them to another handler periodically, which is not what I want. I want to collect the messages in memory and flush them on request with an explicit function call.
Anyone has any suggestions?
Here is my idea. Use a handler to capture the log in a StringIO. Then you can grab the StringIO whenever you want. Since there was perhaps some confusion in the discussion thread: StringIO is a "file-like" interface for strings; there isn't ever an actual file involved.
import logging
import io

def initialize_logging(log_level, log_name='default_logname'):
    logger = logging.getLogger(log_name)
    logger.setLevel(log_level)
    log_stream = io.StringIO()
    if not logger.handlers:
        ch = logging.StreamHandler(log_stream)
        ch.setLevel(log_level)
        ch.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        ))
        logger.addHandler(ch)
    logger.propagate = 0
    return logger, log_stream
And then something like:
>>> logger, log_stream = initialize_logging(logging.INFO, "logname")
>>> logger.warning("Hello World!")
And when you want the log information:
>>> log_stream.getvalue()
'2017-05-16 16:35:03,501 - logname - WARNING - Hello World!\n'
At program start (in the main), you can:
instantiate your custom logger => global variable/singleton.
register a function at program end which will flush your logger.
Run your decorated functions.
To register such a function you can use atexit.register. See the page Exit handlers in the docs.
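A minimal sketch of that idea, with a hypothetical upload_to_mysql placeholder standing in for the real database upload:

import atexit
import datetime

class Log:
    """Simplified in-memory log object, as in the question."""
    def __init__(self):
        self.datetime = datetime.datetime.now().replace(second=0, microsecond=0)
        self.mssg = ""
        self.err = ""

# Global/singleton instance created at program start.
LOG = Log()

def upload_to_mysql(log):
    # Placeholder: replace with the real MySQL upload.
    print("would upload:", vars(log))

# Registered functions run once, at normal interpreter exit.
atexit.register(upload_to_mysql, LOG)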
EDIT
The idea above can be simplified.
To delay the logging, you can use the standard MemoryHandler handler, described in the page logging.handlers — Logging handlers
Take a look at this GitHub project: https://github.com/tantale/python-ini-cfg-demo
And replace the INI file with this:
[formatters]
keys=default
[formatter_default]
format=%(asctime)s:%(levelname)s:%(message)s
class=logging.Formatter
[handlers]
keys=console, alternate
[handler_console]
class=logging.handlers.MemoryHandler
formatter=default
args=(1024, INFO)
target=alternate
[handler_alternate]
class=logging.StreamHandler
formatter=default
args=()
[loggers]
keys=root
[logger_root]
level=DEBUG
formatter=default
handlers=console
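Loading such a configuration is then a one-liner; assuming the file above is saved as logging.ini next to the script:

import logging
import logging.config

logging.config.fileConfig("logging.ini")

logger = logging.getLogger(__name__)
logger.info("routed through the MemoryHandler to the alternate StreamHandler")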
To log to a database table, just replace the alternate handler with your own database handler (a sketch follows below).
There are some blog posts/SO questions about that:
You can look at Logging Exceptions To Your SQLAlchemy Database to create a SQLAlchemyHandler.
See Store Django log to database if you are using Django.
EDIT2
Note: ORMs generally support "eager loading", for instance with SQLAlchemy.
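For the database handler mentioned above, a sketch (with a hypothetical write_to_db callable standing in for your SQLAlchemy/Django layer) could look like this; point the MemoryHandler's target at it instead of the StreamHandler:

import logging

class DatabaseHandler(logging.Handler):
    """Sketch: hand each record to a database layer supplied by the caller."""

    def __init__(self, write_to_db, level=logging.NOTSET):
        super().__init__(level)
        self.write_to_db = write_to_db  # hypothetical callable, e.g. an ORM wrapper

    def emit(self, record):
        try:
            self.write_to_db(level=record.levelname,
                             logger=record.name,
                             message=self.format(record))
        except Exception:
            self.handleError(record)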
What I want:
To use the logging library instead of print statements, everywhere. Sometimes it is nice not to terminate with a newline. Consider this simplified example:
for file in files:
    print('Loading {}'.format(file), end='', flush=True)
    try:
        data = load(file)
        print('\rLoaded {}'.format(file))
    except:
        print('\rFailed loading {}'.format(file))
The obvious way would be to use:
handler = logging.StreamHandler()
handler.terminator = ""
However, I do not want to add a handler to my library, and I do want the default behaviour of my main logger to be to terminate with a new line. Terminating with "" feels like it should be the exception, rather than the rule.
Is there a way that I could do something like:
logger.info(msg, terminator="")
without having to create a lot of subclasses to the logging module?
Is my take on the problem reasonable, or is there a better way of handling this?
I had a similar issue, and this is what I use to get the results I wanted; it seems to be similar to what you are trying to achieve:
import logging

def getLogger(name, fmt="[%(asctime)s]%(name)s<%(levelname)s>%(message)s",
              terminator='\n'):
    logger = logging.getLogger(name)
    cHandle = logging.StreamHandler()
    cHandle.terminator = terminator
    cHandle.setFormatter(logging.Formatter(fmt=fmt, datefmt="%H:%M:%S"))
    logger.addHandler(cHandle)
    return logger

logger = getLogger(r'\n', terminator='\n')
rlogger = getLogger(r'\r', terminator='\r')
logger.setLevel(logging.DEBUG)
rlogger.setLevel(logging.DEBUG)

logger.info('test0')
logger.info('test1')
logger.info('-----------------------\n')
rlogger.info('test2')
rlogger.info('test3\n\n')

for i in range(100000):
    rlogger.info("%d/%d", i + 1, 100000)
rlogger.info('\n')
Results:
[14:48:00]\n<INFO>test0
[14:48:00]\n<INFO>test1
[14:48:00]\n<INFO>-----------------------
[14:48:00]\r<INFO>test3
[14:48:04]\r<INFO>100000/100000
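If you want something closer to the logger.info(msg, terminator="") call from the question, another option is a small StreamHandler subclass that honours a per-record terminator passed via extra. This is only a sketch, separate from the answer above:

import logging

class PerRecordTerminatorHandler(logging.StreamHandler):
    """Sketch: use record.terminator (set via extra=...) if present."""

    def emit(self, record):
        saved = self.terminator
        self.terminator = getattr(record, "terminator", saved)
        try:
            super().emit(record)
        finally:
            self.terminator = saved

logger = logging.getLogger("loader")
logger.setLevel(logging.INFO)
logger.addHandler(PerRecordTerminatorHandler())

logger.info("Loading data.csv", extra={"terminator": ""})
logger.info("\rLoaded data.csv")  # falls back to the default '\n'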
Is there some simple way of capturing and making assertions about logged messages with nose?
For example, I'd like to be able to do something like:
cook_eggs()
assert_logged("eggs are ready!")
You can create a custom handler which can check for the message being sent through logging. The BufferingHandler is a perfect match for this job.
You might also want to attach the handler in your test to any logger you are using in your code, such as logging.getLogger('foo').addHandler(...). You could also attach the handler in the setUp and tearDown methods of your test case (see the sketch after the example below).
import logging
import logging.handlers

class AssertingHandler(logging.handlers.BufferingHandler):

    def __init__(self, capacity):
        logging.handlers.BufferingHandler.__init__(self, capacity)

    def assert_logged(self, test_case, msg):
        for record in self.buffer:
            s = self.format(record)
            if s == msg:
                return
        test_case.assertTrue(False, "Failed to find log message: " + msg)

def cook_eggs():
    logging.warn("eggs are ready!")

import unittest

class TestLogging(unittest.TestCase):

    def test(self):
        asserting_handler = AssertingHandler(10)
        logging.getLogger().addHandler(asserting_handler)
        cook_eggs()
        asserting_handler.assert_logged(self, "eggs are ready!")
        logging.getLogger().removeHandler(asserting_handler)

unittest.main()
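The setUp/tearDown variant mentioned above could look like this, reusing the same AssertingHandler and cook_eggs:

class TestLoggingWithFixture(unittest.TestCase):

    def setUp(self):
        self.asserting_handler = AssertingHandler(10)
        logging.getLogger().addHandler(self.asserting_handler)

    def tearDown(self):
        logging.getLogger().removeHandler(self.asserting_handler)

    def test(self):
        cook_eggs()
        self.asserting_handler.assert_logged(self, "eggs are ready!")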
This is what "Mock Objects" are for.
You can use a mock version of logging which will properly buffer the log messages so that you can later make assertions about them.
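A hedged sketch of that idea using unittest.mock; note that patching logging.warn means the message is intercepted rather than actually emitted:

import logging
import unittest
from unittest import mock

def cook_eggs():
    # Same function as in the example above.
    logging.warn("eggs are ready!")

class TestCookEggsWithMock(unittest.TestCase):

    def test_logs_ready_message(self):
        # Replace logging.warn with a mock for the duration of the call,
        # then assert on the recorded calls.
        with mock.patch("logging.warn") as warn:
            cook_eggs()
        warn.assert_any_call("eggs are ready!")

unittest.main()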
Just FWIW: in the datalad project we needed similar functionality, but also to simply swallow the logs (and possibly introspect them). So here came the solution -- the swallow_logs context manager: https://github.com/datalad/datalad/blob/master/datalad/utils.py#L296 (currently at b633c9da46ab9cccde3d4767928d167a91857153). Now in a test we do something like:
def test_swallow_logs():
    lgr = logging.getLogger('datalad')
    with swallow_logs(new_level=9) as cm:
        eq_(cm.out, '')
        lgr.log(8, "very heavy debug")
        eq_(cm.out, '')  # not even visible at level 9
        lgr.log(9, "debug1")
        eq_(cm.out, 'debug1\n')  # visible at level 9
        lgr.info("info")
        eq_(cm.out, 'debug1\ninfo\n')
So right now we have a lot of Python scripts and we are trying to consolidate them and fix any redundancies. One of the things we are trying to do is to ensure that all sys.stdout/sys.stderr output goes through the Python logging module.
Now the main thing is, we want the following printed out:
[<ERROR LEVEL>] | <TIME> | <WHERE> | <MSG>
Now pretty much all of the sys.stdout/sys.stderr messages in our Python scripts are in the format [LEVEL] - MSG, all written using sys.stdout/sys.stderr. I can parse these fine in my sys.stdout wrapper and in the sys.stderr wrapper, and then call the corresponding logging level depending on the parsed input.
So basically we have a package called foo, and a subpackage called log. In __init__.py we define the following:
def initLogging(default_level=logging.INFO, stdout_wrapper=None,
                stderr_wrapper=None):
    """
    Initialize the default logging subsystem
    """
    root_logger = logging.getLogger('')
    strm_out = logging.StreamHandler(sys.__stdout__)
    strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
                                            DEFAULT_LOG_TIME_FORMAT))
    root_logger.setLevel(default_level)
    root_logger.addHandler(strm_out)

    console_logger = logging.getLogger(LOGGER_CONSOLE)
    strm_out = logging.StreamHandler(sys.__stdout__)
    #strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
    #                                        DEFAULT_LOG_TIME_FORMAT))
    console_logger.setLevel(logging.INFO)
    console_logger.addHandler(strm_out)

    if stdout_wrapper:
        sys.stdout = stdout_wrapper
    if stderr_wrapper:
        sys.stderr = stderr_wrapper
def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    msg = msg.rstrip('\n').lstrip('\n')
    p_level = r'^(\s+)?\[(?P<LEVEL>\w+)\](\s+)?(?P<MSG>.*)$'
    m = re.match(p_level, msg)
    if m:
        msg = m.group('MSG')
        if m.group('LEVEL') in ('WARNING',):
            logy.warning(msg)
            return
        elif m.group('LEVEL') in ('ERROR',):
            logy.error(msg)
            return
    if is_stderr:
        logy.error(msg)
    else:
        logy.info(msg)
class StdOutWrapper:
    """
    Call wrapper for stdout
    """
    def write(self, s):
        cleanMsg(s, False)

class StdErrWrapper:
    """
    Call wrapper for stderr
    """
    def write(self, s):
        cleanMsg(s, True)
Now we would call this in one of our scripts for example:
import foo.log
foo.log.initLogging(20, foo.log.StdOutWrapper(), foo.log.StdErrWrapper())
sys.stdout.write('[ERROR] Foobar blew')
Which would be converted into an error log message. Like:
[ERROR] | 20090610 083215 | __init__.py | Foobar Blew
Now the problem is that when we do that, the module where the error message was logged is reported as __init__ (corresponding to the foo.log.__init__.py file), which defeats the whole purpose.
I tried doing a deep copy/shallow copy of the stderr/stdout objects, but that does nothing; it still says the module the message occurred in is __init__.py. How can I make it so this doesn't happen?
The problem is that the logging module looks a single layer up the call stack to find who called it, but now your function is an intermediate layer at that point (though I'd have expected it to report cleanMsg, not __init__, as that's where you're calling into the logger). Instead, you need it to go up two levels, or else pass who your caller is into the logged message. You can do the latter by inspecting the stack frames yourself, grabbing the calling function and inserting it into the message.
To find your calling frame, you can use sys._getframe (note that in Python 3, inspect.currentframe() no longer accepts a depth argument):
import sys
f = sys._getframe(N)
will look up N frames and return the frame object, i.e. your immediate caller is sys._getframe(1), but you may have to go another frame up if this is the stdout.write method.
Once you have the calling frame, you can get the executing code object, and look at the file and function name associated with it. eg:
code = f.f_code
caller = '%s:%s' % (code.co_filename, code.co_name)
You may also need to add some code to handle non-Python code calling into you (i.e. C functions or builtins), as these may lack f_code objects.
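Putting that together, here is a sketch of a cleanMsg variant that injects the real caller into the message (omitting the [LEVEL] parsing from the question for brevity; it assumes it is always called via the wrappers' write methods):

import logging
import sys

def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    msg = msg.strip('\n')
    if not msg:
        return
    # Frame 0 is cleanMsg, frame 1 is the wrapper's write(), frame 2 is the
    # code that actually wrote to stdout/stderr.
    code = sys._getframe(2).f_code
    caller = '%s:%s' % (code.co_filename, code.co_name)
    if is_stderr:
        logy.error('%s | %s', caller, msg)
    else:
        logy.info('%s | %s', caller, msg)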
Alternatively, following up mikej's answer, you could use the same approach in a custom Logger class inheriting from logging.Logger that overrides findCaller to navigate several frames up, rather than one.
I think the problem is that your actual log messages are now being created by the logy.error and logy.info calls in cleanMsg; hence that method is the source of the log messages, and you are seeing it reported as __init__.py.
If you look in the source of Python's lib/logging/__init__.py you will see a method defined called findCaller which is what the logging module uses to derive the caller of a logging request.
Perhaps you can override this on your logging object to customise the behaviour?
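To make that concrete, here is a hedged sketch of such an override. It assumes the subclass lives in the same foo/log/__init__.py module as the wrappers (so __file__ identifies the frames to skip), mirrors the loop in the stdlib's own findCaller, and must be installed with setLoggerClass before the loggers are created:

import io
import logging
import os
import sys
import traceback

# Files whose frames should not be reported as the caller: the logging module
# itself and this wrapper module. logging._srcfile is the private attribute
# the stdlib uses for the same purpose.
_LOGGING_SRCFILE = getattr(logging, "_srcfile", os.path.normcase(logging.__file__))
_WRAPPER_SRCFILE = os.path.normcase(os.path.abspath(__file__))

class WrapperAwareLogger(logging.Logger):

    # The signature accepts stacklevel so it works on Python 3.7 and 3.8+.
    def findCaller(self, stack_info=False, stacklevel=1):
        f = sys._getframe(1)
        rv = "(unknown file)", 0, "(unknown function)", None
        while hasattr(f, "f_code"):
            co = f.f_code
            filename = os.path.normcase(co.co_filename)
            if filename in (_LOGGING_SRCFILE, _WRAPPER_SRCFILE):
                f = f.f_back
                continue
            sinfo = None
            if stack_info:
                sio = io.StringIO()
                sio.write("Stack (most recent call last):\n")
                traceback.print_stack(f, file=sio)
                sinfo = sio.getvalue().rstrip("\n")
                sio.close()
            rv = (co.co_filename, f.f_lineno, co.co_name, sinfo)
            break
        return rv

# Install before getLogger() creates the loggers used by cleanMsg.
logging.setLoggerClass(WrapperAwareLogger)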