Redirecting sys.stdout to Python logging

So right now we have a lot of Python scripts and we are trying to consolidate them and fix any redundancies. One of the things we are trying to do is to ensure that all sys.stdout/sys.stderr goes through the Python logging module.
Now the main thing is, we want the following printed out:
[<ERROR LEVEL>] | <TIME> | <WHERE> | <MSG>
Now, pretty much all of the sys.stdout/sys.stderr messages are in the format [LEVEL] - MSG. I can parse that fine in my sys.stdout wrapper and in the sys.stderr wrapper, then call the corresponding logging level depending on the parsed input.
So basically we have a package called foo, and a subpackage called log. In __init__.py we define the following:
import logging
import re
import sys

# Assumed module-level constants (defined elsewhere in foo.log; these
# placeholder values match the sample output shown below).
DEFAULT_LOG_MSG_FORMAT = '[%(levelname)s] | %(asctime)s | %(filename)s | %(message)s'
DEFAULT_LOG_TIME_FORMAT = '%Y%m%d %H%M%S'
LOGGER_CONSOLE = 'console'

def initLogging(default_level=logging.INFO, stdout_wrapper=None,
                stderr_wrapper=None):
    """
    Initialize the default logging sub system
    """
    root_logger = logging.getLogger('')
    strm_out = logging.StreamHandler(sys.__stdout__)
    # The original passed DEFAULT_LOG_TIME_FORMAT twice; the first
    # argument to Formatter should be the message format.
    strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
                                            DEFAULT_LOG_TIME_FORMAT))
    root_logger.setLevel(default_level)
    root_logger.addHandler(strm_out)

    console_logger = logging.getLogger(LOGGER_CONSOLE)
    strm_out = logging.StreamHandler(sys.__stdout__)
    #strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
    #                                        DEFAULT_LOG_TIME_FORMAT))
    console_logger.setLevel(logging.INFO)
    console_logger.addHandler(strm_out)

    if stdout_wrapper:
        sys.stdout = stdout_wrapper
    if stderr_wrapper:
        sys.stderr = stderr_wrapper

def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    msg = msg.strip('\n')
    p_level = r'^(\s+)?\[(?P<LEVEL>\w+)\](\s+)?(?P<MSG>.*)$'
    m = re.match(p_level, msg)
    if m:
        msg = m.group('MSG')
        # The original tested `in ('WARNING')`, which is a substring check
        # against the string 'WARNING', not a tuple membership test.
        if m.group('LEVEL') == 'WARNING':
            logy.warning(msg)
            return
        elif m.group('LEVEL') == 'ERROR':
            logy.error(msg)
            return
    if is_stderr:
        logy.error(msg)
    else:
        logy.info(msg)

class StdOutWrapper:
    """
    Call wrapper for stdout
    """
    def write(self, s):
        cleanMsg(s, False)

class StdErrWrapper:
    """
    Call wrapper for stderr
    """
    def write(self, s):
        cleanMsg(s, True)
Now we would call this in one of our scripts for example:
import foo.log
foo.log.initLogging(20, foo.log.StdOutWrapper(), foo.log.StdErrWrapper())
sys.stdout.write('[ERROR] Foobar blew')
Which would be converted into an error log message. Like:
[ERROR] | 20090610 083215 | __init__.py | Foobar blew
Now the problem is that when we do this, the module where the error message was logged is reported as __init__ (corresponding to the foo/log/__init__.py file), which defeats the whole purpose.
I tried doing a deep copy/shallow copy of the stderr/stdout objects, but that does nothing; it still says the message occurred in __init__.py. How can I make it so this doesn't happen?

The problem is that the logging module looks a single layer up the call stack to find who called it, but your function is now an intermediate layer at that point (though I'd have expected it to report cleanMsg, not __init__, as that's where you're calling into the logger). Instead, you need it to go up two levels, or else pass who your caller is into the logged message. You can do this by inspecting up the stack frame yourself, grabbing the calling function, and inserting it into the message.
To find your calling frame, you can use the inspect module (note that in Python 3, inspect.currentframe() no longer takes a depth argument; sys._getframe(N) does the same job):
import sys
f = sys._getframe(N)
will look up N frames and return you the frame pointer, i.e. your immediate caller is sys._getframe(1), but you may have to go another frame up if this is the stdout.write method.
Once you have the calling frame, you can get the executing code object and look at the file and function name associated with it, e.g.:
code = f.f_code
caller = '%s:%s' % (code.co_filename, code.co_name)
You may also need to put some code to handle non-python code calling into you (ie. C functions or builtins), as these may lack f_code objects.
Alternatively, following up mikej's answer, you could use the same approach in a custom Logger class inheriting from logging.Logger that overrides findCaller to navigate several frames up, rather than one.
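For instance, here is a minimal sketch of such a subclass, assuming Python 3.8+'s four-value findCaller signature and the private logging._srcfile attribute (the wrapper-file check is the part you would adapt to your foo.log module):
import logging
import os
import sys

# Path of the wrapper module whose frames should be skipped; in the
# question this would be foo/log/__init__.py.
_WRAPPER_FILE = os.path.normcase(os.path.abspath(__file__))

class WrapperAwareLogger(logging.Logger):
    def findCaller(self, stack_info=False, stacklevel=1):
        # Walk up until we leave both the logging package and our wrapper
        # (stack_info is ignored in this sketch).
        f = sys._getframe(1)
        while f is not None:
            filename = os.path.normcase(f.f_code.co_filename)
            if filename in (logging._srcfile, _WRAPPER_FILE):
                f = f.f_back
                continue
            return f.f_code.co_filename, f.f_lineno, f.f_code.co_name, None
        return "(unknown file)", 0, "(unknown function)", None

# Must be installed before the loggers are created.
logging.setLoggerClass(WrapperAwareLogger)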

I think the problem is that your actual log messages are now being created by the logy.error and logy.info calls in cleanMsg, hence that method is the source of the log messages, which is why you are seeing __init__.py reported.
If you look in the source of Python's lib/logging/__init__.py you will see a method defined called findCaller which is what the logging module uses to derive the caller of a logging request.
Perhaps you can override this on your logging object to customise the behaviour?

Related

How can I decorate python logging output?

I use the logging module inside an AWS Lambda with the Python 3.7 runtime.
I would like to perform certain manipulations on log statements before they are flushed to stdout, e.g. wrap the message as JSON and add tracing data, so that they are parseable by a Kibana parser.
I don't want to write my own decorator for that because that won't work for underlying dependencies.
Ideally, it should be something like a configured callback for the logger
so that it would do following work for me:
log_statement = {}
log_statement['message'] = 'this is the message'
log_statement['X-B3-TraceId'] = "76b85f5e32ce7b46"
log_statement['level'] = 'INFO'
sys.stdout.write(json.dumps(log_statement) + '\n')
while still just calling logger.info('this is the message').
How can I do that?
Answering my own question:
I had to use LoggerAdapter that is quite a good fit for the purpose of pre-processing log statements:
import logging

class CustomAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        # Note: json.dumps would be safer here, since this raw string
        # formatting breaks if msg itself contains double quotes.
        log_statement = '{"X-B3-TraceId":"%s", "message":"%s"}' % (self.extra['X-B3-TraceId'], msg) + '\n'
        return log_statement, kwargs
See: https://docs.python.org/3/howto/logging-cookbook.html#using-loggeradapters-to-impart-contextual-information
In general, the next step would be just plugging in the adapter like:
import logging
...
logging.basicConfig(format='%(message)s')
logger = logging.getLogger()
logger.setLevel(LOG_LEVEL)
custom_logger = CustomAdapter(logger, {'X-B3-TraceId': "test"})
...
custom_logger.info("test")
Note: I had to set the format to the message only, because I need the whole statement to be a JSON string. Unfortunately, this loses some predefined log statement parts, e.g. aws_request_id. This is a limitation of LoggerAdapter#process, as it handles only the message part. If anyone has a better approach here, please suggest.
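For what it's worth, here is a hedged sketch of one alternative: a Formatter subclass sees the whole LogRecord in format(), so runtime-injected attributes such as aws_request_id stay reachable (the trace-id plumbing below is illustrative, not from the original setup):
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON document."""
    def __init__(self, trace_id):
        super().__init__()
        self.trace_id = trace_id

    def format(self, record):
        return json.dumps({
            'message': record.getMessage(),
            'level': record.levelname,
            'X-B3-TraceId': self.trace_id,
            # attributes the runtime attached to the record, if any:
            'aws_request_id': getattr(record, 'aws_request_id', None),
        })

# Install on the runtime's pre-configured handlers (same loop as below):
for h in logging.getLogger().handlers:
    h.setFormatter(JsonFormatter("76b85f5e32ce7b46"))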
It appears that the AWS Lambda Python runtime somehow interferes with the logging facility, and changing the format as above did not work, so I additionally had to do this:
FORMAT = "%(message)s"
logger = logging.getLogger()
for h in logger.handlers:
    h.setFormatter(logging.Formatter(FORMAT))
See: https://gist.github.com/niranjv/fb95e716151642e8ca553b0e38dd152e

Logging module: too many open file descriptors

I am using the Python logging module to print logs to a file, but I encountered a "too many open file descriptors" issue. I did remember to close the log file handlers, but the issue was still there.
Below is my code:
class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__handler = logging.FileHandler('/var/log/ddd/ddd.log')
        self.__formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.__handler.setFormatter(self.__formatter)
        #self.__logger.addHandler(self.__handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance == None:
            cls.__instance = LogService()
        return cls.__instance

    # log Error
    def logError(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.ERROR)
        self.__logger.error(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Warning
    def logWarning(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.WARNING)
        self.__logger.warn(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Info
    def logInfo(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.INFO)
        self.__logger.info(msg)
        # Remember to close the file handler
        self.closeHandler()

    def closeHandler(self):
        self.__logger.removeHandler(self.__handler)
        self.__handler.close()
And after running this code for a while, the following showed that there were too many open file descriptors.
[root@my-centos ~]# lsof | grep ddd | wc -l
11555
No no. The usage is far simpler
import logging
logging.basicConfig()
logger = logging.getLogger("mylogger")
logger.info("test")
logger.debug("test")
In your case you are appending the handler in every logging operation, which is at least overkill.
Check the documentation https://docs.python.org/2/library/logging.html
Each time you log anything, you add another instance of the handler.
Yes, you close it every time. But this just means it takes slightly longer to blow up. Closing it doesn't remove it from the logger.
The first message, you have one handler, so you open one file descriptor and then close it.
The next message, you have two handlers, so you open two file descriptors and close them.
The next message, you open three file descriptors and close them.
And so on, until you're opening more file descriptors than you're allowed to, and you get an error.
The solution is simply not to do that.
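For illustration, a minimal sketch of the fixed class, keeping the same names as the question: the handler is attached exactly once, in __init__, and never removed or closed per call.
import logging

class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__logger.setLevel(logging.INFO)
        handler = logging.FileHandler('/var/log/ddd/ddd.log')
        handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        self.__logger.addHandler(handler)  # added once, kept open

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    def logError(self, msg):
        self.__logger.error(msg)

    def logWarning(self, msg):
        self.__logger.warning(msg)

    def logInfo(self, msg):
        self.__logger.info(msg)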

pylint on in-memory file/stream

I'd like to embed pylint in a program. The user enters Python programs (in Qt, in a QTextEdit, although that is not relevant) and in the background I call pylint to check the text he enters. Finally, I print the errors in a message box.
There are thus two questions: First, how can I do this without writing the entered text to a temporary file and handing it to pylint? I suppose at some point pylint (or astroid) handles a stream and not a file any more.
And, more importantly, is it a good idea? Would it cause problems for imports or other stuff? Intuitively I would say no, since it seems to spawn a new process (with epylint), but I'm no Python expert so I'm really not sure. And if I use this to launch pylint, is it okay too?
Edit:
I tried tinkering with pylint's internals, even fought with it, but finally got stuck at some point.
Here is the code so far:
from astroid.builder import AstroidBuilder
from astroid.exceptions import AstroidBuildingException
from logilab.common.interface import implements
from pylint.interfaces import IRawChecker, ITokenChecker, IAstroidChecker
from pylint.lint import PyLinter
from pylint.reporters.text import TextReporter
from pylint.utils import PyLintASTWalker

class Validator():
    def __init__(self):
        self._messagesBuffer = InMemoryMessagesBuffer()
        self._validator = None
        self.initValidator()

    def initValidator(self):
        self._validator = StringPyLinter(reporter=TextReporter(output=self._messagesBuffer))
        self._validator.load_default_plugins()
        self._validator.disable('W0704')
        self._validator.disable('I0020')
        self._validator.disable('I0021')
        self._validator.prepare_import_path([])

    def destroyValidator(self):
        self._validator.cleanup_import_path()

    def check(self, string):
        return self._validator.check(string)

class InMemoryMessagesBuffer():
    def __init__(self):
        self.content = []
    def write(self, st):
        self.content.append(st)
    def messages(self):
        return self.content
    def reset(self):
        self.content = []

class StringPyLinter(PyLinter):
    """Does what PyLinter does but sets checkers once
    and redefines get_astroid to call build_string"""
    def __init__(self, options=(), reporter=None, option_groups=(), pylintrc=None):
        super(StringPyLinter, self).__init__(options, reporter, option_groups, pylintrc)
        self._walker = None
        self._used_checkers = None
        self._tokencheckers = None
        self._rawcheckers = None
        self.initCheckers()

    def __del__(self):
        self.destroyCheckers()

    def initCheckers(self):
        self._walker = PyLintASTWalker(self)
        self._used_checkers = self.prepare_checkers()
        self._tokencheckers = [c for c in self._used_checkers
                               if implements(c, ITokenChecker) and c is not self]
        self._rawcheckers = [c for c in self._used_checkers
                             if implements(c, IRawChecker)]
        # notify global begin
        for checker in self._used_checkers:
            checker.open()
            if implements(checker, IAstroidChecker):
                self._walker.add_checker(checker)

    def destroyCheckers(self):
        self._used_checkers.reverse()
        for checker in self._used_checkers:
            checker.close()

    def check(self, string):
        modname = "in_memory"
        self.set_current_module(modname)
        astroid = self.get_astroid(string, modname)
        self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
        self._add_suppression_messages()
        self.set_current_module('')
        self.stats['statement'] = self._walker.nbstatements

    def get_astroid(self, string, modname):
        """return an astroid representation for a module"""
        try:
            return AstroidBuilder().string_build(string, modname)
        except SyntaxError as ex:
            self.add_message('E0001', line=ex.lineno, args=ex.msg)
        except AstroidBuildingException as ex:
            self.add_message('F0010', args=ex)
        except Exception as ex:
            import traceback
            traceback.print_exc()
            self.add_message('F0002', args=(ex.__class__, ex))

if __name__ == '__main__':
    code = """
a = 1
print(a)
"""
    validator = Validator()
    print(validator.check(code))
The traceback is the following:
Traceback (most recent call last):
File "validator.py", line 16, in <module>
main()
File "validator.py", line 13, in main
print(validator.check(code))
File "validator.py", line 30, in check
self._validator.check(string)
File "validator.py", line 79, in check
self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
File "c:\Python33\lib\site-packages\pylint\lint.py", line 659, in check_astroid_module
tokens = tokenize_module(astroid)
File "c:\Python33\lib\site-packages\pylint\utils.py", line 103, in tokenize_module
print(module.file_stream)
AttributeError: 'NoneType' object has no attribute 'file_stream'
# And sometimes this is added :
File "c:\Python33\lib\site-packages\astroid\scoped_nodes.py", line 251, in file_stream
return open(self.file, 'rb')
OSError: [Errno 22] Invalid argument: '<?>'
I'll continue digging tomorrow. :)
I got it running.
The first one (NoneType …) is really easy and a bug in your code:
Encountering an exception can make get_astroid "fail", i.e. send one syntax error message and return nothing!
But for the second one… such bullshit in pylint's/logilab's API… Let me explain: your astroid object here is of type astroid.scoped_nodes.Module.
It’s also created by a factory, AstroidBuilder, which sets astroid.file = '<?>'.
Unfortunately, the Module class has following property:
@property
def file_stream(self):
    if self.file is not None:
        return open(self.file, 'rb')
    return None
And there’s no way to skip that except for subclassing (Which would render us unable to use the magic in AstroidBuilder), so… monkey patching!
We replace the ill-defined property with one that checks the instance for a reference to our code bytes (e.g. astroid._file_bytes) before falling back to the above default behavior.
from io import BytesIO

def _monkeypatch_module(module_class):
    if module_class.file_stream.fget.__name__ == 'file_stream_patched':
        return  # only patch if patch isn't already applied
    old_file_stream_fget = module_class.file_stream.fget
    def file_stream_patched(self):
        if hasattr(self, '_file_bytes'):
            return BytesIO(self._file_bytes)
        return old_file_stream_fget(self)
    module_class.file_stream = property(file_stream_patched)
That monkeypatching can be called just before calling check_astroid_module. But one more thing has to be done. See, there’s more implicit behavior: Some checkers expect and use astroid’s file_encoding field. So we now have this code in the middle of check:
astroid = self.get_astroid(string, modname)
if astroid is not None:
    _monkeypatch_module(astroid.__class__)
    astroid._file_bytes = string.encode('utf-8')
    astroid.file_encoding = 'utf-8'
    self.check_astroid_module(astroid, self._walker, self._rawcheckers, self._tokencheckers)
One could say that no amount of linting creates actually good code. Unfortunately pylint unites enormous complexity with a specialization in being called on files. Really good code has a nice native API and wraps that with a CLI interface. Don't ask me why file_stream exists if, internally, Module gets built from source code but then forgets it.
PS: I had to change something else in your code: load_default_plugins has to come before some other stuff (maybe prepare_checkers, maybe something else).
PPS: I suggest subclassing BaseReporter and using that instead of your InMemoryMessagesBuffer.
PPPS: this just got pulled (3.2014), and will fix this: https://bitbucket.org/logilab/astroid/pull-request/15/astroidbuilderstring_build-was/diff
4PS: this is now in the official version, so no monkey patching required: astroid.scoped_nodes.Module now has a file_bytes property (without leading underscore).
Working with an unlocatable stream may definitely cause problems in case of relative imports, since the location is then needed to find the actually imported module.
Astroid supports building an AST from a stream, but this is not used/exposed through Pylint, which is a level higher and designed to work with files. So while you may achieve this, it will need a bit of digging into the low-level APIs.
The easiest way is definitely to save the buffer to a file and then use the SO answer to start pylint programmatically if you wish (totally forgot this other account of mine found in other responses ;). Another option is to write a custom reporter to gain more control.
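For reference, a hedged sketch of that temp-file route, assuming the pylint 1.x/2.x API (epylint.py_run was removed in pylint 3, and the helper name here is illustrative):
import os
import tempfile
from pylint import epylint as lint

def check_source(source):
    # pylint wants a real file on disk, so round-trip through a temp file
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # return_std=True captures pylint's output as StringIO objects
        pylint_stdout, pylint_stderr = lint.py_run(path, return_std=True)
        return pylint_stdout.getvalue()
    finally:
        os.unlink(path)

print(check_source("a = 1\nprint(a)\n"))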

Read from a log file as it's being written using python

I'm trying to find a nice way to read a log file in real time using python. I'd like to process lines from a log file one at a time as it is written. Somehow I need to keep trying to read the file until it is created and then continue to process lines until I terminate the process. Is there an appropriate way to do this? Thanks.
Take a look at this PDF starting at page 38, ~slide I-77 and you'll find all the info you need. Of course the rest of the slides are amazing, too, but those specifically deal with your issue:
import time

def follow(thefile):
    thefile.seek(0, 2)  # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line
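For reference, a hypothetical use of that generator (the file name is illustrative); it blocks forever, yielding lines as they are appended:
logfile = open("app.log")
for line in follow(logfile):
    print(line, end="")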
You could try with something like this:
import time

# 'file' here is an already-opened file object (Python 2 style code)
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line,  # already has newline
Example was extracted from here.
As this is tagged Python and logging, there is another possibility to do this.
I assume this is based on a Python logger, i.e. logging.Handler based.
You can just create a class that gets the (named) logger instance and override the emit function to put records onto a GUI (if you also need console output, just add a console handler alongside the file handler).
Example:
import logging

class log_viewer(logging.Handler):
    """ Class to redistribute python logging data """

    # have a class member to store the existing logger
    logger_instance = logging.getLogger("SomeNameOfYourExistingLogger")

    def __init__(self, *args, **kwargs):
        # Initialize the Handler
        logging.Handler.__init__(self, *args)

        # optional take format
        # setFormatter function is derived from logging.Handler
        for key, value in kwargs.items():
            if "{}".format(key) == "format":
                self.setFormatter(value)

        # make the logger send data to this class
        self.logger_instance.addHandler(self)

    def emit(self, record):
        """ Overload of logging.Handler method """
        record = self.format(record)
        # ---------------------------------------
        # Now you can send it to a GUI or similar
        # "Do work" starts here.
        # ---------------------------------------
        # just as an example what e.g. a console
        # handler would do:
        print(record)
I am currently using similar code to add a TkinterTreectrl.Multilistbox for viewing logger output at runtime.
Aside: the logger only gets data from the moment it is initialized, so if you want to have all your data available, you need to initialize it at the very beginning. (I know this is what is expected, but I think it is worth mentioning.)
Maybe you could do a system call to tail -f using os.system().
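If you go that route, a subprocess-based sketch gives you the lines back in Python instead of blocking inside os.system() (Unix-only; the log path is illustrative):
import subprocess

# Spawn tail -f and consume its output line by line.
proc = subprocess.Popen(["tail", "-f", "/var/log/example.log"],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line, end="")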

How do you subclass the file type in Python?

I'm trying to subclass the built-in file class in Python to add some extra features to stdin and stdout. Here's the code I have so far:
import datetime
import sys

class TeeWithTimestamp(file):
    """
    Class used to tee the output of a stream (such as stdout or stderr) into
    another stream, and to add a timestamp to each message printed.
    """
    def __init__(self, file1, file2):
        """Initializes the TeeWithTimestamp"""
        self.file1 = file1
        self.file2 = file2
        self.at_start_of_line = True

    def write(self, text):
        """Writes text to both files, prefixed with a timestamp"""
        if len(text):
            # Add timestamp if at the start of a line; also add [STDERR]
            # for stderr
            if self.at_start_of_line:
                now = datetime.datetime.now()
                prefix = now.strftime('[%H:%M:%S] ')
                if self.file1 == sys.__stderr__:
                    prefix += '[STDERR] '
                text = prefix + text
            self.file1.write(text)
            self.file2.write(text)
            self.at_start_of_line = (text[-1] == '\n')
The purpose is to add a timestamp to the beginning of each message, and to log everything to a log file. However, the problem I run into is that if I do this:
# log_file has already been opened
sys.stdout = TeeWithTimestamp(sys.stdout, log_file)
Then when I try to do print 'foo', I get a ValueError: I/O operation on closed file. I can't meaningfully call file.__init__() in my __init__(), since I don't want to open a new file, and I can't assign self.closed = False either, since it's a read-only attribute.
How can I modify this so that I can do print 'foo', and so that it supports all of the standard file attributes and methods?
Calling file.__init__ is quite feasible (e.g., on '/dev/null'), but it's no real use, because your attempted override of write doesn't "take" for the purposes of print statements: print internally calls the real file.write when it sees that sys.stdout is an actual instance of file (and by inheriting you've made it so).
print doesn't really need any other method except write, so making your class inherit from object instead of file will work.
If you need other file methods (i.e., print is not all you're doing), you're best advised to implement them yourself.
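Here is a minimal sketch of that advice, assuming Python 3 (the class and file names are illustrative): inherit from object, keep the line-start bookkeeping in write(), and delegate everything else to the wrapped stream.
import datetime
import sys

class TimestampTee(object):
    def __init__(self, stream, logfile):
        self.stream = stream
        self.logfile = logfile
        self.at_start_of_line = True

    def write(self, text):
        if text:
            if self.at_start_of_line:
                text = datetime.datetime.now().strftime('[%H:%M:%S] ') + text
            self.stream.write(text)
            self.logfile.write(text)
            self.at_start_of_line = text.endswith('\n')

    def __getattr__(self, name):
        # flush(), fileno(), encoding, ... fall through to the real stream
        return getattr(self.stream, name)

sys.stdout = TimestampTee(sys.stdout, open('run.log', 'a'))
print('foo')  # both destinations receive "[HH:MM:SS] foo"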
You can also avoid using super:
class SuperFile(file):
    def __init__(self, *args, **kwargs):
        file.__init__(self, *args, **kwargs)
You'll be able to write with it.
