Read from a log file as it's being written using python

I'm trying to find a nice way to read a log file in real time using python. I'd like to process lines from a log file one at a time as it is written. Somehow I need to keep trying to read the file until it is created and then continue to process lines until I terminate the process. Is there an appropriate way to do this? Thanks.

Take a look at this PDF starting at page 38, ~slide I-77 and you'll find all the info you need. Of course the rest of the slides are amazing, too, but those specifically deal with your issue:
import time

def follow(thefile):
    thefile.seek(0, 2)  # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line
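Since the question also asks how to keep trying until the file exists, here is a minimal usage sketch of follow() (the file name and the creation-polling loop are my additions, not from the slides):

import os
import time

logfile_path = "app.log"  # hypothetical log file name

# keep trying until the file has been created
while not os.path.exists(logfile_path):
    time.sleep(0.1)

with open(logfile_path) as logfile:
    for line in follow(logfile):
        # process each line as it is written; printing is a placeholder
        print(line, end='')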

You could try with something like this:
import time

file = open("log.txt")  # assumed: the log file being followed
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line,  # already has newline

As this is tagged with both python and logging, there is another possibility to do this.
I assume the log is written by a Python logger, i.e. through a logging.Handler.
You can just create a class that gets the (named) logger instance and overrides the emit function to put each record onto a GUI (if you also need console output, just add a console handler alongside the file handler).
Example:
import logging

class log_viewer(logging.Handler):
    """ Class to redistribute python logging data """

    # have a class member to store the existing logger
    logger_instance = logging.getLogger("SomeNameOfYourExistingLogger")

    def __init__(self, *args, **kwargs):
        # Initialize the Handler
        logging.Handler.__init__(self, *args)

        # optionally take a format; the setFormatter function
        # is inherited from logging.Handler
        for key, value in kwargs.items():
            if key == "format":
                self.setFormatter(value)

        # make the logger send data to this class
        self.logger_instance.addHandler(self)

    def emit(self, record):
        """ Overload of logging.Handler method """
        record = self.format(record)
        # ---------------------------------------
        # Now you can send it to a GUI or similar;
        # "do work" starts here.
        # ---------------------------------------
        # just as an example of what e.g. a console
        # handler would do:
        print(record)
I am currently using similar code to add a TkinterTreectrl.Multilistbox for viewing logger output at runtime.
Side note: the logger only gets data once it is initialized, so if you want all of your data available, you need to initialize it at the very beginning. (I know this is what is expected, but I think it is worth mentioning.)
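For completeness, a minimal usage sketch (the logger name matches the class member above; the format string and message are made up):

import logging

# instantiating the handler attaches it to the named logger
viewer = log_viewer(format=logging.Formatter("%(levelname)s: %(message)s"))

log = logging.getLogger("SomeNameOfYourExistingLogger")
log.setLevel(logging.INFO)
log.info("this record ends up in log_viewer.emit()")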

Maybe you could do a system call to tail -f using os.system().
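Note that os.system() would block and give your script no access to the lines; a sketch using subprocess (a different standard-library call than the one named above) lets you consume tail's output in Python:

import subprocess

# spawn `tail -f` and read its stdout line by line
proc = subprocess.Popen(["tail", "-f", "/var/log/system.log"],
                        stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, b''):
    print(line.decode().rstrip())  # process each new log line here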


Python File Handler with Rotating Content of File

I have written a simple logging program that appends anything I send to it to a file:
def log(message):
    with open("log.txt", 'a+') as f:
        f.write(message + "\n")
However, I would like to limit how big this file gets. When it reaches the maximum size, I would like it to remove the first lines and append new ones at the bottom.
Is this possible with a file handler, or do I need to code it myself? I am also fine with using a rotating file handler, but all the examples I have seen have the environment write exceptions to the log automatically once a level is set, and I need to control what is written to the file.
Many thanks in advance!
This is an example of using Python's built-in RotatingFileHandler:
import logging
from logging.handlers import RotatingFileHandler

# change to a file you want to log to
logFile = 'log_r.log'

my_handler = RotatingFileHandler(logFile, mode='a', maxBytes=5*1024*1024,
                                 backupCount=2, encoding=None, delay=0)
my_handler.setLevel(logging.INFO)

app_log = logging.getLogger('root')
app_log.setLevel(logging.INFO)
app_log.addHandler(my_handler)

def bad():
    raise Exception("Something bad")

if __name__ == "__main__":
    app_log.info("something")
    try:
        app_log.info("trying to run bad")
        bad()
    except Exception as e:
        app_log.info("That was bad...")
    finally:
        app_log.info("Ran bad...")
The behaviour is slightly different from your proposed behaviour: rather than deleting from the start of the file, it moves the full file to a different filename (log_r.log.1, log_r.log.2, up to backupCount) and starts from scratch.
Note that the only things that show in the log file when you run this are the pieces of text we're logging explicitly - i.e. no system junk you don't want.
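If you do want timestamps and levels on each line, you can attach a standard Formatter to the handler from the example above (the format string here is one common choice, not part of the original answer):

# do this before adding my_handler to the logger
my_handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s %(message)s'))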

How can I decorate python logging output?

I use the logging module for logging inside an AWS Lambda with the Python 3.7 runtime.
I would like to perform certain manipulations on log statements before they are flushed to stdout, e.g. wrap the message as json and add tracing data, so that they would be parseable by Kibana parser.
I don't want to write my own decorator for that because that won't work for underlying dependencies.
Ideally, it should be something like a configured callback for the logger, so that it would do the following work for me:
log_statement = {}
log_statement['message'] = 'this is the message'
log_statement['X-B3-TraceId'] = "76b85f5e32ce7b46"
log_statement['level'] = 'INFO'
sys.stdout.write(json.dumps(log_statement) + '\n')
while still just calling logger.info('this is the message').
How can I do that?
Answering my own question:
I had to use LoggerAdapter, which is quite a good fit for the purpose of pre-processing log statements:
import logging

class CustomAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        log_statement = '{"X-B3-TraceId":"%s", "message":"%s"}' % (self.extra['X-B3-TraceId'], msg) + '\n'
        return log_statement, kwargs
See: https://docs.python.org/3/howto/logging-cookbook.html#using-loggeradapters-to-impart-contextual-information
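As an aside, building the JSON by string interpolation breaks as soon as a message contains a quote; a json.dumps-based variant (my tweak, not part of the original answer) avoids that:

import json
import logging

class CustomAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        # json.dumps handles quoting and escaping of the message for us
        log_statement = json.dumps({
            'X-B3-TraceId': self.extra['X-B3-TraceId'],
            'message': msg,
        })
        return log_statement, kwargs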
In general, the next step would be just plugging in the adapter like:
import logging
...
logging.basicConfig(format='%(message)s')
logger = logging.getLogger()
logger.setLevel(LOG_LEVEL)
custom_logger = CustomAdapter(logger, {'X-B3-TraceId': "test"})
...
custom_logger.info("test")
Note: I had to set the format to the message only, because I need the whole statement to be a JSON string. Unfortunately, this means I lost some predefined log statement parts, e.g. aws_request_id. This is a limitation of LoggerAdapter#process, as it handles only the message part. If anyone has a better approach here, please suggest.
It appears that the AWS Lambda Python runtime somehow interferes with the logging facility, and changing the format as above did not work, so I additionally had to do this:
FORMAT = "%(message)s"
logger = logging.getLogger()
for h in logger.handlers:
    h.setFormatter(logging.Formatter(FORMAT))
See: https://gist.github.com/niranjv/fb95e716151642e8ca553b0e38dd152e
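On the "better approach" request above: one option (my sketch, not from the thread) is to move the JSON rendering into a Formatter, which sees the whole LogRecord rather than just the message, so fields like the level name survive:

import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_statement = {
            'message': record.getMessage(),
            'level': record.levelname,
            # trace id passed per call via `extra`; field name is made up
            'X-B3-TraceId': getattr(record, 'trace_id', 'unknown'),
        }
        # the Lambda runtime typically injects aws_request_id into records
        if hasattr(record, 'aws_request_id'):
            log_statement['aws_request_id'] = record.aws_request_id
        return json.dumps(log_statement)

# assumes a handler is already attached (as in the Lambda runtime);
# call logging.basicConfig() first when running locally
logger = logging.getLogger()
logger.setLevel(logging.INFO)
for h in logger.handlers:
    h.setFormatter(JsonFormatter())

logger.info("test", extra={'trace_id': "76b85f5e32ce7b46"})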

Logging module: too many open file descriptors

I am using the Python logging module to print logs to a file, but I ran into the issue of "too many open file descriptors". I did remember to close the log file handlers, but the issue was still there.
Below is my code
import logging

class LogService(object):

    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__handler = logging.FileHandler('/var/log/ddd/ddd.log')
        self.__formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.__handler.setFormatter(self.__formatter)
        #self.__logger.addHandler(self.__handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance == None:
            cls.__instance = LogService()
        return cls.__instance

    # log Error
    def logError(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.ERROR)
        self.__logger.error(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Warning
    def logWarning(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.WARNING)
        self.__logger.warn(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Info
    def logInfo(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.INFO)
        self.__logger.info(msg)
        # Remember to close the file handler
        self.closeHandler()

    def closeHandler(self):
        self.__logger.removeHandler(self.__handler)
        self.__handler.close()
And after running this code for a while, the following showed that there were too many open file descriptors.
[root@my-centos ~]# lsof | grep ddd | wc -l
11555
No no. The usage is far simpler:
import logging
logging.basicConfig()
logger = logging.getLogger("mylogger")
logger.info("test")
logger.debug("test")
In your case, you are adding another handler on every logging operation, which is overkill at the very least.
Check the documentation https://docs.python.org/2/library/logging.html
Each time you log anything, you add another instance of the handler.
Yes, you close it every time. But this just means it takes slightly longer to blow up. Closing it doesn't remove it from the logger.
The first message, you have one handler, so you open one file descriptor and then close it.
The next message, you have two handlers, so you open two file descriptors and close them.
The next message, you open three file descriptors and close them.
And so on, until you're opening more file descriptors than you're allowed to, and you get an error.
The solution is just to not do that: add the handler once, up front, and log without adding and removing it each time.
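A minimal sketch of that fix, keeping the shape of the class from the question (same names and paths), with the handler attached exactly once:

import logging

class LogService(object):

    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__logger.setLevel(logging.INFO)
        handler = logging.FileHandler('/var/log/ddd/ddd.log')
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        self.__logger.addHandler(handler)  # one handler, one descriptor

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    def logError(self, msg):
        self.__logger.error(msg)

    def logWarning(self, msg):
        self.__logger.warning(msg)

    def logInfo(self, msg):
        self.__logger.info(msg)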

python monitor a log file non blocking

I have a test stub that will write several log messages to the system log.
But, this system log gets updated by many other applications as well. So, basically, I want to do a tail -f system.log | grep "application name" to get only the appropriate log messages.
I was looking at dabeaz's generator tricks, and I am trying to combine both http://www.dabeaz.com/generators/follow.py and http://www.dabeaz.com/generators/apachelog.py
So, in my __main__(), I have something like this:
try:
    dosomeprocessing()  # outputs stuff to the log file
And within dosomeprocessing(), I run a loop; on each iteration, I want to see if there are any new log messages caused by my application, and not necessarily print them out, but store them somewhere for validation.
logfile = open("/var/adm/messages", "r")
loglines = follow(logfile)
logpats = r'I2G(JV)'
logpat = re.compile(logpats)

groups = (logpat.match(line) for line in loglines)
for g in groups:
    if g:
        print g.groups()
The log looks something like:
Feb 4 12:55:27 Someprocessname.py I2G(JV)-300[20448]: [ID 702911 local2.error] [MSG-70047] xxxxxxxxxxxxxxxxxxxxxxx
Feb 4 12:55:27 Someprocessname.py I2G(JV)-300[20448]: [ID 702911 local2.error] [MSG-70055] xxxxxxxxxxxxxxxxxxxxxxx
in addition to a lot of other gobbledygook.
Right now, it gets stuck in the for g in groups: loop.
I am relatively new to Python and asynchronous programming. Ideally, I would like the tail to run in parallel with the main process and to read new data on each loop.
Please let me know if I need to add more information.
I suggest you use either watchdog or pyinotify to monitor changes to your log file.
Also, I would suggest remembering the last position you read from. After you get an IN_MODIFY notification, you can read from the last position to the end of the file and apply your loop again. Also, reset the last position to 0 when it is bigger than the size of the file, in case the file was truncated.
Here is an example:
import pyinotify
import re
import os

wm = pyinotify.WatchManager()
mask = pyinotify.IN_MODIFY

class EventHandler(pyinotify.ProcessEvent):
    def __init__(self, file_path, *args, **kwargs):
        super(EventHandler, self).__init__(*args, **kwargs)
        self.file_path = file_path
        self._last_position = 0
        logpats = r'I2G\(JV\)'
        self._logpat = re.compile(logpats)

    def process_IN_MODIFY(self, event):
        print "File changed: ", event.pathname
        # reset if the file was truncated
        if self._last_position > os.path.getsize(self.file_path):
            self._last_position = 0
        with open(self.file_path) as f:
            f.seek(self._last_position)
            loglines = f.readlines()
            self._last_position = f.tell()
            groups = (self._logpat.search(line.strip()) for line in loglines)
            for g in groups:
                if g:
                    print g.string

handler = EventHandler('some_log.log')
notifier = pyinotify.Notifier(wm, handler)
wm.add_watch(handler.file_path, mask)
notifier.loop()
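For completeness, since the answer mentions watchdog too, here is a rough equivalent with that library (same made-up file name and pattern as above; Python 3 syntax):

import os
import re
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class LogHandler(FileSystemEventHandler):
    def __init__(self, file_path):
        super().__init__()
        self.file_path = file_path
        self._last_position = 0
        self._logpat = re.compile(r'I2G\(JV\)')

    def on_modified(self, event):
        if event.src_path != self.file_path:
            return
        # reset if the file was truncated
        if self._last_position > os.path.getsize(self.file_path):
            self._last_position = 0
        with open(self.file_path) as f:
            f.seek(self._last_position)
            lines = f.readlines()
            self._last_position = f.tell()
        for line in lines:
            if self._logpat.search(line):
                print(line.strip())

handler = LogHandler(os.path.abspath('some_log.log'))
observer = Observer()
# watchdog watches the containing directory, not the file itself
observer.schedule(handler, os.path.dirname(handler.file_path) or '.')
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()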

redirecting sys.stdout to python logging

So right now we have a lot of Python scripts, and we are trying to consolidate them and fix any redundancies. One of the things we are trying to do is to ensure that all sys.stdout/sys.stderr output goes through the Python logging module.
Now the main thing is, we want the following printed out:
[<ERROR LEVEL>] | <TIME> | <WHERE> | <MSG>
Now, pretty much all of our sys.stdout/sys.stderr messages are in the format [LEVEL] - MSG. I can parse that fine in my sys.stdout wrapper and in my sys.stderr wrapper, then call the corresponding logging level depending on the parsed input.
So basically we have a package called foo, and a subpackage called log. In __init__.py we define the following:
import logging
import re
import sys

# DEFAULT_LOG_MSG_FORMAT, DEFAULT_LOG_TIME_FORMAT and LOGGER_CONSOLE
# are constants defined elsewhere in the package

def initLogging(default_level=logging.INFO, stdout_wrapper=None,
                stderr_wrapper=None):
    """
    Initialize the default logging sub system
    """
    root_logger = logging.getLogger('')
    strm_out = logging.StreamHandler(sys.__stdout__)
    # message format first, then date format
    strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
                                            DEFAULT_LOG_TIME_FORMAT))
    root_logger.setLevel(default_level)
    root_logger.addHandler(strm_out)

    console_logger = logging.getLogger(LOGGER_CONSOLE)
    strm_out = logging.StreamHandler(sys.__stdout__)
    #strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
    #                                        DEFAULT_LOG_TIME_FORMAT))
    console_logger.setLevel(logging.INFO)
    console_logger.addHandler(strm_out)

    if stdout_wrapper:
        sys.stdout = stdout_wrapper
    if stderr_wrapper:
        sys.stderr = stderr_wrapper


def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    msg = msg.rstrip('\n').lstrip('\n')
    p_level = r'^(\s+)?\[(?P<LEVEL>\w+)\](\s+)?(?P<MSG>.*)$'
    m = re.match(p_level, msg)
    if m:
        msg = m.group('MSG')
        # note the trailing commas: these must be tuples, not strings
        if m.group('LEVEL') in ('WARNING',):
            logy.warning(msg)
            return
        elif m.group('LEVEL') in ('ERROR',):
            logy.error(msg)
            return
    if is_stderr:
        logy.error(msg)
    else:
        logy.info(msg)


class StdOutWrapper:
    """
    Call wrapper for stdout
    """
    def write(self, s):
        cleanMsg(s, False)


class StdErrWrapper:
    """
    Call wrapper for stderr
    """
    def write(self, s):
        cleanMsg(s, True)
Now we would call this in one of our scripts for example:
import foo.log
foo.log.initLogging(20, foo.log.StdOutWrapper(), foo.log.StdErrWrapper())
sys.stdout.write('[ERROR] Foobar blew')
Which would be converted into an error log message. Like:
[ERROR] | 20090610 083215 | __init__.py | Foobar Blew
Now the problem is that when we do this, the module where the error message was logged is reported as __init__ (corresponding to the foo.log.__init__.py file), which defeats the whole purpose.
I tried doing a deep/shallow copy of the stderr/stdout objects, but that does nothing; it still says the message occurred in __init__.py. How can I make it so this doesn't happen?
The problem is that the logging module looks a single layer up the call stack to find who called it, but now your function is an intermediate layer at that point. (Though I'd have expected it to report cleanMsg, not __init__, as that's where you're calling into the logger.) Instead, you need it to go up two levels, or else pass your caller into the logged message. You can do this by inspecting up the stack frame yourself, grabbing the calling function, and inserting it into the message.
To find your calling frame, you can use the inspect module:
import inspect
f = inspect.currentframe(N)
will look up N frames and return you the frame pointer; i.e. your immediate caller is currentframe(1), but you may have to go another frame up if the caller is the stdout.write method. (Note that passing a depth argument only works on Python 2, where inspect.currentframe is an alias for sys._getframe; on Python 3 it takes no argument, so call sys._getframe(N) directly.)
Once you have the calling frame, you can get the executing code object and look at the file and function name associated with it, e.g.:
code = f.f_code
caller = '%s:%s' % (code.co_filename, code.co_name)
You may also need some code to handle non-Python code calling into you (i.e. C functions or builtins), as these frames may lack f_code objects.
Alternatively, following up mikej's answer, you could use the same approach in a custom Logger class inheriting from logging.Logger that overrides findCaller to navigate several frames up, rather than one.
I think the problem is that your actual log messages are now being created by the logy.error and logy.info calls in cleanMsg, so that method is the source of the log messages, and you are seeing it reported as __init__.py.
If you look in the source of Python's lib/logging/__init__.py, you will see a method defined called findCaller, which is what the logging module uses to derive the caller of a logging request.
Perhaps you could override this on your logging object to customise the behaviour?
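On modern Python (3.8+) there is a simpler route than overriding findCaller: the logging methods accept a stacklevel argument telling findCaller how many extra frames to skip. A sketch of how cleanMsg could use it (the exact depth depends on your call chain, so treat the value as a starting point):

import logging

def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    # stacklevel=3 skips cleanMsg and the write() wrapper, so %(module)s
    # and %(funcName)s report the code that actually printed the message
    logy.info(msg, stacklevel=3)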
