Logging module: too many open file descriptors - python

I am using the Python logging module to write logs to a file, but I ran into a "too many open file descriptors" error. I did remember to close the log file handlers, but the issue was still there.
Below is my code:
class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__handler = logging.FileHandler('/var/log/ddd/ddd.log')
        self.__formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        self.__handler.setFormatter(self.__formatter)
        #self.__logger.addHandler(self.__handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    # log Error
    def logError(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.ERROR)
        self.__logger.error(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Warning
    def logWarning(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.WARNING)
        self.__logger.warn(msg)
        # Remember to close the file handler
        self.closeHandler()

    # log Info
    def logInfo(self, msg):
        self.__logger.addHandler(self.__handler)
        self.__logger.setLevel(logging.INFO)
        self.__logger.info(msg)
        # Remember to close the file handler
        self.closeHandler()

    def closeHandler(self):
        self.__logger.removeHandler(self.__handler)
        self.__handler.close()
After running this code for a while, lsof showed far too many open file descriptors:
[root@my-centos ~]# lsof | grep ddd | wc -l
11555

No no. The usage is far simpler:
import logging
logging.basicConfig()
logger = logging.getLogger("mylogger")
logger.info("test")
logger.debug("test")
In your case you are adding the handler on every logging operation, which is overkill at the very least.
Check the documentation https://docs.python.org/2/library/logging.html

Each time you log anything, you add another instance of the handler.
Yes, you close it every time. But that just means it takes slightly longer to blow up: closing a handler does not remove it from the logger.
The first message, you have one handler, so you open one file descriptor and then close it.
The next message, you have two handlers, so you open two file descriptors and close them.
The next message, you open three file descriptors and close them.
And so on, until you're opening more file descriptors than you're allowed to, and you get an error.
The solution is simply not to do that.
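A minimal sketch of the fix, reusing the question's names: configure the logger and attach the handler once, in __init__, and let each logging method just log.
import logging

class LogService(object):
    __instance = None

    def __init__(self):
        self.__logger = logging.getLogger('ddd')
        self.__logger.setLevel(logging.INFO)
        handler = logging.FileHandler('/var/log/ddd/ddd.log')
        handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        # Attach the handler exactly once; no per-call add/remove/close.
        self.__logger.addHandler(handler)

    @classmethod
    def getInstance(cls):
        if cls.__instance is None:
            cls.__instance = LogService()
        return cls.__instance

    def logError(self, msg):
        self.__logger.error(msg)

    def logWarning(self, msg):
        self.__logger.warning(msg)

    def logInfo(self, msg):
        self.__logger.info(msg)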

Related

Prevent one of the logging handlers for specific messages

I monitor my script with the logging module of the Python Standard Library, and I send the log records both to the console with a StreamHandler and to a file with a FileHandler.
I would like to have the option to disable a handler for a LogRecord independently of its severity. For example, for a specific LogRecord I would like the option not to send it to the file destination or to the console (by passing a parameter).
I have found that the library has the Filter class for that purpose (described as a finer grained facility for determining which log records to output), but I haven't figured out how to use it.
Any ideas how to do this in a consistent way?
Finally, it is quite easy. I used a function as a Handler.filter, as suggested in the comments.
This is a working example:
from pathlib import Path
import logging
from logging import LogRecord
def build_handler_filters(handler: str):
    def handler_filter(record: LogRecord):
        if hasattr(record, 'block'):
            if record.block == handler:
                return False
        return True
    return handler_filter
ch = logging.StreamHandler()
ch.addFilter(build_handler_filters('console'))
fh = logging.FileHandler(Path('/tmp/test.log'))
fh.addFilter(build_handler_filters('file'))
mylogger = logging.getLogger(__name__)
mylogger.setLevel(logging.DEBUG)
mylogger.addHandler(ch)
mylogger.addHandler(fh)
When the logger is called, the message is sent to both the console and the file, i.e.
mylogger.info('msg').
To block, for example, the file, the logger should be called with an extra argument like this:
mylogger.info('msg only to console', extra={'block': 'file'})
Disabling console is analogous.
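For completeness, the analogous call that keeps a record out of the console handler would be:
mylogger.info('msg only to file', extra={'block': 'console'})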

Python File Handler with Rotating Content of File

I have written a simple logging function that appends anything I send to it to a file:
def log(message):
    with open("log.txt", 'a+') as f:
        f.write(message + "\n")
However, I would like to limit how big this file gets. When it reaches the maximum size, I would like it to remove the first lines and append at the bottom.
Is this possible with a file handler, or do I need to code it myself? I am also fine with using a rotating file handler, but all the examples I have seen let the environment write exceptions automatically after setting a level, and I need to control what is written to the file.
Many thanks in advance!
This is an example of using Python's built-in RotatingFileHandler:
import logging
from logging.handlers import RotatingFileHandler
# change to a file you want to log to
logFile = 'log_r.log'
my_handler = RotatingFileHandler(logFile, mode='a', maxBytes=5*1024*1024,
                                 backupCount=2, encoding=None, delay=0)
my_handler.setLevel(logging.INFO)

app_log = logging.getLogger('root')
app_log.setLevel(logging.INFO)
app_log.addHandler(my_handler)

def bad():
    raise Exception("Something bad")

if __name__ == "__main__":
    app_log.info("something")
    try:
        app_log.info("trying to run bad")
        bad()
    except Exception as e:
        app_log.info("That was bad...")
    finally:
        app_log.info("Ran bad...")
The behaviour differs slightly from what you proposed: it doesn't delete from the start of the file; instead it moves the full file to a different filename and starts from scratch.
Note that the only things that show in the log file when you run this are the pieces of text we're logging explicitly - i.e. no system junk you don't want.

Python: flush logging only at end of script run

Currently I use a custom logging system that works as follows:
I have a Log class that resembles the following:
import datetime

class Log:
    def __init__(self):
        self.script = ""
        self.datetime = datetime.datetime.now().replace(second=0, microsecond=0)
        self.mssg = ""
        self.mssg_detail = ""
        self.err = ""
        self.err_detail = ""
I created a function decorator that performs a try/except around the function call and adds a message either to .mssg or to .err on the Log object accordingly.
import functools

def logging(fun):
    @functools.wraps(fun)
    def inner(self, *args):
        try:
            f = fun(self, *args)
            self.logger.mssg += fun.__name__ + " :ok, "
            return f
        except Exception as e:
            self.logger.err += fun.__name__ + ": error: " + str(e.args)
    return inner
So usually a script is a class composed of multiple methods that are run sequentially.
I then run those methods (decorated as mentioned above), and at the end I upload the Log object into a MySQL db.
This works fine. But now I want to modify things so that they integrate with the "official" logging module of Python.
What I don't like about that module is that it does not seem possible to "save" the messages onto one log object so that the log is uploaded/saved only at the end of the run. Instead, each logging call immediately writes/sends the message to a file etc., which sometimes creates performance issues. I could use handlers.MemoryHandler, but it still doesn't seem to behave like my original system: it is said to collect messages and flush them to another handler periodically, which is not what I want. I want to collect the messages in memory and flush them on request with an explicit function.
Does anyone have any suggestions?
Here is my idea: use a handler to capture the log in a StringIO, then grab the StringIO whenever you want. Since there was perhaps some confusion in the discussion thread: StringIO is a "file-like" interface for strings; there is never an actual file involved.
import logging
import io
def initialize_logging(log_level, log_name='default_logname'):
    logger = logging.getLogger(log_name)
    logger.setLevel(log_level)
    log_stream = io.StringIO()
    if not logger.handlers:
        ch = logging.StreamHandler(log_stream)
        ch.setLevel(log_level)
        ch.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        ))
        logger.addHandler(ch)
    logger.propagate = 0
    return logger, log_stream
And then something like:
>>> logger, log_stream = initialize_logging(logging.INFO, "logname")
>>> logger.warning("Hello World!")
And when you want the log information:
>>> log_stream.getvalue()
'2017-05-16 16:35:03,501 - logname - WARNING - Hello World!\n'
At program start (in the main), you can:
1. instantiate your custom logger => global variable/singleton,
2. register a function at program end which will flush your logger,
3. run your decorated functions.
To register a function you can use the atexit.register function (see the page Exit handlers in the doc); a minimal sketch follows.
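This sketch assumes a module-level singleton logger; the name and flush logic are illustrative.
import atexit
import logging

logger = logging.getLogger('app')  # hypothetical singleton logger

def flush_handlers():
    # Runs automatically at normal interpreter exit.
    for handler in logger.handlers:
        handler.flush()

atexit.register(flush_handlers)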
EDIT
The idea above can be simplified.
To delay the logging, you can use the standard MemoryHandler handler, described in the page logging.handlers — Logging handlers
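For reference, a minimal programmatic sketch of the same wiring (the capacity, names and format are placeholders): records are buffered in memory and only flushed to the target handler when the buffer fills, a record at ERROR level or above arrives, or flush()/close() is called explicitly.
import logging
from logging.handlers import MemoryHandler

target = logging.StreamHandler()  # could be any handler, e.g. a DB handler
target.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s:%(message)s'))

memory_handler = MemoryHandler(capacity=1024, flushLevel=logging.ERROR,
                               target=target)

logger = logging.getLogger('buffered')
logger.setLevel(logging.DEBUG)
logger.addHandler(memory_handler)

logger.info('collected in memory, not written yet')
memory_handler.flush()  # flush on request, e.g. at the end of the run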
Take a look at this GitHub project: https://github.com/tantale/python-ini-cfg-demo
And replace the INI file by this:
[formatters]
keys=default
[formatter_default]
format=%(asctime)s:%(levelname)s:%(message)s
class=logging.Formatter
[handlers]
keys=console, alternate
[handler_console]
class=logging.handlers.MemoryHandler
formatter=default
args=(1024, INFO)
target=alternate
[handler_alternate]
class=logging.StreamHandler
formatter=default
args=()
[loggers]
keys=root
[logger_root]
level=DEBUG
formatter=default
handlers=console
To log to a database table, just replace the alternate handler by your own database handler.
There are some blog posts/SO questions about that:
You can look at Logging Exceptions To Your SQLAlchemy Database to create a SQLAlchemyHandler
See Store Django log to database if you are using DJango.
EDIT2
Note: ORMs generally support "eager loading", for instance with SqlAlchemy.

Read from a log file as it's being written using python

I'm trying to find a nice way to read a log file in real time using python. I'd like to process lines from a log file one at a time as it is written. Somehow I need to keep trying to read the file until it is created and then continue to process lines until I terminate the process. Is there an appropriate way to do this? Thanks.
Take a look at this PDF starting at page 38, ~slide I-77 and you'll find all the info you need. Of course the rest of the slides are amazing, too, but those specifically deal with your issue:
import time

def follow(thefile):
    thefile.seek(0, 2)  # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line
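A hypothetical way to drive that generator (the filename is made up):
# Process new lines as they are appended to the log file.
with open('access.log') as logfile:
    for line in follow(logfile):
        print(line, end='')  # replace with real per-line processing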
You could try something like this:
import time

logfile = open('log.txt')  # assumes the file already exists
while True:
    where = logfile.tell()
    line = logfile.readline()
    if not line:
        time.sleep(1)
        logfile.seek(where)
    else:
        print(line, end='')  # line already has a newline
Example was extracted from here.
As this is tagged Python and logging, there is another possibility.
I assume this is based on a Python logger, i.e. logging.Handler based.
You can just create a class that gets the (named) logger instance and overrides the emit function to send the output to a GUI (if you also need the console, just add a console handler next to this one).
Example:
import logging
class log_viewer(logging.Handler):
    """ Class to redistribute python logging data """

    # have a class member to store the existing logger
    logger_instance = logging.getLogger("SomeNameOfYourExistingLogger")

    def __init__(self, *args, **kwargs):
        # Initialize the Handler
        logging.Handler.__init__(self, *args)

        # optionally take a format
        # the setFormatter function is derived from logging.Handler
        for key, value in kwargs.items():
            if key == "format":
                self.setFormatter(value)

        # make the logger send data to this class
        self.logger_instance.addHandler(self)

    def emit(self, record):
        """ Overload of logging.Handler method """
        record = self.format(record)

        # ---------------------------------------
        # Now you can send it to a GUI or similar
        # "Do work" starts here.
        # ---------------------------------------

        # just as an example what e.g. a console
        # handler would do:
        print(record)
I am currently using similar code to add a TkinterTreectrl.Multilistbox for viewing logger output at runtime.
Aside: the logger only gets data once it has been initialized, so if you want all your data available, you need to initialize it at the very beginning. (I know this is what is expected, but I think it is worth mentioning.)
Maybe you could do a system call to
tail -f
using os.system()
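If you want the lines back in Python rather than just echoed to the terminal, a subprocess-based sketch of the same idea (the filename is illustrative):
import subprocess

# -F keeps following across log rotation; plain -f follows a single file.
proc = subprocess.Popen(['tail', '-F', 'log.txt'],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line, end='')  # process each line as it arrives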

redirecting sys.stdout to python logging

So right now we have a lot of Python scripts, and we are trying to consolidate them and fix any redundancies. One of the things we are trying to do is to ensure that all sys.stdout/sys.stderr output goes through the Python logging module.
Now the main thing is, we want the following printed out:
[<ERROR LEVEL>] | <TIME> | <WHERE> | <MSG>
Now almost all the sys.stdout/sys.stderr messages across our Python scripts are in the format [LEVEL] - MSG. I can parse them fine in my sys.stdout wrapper and in my sys.stderr wrapper, and then call the corresponding logging level depending on the parsed input.
So basically we have a package called foo, and a subpackage called log. In __init__.py we define the following:
def initLogging(default_level=logging.INFO, stdout_wrapper=None,
                stderr_wrapper=None):
    """
    Initialize the default logging sub system
    """
    root_logger = logging.getLogger('')
    strm_out = logging.StreamHandler(sys.__stdout__)
    strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
                                            DEFAULT_LOG_TIME_FORMAT))
    root_logger.setLevel(default_level)
    root_logger.addHandler(strm_out)

    console_logger = logging.getLogger(LOGGER_CONSOLE)
    strm_out = logging.StreamHandler(sys.__stdout__)
    #strm_out.setFormatter(logging.Formatter(DEFAULT_LOG_MSG_FORMAT,
    #                                        DEFAULT_LOG_TIME_FORMAT))
    console_logger.setLevel(logging.INFO)
    console_logger.addHandler(strm_out)

    if stdout_wrapper:
        sys.stdout = stdout_wrapper
    if stderr_wrapper:
        sys.stderr = stderr_wrapper
def cleanMsg(msg, is_stderr=False):
    logy = logging.getLogger('MSG')
    msg = msg.rstrip('\n').lstrip('\n')
    p_level = r'^(\s+)?\[(?P<LEVEL>\w+)\](\s+)?(?P<MSG>.*)$'
    m = re.match(p_level, msg)
    if m:
        msg = m.group('MSG')
        if m.group('LEVEL') == 'WARNING':
            logy.warning(msg)
            return
        elif m.group('LEVEL') == 'ERROR':
            logy.error(msg)
            return
    if is_stderr:
        logy.error(msg)
    else:
        logy.info(msg)

class StdOutWrapper:
    """
    Call wrapper for stdout
    """
    def write(self, s):
        cleanMsg(s, False)

class StdErrWrapper:
    """
    Call wrapper for stderr
    """
    def write(self, s):
        cleanMsg(s, True)
Now we would call this in one of our scripts for example:
import foo.log
foo.log.initLogging(20, foo.log.StdOutWrapper(), foo.log.StdErrWrapper())
sys.stdout.write('[ERROR] Foobar blew')
Which would be converted into an error log message. Like:
[ERROR] | 20090610 083215 | __init__.py | Foobar Blew
Now the problem is that when we do that, the module where the error message was logged is now __init__ (corresponding to the foo.log.__init__.py file), which defeats the whole purpose.
I tried doing a deep copy/shallow copy of the stderr/stdout objects, but that does nothing; it still says the message occurred in __init__.py. How can I make it so this doesn't happen?
The problem is that the logging module looks a single layer up the call stack to find who called it, but now your function is an intermediate layer at that point (though I'd have expected it to report cleanMsg, not __init__, as that's where you call into the logger). Instead, you need it to go up two levels, or else pass your caller into the logged message. You can do this by inspecting up the stack frame yourself and grabbing the calling function, then inserting it into the message.
To find your calling frame, you can use sys._getframe:
import sys
f = sys._getframe(N)
will look up N frames and return that frame object; i.e. your immediate caller is sys._getframe(1), but you may have to go another frame up if this is the stdout.write method.
Once you have the calling frame, you can get the executing code object and look at the file and function name associated with it, e.g.:
code = f.f_code
caller = '%s:%s' % (code.co_filename, code.co_name)
You may also need to add some code to handle non-Python code calling into you (i.e. C functions or builtins), as these may lack f_code objects.
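Putting those pieces together, a hypothetical helper for the wrapper might look like this (the default depth is a guess that depends on your call chain):
import sys

def calling_location(depth=2):
    # depth=2 skips this helper and the wrapper's write() method.
    try:
        frame = sys._getframe(depth)
    except ValueError:
        return '<unknown>'  # not enough Python frames on the stack
    code = frame.f_code
    return '%s:%s' % (code.co_filename, code.co_name)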
Alternatively, following up mikej's answer, you could use the same approach in a custom Logger class inheriting from logging.Logger that overrides findCaller to navigate several frames up, rather than one.
I think the problem is that your actual log messages are now being created by the logy.error and logy.info calls in cleanMsg, hence that method is the source of the log messages, and you are seeing it reported as __init__.py.
If you look in the source of Python's lib/logging/__init__.py you will see a method defined called findCaller which is what the logging module uses to derive the caller of a logging request.
Perhaps you can override this on your logging object to customise the behaviour?
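A sketch of that override, assuming Python 3.8+ (findCaller's signature and the required frame depth both vary across versions, so treat them as placeholders to tune):
import logging
import sys

class WrapperAwareLogger(logging.Logger):
    def findCaller(self, stack_info=False, stacklevel=1):
        # Walk further up than the default so the stdout wrapper itself
        # is not reported as the caller.
        try:
            f = sys._getframe(4)
        except ValueError:
            return "(unknown file)", 0, "(unknown function)", None
        return f.f_code.co_filename, f.f_lineno, f.f_code.co_name, None

logging.setLoggerClass(WrapperAwareLogger)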
