Problems with Python's multithreading/logging modules


I have a concurrently-run program where I want to create a log for each child process. I'll first describe my setup and then the issue I'm facing. Here are my primary modules:
mp_handler.py:
import logging
import multiprocessing as mp
def mp_handler(target, args_list):
    # configure logs
    for args in args_list:
        logger_id = args[0]  # first arg suffices to id a process, in my case
        logger = logging.getLogger(logger_id)
        handler = logging.FileHandler(logger_id + '.log')
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
    mp.set_start_method('spawn')  # bug fix, see below
    # build each process
    for args in args_list:
        p = mp.Process(target=target, args=args)
        p.start()
mp_worker.py:
import logging
from deco_module import deco
from my_module import function_with_open_cv
@deco
def mp_worker(*args):
    logger_id = args[0]
    logger = logging.getLogger(logger_id)
    logger.info("Information about process %s", logger_id)
    # do a lot of stuff with openCV3
    function_with_open_cv(args)  # also logs to this child's log file
deco_module.py: this module does some exception handling. I have no idea why it might interfere, but I figured I'd include it just in case.
from functools import wraps
import logging
def deco(function):
    @wraps(function)
    def wrapper(*args):
        logger_id = args[0]
        logger = logging.getLogger(logger_id)
        try:
            function(*args)
        except Exception:
            logger.info('a message in case the child fails.')
    return wrapper
Now, on to my issue. I was getting the error described in this post: https://github.com/opencv/opencv/issues/5150. Hence, I wrote the mp.set_start_method('spawn') line in mp_handler().
After debugging, however, I found that that line was causing the logger = logging.getLogger(logger_id) line in mp_worker() to create a NEW logger as opposed to getting the one created in the parent, i.e. in mp_handler(). I could see this by printing hex(id(logger)) in both the parent and the child modules: the locations in memory are different. Indeed, writing mp.set_start_method('fork') avoids this issue (this makes rough sense to me, as my understanding is that spawn creates a new memory space for the logger).
main problem: So, how do I work around the fact that I need the start method to be set to 'spawn' for the sake of OpenCV, but need to toggle it off for log communication between modules (i.e. for mp_worker to recognize its correct logger_id and log to the correct file)? As part of good practice, I want to keep all logging configs out of the children and submodules alike.
secondary problem: suppose I ignore the fact that I need OpenCV and set the method to 'fork'. In this case I noticed that none of the logging.info() statements in function_with_open_cv() ever get to the log! So, supposing your recommendation does involve setting it to fork, what is the workaround here? EDIT: FIXED! This is also being caused by OpenCV. So the problem still stands... how do I use a spawn process and not lose my logger ID?
Thank you so much!

You shouldn't configure logging before a process is spawned, but after. See the documentation for an example of how to do it correctly. This applies to Python 3, but if you need to run it under Python 2, you can use the logutils package, which provides QueueListener and QueueHandler classes.
The logging cookbook contains more example code relating to using logging with multiprocessing.
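A minimal sketch of the queue-based approach, adapted to the one-file-per-child requirement in the question. It assumes Python 3, where QueueHandler and QueueListener live in logging.handlers. The names worker_configurer and PerLoggerFileHandler are hypothetical helpers, not part of the logging module, and routing by logger name is an assumption about what the asker wants. Each child only attaches a QueueHandler after it has started; all file handling stays in the parent.

```python
import logging
import logging.handlers
import multiprocessing as mp
import os


def worker_configurer(queue):
    """Runs inside the child: forward every record to the parent's queue."""
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)


def mp_worker(queue, logger_id):
    worker_configurer(queue)  # configure after the process has started
    logger = logging.getLogger(logger_id)
    logger.info("Information about process %s", logger_id)


class PerLoggerFileHandler(logging.Handler):
    """Parent-side handler (hypothetical): one '<logger name>.log' per logger."""

    def __init__(self, directory="."):
        super().__init__()
        self.directory = directory
        self._files = {}

    def emit(self, record):
        handler = self._files.get(record.name)
        if handler is None:
            path = os.path.join(self.directory, record.name + ".log")
            handler = self._files[record.name] = logging.FileHandler(path)
        handler.emit(record)

    def close(self):
        for handler in self._files.values():
            handler.close()
        super().close()


def mp_handler(target, logger_ids, directory="."):
    # A spawn context uses 'spawn' without changing the global start method.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    file_handler = PerLoggerFileHandler(directory)
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    procs = [ctx.Process(target=target, args=(queue, i)) for i in logger_ids]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
    file_handler.close()
```

With 'spawn', the target must be a module-level function that the child can import, and the call to mp_handler(mp_worker, [...]) should sit under an if __name__ == '__main__': guard.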

Related

How to capture which module that prints to stdout/stderr within the whole python application?

I have a Python web app that has a global logger definition. Both stderr and stdout are overridden to pass their string buffer to logging. However, when there is output on stdout/stderr, the log record only shows the module that does the redirecting (in my formatter I added the module name), not the module the output is actually coming from.
Is it possible to capture which modules those outputs are coming from? If so, how?
edit: added code here
import logging
import sys
class RedirectToLogger(object):
    def __init__(self, output, logger, log_level=logging.INFO):
        self.logger = logger
        self.log_level = log_level  # level used for redirected lines (INFO assumed here)
        self.flush = output.flush
    def write(self, line_buffer):
        for line in line_buffer.rstrip().splitlines():
            self.logger.log(self.log_level, line.rstrip())
# this will invoke stderr/stdout output to logger.
sys.stdout = RedirectToLogger(sys.stdout, app_logger)
sys.stderr = RedirectToLogger(sys.stderr, app_logger)

Requires Python 3.8 or later:
With the information given in a comment this is possible to solve: when your redirect calls the log() method you can tell it that the caller can be found by going back one extra frame in the callstack. The stacklevel kwarg is used for that.
self.logger.log(self.log_level, line.rstrip(), stacklevel=2)
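A small self-contained sketch of the stacklevel idea, assuming Python 3.8 or later. ListHandler, noisy_function and the logger name redirect_demo are hypothetical names used only to make the effect observable; the redirect class is a simplified stand-in for the one in the question.

```python
import logging


class RedirectToLogger:
    """File-like object that forwards written lines to a logger."""

    def __init__(self, logger, log_level=logging.INFO):
        self.logger = logger
        self.log_level = log_level

    def write(self, line_buffer):
        for line in line_buffer.rstrip().splitlines():
            # stacklevel=2 skips write() itself and reports its caller
            self.logger.log(self.log_level, line.rstrip(), stacklevel=2)

    def flush(self):
        pass


class ListHandler(logging.Handler):
    """Hypothetical helper that keeps records in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def noisy_function(stream):
    # print() is a builtin, so the Python frame above write() is this function
    print("hello from noisy_function", file=stream)


demo_logger = logging.getLogger("redirect_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
capture = ListHandler()
demo_logger.addHandler(capture)
noisy_function(RedirectToLogger(demo_logger))
```

After this runs, capture.records[0].funcName and .module describe noisy_function rather than write, so %(module)s and %(funcName)s in a formatter point at the code that produced the output.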

While the answer from blues is a neat solution for Python 3.8 and later, this was what I needed:
sys._getframe(1).f_code.co_filename
Passing 1 to _getframe() grabs the frame one level above the current one on the call stack, i.e. the caller of write(), which lives in the file where the stdout/stderr output was triggered.

Strange Issue Using Logging Module in Python

I seem to be running into a problem when I am logging data after invoking another module in an application I am working on. I'd like assistance in understanding what may be happening here.
To replicate the issue, I have developed the following script...
#!/usr/bin/python
import sys
import logging
from oletools.olevba import VBA_Parser, VBA_Scanner
from cloghandler import ConcurrentRotatingFileHandler
# set up logger for application
dbg_h = logging.getLogger('dbg_log')
dbglog = '%s' % 'dbg.log'
dbg_rotateHandler = ConcurrentRotatingFileHandler(dbglog, "a")
dbg_h.addHandler(dbg_rotateHandler)
dbg_h.setLevel(logging.ERROR)
# read some document as a buffer
buff = sys.stdin.read()
# generate issue
dbg_h.error('Before call to module....')
vba = VBA_Parser('None', data=buff)
dbg_h.error('After call to module....')
When I run this, I get the following...
cat somedocument.doc | ./replicate.py
ERROR:dbg_log:After call to module....
For some reason, my last dbg_h logger write attempt is getting output to the console as well as getting written to my dbg.log file? This only appears to happen AFTER the call to VBA_Parser.
cat dbg.log
Before call to module....
After call to module....
Anyone have any idea as to why this might be happening? I reviewed the source code of olevba and did not see anything that stuck out to me specifically.
Could this be a problem I should raise with the module author? Or am I doing something wrong with how I am using the cloghandler?

The oletools codebase is littered with calls to the root logger through calls to logging.debug(...), logging.error(...), and so on. Since the author didn't bother to configure the root logger, the default behavior is to dump to sys.stderr. Since sys.stderr defaults to the console when running from the command line, you get what you're seeing.
You should contact the author of oletools since they're not using the logging system effectively. Ideally they would use a named logger and push the messages to that logger. As a workaround to suppress the console messages, you could configure the root logger to use your handler.
# Set a handler
logging.root.addHandler(dbg_rotateHandler)
Be aware that this may lead to duplicated log messages.
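The mechanism can be reproduced without oletools. The sketch below relies on documented behavior of the module-level functions (logging.error() and friends call basicConfig() when the root logger has no handlers), which is why only the message logged after the library call showed up on the console. Setting propagate to False is an alternative workaround, not something the answer above proposes; it keeps the application's records away from the root logger and so avoids the duplication.

```python
import logging

root = logging.getLogger()
for handler in list(root.handlers):
    root.removeHandler(handler)  # start from an unconfigured root logger

app_log = logging.getLogger("dbg_log")
app_log.setLevel(logging.ERROR)

handlers_before = len(root.handlers)
logging.error("message from a library")  # module-level call, as in oletools
handlers_after = len(root.handlers)      # basicConfig() added a stderr handler

# Alternative: keep this logger's records out of the root logger's handlers.
app_log.propagate = False
```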

difference in logging mechanism: API and application(python)

I am currently writing an API and an application which uses the API. I have gotten suggestions from people stating that I should perform logging using handlers in the application and use a "logger" object for logging from the API.
In light of the advice I received above, is the following implementation correct?
class test:
    def __init__(self, verbose):
        self.logger = logging.getLogger("test")
        self.logger.setLevel(verbose)
    def do_something(self):
        # do something
        self.logger.log("something")
        # by doing this i get the error message "No handlers could be found for logger "test"
The implementation i had in mind was as follows:
#!/usr/bin/python
"""
....
....
create a logger with a handler
....
....
"""
myobject = test()
try:
    myobject.do_something()
except SomeError:
    logger.log("cant do something")
I'd like to get my basics strong, I'd be grateful for any help and suggestions for code you might recommend I look up.
Thanks!

It's not very clear whether your question is about the specifics of how to use logging or about logging exceptions, but if the latter, I would agree with Adam Crossland that log-and-swallow is a pattern to be avoided.
In terms of the mechanics of logging, I would make the following observations:
You don't need to have a logger as an instance member. It's more natural to declare loggers at module level using logger = logging.getLogger(__name__), and this will also work as expected in sub-packages.
Your call logger.log("message") would likely fail anyway because the log method has a level as the first argument, rather than a message.
You should declare handlers, and if your usage scenario is fairly simple you can do this in your main method or if __name__ == '__main__': clause by adding for example
logging.basicConfig(filename='/tmp/myapp.log', level=logging.DEBUG,
format='%(asctime)s %(levelname)s %(name)s %(message)s')
and then elsewhere in your code, just do for example
import logging
logger = logging.getLogger(__name__)
once at the top of each module where you want to use logging, and then
logger.debug('message with %s', 'arguments') # or .info, .warning, .error etc.
in your code wherever needed.
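Put together, the observations above might look like the following sketch. The logger name "myapi" stands in for __name__, the stream argument replaces the log file so the result is easy to inspect, and the NullHandler line is an addition commonly recommended for library code rather than something stated in this answer. The force argument of basicConfig() requires Python 3.8 or later.

```python
import io
import logging

# Library side: a module-level logger; "myapi" stands in for __name__.
logger = logging.getLogger("myapi")
logger.addHandler(logging.NullHandler())  # stay silent unless the app configures logging


class Test:
    def do_something(self):
        # log() takes the level first, then the message and its arguments
        logger.log(logging.INFO, "did %s", "something")


def main(stream):
    # Application side: handlers and format are chosen once, here.
    logging.basicConfig(stream=stream, level=logging.DEBUG,
                        format="%(levelname)s %(name)s %(message)s",
                        force=True)
    Test().do_something()
```

Calling main(io.StringIO()) writes one line of the form "INFO myapi did something" to the stream.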

The danger with the pattern that you are thinking about is that you may end up effectively hiding exceptions by putting them in a log. Many exceptions really should crash your program because they represent a problem that needs to be fixed. Generally, it is more useful to be able to step into code with a debugger to find out what caused the exception.
If there are cases that an exception represents an expected condition that does not affect the stability of the app or the correctness of its behavior, doing nothing but writing a notation to the log is OK. But be very, very careful about how you use this.

I usually do the following:
import logging
import logging.config
logging.config.fileConfig('log.config')
# for one line log records
G_LOG = logging.getLogger(__name__)
# for records with stacktraces
ST_LOG = logging.getLogger('stacktrace.' + __name__)
try:
    # some code
    G_LOG.info('some message %s %s', param1, param2)
except Exception:
    message = 'some message'
    G_LOG.error(message)
    # exc_info appends stacktrace to the log message
    ST_LOG.error(message, exc_info=True)
The format of the config file is described in the Python manual.

How can I strip Python logging calls without commenting them out?

Today I was thinking about a Python project I wrote about a year back where I used logging pretty extensively. I remember having to comment out a lot of logging calls in inner-loop-like scenarios (the 90% code) because of the overhead (hotshot indicated it was one of my biggest bottlenecks).
I wonder now if there's some canonical way to programmatically strip out logging calls in Python applications without commenting and uncommenting all the time. I'd think you could use inspection/recompilation or bytecode manipulation to do something like this and target only the code objects that are causing bottlenecks. This way, you could add a manipulator as a post-compilation step and use a centralized configuration file, like so:
[Leave ERROR and above]
my_module.SomeClass.method_with_lots_of_warn_calls
[Leave WARN and above]
my_module.SomeOtherClass.method_with_lots_of_info_calls
[Leave INFO and above]
my_module.SomeWeirdClass.method_with_lots_of_debug_calls
Of course, you'd want to use it sparingly and probably with per-function granularity -- only for code objects that have shown logging to be a bottleneck. Anybody know of anything like this?
Note: There are a few things that make this more difficult to do in a performant manner because of dynamic typing and late binding. For example, any calls to a method named debug may have to be wrapped with an if not isinstance(log, Logger). In any case, I'm assuming all of the minor details can be overcome, either by a gentleman's agreement or some run-time checking. :-)

What about using logging.disable?
I've also found I had to use logging.isEnabledFor if the logging message is expensive to create.
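A short sketch of both calls, under the assumption that the expensive part is building the message rather than the logging call itself. expensive_summary, process and run_quietly are hypothetical names for illustration.

```python
import logging

logger = logging.getLogger("hot_loop_demo")
logger.setLevel(logging.WARNING)

expensive_calls = []


def expensive_summary(item):
    # Stand-in for a costly computation that is only needed for the log line.
    expensive_calls.append(item)
    return "summary of %r" % (item,)


def process(items):
    for item in items:
        # Skip building the message entirely when DEBUG is not enabled.
        if logger.isEnabledFor(logging.DEBUG):
            logger.debug("state: %s", expensive_summary(item))
    return len(items)


def run_quietly(func, *args):
    """Suppress all records up to CRITICAL while func runs, then restore."""
    logging.disable(logging.CRITICAL)
    try:
        return func(*args)
    finally:
        logging.disable(logging.NOTSET)
```

logging.disable() is process-wide, so it suits a global switch better than the per-function granularity the question asks for.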

Use pypreprocessor
Which can also be found on PYPI (Python Package Index) and be fetched using pip.
Here's a basic usage example:
from pypreprocessor import pypreprocessor
pypreprocessor.parse()
#define nologging
#ifdef nologging
...logging code you'd usually comment out manually...
#endif
Essentially, the preprocessor comments out code the way you were doing it manually before. It just does it on the fly conditionally depending on what you define.
You can also remove all of the preprocessor directives and commented out code from the postprocessed code by adding 'pypreprocessor.removeMeta = True' between the import and parse() statements.
The bytecode output (.pyc) file will contain the optimized output.
SideNote: pypreprocessor is compatible with python2x and python3k.
Disclaimer: I'm the author of pypreprocessor.

I've also seen assert used in this fashion.
assert logging.warning('disable me with the -O option') is None
(I'm guessing that warning always returns None; if not, you'll get an AssertionError.)
But really that's just a funny way of doing this:
if __debug__: logging.warning('disable me with the -O option')
When you run a script with that line in it with the -O option, the line will be removed from the optimized .pyo code. If, instead, you had your own variable, like in the following, you will have a conditional that is always executed (no matter what value the variable has), although a conditional should execute quicker than a function call:
my_debug = True
...
if my_debug: logging.warning('disable me by setting my_debug = False')
So if my understanding of __debug__ is correct, it seems like a nice way to get rid of unnecessary logging calls. The flip side is that -O also disables all of your asserts, so it is a problem if you need the asserts.

As an imperfect shortcut, how about mocking out logging in specific modules using something like MiniMock?
For example, if my_module.py was:
import logging
class C(object):
    def __init__(self, *args, **kw):
        logging.info("Instantiating")
You would replace your use of my_module with:
from minimock import Mock
import my_module
my_module.logging = Mock('logging')
c = my_module.C()
You'd only have to do this once, before the initial import of the module.
Getting the level specific behaviour would be simple enough by mocking specific methods, or having logging.getLogger return a mock object with some methods impotent and others delegating to the real logging module.
In practice, you'd probably want to replace MiniMock with something simpler and faster; at the very least something which doesn't print usage to stdout! Of course, this doesn't handle the problem of module A importing logging from module B (and hence A also importing the log granularity of B)...
This will never be as fast as not running the log statements at all, but should be much faster than going all the way into the depths of the logging module only to discover this record shouldn't be logged after all.

You could try something like this:
# Create something that accepts anything
class Fake(object):
    def __getattr__(self, key):
        return self
    def __call__(self, *args, **kwargs):
        return True
# Replace the logging module
import sys
sys.modules["logging"] = Fake()
It essentially replaces (or initially fills in) the space for the logging module with an instance of Fake which simply takes in anything. You must run the above code (just once!) before the logging module is attempted to be used anywhere. Here is a test:
import logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(levelname)-8s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='/temp/myapp.log',
                    filemode='w')
logging.debug('A debug message')
logging.info('Some information')
logging.warning('A shot across the bows')
With the above, nothing at all was logged, as was to be expected.

I'd use some fancy logging decorator, or a bunch of them:
import time
def doLogging(logThreshold):
    def logFunction(aFunc):
        def innerFunc(*args, **kwargs):
            if LOGLEVEL >= logThreshold:
                print(">>Called %s at %s" % (aFunc.__name__, time.strftime("%H:%M:%S")))
                print(">>Parameters: ", args, kwargs if kwargs else "")
            try:
                return aFunc(*args, **kwargs)
            finally:
                if LOGLEVEL >= logThreshold:
                    print(">>%s took %s" % (aFunc.__name__, time.strftime("%H:%M:%S")))
        return innerFunc
    return logFunction
All you need is to declare LOGLEVEL constant in each module (or just globally and just import it in all modules) and then you can use it like this:
@doLogging(2.5)
def myPreciousFunction(one, two, three=4):
    print("I'm doing some fancy computations :-)")
    return
And if LOGLEVEL is no less than 2.5 you'll get output like this:
>>Called myPreciousFunction at 18:49:13
>>Parameters: (1, 2)
I'm doing some fancy computations :-)
>>myPreciousFunction took 18:49:13
As you can see, some work is needed for better handling of kwargs, so the default values will be printed if they are present, but that's another question.
You should probably use some logger module instead of raw print statements, but I wanted to focus on the decorator idea and avoid making code too long.
Anyway - with such decorator you get function-level logging, arbitrarily many log levels, ease of application to new function, and to disable logging you only need to set LOGLEVEL. And you can define different output streams/files for each function if you wish. You can write doLogging as:
def doLogging(logThreshold, outStream=sys.stdout):
    .....
    print(">>Called %s at %s" % (aFunc.__name__, time.strftime("%H:%M:%S")), file=outStream)  # etc.
And utilize log files defined on a per-function basis.
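As the answer notes, a logger is usually preferable to raw print statements. Here is one possible sketch of the same decorator idea built on the logging module. log_calls is a hypothetical name, and using isEnabledFor as the switch (instead of a LOGLEVEL constant) is a substitution on my part, not the answer's design. When the level is disabled, the wrapper costs one extra check and one extra function call.

```python
import functools
import logging
import time


def log_calls(logger, level=logging.DEBUG):
    """Decorator factory: log each call and its duration at the given level."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            if not logger.isEnabledFor(level):
                return func(*args, **kwargs)  # fast path: no logging work at all
            start = time.perf_counter()
            logger.log(level, "calling %s args=%r kwargs=%r",
                       func.__name__, args, kwargs)
            try:
                return func(*args, **kwargs)
            finally:
                logger.log(level, "%s took %.6fs",
                           func.__name__, time.perf_counter() - start)
        return wrapper
    return decorator
```

Usage would be @log_calls(logging.getLogger(__name__)) above a function definition; raising the logger's level turns the tracing off without touching the decorated code.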

This is an issue in my project as well--logging ends up on profiler reports pretty consistently.
I've used the _ast module before in a fork of PyFlakes (http://github.com/kevinw/pyflakes) ... and it is definitely possible to do what you suggest in your question--to inspect and inject guards before calls to logging methods (with your acknowledged caveat that you'd have to do some runtime type checking). See http://pyside.blogspot.com/2008/03/ast-compilation-from-python.html for a simple example.
Edit: I just noticed MetaPython on my planetpython.org feed--the example use case is removing log statements at import time.
Maybe the best solution would be for someone to reimplement logging as a C module, but I wouldn't be the first to jump at such an...opportunity :p

:-) We used to call that a preprocessor, and although C's preprocessor had some of those capabilities, the "king of the hill" was the preprocessor for IBM mainframe PL/I. It provided extensive language support in the preprocessor (full assignments, conditionals, looping, etc.) and it was possible to write "programs that wrote programs" using just the PL/I PP.
I wrote many applications with full-blown sophisticated program and data tracing (we didn't have a decent debugger for a back-end process at that time) for use in development and testing which then, when compiled with the appropriate "runtime flag" simply stripped all the tracing code out cleanly without any performance impact.
I think the decorator idea is a good one. You can write a decorator to wrap the functions that need logging. Then, for runtime distribution, the decorator is turned into a "no-op" which eliminates the debugging statements.
Jon R

I am doing a project currently that uses extensive logging for testing logic and execution times for a data analysis API using the Pandas library.
I found this thread while looking into a similar concern: what is the overhead of the logging.debug statements even when the logging.basicConfig level is set to level=logging.WARNING?
I have resorted to writing the following script to comment out or uncomment the debug logging prior to deployment:
import os
import fileinput
comment = True
# exclude files or directories matching string
fil_dir_exclude = ["__", "_archive", ".pyc"]
if comment:
    ## Variables to comment
    source_str = 'logging.debug'
    replace_str = '#logging.debug'
else:
    ## Variables to uncomment
    source_str = '#logging.debug'
    replace_str = 'logging.debug'
# walk through directories
for root, dirs, files in os.walk('root/directory'):
    # where files exist
    if files:
        # for each file
        for file_single in files:
            # build full file name
            file_name = os.path.join(root, file_single)
            # exclude files with matching string
            if not any(exclude_str in file_name for exclude_str in fil_dir_exclude):
                # replace string in line; with inplace=True, print writes back to the file
                for line in fileinput.input(file_name, inplace=True):
                    print(line.replace(source_str, replace_str), end="")
This is a file recursion that excludes files based on a list of criteria and performs an in place replace based on an answer found here: Search and replace a line in a file in Python

I like the 'if __debug__' solution except that putting it in front of every call is a bit distracting and ugly. I had this same problem and overcame it by writing a script which automatically parses your source files and replaces logging statements with pass statements (and commented out copies of the logging statements). It can also undo this conversion.
I use it when I deploy new code to a production environment when there are lots of logging statements which I don't need in a production setting and they are affecting performance.
You can find the script here: http://dound.com/2010/02/python-logging-performance/

Python: Warnings and logging verbose limit

I want to unify the whole logging facility of my app. Every warning raises an exception, which I then catch and pass to the logger. My question: does logging have any mute facility? Sometimes the logger becomes too verbose. And since some warnings are too noisy, is there any verbosity limit in warnings?
http://docs.python.org/library/logging.html
http://docs.python.org/library/warnings.html

Not only are there log levels, but there is a really flexible way of configuring them. If you are using named logger objects (e.g., logger = logging.getLogger(...)) then you can configure them appropriately. That will let you configure verbosity on a subsystem-by-subsystem basis where a subsystem is defined by the logging hierarchy.
The other option is to use logging.Filter and Warning filters to limit the output. I haven't used this method before but it looks like it might be a better fit for your needs.
Give PEP-282 a read for a good prose description of the Python logging package. I think that it describes the functionality much better than the module documentation does.
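To illustrate the subsystem-by-subsystem configuration described above, here is a minimal sketch. The logger names app, app.db and app.network are hypothetical; the point is that a logger without its own level inherits the level of its nearest configured ancestor in the dotted-name hierarchy.

```python
import logging

# Configure verbosity per subsystem; children inherit unless overridden.
logging.getLogger("app").setLevel(logging.INFO)
logging.getLogger("app.network").setLevel(logging.ERROR)  # mute a noisy subsystem

db_log = logging.getLogger("app.db")             # inherits INFO from "app"
net_log = logging.getLogger("app.network.http")  # inherits ERROR from "app.network"
```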
Edit after Clarification
You might be able to handle the logging portion of this using a custom class based on logging.Logger and registered with logging.setLoggerClass(). It really sounds like you want something similar to syslog's "Last message repeated 9 times". Unfortunately I don't know of an implementation of this anywhere. You might want to see if twisted.python.log supports this functionality.
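Short of a custom Logger class, a logging.Filter can approximate the syslog-style behaviour. The sketch below is one possible approach, not an existing library feature: DuplicateFilter is a hypothetical name, and it simply drops a record when it is identical to the one immediately before it, without printing a repeat count.

```python
import logging


class DuplicateFilter(logging.Filter):
    """Drop a record if it repeats the previous one (same logger, level, text)."""

    def __init__(self):
        super().__init__()
        self._last = None

    def filter(self, record):
        current = (record.name, record.levelno, record.getMessage())
        if current == self._last:
            return False  # suppress the consecutive repeat
        self._last = current
        return True
```

It can be attached with logger.addFilter(DuplicateFilter()) or, to cover several loggers at once, with handler.addFilter(DuplicateFilter()) on a shared handler.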

From the very source you mentioned:
there are the log levels, use them wisely ;-)
LEVELS = {'debug': logging.DEBUG,
          'info': logging.INFO,
          'warning': logging.WARNING,
          'error': logging.ERROR,
          'critical': logging.CRITICAL}

This will be a problem if you plan to make all logging calls from some blind error handler that doesn't know anything about the code that raised the error, which is what your question sounds like. How will you decide which logging calls get made and which don't?
The more standard practice is to use such blocks to recover if possible, and log an error (really, if it is an error that you weren't specifically prepared for, you want to know about it; use a high level). But don't rely on these blocks for all your state/debug information. Better to sprinkle your code with logging calls before it gets to the error-handler. That way, you can observe useful run-time information about a system when it is NOT failing and you can make logging calls of different severity. For example:
import logging
from traceback import format_exc
logger = logging.getLogger()  # Gives the root logger. Change this for better organization
# Add your appenders or what have you
def handle_error(e):
    logger.error("Unexpected error found")
    logger.warning(format_exc())  # put the traceback in the log at lower level
    ...  # Your recovery code
def do_stuff():
    logger.info("Program started")
    ...  # Your main code
    logger.info("Stuff done")
if __name__ == "__main__":
    try:
        do_stuff()
    except Exception as e:
        handle_error(e)
