Python logging with an external module and multiprocessing

I'm trying to use multiprocessing to run some numerical code using an external module, which makes use of python's logging module with the usual pattern:
import logging
logger = logging.getLogger(__name__)
logger.info("some message")
I would like each subprocess to have its own log file, which should contain any logging information from the external module produced under that subprocess. However, I find that the subprocesses often write to the first subprocess's log file, and only sometimes, seemingly at random, to their own. Because the external module is fairly complicated, I have made this example, which replicates the behaviour:
# test_module.py
import logging
logger = logging.getLogger(__name__)

class test_class:
    def __init__(self, x):
        logger.info(f'hello! {x}')

# test.py
def fun(x):
    import test_module
    import logging
    log_file = f'{x}.log'
    logging.basicConfig(level=logging.INFO, filename=log_file, filemode='w+')
    B = test_module.test_class(x)

if __name__ == "__main__":
    import multiprocessing as mp
    nprocs = 5
    with mp.get_context('spawn').Pool(nprocs) as pool:
        pool.map(fun, [x for x in range(10)])
This produces the following for 0.log
INFO:test_module:hello! 0
INFO:test_module:hello! 3
INFO:test_module:hello! 4
INFO:test_module:hello! 5
INFO:test_module:hello! 6
INFO:test_module:hello! 7
INFO:test_module:hello! 9
The remaining messages end up scattered across 1.log and the other files, seemingly at random:
INFO:test_module:hello! 1
My intention is for each log file to contain only its own process's messages; for example, 0.log should simply be as follows:
INFO:test_module:hello! 0
My understanding is that each subprocess should import the logging module after being created, and this should have no effect on the other subprocesses. And yet, many of them produce logging output in 0.log. I added the 'spawn' option in an attempt to ensure this is the case, as from what I understand it should produce a subprocess which shares nothing with the parent. Removing it and using the default ('fork', on my system) results in similar behaviour.

I would advise against using basicConfig and __name__ in such a scenario.
Instead, set up the loggers and file handlers individually:
# test_module.py
import logging

class test_class:
    def __init__(self, x):
        logger = logging.getLogger(str(x))
        logger.info(f'hello! {x}')

# test.py
import test_module
import logging
import multiprocessing as mp

def fun(x):
    logger = logging.getLogger(str(x))
    logger.setLevel(logging.INFO)
    log_file = f'{x}.log'
    f_handler = logging.FileHandler(log_file, mode='w+')
    logger.addHandler(f_handler)
    test_module.test_class(x)

if __name__ == "__main__":
    nprocs = 5
    with mp.Pool(nprocs) as pool:
        pool.map(fun, [x for x in range(10)])
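The reason the original version misbehaves is that Pool workers are reused for several map items, and logging.basicConfig does nothing once the root logger already has handlers, so the second task run by a worker keeps writing to the first task's file; spawn versus fork makes no difference to that. If you would rather keep getLogger(__name__) inside the external module, one variant of the same idea is to reconfigure the root logger per task. This is a rough sketch, not part of the original answer, and force=True requires Python 3.8+:
# test.py (alternative sketch)
import logging
import multiprocessing as mp
import test_module

def fun(x):
    # reconfigure the root logger for this task; force=True removes handlers
    # left over from a previous task handled by the same worker process
    logging.basicConfig(level=logging.INFO, filename=f'{x}.log',
                        filemode='w+', force=True)
    test_module.test_class(x)

if __name__ == "__main__":
    with mp.get_context('spawn').Pool(5) as pool:
        pool.map(fun, range(10))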

Related

How to set a handler for all loggers within a project?

I want to use a memory logger in my project. It keeps track of the last n logging records. A minimal example main file looks like this:
import sys
import logging
from logging import StreamHandler
from test_module import do_stuff

logger = logging.getLogger(__name__)

class MemoryHandler(StreamHandler):
    def __init__(self, n_logs: int):
        StreamHandler.__init__(self)
        self.n_logs = n_logs
        self.my_records = []

    def emit(self, record):
        self.my_records.append(self.format(record))
        self.my_records = self.my_records[-self.n_logs:]

    def to_string(self):
        return '\n'.join(self.my_records)

if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    mem_handler = MemoryHandler(n_logs=10)
    logger.addHandler(mem_handler)
    logger.info('hello')
    do_stuff()
    print(mem_handler.to_string())
The test module I am importing do_stuff from looks like this:
import logging

logger = logging.getLogger(__name__)

def do_stuff():
    logger.info('doing stuff')
When I run the main file, two log statements appear: the one from main and the one from do_stuff. But the memory handler only receives "hello" and not "doing stuff":
INFO:__main__:hello
INFO:test_module:doing stuff
hello
I assume that this is because mem_handler is not added to the test_module logger. I can fix this by adding the mem_handler explicitly:
logging.getLogger('test_module').addHandler(mem_handler)
But in general I don't want to list all modules and add the mem_handler manually. How can I add the mem_handler to all loggers in my project?
The Python logging system is hierarchical. Loggers form a tree-like structure that mirrors the package structure; the structure is determined by logger name, with levels separated by dots.
If you use the module's __name__ to get the logger, its name will be equivalent to the dotted path of the module, for example:
package.subpackage.module
In this hierarchy, a message is sent up the chain of loggers (unless one of the loggers is explicitly configured with propagate=False).
So the best way to add a handler is to add it to the root logger at the top of the hierarchy and make sure all loggers below it propagate.
You can get the root logger with logging.getLogger() (without any name) and then add handlers or other configuration as you like.
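Applied to the example above, that means attaching the memory handler to the root logger rather than to the __main__ logger; a minimal sketch of the changed main block:
if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    mem_handler = MemoryHandler(n_logs=10)
    # attach to the root logger: records from __main__ and test_module both propagate here
    logging.getLogger().addHandler(mem_handler)
    logger.info('hello')
    do_stuff()
    print(mem_handler.to_string())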

Logging nested functions using joblib Parallel and delayed calls

In one of my scripts I have something like:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.info("f_A "+str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
I would expect that when I run python script.py something like:
INFO:root:f_B
INFO:root:f_A
to be shown in the console, instead I see:
INFO:root:f_B
but no information from f_A is shown.
How can I get f_A --and eventually functions called from there-- to show in the logs?
I think the issue is that the worker processes don't inherit the logging configuration from the main process: in each child the root logger is left at its default level (WARNING) with no handlers, so the INFO messages from f_A are dropped. If you modify the script slightly to:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.basicConfig(level=logging.INFO)
    logging.info("f_A "+str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
then everything works as intended.
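Repeating basicConfig inside f_A is harmless, by the way: it is a no-op once the root logger of that worker process already has handlers, so each worker is only configured on its first task. If you prefer to keep the configuration out of the numerical code, a tiny helper makes the intent explicit (a sketch; ensure_worker_logging is a made-up name):
import logging
from joblib import Parallel, delayed

def ensure_worker_logging(level=logging.INFO):
    # configures the root logger of the current process;
    # a no-op if this (worker) process was already configured
    logging.basicConfig(level=level)

def f_A(x):
    ensure_worker_logging()
    logging.info("f_A %s", x)

def f_B():
    logging.info("f_B")
    Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    ensure_worker_logging()
    f_B()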

Logging separate files for different Processes in Python

I've seen a few questions regarding putting logs from different processes together when using the multiprocessing module in Python. I would like to do the opposite, produce separate log files for different processes, and they should log everything that happens when calling other modules without being mangled. In the example below I have a main program (main.py) and two modules (module1.py and module2.py), and I want the main logger (mainlog) to write to stdout, which it does fine. I also want a separate file for each process including logging from module1 and module2.
main.py:
import logging
import multiprocessing as mpr
import module1
import sys

mainlog = logging.getLogger("main")
h = logging.StreamHandler(sys.stdout)
mainlog.addHandler(h)
logging.root.setLevel(logging.DEBUG)

for i in xrange(0, 3):
    mainlog.info("Starting process ... %s", i)
    log = logging.getLogger("module1")
    h = logging.FileHandler("process_{0}.log".format(i))
    fmt = logging.Formatter(fmt="%(levelname)-10s:%(filename)-20s:%(message)s")
    h.setFormatter(fmt)
    log.addHandler(h)
    log.setLevel(logging.DEBUG)
    p = mpr.Process(target=module1.do_something, args=(i,))
    p.start()
A module1.py:
import logging
import module2

log = logging.getLogger("module1")

def do_something(i):
    for j in xrange(0, 100):
        log.debug("do something. process %2s. iteration %2s", i, j)
        module2.multiply(j, 2)
And a module2.py:
import logging

log = logging.getLogger("module2")

def multiply(x, y):
    log.debug("... multiplying %s x %s = %s", x, y, x*y)
    return x*y
Instead I get the following output:
Starting process ... 0
Starting process ... 1
No handlers could be found for logger "module2"
Starting process ... 2
No handlers could be found for logger "module2"
No handlers could be found for logger "module2"
And I get 3 individual log files (process_0.log, ...) that each contain the messages from all processes mixed together, instead of only their own. Nothing from module2.py is logged. What am I doing wrong?
You need to configure logging in the child processes. They start off with a clean slate, and logging isn't configured in them. Are you on Windows, by any chance? Almost nothing is inherited from the parent process to the child process in Windows, whereas on POSIX, fork() semantics can allow some things to be inherited.
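Concretely, that means doing the handler setup in code that runs in the child process, for example at the top of the target function. A rough sketch based on the question's files (the exact wiring here is illustrative, not the only way to do it):
# module1.py (sketch: configure logging inside the child process)
import logging
import module2

log = logging.getLogger("module1")

def do_something(i):
    h = logging.FileHandler("process_{0}.log".format(i))
    h.setFormatter(logging.Formatter("%(levelname)-10s:%(filename)-20s:%(message)s"))
    # attach the handler to the root logger of this child process so that
    # records from module1 and module2 both propagate into this file
    root = logging.getLogger()
    root.addHandler(h)
    root.setLevel(logging.DEBUG)
    for j in xrange(0, 100):
        log.debug("do something. process %2s. iteration %2s", i, j)
        module2.multiply(j, 2)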
I ended up creating a subclass of logging.Logger to manage switching between logging to stdout and logging to disk. Now I can switch when necessary inside a Process:
import logging
import sys

class CGLogger(logging.Logger):
    def __init__(self, name):
        logging.Logger.__init__(self, name)
        self.mainhandler = logging.StreamHandler(sys.stdout)
        self.addHandler(self.mainhandler)

    def stop_main_logging(self):
        self.removeHandler(self.mainhandler)

    def log_to_file(self, fn):
        self.filehandler = logging.FileHandler(fn)
        self.addHandler(self.filehandler)

    def stop_logging_to_file(self):
        self.removeHandler(self.filehandler)

    def restart_main_logging(self):
        self.addHandler(self.mainhandler)

    def switch_to_file_logging(self, fn):
        self.stop_main_logging()
        self.log_to_file(fn)

    def switch_to_main_logging(self):
        self.stop_logging_to_file()
        self.restart_main_logging()

logging.setLoggerClass(CGLogger)
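Usage inside a Process target would then look something like this (a hypothetical sketch; the module and logger names are made up):
import logging
import cglogger          # the module above; importing it calls logging.setLoggerClass(CGLogger)

log = logging.getLogger("worker")   # created through the CGLogger class
log.setLevel(logging.DEBUG)

def do_work(i):
    log.switch_to_file_logging("process_{0}.log".format(i))  # this process logs to its own file
    log.debug("working in process %s", i)
    log.switch_to_main_logging()                             # back to stdout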

Python logging across multiple modules

I'm trying to add logging (to console rather than a file) to a piece of code I've been working on for a while. Having read around a bit, I have a pattern that I think should work, but I'm not quite sure where I'm going wrong.
I have the following three files (simplified, obviously):
controller.py
import my_module
import logging
from setup_log import configure_log

def main():
    logger = configure_log(logging.DEBUG, __name__)
    logger.info('Started logging')
    my_module.main()

if __name__ == "__main__":
    main()
setup_log.py
import logging

def configure_log(level=None, name=None):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.DEBUG)
    chFormatter = logging.Formatter('%(levelname)s - %(filename)s - Line: %(lineno)d - %(message)s')
    console_handler.setFormatter(chFormatter)
    logger.addHandler(console_handler)
    return logger
my_module.py
import logging

def main():
    logger = logging.getLogger(__name__)
    logger.info("Starting my_module")
    print "Something"

if __name__ == "__main__":
    main()
When I run them, only the first logging call produces output to the console: 'Started logging'. The second call, 'Starting my_module', is just passed over.
What have I misunderstood/mangled?
According to the documentation it looks like you might get away with an even simpler setup like so:
If your program consists of multiple modules, here’s an example of how
you could organize logging in it:
# myapp.py
import logging
import mylib

def main():
    logging.basicConfig(filename='myapp.log', level=logging.INFO)
    logging.info('Started')
    mylib.do_something()
    logging.info('Finished')

if __name__ == '__main__':
    main()

# mylib.py
import logging

def do_something():
    logging.info('Doing something')
If you run myapp.py, you should see this in myapp.log:
INFO:root:Started
INFO:root:Doing something
INFO:root:Finished
It looks like your call to logger = logging.getLogger(__name__) inside your module creates a separate, unconfigured logger (its level is NOTSET, and nothing it propagates to has a handler, so no log entry results).
The actual bug can be seen by putting the line:
print '__name__', __name__
at the beginning of both of your main functions, which yields:
$ python controller.py
__name__ __main__
INFO - controller.py - Line: 8 - Started logging
__name__ my_module
Something
So you properly configured a logger called __main__ but the logger named my_module isn't configured.
The deeper problem is that you have two main methods which is probably confusing you (it did me).
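One way to fix it, in the spirit of the documentation example above, is to configure the root logger once in the entry point and let each module's getLogger(__name__) logger simply propagate to it. A sketch of setup_log.py rewritten that way (controller.py would then call configure_log(logging.DEBUG) without a name, and my_module.py needs no change):
# setup_log.py (sketch: configure the root logger instead of a named logger)
import logging

def configure_log(level=logging.DEBUG):
    root = logging.getLogger()            # no name: the root logger
    root.setLevel(level)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(
        logging.Formatter('%(levelname)s - %(filename)s - Line: %(lineno)d - %(message)s'))
    root.addHandler(console_handler)
    return root
With that, both 'Started logging' and 'Starting my_module' reach the console through the single root handler, no matter which module emitted them.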

Only print when run as script?

Is there a better way to only print when run as a script, when __name__ == '__main__' ?
I have some scripts that I also import and use parts of.
Something like the below will work but is ugly, and would have to be defined in each script separately:
def printif(s):
    if globals()['__name__'] == '__main__':
        print (s)
    return
I looked briefly at some of Python's logging libraries but would prefer a lighter solution...
edit:
I ended up doing something like this:
# mylog.py
import sys
import logging
log = logging.getLogger()
#default logging level
log.setLevel(logging.WARNING)
log.addHandler(logging.StreamHandler(sys.stdout))
And from the script:
import logging
from mylog import log
...
log.info(...)
log.warning(...)
...
if __name__ == '__main__':
    # override when the script is run directly
    log.setLevel(logging.INFO)
This scheme has minimal code duplication, per-script log levels, and a project-wide default level...which is exactly what I wanted.
run_as_script = False

def printif(s):
    if run_as_script:
        print (s)
    return

if __name__ == '__main__':
    run_as_script = True
In light of user318904's comment on my other answer, I'll provide an alternative (although this may not work in all cases, it might just be "good enough").
For a separate module:
import sys

def printif(s):
    if sys.argv[0] != '':
        print (s)
Using a logging library is really not that heavyweight:
import logging

log = logging.getLogger('myscript')

def dostuff(...):
    ....
    log.info('message!')
    ...

if __name__ == '__main__':
    import sys
    log.setLevel(logging.INFO)
    log.addHandler(logging.StreamHandler(sys.stdout))
    ...
One wart is the "No handlers could be found for logger 'myscript'" message that logging prints by default if you import this module (rather than run it as a script) and call your function without setting up logging. It'll be gone in Python 3.2. For Python 2.7, you can shut it off by adding
log.addHandler(logging.NullHandler())
at the top, and for older versions you'd have to define a NullHandler class like this:
class NullHandler(logging.Handler):
    def emit(self, record):
        pass
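A common pattern that works on both old and new versions is to fall back to the hand-written class only when logging.NullHandler is not available (a small sketch):
import logging

try:
    from logging import NullHandler       # Python 2.7+ / 3.x
except ImportError:
    class NullHandler(logging.Handler):   # fallback for older versions
        def emit(self, record):
            pass

log = logging.getLogger('myscript')
log.addHandler(NullHandler())              # swallow records when the module is only imported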
Looking back at all this, I say: go with Gerrat's suggestion. I'll leave mine here, for completeness.
