Logging nested functions using joblib Parallel and delayed calls - python

In one of my scripts I have something like:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.info("f_A "+str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
I would expect that when I run python script.py something like:
INFO:root:f_B
INFO:root:f_A
to be shown in the console. Instead I see:
INFO:root:f_B
but no information from f_A is shown.
How can I get f_A -- and any functions called from there -- to show in the logs?
I think the issue is that logging in the worker processes is left at its default configuration (effective level WARNING, so INFO messages are dropped), and the logging configuration set in the main process is not propagated to the child processes. If you modify the script slightly to:
import logging
from joblib import Parallel, delayed

def f_A(x):
    logging.basicConfig(level=logging.INFO)
    logging.info("f_A "+str(x))

def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
then everything works as intended.
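As a small extension (a sketch, with a hypothetical f_C that is not in the original script): once f_A has configured the root logger in its worker process, any function it calls logs through that same configuration, so nothing extra is needed for nested calls. Repeated basicConfig calls in the same worker are harmless, because basicConfig is a no-op once the root logger already has a handler.
import logging
from joblib import Parallel, delayed

def f_C(x):
    # hypothetical nested helper: it uses the root logger configured by f_A,
    # so its messages show up without any additional setup
    logging.info("f_C " + str(x))

def f_A(x):
    logging.basicConfig(level=logging.INFO)  # configures logging in this worker process
    logging.info("f_A " + str(x))
    f_C(x)

def f_B():
    logging.info("f_B")
    Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()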

Related

Capture / redirect all output of ProcessPoolExecutor

I am trying to capture all output from a ProcessPoolExecutor.
Imagine you have a file func.py:
print("imported") # I do not want this print in subprocesses
def f(x):
return x
then you run that function with a ProcessPoolExecutor like
from concurrent.futures import ProcessPoolExecutor
from func import f  # ⚠️ the import will print! ⚠️

if __name__ == "__main__":
    with ProcessPoolExecutor() as ex:  # ⚠️ the import will happen here again and print! ⚠️
        futs = [ex.submit(f, i) for i in range(15)]
        for fut in futs:
            fut.result()
Now I can capture the output of the first import using, e.g., contextlib.redirect_stdout; however, I want to capture all output from the subprocesses too and redirect it to the stdout of the main process.
In my real use case, I get warnings that I want to capture, but a simple print reproduces the problem.
This is relevant for preventing the following bug: https://github.com/Textualize/rich/issues/2371.
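No answer is attached here, but one common approach is sketched below: pass a manager queue to each worker via the initializer argument of ProcessPoolExecutor (Python 3.7+) and replace the workers' sys.stdout/sys.stderr with a small file-like wrapper that forwards writes to that queue, which the main process then drains. The QueueWriter class, the redirect_worker_output initializer, and the stand-in f are illustrative assumptions, not part of the original post. Note that with the spawn start method the import-time print fires before the initializer runs, so this scheme only captures output produced while tasks execute.
import io
import sys
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

class QueueWriter(io.TextIOBase):
    """File-like object that forwards every write to a multiprocessing queue."""
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
    def write(self, text):
        if text:
            self.queue.put(text)
        return len(text)

def redirect_worker_output(queue):
    # runs once in each worker before any task: everything the worker prints
    # from now on is shipped back to the parent through the queue
    sys.stdout = QueueWriter(queue)
    sys.stderr = QueueWriter(queue)

def f(x):  # stand-in for func.f
    print("working on", x)
    return x

if __name__ == "__main__":
    with Manager() as manager:
        q = manager.Queue()
        with ProcessPoolExecutor(initializer=redirect_worker_output,
                                 initargs=(q,)) as ex:
            results = list(ex.map(f, range(5)))
        # drain the workers' output and emit it from the main process
        while not q.empty():
            sys.stdout.write(q.get())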

Python logging with an external module and multiprocessing

I'm trying to use multiprocessing to run some numerical code using an external module, which makes use of Python's logging module with the usual pattern:
import logging
logger = logging.getLogger(__name__)
logger.info("some message")
I would like each subprocess to have its own log file, which should contain any logging information from the external module produced under that subprocess. However, I find that the different subprocesses use the first subprocess's log file, or may at random use their own. Because the external module is fairly complicated, I have made this example which replicates the behaviour:
# test_module.py
import logging

logger = logging.getLogger(__name__)

class test_class:
    def __init__(self, x):
        logger.info(f'hello! {x}')

# test.py
def fun(x):
    import test_module
    import logging
    log_file = f'{x}.log'
    logging.basicConfig(level=logging.INFO, filename=log_file, filemode='w+')
    B = test_module.test_class(x)

if __name__ == "__main__":
    import multiprocessing as mp
    nprocs = 5
    with mp.get_context('spawn').Pool(nprocs) as pool:
        pool.map(fun, [x for x in range(10)])
This produces the following for 0.log
INFO:test_module:hello! 0
INFO:test_module:hello! 3
INFO:test_module:hello! 4
INFO:test_module:hello! 5
INFO:test_module:hello! 6
INFO:test_module:hello! 7
INFO:test_module:hello! 9
The remaining messages end up in 1.log etc., at random:
INFO:test_module:hello! 1
My intention is for each log file to only contain its own message, for example 0.log should simply be as follows:
INFO:test_module:hello! 0
My understanding is that each subprocess should import the logging module after being created, and this should have no effect on the other subprocesses. And yet, many of them produce logging output in 0.log. I added the 'spawn' option in an attempt to ensure this is the case, as from what I understand it should produce a subprocess which shares nothing with the parent. Removing it and using the default ('fork', on my system) results in similar behaviour.
I would advise against using basicConfig and __name__ in such a scenario.
Instead, set up the loggers and file handlers individually:
# test_module.py
import logging

class test_class:
    def __init__(self, x):
        logger = logging.getLogger(str(x))
        logger.info(f'hello! {x}')

# test.py
import test_module
import logging
import multiprocessing as mp

def fun(x):
    logger = logging.getLogger(str(x))
    logger.setLevel(logging.INFO)
    log_file = f'{x}.log'
    f_handler = logging.FileHandler(log_file, mode='w+')
    logger.addHandler(f_handler)
    test_module.test_class(x)

if __name__ == "__main__":
    nprocs = 5
    with mp.Pool(nprocs) as pool:
        pool.map(fun, [x for x in range(10)])
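If the external module cannot be modified (which is usually the point of it being external), an alternative sketch is to leave its getLogger(__name__) pattern untouched and instead attach a per-task file handler to the root logger inside each task, removing it afterwards so that the reused pool worker does not carry it into the next task. The code below is an illustration under those assumptions, not part of the answer above:
# test.py (alternative sketch; test_module stays exactly as in the question)
import logging
import multiprocessing as mp
import test_module

def fun(x):
    handler = logging.FileHandler(f'{x}.log', mode='w')
    handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(handler)
    try:
        test_module.test_class(x)  # its records propagate to the root logger
    finally:
        root.removeHandler(handler)  # keep the reused worker clean for the next task
        handler.close()

if __name__ == "__main__":
    with mp.get_context('spawn').Pool(5) as pool:
        pool.map(fun, range(10))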

import keras disables multiprocessing

The code works fine, but as soon as I uncomment line #5, multiprocessing stops working.
I know that keras doesn't support running in multiple processes, but I only want to run it in the main process.
from multiprocessing import Process
import logging
logging.basicConfig(filename='test.log')
logging.getLogger().setLevel(level=logging.DEBUG)
#import keras
def f():
    logging.info("Test")
if __name__ == "__main__":
    p = Process(target=f)
    p.start()
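There is no accepted answer reproduced here, but a commonly suggested workaround is sketched below: defer the keras import to the main guard and start the child with the spawn start method, so the child begins from a fresh interpreter that never imports keras at all. Whether this resolves the hang depends on the keras/TensorFlow version and the platform; treat it as an assumption to test, not a guaranteed fix.
from multiprocessing import get_context
import logging
logging.basicConfig(filename='test.log')
logging.getLogger().setLevel(level=logging.DEBUG)

def f():
    logging.info("Test")

if __name__ == "__main__":
    import keras  # deferred: only the parent imports keras, inside the guard
    ctx = get_context("spawn")  # child starts clean instead of forking keras state
    p = ctx.Process(target=f)
    p.start()
    p.join()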

Logging separate files for different Processes in Python

I've seen a few questions regarding putting logs from different processes together when using the multiprocessing module in Python. I would like to do the opposite, produce separate log files for different processes, and they should log everything that happens when calling other modules without being mangled. In the example below I have a main program (main.py) and two modules (module1.py and module2.py), and I want the main logger (mainlog) to write to stdout, which it does fine. I also want a separate file for each process including logging from module1 and module2.
main.py:
import logging
import multiprocessing as mpr
import module1
import sys

mainlog = logging.getLogger("main")
h = logging.StreamHandler(sys.stdout)
mainlog.addHandler(h)
logging.root.setLevel(logging.DEBUG)

for i in xrange(0, 3):
    mainlog.info("Starting process ... %s", i)
    log = logging.getLogger("module1")
    h = logging.FileHandler("process_{0}.log".format(i))
    fmt = logging.Formatter(fmt="%(levelname)-10s:%(filename)-20s:%(message)s")
    h.setFormatter(fmt)
    log.addHandler(h)
    log.setLevel(logging.DEBUG)
    p = mpr.Process(target=module1.do_something, args=(i,))
    p.start()
A module1.py:
import logging
import module2

log = logging.getLogger("module1")

def do_something(i):
    for j in xrange(0, 100):
        log.debug("do something. process %2s. iteration %2s", i, j)
        module2.multiply(j, 2)
And a module2.py:
import logging

log = logging.getLogger("module2")

def multiply(x, y):
    log.debug("... multiplying %s x %s = %s", x, y, x*y)
    return x*y
Instead I get the following output:
Starting process ... 0
Starting process ... 1
No handlers could be found for logger "module2"
Starting process ... 2
No handlers could be found for logger "module2"
No handlers could be found for logger "module2"
And 3 individual log files (process_0.log, ...) that each contain the messages from all processes together, instead of only their own. Nothing from module2.py is logged. What am I doing wrong?
You need to configure logging in the child processes. They start off with a clean slate, and logging isn't configured in them. Are you on Windows, by any chance? Almost nothing is inherited from the parent process to the child process in Windows, whereas on POSIX, fork() semantics can allow some things to be inherited.
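A minimal sketch of that advice, assuming the per-process handler setup is moved out of the parent's loop and into a helper that runs inside each child (the names setup_process_logging and run are illustrative, not from the answer). Attaching the handler to the root logger also lets module2's records propagate into the same file, which addresses the missing module2 output:
# main.py (sketch)
import logging
import multiprocessing as mpr
import module1

def setup_process_logging(i):
    # runs in the child, so the handler exists only in that process
    h = logging.FileHandler("process_{0}.log".format(i))
    h.setFormatter(logging.Formatter("%(levelname)-10s:%(filename)-20s:%(message)s"))
    root = logging.getLogger()
    root.addHandler(h)
    root.setLevel(logging.DEBUG)

def run(i):
    setup_process_logging(i)
    module1.do_something(i)

if __name__ == "__main__":
    for i in range(3):
        mpr.Process(target=run, args=(i,)).start()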
I ended up creating a subclass of logging.Logger to manage switching between logging to stdout and logging to disk. Now I can switch when necessary inside a Process:
import logging
import sys

class CGLogger(logging.Logger):
    def __init__(self, name):
        logging.Logger.__init__(self, name)
        self.mainhandler = logging.StreamHandler(sys.stdout)
        self.addHandler(self.mainhandler)

    def stop_main_logging(self):
        self.removeHandler(self.mainhandler)

    def log_to_file(self, fn):
        self.filehandler = logging.FileHandler(fn)
        self.addHandler(self.filehandler)

    def stop_logging_to_file(self):
        self.removeHandler(self.filehandler)

    def restart_main_logging(self):
        self.addHandler(self.mainhandler)

    def switch_to_file_logging(self, fn):
        self.stop_main_logging()
        self.log_to_file(fn)

    def switch_to_main_logging(self):
        self.stop_logging_to_file()
        self.restart_main_logging()

logging.setLoggerClass(CGLogger)
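A hypothetical usage sketch (not in the original answer), assuming logging.setLoggerClass(CGLogger) has already run before the logger is first requested; the logger name and file names are illustrative:
import logging
import multiprocessing as mpr

def worker(i):
    log = logging.getLogger("worker")  # created as a CGLogger
    log.setLevel(logging.DEBUG)
    log.switch_to_file_logging("process_{0}.log".format(i))  # stdout off, file on
    log.debug("running in process %s", i)

if __name__ == "__main__":
    for i in range(3):
        mpr.Process(target=worker, args=(i,)).start()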

Only print when run as script?

Is there a better way to only print when run as a script, i.e. when __name__ == '__main__'?
I have some scripts that I also import and use parts of.
Something like the below will work but is ugly, and would have to be defined in each script separately:
def printif(s):
    if globals()['__name__'] == '__main__':
        print(s)
    return
I looked briefly at some of Python's logging libraries but would prefer a lighter solution...
edit:
I ended up doing something like this:
# mylog.py
import sys
import logging
log = logging.getLogger()
#default logging level
log.setLevel(logging.WARNING)
log.addHandler(logging.StreamHandler(sys.stdout))
And from the script:
from mylog import log
import logging
...
log.info(...)
log.warning(...)
...
if __name__ == '__main__':
    # override when the script is run directly
    log.setLevel(logging.INFO)
This scheme has minimal code duplication, per-script log levels, and a project-wide default level...which is exactly what I wanted.
run_as_script = False

def printif(s):
    if run_as_script:
        print(s)
    return

if __name__ == '__main__':
    run_as_script = True
In light of user318904's comment on my other answer, I'll provide an alternative (although this may not work in all cases, it might just be "good enough").
For a separate module:
import sys

def printif(s):
    if sys.argv[0] != '':
        print(s)
Using a logging library is really not that heavyweight:
import logging

log = logging.getLogger('myscript')

def dostuff(...):
    ....
    log.info('message!')
    ...

if __name__ == '__main__':
    import sys
    log.setLevel(logging.INFO)
    log.addHandler(logging.StreamHandler(sys.stdout))
    ...
One wart is the No handlers could be found for logger "myscript" message that logging prints by default if you import this module (rather than running it as a script) and call your function without setting up logging. It'll be gone in Python 3.2. For Python 2.7, you can shut it off by adding
log.addHandler(logging.NullHandler())
at the top, and for older versions you'd have to define a NullHandler class like this:
class NullHandler(logging.Handler):
    def emit(self, record):
        pass
Looking back at all this, I say: go with Gerrat's suggestion. I'll leave mine here, for completeness.
