Capture / redirect all output of ProcessPoolExecutor - python

I am trying to capture all output from a ProcessPoolExecutor.
Imagine you have a file func.py:
print("imported") # I do not want this print in subprocesses
def f(x):
    return x
then you run that function with a ProcessPoolExecutor like
from concurrent.futures import ProcessPoolExecutor
from func import f # ⚠️ the import will print! ⚠️
if __name__ == "__main__":
    with ProcessPoolExecutor() as ex: # ⚠️ the import will happen here again and print! ⚠️
        futs = [ex.submit(f, i) for i in range(15)]
        for fut in futs:
            fut.result()
Now I can capture the output of the first import using, e.g., contextlib.redirect_stdout. However, I also want to capture all output from the subprocesses and redirect it to the stdout of the main process.
In my real use case, I get warnings that I want to capture, but a simple print reproduces the problem.
This is relevant to prevent the following bug https://github.com/Textualize/rich/issues/2371.

Related

Nothing is printed while using concurrent.futures

I want to run a function in parallel, so I am using concurrent.futures. The problem is that the function hello() is never executed.
import time
import concurrent.futures
def hello(name):
    print(f'hello {name}')
    time.sleep(1)  # sleep lives in the time module
if __name__ == "__main__":
    t1 = time.perf_counter()
    names = ["Jack", "John", "Lily", "Stephen"]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(hello, names)
    t2 = time.perf_counter()
    print(f'{t2-t1} seconds')
Output
0.5415315 seconds
After going through the concurrent.futures documentation, I found that ProcessPoolExecutor does not work in the interactive interpreter, because the worker processes must be able to import the __main__ module. Save the code to a file and run it from the command prompt or a shell instead.

Logging nested functions using joblib Parallel and delayed calls

In one of my scripts I have something like:
import logging
from joblib import Parallel, delayed
def f_A(x):
    logging.info("f_A "+str(x))
def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
I would expect that when I run python script.py something like:
INFO:root:f_B
INFO:root:f_A
to be shown in the console, instead I see:
INFO:root:f_B
but no information from f_A is shown.
How can I get f_A --and eventually functions called from there-- to show in the logs?
I think the issue is that the default logging level is WARNING and the main process doesn't propagate its logging configuration to the child processes. If you modify the script slightly to:
import logging
from joblib import Parallel, delayed
def f_A(x):
    logging.basicConfig(level=logging.INFO)
    logging.info("f_A "+str(x))
def f_B():
    logging.info("f_B")
    res = Parallel(n_jobs=2, prefer="processes")(delayed(f_A)(x) for x in range(10))
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    f_B()
then everything works as intended.

How to use concurrent.futures in Python

I'm struggling to get multithreading working in Python. I have a function that I want to execute on 5 threads, each with a different parameter. It also needs 2 parameters that are the same for every thread. This is what I have:
from concurrent.futures import ThreadPoolExecutor
def do_something_parallel(sameValue1, sameValue2, differentValue):
    print(str(sameValue1)) #same everytime
    print(str(sameValue2)) #same everytime
    print(str(differentValue)) #different
def main():
    # sameValue1 and sameValue2 are assumed to be defined elsewhere
    differentValues = ["1000ms", "100ms", "10ms", "20ms", "50ms"]
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(do_something_parallel, sameValue1, sameValue2, differentValue) for differentValue in differentValues]
But I don't know what to do next.
If you don't care about the order, you can now do:
from concurrent.futures import as_completed
# The rest of your code here
for f in as_completed(futures):
    # Do what you want with f.result(), for example:
    print(f.result())
Otherwise, if you care about order, it might make sense to use ThreadPoolExecutor.map with functools.partial to fill in the arguments that are always the same:
from functools import partial
# The rest of your code...
with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(
        partial(do_something_parallel, sameValue1, sameValue2),
        differentValues
    )
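Putting the two variants side by side, a minimal self-contained sketch might look like the following. The function body, the helper names run_ordered and run_unordered, and the argument values are placeholders chosen for illustration; the worker here returns a string instead of printing so the results can be inspected.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import partial


def do_something_parallel(same1, same2, different):
    # Placeholder body: combine the shared and per-task arguments.
    return f"{same1}-{same2}-{different}"


def run_ordered(same1, same2, values):
    """Results come back in the same order as the input values."""
    worker = partial(do_something_parallel, same1, same2)
    with ThreadPoolExecutor(max_workers=5) as executor:
        return list(executor.map(worker, values))


def run_unordered(same1, same2, values):
    """Results are collected as each task finishes, in no fixed order."""
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [
            executor.submit(do_something_parallel, same1, same2, v)
            for v in values
        ]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    values = ["1000ms", "100ms", "10ms", "20ms", "50ms"]
    print(run_ordered("a", "b", values))
    print(run_unordered("a", "b", values))
```

Note that executor.map returns a lazy iterator, so wrapping it in list() inside the with block is what actually collects the results and surfaces any exception raised by a task.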

How to capture python subprocess stdout in unittest

I am trying to write a unit test that executes a function that writes to stdout, captures that output, and checks the result. The function in question is a black box: we can't change how it writes its output. For the purposes of this example I've simplified it quite a bit, but essentially the function generates its output using subprocess.call().
No matter what I try, I can't capture the output. It is always written to the screen, and the test fails because it captures nothing. I experimented with both print() and os.system(). With print() I can capture stdout, but not with os.system() or subprocess.call().
It's also not specific to unittesting. I've written my test example without that with the same results.
Questions similar to this have been asked a lot, and the answers all seem to boil down to use subprocess.Popen() and communicate(), but that would require changing the black box. I'm sure there's an answer I just haven't come across, but I'm stumped.
We are using Python-2.7.
Anyway my example code is this:
#!/usr/bin/env python
from __future__ import print_function
import sys
sys.dont_write_bytecode = True
import os
import unittest
import subprocess
from contextlib import contextmanager
from cStringIO import StringIO
# from somewhere import my_function
def my_function(arg):
    #print('my_function:', arg)
    subprocess.call(['/bin/echo', 'my_function: ', arg], shell=False)
    #os.system('echo my_function: ' + arg)
@contextmanager
def redirect_cm(new_stdout):
    old_stdout = sys.stdout
    sys.stdout = new_stdout
    try:
        yield
    finally:
        sys.stdout = old_stdout
class Test_something(unittest.TestCase):
    def test(self):
        fptr = StringIO()
        with redirect_cm(fptr):
            my_function("some_value")
        self.assertEqual("my_function: some_value\n", fptr.getvalue())
if __name__ == '__main__':
    unittest.main()
There are two issues in the above code:
The StringIO object fptr is not shared between the current process and the spawned process, so the current process could not read the result even if the spawned process had written to it.
Changing sys.stdout doesn't affect the standard I/O streams of processes executed by os.popen(), os.system() or the exec*() family of functions in the os module.
A simple solution is to:
use os.pipe to share the result between the two processes
use os.dup2 instead of changing sys.stdout
A demo example is shown below:
import sys
import os
import subprocess
from contextlib import contextmanager
@contextmanager
def redirect_stdout(new_out):
    sys.stdout.flush()  # flush pending Python-level output before swapping descriptors
    old_stdout = os.dup(1)  # keep a copy of the real stdout descriptor
    try:
        os.dup2(new_out, sys.stdout.fileno())
        yield
    finally:
        os.dup2(old_stdout, 1)  # restore the original stdout
        os.close(old_stdout)
def test():
    reader, writer = os.pipe()
    with redirect_stdout(writer):
        subprocess.call(['/bin/echo', 'something happened what'], shell=False)
    print(os.read(reader, 1024))
test()
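For readers on Python 3, the same descriptor-level idea can be sketched as below. This is an illustration under assumptions, not the original answer's code: redirect_fd and capture_child_stdout are hypothetical names, a temporary file replaces the pipe so that large outputs cannot block the child, and the child is started through sys.executable instead of /bin/echo. It assumes a POSIX system where the child inherits descriptor 1.

```python
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def redirect_fd(target_fd, fd=1):
    """Temporarily make descriptor fd (stdout by default) refer to target_fd."""
    sys.stdout.flush()  # do not let buffered parent output leak into the capture
    saved = os.dup(fd)  # copy of the original descriptor
    try:
        os.dup2(target_fd, fd)
        yield
    finally:
        os.dup2(saved, fd)  # restore the original descriptor
        os.close(saved)


def capture_child_stdout(argv):
    """Run argv as a child process and return what it wrote to stdout."""
    with tempfile.TemporaryFile() as tmp:
        with redirect_fd(tmp.fileno()):
            subprocess.call(argv)  # the child inherits the redirected fd 1
        tmp.seek(0)
        return tmp.read().decode()


if __name__ == "__main__":
    out = capture_child_stdout(
        [sys.executable, "-c", "print('my_function: some_value')"]
    )
    print(repr(out))
```

In a unit test, the black-box function would be called inside the redirect_fd block in place of subprocess.call, and the returned text compared with assertEqual.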

terminate all processes in a Pool

I have a python script that looks like follows:
import os
import sys
import tempfile
from multiprocessing import Pool
def runReport(a, b, c):
    # do task.
    temp_dir = tempfile.gettempdir()
    if (os.path.isfile(temp_dir + "/stop_check")):
        # How to terminate all processes in the pool here?
        pass
def runReports(args):
    return runReport(*args)
def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    pool.map(runReports, args)
if (__name__ == '__main__'):
    main(sys.argv[1:])
There is another python script that creates this file /tmp/stop_check.
When this file gets created, I need to terminate the Pool. How can I achieve this?
Only the parent process can terminate the pool. You're better off having the parent run a loop that checks for the existence of that file, rather than trying to have each child do it and then signal the parent somehow:
import os
import sys
import time
import tempfile
from multiprocessing import Pool
def runReport(*args):
    # do task
    pass
def runReports(args):
    return runReport(*args)
def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    result = pool.map_async(runReports, args)
    temp_dir = tempfile.gettempdir()
    while not result.ready():
        if os.path.isfile(temp_dir + "/stop_check"):
            pool.terminate()
            break
        result.wait(.5) # Wait a bit to avoid pegging the CPU. You can tune this value as you see fit.
if (__name__ == '__main__'):
    main(sys.argv[1:])
By using map_async instead of map, you're free to have the parent use a loop to check for the existence of the file, and then terminate the pool when necessary. Do note that using terminate to kill the children means that they won't get to do any cleanup at all, so you need to make sure none of them access resources that could be left in an inconsistent state if the process dies while using them.
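The polling pattern can be wrapped in a small helper, sketched here under a few assumptions: slow_task is a hypothetical stand-in for runReport, run_until_stopped is an illustrative name, the stop file path is passed in instead of being hard-coded, and the "fork" start method restricts the sketch to POSIX systems.

```python
import multiprocessing
import os
import tempfile
import time


def slow_task(seconds):
    # Hypothetical stand-in for runReport: just sleeps.
    time.sleep(seconds)
    return seconds


def run_until_stopped(stop_path, durations, poll=0.1):
    """Return the task results, or None if stop_path appeared first."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    pool = ctx.Pool(2)
    try:
        result = pool.map_async(slow_task, durations)
        while not result.ready():
            if os.path.isfile(stop_path):
                pool.terminate()  # kill the workers without any cleanup
                return None
            result.wait(poll)  # sleep briefly instead of spinning
        return result.get()
    finally:
        pool.terminate()  # no-op if the pool was already terminated
        pool.join()


if __name__ == "__main__":
    stop_file = os.path.join(tempfile.gettempdir(), "stop_check")
    print(run_until_stopped(stop_file, [0.2, 0.2, 0.2]))
```

Returning None on early termination lets the caller distinguish an interrupted run from a completed one.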
