Python debugging with "assert" when using multiprocessing module - python

I have a specific version of this question on debugging multiprocessing in Python. I use assert statements extensively throughout my code to catch bugs. When an assert fails, the program stops, and the file name and line number of the offending assert are printed to stderr.
However, when I use multiprocessing.Pool, all I get back is that there was an AssertionError, with no information about where the offending assert is.
For example, the following minimal code uses multiprocessing's map for cores >= 2 and the regular map function for cores == 1:
import multiprocessing
import logging
mpl = multiprocessing.log_to_stderr()
mpl.setLevel(logging.INFO)

def test(foo):
    print foo
    assert False

cores = 2
if cores > 1:
    pool = multiprocessing.Pool(cores)
    pool.map(test, range(cores))
else:
    map(test, range(cores))
For cores == 1 I get the following error:
File "test_multiprocessing.py", line 16, in test
assert False
For cores == 2 I get:
File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
As you may notice, I tried the logging approach suggested here, but that doesn't provide the assert location either.
Is there a way using multiprocessing module or any other threading module to get the location of an offending assert statement?
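One common workaround, sketched here for Python 3 rather than the Python 2.7 of the question, is to format the traceback inside the worker and ship it back as the exception message. The names with_traceback and check_positive are hypothetical and only serve the illustration; recent Python 3 releases also attach the worker's traceback to pool exceptions on their own, so this is mainly useful where that is not available.

```python
import functools
import multiprocessing
import traceback


def with_traceback(func):
    """Re-raise any worker exception with its formatted traceback as the message."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() runs inside the worker, so it still knows the
            # file name and line number of the failing assert.
            raise RuntimeError(traceback.format_exc())
    return wrapper


@with_traceback
def check_positive(value):
    # Hypothetical worker standing in for the question's test() function.
    assert value > 0, "value must be positive"
    return value


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        try:
            pool.map(check_positive, [1, 0])
        except RuntimeError as exc:
            # The message is the full traceback text from the worker.
            print(exc)
```

The decorated function has to stay at module top level so the pool can pickle a reference to it.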

Related

How to implement multi-processing into a python module?

I would like to add multiprocessing to a simulation which I have written in Python. The simulation is very extensive, and to clean up the code I have created a number of modules.
One of the modules is now supposed to do some number crunching, so I'd like to use multiprocessing there. However, I always run into an issue, as I cannot employ an if __name__ == "__main__" guard within the module.
I can reproduce the error by running the following:
# filename: test_mp_module.py
import concurrent.futures

def test_fct(arg):
    return arg

class TestMpModule():
    def __init__(self):
        pass

    def do(self):
        para = [1,2,3]
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = executor.map(test_fct, para)
            for result in results:
                print(result)
and
# filename: main.py
from test_mp_module import TestMpModule
test = TestMpModule()
test.do()
The Exception displayed states:
runfile('C:/XXX/test_mp.py', wdir='C:/XXX')
Reloaded modules: test_mp_module
Traceback (most recent call last):
File "C:\XXX\test_mp.py", line 17, in <module>
test.do()
File "C:\XXX\test_mp_module.py", line 22, in do
for result in results:
File "C:\YYY\Anaconda3\lib\concurrent\futures\process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "C:\YYY\Anaconda3\lib\concurrent\futures\_base.py", line 611, in result_iterator
yield fs.pop().result()
File "C:\YYY\Anaconda3\lib\concurrent\futures\_base.py", line 439, in result
return self.__get_result()
File "C:\YYY\Anaconda3\lib\concurrent\futures\_base.py", line 388, in __get_result
raise self._exception
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I'm using Python 3.8.3 and usually run my code in Spyder on a Windows machine.
How can I adapt my code to use multiprocessing within a module? Is that even possible in the first place? I found very conflicting statements.
Any help is appreciated, cheers.
Try this for your file "main.py":
from test_mp_module import TestMpModule

if __name__ == '__main__':
    test = TestMpModule()
    test.do()
For the multiprocessing part, I recommend using the multiprocessing package. Here is a small example of how to use it:
import multiprocessing

def my_func(i):
    return i

if __name__ == '__main__':
    with multiprocessing.Pool(multiprocessing.cpu_count()) as p:
        outputs = p.starmap(my_func, [(i, ) for i in range(5)])
    print(outputs) # > [0, 1, 2, 3, 4]
I found a solution, though I'm not sure it is considered pretty. The name guard needs to be carried into the module as follows:
# filename: test_mp_module.py
import concurrent.futures

def test_fct(i):
    return i

class TestMpModule():
    def __init__(self):
        pass

    def do(self, name_guard):
        para = [1, 2, 3]
        if name_guard == 'parent_module_name': # check parent module name here
            with concurrent.futures.ProcessPoolExecutor() as executor:
                results = executor.map(test_fct, para)
                for result in results:
                    print(result)
and
from test_mp_module import TestMpModule

if __name__ == "__main__":
    name_guard = "parent_module_name" # insert __name__ here
    test = TestMpModule()
    test.do(name_guard)
Works fine now.
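For comparison, here is a hedged sketch of the more usual arrangement: the guard lives only in the script that is launched, and the module merely keeps its worker function at top level so that child processes can import it. It is shown as a single file for brevity. Cruncher, square and the executor_cls parameter are hypothetical names introduced for this illustration; executor_cls only exists so that a thread-based executor can be swapped in.

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    # Worker functions must be defined at module top level so that
    # child processes can import them by name.
    return x * x


class Cruncher:
    """Hypothetical stand-in for the TestMpModule class in the question."""

    def __init__(self, executor_cls=ProcessPoolExecutor):
        self._executor_cls = executor_cls

    def do(self, values):
        # No name guard is needed here: this code only runs when called.
        with self._executor_cls() as executor:
            return list(executor.map(square, values))


# In a real project the lines below would be the whole of main.py,
# preceded by an import of Cruncher from the module.
if __name__ == "__main__":
    print(Cruncher().do([1, 2, 3]))
```

With the spawn start method used on Windows, each child re-imports the launched script, so the guard there is what keeps the children from starting pools of their own.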

Multiprocessing Robust to Occasional Failures

I have 100-1000 time series paths and a fairly expensive simulation that I'd like to parallelize. However, the library I'm using hangs on rare occasions, and I'd like to make the run robust to that. This is the current setup:
with Pool() as pool:
    res = pool.map_async(simulation_that_occasionally_hangs, (p for p in paths))
    all_costs = res.get()
I know get() has a timeout parameter, but if I understand correctly it applies to the whole batch of 1000 paths. What I'd like to do is check whether any single simulation is taking longer than 5 minutes (a normal path takes 4 seconds) and, if so, stop just that path and continue to get() the rest.
EDIT:
Testing timeout in pebble
from concurrent.futures import TimeoutError
from pebble import ProcessPool

def fibonacci(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return fibonacci(n - 1) + fibonacci(n - 2)

def main():
    with ProcessPool() as pool:
        future = pool.map(fibonacci, range(40), timeout=10)
        iterator = future.result()
        all = []
        while True:
            try:
                all.append(next(iterator))
            except StopIteration:
                break
            except TimeoutError as e:
                print(f'function took longer than {e.args[1]} seconds')
        print(all)
Errors:
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\anaconda3\lib\multiprocessing\spawn.py", line 99, in spawn_main
new_handle = reduction.steal_handle(parent_pid, pipe_handle)
File "C:\anaconda3\lib\multiprocessing\reduction.py", line 87, in steal_handle
_winapi.DUPLICATE_SAME_ACCESS | _winapi.DUPLICATE_CLOSE_SOURCE)
PermissionError: [WinError 5] Access is denied
The pebble library has been designed to address these kinds of issues. It transparently handles job timeouts and failures such as C library crashes.
You can check the documentation examples to see how to use it. Its interface is similar to that of concurrent.futures.
Probably the easiest way is to run each heavy simulation in a separate subprocess, with the parent process watching it. Specifically:
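import multiprocessing
from multiprocessing.pool import ThreadPool

def risky_simulation(path):
    ...

def safe_simulation(path):
    p = multiprocessing.Process(target=risky_simulation, args=(path,))
    p.start()
    p.join(timeout)  # Your timeout here
    if p.is_alive():
        p.kill()  # or p.terminate(); kill() needs Python 3.7+
    # Here read and return the output of the simulation
    # Can be from a file, or using some communication object
    # between processes, from the `multiprocessing` module

# Workers of a process Pool are daemonic and may not start child
# processes, so the watchers run in a thread pool instead.
with ThreadPool() as pool:
    res = pool.map_async(safe_simulation, paths)
    all_costs = res.get()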
Notes:
Because the simulation may hang, run it in a separate process rather than a thread (i.e. the Process object should not be replaced by a thread): depending on how it hangs, it may hold the GIL and block every other thread.
This solution uses the pool only for the watchers; the computations themselves are off-loaded to new processes. We could also make the computations share a pool, but that would result in uglier code, so I skipped it.
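A runnable variant of the same idea, with one swap stated plainly: it uses subprocess.run, which has a built-in per-call timeout, instead of multiprocessing.Process. This is only a sketch under the assumption that the simulation can be launched as a command. SIMULATION is a hypothetical stand-in that sleeps for a given number of seconds and prints a cost; run_one and run_all are illustrative names.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in for the real simulation: sleep for the number of
# seconds given on the command line, then print a cost.
SIMULATION = "import sys, time; time.sleep(float(sys.argv[1])); print(42.0)"


def run_one(delay, timeout=5.0):
    """Run one simulation in its own interpreter; return None on timeout."""
    cmd = [sys.executable, "-c", SIMULATION, str(delay)]
    try:
        # When the timeout expires, subprocess.run kills the child,
        # waits for it, and raises TimeoutExpired.
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        return None
    return float(done.stdout)


def run_all(delays, timeout=5.0, workers=4):
    # Threads are enough here: each one only waits on a child process.
    with ThreadPool(workers) as pool:
        return pool.map(lambda d: run_one(d, timeout), delays)


if __name__ == "__main__":
    # The middle entry "hangs" for 60 seconds and is cut off by the timeout.
    print(run_all([0, 60, 0]))
```

A path that times out shows up as None in the result list, so the remaining paths are unaffected.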

Raise exception if script fails

I have a python script, tutorial.py. I want to run this script from a file test_tutorial.py, which is within my python test suite. If tutorial.py executes without any exceptions, I want the test to pass; if any exceptions are raised during execution of tutorial.py, I want the test to fail.
Here is how I am writing test_tutorial.py, which does not produce the desired behavior:
from os import system
test_passes = False
try:
    system("python tutorial.py")
    test_passes = True
except:
    pass
assert test_passes
I find that the above control flow is incorrect: if tutorial.py raises an exception, then the assert line never executes.
What is the correct way to test if an external script raises an exception?
If there is no error, s will be 0:
from os import system
s = system("python tutorial.py")
assert s == 0
Or use subprocess:
from subprocess import PIPE, Popen
s = Popen(["python", "tutorial.py"], stderr=PIPE)
_, err = s.communicate() # err will be empty string if the program runs ok
assert not err
Your try/except catches nothing from the tutorial file, because an exception in the child process is never raised in the calling process. You can move everything outside of it and it will behave the same:
from os import system
test_passes = False
s = system("python tutorial.py")
test_passes = True
assert test_passes
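Since a script that dies from an uncaught exception exits with a non-zero status, checking the return code is usually more reliable than checking whether stderr is empty (a script may print warnings there and still succeed). A minimal Python 3 sketch; run_python and check_tutorial_runs are hypothetical helper names, and the latter would be renamed to match the test suite's naming convention.

```python
import subprocess
import sys


def run_python(args):
    """Run the current interpreter with the given arguments and capture output."""
    return subprocess.run([sys.executable, *args],
                          capture_output=True, text=True)


def check_tutorial_runs():
    # "tutorial.py" is the script from the question.
    result = run_python(["tutorial.py"])
    # An uncaught exception makes the interpreter exit with a non-zero
    # status; stderr is attached as the assertion message for debugging.
    assert result.returncode == 0, result.stderr
```

Using sys.executable keeps the child on the same interpreter as the test suite instead of whatever "python" resolves to on the PATH.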
from os import system
test_passes = False
try:
    system("python tutorial.py")
    test_passes = True
except:
    pass
finally:
    assert test_passes
This is going to solve your problem.
The finally block runs whether or not an error is raised. Check this for more information. It is typically used for file handling when the file is not opened in a with open() block, to make sure the file is closed safely.

Failures with Python multiprocessing.Pool when maxtasksperchild is set

I am using Python 2.7.8 on Linux and am seeing a consistent failure in a program that uses multiprocessing.Pool(). When I set maxtasksperchild to None, then all is well, when testing across a variety of values for processes. But if I set maxtasksperchild=n (n>=1), then I invariably end with an uncaught exception. Here is the main block:
if __name__ == "__main__":
    options = parse_cmdline()
    subproc = Sub_process(options)
    lock = multiprocessing.Lock()
    [...]
    pool = multiprocessing.Pool(processes=options.processes,
                                maxtasksperchild=options.maxtasksperchild)
    imap_it = pool.imap(recluster_block, subproc.input_block_generator())
    #import pdb; pdb.set_trace()
    for count, result in enumerate(imap_it):
        print "Count = {}".format(count)
        if result is None or len(result) == 0:
            # presumably error was reported
            continue
        (interval, block_id, num_hpcs, num_final, retlist) = result
        for c in retlist:
            subproc.output_cluster(c, lock)
    print "About to close_outfile."
    subproc.close_outfile()
    print "About to close pool."
    pool.close()
    print "About to join pool."
    pool.join()
For debugging I have added a print statement showing the number of times through the loop. Here are a couple runs:
$ $prog --processes=2 --maxtasksperchild=2
Count = 0
Count = 1
Count = 2
Traceback (most recent call last):
File "[...]reclustering.py", line 821, in <module>
for count, result in enumerate(imap_it):
File "[...]/lib/python2.7/multiprocessing/pool.py", line 659, in next
raise value
TypeError: 'int' object is not callable
$ $prog --processes=2 --maxtasksperchild=1
Count = 0
Count = 1
Traceback (most recent call last):
[same message as above]
If I do not set maxtasksperchild, the program runs to completion successfully. Also, if I uncomment the "import pdb; pdb.set_trace()" line and enter the debugger, then the problem does not appear (Heisenbug). So, am I doing something wrong in the code here? Are there conditions on the code that generates the input (subproc.input_block_generator) or the code that processes it (recluster_block), that are known to cause issues like this? Thanks!
maxtasksperchild causes multiprocessing to respawn child processes. The idea is to get rid of any cruft that is building up. The problem is, you can get new cruft from the parent. When the child respawns, it gets the current state of the parent process, which is different from the state at the original spawn. You are doing your work in the script's global namespace, so you are changing the environment the child will see quite a bit. Specifically, you use a variable called 'count' that masks a previous 'from itertools import count' statement.
To fix this:
use namespaces (itertools.count, like you said in the comment) to reduce name collisions
do your work in a function so that local variables aren't propagated to the child.
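The two fixes can be illustrated without any pool at all, since the underlying issue is plain name lookup. In this hedged Python 3 sketch, take_ids_fragile and take_ids_robust are hypothetical stand-ins for code inside recluster_block that calls count(); only the namespace mechanics are shown, not the respawning itself.

```python
import itertools
from itertools import count


def take_ids_fragile(n):
    # Looks up the global name `count` each time it is called, so it
    # breaks if the module later rebinds `count` to something else.
    return [i for i, _ in zip(count(), range(n))]


def take_ids_robust(n):
    # The qualified name is unaffected by a global variable named `count`.
    return [i for i, _ in zip(itertools.count(), range(n))]


def main():
    # Inside a function, `count` is a local variable: the module-level
    # name imported from itertools is left untouched.
    for count, item in enumerate("abc"):
        pass
    return take_ids_fragile(3), take_ids_robust(3)


if __name__ == "__main__":
    print(main())
    # The same loop at module level rebinds the global `count` to an int.
    for count, item in enumerate("abc"):
        pass
    try:
        take_ids_fragile(3)
    except TypeError as exc:
        print("fragile version now fails:", exc)
    print(take_ids_robust(3))
```

A worker forked after the module-level loop has run inherits the rebound name, which is consistent with the failure appearing only once maxtasksperchild forces a respawn.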

Python multiprocessing: synchronizing file-like object

I'm trying to make a file-like object which is meant to be assigned to sys.stdout/sys.stderr during testing to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case errors.
Here is my current implementation.
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
from os import getpid

class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """
    def __init__(self):
        # per advice at:
        #    http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return

        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write( '%s wrote: %r\n' % (pid, cache[pid]) )

    def write(self, data):
        self.__queue.put((getpid(), data))

    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        "meaningless"
        pass
... and a quick test script:
#!/usr/bin/python2.6
from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append( Process(target=printer, args=(i,), name='printer') )

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()
This works perfectly 95% of the time, but it has three edge-case problems. I have to run the test script in a fast while-loop to reproduce these.
3% of the time, the parent process output isn't completely reflected. I assume this is because the data is being consumed before the Queue-flushing thread can catch up. I haven't thought of a way to wait for the thread without deadlocking.
.5% of the time, there's a traceback from the multiprocessing.Queue implementation
.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
In the very worst case (odds: one in 70 million), the output would look like this:
START
DONE
302 wrote: '19\n'
32731 wrote: '0 1 2 3 4 5 6 7 8 '
32732 wrote: '0\n'
32734 wrote: '1\n'
32735 wrote: '2\n'
32736 wrote: '3\n'
32737 wrote: '4\n'
32738 wrote: '5\n'
32743 wrote: '6\n'
32744 wrote: '7\n'
32745 wrote: '8\n'
32749 wrote: '9\n'
32751 wrote: '10\n'
32752 wrote: '11\n'
32753 wrote: '12\n'
32754 wrote: '13\n'
32756 wrote: '14\n'
32757 wrote: '15\n'
32759 wrote: '16\n'
32760 wrote: '17\n'
32761 wrote: '18\n'
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
File "/usr/lib/python2.6/threading.py", line 484, in run
File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
In python2.7 the exception is slightly different:
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 505, in run
File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
<type 'exceptions.IOError'>: [Errno 32] Broken pipe
How do I get rid of these edge cases?
The solution came in two parts. I've successfully run the test program 200 thousand times without any change in output.
The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not a part of the published API, but it is a unique, deterministic identifier of each process. This fixed the problem with PIDs wrapping around and giving a bad ordering of output.
The other part of the solution was to use multiprocessing.Manager().Queue() rather than the multiprocessing.Queue. This fixes problem #2 above because the manager lives in a separate Process, and so avoids some of the bad special cases when using a Queue from the owning process. #3 is fixed because the Queue is fully exhausted and the feeder thread dies naturally before python starts shutting down and closes stdin.
I have encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. Having said this, the solution I used to avoid the "Exception in thread QueueFeederThread" problem is to sleep momentarily, possibly for 0.01s, in each process in which the Queue is used. It is true that using sleep is not desirable or even reliable, but the specified duration was observed to work sufficiently well in practice for me. You can also try 0.1s.
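The accepted approach (a manager-backed queue, with messages ordered by the process identity instead of the PID) might look roughly like this in Python 3. This is a sketch, not the original author's code: QueueFile and printer are hypothetical names, the queue is passed in so that any queue-like object can be used, and _identity remains private API, as the answer points out.

```python
import multiprocessing
import queue
from collections import defaultdict


class QueueFile:
    """Write-only file-like object: any process may write, the creator reads."""

    def __init__(self, shared_queue):
        self._queue = shared_queue
        self._chunks = defaultdict(str)

    def write(self, data):
        # _identity is private API: an empty tuple in the main process and
        # a tuple of counters, assigned in creation order, in children.
        ident = multiprocessing.current_process()._identity
        self._queue.put((ident, data))

    def flush(self):
        pass  # nothing is buffered locally

    def getvalue(self):
        # Drain everything queued so far, then group the output by writer.
        while True:
            try:
                ident, data = self._queue.get_nowait()
            except queue.Empty:
                break
            self._chunks[ident] += data
        return "".join(f"{ident} wrote: {text!r}\n"
                       for ident, text in sorted(self._chunks.items()))


def printer(out, msg):
    out.write(f"{msg}\n")


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        out = QueueFile(manager.Queue())
        workers = [multiprocessing.Process(target=printer, args=(out, i))
                   for i in range(5)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(out.getvalue(), end="")
```

Because the manager owns the queue in its own process, a put() has completed by the time the writer exits, so reading after join() sees every message.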
