I am using Python 2.7.8 on Linux and am seeing a consistent failure in a program that uses multiprocessing.Pool(). When maxtasksperchild is left at None, all is well when testing across a variety of values for processes. But if I set maxtasksperchild=n (n >= 1), the run invariably ends with an uncaught exception. Here is the main block:
if __name__ == "__main__":
    options = parse_cmdline()
    subproc = Sub_process(options)
    lock = multiprocessing.Lock()
    [...]
    pool = multiprocessing.Pool(processes=options.processes,
                                maxtasksperchild=options.maxtasksperchild)
    imap_it = pool.imap(recluster_block, subproc.input_block_generator())
    #import pdb; pdb.set_trace()
    for count, result in enumerate(imap_it):
        print "Count = {}".format(count)
        if result is None or len(result) == 0:
            # presumably error was reported
            continue
        (interval, block_id, num_hpcs, num_final, retlist) = result
        for c in retlist:
            subproc.output_cluster(c, lock)
    print "About to close_outfile."
    subproc.close_outfile()
    print "About to close pool."
    pool.close()
    print "About to join pool."
    pool.join()
For debugging I have added a print statement showing the number of times through the loop. Here are a couple of runs:
$ $prog --processes=2 --maxtasksperchild=2
Count = 0
Count = 1
Count = 2
Traceback (most recent call last):
  File "[...]reclustering.py", line 821, in <module>
    for count, result in enumerate(imap_it):
  File "[...]/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
TypeError: 'int' object is not callable
$ $prog --processes=2 --maxtasksperchild=1
Count = 0
Count = 1
Traceback (most recent call last):
[same message as above]
If I do not set maxtasksperchild, the program runs to completion successfully. Also, if I uncomment the "import pdb; pdb.set_trace()" line and enter the debugger, the problem does not appear (a Heisenbug). So, am I doing something wrong in the code here? Are there conditions on the code that generates the input (subproc.input_block_generator) or the code that processes it (recluster_block) that are known to cause issues like this? Thanks!
maxtasksperchild causes multiprocessing to respawn child processes. The idea is to get rid of any cruft that has built up. The problem is that you can pick up new cruft from the parent: when the child respawns, it gets the current state of the parent process, which is different from the state at the original spawn. You are doing your work in the script's global namespace, so you are changing the environment the child will see quite a bit. Specifically, you use a variable called count that masks an earlier "from itertools import count" statement.
To fix this (see the sketch after this list):
- use namespaces (itertools.count, as you said in the comment) to reduce name collisions
- do your work in a function so that local variables aren't propagated to the child
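Here is a minimal, hypothetical sketch of both fixes; main and the worker body are stand-ins for the OP's code, which isn't shown in full:

import itertools
import multiprocessing

def recluster_block(block):
    # stand-in for the OP's worker function
    return block

def main():
    # all work happens in a local scope, so the module globals a
    # respawned child inherits stay clean
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=2)
    imap_it = pool.imap(recluster_block, range(10))
    # refer to itertools.count rather than rebinding a bare name 'count'
    for n, result in itertools.izip(itertools.count(), imap_it):
        print "Count = {}".format(n)
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()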
Related
I'm running PyCharm Community Edition 4.0.4
Does anyone know why the error messages don't display after the console output?
Thanks
C:\Python27\python.exe "F:/Google Drive/code/python_scripts/leetcode/lc_127_word_ladder.py"
Traceback (most recent call last):
START
File "F:/Google Drive/code/python_scripts/leetcode/lc_127_word_ladder.py", line 68, in <module>
print sol.ladderLength('talk', 'tail', set)
Graph:
File "F:/Google Drive/code/python_scripts/leetcode/lc_127_word_ladder.py", line 54, in ladderLength
hall ['fall']
for item in graph[node[0]]:
fall ['hall']
KeyError: 'talk'
End Graph:
Visited = {'talk': 0}
Node = ['talk', 0]
Queue Before = deque([])
Process finished with exit code 1
If you'll notice, print statements such as START, Graph:, hall ['fall'], up to Queue Before = deque([]) all happen within the functioning part of my code. The error messages should appear after all of this.
This is caused by PyCharm mixing output from stdout and stderr. There's a fix: add the following line to your idea.properties file:
output.reader.blocking.mode=true
Get to idea.properties via Help | Edit Custom Properties.
This might just be an issue with stdout buffering. A workaround would be to call sys.stdout.flush() after your print statements:
import sys
do_something()
print("Your print statement")
sys.stdout.flush()
I'm new to PyCharm, so I'm not sure if there's a clean way to do this. But as a workaround, you could replace the print function with a custom one that sleeps briefly after printing; then your traceback should appear after your output. (Note: rebinding print like this works on Python 3, or on Python 2 only with from __future__ import print_function, since print is otherwise a statement.)
import time
print = (lambda p: lambda *args, **kwargs: [p(*args, **kwargs), time.sleep(.01)])(print)
'''
# the above is just a one-liner equivalent to this decorator
def add_sleep(p):
    def new_p(*args, **kwargs):
        p(*args, **kwargs)
        time.sleep(.01)
    return new_p
print = add_sleep(print)
'''
I am trying to learn more about the thread module. I've come up with a quick script but am getting an error when I run it. The docs show the format as:
thread.start_new_thread ( function, args[, kwargs] )
My method only has one argument.
#!/usr/bin/python

import ftplib
import thread

sites = ["ftp.openbsd.org", "ftp.ucsb.edu", "ubuntu.osuosl.org"]

def ftpconnect(target):
    ftp = ftplib.FTP(target)
    ftp.login()
    print "File list from: %s" % target
    files = ftp.dir()
    print files

for i in sites:
    thread.start_new_thread(ftpconnect(i))
The error I am seeing occurs after one iteration of the for loop:
Traceback (most recent call last):
  File "./ftpthread.py", line 16, in <module>
    thread.start_new_thread(ftpconnect(i))
TypeError: start_new_thread expected at least 2 arguments, got 1
Any suggestions for this learning process would be appreciated. I also looked into using threading, but I am unable to import it; apparently it's not installed, and I haven't found any documentation for installing that module yet. Thank you!
The error I get when trying to import threading on my Mac is:
>>> import threading
# threading.pyc matches threading.py
import threading # precompiled from threading.pyc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "threading.py", line 7, in <module>
    class WorkerThread(threading.Thread) :
AttributeError: 'module' object has no attribute 'Thread'
The thread.start_new_thread function is really low-level and doesn't give you a lot of control. Take a look at the threading module, more specifically the Thread class: http://docs.python.org/2/library/threading.html#thread-objects
You then want to replace the last 2 lines of your script with:
# This line should be at the top of your file, obviously :p
from threading import Thread

threads = []
for i in sites:
    t = Thread(target=ftpconnect, args=[i])
    threads.append(t)
    t.start()

# Wait for all the threads to complete before exiting the program.
for t in threads:
    t.join()
Your code was failing, by the way, because in your for loop you were calling ftpconnect(i), waiting for it to complete, and then trying to use its return value (that is, None) to start a new thread, which obviously doesn't work.
In general, you start a thread by giving it a callable object (you want the callable itself, not the result of a call: my_function, not my_function()), plus optional arguments for that callable (in our case, [i], because ftpconnect takes one positional argument and you want it to be i), and then calling the Thread object's start method.
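In other words, a minimal illustration of the difference:

# wrong: calls ftpconnect(i) immediately and passes its
# return value (None) as the thread's target
t = Thread(target=ftpconnect(i))

# right: passes the function itself, plus its argument
t = Thread(target=ftpconnect, args=[i])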
Now that you can import threading, start with best practices at once ;-)
import threading

threads = [threading.Thread(target=ftpconnect, args=(s,))
           for s in sites]
for t in threads:
    t.start()
for t in threads:  # shut down cleanly
    t.join()
What you want is to pass the function object and its arguments to thread.start_new_thread, not to execute the function yourself.
Like this:
for i in sites:
    thread.start_new_thread(ftpconnect, (i,))
I enforce a timeout for a block of code using the multiprocessing module. It appears that with inputs of a certain size, the following error is raised:
WindowsError: [Error 5] Access is denied
I can replicate this error with the following code. Note that the code completes with a list of 467,912,040 elements but not with 517,912,040.
import multiprocessing, Queue

def wrapper(queue, lst):
    lst.append(1)
    queue.put(lst)
    queue.close()

def timeout(timeout, lst):
    q = multiprocessing.Queue(1)
    proc = multiprocessing.Process(target=wrapper, args=(q, lst))
    proc.start()
    try:
        result = q.get(True, timeout)
    except Queue.Empty:
        return None
    finally:
        proc.terminate()
    return result

if __name__ == "__main__":
    # lst = [0]*417912040 # this works fine
    # lst = [0]*467912040 # this works fine
    lst = [0] * 517912040 # this does not
    print "List length:", len(lst)
    timeout(60*30, lst)
The output (including error):
List length: 517912040
Traceback (most recent call last):
  File ".\multiprocessing_error.py", line 29, in <module>
    print "List length:",len(lst)
  File ".\multiprocessing_error.py", line 21, in timeout
    proc.terminate()
  File "C:\Python27\lib\multiprocessing\process.py", line 137, in terminate
    self._popen.terminate()
  File "C:\Python27\lib\multiprocessing\forking.py", line 306, in terminate
    _subprocess.TerminateProcess(int(self._handle), TERMINATE)
WindowsError: [Error 5] Access is denied
Am I not permitted to terminate a Process of a certain size?
I am using Python 2.7 on Windows 7 (64bit).
While I am still uncertain about the precise cause of the problem, I have some additional observations as well as a workaround.
Workaround.
Wrap the call to proc.terminate() in the finally clause in a try/except block:
finally:
    try:
        proc.terminate()
    except WindowsError:
        pass
This also seems to be the solution arrived at in a related (?) issue posted here on GitHub (you may have to scroll down a bit).
Observations.
This error is dependent on the size of the object passed to the Process/Queue, but it is not related to the execution of the Process itself; in the OP, the Process completes before the timeout expires.
proc.is_alive() returns True both before and after the call to proc.terminate() (which then throws the WindowsError). A second or two later, proc.is_alive() returns False, and a second call to proc.terminate() succeeds.
Forcing the main thread to sleep for a second (time.sleep(1)) in the finally block also prevents the WindowsError from being thrown (thanks to @tdelaney's comment on the OP).
My best guess is that proc is in the process of freeing memory (or something comparable) while being killed by the OS (having completed execution) when the call to proc.terminate() attempts to kill it again.
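Putting the workaround and the second observation together, a slightly more defensive cleanup might look like the sketch below. This is my own illustration, not the OP's code; safe_terminate, the retry count, and the delay are hypothetical choices:

import time

def safe_terminate(proc, retries=3, delay=1.0):
    # retry terminate(): a second call succeeds once the OS has
    # finished tearing the process down
    for _ in range(retries):
        try:
            proc.terminate()
            return True
        except WindowsError:
            time.sleep(delay)
    return not proc.is_alive()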
I'm having a problem with the Python multiprocessing package. Below is a simple example code that illustrates my problem.
import multiprocessing as mp
import time

def test_file(f):
    f.write("Testing...\n")
    print f.name
    return None

if __name__ == "__main__":
    f = open("test.txt", 'w')
    proc = mp.Process(target=test_file, args=[f])
    proc.start()
    proc.join()
When I run this, I get the following error.
Process Process-1:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self.target(*self._args, **self._kwargs)
  File "C:\Users\Ray\Google Drive\Programming\Python\tests\follow_test.py", line 24, in test_file
    f.write("Testing...\n")
ValueError: I/O operation on closed file
Press any key to continue . . .
It seems that somehow the file handle is 'lost' during the creation of the new process. Could someone please explain what's going on?
I had similar issues in the past. I'm not sure whether it is done within the multiprocessing module or whether open sets the close-on-exec flag by default, but I know for sure that file handles opened in the main process are closed in the multiprocessing children.
The obvious workaround is to pass the filename as a parameter to the child process's init function and open it once within each child (if using a pool), or to pass it as a parameter to the target function and open/close it on each invocation. The former requires the use of a global to store the file handle (not a good thing), unless someone can show me how to avoid that :); the latter can incur a performance hit (but can be used with multiprocessing.Process directly). An example of each follows.
Example of the former:
filehandle = None

def child_init(filename):
    global filehandle
    filehandle = open(filename, ...)
    ../..

def child_target(args):
    ../..

if __name__ == '__main__':
    # some code which defines filename
    proc = multiprocessing.Pool(processes=1, initializer=child_init, initargs=[filename])
    proc.apply(child_target, args)
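And a sketch of the latter: pass the filename to the target function and open/close the file on each invocation (my own illustration of the approach, not code from the original answer):

import multiprocessing

def child_target(filename, msg):
    # open, write, and close on every invocation
    with open(filename, 'a') as f:
        f.write(msg)

if __name__ == '__main__':
    proc = multiprocessing.Process(target=child_target,
                                   args=("test.txt", "Testing...\n"))
    proc.start()
    proc.join()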
I'm trying to make a file-like object that can be assigned to sys.stdout/sys.stderr during testing to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case errors.
Here is my current implementation.
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
from os import getpid

class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """
    def __init__(self):
        # per advice at:
        # http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return
        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (pid, cache[pid]))

    def write(self, data):
        self.__queue.put((getpid(), data))

    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        "meaningless"
        pass
... and a quick test script:
#!/usr/bin/python2.6

from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append(Process(target=printer, args=(i,), name='printer'))

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()
This works perfectly 95% of the time, but it has three edge-case problems. I have to run the test script in a fast while-loop to reproduce these:
1. 3% of the time, the parent process's output isn't completely reflected. I assume this is because the data is being consumed before the Queue-flushing thread can catch up. I haven't thought of a way to wait for the thread without deadlocking.
2. 0.5% of the time, there's a traceback from the multiprocessing.Queue implementation.
3. 0.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
In the very worst case (odds: one in 70 million), the output would look like this:
START
DONE
302 wrote: '19\n'
32731 wrote: '0 1 2 3 4 5 6 7 8 '
32732 wrote: '0\n'
32734 wrote: '1\n'
32735 wrote: '2\n'
32736 wrote: '3\n'
32737 wrote: '4\n'
32738 wrote: '5\n'
32743 wrote: '6\n'
32744 wrote: '7\n'
32745 wrote: '8\n'
32749 wrote: '9\n'
32751 wrote: '10\n'
32752 wrote: '11\n'
32753 wrote: '12\n'
32754 wrote: '13\n'
32756 wrote: '14\n'
32757 wrote: '15\n'
32759 wrote: '16\n'
32760 wrote: '17\n'
32761 wrote: '18\n'
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
  File "/usr/lib/python2.6/threading.py", line 484, in run
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
In Python 2.7 the exception is slightly different:
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 505, in run
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
<type 'exceptions.IOError'>: [Errno 32] Broken pipe
How do I get rid of these edge cases?
The solution came in two parts. I've successfully run the test program 200 thousand times without any change in output.
The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not part of the published API, but it is a unique, deterministic identifier for each process, which fixed the problem with PIDs wrapping around and giving a bad ordering of output.
The other part of the solution was to use multiprocessing.Manager().Queue() rather than the multiprocessing.Queue. This fixes problem #2 above, because the manager lives in a separate process and so avoids some of the bad special cases that arise when using a Queue from the owning process. #3 is fixed because the Queue is fully exhausted and the feeder thread dies naturally before Python starts shutting down and closes stdin.
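For concreteness, here is a sketch of MultiProcessFile with both changes applied. It is my own condensed adaptation of the class above, not the poster's exact code:

from multiprocessing import Manager, current_process
from cStringIO import StringIO
from Queue import Empty

class MultiProcessFile(object):
    def __init__(self):
        self.__master = current_process()._identity
        # keep a reference to the Manager so its process stays alive;
        # it lives in a separate process, avoiding the feeder-thread
        # shutdown races of a plain multiprocessing.Queue
        self.__manager = Manager()
        self.__queue = self.__manager.Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def write(self, data):
        # _identity is undocumented but unique and deterministic,
        # unlike PIDs, which can wrap around
        self.__queue.put((current_process()._identity, data))

    def getvalue(self):
        if current_process()._identity != self.__master:
            return ''
        cache = {}
        while True:
            try:
                ident, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[ident] = cache.get(ident, '') + data
        for ident in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (ident, cache[ident]))
        return self.__buffer.getvalue()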
I have encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. Having said that, the solution I used to avoid the "Exception in thread QueueFeederThread" problem was to sleep momentarily, possibly for 0.01 s, in each process in which the Queue is used. It is true that using sleep is not desirable or even reliable, but that duration was observed to work sufficiently well in practice for me. You can also try 0.1 s.
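As a sketch of that workaround (my own illustration, using the printer function from the test script above):

import time

def printer(msg):
    print msg
    # brief sleep so the Queue's feeder thread can drain before this
    # process starts shutting down; 0.01-0.1 s worked in practice
    time.sleep(0.01)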