Iterate over list from leading and trailing with multiprocessing - python

I want to iterate over a list with 2 function using multiprocessing one function iterate over the main_list from leading and other from trailing, I want this function each time that iterates over the sample list (g) put the element in main list till one of them find a duplicate in list then I want the terminate both processes and return the seen elements.
I expect that the first process return :
['a', 'b', 'c', 'd', 'e', 'f']
And the second return :
['l', 'k', 'j', 'i', 'h', 'g']
this is my code that returns an Error:
from multiprocessing import Process, Manager
manager = Manager()
d = manager.list()
# Fn definitions and such
def a(main_path,g,l=[]):
for i in g:
l.append(i)
print 'a'
if i in main_path:
return l
main_path.append(i)
def b(main_path,g,l=[]):
for i in g:
l.append(i)
print 'b'
if i in main_path:
return l
main_path.append(i)
g=['a','b','c','d','e','f','g','h','i','j','k','l']
g2=g[::-1]
p1 = Process(target=a, args=(d,g))
p2 = Process(target=b, args=(d,g2))
p1.start()
p2.start()
And this is the Traceback:
a
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/bluebird/Desktop/persiantext.py", line 17, in a
if i in main_path:
File "<string>", line 2, in __contains__
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
b
c = SocketClient(address)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
s.connect(address)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/bluebird/Desktop/persiantext.py", line 27, in b
if i in main_path:
File "<string>", line 2, in __contains__
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
s.connect(address)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Note that i have not any idea that how terminate both processes after that one of them find a duplicated element!!

There are all kinds of other problems in your code, but since I already explained them on your other question, I won't get into them here.
The new problem is that you're not joining your child processes. In your threaded version, this wasn't an issue just because your main thread accidentally had a "block forever" before the end. But here, you don't have that, so the main process reaches the end of the script while the background processes are still running.
When this happens, it's not entirely defined what your code will do.* But basically, you're destroying the manager object, which shuts down the manager server while the background processes are still using it, so they're going to raise exceptions the next time they try to access a managed object.
The solution is to add p1.join() and p2.join() to the end of your script.
But that really only gets you back to the same situation as your threaded code (except not blocking forever at the end). You've still got code that's completely serialized, and a big race condition, and so on.
If you're curious why this happens:
At the end of the script, all of your module's globals go out of scope.** Since those variables are the only reference you have to the manager and process objects, those objects get garbage-collected, and their destructors get called.
For a manager object, the destructor shuts down the server.
For a process object, I'm not entirely sure, but I think the destructor does nothing (rather than join it and/or interrupt it). Instead, there's an atexit function, that runs after all of the destructors, that joins any still-running processes.***
So, first the manager goes away, then the main process starts waiting for the children to finish; the next time each one tries to access a managed object, it fails and exits. Once all of them do that, the main process finishes waiting and exits.
* The multiprocessing changes in 3.2 and the shutdown changes in 3.4 make things a lot cleaner, so if we weren't talking about 2.7, there would be less "here's what usually happens but not always" and "here's what happens in one particular implementation on one particular platform".
** This isn't actually guaranteed by 2.7, and garbage-collecting all of the modules' globals doesn't always happen. But in this particular simple case, I'm pretty sure it will always work this way, at least in CPython, although I don't want to try to explain why.
*** That's definitely how it works with threads, at least on CPython 2.7 on Unix… again, this isn't at all documented in 2.x, so you can only tell by reading the source or experimenting on the platforms/implementations/versions that matter to you… And I don't want to track this through the source unless there's likely to be something puzzling or interesting to find.

Related

Implementation details of Python multiprocessing fork vs spawn

According to the multiprocessing documentation on picklability, it states
Picklability
Ensure that the arguments to the methods of proxies are picklable.
More picklability
Ensure that all arguments to Process.init() are picklable. Also, if you subclass Process then make sure that instances will be picklable when the Process.start method is called.
I think it basically means that whatever is sent through arguments of Process will be pickled/unpickled.
But in Better to inherit than pickle/unpickle session, it states
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
I conducted the experiment which shows the output Read successfully..
def read_dataset(dataset, window):
return dataset.read(window=window)
if __name__ == "__main__":
mp.set_start_method("fork")
with rasterio.open(Path("test.tiff").absolute()) as dataset:
window = Window(col_off=0, row_off=0, width=100, height=100)
p1 = mp.Process(target=read_dataset, args=(dataset, window))
p1.start()
p1.join()
print("Read successfully.")
But when changing to mp.set_start_method("spawn"), it shows the error below.
Traceback (most recent call last):
File "test.py", line 88, in <module>
p1.start()
File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "stringsource", line 2, in rasterio._io.DatasetReaderBase.__reduce_cython__
TypeError: self._hds cannot be converted to a Python object for pickling
My question is the following.
When a child process is generated with fork, the variable is inherited instead of pickled/unpickled. But when a child process is generated with spawn, then the arguments are sent through pickling/unpickling. Where can I find such implementation detail? Thanks.
multiprocessing context
popen fork
spawn fork
fork shares a value in memory and starts it.
spawn is implemented by creating a cmd for the source code and sharing some variables through a pipe.
If you edit and save the source code just before spawning, you can get the result of the modified code.

HTTP request concurrency and reverse proxy

Apologies if the title is not descriptive and I struggle to find a good title for the question.
My question involves Python, CouchDB (to a lesser degree), multiprocessing and networking. It started out as I was trying to debug a co-worker's program using Python's multiprocessing module to parallelize requests to a CouchDB database using couchdb-python. I created a minimal program to exhibit the bug and eventually solved the issue, but the solution drew another question which I was not able to answer to my best knowledge. I'm hoping experts on SO could help me with this, so here it goes.
The premise of the problem is pretty simple. We have n resources, all of which can be retrieved concurrently. Instead of making n serial requests, my co-worker is using the multiprocessing module to fetch all n resources in parallel. Here's a program I wrote to demonstrate the issue:
The Script (bug.py)
import couchdb
import multiprocessing
server = couchdb.Server(SERVER)
try:
database = server.create('test')
except:
server.delete('test')
database = server.create('test')
database.save({'_id': '1', 'type': 'dog', 'name': 'chase'})
database.save({'_id': '2', 'type': 'dog', 'name': 'rubble'})
database.save({'_id': '3', 'type': 'cat', 'name': 'kali'})
def query_id(id):
print(dict(database[id]))
def main():
args = [
['dog', 'chase'],
['dog', 'rubble'],
['cat', 'kali'],
]
print('-' * 80)
processes = []
for id_ in ['1', '2', '3']:
proc = multiprocessing.Process(target=query_id, args=(id_))
processes.append(proc)
proc.start()
for proc in processes:
proc.join()
if __name__ == '__main__':
main()
Pretty innocent code, right? Well, running it on the latest couchdb and couchdb-python gives the following error:
The output
--------------------------------------------------------------------------------
Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
After some digging, I finally found out that couchdb-python's implementation of ConnectionPool is not multiprocess safe. See this PR for more details. Basically, all processes share the same ConnectionPool object, and was given the same httplib.HTTPConnection object, and when they all simultaneously try to read from the connection, the string being returned is garbled, and the bug ensued. You can see the evidence of it if you put print(os.getpid(), line) inside httplib.HTTPResponse._read_status method. Here's a sample output after the print statement is added:
(26490, 'TP1.120 O\r\n')
(26489, 'T/ 0KServer: CouchDB/1.6.1 (Erlang OTP/17)\r\n')
Process Process-2:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
As seen here, the first line being read from the sub-processes are only partial, indicating a race condition here. If I further inspect the HTTPConnection object, I can see all three processes are sharing the same connection object, the socket to the server and the file descriptor from the socket that's being used for reading.
Puzzle
So far so good. I've identified the root cause of the problem and put together a fix. However, complication arises when I put the couchdb instance behind a reverse proxy. In this case, I'm using haproxy. Here's a sample config:
global
...
defaults
...
listen couchdb
bind *:9999
mode http
stats enable
option httpclose
option forwardfor
server couchdb-1 127.0.0.1:5984 check
and point the couchdb server url to http://localhost:9999 in the bug script, reran the script, and everything was fine! I also inspected the connection object, the socket and the file descriptor, and there were also shared among all processes.
This got me puzzled. I brought up mitmproxy and inspected what's going on in the two cases: with or without haproxy.
Without haproxy
When the parallel requests are made without haproxy, I observed in the mitmproxy details tab (I showed a single request, but the timing sequence is the same for all 3 concurrent requests):
The event sequence here suggests a blocking synchronous request.
With haproxy
You can see the sequence here is different from that without haproxy. Request is considered complete without the server connection being initiated.
Question
I'm not used to working at this low level so I know my knowledge in this regard is pretty lacking here. I want to understand what difference did putting haproxy in front of it brought that subverted the multiprocessing bug in couchdb-python? haproxy is event-based, so I suspect that has something to do with it, but would really appreciate someone explaining the difference!
Thanks a bunch in advance!

How can I catch a memory error in a spawned thread?

I've never used the multiprocessing library before, so all advice is welcome..
I've got a python program that uses the multiprocessing library to do some memory-intensive tasks in multiple processes, which occasionally runs out of memory (I'm working on optimizations, but that's not what this question is about). Sometimes, an out-of-memory error gets thrown in a way that I can't seem to catch (output below), and then the program hangs on pool.join() (I'm using multiprocessing.Pool. How can I make the program do something other than indefinitely wait when this problem occurs?
Ideally, The memory error is propagated back to the main process which then dies.
Here's the memory error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
pool._maintain_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 229, in _maintain_pool
self._repopulate_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 222, in _repopulate_pool
w.start()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
And here's where i manage multiprocessing:
mp_pool = mp.Pool(processes=num_processes)
mp_results = list()
for datum in input_data:
data_args = {
'value': 0 // actually some other simple dict key/values
}
mp_results.append(mp_pool.apply_async(_process_data, args=(common_args, data_args)))
frame_pool.close()
frame_pool.join() // hangs here when that thread dies..
for result_async in mp_results:
result = result_async.get()
// do stuff to collect results
// rest of the code
When I interrupt the hanging program, I get:
Process process_003:
Traceback (most recent call last):
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/queues.py", line 374, in get
return recv()
racquire()
KeyboardInterrupt
This is actually a known bug in python's multiprocessing module, fixed in python 3 (here's a summarizing blog post I found). There's a patch attached to python issue 22393, but that hasn't been officially applied.
Basically, if one of a multiprocess pool's sub-processes die unexpectedly (out of memory, killed externally, etc.), the pool will wait indefinitely.

Python multiprocessing lock mechanism failing when acquired lock

I am trying to implement a multiprocessing application which can access a shared data resource. I am using locking mechanism to make sure the shared resource is accessed safely. However I am hitting error . Surprisingly if process 1 acquires lock first it is servicing the request and it is failing on next process which is trying to acquire lock.But if some other process other than 1 is trying to acquire lock first it is failing in very first run . I am new to python and using documentation to implement this So I am unaware if I am missing any basic safety mechanisms here.Any data point as why I am witnessing this would be of great help
PROGRAM:
#!/usr/bin/python
from multiprocessing import Process, Manager, Lock
import os
import Queue
import time
lock = Lock()
def launch_worker(d,l,index):
global lock
lock.acquire()
d[index] = "new"
print "in process"+str(index)
print d
lock.release()
return None
def dispatcher():
i=1
d={}
mp = Manager()
d = mp.dict()
d[1] = "a"
d[2] = "b"
d[3] = "c"
d[4] = "d"
d[5] = "e"
l = mp.list(range(10))
for i in range(4):
p = Process(target=launch_worker, args=(d,l,i))
i = i+1
p.start()
return None
if __name__ == '__main__':
dispatcher()
ERROR when process 1 is serviced first
in process0
{0: 'new', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "dispatcher.py", line 10, in launch_worker
d[index] = "new"
File "<string>", line 2, in __setitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
ERROR when process 2 is serviced first
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "dispatcher.py", line 10, in launch_worker
d[index] = "new"
File "<string>", line 2, in __setitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 150, in Client
deliver_challenge(c, authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 373, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
IOError: [Errno 104] Connection reset by peer
The dict your workers modify is a shared object managed by the dispatching process; modifications to that object by the workers requires that they communicate with the dispatching process. The errors you see come from the fact that your dispatcher isn't waiting for the worker processes after it launches them; it's exiting too soon, so it might not exist for them to communicate with when they need to.
The first worker or two that attempts to update the shared dict might succeed, because when they modify the shared dict the process containing the Manager instance might still exist (e.g., it might still be in the process of creating further workers). Thus in your examples you see some successful output. But the managing process soon exits, and the next worker that attempts a modification will fail. (The error messages you see are typical of failed attempts at inter-process communication; you'll probably also see EOF errors if you run your program a few more times.)
What you need to do is call the join method on the Process objects as a way of waiting for each of them to exit. The following modification of your dispatcher shows the basic idea:
def dispatcher():
mp = Manager()
d = mp.dict()
d[1] = "a"
d[2] = "b"
d[3] = "c"
d[4] = "d"
d[5] = "e"
procs = []
for i in range(4):
p = Process(target=launch_worker, args=(d,i))
procs.append(p)
p.start()
for p in procs:
p.join()

Error with multiprocessing, atexit and global data

Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit
cleanup_stuff=multi.Manager().list([])
##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted=multi.Manager().list([])
class _interrupt(object):
"""
Toy class to allow retrieval of the interrupt that triggered it's execution
"""
def __init__(self,interrupt):
self.interrupt=interrupt
def interrupt():
was_interrupted.append(1)
def interruptable(func):
"""
decorator to allow functions to be "interruptable" by
a keyboard interrupt when in python's multiprocessing.Pool.map
**Note**, this won't actually cause the Map to be interrupted,
It will merely cause the following functions to be not executed.
"""
def newfunc(*args,**kwargs):
try:
if(not was_interrupted):
return func(*args,**kwargs)
else:
return False
except KeyboardInterrupt as e:
interrupt()
return _interrupt(e) #If we really want to know about the interrupt...
return newfunc
#atexit.register
def cleanup():
for i in cleanup_stuff:
print(i)
return
#interruptable
def func(i):
print(i)
cleanup_stuff.append(i)
time.sleep(float(i)/10.)
return i
#Must wrap func here, otherwise it won't be found in __main__'s dict
#Maybe because it was created dynamically using the decorator?
def wrapper(*args):
return func(*args)
if __name__ == "__main__":
#This is an attempt to use signals -- I also attempted something similar where
#The signals were only caught in the child processes...Or only on the main process...
#
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#Try 2 with signals (only catch signal on main process)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,onSigInt)
#def startup(): signal.signal(signal.SIGINT,signal.SIG_IGN)
#p=multi.Pool(processes=4,initializer=startup)
#Try 3 with signals (only catch signal on child processes)
#import signal
#def onSigInt(*args): interrupt()
#signal.signal(signal.SIGINT,signal.SIG_IGN)
#def startup(): signal.signal(signal.SIGINT,onSigInt)
#p=multi.Pool(processes=4,initializer=startup)
p=multi.Pool(4)
try:
out=p.map(wrapper,range(30))
#out=p.map_async(wrapper,range(30)).get() #This doesn't work either...
#The following lines don't work either
#Effectively trying to roll my own p.map() with p.apply_async
# results=[p.apply_async(wrapper,args=(i,)) for i in range(30)]
# out = [ r.get() for r in results() ]
except KeyboardInterrupt:
print ("Hello!")
out=None
finally:
p.terminate()
p.join()
print (out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try,except,finally clause, it gets rid of the first exception ... the other ones are still raised in atexit however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the updated list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried because the communication sockets between processes were broken for some reason. The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory ... something like tempfile.tempdir=tempfile.mkdtemp() and then register a function to delete the temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.

Categories

Resources