Fixed strptime exception with thread lock, but slows down the program - python

I have the following code, which runs inside a thread (the full code is here - https://github.com/eWizardII/homobabel/blob/master/lovebird.py):
for null in range(0, 1):
    while True:
        try:
            with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='w') as f:
                f.write('[')
                threadLock.acquire()
                for i, seed in enumerate(Cursor(api.user_timeline, screen_name=self.ip).items(200)):
                    if i > 0:
                        f.write(", ")
                    f.write("%s" % (json.dumps(dict(sc=seed.author.statuses_count))))
                    j = j + 1
                threadLock.release()
                f.write("]")
        except tweepy.TweepError, e:
            with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='a') as f:
                f.write("]")
            print "ERROR on " + str(self.ip) + " Reason: ", e
            with open('C:/Twitter/errors_0.txt', mode='a') as a_file:
                new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n"
                a_file.write(new_ii)
        break
Now without the thread lock I generate the following error:
Exception in thread Thread-117:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 530, in __bootstrap_inner
    self.run()
  File "C:/Twitter/homobabel/lovebird.py", line 62, in run
    for i, seed in enumerate(Cursor(api.user_timeline,screen_name=self.ip).items(200)):
  File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 110, in next
    self.current_page = self.page_iterator.next()
  File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 85, in next
    items = self.method(page=self.current_page, *self.args, **self.kargs)
  File "build\bdist.win-amd64\egg\tweepy\binder.py", line 196, in _call
    return method.execute()
  File "build\bdist.win-amd64\egg\tweepy\binder.py", line 182, in execute
    result = self.api.parser.parse(self, resp.read())
  File "build\bdist.win-amd64\egg\tweepy\parsers.py", line 75, in parse
    result = model.parse_list(method.api, json)
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 38, in parse_list
    results.append(cls.parse(api, obj))
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 49, in parse
    user = User.parse(api, v)
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 86, in parse
    setattr(user, k, parse_datetime(v))
  File "build\bdist.win-amd64\egg\tweepy\utils.py", line 17, in parse_datetime
    date = datetime(*(time.strptime(string, '%a %b %d %H:%M:%S +0000 %Y')[0:6]))
  File "C:\Python27\lib\_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "C:\Python27\lib\_strptime.py", line 300, in _strptime
    _TimeRE_cache = TimeRE()
  File "C:\Python27\lib\_strptime.py", line 188, in __init__
    self.locale_time = LocaleTime()
  File "C:\Python27\lib\_strptime.py", line 77, in __init__
    raise ValueError("locale changed during initialization")
ValueError: locale changed during initialization
The problem is that with the thread lock on, each thread basically runs serially, and each loop takes way too long for there to be any advantage to using threads anymore. So if there isn't a way to get rid of the thread lock, is there a way to make the for loop inside the try statement run faster?

According to a previous answer on Stack Overflow, time.strptime is not thread-safe. Unfortunately, the error referenced in that question is different from the error you're experiencing.
Their solution was to call time.strptime prior to initializing any threads, so that subsequent calls to time.strptime in the various threads will work.
I think the same solution may work in your situation after reviewing the _strptime and locale standard library modules. I can't be certain it will work since I can't test your code locally, but I thought I'd provide you with a potential solution.
Let me know if this works.
Edit:
I've done a bit more research: the Python standard library ends up calling setlocale from the C locale.h header. According to the setlocale documentation, it is not thread-safe, and calls to setlocale should occur before any threads are started, as I mentioned previously.
Unfortunately, setlocale is invoked each time you call time.strptime. So, I suggest the following:
Test out the solution laid out earlier: call time.strptime once before starting any threads, then remove the locks (see the sketch below).
If #1 doesn't work, you'll probably need to roll your own thread-safe time.strptime replacement, as mentioned in the Python documentation for the locale module.
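For reference, here is a minimal sketch of option #1, assuming a plain threading.Thread setup; the worker body and the date strings are purely illustrative:

import time
import threading

# Prime strptime's internal locale/regex cache in the main thread, before any
# worker thread starts; the exact date string used here does not matter.
time.strptime("Mon Jan 01 00:00:00 +0000 2018", "%a %b %d %H:%M:%S +0000 %Y")

def worker():
    # Subsequent calls from worker threads reuse the already-built cache.
    parsed = time.strptime("Tue Feb 02 12:30:00 +0000 2021", "%a %b %d %H:%M:%S +0000 %Y")
    print(parsed.tm_year)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()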

The problem you are running into is caused by the missing thread safety of the functions and modules involved.
As you can see here, tweepy is neither re-entrant nor thread-safe. As you can see here, Python's LocaleTime is not either.
For a multi-threaded application like yours, wrap the tweepy API in your own class that synchronizes access (e.g., with an RLock). But do not derive from the tweepy class; use a has-a relationship, keeping the tweepy instance in a private attribute.
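A rough sketch of such a has-a wrapper, assuming you only need a handful of tweepy calls; the class name is illustrative, and this is not a complete wrapper:

import threading

class SynchronizedTwitterApi(object):
    """Owns a tweepy.API instance (has-a) and serializes all access to it."""

    def __init__(self, api):
        self.__api = api                # the wrapped tweepy.API instance, kept private
        self.__lock = threading.RLock()

    def user_timeline(self, *args, **kwargs):
        # Only one thread at a time is allowed to call into tweepy.
        with self.__lock:
            return self.__api.user_timeline(*args, **kwargs)

    # Add one forwarding method like the above for each tweepy call you use.

Keeping the tweepy instance private makes it harder to accidentally bypass the lock from calling code.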

Related

Why does multiprocessing not work when opening a file?

As I am trying out the multiprocessing Pool module, I noticed that it does not work when I am loading / opening any kind of file. The code below works as expected. When I uncomment the two commented-out lines in __init__ (the with open block), the script skips the pool.apply_async method, and loopingTest never runs.
import time
from multiprocessing import Pool

class MultiClass:
    def __init__(self):
        file = 'test.txt'
        # with open(file, 'r') as f:  # This is the culprit
        #     self.d = f
        self.n = 50000000
        self.cases = ['1st time', '2nd time']
        self.multiProc(self.cases)
        print("It's done")

    def loopingTest(self, cases):
        print(f"looping start for {cases}")
        n = self.n
        while n > 0:
            n -= 1
        print(f"looping done for {cases}")

    def multiProc(self, cases):
        test = False
        pool = Pool(processes=2)
        if not test:
            for i in cases:
                pool.apply_async(self.loopingTest, (i,))
        pool.close()
        pool.join()

if __name__ == '__main__':
    start = time.time()
    w = MultiClass()
    end = time.time()
    print(f'Script finished in {end - start} seconds')
You see this behavior because calling apply_async fails when you save the file descriptor (self.d) to your instance. When you call apply_async(self.loopingTest, ...), Python needs to pickle self.loopingTest to send it to the worker process, which also requires pickling self. When you have the open file descriptor saved as a property of self, the pickling fails, because file descriptors can't be pickled. You'll see this for yourself if you use apply instead of apply_async in your sample code. You'll get an error like this:
Traceback (most recent call last):
File "a.py", line 36, in <module>
w = MultiClass()
File "a.py", line 12, in __init__
self.multiProc(self.cases)
File "a.py", line 28, in multiProc
out.get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
put(task)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
You need to change your code to either avoid saving the file descriptor on self and only create it in the worker method (if that's where you need to use it), or use the tools Python provides to control the pickle/unpickle process for your class. Depending on the use case, you can also turn the method you're passing to apply_async into a top-level function, so that self doesn't need to be pickled at all.
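If you do want to keep a file handle on the instance, here is a minimal sketch of the second option (controlling pickling via __getstate__/__setstate__), assuming the worker can simply reopen the file by its path:

class MultiClass:
    def __init__(self, path='test.txt'):
        self.path = path
        self.d = open(path, 'r')        # unpicklable file object

    def __getstate__(self):
        # Drop the file object before pickling; keep only picklable attributes.
        state = self.__dict__.copy()
        del state['d']
        return state

    def __setstate__(self, state):
        # Runs in the worker process: restore attributes and reopen the file.
        self.__dict__.update(state)
        self.d = open(self.path, 'r')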

Flask-RQ2 Redis error: ZADD requires an equal number of values and scores

I have tried implementing the basic Flask-RQ2 setup as per the docs to attempt to write to two separate files concurrently, but I am getting the following Redis error: redis.exceptions.RedisError: ZADD requires an equal number of values and scores when the worker tries to perform a job in the Redis queue.
Here's the full stack trace:
10:20:37: Worker rq:worker:1d0c83d6294249018669d9052fd759eb: started, version 1.2.0
10:20:37: *** Listening on default...
10:20:37: Cleaning registries for queue: default
10:20:37: default: tester('time keeps on slipping') (02292167-c7e8-4040-a97b-742f96ea8756)
10:20:37: Worker rq:worker:1d0c83d6294249018669d9052fd759eb: found an unhandled exception, quitting...
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 515, in work
self.execute_job(job, queue)
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 727, in execute_job
self.fork_work_horse(job, queue)
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 667, in fork_work_horse
self.main_work_horse(job, queue)
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 744, in main_work_horse
raise e
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 741, in main_work_horse
self.perform_job(job, queue)
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 866, in perform_job
self.prepare_job_execution(job, heartbeat_ttl)
File "/usr/local/lib/python3.6/dist-packages/rq/worker.py", line 779, in prepare_job_execution
registry.add(job, timeout, pipeline=pipeline)
File "/usr/local/lib/python3.6/dist-packages/rq/registry.py", line 64, in add
return pipeline.zadd(self.key, {job.id: score})
File "/usr/local/lib/python3.6/dist-packages/redis/client.py", line 1691, in zadd
raise RedisError("ZADD requires an equal number of "
redis.exceptions.RedisError: ZADD requires an equal number of values and scores
My main is this:
#!/usr/bin/env python3
from flask import Flask
from flask_rq2 import RQ
import time
import tester

app = Flask(__name__)
rq = RQ(app)
default_worker = rq.get_worker()
default_queue = rq.get_queue()
tester = tester.Tester()

while True:
    default_queue.enqueue(tester.tester, args=["time keeps on slipping"])
    default_worker.work(burst=True)
    with open('test_2.txt', 'w+') as f:
        data = f.read() + "it works!\n"
    time.sleep(5)

if __name__ == "__main__":
    app.run()
and my tester.py module is thus:
#!/usr/bin/env python3
import time

class Tester:
    def tester(string):
        with open('test.txt', 'w+') as f:
            data = f.read() + string + "\n"
            f.write(data)
        time.sleep(5)
I'm using the following:
python==3.6.7-1~18.04
redis==2.10.6
rq==1.2.0
Flask==1.0.2
Flask-RQ2==18.3
I've also tried the simpler setup from the documentation where you don't specify either queue or worker but rely implicitly on the Flask-RQ2 module defaults... Any help with this would be greatly appreciated.
Reading the docs a bit deeper, it seems that for this to work with redis<3.0 you need Flask-RQ2==18.2.2.
I have different errors now but I can work with this.

pyserial with multiprocessing gives me a ctype error

Hi, I'm trying to write a module that lets me read and send data via pyserial. I have to be able to read the data in parallel with my main script. With the help of a Stack Overflow user, I have a basic and working skeleton of the program, but when I tried adding a class I created that uses pyserial (it handles finding the port, speed, etc.), found here, I get the following error:
File "<ipython-input-1-830fa23bc600>", line 1, in <module>
runfile('C:.../pythonInterface1/Main.py', wdir='C:/Users/Daniel.000/Desktop/Daniel/Python/pythonInterface1')
File "C:...\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:...\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Daniel.000/Desktop/Daniel/Python/pythonInterface1/Main.py", line 39, in <module>
p.start()
File "C:...\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:...\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:...\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:...\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:...\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
This is the code I am using to call the class in SerialConnection.py
import multiprocessing
from time import sleep
from operator import methodcaller
from SerialConnection import SerialConnection as SC

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    '''
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        spawn.request(2)  # Now you can reference the "spawn"
    '''
    device = SC()
    print(device.Port)
    print(device.Baud)
    print(device.ID)
    print(device.Error)
    print(device.EMsg)
    p = multiprocessing.Process(target=methodcaller("ReadData"), args=(device,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        device.SendData('0003')
What am I doing wrong for this class to be giving me problems? Is there some kind of restriction on using pyserial and multiprocessing together? I know it can be done, but I don't understand how...
Here is the traceback I get from Python:
Traceback (most recent call last):
File "C:...\Python\pythonInterface1\Main.py", line 45, in <module>
p.start()
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:...\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
You are trying to pass a SerialConnection instance to another process as an argument. For that, Python first has to serialize (pickle) the object, and that is not possible for SerialConnection objects.
As said in Rob Streeting's answer, a possible solution would be to allow the SerialConnection object to be copied into the other process's memory via the fork that occurs when multiprocessing.Process.start is invoked, but this will not work on Windows, which does not use fork.
A simpler, cross-platform, and more efficient way to achieve parallelism in your code would be to use a thread instead of a process. The changes to your code are minimal:
import threading
p = threading.Thread(target=methodcaller("ReadData"), args=(device,))
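For completeness, a slightly fuller sketch of the threaded version, reusing the names from the question (SerialConnection, ReadData, SendData) and assuming ReadData contains the blocking read loop:

import threading
from time import sleep
from SerialConnection import SerialConnection as SC

if __name__ == '__main__':
    device = SC()
    # Run the read loop in a daemon thread so it does not block program exit.
    reader = threading.Thread(target=device.ReadData, daemon=True)
    reader.start()
    while True:
        sleep(1.5)
        device.SendData('0003')

With a thread there is no pickling step, so you can pass the bound method device.ReadData directly instead of going through methodcaller.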
I think the problem is due to something inside device being unpicklable (i.e., not serializable by python). Take a look at this page to see if you can see any rules that may be broken by something in your device object.
So why does device need to be picklable at all?
When a multiprocessing.Process is started, it uses fork() at the operating system level (unless otherwise specified) to create the new process. What this means is that the whole context of the parent process is "copied" over to the child. This does not require pickling, as it's done at the operating system level.
(Note: On unix at least, this "copy" is actually a pretty cheap operation because it uses a feature called "copy-on-write". This means that both parent and child processes actually read from the same memory until one or the other modifies it, at which point the original state is copied over to the child process.)
However, the arguments of the function that you want the process to take care of do have to be pickled, because they are not part of the main process's context. So, that includes your device variable.
I think you might be able to resolve your issue by allowing device to be copied as part of the fork operation rather than passing it in as a variable. To do this though, you'll need a wrapper function around the operation you want your process to do, in this case methodcaller("ReadData"). Something like this:
if __name__ == "__main__":
device = SC()
def call_read_data():
device.ReadData()
...
p = multiprocessing.Process(target=call_read_data) # Run the loop in the process
p.start()

Putting models in a callback function from ctypes library

I am trying to setup an application based on the Google App Engine using the Managed VM feature.
I am using a shared library written in C++ using ctypes
cdll.LoadLibrary('./mylib.so')
which registers a callback function
CB_FUNC_TYPE = CFUNCTYPE(None, eSubscriptionType)
cbFuncType = CB_FUNC_TYPE(scrptCallbackHandler)
in which I want to save data to the ndb datastore:
def scrptCallbackHandler(arg):
    model = Model(name=str(arg.data))
    model.put()
I am registering a callback function in which I want to take the data from the C++ program and put it into the ndb datastore. This results in an error. On the dev server it behaves slightly differently; the following is from a production server:
suspended generator _put_tasklet(context.py:343) raised BadRequestError(Application Id (app) format is invalid: '_')
LOG 2 1429698464071045 suspended generator put(context.py:810) raised BadRequestError(Application Id (app) format is invalid: '_')
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 314, in 'calling callback function'
  File "/home/vmagent/app/isw_cloud_client.py", line 343, in scrptCallbackHandler
    node.put()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/model.py", line 3380, in _put
    return self._put_async(**ctx_options).get_result()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
    self.check_success()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 810, in put
    key = yield self._put_batcher.add(entity, options)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 343, in _put_tasklet
    keys = yield self._conn.async_put(options, datastore_entities)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 454, in _on_rpc_completion
    result = rpc.get_result()
  File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1827, in __put_hook
    self.check_rpc_success(rpc)
  File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1342, in check_rpc_success
    raise _ToDatastoreError(err)
google.appengine.api.datastore_errors.BadRequestError: Application Id (app) format is invalid: '_'
The start of the C++ program is triggered by a call to a Request handler but runs in the background and accepts incoming data which should be processed in the callback.
Update: As Tim already pointed out, it seems that the context of the WSGI handler is lost. Most likely the solution here would be to create the application context somehow.
I am only guessing at what my problem was, but I want to describe what I did to solve it.
The execution context of the callback function is somewhat different from that of the rest of the Python application. Any asynchronous operation in the callback fails. I tried doing an HTTP call and saving to the datastore. The operations never finish, and after 60s the application shows an error that they crashed. I guess this is because of how Python manages the execution and the corresponding memory allocation.
I was able to execute the callback in an object's context by wrapping it in a closure within a class. This wasn't really the problem, but the solution can be found in this answer: How can I get methods to work as callbacks with python ctypes?
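For reference, a minimal sketch of that closure trick with an illustrative CFUNCTYPE signature (the real signature is the eSubscriptionType one shown in the question); the registration call at the bottom is hypothetical:

from ctypes import CFUNCTYPE, c_int

CB_FUNC_TYPE = CFUNCTYPE(None, c_int)

class CallbackOwner(object):
    def __init__(self):
        # Keep a reference on self, otherwise the callback object can be
        # garbage-collected while the C library still holds its pointer.
        self.c_callback = CB_FUNC_TYPE(self._make_handler())

    def _make_handler(self):
        def handler(arg):
            # The closure captures 'self', so the C library calls a plain
            # function while we can still reach instance state here.
            self.on_data(arg)
        return handler

    def on_data(self, arg):
        print(arg)

# owner = CallbackOwner()
# mylib.registerCallback(owner.c_callback)   # hypothetical registration call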
For my solution i am now using a combination of cloud-endpoints on another module and background threads on the ctypes-module.
Within the C callback I start a background thread, which is able to do the asynchronous work:
# Start a background thread using the background thread service from GAE
background_thread.start_new_background_thread(putData, [name, value])
And here is the simple task it executes:
# Here I call my cloud endpoints
def putData(name, value):
    body = {
        'name': 'name',
        'value': int(value)
    }
    res = service.objects().create(body=body).execute()
Of course I need to do error handling and additional stuff, but for me this is a good solution.
Note: Adding models to the datastore in the bg thread failed because the environment in the bg thread is different from the application and the app id was not set.

Error with multiprocessing, atexit and global data

Sorry in advance, this is going to be long ...
Possibly related:
Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"
Definitely related:
python parallel map (multiprocessing.Pool.map) with global data
Keyboard Interrupts with python's multiprocessing Pool
Here's a "simple" script I hacked together to illustrate my problem...
import time
import multiprocessing as multi
import atexit

cleanup_stuff = multi.Manager().list([])

##################################################
# Some code to allow keyboard interrupts
##################################################
was_interrupted = multi.Manager().list([])

class _interrupt(object):
    """
    Toy class to allow retrieval of the interrupt that triggered its execution
    """
    def __init__(self, interrupt):
        self.interrupt = interrupt

def interrupt():
    was_interrupted.append(1)

def interruptable(func):
    """
    decorator to allow functions to be "interruptable" by
    a keyboard interrupt when in python's multiprocessing.Pool.map
    **Note**, this won't actually cause the Map to be interrupted,
    It will merely cause the following functions to be not executed.
    """
    def newfunc(*args, **kwargs):
        try:
            if not was_interrupted:
                return func(*args, **kwargs)
            else:
                return False
        except KeyboardInterrupt as e:
            interrupt()
            return _interrupt(e)  # If we really want to know about the interrupt...
    return newfunc

@atexit.register
def cleanup():
    for i in cleanup_stuff:
        print(i)
    return

@interruptable
def func(i):
    print(i)
    cleanup_stuff.append(i)
    time.sleep(float(i) / 10.)
    return i

# Must wrap func here, otherwise it won't be found in __main__'s dict
# Maybe because it was created dynamically using the decorator?
def wrapper(*args):
    return func(*args)

if __name__ == "__main__":
    # This is an attempt to use signals -- I also attempted something similar where
    # the signals were only caught in the child processes... Or only on the main process...
    #
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, onSigInt)

    # Try 2 with signals (only catch signal on main process)
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, onSigInt)
    # def startup(): signal.signal(signal.SIGINT, signal.SIG_IGN)
    # p = multi.Pool(processes=4, initializer=startup)

    # Try 3 with signals (only catch signal on child processes)
    # import signal
    # def onSigInt(*args): interrupt()
    # signal.signal(signal.SIGINT, signal.SIG_IGN)
    # def startup(): signal.signal(signal.SIGINT, onSigInt)
    # p = multi.Pool(processes=4, initializer=startup)

    p = multi.Pool(4)
    try:
        out = p.map(wrapper, range(30))
        # out = p.map_async(wrapper, range(30)).get()  # This doesn't work either...
        # The following lines don't work either
        # Effectively trying to roll my own p.map() with p.apply_async
        # results = [p.apply_async(wrapper, args=(i,)) for i in range(30)]
        # out = [r.get() for r in results()]
    except KeyboardInterrupt:
        print("Hello!")
        out = None
    finally:
        p.terminate()
        p.join()
    print(out)
This works just fine if no KeyboardInterrupt is raised. However, if I raise one, the following exception occurs:
10
7
9
12
^CHello!
None
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "test.py", line 58, in cleanup
for i in cleanup_stuff:
File "<string>", line 2, in __getitem__
File "/usr/lib/python2.6/multiprocessing/managers.py", line 722, in _callmethod
self._connect()
File "/usr/lib/python2.6/multiprocessing/managers.py", line 709, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 143, in Client
c = SocketClient(address)
File "/usr/lib/python2.6/multiprocessing/connection.py", line 263, in SocketClient
s.connect(address)
File "<string>", line 1, in connect
socket.error: [Errno 2] No such file or directory
Interestingly enough, the code does exit the Pool.map function without calling any of the additional functions ... The problem seems to be that the KeyboardInterrupt isn't handled properly at some point, but it is a little confusing where that is, and why it isn't handled in interruptable. Thanks.
Note, the same problem happens if I use out=p.map_async(wrapper,range(30)).get()
EDIT 1
A little closer ... If I enclose the out=p.map(...) in a try,except,finally clause, it gets rid of the first exception ... the other ones are still raised in atexit however. The code and traceback above have been updated.
EDIT 2
Something else that does not work has been added to the code above as a comment. (Same error). This attempt was inspired by:
http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
EDIT 3
Another failed attempt using signals added to the code above.
EDIT 4
I have figured out how to restructure my code so that the above is no longer necessary. In the (unlikely) event that someone stumbles upon this thread with the same use-case that I had, I will describe my solution ...
Use Case
I have a function which generates temporary files using the tempfile module. I would like those temporary files to be cleaned up when the program exits. My initial attempt was to pack each temporary file name into a list and then delete all the elements of the list with a function registered via atexit.register. The problem is that the list was not being updated across multiple processes. This is where I got the idea of using multiprocessing.Manager to manage the list data. Unfortunately, this fails on a KeyboardInterrupt no matter how hard I tried, because the communication sockets between processes were broken for some reason.
The solution to this problem is simple. Prior to using multiprocessing, set the temporary file directory ... something like tempfile.tempdir = tempfile.mkdtemp() ... and then register a function to delete the temporary directory. Each of the processes writes to the same temporary directory, so it works. Of course, this solution only works where the shared data is a list of files that needs to be deleted at the end of the program's life.
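A minimal sketch of that workaround, assuming a fork start method as in the original setup (Linux), so that child processes inherit the shared tempfile.tempdir; the worker function is illustrative:

import atexit
import multiprocessing
import shutil
import tempfile

# Create one shared temporary directory before any worker process exists and
# make it the default location used by the tempfile module.
tempfile.tempdir = tempfile.mkdtemp()

@atexit.register
def cleanup():
    # Remove the whole directory (and every temp file written into it by any
    # process) when the main program exits.
    shutil.rmtree(tempfile.tempdir, ignore_errors=True)

def work(i):
    # Hypothetical worker: each child writes its temp file into the shared dir.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        f.write(str(i))
    return i

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    try:
        print(p.map(work, range(10)))
    finally:
        p.terminate()
        p.join()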
