I usually run long, independent jobs with multiprocessing.Pool.map:
import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count())
input_var = [1,2,3]
ris = pool.map(long_function,input_var)
pool.close()
pool.join()
This works well, but if, for example, long_function(2) raises an error, I lose all the results already obtained from long_function(1) and long_function(3).
Is there a way to avoid this?
Ideally the output would look like ris=[long_function(1), ERROR, long_function(3)].
def safe_long_function(*args, **kwargs):
    try:
        return long_function(*args, **kwargs)
    except Exception as e:
        return e
You basically want to catch the exceptions thrown and then return them rather than raise them.
For example
def long_function(x):
    if x == 2:
        raise Exception("This number is even")
    return x  # return the input so that successful results are visible
import multiprocessing
pool = multiprocessing.Pool() # default is num CPUs
input_var = [1,2,3]
ris = pool.map(safe_long_function, input_var)
pool.close()
pool.join()
print(ris)
This will give [1, Exception("This number is even"), 3]
You can then do something like
for result in ris:
    if isinstance(result, Exception):
        print("Error: %s" % result)
    else:
        print(result)
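A variation on the same idea, sketched under a few assumptions: returning an (ok, payload) pair avoids confusing a function that legitimately returns an exception object with one that failed, and sending back the formatted traceback keeps the worker-side stack, which a returned exception no longer carries once it has been pickled back to the parent. safe_call is a made-up name, and long_function here is only a stand-in for the real job.

```python
import multiprocessing
import traceback
from functools import partial


def long_function(x):
    # Hypothetical stand-in for the real long-running job.
    if x == 2:
        raise ValueError("bad input: 2")
    return x * 10


def safe_call(func, arg):
    """Run func(arg) and return (True, result) or (False, traceback text)."""
    try:
        return True, func(arg)
    except Exception:
        # format_exc() captures the stack as a plain string, which pickles fine.
        return False, traceback.format_exc()


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        # partial() of two module-level functions can be pickled by the pool.
        outcomes = pool.map(partial(safe_call, long_function), [1, 2, 3])
    for ok, payload in outcomes:
        print("ok:" if ok else "failed:", payload)
```

The results stay in input order, so the failed slot can be matched back to its argument by position.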
Related
I have n threads running simultaneously, processing a list of m test cases. For example, thread n-1 works on item m[i-1] while thread n works on item m[i]. I want to stop all threads as soon as any one of them (say, thread n-1) fails or returns a signal. How can I achieve this?
Here is a MWE:
This is my processing function
def process(input_addr):
    i =+ 1
    print('Total number of executed unit tests: {}'.format(i))
    print("executed {}. thread".format(input_addr))
    try:
        command = 'python3 '+input_addr
        result = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
        msg, err = result.communicate()
        if msg.decode('utf-8') != '':
            stat = parse_shell(msg.decode('utf-8'))
            if stat:
                print('Test Failed')
                return True
        else:
            stat = parse_shell(err)
            if stat:
                print('Test Failed')
                return True
    except Exception as e:
        print("thread.\nMessage:{0}".format(e))
Here is my pool:
def pre_run_test_files(self):
    with Pool(10) as p:
        p.map(process, self.test_files)
I am using:
from multiprocessing import Pool
You can have your worker function, process, simply raise an exception, and pass apply_async an error_callback function that calls terminate on the pool, as in the following demo:
from multiprocessing import Pool
def process(i):
    import time
    time.sleep(1)
    if i == 6:
        raise ValueError(f'Bad value: {i}')
    print(i, flush=True)
def my_error_callback(e):
    pool.terminate()
    print(e)
if __name__ == '__main__':
    pool = Pool(4)
    for i in range(20):
        pool.apply_async(process, args=(i,), error_callback=my_error_callback)
    # wait for all tasks to complete
    pool.close()
    pool.join()
Prints:
0
1
3
2
4
5
7
Bad value: 6
You should be able to adapt the above code to your particular problem.
Update
Because your original code used the map method, there is a second solution that uses method imap_unordered, which returns an iterator that on every iteration either returns the next return value from your worker function, process, or raises an exception if your worker function raised one. With imap_unordered the results are returned in arbitrary order rather than in task-submission order, but when the default chunksize argument of 1 is used, this arbitrary order is typically task-completion order. This is what you want so that you can detect an exception at the earliest possible time and terminate the pool. Of course, if you cared about the return values from process, then you would use method imap so that the results are returned in task-submission order. But in that case, even if the task for i == 6 raised its exception before any other task completed, that exception would not be returned until the results of the tasks submitted for i == 0 through 5 had been returned.
In the following code a pool size of 8 is used, and all tasks first sleep for 1 second before printing their arguments and returning except for the case of i == 6, which raises an exception immediately. Using imap_unordered we have:
from multiprocessing import Pool
def process(i):
    import time
    # raise an exception immediately for i == 6 without sleeping
    if (i != 6):
        time.sleep(1)
    else:
        raise ValueError(f'Bad value: {i}')
    print(i, flush=True)
if __name__ == '__main__':
    pool = Pool(8)
    results = pool.imap_unordered(process, range(20))
    try:
        # Iterate results as tasks complete until
        # we are done or one raises an exception:
        for result in results:
            # we don't care about the return value:
            pass
    except Exception as e:
        pool.terminate()
        print(e)
    pool.close()
    pool.join()
Prints:
Bad value: 6
If we replace the call to imap_unordered with a call to imap, then the output is:
0
1
2
3
4
5
Bad value: 6
The first solution, using apply_async with an error_callback argument, allows the exception to be acted upon as soon as it occurs. If you also care about the results in task-submission order, you can save the multiprocessing.pool.AsyncResult objects returned by apply_async in a list and call get on each of them. Try the following code with RAISE_EXCEPTION set to True and then to False:
from multiprocessing import Pool
import time
RAISE_EXCEPTION = True
def process(i):
    if RAISE_EXCEPTION and i == 6:
        raise ValueError(f'Bad value: {i}')
    time.sleep(1)
    return i # instead of printing
def my_error_callback(e):
    global got_exception
    got_exception = True
    pool.terminate()
    print(e)
if __name__ == '__main__':
    got_exception = False
    pool = Pool(4)
    async_results = [pool.apply_async(process, args=(i,), error_callback=my_error_callback) for i in range(20)]
    # Wait for all tasks to complete:
    pool.close()
    pool.join()
    if not got_exception:
        for async_result in async_results:
            print(async_result.get())
I found the solution:
def process(i, input_addr, event):
    kill_flag = False
    if not event.is_set():
        print('Total number of executed unit tests: {}'.format(i))
        print("executed {}. thread".format(input_addr))
        try:
            command = 'python3 '+input_addr
            result = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
            msg, err = result.communicate()
            if msg.decode('utf-8') != '':
                stat = parse_shell(msg.decode('utf-8'))
                if stat:
                    print('Test Failed')
                    kill_flag = True
                    # all_run.append(input_addr)
                    #write_list_to_txt(input_addr, valid_tests)
                else:
                    kill_flag = False
            else:
                stat = parse_shell(err)
                if stat:
                    print('Test Failed')
                    kill_flag = True
                    # all_run.append(input_addr)
                    #write_list_to_txt(input_addr, valid_tests)
                else:
                    kill_flag = False
        except Exception as e:
            print("thread.\nMessage:{0}".format(e))
    if kill_flag:
        event.set()
def manager(self):
    p= multiprocessing.Pool(10)
    m = multiprocessing.Manager()
    event = m.Event()
    for i,f in enumerate(self.test_files):
        p.apply_async(process, (i, f, event))
    p.close()
    event.wait()
    p.terminate()
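One thing to watch in the code above: event.wait() only returns once some worker sets the event, so if no test fails the manager appears to block forever. Below is a hedged sketch of one way around that, polling both the event and the AsyncResult objects. run_one and wait_for_failure_or_completion are hypothetical names, and run_one merely simulates a failing test.

```python
import multiprocessing
import time


def run_one(index, event):
    """Hypothetical worker: pretends that test number 3 fails."""
    if event.is_set():
        return None  # another test already failed; skip the work
    time.sleep(0.05)
    failed = (index == 3)
    if failed:
        event.set()
    return failed


def wait_for_failure_or_completion(async_results, event, poll_interval=0.05):
    """Return True if the event was set, False if every task finished first."""
    while True:
        # wait() with a timeout returns True as soon as the event is set.
        if event.wait(poll_interval):
            return True
        if all(r.ready() for r in async_results):
            return event.is_set()


if __name__ == "__main__":
    with multiprocessing.Manager() as m:
        event = m.Event()
        pool = multiprocessing.Pool(4)
        results = [pool.apply_async(run_one, (i, event)) for i in range(8)]
        pool.close()
        if wait_for_failure_or_completion(results, event):
            print("a test failed, terminating the pool")
            pool.terminate()
        pool.join()
```

The helper only needs objects with wait, is_set, and ready methods, so it can be exercised with a threading.Event and stub results as well.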
According to https://docs.python.org/3/library/multiprocessing.html, multiprocessing forks (on *nix) to create the worker processes that execute tasks. We can verify this by setting a global variable in a module before the fork.
If the worker function imports that module and finds the variable present, then the process memory has been copied. And so it is:
import os
def f(x):
    import sys
    return sys._mypid # <<< value is returned by subprocess!
def set_state():
    import sys
    sys._mypid = os.getpid()
def g():
    from multiprocessing import Pool
    pool = Pool(4)
    try:
        for z in pool.imap(f, range(1000)):
            print(z)
    finally:
        pool.close()
        pool.join()
if __name__=='__main__':
    set_state()
    g()
However, if things work this way, what business does multiprocessing have in serializing the work function, f?
In this example:
import os
def set_state():
    import sys
    sys._mypid = os.getpid()
def g():
    def f(x):
        import sys
        return sys._mypid
    from multiprocessing import Pool
    pool = Pool(4)
    try:
        for z in pool.imap(f, range(1000)):
            print(z)
    finally:
        pool.close()
        pool.join()
if __name__=='__main__':
    set_state()
    g()
we get:
AttributeError: Can't pickle local object 'g.<locals>.f'
Stack Overflow and the rest of the internet are full of ways to work around this. (Python's standard pickle module can handle functions defined at the top level of a module, which it pickles by name, but not nested functions or closures.)
But why do we get here? A copy-on-write version of f is in the forked process's memory. Why does it need to be serialized at all?
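Derp. It has to be this way because the worker processes are forked before the pool is ever handed the function, so the function has to be sent to them afterwards:
pool = Pool(4)  # <<< processes created here
for z in pool.imap(f, range(1000)):  # <<< reference to function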
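Why a module-level f works while the nested one fails can be checked directly with pickle, no pool required. A minimal sketch; top_level and make_local are made-up names, and the exact exception type raised for the local function may differ between Python versions, so both plausible types are caught.

```python
import pickle


def top_level(x):
    return x + 1


def make_local():
    def local(x):
        return x + 1
    return local


# A module-level function is pickled by reference: the payload carries
# its module and qualified name, not its bytecode.
payload = pickle.dumps(top_level)
print(b"top_level" in payload)

# A function defined inside another function cannot be looked up by name
# from the module, so pickling it fails.
try:
    pickle.dumps(make_local())
    local_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    local_pickled = False
    print(type(exc).__name__, exc)
```

Unpickling the payload simply looks the name up again, which is why the receiving process must be able to import the same module.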
FYI... anyone wanting to fork, where the new process has access to the function (and thereby avoids serializing the function), can follow this pattern:
import collections
import multiprocessing as mp
import os
import pickle
import threading
_STATUS_DATA = 0
_STATUS_ERR = 1
_STATUS_POISON = 2
Message = collections.namedtuple(
"Message",
["status",
"payload",
"sequence_id"
]
)
def parallel_map(
        target,
        args,
        num_processes,
        inq_maxsize=None,
        outq_maxsize=None,
        serialize=pickle.dumps,
        deserialize=pickle.loads,
        start_method="fork",
        preserve_order=True,
):
    """
    :param target: Target function
    :param args: Iterable of single parameter arguments for target.
    :param num_processes: Number of processes.
    :param inq_maxsize:
    :param outq_maxsize:
    :param serialize:
    :param deserialize:
    :param start_method:
    :param preserve_order: If true, results are returned in the order received by args. Otherwise,
        the first result to finish is returned first
    :return:
    """
    if inq_maxsize is None: inq_maxsize=10*num_processes
    if outq_maxsize is None: outq_maxsize=10*num_processes
    inq = mp.Queue(maxsize=inq_maxsize)
    outq = mp.Queue(maxsize=outq_maxsize)
    poison = serialize(Message(_STATUS_POISON, None, -1))
    deserialize(poison) # Test

    def work():
        while True:
            obj = inq.get()
            # print("{} - GET .. OK".format(os.getpid()))
            # inq.task_done()
            try:
                msg = deserialize(obj)
                assert isinstance(msg, Message)
                if msg.status==_STATUS_POISON:
                    outq.put(serialize(Message(_STATUS_POISON,None,msg.sequence_id)))
                    # print("{} - RETURN POISON .. OK".format(os.getpid()))
                    return
                else:
                    args, kw = msg.payload
                    result = target(*args,**kw)
                    outq.put(serialize(Message(_STATUS_DATA,result,msg.sequence_id)))
            except Exception as e:
                try:
                    outq.put(serialize(Message(_STATUS_ERR,e,msg.sequence_id)))
                except Exception as e2:
                    try:
                        outq.put(serialize(Message(_STATUS_ERR,None,-1)))
                        # outq.put(serialize(1,Exception("Unable to serialize response")))
                        # TODO. Log exception
                    except Exception as e3:
                        pass

    if start_method == "thread":
        _start_method = threading.Thread
    else:
        _start_method = mp.get_context('fork').Process
    processes = [
        _start_method(
            target=work,
            name="parallel_map.work"
        )
        for _ in range(num_processes)]
    for p in processes:
        p.start()

    quitting = []
    def quit_processes():
        if not quitting:
            quitting.append(1)
            # Send poison pills - kill child processes
            for _ in range(num_processes):
                inq.put(poison)

    nsent = [0]
    def send():
        # Send the data
        for seq_id, arg in enumerate(args):
            obj = ((arg,), {})
            inq.put(serialize(Message(_STATUS_DATA, obj, seq_id)))
            nsent[0] += 1
        quit_processes()

    # Publish
    sender = threading.Thread(
        target=send,
        name="parallel_map.sender",
        daemon=True)
    sender.start()

    try:
        # Consume
        nquit = [0]
        buffer = {}
        nyielded = 0
        while True:
            result = outq.get() # Waiting here
            # outq.task_done()
            msg = deserialize(result)
            assert isinstance(msg, Message)
            if msg.status == _STATUS_POISON:
                nquit[0]+=1
                # print(">>> QUIT ACK {}".format(nquit[0]))
                if nquit[0]>=num_processes:
                    break
            else:
                assert msg.sequence_id>=0
                if preserve_order:
                    buffer[msg.sequence_id] = msg
                    while True:
                        if nyielded not in buffer:
                            break
                        msg = buffer.pop(nyielded)
                        nyielded += 1
                        if msg.status==_STATUS_ERR:
                            if isinstance(msg.payload, Exception):
                                raise msg.payload
                            else:
                                raise Exception("Unexpected exception")
                        else:
                            assert msg.status==_STATUS_DATA
                            yield msg.payload
                else:
                    if msg.status==_STATUS_ERR:
                        if isinstance(msg.payload, Exception):
                            raise msg.payload
                        else:
                            raise Exception("Unexpected exception")
                    else:
                        assert msg.status==_STATUS_DATA
                        yield msg.payload
            # if nyielded == nsent:
            #     break
    except Exception as e:
        raise
    finally:
        if not quitting:
            quit_processes()
        sender.join()
        for p in processes:
            p.join()
Usage:
import time
def f(x):
    time.sleep(0.01)
    if x ==-1:
        raise Exception("Boo")
    return x
for result in parallel_map(target=f,  # <<< not serialized
                           args=range(100),
                           num_processes=8,
                           start_method="fork"):
    pass
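... with one caveat: for every thread you have in your program when you fork, a puppy dies. Only the thread that calls fork exists in the child, so a lock held by any other thread at that moment stays locked in the child forever.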
How to exit from a function called by multiprocessing.Pool
Here is an example of the code I am using. I put a condition in the worker function to exit, but when I run this as a script in a terminal it hangs and never exits.
def worker(n):
    if n == 4:
        exit("wrong number")  # tried to use sys.exit(1) did not work
    return n*2
def caller(mylist, n=1):
    n_cores = n if n > 1 else multiprocessing.cpu_count()
    print(n_cores)
    pool = multiprocessing.Pool(processes=n_cores)
    result = pool.map(worker, mylist)
    pool.close()
    pool.join()
    return result
l = [2, 3, 60, 4]
myresult = caller(l, 4)
As I said, I don't think you can exit the process running the main script from a worker process.
You haven't explained exactly why you want to do this, so this answer is a guess, but perhaps raising a custom Exception and handling it in an explicit except clause, as shown below, would be an acceptable way to work around the limitation.
import multiprocessing
import sys
class WorkerStopException(Exception):
    pass
def worker(n):
    if n == 4:
        raise WorkerStopException()
    return n*2
def caller(mylist, n=1):
    n_cores = n if n > 1 else multiprocessing.cpu_count()
    print(n_cores)
    pool = multiprocessing.Pool(processes=n_cores)
    try:
        result = pool.map(worker, mylist)
    except WorkerStopException:
        sys.exit("wrong number")
    pool.close()
    pool.join()
    return result
if __name__ == '__main__':
    l = [2, 3, 60, 4]
    myresult = caller(l, 4)
Output displayed when run:
4
wrong number
(The 4 is the number of CPUs my system has.)
The thing with pool.map is that it re-raises an exception from a child process only after all tasks have finished. But your comments sound like you need to abort all processing as soon as a wrong value is detected in any process. That is a job for pool.apply_async.
pool.apply_async accepts an error_callback, which you can use to terminate the pool. Workers are fed item by item instead of chunk by chunk as with the pool.map variants, so you get a chance to exit early after each processed argument.
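To make the chunk-wise versus item-wise point concrete, here is a small sketch of how the default chunk size of pool.map appears to be derived. The function name default_chunksize is made up for illustration, and the formula mirrors what CPython's Pool does internally as far as I know; treat it as an implementation detail that may change, not a documented guarantee.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunk size Pool.map picks when chunksize is None.

    Assumption: this mirrors CPython's internal heuristic (roughly four
    chunks per worker). Hypothetical helper, not part of the
    multiprocessing API.
    """
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


if __name__ == "__main__":
    for n_items in (10, 100, 1000):
        print(n_items, "items on 4 workers ->", default_chunksize(n_items, 4))
```

With a chunk size above 1, a worker handles a whole chunk of arguments as a single task, whereas apply_async submits exactly one call per task.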
I'm basically reusing my answer from here:
from time import sleep
from multiprocessing import Pool
def f(x):
    sleep(x)
    print(f"f({x})")
    if x == 4:
        raise ValueError(f'wrong number: {x}')
    return x * 2
def on_error(e):
    if type(e) is ValueError:
        global terminated
        terminated = True
        pool.terminate()
        print(f"oops: {type(e).__name__}('{e}')")
def main():
    global pool
    global terminated
    terminated = False
    pool = Pool(4)
    results = [pool.apply_async(f, (x,), error_callback=on_error)
               for x in range(10)]
    pool.close()
    pool.join()
    if not terminated:
        for r in results:
            print(r.get())
if __name__ == '__main__':
    main()
Output:
f(0)
f(1)
f(2)
f(3)
f(4)
oops: ValueError('wrong number: 4')
Process finished with exit code 0
I have implemented a parser like this,
import multiprocessing
import time
def foo(i):
    try:
        pass  # some code
    except Exception, e:
        print e
def worker(i):
    foo(i)
    time.sleep(i)
    return i
if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = pool.map_async(worker, range(15))
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()
My parser actually finishes all the processes, but the results never become available, i.e. it stays inside the while loop printing num left: 2. How do I stop this? I don't need the value of the real_result variable.
I'm running Ubuntu 14.04 with Python 2.7.
The corresponding part of my code looks like this:
async_args = ((date, kw_dict) for date in dates)
pool = Pool(processes=4)
no_rec = []
def check_for_exit(msg):
    print msg
    if last_date in msg:
        print 'Terminating the pool'
        pool.terminate()
try:
    result = pool.map_async(parse_date_range, async_args)
    while not result.ready():
        print("num left: {}".format(result._number_left))
        sleep(1)
    real_result = result.get(5)
    passed_dates = []
    for x, y in real_result:
        passed_dates.append(x)
        if y:
            no_rec.append(y[0])
    # if last_date in passed_dates:
    #     print 'Terminating the pool'
    #     pool.terminate()
    pool.close()
except:
    print 'Pool error'
    pool.terminate()
    print traceback.format_exc()
finally:
    pool.join()
My bet is that parse_date_range is faulty and causes a worker process to terminate without producing any result or Python exception. Probably libc's exit is called by a C module or library because of a really nasty error.
This code reproduces the infinite loop you observe:
import sys
import multiprocessing
import time
def parse_date_range(i):
    if i == 5:
        sys.exit(1)  # or raise SystemExit;
                     # other exceptions are handled by the pool
    time.sleep(i/19.)
    return i
if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    result = pool.map_async(parse_date_range, range(15))
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()
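If the exit comes from Python code (sys.exit or raise SystemExit) rather than from a C library, one possible workaround is to translate it into an ordinary exception inside the worker, since the pool only reports exceptions derived from Exception. A sketch, with guarded and WorkerExited as made-up names; it does nothing for a process that is killed at the C level.

```python
import sys
from functools import partial
from multiprocessing import Pool


class WorkerExited(Exception):
    """Hypothetical exception standing in for a SystemExit raised in a worker."""


def parse_date_range(i):
    # Stand-in for the faulty function: exits instead of raising.
    if i == 5:
        sys.exit(1)
    return i


def guarded(func, arg):
    """Call func(arg), converting SystemExit into a normal exception."""
    try:
        return func(arg)
    except SystemExit as exc:
        # SystemExit derives from BaseException, so the pool would not
        # report it; re-raise it as something the pool can send back.
        raise WorkerExited("worker tried to exit with code {!r}".format(exc.code))


if __name__ == "__main__":
    with Pool(4) as pool:
        result = pool.map_async(partial(guarded, parse_date_range), range(15))
        try:
            print(result.get(timeout=30))
        except WorkerExited as exc:
            print("task failed:", exc)
```

Passing a timeout to get() is a second line of defence: if a worker still dies silently, the parent gets a TimeoutError instead of looping forever.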
Hope this'll help.
I am new to Python. I have a function that calculates features for my data and returns a list that should then be processed and written to a file. I am using Pool to do the calculation and a callback function to write the file. However, the callback function is not being called: I put some print statements in it, and it is definitely never reached.
My code looks like this:
def write_arrow_format(results):
    print("writer called")
    results[1].to_csv("../data/model_data/feature-"+results[2],sep='\t',encoding='utf-8')
    with open('../data/model_data/arow-'+results[2],'w') as f:
        for dic in results[0]:
            feature_list=[]
            print(dic)
            beginLine=True
            for key,value in dic.items():
                if(beginLine):
                    feature_list.append(str(value))
                    beginLine=False
                else:
                    feature_list.append(str(key)+":"+str(value))
            feature_line=" ".join(feature_list)
            f.write(feature_line+"\n")
def generate_features(users,impressions,interactions,items,filename):
    #some processing
    return [result1,result2,filename]
if __name__=="__main__":
    pool=mp.Pool(mp.cpu_count()-1)
    for i in range(interval):
        if i==interval:
            pool.apply_async(generate_features,(users[begin:],impressions,interactions,items,str(i)),callback=write_arrow_format)
        else:
            pool.apply_async(generate_features,(users[begin:begin+interval],impressions,interactions,items,str(i)),callback=write_arrow_format)
            begin=begin+interval
    pool.close()
    pool.join()
It's not obvious from your post what the list returned by generate_features contains. However, if any of result1, result2, or filename is not serializable, the multiprocessing library will not call the callback function, and it fails silently unless you pass an error_callback or call get() on the AsyncResult. This is because multiprocessing pickles objects before passing them between the child processes and the parent process. If anything you return isn't "pickleable" (i.e. not serializable), the task is treated as failed and the callback doesn't get called.
I've encountered this bug myself, and it turned out that an instance of a logger object was giving me trouble (Logger objects could not be pickled before Python 3.7). Here is some sample code to reproduce my issue:
import multiprocessing as mp
import logging
def bad_test_func(ii):
    print('Calling bad function with arg %i'%ii)
    name = "file_%i.log"%ii
    logging.basicConfig(filename=name,level=logging.DEBUG)
    if ii < 4:
        log = logging.getLogger()
    else:
        log = "Test log %i"%ii
    return log
def good_test_func(ii):
    print('Calling good function with arg %i'%ii)
    instance = ('hello', 'world', ii)
    return instance
def pool_test(func):
    def callback(item):
        print('This is the callback')
        print('I have been given the following item: ')
        print(item)
    num_processes = 3
    pool = mp.Pool(processes = num_processes)
    results = []
    for i in range(5):
        res = pool.apply_async(func, (i,), callback=callback)
        results.append(res)
    pool.close()
    pool.join()
def main():
    print('#'*30)
    print('Calling pool test with bad function')
    print('#'*30)
    pool_test(bad_test_func)
    print('#'*30)
    print('Calling pool test with good function')
    print('#'*30)
    pool_test(good_test_func)
if __name__ == '__main__':
    main()
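A quick way to find out whether a return value is the culprit is to try pickling it yourself, and to pass an error_callback so that the failure is no longer silent. The sketch below assumes Python 3; is_picklable, returns_lock, on_result, and on_error are made-up names, and a thread lock is used as the unpicklable object because it fails on every Python version I am aware of.

```python
import multiprocessing as mp
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def returns_lock(i):
    # A thread lock cannot be pickled, so this result never reaches the parent.
    return threading.Lock()


def on_result(item):
    print("callback got", item)


def on_error(exc):
    # Called instead of the callback when the task or its result fails.
    print("error_callback got", type(exc).__name__)


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        res = pool.apply_async(returns_lock, (0,),
                               callback=on_result, error_callback=on_error)
        res.wait()
```

Calling is_picklable on each element of the returned list inside the worker, before returning, narrows down which one is responsible.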
Hopefully this is helpful and points you in the right direction.