Python multiprocessing pool map_async freezes - python

I have a list of 80,000 strings that I am running through a discourse parser, and in order to increase the speed of this process I have been trying to use the python multiprocessing package.
The parser code requires python 2.7 and I am currently running it on a 2-core Ubuntu machine using a subset of the strings. For short lists, i.e. 20, the process runs without an issue on both cores, however if I run a list of about 100 strings, both workers will freeze at different points (so in some cases worker 1 won't stop until a few minutes after worker 2). This happens before all the strings are finished and anything is returned. Each time the cores stop at the same point given the same mapping function is used, but these points are different if I try a different mapping function, i.e. map vs map_async vs imap.
I have tried removing the strings at those indices, which did not have any effect, and those strings run fine in a shorter list. Based on print statements I included, when the process appears to freeze, the current iteration finishes for the current string and the worker simply does not move on to the next string. It takes about an hour of run time to reach the point where both workers have frozen, and I have not been able to reproduce the issue in less time. The code involving the multiprocessing commands is:
def main(initial_file, chunksize = 2):
    entered_file = pd.read_csv(initial_file)
    entered_file = entered_file.ix[:, 0].tolist()
    pool = multiprocessing.Pool()
    result = pool.map_async(discourse_process, entered_file, chunksize = chunksize)
    pool.close()
    pool.join()
    with open("final_results.csv", 'w') as file:
        writer = csv.writer(file)
        for listitem in result.get():
            writer.writerow([listitem[0], listitem[1]])
if __name__ == '__main__':
    main(sys.argv[1])
When I stop the process with Ctrl-C (which does not always work), the error message I receive is:
^CTraceback (most recent call last):
File "Combined_Script.py", line 94, in <module>
main(sys.argv[1])
File "Combined_Script.py", line 85, in main
pool.join()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 474, in join
p.join()
File "/usr/lib/python2.7/multiprocessing/process.py", line 145, in join
res = self._popen.wait(timeout)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 154, in wait
return self.poll(0)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 135, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 117, in worker
put((job, i, result))
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
^CProcess PoolWorker-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 117, in worker
put((job, i, result))
File "/usr/lib/python2.7/multiprocessing/queues.py", line 392, in put
return send(obj)
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
_run_finalizers(0)
File "/usr/lib/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
finalizer()
File "/usr/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 500, in _terminate_pool
outqueue.put(None) # sentinel
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.7/multiprocessing/util.py", line 305, in _exit_function
_run_finalizers(0)
File "/usr/lib/python2.7/multiprocessing/util.py", line 274, in _run_finalizers
finalizer()
File "/usr/lib/python2.7/multiprocessing/util.py", line 207, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 500, in _terminate_pool
outqueue.put(None) # sentinel
File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
wacquire()
KeyboardInterrupt
When I look at the memory in another command window using htop, memory is at <3% once the workers freeze. This is my first attempt at parallel processing and I am not sure what else I might be missing?

I was not able to solve the issue with multiprocessing pool, however I came across the loky package and was able to use it to run my code with the following lines:
executor = loky.get_reusable_executor(timeout = 200, kill_workers = True)
results = executor.map(discourse_process, entered_file)

You could set a timeout for retrieving the result; if the result is not ready in time, get() raises multiprocessing.TimeoutError:
try:
    result.get(timeout = 1)
except multiprocessing.TimeoutError:
    print("Error while retrieving the result")
You could also poll until the result is ready and then check whether it was successful:
import time
while not result.ready():
    print("Result is not ready yet")
    time.sleep(1)
print(result.successful())
Finally, the documentation at https://docs.python.org/2/library/multiprocessing.html is helpful.
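The timeout idea above can be turned into a small helper that gives up instead of waiting forever. This is only a sketch in Python 3: `square` and `run_with_timeout` are hypothetical names that stand in for the real mapping function and driver code, and the 30-second limit is an arbitrary assumption.

```python
import multiprocessing


def square(x):
    # Hypothetical stand-in for the real per-item work.
    return x * x


def run_with_timeout(pool, func, items, timeout):
    """Return the mapped results, or None if they do not arrive in time."""
    async_result = pool.map_async(func, items)
    try:
        return async_result.get(timeout=timeout)
    except multiprocessing.TimeoutError:
        return None


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    try:
        results = run_with_timeout(pool, square, range(5), timeout=30)
        print(results)
    finally:
        # terminate() stops the workers even if some of them are stuck,
        # which close() followed by join() would not do.
        pool.terminate()
        pool.join()
```

Calling `terminate()` rather than `close()` in the cleanup is a deliberate choice here: it lets the main process exit even when a worker never returns.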

Related

InvalidStateError when retrieving results from concurrent.futures.ProcessPoolExecutor

I'm using the ProcessPoolExecutor from concurrent.futures to distribute a task across a number of processes.
The processes return results which I collect into a list in the main process. However, I get an InvalidStateError (and a BrokenProcessPool error) when iterating over these results, and don't understand how to avoid this.
Here's the relevant code:
from concurrent.futures import ProcessPoolExecutor as Pool # requires python 3.8
# ...
with Pool() as pool:
    result = pool.map(self.run_sample, dataset)
    # This is the line that seems to cause the error:
    for i, sample in enumerate(result):
        # ...
# ...
def run_sample(self, sample: DataSample):
    # Function run in separate processes
    # Do something with sample
    # ...
    return sample
When I iterate over that list of results, I sometimes (i.e. every ~30 000 samples or so) get the following error. Note that the error seems to be caused by the iteration in for i, sample in enumerate(result):
Exception in thread QueueManagerThread:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/concurrent/futures/process.py", line 394, in _queue_management_worker
work_item.future.set_exception(bpe)
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 547, in set_exception
raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f40f279b250 state=cancelled>
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
File "src/run_nonrigid_displacement.py", line 78, in <module>
pipeline.run( dataset )
File "/home/me/Projects/Deformation/nonrigid-data-generation-pipeline2/src/core/pipeline.py", line 170, in run
for i, sample in enumerate(result):
File "/usr/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
How should I (safely and cleanly) aggregate and process results from the ProcessPoolExecutor?
Using python3.8 and pip list | grep future returns "0.18.2"

How to run decorated function in a separate and terminatable process?

I am dealing with an existing test suite, where we want to implement a timeout functionality, which will cause a hanging test to time out and then move on with its regular teardown/cleanup.
I am toying with the idea of running each test in a process, which I can terminate after e.g. a timeout of 3 seconds. Ideally, I don't want to modify the test cases and instead just add a decorator indicating the test is affected by this timeout behavior.
This is what I have, a minimal example:
import multiprocessing
import sys
from time import sleep
def timeout(func):
    def wrapper():
        proc = multiprocessing.Process(target=func)
        proc.start()
        sleep(3)
        proc.terminate()
    return wrapper
@timeout
def my_test():
    while True:
        sleep(1)
if __name__ == "__main__":
    my_test()
But for some reason, it seems pickle cannot deal with this and the decorator somehow messes up the reference to the function, as this error is hit:
$ python multiproc.py
2019-11-07 07:34:37.098 | DEBUG | __main__:wrapper:13 - In wrapper
Traceback (most recent call last):
File "multiproc.py", line 30, in <module>
my_test()
File "multiproc.py", line 15, in wrapper
proc.start()
File "C:\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Python36\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_test at 0x000001C0A7E87400>: it's not the same object as __main__.my_test
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python36\lib\multiprocessing\spawn.py", line 99, in spawn_main
new_handle = reduction.steal_handle(parent_pid, pipe_handle)
File "C:\Python36\lib\multiprocessing\reduction.py", line 82, in steal_handle
_winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect
Does anyone have an idea if this can be solved without modifying the existing test case?
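The error message points at how pickle handles functions: it stores only the module and name, and after decoration the name `my_test` refers to the wrapper, not to the original function. One possible workaround, sketched below under several assumptions, is to hand the child process a plain module-level helper plus the function's name, and let the child find the original through the `__wrapped__` attribute that `functools.wraps` sets. The names `_call_original` and `hanging_test` and the `timeout(seconds)` signature are hypothetical, and the approach only covers module-level functions without arguments.

```python
import functools
import multiprocessing
import sys
import time


def _call_original(module_name, func_name):
    # Runs in the child: look the decorated name up again and call the
    # undecorated function that functools.wraps stored as __wrapped__.
    decorated = getattr(sys.modules[module_name], func_name)
    decorated.__wrapped__()


def timeout(seconds):
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            # Only picklable things cross the process boundary: a
            # module-level helper and two strings.
            proc = multiprocessing.Process(
                target=_call_original,
                args=(func.__module__, func.__name__),
            )
            proc.start()
            proc.join(seconds)
            timed_out = proc.is_alive()
            if timed_out:
                proc.terminate()
                proc.join()
            return timed_out
        return wrapper
    return decorator


@timeout(1)
def hanging_test():
    while True:
        time.sleep(0.1)


if __name__ == "__main__":
    # The wrapper reports whether the child had to be terminated.
    print(hanging_test())
```

Using `proc.join(seconds)` instead of a fixed `sleep` means a test that finishes early does not have to wait out the full timeout.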

Python Multiprocessing Processes: Delay in recognizing Event when using Manager Queue

I have the following code:
from multiprocessing import Process, Manager, Event
manager = Manager()
shared_Queue = manager.Queue(10)
ev = Event()
def do_this(shared_queue, ev):
    while not ev.is_set():
        if not shared_Queue.__getattribute__('empty')():
            item = shared_queue.get()
            print item
    print 'released!'
subprocs = []
for i in xrange(10):
    subproc = Process(target=do_this, args=(shared_Queue, ev, ))
    subprocs.append(subproc)
    subproc.start()
Now, if I run this and ask whether these processes are alive:
for subproc in subprocs: print subproc.is_alive()
of course I get True for all of them.
After doing the following a couple of times (there is no error if I skip this step):
shared_Queue.put(3)
shared_Queue.put(5)
Now I want to set the Event to kill all of them using:
ev.set()
But then, instead of seeing 'released!' 10 times, I get a varying number of these prints, and after about 2 to 5 seconds I get a bunch of errors:
released!
released!
released!
released!
released!
released!
released!
Process Process-10:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Process Process-5:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Process Process-7:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "<input>", line 10, in do_this
File "<string>", line 2, in get
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
kind, result = conn.recv()
EOFError
Why is it that some processes are unable to recognize the Event set and show up as errors later? Is there a better way to signal them to die?
Thanks for the comment, stovfl. You are right: ev.set() does not kill anything; I was using the word carelessly.
As for the issue I was having, I learned that a multiprocessing Queue is process- and thread-safe, which means my process will block just before writing something to the Queue if the Queue is already full.
If I try to set the event while some of the processes are still waiting to write to the full Queue, they will not notice that the event has been set.
The key was to empty the Queue completely, let the processes write to it, and let them get back to the first line, where they can check the event!
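Another way to keep workers responsive to the event, assuming the blocking queue call is what delays them, is to give each queue operation a timeout so the loop regularly returns to the `is_set()` check. The sketch below is written for Python 3 and uses a plain `multiprocessing.Queue`; the `consume` and `worker` names and the 0.1-second poll interval are assumptions, not part of the original code. The same timeout argument is available on `put()`, which raises `queue.Full` when the queue stays full.

```python
import multiprocessing
import queue  # multiprocessing queues reuse queue.Empty and queue.Full


def consume(shared_queue, stop_event, poll_seconds=0.1):
    """Collect items until stop_event is set, never blocking for long."""
    seen = []
    while not stop_event.is_set():
        try:
            # A bounded wait hands control back to the loop, so the
            # event is re-checked even if the queue stays empty.
            item = shared_queue.get(timeout=poll_seconds)
        except queue.Empty:
            continue
        seen.append(item)
    return seen


def worker(shared_queue, stop_event):
    consume(shared_queue, stop_event)
    print("released!")


if __name__ == "__main__":
    shared_queue = multiprocessing.Queue(10)
    stop_event = multiprocessing.Event()
    procs = [
        multiprocessing.Process(target=worker, args=(shared_queue, stop_event))
        for _ in range(3)
    ]
    for proc in procs:
        proc.start()
    shared_queue.put(3)
    shared_queue.put(5)
    stop_event.set()
    for proc in procs:
        proc.join()
```

With this structure a worker notices the event within roughly one poll interval, regardless of whether the queue is empty or full at that moment.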

Python multiprocessing on For Loop

First of all, I know there are already quite a few threads about multiprocessing in Python, but none of them seems to solve my problem.
Here is my problem:
I want to implement the Random Forest algorithm, and a naive way to do so would be like this:
def random_tree(Data):
    tree = calculation(Data)
    forest.append(tree)
forest = list()
for i in range(300):
    random_tree(Data)
And the forest with 300 "trees" inside would be my final result. In this case, how do I turn this code into a multiprocessing version?
Update:
I just tried Mukund M K's method, in a very simplified script:
from multiprocessing import Pool
import numpy as np
def f(x):
    return 2*x
data = np.array([1,2,5])
pool = Pool(processes=4)
forest = pool.map(f, (data for i in range(4)))
# I use range() instead of xrange() because I am using Python 3.4
And now the script runs seemingly forever. When I open a Python shell and enter the script line by line, these are the messages I get:
> Process SpawnPoolWorker-1:
> Process SpawnPoolWorker-2:
> Traceback (most recent call last):
> Process SpawnPoolWorker-3:
> Traceback (most recent call last):
> Process SpawnPoolWorker-4:
> Traceback (most recent call last):
> Traceback (most recent call last):
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
task = get()
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
> AttributeError: Can't get attribute 'f' on
File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
return ForkingPickler.loads(res)
> AttributeError: Can't get attribute 'f' on
Update: I edited my sample code according to some other example code like this:
from multiprocessing import Pool
import numpy as np
def f(x):
    return 2*x
if __name__ == '__main__':
    data = np.array([1,2,3])
    with Pool(5) as p:
        result = p.map(f, (data for i in range(300)))
And it works now. What I need to do next is fill this in with a more sophisticated algorithm.
Yet another question on my mind is: why does this code work, while the previous version didn't?
You can do it with multiprocessing this way:
from multiprocessing import Pool
def random_tree(Data):
    return calculation(Data)
pool = Pool(processes=4)
forest = pool.map(random_tree, (Data for i in range(300)))
Package processing might help you. Check it out here.
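On the question of why the guarded version works: on Windows, multiprocessing starts workers with the spawn method, so each worker imports the main module again to find the mapped function. A function typed into an interactive shell cannot be imported that way, which matches the "Can't get attribute 'f'" messages, and unguarded pool creation would be executed again inside every worker. A minimal sketch of the forest example with the guard follows; `random_tree` here is a hypothetical stand-in for `calculation(Data)` that just draws three random numbers from a seed, and `build_forest` is an assumed helper name.

```python
import random
from multiprocessing import Pool


def random_tree(seed):
    # Hypothetical stand-in for calculation(Data): each "tree" is
    # three pseudo-random numbers derived from its seed.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


def build_forest(n_trees, processes=4):
    # The function returns its result instead of appending to a global
    # list, because workers do not share the parent's variables.
    with Pool(processes=processes) as pool:
        return pool.map(random_tree, range(n_trees))


if __name__ == "__main__":
    # Only the main process runs this block; workers that re-import
    # the module skip it.
    forest = build_forest(300)
    print(len(forest))
```

Save this as a file and run it as a script; entering it line by line in a shell would bring back the original problem on Windows.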

How can I catch a memory error in a spawned thread?

I've never used the multiprocessing library before, so all advice is welcome.
I've got a Python program that uses the multiprocessing library to do some memory-intensive tasks in multiple processes, and it occasionally runs out of memory (I'm working on optimizations, but that's not what this question is about). Sometimes an out-of-memory error gets thrown in a way that I can't seem to catch (output below), and then the program hangs on pool.join() (I'm using multiprocessing.Pool). How can I make the program do something other than wait indefinitely when this problem occurs?
Ideally, the memory error would be propagated back to the main process, which would then die.
Here's the memory error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
pool._maintain_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 229, in _maintain_pool
self._repopulate_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 222, in _repopulate_pool
w.start()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
And here's where I manage multiprocessing:
mp_pool = mp.Pool(processes=num_processes)
mp_results = list()
for datum in input_data:
    data_args = {
        'value': 0  # actually some other simple dict key/values
    }
    mp_results.append(mp_pool.apply_async(_process_data, args=(common_args, data_args)))
mp_pool.close()
mp_pool.join()  # hangs here when that thread dies..
for result_async in mp_results:
    result = result_async.get()
    # do stuff to collect results
# rest of the code
When I interrupt the hanging program, I get:
Process process_003:
Traceback (most recent call last):
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/queues.py", line 374, in get
return recv()
racquire()
KeyboardInterrupt
This is actually a known bug in Python's multiprocessing module, fixed in Python 3 (here's a summarizing blog post I found). There's a patch attached to Python issue 22393, but it hasn't been officially applied.
Basically, if one of a multiprocessing pool's sub-processes dies unexpectedly (out of memory, killed externally, etc.), the pool will wait indefinitely.
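For code that can run on Python 3, one alternative worth considering is `concurrent.futures.ProcessPoolExecutor`, which since Python 3.3 raises `BrokenProcessPool` when a worker terminates abruptly instead of waiting forever. The sketch below is not the patch mentioned above; `process_datum` and `collect` are hypothetical names, and the worker death is simulated with `os._exit(1)`.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def process_datum(value):
    # Hypothetical task: a negative value simulates a worker being
    # killed, for example by the kernel's out-of-memory killer.
    if value < 0:
        os._exit(1)
    return value * 2


def collect(executor, func, items):
    """Return (results, error); error is set if a worker died abruptly."""
    try:
        futures = [executor.submit(func, item) for item in items]
        return [future.result() for future in futures], None
    except BrokenProcessPool as exc:
        return None, exc


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        results, error = collect(executor, process_datum, [1, 2, -1, 3])
    if error is not None:
        print("a worker died:", error)
    else:
        print(results)
```

Because the failure surfaces as an ordinary exception in the main process, the caller can decide whether to retry, log, or exit.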
