Python multiprocessing pool.map raises IndexError

I've developed a utility using python/cython that sorts CSV files and generates stats for a client, but invoking pool.map seems to raise an exception before my mapped function has a chance to execute. Sorting a small number of files seems to function as expected, but as the number of files grows to say 10, I get the below IndexError after calling pool.map. Does anyone happen to recognize the below error? Any help is greatly appreciated.
While the code is under NDA, the use-case is fairly simple:
Code Sample:
def sort_files(csv_files):
    pool_size = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=pool_size)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
    return sorted_dicts

def sort_file(csv_file):
    print 'sorting %s...' % csv_file
    # sort code
Output:
File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
sorted_dicts = pool.map(sort_file, csv_files, 1)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
IndexError: list index out of range

The IndexError is an error you get somewhere in sort_file(), i.e. in a subprocess. It is re-raised by the parent process. Apparently multiprocessing makes no attempt to tell us where the error really comes from (e.g. which line raised it) or even which argument to sort_file() caused it. I hate multiprocessing even more :-(
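A minimal way to see where it comes from (a sketch, not from the original post) is to wrap the worker so each child logs the argument it was given and the full traceback before re-raising:
import traceback

def sort_file_logged(csv_file):
    try:
        return sort_file(csv_file)          # the real worker from the question
    except Exception:
        print('failed on %r' % csv_file)    # show which file broke
        traceback.print_exc()               # full traceback from inside the child
        raise

# then: sorted_dicts = pool.map(sort_file_logged, csv_files, 1)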

Check further up in the command output.
In Python 3.4 at least, multiprocessing.pool will helpfully print a RemoteTraceback above the parent process traceback. You'll see something like:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.4/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/path/to/your/code/here.py", line 80, in sort_file
something = row[index]
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
sorted_dicts = pool.map(sort_file, csv_files, 1)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
IndexError: list index out of range
In the case above, the code raising the error is at /path/to/your/code/here.py, line 80.
See also: debugging errors in Python multiprocessing.
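For illustration, here is a hypothetical version of what line 80 of sort_file might look like, with a defensive fix (the field index and file handling are invented, not taken from the question):
import csv

def sort_file(csv_file, index=3):
    rows = []
    with open(csv_file) as fh:
        for row in csv.reader(fh):
            if len(row) <= index:      # guard against short or blank rows
                continue               # or log and skip, depending on the data
            rows.append(row)
    rows.sort(key=lambda row: row[index])
    return rows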

Related

Why is my try/except block not catching an IndexError?

I have the following function:
def get_prev_match_elos(player_id, prev_matches):
    try:
        last_match = prev_matches[-1]
        return last_match, player_id
    except IndexError:
        return
Sometimes prev_matches can be an empty list, so I've added the try/except block to catch an IndexError. However, I'm still getting an explicit IndexError on last_match = prev_matches[-1] when I pass an empty list, instead of the except block kicking in.
I've tried replicating this function in another file and it works fine! Any ideas?
Full error:
Exception has occurred: IndexError
list index out of range
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 145, in get_prev_match_elos
last_match = prev_matches[-1]
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 24, in engineer_elos
get_prev_match_elos(player_id, prev_matches_all_surface)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 499, in engineer_variables
engineer_elos(dal, p1_id, date, surface, params)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 99, in run_updater
engineer_variables(dal, matches_for_engineering, params)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\decorators.py", line 12, in wrapper_timer
value = func(*args, **kwargs)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 72, in main
run_updater(dal, scraper)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 645, in <module>
main()
I also can't replicate the error, but an easy fix is not to use exceptions this way. Exceptions are meant for capturing genuinely exceptional failures, not for ordinary control flow. Try checking whether the list is empty instead:
def get_prev_match_elos(player_id, prev_matches):
    if not prev_matches:
        return
    last_match = prev_matches[-1]
    return last_match, player_id
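The caller then has to handle the None case (a usage sketch, not part of the original answer):
result = get_prev_match_elos(player_id, prev_matches)
if result is None:
    pass  # no previous matches; handle however engineer_elos expects
else:
    last_match, player_id = result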
Here's Microsoft's take, using C# as the language:

early exit from multiprocessing.Pool.map (raise in child process doesn't work)

My reproduction is wrong, as noted in Rugnar's answer. I'm leaving the code mostly as-is as I'm not sure where this falls between clarifying and changing the meaning.
I have some thousands of jobs that I need to run and would like any errors to halt execution immediately.
I wrap the task in a try / except … raise so that I can log the error (without all the multiprocessing/threading noise), then reraise.
This does not kill the main process.
What's going on, and how can I get the early exit I'm looking for?
sys.exit(1) in the child deadlocks, and wrapping the try / except … raise function in yet another function doesn't work either.
$ python3 mp_reraise.py
(0,)
(1,)
(2,)
(3,)
(4,)
(5,)
(6,)
(7,)
(8,)
(9,)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "mp_reraise.py", line 5, in f_reraise
raise Exception(args)
Exception: (0,)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mp_reraise.py", line 14, in <module>
test_reraise()
File "mp_reraise.py", line 12, in test_reraise
p.map(f_reraise, range(10))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
Exception: (0,)
mp_reraise.py
import multiprocessing

def f_reraise(*args):
    try:
        raise Exception(args)
    except Exception as e:
        print(e)
        raise

def test_reraise():
    with multiprocessing.Pool() as p:
        p.map(f_reraise, range(10))

test_reraise()
If I don't catch and reraise, execution stops early as expected:
[this actually does not stop, as per Rugnar's answer]
$ python3 mp_raise.py
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "mp_raise.py", line 4, in f_raise
raise Exception(args)
Exception: (0,)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mp_raise.py", line 10, in <module>
test_raise()
File "mp_raise.py", line 8, in test_raise
p.map(f_raise, range(10))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
Exception: (0,)
mp_raise.py
import multiprocessing

def f_raise(*args):
    # missing print, which would demonstrate that
    # this actually does not stop early
    raise Exception(args)

def test_raise():
    with multiprocessing.Pool() as p:
        p.map(f_raise, range(10))

test_raise()
In your mp_raise.py you don't print anything, so you don't see how many jobs were done. I added a print and found out that the pool only sees a child's exception when the jobs iterator is exhausted, so it never stops early.
If you need to stop early after an exception, try this:
import time
import multiprocessing as mp

def f_reraise(i):
    if abort.is_set():  # cancel job if abort happened
        return
    time.sleep(i / 1000)  # add sleep so jobs are not instant, like in real life
    if abort.is_set():  # probably we need to stop a job mid-execution if abort happened
        return
    print(i)
    try:
        raise Exception(i)
    except Exception as e:
        abort.set()
        print('error:', e)
        raise

def init(a):
    global abort
    abort = a

def test_reraise():
    _abort = mp.Event()

    # jobs should stop being fed to the pool when abort happens,
    # so we wrap the jobs iterator this way
    def pool_args():
        for i in range(100):
            if not _abort.is_set():
                yield i

    # initializer and init are a way to share the event between processes
    # thanks to https://stackoverflow.com/questions/25557686/python-sharing-a-lock-between-processes
    with mp.Pool(8, initializer=init, initargs=(_abort,)) as p:
        p.map(f_reraise, pool_args())

if __name__ == '__main__':
    test_reraise()
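An alternative sketch (not from the answer above): pull results lazily with imap_unordered, so the parent sees the first failure as soon as that result is retrieved, and then terminate the pool instead of draining the remaining jobs. The worker name here is invented for illustration:
import multiprocessing

def f_maybe_raise(i):
    if i == 5:
        raise Exception(i)
    return i

def run():
    with multiprocessing.Pool() as p:
        try:
            for result in p.imap_unordered(f_maybe_raise, range(100)):
                print(result)
        except Exception as e:
            print('aborting after first error:', e)
            p.terminate()  # kill remaining workers rather than waiting for them

if __name__ == '__main__':
    run()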

Error pickling a `matlab` object in joblib `Parallel` context

I'm running some Matlab code in parallel from inside a Python context (I know, but that's what's going on), and I'm hitting an import error involving matlab.double. The same code works fine in a multiprocessing.Pool, so I am having trouble figuring out what the problem is. Here's a minimal reproducing test case.
import matlab
from multiprocessing import Pool
from joblib import Parallel, delayed

# A global object that I would like to be available in the parallel subroutine
x = matlab.double([[0.0]])

def f(i):
    print(i, x)

with Pool(4) as p:
    p.map(f, range(10))
# This prints 1, [[0.0]]\n2, [[0.0]]\n... as expected

for _ in Parallel(4, backend='multiprocessing')(delayed(f)(i) for i in range(10)):
    pass
# This also prints 1, [[0.0]]\n2, [[0.0]]\n... as expected

# Now run with default `backend='loky'`
for _ in Parallel(4)(delayed(f)(i) for i in range(10)):
    pass
# ^ this crashes.
So, the only problematic one is the one using the 'loky' backend.
The full traceback is:
exception calling callback for <Future at 0x7f63b5a57358 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "~/miniconda3/envs/myenv/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/mlarray.py", line 31, in <module>
from _internal.mlarray_sequence import _MLArrayMetaClass
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_sequence.py", line 3, in <module>
from _internal.mlarray_utils import _get_strides, _get_size, \
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_utils.py", line 4, in <module>
import matlab
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/__init__.py", line 24, in <module>
from mlarray import double, single, uint8, int8, uint16, \
ImportError: cannot import name 'double'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 309, in __call__
self.parallel.dispatch_next()
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 731, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
future = self._workers.submit(SafeFunction(func))
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "~/miniconda3/envs/myenv/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/mlarray.py", line 31, in <module>
from _internal.mlarray_sequence import _MLArrayMetaClass
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_sequence.py", line 3, in <module>
from _internal.mlarray_utils import _get_strides, _get_size, \
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/_internal/mlarray_utils.py", line 4, in <module>
import matlab
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/matlab/__init__.py", line 24, in <module>
from mlarray import double, single, uint8, int8, uint16, \
ImportError: cannot import name 'double'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 20, in <module>
for _ in Parallel(4)(delayed(f)(i) for i in range(10)):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
self.retrieve()
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
return future.result(timeout=timeout)
File "~/miniconda3/envs/myenv/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "~/miniconda3/envs/myenv/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 309, in __call__
self.parallel.dispatch_next()
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 731, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
future = self._workers.submit(SafeFunction(func))
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "~/miniconda3/envs/myenv/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Looking at the traceback, it seems like the root cause is an issue importing the matlab package in the child process.
It's probably worth noting that this all runs just fine if instead I had defined x = np.array([[0.0]]) (after importing numpy as np). And of course the main process has no problem with any matlab imports, so I am not sure why the child process would.
I'm not sure if this error has anything in particular to do with the matlab package, or if it's something to do with global variables and cloudpickle or loky. In my application it would help to stick with loky, so I'd appreciate any insight!
I should also note that I'm using the official Matlab engine for Python: https://www.mathworks.com/help/matlab/matlab-engine-for-python.html. I suppose that might make it hard for others to try out the test cases, so I wish I could reproduce this error with a type other than matlab.double, but I haven't found another yet.
Digging around more, I've noticed that the process of importing the matlab package is more circular than I would expect, and I'm speculating that this could be part of the problem? The issue is that when import matlab is run by loky's _ForkingPickler, first some file matlab/mlarray.py is imported, which imports some other files, one of which contains import matlab, and this causes matlab/__init__.py to be run, which internally has from mlarray import double, single, uint8, ... which is the line that causes the crash.
Could this circularity be the issue? If so, why can I import this module in the main process but not in the loky backend?
The error is caused by incorrect loading order of global objects in the child processes. It can be seen clearly in the traceback
_ForkingPickler.loads(res) -> ... -> import matlab -> from mlarray import ...
that matlab is not yet imported when the global variable x is loaded by cloudpickle.
joblib with loky seems to treat modules as normal global objects and send them dynamically to the child processes. joblib doesn't record the order in which those objects/modules were defined. Therefore they are loaded (initialized) in a random order in the child processes.
A simple workaround is to manually pickle the matlab object and load it after importing matlab inside your function.
import matlab
import pickle

px = pickle.dumps(matlab.double([[0.0]]))

def f(i):
    import matlab
    x = pickle.loads(px)
    print(i, x)
Of course, you can also use joblib's dump and load to serialize the objects.
Use initializer
Thanks to the suggestion of @Aaron, you can also use an initializer (for loky) to import Matlab before loading x.
Currently there's no simple API to specify an initializer, so I wrote a simple function:
def with_initializer(self, f_init):
    # Overwrite initializer hook in the Loky ProcessPoolExecutor
    # https://github.com/tomMoral/loky/blob/f4739e123acb711781e46581d5ed31ed8201c7a9/loky/process_executor.py#L850
    hasattr(self._backend, '_workers') or self.__enter__()
    origin_init = self._backend._workers._initializer

    def new_init():
        origin_init()
        f_init()

    self._backend._workers._initializer = new_init if callable(origin_init) else f_init
    return self
It is a little bit hacky but works well with the current version of joblib and loky.
Then you can use it like:
import matlab
from joblib import Parallel, delayed

x = matlab.double([[0.0]])

def f(i):
    print(i, x)

def _init_matlab():
    import matlab

with Parallel(4) as p:
    for _ in with_initializer(p, _init_matlab)(delayed(f)(i) for i in range(10)):
        pass
I hope the developers of joblib will add an initializer argument to the constructor of Parallel in the future.

Python program "genipe" - TypeError: __init__() got an unexpected keyword argument 'normalize'

I am trying to run the program genipe to do some genome-wide survival analysis. I have installed genipe and all the relevant directories. However, when I go to run the program I get the error:
"TypeError: _ init _() got an unexpected keyword argument 'normalize'"
I haven't edited any of the genipe scripts and I have run genipe with no issues on a different server so I am not sure what is going wrong! Any help would be greatly appreciated.
Many thanks,
Caragh
Edit:
I am using python version 3.6.1
Traceback as follows:
Traceback (most recent call last):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 965, in process_impute2_site
use_ml=site_info.use_ml,
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 1048, in fit_cox
cf = CoxPHFitter(alpha=0.95, tie_method="Efron", normalize=False)
TypeError: __init__() got an unexpected keyword argument 'normalize'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 811, in compute_statistics
for result in pool.map(process_impute2_site, sites_to_process):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
TypeError: __init__() got an unexpected keyword argument 'normalize'
[2017-05-31 14:18:53 ERROR] __init__() got an unexpected keyword argument 'normalize'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 965, in process_impute2_site
use_ml=site_info.use_ml,
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 1048, in fit_cox
cf = CoxPHFitter(alpha=0.95, tie_method="Efron", normalize=False)
TypeError: __init__() got an unexpected keyword argument 'normalize'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/bin/imputed-stats", line 11, in <module>
sys.exit(main())
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 161, in main
options=args,
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/site-packages/genipe/tools/imputed_stats.py", line 811, in compute_statistics
for result in pool.map(process_impute2_site, sites_to_process):
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/users/k1640238/miniconda/envs/genipe_pyvenv/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
TypeError: __init__() got an unexpected keyword argument 'normalize'
Judging by the lifelines changelog, this keyword argument has been removed from this particular function. lifelines is the package that contains this function and is used by genipe.
You can either install a previous version of lifelines yourself and see if that helps, or wait for an update to the genipe library.
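A quick way to check which version is installed and whether CoxPHFitter still accepts the argument (a diagnostic sketch; the exact release that dropped normalize isn't stated here):
import inspect
import lifelines
from lifelines import CoxPHFitter

print(lifelines.__version__)
print(inspect.signature(CoxPHFitter.__init__))  # 'normalize' is absent in newer releases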
Looking at the further errors in your comments, it seems like this is the problematic code. You are trying to use dmatrices, but it seems it is not defined, probably because the try/except block mentioned there couldn't find statsmodels installed, and therefore patsy wasn't imported either.
Try installing a few more packages manually, starting with
statsmodels
patsy
and see if you still get any errors then...
See errata's answer above; I was using the wrong versions of some of the dependencies, but even with that the program was still giving me errors. However, when I reverted to Python 3.4, the program worked.

Multiprocessing pool in Python 3

I had a script working in Python 2 and I am trying to make it work in Python 3.
One thing I stumbled upon and have no idea how to solve is that the get() method of the ApplyResult class now seems to throw:
In Pycharm the Traceback is:
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 3.4.1\helpers\pydev\pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 3.4.1\helpers\pydev\pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 3.4.1\helpers\pydev\_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "D:/SRC/CDR/trunk/RegressionTests/ExeTests/pyQCDReg_Launcher.py", line 125, in <module>
result_list = ar.get()
File "C:\Python33\lib\multiprocessing\pool.py", line 564, in get
raise self._value
TypeError: can't use a string pattern on a bytes-like object
The offending line is in pool.py:
def get(self, timeout=None):
    self.wait(timeout)
    if not self.ready():
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value  # This is the line raising the exception
It is called from the following line in my script:
pool = Pool(processes=8)
ar = pool.map_async(run_regtest, command_list)
pool.close()
start_time = time.time()
while True:
    elapsed = time.time() - start_time
    if ar.ready():
        break
    remaining = ar._number_left
    print("Elapsed,", elapsed, ", Waiting for", remaining, "tasks to complete...")
    time.sleep(30)
pool.join()
ar.wait()
print("finished")
result_list = ar.get()  # This is the offending line causing the exception
This script was working in Python 2 and I cannot understand why it is not working in Python 3. Does anyone have an idea why?
From the multiprocessing documentation:
get([timeout])
    Return the result when it arrives. If timeout is not None and the result does not arrive within timeout seconds then multiprocessing.TimeoutError is raised. If the remote call raised an exception then that exception will be reraised by get().
It seems likely to me that your exception comes from run_regtest.
The exception you are getting is pretty common when you switch from Python 2 to Python 3. Many functions in the standard library (and other libraries) that used to return strings now return bytes. A bytes object can be converted to a string using b.decode('utf-8'), for example; you only need to know the encoding.
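For example (a minimal illustration; that run_regtest applies a string regex to subprocess output is a guess):
import re

data = b"42 tests passed"            # bytes, e.g. subprocess output under Python 3
# re.search(r"(\d+) tests", data)    # TypeError: can't use a string pattern on a bytes-like object
text = data.decode('utf-8')          # decode once the encoding is known
print(re.search(r"(\d+) tests", text).group(1))  # -> 42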
