I have a function ComparePatchMany written in NumPy, which performs some basic matrix operations (dot product, diagonal, etc.) and, due to the size of the matrices I'm using, is too slow. To get some speedup, I want to run calls to this function in parallel. Because of memory issues, I can't call it on more than 100 stacked matrices at a time, so simply running ComparePatchMany on one giant matrix is out (though it works in MATLAB).
What I have right now is:
def comparePatchManyRunner(tex_flat, imMask_flat, s_tex, metrics, i):
    metrics[i] = ComparePatchMany.main(tex_flat[imMask_flat == 1, :], np.reshape(s_tex[:, i], (-1, 1)))

# N = 100
def main(TexLib, tex, OperationMask, N, gpu=0):
    if gpu:
        print 'ERROR: GPU Capability not set'
    else:
        tex_flat = np.array([tex.flatten('F')]).T

    CreateGrid = np.ones((TexLib.Gr.l_y.shape[1], TexLib.Gr.l_x.shape[1]))
    PatchMap = np.nan * CreateGrid
    MetricMap = np.nan * CreateGrid
    list_of_patches = np.argwhere(CreateGrid > 0)
    for i in range(list_of_patches.shape[0]):
        y, x = list_of_patches[i]
        imMask = TexLib.obtainMask(y, x)
        Box = [TexLib.Gr.l_x[0, x], TexLib.Gr.l_x[-1, x], TexLib.Gr.l_y[0, y], TexLib.Gr.l_y[-1, y]]
        imMaskO = imMask
        imMask = imMask & OperationMask
        imMask_flat = np.dstack((imMask, imMask, imMask))
        if gpu:
            print 'ERROR! GPU Capability not yet implemented'
            # TODO
        else:
            imMask_flat = imMask_flat.flatten('F')
        if np.sum(imMask) < 8:
            continue
        indd_s = np.random.randint(TexLib.NumTexs, size=(1, N * 5))
        s_tex = TexLib.ImW[imMask_flat == 1][:, np.squeeze(indd_s)]
        s_tex = s_tex.astype('float32')
        if gpu:
            print 'ERROR! GPU Capability not yet implemented'
            # TODO
        else:
            metrics = np.zeros((N * 5, 1))
            shared_arr = multiprocessing.Array('d', metrics)
            processes = [multiprocessing.Process(target=comparePatchManyRunner,
                                                 args=(tex_flat, imMask_flat, s_tex, shared_arr, i))
                         for i in xrange(N * 5)]
            for p in processes:
                p.start()
            for p in processes:
                p.join()
            metrics = shared_arr
            print metrics
I think this may be creating 500 processes, which could be an issue. One problem I keep running into with this and previous versions is IOError: [Errno 32] Broken pipe, which originates from p.start().
I'm working on Windows with Python 2.7, NumPy 1.8, and SciPy 0.13.2.
EDIT:
Comments suggested I use pools. So I'm trying this:
metrics = np.zeros((N*5,1))
shared_arr = multiprocessing.Array('d',metrics,lock=False)
po = multiprocessing.Pool(processes=2)
po.map_async(comparePatchManyRunner,((tex_flat,imMask_flat,s_tex,shared_arr,idex) for idex in xrange(N*5)))
But it doesn't seem to be writing anything to shared_arr, and I keep getting a PicklingError:
Exception in thread Thread-29:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 342, in _handle_tasks
    put(task)
PicklingError: Can't pickle <class 'multiprocessing.sharedctypes.c_double_Array_500'>: attribute lookup multiprocessing.sharedctypes.c_double_Array_500 failed
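For reference, a minimal sketch of the usual workaround on Windows (my own sketch; init_worker and runner are hypothetical names, not from the original code): hand the shared Array to each worker once through the Pool's initializer instead of through the task arguments, since task arguments are pickled for every call and shared ctypes objects cannot be pickled that way.

# Sketch only, assuming tex_flat, imMask_flat, s_tex and N exist as above.
import multiprocessing
import numpy as np

def init_worker(arr):
    global shared_metrics  # each worker process keeps its own reference
    shared_metrics = arr

def runner(args):
    tex_flat, imMask_flat, s_tex, i = args
    shared_metrics[i] = ComparePatchMany.main(tex_flat[imMask_flat == 1, :],
                                              np.reshape(s_tex[:, i], (-1, 1)))

if __name__ == '__main__':
    shared_arr = multiprocessing.Array('d', N * 5, lock=False)  # a length, not an ndarray
    pool = multiprocessing.Pool(processes=2, initializer=init_worker,
                                initargs=(shared_arr,))
    pool.map(runner, [(tex_flat, imMask_flat, s_tex, i) for i in xrange(N * 5)])
    metrics = np.frombuffer(shared_arr)  # view the shared buffer as a NumPy array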
I'm trying to get this code to work:
CODE
Process run method
def process(tetrahedrons, startOfWork, endOfWork, RelativeResult):
    print("Start of the process", mp.current_process().name)
    result = False
    for c in range(startOfWork, endOfWork):
        for j in range(c + 1, len(tetrahedrons)):
            if tetrahedrons[c].IsInterpenetrated(tetrahedrons[j]) == 1:
                result = True
    print("Relative result = ", result, end="\n\n")
    RelativeResult.put(result)
Parallel search method
@staticmethod
def parallelInterpenetrationSearchWithProcess(tetrahedrons):
    '''
    parallel verification of the existence of an interpenetration between tetrahedra
    '''
    time_start = datetime.datetime.now()
    N_CORE = mp.cpu_count()
    RelativeResult = mp.Queue()
    Workers = []
    dimList = len(tetrahedrons)
    QuantityForWorker = int(dimList / N_CORE)
    Rest = dimList % N_CORE
    StartWork = 0
    if QuantityForWorker == 0:
        iterations = Rest - 1
    else:
        iterations = N_CORE
    for i in range(iterations):
        EndWork = StartWork + QuantityForWorker
        if i < Rest - 1:
            EndWork = EndWork + 1
        IdWork = i
        Workers.append(mp.Process(target=process, args=(tetrahedrons, StartWork, EndWork, RelativeResult,)))
        Workers[IdWork].start()
        StartWork = EndWork
    for worker in Workers:
        worker.join()
    while not RelativeResult.empty():
        if RelativeResult.get():
            print("Parallel search took %f seconds" % (timeelapsed(time_start)))
            return True
    print("Parallel search took %f seconds" % (timeelapsed(time_start)))
    return False
What this code does: the static parallel-search method takes a list of "tetrahedron" objects as input and checks whether a certain property holds between pairs of these elements; in this case, the property is an intersection between two tetrahedra.
The parallel search method is called by a main method which builds the list of tetrahedra and then passes it to the parallel search method to verify the above property.
I tried to test this code with a tetrahedron list containing 100 elements and it works fine, but when I try to run it with a larger input, specifically 160,000 elements, I get the following error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
So I thought the problem might lie in how the parameters are passed: passing a list of 160,000 elements to a process means that process copies the whole list into its own memory, and doing that for 4 or 8 processes would be too much. Convinced of this, I changed the code so that each process received only part of the list and a single element to be checked, but this approach also gave negative results. So I tried to share the list between processes with the help of multiprocessing.Manager(). Exactly:
@staticmethod
def parallelInterpenetrationSearchWithProcess(tetrahedrons):
    '''
    parallel verification of the existence of an interpenetration between tetrahedra
    '''
    time_start = datetime.datetime.now()
    N_CORE = mp.cpu_count()
    RelativeResult = mp.Queue()
    Workers = []
    trList = mp.Manager().list(tetrahedrons)
    dimList = len(tetrahedrons)
    QuantityForWorker = int(dimList / N_CORE)
    Rest = dimList % N_CORE
    StartWork = 0
    if QuantityForWorker == 0:
        iterations = Rest - 1
    else:
        iterations = N_CORE
    for i in range(iterations):
        EndWork = StartWork + QuantityForWorker
        if i < Rest - 1:
            EndWork = EndWork + 1
        IdWork = i
        Workers.append(mp.Process(target=process, args=(trList, StartWork, EndWork, RelativeResult,)))
        Workers[IdWork].start()
        StartWork = EndWork
    for worker in Workers:
        worker.join()
    while not RelativeResult.empty():
        if RelativeResult.get():
            print("Parallel search took %f seconds" % (timeelapsed(time_start)))
            return True
    print("Parallel search took %f seconds" % (timeelapsed(time_start)))
    return False
But this approach gives me the following error:
Traceback (most recent call last):
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\85612\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\85612\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\85612\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\85612\Documents\GMarco2\tet\main.py", line 19, in <module>
    out = mesh.mesh.parallelInterpenetrationSearchWithProcess(trList)
  File "c:\Users\85612\Documents\GMarco2\tet\mesh\mesh.py", line 264, in parallelInterpenetrationSearchWithProcess
    trList = mp.Manager().list(tetrahedrons)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\managers.py", line 701, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\managers.py", line 586, in _create
    id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\managers.py", line 78, in dispatch
    c.send((id, methodname, args, kwds))
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\85612\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object
In the first case, with the error EOFError: Ran out of input, it seems as if it is trying to read from a file. I don't know whether this message has to do with Python being an interpreted language. I have read a lot of questions about it with different solutions, but none that could help me. However, I have noticed that pickle is mentioned in many of these solutions. From what I understand, pickle is essentially concerned with converting a hierarchy of Python objects into a stream of bytes. I'm new to Python, and all this information is creating a bit of confusion.
Question
In conclusion, I would like to know what my problem could be, whether I am doing something wrong, or whether I am missing some piece of the puzzle needed to make my code work. A brief explanation of the role pickle plays in my code would also be welcome, if possible. Thanks in advance.
Looking through the online documentation on pickle, I noticed that this module is used by multiprocessing to pass information between processes.
So my suspicion is that my classes are badly handled by pickle, or rather that they cannot be pickled. I am trying to verify whether this is true, but I am having quite a few difficulties making my code digestible to pickle.
This is my tetrahedron class
def handedness(vertices):
    """
    returns the handedness of a list of vertices
    """
    # compute handedness
    c = np.cross(vertices[1].coords - vertices[0].coords, vertices[2].coords - vertices[0].coords)
    h = np.inner(c, vertices[3].coords - vertices[0].coords)
    return h

class tetrahedron:
    """
    a tetrahedron
    """
    def __init__(self, vertices):
        """
        initialize a tetrahedron
        """
        # compute handedness
        h = handedness(vertices)
        # vertex indices with correct right-handed orientation
        if h > 0:
            self.vertices = vertices
        else:
            s = "old h = %f" % (h)
            # swap last two vertices
            self.vertices = [vertices[0], vertices[1], vertices[3], vertices[2]]
            vertices = self.vertices
            h = handedness(vertices)
            s = s + " new h = %f" % (h)
            print(s)
This is my vertex class
class vertex:
    """
    a vertex
    """
    def __init__(self, coords, idx):
        """
        initialize a vertex
        """
        # vertex coordinates
        self.coords = numpy.array(coords)
        # incident triangles
        self.triangles = []
        # id
        self.idx = idx
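A quick way to test the cannot-be-pickled theory (my own sketch, not part of the original post): try pickling a single tetrahedron directly. If each vertex's triangles list holds triangle objects that refer back to their vertices, the object graph becomes deeply nested or cyclic, which is exactly the kind of structure that drives pickle past the recursion limit.

import pickle

try:
    pickle.dumps(tetrahedrons[0])  # hypothetical check on one element
except RecursionError as e:
    print("not picklable as-is:", e)

One possible workaround, sketched under that assumption: omit the back-references from the pickled state so the graph stays shallow.

class vertex:
    def __init__(self, coords, idx):
        self.coords = numpy.array(coords)
        self.triangles = []  # incident triangles (back-references)
        self.idx = idx

    def __getstate__(self):
        state = self.__dict__.copy()
        state['triangles'] = []  # drop the back-references when pickling
        return state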
I'm trying to replicate (or come close to) the results obtained by the End-to-end Neural Coreference Resolution paper on the CoNLL-2012 shared task. I intend to do some enhancements on top of this, so I decided to use AllenNLP's CoreferenceResolver. This is how I'm initialising & training the model:
import torch
from allennlp.common import Params
from allennlp.data import Vocabulary
from allennlp.data.dataset_readers import ConllCorefReader
from allennlp.data.dataset_readers.dataset_utils import Ontonotes
from allennlp.data.iterators import BasicIterator, MultiprocessIterator
from allennlp.data.token_indexers import SingleIdTokenIndexer, TokenCharactersIndexer
from allennlp.models import CoreferenceResolver
from allennlp.modules import Embedding, FeedForward
from allennlp.modules.seq2seq_encoders import PytorchSeq2SeqWrapper
from allennlp.modules.seq2vec_encoders import CnnEncoder
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import TokenCharactersEncoder
from allennlp.training import Trainer
from allennlp.training.learning_rate_schedulers import LearningRateScheduler
from torch.nn import LSTM, ReLU
from torch.optim import Adam
def read_data(directory_path):
    data = []
    for file_path in Ontonotes().dataset_path_iterator(directory_path):
        data += dataset_reader.read(file_path)
    return data

INPUT_FILE_PATH_TEMPLATE = "data/CoNLL-2012/v4/data/%s"

dataset_reader = ConllCorefReader(10, {"tokens": SingleIdTokenIndexer(),
                                       "token_characters": TokenCharactersIndexer()})
training_data = read_data(INPUT_FILE_PATH_TEMPLATE % "train")
validation_data = read_data(INPUT_FILE_PATH_TEMPLATE % "development")

vocabulary = Vocabulary.from_instances(training_data + validation_data)
model = CoreferenceResolver(vocab=vocabulary,
                            text_field_embedder=BasicTextFieldEmbedder(
                                {"tokens": Embedding.from_params(vocabulary, Params({"embedding_dim": embeddings_dimension, "pretrained_file": "glove.840B.300d.txt"})),
                                 "token_characters": TokenCharactersEncoder(
                                     embedding=Embedding(num_embeddings=vocabulary.get_vocab_size("token_characters"), embedding_dim=8, vocab_namespace="token_characters"),
                                     encoder=CnnEncoder(embedding_dim=8, num_filters=50, ngram_filter_sizes=(3, 4, 5), output_dim=100))}),
                            context_layer=PytorchSeq2SeqWrapper(LSTM(input_size=400, hidden_size=200, num_layers=1, dropout=0.2, bidirectional=True, batch_first=True)),
                            mention_feedforward=FeedForward(input_dim=1220, num_layers=2, hidden_dims=[150, 150], activations=[ReLU(), ReLU()], dropout=[0.2, 0.2]),
                            antecedent_feedforward=FeedForward(input_dim=3680, num_layers=2, hidden_dims=[150, 150], activations=[ReLU(), ReLU()], dropout=[0.2, 0.2]),
                            feature_size=20,
                            max_span_width=10,
                            spans_per_word=0.4,
                            max_antecedents=250,
                            lexical_dropout=0.5)

if torch.cuda.is_available():
    cuda_device = 0
    model = model.cuda(cuda_device)
else:
    cuda_device = -1

iterator = BasicIterator(batch_size=1)
iterator.index_with(vocabulary)

optimiser = Adam(model.parameters(), weight_decay=0.1)
Trainer(model=model,
        train_dataset=training_data,
        validation_dataset=validation_data,
        optimizer=optimiser,
        learning_rate_scheduler=LearningRateScheduler.from_params(optimiser, Params({"type": "step", "step_size": 100})),
        iterator=iterator,
        num_epochs=150,
        patience=1,
        cuda_device=cuda_device).train()
After reading the data I've trained the model but ran out of GPU memory: RuntimeError: CUDA out of memory. Tried to allocate 4.43 GiB (GPU 0; 11.17 GiB total capacity; 3.96 GiB already allocated; 3.40 GiB free; 3.47 GiB cached). Therefore, I attempted to make use of multiple GPUs to train this model. I'm making use of Tesla K80s (which have 12GiB memory).
I've tried making use of AllenNLP's MultiprocessIterator, by initialising the iterator as MultiprocessIterator(BasicIterator(batch_size=1), num_workers=torch.cuda.device_count()). However, only 1 GPU is being used (monitored through the nvidia-smi command) & I got the error below. I also tried fiddling with its parameters (increasing num_workers or decreasing output_queue_size) & the ulimit (as mentioned by this PyTorch issue), to no avail.
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/allennlp/data/iterators/multiprocess_iterator.py", line 32, in _create_tensor_dicts
    output_queue.put(tensor_dict)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/allennlp/data/iterators/multiprocess_iterator.py", line 32, in _create_tensor_dicts
    output_queue.put(tensor_dict)
  File "<string>", line 2, in put
  File "<string>", line 2, in put
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 228, in serve_client
    request = recv()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/user/.local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata
---------------------------------------------------------------------------
I also tried achieving this through PyTorch's DataParallel, by wrapping the model's context_layer, mention_feedforward, antecedent_feedforward with a custom DataParallelWrapper (to provide compatibility with the AllenNLP-assumed class functions). Still, only 1 GPU is used & it eventually runs out of memory as before.
class DataParallelWrapper(DataParallel):
    def __init__(self, module):
        super().__init__(module)

    def get_output_dim(self):
        return self.module.get_output_dim()

    def get_input_dim(self):
        return self.module.get_input_dim()

    def forward(self, *inputs):
        return self.module.forward(inputs)
After some digging through the code I found out that AllenNLP does this under the hood directly through its Trainer. The cuda_device can either be a single int (in the case of single-processing) or a list of ints (in the case of multi-processing):
cuda_device : Union[int, List[int]], optional (default = -1)
    An integer or list of integers specifying the CUDA device(s) to use. If -1, the CPU is used.
So all GPU devices needed should be passed on instead:
if torch.cuda.is_available():
    cuda_device = list(range(torch.cuda.device_count()))
    model = model.cuda(cuda_device[0])
else:
    cuda_device = -1
Note that the model still has to be manually moved to the GPU (via model.cuda(...)), as it would otherwise try to use multiple CPUs instead.
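For completeness, the Trainer call from the question then stays the same apart from the cuda_device argument, which now receives the list:

Trainer(model=model,
        train_dataset=training_data,
        validation_dataset=validation_data,
        optimizer=optimiser,
        learning_rate_scheduler=LearningRateScheduler.from_params(optimiser, Params({"type": "step", "step_size": 100})),
        iterator=iterator,
        num_epochs=150,
        patience=1,
        cuda_device=cuda_device).train()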
I'm having a lot of success using Dask and Distributed to develop data analysis pipelines. One thing that I'm still looking forward to improving, however, is the way I handle exceptions.
Right now, if I write the following
def my_function (value):
    return 1 / value

results = (dask.bag
           .from_sequence(range(-10, 10))
           .map(my_function))

print(results.compute())
... then on running the program I get a long, long list of tracebacks (one per worker, I'm guessing). The most relevant segment being
distributed.utils - ERROR - division by zero
Traceback (most recent call last):
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/distributed/utils.py", line 193, in f
    result[0] = yield gen.maybe_future(func(*args, **kwargs))
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/distributed/client.py", line 1473, in _get
    result = yield self._gather(packed)
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/distributed/client.py", line 923, in _gather
    st.traceback)
  File "/Users/ajmazurie/test/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/mnt/lustrefs/work/aurelien.mazurie/test_dask/.env/pyenv-3.6.0-default/lib/python3.6/site-packages/dask/bag/core.py", line 1411, in reify
  File "test.py", line 9, in my_function
    return 1 / value
ZeroDivisionError: division by zero
Here, of course, a visual inspection will tell me that the error was dividing a number by zero. What I'm wondering is whether there is a better way to track these errors. For example, I cannot seem to catch the exception itself:
import dask.bag
import distributed

try:
    dask_scheduler = "127.0.0.1:8786"
    dask_client = distributed.Client(dask_scheduler)

    def my_function (value):
        return 1 / value

    results = (dask.bag
               .from_sequence(range(-10, 10))
               .map(my_function))

    #dask_client.persist(results)
    print(results.compute())

except Exception as e:
    print("error: %s" % e)
EDIT: Note that in my example I'm using distributed, not just dask. There is a dask-scheduler listening on port 8786 with four dask-worker processes registered to it.
This code will produce the exact same output as above, meaning that I'm not actually catching the exception with my try/except block.
Now, since we're talking about tasks distributed across a cluster, it is obviously non-trivial to propagate exceptions back to me. Is there any guideline for doing so? Right now my solution is to have functions return both a result and an optional error message, then process the results and error messages separately:
def my_function (value):
    try:
        return {"result": 1 / value, "error": None}
    except ZeroDivisionError:
        return {"result": None, "error": "boom!"}

results = (dask.bag
           .from_sequence(range(-10, 10))
           .map(my_function))

dask_client.persist(results)

errors = (results
          .pluck("error")
          .filter(lambda x: x is not None)
          .compute())
print(errors)

results = (results
           .pluck("result")
           .filter(lambda x: x is not None)
           .compute())
print(results)
This works, but I'm wondering if I'm sandblasting the soup cracker here. EDIT: Another option would be to use something like a Maybe monad, but once again I'd like to know if I'm overthinking it.
Dask automatically packages up exceptions that occurred remotely and reraises them locally. Here is what I get when I run your example
In [1]: from dask.distributed import Client
In [2]: client = Client('localhost:8786')
In [3]: import dask.bag
In [4]: try:
   ...:     def my_function (value):
   ...:         return 1 / value
   ...:
   ...:     results = (dask.bag
   ...:                .from_sequence(range(-10, 10))
   ...:                .map(my_function))
   ...:
   ...:     print(results.compute())
   ...:
   ...: except Exception as e:
   ...:     import pdb; pdb.set_trace()
   ...:     print("error: %s" % e)
   ...:
distributed.utils - ERROR - division by zero
> <ipython-input-4-17aa5fbfb732>(13)<module>()
-> print("error: %s" % e)
(Pdb) pp e
ZeroDivisionError('division by zero',)
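If you want per-task error tracking without failing the whole computation, one option (a sketch using distributed's futures API, assuming the same my_function and scheduler address as above) is to submit the work as futures and inspect each future's status afterwards:

from dask.distributed import Client, wait

client = Client("127.0.0.1:8786")
futures = client.map(my_function, range(-10, 10))
wait(futures)  # block until every task has either finished or failed

results = [f.result() for f in futures if f.status != "error"]
errors = [f.exception() for f in futures if f.status == "error"]
print(results)
print(errors)  # e.g. [ZeroDivisionError('division by zero')]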
You could wrap your function like so:
def exception_handler(orig_func):
    def wrapper(*args, **kwargs):
        try:
            return orig_func(*args, **kwargs)
        except:
            import sys
            sys.exit(1)
    return wrapper
You could use a decorator or do:
wrapped = exception_handler(my_function)
dask_client.map(wrapped, range(100))
This seems to automatically rebalance tasks if a worker fails. But I don't know how to remove the failed worker from the pool.
I get an error when trying to run a command with joblib/multiprocessing in parallel:
Here the traceback:
Process PoolWorker-263:
Traceback (most recent call last):
  File "/home/marcel/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/marcel/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/marcel/anaconda/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/home/marcel/.local/lib/python2.7/site-packages/joblib/pool.py", line 363, in get
  File "_objects.pyx", line 240, in h5py._objects.ObjectID.__cinit__ (h5py/_objects.c:2994)
TypeError: __cinit__() takes exactly 1 positional argument (0 given)
As you can see from the error message, I work with data loaded using h5py. To complicate things further, the routine I want to parallelize uses numba in one of its subroutines, but I hope that does not matter.
Here is a running example, which you can copy and paste:
from joblib import Parallel, delayed
import numpy as np
import h5py as h5
import os

def testfunc(h5data, row):
    # some very boneheaded CPU work
    data_slice = h5data[:, row, ...]
    ma = np.mean(data_slice, axis=1)
    x = row
    return ma, x

def run():
    data = np.random.random((100, 100, 100))
    print data
    f_out = h5.File('tmp.h5', 'w')
    dset = f_out.create_dataset('mydata', data=data)
    f_out.close()

    f_in = h5.File('tmp.h5', 'r')
    h5data = f_in['mydata']

    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(h5data, i) for i in range(h5data.shape[1]))

    f_in.close()
    os.remove('tmp.h5')

if __name__ == '__main__':
    run()
Any ideas what I'm doing wrong?
Edit: Okay, at least I can exclude numba from the list of evildoers...
You can try to replace joblib with pathos, which replaces pickle with dill. This generally solves all pickling issues.
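Another common workaround (my own sketch, not from the comment above): h5py Dataset objects wrap an open file handle and cannot be pickled, so pass the file name to the workers instead and open the file inside each one.

from joblib import Parallel, delayed
import numpy as np
import h5py as h5

def testfunc(filename, row):
    # each worker opens its own read-only handle instead of sharing one
    with h5.File(filename, 'r') as f:
        data_slice = f['mydata'][:, row, ...]
    return np.mean(data_slice, axis=1), row

results = Parallel(n_jobs=-1, verbose=1)(
    delayed(testfunc)('tmp.h5', i) for i in range(100))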
I have a two-dimensional function and I want to compute its elements on the grid points, but the two loops over rows and columns are very slow, so I want to use multiprocessing to increase the speed of the code. I have written the following code to do the two loops:
from multiprocessing import Pool

#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)

#The 2D function
def like2d(x, y):
    stuff = [RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m = 3e14
    c = 7.455
    param = [x, y, m, c]
    return reduced_shear(param, stuff, observed_g, g_err)

pool = Pool(processes=12)

def data_stream(a, b):
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def myfunc(args):
    return args[0], like2d(*args[1])

counter, likelihood = pool.map(myfunc, data_stream(ra, dec))
But I got the following error message:
Process PoolWorker-1:
Traceback (most recent call last):
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/user/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/user/anaconda/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/user/anaconda/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
AttributeError: 'module' object has no attribute 'myfunc'
(the same traceback is repeated for PoolWorker-2, PoolWorker-3, PoolWorker-4, ...)
Everything is defined, and I do not understand why this error message is raised! Can anybody point out what might be wrong?
Another approach for doing the loops with multiprocessing and saving the results in a 2D array:
#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)

#The 2D function
def like2d(x, y):
    stuff = [RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m = 3e14
    c = 7.455
    param = [x, y, m, c]
    return reduced_shear(param, stuff, observed_g, g_err)

shared_array_base = multiprocessing.Array(ctypes.c_double, ra.shape[0] * dec.shape[0])
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(ra.shape[0], dec.shape[0])

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = np.array([float(like2d(ra[j], dec[i])) for j in range(ra.shape[0])])

print "processing to estimate likelihood in 2D grids......!!!"
start = time.time()
pool = multiprocessing.Pool(processes=12)
pool.map(my_func, range(dec.shape[0]))
print shared_array
end = time.time()
print end - start
You have to create the Pool after the worker function (myfunc) definition. Creating the Pool causes Python to fork your worker processes right at that point, and the only things that will be defined in the children are the functions defined above the Pool definition. Also, map will return a list of tuples (one for each object yielded by data_stream), not a single tuple. So you need this:
from multiprocessing import Pool
import numpy as np

#Grid points
ra = np.linspace(25.1446, 25.7329, 1000)
dec = np.linspace(-10.477, -9.889, 1000)

#The 2D function
def like2d(x, y):
    stuff = [RaDec, beta, rho_c_over_sigma_c, zhalo, rho_crit]
    m = 3e14
    c = 7.455
    param = [x, y, m, c]
    return reduced_shear(param, stuff, observed_g, g_err)

def data_stream(a, b):
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            yield (i, j), (av, bv)

def myfunc(args):
    return args[0], like2d(*args[1])

if __name__ == "__main__":
    pool = Pool(processes=12)
    results = pool.map(myfunc, data_stream(ra, dec))  # results is a list of tuples.
    for counter, likelihood in results:
        print("counter: {}, likelihood: {}".format(counter, likelihood))
I added the if __name__ == "__main__": guard, which isn't necessary on POSIX platforms, but would be necessary on Windows (which doesn't support os.fork()).
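One further note (my own addition, not part of the original answer): for a 1000x1000 grid, pool.map builds a list of one million result tuples before returning; imap_unordered with a chunksize streams the results back as they complete, which can reduce memory pressure.

if __name__ == "__main__":
    pool = Pool(processes=12)
    likelihood_grid = np.empty((ra.shape[0], dec.shape[0]))
    # consume results as they arrive instead of materializing one huge list
    for (i, j), value in pool.imap_unordered(myfunc, data_stream(ra, dec),
                                             chunksize=1000):
        likelihood_grid[i, j] = value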