multiprocessing: How do I share a dict among multiple processes?

multiprocessing: How do I share a dict among multiple processes? - python

A program that creates several processes that work on a join-able queue, Q, and may eventually manipulate a global dictionary D to store results. (so each child process may use D to store its result and also see what results the other child processes are producing)
If I print the dictionary D in a child process, I see the modifications that have been done on it (i.e. on D). But after the main process joins Q, if I print D, it's an empty dict!
I understand it is a synchronization/lock issue. Can someone tell me what is happening here, and how I can synchronize access to D?

A general answer involves using a Manager object. Adapted from the docs:
from multiprocessing import Process, Manager
def f(d):
d[1] += '1'
d['2'] += 2
if __name__ == '__main__':
manager = Manager()
d = manager.dict()
d[1] = '1'
d['2'] = 2
p1 = Process(target=f, args=(d,))
p2 = Process(target=f, args=(d,))
p1.start()
p2.start()
p1.join()
p2.join()
print d
Output:
$ python mul.py
{1: '111', '2': 6}

multiprocessing is not like threading. Each child process will get a copy of the main process's memory. Generally state is shared via communication (pipes/sockets), signals, or shared memory.
Multiprocessing makes some abstractions available for your use case - shared state that's treated as local by use of proxies or shared memory: http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
Relevant sections:
http://docs.python.org/library/multiprocessing.html#shared-ctypes-objects
http://docs.python.org/library/multiprocessing.html#module-multiprocessing.managers

In addition to #senderle's here, some might also be wondering how to use the functionality of multiprocessing.Pool.
The nice thing is that there is a .Pool() method to the manager instance that mimics all the familiar API of the top-level multiprocessing.
from itertools import repeat
import multiprocessing as mp
import os
import pprint
def f(d: dict) -> None:
pid = os.getpid()
d[pid] = "Hi, I was written by process %d" % pid
if __name__ == '__main__':
with mp.Manager() as manager:
d = manager.dict()
with manager.Pool() as pool:
pool.map(f, repeat(d, 10))
# `d` is a DictProxy object that can be converted to dict
pprint.pprint(dict(d))
Output:
$ python3 mul.py
{22562: 'Hi, I was written by process 22562',
22563: 'Hi, I was written by process 22563',
22564: 'Hi, I was written by process 22564',
22565: 'Hi, I was written by process 22565',
22566: 'Hi, I was written by process 22566',
22567: 'Hi, I was written by process 22567',
22568: 'Hi, I was written by process 22568',
22569: 'Hi, I was written by process 22569',
22570: 'Hi, I was written by process 22570',
22571: 'Hi, I was written by process 22571'}
This is a slightly different example where each process just logs its process ID to the global DictProxy object d.

I'd like to share my own work that is faster than Manager's dict and is simpler and more stable than pyshmht library that uses tons of memory and doesn't work for Mac OS. Though my dict only works for plain strings and is immutable currently.
I use linear probing implementation and store keys and values pairs in a separate memory block after the table.
from mmap import mmap
import struct
from timeit import default_timer
from multiprocessing import Manager
from pyshmht import HashTable
class shared_immutable_dict:
def __init__(self, a):
self.hs = 1 << (len(a) * 3).bit_length()
kvp = self.hs * 4
ht = [0xffffffff] * self.hs
kvl = []
for k, v in a.iteritems():
h = self.hash(k)
while ht[h] != 0xffffffff:
h = (h + 1) & (self.hs - 1)
ht[h] = kvp
kvp += self.kvlen(k) + self.kvlen(v)
kvl.append(k)
kvl.append(v)
self.m = mmap(-1, kvp)
for p in ht:
self.m.write(uint_format.pack(p))
for x in kvl:
if len(x) <= 0x7f:
self.m.write_byte(chr(len(x)))
else:
self.m.write(uint_format.pack(0x80000000 + len(x)))
self.m.write(x)
def hash(self, k):
h = hash(k)
h = (h + (h >> 3) + (h >> 13) + (h >> 23)) * 1749375391 & (self.hs - 1)
return h
def get(self, k, d=None):
h = self.hash(k)
while True:
x = uint_format.unpack(self.m[h * 4:h * 4 + 4])[0]
if x == 0xffffffff:
return d
self.m.seek(x)
if k == self.read_kv():
return self.read_kv()
h = (h + 1) & (self.hs - 1)
def read_kv(self):
sz = ord(self.m.read_byte())
if sz & 0x80:
sz = uint_format.unpack(chr(sz) + self.m.read(3))[0] - 0x80000000
return self.m.read(sz)
def kvlen(self, k):
return len(k) + (1 if len(k) <= 0x7f else 4)
def __contains__(self, k):
return self.get(k, None) is not None
def close(self):
self.m.close()
uint_format = struct.Struct('>I')
def uget(a, k, d=None):
return to_unicode(a.get(to_str(k), d))
def uin(a, k):
return to_str(k) in a
def to_unicode(s):
return s.decode('utf-8') if isinstance(s, str) else s
def to_str(s):
return s.encode('utf-8') if isinstance(s, unicode) else s
def mmap_test():
n = 1000000
d = shared_immutable_dict({str(i * 2): '1' for i in xrange(n)})
start_time = default_timer()
for i in xrange(n):
if bool(d.get(str(i))) != (i % 2 == 0):
raise Exception(i)
print 'mmap speed: %d gets per sec' % (n / (default_timer() - start_time))
def manager_test():
n = 100000
d = Manager().dict({str(i * 2): '1' for i in xrange(n)})
start_time = default_timer()
for i in xrange(n):
if bool(d.get(str(i))) != (i % 2 == 0):
raise Exception(i)
print 'manager speed: %d gets per sec' % (n / (default_timer() - start_time))
def shm_test():
n = 1000000
d = HashTable('tmp', n)
d.update({str(i * 2): '1' for i in xrange(n)})
start_time = default_timer()
for i in xrange(n):
if bool(d.get(str(i))) != (i % 2 == 0):
raise Exception(i)
print 'shm speed: %d gets per sec' % (n / (default_timer() - start_time))
if __name__ == '__main__':
mmap_test()
manager_test()
shm_test()
On my laptop performance results are:
mmap speed: 247288 gets per sec
manager speed: 33792 gets per sec
shm speed: 691332 gets per sec
simple usage example:
ht = shared_immutable_dict({'a': '1', 'b': '2'})
print ht.get('a')

Maybe you can try pyshmht, sharing memory based hash table extension for Python.
Notice
It's not fully tested, just for your reference.
It currently lacks lock/sem mechanisms for multiprocessing.

Related

Why my CPU are not running 100% while I'm using multiprocessing?

I'm learning python multiprocessing and I tried this code :
def f(name):
n = 0
print('running ', name)
for i in range(1000):
for j in range(1000):
for k in range(100):
n += 1
print(n)
if __name__ == '__main__':
tab = []
for i in range(10):
p = Process(target=f, args=(i,))
p.start()
tab.append(p)
for t in tab:
t.join()
It worked well, and I saw in the monitor that CPU were all running 100%.
But when I switched to my application, which is using a shared variable for multiprocessing, I saw that CPU were all running at 40-50% and execution times are worst than without multiprocessing.
def repeat_optimization(self, cycle, nb_cycles, res):
start = time.time()
res['a'].append(a())
res['b'].append(b())
end = time.time()
print("Benchmark iteration : " + str(cycle + 1) + "/" + str(nb_cycles) + " completed in " + str(round(end - start, 2)) + "s")
def parallelize_repeat_optimization(self, nb_cycles):
manager = Manager()
res = manager.dict()
res['a'] = []
res['b'] = []
tab = []
for cycle in range(nb_cycles):
p = Process(target=self.repeat_optimization, args=(cycle, nb_cycles, res))
p.start()
tab.append(p)
for t in tab:
t.join()
return res
What's wrong with my code, since I do not see a main difference with the first example ?

In any multiprocessing situation you will always have higher utilization if each process is completely independent. In this case, there are hidden costs to using a Manager to synchronize the two processes. Under the hood, the processes have to communicate with each other to make sure they do not overwrite each others' changes, and this means each process will waste a lot of time waiting for responses from the other process.

concurrent.futures.ThreadPoolExecutor is slower than for list comprehension

I'm testing a trivial function using list comprehension vs concurrent.futures:
class Test:
#staticmethod
def something(times = 1):
return sum([1 for i in range(times)])
#staticmethod
def simulate1(function, N):
l = []
for i in range(N):
outcome = function()
l.append(outcome)
return sum(l) / N
#staticmethod
def simulate2(function, N):
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
l = [outcome for outcome in executor.map(lambda x: function(), range(N))]
return sum(l) / N
#staticmethod
def simulate3(function, N):
import concurrent.futures
l = 0
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
futures = [executor.submit(function) for i in range(N)]
for future in concurrent.futures.as_completed(futures):
l += future.result()
return l / N
def simulation():
simulationRate = 100000
import datetime
s = datetime.datetime.now()
o = Test.simulate1(lambda : Test.something(10), simulationRate)
print((datetime.datetime.now() - s))
s = datetime.datetime.now()
o = Test.simulate2(lambda : Test.something(10), simulationRate)
print((datetime.datetime.now() - s))
s = datetime.datetime.now()
o = Test.simulate3(lambda : Test.something(10), simulationRate)
print((datetime.datetime.now() - s))
simulation()
Measuring the time, I get:
0:00:00.258000
0:00:10.348000
0:00:10.556000
I'm getting started with concurrency so I don't understand what is the bottleneck that prevents the threads to run faster.

if you change your task function to this, you will see the difference:
def something(n):
""" simulate doing some io based task.
"""
time.sleep(0.001)
return sum(1 for i in range(n))
On my mac pro, this gives:
0:00:13.774700
0:00:01.591226
0:00:01.489159
The concurrent.future is obvious more faster this time.
The reason is that: you are simulating a cpu based task, because of python's GIL, concurrent.future make it slower.
concurrent.future provides a high-level interface for asynchronously executing callables, you are using it for wrong scene.

Python Threads are not Improving Speed

In order to speed up a certain list processing logic, I wrote a decorator that would 1) intercept incoming function call 2) take its input list, break it into multiple pieces 4) pass these pieces to the original function on seperate threads 5) combine output and return
I thought it was a pretty neat idea, until I coded it and saw there was no change in speed! Even though I see multiple cores busy on htop, multithreaded version is actually slower than the single thread version.
Does this have to do with the infamous cpython GIL?
Thanks!
from threading import Thread
import numpy as np
import time
# breaks a list into n list of lists
def split(a, n):
k, m = len(a) / n, len(a) % n
return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))
THREAD_NUM = 8
def parallel_compute(fn):
class Worker(Thread):
def __init__(self, *args):
Thread.__init__(self)
self.result = None
self.args = args
def run(self):
self.result = fn(*self.args)
def new_compute(*args, **kwargs):
threads = [Worker(args[0], args[1], args[2], x) for x in split(args[3], THREAD_NUM)]
for x in threads: x.start()
for x in threads: x.join()
final_res = []
for x in threads: final_res.extend(x.result)
return final_res
return new_compute
# some function that does a lot of computation
def f(x): return np.abs(np.tan(np.cos(np.sqrt(x**2))))
class Foo:
#parallel_compute
def compute(self, bla, blah, input_list):
return map(f, input_list)
inp = [i for i in range(40*1000*100)]
#inp = [1,2,3,4,5,6,7]
if __name__ == "__main__":
o = Foo()
start = time.time()
res = o.compute(None, None, inp)
end = time.time()
print 'parallel', end - start
Single thread version
import time, fast_one, numpy as np
class SlowFoo:
def compute(self, bla, blah, input_list):
return map(fast_one.f, input_list)
if __name__ == "__main__":
o = SlowFoo()
start = time.time()
res = np.array(o.compute(None, None, fast_one.inp))
end = time.time()
print 'single', end - start
And here is the multiprocessing version that gives "PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed".
import pathos.multiprocessing as mp
import numpy as np, dill
import time
def split(a, n):
k, m = len(a) / n, len(a) % n
return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))
def f(x): return np.abs(np.tan(np.cos(np.sqrt(x**2))))
def compute(input_list):
return map(f, input_list)
D = 2; pool = mp.Pool(D)
def parallel_compute(fn):
def new_compute(*args, **kwargs):
inp = []
for x in split(args[0], D): inp.append(x)
outputs_async = pool.map_async(fn, inp)
outputs = outputs_async.get()
outputs = [y for x in outputs for y in x]
return outputs
return new_compute
compute = parallel_compute(compute)
inp = [i for i in range(40*1000)]
if __name__ == "__main__":
start = time.time()
res = compute(inp)
end = time.time()
print 'parallel', end - start
print len(res)

Yes, when your threads are doing CPU-bound work implemented in Python (not by, say, C extensions which can release the GIL before and after marshalling/demarshalling data from Python structures), the GIL is a problem here.
I'd suggest using a multiprocessing model, a Python implementation that doesn't have it (IronPython, Jython, etc), or a different language altogether (if you're doing performance-sensitive work, there's no end of languages nearly as fluid as Python but with considerably better runtime performance).

Alternatively you can redsign and start all parallel Code in subprocesses.
You need worker-threads which start a subprocess for calculation.
Those subprocesses can run really parallel.

Memoization, Classes, and Multiprocessing in Python

I am trying to do some computations using the multiprocessing module in python 2.7.2.
My code is like this:
from multiprocessing import Pool
import sys
sys.setrecursionlimit(10000)
partitions = []
class Partitions:
parts = {} #My goal is to use this dict to speed
#up calculations in every process that
#uses it, without having to build it up
#from nothing each time
def __init__(self):
pass
def p1(self, k, n):
if (k,n) in Partitions.parts:
return Partitions.parts[(k, n)]
if k>n:
return 0
if k==n:
return 1
Partitions.parts[(k,n)] = self.p1(k+1, n) + self.p1(k, n-k)
return Partitions.parts[(k,n)]
def P(self, n):
result = 0
for k in xrange(1,n/2 + 1):
result += self.p1(k, n-k)
return 1 + result
p = Partitions()
def log(results):
if results:
partitions.extend(results)
return None
def partWorker(start,stop):
ps = []
for n in xrange(start, stop):
ps.append(((1,n), p.P(n)))
return ps
def main():
pool = Pool()
step = 150
for i in xrange(0,301,step):
pool.apply_async(partWorker, (i, i+step), callback = log)
pool.close()
pool.join()
return None
if __name__=="__main__":
main()
I am new to this, I basically copied the format of the prime code on this page:
python prime crunching: processing pool is slower?
Can I get process running in each core all looking at the same dictionary to assist their
calculations? The way it behaves now, each process creates it's own dictionaries and it eats up ram like crazy.

I'm not sure if this is what you want ... but, take a look at multiprocessing.Manager ( http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes ). Managers allow you to share a dict between processes.

Multiprocessing: How to use Pool.map on a function defined in a class?

When I run something like:
from multiprocessing import Pool
p = Pool(5)
def f(x):
return x*x
p.map(f, [1,2,3])
it works fine. However, putting this as a function of a class:
class calculate(object):
def run(self):
def f(x):
return x*x
p = Pool()
return p.map(f, [1,2,3])
cl = calculate()
print cl.run()
Gives me the following error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/sw/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/sw/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/sw/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I've seen a post from Alex Martelli dealing with the same kind of problem, but it wasn't explicit enough.

I could not use the code posted so far because code using "multiprocessing.Pool" do not work with lambda expressions and code not using "multiprocessing.Pool" spawn as many processes as there are work items.
I adapted the code s.t. it spawns a predefined amount of workers and only iterates through the input list if there exists an idle worker. I also enabled the "daemon" mode for the workers s.t. ctrl-c works as expected.
import multiprocessing
def fun(f, q_in, q_out):
while True:
i, x = q_in.get()
if i is None:
break
q_out.put((i, f(x)))
def parmap(f, X, nprocs=multiprocessing.cpu_count()):
q_in = multiprocessing.Queue(1)
q_out = multiprocessing.Queue()
proc = [multiprocessing.Process(target=fun, args=(f, q_in, q_out))
for _ in range(nprocs)]
for p in proc:
p.daemon = True
p.start()
sent = [q_in.put((i, x)) for i, x in enumerate(X)]
[q_in.put((None, None)) for _ in range(nprocs)]
res = [q_out.get() for _ in range(len(sent))]
[p.join() for p in proc]
return [x for i, x in sorted(res)]
if __name__ == '__main__':
print(parmap(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8]))

Multiprocessing and pickling is broken and limited unless you jump outside the standard library.
If you use a fork of multiprocessing called pathos.multiprocesssing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in python.
pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6]))
See discussions:
What can multiprocessing and dill do together?
and:
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization
It even handles the code you wrote initially, without modification, and from the interpreter. Why do anything else that's more fragile and specific to a single case?
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class calculate(object):
... def run(self):
... def f(x):
... return x*x
... p = Pool()
... return p.map(f, [1,2,3])
...
>>> cl = calculate()
>>> print cl.run()
[1, 4, 9]
Get the code here:
https://github.com/uqfoundation/pathos
And, just to show off a little more of what it can do:
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>>
>>> p = Pool(4)
>>>
>>> def add(x,y):
... return x+y
...
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>>
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>>
>>> class Test(object):
... def plus(self, x, y):
... return x+y
...
>>> t = Test()
>>>
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>>
>>> res = p.amap(t.plus, x, y)
>>> res.get()
[4, 6, 8, 10]

I also was annoyed by restrictions on what sort of functions pool.map could accept. I wrote the following to circumvent this. It appears to work, even for recursive use of parmap.
from multiprocessing import Process, Pipe
from itertools import izip
def spawn(f):
def fun(pipe, x):
pipe.send(f(x))
pipe.close()
return fun
def parmap(f, X):
pipe = [Pipe() for x in X]
proc = [Process(target=spawn(f), args=(c, x)) for x, (p, c) in izip(X, pipe)]
[p.start() for p in proc]
[p.join() for p in proc]
return [p.recv() for (p, c) in pipe]
if __name__ == '__main__':
print parmap(lambda x: x**x, range(1, 5))

There is currently no solution to your problem, as far as I know: the function that you give to map() must be accessible through an import of your module. This is why robert's code works: the function f() can be obtained by importing the following code:
def f(x):
return x*x
class Calculate(object):
def run(self):
p = Pool()
return p.map(f, [1,2,3])
if __name__ == '__main__':
cl = Calculate()
print cl.run()
I actually added a "main" section, because this follows the recommendations for the Windows platform ("Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects").
I also added an uppercase letter in front of Calculate, so as to follow PEP 8. :)

The solution by mrule is correct but has a bug: if the child sends back a large amount of data, it can fill the pipe's buffer, blocking on the child's pipe.send(), while the parent is waiting for the child to exit on pipe.join(). The solution is to read the child's data before join()ing the child. Furthermore the child should close the parent's end of the pipe to prevent a deadlock. The code below fixes that. Also be aware that this parmap creates one process per element in X. A more advanced solution is to use multiprocessing.cpu_count() to divide X into a number of chunks, and then merge the results before returning. I leave that as an exercise to the reader so as not to spoil the conciseness of the nice answer by mrule. ;)
from multiprocessing import Process, Pipe
from itertools import izip
def spawn(f):
def fun(ppipe, cpipe,x):
ppipe.close()
cpipe.send(f(x))
cpipe.close()
return fun
def parmap(f,X):
pipe=[Pipe() for x in X]
proc=[Process(target=spawn(f),args=(p,c,x)) for x,(p,c) in izip(X,pipe)]
[p.start() for p in proc]
ret = [p.recv() for (p,c) in pipe]
[p.join() for p in proc]
return ret
if __name__ == '__main__':
print parmap(lambda x:x**x,range(1,5))

I've also struggled with this. I had functions as data members of a class, as a simplified example:
from multiprocessing import Pool
import itertools
pool = Pool()
class Example(object):
def __init__(self, my_add):
self.f = my_add
def add_lists(self, list1, list2):
# Needed to do something like this (the following line won't work)
return pool.map(self.f,list1,list2)
I needed to use the function self.f in a Pool.map() call from within the same class and self.f did not take a tuple as an argument. Since this function was embedded in a class, it was not clear to me how to write the type of wrapper other answers suggested.
I solved this problem by using a different wrapper that takes a tuple/list, where the first element is the function, and the remaining elements are the arguments to that function, called eval_func_tuple(f_args). Using this, the problematic line can be replaced by return pool.map(eval_func_tuple, itertools.izip(itertools.repeat(self.f), list1, list2)). Here is the full code:
File: util.py
def add(a, b): return a+b
def eval_func_tuple(f_args):
"""Takes a tuple of a function and args, evaluates and returns result"""
return f_args[0](*f_args[1:])
File: main.py
from multiprocessing import Pool
import itertools
import util
pool = Pool()
class Example(object):
def __init__(self, my_add):
self.f = my_add
def add_lists(self, list1, list2):
# The following line will now work
return pool.map(util.eval_func_tuple,
itertools.izip(itertools.repeat(self.f), list1, list2))
if __name__ == '__main__':
myExample = Example(util.add)
list1 = [1, 2, 3]
list2 = [10, 20, 30]
print myExample.add_lists(list1, list2)
Running main.py will give [11, 22, 33]. Feel free to improve this, for example eval_func_tuple could also be modified to take keyword arguments.
On another note, in another answers, the function "parmap" can be made more efficient for the case of more Processes than number of CPUs available. I'm copying an edited version below. This is my first post and I wasn't sure if I should directly edit the original answer. I also renamed some variables.
from multiprocessing import Process, Pipe
from itertools import izip
def spawn(f):
def fun(pipe,x):
pipe.send(f(x))
pipe.close()
return fun
def parmap(f,X):
pipe=[Pipe() for x in X]
processes=[Process(target=spawn(f),args=(c,x)) for x,(p,c) in izip(X,pipe)]
numProcesses = len(processes)
processNum = 0
outputList = []
while processNum < numProcesses:
endProcessNum = min(processNum+multiprocessing.cpu_count(), numProcesses)
for proc in processes[processNum:endProcessNum]:
proc.start()
for proc in processes[processNum:endProcessNum]:
proc.join()
for proc,c in pipe[processNum:endProcessNum]:
outputList.append(proc.recv())
processNum = endProcessNum
return outputList
if __name__ == '__main__':
print parmap(lambda x:x**x,range(1,5))

I know that this question was asked 8 years and 10 months ago but I want to present you my solution:
from multiprocessing import Pool
class Test:
def __init__(self):
self.main()
#staticmethod
def methodForMultiprocessing(x):
print(x*x)
def main(self):
if __name__ == "__main__":
p = Pool()
p.map(Test.methodForMultiprocessing, list(range(1, 11)))
p.close()
TestObject = Test()
You just need to make your class function into a static method. But it's also possible with a class method:
from multiprocessing import Pool
class Test:
def __init__(self):
self.main()
#classmethod
def methodForMultiprocessing(cls, x):
print(x*x)
def main(self):
if __name__ == "__main__":
p = Pool()
p.map(Test.methodForMultiprocessing, list(range(1, 11)))
p.close()
TestObject = Test()
Tested in Python 3.7.3

I know this was asked over 6 years ago now, but just wanted to add my solution, as some of the suggestions above seem horribly complicated, but my solution was actually very simple.
All I had to do was wrap the pool.map() call to a helper function. Passing the class object along with args for the method as a tuple, which looked a bit like this.
def run_in_parallel(args):
return args[0].method(args[1])
myclass = MyClass()
method_args = [1,2,3,4,5,6]
args_map = [ (myclass, arg) for arg in method_args ]
pool = Pool()
pool.map(run_in_parallel, args_map)

I took klaus se's and aganders3's answer, and made a documented module that is more readable and holds in one file. You can just add it to your project. It even has an optional progress bar !
"""
The ``processes`` module provides some convenience functions
for using parallel processes in python.
Adapted from http://stackoverflow.com/a/16071616/287297
Example usage:
print prll_map(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8], 32, verbose=True)
Comments:
"It spawns a predefined amount of workers and only iterates through the input list
if there exists an idle worker. I also enabled the "daemon" mode for the workers so
that KeyboardInterupt works as expected."
Pitfalls: all the stdouts are sent back to the parent stdout, intertwined.
Alternatively, use this fork of multiprocessing:
https://github.com/uqfoundation/multiprocess
"""
# Modules #
import multiprocessing
from tqdm import tqdm
################################################################################
def apply_function(func_to_apply, queue_in, queue_out):
while not queue_in.empty():
num, obj = queue_in.get()
queue_out.put((num, func_to_apply(obj)))
################################################################################
def prll_map(func_to_apply, items, cpus=None, verbose=False):
# Number of processes to use #
if cpus is None: cpus = min(multiprocessing.cpu_count(), 32)
# Create queues #
q_in = multiprocessing.Queue()
q_out = multiprocessing.Queue()
# Process list #
new_proc = lambda t,a: multiprocessing.Process(target=t, args=a)
processes = [new_proc(apply_function, (func_to_apply, q_in, q_out)) for x in range(cpus)]
# Put all the items (objects) in the queue #
sent = [q_in.put((i, x)) for i, x in enumerate(items)]
# Start them all #
for proc in processes:
proc.daemon = True
proc.start()
# Display progress bar or not #
if verbose:
results = [q_out.get() for x in tqdm(range(len(sent)))]
else:
results = [q_out.get() for x in range(len(sent))]
# Wait for them to finish #
for proc in processes: proc.join()
# Return results #
return [x for i, x in sorted(results)]
################################################################################
def test():
def slow_square(x):
import time
time.sleep(2)
return x**2
objs = range(20)
squares = prll_map(slow_square, objs, 4, verbose=True)
print "Result: %s" % squares
EDIT: Added #alexander-mcfarlane suggestion and a test function

Functions defined in classes (even within functions within classes) don't really pickle. However, this works:
def f(x):
return x*x
class calculate(object):
def run(self):
p = Pool()
return p.map(f, [1,2,3])
cl = calculate()
print cl.run()

I modified klaus se's method because while it was working for me with small lists, it would hang when the number of items was ~1000 or greater. Instead of pushing the jobs one at a time with the None stop condition, I load up the input queue all at once and just let the processes munch on it until it's empty.
from multiprocessing import cpu_count, Queue, Process
def apply_func(f, q_in, q_out):
while not q_in.empty():
i, x = q_in.get()
q_out.put((i, f(x)))
# map a function using a pool of processes
def parmap(f, X, nprocs = cpu_count()):
q_in, q_out = Queue(), Queue()
proc = [Process(target=apply_func, args=(f, q_in, q_out)) for _ in range(nprocs)]
sent = [q_in.put((i, x)) for i, x in enumerate(X)]
[p.start() for p in proc]
res = [q_out.get() for _ in sent]
[p.join() for p in proc]
return [x for i,x in sorted(res)]
Edit: unfortunately now I am running into this error on my system: Multiprocessing Queue maxsize limit is 32767, hopefully the workarounds there will help.

You can run your code without any issues if you somehow manually ignore the Pool object from the list of objects in the class because it is not pickleable as the error says. You can do this with the __getstate__ function (look here too) as follow. The Pool object will try to find the __getstate__ and __setstate__ functions and execute them if it finds it when you run map, map_async etc:
class calculate(object):
def __init__(self):
self.p = Pool()
def __getstate__(self):
self_dict = self.__dict__.copy()
del self_dict['p']
return self_dict
def __setstate__(self, state):
self.__dict__.update(state)
def f(self, x):
return x*x
def run(self):
return self.p.map(self.f, [1,2,3])
Then do:
cl = calculate()
cl.run()
will give you the output:
[1, 4, 9]
I've tested the above code in Python 3.x and it works.

Here is my solution, which I think is a bit less hackish than most others here. It is similar to nightowl's answer.
someclasses = [MyClass(), MyClass(), MyClass()]
def method_caller(some_object, some_method='the method'):
return getattr(some_object, some_method)()
othermethod = partial(method_caller, some_method='othermethod')
with Pool(6) as pool:
result = pool.map(othermethod, someclasses)

This may not be a very good solution but in my case, I solve it like this.
from multiprocessing import Pool
def foo1(data):
self = data.get('slf')
lst = data.get('lst')
return sum(lst) + self.foo2()
class Foo(object):
def __init__(self, a, b):
self.a = a
self.b = b
def foo2(self):
return self.a**self.b
def foo(self):
p = Pool(5)
lst = [1, 2, 3]
result = p.map(foo1, (dict(slf=self, lst=lst),))
return result
if __name__ == '__main__':
print(Foo(2, 4).foo())
I had to pass self to my function as I have to access attributes and functions of my class through that function. This is working for me. Corrections and suggestions are always welcome.

Here is a boilerplate I wrote for using multiprocessing Pool in python3, specifically python3.7.7 was used to run the tests. I got my fastest runs using imap_unordered. Just plug in your scenario and try it out. You can use timeit or just time.time() to figure out which works best for you.
import multiprocessing
import time
NUMBER_OF_PROCESSES = multiprocessing.cpu_count()
MP_FUNCTION = 'starmap' # 'imap_unordered' or 'starmap' or 'apply_async'
def process_chunk(a_chunk):
print(f"processig mp chunk {a_chunk}")
return a_chunk
map_jobs = [1, 2, 3, 4]
result_sum = 0
s = time.time()
if MP_FUNCTION == 'imap_unordered':
pool = multiprocessing.Pool(processes=NUMBER_OF_PROCESSES)
for i in pool.imap_unordered(process_chunk, map_jobs):
result_sum += i
elif MP_FUNCTION == 'starmap':
pool = multiprocessing.Pool(processes=NUMBER_OF_PROCESSES)
try:
map_jobs = [(i, ) for i in map_jobs]
result_sum = pool.starmap(process_chunk, map_jobs)
result_sum = sum(result_sum)
finally:
pool.close()
pool.join()
elif MP_FUNCTION == 'apply_async':
with multiprocessing.Pool(processes=NUMBER_OF_PROCESSES) as pool:
result_sum = [pool.apply_async(process_chunk, [i, ]).get() for i in map_jobs]
result_sum = sum(result_sum)
print(f"result_sum is {result_sum}, took {time.time() - s}s")
In the above scenario imap_unordered actually seems to perform the worst for me. Try out your case and benchmark it on the machine you plan to run it on. Also read up on Process Pools. Cheers!

I'm not sure if this approach has been taken but a work around i'm using is:
from multiprocessing import Pool
t = None
def run(n):
return t.f(n)
class Test(object):
def __init__(self, number):
self.number = number
def f(self, x):
print x * self.number
def pool(self):
pool = Pool(2)
pool.map(run, range(10))
if __name__ == '__main__':
t = Test(9)
t.pool()
pool = Pool(2)
pool.map(run, range(10))
Output should be:
0
9
18
27
36
45
54
63
72
81
0
9
18
27
36
45
54
63
72
81

class Calculate(object):
# Your instance method to be executed
def f(self, x, y):
return x*y
if __name__ == '__main__':
inp_list = [1,2,3]
y = 2
cal_obj = Calculate()
pool = Pool(2)
results = pool.map(lambda x: cal_obj.f(x, y), inp_list)
There is a possibility that you would want to apply this function for each different instance of the class. Then here is the solution for that also
class Calculate(object):
# Your instance method to be executed
def __init__(self, x):
self.x = x
def f(self, y):
return self.x*y
if __name__ == '__main__':
inp_list = [Calculate(i) for i in range(3)]
y = 2
pool = Pool(2)
results = pool.map(lambda x: x.f(y), inp_list)

From http://www.rueckstiess.net/research/snippets/show/ca1d7d90 and http://qingkaikong.blogspot.com/2016/12/python-parallel-method-in-class.html
We can make an external function and seed it with the class self object:
from joblib import Parallel, delayed
def unwrap_self(arg, **kwarg):
return square_class.square_int(*arg, **kwarg)
class square_class:
def square_int(self, i):
return i * i
def run(self, num):
results = []
results = Parallel(n_jobs= -1, backend="threading")\
(delayed(unwrap_self)(i) for i in zip([self]*len(num), num))
print(results)
OR without joblib:
from multiprocessing import Pool
import time
def unwrap_self_f(arg, **kwarg):
return C.f(*arg, **kwarg)
class C:
def f(self, name):
print 'hello %s,'%name
time.sleep(5)
print 'nice to meet you.'
def run(self):
pool = Pool(processes=2)
names = ('frank', 'justin', 'osi', 'thomas')
pool.map(unwrap_self_f, zip([self]*len(names), names))
if __name__ == '__main__':
c = C()
c.run()

To implement multiprocessing in aws lambda we have two ways.
Note : Threadpool doesn't work in aws lambda
use the example solution which is provided by aws team
please use this link https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
use this package https://pypi.org/project/lambda-multiprocessing/
i have implemented my lambda function with both the solution and both is working fine can't share my code here but this 2 links will help you for sure.
i find 2 nd way more easy to implement.

There are also some libraries to make this easier, for example autothread (only for Python 3.6 and up):
import autothread
class calculate(object):
def run(self):
#autothread.multiprocessed()
def f(x: int):
return x*x
return f([1,2,3])
cl = calculate()
print(cl.run())
You can also take a look at lox.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

multiprocessing: How do I share a dict among multiple processes? - python

Maybe you can try pyshmht, sharing memory based hash table extension for Python. Notice It's not fully tested, just for your reference. It currently lacks lock/sem mechanisms for multiprocessing.

Related

Why my CPU are not running 100% while I'm using multiprocessing?

concurrent.futures.ThreadPoolExecutor is slower than for list comprehension

Python Threads are not Improving Speed

Memoization, Classes, and Multiprocessing in Python

Multiprocessing: How to use Pool.map on a function defined in a class?

Categories

Resources