I have a function that does a calculation and saves the state of the calculation in a result dictionary (a mutable default argument). I first run it, then start several processes using the multiprocessing module. I need to run the function again in each of those parallel processes, but once the function has run, I need the cached state to be returned; the value must not be recalculated. This requirement doesn't make sense in my example, but I can't think of a simple realistic example that would require this restriction. Using a dict as a mutable default argument works in a single process, but it doesn't work with the multiprocessing module. What approach can I use to get the same effect?
Note that the state value is something (a dictionary containing class values) that cannot be passed to the multiple processes as an argument afaik.
The SO question Python multiprocessing: How do I share a dict among multiple processes? seems to cover similar ground. Perhaps I can use a Manager to do what I need, but it is not obvious how. Alternatively, one could perhaps save the value to a global object, per https://stackoverflow.com/a/4534956/350713, but that doesn't seem very elegant.
def foo(result={}):
    if result:
        print("returning cached result")
        return result
    result[1] = 2
    return result

def parafn():
    from multiprocessing import Pool
    pool = Pool(processes=2)
    arglist = []
    foo()
    for i in range(4):
        arglist.append({})
    results = []
    r = pool.map_async(foo, arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    return results

print(parafn())
UPDATE: Thanks for the comments. I've got a working example now, posted below.
I think the safest way to exchange data between processes is with a Queue. The multiprocessing module provides two types, Queue and JoinableQueue; see the documentation:
http://docs.python.org/library/multiprocessing.html#exchanging-objects-between-processes
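As a rough illustration of the Queue approach, here is a minimal sketch in which each child process puts its piece of state on a shared multiprocessing.Queue and the parent merges the pieces. The names compute_state, worker and collect are hypothetical, and the doubling stands in for the real calculation.

```python
import multiprocessing as mp


def compute_state(x):
    """Stand-in for the expensive calculation (hypothetical)."""
    return {x: x * 2}


def worker(x, queue):
    # Each child computes its piece and sends it back through the queue.
    queue.put(compute_state(x))


def collect(n):
    """Start n workers and merge what they put on the queue."""
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, queue)) for i in range(n)]
    for p in procs:
        p.start()
    merged = {}
    for _ in procs:
        # Read from the queue before join() so no child blocks on a full pipe.
        merged.update(queue.get())
    for p in procs:
        p.join()
    return merged


if __name__ == "__main__":
    print(collect(4))
```

Anything put on the queue has to be picklable, so this only helps if the cached state can be serialized.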
This code will not win any beauty prizes, but it works for me.
This example is similar to the one in the question, with some minor changes.
The add_to_d construct is a bit awkward, but I don't see a better way to do this.
Brief summary: I copy the state of foo's d (which is a mutable default argument) from the parent process into the copy of foo that lives in each new process created by the pool. Once this is done, foo in the new processes will not recalculate the cached values.
This seems to be what the pool initializer is for, though the documentation is not very explicit.
class bar(object):
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return "<bar " + str(self.x) + ">"

def foo(x=None, add_to_d=None, d={}):
    if add_to_d:
        d.update(add_to_d)
    if x is None:
        return
    if x in d:
        print("returning cached result, d is %s, x is %s" % (d, x))
        return d[x]
    d[x] = bar(x)
    return d[x]

def finit(cacheval):
    # runs once in every worker process when the pool starts
    foo(x=None, add_to_d=cacheval)

def parafn():
    from multiprocessing import Pool
    arglist = []
    foo(1)
    # foo.__defaults__[2] is the cache dict d (foo.func_defaults exists in Python 2 only)
    pool = Pool(processes=2, initializer=finit, initargs=[foo.__defaults__[2]])
    arglist = range(4)
    results = []
    r = pool.map_async(foo, iterable=arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    print(parafn())
I have a lookup table (LUT), which is a very large dictionary (24 GB).
And I have millions of inputs to query against it.
I want to split the millions of inputs across 32 jobs, and run them in parallel.
Due to the space constraint, I cannot run multiple Python scripts, because that will result in memory overload.
I want to use the multiprocessing module to load the LUT just once, and then have different processes look it up while sharing it as a global variable, without having to duplicate it.
However, when I look at htop, it seems each subprocess is re-creating the LUT? I say this because the numbers under VIRT, RES and SHR are very high.
But at the same time I don't see the additional memory used in the Mem row; it increased from 11 GB to 12.3 GB and just hovers there.
So I'm confused: is it, or is it not, re-creating the LUT within each subprocess?
How should I proceed to make sure I am running the work in parallel without duplicating the LUT in each subprocess?
The code is shown below.
(In this experiment I'm only using a 1 GB LUT, so don't worry about it not being 24 GB.)
import os, sys, time, pprint, pdb, datetime
import random
import threading, multiprocessing

## Print the process/thread details
def getDetails(idx):
    pid = os.getpid()
    threadName = threading.current_thread().name
    processName = multiprocessing.current_process().name
    print(f"{idx})\tpid={pid}\tprocessName={processName}\tthreadName={threadName} ")
    return pid, threadName, processName

def ComplexAlgorithm(value):
    # Instead of just lookup like this
    # the real algorithm is some complex algorithm that performs some search
    return value in LUT

## Querying the 24Gb LUT from my millions of lines of input
def PerformMatching(idx, NumberOfLines):
    pid, threadName, processName = getDetails(idx)
    NumberMatches = 0
    for _ in range(NumberOfLines):
        # I will actually read the contents from my file live,
        # but here just assume i generate random numbers
        value = random.randint(-100, 100)
        if ComplexAlgorithm(value): NumberMatches += 1
    print(f"\t{idx}) | LUT={len(LUT)} | NumberMatches={NumberMatches} | done")

if __name__ == "__main__":
    ## Init
    num_processes = 9
    # this is just a pseudo-call to show you the structure of my LUT, the real one is larger
    # (children only see this global when processes are started with fork)
    LUT = {i: set([i]) for i in range(1000)}
    ## Store the multiple filenames
    ListOfLists = []
    for idx in range(num_processes):
        NumberOfLines = 10000
        ListOfLists.append( NumberOfLines )
    ## Init the processes
    ProcessList = []
    for processIndex in range(num_processes):
        ProcessList.append(
            multiprocessing.Process(
                target=PerformMatching,
                args=(processIndex, ListOfLists[processIndex])
            )
        )
        ProcessList[processIndex].start()
    ## Wait until the process terminates.
    for processIndex in range(num_processes):
        ProcessList[processIndex].join()
    ## Done
If you want to go the route of using a multiprocessing.Manager, this is how you could do it. The trade-off is that the dictionary is represented by a reference to a proxy for the actual dictionary that exists in a different address space and consequently every dictionary reference results in the equivalent of a remote procedure call. In other words, access is much slower compared with a "regular" dictionary.
In the demo program below, I have only defined a couple of methods for my managed dictionary, but you can define whatever you need. I have also used a multiprocessing pool instead of explicitly starting individual processes; you might consider doing likewise.
from multiprocessing.managers import BaseManager, BaseProxy
from multiprocessing import Pool
from functools import partial

def worker(LUT, key):
    return LUT[key]

class MyDict:
    def __init__(self):
        """ initialize the dictionary """
        # the very large dictionary reduced for demo purposes:
        self._dict = {i: i for i in range(100)}

    def get(self, obj, default=None):
        """ delegates to underlying dict """
        return self._dict.get(obj, default)

    def __getitem__(self, obj):
        """ delegates to underlying dict """
        return self._dict[obj]

class MyDictManager(BaseManager):
    pass

class MyDictProxy(BaseProxy):
    _exposed_ = ('get', '__getitem__')

    def get(self, *args, **kwargs):
        return self._callmethod('get', args, kwargs)

    def __getitem__(self, *args, **kwargs):
        return self._callmethod('__getitem__', args, kwargs)

def main():
    MyDictManager.register('MyDict', MyDict, MyDictProxy)
    with MyDictManager() as manager:
        my_dict = manager.MyDict()
        pool = Pool()
        # pass proxy instead of actual LUT:
        results = pool.map(partial(worker, my_dict), range(100))
        print(sum(results))

if __name__ == '__main__':
    main()
Prints:
4950
Discussion
Python comes with a managed dict class built in obtainable with multiprocessing.Manager().dict(). But initializing such a large number of entries with such a dictionary would be very inefficient based on my prior comment that each access would be relatively expensive. It seemed to me that it would be less expensive to create our own managed class that had an underlying "regular" dictionary that could be initialized directly when the managed class is constructed and not via the proxy reference. And while it is true that the managed dict that comes with Python can be instantiated with an already built dictionary, which avoids that inefficiency problem, my concern is that memory efficiency would suffer because you would have two instances of the dictionary, i.e. the "regular" dictionary and the "managed" dictionary.
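For comparison, here is a minimal sketch of the built-in alternative mentioned above: a managed dict created with Manager().dict() and filled from an already built dictionary in one call. The helper build_lut is hypothetical and stands in for loading the real LUT; the memory concern described above (two copies of the data while the managed dict is being created) applies to this version.

```python
from functools import partial
from multiprocessing import Manager, Pool


def build_lut(n):
    """Build an ordinary dict; stand-in for loading the real LUT (hypothetical)."""
    return {i: i for i in range(n)}


def worker(lut, key):
    # lut is a proxy here: every lookup is a round trip to the manager process.
    return lut[key]


if __name__ == "__main__":
    with Manager() as manager:
        # Copies the already built dict into the manager process in one call,
        # instead of inserting entries one by one through the proxy.
        shared = manager.dict(build_lut(100))
        with Pool(processes=2) as pool:
            results = pool.map(partial(worker, shared), range(100))
        print(sum(results))
```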
I'm doing a lot of calculations with a python script. As it is CPU-bound my usual approach with the threading module didn't yield any performance improvements.
I was now trying to use Multiprocessing instead of Multithreading to better use my CPU and speed up the lengthy calculations.
I found some example code here on Stack Overflow, but I can't get the script to accept more than one argument. Could somebody help me out with this? I've never used these modules before and I'm pretty sure I'm using Pool.map wrong. Any help is appreciated. Other ways to accomplish multiprocessing are also welcome.
from multiprocessing import Pool

def calculation(foo, bar, foobar, baz):
    # Do a lot of calculations based on the variables
    # Later the result is written to a file.
    result = foo * bar * foobar * baz
    print(result)

if __name__ == '__main__':
    for foo in range(3):
        for bar in range(5):
            for baz in range(4):
                for foobar in range(10):
                    Pool.map(calculation, foo, bar, foobar, baz)
                    Pool.close()
                    Pool.join()
You are, as you suspected, using map wrong, in more ways than one.
The point of map is to call a function on all elements of an iterable. Just like the builtin map function, but in parallel. If you want to queue a single call, just use apply_async.
For the problem you were specifically asking about: map takes a single-argument function. If you want to pass multiple arguments, you can modify or wrap your function to take a single tuple instead of multiple arguments (I'll show this at the end), or just use starmap. Or, if you want to use apply_async, it takes a function of multiple arguments, but you pass apply_async an argument tuple, not separate arguments.
You need to call map on a Pool instance, not the Pool class. What you're trying to do is akin to trying to read from the file type instead of reading from a particular open file.
You're trying to close and join the Pool after every iteration. You don't want to do that until you've finished all of them, or your code will just wait for the first one to finish, and then raise an exception for the second one.
So, the smallest change that would work is:
if __name__ == '__main__':
    pool = Pool()
    for foo in range(3):
        for bar in range(5):
            for baz in range(4):
                for foobar in range(10):
                    pool.apply_async(calculation, (foo, bar, foobar, baz))
    pool.close()
    pool.join()
Notice that I kept everything inside the if __name__ == '__main__': block—including the new Pool() constructor. I won't show this in the later examples, but it's necessary for all of them, for reasons explained in the Programming guidelines section of the docs.1
If you instead want to use one of the map functions, you need an iterable full of arguments, like this:
pool = Pool()
args = ((foo, bar, foobar, baz)
        for foo in range(3)
        for bar in range(5)
        for baz in range(4)
        for foobar in range(10))
pool.starmap(calculation, args)
pool.close()
pool.join()
Or, more simply:
import itertools

pool = Pool()
# ranges are listed in parameter order: foo, bar, foobar, baz
pool.starmap(calculation, itertools.product(range(3), range(5), range(10), range(4)))
pool.close()
pool.join()
Assuming you're not using an old version of Python, you can simplify it even further by using the Pool in a with statement:
with Pool() as pool:
    pool.starmap(calculation,
                 itertools.product(range(3), range(5), range(10), range(4)))
One problem with using map or starmap is that it does extra work to make sure you get the results back in order. But you're just returning None and ignoring it, so why do that work?
Using apply_async doesn't have that problem.
You can also replace map with imap_unordered, but there is no istarmap_unordered, so you'd need to wrap your function to not need starmap:
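def starcalculate(args):
    return calculation(*args)

with Pool() as pool:
    # Consume the results before the with block exits: leaving the block
    # calls terminate(), which would discard unfinished tasks.
    for _ in pool.imap_unordered(starcalculate,
                                 itertools.product(range(3), range(5), range(10), range(4))):
        pass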
1. If you're using the spawn or forkserver start methods (and spawn is the default on Windows), every child process does the equivalent of importing your module. So, all top-level code that isn't protected by a __main__ guard will get run in every child. The module tries to protect you from some of the worst consequences of this (e.g., instead of forkbombing your computer with an exponential explosion of children creating new children, you will often get an exception), but it can't make the code actually work.
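If the results are needed back in the parent rather than written to a file, a minimal sketch could look like the following. It assumes calculation is changed to return its value, which the original function does not do, and argument_tuples is a hypothetical helper. It shows both calling conventions discussed above: starmap unpacking each tuple, and apply_async taking the tuple as its second argument.

```python
import itertools
from multiprocessing import Pool


def calculation(foo, bar, foobar, baz):
    # Return the value instead of printing it so the parent can collect it.
    return foo * bar * foobar * baz


def argument_tuples():
    """All (foo, bar, foobar, baz) combinations, in parameter order."""
    return list(itertools.product(range(3), range(5), range(10), range(4)))


if __name__ == "__main__":
    args = argument_tuples()
    with Pool() as pool:
        # starmap unpacks each tuple into positional arguments.
        ordered = pool.starmap(calculation, args)
        # apply_async takes the same tuple as its second argument.
        pending = [pool.apply_async(calculation, a) for a in args]
        collected = [p.get() for p in pending]
    print(ordered == collected)
```

Calling get() on each AsyncResult inside the with block matters, since leaving the block terminates the pool.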
I have a list of objects and need to call a member function of every object. Is it possible to use multiprocessing for that?
I wrote a short example of what i want to do.
import multiprocessing as mp

class Example:
    data1 = 0
    data2 = 3
    def compute():
        self.val3 = 6

listofobjects = []
for i in range(5):
    listofobjects.append(Example())
pool = mp.Pool()
pool.map(listofobjects[range(5)].compute())
There are two conceptual issues that @abarnert has pointed out, beyond the syntactic and usage problems in your "pseudocode". The first is that map works with a function that is applied to the elements of your input. The second is that each sub-process gets a copy of your object, so changes to attributes are not automatically seen in the originals. Both issues can be worked around.
To answer your immediate question, here is how you would apply a method to your list:
with mp.Pool() as pool:
    pool.map(Example.compute, listofobjects)
Example.compute is an unbound method. That means that it is just a regular function that accepts self as a first argument, making it a perfect fit for map. I also use the pool as a context manager, which is recommended to ensure that cleanup is done properly whether or not an error occurs.
The code above would not work because the effects of compute would be local to the subprocess. The only way to pass them back to the original process would be to return them from the function you passed to map. If you don't want to modify compute, you could do something like this:
def get_val3(x):
    x.compute()
    return x.val3

with mp.Pool() as pool:
    for value, obj in zip(pool.map(get_val3, listofobjects), listofobjects):
        obj.val3 = value
If you were willing to modify compute to return the object it is operating on (self), you could use it to replace the original objects much more efficiently:
class Example:
    ...
    def compute(self):
        ...
        return self

with mp.Pool() as pool:
    listofobjects = list(pool.map(Example.compute, listofobjects))
Update
If your object or some part of its reference tree does not support pickling (which is the form of serialization normally used to pass objects between processes), you can at least get rid of the wrapper function by returning the updated value directly from compute:
class Example:
    ...
    def compute(self):
        self.val3 = ...
        return self.val3

with mp.Pool() as pool:
    for value, obj in zip(pool.map(Example.compute, listofobjects), listofobjects):
        obj.val3 = value
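To make the copy semantics concrete, here is a small runnable sketch of the return-self variant. The seed attribute and the doubling are assumptions added purely for illustration; they are not part of the original Example class.

```python
import multiprocessing as mp


class Example:
    def __init__(self, seed):
        self.seed = seed
        self.val3 = None

    def compute(self):
        # Runs on a pickled copy of the object inside the worker process.
        self.val3 = self.seed * 2  # stand-in for the real computation
        return self  # send the updated copy back to the parent


if __name__ == "__main__":
    objects = [Example(i) for i in range(5)]
    with mp.Pool(processes=2) as pool:
        updated = pool.map(Example.compute, objects)
    # The originals are untouched; only the returned copies carry val3.
    print([o.val3 for o in objects])
    print([o.val3 for o in updated])
```

The objects in updated are new instances unpickled in the parent, so any other references to the originals still point at the unmodified objects.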
I have created a little code snippet (the original code is much larger) that calls a function which creates an object, but within a pool of processes:
import multiprocessing

class TestClass(object):
    pass

def func():
    obj = TestClass()
    cpname = multiprocessing.current_process().name
    print("{0}, Address: {1}".format(cpname, str(obj)))

pool = multiprocessing.Pool(2)
results = [pool.apply_async(func) for _ in range(2)]
for res in results:
    res.get()
pool.close()
pool.join()
When I run this code, I get the following output:
PoolWorker-1, Address: <__main__.TestClass object at 0x7f05d3fdad50>
PoolWorker-2, Address: <__main__.TestClass object at 0x7f05d3fdad50>
What I don't understand is why the objects have the same address, even though they are in separate processes.
How can I make sure that every process creates its own object?
Thank you very much for your help.
When you fork() for multiprocessing, it duplicates your process. The memory allocator and all addresses in the parent process will be copied into the child process. As a result, the next allocation will be very likely to have the same address.
You can verify that they are in fact separate objects like so:
import time

def func():
    obj = TestClass()
    obj.name = multiprocessing.current_process().name
    print("{0} {1}".format(obj.name, obj))
    time.sleep(1)
    print("{0} {1}".format(obj.name, obj))
The objects are different: separate processes use separate virtual address spaces, so the same address in different processes points to different memory regions.
If you change your example a little bit, you'll see that the returned objects are different:
import multiprocessing

class TestClass(object):
    pass

def func():
    obj = TestClass()
    cpname = multiprocessing.current_process().name
    print("{0}, Address: {1}".format(cpname, str(obj)))
    return obj

pool = multiprocessing.Pool(2)
results = [pool.apply_async(func) for _ in range(2)]
results = [res.get() for res in results]
pool.close()
pool.join()
print(results)
It seems likely you're using a Linux-ish system, where new processes are created via fork(). In that case, you should expect a great deal of overlap between addresses. That doesn't mean your obj instances occupy the same physical memory, just that they share the same virtual (process-local) addresses.
More here:
What happens to address's, values, and pointers after a fork()
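One way to see this for yourself is to report the process ID next to the address, as in the sketch below. The function describe_new_object is a hypothetical helper; it relies on the CPython detail that id() returns the object's address. Whether the addresses actually coincide depends on the platform and start method.

```python
import multiprocessing
import os


class TestClass(object):
    pass


def describe_new_object(_=None):
    """Create an object and report which process owns it and where it lives."""
    obj = TestClass()
    # id() is the object's address in CPython, valid only inside this process.
    return os.getpid(), hex(id(obj))


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        for pid, address in pool.map(describe_new_object, range(4)):
            print("pid={0} address={1}".format(pid, address))
```

Equal addresses reported under different pids belong to different objects, since each pid has its own address space.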
I'm quite new to Python (using Python 3.2) and I have a question concerning parallelisation. I have a for-loop that I wish to execute in parallel using "multiprocessing" in Python 3.2:
def computation(i, j):
    global output
    for x in range(i, j):
        localResult = ... #perform some computation as a function of i and j
        output.append(localResult)
In total, I want to perform this computation for a range of i=0 to j=100. Thus I want to create a number of processes that each call the function "computation" with a subdomain of the total range. Any ideas on how to do this? Is there a better way than using multiprocessing?
More specifically, I want to perform a domain decomposition and I have the following code:
from multiprocessing import Pool

class testModule:
    def __init__(self):
        self
    def computation(self, args):
        start, end = args
        print('start: ', start, ' end: ', end)

testMod = testModule()
length = 100
np=4
p = Pool(processes=np)
p.map(testMod.computation, [(length, startPosition, length//np) for startPosition in range(0, length, length//np)])
I get an error message mentioning PicklingError. Any ideas what could be the problem here?
Joblib is designed specifically to wrap around multiprocessing for the purposes of simple parallel looping. I suggest using that instead of grappling with multiprocessing directly.
The simple case looks something like this:
from joblib import Parallel, delayed
Parallel(n_jobs=2)(delayed(foo)(i**2) for i in range(10)) # n_jobs = number of processes
The syntax is simple once you understand it. We are using generator syntax in which delayed is used to call function foo with its arguments contained in the parentheses that follow.
In your case, you should either rewrite your for loop with generator syntax, or define another function (i.e. 'worker' function) to perform the operations of a single loop iteration and place that into the generator syntax of a call to Parallel.
In the latter case, you would do something like:
Parallel(n_jobs=2)(delayed(foo)(parameters) for x in range(i,j))
where foo is a function you define to handle the body of your for loop. Note that you do not want to append to a list, since Parallel is returning a list anyway.
In this case, you probably want to define a simple function to perform the calculation and get localResult.
def getLocalResult(args):
    """ Do whatever you want in this func.
        The point is that it takes x,i,j and
        returns localResult
    """
    x,i,j = args  #unpack args
    return doSomething(x,i,j)
Now in your computation function, you just create a pool of workers and map the local results:
import multiprocessing

def computation(np=4):
    """ np is number of processes to fork """
    p = multiprocessing.Pool(np)
    # i and j are the loop bounds from the question
    output = p.map(getLocalResult, [(x,i,j) for x in range(i,j)] )
    return output
I've removed the global here because it's unnecessary (globals are usually unnecessary). In your calling routine you should just do output.extend(computation(np=4)) or something similar.
EDIT
Here's a "working" example of your code:
from multiprocessing import Pool

def computation(args):
    start, end, npoints = args
    print(args)

if __name__ == '__main__':
    length = 100
    np = 4
    p = Pool(processes=np)
    p.map(computation, [(startPosition, startPosition+length//np, length//np) for startPosition in range(0, length, length//np)])
Note that what you had didn't work because you were using an instance method as your function. multiprocessing starts new processes and sends information between them via pickle; therefore, only objects which can be pickled can be used. Note that it really doesn't make sense to use an instance method anyway. Each process is a copy of the parent, so any changes to state which happen in the processes do not propagate back to the parent.
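Putting the pieces together, a domain decomposition that also gets its results back to the parent might be sketched as follows. The names split_range, compute_chunk and parallel_compute are hypothetical, and squaring stands in for the real per-point computation.

```python
from multiprocessing import Pool


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous (start, end) chunks."""
    step, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for k in range(parts):
        hi = lo + step + (1 if k < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def compute_chunk(bounds):
    """Module-level worker: returns its results instead of appending to a global."""
    start, end = bounds
    return [x * x for x in range(start, end)]  # x * x stands in for the real work


def parallel_compute(start, stop, processes=4):
    with Pool(processes=processes) as pool:
        per_chunk = pool.map(compute_chunk, split_range(start, stop, processes))
    # map preserves chunk order, so flattening gives results in range order.
    return [value for chunk in per_chunk for value in chunk]


if __name__ == "__main__":
    output = parallel_compute(0, 100)
    print(len(output))
```

Because the worker is a plain module-level function that returns its values, there is no instance method to pickle and no global list to share.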