Python instance counter in ProcessPool

I'm having trouble finding the correct way to implement the following: I have a class in Python3 for which I keep an instance counter. Using a concurrent.futures.ProcessPoolExecutor, I submit several tasks that make use of this class. I assumed that since the tasks ran in separate processes there would be no shared state between them, but it would seem I was wrong as this instance counter is shared among them. The following code exemplifies what I mean:
import concurrent.futures

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        return f'Hello from node{self.id}'

def start():
    instance = A()
    return instance.hello()

results = []
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
    for i in range(4):
        f = executor.submit(start)
        results.append(f)

for r in results:
    print(r.result())
The output from the above is:
Hello from node1
Hello from node2
Hello from node1
Hello from node1
My issue is not the race condition when accessing the counter; it is that the variable is shared at all, when I was expecting it to be private to each process (i.e. start at 0 for each worker). What would be a Pythonic way to achieve this?
Thanks in advance.

It would seem you have found that tasks are not always evenly distributed among the workers in a processing pool: here one of the 4 workers managed to complete 2 tasks while another got none. In each worker the A class is defined either by copying the parent's memory space at the time fork is called (*nix) or by importing the main file (Windows and macOS). Because counter is a class attribute rather than an instance attribute, it behaves like a global variable, so any worker that receives multiple tasks will see the value keep incrementing. You can avoid this by limiting each worker to a single task before it is terminated and restarted (maxtasksperchild=1), but it is generally better practice to avoid global state altogether. maxtasksperchild is more commonly used to prevent leaks in cases where child processes do not, for various reasons, release memory or file handles over time.
As you have stated in the comments, a long-running task reduces the relative overhead of restarting a process for every task. Be aware, however, that if you use any of the functions that map a callable over an iterable and accept a chunksize argument, this approach may break: a "task" is not necessarily one iteration of the map but may be several at once (to reduce the overhead of transferring arguments and results). The following example demonstrates a pool with maxtasksperchild=1 where each child nevertheless ends up calling start() 4 times:
from multiprocessing import Pool

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        return f'Hello from node{self.id}'

def start(_):
    instance = A()
    print(instance.hello())

if __name__ == "__main__":
    with Pool(4, maxtasksperchild=1) as p:
        p.map(start, range(16), chunksize=4)
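If you want to stay with concurrent.futures, note that ProcessPoolExecutor accepts a max_tasks_per_child argument since Python 3.11, which gives the same effect (it is incompatible with the fork start method and defaults to spawn). A minimal sketch along the lines of the question's code, printing os.getpid() so the per-process behaviour is visible:
import concurrent.futures
import os

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        # Include the pid so you can see which worker handled each task.
        return f'Hello from node{self.id} (pid {os.getpid()})'

def start():
    return A().hello()

if __name__ == '__main__':
    # max_tasks_per_child=1 (Python 3.11+) replaces each worker after a
    # single task, so the class-level counter never accumulates.
    with concurrent.futures.ProcessPoolExecutor(
            max_workers=4, max_tasks_per_child=1) as executor:
        futures = [executor.submit(start) for _ in range(4)]
        for f in futures:
            print(f.result())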

Related

Updating the same instance variables from different processes

Here is a simple scenario:
import multiprocessing as mp

class Test:
    def __init__(self):
        self.foo = []

    def append(self, x):
        self.foo.append(x)

    def get(self):
        return self.foo

def process_append_queue(append_queue, bar):
    while True:
        x = append_queue.get()
        if x is None:
            break
        bar.append(x)
    print("worker done")

def main():
    bar = Test()
    append_queue = mp.Queue(10)
    append_queue_process = mp.Process(target=process_append_queue, args=(append_queue, bar))
    append_queue_process.start()
    for i in range(100):
        append_queue.put(i)
    append_queue.put(None)
    append_queue_process.join()
    print(str(bar.get()))

if __name__ == "__main__":
    main()
When you call bar.get() at the end of the main() function, why does it still return an empty list? How can I make it so that the child process works with the same instance of Test, not a new one?
All answers appreciated!
In general, processes have distinct address spaces, so that mutations of an object in one process have no effect on any object in any other process. Interprocess communication is needed to tell a process about changes made in another process.
That can be done explicitly (using things like multiprocessing.Queue), or implicitly if you use a facility implemented by multiprocessing for this purpose. For example, a great deal of work is done under the covers to make changes to a multiprocessing.Queue visible across processes.
The easiest way in your specific example is to replace your __init__ function like so:
def __init__(self):
    import multiprocessing as mp
    self.foo = mp.Manager().list()
It so happens that an mp.Manager instance supports a list() method that creates a process-aware list object (really a proxy for a list object, which forwards list operations to an under-the-covers server process that maintains a single copy of "the real" list - the list object isn't really shared across processes, because that's impossible - but the proxies make it appear to be shared).
So if you make that change, your code will display the results you expect - and there is no simpler way.
Note that multiprocessing works better the less IPC (interprocess communication) you need, and that's true pretty much regardless of application or programming language.
Objects are copied between processes by pickling them and passing the string over a pipe. There is no way to achieve true "shared memory" for pure Python objects between processes. To achieve precisely this type of synchronization, take a look at the multiprocessing.Manager documentation (https://docs.python.org/2/library/multiprocessing.html#managers), which provides examples of synchronized versions of common Python container types. These are "proxied" containers, where operations on the proxy pickle all arguments, send them across the process boundary, and execute them in the manager's server process.
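For reference, here is a fuller sketch of the fix applied to the question's code. It creates the Manager in main() and passes it into Test (a hypothetical rearrangement for clarity, not the answer's exact code):
import multiprocessing as mp

class Test:
    def __init__(self, manager):
        # A manager-backed list: operations on this proxy are forwarded to a
        # server process, so appends made in the child are visible here too.
        self.foo = manager.list()

    def append(self, x):
        self.foo.append(x)

    def get(self):
        # Copy the proxy's contents into a plain list.
        return list(self.foo)

def process_append_queue(append_queue, bar):
    while True:
        x = append_queue.get()
        if x is None:
            break
        bar.append(x)
    print("worker done")

def main():
    manager = mp.Manager()
    bar = Test(manager)
    append_queue = mp.Queue(10)
    worker = mp.Process(target=process_append_queue, args=(append_queue, bar))
    worker.start()
    for i in range(100):
        append_queue.put(i)
    append_queue.put(None)
    worker.join()
    print(bar.get())  # now prints [0, 1, ..., 99]

if __name__ == "__main__":
    main()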

how to make a thread-safe global counter in python

I'm creating threads with threading.Timer(2, work). Inside each work function, upon some condition, a global counter must be incremented without conflicting access to the counter variable among the spawned work threads.
I've tried using a Queue.Queue as the counter as well as threading.Lock(). What is the best way to implement a thread-safe global counter?
A similar question was asked here previously: Python threading. How do I lock a thread?
Not sure if you have tried this specific syntax already, but for me this has always worked well:
Define a global lock:
import threading
threadLock = threading.Lock()
and then you have to acquire and release the lock every time you increase your counter in your individual threads:
with threadLock:
    global_counter += 1
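For completeness, here is a minimal self-contained sketch of that pattern (the work function and counts are hypothetical, not from the original answer):
import threading

global_counter = 0
threadLock = threading.Lock()

def work():
    global global_counter
    for _ in range(100_000):
        # Hold the lock around the read-modify-write so increments
        # from different threads never interleave.
        with threadLock:
            global_counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(global_counter)  # always 400000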
One solution is to protect the counter with a multiprocessing.Lock. You could keep it in a class, like so:
from multiprocessing import Process, RawValue, Lock
import time

class Counter(object):
    def __init__(self, value=0):
        # RawValue because we don't need Value's built-in lock; we create our own:
        self.val = RawValue('i', value)
        self.lock = Lock()

    def increment(self):
        with self.lock:
            self.val.value += 1

    def value(self):
        with self.lock:
            return self.val.value

def inc(counter):
    for i in range(1000):
        counter.increment()

if __name__ == '__main__':
    thread_safe_counter = Counter(0)
    procs = [Process(target=inc, args=(thread_safe_counter,)) for i in range(100)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(thread_safe_counter.value())
The above snippet was taken from Eli Bendersky's blog, here.
If you're using CPython [1], you can do this without explicit locks:
import itertools

class Counter:
    def __init__(self):
        self._incs = itertools.count()
        self._accesses = itertools.count()

    def increment(self):
        next(self._incs)

    def value(self):
        return next(self._incs) - next(self._accesses)

my_global_counter = Counter()
We need two counters: one to count increments and one to count accesses of value(). This is because itertools.count does not provide a way to access the current value, only the next value. So we need to "undo" the increments we incur just by asking for the value.
This is threadsafe because itertools.count.__next__() is atomic in CPython (thanks, GIL!) and we don't persist the difference.
Note that if value() is accessed in parallel, the exact number may not be perfectly stable or strictly monotonically increasing. It could be plus or minus a margin proportional to the number of threads accessing. In theory, self._incs could be updated first in one thread while self._accesses is updated first in another thread. But overall the system will never lose any data due to unguarded writes; it will always settle to the correct value.
[1] Not all Python is CPython, but a lot (most?) is.
[2] Credit to https://julien.danjou.info/atomic-lock-free-counters-in-python/ for the initial idea of using itertools.count to increment and a second access counter to correct. They stopped just short of removing all locks.
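A quick sanity check of this lock-free counter under threads (a hypothetical demo that assumes the Counter class defined above):
import threading

counter = Counter()  # the itertools-based Counter from the answer above

def work():
    for _ in range(100_000):
        counter.increment()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With all threads joined there are no concurrent value() calls,
# so the result is exact.
print(counter.value())  # 400000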

Python multiprocessing seems near impossible to do within classes/using any class instances. What is its intended use?

I have an algorithm that I am trying to parallelize, because of very long run times in serial. However, the function that needs to be parallelized is inside a class. multiprocessing.Pool seems to be the best and fastest way to do this, but there is a problem: its target function cannot be a method of an object instance. Meaning this: you declare a Pool in the following way:
import multiprocessing as mp
cpus = mp.cpu_count()
poolCount = cpus*2
pool = mp.Pool(processes = poolCount, maxtasksperchild = 2)
And then actually use it as so:
pool.map(self.TargetFunction, args)
But this throws an error, because object instances cannot be pickled, which is what Pool does to pass information to all of its child processes. But I have to use self.TargetFunction.
So I had an idea: I would create a new Python file named parallel, write a couple of functions in it without putting them in a class, and call those functions from within my original class (the one whose function I want to parallelize).
So I tried this:
import multiprocessing as mp

def MatrixHelper(args):
    WM = args[0][0]
    print(WM.CreateMatrixMp(*args))
    return WM.CreateMatrixMp(*args)

def Start(sigmaI, sigmaX, numPixels, WM):
    cpus = mp.cpu_count()
    poolCount = cpus * 2
    args = [(WM, sigmaI, sigmaX, i) for i in range(numPixels)]
    print('Number of cpu\'s to process WM:%d' % cpus)
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(MatrixHelper, args)
    return tempData
These functions are not part of a class, so using MatrixHelper in Pool's map function works fine. But I realized while doing this that it was no way out: the function in need of parallelization (CreateMatrixMp) expects an object to be passed to it (it is declared as def CreateMatrixMp(self, sigmaI, sigmaX, i)).
Since it is not being called from within its class, it doesn't get a self passed to it. To solve this, I passed the Start function the calling object itself. As in, I say parallel.Start(sigmaI, sigmaX, self.numPixels, self). The object self then becomes WM so that I will finally be able to call the desired function as WM.CreateMatrixMp().
I'm sure that that is a very sloppy way of coding, but I just wanted to see if it would work. But nope, pickling error again: the map function cannot handle any object instances at all.
So my question is, why is it designed this way? It seems useless, completely dysfunctional in any program that uses classes at all.
I tried using Process rather than Pool, but this requires the array that I am ultimately writing to to be shared, which requires processes waiting for each other. If I don't want it to be shared, then I have each process write its own smaller array, and do one big write at the end. But both of these result in slower run times than when I was doing this serially! Python's built-in multiprocessing seems absolutely useless!
Can someone please give me some guidance as to how to actually save time with multiprocessing, in the context of my target function being inside a class? I have read in posts here to use pathos.multiprocessing instead, but I am on Windows, and am working on this project with multiple people who all have different setups. Having everyone try to install it would be inconvenient.
I was having a similar issue with trying to use multiprocessing within a class. I was able to solve it with a relatively easy workaround I found online. Basically you use a function outside of your class that unwraps/unpacks the method of your class that you're trying to parallelize. Here are the two websites I found that explain how to do it.
Website 1 (joblib example)
Website 2 (multiprocessing module example)
For both, the idea is to do something like this:
from multiprocessing import Pool
import time

def unwrap_self_f(arg, **kwarg):
    return C.f(*arg, **kwarg)

class C:
    def f(self, name):
        print('hello %s,' % name)
        time.sleep(5)
        print('nice to meet you.')

    def run(self):
        pool = Pool(processes=2)
        names = ('frank', 'justin', 'osi', 'thomas')
        pool.map(unwrap_self_f, zip([self]*len(names), names))

if __name__ == '__main__':
    c = C()
    c.run()
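Worth noting: in Python 3, bound methods can be pickled by reference as long as the instance itself is picklable, so the unwrapping shim is often unnecessary there. A minimal sketch of the direct approach (same hypothetical class as above):
from multiprocessing import Pool
import time

class C:
    def f(self, name):
        print(f'hello {name},')
        time.sleep(1)
        print('nice to meet you.')

    def run(self):
        # In Python 3, self.f pickles as (instance, method name), so it can
        # be passed to Pool.map directly, provided the instance is picklable.
        names = ('frank', 'justin', 'osi', 'thomas')
        with Pool(processes=2) as pool:
            pool.map(self.f, names)

if __name__ == '__main__':
    C().run()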
The essence of how multiprocessing works is that it spawns sub-processes that receive parameters to run a certain function. In order to pass these arguments, they need to be, well, passable: not exclusive to the main process, such as sockets, file descriptors and other low-level, OS-related resources.
This translates into "need to be pickleable or serializable".
On the same topic, parallel processing works best when you (can) have self-contained divisions of a problem. I can tell you want to share some sort of input/stream/database source, but this will probably create a bottleneck that you'll have to tackle at some point (at least from the "Python script" side, rather than the "OS/database" side). Fortunately, you are forced to tackle it early now.
You can re-code your classes to spawn/create these non-picklable resources when needed rather than at start:
def targetFunction(self, range_params):
    if not self.ready():
        self._init_source()
    # rest of the code
You kinda tackled the problem the other way around (initialized an object based on params). And yes, parallel processing comes with a cost.
You can see the multiprocessing programming guidelines for an even more thorough insight on this matter.
This is an old post, but it is still one of the top results when you search for the topic. Some good info for this question can be found at this Stack Overflow question: python subclassing multiprocessing.Process
I tried some workarounds to try calling pool.starmap from inside a class to another function in the class. Making it a staticmethod or having a function on the outside call it didn't work and gave the same error. A class instance just can't be pickled, so we need to create the instance after we start the multiprocessing.
What I ended up doing that worked for me was to separate my class into two classes. Basically, the function you are calling the multiprocessing on needs to be called right after you instantiate a new object for the class it belongs to.
Something like this:
from multiprocessing import Pool

class B:
    ...
    def process_feature(self, idx, feature):
        # do stuff in the new process
        pass
    ...

def multiprocess_feature(process_args):
    b_instance = B()
    return b_instance.process_feature(*process_args)

class A:
    ...
    def process_stuff(self):
        ...
        with Pool(processes=num_processes, maxtasksperchild=10) as pool:
            results = pool.starmap(
                multiprocess_feature,
                [
                    (idx, feature)
                    for idx, feature in enumerate(features)
                ],
                chunksize=100,
            )
        ...
    ...
...

Creating and reusing objects in python processes

I have an embarrassingly parallelizable problem consisting of a bunch of tasks that get solved independently of each other. Solving each of the tasks is quite lengthy, so this is a prime candidate for multi-processing.
The problem is that solving my tasks requires creating a specific object that is very time consuming on its own but can be reused for all the tasks (think of an external binary program that needs to be launched), so in the serial version I do something like this:
def costly_function(task, my_object):
    solution = solve_task_using_my_object
    return solution

def solve_problem():
    my_object = create_costly_object()
    tasks = get_list_of_tasks()
    all_solutions = [costly_function(task, my_object) for task in tasks]
    return all_solutions
When I try to parallelize this program using multiprocessing, my_object cannot be passed as a parameter for a number of reasons (it cannot be pickled, and it should not run more than one task at the same time), so I have to resort to creating a separate instance of the object for each task:
def costly_function(task):
    my_object = create_costly_object()
    solution = solve_task_using_my_object
    return solution

def psolve_problem():
    pool = multiprocessing.Pool()
    tasks = get_list_of_tasks()
    all_solutions = pool.map_async(costly_function, tasks)
    return all_solutions.get()
but the added cost of creating multiple instances of my_object makes this code only marginally faster than the serial one.
If I could create a separate instance of my_object in each process and then reuse them for all the tasks that get run in that process, my timings would significantly improve. Any pointers on how to do that?
I found a simple way of solving my own problem without bringing in any tools besides the standard library, I thought I'd write it down here in case somebody else has a similar problem.
multiprocessing.Pool accepts an initializer function (with arguments) that gets run when each process is launched. The return value of this function is not stored anywhere, but one can take advantage of the function to set up a global variable:
def init_process():
    global my_object
    my_object = create_costly_object()

def costly_function(task):
    global my_object
    solution = solve_task_using_my_object
    return solution

def psolve_problem():
    pool = multiprocessing.Pool(initializer=init_process)
    tasks = get_list_of_tasks()
    all_solutions = pool.map_async(costly_function, tasks)
    return all_solutions.get()
Since each process has a separate global namespace, the instantiated objects do not clash, and they are created only once per process.
Probably not the most elegant solution, but it's simple enough and gives me a near-linear speedup.
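For reference, concurrent.futures.ProcessPoolExecutor exposes the same hook (initializer/initargs, Python 3.7+). A sketch using the placeholder helpers from the code above:
import concurrent.futures

def psolve_problem():
    # init_process, costly_function and get_list_of_tasks are the
    # placeholder helpers defined in the snippets above.
    tasks = get_list_of_tasks()
    with concurrent.futures.ProcessPoolExecutor(
            initializer=init_process) as executor:
        return list(executor.map(costly_function, tasks))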
You can have the Celery project handle all of this for you; among many other features, it also has a way to run some task initialization that can later be used by all tasks.
You are right that you are constrained to picklable objects when using multiprocessing. Are you absolutely sure that your object is un-picklable?
Have you tried dill? If you import it, then any time pickle is called it will use the dill bindings. It worked for me when I was trying to use multiprocessing on sympy equations.
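For example, dill can serialize things the standard pickle refuses, such as lambdas; a quick illustration (not tied to the question's sympy objects):
import dill

square = lambda x: x * x          # plain pickle rejects lambdas
payload = dill.dumps(square)
restored = dill.loads(payload)
print(restored(4))  # 16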

python multiprocessing scheduling task

I have 8 CPU cores and 200 tasks to be done. Each task is isolated; there is no need to wait for or share results. I'm looking for a way to run at most 8 tasks/processes at a time, and when one of them finishes, the next remaining task should automatically start in a new process.
How do I know when a child process is done so I can start a new one? First I tried to use Process (multiprocessing) and found it hard to figure out. Then I tried to use Pool and ran into the pickling problem, because I need dynamic instantiation.
Edit: adding my Pool code
from multiprocessing import Pool, log_to_stderr
import logging

class Collectorparallel():
    def fire(self, obj):
        collectorController = Collectorcontroller()
        collectorController.crawlTask(obj)

    def start(self):
        log_to_stderr(logging.DEBUG)
        pluginObjectList = []
        for pluginName in self.settingModel.getAllCollectorName():
            name = pluginName.capitalize()
            # Get plugin class and instantiate object
            module = __import__('plugins.' + pluginName, fromlist=[name])
            pluginClass = getattr(module, name)
            pluginObject = pluginClass()
            pluginObjectList.append(pluginObject)

        pool = Pool(8)
        jobs = pool.map(self.fire, pluginObjectList)
        pool.close()

        print(pluginObjectList)
pluginObjectList contains something like
[<plugins.name1.Name1 instance at 0x1f54290>, <plugins.name2.Name2 instance at 0x1f54f38>]
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
but the Process version works fine
Warning: this is somewhat specific to deployment and situation, but my current setup is as follows.
I have a worker program, I fire up 6 copies (I have 6 cores).
Each worker does the following;
Connect to a Redis instance
Try and pop some work off a specific list
Pushes back logging information
Either idles or terminates on a lack of work in the 'queue'
Then each program is essentially standalone while still doing the work you require with a separate queuing system. As you have no go-between on your processes, this might be a solution to your problem.
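A rough sketch of one such worker, assuming the redis package and hypothetical list names work-queue and log (none of these names come from the original setup):
import redis

def do_work(payload):
    # Placeholder for the actual task.
    return payload.decode()

def worker():
    # Connect to a Redis instance (assumed to run on localhost).
    r = redis.Redis(host='localhost', port=6379)
    while True:
        # Block for up to 5 seconds waiting for work on the 'work-queue' list.
        item = r.blpop('work-queue', timeout=5)
        if item is None:
            break  # idle timeout reached with no work left: terminate
        _key, payload = item
        result = do_work(payload)
        # Push logging/progress information back for the coordinator.
        r.rpush('log', f'done: {result}')

if __name__ == '__main__':
    worker()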
I'm not an expert in multiprocessing in Python, but I tried a few things with the help of http://www.tutorialspoint.com/python/python_multithreading.htm and this one too: http://www.devshed.com/c/a/Python/Basic-Threading-in-Python/1/ .
You can, for example, use the is_alive() method (spelled isAlive() in older Python versions), which answers your question.
The solution to your problem is trivial. First of all, note that methods cannot be pickled. In fact only the types listed in pickle's documentation can be pickled:
None, True, and False
integers, long integers, floating point numbers, complex numbers
normal and Unicode strings
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).
[...]
Note that functions (built-in and user-defined) are pickled by
“fully qualified” name reference, not by value. This means that
only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of
its function attributes are pickled. Thus the defining module must be
importable in the unpickling environment, and the module must contain
the named object, otherwise an exception will be raised. [4]
Similarly, classes are pickled by named reference, so the same
restrictions in the unpickling environment apply. Note that none of
the class’s code or data is pickled[...]
Clearly a method isn't a function defined at the top level of a module, hence it cannot be pickled. (Read that part of the documentation carefully to avoid future problems with pickle!) But it is absolutely trivial to replace the method with a global function, passing self as an additional parameter:
from multiprocessing import Pool, log_to_stderr
import itertools as it
import logging

def global_fire(argument):
    self, obj = argument
    self.fire(obj)

class Collectorparallel():
    def fire(self, obj):
        collectorController = Collectorcontroller()
        collectorController.crawlTask(obj)

    def start(self):
        log_to_stderr(logging.DEBUG)
        pluginObjectList = []
        for pluginName in self.settingModel.getAllCollectorName():
            name = pluginName.capitalize()
            # Get plugin class and instantiate object
            module = __import__('plugins.' + pluginName, fromlist=[name])
            pluginClass = getattr(module, name)
            pluginObject = pluginClass()
            pluginObjectList.append(pluginObject)

        pool = Pool(8)
        jobs = pool.map(global_fire, zip(it.repeat(self), pluginObjectList))
        pool.close()

        print(pluginObjectList)
Note that, since Pool.map calls the given function with only one argument, we have to "pack together" both self and the actual argument. To do this I have zipped it.repeat(self) and the original iterable.
If you do not care about the order in which the calls are done then using pool.imap_unordered might provide better performances. However it returns an iterable and not a list, so if you want the list of results you'll have to do jobs = list(pool.imap_unordered(...)).
I believe that this code will remove all pickling problems.
class Collectorparallel():
    def __call__(self, pluginName):
        name = pluginName.capitalize()
        # Get plugin class and instantiate object
        module = __import__('plugins.' + pluginName, fromlist=[name])
        pluginClass = getattr(module, name)
        pluginObject = pluginClass()
        collectorController = Collectorcontroller()
        collectorController.crawlTask(pluginObject)

    def start(self):
        log_to_stderr(logging.DEBUG)
        pool = Pool(8)
        jobs = pool.map(self, self.settingModel.getAllCollectorName())
        pool.close()
What has happened here is that Collectorparallel has been turned into a callable. The list of plugin names is used as the iterable for the pool, the actual determination of the plugins and their instantiation is done in each of the worker processes, and the class instance object is used as the callable for each worker process.
