I'm spawning threads with threading.Timer(2, work). Inside each work function, when a certain condition is met, a global counter must be incremented without conflicting access to the counter variable among the spawned work threads.
I've tried keeping the counter in a Queue.Queue as well as guarding it with threading.Lock(). What is the best way to implement a thread-safe, globally incremented counter?
A similar question was asked previously: Python threading. How do I lock a thread?
Not sure if you have tried this specific syntax already, but for me this has always worked well:
Define a global lock:
import threading
threadLock = threading.Lock()
and then you have to acquire and release the lock every time you increase your counter in your individual threads:
with threadLock:
    global_counter += 1
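For completeness, here is a minimal runnable sketch of that pattern in the context of the question; the work function body and the counts are illustrative assumptions, not taken from the original post:

import threading

threadLock = threading.Lock()
global_counter = 0

def work():
    # hypothetical work: on some condition, bump the shared counter under the lock
    global global_counter
    with threadLock:
        global_counter += 1

# spawn several timer threads, as in the question
timers = [threading.Timer(2, work) for _ in range(5)]
for t in timers:
    t.start()
for t in timers:
    t.join()

print(global_counter)  # 5

Each Timer fires once after 2 seconds, and the with statement guarantees that only one thread touches global_counter at a time.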
One solution is to protect the counter with a multiprocessing.Lock. You could keep it in a class, like so:
from multiprocessing import Process, RawValue, Lock

class Counter(object):
    def __init__(self, value=0):
        # RawValue because we don't need it to create a Lock:
        self.val = RawValue('i', value)
        self.lock = Lock()

    def increment(self):
        with self.lock:
            self.val.value += 1

    def value(self):
        with self.lock:
            return self.val.value

def inc(counter):
    for i in range(1000):
        counter.increment()

if __name__ == '__main__':
    thread_safe_counter = Counter(0)
    procs = [Process(target=inc, args=(thread_safe_counter,)) for i in range(100)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(thread_safe_counter.value())
The above snippet was originally taken from Eli Bendersky's blog.
If you're using CPython[1], you can do this without explicit locks:
import itertools

class Counter:
    def __init__(self):
        self._incs = itertools.count()
        self._accesses = itertools.count()

    def increment(self):
        next(self._incs)

    def value(self):
        return next(self._incs) - next(self._accesses)

my_global_counter = Counter()
We need two counters: one to count increments and one to count accesses of value(). This is because itertools.count does not provide a way to access the current value, only the next value, so we need to "undo" the increments we incur just by asking for the value.[2]
This is threadsafe because itertools.count.__next__() is atomic in CPython (thanks, GIL!) and we don't persist the difference.
Note that if value() is accessed in parallel, the exact number may not be perfectly stable or strictly monotonically increasing. It could be plus or minus a margin proportional to the number of threads accessing. In theory, self._incs could be updated first in one thread while self._accesses is updated first in another thread. But overall the system will never lose any data due to unguarded writes; it will always settle to the correct value.
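As a quick sanity check (the thread and iteration counts are arbitrary, and the snippet assumes the Counter class and my_global_counter defined above), many threads can increment concurrently and the final value still settles to the expected total:

import threading

def hammer():
    for _ in range(10000):
        my_global_counter.increment()  # the lock-free Counter instance defined above

threads = [threading.Thread(target=hammer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(my_global_counter.value())  # 80000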
[1] Not all Python is CPython, but a lot (most?) is.
[2] Credit to https://julien.danjou.info/atomic-lock-free-counters-in-python/ for the initial idea to use itertools.count to increment and a second access counter to correct. They stopped just short of removing all locks.
Related
I'm having trouble finding the correct way to implement the following: I have a class in Python3 for which I keep an instance counter. Using a concurrent.futures.ProcessPoolExecutor, I submit several tasks that make use of this class. I assumed that since the tasks ran in separate processes there would be no shared state between them, but it would seem I was wrong as this instance counter is shared among them. The following code exemplifies what I mean:
import concurrent.futures

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        return f'Hello from node{self.id}'

def start():
    instance = A()
    return instance.hello()

results = []
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
    for i in range(4):
        f = executor.submit(start)
        results.append(f)

for r in results:
    print(r.result())
The output from the above is:
Hello from node1
Hello from node2
Hello from node1
Hello from node1
The issue is not the race condition when accessing the counter; my issue is that the variable is shared at all, when I was expecting it to be private to each process (i.e. to start at 0 in each worker). What would be a pythonic way to achieve this?
Thanks in advance.
Here it would seem you found that tasks are not always evenly distributed between workers within a processing pool: one of the workers managed to complete 2 "tasks" while another (of the 4) got none. In each worker the A class is defined either by copying the memory space at the time fork is called (*nix) or by importing the main file (Windows and macOS). The class attribute counter behaves like a global variable because it was not defined as an instance attribute, so any worker that gets multiple tasks will see that value incremented each time. It is possible to avoid this by limiting workers to completing one task each before they're terminated and re-started, but it's generally good practice to avoid global state altogether. maxtasksperchild is much more commonly used to prevent leaks when child processes, for whatever reason, do not release memory or file handles over time.
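To make the per-worker accumulation visible, here is a small hedged variation of the snippet from the question; tagging each result with os.getpid() is my addition, not part of the original code:

import concurrent.futures
import os

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        # include the pid so repeated ids can be attributed to the same worker
        return f'Hello from node{self.id} (pid {os.getpid()})'

def start():
    return A().hello()

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(start) for _ in range(4)]
        for f in futures:
            print(f.result())

Any id greater than 1 will show the same pid, confirming that the counter only advances within a single worker rather than being truly shared across processes.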
As you stated in the comments, a long-running task reduces the relative impact of the overhead of re-starting a process for each task; however, if you are using any of the functions that map a function over an iterable and accept a chunksize argument, this approach may break. A "task" isn't necessarily one iteration of the map, but may be several at once (to reduce the overhead of transferring the arguments and results). The following example demonstrates a pool with maxtasksperchild=1 where each child nonetheless ends up calling start() 4 times:
from multiprocessing import Pool

class A:
    counter = 0

    def __init__(self):
        A.counter += 1
        self.id = A.counter

    def hello(self):
        return f'Hello from node{self.id}'

def start(_):
    instance = A()
    print(instance.hello())

if __name__ == "__main__":
    with Pool(4, maxtasksperchild=1) as p:
        p.map(start, range(16), chunksize=4)
I have a look-up table (LUT) which is a very large dictionary (24G).
And I have millions of inputs to perform query on it.
I want to split the millions of inputs across 32 jobs, and run them in parallel.
Due to the space constraint, I cannot run multiple Python scripts, because that would result in memory overload.
I want to use the multiprocessing module to only load the LUT just once, and then have different processes look it up, while sharing it as a global variable, without having to duplicate it.
However, when I look at htop, it seems each subprocess is re-creating the LUT? I say this because the numbers under VIRT, RES and SHR are very high.
But at the same time I don't see the additional memory reflected in the Mem row; it increased from 11G to 12.3G and just hovers there.
So I'm confused: is it, or is it not, re-creating the LUT within each subprocess?
How should I proceed to make sure I am running the work in parallel without duplicating the LUT in each subprocess?
Code is shown below.
(In this experiment I'm only using a 1G LUT, so don't worry about it not being 24G.)
import os, sys, time, pprint, pdb, datetime
import random
import threading, multiprocessing

## Print the process/thread details
def getDetails(idx):
    pid = os.getpid()
    threadName = threading.current_thread().name
    processName = multiprocessing.current_process().name
    print(f"{idx})\tpid={pid}\tprocessName={processName}\tthreadName={threadName}")
    return pid, threadName, processName

def ComplexAlgorithm(value):
    # Instead of just a lookup like this,
    # the real algorithm is some complex algorithm that performs some search
    return value in LUT

## Querying the 24G LUT from my millions of lines of input
def PerformMatching(idx, NumberOfLines):
    pid, threadName, processName = getDetails(idx)
    NumberMatches = 0
    for _ in range(NumberOfLines):
        # I will actually read the contents from my file live,
        # but here just assume I generate random numbers
        value = random.randint(-100, 100)
        if ComplexAlgorithm(value): NumberMatches += 1
    print(f"\t{idx}) | LUT={len(LUT)} | NumberMatches={NumberMatches} | done")

if __name__ == "__main__":

    ## Init
    num_processes = 9
    # this is just a placeholder to show you the structure of my LUT, the real one is larger
    LUT = {i: set([i]) for i in range(1000)}

    ## Store the multiple filenames
    ListOfLists = []
    for idx in range(num_processes):
        NumberOfLines = 10000
        ListOfLists.append(NumberOfLines)

    ## Init the processes
    ProcessList = []
    for processIndex in range(num_processes):
        ProcessList.append(
            multiprocessing.Process(
                target=PerformMatching,
                args=(processIndex, ListOfLists[processIndex])
            )
        )
        ProcessList[processIndex].start()

    ## Wait until the processes terminate.
    for processIndex in range(num_processes):
        ProcessList[processIndex].join()

    ## Done
If you want to go the route of using a multiprocessing.Manager, this is how you could do it. The trade-off is that the dictionary is represented by a reference to a proxy for the actual dictionary that exists in a different address space and consequently every dictionary reference results in the equivalent of a remote procedure call. In other words, access is much slower compared with a "regular" dictionary.
In the demo program below, I have only defined a couple of methods for my managed dictionary, but you can define whatever you need. I have also used a multiprocessing pool instead of explicitly starting individual processes; you might consider doing likewise.
from multiprocessing.managers import BaseManager, BaseProxy
from multiprocessing import Pool
from functools import partial

def worker(LUT, key):
    return LUT[key]

class MyDict:
    def __init__(self):
        """ initialize the dictionary """
        # the very large dictionary reduced for demo purposes:
        self._dict = {i: i for i in range(100)}

    def get(self, obj, default=None):
        """ delegates to underlying dict """
        return self._dict.get(obj, default)

    def __getitem__(self, obj):
        """ delegates to underlying dict """
        return self._dict[obj]

class MyDictManager(BaseManager):
    pass

class MyDictProxy(BaseProxy):
    _exposed_ = ('get', '__getitem__')

    def get(self, *args, **kwargs):
        return self._callmethod('get', args, kwargs)

    def __getitem__(self, *args, **kwargs):
        return self._callmethod('__getitem__', args, kwargs)

def main():
    MyDictManager.register('MyDict', MyDict, MyDictProxy)
    with MyDictManager() as manager:
        my_dict = manager.MyDict()
        pool = Pool()
        # pass the proxy instead of the actual LUT:
        results = pool.map(partial(worker, my_dict), range(100))
        print(sum(results))

if __name__ == '__main__':
    main()
Prints:
4950
Discussion
Python comes with a built-in managed dict class, obtainable with multiprocessing.Manager().dict(). But initializing such a large number of entries through such a dictionary would be very inefficient, because, as noted above, each access is relatively expensive. It seemed less expensive to create our own managed class with an underlying "regular" dictionary that can be initialized directly when the managed class is constructed, rather than via the proxy reference. And while it is true that the managed dict that comes with Python can be instantiated from an already built dictionary, which avoids that inefficiency, my concern is that memory efficiency would then suffer because you would briefly have two instances of the dictionary: the "regular" dictionary and the "managed" dictionary.
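For comparison, here is a minimal sketch of that built-in alternative; the sizes are placeholders, and the point is only that Manager().dict() can be seeded from an existing dict, at the cost of briefly holding both copies and of a proxy round trip per lookup:

from multiprocessing import Manager, Pool
from functools import partial

def worker(LUT, key):
    return LUT[key]

if __name__ == '__main__':
    regular_dict = {i: i for i in range(100)}  # stands in for the large LUT
    with Manager() as manager:
        # both dictionaries exist in memory until regular_dict is released
        shared_LUT = manager.dict(regular_dict)
        del regular_dict
        with Pool() as pool:
            results = pool.map(partial(worker, shared_LUT), range(100))
    print(sum(results))  # 4950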
I have an automation test which uses a function that saves screenshots to a folder. This function is called by multiple screenshot instances. On every test run a new folder is created, so I don't care about resetting the counter. In order to reflect the order in which these screenshots are taken, I had to come up with names that could be sorted by that order. This is my solution:
def make_screenshot_file(file_name):
    order = Counter().count
    test_suites_path = _make_job_directory()
    return make_writable_file(os.path.join(test_suites_path, 'screenshot', file_name % order))

class Counter():
    __counter_instance = None

    def __init__(self):
        if Counter.__counter_instance is None:
            self.count = 1
            Counter.__counter_instance = self
        else:
            Counter.__counter_instance.count += 1
            self.count = Counter.__counter_instance.count
It works fine for me. But I keep thinking that there should be an easier way to solve this problem. Is there? And if singleton is the only way, could my code be optimized in any way?
What you're trying to do here is simulate a global variable.
There is no good reason to do that. If you really want a global variable, make it explicitly a global variable.
You could create a simple Counter class that increments count by 1 each time you access it, and then create a global instance of it. But the standard library already gives you something like that for free, in itertools.count, as DSM explains in a comment.
So:
import itertools

_counter = itertools.count()

def make_screenshot_file(file_name):
    order = next(_counter)
    test_suites_path = _make_job_directory()
    return make_writable_file(os.path.join(test_suites_path, 'screenshot', file_name % order))
I'm not sure why you're so worried about how much storage or time this takes up, because I can't conceive of any program where it could possibly matter whether you were using 8 bytes or 800 for a single object you can only ever have one of, or whether it took 3ns or 3us to access it when you only do so a handful of times.
But if you are worried, as you can see from the source, count is implemented in C, it's pretty memory-efficient, and if you don't do anything fancy with it, it comes down to basically a single PyNumber_Add to generate each number, which is a lot less than interpreting a few lines of code.
Since you asked, here's how you could radically simplify your existing code by using a _count class attribute instead of a __counter_instance class attribute:
class Counter():
    _count = 0

    def count(self):
        Counter._count += 1
        return Counter._count
Of course now you have to do Counter().count() instead of just Counter().count, but you can fix that trivially with @property if it matters.
It's worth pointing out that it's a really bad idea to use a classic class instead of a new-style class (which is what you get in Python 2 when you put nothing inside the parens); if you do want a classic class, you should leave the parens off entirely. Also, most Python programmers will associate the name Counter with the class collections.Counter, and there's no reason count couldn't be a @classmethod or @staticmethod… at which point this is exactly Andrew T.'s answer, which, as he points out, is much simpler than what you're doing and no more or less Pythonic.
But really, all of this is no better than just making _count a module-level global and adding a module-level count() function that increments and returns it.
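A minimal sketch of that module-level approach (the module name counter.py and the function name are illustrative):

# counter.py, a hypothetical helper module
_count = 0

def count():
    """Increment the module-level counter and return the new value."""
    global _count
    _count += 1
    return _count

Callers would then just import counter and call counter.count() wherever an order number is needed.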
Why not just do
order = time.time()
or do something like
import glob  # glob is used for unix-like path expansion
order = len(glob.glob(os.path.join(test_suites_path, "screenshot", "%s*" % file_name)))
Using static methods and variables. Not very pythonic, but simpler.
def make_screenshot_file(file_name):
    order = Counter.count()  # note that the parens moved
    test_suites_path = _make_job_directory()
    return make_writable_file(os.path.join(test_suites_path, 'screenshot', file_name % order))

class Counter():
    count_n = 0

    @staticmethod
    def count():
        Counter.count_n += 1
        return Counter.count_n

print Counter.count()
print Counter.count()
print Counter.count()
print Counter.count()
print Counter.count()
atarzwell@freeman:~/src$ python so.py
1
2
3
4
5
Well, you can use this solution; just make sure you never pass a value for the order kwarg!
Mutable kwargs in functions work like class-level global variables, and the value isn't reset to the default between calls, as you might think at first!
def make_screenshot_file(file_name, order=[0]):
    order[0] = order[0] + 1
    test_suites_path = _make_job_directory()
    return make_writable_file(os.path.join(test_suites_path, 'screenshot', file_name % order[0]))
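A tiny self-contained illustration of why this works (next_order is a hypothetical name, with the file handling stripped out): the default list is created once, at function definition time, so the count survives between calls.

def next_order(order=[0]):
    # the same list object is reused as the default on every call
    order[0] += 1
    return order[0]

print(next_order())  # 1
print(next_order())  # 2
print(next_order())  # 3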
I have a function that does a calculation and saves the state of the calculation in the result dictionary (a mutable default argument). I first run it, then run several processes using the multiprocessing module. I need to run the function again in each of those parallel processes, but after this function has run once, I need the cached state to be returned; the value must not be recalculated. This requirement doesn't make much sense in my example, but I can't think of a simple realistic example that would require this restriction. Using a dict as a mutable default argument works, but
this doesn't work with the multiprocessing module. What approach can I use to get the same effect?
Note that the state value is something (a dictionary containing class instances) that cannot be passed to the multiple processes as an argument, as far as I know.
The SO question Python multiprocessing: How do I share a dict among multiple processes? seems to cover similar ground. Perhaps I can use a Manager to do what I need, but it is not obvious how. Alternatively, one could perhaps save the value to a global object, per https://stackoverflow.com/a/4534956/350713, but that doesn't seem very elegant.
def foo(result={}):
    if result:
        print "returning cached result"
        return result
    result[1] = 2
    return result

def parafn():
    from multiprocessing import Pool
    pool = Pool(processes=2)
    arglist = []
    foo()
    for i in range(4):
        arglist.append({})
    results = []
    r = pool.map_async(foo, arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    return results

print parafn()
UPDATE: Thanks for the comments. I've got a working example now, posted below.
I think the safest way of exchanging data between processes is with a Queue; the multiprocessing module gives you two types of them, Queue and JoinableQueue. See the documentation:
http://docs.python.org/library/multiprocessing.html#exchanging-objects-between-processes
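For illustration, a minimal hedged sketch of that approach (not the OP's code): a worker performs the calculation once and puts the result on a Queue for the parent process to read.

from multiprocessing import Process, Queue

def worker(q):
    # stand-in for the expensive calculation whose result we want to share
    result = {1: 2}
    q.put(result)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # {1: 2}
    p.join()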
This code would not win any beauty prizes, but works for me.
This example is similar to the example in the question, but with some minor changes.
The add_to_d construct is a bit awkward, but I don't see a better way to do this.
Brief summary: I copy the state of foo's d (which is a mutable default argument) back into foo, but into the foo that lives in the new process spaces created by the pool. Once this is done, foo in the new process spaces will not recalculate the cached values.
It seems this is what the pool initializer does, though the documentation is not very explicit.
class bar(object):
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return "<bar " + str(self.x) + ">"

def foo(x=None, add_to_d=None, d={}):
    if add_to_d:
        d.update(add_to_d)
    if x is None:
        return
    if x in d:
        print "returning cached result, d is %s, x is %s" % (d, x)
        return d[x]
    d[x] = bar(x)
    return d[x]

def finit(cacheval):
    foo(x=None, add_to_d=cacheval)

def parafn():
    from multiprocessing import Pool
    arglist = []
    foo(1)
    pool = Pool(processes=2, initializer=finit, initargs=[foo.func_defaults[2]])
    arglist = range(4)
    results = []
    r = pool.map_async(foo, iterable=arglist, callback=results.append)
    r.get()
    r.wait()
    pool.close()
    pool.join()
    return results

print parafn()
I've read a little about coroutines, in particular with python, and something is not entirely obvious to me.
I have implemented a producer/consumer model, a basic version of which is as follows:
#!/usr/bin/env python

class MyConsumer(object):
    def __init__(self, name):
        self.__name = name

    def __call__(self, data):
        return self.observer(data)

    def observer(self, data):
        print self.__name + ': ' + str(data)

class MyProducer(object):
    def __init__(self):
        self.__observers = []
        self.__counter = 0

    def add_observer(self, observer):
        self.__observers.append(observer)

    def run(self):
        while self.__counter < 10:
            for each_observer in self.__observers:
                each_observer(self.__counter)
            self.__counter += 1

def main():
    consumer_one = MyConsumer('consumer one')
    consumer_two = MyConsumer('consumer two')
    producer = MyProducer()
    producer.add_observer(consumer_one)
    producer.add_observer(consumer_two)
    # run
    producer.run()

if __name__ == "__main__":
    main()
Obviously, MyConsumer could have routines for producing as well, and so a data pipeline can be built easily. As I have implemented this in practice, a base class is defined that implements the logic of the consumer/producer model, and a single processing function is implemented that is overridden in child classes. This makes it very simple to produce data pipelines with easily defined, isolated processing elements.
This seems to me to be typical of the kinds of applications that are presented for coroutines, for example in the oft quoted tutorial: http://www.dabeaz.com/coroutines/index.html. Unfortunately, it is not apparent to me what the advantages of coroutines are over the implementation above. I can see that in languages in which callable objects are more difficult to handle, there is something to be gained, but in the case of python, this doesn't seem to be an issue.
Can anybody shed some light on this for me? Thanks.
edit: Apologies, the producer in the above code counts from 0 to 9 and notifies the consumers, which then print out their name followed by the count value.
When using the coroutine approach, both the consumer and the producer code can sometimes be simpler. In your approach, at least one of them must be written as a finite-state machine (assuming some state is involved).
With the coroutines approach they are essentially independent processes.
An example would help:
Take the example you provided but now assume the consumer prints only every 2nd input. Your approach requires adding an object member indicating whether the received input is an odd or even sample.
def observer(self, data):
    self.odd_sample = not self.odd_sample
    if self.odd_sample:
        print str(data)
When using a coroutine, one would just loop over the input, dropping every second input. The 'state' is implicitly maintained by the current position in the code:
while True:
    y = producer()
    print(y)
    y = producer()
    # ignore this value
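To make that concrete, here is a minimal hedged sketch of the same every-second-item consumer written as a generator-based coroutine in the style of the dabeaz tutorial linked above (the names are illustrative, not the OP's): the odd/even state lives entirely in where the loop body happens to be paused.

def consumer(name):
    """Coroutine that prints every second value sent to it."""
    while True:
        data = (yield)  # first value: print it
        print(name + ': ' + str(data))
        (yield)         # second value: silently drop it

c = consumer('consumer one')
next(c)                 # prime the coroutine
for i in range(10):
    c.send(i)           # prints 0, 2, 4, 6, 8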