TL;DR I want to collect the accumulated data in the globals of each worker when the pool is finished processing
Description of what I think I'm missing
As I'm new to multiprocessing, I don't know all the features that exist. I am looking for a way to make a worker return the value it was initialized with (after manipulating that value many millions of times). Then, I hope I can collect and merge all these values at the end of the program when all the 'jobs' are done.
import multiprocessing as mp
from collections import defaultdict, Counter
from customtools import load_regexes #, . . .
import gzip
import nltk
result_dict = None
regexes = None
def create_worker():
    global result_dict
    global regexes
    result_dict = defaultdict(Counter)  # I want to return this at the end
    # these are a bunch of huge regexes
    regexes = load_regexes()
These functions represent the way I load and process data. The data is a big gzip file with articles.
def load_data(semaphore):
    with gzip.open('some10Gbfile') as f:
        for line in f:
            semaphore.acquire()
            yield str(line, 'utf-8')
def worker_job(line):
    global regexes
    global result_dict
    hits = defaultdict(Counter)
    for sent in nltk.sent_tokenize(line[3:]):
        for name, regex in regexes.items():
            for hit in regex.finditer(sent):
                hits[name][hit.group(0)] += 1
    # and more and more... filtered_hits = _filter(_extract(hits))
    # store some data in result_dict here . . .
    return filtered_hits
class ResultEater:
    def __init__(self):
        self.wordscounts = defaultdict(Counter)
        self.filtered = Counter()

    def eat_results(self, filtered_hits):
        for k, v in filtered_hits.items():
            for i, c in v.items():
                self.wordscounts[k][i] += c
This is the main program
if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count(), initializer=create_worker)
    semaphore = mp.Semaphore(50)
    loader = load_data(semaphore)
    results = ResultEater()
    for intermediate_result in pool.imap_unordered(worker_job, loader, chunksize=10):
        results.eat_results(intermediate_result)
        semaphore.release()
    # results.eat_workers(the_leftover_workers_or_something)
    results.print()
I'm not entirely sure why returning the data incrementally isn't sufficient, but it does seem like you need some sort of finalization function to send back the data, similar to the initialization function you already have. Unfortunately, nothing like that exists for mp.Pool, so you'll have to use a few mp.Process instances yourself, sending input arguments and returning results with a couple of mp.Queue instances.
On a side note, your use of Semaphore is unnecessary, as the call to the "load_data" iterator always happens on the main process. I have moved that to a separate "producer" process, which puts inputs on a queue, which is already synchronized automatically by default. This allows you to have one process for gathering inputs, several processes for processing the inputs to outputs, and leaves the main (parent) process to gather outputs. If the "producer" generating the inputs is limited by file read speed (very likely), it could also be a thread rather than a process, but in this case the difference is probably minimal.
I have created an example of a custom "Pool" which allows you to return some data at the end of each worker's "life", using the aforementioned producer-consumer scheme. There are print statements to track what is going on in each process, but please also read the comments to follow what's happening and why:
import multiprocessing as mp
from time import sleep
from queue import Empty

class ExitFlag:
    def __init__(self, exit_value=None):
        self.exit_value = exit_value  # optionally pass a value along with the exit flag

def producer_func(input_q, n_workers):
    for i in range(100):  # 100 lines of some long file
        print(f"put {i}")
        input_q.put(i)  # put each line of the file on the work queue
    print('stopping consumers')
    for i in range(n_workers):
        input_q.put(ExitFlag())  # send a shutdown signal to each of the workers
    print('producer exiting')

def consumer_func(input_q, output_q, work_func):
    counter = 0
    while True:
        try:
            item = input_q.get(timeout=.1)  # never wait forever on a "get". It's a recipe for deadlock.
        except Empty:
            continue
        print(f"get {item}")
        if isinstance(item, ExitFlag):
            break
        else:
            counter += 1
            output_q.put(work_func(item))
    output_q.put(ExitFlag(exit_value=counter))
    print('consumer exiting')

def work_func(number):
    sleep(.1)  # some heavy nltk work...
    return number * 2

if __name__ == '__main__':
    input_q = mp.Queue(maxsize=10)  # only bother limiting size if you have memory usage constraints
    output_q = mp.Queue(maxsize=10)
    n_workers = mp.cpu_count()
    # Generate the input from another process. (This could just as easily be a
    # thread, as it seems it will be IO limited anyway.)
    producer = mp.Process(target=producer_func, args=(input_q, n_workers))
    producer.start()
    consumers = [mp.Process(target=consumer_func, args=(input_q, output_q, work_func))
                 for _ in range(n_workers)]
    for c in consumers: c.start()
    total = 0
    stop_signals = 0
    exit_values = []
    while True:
        try:
            item = output_q.get(timeout=.1)
        except Empty:
            continue
        if isinstance(item, ExitFlag):
            stop_signals += 1
            if item.exit_value is not None:
                exit_values.append(item.exit_value)  # do something with the return at the end
            if stop_signals >= n_workers:  # stop waiting for more results once all consumers finish
                break
        else:
            total += item  # do something with the incremental return values
    print(total)
    print(exit_values)
    # cleanup
    producer.join()
    print("producer joined")
    for c in consumers: c.join()
    print("consumers joined")
How can you feed an iterable to multiple consumers in constant space?
TLDR
Write an implementation which passes the following test in CONSTANT SPACE, while
treating min, max and sum as black boxes.
def testit(implementation, N):
    assert implementation(range(N), min, max, sum) == (0, N - 1, N * (N - 1) // 2)
Discussion
We love iterators because they let us process streams of data lazily,
allowing the treatment of huge amounts of data in CONSTANT SPACE.
def source_summary(source, summary):
    return summary(source)
N = 10 ** 8
print(source_summary(range(N), min))
print(source_summary(range(N), max))
print(source_summary(range(N), sum))
Each line took a few seconds to execute, but used very little memory. However, it did require 3 separate traversals of the source. So this will not work if your source is a network connection, data acquisition hardware, etc., unless you cache all the data somewhere, losing the CONSTANT SPACE requirement.
Here's a version which demonstrates this problem
def source_summaries(source, *summaries):
    from itertools import tee
    return tuple(map(source_summary,
                     tee(source, len(summaries)),
                     summaries))
testit(source_summaries, N)
print('OK')
The test passes, but tee had to keep a copy of all the data, so the space usage goes up from O(1) to O(N).
How can you obtain the results in a single traversal with constant memory?
It is, of course, possible to pass the test given at the top, with O(1) space usage, by cheating: using knowledge of the specific iterator-consumers that the test uses. But that is not the point: source_summaries should work with any iterator consumer, such as set, collections.Counter or ''.join, including any and all that may be written in the future. The implementation must treat them as black boxes.
To be clear: the only knowledge available about the consumers is that each one consumes one iterable and returns one result. Using any other knowledge about the consumer is cheating.
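For concreteness, here is what such cheating might look like: a sketch that passes the test in a single pass and O(1) space, but only by special-casing the three consumers the test happens to use, which is exactly what is disallowed:

def source_summaries_CHEAT(source, *summaries):
    # one pass, constant space, and totally against the rules:
    # we only handle the consumers we know the test will pass in
    lo, hi, total = None, None, 0
    for item in source:
        lo = item if lo is None or item < lo else lo
        hi = item if hi is None or item > hi else hi
        total += item
    results = {min: lo, max: hi, sum: total}
    return tuple(results[s] for s in summaries)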
Ideas
[EDIT: I have posted an implementation of this idea as an answer]
I can imagine a solution (which I really don't like) that uses
preemptive threading
a custom iterator linking the consumer to the source
Let's call the custom iterator link.
For each consumer, run
result = consumer(<link instance for this thread>)
<link instance for this thread>.set_result(result)
on a separate thread.
On the main thread, something along the lines of
for item in source:
    for l in links:
        l.push(item)
for l in links:
    l.stop()
for thread in threads:
    thread.join()
return tuple(l.get_result() for l in links)
link.__next__ blocks until the link instance receives either .push(item), in which case it returns the item, or .stop(), in which case it raises StopIteration.
The data races look like a nightmare. You'd need a queue for the pushes, and probably a sentinel object would need to be placed in the queue by link.stop() ... and a bunch of other things I'm overlooking.
I would prefer to use cooperative threading, but consumer(link) seems to be
unavoidably un-cooperative.
Do you have any less messy suggestions?
Here is an alternative implementation of your idea. It uses cooperative multi-threading. As you suggested, the key point is to use multi-threading and have the iterator's __next__ method block until all threads have consumed the current iterate.
In addition, the iterator contains an (optional) buffer of constant size. With this buffer we can read the source in chunks and avoid a lot of the locking/synchronization.
My implementation also handles the case in which some consumers stop iterating before reaching the end of the iterator.
import threading

class BufferedMultiIter:
    def __init__(self, source, n, bufsize=1):
        '''`source` is an iterator or iterable,
        `n` is the number of threads that will interact with this iterator,
        `bufsize` is the size of the internal buffer. The iterator will read
        and buffer elements from `source` in chunks of `bufsize`. The bigger
        the buffer is, the better the performance but also the bigger the
        (constant) space requirement.
        '''
        self._source = iter(source)
        self._n = n
        # Condition variable for synchronization
        self._cond = threading.Condition()
        # Buffered values
        bufsize = max(bufsize, 1)
        self._buffer = [None] * bufsize
        self._buffered = 0
        self._next = threading.local()
        # State variables to implement the "wait for buffer to get refilled"
        # protocol
        self._serial = 0
        self._waiting = 0
        # True if we reached the end of the source
        self._stop = False
        # Was the thread killed (for error handling)?
        self._killed = False

    def _fill_buffer(self):
        '''Refill the internal buffer.'''
        self._buffered = 0
        while self._buffered < len(self._buffer):
            try:
                self._buffer[self._buffered] = next(self._source)
                self._buffered += 1
            except StopIteration:
                self._stop = True
                break
        # Explicitly clear the unused part of the buffer to release
        # references as early as possible
        for i in range(self._buffered, len(self._buffer)):
            self._buffer[i] = None
        self._waiting = 0
        self._serial += 1

    def register_thread(self):
        '''Register a thread.
        Each thread that wants to access this iterator must first register
        with the iterator. It is an error to register the same thread more
        than once. It is an error to access this iterator with a thread that
        was not registered (with the exception of calling `kill`). It is an
        error to register more threads than the number that was passed to the
        constructor.
        '''
        self._next.i = 0

    def unregister_thread(self):
        '''Unregister a thread from this iterator.
        This should be called when a thread is done using the iterator.
        It catches the case in which a consumer does not consume all the
        elements from the iterator but exits early.
        '''
        assert hasattr(self._next, 'i')
        delattr(self._next, 'i')
        with self._cond:
            assert self._n > 0
            self._n -= 1
            if self._waiting == self._n:
                self._fill_buffer()
                self._cond.notify_all()

    def kill(self):
        '''Forcibly kill this iterator.
        This will wake up all threads currently blocked in `__next__` and
        will have them raise a `StopIteration`.
        This function should be called in case of error to terminate all
        threads as fast as possible.
        '''
        with self._cond:
            self._killed = True
            self._stop = True
            self._cond.notify_all()

    def __iter__(self): return self

    def __next__(self):
        if self._next.i == self._buffered:
            # We read everything from the buffer.
            # Wait until all other threads have also consumed the buffer
            # completely and then refill it.
            with self._cond:
                old = self._serial
                self._waiting += 1
                if self._waiting == self._n:
                    self._fill_buffer()
                    self._cond.notify_all()
                else:
                    # Wait until the serial number changes. A change in
                    # serial number indicates that another thread has filled
                    # the buffer
                    while self._serial == old and not self._killed:
                        self._cond.wait()
            # Start at the beginning of the newly filled buffer
            self._next.i = 0
        if self._killed:
            raise StopIteration
        k = self._next.i
        if k == self._buffered and self._stop:
            raise StopIteration
        value = self._buffer[k]
        self._next.i = k + 1
        return value

class NotAll:
    '''A consumer that does not consume all the elements from the source.'''
    def __init__(self, limit):
        self._limit = limit
        self._consumed = 0

    def __call__(self, it):
        last = None
        for k in it:
            last = k
            self._consumed += 1
            if self._consumed >= self._limit:
                break
        return last

def multi_iter(iterable, *consumers, **kwargs):
    '''Iterate using multiple consumers.
    Each value in `iterable` is presented to each of the `consumers`.
    The function returns a tuple with the results of all `consumers`.
    There is an optional `bufsize` argument. This controls the internal
    buffer size. The bigger the buffer, the better the performance, but also
    the bigger the (constant) space requirement of the operation.
    NOTE: This will spawn a new thread for each consumer! The iteration is
    multi-threaded and happens in parallel for each element.
    '''
    n = len(consumers)
    it = BufferedMultiIter(iterable, n, kwargs.get('bufsize', 1))
    threads = list()  # list of **running** threads
    result = [None] * n
    def thread_func(i, c):
        it.register_thread()
        result[i] = c(it)
        it.unregister_thread()
    try:
        for c in consumers:
            t = threading.Thread(target=thread_func, args=(len(threads), c))
            t.start()
            threads.append(t)
    except:
        # Here we should forcibly kill all the threads but there is no
        # t.kill() function or similar. So the best we can do is stop the
        # iterator.
        it.kill()
    finally:
        while len(threads) > 0:
            t = threads.pop(-1)
            t.join()
    return tuple(result)
from time import time

N = 10 ** 7
start1 = time()
res1 = (min(range(N)), max(range(N)), sum(range(N)), NotAll(1)(range(N)),
        NotAll(1000)(range(N)))
stop1 = time()
print('5 iterators: %s %.2f' % (str(res1), stop1 - start1))
for p in range(5):
    start2 = time()
    res2 = multi_iter(range(N), min, max, sum, NotAll(1), NotAll(1000),
                      bufsize=2**p)
    stop2 = time()
    print('multi_iter%d: %s %.2f' % (p, str(res2), stop2 - start2))
The timings are again horrible, but you can see how using a constant-size buffer improves things significantly:
5 iterators: (0, 9999999, 49999995000000, 0, 999) 0.71
multi_iter0: (0, 9999999, 49999995000000, 0, 999) 342.36
multi_iter1: (0, 9999999, 49999995000000, 0, 999) 264.71
multi_iter2: (0, 9999999, 49999995000000, 0, 999) 151.06
multi_iter3: (0, 9999999, 49999995000000, 0, 999) 95.79
multi_iter4: (0, 9999999, 49999995000000, 0, 999) 72.79
Maybe this can serve as a source of ideas for a good implementation.
Here is an implementation of the preemptive threading solution outlined in the original question.
[EDIT: There was a serious problem with this implementation, now fixed using a solution inspired by Daniel Junglas.
Consumers which do not iterate through the whole iterable would cause a space leak in the queue inside Link. For example:
def exceeds_10(iterable):
    for item in iterable:
        if item > 10:
            return True
    return False
If you used such a consumer with the source range(10**6), it would stop removing items from the queue inside Link after the first 11 items, leaving approximately 10**6 items to accumulate in the queue!
]
class Link:
    def __init__(self, queue):
        self.queue = queue

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()
        if item is FINISHED:
            raise StopIteration
        return item

    def put(self, item):
        self.queue.put(item)

    def stop(self):
        self.queue.put(FINISHED)

    def consumer_not_listening_any_more(self):
        self.__class__ = ClosedLink

class ClosedLink:
    def put(self, _): pass
    def stop(self): pass

class FINISHED: pass

def make_thread(link, consumer, future):
    from threading import Thread
    return Thread(target=lambda: on_thread(link, consumer, future))

def on_thread(link, consumer, future):
    future.set_result(consumer(link))
    link.consumer_not_listening_any_more()

def source_summaries_PREEMPTIVE_THREAD(source, *consumers):
    from queue import SimpleQueue as Queue
    from concurrent.futures import Future  # thread-safe, unlike asyncio.Future
    links = tuple(Link(Queue()) for _ in consumers)
    futures = tuple(Future() for _ in consumers)
    threads = tuple(map(make_thread, links, consumers, futures))
    for thread in threads:
        thread.start()
    for item in source:
        for link in links:
            link.put(item)
    for link in links:
        link.stop()
    for t in threads:
        t.join()
    return tuple(f.result() for f in futures)
It works, but (unsurprisingly) with a horrible degradation in performance:
def time(thunk):
    from time import time
    start = time()
    thunk()
    stop = time()
    return stop - start

N = 10 ** 7
t = time(lambda: testit(source_summaries, N))
print(f'old: {N} in {t:5.1f} s')
t = time(lambda: testit(source_summaries_PREEMPTIVE_THREAD, N))
print(f'new: {N} in {t:5.1f} s')
giving
old: 10000000 in 1.2 s
new: 10000000 in 30.1 s
So, even though this is a theoretical solution, it is not a practical one[*].
Consequently, I think that this approach is a dead end, unless there's a way of persuading consumer to yield cooperatively (as opposed to forcing it to yield preemptively) in
def on_thread(link, consumer, future):
    future.set_result(consumer(link))
... but that seems fundamentally impossible. Would love to be proven wrong.
[*] This is actually a bit harsh: the test does absolutely nothing with trivial data; if this were part of a larger computation which performed heavy calculations on the elements, then this approach could be genuinely useful.
I'm using empty while loops a lot. For example:
I have a thread running in the background that will change a value called a in 5 seconds. However, I'm using a different function at the same time, and I want to let the second function know that the value has changed, so what I always did was:
import threading, time

class example:
    def __init__(self):
        self.a = 0

    def valchange(self):
        time.sleep(5)
        self.a += 1
        time.sleep(1)
        print("im changing the a value to " + str(self.a))
        print("those print commands needs to run after notifier stopped his while and started printing")

def notifier(exam: example, num: int):
    while(exam.a != num):
        pass
    print("it changed to " + str(num))

exa = example()
i = 1
while(i <= 16):
    temp = threading.Thread(target=notifier, args=(exa, i,))
    temp.start()
    i += 3
i = 1
while(i <= 16):
    exa.valchange()
    i += 1
It's important to mention that this example can't use an Event's wait and set, because there is no indication of when you'd need to call set, how many threads are running in the background, or even which numbers the threads are waiting for.
You also can't use join, because changing a is not by itself the signal to print; only the condition is.
Async and select can't help me either, for the same reason.
Is there any way to create something that will stop the program from running until the condition becomes true? You can provide your solution in any programming language you want, but mainly I'm using Python 3.
EDIT: Please remember that I need it to work with every condition. My code example is only an example, so something that works there won't necessarily work with a different condition.
Thank you very much in advance :)
Idea:
wait(a == 5) // will do nothing until a == 5
You need to use select or epoll system calls if you're waiting for some system operation to finish. If you're waiting for a certain IO event, you can use asyncio (provided your Python version is 3.3 or newer); otherwise you could consider Twisted.
If you're doing CPU-bound operations, you need to consider multiple processes or threads; only then can you do any such monitoring effectively. Having a while loop running infinitely without any interruption is a disaster waiting to happen.
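For the specific "wait until a equals some number" case in the question, a plain threading.Condition also avoids the busy loop without any IO machinery. A minimal sketch (a different primitive than the select/asyncio route above): wait_for blocks the waiter and re-evaluates its predicate each time the changer calls notify_all, so it works with arbitrary conditions:

import threading

class example:
    def __init__(self):
        self.a = 0
        self.cond = threading.Condition()

    def set_a(self, value):
        with self.cond:
            self.a = value
            self.cond.notify_all()  # wake every waiter to re-check its predicate

    def wait_for_a(self, num):
        with self.cond:
            # blocks without spinning until the predicate becomes true
            self.cond.wait_for(lambda: self.a == num)
        print("it changed to " + str(num))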
If your thread only changes a's value once, at the end of its life, then you can use .join() to wait for the thread to terminate.
import threading
import time

class example:
    def __init__(self):
        self.a = 0
        self.temp = threading.Thread(target=self.valchange)
        self.temp.start()
        self.notifier()

    def valchange(self):
        time.sleep(5)
        self.a = 1

    def notifier(self):
        self.temp.join()
        print("the value of a has changed")

example()
If the thread might change a's value at any point in its lifetime, then you can use one of the threading module's more generalized control flow objects to coordinate execution. For instance, the Event object.
import threading
import time

class example:
    def __init__(self):
        self.a = 0
        self.event = threading.Event()
        temp = threading.Thread(target=self.valchange)
        temp.start()
        self.notifier()

    def valchange(self):
        time.sleep(5)
        self.a = 1
        self.event.set()

    def notifier(self):
        self.event.wait()
        print("the value of a has changed")

example()
One drawback to this Event approach is that the thread target has to explicitly call set() whenever it changes the value of a, which can be irritating if you change a several times in your code. You could automate this away using a property:
import threading
import time

class example(object):
    def __init__(self):
        self._a = 0
        self._a_event = threading.Event()
        temp = threading.Thread(target=self.valchange)
        temp.start()
        self.notifier()

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        self._a = value
        self._a_event.set()

    def valchange(self):
        time.sleep(5)
        self.a = 1

    def notifier(self):
        self._a_event.wait()
        print("the value of a has changed")

example()
Now valchange doesn't have to do anything special after setting a's value.
What you are describing is a spin lock, and might be fine, depending on your use case.
The alternative approach is to have the code you are waiting on call you back when it reaches a certain condition. This would require an async framework such as https://docs.python.org/3/library/asyncio-task.html
There are some nice simple examples in those docs so I won't insult your intelligence by pasting them here.
I am trying to access the same global dictionary from different threads in Python simultaneously. Thread safety at the accessing point is not a concern for me, since all accesses are reads and don't modify the dictionary.
I changed my code to do the accesses from multiple threads, but I have noticed no increase in the speed of the execution. After checking around, it seems that the interpreter serializes the accesses, in effect making the change in my code null.
Is there an easy way to have a structure like concurrentHashMap of Java in python?
The part of the code in question follows:
class csvThread(threading.Thread):
    def __init__(self, threadID, bizName):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.bizName = bizName

    def run(self):
        thread_function(self.bizName)

def thread_function(biz):
    first = True
    bizTempImgMap = {}
    for imag in bizMap[biz]:
        if not similar(bizTempImgMap, imgMap[imag]):
            bizTempImgMap[imag] = imgMap[imag]
            if first:
                a = imgMap[imag]
                sum = a
            else:
                c = np.column_stack((a, imgMap[imag]))
                sum += imgMap[imag]
                a = c.max(1)  # max
            first = False
        else:
            print ("-")
    csvLock.acquire()
    writer.writerow([biz]+a.astype(np.str_).tolist()+(np.true_divide(sum, len(bizTempImgMap.keys()))).tolist())
    csvLock.release()
csvLock = threading.Lock()
...
imgMap = img_vector_load('test_photos.csv')
bizMap = img_busyness_load('csv/test_photo_to_biz_ids.csv')
...
for biz in bizMap.keys():
    if len(threads) < 100:
        thread = csvThread(len(threads), biz)
        threads.append(thread)
        thread.start()
    else:
        print("\nWaiting for threads to finish\n")
        for t in threads:
            t.join()
        print("\nThreads Finished\n")
        threads = []
"i have noticed no increase in the speed of the execution"
No speed increase will be done by using threads in python, since they all work on the same core.
Take a look to: GIL
Notice this, python threading should be used for concurrent arquitectures not for speed performance.
In case you want to keep this implementation use multiprocessing.
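A minimal sketch of the multiprocessing alternative (process_biz is a hypothetical stand-in for the per-business work in the question, not the asker's actual code):

import multiprocessing as mp

def process_biz(biz):
    # stand-in for the CPU-bound per-business image work
    return biz, sum(i * i for i in range(10 ** 6))

if __name__ == '__main__':
    with mp.Pool(mp.cpu_count()) as pool:
        # imap_unordered yields results as workers finish, in any order
        for biz, value in pool.imap_unordered(process_biz, ['a', 'b', 'c', 'd']):
            print(biz, value)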
I think I should use a Lock object to protect a custom class when using multiple threads. However, because Python uses the GIL to ensure that only one thread is running at any given time, does that mean there's no need to use a Lock to protect a built-in type like list? For example:
import threading

num_list = []

def consumer():
    while True:
        if len(num_list) > 0:
            num = num_list.pop()
            print num
            return

def producer():
    num_list.append(1)

consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=producer)
consumer_thread.start()
producer_thread.start()
The GIL protects the interpreter state, not yours. There are some operations that are effectively atomic - they require a single bytecode and thus effectively do not require locking. (see is python variable assignment atomic? for an answer from a very reputable Python contributor).
There isn't really any good documentation on this, though, so I wouldn't rely on it in general unless you plan on disassembling bytecode to test your assumptions. If you plan on modifying state from multiple contexts (or modifying and accessing complex state), then you should plan on using some sort of locking/synchronization mechanism.
If you're interested in approaching this class of problem from a different angle you should look into the Queue module. A common pattern in Python code is to use a synchronized queue to communicate among thread contexts rather than working with shared state.
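A minimal sketch of that pattern (Python 3 naming; the module is called Queue in Python 2). A sentinel object tells the consumer when to stop:

import threading
import queue

SENTINEL = object()

def producer(q):
    for i in range(5):
        q.put(i)         # Queue.put/get are already synchronized
    q.put(SENTINEL)      # signal the consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print(item)

q = queue.Queue()
threads = [threading.Thread(target=producer, args=(q,)),
           threading.Thread(target=consumer, args=(q,))]
for t in threads: t.start()
for t in threads: t.join()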
@jeremy-brown explains it in words (see below)... but if you want a counter-example:
The GIL isn't protecting your state. The following example doesn't use locks, and as a result, if the xrange value is high enough, it will produce failures: IndexError: pop from empty list.
import threading
import time

con1_list = []
con2_list = []
stop = 10000
total = 500000
num_list = []

def consumer(name):
    while True:
        if len(num_list) > 0:
            if name == 'nix':
                con2_list.append(num_list.pop())
                if len(con2_list) == stop:
                    print 'done b'
                    return
            else:
                con1_list.append(num_list.pop())
                if len(con1_list) == stop:
                    print 'done a'
                    return

def producer():
    for x in xrange(total):
        num_list.append(x)

def test():
    while not (len(con2_list) >= stop and len(con1_list) >= stop):
        time.sleep(1)
    print set(con1_list).intersection(set(con2_list))

consumer_thread = threading.Thread(target=consumer, args=('nick',))
consumer_thread2 = threading.Thread(target=consumer, args=('nix',))
producer_thread = threading.Thread(target=producer)
watcher = threading.Thread(target=test)
consumer_thread.start(); consumer_thread2.start(); producer_thread.start(); watcher.start()
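For completeness: the failure comes from the check-then-pop not being atomic, since another thread can empty the list between the len() test and the pop(). A minimal sketch of the fix is to hold a lock across both steps (safe_pop is a hypothetical helper, not part of the code above):

import threading

lock = threading.Lock()
num_list = []

def safe_pop():
    # the check and the pop happen under one lock, so no other
    # thread can empty the list in between
    with lock:
        if num_list:
            return num_list.pop()
        return None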