How can you feed an iterable to multiple consumers in constant space? - python

Write an implementation which passes the following test in CONSTANT SPACE, while
treating min, max and sum as black boxes.
def testit(implementation, N):
    assert implementation(range(N), min, max, sum) == (0, N-1, N*(N-1)//2)
We love iterators because they let us process streams of data lazily,
allowing the treatment of huge amounts of data in CONSTANT SPACE.
def source_summary(source, summary):
    return summary(source)

N = 10 ** 8
print(source_summary(range(N), min))
print(source_summary(range(N), max))
print(source_summary(range(N), sum))
Each line took a few seconds to execute, but used very little memory. However,
it did require 3 separate traversals of the source. So this will not work if
your source is a network connection, data acquisition hardware, etc., unless you cache all the data somewhere, losing the CONSTANT SPACE requirement.
Here's a version which demonstrates this problem:
def source_summaries(source, *summaries):
    from itertools import tee
    return tuple(map(source_summary, tee(source, len(summaries)),
                                     summaries))

testit(source_summaries, N)
The test passes, but tee had to keep a copy of all the data, so the space usage goes up from O(1) to O(N).
How can you obtain the results in a single traversal with constant memory?
It is, of course, possible to pass the test given at the top, with O(1) space usage, by cheating:
using knowledge of the specific iterator-consumers that the test uses. But
that is not the point: source_summaries should work with any iterator
consumers such as set, collections.Counter, ''.join, including any
and all that may be written in the future. The implementation must treat them
as black boxes.
To be clear: the only knowledge available about the consumers is that each one consumes one iterable and returns one result. Using any other knowledge about the consumer is cheating.
[EDIT: I have posted an implementation of this idea as an answer]
I can imagine a solution (which I really don't like) that uses preemptive threading and a custom iterator linking the consumer to the source.
Let's call the custom iterator link.
For each consumer, run
result = consumer(<link instance for this thread>)
<link instance for this thread>.set_result(result)
on a separate thread.
On the main thread, something along the lines of:
for item in source:
    for l in links:
        l.push(item)
for l in links:
    l.stop()
for thread in threads:
    thread.join()
return tuple(link.get_result() for link in links)
link.__next__ blocks until the link instance receives either .push(item), in which case it returns the item, or .stop(), in which case it raises StopIteration.
The data races look like a nightmare. You'd need a queue for the pushes, and probably a sentinel object would need to be placed in the queue by link.stop() ... and a bunch of other things I'm overlooking.
I would prefer to use cooperative threading, but consumer(link) seems to be
unavoidably un-cooperative.
Do you have any less messy suggestions?

Here is an alternative implementation of your idea. It uses cooperative multi-threading. As you suggested, the key point is to use multi-threading and to have the iterator's __next__ method block until all threads have consumed the current item.
In addition, the iterator contains an (optional) buffer of constant size. With this buffer we can read the source in chunks and avoid a lot of the locking/synchronization.
My implementation also handles the case in which some consumers stop iterating before reaching the end of the iterator.
import threading

class BufferedMultiIter:
    def __init__(self, source, n, bufsize = 1):
        '''`source` is an iterator or iterable,
        `n` is the number of threads that will interact with this iterator,
        `bufsize` is the size of the internal buffer. The iterator will read
        and buffer elements from `source` in chunks of `bufsize`. The bigger
        the buffer is, the better the performance but also the bigger the
        (constant) space requirement.
        '''
        self._source = iter(source)
        self._n = n
        # Condition variable for synchronization
        self._cond = threading.Condition()
        # Buffered values
        bufsize = max(bufsize, 1)
        self._buffer = [None] * bufsize
        self._buffered = 0
        self._next = threading.local()
        # State variables to implement the "wait for buffer to get refilled"
        # protocol
        self._serial = 0
        self._waiting = 0
        # True if we reached the end of the source
        self._stop = False
        # Was the thread killed (for error handling)?
        self._killed = False

    def _fill_buffer(self):
        '''Refill the internal buffer. Must be called with `self._cond` held.'''
        self._buffered = 0
        while self._buffered < len(self._buffer):
            try:
                self._buffer[self._buffered] = next(self._source)
                self._buffered += 1
            except StopIteration:
                self._stop = True
                break
        # Explicitly clear the unused part of the buffer to release
        # references as early as possible
        for i in range(self._buffered, len(self._buffer)):
            self._buffer[i] = None
        self._waiting = 0
        self._serial += 1

    def register_thread(self):
        '''Register a thread.

        Each thread that wants to access this iterator must first register
        with the iterator. It is an error to register the same thread more
        than once. It is an error to access this iterator with a thread that
        was not registered (with the exception of calling `kill`). It is an
        error to register more threads than the number that was passed to the
        constructor.
        '''
        self._next.i = 0

    def unregister_thread(self):
        '''Unregister a thread from this iterator.

        This should be called when a thread is done using the iterator.
        It catches the case in which a consumer does not consume all the
        elements from the iterator but exits early.
        '''
        assert hasattr(self._next, 'i')
        delattr(self._next, 'i')
        with self._cond:
            assert self._n > 0
            self._n -= 1
            # If all remaining threads are already waiting for a refill,
            # this thread was the one holding them up: refill now.
            if self._waiting == self._n:
                self._fill_buffer()
                self._cond.notify_all()

    def kill(self):
        '''Forcibly kill this iterator.

        This will wake up all threads currently blocked in `__next__` and
        will have them raise a `StopIteration`.
        This function should be called in case of error to terminate all
        threads as fast as possible.
        '''
        with self._cond:
            self._killed = True
            self._stop = True
            self._cond.notify_all()

    def __iter__(self): return self

    def __next__(self):
        if self._next.i == self._buffered:
            # We read everything from the buffer.
            # Wait until all other threads have also consumed the buffer
            # completely and then refill it.
            with self._cond:
                old = self._serial
                self._waiting += 1
                if self._waiting == self._n:
                    # Last thread to arrive: refill and wake everybody up
                    self._fill_buffer()
                    self._cond.notify_all()
                else:
                    # Wait until the serial number changes. A change in
                    # serial number indicates that another thread has filled
                    # the buffer
                    while self._serial == old and not self._killed:
                        self._cond.wait()
            # Start at beginning of newly filled buffer
            self._next.i = 0

        if self._killed:
            raise StopIteration
        k = self._next.i
        if k == self._buffered and self._stop:
            raise StopIteration
        value = self._buffer[k]
        self._next.i = k + 1
        return value

class NotAll:
    '''A consumer that does not consume all the elements from the source.'''
    def __init__(self, limit):
        self._limit = limit
        self._consumed = 0
    def __call__(self, it):
        last = None
        for k in it:
            last = k
            self._consumed += 1
            if self._consumed >= self._limit:
                break
        return last

def multi_iter(iterable, *consumers, **kwargs):
    '''Iterate using multiple consumers.

    Each value in `iterable` is presented to each of the `consumers`.
    The function returns a tuple with the results of all `consumers`.
    There is an optional `bufsize` argument. This controls the internal
    buffer size. The bigger the buffer, the better the performance, but also
    the bigger the (constant) space requirement of the operation.
    NOTE: This will spawn a new thread for each consumer! The iteration is
    multi-threaded and happens in parallel for each element.
    '''
    n = len(consumers)
    it = BufferedMultiIter(iterable, n, kwargs.get('bufsize', 1))
    threads = list() # List with **running** threads
    result = [None] * n
    def thread_func(i, c):
        it.register_thread()
        try:
            result[i] = c(it)
        finally:
            # Always unregister, so that a consumer that exits early (or
            # raises) cannot leave the other threads waiting forever
            it.unregister_thread()
    try:
        for c in consumers:
            t = threading.Thread(target = thread_func, args = (len(threads), c))
            t.start()
            threads.append(t)
    except:
        # Here we should forcibly kill all the threads but there is no
        # t.kill() function or similar. So the best we can do is stop the
        # iterator
        it.kill()
        raise
    finally:
        while len(threads) > 0:
            t = threads.pop(-1)
            t.join()
    return tuple(result)
from time import time

N = 10 ** 7
notall1 = NotAll(1)
notall1000 = NotAll(1000)
start1 = time()
res1 = (min(range(N)), max(range(N)), sum(range(N)), NotAll(1)(range(N)),
        NotAll(1000)(range(N)))
stop1 = time()
print('5 iterators: %s %.2f' % (str(res1), stop1 - start1))

for p in range(5):
    start2 = time()
    res2 = multi_iter(range(N), min, max, sum, NotAll(1), NotAll(1000),
                      bufsize = 2**p)
    stop2 = time()
    print('multi_iter%d: %s %.2f' % (p, str(res2), stop2 - start2))
The timings are again horrible, but you can see how using a constant-size buffer improves things significantly:
5 iterators: (0, 9999999, 49999995000000, 0, 999) 0.71
multi_iter0: (0, 9999999, 49999995000000, 0, 999) 342.36
multi_iter1: (0, 9999999, 49999995000000, 0, 999) 264.71
multi_iter2: (0, 9999999, 49999995000000, 0, 999) 151.06
multi_iter3: (0, 9999999, 49999995000000, 0, 999) 95.79
multi_iter4: (0, 9999999, 49999995000000, 0, 999) 72.79
Maybe this can serve as a source of ideas for a good implementation.

Here is an implementation of the preemptive threading solution outlined in the original question.
[EDIT: There is a serious problem with this implementation.] [EDIT: Now fixed, using a solution inspired by Daniel Junglas.]
Consumers which do not iterate through the whole iterable will cause a space leak in the queue inside Link. For example:
def exceeds_10(iterable):
    for item in iterable:
        if item > 10:
            return True
    return False
If you use this as one of the consumers with the source range(10**6), it will stop removing items from the queue inside Link after the first 12 items (0 through 11), leaving approximately 10**6 items to accumulate in the queue!
class Link:
    def __init__(self, queue):
        self.queue = queue
    def __iter__(self):
        return self
    def __next__(self):
        item = self.queue.get()
        if item is FINISHED:
            raise StopIteration
        return item
    def put(self, item):
        self.queue.put(item)
    def stop(self):
        self.queue.put(FINISHED)
    def consumer_not_listening_any_more(self):
        # Turn this link into a no-op, so no more items pile up in its queue
        self.__class__ = ClosedLink

class ClosedLink:
    def put(self, _): pass
    def stop(self) : pass

class FINISHED: pass

def make_thread(link, consumer, future):
    from threading import Thread
    return Thread(target = lambda: on_thread(link, consumer, future))

def on_thread(link, consumer, future):
    try:
        future.set_result(consumer(link))
    except BaseException as e:
        # Hand the consumer's exception to the main thread via the future
        future.set_exception(e)
    finally:
        link.consumer_not_listening_any_more()

def source_summaries_PREEMPTIVE_THREAD(source, *consumers):
    from queue import SimpleQueue as Queue
    # concurrent.futures.Future is thread-safe (asyncio.Future is not)
    from concurrent.futures import Future
    links = tuple(Link(Queue()) for _ in consumers)
    futures = tuple( Future() for _ in consumers)
    threads = tuple(map(make_thread, links, consumers, futures))
    for thread in threads:
        thread.start()
    for item in source:
        for link in links:
            link.put(item)
    for link in links:
        link.stop()
    for t in threads:
        t.join()
    return tuple(f.result() for f in futures)
It works, but (unsurprisingly) with a horrible degradation in performance:
def time(thunk):
    from time import time
    start = time()
    thunk()
    stop = time()
    return stop - start

N = 10 ** 7
t = time(lambda: testit(source_summaries, N))
print(f'old: {N} in {t:5.1f} s')
t = time(lambda: testit(source_summaries_PREEMPTIVE_THREAD, N))
print(f'new: {N} in {t:5.1f} s')
old: 10000000 in 1.2 s
new: 10000000 in 30.1 s
So, even though this is a theoretical solution, it is not a practical one[*].
Consequently, I think that this approach is a dead end, unless there's a way of persuading the consumer to yield cooperatively (as opposed to forcing it to yield preemptively) inside the consumer(link) call in on_thread ... but that seems fundamentally impossible. Would love to be proven wrong.
[*] This is actually a bit harsh: the test does absolutely nothing with trivial data; if this were part of a larger computation which performed heavy calculations on the elements, then this approach could be genuinely useful.
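One caveat about the version above: SimpleQueue is unbounded, so if the main thread pushes items faster than some consumer thread pulls them, that consumer's queue can still grow, and the space usage is only O(1) when all consumers keep up. Below is a minimal sketch of a bounded variant, assuming queue.Queue(maxsize) in place of SimpleQueue so that put() blocks once a consumer falls maxsize items behind. BoundedLink, source_summaries_bounded and maxsize are names invented for this illustration. Instead of swapping __class__, a consumer that returns (early or not) marks its link closed and drains it, so a main thread blocked on that full queue is released.

```python
import queue
import threading
from concurrent.futures import Future

_FINISHED = object()  # sentinel marking the end of the stream


class BoundedLink:
    """Iterator handed to one consumer, backed by a bounded queue (hypothetical)."""

    def __init__(self, maxsize):
        self._queue = queue.Queue(maxsize)
        self._closed = False

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is _FINISHED:
            raise StopIteration
        return item

    def put(self, item):
        # Blocks while the queue is full, so the producer can never run more
        # than `maxsize` items ahead of this consumer.
        if not self._closed:
            self._queue.put(item)

    def close(self):
        # Called once the consumer has returned, normally or early.
        self._closed = True
        # Drain, so a producer blocked in put() on a full queue can continue.
        while True:
            try:
                self._queue.get_nowait()
            except queue.Empty:
                break


def source_summaries_bounded(source, *consumers, maxsize=16):
    links = [BoundedLink(maxsize) for _ in consumers]
    futures = [Future() for _ in consumers]

    def run(link, consumer, future):
        try:
            future.set_result(consumer(link))
        except BaseException as exc:
            future.set_exception(exc)
        finally:
            link.close()

    threads = [threading.Thread(target=run, args=args)
               for args in zip(links, consumers, futures)]
    for t in threads:
        t.start()
    for item in source:  # a single traversal of the source
        for link in links:
            link.put(item)
    for link in links:
        link.put(_FINISHED)
    for t in threads:
        t.join()
    return tuple(f.result() for f in futures)
```

Like the original, this assumes a consumer does not call next() again after it has seen StopIteration; such a consumer would block on its empty queue.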


Is it possible to set maxtasksperchild for a threadpool?

After encountering some probable memory leaks in a long-running multi-threaded script, I found out about maxtasksperchild, which can be used in a multiprocessing pool like this:
import multiprocessing

with multiprocessing.Pool(processes=32, maxtasksperchild=x) as pool:
    ...  # submit work to the pool here
Is something similar possible for the Threadpool (multiprocessing.pool.ThreadPool)?
As the answer by noxdafox says, the parent class offers no way to do this, but you can use the threading module to control the maximum number of tasks per thread. Since multiprocessing.pool.ThreadPool is built on threads anyway, the threading module is a close substitute:
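import threading

def split_processing(yourlist, num_splits=4):
    '''
    yourlist = list which you want to pass to function for threading.
    num_splits = control total units passed.
    '''
    split_size = len(yourlist) // num_splits
    threads = []
    for i in range(num_splits):
        # determine the slice of the list this thread will handle
        start = i * split_size
        # the last thread also takes the remainder of an uneven split
        end = len(yourlist) if i+1 == num_splits else (i+1) * split_size
        # `function` is your own worker, called as function(yourlist, start, end)
        threads.append(threading.Thread(target=function, args=(yourlist, start, end)))
        threads[-1].start()  # start the thread we just created

    # wait for all threads to finish
    for t in threads:
        t.join()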
Let's say yourlist has 100 items. Then:
if num_splits = 10, there are 10 threads, each with 10 tasks;
if num_splits = 5, there are 5 threads, each with 20 tasks;
if num_splits = 50, there are 50 threads, each with 2 tasks;
and so on.
Looking at the multiprocessing.pool.ThreadPool implementation, it becomes evident that the maxtasksperchild parameter is not propagated to the parent multiprocessing.Pool class. The multiprocessing.pool.ThreadPool implementation has never been completed, hence it lacks a few features (as well as tests and documentation).
The pebble package implements a ThreadPool which supports workers restart after a given amount of tasks have been processed.
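If you would rather not add a dependency, the idea can be approximated with the standard library. The sketch below is hypothetical (RecyclingThreadPool, max_tasks and threads_started are invented names, not part of multiprocessing or pebble): each worker thread takes at most max_tasks items from a shared queue.Queue, then starts a replacement thread and exits, which is the thread analogue of maxtasksperchild. Unlike a process, an exiting thread does not hand memory back to the OS by itself; this only helps if the leaked memory is tied to the thread (for example threading.local data or objects kept alive by the worker's frames), while memory leaked into shared module state stays leaked.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker thread to exit


class RecyclingThreadPool:
    """Hypothetical pool: every worker thread exits after `max_tasks` tasks
    and is replaced by a fresh thread (a thread analogue of maxtasksperchild)."""

    def __init__(self, num_workers, max_tasks):
        self._tasks = queue.Queue()
        self._num_workers = num_workers
        self._max_tasks = max_tasks
        self._lock = threading.Lock()
        self._threads = []  # every thread ever started, in start order
        for _ in range(num_workers):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._worker)
        with self._lock:
            # Start while holding the lock, so close() never sees an
            # unstarted thread in the list.
            t.start()
            self._threads.append(t)

    def _worker(self):
        for _ in range(self._max_tasks):
            item = self._tasks.get()
            if item is _STOP:
                return  # shutting down: no replacement needed
            func, arg, results, index = item
            try:
                results[index] = func(arg)
            except Exception as exc:  # store the error, keep the pool alive
                results[index] = exc
            finally:
                self._tasks.task_done()
        # Task budget used up: start a replacement, then let this thread end.
        self._spawn()

    def map(self, func, values):
        values = list(values)
        results = [None] * len(values)
        for index, value in enumerate(values):
            self._tasks.put((func, value, results, index))
        self._tasks.join()  # returns once every task has called task_done()
        return results

    def close(self):
        for _ in range(self._num_workers):
            self._tasks.put(_STOP)  # one sentinel per live worker
        # A retiring thread registers its replacement before it exits, so by
        # the time join() on it returns, the replacement is already listed.
        i = 0
        while True:
            with self._lock:
                if i >= len(self._threads):
                    break
                t = self._threads[i]
            t.join()
            i += 1

    @property
    def threads_started(self):
        with self._lock:
            return len(self._threads)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Exceptions raised by func are returned in place of the result, so a failing task does not break the chain of replacement threads.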
I wanted a ThreadPool that will run a new task as soon as another task in the pool completes (i.e. maxtasksperchild=1). I decided to write a small "ThreadPool" class that creates a new thread for every task. As soon as a task in the pool completes, another thread is created for the next value in the iterable passed to the map method. The map method blocks until all values in the passed iterable have been processed and their threads have returned.
import threading

class ThreadPool():
    def __init__(self, processes=20):
        self.processes = processes
        self.threads = [Thread() for _ in range(0, processes)]

    def get_dead_threads(self):
        dead = []
        for thread in self.threads:
            if not thread.is_alive():
                dead.append(thread)
        return dead

    def is_thread_running(self):
        return len(self.get_dead_threads()) < self.processes

    def map(self, func, values):
        attempted_count = 0
        values_iter = iter(values)
        # loop until all values have been attempted to be processed and
        # all threads are finished running
        while (attempted_count < len(values) or self.is_thread_running()):
            for thread in self.get_dead_threads():
                try:
                    # run thread with the next value
                    value = next(values_iter)
                    attempted_count += 1
                    thread.run(func, value)
                except StopIteration:
                    break

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        pass

class Thread():
    def __init__(self):
        self.thread = None

    def run(self, target, *args, **kwargs):
        self.thread = threading.Thread(target=target,
                                       args=args,
                                       kwargs=kwargs)
        self.thread.start()

    def is_alive(self):
        if self.thread:
            return self.thread.is_alive()
        return False
You can use it like this:
def run_job(value, mp_queue=None):
    # do something with value
    value += 1

with ThreadPool(processes=2) as pool:
    pool.map(run_job, [1, 2, 3, 4, 5])
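The map loop above re-checks is_alive() in a tight loop with no sleep, so it keeps a CPU core busy while it waits. Here is a sketch of the same one-new-thread-per-task behaviour that blocks instead of spinning, assuming a threading.BoundedSemaphore to cap the number of live threads; map_thread_per_task and max_concurrent are hypothetical names.

```python
import threading


def map_thread_per_task(func, values, max_concurrent=20):
    """Run func(value) for each value in its own brand-new thread, with at
    most `max_concurrent` threads alive at once (like maxtasksperchild=1)."""
    values = list(values)
    results = [None] * len(values)
    slots = threading.BoundedSemaphore(max_concurrent)
    threads = []

    def run(index, value):
        try:
            results[index] = func(value)
        finally:
            slots.release()  # free the slot even if func raises

    for index, value in enumerate(values):
        slots.acquire()  # blocks until fewer than max_concurrent are running
        t = threading.Thread(target=run, args=(index, value))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Results come back in input order because each thread writes into its own slot of the results list.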

Tee'ing and re-joining pipelines

I'm playing with coroutines, and they're often described as useful for pipelines. I found a lecture from Berkeley that was very helpful, but there's one thing I'm having trouble with. In that lecture, there's a diagram where a pipeline forks, then recombines later. If order doesn't matter, recombining is easy: the consumer has one yield, and two producers send() to it. But what if order matters? What if I want strict alternation (get a value from the left fork, get a value from the right fork, lather, rinse, repeat)? Is this possible?
Trivial recombine:
def producer1(ncr):
    while True:
        ...  # sends 0 to ncr

def producer2(ncr):
    while True:
        ...  # sends 1 to ncr

def combine():
    while True:
        ...  # receives each value via yield and prints it

chain = combine()
I get an output of 0 1 0 1 etc, but I'm pretty sure that's a side effect of scheduling. Is there a way to guarantee the ordering, like a yield-from-1,yield-from-2?
To be clear, I know of yield from and __await__, but I haven't understood them yet.
This isn't difficult if you "pull" through your pipeline rather than "push":
def producer1():
    while True:
        yield 0

def producer2():
    while True:
        yield 1

def combine(*producers):
    while True:
        for producer in producers:
            val = next(producer)
            print(val)

combine(producer1(), producer2())
This should reliably produce alternating 0s and 1s.
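For the pull style, the standard library can express strict alternation directly: zip takes one value from each producer in turn, and itertools.chain.from_iterable flattens the resulting tuples. A small sketch (alternate and producer are illustrative names); note that zip stops as soon as any producer is exhausted, so with finite producers the output ends at the shortest one.

```python
from itertools import chain, islice


def producer(value):
    while True:
        yield value


def alternate(*producers):
    # zip pulls exactly one value from each producer, in argument order;
    # chain.from_iterable flattens (0, 1), (0, 1), ... into 0, 1, 0, 1, ...
    return chain.from_iterable(zip(*producers))


print(list(islice(alternate(producer(0), producer(1)), 6)))  # [0, 1, 0, 1, 0, 1]
```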
You can also have the final consumer (the thing that does work with each value, printing in this case) work as a receiver with no reference to the producers, if you really want:
def producer1():
    while True:
        yield 0

def producer2():
    while True:
        yield 1

def combine_to_push(co, *producers):
    while True:
        for producer in producers:
            s = next(producer)
            co.send(s)

def consumer():
    while True:
        val = (yield)
        print(val)

co = consumer()
next(co)  # prime the coroutine so it is paused at its first yield
combine_to_push(co, producer1(), producer2())
I think I figured out how to do it. It works with trivial pipelines, anyhow. Here's my combiner:
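from collections import deque

class combiner():
    def __init__(self,n,ncr):
        self.q = [ deque() for i in range(n)]
        self.n = n
        self.x = 0
        self.ncr = ncr

    def receiver(self,n):
        while True:
            s = (yield)
            self.q[n].append(s)
            self.sender()

    def sender(self):
        while True:
            if self.q[self.x]:
                self.ncr.send(self.q[self.x].popleft())
                self.x = (self.x + 1) % self.n
            else:
                break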
This will round-robin between n streams. Basically, combiner.receiver() is a coroutine that takes data from a stream and puts it into a queue; there's a separate queue per stream. combiner.sender() will flush out as much of the queues as it can, in round-robin order, and then return.
I'm a little worried that calling a function from a generator that then does a send() might be bad, but I could just roll sender into receiver and that issue goes away...

Iterable multiprocessing Queue not exiting

import multiprocessing.queues as queues
import multiprocessing

class I(queues.Queue):
    def __init__(self, maxsize=0):
        super(I, self).__init__(maxsize)
        self.length = 0

    def __iter__(self):
        return self

    def put(self, obj, block=True, timeout=None):
        super(I, self).put(obj,block,timeout)
        self.length += 1

    def get(self, block = True, timeout = None):
        self.length -= 1
        return super(I, self).get(block, timeout)

    def __len__(self):
        return self.length

    def next(self):
        item = self.get()
        if item == 'Done':
            raise StopIteration
        return item


def thisworker(item):
    print 'got this item: %s' % item
    return item

q = I()

q.put(1)
q.put('Done')

the_pool = multiprocessing.Pool(1)

print the_pool.map(thisworker, q)
I'm trying to create an iterable queue to use with multiprocessing pool map.
The idea is that the function thisworker would append some items to the queue until a condition is met and then exit after putting 'Done' in the queue (I haven't done that in this code yet).
But this code never completes; it always hangs.
I haven't been able to find the real cause.
I'd appreciate your help.
PS: I've used self.length because map_async, which map calls under the hood, requires the length of the iterable to compute chunksize, which is then used to hand tasks to the pool.
The problem is that you're treating 'Done' as a special-case item in the Queue, which indicates that the iteration should stop. So, if you iterate over the Queue using a for loop with your example, all that will be returned is 1. However, you're claiming that the length of the Queue is 2. This is screwing up the map code, which is relying on that length to accurately represent the number of items in the iterable in order to know when all the results have returned from the workers:
class MapResult(ApplyResult):

    def __init__(self, cache, chunksize, length, callback):
        ApplyResult.__init__(self, cache, callback)
        # _number_left is used to know when the MapResult is done
        self._number_left = length//chunksize + bool(length % chunksize)
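To make the arithmetic concrete for the question's queue (which reports len(q) == 2 but yields only 1 before 'Done'), here is that expression next to the default chunksize calculation that Pool.map_async uses when no chunksize is passed. default_chunksize and number_left are helper names made up for this illustration; the chunksize logic mirrors CPython's pool.py, so treat it as an approximation if your version differs.

```python
def default_chunksize(length, processes):
    # Mirrors CPython's Pool.map_async when chunksize is None.
    chunksize, extra = divmod(length, processes * 4)
    if extra:
        chunksize += 1
    return chunksize


def number_left(length, chunksize):
    # Same expression as MapResult._number_left in the excerpt above.
    return length // chunksize + bool(length % chunksize)


reported_len = 2      # what I.__len__ claims: 1 and 'Done'
actually_yielded = 1  # iteration stops when 'Done' is read
chunksize = default_chunksize(reported_len, processes=1)
expected = number_left(reported_len, chunksize)
submitted = number_left(actually_yielded, chunksize)
print(chunksize, expected, submitted)  # 1 2 1
```

MapResult waits for 2 chunks, but only 1 is ever submitted, so get() never returns.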
So, you need to make the length actually be accurate. You can do that a few ways, but I would recommend not requiring a sentinel to be loaded into the Queue at all, and use get_nowait instead:
import multiprocessing.queues as queues
import multiprocessing
from Queue import Empty

class I(queues.Queue):
    def __init__(self, maxsize=0):
        super(I, self).__init__(maxsize)
        self.length = 0

    # ... <snip>

    def next(self):
        try:
            item = self.get_nowait()
        except Empty:
            raise StopIteration
        return item


def thisworker(item):
    print 'got this item: %s' % item
    return item

q = I()

q.put(1)

the_pool = multiprocessing.Pool(1)

print the_pool.map(thisworker, q)
Also, note that this approach isn't process safe. The length attribute will only be correct if you only put into the Queue from a single process, and then never put again after sending the Queue to a worker process. It also won't work in Python 3 without adjusting the imports and implementation, because the constructor for multiprocessing.queues.Queue has changed.
Instead of subclassing multiprocessing.queues.Queue, I would recommend using the iter built-in to iterate over the Queue:
q = multiprocessing.Queue()
q.put(1)
q.put(None) # None is our sentinel; you could use 'Done' if you wanted
the_pool.map(thisworker, iter(q.get, None)) # This will call q.get() until None is returned
This will work on all versions of Python, is much less code, and is process-safe.
Based on the requirements you mentioned in the comment to my answer, I think you're better off using imap instead of map, so that you don't need to know the length of the Queue at all. The reality is, you can't accurately determine that, and in fact the length may end up growing as you're iterating. If you use imap exclusively, then doing something similar to your original approach will work fine:
import multiprocessing

class I(object):
    def __init__(self, maxsize=0):
        self.q = multiprocessing.Queue(maxsize)

    def __getattr__(self, attr):
        if hasattr(self.q, attr):
            return getattr(self.q, attr)

    def __iter__(self):
        return self

    def next(self):
        item = self.q.get()
        if item == 'Done':
            raise StopIteration
        return item


def thisworker(item):
    if item == 1:
        q.put(3)
    if item == 2:
        q.put('Done')
    print 'got this item: %s' % item
    return item

q = I()

q.put(1)
q.put(2)
q.put(5)

the_pool = multiprocessing.Pool(2) # 2 workers
print list(the_pool.imap(thisworker, q))
got this item: 1
got this item: 5
got this item: 3
got this item: 2
[1, 2, 5, 3]
I got rid of the code that worried about the length, and used delegation instead of inheritance, for better Python 3.x compatibility.
Note that my original suggestion, to use iter(q.get, <sentinel>), still works here, too, as long as you use imap instead of map.
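A Python 3 sketch of the iter(q.get, sentinel) plus imap combination. It assumes multiprocessing.pool.ThreadPool so that it runs as a plain script without pickling the worker or an if __name__ == '__main__' guard; with a process Pool the same iter(...) expression works, as long as the sentinel comes after the last real item. run and SENTINEL are illustrative names.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool

SENTINEL = None  # assumption: None never appears as a real work item


def thisworker(item):
    return item * 10


def run(items, workers=2):
    q = multiprocessing.Queue()
    for item in items:
        q.put(item)
    q.put(SENTINEL)  # marks the end of the stream
    with ThreadPool(workers) as pool:
        # iter(q.get, SENTINEL) keeps calling q.get() until it returns the
        # sentinel; imap pulls from it lazily, so no __len__ is needed.
        return list(pool.imap(thisworker, iter(q.get, SENTINEL)))


print(run([1, 2, 5]))  # [10, 20, 50]
```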

Should I protect built-in data structures (list, dict) when using multiple threads?

I think I should use a Lock object to protect a custom class when using multiple threads. However, because Python uses the GIL to ensure that only one thread is running at any given time, does that mean there's no need to use a Lock to protect built-in types like list? For example:
import threading

num_list = []

def consumer():
    while True:
        if len(num_list) > 0:
            num = num_list.pop()
            print num

def producer():
    pass  # producer body not shown

consumer_thread = threading.Thread(target = consumer)
producer_thread = threading.Thread(target = producer)
consumer_thread.start()
producer_thread.start()
The GIL protects the interpreter state, not yours. There are some operations that are effectively atomic: they require a single bytecode and thus effectively do not require locking (see "Is Python variable assignment atomic?" for an answer from a very reputable Python contributor).
There isn't really any good documentation on this, though, so I wouldn't rely on it in general unless you plan on disassembling bytecode to test your assumptions. If you plan on modifying state from multiple contexts (or modifying and accessing complex state), then you should plan on using some sort of locking/synchronization mechanism.
If you're interested in approaching this class of problem from a different angle you should look into the Queue module. A common pattern in Python code is to use a synchronized queue to communicate among thread contexts rather than working with shared state.
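A minimal Python 3 sketch of that pattern (producer, consumer, run and SENTINEL are illustrative names): queue.Queue does its own locking, so consumers simply block in get() instead of checking len() and then popping, and one sentinel per consumer tells each of them to stop. The shared results list is still guarded by a lock rather than relying on list.append happening to be atomic.

```python
import queue
import threading

SENTINEL = object()  # tells a consumer that no more items are coming


def producer(q, items, n_consumers):
    for item in items:
        q.put(item)
    for _ in range(n_consumers):
        q.put(SENTINEL)  # one sentinel per consumer


def consumer(q, results, results_lock):
    while True:
        item = q.get()  # blocks until an item is available; no len() check
        if item is SENTINEL:
            return
        with results_lock:  # shared state outside the queue still gets a lock
            results.append(item)


def run(items, n_consumers=2):
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()
    threads = [threading.Thread(target=consumer, args=(q, results, results_lock))
               for _ in range(n_consumers)]
    threads.append(threading.Thread(target=producer, args=(q, items, n_consumers)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```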
@jeremy-brown explains it in words (see his answer), but if you want a counterexample:
The GIL isn't protecting your state. The following example doesn't use locks, and as a result, if the xrange value is high enough, it will fail with IndexError: pop from empty list.
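import threading
import time

con1_list =[]
con2_list =[]
stop = 10000
total = 500000
num_list = []

def consumer(name, doneFlag):
    while True:
        if len(num_list) > 0:
            # another thread may empty the list between the check and the pop
            if name == 'nix':
                con2_list.append(num_list.pop())
                if len(con2_list) == stop:
                    print 'done b'
                    doneFlag.set()
                    return
            else:
                con1_list.append(num_list.pop())
                if len(con1_list) == stop:
                    print 'done a'
                    doneFlag.set()
                    return

def producer():
    for x in xrange(total):
        num_list.append(x)

def test():
    while not (len(con2_list) >=stop and len(con1_list) >=stop):
        time.sleep(1)
    print set(con1_list).intersection( set(con2_list))

done1 = threading.Event()
done2 = threading.Event()
consumer_thread = threading.Thread(target = consumer, args=('nick',done1))
consumer_thread2 = threading.Thread(target = consumer, args=('nix',done2))
producer_thread = threading.Thread(target = producer)
watcher = threading.Thread(target = test)

watcher.start()
producer_thread.start()
consumer_thread.start()
consumer_thread2.start()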
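For contrast, here is a sketch of the smallest fix to the race in that example, assuming Python 3 and a list that is filled before the consumers start (drain_concurrently is a made-up name): hold one threading.Lock across both the emptiness check and the pop, so no other thread can empty the list in between. Each item then ends up in exactly one consumer's list.

```python
import threading


def drain_concurrently(total=100000, n_consumers=2):
    num_list = list(range(total))  # filled up front, for simplicity
    lock = threading.Lock()
    taken = [[] for _ in range(n_consumers)]

    def consumer(mine):
        while True:
            # Check and pop under the same lock: no other thread can empty
            # the list in between, so pop() can no longer raise IndexError.
            with lock:
                if not num_list:
                    return
                item = num_list.pop()
            mine.append(item)  # `mine` is private to this thread

    threads = [threading.Thread(target=consumer, args=(taken[i],))
               for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```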

multiprocessing - pool allocation

I noticed this behavior in Python's pool allocation. Even though I have 20 processes in the pool, when I do a map_async over, say, 8 tasks, instead of running all of them at once, I get only 4 executing. When those 4 finish, it sends two more, and when those two finish it sends one.
When I throw more than 20 at it, it runs all 20 until fewer than 20 are left in the queue, at which point the above behavior repeats.
I assume this is done on purpose, but it looks weird. My goal is to have the requests processed as soon as they come in, and obviously this behavior does not fit.
Using Python 2.6 with billiard for maxtasksperchild support.
Any ideas how I can improve it?
mypool = pool.Pool(processes=settings['num-processes'], initializer=StartChild, maxtasksperchild=10)

while True:
    lines = DbData.GetAll()
    if len(lines) > 0:
        print 'Starting to process: ', len(lines), ' urls'
        Res = mypool.map_async(RunChild, lines)
        Returns = Res.get(None)
        print 'Pool returns: ', idx, Returns
One way I deal with multiprocessing in Python is the following:
I have data on which I want to use a function function().
First, I create a multiprocessing.Process subclass:
import multiprocessing

class ProcessThread(multiprocessing.Process):
    def __init__(self, id_t, inputqueue, idqueue, function, resultqueue):
        multiprocessing.Process.__init__(self)
        self.id_t = id_t
        self.inputqueue = inputqueue
        self.idqueue = idqueue
        self.function = function
        self.resultqueue = resultqueue

    def run(self):
        s = "process number: " + str(self.id_t) + " starting"
        print s
        result = []

        while self.inputqueue.qsize() > 0:
            try:
                # raises if another process took the last item first
                inp = self.inputqueue.get_nowait()
            except Exception:
                continue
            result = self.function(inp)
            while 1:
                try:
                    self.resultqueue.put(result)
                except Exception:
                    pass
                else:
                    break
        # signal that this process is done
        self.idqueue.put(self.id_t)
And the main code:
inputqueue = multiprocessing.Queue()
resultqueue = multiprocessing.Queue()
idqueue = multiprocessing.Queue()

def function(data):
    print data # or what you want

for datum in data:
    inputqueue.put(datum)

for i in xrange(nbprocess):
    ProcessThread(i, inputqueue, idqueue, function, resultqueue).start()
And finally, collect the results:
results = []
while idqueue.qsize() < nbprocess:
    pass  # wait until every process has reported that it is done
while resultqueue.qsize() > 0:
    results.append(resultqueue.get())
This way, you have full control over what each process does.
Using a multiprocessing input queue is efficient only if the computation for each datum is fairly slow (around 1-2 seconds or more), because of the concurrent access of the different processes to the queues (that's why I use exceptions). If your function computes very quickly, consider splitting up your data only once at the beginning and giving each process its chunk of the dataset up front.
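Here is a sketch of the same worker-and-queue design that avoids polling qsize() (which the multiprocessing docs describe as approximate, and which can raise NotImplementedError on macOS) by blocking on get() and using sentinels. It assumes Python 3 and the "fork" start method (Linux/macOS), so it can run without an if __name__ == '__main__' guard; worker, process_all and STOP are illustrative names, and function is a stand-in for the real work.

```python
import multiprocessing

# Assumption: the "fork" start method (Linux/macOS), so this runs without an
# `if __name__ == "__main__":` guard. On Windows, use the default context and
# put the call under such a guard instead.
ctx = multiprocessing.get_context("fork")
STOP = None  # sentinel; assumes None is never a real input or result


def function(datum):
    return datum * 2  # stand-in for the real per-item work


def worker(inputqueue, resultqueue):
    # Block in get() until the sentinel arrives, instead of polling qsize().
    for datum in iter(inputqueue.get, STOP):
        resultqueue.put(function(datum))
    resultqueue.put(STOP)  # tell the parent this worker is finished


def process_all(data, nbprocess=4):
    inputqueue = ctx.Queue()
    resultqueue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(inputqueue, resultqueue))
             for _ in range(nbprocess)]
    for p in procs:
        p.start()
    for datum in data:
        inputqueue.put(datum)
    for _ in procs:
        inputqueue.put(STOP)  # one sentinel per worker
    results, finished = [], 0
    while finished < nbprocess:
        r = resultqueue.get()
        if r is STOP:
            finished += 1
        else:
            results.append(r)
    for p in procs:
        p.join()  # safe now: every result has already been read
    return results
```

Results arrive in completion order, not input order; draining the result queue before join() avoids the documented deadlock where a child cannot exit until its queued data has been consumed.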

