Python Multiprocessing using Process: Consuming Large Memory

I am running multiple processes from a single Python script:
Code snippet:
while 1:
    if sqsObject.msgCount() > 0:
        ReadyMsg = sqsObject.readM2Q()
        if ReadyMsg == 0:
            continue
        fileName = ReadyMsg['fileName']
        dirName = ReadyMsg['dirName']
        uuid = ReadyMsg['uid']
        guid = ReadyMsg['guid']
        callback = ReadyMsg['callbackurl']
        # print ("Trigger Algorithm Process")
        if(countProcess < maxProcess):
            try:
                retValue = Process(target=dosomething, args=(dirName, uuid,guid,callback))
                processArray.append(retValue)
                retValue.start()
                countProcess = countProcess + 1
            except:
                print "Cannot Run Process"
        else:
            for i in range(len(processArray)):
                if (processArray[i].is_alive() == True):
                    continue
                else:
                    try:
                        #print 'Restart Process'
                        processArray[i] = Process(target=dosomething, args=(dirName,uuid,guid,callback))
                        processArray[i].start()
                    except:
                        print "Cannot Run Process"
    else: # No more request to service
        for i in range(len(processArray)):
            if (processArray[i].is_alive() == True):
                processRunning = 1
                break
            else:
                continue
        if processRunning == 0:
            countProcess = 0
        else:
            processRunning = 0
Here I am reading messages from the queue and creating a process to run the algorithm on each message. I put an upper limit of maxProcess on the number of processes, so once maxProcess is reached I want to reuse the processArray slots whose processes are no longer alive, checking with is_alive().
This works fine for a small number of processes; however, for a large number of messages (say 100), memory consumption goes through the roof. I suspect I have a leak from reusing the process slots.
I'm not sure what is wrong in the process.
Thank you in advance for spotting an error or for any wise advice.

Your code is, in a word, weird :-)
It's not an MCVE (Minimal, Complete, and Verifiable Example), so no one else can test it, but just looking at it, you have this (slightly simplified) structure in the inner loop:
if count < limit:
    ... start a new process, and increment count ...
else:
    do things that can potentially start even more processes
    (but never, ever, decrease count)
which seems unwise at best.
There are no invocations of a process instance's join(), anywhere. (We'll get back to the outer loop and its else case in a bit.)
Let's look more closely at the inner loop's else case code:
for i in range(len(processArray)):
    if (processArray[i].is_alive() == True):
Leaving aside the unnecessary == True test (which is a bit of a risk, since the is_alive() method does not specifically promise to return True and False, just something that works boolean-ly), consider this description from the documentation (quoted from the Python 2 docs, but the Python 3 docs say the same, and your print statements imply your code is Python 2 anyway):
is_alive()
Return whether the process is alive.
Roughly, a process object is alive from the moment the start() method returns until the child process terminates.
Since we can't see the code for dosomething, it's hard to say whether these things ever terminate. Probably they do (by exiting), but if they don't, or don't soon enough, we could get problems here, where we just drop the message we pulled off the queue in the outer loop.
If they do terminate, we just drop the process reference from the array, by overwriting it:
processArray[i] = Process(...)
The previous value in processArray[i] is discarded. It's not clear if you may have saved this anywhere else, but if you have not, the Process instance gets discarded, and now it is actually impossible to call its join() method.
Some Python data structures tend to clean themselves up when abandoned (e.g., open streams flush output and close as needed), but the multiprocessing code appears not to auto-join() its children. So this could be the source of the problem, or one of them.
Finally, whenever we do get to the else case in the outer loop, we have the same somewhat odd search for any alive processes, which, incidentally, can be written more clearly as:
if any(p.is_alive() for p in processArray):
as long as we don't care which particular ones are alive and which are not. If none report themselves as alive, we reset the count, but we never do anything with processArray itself, so each processArray[i] still holds its Process instance. (So at least we could call join() on each of these, excluding any lost by overwriting.)
Rather than building your own pool, you are probably better off using multiprocessing.Pool and its apply and apply_async methods, as in miraculixx's answer.
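If you do want to keep managing Process objects by hand rather than switching to a pool, the two changes this answer points at are: join() each finished child before dropping your reference to it, and wait for a free slot instead of starting a process for every dead slot. A minimal sketch of that idea (Python 3); dosomething and run_bounded are hypothetical stand-ins, and the 10 ms polling interval is an arbitrary choice:

```python
import multiprocessing
import time

def dosomething(job):
    # hypothetical stand-in for the real work function
    time.sleep(0.05)

def run_bounded(jobs, max_process):
    """Run each job in its own Process, with at most max_process alive at once."""
    running, peak = [], 0
    for job in jobs:
        # Wait for a free slot instead of spawning a process per dead slot.
        while len(running) >= max_process:
            for p in [p for p in running if not p.is_alive()]:
                p.join()          # reap the finished child before forgetting it
                running.remove(p)
            if len(running) >= max_process:
                time.sleep(0.01)  # all slots busy: poll again shortly
        p = multiprocessing.Process(target=dosomething, args=(job,))
        p.start()
        running.append(p)
        peak = max(peak, len(running))
    for p in running:
        p.join()                  # wait for the stragglers
    return peak

if __name__ == "__main__":
    print("peak concurrency:", run_bounded(range(8), max_process=3))
```

Each message still gets a fresh process here, so a Pool (which reuses its workers) remains the simpler option.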

Not sure what is wrong in the process.
It appears you are creating as many processes as there are messages, even when the maxProcess count is reached.
I am thinking I have leak by reusing the process slots.
There is no need to manage the processes yourself. Just use a process pool:
# before your while loop starts
from multiprocessing import Pool
pool = Pool(processes=max_process)
while 1:
    ...
    # instead of creating a new Process
    res = pool.apply_async(dosomething,
                           args=(dirName,uuid,guid,callback))
# after the while loop has finished
# -- wait to finish
pool.close()
pool.join()
Ways to submit jobs
Note that the Pool class supports several ways to submit jobs:
apply_async - one message at a time
map_async - a chunk of messages at a time
If messages arrive fast enough, it might be better to collect several of them (say 10 or 100 at a time, depending on the actual processing done) and use map_async to submit a "mini-batch" to the target function at a time:
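...
while True:
    messages = []
    # build mini-batch of messages
    while len(messages) < batch_size:
        ... # get message
        messages.append((dirName,uuid,guid,callback))
    # map_async passes each tuple as ONE argument, so dosomething must accept a
    # single tuple here; on Python 3.3+ pool.starmap_async unpacks it instead
    pool.map_async(dosomething, messages)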
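On Python 3.3+, Pool.starmap_async sidesteps the single-tuple-argument issue, since it unpacks each tuple into the function's parameters. A minimal sketch of mini-batch submission; dosomething here is a hypothetical version that just returns a string so the output is checkable, submit_batches is an illustrative helper name, and note that the question's own code is Python 2, where starmap does not exist:

```python
from multiprocessing import Pool

def dosomething(dirName, uuid, guid, callback):
    # hypothetical stand-in for the real work; returns something checkable
    return "%s/%s" % (dirName, uuid)

def submit_batches(messages, batch_size, pool):
    """Split messages into mini-batches and submit each with starmap_async."""
    pending = []
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        # starmap_async unpacks each tuple into dosomething's four parameters
        pending.append(pool.starmap_async(dosomething, batch))
    results = []
    for r in pending:
        results.extend(r.get())  # blocks until that batch is done; order is kept
    return results

if __name__ == "__main__":
    msgs = [("dir%d" % i, "u%d" % i, "g%d" % i, "cb") for i in range(25)]
    with Pool(processes=4) as pool:
        out = submit_batches(msgs, batch_size=10, pool=pool)
    print(len(out), out[:2])
```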
To avoid memory leaks left by dosomething you can ask the Pool to restart a process after it has consumed some number of messages:
max_tasks = 5  # some sensible number
pool = Pool(max_process, maxtasksperchild=max_tasks)
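A small, self-contained sketch of maxtasksperchild in action (assumes Python 3.3+ so Pool can be used as a context manager; dosomething and worker_pids are hypothetical). Each apply_async call is one task, so every worker retires after max_tasks messages and the pool starts a fresh process in its place, which shows up as new worker PIDs:

```python
import os
from multiprocessing import Pool

def dosomething(message):
    # hypothetical work function: just reports which worker process ran it
    return os.getpid()

def worker_pids(n_messages, max_process, max_tasks):
    """Submit one task per message and return the PID that handled each one."""
    with Pool(max_process, maxtasksperchild=max_tasks) as pool:
        # one apply_async call per message, so each message counts as one task
        handles = [pool.apply_async(dosomething, args=(m,)) for m in range(n_messages)]
        return [h.get() for h in handles]

if __name__ == "__main__":
    pids = worker_pids(10, max_process=2, max_tasks=2)
    # workers retire after 2 tasks each, so more than 2 distinct PIDs appear
    print(len(set(pids)), "distinct worker processes")
```

With map or map_async the count is per chunk rather than per item, because the pool may group several items into one task.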
Going distributed
If the memory capacity is still exceeded with this approach, consider a distributed approach, i.e. adding more machines. Using Celery, that would be pretty straightforward, coming from the above:
# tasks.py
@app.task  # app is your Celery application instance
def dosomething(...):
    ... # same code as before

# driver.py
from tasks import dosomething
while True:
    ... # get messages as before
    res = dosomething.apply_async(args=(dirName,uuid,guid,callback))

Related

Python Maximum Thread Count [duplicate]

import threading
threads = []
for n in range(0, 60000):
    t = threading.Thread(target=function,args=(x, n))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
It works well for ranges up to 800 on my laptop, but if I increase the range beyond 800 I get the error "can't create new thread".
How can I control the number of threads that get created, or is there another way to make it work, like a timeout? I tried using threading.BoundedSemaphore, but that doesn't seem to work properly.

The problem is that no major platform (as of mid-2013) will let you create anywhere near this number of threads. There are a wide variety of different limitations you could run into, and without knowing your platform, its configuration, and the exact error you got, it's impossible to know which one you ran into. But here are two examples:
On 32-bit Windows, the default thread stack is 1MB, and all of your thread stacks have to fit into the same 2GB of virtual memory space as everything else in your program, so you will run out long before 60000.
On 64-bit Linux, you will likely exhaust one of your session's soft ulimit values before you get anywhere near running out of page space. (Linux has a variety of different limits beyond the ones required by POSIX.)
So, how can i control number to threads to get created or any other way to make it work like timeout or whatever?
Using as many threads as possible is very unlikely to be what you actually want to do. Running 800 threads on an 8-core machine means that you're spending a whole lot of time context-switching between the threads, and the cache keeps getting flushed before it ever gets primed, and so on.
Most likely, what you really want is one of the following:
- One thread per CPU, serving a pool of 60000 tasks.
  - Maybe processes instead of threads (if the primary work is in Python, or in C code that doesn't explicitly release the GIL).
  - Maybe a fixed number of threads (e.g., a web browser may do, say, 12 concurrent requests at a time, whether you have 1 core or 64).
  - Maybe a pool of, say, 600 batches of 100 tasks apiece, instead of 60000 single tasks.
- 60000 cooperatively-scheduled fibers/greenlets/microthreads all sharing one real thread.
  - Maybe explicit coroutines instead of a scheduler.
  - Or "magic" cooperative greenlets via, e.g., gevent.
  - Maybe one thread per CPU, each running 1/Nth of the fibers.
But it's certainly possible.
Once you've hit whichever limit you're hitting, it's very likely that trying again will fail until a thread has finished its job and been joined, and it's pretty likely that trying again will succeed after that happens. So, given that you're apparently getting an exception, you could handle this the same way as anything else in Python: with a try/except block. For example, something like this:
threads = []
for n in range(0, 60000):
    while True:
        t = threading.Thread(target=function,args=(x, n))
        try:
            t.start()
            threads.append(t)
        except RuntimeError as e:  # "can't start new thread" (thread.error on Python 2)
            if threads:
                threads[0].join()
                del threads[0]
            else:
                raise
        else:
            break
for t in threads:
    t.join()
Of course, this assumes that the first task launched is likely to be one of the first tasks to finish. If that is not true, you'll need some way to explicitly signal doneness (condition, semaphore, queue, etc.), or you'll need to use some lower-level (platform-specific) library that gives you a way to wait on a whole list until at least one thread is finished.
Also, note that on some platforms (e.g., Windows XP), you can get bizarre behavior just getting near the limits.
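The question mentions trying threading.BoundedSemaphore. One way to make it work as a limiter is to acquire a slot in the launching loop, before start(), and release it in a finally block inside the thread, so the loop itself blocks instead of trying to create thread number 801. A minimal sketch; run_limited, the dictionary of counters, and the 1 ms sleep are illustrative assumptions, not anything from the original post:

```python
import threading
import time

def run_limited(n_tasks, limit):
    """Start one thread per task, but never more than `limit` alive at once."""
    slots = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0, "done": 0}

    def task(n):
        try:
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.001)  # stand-in for the real work on n
            with lock:
                state["running"] -= 1
                state["done"] += 1
        finally:
            slots.release()  # hand the slot back even if the work raised

    threads = []
    for n in range(n_tasks):
        slots.acquire()  # blocks while `limit` threads hold a slot
        t = threading.Thread(target=task, args=(n,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return state["peak"], state["done"]
```

For example, run_limited(2000, 12) should report a peak of at most 12 simultaneously running tasks. This still creates one Thread object per task, which is why the pool-based approaches are usually the better fit.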
On top of being a lot better, doing the right thing will probably be a lot simpler as well. For example, here's a process-per-CPU pool:
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as executor:
    fs = [executor.submit(function, x, n) for n in range(60000)]
    concurrent.futures.wait(fs)
… and a fixed-thread-count pool:
with concurrent.futures.ThreadPoolExecutor(12) as executor:
    fs = [executor.submit(function, x, n) for n in range(60000)]
    concurrent.futures.wait(fs)
… and a balancing-CPU-parallelism-with-numpy-vectorization batching pool:
import os
import numpy as np

with concurrent.futures.ThreadPoolExecutor() as executor:
    batchsize = 60000 // os.cpu_count()
    # np.vector_function stands in for whatever vectorized function you use
    fs = [executor.submit(np.vector_function, x,
                          np.arange(n, min(n+batchsize, 60000)))
          for n in range(0, 60000, batchsize)]
    concurrent.futures.wait(fs)
In the examples above, I used a list comprehension to submit all of the jobs and gather their futures, because we're not doing anything else inside the loop. But from your comments, it sounds like you do have other stuff you want to do inside the loop. So, let's convert it back into an explicit for statement:
with concurrent.futures.ProcessPoolExecutor() as executor:
    fs = []
    for n in range(60000):
        fs.append(executor.submit(function, x, n))
    concurrent.futures.wait(fs)
And now, whatever you want to add inside that loop, you can.
However, I don't think you actually want to add anything inside that loop. The loop just submits all the jobs as fast as possible; it's the wait function that sits around waiting for them all to finish, and it's probably there that you want to exit early.
To do this, you can use wait with the FIRST_COMPLETED flag, but it's much simpler to use as_completed.
Also, I'm assuming error is some kind of value that gets set by the tasks. In that case, you will need to put a Lock around it, as with any other mutable value shared between threads. (This is one place where there's more than a one-line difference between a ProcessPoolExecutor and a ThreadPoolExecutor: with processes, a module-level list isn't shared at all, since each worker gets its own copy, so you would need a shared structure such as a multiprocessing.Manager().list(), guarded by a multiprocessing lock instead of threading.Lock.)
So:
import concurrent.futures
import threading

error_lock = threading.Lock()
error = []

def function(x, n):
    # blah blah
    try:
        ...  # blah blah
    except Exception as e:
        with error_lock:
            error.append(e)
    # blah blah

# threads share the module-level list and lock; worker processes would not
with concurrent.futures.ThreadPoolExecutor() as executor:
    fs = [executor.submit(function, x, n) for n in range(60000)]
    for f in concurrent.futures.as_completed(fs):
        do_something_with(f.result())
        with error_lock:
            if len(error) > 1: exit()
However, you might want to consider a different design. In general, if you can avoid sharing between threads, your life gets a lot easier. And futures are designed to make that easy, by letting you return a value or raise an exception, just like a regular function call. That f.result() will give you the returned value or raise the raised exception. So, you can rewrite that code as:
def function(x, n):
    # blah blah
    # don't bother to catch exceptions here, let them propagate out
    ...

with concurrent.futures.ProcessPoolExecutor() as executor:
    fs = [executor.submit(function, x, n) for n in range(60000)]
    error = []
    for f in concurrent.futures.as_completed(fs):
        try:
            result = f.result()
        except Exception as e:
            error.append(e)
            if len(error) > 1: exit()
        else:
            do_something_with(result)
Notice how similar this looks to the ThreadPoolExecutor Example in the docs. This simple pattern is enough to handle almost anything without locks, as long as the tasks don't need to interact with each other.
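One thing to watch with exiting early from inside the as_completed loop: leaving the with-block calls shutdown(wait=True), which still waits for every task that is queued but not yet started. A hedged sketch that cancels those futures before breaking out. It uses a ThreadPoolExecutor and a hypothetical function that fails on every 7th input so the behavior is easy to check; run_until_errors and max_errors are illustrative names:

```python
import concurrent.futures

def function(x, n):
    # hypothetical task: fails on every 7th input so errors are predictable
    if n % 7 == 0:
        raise ValueError("bad input %d" % n)
    return x * n

def run_until_errors(x, count, max_errors=2):
    results, errors = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        fs = [executor.submit(function, x, n) for n in range(count)]
        for f in concurrent.futures.as_completed(fs):
            try:
                results.append(f.result())
            except Exception as e:
                errors.append(e)
                if len(errors) >= max_errors:
                    # Leaving the with-block waits for every task still queued,
                    # so cancel whatever hasn't started yet before breaking out.
                    for g in fs:
                        g.cancel()
                    break
    return results, errors
```

cancel() returns False for futures that are already running or done, so only queued work is dropped. On Python 3.9+, executor.shutdown(cancel_futures=True) does the same job.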

Can a python Multiprocessing queue be passed to the child process?

I have a big dataset in a data acquisition system I wrote in python that takes infinitely long to pass over a queue from the child process to the parent. I want to save the data acquired at the end of the acquisition and tried this using the queue function in Multiprocessing. Instead of doing it this way I would prefer it if I could instead pass a message over the queue from the parent to the child to save my data before I kill the child process. Is this possible? An example of what I thought it might look like is:
def acquireData(self, var1, queue):
    import h5py
    # Put my acquisition code here
    queue.get()
    if queue == True:
        f = h5py.File("FileName","w")
        f.create_dataset('Data',data=data)
        f.close()

if __name__ == '__main__':
    from multiprocessing import Process, Queue
    queue = Queue()
    inter_thread = Process(target=acquireData, args=(var1,queue))
    queue.put(False)
    inter_thread.start()
    while True:
        if not args.automate:
            # Let c++ threads run for given amount of time
            # Wait for stop from OP GUI
        else:
            queue.put(True)
            break
    print("Acquisition finished, cleaning up...")
    sleep(2)
    inter_thread.terminate()
Is this allowed? If this type of interfacing between processes is allowed then do I have the right notation? For some reference I have on the order of 9e7 data points in the array I'm trying to save and I have 7 arrays which is simply not being passed to my parent process in a timely manner by putting these arrays into the queue. Thank you.

First, yes, passing a queue to a child is not only legal, but the main use case for queues. See the first example in the docs, which does exactly that.
However, you've got some problems with your code:
queue.get()
if queue == True:
First, your queue is never going to be the boolean value True, it's going to be a Queue. You almost never want to check if x == True: in Python; you want to check if x:. For example, if [1, 2]: will pass, while if [1, 2] == True: will not.
Second, your queue isn't even the thing you want to check in the first place. It isn't truthy or falsey (or it isn't relevant whether it is); it's the value the main process put on the queue and you pulled off that's either truthy or falsey. Which you discarded as soon as you retrieved it.
So, do this:
flag = queue.get()
if flag:
Or, more simply:
if queue.get():
I'm not sure whether this is exactly what you want or not. That queue.get() will block forever until the main process puts something there. Is that what you wanted? If so, great; you're done with this part of your code. If not, you need to think about what you wanted instead.
As designed, the parent will always wait 2 seconds, even if the child finished long before that. A better solution is to join the child with a timeout of 2 seconds, and then terminate it only if the join timed out (that is, if the child is still alive).
Plus, are you sure the termination behavior you've designed is what you want? You're doing a "soft kill request" with the queue, then waiting 2 seconds, then doing a "medium-hard kill request" with terminate, and never doing a "hard kill" with kill. That could be a perfectly reasonable design—but if it's not your design, you've implemented the wrong thing.
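A minimal sketch of the join-with-timeout idea, combined with a child that keeps "acquiring" until it reads a truthy flag from the queue instead of blocking on get() forever. acquire_data and stop_child are hypothetical names, and the sleeps stand in for the real acquisition and HDF5 writing:

```python
import multiprocessing
import queue
import time

def acquire_data(stop_queue, save_seconds):
    # hypothetical child: "acquires" in small steps, polling for a stop flag
    samples = 0
    while True:
        try:
            if stop_queue.get(timeout=0.01):  # a truthy flag means: save and exit
                break
        except queue.Empty:
            samples += 1  # stand-in for one acquisition step
    time.sleep(save_seconds)  # stand-in for writing the HDF5 file

def stop_child(proc, stop_queue, timeout=2.0):
    """Ask the child to save and exit; terminate it only if it overruns."""
    stop_queue.put(True)  # soft stop request, as in the question
    proc.join(timeout)    # returns as soon as the child exits, or after timeout
    if proc.is_alive():   # still running after the grace period: escalate
        proc.terminate()
        proc.join()
    return proc.exitcode

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=acquire_data, args=(q, 0.1))
    p.start()
    time.sleep(0.5)  # let it "acquire" for a while
    print("exit code:", stop_child(p, q))  # 0: the child finished within the timeout
```

A child that had to be terminated reports a non-zero (on POSIX, negative) exit code, which the parent can use to tell whether the save actually completed.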

Multiprocessing with shared queue and end criteria

I've got this original function that I want to switch to multiprocess:
def optimal(t0, tf, frequences, delay, ratio = 0):
    First = True # First
    for s in delay:
        delay = 0 # delay between signals,
        timelines = list()
        for i in range(len(frequences)):
            timelines.append(time_builder(frequences[i], t0+delay, tf))
            delay += s
        trio_overlap = trio_combination(timelines, ratio)
        valid = True
        for items in trio_overlap.values():
            if len(list(set(items))) == len(items):
                continue
            else:
                valid = False
        if not valid:
            continue
        overlap = duo_combination(timelines)
        optimal = ...  # depending on conditions
    return optimal
If valid = True after the test, it will compute an optimization parameter called optim_param and try to minimize it. If it gets under a certain threshold, optim_param < 0.3, I break out of the loop and take this value as my answer.
My problem is that as I develop my model, the complexity is starting to rise, and single thread computation takes too long. I would like to process the computation in parallel. Since each process will have to compare the result obtained with an s value to the current optimal, I tried to implement a Queue.
It's my first time doing multiprocessing, and even if I think I'm on the right track, I kinda feel like my code is messy and incomplete. Could I get some help?
Thanks :D

Instead of manually creating a process for each case, consider using Pool.imap_unordered. The trick is how to cleanly shut down when a passable result is obtained: you can implement this by passing a generator that exits early in case a flag is set that it checks every cycle. The main program reads from the iterator, maintains the best result seen, and sets the flag when it is good enough. The final trick is to slow down the (internal) thread reading from the generator to prevent a large backlog of scheduled tasks that must be waited on (or, uncleanly, killed) after the good result is obtained. Given the number of processes in the pool, that pacing can be achieved with a semaphore.
Here's an example (with trivial analysis) to demonstrate:
import multiprocessing,threading,os

def interrupted(data,sem,interrupt):
    for x in data:
        yield x
        sem.acquire()
        if interrupt: break

def analyze(x): return x**2

if __name__ == '__main__':  # needed where workers are spawned (Windows, macOS)
    np=os.cpu_count()
    pool=multiprocessing.Pool(np)
    sem=threading.Semaphore(np-1)
    token=[] # mutable
    vals=pool.imap_unordered(analyze,interrupted(range(-10,10),sem,token))
    pool.close() # optional: to let processes exit faster
    best=None
    for res in vals:
        if best is None or res<best:
            best=res
            if best<5: token.append(None) # make it truthy
        sem.release()
    pool.join()
    print(best)
There are of course other ways to share the semaphore and interrupt flag with the generator; this way uses an ugly data type but has the virtue of using no global variables (or even closures).

Output Queue of a Python multiprocessing is providing more results than expected

From the following code, I would expect the resulting list to have the same length as the range of items fed to the processes:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() is True:
            break #this is supposed to end the process.
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() is True:
            break
        else:
            results_bank.append(output_q.get())
    print len(results_bank) # length of this list should be equal to static_input, which is the range used to populate the input queue. In other words, this tells whether all the items placed for processing were actually processed.
    results_bank.sort()
    print results_bank
Does anyone have an idea how to make this code run properly?

This code will never stop:
Each worker gets an item from the queue as long as it is not empty:
picked = working_queue.get()
and puts a new one for each that it got:
working_queue.put(picked+1)
As a result, the queue will never be empty except when the timing between the processes happens to be such that the queue is empty at the moment one of them calls empty(). Because the queue length is initially 100 and you have as many processes as cpu_count(), I would be surprised if this ever stops on any realistic system.
Well, executing the code with a slight modification proves me wrong: it does stop at some point, which actually surprises me. When executing the code with one process, there seems to be a bug, because after some time the process freezes but does not return. With multiple processes the result varies.
Adding a short sleep period in the loop iteration makes the code behave as I expected and explained above. There seems to be some timing issue between Queue.put, Queue.get and Queue.empty, although they are supposed to be thread-safe. Removing the empty test also gives the expected result (without ever getting stuck at an empty queue).
Found the reason for the varying behaviour. The objects put on the queue are not flushed immediately. Therefore empty might return False although there are items in the queue waiting to be flushed.
From the documentation:
Note: When an object is put on a queue, the object is pickled and a
background thread later flushes the pickled data to an underlying
pipe. This has some consequences which are a little surprising, but
should not cause any practical difficulties – if they really bother
you then you can instead use a queue created with a manager.
After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising Queue.Empty.
If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other.
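A hedged sketch of one way to restructure the worker so it doesn't depend on empty(): one sentinel per worker, the odd-to-even step done inside the worker (an assumption that re-queuing picked+1 isn't essential to the real task), and the output queue drained before join(). run_workers is a hypothetical name, and the code is Python 3, unlike the question's Python 2:

```python
import multiprocessing as mp

def worker(working_queue, output_queue):
    # Stop on a sentinel (None) instead of polling empty(), which can report
    # a misleading state while the feeder thread is still flushing items.
    for picked in iter(working_queue.get, None):
        # do the odd -> even step here instead of re-queuing picked + 1
        output_queue.put(picked if picked % 2 == 0 else picked + 1)

def run_workers(items, n_workers):
    items = list(items)
    working_q, output_q = mp.Queue(), mp.Queue()
    for item in items:
        working_q.put(item)
    for _ in range(n_workers):
        working_q.put(None)  # one sentinel per worker, after all real items
    procs = [mp.Process(target=worker, args=(working_q, output_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    # Collect exactly one result per input *before* joining: a child that has
    # put data on a queue waits, before exiting, for that data to be flushed
    # to the underlying pipe, which can block until the parent reads it.
    results = [output_q.get() for _ in items]
    for p in procs:
        p.join()
    return sorted(results)

if __name__ == "__main__":
    results_bank = run_workers(range(100), mp.cpu_count())
    print(len(results_bank))  # one result per input item
```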

Dumping a multiprocessing.Queue into a list

I wish to dump a multiprocessing.Queue into a list. For that task I've written the following function:
import Queue

def dump_queue(queue):
    """
    Empties all pending items in a queue and returns them in a list.
    """
    result = []
    # START DEBUG CODE
    initial_size = queue.qsize()
    print("Queue has %s items initially." % initial_size)
    # END DEBUG CODE
    while True:
        try:
            thing = queue.get(block=False)
            result.append(thing)
        except Queue.Empty:
            # START DEBUG CODE
            current_size = queue.qsize()
            total_size = current_size + len(result)
            print("Dumping complete:")
            if current_size == initial_size:
                print("No items were added to the queue.")
            else:
                print("%s items were added to the queue." % \
                      (total_size - initial_size))
            print("Extracted %s items from the queue, queue has %s items \
left" % (len(result), current_size))
            # END DEBUG CODE
            return result
But for some reason it doesn't work.
Observe the following shell session:
>>> import multiprocessing
>>> q = multiprocessing.Queue()
>>> for i in range(100):
... q.put([range(200) for j in range(100)])
...
>>> q.qsize()
100
>>> l=dump_queue(q)
Queue has 100 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 1 items from the queue, queue has 99 items left
>>> l=dump_queue(q)
Queue has 99 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 3 items from the queue, queue has 96 items left
>>> l=dump_queue(q)
Queue has 96 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 1 items from the queue, queue has 95 items left
>>>
What's happening here? Why aren't all the items being dumped?

Try this:
import Queue
import time

def dump_queue(queue):
    """
    Empties all pending items in a queue and returns them in a list.
    """
    result = []
    for i in iter(queue.get, 'STOP'):
        result.append(i)
    time.sleep(.1)
    return result

import multiprocessing
q = multiprocessing.Queue()
for i in range(100):
    q.put([range(200) for j in range(100)])
q.put('STOP')
l=dump_queue(q)
print len(l)
Multiprocessing queues have an internal buffer which has a feeder thread which pulls work off a buffer and flushes it to the pipe. If not all of the objects have been flushed, I could see a case where Empty is raised prematurely. Using a sentinel to indicate the end of the queue is safe (and reliable). Also, using the iter(get, sentinel) idiom is just better than relying on Empty.
I don't like that it could raise Empty due to flushing timing. (I added the time.sleep(.1) to allow a context switch to the feeder thread; you may not need it, and it works without it. It's a habit of mine, to release the GIL.)

# in theory:
def dump_queue(q):
    q.put(None)
    return list(iter(q.get, None))

# in practice this might be more resilient:
def dump_queue(q):
    q.put(None)
    return list(iter(lambda : q.get(timeout=0.00001), None))

# but neither case handles all the ways things can break
# for that you need 'managers' and 'futures' ... see Commentary
I prefer None for sentinels, but I would tend to agree with jnoller that mp.Queue could use a safe and simple sentinel. His comments on the risk of getting Empty raised early are also valid; see below.
Commentary:
This is old and Python has changed, but this question does come up as a hit if you're having issues with lists <-> queues in multiprocessing Python. So, let's look a little deeper:
First off, this is not a bug, it's a feature: https://bugs.python.org/issue20147. To save you some time from reading that discussion and more details in the documentation, here are some highlights (kind of philosophical but I think it might help some who are starting with MP/MT in Python):
MP Queues are structures that can be communicated with from different threads, from different processes on the same system, and in fact (via managers) from different networked computers.
In general with parallel/distributed systems, strict synchronization is expensive, so every time you use part of the API for any MP/MT datastructures, you need to look at the documentation to see what it promises to do, or not. Hint: if a function doesn't include the word "lock" or "semaphore" or "barrier" etc, then it will be some mixture of "asynchronous" and "best effort" (approximate), or what you might call "flaky."
Specific to this situation: Python is an interpreted language with a famous single interpreter thread and its famous "Global Interpreter Lock" (GIL). If your entire program is single-process and single-threaded, then everything is hunky dory. If not (and with MP it's egregiously not), you need to give the interpreter some breathing room. time.sleep() is your friend. In this case, timeouts.
In your solution you are only using flaky functions - get() and qsize(). And the code is in fact worse than you might think - dial up the size of the queue and the size of the objects and you're likely to break things:
Now, you can work with flaky routines, but you need to give them room to maneuver. In your example you're just hammering that queue. All you need to do is change the line thing = queue.get(block=False) to instead be thing = queue.get(block=True,timeout=0.00001) and you should be fine.
The time 0.00001 is chosen carefully (10^-5), it's about the smallest that you can safely make it (this is where art meets science).
Some comments on why you need the timeout: this relates to the internals of how MP queues work. When you 'put' something into an MP queue, it's not actually put into the queue; it's queued up to eventually be there. That's why qsize() happens to give you a correct result: that part of the code knows there's a pile of things "in" the queue. You just need to realize that an object "in" the queue is not the same thing as "I can now read it." Think of MP queues as sending a letter with USPS or FedEx: you might have a receipt and a tracking number showing that "it's in the mail," but the recipient can't open it yet. Now, to be even more specific, in your case you get '0' items accessible right away. That's because the single interpreter thread you're running hasn't had any chance to process stuff that's "queued up", so your first loop just queues up a bunch of stuff for the queue, but you're immediately forcing your single thread to try to do a get() before it's even had a chance to line up a single object for you.
One might argue that these timeouts slow the code down. Not really: MP queues are heavyweight constructs, and you should only be using them to pass pretty heavyweight "things" around, either big chunks of data or at least complex computation. What adding 10^-5 seconds actually does is give the interpreter a chance to do thread scheduling, at which point it will see your backed-up put() operations.
Caveat
The above is not completely correct, and this is (arguably) an issue with the design of the get() function. The semantics of setting timeout to non-zero is that the get() function will not block for longer than that before returning Empty. But it might not actually be Empty (yet). So if you know your queue has a bunch of stuff to get, then the second solution above works better, or even with a longer timeout. Personally I think they should have kept the timeout=0 behavior, but had some actual built-in tolerance of 1e-5, because a lot of people will get confused about what can happen around gets and puts to MP constructs.
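When you do know how many items to expect, one way to follow that advice is to block on each get() with a generous timeout and treat queue.Empty as "nothing more arrived", rather than calling get(block=False) in a tight loop. A minimal single-process sketch; drain and the 1-second default timeout are illustrative choices, and qsize() is avoided because it raises NotImplementedError on macOS:

```python
import multiprocessing
import queue  # multiprocessing.Queue.get raises queue.Empty on timeout

def drain(q, expected, timeout=1.0):
    """Collect up to `expected` items, blocking up to `timeout` seconds for each."""
    items = []
    for _ in range(expected):
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            break  # nothing arrived within the timeout; stop early
    return items

if __name__ == "__main__":
    q = multiprocessing.Queue()
    for i in range(100):
        q.put([list(range(200)) for j in range(100)])
    print(len(drain(q, 100)))
```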
In your example code, you're not actually spinning up parallel processes. If we were to do that, then you'd start getting some random results - sometimes only some of the queue objects will be removed, sometimes it will hang, sometimes it will crash, sometimes more than one thing will happen. In the below example, one process crashes and the other hangs:
The underlying problem is that when you insert the sentinel, you need to know that the queue is finished. That should be done as part of the logic around the queue: if, for example, you have a classical master-worker design, then the master would need to push a sentinel (end) when the last task has been added. Otherwise you end up with race conditions.
The "correct" (resilient) approach is to involve managers and futures:
import multiprocessing
import concurrent.futures

def fill_queue(q):
    for i in range(5000):
        q.put([range(200) for j in range(100)])

def dump_queue(q):
    q.put(None)
    return list(iter(q.get, None))

if __name__ == '__main__':  # required where workers are spawned (Windows, macOS)
    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        with concurrent.futures.ProcessPoolExecutor() as executor:
            executor.submit(fill_queue, q)  # add stuff
            executor.submit(fill_queue, q)  # add more stuff
            executor.submit(fill_queue, q)  # ... and more
        # 'step out' of the executor
        l = dump_queue(q)
    # 'step out' of the manager
    print(f"Saw {len(l)} items")
Let the manager handle your MP constructs (queues, dictionaries, etc), and within that let the futures handle your processes (and within that, if you want, let another future handle threads). This assures that things are cleaned up as you 'unravel' the work.
