Python - how exactly does Queue work?

Regarding example from documentation:
https://docs.python.org/2/library/queue.html#Queue.Queue.get
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()
How would the worker actually know that all the work is done, the queue is empty, and we can exit? I don't understand it...

Your worker hangs in a while True: loop, which means that the function (and its thread) will never return on its own.
The "magic" lies in the code you didn't show:
t = Thread(target=worker)
t.daemon = True
t.start()
The daemon flag controls whether a thread can keep the program alive:
The entire Python program exits when no alive non-daemon threads are left.
Which means that the program will exit once the main thread finishes. The worker thread technically still lives at that point, but it is destroyed when the main thread ends (because there are no non-daemon threads left).
The main thread exit condition is
q.join()
The documentation for join() shows when it stops blocking execution:
[...] When the count of unfinished tasks drops to zero, join() unblocks.
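Put together, a minimal runnable sketch of the whole pattern might look like this (do_work is stubbed out as a print; the other names follow the documentation's example):
import queue
import threading

q = queue.Queue()

def do_work(item):
    print('processing', item)  # stand-in for real processing

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()  # one task_done() per get(), so q.join() can unblock

t = threading.Thread(target=worker)
t.daemon = True  # the worker alone will not keep the program alive
t.start()

for i in range(10):
    q.put(i)

q.join()  # unblocks once every item has been marked done
# The main thread ends here, and the daemon worker dies with the program.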

I'll keep it simple. A Queue is basically a collection of items, like a list, except that it doesn't allow random access to its elements: you insert and remove items in a fixed order. The default type of queue is FIFO (first in, first out). As you can tell from the name, it works like any normal queue you see at a supermarket (or anywhere else): the first person to enter the line is the first to leave it.
There are three types of queue:
FIFO
LIFO
PRIORITY
FIFO, as I said, follows the first-in-first-out rule:
import queue  # import the library

q = queue.Queue()  # create a FIFO queue object
for i in range(5):
    q.put(i)  # add elements to our queue
while not q.empty():
    print(q.get())  # remove an item and print it: 0, 1, 2, 3, 4
LIFO works on the principle of last in first out:
import queue  # import the library

q = queue.LifoQueue()  # create a LIFO queue object
for i in range(5):
    q.put(i)  # add elements to our queue
while not q.empty():
    print(q.get())  # remove an item and print it: 4, 3, 2, 1, 0
A PRIORITY queue gives out data in ascending order; that is, the smallest item will exit the queue first.
import queue  # import the library

q = queue.PriorityQueue()  # create a priority queue object
q.put(3)
q.put(7)
q.put(2)
q.put(7)
q.put(1)
while not q.empty():
    print(q.get())  # remove an item and print it: 1, 2, 3, 7, 7
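Since items come out in sorted order, a PriorityQueue is often fed (priority, data) tuples, which Python compares element-wise; a small sketch:
import queue

q = queue.PriorityQueue()
q.put((2, 'write report'))
q.put((1, 'fix bug'))
q.put((3, 'tidy desk'))
while not q.empty():
    priority, task = q.get()
    print(priority, task)  # prints 1 fix bug, then 2 write report, then 3 tidy desk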
To answer your last question: as you can see in the examples, you can use q.empty() to check whether your queue is empty or not.
If you have any further doubt feel free to ask.

Related

Python, ThreadPoolExecutor, pool execution doesn't terminate

I have got some simple code modelling a more complicated problem I am trying to solve. It has three functions: a worker, a task submitter (which looks for new tasks and puts them on the queue as it finds them), and a function that creates a pool and adds new tasks to it. But the run doesn't finish after the queue gets empty and all the tasks in the list turn finished. I'm apparently too dumb to see why on earth the while loop with the condition doesn't terminate... I have tried coding the thing in different ways; nothing works.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading

def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)

def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')

def execution():
    executor = Tpe(max_workers=4)
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = [executor.submit(worker, q.get())]
    execution_finished = False
    while not execution_finished:  # all([task.done() for task in tasks]):
        if not all([task.done() for task in tasks]):
            print(' still in progress .....................')
            tasks.append(executor.submit(worker, q.get()))
        else:
            print(' all done!')
            executor.shutdown()
            execution_finished = True

execution()
It doesn't terminate because you are trying to remove an item from an empty queue. The problem is here:
while not execution_finished:
    if not all([task.done() for task in tasks]):
        print(' still in progress .....................')
        tasks.append(executor.submit(worker, q.get()))
The last line here submits a new work item to the executor. Suppose that happens to be the last item in the queue. At that moment, the executor is not finished and will not be finished for a few seconds. Your main thread goes back to the while not execution_finished line, and the if statement evaluates true because some of the tasks are still running. So you try to submit one more item but you can't, because the queue is now empty. The call to q.get blocks the main loop until the queue contains an item, which never happens. The other threads finish but the program doesn't exit because the main thread is blocked.
Perhaps you should check for an empty queue, but I'm not sure that's the right idea because I probably don't understand your requirements. In any case, that's why your script doesn't exit.
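One common fix is a sentinel: the submitter puts a special marker on the queue when it will produce nothing more, and the main loop stops submitting work once it sees the marker. A minimal sketch of that idea reshaping the code above (sleeps shortened, and the exact shutdown policy is an assumption about your requirements):
from concurrent.futures import ThreadPoolExecutor as Tpe
import queue
import random
import time

SENTINEL = None  # hypothetical marker meaning "no more tasks will arrive"

def worker(t):
    time.sleep(t / 10)  # stand-in for real work (shortened for the sketch)

def task_submit(q):
    for _ in range(7):
        q.put_nowait(random.randint(10, 20))
        time.sleep(0.5)
    q.put_nowait(SENTINEL)  # tell the consumer we are done producing

def execution():
    executor = Tpe(max_workers=4)
    q = queue.Queue(maxsize=100)
    executor.submit(task_submit, q)
    while True:
        item = q.get()    # blocks until the submitter produces something
        if item is SENTINEL:
            break         # producer is finished; stop submitting workers
        executor.submit(worker, item)
    executor.shutdown()   # waits for every submitted worker to finish

execution()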

In Python multiprocessing, when a child process writes data to a Queue and no one reads it, the child process does not exit. Why?

I have some Python code where the main process creates a child process and there is a shared queue between the two. The child process writes some data to this shared queue, and the main process join()s on the child process.
If the data in the queue is not removed with get(), the child process does not terminate and the main process blocks at join(). Why is that?
Following is the code that I used:
from multiprocessing import Process, Queue
from time import sleep

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])
    print(q.qsize())
    #q.get()
    print(q.qsize())

q = Queue()
print(q.qsize())
p = Process(target=f, args=(q,))
p.start()
sleep(1)
#print(q.get())
print('bef join')
p.join()
print('aft join')
At present the q.get() is commented out, and so the output is:
0
1
1
bef join
and then the code is blocked.
But if I uncomment one of the q.get() invocations, the code runs to completion with the following output:
0
1
0
bef join
aft join
Well, if you look at the multiprocessing Queue documentation, it explicitly warns that a process that has put items on a queue will wait before terminating until all the buffered items have been fed by the "feeder" thread to the underlying pipe. So it is logical that join() blocks your program if you don't empty the Queue first.
It sounds like you need to learn about the philosophy of multiprocessing: you have several tasks that don't depend on each other to run, and your program is currently too slow for you, so you run them as separate processes.
But don't forget there will (trust me) come a time when you need to wait until some parallel computations are all done, because you need all of their results for your next task. And that's where, in your case, join() comes in. You are basically saying: I was doing things asynchronously, but now my next task needs to be synced with the items I computed before, so let's wait here until they are all ready.
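For the code above, that means draining the queue before joining. A minimal sketch of the working order (with the __main__ guard added so it also runs on spawn-based platforms):
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # drain first, so the child's feeder thread can flush
    p.join()        # now the child can terminate and join() returns
    print('aft join')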

Worker stopping at first item reading from queue.Queue

I am trying to understand how Queue works
from queue import Queue
from threading import Thread

q = Queue()
urls = ['http://www.linkedin.com', 'http://www.amazon.com', 'http://www.facebook.com', 'http://www.uber.com']

def worker():
    item = q.get()
    print(item)
    q.task_done()

for i in range(1):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for url in urls:
    q.put(url)

q.join()
I was expecting it to print out all of the URLs, but only the first one is printed.
I thought that the worker would get the first item, print it out, then go back to grab the next item. In this case I'm just creating one thread, but I can add more threads once I understand what is going on.
Why is it only printing the first URL?
Your worker only runs its code once: it grabs one item from the queue, prints it, then exits. To grab everything, you'll need a loop.
Since you've started this thread as a daemon, it's easy to just loop forever. You're essentially spinning off a thread that says "grab something out of the queue if there's something there; if not, wait till there is. Print that thing, then repeat until the program exits."
def worker():
    while True:
        item = q.get()
        print(item)
        q.task_done()
What a queue is usually used for is either a simple FIFO container (for which you could arguably use collections.deque instead) or a means of coordinating a whole group of workers doing distributed work. Imagine you have a group of 4:
NUM_WORKERS = 4

for _ in range(NUM_WORKERS):
    t = Thread(daemon=True, target=worker)
    t.start()
and wanted to handle a whole bunch of items
for i in range(1, 1000001):  # 1..1000000
    q.put(i)
Now the work will be distributed among all four workers, without any worker grabbing the same item as another. This serves to coordinate your concurrency.
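Combined with q.join() so the main thread knows when every item has been processed, the whole pattern might look like this (printing as a stand-in for real work):
import queue
import threading

NUM_WORKERS = 4
q = queue.Queue()

def worker():
    while True:
        item = q.get()
        print(item)    # stand-in for real work
        q.task_done()  # one task_done() per get(), so q.join() can finish

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for i in range(1, 101):
    q.put(i)

q.join()  # returns once every item has been fetched and marked done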

When starting multiple threads reading from a Queue, which one gets the first item?

The Python documentation gives an example of how to wait for enqueued tasks to be completed, but I am not sure how the order of retrieval is determined. Here is the code:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
As I interpret it, this code starts however many threads the range specifies, then puts however many items the source yields onto the queue.
So if you start 20 threads and put 30 items on the queue, you will have 20 worker threads all calling
while True:
    item = q.get()
    do_work(item)
So when the first item is put on the queue, which of the 20 threads actually gets it?
Generally speaking, there isn't going to be a guaranteed order, only guaranteed mutual exclusion. Assuming you are using something like queue.Queue (Python 3), it uses synchronization primitives to ensure only one thread can get() an item at a time. But the order in which the threads get their chance will be affected by the vagaries of the OS scheduler - load, priorities, etc.
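You can see this for yourself with a small experiment: record which thread handled each item, and the assignment will typically vary from run to run (the worker names here are made up for the demo):
import queue
import threading

q = queue.Queue()
results = []  # list.append is atomic in CPython, so this is safe here

def worker():
    while True:
        item = q.get()
        results.append((threading.current_thread().name, item))
        q.task_done()

for i in range(4):
    threading.Thread(target=worker, daemon=True, name=f'worker-{i}').start()

for item in range(10):
    q.put(item)

q.join()
for name, item in results:
    print(name, 'got', item)  # which worker got which item is not deterministic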

Python: Multithreading using join and Queue sometimes blocks forever

My code is as follows:
def PreDutyCycleSolve(self, procCount):
    z = self.crystal.z
    #D1 = np.empty(len(z))
    #D2 = np.empty(len(z))
    D1D2q = multiprocessing.Queue()
    procs = []
    for proc in range(procCount):
        p = multiprocessing.Process(target=self.DutyCycleSolve,
                                    args=(proc,
                                          z[proc::procCount],
                                          D1D2q))
        procs.append(p)
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    while D1D2q.empty() is False:
        x = D1D2q.get()
        print x
I have a function, DutyCycleSolve, whose work gets divided up and run across (in my case) four processes. The issue is that, depending on the length of the array z, the code sometimes just gets stuck and never proceeds past proc.join. I've verified (by printing some text in self.DutyCycleSolve) that self.DutyCycleSolve always returns and each process always exits from that function.
It appears that it exits from the function and then (sometimes) gets stuck at join.
Any ideas why? I'm new to this.
Thanks.
From the docs:
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the "feeder" thread to the underlying pipe. [...] This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
In other words, whenever you use queues, the right way to go is get() first, and then join(). See the docs for an example.
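Applied to the code from the question (procs and D1D2q as defined there), that reordering looks roughly like this; expected_count is a hypothetical stand-in for however many items the workers put on the queue:
for proc in procs:
    proc.start()

results = []
for _ in range(expected_count):  # hypothetical: total number of results produced
    results.append(D1D2q.get())  # get() first, letting each child flush and exit

for proc in procs:
    proc.join()  # safe now that the queue has been drained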
