So I'm using Python's Condition from the threading module:
from threading import Thread, Condition
condition = Condition()
I have a Producer class (a subclass of Thread) that, in a for loop, adds items to a queue until the queue is full (i.e. has reached a defined max length), and a Consumer class that, in a for loop, pops items unless the queue is empty. In the Producer, if the queue is full, we call condition.wait(); similarly, in the Consumer, if the queue is empty, we call condition.wait(). If neither condition is met (the queue is neither full nor empty), each class does its thing (adds an item to the queue or pops an item, for Producer or Consumer respectively) and then, before releasing the condition (condition.release()), calls condition.notify(). I read in the documentation that notify() wakes up one of the waiting threads.
My question is twofold:
What exactly does "waking up" mean for a thread?
When I removed the notify() statements from both classes, my program ran just fine for a few iterations of the for loops (i.e. the producer pushed items and the consumer popped items), and then at some point the producer kept pushing items to the queue without the consumer running. The queue got full, the consumer never ran again, and the program just sort of halted.
What is the significance of notify(), which seems essential for this program to work?
Thanks a lot for your help:)
1) It signals a waiting thread that it can continue execution.
2) Condition.notify(n) takes up to n threads from the internal queue and calls release on the locks they are waiting for, thus waking them up. If the internal queue is empty, there is no one to wake up and the notify call has no effect. That's why removing notify had no effect at first: nobody was waiting yet. But once the consumer thread called wait, there was no one to wake it up, and it waited forever.
TL;DR
Waking up here can be seen as rousing the producer or consumer from sleep AND informing them that they can get back to work as soon as their shared PPE (the underlying lock) is available.
Consumer saw when the queue got empty, and went to sleep, waiting for the producer to wake it up when it adds to the queue.
But you've taken away the producer's ability to wake the consumer up.
Producer saw when the queue got full, and went to sleep, waiting for the consumer to wake it up when it takes from the queue.
But you've taken away the consumer's ability to wake the producer up.
And now they're both asleep.
The Discussion Proper
The key to appreciating the significance of Condition.notify() lies in realizing that when you're wait()ing on a condition, you are NOT waiting for its underlying lock to become available, but rather, you're waiting for some notification or a timeout on that condition.
According to the Python docs on threading.Condition.wait
wait(timeout=None)
Wait until notified or until a timeout occurs...
When you invoke notify on a condition, you're essentially informing n threads which invoked wait on that condition object that they may stop waiting as soon as they can acquire the underlying lock.
This can be inferred from the Python docs on threading.Condition.notify
notify(n=1)
...
Note: an awakened thread does not actually return
from its wait() call until it can reacquire the lock.
Since notify() does not release the lock, its caller should.
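The note quoted above can be observed directly. The following is a minimal sketch, not taken from the question's code: the names run_demo, waiter and waiter_ready are made up for illustration, and the Event exists only to make the ordering deterministic.

```python
import threading


def run_demo():
    """Record the order of events around notify(); returns the event log."""
    cond = threading.Condition()
    waiter_ready = threading.Event()  # hypothetical helper, only for this demo
    events = []

    def waiter():
        with cond:
            # Set while holding the lock, so the main thread cannot enter
            # its own `with cond:` block until wait() releases the lock.
            waiter_ready.set()
            cond.wait()  # releases the lock, sleeps, reacquires before returning
            events.append("waiter resumed")

    t = threading.Thread(target=waiter)
    t.start()
    waiter_ready.wait()

    with cond:
        cond.notify()  # wakes the waiter, but does NOT release the lock
        events.append("notified, lock still held")
        events.append("about to release lock")
    # Leaving the `with` block releases the lock; only now can wait() return.
    t.join()
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The log always ends with "waiter resumed": the notified thread cannot return from wait() until the notifier has released the lock.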
Now, here's my educated guess at what is happening with your code:
Producer pushes a number of items onto queue.
Consumer consumes a number of items from queue.
Queue soon becomes empty.
Consumer wait()s to be notified that it may resume consuming (as soon as it can acquire the underlying lock)
Producer pushes items onto queue.
Producer never gets to notify() consumer that it may resume consuming.
Queue soon becomes full.
Producer wait()s to be notified that it may resume producing (as soon as it can acquire the underlying lock)
Consumer is waiting to be awakened by Producer, and Producer is waiting to be awakened by Consumer.
Stalemate!
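For comparison, here is a rough sketch of a producer/consumer pair that keeps the notify() calls and therefore does not stall. It is an assumption about what the question's code looks like, not a copy of it: run_demo, n_items and max_len are illustrative names, and a deque stands in for the queue.

```python
import threading
from collections import deque


def run_demo(n_items=10, max_len=3):
    """Move n_items through a bounded queue; returns them in consumed order."""
    condition = threading.Condition()
    queue = deque()
    consumed = []

    def producer():
        for i in range(n_items):
            with condition:  # acquires the underlying lock, releases on exit
                # Re-check in a loop: being woken does not guarantee free space.
                while len(queue) >= max_len:
                    condition.wait()
                queue.append(i)
                # Wake the consumer if it is waiting on an empty queue.
                condition.notify()

    def consumer():
        for _ in range(n_items):
            with condition:
                while not queue:
                    condition.wait()
                consumed.append(queue.popleft())
                # Wake the producer if it is waiting on a full queue.
                condition.notify()

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(run_demo())
```

Deleting either notify() call from this sketch can reproduce the stalemate described in the steps above, because a thread that has called wait() is never told to stop waiting.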
Related
I am quite experienced in single-threaded Python as well as embarrassingly parallel multi-processing, but this is the first time I have attempted to process something with a producer thread and a consumer thread via a shared queue.
The producer thread is going to download data items from URLs and put them in a queue. Simultaneously, a consumer thread is going to process the data items as they arrive on the queue.
Eventually, there will be no more data items to download and the program should terminate. I wish for the consumer thread to be able to distinguish whether it should keep waiting at an empty queue, because more items may be coming in, or it should terminate, because the producer thread is done.
I am considering signaling the latter situation by placing a special object on the queue in the producer thread when there are no more data items to download. When the consumer thread sees this object, it then stops waiting at the queue and terminates.
Is this a sensible approach?
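The idea described above could look roughly like the sketch below. It uses Python 3's queue module; _DONE, run and the doubling step are placeholders invented for illustration, and the real download and processing code would replace them.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel; any unique object works


def producer(q, items):
    for item in items:
        q.put(item)      # stand-in for "download and enqueue"
    q.put(_DONE)         # tell the consumer that nothing more is coming


def consumer(q, results):
    while True:
        item = q.get()   # blocks while the queue is empty
        if item is _DONE:
            break        # producer is finished, so stop waiting
        results.append(item * 2)  # stand-in for real processing


def run(items):
    q = queue.Queue()
    results = []
    threads = [threading.Thread(target=producer, args=(q, items)),
               threading.Thread(target=consumer, args=(q, results))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

With more than one consumer thread, the producer would need to put one sentinel per consumer, since each sentinel is taken off the queue by exactly one thread.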
I am starting multiple instances of the RabbitMQ consumer (same queue) through a single process (multiprocessing). On an interrupt, I want all the consumers to shut down gracefully. By that I mean: if a process fetched from the queue is already running, let it finish, then stop consuming any more requests and stop the queue.
Is there a way of knowing if queue is executing something and then wait for it to finish and then stop the queue?
Is there a way of knowing if queue is executing something and then wait for it to finish and then stop the queue?
No, there is no way to know this. You should be using message acknowledgements. When you wish to stop consuming, you can call basic_cancel and then exit that consumer process. RabbitMQ only considers a message delivered once it has been acknowledged, so you won't have to worry about losing a message.
I am new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from the ThreadPool is started, it needs to wait for some message from RabbitMQ.
How can I implement this logic in Python to make this thread from the pool wait for event messages? Basically I need to wake up a waiting thread once I receive a message from RabbitMQ (i.e. a wait and notify implementation on the ThreadPool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
    q.put(msg)
rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode because I've not used RabbitMQ nor whatever Python bindings for RabbitMQ, but you should be able to easily figure out how to adapt the snippet to your real application code; the key part to pay attention to is q.put(msg)—just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
    msg = q.get()
    process_message(msg)
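Putting the pieces above together, a self-contained sketch might look like this. It is written for Python 3, where the module is named queue (on Python 2.7 it is Queue, as in the import above). No RabbitMQ client is involved: on_message is called directly to simulate the callback, and run_demo, the None stop marker and the upper-casing step are all made up for illustration.

```python
import queue
import threading


def run_demo(messages):
    """Feed messages through a queue to a waiting worker thread."""
    q = queue.Queue()
    processed = []

    def on_message(msg):
        # In a real application this would be registered as the callback
        # of the messaging client; here it is simply called by hand.
        q.put(msg)

    def worker():
        while True:
            msg = q.get()      # sleeps until on_message() puts something
            if msg is None:    # hypothetical stop marker for this demo
                break
            processed.append(msg.upper())  # stand-in for process_message

    t = threading.Thread(target=worker)
    t.start()
    for m in messages:
        on_message(m)          # simulate incoming messages
    on_message(None)           # ask the worker to stop
    t.join()
    return processed


if __name__ == "__main__":
    print(run_demo(["hello", "world"]))
```

The worker needs no explicit wait/notify calls of its own, because the blocking get() already does the waiting and put() does the waking.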
P.S. I would suggest looking into Gevent and how to combine it with RabbitMQ in your Python application so as to be able to get rid of threads and use more lightweight and scalable green threading mechanism instead without ever having to manage a threadpool (because you can just have tens of thousands of greenlets spawned and killed on the fly):
# this thing always called in a green thread; forget about pools and queues.
def on_message(msg):
    # you're in a green thread now; just process away!
    benefit_from("all the gevent goodness!")
    spawn_and_join_10_sub_greenlets()
rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))
I have a queue which can contain at most 4 queued objects. These objects are threads running web service requests. The thread part is OK.
I have followed many tutorials which talk about consumer and producer threads used to fill and drain the queue object.
My question is about the consumer part. In all these tutorials and in the Python docs, the only way I have found to take objects off the queue is:
while len(requltArray) < amountOfThreads:
    thread = q.get(True)
    thread.join()
Imagine that q.get(True) takes off a thread with an invalid web service request, and that this thread has to wait for the urllib timeout to expire. My consumer will be blocked for some seconds. As my queue is limited to 4 threads, and maybe the 3 others have already ended, I waste time until the consumer can continue taking threads off the queue (and the producer can fill it).
Is there any way or well-known design pattern to avoid this waste of time?
Thanks for your help
Maybe you could use conditions
Imagine you want to put a new thread into your queue but it is full. Then you could wait() for the condition object (pseudo code):
condition.acquire()
while not queue.has_free_place():
    condition.wait()
add_new_thread_to_queue()
condition.release()
And inside your queued threads you could place something like the following code at the end of execution:
condition.acquire()
remove_myself_from_queue()
condition.notify()
condition.release()
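A runnable version of this pseudo code might look like the sketch below. It is only an illustration under assumptions: run_limited, the active set and the peak counter are invented names, a set stands in for the queue of running threads, and the job body is empty where the web service request would go.

```python
import threading


def run_limited(n_jobs, max_slots=4):
    """Run n_jobs threads with at most max_slots of them active at once."""
    condition = threading.Condition()
    active = set()      # ids of jobs currently holding a slot
    finished = []
    peak = [0]          # highest number of simultaneously active jobs

    def job(job_id):
        # ... the web service request would happen here ...
        with condition:
            active.discard(job_id)   # remove_myself_from_queue()
            finished.append(job_id)
            condition.notify()       # a slot is free, wake the producer

    threads = []
    for job_id in range(n_jobs):
        with condition:
            while len(active) >= max_slots:   # no free place
                condition.wait()
            active.add(job_id)                # add_new_thread_to_queue()
            peak[0] = max(peak[0], len(active))
        t = threading.Thread(target=job, args=(job_id,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()
    return sorted(finished), peak[0]


if __name__ == "__main__":
    print(run_limited(10))
```

Because each job frees its own slot when it finishes, one slow request no longer prevents the slots of jobs that have already ended from being reused.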
My consumer side of the queue:
m = queue.get()
queue.task_done()
<rest of the program>
Questions:
Does task_done() effectively pop m off the queue and release whatever locks the consumer has on the queue?
I need to use m during the rest of the program. Is it safe, or do I need to copy it before I call task_done()? Or is m usable after task_done()?
No, queue.get() pops the item off the queue. After you do that, you can do whatever you want with it, as long as the producer works like it should and doesn't touch it anymore. queue.task_done() is called only to notify the queue that you are done with something (it doesn't even know about the specific item, it just counts unfinished items in the queue), so that queue.join() knows the work is finished.
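A small sketch of this behaviour, assuming Python 3's queue module (run_demo, the None stop marker and the +1 step are placeholders for illustration):

```python
import queue
import threading


def run_demo(items):
    q = queue.Queue()
    results = []

    def worker():
        while True:
            m = q.get()       # get() is what removes m from the queue
            q.task_done()     # only decrements the unfinished-task counter
            if m is None:     # hypothetical stop marker for this demo
                break
            results.append(m + 1)  # m is still perfectly usable here

    t = threading.Thread(target=worker)
    t.start()
    for item in items:
        q.put(item)
    q.join()      # returns once task_done() was called for every put() so far
    q.put(None)   # ask the worker to stop
    t.join()      # wait until the worker has really finished processing
    return results


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

Note that calling task_done() before the processing, as in the question, means q.join() can return while the last item is still being worked on. If join() is meant to signal that the work itself is complete, task_done() would normally be called after processing.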