Python Queue get()/task_done() issue

My consumer side of the queue:
m = queue.get()
queue.task_done()
<rest of the program>
Questions:
Does task_done() effectively pop m off the queue and release whatever locks the consumer holds on the queue?
I need to use m during the rest of the program. Is it safe to do so, or do I need to copy it before I call task_done()? In other words, is m still usable after task_done()?
be happy

No, queue.get() pops the item off the queue. After you do that, you can do whatever you want with it, as long as the producer works like it should and doesn't touch it anymore. queue.task_done() is called only to notify the queue that you are done with something (it doesn't even know about the specific item, it just counts unfinished items in the queue), so that queue.join() knows the work is finished.
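As a rough illustration of how get(), task_done() and join() fit together (the names below are illustrative, not taken from the question), the consumer can keep using the item long after calling task_done():

import queue
import threading

q = queue.Queue()

def consumer():
    while True:
        m = q.get()              # this is what removes the item from the queue
        if m is None:            # a sentinel used here just to stop the worker
            q.task_done()
            break
        q.task_done()            # only decrements the unfinished-task counter
        print("still using", m)  # m is still perfectly usable here

threading.Thread(target=consumer).start()
for i in range(3):
    q.put(i)
q.put(None)
q.join()   # returns once task_done() has been called once per put()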

Related

Is it reasonable to terminate multi-threaded processing via a queue by means of a special object in the queue?

I am quite experienced in single-threaded Python as well as embarrassingly parallel multi-processing, but this is the first time I attempt processing something with a producer thread and a consumer thread communicating via a shared queue.
The producer thread is going to download data items from URLs and put them in a queue. Simultaneously, a consumer thread is going to process the data items as they arrive on the queue.
Eventually, there will be no more data items to download and the program should terminate. I wish for the consumer thread to be able to distinguish whether it should keep waiting at an empty queue, because more items may be coming in, or it should terminate, because the producer thread is done.
I am considering signaling the latter situation by placing a special object on the queue in the producer thread when there are no more data items to download. When the consumer thread sees this object, it then stops waiting at the queue and terminates.
Is this a sensible approach?
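For what it's worth, a minimal sketch of the sentinel approach described above might look like this (download() and process() are hypothetical stand-ins for the real work):

import queue
import threading

_DONE = object()   # unique sentinel object put on the queue when the producer finishes

def producer(q, urls):
    for url in urls:
        q.put(download(url))   # download() is a hypothetical stand-in for the real fetch
    q.put(_DONE)               # signal that no more data items will arrive

def consumer(q):
    while True:
        item = q.get()
        if item is _DONE:      # producer is finished: stop waiting and terminate
            break
        process(item)          # process() is a hypothetical stand-in for the real work

The consumer simply breaks out of its loop when it sees the sentinel, so queue.get() never needs a timeout to decide whether more work is coming.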

Await queue.get is stuck on an empty queue

I'm having issues with asyncio queues. Execution gets stuck on await queue.get() if the queue is empty, even after I publish something into the queue.
I have a loop which reads the event queue; it starts right after the app loads, so the queue is empty on the first await. In a different coroutine I publish a message to this queue; however, execution stays blocked on the await statement. Only a single consumer is reading the queue. I publish the message using put_nowait():
async def _event_loop(self):
    while True:
        try:
            # if self.events.empty():
            #     await asyncio.sleep(0.1)
            #     continue
            ev = await self.events.get()
            print(ev)
If I uncomment the commented-out part, the whole thing starts working.
I noticed a similar issue here:
https://github.com/mosquito/aio-pika/issues/56
But I had no luck figuring out how to fix this.
Does anyone have any idea what's wrong?
You are filling the queue from a thread different than the one that runs the event loop. By design, asyncio queues are not thread-safe and can only be safely accessed from asyncio coroutines and callbacks.
You can fix the issue by changing your call to queue.put_nowait(elem), to something like loop.call_soon_threadsafe(queue.put_nowait, elem), where loop is the event loop object which you must also pass to the thread, probably the same way you pass the queue.
Why would uncommenting that part of the code fix the issue, then?
Uncommenting makes the coroutine poll the queue instead of blocking on it, so it no longer needs to be woken up while waiting on an empty queue. The wakeup didn't work because put_nowait assumes it is run from the event loop thread, and therefore doesn't need to emit an additional wakeup signal. See e.g. this answer for details.
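A self-contained sketch of the call_soon_threadsafe fix described above (assuming the producer really does run in a different thread than the event loop):

import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    events = asyncio.Queue()

    def produce_from_thread():
        # calling events.put_nowait() directly from this thread would not wake
        # the waiting coroutine; call_soon_threadsafe runs it on the loop thread
        loop.call_soon_threadsafe(events.put_nowait, "hello")

    threading.Thread(target=produce_from_thread).start()
    print(await events.get())   # wakes up as soon as the item is queued

asyncio.run(main())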

What is the significance of condition.notify() in python's threading module?

So I'm using python's Condition from the threading module:
from threading import Thread, Condition
condition = Condition()
I have a Producer class (a subclass of Thread) that, in a for loop, adds items to a queue until the queue is full (i.e. has reached a defined maximum length), and a Consumer class that, in a for loop, pops items unless the queue is empty. In the Producer, if the queue is full, we have a condition.wait() statement, and similarly in the Consumer class, if the queue is empty, we have a condition.wait() statement. If neither of these conditions is met (the queue is neither full nor empty), each class does its thing (the Producer adds an item to the queue, the Consumer pops one) and then, before releasing the condition (condition.release()), we have a condition.notify() statement. I read in the documentation that notify() wakes up one of the waiting threads.
My question now is twofold:
What exactly does "waking up" mean for a thread?
When I removed the notify() statements from both classes, my program ran just fine for a few iterations of the for loops (i.e. the producer pushed items and the consumer popped them), and then at some point the producer kept pushing items onto the queue without the consumer running; the queue got full, but the consumer never ran again and the program just sorta halted.
What is the significance of notify() that makes it essential to this program working correctly?
Thanks a lot for your help :)
1) It signals a thread waiting on the condition that it can continue execution (as soon as it reacquires the lock).
2) Condition.notify(n) takes up to n threads from the condition's internal queue of waiters and wakes them up. If that internal queue is empty, there is no one to wake up and the notify call has no effect. That's why removing notify had no effect at the beginning, but once the threads called wait, there was no one to wake them up and they waited forever.
TL;DR
Waking up here can be seen as rousing the producer or consumer from sleep AND informing them that they can get back to work as soon as their shared PPE (the underlying lock) is available.
Consumer saw when the queue got empty, and went to sleep, waiting for the producer to wake it up when it adds to the queue.
But you've taken away the producer's ability to wake the consumer up.
Producer saw when the queue got full, and went to sleep, waiting for the consumer to wake it up when it takes from the queue.
But you've taken away the consumer's ability to wake the producer up.
And now they're both asleep.
The Discussion Proper
The key to appreciating the significance of Condition.notify() lies in realizing that when you're wait()ing on a condition, you are NOT waiting for its underlying lock to become available, but rather, you're waiting for some notification or a timeout on that condition.
According to the Python docs on threading.Condition.wait
wait(timeout=None)
Wait until notified or until a timeout occurs...
When you invoke notify on a condition, you're essentially informing n threads which invoked wait on that condition object that they may stop waiting as soon as they can acquire the underlying lock.
This can be inferred from the Python docs on threading.Condition.notify
notify(n=1)
...
Note: an awakened thread does not actually return
from its wait() call until it can reacquire the lock.
Since notify() does not release the lock, its caller should.
Now, here's my educated guess at what is happening with your code:
Producer pushes a number of items onto queue.
Consumer consumes a number of items from queue.
Queue soon becomes empty.
Consumer wait()s to be notified that it may resume consuming (as soon as it can acquire the underlying lock)
Producer pushes items onto queue.
Producer never gets to notify() consumer that it may resume consuming.
Queue soon becomes full.
Producer wait()s to be notified that it may resume producing (as soon as it can acquire the underlying lock)
Consumer is waiting to be awakened by Producer, and Producer is waiting to be awakened by Consumer.
Stalemate!
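To make the difference concrete, here is a minimal sketch (not the asker's code) of the producer/consumer pattern with both wait() and notify() in place; drop either notify() call and you get exactly the stalemate described above:

from collections import deque
from threading import Condition, Thread

MAX_LEN = 5
buf = deque()
condition = Condition()

def producer():
    for i in range(20):
        with condition:                 # acquires the underlying lock
            while len(buf) >= MAX_LEN:
                condition.wait()        # queue full: sleep until the consumer notifies
            buf.append(i)
            condition.notify()          # wake a consumer waiting on an empty queue

def consumer():
    for _ in range(20):
        with condition:
            while not buf:
                condition.wait()        # queue empty: sleep until the producer notifies
            item = buf.popleft()
            condition.notify()          # wake a producer waiting on a full queue
        print(item)

p, c = Thread(target=producer), Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()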

How to wakeup thread from thread pool in python?

I am new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from the ThreadPool is started, it needs to wait for some message from RabbitMQ.
How can I implement this logic in Python to make a thread from the pool wait for event messages? Basically, I need to wake up a waiting thread once I receive a message from RabbitMQ (i.e. a wait/notify implementation on the ThreadPool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
    q.put(msg)

rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode, because I've not used RabbitMQ or any particular Python bindings for it, but you should be able to easily adapt the snippet to your real application code; the key part to pay attention to is q.put(msg); just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
    msg = q.get()
    process_message(msg)
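Putting the pieces together, a runnable sketch of the pattern with concurrent.futures might look like this (Python 3 module names; the q.put() call at the bottom stands in for the RabbitMQ callback):

from concurrent.futures import ThreadPoolExecutor
from queue import Queue   # "Queue" module in Python 2, "queue" in Python 3

q = Queue()

def worker():
    while True:
        msg = q.get()        # blocks the pool thread until a message arrives
        if msg is None:      # sentinel used here to shut the worker down
            break
        print("processing", msg)

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(worker)

q.put("hello from rabbitmq")   # in real code this happens inside the callback
q.put(None)
future.result()
pool.shutdown()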
P.S. I would suggest looking into Gevent and how to combine it with RabbitMQ in your Python application, so as to get rid of threads and use a more lightweight and scalable green-threading mechanism instead, without ever having to manage a thread pool (you can just have tens of thousands of greenlets spawned and killed on the fly):
# this thing always called in a green thread; forget about pools and queues.
def on_message(msg):
    # you're in a green thread now; just process away!
    benefit_from("all the gevent goodness!")
    spawn_and_join_10_sub_greenlets()

rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))

Python using queues for countdown watchdog timer

I have a program which spawns 4 threads. These threads need to stay running indefinitely, and if one of them crashes I need to know so I can restart it.
My idea is to use a list with 4 numbers and pass it to each thread using a queue. Then all each thread has to do is reset its section of the timer while the main thread counts it down.
So the queue will never be empty; only a single value could go to 0, and if this happens the main thread knows its child hasn't responded and can act accordingly.
But every time I .get() from the queue, it becomes empty, so I have to get from the queue, store the value in a variable, modify the variable and put it back in the queue.
Is it fine to use a queue like this for a watchdog?
If you're using Threads, you could regularly check through threading.enumerate to make sure that you have the correct number and kind of threads running.
But passing things into a Queue and getting them back from a thread is also a technique that I have at least seen used to make sure that threads are still running. So, if I'm understanding you correctly, what you're doing isn't completely crazy.
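For the threading.enumerate() check mentioned above, a minimal version might look like this (the "worker-" name prefix is just an illustrative convention, not something the question uses):

import threading
import time

EXPECTED_WORKERS = 4

def watchdog():
    while True:
        # enumerate() returns all Thread objects that are currently alive
        workers = [t for t in threading.enumerate()
                   if t.name.startswith("worker-")]
        if len(workers) < EXPECTED_WORKERS:
            print("a worker died; restart it here")
        time.sleep(5)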
Your "thread must re-set its sentinal occasionally" might make more sense to have as a list of Queues that each Thread is expected to respond to asap. This depends on if your Threads are actually doing process-intensive stuff, or if they're just backgrounded for interface reasons. If they're not spending all their time doing math, you could do something like:
import time
from Queue import Queue, Empty   # Python 2 spelling; use "queue" in Python 3
from threading import Thread

def guarded_thread(sentinel_queue, *args):
    while True:
        try:
            sentinel_queue.get_nowait()
            sentinel_queue.put('got it')
        except Empty:
            # we just want to make sure that we respond if we have been
            # pinged
            pass
        # do actual work with other args

def main(arguments):
    queues = [Queue() for _ in range(4)]
    threads = [(Thread(target=guarded_thread, args=(queue, args)), queue)
               for queue, args in zip(queues, arguments)]
    for thread, queue in threads:
        thread.start()
    while True:
        for thread, queue in threads:
            queue.put(True)
        for thread, queue in threads:
            try:
                response = queue.get(True, MAX_TIMEOUT)
                if response != 'got it':
                    pass  # either re-send or restart the thread
            except Empty:
                pass  # restart the thread
        time.sleep(PING_INTERVAL)
Note that you could also use separate request/response queues to avoid having different kinds of sentinel values; which one looks less crazy depends on your actual code.
