I'm having issues with asyncio queues. Execution gets stuck on await queue.get() if the queue is empty, even if I publish something into the queue.
I have a loop that reads the event queue. It starts right after the app loads, so the queue is empty on the first await. In a different coroutine I publish a message to this queue, but execution keeps waiting on the await statement. Only a single consumer is reading the queue. I publish the message using put_nowait():
async def _event_loop(self):
    while True:
        try:
            # if self.events.empty():
            #     await asyncio.sleep(0.1)
            #     continue
            ev = await self.events.get()
            print(ev)
        except Exception as exc:
            # log the error and keep consuming events
            print(exc)
If I uncomment the commented out part, the whole thing starts working.
I noticed a similar issue here:
https://github.com/mosquito/aio-pika/issues/56
But I had no luck figuring out how to fix this.
Does anyone have any idea what's wrong?
You are filling the queue from a thread other than the one that runs the event loop. By design, asyncio queues are not thread-safe and can only be safely accessed from asyncio coroutines and callbacks running in the event loop thread.
You can fix the issue by changing your call to queue.put_nowait(elem), to something like loop.call_soon_threadsafe(queue.put_nowait, elem), where loop is the event loop object which you must also pass to the thread, probably the same way you pass the queue.
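Here is a minimal sketch of that fix. It assumes the producer is a plain threading.Thread. The producer() and consumer() functions and the None sentinel are hypothetical names for illustration, not the asker's code. The thread never touches the asyncio.Queue directly. Instead it asks the loop to run put_nowait, which also wakes the coroutine blocked in get():

```python
import asyncio
import threading

def producer(loop, queue, items):
    # Runs in a plain thread: never call queue methods directly from here.
    for item in items:
        # Schedule put_nowait on the event-loop thread; this also wakes up
        # any coroutine currently waiting in queue.get().
        loop.call_soon_threadsafe(queue.put_nowait, item)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: no more items

async def consumer(queue):
    received = []
    while True:
        ev = await queue.get()
        if ev is None:
            break
        received.append(ev)
    return received

async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    thread = threading.Thread(target=producer, args=(loop, queue, ["a", "b", "c"]))
    thread.start()
    received = await consumer(queue)
    thread.join()
    return received

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['a', 'b', 'c']
```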
Why, then, would uncommenting that part of the code fix the issue?
Uncommenting effectively removes the need for the coroutine to wake up while waiting on an empty queue. The wakeup didn't work because put_nowait assumes it is run from the event loop thread, and therefore doesn't need to emit an additional wakeup signal. See e.g. this answer for details.
I have one program that collects data from a websocket, processes the data and if some conditions apply I want to call another function that does something with the data.
This is easy enough, but I want the program that collects the data from the websocket to keep running.
I have 'fixed' this in quite an ugly way by writing the data to a database and letting the second program check the database every few seconds. But I don't want to use this solution, since I occasionally get 'database is locked' errors.
Is there a way to start program B from program A while program A keeps running?
I have looked at multithreading and multiprocessing, and I feel this could be a way to solve it. I grasp the basics, but it is still a bit too difficult for me to use.
Is there an easier way? And if not, should I study multithreading or multiprocessing more?
(Or if anyone knows a good guide/video, that would be great too!)
I suggest launching a worker thread that waits for data to process. The main thread listens to the websocket and sends data to the worker through a pipe.
The logic of worker is:
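while True:
    # assuming pipe is one end of a multiprocessing.Pipe();
    # recv() blocks (sleeps) until the main thread sends data
    data = pipe.recv()
    process_data(data)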
This way you won't get thousands of workers when incoming traffic is high.
So the key point is how to send data to the worker, usually through a pipe or a message queue.
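Here is a minimal sketch of that design. It uses queue.Queue, the standard library's thread-safe message queue, in place of a pipe. process_data(), the None shutdown sentinel and the sample messages are hypothetical stand-ins for the real websocket data:

```python
import queue
import threading

def process_data(data):
    # Hypothetical processing step; replace with your own logic.
    return data.upper()

def worker(inbox, results):
    # Long-lived worker: blocks (sleeps) in get() until data arrives.
    while True:
        data = inbox.get()
        if data is None:  # sentinel sent by the main thread on shutdown
            break
        results.append(process_data(data))

def main(messages):
    inbox = queue.Queue()
    results = []
    t = threading.Thread(target=worker, args=(inbox, results), daemon=True)
    t.start()
    # Stand-in for the websocket listener: the main thread only hands data off.
    for msg in messages:
        inbox.put(msg)
    inbox.put(None)
    t.join()
    return results

if __name__ == "__main__":
    print(main(["tick", "tock"]))  # ['TICK', 'TOCK']
```

A single worker thread keeps the messages in arrival order. Start more threads reading the same queue if processing needs to run in parallel.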
I've used Celery with RabbitMQ as the message queue. The Django server sends data to Celery, and Celery calls your function from another process.
Here is an example assuming you are using asyncio for WebSockets:
import asyncio
from time import sleep

async def web_socket(queue: asyncio.Queue):
    for i in range(5):
        await asyncio.sleep(1.0)
        await queue.put(f"Here is message n°{i}!")
    await queue.put(None)

def expensive_work(message: str):
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue):
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.to_thread(expensive_work, message)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(
        web_socket(queue),
        worker(queue)
    )

if __name__ == "__main__":
    asyncio.run(main())
The web_socket() function simulates a websocket listener that receives messages. It puts each received message in a queue shared with another task, which runs concurrently and processes the message.
The expensive_work() function simulates the processing task to apply to each message.
The worker() function runs concurrently with the websocket listener. It reads values from the shared queue and processes them. If the processing is really expensive (for instance a CPU-bound task), consider running it in a ProcessPoolExecutor (see here how to do that). A CPU-bound function running in a thread still holds the GIL and can starve the event loop.
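For the CPU-bound case, here is a sketch of the worker handing each message to a ProcessPoolExecutor through loop.run_in_executor(). cpu_heavy() is a hypothetical stand-in for the real processing. The queue is pre-filled here rather than fed by a websocket:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(message: str) -> int:
    # Hypothetical CPU-bound work; it runs in a separate process.
    return sum(ord(c) for c in message * 10_000)

async def worker(queue: asyncio.Queue, pool: ProcessPoolExecutor) -> list:
    loop = asyncio.get_running_loop()
    results = []
    while True:
        message = await queue.get()
        if message is None:
            break
        # run_in_executor returns an awaitable, so the event loop stays free
        # while the child process does the work.
        results.append(await loop.run_in_executor(pool, cpu_heavy, message))
    return results

async def main() -> list:
    queue = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(f"message {i}")
    queue.put_nowait(None)
    with ProcessPoolExecutor() as pool:
        return await worker(queue, pool)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Define the work function at module level, and keep the `if __name__ == "__main__":` guard. Both are needed so the function can be pickled and sent to the child processes.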
Finally, the main() function creates the shared queue, launches the two tasks concurrently with asyncio.gather() and then awaits the completion of both tasks.
If you are using threads and blocking I/O, the solution is essentially the same but uses threading.Thread and queue.Queue. Be careful when mixing multithreading and asyncio concurrency. If you need to, look up how to do it properly.
I'm launching a new process (edit: the same thing applies to a new thread) for computations from an async event loop. The new process has its own asyncio event loop running and runs fine without any kind of blocking behavior.
I created two queues (multiprocessing.Queue or multiprocessing.Manager().Queue), one for outgoing messages and another for incoming messages. I get the same behavior with both queue types. The queue for outgoing messages is working fine, as I put/get a message on the queue with:
await asyncio.get_running_loop().run_in_executor(None, self.incoming_queue.put, msg)
msg = await asyncio.get_running_loop().run_in_executor(None, self.incoming_queue.get, True, 1)
However, when I attempt to run the same get() command in my original asyncio application using the asyncio run_in_executor command, it just hangs. The event loop itself seems fine and responsive.
Disabling the working queue doesn't change things, and neither does the executor (default, thread, or process).
Ideas?
I've decided to make an answer here based on my investigation. In short: what works in a new event loop in a new process does NOT work in the Django Channels event loop for one reason or another.
My current solution is to manually create a new thread to run my synchronous listener in. I'm looking into options for why the Channels event loop wouldn't work in my use case.
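Here is a hedged sketch of that workaround. A dedicated thread does the blocking multiprocessing.Queue.get() and forwards each message into an asyncio.Queue with loop.call_soon_threadsafe(). For brevity, the child process is simulated by putting items into the queue from the same process. The listen() name and the None sentinel are assumptions, and nothing here is specific to Django Channels:

```python
import asyncio
import multiprocessing
import threading

def listen(mp_queue, loop, aio_queue):
    # Dedicated thread: a blocking get() is fine here because this thread
    # does nothing else. Each message is handed to the event loop safely.
    while True:
        msg = mp_queue.get()
        loop.call_soon_threadsafe(aio_queue.put_nowait, msg)
        if msg is None:  # sentinel: stop listening
            break

async def main():
    mp_queue = multiprocessing.Queue()
    aio_queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    listener = threading.Thread(
        target=listen, args=(mp_queue, loop, aio_queue), daemon=True
    )
    listener.start()

    # Stand-in for the worker process: in the real setup these puts
    # would happen in the child process.
    for i in range(3):
        mp_queue.put(i)
    mp_queue.put(None)

    received = []
    while True:
        msg = await aio_queue.get()  # never blocks the event loop
        if msg is None:
            break
        received.append(msg)
    return received

if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 1, 2]
```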
I am writing a Python program that schedules a number of asynchronous, I/O-bound work items, many of which will also schedule other, similar work items. The work items themselves are completely independent of one another: they do not require each other's results, nor do I need to gather any results from them for any sort of local output (beyond logging, which takes place as part of the work items themselves).
I was originally using a pattern like this:
async def some_task(foo):
    pending = []
    for x in foo:
        # ... do some work ...
        if some_condition:
            pending.append(some_task(bar))
    if pending:
        await asyncio.wait(pending)
However, I was running into trouble with some of the nested asyncio.wait(pending) calls sometimes hanging forever, even though the individual things being awaited were always completing (according to the debug output that was produced when I used KeyboardInterrupt to list out the state of the un-gathered results, which showed all of the futures as being in the done state). When I asked others for help they said I should be using asyncio.create_task instead, but I am not finding any useful information about how to do this nor have I been able to get clarification from the people who suggested this.
So, how can I satisfy this use case?
Python asyncio.Queue may help to tie your program processing to program completion. It has a join() method which will block until all items in the queue have been received and processed.
Another benefit I like is that the worker becomes more explicit. It pulls from the queue, processes the item, potentially adds more items, and then acknowledges it with task_done(). This is just personal preference, though.
import asyncio

async def worker(queue):
    while True:
        item = await queue.get()
        # process item, potentially requeue more work
        # (some_condition is a placeholder for your own check)
        if some_condition:
            await queue.put('something new')
        queue.task_done()

async def run():
    queue = asyncio.Queue()
    await queue.put('initial item')  # join() returns at once if the queue starts empty
    worker_task = asyncio.create_task(worker(queue))
    await queue.join()  # blocks until every queued item has been marked task_done()
    worker_task.cancel()

asyncio.run(run())
The example above was adapted from asyncio producer_consumer example and modified since your worker both consumes and produces:
https://asyncio.readthedocs.io/en/latest/producer_consumer.html
I'm not super sure how to fix your specific example but I would def look at the primitives that asyncio offers to help the event loop hook into your program state, notably join and using a Queue.
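For comparison, here is a sketch of the create_task approach the question asks about, assuming Python 3.7+. Each work item schedules follow-up items with asyncio.create_task() instead of awaiting them, and a single top-level loop waits until no tasks remain. some_task(), spawn() and the depth-based recursion are hypothetical:

```python
import asyncio

async def some_task(depth, spawn, log):
    # Hypothetical I/O-bound work item; asyncio.sleep stands in for real I/O.
    await asyncio.sleep(0.01)
    log.append(depth)
    if depth > 0:
        # Schedule follow-up work without awaiting it here.
        spawn(some_task(depth - 1, spawn, log))
        spawn(some_task(depth - 1, spawn, log))

async def run_all(depth):
    pending = set()
    log = []

    def spawn(coro):
        # create_task() starts the coroutine concurrently; keeping the task in
        # `pending` stops it from being garbage-collected and lets us wait on it.
        pending.add(asyncio.create_task(coro))

    spawn(some_task(depth, spawn, log))
    while pending:
        # asyncio.wait() works on a snapshot of `pending`; tasks spawned while
        # it waits are picked up on the next iteration.
        done, _ = await asyncio.wait(pending)
        pending.difference_update(done)
        for task in done:
            task.result()  # re-raise any exception from a work item
    return log

if __name__ == "__main__":
    print(len(asyncio.run(run_all(3))))  # 15
```

Keeping the tasks in a set matters because the event loop holds only weak references to tasks. A task that nothing else references can be garbage-collected before it finishes.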
I am new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from the pool is started, it needs to wait for a message from RabbitMQ.
How can I implement this logic in Python so that a thread from the pool waits for event messages? Basically I need to wake up a waiting thread once I receive a message from RabbitMQ (i.e. a wait-and-notify implementation on the thread pool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
    q.put(msg)

rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode because I've not used RabbitMQ nor whatever Python bindings for RabbitMQ, but you should be able to easily figure out how to adapt the snippet to your real application code; the key part to pay attention to is q.put(msg)—just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
    msg = q.get()
    process_message(msg)
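Here is a runnable sketch of the same idea using an actual concurrent.futures pool. The RabbitMQ callback is simulated by calling on_message() directly. process_message(), the message bodies and the None stop sentinel are hypothetical. The import fallback lets it run on Python 3, or on Python 2.7 with the futures backport installed:

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2.7

q = Queue()
processed = []

def process_message(msg):
    # Hypothetical handler; replace with real processing.
    processed.append(msg)

def on_message(msg):
    # Hypothetical stand-in for the RabbitMQ callback: it only enqueues.
    q.put(msg)

def pool_worker():
    # Runs inside a pool thread and sleeps in q.get() until a message arrives.
    while True:
        msg = q.get()
        if msg is None:  # sentinel used here to stop the worker
            break
        process_message(msg)

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(pool_worker)
for body in ["msg-1", "msg-2", "msg-3"]:
    on_message(body)
on_message(None)
future.result()  # wait for the worker to drain the queue and exit
executor.shutdown()
```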
P.S. I would suggest looking into Gevent and how to combine it with RabbitMQ in your Python application so as to be able to get rid of threads and use more lightweight and scalable green threading mechanism instead without ever having to manage a threadpool (because you can just have tens of thousands of greenlets spawned and killed on the fly):
# this is always called in a green thread; forget about pools and queues.
def on_message(msg):
    # you're in a green thread now; just process away!
    benefit_from("all the gevent goodness!")
    spawn_and_join_10_sub_greenlets()

rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))
My consumer side of the queue:
m = queue.get()
queue.task_done()
<rest of the program>
Questions:
Does task_done() effectively pop m off the queue and release whatever locks the consumer holds on the queue?
I need to use m during the rest of the program. Is it safe, or do I need to copy it before I call task_done() or is m usable after task_done()?
No, queue.get() pops the item off the queue. After you do that, you can do whatever you want with it, as long as the producer works like it should and doesn't touch it anymore. queue.task_done() is called only to notify the queue that you are done with something (it doesn't even know about the specific item, it just counts unfinished items in the queue), so that queue.join() knows the work is finished.
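Here is a small demonstration, assuming the standard threaded queue.Queue. The item obtained from get() is still usable after task_done(), and join() returns once task_done() has been called for every put():

```python
import queue
import threading

q = queue.Queue()
kept = []

def consumer():
    m = q.get()      # get() removes the item from the queue
    q.task_done()    # only decrements the "unfinished tasks" counter
    kept.append(m)   # m is still an ordinary reference, safe to use

q.put({"id": 1})
t = threading.Thread(target=consumer)
t.start()
q.join()   # returns once task_done() has been called for every put()
t.join()
print(kept, q.qsize())  # [{'id': 1}] 0
```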