I've been looking around and see some answers saying to use globals, but that doesn't seem thread safe. I have also tried using queues, but that was blocking, at least in the way I did it. Can someone show an example of how to launch a thread from the main thread and communicate between threads in a non-blocking, thread-safe way? Basically, the use case is that the threads will be looping and checking fairly constantly whether there's something that needs to be done or changed, and acting accordingly. Thanks for the help
Python queues are thread safe according to the documentation, so pushing to and popping from a shared queue within threads should not be a problem: https://docs.python.org/3/library/queue.html
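For the non-blocking part, a minimal sketch could use queue.Queue with get_nowait(), so the worker loop keeps running when nothing is pending. The command strings and the worker logic here are placeholders for whatever your threads actually do:

```python
import queue
import threading
import time

def worker(commands, results):
    # Poll for commands without blocking; a real thread would do its own
    # work between polls instead of just sleeping.
    while True:
        try:
            cmd = commands.get_nowait()  # non-blocking: raises queue.Empty
        except queue.Empty:
            time.sleep(0.01)
            continue
        if cmd == 'stop':
            break
        results.put(cmd.upper())  # 'act accordingly' on the command

commands = queue.Queue()
results = queue.Queue()
t = threading.Thread(target=worker, args=(commands, results))
t.start()
commands.put('hello')
commands.put('stop')
t.join()
out = results.get()
print(out)  # HELLO
```

get_nowait() is the non-blocking variant of get(); you could also use get(timeout=...) if you want to wait a bounded amount of time instead of polling.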
I am quite experienced in single-threaded Python as well as embarrassingly parallel multiprocessing, but this is the first time I have attempted processing something with a producer thread and a consumer thread via a shared queue.
The producer thread is going to download data items from URLs and put them in a queue. Simultaneously, a consumer thread is going to process the data items as they arrive on the queue.
Eventually, there will be no more data items to download and the program should terminate. I want the consumer thread to be able to distinguish whether it should keep waiting at an empty queue, because more items may be coming in, or terminate, because the producer thread is done.
I am considering signaling the latter situation by placing a special object on the queue in the producer thread when there are no more data items to download. When the consumer thread sees this object, it then stops waiting at the queue and terminates.
Is this a sensible approach?
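To make the question concrete, here is roughly what I have in mind. The download is replaced by a placeholder string, and SENTINEL is an arbitrary unique object that no real data item can equal:

```python
import queue
import threading

SENTINEL = object()  # unique marker: no real data item can be identical to it

def producer(q, urls):
    for url in urls:
        q.put(f"data from {url}")  # stand-in for a real download
    q.put(SENTINEL)  # signal: no more items are coming

def consumer(q, processed):
    while True:
        item = q.get()  # blocks until an item (or the sentinel) arrives
        if item is SENTINEL:
            break
        processed.append(item)

q = queue.Queue()
processed = []
urls = ['http://example.com/a', 'http://example.com/b']
threads = [
    threading.Thread(target=producer, args=(q, urls)),
    threading.Thread(target=consumer, args=(q, processed)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(processed)
```

Comparing with `is` rather than `==` avoids any chance of a downloaded item accidentally matching the sentinel.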
I'm pretty new to multiprocessing in Python and I've done a lot of digging around, but can't seem to find exactly what I'm looking for. I have a bit of a consumer/producer problem where I have a simple server with an endpoint that consumes from a queue and a function that produces onto the queue. The queue can be full, so the producer doesn't always need to be running.
While the queue isn't full, I want the producer task to run but I don't want it to block the server from receiving or servicing requests. I tried using multithreading but this producing process is very slow and the GIL slows it down too much. I want the server to be running all the time, and whenever the queue is no longer full (something has been consumed), I want to kick off this producer task as a separate process and I want it to run until the queue is full again. What is the best way to share the queue so that the producer process can access the queue used by the main process?
What is the best way to share the queue so that the producer process can access the queue used by the main process?
If this is the important part of your question (which actually seems to be several questions), then multiprocessing.Queue seems to be exactly what you need. I've used it in several projects to have multiple processes feed a queue that is consumed by a separate process, so if that's what you're looking for, this should work.
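A minimal sketch of that pattern, with a toy producer that just computes squares standing in for your real (slow) producing work, and a None sentinel to tell the consumer the producer is finished:

```python
import multiprocessing as mp

def producer(q):
    # Runs in a separate process; the multiprocessing.Queue carries the
    # items back to the parent process.
    for i in range(5):
        q.put(i * i)  # stand-in for the real, slow producing work
    q.put(None)  # sentinel: tell the consumer we're done

def consume_all():
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    p.start()
    items = []
    while True:
        item = q.get()  # blocks until the child has put something
        if item is None:
            break
        items.append(item)
    p.join()
    return items

if __name__ == '__main__':
    print(consume_all())  # [0, 1, 4, 9, 16]
```

Because the producer runs in its own process, it has its own interpreter and is not limited by the main process's GIL.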
I'm trying to stay connected to multiple queues in RabbitMQ. Each time I pop a new message from one of these queue, I'd like to spawn an external process.
This process will take some time to process the message, and I don't want to start processing another message from that specific queue until the one I popped earlier is completed. If possible, I wouldn't want to keep a process/thread around just to wait for the external process to complete and ack the server. Ideally, I would like to ack in this external process, maybe passing some identifier so that it can connect to RabbitMQ and ack the message.
Is it possible to design this system with RabbitMQ? I'm using Python and Pika, if this is relevant to the answer.
Thanks!
RabbitMQ can do this.
You only want to read from the queue when you're ready, so spin up a thread that can spawn the external process and watch it, then fetch the next message from the queue when the process is done. You can then have multiple threads running in parallel to manage multiple queues.
I'm not sure what you want the ack for. Are you trying to stop RabbitMQ from adding new elements to a queue if it gets too full (because its elements are being processed too slowly or not at all)? There might be a way to do this when you add messages to the queues: before adding an item, check that the number of messages already in that queue is not "much greater than" the average across all queues.
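A rough sketch of the thread-per-queue idea. A plain queue.Queue stands in for the RabbitMQ queue and a trivial subprocess stands in for the external worker program; the real version would fetch with Pika (e.g. basic_get) and ack with basic_ack once the process exits cleanly:

```python
import queue
import subprocess
import sys
import threading

def queue_worker(messages, acked):
    # One thread per queue: take a message, spawn an external process for
    # it, and only fetch the next message once that process has exited.
    while True:
        msg = messages.get()
        if msg is None:  # sentinel: shut this worker down
            break
        # Stand-in for the real external worker program.
        proc = subprocess.run([sys.executable, '-c', 'pass'])
        if proc.returncode == 0:
            acked.append(msg)  # here the real code would ack the delivery

messages = queue.Queue()
acked = []
for m in ['job-1', 'job-2']:
    messages.put(m)
messages.put(None)

t = threading.Thread(target=queue_worker, args=(messages, acked))
t.start()
t.join()
print(acked)  # ['job-1', 'job-2']
```

The thread itself spends almost all its time blocked in `messages.get()` or waiting on the subprocess, so the per-queue thread is cheap even though it exists for the lifetime of the consumer.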
I have read lots about python threading and the various means to 'talk' across thread boundaries. My case seems a little different, so I would like to get advice on the best option:
Instead of having many identical worker threads waiting for items in a shared queue, I have a handful of mostly autonomous, non-daemonic threads with unique identifiers going about their business. These threads do not block and normally do not care about each other. They sleep most of the time and wake up periodically. Occasionally, based on certain conditions, one thread needs to 'tell' another thread to do something specific - an action - meaningful to the receiving thread. There are many different combinations of actions and recipients, so using Events for every combination seems unwieldy.

The Queue object seems to be the recommended way to achieve this. However, if I have one shared queue and post an item addressed to a single recipient thread, then every other thread needs to monitor the queue, pull every item, check whether it is addressed to it, and put it back if it was addressed to another thread. That is a lot of getting and putting of items for nothing.

Alternatively, I could employ a 'router' thread: one shared-by-all queue plus one queue for every 'normal' thread, shared with the router thread. Normal threads only ever put items in the shared queue; the router pulls every item, inspects it and puts it on the addressee's queue. Still, a lot of putting and getting items from queues...
Are there any other ways to achieve what I need to do? It seems a pub-sub class is the right approach, but to my knowledge there is no such thread-safe module in the Python standard library.
Many thanks for your suggestions.
Instead of having many identical worker threads waiting for items in a shared queue
I think this is the right approach. Just remove 'identical' and 'shared' from the above statement, i.e.
having many worker threads waiting for items in queues
So I would suggest using Celery for this approach.
Occasionally, based on certain conditions, one thread needs to 'tell'
another thread to do something specific - an action, meaningful to the receiving thread.
This can be done by calling another celery task from within the calling task. All the tasks can have separate queues.
Thanks for the response. After some thought, I have decided to use the approach of many queues and a router thread (hub-and-spoke). Every 'normal' thread has its own private queue to the router, giving separate send and receive queues or 'channels'. The router's queue is shared by all threads (as a property) and used by 'normal' threads as a send-only channel: they only post items to this queue, and only the router pulls items from it. Additionally, each 'normal' thread uses its own queue as a receive-only channel, on which it listens and which is shared only with the router. Threads register themselves with the router via the router's queue; the router maintains a list of registered threads including their queues, so after registration it can send an item to a specific thread.
This means that peer to peer communication is not possible, all communication is sent via the router.
There are several reasons I did it this way:
1. There is no logic in the thread for checking whether an item is addressed to 'me', making the code simpler, and no constant pulling, checking and re-putting of items on one shared queue. Threads only listen on their own queue; when a message arrives, a thread can be sure the message is addressed to it. The same applies to the router itself.
2. The router can act as a message bus, do vocabulary translation, and can address messages to external programs or hosts.
3. Threads don't need to know anything about other threads' capabilities; they just speak the language of the router. In a peer-to-peer world, all peers must be able to understand each other, and since my threads are of many different classes, I would have to teach each class every other class's vocabulary.
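For illustration, the hub-and-spoke design described above might be sketched like this. The message format, names and 'reload-config' action are invented for the example; real code would run the worker threads as well, each blocking on its own private queue:

```python
import queue
import threading

class Router(threading.Thread):
    """Hub: pulls items off its shared queue and forwards each one to the
    recipient's private queue."""
    def __init__(self):
        super().__init__()
        self.inbox = queue.Queue()  # shared send-only channel for all threads
        self.registry = {}          # thread name -> that thread's private queue

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel: shut the router down
                break
            kind, sender, payload = msg
            if kind == 'register':
                self.registry[sender] = payload  # payload is the private queue
            elif kind == 'send':
                recipient, action = payload
                self.registry[recipient].put((sender, action))

router = Router()
router.start()

# A 'normal' thread registers its private receive-only queue with the router.
worker_q = queue.Queue()
router.inbox.put(('register', 'worker-1', worker_q))

# Another thread addresses an action to worker-1 via the router.
router.inbox.put(('send', 'worker-2', ('worker-1', 'reload-config')))
router.inbox.put(None)
router.join()

msg = worker_q.get()
print(msg)  # ('worker-2', 'reload-config')
```

Because the router's inbox is a FIFO queue with a single consumer, the registration is guaranteed to be processed before the send that follows it.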
Hope this helps someone some day when faced with a similar challenge.