I have a Python messaging application that uses ZMQ. Each object has a PUB and a SUB socket, and they connect to each other. In some particular cases I want to wait for a particular message on the SUB socket, leaving the ones I am not interested in for later processing.
Right now, I am getting all messages and queuing the ones I am not interested in in a Python Queue, until I find the one I am waiting for. But this means that in each processing routine I need to check the Python Queue for old messages first. Is there a better way?
A zmq publisher doesn't do any queueing... it drops messages when there isn't a SUB available to receive those messages.
The better way in your situation is to create a dedicated SUB that subscribes only to the messages it is interested in. That way you can spin up all of the different SUBs (even within one thread, using a zmq poller) and each will process its messages as they arrive from the PUB....
This is what the PUB/SUB pattern is primarily used for: SUBs subscribe only to messages of interest, which eliminates the need to cycle through a queue of messages at every loop looking for the ones you care about.
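As a rough sketch with pyzmq (the endpoint and topic names here are invented), subscriptions filter by message prefix, and one poller can service several SUBs in a single thread:

import zmq

ctx = zmq.Context()

# One SUB per category of interest; ZMQ filters by message prefix.
sub_a = ctx.socket(zmq.SUB)
sub_a.connect("tcp://127.0.0.1:5556")
sub_a.setsockopt_string(zmq.SUBSCRIBE, "orders")   # only 'orders...' messages

sub_b = ctx.socket(zmq.SUB)
sub_b.connect("tcp://127.0.0.1:5556")
sub_b.setsockopt_string(zmq.SUBSCRIBE, "status")   # only 'status...' messages

# A single poller services both SUBs in one thread.
poller = zmq.Poller()
poller.register(sub_a, zmq.POLLIN)
poller.register(sub_b, zmq.POLLIN)

while True:
    ready = dict(poller.poll())
    if sub_a in ready:
        print("order:", sub_a.recv_string())
    if sub_b in ready:
        print("status:", sub_b.recv_string())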
I'm trying to stay connected to multiple queues in RabbitMQ. Each time I pop a new message from one of these queues, I'd like to spawn an external process.
This process will take some time to process the message, and I don't want to start processing another message from that specific queue until the one I popped earlier is completed. If possible, I wouldn't want to keep a process/thread around just to wait on the external process to complete and ack the server. Ideally, I would like to ack in this external process, maybe passing some identifier so that it can connect to RabbitMQ and ack the message.
Is it possible to design this system with RabbitMQ? I'm using Python and Pika, if this is relevant to the answer.
Thanks!
RabbitMQ can do this.
You only want to read from the queue when you're ready, so spin up a thread that spawns the external process and watches it, then fetches the next message from the queue when the process is done. You can then have multiple threads running in parallel to manage multiple queues.
I'm not sure what you want an ack for? Are you trying to stop RabbitMQ from adding new elements to that queue if it gets too full (because its elements are being processed too slowly/not at all)? There might be a way to do this when you add messages to the queues - before adding an item, check to make sure that the number of messages already in that queue is not "much greater than" the average across all queues?
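As a rough sketch with pika (the queue name and external command are invented), prefetch_count=1 keeps one message in flight per consumer, and the ack is sent only after the external process exits:

import subprocess
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.basic_qos(prefetch_count=1)   # at most one unacked message at a time

def on_message(ch, method, properties, body):
    # Blocks this consumer until the external process finishes, so the
    # next message is not delivered until this one is acked.
    subprocess.run(["process_message", body.decode()])   # hypothetical command
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="work", on_message_callback=on_message)
channel.start_consuming()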
So far, for a single queue in RabbitMQ, I have used a single channel.
But now I have multiple queues created dynamically, so do I have to create a new channel for each queue, or can one channel be used to receive/send messages from/to different queues?
# consuming
for itm in items:
    channel1 = rconn.channel()
    channel1.queue_declare(queue=itm)
    channel1.basic_consume(some_callback, queue=itm, no_ack=True)
    channel1.start_consuming()  # blocks on the first queue's consume loop

# publishing
for itm in items:
    # ....
    channel1.basic_publish(exchange="", routing_key=itm, body="fdsfds")
I've had weird issues when I tried to reuse the channel, so I'd go with multiple channels; one per type of producer/consumer is what I ended up using, IIRC.
You do not need to have one queue per channel. You can both declare and consume from multiple queues on the same channel. See this question for more info.
In many client libraries, the queue declaration "RPC" operations should not be mixed with the consume "streaming" operations. In such cases, it's better to have two channels: one for any number of RPC things like queue declarations, deletions, binding creation, etc., and one for any number of consumes.
I think the official Python driver handles this correctly and does not require more than one channel for both.
To (very roughly and nondeterministically) test this, start a publisher somewhere sending a steady stream of messages to a queue, and create a consumer on that queue that consumes messages while repeatedly declaring other queues. If everything works well for a while, your client is fine mixing RPC and streaming operations. Of course, the client's documentation on the subject is a better authority than this test.
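As a rough sketch with pika (queue names are invented), declaring and consuming from several queues on one channel looks like this:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()   # one channel for everything

def on_message(ch, method, properties, body):
    print(method.routing_key, body)   # routing key shows which queue it came from
    ch.basic_ack(delivery_tag=method.delivery_tag)

# Declare and consume several queues on the same channel.
for name in ("alpha", "beta", "gamma"):   # hypothetical queue names
    channel.queue_declare(queue=name)
    channel.basic_consume(queue=name, on_message_callback=on_message)

channel.start_consuming()   # one blocking loop services all three consumers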
I have read a lot about Python threading and the various means to 'talk' across thread boundaries. My case seems a little different, so I would like advice on the best option:
Instead of having many identical worker threads waiting for items in a shared queue, I have a handful of mostly autonomous, non-daemonic threads with unique identifiers going about their business. These threads do not block and normally do not care about each other; they sleep most of the time and wake up periodically. Occasionally, based on certain conditions, one thread needs to 'tell' another thread to do something specific: an action meaningful to the receiving thread. There are many different combinations of actions and recipients, so using Events for every combination seems unwieldy.
The Queue object seems to be the recommended way to achieve this. However, if I have one shared queue and post an item meant for a single recipient thread, then every other thread has to monitor the queue, pull every item, check whether it is addressed to it, and put it back if it was addressed to another thread. That is a lot of getting and putting items for nothing. Alternatively, I could employ a 'router' thread: one queue shared by all, plus one queue per 'normal' thread shared with the router. Normal threads only ever put items in the shared queue; the router pulls every item, inspects it, and puts it on the addressee's queue. Still, a lot of putting and getting items from queues...
Are there any other ways to achieve what I need? A pub-sub class seems like the right approach, but there is no such thread-safe module in standard Python, at least to my knowledge.
Many thanks for your suggestions.
Instead of having many identical worker threads waiting for items in a shared queue
I think this is the right approach. Just remove 'identical' and 'shared' from the above statement, i.e.
having many worker threads waiting for items in queues
So I would suggest using Celery for this approach.
Occasionally, based on certain conditions, one thread needs to 'tell'
another thread to do something specific - an action, meaningful to the receiving thread.
This can be done by calling another Celery task from within the calling task. All the tasks can have separate queues.
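A minimal Celery sketch (the module name, broker URL, queue name, and task bodies are all invented for illustration) of one task dispatching an action to another task's own queue:

from celery import Celery

# Hypothetical app and broker URL.
app = Celery("actions", broker="amqp://localhost")

@app.task
def periodic_work():
    # ... this worker's own business ...
    # Occasionally tell the other worker to perform a specific action:
    specific_action.apply_async(args=["reload-config"], queue="worker_b")

@app.task
def specific_action(action):
    print("performing", action)

Each worker is then started against only its own queue, e.g. celery -A actions worker -Q worker_b, so a task routed to that queue is effectively a message addressed to that worker.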
Thanks for the response. After some thought, I have decided to use the approach of many queues and a router thread (hub-and-spoke); a rough sketch follows the list below. Every 'normal' thread has its own private queue to the router, giving separate send and receive queues, or 'channels'. The router's queue is shared by all threads (as a property) and is used by 'normal' threads as a send-only channel, i.e. they only post items to this queue and only the router pulls items from it. Additionally, each 'normal' thread uses its own queue as a receive-only channel on which it listens and which it shares only with the router. Threads register themselves with the router on the router queue/channel; the router maintains a list of registered threads, including their queues, so it can send an item to a specific thread after registration.
This means that peer-to-peer communication is not possible; all communication goes via the router.
There are several reasons I did it this way:
1. There is no logic in a thread for checking whether an item is addressed to 'me', which makes the code simpler and avoids the constant pulling, checking, and re-putting of items on one shared queue. Threads only listen on their own queue; when a message arrives, the thread can be sure it is addressed to it. The same goes for the router itself.
2. The router can act as a message bus, do vocabulary translation and has the possibility to address messages to external programs or hosts.
3. Threads don't need to know anything about other threads' capabilities; they just speak the language of the router. In a peer-to-peer world, all peers must be able to understand each other, and since my threads are of many different classes, I would have to teach each class every other class's vocabulary.
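As a rough illustration of the hub-and-spoke idea with standard-library queues (the names and the daemon/sleep scaffolding are just for the demo):

import queue
import threading
import time

router_q = queue.Queue()   # shared: every thread sends here, only the router reads
registry = {}              # thread name -> that thread's private receive queue

def router():
    while True:
        kind, name, payload = router_q.get()
        if kind == "register":            # payload is the new thread's private queue
            registry[name] = payload
        elif kind == "send":              # payload is a message for thread 'name'
            registry[name].put(payload)

def worker(name):
    my_q = queue.Queue()
    router_q.put(("register", name, my_q))   # announce myself to the router
    while True:
        action = my_q.get()                  # everything arriving here is addressed to me
        print(name, "got", action)

threading.Thread(target=router, daemon=True).start()
for n in ("a", "b"):
    threading.Thread(target=worker, args=(n,), daemon=True).start()

time.sleep(0.1)                              # demo only: let registrations land first
router_q.put(("send", "a", "do-something"))
time.sleep(0.5)                              # demo only: keep daemons alive to print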
Hope this helps someone some day when faced with a similar challenge.
Suppose I have a producer on a RabbitMQ server that generates random numbers and passes them to a consumer, and the consumer receives all the random numbers from the producer. If I kill my consumer process, what will the producer do? Will it keep generating numbers so that, whenever the consumer (client) comes back up, it starts receiving all the numbers generated in the meantime, or something else...
To make full use of this functionality, you need to understand how the RabbitMQ broker works with exchanges. I believe this will solve your problem.
Instead of sending to a single queue, you create an exchange, and the producer sends to the exchange. In this state, with no queues bound, the messages are discarded. You then need to create a queue for a consumer to receive the messages: the consumer creates the queue and binds it to the exchange. At that point the queue receives messages and delivers them to the consumer.
In your case you will probably use a fanout exchange, so that you do not need to worry about bindings and routing keys. But you should also set your queue to be auto-delete. That ensures that when the consumer goes down, the queue is deleted; the producer, unaffected by this, keeps sending messages to the exchange, and they are discarded until a queue is bound again.
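A rough sketch with pika (the exchange name and connection details are invented), showing both sides of that arrangement:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.exchange_declare(exchange="numbers", exchange_type="fanout")

# Producer side: publish to the exchange. With no queue bound yet,
# this message is simply discarded by the broker.
channel.basic_publish(exchange="numbers", routing_key="", body="42")

# Consumer side: a fresh auto-delete queue that disappears with its consumer.
result = channel.queue_declare(queue="", exclusive=True, auto_delete=True)
channel.queue_bind(exchange="numbers", queue=result.method.queue)
channel.basic_consume(queue=result.method.queue,
                      on_message_callback=lambda ch, m, p, b: print(b),
                      auto_ack=True)
channel.start_consuming()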
For now I'm going to assume you have a topic exchange. If there is a queue bound to the same exchange with the producer's routing key (or a matching dotted-prefix pattern), the queue will build up the messages whether there is a consumer there or not... for the most part.
The core idea in the messaging model in RabbitMQ is that the producer never sends any messages directly to a queue. Actually, quite often the producer doesn't even know if a message will be delivered to any queue at all.
-- http://www.rabbitmq.com/tutorials/tutorial-three-python.html
If the queue does not exist, the message is discarded. If the queue does exist (e.g. it is durable), there are configurations you can apply to the queue and/or the message to give your messages a TTL, or time-to-live (http://www.rabbitmq.com/ttl.html and http://www.rabbitmq.com/dlx.html). You might also want to look into and understand queue durability and auto-delete. I highly recommend the AMQP quick reference as well, since you can figure out what you want from it: http://www.rabbitmq.com/amqp-0-9-1-quickref.html. You'll have to convert the pseudocode there to your library or client.
Basically it all boils down to the type of exchange and the configuration of the queue and message.
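For example, a per-queue message TTL can be set at declaration time. A hedged sketch with pika (the queue name and TTL value are illustrative):

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# Hypothetical queue: messages left unconsumed for more than 60 seconds are
# dropped (or dead-lettered, if a dead-letter exchange is configured).
channel.queue_declare(queue="numbers",
                      durable=True,
                      arguments={"x-message-ttl": 60000})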
The 'q' in RabbitMQ stands for queue. This means that all messages placed in the queue (in your case, the producer's random numbers) will remain in the queue until someone comes to get them.
I have an idea: write a WebSocket-based RPC that processes messages according to the scenario below.
Client connects to a WS (web socket) server
Client sends a message to the WS server
WS server puts the message into the incoming queue (can be a multiprocessing.Queue or RabbitMQ queue)
One of the workers in the process pool picks up the message for processing
Message is being processed (can be blazingly fast or extremely slow - it is irrelevant for the WS server)
After the message is processed, the results of the processing are pushed to the outgoing queue
WS server pops the result from the queue and sends it to the client
NOTE: the key point is that the WS server should be non-blocking and responsible only for:
connection acceptance
getting messages from the client and putting them into the incoming queue
popping messages from the outgoing queue and sending them back to the client
NOTE2: it might be a good idea to store a client identifier somehow and pass it around with the message from the client
NOTE3: it is completely fine that, because of queueing the messages back and forth, simple message processing (e.g. take a message as input and push it straight back as the result) becomes slower. The target goal is to be able to run processor-intensive operations (a rough, impractical example: several nested "for" loops) in the pool with the same code style as handling fast messages, i.e. pop a message from the input queue together with some sort of client identifier, process it (which might take a while), and push the processing results together with the client ID to the output queue.
Questions:
In TornadoWeb, if I have a queue (multiprocessing or Rabbit), how can I make Tornado's IOLoop trigger some callback whenever there is a new item in that queue? Can you point me to an existing implementation, if there is one?
Is there any ready implementation of such a design? (Not necessarily with Tornado)
Maybe I should use another language (not python) to implement such a design?
Acknowledgments:
Recommendations to use REST and WSGI for whatever goal I aim to achieve are not welcome
Comments like "Here is a link to the code that I found by googling for 2 seconds. It has some imports from tornado and multiprocessing. I am not sure what it does, but I am 99% certain that it is exactly what you need" are not welcome either
Recommendations to use asynchronous libraries instead of normal blocking ones are ... :)
Tornado's IOLoop lets you handle events from any file object by its file descriptor, so you could try this:
connect to each of your worker processes through a multiprocessing.Pipe
call add_handler for each pipe's parent end (using the connection's fileno())
make the workers write some random garbage each time they put something into the output queue, no matter whether that's a multiprocessing.Queue or any MQ
handle the answers from the workers in the event handlers
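A rough sketch of that approach on Unix (the worker logic, the wake-up byte, and all names are invented for illustration; the lookup of the client's WebSocket is elided):

import multiprocessing
from tornado.ioloop import IOLoop

result_q = multiprocessing.Queue()               # workers put results here
parent_conn, child_conn = multiprocessing.Pipe()

def worker(conn, q):
    # ... pop a task from the incoming queue and process it ...
    q.put(("client-42", "result"))               # hypothetical client id + result
    conn.send(b"x")                              # the 'random garbage' wake-up

def on_worker_ready(fd, events):
    parent_conn.recv()                           # drain the wake-up token
    client_id, result = result_q.get_nowait()
    # ... look up the WebSocket connection for client_id and write the result ...
    print(client_id, result)

loop = IOLoop.current()
loop.add_handler(parent_conn.fileno(), on_worker_ready, IOLoop.READ)
multiprocessing.Process(target=worker, args=(child_conn, result_q), daemon=True).start()
loop.start()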