I'm getting into the whole AMQP thing and I have a question about which exchange type to use in the following scenarios:
1) I need to create a worker pool where each worker does something when it receives a message. I want different workers attached to different types of tasks, which I can specify through the routing key of each message, in topic fashion. On the consumer end, playing around a bit with kombu, I noticed that if I specify the same queue name but different routing keys, I cannot 'filter' the messages. E.g. if I have one consumer with '#' and another with 'foo.#', both using the same queue name, the second consumer simply shares the queue round robin with the first. Is this expected? I am running both consumers on the same machine.
2) So, given that, I construct a unique queue name for each consumer, and this time each consumer only gets what I ask for with the routing key. However, because they are distinct queues, a task may end up in more than one consumer. E.g. if consumer 1 has key '#' and consumer 2 has 'foo.#', then when consumer 2 receives (and acks) a message, consumer 1 also gets the same message. This is not what I want; I would like only one consumer to get each message. Is there a way to achieve this without writing a 'task manager'?
Cheers,
For most people it is best to just use a topic exchange for everything until you fully understand how AMQP works. You can get fanout and direct behavior just by choosing the right binding key for a queue. For instance, if you use "#" as the binding key, the queue receives every message published to the exchange, as if it were bound to a fanout exchange. If you use an exact key with no wildcards, it only receives messages with that routing key, as if it were bound to a direct exchange. And if you bind two or more queues with the same binding key, each of those queues gets its own copy of every matching message, which is fanout behavior for that key.
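To make the fanout/direct equivalence concrete, here is a small pure-Python model of the topic-exchange matching rules. Keys are dot-separated words, '*' matches exactly one word, and '#' matches zero or more words, so "#" matches every routing key. `topic_matches` is a hypothetical helper written for illustration. It is not RabbitMQ's own code and it ignores edge cases such as empty words.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Illustrative model of AMQP 0-9-1 topic matching (not the broker's implementation).

    Keys are dot-separated words; '*' matches exactly one word, '#' zero or more words.
    """
    def match(b: list, r: list) -> bool:
        if not b:                        # binding used up: the routing key must be too
            return not r
        head, rest = b[0], b[1:]
        if head == "#":
            # '#' either matches zero words here, or eats one word and stays active
            return match(rest, r) or (bool(r) and match(b, r[1:]))
        if not r:                        # '*' or a literal word needs one more word
            return False
        if head == "*" or head == r[0]:
            return match(rest, r[1:])
        return False

    return match(binding_key.split("."), routing_key.split("."))


# "#" behaves like a fanout binding: every routing key matches.
print(topic_matches("#", "foo.bar"))      # True
# An exact key behaves like a direct binding: only that key matches.
print(topic_matches("work.B", "work.B"))  # True
print(topic_matches("work.B", "work.C"))  # False
# '#' may match zero words; '*' needs exactly one.
print(topic_matches("foo.#", "foo"))      # True
print(topic_matches("foo.*", "foo"))      # False
```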
The round-robin behavior is expected. Both consumers are subscribed to the exact same queue, and bindings belong to the queue, not to the consumer. Declaring the same queue with both '#' and 'foo.#' simply gives that queue two bindings. It therefore receives every message that matches either one (here, everything, because of '#'), and the two consumers take turns on that single stream. The different binding keys just confuse things; best not to do that. I built a system in which several queues have anywhere from 4 to 15 instances of the exact same worker code, pulling messages off the same queue and then collecting data from web services. I have even had the workers running on different CPUs, although in the end that was not necessary for performance.
I'm not sure why you are using wildcards in the binding keys. Suppose you have 8 consumers named A through H and each one does a different job. Why not publish messages with routing keys work.A through work.H, and bind one queue per job with the matching key, work.A through work.H? That way, if you have multiple instances of worker B, they all consume from the single queue bound to work.B, and no message is delivered twice.
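The "one queue per job type, many identical workers per queue" layout can be sketched as an in-memory model. Each bound queue gets one copy of a message, and within a queue exactly one consumer gets it. `DirectModel` and the consumer names are hypothetical. The strict round robin is a simplification: RabbitMQ's real dispatch also depends on prefetch and acks.

```python
from collections import defaultdict
from itertools import cycle


class DirectModel:
    """Toy broker: exact-key bindings, one copy per bound queue,
    round-robin dispatch among the consumers of each queue."""

    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> names of bound queues
        self.consumers = {}                 # queue name -> cycle over its consumer names
        self.backlog = defaultdict(list)    # queue name -> messages waiting for a consumer
        self.received = defaultdict(list)   # consumer name -> messages it was given

    def bind(self, queue, key):
        self.bindings[key].append(queue)

    def consume(self, queue, consumer_names):
        self.consumers[queue] = cycle(consumer_names)

    def publish(self, key, body):
        for queue in self.bindings.get(key, []):   # no binding for key -> message dropped
            if queue in self.consumers:
                self.received[next(self.consumers[queue])].append(body)
            else:
                self.backlog[queue].append(body)


broker = DirectModel()
for job in "ABCDEFGH":
    broker.bind(f"work.{job}", f"work.{job}")      # one queue per job type
broker.consume("work.B", ["B-1", "B-2", "B-3"])    # three instances of worker B share it

for n in range(6):
    broker.publish("work.B", f"task-{n}")

print(dict(broker.received))
# {'B-1': ['task-0', 'task-3'], 'B-2': ['task-1', 'task-4'], 'B-3': ['task-2', 'task-5']}
```

Every task lands with exactly one B instance, which is the behavior the question was after.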
Also, if you don't ack a message after handling it, it stays unacknowledged. When the consumer's channel or connection closes, the message goes back on the queue and is delivered again. Hopefully you are acking after successfully handling the message. No task manager is needed, just a better understanding of all the AMQP knobs.
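The ack/requeue cycle can be shown with a toy model. A delivered message is held as "unacked" until it is acknowledged, and closing the channel puts any unacked messages back at the front of the queue. `AckModel` is hypothetical and leaves out details such as the redelivered flag. With pika, the real ack is typically `ch.basic_ack(delivery_tag=method.delivery_tag)` inside the consumer callback, after the work has succeeded.

```python
from collections import deque


class AckModel:
    """Toy queue: delivered messages stay 'unacked' until acked;
    closing the channel requeues whatever was never acked."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}        # delivery tag -> message body
        self.next_tag = 1

    def deliver(self):
        body = self.ready.popleft()
        tag = self.next_tag
        self.next_tag += 1
        self.unacked[tag] = body
        return tag, body

    def ack(self, tag):
        del self.unacked[tag]    # the broker forgets the message for good

    def close_channel(self):
        # Put unacked messages back at the front, preserving delivery order
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


q = AckModel(["m1", "m2", "m3"])
tag1, _ = q.deliver()
q.ack(tag1)           # handled and acked: gone
q.deliver()           # "m2" handled but never acked...
q.close_channel()     # ...so it is requeued when the channel closes
print(list(q.ready))  # ['m2', 'm3']
```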
I have one big task which consists of 200 sub-tasks (messages) that are published onto a queue. If I want to cancel this one task, the 200 messages (or the ones left that have not been processed yet) should be deleted. Is there any way to delete these published messages from a queue?
One solution I can think of is to create a queue (Q) to which I publish the name of a new queue (X). Each consumer then connects to this dynamically created queue (X) and processes the 200 published messages. If I want to abort the entire task, I delete only that queue (X) from the publisher side. Is that a common approach?
I see a few issues with your suggested approach.
The first problem is RabbitMQ consumer prefetch, which is intended to improve performance by reducing the number of requests to the broker. If your consumers have already retrieved a batch of tasks, they will process them all before they ask for new ones; only then will they notice that the queue was deleted. Therefore, your cancellation request would not be handled properly most of the time. You could reduce the prefetch count to 1 to avoid this side effect, but this would increase the pressure on the network and reduce overall speed.
The second issue is that the AMQP protocol does not provide mechanisms for gracefully dealing with queue deletion. Therefore your consumers would need to handle disappearing queues carefully, as they would otherwise crash. By doing so, you would lose visibility over bugs and issues. How would you distinguish a queue that was deleted on purpose from one that disappeared because something actually went wrong?
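A toy model helps show how much work prefetch can hide from a "delete the queue to cancel" scheme. The broker fills each consumer's prefetch window, then the queue is deleted, and tasks already pushed to consumers are still processed. `tasks_run_after_cancel` is a hypothetical helper, and the strict round-robin filling is a simplification. In pika, the real knob is `channel.basic_qos(prefetch_count=...)`.

```python
from collections import deque


def tasks_run_after_cancel(total, consumers, prefetch):
    """Toy model: the broker fills each consumer's prefetch window (round robin),
    then the job is 'cancelled' by deleting its queue. Returns how many tasks
    still get processed. Assumes prefetch >= 1 (0 means 'unlimited' in AMQP
    and is not modelled here)."""
    pending = deque(range(total))
    buffers = [deque() for _ in range(consumers)]
    while pending and any(len(b) < prefetch for b in buffers):
        for b in buffers:
            if pending and len(b) < prefetch:
                b.append(pending.popleft())
    pending.clear()                        # queue deleted: undelivered tasks are gone...
    return sum(len(b) for b in buffers)    # ...but prefetched ones are already delivered


print(tasks_run_after_cancel(200, consumers=4, prefetch=50))  # 200: nothing left to cancel
print(tasks_run_after_cancel(200, consumers=4, prefetch=1))   # 4
```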
What I would recommend in this case is marking all your tasks with the identifier of their parent job. Each time a consumer starts on a new task, it checks whether the parent job is still valid or has been cancelled. In the latter case, it simply ignores the task and moves on to the next one. You need a supporting service for that; a Redis instance, for example, should be more than enough.
This mechanism would be much simpler and more robust. You can spin up as many consumers as you want without needing to orchestrate their connections to the right queue. Also, out-of-order or interleaved tasks would not be a problem.
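Here is a minimal sketch of the consumer-side check. It assumes each task message carries a `job_id` field, which is a hypothetical field name. It uses a thread-safe in-memory `JobRegistry` as a stand-in for the supporting service. With Redis, `cancel` would roughly correspond to `SADD cancelled_jobs <id>` and `is_cancelled` to `SISMEMBER`. The set name is also an assumption.

```python
import threading


class JobRegistry:
    """In-memory stand-in for the supporting service (e.g. a Redis set of
    cancelled job ids). Thread-safe so several consumer threads can share it."""

    def __init__(self):
        self._cancelled = set()
        self._lock = threading.Lock()

    def cancel(self, job_id):
        with self._lock:
            self._cancelled.add(job_id)

    def is_cancelled(self, job_id):
        with self._lock:
            return job_id in self._cancelled


def handle_task(task, registry, results):
    """Check the parent job before doing any work. The caller should still ack
    a skipped task so that it leaves the queue."""
    if registry.is_cancelled(task["job_id"]):
        return "skipped"
    results.append(task["payload"])   # the real work would happen here
    return "done"


registry = JobRegistry()
results = []
tasks = ([{"job_id": "job-1", "payload": n} for n in range(3)]
         + [{"job_id": "job-2", "payload": n} for n in range(3)])
registry.cancel("job-2")
outcomes = [handle_task(t, registry, results) for t in tasks]
print(outcomes)  # ['done', 'done', 'done', 'skipped', 'skipped', 'skipped']
print(results)   # [0, 1, 2]
```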
I have read a lot about Python threading and the various means to 'talk' across thread boundaries. My case seems a little different, so I would like advice on the best option:
Instead of having many identical worker threads waiting for items in a shared queue, I have a handful of mostly autonomous, non-daemonic threads with unique identifiers going about their business. These threads do not block and normally do not care about each other. They sleep most of the time and wake up periodically. Occasionally, based on certain conditions, one thread needs to 'tell' another thread to do something specific (an action) that is meaningful to the receiving thread. There are many different combinations of actions and recipients, so using Events for every combination seems unwieldy. The queue object seems to be the recommended way to achieve this. However, suppose I have a shared queue and post an item meant for just one recipient thread. Then every other thread needs to monitor the queue, pull every item, check whether it is addressed to it, and put it back if it was addressed to another thread. That seems like a lot of getting and putting items for nothing. Alternatively, I could employ a 'router' thread: one queue shared by all, plus one queue for every 'normal' thread, shared with the router thread. Normal threads only ever put items in the shared queue; the router pulls every item, inspects it and puts it on the addressee's queue. Still, a lot of putting and getting items from queues...
Are there any other ways to achieve this? A pub-sub class seems like the right approach, but to my knowledge there is no such thread-safe module in the Python standard library.
Many thanks for your suggestions.
Instead of having many identical worker threads waiting for items in a shared queue
I think this is the right approach; just remove "identical" and "shared" from the statement above, i.e.
having many worker threads waiting for items in queues
So I would suggest using Celery for this approach.
Occasionally, based on certain conditions, one thread needs to 'tell'
another thread to do something specific - an action, meaningful to the receiving thread.
This can be done by calling another celery task from within the calling task. All the tasks can have separate queues.
Thanks for the response. After some thought, I have decided to use the approach of many queues and a router thread (hub-and-spoke). Every 'normal' thread has its own private queue to the router, which gives separate send and receive queues, or 'channels'. The router's queue is shared by all threads (as a property) and used by 'normal' threads as a send-only channel. That is, they only post items to this queue, and only the router listens to it, i.e. pulls items. Additionally, each 'normal' thread uses its own queue as a receive-only channel, on which it listens and which is shared only with the router. Threads register themselves with the router on the router queue/channel. The router maintains a list of registered threads and their queues, so it can send an item to a specific thread once that thread has registered.
This means that peer-to-peer communication is not possible; all communication goes through the router.
There are several reasons I did it this way:
1. There is no logic in the threads for checking whether an item is addressed to 'me'. This keeps the code simpler and avoids constantly pulling, checking and re-putting items on one shared queue. Threads only listen on their own queue, so when a message arrives the thread can be sure it is addressed to it; the same holds for the router itself.
2. The router can act as a message bus, do vocabulary translation, and address messages to external programs or hosts.
3. Threads don't need to know anything about other threads' capabilities, i.e. they just speak the language of the router. In a peer-to-peer world, all peers must be able to understand each other. Since my threads are of many different classes, I would have to teach each class the vocabulary of all the other classes.
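Here is a minimal sketch of this hub-and-spoke design, using only the standard `queue` and `threading` modules. The class names and the message tuples are hypothetical conventions for this sketch, not the poster's actual code. The tuples are ("register", name, queue), ("send", sender, recipient, action) and ("stop",). Each worker registers on the router's shared queue and only ever reads its own queue.

```python
import queue
import threading


class Router(threading.Thread):
    """Hub: the only reader of `inbox`; forwards each message to the addressee's own queue."""

    def __init__(self):
        super().__init__(name="router")
        self.inbox = queue.Queue()      # send-only channel shared by every other thread
        self._mailboxes = {}            # registered thread name -> its receive-only queue

    def run(self):
        while True:
            kind, *rest = self.inbox.get()
            if kind == "stop":
                break
            if kind == "register":
                name, mailbox = rest
                self._mailboxes[name] = mailbox
            elif kind == "send":
                sender, recipient, action = rest
                # Vocabulary translation or forwarding to external hosts would go here.
                mailbox = self._mailboxes.get(recipient)
                if mailbox is not None:          # unknown recipients are silently dropped
                    mailbox.put((sender, action))


class Worker(threading.Thread):
    """A 'normal' thread: posts only to the router, listens only on its own queue."""

    def __init__(self, name, router):
        super().__init__(name=name)
        self.router = router
        self.mailbox = queue.Queue()
        self.handled = []
        router.inbox.put(("register", name, self.mailbox))

    def send(self, recipient, action):
        self.router.inbox.put(("send", self.name, recipient, action))

    def run(self):
        while True:
            sender, action = self.mailbox.get()  # anything here is addressed to this thread
            if action == "stop":
                break
            self.handled.append((sender, action))


router = Router()
a, b = Worker("A", router), Worker("B", router)
for t in (router, a, b):
    t.start()

a.send("B", "refresh-cache")   # A never touches B's queue directly
a.send("B", "stop")
b.join()
a.mailbox.put(("main", "stop"))
a.join()
router.inbox.put(("stop",))
router.join()
print(b.handled)   # [('A', 'refresh-cache')]
```

Because the router is the single reader of a FIFO queue, a registration posted before any traffic is always processed before messages addressed to that thread.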
Hope this helps someone some day when faced with a similar challenge.
Suppose I have a producer on a RabbitMQ server which generates random numbers and passes them to the consumer. The consumer receives all the random numbers from the producer. If I kill my consumer process, what will the producer do in this situation? Will it keep generating numbers, so that when the consumer (client) comes back up it starts by receiving all the numbers generated in the meantime? Or something else?
To fully embrace the functionality you need to understand how the rabbitmq broker works with exchanges. I believe this will solve your problem.
Instead of sending to a single queue, you create an exchange, and the producer sends to the exchange. At this point, with no queues bound, the messages are discarded. You then need a queue in order for a consumer to receive the messages: the consumer creates the queue and binds it to the exchange. From then on, the queue receives messages and delivers them to the consumer.
In your case you will probably use a fanout exchange, so that you do not need to worry about binding and routing keys. You should also make your queue auto-delete. That ensures that when the consumer goes down, the queue is deleted. The producer, unaffected by this, keeps sending messages to the exchange, and they are discarded until the consumer comes back and binds a queue again.
For now I'm going to assume you have a topic exchange. Suppose a queue exists and is bound to that exchange with the producer's routing key (or a wildcard pattern that matches it). Then the queue will accumulate messages whether or not a consumer is attached... for the most part.
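The life cycle described above can be traced with a toy fanout exchange: messages with no bound queue are discarded, and an auto-delete queue disappears with its consumer. `FanoutModel` is hypothetical. With pika, the consumer side would typically be `channel.queue_declare(queue='', auto_delete=True)` followed by `channel.queue_bind(exchange=..., queue=...)`, with the exchange name left to you.

```python
class FanoutModel:
    """Toy fanout exchange: copies each message to every bound queue,
    and discards it when no queue is bound."""

    def __init__(self):
        self.queues = {}       # queue name -> list of messages
        self.discarded = 0

    def bind(self, name):
        self.queues.setdefault(name, [])

    def unbind_and_delete(self, name):
        # What an auto-delete queue amounts to once its consumer goes away
        self.queues.pop(name, None)

    def publish(self, body):
        if not self.queues:
            self.discarded += 1    # the producer is unaffected; the message just disappears
        for messages in self.queues.values():
            messages.append(body)


ex = FanoutModel()
ex.publish(1)                        # no queue yet -> discarded
ex.bind("consumer-q")                # consumer starts, declares and binds its queue
ex.publish(2)
ex.publish(3)
ex.unbind_and_delete("consumer-q")   # consumer dies, the auto-delete queue goes with it
ex.publish(4)                        # discarded again
ex.bind("consumer-q")                # consumer comes back with a fresh, empty queue
ex.publish(5)
print(ex.queues, ex.discarded)       # {'consumer-q': [5]} 2
```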
The core idea in the messaging model in RabbitMQ is that the producer never sends any messages directly to a queue. Actually, quite often the producer doesn't even know if a message will be delivered to any queue at all.
-- http://www.rabbitmq.com/tutorials/tutorial-three-python.html
If the queue does not exist, the message is discarded. If the queue does exist (e.g. because it is durable), there are settings you can apply to the queue and/or the message to give your messages a TTL, or time to live (http://www.rabbitmq.com/ttl.html and http://www.rabbitmq.com/dlx.html). You might also want to look into queue durability and auto-delete. I highly recommend you also look at the AMQP quick reference, as you can figure out what you want from it (http://www.rabbitmq.com/amqp-0-9-1-quickref.html). You'll have to convert the pseudo code to your library or client.
Basically it all boils down to what type of exchange and the configuration of the queue and message.
The Q in RabbitMQ stands for queue. This means that all messages placed in the queue, in your case the producer's random numbers, will remain in the queue until someone comes to get them.
Where are messages stored (in RabbitMQ) when you produce a message and send it without declaring a queue or mentioning one in basic_publish? The code I have to work with looks something like this:
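# ... bunch of setup code (no queue declared) ...
# Use the same exchange name here as in basic_publish below.
# Current pika releases spell the keyword exchange_type=; older ones accepted type=.
channel.exchange_declare(exchange='exch_name', exchange_type='direct')
channel.basic_publish(exchange='exch_name', routing_key='rkey', body='message')
conn.close()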
I've searched the web as best I can but haven't found an answer to this. I have a hunch that RabbitMQ creates a queue that lives for as long as the message isn't consumed. My worry is that this would be quite heavy for RabbitMQ if it has to declare this queue and then destroy it several (thousand!?) times per minute/hour.
When you publish you (usually) publish to an exchange, as you are doing. The exchange decides what to do with that message. If there is nothing to do with the message it is discarded. If there is something to do with the message then it is routed accordingly.
In your original code snippet, where no queue is declared, the message will be discarded.
As you say in your comment, there was a queue created by the producer. There are options here that you haven't stated, so I will try to run through the possibilities. Usually you would declare the queue in the consumer. However, if you want to make sure that the consumer sees all the messages, the producer must create the queue and bind it to the exchange, so that every message ends up in this queue. Then, when the consumer consumes the queue, it will see all the messages. Alternatively, you can create the queue outside the code, using the command line or the management UI. Make it a non-auto-delete queue and possibly a durable one (which keeps the queue even if you restart RabbitMQ). You will still need to declare the exchange in the producer in order to send, and declare the queue in the consumer to receive. But the exchange and queue will already exist, and you will simply be connecting to them.
Queues and exchanges are not 'persistent'; they are either durable or not, and a durable queue or exchange still exists after RabbitMQ restarts. Queues can also be auto-delete, which means they are deleted once their last consumer unsubscribes.
Messages can be persistent. Suppose you send a persistent message to an exchange that routes it to a durable queue, and RabbitMQ is restarted before the message is read. The message will still be there after the restart. Even a persistent message will be lost if the queue is not durable, or if the message is never routed to a queue in the first place.
Make sure that after you create a queue you bind the queue properly to the exchange using the same key that you are using as the routing key for your messages.
I have a queue, for example:
online_queue = self._channel.queue_declare(
    queue='online',
    durable=True,
)
At some point, I need to flush all the content of this queue.
But at that moment, another process may be publishing to this queue.
If I use channel.queue_purge(queue='online'), what will happen to messages published while queue_purge is still running?
Depending on your ultimate goal, you might be able to solve this issue by using a temporary queue.
To make things clearer, let's give things some names. Call your current queue (the one you want to purge) Queue A, and assume it is 1-1 bound to Exchange A.
If you create a new queue (Queue B) and bind it to Exchange A in the same way that Queue A is bound, Queue B will now get all of the messages (from the time of binding) that Queue A gets.
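You can now unbind Queue A from Exchange A (or delete it outright) and purge it without losing any of the messages published after Queue B was bound, because Queue B holds a copy of each of them.
Re-bind Queue A to Exchange A (re-declaring it first if you deleted it), then unbind Queue B, and you are back up and running.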
You can then deal with the "interim" messages in Queue B however you might need to.
This has the advantage of very well-defined behavior, and it doesn't get you into any race conditions, because you can completely blow Queue A away and re-create it instead of purging.
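The temporary-queue procedure can be traced with a toy exchange that copies each message to every bound queue. The trace assumes Queue A is taken out of the flow (unbound) before it is purged, which the re-bind step implies, and that Queue B is unbound at the end. The `Exchange` class is hypothetical, and each step is modelled as instantaneous.

```python
from collections import defaultdict


class Exchange:
    """Toy exchange with 1-1 bindings: every bound queue gets a copy of each message."""

    def __init__(self):
        self.bound = set()
        self.queues = defaultdict(list)

    def publish(self, body):
        for name in self.bound:
            self.queues[name].append(body)


ex = Exchange()
ex.bound.add("A")
ex.publish("m1")             # normal traffic into Queue A (will be purged)

ex.bound.add("B")            # 1. bind temporary Queue B alongside A
ex.publish("m2")             #    B now has its own copy
ex.bound.discard("A")        # 2. take A out of the flow, then purge it
ex.queues["A"].clear()
ex.publish("m3")             #    published while A is offline: only B gets it
ex.bound.add("A")            # 3. re-bind A
ex.bound.discard("B")        # 4. stop feeding B
ex.publish("m4")

print(ex.queues["A"], ex.queues["B"])   # ['m4'] ['m2', 'm3']
```

Only m1, which was published before Queue B existed, is removed by the purge. Everything published afterwards is either in Queue B (the "interim" messages) or in the re-bound Queue A.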
You're describing a race condition. Some might remain in the queue and some others might get purged. Or all of them will get purged. Or none of them will get purged.
There's just no way to tell, because it's a time-dependent situation. You should re-examine your need to purge a queue which is still active, or build a more robust consumer that can live with the fact that there might be messages in the queue it is connecting to (which is basically what consumers have to live with, anyway).