I want to write a consumer with a SelectConnection.
We have several devices in our network infrastructure that close connections after a certain time, therefore I want to use the heartbeat functionality.
As far as I know, the IOLoop runs on the main thread, so heartbeat frames cannot be processed while this thread is busy processing a message.
My idea is to create several worker threads that process the messages so that the main thread can handle the IOLoop. The processing of a message takes a lot of resources, so only a certain amount of the messages should be processed at once. Instead of storing the remaining messages on the client side, I would like to leave them in the queue.
Is there a way to interrupt the consumption of messages, without interrupting the heartbeat?
I am not an expert on SelectConnection for pika, but you could implement this by setting the Consumer Prefetch (QoS) to the wanted number of processes.
This would basically mean that once a message comes in, you offload it to a process or thread. Once the message has been processed, you acknowledge it.
As an example, if you set the QoS to 10, the client would pull at most 10 messages and won't pull any new messages until at least one of those has been acknowledged.
The important part here is that you would need to acknowledge messages only once you are finished processing them.
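A minimal sketch of that idea, assuming pika 1.x, where `connection` is an already opened `SelectConnection` and `channel` is one of its channels. The names `process`, `work`, `start_consuming` and `MAX_IN_FLIGHT` are made up for illustration. Because pika channels are not thread-safe, the worker thread hands the ack back to the IOLoop thread with `add_callback_threadsafe` instead of calling `basic_ack` itself.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 10  # hypothetical limit, same value as the QoS example above
pool = ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT)


def process(body):
    """Hypothetical placeholder for the expensive per-message work."""
    return body.upper()


def work(connection, channel, delivery_tag, body):
    # Runs in a worker thread, so the IOLoop thread stays free for heartbeats.
    process(body)
    # Hand the ack back to the IOLoop thread instead of acking from here.
    ack = functools.partial(channel.basic_ack, delivery_tag=delivery_tag)
    connection.ioloop.add_callback_threadsafe(ack)


def on_message(connection, channel, method, properties, body):
    # Called on the IOLoop thread; it only schedules the work and returns.
    pool.submit(work, connection, channel, method.delivery_tag, body)


def start_consuming(connection, channel, queue_name):
    # With this prefetch count the broker stops delivering once
    # MAX_IN_FLIGHT messages are unacknowledged; the rest stay in the queue.
    channel.basic_qos(prefetch_count=MAX_IN_FLIGHT)
    channel.basic_consume(
        queue=queue_name,
        on_message_callback=functools.partial(on_message, connection),
        auto_ack=False,
    )
```

You would call `start_consuming` from the channel-open callback. Nothing here pauses the consumer explicitly; the unacknowledged messages are what hold back further deliveries.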
Related
I have a Python application that has autocommit=True and uses poll() to get messages at an interval of 1 second. I was reading the documentation, and it mentions that polling reads messages in a background thread and queues them so that the main thread can take them afterwards. I was a bit confused about what happens if I have multiple messages queued and my consumer crashes. Would the messages queued by the background thread have been committed already, and hence get lost?
As mentioned in the docs, every auto.commit.interval.ms, any polled offsets will get committed.
If you are concerned about missing data, you should always disable auto-commits, in any Kafka client, and handle commits on your own after you know you've actually processed those records.
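Something along these lines, as a sketch only. It assumes a kafka-python style consumer created with `enable_auto_commit=False`, where `poll(timeout_ms=...)` returns a dict of partition to list of records and `commit()` commits the offsets of everything polled so far. Other clients spell this differently. `consume_batch` and `handle` are names made up for the example.

```python
def consume_batch(consumer, handle, timeout_ms=1000):
    """Poll once, process every record, and commit only afterwards.

    Returns the number of records that were processed.
    """
    batches = consumer.poll(timeout_ms=timeout_ms)
    processed = 0
    for records in batches.values():
        for record in records:
            # If handle() raises, commit() below is never reached, so these
            # offsets stay uncommitted and are re-read after a restart.
            handle(record.value)
            processed += 1
    if processed:
        consumer.commit()
    return processed
```

The trade-off is at-least-once delivery: a crash between processing and commit means the same records are processed again after a restart, so `handle` should tolerate duplicates.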
I'm trying to stay connected to multiple queues in RabbitMQ. Each time I pop a new message from one of these queue, I'd like to spawn an external process.
This process will take some time to process the message, and I don't want to start processing another message from that specific queue until the one I popped earlier is completed. If possible, I wouldn't want to keep a process/thread around just to wait on the external process to complete and ack the server. Ideally, I would like to ack in this external process, maybe passing some identifier so that it can connect to RabbitMQ and ack the message.
Is it possible to design this system with RabbitMQ? I'm using Python and Pika, if this is relevant to the answer.
Thanks!
RabbitMQ can do this.
You only want to read from the queue when you're ready, so spin up a thread that can spawn the external process and watch it, then fetch the next message from the queue when the process is done. You can then have multiple threads running in parallel to manage multiple queues.
I'm not sure what you want an ack for? Are you trying to stop RabbitMQ from adding new elements to that queue if it gets too full (because its elements are being processed too slowly/not at all)? There might be a way to do this when you add messages to the queues - before adding an item, check to make sure that the number of messages already in that queue is not "much greater than" the average across all queues?
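A rough sketch of the thread-per-queue loop body described above, assuming a pika `BlockingConnection` channel, where `basic_get` returns `(None, None, None)` when the queue is empty. `handle_next` and `command` are made-up names, and the message body is simply piped to the external process on stdin. Each thread would need its own connection, since pika connections are not meant to be shared between threads.

```python
import subprocess


def handle_next(channel, queue_name, command):
    """Fetch one message, run `command` on it, and ack only on success.

    Returns None if the queue was empty, otherwise the process exit code.
    """
    method, properties, body = channel.basic_get(queue=queue_name, auto_ack=False)
    if method is None:
        return None  # nothing waiting in this queue
    # Blocks this thread (and only this thread) until the process exits.
    result = subprocess.run(command, input=body)
    if result.returncode == 0:
        channel.basic_ack(delivery_tag=method.delivery_tag)
    else:
        # Put the message back on the queue so it can be retried.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    return result.returncode
```

Calling this in a loop from one thread per queue means each queue has at most one message being processed at a time, and the ack is what tells RabbitMQ the message can be dropped.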
Suppose I have a producer on a RabbitMQ server that generates random numbers and passes them to a consumer. The consumer receives all the random numbers from the producer. If I kill my consumer process, what will the producer do in this situation? Will it keep generating numbers and, whenever the consumer (client) comes back up, send all the numbers it generated in the meantime, or something else?
To fully embrace the functionality you need to understand how the rabbitmq broker works with exchanges. I believe this will solve your problem.
Instead of sending to a single queue you will create an exchange. The producer sends to the exchange. In this state with no queues at this point the messages will be discarded. You will then need to create a queue in order for a consumer to receive the messages. The consumer will create the queue and bind it to the exchange. At that point the queue will receive messages and deliver them to the consumer.
In your case you will probably use a fanout exchange so that you do not need to worry about binding and routing keys. But you should also set your queue to be auto-delete. That will ensure that when the consumer goes down the queue will be deleted. The producer, unaffected by this, will continue to send messages to the exchange, and they are discarded until the consumer comes back and binds a queue again.
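On the consumer side this could look roughly like the sketch below. It assumes a pika `BlockingConnection` channel (so `queue_declare` returns its result directly); `subscribe_temporary` and `on_message` are names invented for the example.

```python
def subscribe_temporary(channel, exchange_name, on_message):
    """Bind a private auto-delete queue to a fanout exchange and consume from it."""
    channel.exchange_declare(exchange=exchange_name, exchange_type="fanout")
    # An empty name asks the broker to generate one; the queue goes away
    # together with this consumer.
    declared = channel.queue_declare(queue="", exclusive=True, auto_delete=True)
    queue_name = declared.method.queue
    # Fanout exchanges ignore the routing key, so none is given here.
    channel.queue_bind(exchange=exchange_name, queue=queue_name)
    channel.basic_consume(
        queue=queue_name, on_message_callback=on_message, auto_ack=True
    )
    return queue_name
```

After this you would call `channel.start_consuming()`. Anything published while no such queue is bound is dropped by the exchange.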
For now I'm going to assume you have a topic exchange. If there is a queue and it's bound to the same exchange with a routing key (or dotted prefix) that matches the producer's, the queue will build up the messages whether there is a consumer there or not... for the most part.
The core idea in the messaging model in RabbitMQ is that the producer
never sends any messages directly to a queue. Actually, quite often
the producer doesn't even know if a message will be delivered to any
queue at all.
-- http://www.rabbitmq.com/tutorials/tutorial-three-python.html
If the queue does not exist, the message gets discarded. If the queue does exist (i.e. it's durable), there are settings you can apply to the queue and/or the message to give your messages a TTL, or time to live: (http://www.rabbitmq.com/ttl.html and http://www.rabbitmq.com/dlx.html). You might also want to look into and understand queue durability and auto-delete. I highly recommend you also look at the AMQP quick reference, as you can figure out what you want from that: http://www.rabbitmq.com/amqp-0-9-1-quickref.html . You'll have to convert the pseudo code to your library or client.
Basically it all boils down to what type of exchange and the configuration of the queue and message.
The Q in RabbitMQ stands for queue. This means that all messages placed in the queue, in your case the producer's random numbers, will remain in the queue until someone comes to get them.
I have an idea. Write a WebSocket based RPC that would process messages according to the scenario below.
Client connects to a WS (web socket) server
Client sends a message to the WS server
WS server puts the message into the incoming queue (can be a multiprocessing.Queue or RabbitMQ queue)
One of the workers in the process pool picks up the message for processing
Message is being processed (can be blazingly fast or extremely slow - it is irrelevant for the WS server)
After the message is processed, results of the processing are pushed to the outcoming queue
WS server pops the result from the queue and sends it to the client
NOTE: the key point is that the WS server should be non-blocking and responsible only for:
connection acceptance
getting messages from the client and putting them into the incoming queue
popping messages from the outcoming queue and sending them back to the client
NOTE2: it might be a good idea to store client identifier somehow and pass it around with the message from the client
NOTE3: it is completely fine that because of queueing the messages back and forth the speed of simple message processing (e.g. get message as input and push it back as a result) shall become lower. Target goal is to be able to run processor expensive operations (rough non-practical example: several nested “for” loops) in the pool with the same code style as handling fast messages. I.e. pop message from the input queue together with some sort of client identifier, process it (might take a while) and push the processing results together with client ID to the output queue.
Questions:
In TornadoWeb, if I have a queue (multiprocessing or RabbitMQ), how can
I make Tornado’s IOLoop trigger some callback whenever there is a new
item in that queue? Can you point me to some existing
implementation if there is any?
Is there any ready implementation of such a design? (Not necessarily with Tornado)
Maybe I should use another language (not python) to implement such a design?
Acknowledgments:
Recommendations to use REST and WSGI for whatever goal I aim to achieve are not welcome
Comments like “Here is a link to the code that I found by googling for 2 seconds. It has some imports from tornado and multiprocessing. I am not sure what it does, however I am 99% certain that it is exactly what you need” are not welcome either
Recommendations to use asynchronous libraries instead of normal blocking ones are ... :)
Tornado's IOLoop lets you handle events from any file object through its file descriptor, so you could try this:
connect with each of your workers processes through multiprocessing.Pipe
call add_handler for each pipe's parent end (using the connection's fileno())
make the workers write some random garbage to the pipe each time they put something into the output queue, no matter whether that's a multiprocessing.Queue or any MQ.
handle the answers from the workers in the event handlers
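The steps above can be sketched with the standard library alone. This is a swapped-in illustration, not Tornado code: it uses asyncio's `loop.add_reader` (Tornado 5+ runs on top of asyncio), and with Tornado itself the registration would be `IOLoop.current().add_handler(fd, handler, IOLoop.READ)`, where the handler receives `(fd, events)`. It assumes a Unix-like system, and `watch_pipe` and `demo` are made-up names. For brevity the result itself travels over the pipe rather than a wake-up byte, and the "worker" write happens in the same process.

```python
import asyncio
import multiprocessing


def watch_pipe(loop, conn, on_result):
    """Call on_result(obj) on the event loop whenever conn has data."""

    def _readable():
        # Drain everything that is currently buffered in the pipe.
        while conn.poll():
            on_result(conn.recv())

    loop.add_reader(conn.fileno(), _readable)


def demo():
    parent_end, child_end = multiprocessing.Pipe()
    loop = asyncio.new_event_loop()
    results = []

    def on_result(obj):
        results.append(obj)
        loop.stop()  # one result is enough for this demo

    watch_pipe(loop, parent_end, on_result)
    # In the real design a worker process would do this after finishing a job.
    child_end.send({"client_id": 7, "result": "done"})
    loop.run_forever()
    loop.remove_reader(parent_end.fileno())
    loop.close()
    return results
```

Carrying a client identifier in the payload, as in the demo, is what lets the server route the result back to the right WebSocket connection.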
Where are the messages stored (in rabbit) when you produce a message and send it without declaring a queue or mentioning it in basic_publish? The code I have to work with looks something like this:
... bunch of setup code (no queue declaring tho)...
channel.exchange_declare(exchange='exch_name', type='direct')
channel.basic_publish(exchange='exch_name', routing_key='rkey', body='message' )
conn.close()
I've searched the web to the best of my abilities but haven't found an answer to this. I have a hunch that rabbit creates a queue for as long as this message isn't consumed, and my worry is that this would be quite heavy for rabbit if it has to declare this queue and then destroy it several (thousand!?) times per minute/hour.
When you publish you (usually) publish to an exchange, as you are doing. The exchange decides what to do with that message. If there is nothing to do with the message it is discarded. If there is something to do with the message then it is routed accordingly.
In your original code snippet, where there is no queue declared, the message will be discarded.
As you say in your comment, there was a queue created by the producer. There are options here that you haven't stated, so I will try to run through the possibilities. Usually you would declare the queue in the consumer. However, if you wish to make sure that the consumer sees all the messages, then the queue must be created by the producer and bound to the exchange by the producer, to ensure that every message ends up in this queue. Then, when the consumer consumes from the queue, it will see all the messages. Alternatively, you can create the queue outside the code as a non-auto-delete queue, and possibly as a durable queue (this will keep the queue even if you restart RabbitMQ), using the command line or the management GUI. You will still need a declaration in the producer for the exchange in order to send, and a declaration in the consumer in order to receive, but the exchange and queue will already exist and you will simply be connecting to them.
Queues and exchanges are not persistent; they are either durable or not, and a durable one will still exist after RabbitMQ restarts. Queues can also be auto-delete, so that when the consumer disconnects from them they no longer exist.
Messages can be persistent, so that if you send a message to an exchange that routes it to a queue, the message is not read, and RabbitMQ is restarted, the message will still be there after the restart. Even if a message is persistent, it will be lost if the queue is not durable, or if the message is not routed to a queue in the first place.
Make sure that after you create a queue you bind the queue properly to the exchange using the same key that you are using as the routing key for your messages.
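Put together, the producer side might look like the sketch below. It assumes a pika 1.x channel; `publish_durably` is a made-up helper, and `properties` is expected to be something like `pika.BasicProperties(delivery_mode=2)`, which marks the message as persistent.

```python
def publish_durably(channel, exchange_name, queue_name, routing_key, body, properties):
    """Declare a durable exchange and queue, bind them, then publish one message."""
    channel.exchange_declare(
        exchange=exchange_name, exchange_type="direct", durable=True
    )
    # Declaring the queue in the producer means it exists before the first
    # publish, so nothing is discarded while no consumer is running.
    channel.queue_declare(queue=queue_name, durable=True)
    # The binding key has to match the routing key used when publishing.
    channel.queue_bind(
        queue=queue_name, exchange=exchange_name, routing_key=routing_key
    )
    channel.basic_publish(
        exchange=exchange_name,
        routing_key=routing_key,
        body=body,
        properties=properties,
    )
```

Declarations are idempotent as long as the arguments match, so running this on every publish does not create and destroy a queue each time. In practice you would still declare once at startup rather than per message.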