Using RabbitMQ and pika (python), I am running a job queuing system that feeds nodes (asynchronous consumers) with tasks. Each message that defines a task is only acknowledged once that task is completed.
Sometimes I need to perform updates on these nodes and I have created an exit mode, in which the node waits for its tasks to finish, then exits gracefully. I can then perform my maintenance work.
So that a node does not get more messages from RabbitMQ while in this exit mode, I let it call the basic_cancel method before waiting for the jobs to finish.
The effect of this method is described in the pika documentation:
This method cancels a consumer. This does not affect already
delivered messages, but it does mean the server will not send any more
messages for that consumer. The client may receive an arbitrary number
of messages in between sending the cancel method and receiving the
cancel-ok reply. It may also be sent from the server to the client in
the event of the consumer being unexpectedly cancelled (i.e. cancelled
for any reason other than the server receiving the corresponding
basic.cancel from the client). This allows clients to be notified of
the loss of consumers due to events such as queue deletion.
So if you read "already delivered messages" as messages already received but not necessarily acknowledged, the tasks that the exit mode waits for should not be requeued, even if the consumer node running them cancels itself out of the queuing system.
My code for the stop function of my async consumer class (taken from the pika example) is similar to this one:
def stop(self):
    """Cleanly shutdown the connection to RabbitMQ by stopping the consumer
    with RabbitMQ. When RabbitMQ confirms the cancellation, on_cancelok
    will be invoked by pika, which will then close the channel and
    connection. The IOLoop is started again because this method is invoked
    when CTRL-C is pressed raising a KeyboardInterrupt exception. This
    exception stops the IOLoop which needs to be running for pika to
    communicate with RabbitMQ. All of the commands issued prior to starting
    the IOLoop will be buffered but not processed.
    """
    LOGGER.info('Stopping')
    self._closing = True
    self.stop_consuming()
    LOGGER.info('Waiting for all running jobs to complete')
    for index, thread in enumerate(self.threads):
        if thread.is_alive():
            thread.join()
            # also tried with a while loop that waits 10s as long as the
            # thread is still alive
        LOGGER.info('Thread {} has finished'.format(index))
    # also tried moving the call to stop_consuming up to this point
    if self._connection is not None:
        self._connection.ioloop.start()
    LOGGER.info('Closing connection')
    self.close_connection()
My issue is that after the consumer cancellation, the async consumer no longer seems to send heartbeats, even if I perform the cancellation after the loop where I wait for my tasks (threads) to finish.
I have read about a process_data_events function for BlockingConnection, but I could not find such a function. Is the ioloop of the SelectConnection class the equivalent for an async consumer?
As my node in exit mode no longer sends heartbeats, the tasks it is currently performing will be requeued by RabbitMQ once the heartbeat timeout is reached. I would like to keep this heartbeat untouched, as it is not an issue anyway when I am not in exit mode (my heartbeat here is about 100s, and my tasks might take as much as 2 hours to complete).
Looking at the RabbitMQ logs, the heartbeat is indeed the reason:
=ERROR REPORT==== 12-Apr-2017::19:24:23 ===
closing AMQP connection (.....) :
missed heartbeats from client, timeout: 100s
The only workaround I can think of is acknowledging, when entering exit mode, the messages corresponding to the tasks still running, and hoping that these tasks will not fail...
Is there any method of the channel or connection that I can use to send some heartbeats manually while waiting?
Could the issue be that the time.sleep() or thread.join() methods (from the python threading package) are completely blocking and do not allow other threads to do what they need? I use them in other applications and they do not seem to behave that way.
As this issue only appears in exit mode, I guess something in the stop function causes the consumer to stop sending heartbeats, but since I have also tried (without any success) calling the stop_consuming method only after the wait-on-running-tasks loop, I don't see what the root of this issue can be.
Thanks a lot for your help!
Turns out the stop_consuming function was calling basic_cancel asynchronously, with a callback that invoked channel.close(), resulting in my application stopping its RabbitMQ interaction and RabbitMQ requeuing the unacked messages. I actually realized this because the threads trying to acknowledge the remaining tasks later were raising an error: the channel had been set to None and thus no longer had an ack method.
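For reference, a minimal sketch of the kind of adjustment that avoids this, assuming a consumer modeled on the pika async example (the stop_consuming/on_cancelok names come from that example; the exact basic_cancel signature depends on your pika version):

def stop_consuming(self):
    # Ask RabbitMQ to stop delivering messages, but keep the channel open so
    # that the still-running jobs can ack their deliveries later.
    if self._channel:
        # pika 1.x: basic_cancel(consumer_tag, callback); older 0.x versions
        # take (callback, consumer_tag) instead
        self._channel.basic_cancel(self._consumer_tag, self.on_cancelok)

def on_cancelok(self, _unused_frame):
    # Cancel-ok received: do NOT close the channel here. Closing it drops the
    # unacked deliveries and RabbitMQ requeues them. The channel and connection
    # are closed from stop(), once every worker thread has finished and acked.
    LOGGER.info('Consumer cancelled; channel left open for pending acks')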
Hope it helps someone!
I have a flask-socketio server running on multiple pods, using redis as a message queue. I want to ensure that emits from external processes reach their destination 100% of the time, or to know when they have failed.
When process A emits an event to a socket that's connected to process B, the event goes through the message queue to process B, then to the client. Is there any way I can intercept the outgoing emit on process B? Ideally I'd then use a worker to check after a few seconds whether the message reached the client (via a confirm event emitted from the client), or else emit it again.
This code runs on process A:
@app.route('/ex')
def ex_route():
    socketio.emit('external', {'text': f'sender: {socket.gethostname()}, welcome!'}, room='some_room')
    return jsonify(f'sending message to room "some_room" from {socket.gethostname()}')
This is the output from process A
INFO:socketio.server:emitting event "external" to some_room [/]
INFO:geventwebsocket.handler:127.0.0.1 - - [2019-01-11 13:33:44] "GET /ex HTTP/1.1" 200 177 0.003196
This is the output from process B
INFO:engineio.server:9aab2215a0da4816a45e3fdc1e449fce: Sending packet MESSAGE data 2["external",{"text":"sender: *******, welcome!"}]
There is currently no mechanism to do what you ask, unfortunately.
I think you basically have two approaches to go about this:
Always run your emits from the main server(s). If you need to emit from an auxiliary process, use an IPC mechanism to notify the server so that it can run the emit on its behalf. And now you can use callbacks.
Ignore the callbacks, and instead have the client acknowledge receipt of the event by emitting back to the server (a rough sketch of this follows below).
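A hypothetical server-side sketch of that second approach (the 'confirm' event, the msg_id field and the 5-second timeout are invented for illustration; socketio is the Flask-SocketIO instance from the question):

import uuid

pending = {}  # msg_id -> (event, data, room) still waiting for confirmation

def emit_with_confirmation(event, data, room):
    msg_id = str(uuid.uuid4())
    data = dict(data, msg_id=msg_id)
    pending[msg_id] = (event, data, room)
    socketio.emit(event, data, room=room)
    socketio.start_background_task(check_confirmation, msg_id)

def check_confirmation(msg_id):
    socketio.sleep(5)                    # give the client a few seconds to answer
    if msg_id in pending:                # no confirmation: emit again (or log it)
        event, data, room = pending[msg_id]
        socketio.emit(event, data, room=room)

@socketio.on('confirm')
def on_confirm(msg_id):
    pending.pop(msg_id, None)            # the client reported that it got the event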
Adding callback support for auxiliary processes should not be terribly difficult, by the way. I never needed that functionality myself and you are the first to ask about it. Maybe I should look into that at some point.
Edit: after some thought, I came up with a 3rd option:
You can connect your external process to the server as a client, instead of using the "emit-only" option. If this process is a client, it can emit an event to the server, which in turn the server can relay to the external client. When the client replies to the server, the server can once again relay the response to the external process, which is now just another client with full send and receive capabilities.
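A hypothetical sketch of that third option, using the python-socketio client package in the external process (URL and event names are made up):

import socketio

sio = socketio.Client()

@sio.on('external_reply')
def on_reply(data):
    # the server relays the end client's answer back to this process
    print('reply from the end client:', data)

sio.connect('http://localhost:5000')
sio.emit('external', {'text': 'welcome!'})   # the server relays this to the room
sio.wait()                                   # keep receiving events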
Using IPC is not very robust; especially when the server is receiving a lot of requests, there might be a case where you receive a message and never relay it, and that message is vital.
Use either celery or zmq or redis itself for the interconnect. The most natural choice is using socketio itself, as Miguel mentioned, since it is already waiting for requests, has the environment, and can emit at any time.
I've used a greenlet hack instead of threads: a greenlet is lighter than a thread and runs in the same environment, so it can send the message while your main thread awaits the socket in non-blocking mode. Basically you write a thread, then apply eventlet or gevent to the whole code via monkeypatching, and the thread becomes a greenlet, an in-between function call. You put a sleep on it so it doesn't hog all the resources, and you have your sender. Greenlets share the environment easily; they are not bound by IO, just CPU (which is the same for threads in Python, but greenlets are even more lightweight because there is no OS-level context switch at all).
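A minimal sketch of that pattern, assuming eventlet (the event name, room and 1-second interval are made up):

import eventlet
eventlet.monkey_patch()        # threads/sockets become cooperative greenlets

from flask_socketio import SocketIO
socketio = SocketIO(message_queue='redis://')   # emit-only external process

def background_sender():
    while True:
        socketio.emit('external', {'text': 'periodic update'}, room='some_room')
        eventlet.sleep(1)      # yield so other greenlets (and the socket) can run

eventlet.spawn(background_sender)
# ... the main thread carries on with its own (now non-blocking) work ...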
But as soon as the CPU load increased, I switched over to client/server. Introducing IPC would have required massive rewrites from the ground up.
I am starting multiple instances of the rabbitmq consumer (same queue) through a single process (multiprocessing). On an interrupt, I want all the consumers to shut down gracefully. By that I mean: if a task fetched from the queue is already running, let it finish, then stop consuming any more requests and stop the queue.
Is there a way of knowing if the queue is executing something, and then wait for it to finish and then stop the queue?
The RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
Is there a way of knowing if the queue is executing something, and then wait for it to finish and then stop the queue?
No, there is no way to know this. You should be using message acknowledgements. When you wish to stop consuming, you can call basic_cancel and then exit that consumer process. RabbitMQ will only consider acknowledged messages as delivered, so you won't have to worry about losing a message.
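A minimal sketch of that approach with pika's BlockingConnection (pika 1.x signatures; the queue name and do_work are placeholders):

import pika

def do_work(body):
    pass   # the long-running task goes here

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

def on_message(ch, method, properties, body):
    do_work(body)                                    # finish the current task
    ch.basic_ack(delivery_tag=method.delivery_tag)   # only ack once it is done

consumer_tag = channel.basic_consume(queue='tasks', on_message_callback=on_message)

try:
    channel.start_consuming()
except KeyboardInterrupt:
    channel.basic_cancel(consumer_tag)   # no new deliveries for this consumer
    connection.close()                   # anything left unacked gets requeued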
I want to write a consumer with a SelectConnection.
We have several devices in our network infrastructure that close connections after a certain time, therefore I want to use the heartbeat functionality.
As far as I know, the IOLoop runs on the main thread, so heartbeat frames can not be processed while this thread is processing the message.
My idea is to create several worker threads that process the messages so that the main thread can handle the IOLoop. The processing of a message takes a lot of resources, so only a certain amount of the messages should be processed at once. Instead of storing the remaining messages on the client side, I would like to leave them in the queue.
Is there a way to interrupt the consumption of messages, without interrupting the heartbeat?
I am not an expert on SelectConnection for pika, but you could implement this by setting the consumer prefetch (QoS) to the desired number of processes.
This would basically mean that once a message comes in, you offload it to a process or thread, once the message has been processed you then acknowledge that the message has been processed.
As an example, if you set the QoS to 10. The client would pull at most 10 messages, and won't pull any new messages until at least one of those has been acknowledged.
The important part here is that you would need to acknowledge messages only once you are finished processing them.
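A small sketch of what that looks like with pika 1.x (the queue name and handler are placeholders; the value 10 matches the example above):

def on_channel_open(channel):
    # at most 10 unacknowledged messages will be in flight to this consumer;
    # RabbitMQ only pushes more as earlier ones are acked
    channel.basic_qos(prefetch_count=10)
    channel.basic_consume(queue='tasks', on_message_callback=on_message)

def on_message(channel, method, properties, body):
    # hand the body off to a worker thread/process and, once it reports back,
    # call channel.basic_ack(method.delivery_tag) from the ioloop thread
    ...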
Recently I used python and redis to build a small message-driven project.
I use one thread to subscribe to a redis channel (called the message thread here), a timer thread, and a worker thread.
When the message thread has collected enough messages, it posts a task to the worker.
I use redis-py to communicate with redis.
Message Thread:
    subscribe to redis
    while True:
        get message
        if len(messages) > threshold: post task to Worker

Worker Thread:
    while True:
        wait task event
        do task  # this may be heavy
Here comes the problem:
After this worked for a while, the redis-py pub/sub blocked! (Of course redis is still publishing messages, but the call does not return anymore, it just blocks.) I attached gdb to it, and I see a stack frame like this:
[Switching to thread 4 (Thread 1084229984 (LWP 9812))]#0 0x000000302b80b0cf in __read_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x000000302b80b0cf in __read_nocancel () from /lib64/tls/libpthread.so.0
#1 0x00000000004e129a in posix_read (self=Variable "self" is not available.) at ./Modules/posixmodule.c:6592
#2 0x00000000004a04c5 in PyEval_EvalFrameEx (f=0x157a8c0, throwflag=Variable "throwflag" is not available.) at Python/ceval.c:4323
I even used the redis 'client kill' command to kill the connection between python and redis, but python still blocks there and never returns or raises an exception. The only way out is to kill the python process with kill -9.
Then I commented out the worker's 'do task' procedure (remember this task is heavy: it does heavy network IO and CPU calculation), and everything works well with no problem observed.
So it seems to come down to this conclusion: once I let the worker do the task, the message thread blocks on the socket read.
How can this happen!!
The most probable explanation is you use the same Redis connection in your task processing code. You should not.
Once a connection has subscribed, you cannot use it for anything except receiving messages, or running additional SUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE and PUNSUBSCRIBE commands.
You probably need a second Redis connection in your task processing code.
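A minimal sketch of that split with redis-py (channel and list names are made up):

import redis

sub_conn = redis.Redis()      # used ONLY for SUBSCRIBE / listening
work_conn = redis.Redis()     # used by the worker for every other command

pubsub = sub_conn.pubsub()
pubsub.subscribe('tasks')

for message in pubsub.listen():               # message thread
    if message['type'] == 'message':
        # hand the payload to the worker; the worker talks to redis only
        # through work_conn, never through sub_conn
        work_conn.lpush('pending', message['data'])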
I use a thread pool and I submit some tasks to be processed.
Sometimes a server that I ping or my internet connection could be down. This can be detected by any of the running threads.
In that case I would like the thread that detected the error to notify the others to pause
their execution until the error is fixed.
Do you know how this is possible?
To add to the above, the ideal solution would be:
When a thread detects the problem, it sends a message to all the other threads to wait.
It should also notify an external server and, after hearing back from the server that everything is OK, send a signal to the other threads to continue the work.
I found it using threading.Event.
I just have an event:
    event = Event()
and the worker threads call
    event.wait()
In the beginning I call
    event.set()
When a thread detects an error it calls
    event.clear()
and the threads wait until the next
    event.set()
This seems to work.
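Putting it together, a small self-contained sketch of this pattern (the health check is simulated here; in the real code it would ping the server):

import random
import threading
import time

ok_to_run = threading.Event()
ok_to_run.set()                          # everything is fine at the start

def check_server():
    return random.random() > 0.1         # stand-in for the real ping/health check

def worker(task_id):
    for step in range(5):
        ok_to_run.wait()                 # every thread blocks here while cleared
        if not check_server():
            ok_to_run.clear()            # tell the other threads to pause
            while not check_server():    # poll until the server is back
                time.sleep(1)
            ok_to_run.set()              # resume everyone
        time.sleep(0.1)                  # the actual unit of work goes here

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()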