Recently I used Python and Redis to build a small message-driven project.
I use one thread to subscribe to a Redis channel (called the message thread here), a timer thread, and a worker thread.
When the message thread has collected enough messages, it posts a task to the worker.
I use redis-py to communicate with Redis.
Message Thread:
    subscribe to redis
    while True:
        get message
        if len(messages) > threshold: post task to Worker
Worker Thread:
    while True:
        wait for task event
        do task  # this may be heavy
Here comes the problem:
After this runs for a while, the redis-py pubsub blocks! (Of course Redis is still publishing messages, but the call does not return anymore; it just blocks.) I attached gdb to the process and saw a stack frame like this:
[Switching to thread 4 (Thread 1084229984 (LWP 9812))]
#0  0x000000302b80b0cf in __read_nocancel () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x000000302b80b0cf in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x00000000004e129a in posix_read (self=Variable "self" is not available.) at ./Modules/posixmodule.c:6592
#2  0x00000000004a04c5 in PyEval_EvalFrameEx (f=0x157a8c0, throwflag=Variable "throwflag" is not available.) at Python/ceval.c:4323
I even used the Redis CLIENT KILL command to kill the connection between Python and Redis, but Python still blocks there and never returns or raises an exception. The only way out is to kill the Python process with kill -9.
Then I commented out the worker's 'do task' procedure (remember this task is heavy: it does heavy network I/O and CPU calculation), and everything works well with no problem observed.
So it seems to come down to this conclusion: once the worker does its task, the message thread blocks on the socket read.
How can this happen?!
The most probable explanation is that you use the same Redis connection in your task-processing code. You should not.
Once a connection has subscribed, you cannot use it for anything except receiving messages, or running additional SUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE and PUNSUBSCRIBE commands.
You probably need a second Redis connection in your task processing code.
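For illustration, here is a minimal sketch of that separation with redis-py; the channel name, threshold, and do_task body are placeholders for whatever your project actually does:

import threading
import redis

# One connection is dedicated to the subscription...
sub_conn = redis.Redis(host='localhost', port=6379)
# ...and a completely separate connection is used by the worker.
work_conn = redis.Redis(host='localhost', port=6379)

task_ready = threading.Event()
pending = []

def message_thread():
    pubsub = sub_conn.pubsub()
    pubsub.subscribe('my-channel')            # hypothetical channel name
    for message in pubsub.listen():
        if message['type'] != 'message':
            continue
        pending.append(message['data'])
        if len(pending) > 10:                 # hypothetical threshold
            task_ready.set()

def worker_thread():
    while True:
        task_ready.wait()
        task_ready.clear()
        batch = pending[:]
        del pending[:]
        do_task(batch)

def do_task(batch):
    # placeholder for the heavy task; any Redis commands it issues go
    # through work_conn, never through the subscribed connection
    work_conn.lpush('processed', *batch)

threading.Thread(target=message_thread).start()
threading.Thread(target=worker_thread).start()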
Related
I have a flask-socketio server running on multiple pods, using redis as a message queue. I want to ensure that emits from external processes reach their destination 100% of the time, or to know when they have failed.
When process A emits an event to a socket that's connected to process B, the event goes through the message queue to process B, then to the client. Is there any way I can intercept the outgoing emit on process B? Ideally I'd then use a worker to check after a few seconds whether the message reached the client (via a confirm event emitted from the client); if not, it would be emitted again.
This code runs on process A:
@app.route('/ex')
def ex_route():
    socketio.emit('external', {'text': f'sender: {socket.gethostname()}, welcome!'}, room='some_room')
    return jsonify(f'sending message to room "some_room" from {socket.gethostname()}')
This is the output from process A
INFO:socketio.server:emitting event "external" to some_room [/]
INFO:geventwebsocket.handler:127.0.0.1 - - [2019-01-11 13:33:44] "GET /ex HTTP/1.1" 200 177 0.003196
This is the output from process B
INFO:engineio.server:9aab2215a0da4816a45e3fdc1e449fce: Sending packet MESSAGE data 2["external",{"text":"sender: *******, welcome!"}]
There is currently no mechanism to do what you ask, unfortunately.
I think you basically have two approaches to go about this:
Always run your emits from the main server(s). If you need to emit from an auxiliary process, use an IPC mechanism to notify the server so that it can run the emit on its behalf. And now you can use callbacks.
Ignore the callbacks, and instead have the client acknowledge receipt of the event by emitting back to the server.
Adding callback support for auxiliary processes should not be terribly difficult, by the way. I never needed that functionality myself and you are the first to ask about it. Maybe I should look into that at some point.
Edit: after some thought, I came up with a 3rd option:
You can connect your external process to the server as a client, instead of using the "emit-only" option. If this process is a client, it can emit an event to the server, which the server can in turn relay to the Socket.IO client. When the client replies to the server, the server can once again relay the response to the external process, which is now just another client with full send and receive capabilities.
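A minimal sketch of that third option, assuming the python-socketio client package; the server URL and event names are placeholders, and the server side is expected to acknowledge or relay the event:

import socketio

sio = socketio.Client()
sio.connect('http://localhost:5000')    # hypothetical server address

def on_ack(data):
    # invoked when the server (or, via relay, the browser client) acknowledges
    print('confirmed receipt:', data)

# the external process now emits like any other connected client
sio.emit('external', {'text': 'welcome!'}, callback=on_ack)
sio.wait()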
Using IPC is not very robust: especially when the server is receiving a lot of requests, you can end up receiving a message and failing to relay it, which is a problem when that message is vital.
Use either Celery, ZeroMQ, or Redis itself for the interconnect. The most natural option is using Socket.IO itself, as Miguel mentioned, since it is already waiting for requests, has the environment at hand, and can emit at any time.
I've used a greenlet hack over threads: a greenlet is lighter than a thread and runs in the same environment, which lets it send the message while your main thread awaits the socket in non-blocking mode. Basically you write a thread, then apply eventlet or gevent to the whole code via monkey-patching, and the thread becomes a greenlet, an in-between function call. You put a sleep in it so it doesn't hog all resources, and you have your sender. Greenlets share the environment easily; they are not bound by I/O, just CPU (which is the same for threads in Python, but greenlets are even more lightweight because there is no OS-level context switch at all).
But as soon as CPU load increased I switched over to client/server. Introducing IPC would have required massive rewrites from the ground up.
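A rough sketch of that greenlet-over-threads hack, assuming eventlet and Flask-SocketIO; the message-queue URL, event name, and interval are placeholders:

import eventlet
eventlet.monkey_patch()   # after this, the thread below runs as a greenlet

import threading
import time
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, message_queue='redis://')   # hypothetical queue URL

def background_sender():
    while True:
        socketio.emit('heartbeat', {'ts': time.time()}, room='some_room')
        time.sleep(5)   # yields to the event loop so other greenlets can run

threading.Thread(target=background_sender).start()
socketio.run(app)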
Using RabbitMQ and pika (Python), I am running a job-queuing system that feeds nodes (asynchronous consumers) with tasks. Each message that defines a task is only acknowledged once that task is completed.
Sometimes I need to perform updates on these nodes and I have created an exit mode, in which the node waits for its tasks to finish, then exits gracefully. I can then perform my maintenance work.
So that a node does not get more messages from RabbitMQ while in this exit mode, I let it call the basic_cancel method before waiting for the jobs to finish.
The effect of this method is described in the pika documentation:
This method cancels a consumer. This does not affect already
delivered messages, but it does mean the server will not send any more
messages for that consumer. The client may receive an arbitrary number
of messages in between sending the cancel method and receiving the
cancel-ok reply. It may also be sent from the server to the client in
the event of the consumer being unexpectedly cancelled (i.e. cancelled
for any reason other than the server receiving the corresponding
basic.cancel from the client). This allows clients to be notified of
the loss of consumers due to events such as queue deletion.
So if you read "already delivered messages" as messages already received but not necessarily acknowledged, the tasks that the exit mode waits for should not be requeued, even if the consumer node that runs them cancels itself out of the queuing system.
My code for the stop function of my async consumer class (taken from the pika example) is similar to this one:
def stop(self):
    """Cleanly shutdown the connection to RabbitMQ by stopping the consumer
    with RabbitMQ. When RabbitMQ confirms the cancellation, on_cancelok
    will be invoked by pika, which will then close the channel and
    connection. The IOLoop is started again because this method is invoked
    when CTRL-C is pressed raising a KeyboardInterrupt exception. This
    exception stops the IOLoop which needs to be running for pika to
    communicate with RabbitMQ. All of the commands issued prior to starting
    the IOLoop will be buffered but not processed.
    """
    LOGGER.info('Stopping')
    self._closing = True
    self.stop_consuming()
    LOGGER.info('Waiting for all running jobs to complete')
    for index, thread in enumerate(self.threads):
        if thread.is_alive():
            thread.join()
            # also tried with a while loop that waits 10s as long as the
            # thread is still alive
        LOGGER.info('Thread {} has finished'.format(index))
    # also tried moving the call to stop_consuming up to this point
    if self._connection is not None:
        self._connection.ioloop.start()
    LOGGER.info('Closing connection')
    self.close_connection()
My issue is that after the consumer cancellation, the async consumer appears to not be sending heartbeats anymore, even if I perform the cancellation after the loop where I wait for my tasks (threads) to finish.
I have read about a process_data_events function for BlockingConnection, but I could not find such a function. Is the ioloop of the SelectConnection class the equivalent for an async consumer?
As my node in exit mode does not send heartbeats anymore, the tasks it is currently performing will be requeued by RabbitMQ once the maximum heartbeat is reached. I would like to keep this heartbeat untouched, as it is anyway not an issue when I am not in exit mode (my heartbeat here is about 100s, and my tasks might take as much as 2 hours to complete).
Looking at the RabbitMQ logs, the heartbeat is indeed the reason:
=ERROR REPORT==== 12-Apr-2017::19:24:23 ===
closing AMQP connection (.....) :
missed heartbeats from client, timeout: 100s
The only workaround I can think of is acknowledging the messages corresponding to the tasks still running when in exit mode, and hoping that these tasks will not fail...
Is there any method from the channel or connection that I can use to send some heartbeats manually while waiting ?
Could the issue be that time.sleep() or thread.join() (from the Python threading package) are completely blocking and do not allow other threads to do what they need? I use them in other applications and they don't seem to behave like that.
As this issue only appears in exit mode, I guess there is something in the stop function that causes the consumer to stop sending heartbeats, but since I have also tried (without any success) to call the stop_consuming method only after the wait-on-running-tasks loop, I don't see what the root of this issue can be.
Thanks a lot for your help !
Turns out the stop_consuming function was calling basic_cancel asynchronously, with a callback on the channel.close() function, resulting in my application stopping its RabbitMQ interaction and RabbitMQ requeuing the unacked messages. I actually realized this because the threads that later tried to acknowledge the remaining tasks were raising an error: the channel had been set to None and thus no longer had an ack method.
Hope it helps someone!
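If anyone hits the same wall: one way to keep heartbeats flowing is to avoid blocking inside thread.join() altogether and instead poll the worker threads from a timer, so the SelectConnection's ioloop keeps servicing the socket. A rough sketch, assuming pika 1.x (where the SelectConnection ioloop exposes call_later) and reusing the attribute names from the stop function above:

def stop(self):
    LOGGER.info('Stopping: waiting for running jobs before cancelling')
    self._closing = True
    self._wait_for_threads()                 # schedule the first check
    # restart the ioloop; it keeps running, so heartbeats are still sent
    if self._connection is not None:
        self._connection.ioloop.start()

def _wait_for_threads(self):
    if any(thread.is_alive() for thread in self.threads):
        # re-check in one second without ever blocking the ioloop
        self._connection.ioloop.call_later(1, self._wait_for_threads)
        return
    LOGGER.info('All jobs finished, cancelling consumer and closing')
    self.stop_consuming()                    # basic_cancel -> on_cancelok -> close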
I have a Python log handler that spawns a new thread for each log entry, and within the new thread the log is sent to another server. However, I'm finding that the request times out intermittently. If I disable the handler, the problem goes away.
I have tried other WSGI servers (WSGIUtils, WSGIRef) and I cannot reproduce this issue.
Any ideas?
I'm running Gunicorn 19.3 with sync workers and Django 1.6 on Debian.
I think this is another case of the GIL (Global Interpreter Lock). Assume this happens:
Request comes in
Worker starts
Worker logs -> thread is started
Worker is done
Response is sent
Thread logs event
Looks good, right? But what if the thread does its work faster? Then you get:
Request comes in
Worker starts
Worker logs -> thread is started
Worker is done
Thread logs event
Response is sent
and depending on which locks the thread holds or which OS functions it invokes, the thread might not release the GIL for some time, which would block the worker or at least prevent the response from being delivered completely.
When looking at the code, I see that you're creating an SSL connection for each log message. SSL is very, very, very expensive. Don't do that. Instead, create a worker thread which opens the SSL socket once and keeps it open. The Python library should release the GIL when you write to the socket so other code can work. But I'm not sure whether Python releases the lock when you open a socket.
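One possible shape for that worker thread, as a minimal sketch: the handler just queues formatted records, and a single background thread reuses one long-lived connection. open_log_connection() and send_line() are hypothetical stand-ins for whatever your handler currently does per message:

import logging
import threading
from Queue import Queue   # Python 2; on Python 3 import from queue instead

log_queue = Queue()

class QueuedRemoteHandler(logging.Handler):
    # hands records to the background thread instead of opening an SSL
    # connection inside the request worker
    def emit(self, record):
        log_queue.put(self.format(record))

def _sender():
    conn = open_log_connection()    # hypothetical: open the SSL socket once
    while True:
        line = log_queue.get()      # blocks until a record arrives
        send_line(conn, line)       # hypothetical: write over the kept-open socket

sender = threading.Thread(target=_sender)
sender.daemon = True                # don't keep the process alive just for logging
sender.start()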
I am new to Python and am developing an application in Python 2.7. I am using a thread pool provided by the concurrent.futures library. Once a thread from the ThreadPool is started, it needs to wait for some message from RabbitMQ.
How can I implement this logic in Python to make this thread from the pool wait for event messages? Basically I need to wake up a waiting thread once I receive a message from RabbitMQ (i.e. a wait-and-notify implementation on the ThreadPool).
First you define a Queue:
from Queue import Queue
q = Queue()
then, in your thread, you attempt to get an item from that queue:
msg = q.get()
this will block the entire thread until there is something to be found in the queue.
Now, at the same time, assuming your incoming events are notified by means of triggering callbacks, you register a callback that simply puts the received RabbitMQ message in the queue:
def on_message(msg):
    q.put(msg)

rabbitmq_channel.register_callback(on_message)
or if you like shorter code:
rabbitmq_channel.register_callback(lambda msg: q.put(msg))
(the above is pseudocode because I've not used RabbitMQ nor whatever Python bindings for RabbitMQ, but you should be able to easily figure out how to adapt the snippet to your real application code; the key part to pay attention to is q.put(msg)—just make sure that part gets invoked as soon as a new message is notified.)
as soon as this happens, the thread is awakened and is free to process the message. In order to reuse the same thread for multiple messages, just use a while loop:
while True:
    msg = q.get()
    process_message(msg)
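For what it's worth, here is one concrete way the registration side could look with pika (pika 1.x assumed; the queue name is a placeholder), with the consumer running in its own thread and the pool threads blocking on q.get() exactly as above:

import threading
from Queue import Queue   # Python 2.7; on Python 3 import from queue instead

import pika

q = Queue()

def consume():
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='tasks')              # hypothetical queue name

    def on_message(ch, method, properties, body):
        q.put(body)                                   # wakes up a waiting pool thread
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='tasks', on_message_callback=on_message)
    channel.start_consuming()

threading.Thread(target=consume).start()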
P.S. I would suggest looking into gevent and how to combine it with RabbitMQ in your Python application, so as to get rid of threads and use a more lightweight and scalable green-threading mechanism instead, without ever having to manage a thread pool (because you can just have tens of thousands of greenlets spawned and killed on the fly):
# this is always called in a green thread; forget about pools and queues.
def on_message(msg):
    # you're in a green thread now; just process away!
    benefit_from("all the gevent goodness!")
    spawn_and_join_10_sub_greenlets()

rabbitmq_channel.register_callback(lambda msg: gevent.spawn(on_message, msg))
from multiprocessing import Process

a = Process(target=worker, args=())
a.start()
I am making a multiple worker-process app (don't laugh yet) in which each worker can gracefully reload. Whenever the code is updated, new requests are served by new worker processes with the new code. This is such that
a newly launched worker process runs the updated code
no requests are dropped
I already made a worker that listens:
serves requests when it gets a request signal
kills itself when the next signal is a control signal
I did it in ZeroMQ. The clients connect to this server using ZeroMQ; they do not interact over HTTP.
What is a good way to reload the code? Can you explain a scheme that is simple and stupid enough to be robust?
What I have in mind/ can do
Launch a thread within the main process that iterates:
Signal every worker process to die
Launch new worker processes
But this approach will drop requests (I configured it that way) between the death of the last old worker and the spawning of the first new worker.
And no, I am not a college student. The "homework" just means a curiosity-driven pursuit.
Reloading code in Python is a notoriously difficult problem.
Here's how I would deal with it:
at server startup, listen on your HTTP port (but do not start accepting connections)
Use multiprocessing or some such to create some worker processes; this should happen after the socket starts listening so that the subprocesses inherit the socket.
each worker can then accept connections and service requests.
when the parent process learns that it should reload, it shuts down the listening socket.
when a worker tries to accept on a closed socket, it receives a socket.error exception and should terminate
the parent process can start a new main process (as in subprocess.Popen(sys.argv)); the new process can start accepting connections immediately.
the old process can now wait for the child workers to finish; the children cannot accept new connections (since the listening socket is shut down). Once all child processes have finished handling in-flight requests and closed, the parent process can also terminate. A rough sketch of this pattern follows the list.
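Here is a minimal sketch of that fork-and-inherit pattern (Unix with the fork start method assumed); handle_request() and wait_for_reload_signal() are hypothetical placeholders for your request handler and your reload trigger:

import socket
import subprocess
import sys
from multiprocessing import Process

def worker(listener):
    # the listening socket was inherited from the parent; accept until it is shut down
    while True:
        try:
            conn, addr = listener.accept()
        except socket.error:
            break                         # listener was shut down: drain and exit
        handle_request(conn)              # hypothetical request handler
        conn.close()

def main():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('', 8000))
    listener.listen(128)

    # fork the workers after listen() so they inherit the listening socket
    workers = [Process(target=worker, args=(listener,)) for _ in range(4)]
    for p in workers:
        p.start()

    wait_for_reload_signal()              # hypothetical: however you detect new code
    listener.shutdown(socket.SHUT_RDWR)   # pending and future accepts now fail in the workers
    listener.close()

    subprocess.Popen([sys.executable] + sys.argv)   # new parent with the new code
    for p in workers:
        p.join()                          # old workers finish their in-flight requests

if __name__ == '__main__':
    main()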
The way that I did this in Python is fairly simple but depends on a middleman or message broker. Basically, I receive a message, process it, and ack it. If a message is not acked, then after a timeout the broker requeues it.
With this in place, you simply kill and restart the process. In my case the process traps SIGINT and does an orderly shutdown. However it dies, the supervisor notices that the worker has died and starts a new one, which continues processing messages from the queue.
I was inspired by Erlang's supervision tree model where everything is designed to survive the death of a process. I even made my workers send a heartbeat to the supervisor periodically (ZeroMQ PUB to supervisor SUB) so that the supervisor can kill and restart a process if it hangs for any reason.
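A minimal sketch of that heartbeat channel using pyzmq; the endpoint, interval, and timeout are placeholders, and restart_worker() is a hypothetical kill-and-respawn helper:

import time
import zmq

# worker side: a PUB socket announces "I'm alive" periodically
def worker_heartbeat(worker_id):
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.connect('tcp://localhost:5565')       # hypothetical supervisor endpoint
    while True:
        pub.send_string(worker_id)
        time.sleep(1)

# supervisor side: a SUB socket restarts workers it has not heard from in a while
def supervisor():
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.bind('tcp://*:5565')
    sub.setsockopt(zmq.SUBSCRIBE, b'')        # receive every worker's beat
    last_seen = {}
    poller = zmq.Poller()
    poller.register(sub, zmq.POLLIN)
    while True:
        if poller.poll(timeout=1000):         # milliseconds
            worker_id = sub.recv_string()
            last_seen[worker_id] = time.time()
        for worker_id, ts in last_seen.items():
            if time.time() - ts > 10:
                restart_worker(worker_id)     # hypothetical kill-and-respawn
                last_seen[worker_id] = time.time()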