I have a Twisted application (Python 2.7) that uses the kombu module to communicate with a RabbitMQ message server.
We're having problems with connections being closed (probably firewall related) and I'm trying to implement the heartbeat_check() method to handle this. I've set a heartbeat value of 10 on the connection, and I have a Twisted LoopingCall that calls heartbeat_check(rate=2) every 5 seconds.
However, once things get rolling I'm getting an exception thrown on every other call to heartbeat_check() (based on the logging I've added to the function that the LoopingCall invokes, which includes the call to heartbeat_check()). I've tried all kinds of variations of heartbeat and rate values and nothing seems to help. When I debug into heartbeat_check() it looks like some minor message traffic is being sent back and forth; am I supposed to respond to that in my message consumer?
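Roughly, the pieces are wired up like this (a condensed sketch, not the real application; the broker URL and the print-style logging are placeholders):

from kombu import Connection
from twisted.internet import reactor, task

connection = Connection('amqp://guest:guest@rabbit-host//', heartbeat=10)
connection.connect()

def check_heartbeat():
    print('calling heartbeat_check')       # the real code logs around this call
    connection.heartbeat_check(rate=2)     # this is what raises on every other call

heartbeat_loop = task.LoopingCall(check_heartbeat)
heartbeat_loop.start(5.0)                  # fire every 5 seconds
reactor.run()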
Any hints or pointers would be very helpful, thanks in advance!!
Doug
I have started a private project with Django and Channels to build a web-based UI to control the music player daemon (mpd) on a Raspberry Pi. I know there are other out-of-the-box projects like Volumio or moode audio that do the same thing, but my intention is to learn something new!
Up to now I have managed to set up an nginx server on the Pi that communicates with my devices such as a mobile phone or PC. Behind it, nginx talks to a uWSGI server for HTTP requests to Django and to a Daphne server as the ASGI server for the WebSocket connection to Django Channels. A Redis server is also installed as the backend that the channel layer needs. So, on a client request a simple HTML page is served as the UI and a WebSocket connection is established.
In parallel I have a separate script acting as an mpd handler, wrapped in a while loop to keep it alive, which does all the work with mpd using the Python module python-mpd2.
The mpd handler should receive its commands (play, stop, etc.) from the clients/consumers via WebSocket and react to them. At the same time, while a song is playing, it should send the song's timeline to the clients, say once per second, also via WebSocket. I managed to send data frequently to all connected clients/consumers from outside using async_to_sync(channel_layer.group_send), but I couldn't find a solution for passing data/commands coming from the clients via WebSocket to my separately running mpd handler script.
I read in the Django Channels docs that it is not recommended to use while loops in the consumers because this blocks all communication; that's right, I have already tried it. Then I tried to receive messages in the mpd handler with async_to_sync(channel_layer.receive)('channel_name'), using a direct connection to a consumer, but this call blocks my mpd handler because it waits for a message even though I use async_to_sync.
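In condensed form, this is what I tried from the external mpd handler script (the settings module, group name, channel name and payload are placeholders):

import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')   # placeholder
import django
django.setup()

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

channel_layer = get_channel_layer()

# This direction works: push the current timeline to all connected consumers.
async_to_sync(channel_layer.group_send)(
    'mpd_group',
    {'type': 'mpd.update', 'elapsed': 42},
)

# This direction blocks the handler's while loop until a message arrives.
message = async_to_sync(channel_layer.receive)('mpd_handler_channel')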
So, my question:
Is it possible to pass messages from Django Channels to other scripts running outside of it, using the channel layer's own methods? Do you have any suggestions on how to solve this, maybe with other methods or workarounds? I am looking for a reliable solution.
I have given some thought to this issue and have a few ideas, but I don't know whether any of them will lead to a solution:
Polling:
The clients frequently send messages and requests via WebSocket to control mpd and update the UI. In this case no handler would be needed. (I don't know whether this would generate too much traffic on the WebSocket and make it slow. Also, the connection to mpd would have to be established and closed again frequently; I don't know how robust that would be.)
Database:
Set up a database that both the consumers and the mpd handler can access. The consumers write the incoming messages to the database and the mpd handler reads them out and does the job. (Here I don't know whether there will be problems when the consumers and the mpd handler try to access the database at the same time.)
Using Queues with multiprocessing module:
The consumers pass the messages to the mpd handler via a queue. (A rough sketch of what I mean is shown after this list, but I don't know whether this is possible.)
Catching up the messages in redis:
The mpd handler polls Redis frequently to pick up the messages. I read that when the channel layer is used in the usual way, only the groups and channel names are listed in Redis; messages are passed via Redis when the consumers are started as workers. (That would mean all my consumers would have to run as background workers, but how?)
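For the multiprocessing queue idea, something along these lines is what I have in mind (a hypothetical, untested sketch; the address, authkey and names are made up):

from multiprocessing.managers import BaseManager
import queue

class QueueManager(BaseManager):
    pass

if __name__ == '__main__':
    # Run this small server as its own process; it owns the command queue.
    command_queue = queue.Queue()
    QueueManager.register('get_command_queue', callable=lambda: command_queue)
    manager = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
    manager.get_server().serve_forever()

# In consumers.py and in the mpd handler you would connect like this:
#   QueueManager.register('get_command_queue')
#   m = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
#   m.connect()
#   m.get_command_queue().put({'cmd': 'play'})     # consumer side
#   cmd = m.get_command_queue().get(timeout=1)     # mpd handler side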
I hope you may have a solution to my question. You may realise from my ideas, and the question marks involved in solving this problem, that I am not an IT expert. As I wrote at the beginning, I have a different engineering background and am a newbie at this, but very interested in learning something new! So please be patient with me if I don't understand everything immediately.
I hope to read your answers soon and thank you in advance.
Best regards.
While nobody has given an answer to my question yet, I tried out a few possible options.
I changed the binding of mpd from a fixed IP to a socket connection and created an mpd_Handler class with some functions/methods such as connect to mpd, disconnect, play, pause, etc.
This class is imported in Django's consumers.py and views.py. Whenever a web client connects to Django or sends a new command (like play, skip, etc.), the mpd_Handler performs the command and responds with the current state of mpd, such as the current song's metadata.
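In condensed form the class looks roughly like this (a sketch using python-mpd2; the socket path and the set of methods are illustrative, not the complete class):

from mpd import MPDClient

class mpd_Handler(object):
    def __init__(self, socket_path='/run/mpd/socket'):
        self.socket_path = socket_path
        self.client = MPDClient()

    def connect(self):
        self.client.connect(self.socket_path, None)   # port is ignored for a unix socket

    def disconnect(self):
        self.client.close()
        self.client.disconnect()

    def play(self):
        self.client.play()

    def pause(self):
        self.client.pause(1)

    def current_state(self):
        # Returned to the consumer so it can report song metadata to the clients.
        return {'song': self.client.currentsong(), 'status': self.client.status()}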
A second mpd handler, which runs outside of Django as a separate script, frequently monitors the mpd state to detect any changes. When something changes in mpd (e.g., the song of a web radio stream changes, or the elapsed time of the song), this handler informs all clients connected to the Django consumer group via async_to_sync(channel_layer.group_send), so that the clients can update their UI.
At the moment it works, and I hope this is a good solution and helps others who have the same problem. Other suggestions are still welcome!
Best regards.
Sometimes our RabbitMQ messaging server requires a restart, after which some consumers that are listening via a blocking basic consume call do not consume any messages until they are restarted themselves, and neither do they raise any exception.
What is the reason for this, and how might I fix it?
On the ConnectionFactory, please ensure the following property is set to true:
factory.setAutomaticRecoveryEnabled(true);
For more details, please refer to the documentation here.
As I mentioned in my comment, every AMQP client library has a different way to recover connections, and some depend on the developer to do that. There is NO canonical method.
Pika has this example as a starting point for connection recovery. Note that the code is for the unreleased version of Pika (1.0.0). If you're on 0.12.0 you will have to adjust the parameters to the method calls.
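As a rough illustration of that pattern (pika 1.x argument order; the queue name, callback and back-off are placeholders, not the linked example):

import time
import pika

def on_message(channel, method, properties, body):
    print('got:', body)
    channel.basic_ack(method.delivery_tag)

params = pika.ConnectionParameters(host='localhost')
while True:
    try:
        connection = pika.BlockingConnection(params)
        channel = connection.channel()
        channel.queue_declare(queue='work')
        channel.basic_consume(queue='work', on_message_callback=on_message)
        channel.start_consuming()
    except pika.exceptions.AMQPConnectionError:
        # Broker restarted or the network dropped: back off briefly and reconnect.
        time.sleep(5)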
The best way to test and implement connection recovery is to simulate failure conditions and then code for them. Run your application, then kill the beam.smp process (RabbitMQ) to see what happens. If you have a RabbitMQ cluster, use firewall rules to simulate a network partition. Can your application handle that? What happens when you run rabbitmqctl stop_app; sleep 10; rabbitmqctl start_app? Can your app handle that?
Run your application through a TCP proxy like toxiproxy and introduce latency and other non-optimal conditions. Shut down the proxy to simulate a sudden TCP connection close. In each case, code for that failure condition and log the event so that someone can later diagnose what has happened.
I have seen too many developers code for the "happy path" only to have their applications fail spectacularly in production with zero ability to determine the source of the failure.
I'm trying to implement TCP communication where the server sends a message every x seconds through a socket, and should stop sending those messages under a certain condition: when the client hasn't sent any message for 5 seconds.
To be more detailed, the client also sends constant messages on the same socket as above, which are all ignored by the server, and can stop sending them at any unknown time. For simplicity, the messages are used as keep-alive messages to inform the server that the communication is still relevant.
The problem is that if I want to send repeated messages from the server, I cannot allow it to "get busy" receiving messages instead, so I cannot detect when a new message arrives from the other side and act accordingly.
The problem is independent of the programming language, but to be more specific I'm using Python, and I cannot access the client's code.
Is there any way to receive and send messages on a single socket simultaneously?
Thanks!
Option 1
Use two threads: one will write to the socket and the other will read from it.
This works since sockets are full-duplex (allow bi-directional simultaneous access).
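A minimal sketch of the two-thread idea (the one-second interval and the 5-second silence rule come from the question; the address and everything else are placeholders):

import socket
import threading
import time

sock = socket.create_connection(('client.example', 9000))   # or a socket returned by accept()
last_received = [time.time()]    # shared between the two threads

def reader():
    while True:
        data = sock.recv(1024)
        if not data:
            break
        last_received[0] = time.time()    # client is still alive

def writer():
    while time.time() - last_received[0] < 5:    # stop after 5 s of silence
        sock.sendall(b'server message\n')
        time.sleep(1)                            # send every second

threading.Thread(target=reader, daemon=True).start()
writer()
sock.close()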
Option 2
Use a single thread that manages all keep-alives using select.epoll. This way one thread can handle multiple clients. Remember, though, that if this isn't the only thread that uses the sockets, you might need to handle thread safety yourself.
As discussed in another answer, threads are one common approach. The other approach is to use an event loop and nonblocking I/O. Recent versions of Python (I think starting at 3.4) include a package called asyncio that supports this.
You can call the create_connection method on an event_loop to create an asyncio connection. See this example for a simple server that reads and writes over TCP.
In many cases an event loop can permit higher performance than threads, but it has the disadvantage of requiring most or all of your code to be aware of the event model.
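To make that concrete, here is a hedged asyncio sketch of the pattern from the question (pre-3.7 style API; the 5-second silence rule mirrors the question, the port and payload are made up):

import asyncio

async def handle_client(reader, writer):
    loop = asyncio.get_event_loop()
    last_seen = loop.time()

    async def read_keepalives():
        nonlocal last_seen
        while True:
            data = await reader.read(1024)
            if not data:
                break
            last_seen = loop.time()

    keepalive_task = asyncio.ensure_future(read_keepalives())
    try:
        while True:
            await asyncio.sleep(1)                 # send a message every second
            if loop.time() - last_seen > 5:        # client silent for 5 s: stop sending
                break
            writer.write(b'server message\n')
            await writer.drain()
    finally:
        keepalive_task.cancel()
        writer.close()

loop = asyncio.get_event_loop()
server = loop.run_until_complete(asyncio.start_server(handle_client, '0.0.0.0', 9000))
loop.run_forever()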
I am using Apache ZooKeeper and the Kazoo framework for one of my requirements. I have a simple ZooKeeper cluster set up and a few clients connecting to the server cluster to read node information. I am randomly facing kazoo.exceptions.ConnectionLoss (about once in fifty times).
My concern is: in which situations is this exception raised? Below are the cases I thought of.
The connection to the server was lost
The server didn't respond within the timeout set in the server configuration
Can there be any other reasons for this exception? I don't see any documentation explaining this in detail.
I fear I don't have a ready answer, but looking at the Kazoo code I think this can happen under the following conditions:
Socket read timeout
Socket write timeout
Deserialization failure because of timeout issues
Client creation with a high initial bytes value for the node
I tried to gather this from the Kazoo unit test code (test_connection, test_client).
I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
From researching, I understand that either a heartbeat or an increased connection timeout can be used to solve this. Both of these solutions raise errors when I attempt them. In reading answers to similar posts I've also learned that many changes have been implemented in RabbitMQ since those answers were posted (e.g., the default heartbeat timeout changed from 580 to 60 seconds in RabbitMQ 3.5.5).
When specifying a heartbeat and blocked connection timeout:
import pika

credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters('XXX.XXX.XXX.XXX', port, '/', credentials, blocked_connection_timeout=2000)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
The following error is displayed:
TypeError: __init__() got an unexpected keyword argument 'blocked_connection_timeout'
When specifying heartbeat_interval=1000 in the connection parameters a similar error is shown: TypeError: __init__() got an unexpected keyword argument 'heartbeat_interval'
And similarly for socket_timeout = 1000 the following error is displayed: TypeError: __init__() got an unexpected keyword argument 'socket_timeout'
I am running RabbitMQ 3.6.1, pika 0.10.0 and python 2.7 on Ubuntu 14.04.
Why are the above approaches producing errors?
Can a heartbeat approach be used where there is a long running continuous task? For example, can heartbeats be used when performing large database joins that take 30+ minutes? I am in favour of the heartbeat approach, as it is often difficult to judge how long a task such as a database join will take.
I've read through answers to similar questions
Update: running code from the pika documentation produces the same error.
I've run into the same problem you're seeing in my own systems: dropped connections during very long tasks.
It's possible the heartbeat might help keep your connection alive, if your network setup is such that idle TCP/IP connections are forcefully dropped. If that's not the case, though, changing the heartbeat won't help.
Changing the connection timeout won't help at all. This setting is only used when initially creating the connection.
I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
There are two reasons for this, both of which you have run into already:
Connections drop randomly, even under the best of circumstances
Re-starting a process because of a re-queued message can cause problems
Having deployed RabbitMQ code with tasks ranging from less than a second to several hours, I've found that acknowledging the message immediately and updating the system with status messages works best for very long tasks like this.
You will need to have a system of record (probably with a database) that keeps track of the status of a given job.
When the consumer picks up a message and starts the process, it should acknowledge the message right away and send a "started" status message to the system of record.
As the process completes, send another message to say it's done.
This won't solve the dropped connection problem, but nothing will 100% solve that anyway. Instead, it will prevent the message re-queueing problem from happening when a connection is dropped.
This solution does introduce another problem, though: when the long running process crashes, how do you resume the work?
The basic answer is to use the system of record (your database) status for the job to tell you that you need to pick up that work again. When the app starts, check the database to see if there is work that is unfinished. If there is, resume or restart that work in whatever manner is appropriate.
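As a rough sketch of that pattern (a pika-style callback; update_job_status and run_long_task are hypothetical helpers backed by your system of record):

def on_message(channel, method, properties, body):
    job_id = body.decode()

    # Acknowledge right away so a dropped connection can't requeue the job.
    channel.basic_ack(method.delivery_tag)
    update_job_status(job_id, 'started')      # record "started" in the database

    try:
        run_long_task(job_id)                 # the 30+ minute work happens here
        update_job_status(job_id, 'done')
    except Exception:
        update_job_status(job_id, 'failed')   # so a restart knows to pick it up again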
I've seen this issue before. The reason is that you declared the queue but didn't bind it to the exchange.
for example:
@Bean(name = "test_queue")
public Queue testQueue() {
    return new Queue("test_queue");
}

@RabbitListener(queues = "test_queue_1")
public void listenCreateEvent() {
}
If you listen on a queue that isn't bound to the exchange, this will happen.