Recover from dropped connection in redis pub/sub


I am running a client that connects to a redis db. The client is on a WiFi connection and will drop the connection at times. Unfortunately, when this happens, the program just keeps running without raising any kind of warning.
import redis

r = redis.StrictRedis(host=XX, password=YY...)
ps = r.pubsub()
ps.subscribe("12345")
for items in ps.listen():
    if items['type'] == 'message':
        data = items['data']
Ideally, what I am looking for is a way to catch an event when the connection is lost, try to reestablish the connection, do some error correcting, then get things back up and running. Should this be done in the python program? Should I have an external watchdog?

Unfortunately, one has to 'ping' Redis to check whether it is available. If you try to write a value to Redis, it will raise a ConnectionError exception if the connection is lost. But the listen() generator will not close automatically when the connection is lost.
I think that hacking Redis' connection pool could help, give it a try.
P.S. It is very insecure to connect to redis in an untrusted network environment.

This is an old, old question, but I linked one of my own questions to it and happened to run across it again. It turned out there was a bug in the redis library that caused the client to enter an infinite loop attempting to reconnect if it lost its connection to the redis server. I debugged the issue and PR'd the change. It was merged a long time ago now. Once the bug surfaced, the maintainer also knew of a second location that had the same issue.
This problem shouldn't occur anymore.
To fully answer the question: given how long ago I fixed this, I can't remember which error it is, but there is now a specific error you can catch and reconnect on.
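For what it's worth, here is a rough sketch of what "catch the error and reconnect" could look like. It is not taken from the redis library: listen_with_reconnect, make_pubsub and handle are made-up names, the retry count and delays are arbitrary, and the exception type is passed in as a parameter because the answer above does not say which error is raised.

```python
import time


def listen_with_reconnect(make_pubsub, channel, handle, errors=(ConnectionError,),
                          max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Consume messages from `channel`, resubscribing after a dropped connection."""
    failures = 0
    while True:
        try:
            pubsub = make_pubsub()        # build a fresh pubsub object on every attempt
            pubsub.subscribe(channel)
            for item in pubsub.listen():
                failures = 0              # traffic is flowing again, reset the counter
                if item["type"] == "message":
                    handle(item["data"])
            return                        # listen() ended without an error
        except errors:
            failures += 1
            if failures > max_retries:
                raise                     # give up and let the caller (or a watchdog) decide
            sleep(base_delay * 2 ** (failures - 1))   # exponential backoff
```

With redis-py you would presumably call it as listen_with_reconnect(lambda: r.pubsub(), "12345", handle, errors=(redis.exceptions.ConnectionError,)), assuming that is the exception your version raises. Keep in mind that pub/sub does not replay anything: messages published while the client was disconnected are gone, so the "error correcting" step has to get them from somewhere else.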

Related

Python Mysql-Connector. Which is better connection.close() or connection.disconnect() or connection.shutdown()

I have a question and I hope that someone could help me.
To give you some context, imagine a loop like this:
import mysql.connector

while True:
    conn = mysql.connector.connect(**args)  # args without specifying a pool name
    cursor = conn.cursor()
    cursor.execute(something)
    conn.commit()
    cursor.close()
    # at this point, which is better:
    conn.close()
    # or
    conn.disconnect()
    # or
    conn.shutdown()
In my case, I'm using conn.close(), but after the script has been running for a long time I always get this error:
mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query
Apparently I'm exceeding the timeout of the MySQL connection, which is 8 hours by default. But looking at the loop, it creates and closes a new connection on each iteration. I'm pretty sure that the cursor execution takes no more than an hour.
So the question is: doesn't the close() method close the connection? Should I use disconnect() or shutdown() instead? What are the differences between them?
I hope I've explained myself well, best regards!

There might be a problem inside your code.
Normally, close() will work every time, even if you are using a loop.
Still, try each of those three commands and see which one suits your code.

The docs say it clearly:
close() is a synonym for disconnect().
For a connection obtained from a connection pool, close() does not
actually close it but returns it to the pool and makes it available
for subsequent connection requests
disconnect() tries to send a QUIT command and close the socket. It raises no exceptions. MySQLConnection.close() is a synonymous method name and more commonly used.
To shut down the connection without sending a QUIT command first, use
shutdown().
For shutdown():
Unlike disconnect(), shutdown() closes the client connection without
attempting to send a QUIT command to the server first. Thus, it will
not block if the connection is disrupted for some reason such as
network failure.
But I cannot figure out why you get "Lost connection to MySQL server during query". You may want to check this discussion: Lost connection to MySQL server during query
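To make the difference concrete, here is a small sketch of how the quoted behaviour could be used. The helper names release and run_once are invented for illustration, and treating every exception as a sign of a broken link is an assumption, not something the docs prescribe. It relies only on the close() and shutdown() methods described above.

```python
def release(conn, healthy=True):
    """Close `conn`; skip the QUIT handshake if the link may be broken."""
    if healthy:
        conn.close()      # synonym for disconnect(): tries to send QUIT, then closes the socket
    else:
        conn.shutdown()   # closes the socket without sending QUIT, so it will not block


def run_once(connect, work):
    """Open a connection, run `work(conn)`, and always release the connection."""
    conn = connect()
    healthy = True
    try:
        return work(conn)
    except Exception:
        healthy = False   # assumption: treat any failure as a possibly dead link
        raise
    finally:
        release(conn, healthy)
```

In the loop from the question this would look roughly like run_once(lambda: mysql.connector.connect(**args), do_query), where do_query is whatever function creates the cursor, executes and commits.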

How to close a SolrClient connection?

I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
from SolrClient import SolrClient

def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know whether the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/

The connections are not kept around indefinitely. The standard timeout for any persistent http connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of connections being kept alive exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. solrclient uses a requests session internally, so it should reuse the underlying connection for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead, then request an available client instead of creating a new one each time.
However, I'm pretty sure you won't run into any issues with the default settings.
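If you do want to go the pool route, a minimal sketch could look like the following. ClientPool is a made-up helper built on the standard library only, it is not part of SolrClient, and the pool size is arbitrary.

```python
import queue
from contextlib import contextmanager


class ClientPool:
    """Hand out a fixed set of pre-built clients, one borrower at a time."""

    def __init__(self, make_client, size=4):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_client())

    @contextmanager
    def client(self, timeout=None):
        c = self._idle.get(timeout=timeout)   # blocks until a client is free
        try:
            yield c
        finally:
            self._idle.put(c)                 # always hand the client back
```

You would build it once, for example pool = ClientPool(lambda: SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))), and then write "with pool.client() as solr:" around each solr.get(...) call instead of constructing a new client per document.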

RabbitMQ closes connection when processing long running tasks and timeout settings produce errors

I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.
From my research, I understand that either a heartbeat or an increased connection timeout can be used to solve this. Both of these solutions raise errors when I attempt them. In reading answers to similar posts I've also learned that many changes have been made to RabbitMQ since the answers were posted (e.g. the default heartbeat timeout has changed to 60 from 580 prior to RabbitMQ 3.5.5).
When specifying a heartbeat and blocked connection timeout:
import pika

credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters('XXX.XXX.XXX.XXX', port, '/', credentials, blocked_connection_timeout=2000)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
The following error is displayed:
TypeError: __init__() got an unexpected keyword argument 'blocked_connection_timeout'
When specifying heartbeat_interval=1000 in the connection parameters a similar error is shown: TypeError: __init__() got an unexpected keyword argument 'heartbeat_interval'
And similarly for socket_timeout = 1000 the following error is displayed: TypeError: __init__() got an unexpected keyword argument 'socket_timeout'
I am running RabbitMQ 3.6.1, pika 0.10.0 and python 2.7 on Ubuntu 14.04.
Why are the above approaches producing errors?
Can a heartbeat approach be used where there is a long running continuous task? For example can heartbeats be used when performing large database joins which take 30+ mins? I am in favour of the heartbeat approach as many times it is difficult to judge how long a task such as database join will take.
I've read through the answers to similar questions.
Update: running code from the pika documentation produces the same error.

I've run into the same problem you are seeing in my own systems: dropped connections during very long tasks.
It's possible the heartbeat might help keep your connection alive, if your network setup is such that idle TCP/IP connections are forcefully dropped. If that's not the case, though, changing the heartbeat won't help.
Changing the connection timeout won't help at all. This setting is only used when initially creating the connection.
"I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued."
There are two reasons for this, both of which you have run into already:
Connections drop randomly, even under the best of circumstances
Re-starting a process because of a re-queued message can cause problems
Having deployed RabbitMQ code with tasks that range from less than a second to several hours, I found that acknowledging the message immediately and updating the system with status messages works best for very long tasks like this.
You will need to have a system of record (probably with a database) that keeps track of the status of a given job.
When the consumer picks up a message and starts the process, it should acknowledge the message right away and send a "started" status message to the system of record.
As the process completes, send another message to say it's done.
This won't solve the dropped connection problem, but nothing will 100% solve that anyways. Instead, it will prevent the message re-queueing problem from happening when a connection is dropped.
This solution does introduce another problem, though: when the long running process crashes, how do you resume the work?
The basic answer is to use the system of record (your database) status for the job to tell you that you need to pick up that work again. When the app starts, check the database to see if there is work that is unfinished. If there is, resume or restart that work in whatever manner is appropriate.
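A bare-bones sketch of that flow, using sqlite3 as a stand-in for the system of record. The table layout, the status strings and all function names here are assumptions made up for illustration. The ack and run_task arguments are plain callables, so nothing in this snippet depends on a particular RabbitMQ client.

```python
import sqlite3


def open_job_store(path=":memory:"):
    """Open (or create) the system of record: one row per job."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.commit()
    return db


def set_status(db, job_id, status):
    """Record the latest status for a job, replacing any earlier one."""
    db.execute("INSERT OR REPLACE INTO jobs (id, status) VALUES (?, ?)", (job_id, status))
    db.commit()


def unfinished_jobs(db):
    """Jobs that were started but never marked done; check this at startup."""
    rows = db.execute("SELECT id FROM jobs WHERE status != 'done' ORDER BY id")
    return [row[0] for row in rows]


def handle_message(db, job_id, ack, run_task):
    ack()                              # acknowledge first, so a dropped connection cannot requeue
    set_status(db, job_id, "started")
    run_task(job_id)                   # the long-running work
    set_status(db, job_id, "done")
```

In a pika consumer callback, ack would be something like lambda: channel.basic_ack(delivery_tag=method.delivery_tag). On startup, loop over unfinished_jobs(db) and resume or restart each one as appropriate.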

I've already seen this issue. The reason is that you declare the queue, but you don't bind the queue to the exchange.
For example:
@Bean(name = "test_queue")
public Queue testQueue() {
    return queue("test_queue");
}

@RabbitListener(queues = "test_queue_1")
public void listenCreateEvent() {
}
If you listen on a queue that isn't bound to the exchange, this will happen.

insufficient data in "D" message

I'm using SQLAlchemy scoped sessions to work with a postgresql 9.4 database.
Sometimes I get an error that says "DatabaseError: (DatabaseError) insufficient data in "D" message". I cannot reproduce this error and it happens in an unpredictable way.
After looking at the postgres log files, I see that this error occurs shortly after postgresql logs "could not receive data from client: Connection reset by peer". I guess that means that the connection was cut from the application side. But I don't see anything that could cause this.

It's time to break out your network tools. You have errors on both ends that suggest something caused your connection to drop.
It might be hardware, drivers, some bug in your software stack, or a proxy / firewall deciding it didn't like the look of your connection and killing it. It's unlikely to be PostgreSQL itself or any of your Python code.
Fire up tcpdump or wireshark and take a look at the packets going back and forth, ideally on both ends of the connection. That should give you a good indication of where the problem is.

MySQLdb execute timeout

Sometimes in our production environment the connection between our service (a python program that uses MySQLdb) and the mysql server is flaky: some packets are lost, some black magic happens, and .execute() of the MySQLdb.Cursor object never returns (or takes a very long time to return).
This is very bad because it wastes service worker threads. Sometimes it exhausts the worker pool and the service stops responding at all.
So the question is: is there a way to interrupt a MySQLdb.Connection.execute operation after a given amount of time?

If the communication is such a problem, consider writing a 'proxy' that receives your SQL commands over the flaky connection and relays them to the MySQL server on a reliable channel (maybe running on the same box as the MySQL server). This way you have total control over failure detection and retrying.

You need to analyse exactly what the problem is. MySQL connections should eventually time out if the server is gone; TCP keepalives are generally enabled. You may be able to tune the OS-level TCP timeouts.
If the database is "flaky", then you definitely need to investigate how. It seems unlikely that the database really is the problem, more likely that networking in between is.
If you are using (some) stateful firewalls of any kind, it's possible that they're losing some of the state, thus causing otherwise good long-lived connections to go dead.
You might want to consider changing the idle timeout parameter in MySQL; otherwise, a long-lived, unused connection may go "stale", where the server and client both think it's still alive, but some stateful network element in between has "forgotten" about the TCP connection. An application trying to use such a "stale" connection will have a long wait before receiving an error (but it should eventually).
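To show what the keepalive tuning mentioned above amounts to, here is a sketch on a plain standard-library socket. It is not wired into MySQLdb, which does not hand you its socket this way. The function name and the timer values are assumptions for illustration, and the three TCP_KEEP* options are platform-specific, so they are applied only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where the platform allows, tighten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # idle: seconds of silence before the first probe
    # interval: seconds between probes
    # probes: unanswered probes before the connection is declared dead
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)   # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With timers like these, a connection that some network element has silently forgotten is reported as dead after roughly idle + interval * probes seconds instead of the much longer OS defaults. For an existing client library, the same effect usually has to come from system-wide settings.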
