Sometimes in our production environment a situation occurs where the connection between a service (a Python program that uses MySQLdb) and the MySQL server is flaky: some packets are lost, some black magic happens, and .execute() on a MySQLdb.Cursor object never returns (or takes a very long time to return).
This is very bad, because it wastes the service's worker threads. Sometimes it exhausts the worker pool and the service stops responding altogether.
So the question is: is there a way to interrupt a MySQLdb execute() call after a given amount of time?
If the communication is such a problem, consider writing a 'proxy' that receives your SQL commands over the flaky connection and relays them to the MySQL server on a reliable channel (maybe running on the same box as the MySQL server). This way you have total control over failure detection and retrying.
You need to analyse exactly what the problem is. MySQL connections should eventually time out if the server is gone; TCP keepalives are generally enabled. You may be able to tune the OS-level TCP timeouts.
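Separately from the OS-level tuning, recent MySQLdb/mysqlclient releases accept timeout options at connect time; a minimal sketch (host, credentials and values here are placeholders, and the availability of read_timeout/write_timeout depends on your driver version):

import MySQLdb

conn = MySQLdb.connect(
    host="db.example.com", user="service", passwd="secret", db="production",
    connect_timeout=5,   # seconds allowed for establishing the connection
    read_timeout=30,     # a blocked read (e.g. a hung execute()) fails after this
    write_timeout=30,    # same bound for writes
)
cursor = conn.cursor()
cursor.execute("SELECT 1")   # raises OperationalError instead of hanging forever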
If the database is "flaky", then you definitely need to investigate how. It seems unlikely that the database really is the problem, more likely that networking in between is.
If you are using stateful firewalls of any kind, it's possible that they're losing some of their state, causing otherwise good long-lived connections to go dead.
You might want to consider changing the idle timeout parameter in MySQL; otherwise, a long-lived, unused connection may go "stale", where the server and client both think it's still alive, but some stateful network element in between has "forgotten" about the TCP connection. An application trying to use such a "stale" connection will have a long wait before receiving an error (but it should eventually).
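For reference, the server-side idle timeout is the wait_timeout variable; a minimal sketch of adjusting it from Python (host, credentials and the 300-second value are assumptions, and whether to raise or lower it depends on what the network elements in between do):

import MySQLdb

conn = MySQLdb.connect(host="db.example.com", user="service",
                       passwd="secret", db="production")
cur = conn.cursor()
# wait_timeout is the idle timeout in seconds; the server default is 28800.
cur.execute("SET SESSION wait_timeout = 300")
# With sufficient privileges it can also be changed for all new connections:
# cur.execute("SET GLOBAL wait_timeout = 300")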
I used ftputil to download a batch of files from an FTP server. It raised the error ftputil.error.FTPIOError: [Errno 60] Operation timed out.
As described in the ftputil documentation,
keep_alive() attempts to keep the connection to the remote server active in order to prevent timeouts from happening. This method is primarily intended to keep the underlying FTP connection of an FTPHost object alive while a file is uploaded or downloaded. This will require either an extra thread while the upload or download is in progress or calling keep_alive from a callback function.
I called keep_alive from a callback function with,
ftp_host.download(source, target, callback=ftp_host.keep_alive)
but it raised ERROR __main__ keep_alive() takes 1 positional argument but 2 were given.
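Presumably the error occurs because download() calls the callback with the transferred data chunk as its argument, while keep_alive() takes no arguments, so a small wrapper that discards the chunk might avoid it:

ftp_host.download(source, target, callback=lambda chunk: ftp_host.keep_alive())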
How do I keep a FTP connection alive?
This isn't directly an answer to your question, but it may help you find an answer to your particular problem yourself. Also, a ticket on the ftputil website is better for help with debugging a problem. That said, I think it's fine to ask on StackOverflow first since you don't know in advance if the problem is a simple one or not. :-)
Since FTP is a stateful protocol, client and server can't send arbitrary commands at any given time. The allowed commands and their possible replies are determined by the state the connection is in. See also the state diagrams in RFC 959.
To work around this limitation, ftputil creates a new FTP connection behind the scenes for each remote file object [1]. With this approach, you can still send commands like chdir or start a download while another is still in progress. However, this means that from the perspective of the server, all these FTP connections that come from a single FTPHost object are independent connections, so each of them can time out at a different moment, depending on the usage pattern of the respective connection.
For example, there was ftputil ticket 141, where presumably the main connection initiated by the FTPHost object timed out while a connection used for downloading was still usable.
In your case, it might be helpful to find out which of the underlying connections is timing out (the initial connection or a connection for a remote file). You can use ftputil.session.session_factory to create factories that have FTP debugging enabled (see the documentation).
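For example, a sketch of such a factory (the ftplib debug output goes to stdout; the host name, credentials and debug level are placeholders):

import ftplib
import ftputil
import ftputil.session

# A session factory whose connections print the full FTP protocol exchange.
debugging_session_factory = ftputil.session.session_factory(
    base_class=ftplib.FTP,
    debug_level=2,
)

with ftputil.FTPHost("host", "user", "password",
                     session_factory=debugging_session_factory) as ftp_host:
    ftp_host.download("remote_file", "local_file")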
Unfortunately, a timeout of 60 seconds is quite short, so there are relatively many chances for timeouts.
Especially given the possibility of timeouts in FTP connections, my advice is to write software for FTP transfers in such a way that you can restart the operation (ideally with a new FTPHost object, for robustness) where it was interrupted by the timeout. So far I haven't been able to come up with a way to universally work around timeouts. In simple cases you may actually be better off using ftplib directly, although ftputil has robustness and latency improvements that ftplib doesn't have. Using ftplib doesn't save you from timeouts, but at least you don't have any "hidden" connections that may make debugging more difficult.
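As a very rough sketch of that restart approach (this simply retries the whole transfer with a fresh FTPHost instead of resuming where it stopped; host, credentials and the retry/backoff values are placeholders):

import time
import ftputil
import ftputil.error

def download_with_retries(host, user, password, source, target, attempts=3):
    for attempt in range(attempts):
        try:
            # A fresh FTPHost per attempt avoids reusing possibly stale connections.
            with ftputil.FTPHost(host, user, password) as ftp_host:
                ftp_host.download(source, target)
            return
        except ftputil.error.FTPIOError:
            time.sleep(10)   # back off a little before the next attempt
    raise RuntimeError("download failed after %d attempts" % attempts)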
[1] That said, if you close a remote file in ftputil, the underlying FTP connection can be reused, provided it hasn't timed out. The library checks for a timeout before it reuses the connection.
The picture regarding timeouts is even more complicated by ftputil caching a lot of information from the server to reduce latency. For example, if you call FTPHost.getcwd(), the current directory is retrieved from a cached attribute, not by sending a PWD command to the server and thereby resetting the timeout. Stat information from directory listings is also usually cached.
After a couple of hours looking for solutions, I got it running without '421 Timeout' errors by calling keep_alive from a separate thread. However, your I/O timeout error was probably caused by connection problems.
import ftputil
from threading import Thread
from time import sleep

fhandle = ftputil.FTPHost('host', 'user', 'pwd')
quitThread = 0

def _thread_keep_alive():
    # Ping the control connection every 25 seconds until the main thread asks us to stop.
    while quitThread == 0:
        print("KEEPALIVE!")
        fhandle.keep_alive()
        sleep(25)

thread = Thread(target=_thread_keep_alive)
thread.start()

# some downloading...

quitThread = 1
fhandle.close()
I am using SolrClient for Python with Solr 6.6.2. It works as expected, but I cannot find anything in the documentation about closing the connection after opening it.
from SolrClient import SolrClient

def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know whether the client closes the connection automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent http connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of connections being kept alive exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. SolrClient uses a requests session internally, so it should reuse the underlying connection for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead and request an available client rather than creating a new one each time; a rough sketch follows.
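A minimal sketch of such a pool, reusing the URL and credentials from the question (the pool size is arbitrary, and queue.Queue keeps it thread-safe):

import queue
from SolrClient import SolrClient

POOL_SIZE = 4
pool = queue.Queue()
for _ in range(POOL_SIZE):
    pool.put(SolrClient('http://localhost:8983/solr',
                        auth=("solradmin", "Admin098")))

def getdocbyid(docidlist):
    solr = pool.get()            # borrow a client from the pool
    try:
        for doc_id in docidlist:
            doc = solr.get('Collection_Test', doc_id=doc_id)
            print(doc)
    finally:
        pool.put(solr)           # hand it back for the next caller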
I'm however pretty sure you won't run into any issues with the default settings.
I am working on a websocket server and am trying to use python twisted + autobahn but I believe I am hitting a memory leak. In fact I was able to reproduce it with the echo code on https://github.com/crossbario/autobahn-python/tree/master/examples/twisted/websocket/echo
The symptom I see is that on the server side the protocol instances are never freed after connection is closed.
I have tried to examine this in various ways - the simplest being to add a print in the __del__ method, the more complex being to examine it with pdb and gc. And yes - I observed the memory use of the process climbing steadily as connections are made and closed over and over.
What I expect to happen is - after onClose completes the protocol instance should go away for good. In fact I have other server implementations based on twisted (but without autobahn websockets) and I have confirmed that's how it works there (Although I use connectionLost instead).
Does anyone have a clue what is happening?
I faced a memory overflow issue with an autobahn WebSocket server that distributed realtime data to clients. The issue, however, was with clients that keep the connection open but are not able to consume the data.
This caused memory to keep accumulating on the server side. I was able to address the issue by finding the variable responsible for holding the buffered data: it's the transport._tempDataBuffer variable from the transport layer in Twisted. Defining a maximum size limit on the buffer and clearing it when full solved the issue for me; a rough sketch follows.
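A sketch of that idea as an autobahn protocol; note that _tempDataBuffer is a private Twisted attribute, so this may break between Twisted versions, the limit is arbitrary, and this variant aborts the connection rather than clearing the buffer:

from twisted.internet import task
from autobahn.twisted.websocket import WebSocketServerProtocol

class BoundedBufferProtocol(WebSocketServerProtocol):

    MAX_BUFFERED_CHUNKS = 10000   # arbitrary cap on queued outgoing chunks

    def onOpen(self):
        # Periodically check how much data is queued for this (possibly slow) client.
        self._buffer_check = task.LoopingCall(self._check_buffer)
        self._buffer_check.start(5.0)

    def _check_buffer(self):
        buffered = getattr(self.transport, "_tempDataBuffer", [])
        if len(buffered) > self.MAX_BUFFERED_CHUNKS:
            # The client is not keeping up; drop it instead of buffering forever.
            self.transport.abortConnection()

    def onClose(self, wasClean, code, reason):
        if getattr(self, "_buffer_check", None) and self._buffer_check.running:
            self._buffer_check.stop()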
Don't know if you are referring to the same issue, see if this helps.
Sometimes our RabbitMQ messaging server requires a restart. Afterwards, however, some consumers listening via a blocking basic_consume call do not consume any messages until they are restarted themselves, nor do they raise any exception.
What is the reason for this, and how might I fix it?
In the ConnectionFactory, please ensure the following property is set to true:
factory.setAutomaticRecoveryEnabled(true);
For more details, please refer to the documentation here.
As I mentioned in my comment, every AMQP client library has a different way to recover connections, and some depend on the developer to do that. There is NO canonical method.
Pika has this example as a starting point for connection recovery. Note that the code is for the unreleased version of Pika (1.0.0). If you're on 0.12.0 you will have to adjust the parameters to the method calls.
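As a very rough outline of the blocking-consumer case with the pika 1.x API (the queue name and reconnect delay are placeholders; the linked example is more complete):

import time
import pika

def on_message(channel, method, properties, body):
    print(body)

while True:
    try:
        connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = connection.channel()
        channel.basic_consume(queue="work", on_message_callback=on_message,
                              auto_ack=True)
        channel.start_consuming()
    except pika.exceptions.AMQPConnectionError:
        # The broker went away (e.g. during a restart); wait, then reconnect.
        time.sleep(5)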
The best way to test and implement connection recovery is to simulate failure conditions and then code for them. Run your application, then kill the beam.smp process (RabbitMQ) to see what happens. If you have a RabbitMQ cluster, use firewall rules to simulate a network partition. Can your application handle that? What happens when you run rabbitmqctl stop_app; sleep 10; rabbitmqctl start_app? Can your app handle that?
Run your application through a TCP proxy like toxiproxy and introduce latency and other non-optimal conditions. Shut down the proxy to simulate a sudden TCP connection close. In each case, code for that failure condition and log the event so that someone can later diagnose what has happened.
I have seen too many developers code for the "happy path" only to have their applications fail spectacularly in production with zero ability to determine the source of the failure.
I have a Python test program for testing features of another software component, let's call the latter the component under test (COT).
The Python test program is connected to the COT via a persistent TCP connection.
The Python program is using the Python socket API for this.
Now, in order to simulate a failure of the physical link, I'd like to have the Python program shut the socket down, but without performing a proper disconnect.
I.e. I don't want anything to be sent on the TCP channel any more, including any TCP SYN/ACK/FIN. I just want the socket to go silent. It must not respond to the remote packets any more.
This is not as easy as it seems, since calling close on a socket will send TCP FIN packets to the remote end (a graceful disconnection).
So how can I kill the socket without sending any packets out?
I cannot shut down the Python program itself, because it needs to maintain other connections to other components.
For information, the socket runs in a separate thread. So I thought of abruptly killing the thread, but this is also not so easy. (Is there any way to kill a Thread?)
Any ideas?
You can't do that from a userland process, since the in-kernel network stack still holds resources and state related to the given TCP connection. Even if you kill your whole process, the kernel is going to send a FIN to the other side, since it knows what file descriptors your process had and will try to clean them up properly.
One way to get around this is to engage firewall software (on the local or an intermediate machine). Call a script that tells the firewall to drop all packets from/to the given IP and port (that, of course, requires appropriate administrative privileges); a rough sketch follows.
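For instance, a sketch that shells out to iptables to blackhole a single peer (needs root, and the exact rules depend on your firewall setup):

import subprocess

def blackhole(remote_ip, remote_port):
    # Silently drop everything to/from this peer, so the local stack can no
    # longer send or answer anything (no FIN, no RST, no ACK).
    rules = [
        ["iptables", "-A", "INPUT", "-p", "tcp",
         "-s", remote_ip, "--sport", str(remote_port), "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-p", "tcp",
         "-d", remote_ip, "--dport", str(remote_port), "-j", "DROP"],
    ]
    for rule in rules:
        subprocess.check_call(rule)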
Contrary to Nikolai's answer, there is indeed a way to reset the connection from userland such that an RST is sent and pending data discarded, rather than a FIN after all the pending data. However as it is more abused than used, I won't publish it here. And I don't know whether it can be done from Python. Setting one of the three possible SO_LINGER configurations and closing will do it. I won't say more than that, and I will say that this technique should only be used for the purpose outlined in the question.