Python Socket Client Disappears, Server Can Not Tell

I'm going crazy writing a little socket server in python. Everything was working fine, but I noticed that in the case where the client just disappears, the server can't tell. I simulate this by pulling the ethernet cable between the client and server, close the client, then plug the cable back in. The server never hears that the client disconnected and will wait forever, never allowing more clients to connect.
I figured I'd solve this by adding a timeout to the read loop so that it would try and read every 10 seconds. I thought maybe if it tried to read from the socket it would notice the client was missing. But then I realized there really is no way for the server to know that.
So I added a heartbeat. If the server goes 10 seconds without reading, it will send data to the client. However, even this succeeds (meaning it doesn't throw any kind of exception). So I am able to both read from and write to a client that isn't there any more. Is there any way to know that the client is gone without implementing some kind of challenge/response protocol between the client and server? That would be a breaking change in this case and I'd like to avoid it.
Here is the core of my code for this:
    def _loop(self):
        command = ""
        while True:
            socket, address = self._listen_socket.accept()
            self._socket = socket
            self._socket.settimeout(10)
            socket.sendall("Welcome\r\n\r\n")
            while True:
                try:
                    data = socket.recv(1)
                except timeout: # Went 10 seconds without data
                    pass
                except Exception as e: # Likely the client closed the connection
                    break
                if data:
                    command = command + data
                    if data == "\n" or data == "\r":
                        if len(command.strip()) > 0:
                            self._parse_command(command.strip(), socket)
                        command = ""
                    if data == '\x08':
                        command = command[:-2]
                else: # Timeout on read
                    try:
                        self._socket.sendall("event,heartbeat\r\n") # Send heartbeat
                    except:
                        self._socket.close()
                        break
The sendall for the heartbeat never throws an exception and the recv only throws a timeout (or another exception if the client properly closes the connection under normal circumstances).
Any ideas? Am I wrong that sending to a client that doesn't ACK should eventually generate an exception? (I've tested for several minutes.)

The behavior you are observing is the expected behavior for a TCP socket connection. In particular, in general the TCP stack has no way of knowing that an ethernet cable has been pulled or that the (now physically disconnected) remote client program has shut down; all it knows is that it has stopped receiving acknowledgement packets from the remote peer, and for all it knows the packets could just be getting dropped by an overloaded router somewhere and the issue will resolve itself momentarily. Given that, it does what TCP always does when its packets don't get acknowledged: it reduces its transmission rate and its number-of-packets-in-flight limit, and retransmits the unacknowledged packets in the hope that they will get through this time.
Assuming the server's socket has outgoing data pending, the TCP stack will eventually (i.e. after a few minutes) decide that no data has gone through for a long-enough time, and unilaterally close the connection. So if you're okay with a problem-detection time of a few minutes, the easiest way to avoid the zombie-connection problem is simply to be sure to periodically send a bit of heartbeat data over the TCP connection, as you described. When the TCP stack tries (and repeatedly fails) to get the outgoing data sent-and-acknowledged, that is what eventually will trigger it to close the connection.
If you want something quicker than that, you'll need to implement your own challenge/response system with timeouts (either over the TCP socket, or over a separate TCP socket, or over UDP), but note that in doing so you are likely to suffer from false positives yourself (e.g. you might end up severing a TCP connection that was not actually dead but only suffering from a temporary condition of lost packets due to congestion). Whether or not that's a worthwhile tradeoff depends on what sort of program you are writing. (Note also that UDP has its own issues, particularly if you want your system to work across firewalls, etc.)

Related

Python - Read remaining data from socket after TCP RST

I'm implementing a file transfer protocol with the following use case:
The server sends the file chunk by chunk inside several frames.
The client might cancel the transfer: for this, it sends a message and disconnects at TCP level.
What happens in that case on the server side (Python running on Windows) is that I catch a ConnectionResetError (this is normal, the client has disconnected the socket) while sending the data to the client. I want to read the last data sent by the client (the message used to abort the call), but calling mysocket.recv() still raises a ConnectionResetError.
With a Wireshark capture, I can clearly see that the message was properly sent by the client prior to the TCP disconnection.
Any ideas, folks? Thanks!
VR

In order to understand what to do about this situation, you need to understand how a TCP connection is closed (see, e.g. this) and how the socket API relates to a clean shutdown (without fail, see this).
Your client is most likely calling close to terminate the connection. The problem with this is that there may be unread data in the socket receive queue or data arriving shortly from the other end that you will no longer be able to read, which is basically an error condition. To signal to the other end that data sent cannot be delivered to the receiving application, a reset is sent (well, technically, "SHOULD be sent" as per the RFC) and the TCP connection is abnormally terminated.
You might think that enabling SO_LINGER will help (many, many bits have been spilt over this so I won't elaborate further), but it won't solve the problem of unread data by the client causing the reset.
The client needs to instead call shutdown(SHUT_WR) to indicate that it is done sending, and then continue to call recv() until it reads 0 bytes indicating the other side is done sending. You may then call close().
Note that the Python 2 socket documentation states:
"Depending on the platform, shutting down one half of the connection can also close the opposite half (e.g. on Mac OS X, shutdown(SHUT_WR) does not allow further reads on the other end of the connection)."
This sounds like a bug to me. To get around this, you would have to send your cancel message, then keep reading until you get 0 bytes so that you know the server received the cancel message. You may then close the socket.
The Python 3.8 docs make no such disclaimer.

How to know the status of tcp connect in python?

In Python, a TCP connect returns success even though the connection request is still queued at the server end. Is there any way for the client to know whether the server has called accept or the connection is still in the queue?

The problem is not related to Python but is caused by the underlying socket machinery, which does its best to hide low-level network events from the program. The best I can imagine would be to try a higher-level protocol handshake (send a hello string and set a timeout for receiving the answer), but it could not distinguish between the following cases:
- the connection is queued on the peer and not yet accepted
- the connection has been accepted, but for some other reason the server could not process it in the allotted time
- (only if the timeout is very short) congestion on the machines (including the sender) and the network added a delay greater than the timeout
My advice is simply not to worry about such low-level details. As problems can arise on the server side after the connection has been accepted, you will have to deal with possible higher-level protocol errors, timeouts or connection loss anyway. Just consider that there is no difference between a timeout after the connection has been accepted and a timeout while waiting for it to be accepted.

If connect returns and there is no error, the TCP 3-Way Handshake has taken place successfully.
1. Client: connect sends a SYN (and blocks)
2. Server: (blocking on accept) sends a SYN,ACK
3. Client: connect sends an ACK
After step 3, connect gives control back to you on the client side and accept also gives control back to the caller on the server side.
Of course, if the server is fully loaded, there is no guarantee that the wake-up of accept means actual processing of the request, but the fact that connect has woken up and returned with no error is a guarantee of having successfully set-up the TCP connection.
Packets can be sent.
For a good explanation see for example:
https://www.ibm.com/developerworks/aix/library/au-tcpsystemcalls/index.html
and head to the section "The 3-way TCP handshake".
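A small localhost experiment makes the question's observation concrete. In this sketch the listener deliberately never calls accept(), yet the client's connect still returns, because the operating system completes the handshake for a listening socket and parks the connection in the backlog. The variable names are only for illustration, and port 0 is used so the OS picks a free port.

```python
import socket

# Listening socket that deliberately never calls accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

# connect() returns once the kernel-level handshake is done.
client = socket.create_connection(("127.0.0.1", port), timeout=2.0)
connected_without_accept = client.getpeername() == ("127.0.0.1", port)

# Even a send succeeds: the bytes are buffered, still without any accept().
sent = client.send(b"hello")

client.close()
listener.close()
```

So from the client's point of view, "connected" only means the TCP connection exists; whether the server application has picked it up can only be learned from an application-level reply.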

Keeping python sockets alive in event of connection loss

I'm trying to make a socket connection that will stay alive in the event of connection loss. So basically I want to keep the server always open (and preferably the client too) and restart the client after the connection is lost. But if one end shuts down, both ends shut down. I simulated this by having both ends on the same computer ("localhost") and just clicking the X button. Could this be the source of my problems?
Anyway my connection code
m.connect(("localhost", 5000))
is in a if and try and while e.g.
    while True:
        if tryconnection:
            # Error handling
            try:
                m.connect(("localhost", 5000))
                init = True
                tryconnection = False
            except socket.error:
                init = False
                tryconnection = True
And at the end of my code I just do a m.send("example") when I press a button, and if that returns an error the code that tries to connect to "localhost" starts again. The server is a pretty generic server setup with a while loop around the x.accept(). So how do I keep them both alive when the connection closes so they can reconnect when it opens again? Or is my code all right, and it's just that simulating on the same computer is messing with it?

I'm assuming we're dealing with TCP here since you use the word "connection".
It all depends on what you mean by "connection loss".
If by connection loss you mean that the data exchanges between the server and the client may be suspended/unresponsive (important: I did not say "closed" here) for a long amount of time, seconds or minutes, then there's not much you can do about it, and it's fine like that because the TCP protocol has been carefully designed to handle such situations gracefully. The timeout before deciding that one side or the other is definitely down, giving up, and closing the connection is veeeery long (minutes). Example of such a situation: the client is your smartphone, connected to some server on the web, and you enter a long tunnel.
But when you say: "But if one end shuts down both ends shut down. I simulated this by having both ends on the same computer localhost and just clicking the X button", what you are doing is actually closing the connections.
If you abruptly terminate the server: the TCP/IP implementation of your operating system will know that there is no longer a process listening on port 5000, and will cleanly close all connections to that port. In doing so, a few TCP segments will be exchanged with the client side(s) (it's a TCP 4-way teardown or a reset), and all clients will be disconnected. It is important to understand that this is done at the TCP/IP implementation level, that is to say by your operating system.
If you abruptly terminate a client, then likewise the TCP/IP implementation of your operating system will cleanly close the connection from its port Y to your server port 5000.
In both cases, at the network level, that would be the same as if you had explicitly (not abruptly) closed the connection in your code.
...and once closed, there's no way you can possibly re-establish those connections as they were before. You have to establish new connections.
If you want to establish these new connections and get the application logic back to the state it was in before, that's another topic. TCP alone can't help you here. You need a higher-level protocol, maybe your own, to implement a stateful client/server application.
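Since a closed connection cannot be revived, a reconnect loop has to create a fresh socket object for every attempt rather than calling connect again on the old one. Here is a minimal sketch of that; connect_with_retry, the attempt count and the delay are arbitrary choices made for illustration.

```python
import socket
import time


def connect_with_retry(address, attempts=5, delay=0.5):
    """Return a connected socket, using a brand-new socket for each attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            # create_connection builds a new socket object on every call.
            return socket.create_connection(address, timeout=2.0)
        except OSError as exc:
            last_error = exc
            time.sleep(delay)
    raise ConnectionError("could not connect to %s:%s" % address) from last_error


if __name__ == "__main__":
    # Throwaway local listener so the sketch can be run on its own.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    conn = connect_with_retry(listener.getsockname())
    print(conn.getpeername())
    conn.close()
    listener.close()
```

When a later send fails, the client would close the broken socket and call connect_with_retry again; restoring any application state on the new connection is up to the higher-level protocol.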

The issue is not related to the programming language, in this case Python. The operating system (Windows or Linux) has the final word regarding how resilient the socket is.

Right way to do TCP connection between python and Qt?

I want to connect two programs via TCP. My main program is written with Qt and needs to talk to another program written in Python. I think about using TCP sockets and Google's protobuf to exchange the messages. In Qt, I use a QTcpSocket that accepts the connection and reads from the stream, as soon as its readyRead-Signal is triggered. In python, I also use a tcp-socket and send messages.
This works very well, as long as neither side is killed. Currently, the Python side is sending messages to the C++ side (socket.send(str(id)+"\n")). After every send, I check for exceptions (connection reset by peer, broken pipe, ...) to see if the message was received.
If I kill the C++ program, the next message sent from the Python client triggers no exception, but is obviously not received. The message after that triggers the exception, but the previous message is lost.
After a bit of experimenting, I found that sending an empty message (socket.send("\n")) after each message solves the problem. I now do
    try:
        s.send(str(id)+"\n");
        s.send("\n")
        sleep(0.5)
    except socket.error,v:
        print "FAILed to send",id,v[0],v[1]
and receive the exception as soon as the C++-Peer is killed (calling s.send(str(id)+"\n\n") however does not help).
Finally, my question is: Is this a reliable way to check if my message was received?
I don't want to switch to UDP as I don't want to implement my own ACK-messages for each message.
This is the first time I have used sockets with Python and C++, and I can't really explain why my approach works, so I'm a bit uncomfortable using it.
Can someone tell me a bit more? I guess that the Python socket expects an ACK for the first send(str(id)+"\n") after sending the send("\n") and then realizes that the pipe is broken. Is this correct?

When a TCP connection is broken by the remote peer, your TCP socket will become ready-for-read, and then when you try to recv() from it, recv() will return 0.
Of course if your sending program is only calling send() (the way your Python program is), then it won't notice what's going on with the socket's recv-side, and you end up with the problem you described.
On the other hand, you don't want to just blindly call recv() either, because if recv() is called and the remote peer hasn't sent any data, recv() will block waiting for data and unless the remote peer ever actually sends some, you'll have a deadlock.
The simplest way to deal with that is to use select() to multiplex your I/O, so that your Python script can know when it's appropriate to call send() and/or recv(). Something like this:
import socket
import select
[...]
    while True:
        socketsToReadFrom = [s]
        if you_still_have_more_data_to_send:
            socketsToWriteTo = [s]
        else:
            socketsToWriteTo = []  # select() needs sequences, not None
        # This select() call will block until there's something to do
        socketsReadyForRead, socketsReadyForWrite, junk = select.select(socketsToReadFrom, socketsToWriteTo, [])
        if s in socketsReadyForRead:
            readBytes = s.recv(1024)
            if len(readBytes) > 0:
                print("Read %i bytes from remote peer!" % len(readBytes))
            else:
                print("Remote peer closed the TCP Connection!!")
                break
        if s in socketsReadyForWrite:
            s.send(some_more_data)
As far as verifying whether your message was received, that's a bit tricky since TCP (and the network stack) do a fair amount of pipelining/buffering. In particular, a successful return from send() only tells you that your data has been handed off to your local TCP stack's outgoing-data buffer; it doesn't mean that the data has arrived at the remote peer already. If you really want a "receipt" that the remote peer has already processed the data, you'll have to have the remote peer send back some kind of acknowledgement. Note that under TCP that level of sophistication is often unnecessary though, since barring a network or hardware failure (or the remote peer closing the TCP connection), you can be fairly sure that the TCP stack will get your data there eventually; e.g. if a packet got dropped, the TCP stack will resend it automatically. Data loss will only occur if the network connectivity stops working for an extended period (e.g. several minutes), at which point the TCP stack will give up and close the TCP connection.

Python doesn't detect a closed socket until the second send

When I close the socket on one end of a connection, the other end gets an error the second time it sends data, but not the first time:
import socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 12345))
server.listen(1)
client = socket.create_connection(("localhost",12345))
sock, addr = server.accept()
sock.close()
client.sendall("Hello World!") # no error
client.sendall("Goodbye World!") # error happens here
I've tried setting TCP_NODELAY, using send instead of sendall, checking the fileno(), I can't find any way to get the first send to throw an error or even to detect afterwards that it failed. EDIT: calling sock.shutdown before sock.close doesn't help. EDIT #2: even adding a time.sleep after closing and before writing doesn't matter. EDIT #3: checking the byte count returned by send doesn't help, since it always returns the number of bytes in the message.
So the only solution I can come up with if I want to detect errors is to follow each sendall with a client.sendall("") which will raise an error. But this seems hackish. I'm on a Linux 2.6.x so even if a solution only worked for that OS I'd be happy.

This is expected, and it is how the TCP/IP APIs are implemented (so it's similar in pretty much all languages and on all operating systems).
The short story is, you cannot do anything to guarantee that a send() call returns an error directly if that send() call somehow cannot deliver data to the other end. send/write calls just deliver the data to the TCP stack, and it's up to the TCP stack to deliver it when it can.
TCP is also just a transport protocol. If you need to know whether your application "messages" have reached the other end, you need to implement that yourself (some form of ACK) as part of your application protocol - there's no free lunch.
However, if you read() from a socket, you can get notified immediately when an error occurs, or when the other end has closed the socket - you usually need to do this in some form of multiplexing event loop (that is, using select/poll or some other I/O multiplexing facility).
Just note that you cannot read() from a socket to learn whether the most recent send/write succeeded. Here are a few cases showing why (but it's the cases one doesn't think about that always get you):
- Several write() calls got buffered up due to network congestion, or because the TCP window was closed (perhaps a slow reader), and then the other end closes the socket or a hard network error occurs. You can't tell whether it was the last write that didn't get through, or a write you did 30 seconds ago.
- A network error occurs, or a firewall silently drops your packets (no ICMP replies are generated). You will have to wait until TCP times out the connection to get an error, which can take many seconds, usually several minutes.
- TCP is busy doing retransmissions as you call send - maybe those retransmissions generate an error (really the same as the first case).
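For the simple case in the question, where the other end has already closed the socket, a read-side check before sending might look like the sketch below. peer_has_closed is a hypothetical helper name; it only reports closes that the local TCP stack already knows about and, for the reasons listed above, it says nothing about whether earlier sends were delivered.

```python
import select
import socket


def peer_has_closed(sock):
    """Non-blocking check: True if the peer closed or the connection failed."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False                    # nothing pending; looks open so far
    try:
        # MSG_PEEK leaves any real data in the buffer for the normal read path.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True


if __name__ == "__main__":
    ours, theirs = socket.socketpair()
    print(peer_has_closed(ours))        # peer still open
    theirs.close()
    print(peer_has_closed(ours))        # peer has closed
```

Calling this right before client.sendall(...) would catch the closed socket from the example in the question on the first send rather than the second.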

As per the docs, try calling sock.shutdown() before the call to sock.close().
