Avoiding TCP/IP connection hanging

I am communicating with an instrument via TCP/IP using the Python socket package.
The program sends a command to the instrument to perform an action, and then repeatedly sends another "check" command until it receives a "done" reply. However, after many loops, the program hangs while waiting for a "done" reply.
I have circumvented this problem by using the recv_timeout() function below, which returns no data if the socket is hanging, then I close the connection with socket.close() and reconnect.
Is there a more elegant solution without having to reboot anything?
There must be a way I can carry on communicating with the instrument without having to restart.
import socket
import time

def recv_timeout(sock, timeout=0.5):
    '''
    code from http://code.activestate.com/recipes/408859/
    '''
    sock.setblocking(0)
    total_data = []
    data = ''
    begin = time.time()
    while 1:
        # if you got some data, then break after wait sec
        if total_data and time.time() - begin > timeout:
            break
        # if you got no data at all, wait a little longer
        elif time.time() - begin > timeout * 2:
            break
        try:
            data = sock.recv(8192)
            if data:
                total_data.append(data)
                begin = time.time()
            else:
                time.sleep(0.1)
        except:
            pass
    return ''.join(total_data)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('555.555.55.555', 23))
for action_num in range(0, 1000):
    sock.sendall(('performaction %s \r' % action_num).encode())
    while True:
        time.sleep(0.2)
        sock.sendall(('checkdone \r').encode())
        done = recv_timeout(sock)
        if not done:
            print 'communication broken...what should I do?'
            sock.close()
            time.sleep(60)
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect(('555.555.55.555', 23))
        elif done == '1':
            print 'done performing action'
            break
sock.close()

"I have circumvented this problem by using the recv_timeout() function below, which returns no data if the socket is hanging"
Are you certain that the socket will hang forever? What about the possibility that the instrument just sometimes takes more than half a second to respond? (Note that even if the instrument's software is good at responding in a timely manner, that is no guarantee that the response data will actually get to your Python program in a timely manner. For example, if the TCP packets containing the response get dropped by the network and have to be resent, that could cause them to take more than 0.5 seconds to return to your program. You can force that scenario to occur by pulling the Ethernet cable out of your PC for a second or two, and then plugging it back in. You'll see that the response bytes still make it through, just a second or two later (after the dropped packets get resent); that is, if your Python program hasn't given up on them and closed the socket already.)
"Is there a more elegant solution without having to reboot anything?"
The elegant solution is to figure out what is happening to the reply bytes in the fault scenario, and to fix the underlying bug so that the reply bytes no longer get lost. Wireshark can be very helpful in diagnosing where the fault is; for example, if Wireshark shows that the response bytes did enter your computer's Ethernet port, then that is a pretty good clue that the bug is in your Python program's handling of the incoming bytes(*). On the other hand, if the response bytes never show up in Wireshark, then there might be a bug in the instrument itself that causes it to fail to respond sometimes. Wireshark would also show you if the problem is that your Python script failed to send out the "check" command for some reason.
That said, if you really can't fix the underlying bug (e.g. because it's a bug in the instrument and you don't have the ability to upgrade the source code of the software running on the instrument) then the only thing you can do is what you are doing -- close the socket connection and reconnect. If the instrument doesn't want to respond for some reason, you can't force it to respond.
(*) One thing to do is print out the contents of the string returned by recv_timeout(). You may find that you did get a reply, but it just wasn't the '1' string you were expecting.
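As a rough illustration of both suggestions (wait longer than half a second, and look at exactly what came back), here is a minimal Python 3 sketch. The helper name recv_until, the '\r' reply terminator and the 5-second deadline are assumptions made for the example; the instrument's real reply framing is not known from the question.

```python
import socket
import time

def recv_until(sock, terminator=b'\r', timeout=5.0):
    """Read until `terminator` arrives or `timeout` seconds pass.

    Returns whatever bytes were received so far (possibly empty).
    The terminator and the timeout are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    buf = b''
    while terminator not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                      # overall deadline reached
        sock.settimeout(remaining)     # never wait past the deadline
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break                      # nothing arrived in time
        if not chunk:
            break                      # peer closed the connection
        buf += chunk
    return buf

if __name__ == '__main__':
    # Local demonstration with a connected socket pair instead of an instrument.
    a, b = socket.socketpair()
    b.sendall(b'1\r')
    reply = recv_until(a, timeout=1.0)
    print(repr(reply))                 # repr() exposes stray whitespace or extra bytes
    a.close()
    b.close()
```

Printing repr(reply) rather than the bare string makes it obvious when the reply is something like '1\r' or '1 ' instead of the exact '1' the comparison expects.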

Related

Why doesn't my Python socket correctly send data but my serial terminal does?

I have to send some serial commands from a PC to a piece of equipment as part of a bigger application, and no matter what I try, I cannot seem to get Python to send the data correctly. When I send the same commands through Termite, everything works as expected. I've been trying to solve this for days and I'm at a loss as to what I could try next.
The things I did and could rule out as the cause of this issue:
Improper socket configuration: I've made sure to use the same baud rate, bit length, parity and stop bits as Termite (which happen to be the default settings for the socket anyway).
Checking the command's correctness and termination: Within Termite I have to append CRLF as termination, so in Python I just add \r\n at the end of the command. This seems to be identical, as I've checked both commands on the oscilloscope and they are identical, termination characters included. I went further and also used a serial port monitor to compare the two, and they're identical: no missing bits and correct termination.
I've tried adding a leading \r\n to each command (this is what the equipment expects as command termination), thinking there might be some garbage or noise when first sending some data, but to no avail.
Clearing the input buffer: maybe there could be some issue when not reading from a socket that sends some response. I don't need what the equipment sends via serial, since it offers no useful feedback, but I did this anyway and have gotten no results. Before each transmission, I make sure to read all the bytes available.
Making sure Windows does not close my port: This I'm not that sure of. Maybe it still does it after some long time, but so far I've gotten no errors and could always write and receive data, as confirmed by the serial monitor.
Below are some excerpts of socket configuration and command sending:
try:
    self.SPC = serial.Serial(port=connectionData.get("spc_com"), baudrate=115200, timeout=1)
except serial.SerialException as err:
    print(f'SPC Serial Error: {err.strerror}')
    return False

def SPCCommand(self, command: str):
    if not command:
        return
    try:
        self.readAndClearBuffer()
        self.SPC.write(command.encode())
    except serial.SerialException as err:
        print(f'SPC Serial Error: {err.strerror}')
        return

def readAndClearBuffer(self):
    data = ''
    while True:
        try:
            data = self.SPC.read(1024)
            if not data:
                return
            print(data)
        except serial.SerialException as err:
            return

self.SPCCommand("\r\n8000011200000000")
I don't have any idea what else to try. The fact that the commands are identical on the oscilloscope, as well as the serial monitor, leaves me at a loss. Could there be a different issue? Is there anything else I could try?

How to receive data and use it on if () statement?

I have tried to receive data from my connection with this code on Python 2.7:
server = socket(AF_INET, SOCK_STREAM)
server.bind(('0.0.0.0', 21))
server.listen(1)
client , addr = server.accept()
data = client.recv(2048)
When I print the data or send it to another connection, it works. However, I want to add these lines:
if(data == "/disconnect") :
    <disconnect blha blha... you know >
else :
    <print the data and send it back blha blha... >
(I have checked without that if statement, and the disconnect code works nicely.)
The code just falls through to the else branch, so when my client requests to disconnect, the request is handled like an ordinary message (the server doesn't kick him).
What should I do? Thanks!

You have two problems, and you need to fix both of them.
First, a TCP socket is just a stream of bytes. When you do a recv, you're not going to get exactly one message sent by send from the other side, you're going to get whatever's in the buffer at the moment—which could be two messages, or half a message, or anything else. When you're just testing with localhost connections on a computer that isn't heavily loaded, on many platforms, it will do what you're hoping for >99% of the time—but that just makes the problem hard to debug, it doesn't fix it. And as soon as you try to access the same code over the internet, it'll start failing most of the time instead of rarely.
Fortunately, the client appears to be sending messages as text, without any embedded newlines, with a \r\n Windows-style end-of-line between each message. This is a perfectly good protocol; you just have to write the code to handle that protocol on the receive side.
The second problem is that, even if you happen to get exactly one message sent, that message includes the \r\n end-of-line. And '/disconnect\r\n' == '/disconnect' is of course going to be false. So, as part of your protocol handler, you need to strip off the newlines.
As it happens, you could solve both problems by using the makefile method to give you a file object that you can iterate, or readline on, etc., just like you do with a file that you open from disk, which you probably already know how to handle.
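For reference, the makefile route might look roughly like the following Python 3 sketch. The function name serve_lines and the choice to collect messages in a list are illustrative assumptions; only makefile, the \r\n framing and the /disconnect command come from the discussion.

```python
import socket

def serve_lines(conn):
    """Read CRLF-terminated messages from `conn` until '/disconnect' or EOF.

    Returns the ordinary messages that were received before the disconnect.
    """
    handled = []
    # makefile() wraps the socket in a file object, so iteration yields
    # one complete line at a time, however recv() happened to chunk the bytes.
    with conn.makefile('rb') as reader:
        for raw in reader:
            line = raw.rstrip(b'\r\n').decode('utf-8', errors='replace')
            if line == '/disconnect':
                break                  # do disconnect stuff here
            handled.append(line)       # do normal message stuff here
    return handled

if __name__ == '__main__':
    # Local demonstration: the command is deliberately split across two sends.
    a, b = socket.socketpair()
    a.sendall(b'hello\r\n/disc')
    a.sendall(b'onnect\r\nignored\r\n')
    print(serve_lines(b))
    a.close()
    b.close()
```

Because the file object buffers and splits on line endings, the comparison sees the bare command even when it arrives in pieces.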
But it's worth learning how to do this stuff, so I'll show you how to do it manually. The key is that you keep a buffer around, add each recv onto that buffer, and then split it into lines, put any remainder back on the buffer, and process each line as a message. There are more elegant/concise ways to write this, but let's keep it dead simple:
buf = ''
while True:
    data = client.recv(2048)
    buf += data
    lines = buf.split('\r\n')
    buf = lines.pop()
    for line in lines:
        # line is now a single message, with the newline stripped
        if line == "/disconnect":
            pass  # do disconnect stuff
        else:
            pass  # do normal message stuff
That's all you need to get the basics working. But in a real server, you also need some code to handle two other conditions—because clients don't always shut down cleanly. For example, if a client gets disconnected from the internet before it can send a /disconnect message, you don't want to keep spinning and reading nothing forever, you want to treat it as a disconnect.
if not data: means the client has done a clean (at the TCP level) shutdown. So, you need to disconnect and break out of the receive loop.
Depending on your design, it may be legal to shutdown only the send side and wait for a final reply from the server, so you want to make sure you've finished sending whatever you have. (This is common in many internet protocols.)
It may even be legal to not send a final newline before shutting down; if you want to support this, you should check if buf: and if so, treat buf as one last command. (This is not common in protocols, but it is a common bug in clients, so, e.g., many web servers will handle it.)
try:/except Exception as e: will catch all kinds of errors. These errors mean the socket is no longer usable (or that there's a serious error in your code, of course), so you want to handle this by throwing away the connection and breaking out of the receive loop, without first sending any final response or reading any final message.
It's almost always worth logging that e in some way (maybe just print 'Error from', addr, repr(e)), so if you're getting unexpected exceptions you have something to debug.
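Putting the buffer loop and the two extra conditions together, a Python 3 sketch could look like this. The function name receive_messages and the decision to return the collected messages are assumptions for illustration; a real server would act on each message instead of storing it.

```python
import socket

def receive_messages(conn, addr=None):
    """Collect CRLF-delimited messages until '/disconnect', EOF, or an error."""
    buf = b''
    messages = []
    try:
        while True:
            data = conn.recv(2048)
            if not data:
                # Clean TCP-level shutdown: treat any unterminated
                # remainder as one last message, then stop.
                if buf:
                    messages.append(buf.decode('utf-8', errors='replace'))
                break
            buf += data
            lines = buf.split(b'\r\n')
            buf = lines.pop()          # incomplete tail stays in the buffer
            for raw in lines:
                line = raw.decode('utf-8', errors='replace')
                if line == '/disconnect':
                    return messages    # explicit disconnect request
                messages.append(line)
    except Exception as e:
        # The socket is no longer usable; log it and drop the connection.
        print('Error from', addr, repr(e))
    return messages

if __name__ == '__main__':
    a, b = socket.socketpair()
    a.sendall(b'one\r\ntw')
    a.sendall(b'o\r\nlast')            # no final newline before closing
    a.close()
    print(receive_messages(b))
    b.close()
```

Whether an unterminated final message should be accepted is a protocol decision; the sketch accepts it only to show where that check would go.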

Python Sockets Select is hanging - Doing other tasks while waiting for socket data?

I am rather a noob here, but I am trying to set up a script where I can poll a socket, and when no socket data has been sent, a loop continues to run and do other things. I have been playing with several examples I found using select(), but no matter how I organize the code, it seems to stop on or near the server.recv() line and wait for a response. I want to skip out of this if no data has been sent by a client, or if no client connection exists.
Note that this application does not require the server script to send any reply data, if it makes any difference.
The actual application is to run a loop and animate some LEDs (which needs root access to the I/O on a Raspberry Pi). I am going to send this script data from another separate script via sockets that will pass in control parameters for the animations. This way the external script does not require root access.
So far the sending and receiving of data works great; I just can't get the loop to keep spinning in the absence of incoming data. It is my understanding that this is what select() was intended to allow, but the examples I've found don't seem to be working that way.
I have attempted adding server.setblocking(0) a few different places to no avail. (If I understand correctly a non-blocking instance should allow the code to skip over the recv() if no data has been sent, but I may be off on this).
I have based my code on an example here:
http://ilab.cs.byu.edu/python/select/echoserver.html
Here is the server side script followed by the client side script.
Server Code: sockselectserver.py
#!/usr/bin/env python
import select
import socket
import sys

server = socket.socket()
host = socket.gethostname()
port = 20568
size = 1024
server.bind((host,port))
server.listen(5)
input = [server,sys.stdin]
running = 1
while running:
    inputready,outputready,exceptready = select.select(input,[],[])
    for s in inputready:
        if s == server:
            # handle the server socket
            client, address = server.accept()
            input.append(client)
        elif s == sys.stdin:
            # handle standard input
            junk = sys.stdin.readline()
            running = 0
        else:
            # handle all other sockets
            data = s.recv(size)
            if data:
                s.send(data)
            else:
                s.close()
                input.remove(s)
    print "looping"
server.close()
Client Code: skclient.py
#!/usr/bin/python # This is client.py file
import socket # Import socket module
s = socket.socket() # Create a socket object
host = socket.gethostname() # Get local machine name
port = 20568 # Reserve a port for your service.
s.connect((host, port))
data = "123:120:230:51:210:120:55:12:35:24"
s.send(data)
print s.recv(1024)
s.close() # Close the socket when done
What I would like to achieve by this example is to see "looping" repeated forever, then when the client script sends data, see that data print, then see the "looping" resume printing over and over. That would tell me it's doing what is intended, and I can take it from there.
Interestingly enough, when I test this as is, whenever I run the client, I see "looping" printed 3 times on the screen, then no more. I don't fully understand what is happening inside the select, but I'd assume it would only print 1 time.
I tried moving the inputready.. select.select() around to different places but found it appears to need to be called each time, otherwise the server stops responding (for example if it is called once prior to the endless while: loop).
I'm hoping this can be made simple enough that it can be taught to other hacker types in a maker class, so I'm hopeful I don't need to get too crazy with multi-threading and more elaborate solutions. As a last resort I'm considering logging all my parameters to mySQL from the external script then using this script to query them back out of tables. I've got experience there and would probably work, but it seems this socket angle would be a more direct solution.
Any help very much appreciated.

Great news. This was an easy fix, wanted to post in case anyone else needed it. The suggestion from acw1668 above got me going.
Simply added a timeout of "0" to the select.select() like this:
inputready,outputready,exceptready = select.select(input,[],[],0)
This is in the python docs but somehow I missed it. Link here: https://docs.python.org/2/library/select.html
Per the docs:
The optional timeout argument specifies a time-out as a floating point number in seconds. When the timeout argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks.
I tested the same code as above, adding a delay of 5 seconds using time.sleep(5) right after the print "looping" line. With the delay, if no data or client is present the code just loops every 5 seconds and prints "looping" to the screen. If I kick off the client script during the 5 second delay, it pauses and the message is processed the next time the 5 second delay ends. Occasionally it doesn't respond the very next loop, but rather the loop following. I assume this is because the first time through the server.accept is running and the next time through the s.recv() is running which actually exchanges the data.
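For anyone who wants the whole pattern in one place, here is a minimal Python 3 sketch of a single non-blocking polling pass. The helper name poll_once and the idea of returning the received chunks are assumptions for the example; sys.stdin is left out because select() on Windows only accepts sockets.

```python
import select
import socket

def poll_once(server, clients, size=1024):
    """Run one select() pass with timeout 0 and return the data received.

    New connections are accepted and appended to `clients`; closed
    clients are removed. The call returns immediately if nothing is ready.
    """
    received = []
    readable, _, _ = select.select([server] + clients, [], [], 0)
    for s in readable:
        if s is server:
            conn, _addr = server.accept()   # a client is waiting to connect
            clients.append(conn)
        else:
            data = s.recv(size)
            if data:
                received.append(data)
            else:
                s.close()                   # client closed its end
                clients.remove(s)
    return received

if __name__ == '__main__':
    import time
    server = socket.socket()
    server.bind(('127.0.0.1', 0))           # port 0: let the OS pick a free port
    server.listen(5)
    clients = []
    for _ in range(3):
        print('looping', poll_once(server, clients))
        time.sleep(0.1)                     # stand-in for the animation work
    server.close()
```

A main loop could call poll_once(server, clients) once per animation frame and then carry on with the LED update whether or not anything arrived. Accepting a connection and reading its data happen on separate passes, which matches the one-loop lag described above.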

Python Socket Client Disappears, Server Can Not Tell

I'm going crazy writing a little socket server in python. Everything was working fine, but I noticed that in the case where the client just disappears, the server can't tell. I simulate this by pulling the ethernet cable between the client and server, close the client, then plug the cable back in. The server never hears that the client disconnected and will wait forever, never allowing more clients to connect.
I figured I'd solve this by adding a timeout to the read loop so that it would try and read every 10 seconds. I thought maybe if it tried to read from the socket it would notice the client was missing. But then I realized there really is no way for the server to know that.
So I added a heartbeat. If the server goes 10 seconds without reading, it will send data to the client. However, even this is successful (meaning doesn't throw any kind of exception). So I am able to both read and write to a client that isn't there any more. Is there any way to know that the client is gone without implementing some kind of challenge/response protocol between the client and server? That would be a breaking change in this case and I'd like to avoid it.
Here is the core of my code for this:
def _loop(self):
    command = ""
    while True:
        socket, address = self._listen_socket.accept()
        self._socket = socket
        self._socket.settimeout(10)
        socket.sendall("Welcome\r\n\r\n")
        while True:
            try:
                data = socket.recv(1)
            except timeout: # Went 10 seconds without data
                data = ""   # nothing was read, so fall through to the heartbeat
            except Exception as e: # Likely the client closed the connection
                break
            if data:
                command = command + data
                if data == "\n" or data == "\r":
                    if len(command.strip()) > 0:
                        self._parse_command(command.strip(), socket)
                    command = ""
                if data == '\x08':
                    command = command[:-2]
            else: # Timeout on read
                try:
                    self._socket.sendall("event,heartbeat\r\n") # Send heartbeat
                except:
                    self._socket.close()
                    break
The sendall for the heartbeat never throws an exception and the recv only throws a timeout (or another exception if the client properly closes the connection under normal circumstances).
Any ideas? Am I wrong that sending to a client that doesn't ACK should generate an exception eventually? (I've tested for several minutes.)

The behavior you are observing is the expected behavior for a TCP socket connection. In particular, in general the TCP stack has no way of knowing that an ethernet cable has been pulled or that the (now physically disconnected) remote client program has shut down; all it knows is that it has stopped receiving acknowledgement packets from the remote peer, and for all it knows the packets could just be getting dropped by an overloaded router somewhere and the issue will resolve itself momentarily. Given that, it does what TCP always does when its packets don't get acknowledged: it reduces its transmission rate and its number-of-packets-in-flight limit, and retransmits the unacknowledged packets in the hope that they will get through this time.
Assuming the server's socket has outgoing data pending, the TCP stack will eventually (i.e. after a few minutes) decide that no data has gone through for a long-enough time, and unilaterally close the connection. So if you're okay with a problem-detection time of a few minutes, the easiest way to avoid the zombie-connection problem is simply to be sure to periodically send a bit of heartbeat data over the TCP connection, as you described. When the TCP stack tries (and repeatedly fails) to get the outgoing data sent-and-acknowledged, that is what eventually will trigger it to close the connection.
If you want something quicker than that, you'll need to implement your own challenge/response system with timeouts (either over the TCP socket, or over a separate TCP socket, or over UDP), but note that in doing so you are likely to suffer from false positives yourself (e.g. you might end up severing a TCP connection that was not actually dead but only suffering from a temporary condition of lost packets due to congestion). Whether or not that's a worthwhile tradeoff depends on what sort of program you are writing. (Note also that UDP has its own issues, particularly if you want your system to work across firewalls, etc.)
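A challenge/response check over the existing TCP socket could be sketched as follows in Python 3. The function name peer_responds and the ping/pong message pair are hypothetical; the client would have to be changed to answer the challenge, which is exactly the protocol change the question hoped to avoid.

```python
import socket

def peer_responds(sock, timeout=2.0, challenge=b'ping\r\n', expected=b'pong\r\n'):
    """Send `challenge` and report whether `expected` comes back in time.

    The message pair is a made-up example protocol. The timeout applies
    to each individual socket call, not to the exchange as a whole.
    """
    previous = sock.gettimeout()
    sock.settimeout(timeout)
    try:
        sock.sendall(challenge)
        reply = b''
        while len(reply) < len(expected):
            chunk = sock.recv(len(expected) - len(reply))
            if not chunk:
                return False           # peer closed the connection
            reply += chunk
        return reply == expected
    except OSError:
        return False                   # timeout or any other socket error
    finally:
        sock.settimeout(previous)      # restore the caller's timeout

if __name__ == '__main__':
    a, b = socket.socketpair()
    b.sendall(b'pong\r\n')             # simulate a client that answers
    print(peer_responds(a, timeout=0.5))
    print(peer_responds(a, timeout=0.1))   # no answer this time
    a.close()
    b.close()
```

A False result only means no answer arrived within the timeout, so a short timeout trades faster detection for the false positives mentioned above.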

Python doesn't detect a closed socket until the second send

When I close the socket on one end of a connection, the other end gets an error the second time it sends data, but not the first time:
import socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 12345))
server.listen(1)
client = socket.create_connection(("localhost",12345))
sock, addr = server.accept()
sock.close()
client.sendall("Hello World!") # no error
client.sendall("Goodbye World!") # error happens here
I've tried setting TCP_NODELAY, using send instead of sendall, checking the fileno(), I can't find any way to get the first send to throw an error or even to detect afterwards that it failed. EDIT: calling sock.shutdown before sock.close doesn't help. EDIT #2: even adding a time.sleep after closing and before writing doesn't matter. EDIT #3: checking the byte count returned by send doesn't help, since it always returns the number of bytes in the message.
So the only solution I can come up with if I want to detect errors is to follow each sendall with a client.sendall("") which will raise an error. But this seems hackish. I'm on a Linux 2.6.x so even if a solution only worked for that OS I'd be happy.

This is expected, and it is how the TCP/IP APIs are implemented (so it's similar in pretty much all languages and on all operating systems).
The short story is, you cannot do anything to guarantee that a send() call returns an error directly if that send() call somehow cannot deliver data to the other end. send/write calls just deliver the data to the TCP stack, and it's up to the TCP stack to deliver it when it can.
TCP is also just a transport protocol. If you need to know whether your application "messages" have reached the other end, you need to implement that yourself (some form of ACK) as part of your application protocol; there's no free lunch.
However, if you read() from a socket, you can get notified immediately when an error occurs, or when the other end has closed the socket. You usually need to do this in some form of multiplexing event loop (that is, using select/poll or some other I/O multiplexing facility).
Just note that you cannot read() from a socket to learn whether the most recent send/write succeeded. Here are a few cases that show why (but it's the cases one doesn't think about that always get you):
Several write() calls got buffered up due to network congestion, or because the TCP window was closed (perhaps a slow reader), and then the other end closes the socket or a hard network error occurs. You can't tell whether it was the last write that didn't get through, or a write you did 30 seconds ago.
A network error occurs, or a firewall silently drops your packets (no ICMP replies are generated). You will have to wait until TCP times out the connection to get an error, which can take many seconds, usually several minutes.
TCP is busy doing retransmission as you call send, and maybe those retransmissions generate an error (really the same as the first case).
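To illustrate the read-side notification, here is a small Python 3 sketch that checks whether the peer has closed the connection, using select with a zero timeout and a peeking recv. The helper name peer_has_closed is made up for the example. In line with the caveats above, it only reports a close that the local TCP stack has already seen; it says nothing about whether earlier sends were delivered.

```python
import select
import socket

def peer_has_closed(sock):
    """Return True if the peer has closed or reset the connection.

    Uses MSG_PEEK so that pending application data is not consumed.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False                   # nothing pending, no close seen so far
    try:
        # An empty result on a readable socket means end-of-stream.
        return sock.recv(1, socket.MSG_PEEK) == b''
    except ConnectionError:
        return True                    # connection reset or aborted

if __name__ == '__main__':
    import time
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    sock, addr = server.accept()
    print(peer_has_closed(client))     # connection is still open here
    sock.close()
    time.sleep(0.1)                    # give the FIN a moment to arrive
    print(peer_has_closed(client))
    client.close()
    server.close()
```

Calling such a check before each sendall narrows the window but cannot remove it, because the peer may still close between the check and the send.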

As per the docs, try calling sock.shutdown() before the call to sock.close().
