Python TCP Socket Data Sometimes Missing Parts. Socket Overflow?

Short description:
A client sends data to a server over a TCP socket. The data varies in length and consists of strings separated by the delimiter "~~~*~~~".
For the most part it works fine, for a while. After a few minutes, data winds up all over the place. When I tracked the problem down, I found that data ends up in the wrong place because the full message has not been received.
Everything comes into the server script, is split on a different delimiter, "*-New*Data-*", and is then placed into a Queue. This is the code:
Yes, I know the buffer size is huge. No, I don't send data of that size in one go; I was just toying around with it.
class service(SocketServer.BaseRequestHandler):
    def handle(self):
        data = 'dummy'
        #print "Client connected with ", self.client_address
        while len(data):
            data = self.request.recv(163840000)
            #print data
            BigSocketParse = []
            BigSocketParse = data.split('*-New*Data-*')
            print "Putting data in queue"
            for eachmatch in BigSocketParse:
                #print eachmatch
                q.put(str(eachmatch))
            #print data
            #self.request.send(data)
        #print "Client exited"
        self.request.close()
class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    pass
t = ThreadedTCPServer(('',500), service)
t.serve_forever()
I then have a thread running on while not q.empty(): which parses the data by the other delimiter "~~~*~~~"
So this works for a while. An example of the kind of data I'm sending:
2016-02-23 18:01:24.140000~~~*~~~Snowboarding~~~*~~~Blue Hills~~~*~~~Powder 42
~~~*~~~Board Rental~~~*~~~15.0~~~*~~~1~~~*~~~http://bigshoes.com
~~~*~~~No Wax~~~*~~~50.00~~~*~~~No Ramps~~~*~~~2016-02-23 19:45:00.000000~~~*~~~-15
But things started to break, so I took some control data and sent it in a loop. It would work for a while, then results started winding up in the wrong place, and this turned up in my queue:
2016-02-23 18:01:24.140000~~~*~~~Snowboarding~~~*~~~Blue Hills~~~*~~~Powder 42
~~~*~~~Board Rental~~~*~~~15.0~~~*~~~1~~~*~~~http://bigshoes.com
~~~*~~~No Wax~~~*~~~50.00~~~*~~~No Ramps~~~*~~~2016-02-23 19:45:00.000000~~~*~
The trailing "~~-15" has been cut off.
So the exact same data works then later doesn't. That suggests some kind of overflow to me.
The client connects like this:
class Connect(object):
    def connect(self):
        host = socket.gethostname() # Get local machine name
        #host = "127.0.0.1"
        port = 500 # Reserve a port for your service.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        #print('connecting to host')
        sock.connect((host, port))
        return sock
    def send(self, command):
        sock = self.connect()
        #recv_data = ""
        #data = True
        #print('sending: ' + command)
        sock.sendall(command)
        sock.close()
        return
It doesn't wait for a response because I don't want it hanging around waiting for one. But it closes the socket and, as far as I understand, I don't need to flush the socket buffer or anything; it should just clear itself when the connection closes.
Would really appreciate any help on this one. It's driving me a little spare at this point.
Updates:
I'm running this on both my local machine and a pretty beefy server and I'd be pushed to believe it's a hardware issue. The server/client both run locally and sockets are used as a way for them to communicate so I don't believe latency would be the cause.
I've been reading up on the issues with TCP communication, an area where I feel I'll quickly be out of my depth, but I'm starting to wonder if it's not an overflow but just some kind of congestion.
If sendall on the client does not ensure everything is sent, maybe I need some kind of timer/check on the server side to make sure nothing more is coming.

The basic issue is that your:
data = self.request.recv(163840000)
line is not guaranteed to receive all the data at once (regardless of how big you make the buffer).
In order to function properly, you have to handle the case where you don't get all the data at once (you need to track where you are, and append to it). See the relevant example in the Python docs on using a socket:
Now we come to the major stumbling block of sockets - send and recv operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
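To make the "call them again" advice concrete, here is a minimal sketch of a receive loop for the case where the expected length is known in advance. It is written for Python 3 (the question's code is Python 2), and the helper name recv_exact is hypothetical, not part of the socket module.

```python
import socket

def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, calling recv() as often as needed."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than requested
        if not chunk:
            # An empty result means the peer closed the connection early.
            raise ConnectionError("peer closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The same idea applies when the length is not known up front; in that case the loop has to look for a delimiter or a length header instead of counting bytes.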

As mentioned, you are not receiving the full message even though you have a large buffer size. You need to keep receiving until recv returns zero bytes, which signals that the client has closed the connection. You can write your own generator that takes the request object and yields the parts. The nice side is that you can start processing messages while others are still coming in:
def recvblocks(request):
    buf = ''
    while 1:
        newdata = request.recv(10000)
        if not newdata:
            if buf:
                yield buf
            return
        buf += newdata
        parts = buf.split('*-New*Data-*')
        buf = parts.pop()
        for part in parts:
            yield part
But you need a fix on your client also. You should shut down the socket before closing it so that the TCP connection is closed in a timely fashion:
sock.sendall(command)
sock.shutdown(socket.SHUT_RDWR)
sock.close()
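The buffering idea behind the generator can be tried out without any sockets. The sketch below is a Python 3 variant that works on bytes and takes an iterable of chunks in place of a request object; split_stream and the sample chunks are made up for illustration. The chunks are deliberately cut in the middle of a delimiter and in the middle of a record, which is exactly the situation described in the question.

```python
DELIMITER = b"*-New*Data-*"

def split_stream(chunks, delimiter=DELIMITER):
    """Yield complete messages from an iterable of byte chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        parts = buf.split(delimiter)
        buf = parts.pop()          # the last piece may be incomplete, so keep it
        for part in parts:
            yield part
    if buf:
        yield buf                  # whatever was left when the stream ended

# Simulated recv() results: boundaries fall mid-delimiter and mid-record.
chunks = [b"first*-New*Da", b"ta-*second~~~*~", b"~~-15*-New*Data-*"]
messages = list(split_stream(chunks))
print(messages)
```

Splitting each chunk on its own, as the original handler does, would have produced the truncated "~~~*~" record seen in the queue.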

Related

I have trouble understanding the code for socket programming in python

I'm a beginner in the field of sockets and have lately been trying to create a terminal chat app with them. I still have trouble understanding the setblocking and select functions.
"This is the code I have taken from a website I'm reading. In the code, if there is nothing in data, how does that mean the socket has been disconnected? Please also explain what effect setblocking has in the server or the client. I have read somewhere that setblocking allows the program to move on if the data has not been fully received, but I'm not quite satisfied with that explanation. Please explain in simple words."
import select
import socket
import sys
import Queue
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setblocking(0)
server_address = ('localhost', 10000)
server.bind(server_address)
server.listen(5)
inputs = [ server ]
outputs = [ ]
message_queues = {}
while inputs:
    readable, writable, exceptional = select.select(inputs, outputs, inputs)
    for s in readable:
        if s is server:
            connection, client_address = s.accept()
            connection.setblocking(0)
            inputs.append(connection)
            message_queues[connection] = Queue.Queue()
        else:
            data = s.recv(1024)
            if data:
                message_queues[s].put(data)
                if s not in outputs:
                    outputs.append(s)
            else:
                if s in outputs:
                    outputs.remove(s)
                inputs.remove(s)
                s.close()

If there is nothing in data, how does it mean that the socket has been disconnected?
The POSIX specification of recv() says:
Upon successful completion, recv() shall return the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0. …
In the Python interface, a return value of 0 corresponds to a returned buffer of length 0, i.e. nothing in data.
What effect does setblocking have in the server or the client?
setblocking(0) sets the socket to non-blocking mode: if an operation such as accept() or recv() cannot be completed immediately, it fails rather than blocking until it can complete. In the given code this can hardly happen, since the operations are not attempted before they are possible (due to the use of select()). However, the example is bad in that it includes outputs in the select() arguments, resulting in a busy loop, since a socket in outputs is writable most of the time.
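Both behaviours can be observed in a few lines. The following is a small Python 3 sketch using socket.socketpair() as a stand-in for a real client/server connection (an assumption made only to keep the example self-contained): a non-blocking recv() with nothing to read raises BlockingIOError, while a recv() after the peer has closed returns an empty bytes object.

```python
import select
import socket

a, b = socket.socketpair()   # two connected stream sockets, for demonstration only
b.setblocking(False)

# Nothing has been sent yet, so a non-blocking recv() fails immediately.
try:
    b.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

a.sendall(b"hi")
a.close()                    # orderly shutdown by the peer

select.select([b], [], [], 1.0)   # wait until b is readable
first = b.recv(1024)              # the bytes sent before the close
select.select([b], [], [], 1.0)   # readable again: end of stream is pending
second = b.recv(1024)             # empty result means the peer has closed
b.close()
```

Note that "no data yet" and "connection closed" are reported differently: the former as an exception on a non-blocking socket, the latter as an empty result.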

Python TCP Sockets: How to know if a specific connection has sent information

I have a multi-threaded Python 3 application that on thread #1 accepts TCP socket communications. Thread #2 will check all current connections if they have anything to receive, then act accordingly.
So, currently I have a list called all_connections which is a list of accepted socket connection objects.
Using for connection in all_connections: I can loop through all the connection objects. I know I can use conn.recv(256) to check whether there is anything ready to receive on a socket. Will this block the loop until there is something to receive, though? I have set conn.setblocking(1) beforehand, although I'm unsure if this is the best way to get around it:
Here is some example code:
Thread 1
self.all_connections = [] # init a list to hold connection objs
while 1:
    try:
        conn, address = self.socket.accept()
        conn.setblocking(1) # non blocking
    except Exception as e:
        continue
    self.all_connections.append(conn) # Save the connection object
Thread 2
while True:
    for connection in self.all_connections:
        received = connection.recv(256)
    return
So, I'm only interested in connections that have actually sent something, as I will be sending them something back most likely.
I know I can use select.select in order to check if there is anything to receive on the socket, but that wouldn't help me reference the specific connection.

Yes, recv() will block; this is the default behaviour. Calling socket.setblocking(1) actually enables blocking, which is the opposite of what you wanted. setblocking(False) will set non-blocking mode. I/O on non-blocking sockets requires that you use exception handling.
A better way, and you are already headed in the right direction, is to use select(). You do in fact know which socket sent data because select() returns a list of sockets that are available for reading, writing, or that have an error status. You pass to select() a list of the sockets that you are interested in and it returns those that are available for I/O. Here is the function signature:
select(...)
select(rlist, wlist, xlist[, timeout]) -> (rlist, wlist, xlist)
So the code in thread 2 would look something like this:
from select import select
while True:
    rlist, wlist, xlist = select(self.all_connections, [], [])
    for connection in rlist:
        received = connection.recv(256)
The above code only checks for readable sockets in the list of all connections and reads data from those that are ready. The read will not block.
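To see that select() really does identify the specific connection, here is a self-contained Python 3 sketch. The two "clients" are simulated with socket.socketpair(), and the names client1, conn1 and so on are invented for the demonstration; in the real application the conn objects would come from accept().

```python
import select
import socket

# Two hypothetical client connections, simulated with socket pairs.
client1, conn1 = socket.socketpair()
client2, conn2 = socket.socketpair()
all_connections = [conn1, conn2]

client2.sendall(b"ping")     # only the second client sends something

# select() returns the subset of sockets that have data waiting.
rlist, _, _ = select.select(all_connections, [], [], 1.0)
received = {}
for connection in rlist:
    received[connection] = connection.recv(256)
    connection.sendall(b"pong")   # reply on the very socket that had data

reply = client2.recv(256)
for s in (client1, conn1, client2, conn2):
    s.close()
```

Because rlist contains the socket objects themselves, no extra bookkeeping is needed to work out where to send the response.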

Socket with mysterious buffer

I am building a python based interface for pulling data over TCP from an instrument. The datastream comes as specific events, and the timing is not steady: I get bursts of data and then slow periods. They are small data packets, so for simplicity assume they come across as complete packets.
Here is the behavior I get from the socket:
Send Event #1: socket.recv returns event #1
Send Event #2: socket.recv returns event #2
Quickly Send Event #3-50: socket.recv returns only events #3-30 (returns 27 times)
Slowly send Event #51: socket.recv returns event #31
Slowly send Event #52: socket.recv returns event #32
No data is lost. But there is clearly a buffer somewhere that is filled, and the socket is now returning old data. But shouldn't recv just keep returning till that buffer is empty? Instead, it is only returning when it receives a new packet, despite having a buffer of packets built up. Weird!
Here is the essence of the code (this is for non-blocking, I've also done blocking with just recv - same result). For simplicity I stripped all the packet reassembly stuff. I've carefully traced it back to the socket, so I know that is not to blame.
class mysocket:
    def __init__(self,ip,port):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((ip,port))
        self.keepConn = True
        self.socket.setblocking(0)
        threading.Thread(target = self.rcvThread).start()
        threading.Thread(target = self.parseThread).start()
    def rcvThread(self):
        while self.keepConn:
            readable,writable,inError = select([self.socket],[self.socket],[],.1)
            if readable:
                packet = self.socket.recv(4096)
                self.recvqueue.put_nowait(packet)
            try:
                xmitmsg = self.sendqueue.get_nowait()
            except Queue.Empty:
                pass
            else:
                if writable:
                    self.socket.send(xmitmsg)
    def parseThread(self,rest = .1):
        while self.keepConn:
            try:
                output = self.recvqueue.get_nowait()
                eventnumber = struct.unpack('<H',output[:2])
                print eventnumber
            except Queue.Empty:
                sleep(rest)
Why can't I get the socket to dump all the data in its buffer? I can never catch up! This one is too odd. Anybody have pointers?
I'm an amateur but I've really done my homework on this one and am completely baffled.

packet = self.socket.recv(4096)
self.recvqueue.put_nowait(packet)
TCP is a stream-based protocol, not a message-based one. It doesn't preserve message boundaries, meaning you can't expect to have one recv() call per message. If you send data in a burst, Nagle's algorithm may combine the data into one TCP packet.
Your code assumes that each recv() call returns one "packet", and the parse thread prints the first number from each "packet". But recv() doesn't return packets, it returns chunks of data from the TCP stream. These chunks can contain one message or multiple messages or even partial messages. There's no guarantee that the first two bytes are always event numbers.
Typically, reading data from a TCP connection involves calling recv() multiple times and storing the data you get in a buffer. Once you've received an entire message then you remove the appropriate number of bytes from the buffer and process them.
If you have variable-length messages then you need to keep track of message boundaries yourself. TCP doesn't do it for you like UDP does. A common way to do that is to add a header containing the message length to the front of each message.
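As an illustration of the length-header approach, here is a minimal Python 3 sketch. The 4-byte little-endian prefix, the frame() helper and the FrameBuffer class are all assumptions chosen for the example; the instrument in the question will have its own wire format. The received bytes are fed in arbitrary slices to mimic what recv() may return.

```python
import struct

HEADER = struct.Struct("<I")   # 4-byte little-endian length prefix (an arbitrary choice)

def frame(payload):
    """Prefix payload with its length."""
    return HEADER.pack(len(payload)) + payload

class FrameBuffer:
    """Accumulates received bytes and hands back complete messages."""
    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf += data
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break                      # body not fully received yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]            # drop the consumed message
        return messages

# Three messages sent in a burst, arriving in chunks that ignore boundaries.
stream = frame(b"event-3") + frame(b"event-4") + frame(b"event-5")
fb = FrameBuffer()
out = []
for chunk in (stream[:5], stream[5:20], stream[20:]):
    out.extend(fb.feed(chunk))
print(out)
```

One recv() result can therefore yield zero, one, or several messages, which is why queueing each recv() result as a single "packet" falls behind during bursts.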
try:
    xmitmsg = self.sendqueue.get_nowait()
except Queue.Empty:
    pass
else:
    if writable:
        self.socket.send(xmitmsg)
On another note, it looks like this code has a bug. It removes messages from the sendqueue whether or not the socket is writable. If the socket's not writable it'll silently throw away messages.
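One possible way to avoid dropping messages is to check writability before taking anything off the queue. The sketch below is Python 3 (where the module is named queue rather than Queue); send_pending is a hypothetical helper, and it uses sendall() on the assumption that the socket is blocking or the message is small enough to fit in the send buffer.

```python
import queue
import socket

def send_pending(sock, sendqueue, writable):
    """Send one queued message, dequeuing it only if the socket can take it."""
    if not writable:
        return False            # leave the message in the queue for later
    try:
        msg = sendqueue.get_nowait()
    except queue.Empty:
        return False            # nothing to send
    sock.sendall(msg)
    return True
```

With a non-blocking socket and large messages, a fuller solution would also have to keep any unsent remainder and retry it on the next writable event.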

WinSock error #10055

I have a client-server architecture built in Python. Unfortunately, the original design represents each request to the server as one TCP connection, and I have to send requests in large groups (20,000+), so socket error #10055 sometimes occurs.
I've already found out how to handle it in python:
>>> errno.errorcode[10055]
'WSAENOBUFS'
>>> errno.WSAENOBUFS
10055
And I wrote code that handles that error and reconnects (with a small delay, of course, to give the server time to do whatever it has to do):
class MyConnect:
    # __init__ and send are not important here
    def __enter__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Try several reconnects
        for i in range(0,100):
            try:
                self.sock.connect((self.address, self.port))
                break
            except socket.error as e:
                if e.errno == errno.WSAENOBUFS:
                    time.sleep(1)
                else:
                    raise
        return self
    def __exit__(self, type, value, traceback):
        self.sock.close()
# Pseudocode
for i in range(0,20000):
    with MyConnect(ip,port) as c:
        c.send(i)
My questions are:
Is there any "good practice" way to do this?
Is e.errno == errno.WSAENOBUFS multi-platform? If not, how do I make it multi-platform?
Note: I've only tested it on Windows so far; I need it to work on Linux too.

You are clogging your TCP stack with outgoing data and all the connection establishment and termination packets.
If you have to stick to this design, then force each connection to linger until its data has been successfully sent. By default, close() on the socket returns immediately, and further delivery attempts and connection tear-down happen "in the background". You can see that doing so 20000+ times in a tight loop can easily overwhelm the OS network stack.
The following will force your socket close() to hang on for up to 10 seconds trying to deliver the data:
import struct
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
Note that this is not the same as Python's socket.sendall(); that one just passes all the bytes to the kernel.
Hope this helps.
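On the portability question: the errno.WSAENOBUFS name is only defined in Python builds for Windows, so referring to it directly raises AttributeError on Linux, where the corresponding POSIX name is errno.ENOBUFS. A hedged sketch of a check that works on either platform follows (Python 3, where socket.error is an alias of OSError; the helper name is_no_buffer_error is hypothetical):

```python
import errno

# Collect whichever "no buffer space" codes this platform defines.
# getattr() with a default avoids AttributeError where a name is missing.
NO_BUFFER_CODES = {
    code
    for code in (getattr(errno, "WSAENOBUFS", None), getattr(errno, "ENOBUFS", None))
    if code is not None
}

def is_no_buffer_error(exc):
    """True if exc is an OSError reporting exhausted network buffers."""
    return isinstance(exc, OSError) and exc.errno in NO_BUFFER_CODES
```

In the retry loop from the question, the test `e.errno == errno.WSAENOBUFS` could then be replaced by `is_no_buffer_error(e)`.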

What's wrong with this basic python select on windows?

I'm having trouble using select. I just want a way to know which clients are still there to receive data. Here is my code:
import socket, select
server = socket.socket()
server.bind(('localhost',80))
server.listen(1)
answer = "HTTP/1.1 200 OK\r\n"
answer+= "Content-type: text/plain\r\n"
answer+= "Connection: close\r\n"
body = "test msg"
answer+= "Content-length: %d\r\n\r\n" % len(body)
answer+= body
clients = []
while True:
    nextclient,addr = server.accept()
    clients.append(nextclient)
    clients = select.select([],clients,[],0.0)[1]
    for client in clients:
        client.send(answer)
select returns every previously opened socket each time, even if the connection was closed on the other end. This results in error 10053: an established connection was aborted by the software in your host machine.
I thank you in advance for your help.

Your select never blocks.
A time-out value of zero specifies a poll and never blocks.
Also, your listen method's argument is absolutely extreme.
socket.listen(backlog)
Listen for connections made to the socket. The backlog argument specifies the maximum number of queued connections and should be at least 0; the maximum value is system-dependent (usually 5)

As far as I can tell, you never close a socket after writing to it, nor do you remove it from clients.
Besides, you overwrite clients, so your list of clients is lost; some clients will never be processed.
Something like
clients_now = select.select([],clients,[],0.0)[1]
for client in clients_now:
    client.send(answer)
    client.close()
    clients.remove(client)
might help.
BTW, a small blocking timeout of 1 or 10 ms will keep your server responsive while preventing the high CPU load caused by busy-waiting.
BTW2: Maybe you should include your server socket in the select call as well...
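Putting these suggestions together, a loop along the following lines might work. This is a Python 3 sketch under several assumptions: the function name serve and its max_requests parameter are invented so that the example terminates, answer must be bytes, and the 10 ms timeout is just one of the values suggested above.

```python
import select
import socket

def serve(server, answer, max_requests):
    """Answer max_requests connections on a listening socket, then return."""
    pending = []   # accepted sockets that still need a response
    served = 0
    while served < max_requests:
        # Watch the listening socket for new connections and the pending
        # clients for writability; wait up to 10 ms instead of spinning.
        readable, writable, _ = select.select([server], pending, [], 0.01)
        if server in readable:
            client, _addr = server.accept()
            pending.append(client)
        for client in writable:
            try:
                client.sendall(answer)
            except OSError:
                pass               # the peer went away; just drop it
            finally:
                client.close()     # close after writing ...
                pending.remove(client)   # ... and forget the socket
                served += 1
```

Because the listening socket is part of the select() call, accept() is only invoked when a connection is actually waiting, so the loop no longer stalls in accept() while earlier clients wait for their answer.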
