I am trying to send an image frame through a UDP socket with Python 2.7. The current frame I am trying to send is 921600 bytes (640 x 480), and the payload limit for a UDP message is 65507 bytes, so I need to split the message. Here is how I am doing it.
From client.py:
image_string = frame.tostring()  # frame is a multi-dimensional numpy array
message_size = len(image_string)
sock.sendto(str(message_size), (HOST, PORT))  # First send the size
for i in xrange(0, message_size, 65507):  # Split it and send
    sock.sendto(image_string[i:i + 65507], (HOST, PORT))
sock.sendto("\n", (HOST, PORT))  # Mark the end to avoid hanging.
Here is how I am receiving it in server.py; I inserted some prints for debugging.
image_string = ""
data, addr = sock.recvfrom(1024)  # receive image size
message_size = int(data)
print "Incoming image with size: " + data
for i in xrange(0, message_size, 65507):
    data, addr = sock.recvfrom(65507)
    image_string += data.strip()
    print "received part, image is now:", len(image_string)
print "End of image"
So I am reading the message the same way I send it; it checks out in theory, but not in practice. Possibly because of some packet loss, after the client is done sending, the server is still stuck trying to read (blocked).
I know that UDP is unreliable and hard to work with; however, I read that UDP is used in many video streaming applications, so I believe there should exist a solution to this problem, but I cannot find it.
All help is appreciated, thanks.
Edit1: The reason I suspect packet loss is the problem is that every time I run the test, a different amount of the image has arrived before the server hangs.
Edit2: I forgot to mention that I tried different chunk sizes while partitioning; 1024 and 500 bytes revealed no difference (5-20 bytes lost out of 921600). But I should mention that I am sending and receiving on localhost, which should already minimize errors.
I know that UDP is unreliable and hard to work with; however, I read that UDP is used in many video streaming applications, so I believe there should exist a solution to this problem, but I cannot find it.
Those guys can. They design their protocols knowing that data may be lost (or, on the contrary, arrive multiple times) and may arrive out of order, and their protocols/applications expect that.
You cannot simply cut your data into pieces and send them with UDP. You have to form each individual message so that it has a meaning on its own. If it is a "stream", each message has to say where its particular piece of data is located in the stream. When your application receives a message, it can then decide whether that piece can be handled, whether it is obsolete (arrived too late, or has already arrived), whether it should be put aside in the hope that some preceding parts will still arrive, or whether it is so unusable on its own that the application should send a direct request to the sender in order to get things synchronized again.
In case of transferring an image - or a series of images - you could send the offset of the data in each packet, and simply write the received data at that offset into a fixed-size buffer (one that can hold an entire image), rendering the result whenever something arrives. Then the buffer would always contain some image - at least a mixture of several images, or in extremely lucky cases a single, "real" image.
EDIT: an example of 'evaluating' what to do with a packet: besides the offset, the number ('timestamp') of the image could be included too, and then the application could avoid overwriting a newer part of the image with something old, should some packet from the past (re)appear for any reason.
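A minimal sketch of that scheme (the 8-byte header layout, chunk size, and function names are my own illustrative choices, not a standard): each datagram carries the frame number ('timestamp') and the byte offset of its payload, and the receiver writes each payload at its offset into a fixed-size frame buffer, ignoring packets that belong to older frames.

```python
import struct

# Hypothetical 8-byte header: frame number ("timestamp") and byte offset.
HEADER = struct.Struct("!II")
CHUNK = 1024

def make_packets(frame_no, image_bytes, chunk=CHUNK):
    """Split one frame into datagrams that each carry their own position."""
    return [HEADER.pack(frame_no, off) + image_bytes[off:off + chunk]
            for off in range(0, len(image_bytes), chunk)]

def apply_packet(frame_buf, newest_frame, packet):
    """Write a packet into the frame buffer unless it is from an older frame."""
    frame_no, off = HEADER.unpack_from(packet)
    if frame_no < newest_frame:
        return newest_frame                      # stale packet: ignore it
    payload = packet[HEADER.size:]
    frame_buf[off:off + len(payload)] = payload  # overwrite just this region
    return frame_no

# Packets may arrive in any order; the buffer still converges to the image.
image = bytes(range(256)) * 10
frame_buf = bytearray(len(image))
newest = 0
for pkt in reversed(make_packets(1, image)):     # deliberately out of order
    newest = apply_packet(frame_buf, newest, pkt)
assert bytes(frame_buf) == image
```

Because every datagram is self-describing, a lost packet only leaves a stale region of the previous frame in the buffer instead of desynchronizing the whole stream.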
Why do you use the maximum buffer limit? The largest payload you can send through UDP without ever risking IP fragmentation is much smaller; the classic conservative figure is 508 bytes (the 576-byte minimum reassembly buffer minus 60 bytes of IP header and 8 bytes of UDP header). Sending more than the path MTU allows causes fragmentation. If you are concerned about data loss, use TCP.
I have a problem receiving data from the server on the client. I have the following client-side function that attempts to receive data from the server. The data sent by the server using socket.sendall(data) is greater than buff_size, so I need a loop to read all the data.
def receiveAll(sock):
    data = ""
    buff_size = 4096
    while True:
        part = sock.recv(buff_size)
        data += part
        if part < buff_size:
            break;
    return data
The problem is that after the first iteration (reading the first 4096 bytes), in the second the program blocks waiting for more data in part = sock.recv(buff_size). What do I have to do so that recv() can continue reading the missing data? Thank you.
Your interpretation is wrong. Your code reads all the data that it gets from the server. It just doesn't know that it should stop listening for incoming data; it doesn't know that the server has sent everything it had.
First of all, note that these lines
if part < buff_size:
    break;
are very wrong. You are comparing a string to an int (in Python 3.x that would throw an exception). But even if you meant if len(part) < buff_size: it would still be wrong, because there might be a lag in the middle of streaming and you would read a piece smaller than buff_size even though more data is on its way; your code would stop right there.
Also, if your server sends content whose size is an exact multiple of buff_size, the if will never be satisfied and the code will hang on .recv() forever.
Side note: don't use semicolons (;). It's Python.
There are several solutions to your problem, but none of them can be applied correctly without modifying the server side.
As a client you have to know when to stop reading, and the only way to know is for the server to do something special that you can recognize. This is called a communication protocol: you have to add meaning to the data you send and receive.
For example, if you use HTTP, the server sends the header Content-Length: 12345 before the body, so as a client you know that you only need to read 12345 bytes (your buffer doesn't have to be that big, but with that info you know how many times you have to loop before you have read it all).
Some binary protocols may send the size of the content in first 2 or 4 bytes for example. This can be easily interpreted on the client side as well.
An easier solution is this: simply make the server close the connection after it sends all the data. Then you only need to add a check if not part: break in your code.
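A sketch of that easier approach, with a local socketpair() standing in for the real client/server connection (the helper name is mine): the loop stops when recv() returns an empty byte string, which happens exactly when the sender has closed the connection.

```python
import socket

def recv_until_closed(sock, buff_size=4096):
    """Read everything the peer sends, until it closes the connection."""
    chunks = []
    while True:
        part = sock.recv(buff_size)
        if not part:            # b"" means the sender closed: we are done
            break
        chunks.append(part)
    return b"".join(chunks)

# Demo: the "server" end sends more than one buffer's worth, then closes.
server, client = socket.socketpair()
server.sendall(b"x" * 10000)    # larger than buff_size, arrives in pieces
server.close()                  # closing the socket marks the end of the data
assert recv_until_closed(client) == b"x" * 10000
client.close()
```

Note that this only works for a single message per connection; if you need several messages on one connection, you are back to length headers or delimiters.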
Let's say I'm using 1024 as buffer size for my client socket:
recv(1024)
Let's assume the message the server wants to send to me consists of 2024 bytes.
Only 1024 bytes can be received per call on my socket. What happens to the other 1000 bytes?
Will the recv-method wait for a certain amount of time (say 2 seconds) for more data to come and stop working after this time span? (I.e., if the rest of the data arrives after 3 seconds, the data will not be received by the socket any more?)
or
Will the recv-method stop working immediately after having received 1024 bytes of data? (I.e. will the other 1000 bytes be discarded?)
In case 1.) is correct... is there a way for me to determine the amount of time recv should wait before returning, or is it determined by the system? (I.e., could I tell the socket to wait for 5 seconds before it stops waiting for more data?)
UPDATE:
Assume, I have the following code:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((sys.argv[1], port))
s.send('Hello, world')
data = s.recv(1024)
print("received: {}".format(data))
s.close()
Assume that the server sends data of size > 1024 bytes. Can I be sure that the variable "data" will contain all the data (including those beyond the 1024th byte)?
If I can't be sure about that, how would I have to change the code so that I can always be sure that the variable "data" will contain all the data sent (in one or many steps) from the server?
It depends on the protocol. Some protocols like UDP send messages and exactly 1 message is returned per recv. Assuming you are talking about TCP specifically, there are several factors involved. TCP is stream oriented and because of things like the amount of currently outstanding send/recv data, lost/reordered packets on the wire, delayed acknowledgement of data, and the Nagle algorithm (which delays some small sends by a few hundred milliseconds), its behavior can change subtly as a conversation between client and server progresses.
All the receiver knows is that it is getting a stream of bytes. It could get anything from 1 to the fully requested buffer size on any recv. There is no one-to-one correlation between the send call on one side and the recv call on the other.
If you need to figure out message boundaries, it's up to the higher-level protocols to define them. Take HTTP for example. It starts with a \r\n-delimited header that includes a count of the remaining bytes the client should expect to receive. The client knows how to read the header because of the \r\n, and then knows exactly how many bytes are coming next. Part of the charm of RESTful protocols is that they are HTTP-based and somebody else already figured this stuff out!
Some protocols use NUL to delimit messages. Others may have a fixed length binary header that includes a count of any variable data to come. I like zeromq which has a robust messaging system on top of TCP.
More details on what happens with receive...
When you do recv(1024), there are 6 possibilities
There is no receive data. recv will wait until there is receive data. You can change that by setting a timeout.
There is partial receive data. You'll get that part right away. The rest is either buffered or hasn't been sent yet and you just do another recv to get more (and the same rules apply).
There is more than 1024 bytes available. You'll get 1024 of that data and the rest is buffered in the kernel waiting for another receive.
The other side has shut down the socket. You'll get 0 bytes of data. 0 means you will never get more data on that socket. But if you keep asking for data, you'll keep getting 0 bytes.
The other side has reset the socket. You'll get an exception.
Some other strange thing has gone on and you'll get an exception for that.
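Cases 2 and 3 above are why code that needs a fixed amount of data must loop over recv. A small sketch (the helper name is mine, not a stdlib function), again using socketpair() to stand in for a real connection:

```python
import socket

def recv_exactly(sock, nbytes):
    """Loop until exactly nbytes have arrived; recv may return less per call."""
    data = bytearray()
    while len(data) < nbytes:
        part = sock.recv(nbytes - len(data))
        if not part:  # peer closed early (case 4): the data will never come
            raise ConnectionError("socket closed before %d bytes" % nbytes)
        data.extend(part)
    return bytes(data)

# The 2024-byte message from the question: nothing is discarded, the bytes
# beyond the first recv(1024) simply wait in the kernel buffer.
peer, me = socket.socketpair()
peer.sendall(b"A" * 2024)
first = me.recv(1024)                       # at most 1024 bytes
rest = recv_exactly(me, 2024 - len(first))  # the remainder, not lost
assert first + rest == b"A" * 2024
peer.close()
me.close()
```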
I have a really simple code that allows me to send an image from client to server. And it works.
As simple as this:
On the client side...
def sendFile(self):
    image = open(picname, 'rb')  # binary mode, so the bytes are untouched
    data = image.read()
    self.transport.write(data)
On the server side...
def dataReceived(self, data):
    print 'Received'
    f = open("image.png", 'wb')
    f.write(data)
    f.close()
The problem with this is that it only works if the image is up to 4-something kB; it stops working when the image is bigger (at least it doesn't work at 6 kB). Then I see that "Received" is printed more than once, which makes me think the data is being split into smaller chunks. However, even though those chunks reach the server (as I see the repeated prints from dataReceived), the image is corrupted and can't be opened.
I don't know that much about protocols, but I supposed that TCP should be reliable, so packets arriving in a different order or the like shouldn't... happen? I was thinking that maybe Twisted is doing something there that I'm unaware of, and maybe I should use another protocol.
So here is my question. Is there something that I could do now to make it work or I should definitely change to another Protocol? If so...any idea? My goal would be sending a bigger image, maybe the order of hundreds of kB.
This is a variant of an entry in the Twisted FAQ:
Why is protocol.dataReceived called with only part of the data I called transport.write with?
TCP is a stream-based protocol. It is delivering a stream of bytes, which may be broken up into an arbitrary number of fragments. If you write one big blob of bytes, it may be broken up into an arbitrary number of smaller chunks, depending on the characteristics of your physical network connection. When you say that TCP should be "reliable", and that the chunks should arrive in order, you are roughly correct: however, what arrives in order is the bytes, not the chunks.
What you are doing in your dataReceived method is, upon receiving each chunk, opening a file and writing the contents of just that chunk to "image.png", then closing it. If you change it to open the file in connectionMade and close the file in connectionLost you should see at least vaguely the right behavior, although this will still cause you to get corrupted / truncated images if the connection is lost unexpectedly, with no warning. You should really use a framing protocol like AMP; although if you're just sending big blobs of data around, HTTP is probably a better choice.
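A sketch of that connectionMade/connectionLost fix; the Twisted Protocol base class and transport wiring are omitted here so the idea stands alone, and in real code this class would subclass twisted.internet.protocol.Protocol:

```python
class ImageReceiver:
    def connectionMade(self):
        # Open the file once, when the transfer starts.
        self.f = open("image.png", "wb")

    def dataReceived(self, data):
        # Append every TCP fragment as it arrives; TCP guarantees the
        # bytes come in order, even if the chunk boundaries are arbitrary.
        self.f.write(data)

    def connectionLost(self, reason=None):
        # Close only when the connection ends, so the file holds all chunks.
        self.f.close()
```

As the answer says, this still leaves a truncated file if the connection drops mid-transfer; a framing protocol (or HTTP) tells the receiver how many bytes it should expect.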
I am new to socket programming and recently picked up Python for it. I have a few questions in mind which I can't seems to find a definite answer for.
I am looking into sending data over UDP and have written a simple Python script to do just that. It works fine sending small objects (small pickled objects, to be exact) across, but how should I handle objects that are too large to fit in one UDP packet?
I've thought of first sizing up the object in bytes. Nothing needs to be done if the object is small enough to fit in a UDP packet, but if it is too big, it will be split up evenly (if possible) into smaller chunks that fit into multiple UDP packets and sent across to the client. Once the client receives the chunks, it will reassemble them into the original object.
I immediately hit my first brick wall when trying to implement the mentioned above.
From the research I have done, there doesn't seem to be any 'effective' way of getting the byte size of an object. This means I am unable to determine whether an object is too large to fit in a UDP packet.
What happens if I insist on sending a large object across to the client? Will it get fragmented automatically and be reassembled on the client side, or will the packet be dropped by the client?
What is the right way to handle large object over UDP? Keeping in mind that the large object could be a file that is 1GB in size or a byte object that is 25MB in size.
Thanks in advance.
Side Notes:
I do understand that UDP packets may not always arrive in order, and therefore I have already implemented a countermeasure for it, which is to tag a sequence number onto the UDP packets sent out to the client.
I do understand that there is no assurance that the client will receive all of the UDP packets. I am not concerned about packet loss for now.
I do understand that TCP is the right candidate for what I am trying to do but I am focusing on understanding UDP and on how to handle situations where acknowledgement of packets from client is not possible for now.
I do understand the usage of pickle is insecure. Will look into it at later stage.
A UDP packet can carry at most roughly 64 kB of payload (65507 bytes, to be exact). So if you want to send a file larger than that, you have to fragment it yourself into packets no bigger than that. That is the theoretical maximum; my advice is to use smaller chunks of around 500 bytes.
IP is responsible for fragmentation and reassembly of the packets if you do use 64 kB datagrams. Smaller packets of 500 bytes are not likely to be fragmented, because the MTU is usually around 1500 bytes. And if you use larger datagrams that do get fragmented, IP will drop the whole datagram whenever any one of its fragments is lost.
You are right that TCP is probably better for something like this, or even an existing protocol like TFTP; it implements a per-packet ACK mechanism and sequence numbers, just like you did.
Most applications dealing with sockets might store data in memory until it is all sent. Bad idea! I have a product application that has to send very large files over the web, and I have used chunking methods in the past. I just rewrote some of my code in Python (binfileio). In my applications, I wrote chunk files to a reserved folder and, once all chunked files were tucked in bed, reassembled them. I've never trusted sending large files across a wire that could get cut at any time. Hope this helps.
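The chunk-to-folder idea can be sketched like this (the chunk size, file naming, and staging directory are illustrative choices of mine, not binfileio's actual layout):

```python
import os
import tempfile

CHUNK = 4096

def write_chunks(data, folder):
    """Write data as numbered chunk files; returns how many were written."""
    count = 0
    for i in range(0, len(data), CHUNK):
        with open(os.path.join(folder, "part%06d" % count), "wb") as f:
            f.write(data[i:i + CHUNK])
        count += 1
    return count

def reassemble(folder, count):
    """Concatenate the chunk files by number, not by arrival order."""
    out = bytearray()
    for n in range(count):
        with open(os.path.join(folder, "part%06d" % n), "rb") as f:
            out.extend(f.read())
    return bytes(out)

staging = tempfile.mkdtemp()        # the "reserved folder"
payload = os.urandom(10000)
n = write_chunks(payload, staging)
assert reassemble(staging, n) == payload
```

Because every chunk lands on disk, a dropped connection only costs the chunks not yet received, not the whole transfer.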
Python 3
Code for Sending Data Through UDP Communication
import socket
UDP_IP = "127.0.0.1"
UDP_PORT = 5005
MESSAGE = "Hello, World!"
print("UDP target IP:",UDP_IP)
print("UDP target port:",UDP_PORT)
print("message:",MESSAGE)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(bytes(MESSAGE, "utf-8"), (UDP_IP, UDP_PORT))
I've set up a server reusing the code found in the documentation, where I have self.data = self.request.recv(1024).strip(). But how do I go from this to deserializing it into a protobuf message (Message.proto/Message_pb2.py)? Right now it seems that it's receiving chunks of 1024 bytes, and more than one at a time... making it all rubbish :D
TCP is typically just a stream of data. Just because you sent each packet as a unit, doesn't mean the receiver gets that. Large messages may be split into multiple packets; small messages may be combined into a single packet.
The only way to interpret multiple messages over TCP is with some kind of "framing". With text-based protocols, a CR/LF/CRLF/zero-byte might signify the end of each frame, but that won't work with binary protocols like protobuf. In such cases, the most common approach is to simply prepend each message with the length, for example in a fixed-size (4 bytes?) network-byte-order chunk. Then the payload. In the case of protobuf, the API for your platform may also provide a mechanism to write the length as a "varint".
Then, reading is a matter of:
read an entire length-header
read (and buffer) that many bytes
process the buffered data
rinse and repeat
But keeping in mind that you might have (in a single packet) the end of one message, 2 complete messages, and the start of another message (maybe half of the length-header, just to make it interesting). So: keeping track of exactly what you are reading at any point becomes paramount.
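That bookkeeping can be sketched as a small buffering parser (the class name and the fixed 4-byte header are illustrative choices): it accumulates whatever recv() happens to deliver and emits only complete length-prefixed messages, no matter how awkwardly the stream is sliced.

```python
import struct

LEN = struct.Struct("!I")  # 4-byte network-byte-order length header

def frame(msg):
    """Prepend the payload with its length."""
    return LEN.pack(len(msg)) + msg

class FrameReader:
    """Turn an arbitrary byte stream back into whole messages."""
    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk):
        """Add one recv()'s worth of bytes; return any complete messages."""
        self.buf.extend(chunk)
        msgs = []
        while len(self.buf) >= LEN.size:
            (length,) = LEN.unpack_from(self.buf)
            if len(self.buf) < LEN.size + length:
                break                      # header seen, payload incomplete
            msgs.append(bytes(self.buf[LEN.size:LEN.size + length]))
            del self.buf[:LEN.size + length]
        return msgs

# One slice can end one message, contain whole ones, or split a header.
stream = frame(b"first") + frame(b"second") + frame(b"third")
reader = FrameReader()
out = []
for i in range(0, len(stream), 7):         # feed awkward 7-byte slices
    out.extend(reader.feed(stream[i:i + 7]))
assert out == [b"first", b"second", b"third"]
```

The parsed payloads would then go to Message_pb2's ParseFromString, one message at a time.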