Receiving multicast data with socket recvfrom in Python

I have a multicast server sending data that must be captured by a Python client. The problem is that recvfrom either receives no data, or receives the first packet and then seems to cache it. If I call recvfrom in a loop, my data is received correctly.
My question is: why do I need to call recvfrom in a loop to get the expected behavior?
from socket import *
s = socket(AF_INET, SOCK_DGRAM)
s.bind(('172.30.102.141', 12345))
m = s.recvfrom(1024)  # returns a (data, sender_address) tuple
print(m[0])
# sleep for x seconds here
m = s.recvfrom(1024)
print(m[0])
# prints the exact same thing as previously...

One thing is for sure: multicast traffic is made of UDP datagrams, and you have to keep listening for new ones. The same is true of TCP-based communication.
When you use a low-level interface for network communication, as socket is, it is up to both sides to define the application-level protocol.
That means you define how the receiving party concludes that a message is complete. With TCP, a message can be split into multiple parts on its way through the network; with UDP, a message larger than one datagram has to be split by the sender. Either way, the receiving side has to assemble the parts in the proper order and then check that the message is whole. After that you push it up through whatever message-processing pipeline you have on the receiving side.
When using UDP, the receiving side doesn't know whether any packet is on its way, so each recvfrom(1024) call returns a single datagram of up to 1024 bytes and finishes. It doesn't know, and should not care, whether there is more data on its way. It's up to you to take care of that.
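As a rough sketch of "keep listening", the receiver below calls recvfrom in a loop and hands every datagram to a callback. Note the assumptions: the question never shows the multicast group address, so GROUP is a made-up placeholder, and joining the group with IP_ADD_MEMBERSHIP is the usual way to subscribe rather than something the question's code does. The helper names are hypothetical.

```python
import socket
import struct

GROUP = "224.1.1.1"  # hypothetical group address; use the one your sender uses
PORT = 12345         # port taken from the question


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group=GROUP, port=PORT):
    """Create a UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def receive_forever(sock, handle, bufsize=65535):
    """Each recvfrom call returns exactly one datagram, so keep calling it."""
    while True:
        data, sender = sock.recvfrom(bufsize)
        handle(data, sender)
```

A caller would run something like receive_forever(open_receiver(), lambda data, sender: print(sender, data)).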

Related

Can I say that socket.send() "flushed"/"resets" the TCP stream here?

I have a simple server-client program:
In server.py:
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 1234))
server_socket.listen()
connection_socket, address = server_socket.accept()
with connection_socket:
    data = connection_socket.recv(1000)
    connection_socket.send(bytearray([0x0]))
    print(data)
server_socket.close()
And in client.py:
import socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", 1234))
client_socket.send(bytearray([0x0, 0x1, 0x2]))
print(client_socket.recv(1))
client_socket.send(bytearray([0x3, 0x4, 0x5]))
client_socket.close()
Here's what I think is going on:
What I know of the TCP protocol is that it is "stream-based". I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled. This is seemingly interrupted by the send made by the server or the recv made by the client. The following 3 bytes go unreceived.
Are these correct assumptions? If not, what is really going on here?
Thanks in advance for your help!

I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled.
That is wrong. recv blocks until at least one byte is received. The number given only specifies the maximum number of bytes to read; it is neither the exact number nor the minimum.
The following 3 bytes go unreceived.
In this specific case the first 3 bytes are most likely received by a single recv, and the following 3 bytes stay unread because the server calls recv only once. Things are different when larger amounts of data are sent, especially over links with a lower MTU (e.g. a local network or WiFi, as opposed to localhost traffic). There you can see a single recv return only part of the expected data.
Even the assumption that send will send all the given data is wrong: send sends at most the given data. You need to check the return value to see how much was actually sent. Use sendall instead if you want everything to be sent.
Can I say that socket.send() “flushed”/“resets” the TCP stream here?
No. send and recv work only on the socket's write and read buffers. They don't themselves put anything on the wire or take anything off it; the OS does that. A send just puts the data into the socket's write buffer, and the OS will eventually transmit it. The transmission does not always happen immediately, though. If there is outstanding unacknowledged data, sending might be deferred until that data is acknowledged (details depend on the TCP window). If there is only a little data in the buffer, the OS might wait a while for the application to call send with more data in order to keep the transmission overhead low (Nagle's algorithm).
Thus the term "flush" has no real meaning here. And "reset" actually means something completely different with TCP, namely forcibly breaking the connection using the RST flag. So don't use these terms in this context.
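To illustrate the points about recv and sendall, here is a minimal sketch of a helper that keeps calling recv until a known number of bytes has arrived. recv_exact and demo are hypothetical names, and the demo uses socket.socketpair() instead of the question's client and server so that it runs in one process.

```python
import socket


def recv_exact(sock, n):
    """Call recv repeatedly until exactly n bytes have arrived.

    Raises ConnectionError if the peer closes the connection first.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:                 # b"" means the peer closed its side
            raise ConnectionError("peer closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    """Send two 3-byte writes and read them back as one 6-byte block."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(bytes([0, 1, 2]))  # sendall loops until everything is queued
        a.sendall(bytes([3, 4, 5]))
        return recv_exact(b, 6)
```

Whether the six bytes arrive in one recv or two is up to the OS; the loop gives the same result either way.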

Python TCP Duplicate Message

I'm working on an in-house TCP Server, and TCP client. When there is 0% packet loss the Server and Client work fine. However, when I have 20% or more packet loss I am seeing duplicate TCP messages. I am receiving something like this....
Client <-- MessageA -- Server
Client -- MessageB --> Server
Client <-- MessageCMessageA -- Server
Is it possible that MessageA does not completely make it to the Client, it times out, TCP resends it, and then the original message arrives and is received later by the Client?
My question is if TCP works like that, and if that's a possible scenario with a network containing 20% packet loss or more.
Barebones of how the client and server are sending/receiving data...
socket.recv(1024)
socket.send(1024)

No, it is not possible. TCP guarantees that it will either deliver data exactly once and in the order in which it was sent, or signal an error to the application. Hence, there is probably a bug in your code. The most likely is that your code fails to deal with partial reads.
When you perform a write or send on a TCP socket, the TCP module will segment your data into as many packets as are needed. In the presence of packet loss, it is possible that some packets have arrived successfully but others must be resent. In that case, the corresponding read or recv will only receive part of the data — the remaining data will arrive in a subsequent read or recv. In other words, TCP does not preserve message boundaries.
Your code is probably interpreting such a split message as multiple messages. Make sure that you accumulate a full message in your buffer before attempting to parse it.
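One way to accumulate a full message before parsing, sketched under the assumption that messages end with a newline (the question does not say how its messages are delimited, so both the delimiter and the class name are illustrative only):

```python
class LineBuffer:
    """Collects raw TCP chunks and hands back only complete messages."""

    def __init__(self, delimiter=b"\n"):
        self._buffer = bytearray()
        self._delimiter = delimiter

    def feed(self, chunk):
        """Add received bytes and return the messages completed by them."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            index = self._buffer.find(self._delimiter)
            if index < 0:
                break  # the rest is an incomplete message; keep it for later
            messages.append(bytes(self._buffer[:index]))
            del self._buffer[:index + len(self._delimiter)]
        return messages
```

Every chunk returned by recv(1024) would be passed to feed(), and only the returned messages would be parsed. A chunk such as b"MessageC\nMes" then yields one message and keeps the fragment until the rest arrives.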

python: invoking socket.recvfrom() twice

I am writing two python scripts to communicate over UDP using python sockets. Here's the related part of code
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))
sock.setblocking(True)  # I want it to be blocking
#(...)
(msg, addr) = sock.recvfrom(4)
#(...)
(msg2, addr2) = sock.recvfrom(2)
I want the receiving to be blocking and I don't know the size of the whole message before I read the first 4-byte part. The above code blocks on the sock.recvfrom(2) call, whereas a modified version with one sock.recvfrom instead of two works all right:
(msg, addr) = sock.recvfrom(6) #works ok, but isn't enough for my needs
Any idea how I can conveniently read the incoming data in two parts or why the code doesn't work as expected?

socket.recvfrom(size) will (for UDP sockets) read one packet, up to size bytes. Any excess data is discarded. If you want to receive the whole packet, you have to pass a larger bufsize and then process the packet in pieces (instead of trying to receive it in pieces).
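A small sketch of that approach: receive the whole datagram with a generous buffer, then slice it. The 4-byte header size comes from the question; what the header contains is unknown, so the helper only splits the bytes. Both function names are hypothetical.

```python
HEADER_SIZE = 4  # the question reads a 4-byte first part; its layout is not shown


def split_datagram(datagram):
    """Split one already-received datagram into (header, body)."""
    if len(datagram) < HEADER_SIZE:
        raise ValueError("datagram shorter than the header")
    return datagram[:HEADER_SIZE], datagram[HEADER_SIZE:]


def receive_message(sock, max_size=65535):
    """Read one whole UDP datagram from sock, then split it in memory."""
    datagram, addr = sock.recvfrom(max_size)  # one call, one datagram
    header, body = split_datagram(datagram)
    return header, body, addr
```

65535 is simply larger than any UDP payload, so nothing is cut off regardless of how long the message turns out to be.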
If you want a more convenient, less fickle interface to network I/O, consider Twisted.

Read from UDP socket dequeues the whole datagram.

UDP is a message-based protocol. recvfrom reads one entire message as it was originally sent, but if the buffer isn't big enough, it raises an exception on Windows (on Linux the excess bytes are silently discarded instead):
socket.error: [Errno 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself
So I am not sure why you would hang on the second recvfrom if a 6-byte message was originally sent. On Windows you should get an exception on the first recvfrom. Perhaps post an actual working, minimal example of the client and server programs.

Python doesn't detect a closed socket until the second send

When I close the socket on one end of a connection, the other end gets an error the second time it sends data, but not the first time:
import socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 12345))
server.listen(1)
client = socket.create_connection(("localhost",12345))
sock, addr = server.accept()
sock.close()
client.sendall(b"Hello World!")  # no error
client.sendall(b"Goodbye World!")  # error happens here
I've tried setting TCP_NODELAY, using send instead of sendall, checking the fileno(), I can't find any way to get the first send to throw an error or even to detect afterwards that it failed. EDIT: calling sock.shutdown before sock.close doesn't help. EDIT #2: even adding a time.sleep after closing and before writing doesn't matter. EDIT #3: checking the byte count returned by send doesn't help, since it always returns the number of bytes in the message.
So the only solution I can come up with if I want to detect errors is to follow each sendall with a client.sendall(b""), which will raise an error. But this seems hackish. I'm on Linux 2.6.x, so even if a solution only worked for that OS I'd be happy.

This is expected, and it is how the TCP/IP APIs are implemented (so it's similar in pretty much all languages and on all operating systems).
The short story is that you cannot do anything to guarantee that a send() call returns an error directly if that send() call somehow cannot deliver data to the other end. send/write calls just hand the data to the TCP stack, and it's up to the TCP stack to deliver it when it can.
TCP is also just a transport protocol. If you need to know whether your application "messages" have reached the other end, you need to implement that yourself (some form of ACK) as part of your application protocol; there's no free lunch.
However, if you read() from a socket, you get notified immediately when an error occurs or when the other end has closed the socket. You usually need to do this in some form of multiplexing event loop (that is, using select/poll or some other I/O multiplexing facility).
Just note that you cannot read() from a socket to learn whether the most recent send/write succeeded. Here are a few cases that show why (but it's the cases one doesn't think about that always get you):
- Several write() calls got buffered up due to network congestion, or because the TCP window was closed (perhaps a slow reader), and then the other end closes the socket or a hard network error occurs. You can't tell whether it was the last write that didn't get through, or a write you did 30 seconds ago.
- A network error occurs, or a firewall silently drops your packets (no ICMP replies are generated). You will have to wait until TCP times out the connection to get an error, which can take many seconds and usually takes several minutes.
- TCP is busy doing retransmissions as you call send, and those retransmissions may generate an error (really the same as the first case).
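The read-side check mentioned above could look roughly like this. peer_closed is a hypothetical helper: it uses select to see whether the socket is readable and then peeks one byte, so an empty result means the other end has closed. As the cases above explain, it says nothing about whether earlier sends were delivered.

```python
import select
import socket


def peer_closed(sock, timeout=0.0):
    """Return True if the peer has closed its end of the connection.

    MSG_PEEK looks at queued data without consuming it, so a normal
    recv later still sees the same bytes.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing to read yet, so no end-of-stream seen
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True   # for example, connection reset by peer
```

In the question's script, calling peer_closed(client, 1.0) after sock.close() would be expected to report the closure before the first sendall. If unread data is still queued ahead of the close, the helper returns False until that data has been consumed.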

As per the docs, try calling sock.shutdown(socket.SHUT_RDWR) before the call to sock.close().

Split up python packets?

Is there a way Python can distinguish between packets being sent? E.g.:
- Python receives data
- it processes the data
- the client sends the first packet
- the client sends the second packet
- Python receives data; can I receive just the first packet rather than everything in the buffer?
I know I can set it up so that I confirm each piece of data and the client won't send more until I have confirmed that I have processed the last piece, but I'd rather not.
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", 2000))
sock.listen(5)
all the relevant socket data

There are basically two approaches:
- At the start of each packet, send an integer specifying how long that packet will be. When you receive data, read the integer first, then read that many more bytes as the packet.
- Send some sort of special marker between packets. This only works if you can guarantee that the marker cannot occur within a packet.
As S. Lott points out, you could use UDP (which is packet-based) instead of TCP (which is stream-based), but then you give up the other features that TCP provides (retransmission of dropped packets, in-order delivery, and congestion control). It's not too hard to write your own code for retransmission, but congestion control is difficult to get right.
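The first approach (a length prefix) might be sketched as follows. The 4-byte big-endian header is an assumption, since the answer only says "an integer", and the function names are made up for illustration.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte unsigned length, network order


def send_message(sock, payload):
    """Send the payload preceded by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, looping over short reads."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data.extend(chunk)
    return bytes(data)


def recv_message(sock):
    """Read one length-prefixed message: the header first, then the body."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)
```

With this in place, two send_message calls on one side come out as two separate recv_message results on the other, even if TCP delivers the bytes in a single chunk.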

Is there a way python can distinguish between packets being sent ?
Yes. Use UDP instead of TCP.

Netstring is a simple serialization format used to send data packets. Each data packet is of the form 'length:data,'.
http://en.wikipedia.org/wiki/Netstring
Python networking frameworks like Twisted have direct support for netstrings.
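For illustration, a hand-rolled encoder and decoder for the netstring format might look like this. It is only a sketch with hypothetical function names, not Twisted's implementation.

```python
def encode_netstring(data):
    """Wrap bytes as b'<length>:<data>,'."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstrings(buffer):
    """Parse complete netstrings from buffer.

    Returns (messages, leftover), where leftover holds the bytes of a
    netstring that has not fully arrived yet.
    """
    messages = []
    pos = 0
    while True:
        colon = buffer.find(b":", pos)
        if colon < 0:
            break  # the length field is not complete yet
        length = int(buffer[pos:colon].decode("ascii"))
        end = colon + 1 + length
        if len(buffer) < end + 1:
            break  # payload or trailing comma still missing
        if buffer[end:end + 1] != b",":
            raise ValueError("missing trailing comma")
        messages.append(buffer[colon + 1:end])
        pos = end + 1
    return messages, buffer[pos:]
```

The leftover bytes would be prepended to the next chunk read from the socket before decoding again.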
