I am writing two Python scripts to communicate over UDP using Python sockets. Here's the relevant part of the code:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))
sock.setblocking(True)  # I want it to be blocking
#(...)
(msg, addr) = sock.recvfrom(4)
#(...)
(msg2, addr2) = sock.recvfrom(2)
I want the receiving to be blocking, and I don't know the size of the whole message before I read the first 4-byte part. The above code blocks on the sock.recvfrom(2) call, whereas a modified version with one sock.recvfrom instead of two works fine:
(msg, addr) = sock.recvfrom(6) #works ok, but isn't enough for my needs
Any idea how I can conveniently read the incoming data in two parts or why the code doesn't work as expected?
socket.recvfrom(size) will (for UDP sockets) read one packet, up to size bytes. The excess data is discarded, which is why the second recvfrom blocks: the remaining 2 bytes are already gone, so the call waits for the next datagram. If you want to receive the whole packet, you have to pass a larger bufsize, then process the packet in pieces (instead of trying to receive it in pieces).
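A minimal sketch of that approach: read the whole datagram with one recvfrom, then split it in memory. The question does not say what the 4-byte part contains, so this assumes it is a big-endian unsigned payload length; the names split_datagram, receive_message, HEADER_SIZE and MAX_DATAGRAM are made up for illustration.

```python
import struct

HEADER_SIZE = 4        # assumption: the first 4 bytes hold the payload length
MAX_DATAGRAM = 65535   # large enough for any single UDP datagram


def split_datagram(datagram):
    """Split one already-received datagram into (declared_length, payload)."""
    if len(datagram) < HEADER_SIZE:
        raise ValueError("datagram is shorter than the header")
    # "!I" = network byte order, unsigned 32-bit integer (assumed layout)
    (declared_length,) = struct.unpack("!I", datagram[:HEADER_SIZE])
    payload = datagram[HEADER_SIZE:]
    if len(payload) != declared_length:
        raise ValueError("payload length does not match the header")
    return declared_length, payload


def receive_message(sock):
    """Block until one datagram arrives; return (payload, sender_address)."""
    datagram, addr = sock.recvfrom(MAX_DATAGRAM)  # one call, whole packet
    _, payload = split_datagram(datagram)
    return payload, addr
```

With a bound, blocking UDP socket, receive_message(sock) would replace the pair of recvfrom(4) / recvfrom(2) calls from the question.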
If you want a more convenient, less fickle interface to network I/O, consider Twisted.
Read from UDP socket dequeues the whole datagram.
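UDP is a message-based protocol. recvfrom will read the entire message that was originally sent, but if the buffer isn't big enough, the result depends on the platform: Windows raises an exception, while Linux and most other POSIX systems silently discard the excess bytes. The Windows error looks like this: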
socket.error: [Errno 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself
So I am not sure why you would hang on the 2nd recvfrom if a 6-byte message was originally sent. On Windows you should get an exception on the first recvfrom. Perhaps post an actual working, minimal example of the client and the server program.
Related
I have a simple server-client program:
In server.py:
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 1234))
server_socket.listen()
connection_socket, address = server_socket.accept()
with connection_socket:
    data = connection_socket.recv(1000)
    connection_socket.send(bytearray([0x0]))
    print(data)
server_socket.close()
And in client.py:
import socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", 1234))
client_socket.send(bytearray([0x0, 0x1, 0x2]))
print(client_socket.recv(1))
client_socket.send(bytearray([0x3, 0x4, 0x5]))
client_socket.close()
Here's what I think is going on:
What I know of the TCP protocol is that it is "stream-based". I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled. This is seemingly interrupted by the send made by the server or the recv made by the client. The following 3 bytes go unreceived.
Are these correct assumptions? If not, what is really going on here?
Thanks in advance for your help!
I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled.
Which is wrong. recv blocks until at least one byte is received. The number given just specifies the maximum number of bytes which should be read, i.e. neither the exact number nor the minimum number.
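Because recv may return fewer bytes than requested, code that needs an exact count has to loop. A minimal sketch, assuming a blocking TCP socket; recv_exactly is a hypothetical helper name, not part of the socket module:

```python
import socket


def recv_exactly(sock, n):
    """Keep calling recv until exactly n bytes have been collected."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # returns between 1 and `remaining` bytes
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError("peer closed before %d bytes arrived" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the server from the question, recv_exactly(connection_socket, 3) would wait for the first three bytes regardless of how the network splits them.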
The following 3 bytes go unreceived.
It is likely that in this specific case the first 3 bytes are received at once by the single recv(1000) call, leaving the 3 bytes sent afterwards unread because the server never calls recv again. This is different though if larger amounts of data are sent, especially over links with a lower MTU (e.g. a local network or WiFi, as opposed to localhost traffic). There, a single recv may return only part of the expected data.
Even the assumption that send will send all given data is wrong: send will send at most the given data. One needs to check the return value to see how much actually got sent. Use sendall instead if you want everything to be sent.
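To make the return-value check concrete, here is a rough sketch of the loop that sendall performs for you, assuming a blocking socket. send_all is a hypothetical name used only for this illustration; in real code socket.sendall is the simpler choice.

```python
import socket


def send_all(sock, data):
    """Call send repeatedly until every byte of data has been handed to the OS."""
    view = memoryview(data)  # lets us slice without copying the buffer
    total_sent = 0
    while total_sent < len(view):
        # send returns how many bytes it actually accepted, possibly fewer
        sent = sock.send(view[total_sent:])
        total_sent += sent
    return total_sent
```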
Can I say that socket.send() “flushed”/“resets” the TCP stream here?
No. send and recv work only on the socket's write and read buffers. They don't themselves transmit or receive anything on the network. This is done by the OS instead. A send just puts the data into the socket's write buffer and the OS will eventually transmit this data. This transmission is not in all cases done immediately though. If there is outstanding unacknowledged data, the sending might get deferred until that data is acknowledged (details depend on the TCP window). If only a little data is in the buffer, the OS might wait a while for the application to call send with more data in order to keep the transmission overhead low (Nagle's algorithm).
Thus the phrase "flush" has no real meaning here. And "reset" actually means something completely different with TCP - namely forcibly breaking the connection using the RST flag. So don't use these phrases in this context.
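If the delay from Nagle's algorithm matters for small writes, it can be switched off per socket with the TCP_NODELAY option. A minimal sketch; make_low_latency_socket is a hypothetical helper name, and whether disabling Nagle helps depends on the application:

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the OS to transmit small writes without waiting
    # to coalesce them with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

This only changes when the OS transmits buffered data. It does not turn TCP into a message-based protocol, so the receiver still has to handle partial reads.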
I'm trying to understand how udp messages are received. I have an external tool that sends data over udp every 1 second, and a simple python script that receives them something like this.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((ip, port))
while True:
    data, addr = sock.recvfrom(num)
I can receive the data, but if I change the code to
while True:
    data, addr = sock.recvfrom(num)
    time.sleep(10)
I am still receiving the same messages as before, just at a slower rate. I was expecting the messages sent during the 'time.sleep(10)' will be lost (which I understand will be most if not all the messages). Is there an internal storage that stores all the messages sent, whether or not the receiver is receiving them?
A socket has a receive buffer that is managed by the OS, not by Python.
So yes, the UDP packets are just sitting there and waiting for the application to read them from the buffer into the application's memory.
Of course this buffer is limited, so if you wait too long the buffer will fill up and you will start to lose packets.
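The size of that OS buffer can be inspected from Python through the SO_RCVBUF socket option. A small sketch (receive_buffer_size is a hypothetical helper name; the reported number is platform-specific, and the kernel may adjust any value you request with setsockopt):

```python
import socket


def receive_buffer_size(sock):
    """Return the OS receive buffer size for this socket, in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print("receive buffer:", receive_buffer_size(sock), "bytes")
sock.close()
```

Once the queued datagrams exceed this size, further incoming packets are dropped until the application reads some of them.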
I have a multicast server sending data that must be captured by a python client. The problem is that recvfrom does not receive any data or at least receive the first packet and sorta caches it. If I use recvfrom in a loop then my data is received correctly.
My question is why I should use recvfrom in a loop to have the expected behavior?
from socket import *
s=socket(AF_INET, SOCK_DGRAM)
s.bind(('172.30.102.141',12345))
m=s.recvfrom(1024)
print m[0]
# sleep for x seconds here
m=s.recvfrom(1024)
print m[0]
# print the exact same thing as previously...
One thing is for sure: multicast is basically sending UDP packets, and you have to keep listening for new packets. The same is true for TCP-based communication.
When you use a low-level interface for network communication, such as socket, it is up to both sides to define the application-level protocol.
That means you define how the receiving party concludes that a message is complete. This is because a message could be split into multiple parts/packets on its way through the network, so the receiving side has to assemble them in the proper order and then check whether the message is whole. After that you pass it on to whatever message processing you do on the receiving side.
When using UDP, the receiving side doesn't know whether any packet is on its way, so recvfrom(1024) simply returns one datagram (up to 1024 bytes) and finishes. It doesn't know, and should not care, whether more data is on its way. It's up to you to take care of that.
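One way to "keep listening" is to wrap recvfrom in a loop that yields each datagram as it arrives. This is only a sketch: iter_datagrams and its idle_timeout parameter are hypothetical, and it stops after a quiet period purely so that the loop can end. Joining a multicast group is not shown.

```python
import socket


def iter_datagrams(sock, bufsize=65535, idle_timeout=1.0):
    """Yield (data, sender_address) for each datagram until the socket goes idle."""
    sock.settimeout(idle_timeout)
    while True:
        try:
            data, addr = sock.recvfrom(bufsize)  # exactly one datagram per call
        except socket.timeout:
            return  # nothing arrived within idle_timeout seconds
        yield data, addr
```

A caller would write for data, addr in iter_datagrams(s): ... and decide there, according to its own application-level protocol, when a message is complete.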
I am writing a simple UDP-based client/server app and testing with both the client/server on localhost, and I would like for the sender to know when send() would have blocked. I am using Python, so I think I can do:
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setblocking(0)
s.connect(('127.0.0.1', 12345))
data = "x"
for i in range(0, 9000):  # More than about 9000 gives an error
    data += "x"
while True:
    try:
        s.send(data)
    except socket.error as e:
        print "Would have blocked"
        # Do something useful here
I would like to test that my error-handling code works, so I would like to get send() to want to block. The problem is, I cannot figure out how to do that. On the receiving side I have tried:
BUFFSIZE = 2000
input = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
input.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFFSIZE)
input.bind(('127.0.0.1', 12345))
while True:
    data = input.recv(BUFFSIZE)
    time.sleep(100)
Between sleeping for 100 sec and setting a small receive buffer, I would have expected the buffer to have filled up. However, it never does. So how can I get the receive buffer to fill up so that send blocks?
I am using Mac OS Lion and the Macports version of Python 2.6.
UDP doesn't normally block. If the receiver has a full buffer, the packet gets silently discarded. This is by design. If you want reliable transport and blocking semantics, use TCP instead.
Comment: The reason TCP blocks is because the sender gets confirmation for every packet it sends. The sender only allows a certain amount of data "in transit" that does not have confirmation, and blocks when this threshold is reached. Since UDP does not send confirmation of received packets, the sender has no way of knowing when to block. Of course, it might decide to block if it saturates its Ethernet port, but with UDP there is no way to tell if your uplink is saturated, the receiver is hung, or gremlins ate your packet. No guarantees!
Is there a way python can distinguish between packets being sent ? e.g.
Python receives data
it processes data
client sends first packet
client sends second packet
Python receives data; can I receive just the first packet rather than everything in the buffer?
I know I can set it up so that I confirm each piece of data and the client won't send more until I have confirmed that I have processed the last piece, but I'd rather not.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", 2000))
sock.listen(5)
all the relevant socket data
There are basically two approaches:
At the start of each packet, send an integer specifying how long that packet will be. When you receive data, read the integer first, then read that many more bytes as the first packet.
Send some sort of special marker between packets. This only works if you can guarantee that the marker cannot occur within a packet.
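The first approach (a length prefix) might look like the sketch below. The 4-byte big-endian prefix and the names frame and extract_frames are assumptions made for illustration; any fixed-size integer encoding that both sides agree on would do.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # assumed: 4-byte big-endian length


def frame(payload):
    """Prepend the payload length so the receiver can find the boundary."""
    return LENGTH_PREFIX.pack(len(payload)) + payload


def extract_frames(buffer):
    """Return (complete_payloads, leftover_bytes) from accumulated stream data."""
    payloads = []
    offset = 0
    while len(buffer) - offset >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(buffer, offset)
        start = offset + LENGTH_PREFIX.size
        end = start + length
        if end > len(buffer):
            break  # the rest of this packet has not arrived yet
        payloads.append(buffer[start:end])
        offset = end
    return payloads, buffer[offset:]
```

The receiver appends whatever recv returns to the leftover bytes and calls extract_frames again, so packets that arrive split or glued together are still separated correctly.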
As S. Lott points out, you could instead use UDP (which is packet-based) instead of TCP (which is stream-based), but then you give up the other features that TCP provides (retransmission of dropped packets, in-order delivery, and congestion control). It's not too hard to write your own code for retransmission, but congestion control is difficult to get right.
Is there a way python can distinguish between packets being sent ?
Yes. Use UDP instead of TCP.
Netstring is a simple serialization format used to send data packets. Each data packet is of the form 'length:data,' (the length in decimal digits, a colon, the data, and a trailing comma).
http://en.wikipedia.org/wiki/Netstring
Python networking frameworks like Twisted have direct support for netstrings.
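For illustration, a hand-rolled netstring encoder and decoder could look like this. It is a sketch, not Twisted's implementation; the function names are hypothetical, and the trailing comma follows the netstring format described on the linked page.

```python
def encode_netstring(payload):
    """Encode bytes as b'<length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(buffer):
    """Return (payload, rest), or (None, buffer) if the netstring is incomplete."""
    colon = buffer.find(b":")
    if colon == -1:
        return None, buffer  # length field not fully received yet
    length = int(buffer[:colon].decode("ascii"))
    start = colon + 1
    end = start + length
    if len(buffer) < end + 1:
        return None, buffer  # payload or trailing comma still missing
    if buffer[end:end + 1] != b",":
        raise ValueError("netstring is missing its trailing comma")
    return buffer[start:end], buffer[end + 1:]
```

On a TCP stream the receiver would call decode_netstring repeatedly on its accumulated data, keeping the returned rest for the next round.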