I want to send/receive 'portioned' data using TCP Python socket module.
My receiving server side socket is set to receive 40 bytes of data in a single recv call:
while True:
    data = connection.recv(40)
    if not data:
        break
    ...
connection.close()
I have some sample data, about 500 bytes long and converted to a bytes object, that is being sent to the 'server' by the 'client':
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
...
s.send(sample_data)
Although I only have one call to the send method, does this mean that the client sends 40 bytes of 'sample_data' at a time, while the server 'requests' 40 bytes at once, until the whole package has been sent?
I found in this post that there is an SO_SNDBUF parameter that sets the size of the send buffer for a socket, but how is that different from an 'ordinary' socket without SO_SNDBUF being set?
buffer_size = 40
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_size)
...
s.send(sample_data)
TCP is a streaming protocol, meaning there are no message boundaries. It is just a stream of bytes. Think of it as a FIFO. If client sends 500 bytes, then there will be 500 total bytes to receive on the server. A recv(40) will receive from 1-40 bytes in the order sent. You must check the return value to see how much you received.
You could receive data lengths of: 40,40,40,40,40,40,40,40,40,40,40,40,20
Or you could receive something like: 40,40,5,40,40,40,40,40,40,40,40,40,40,15
Making another recv(40) on a blocking socket will then hang until at least one more byte is received, or return zero bytes if the client closes the socket.
It is up to the server to concatenate the data together and determine whether a complete data transmission was received or not. Normally you'll need to define a protocol to decide what a complete message entails. This could be "500 bytes is a complete message". You'll also need to handle the case that two messages were sent and a single recv could get data from the end of one message and the beginning of another.
You do not normally need to adjust the buffer size with socket options.
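As a rough sketch of the "500 bytes is a complete message" idea above: the hypothetical helper `recv_exact` below keeps calling recv until the agreed number of bytes has arrived, because each call may return fewer bytes than requested. The helper name and the default chunk size of 40 are assumptions for illustration only.

```python
import socket


def recv_exact(conn, total, chunk_size=40):
    """Read exactly `total` bytes from a stream socket.

    Each recv asks for at most `chunk_size` bytes and may return fewer,
    so the loop keeps going until `total` bytes have been collected.
    """
    chunks = []
    remaining = total
    while remaining > 0:
        data = conn.recv(min(chunk_size, remaining))
        if not data:
            # Zero bytes means the peer closed the connection early.
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

On the server side this could be called as `message = recv_exact(connection, 500)` in place of the bare recv loop.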
I have a simple server-client program:
In server.py:
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 1234))
server_socket.listen()
connection_socket, address = server_socket.accept()
with connection_socket:
    data = connection_socket.recv(1000)
    connection_socket.send(bytearray([0x0]))
    print(data)
server_socket.close()
And in client.py:
import socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", 1234))
client_socket.send(bytearray([0x0, 0x1, 0x2]))
print(client_socket.recv(1))
client_socket.send(bytearray([0x3, 0x4, 0x5]))
client_socket.close()
Here's what I think is going on:
What I know of the TCP protocol is that it is "stream-based". I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled. This is seemingly interrupted by the send made by the server or the recv made by the client. The following 3 bytes go unreceived.
Are these correct assumptions? If not, what is really going on here?
Thanks in advance for your help!
I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled.
Which is wrong. recv blocks until at least one byte is received. The number given just specifies the maximum number of bytes which should be read, i.e. neither the exact number nor the minimum number.
The following 3 bytes go unreceived.
It is likely that in this specific case the first 3 bytes arrive together and are returned by the single recv(1000); the 3 bytes sent afterwards go unread because the server never calls recv again. This is different, though, if larger amounts of data are sent, especially over links with a lower MTU (e.g. a local network or WiFi, as opposed to localhost traffic). There, a single recv may return only part of the expected data.
Even the assumption that send will send all given data is wrong: send will send at most the given data. One needs to check the return value to see how much actually got sent. Use sendall instead if you want everything to be sent.
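To illustrate the difference between send and sendall, here is a minimal sketch of a loop that checks the return value of send and retries with the unsent remainder. The function name `send_fully` is made up for this example; in real code `sock.sendall(data)` does roughly the same job.

```python
import socket


def send_fully(sock, data):
    """Keep calling send until every byte of `data` has been handed to the OS."""
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        # send may accept only part of what it is given.
        sent = sock.send(view[sent_total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        sent_total += sent
    return sent_total
```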
Can I say that socket.send() “flushed”/“resets” the TCP stream here?
No. send and recv work only on the socket's write and read buffers. They don't actually cause sending or receiving; that is done by the OS. A send just puts the data into the socket's write buffer, and the OS will eventually transmit it. This transmission is not always done immediately, though. If there is outstanding unacknowledged data, the sending might get deferred until that data is acknowledged (details depend on the TCP window). If there is only a little data in the buffer, the OS might wait a while for the application to call send with more data in order to keep the transmission overhead low (Nagle's algorithm).
Thus the phrase "flush" has no real meaning here. And "reset" actually means something completely different with TCP - namely forcibly breaking the connection using the RST flag. So don't use these phrases in this context.
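If the deferral caused by Nagle's algorithm matters for an application, it can be switched off per socket with the standard TCP_NODELAY option. The sketch below only shows where the option goes; the function name is hypothetical, and whether disabling Nagle actually helps depends on the traffic pattern.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes are not held back
    # while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```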
I'm trying to understand how udp messages are received. I have an external tool that sends data over udp every 1 second, and a simple python script that receives them something like this.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((ip, port))
while True:
    data, addr = sock.recvfrom(num)
I can receive the data, but if I change the code to
while True:
    data, addr = sock.recvfrom(num)
    time.sleep(10)
I am still receiving the same messages as before, just at a slower rate. I was expecting the messages sent during the 'time.sleep(10)' will be lost (which I understand will be most if not all the messages). Is there an internal storage that stores all the messages sent, whether or not the receiver is receiving them?
A socket has a receive buffer that is managed by the OS, not by Python.
So yes, the UDP packets are just sitting there, waiting for the application to read them from that buffer into application memory.
Of course this buffer is limited, so if you wait too long the buffer will fill up and you will start to lose packets.
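The queueing can be observed on one machine with a small sketch like the following, which sends a few datagrams over loopback before any of them is read. The function name and message contents are made up for the demonstration, and the size reported by SO_RCVBUF is platform-dependent.

```python
import socket


def queued_datagrams_demo(count=3):
    """Send `count` datagrams before reading any, then read them all back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(1.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            sender.sendto(f"message {i}".encode(), receiver.getsockname())
        # Nothing has been read yet: the datagrams wait in the OS buffer.
        received = [receiver.recvfrom(1024)[0] for _ in range(count)]
        # Size of the kernel receive buffer, in bytes.
        rcvbuf = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sender.close()
        receiver.close()
    return received, rcvbuf
```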
Let's say I'm using 1024 as buffer size for my client socket:
recv(1024)
Let's assume the message the server wants to send to me consists of 2024 bytes.
Only 1024 bytes can be received by my socket. What's happening to the other 1000 bytes?
1.) Will the recv-method wait for a certain amount of time (say 2 seconds) for more data to come and stop working after this time span? (I.e., if the rest of the data arrives after 3 seconds, the data will not be received by the socket any more?)
or
2.) Will the recv-method stop working immediately after having received 1024 bytes of data? (I.e. will the other 1000 bytes be discarded?)
In case that 1.) is correct, is there a way for me to determine the amount of time the recv call should wait before returning, or is it determined by the system? (I.e. could I tell the socket to wait for 5 seconds before it stops waiting for more data?)
UPDATE:
Assume, I have the following code:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((sys.argv[1], port))
s.send(b'Hello, world')
data = s.recv(1024)
print("received: {}".format(data))
s.close()
Assume that the server sends data of size > 1024 bytes. Can I be sure that the variable "data" will contain all the data (including those beyond the 1024th byte)?
If I can't be sure about that, how would I have to change the code so that I can always be sure that the variable "data" will contain all the data sent (in one or many steps) from the server?
It depends on the protocol. Some protocols like UDP send messages, and exactly one message is returned per recv. Assuming you are talking about TCP specifically, there are several factors involved. TCP is stream oriented, and because of things like the amount of currently outstanding send/recv data, lost or reordered packets on the wire, delayed acknowledgement of data, and the Nagle algorithm (which holds back some small sends until earlier data has been acknowledged, and together with delayed acknowledgements can add delays of up to a few hundred milliseconds), its behavior can change subtly as a conversation between client and server progresses.
All the receiver knows is that it is getting a stream of bytes. It could get anything from 1 to the fully requested buffer size on any recv. There is no one-to-one correlation between the send call on one side and the recv call on the other.
If you need to figure out message boundaries, it's up to the higher-level protocol to do that. Take HTTP for example. It starts with a header whose lines are delimited by \r\n and which ends with an empty line; the header can carry a count of the remaining bytes the client should expect to receive (Content-Length). The client knows how to read the header because of the \r\n delimiters, and then knows exactly how many bytes are coming next. Part of the charm of RESTful protocols is that they are HTTP based and somebody else already figured this stuff out!
Some protocols use NUL to delimit messages. Others may have a fixed length binary header that includes a count of any variable data to come. I like zeromq which has a robust messaging system on top of TCP.
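For the delimiter approach, a minimal sketch could look like this. The class name `DelimitedReader` is invented for the example, and it assumes the delimiter (NUL by default) never occurs inside a message. Each chunk returned by recv is fed in, and only complete messages are handed back; an incomplete tail stays buffered until the next chunk.

```python
class DelimitedReader:
    """Reassemble delimiter-separated messages from arbitrary stream chunks."""

    def __init__(self, delimiter=b"\0"):
        self.delimiter = delimiter
        self.buffer = b""

    def feed(self, chunk):
        """Add a chunk from recv and return the list of completed messages."""
        self.buffer += chunk
        # Everything before the last delimiter is complete;
        # whatever follows it is kept for the next call.
        *complete, self.buffer = self.buffer.split(self.delimiter)
        return complete
```

A receive loop would then call `reader.feed(sock.recv(1024))` and process whatever list comes back, which may be empty.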
More details on what happens with receive...
When you do recv(1024), there are 6 possibilities:
1. There is no data to receive. recv will wait until there is. You can change that by setting a timeout.
2. There is partial data. You'll get that part right away. The rest is either buffered or hasn't been sent yet, and you just do another recv to get more (and the same rules apply).
3. There are more than 1024 bytes available. You'll get 1024 bytes of that data and the rest is buffered in the kernel, waiting for another recv.
4. The other side has shut down the socket. You'll get 0 bytes of data. 0 means you will never get more data on that socket. But if you keep asking for data, you'll keep getting 0 bytes.
5. The other side has reset the socket. You'll get an exception.
6. Some other strange thing has gone on and you'll get an exception for that.
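The cases above can be told apart in code roughly as follows. This is only a sketch: the function name `recv_once` and the status strings are made up, and the 5-second default timeout is an arbitrary choice.

```python
import socket


def recv_once(conn, bufsize=1024, timeout=5.0):
    """Do one recv and report what happened as a (status, data) pair."""
    conn.settimeout(timeout)  # without this, recv would block indefinitely
    try:
        data = conn.recv(bufsize)
    except socket.timeout:
        return "timeout", b""   # no data arrived within `timeout` seconds
    except ConnectionResetError:
        return "reset", b""     # the other side reset the connection
    except OSError:
        return "error", b""     # some other socket-level failure
    if not data:
        return "closed", b""    # 0 bytes: the other side shut down
    return "data", data         # between 1 and `bufsize` bytes
```

When more than `bufsize` bytes are waiting, the call returns the first `bufsize` of them and the rest stays buffered for the next call.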
I am writing two python scripts to communicate over UDP using python sockets. Here's the related part of code
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))
sock.setblocking(True)  # I want it to be blocking
#(...)
(msg, addr) = sock.recvfrom(4)
#(...)
(msg2, addr2) = sock.recvfrom(2)
I want the receiving to be blocking and I don't know the size of the whole message before I read the first 4-byte part. The above code blocks on the sock.recvfrom(2) call, whereas a modified version with one sock.recvfrom instead of two works all right:
(msg, addr) = sock.recvfrom(6) #works ok, but isn't enough for my needs
Any idea how I can conveniently read the incoming data in two parts or why the code doesn't work as expected?
socket.recvfrom(size) will (for UDP sockets) read one packet, up to size bytes. The excess data is discarded. If you want to receive the whole packet, you have to pass a larger bufsize, then process the packet in bits (instead of trying to receive it in bits.)
If you want a more convenient, less fickle interface to network I/O, consider Twisted.
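A minimal sketch of "receive the whole packet, then process it in bits": read the datagram with a buffer large enough for any UDP payload and slice it afterwards. The function name and the 4-byte header size are assumptions taken from the question's example.

```python
import socket

MAX_DATAGRAM = 65535  # upper bound for a UDP payload, so nothing is cut off


def recv_in_parts(sock, header_size=4):
    """Receive one whole datagram and split it into header and body."""
    datagram, addr = sock.recvfrom(MAX_DATAGRAM)
    header = datagram[:header_size]
    body = datagram[header_size:]
    return header, body, addr
```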
Read from UDP socket dequeues the whole datagram.
UDP is a message-based protocol. recvfrom will read the entire message that was originally sent, but if the buffer isn't big enough, it will raise an exception (this is the behavior on Windows; on Linux the excess bytes are silently discarded instead):
socket.error: [Errno 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself
So I am not sure why you would hang on the 2nd recvfrom if a 6-byte message was originally sent. The first recvfrom should already raise an exception. Perhaps post an actual working, minimal example of the client and the server program.
Is there a way Python can distinguish between packets being sent? E.g.:
Python receives data
it processes the data
the client sends the first packet
the client sends the second packet
Python receives data; can I receive just the first packet rather than everything in the buffer?
I know I can set it up so that the client sends data, I confirm it, and the client won't send more until I have confirmed that I have processed the last piece, but I'd rather not.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", 2000))
sock.listen(5)
all the relevant socket data
There are basically two approaches:
1. At the start of each packet, send an integer specifying how long that packet will be. When you receive data, read the integer first, then read that many more bytes as the first packet.
2. Send some sort of special marker between packets. This only works if you can guarantee that the marker cannot occur within a packet.
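The first approach (a length prefix) might be sketched like this. The 4-byte big-endian header and the names `frame` and `extract_frames` are choices made for this example, not a fixed standard. The parser works on a byte buffer, so it also copes with a recv that returns the end of one packet together with the start of the next.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def frame(payload):
    """Prefix `payload` with its length."""
    return HEADER.pack(len(payload)) + payload


def extract_frames(buffer):
    """Return (complete_payloads, leftover_bytes) for the bytes seen so far."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the body of this packet has not fully arrived yet
        messages.append(bytes(buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

The receiver appends each recv result to the leftover bytes from the previous call and runs `extract_frames` again.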
As S. Lott points out, you could instead use UDP (which is packet-based) instead of TCP (which is stream-based), but then you give up the other features that TCP provides (retransmission of dropped packets, sequential packets, and congestion control). It's not too hard to write your own code for retransmission, but congestion control is difficult to get right.
Is there a way python can distinguish between packets being sent ?
Yes. Use UDP instead of TCP.
Netstring is a simple serialization format used to send data packets. Each data packet is of the form 'length:data,'.
http://en.wikipedia.org/wiki/Netstring
Python networking frameworks such as Twisted have direct support for netstrings.
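For illustration, a hand-written netstring encoder and decoder could look like the sketch below. The function names are made up, and a framework implementation would be preferable in real code. Note that the netstring format puts a decimal length and a colon in front of the data and a comma after it.

```python
def encode_netstring(data):
    """Encode bytes as b'<length>:<data>,'."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buffer):
    """Decode one netstring from the front of `buffer`.

    Returns (payload, rest), or (None, buffer) if the netstring is incomplete.
    """
    colon = buffer.find(b":")
    if colon == -1:
        return None, buffer  # length prefix not complete yet
    length = int(buffer[:colon].decode("ascii"))
    end = colon + 1 + length
    if len(buffer) < end + 1:
        return None, buffer  # data or trailing comma still missing
    if buffer[end:end + 1] != b",":
        raise ValueError("malformed netstring: missing trailing comma")
    return buffer[colon + 1:end], buffer[end + 1:]
```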