How does TCP handle a large file? - python

I have two questions about the TCP protocol:
Does TCP have a limit on the size of each packet it sends? If so, what is the largest packet size?
How does TCP handle large files? Does it send a file all at once (as one packet), or does it split it into smaller packets?
I am using Python and the socket library.

TCP is a streaming protocol, which means that you as the sender have no control over how many send operations actually take place in the kernel.
There are several mechanisms at play here. A client can call send many times with only one byte each time. TCP will try to accumulate these bytes until the MTU of the layer-2 link is filled, and only then send them, or sooner if a timeout is reached.
In one send operation you can pass roughly 64 KB at most.
To send a file, you thus need several send operations until the whole file is sent.
On the receiving side of TCP there is no relation between the number of send operations and the number of receive operations. If the sender sends 10 bytes 5 times in quick succession, they will very likely be received as 50 bytes in one receive operation.
Strictly speaking, for TCP there is no such thing as a packet; its unit of transmission is called a segment.
Packets (datagrams) are the unit of UDP, because UDP is not a streaming protocol.
In UDP, if you send 5 bytes, the receiver will receive those 5 bytes or nothing at all, because UDP preserves message boundaries, whereas TCP, being a stream, does not.
For UDP the maximum datagram size is approximately 64 KB. Such a datagram gets chopped into pieces by IP to send it over the wire, and IP puts it back together upon reception. This is all because of the layer-2 MTU.
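To make the large-file case concrete, here is a minimal sketch, assuming a plain blocking TCP connection (the chunk size, function names, and file paths are arbitrary choices), of sending a file in pieces and receiving it with a recv loop:

import socket

CHUNK = 64 * 1024  # read the file in 64 KB pieces; TCP may regroup them arbitrarily on the wire

def send_file(sock: socket.socket, path: str) -> None:
    # Sender: push the file through the stream chunk by chunk.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            sock.sendall(chunk)  # sendall loops internally until every byte is queued

def recv_file(sock: socket.socket, path: str) -> None:
    # Receiver: keep reading until the peer closes its end of the connection.
    with open(path, "wb") as f:
        while True:
            data = sock.recv(CHUNK)  # may return fewer bytes than requested
            if not data:             # b"" means the sender closed the stream
                break
            f.write(data)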

Related

Why does Python sock.recv receives data from 2 different packets?

The behavior is random.
I made pcaps with Wireshark and tcpdump. Both show the packet lengths correctly.
When I call sock.recv, I randomly receive data from 2 consecutive packets. The behavior is rare: out of approximately 100 packets, 1-2 recv calls contain data from 2 consecutive packets.
The packets are sent very fast; some arrive less than 1 ms apart. However, this is not a good indicator, because other packets received with a similar time difference are read correctly.
The socket is AF_INET, SOCK_STREAM, non-blocking, and it is implemented using selectors.
The socket is a client.
As @jasonharper says, TCP is a protocol that provides you with a stream of bytes. It doesn't provide you with packets.
If you have a protocol that runs over TCP, and that has a notion of individual packets, there is no guarantee that a single chunk of bytes delivered on a TCP socket will begin at the beginning of a higher-level packet or will end at the end of the same higher-level packet. A packet may be broken up between two chunks, and a chunk may include bits of more than one packet. The only guarantee you get from TCP is that the data you get is received in the order in which it's transmitted.
As noted in the comment, protocols that run atop TCP generally either use some form of terminator to mark the end of a packet or put a byte count at the beginning of a packet to indicate how many bytes are in the packet.
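To illustrate the byte-count approach, here is a minimal sketch (the 4-byte big-endian length prefix and the helper names are illustrative choices, not part of the question) of framing messages on top of a blocking TCP socket:

import struct

def send_msg(sock, payload: bytes) -> None:
    # Prefix each message with its length as a 4-byte big-endian integer.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # Keep calling recv until exactly n bytes have been collected.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)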

Can I say that socket.send() "flushed"/"resets" the TCP stream here?

I have a simple server-client program:
In server.py:
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 1234))
server_socket.listen()
connection_socket, address = server_socket.accept()
with connection_socket:
    data = connection_socket.recv(1000)
    connection_socket.send(bytearray([0x0]))
    print(data)
server_socket.close()
And in client.py:
import socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", 1234))
client_socket.send(bytearray([0x0, 0x1, 0x2]))
print(client_socket.recv(1))
client_socket.send(bytearray([0x3, 0x4, 0x5]))
client_socket.close()
Here's what I think is going on:
What I know of the TCP protocol is that it is "stream-based". I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled. This is seemingly interrupted by the send made by the server or the recv made by the client. The following 3 bytes go unreceived.
Are these correct assumptions? If not, what is really going on here?
Thanks in advance for your help!
I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled.
Which is wrong. recv blocks until at least one byte is received. The number given just specifies the maximum number of bytes which should be read, i.e. neither the exact number nor the minimum number.
The following 3 bytes go unreceived.
It is likely that in this specific case the first 3 bytes are received at once by the single recv(1000) call, and the 3 bytes sent afterwards are left unread because the server never calls recv again. This is different, though, if larger amounts of data are sent, especially over links with a low MTU (i.e. local network or WiFi traffic versus localhost traffic). In that case only part of the expected data may be received by a single recv.
Even the assumption that send will send all the given data is wrong: send will send at most the given data. One needs to check the return value to see how much actually got sent. Use sendall instead if you want everything to be sent.
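A small sketch of the difference, reusing client_socket from the example above (the payload is arbitrary):

data = b"x" * 100_000

# send may accept only part of the buffer, so the caller has to loop.
total = 0
while total < len(data):
    total += client_socket.send(data[total:])

# sendall does this loop internally and raises an exception on error.
client_socket.sendall(data)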
Can I say that socket.send() “flushed”/“resets” the TCP stream here?
No. send and recv work only on the socket's write and read buffers; they don't directly cause anything to be transmitted or received on the wire. That is done by the OS. A send just puts the data into the socket's write buffer, and the OS will eventually transmit it. This transmission is not always done immediately, though. If there are outstanding unacknowledged data, the sending might be deferred until those data are acknowledged (details depend on the TCP window). If only a small amount of data is in the buffer, the OS might wait a while for the application to call send with more data, in order to keep the transmission overhead low (Nagle's algorithm).
Thus the phrase "flush" has no real meaning here. And "reset" actually means something completely different in TCP, namely forcibly breaking the connection using the RST flag. So don't use these phrases in this context.
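If the goal is to reduce the delay Nagle's algorithm can introduce, the usual knob is the TCP_NODELAY socket option; a minimal sketch (this only changes when the OS transmits buffered data, it is still not a "flush"):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm so small writes are not held back waiting for more data.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)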

Reliable udp using tcp retransmit [closed]

I am the client-side receiver of UDP multicast data sent by a sender server (stock exchange data). I continuously receive a UDP multicast packet flow, sequentially numbered from 1 to approximately 35,000,000, sent uniformly over a period of 6 hours. I need to ensure that all packets up to some N are received before that set of packets is periodically processed, roughly every ~256 packets. In other words, I need reliable UDP.
Reliable UDP is mimicked using TCP retransmit: if any UDP packet is lost or not received, it is requested over TCP by specifying the missing packet range (starting number, ending number).
The sender keeps a record of all the packets (stock exchange data) it has sent via UDP multicast so far, so it will resend over TCP only those packet numbers that the receiver specifically requests via TCP. This is how UDP reliability is achieved by the receiver. The UDP drop ratio is very small (less than 0.001%), except when starting the UDP multicast in the middle of the day, in which case all previously sent UDP packets from 1 to some N must be resent over TCP while live UDP multicast packets from N+1 onward are still being received. I can't ask the sender (the stock exchange) to change its protocol; it is fixed.
What is an efficient algorithm, in terms of CPU, to implement this?
The issue is speed (big-O). I can write a naive algorithm using several nested loops and methods, but it is not necessarily the best.
I am thinking of maintaining a number N which confirms I have received UDP packets 1 through N. Any packet with number M that is not the next expected packet N+1 will be buffered, for say 256 packets, and then TCP will be used to request the missing numbers. Normal UDP reception then resumes from the last confirmed received number once the TCP request has been filled.
Example:
Suppose the UDP packets received are in the sequence {1, 2, 3, 6, 7, 8, 9, 10, ...}
After packet No. 3, the next packet received is No. 6; packets 4 through 5 are missing.
So the missing packets {4, 5} are requested using a TCP request({4 through 5}), and {6, 7, 8, 9, 10} are buffered. There is enough space on the 10GBASE-T LAN card for buffering 35,000,000 packets.
So: receive UDP {1,2,3}, refill by TCP request {4,5}, continue receiving UDP {6,7,8,9,10, ...}
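A minimal sketch of that receiver-side bookkeeping, assuming sequence numbers start at 1; deliver and request_retransmit are placeholders for the processing stage and for the exchange's TCP request mechanism:

def deliver(payload):
    pass  # placeholder: hand one in-order packet to the processing stage

def request_retransmit(first, last):
    pass  # placeholder: ask the sender over TCP for packets first..last

next_expected = 1   # every packet below this number has been received
out_of_order = {}   # seq -> payload, buffered until the gap is filled

def on_packet(seq, payload):
    # Called for packets arriving via UDP multicast *and* via TCP refills.
    global next_expected
    if seq == next_expected:
        deliver(payload)
        next_expected += 1
        # Drain buffered packets that are now contiguous (dict lookups are O(1)).
        while next_expected in out_of_order:
            deliver(out_of_order.pop(next_expected))
            next_expected += 1
    elif seq > next_expected:
        out_of_order[seq] = payload
        # In practice you would batch/deduplicate these requests, e.g. every ~256 packets.
        request_retransmit(next_expected, seq - 1)
    # seq < next_expected: duplicate, ignore it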
I assume since you are using multicast that there are going to be multiple receivers of this data? (Because if not, you'd probably be using unicast instead)
Therefore, if the receivers are going to have the option of requesting TCP retransmission of packets they didn't get, that means that the transmitting program will need to keep a copy of recently-sent UDP packets in memory, so that when it receives a retransmit-request, it will have the requested data available to retransmit. Assuming you're stamping each packet with a unique ID, it can store this data in a std::map or std::unordered_map or similar for quick lookup.
The real question becomes: how much of this old-packet data should the transmitter retain? Ideally it would retain all of it, because you never know how much a given receiver might have missed and might want to request; but that would require infinite memory, so it's not a realistic option. Probably the best you can do is decide how much RAM you're willing to tie up for this purpose, keep a count of the total number of bytes you have in your table, and when it reaches the limit, start dropping the oldest packets from the table in order to keep its size under the limit.
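Since the rest of this thread uses Python, here is the same idea as a Python sketch (the class name and the 64 MB budget are illustrative):

from collections import OrderedDict

class RetransmitCache:
    """Keep recently sent packets so retransmit requests can be answered,
    evicting the oldest packets once a memory budget is exceeded."""

    def __init__(self, max_bytes=64 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.total = 0
        self.packets = OrderedDict()  # seq -> payload, oldest entry first

    def store(self, seq, payload):
        self.packets[seq] = payload
        self.total += len(payload)
        while self.total > self.max_bytes:
            _, old = self.packets.popitem(last=False)  # drop the oldest packet
            self.total -= len(old)

    def get(self, seq):
        return self.packets.get(seq)  # None if the packet has already been evicted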
I wrote an open-source library that uses essentially the technique you describe (multicast UDP + TCP-retransmit-to-recover-from-packet-loss) to synchronize databases across multiple hosts as quickly as possible; some things I learned while implementing it include:
If/when you can, pack your data-messages together into larger packets, up to the MTU of the network you are transmitting over (e.g. 1388 bytes for IPv4/Ethernet). Very small packet sizes (like 48 bytes/packet) are inefficient, since the fixed-size packet headers make up a greater percentage of the total data sent/received.
Only try to send when your sending-socket indicates it is ready-for-write. (i.e. don't assume that you will never fill up the socket's outgoing-data-buffer; if your traffic is "bursty", you probably will at some point)
Minimize UDP packet loss by making your UDP sockets' send and receive buffers as large as you can get away with
Further minimize UDP packet loss by doing all the UDP receiving in a dedicated, high-priority thread (which can then route the received UDP data back to a normal-priority thread for further processing -- the main thing is to avoid allowing the receiving UDP-socket's incoming-data-buffer to overflow if possible)
For the TCP retransmission part, keep in mind that TCP streams can potentially slow down to nearly zero bytes-per-second in the worst case scenario, which makes it important to ensure that poor TCP performance to client A doesn't block the TCP communications to/from clients B, C, D, etc. This can be accomplished either via non-blocking I/O and select() (or poll() or similar), or asynchronous networking, or via multiple threads; avoid blocking I/O unless you are implementing a thread-per-socket model (and probably avoid that model as well, since a thread that is indefinitely-blocked-inside-recv() is difficult to shut down cleanly)
Think about under what circumstances (if any) it is acceptable for a client to never receive a particular packet at all; are there situations where that is okay? Or must the entire system grind to a halt until every receiver has received every packet in the group, regardless of how long that might take?
If you want to get really fancy, you can look into Forward Error Correction algorithms that encode data across packets, such that the receiver can still decode all of the data even if it never receives (up to a certain percentage of) the packets. This makes the need for a re-transmit request less likely, at the cost of making all of the packets slightly larger.

reading both tcp and udp packets from same socket

I am trying to read packets in a router, like this in python:
# (skipping the exception handling code here)
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
while True:
    p = s.recvfrom(2000)
    pkt = p[0]
    # process pkt here ...
Answers to a related question (36115971) say that the parameters and methods differ for UDP vs TCP (some say recv is for TCP and recvfrom is for UDP, others say the opposite; similarly, some say to use 1024 as the buffer size for TCP and larger for UDP, and again some say the reverse). In my case of reading in a router I do not have separate sockets for TCP and UDP, so I need to read both from the same socket, and I am a bit confused about how I should read the incoming packets.
(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?
(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? eg, if I have a large buffer of 4096 bytes, and the incoming streaming 2 packets have 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after filling up the buffer from the 2nd packet also?
(2a) same question, but if I have a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?
(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?
NOTE: Some of the given replies have been divergent, so let me put in some boundaries to my question. Hopefully these restrictions will help to give more specific answers.
(a) My objective is to sniff the incoming packets with python sockets only. So other solutions involving tcpdump or tshark etc are outside the scope.
(b) The objective is to only sniff for incoming packets. Additional details like packet reordering (for connection oriented protocols like TCP) are outside the scope, actually they are avoidable overhead.
If you're reading packets from a raw socket (as shown in your source code), then you can easily read all packets from the same socket. Be sure this is what you intend to do. A raw socket is for doing packet inspection for troubleshooting, forensic, security or educational purposes. You cannot easily communicate with another system this way.
And likewise, the receive calls will not differ here by protocol because you are not actually using TCP or UDP, you're simply receiving the raw packets that those protocols build and decode.
(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?
Either one will work. recv() will return to you only the actual packet data, while recvfrom will return to you the data along with metadata about the packet, including the interface from which the data was received (and other things defined in struct sockaddr_ll from the packet(7) man page).
(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? eg, if I have a large buffer of 4096 bytes, and the incoming streaming 2 packets have 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after filling up the buffer from the 2nd packet also?
When using a raw socket like this, you get exactly one packet at a time. You will never get more than one. If the buffer you give is not large enough, then the packet will be truncated (with the ending bytes discarded).
(2a) same question, but if I have a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?
Generally speaking, packets on most networks are limited to about 1514 bytes. This is because the traditional "MTU" (Maximum Transmission Unit) configured on the network interface is 1500 bytes, and usually an Ethernet header containing two MAC addresses (6 bytes each) plus a two-byte EtherType is prepended to that. In a switch or router, you may also see packets that carry an additional 4-byte VLAN header (IEEE 802.1Q). (But some networks internally use "jumbo" packets up to about 9 KB in size for specific purposes.)
You should also understand that, in writing an application, one can send UDP datagrams (or TCP buffers) larger than the maximum packet size. In that case, the OS breaks those up into smaller chunks for sending (and they are re-assembled on the destination side before being handed to an application). When you're receiving raw packets like this, you will see the packets in their low-level, possibly fragmented, state.
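Because each recvfrom call on this raw socket hands you exactly one frame starting at the Ethernet header, a minimal sketch of peeling that header off (using the field layout described above) could be:

import struct

def parse_ethernet(pkt: bytes):
    # 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType, then the payload.
    dst, src, ethertype = struct.unpack("!6s6sH", pkt[:14])
    # If ethertype == 0x8100, a 4-byte 802.1Q VLAN tag follows before the real EtherType.
    return dst, src, ethertype, pkt[14:]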
(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?
The OS will keep a queue of packets for you. The size is of course limited, since there is no way you would be able to keep up with, say, a 1 Gb NIC at full line rate (let alone a 10 Gb or higher NIC). The size is configured in a system-specific way. On Linux -- and probably other Unix-based systems -- you can call getsockopt with SOL_SOCKET / SO_RCVBUF to get an idea of the queue space available.
On Linux, at least, the size can be set with setsockopt up to a system-imposed maximum (which itself can be configured with various sysctl settings).
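A minimal sketch of checking and enlarging that buffer from Python (the 4 MB figure is arbitrary, the kernel caps the result at sysctls such as net.core.rmem_max, and the raw socket itself needs root/CAP_NET_RAW on Linux):

import socket

s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
print("default receive buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# Ask for a bigger buffer; the kernel may silently cap the value.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
print("effective receive buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))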
I think you should not do that, because TCP ensures things like reliability, ordering, flow control, and congestion control, whereas UDP does not guarantee any of them.
These parameters are fixed by the operating system at the moment the socket is created. That is why I think you cannot do what you are describing.
Open two different sockets, one native UDP socket and one native TCP socket.

Python TCP Duplicate Message

I'm working on an in-house TCP Server, and TCP client. When there is 0% packet loss the Server and Client work fine. However, when I have 20% or more packet loss I am seeing duplicate TCP messages. I am receiving something like this....
Client <-- MessageA -- Server
Client -- MessageB --> Server
Client <-- MessageCMessageA -- Server
Is it possible that MessageA does not completely make it to the client, it times out, TCP resends it, and then the original message makes it through and is received later by the client?
My question is if TCP works like that, and if that's a possible scenario with a network containing 20% packet loss or more.
Barebones of how the client and server are sending/receiving data...
socket.recv(1024)
socket.send(1024)
No, it is not possible. TCP guarantees that it will either deliver data exactly once and in the order in which it was sent, or signal an error to the application. Hence, there is probably a bug in your code. The most likely is that your code fails to deal with partial reads.
When you perform a write or send on a TCP socket, the TCP module will segment your data into as many packets as are needed. In the presence of packet loss, it is possible that some packets have arrived successfully but others must be resent. In that case, the corresponding read or recv will only receive part of the data — the remaining data will arrive in a subsequent read or recv. In other words, TCP does not preserve message boundaries.
Your code is probably interpreting such a split message as multiple messages. Make sure that you accumulate a full message in your buffer before attempting to parse it.
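As an illustration of accumulating a full message before parsing, here is a minimal sketch that assumes newline-delimited messages (that delimiter is an assumption about your protocol, not something stated in the question):

buffer = b""

def on_readable(sock):
    # Append whatever arrived and peel off only the complete messages.
    global buffer
    buffer += sock.recv(1024)
    while b"\n" in buffer:
        message, buffer = buffer.split(b"\n", 1)
        handle_message(message)

def handle_message(message: bytes):
    pass  # placeholder: process one complete application-level message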
