Is there a way Python can distinguish between packets being sent? e.g.:
Python receives data
it processes the data
the client sends a first packet
the client sends a second packet
Python receives data; can I receive just the first packet, rather than all the info in the buffer?
I know I can set it up so the client sends data, I confirm it, and the client won't send more data until I have confirmed that I have processed the last piece, but I'd rather not.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", 2000))
sock.listen(5)
That's all the relevant socket setup.
There are basically two approaches:
At the start of each packet, send an integer specifying how long that packet will be. When you receive data, read the integer first, then read that many more bytes as the first packet.
Send some sort of special marker between packets. This only works if you can guarantee that the marker cannot occur within a packet.
As S. Lott points out, you could use UDP (which is packet-based) instead of TCP (which is stream-based), but then you give up the other features that TCP provides (retransmission of dropped packets, sequential packets, and congestion control). It's not too hard to write your own code for retransmission, but congestion control is difficult to get right.
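The first approach can be sketched like this (a minimal example; the 4-byte big-endian header and the helper names `send_msg`/`recv_msg`/`recv_exact` are illustrative choices, not part of any standard API):

```python
import socket
import struct

def send_msg(sock, payload: bytes) -> None:
    # Prefix each message with a 4-byte big-endian length.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # Loop until exactly n bytes have been read; recv may return fewer.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    # Read the length header first, then exactly that many payload bytes.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With this, two back-to-back sends on the client come out as two distinct messages on the server, no matter how TCP chops up the byte stream.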
Is there a way python can distinguish between packets being sent ?
Yes. Use UDP instead of TCP.
Netstring is a simple serialization format used to send data packets. Each data packet is of the form 'length:data,' (note the trailing comma).
http://en.wikipedia.org/wiki/Netstring
Python networking frameworks like Twisted have direct support for netstrings.
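For illustration, a minimal hand-rolled netstring encoder/decoder might look like this (the function names are made up for this sketch; in Twisted the equivalent is handled by its netstring protocol support):

```python
def encode_netstring(data: bytes) -> bytes:
    # Netstring format: b'length:data,' with a trailing comma.
    return str(len(data)).encode() + b":" + data + b","

def decode_netstring(buf: bytes):
    # Returns (payload, remaining_buffer); raises ValueError if the
    # buffer does not yet contain one complete, well-formed netstring.
    head, sep, rest = buf.partition(b":")
    if not sep:
        raise ValueError("incomplete netstring header")
    length = int(head)
    if len(rest) < length + 1:
        raise ValueError("incomplete netstring body")
    if rest[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:length], rest[length + 1:]
```

Because the decoder hands back the unconsumed remainder, it can be called repeatedly on a receive buffer to peel off one message at a time.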
The behavior is random.
I made pcaps with Wireshark and tcpdump. Both show the packet lengths correctly.
When I call sock.recv, I randomly receive data from 2 consecutive packets. The behaviour is rare: out of approximately 100 packets, 1-2 recv calls contain data from 2 consecutive packets.
The packets are sent very fast; some are received below 1 ms apart. However, this is not a good indicator, because other packets received with a similar time difference are read correctly.
The socket is AF_INET, SOCK_STREAM, non-blocking, and it is implemented using selectors.
The socket is a client.
As @jasonharper says, TCP is a protocol that provides you with a stream of bytes. It doesn't provide you with packets.
If you have a protocol that runs over TCP, and that has a notion of individual packets, there is no guarantee that a single chunk of bytes delivered on a TCP socket will begin at the beginning of a higher-level packet or will end at the end of the same higher-level packet. A packet may be broken up between two chunks, and a chunk may include bits of more than one packet. The only guarantee you get from TCP is that the data you get is received in the order in which it's transmitted.
As noted in the comment, protocols that run atop TCP generally either use some form of terminator to mark the end of a packet or put a byte count at the beginning of a packet to indicate how many bytes are in the packet.
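A terminator-based reader, for instance, has to buffer whatever recv returns and split out complete messages itself. A rough sketch, assuming newline-terminated messages (the class name and buffer size are arbitrary choices):

```python
import socket  # used by callers that pass in a connected socket

class LineReader:
    # Buffers raw TCP data and yields complete messages ending in b"\n",
    # regardless of how the stream was chunked by the network.
    def __init__(self, sock):
        self.sock = sock
        self.buf = b""

    def read_message(self) -> bytes:
        # Keep reading until the buffer holds at least one terminator.
        while b"\n" not in self.buf:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("socket closed before terminator")
            self.buf += chunk
        # Split off the first complete message; keep the rest buffered.
        msg, self.buf = self.buf.split(b"\n", 1)
        return msg
```

If one recv happens to deliver one and a half messages, the extra half simply stays in `self.buf` until the next call, which is exactly the reassembly work TCP leaves to the application.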
I have a simple server-client program:
In server.py:
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 1234))
server_socket.listen()
connection_socket, address = server_socket.accept()
with connection_socket:
    data = connection_socket.recv(1000)
    connection_socket.send(bytearray([0x0]))
    print(data)
server_socket.close()
And in client.py:
import socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", 1234))
client_socket.send(bytearray([0x0, 0x1, 0x2]))
print(client_socket.recv(1))
client_socket.send(bytearray([0x3, 0x4, 0x5]))
client_socket.close()
Here's what I think is going on:
What I know of the TCP protocol is that it is "stream-based". I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled. This is seemingly interrupted by the send made by the server or the recv made by the client. The following 3 bytes go unreceived.
Are these correct assumptions? If not, what is really going on here?
Thanks in advance for your help!
I've read here that recv blocks IO until my request of 1000 bytes has been fulfilled.
Which is wrong. recv blocks until at least one byte is received. The number given just specifies the maximum number of bytes which should be read, i.e. neither the exact number nor the minimum number.
The following 3 bytes go unreceived.
It is likely that in this specific case the 3 bytes from the first send are received at once by the single recv(1000), and the 3 bytes sent afterwards go unread because the server never calls recv again. This is different, though, if larger amounts of data are sent, especially over links with a low MTU (i.e. local network or WiFi vs. localhost traffic). There it can be seen that only part of the expected data is received by a single recv.
Even the assumption that send will send all of the given data is wrong: send will only send at most the given data. One needs to actually check the return value to see how much actually got sent. Use sendall instead if you want everything to be sent.
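As a sketch of what sendall does for you (a hand-rolled equivalent for illustration, not the actual CPython implementation):

```python
import socket  # callers pass in a connected socket

def send_all(sock, data: bytes) -> None:
    # send() may accept only part of the data; loop until every byte
    # has been handed to the kernel, checking the return value each time.
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
```

In practice you should just call sock.sendall(data); the point of the loop is that ignoring send's return value silently drops data whenever the kernel buffer is full.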
Can I say that socket.send() “flushed”/“resets” the TCP stream here?
No. send and recv work only on the socket's write and read buffers; they don't actually cause a transmission or a reception. That is done by the OS instead. A send just puts the data into the socket's write buffer, and the OS will eventually transmit it. This transmission is not always done immediately, though. If there are outstanding unacknowledged data, the sending might get deferred until those data are acknowledged (details depend on the TCP window). If only a little data is in the buffer, the OS might wait a while for the application to call send with more data, in order to keep the transmission overhead low (Nagle's algorithm).
Thus the phrase "flush" has no real meaning here. And "reset" actually means something completely different with TCP, namely forcibly breaking the connection using the RST flag. So don't use these phrases in this context.
I have a multicast server sending data that must be captured by a Python client. The problem is that recvfrom does not receive any data, or at least receives only the first packet and seems to cache it. If I use recvfrom in a loop, then my data is received correctly.
My question is: why should I use recvfrom in a loop to get the expected behavior?
from socket import *

s = socket(AF_INET, SOCK_DGRAM)
s.bind(('172.30.102.141', 12345))
m = s.recvfrom(1024)
print(m[0])
# sleep for x seconds here
m = s.recvfrom(1024)
print(m[0])
# prints the exact same thing as previously...
One thing is for sure: multicast basically means sending UDP packets, and you have to keep listening for new packets. That is true even for TCP-based communication.
When you use low-level interfaces for network communication, like sockets, it is up to both sides to define an application-level protocol.
That means you define how the receiving party concludes that a message is complete. This matters because a message can get split into multiple parts/packets on its way through the network, so the receiving side has to assemble them properly and then check whether the message is whole. After that, you push it up through your message-processing pipeline, or whatever you do on the receiving side.
When using UDP, the receiving side doesn't know whether there is any packet on its way, so it just tries to recvfrom 1024 bytes and finishes. It doesn't know, and should not care, whether there is more data on its way; it's up to you to take care of that.
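In other words, each recvfrom dequeues exactly one datagram, so collecting several messages means calling it in a loop. A minimal sketch (the function name and the fixed 4096-byte buffer are arbitrary choices for this example):

```python
import socket

def receive_datagrams(sock, expected: int):
    # Each recvfrom returns exactly one datagram; loop to collect several.
    messages = []
    for _ in range(expected):
        data, addr = sock.recvfrom(4096)
        messages.append(data)
    return messages
```

A real receiver would of course loop until some application-level "done" condition rather than a fixed count, since UDP itself never tells you how many datagrams are coming.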
I am new to socket programming and recently picked up Python for it. I have a few questions in mind which I can't seem to find a definite answer for.
I am looking into sending data over UDP and have written a simple Python script to do just that. It works fine sending small objects (small pickled objects, to be exact) across, but how should I handle objects that are too large to fit in one UDP packet?
I've thought of first sizing up the object in bytes. Nothing needs to be done if the object is small enough to fit in a UDP packet, but if the object is too huge, it will be split up evenly (if possible) into many smaller chunks, fitted into multiple UDP packets, and sent across to the client. Once the client receives the chunks, it will reassemble them into the original object.
I immediately hit my first brick wall when trying to implement the above.
From the research I've done, it doesn't seem like there is any 'effective' way of getting the byte size of an object. This means I am unable to determine whether an object is too large to fit in a UDP packet.
What happens if I insist on sending a large object across to the client? Will it get fragmented automatically and be reassembled on the client side, or will the packet be dropped by the client?
What is the right way to handle a large object over UDP? Keep in mind that the large object could be a file 1 GB in size or a byte object 25 MB in size.
Thanks in advance.
Side notes:
I do understand that UDP packets may not always come in order, and therefore I have already implemented a countermeasure for it, which is to tag a sequence number onto the UDP packets sent out to the client.
I do understand that there is no assurance that the client will receive all of the UDP packets. I am not concerned about packet loss for now.
I do understand that TCP is the right candidate for what I am trying to do, but I am focusing on understanding UDP and on how to handle situations where acknowledgement of packets from the client is not possible.
I do understand that the usage of pickle is insecure. I will look into that at a later stage.
A UDP packet can be as large as approximately 64 KB. So if you want to send a file that is larger than that, you have to fragment it yourself into packets of at most 64 KB. That is the theoretical maximum; my advice is to use smaller fragments of around 500 bytes.
IP is responsible for fragmentation and reassembly of the packets if you do use 64 KB packets. Smaller packets of 500 bytes are not likely to be fragmented, because the MTU is usually around 1500 bytes. If you use larger packets that do get fragmented, IP will drop the whole datagram if any one of those fragments is lost.
You are right that TCP is probably a better fit for something like this, or even an existing protocol like TFTP, which implements a per-packet acking mechanism and sequence numbers just like you did.
Most applications dealing with sockets might store data in memory until it is all sent. Bad idea! I have a product application that has to send very large files over the web, and I have used chunking methods in the past. I just rewrote some of my code in Python (binfileio). In my applications, I have sent chunked files to a reserved folder and, once all the chunked files were tucked in bed, I reassembled them. I've never trusted sending large files across a wire that could get cut at any time. Hope this helps.
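A minimal sketch of the split-and-reassemble idea with sequence numbers (the 500-byte chunk size and the 4-byte big-endian header are illustrative choices for this example, not a standard):

```python
import struct

CHUNK = 500  # payload bytes per datagram, well under a typical 1500-byte MTU

def make_chunks(data: bytes):
    # Tag every chunk with a 4-byte sequence number so the receiver
    # can put out-of-order datagrams back in order.
    return [struct.pack("!I", i) + data[off:off + CHUNK]
            for i, off in enumerate(range(0, len(data), CHUNK))]

def reassemble(chunks):
    # Sort by sequence number, then strip the 4-byte headers.
    ordered = sorted(chunks, key=lambda c: struct.unpack("!I", c[:4])[0])
    return b"".join(c[4:] for c in ordered)
```

Each element of `make_chunks` is small enough to go out as one `sendto` call; the receiver collects datagrams into a list and calls `reassemble` once it has them all. Handling *missing* chunks (retransmission) is a separate problem this sketch deliberately ignores.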
Python 3
Code for Sending Data Through UDP Communication
import socket
UDP_IP = "127.0.0.1"
UDP_PORT = 5005
MESSAGE = "Hello, World!"
print("UDP target IP:",UDP_IP)
print("UDP target port:",UDP_PORT)
print("message:",MESSAGE)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(bytes(MESSAGE, "utf-8"), (UDP_IP, UDP_PORT))
I am writing two python scripts to communicate over UDP using python sockets. Here's the related part of code
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))
sock.setblocking(True)  # I want it to be blocking
#(...)
(msg, addr) = sock.recvfrom(4)
#(...)
(msg2, addr2) = sock.recvfrom(2)
I want the receiving to be blocking, and I don't know the size of the whole message before I read the first 4-byte part. The above code blocks forever on the second recvfrom call, whereas a modified version with one recvfrom instead of two works alright:
(msg, addr) = sock.recvfrom(6) #works ok, but isn't enough for my needs
Any idea how I can conveniently read the incoming data in two parts or why the code doesn't work as expected?
socket.recvfrom(size) will (for UDP sockets) read one packet, up to size bytes. The excess data is discarded. If you want to receive the whole packet, you have to pass a larger bufsize, then process the packet in bits (instead of trying to receive it in bits).
If you want a more convenient, less fickle interface to network I/O, consider Twisted.
A read from a UDP socket dequeues the whole datagram.
UDP is a message-based protocol. recvfrom will read the entire message that was originally sent, but if the buffer isn't big enough, it will throw an exception:
socket.error: [Errno 10040] A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself
So I am not sure why you would hang on the 2nd recvfrom if a 6-byte message was originally sent; the first recvfrom should have thrown the exception. Perhaps post an actual working, minimal example of the client and server programs.
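For what it's worth, a quick local experiment shows one recvfrom with a big enough buffer dequeuing the whole datagram. (Note that the too-small-buffer behavior is platform-dependent: the Errno 10040 exception above is what Windows does, while on Linux the excess bytes are silently discarded, which matches the other answer here.)

```python
import socket

# Two UDP sockets on loopback; the receiver binds an ephemeral port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"\x00\x01\x02\x03\x04\x05", rx.getsockname())

# One recvfrom dequeues the entire 6-byte datagram; a second recvfrom
# would block, because nothing else was sent.
msg, addr = rx.recvfrom(1024)
print(msg)
```

This is why a 4-byte-then-2-byte read pattern that works on a TCP stream cannot work on a UDP socket: the datagram boundaries, not the requested sizes, decide what each call returns.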