Data Corrupted When Reading from Byte Stream - python

For a networking project, I'm using UDP Multicast to build an overlay network with my own implementation of IP.
I use the following to parse and build my Header first, then append the payload:
def __init__(self, buffer_size_bytes):
    self.__buffer = bytearray(buffer_size_bytes)
def read_sock(self, listening_socket):
    n_bytes, addr = listening_socket.recvfrom_into(self.__buffer, Packet.HEADER_SIZE)
    packet = Packet.parse_header(self.__buffer)
    if packet.payload_length != 0:
        packet.payload = parse_payload(packet.payload_length, listening_socket)
    self.__router.add_to_route_queue(packet, listening_socket.locator)
def parse_payload(to_read, socket):
    payload = bytearray(to_read)
    view = memoryview(payload)
    while to_read:
        n_bytes, addr = socket.recvfrom_into(view, to_read)
        view = view[n_bytes:]
        to_read -= n_bytes
    return payload
The header seems to be parsed correctly, but the payload gets corrupted every time. I can't figure out what I'm doing wrong when parsing the payload, and I can confirm I'm sending a bytearray from the other side.
For example, when I send a packet with the payload "Hello World" encoded in utf-8, I receive the following:
b'`\x00\x00\x00\x00\x0b\x00\x1f\x00\x00\x00'
The Packet.parse_header method:
@classmethod
def parse_header(cls, packet_bytes):
    values = struct.unpack(cls.ILNPv6_HEADER_FORMAT, packet_bytes[:cls.HEADER_SIZE])
    flow_label = values[0] & 1048575
    traffic_class = (values[0] >> 20 & 255)
    version = values[0] >> 28
    payload_length = values[1]
    next_header = values[2]
    hop_limit = values[3]
    src = (values[4], values[5])
    dest = (values[6], values[7])
    return Packet(src, dest, next_header, hop_limit, version, traffic_class, flow_label, payload_length)
For reference, the entire sent packet looks like this:
b'`\x00\x00\x00\x00\x0b\x00\x1f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01Hello World'
On receiving the first packet, the socket.recvfrom_into blocks when reading for the payload, and doesn't return until I send another message. It then seems to discard the payload of the previous message and use the second packet received as the payload...

Found my explanation here.
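The key point is that I'm using UDP. A UDP socket delivers one whole datagram per receive call and silently discards whatever doesn't fit in the buffer you give it. Reading only Packet.HEADER_SIZE bytes therefore throws the payload away, and the next recvfrom_into blocks until another datagram arrives, whose first bytes are then treated as the payload.
TCP sockets, however, behave like the byte stream I was expecting.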
Fun!
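A minimal sketch of the fix this implies: read each datagram with one receive call into a buffer large enough for the whole packet, then split header and payload in memory. The names split_datagram and read_datagram are hypothetical, and HEADER_SIZE = 40 is an assumption based on the sample packet above, not a value taken from the original Packet class.

```python
import socket

HEADER_SIZE = 40      # assumption: header length implied by the sample packet
MAX_DATAGRAM = 65535  # upper bound on the size of a UDP datagram

def split_datagram(datagram, header_size=HEADER_SIZE):
    """Split one complete datagram into (header_bytes, payload_bytes)."""
    if len(datagram) < header_size:
        raise ValueError("datagram shorter than header")
    return bytes(datagram[:header_size]), bytes(datagram[header_size:])

def read_datagram(sock):
    """Read one whole datagram with a single recvfrom call, then split it."""
    datagram, addr = sock.recvfrom(MAX_DATAGRAM)
    header, payload = split_datagram(datagram)
    return header, payload, addr

# Offline check with a fake 40-byte header followed by the payload
header, payload = split_datagram(bytes(HEADER_SIZE) + b"Hello World")
```

The header bytes could then be handed to Packet.parse_header and the payload attached directly, with no second read from the socket.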

Related

send a json containing a big buffer (bytearray) through sockets: gets truncated

I'm trying to send a JSON message containing text fields and a buffer (a bytearray) from a microcontroller to a Windows server:
msg = {"some_stuff": "some_stuff", "buf": bytearray(b'\xfe\xc2\xf1\xfe\xd5\xc0 ...')}
Note that the buffer is quite long (too long to include here): len(buf) -> 35973
I send the length of the message to the server first, so that it knows how many bytes to expect:
def send_json(conn, msg):
    msg = json.dumps(msg).encode('utf-8')
    msg_length = len(msg)
    header = str(msg_length).encode('utf-8')
    header += b' ' * (64 - len(header))
    conn.send(header)
    conn.send(msg)
The receiving function is then
def receive_json(conn) -> dict:
    msg_length = int(
        conn.recv(64).decode('utf-8').replace(' ', '')
    )
    msg_b = conn.recv(msg_length)
    msg_s = msg_b.decode('utf-8')
    try:
        msg_d = json.loads(msg_s)
    except:
        msg_d = eval(msg_s)
    return msg_d
The problem is that the received message is truncated.
msg_b = b'{"buf": bytearray(b\'\\xfe\\xc2\\xf1 ... \\x06u\\xd0\\xff\\xb'
It's worth mentioning that while debugging, if I pause at a breakpoint on the line msg_b = conn.recv(msg_length) before running it, the received message is complete.
So it seems that in the receiving function, conn.recv(msg_length) does not wait until a message of the specified length (msg_length) has arrived.
Why is that the case? What can I do to receive the complete message?
I could add a time.sleep between receiving the length and receiving the message, but how would I know how long to wait for a given message length?
Thank you
My solution was to check how much of the message is still missing and loop until the message is complete:
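def receive_json(conn) -> dict:
    msg_length = int(
        conn.recv(64).decode('utf-8').replace(' ', '')
    )
    buf = bytearray(b'')
    while len(buf) < msg_length:
        missing_length = msg_length - len(buf)
        packet = conn.recv(missing_length)
        if not packet:
            # recv() returns b'' once the peer has closed; avoid looping forever
            raise ConnectionError('connection closed mid-message')
        buf.extend(packet)
    msg_s = buf.decode('utf-8')
    try:
        msg_d = json.loads(msg_s)
    except ValueError:
        # fallback for non-JSON text; eval() on network data is unsafe
        msg_d = eval(msg_s)
    return msg_d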
TCP is a streaming protocol that guarantees delivery of bytes in the order sent, but it does not preserve the boundaries between individual send calls. You need to define a protocol (which you have: a 64-byte header holding the message size, then the message data), and then buffer reads until you have a complete message.
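As a rough sketch of that buffering, assuming the 64-byte space-padded length header from the question: the helper names recv_exact and recv_message are made up for illustration, and socket.socketpair() is used only to make the example runnable without a real network connection.

```python
import socket

def recv_exact(conn, n):
    """Read exactly n bytes from a stream socket, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))  # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError(f"connection closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)

def recv_message(conn, header_size=64):
    """Read one message: a space-padded ASCII length header, then the body."""
    length = int(recv_exact(conn, header_size).decode("ascii").strip())
    return recv_exact(conn, length)

# Local demonstration over a connected pair of stream sockets
a, b = socket.socketpair()
body = b"x" * 1000
a.sendall(str(len(body)).encode("ascii").ljust(64, b" ") + body)
received = recv_message(b)
a.close()
b.close()
```

Note that the header is read through the same loop as the body, since a short read can happen on the 64 header bytes as well.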
Python sockets have a .makefile method that handles the buffering for you, where you can .read(n) a specific number of bytes or .readline() to read a newline-terminated line. With this you can implement the following client and server:
server.py
import socket
import json
s = socket.socket()
s.bind(('',5000))
s.listen()
while True:
    c,a = s.accept()
    print(f'{a} connected')
    # wrap socket in a file-like buffer
    with c, c.makefile('rb') as r: # read binary so .read(n) gets n bytes
        while True:
            header = r.readline() # read header up to a newline
            if not header: break # if empty bytes, client closed connection
            size = int(header)
            data = json.loads(r.read(size)) # read exactly "size" bytes and decode JSON
            print(f'{a}: {data}')
    print(f'{a} disconnected')
client.py
import socket
import json
def send_json(conn, msg):
    # smaller data size if non-ASCII used.
    data = json.dumps(msg, ensure_ascii=False).encode()
    msg_length = len(data) # length in encoded bytes
    # send newline-terminated header, then data
    conn.sendall(f'{msg_length}\n'.encode())
    conn.sendall(data)
s = socket.socket()
s.connect(('localhost',5000))
with s:
    send_json(s, {'name':'马克'}) # check to support non-ASCII properly
    send_json(s, [1,2,3])
Start server.py, then run client.py a couple of times:
Output:
('127.0.0.1', 26013) connected
('127.0.0.1', 26013): {'name': '马克'}
('127.0.0.1', 26013): [1, 2, 3]
('127.0.0.1', 26013) disconnected
('127.0.0.1', 26015) connected
('127.0.0.1', 26015): {'name': '马克'}
('127.0.0.1', 26015): [1, 2, 3]
('127.0.0.1', 26015) disconnected

Using a custom socket recvall function works only if the thread is put to sleep

I have the following socket listening on my local network:
def recvall(sock):
    BUFF_SIZE = 4096 # 4 KiB
    fragments = []
    while True:
        chunk = sock.recv(BUFF_SIZE)
        fragments.append(chunk)
        # if the following line is removed, data is omitted
        time.sleep(0.005)
        if len(chunk) < BUFF_SIZE:
            break
    data = b''.join(fragments)
    return data
def main():
    pcd = o3d.geometry.PointCloud()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('192.168.0.22', 2525))
    print("starting listening...")
    s.listen(1)
    counter = 0
    while True:
        clientsocket, address = s.accept()
        print(f"Connection from {address} has been established!")
        received_data = recvall(clientsocket)
        clientsocket.send(bytes(f"response nr {counter}!", "utf-8"))
        counter += 1
        print(len(received_data))
if __name__ == "__main__":
    main()
I'm sending 172800 bytes of data to this port from an app on my mobile phone.
As you can see, I print the amount of data received. The amount is only correct if I use time.sleep() as shown in the code above. Without it, only part of the data is received.
This is obviously a timing issue. The question is: how can I be sure to receive all the data every time without using time.sleep() (which is not 100% certain to work either, depending on the sleep time)?
sock.recv() returns the data that is available. The relevant piece from the man page of recv(2) is:
The receive calls normally return any data available, up to the requested amount,
rather than waiting for receipt of the full amount requested.
In your case, time.sleep(0.005) seems to allow for all the remaining data of the message to arrive and be stored in the buffer.
There are some options to eliminate the need for time.sleep(0.005). Which one is the most appropriate depends on your needs.
If the sender sends data but does not expect a response, you can have the sender close the socket after it sends the data, i.e., sock.close() after sock.sendall(). recv() will then return an empty bytes object, which the receiver can use to break out of the while loop.
def recvall(sock):
    BUFF_SIZE = 4096
    fragments = []
    while True:
        chunk = sock.recv(BUFF_SIZE)
        if not chunk:
            break
        fragments.append(chunk)
    return b''.join(fragments)
If the sender sends messages of fixed length, e.g., 172800 bytes, you can use recv() in a loop until the receiver receives an entire message.
def recvall(sock, length=172800):
    fragments = []
    while length:
        chunk = sock.recv(length)
        if not chunk:
            raise EOFError('socket closed')
        length -= len(chunk)
        fragments.append(chunk)
    return b''.join(fragments)
Other options include (a) adding a delimiter, e.g., a special character that cannot be part of the data, at the end of each message the sender sends, so that the receiver can run recv() in a loop until it detects the delimiter, and (b) prefixing each message on the sender with its length, so that the receiver knows how many bytes to expect for each message.
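Option (a) could look roughly like the sketch below. The function name recv_until and the newline delimiter are assumptions for illustration; for raw binary data such as the 172800-byte payload in the question, a delimiter only works if that byte sequence can never occur in the data, so a length prefix is usually the safer choice there.

```python
import socket

def recv_until(sock, delimiter=b"\n", bufsize=4096):
    """Read from a stream socket until the delimiter has arrived.

    Returns (message, leftover): the bytes before the delimiter, and any bytes
    received after it, which belong to the next message.
    """
    buf = bytearray()
    while delimiter not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise EOFError("socket closed before delimiter")
        buf.extend(chunk)
    message, _, leftover = bytes(buf).partition(delimiter)
    return message, leftover

# Local demonstration over a connected pair of stream sockets
a, b = socket.socketpair()
a.sendall(b"hello\nwor")
message, leftover = recv_until(b)
a.close()
b.close()
```

A caller that reads several messages from one connection would need to prepend leftover to the next read rather than discard it.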

Getting audio file from a UDP packet

Posting this here out of desperation. Any help is appreciated. Thank you.
Backstory:
I am helping my friend with a device that he got from China. The device supposedly sends an audio file to my server using UDP.
Assuming you want some Python code to do this automatically, here's how I'd validate and decode the packet:
import struct
def decode_packet(packet):
    framehead, version, command, datalen = struct.unpack_from('!HBBH', packet)
    valid = (
        framehead == 0x55aa and
        version == 0x00 and
        command == 0x1e and
        len(packet) <= datalen + 11
    )
    if not valid:
        # ignore other protocols using this address/port
        print(
            ' header invalid',
            f'{framehead:04x} {version:02x} {command:02x} {datalen:04x}'
        )
        return
    if len(packet) < datalen + 11:
        print(' warning: packet was truncated')
    offset, = struct.unpack_from('!I', packet, 6)
    if datalen == 4:
        print(f' end of data: file size={offset}')
        return
    data = packet[10:10+datalen]
    print(f' got data: offset={offset} len={len(data)} hex(data)={data.hex()}')
    if len(packet) == datalen + 11:
        print(f' hex(checksum)={packet[datalen + 10:].hex()}')
It obviously prints out a lot of stuff, but that is useful for checking whether the device actually follows the documented protocol. It doesn't seem to, as the +4 on the data length doesn't appear to be applied. You can test this with:
decode_packet(bytes.fromhex('55aa001e038400000000a9b6ad98d2923...'))
Assuming you can get this to work correctly, you can put it into some code that listens for packets on the correct port:
import socket
def server(portnum):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(('', portnum))
        while True:
            packet, addr = sock.recvfrom(10240)
            print(f'received {len(packet)} bytes from {addr[0]}')
            decode_packet(packet)
Again, this doesn't do much. You'd want to write the data to a file rather than printing it out, but you can pull the offset out, and you get a signal for when the data has finished transferring.
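One possible way to do the file-writing step is to seek to each chunk's offset before writing, so that chunks arriving out of order still land in the right place. This is only a sketch: store_chunk is a hypothetical helper, and it assumes the offset field really is the byte position of the chunk within the file, which the device's behavior would need to confirm.

```python
import io

def store_chunk(out_file, offset, data):
    """Write one received chunk at its byte offset in a seekable binary file."""
    out_file.seek(offset)
    out_file.write(data)

# Demonstration with an in-memory file; the chunks arrive out of order.
buf = io.BytesIO()
store_chunk(buf, 4, b"efgh")
store_chunk(buf, 0, b"abcd")
assembled = buf.getvalue()
```

With a real file, it would be opened once in binary write mode (for example open(path, 'wb')) and closed when the end-of-data packet arrives.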

Python - Sending Packets issuing error - Minecraft Packets

I'm using the following script:
import socket
import struct
username = "username_value"
verification_key = "verification_key"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # boilerplate
s.connect(("example.com", 1234)) # adjust accordingly
# now for the packet
# note that the String type is specified as having a length of 64, we'll pad that
packet = ""
packet += struct.pack("B", 1) # packet type
packet += struct.pack("B", 7) # protocol version
packet += "%-64s" % username # magic!
packet += "%-64s" % verification_key
packet += struct.pack("B", 0) # that unused byte, assuming a NULL byte here
# send what we've crafted
s.send(packet)
and getting a response of:
packet += struct.pack("B", 1) # packet type
TypeError: Can't convert 'bytes' object to str implicitly
I am almost brand-new to Python, and just started, but I understand the language. I read up and found something about Python 3 changing the way you use packets. I feel kind of hopeless. Help? Thank you
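In Python 3 you have to explicitly define your packet as a bytes object:
packet = b""
instead of packet = "". The two string fields must also be converted to bytes before they are appended, for example with ("%-64s" % username).encode("ascii").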
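For illustration, the whole packet from the question could be assembled in a single struct.pack call. This is a sketch under assumptions: build_login_packet is a made-up name, the fields are assumed to be ASCII, and the space padding simply mirrors the "%-64s" formatting in the question rather than a confirmed protocol requirement.

```python
import struct

def build_login_packet(username, verification_key):
    """Build the packet from the question as a bytes object (131 bytes)."""
    return struct.pack(
        "!BB64s64sB",
        1,                                                 # packet type
        7,                                                 # protocol version
        username.encode("ascii").ljust(64, b" "),          # 64-byte padded string
        verification_key.encode("ascii").ljust(64, b" "),  # 64-byte padded string
        0,                                                 # unused byte
    )

packet = build_login_packet("username_value", "verification_key")
```

The result can be passed directly to s.sendall(packet), since it is already bytes.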

Sending low level raw tcp packets python

I have been working on a program that sends raw packets. We recently had a lecture about raw packets, so I have been trying to learn and do exactly what my professor told me. My program fails with an error saying "destination address required". Because the socket is raw, I don't want to call socket.connect(destaddr), even though that fixes the error. Here is my code:
Here is the class and function:
#not real mac address to protect privacy also removed preamble
class packet(object):
    b = ""
    def __init__(self, payload):
        self.payload = payload
    def ether(self):
        #preamble = "55555555555555D5"
        macdest = "123456789101" #my mac address - needed to remove colons
        macsource = "123456789101" #router mac address without colons
        ethertype = "0800" #removed 0x because it is not needed
        fcs = "" #frame check sequence none so far
        frame = macdest+macsource+ethertype
        return frame
    def ip(self): #in hexadecimal
        version = "4" #ipv4 hex
        ihl = "5" #header length hex
        dscp = "00" #default
        ecn = "00" #default
        length = "36" #ether-24 + ip-20 + tcp-30 = 54 to hexa = 35
        idip="0000" #random id
        flags = "40" #dont fragment flag is 2 to hex is 4
        offset = "00" #space taker
        ttl = "40"#hex(64) = 40
        protocol = "06" #for tcp
        checksum = "0000"
        ipaddrfrom = "c0a8010a"
        ipaddrto = "c0a80101"
        datagram = version+ihl+dscp+ecn+length+idip+flags+offset+ttl+protocol+checksum+ipaddrfrom+ipaddrto
        return datagram
    def tcp(self):
        portsrc = "15c0" #5568
        portdest = "0050" #80
        syn = "00000000"
        ack = "00000000"
        nonce = "80"
        fin = "10"
        windowscale = "813b"
        checksum = "0000"
        segment = portsrc+portdest+syn+ack+nonce+fin+windowscale + checksum
        return segment
    def getpacket(self):
        frame = self.ether()
        datagram = self.ip()
        segment = self.tcp()
        payload = self.payload
        packet = frame+datagram+segment+payload
        a = 0
        b = ""
        for char in packet:
            a = a+1
            b = b + char
            if a == 4:
                b = b + " "
                a=0
        self.fmtpacket = b
        return packet
def raw():
    s = socket(AF_INET, SOCK_RAW, IPPROTO_IP)
    s.bind(('192.168.1.10', 0))
    pckt = packet("")
    netpacket = pckt.getpacket()
    print "Sending: " + pckt.fmtpacket
    print ""
    s.sendall(netpacket)
    data = s.recv(4096)
    print data
If your professor is okay with it, you may find Scapy a lot easier to work with for creating raw packets in Python.
From their website:
Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more. It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, tethereal, p0f, etc.)
Is there a reason for binding to '0.0.0.0'? When you create a raw socket, you'll need to bind it to an interface.
One thing I notice is that you need actual byte values rather than hex digits written as text.
Right now, you're concatenating characters.
For example, in ip(), version + ihl = '45'. That's a two-character string, not a hex value. When you send it as part of a raw packet, it goes out as two bytes (0x34 0x35) instead of the one you want. You want to send '\x45', not '45'.
The packet to be sent should contain the actual bytes, not their string representation.
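The difference can be checked quickly. The snippet below is a Python 3 sketch (the question's code is Python 2) that uses bytes.fromhex to turn hex text into the bytes it describes, with values taken from the ip() method in the question.

```python
import socket

hex_text = "45"                       # version + ihl, as built in ip()
as_text = hex_text.encode("ascii")    # the characters '4' and '5': 2 bytes
as_bytes = bytes.fromhex(hex_text)    # the single byte 0x45

# The same conversion applied to the source address from the question
addr_bytes = bytes.fromhex("c0a8010a")
addr_text = socket.inet_ntoa(addr_bytes)  # dotted-quad form of those 4 bytes
```

Here as_text is the two bytes 0x34 0x35, while as_bytes is the one byte 0x45 that belongs in the header, and addr_text comes out as '192.168.1.10', the address the question binds to.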
