Python String Prefix by 4 Byte Length

Python String Prefix by 4 Byte Length - python

I'm trying to write a server in Python to communicate with a pre-existing client whose message packets are ASCII strings, but prepended by four-byte unsigned integer values representative of the length of the remaining string.
I've done a receiver, but I'm sure there's a a more pythonic way. Until I find it, I haven't done the sender. I can easily calculate the message length, convert it to bytes and transmit the message.The bit I'm struggling with is creating an integer which is an array of four bytes.
Let me clarify: If my string is 260 characters in length, I wish to prepend a big-endian four byte integer representation of 260. So, I don't want the ASCII string "0260" in front of the string, rather, I want four (non-ASCII) bytes representative of 0x00000104.
My code to receive the length prepended string from the client looks like this:
sizeBytes = 4 # size of the integer representing the string length
# receive big-endian 4 byte integer from client
data = conn.recv(sizeBytes)
if not data:
break
dLen = 0
for i in range(sizeBytes):
dLen = dLen + pow(2,i) * data[sizeBytes-i-1]
data = str(conn.recv(dLen),'UTF-8')
I could simply do the reverse. I'm new to Python and feel that what I've done is probably longhand!!
1) Is there a better way of receiving and decoding the length?
2) What's the "sister" method to encode the length for transmission?
Thanks.

The struct module is helpful here
for writing:
import struct
msg = 'some message containing 260 ascii characters'
length = len(msg)
encoded_length = struct.pack('>I', length)
encoded_length will be a string of 4 bytes with value '\x00\x00\x01\x04'
for reading:
length = struct.unpack('>I', received_msg[:4])[0]

An example using asyncio:
import asyncio
import struct
def send_message(writer, message):
data = message.encode()
size = struct.pack('>L', len(data))
writer.write(size + data)
async def receive_message(reader):
data = await reader.readexactly(4)
size = struct.unpack('>L', data)[0]
data = await reader.readexactly(size)
return data.decode()
The complete code is here

Related

Trying to send string variable via Python socket

I'm in a CTF competition and I'm stuck on a challenge where I have to retrieve a string from a socket, reverse it and get it back. The string changes too fast to do it manually. I'm able to get the string and reverse it but am failing at sending it back. I'm pretty sure I'm either trying to do something that's not possible or am just too inexperienced at Python/sockets/etc. to kung fu my way through.
Here's my code:
import socket
aliensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
aliensocket.connect(('localhost', 10000))
aliensocket.send('GET_KEY'.encode())
key = aliensocket.recv(1024)
truncKey = str(key)[2:16]
revKey = truncKey[::-1]
print(truncKey)
print(revKey)
aliensocket.send(bytes(revKey.encode('UTF-8')))
print(aliensocket.recv(1024))
aliensocket.close()
And here is the output:
F9SIJINIK4DF7M
M7FD4KINIJIS9F
b'Server expects key to unlock or GET_KEY to retrieve the reversed key'

key is received as a byte string. The b'' wrapped around it when printed just indicates it is a byte string. It is not part of the string. .encode() turns a Unicode string into a byte string, but you can just mark a string as a byte string by prefixing with b.
Just do:
aliensocket.send(b'GET_KEY')
key = aliensocket.recv(1024)
revKey = truncKey[::-1]
print(truncKey) # or do truncKey.decode() if you don't want to see b''
print(revKey)
aliensocket.send(revKey)

data = ''
while True:
chunk = aliensocket.recv(1)
data +=chunk
if not chunk:
rev = data[::-1]
aliensocket.sendall(rev)
break

Python 3 garbage data into the struct.unpack

I am trying to send and receive messages via socket using Python 3.
BUFFER_SIZE = 1024
# Create message
MLI = struct.pack("!I", len(MESSAGE))
MLI_MESSAGE = MLI + str.encode(MESSAGE)
When the message receive:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(MLI_MESSAGE)
print ("Sent data: ‘", MESSAGE, "’")
# Receive MLI from response (you might want to add some timeout handling as well
resp = s.recv(BUFFER_SIZE)
resp = struct.unpack("!I", resp)[0]
print(resp)
resp:
b'\x00\t\xeb\x07\xdf\x01\x00\xdf\x02\x010'
I am getting that error:
struct.error: unpack requires a buffer of 4 bytes
I think it is related with \t char into the resp but I am not sure. How can I remove that \t char and how to solve that issue?

You are basically trying to do the following (sockets removed):
1 import struct
2
3 msg = "foobar"
4 mli = struct.pack("!I", len(msg))
5 mli_msg = mli + str.encode(msg)
6
7 len = struct.unpack("!I", mli_msg)[0]
8 print(len)
The extraction of the length in line 7 will fail since you put the whole mli_msg as argument to unpack, not only the expected 4 bytes for the len. Instead you should do:
7 len = struct.unpack("!I", mli_msg[:4])[0]
Apart from that it is wrong to first take the length of the message and then convert the message to bytes. The first takes the number of characters while the latter takes the number of bytes, which will differ when non-ASCII characters are involved - check len("ü") vs. len(str.encode("ü")). You need to first convert the message to bytes thus and then take the length to provide the correct byte length for what you send.
4 encoded_msg = str.encode(msg)
5 mli_msg = struct.pack("!I", len(encoded_msg)) + encoded_msg

Explanation of !I:
! indicates big endian alignment
I indicates unsigned integer type, occupying 4 bytes.
The variable resp value is b'\x00\t\xeb\x07\xdf\x01\x00\xdf\x02\x010', and the length exceeds 4.
You can intercept 4 bytes for parsing, like below.
import struct
resp = b'\x00\t\xeb\x07\xdf\x01\x00\xdf\x02\x010'
print(struct.unpack("!I", resp[:4])[0])
# 649991

How to read UTF16-BE encoded bytes with length header

I want to decode a series of strings of variable length which have been encoded in UTF16-BE preceded by a two-bytes long big-endian integer indicating the half the byte-length of the following string. e.g:
Length String (encoded) Length String (encoded) ...
\x00\x05 \x00H\x00e\x00l\x00l\x00o \x00\x06 \x00W\x00o\x00r\x00l\x00d\x00! ...
All these strings and their length headers are concatenated in one big bytestring.
I have the encoded bytestring as bytes object in memory. I would like to have an iterable function which would yield strings until it reaches the end of the ByteString.

Not a huge improvement, but your code can be streamlined a bit.
def decode_strings(byte_string: ByteString) -> Generator[str]:
with io.BytesIO(byte_string) as stream:
while (s := stream.read(2)):
length = int.from_bytes(s, byteorder="big")
yield bytes.decode(stream.read(length), encoding="utf_16_be")

Currently I do it like this, but somehow I'm imagining Raymond Hettinger's "There must be a better way!".
import io
import functools
from typing import ByteString
from typing import Iterable
# Decoders
int_BE = functools.partial(int.from_bytes, byteorder="big")
utf16_BE = functools.partial(bytes.decode, encoding="utf_16_be")
encoded_strings = b"\x00\x05\x00H\x00e\x00l\x00l\x00o\x00\x06\x00W\x00o\x00r\x00l\x00d\x00!"
header_length = 2
def decode_strings(byte_string: ByteString) -> Iterable[str]:
stream = io.BytesIO(byte_string)
while True:
length = int_BE(stream.read(header_length))
if length:
text = utf16_BE(stream.read(length * 2))
yield text
else:
break
stream.close()
if __name__ == "__main__":
for text in decode_strings(encoded_strings):
print(text)
Thanks for any suggestions.

How do I 'declare' an empty bytes variable?

How do I initialize ('declare') an empty bytes variable in Python 3?
I am trying to receive chunks of bytes, and later change that to a
utf-8 string.
However, I'm not sure how to initialize the initial variable that will
hold the entire series of bytes. This variable is called msg.
I can't initialize it as None, because you can't add a bytes and a
NoneType. I can't initialize it as a unicode string, because then
I will be trying to add bytes to a string.
Also, as the receiving program evolves it might get me in to a mess
with series of bytes that contain only parts of characters.
I can't do without a msg initialization, because then msg would be
referenced before assignment.
The following is the code in question
def handleClient(conn, addr):
print('Connection from:', addr)
msg = ?
while 1:
chunk = conn.recv(1024)
if not chunk:
break
msg = msg + chunk
msg = str(msg, 'UTF-8')
conn.close()
print('Received:', unpack(msg))

Just use an empty byte string, b''.
However, concatenating to a string repeatedly involves copying the string many times. A bytearray, which is mutable, will likely be faster:
msg = bytearray() # New empty byte array
# Append data to the array
msg.extend(b"blah")
msg.extend(b"foo")
To decode the byte array to a string, use msg.decode(encoding='utf-8').

bytes() works for me;
>>> bytes() == b''
True

Use msg = bytes('', encoding = 'your encoding here').
Encase you want to go with the default encoding, simply use msg = b'', but this will garbage the whole buffer if its not in the same encoding

As per documentation:
Blockquote
socket.recv(bufsize[, flags])
Receive data from the socket. The return value is a string representing the data received.
Blockquote
So, I think msg="" should work just fine:
>>> msg = ""
>>> msg
''
>>> len(msg)
0
>>>

To allocate bytes of some arbitrary length do
bytes(bytearray(100))

Reading UDP Packets

I am having some trouble dissecting a UDP packet. I am receiving the packets and storing the data and sender-address in variables 'data' and 'addr' with:
data,addr = UDPSock.recvfrom(buf)
This parses the data as a string, that I am now unable to turn into bytes. I know the structure of the datagram packet which is a total of 28 bytes, and that the data I am trying to get out is in bytes 17:28.
I have tried doing this:
mybytes = data[16:19]
print struct.unpack('>I', mybytes)
--> struct.error: unpack str size does not match format
And this:
response = (0, 0, data[16], data[17], 6)
bytes = array('B', response[:-1])
print struct.unpack('>I', bytes)
--> TypeError: Type not compatible with array type
And this:
print "\nData byte 17:", str.encode(data[17])
--> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
More specifically I want to parse what I think is an unsigned int. And now I am not sure what to try next. I am completely new to sockets and byte-conversions in Python, so any advice would be helpful :)
Thanks,
Thomas

An unsigned int32 is 4 bytes long, so you have to feed 4 bytes into struct.unpack.
Replace
mybytes = data[16:19]
with
mybytes = data[16:20]
(right number is the first byte not included, i.e. range(16,19) = [16,17,18]) and you should be good to go.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python String Prefix by 4 Byte Length - python

Related

Trying to send string variable via Python socket

Python 3 garbage data into the struct.unpack

How to read UTF16-BE encoded bytes with length header

How do I 'declare' an empty bytes variable?

Reading UDP Packets

Categories

Resources