Exclude escaped byte char from serial.read_until() - python

I'm writing code to communicate back and forth with a module over serial which returns specific byte values to indicate the start/end of its communication. The length of the data returned can vary as can all content between the start header and end footer.
In an ideal scenario, I'd be able to use the following code to receive all data from the module:
start = b'\x5a'
end = b'\x5b'
max_size = 1024
def get_from_serial(ser: serial.Serial) -> bytes:
with ser:
_ = ser.read_until(expected=start, size=max_size)
data = ser.read_until(expected=end, size=max_size)
return start + data
Unfortunately, there are circumstances where the data sent by the module includes bytes that match either the start or end byte values. In these instances, the module prepends an escape character to them:
valid_start = b'\x5a'
valid_end = b'\x5b'
escaped_start = b'\x5c\x5a'
escaped_end = b'\x5c\x5b'
A valid start/end byte can be preceded by ANY byte value other than an escape one:
good_result = b'\x5a\xff\x5c\x5b\xff\x5b'
bad_result = b'\x5a\xff\x5c\x5b' # missed b'\xff\x5b'
Is there a way to configure ser.read_until() to ignore any escaped instance of a start/end byte and only return when encountering a valid start/end byte?
There's probably a way to do this with a loop that checks if data[-2] == b'\x5c': each time ser.read_until() returns something though I feel it could get complicated if the module returns multiple instances of an escaped start/end byte scattered throughout the data.
Any thoughts or suggestions would be greatly appreciated.
Edit:
Starting to think this isn't actually possible to do from inside ser.read_until() so have added a check before returning the data.
start = b'\x5a'
end = b'\x5b'
escape = b'\x5c'
max_size = 1024
def get_from_serial(ser: serial.Serial) -> bytes:
with ser:
_ = ser.read_until(expected=start, size=max_size)
data = ser.read_until(expected=end, size=max_size)
if valid_packet(data):
return start + data
else:
raise Exception("Invalid packet")
def valid_packet(packet: bytearray) -> bool:
header = packet[:1]
footer = packet[-1:]
escape_check = packet[-2:-1]
valid_header = header == start
valid_footer = footer == end
not_escaped = escape_check != escape
return all([
valid_header,
valid_footer,
not_escaped
])

Related

Trying to send string variable via Python socket

I'm in a CTF competition and I'm stuck on a challenge where I have to retrieve a string from a socket, reverse it and get it back. The string changes too fast to do it manually. I'm able to get the string and reverse it but am failing at sending it back. I'm pretty sure I'm either trying to do something that's not possible or am just too inexperienced at Python/sockets/etc. to kung fu my way through.
Here's my code:
import socket
aliensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
aliensocket.connect(('localhost', 10000))
aliensocket.send('GET_KEY'.encode())
key = aliensocket.recv(1024)
truncKey = str(key)[2:16]
revKey = truncKey[::-1]
print(truncKey)
print(revKey)
aliensocket.send(bytes(revKey.encode('UTF-8')))
print(aliensocket.recv(1024))
aliensocket.close()
And here is the output:
F9SIJINIK4DF7M
M7FD4KINIJIS9F
b'Server expects key to unlock or GET_KEY to retrieve the reversed key'
key is received as a byte string. The b'' wrapped around it when printed just indicates it is a byte string. It is not part of the string. .encode() turns a Unicode string into a byte string, but you can just mark a string as a byte string by prefixing with b.
Just do:
aliensocket.send(b'GET_KEY')
key = aliensocket.recv(1024)
revKey = truncKey[::-1]
print(truncKey) # or do truncKey.decode() if you don't want to see b''
print(revKey)
aliensocket.send(revKey)
data = ''
while True:
chunk = aliensocket.recv(1)
data +=chunk
if not chunk:
rev = data[::-1]
aliensocket.sendall(rev)
break

Encoding a file with ord function

I'm trying to encode a file and output the encode into a new file, but I got this error:
TypeError: ord() expected string of length 1, but int found
My code:
from sys import argv, exit
def encode(data):
encoded = ''
while data:
current = data[0]
count = 1
for i in data[1:]:
if i == current:
count += 1
else:
break
if count == 255:
break
encoded += '{}{}'.format(chr(ord(current) & 255), chr(count & 255)) #error occurs here.
data = data[count:]
return encoded
if __name__ == '__main__':
if len(argv) < 2:
print('Please specify input file!')
exit(0)
with open(argv[1], 'rb') as (f):
data = f.read()
with open(argv[1] + '.out', 'wb') as (f):
f.write(encode(data))
Additional question: How do I decode the encoded file?
You are reading bytes (open(..., 'rb')), so when you take one element of the byte string, you get a byte, ie. a number. This number already is the character code, so just leave out the ord. Alternatively, you could open the file without the b modifier (open(..., 'r')), which will return a string; I would advise to keep it as a byte string though (or you could run into encoding issues if you are parsing something non-ascii).
You will run into a similar problem saving your file: you cannot write a string into a file opened with the b modifier. Since you have characters outside the ascii range (>128), writing as a string is not a good idea, since python will try to encode your characters (eg. in UTF-8), and you will end up with completely different bytes. Therefore, the best solution probably is not to concat your data to a string in your loop (the part where you do '{}{}'.format(...), but to have a list (encoded = [], concat with encoded.append(current)) and convert that to a byte string using bytes(encoded) after your loop. You can then pass that to write without a problem.
As for how to decode your file, you can just open the file like you do for encoding, read two bytes b1 and b2, and append [b1]*b2 to your output (again, as a list), and convert that to a byte string with bytes().

How to read UTF16-BE encoded bytes with length header

I want to decode a series of strings of variable length which have been encoded in UTF16-BE preceded by a two-bytes long big-endian integer indicating the half the byte-length of the following string. e.g:
Length String (encoded) Length String (encoded) ...
\x00\x05 \x00H\x00e\x00l\x00l\x00o \x00\x06 \x00W\x00o\x00r\x00l\x00d\x00! ...
All these strings and their length headers are concatenated in one big bytestring.
I have the encoded bytestring as bytes object in memory. I would like to have an iterable function which would yield strings until it reaches the end of the ByteString.
Not a huge improvement, but your code can be streamlined a bit.
def decode_strings(byte_string: ByteString) -> Generator[str]:
with io.BytesIO(byte_string) as stream:
while (s := stream.read(2)):
length = int.from_bytes(s, byteorder="big")
yield bytes.decode(stream.read(length), encoding="utf_16_be")
Currently I do it like this, but somehow I'm imagining Raymond Hettinger's "There must be a better way!".
import io
import functools
from typing import ByteString
from typing import Iterable
# Decoders
int_BE = functools.partial(int.from_bytes, byteorder="big")
utf16_BE = functools.partial(bytes.decode, encoding="utf_16_be")
encoded_strings = b"\x00\x05\x00H\x00e\x00l\x00l\x00o\x00\x06\x00W\x00o\x00r\x00l\x00d\x00!"
header_length = 2
def decode_strings(byte_string: ByteString) -> Iterable[str]:
stream = io.BytesIO(byte_string)
while True:
length = int_BE(stream.read(header_length))
if length:
text = utf16_BE(stream.read(length * 2))
yield text
else:
break
stream.close()
if __name__ == "__main__":
for text in decode_strings(encoded_strings):
print(text)
Thanks for any suggestions.

Python Memorybuffer pywin32

I got some code:
def get_text(self, id):
edit_hwnd = win32gui.GetDlgItem(self.hwnd, id) # 获取窗口句柄
time.sleep(0.2)
self.edit_hwnd = edit_hwnd
length = win32api.SendMessage(
edit_hwnd, win32con.WM_GETTEXTLENGTH) + 1 # 获取窗体内容长度
buf = win32gui.PyMakeBuffer(length) # 准备buffer对象作为容器
win32gui.SendMessage(edit_hwnd, win32con.WM_GETTEXT,
length, buf) # 获取窗体内容放入容器
try:
address, length = win32gui.PyGetBufferAddressAndLen(buf) # 获取容器的内存地址
except ValueError:
print('error')
return
text = win32gui.PyGetString(address, length) # 取得字符串
buf.release()
del buf
return text
This function for get string at windows.I need to while this func to always get this value.When the value changed,i do something.But now when i done this while,my program exit with error code C000005.How can i fix it.
buf.release()
del buf
It's i added when i found this problem.It look like does't work.
The messages WM_GETTEXTLENGTH returns the length of the text in characters (excluding the terminating null character) and the maximum buffer length given to WM_GETTEXT also is based on characters (including the terminating null character).
A character in the NT-based Windows systems is encoded in a double-byte character set (DBCS), meaning two bytes per character.
The function win32gui.PyMakeBuffer(length) returns a buffer of length bytes.
So if length is the return value of WM_GETTEXTLENGTH, the reserved buffer should be length * 2 + 2 bytes long and the maximum buffer length given to WM_GETTEXT should be length + 1.

Python pySerial read data from arduino breaks when sending "(char)0"

I send some data from an arduino using pySerial.
My Data looks like
bytearray(DST, SRC, STATUS, TYPE, CHANNEL, DATA..., SIZEOFDATA)
where sizeofData is a test that all bytes are received.
The problem is, every time when a byte is zero, my python program just stops reading there:
serial_port = serial.Serial("/dev/ttyUSB0")
while serial_port.isOpen():
response_header_str = serial_port.readline()
format = '>';
format += ('B'*len(response_header_str));
response_header = struct.unpack(format, response_header_str)
pprint(response_header)
serial_port.close()
For example, when I send bytearray(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) everything is fine. But when I send something like bytearray(1,2,3,4,0,1,2,3,4) I don't see everything beginning with the zero.
The problem is that I cannot avoid sending zeros as I am just sending the "memory dump" e.g. when I send a float value, there might be zero bytes.
how can I tell pyserial not to ignore zero bytes.
I've looked through the source of PySerial and the problem is in PySerial's implementation of FileLike.readline (in http://svn.code.sf.net/p/pyserial/code/trunk/pyserial/serial/serialutil.py). The offending function is:
def readline(self, size=None, eol=LF):
"""\
Read a line which is terminated with end-of-line (eol) character
('\n' by default) or until timeout.
"""
leneol = len(eol)
line = bytearray()
while True:
c = self.read(1)
if c:
line += c
if line[-leneol:] == eol:
break
if size is not None and len(line) >= size:
break
else:
break
return bytes(line)
With the obvious problem being the if c: line. When c == b'\x00' this evaluates to false, and the routine breaks out of the read loop. The easiest thing to do would be to reimplement this yourself as something like:
def readline(port, size=None, eol="\n"):
"""\
Read a line which is terminated with end-of-line (eol) character
('\n' by default) or until timeout.
"""
leneol = len(eol)
line = bytearray()
while True:
line += port.read(1)
if line[-leneol:] == eol:
break
if size is not None and len(line) >= size:
break
return bytes(line)
To clarify from your comments, this is a replacement for the Serial.readline method that will consume null-bytes and add them to the returned string until it hits the eol character, which we define here as "\n".
An example of using the new method, with a file-object substituted for the socket:
>>> # Create some example data terminated by a newline containing nulls.
>>> handle = open("test.dat", "wb")
>>> handle.write(b"hell\x00o, w\x00rld\n")
>>> handle.close()
>>>
>>> # Use our readline method to read it back in.
>>> handle = open("test.dat", "rb")
>>> readline(handle)
'hell\x00o, w\x00rld\n'
Hopefully this makes a little more sense.

Categories

Resources