How to unpack variable length data in python struct - python

I am building a p2p application in which I need to unpack a UDP announce response from the tracker.
announce response:
Offset        Size              Name
0             32-bit integer    action
4             32-bit integer    transaction_id
8             32-bit integer    interval
12            32-bit integer    leechers
16            32-bit integer    seeders
20 + 6 * n    32-bit integer    IP address
24 + 6 * n    16-bit integer    TCP port
20 + 6 * N
I need to read all the data listed in the Name column of the table above, but the number of IP address and port entries is variable.
How do I unpack all the data from the response properly using Python's struct module?

The struct module is well suited to unpacking binary data. Read the fixed 20-byte header first:
import struct
action, transaction_id, interval, leechers, seeders = struct.unpack_from('>iiiii', announce_response)
You then have to loop over the rest of the data, advancing 6 bytes at a time, to get all the ip/port pairs:
ip_port_list = []
for offset in range(20, len(announce_response) - 5, 6):
    ip_port_list.append(struct.unpack_from('>IH', announce_response, offset))
Without struct you would have to read the data byte by byte and handle the big-endian conversion yourself.
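As a fuller sketch of the same idea, the fixed 20-byte header and the 6-byte peer entries can each be described by a precompiled struct.Struct. The name parse_announce and the sample bytes below are made up for illustration; the layout follows the table in the question, and trailing bytes that do not form a whole entry are simply ignored.

```python
import ipaddress
import struct

# Layout taken from the table in the question; all fields are big-endian.
HEADER = struct.Struct(">iiiii")  # action, transaction_id, interval, leechers, seeders
PEER = struct.Struct(">IH")       # IPv4 address, TCP port


def parse_announce(response):
    """Split an announce response into its header fields and a peer list."""
    if len(response) < HEADER.size:
        raise ValueError("announce response is shorter than 20 bytes")
    action, transaction_id, interval, leechers, seeders = HEADER.unpack_from(response)
    body = response[HEADER.size:]
    # Drop a trailing partial entry, since iter_unpack needs whole records.
    usable = len(body) - len(body) % PEER.size
    peers = [
        (str(ipaddress.IPv4Address(ip)), port)
        for ip, port in PEER.iter_unpack(body[:usable])
    ]
    return {
        "action": action,
        "transaction_id": transaction_id,
        "interval": interval,
        "leechers": leechers,
        "seeders": seeders,
        "peers": peers,
    }


# Hypothetical response carrying two peers, built only to exercise the parser.
sample = (
    HEADER.pack(1, 0x1234, 1800, 3, 2)
    + PEER.pack(0xC0A8012A, 6881)
    + PEER.pack(0x0A000001, 51413)
)
result = parse_announce(sample)
print(result["peers"])
```

Precompiling the two formats keeps the sizes (HEADER.size, PEER.size) next to the format strings, so the offsets never have to be written out by hand.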

I experienced the same struggle, so I wrote a simple package for this called binread.
from binread import formatclass, U32, U16, Array
@formatclass
class Seeder:
    ip_addr = U32
    tcp_port = U16
@formatclass
class AnnounceResponse:
    action = U32
    transaction_id = U32
    interval = U32
    leechers = U32
    seeders = U32
    # `length` refers to the value of the previous `seeders` field
    seeder_data = Array(Seeder, length="seeders")
# `data` is a bytes object
resp = AnnounceResponse.read(data)
print(resp.transaction_id)
print(len(resp.seeder_data))
print(resp.seeder_data[0].tcp_port)
You can optionally specify the endianness for all values with @formatclass(byteorder='little') (or 'big').
Full disclosure, I am the author of binread.

Related

Unpack IEEE 754 Floating Point Number

I am reading two 16 bit registers from a tcp client using the pymodbus module. The two registers make up a 32 bit IEEE 754 encoded floating point number. Currently I have the 32 bit binary value of the registers shown in the code below.
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
reg_1 = response.getRegister(0)<<(16 - (response.getRegister(0).bit_length())) #Get in 16 bit format
reg_2 = response.getRegister(1)<<(16 - (response.getRegister(1).bit_length())) #Get in 16 bit format
volts = (reg_1 << 16) | reg_2 #Get the 32 bit format
The above works fine to get the encoded value; the problem is decoding it. I was going to code something like in this video, but I came across the 'f' format in the struct module for IEEE 754 encoding. I tried to decode the 32 bit float stored in volts in the code above using the unpack method in the struct module, but ran into the following errors.
val = struct.unpack('f',volts)
>>> TypeError: a bytes-like object is required, not 'int'
OK, I tried to convert it to a 32 bit binary string.
temp = bin(volts)
val = struct.unpack('f',temp)
>>> TypeError: a bytes-like object is required, not 'str'
I tried to convert it to a bytes-like object as in this post and to format it in different ways.
val = struct.unpack('f',bytes(volts))
>>> TypeError: string argument without an encoding
temp = "{0:b}".format(volts)
val = struct.unpack('f',temp)
>>> ValueError: Unknown format code 'b' for object of type 'str'
val = struct.unpack('f',volts.encode())
>>> struct.error: unpack requires a buffer of 4 bytes
Where do I add this buffer and where in the documentation does it say I need this buffer with the unpack method? It does say in the documentation
The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The calcsize(fmt) function returns a value in bytes but the len(string) returns a value of the length of the string, no?
Any suggestions are welcome.
EDIT
A solution to the decoding problem follows further down. First, here is a better way to obtain the 32 bit value from the two 16 bit register values than the one in the original question.
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
reg_1 = response.getRegister(0)
reg_2 = response.getRegister(1)
# Shift reg 1 by 16 bits
reg_1s = reg_1 << 16
# OR with the reg_2
total = reg_1s | reg_2
I found a solution to the problem using BinaryPayloadDecoder.fromRegisters() from the pymodbus module instead of the struct module. Note that this solution is specific to the Modbus smart meter I am using, because the byte order and word order of the registers may differ on other devices. It may still work for decoding registers on other devices, but I would advise reading the device's documentation first to be sure. I left the comments in the code below; the reference to page 24 applies only to my device.
from pymodbus.client.sync import ModbusTcpClient
from pymodbus.constants import Endian
from pymodbus.payload import BinaryPayloadDecoder
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
# The response will contain two registers making a 32 bit floating point number
# Use the BinaryPayloadDecoder.fromRegisters() function to decode
# The coding scheme for a 32 bit float is IEEE 754 https://en.wikipedia.org/wiki/IEEE_754
# The MS Bytes are stored in the first address and the LS bytes are stored in the second address,
# this corresponds to a big endian byte order (Second parameter in function)
# The documentation for the Modbus registers for the smart meter on page 24 says that
# the low word is the first priority, this correspond to a little endian word order (Third parameter in function)
decoder = BinaryPayloadDecoder.fromRegisters(response.registers, Endian.Big, wordorder=Endian.Little)
final_val = (decoder.decode_32bit_float())
client.close()
EDIT
Credit to juanpa-arrivillaga and chepner: the problem can also be solved using the struct module together with int.to_bytes(). The two functions in the code below cover both cases; which one to use depends on whether the device sends the low word or the high word first.
import struct
from pymodbus.client.sync import ModbusTcpClient
def big_endian(response):
    reg_1 = response.getRegister(0)
    reg_2 = response.getRegister(1)
    # Shift reg 1 by 16 bits
    reg_1s = reg_1 << 16
    # OR with the reg_2
    total = reg_1s | reg_2
    return total
def little_endian(response):
    reg_1 = response.getRegister(0)
    reg_2 = response.getRegister(1)
    # Shift reg 2 by 16 bits
    reg_2s = reg_2 << 16
    # OR with the reg_1
    total = reg_2s | reg_1
    return total
start_address = 0x1112
reg_count = 2
client = ModbusTcpClient(<IP_ADDRESS>)
response = client.read_input_registers(start_address,reg_count)
# Little (explicit '<' so the result does not depend on the host's byte order)
little = little_endian(response)
lit_byte = little.to_bytes(4,byteorder='little')
print(struct.unpack('<f',lit_byte))
# Big (explicit '>' to match the big-endian bytes)
big = big_endian(response)
big_byte = big.to_bytes(4,byteorder='big')
print(struct.unpack('>f',big_byte))
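The same decoding can be done without building an intermediate integer: struct can pack the two 16-bit registers and reinterpret the resulting 4 bytes as a float. This is a minimal sketch; registers_to_float and its word_order parameter are illustrative names, not part of pymodbus, and the register values are derived from a known float rather than read from a device. It also shows why the earlier attempts failed: unpack needs a bytes object whose length equals struct.calcsize() of the format, which is 4 for a single float.

```python
import struct


def registers_to_float(registers, word_order="big"):
    """Decode two 16-bit register values as one IEEE 754 binary32 float.

    word_order="big" means the first register holds the high word;
    word_order="little" means the first register holds the low word.
    """
    first, second = registers
    if word_order == "little":
        first, second = second, first
    raw = struct.pack(">HH", first, second)  # 4 bytes, high word first
    return struct.unpack(">f", raw)[0]


# Derive register values from a known float so the example needs no hardware.
high, low = struct.unpack(">HH", struct.pack(">f", 230.5))
print(registers_to_float([high, low]))
print(registers_to_float([low, high], word_order="little"))
print(struct.calcsize(">f"))
```

Which word order applies is a property of the device, so it still has to be taken from its register documentation.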

Serial Port data

I am attempting to read the data from an Absolute Encoder with a USB interface using pyserial on the Raspberry. The datasheet for the encoder is below. The USB interface data is on page 22-23
https://www.rls.si/eng/fileuploader/download/download/?d=0&file=custom%2Fupload%2FData-sheet-AksIM-offaxis-rotary-absolute-encoder.pdf
I have successfully connected to the Encoder and I am able to send commands using
port = serial.Serial("/dev/serial/by-id/usb-RLS_Merilna_tehnkis_AksIM_encoder_3454353-if00")
port.write(b"x")
where x is any of the available Commands listed for the USB interface.
For example port.write(b"1") is meant to initiate a single position request. I am able to print the output from encoder with
x = port.read()
print(x)
The problem is converting the output into actual position data. port.write(b"1") outputs the following data:
b'\xea\xd0\x05\x00\x00\x00\xef'
I know that the first and last bytes are just the header and footer. Bytes 5 and 6 are the encoder status. Bytes 2-4 are the actual position data. Customer support has informed me that I need to take bytes 2 to 4, shift them into a 32 bit unsigned integer (into the lower 3 bytes), convert that to a floating point number, divide by 0xFF FF FF and multiply by 360. The result is in degrees.
I'm not exactly sure how to do this. Can someone please let me know which Python functions I need in order to do this? Thank you.
You can use the built-in int.from_bytes() method:
x = b'\xea\xd0\x05\x00\x00\x00\xef'
number = 360 * float(
int.from_bytes(x[1:4], 'big') # get integer from bytes
) / 0xffffff
print(number)
will print:
292.5274832563092
This is the way to extract the bytes and shift them into an integer and scale as a float:
x = b'\xea\xd0\x05\x00\x00\x00\xef'
print(x)
int_value = 0 # initialise shift register
for index in range(1,4):
    int_value *= 256 # shift up by 8 bits
    int_value += x[index] # or in the next byte
print(int_value)
# scale the integer against the max value
float_value = 360 * float(int_value) / 0xffffff
print(float_value)
Output:
b'\xea\xd0\x05\x00\x00\x00\xef'
13632768
292.5274832563092
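The recipe from customer support (place bytes 2 to 4 in the lower three bytes of a 32-bit unsigned integer) also maps directly onto struct. A sketch, assuming as the answers above do that the three position bytes arrive most significant byte first; position_degrees is an illustrative name.

```python
import struct


def position_degrees(frame):
    """Convert an encoder reply into an angle in degrees."""
    if len(frame) < 4:
        raise ValueError("frame is too short to hold the position bytes")
    # Prefix a zero byte so the 3 position bytes fill the lower 3 bytes
    # of a big-endian unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", b"\x00" + frame[1:4])
    return 360 * float(raw) / 0xFFFFFF


frame = b"\xea\xd0\x05\x00\x00\x00\xef"
print(position_degrees(frame))
```

For the sample frame this yields the same value as the two answers above.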

Python: Convert packet object/inet object to 32 bit integer

I have an IPv4 address and want to convert it to a 32 bit integer.
I am able to convert the IPv4 address into a string using socket.inet_ntop and then convert that string to a 32 bit integer,
but is there a direct way?
An IPv4 address in its basic form is a 32-bit integer in network byte order.
I'm assuming you have it as a sequence of bytes (because that is what you would normally hand off to inet_ntop).
What you will need to convert it into a python integer is the struct module and its unpack method along with the "!I" format specification (which means network byte order, unsigned 32-bit integer). See this code:
from socket import inet_ntop, inet_pton, AF_INET
from struct import unpack
ip = inet_pton(AF_INET, "192.168.1.42")
ip_as_integer = unpack("!I", ip)[0]
print("As string[{}] => As bytes[{}] => As integer[{}]".format(
inet_ntop(AF_INET, ip), ip, ip_as_integer))
You could of course also reconstruct the integer bytewise:
ip_as_integer = (ip[0] << 24) | (ip[1] << 16) | (ip[2] << 8) | ip[3]
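The standard-library ipaddress module and int.from_bytes() give the same number without a format string, and struct.pack() goes back from the integer to the packed form. A short sketch comparing the routes (the variable names are only illustrative):

```python
import ipaddress
import struct
from socket import AF_INET, inet_ntop, inet_pton

packed = inet_pton(AF_INET, "192.168.1.42")  # 4 bytes in network byte order

via_struct = struct.unpack("!I", packed)[0]
via_ipaddress = int(ipaddress.IPv4Address(packed))  # accepts 4 packed bytes
via_from_bytes = int.from_bytes(packed, "big")

# Reverse direction: integer -> packed bytes -> dotted-quad string
round_trip = inet_ntop(AF_INET, struct.pack("!I", via_struct))

print(via_struct, via_ipaddress, via_from_bytes, round_trip)
```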

How to unpack a C-style structure inside another structure?

I am receiving data via socket interface from an application (server) written in C. The data being posted has the following structure. I am receiving data with a client written in Python.
struct hdr
{
    int Id;
    char PktType;
    int SeqNo;
    int Pktlength;
};
struct trl
{
    char Message[16];
    long long info;
};
struct data
{
    char value[10];
    double result;
    long long count;
    short int valueid;
};
typedef struct
{
    struct hdr hdr_buf;
    struct data data_buf[100];
    struct trl trl_buf;
} trx_unit;
How do I unpack the received data to access my inner data buffer?
Using the struct library is the way to go. However, you will have to know a bit more about the C program that is serializing the data. Consider the hdr structure. If the C program is sending it using the naive approach:
struct hdr header;
send(sd, &header, sizeof(header), 0);
Then your client cannot safely interpret the bytes that are sent to it because there is an indeterminate amount of padding inserted between the struct members. In particular, I would expect three bytes of padding following the PktType member.
The safest way to approach sending around binary data is to have the server and client serialize the bytes directly to ensure that there is no additional padding and to make the byte ordering of multibyte integers explicit. For example:
/*
 * Send a header over a socket.
 *
 * The header is sent as a stream of packed bytes with
 * integers in "network" byte order. For example, a
 * header value of:
 *   Id: 0x11223344
 *   PktType: 0xff
 *   SeqNo: 0x55667788
 *   Pktlength: 0x99aabbcc
 *
 * is sent as the following byte stream:
 *   11 22 33 44 ff 55 66 77 88 99 aa bb cc
 */
void
send_header(int sd, struct hdr const* header)
{   /* NO ERROR HANDLING */
    uint32_t num = htonl((uint32_t)header->Id);
    send(sd, &num, sizeof(num), 0);
    send(sd, &header->PktType, sizeof(header->PktType), 0);
    num = htonl((uint32_t)header->SeqNo);
    send(sd, &num, sizeof(num), 0);
    num = htonl((uint32_t)header->Pktlength);
    send(sd, &num, sizeof(num), 0);
}
This will ensure that your client can safely decode it using the struct module:
buf = s.recv(13) # packed data is 13 bytes long
id_, pkt_type, seq_no, pkt_length = struct.unpack('>IBII', buf)
If you cannot modify the C code to fix the serialization indeterminacy, then you will have to read the data from the stream and figure out where the C compiler is inserting padding and manually build struct format strings to match using the padding byte format character to ignore padding values.
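As an illustration of that last approach, the x format character skips one byte, so padding can be written straight into the format string. The sketch below assumes a little-endian sender that inserts three padding bytes after PktType; that is common but has to be confirmed for the actual compiler and platform, and the sample values are invented.

```python
import struct

PACKED_HDR = struct.Struct("<ibii")    # no padding: 4 + 1 + 4 + 4 bytes
PADDED_HDR = struct.Struct("<ib3xii")  # assumed: 3 pad bytes after PktType

print(PACKED_HDR.size, PADDED_HDR.size)

# Simulate what such a server would put on the wire (pad bytes become zeros).
wire = struct.pack("<ib3xii", 7, 1, 42, 2816)
id_, pkt_type, seq_no, pkt_length = PADDED_HDR.unpack(wire)
print(id_, pkt_type, seq_no, pkt_length)

# The "@" prefix asks struct to use the local platform's native sizes and
# alignment, which only matches the sender if both sides share an ABI.
print(struct.calcsize("@ibii"))
```

Comparing struct.calcsize() of a candidate format with sizeof() on the C side is a quick way to check that the padding guess is right.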
I usually write a decoder class in Python that reads a complete value from the socket. In your case it would look something like:
import struct
class PacketReader(object):
    def __init__(self, sd):
        self._socket = sd
    def read_packet(self):
        id_, pkt_type, seq_no, pkt_length = self._read_header()
        data_bufs = [self._read_data_buf() for _ in range(100)]
        message, info = self._read_trl()
        return {'id': id_, 'pkt_type': pkt_type, 'seq_no': seq_no,
                'data_bufs': data_bufs, 'message': message,
                'info': info}
    def _recv_exact(self, size):
        """Read exactly ``size`` bytes; ``recv`` may return fewer."""
        buf = b''
        while len(buf) < size:
            chunk = self._socket.recv(size - len(buf))
            if not chunk:
                raise EOFError('socket closed before a full value arrived')
            buf += chunk
        return buf
    def _read_header(self):
        """
        Read and unpack a ``hdr`` structure.
        :returns: a :class:`tuple` of the header data values
            in order - *Id*, *PktType*, *SeqNo*, and *Pktlength*
        The header is assumed to be packed as 13 bytes with
        integers in network byte order.
        """
        buf = self._recv_exact(13)
        # > Multibyte values in network order
        # I Id as 32-bit unsigned integer value
        # B PktType as 8-bit unsigned integer value
        # I SeqNo as 32-bit unsigned integer value
        # I Pktlength as 32-bit unsigned integer value
        return struct.unpack('>IBII', buf)
    def _read_data_buf(self):
        """
        Read and unpack a single ``data`` structure.
        :returns: a :class:`tuple` of data values in order -
            *value*, *result*, *count*, and *valueid*
        The data structure is assumed to be packed as 28 bytes
        with integers in network byte order and doubles encoded
        as IEEE 754 binary64 in network byte order.
        """
        buf = self._recv_exact(28) # assumes double is binary64
        # > Multibyte values in network order
        # 10s value bytes
        # d result encoded as IEEE 754 binary64 value
        # q count encoded as a 64-bit signed integer
        # H valueid as a 16-bit unsigned integer value
        return struct.unpack('>10sdqH', buf)
    def _read_trl(self):
        """
        Read and unpack a ``trl`` structure.
        :returns: a :class:`tuple` of trl values in order -
            *Message* as byte string, *info*
        The structure is assumed to be packed as 24 bytes with
        integers in network byte order.
        """
        buf = self._recv_exact(24)
        # > Multibyte values in network order
        # 16s message bytes
        # q info encoded as a 64-bit signed value
        return struct.unpack('>16sq', buf)
Mind you that this is untested and probably contains syntax errors but that is how I would approach the problem.
The struct library has all you need to do this.
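Because data_buf is a fixed array of 100 identical records, the whole array can also be decoded from one buffer with Struct.iter_unpack(). The sketch assumes the packed 28-byte network-order layout described above and uses locally generated bytes in place of socket data; the item names and values are invented.

```python
import struct

DATA = struct.Struct(">10sdqH")  # value, result, count, valueid
ARRAY_LEN = 100

# Stand-in for the bytes that would arrive from the socket.
payload = b"".join(
    DATA.pack(b"item%d" % i, i * 0.5, i * 1000, i) for i in range(ARRAY_LEN)
)

# iter_unpack walks the buffer one record at a time; the fixed-width
# value field is zero padded, so the padding is stripped here.
records = [
    (value.rstrip(b"\x00"), result, count, valueid)
    for value, result, count, valueid in DATA.iter_unpack(payload)
]

print(DATA.size, len(payload), len(records))
print(records[3])
```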

How to Pack Different Types of Data Together into Binary Data with Python

Suppose there is a user-defined protocol as below:
The protocol:
-------------- -------------- ---------------- ------------------
| Seqno.       | ip           | port           | user name        |
| int, 4 bytes | int, 4 bytes | short, 2 bytes | string, 50 bytes |
the [user name] field stores a string ending with zero,
if the string length is less than 50 bytes, padding with zeros.
Usually I will pack these fields in C language like this:
//Pseudo code
buffer = new char[60];
memset(buffer, 0, 60);
memcpy(buffer, &htonl(Seqno), 4);
memcpy(buffer+4, &htonl(ip), 4);
memcpy(buffer+8, &htons(port), 2);
memcpy(buffer+10, Usrname.c_str(), Usrname.length() + 1);
But how can we pack the protocol data in python? I am new to python.
Use the struct module:
import struct
binary_value = struct.pack('!2IH50s', seqno, ip, port, usrname)
This packs 2 4-byte unsigned integers, a 2-byte unsigned short and a 50-byte string into 60 bytes with network (big-endian) byte ordering. The string will be padded out with nulls to make up the length:
>>> import struct
>>> seqno = 42
>>> ip = 0xc6fcce10
>>> port = 80
>>> usrname = b'Martijn Pieters'
>>> struct.pack('!2IH50s', seqno, ip, port, usrname)
b'\x00\x00\x00*\xc6\xfc\xce\x10\x00PMartijn Pieters\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
In Python 3 the 50s field takes a bytes object, hence the b prefix. Python's bytes representation uses ASCII characters for any bytes in the ASCII printable range and \xhh for most other byte values, so the 42 became \x00\x00\x00*.
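Going the other way, the same format string unpacks a received packet. This is a sketch for Python 3, where the 50s field takes and returns bytes, so the name is encoded before packing and the zero padding is cut off after unpacking. pack_user and unpack_user are illustrative names, and UTF-8 is an assumed encoding that the protocol description does not specify.

```python
import struct

PACKET = struct.Struct("!2IH50s")  # seqno, ip, port, user name


def pack_user(seqno, ip, port, username):
    name = username.encode("utf-8")
    if len(name) > 49:
        raise ValueError("user name must leave room for a terminating zero")
    return PACKET.pack(seqno, ip, port, name)  # 50s pads with zero bytes


def unpack_user(packet):
    seqno, ip, port, name = PACKET.unpack(packet)
    # Keep only what comes before the first zero byte.
    return seqno, ip, port, name.split(b"\x00", 1)[0].decode("utf-8")


packet = pack_user(42, 0xC6FCCE10, 80, "Martijn Pieters")
print(len(packet))
print(unpack_user(packet))
```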
