How to unpack a C-style structure inside another structure? - python

I am receiving data via socket interface from an application (server) written in C. The data being posted has the following structure. I am receiving data with a client written in Python.
struct hdr
{
    int Id;
    char PktType;
    int SeqNo;
    int Pktlength;
};
struct trl
{
    char Message[16];
    long long info;
};
struct data
{
    char value[10];
    double result;
    long long count;
    short int valueid;
};
typedef struct
{
    struct hdr hdr_buf;
    struct data data_buf[100];
    struct trl trl_buf;
} trx_unit;
How do I unpack the received data to access my inner data buffer?

Using the struct library is the way to go. However, you will have to know a bit more about the C program that is serializing the data. Consider the hdr structure. If the C program is sending it using the naive approach:
struct hdr header;
send(sd, &header, sizeof(header), 0);
Then your client cannot safely interpret the bytes that are sent to it because there is an indeterminate amount of padding inserted between the struct members. In particular, I would expect three bytes of padding following the PktType member.
The safest way to approach sending around binary data is to have the server and client serialize the bytes directly to ensure that there is no additional padding and to make the byte ordering of multibyte integers explicit. For example:
/*
 * Send a header over a socket.
 *
 * The header is sent as a stream of packed bytes with
 * integers in "network" byte order. For example, a
 * header value of:
 *     Id:        0x11223344
 *     PktType:   0xff
 *     SeqNo:     0x55667788
 *     Pktlength: 0x99aabbcc
 *
 * is sent as the following byte stream:
 *     11 22 33 44 ff 55 66 77 88 99 aa bb cc
 */
void
send_header(int sd, struct hdr const* header)
{   /* NO ERROR HANDLING */
    uint32_t num = htonl((uint32_t)header->Id);
    send(sd, &num, sizeof(num), 0);
    send(sd, &header->PktType, sizeof(header->PktType), 0);
    num = htonl((uint32_t)header->SeqNo);
    send(sd, &num, sizeof(num), 0);
    num = htonl((uint32_t)header->Pktlength);
    send(sd, &num, sizeof(num), 0);
}
This will ensure that your client can safely decode it using the struct module:
buf = s.recv(13) # packed data is 13 bytes long
id_, pkt_type, seq_no, pkt_length = struct.unpack('>IBII', buf)
If you cannot modify the C code to fix this serialization indeterminacy, then you will have to capture the data from the stream, figure out where your particular C compiler inserts padding, and build struct format strings that match, using the pad byte format character (x) to skip over the padding.
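For example, on a typical platform where int is 4 bytes, 4-byte aligned, and little-endian (an assumption; check your actual compiler and target), the hdr struct above would carry three pad bytes after PktType, and a matching format string might look like this (sketch; sock is assumed to be the connected socket):
import struct

# Guess at the native layout of struct hdr on a little-endian system with
# 4-byte, 4-byte-aligned ints: Id, PktType, 3 pad bytes, SeqNo, Pktlength.
HDR_FORMAT = '<iB3xii'
print(struct.calcsize(HDR_FORMAT))   # 16 bytes, versus 13 with no padding

buf = sock.recv(struct.calcsize(HDR_FORMAT))
id_, pkt_type, seq_no, pkt_length = struct.unpack(HDR_FORMAT, buf)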
I usually write a decoder class in Python that reads a complete value from the socket. In your case it would look something like:
import struct


class PacketReader(object):
    def __init__(self, sd):
        self._socket = sd

    def read_packet(self):
        id_, pkt_type, seq_no, pkt_length = self._read_header()
        data_bufs = [self._read_data_buf() for _ in range(0, 100)]
        message, info = self._read_trl()
        return {'id': id_, 'pkt_type': pkt_type, 'seq_no': seq_no,
                'data_bufs': data_bufs, 'message': message,
                'info': info}

    def _read_exactly(self, num_bytes):
        """Read exactly *num_bytes* from the socket.

        ``recv`` may return fewer bytes than requested, so keep
        reading until the full value has arrived.
        """
        chunks = []
        while num_bytes > 0:
            chunk = self._socket.recv(num_bytes)
            if not chunk:
                raise EOFError('socket closed mid-packet')
            chunks.append(chunk)
            num_bytes -= len(chunk)
        return b''.join(chunks)

    def _read_header(self):
        """
        Read and unpack a ``hdr`` structure.

        :returns: a :class:`tuple` of the header data values
            in order - *Id*, *PktType*, *SeqNo*, and *PktLength*

        The header is assumed to be packed as 13 bytes with
        integers in network byte order.
        """
        buf = self._read_exactly(13)
        # >  Multibyte values in network order
        # I  Id as 32-bit unsigned integer value
        # B  PktType as 8-bit unsigned integer value
        # I  SeqNo as 32-bit unsigned integer value
        # I  PktLength as 32-bit unsigned integer value
        return struct.unpack('>IBII', buf)

    def _read_data_buf(self):
        """
        Read and unpack a single ``data`` structure.

        :returns: a :class:`tuple` of data values in order -
            *value*, *result*, *count*, and *valueid*

        The data structure is assumed to be packed as 28 bytes
        with integers in network byte order and doubles encoded
        as IEEE 754 binary64 in network byte order.
        """
        buf = self._read_exactly(28)  # assumes double is binary64
        # >    Multibyte values in network order
        # 10s  value bytes
        # d    result encoded as IEEE 754 binary64 value
        # q    count encoded as a 64-bit signed integer
        # H    valueid as a 16-bit unsigned integer value
        return struct.unpack('>10sdqH', buf)

    def _read_trl(self):
        """
        Read and unpack a ``trl`` structure.

        :returns: a :class:`tuple` of trl values in order -
            *Message* as byte string, *info*

        The structure is assumed to be packed as 24 bytes with
        integers in network byte order.
        """
        buf = self._read_exactly(24)
        # >    Multibyte values in network order
        # 16s  message bytes
        # q    info encoded as a 64-bit signed value
        return struct.unpack('>16sq', buf)
Mind you, this is untested and probably contains syntax errors, but that is how I would approach the problem.
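Usage would then look something like this (a sketch; s is assumed to be an already-connected socket object):
reader = PacketReader(s)
packet = reader.read_packet()
print(packet['seq_no'], len(packet['data_bufs']))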

The struct library has all you need to do this.

Related

Sending a large number of bytes from Swift to Python with ctypes and @_cdecl

I have two simple functions; one passes an array of UInts to the other.
When I pass a small array of 20 UInts the function works, but when I pass 21576 UInts the function returns only a small number of bytes. Why does this happen?
I checked that the UnsafeMutablePointer<UInt8> holds the correct numbers, but on the Python side they are lost.
Swift:
#_cdecl("getPointer")
public func getPointer() -> UnsafeMutablePointer<UInt8>{
let arr: Array<UInt8> =[1,2,3.....] //here is big array
if let buffer = buffer {
buffer.deallocate()
buffer.deinitialize(count: arr.count)
}
buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: arr.count * MemoryLayout<UInt8>.stride)
buffer!.initialize(from: arr, count: arr.count)
return buffer!
}
Python:
import ctypes
from ctypes import cast, c_char_p
from numpy.ctypeslib import ndpointer

native_lib = ctypes.CDLL('./libH264_decoder')
native_lib.getPointer.restype = ndpointer(dtype=ctypes.c_uint8)
cont = cast(native_lib.getPointer(), c_char_p).value
returns b'\x1cri\x1aVL\xa4q\xfc\xa7\xaezb\x83HC\x94\xb4#\xde?x\xdb\xb1\xd3\x1d\x07\xb5#\xc8\x85\x0eP\xaa\x9ew\x03\x93\xfe8\xa6\x97D\xca\xc6\xcc'
I don't know Swift, but to return a data buffer containing nulls to ctypes, you need to know the size of the buffer, and you can't use c_char_p as the return type: ctypes assumes that type points to null-terminated data and converts it to a bytes object, cutting it off at the first null. Use POINTER(c_char) instead for arbitrary data that can contain nulls.
Below I've made a simple C DLL that returns a pointer to some data and returns the size in an additional output parameter. The same technique should work for Swift assuming it uses the standard C ABI to export functions, but you will need to pass back both a pointer and a size if the size is variable.
test.c
__declspec(dllexport)
char* get_data(int* size) {
    *size = 8;
    return "\x11\x22\x00\x33\x44\x00\x55\x66";
}
test.py
import ctypes as ct
dll = ct.CDLL('./test')
dll.get_data.argtypes = ct.POINTER(ct.c_int),
dll.get_data.restype = ct.POINTER(ct.c_char) # do NOT use ct.c_char_p
size = ct.c_int() # allocate ctypes storage for the output parameter.
buf = dll.get_data(ct.byref(size)) # pass by reference.
print(buf[:size.value].hex(' ')) # Use string slicing to control the size.
# .hex(' ') for pretty-printing the data.
Output:
11 22 00 33 44 00 55 66
On the Python caller side, you cast the returned array to a "pointer to chars" (c_char_p), which ctypes expects to be NULL-terminated.
If you do not have any binary zeros in your array, you could try the following (only the important parts shown here):
var arr: Array<UInt8> = [] //here is big array
for _ in 0..<100 {
    for i in 0..<26 {
        arr.append(UInt8(65+i)) // A...Z
    }
}
arr.append(0) // Terminate C-String alike with binary Zero
buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: arr.count * MemoryLayout<UInt8>.stride)
buffer!.initialize(from: arr, count: arr.count)
return buffer!
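On the Python side, once the buffer is NUL-terminated as above, the c_char_p conversion returns the whole payload; this only works because the test data contains no embedded zeros. A minimal sketch, reusing the library name from the question:
import ctypes
from ctypes import POINTER, c_char, c_char_p, cast

native_lib = ctypes.CDLL('./libH264_decoder')
# Return an opaque pointer; don't let ctypes convert it automatically.
native_lib.getPointer.restype = POINTER(c_char)

ptr = native_lib.getPointer()
# Safe only because the Swift code appends a terminating zero byte and
# the payload itself contains no zeros.
data = cast(ptr, c_char_p).value
print(len(data), data[:26])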

Can zlib compressed output avoid using certain byte value?

It seems that the output of zlib.compress uses all possible byte values. Is it possible to restrict it to 255 of the 256 byte values (for example, to avoid emitting \n)?
Note that I just use the Python manual as a reference, but the question is not specific to Python (i.e. it applies to any other language that has a zlib library).
No, this is not possible. Apart from the compressed data itself, there are standardized control structures that contain integers. Those integers can cause any 8-bit value to end up in the byte stream.
Your only chance would be to encode the zlib bytestream into another format, e.g. base64.
The whole point of compression is to reduce the size as much as possible. If zlib or any compressor only used 255 of the 256 byte values, the size of the output would be increased by at least 0.07%.
That may be perfectly fine for you, so you can simply post-process the compressed output, or any data at all, to remove one particular byte value at the expense of some expansion. The simplest approach would be to replace that byte when it occurs with a two-byte escape sequence. You would also then need to replace the escape prefix with a different two-byte escape sequence. That would expand the data on average by 0.8%. That is exactly what Hans provided in another answer here.
If that cost is too high, you can do something more sophisticated, which is to decode a fixed Huffman code that encodes 255 symbols of equal probability. To decode you then encode that Huffman code. The input is a sequence of bits, not bytes, and most of the time you will need to pad the input with some zero bits to encode the last symbol. The Huffman code turns one symbol into seven bits and the other 254 symbols into eight bits. So going the other way, it will expand the input by a little less than 0.1%. For short messages it will be a little more, since often less than seven bits at the very end will be encoded into a symbol.
Implementation in C:
// Placed in the public domain by Mark Adler, 26 June 2020.

// Encode an arbitrary stream of bytes into a stream of symbols limited to 255
// values. In particular, avoid the \n (10) byte value. With -d, decode back to
// the original byte stream. Take input from stdin, and write output to stdout.

#include <stdio.h>
#include <string.h>

// Encode arbitrary bytes to a sequence of 255 symbols, which are written out
// as bytes that exclude the value '\n' (10). This encoding is actually a
// decoding of a fixed Huffman code of 255 symbols of equal probability. The
// output will be on average a little less than 0.1% larger than the input,
// plus one byte, assuming random input. This is intended to be used on
// compressed data, which will appear random. An input of all zero bits will
// have the maximum possible expansion, which is 14.3%, plus one byte.
int nolf_encode(FILE *in, FILE *out) {
    unsigned buf = 0;
    int bits = 0, ch;
    do {
        if (bits < 8) {
            ch = getc(in);
            if (ch != EOF) {
                buf |= (unsigned)ch << bits;
                bits += 8;
            }
            else if (bits == 0)
                break;
        }
        if ((buf & 0x7f) == 0) {
            buf >>= 7;
            bits -= 7;
            putc(0, out);
            continue;
        }
        int sym = buf & 0xff;
        buf >>= 8;
        bits -= 8;
        if (sym >= '\n' && sym < 128)
            sym++;
        putc(sym, out);
    } while (ch != EOF);
    return 0;
}

// Decode a sequence of symbols from a set of 255 that was encoded by
// nolf_encode(). The input is read as bytes that exclude the value '\n' (10).
// Any such values in the input are ignored and flagged in an error message.
// The sequence is decoded to the original sequence of arbitrary bytes. The
// decoding is actually an encoding of a fixed Huffman code of 255 symbols of
// equal probability.
int nolf_decode(FILE *in, FILE *out) {
    unsigned long lfs = 0;
    unsigned buf = 0;
    int bits = 0, ch;
    while ((ch = getc(in)) != EOF) {
        if (ch == '\n') {
            lfs++;
            continue;
        }
        if (ch == 0) {
            if (bits == 0) {
                bits = 7;
                continue;
            }
            bits--;
        }
        else {
            if (ch > '\n' && ch <= 128)
                ch--;
            buf |= (unsigned)ch << bits;
        }
        putc(buf, out);
        buf >>= 8;
    }
    if (lfs)
        fprintf(stderr, "nolf: %lu unexpected line feeds ignored\n", lfs);
    return lfs != 0;
}

// Encode (no arguments) or decode (-d) from stdin to stdout.
int main(int argc, char **argv) {
    if (argc == 1)
        return nolf_encode(stdin, stdout);
    else if (argc == 2 && strcmp(argv[1], "-d") == 0)
        return nolf_decode(stdin, stdout);
    fputs("nolf: unknown options (use -d to decode)\n", stderr);
    return 1;
}
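If you want to stay in Python end to end, here is a rough sketch of the same scheme, operating on bytes objects instead of streams. It simply mirrors the C code above and should be treated as untested:
def nolf_encode(data):
    """Map arbitrary bytes to a byte stream that never contains 10 ('\\n')."""
    out = bytearray()
    buf = bits = pos = 0
    eof = False
    while not eof or bits > 0:
        if bits < 8 and not eof:
            if pos < len(data):
                buf |= data[pos] << bits
                bits += 8
                pos += 1
            else:
                eof = True
                if bits == 0:
                    break
        if (buf & 0x7f) == 0:
            # Seven zero bits become the single symbol 0.
            buf >>= 7
            bits -= 7
            out.append(0)
            continue
        sym = buf & 0xff
        buf >>= 8
        bits -= 8
        if 10 <= sym < 128:
            sym += 1          # skip over 10, reusing the otherwise unused 128
        out.append(sym)
    return bytes(out)

def nolf_decode(data):
    """Reverse nolf_encode(); any stray bytes equal to 10 are ignored."""
    out = bytearray()
    buf = bits = 0
    for ch in data:
        if ch == 10:
            continue
        if ch == 0:
            if bits == 0:
                bits = 7
                continue
            bits -= 1
        else:
            if 10 < ch <= 128:
                ch -= 1
            buf |= ch << bits
        out.append(buf & 0xff)
        buf >>= 8
    return bytes(out)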
As @ypnos says, this isn't possible within zlib itself. You mentioned that base64 encoding is too inefficient, but it's pretty easy to use an escape character to encode a character you want to avoid (like newlines).
This isn't the most efficient code in the world (and you might want to do something like finding the least used bytes to save a tiny bit more space), but it's readable enough and demonstrates the idea. You can losslessly encode/decode, and the encoded stream won't have any newlines.
def encode(data):
    # order matters
    return data.replace(b'a', b'aa').replace(b'\n', b'ab')

def decode(data):
    def _foo():
        pair = False
        for b in data:
            if pair:
                # yield b'a' if b==b'a' else b'\n'
                yield 97 if b == 97 else 10
                pair = False
            elif b == 97:  # b'a'
                pair = True
            else:
                yield b
    return bytes(_foo())
As some measure of confidence you can check this exhaustively on small bytestrings:
from itertools import combinations_with_replacement, permutations

all(
    bytes(p) == decode(encode(bytes(p)))
    for c in combinations_with_replacement(b'ab\nc', r=6)
    for p in permutations(c)
)
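And to tie it back to the original question, a small usage sketch running zlib output through it:
import zlib

original = b'some example data\n' * 100
encoded = encode(zlib.compress(original))
assert b'\n' not in encoded                           # no newlines in the stream
assert zlib.decompress(decode(encoded)) == original   # lossless round trip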

Equivalent of this code in Python

I need to send data in a specific format from my Python program. The goal is to send data from Python on my PC to an XBee connected to an Arduino. This has been done from a program running on an Arduino, but not from Python (the Python code has usually been the receiver).
// allocate two bytes to hold a 10-bit analog reading
uint8_t payload[7];
int value1 = 100; // data to be sent
payload[0] = value1 >> 8 & 0xff; // payload[0] = MSB.
payload[1] = value1 & 0xff; // 0xff = 1111 1111, i.e.
And finally it sends the data with this command (using the XBee library):
Tx16Request tx_5001 = Tx16Request(0x5001, payload, sizeof(payload));
xbee.send(tx_5001);
In Python, using the XBee Python library, I want to send this data, but it should conform to this format so that I can use the existing code to read the data and convert it to a useful value.
In summary, does anyone know what's the equivalent of the following code in Python?
uint8_t payload[7];
int value1 = 100; // data to be sent
payload[0] = value1 >> 8 & 0xff; // payload[0] = MSB.
payload[1] = value1 & 0xff; // 0xff = 1111 1111, i.e.
On the receive side, I have the following to extract the value that can be used:
uint8_t dataHigh = rx.getData(0);
uint8_t dataLow = rx.getData(1);
int value1 = dataLow + (dataHigh * 256);
Edit: I really don't need two bytes for my data, so I did a test by defining a single byte in Python, data = chr(0x5A), and used that in my transmit; on the receiving side I used int value1 = analogHigh, which returns 232 regardless of what I define the data as!
You are probably looking for struct (to convert a Python int to uint8_t). I have used it to communicate with low-level devices.
So
import struct
var = 25 # try 2**17 and see what happens
print(struct.pack('<H', var)) # b'\x19\x00'
print(struct.pack('>H', var)) # b'\x00\x19'
As you can see, > or < switches the byte order between little- and big-endian.
H is for unsigned short (uint16_t) and B is for unsigned char (uint8_t).
The b'' is a Python bytes string, which lets you see the byte data.
So I would suggest you start with xbee.send(b'\x45\x15') and see what happens on the other side. If communication works, then start to create functions that convert Python types to the structure representation.
See also:
BitwiseOperators
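For the specific payload in the question, a sketch of the Python side might look like this (only the packing is shown; the actual send call depends on which XBee library you use):
import struct

value1 = 100
# payload[0] = MSB, payload[1] = LSB, matching the Arduino code above;
# the remaining 5 of the 7 bytes are left as zeros here.
payload = struct.pack('>H', value1) + bytes(5)
assert len(payload) == 7

# On the Arduino side, value1 = dataLow + (dataHigh * 256) recovers 100.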
There are different ways to handle this; one way is to do a for loop:
for (int i = 0; i < rx.getDataLength(); i++)
{
    Serial.print(rx.getData(i), HEX);
}
and then extract the data from that string, which can be done in many ways.

collecting 'double' type data from arduino

I'm trying to send floating-point data from an Arduino to Python. The data is sent as 8 successive bytes (the size of double) followed by a newline character ('\n'). How do I collect these successive bytes and convert them to the proper format on the Python (system) end?
void USART_transmitdouble(double* d)
{
    union Sharedblock
    {
        char part[sizeof(double)];
        double data;
    } my_block;

    my_block.data = *d;
    for (int i = 0; i < sizeof(double); ++i)
    {
        USART_send(my_block.part[i]);
    }
    USART_send('\n');
}

int main()
{
    USART_init();
    double dble = 5.5;
    while (1)
    {
        USART_transmitdouble(&dble);
    }
    return 0;
}
Python code. Sure, this wouldn't print the data in the proper format, but I just want to show what I have tried:
import serial, time

my_port = serial.Serial('/dev/tty.usbmodemfa131', 19200)
while 1:
    print my_port.readline(),
    time.sleep(0.15)
Update:
import struct

my_ser = serial.Serial('/dev/tty.usbmodemfa131', 19200)
while 1:
    #a = raw_input('enter a value:')
    #my_ser.write(a)
    data = my_ser.read(5)
    f_data, = struct.unpack('<fx', data)
    print f_data
    #time.sleep(0.5)
Using the struct module as shown in the code above is able to print float values. But only about half the time is the data printed correctly. If I mess with time.sleep() or stop the transmission and restart it, incorrect values are printed. I guess the wrong set of 4 bytes is being unpacked in this case. Any idea what we can do here?
On Arduino, a double is the same as float, i.e. a little-endian single-precision floating-point number that occupies 4 bytes of memory. This means that you should read exactly 5 bytes, use the little-endian variant of the f format to unpack it, and ignore the trailing newline with x:
import struct
...
data = my_port.read(5)
num, = struct.unpack('<fx', data)
Note that you don't want to use readline because any byte of the representation of the floating-point number can be '\n'.
As Nikklas B. pointed out, you don't even need to bother with the newline at all, just send the 4 bytes and read as many from Python. In that case the format string will be '<f'.

compute CRC of struct in Python

I have the following struct, from the NRPE daemon code in C:
typedef struct packet_struct {
    int16_t packet_version;
    int16_t packet_type;
    uint32_t crc32_value;
    int16_t result_code;
    char buffer[1024];
} packet;
I want to send this data format to the C daemon from Python. The CRC is calculated when crc32_value is 0, then it is put into the struct. My Python code to do this is as follows:
cmd = '_NRPE_CHECK'
pkt = struct.pack('hhIh1024s', 2, 1, 0, 0, cmd)
# pkt has length of 1034, as it should
checksum = zlib.crc32(pkt) & 0xFFFFFFFF
pkt = struct.pack('hhIh1024s', 2, 1, checksum, 0, cmd)
socket.send(....)
The daemon is receiving these values: version=2 type=1 crc=FE4BBC49 result=0
But it is calculating crc=3731C3FD
The actual C code to compute the CRC is:
https://github.com/KristianLyng/nrpe/blob/master/src/utils.c
and it is called via:
calculate_crc32((char *)packet, sizeof(packet));
When I ported those two functions to Python, I get the same as what zlib.crc32 returns.
Is my struct.pack call correct? Why is my CRC computation differing from the server's?
From the Python struct documentation:
To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.
Use '!' as the first format character to make the packed structure platform-independent. It forces big-endian, standard type sizes, and no pad bytes. Then the CRCs should be consistent.
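Applied to the pack calls from the question, that change is just the leading format character (a sketch; everything else stays the same, with the command given as bytes as the 's' format requires in Python 3):
import struct
import zlib

cmd = b'_NRPE_CHECK'
# '!' = network (big-endian) byte order, standard sizes, no implicit padding.
pkt = struct.pack('!hhIh1024s', 2, 1, 0, 0, cmd)
checksum = zlib.crc32(pkt) & 0xFFFFFFFF
pkt = struct.pack('!hhIh1024s', 2, 1, checksum, 0, cmd)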
