Unpacking Python's struct.pack in another language

I want to "unpack" or deserialize the formatted data output by Python's struct.pack() function. The data is sent over the network to another platform that uses only Java.
The Python function that sends the data over the network uses this formatter:
def formatOutputMsg_Array(self, mac, arr):
    mac_bin = mac.encode("ascii")
    mac_len = len(mac_bin)
    arr_bin = array.array('d', arr).tobytes()
    arr_len = len(arr_bin)
    m = struct.pack('qqd%ss%ss' % (mac_len, arr_len), mac_len, arr_len, time.time(), mac_bin, arr_bin)
    return m
Here are the docs for python's struct (refer to section 7.3.2.2. Format Characters):
https://docs.python.org/2/library/struct.html
1) What does 'qqd%ss%ss' mean?
Does it mean -> long,long,double,char,char,[],char[],char,char[],char[]
2) Why is the modulo operator "%" used here with a tuple, as in 'qqd%ss%ss' % (mac_len, arr_len)?

The first argument to pack is the result of the expression 'qqd%ss%ss' % (mac_len, arr_len). Here % is string formatting, not modulo: each %s is replaced by the value of the corresponding variable. Assuming mac_len == 8 and arr_len == 4, for example, the result is qqd8s4s. That format means two long long values (q), one double (d), then an 8-byte string and a 4-byte string; s preceded by a number simply means to copy that many of the given bytes into the result.
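A minimal Python sketch of the receiving side, assuming the message was packed exactly as in the question. Because the format has no byte-order prefix, struct uses the sender's native byte order and alignment; on common 64-bit platforms that means little-endian with a 24-byte header and no padding, which is the layout a reader in another language would have to mirror. The name parse_array_msg is hypothetical.

```python
import array
import struct
import time

HEADER = 'qqd'  # mac_len, arr_len, timestamp (native byte order, as in the question)

def parse_array_msg(msg):
    """Hypothetical decoder for the message built by formatOutputMsg_Array."""
    mac_len, arr_len, timestamp = struct.unpack_from(HEADER, msg, 0)
    offset = struct.calcsize(HEADER)  # 24 bytes on common platforms
    mac = msg[offset:offset + mac_len].decode('ascii')
    offset += mac_len
    values = array.array('d')
    values.frombytes(msg[offset:offset + arr_len])  # raw doubles, 8 bytes each
    return mac, timestamp, list(values)

# Build a message the same way the sender does, then decode it.
mac_bin = 'aa:bb:cc'.encode('ascii')
arr_bin = array.array('d', [1.5, 2.5]).tobytes()
msg = struct.pack('qqd%ss%ss' % (len(mac_bin), len(arr_bin)),
                  len(mac_bin), len(arr_bin), time.time(), mac_bin, arr_bin)
mac, timestamp, values = parse_array_msg(msg)
```

If both ends are under your control, packing with an explicit prefix such as '<' or '>' removes the dependence on the sender's platform.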

Related

Why can't Python struct module pack (or unpack) multi bytes with little endian

I'm dealing with some multi-byte issues. For example, I have a variable a = b'\x00\x01\x02\x03'; it is a bytes object rather than an int. I'd like to struct.pack it to form a packet in little-endian order, but <4s didn't work. In fact, <4s and >4s give the same result. What should I do if I'd like the result to be b'\x03\x02\x01\x00'?
I know I could use struct.pack('<L', *struct.unpack('>L', a)), but is that the only correct way to deal with multi-byte objects?
Example:
import struct
import secrets
mhdr = b'\x20'
joineui = b'\x00\x01\x02\x03\x04\x05\x06\x07'
deveui = b'\x08\x09\x10\x11\x12\x13\x14\x15'
devnonce = secrets.token_bytes(2)
joinreq = struct.pack(
    '<s8s8s2s',
    mhdr,
    joineui,
    deveui,
    devnonce,
)
# The expected joinreq should be b'\x20\x07\x06\x05\x04\x03\x02\x01\x00\x15\x14\x13\x12\x11\x10\x09\x08...'

It seems to me you do not want to have 4 single chars, but instead 1 integer.
So instead of '4s' you should try using 'i' or 'I' (whether it is signed or unsigned).
Your example should look like
import struct
import secrets
mhdr = b'\x20'
joineui = b'\x00\x01\x02\x03\x04\x05\x06\x07'
deveui = b'\x08\x09\x10\x11\x12\x13\x14\x15'
devnonce = secrets.token_bytes(2)
joinreq = struct.pack(
    '<BQQH',  # use lowercase letters if the values are signed instead of unsigned
    # B, Q and H pack integers, so convert each bytes object to an int first
    int.from_bytes(mhdr, 'big'),
    int.from_bytes(joineui, 'big'),
    int.from_bytes(deveui, 'big'),
    int.from_bytes(devnonce, 'big'),
)
"Q" stands for unsigned long long (8 bytes). If you want to use a float instead, you can use d for a double-precision float (8 bytes).
You can see the meaning of all letters in the documentation of struct.
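If the fields are better kept as bytes, another option (not part of the answer above) is to reverse each field with slicing before packing it with the s format, since s copies bytes unchanged whatever the byte-order prefix is. A small sketch using the question's values, with a fixed devnonce standing in for secrets.token_bytes(2):

```python
import struct

mhdr = b'\x20'
joineui = b'\x00\x01\x02\x03\x04\x05\x06\x07'
deveui = b'\x08\x09\x10\x11\x12\x13\x14\x15'
devnonce = b'\xaa\xbb'  # fixed stand-in for secrets.token_bytes(2)

# [::-1] reverses the byte order of each multi-byte field.
joinreq = struct.pack('<s8s8s2s', mhdr, joineui[::-1], deveui[::-1], devnonce[::-1])

# Same result as converting each field to an integer and packing it little-endian.
via_ints = struct.pack('<BQQH', mhdr[0],
                       int.from_bytes(joineui, 'big'),
                       int.from_bytes(deveui, 'big'),
                       int.from_bytes(devnonce, 'big'))
```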

Convert list of numbers to list of length 1 byte objects [duplicate]

I was trying to build this bytes object in Python 3:
b'3\r\n'
so I tried the obvious (for me), and found a weird behaviour:
>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'
Apparently:
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've been unable to see any pointers on why the bytes conversion works this way reading the documentation. However, I did find some surprise messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):
http://bugs.python.org/issue3982
This interacts even more poorly with oddities like bytes(int) returning zeroes now
and:
It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)
Can someone explain to me where this behaviour comes from?

From python 3.2 you can use to_bytes:
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')
Accordingly, x == int_from_bytes(int_to_bytes(x)).
Note that the above encoding works only for unsigned (non-negative) integers.
For signed integers, the bit length is a bit more tricky to calculate:
def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)
def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)

That's the way it was designed - and it makes sense because usually, you would call bytes on an iterable instead of a single integer:
>>> bytes([3])
b'\x03'
The docs state this, as well as the docstring for bytes:
>>> help(bytes)
...
bytes(int) -> bytes object of size given by the parameter initialized with null bytes

You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
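This works the same on both Python 2 and Python 3, except that Python 3 returns bytes objects, so the results are displayed with a b prefix (for example b'\x01\x00').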
Note: the inverse operation (bytes to int) can be done with unpack.
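A short sketch of that round trip in Python 3, using the same format strings as above. The variable names are only for illustration.

```python
import struct

packed = struct.pack(">I", 1)            # four bytes, big-endian
(value,) = struct.unpack(">I", packed)   # unpack always returns a tuple

little = struct.pack("<H", 1)            # two bytes, little-endian
(value_le,) = struct.unpack("<H", little)
```

The format string passed to unpack has to match the one used for pack, including the byte-order character.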

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:
>>> b'%d\r\n' % 3
b'3\r\n'
See PEP 0461 -- Adding % formatting to bytes and bytearray.
On earlier versions, you could use str and .encode('ascii') the result:
>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'
Note: It is different from what int.to_bytes produces:
>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True

The documentation says:
bytes(int) -> bytes object of size given by the parameter
initialized with null bytes
The sequence:
b'3\r\n'
consists of the character '3' (decimal 51), the character '\r' (13) and the character '\n' (10).
So you can build it in any of these ways, for example:
>>> bytes([51, 13, 10])
b'3\r\n'
>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'
>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'
Tested on IPython 1.1.0 & Python 3.2.3

The ASCIIfication of 3 is "\x33" not "\x03"!
That is what python does for str(3) but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.
The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. You can convert bigger integers with int.to_bytes, for example (1024).to_bytes(2, "little").
Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need some memory of given size allocated. I often use this when initializing arrays or expanding some file by writing zeros to it.

I was curious about performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.
Based on the timings below, and from the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, bytes, and with str.encode (unsurprisingly) being the slowest. Note that the results show some more variation than is represented, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.
Results in CPython 3.7 on Windows:
Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop
Test module (named int_to_byte.py):
"""Functions for converting a single int to a bytes object with that int's value."""
import random
import shlex
import struct
import timeit
def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])
def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')
def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)
# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067
def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921
    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')
converters = [bytes_, to_bytes, struct_pack, chr_encode]
def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)
def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

The behaviour comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray: a completely new type that is not backwards compatible.

From bytes docs:
Accordingly, constructor arguments are interpreted as for bytearray().
Then, from bytearray docs:
The optional source parameter can be used to initialize the array in a few different ways:
If it is an integer, the array will have that size and will be initialized with null bytes.
Note, that differs from 2.x (where x >= 6) behavior, where bytes is simply str:
>>> bytes is str
True
PEP 3112:
The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different.

An int (including Python 2's long) can be converted to bytes using the following function:
import codecs
def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')
The reverse conversion can be done by another one:
import codecs
import six # should be installed via 'pip install six'
long = six.integer_types[-1]
def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)
Both functions work on both Python2 and Python3.

Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.
def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)
def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)
# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))
# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)
For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.
Credit: CervEd for fixing a minor inefficiency.

As you want to deal with binary representation, the best is to use ctypes.
import ctypes
x = ctypes.c_int(1234)
bytes(x)
You must use the specific integer representation (signed/unsigned and the number of bits: c_uint8, c_int8, c_uint16, ...).
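One thing worth knowing with this approach is that the bytes come out in the machine's native byte order. A small sketch comparing it with int.to_bytes, using c_uint16 because its size is fixed at two bytes:

```python
import ctypes
import sys

x = ctypes.c_uint16(1234)
raw = bytes(x)  # 2 bytes, laid out in the machine's native byte order

# int.to_bytes with sys.byteorder gives the same bytes without ctypes.
same = (1234).to_bytes(2, sys.byteorder)
```

If the bytes are meant for a file format or network protocol, passing an explicit 'big' or 'little' to to_bytes avoids depending on the host platform.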

Some answers don't work with large numbers.
Convert integer to the hex representation, then convert it to bytes:
def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)
Result:
>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

I think you can convert the int to str first, before you convert to byte.
That should produce the format you want.
bytes(str(your_number),'UTF-8') + b'\r\n'
It works for me in py3.8.

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is:
>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5
More information on these methods here:
https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes

>>> chr(116).encode()
b't'

Converting a hex value of a string to an Ascii character with a hex value

Specifically in Python 2.4, which is unfortunately old, I need to convert a length into hex value. Length of 1 would be '\x00\x01' while a length of 65535 would be '\xFF\xFF'.
import struct
hexdict = {'0':'\x00\x00', '1':'\x00\x01', '2':'\x00\x02', '3':'\x00\x03', '4':'\x00\x04', '5':'\x00\x05', '6':'\x00\x06', '7':'\x00\x07', '8':'\x00\x08', '9':'\x00\x09', 'a':'\x00\x0a', 'b':'\x00\x0b', 'c':'\x00\x0c', 'd':'\x00\x0d', 'e':'\x00\x0e', 'f':'\x00\x0f'}
def convert(int_value): # Not in original request
    encoded = format(int_value, 'x')
    length = len(encoded)
    encoded = encoded.zfill(length+length%2)
    retval = encoded.decode('hex')
    if int_value < 256:  # pad to two bytes (was testing the global x by mistake)
        retval = '\x00' + retval
    return retval
for x in range(16):
    print hexdict[str(hex(x)[-1])] # Original, terrible method
    print convert(x) # Slightly better method
    print struct.pack(">H", x) # Best method
Aside from having a dictionary like above, how can I convert an arbitrary number <= 65535 into this hex string representation, filling 2 bytes of space?
Thanks to Linuxios and an answer I found while waiting for that answer, I have found three methods to do this. Obviously, Linuxios' answer is the best, unless for some reason importing struct is not desired.

Using Python's built-in struct package:
import struct
struct.pack(">H", x)
For example, struct.pack(">H", 1) gives '\x00\x01' and struct.pack(">H", 65535) gives '\xff\xff'.
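A small round-trip sketch of the same idea, written for Python 3 (where the result is a bytes object rather than a str). The helper names length_to_bytes and bytes_to_length are hypothetical.

```python
import struct

def length_to_bytes(length):
    """Hypothetical helper: encode 0..65535 as two big-endian bytes."""
    return struct.pack(">H", length)

def bytes_to_length(data):
    """Inverse of length_to_bytes; unpack returns a one-element tuple."""
    (length,) = struct.unpack(">H", data)
    return length
```

In Python 3, values outside the 0..65535 range raise struct.error instead of being silently truncated, so no separate range check is needed.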

Get the big-endian byte sequence of integer in Python

Based on this post: How to return RSA key in jwks_uri endpoint for OpenID Connect Discovery
I need to base64url-encode the octet value of these two numbers:
n = 124692971944797177402996703053303877641609106436730124136075828918287037758927191447826707233876916396730936365584704201525802806009892366608834910101419219957891196104538322266555160652329444921468362525907130134965311064068870381940624996449410632960760491317833379253431879193412822078872504618021680609253
e = 65537
The "n" (modulus) parameter contains the modulus value for the RSA public key. It is represented as a Base64urlUInt-encoded value.
Note that implementers have found that some cryptographic libraries
prefix an extra zero-valued octet to the modulus representations they
return, for instance, returning 257 octets for a 2048-bit key, rather
than 256. Implementations using such libraries will need to take
care to omit the extra octet from the base64url-encoded
representation.
The "e" (exponent) parameter contains the exponent value for the RSA
public key. It is represented as a Base64urlUInt-encoded value.
For instance, when representing the value 65537, the octet sequence
to be base64url-encoded MUST consist of the three octets [1, 0, 1];
the resulting representation for this value is "AQAB".
For example, a valid encode should look like this: https://www.googleapis.com/oauth2/v3/certs
How could I do this in Python?

After searching the best way to tackle this problem, using pyjwkest seems to be a good one instead of creating my own function.
pip install pyjwkest
Then we use long_to_base64 function for this
>>> from jwkest import long_to_base64
>>> long_to_base64(65537)
'AQAB'

Unfortunately pack() doesn't support numbers that big, and int.to_bytes() is only supported in Python 3, so we'll have to pack them ourselves before encoding. Inspired by this post I came to a solution by converting to a hex string first:
import math
import base64
def Base64urlUInt(n):
    # fromhex() needs an even number of hex characters,
    # so when converting our number to hex we need to give it an even
    # length. (2 characters per byte, 8 bits per byte)
    length = int(math.ceil(n.bit_length() / 8.0)) * 2
    fmt = '%%0%dx' % length
    packed = bytearray.fromhex(fmt % n)
    return base64.urlsafe_b64encode(packed).rstrip('=')
Resulting in:
n = 124692971944797177402996703053303877641609106436730124136075828918287037758927191447826707233876916396730936365584704201525802806009892366608834910101419219957891196104538322266555160652329444921468362525907130134965311064068870381940624996449410632960760491317833379253431879193412822078872504618021680609253
e = 65537
Base64urlUInt(n) == 'sZGVa39dSmJ5c7mbOsJZaq62MVjPD3xNPb-Aw3VJznk6piF5GGgdMoQmAjNmANVBBpPUyQU2SEHgXQvp6j52E662umdV2xU-1ETzn2dW23jtdTFPHRG4BFZz7m14MXX9i0QqgWVnTRy-DD5VITkFZvBqCEzWjT_y47DYD2Dod-U'
Base64urlUInt(e) == 'AQAB'
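On Python 3, int.to_bytes handles integers of any size, so the hex-string detour is not needed. A minimal sketch under that assumption; the names base64url_uint and base64url_to_uint are hypothetical and not taken from any library.

```python
import base64

def base64url_uint(n):
    """Hypothetical Python 3 variant: unsigned big-endian octets, base64url, no padding."""
    length = max(1, (n.bit_length() + 7) // 8)  # at least one octet, so 0 becomes b'\x00'
    raw = n.to_bytes(length, 'big')
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode('ascii')

def base64url_to_uint(text):
    """Inverse operation: restore the stripped padding, decode, read as big-endian."""
    padded = text + '=' * (-len(text) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), 'big')
```

Computing the length from bit_length() produces the minimal number of octets, so no extra zero-valued leading octet is emitted for the modulus.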

Here is a different bit of Python code for the task, taken from rsalette
import binascii
from base64 import urlsafe_b64decode, urlsafe_b64encode
def bytes_to_int(data):
    """Convert bytes to an integer"""
    hexy = binascii.hexlify(data)
    hexy = b'0'*(len(hexy)%2) + hexy
    return int(hexy, 16)
def b64_to_int(data):
    """Convert urlsafe_b64encode(data) to an integer"""
    return bytes_to_int(urlsafe_b64decode(data))
def int_to_bytes(integer):
    hexy = as_binary('%x' % integer)
    hexy = b'0'*(len(hexy)%2) + hexy
    data = binascii.unhexlify(hexy)
    return data
def int_to_b64(integer):
    """Convert an integer to urlsafe_b64encode() data"""
    return urlsafe_b64encode(int_to_bytes(integer))
def as_binary(text):
    return text.encode('latin1')

C to Python code conversion(print address-like values)

I am trying to convert the following code from c to Python. The C code looks like:
seed = (time(0) ^ (getpid() << 16));
printf("0x%08x \n", seed);
that outputs values like 0x7d24defb.
And the python code:
time1 = int(time.time())
seed = (time1 ^ (os.getpid() <<16))
that outputs values like: 1492460964
What do I need to change in the Python code to get address-like values?

It depends on the way the value is displayed. The %x flag in printf-functions displays the given value in hexadecimal. In Python you can use the hex function to convert the value to a hexadecimal representation.

The equivalent Python code to printf("0x%08x \n", seed); is:
>>> '0x{:08x}'.format(1492460964)
'0x58f525a4'
Note that hex() alone won't pad zeros to size 8 like the C code does.
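Putting the pieces together, a minimal sketch of the whole conversion. The function name format_seed is hypothetical, and the 32-bit mask is an assumption: it mimics a C seed stored in a 32-bit unsigned int, whereas Python integers never overflow.

```python
import os
import time

def format_seed(seed):
    """Format like C's "0x%08x": lowercase hex, zero-padded to 8 digits."""
    # Assumption: the C variable is a 32-bit unsigned int, so keep only the low 32 bits.
    return '0x{:08x}'.format(seed & 0xFFFFFFFF)

seed = int(time.time()) ^ (os.getpid() << 16)
print(format_seed(seed))
```

Without the mask, a large process ID shifted left by 16 bits could produce more than 8 hex digits, which the C version would never print.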

I suppose this is what you want:
>>> n = hex(int(time.time()) ^ (os.getpid() << 16))
>>> print n
0x431c2fd2
