Python 3 struct.pack() printing weird characters

Python 3 struct.pack() printing weird characters - python

I am testing struct module because I would like to send simple commands with parameters in bytes (char) and unsigned int to another application.
However I found some weird things when converting to little endian unsigned int, these examples print the correct hexadecimal representation:
>>> import struct
>>> struct.pack('<I',7)
b'\x07\x00\x00\x00'
>>> struct.pack('<I',11)
b'\x0b\x00\x00\x00'
>>> struct.pack('<I',16)
b'\x10\x00\x00\x00'
>>> struct.pack('<I',15)
b'\x0f\x00\x00\x00'
but these examples apparently not:
>>> struct.pack('<I',10)
b'\n\x00\x00\x00'
>>> struct.pack('<I',32)
b' \x00\x00\x00'
>>> struct.pack('<I',64)
b'#\x00\x00\x00'
I would appreciate any explanation or hint. Thanks beforehand!

Python is being helpful.
The bytes representation will use ASCII characters for any bytes that are printable and escape codes for the rest.
Thus, 0x40 is printed as #, because that's a printable byte. But 0x0a is represented as \n instead, because that is the standard Python escape sequence for a newline character. 0x00 is represented as \x00, a hex escape sequence denoting the NULL byte value. Etc.
All this is just the Python representation when echoing the values, for your debugging benefit. The actual value itself still consists of actual byte values.
>>> b'\x40' == b'#'
True
>>> b'\x0a' == b'\n'
True
It's just that any byte in the printable ASCII range will be shown as that ASCII character rather than a \xhh hex escape or dedicated \c one-character escape sequence.
If you wanted to see only hexadecimal representations, use the binascii.hexlify() function:
>>> import binascii
>>> binascii.hexlify(b'#\x00\x00\x00')
b'40000000'
>>> binascii.hexlify(b'\n\x00\x00\x00')
b'0a000000'
which returns bytes as hex characters (with no prefixes), instead. The return value is of course no longer the same value, you now have a bytestring of twice the original length consisting of characters representing hexadecimal values, literal a through to f and 0 through to 9 characters.

"\xNN" is just the way to represent a non-prinatble character ... it will give you the prinable character if it can
print "\x0a" == "\n" == chr(10)

Related

Bytes operations in Python

I'm working on a project in which I have to perform some byte operations using python and I'd like to understand some basic principals before I go on with it.
t1 = b"\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"
t2 = "\xAC\x42\x4C\x45\x54\x43\x48\x49\x4E\x47\x4C\x45\x59"
print("Adding b character before: ",t1)
print("Using bytes(str): ",bytes(t2,"utf-8"))
print("Using str.encode: ",t2.encode())
In particular, I cannot understand why the console prints this when I run the code above:
C:\Users\Marco\PycharmProjects\codeTest\venv\Scripts\python.exe C:/Users/Marco/PycharmProjects/codeTest/msgPack/temp.py
Adding b character before: b'\xacBLETCHINGLEY'
Using bytes(str): b'\xc2\xacBLETCHINGLEY'
Using str.encode: b'\xc2\xacBLETCHINGLEY'
What I would like to understand is why, if I use bytes() or decode, I get an extra "\xc2" in front of the value. What does it mean? Is this supposed to appear? And if so, how can I get rid of it without using the first method?

Because bytes objects and str objects are two different things. The former represents a sequence of bytes, the latter represents a sequence of unicode code points. There's a huge difference between the byte 172 and the unicode code point 172.
In particular, the byte 172 doesn't encode anything in particular in unicode. On the other hand, unicode code point 172 refers to the following character:
>>> c = chr(172)
>>> print(c)
¬
And of course, they actual raw bytes this would correspond to depend on the encoding. Using utf-8 it is a two-byte encoding:
>>> c.encode()
b'\xc2\xac'
In the latin-1 encoding, it is a 1 byte:
>>> c.encode('latin')
b'\xac'
If you want raw bytes, the most precise/easy way then is to use a bytes-literal.

In a string literal, \xhh (h being a hex digit) selects the corresponding unicode character U+0000 to U+00FF, with U+00AC being the ¬ "not sign". When encoding to utf-8, all code points above 0x7F take two or more bytes. \xc2\xac is the utf-8 encoding of U+00AC.
>>> "\u00AC" == "\xAC"
True
>>> "\u00AC" == "¬"
True
>>> "\xAC" == "¬"
True
>>> "\u00AC".encode('utf-8')
b'\xc2\xac'
>>> "¬".encode("utf-8")
b'\xc2\xac'

Saving byte sequence when translating from string to bytearray

I get a string that is formed from messages of different types. I'm interested in the message, which before the appearance in the string was in the format bytearray, but now comes in the format str. For example, I need to translate 001bc5045000043a, obtained in str format, into byteray format, saving the numbers from this sequence to bytearray.

To convert a string of hex digits to a bytearray, assuming two hex digits per byte, use bytearray.fromhex:
>>> h = '001bc5045000043a'
>>> ba = bytearray.fromhex(h)
>>> ba
bytearray(b'\x00\x1b\xc5\x04P\x00\x04:')
Python will represent bytes as the equivalent ASCII character if the byte is in the ASCII range (0-127 / 0 - 0x7f) and the character is printable, hence 0x3a is displayed as ':'
>>> chr(int('3a', 16))
':'

Byte conversion fail

I have a problem with Python's & bitwise operation:
>>> x = 0xc1
>>> y = 0x7f
>>> x & y
>>> 65
>>> bytes([65])
>>> b'A'
The problem is the conversion from decimal to hex. 65 is 0x41, however Python says that it is 'A'. Why?

The value that you already have is exactly the value you want. From a comment:
I was using bytes function because I want to concat the result of base64.b64decode(coded_string) with one more byte at the end.
bytes([65]) creates a bytes object with a single byte with the numeric value 65. What that number means depends on how you interpret the bytes.
The fact that its repr happens to be b'A' isn't relevant. What the value actually is, is the one byte you want. But the repr of a bytes object, as the docs explain, uses the bytes literal format for convenience. Any byte that matches a printable ASCII character gets represented as that character, a few common values get represented with backslash escapes like \n, and anything else as a hex escape, all within b'…'
So, repr(bytes([65])) is b'A', because byte 65 is the printable ASCII character A.
If you want to get a string with the hexadecimal representation of the number 65, you can use the hex function—or, if you want more control over the formatting, the format function:
>>> hex(65)
'0x41'
>>> format(65, '02x')
'41'
But that's not what you want here. You want the value b'A', and you already have that.

65 is not A in hex, it's A in ASCII code; print(bytes([65])) and print(chr(65)) outputs b'A' and A, respectively (ASCII representations). Hexadecimal is merely a numeral system with 16 as its base. 0x41 is therefore 4 * 16^1 + 1 * 16^0 = 65.

String and hexadecimal representation

I'm trying the following:
>>> a = '\01'
>>> a
>>> '\x01'
>>> b = '\11'
>>> b
>>> '\t'
>>> c = '\21'
>>> c
>>> '\x11'
I don't understand why sometimes I get hexadecimal representation and other times not.
In '\xhh' the 'x' is fundamental or not?

Short Answer
You see the hexadecimal representation for those characters for which your native code page cannot represent your characters
Long Answer
Assuming you are using windows, and your default code page is cp1252, '\01' is a non-printable character, an ascii control code, which stands for Start of Heading. As there is no known printable representation of the character, a hexadecimal value is used to display the value.

The numbers \11, \21 are OCTAL numbers. \11 octal is \x09 (hex) is equal to '\t' (tab char). \21 octal is \x11 hex is 17 decimal.

Why does struct.pack seemingly return "0"?

I have a number in integer form which I need to convert into 4 bytes and store it in a list. I am trying to use the struct module but am unable to get it to work:
struct.pack("i", 34);
This returns 0 when I am expecting the binary equivalent to be printed.
Expected output:
[0x00 0x00 0x00 0x22]
But struct.pack is returning empty. What am I doing wrong?

The output is returned as a byte string, and Python will print such strings as ASCII characters whenever possible:
>>> import struct
>>> struct.pack("i", 34)
b'"\x00\x00\x00'
Note the quote at the start, that's ASCII codepoint 34:
>>> ord('"')
34
>>> hex(ord('"'))
'0x22'
>>> struct.pack("i", 34)[0]
34
Note that in Python 3, the bytes type is a sequence of integers, each value in the range 0 to 255, so indexing in the last example produces the integer value for the byte displayed as ".
For more information on Python byte strings, see What does a b prefix before a python string mean?
If you expected the ordering to be reversed, then you may need to indicate a byte order:
>>> struct.pack(">i",34)
b'\x00\x00\x00"'
where > indicates big-endian alignment.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python 3 struct.pack() printing weird characters - python

"\xNN" is just the way to represent a non-prinatble character ... it will give you the prinable character if it can print "\x0a" == "\n" == chr(10)

Related

Bytes operations in Python

Saving byte sequence when translating from string to bytearray

Byte conversion fail

String and hexadecimal representation

Why does struct.pack seemingly return "0"?

Categories

Resources