I want to use hexadecimal values when working with bytes, but I'm not quite sure how this is done. For instance:
>>> bytearray([0x10,0x20,0x30])
bytearray(b'\x10 0')
Why are 0x20 and 0x30 ignored?
0x20 and 0x30 are not ignored: the repr of a bytearray shows bytes as ASCII characters where possible, and 0x20 happens to be the ASCII code for a space (' '), just as 0x30 is the code for the digit zero ('0').
This is simply a compact way to represent a binary array. You can look up all values and their corresponding characters in the Wikipedia article on ASCII.
If a byte is not a printable character, it is formatted as \x?? where ?? is its hexadecimal code.
They are not ignored. 0x20 is the ASCII codepoint for a space, 0x30 is the digit 0. Both are there in the output, following the \x10 byte.
What you are seeing is the representation of the bytes value, which tries to be as readable as possible. It does so by displaying any byte in the printable ASCII range as that ASCII character. Anything not so representable is shown either as a \xhh escape sequence or as a shorter two-character \? escape (such as \n for a newline or \t for a tab character).
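To make the display rule concrete, here is a small sketch that rebuilds the repr of a bytes object by hand. It is an illustration, not CPython's actual code: the function name bytes_repr_sketch is hypothetical, and the quoting rule (prefer single quotes unless the data contains a single quote but no double quote) is assumed to mirror what CPython does for bytes.

```python
def bytes_repr_sketch(data: bytes) -> str:
    """Rebuild repr(data) by hand for a bytes object (hypothetical helper)."""
    # Prefer single quotes; switch to double quotes only when the data
    # contains a single quote and no double quote.
    quote = "'"
    if b"'" in data and b'"' not in data:
        quote = '"'

    # Bytes that get a dedicated two-character escape.
    short_escapes = {0x09: "\\t", 0x0A: "\\n", 0x0D: "\\r", 0x5C: "\\\\"}

    parts = []
    for byte in data:  # iterating over bytes yields ints 0..255
        if byte in short_escapes:
            parts.append(short_escapes[byte])
        elif byte == ord(quote):
            parts.append("\\" + quote)       # escape the chosen quote character
        elif 0x20 <= byte < 0x7F:
            parts.append(chr(byte))          # printable ASCII: show the character
        else:
            parts.append(f"\\x{byte:02x}")   # everything else: \xhh
    return "b" + quote + "".join(parts) + quote


sample = bytes([0x10, 0x20, 0x30])
print(bytes_repr_sketch(sample))                  # b'\x10 0'
print(bytes_repr_sketch(sample) == repr(sample))  # True
```

Only the display is computed here; the underlying bytes are never changed.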
Note that Python produces an integer value for each 0xhh hex notation; it's nothing more than alternative syntax to produce the same integer value. You could have used decimal notation or octal notation too, and the result would have been the same; you are putting integer values into a list object, from which you then create a bytearray object. The original hex notation is not preserved in this process:
>>> [0x10, 0x20, 0x30]
[16, 32, 48]
>>> bytearray([16, 32, 48])
bytearray(b'\x10 0')
>>> [0o20, 0o40, 0o60]
[16, 32, 48]
>>> bytearray([0o20, 0o40, 0o60])
bytearray(b'\x10 0')
The actual values in the bytearray are still integers anyway; if you index into the object you get the individual byte values:
>>> ba = bytearray([0x10, 0x20, 0x30])
>>> ba
bytearray(b'\x10 0')
>>> ba[1] # 0x20, the space
32
>>> ba[2] # 0x30, the 0 digit
48
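If you want to see hexadecimal for every byte regardless of printability, you can format the integers yourself. A minimal sketch using only built-ins (bytearray.hex() and format specifications):

```python
ba = bytearray([0x10, 0x20, 0x30])

# One hex string for the whole buffer (no prefixes, two digits per byte).
print(ba.hex())                            # 102030

# A list of hex literals, one per byte.
print([hex(b) for b in ba])                # ['0x10', '0x20', '0x30']

# Zero-padded and space-separated, similar to a hex dump.
print(" ".join(f"{b:02x}" for b in ba))    # 10 20 30

# The same bytes shown as plain integers.
print(list(ba))                            # [16, 32, 48]
```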
Related
I receive a string that is built from messages of different types. The message I'm interested in was a bytearray before it was put into the string, but now it arrives as a str. For example, I need to convert 001bc5045000043a, received as a str, into a bytearray, keeping the byte values from this sequence.
To convert a string of hex digits to a bytearray, assuming two hex digits per byte, use bytearray.fromhex:
>>> h = '001bc5045000043a'
>>> ba = bytearray.fromhex(h)
>>> ba
bytearray(b'\x00\x1b\xc5\x04P\x00\x04:')
Python represents a byte as the equivalent ASCII character if the byte is in the ASCII range (0-127, i.e. 0x00-0x7f) and the character is printable; hence 0x3a is displayed as ':' (and 0x50 as 'P'):
>>> chr(int('3a', 16))
':'
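The conversion can be checked in both directions. This short sketch (the variable names are only illustrative) shows that the bytearray holds the integer value of each hex pair, and that bytearray.hex() turns it back into the original string:

```python
h = '001bc5045000043a'
ba = bytearray.fromhex(h)

# Each pair of hex digits became one integer byte value.
print(list(ba))                  # [0, 27, 197, 4, 80, 0, 4, 58]

# 0x50 is 'P' and 0x3a is ':', which is why they appear as characters in the repr.
print(chr(0x50), chr(0x3a))      # P :

# Round trip back to the hex string.
print(ba.hex() == h)             # True

# fromhex also accepts whitespace between byte pairs.
print(bytearray.fromhex('00 1b c5') == ba[:3])  # True
```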
I'm trying to implement the Blowfish algorithm in Python. The way I understand it, I have to use a key like "abcd" and then XOR it with a hexadecimal array (cycling the key if necessary):
P = (
0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 0xa4093822, 0x299f31d0,
0x082efa98, 0xec4e6c89, 0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917, 0x9216d5d9, 0x8979fb1b,
)
The data types here have me very confused. I saw somewhere that 'abcd' = 0x61626364. In that case, XORing the first element of P would simply be 0x61626364 ^ 0x243f6a88.
So, how do I convert a string like 'abcd' to the format 0x?????, or is there a better way? Any light on this would be much appreciated!
To convert a string to an array of bytes:
b = bytes('abcd', 'ascii')
To convert an array of bytes to an int:
i = int.from_bytes(b, byteorder='big', signed=False)
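Applied to the question, a sketch of the XOR step might look like this. It assumes the usual Blowfish convention of reading the key cyclically, four bytes per 32-bit big-endian word; the helper name xor_key_into_p is hypothetical, and only the first six P entries from the question are used. This is only the initial XOR of the key schedule, not a full Blowfish implementation (the real schedule then runs encryptions to refill P and the S-boxes).

```python
def xor_key_into_p(p, key: bytes):
    """Hypothetical helper: XOR a key, read cyclically 4 bytes at a time, into each 32-bit word of p."""
    result = []
    j = 0  # position in the key, wraps around
    for word in p:
        # Take the next 4 key bytes (cycling) and read them as one big-endian 32-bit int.
        chunk = bytes(key[(j + k) % len(key)] for k in range(4))
        j = (j + 4) % len(key)
        data = int.from_bytes(chunk, byteorder='big', signed=False)
        result.append(word ^ data)
    return result


P = (0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 0xa4093822, 0x299f31d0)

key = bytes('abcd', 'ascii')
print(hex(int.from_bytes(key, byteorder='big', signed=False)))  # 0x61626364
print(hex(xor_key_into_p(P, key)[0]))                          # 0x455d09ec
```

With a 3-byte key such as b'abc', the words become 'abca', 'bcab', and so on, which is what "cycling the key" means here.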
Two hexadecimal digits can encode exactly one byte. This makes sense, because each hexadecimal digit can be in 16 different states, so two hexadecimal digits can be in 16 * 16 = 256 different states, which is exactly the same as the number of states representable in a single byte.
Because ASCII characters can also be encoded in a single byte, any ASCII character can be encoded as two hexadecimal digits.
For example, the letter a has character code 97 in ASCII. Converting the decimal number 97 to base 16 (hexadecimal) gives you 0x61.
You can therefore take any string and convert it into a hexadecimal number by taking every character and representing it as two hex digits in your number. Looking at your example above, a = 0x61, b = 0x62, c = 0x63, and d = 0x64. Putting these all together gives you the representation abcd = 0x61626364.
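The same construction can be done by hand: shift the running number left by 8 bits (one byte, i.e. two hex digits) and OR in the next character code. A minimal sketch with a hypothetical helper name, checked against int.from_bytes:

```python
def ascii_to_int(text: str) -> int:
    """Hypothetical helper: pack ASCII characters into one integer, first character most significant."""
    n = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not an ASCII character: {ch!r}")
        n = (n << 8) | code  # make room for two more hex digits, then add this byte
    return n


n = ascii_to_int('abcd')
print(hex(n))                                          # 0x61626364
print(n == int.from_bytes(b'abcd', byteorder='big'))   # True
print('abcd'.encode('ascii').hex())                    # 61626364
```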
I have a problem with Python's & bitwise operation:
>>> x = 0xc1
>>> y = 0x7f
>>> x & y
65
>>> bytes([65])
b'A'
The problem is the conversion from decimal to hex: 65 is 0x41, but Python says that it is 'A'. Why?
The value that you already have is exactly the value you want. From a comment:
I was using bytes function because I want to concat the result of base64.b64decode(coded_string) with one more byte at the end.
bytes([65]) creates a bytes object with a single byte with the numeric value 65. What that number means depends on how you interpret the bytes.
The fact that its repr happens to be b'A' isn't relevant; the value actually is the one byte you want. The repr of a bytes object, as the docs explain, uses the bytes literal format for convenience: any byte that matches a printable ASCII character is represented as that character, a few common values are represented with backslash escapes like \n, and anything else as a hex escape, all within b'…'.
So, repr(bytes([65])) is b'A', because byte 65 is the printable ASCII character A.
If you want a string with the hexadecimal representation of the number 65, you can use the hex function or, if you want more control over the formatting, the format function:
>>> hex(65)
'0x41'
>>> format(65, '02x')
'41'
But that's not what you want here. You want the value b'A', and you already have that.
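For the use case in the comment, the bytes object can be concatenated directly. In this sketch coded_string is a made-up example value; the appended byte keeps its numeric value 65 (0x41) even though it is displayed as A:

```python
import base64

x = 0xc1
y = 0x7f
extra = bytes([x & y])          # one byte with value 65 (0x41), shown as b'A'

coded_string = 'aGVsbG8='       # hypothetical input: base64 for b'hello'
data = base64.b64decode(coded_string) + extra

print(data)                     # b'helloA'
print(data[-1], hex(data[-1]))  # 65 0x41
```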
65 is not A in hex; it's A in ASCII. print(bytes([65])) and print(chr(65)) output b'A' and A, respectively (ASCII representations). Hexadecimal is merely a numeral system with 16 as its base, so 0x41 is 4 * 16^1 + 1 * 16^0 = 65.
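The positional arithmetic can be spelled out in a few lines. This sketch (hex_digits_value is a hypothetical name) evaluates a hex string digit by digit and compares the result with the built-in int(s, 16):

```python
def hex_digits_value(s: str) -> int:
    """Hypothetical helper: evaluate a hex string as the sum of digit * 16**position."""
    total = 0
    for position, digit in enumerate(reversed(s.lower())):
        total += int(digit, 16) * 16 ** position
    return total


print(hex_digits_value('41'))                           # 65, i.e. 4 * 16**1 + 1 * 16**0
print(hex_digits_value('41') == int('41', 16) == 0x41)  # True
print(chr(0x41))                                        # A (the same number read as an ASCII code)
```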
I am testing the struct module because I would like to send simple commands with parameters in bytes (char) and unsigned int to another application.
However, I found something strange when converting to little-endian unsigned int. These examples print the correct hexadecimal representation:
>>> import struct
>>> struct.pack('<I',7)
b'\x07\x00\x00\x00'
>>> struct.pack('<I',11)
b'\x0b\x00\x00\x00'
>>> struct.pack('<I',16)
b'\x10\x00\x00\x00'
>>> struct.pack('<I',15)
b'\x0f\x00\x00\x00'
but these apparently do not:
>>> struct.pack('<I',10)
b'\n\x00\x00\x00'
>>> struct.pack('<I',32)
b' \x00\x00\x00'
>>> struct.pack('<I',64)
b'@\x00\x00\x00'
I would appreciate any explanation or hint. Thanks beforehand!
Python is being helpful.
The bytes representation will use ASCII characters for any bytes that are printable and escape codes for the rest.
Thus, 0x40 is printed as @, because that's a printable byte. But 0x0a is represented as \n instead, because that is the standard Python escape sequence for a newline character. 0x00 is represented as \x00, a hex escape sequence denoting the NULL byte value. Etc.
All this is just the Python representation when echoing the values, for your debugging benefit. The actual value itself still consists of actual byte values.
>>> b'\x40' == b'@'
True
>>> b'\x0a' == b'\n'
True
It's just that any byte in the printable ASCII range will be shown as that ASCII character rather than a \xhh hex escape or dedicated \c one-character escape sequence.
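One way to convince yourself that only the display differs is to look at the packed values as integers and unpack them again. A short sketch using struct.pack and struct.unpack with the same '<I' format as the question:

```python
import struct

for n in (7, 10, 32, 64):
    packed = struct.pack('<I', n)
    # The repr may show \n, a space or @, but the bytes are plain integers,
    # and unpacking recovers the original number.
    print(n, packed, list(packed), struct.unpack('<I', packed)[0])
```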
If you wanted to see only hexadecimal representations, use the binascii.hexlify() function:
>>> import binascii
>>> binascii.hexlify(b'@\x00\x00\x00')
b'40000000'
>>> binascii.hexlify(b'\n\x00\x00\x00')
b'0a000000'
which instead returns the bytes as hex characters (with no prefixes). The return value is of course no longer the same value: you now have a bytestring twice the original length, consisting of the characters 0 through 9 and a through f that represent the hexadecimal values.
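On Python 3.5 and later, bytes.hex() gives the same digits as a str instead of bytes, and since Python 3.8 it also accepts a separator argument. A short sketch comparing the two:

```python
import binascii

packed = b'@\x00\x00\x00'

print(binascii.hexlify(packed))  # b'40000000' (bytes)
print(packed.hex())              # 40000000 (str)
print(packed.hex(' '))           # 40 00 00 00 (separator needs Python 3.8+)

# Converting back recovers the original bytes.
print(bytes.fromhex(packed.hex()) == packed)  # True
```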
"\xNN" is just the way to represent a non-printable character; Python shows the printable character when it can:
print("\x0a" == "\n" == chr(10))
I have a number in integer form which I need to convert into 4 bytes and store it in a list. I am trying to use the struct module but am unable to get it to work:
struct.pack("i", 34)
This returns 0 when I am expecting the binary equivalent to be printed.
Expected output:
[0x00 0x00 0x00 0x22]
But struct.pack is returning empty. What am I doing wrong?
The output is returned as a byte string, and Python will print such strings as ASCII characters whenever possible:
>>> import struct
>>> struct.pack("i", 34)
b'"\x00\x00\x00'
Note the quote at the start; that's ASCII codepoint 34:
>>> ord('"')
34
>>> hex(ord('"'))
'0x22'
>>> struct.pack("i", 34)[0]
34
Note that in Python 3, the bytes type is a sequence of integers, each value in the range 0 to 255, so indexing in the last example produces the integer value for the byte displayed as ".
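Indexing and slicing behave differently, which is worth seeing side by side: an index gives an int, a slice gives a (shorter) bytes object, and iterating yields ints. A sketch using the same value; the explicit '<i' format is an assumption made here so the result does not depend on the machine's native byte order:

```python
import struct

packed = struct.pack('<i', 34)   # little-endian, standard 4-byte size

print(packed[0])                 # 34, an int
print(packed[:1])                # b'"', a bytes object of length 1
print([b for b in packed])       # [34, 0, 0, 0]
```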
For more information on Python byte strings, see the question "What does a b prefix before a python string mean?"
If you expected the ordering to be reversed, then you may need to indicate a byte order:
>>> struct.pack(">i",34)
b'\x00\x00\x00"'
where > indicates big-endian byte order (it also selects standard sizes with no alignment padding).
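To get something close to the expected [0x00 0x00 0x00 0x22] output, convert the packed bytes to a list and format each value. A minimal sketch; int.to_bytes is shown as an alternative that does not need struct:

```python
import struct

packed = struct.pack('>i', 34)          # big-endian: most significant byte first

as_ints = list(packed)                  # [0, 0, 0, 34]
as_hex = [f'0x{b:02x}' for b in as_ints]
print(as_hex)                           # ['0x00', '0x00', '0x00', '0x22']

# The same four bytes without struct; signed=True matches the signed 'i' format.
print((34).to_bytes(4, byteorder='big', signed=True) == packed)  # True
```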