Saving byte sequence when translating from string to bytearray - python

I receive a string made up of messages of different types. I'm interested in one message that was originally a bytearray but now arrives as a str. For example, I need to convert 001bc5045000043a, received as a str, into a bytearray that holds the same byte values.

To convert a string of hex digits to a bytearray, assuming two hex digits per byte, use bytearray.fromhex:
>>> h = '001bc5045000043a'
>>> ba = bytearray.fromhex(h)
>>> ba
bytearray(b'\x00\x1b\xc5\x04P\x00\x04:')
Python displays a byte as the equivalent ASCII character if the byte is in the ASCII range (0-127, i.e. 0x00-0x7f) and the character is printable, hence 0x3a is displayed as ':':
>>> chr(int('3a', 16))
':'
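As a quick check that only the display changes and not the data, the round trip below converts the hex string to a bytearray, lists the integer values, and converts back. This is a minimal sketch; the variable names are illustrative.

```python
# Round trip between a hex string and a bytearray.
hex_text = '001bc5045000043a'
data = bytearray.fromhex(hex_text)

# Each element is an integer 0-255, regardless of how repr() displays it.
values = list(data)

# .hex() returns the two-hex-digits-per-byte string again.
round_trip = data.hex()

print(values)      # [0, 27, 197, 4, 80, 0, 4, 58]
print(round_trip)  # 001bc5045000043a
```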

Related

Convert from ASCII to Hex in Python

I'm trying to convert a string with special characters from ASCII to Hex using python, but it doesn't seem that I'm getting the correct value, noting that it works just fine whenever I try to convert a string that has no special characters. So basically here is what I'm doing:
import binascii
s = "D`Cزف³›"
s_bytes = str.encode(s)
hex_value = str(binascii.hexlify(s_bytes),'ascii')
print (hex_value)
Output
446043d8b2d981c2b316e280ba
Where the output should be (using online converter https://www.rapidtables.com/convert/number/ascii-to-hex.html):
446043632641b3203a
str.encode(s) defaults to UTF-8 encoding, which doesn't give you the byte values needed to get the desired output. The values you want are simply the Unicode ordinals written in hexadecimal, so get each ordinal, convert it to hex and join them all together:
s = 'D`Cزف³›'
h = ''.join([f'{ord(c):x}' for c in s])
print(h)
446043632641b3203a
Just realize that Unicode ordinals can be 1-6 hexadecimal digits long, so there is no easy way to reverse the process since you have no spacing of the numbers.
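If the conversion does need to be reversible, one option is to keep a separator between the ordinals. The sketch below assumes '-' as the separator, which is an arbitrary choice and not something the online converter does. The string is written with \u escapes only to keep the code points unambiguous; it is the same text as above.

```python
# Same characters as 'D`C' followed by U+0632, U+0641, U+00B3, U+203A.
s = 'D`C\u0632\u0641\u00b3\u203a'

# Join the hex ordinals with a separator so the boundaries survive.
encoded = '-'.join(f'{ord(c):x}' for c in s)

# Reverse: split on the separator and turn each hex field back into a character.
decoded = ''.join(chr(int(field, 16)) for field in encoded.split('-'))

print(encoded)       # 44-60-43-632-641-b3-203a
print(decoded == s)  # True
```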

Python incorrectly converts between bytes and hex for me

I have an info_address that I want to convert to delimited hex
info_address_original = b'002dd748'
What I want is
info_address_coded = b'\x00\x2d\xd7\x48'
I tried this solution
info_address_original = b'002dd748'
info_address_intermediary = info_address_original.decode("utf-8") # '002dd748'
info_address_coded = bytes.fromhex( info_address_intermediary ) # b'\x00-\xd7H'
and I get
info_address_coded = b'\x00-\xd7H'
How would one go about correctly turning a bytes string like that into delimited hex? It worked implicitly in Python 2, but it doesn't work the way I want in Python 3.
This is only a representation of the bytes. '-' is the same as '\x2d'.
>>> b'\x00\x2d\xd7\x48' == b'\x00-\xd7H'
True
The default representation of a byte string displays the character for every printable ASCII byte and the \xhh escape, where hh is the hexadecimal value of the byte, for every other byte.
That means that b'\x00\x2d\xd7\x48' and b'\x00-\xd7H' are the exact same string containing 4 bytes.
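If the goal is only to see every byte as a \xhh escape, the display string has to be built by hand, because repr() always prefers printable characters. A minimal sketch; as_hex_escapes is a hypothetical helper name, not part of the standard library:

```python
info_address_original = b'002dd748'
info_address_coded = bytes.fromhex(info_address_original.decode('ascii'))

def as_hex_escapes(data):
    """Return a display string with every byte written as \\xhh (hypothetical helper)."""
    return ''.join(f'\\x{byte:02x}' for byte in data)

print(info_address_coded)                  # b'\x00-\xd7H'
print(as_hex_escapes(info_address_coded))  # \x00\x2d\xd7\x48
```

The result of as_hex_escapes is a str meant for display only; the 4-byte value in info_address_coded is what should be passed on to other code.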

How to XOR literal with a string

I'm trying to implement the Blowfish algorithm in Python. The way I understand it, I have to use a key like "abcd" and then XOR it with a hexadecimal array (cycling the key if necessary)
P = (
0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 0xa4093822, 0x299f31d0,
0x082efa98, 0xec4e6c89, 0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917, 0x9216d5d9, 0x8979fb1b,
)
The data types here have me very confused. I saw somewhere that 'abcd' = 0x61626364. In that case, XORing the first element of P would simply be 0x61626364 ^ 0x243f6a88.
So, how do I convert a string like 'abcd' to the format 0x????????? Or perhaps there's a better way? Any light on this would be much appreciated!
To convert a string to an array of bytes:
b = bytes('abcd', 'ascii')
To convert an array of bytes to an int:
i = int.from_bytes(b, byteorder='big', signed=False)
Two hexadecimal digits can encode exactly one byte. This makes sense, because each hexadecimal digit can be in 16 different states, so two hexadecimal digits can be in 16 * 16 = 256 different states, which is exactly the same as the number of states representable in a single byte.
Because ASCII characters can also be encoded in a single byte, any ASCII character can be encoded as two hexadecimal digits.
For example, the letter a has character code 97 in ASCII. Converting the decimal number 97 to base 16 (hexadecimal) gives you 0x61.
You can therefore take any string and convert it into a hexadecimal number by taking every character and representing it as two hex digits in your number. Looking at your example above, a = 0x61, b = 0x62, c = 0x63, and d = 0x64. Putting these all together gives you the representation abcd = 0x61626364.
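Putting the two conversions together, the sketch below cycles the key bytes, packs them into 32-bit big-endian integers and XORs them with the entries of P. It only illustrates the data-type handling asked about and is not a complete Blowfish key schedule; key_words is a hypothetical helper name, and only the first four entries of P are used.

```python
from itertools import cycle, islice

# First four entries of the P array from the question.
P = (0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344)

key = b'abcd'

def key_words(key, count):
    """Cycle the key bytes and pack them into `count` big-endian 32-bit ints."""
    stream = bytes(islice(cycle(key), 4 * count))
    return [int.from_bytes(stream[i:i + 4], byteorder='big')
            for i in range(0, len(stream), 4)]

words = key_words(key, len(P))
mixed = [p ^ w for p, w in zip(P, words)]

print(hex(words[0]))  # 0x61626364
print([hex(m) for m in mixed])
```

With a key whose length is not a multiple of 4, the cycling makes the words differ from one another, which is the "cycling the key if necessary" part of the question.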

Byte conversion fail

I have a problem with Python's & bitwise operation:
>>> x = 0xc1
>>> y = 0x7f
>>> x & y
65
>>> bytes([65])
b'A'
The problem is the conversion from decimal to hex. 65 is 0x41, however Python says that it is 'A'. Why?
The value that you already have is exactly the value you want. From a comment:
I was using bytes function because I want to concat the result of base64.b64decode(coded_string) with one more byte at the end.
bytes([65]) creates a bytes object with a single byte with the numeric value 65. What that number means depends on how you interpret the bytes.
The fact that its repr happens to be b'A' isn't relevant. What the value actually is, is the one byte you want. But the repr of a bytes object, as the docs explain, uses the bytes literal format for convenience. Any byte that matches a printable ASCII character gets represented as that character, a few common values get represented with backslash escapes like \n, and anything else as a hex escape, all within b'…'
So, repr(bytes([65])) is b'A', because byte 65 is the printable ASCII character A.
If you want to get a string with the hexadecimal representation of the number 65, you can use the hex function—or, if you want more control over the formatting, the format function:
>>> hex(65)
'0x41'
>>> format(65, '02x')
'41'
But that's not what you want here. You want the value b'A', and you already have that.
65 is not A in hex, it's A in ASCII code; print(bytes([65])) and print(chr(65)) output b'A' and A, respectively (ASCII representations). Hexadecimal is merely a numeral system with 16 as its base. 0x41 is therefore 4 * 16^1 + 1 * 16^0 = 65.
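For the use case mentioned in the comment, appending one byte to the result of base64.b64decode, a minimal sketch could look like the following. The value of coded_string is a made-up example (the base64 encoding of b'hello'), not data from the question.

```python
import base64

coded_string = 'aGVsbG8='   # hypothetical input: base64 of b'hello'
extra = 0xc1 & 0x7f         # 65, i.e. 0x41

# bytes([extra]) is a single byte with value 65; concatenation appends it.
payload = base64.b64decode(coded_string) + bytes([extra])

print(payload)        # b'helloA'
print(payload[-1])    # 65
print(payload.hex())  # 68656c6c6f41
```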

Python 3 struct.pack() printing weird characters

I am testing struct module because I would like to send simple commands with parameters in bytes (char) and unsigned int to another application.
However, I found some odd behavior when converting to little-endian unsigned int. These examples print the correct hexadecimal representation:
>>> import struct
>>> struct.pack('<I',7)
b'\x07\x00\x00\x00'
>>> struct.pack('<I',11)
b'\x0b\x00\x00\x00'
>>> struct.pack('<I',16)
b'\x10\x00\x00\x00'
>>> struct.pack('<I',15)
b'\x0f\x00\x00\x00'
but these examples apparently do not:
>>> struct.pack('<I',10)
b'\n\x00\x00\x00'
>>> struct.pack('<I',32)
b' \x00\x00\x00'
>>> struct.pack('<I',64)
b'@\x00\x00\x00'
I would appreciate any explanation or hint. Thanks beforehand!
Python is being helpful.
The bytes representation will use ASCII characters for any bytes that are printable and escape codes for the rest.
Thus, 0x40 is printed as @, because that's a printable byte. But 0x0a is represented as \n instead, because that is the standard Python escape sequence for a newline character. 0x00 is represented as \x00, a hex escape sequence denoting the NUL byte value. Etc.
All this is just the Python representation when echoing the values, for your debugging benefit. The actual value itself still consists of actual byte values.
>>> b'\x40' == b'@'
True
>>> b'\x0a' == b'\n'
True
It's just that any byte in the printable ASCII range will be shown as that ASCII character rather than a \xhh hex escape or dedicated \c one-character escape sequence.
If you wanted to see only hexadecimal representations, use the binascii.hexlify() function:
>>> import binascii
>>> binascii.hexlify(b'@\x00\x00\x00')
b'40000000'
>>> binascii.hexlify(b'\n\x00\x00\x00')
b'0a000000'
which instead returns the bytes as hex characters (with no prefixes). The return value is of course no longer the same value: you now have a bytestring of twice the original length, consisting of the characters 0 through 9 and a through f that represent the hexadecimal digits.
"\xNN" is just the way to represent a non-printable character; Python will show you the printable character if it can:
print("\x0a" == "\n" == chr(10))
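To confirm that the packed values are intact whatever repr() shows, the sketch below packs a few of the integers from the question, prints the bytes.hex() form next to the repr, and unpacks them again. The names packed_hex and unpacked are illustrative.

```python
import struct

values = (7, 10, 32, 64)
packed_hex = {}
unpacked = {}

for value in values:
    packed = struct.pack('<I', value)
    # .hex() writes every byte as two hex digits, unlike repr().
    packed_hex[value] = packed.hex()
    # unpack returns a tuple; the first element is the original integer.
    unpacked[value] = struct.unpack('<I', packed)[0]
    print(value, repr(packed), packed_hex[value])

print(unpacked)  # {7: 7, 10: 10, 32: 32, 64: 64}
```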
