How to XOR literal with a string - python

I'm trying to implement the Blowfish algorithm in Python. The way I understand it, I have to use a key like "abcd" and then XOR it with a hexadecimal array (cycling the key if necessary):
P = (
0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 0xa4093822, 0x299f31d0,
0x082efa98, 0xec4e6c89, 0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917, 0x9216d5d9, 0x8979fb1b,
)
The data types here have me very confused. I saw somewhere that 'abcd' = 0x61626364. In that case, XORing the first element of P would simply be 0x61626364 ^ 0x243f6a88.
So, how do I convert a string like 'abcd' to the format 0x?????, or is there perhaps a better way? Any light on this would be much appreciated!

To convert a string to an array of bytes:
b = bytes('abcd', 'ascii')
To convert array of bytes to int:
i = int.from_bytes(b, byteorder='big', signed=False)
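Putting those two calls together, the XOR step from the question could be sketched as follows. This covers only the XOR of a cycled key into the array, not a complete Blowfish key schedule; the helper name xor_key_into_p is made up for illustration, and only the first four P entries from the question are used.

```python
from itertools import cycle

# First four P-array entries from the question (the full array has 18 words).
P = (0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344)

def xor_key_into_p(key: bytes, p_array):
    """XOR a cyclically repeated key into each 32-bit word of p_array."""
    if not key:
        raise ValueError("key must not be empty")
    key_stream = cycle(key)  # repeats the key bytes forever
    result = []
    for word in p_array:
        # Take the next four key bytes and read them as one big-endian integer.
        chunk = bytes(next(key_stream) for _ in range(4))
        result.append(word ^ int.from_bytes(chunk, byteorder='big'))
    return result

key = bytes('abcd', 'ascii')
mixed = xor_key_into_p(key, P)
print(hex(mixed[0]))  # 0x455d09ec, i.e. 0x61626364 ^ 0x243f6a88
```

Because the key is consumed through cycle, a key whose length is not a multiple of four simply wraps around in the middle of a word.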

Two hexadecimal digits can encode exactly one byte. This makes sense, because each hexadecimal digit can be in 16 different states, so two hexadecimal digits can be in 16 * 16 = 256 different states, which is exactly the same as the number of states representable in a single byte.
Because ASCII characters can also be encoded in a single byte, any ASCII character can be encoded as two hexadecimal digits.
For example, the letter a has character code 97 in ASCII. Converting the decimal number 97 to base 16 (hexadecimal) gives you 0x61.
You can therefore take any string and convert it into a hexadecimal number by taking every character and representing it as two hex digits in your number. Looking at your example above, a = 0x61, b = 0x62, c = 0x63, and d = 0x64. Putting these all together gives you the representation abcd = 0x61626364.
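As a small sketch of that digit-by-digit construction (assuming the string contains only ASCII characters, so every character fits in two hex digits):

```python
text = 'abcd'

# Two hex digits per character, joined in order.
hex_digits = ''.join(f'{ord(ch):02x}' for ch in text)
print(hex_digits)            # 61626364

# Reading those digits as a base-16 number gives the integer form.
value = int(hex_digits, 16)
print(value == 0x61626364)   # True
```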

Related

How to compare a hex byte with its literal (visual) representation in Python?

I want to compare two entities, one being an int as a single byte and the other a str which is the ASCII code of the visual representation (visual reading) of that byte (not its ASCII value).
For example: I have the byte 0x5a, which I want to compare with a string that says '5a' (or '5A', case is not important). I don't need to compare the byte versus the 'Z' ASCII character, which in my case would be a different thing.
How can I do that?
There are functions that convert a number into its string representation in a given base. In your case, hex should do the trick. For example:
>>> hex(0x5a)
'0x5a'
>>> hex(0x5a)[2:] # get rid of `0x` if you don't want it
'5a'
You can use hex() to turn the integer into a hex string, and then you can slice off the first two characters using string slicing to remove the leading 0x:
lhs = 90
rhs = "5a"
print(hex(lhs)[2:] == rhs)
This outputs:
True
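Since the question says case is not important, one possible variation is to normalise the string before comparing. The sketch below assumes the string always holds exactly two hex digits; the function name matches_hex_text is hypothetical. It uses format with '02x' so that small values keep their leading zero, which hex(...)[2:] would drop.

```python
def matches_hex_text(byte_value: int, text: str) -> bool:
    """Return True if text is the two-digit hex spelling of byte_value, ignoring case."""
    return format(byte_value, '02x') == text.lower()

print(matches_hex_text(0x5a, '5a'))  # True
print(matches_hex_text(0x5a, '5A'))  # True
print(matches_hex_text(0x5a, 'Z'))   # False
print(matches_hex_text(0x0a, '0a'))  # True
```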

Why is a whitespace character only represented by 6 bits in ASCII?

I have written some code in Python to represent strings using their ASCII counterparts. I have noticed that every character is replaced by 7 bits (as I expected). The problem is that every time I include a space in the string I am converting, it is represented by only 6 bits instead of 7. This is a bit of a problem for a Vernam Cipher program I am writing, where my ASCII code is always a few bits shorter than my key because of the spaces. Here is the code and output below:
string = 'Hello t'
ASCII = ""
for c in string:
    ASCII += bin(ord(c))
ASCII = ASCII.replace('0b', ' ')
print(ASCII)
Output: 1001000 1100101 1101100 1101100 1101111 100000 1110100
As can be seen in the output the 6th sequence of bits which represents the space character has only 6 bits and not 7 like the rest of the characters.
Instead of bin(ord(c)), which omits leading zeros, use string formatting to ensure a minimum width:
f'{ord(c):07b}'
The problem lies in your conversion: the value for a space happens to need only 6 bits, and the bin built-in simply doesn't left-pad with zeros. That is why you get 7 bits for the other characters. It would be more comfortable, though, to use 8 bits for everything.
One way to go is, instead of using the bin call, use string formatting operators: these can, besides the base conversion, pad the missing bits with 0s:
string = 'Hello t'
# aggregating values in a list so that you can easily separate the binary strings with a " "
# by using ".join"
bin_strings = []
for code in string.encode("ASCII"):  # you really should do this from bytes -
                                     # which are encoded text. Moreover, iterating a bytes
                                     # object yields 0-255 ints, no need to call "ord"
    bin_strings.append(f"{code:08b}")  # format the number in `code` in base 2 (b), with 8 digits, padding with 0s
ASCII = ' '.join(bin_strings)
Or, as a oneliner:
ASCII = " ".join(f"{code:08b}" for code in "Hello World".encode("ASCII"))
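With a fixed width per character, the bit string can also be turned back into text, which a cipher program would presumably need at some point. A minimal round-trip sketch, assuming ASCII-only input and space-separated 8-bit groups:

```python
text = "Hello t"

# Encode: one 8-bit group per byte, separated by spaces.
bits = " ".join(f"{code:08b}" for code in text.encode("ascii"))
print(bits)

# Decode: parse each group as a base-2 integer and rebuild the bytes.
decoded = bytes(int(group, 2) for group in bits.split()).decode("ascii")
print(decoded)  # Hello t
```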

Saving byte sequence when translating from string to bytearray

I get a string that is formed from messages of different types. I'm interested in one message which was a bytearray before it was put into the string, but now arrives as a str. For example, I need to convert 001bc5045000043a, obtained in str format, into bytearray format, preserving the byte values from this sequence in the bytearray.
To convert a string of hex digits to a bytearray, assuming two hex digits per byte, use bytearray.fromhex:
>>> h = '001bc5045000043a'
>>> ba = bytearray.fromhex(h)
>>> ba
bytearray(b'\x00\x1b\xc5\x04P\x00\x04:')
Python will represent bytes as the equivalent ASCII character if the byte is in the ASCII range (0-127 / 0 - 0x7f) and the character is printable, hence 0x3a is displayed as ':'.
>>> chr(int('3a', 16))
':'
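To confirm that nothing was lost despite the mixed display, it can help to look at the stored integers or convert back to hex. A short sketch using the same input:

```python
h = '001bc5045000043a'
ba = bytearray.fromhex(h)

# The repr mixes escapes and ASCII, but the stored values are plain integers.
print(list(ba))        # [0, 27, 197, 4, 80, 0, 4, 58]
print(ba.hex())        # 001bc5045000043a
print(ba.hex() == h)   # True
```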

Python store non numeric string as number

I am currently trying to find a way to convert any sort of text to a number, so that it can later be converted back to text.
So something like this:
text = "some string"
number = somefunction(text)
text = someotherfunction(number)
print(text) #output "some string"
If you're using Python 3, it's pretty easy. First, convert the str to bytes in a chosen encoding (utf-8 is usually appropriate), then use int.from_bytes to convert to an int:
number = int.from_bytes(mystring.encode('utf-8'), 'little')
Converting back is slightly trickier (and will lose trailing NUL bytes unless you've stored how long the resulting string should be somewhere else; if you switch to 'big' endianness, you lose leading NUL bytes instead of trailing):
recoveredstring = number.to_bytes((number.bit_length() + 7) // 8, 'little').decode('utf-8')
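Wrapped into a pair of functions, the two lines above might look like this. The names text_to_int and int_to_text are placeholders for the somefunction and someotherfunction in the question, and the sketch keeps the trailing-NUL limitation just described.

```python
def text_to_int(text: str) -> int:
    """Encode text as UTF-8 and read the bytes as a little-endian integer."""
    return int.from_bytes(text.encode('utf-8'), 'little')

def int_to_text(number: int) -> str:
    """Inverse of text_to_int, for text that does not end in NUL characters."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the number
    return number.to_bytes(length, 'little').decode('utf-8')

number = text_to_int("some string")
print(int_to_text(number))  # some string
```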
You can do something similar in Python 2, but it's less efficient/direct:
import binascii
number = int(binascii.hexlify(mystring.encode('utf-8')), 16)
hx = '%x' % number
hx = hx.zfill(len(hx) + (len(hx) & 1)) # Make even length hex nibbles
recoveredstring = binascii.unhexlify(hx).decode('utf-8')
That's equivalent to the 'big' endian approach in Python 3; reversing the intermediate bytes as you go in each direction would get the 'little' effect.
You can use the ASCII values to do this:
ASCII to int:
ord('a') # = 97
Back to a string:
chr(97) # = 'a' (on Python 2, use str(unichr(97)))
From there you could iterate over the string one character at a time and store these in another string. Assuming you are using standard ASCII characters, you would need to zero pad the numbers (because some are two digits and some three) like so:
s = 'My string'
number_string = ''
for c in s:
    number_string += str(ord(c)).zfill(3)
To decode this, you will read the new string three characters at a time and decode them into a new string.
This assumes a few things:
all characters can be represented by ASCII (you could use Unicode code points if not)
you are storing the numeric value as a string, not as an actual int type (not a big deal in Python—saves you from having to deal with maximum values for int on different systems)
you absolutely must have a numeric value, i.e. alternatives such as a hexadecimal representation (which could be converted into an int) or cryptographic algorithms won't work for you
we're not talking about GB+ of text that needs to be converted in this manner
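The three-digit scheme, including the decoding step described above, could be sketched like this in Python 3. The function names encode_digits and decode_digits are made up for illustration, and the sketch assumes every character code is below 1000.

```python
def encode_digits(text: str) -> str:
    """Write each character's code as exactly three decimal digits."""
    return ''.join(str(ord(c)).zfill(3) for c in text)

def decode_digits(number_string: str) -> str:
    """Read the digits back three at a time and rebuild the text."""
    chunks = (number_string[i:i + 3] for i in range(0, len(number_string), 3))
    return ''.join(chr(int(chunk)) for chunk in chunks)

encoded = encode_digits('My string')
print(encoded)                 # 077121032115116114105110103
print(decode_digits(encoded))  # My string
```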

Byte conversion fail

I have a problem with Python's & bitwise operation:
>>> x = 0xc1
>>> y = 0x7f
>>> x & y
65
>>> bytes([65])
b'A'
The problem is the conversion from decimal to hex. 65 is 0x41, however Python says that it is 'A'. Why?
The value that you already have is exactly the value you want. From a comment:
I was using bytes function because I want to concat the result of base64.b64decode(coded_string) with one more byte at the end.
bytes([65]) creates a bytes object with a single byte with the numeric value 65. What that number means depends on how you interpret the bytes.
The fact that its repr happens to be b'A' isn't relevant. What the value actually is, is the one byte you want. But the repr of a bytes object, as the docs explain, uses the bytes literal format for convenience. Any byte that matches a printable ASCII character gets represented as that character, a few common values get represented with backslash escapes like \n, and anything else as a hex escape, all within b'…'.
So, repr(bytes([65])) is b'A', because byte 65 is the printable ASCII character A.
If you want to get a string with the hexadecimal representation of the number 65, you can use the hex function—or, if you want more control over the formatting, the format function:
>>> hex(65)
'0x41'
>>> format(65, '02x')
'41'
But that's not what you want here. You want the value b'A', and you already have that.
65 is not A in hex, it's A in ASCII code; print(bytes([65])) and print(chr(65)) output b'A' and A, respectively (ASCII representations). Hexadecimal is merely a numeral system with 16 as its base. 0x41 is therefore 4 * 16^1 + 1 * 16^0 = 65.
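For the use case mentioned in the comment (appending one byte to the result of base64.b64decode), a sketch might look like the following. The value of coded_string is a made-up example, not the asker's data.

```python
import base64

coded_string = 'SGVsbG8='      # hypothetical input; decodes to b'Hello'
extra = bytes([0xc1 & 0x7f])   # one byte with the numeric value 65

payload = base64.b64decode(coded_string) + extra
print(payload)        # b'HelloA'
print(payload[-1])    # 65
print(payload.hex())  # 48656c6c6f41

# The same single byte, written three ways.
print(bytes([65]) == b'\x41' == b'A')  # True
```

The last byte is stored as 65 regardless of whether it is displayed as A, \x41 or 41.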
