Why is a whitespace character only represented by 6 bits in ASCII? - python

I have written some Python code to represent strings using their ASCII counterparts. I noticed that every character is replaced by 7 bits (as I expected). The problem is that whenever I include a space in the string I am converting, it is represented by only 6 bits instead of 7. This is a problem for a Vernam cipher program I am writing, where my ASCII code is always a few bits shorter than my key because of spaces. Here is the code and output:
string = 'Hello t'
ASCII = ""
for c in string:
    ASCII += bin(ord(c))
ASCII = ASCII.replace('0b', ' ')
print(ASCII)
Output: 1001000 1100101 1101100 1101100 1101111 100000 1110100
As can be seen in the output the 6th sequence of bits which represents the space character has only 6 bits and not 7 like the rest of the characters.

Instead of bin(ord(c)), which omits leading zero bits, use string formatting to ensure a minimum width:
f'{ord(c):07b}'

The problem lies within your "conversion": the value for the space character happens to need only 6 bits, and the bin built-in simply does not left-pad with zeros. The other characters in your string need 7 bits, which is why you get 7 bits for them. It would be more comfortable to use 8 bits for everything.
One way to go is, instead of using the bin call, use string formatting operators: these can, besides the base conversion, pad the missing bits with 0s:
string = 'Hello t'
# aggregating values in a list so that you can easily separate the binary strings with a " "
# by using ".join"
bin_strings = []
for code in string.encode("ASCII"):  # you really should do this from bytes,
    # which are encoded text. Moreover, iterating a bytes
    # object yields 0-255 ints, no need to call "ord"
    bin_strings.append(f"{code:08b}")  # format the number in `code` in base 2 (b), with 8 digits, padding with 0s
ASCII = ' '.join(bin_strings)
Or, as a one-liner:
ASCII = " ".join(f"{code:08b}" for code in "Hello World".encode("ASCII"))
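For a Vernam cipher, the fixed width matters because the bit string has to be split back into characters later. A minimal sketch of that round trip follows; the helper names to_bits and from_bits are hypothetical, chosen only for illustration, and the input is assumed to be pure ASCII.

```python
def to_bits(text, width=8):
    """Encode ASCII text as a string of fixed-width binary groups."""
    return "".join(f"{code:0{width}b}" for code in text.encode("ascii"))


def from_bits(bits, width=8):
    """Decode a bit string made of fixed-width groups back to text."""
    codes = [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]
    return bytes(codes).decode("ascii")


bits = to_bits("Hello t")
print(len(bits))        # 7 characters * 8 bits = 56
print(from_bits(bits))  # Hello t
```

Because every character occupies the same number of bits, the key length can be computed up front as len(text) * width.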

Related

hexdump function in python

With below python code
zero = '00000000000000000000000000000000'
print(bytes.fromhex(zero))
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
three = '33333333333333333333333333333333'
print(bytes.fromhex(three))
b'3333333333333333'
When printing bytes.fromhex(zero), all 32 hexadecimal digits are shown.
But when printing bytes.fromhex(three), only 16 characters are shown. The zero and three strings both have the same length: 32.
The method fromhex(string) returns a bytes object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored.
This means the method reads the two-digit hex string "33" as a single byte with the value 0x33. That value is the ASCII code of the character 3, so the byte is displayed as 3.
Try "34343434" or "32323232". You will see 4444 or 2222.
Have a look at https://docs.python.org/3/library/stdtypes.html
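A short check, using only the values from the question, suggests that both objects hold 16 bytes and that only their printed form differs:

```python
zero = bytes.fromhex("00" * 16)
three = bytes.fromhex("33" * 16)

print(len(zero), len(three))  # 16 16
print(three == b"3" * 16)     # True: byte 0x33 is the character '3'
print(three.hex())            # 33333333333333333333333333333333
```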

Python incorrectly converts between bytes and hex for me

I have an info_address that I want to convert to delimited hex
info_address_original = b'002dd748'
What I want is
info_address_coded = b'\x00\x2d\xd7\x48'
I tried this solution
info_address_original = b'002dd748'
info_address_intermediary = info_address_original.decode("utf-8") # '002dd748'
info_address_coded = bytes.fromhex( info_address_intermediary ) # b'\x00-\xd7H'
and I get
info_address_coded = b'\x00-\xd7H'
How would one go about correctly turning a bytes string like that into delimited hex? It worked implicitly in Python 2, but it doesn't work the way I want in Python 3.
This is only a representation of the bytes. '-' is the same as '\x2d'.
>>> b'\x00\x2d\xd7\x48' == b'\x00-\xd7H'
True
The default representation of a byte string displays the character itself for every printable ASCII byte and the escaped \xhh form for every other byte, where hh is the hexadecimal value of the byte.
That means that b'\x00\x2d\xd7\x48' and b'\x00-\xd7H' are the exact same string containing 4 bytes.
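If the fully escaped form is needed for display, it can be built by hand. The following is a sketch only; the helper name escaped is hypothetical and not part of the standard library.

```python
def escaped(data):
    """Render every byte as a backslash-x escape, including printable ones."""
    return "".join(f"\\x{byte:02x}" for byte in data)


coded = bytes.fromhex("002dd748")
print(coded)           # b'\x00-\xd7H'
print(escaped(coded))  # \x00\x2d\xd7\x48
print(list(coded))     # [0, 45, 215, 72]
```

The result of escaped is a str meant for reading; the bytes object itself is already correct and needs no conversion.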

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all characters correctly (some of them are incorrectly displayed even here, on Stack Overflow) so I have to deal with them with a byte string.
So the above string is in bytes this: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I tried binascii with no success, I've tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to an x and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).
Any ideas?
If you're running Python 3.5+, the bytes type has a new bytes.hex() method that returns the hexadecimal string representation.
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise you can use binascii.hexlify() to do the same thing
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
"But sometimes the byte string has a 0 next to a x and it gets removed from the string resulting in a incorrect hexadecimal string. (the string above is missing the 0 at the beginning)."
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the string starts with the byte \x06, which represents the number 6, hex(6) returns only 0x6. You get a single digit there, which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution does not only work with a bytes string but with really anything that can be formatted as a hexadecimal string (e.g. an integer list):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
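To make the difference concrete, here is a small comparison on the first six bytes of the value from the question. It is a sketch for illustration; the variable names naive and padded are arbitrary.

```python
data = b'\x06\xcf\x96\xf3\nr'

# hex() drops leading zeros, so small bytes shrink to one digit
naive = "".join(hex(b) for b in data).replace("0x", "")
# the 02x format spec always yields two digits per byte
padded = "".join(f"{b:02x}" for b in data)

print(naive)   # 6cf96f3a72
print(padded)  # 06cf96f30a72
print(bytes.fromhex(padded) == data)  # True
```

Only the padded string can be converted back with bytes.fromhex, since that method expects exactly two hex digits per byte.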

Python store non numeric string as number

I am currently trying to find a way to convert any sort of text to a number, so that it can later be converted back to text.
So something like this:
text = "some string"
number = somefunction(text)
text = someotherfunction(number)
print(text) #output "some string"
If you're using Python 3, it's pretty easy. First, convert the str to bytes in a chosen encoding (utf-8 is usually appropriate), then use int.from_bytes to convert to an int:
number = int.from_bytes(mystring.encode('utf-8'), 'little')
Converting back is slightly trickier (and will lose trailing NUL bytes unless you've stored how long the resulting string should be somewhere else; if you switch to 'big' endianness, you lose leading NUL bytes instead of trailing):
recoveredstring = number.to_bytes((number.bit_length() + 7) // 8, 'little').decode('utf-8')
You can do something similar in Python 2, but it's less efficient/direct:
import binascii
number = int(binascii.hexlify(mystring.encode('utf-8')), 16)
hx = '%x' % number
hx = hx.zfill(len(hx) + (len(hx) & 1)) # Make even length hex nibbles
recoveredstring = binascii.unhexlify(hx).decode('utf-8')
That's equivalent to the 'big' endian approach in Python 3; reversing the intermediate bytes as you go in each direction would get the 'little' effect.
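One way to avoid losing NUL bytes is to store the byte length next to the number. The sketch below assumes Python 3 and 'big' endianness; text_to_int and int_to_text are hypothetical names standing in for somefunction and someotherfunction from the question.

```python
def text_to_int(text):
    """Return the text as an int plus the byte length needed to restore it."""
    data = text.encode("utf-8")
    return int.from_bytes(data, "big"), len(data)


def int_to_text(number, length):
    """Rebuild the text from the int and the stored byte length."""
    return number.to_bytes(length, "big").decode("utf-8")


number, length = text_to_int("some string")
print(int_to_text(number, length))  # some string
```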
You can use the ASCII values to do this:
ASCII to int:
ord('a') # = 97
Back to a string:
chr(97) # = 'a' (in Python 2: str(unichr(97)))
From there you could iterate over the string one character at a time and store these in another string. Assuming you are using standard ASCII characters, you would need to zero pad the numbers (because some are two digits and some three) like so:
s = 'My string'
number_string = ''
for c in s:
    number_string += str(ord(c)).zfill(3)
To decode this, you will read the new string three characters at a time and decode them into a new string.
This assumes a few things:
all characters can be represented by ASCII (you could use Unicode code points if not)
you are storing the numeric value as a string, not as an actual int type (not a big deal in Python—saves you from having to deal with maximum values for int on different systems)
you absolutely must have a numeric value, i.e. a hexadecimal representation (which could be converted into an int) or a cryptographic algorithm won't work for your purpose
we're not talking about GB+ of text that needs to be converted in this manner
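The decoding step described above could look like the following sketch. It assumes Python 3 and ASCII-only input, so that every code fits in three decimal digits; the function names are hypothetical.

```python
def encode_digits(text):
    """Turn each character into its code, zero-padded to three digits."""
    return "".join(str(ord(c)).zfill(3) for c in text)


def decode_digits(number_string):
    """Read the digit string three characters at a time and rebuild the text."""
    chunks = (number_string[i:i + 3] for i in range(0, len(number_string), 3))
    return "".join(chr(int(chunk)) for chunk in chunks)


encoded = encode_digits("My string")
print(encoded)                 # 077121032115116114105110103
print(decode_digits(encoded))  # My string
```

Note that the encoded value starts with a 0 here, which is one reason to keep it as a string: converting it to an int would drop that leading zero.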

How to XOR literal with a string

I'm trying to implement the Blowfish algorithm in Python. The way I understand it, I have to use a key like "abcd" and then XOR it with a hexadecimal array (cycling the key if necessary)
P = (
    0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344, 0xa4093822, 0x299f31d0,
    0x082efa98, 0xec4e6c89, 0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
    0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917, 0x9216d5d9, 0x8979fb1b,
)
The data types here have me very confused. I saw somewhere that 'abcd' = 0x61626364. In that case, XORing the first element of P would simply be 0x61626364 ^ 0x243f6a88.
So, how do I convert a string like 'abcd' to the format 0x?????, or is there perhaps a better way? Any light on this would be much appreciated!
To convert a string to an array of bytes:
b = bytes('abcd', 'ascii')
To convert array of bytes to int:
i = int.from_bytes(b, byteorder='big', signed=False)
Two hexadecimal digits can encode exactly one byte. This makes sense, because each hexadecimal digit can be in 16 different states, so two hexadecimal digits can be in 16 * 16 = 256 different states, which is exactly the same as the number of states representable in a single byte.
Because ASCII characters can also be encoded in a single byte, any ASCII character can be encoded as two hexadecimal digits.
For example, the letter a has character code 97 in ASCII. Converting the decimal number 97 to base 16 (hexadecimal) gives you 0x61.
You can therefore take any string and convert it into a hexadecimal number by taking every character and representing it as two hex digits in your number. Looking at your example above, a = 0x61, b = 0x62, c = 0x63, and d = 0x64. Putting these all together gives you the representation abcd = 0x61626364.
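Putting the pieces together, the XOR step from the question might be sketched as follows. This covers only the key XOR with cycling, not the rest of Blowfish; key_words is a hypothetical helper, and only the first three P entries from the question are used to keep the example short.

```python
from itertools import cycle, islice

# First three entries of the P array from the question
P = (0x243f6a88, 0x85a308d3, 0x13198a2e)


def key_words(key, count):
    """Cycle through the key bytes and pack them into 32-bit big-endian words."""
    stream = cycle(key)
    return [int.from_bytes(bytes(islice(stream, 4)), "big") for _ in range(count)]


words = key_words(b"abcd", len(P))
mixed = [p ^ w for p, w in zip(P, words)]

print(hex(words[0]))  # 0x61626364
print(hex(mixed[0]))  # 0x455d09ec
```

Working with ints keeps the ^ operator available directly; the hex notation is only a way of writing and printing those ints.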
