How to convert binary string to ascii string in python? [duplicate] - python

I've written a small Python program that reads binary data from a file and stores it in a text file, and that should also read the text file back and store the binary data. However, I can't get the binary part to work.
It reads the file like this:
f_bin = open(bin_file,"rb")
to_bin_data = f_bin.read()
bin_data = bin(reduce(lambda x, y: 256*x+y, (ord(c) for c in to_bin_data), 0))
f_bin.close()
This one doesn't work for me: Convert binary to ASCII and vice versa
I want something like this webpage: http://www.roubaixinteractive.com/PlayGround/Binary_Conversion/Binary_To_Text.asp
Edit: I've now written a long if/else script for it, but thanks for the answers.

Let's take the word 'hello', which is 0110100001100101011011000110110001101111 in binary.
To translate that back to characters, we can use chr and int (with a base of 2) and some string slicing. With the bit string stored in bin_text (xrange is Python 2; use range in Python 3):
''.join(chr(int(bin_text[i:i+8], 2)) for i in xrange(0, len(bin_text), 8))
If we wanted to take 'hello' and convert it to binary, we can use ord and string formatting:
''.join('{:08b}'.format(ord(c)) for c in 'hello')
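For Python 3, where xrange no longer exists, the two one-liners above can be sketched as a pair of helpers. The function names text_to_bits and bits_to_text are made up for this example, and the sketch assumes plain ASCII input with 8 bits per character.

```python
def text_to_bits(text):
    """Return the 8-bit binary representation of an ASCII string."""
    # Iterating over bytes yields integers in Python 3, so ord() is not needed.
    return ''.join('{:08b}'.format(byte) for byte in text.encode('ascii'))


def bits_to_text(bits):
    """Turn a string of 0s and 1s back into text, 8 bits per character."""
    if len(bits) % 8 != 0:
        raise ValueError('bit string length must be a multiple of 8')
    return ''.join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


bin_text = text_to_bits('hello')
print(bin_text)
print(bits_to_text(bin_text))
```

The length check is there because a bit string that is not a multiple of 8 would otherwise decode silently into a wrong final character.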

Maybe you can use built-in functions:
>>> myString = "hello"
>>> ba = bytearray(myString)
>>> ba[0]
104
>>> bin(ba[0])
'0b1101000'
Strip the 0b prefix:
>>> bin(ba[0]).split('b')[1]
'1101000'
or
>>> bin(ba[0])[2:]
'1101000'
I hope you can solve your problem with these snippets! :)
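One thing to watch for: bin() drops leading zeros, so 'h' comes out as 7 digits rather than 8. A minimal sketch that pads each byte to 8 bits with format(), assuming Python 3 and ASCII text (the variable names are only illustrative):

```python
my_string = "hello"
ba = bytearray(my_string, "ascii")  # Python 3 needs an explicit encoding

# '08b' means: binary, zero-padded to a width of 8 digits.
padded = [format(byte, "08b") for byte in ba]
print(padded[0])
print("".join(padded))
```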

I use the struct module:
import struct
buf = struct.unpack('c', to_bin_data)  # for a single byte
buf = struct.unpack('%ds' % len(to_bin_data), to_bin_data)  # for a byte string of any length
Edit: Sorry, I misunderstood the question. This works for binary data, not for strings containing the binary representation of characters.

Related

Importing unicode characters from YAML to Python [duplicate]

I'm trying to write Chinese, Russian, or other non-English character sets to a flat file for testing purposes. I'm stuck on how to output a Unicode hexadecimal or decimal value as its corresponding character.
For example, in Python, if you had a hard-coded set of characters like абвгдежзийкл, you would assign value = u"абвгдежзийкл" with no problem.
If, however, you had a single decimal or hexadecimal value like 1081 / 0439 stored in a variable and you wanted to print its corresponding character (and not just output 0x439), how would this be done? The Unicode decimal/hex value above refers to й.
Python 2: Use unichr():
>>> print(unichr(1081))
й
Python 3: Use chr():
>>> print(chr(1081))
й
So the answer to the question is:
convert the hexadecimal value to an integer with int(hex_value, 16),
then get the corresponding string with chr().
To sum up:
>>> print(chr(int('0x897F', 16)))
西
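The two steps can be wrapped in a small helper. This is only a sketch for Python 3; the name char_from_code is hypothetical, and it assumes the value arrives either as an integer or as a hexadecimal string with an optional 0x prefix.

```python
def char_from_code(value):
    """Return the character for a code point given as int or hex string."""
    if isinstance(value, str):
        # int(..., 16) accepts an optional '0x' prefix.
        value = int(value.strip(), 16)
    return chr(value)  # raises ValueError outside range(0x110000)


print(char_from_code(1081))
print(char_from_code('0439'))
print(char_from_code('0x897F'))
```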
While working on a project that included parsing some JSON, I encountered a similar problem. I had a lot of strings in which all non-ASCII characters were escaped like this:
>>> print(content)
\u0412\u044B j\u0435\u0441\u0442\u0435 \u0438\u0437 \u0420\u043E\u0441\u0441\u0438\u0438?
...
>>> print(content)
\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?
Converting such mixes symbol-by-symbol with unichr() would be tedious. The solution I eventually decided on:
content.encode("utf8").decode("unicode-escape")
The first operation (encoding) produces bytestrings like this:
b'\\u0412\\u044B j\\u0435\\u0441\\u0442\\u0435 \\u0438\\u0437 \\u0420\\u043E\\u0441\\u0441\\u0438\\u0438?'
b'\\u010Cemu jesi na\\u010Dinal izu\\u010Dati med\\u017Euslovjansky jezyk?'
and the second operation (decoding) transforms the byte string into a Unicode string with \\ replaced by \, which "unpacks" the characters and gives results like this:
Вы jесте из России?
Čemu jesi načinal izučati medžuslovjansky jezyk?
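One caveat with this approach: unicode-escape decodes the bytes as Latin-1, so any non-ASCII character that is already present in the string unescaped gets mangled. If that matters, a regular-expression substitution is one alternative. The sketch below (Python 3, with a hypothetical helper name unescape_u) only handles four-digit \uXXXX escapes and does not combine surrogate pairs.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_u(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    return _U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


content = r'\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?'
print(unescape_u(content))

# Characters that are already unescaped pass through untouched.
print(unescape_u('й and \\u0439'))
```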
If you run into the error:
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
This happens when converting a code point above 0xFFFF with unichr() on a narrow Python 2 build. You can work around it like this:
>>> n = int('0001f600', 16)
>>> s = '\\U{:0>8X}'.format(n)
>>> s
'\\U0001F600'
>>> binary = s.decode('unicode-escape')
>>> print(binary)
😀
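For background, a narrow build stores strings as 16-bit code units, so a code point above 0xFFFF has to be represented as a surrogate pair, which is why unichr() refuses it. In Python 3 (3.3 and later) that distinction is gone and chr() accepts the full range. A small sketch showing both the direct conversion and the surrogate pair that UTF-16 uses for the same character:

```python
n = int('0001f600', 16)

# Python 3: chr() covers every code point up to 0x10FFFF.
face = chr(n)
print(face)

# The same character in UTF-16 takes two 16-bit units (a surrogate pair).
utf16 = face.encode('utf-16-be')
high = int.from_bytes(utf16[:2], 'big')
low = int.from_bytes(utf16[2:], 'big')
print(hex(high), hex(low))
```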

Create raw unicode character from hex string representation/enter single backslash [duplicate]

I want to create a raw unicode character from a string hex representation. That is, I have a string s = '\u0222' which will be the 'Ȣ' character.
Now, this works if I do
>>> s = '\u0222'
>>> print(s)
Ȣ
but if I try to build it by concatenation, it comes out as
>>> h = '0222'
>>> s = r'\u' + h
>>> print(s)
\u0222
>>> s
'\\u0222'
because, as can be seen, what's actually in the string is '\\u', not '\u'. How can I create the Unicode character from a hex string, or how can I enter a true single backslash?
This was a lot harder to solve than I initially expected:
code = '0222'
uni_code = r'\u' + code
s = uni_code.encode().decode('unicode_escape')
print(s)
Or
code = b'0222'
uni_code = b'\\u' + code  # escape the backslash; b'\u' is an invalid escape sequence in a bytes literal
s = uni_code.decode('unicode_escape')
print(s)
The \u0222 syntax is only for string literals; the Python interpreter generates a single Unicode code point from it. It's not meant to be constructed manually. Use the chr() function to turn a code point into a character. The following works for strings or integers:
>>> chr(int('0222',16)) # convert string to int base 16
'Ȣ'
>>> chr(0x222) # or just pass an integer.
'Ȣ'
And FYI ord() is the complementary function:
>>> hex(ord('Ȣ'))
'0x222'
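Putting chr() and ord() together gives a round trip between a character and the conventional U+XXXX notation. A minimal sketch for Python 3, with hypothetical helper names:

```python
def to_u_notation(char):
    """Format a single character as 'U+XXXX' (at least four hex digits)."""
    return 'U+{:04X}'.format(ord(char))


def from_u_notation(text):
    """Parse 'U+XXXX' back into the character it names."""
    if not text.upper().startswith('U+'):
        raise ValueError("expected a string like 'U+0222'")
    return chr(int(text[2:], 16))


print(to_u_notation('Ȣ'))
print(from_u_notation('U+0222'))
```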

Python 3: Get Bytes from File

I'm trying to get the bytes from a PNG file in Python 3 and print a string showing those bytes. However, it gives me this output:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00(\x00\x00\x00(\x08\x02\x00\x00\x00\x03\x9c/:\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\x04gAMA\x00\x00\xb1\x8f\x0b\xfca\x05\x00\x00\x00\tpHYs\x00\x00\x0e\xc3\x00\x00\x0e\xc3\x01\xc7o\xa8d\x00\x00\x01XIDATXG\xe5\xcd\xb1m\x031\x14\x04\xd1\xebF\xad\xb8+\xd5\xe0\x8a\xe5`f\x19|,.\xa0\x0fL\xf4\xc0h\x08.\xafo\xf5>\xc8/a;\xc2/a;\xc2/a;\xc2/a;\xc2/a\x0b\xebC\x1c\r+la}\x88\xa3a\x85-\x88\xbf?\xff=p4\xac\xb0\x05q\xacl\x1c8\x1aV\xd8\x828V6\x0e\x1c\r+lA\x1c+\x1b\x07\x8e\x86\x15\xb6 \x8e\x95\x8d\x03G\xc3\n[\x10\xeb\xca\xbd\xfa\xc4\xd1\xb0\xc2\x16\xc4\xbar\xaf>q4\xac\xb0\x05\xb1\xae|\xde\xafz\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xf7\xea\x13G\xc3\n[\x10\xeb\xca\xbd\xfa\xc4\xd1\xb0\xc2\x16\xc4\xb1\xb2q\xe0hXa\x0b\xe2X\xd98p4\xac\xb0\x05q\xacl\x1c8\x1aV\xd8\x828V6\x0e\x1c\r+lA\x1c+\x1b\x07\x8e\x86\x15\xb6\xb0>\xc4\xd1\xb0\xc2\x16\xd6\x878\x1aV\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0\xcb/s]\x7f\xf8o$|7\xc4\xdf\xeb\x00\x00\x00\x00IEND\xaeB`\x82'
instead of normal bytes (here are the bytes it should show): 89504E470D0A1A0A0000000D4948445200000028000000280802000000039C2F3A000000017352474200AECE1CE90000000467414D410000B18F0BFC6105000000097048597300000EC300000EC301C76FA86400000158494441545847E5CDB16D03311404D1EB46ADB82BD5E08AE56066197C2C2EA00F4CF4C068082EAF6FF53EC82F613BC22F613BC22F613BC22F613BC22F610BEB431C0D2B6C617D88A361852D88BF3FFF3D7034ACB00571AC6C1C381A56D8823856360E1C0D2B6C411C2B1B078E8615B6208E958D0347C30A5B10EBCABDFAC4D1B0C216C4BA72AF3E7134ACB005B1AE7CDEAF7AB8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BE3BF75B8AD4F1C0D2B6C41AC2BF7EA1347C30A5B10EBCABDFAC4D1B0C216C4B1B271E06858610BE258D9387034ACB00571AC6C1C381A56D8823856360E1C0D2B6C411C2B1B078E8615B6B03EC4D1B0C216D687381A56D88EF04BD88EF04BD88EF04BD88EF04BD88EF0CB2F735D7FF86F247C37C4DFEB0000000049454E44AE426082
Here is the code that I wrote to do this:
fileread = input("Input File: ")
with open(fileread, 'rb') as readfile:
    string = str(readfile.read())
    readfile.close()
print("String: "+string)
newstr = str(bytes(string, 'utf-8').decode('utf-8'))
Can anyone help me?
You've got it right. It's just showing the ASCII representation of the data, as that's usually the more useful form:
>>> s = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00(\x00\x00\x00(\x08\x02\x00\x00\x00\x03\x9c/:\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\x04gAMA\x00\x00\xb1\x8f\x0b\xfca\x05\x00\x00\x00\tpHYs\x00\x00\x0e\xc3\x00\x00\x0e\xc3\x01\xc7o\xa8d\x00\x00\x01XIDATXG\xe5\xcd\xb1m\x031\x14\x04\xd1\xebF\xad\xb8+\xd5\xe0\x8a\xe5`f\x19|,.\xa0\x0fL\xf4\xc0h\x08.\xafo\xf5>\xc8/a;\xc2/a;\xc2/a;\xc2/a;\xc2/a\x0b\xebC\x1c\r+la}\x88\xa3a\x85-\x88\xbf?\xff=p4\xac\xb0\x05q\xacl\x1c8\x1aV\xd8\x828V6\x0e\x1c\r+lA\x1c+\x1b\x07\x8e\x86\x15\xb6 \x8e\x95\x8d\x03G\xc3\n[\x10\xeb\xca\xbd\xfa\xc4\xd1\xb0\xc2\x16\xc4\xbar\xaf>q4\xac\xb0\x05\xb1\xae|\xde\xafz\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xe3\xbfu\xb8\xadO\x1c\r+lA\xac+\xf7\xea\x13G\xc3\n[\x10\xeb\xca\xbd\xfa\xc4\xd1\xb0\xc2\x16\xc4\xb1\xb2q\xe0hXa\x0b\xe2X\xd98p4\xac\xb0\x05q\xacl\x1c8\x1aV\xd8\x828V6\x0e\x1c\r+lA\x1c+\x1b\x07\x8e\x86\x15\xb6\xb0>\xc4\xd1\xb0\xc2\x16\xd6\x878\x1aV\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0K\xd8\x8e\xf0\xcb/s]\x7f\xf8o$|7\xc4\xdf\xeb\x00\x00\x00\x00IEND\xaeB`\x82'
>>> s[0]
137
>>> s[1]
80
>>> s[2]
78
>>> hex(s[0])
'0x89'
>>> hex(s[1])
'0x50'
>>> hex(s[2])
'0x4e'
>>>
I don't think you need the UTF-8 decode step, as this is just binary data, right?
If you actually want an ASCII representation of the data in hex form to match what you have in the question, you could use:
>>> ''.join('%02x' % c for c in s)
'89504e470d0a1a0a0000000d4948445200000028000000280802000000039c2f3a000000017352474200aece1ce90000000467414d410000b18f0bfc6105000000097048597300000ec300000ec301c76fa86400000158494441545847e5cdb16d03311404d1eb46adb82bd5e08ae56066197c2c2ea00f4cf4c068082eaf6ff53ec82f613bc22f613bc22f613bc22f613bc22f610beb431c0d2b6c617d88a361852d88bf3fff3d7034acb00571ac6c1c381a56d8823856360e1c0d2b6c411c2b1b078e8615b6208e958d0347c30a5b10ebcabdfac4d1b0c216c4ba72af3e7134acb005b1ae7cdeaf7ab8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2be3bf75b8ad4f1c0d2b6c41ac2bf7ea1347c30a5b10ebcabdfac4d1b0c216c4b1b271e06858610be258d9387034acb00571ac6c1c381a56d8823856360e1c0d2b6c411c2b1b078e8615b6b03ec4d1b0c216d687381a56d88ef04bd88ef04bd88ef04bd88ef04bd88ef0cb2f735d7ff86f247c37c4dfeb0000000049454e44ae426082'
You're getting the bytes fine; you just want to print them differently from the default Python method (which uses characters for printable ASCII codes so you can read them more easily). Just iterate over the bytes and format them however you like:
for byte in string:
    print(("%02x" % byte).upper(), end="")
If the file isn't too large, you could also do it with one print() call by doing the formatting all at once and printing that:
print("".join(("%02x" % byte).upper() for byte in string))
This will build a string using approximately 6 times the amount of memory as your file before printing it. Use the first method if this could be a problem.
Actually, I just remembered... Python has a module for this!
from binascii import hexlify
print(hexlify(string).upper().decode("ascii"))  # hexlify returns bytes in Python 3
This will actually use even more memory, since it converts the letters in the hex string to uppercase after building it, but if you're OK with lowercase letters in your hex, this is probably the best solution.
BTW, it's advisable not to call what you read from your file string; it's binary data, not text.
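If memory is a concern for large files, one option is to read and convert the file in fixed-size chunks. This is a sketch, not part of the answers above: iter_hex and the chunk size are assumptions, and it relies on bytes.hex(), which needs Python 3.5 or later. The demo uses an in-memory stream in place of a real file.

```python
import io


def iter_hex(stream, chunk_size=4096):
    """Yield uppercase hex text for a binary stream, one chunk at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk.hex().upper()


# Stand-in for open(path, 'rb'); these are the first 8 bytes of a PNG file.
data = io.BytesIO(b'\x89PNG\r\n\x1a\n')
for piece in iter_hex(data, chunk_size=4):
    print(piece, end='')
print()
```

Only one chunk and its hex text are held in memory at a time, so the peak usage depends on chunk_size rather than on the file size.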

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all characters correctly (some of them are incorrectly displayed even here, on Stack Overflow) so I have to deal with them with a byte string.
So the above string, as bytes, is: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I've tried binascii with no success. I've also tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to an x and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).
Any ideas?
If you're running Python 3.5+, the bytes type has a bytes.hex() method that returns the hexadecimal string representation.
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise, you can use binascii.hexlify() to do the same thing:
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
"But sometimes the byte string has a 0 next to an x and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning)."
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the byte string starts with \x06, which represents the number 6, hex(6) returns only 0x6. You get a single digit there, which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution works not only with a bytes string but with any iterable of integers that can be formatted as hexadecimal (e.g. a list of integers):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
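As a quick check, the format-string result can be compared against bytes.hex() and turned back into the original bytes with bytes.fromhex(). A sketch assuming Python 3.5 or later:

```python
value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

hex_text = ''.join('{:02x}'.format(x) for x in value)
print(hex_text)

# Both built-in routes give the same text, and the conversion is reversible.
print(hex_text == value.hex())
print(bytes.fromhex(hex_text) == value)

# hex() alone drops the leading zero, which is what broke the original loop.
print(hex(value[0]))
```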

How to convert a binary representation of a string back to the original string in Python?

I haven't been able to find an answer to the following question:
Start with a string, convert it to its binary representation. How do you get back the original string in Python?
Example:
a = 'hi us'
b = ''.join(format(ord(c), '08b') for c in a)
then b = 0110100001101001001000000111010101110011
Now I want to get 'hi us' back in Python 2.x. For example, this website accomplishes the task:
http://string-functions.com/binary-string.aspx
I've seen several answers for Java, but haven't had luck implementing them in Python. I've also tried b.decode(), but don't know which encoding I should use in this case.
Use this code:
import binascii
n = int('0110100001101001001000000111010101110011', 2)
binascii.unhexlify('%x' % n)
>>> print ''.join(chr(int(b[i:i+8], 2)) for i in range(0, len(b), 8))
hi us
Split b in chunks of 8, parse to int using radix 2, convert to char, join the resulting list as a string.
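The integer route can also be written with int.to_bytes in Python 3. Unlike '%x' % n, which loses leading zeros when the first byte is small, passing the byte count explicitly keeps every byte. A sketch under the assumption of 8 bits per character and ASCII text; decode_bits is a hypothetical name:

```python
def decode_bits(bits, encoding='ascii'):
    """Convert a string of 0s and 1s (8 per character) back to text."""
    if len(bits) % 8 != 0:
        raise ValueError('bit string length must be a multiple of 8')
    if not bits:
        return ''
    n = int(bits, 2)
    # Passing the byte count explicitly keeps any leading zero bits.
    return n.to_bytes(len(bits) // 8, 'big').decode(encoding)


a = 'hi us'
b = ''.join(format(ord(c), '08b') for c in a)
print(decode_bits(b))
```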
