How to read a file which contains 16-bit hex values? - python

I have a file like this:
\u9515\u7691\u853c\u788d\u7231
\u9515\u7691\u853c\u788d\u7231
\u9515\u7691\u853c\u788d\u7231
Now I want to read this file and print the string. I do it like this:
with open(fi, "rb") as fi:
    print(fi.readline().strip().decode("utf-8"))
but I find that it still prints
\u9515\u7691\u853c\u788d\u7231
How can I get the real string?
锕皑蔼碍爱

You can decode your string using the unicode-escape codec (Python 2 syntax):
line = "\\u9515\\u7691\\u853c\\u788d\\u7231"
print line.decode("unicode-escape")
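On Python 3 a str has no decode method, so the same idea has to start from bytes. Here is a minimal sketch, assuming the file contains only ASCII backslash escapes like the sample (unicode_escape treats any non-ASCII byte as Latin-1). The helper name decode_escaped_file and the temporary sample file are made up for illustration:

```python
import os
import tempfile


def decode_escaped_file(path):
    """Return each line of `path` with its \\uXXXX escapes decoded (hypothetical helper)."""
    decoded = []
    with open(path, "rb") as f:  # read raw bytes, not text
        for raw in f:
            # unicode_escape turns the six ASCII characters \u9515 into one code point
            decoded.append(raw.strip().decode("unicode_escape"))
    return decoded


# Build a small sample file like the one in the question.
sample_path = os.path.join(tempfile.mkdtemp(), "escaped.txt")
with open(sample_path, "w", encoding="ascii") as f:
    f.write("\\u9515\\u7691\\u853c\\u788d\\u7231\n" * 3)

for text in decode_escaped_file(sample_path):
    print(text)
```

Reading in binary mode matters here: the escapes are plain ASCII characters in the file, and only the unicode_escape codec interprets them.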

Your decode call treats the data as ordinary UTF-8 text, so the backslash escapes are left untouched. Try decoding the bytes with the unicode_escape codec instead:
with open(fi, "rb") as fi:
    data = fi.readline().strip()
    print(data.decode("unicode_escape"))

Alternatively, as this is a Python escaped string, you can use ast.literal_eval:
import ast
line = r"\u9515\u7691\u853c\u788d\u7231"
print(ast.literal_eval('u"' + line + '"'))
which gives, as expected:
锕皑蔼碍爱

Related

How can I decode a string read from file?

I read a file into a string in Python, and it shows up as encoded (I'm not sure of the encoding).
query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query
The lines all print out in English as expected
select * from table
but the query at the end shows up like
ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00
What's going on?
I agree with Carlos: the encoding seems to be UTF-16LE. A BOM seems to be present, so encoding="utf-16" can autodetect whether the data is little- or big-endian.
Idiomatic Python would be:
with open(file_path, encoding="...") as f:
    for line in f:
        # do something with this line
In your case, you append each line to query, so the entire code can be reduced to:
query = open(file_path, encoding="...").read()
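To see the BOM autodetection in action, here is a small Python 3 sketch. The helper read_utf16 and the sample file are invented for the demonstration; the only assumption about the real file is that it starts with a UTF-16 BOM, as the ÿþ prefix in the question suggests:

```python
import os
import tempfile


def read_utf16(path):
    """Read a UTF-16 text file, letting the BOM pick the byte order."""
    with open(path, encoding="utf-16") as f:
        return f.read()


# Create a sample file; the "utf-16" codec writes a BOM first.
sample_path = os.path.join(tempfile.mkdtemp(), "query.sql")
with open(sample_path, "w", encoding="utf-16") as f:
    f.write("drop table")

with open(sample_path, "rb") as f:
    raw = f.read()

print(raw[:2])                  # the BOM bytes
print(read_utf16(sample_path))  # the text, without NUL bytes or BOM
```

If the file had no BOM, you would have to name the byte order yourself, for example encoding="utf-16-le".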
It looks like UTF-16 data.
Can you try decoding it with utf-16?
with open(file_path, "rb") as f:
    query = f.read().decode('utf-16')
print(query)
with open(filePath) as f:
    fileContents = f.read()
if isinstance(fileContents, str):
    fileContents = fileContents.decode('ascii', 'ignore').encode('ascii') #note: this removes the character and encodes back to string.
elif isinstance(fileContents, unicode):
    fileContents = fileContents.encode('ascii', 'ignore')

re.sub replace hexadecimal with string

I have a file that contains hexadecimal numbers that I want to convert to strings:
'\x73\x63\x6f\x72\x65\x73': '\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d',
'Status', ['\x75\x70\x64\x61\x74\x65']
But when using re.sub to replace each occurrence of a hexadecimal escaped number by its ascii representation, it doesn't seem to find the hexadecimal number in the first place.
I've tried using raw strings, but it didn't change anything. I still can't replace them.
import re, binascii
with open('hex.txt', 'r') as f:
    file = f.read()
hexList = re.findall(r"'([\\x\w+]*)'", file)
for item in hexList:
    file = re.sub(r"('{}')".format(item), str(binascii.unhexlify(item.replace('\\x', ''))), file)
    #file = re.sub("('"+item+"')".format(item), str(binascii.unhexlify(item.replace('\\x', ''))), file)
print(file)
You can use the following code fragment:
import re, binascii
with open('hex.txt', 'r') as f:
    file = f.read()
hexList = re.findall(r'((?:\\x[0-9a-f][0-9a-f])+)', file)
for item in hexList:
    # .decode('ascii') instead of str(): on Python 3, str() of bytes yields "b'...'"
    file = re.sub(r"('{}')".format(item.replace('\\', '\\\\')), binascii.unhexlify(item.replace('\\x', '')).decode('ascii'), file)
print(file)
Your regex for finding the hex strings is wrong: the character class [\\x\w+] also matches ordinary word characters, so it finds 'Status' as a hex string too.
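A different way to do the same job is to hand re.sub a function instead of a replacement string, so each run of escapes is converted in a single pass and no pattern has to be rebuilt per match. This is only a sketch: the names HEX_RUN and unescape_hex are invented, it keeps the surrounding quotes, and it assumes each \xNN stands for one Latin-1 character:

```python
import re

# One or more consecutive \xNN escapes (upper- or lowercase hex digits).
HEX_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def unescape_hex(text):
    """Replace every run of literal \\xNN escapes with the characters it encodes."""
    def repl(match):
        hex_digits = match.group(0).replace("\\x", "")
        return bytes.fromhex(hex_digits).decode("latin-1")
    return HEX_RUN.sub(repl, text)


sample = r"'\x73\x63\x6f\x72\x65\x73': '\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d', 'Status', ['\x75\x70\x64\x61\x74\x65']"
print(unescape_hex(sample))
# 'scores': 'Lorem ipsum', 'Status', ['update']
```

Because the replacement comes from a function, its return value is inserted literally and no extra backslash escaping is needed.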

python write umlauts into file

I have the following output, which I want to write into a file:
l = ["Bücher", "Hefte", "Mappen"]
I do it like this:
f = codecs.open("testfile.txt", "a", stdout_encoding)
f.write(l)
f.close()
In my text file I want to see ["Bücher", "Hefte", "Mappen"] instead of B\xc3\xbccher.
Is there any way to do so without looping over the list and decoding each item? For example, by passing some parameter to the write() function?
Many thanks
First, make sure you use unicode strings: add the "u" prefix to strings:
l = [u"Bücher", u"Hefte", u"Mappen"]
Then you can write or append to a file.
I recommend using the io module, which is Python 2/3 compatible.
import io
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    for line in l:
        fd.write(line + "\n")
To read your text file in one piece:
with io.open("testfile.txt", mode="r", encoding="UTF8") as fd:
    content = fd.read()
The resulting content is a Unicode string.
If you encode this string using the UTF8 encoding, you'll get a byte string like this:
b"B\xc3\xbccher"
Edit using writelines.
The method writelines() writes a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value.
# add new lines
lines = [line + "\n" for line in l]
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    fd.writelines(lines)
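If what you want in the file is literally the bracketed list on one line, as the question describes, one option on Python 3 is to serialize the list with json.dumps and ensure_ascii=False. This is a different technique from the loop and writelines() approaches above, and the helper name append_list_as_text is made up for this sketch:

```python
import json
import os
import tempfile


def append_list_as_text(path, items):
    """Append `items` to `path` as one JSON line, keeping umlauts readable."""
    with open(path, "a", encoding="utf-8") as fd:
        # ensure_ascii=False writes ü itself instead of a \u00fc escape
        fd.write(json.dumps(items, ensure_ascii=False) + "\n")


path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
append_list_as_text(path, ["Bücher", "Hefte", "Mappen"])

with open(path, encoding="utf-8") as fd:
    content = fd.read()
print(content)
```

The file then holds the list in JSON notation, which json.loads can read back into a Python list later.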

Python: Decode base64 multiple strings in a file

I'm new to Python and I have a file like this:
cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA==
It's keyboard input, encoded with base64, and now I want to decode it.
I tried the code below, but it stops after the first decoded character.
import base64
file = "my_file.txt"
fin = open(file, "rb")
binary_data = fin.read()
fin.close()
b64_data = base64.b64decode(binary_data)
b64_fname = "original_b64.txt"
fout = open(b64_fname, "w")
fout.write(b64_data)
fout.close()
Any help is welcome. thanks
I assume that you created your test input string yourself.
If I split your test input string into blocks of 4 characters and decode each one separately, I get the following:
>>> import base64
>>> s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='
>>> ''.join(base64.b64decode(s[i:i+4]) for i in range(0, len(s), 4))
'sdadasdasdasdasdtest'
However, the correct base64 encoding of your test string sdadasdasdasdasdtest is:
>>> base64.b64encode('sdadasdasdasdasdtest')
'c2RhZGFzZGFzZGFzZGFzZHRlc3Q='
If you place this string in my_file.txt (and rewrite your code to be a bit more concise), it all works.
import base64
# 'wb' because b64decode returns bytes on Python 3 (also fine on Python 2)
with open("my_file.txt") as f, open("original_b64.txt", 'wb') as g:
    encoded = f.read()
    decoded = base64.b64decode(encoded)
    g.write(decoded)
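If you cannot change how the file was produced, the block-by-block decoding from the interactive session above can be written for Python 3 roughly as follows. It is a sketch that assumes every character was encoded on its own, so each block is exactly 4 base64 characters; the function name decode_concatenated is invented:

```python
import base64


def decode_concatenated(data, block=4):
    """Decode base64 text made of independently padded blocks of `block` characters."""
    return b"".join(
        base64.b64decode(data[i:i + block]) for i in range(0, len(data), block)
    )


s = "cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=="
decoded = decode_concatenated(s)
print(decoded.decode("ascii"))

# Encoding the whole string at once gives the compact form shown above.
print(base64.b64encode(decoded).decode("ascii"))
```

b64decode returns bytes on Python 3, which is why the pieces are joined with b"" and decoded to text only at the end.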

Create text file of hexadecimal from binary

I would like to convert a binary to hexadecimal in a certain format and save it as a text file.
The end product should be something like this:
"\x7f\xe8\x89\x00\x00\x00\x60\x89\xe5\x31\xd2\x64\x8b\x52"
Input is from an executable file "a".
This is my current code:
with open('a', 'rb') as f:
    byte = f.read(1)
    hexbyte = '\\x%02s' % byte
    print hexbyte
A few issues with this:
This only prints the first byte.
The result is "\x" followed by a box containing the digits 00 7f; the terminal shows exactly the same thing.
Why is this so? And finally, how do I save all the hexadecimals to a text file to get the end product shown above?
EDIT: Able to save the file as text with
txt = open('out.txt', 'w')
print >> txt, hexbyte
txt.close()
You can't inject numbers into escape sequences like that. Escape sequences are essentially constants, so they can't have dynamic parts.
There's already a module for this, anyway:
from binascii import hexlify
with open('test', 'rb') as f:
    print(hexlify(f.read()).decode('utf-8'))
Just use the hexlify function on a byte string and it'll give you a hex byte string. You need the decode to convert it back into an ordinary string.
Not quite sure if decode works in Python 2, but you really should be using Python 3, anyway.
Your output looks like a representation of a bytestring in Python returned by repr():
with open('input_file', 'rb') as file:
    print repr(file.read())
Note: some bytes are shown as ascii characters e.g. '\x52' == 'R'. If you want all bytes to be shown as the hex escapes:
with open('input_file', 'rb') as file:
    print "\\x" + "\\x".join([c.encode('hex') for c in file.read()])
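The snippets above are Python 2. A rough Python 3 equivalent that formats every byte as a literal \xNN and saves the quoted result to a text file could look like this. The function names and the temporary paths are placeholders for the sketch, not part of the original answers:

```python
import os
import tempfile


def to_hex_escapes(data):
    """Format bytes as literal \\xNN text, one escape per byte."""
    # Iterating over bytes yields ints on Python 3, so {:02x} applies directly.
    return "".join("\\x{:02x}".format(b) for b in data)


def dump_file_as_hex(src_path, dst_path):
    """Read all of src_path and write its quoted hex-escape form to dst_path."""
    with open(src_path, "rb") as src:
        text = to_hex_escapes(src.read())
    with open(dst_path, "w") as dst:
        dst.write('"' + text + '"\n')


# Demonstrate with a tiny stand-in for the executable "a".
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "a")
dst_path = os.path.join(workdir, "out.txt")
with open(src_path, "wb") as f:
    f.write(b"\x7f\xe8\x89\x00")

dump_file_as_hex(src_path, dst_path)
with open(dst_path) as f:
    result = f.read()
print(result)
```

The original attempt used %02s, which pads a string; a hexadecimal conversion such as %02x or {:02x} on the byte's integer value is what produces the two hex digits.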
Just add the content to a list and print it:
with open("default.png",'rb') as file_png:
    a = file_png.read()
l = []
l.append(a)
print l
