How to convert hex string to plain text? - python

How to read a hex file and convert it to plain text?
For example, this is my file user.dat.(For China mainland user.dat)
And here is what I have tried so far:
# -*- coding:utf-8 -*-
with open('user.dat','rb') as f:
data = f.read()
print data
And the result is like this.Some is right,while some is not.
How to get the entire right content?

just add this line of in your code, str.decode('hex') will decode string into plain text.
output = data.decode('hex')
print output
OK you have some error so try this...
import binascii
with open('user.dat', 'rb') as f:
data = f.read()
print(binascii.hexlify(data))

Related

trying to print human readable ascii string

I am trying to print a string which is human readable ascii but not getting any output. What am i missing?
import string
file = open("file.txt", "r")
data = file.read()
data = data.split("\n")
for line in data:
if line not in string.printable:
continue
else:
print line
If your file's content is text, you should read files like this:
import string
with open("file.txt", "r") as file:
for line in file:
if all( c in string.printable for c in line):
print line
You must check every character individually to see if it is printable. There is another post about checking that string is printable: Test if a python string is printable
Also, you can read about context manager about how to open file right way: What is the most pythonic way to open a file?

How to display cyrillic text from a file in python?

I want to read some cyrilic text from a txt file in Python 3.
This is what the text file contains.
абцдефгчийклмнопярстувшхыз
I used:
with open('text.txt', 'r') as myfile:
text=myfile.read()
print (text)
But this is the ouput in the python shell:
ÿþ01F45D3G89:;<=>?O#ABC2HEK7
Can someone explain why this is the output?
Python supports utf-8 for this sort of thing.
You should be able to do:
with open('text.txt', encoding = 'utf-8', mode = 'r') as my_file:
...
Also, be sure that your text file is saved with utf-8 encoding. I tested this in my shell and without proper encoding my output was:
?????????????????????
With proper encoding:
file = open('text.txt', encoding='utf-8', mode='r')
text = file.read()
print(text)
абцдефгчийклмнопярстувшхы
Try working on the file using codecs, you need to
import codecs
and then do
text = codecs.open('text.txt', 'r', 'utf-8')
Basically you need utf8

Processing Russian text file fails

I have this code:
# -*- coding: utf-8 -*-
import codecs
prefix = u"а"
rus_file = "rus_names.txt"
output = "rus_surnames.txt"
with codecs.open(rus_file, 'r', 'utf-8') as infile:
with codecs.open(output, 'a', 'utf-8') as outfile:
for line in infile.readlines():
outfile.write(line+prefix)
And it gives me smth kinda chineese text in an output file. Even when I try to outfile.write(line) it gives me the same crap in an output. I just don't get it.
The purpose: I have a huge file with male surnames. I need to get the same file with female surnames. In russian it looks like this: Ivanov - Ivanova | Иванов - Иванова
Try
lastname = str(line+prefix, 'utf-8')
outfile.write(lastname)
So #AndreyAtapin was partially right. I've tried to add lines in a file which contains my previous mistakes with chineese characters. even flushing the file didn't help. But when I delete it and script creates it once again, it works! thanks.

Some characters (trademark sign, etc) unable to write to a file but is printable on the screen

I've been trying to scrape data from a website and write out the data that I find to a file. More than 90% of the time, I don't run into Unicode errors but when the data has the following characters such as "Burger King®, Hans Café", it doesn't like writing that into the file so my error handling prints it to the screen as is and without any further errors.
I've tried the encode and decode functions and the various encodings but to no avail.
Please find an excerpt of the current code that I've written below:
import urllib2,sys
import re
import os
import urllib
import string
import time
from BeautifulSoup import BeautifulSoup,NavigableString, SoupStrainer
from string import maketrans
import codecs
f=codecs.open('alldetails7.txt', mode='w', encoding='utf-8', errors='replace')
...
soup5 = BeautifulSoup(html5)
enc_s5 = soup5.originalEncoding
for company in iter(soup5.findAll(height="20px")):
stream = ""
count_detail = 1
for tag in iter(company.findAll('td')):
if count_detail > 1:
stream = stream + tag.text.replace(u',',u';')
if count_detail < 4 :
stream=stream+","
count_detail = count_detail + 1
stream.strip()
try:
f.write(str(stnum)+","+br_name_addr+","+stream.decode(enc_s5)+os.linesep)
except:
print "Unicode error ->"+str(storenum)+","+branch_name_address+","+stream
Your f.write() line doesn't make sense to me - stream will be a unicode since it's made indirectly from from tag.text and BeautifulSoup gives you Unicode, so you shouldn't call decode on stream. (You use decode to turn a str with a particular character encoding into a unicode.) You've opened the file for writing with codecs.open() and told it to use UTF-8, so you can just write() a unicode and that should work. So, instead I would try:
f.write(unicode(stnum)+br_name_addr+u","+stream+os.linesep)
... or, supposing that instead you had just opened the file with f=open('alldetails7.txt','w'), you would do:
line = unicode(stnum)+br_name_addr+u","+stream+os.linesep
f.write(line.encode('utf-8'))
Have you checked the encoding of the file you're writing to, and made sure the characters can be shown in the encoding you're trying to write to the file? Try setting the character encoding to UTF-8 or something else explicitly to have the characters show up.

This is my current way of writing to a file. However, I can't do UTF-8?

f = open("go.txt", "w")
f.write(title)
f.close()
What if "title" is in japanese/utf-8? How do I modify this code to be able to write "title" without having the ascii error?
Edit: Then, how do I read this file in UTF-8?
How to use UTF-8:
import codecs
# ...
# title is a unicode string
# ...
f = codecs.open("go.txt", "w", "utf-8")
f.write(title)
# ...
fileObj = codecs.open("go.txt", "r", "utf-8")
u = fileObj.read() # Returns a Unicode string from the UTF-8 bytes in the file
It depends on whether you want to insert a Unicode UTF-8 byte order mark, of which the only way I know of is to open a normal file and write:
import codecs
f = open('go.txt', 'wb')
f.write(codecs.BOM_UTF8)
f.write(title.encode('utf-8')
f.close()
Generally though, I don't want to add a UTF-8 BOM and the following will suffice though:
import codecs
f = codecs.open('go.txt', 'w', 'utf-8')
f.write(title)
f.close()

Categories

Resources