I was trying to read a .txt file in Python. After the following input:
f = open("test.txt","r") #opens file with name of "test.txt"
print(f.read(1))
print(f.read())
Instead of seeing the text, this is what I get back:
How do I display the file's contents?
Thanks
I think you need to go line by line. Be careful: if it's a big text file, this could go on for a while.
for line in f:
    print line.decode('utf-8').strip()
Some strings are not being read correctly, so you'll need the decode call.
See: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)
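In Python 3 the decoding can be handed to open() itself, so each line already arrives as text. A minimal sketch, assuming the file is UTF-8 encoded; the helper name iter_lines and the sample file are made up for illustration:

```python
import os
import tempfile


def iter_lines(path, encoding="utf-8"):
    """Yield the stripped lines of a text file, decoded with `encoding`."""
    with open(path, "r", encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so a big file is not loaded into memory all at once.
        for line in f:
            yield line.strip()


# Hypothetical sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("first line \u2019quoted\u2019\nsecond line\n".encode("utf-8"))
    sample_path = tmp.name

lines = list(iter_lines(sample_path))
os.remove(sample_path)
```

If the file is not UTF-8, passing its real encoding as the second argument is the only change needed.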
Try this setup (Python 2 only).
It worked for my console printing problems:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
I have been trying to put Arabic text in a .txt file, and when I do so using the code below I get this error: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>
Code:
Log1 = open("File.txt", "a")
Log1.write("سلام")
Log1.close()
This question has been asked on Stack Overflow many times, but every answer suggests using utf-8, which outputs \xd8\xb3\xd9\x84\xd8\xa7\xd9\x85 in this case. Is there any way to make the text file show سلام instead of \xd8\xb3\xd9\x84\xd8\xa7\xd9\x85?
This works perfectly:
FILENAME = 'foo.txt'
with open(FILENAME, "w", encoding='utf-8') as data:
    data.write("سلام")
Then from a zsh shell:
cat foo.txt
سلام
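The escaped form from the question and the readable text are the same data seen at two levels: \xd8\xb3\xd9\x84\xd8\xa7\xd9\x85 is the UTF-8 byte sequence for سلام. A small Python 3 sketch of that round trip; the temporary path and the variable names are placeholders:

```python
import os
import tempfile

text = "سلام"
encoded = text.encode("utf-8")  # the bytes that end up on disk

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading in binary mode shows the raw bytes ...
with open(path, "rb") as f:
    raw = f.read()

# ... and reading in text mode with the same encoding gives the text back.
with open(path, "r", encoding="utf-8") as f:
    decoded = f.read()
```

Whether the file "looks like" سلام therefore depends on the viewer decoding it as UTF-8, not on how it was written.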
I am working with Python 2.7 and NLTK on a large .txt file of content scraped from various websites. However, I am getting various Unicode errors such as
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)
My question is not so much how I can 'fix' this within Python, but whether there is anything I can do to the .txt file (as in formatting) before feeding it to Python, such as 'make plain text', to avoid this issue entirely.
Update:
I looked around and found a solution within python that seems to work perfectly:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
Try opening the file with:
f = open(fname, encoding="ascii", errors="surrogateescape")
Replace "ascii" with the desired encoding.
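The surrogateescape handler does not fix the encoding. It carries each undecodable byte through as a lone surrogate code point, so the original bytes can be restored on output. A Python 3 sketch, using the byte sequence 0xe2 0x80 0x99 as an assumed example of what the scraped file might contain:

```python
raw = b"it\xe2\x80\x99s"  # UTF-8 bytes for a right single quotation mark

# Strict ASCII decoding fails on the first byte above 0x7f.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each bad byte 0xNN to the code point U+DCNN.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")
```

The escaped characters are not printable text, so this suits passing data through unchanged more than displaying it.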
I'm having a bit of trouble outputting words from a text-image to a .txt file.
import pytesseract
from PIL import Image, ImageEnhance, ImageFilter
text = pytesseract.image_to_string(Image.open("book_image.jpg"))
file = open("text_file","w")
file.write(text)
print(text)
The code that reads the image file and prints out the words in the image works fine. The problem is that when I try to write the text to a file, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)
Could anyone please explain how I can convert the variable text to a string?
Try this:
file = open("text_file", "w", encoding='utf8', errors="ignore")
Also try encoding the text before writing it:
file.write(text.encode('utf-8').strip())
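The u'\u2019' in the error message suggests Python 2, where the built-in open() has no encoding parameter. io.open accepts one on both Python 2.6+ and Python 3, so a version-neutral sketch could look like this. The OCR result is replaced by a hard-coded string, and the helper name save_text is hypothetical:

```python
import io
import os
import tempfile


def save_text(text, path):
    """Write a unicode string to `path` as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Stand-in for pytesseract.image_to_string(...), which is not called here.
text = u"the reader\u2019s guide"
out_path = os.path.join(tempfile.mkdtemp(), "text_file")
save_text(text, out_path)

# Read the file back to confirm the character survived the round trip.
with io.open(out_path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

The with block also closes the file, which the original snippet never does.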
I'm trying to write Chinese characters into a text file from a SQL output called result.
result looks like this:
[('你好吗', 345re4, '2015-07-20'), ('我很好',45dde2, '2015-07-20').....]
This is my code:
# result is a list of tuples
file = open("my.txt", "w")
for row in result:
    print >> file, row[0].encode('utf-8')
file.close()
row[0] contains Chinese text like this: 你好吗
I also tried:
print >> file, str(row[0]).encode('utf-8')
and
print >> file, 'u'+str(row[0]).encode('utf-8')
but both gave the same error.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
I found a simple solution that avoids manual encoding and decoding: open the file as "utf-8" from the beginning using codecs.
import codecs
file = codecs.open("my.txt", "w", "utf-8")
Don't forget to add the UTF-8 BOM at the beginning of the file if you want text editors to display it correctly:
file = open(...)
file.write("\xef\xbb\xbf")  # UTF-8 BOM
for row in result:
    # decode from the machine's encoding to unicode, then encode as UTF-8
    print >> file, row[0].decode("mbcs").encode("utf-8")
file.close()
I think you'll have to decode from your machine's default encoding to unicode(), then encode it as UTF-8.
mbcs represents (at least it did ages ago) the default encoding on Windows.
But do not rely on that.
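Python also ships a codec named utf-8-sig that writes the three BOM bytes on output and strips them on input, which avoids writing "\xef\xbb\xbf" by hand. A Python 3 sketch, with made-up rows standing in for the SQL result and a temporary path in place of my.txt:

```python
import os
import tempfile

# Hypothetical rows shaped like the SQL output in the question.
result = [("你好吗", "345re4", "2015-07-20"), ("我很好", "45dde2", "2015-07-20")]

path = os.path.join(tempfile.mkdtemp(), "my.txt")
with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
    for row in result:
        f.write(row[0] + "\n")

# The raw bytes start with the BOM ...
with open(path, "rb") as f:
    raw = f.read()

# ... and reading with the same codec removes it again.
with open(path, "r", encoding="utf-8-sig") as f:
    lines = f.read().splitlines()
```

Because the strings are already text in Python 3, no decode from mbcs or any other machine-specific encoding is involved.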
Did you try the codecs module?
Reading the word "beyoncé" from a text file, Python handles it as "beyonc\xc3\xa9".
If I write it to a file it shows correctly, but in the console it shows like that.
Also, if I try to use it in my program I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
How can I make Python read beyoncé from a text file as beyonce and get rid of this problem?
See if this helps:
f = open('mytextfile.txt', 'w', encoding='utf-8')
f.write(line)
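Try:
string = "beyonc\xc3\xa9"
text = string.decode("utf-8")  # Python 2: gives the unicode object u"beyonc\xe9"
foo = open("foo.txt", "wb")
foo.write(string)  # write the original UTF-8 bytes unchanged
foo.close()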
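If the goal is the plain ASCII spelling rather than the accented one, one common approach (not the only one) is to decode the bytes, decompose accented letters with unicodedata.normalize, and drop the combining marks. A Python 3 sketch; strip_accents is a hypothetical helper name:

```python
import unicodedata


def strip_accents(text):
    """Return `text` with combining accent marks removed."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for the marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


raw = b"beyonc\xc3\xa9"     # bytes as read from the file
word = raw.decode("utf-8")  # text with the accented letter
plain = strip_accents(word)
```

This only handles characters that decompose into a base letter plus marks; letters such as "ø" or "ß" pass through unchanged.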