I am getting a message 'ascii' codec can't encode character u'\xe9' when I am writing a string to my file, heres how I am writing my file
my_file = open(output_path, "w")
my_file.write(output_string)
my_file.close()
I have been searching and found answers like this UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) and the first one didn't work and then this one I'm confused why I am encoding data I want to be able to read
import io
f = io.open(filename, 'w', encoding='utf8')
Thanks for the help
As mentioned, you're trying to write non-ASCII characters with the ASCII encoding. Since the built-in open function doesn't support the encoding parameter, then consider always using io.open in Python 2.7 (which is the default since Python 3.x).
Related
I am using Python Notebook to open this text file in Windows 10. However UTF-8 encoding isn't working. How should I solve this error?
Python is trying to open the file using your system's default encoding, but the bytes in the file cannot be decoded with that encoding.
You need to pass the correct encoding to open. We don't know what the correct encoding is, but the most likely candidates are UTF-8 or, less common these days, latin-1. So you would do something like
with open('myfile.txt', 'r', encoding='UTF-8') as f:
for line in f:
# do something with the line
I have run some scritp in Python 2.7 which generated a file, and when I tried to open it I found the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 2623: ordinal not in range(128)
Any clue on how to open it in Python 3.5?
Your file is in utf-8 (probably). The ASCII codec can't decode unicode text.
You should use the proper codec. The file.read() function returns a bytes-like object. You can turn that into a string like so:
contents = str(file.read(), 'utf-8')
You can specify the encoding when opening the file:
with open(myfile, encoding='utf-8) as f:
pass
I am trying a simple parsing of a file and get the error due to special characters:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
infile = 'finance.txt'
input = open(infile)
for line in input:
if line.startswith(u'▼'):
I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)
Solution?
You need to provide the encoding. For example if it is utf-8:
import io
with io.open(infile, encoding='utf-8') as fobj:
for line in fobj:
if line.startswith(u'▼'):
This works for Python 2 and 3. Per default Python 2 opens files assuming no encoding, i.e. reading the content will return byte strings. Therefore, you can read only ascii characters. In Python 3 the default is what
locale.getpreferredencoding(False) returns, in many cases utf-8. The standard open() in Python 2 does not allow to specify an encoding. Using io.open() makes it future proof because you don't need to change your code when switching to Python 3.
In Python 3:
>>> io.open is open
True
Open your file with the correct encoding, for example if your file is UTF8 encoded with Python 3:
with open('finance.txt', encoding='utf8') as f:
for line in input:
if line.startswith(u'▼'):
# whatever
With Python 2 you can use io.open() (also works in Python 3):
import io
with io.open('finance.txt', encoding='utf8') as f:
for line in input:
if line.startswith(u'▼'):
# whatever
I have an input file in Windows-1252 encoding that contains the '®' character. I need to write this character to a UTF-8 file. Also assume I must use Python 2.7. Seems easy enough, but I keep getting UnicodeDecodeErrors.
I originally had just opened the original file using codecs.open() with UTF-8 encoding, which worked fine for all of the ASCII characters until it encountered the ® symbol, whereupon it choked with the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2867043:
invalid start byte
I knew that I would have to properly decode it as cp1252 to fix this problem, so I opened it in the proper encoding and then encoded the data as UTF-8 prior to writing. But that produced a new error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 22:
ordinal not in range(128)
Here is a minimum working example:
with codecs.open('in.txt', mode='rb', encoding='cp1252') as inf:
with codecs.open('out.txt', mode='wb', encoding='utf-8') as of:
for line in inf:
of.write(line.encode('utf-8'))
Here is the contents of in.txt:
Sample file
Here is my sample file® yay.
I thought perhaps I could just open it in 'rb' mode with no encoding specified and specifically handle the decoding and encoding of each line like so:
of.write(line.decode('cp1252').encode('utf-8'))
But that also didn't work, giving the same error as when I just opened it as UTF-8.
How do I read data from a Windows-1252 file, properly decode it then encode it as UTF-8 and write it to a UTF-8 file? The above method has always worked for me in the past until I encountered the ® character.
Your file is not in Windows-1252 if 0xC2 should represent the ® character; in Windows-1252, 0xC2 is Â.
However, you should just use
of.write(line)
since encoding properly is the whole reason you're using codecs in the first place.
Im trying to write Chinese characters into a text file from a SQL output called result.
result looks like this:
[('你好吗', 345re4, '2015-07-20'), ('我很好',45dde2, '2015-07-20').....]
This is my code:
#result is a list of tuples
file = open("my.txt", "w")
for row in result:
print >> file, row[0].encode('utf-8')
file.close()
row[0] contains Chinese text like this: 你好吗
I also tried:
print >> file, str(row[0]).encode('utf-8')
and
print >> file, 'u'+str(row[0]).encode('utf-8')
but both gave the same error.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
Found a simple solution instead of doing encoding and decoding by formatting the file to "utf-8" from the beginning using codecs.
import codecs
file = codecs.open("my.txt", "w", "utf-8")
Don't forget to ad the UTF8 BOM on the file beginning if you wish to view your file in text editor correctly:
file = open(...)
file.write("\xef\xbb\xbf")
for row in result:
print >> file, u""+row[0].decode("mbcs").encode("utf-8")
file.close()
I think you'll have to decode from your machines default encoding to unicode(), then encode it as UTF-8.
mbcs represents (at least it did ages a go) default encoding on Windows.
But do not rely on that.
Did you try the codecs module?