I'm using the Python 3 module xlsxwriter to write an Excel file, and I want to give the file a name that contains Chinese characters. However, it raises an exception: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 3-8: ordinal not in range(256)
How can I deal with this?
Try putting a 'u' before the file name to mark it as a Unicode string. (Note that in Python 3 string literals are already Unicode, so the prefix changes nothing there.)
u'filename_with_chinese_chars'
Related
I'm trying to export some data to CSV from out of a database, and I'm struggling to understand the following UnicodeEncodeError:
>>> sample
u'I\u2019m now'
>>> type(sample)
<type 'unicode'>
>>> str(sample)
Traceback (most recent call last):
File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 1: ordinal not in range(128)
>>> print sample
I’m now
>>> sample.encode('utf-8', 'ignore')
'I\xe2\x80\x99m now'
I'm confused. Is it unicode or not? What does the UnicodeEncodeError actually mean in this context? Why does print work just fine? If I want to be able to save this data to a CSV file, how can I handle the encoding so that it does not generate an error when I try to use csv.writer's writerow?
Thanks for your help.
It is a Python unicode object, you used type(sample) to verify that. Also, it contains Unicode, so you can serialize it to a file that has one of the Unicode encodings.
The encoding error needs to be read carefully: it is the "ascii" codec that can't represent that string. ASCII is just the Unicode subset with codepoints below 128 (0 through 127). Your string uses codepoint 0x2019, so it can't be encoded with ASCII.
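To make the codec distinction concrete, here is a minimal Python 3 sketch that encodes the same text with two codecs. The question itself uses Python 2, where str(sample) performs the ASCII encoding implicitly; the variable names below are illustrative only.

```python
# Python 3 sketch: the same text encoded with two different codecs.
sample = "I\u2019m now"  # contains U+2019 RIGHT SINGLE QUOTATION MARK

# UTF-8 can represent every Unicode code point.
utf8_bytes = sample.encode("utf-8")

# ASCII only covers code points 0-127, so U+2019 cannot be encoded.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_codec = exc.encoding  # name of the codec that raised the error
```

The string itself is fine in both cases; only the choice of target codec decides whether encoding succeeds.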
print works because it is correctly implemented and it doesn't try to encode the string as ASCII. I think you would get similar errors if stdout was set up with e.g. Latin-1 as encoding, but it seems your system can handle a wider range of Unicode than that.
To write a CSV file, you could just use UTF-8 as the encoding for that file. I haven't used the csv module though, so I'm not sure exactly how. In any case, if it doesn't work, you should post the exact code that fails as an MCVE in a separate question.
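As a rough illustration of the UTF-8 suggestion, the following Python 3 sketch writes rows through csv.writer to a file opened with an explicit encoding. The helper name write_rows_utf8, the output path, and the sample rows are assumptions made for this example. In Python 2 the csv module works on byte strings, so each value would have to be encoded first.

```python
import csv
import os
import tempfile


def write_rows_utf8(path, rows):
    """Write rows to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Hypothetical output location and data, for illustration only.
csv_path = os.path.join(tempfile.mkdtemp(), "export.csv")
write_rows_utf8(csv_path, [["id", "text"], [1, "I\u2019m now"]])
```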
BTW: Please upgrade to Python 3! It has many improvements over the 2.x series, also concerning string/Unicode handling.
I have a text file which contains unicode characters in the following format:
\u0935\u094d\u0926\u094d\u0928\u094d\u0935\u094d\u0926\
I want to convert it into Devanagari characters in the following format:
वर्जनरूपमिति दर्शित्म् । स पूरुषः अमृतत्वाय कल्पते व्द्न्व्द
and then write it to a file.
Presently my code
encoded = x.encode('utf-8')
print (encoded.decode('unicode-escape'))
can print the Devanagari characters in the terminal. However, when I try to write it to a file using
text = 'target:'+encoded.decode('unicode-escape')+'\n'
fileid.write(text)
I am getting the following error.
'ascii' codec can't encode characters in position 7-18: ordinal not in range(128)
Can anybody please help me?
If you are using Python 2, this happens because .decode('unicode-escape') returns a unicode object, while fileid.write() only accepts byte strings (str). Python therefore tries to convert the object to a byte string implicitly, using the ASCII codec, which doesn't cover Devanagari characters. This conversion raises the exception.
You need to manually convert the unicode string back into a byte string before writing it to the file:
fileid.write(text.encode('utf-8'))
Here I assumed you want UTF-8 encoding. If you want to save the characters in another encoding replace 'utf-8' with the name of that encoding.
In Python 3 you can specify the encoding when opening the file:
fileid = open('compare.txt', 'a', encoding='utf-8')
Then the extra .encode('utf-8') isn't necessary.
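Putting the Python 3 variant together, a minimal sketch might look like the following. It assumes the input contains only ASCII characters (the literal \uXXXX escapes); the sample string and the temporary output path are made up for the example.

```python
import os
import tempfile

# Hypothetical sample: escaped text as it might appear in the input file.
escaped = r"\u0935\u094d\u0926"

# Turn the literal \uXXXX sequences into real characters.
# Assumes the input is pure ASCII.
decoded = escaped.encode("ascii").decode("unicode_escape")

out_path = os.path.join(tempfile.mkdtemp(), "compare.txt")
# The file object encodes text to UTF-8 on write, so no manual .encode() is needed.
with open(out_path, "a", encoding="utf-8") as fileid:
    fileid.write("target:" + decoded + "\n")
```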
I'm trying to write Chinese characters into a text file from a SQL output called result.
result looks like this:
[('你好吗', 345re4, '2015-07-20'), ('我很好',45dde2, '2015-07-20').....]
This is my code:
# result is a list of tuples
file = open("my.txt", "w")
for row in result:
    print >> file, row[0].encode('utf-8')
file.close()
row[0] contains Chinese text like this: 你好吗
I also tried:
print >> file, str(row[0]).encode('utf-8')
and
print >> file, 'u'+str(row[0]).encode('utf-8')
but both gave the same error.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
I found a simple solution: instead of encoding and decoding manually, open the file as UTF-8 from the beginning using codecs.
import codecs
file = codecs.open("my.txt", "w", "utf-8")
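An alternative with the same effect is io.open, which takes an encoding argument and, in Python 3, is the same function as the built-in open. The following is a Python 3 sketch under that assumption; the result list is a stand-in for the SQL output, and the output path is hypothetical.

```python
import io
import os
import tempfile

# Stand-in for the SQL result in the question (hypothetical values).
result = [("你好吗", "345re4", "2015-07-20"), ("我很好", "45dde2", "2015-07-20")]

out_path = os.path.join(tempfile.mkdtemp(), "my.txt")
# Text written to the file is encoded as UTF-8 automatically.
with io.open(out_path, "w", encoding="utf-8") as out_file:
    for row in result:
        out_file.write(row[0] + "\n")
```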
Don't forget to add the UTF-8 BOM at the beginning of the file if you want to view the file correctly in a text editor:
file = open(...)
file.write("\xef\xbb\xbf")
for row in result:
    print >> file, u""+row[0].decode("mbcs").encode("utf-8")
file.close()
I think you'll have to decode from your machine's default encoding to unicode, then encode it as UTF-8.
mbcs represents (at least it did ages ago) the default encoding on Windows.
But do not rely on that.
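In Python 3, the BOM does not have to be written by hand: the standard "utf-8-sig" codec emits it once at the start of the output. A minimal sketch, with a hypothetical output path and sample text:

```python
import os
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "my.txt")

# "utf-8-sig" writes the BOM at the start of the file, then plain UTF-8.
with open(out_path, "w", encoding="utf-8-sig") as out_file:
    out_file.write("你好吗\n")

# Read the raw bytes back to confirm the BOM is there.
with open(out_path, "rb") as raw:
    first_bytes = raw.read(3)
```

Reading the file back with encoding="utf-8-sig" strips the BOM again, so the text round-trips unchanged.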
Did you try the codecs module?
I have the line:
c.writerow(new_values)
That writes a number of values to a csv file. Normally it is working fine but sometimes it throws an exception and doesn't write the line in the csv file. I have no idea how I can find out why.
This is my exception handling right now:
try:
    c.writerow(new_values)
except:
    print()
    print ("Write Error: ", new_values)
I commented out my own exception handling, and the traceback says:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 14: character maps to <undefined>
Ok, I solved it by myself:
I just had to add ", encoding='utf-8'" to my csv.writer line:
c = csv.writer(open("Myfile.csv", 'w', newline='', encoding='utf-8'))
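To find out which character causes such a failure instead of hiding it behind a bare except, one option is to catch UnicodeEncodeError specifically and inspect its attributes. The helper name describe_encode_error is hypothetical, and cp1252 is only an assumed example of a Windows default encoding that lacks the Greek letter alpha.

```python
def describe_encode_error(text, encoding):
    """Return None if text encodes cleanly, else a short description of the failure."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.object is the full string; start/end mark the offending slice.
        bad = exc.object[exc.start:exc.end]
        return "{!r} cannot be encoded with {}: {}".format(bad, exc.encoding, exc.reason)
    return None


problem = describe_encode_error("angle \u03b1", "cp1252")
no_problem = describe_encode_error("angle \u03b1", "utf-8")
```

The same exception attributes are available when the error is raised inside c.writerow, so an except UnicodeEncodeError clause there can report the offending characters.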
The csv module in Python 2 is notorious for not handling Unicode characters well: unless all characters are ASCII, you probably won't be able to write the row. There is a (somewhat) drop-in replacement called unicodecsv that you may want to look into: https://pypi.python.org/pypi/unicodecsv
I have some code that converts a Unicode-escaped representation of a Hebrew text file into Hebrew for display, for example:
import sys
f = open(sys.argv[1])
for line in f:
    print eval('u"' + line +'"')
This works fine when I run it in PyDev (Eclipse), but when I run it from the command line, I get
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-10: ordinal not in range(256)
An example line from the input file is:
\u05d9\u05d5\u05dd
What is the problem? How can I solve this?
Do not use eval(); instead use the unicode_escape codec to interpret that data:
for line in f:
    line = line.decode('unicode_escape')
The unicode_escape encoding interprets \uabcd character sequences the same way Python would when parsing a unicode literal in the source code:
>>> '\u05d9\u05d5\u05dd'.decode('unicode_escape')
u'\u05d9\u05d5\u05dd'
The exception you see is not caused by the eval() call though; I suspect it is caused by the attempt to print the result. Python will try to encode unicode values automatically and will detect what encoding the current terminal uses.
Your Eclipse output window uses a different encoding from your terminal; if the latter is configured to support Latin-1 then you'll see that exact exception, as Python tries to encode Hebrew codepoints to an encoding that doesn't support those:
>>> u'\u05d9\u05d5\u05dd'.encode('latin1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)
The solution is to reconfigure your terminal (UTF-8 would be a good choice), or to not print unicode values with codepoints that cannot be encoded to Latin-1.
If you are redirecting output from Python to a file, then Python cannot determine the output encoding automatically. In that case you can use the PYTHONIOENCODING environment variable to tell Python what encoding to use for standard I/O:
PYTHONIOENCODING=utf-8 python yourscript.py > outputfile.txt
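If changing the environment is not an option, a Python 3 script can also pick the output encoding itself. The sketch below wraps a binary stream in io.TextIOWrapper; the helper name as_utf8_text is an assumption, and an in-memory buffer stands in for sys.stdout.buffer so the example is self-contained. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another way to get the same effect.

```python
import io


def as_utf8_text(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# Stand-in for sys.stdout.buffer; a real script would pass that instead.
buffer = io.BytesIO()
out = as_utf8_text(buffer)
out.write("\u05d9\u05d5\u05dd\n")
out.flush()
written = buffer.getvalue()
```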
Thank you, this solved my problem.
line.decode('unicode_escape')
did the trick.
Followup - This now works, but if I try to send the output to a file:
python myScript.py > textfile.txt
The file itself has the error:
'ascii' codec can't encode characters in position 42-44: ordinal not in range(128)