when writing to csv file writerow fails with UnicodeEncodeError - python

I have the line:
c.writerow(new_values)
That writes a number of values to a CSV file. Normally it works fine, but sometimes it throws an exception and doesn't write the line to the CSV file. I have no idea how to find out why.
This is my exception handling right now:
try:
    c.writerow(new_values)
except:
    print()
    print("Write Error: ", new_values)
I commented out my own exception handling, and now the traceback says:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 14: character maps to <undefined>

OK, I solved it myself:
I just had to add encoding='utf-8' to the open() call in my csv.writer line:
c = csv.writer(open("Myfile.csv", 'w', newline='', encoding='utf-8'))
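A slightly fuller sketch of the same fix, assuming Python 3: the file is opened in a with block so it is closed (and flushed) after writing, newline='' is passed as the csv module recommends, and encoding='utf-8' is given explicitly so the platform default (often a Windows code page such as cp1252, which has no 'α') is never used. The helper names write_rows and read_rows and the temporary path are hypothetical.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows as UTF-8 CSV; newline='' lets the csv module control line endings."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            # UTF-8 can encode every Unicode character, so 'α' no longer fails
            writer.writerow(row)


def read_rows(path):
    """Read the file back with the same encoding it was written in."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Usage with a hypothetical row containing U+03B1 (Greek small letter alpha)
path = os.path.join(tempfile.mkdtemp(), "Myfile.csv")
rows = [["name", "symbol"], ["alpha", "\u03b1"]]
write_rows(path, rows)
round_trip = read_rows(path)
```

If the file is meant to be opened in Excel, encoding='utf-8-sig' is commonly used instead, because the byte-order mark it writes helps Excel recognize the file as UTF-8.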

In Python 2, the csv module is notorious for not handling Unicode characters well: unless all characters fall within the ASCII codec, you probably won't be able to write the row. There is a (somewhat) drop-in replacement called unicodecsv that you may want to look into: https://pypi.python.org/pypi/unicodecsv. (In Python 3, as in the question, csv handles Unicode natively, and the fix is to pass the right encoding to open(), as shown above.)

Related

Python - Reading CSV UnicodeError

I have exported a CSV from Kaggle - https://www.kaggle.com/ngyptr/python-nltk-sentiment-analysis. However, when I attempt to iterate through the file, I receive Unicode errors concerning certain characters that cannot be encoded.
File "C:\Program Files\Python35\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 264: character maps to <undefined>
I have enabled UTF-8 encoding when opening the file, which I assumed would take care of decoding these characters. Evidently not.
My code:
import csv

with open("sentimentDataSet.csv", "r", encoding="utf-8", errors='ignore', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        if row:
            print(row)
            if row[sentimentCsvColumn] == sentimentScores(row[textCsvColumn]):
                accuracyCount += 1
print(accuracyCount)
That's an encoding error raised when you print the row; it has little to do with reading the actual CSV.
Your Windows terminal uses the CP850 encoding, which can't represent every character (for example '\u2026').
There are some things you can do here:
- A simple way is to set the PYTHONIOENCODING environment variable to an encoding:error-handler combination that throws away what it can't represent. Running set PYTHONIOENCODING=cp850:replace before starting Python makes Python replace characters that CP850 can't represent with question marks.
- Change your terminal encoding to UTF-8 by running chcp 65001 before starting Python.
- Encode the data by hand before printing: print(str(data).encode('ascii', 'replace'))
- Don't print it at all.
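The "encode by hand" option can be wrapped in a small helper. This is a sketch, not part of any library: the name console_safe is hypothetical. It encodes with errors='replace' and then decodes again, so print() shows an ordinary string with '?' in place of unrepresentable characters rather than a b'...' bytes literal.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding can't represent replaced by '?'."""
    # Default to the console's encoding (e.g. cp850); fall back to UTF-8 if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# Usage inside the question's loop would be: print(console_safe(str(row)))
print(console_safe("Loading\u2026 done", "cp850"))  # Loading? done
```

Unlike PYTHONIOENCODING, this only affects the strings you pass through the helper, so other output still fails loudly if it can't be encoded.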

Python Reading File and Identifying Source of UnicodeDecodeError

I am trying to read a text file using the following statement:
with open(inputFile) as fp:
    for line in fp:
        if len(line) > 0:
            lineRecords.append(line.strip())
The problem is that I get the following error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6880: character maps to <undefined>
My question is how I can identify exactly where in the file the error occurs, since the position Python gives is tied to the record being read at the time and not to the absolute position in the file. So is it the 6,880th character in record 20 or the 6,880th character in record 2000? Without record information, the position value Python returns is worthless.
Bottom line: is there a way to get Python to tell me what record it was processing at the time it encountered the error?
(And yes, I know I can search the file for the 0x9d byte, but that is not what I am after.)
Update: the suggested duplicate, "UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function", has nothing to do with my question, which is how to get Python to tell me which record of the input file it was reading when it hit the Unicode error.
I think the only way is to track the line number separately and output it yourself.
with open(inputFile) as fp:
    num = 0
    try:
        for num, line in enumerate(fp):
            if len(line) > 0:
                lineRecords.append(line.strip())
    except UnicodeDecodeError as e:
        # num is the 0-based index of the last line read successfully (0 if none was)
        print('Line ', num, e)
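Because a text-mode file is decoded in chunks rather than line by line, the exception may surface before the lines that precede the bad byte in the same chunk are returned, so a counter like the one above is only approximate. A more precise sketch, assuming an ASCII-compatible encoding such as cp1252 (the usual 'charmap' codec on Windows) or UTF-8, reads the file in binary mode and decodes each line separately. find_decode_errors is a hypothetical helper name, and the sample file is made up for the demonstration.

```python
import os
import tempfile


def find_decode_errors(path, encoding="cp1252"):
    """Return (line_number, absolute_byte_offset, bad_bytes) for each line that fails to decode.

    Only the first bad sequence on each line is reported.
    """
    problems = []
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as fp:  # binary mode: nothing is decoded while reading
        for line_number, raw in enumerate(fp, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as e:
                # e.start/e.end are positions within this line's bytes
                problems.append((line_number, offset + e.start, raw[e.start:e.end]))
            offset += len(raw)
    return problems


# Usage with a small hypothetical file whose second line holds the 0x9d byte
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"ok line\nbad \x9d here\nfine\n")

print(find_decode_errors(path))  # [(2, 12, b'\x9d')]
```

Splitting on b"\n" is safe for single-byte code pages and UTF-8, because the newline byte never appears inside a multi-byte UTF-8 sequence; it would not work for UTF-16 files.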
You can use the read method of the file object to obtain the first 6880 characters and encode them; the length of the resulting bytes object will be the index of the starting byte of the offending character:
with open(inputFile) as fp:
    # encode with the file's own encoding so the byte count matches the file
    print(len(fp.read(6880).encode(fp.encoding)))
I have faced this issue before, and the easiest fix is to open the file in UTF-8 mode:
with open(inputFile, encoding="utf8") as fp:

Python 3: writing an Excel file raises UnicodeEncodeError

I use the Python 3 module xlsxwriter to write an Excel file, and I want to give the file a name that contains Chinese words. However, it raises an exception: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 3-8: ordinal not in range(256)
I want to know how to deal with it.
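Try putting a 'u' before the file name to mark it as a Unicode string:
u'filename_with_chinese_chars'
(This only matters in Python 2; in Python 3 every str literal is already Unicode and the u prefix has no effect.)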

Processing CSV with Python

I am having an issue processing CSVs. I get this error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 22: character maps to <undefined>
What do I need to do to fix this? I think the problem is where the CSV is matched against the user's input.
import csv

csvanswer = input("Type your issue here!").lower()
searchanswer = csvanswer.split()
userinput = open("task2csv.csv")
csv_task = csv.reader(userinput)
for row in csv_task:
    if row[0] in searchanswer:
        print(row)
        break
Your input file is probably in an encoding other than the default on your system. You can fix this by explicitly providing the correct file encoding to the open call (you should also pass newline='' to the open call to properly obey the csv module's requirements).
For example, if the file is UTF-8, you'd do:
userinput = open("task2csv.csv", encoding='utf-8', newline='')
If it's some other encoding (UTF-16 is common for files produced by Windows programs), use that instead. If it's some terrible non-UTF encoding, you're going to have to figure out the locale settings on the machine that produced it; it could be any number of different encodings.
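When you don't know the encoding, one rough approach is to try a few candidates in order. This is a heuristic sketch with hypothetical helper names, not a reliable detector: a UTF-16 byte-order mark is checked first, then strict UTF-8 (which rarely succeeds by accident on non-UTF-8 text), and finally fallbacks such as cp1252. Single-byte fallbacks accept almost any input (latin-1 accepts every byte), so they belong last, and a successful decode does not prove the guess is right.

```python
import codecs
import csv
import os
import tempfile


def guess_encoding(path, fallbacks=("cp1252",)):
    """Guess a file's encoding: UTF-16 BOM, then UTF-8, then each fallback in order."""
    with open(path, "rb") as f:
        data = f.read()  # reads the whole file; fine for modest CSVs
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec consumes the BOM and picks the byte order
    # utf-8-sig strips a UTF-8 BOM if present and otherwise behaves like utf-8
    for encoding in ("utf-8-sig",) + tuple(fallbacks):
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"{path!r} is not UTF-16/UTF-8 or any of {fallbacks}")


def read_csv_rows(path):
    """Open the CSV with the guessed encoding and newline='' as the csv module expects."""
    with open(path, encoding=guess_encoding(path), newline="") as f:
        return list(csv.reader(f))


# Usage: the same row saved three ways (hypothetical sample files)
folder = tempfile.mkdtemp()
samples = {
    "utf8.csv": "caf\u00e9,1\r\n".encode("utf-8"),
    "utf16.csv": "caf\u00e9,1\r\n".encode("utf-16"),  # Python prepends a BOM
    "cp1252.csv": "caf\u00e9,1\r\n".encode("cp1252"),
}
for name, payload in samples.items():
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(payload)
    print(name, guess_encoding(path), read_csv_rows(path) == [["caf\u00e9", "1"]])
```

In the question's code, read_csv_rows("task2csv.csv") would take the place of the open() and csv.reader() lines.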

encoding='utf-8' raises UnicodeEncodeError when opening a UTF-8 file with Chinese characters

I can't open a file containing any Chinese characters when the encoding is set to utf-8:
text = open('file.txt', mode='r', encoding='utf-8').read()
print(text)
UnicodeEncodeError: 'charmap' codec can't encode character '\u70e6' in position 0: character maps to <undefined>
The file is 100% utf-8.
http://asdfasd.net/423/file.txt
http://asdfasd.net/423/test.py
If I remove encoding='utf-8' everything is ok.
What is wrong here with encoding?
I always use encoding='utf-8' when opening files; I don't know what is different now.
The exception you see comes from printing your data. Printing requires that you encode the data to the encoding used by your terminal or Windows console.
You can see this from the exception (and from the traceback, but you didn't include that): if you had a problem decoding data (which is what happens when you read from a file), you would get a UnicodeDecodeError, but you got a UnicodeEncodeError instead.
You need to either adjust your terminal or console encoding, or not print the data.
See http://wiki.python.org/moin/PrintFails for troubleshooting help.
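On Python 3.7 and later, text streams such as sys.stdout have a reconfigure() method, which offers an in-process alternative to changing the console settings. The sketch below (the helper name tolerate_unencodable is hypothetical) switches a stream's error handler to 'replace', so characters the console encoding can't represent come out as '?' instead of raising. It is demonstrated on an in-memory cp1252 stream rather than a real console.

```python
import io
import sys


def tolerate_unencodable(stream=None, errors="replace"):
    """Make a text stream substitute characters it cannot encode instead of raising."""
    stream = sys.stdout if stream is None else stream
    if hasattr(stream, "reconfigure"):  # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(errors=errors)  # keeps the stream's current encoding
    return stream


# Demonstration on an in-memory stream that behaves like a cp1252 console
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp1252")
tolerate_unencodable(console)
console.write("\u70e6 ok")  # U+70E6 has no cp1252 mapping, so it becomes '?'
console.flush()
print(buffer.getvalue())  # b'? ok'
```

Calling tolerate_unencodable() with no argument applies the same change to sys.stdout; the data read from the file is untouched, only its printed form is lossy.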
