About UnicodeDecodeError - python

I am writing a program to count the words with python(3.6), the code runs smoothly from the terminal. But if I use python IDLE, below error happens:
Traceback (most recent call last):
File "/Users/zhangchaont/python/Course Python Programming/6.7V2.py", line 122, in <module>
main()
File "/Users/zhangchaont/python/Course Python Programming/6.7V2.py", line 21, in main
for line in txtFile:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 33: ordinal not in range(128)
How to solve this?

Since there is not much info about your code. I can only suggest instead of codecs you can also use this package.
https://github.com/iki/unidecode. The method below should solve your problem. Open your file with open method, and pass it the file_handle.read()
unidecode.unidecode_expect_nonascii(string)

Related

Python: Can't seem to decode the textfile

I am trying to open the file i.md3 in python. Only the first line is displayed properly.
This file is correct as I can open this in C easily with structures and pointers.
How can I decode this file. I have tried many encoding techniques many of which can show only output of first line.
Without the "encoding=cp850" there is an error:
Traceback (most recent call last):
File "D:\Eclipse Workspace\IGG Project\Main.py", line 40, in <module>
line = fp1.read()
File "C:\Program Files (x86)\Python38-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 35: character maps to <undefined>
This file is correct as I can open this in C easily with structures and pointers.
Code:
fp1 = open("i.md3", encoding="cp850")
while 1:
line = fp1.read()
if not line:
break
print (line)
First few lines of output is in the link below:
https://i.stack.imgur.com/TGZhq.png

Error when printing random line from text file

I need to print a random line from the file "Long films".
My code is:
import random
with open('Long films') as f:
lines = f.readlines()
print(random.choice(lines))
But it prints this error:
Traceback (most recent call last):
line 3, in <module>
lines = f.readlines()
line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 36: ordinal not in range(128)
What do I need to do in order to avoid this error?
The problem is not with printing, it is with reading. It seems your file has some special characters. Try opening your file with a different encoding:
with open('Long films', encoding='latin-1') as f:
...
Also, have you made any settings to your locale? Have you set any encoding scheme at the top of your file? Ordinarily, python3 will "helpfully" decode your text to utf-8, so you typically should not be getting this error.

VADER-Sentiment-Analysis toolkit and decoding to UTF-8

I'm trying out this awesome sentiment analysis toolkit for python called Vader (https://github.com/cjhutto/vaderSentiment#python-code-example). However, I'm not even able to run their examples, because of a decoding problem (?).
I've tried the .decode('utf-8'), but it still gives me this error code:
Traceback (most recent call last):
File "/Users/solari/Codes/EmotionalTwitter/vader.py", line 22, in
<module>
analyzer = SentimentIntensityAnalyzer()
File "/usr/local/lib/python3.6/site-
packages/vaderSentiment/vaderSentiment.py", line 199, in __init__
self.lexicon_full_filepath = f.read()
File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/l
ib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6573: ordinal not in range(128)
[Finished in 0.5s with exit code 1]
Why does it complain about this "ascii codec"? Because if I've read their documentation correctly this should be in utf-8 anyway. Also, I'm using Python 3.6.2.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I have written the code below that provides the path to an Excel file that will be created with the XLWT module
master_path = r"C:\Users\nbt8ye8\Documents\Docs\Report Automation\KBE Reporting\Reports"
master_excel_file_raw = "KBE Master Data.xls"
master_excel_file = os.path.join(master_path, master_excel_file_raw)
Then later in the code I create the Excel file (which works without a problem) with the code below:
master_excel_wbook = xlwt.Workbook()
master_excel_wsheet = master_excel_wbook.add_sheet("All Data", cell_overwrite_ok=True)
master_excel_wbook.save(master_excel_file)
master_excel_wbook.save(tempfile.TemporaryFile())
However when I run the code it gives me the errors below.
Traceback (most recent call last):
File "C:\Users\nbt8ye8\workspace\Report Automation\import_data.py", line 1225, in <module>
createExcelFile()
File "C:\Users\nbt8ye8\workspace\Report Automation\import_data.py", line 1219, in createExcelFile
master_excel_wbook.save(master_excel_file)
File "build\bdist.win32\egg\xlwt\Workbook.py", line 662, in save
File "build\bdist.win32\egg\xlwt\Workbook.py", line 637, in get_biff_data
File "build\bdist.win32\egg\xlwt\Workbook.py", line 599, in __sst_rec
File "build\bdist.win32\egg\xlwt\BIFFRecords.py", line 76, in get_biff_record
File "build\bdist.win32\egg\xlwt\BIFFRecords.py", line 91, in _add_to_sst
File "build\bdist.win32\egg\xlwt\UnicodeUtils.py", line 50, in upack2
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Does anyone know how to resolve this issue? I have tried encoding and decoding the strings and so far it has not worked but it is also highly likely I did not do it correctly. Any help would be greatly apprecieated. Thank you!
#RyanG was correct, this error was thrown because there was unicode data in my Excel file. Once I modified the Excel file to remove the unicode data the problem was resolved. Thanks again.

UnicodeDecodeError for writing file

I know that this is a very common error, but it's the first time I've encountered it when trying to write a file.
I'm using networkx to work with graphs for network analysis, and when I try to write into any format:
nx.write_gml(G, "Graph.gml")
nx.write_pajek(G, "Graph.net")
nx.write_gexf(G, "graph.gexf")
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 2, in write_pajek
File "/Library/Python/2.7/site-packages/networkx/utils/decorators.py", line 263, in _open_file
result = func(*new_args, **kwargs)
File "/Library/Python/2.7/site-packages/networkx/readwrite/pajek.py", line 100, in write_pajek
path.write(line.encode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
I haven't found documentation on this, so quite confused.
Wondering if you can make use of codec module to solve it or not. Just create a file object by codec as following before feeding to networkx.
ex,
import codecs
f = codecs.open("graph.gml", "w", "utf-8")

Categories

Resources