Python 2: ASCII issue when writing to Excel files

Problem sketch:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
I'm trying to write a simple Python program that auto-fills blank cells with the data that appears above them in the same column.
Since there are Chinese characters in the file, I suspected an ASCII encoding issue, so I tried to convert everything to UTF-8.
Code shown below:
#!/usr/bin/python
# -*- coding:utf-8 -*-
from xlrd import open_workbook
from xlwt import Workbook
from xlutils.copy import copy

rb = open_workbook('data.xls', 'utf-8')
wb = copy(rb)
sh = wb.get_sheet(0)
s = rb.sheet_by_index(0)
cols = s.ncols
rows = s.nrows
temp = 0
for cx in range(cols):
    for rx in range(rows):
        if s.cell_value(rowx=rx, colx=cx).encode('utf-8') != "":
            temp = s.cell_value(rowx=rx, colx=cx).encode('utf-8')
            print(temp)  # to verify
        else:
            sh.write(rx, cx, temp)
wb.save('data.xls')
However, the error still occurred. Result in terminal:
ZishengdeMacBook-Pro:Downloads zisheng$ python form.py
(printed result ignored, and it looked good)
Traceback (most recent call last):
File "form.py", line 41, in <module>
wb.save('data.xls')
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 710, in save
doc.save(filename_or_stream, self.get_biff_data())
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 674, in get_biff_data
shared_str_table = self.__sst_rec()
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 636, in __sst_rec
return self.__sst.get_biff_record()
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/BIFFRecords.py", line 77, in get_biff_record
self._add_to_sst(s)
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/BIFFRecords.py", line 92, in _add_to_sst
u_str = upack2(s, self.encoding)
File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
Can anyone help? Thanks in advance!
I've figured it out!
To solve this, decode the UTF-8-encoded byte string back to unicode in the writing step:
sh.write(rx, cx, unicode(temp, 'utf-8'))
And it's done.
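For reference, here is a minimal Python 2 sketch of the whole corrected loop under the same assumptions as above (all cells are text). It keeps cell values as unicode throughout and encodes to UTF-8 only when printing, so xlwt never has to guess an encoding:

# -*- coding: utf-8 -*-
from xlrd import open_workbook
from xlutils.copy import copy

rb = open_workbook('data.xls')
wb = copy(rb)
sh = wb.get_sheet(0)
s = rb.sheet_by_index(0)

temp = u''
for cx in range(s.ncols):
    for rx in range(s.nrows):
        value = s.cell_value(rowx=rx, colx=cx)  # xlrd returns unicode for text cells
        if value != u'':
            temp = value
            print(temp.encode('utf-8'))  # encode only for terminal output
        else:
            sh.write(rx, cx, temp)  # pass unicode straight to xlwt
wb.save('data.xls')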


Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte

I've been trying to iterate through a csv file with the following code:
import csv
import os, sys

directory = "/Users/aliharam/Desktop/Lamis File"
files = []
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        files.append(f)
files.pop()

for i in files:
    with open(i, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            print(row)
This is the error I am getting:
Traceback (most recent call last):
File "/Users/aliharam/PycharmProjects/LamisTasks/Normalization.py", line 16, in <module>
for row in datareader:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte
['\tAli Haram \tAli Haram ']
Process finished with exit code 1
How do I fix this?!!
I tried using
dataset = pd.read_csv(i, header=0, encoding='unicode_escape')
and
with io.open(filename, 'r', encoding='utf-8') as fn:
    lines = fn.readlines()
Neither worked.
The file your program reads contains a byte (at position 355) that is not valid UTF-8.
If the file is supposed to be UTF-8 encoded, then there is an error in your data file. First you need to find out what encoding the file is actually in; it may be something like Latin-1 or cp1252 rather than UTF-8.
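One way to check, as a sketch: guess the encoding with the third-party chardet package (an assumption; install with pip install chardet), then open the file with that guess. The file path below is hypothetical:

import csv
import chardet

path = "/Users/aliharam/Desktop/Lamis File/example.csv"  # hypothetical file

with open(path, 'rb') as f:
    guess = chardet.detect(f.read())  # e.g. {'encoding': 'ISO-8859-1', ...}

with open(path, 'r', encoding=guess['encoding']) as csvfile:
    for row in csv.reader(csvfile):
        print(row)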

Failing to Convert Files from CSV to Excel

Attempting to convert a folder of CSV files to Excel. Unfortunately most of them do not work, and I get the following errors. When I do the same via the Excel front end, saving them from CSV works fine. Any ideas what I might be doing wrong?
import os
import glob
import csv
import openpyxl  # from https://pythonhosted.org/openpyxl/ or PyPI (e.g. via pip)

for csvfile in glob.glob(os.path.join('.', '*.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'rb') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')
I get the following errors:
Traceback (most recent call last):
File "C:\Users\test\Documents\ConvertCSVtoXLSX\2007+.py", line 14, in
ws.cell(row=r, column=c).value = val
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 272, in value
self._bind_value(value)
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 229, in _bind_value
value = self.check_string(value)
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 180, in check_string
value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte
It looks like openpyxl expects to receive data encoded as UTF-8, but the data you are passing it has some other encoding - probably one of the Windows cp* code pages. You can determine the system's default encoding by calling locale.getpreferredencoding(). Let's assume it's cp1252.
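A quick way to check (a minimal sketch; works in Python 2 and 3):

import locale
print(locale.getpreferredencoding())  # e.g. 'cp1252' on many Windows systems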
In the traceback, we can see that this is the failing line:
unicode(value, self.encoding)
resulting in this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte
openpyxl is trying to decode the value it receives as UTF-8, and failing; we can work around this by re-encoding the value before passing it to openpyxl.
for c, val in enumerate(row, start=1):
    fixed_val = unicode(val, 'cp1252').encode('utf-8')
    ws.cell(row=r, column=c).value = fixed_val
If it's possible that some of your files are encoded as UTF-8 and some are encoded in your system's default encoding, you may need to wrap the original assignment in a try/except block:
for c, val in enumerate(row, start=1):
    try:
        ws.cell(row=r, column=c).value = val
    except UnicodeDecodeError:
        fixed_val = unicode(val, 'cp1252').encode('utf-8')
        ws.cell(row=r, column=c).value = fixed_val

return codecs.ascii_decode(input, self.errors)[0]

I am reading a songs file in csv format and I do not know what I am doing wrong.
import csv
import os
import random

file = open("songs.csv", "rU")
reader = csv.reader(file)
for song in reader:
    print(song[0], song[1], song[2])
file.close()
This is the error:
Traceback (most recent call last):
File "/Users/kuku/Desktop/hey/mine/test.py", line 10, in <module>
for song in reader:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 414: ordinal not in range(128)
Try decoding each field of the row from UTF-8 (Python 2):
for song in reader:
    song = [unicode(field, 'utf-8') for field in song]
    print(song[0], song[1], song[2])
With this bit of your code:
for song in reader:
    print(song[0], song[1], song[2])
you are printing elements 0, 1 and 2 of each line in reader on every iteration of the loop. This will cause a (different) error if a line has fewer than 3 elements.
If you don't know that there will be at least 3 elements in each line, you could include the code in a try/except block:
with open("songs.csv", "r") as f:
    song_reader = csv.reader(f)
    for song_line in song_reader:
        lyric = song_line
        try:
            print(lyric[0], lyric[1], lyric[2])
        except IndexError:
            pass  # ...or preferably do something better
It's worth noting that in most cases it is preferable to open a file within a with block, as shown above. This negates the need for file.close().
You can open the file with the UTF-8 encoding specified explicitly:
file = open("songs.csv", "r", encoding="utf-8")

json.loads() gives UnicodeDecodeError when parsing JSON object received from node.js

I am trying to send a JSON object from my node.js server to a Python script. However, when I try to convert the JSON object to a dictionary using json.loads, many inputs raise UnicodeDecodeErrors. What do I need to do to correctly decode the JS object?
Error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2062: character maps to <undefined>
at PythonShell.parseError (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:183:17)
at terminateIfNeeded (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:98:28)
at ChildProcess.<anonymous> (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:88:9)
at emitTwo (events.js:106:13)
at ChildProcess.emit (events.js:191:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:219:12)
at Process.onexit (D:\Users\Temp\Desktop\empman\node_modules\async-listener\glue.js:188:31)
----- Python Traceback -----
File "word.py", line 38, in <module>
json_data=open('data.txt').read()
File "D:\Users\Temp\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
The corresponding Python code:
from docx import Document
from docx.shared import Inches
import sys
import io
import json

document = Document('template.docx')
# newdocument = Document('resume.docx')
# print(sys.argv)  # Note the first argument is always the script filename.
resumearray = []
for x in range(0, 21):
    resumearray.append(input())
#json_data=open('data.txt').read()
f = io.open('data', 'r', encoding='utf-16-le')
# #datastore = json.loads(f.read)
print(f.read())
# text = f.read()
# json_data = text
# document.add_paragraph('_______________________________________________________________________')
#document.add_paragraph(resumearray[1])
k = resumearray[1]
#document.add_paragraph(k)
jsobject = json.loads(k)
document.add_paragraph('_______________________________________________')
#document.add_paragraph(jsobject.values())
for x in range(0, 9):
    if resumearray[x] == '[]':
        document.add_paragraph('nothing was found')
    else:
        document.add_paragraph(resumearray[x])
You are running Python on Windows, where the default text encoding is cp1252. The JSON is encoded as UTF-8, hence the error.
>>> with open('blob.json', encoding='cp1252') as f:
...     j = json.load(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python3.6/json/__init__.py", line 296, in load
return loads(fp.read(),
File "/usr/local/lib/python3.6/encodings/cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2795: character maps to <undefined>
Use utf-8 instead:
>>> with open('blob.json', encoding='utf-8') as f:
...     j = json.load(f)
...
>>> print(len(j))
29
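Applied to the script above, that means opening the file with an explicit encoding. A minimal sketch, assuming the file Node writes (data.txt here) is UTF-8:

import io
import json

# Open with an explicit UTF-8 encoding instead of the Windows default (cp1252).
with io.open('data.txt', 'r', encoding='utf-8') as f:
    datastore = json.load(f)
print(len(datastore))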

codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 318: ordinal not in range(128)

I am trying to open and readlines() a .txt file that contains a large amount of text. Below is my code; I don't know how to solve this problem. Any help would be very appreciated.
file = input("Please enter a .txt file: ")
myfile = open(file)
x = myfile.readlines()
print(x)
When I enter the .txt file, this is the full error message displayed:
line 10, in <module> x = myfile.readlines()
line 26, in decode return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 318: ordinal not in range(128)
Instead of using codecs, I solved it this way:
def test():
    path = './test.log'
    file = open(path, 'r+', encoding='utf-8')
    while True:
        lines = file.readlines()
        if not lines:
            break
        for line in lines:
            print(line)
You must pass the encoding parameter explicitly.
You can also try encoding each line and decoding it back, ignoring problem characters:
with open(file) as f:
    for line in f:
        line = line.encode('ascii', 'ignore').decode('UTF-8', 'ignore')
        print(line)
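A simpler variant of the same idea (not from the original answer; it assumes Python 3 and reuses the question's file variable) is to let open() drop undecodable bytes directly via its errors parameter:

# Undecodable bytes are silently skipped instead of raising UnicodeDecodeError.
with open(file, encoding='utf-8', errors='ignore') as f:
    for line in f:
        print(line)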
@AndriiAbramamov is right, you should check that question. Here is a way you can open your file, which is also at that link:
import codecs

f = codecs.open('words.txt', 'r', 'UTF-8')
for line in f:
    print(line)
Another way is to use a regex, so that when you open the file you can remove special characters such as double quotes.
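A minimal sketch of that regex approach; the file name and the exact characters to strip are assumptions:

import re

# Read permissively, then strip non-ASCII characters and double quotes.
with open('words.txt', 'r', encoding='utf-8', errors='replace') as f:
    for line in f:
        cleaned = re.sub(r'[^\x00-\x7F]', '', line).replace('"', '')
        print(cleaned)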
