Failing to Convert Files from CSV to Excel - python

I'm trying to convert a folder of CSV files to Excel. Unfortunately most of them fail to convert, and I get the errors below. Saving the same CSV files as Excel workbooks through the Excel front end works fine. Any ideas what I might be doing wrong?
import os
import glob
import csv
import openpyxl # from https://pythonhosted.org/openpyxl/ or PyPI (e.g. via pip)
for csvfile in glob.glob(os.path.join('.', '*.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'rb') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')
I get the following errors:
Traceback (most recent call last):
  File "C:\Users\test\Documents\ConvertCSVtoXLSX\2007+.py", line 14, in <module>
    ws.cell(row=r, column=c).value = val
  File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 272, in value
    self._bind_value(value)
  File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 229, in _bind_value
    value = self.check_string(value)
  File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 180, in check_string
    value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte

It looks like openpyxl expects to receive data encoded as UTF-8, but the data you are passing it uses some other encoding, probably one of the Windows cp* code pages. You can determine the system's default encoding by calling locale.getpreferredencoding(). Let's assume it's cp1252.
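As a rough illustration of that mismatch (a Python 3 sketch rather than the asker's Python 2 setup, and the sample bytes are made up), byte 0xa0 on its own is not valid UTF-8, but it is a non-breaking space in cp1252:

```python
import locale

# Hypothetical sample: "10", a cp1252 non-breaking space (0xa0), then "km".
raw = b"10\xa0km"

# The encoding Python assumes for text files on this machine (varies by system).
print("preferred encoding:", locale.getpreferredencoding(False))

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc.reason, "at byte", exc.start)

# Decoding with the code page the bytes were written in succeeds.
text = raw.decode("cp1252")
print(repr(text))
```

If the preferred encoding printed on your machine is not cp1252, substitute that name in the decode call.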
In the traceback, we can see that this is the failing line:
unicode(value, self.encoding)
resulting in this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte
openpyxl is trying to decode the value it receives as UTF-8, and failing; we can work around this by decoding the value from cp1252 and re-encoding it as UTF-8 before passing it to openpyxl:
for c, val in enumerate(row, start=1):
    fixed_val = unicode(val, 'cp1252').encode('utf-8')
    ws.cell(row=r, column=c).value = fixed_val
If it's possible that some of your files are encoded as UTF-8 and some are encoded in your system's default encoding, you may need to wrap the original assignment in a try/except block:
for c, val in enumerate(row, start=1):
    try:
        ws.cell(row=r, column=c).value = val
    except UnicodeDecodeError:
        fixed_val = unicode(val, 'cp1252').encode('utf-8')
        ws.cell(row=r, column=c).value = fixed_val
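On Python 3 the same fallback idea can be applied when the file is opened rather than cell by cell. This is a minimal sketch under assumptions: read_csv_rows is a hypothetical helper, the candidate encodings (UTF-8 first, then cp1252) are a guess that should be adjusted to the real data, and the demo file is made up. The returned rows could then be handed to openpyxl, for example with ws.append(row).

```python
import csv
import os
import tempfile

def read_csv_rows(path, encodings=("utf-8", "cp1252")):
    """Return all rows of a CSV file, trying each candidate encoding in turn."""
    last_error = None
    for encoding in encodings:
        try:
            # newline="" lets the csv module handle line endings itself.
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.reader(f))
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong guess, try the next encoding
    if last_error is None:
        raise ValueError("no encodings given")
    raise last_error

# Demo with a throwaway file containing a cp1252 non-breaking space (0xa0).
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"name,distance\r\nA,10\xa0km\r\n")
rows = read_csv_rows(demo_path)
os.remove(demo_path)
print(rows)
```

Trying UTF-8 first is deliberate: text in another encoding rarely decodes as valid UTF-8 by accident, whereas cp1252 accepts almost any byte sequence and so makes a poor first guess.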

Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte

I've been trying to iterate through a csv file with the following code:
import csv
import os, sys
directory = "/Users/aliharam/Desktop/Lamis File"
files = []
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        files.append(f)
files.pop()
for i in files:
    with open(i, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            print(row)
This is the error I am getting:
Traceback (most recent call last):
  File "/Users/aliharam/PycharmProjects/LamisTasks/Normalization.py", line 16, in <module>
    for row in datareader:
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte
['\tAli Haram \tAli Haram ']
Process finished with exit code 1
How do I fix this?
I tried using
dataset = pd.read_csv(i, header=0, encoding='unicode_escape')
and
with io.open(filename, 'r', encoding='utf-8') as fn:
    lines = fn.readlines()
Neither worked.
The file your program reads contains a byte (0xbf, at position 355) that is not valid UTF-8 at that point, so the 'utf-8' codec cannot decode it.
If the file really is supposed to be UTF-8, then the data file itself is damaged. Otherwise it was probably saved in a different encoding, so first check which encoding the file actually uses and pass that encoding to open().
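One way to start that check is to read the file as raw bytes and look at the byte the codec rejects. The sketch below is only illustrative: find_undecodable is a hypothetical helper, and the sample bytes stand in for the contents of the real file.

```python
def find_undecodable(data, encoding="utf-8"):
    """Return (offset, byte value) of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None

# Hypothetical sample bytes; in practice use open(path, "rb").read().
sample = b"Ali Haram \xbf Ali Haram"
hit = find_undecodable(sample)
if hit is not None:
    offset, value = hit
    print("byte 0x%02x at offset %d is not valid UTF-8" % (value, offset))
    # The surrounding bytes often hint at the real encoding.
    print(sample[max(0, offset - 10):offset + 10])
```

Once the real encoding is known, passing it as encoding=... to open() is usually enough for csv.reader to work.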

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte in python. While the file is encoded in utf-8

I am getting a decoding error in the code below even though the file is already in UTF-8. Please explain how I can solve this issue.
The error is raised on this loop:
for row in reader:
def loadPhishTank(self):
    db = RedirectDB(self.runConfig)
    phishFile = self.runConfig.phishTankLocation+'phishTank-'+self.runConfig.day+'.csv'
    if (not os.path.exists(self.runConfig.phishTankLocation)):
        os.makedirs(self.runConfig.phishTankLocation)
    self.downloadPhishTank(phishFile)
    with open(phishFile, encoding='utf-8') as fin:
        reader = csv.DictReader(fin)
        urls = {}
        for row in reader:
            url = row['url']
            meta = {'phish_detail_url':row['phish_detail_url'],
                    'submission_time':row['submission_time'],
                    'src':'phishTank'}
            urls[url] = meta
        samples = random.sample(list(urls.keys()),self.runConfig.phishTankSampleSize)
        for sample in samples:
            db.addUrlsFromList(sample,urls[sample],'phishTank',self.runConfig)
    db.close()
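One possibility worth checking, stated as an assumption about the file rather than a diagnosis: the byte 0xff never occurs in valid UTF-8, and 0xff at position 0 is how a UTF-16 little-endian byte order mark (FF FE) begins. The sketch below uses a hypothetical helper, sniff_bom, to inspect the first bytes of a file and suggest a codec name; the sample bytes are made up.

```python
import codecs

# Byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(head):
    """Return a codec name if the bytes start with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None

# Hypothetical file start: the text "url" saved as UTF-16 LE with a BOM.
# In practice: with open(phishFile, "rb") as f: head = f.read(4)
head = b"\xff\xfeu\x00r\x00l\x00"
guess = sniff_bom(head)
print(guess, repr(head.decode(guess)) if guess else None)
```

If a BOM is found, the returned name can be passed as the encoding argument to open(). If none is found, the file may not be text at all (for example a compressed download), which needs a different check.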

Python 2: ASCII issue when writing in Excel files

Problem sketch:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
I'm trying to write a simple Python program that fills each blank cell with the value from the nearest non-blank cell above it in the same column.
Since the file contains Chinese characters, I suspected an ASCII problem, so I tried converting the values to UTF-8.
The code is shown below:
#!/usr/bin/python
# -*- coding:utf-8 -*-
from xlrd import open_workbook
from xlwt import Workbook
from xlutils.copy import copy
rb = open_workbook('data.xls', 'utf-8')
wb = copy(rb)
sh = wb.get_sheet(0)
s = rb.sheet_by_index(0)
cols = s.ncols
rows = s.nrows
temp = 0
for cx in range(cols):
    for rx in range(rows):
        if s.cell_value(rowx = rx, colx = cx).encode('utf-8') != "":
            temp = s.cell_value(rowx = rx, colx = cx).encode('utf-8')
            print(temp) #to verify
        else:
            sh.write(rx, cx, temp)
wb.save('data.xls')
However, the issue still happened. Result in terminal:
ZishengdeMacBook-Pro:Downloads zisheng$ python form.py
(printed result ignored, and it looked good)
Traceback (most recent call last):
  File "form.py", line 41, in <module>
    wb.save('data.xls')
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 710, in save
    doc.save(filename_or_stream, self.get_biff_data())
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 674, in get_biff_data
    shared_str_table = self.__sst_rec()
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/Workbook.py", line 636, in __sst_rec
    return self.__sst.get_biff_record()
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/BIFFRecords.py", line 77, in get_biff_record
    self._add_to_sst(s)
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/BIFFRecords.py", line 92, in _add_to_sst
    u_str = upack2(s, self.encoding)
  File "/Users/zisheng/anaconda/lib/python2.7/site-packages/xlwt/UnicodeUtils.py", line 50, in upack2
    us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
Can anyone help? Thanks in advance!
I've figured it out!
To solve this, decode the UTF-8 byte string back to unicode when writing the cell:
sh.write(rx, cx, unicode(temp, 'utf-8'))
And it's done.
Problem solved.
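For comparison, the fill-down logic itself can be sketched without any manual encoding or decoding if cell values are kept as text throughout. This is a Python 3 illustration under assumptions: plain lists stand in for the worksheet, fill_down is a hypothetical helper, and unlike the original script it tracks the last value per column instead of sharing one temp variable.

```python
def fill_down(rows, blank=""):
    """Return a copy of rows where blank cells take the last value seen above."""
    last_seen = {}  # column index -> most recent non-blank value
    filled = []
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if value == blank:
                # Reuse the value from above, or stay blank if there is none yet.
                value = last_seen.get(col, blank)
            else:
                last_seen[col] = value
            new_row.append(value)
        filled.append(new_row)
    return filled

# Made-up sample data with Chinese text and blank cells.
table = [["北京", "A"], ["", "B"], ["上海", ""], ["", ""]]
result = fill_down(table)
for row in result:
    print(row)
```

Because nothing is encoded to bytes along the way, there is no implicit ASCII decode left to fail when the values are finally written out.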

return codecs.ascii_decode(input, self.errors)[0]

I am reading a songs file in csv format and I do not know what I am doing wrong.
import csv
import os
import random
file = open("songs.csv", "rU")
reader = csv.reader(file)
for song in reader:
    print(song[0], song[1], song[2])
file.close()
This is the error:
Traceback (most recent call last):
  File "/Users/kuku/Desktop/hey/mine/test.py", line 10, in <module>
    for song in reader:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 414: ordinal not in range(128)
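Try decoding each field explicitly. Note that this only applies to Python 2, where csv.reader yields byte strings; unicode() does not exist on Python 3:
for song in reader:
    song = [unicode(field, 'utf-8') for field in song]
    print(...)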
With this bit of your code:
for song in reader:
    print( song[0], song[1],song[2])
you are printing elements 0, 1 and 2 of each row the reader yields. This will raise a different error (IndexError) if a row has fewer than 3 elements.
If you don't know that every row has at least 3 elements, you could wrap the code in a try/except block:
with open("songs.csv", "r") as f:
    song_reader = csv.reader(f)
    for song_line in song_reader:
        lyric = song_line
        try:
            print(lyric[0], lyric[1], lyric[2])
        except IndexError:
            pass # ...or preferably do something better
It's worth noting that in most cases it is preferable to open a file within a with block, as shown above. The file is then closed automatically, which removes the need for file.close().
You can open the file with UTF-8 encoding. The "U" mode flag is deprecated on Python 3 (and was removed in 3.11), so plain "r" is used here:
file = open("songs.csv", "r", encoding="utf-8")
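Putting the suggestions in this thread together, a Python 3 sketch might look like the following. It assumes the file really is UTF-8 (byte 0xe2 is consistent with a UTF-8 curly quote, but that is a guess), read_songs is a hypothetical helper, and the demo rows are made up.

```python
import csv
import os
import tempfile

def read_songs(path, min_fields=3):
    """Read a UTF-8 CSV and return the first min_fields columns of each full row."""
    songs = []
    # An explicit encoding avoids the platform default (ASCII in the traceback);
    # newline="" is what the csv module documentation recommends.
    with open(path, "r", encoding="utf-8", newline="") as f:
        for row in csv.reader(f):
            if len(row) < min_fields:
                continue  # skip short rows instead of raising IndexError
            songs.append(tuple(row[:min_fields]))
    return songs

# Demo with a throwaway file; the curly apostrophe encodes as e2 80 99 in UTF-8.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write("Don\u2019t Stop,Artist,1977\nincomplete\n".encode("utf-8"))
songs = read_songs(demo_path)
os.remove(demo_path)
print(songs)
```

Checking the row length up front is a design choice; a try/except IndexError around the indexing would work as well.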

Write ® to csv with csv.writer

I'm trying to write strings with '®' to a csv file:
csvfile = open(destination, "wb")
csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter='\t')
for row in data:
    csv_writer.writerow(row)
csvfile.close()
and row looks like this:
[123, "str", "str2 ®"]
The strings I'm trying to write to the csv file are retrieved from XML, which I believe is encoded as UTF-8.
I get this error:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/ec2-user/django/app/models.py", line 94, in import_data
    load_to_csv(out, out_data)
  File "/home/ec2-user/django/utils/util.py", line 90, in load_to_csv
    csv_writer.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 66: ordinal not in range(128)
Then I tried to encode the string to utf-8:
csvfile = open(destination, "wb")
csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter='\t')
for row in data:
    for i, r in enumerate(row):
        if type(r) is str:
            row[i] = r.encode('utf-8')
    csv_writer.writerow(row)
csvfile.close()
But I still get the same error. Could anyone help? I have been stuck on this for a while.
Your values are unicode objects, not byte strings (str), so the type(r) is str check never matches. Encode the unicode values instead:
for row in data:
    row = [c.encode('utf8') if isinstance(c, unicode) else c for c in row]
    csv_writer.writerow(row)
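On Python 3 the per-value encoding step is not needed, because the csv module writes text and the file object does the encoding. A minimal sketch, assuming Python 3 and using a temporary file in place of the real destination:

```python
import csv
import os
import tempfile

# Same shape as the row in the question; \u00ae is the registered sign.
data = [[123, "str", "str2 \u00ae"]]

fd, destination = tempfile.mkstemp(suffix=".tsv")
os.close(fd)

# Text mode with an explicit encoding; newline="" leaves line endings to csv.
with open(destination, "w", encoding="utf-8", newline="") as csvfile:
    csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter="\t")
    csv_writer.writerows(data)

# Read the raw bytes back to confirm the sign was written as UTF-8 (c2 ae).
with open(destination, "rb") as f:
    written = f.read()
os.remove(destination)
print(written)
```

The "wb" mode used in the question is specific to Python 2; on Python 3 the csv writer expects a file opened in text mode.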
