Write ® to csv with csv.writer - python

I'm trying to write strings with '®' to a csv file:
csvfile = open(destination, "wb")
csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter='\t')
for row in data:
    csv_writer.writerow(row)
csvfile.close()
and row looks like this:
[123, "str", "str2 ®"]
The strings I'm trying to write to the CSV file are retrieved from XML, which I believe is encoded as UTF-8.
I get this error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/ec2-user/django/app/models.py", line 94, in import_data
load_to_csv(out, out_data)
File "/home/ec2-user/django/utils/util.py", line 90, in load_to_csv
csv_writer.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 66: ordinal not in range(128)
Then I tried to encode the string to utf-8:
csvfile = open(destination, "wb")
csv_writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter='\t')
for row in data:
    for i, r in enumerate(row):
        if type(r) is str:
            row[i] = r.encode('utf-8')
    csv_writer.writerow(row)
csvfile.close()
But I still get the same error. Could anyone help? I have been stuck on this for a while.

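You have Unicode values, not byte strings. Your check type(r) is str never matches them, because in Python 2 unicode is a separate type from str, so nothing was encoded. Test for unicode and encode those values:
for row in data:
    row = [c.encode('utf8') if isinstance(c, unicode) else c for c in row]
    csv_writer.writerow(row)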
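For readers on Python 3, where the csv module works with text rather than bytes, the usual approach is to open the file in text mode with an explicit encoding instead of encoding each cell. A minimal sketch, assuming a hypothetical helper write_rows and a temporary output path (neither comes from the question):

```python
import csv
import os
import tempfile


def write_rows(destination, rows):
    """Write rows to a tab-delimited file encoded as UTF-8 (Python 3 only)."""
    # newline='' lets the csv module control line endings itself.
    with open(destination, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL, delimiter="\t")
        writer.writerows(rows)


# Hypothetical demo data mirroring the row shown in the question.
demo_path = os.path.join(tempfile.mkdtemp(), "out.tsv")
write_rows(demo_path, [[123, "str", "str2 ®"]])

# Read the file back with the same encoding to check the round trip.
with open(demo_path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f, delimiter="\t"))
```

The csv reader returns every field as a string, so 123 comes back as "123".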

Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte

I've been trying to iterate through a csv file with the following code:
`
import csv
import os, sys
directory = "/Users/aliharam/Desktop/Lamis File"
files = []
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        files.append(f)
files.pop()
for i in files:
    with open(i, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            print(row)
`
This is the error I am getting:
Traceback (most recent call last):
File "/Users/aliharam/PycharmProjects/LamisTasks/Normalization.py", line 16, in <module>
for row in datareader:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 355: invalid start byte
['\tAli Haram \tAli Haram ']
Process finished with exit code 1
How do I fix this?
I tried using
dataset = pd.read_csv(i, header=0, encoding='unicode_escape')
and
with io.open(filename, 'r', encoding='utf-8') as fn:
    lines = fn.readlines()
but neither worked.
The file your program reads contains a byte (0xbf, at position 355) that is not valid at that point in UTF-8.
Either the file is not encoded as UTF-8, or the data is damaged. First find out which encoding the file actually uses, then pass that encoding to open().
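One way to narrow down the encoding is to read the raw bytes and try a short list of candidates in order. This is only a sketch: the helper name read_text_with_fallback and the candidate list are assumptions, and a decode that succeeds does not prove the encoding is the right one.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode %r" % path)


# Demo file: a lone 0xbf byte is not valid UTF-8, but it is '¿' in cp1252.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "wb") as f:
    f.write(b"name,note\nAli,\xbfque?\n")

text, used = read_text_with_fallback(demo_path)
```

Once the encoding is known, pass it straight to open(i, 'r', encoding=...) before handing the file to csv.reader.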

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 123: character maps to <undefined>

I wrote this code:
#app.route('/cafes')
def cafes():
    with open('cafe-data.csv', newline='') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        list_of_rows = []
        for row in csv_data:
            list_of_rows.append(row)
    return render_template('cafes.html', cafes=list_of_rows)
But I got the error in the title on my website.
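You need to specify the encoding when opening the file. If you omit it, open() uses the platform's preferred encoding (see locale.getpreferredencoding()), which on Windows is commonly 'cp1252'; that is the 'charmap' codec named in the error. If the file is actually UTF-8, say so explicitly:
with open('cafe-data.csv', newline='', encoding='utf-8') as csv_file:
    ...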
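The 'charmap' message is typical of UTF-8 bytes being decoded as cp1252. The sketch below reproduces that with a single character chosen for illustration ('ŏ', whose UTF-8 form happens to contain the byte 0x8f); the actual contents of cafe-data.csv are unknown.

```python
import locale

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

raw = "ŏ".encode("utf-8")  # two bytes; the second one is 0x8f

# cp1252 has no character assigned to 0x8f, so decoding fails.
try:
    raw.decode("cp1252")
    cp1252_failed = False
    failure_reason = ""
except UnicodeDecodeError as exc:
    cp1252_failed = True
    failure_reason = str(exc)

# Decoding with the encoding the bytes were written in works.
decoded = raw.decode("utf-8")
```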

reading .dat file in python (Agilent 4294A Precision Impedance Analyzer)

I've been trying to read a .dat file from an Agilent impedance analyzer. I keep getting the same error regardless of the method I try. Any ideas how to get around this issue?
Thanks in advance.
# import csv
# Method 1
# with open("RP.dat") as infile, open("outfile.csv", "w") as outfile:
# csv_writer = csv.writer(outfile)
# prev = ''
# csv_writer.writerow(['ID', 'PARENT_ID'])
# for line in infile.read().splitlines():
# csv_writer.writerow([line, prev])
# prev = line
# Method 2
# import numpy as np
# filename = 'RP.dat'
# indata = np.loadtxt(filename)
# print(indata)
# Method 3
with open("RP.dat") as infile:
    file_contents = infile.readlines()
    print(file_contents)
C:\Users\benjy\Workspace\urop>python read_dat.py
Traceback (most recent call last):
File "C:\Users\benjy\Workspace\urop\read_dat.py", line 17, in <module>
file_contents = infile.readlines()
File "C:\Users\benjy\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 672: character maps to <undefined>
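You can use the codecs module and ignore bytes that cannot be decoded. Note that errors='ignore' silently drops those bytes, so check that nothing important is lost:
import codecs
with codecs.open('RP.dat', errors='ignore', encoding='utf-8') as f:
    dat = f.read()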
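To see what the different error handlers do with an undecodable byte, here is a small sketch on made-up bytes (the real contents of RP.dat are not known, and the file may not be plain text at all). Decoding as latin-1 is shown as an alternative because it maps every byte to a character and so never raises; that choice is an assumption, not something the analyzer's format guarantees.

```python
# Hypothetical sample: 0x81 on its own is not valid UTF-8.
raw = b"Z=\x81 12.5\n"

# 'ignore' drops the bad byte without any trace.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' marks the spot with U+FFFD so the loss stays visible.
replaced = raw.decode("utf-8", errors="replace")

# latin-1 maps each byte to one code point, so the bytes can be recovered.
latin1 = raw.decode("latin-1")
round_trip = latin1.encode("latin-1")
```

The built-in open() accepts the same encoding and errors arguments, so open("RP.dat", encoding="utf-8", errors="replace") behaves like the second case.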

Python UnicodeEncodeError involving 'charmap' codec

This code was working fine before, but now when I try to write a list to a CSV file I get this error:
File "C:/Users/wf5931/OneDrive - ENGIE/Documents/Python Scripts/Scrape Vehicle Reg Info/vehicleRegChecker 6.1.py", line 109, in openFile
writer.writerow(x)
File "C:\Users\wf5931\AppData\Local\Continuum\anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2082' in position 78: character maps to <undefined>
from this:
with open(vehicleRegInformation, 'w', newline='') as f:
    writer = csv.writer(f)
    for x in vehicleRegInfo:
        writer.writerow(x)
Try adding encoding="utf-8":
with open(vehicleRegInformation, 'w', newline='', encoding="utf-8") as f:
    writer = csv.writer(f)
    for x in vehicleRegInfo:
        writer.writerow(x)
Add an encoding argument when opening the file:
with open(vehicleRegInformation, 'w', newline='', encoding='utf8') as f:
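A likely reason the code "was working fine before" is that earlier data only contained characters cp1252 can represent, while the new data includes U+2082 (subscript two), which it cannot. The sketch below checks that on a made-up row and writes it through a UTF-8 text stream; the row contents and the in-memory buffer are assumptions for illustration.

```python
import csv
import io

# Hypothetical row containing U+2082 (subscript two).
row = ["CO\u2082 emissions", "120 g/km"]

# cp1252 has no mapping for U+2082, so encoding raises.
try:
    row[0].encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# The same row written through a UTF-8 text wrapper succeeds.
buffer = io.BytesIO()
text = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
csv.writer(text).writerow(row)
text.flush()
written = buffer.getvalue()
```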

Failing to Convert Files from CSV to Excel

I'm attempting to convert a folder of CSV files to Excel. Unfortunately, most of them fail with the errors below. When I save the same CSV files through the Excel front end, it works fine. Any ideas what I might be doing wrong?
import os
import glob
import csv
import openpyxl # from https://pythonhosted.org/openpyxl/ or PyPI (e.g. via pip)
for csvfile in glob.glob(os.path.join('.', '*.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'rb') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile + '.xlsx')
Get the following errors:
Traceback (most recent call last):
File "C:\Users\test\Documents\ConvertCSVtoXLSX\2007+.py", line 14, in <module>
ws.cell(row=r, column=c).value = val
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 272, in value
self._bind_value(value)
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 229, in _bind_value
value = self.check_string(value)
File "C:\Python27\ArcGIS10.7\lib\site-packages\openpyxl\cell\cell.py", line 180, in check_string
value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte
It looks like openpyxl expects to receive data encoded as UTF-8, but the data you are passing it has some other encoding, probably one of the Windows cp* code pages. You can determine the system's default encoding by calling locale.getpreferredencoding(). Let's assume it's cp1252.
In the traceback, we can see that this is the failing line:
unicode(value, self.encoding)
resulting in this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 30: invalid start byte
openpyxl is trying to decode the value it receives as UTF-8 and failing; we can work around this by decoding the value from cp1252 and re-encoding it as UTF-8 before passing it to openpyxl:
for c, val in enumerate(row, start=1):
    fixed_val = unicode(val, 'cp1252').encode('utf-8')
    ws.cell(row=r, column=c).value = fixed_val
If it's possible that some of your files are encoded as UTF-8 and some are encoded in your system's default encoding, you may need to wrap the original assignment in a try/except block:
for c, val in enumerate(row, start=1):
    try:
        ws.cell(row=r, column=c).value = val
    except UnicodeDecodeError:
        fixed_val = unicode(val, 'cp1252').encode('utf-8')
        ws.cell(row=r, column=c).value = fixed_val
