pandas reading csv file encoding error

pandas reading csv file encoding error - python

i have a iso8859-9 encoded csv file and trying to read it into a dataframe.
here is the code and error I got.
iller = pd.read_csv('/Users/me/Documents/Works/map/dist.csv' ,sep=';',encoding='iso-8859-9')
iller.head()
and error is
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 250: ordinal not in range(128)
and code below works without error.
import codecs
myfile = codecs.open('/Users/me/Documents/Works/map/dist.csv', "r",encoding='iso-8859-9')
for a in myfile:
print a
My question is why pandas not reading my correctly encoded file ? and is there any way to make it read?

Not possible to see what could be off with you data of course, but if you can read in the data without issues with codecs, then maybe an idea would be to write out the file to UTF encoding(?)
import codecs
filename = '/Users/me/Documents/Works/map/dist.csv'
target_filename = '/Users/me/Documents/Works/map/dist-utf-8.csv'
myfile = codecs.open(filename, "r",encoding='iso-8859-9')
f_contents = myfile.read()
or
import codecs
with codecs.open(filename, 'r', encoding='iso-8859-9') as fh:
f_contents = fh.read()
# write out in UTF-8
with codecs.open(target_filename, 'w', encoding = 'utf-8') as fh:
fh.write(f_contents)
I hope this helps!

Related

Python - Pandas : how to save csv file from url

so I'm trying to get a csv file with requests and save it to my project:
import requests
import pandas as pd
import csv
def get_and_save_countries():
url = 'https://www.trackcorona.live/api/countries'
r = requests.get(url)
data = r.json()
data = data["data"]
with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="") as f:
title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
cw = csv.DictWriter(f,title,delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
cw.writeheader()
cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.

Try:
with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="", encoding ='utf-8') as f:
to explicitly specify the encoding with encoding='utf-8'

When you write to a file, the default encoding is locale.getpreferredencoding(False). On Windows that is usually not UTF-8 and even on Linux the terminal could be configured other than UTF-8. Pandas is defaulting to utf-8, so specify encoding='utf8' as another parameter to open.

Python: Assistance with json and reading from a file

Say i have a notepad file (.txt) with the following content:
"Hello I am really bad at programming"
Using json, how would I get the sentence from the file to the python program which I can then use as a variable?
So far I have this code:
newfile = open((compfilename)+'.txt', 'r')
saveddata = json.load(newfile)
orgsentence = saveddata[0]
I always get this error:
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Thanks in advance for any help!

Though you are using txt file. You could read this file without json. But as you mentioned in the question, you can try like this
hello.txt
"Hello I am really bad at programming"
To read this txt file,
import json
from pprint import pprint
with open('hello.txt') as myfile:
mydata = json.load(myfile) #to load json
print myfile.read() #to print contents on stdout, not using json load
pprint(mydata)
Output:
u'Hello I am really bad at programming'

import json
with open('file.txt') as f:
data = json.load(f)

Writing CSV file with umlauts causing "UnicodeEncodeError: 'ascii' codec can't encode character"

I am trying to write characters with double dots (umlauts) such as ä, ö and Ö. I am able to write it to the file with data.encode("utf-8") but the result b'\xc3\xa4\xc3\xa4\xc3\x96' is not nice (UTF-8 as literal characters). I want to get "ääÖ" as written stored to a file.
How can I write data with umlaut characters to a CSV file in Python 3?
import csv
data="ääÖ"
with open("test.csv", "w") as fp:
a = csv.writer(fp, delimiter=";")
data=resultFile
a.writerows(data)
Traceback:
File "<ipython-input-280-73b1f615929e>", line 5, in <module>
a.writerows(data)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 15: ordinal not in range(128)

Add a parameter encoding to the open() and set it to 'utf8'.
import csv
data = "ääÖ"
with open("test.csv", 'w', encoding='utf8') as fp:
a = csv.writer(fp, delimiter=";")
a.writerows(data)
Edit: Removed the use of io library as open is same as io.open in Python 3.

This solution should work on both python2 and 3 (not needed in python3):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
data="ääÖ"
with open("test.csv", "w") as fp:
a = csv.writer(fp, delimiter=";")
a.writerows(data)
Credits to:
Working with utf-8 encoding in Python source

Python: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

I'm currently have an issue with my python 3 code.
replace_line('Products.txt', line, tenminus_str)
Is the line I'm trying to turn into utf-8, however when I try to do this like I would with others, I get errors such as no attribute ones and when I try to add, for example...
.decode("utf8")
...to the end of it, I still get errors that it is using ascii. I also tried other methods that worked with other lines such as adding io. infront and adding a comma with
encoding = 'utf8'
The function that I am using for replace_line is:
def replace_line(file_name, line_num, text):
lines = open(file_name, 'r').readlines()
lines[line_num] = text
out = open(file_name, 'w')
out.writelines(lines)
out.close()
How would I fix this issue? Please note that I'm very new to Python and not advanced enough to do debugging well.
EDIT: Different fix to this question than 'duplicate'
EDIT 2:I have another error with the function now.
File "FILELOCATION", line 45, in refill replace_line('Products.txt', str(line), tenminus_str)
File "FILELOCATION", line 6, in replace_line lines[line_num] = text
TypeError: list indices must be integers, not str
What does this mean and how do I fix it?

Change your function to:
def replace_line(file_name, line_num, text):
with open(file_name, 'r', encoding='utf8') as f:
lines = f.readlines()
lines[line_num] = text
with open(file_name, 'w', encoding='utf8') as out:
out.writelines(lines)
encoding='utf8' will decode your UTF-8 file correctly.
with automatically closes the file when its block is exited.
Since your file started with \xef it likely has a UTF-8-encoding byte order mark (BOM) character at the beginning. The above code will maintain that on output, but if you don't want it use utf-8-sig for the input encoding. Then it will be automatically removed.

codecs module is just what you need. detail here
import codecs
def replace_line(file_name, line_num, text):
f = codecs.open(file_name, 'r', encoding='utf-8')
lines = f.readlines()
lines[line_num] = text
f.close()
w = codecs.open(file_name, 'w', encoding='utf-8')
w.writelines(lines)
w.close()

Handling coding problems You can try adding the following settings to your head
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Type = sys.getfilesystemencoding()

Try adding encoding='utf8' if you are reading a file
with open("../file_path", encoding='utf8'):
# your code

unicode encoding error using unicodecsv

I write a list into a csv file using unicodecsv module, encoding it by "utf-8", but when I try to read it using unicodecsv.reader, I still get the error:UnicodeDecodeError: 'utf8' codec can't decode byte.... I can read it in by csv.reader. Is there something that I am missing?
My codes are like this:
with open(datapath + filename, 'wb') as csvfile:
writer_to_csv = unicodecsv.writer(csvfile, encoding = "utf-8")
writer_to_csv.writerows(data)
When I try to read it:
with open(datapath + filename, 'rb') as csvfile:
file_to_list = unicodecsv.reader(csvfile, encoding = "utf-8")
I got the error message.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

pandas reading csv file encoding error - python

Related

Python - Pandas : how to save csv file from url

Python: Assistance with json and reading from a file

Writing CSV file with umlauts causing "UnicodeEncodeError: 'ascii' codec can't encode character"

Python: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

unicode encoding error using unicodecsv

Categories

Resources