I'm trying to fetch data with requests and save it as a CSV file in my project:
import requests
import pandas as pd
import csv
def get_and_save_countries():
    url = 'https://www.trackcorona.live/api/countries'
    r = requests.get(url)
    data = r.json()
    data = data["data"]
    with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="") as f:
        title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
        cw = csv.DictWriter(f,title,delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        cw.writeheader()
        cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.
Try:
with open("corona/dash_apps/finished_apps/apicountries.csv", "w", newline="", encoding="utf-8") as f:
to explicitly specify the encoding with encoding='utf-8'
When you write to a file without specifying an encoding, the default is locale.getpreferredencoding(False). On Windows that is usually not UTF-8, and even on Linux the locale could be configured as something other than UTF-8. pandas.read_csv defaults to UTF-8 when reading, so pass encoding='utf-8' as another parameter to open() so that the writer and the reader agree.
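To illustrate the point, here is a minimal sketch that uses only the standard library. The sample rows, the reduced set of field names, and the temporary file path are made up for illustration (they are not the real API response); "Réunion" is there only because it contains a non-ASCII character of the kind that causes this sort of error.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the API response.
rows = [
    {"location": "Réunion", "country_code": "re", "confirmed": 1},
    {"location": "France", "country_code": "fr", "confirmed": 2},
]
fieldnames = ["location", "country_code", "confirmed"]

# Temporary path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "apicountries.csv")

# Write with an explicit encoding instead of relying on the locale default.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read back with the same encoding; the text round-trips unchanged.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.DictReader(f))

print(read_back[0]["location"])
```

Because the file is now written as UTF-8, a reader that assumes UTF-8 (as pandas does by default) should decode it without complaint.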
Related
I would like to import the JSON file located at "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next" in python.
When I run this:
import json
with open('C:/Users/coppe/Documents/py trials/eval.json') as json_file:
    reviews = json.load(json_file)
I get an error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6776: character maps to <undefined>
The error is caused by a special character contained in the value of the html key. Knowing that this character is an emoticon (a thumb), how can I still import my JSON without it failing on this character?
You need to specify the encoding to use when opening the file. Most JSON files are UTF-8, so use something like:
reviews = json.load(
open("C:/Users/coppe/Documents/py trials/eval.json", encoding="utf8")
)
or
with open('C:/Users/coppe/Documents/py trials/eval.json', encoding="utf8") as json_file:
    reviews = json.load(json_file)  # the encoding belongs on open(); json.load() no longer accepts it (removed in Python 3.9)
Good Luck!
use
open(json_file, encoding="utf8")
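A small sketch of why the thumb character trips the default Windows codec. It assumes the emoticon is U+1F44D and that the default encoding on that machine was cp1252; neither is stated in the question. The JSON content and the temporary file path are invented for the demonstration.

```python
import json
import os
import tempfile

thumb = "\U0001F44D"  # thumbs-up emoji (assumed to be the character in question)
utf8_bytes = thumb.encode("utf-8")
print(utf8_bytes)  # b'\xf0\x9f\x91\x8d' -- note the trailing 0x8d byte

# cp1252 has no character assigned to byte 0x8d, so decoding fails there.
try:
    utf8_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
print(cp1252_failed)

# Hypothetical file standing in for eval.json.
path = os.path.join(tempfile.mkdtemp(), "eval.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"html": "Great car " + thumb}, f, ensure_ascii=False)

# Opening with the matching encoding keeps the emoji intact.
with open(path, encoding="utf-8") as f:
    reviews = json.load(f)
print(reviews["html"])
```

So nothing has to be ignored or stripped: once the file is opened as UTF-8, the emoji is read like any other character.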
I'm trying to open a JSON file but get a decode error, and I can't find a solution for it. How can I decode this data?
The code gives the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 3765: invalid start byte
import json
url = 'users.json'
with open(url) as json_data:
    data = json.load(json_data)
That means that the data you're trying to decode isn't encoded in UTF-8: byte 0xf6 can never start a UTF-8 sequence.
EDIT:
You may decode it before loading it with json using something like this:
with open(url, 'rb') as f:
    data = f.read()
data_str = data.decode("utf-8", errors='ignore')
data = json.loads(data_str)  # json.loads() parses a str; json.load() expects a file object
https://www.tutorialspoint.com/python/string_decode.htm
Be careful: errors='ignore' silently drops every byte that cannot be decoded, so you WILL lose some data during this process. A safer way is to decode with the same encoding that was used to write your JSON file, or to store raw bytes in an ASCII-safe form such as base64.
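If the original encoding is unknown, one option is to try a short list of candidates instead of discarding bytes. This is only a sketch: load_json_with_fallback is a hypothetical helper, the candidate list ("utf-8", then "cp1252") is a guess that suits Western European data, and the sample file is made up. Note that 'ö' is byte 0xf6 in cp1252, which matches the byte in the error message, but that does not prove the real file uses cp1252.

```python
import json
import os
import tempfile


def load_json_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: parse the file with the first encoding that decodes cleanly.

    Returns a (data, encoding_used) tuple.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
        return json.loads(text), enc
    raise ValueError("none of the candidate encodings could decode " + path)


# Made-up sample: a JSON file written as cp1252, where 'ö' is the single byte 0xf6.
path = os.path.join(tempfile.mkdtemp(), "users.json")
with open(path, "wb") as f:
    f.write('{"name": "J\u00f6rg"}'.encode("cp1252"))

data, used = load_json_with_fallback(path)
print(data, used)
```

A fallback that decodes without error can still be the wrong encoding, so it is worth checking a few known non-ASCII values in the result.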
I checked similar questions before writing here. I also tried try/except, where try does nothing and except prints the bad line, but that didn't solve my issue. Currently I have:
import csv  # needed for csv.QUOTE_NONE below
import pandas as pd
import chardet
# Read the file
with open("full_data.csv", 'rb') as f:
    result = chardet.detect(f.read()) # or readline if the file is large
# Note: error_bad_lines was removed in pandas 2.0; use on_bad_lines='skip' there
df1 = pd.read_csv("full_data.csv", sep=';',
                  encoding=result['encoding'], error_bad_lines=False, low_memory=False, quoting=csv.QUOTE_NONE)
But I still get the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 9: invalid start byte
Is there any option similar to errors='replace' in open() for reading a CSV? Or any other solution?
Using the engine option solves my problem:
df1 = pd.read_csv("full_data.csv", sep=";", engine="python")
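On the question of an errors='replace' equivalent: a rough sketch with only the standard library is shown here. The file content is invented (Latin-1 bytes containing 0xba, the byte from the error message), and decoding it as UTF-8 is a deliberate mismatch to show what the replacement looks like.

```python
import csv
import os
import tempfile

# Made-up sample: Latin-1 bytes where 'º' is the single byte 0xba,
# which is not valid at that position in UTF-8.
path = os.path.join(tempfile.mkdtemp(), "full_data.csv")
with open(path, "wb") as f:
    f.write("id;label\n1;n\u00ba 5\n".encode("latin-1"))

# errors="replace" swaps each undecodable byte for U+FFFD instead of raising.
with open(path, newline="", encoding="utf-8", errors="replace") as f:
    rows = list(csv.reader(f, delimiter=";"))

print(rows)
```

pd.read_csv also accepts an open file object, so a handle opened this way can be passed to it directly, and pandas 1.3 and later has an encoding_errors parameter on read_csv for the same purpose. Replaced characters are lost, so this is a workaround rather than a fix for a wrongly detected encoding.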
I have an ISO-8859-9 encoded CSV file and am trying to read it into a DataFrame.
Here is the code and the error I got.
iller = pd.read_csv('/Users/me/Documents/Works/map/dist.csv' ,sep=';',encoding='iso-8859-9')
iller.head()
and error is
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 250: ordinal not in range(128)
and code below works without error.
import codecs
myfile = codecs.open('/Users/me/Documents/Works/map/dist.csv', "r",encoding='iso-8859-9')
for a in myfile:
    print a
My question is: why is pandas not reading my correctly encoded file, and is there any way to make it read it?
It's not possible to see what could be off with your data, of course, but if you can read the data without issues using codecs, then one idea would be to write the file out again in UTF-8:
import codecs
filename = '/Users/me/Documents/Works/map/dist.csv'
target_filename = '/Users/me/Documents/Works/map/dist-utf-8.csv'
myfile = codecs.open(filename, "r",encoding='iso-8859-9')
f_contents = myfile.read()
or
import codecs
with codecs.open(filename, 'r', encoding='iso-8859-9') as fh:
    f_contents = fh.read()
# write out in UTF-8
with codecs.open(target_filename, 'w', encoding='utf-8') as fh:
    fh.write(f_contents)
I hope this helps!
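For anyone on Python 3, the same conversion can be sketched with the built-in open(), which takes an encoding argument directly. The transcode function name, the chunk size, and the sample content are all made up for illustration; the paths point at a temporary directory rather than the real files.

```python
import os
import tempfile


def transcode(src, dst, src_encoding="iso-8859-9", dst_encoding="utf-8"):
    """Hypothetical helper: copy a text file while changing its encoding."""
    # newline="" on both sides leaves the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fin.read(64 * 1024), ""):
            fout.write(chunk)


# Made-up sample with Turkish characters, written as ISO-8859-9.
sample = "il;n\u00fcfus\n\u0130stanbul;1\n\u015eanl\u0131urfa;2\n"
workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "dist.csv")
target_filename = os.path.join(workdir, "dist-utf-8.csv")
with open(filename, "wb") as f:
    f.write(sample.encode("iso-8859-9"))

transcode(filename, target_filename)

with open(target_filename, "rb") as f:
    converted = f.read().decode("utf-8")
print(converted)
```

The resulting UTF-8 file can then be read by tools that assume UTF-8 without passing an encoding at all.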
I'm having a problem with decoding in Python. I'm trying to fetch an IMDB page (example address: http://www.imdb.com/title/tt2216240/):
import urllib.request
address = 'http://www.imdb.com/title/tt2216240/'  # the example address
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
page = response.read().decode('utf-8', 'ignore')
with open('film.html', 'w') as f:
    print(page, file=f)
I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>
Try to explicitly specify utf-8 file encoding:
with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
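The reason this helps: without an encoding argument, open() in text mode uses the locale's default codec, and a legacy "charmap" code page cannot represent every character. Here is a small sketch of that; cp1251 is picked purely as an example of a code page with no 'æ' (the question does not say which code page was in use), and the page text and file path are invented.

```python
import os
import tempfile

# Made-up page content containing 'æ' (U+00E6), the character from the error.
page = "<title>Encyclop\u00e6dia</title>"

# A legacy code page without 'æ' cannot encode the text at all.
try:
    page.encode("cp1251")
    legacy_failed = False
except UnicodeEncodeError:
    legacy_failed = True
print(legacy_failed)

# UTF-8 can encode every Unicode character, so writing succeeds.
path = os.path.join(tempfile.mkdtemp(), "film.html")
with open(path, "w", encoding="utf-8") as f:
    print(page, file=f)

with open(path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

Note that this is an encode error on writing, so the decode('utf-8', 'ignore') call on the response was never the failing step.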
Have you tried the requests library? It makes this simpler:
# samplerequest.py
import requests
address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)
print(req.text)
print(req.encoding)