JSON import in Python

I would like to import the JSON file located at "https://www.drivy.com/cars/458342/reviews?page=1&paginate_per=6&rel=next" in python.
When I run this:
with open('C:/Users/coppe/Documents/py trials/eval.json') as json_file:
    reviews = json.load(json_file)
I get an error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6776: character maps to <undefined>
This error is caused by a special character in the value of the html key. Given that this character is an emoticon (a thumbs-up), how can I still import my JSON and ignore it?

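You need to specify the encoding to use when opening the file. Most JSON files are UTF-8, so use something like:
reviews = json.load(
    open("C:/Users/coppe/Documents/py trials/eval.json", encoding="utf8")
)
or, so that the file is closed automatically:
with open('C:/Users/coppe/Documents/py trials/eval.json', encoding="utf8") as json_file:
    reviews = json.load(json_file)
Note that the encoding belongs on open(), not on json.load(): the error is raised while the file is being read, before json sees any text.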
Good Luck!

Pass the encoding when opening the file (here json_file is the path to the file):
open(json_file, encoding="utf8")
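
To see why the encoding matters, here is a minimal sketch. The payload and the temporary file are made up for illustration; cp1252 stands in for a typical Windows default code page, which may not be the one on your machine. The thumbs-up emoji (U+1F44D) is stored in UTF-8 as the bytes F0 9F 91 8D, and 0x8d has no meaning in cp1252.

```python
import json
import os
import tempfile

# Hypothetical review payload containing a thumbs-up emoji (U+1F44D).
payload = {"html": "Great car \U0001F44D"}

# Write the file as UTF-8, as JSON files usually are.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

# Reading with a legacy Windows code page fails on the emoji's bytes.
try:
    with open(path, encoding="cp1252") as f:
        json.load(f)
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with the encoding the file was written in succeeds.
with open(path, encoding="utf-8") as f:
    reviews = json.load(f)

os.remove(path)
print(cp1252_failed, reviews == payload)
```

Nothing is ignored or dropped here: the emoji survives the round trip once the file is opened as UTF-8.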

Related

Python - Pandas : how to save csv file from url

I'm trying to get a CSV file with requests and save it to my project:
import requests
import pandas as pd
import csv

def get_and_save_countries():
    url = 'https://www.trackcorona.live/api/countries'
    r = requests.get(url)
    data = r.json()
    data = data["data"]
    with open("corona/dash_apps/finished_apps/apicountries.csv","w",newline="") as f:
        title = "location,country_code,latitude,longitude,confirmed,dead,recovered,updated".split(",")
        cw = csv.DictWriter(f,title,delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        cw.writeheader()
        cw.writerows(data)
I've managed that but when I try this:
get_data.get_and_save_countries()
df = pd.read_csv("corona\\dash_apps\\finished_apps\\apicountries.csv")
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
And I have no idea why. Any help is welcome. Thanks.

Try:
with open("corona/dash_apps/finished_apps/apicountries.csv", "w", newline="", encoding='utf-8') as f:
This explicitly sets the encoding of the written file with encoding='utf-8'.

When you write to a file, the default encoding is locale.getpreferredencoding(False). On Windows that is usually not UTF-8, and even on Linux the locale could be configured for something other than UTF-8. pandas defaults to UTF-8 when reading, so pass encoding='utf8' as another parameter to open.
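
A small sketch of the mismatch, using only the standard library so it runs without pandas. The row and the temporary file are invented for illustration, and cp1252 is assumed as the legacy encoding; your platform default may differ.

```python
import csv
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# Hypothetical row; "é" is the single byte 0xe9 in cp1252.
rows = [{"location": "Réunion", "country_code": "re"}]

# Bytes written under a legacy encoding are not valid UTF-8.
try:
    "Réunion".encode("cp1252").decode("utf-8")
    legacy_readable = True
except UnicodeDecodeError:
    legacy_readable = False

# Writing and reading with the same explicit encoding round-trips cleanly.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["location", "country_code"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    read_back = [dict(row) for row in csv.DictReader(f)]
os.remove(path)

print(default_encoding, legacy_readable, read_back == rows)
```

The same rule applies whichever reader you use: the encoding given when writing has to match the one the reader assumes.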

Why do I get a decode error when using json.load in Python?

I'm trying to open a JSON file but get a decode error, and I can't find a solution. How can I decode this data?
The code gives the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 3765: invalid start byte
import json
url = 'users.json'
with open(url) as json_data:
    data = json.load(json_data)

That means that the data you're trying to decode isn't encoded in UTF-8.
EDIT:
You may decode it yourself before parsing it with json, using something like this:
with open(url, 'rb') as f:
    data = f.read()
data_str = data.decode("utf-8", errors='ignore')
parsed = json.loads(data_str)  # loads (not load) parses a string
https://www.tutorialspoint.com/python/string_decode.htm
Be careful: you WILL lose some data during this process. A safer way is to decode with the same encoding that was used to write your JSON file, or to store raw bytes as something like base64.
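
The difference between the two approaches can be sketched as follows. The bytes are a made-up stand-in for the file contents, and Latin-1 is only an assumption about how such a file might have been written (0xf6 is "ö" in Latin-1); check what actually produced your file.

```python
import json

# Hypothetical file contents: JSON saved as Latin-1, so "ö" is the byte 0xf6.
raw = '{"name": "Björn"}'.encode("latin-1")

# errors="ignore" silently drops the byte UTF-8 cannot decode.
lossy = json.loads(raw.decode("utf-8", errors="ignore"))

# Decoding with the encoding the file was written in keeps everything.
exact = json.loads(raw.decode("latin-1"))

print(lossy["name"] == "Bjrn", exact["name"] == "Björn")
```

If the original encoding is known, passing it to open(url, encoding=...) has the same effect as the second variant.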

How to extract String from a Unicoded JSONObject in Python?

I'm getting the error below when I try to parse a string containing Unicode characters such as the ' symbol, emojis, etc.:
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f33b' in position 19: ordinal not in range(128)
Sample Object:
{"user":{"name":"\u0e2a\u0e31\u0e48\u0e07\u0e14\u0e48\u0e27\u0e19 \u0e2b\u0e21\u0e14\u0e44\u0e27 \u0e40\u0e14\u0e23\u0e2a\u0e41\u0e1f\u0e0a\u0e31\u0e48\u0e19\u0e21\u0e32\u0e43\u0e2b\u0e21\u0e48 \u0e23\u0e32\u0e04\u0e32\u0e40\u0e1a\u0e32\u0e46 \u0e2a\u0e48\u0e07\u0e17\u0e31\u0e48\u0e27\u0e44\u0e17\u0e22 \u0e44\u0e14\u0e49\u0e02\u0e2d\u0e07\u0e0a\u0e31\u0e27\u0e23\u0e4c\u0e08\u0e49\u0e32 \u0e2a\u0e19\u0e43\u0e08\u0e15\u0e34\u0e14\u0e15\u0e48\u0e2d\u0e2a\u0e2d\u0e1a\u0e16\u0e32\u0e21 Is it","tag":"XYZ"}}
I'm able to extract tag value, but I'm unable to extract name value.
Here is my code:
dict = json.loads(json_data)
print('Tag - ' + dict['user']['tag'])
print('Name - ' + dict['user']['name'])

You can save the data in CSV format, which can also be opened in Excel. When you open a file with open(filename, "w"), Python uses the platform's default encoding, which may be ASCII or another encoding that cannot represent these characters; writing such Unicode data then raises UnicodeEncodeError. To store Unicode data, open the file with UTF-8 encoding.
mydict = json.loads(json_data)  # or whatever dictionary it is...
# Open the file with UTF-8 encoding, most important step
with open("userdata.csv", "w", encoding='utf-8') as f:
    f.write(mydict['user']['name'] + ", " + mydict['user']['tag'] + "\n")
Feel free to change the code based on the data you have.
That's it...
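
If the goal is only to print the value on a console that is limited to ASCII, one possible sketch is to escape the characters instead of writing a file. The payload below is a shortened, made-up version of the sample object, and whether escaped output is acceptable depends on what you need the text for.

```python
import json

# Hypothetical payload in the same shape as the sample object.
json_data = json.dumps(
    {"user": {"name": "\u0e2a\u0e31\u0e48\u0e07 \U0001f33b", "tag": "XYZ"}}
)
data = json.loads(json_data)
name = data["user"]["name"]

# The parsed string is fine; only encoding it as ASCII fails.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Replace every non-ASCII character with a backslash escape.
safe_name = name.encode("ascii", errors="backslashreplace").decode("ascii")
print("Tag - " + data["user"]["tag"])
print("Name - " + safe_name)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another option when the terminal itself can display UTF-8.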

Python: Assistance with json and reading from a file

Say I have a Notepad file (.txt) with the following content:
"Hello I am really bad at programming"
Using json, how would I get the sentence from the file into the Python program so that I can use it as a variable?
So far I have this code:
newfile = open((compfilename)+'.txt', 'r')
saveddata = json.load(newfile)
orgsentence = saveddata[0]
I always get this error:
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Thanks in advance for any help!

Since you are using a .txt file, you could read it without json. But as you asked in the question, you can try it like this.
hello.txt
"Hello I am really bad at programming"
To read this txt file:
import json
from pprint import pprint

# explicit encoding, assuming the file is saved as UTF-8
with open('hello.txt', encoding='utf-8') as myfile:
    mydata = json.load(myfile)  # parse the file contents as JSON
    # to get the raw text without json, call myfile.read() instead of json.load
pprint(mydata)
Output:
'Hello I am really bad at programming'

import json
with open('file.txt') as f:
    data = json.load(f)
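
Since the question indexes the result with saveddata[0], here is a hedged sketch of the whole round trip, assuming the sentence was meant to be stored as a one-element JSON list. The file name is hypothetical, and both sides use UTF-8 explicitly so the 'ascii' default never comes into play.

```python
import json
import os
import tempfile

sentence = "Hello I am really bad at programming"

# Hypothetical file name standing in for compfilename + '.txt'.
path = os.path.join(tempfile.mkdtemp(), "compfile.txt")

# Save the sentence as a JSON list, encoded as UTF-8.
with open(path, "w", encoding="utf-8") as newfile:
    json.dump([sentence], newfile)

# Load it back with the same encoding and take the first element.
with open(path, encoding="utf-8") as newfile:
    saveddata = json.load(newfile)
orgsentence = saveddata[0]

print(orgsentence)
```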

UnicodeDecodeError: 'gbk' codec can't decode byte when read json contains chinese

I'm switching from Python 2 to 3.
In my Jupyter notebook the code is:
file = "./data/test.json"
with open(file) as data_file:
    data = json.load(data_file)
It used to work fine with Python 2, but now that I have switched to Python 3, it gives me this error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 123: illegal multibyte sequence
The test.json file is like this:
[{
    "name": "Daybreakers",
    "detail_url": "http://www.movieinsider.com/m4120/daybreakers/",
    "movie_tt_id": "中文"
}]
If I delete the chinese, there will be no error.
So what should I do?
There are a lot of similar questions in SO, but I didn't find a good solution for my case. If you find an applicable one, please tell me and I'll close this one.
Thanks a lot!

You need to specify the correct encoding when you open the file. If the JSON is encoded with UTF-8 you can do this:
import json

fname = "test.json"
with open(fname, encoding='utf-8') as data_file:
    data = json.load(data_file)
print(data)
output
[{'name': 'Daybreakers', 'detail_url': 'http://www.movieinsider.com/m4120/daybreakers/', 'movie_tt_id': '中文'}]
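
Another way to sidestep the locale default, sketched here under the assumption that the file is UTF-8: read the file as bytes and hand them to json.loads, which accepts bytes since Python 3.6 and expects UTF-8, UTF-16 or UTF-32 input. The temporary file below is a stand-in for test.json.

```python
import json
import os
import tempfile

movies = [{"name": "Daybreakers", "movie_tt_id": "中文"}]

# Stand-in for test.json, written as UTF-8.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(movies, f, ensure_ascii=False)

# Binary mode: no locale-dependent decoding happens in open().
with open(path, "rb") as data_file:
    data = json.loads(data_file.read())

os.remove(path)
print(data == movies)
```

This does not help for files in other encodings such as GBK itself; those still need an explicit encoding passed to open().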
