How to parse nested json fields in list into dataframe?

I am making API calls and getting back a nested JSON response for every ID.
If I run the API call for one ID, the JSON looks like this:
u'{"id":26509,"name":"ORD.00001","order_type":"sales","consumer_id":415372,"order_source":"in_store","is_submitted":0,"fulfillment_method":"in_store","order_total":150,"balance_due":150,"tax_total":0,"coupon_total":0,"order_status":"cancelled","payment_complete":null,"created_at":"2017-12-02 19:49:15","updated_at":"2017-12-02 20:07:25","products":[{"id":48479,"item_master_id":239687,"name":"QA_FacewreckHaze","quantity":1,"pricing_weight_id":null,"category_id":1,"subcategory_id":8,"unit_price":"150.00","original_unit_price":"150.00","discount_total":"0.00","created_at":"2017-12-02 19:49:45","sold_weight":10,"sold_weight_uom":"GR"}],"payments":[],"coupons":[],"taxes":[],"order_subtotal":150}'
I can successfully parse this one JSON string into a dataframe using these lines of code:
order_detail = json.loads(r.text)
order_detail = json_normalize(order_detail)
I can iterate all my IDs through the API using this code:
lists = []
for id in df.id:
    r = requests.get("URL/v1/orders/{id}".format(id=id), headers=headers_order)
    lists.append(r.text)
Now that all my JSON responses are stored in the list, how do I write all the elements of the list into a dataframe?
The code I have been trying is this:
for x in lists:
    order_detail = json.loads(x)
    order_detail = json_normalize(x)
    print(order_detail)
I get this error:
AttributeError: 'unicode' object has no attribute 'itervalues'
I know it happens at this line:
order_detail = json_normalize(x)
Why does this line work for a single JSON string but not for the list? What can I do to get the list of nested JSON into a dataframe?
Thank you in advance for the help.
edit:
Traceback (most recent call last):
File "<ipython-input-108-5051d2ceb18b>", line 3, in <module>
for id in df.id
File "/Users/bob/anaconda/lib/python2.7/site-packages/requests/models.py", line 802, in json
return json.loads(self.text, **kwargs)
File "/Users/bob/anaconda/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/Users/bob/anaconda/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/bob/anaconda/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Use the response's .json() method and feed the results directly to json_normalize.
Example:
df = json_normalize([
    requests.get("URL/v1/orders/{id}".format(id=id), headers=headers_order).json()
    for id in df.id
])
Update:
Fail-safe version that skips invalid responses:
def gen():
    for id in df.id:
        try:
            yield requests.get("URL/v1/orders/{id}".format(id=id), headers=headers_order).json()
        except ValueError:  # incorrect API response
            pass

df = json_normalize(list(gen()))
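If the response bodies have already been collected as strings (like lists in the question), the missing step is decoding each one with json.loads and passing the resulting dicts, not the raw string x, to json_normalize. A minimal sketch, assuming pandas 1.0 or newer for pd.json_normalize (older versions import json_normalize from pandas.io.json); responses_to_frame and the sample bodies are illustrative only:

```python
import json

import pandas as pd


def responses_to_frame(raw_texts):
    """Decode raw JSON response bodies and flatten them into one DataFrame.

    Bodies that are not valid JSON (e.g. an empty error response) are skipped.
    """
    records = []
    for text in raw_texts:
        try:
            records.append(json.loads(text))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
    # json_normalize wants a dict or a list of dicts, never a raw string
    return pd.json_normalize(records)


lists = [
    '{"id": 26509, "order_total": 150, "products": [{"id": 48479, "quantity": 1}]}',
    '',  # simulated bad response
    '{"id": 26510, "order_total": 75, "products": []}',
]
orders = responses_to_frame(lists)
print(orders[["id", "order_total"]])
```

Nested dicts are flattened into dotted column names, while list-valued fields such as products stay as Python lists in a single column; use record_path to expand those into rows.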

Try this:
In [28]: lst = list(set(order_detail) - set(['products','coupons','payments','taxes']))
In [29]: pd.io.json.json_normalize(order_detail, ['products'], lst, meta_prefix='p_')
Out[29]:
category_id created_at discount_total id item_master_id name original_unit_price pricing_weight_id \
0 1 2017-12-02 19:49:45 0.00 48479 239687 QA_FacewreckHaze 150.00 None
quantity sold_weight ... p_tax_total p_order_source p_consumer_id p_payment_complete p_coupon_total \
0 1 10 ... 0 in_store 415372 None 0
p_fulfillment_method p_order_type p_is_submitted p_balance_due p_updated_at
0 in_store sales 0 150 2017-12-02 20:07:25
[1 rows x 29 columns]
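The same record_path/meta call also works on a whole list of orders, giving one row per product with the order-level fields repeated. A sketch assuming pandas 1.0+ (pd.json_normalize); the second order is made-up sample data, and the meta list is built in key order rather than via set() so the column order is stable:

```python
import pandas as pd

orders = [
    {"id": 26509, "name": "ORD.00001", "order_total": 150,
     "products": [{"id": 48479, "name": "QA_FacewreckHaze", "quantity": 1}],
     "payments": [], "coupons": [], "taxes": []},
    # Hypothetical second order, for illustration only
    {"id": 26510, "name": "ORD.00002", "order_total": 75,
     "products": [{"id": 1, "name": "A", "quantity": 2},
                  {"id": 2, "name": "B", "quantity": 1}],
     "payments": [], "coupons": [], "taxes": []},
]

# Every scalar order-level key becomes metadata repeated on each product row
list_keys = {"products", "coupons", "payments", "taxes"}
meta_cols = [k for k in orders[0] if k not in list_keys]

flat = pd.json_normalize(orders, record_path="products",
                         meta=meta_cols, meta_prefix="p_")
print(flat)
```

The meta_prefix is needed because orders and products both have id and name keys; without it pandas raises a ValueError about conflicting metadata names. Orders whose products list is empty contribute no rows.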

Related

Reading JSON file using Python

I have a JSON file called 'elements.json':
[
{ldraw="003238a",lgeo="003238a",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238b",lgeo="003238b",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238c",lgeo="003238c",slope=0,anton=0,lutz=0,owen=0,damien=0},
{ldraw="003238d",lgeo="003238d",slope=0,anton=0,lutz=0,owen=0,damien=0}
]
I have a Python file called 'test.py':
import json

with open('elements.json') as json_file:
    data = json.load(json_file)
for p in data:
    print('ldraw: ' + p['ldraw'])
    print('lgeo: ' + p['lgeo'])
Running from the Windows command line I get this error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
data = json.load(json_file)
File "C:\Python27\lib\json\__init__.py", line 278, in load
**kw)
File "C:\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 2 column 2 (char 3)
What property name is expected? Why am I getting the error?

You aren't following the JSON specification. See json.org for details.
[
{"ldraw":"003238a","lgeo":"003238a","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238b","lgeo":"003238b","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238c","lgeo":"003238c","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0},
{"ldraw":"003238d","lgeo":"003238d","slope":0,"anton":0,"lutz":0,"owen":0,"damien":0}
]
Your Python code is correct.
Your ldraw and lgeo values look like hexadecimal; JSON does not support hex, and you will have to do the extra work yourself.
[Edit: They're not]
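To answer "what property name is expected" directly: after "{" the decoder expects a double-quoted key, and finds the bare ldraw instead. A small sketch (Python 3.5+, where json.JSONDecodeError exposes lineno, colno and pos) that reports the position, which for this input matches the traceback's line 2 column 2 (char 3); try_load is a hypothetical helper:

```python
import json

bad = '[\n{ldraw="003238a",lgeo="003238a",slope=0}\n]'
good = '[\n{"ldraw":"003238a","lgeo":"003238a","slope":0}\n]'


def try_load(text):
    """Return the parsed data, or print where the decoder gave up and return None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print("{} at line {}, column {} (char {})".format(
            err.msg, err.lineno, err.colno, err.pos))
        return None


try_load(bad)         # reports line 2, column 2 (char 3)
data = try_load(good)
for p in data:
    print("ldraw: " + p["ldraw"])
```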

Your file elements.json is not a valid JSON file.
It should look like this:
[{"ldraw":"003238a","lgeo":"003238a"}]

Your JSON format is invalid. JSON stands for JavaScript Object Notation and is built from key-value pairs, so you should replace each "=" with ":". The keys must also be wrapped in double quotes.
Wrong:
ldraw="003238a"
"ldraw": 003238a (without quotes, the value must be a number)
Right:
"ldraw": "003238a"
"ldraw": { "example-key": "value" }
"ldraw": "True"
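One way to see the correct syntax is to let Python generate it: json.dumps always writes double-quoted keys, ":" separators, and JSON literals such as true. A short sketch; the "flag" key is hypothetical and only there to show how a Python boolean is written:

```python
import json

record = {"ldraw": "003238a", "lgeo": "003238a", "slope": 0, "flag": True}
text = json.dumps(record)
print(text)
# {"ldraw": "003238a", "lgeo": "003238a", "slope": 0, "flag": true}

# The generated text round-trips back to the same dict
assert json.loads(text) == record
```

Note that "True" in quotes is a string, while true without quotes is the JSON boolean.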

Error in reading JSON: No JSON object could be decoded

I am reading a set of JSON files using glob and storing them in a list. The length of the list is 1046. When I read the JSON files one by one and load each one to run further code, it only gets through 595 files and then gives the following error:
Traceback (most recent call last):
File "removeDeleted.py", line 38, in <module>
d = json.load(open(fn))
File "/usr/lib/python2.7/json/__init__.py", line 291, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
I am loading the json files like this:
import glob
import json

json_file_names = sorted(glob.glob("./Intel_Shared_Data/gtFine/train/*/*.json"))
for fn in json_file_names:
    #print fn
    #temp = temp + 1
    #count = 0
    d = json.load(open(fn))
    objects = d["objects"]
    for j in range(len(objects)):
        ...  # rest of the loop body not shown
Can anybody suggest a way around this error?

As Blender said, you need to find out which of your files contains invalid JSON. To this end, you need to add some debugging statements to your code:
import glob
import json

json_file_names = sorted(glob.glob("./Intel_Shared_Data/gtFine/train/*/*.json"))
for fn in json_file_names:
    try:
        with open(fn) as f:
            d = json.load(f)
    except ValueError as e:
        # Report the offending file name and skip it
        print("Could not load {}, invalid JSON: {}".format(fn, e))
        continue
    objects = d["objects"]
    for j in range(len(objects)):
        ...  # rest of the loop body not shown

One of your json text files is empty. Maybe start by seeing if you have any zero size files with
find . -size 0
run from your directory of json files in a terminal.
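Both checks (empty files and otherwise invalid JSON) can also be done from Python in one pass, collecting every bad file instead of stopping at the first one. A sketch; find_bad_json_files is a hypothetical helper, and the explicit size check is redundant with the ValueError handler but gives a clearer reason for empty files:

```python
import glob
import json
import os


def find_bad_json_files(pattern):
    """Return (path, reason) pairs for files matching pattern that fail to parse."""
    bad = []
    for path in sorted(glob.glob(pattern)):
        if os.path.getsize(path) == 0:
            bad.append((path, "empty file"))
            continue
        try:
            with open(path) as f:
                json.load(f)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            bad.append((path, str(err)))
    return bad


if __name__ == "__main__":
    for path, reason in find_bad_json_files("./Intel_Shared_Data/gtFine/train/*/*.json"):
        print(path, "->", reason)
```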

How to transfer a JSON object into a Python object?

I use:
json_str = '{"name":"Saeron", "age":23, "score":100}'
def json2dict(d):
    return dict(d['name'],d['age'],d['score'])
d = json.loads(json_str, object_hook=json2dict)
print(d.name)
but I get an error:
Traceback (most recent call last):
File "C:/Users/40471/PycharmProjects/untitled/untitled.py", line 693, in <module>
d = json.loads(json_str, object_hook=json2dict)
File "C:\Program Files\Python36\lib\json\__init__.py", line 367, in loads
return cls(**kw).decode(s)
File "C:\Program Files\Python36\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python36\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
File "C:/Users/40471/PycharmProjects/untitled/untitled.py", line 692, in json2dict
return dict(d['name'],d['age'],d['score'])
TypeError: dict expected at most 1 arguments, got 3
I am following a tutorial that converts a JSON object into a Python object, like this:
json_str = '{"age": 20, "score": 88, "name": "Bob"}'
print(json.loads(json_str, object_hook=dict2student))
Why doesn't this work with a dict? How should I fix it?

Once you load JSON using d = json.loads(json_str), d is a Python dict,
so you cannot get an item with dot access such as d.name.
You need:
json_str = '{"name":"Saeron", "age":23, "score":100}'
d = json.loads(json_str)
print(d['name'])

json.loads can evaluate your string as a Python dictionary directly as follows:
json_str = '{"name":"Saeron", "age":23, "score":100}'
d = json.loads(json_str)
print(d['name'])
>>>Saeron
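The function you pass via the object_hook parameter receives the dictionary decoded from each JSON object in the string, and whatever it returns is used in place of that dictionary. That is why dict(d['name'], d['age'], d['score']) fails: dict() accepts at most one positional argument, not three.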
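If attribute access like d.name is what you want, the hook just needs to return an object with attributes. A sketch of two options: types.SimpleNamespace from the standard library, and a small class in the spirit of the tutorial's dict2student. The Student class and this dict2student body are hypothetical stand-ins, not the tutorial's code:

```python
import json
from types import SimpleNamespace

json_str = '{"name":"Saeron", "age":23, "score":100}'

# Option 1: turn every decoded JSON object into a SimpleNamespace
d = json.loads(json_str, object_hook=lambda obj: SimpleNamespace(**obj))
print(d.name)  # Saeron


class Student:
    """Hypothetical class standing in for the tutorial's Student."""

    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score


def dict2student(obj):
    # Build a Student from the decoded dict instead of calling dict() with 3 args
    return Student(obj["name"], obj["age"], obj["score"])


# Option 2: a hook that builds your own class
s = json.loads(json_str, object_hook=dict2student)
print(s.name, s.age, s.score)  # Saeron 23 100
```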

Creating dataframe from json not always working

I'm trying to run this code to create a data frame from a JSON link. Sometimes the code runs; other times I get the error message below. I'm not sure why this happens, since the code is the same.
import requests
import json

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
jd = requests.get(url).json()
df = []
for item in requests.get(url).json()['resultSets']:
    print("got here")
    row_df = []
    for row in item['rowSet']:
        row_df.append(str(row).strip('[]'))
    df.append("\n")
    df.append(row_df)
print(df)
Error Message:
Traceback (most recent call last):
File "/Users/K/PycharmProjects/mousefun/fun", line 8, in <module>
jd = requests.get(url).json()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/requests/models.py", line 812, in json return complexjson.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 318, in loads return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 343, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 361, in raw_decode raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

Change your request logic to this and try again:
r = requests.get(url)
r.raise_for_status()
df = []
for item in r.json()["resultSets"]:
    # ...
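r.raise_for_status() raises an HTTPError if the server returns an error status (4xx or 5xx), so you see the real failure instead of a JSON decoding error.
Also, this sends the request only once, whereas your code sends it twice.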
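Once r.json() succeeds, each result set can become a real DataFrame rather than a list of strings. A sketch, assuming each entry of "resultSets" carries a "headers" list of column names next to "rowSet" (only "rowSet" appears in the question, so check the actual payload); result_sets_to_frames and the sample payload are hypothetical:

```python
import pandas as pd


def result_sets_to_frames(payload):
    """Return one DataFrame per entry in payload["resultSets"].

    Assumes each entry has "headers" (column names) and "rowSet" (list of rows).
    """
    return [pd.DataFrame(item["rowSet"], columns=item["headers"])
            for item in payload["resultSets"]]


# Usage with the answer's code: frames = result_sets_to_frames(r.json())
sample = {"resultSets": [{"headers": ["PLAYER_ID", "PLAYER_NAME"],
                          "rowSet": [[1, "Player A"], [2, "Player B"]]}]}
frames = result_sets_to_frames(sample)
print(frames[0])
```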

How to convert this json string to dict?

After executing the following code:
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
json.loads(a)
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
So how can I convert 'a' to a dict?
Please note that the string is already in 'a' and I cannot add the 'r' prefix to it. Ideally, the string should have been {"excludeTypes":"*.exe;~\\\\$*.*"}.
Also, the following code doesn't work:
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
b = repr(a)
json.loads(b)

import ast
d = ast.literal_eval(a)

By escaping the escape character "\":
import json
a = '{"excludeTypes":"*.exe;~\\$*.*"}'
a = a.replace("\\","\\\\")
json.loads(a)
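Note that a.replace("\\", "\\\\") doubles every backslash, which also breaks escapes that were already valid (for example \" inside a value). A more targeted sketch, not taken from the answer above, doubles only backslashes that do not start a valid JSON escape; fix_stray_backslashes is a hypothetical helper, and it does not validate the four hex digits after \u:

```python
import json
import re

# A backslash followed by one of these characters starts a valid JSON escape.
# The regex consumes the pair, so an escaped backslash (\\) is kept intact.
_ESCAPE = re.compile(r'\\(["\\/bfnrtu]?)')


def fix_stray_backslashes(text):
    """Double only backslashes that do not begin a valid JSON escape sequence."""
    return _ESCAPE.sub(lambda m: m.group(0) if m.group(1) else "\\\\", text)


a = '{"excludeTypes":"*.exe;~\\$*.*"}'
d = json.loads(fix_stray_backslashes(a))
print(d["excludeTypes"])  # *.exe;~\$*.*
```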
