JSON extraction of a specific field via Python

I am trying to get the "externalCode" field from the incomplete JSON file below, but I am lost. My Python code only reaches the second level before it raises the error shown here. I am not sure how to traverse nested JSON like this.
output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
KeyError: 'benefitValueSets'
import csv, json, sys
input = open('C:/Users/kk/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
JSON file:
[
{
"benefitCategories": [
{
"benefits": [
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "$500",
"externalCode": null,
"id": null,
"internalCode": "$500",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "DEDUCTIBLE",
"displayType": null,
"externalCode": "IndividualInNetdeductibleAmount",
"id": null,
"key": "IndividualInNetdeductibleAmount",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "$500"
}
]
},
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "100%",
"externalCode": null,
"id": null,
"internalCode": "100%",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "COINSURANCE",
"displayType": null,
"externalCode": "PhysicianOfficeInNetCoInsurancePct",
"id": null,
"key": "PhysicianOfficeInNetCoInsurancePct",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "100%"
}
]
},
{

Try this code:
import csv, json, sys
input = open('C:/Users/spolireddy/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow([row['benefitCategories'], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['disabled']])
# for externalCode:
row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['externalCode']

I'm not quite sure I understand what you're looking to do with your code. There are multiple externalCode values for each element in the array, at least from the sample you've posted. But you can get the data you're looking for with this syntax:
data[0]["benefitCategories"][0]["benefits"][0]["externalCode"]
data[0]["benefitCategories"][0]["benefits"][1]["externalCode"]
The code below iterates through the data you're interested in (with a slightly modified JSON file so that it's complete) and works as desired:
import csv, json, sys
input = open('junk.json', 'r')
data = json.load(input)
input.close()
for x in data[0]["benefitCategories"][0]["benefits"]:
    print(x["externalCode"] + "\n\n")
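The loop above only visits the first row and the first category. As a minimal sketch, assuming every level has the same shape as the posted sample, a generator can walk all rows, categories and benefits. The SAMPLE string is a trimmed, hypothetical stand-in for the question's file, and iter_external_codes is an illustrative name, not something from the original posts.

```python
import json

# Hypothetical, trimmed stand-in for the question's file, completed so it parses.
SAMPLE = """
[
  {
    "benefitCategories": [
      {
        "benefits": [
          {"displayName": "DEDUCTIBLE",
           "externalCode": "IndividualInNetdeductibleAmount",
           "benefitValueSets": [{"externalCode": null}]},
          {"displayName": "COINSURANCE",
           "externalCode": "PhysicianOfficeInNetCoInsurancePct",
           "benefitValueSets": [{"externalCode": null}]}
        ]
      }
    ]
  }
]
"""


def iter_external_codes(data):
    """Yield the externalCode of every benefit in every category of every row."""
    for row in data:
        for category in row.get("benefitCategories", []):
            for benefit in category.get("benefits", []):
                yield benefit.get("externalCode")


codes = list(iter_external_codes(json.loads(SAMPLE)))
print(codes)
```

Using .get with a default keeps the walk from raising KeyError when a level is absent, which is the error the question started with.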

Related

"NoneType object is not iterable" error while appending JSON data to lists

I am trying to scrape customer review data from Shopee using JSON URL calls. However, I get the error "'NoneType' object is not iterable" while appending the retrieved JSON data to lists. I have looked through the related questions here but could not find a solution.
Below is my code for retrieving the customer review data:
import re
import json
import requests
import pandas as pd
def get_ratings(shop_id, item_id):
    ratings_url = "https://shopee.sg/api/v2/item/get_ratings?filter=0&flag=1&itemid={item_id}&limit=20&offset={offset}&shopid={shop_id}&type=0"
    offset = 0
    d = {"product_id": [], "username": [], "rating": [], "comment": []}
    while True:
        data = requests.get(
            ratings_url.format(shop_id=shop_id, item_id=item_id, offset=offset)
        ).json()
        # uncomment this to print all data:
        print(json.dumps(data, indent=4))
        i = 1
        for i, rating in enumerate(data["data"]["ratings"], 1):
            d["product_id"].append(rating["itemid"])
            d["username"].append(rating["author_username"])
            d["rating"].append(rating["rating_star"])
            d["comment"].append(rating["comment"])
        if i % 20:
            break
        offset += 20
    return d
keyword_search = "Fishball"
headers = {
    "User-Agent": "Chrome",
    "Referer": "{}search?keyword={}".format(Shopee_url, keyword_search),
}
url = "https://shopee.sg/api/v2/search_items/?by=relevancy&keyword={}&limit=100&newest=0&order=desc&page_type=search".format(
    keyword_search
)
# can change "relevancy" to "latest": to sort by latest products instead
# Shopee API request
r = requests.get(url, headers=headers).json()
# For Products: Create lists to hold the items
p_product_id = []
p_product_name = []
p_product_price = []
p_product_soldquantity = []
p_product_rating = []
# For Reviews: Create lists to hold the items
r_product_id = []
r_customer_username = []
r_customer_rating = []
r_customer_review = []
# Populate "Products" lists
for item in r["items"]:
    p_product_id.append(item["itemid"])
    p_product_name.append(item["name"])
    p_product_price.append(item["price_min"])
    p_product_soldquantity.append(item["historical_sold"])
    p_product_rating.append(item["item_rating"]["rating_star"])
for item in r["items"]:
    review_item = get_ratings(item["shopid"], item["itemid"])
    r_product_id = r_product_id + review_item["product_id"]
    r_customer_username = r_customer_username + review_item["username"]
    r_customer_rating = r_customer_rating + review_item["rating"]
    r_customer_review = r_customer_review + review_item["comment"]
Here is a snippet of the error:
I have looked at the JSON dump data to figure out the source of the error. For example:
I tested whether a null/None or "" value for any of the keys (itemid, author_username, rating_star, comment) causes the "NoneType" error. It does not; the values are appended to the lists as I wanted.
I tested whether a missing key causes the "NoneType" error. It does not; it just results in a different error about that particular key.
Edit for @flakes: Below is a sample of the JSON dump. The actual dump is a lot larger, so the error might not be reproducible from this sample:
{
"data": {
"ratings": [],
"item_rating_summary": {
"rating_total": 0,
"rating_count": [
0,
0,
0,
0,
0
],
"rcount_with_context": 0,
"rcount_with_image": 0,
"rcount_with_media": 0,
"rcount_local_review": 0,
"rcount_repeat_purchase": 0
},
"is_sip_item": false,
"rcmd_algo": "ABTEST:2.a.312#AD12"
},
"error": 0,
"error_msg": null
}
{
"data": {
"ratings": [
{
"orderid": 49169039164703,
"itemid": 6342546681,
"cmtid": 2760023244,
"ctime": 1596074706,
"rating": 1,
"userid": 269495210,
"shopid": 233683003,
"comment": "Some items didn\u2019t received, and ask for refund on Shopee system, and Shopee ask for return all the items to Shopee(excuse me! Fresh items wow).....This make me crazy. Finally, seller have solved this problem and I got the refund already. Seller also reply slowly. I\u2019m so disappointed with Shopee ",
"rating_star": 4,
"status": 1,
"mtime": 1596074706,
"editable": 0,
"opt": 2,
"filter": 7,
"mentioned": [],
"is_hidden": false,
"author_username": "kgamdr9qx6",
"author_portrait": "",
"author_shopid": 269491453,
"anonymous": false,
"images": [
"a19dfc8f10afe789b586ed934f902adc"
],
"videos": [
{
"id": "sg_132892ff-a252-4505-ad1b-94ecf8cf4693_000046",
"cover": "https://play-ws.vod.shopee.com/c4/98934353/10321063012/YzQsMTI0LTE2NzMyLDE0MTAwODg1NTg2NzI0NDU0NDAsMw.jpg",
"url": "https://play-ws.vod.shopee.com/c4/98934353/10321063012/YzQsMTI0LTE2NzMyLDE0MTAwODkwMTc0Nzg5NzEzOTIsMw.mp4",
"duration": 7106,
"upload_time": null
}
],
"product_items": [
{
"itemid": 6342546681,
"shopid": 233683003,
"name": "BoBo Cooked Fishball 500g",
"image": "55156d13a9cda32c12d9a170196b219a",
"is_snapshot": 1,
"snapshotid": 1957356912,
"modelid": 40318347777,
"model_name": ""
}
],
"delete_reason": null,
"delete_operator": null,
"ItemRatingReply": null,
"tags": null,
"editable_date": null,
"show_reply": null,
"like_count": null,
"liked": null,
"sync_to_social": false,
"detailed_rating": null,
"exclude_scoring_due_low_logistic": false,
"loyalty_info": null,
"template_tags": [],
"has_template_tag": false,
"sync_to_social_toggle": null,
"sip_info": {
"is_oversea": false,
"origin_region": "sg"
},
"is_repeated_purchase": false
},
{
"orderid": 76164426633592,
"itemid": 6342546681,
"cmtid": 4977003708,
"ctime": 1622799587,
"rating": 1,
"userid": 193560584,
"shopid": 233683003,
"comment": "",
"rating_star": 5,
"status": 1,
"mtime": 1622799587,
"editable": 0,
"opt": 2,
"filter": 0,
"mentioned": [],
"is_hidden": false,
"author_username": "fynahros",
"author_portrait": "",
"author_shopid": 193557725,
"anonymous": false,
"images": null,
"videos": [],
"product_items": [
{
"itemid": 6342546681,
"shopid": 233683003,
"name": "BoBo Cooked Fishball 500g",
"image": "11e161f9dbea1be563bff914beedfed8",
"is_snapshot": 1,
"snapshotid": 2373175027,
"modelid": 40318347777,
"model_name": ""
}
],
"delete_reason": null,
"delete_operator": null,
"ItemRatingReply": null,
"tags": null,
"editable_date": null,
"show_reply": null,
"like_count": null,
"liked": null,
"sync_to_social": false,
"detailed_rating": null,
"exclude_scoring_due_low_logistic": false,
"loyalty_info": null,
"template_tags": [],
"has_template_tag": false,
"sync_to_social_toggle": null,
"sip_info": {
"is_oversea": false,
"origin_region": "sg"
},
"is_repeated_purchase": false
},
{
"orderid": 50027730227101,
"itemid": 6342546681,
"cmtid": 2779109484,
"ctime": 1596443125,
"rating": 1,
"userid": 67658682,
"shopid": 233683003,
"comment": "",
"rating_star": 5,
"status": 1,
"mtime": 1596443125,
"editable": 0,
"opt": 2,
"filter": 0,
"mentioned": [],
"is_hidden": false,
"author_username": "lynn1967",
"author_portrait": "",
"author_shopid": 67657225,
"anonymous": false,
"images": null,
"videos": [],
"product_items": [
{
"itemid": 6342546681,
"shopid": 233683003,
"name": "BoBo Cooked Fishball 500g",
"image": "55156d13a9cda32c12d9a170196b219a",
"is_snapshot": 1,
"snapshotid": 1957356912,
"modelid": 40318347777,
"model_name": ""
}
],
"delete_reason": null,
"delete_operator": null,
"ItemRatingReply": null,
"tags": null,
"editable_date": null,
"show_reply": null,
"like_count": null,
"liked": null,
"sync_to_social": false,
"detailed_rating": null,
"exclude_scoring_due_low_logistic": false,
"loyalty_info": null,
"template_tags": [],
"has_template_tag": false,
"sync_to_social_toggle": null,
"sip_info": {
"is_oversea": false,
"origin_region": "sg"
},
"is_repeated_purchase": false
}
],
"item_rating_summary": {
"rating_total": 17,
"rating_count": [
0,
0,
0,
1,
16
],
"rcount_with_context": 11,
"rcount_with_image": 11,
"rcount_with_media": 11,
"rcount_local_review": 0,
"rcount_repeat_purchase": 0
},
"is_sip_item": false,
"rcmd_algo": "ABTEST:2.a.312#AD12"
},
"error": 0,
"error_msg": null
}
Edit for @tdelaney: I found out that the key "ratings" in the last dictionary has a null value:
{
"data": {
"ratings": null,
"item_rating_summary": {
"rating_total": 0,
"rating_count": [
0,
0,
0,
0,
0
],
"rcount_with_context": 0,
"rcount_with_image": 0,
"rcount_with_media": 0,
"rcount_local_review": 0,
"rcount_repeat_purchase": 0
},
"is_sip_item": false,
"rcmd_algo": "ABTEST:2.a.312#AD12"
},
"error": 0,
"error_msg": null
}
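JSON null is parsed as Python None, and iterating over None raises the "'NoneType' object is not iterable" error quoted in the question. Going by the asker's own finding that "ratings" is null in the last response, that is the likely source. The following is a minimal sketch, not a confirmed fix from the thread: the helper name extract_ratings and the trimmed payloads are hypothetical, and no request to Shopee is made.

```python
def extract_ratings(payload):
    """Collect review fields from one parsed get_ratings response.

    `payload` is the dict returned by .json(); its "ratings" value may be
    a list, an empty list, or None (JSON null).
    """
    d = {"product_id": [], "username": [], "rating": [], "comment": []}
    # `or []` turns None into an empty list, so the loop is simply skipped.
    ratings = (payload.get("data") or {}).get("ratings") or []
    for rating in ratings:
        d["product_id"].append(rating["itemid"])
        d["username"].append(rating["author_username"])
        d["rating"].append(rating["rating_star"])
        d["comment"].append(rating["comment"])
    return d, len(ratings)


# Trimmed stand-ins for the three response shapes shown in the question.
with_reviews = {"data": {"ratings": [
    {"itemid": 6342546681, "author_username": "fynahros",
     "rating_star": 5, "comment": ""}]}}
empty_list = {"data": {"ratings": []}}
null_ratings = {"data": {"ratings": None}}

for payload in (with_reviews, empty_list, null_ratings):
    rows, count = extract_ratings(payload)
    print(count, rows["username"])
```

Returning the number of ratings on the page also gives a paging loop a direct stop condition (for example, stop when the count is below the page size) without reusing a loop variable after the loop.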

How should I approach getting this JSON into Pandas?

I'm trying to write code to use data to generate a report. Instead of iterating through the dictionary, I wanted to use Pandas this time.
So, the first issue I faced was null in the data. I corrected it using json.loads().
I am trying to understand how to get the nested JSON into Pandas.
How should I go about loading the data below into Pandas? read_json() throws an error.
{
"Scans": [
{
"Targets": [
{
"Id": "5a8f415a-5146-4e39-8827-33cdb14ad478",
"Host": "xyz.com"
}
],
"Id": "51b13233-b3de-4f26-81d6-0001314f635f",
"Status": 32,
"StartTime": "2021-05-25T16:00:13.16",
"WindowScanStart": "2021-05-25T16:00:00.76",
"WindowScanStop": null,
"StartedBy": "8cafa5bc-b0df-496f-bab7-74f71eeadf9d",
"StoppedTime": null,
"CompletionTime": "2021-05-25T17:18:43.007",
"IsApproveRequired": false,
"IsMonitoring": false,
"IsUploaded": false,
"IsImported": true,
"SubStatus": null
},
{
"Targets": [
{
"Id": "6108c410-7d5c-41c9-979a-c8bf70e6f6bf",
"Host": "abc.com"
}
],
"Id": "de3a98a3-cb95-42ce-874b-00037347ebba",
"Status": 72,
"StartTime": "2021-06-07T19:50:01.85",
"WindowScanStart": "2021-06-07T19:49:44.517",
"WindowScanStop": null,
"StartedBy": "b6a3f887-0b4a-43e1-ba5b-fd93e60e58b6",
"StoppedTime": null,
"CompletionTime": "2021-06-07T19:50:53.667",
"IsApproveRequired": false,
"IsMonitoring": false,
"IsUploaded": false,
"IsImported": false,
"SubStatus": null
},
],
"IsSuccess": true,
"Reason": null,
"ErrorMessage": null
}
I am able to handle nulls using json.loads(). How can I create a Pandas DataFrame from this?

I suspect your problem is the comma after the } on the line before the ].
No element follows it, so the comma is not valid JSON.
Remove it and try again:
{
"Scans": [
{
"Targets": [
{
"Id": "5a8f415a-5146-4e39-8827-33cdb14ad478",
"Host": "xyz.com"
}
],
"Id": "51b13233-b3de-4f26-81d6-0001314f635f",
"Status": 32,
"StartTime": "2021-05-25T16:00:13.16",
"WindowScanStart": "2021-05-25T16:00:00.76",
"WindowScanStop": null,
"StartedBy": "8cafa5bc-b0df-496f-bab7-74f71eeadf9d",
"StoppedTime": null,
"CompletionTime": "2021-05-25T17:18:43.007",
"IsApproveRequired": false,
"IsMonitoring": false,
"IsUploaded": false,
"IsImported": true,
"SubStatus": null
},
{
"Targets": [
{
"Id": "6108c410-7d5c-41c9-979a-c8bf70e6f6bf",
"Host": "abc.com"
}
],
"Id": "de3a98a3-cb95-42ce-874b-00037347ebba",
"Status": 72,
"StartTime": "2021-06-07T19:50:01.85",
"WindowScanStart": "2021-06-07T19:49:44.517",
"WindowScanStop": null,
"StartedBy": "b6a3f887-0b4a-43e1-ba5b-fd93e60e58b6",
"StoppedTime": null,
"CompletionTime": "2021-06-07T19:50:53.667",
"IsApproveRequired": false,
"IsMonitoring": false,
"IsUploaded": false,
"IsImported": false,
"SubStatus": null
}
],
"IsSuccess": true,
"Reason": null,
"ErrorMessage": null
}
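To illustrate the point about the trailing comma, here is a small sketch using only the standard library. BROKEN and FIXED are trimmed, hypothetical stand-ins for the posted document. The second half flattens each scan and its targets into one flat row per pair, and the "Target" key prefix is an arbitrary choice for this example.

```python
import json

# Trimmed, hypothetical stand-ins for the document in the question.
BROKEN = '{"Scans": [{"Id": "a", "Targets": [{"Host": "xyz.com"}]},], "IsSuccess": true}'
FIXED = '{"Scans": [{"Id": "a", "Targets": [{"Host": "xyz.com"}]}], "IsSuccess": true}'

try:
    json.loads(BROKEN)
    broken_parses = True
except json.JSONDecodeError as exc:
    broken_parses = False
    print("trailing comma rejected:", exc.msg)

doc = json.loads(FIXED)

# One flat row per (scan, target) pair; nested "Targets" keys get a prefix.
rows = []
for scan in doc["Scans"]:
    base = {k: v for k, v in scan.items() if k != "Targets"}
    for target in scan.get("Targets") or [{}]:
        row = dict(base)
        row.update({"Target" + k: v for k, v in target.items()})
        rows.append(row)

print(rows)
```

A list of flat dicts like rows is a shape that pandas.DataFrame accepts directly; pandas.json_normalize is another option for the nested "Targets" lists.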

Python json.dumps using new dictionary not returning valid json format

I'm currently using a python module that helps with the tenable API to export asset data from tenable. The export function returns an "ExportIterator" type object to walk through the results of the export.
Essentially this returns too much data per asset, and I'm having difficulty figuring out how to filter out the data being returned so I can use it.
This returns thousands of json objects with hundreds of keys (I've removed and obfuscated several) like this:
{
"id": "1a2b3c",
"has_plugin_results": true,
"created_at": "xxx",
"terminated_at": null,
"terminated_by": null,
"updated_at": "xxx",
"deleted_at": null,
"deleted_by": null,
"first_seen": "",
"last_seen": "",
"first_scan_time": "xxx",
"last_scan_time": "xxx",
"last_authenticated_scan_date": "xxx",
"last_licensed_scan_date": "xxx,
"last_scan_id": "xxx,
"last_schedule_id": "xxx",
"azure_vm_id": null,
"azure_resource_id": null,
"gcp_project_id": null,
"gcp_zone": null,
"gcp_instance_id": null,
"aws_ec2_instance_ami_id": null,
"aws_ec2_instance_id": null,
"agent_uuid": "xxx",
"bios_uuid": "xxx",
"network_id": "xxx",
"network_name": "Default",
"aws_owner_id": null,
"aws_availability_zone": null,
"aws_region": null,
"aws_vpc_id": null,
"aws_ec2_instance_group_name": null,
"aws_ec2_instance_state_name": null,
"aws_ec2_instance_type": null,
"aws_subnet_id": null,
"aws_ec2_product_code": null,
"aws_ec2_name": null,
"mcafee_epo_guid": "{xxx}",
"mcafee_epo_agent_guid": "{xxx}",
"servicenow_sysid": null,
"agent_names": [
"aaabbbccc123"
],
"installed_software": [],
"ipv4s": [
"1.1.1.1",
"2.2.2.2"
],
"ipv6s": [],
"fqdns": [
"aaabbbbccc"
],
"mac_addresses": [
"aa:bb:cc"
],
"netbios_names": [
"aaabbbccc123"
],
"operating_systems": [
"foobar 10"
],
"system_types": [
"general-purpose"
],
"hostnames": [
"aaabbbccc123"
],
"sources": [
{
"name": "AGENT",
"first_seen": "xxx",
"last_seen": "xxx"
}
]
}
This module function for exporting doesn't support any arguments for filtering the json object itself.
To filter, I'm using this to map the "hostnames" value to a new key named "vmName" in a new dictionary:
from tenable.io import TenableIO
import json
tio = TenableIO()
wr = open('tioasset.json','w')
for asset in tio.exports.assets():
    new_data = {'vmName' : asset['hostnames'],},
    wr.write(json.dumps(new_data, indent = 2, separators=(',', ':')))
wr.close()
This drops all the unnecessary keys from the API response, but the formatting seems to be all wrong.
Output from the code:
][
{
"vmName":[
"aaabbbccc123"
]
}
][
{
"vmName":[
"dddeeefff123"
]
}
][
{
"vmName":[
"ggghhhiii123"
]
}
][
{
"vmName":[
"jjjkkklll123"
]
}
][
{
"vmName":[
"mmmnnooo123"
]
}
][
Any idea how to make the code return properly formatted JSON? Something like this:
[
{
"vmName":"aaabbbccc123"
},
{
"vmName":"dddeeefff123"
},
{
"vmName":"ggghhhiii123"
},
{
"vmName":"jjjkkklll123"
}
]

That's because hostnames is an array, and the trailing comma after the dict turns new_data into a one-element tuple, which json.dumps writes as a list.
If you just want the first element, replace the line with this:
new_data = {'vmName' : asset['hostnames'][0]}
Or, if each array can contain many hostnames, you can do this:
for asset in tio.exports.assets():
    for a in asset['hostnames']:
        new_data = {'vmName' : a}
        wr.write(json.dumps(new_data, indent = 2, separators=(',', ':')))

from tenable.io import TenableIO
import json
tio = TenableIO()
wr = open('tioasset.json','w')
result = []
for asset in tio.exports.assets():
    for a in asset['hostnames']:
        new_data = {'vmName' : a}
        result.append(new_data)
wr.write(json.dumps(result))
wr.close()
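The two effects in this thread can be reproduced without the Tenable API. In the sketch below, the assets list is a hypothetical stand-in for the records yielded by tio.exports.assets(). It first shows what the trailing comma does, then builds one list and serialises it once.

```python
import json

# Hypothetical stand-in for the records yielded by tio.exports.assets().
assets = [
    {"hostnames": ["aaabbbccc123"]},
    {"hostnames": ["dddeeefff123", "dddeeefff456"]},
]

# A trailing comma after a dict literal builds a one-element tuple,
# and json.dumps serialises a tuple as a JSON array.
as_tuple = {"vmName": "aaabbbccc123"},
print(json.dumps(as_tuple))

# Collect every hostname first, then serialise once so the output is a
# single valid JSON array rather than several documents back to back.
result = [{"vmName": name} for asset in assets for name in asset["hostnames"]]
text = json.dumps(result, indent=2)
print(text)
```

Calling json.dumps once per record and writing each result to the same file produces several JSON documents in a row, which most parsers reject. Serialising the collected list once avoids that.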

Turning Nested JSON with Arrays into DataFrame in Python

I have a heavily nested set of JSON that I would like to turn into a table.
I would like to turn the JSON response below into a table: under "steps", I want to extract just "name" and "options" and their values.
"data": {
"activities": [
{
"sections": [
{
"steps": [
{
"blocking": false,
"actionable": true,
"document": null,
"name": "Site",
"options": [
"RKM",
"Meridian"
],
"description": null,
"id": "036c3090-95c4-4162-a746-832ed43a2805",
"type": "DROPDOWN"
},
{
"blocking": false,
"actionable": true,
"document": null,
"name": "Location",
"options": [
"Field",
"Station"
],

Assuming that you want a pandas DataFrame (here json is the parsed response dict, not the json module):
import pandas as pd
df = pd.DataFrame(json['data']['activities'][0]['sections'][0]['steps'])[['name', 'options']]
print(df)
Output:
       name           options
0      Site   [RKM, Meridian]
1  Location  [Field, Station]

JSON or Python dict / list decoding problem

I have been using the Python script below to retrieve and extract some data from Flightradar24. It appears to return the data in JSON format, and json.dumps prints it out in full. But when I try to select the data I want (the status text in this case) using get, it gives the following error:
'list' object has no attribute 'get'
Is the data JSON or a list? I'm totally confused now.
I'm fairly new to working with data in JSON format; any help would be appreciated!
Script:
import flightradar24
import json
flight_id = 'BA458'
fr = flightradar24.Api()
flight = fr.get_flight(flight_id)
y = flight.get("data")
print (json.dumps(flight, indent=4))
X= (flight.get('result').get('response').get('data').get('status').get('text'))
print (X)
Sample of output data:
{
"result": {
"request": {
"callback": null,
"device": null,
"fetchBy": "flight",
"filterBy": null,
"format": "json",
"limit": 25,
"page": 1,
"pk": null,
"query": "BA458",
"timestamp": null,
"token": null
},
"response": {
"item": {
"current": 16,
"total": null,
"limit": 25
},
"page": {
"current": 1,
"total": null
},
"timestamp": 1546241512,
"data": [
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
"live": false,
"text": "Scheduled",
"icon": null,
"estimated": null,
"ambiguous": false,
"generic": {
"status": {
"text": "scheduled",
"type": "departure",
"color": "gray",
"diverted": null
},

You can use print(type(variable_name)) to see what type it is. The .get(key[, default]) method is not supported on lists; it is only supported on dicts.
X = (flight.get('result').get('response').get('data').get('status').get('text'))
# ^^^^^^^^ does not work, data is a list of dicts
as data is a list of dicts:
"data": [ # <<<<<< this is a list
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
This should work:
X = flight.get('result').get('response').get('data')[0].get('status').get('text')

The issue, as pointed out by @PatrickArtner, is that your data is actually a list rather than a dictionary. As an aside, you may find your code more readable if you use a helper function to apply dict.get repeatedly on a nested dictionary:
from functools import reduce
def ng(dataDict, mapList):
    """Nested Getter: Iterate nested dictionary"""
    return reduce(dict.get, mapList, dataDict)
X = ng(ng(flight, ['result', 'response', 'data'])[0], ['status', 'text'])
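A variation on the helper idea, as a sketch rather than anything from the thread: because plain indexing works on both dicts and lists, a getter can take one path that mixes keys and list positions. The name nested_get and the trimmed flight dict are hypothetical stand-ins for the real API response.

```python
def nested_get(data, path, default=None):
    """Follow `path` through nested dicts and lists.

    Each step is a dict key or a list index; `default` is returned as
    soon as a step cannot be followed.
    """
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Trimmed stand-in for the structure printed in the question.
flight = {"result": {"response": {"data": [
    {"status": {"live": False, "text": "Scheduled"}}]}}}

status_text = nested_get(flight, ["result", "response", "data", 0, "status", "text"])
missing = nested_get(flight, ["result", "response", "data", 5, "status"], default="n/a")
print(status_text, missing)
```

Catching TypeError covers the cases where an intermediate value is None or where a string key is applied to a list, so the lookup does not fail partway through the chain.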
