Turning Nested JSON with Arrays into DataFrame in Python - python

I have a heavily nested set of Json that I would like to turn into a table
I would like to turn the below JSON response into a table under "steps" I could just extract "name" and "options" and there values
"data": {
"activities": [
{
"sections": [
{
"steps": [
{
"blocking": false,
"actionable": true,
"document": null,
"name": "Site",
"options": [
"RKM",
"Meridian"
],
"description": null,
"id": "036c3090-95c4-4162-a746-832ed43a2805",
"type": "DROPDOWN"
},
{
"blocking": false,
"actionable": true,
"document": null,
"name": "Location",
"options": [
"Field",
"Station"
],

Assuming that you want a pandas dataframe:
df = pd.DataFrame(json['data']['activities'][0]['sections'][0]['steps'])[['name', 'options']]
print(df)
Output:
name options
0 Site [RKM, Meridian]
1 Location [Field, Station]

Related

How merge or join data in a Pandas nested DataFrame

I'm trying to figure out how to perform a Merge or Join on a nested field in a DataFrame. Below is some example data:
df_all_groups = pd.read_json("""
[
{
"object": "group",
"id": "group-one",
"collections": [
{
"id": "111-111-111",
"readOnly": false
},
{
"id": "222-222-222",
"readOnly": false
}
]
},
{
"object": "group",
"id": "group-two",
"collections": [
{
"id": "111-111-111",
"readOnly": false
},
{
"id": "333-333-333",
"readOnly": false
}
]
}
]
""")
df_collections_with_names = pd.read_json("""
[
{
"object": "collection",
"id": "111-111-111",
"externalId": null,
"name": "Cats"
},
{
"object": "collection",
"id": "222-222-222",
"externalId": null,
"name": "Dogs"
},
{
"object": "collection",
"id": "333-333-333",
"externalId": null,
"name": "Fish"
}
]
""")
I'm trying to add the name field from df_collections_with_names to each df_all_groups['collections'][<index>] by joining on df_all_groups['collections'][<index>].id The output I'm trying to get to is:
[
{
"object": "group",
"id": "group-one",
"collections": [
{
"id": "111-111-111",
"readOnly": false,
"name": "Cats" // See Collection name was added
},
{
"id": "222-222-222",
"readOnly": false,
"name": "Dogs" // See Collection name was added
}
]
},
{
"object": "group",
"id": "group-two",
"collections": [
{
"id": "111-111-111",
"readOnly": false,
"name": "Cats" // See Collection name was added
},
{
"id": "333-333-333",
"readOnly": false,
"name": "Fish" // See Collection name was added
}
]
}
]
I've tried to use the merge method, but can't seem to get it to run on the collections nested field as I believe it's a series at that point.
Here's one approach:
First convert the json string used to construct df_all_groups (I named it all_groups here) to a dictionary using json.loads. Then use json_normalize to contruct a DataFrame with it.
Then merge the DataFrame constructed above with df_collections_with_names; we have "names" column now.
The rest is constructing the desired dictionary from the result obtained above; groupby + apply(to_dict) + reset_index + to_dict will fetch the desired outcome:
import json
out = (pd.json_normalize(json.loads(all_groups), ['collections'], ['object', 'id'], meta_prefix='_')
.merge(df_collections_with_names, on='id', suffixes=('','_'))
.drop(columns=['object','externalId']))
out = (out.groupby(['_object','_id']).apply(lambda x: x[['id','readOnly','name']].to_dict('records'))
.reset_index(name='collections'))
out.rename(columns={c: c.strip('_') for c in out.columns}).to_dict('records')
Output:
[{'object': 'group',
'id': 'group-one',
'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
{'id': '222-222-222', 'readOnly': False, 'name': 'Dogs'}]},
{'object': 'group',
'id': 'group-two',
'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
{'id': '333-333-333', 'readOnly': False, 'name': 'Fish'}]}]

How to apply filter on array of object in python

I am a Java developer trying to write a script in python. I have the following JSON: (json1)
{
"name": "Data",
"id": "9c27ea56-6f9e-4694-a088-c1423c114e88",
"nodes": [
{
"id": "6eb8cb19-53e5-4f1d-88f3-119a3ee09b5d",
"properties": [
{
"visible": true,
"uniqueness": true,
"id": "ab6d8974-59a2-4f5d-b026-d9ed54886fa8",
"dataType": "String",
"shortName": "id",
"longName": "id"
},
{
"visible": true,
"uniqueness": false,
"id": "24f5547e-c360-4293-9e67-10b19d38c774",
"dataType": "String",
"shortName": "event_id",
"longName": "event_id"
},
{
"visible": true,
"uniqueness": false,
"id": "2929d02b-950f-4495-8b36-04ea4c142f8d",
"dataType": "String",
"shortName": "event_name",
"longName": "event_name"
},
{
"visible": true,
"uniqueness": false,
"id": "3456d02b-950f-4495-8b36-04ea4c142f8d",
"dataType": "String",
"shortName": "event_date",
"longName": "Event Date"
}
]
}
]
}
And I've another same json but some modifications. Something like this:(json2)
{
"name": "Data",
"id": "9c27ea56-6f9e-4694-a088-c1423c114e88",
"nodes": [
{
"id": "6eb8cb19-53e5-4f1d-88f3-119a3ee09b5d",
"properties": [
{
"visible": true,
"uniqueness": true,
"id": "ab6d8974-59a2-4f5d-b026-d9ed54886fa8",
"dataType": "String",
"shortName": "id",
"longName": "id"
},
{
"visible": true,
"uniqueness": false,
"id": "24f5547e-c360-4293-9e67-10b19d38c774",
"dataType": "String",
"shortName": "event_id",
"longName": "event_id"
},
{
"visible": true,
"uniqueness": false,
"id": "2929d02b-950f-4495-8b36-04ea4c142f8d",
"dataType": "String",
"shortName": "event_name",
"longName": "Event Name"
},
{
"visible": true,
"uniqueness": false,
"id": "123245g-950f-4495-8b36-04ea4c142f8d",
"dataType": "String",
"shortName": "event_status",
"longName": "Event Status"
}
]
}
]
}
Now here in json2, I've added some more objects in the properties array other than json1.
When I'll iterate the json1 on the "properties" array I want to check whether it is present in the json2 array of "properties". And to check this I want to verify whether shortName is equal (in json1 properties array and json2 properties array)
Here json1 is considered to be as base object.
then How to filter "properties" array based on shortName (from "properties" array). As properties is an array of objects.
The most convenient way is to use json.loads, thus you could get then get the key, value pares.
# load the json object
jsonlist = json.loads(your_json)
#Show the list from properties
print(jsonlist['properties'])
#get the kets from the json list
for field in jsonlist['properties']:
for key, value in field.items():
print(key, value)

JSON or Python dict / list decoding problem

I have been using the Python script below to try and retrieve and extract some data from Flightradar24, it would appear that it extracts the data in JSON format and will print the data out ok fully using json.dumps, but when I attempt to select the data I want (the status text in this case) using get it gives the following error:
'list' object has no attribute 'get'
Is the Data in JSON or a List ? I'm totally confused now.
I'm fairly new to working with data in JSON format, any help would be appreciated!
Script:
import flightradar24
import json
flight_id = 'BA458'
fr = flightradar24.Api()
flight = fr.get_flight(flight_id)
y = flight.get("data")
print (json.dumps(flight, indent=4))
X= (flight.get('result').get('response').get('data').get('status').get('text'))
print (X)
Sample of output data:
{
"result": {
"request": {
"callback": null,
"device": null,
"fetchBy": "flight",
"filterBy": null,
"format": "json",
"limit": 25,
"page": 1,
"pk": null,
"query": "BA458",
"timestamp": null,
"token": null
},
"response": {
"item": {
"current": 16,
"total": null,
"limit": 25
},
"page": {
"current": 1,
"total": null
},
"timestamp": 1546241512,
"data": [
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
"live": false,
"text": "Scheduled",
"icon": null,
"estimated": null,
"ambiguous": false,
"generic": {
"status": {
"text": "scheduled",
"type": "departure",
"color": "gray",
"diverted": null
},
You can use print(type(variable_name)) to see what type it is. The .get(key[,default]) is not supported on lists - it is supported for dict's.
X = (flight.get('result').get('response').get('data').get('status').get('text'))
# ^^^^^^^^ does not work, data is a list of dicts
as data is a list of dicts:
"data": [ # <<<<<< this is a list
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
This should work:
X = (flight.get('result').get('response').get('data')[0].get('status').get('text')
The issue, as pointed out by #PatrickArtner, is your data is actually a list rather than a dictionary. As an aside, you may find your code more readable if you were to use a helper function to apply dict.get repeatedly on a nested dictionary:
from functools import reduce
def ng(dataDict, mapList):
"""Nested Getter: Iterate nested dictionary"""
return reduce(dict.get, mapList, dataDict)
X = ng(ng(flight, ['result', 'response', 'data'])[0], ['status'[, 'text']])

Manipulating data from json to reflect a single value from each entry

Setup:
This data set has 50 "issues", within these "issues" i have captured the data that I need to then put into my postgresql database. But when i get to "components" is where i have trouble. I am able to get a list of all "names" of "components" but only want to have 1 instance of "name" for each "issue", and some of them have 2. Some are empty and would like to return null for those.
Here is some sample data that should suffice:
{
"issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": null,
"customfield_2": 5
}
}
]
}
I am looking to return (just for the component name piece):
['null', 'Testing', 'Documentation']
I set up the other data for entry into the db like so:
values = list((item['key'],
//components list,
item['fields']['customfield_1'],
item['fields']['customfield_2']) for item in data_story['issues'])
I am wondering if there is a possible way to enter in the created components list where i have commented "components list" above
Just for recap, i want to have only 1 component name for each issue null or not and be able to have it put in the the values variable with the rest of the data. Also the first name in components will work for each "issue"
Here's what I would do, assuming that we are working with a data variable:
values = [(x['fields']['components'][0]['name'] if len(x['fields']['components']) != 0 else 'null') for x in data['issues']]
Let me know if you have any queries.
in dict comprehension use if/else
example code is
results = [ (x['fields']['components'][0]['name'] if 'components' in x['fields'] and len(x['fields']['components']) > 0 else 'null') for x in data['issues'] ]
full sample code is
import json
data = json.loads('''{ "issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": null,
"customfield_2": null
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": null,
"customfield_2": 5
}
}
]
}''')
results = [ (x['fields']['components'][0]['name'] if 'components' in x['fields'] and len(x['fields']['components']) > 0 else 'null') for x in data['issues'] ]
print(results)
output is ['null', u'Testing', u'Documentation']
If you just want to delete all but one of the names from the list, then you can do that this way:
issues={
"issues": [
{
"key": "1",
"fields": {
"components": [],
"customfield_1": "null",
"customfield_2": "null"
}
},
{
"key": "2",
"fields": {
"components": [
{
"name": "Testing"
}
],
"customfield_1": "null",
"customfield_2": "null"
}
},
{
"key": "3",
"fields": {
"components": [
{
"name": "Documentation"
},
{
"name": "Manufacturing"
}
],
"customfield_1": "null",
"customfield_2": 5
}
}
]
}
Data^
componentlist=[]
for i in range(len(issues["issues"])):
x= issues["issues"][i]["fields"]["components"]
if len(x)==0:
x="null"
componentlist.append(x)
else:
x=issues["issues"][i]["fields"]["components"][0]
componentlist.append(x)
print(componentlist)
>>>['null', {'name': 'Testing'}, {'name': 'Documentation'}]
Or, if you just want the values, and not the dictionary keys:
else:
x=issues["issues"][i]["fields"]["components"][0]["name"]
componentlist.append(x)
['null', 'Testing', 'Documentation']

Json extraction of specfic field via Python

Trying to get the "externalCode" field from the below incomplete json file, however i am lost, i used python to only get to second element and get the error. I am not sure how to go about traversing through a nested JSON as such below
output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
KeyError: 'benefitValueSets'
import csv, json, sys
input = open('C:/Users/kk/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
output.writerow([row['benefitCategories'], row['benefitValueSets']] + row['disabled'].values())
Json file
[
{
"benefitCategories": [
{
"benefits": [
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "$500",
"externalCode": null,
"id": null,
"internalCode": "$500",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "DEDUCTIBLE",
"displayType": null,
"externalCode": "IndividualInNetdeductibleAmount",
"id": null,
"key": "IndividualInNetdeductibleAmount",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "$500"
}
]
},
{
"benefitCode": "NutritionLabel",
"benefitCustomAttributeSets": [
],
"benefitValueSets": [
{
"benefitValues": [
null
],
"costDifferential": 0,
"default": false,
"disabled": false,
"displayValue": "100%",
"externalCode": null,
"id": null,
"internalCode": "100%",
"selected": false,
"sortOrder": 0
}
],
"configurable": false,
"displayName": "COINSURANCE",
"displayType": null,
"externalCode": "PhysicianOfficeInNetCoInsurancePct",
"id": null,
"key": "PhysicianOfficeInNetCoInsurancePct",
"productBenefitRangeValue": null,
"sortOrder": 0,
"values": [
{
"code": null,
"description": null,
"id": null,
"numericValue": null,
"selected": false,
"value": "100%"
}
]
},
{
Try this code:
import csv, json, sys
input = open('C:/Users/spolireddy/Downloads/foo.js', 'r')
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
output.writerow([row['benefitCategories'], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0], row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['disabled']])
# for externalCode:
row['benefitCategories'][0]['benefits'][0]['benefitValueSets'][0]['externalCode']
I'm not quite sure I understand what you're looking to do with your code. There are multiple externalCode values for each element in the array, at least from the sample you've posted. But you can get the data you're looking for with this syntax:
data[0]["benefitCategories"][0]["benefits"][0]["externalCode"]
data[0]["benefitCategories"][0]["benefits"][1]["externalCode"]
The code below iterates through the data you're interested in (with a slightly modified JSON file so that it's complete) and works as desired:
import csv, json, sys
input = open('junk.json', 'r')
data = json.load(input)
input.close()
for x in data[0]["benefitCategories"][0]["benefits"]:
print x["externalCode"] + "\n\n"

Categories

Resources