Using pandas and json_normalize to flatten nested JSON API response

Using pandas and json_normalize to flatten nested JSON API response - python

I have a deeply nested JSON that I am trying to turn into a Pandas Dataframe using json_normalize.
A generic sample of the JSON data I'm working with looks looks like this (I've added context of what I'm trying to do at the bottom of the post):
{
"per_page": 2,
"total": 1,
"data": [{
"total_time": 0,
"collection_mode": "default",
"href": "https://api.surveymonkey.com/v3/responses/5007154325",
"custom_variables": {
"custvar_1": "one",
"custvar_2": "two"
},
"custom_value": "custom identifier for the response",
"edit_url": "https://www.surveymonkey.com/r/",
"analyze_url": "https://www.surveymonkey.com/analyze/browse/",
"ip_address": "",
"pages": [
{
"id": "103332310",
"questions": [{
"answers": [{
"choice_id": "3057839051"
}
],
"id": "319352786"
}
]
},
{
"id": "44783164",
"questions": [{
"id": "153745381",
"answers": [{
"text": "some_name"
}
]
}
]
},
{
"id": "44783183",
"questions": [{
"id": "153745436",
"answers": [{
"col_id": "1087201352",
"choice_id": "1087201369",
"row_id": "1087201362"
}, {
"col_id": "1087201353",
"choice_id": "1087201373",
"row_id": "1087201362"
}
]
}
]
}
],
"date_modified": "1970-01-17T19:07:34+00:00",
"response_status": "completed",
"id": "5007154325",
"collector_id": "50253586",
"recipient_id": "0",
"date_created": "1970-01-17T19:07:34+00:00",
"survey_id": "105723396"
}
],
"page": 1,
"links": {
"self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
}
}
I'd like to end up with a dataframe that contains the question_id, page_id, response_id, and response data like this:
choice_id col_id row_id text question_id page_id response_id
0 3057839051 NaN NaN NaN 319352786 103332310 5007154325
1 NaN NaN NaN some_name 153745381 44783164 5007154325
2 1087201369 1087201352 1087201362 NaN 153745436 44783183 5007154325
3 1087201373 1087201353 1087201362 NaN 153745436 44783183 5007154325
I can get close by running the following code (Python 3.6):
df = json_normalize(data=so_survey_responses['data'], record_path=['pages', 'questions'], meta='id', record_prefix ='question_')
print(df)
Which returns:
question_answers question_id id
0 [{'choice_id': '3057839051'}] 319352786 5007154325
1 [{'text': 'some_name'}] 153745381 5007154325
2 [{'col_id': '1087201352', 'choice_id': '108720... 153745436 5007154325
But if I try to run json_normalize at a deeper nest and keep the 'question_id' data from the above result, I can only get the page_id values to return, not true question_id values:
answers_df = json_normalize(data=so_survey_responses['data'], record_path=['pages', 'questions', 'answers'], meta=['id', ['questions', 'id'], ['pages', 'id']])
print(answers_df)
Returns:
choice_id col_id row_id text id questions.id pages.id
0 3057839051 NaN NaN NaN 5007154325 103332310 103332310
1 NaN NaN NaN some_name 5007154325 44783164 44783164
2 1087201369 1087201352 1087201362 NaN 5007154325 44783183 44783183
3 1087201373 1087201353 1087201362 NaN 5007154325 44783183 44783183
A complicating factor may be that all the above (question_id, page_id, response_id) are 'id:' in the JSON data.
I'm sure this is possible, but I can't get there. Any examples of how to do this?
Additional context:
I'm trying to create a dataframe of SurveyMonkey API response output.
My long term goal is to re-create the "all responses" excel sheet that their export service provides.
I plan to do this by getting the response dataframe set up (above), and then use .apply() to match responses with their survey structure API output.
I've found the SurveyMonkey API pretty lackluster at providing useful output, but I'm new to Pandas so it's probably on me.

You need to modify the meta parameter of your last option, and, if you want to rename columns to be exactly the way you want, you could do it with rename:
answers_df = json_normalize(data=so_survey_responses['data'],
record_path=['pages', 'questions', 'answers'],
meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])\
.rename(index=str,
columns={'id': 'response_id', 'pages.questions.id': 'question_id', 'pages.id': 'page_id'})

There is no way to do this in a completely generic way using json_normalize(). You can use the record_path and meta arguments to indicate how you want the JSON to be processed.
However, you can use the flatten package to flatten your deeply nested JSON and then convert that to a Pandas dataframe. The page has example usage of how to flatten a deeply-nested JSON and convert to a Pandas dataframe.

Related

How to flatten dict in a DataFrame & concatenate all resultant rows

I am using Github's GraphQL API to fetch some issue details.
I used Python Requests to fetch the data locally.
This is how the output.json looks like
{
"data": {
"viewer": {
"login": "some_user"
},
"repository": {
"issues": {
"edges": [
{
"node": {
"id": "I_kwDOHQ63-s5auKbD",
"title": "test issue 1",
"number": 146,
"createdAt": "2023-01-06T06:39:54Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:42:00Z",
"comments": {
"edges": [
{
"node": {
"id": "IC_kwDOHQ63-s5R2XCV",
"body": "comment 01"
}
},
{
"node": {
"id": "IC_kwDOHQ63-s5R2XC9",
"body": "comment 02"
}
}
]
},
"labels": {
"edges": []
}
},
"cursor": "Y3Vyc29yOnYyOpHOWrimww=="
},
{
"node": {
"id": "I_kwDOHQ63-s5auKm8",
"title": "test issue 2",
"number": 147,
"createdAt": "2023-01-06T06:40:34Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:40:34Z",
"comments": {
"edges": []
},
"labels": {
"edges": [
{
"node": {
"name": "food"
}
},
{
"node": {
"name": "healthy"
}
}
]
}
},
"cursor": "Y3Vyc29yOnYyOpHOWripvA=="
}
]
}
}
}
}
The json was put inside a list using
result = response.json()["data"]["repository"]["issues"]["edges"]
And then this list was put inside a DataFrame
import pandas as pd
df = pd.DataFrame (result, columns = ['node', 'cursor'])
df
These are the contents of the data frame
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
{'edges': [{'node': {'id': 'IC_kwDOHQ63-s5R2XCV","body": "comment 01"}},{'node': {'id': 'IC_kwDOHQ63-s5R2XC9","body": "comment 02"}}]}
{'edges': []}
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
{'edges': []}
{'edges': [{'node': {'name': 'food"}},{'node': {'name': 'healthy"}}]}
I would like to split/explode the comments and labels columns.
The values in these columns are nested dictionaries
I would like there to be as many rows for a single issue, as there are comments & labels.
I would like to flatten out the data frame.
So this involves split/explode and concat.
There are several stackoverflow answers that delve on this topic. And I have tried the code from several of them.
I can not paste the links to those questions, because stackoverflow marks my question as spam due to many links.
But these are the steps I have tried
df3 = df2['comments'].apply(pd.Series)
Drill down further
df4 = df3['edges'].apply(pd.Series)
df4
Drill down further
df5 = df4['node'].apply(pd.Series)
df5
The last statement above gives me the KeyError: 'node'
I understand, this is because node is not a key in the DataFrame.
But how else can i split this dictionary and concatenate all columns back to my issues row?
This is how I would like the output to look like
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
comment 01
Null
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
None
OPEN
2023-01-06T06:42:00Z
comment 02
Null
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
Null
food
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
None
OPEN
2023-01-06T06:40:34Z
Null
healthy

If dct is your dictionary from the question you can try:
df = pd.DataFrame(d['node'] for d in dct['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df.to_markdown(index=False))
Prints:
id
title
number
createdAt
closedAt
state
updatedAt
comments
labels
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
OPEN
2023-01-06T06:42:00Z
comment 01
nan
I_kwDOHQ63-s5auKbD
test issue 1
146
2023-01-06T06:39:54Z
OPEN
2023-01-06T06:42:00Z
comment 02
nan
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
OPEN
2023-01-06T06:40:34Z
nan
food
I_kwDOHQ63-s5auKm8
test issue 2
147
2023-01-06T06:40:34Z
OPEN
2023-01-06T06:40:34Z
nan
healthy

#andrej-kesely has answered my question.
I have selected his response as the answer for this question.
I am now posting a consolidated script that includes my poor code and andrej's great code.
In this script i want to fetch details from Github's GraphQL API Server.
And put it inside pandas.
Primary source for this script is this gist.
And a major chunk of remaining code is an answer by #andrej-kesely.
Now onto the consolidated script.
First import the necessary packages and set headers
import requests
import json
import pandas as pd
headers = {"Authorization": "token <your_github_personal_access_token>"}
Now define the query that will fetch data from github.
In my particular case, I am fetching issue details form a particular repo
it can be something else for you.
query = """
{
viewer {
login
}
repository(name: "your_github_repo", owner: "your_github_user_name") {
issues(states: OPEN, last: 2) {
edges {
node {
id
title
number
createdAt
closedAt
state
updatedAt
comments(first: 10) {
edges {
node {
id
body
}
}
}
labels(orderBy: {field: NAME, direction: ASC}, first: 10) {
edges {
node {
name
}
}
}
comments(first: 10) {
edges {
node {
id
body
}
}
}
}
cursor
}
}
}
}
"""
Execute the query and save the response
def run_query(query):
request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
if request.status_code == 200:
return request.json()
else:
raise Exception("Query failed to run by returning code of {}. {}".format(request.status_code, query))
result = run_query(query)
And now is the trickiest part.
In my query response, there are several nested dictionaries.
I would like to split them - more details in my question above.
This magic code from #andrej-kesely does that for you.
df = pd.DataFrame(d['node'] for d in result['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df)

Python Pandas json_normalize with multiple lists of dicts

I'm trying to flatten a JSON file that was originally converted from XML using xmltodict(). There are multiple fields that may have a list of dictionaries. I've tried using record_path with meta data to no avail, but I have not been able to get it to work when there are multiple fields that may have other nested fields. It's expected that some fields will be empty for any given record
I have tried searching for another topic and couldn't find my specific problem with multiple nested fields. Can anyone point me in the right direction?
Thanks for any help that can be provided!
Sample base Python (without the record path)
import pandas as pd
import json
with open('./example.json', encoding="UTF-8") as json_file:
json_dict = json.load(json_file)
df = pd.json_normalize(json_dict['WIDGET'])
print(df)
df.to_csv('./test.csv', index=False)
Sample JSON
{
"WIDGET": [
{
"ID": "6",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "1",
"TITLE": "Battery's Missing",
"CATEGORY": "User Error",
"LAST_SERVICE": "2020-01-04T17:39:37Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "1",
"COMPONENTS": {
"WHATNOTS": {
"WHATNOT1": "Battery Compartment",
"WHATNOT2": "Whirlygig"
}
},
"DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
"STATUS": "0",
"CONTACT_TYPE": {
"CALL": "1"
}
},
{
"ID": "1004",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "4",
"TITLE": "Flames emit from unit",
"CATEGORY": "Dangerous",
"LAST_SERVICE": "2015-06-04T21:40:12Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "0",
"DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
"CONSEQUENCE": "Could burn things",
"SOLUTION": "Call an exorcist",
"KNOWN_PROBLEMS": {
"PROBLEM": [
{
"TYPE": "RECALL",
"NAME": "Bad Servo",
"DESCRIPTION": "Bad servo's shipped in initial product"
},
{
"TYPE": "FAILURE",
"NAME": "Operating outside normal conditions",
"DESCRIPTION": "Device failed when customer threw into wood chipper"
}
]
},
"STATUS": "1",
"REPAIR_BULLETINS": {
"BULLETIN": [
{
"#id": "4",
"#text": "Known target of the occult"
},
{
"#id": "5",
"#text": "Not meant to be thrown into wood chippers"
}
]
},
"CONTACT_TYPE": {
"CALL": "1"
}
}
]
}
Sample CSV
ID
PROBLEM
SEVERITY_LEVEL
TITLE
CATEGORY
LAST_SERVICE
NOTICE_DATE
FIXABLE
DIAGNOSIS
STATUS
COMPONENTS.WHATNOTS.WHATNOT1
COMPONENTS.WHATNOTS.WHATNOT2
CONTACT_TYPE.CALL
CONSEQUENCE
SOLUTION
KNOWN_PROBLEMS.PROBLEM
REPAIR_BULLETINS.BULLETIN
6
Electrical
1
Battery's Missing
User Error
2020-01-04T17:39:37Z
2022-01-01T08:00:00Z
1
Customer needs to put batteries in the battery compartment
0
Battery Compartment
Whirlygig
1
1004
Electrical
4
Flames emit from unit
Dangerous
2015-06-04T21:40:12Z
2022-01-01T08:00:00Z
0
A demon seems to have possessed the unit and his expelling flames from it
1
1
Could burn things
Call an exorcist
[{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': "Bad servo's shipped in initial product"}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}]
[{'#id': '4', '#text': 'Known target of the occult'}, {'#id': '5', '#text': 'Not meant to be thrown into wood chippers'}]

I have attempted to extract the data and turned it into nested dictionary (instead of nested with list), so that pd.json_normalize() can work
for row in range(len(json_dict['WIDGET'])):
try:
lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
for i, item in enumerate(lis):
json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item
lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
for i, item in enumerate(lis):
json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
except KeyError:
continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
If you have to manually add the varying keys from the larger dataset, here's a way to extract them automatically by identifying them as type list (and provided they are nested by 2 levels only)
linkage = []
for item in json_dict['WIDGET']:
for k1 in item.keys(): #get keys from first level
if isinstance(item[k1], str):
continue
#print(item[k1])
for k2 in item[k1].keys(): #get keys from second level
if isinstance(item[k1][k2], str):
continue
#print(item[k1][k2])
if isinstance(item[k1][k2], list):
linkage.append((k1, k2))
print(linkage)
# [('KNOWN_PROBLEMS', 'PROBLEM'), ('REPAIR_BULLETINS', 'BULLETIN')]
for row in range(len(json_dict['WIDGET'])):
for link in linkage:
try:
lis = json_dict['WIDGET'][row][link[0]][link[1]]
del json_dict['WIDGET'][row][link[0]][link[1]] #delete original dict value (which is a list)
for i, item in enumerate(lis):
json_dict['WIDGET'][row][link[0]][str(i)] = item #replace list with dict value (which is a dict)
except KeyError:
continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
Output:
0 1
ID 6 1004
PROBLEM Electrical Electrical
SEVERITY_LEVEL 1 4
TITLE Battery's Missing Flames emit from unit
CATEGORY User Error Dangerous
LAST_SERVICE 2020-01-04T17:39:37Z 2015-06-04T21:40:12Z
NOTICE_DATE 2022-01-01T08:00:00Z 2022-01-01T08:00:00Z
FIXABLE 1 0
DIAGNOSIS Customer needs to put batt... A demon seems to have poss...
STATUS 0 1
COMPONENTS.WHATNOTS.WHATNOT1 Battery Compartment NaN
COMPONENTS.WHATNOTS.WHATNOT2 Whirlygig NaN
CONTACT_TYPE.CALL 1 1
CONSEQUENCE NaN Could burn things
SOLUTION NaN Call an exorcist
KNOWN_PROBLEMS.0.TYPE NaN RECALL
KNOWN_PROBLEMS.0.NAME NaN Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION NaN Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE NaN FAILURE
KNOWN_PROBLEMS.1.NAME NaN Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION NaN Device failed when custome...
REPAIR_BULLETINS.0.#id NaN 4
REPAIR_BULLETINS.0.#text NaN Known target of the occult
REPAIR_BULLETINS.1.#id NaN 5
REPAIR_BULLETINS.1.#text NaN Not meant to be thrown int...

Creating pandas dataframe from accessing specific keys of nested dictionary

How can below dictionary converted to expected dataframe like below?
{
"getArticleAttributesResponse": {
"attributes": [{
"articleId": {
"id": "2345",
"locale": "en_US"
},
"keyValuePairs": [{
"key": "tags",
"value": "[{\"displayName\": \"Nice\", \"englishName\": \"Pradeep\", \"refKey\": \"Key2\"}, {\"displayName\": \"Family Sharing\", \"englishName\": \"Sarvendra\", \"refKey\": \"Key1\", \"meta\": {\"customerDisplayable\": [false]}}}]"
}]
}]
}
}
Expected dataframe:
id displayName englistname refKey
2345 Nice Pradeep Key2
2345 Family Sharing Sarvendra Key1

df1 = pd.DataFrame(d['getDDResponse']['attributes']).explode('keyValuePairs')
df2 = pd.concat([df1[col].apply(pd.Series) for col in df1],1).assign(value = lambda x :x.value.apply(eval)).explode('value')
df = pd.concat([df2[col].apply(pd.Series) for col in df2],1)
OUTPUT:
0 0 display englishName reference source
0 1234 tags Unarchived Unarchived friend monster

Extracting str from pandas dataframe using json

I read csv file into a dataframe named df
Each rows contains str below.
'{"id":2140043003,"name":"Olallo Rubio",...}'
I would like to extract "name" and "id" from each row and make a new dataframe to store the str.
I use the following codes to extract but it shows an error. Please let me know if there is any suggestions on how to solve this problem. Thanks
JSONDecodeError: Expecting ',' delimiter: line 1 column 32 (char 31)

text={
"id": 2140043003,
"name": "Olallo Rubio",
"is_registered": True,
"chosen_currency": 'Null',
"avatar": {
"thumb": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium": "https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls": {
"web": {
"user": "https://www.kickstarter.com/profile/2140043003"
},
"api": {
"user": "https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}
def extract(text,*args):
list1=[]
for i in args:
list1.append(text[i])
return list1
print(extract(text,'name','id'))
# ['Olallo Rubio', 2140043003]

Here's what I came up with using pandas.json_normalize():
import pandas as pd
sample = [{
"id": 2140043003,
"name":"Olallo Rubio",
"is_registered": True,
"chosen_currency": None,
"avatar":{
"thumb":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=40&h=40&fit=crop&v=1510685152&auto=format&q=92&s=653706657ccc49f68a27445ea37ad39a",
"small":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa",
"medium":"https://ksr-ugc.imgix.net/assets/019/223/259/16513215a3869caaea2d35d43f3c0c5f_original.jpg?w=160&h=160&fit=crop&v=1510685152&auto=format&q=92&s=0bd2f3cec5f12553e679153ba2b5d7fa"
},
"urls":{
"web":{
"user":"https://www.kickstarter.com/profile/2140043003"
},
"api":{
"user":"https://api.kickstarter.com/v1/users/2140043003?signature=1531480520.09df9a36f649d71a3a81eb14684ad0d3afc83e03"
}
}
}]
# Create datafrane
df = pd.json_normalize(sample)
# Select columns into new dataframe.
df1 = df.loc[:, ["name", "id",]]
Check df1:
Input:
print(df1)
Output:
name id
0 Olallo Rubio 2140043003

A efficient way to unpack nested json into a dataframe

I have a nested json, and i want to transform it into a pandas dataframe. I was able to normalize with json_normalize.
However, there are still json layer within the dataframe, which i also want to unpack. How can i do it in the best way? I will likely have to deal with this a few more times within the project i am doing currently
The json i have is the following
{
"data": {
"allOpportunityApplication": {
"data": [
{
"id": "111111111",
"opportunity": {
"programme": {
"short_name": "XX"
}
},
"person": {
"home_lc": {
"name": "NAME"
}
},
"standards": [
{
"constant_name": "constant1",
"standard_option": {
"option": "true"
}
},
{
"constant_name": "constant2",
"standard_option": {
"option": "true"
}
}
]
}
]
}
}
}
Used json_normalize
standards_df = json_normalize(
standard_json['allOpportunityApplication']['data'],
record_path=['standards'],
meta=['id','person','opportunity']
)
with that i get a dataframe with the columns: constant_name, standard_option, id, person, opportunity. The problem is that the data standard_option, person and opportunity are json, with a single option inside.
The current ouput and expected output for each column is as follow
Standard_option
Currently an item in the column "standard_option" looks like:
{'option': 'true'}
I want it to be just true
Person
Currently an item in the column "person" looks like:
{'programme': {'short_name': 'XX'}}
I want it to look like: XX
Opportunity
Currently an item in the column "opportunity" looks like:
{'home_lc': {'name': 'NAME'}}
I want it to look like: NAME

Might not be the best way, but I think it works.
standards_df['person'] = (standards_df.loc[:, 'person']
.apply(lambda x: x['home_lc']['name']))
standards_df['opportunity'] = (standards_df.loc[:, 'opportunity']
.apply(lambda x: x['programme']['short_name']))
constant_name standard_option.option id person opportunity
0 constant1 true 111111111 NAME XX
1 constant2 true 111111111 NAME XX
standard_option was already fine when I run your code

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Using pandas and json_normalize to flatten nested JSON API response - python

Related

How to flatten dict in a DataFrame & concatenate all resultant rows

Python Pandas json_normalize with multiple lists of dicts

Creating pandas dataframe from accessing specific keys of nested dictionary

Extracting str from pandas dataframe using json

A efficient way to unpack nested json into a dataframe

Categories

Resources