I have a Pandas DataFrame which I need to transform into a JSON object. I thought grouping it would achieve this, but it does not seem to yield the correct results. Further, I wouldn't know how to name the subgroup.
My data frame is as follows:
parent name age
nick   stef 10
nick   rob  12
I do a groupby, as I would like all children together under one parent in the JSON:
df = df.groupby(['parent', 'name'])['age'].min()
And I would like it to yield the following:
{
  "parent": "Nick",
  "children": [
    {
      "name": "Rob",
      "age": 10
    },
    {
      "name": "Stef",
      "age": 15
    },
    ...
  ]
}
When I call .to_json(), it seems to regroup everything by age, etc.
df.groupby(['parent'])[['name', 'age']].apply(list).to_json()
Given I wanted to add some styling, I ended up solving it as follows:
import json

df_grouped = df.groupby('parent')
new = []
for group_name, df_group in df_grouped:
    base = {}
    base['parent'] = group_name
    # collect every child row of this parent as a small dict
    children = []
    for row_index, row in df_group.iterrows():
        temp = {}
        temp['name'] = row['name']
        temp['age'] = row['age']
        children.append(temp)
    base['children'] = children
    new.append(base)
json_format = json.dumps(new)
print(json_format)
Which yielded the following results:
[
{
"parent":"fee",
"children":[
{
"name":"bob",
"age":9
},
{
"name":"stef",
"age":10
}
]
},
{
"parent":"nick",
"children":[
{
"name":"stef",
"age":10
},
{
"name":"tobi",
"age":2
},
{
"name":"ralf",
"age":12
}
]
},
{
"parent":"patrick",
"children":[
{
"name":"marion",
"age":10
}
]
}
]
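For comparison, here is a more compact sketch of the same grouping. It assumes the frame has exactly the columns parent, name and age (the two sample rows from the question are used) and lets DataFrame.to_dict(orient="records") build each children list instead of iterating rows with iterrows:

```python
import json

import pandas as pd

# Sample frame with the question's columns and rows
df = pd.DataFrame({"parent": ["nick", "nick"],
                   "name": ["stef", "rob"],
                   "age": [10, 12]})

# One entry per parent; to_dict(orient="records") turns the group's rows
# into a list of {"name": ..., "age": ...} dicts
result = [
    {"parent": parent,
     "children": group[["name", "age"]].to_dict(orient="records")}
    for parent, group in df.groupby("parent")
]

json_format = json.dumps(result, indent=2)
print(json_format)
```

groupby sorts the parents by default, so the output order follows the parent names rather than the original row order.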
I've been wrestling with this for many days now and would appreciate any help.
I'm importing an Excel file into a Pandas DataFrame, resulting in the following DataFrame [record]:
account_id name   timestamp           value
A0001C     Fund_1 1588618800000000000 1
B0001B     Dev_2  1601578800000000000 1
I'm looking to produce a nested JSON output (it will be used to submit data to an API), including adding "records" and "metrics" labels for the arrays.
Here is the output I'm looking for:
{
"records": [
{
"name": "Fund_1",
"account_id": "A0001C",
"metrics": [
{
"timestamp": 1588618800000000000,
"value": 1
}
]
},
{
"name": "Dev_2",
"account_id": "B0001B",
"metrics": [
{
"timestamp": 1601578800000000000,
"value": 1
}
]
}
]
}
I've managed to produce a non-nested JSON data set, but I'm not able to split out the timestamp and value to add the metrics part.
import json

for record in df.to_dict(orient='records'):
    record_data = {'records': [record]}
    payload_json = json.dumps(record_data)
    print(payload_json)
I get the following output:
{"records": [{"account_id": "A0001C", "name": "Fund_1", "Date Completed": 1588618800000000000, "Count": "1"}]}
{"records": [{"account_id": "B0001B", "name": "Dev_2", "Date Completed": 1601578800000000000, "Count": "1"}]}
Any help on how I can modify my code to add the metrics label and nest the data would be appreciated.
Thanks in advance.
One approach is to use DataFrame.apply, which applies a function along either axis of your DataFrame (column-wise or row-wise).
In your particular case you want to apply the function row by row, so you have to use apply with axis=1:
records = list(df.apply(lambda row: {"name": row["name"],
                                     "account_id": row["account_id"],
                                     "metrics": [{
                                         "timestamp": row["timestamp"],
                                         "value": row["value"]}]
                                     },
                        axis=1).values)
payload = {"records": records}
Alternatively, you could introduce an auxiliary column "metrics" in which you store your metrics (and subsequently convert the frame with DataFrame.to_dict):
df["metrics"] = df.apply(lambda e: [{"timestamp": e.timestamp,
                                     "value": e.value}],
                         axis=1)
records = df[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}
Here's a full example applying option 2:
import io
import json
import pandas as pd

data = io.StringIO("""account_id name timestamp value
A0001C Fund_1 1588618800000000000 1
B0001B Dev_2 1601578800000000000 1""")
# the sample text is whitespace-separated, so split on runs of whitespace
df = pd.read_csv(data, sep=r"\s+")
df["metrics"] = df.apply(lambda e: [{"timestamp": e.timestamp,
                                     "value": e.value}],
                         axis=1)
records = df[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}
print(json.dumps(payload, indent=4))
Output:
{
"records": [
{
"account_id": "A0001C",
"name": "Fund_1",
"metrics": [
{
"timestamp": 1588618800000000000,
"value": 1
}
]
},
{
"account_id": "B0001B",
"name": "Dev_2",
"metrics": [
{
"timestamp": 1601578800000000000,
"value": 1
}
]
}
]
}
Edit: The second approach also makes grouping by accounts (in case you want to do that) rather easy. Below is a small example and output:
import io
import json
import pandas as pd

data = io.StringIO("""account_id name timestamp value
A0001C Fund_1 1588618800000000000 1
A0001C Fund_1 1588618900000000000 2
B0001B Dev_2 1601578800000000000 1""")
df = pd.read_csv(data, sep=r"\s+")
# adding the metrics column as above (one dict per row this time)
df["metrics"] = df.apply(lambda e: {"timestamp": e.timestamp,
                                    "value": e.value},
                         axis=1)
# group metrics by account
df_grouped = df.groupby(by=["name", "account_id"]).metrics.agg(list).reset_index()
records = df_grouped[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}
print(json.dumps(payload, indent=4))
Output:
{
"records": [
{
"account_id": "B0001B",
"name": "Dev_2",
"metrics": [
{
"timestamp": 1601578800000000000,
"value": 1
}
]
},
{
"account_id": "A0001C",
"name": "Fund_1",
"metrics": [
{
"timestamp": 1588618800000000000,
"value": 1
},
{
"timestamp": 1588618900000000000,
"value": 2
}
]
}
]
}
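If you would rather avoid the row-wise apply, the same grouped payload can be assembled in plain Python from to_dict(orient="records"). This is a minimal sketch assuming the same four columns as the example; unlike the groupby version, it keeps accounts in first-seen order instead of sorting them by name:

```python
import json

import pandas as pd

# Same sample rows as the grouped example
df = pd.DataFrame({
    "account_id": ["A0001C", "A0001C", "B0001B"],
    "name": ["Fund_1", "Fund_1", "Dev_2"],
    "timestamp": [1588618800000000000, 1588618900000000000, 1601578800000000000],
    "value": [1, 2, 1],
})

# Accumulate metrics per (account_id, name); dicts keep insertion order
grouped = {}
for row in df.to_dict(orient="records"):
    key = (row["account_id"], row["name"])
    entry = grouped.setdefault(key, {"account_id": row["account_id"],
                                     "name": row["name"],
                                     "metrics": []})
    entry["metrics"].append({"timestamp": row["timestamp"],
                             "value": row["value"]})

payload = {"records": list(grouped.values())}
print(json.dumps(payload, indent=4))
```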
I have a CSV file loaded into a DataFrame with the following structure:
My DataFrame:
I want to put the data into the following JSON format using Python. I looked at a couple of links, but I got lost in the nested part. The links I checked:
How to convert pandas dataframe to uniquely structured nested json
convert dataframe to nested json
{
  "PHI": 2,
  "firstname": "john",
  "medicalHistory": {
    "allergies": "egg",
    "event": {
      "inPatient": {
        "hospitalized": {
          "visit": "7-20-20",
          "noofdays": "5",
          "test": {
            "modality": "xray"
          },
          "vitalSign": {
            "temperature": "32",
            "heartRate": "80"
          },
          "patientcondition": {
            "headache": "1",
            "cough": "0"
          }
        },
        "icu": {
          "visit": "",
          "noofdays": ""
        }
      },
      "outpatient": {
        "visit": "5-20-20",
        "vitalSign": {
          "temperature": "32",
          "heartRate": "80"
        },
        "patientcondition": {
          "headache": "1",
          "cough": "1"
        },
        "test": {
          "modality": "blood"
        }
      }
    }
  }
}
If anyone can help me with the nested array, that will be really helpful.
You need one or more helper functions to unpack the data in the table like this. Write a main helper function that accepts two arguments: 1. the df and 2. a schema. The schema is used to unpack the df into a nested structure for each row in the df. The schema below is an example of how to achieve this for a subset of the logic you describe. Although it is not exactly what you specified in your example, it should be enough of a hint for you to complete the rest of the task on your own.
from operator import itemgetter

groupby_idx = ['PHI', 'firstName']
groups = df.groupby(groupby_idx, as_index=False)

schema = {
    "event": {
        "eventType": itemgetter('event'),
        "visit": itemgetter('visit'),
        "noOfDays": itemgetter('noofdays'),
        "test": {
            "modality": itemgetter('test')
        },
        "vitalSign": {
            "temperature": itemgetter('temperature'),
            "heartRate": itemgetter('heartRate')
        },
        "patientCondition": {
            "headache": itemgetter('headache'),
            "cough": itemgetter('cough')
        }
    }
}

def unpack(obj, schema):
    # dicts in the schema become nested dicts; callables pull a value from the row
    tmp = {}
    for k, v in schema.items():
        if isinstance(v, dict):
            tmp[k] = unpack(obj, v)
        elif callable(v):
            tmp[k] = v(obj)
    return tmp

def apply_unpack(groups, schema):
    # one list of unpacked rows per (PHI, firstName) group
    results = {}
    for gidx, group_df in groups:
        events = []
        for ridx, obj in group_df.iterrows():
            d = unpack(obj, schema)
            events.append(d)
        results[gidx] = events
    return results

unpacked = apply_unpack(groups, schema)
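Because apply_unpack keys its results by (PHI, firstName) tuples, the dict cannot be passed to json.dumps directly. The sketch below is one way to finish the job: it reuses the same itemgetter-schema idea on a small hypothetical DataFrame (the column names and values are assumptions, since the question's table was only shown as an image) and emits one JSON object per patient, keying each event by its eventType:

```python
import json
from operator import itemgetter

import pandas as pd

# Hypothetical sample frame; real column names may differ
df = pd.DataFrame({
    "PHI": [2, 2],
    "firstName": ["john", "john"],
    "event": ["inPatient", "outpatient"],
    "visit": ["7-20-20", "5-20-20"],
    "temperature": ["32", "32"],
})

# Reduced schema in the same style as the answer's
schema = {
    "eventType": itemgetter("event"),
    "visit": itemgetter("visit"),
    "vitalSign": {"temperature": itemgetter("temperature")},
}

def unpack(obj, schema):
    """Recursively build a nested dict from one row, following the schema."""
    tmp = {}
    for k, v in schema.items():
        if isinstance(v, dict):
            tmp[k] = unpack(obj, v)
        elif callable(v):
            tmp[k] = v(obj)
    return tmp

# Tuple group keys are not valid JSON keys, so emit one object per patient
patients = []
for (phi, first_name), group_df in df.groupby(["PHI", "firstName"]):
    event = {}
    for _, row in group_df.iterrows():
        d = unpack(row, schema)
        event[d.pop("eventType")] = d  # e.g. {"inPatient": {...}}
    patients.append({"PHI": int(phi), "firstname": first_name,
                     "medicalHistory": {"event": event}})

print(json.dumps(patients, indent=2))
```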
If I have json data formatted like this:
{
"result": [
{
"id": 878787,
"name": "Testing",
"schema": {
"id": 3463463,
"smartElements": [
{
"svKey": "Model",
"value": {
"type": "type1",
"value": "ThisValue"
}
},
{
"svKey": "SecondKey",
"value": {
"type": "example",
"value": "ThisValue2"
}
}
]
}
},
{
"id": 333,
"name": "NameName",
"schema": {
"id": 1111,
"smartElements": [
{
"svKey": "Model",
"value": {
"type": "type1",
"value": "NewValue"
}
},
{
"svKey": "SecondKey",
"value": {
"type": "example",
"value": "ValueIs"
}
}
]
}
}
]
}
Is there a way to normalize it so I end up with these records:
name Model SecondKey
Testing ThisValue ThisValue2
NameName NewValue ValueIs
I can get the smartElements into a pandas Series, but I can't figure out how to turn smartElements[x].svKey into a column header and smartElements[x].value.value into the value for that column, and/or merge it.
I'd skip trying to use a pre-baked solution and just navigate the json yourself.
import json
import pandas as pd

with open('my.json') as f:
    data = json.load(f)

records = []
for d in data['result']:
    record = {}
    record['name'] = d['name']
    # one column per svKey, filled with the nested value.value
    for ele in d['schema']['smartElements']:
        record[ele['svKey']] = ele['value']['value']
    records.append(record)

pd.DataFrame(records)
name Model SecondKey
0 Testing ThisValue ThisValue2
1 NameName NewValue ValueIs
My solution
import pandas as pd
import json

with open('test.json') as f:
    a = json.load(f)

d = pd.json_normalize(data=a['result'], errors='ignore',
                      record_path=['schema', 'smartElements'], meta=['name'])
print(d)
produces
svKey value.type value.value name
0 Model type1 ThisValue Testing
1 SecondKey example ThisValue2 Testing
2 Model type1 NewValue NameName
3 SecondKey example ValueIs NameName
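The json_normalize result above is in long format (one row per smart element). To reach the wide layout asked for in the question, it can be pivoted on svKey. A minimal sketch, using a trimmed copy of the question's JSON inline instead of test.json:

```python
import pandas as pd

# Trimmed copy of the question's JSON (only the fields used here)
a = {"result": [
    {"name": "Testing", "schema": {"smartElements": [
        {"svKey": "Model", "value": {"type": "type1", "value": "ThisValue"}},
        {"svKey": "SecondKey", "value": {"type": "example", "value": "ThisValue2"}}]}},
    {"name": "NameName", "schema": {"smartElements": [
        {"svKey": "Model", "value": {"type": "type1", "value": "NewValue"}},
        {"svKey": "SecondKey", "value": {"type": "example", "value": "ValueIs"}}]}},
]}

# Long format: one row per smart element, with the parent's name as metadata
d = pd.json_normalize(a["result"], record_path=["schema", "smartElements"], meta=["name"])

# Wide format: one column per svKey, filled from value.value
wide = (d.pivot(index="name", columns="svKey", values="value.value")
          .reset_index()
          .rename_axis(columns=None))
print(wide)
```

pivot sorts the index, so NameName comes before Testing. If a name could carry the same svKey twice, pivot raises a ValueError; pivot_table(..., aggfunc="first") is a possible fallback in that case.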
I have a few static key columns (EmployeeId, type) and a few columns coming from the first for loop.
In the second for loop, values should be appended to the existing DataFrame columns only if a specific key is present; otherwise, the columns fetched by the first for loop should remain the same.
First For Loop Output:
"EmployeeId","type","KeyColumn","Start","End","Country","Target","CountryId","TargetId"
"Emp1","Metal","1212121212","2000-06-17","9999-12-31","","","",""
After the second for loop, I have the output below:
"EmployeeId","type","KeyColumn","Start","End","Country","Target","CountryId","TargetId"
"Emp1","Metal","1212121212","2000-06-17","9999-12-31","","AMAZON","1",""
"Emp1","Metal","1212121212","2000-06-17","9999-12-31","","FLIPKART","2",""
With my code, if the Employee tag is available I get the 2 records above. But some JSON files may not have the Employee tag; in that case the output should remain the same as the first loop's output, with all the key fields populated and the remaining columns null.
But my code gives me 0 records. Please let me know if my approach is wrong.
Sorry if the question is not clear; I am new to Python. The sample data is in the link below.
Here is my code:
for i in range(len(json_file['enty'])):
    temp = {}
    temp['EmployeeId'] = json_file['enty'][i]['id']
    temp['type'] = json_file['enty'][i]['type']
    for key in json_file['enty'][i]['data']['attributes'].keys():
        try:
            temp[key] = json_file['enty'][i]['data']['attributes'][key]['values'][0]['value']
        except:
            temp[key] = None
    for key in json_file['enty'][i]['data']['attributes'].keys():
        if(key == 'Employee'):
            for j in range(len(json_file['enty'][i]['data']['attributes']['Employee']['group'])):
                for key in json_file['enty'][i]['data']['attributes']['Employee']['group'][j].keys():
                    try:
                        temp[key] = json_file['enty'][i]['data']['attributes']['Employee']['group'][j][key]['values'][0]['value']
                    except:
                        temp[key] = None
                temp_df = pd.DataFrame([temp])
                df = pd.concat([df, temp_df], sort=True)
# Rearranging columns
df = df[['EmployeeId', 'type'] + [col for col in df.columns if col not in ['EmployeeId', 'type']]]
# Writing the dataset
df[columns_list].to_csv("Test22.csv", index=False, quotechar='"', quoting=1)
If the Employee tag is not available I get 0 records as output, but I expect 1 record, as in the first loop's output.
The JSON structure is quite complicated, so I tried to simplify the data collection from it. The result is a list of flat dicts. The code handles the case where 'Employee' is not found.
import copy
d = {
"enty": [
{
"id": "Emp1",
"type": "Metal",
"data": {
"attributes": {
"KeyColumn": {
"values": [
{
"value": 1212121212
}
]
},
"End": {
"values": [
{
"value": "2050-12-31"
}
]
},
"Start": {
"values": [
{
"value": "2000-06-17"
}
]
},
"Employee": {
"group": [
{
"Target": {
"values": [
{
"value": "AMAZON"
}
]
},
"CountryId": {
"values": [
{
"value": "1"
}
]
}
},
{
"Target": {
"values": [
{
"value": "FLIPKART"
}
]
},
"CountryId": {
"values": [
{
"value": "2"
}
]
}
}
]
}
}
}
}
]
}
emps = []
for e in d['enty']:
    entry = {'id': e['id'], 'type': e['type']}
    for x in ["KeyColumn", "Start", "End"]:
        entry[x] = e['data']['attributes'][x]['values'][0]['value']
    if e['data']['attributes'].get('Employee'):
        # one row per Employee group, each starting from a copy of the key fields
        for grp in e['data']['attributes']['Employee']['group']:
            clone = copy.deepcopy(entry)
            for x in ['Target', 'CountryId']:
                clone[x] = grp[x]['values'][0]['value']
            emps.append(clone)
    else:
        # no Employee tag: keep a single row with only the key fields
        emps.append(entry)
# TODO write to csv
for emp in emps:
    print(emp)
output
{'End': '2050-12-31', 'Target': 'AMAZON', 'KeyColumn': 1212121212, 'Start': '2000-06-17', 'CountryId': '1', 'type': 'Metal', 'id': 'Emp1'}
{'End': '2050-12-31', 'Target': 'FLIPKART', 'KeyColumn': 1212121212, 'Start': '2000-06-17', 'CountryId': '2', 'type': 'Metal', 'id': 'Emp1'}
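The answer leaves the CSV step as a TODO. The following sketch fills it in with pandas, assuming the column order shown in columns. It adds a hypothetical second entry, Emp2, without an Employee tag to show that such an entry still produces one row, with Target and CountryId left empty:

```python
import copy
import csv

import pandas as pd

def val(node):
    # every leaf in this JSON is wrapped as {"values": [{"value": ...}]}
    return node["values"][0]["value"]

d = {"enty": [
    {"id": "Emp1", "type": "Metal", "data": {"attributes": {
        "KeyColumn": {"values": [{"value": 1212121212}]},
        "Start": {"values": [{"value": "2000-06-17"}]},
        "End": {"values": [{"value": "2050-12-31"}]},
        "Employee": {"group": [
            {"Target": {"values": [{"value": "AMAZON"}]},
             "CountryId": {"values": [{"value": "1"}]}},
        ]},
    }}},
    # hypothetical entry without an "Employee" tag
    {"id": "Emp2", "type": "Metal", "data": {"attributes": {
        "KeyColumn": {"values": [{"value": 3434343434}]},
        "Start": {"values": [{"value": "2001-01-01"}]},
        "End": {"values": [{"value": "9999-12-31"}]},
    }}},
]}

emps = []
for e in d["enty"]:
    attrs = e["data"]["attributes"]
    entry = {"id": e["id"], "type": e["type"]}
    for x in ["KeyColumn", "Start", "End"]:
        entry[x] = val(attrs[x])
    groups = attrs.get("Employee", {}).get("group", [])
    if groups:
        for grp in groups:
            clone = copy.deepcopy(entry)
            for x in ["Target", "CountryId"]:
                clone[x] = val(grp[x]) if x in grp else None
            emps.append(clone)
    else:
        emps.append(entry)  # key fields only; the rest stays empty

# Missing keys become NaN, which to_csv writes as empty fields
columns = ["id", "type", "KeyColumn", "Start", "End", "Target", "CountryId"]
df = pd.DataFrame(emps, columns=columns)
csv_text = df.to_csv(index=False, quoting=csv.QUOTE_ALL)
print(csv_text)
```

Passing a file name as the first argument of to_csv writes the file instead of returning the text.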
I am trying to convert a multi-level hierarchy table into a specific JSON format for a visual I am creating.
I have the data in a pandas dataframe and have tried grouping it by the different levels, but then cannot convert a groupby to a json using pandas. I did also try just converting the dataframe to a json, but the format isn't correct. I am not sure what else to do to get the parent/child format that I am looking for. All the "size" values only need to be 1 so that part seems straightforward enough...
Thanks in advance!
**This is what my data looks like**
ColA ColB ColC
Parent1 Child1
Parent1 Child2 Child2A
Parent1 Child2 Child2B
Parent1 Child3 Child2A
Parent2 Child1
Parent2 Child2 Child2A
What I get from the pandas DataFrame's to_json builds the JSON column by column, so I lose the hierarchy.
So it's:
data = {"Parent1}"{"index #":"col2 value"
What I want is:
data = ({ "name":"TEST",
"children": [
{
"name": "Parent1",
"children":
[
{
"name": "Child1",
"size": "1"
},
{
"name":"Child2",
"children":
[
{
"name":"Child2A",
"size":"1"
},
{
"name":"Child2B",
"size":"1"
},
{
"name":"Child2C",
"size":"1"
},
{
"name":"Child2D",
"size":"1"
},
],
},
{
"name":"Parent2",
"children": [
{
"name":"Child2A",
"size":"1"
},
{
"name":"Child2B",
"size":"1"
},
{
"name":"Child2C",
"size":"1"
},
]
},
]
},
{
"name": "Parent3",
"children":
[
{
"name": "Child1",
"size": "1",
},
{
"name":"Child2",
"children":
[
{
"name":"Child2A",
"size":"1"
},
{
"name":"Child2B",
"size":"1"
},
{
"name":"Child2C",
"size":"1"
},
],
},
{
"name":"Child3",
"children":
[
{
"name":"Child3A",
"size":"1"
},
],
},
],
},
]})
Here you go:
import json

data = [
    'Parent1 Child1',
    'Parent1 Child2 Child2A',
    'Parent1 Child2 Child2B',
    'Parent1 Child3 Child2A',
    'Parent2 Child1',
    'Parent2 Child2 Child2A',
]

# Build a nested dict, e.g. {'Parent1': {'Child1': {}, 'Child2': {...}}, ...}
tree = {}
for d in data:
    node = None
    for item in d.split():
        name = item.strip()  # don't need spaces
        current_dict = tree if node is None else node
        node = current_dict.get(name)
        if node is None:
            node = {}
            current_dict[name] = node

def walker(src, res):
    # leaves get a size, inner nodes get a children list
    for name, value in src.items():
        node = {'name': name}
        if value:
            walker(value, node)
        else:
            node['size'] = 1
        res.setdefault('children', []).append(node)

result = {'name': 'TEST'}
walker(tree, result)
print(json.dumps(result, indent=2))
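The answer starts from a list of strings. If the data is already in the DataFrame from the question, a similar tree can be built straight from its rows. This sketch assumes that empty cells (None/NaN) only occur at the end of a row, i.e. a missing ColC means the ColB entry is a leaf:

```python
import json

import pandas as pd

# The question's table; missing ColC cells are None
df = pd.DataFrame({
    "ColA": ["Parent1", "Parent1", "Parent1", "Parent1", "Parent2", "Parent2"],
    "ColB": ["Child1", "Child2", "Child2", "Child3", "Child1", "Child2"],
    "ColC": [None, "Child2A", "Child2B", "Child2A", None, "Child2A"],
})

# Nested dict of dicts, stopping at the first empty cell in each row
tree = {}
for row in df.itertuples(index=False):
    node = tree
    for name in row:
        if pd.isna(name):
            break
        node = node.setdefault(name, {})

def to_children(subtree):
    """Turn {name: subtree} into a list of {"name", "children"|"size"} dicts."""
    children = []
    for name, sub in subtree.items():
        if sub:
            children.append({"name": name, "children": to_children(sub)})
        else:
            children.append({"name": name, "size": 1})
    return children

result = {"name": "TEST", "children": to_children(tree)}
print(json.dumps(result, indent=2))
```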