Flatten deeply nested JSON vertically convert to pandas - python

Hi, I am trying to flatten a JSON file but have been unable to. My JSON has three levels of nesting; a repeating sample is below:
"floors": [
{
"uuid": "8474",
"name": "some value",
"areas": [
{
"uuid": "xyz",
"**name**": "qwe",
"roomType": "Name1",
"templateUuid": "sdklfj",
"templateName": "asdf",
"templateVersion": "2.7.1",
"Required1": [
{
"**uuid**": "asdf",
"description": "asdf3",
"categoryName": "asdf",
"familyName": "asdf",
"productName": "asdf3",
"Required2": [
{
"**deviceId**": "asdf",
"**deviceUuid**": "asdf-asdf"
}
]
}
For each area I want the corresponding values in the nested Required1, and for each Required1 the corresponding Required2 values (the fields highlighted with **).
I have tried json_normalize as below, as well as other free libraries, but without success.
Attempts:
import json
import pandas as pd
from pandas import json_normalize

with open('Filename.json') as data_file:
    data_item = json.load(data_file)
Raw_Areas = json_normalize(data_item['floors'], 'areas', errors='ignore', record_prefix='Area_')
No area values are displayed, and Required1 and Required2 are still nested.
K=json_normalize(data_item['floors'][0],record_path=['Required1','Required2'],errors='ignore',record_prefix='Try_')
from flatten_json import flatten_json
Flat_J1= pd.DataFrame([flatten_json(data_item)])
I am looking to get values like the following.
Expected columns (side by side):
floors.areas.Required1.Required2.deviceUuid
floors.areas.name
Please help: am I missing anything in my attempts? I am fairly new to loading JSON.

Assuming the following JSON (as multiple people pointed out, the sample is incomplete, so I completed it based on the opening brackets you had):
dct = {"floors": [
{
"uuid": "8474",
"name": "some value",
"areas": [
{
"uuid": "xyz",
"name": "qwe",
"roomType": "Name1",
"templateUuid": "sdklfj",
"templateName": "asdf",
"templateVersion": "2.7.1",
"Required1": [
{
"uuid": "asdf",
"description": "asdf3",
"categoryName": "asdf",
"familyName": "asdf",
"productName": "asdf3",
"Required2": [
{
"deviceId": "asdf",
"deviceUuid": "asdf-asdf"
}
]
}
]
}
]
}
]}
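You can do the following (explode requires pandas 0.25.0 or later; pd.json_normalize is available from pandas 1.0, on older versions call pd.io.json.json_normalize instead):
import pandas as pd

df = pd.json_normalize(
    dct, record_path=['floors', 'areas', 'Required1'], meta=[['floors', 'areas', 'name']])
# One row per Required2 entry; reset the index so the concat below aligns row by row.
df = df.explode('Required2').reset_index(drop=True)
df = pd.concat([df, df["Required2"].apply(pd.Series)], axis=1)
df = df[['floors.areas.name', 'uuid', 'deviceId', 'deviceUuid']]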
Which gives,
>>> floors.areas.name uuid deviceId deviceUuid
>>> 0 qwe asdf asdf asdf-asdf
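If json_normalize is awkward for a particular structure, the same rows can also be collected with plain nested loops. The following is only a minimal sketch, assuming the structure of dct shown above; the function name flatten_floors and the small sample dict are made up for illustration and are not part of pandas.

```python
def flatten_floors(dct):
    """Collect one row per Required2 entry, carrying the parent area name along."""
    rows = []
    for floor in dct.get("floors", []):
        for area in floor.get("areas", []):
            for req1 in area.get("Required1", []):
                for req2 in req1.get("Required2", []):
                    rows.append({
                        "floors.areas.name": area.get("name"),
                        "uuid": req1.get("uuid"),
                        "deviceId": req2.get("deviceId"),
                        "deviceUuid": req2.get("deviceUuid"),
                    })
    return rows


# Hypothetical sample with the same shape as the JSON above.
sample = {"floors": [{"name": "some value", "areas": [{"name": "qwe", "Required1": [
    {"uuid": "asdf", "Required2": [{"deviceId": "asdf", "deviceUuid": "asdf-asdf"}]}]}]}]}

rows = flatten_floors(sample)
print(rows)
```

Passing the resulting list of dicts to pd.DataFrame(rows) should give one column per key. Missing lists are treated as empty, so areas without Required1 or Required2 simply produce no rows.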

Related

How to create nested Json data in pandas?

I have a CSV file which I convert into JSON. However, in the JSON I need to format a specific column with curly brackets.
The time field has the value "DAY=20220524", which has to be converted into {"DAY":20170801}.
json data:
{"ID":200,"Type":"ABC","time":"DAY=20220524"}
{"ID":400,"Type":"ABC","time":"NOON=20220524"}
expected output:
{"ID":200,"Type":"ABC","time": {"DAY":20170801}}
{"ID":400,"Type":"ABC","time": {"DAY":20170801}}
I am not sure how to do this. Can anyone please help me with this?
With the following file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "DAY=20220524"
},
{
"ID": 400,
"Type": "ABC",
"time": "NOON=20220524"
}
]
Here is one way to do it:
import pandas as pd
pd.read_json("file.json").assign(
time=lambda df_: df_["time"].apply(lambda x: f"{{{x}}}")
).to_json("new_file.json", orient="records")
In new_file.json:
[
{
"ID": 200,
"Type": "ABC",
"time": "{DAY=20220524}"
},
{
"ID": 400,
"Type": "ABC",
"time": "{NOON=20220524}"
}
]
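Note that the result above wraps the value in braces but still stores it as a string. If the goal is a real nested object, one option is to split each value on "=" before serializing. This is a minimal sketch using only the standard library, assuming every time value has the form KEY=integer; the helper name nest_time is hypothetical.

```python
import json


def nest_time(record):
    """Return a copy of record whose "time" string "KEY=123" becomes {"KEY": 123}."""
    key, _, raw = record["time"].partition("=")
    return {**record, "time": {key: int(raw)}}


records = [
    {"ID": 200, "Type": "ABC", "time": "DAY=20220524"},
    {"ID": 400, "Type": "ABC", "time": "NOON=20220524"},
]
converted = [nest_time(r) for r in records]
print(json.dumps(converted))
```

The original records are left unchanged, and int() raises a ValueError if a value after "=" is not numeric, which makes malformed rows easy to spot.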

Looking to generically convert JSON file to CSV in Python

I tried the solution shared in the linked question, "Nested json to csv - generic approach".
It worked for Sample 1, but gives only a single row for Sample 2.
Is there a way to write generic Python code that handles both Sample 1 and Sample 2?
Sample 1 ::
{
"Response": "Success",
"Message": "",
"HasWarning": false,
"Type": 100,
"RateLimit": {},
"Data": {
"Aggregated": false,
"TimeFrom": 1234567800,
"TimeTo": 1234567900,
"Data": [
{
"id": 11,
"symbol": "AAA",
"time": 1234567800,
"block_time": 123.282828282828,
"block_size": 1212121,
"current_supply": 10101010
},
{
"id": 12,
"symbol": "BBB",
"time": 1234567900,
"block_time": 234.696969696969,
"block_size": 1313131,
"current_supply": 20202020
}
]
}
}
Sample 2::
{
"Response": "Success",
"Message": "Summary succesfully returned!",
"Data": {
"11": {
"Id": "3333",
"Url": "test/11.png",
"value": "11",
"Name": "11 entries (11)"
},
"122": {
"Id": "5555555",
"Url": "test/122.png",
"Symbol": "122",
"Name": "122 cases (122)"
}
},
"Limit": {},
"HasWarning": False,
"Type": 50
}
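Try this; you need to install the flatten_json package first.
import sys
import csv
import json
from flatten_json import flatten

# Load the JSON file named on the command line and flatten it into one dict.
with open(sys.argv[1]) as src:
    data = flatten(json.load(src))

# Write the single flattened record as one CSV row.
with open('foo.csv', 'w', newline='') as f:
    out = csv.DictWriter(f, data.keys())
    out.writeheader()
    out.writerow(data)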
Output
> cat foo.csv
Response,Message,Data_11_Id,Data_11_Url,Data_11_value,Data_11_Name,Data_122_Id,Data_122_Url,Data_122_Symbol,Data_122_Name,Limit,HasWarning,Type
Success,Summary succesfully returned!,3333,test/11.png,11,11 entries (11),5555555,test/122.png,122,122 cases (122),{},False,50
Note: False is not valid JSON; you need to change it to false.
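The snippet above flattens the whole document into one dict, which is why it produces a single row. If one row per entry is wanted for both samples, the records have to be located first. The following is only a sketch under stated assumptions: the records live under a top-level "Data" key, either as a list under Data.Data (Sample 1) or as a dict of dicts (Sample 2). The helper names extract_records and records_to_csv, the extra "key" column, and the shortened sample dicts are all made up for illustration.

```python
import csv
import io


def extract_records(payload):
    """Find the row-like records under payload["Data"] for both sample shapes."""
    data = payload.get("Data", {})
    if isinstance(data, dict) and isinstance(data.get("Data"), list):
        # Sample 1 shape: the rows are a list nested under Data.Data.
        return list(data["Data"])
    if isinstance(data, dict):
        # Sample 2 shape: each key maps to one row; keep the key in a "key" column.
        return [{"key": k, **v} for k, v in data.items() if isinstance(v, dict)]
    return []


def records_to_csv(records):
    """Return CSV text whose header is the union of all keys, in first-seen order."""
    fieldnames = []
    for rec in records:
        for name in rec:
            if name not in fieldnames:
                fieldnames.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Shortened, hypothetical versions of Sample 1 and Sample 2.
sample1 = {"Data": {"Aggregated": False,
                    "Data": [{"id": 11, "symbol": "AAA"}, {"id": 12, "symbol": "BBB"}]}}
sample2 = {"Data": {"11": {"Id": "3333", "value": "11"},
                    "122": {"Id": "5555555", "Symbol": "122"}}}

csv1 = records_to_csv(extract_records(sample1))
csv2 = records_to_csv(extract_records(sample2))
print(csv1)
print(csv2)
```

Fields that a record lacks are written as empty cells. A truly generic converter would need a rule for deciding which nested element holds the records, so this only covers the two shapes shown in the question.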

Group By and Count occurences of values in list of nested dicts

I have a JSON file that looks structurally like this:
{
"content": [
{
"name": "New York",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
{
"name": "Los Angeles",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
{
"name": "California",
"id": "1234",
"Tags": {
"hierarchy": "STATE"
}
}
]
}
And as an outcome I would like a table view in CSV like so:
tag.key    tag.value  occurrance
hierarchy  CITY       2
hierarchy  STATE      1
That is, I want to count the occurrences of each unique "tag" in my JSON file and create an output CSV that shows this. My original JSON is a pretty large file.
First, construct a dictionary by using the ast.literal_eval function, then walk through it to collect the keys and values and zip them into tuples in order to create a dataframe. Apply groupby to the newly formed dataframe, and finally create a .csv file with df_agg.to_csv, for example:
import json
import ast
import pandas as pd
Js= """{
"content": [
{
"name": "New York",
"id": "1234",
"Tags": {
"hierarchy": "CITY"
}
},
....
....
{
"name": "California",
"id": "1234",
"Tags": {
"hierarchy": "STATE"
}
}
]
}"""
data = ast.literal_eval(Js)
key = []
value = []
for i in range(len(data['content'])):
    value.append(data['content'][i]['Tags']['hierarchy'])
    for j in data['content'][i]['Tags']:
        key.append(j)
df = pd.DataFrame(list(zip(key, value)), columns=['tag.key', 'tag.value'])
df_agg = df.groupby(['tag.key', 'tag.value']).size().reset_index(name='occurrance')
df_agg.to_csv(r'ThePath\\to\\your\\file\\result.csv', index=False)
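One caveat: ast.literal_eval only works while the text happens to be a valid Python literal, so JSON that contains true, false, or null will fail to parse; json.loads is the safer choice for real files. Below is a minimal sketch of the same count with json.loads and collections.Counter, assuming every item may carry a Tags dict. The helper name count_tags and the inline sample are made up for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import json
from collections import Counter


def count_tags(text):
    """Count each (tag key, tag value) pair across all items in "content"."""
    data = json.loads(text)
    counts = Counter()
    for item in data.get("content", []):
        for key, value in item.get("Tags", {}).items():
            counts[(key, value)] += 1
    return counts


text = json.dumps({"content": [
    {"name": "New York", "id": "1234", "Tags": {"hierarchy": "CITY"}},
    {"name": "Los Angeles", "id": "1234", "Tags": {"hierarchy": "CITY"}},
    {"name": "California", "id": "1234", "Tags": {"hierarchy": "STATE"}},
]})
counts = count_tags(text)

# Write the counts as CSV text; replace the buffer with an open file to save it.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["tag.key", "tag.value", "occurrance"])
for (key, value), n in counts.items():
    writer.writerow([key, value, n])
print(buf.getvalue())
```

Unlike the loop above, this counts every key inside Tags rather than only hierarchy, which may or may not be what is wanted for the full file.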

How to add square brackets in JSON object with python

I just need contexts to be an array, i.e. 'contexts': [{}] instead of 'contexts': {}.
Below is my Python code, which converts a pandas DataFrame to the required JSON format.
This is the sample df for one row:
name type aim context
xxx xxx specs 67646546 United States of America
data = {'entities':[]}
for key,grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
        attr_row = row[~row.index.isin(['name','type'])]
        for idx2,row2 in attr_row.iteritems():
            dict_temp = {}
            dict_temp[idx2] = {'values':[]}
            dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
            temp_dict_alpha['data']['contexts']['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Desired output:
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": [{
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
However I am getting below output
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": {
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
}
}
]
}
Can anyone please suggest ways of solving this problem using Python?
I think this does it:
import pandas as pd
import json

df = pd.DataFrame([['xxx xxx','specs','67646546','United States of America']],
                  columns = ['name', 'type', 'aim', 'context' ])
data = {'entities':[]}
for key,grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        # 'contexts' is a list holding one dict, so it serializes as [{...}]
        temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':[{'attributes':{},'context':{'country':row['context']}}]}}
        attr_row = row[~row.index.isin(['name','type'])]
        # Series.iteritems() was removed in pandas 2.0; items() works on old and new versions
        for idx2,row2 in attr_row.items():
            if idx2 != 'aim':
                continue
            dict_temp = {}
            dict_temp[idx2] = {'values':[]}
            dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
            temp_dict_alpha['data']['contexts'][0]['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Output:
{
"entities": [
{
"name": "xxx xxx",
"type": "specs",
"data": {
"contexts": [
{
"attributes": {
"aim": {
"values": [
{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
The problem is in the following line:
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
As you can see, you are already creating contexts as a dict and assigning values to it. What you could do is something like this:
contextList = []
for idx, row in grp.iterrows():
    # Build a fresh context object for every row
    contextObj = {'attributes':{},'context':{'dcountry':row['dcountry']}}
    attr_row = row[~row.index.isin(['name','type'])]
    for idx2,row2 in attr_row.items():
        dict_temp = {}
        dict_temp[idx2] = {'values':[]}
        dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
        contextObj['attributes'].update(dict_temp)
    contextList.append(contextObj)
Please note: this code is only a sketch and may contain logical errors or not run as is (it is difficult for me to understand the logic behind it), but it shows what you need to do.
You need to create a list of objects, which is not what you are doing. You are manipulating a single object, so when it is dumped to JSON you get an object back instead of a list. Create a context object on each iteration and keep appending it to the local list contextList.
Once the for loop terminates, you can update your original object with contextList, and 'contexts' will hold a list of objects instead of the single object you have now.

How to make an 'outer' JSON key for JSON object with python

I would like to make the following JSON syntax output with python:
data={
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
But I cannot manage to get the 'outer' data key there.
So far I got this output without the data key:
{
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
I have tried with this python code:
sites = [
{
"name":"nameA",
"zone":123
},
{
"name":"nameB",
"zone":324
}
]
data = {
"timestamp": 123456567,
"sites": sites
}
print(json.dumps(data, indent = 4))
But how do I manage to get the outer 'data' key there?
Once you have your data ready, you can simply do this :
data = {'data': data}
JSON doesn't have =, it's all key:value.
What you're looking for is
data = {
"data": {
"timestamp": 123456567,
"sites": sites
}
}
json.dumps(data)
json.dumps() does not see the name you give the variable in Python. You have to put the name inside the object yourself, as a string key.
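Putting both answers together, a short self-contained sketch looks like this. The variable names inner and wrapped are my own, and the values are taken from the desired output in the question.

```python
import json

sites = [{"name": "SiteA", "zone": 1}, {"name": "SiteB", "zone": 7}]
inner = {"timestamp": "1462868427", "sites": sites}

# Wrap the existing dict under a "data" key before serializing.
wrapped = {"data": inner}
text = json.dumps(wrapped, indent=4)
print(text)
```

The printed result is valid JSON with "data" as its only top-level key, written as "data": {...} rather than data={...}.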
