Appending elements in lists in conversion to JSON - python

I'm coding a tool that reads an xlsx file and converts it to JSON. I'm using Python 3 and pandas 0.23.0. Here is the data that my code reads from the xlsx file:
id label id_customer label_customer part_number
6 Sao Paulo CUST-99992 Brazil 7897
6 Sao Paulo CUST-99992 Brazil 1437
92 Hong Hong CUST-88888 China 785
==================================
Here is my code:
import pandas as pd
import json
file_imported = pd.read_excel('testing.xlsx', sheet_name = 'Plan1')
list_final = []
for index, row in file_imported.iterrows():
    list1 = []
    list_final.append ({
        "id" : int(row['id']),
        "label" : str(row['label']),
        "Customer" : list1
    })
    list2 = []
    list1.append ({
        "id" : str(row['id_customer']) ,
        "label" : str(row['label_customer']),
        "number" : list2
    })
    list2.append({
        "part" : str(row['part_number'])
    })
print (list_final)
with open ('testing.json', 'w') as f:
    json.dump(list_final, f, indent= True)
==================================
My code is working, and this is the output that I'm getting:
[
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "7897"
}
]
}
]
},
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "1437"
}
]
}
]
},
{
"id": 92,
"label": "Hong Hong",
"Customer": [
{
"id": "CUST-88888",
"label": "China",
"number" : [
{
"part": "785"
}
]
}
]
}
]
==================================
and I need something like this:
[
{
"id": 6,
"label": "Sao Paulo",
"Customer": [
{
"id": "CUST-99992",
"label": "Brazil",
"number" : [
{
"part": "7897"
},
{
"part": "1437"
}
]
}
]
},
{
"id": 92,
"label": "Hong Hong",
"Customer": [
{
"id": "CUST-88888",
"label": "China",
"number" : [
{
"part": "785"
}
]
}
]
}
]
==================================
I have searched other topics here and elsewhere, but haven't found anything useful yet. This is just a piece of my code and Excel file (they are too big to post here). I believe I have to use an 'if' statement to check the content of each row before adding it to my JSON, but I don't know how to do it.
I can have a lot of 'Customer' and 'number' lists inside 'list_final' with different contents (this is why I laid out my Excel file like that).
Could anyone help me?

Try this code. I assumed your data is in a pandas DataFrame.
import pandas as pd
def part(value):
    # Split the "#"-joined string back into a list of {"part": ...} dicts
    data = value.split("#")
    part_list = []
    for elements in data:
        part_list.append({"part" : elements})
    return part_list
path = yourpath  # placeholder: path to your Excel file
data = pd.read_excel(path)
# Part numbers must be strings so they can be joined
data["part_number"] = data["part_number"].apply(lambda x: str(x))
# One row per (id, label, customer), with the part numbers joined by "#"
data = data.groupby(["id", "label", "id_customer", "label_customer"], as_index=False).agg("#".join)
data["part_number"] = data["part_number"].apply(lambda x: part(x))
data = data.rename(columns={"id_customer": "Customer", "part_number": "number"})
# Wrap each column in a small dict, then merge the dicts into one customer entry
data["label_customer"] = data["label_customer"].apply(lambda x: {"label": x})
data["Customer"] = data["Customer"].apply(lambda x: {"id": x})
data["number"] = data["number"].apply(lambda x: {"number": x})
data["Customer"] = data.apply(lambda x: [{**x["Customer"], **x["label_customer"], **x["number"]}], axis=1)
data = data[["id", "label", "Customer"]]
data.to_json(path_you_want, orient="records")  # placeholder: output path
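A plain-Python alternative, closer to the 'if' check the question has in mind, is to remember which id and customer entries already exist and append to them instead of creating new ones. This is only a sketch: it assumes the rows are available as a list of dicts (for example from file_imported.to_dict("records")), and the rows list and the nest_rows name are made up for illustration.

```python
import json

# Hypothetical stand-in for file_imported.to_dict("records")
rows = [
    {"id": 6, "label": "Sao Paulo", "id_customer": "CUST-99992",
     "label_customer": "Brazil", "part_number": 7897},
    {"id": 6, "label": "Sao Paulo", "id_customer": "CUST-99992",
     "label_customer": "Brazil", "part_number": 1437},
    {"id": 92, "label": "Hong Hong", "id_customer": "CUST-88888",
     "label_customer": "China", "part_number": 785},
]

def nest_rows(rows):
    """Group flat rows into id -> Customer -> number lists."""
    entries = {}    # id -> entry dict already placed in the result
    customers = {}  # (id, id_customer) -> customer dict already placed
    result = []
    for row in rows:
        entry = entries.get(row["id"])
        if entry is None:  # first time this id is seen
            entry = {"id": int(row["id"]), "label": str(row["label"]), "Customer": []}
            entries[row["id"]] = entry
            result.append(entry)
        key = (row["id"], row["id_customer"])
        customer = customers.get(key)
        if customer is None:  # first time this customer is seen under this id
            customer = {"id": str(row["id_customer"]),
                        "label": str(row["label_customer"]),
                        "number": []}
            customers[key] = customer
            entry["Customer"].append(customer)
        # Every row contributes exactly one part
        customer["number"].append({"part": str(row["part_number"])})
    return result

list_final = nest_rows(rows)
print(json.dumps(list_final, indent=2))
```

The two lookup dicts play the role of the 'if' statement: a row only creates a new entry when its key has not been seen before.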

Related

Flattening Multi-Level Nested Object to DataFrame

I am trying to convert an object/dictionary to a pandas DataFrame using the following code:
sr = pd.Series(object)
df = pd.DataFrame(sr.values.tolist())
display(df)
It works well but some of the output columns are of object/dictionary type, and I would like to break them up to multiple columns, for example, if column "Items" produces the following value in a cell:
obj = {
"item1": {
"id": "item1",
"relatedItems": [
{
"id": "1111",
"category": "electronics"
},
{
"id": "9999",
"category": "electronics",
"subcategory": "computers"
},
{
"id": "2222",
"category": "electronics",
"subcategory": "computers",
"additionalData": {
"createdBy": "Doron",
"inventory": 100
}
}
]
},
"item2": {
"id": "item2",
"relatedItems": [
{
"id": "4444",
"category": "furniture",
"subcategory": "sofas"
},
{
"id": "5555",
"category": "books",
},
{
"id": "6666",
"category": "electronics",
"subcategory": "computers",
"additionalData": {
"createdBy": "Joe",
"inventory": 5,
"condition": {
"name": "new",
"inspectedBy": "Doron"
}
}
}
]
}
}
The desired output is:
I tried using df.explode, but it turns the row into multiple rows. I am looking for a way to achieve the same split, but into columns, while retaining a single row.
Any suggestions?
You can use the pd.json_normalize function to flatten the nested dictionary into multiple columns, with the keys joined with a dot (.).
sr = pd.Series({
'Items': {
'item_name': 'name',
'item_value': 'value'
}
})
df = pd.json_normalize(sr, sep='.')
display(df)
This will give you the following df
Items.item_name Items.item_value
0 name value
You can also specify the level of nesting by passing the record_path parameter to pd.json_normalize, for example, to only flatten the 'Items' key:
df = pd.json_normalize(sr, 'Items', sep='.')
display(df)
Seems like you're looking for pandas.json_normalize, which has a sep parameter:
obj = {
'name': 'Doron Barel',
'items': {
'item_name': 'name',
'item_value': 'value',
'another_item_prop': [
{
'subitem1_name': 'just_another_name',
'subitem1_value': 'just_another_value',
},
{
'subitem2_name': 'one_more_name',
'subitem2_value': 'one_more_value',
}
]
}
}
​
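df = pd.json_normalize(obj, sep='.')
# Pull out the list column and give each list element its own row
ser = df.pop('items.another_item_prop').explode()
# Expand the dicts into prefixed columns, then collapse back to one row per name
out = (df.join(pd.DataFrame(ser.tolist(), index=ser.index)
                 .rename(columns= lambda x: ser.name+"."+x))
         .groupby("name", as_index=False).first()
      )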
Output :
print(out)
​
name items.item_name items.item_value items.another_item_prop.subitem1_name items.another_item_prop.subitem1_value items.another_item_prop.subitem2_name items.another_item_prop.subitem2_value
0 Doron Barel name value just_another_name just_another_value one_more_name one_more_value
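For objects like the one in the question, where the lists hold dicts with different keys, another option is to flatten everything yourself before building the frame. The sketch below is not part of either answer: flatten is a hypothetical helper that joins dict keys and list positions with a separator, and item is a shortened copy of the question's "item1" entry.

```python
def flatten(value, prefix="", sep="."):
    """Return a flat dict whose keys are sep-joined paths into value."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(child, path, sep))
    elif isinstance(value, list):
        # List elements are addressed by their position
        for position, child in enumerate(value):
            path = f"{prefix}{sep}{position}" if prefix else str(position)
            flat.update(flatten(child, path, sep))
    else:
        flat[prefix] = value  # scalar leaf
    return flat

# Shortened, hypothetical version of the "item1" entry from the question
item = {
    "id": "item1",
    "relatedItems": [
        {"id": "1111", "category": "electronics"},
        {"id": "2222", "additionalData": {"createdBy": "Doron", "inventory": 100}},
    ],
}

row = flatten(item)
print(row)
```

Each flattened dict is one row, so a list of them (one per item) can be passed to pd.DataFrame to get one row per item with a column for every path.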

How to add square brackets in JSON object with python

I just need contexts to be an array, i.e. 'contexts': [{}] instead of 'contexts': {}.
Below is my Python code, which converts a pandas DataFrame to the required JSON format.
This is a sample df with one row:
name type aim context
xxx xxx specs 67646546 United States of America
data = {'entities':[]}
for key,grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
        attr_row = row[~row.index.isin(['name','type'])]
        for idx2,row2 in attr_row.iteritems():
            dict_temp = {}
            dict_temp[idx2] = {'values':[]}
            dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
            temp_dict_alpha['data']['contexts']['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Desired output:
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": [{
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
However I am getting below output
{
"entities": [{
"name": "XXX XXX",
"type": "specs",
"data": {
"contexts": {
"attributes": {
"aim": {
"values": [{
"value": 67646546,
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
}
}
]
}
Can anyone please suggest a way to solve this problem in Python?
I think this does it:
import pandas as pd
import json
df = pd.DataFrame([['xxx xxx','specs','67646546','United States of America']],
                  columns = ['name', 'type', 'aim', 'context' ])
data = {'entities':[]}
for key,grp in df.groupby('name'):
    for idx, row in grp.iterrows():
        # 'contexts' is now a list holding one dict
        temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':[{'attributes':{},'context':{'country':row['context']}}]}}
        attr_row = row[~row.index.isin(['name','type'])]
        # Series.iteritems() was removed in pandas 2.0; items() is the replacement
        for idx2,row2 in attr_row.items():
            if idx2 != 'aim':
                continue
            dict_temp = {}
            dict_temp[idx2] = {'values':[]}
            dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
            temp_dict_alpha['data']['contexts'][0]['attributes'].update(dict_temp)
        data['entities'].append(temp_dict_alpha)
print(json.dumps(data, indent = 4))
Output:
{
"entities": [
{
"name": "xxx xxx",
"type": "specs",
"data": {
"contexts": [
{
"attributes": {
"aim": {
"values": [
{
"value": "67646546",
"source": "internal",
"locale": "en_Us"
}
]
}
},
"context": {
"country": "United States of America"
}
}
]
}
}
]
}
The problem is in the following line:
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
As you can see, you are already creating contexts as a dict and assigning values to it. What you could do instead is something like this:
contextObj = {'attributes':{},'context':{'dcountry':row['dcountry']}}
contextList = []
for idx, row in grp.iterrows():
temp_dict_alpha = {'name':key,'type':row['type'],'data' :{'contexts':{'attributes':{},'context':{'dcountry':row['dcountry']}}}}
attr_row = row[~row.index.isin(['name','type'])]
for idx2,row2 in attr_row.iteritems():
dict_temp = {}
dict_temp[idx2] = {'values':[]}
dict_temp[idx2]['values'].append({'value':row2,'source':'internal','locale':'en_Us'})
contextObj['attributes'].update(dict_temp)
contextList.append(contextObj)
Please note: this code has logical errors and might not run as written (it is difficult for me to understand the logic behind yours), but it shows what you need to do.
You need to create a list of objects, which is not what you are doing. You are manipulating a single object, so when it is dumped to JSON you get an object back instead of a list. Create a context object on each iteration and append it to the local list contextList created earlier.
Once the for loop terminates, update your original object with contextList, and you will have a list of objects instead of the single object you have now.
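The difference between the two shapes comes down to whether the value stored under 'contexts' is a dict or a list containing that dict. A minimal sketch with the standard json module, using a hypothetical context shaped like the one in the question:

```python
import json

# Hypothetical single context, shaped like the one in the question
context = {
    "attributes": {
        "aim": {"values": [{"value": 67646546, "source": "internal", "locale": "en_Us"}]}
    },
    "context": {"country": "United States of America"},
}

# A dict is serialized as a JSON object: "contexts": {...}
as_object = json.dumps({"contexts": context})

# A list holding the dict is serialized as a JSON array: "contexts": [{...}]
as_array = json.dumps({"contexts": [context]})

print(as_object)
print(as_array)
```

So the square brackets appear as soon as the Python value is a list, whether it is built up front or collected in a loop.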

How to convert DataFrame into nested JSON

I'm trying to export a DataFrame into a nested (hierarchical) JSON for D3.js, using a solution that only handles one level (parent, children).
Any help would be appreciated. I'm new to Python.
My DataFrame contains 7 levels.
Here is the expected result.
JSON Example:
{
"name": "World",
"children": [
{
"name": "Europe",
"children": [
{
"name": "France",
"children": [
{
"name": "Paris",
"population": 1000000
}]
}]
}]
}
and here is the python method:
def to_flare_json(df, filename):
    """Convert dataframe into nested JSON as in flare files used for D3.js"""
    flare = dict()
    d = {"name":"World", "children": []}
    for index, row in df.iterrows():
        parent = row[0]
        child = row[1]
        child1 = row[2]
        child2 = row[3]
        child3 = row[4]
        child4 = row[5]
        child5 = row[6]
        child_value = row[7]
        # Make a list of keys
        key_list = []
        for item in d['children']:
            key_list.append(item['name'])
        #if 'parent' is NOT a key in flare.JSON, append it
        if not parent in key_list:
            d['children'].append({"name": parent, "children":[{"value": child_value, "name1": child}]})
        # if parent IS a key in flare.json, add a new child to it
        else:
            d['children'][key_list.index(parent)]['children'].append({"value": child_value, "name11": child})
    flare = d
    # export the final result to a json file
    with open(filename +'.json', 'w') as outfile:
        json.dump(flare, outfile, indent=4,ensure_ascii=False)
    return ("Done")
[EDIT]
Here is a sample of my df
World Continent Region Country State City Boroughs Population
1 Europe Western Europe France Ile de France Paris 17 821964
1 Europe Western Europe France Ile de France Paris 19 821964
1 Europe Western Europe France Ile de France Paris 20 821964
The structure you want is clearly recursive so I made a recursive function to fill it:
def create_entries(df):
    entries = []
    # Stopping case
    if df.shape[1] == 2:  # only 2 columns left
        for i in range(df.shape[0]):  # iterating on rows
            entries.append(
                {"Name": df.iloc[i, 0],
                 df.columns[-1]: df.iloc[i, 1]}
            )
    # Iterating case
    else:
        values = set(df.iloc[:, 0])  # Getting the set of unique values
        for v in values:
            entries.append(
                {"Name": v,
                 # reiterating the process but without the first column
                 # and only the rows with the current value
                 "Children": create_entries(
                     df.loc[df.iloc[:, 0] == v].iloc[:, 1:]
                 )}
            )
    return entries
All that's left is to create the dictionary and call the function:
mydict = {"Name": "World",
"Children": create_entries(data.iloc[:, 1:])}
Then you just write your dict to a JSON file.
I hope my comments are explicit enough. The idea is to recursively use the first column of the dataset as the "Name" and the remaining columns as the "Children".
Thank you Syncrossus for the answer, but this results in a separate branch for each borough or city.
The result is this:
"Name": "World",
"Children": [
{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "17ème",
"Population": 821964
}
]
}
]
}
]
}
]
},{
"Name": "Western Europe",
"Children": [
{
"Name": "France",
"Children": [
{
"Name": "Ile de France",
"Children": [
{
"Name": "Paris",
"Children": [
{
"Name": "10ème",
"Population": 154623
}
]
}
]
}
]
}
]
}
But the desired result is this
"Name": "World",
"Children": [
{
"Continent": "Europe",
"Children": [
{
"Region": "Western Europe",
"Children": [
{
"Country": "France",
"Children": [
{
"State": "Ile De France",
"Children": [
{
"City": "Paris",
"Children": [
{
"Boroughs": "17ème",
"Population": 82194
},
{
"Boroughs": "16ème",
"Population": 99194
}
]
},
{
"City": "Saint-Denis",
"Children": [
{
"Boroughs": "10ème",
"Population": 1294
},
{
"Boroughs": "11ème",
"Population": 45367
}
]
}
]
}
]
},
{
"Country": "Belgium",
"Children": [
{
"State": "Oost-Vlaanderen",
"Children": [
{
"City": "Gent",
"Children": [
{
"Boroughs": "2ème",
"Population": 1234
},
{
"Boroughs": "4ème",
"Population": 7456
}
]
}
]
}
]
}
]
}
]
}
]

Nested Json to csv with python 3

I am trying to convert JSON data into a CSV in Python and found this code on Stack Exchange from a while back (link: How can I convert JSON to CSV?). It no longer works in Python 3 and gives me various errors. Does anyone know how to fix it for Python 3? Thanks.
Below is my JSON data:
{ "fruit": [
{ "name": "Apple",
"binomial name": "Malus domestica",
"major_producers": [ "China", "United States", "Turkey" ],
"nutrition":
{ "carbohydrates": "13.81g",
"fat": "0.17g",
"protein": "0.26g"
}
},
{ "name": "Orange",
"binomial name": "Citrus x sinensis",
"major_producers": [ "Brazil", "United States", "India" ],
"nutrition":
{ "carbohydrates": "11.75g",
"fat": "0.12g",
"protein": "0.94g"
}
},
{ "name": "Mango",
"binomial name": "Mangifera indica",
"major_producers": [ "India", "China", "Thailand" ],
"nutrition":
{ "carbohydrates": "15g",
"fat": "0.38g",
"protein": "0.82g"
}
}
] }
The output CSV should look like
The easiest way would be to load the desired dict into a pandas DataFrame and use its .to_csv() method:
json_data = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
df = pd.DataFrame(json_data['fruit'])
df.to_csv('/wherever/file/shall/roam/test.csv')
which leads to a csv file like
Still using pandas but slightly different approach by treating your JSON as a dictionary
import pandas as pd
from pprint import pprint
x = { "fruit": [ { "name": "Apple", "binomial name": "Malus domestica", "major_producers": [ "China", "United States", "Turkey" ], "nutrition": { "carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g" } }, { "name": "Orange", "binomial name": "Citrus x sinensis", "major_producers": [ "Brazil", "United States", "India" ], "nutrition": { "carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g" } }, { "name": "Mango", "binomial name": "Mangifera indica", "major_producers": [ "India", "China", "Thailand" ], "nutrition": { "carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g" } } ] }
Add some additional information to the dict that will give additional headers closer to the desired output.
for item in x['fruit']:
    for index, country in enumerate(item['major_producers']):
        new_key = 'major_producers'+str(index + 1)
        item[new_key] = country
    item['carbs'] = item['nutrition']['carbohydrates']
    item['fat'] = item['nutrition']['fat']
    item['protein']= item['nutrition']['protein']
Pretty print of the updated dict
pprint(x['fruit'])
Create the pandas dataframe from the list of dicts as in:
xdf = pd.DataFrame.from_dict(x['fruit'])
Use only the headers you require
xdf = xdf[['name', 'binomial name', 'major_producers1','major_producers2','major_producers3','carbs','fat','protein']]
Then, as @SpghttCd mentions, you can use the DataFrame's to_csv method. No need for the index in this case.
xdf.to_csv('filename.csv',index=False)
The csv file should look like this:
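If pandas is not available, the same flattening can be done with the standard csv module. This is a sketch under a few assumptions: the data is a shortened copy of the question's JSON, flatten_fruit is a hypothetical helper, every fruit lists the same number of producers, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Shortened copy of the "fruit" list from the question
fruit = [
    {"name": "Apple", "binomial name": "Malus domestica",
     "major_producers": ["China", "United States", "Turkey"],
     "nutrition": {"carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g"}},
    {"name": "Mango", "binomial name": "Mangifera indica",
     "major_producers": ["India", "China", "Thailand"],
     "nutrition": {"carbohydrates": "15g", "fat": "0.38g", "protein": "0.82g"}},
]

def flatten_fruit(item):
    """Turn one nested fruit record into a flat dict, one key per CSV column."""
    row = {"name": item["name"], "binomial name": item["binomial name"]}
    for index, country in enumerate(item["major_producers"], start=1):
        row[f"major_producers{index}"] = country
    row["carbs"] = item["nutrition"]["carbohydrates"]
    row["fat"] = item["nutrition"]["fat"]
    row["protein"] = item["nutrition"]["protein"]
    return row

rows = [flatten_fruit(item) for item in fruit]

# The buffer stands in for open("filename.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The header is taken from the first row, so a fruit with more producers than the first one would make DictWriter raise a ValueError; in that case the field names would need to be collected from all rows first.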

Convert JSON with nested objects to Pandas Dataframe

I am trying to load json from a url and convert to a Pandas dataframe, so that the dataframe would look like the sample below.
I've tried json_normalize, but it duplicates the columns, one for each data type (value and stringValue). Is there a simpler way than this method and then dropping and renaming columns after creating the dataframe? I want to keep the stringValue.
Person ID Position ID Job ID Manager
0 192 936 93 Tom
my_json = {
"columns": [
{
"alias": "c3",
"label": "Person ID",
"dataType": "integer"
},
{
"alias": "c36",
"label": "Position ID",
"dataType": "string"
},
{
"alias": "c40",
"label": "Job ID",
"dataType": "integer",
"entityType": "job"
},
{
"alias": "c19",
"label": "Manager",
"dataType": "integer"
},
],
"data": [
{
"c3": {
"value": 192,
"stringValue": "192"
},
"c36": {
"value": "936",
"stringValue": "936"
},
"c40": {
"value": 93,
"stringValue": "93"
},
"c19": {
"value": 12412453,
"stringValue": "Tom"
}
}
]
}
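This should work, provided c19's dataType is declared as string (in the sample it is integer, so the Manager column would get 12412453 rather than Tom):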
import pandas as pd
# Map each column alias (c3, c36, ...) to its display label
alias_to_label = {x['alias']: x['label'] for x in my_json["columns"]}
# Remember which aliases are declared as strings
is_str = {x['alias']: ('string' == x['dataType']) for x in my_json["columns"]}
data = []
for x in my_json["data"]:
    data.append({
        k: v["stringValue" if is_str[k] else 'value']
        for k, v in x.items()
    })
df = pd.DataFrame(data).rename(columns=alias_to_label)
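Since the question says it wants to keep stringValue, a simpler variant is to take stringValue for every column regardless of the declared dataType. The sketch below uses a trimmed, hypothetical copy of my_json and only builds the list of row dicts.

```python
# Trimmed, hypothetical copy of my_json from the question
my_json = {
    "columns": [
        {"alias": "c3", "label": "Person ID", "dataType": "integer"},
        {"alias": "c19", "label": "Manager", "dataType": "integer"},
    ],
    "data": [
        {"c3": {"value": 192, "stringValue": "192"},
         "c19": {"value": 12412453, "stringValue": "Tom"}},
    ],
}

# Map each alias to its label, then keep only stringValue for every cell
alias_to_label = {col["alias"]: col["label"] for col in my_json["columns"]}
rows = [
    {alias_to_label[alias]: cell["stringValue"] for alias, cell in record.items()}
    for record in my_json["data"]
]
print(rows)
```

Passing rows to pd.DataFrame would then give the labelled columns directly, with every value as a string; numeric columns would need converting afterwards if required.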
