Multiple relational tables to nested JSON format using Python

I'm trying to create a nested JSON object by combining several relational tables using Python/pandas. I'm a beginner in Python/pandas, so I'm looking for a bit of help here.
In the following example I'm using CSV files instead of tables, just to keep it simple.
Table1.csv
Emp_id, Gender, Age
1, M, 32
2, M, 35
3, F, 31
Table2.csv
Emp_id, Month, Incentive
1, Aug, 3000
1, Sep, 3500
1, Oct, 2000
2, Aug, 1500
3, Aug, 5000
3, Sep, 2400
I want to create a JSON object like the one below.
Required output:
{
"data": [{
"employee": 1,
"gender": "M",
"age": 32,
"incentive": [{
"aug": 3000,
"sep": 3500,
"oct": 2000
}]
}, {
"employee": 2,
"gender": "M",
"age": 35,
"incentive": [{
"aug": 1500
}]
}, {
"employee": 3,
"gender": "F",
"age": 31,
"incentive": [{
"aug": 5000,
"sep": 2400
}]
}]
}

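First merge the two tables with a left join, then group by the employee columns and use a lambda to turn each group's Month/Incentive pairs into a dictionary. Convert the result with to_dict, then wrap it under a top-level key and serialize it to JSON: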
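import json
# Assumes df1 and df2 were loaded from Table1.csv and Table2.csv,
# e.g. with pd.read_csv(..., skipinitialspace=True)
d = (df1.merge(df2, on='Emp_id', how='left')
        # Select several columns with a list; a bare tuple is rejected by recent pandas
        .groupby(['Emp_id', 'Gender', 'Age'])[['Month', 'Incentive']]
        .apply(lambda x: [dict(x.values)])
        .reset_index(name='Incentive')
        .to_dict(orient='records')
)
# print(d)
json_str = json.dumps({'data': d})  # a separate name keeps the json module usable
print(json_str)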
{
"data": [{
"Emp_id": 1,
"Gender": "M",
"Age": 32,
"Incentive": [{
"Aug": 3000,
"Sep": 3500,
"Oct": 2000
}]
}, {
"Emp_id": 2,
"Gender": "M",
"Age": 35,
"Incentive": [{
"Aug": 1500
}]
}, {
"Emp_id": 3,
"Gender": "F",
"Age": 31,
"Incentive": [{
"Aug": 5000,
"Sep": 2400
}]
}]
}
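The output above keeps the original column names, whereas the question asks for lowercase keys such as "employee" and "aug". As a rough sketch of one way to get that shape, here is a version that swaps pandas for the standard-library csv module (a different technique from the answer above). The in-memory strings TABLE1 and TABLE2 and the helper read_rows are assumptions made for illustration; they stand in for the two CSV files.

```python
import csv
import io
import json

# In-memory stand-ins for Table1.csv and Table2.csv from the question.
TABLE1 = """Emp_id, Gender, Age
1, M, 32
2, M, 35
3, F, 31
"""
TABLE2 = """Emp_id, Month, Incentive
1, Aug, 3000
1, Sep, 3500
1, Oct, 2000
2, Aug, 1500
3, Aug, 5000
3, Sep, 2400
"""


def read_rows(text):
    # skipinitialspace drops the blank that follows each comma.
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))


# Collect one {month: amount} dict per employee id.
incentives = {}
for row in read_rows(TABLE2):
    per_employee = incentives.setdefault(row["Emp_id"], {})
    per_employee[row["Month"].lower()] = int(row["Incentive"])

data = [
    {
        "employee": int(row["Emp_id"]),
        "gender": row["Gender"],
        "age": int(row["Age"]),
        # Employees without any incentive rows get an empty dict.
        "incentive": [incentives.get(row["Emp_id"], {})],
    }
    for row in read_rows(TABLE1)
]

result = json.dumps({"data": data}, indent=2)
print(result)
```

The same renaming could presumably be done in pandas with rename(columns=...) before grouping; the sketch only shows the target structure.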

Related

How to generate a json file with a nested dictionary from pandas df?

I need to generate a json file with a specific format from a pandas dataframe. The dataframe looks like this:
user_id  product_id  date
1        23          01-01-2022
1        24          05-01-2022
2        56          05-06-2022
3        23          02-07-2022
3        24          01-02-2022
3        56          02-01-2022
And the json file needs to have the following format:
{
"user_id": 1,
"items": [{
"product_id": 23,
"date": 01-01-2022
}, {
"product_id": 24,
"date": 05-01-2022
}]
}
{
"user_id": 2,
"items": [{
"product_id": 56,
"date": 05-06-2022
}]
}
...etc
I've tried the following, but it's not the right format:
result = (now.groupby('user_id')['product_id','date'].apply(lambda x: dict(x.values)).to_json())
Any help would be much appreciated!

out = (df[['product_id','date']].apply(dict, axis=1)
.groupby(df['user_id']).apply(list)
.to_frame('items').reset_index()
.to_dict('records'))
print(out)
[{'user_id': 1, 'items': [{'product_id': 23, 'date': '01-01-2022'}, {'product_id': 24, 'date': '05-01-2022'}]},
{'user_id': 2, 'items': [{'product_id': 56, 'date': '05-06-2022'}]},
{'user_id': 3, 'items': [{'product_id': 23, 'date': '02-07-2022'}, {'product_id': 24, 'date': '01-02-2022'}, {'product_id': 56, 'date': '02-01-2022'}]}]

The code below solves the issue. It first converts the date column from datetime to string, then reshapes the dataframe into the desired format.
Here, data.xlsx is your data table saved as an Excel file.
# Import libraries
import pandas as pd
import openpyxl
import json
# Read the excel data
data = pd.read_excel("data.xlsx", sheet_name=0)
# Change the data type of the date column (day-month-year)
data['date'] = data['date'].apply(lambda x: x.strftime('%d-%m-%Y'))
# Convert to desired json format
json_data = (data.groupby(['user_id'])
.apply(lambda x: x[['product_id','date']].to_dict('records'))
.reset_index()
.rename(columns={0:'items'})
.to_json(orient='records'))
# Pretty print the result
# https://stackoverflow.com/a/12944035/10905535
json_data = json.loads(json_data)
print(json.dumps(json_data, indent=4, sort_keys=False))
The output:
[
{
"user_id": 1,
"items": [
{
"product_id": 23,
"date": "01-01-2022"
},
{
"product_id": 24,
"date": "05-01-2022"
}
]
},
{
"user_id": 2,
"items": [
{
"product_id": 56,
"date": "05-06-2022"
}
]
},
{
"user_id": 3,
"items": [
{
"product_id": 23,
"date": "02-07-2022"
},
{
"product_id": 24,
"date": "01-02-2022"
},
{
"product_id": 56,
"date": "02-01-2022"
}
]
}
]
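Both answers produce a single JSON array, while the question shows a sequence of separate objects, one per user. If that is really what the file should contain, one common convention is JSON Lines (one object per line). The following is a minimal sketch using only the standard library; the rows list is an assumption standing in for what df.to_dict('records') would return, with dates already converted to strings.

```python
import json
from collections import defaultdict

# Stand-in for df.to_dict('records'); dates are assumed to be strings already.
rows = [
    {"user_id": 1, "product_id": 23, "date": "01-01-2022"},
    {"user_id": 1, "product_id": 24, "date": "05-01-2022"},
    {"user_id": 2, "product_id": 56, "date": "05-06-2022"},
    {"user_id": 3, "product_id": 23, "date": "02-07-2022"},
    {"user_id": 3, "product_id": 24, "date": "01-02-2022"},
    {"user_id": 3, "product_id": 56, "date": "02-01-2022"},
]

# Group the item dicts by user, preserving first-seen order of users.
items_by_user = defaultdict(list)
for row in rows:
    items_by_user[row["user_id"]].append(
        {"product_id": row["product_id"], "date": row["date"]}
    )

# One JSON document per user, one per line.
lines = [
    json.dumps({"user_id": user_id, "items": items})
    for user_id, items in items_by_user.items()
]
json_lines = "\n".join(lines)
print(json_lines)
```

Each line can then be parsed independently with json.loads, which is convenient when the file is large.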

Sort list of nested dictionaries by multiple attributes

I have my sample data as:
b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
{"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
{"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
{"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
{"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]
I need to sort so that score is ASC, name is DESC and id is ASC (in relational DB notation).
So far, I have implemented:
def sort_func(e):
    return (e['score'], e['name']['d_name'], e['id'])
b.sort(key=sort_func, reverse=False)  # sorts b in place; list.sort() returns None
This works for score ASC, name ASC, id ASC.
But for score ASC, name DESC, id ASC, trying to sort by name DESC throws an error, because unary minus is not defined for the string in -e['name']['d_name'].
How can I approach this problem from here? Thanks.
Edit 1:
I need to make this dynamic, so that there can be cases such as e['name']['d_name'] ASC, e['name']['ingredient'] DESC. How can I handle this kind of dynamic behaviour?

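You can sort by (-score, name, -id) with reverse=True. Negating the two numeric keys and then reversing the whole sort leaves them ascending, while name ends up descending: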
from pprint import pprint
b = [
{
"id": 1,
"name": {"d_name": "miranda", "ingredient": "orange"},
"score": 1.123,
},
{
"id": 20,
"name": {"d_name": "limca", "ingredient": "lime"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "coke", "ingredient": "water"},
"score": 4.231,
},
{
"id": 2,
"name": {"d_name": "fanta", "ingredient": "water"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "dew", "ingredient": "water & sugar"},
"score": 2.231,
},
]
pprint(
sorted(
b,
key=lambda k: (-k["score"], k["name"]["d_name"], -k["id"]),
reverse=True,
)
)
Prints:
[{'id': 1,
'name': {'d_name': 'miranda', 'ingredient': 'orange'},
'score': 1.123},
{'id': 3,
'name': {'d_name': 'dew', 'ingredient': 'water & sugar'},
'score': 2.231},
{'id': 20, 'name': {'d_name': 'limca', 'ingredient': 'lime'}, 'score': 4.231},
{'id': 2, 'name': {'d_name': 'fanta', 'ingredient': 'water'}, 'score': 4.231},
{'id': 3, 'name': {'d_name': 'coke', 'ingredient': 'water'}, 'score': 4.231}]
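The negation trick only works for numeric keys, so it does not cover the dynamic case from "Edit 1" where several string keys may have mixed directions. One general approach relies on Python's sort being stable: sort once per key, starting from the least significant key. The sketch below assumes a hypothetical spec format, a list of (path, descending) pairs ordered from most to least significant; get_path and multi_sort are illustrative names, not part of any library.

```python
def get_path(item, path):
    """Follow a sequence of keys into nested dictionaries."""
    for key in path:
        item = item[key]
    return item


def multi_sort(items, specs):
    """Sort by several keys with independent directions.

    specs is a list of (path, descending) pairs, most significant first.
    Sorting from the last key to the first works because list.sort is
    stable, also when reverse=True.
    """
    result = list(items)  # leave the input list untouched
    for path, descending in reversed(specs):
        result.sort(key=lambda e, p=path: get_path(e, p), reverse=descending)
    return result


b = [
    {"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
    {"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
    {"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231},
]

# score ASC, name DESC, id ASC
ordered = multi_sort(
    b,
    [(("score",), False), (("name", "d_name"), True), (("id",), False)],
)
print([e["name"]["d_name"] for e in ordered])
```

Because the directions live in data rather than in the key function, the spec list can be built at runtime, for example from user input.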

Find duplicates of dictionary in a list and combine them in Python

I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to be able to find the duplicates of ingredients (by either name or id). If there are duplicates and have the same unit_of_measurement, combine them into one dictionary and add the quantity accordingly. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?

Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
    # The (name, id) pairs of the nested "ingredient" dict form a hashable key
    key = tuple(ingredient["ingredient"].items())
    ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict and calculate the sum of the fractional quantities using fractions.Fraction. For unit_of_measurement and ingredient, we can probably just use the values from the first item in each group.
from fractions import Fraction
result = [
{
"unit_of_measurement": value[0]["unit_of_measurement"],
"quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
"ingredient": value[0]["ingredient"],
}
for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for ingredients with different units of measurement, but this should get you started.
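One possible way to handle differing units is to include the unit in the grouping key, so quantities are only added when both the ingredient and the unit match. The sketch below assumes the ids identify ingredients and units uniquely; the third entry (Balsamic Vinegar measured in tablespoons) is made-up data added purely to show the two units staying separate.

```python
from fractions import Fraction

ingredients = [
    {
        "unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
        "quantity": "1/2",
        "ingredient": {"name": "Balsamic Vinegar", "id": 12},
    },
    {
        "unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
        "quantity": "1/2",
        "ingredient": {"name": "Balsamic Vinegar", "id": 12},
    },
    {
        # Hypothetical extra entry: same ingredient, different unit.
        "unit_of_measurement": {"name": "Tablespoon", "id": 15},
        "quantity": "2",
        "ingredient": {"name": "Balsamic Vinegar", "id": 12},
    },
    {
        "unit_of_measurement": {"name": "Tablespoon", "id": 15},
        "ingredient": {"name": "Basil Leaves", "id": 14},
        "quantity": "3",
    },
]

totals = {}
for entry in ingredients:
    # Group on the (ingredient id, unit id) pair.
    key = (entry["ingredient"]["id"], entry["unit_of_measurement"]["id"])
    if key not in totals:
        # Copy the entry and keep the quantity as a Fraction while summing.
        totals[key] = {**entry, "quantity": Fraction(entry["quantity"])}
    else:
        totals[key]["quantity"] += Fraction(entry["quantity"])

# Convert the summed quantities back to strings such as "1" or "3/2".
combined = [{**entry, "quantity": str(entry["quantity"])} for entry in totals.values()]
print(combined)
```

Converting between units (for example tablespoons to pounds) is a separate problem that this sketch does not attempt.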

Python/PySpark parse JSON string with numbered attributes

I need to store JSON strings like the one below in some file format different from plaintext (e.g: parquet):
{
"vidName": "Foo",
"vidInfo.0.size.length": 10,
"vidInfo.0.size.width": 10,
"vidInfo.0.quality": "Good",
"vidInfo.1.size.length": 7,
"vidInfo.1.size.width": 3,
"vidInfo.1.quality": "Bad",
"vidInfo.2.size.length": 10,
"vidInfo.2.size.width": 2,
"vidInfo.2.quality": "Excelent"
}
There's no known bound for the index of vidInfo (can be 10, 20). Thus I want either to have vidInfos in an array, or explode such JSON object into multiple smaller objects.
I found this question: PHP JSON parsing (number attributes?)
But it is in PHP which I do not quite understand. And I am not sure whether it is same as what I need.
The intermediate data should be something like this:
{
"vidName": "Foo",
"vidInfo": [
{
"id": 0,
"size": {
"length": 10,
"width": 10
},
"quality": "Good"
},
{
"id": 1,
"size": {
"length": 7,
"width": 3
},
"quality": "Bad"
},
{
"id": 2,
"size": {
"length": 10,
"width": 2
},
"quality": "Excelent"
}
]
}
or like this:
{
"vidName": "Foo",
"vidInfo": [
{
"size": {
"length": 10,
"width": 10
},
"quality": "Good"
},
{
"size": {
"length": 7,
"width": 3
},
"quality": "Bad"
},
{
"size": {
"length": 10,
"width": 2
},
"quality": "Excelent"
}
]
}
I am stuck, and would need some hints to move on.
Could you please help?
Thanks a lot for your help.

I found this library https://github.com/amirziai/flatten which does the trick.
In [154]: some_json = {
...: "vidName": "Foo",
...: "vidInfo.0.size.length": 10,
...: "vidInfo.0.size.width": 10,
...: "vidInfo.0.quality": "Good",
...: "vidInfo.1.size.length": 7,
...: "vidInfo.1.size.width": 3,
...: "vidInfo.1.quality": "Bad",
...: "vidInfo.2.size.length": 10,
...: "vidInfo.2.size.width": 2,
...: "vidInfo.2.quality": "Excelent"
...: }
In [155]: some_json
Out[155]:
{'vidName': 'Foo',
'vidInfo.0.size.length': 10,
'vidInfo.0.size.width': 10,
'vidInfo.0.quality': 'Good',
'vidInfo.1.size.length': 7,
'vidInfo.1.size.width': 3,
'vidInfo.1.quality': 'Bad',
'vidInfo.2.size.length': 10,
'vidInfo.2.size.width': 2,
'vidInfo.2.quality': 'Excelent'}
In [156]: from flatten_json import unflatten_list
...: import json
...: nested_json = unflatten_list(json.loads(json.dumps(some_json)), '.')
In [157]: nested_json
Out[157]:
{'vidInfo': [{'quality': 'Good', 'size': {'length': 10, 'width': 10}},
{'quality': 'Bad', 'size': {'length': 7, 'width': 3}},
{'quality': 'Excelent', 'size': {'length': 10, 'width': 2}}],
'vidName': 'Foo'}
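If adding a dependency is not an option, the same idea can be sketched in plain Python. This is not how the library above is implemented; it is a minimal illustration that assumes no key is both a leaf and a prefix of another key, and that list indices appear as purely numeric path segments. The names unflatten and _listify are made up for this example.

```python
def unflatten(flat, sep="."):
    """Rebuild nested dicts/lists from keys like 'vidInfo.0.size.length'."""
    root = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = root
        # Walk (and create) the intermediate dictionaries.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _listify(root)


def _listify(node):
    """Turn dicts whose keys are all digit strings into lists, recursively."""
    if not isinstance(node, dict):
        return node
    converted = {k: _listify(v) for k, v in node.items()}
    if converted and all(k.isdigit() for k in converted):
        # Order the elements by their numeric index.
        return [converted[k] for k in sorted(converted, key=int)]
    return converted


some_json = {
    "vidName": "Foo",
    "vidInfo.0.size.length": 10,
    "vidInfo.0.size.width": 10,
    "vidInfo.0.quality": "Good",
    "vidInfo.1.size.length": 7,
    "vidInfo.1.size.width": 3,
    "vidInfo.1.quality": "Bad",
    "vidInfo.2.size.length": 10,
    "vidInfo.2.size.width": 2,
    "vidInfo.2.quality": "Excelent",
}

nested = unflatten(some_json)
print(nested)
```

Since the index count is discovered from the keys themselves, this works regardless of how many vidInfo entries a record has.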

Group and sum list of dictionaries by parameter

I have a list of dictionaries of my products (drinks, food, etc.); some of the products may be added several times. I need to group my products by the product_id parameter and sum product_cost and product_quantity of each group to get the total product price.
I'm a newbie in Python. I understand how to group a list of dictionaries, but I can't figure out how to sum some of the parameter values.
"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]
I need to achieve something like that:
"products_list": [
{
"product_cost": 38,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 21,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
}
]

You can start by sorting the list of dictionaries on product_name, and then group the items by product_name.
Then, for each group, calculate the total cost and total quantity, build a dictionary for that product and append it to a list, and finally put the list into the result dictionary.
from itertools import groupby
dct = {"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]}
result = {}
li = []
# Sort the product list on product_name (groupby only groups adjacent items)
sorted_prod_list = sorted(dct['products_list'], key=lambda x: x['product_name'])
# Group on product_name
for model, group in groupby(sorted_prod_list, key=lambda x: x['product_name']):
    grp = list(group)
    # Compute total cost and qty, make the dictionary and add it to the list
    total_cost = sum(item['product_cost'] for item in grp)
    total_qty = sum(item['product_quantity'] for item in grp)
    product_name = grp[0]['product_name']
    product_id = grp[0]['product_id']
    li.append({'product_name': product_name, 'product_id': product_id, 'product_cost': total_cost, 'product_quantity': total_qty})
# Make the final dictionary
result['products_list'] = li
print(result)
The output will be
{
'products_list': [{
'product_name': 'Apples',
'product_id': 2,
'product_cost': 176.74,
'product_quantity': 800
},
{
'product_name': 'Coca-cola',
'product_id': 1,
'product_cost': 38,
'product_quantity': 21
}
]
}

You can try with pandas:
d = {"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]}
import pandas as pd
df = pd.DataFrame(d["products_list"])
Pass the list of dicts to pandas and perform a groupby. Grouping on both product_id and product_name keeps product_id out of the sum; grouping on product_name alone would add the ids together as well.
Then convert it back to a dict with the to_dict function.
result = {}
result["products_list"] = df.groupby(["product_id", "product_name"], as_index=False).sum().to_dict(orient="records")
Result:
{'products_list': [{'product_cost': 38.0,
'product_id': 1,
'product_name': 'Coca-cola',
'product_quantity': 21},
{'product_cost': 176.74,
'product_id': 2,
'product_name': 'Apples',
'product_quantity': 800}]}

Personally, I would reorganize it into another dictionary keyed by a unique identifier. If you still need the list format, you can convert the dictionary's values() into a list afterwards. Below is a function that does that.
def get_totals(product_dict):
    totals = {}
    for product in product_dict["products_list"]:
        name = product["product_name"]
        if name not in totals:
            # Copy so the input dictionaries are not modified
            totals[name] = dict(product)
        else:
            totals[name]["product_cost"] += product["product_cost"]
            totals[name]["product_quantity"] += product["product_quantity"]
    return list(totals.values())
The output is:
[
{
'product_cost': 38,
'product_id': 1,
'product_name': 'Coca-cola',
'product_quantity': 21
},
{
'product_cost': 176.74,
'product_id': 2,
'product_name': 'Apples',
'product_quantity': 800
}
]
If you need the result under the products_list key, just reassign the list to that key. Instead of returning list(totals.values()), do:
    product_dict["products_list"] = list(totals.values())
    return product_dict
The output is a dictionary like:
{
"products_list": [
{
"product_cost": 38,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 21,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
}
]
}
