Group and sum list of dictionaries by parameter - python

I have a list of dictionaries of my products (drinks, food, etc), some of the products may be added several times. I need to group my products by product_id parameter and sum product_cost and product_quantity of each group to get the total product price.
I'm new to Python. I understand how to group a list of dictionaries, but I can't figure out how to sum some of the values.
"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]
I need to achieve something like that:
"products_list": [
{
"product_cost": 38,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 21,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
}
]

You can start by sorting the list of dictionaries on product_name, and then group the items on product_name with itertools.groupby.
Then, for each group, compute the total cost and total quantity, build a dictionary for that product, and append it to a list. Finally, store that list under products_list in the result dictionary.
from itertools import groupby
dct = {"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]}
result = {}
li = []
# Sort the product list on product_name
sorted_prod_list = sorted(dct['products_list'], key=lambda x: x['product_name'])
# Group on product_name
for model, group in groupby(sorted_prod_list, key=lambda x: x['product_name']):
    grp = list(group)
    # Compute total cost and qty, make the dictionary and add it to the list
    total_cost = sum(item['product_cost'] for item in grp)
    total_qty = sum(item['product_quantity'] for item in grp)
    product_name = grp[0]['product_name']
    product_id = grp[0]['product_id']
    li.append({'product_name': product_name, 'product_id': product_id, 'product_cost': total_cost, 'product_quantity': total_qty})
# Make the final dictionary
result['products_list'] = li
print(result)
The output will be
{
'products_list': [{
'product_name': 'Apples',
'product_id': 2,
'product_cost': 176.74,
'product_quantity': 800
},
{
'product_name': 'Coca-cola',
'product_id': 1,
'product_cost': 38,
'product_quantity': 21
}
]
}
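The sort step matters because itertools.groupby only merges consecutive items that share a key. Here is a minimal sketch of the difference, using a shortened copy of the sample data and grouping on product_id as the question asks (the names items and by_id are illustrative, not from the answer):

```python
from itertools import groupby

items = [
    {"product_id": 1, "product_quantity": 14},
    {"product_id": 2, "product_quantity": 800},
    {"product_id": 1, "product_quantity": 7},
]

def by_id(item):
    return item["product_id"]

# Without sorting, the two product_id == 1 entries are not adjacent,
# so groupby reports them as two separate groups.
unsorted_keys = [key for key, _ in groupby(items, key=by_id)]

# After sorting on the same key, each product_id forms exactly one group.
sorted_totals = {
    key: sum(item["product_quantity"] for item in group)
    for key, group in groupby(sorted(items, key=by_id), key=by_id)
}

print(unsorted_keys)  # [1, 2, 1]
print(sorted_totals)  # {1: 21, 2: 800}
```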

You can try with pandas:
d = {"products_list": [
{
"product_cost": 25,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 14,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
},
{
"product_cost": 13,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 7,
}
]}
import pandas as pd
df = pd.DataFrame(d["products_list"])
Pass the list of dicts to pandas and perform a groupby. Group on both product_id and product_name so that product_id is kept as a key instead of being summed along with the other numeric columns.
Then convert the result back to a list of dicts with the to_dict function.
result = {}
result["products_list"] = df.groupby(["product_id", "product_name"], as_index=False).sum().to_dict(orient="records")
Result:
{'products_list': [{'product_id': 1,
'product_name': 'Coca-cola',
'product_cost': 38.0,
'product_quantity': 21},
{'product_id': 2,
'product_name': 'Apples',
'product_cost': 176.74,
'product_quantity': 800}]}

Personally, I would reorganize the data into another dictionary keyed by a unique identifier. If you still need the list format, you can convert the dictionary's values() back into a list. Below is a function that does that.
def get_totals(product_dict):
    totals = {}
    for product in product_dict["products_list"]:
        name = product["product_name"]
        if name not in totals:
            # Copy the first occurrence so the input dictionaries are not modified
            totals[name] = dict(product)
        else:
            totals[name]["product_cost"] += product["product_cost"]
            totals[name]["product_quantity"] += product["product_quantity"]
    return list(totals.values())
output is:
[
{
'product_cost': 38,
'product_id': 1,
'product_name': 'Coca-cola',
'product_quantity': 21
},
{
'product_cost': 176.74,
'product_id': 2,
'product_name': 'Apples',
'product_quantity': 800
}
]
If you need the result under the products_list key, just reassign the list to that key. Instead of returning list(totals.values()), do:
    product_dict["products_list"] = list(totals.values())
    return product_dict
The output is a dictionary like:
{
"products_list": [
{
"product_cost": 38,
"product_id": 1,
"product_name": "Coca-cola",
"product_quantity": 21,
},
{
"product_cost": 176.74,
"product_id": 2,
"product_name": "Apples",
"product_quantity": 800,
}
]
}
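The question asks for grouping on product_id, while the answers above key on product_name. The following is a hedged sketch of a more general helper that takes the key field and the fields to sum as arguments. The name merge_records and its parameters are made up for illustration and are not part of any answer.

```python
def merge_records(records, key_field, sum_fields):
    """Merge records sharing key_field, adding up the fields in sum_fields."""
    merged = {}
    for record in records:
        key = record[key_field]
        if key not in merged:
            merged[key] = dict(record)  # copy, so the input is left untouched
        else:
            for field in sum_fields:
                merged[key][field] += record[field]
    return list(merged.values())


products = [
    {"product_cost": 25, "product_id": 1, "product_name": "Coca-cola", "product_quantity": 14},
    {"product_cost": 176.74, "product_id": 2, "product_name": "Apples", "product_quantity": 800},
    {"product_cost": 13, "product_id": 1, "product_name": "Coca-cola", "product_quantity": 7},
]
totals = merge_records(products, "product_id", ["product_cost", "product_quantity"])
print(totals)
```

Non-summed fields such as product_name are taken from the first record seen for each key.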

Related

remove dict from nested list based on key-value condition with python

I'm trying to remove a dict from a nested list only if a condition is met.
Some of the nested dicts have the key decision = "Remove". I wish to keep only the elements with decision = "keep".
For example,
mylist = [
{
"id": 1,
"Note": [
{
"xx": 259,
"yy": 1,
"decision": "Remove"
},
{
"xx": 260,
"yy": 2,
"decision": "keep"
}
]
},
{
"id": 1,
"Note": [
{
"xx": 303,
"yy": 2,
"channels": "keep"
}
]
}
]
output:
[
{
"id": 1,
"Note": [
{
"xx": 260,
"yy": 2,
"decision": "keep"
}
]
},
{
"id": 1,
"Note": [
{
"xx": 303,
"yy": 2,
"channels": "keep"
}
]
}
]
My solution: This doesn't seem right.
for d in mylist:
    for k,v in enumerate(d['Note']):
        if v["decision"] == "Remove":
            del d[k]
Any help please?
Very similar to @sushanth's answer. The difference is, it's agnostic about the other key-value pairs in the dicts and only drops the dicts where decision is "Remove":
for d in mylist:
    d['Note'] = [inner for inner in d['Note'] if inner.get('decision')!='Remove']
Output:
[{'id': 1, 'Note': [{'xx': 260, 'yy': 2, 'decision': 'keep'}]},
{'id': 1, 'Note': [{'xx': 303, 'yy': 2, 'channels': 'keep'}]}]
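Try using a list comprehension. Note that this version keeps only the notes whose decision is 'keep', so a note without a decision key is dropped as well: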
print(
[
{
"id": i['id'],
"Note": [j for j in i['Note'] if j.get('decision', '') == 'keep']
}
for i in mylist
]
)
[{'id': 1, 'Note': [{'xx': 260, 'yy': 2, 'decision': 'keep'}]}, {'id': 1, 'Note': []}]
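The two answers differ in how they treat a note that has no decision key at all, which is also the case that makes v["decision"] in the question's attempt raise a KeyError. A small sketch of both filters side by side, on a flattened and shortened copy of the sample notes (the variable names are illustrative):

```python
notes = [
    {"xx": 259, "decision": "Remove"},
    {"xx": 260, "decision": "keep"},
    {"xx": 303, "channels": "keep"},  # no "decision" key at all
]

# Drop only explicit removals; entries without the key survive.
not_removed = [n for n in notes if n.get("decision") != "Remove"]

# Keep only explicit keeps; entries without the key are dropped.
only_keep = [n for n in notes if n.get("decision") == "keep"]

print([n["xx"] for n in not_removed])  # [260, 303]
print([n["xx"] for n in only_keep])    # [260]
```

Only the first filter reproduces the output requested in the question.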

Count with MongoDB aggregate $group result

I have a database query with pymongo, like so:
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
]
rows = list(collection_name.aggregate(pipeline))
print(rows)
The result looks like this:
[
{'_id': 'p1', 'count': 45},
{'_id': 'p2', 'count': 4},
{'_id': 'p3', 'count': 96},
{'_id': 'p1', 'count': 23},
{'_id': 'p4', 'count': 10}
]
Objective:
Based on the above results, I want to compute statistics across partitions. For example, to get how many of the count values fall into each of the following intervals:
partition, count
(0, 10], 2
[11, 50), 2
[50, 100], 1
Is there a way of doing this entirely using MongoDB aggregate framework?
Any comments would be very helpful. Thanks.
Answer by @Wernfried Domscheit
$bucket
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucket": {
"groupBy": "$count",
"boundaries": [0, 11, 51, 100],
"default": "Other",
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows
$bucketAuto
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucketAuto": {
"groupBy": "$count",
"buckets": 5,
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows
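NOTICE:
In $bucket, set default whenever a value can fall outside the boundaries; without it, such a document makes the aggregation fail with an error.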
Yes, you have the $bucket operator for that:
db.collection.aggregate([
  {
    $bucket: {
      groupBy: "$count",
      boundaries: [0, 11, 51, 100],
      output: {
        count: { $sum: 1 },
      }
    }
  }
])
Or use $bucketAuto where the intervals are generated automatically.
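To see how the boundaries [0, 11, 51, 100] split the sample counts, here is a plain-Python sketch that mimics the $bucket rule locally: each bucket includes its lower bound, excludes its upper bound, and is labelled by its lower bound. This is only a simulation for checking boundaries, not a MongoDB API, and the function name bucket_counts is made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

def bucket_counts(values, boundaries, default="Other"):
    """Count values per bucket; each bucket is [lower, upper)."""
    counts = Counter()
    for value in values:
        index = bisect_right(boundaries, value) - 1
        if 0 <= index < len(boundaries) - 1:
            counts[boundaries[index]] += 1  # label the bucket by its lower bound
        else:
            counts[default] += 1  # outside every bucket
    return dict(counts)

sample_counts = [45, 4, 96, 23, 10]
result = bucket_counts(sample_counts, [0, 11, 51, 100])
print(result)  # {11: 2, 0: 2, 51: 1}
```

A value of exactly 100 would land in the default bucket, because the last boundary is exclusive.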

Sort list of nested dictionaries by multiple attributes

I have my sample data as:
b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
{"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
{"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
{"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
{"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]
I need to sort such that score ASC, name DESC, id ASC (in relational DB notation).
So far, I have implemented:
def sort_func(e):
    return (e['score'], e['name']['d_name'], e['id'])
a = b.sort(key=sort_func, reverse=False)
This works for score ASC, name ASC, id ASC.
But for score ASC, name DESC, id ASC, trying to sort by name DESC throws an error, because unary - is not defined for the string in -e['name']['d_name'].
How can I approach this problem from here? Thanks.
Edit 1:
I need to make this dynamic, so that there can be cases such as e['name']['d_name'] ASC, e['name']['ingredient'] DESC. How can I handle this kind of dynamic behaviour?
You can sort by -score, name, -id with reverse=True:
from pprint import pprint
b = [
{
"id": 1,
"name": {"d_name": "miranda", "ingredient": "orange"},
"score": 1.123,
},
{
"id": 20,
"name": {"d_name": "limca", "ingredient": "lime"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "coke", "ingredient": "water"},
"score": 4.231,
},
{
"id": 2,
"name": {"d_name": "fanta", "ingredient": "water"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "dew", "ingredient": "water & sugar"},
"score": 2.231,
},
]
pprint(
sorted(
b,
key=lambda k: (-k["score"], k["name"]["d_name"], -k["id"]),
reverse=True,
)
)
Prints:
[{'id': 1,
'name': {'d_name': 'miranda', 'ingredient': 'orange'},
'score': 1.123},
{'id': 3,
'name': {'d_name': 'dew', 'ingredient': 'water & sugar'},
'score': 2.231},
{'id': 20, 'name': {'d_name': 'limca', 'ingredient': 'lime'}, 'score': 4.231},
{'id': 2, 'name': {'d_name': 'fanta', 'ingredient': 'water'}, 'score': 4.231},
{'id': 3, 'name': {'d_name': 'coke', 'ingredient': 'water'}, 'score': 4.231}]
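For the dynamic case from Edit 1, one possible approach is to rely on the stability of Python's sort: sort once per key, starting with the least significant key, and pass reverse per key. This works for strings as well as numbers. The helper name sort_by_spec and the (key function, descending) spec format are assumptions made for this sketch, not part of the answer above.

```python
def sort_by_spec(rows, spec):
    """Sort rows by (key_function, descending) pairs, most significant first."""
    result = list(rows)
    # Apply the least significant key first; stable sorting keeps the
    # order established by earlier passes among equal elements.
    for key_func, descending in reversed(spec):
        result.sort(key=key_func, reverse=descending)
    return result


b = [
    {"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
    {"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
    {"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231},
]

# score ASC, name DESC, id ASC
spec = [
    (lambda e: e["score"], False),
    (lambda e: e["name"]["d_name"], True),
    (lambda e: e["id"], False),
]
ordered = sort_by_spec(b, spec)
print([e["name"]["d_name"] for e in ordered])
# ['miranda', 'dew', 'limca', 'fanta', 'coke']
```

Changing the sort order then only means building a different spec list at runtime.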

Find duplicates of dictionary in a list and combine them in Python

I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to be able to find the duplicates of ingredients (by either name or id). If there are duplicates and have the same unit_of_measurement, combine them into one dictionary and add the quantity accordingly. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?
Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
    key = tuple(ingredient["ingredient"].items())
    ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict, and calculate the sum of the fraction quantities using fractions.Fraction. For unit_of_measurement and ingredient, we could probably just use the first grouped values.
from fractions import Fraction
result = [
{
"unit_of_measurement": value[0]["unit_of_measurement"],
"quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
"ingredient": value[0]["ingredient"],
}
for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for ingredients with different units of measurement, but this should get you started.
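One way to handle differing units, sketched under the assumption that the id fields identify ingredients and units uniquely, is to put both ids into the grouping key so that quantities are only added when the unit matches. The Tablespoon row for Balsamic Vinegar below is a hypothetical addition to the sample data, included only to show the split.

```python
from collections import defaultdict
from fractions import Fraction

pound = {"name": "Pound (Lb)", "id": 13}
tablespoon = {"name": "Tablespoon", "id": 15}
vinegar = {"name": "Balsamic Vinegar", "id": 12}
basil = {"name": "Basil Leaves", "id": 14}

ingredients = [
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    # Hypothetical row: same ingredient, different unit
    {"unit_of_measurement": tablespoon, "quantity": "2", "ingredient": vinegar},
    {"unit_of_measurement": tablespoon, "quantity": "3", "ingredient": basil},
]

# Group on (ingredient id, unit id) so different units are never merged.
groups = defaultdict(list)
for item in ingredients:
    key = (item["ingredient"]["id"], item["unit_of_measurement"]["id"])
    groups[key].append(item)

combined = [
    {
        "unit_of_measurement": items[0]["unit_of_measurement"],
        "quantity": str(sum(Fraction(i["quantity"]) for i in items)),
        "ingredient": items[0]["ingredient"],
    }
    for items in groups.values()
]
print([c["quantity"] for c in combined])  # ['1', '2', '3']
```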

Convert list of dictionaries into a nested dictionary [duplicate]

I have a list of dictionaries that I got from the database in parent-child relationship:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
I want to convert it to a nested dictionary such as
[ { "id": 1, "parent_id": 0, "name": "Wood", "price": 0, "children": [ { "id": 2, "parent_id": 1, "name": "Mango", "price": 18, "children": [ { "id": 3, "parent_id": 2, "name": "Table", "price": 342 }, { "id": 4, "parent_id": 2, "name": "Box", "price": 340, "children": [ { "id": 5, "parent_id": 4, "name": "Pencil", "price": 240 } ] } ] } ] }, { "id": 6, "parent_id": 0, "name": "Electronic", "price": 20, "children": [ { "id": 7, "parent_id": 6, "name": "TV", "price": 350 }, { "id": 8, "parent_id": 6, "name": "Mobile", "price": 300, "children": [ { "id": 9, "parent_id": 8, "name": "Iphone", "price": 0, "children": [ { "id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400 } ] } ] } ] } ]
You can do this recursively, starting from the root nodes (where parent_id = 0) going downwards. But before your recursive calls, you can group nodes by their parent_id so that accessing them in each recursive call can be done in constant time:
levels = {}
for n in data:
    levels.setdefault(n['parent_id'], []).append(n)
def build_tree(parent_id=0):
    nodes = [dict(n) for n in levels.get(parent_id, [])]
    for n in nodes:
        children = build_tree(n['id'])
        if children: n['children'] = children
    return nodes
tree = build_tree()
print(tree)
Output
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0,'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
The code is documented inline. It ignores corner cases such as circular relations.
# Actual Data
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Create parent -> child links using a dictionary
data_dict = { r['id'] : r for r in data}
for r in data:
    if r['parent_id'] in data_dict:
        parent = data_dict[r['parent_id']]
        if 'children' not in parent:
            parent['children'] = []
        parent['children'].append(r)
# Helper function to get all the ids associated with a parent
def get_all_ids(r):
    l = list()
    l.append(r['id'])
    if 'children' in r:
        for c in r['children']:
            l.extend(get_all_ids(c))
    return l
# Trim the results so that each id appears only once
ids = set(data_dict.keys())
result = []
for r in data_dict.values():
    the_ids = set(get_all_ids(r))
    if ids.intersection(the_ids):
        ids = ids.difference(the_ids)
        result.append(r)
print (result)
Output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
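Since circular relations are left out above, it may help to check for them before building the tree. The following is a rough sketch of such a check: it follows each row's parent chain and flags the row if the chain loops instead of ending at a missing parent id. The function name ids_with_cyclic_ancestry and the small test rows are hypothetical, and the check is not optimized for large inputs.

```python
def ids_with_cyclic_ancestry(rows):
    """Return ids whose parent chain runs into a cycle instead of a root."""
    parent_of = {row["id"]: row["parent_id"] for row in rows}
    flagged = set()
    for start in parent_of:
        seen = set()
        current = start
        # Walk upwards until we leave the known ids or revisit one.
        while current in parent_of and current not in seen:
            seen.add(current)
            current = parent_of[current]
        if current in seen:  # the walk came back onto its own path
            flagged.add(start)
    return flagged


rows = [
    {"id": 1, "parent_id": 0},  # proper root
    {"id": 2, "parent_id": 3},  # 2 and 3 point at each other
    {"id": 3, "parent_id": 2},
    {"id": 4, "parent_id": 3},  # hangs below the cycle
]
print(ids_with_cyclic_ancestry(rows))  # {2, 3, 4}
```

For the sample data in the question the function returns an empty set, since every chain ends at parent_id 0.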
I worked out a very short solution. It isn't the most efficient algorithm (every pass rescans the whole list), but it does the job; it would need optimization to work on very large data sets. It also modifies data in place and assumes that every child appears after its parent in the list, as in the sample data.
for i in range(len(data)-1, -1, -1):
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    for child in data[i]["children"]:
        data.remove(child)
Here is the complete explanation:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Loop backwards, placing the lowest child
# into the next parent in the hierarchy
for i in range(len(data)-1, -1, -1):
    # Create a dict key called "children" for the current parent in the loop
    # and assign to it a list comprehension that loops over all items in the data
    # to get the elements whose parent_id equals our current element's id
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    # Since each child is already placed inside its parent, we
    # remove it from its original position in the data
    for child in data[i]["children"]:
        data.remove(child)
# Print the new data structure
print(data)
And here is the output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342, 'children': []}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240, 'children': []}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350, 'children': []}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400, 'children': []}]}]}]}]
