I'm trying to remove a dict from a list only if a condition is met.
Some of the nested dicts have the key decision = "Remove"; I want to keep only the elements with decision = "keep".
For example,
mylist = [
{
"id": 1,
"Note": [
{
"xx": 259,
"yy": 1,
"decision": "Remove"
},
{
"xx": 260,
"yy": 2,
"decision": "keep"
}
]
},
{
"id": 1,
"Note": [
{
"xx": 303,
"yy": 2,
"channels": "keep"
}
]
}
]
Desired output:
[
{
"id": 1,
"Note": [
{
"xx": 260,
"yy": 2,
"decision": "keep"
}
]
},
{
"id": 1,
"Note": [
{
"xx": 303,
"yy": 2,
"channels": "keep"
}
]
}
]
My attempt, which doesn't seem right:
for d in mylist:
    for k,v in enumerate(d['Note']):
        if v["decision"] == "Remove":
            del d[k]
Any help please?
Very similar to @sushanth's answer. The difference is that it's agnostic about the other key-value pairs in the dicts and only drops the dicts whose decision is "Remove":
for d in mylist:
    d['Note'] = [inner for inner in d['Note'] if inner.get('decision') != 'Remove']
Output:
[{'id': 1, 'Note': [{'xx': 260, 'yy': 2, 'decision': 'keep'}]},
{'id': 1, 'Note': [{'xx': 303, 'yy': 2, 'channels': 'keep'}]}]
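If you'd rather keep the shape of the original loop and modify the lists in place, here is a minimal sketch. The attempt in the question fails for two reasons: `del d[k]` tries to delete the key `k` from the outer dict (raising a KeyError) instead of removing an element of `d['Note']`, and `v["decision"]` raises a KeyError on notes that have no "decision" key. Walking the indices backwards also avoids the usual pitfall of skipping elements when deleting from a list while iterating over it forwards.

```python
mylist = [
    {"id": 1, "Note": [
        {"xx": 259, "yy": 1, "decision": "Remove"},
        {"xx": 260, "yy": 2, "decision": "keep"},
    ]},
    {"id": 1, "Note": [
        {"xx": 303, "yy": 2, "channels": "keep"},
    ]},
]

for d in mylist:
    notes = d["Note"]
    # Walk indices from last to first, so deleting one element
    # does not shift the positions of elements not yet visited
    for k in range(len(notes) - 1, -1, -1):
        # .get() returns None for notes without a "decision" key
        if notes[k].get("decision") == "Remove":
            del notes[k]

print(mylist)
```

Unlike the list-comprehension answer, this mutates the existing `Note` lists rather than assigning new ones, which matters only if other code holds references to those lists.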
Try using a list comprehension:
print(
[
{
"id": i['id'],
"Note": [j for j in i['Note'] if j.get('decision', '') == 'keep']
}
for i in mylist
]
)
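[{'id': 1, 'Note': [{'xx': 260, 'yy': 2, 'decision': 'keep'}]}, {'id': 1, 'Note': []}]
Note that this keeps only notes whose decision is explicitly 'keep', so the second entry, which has a "channels" key instead of "decision", ends up with an empty Note list. It also rebuilds each outer dict with only its 'id' and 'Note' keys.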
I have a database query with pymongo, like so:
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
]
rows = list(collection_name.aggregate(pipeline))
print(rows)
The result looks like this:
[
{'_id': 'p1', 'count': 45},
{'_id': 'p2', 'count': 4},
{'_id': 'p3', 'count': 96},
{'_id': 'p1', 'count': 23},
{'_id': 'p4', 'count': 10}
]
Objective:
Based on the above results, I want to compute statistics per partition: for example, the number of count values that fall into each of the following intervals:
partition, count
(0, 10], 2
[11, 50), 2
[50, 100], 1
Is there a way of doing this entirely with the MongoDB aggregation framework?
Any comments would be very helpful. Thanks.
Answer by @Wernfried Domscheit
$bucket
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucket": {
"groupBy": "$count",
"boundaries": [0, 11, 51, 100],
"default": "Other",
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(collection_name.aggregate(pipeline))
print(rows)
$bucketAuto
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucketAuto": {
"groupBy": "$count",
"buckets": 5,
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(collection_name.aggregate(pipeline))
print(rows)
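NOTICE:
In $bucket, default is required whenever a groupBy value can fall outside the boundaries (here, any count of 100 or more); without it, such a document makes the aggregation fail with an error.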
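To make the boundary semantics concrete, here is a small plain-Python sketch that mimics how $bucket assigns values; it does not talk to MongoDB, and `bucket_counts` is a hypothetical helper. In $bucket each bucket includes its lower boundary and excludes its upper one, and the bucket's `_id` is its lower boundary. So with `[0, 11, 51, 100]` a count of exactly 50 lands in the 11 bucket and a count of 100 falls through to `default`, whereas `[0, 11, 50, 101]` matches the question's intervals exactly for integer counts.

```python
from bisect import bisect_right
from collections import Counter


def bucket_counts(values, boundaries, default="Other"):
    """Count values per bucket the way $bucket assigns them:
    each bucket is [lower, upper) and is labelled by its lower boundary."""
    counts = Counter()
    for v in values:
        if boundaries[0] <= v < boundaries[-1]:
            # Position of the last boundary <= v gives the bucket's lower bound
            lower = boundaries[bisect_right(boundaries, v) - 1]
            counts[lower] += 1
        else:
            # Out-of-range values go to the default bucket
            counts[default] += 1
    return dict(counts)


counts = [45, 4, 96, 23, 10]  # the "count" values from the $group stage
print(bucket_counts(counts, [0, 11, 51, 100]))
print(bucket_counts(counts, [0, 11, 50, 101]))
```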
Yes, the $bucket stage does exactly that:
db.collection.aggregate([
{
$bucket: {
groupBy: "$count",
boundaries: [0, 11, 51, 100],
output: {
count: { $sum: 1 },
}
}
}
])
Or use $bucketAuto, which generates the intervals automatically.
I have my sample data as:
b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
{"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
{"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
{"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
{"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]
I need to sort by score ASC, name DESC, id ASC (in relational DB notation).
So far, I have implemented:
def sort_func(e):
    return (e['score'], e['name']['d_name'], e['id'])
a = sorted(b, key=sort_func)  # b.sort(...) would sort in place and return None
This works for score ASC, name ASC, id ASC.
But for score ASC, name DESC, id ASC, I can't just negate the name: -e['name']['d_name'] raises a TypeError (bad operand type for unary -: 'str').
How can I approach this problem from here? Thanks.
Edit 1:
I need to make this dynamic, so that there can be cases such as e['name']['d_name'] ASC, e['name']['ingredient'] DESC. How can I handle this kind of dynamic behaviour?
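You can sort by (-score, name, -id) with reverse=True. Reversing the whole sort turns -score and -id back into ascending order, while name ends up descending: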
from pprint import pprint
b = [
{
"id": 1,
"name": {"d_name": "miranda", "ingredient": "orange"},
"score": 1.123,
},
{
"id": 20,
"name": {"d_name": "limca", "ingredient": "lime"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "coke", "ingredient": "water"},
"score": 4.231,
},
{
"id": 2,
"name": {"d_name": "fanta", "ingredient": "water"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "dew", "ingredient": "water & sugar"},
"score": 2.231,
},
]
pprint(
sorted(
b,
key=lambda k: (-k["score"], k["name"]["d_name"], -k["id"]),
reverse=True,
)
)
Prints:
[{'id': 1,
'name': {'d_name': 'miranda', 'ingredient': 'orange'},
'score': 1.123},
{'id': 3,
'name': {'d_name': 'dew', 'ingredient': 'water & sugar'},
'score': 2.231},
{'id': 20, 'name': {'d_name': 'limca', 'ingredient': 'lime'}, 'score': 4.231},
{'id': 2, 'name': {'d_name': 'fanta', 'ingredient': 'water'}, 'score': 4.231},
{'id': 3, 'name': {'d_name': 'coke', 'ingredient': 'water'}, 'score': 4.231}]
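For the dynamic case in Edit 1, negation only works for numbers. Because Python's sort is stable (and stays stable with `reverse=True`), another option is to sort several times, from the lowest-priority key to the highest, each pass with its own `reverse` flag. A minimal sketch; `get_path`, `multi_sort` and the `(path, descending)` spec format are hypothetical choices, not an existing API:

```python
def get_path(obj, path):
    """Follow a sequence of keys into nested dicts, e.g. ("name", "d_name")."""
    for key in path:
        obj = obj[key]
    return obj


def multi_sort(items, specs):
    """Sort by several (path, descending) specs; the first spec has the highest priority."""
    result = list(items)
    # Stable sorts: apply the lowest-priority key first, the highest-priority key last
    for path, descending in reversed(specs):
        result.sort(key=lambda e, p=path: get_path(e, p), reverse=descending)
    return result


b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
     {"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
     {"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
     {"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
     {"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]

# score ASC, name DESC, id ASC
spec = [(("score",), False), (("name", "d_name"), True), (("id",), False)]
for row in multi_sort(b, spec):
    print(row["score"], row["name"]["d_name"], row["id"])
```

The Edit 1 example would be expressed as `[(("name", "d_name"), False), (("name", "ingredient"), True)]`. This costs one sort per key, which is usually fine for lists of this kind.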
I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to find duplicate ingredients (matched by either name or id). If duplicates have the same unit_of_measurement, combine them into one dictionary and add up their quantities. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?
Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
    key = tuple(ingredient["ingredient"].items())
    ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict and calculate the sum of the fractional quantities using fractions.Fraction. For unit_of_measurement and ingredient, we can just use the first value in each group.
from fractions import Fraction
result = [
{
"unit_of_measurement": value[0]["unit_of_measurement"],
"quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
"ingredient": value[0]["ingredient"],
}
for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for duplicate ingredients with different units of measurement, but this should get you started.
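One way to make that amendment, sketched under the assumption that ingredients and units are identified by their `id`: put the unit's id into the grouping key as well, so the same ingredient measured in different units stays in separate entries. `merge_ingredients` is a hypothetical helper, and the fourth input entry is a made-up example added only to show the different-unit case.

```python
from collections import defaultdict
from fractions import Fraction


def merge_ingredients(ingredients):
    # Group by (ingredient id, unit id): only the same ingredient in the same unit is merged
    groups = defaultdict(list)
    for item in ingredients:
        key = (item["ingredient"]["id"], item["unit_of_measurement"]["id"])
        groups[key].append(item)
    return [
        {
            "unit_of_measurement": items[0]["unit_of_measurement"],
            # Fraction parses strings such as "1/2" and "3" exactly
            "quantity": str(sum(Fraction(i["quantity"]) for i in items)),
            "ingredient": items[0]["ingredient"],
        }
        for items in groups.values()
    ]


ingredients = [
    {"unit_of_measurement": {"name": "Pound (Lb)", "id": 13}, "quantity": "1/2",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
    {"unit_of_measurement": {"name": "Pound (Lb)", "id": 13}, "quantity": "1/2",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
    {"unit_of_measurement": {"name": "Tablespoon", "id": 15}, "quantity": "3",
     "ingredient": {"name": "Basil Leaves", "id": 14}},
    # Made-up extra entry: same ingredient as above, but in a different unit
    {"unit_of_measurement": {"name": "Tablespoon", "id": 15}, "quantity": "1/4",
     "ingredient": {"name": "Balsamic Vinegar", "id": 12}},
]

for entry in merge_ingredients(ingredients):
    print(entry["ingredient"]["name"], entry["quantity"], entry["unit_of_measurement"]["name"])
```

Converting between units (say, tablespoons to pounds) is a separate problem and is not attempted here.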
I have a list of dictionaries from the database that are in a parent-child relationship:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
I want to convert it to a nested structure like this:
[ { "id": 1, "parent_id": 0, "name": "Wood", "price": 0, "children": [ { "id": 2, "parent_id": 1, "name": "Mango", "price": 18, "children": [ { "id": 3, "parent_id": 2, "name": "Table", "price": 342 }, { "id": 4, "parent_id": 2, "name": "Box", "price": 340, "children": [ { "id": 5, "parent_id": 4, "name": "Pencil", "price": 240 } ] } ] } ] }, { "id": 6, "parent_id": 0, "name": "Electronic", "price": 20, "children": [ { "id": 7, "parent_id": 6, "name": "TV", "price": 350 }, { "id": 8, "parent_id": 6, "name": "Mobile", "price": 300, "children": [ { "id": 9, "parent_id": 8, "name": "Iphone", "price": 0, "children": [ { "id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400 } ] } ] } ] } ]
You can do this recursively, starting from the root nodes (where parent_id = 0) going downwards. But before your recursive calls, you can group nodes by their parent_id so that accessing them in each recursive call can be done in constant time:
levels = {}
for n in data:
    levels.setdefault(n['parent_id'], []).append(n)
def build_tree(parent_id=0):
    nodes = [dict(n) for n in levels.get(parent_id, [])]
    for n in nodes:
        children = build_tree(n['id'])
        if children:
            n['children'] = children
    return nodes
tree = build_tree()
print(tree)
Output
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
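One caveat with the recursive version: CPython limits recursion depth (1000 frames by default, see `sys.getrecursionlimit()`), so a very deep hierarchy could raise RecursionError. Here is a sketch of the same group-by-parent idea using an explicit stack instead of recursion; `build_tree_iterative` is a hypothetical name.

```python
def build_tree_iterative(data, root_parent_id=0):
    # Group nodes by parent_id, as in the recursive version
    levels = {}
    for n in data:
        levels.setdefault(n["parent_id"], []).append(n)

    # Copy the root nodes so the input dicts are not modified
    roots = [dict(n) for n in levels.get(root_parent_id, [])]
    stack = list(roots)
    while stack:
        node = stack.pop()
        children = [dict(c) for c in levels.get(node["id"], [])]
        if children:
            node["children"] = children
            # Each child still needs its own children attached
            stack.extend(children)
    return roots


data = [
    {"id": 1, "parent_id": 0, "name": "Wood", "price": 0},
    {"id": 2, "parent_id": 1, "name": "Mango", "price": 18},
    {"id": 3, "parent_id": 2, "name": "Table", "price": 342},
    {"id": 4, "parent_id": 2, "name": "Box", "price": 340},
    {"id": 5, "parent_id": 4, "name": "Pencil", "price": 240},
    {"id": 6, "parent_id": 0, "name": "Electronic", "price": 20},
    {"id": 7, "parent_id": 6, "name": "TV", "price": 350},
    {"id": 8, "parent_id": 6, "name": "Mobile", "price": 300},
    {"id": 9, "parent_id": 8, "name": "Iphone", "price": 0},
    {"id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400},
]

print(build_tree_iterative(data))
```

The processing order of the stack does not affect the result, because each `children` list is built in the original data order before it is attached.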
The code is documented inline. It ignores corner cases such as circular relations.
# Actual Data
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Create parent -> child links using a dictionary keyed by id
data_dict = {r['id']: r for r in data}
for r in data:
    if r['parent_id'] in data_dict:
        parent = data_dict[r['parent_id']]
        if 'children' not in parent:
            parent['children'] = []
        parent['children'].append(r)
# Helper function to get all the ids in a node's subtree (the node itself included)
def get_all_ids(r):
    l = list()
    l.append(r['id'])
    if 'children' in r:
        for c in r['children']:
            l.extend(get_all_ids(c))
    return l
# Trim the results so that each id appears only once
ids = set(data_dict.keys())
result = []
for r in data_dict.values():
    the_ids = set(get_all_ids(r))
    if ids.intersection(the_ids):
        ids = ids.difference(the_ids)
        result.append(r)
print(result)
Output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
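The trimming step above relies on every parent appearing before its descendants in data_dict: if a child came first, it would be emitted as a root and then again inside its parent. A sketch of an alternative that does not depend on order links each node to its parent in one pass and treats any node whose parent_id is not a known id as a root. `build_forest` is a hypothetical name, and like the answer above it does not handle circular relations.

```python
def build_forest(rows):
    # Shallow-copy every row so the caller's dicts are not modified
    by_id = {r["id"]: dict(r) for r in rows}
    roots = []
    for node in by_id.values():
        parent = by_id.get(node["parent_id"])
        if parent is None:
            # parent_id 0 (or any unknown id) marks a top-level node
            roots.append(node)
        else:
            parent.setdefault("children", []).append(node)
    return roots


data = [
    {"id": 1, "parent_id": 0, "name": "Wood", "price": 0},
    {"id": 2, "parent_id": 1, "name": "Mango", "price": 18},
    {"id": 3, "parent_id": 2, "name": "Table", "price": 342},
    {"id": 4, "parent_id": 2, "name": "Box", "price": 340},
    {"id": 5, "parent_id": 4, "name": "Pencil", "price": 240},
    {"id": 6, "parent_id": 0, "name": "Electronic", "price": 20},
    {"id": 7, "parent_id": 6, "name": "TV", "price": 350},
    {"id": 8, "parent_id": 6, "name": "Mobile", "price": 300},
    {"id": 9, "parent_id": 8, "name": "Iphone", "price": 0},
    {"id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400},
]

print(build_forest(data))
print(build_forest(list(reversed(data))))  # same tree shape, children in reversed order
```

Children keep the order in which they appear in the input, and each node is visited once, so the work is linear in the number of rows.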
I worked out a very short solution. It isn't the most efficient algorithm (each pass scans the whole list, so it's roughly quadratic in the number of items), but it does the job; it would need considerable optimization to work on very large data sets.
for i in range(len(data)-1, -1, -1):
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    for child in data[i]["children"]:
        data.remove(child)
Here is the same code with a step-by-step explanation:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Loop backwards, moving each child into its parent in the hierarchy
# (assumes every child appears after its parent in data)
for i in range(len(data)-1, -1, -1):
    # Create a "children" key on the current element and assign to it a list
    # comprehension that loops over all items in data and collects the
    # elements whose parent_id equals the current element's id
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    # Since each child is now placed inside its parent, remove it
    # from its original position in data
    for child in data[i]["children"]:
        data.remove(child)
# Print the new data structure
print(data)
And here is the output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342, 'children': []}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240, 'children': []}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350, 'children': []}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400, 'children': []}]}]}]}]