I have a list of JSON objects like this:
[{"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
 {"Level1": "A", "Level2": "B", "Level3": null, "Level4": "item2"},
 {"Level1": "D", "Level2": null, "Level3": null, "Level4": "item3"}]
In Python, I want to group them by level to create a tree structure.
{text: "root": items:
[{text: "A", items: [
{text: "B", items: [
{text: "C", items: [
text: "item1", items:[]]},
{text: "item2", items: []}}]},
{text: "D", items: [{text: "item3", items: []}]}
]
]}
# pseudocode (fixed so it runs)
result = dict()
result["text"] = "root"
result["items"] = []
# track which labels have already been created at each level
d = {"Level1": set(), "Level2": set(), "Level3": set(), "Level4": set()}
for row in data_rows:
    insertLocation = result["items"]
    for key in ["Level1", "Level2", "Level3", "Level4"]:
        txt = row[key]
        if txt is None:
            # skip null levels so "item2" lands directly under "B"
            continue
        if txt in d[key]:
            # label already exists: descend into its subtree
            # (note: this lookup assumes a label is unique within its level)
            for j in insertLocation:
                if j["text"] == txt:
                    insertLocation = j["items"]
                    break
        else:
            newItem = {"text": txt, "items": []}
            insertLocation.append(newItem)
            insertLocation = newItem["items"]
            d[key].add(txt)
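For a quick sanity check (assuming data_rows holds the three sample rows above, with None in place of the JSON nulls), printing the result after the loop should reproduce the "root" tree shown earlier:

import json
print(json.dumps(result, indent=2))  # should match the "root" tree above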
Can anyone provide feedback on my code to make it more efficient? (Or, if there's a better way to do this, that would be super great to know.) I'm really looking to maximize efficiency.
Here is how I would suggest doing it.
To understand this, you will have to understand:
functools.reduce
collections.defaultdict
Recursion
List comprehensions
Basically, I first change the input (because the keys don't matter) to be a list of lists of non-null values.
Then I functools.reduce the lists of non-null values into a simple tree structure of the shape that we want.
Everything in this simple_tree is a collections.defaultdict, so I finally convert these into normal dicts and use recursion to get the "text"/"items" structure that you want.
This approach is more "functional" than your approach, and is smarter about pruning the input.
import collections
import functools

sequence = [
    {"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
    {"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"},
    {"Level1": "D", "Level2": None, "Level3": None, "Level4": "item3"},
]

# Change the input to be only the non-null values (order is preserved)
# I am using 2 list comprehensions here
seq_values = [[y for y in x.values() if y is not None] for x in sequence]

# Define a recursive defaultdict
# https://stackoverflow.com/questions/19189274/nested-defaultdict-of-defaultdict
def recursive_defaultdict():
    return collections.defaultdict(recursive_defaultdict)

# Walk one row's values down the tree, creating nodes as we go
def reduce_function(agg, cur):
    sub_agg = agg
    for seq_value in cur:
        sub_agg = sub_agg[seq_value]
    return agg

# Use reduce to get a simple tree in the shape/structure we want
simple_tree = functools.reduce(
    reduce_function,
    seq_values,
    recursive_defaultdict()
)

# Use recursion to change simple "defaultdict tree" into custom text/items tree
def convert_defaultdict_tree_to_custom_dict(dd_tree):
    if dd_tree:
        return [{"text": x, "items": convert_defaultdict_tree_to_custom_dict(y)} for x, y in dd_tree.items()]
    return []

# Here is the final_result, the thing you want.
final_result = {
    "text": "root",
    "items": convert_defaultdict_tree_to_custom_dict(simple_tree)
}
# Test the final_result is correct
expected_result = {
    "text": "root",
    "items": [
        {
            "text": "A",
            "items": [
                {
                    "text": "B",
                    "items": [
                        {
                            "text": "C",
                            "items": [
                                {"text": "item1", "items": []},
                            ],
                        },
                        {
                            "text": "item2",
                            "items": [],
                        },
                    ],
                },
            ],
        },
        {"text": "D", "items": [{"text": "item3", "items": []}]},
    ],
}
assert final_result == expected_result
You can see at the end I do an assert to make sure the final_result is what we want. Let me know if you want help understanding any of this.
I suggest you check this answer by @anom on "What are some good code optimization methods?". In summary:
Step 1. Do not think about performance, think about clarity and correctness.
...
Step 4. See step 1.
Here's an alternative:
data_rows = [
    {"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
    {"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"},
    {"Level1": "D", "Level2": None, "Level3": None, "Level4": "item3"}
]

result = {"root": {}}

for row in data_rows:
    current_level = result["root"]
    keys = list(row.keys())  # dict views can't be sliced, so make a list first
    # go through every "level" (key) in the row except the last one
    for level in keys[:-1]:
        # make sure the level is not None
        if row[level]:
            # check if the level already exists
            if row[level] not in current_level:
                # if it doesn't, create this level
                current_level[row[level]] = {}
            # we move to the new level
            current_level = current_level[row[level]]
    # we set the "item" in the last valid level
    current_level["items"] = row[keys[-1]]

import json
print(json.dumps(result, indent=4))
This will create a tree as below:
{
    "root": {
        "A": {
            "B": {
                "items": "item2",
                "C": {
                    "items": "item1"
                }
            }
        },
        "D": {
            "items": "item3"
        }
    }
}
So instead of having a separate key "text" with the level name, the level name itself becomes the key.
The above code will work with any number of levels, as long as the last key in every data_row holds the "item".
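If you do need the exact "text"/"items" shape from the question, a small recursive pass can reshape this tree. This is just a sketch: to_text_items is a name I made up, and it assumes leaf values live under the "items" key as produced above.

def to_text_items(tree):
    # Convert {"A": {...}, "D": {...}} nesting into [{"text": "A", "items": [...]}, ...]
    items = []
    for name, subtree in tree.items():
        if name == "items":
            # a leaf value stored by the loop above becomes its own node
            items.append({"text": subtree, "items": []})
        else:
            items.append({"text": name, "items": to_text_items(subtree)})
    return items

tree = {"text": "root", "items": to_text_items(result["root"])}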
I need each document in a collection to be updated only if its content is different, regardless of the order of the elements in nested lists.
Fundamentally, two versions should be considered the same if their elements are identical, regardless of order. MongoDB does not behave that way by default.
def upsert(query, update):
    # collection is a pymongo.collection.Collection object
    result = collection.update_one(query, update, upsert=True)
    print("\tFound match: ", result.matched_count > 0)
    print("\tCreated: ", result.upserted_id is not None)
    print("\tModified existing: ", result.modified_count > 0)
query = {"name": "Some name"}
update = {"$set": {
"products": [
{"product_name": "a"},
{"product_name": "b"},
{"product_name": "c"}]
}}
print("First update")
upsert(query, update)
print("Same update")
upsert(query, update)
update = {"$set": {
"products": [
{"product_name": "c"},
{"product_name": "b"},
{"product_name": "a"}]
}}
print("Update with different order of products")
upsert(query, update)
Output:
First update
	Found match: False
	Created: True
	Modified existing: False
Same update
	Found match: True
	Created: False
	Modified existing: False
Update with different order of products
	Found match: True
	Created: False
	Modified existing: True
The last update does modify the document, because the order of the products is indeed different.
I did find a working solution: compare a sorted canonical form of the queried document's content with a sorted canonical form of the new one.
Thanks to Zero Piraeus's response for the short and convenient way to sort for comparison.
def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj
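For example, two documents that differ only in list order compare equal once canonicalized:

a = {"products": [{"product_name": "a"}, {"product_name": "b"}]}
b = {"products": [{"product_name": "b"}, {"product_name": "a"}]}
assert ordered(a) == ordered(b)  # same content, different order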
I apply it to compare the current and the new versions of the document. If their sorted forms differ, I apply the update.
new_update = {
    "products": [
        {"product_name": "b"},
        {"product_name": "c"},
        {"product_name": "a"}]
}

returned_doc = collection.find_one(query)
# Merging remote document with local dictionary
merged_doc = {**returned_doc, **new_update}

if ordered(returned_doc) != ordered(merged_doc):
    upsert(query, {"$set": new_update})
    print("Updated")
else:
    print("Not Updated")
Output:
Not Updated
That works, but it relies on Python to do the comparison, introducing a delay between the read and the write.
Is there a way to do it atomically? Or, even better, a way to set a MongoDB collection to adopt some kind of "order inside arrays doesn't matter" mode?
This is part of a generic implementation. Documents can have any kind of nesting in their structure.
@nimrodserok correctly pointed out a flaw in my first answer. Here's my updated answer, which is a slight variation of his.
This should also preserve the upsert option.
new_new_update = [
    {
        "$set": {
            "products": {
                "$let": {
                    "vars": {
                        "new_products": [
                            {"product_name": "b"},
                            {"product_name": "c"},
                            {"product_name": "a"}
                        ],
                        "current_products": {
                            # need this for upsert
                            "$ifNull": ["$products", []]
                        }
                    },
                    "in": {
                        "$cond": [
                            {"$setEquals": ["$$current_products", "$$new_products"]},
                            "$$current_products",
                            "$$new_products"
                        ]
                    }
                }
            }
        }
    }
]
Here's a mongoplayground.net example to demonstrate the concept. You can change the "name" value to verify the upsert option.
I'm curious what the result values are for this update_one.
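For reference, a minimal sketch of running it with PyMongo (aggregation-pipeline updates like this require MongoDB 4.2+ and PyMongo 3.9+; collection and query are the objects from the question):

# reuse the prints from the question's upsert helper to inspect the result
result = collection.update_one(query, new_new_update, upsert=True)
print("Found match: ", result.matched_count > 0)
print("Created: ", result.upserted_id is not None)
print("Modified existing: ", result.modified_count > 0)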
EDIT:
According to your comment (and the resemblance to the other question), I suggest:
db.collection.updateMany(
  {"name": "Some name"},
  [{
    $set: {products: {
      $cond: [
        {$setEquals: [["a", "c", "b"], "$products.product_name"]},
        "$products",
        [{"product_name": "a"}, {"product_name": "c"}, {"product_name": "b"}]
      ]
    }}
  }]
)
See how it works on the playground example
Original answer:
One option is to use the query part of the update to work only on documents that match your condition:
db.collection.update(
  {"name": "Some name",
   "products.product_name": {
     $not: {$all: ["a", "b", "c"]}}
  },
  {"$set": {
    "products": [
      {"product_name": "b"},
      {"product_name": "c"},
      {"product_name": "a"}
    ]
  }}
)
See how it works on the playground example
I have a list of objects, and I want to create another list of items grouped by "Name", with two fields that count the instances of each "Type".
I have this:
result = [
    {"Name": "Foo", "Type": "A", "RandomColumn1": "1"},
    {"Name": "Bar", "Type": "B", "RandomColumn2": "2"},
    {"Name": "Foo", "Type": "A", "RandomColumn3": "3"},
    {"Name": "Bar", "Type": "A", "RandomColumn4": "4"},
    {"Name": "Foo", "Type": "B", "RandomColumn5": "5"},
]
I am trying to count how many of each "Type" value occur for each name, while discarding any other column (RandomColumnX in this case).
I want the above to come out like this:
[{"Name": "Foo", "A": 2, "B": 1}, {"Name": "Bar", "A": 1, "B": 1}]
I tried doing something like this:
group_requests = [{
    "Name": key,
    "A": len([d for d in list(value) if d.get('Type') == 'A']),
    "B": len([y for y in list(value) if y['Type'] == 'B']),
} for key, value in groupby(result, key=lambda x: x['Name'])]
However, it does not count the values in the "B" column and the count for this key is always 0.
Can anybody help me?
Mistake 1.
In order for itertools.groupby to work, your input iterable needs to already be sorted on the same key function.
result = sorted(result, key=lambda x: x["Name"])
Mistake 2.
The returned group, i.e. value, is itself an iterator, so you need to save its output in order to iterate over it multiple times.
import itertools

group_requests = []
for key, value in itertools.groupby(result, key=lambda x: x["Name"]):
    value = list(value)  # save the output
    temp = {
        "Name": key,
        "A": len([d for d in value if d.get("Type") == "A"]),
        "B": len([y for y in value if y["Type"] == "B"]),
    }
    group_requests.append(temp)
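With both fixes applied, the sample data should yield the expected counts ("Bar" sorts before "Foo" after the sorted call):

print(group_requests)
# [{'Name': 'Bar', 'A': 1, 'B': 1}, {'Name': 'Foo', 'A': 2, 'B': 1}]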
If someone wants a version without list comprehensions, it can be achieved like this:
from collections import defaultdict

result = [{"Name": "Foo", "Type": "A", "RandomColumn1": "1"},
          {"Name": "Bar", "Type": "B", "RandomColumn2": "2"},
          {"Name": "Foo", "Type": "A", "RandomColumn3": "3"},
          {"Name": "Bar", "Type": "A", "RandomColumn4": "4"},
          {"Name": "Foo", "Type": "B", "RandomColumn5": "5"}]

group_requests = []
counterA = defaultdict(int)
counterB = defaultdict(int)
names = set()

for val in result:
    name = val["Name"]
    row_type = val["Type"]  # avoid shadowing the built-in `type`
    if row_type == "A":
        counterA[name] += 1
    else:
        counterB[name] += 1
    names.add(name)

for name in names:
    group_requests.append({
        "Name": name,
        "A": counterA[name],
        "B": counterB[name]
    })

print(group_requests)
I have a large JSON file with the following structure, with varying depth of nesting under each node. I want to delete levels beyond an assigned depth.
So far I have tried to cut some levels manually, but they don't get removed correctly: I delete them by index, and the indexes shift after each deletion.
content = json.load(file)
content_copy = content.copy()

for j, i in enumerate(content):
    if 'children' in i:
        for xx, x in enumerate(i['children']):
            if 'children' in x:
                for zz, z in enumerate(x['children']):
                    if 'children' in z:
                        del content_copy[j]['children'][xx]['children'][zz]
Input:
[
    {
        "name": "1",
        "children": [
            {
                "name": "3",
                "children": "5"
            },
            {
                "name": "33",
                "children": "51"
            },
            {
                "name": "13",
                "children": [
                    {
                        "name": "20",
                        "children": "30"
                    },
                    {
                        "name": "40",
                        "children": "50"
                    }
                ]
            }
        ]
    },
    {
        "name": "2",
        "children": [
            {
                "name": "7",
                "children": "6"
            },
            {
                "name": "3",
                "children": "521"
            },
            {
                "name": "193",
                "children": "292"
            }
        ]
    }
]
Output:
Here the children of "name": "13" have been removed.
[
    {
        "name": "1",
        "children": [
            {
                "name": "3",
                "children": "5"
            },
            {
                "name": "33",
                "children": "51"
            },
            {
                "name": "13"
            }
        ]
    },
    {
        "name": "2",
        "children": [
            {
                "name": "7",
                "children": "6"
            },
            {
                "name": "3",
                "children": "521"
            },
            {
                "name": "193",
                "children": "292"
            }
        ]
    }
]
Not a Python answer, but in the hope it's useful to someone, here is a one-liner using the jq tool:
<file jq 'del(.[][][]?[]?[]?)'
It simply deletes all elements nested five or more levels deep.
The question mark ? is used to avoid errors when iterating over elements that are not nested that deeply.
One way to prune is to pass depth+1 in a recursive function call.
You are asking for different behaviors for different types. If the grandchild is just a string, you want to keep it, but if it is a list then you want to prune. This seems inconsistent: 13 should have children ["20", "30"], but then they wouldn't have the same node structure, so I can see your dilemma.
I would convert them to a tree of node objects and then just prune nodes, but to get the exact output you listed, I can selectively prune based on whether the child is a string or a list.
import pprint
import json

data = """[{
    "name": "1", "children":
    [
        {"name":"3","children":"5"},{"name":"33","children":"51"},
        {"name":"13","children":[{"name":"20","children":"30"},
        {"name":"40","children":"50"}]}
    ]
},
{
    "name": "2", "children":
    [
        {"name":"7","children":"6"},
        {"name":"3","children":"521"},
        {"name":"193","children":"292"}
    ]
}]"""

content = json.loads(data)

def pruned_nodecopy(content, prune_level, depth=0):
    if not isinstance(content, list):
        return content
    result = []
    for node in content:
        node_copy = {'name': node['name']}
        if 'children' in node:
            children = node['children']
            if not isinstance(children, list):
                node_copy['children'] = children
            elif depth+1 < prune_level:
                node_copy['children'] = pruned_nodecopy(node['children'], prune_level, depth+1)
        result.append(node_copy)
    return result

content_copy = pruned_nodecopy(content, 2)
pprint.pprint(content_copy)
Note that this is specifically copying the attributes you use. I had to make it use hard-coded attributes because you're asking for specific (and different) behaviors on those.
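The prune level is adjustable, too; for instance, under the same assumptions, level 1 should drop every nested children list entirely:

pprint.pprint(pruned_nodecopy(content, 1))
# [{'name': '1'}, {'name': '2'}]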
Note: I modified the original question to demonstrate exactly what I am trying to do.
I have two lists of JSON objects that I need to check are the "same", e.g.:
x = [
    {
        "Id": "A",
        "Type": "A",
        "Component": "A",
        "Level": "A",
        "Debug": "A",
        "Space": 1
    },
    {
        "Id": "B",
        "Type": "B",
        "Component": "B",
        "Level": "B",
        "Debug": "B",
    }
]

y = [
    {
        "Id": "B",
        "Type": "B",
        "Component": "B",
        "Level": "B",
        "Debug": "B",
    },
    {
        "Id": "A",
        "Type": "A",
        "Component": "A",
        "Level": "A",
        "Debug": "A",
        "Space": 1
    }
]
For the purpose of what I am doing these objects would be classed as identical.
Background: I have data returned from an API that I have no control over. I need to check if the returned data is identical to a pre-stored definition that I have. If it is identical, that is both lists contain the same number of objects and those contain the same keys, then I need to return True. Otherwise False. Note that the keys in the dictionaries can be in a different order.
What I have tried so far:
Performing a json.dumps() with sort_keys set. This sorts the keys inside each object but does not reorder the objects within the lists.
I have also tried moving object A above object B in y; doing a x == y comparison then does return True.
Based on the above, it looks like the built-in equality operator expects the list elements to be in the same order. Therefore, I think something that can sort based on Type and then Id might be the way forward. The Type field determines whether the object contains a Space attribute, the addition of which in some of the objects seems to indicate that ordering is important.
Note 1: Performance is not important.
Note 2: I am not familiar with Python, so I'm unsure which tool to use from the toolbox, or whether there is anything built in.
Thanks.
JSON is just serialized objects; you can load it into Python dicts with the json module's load or loads methods (depending on your input), then just use the == operator.
import json
d1 = json.loads("""[{
    "a": 1,
    "b": 2,
    "c": {
        "x": 9,
        "y": 10
    }
},
{
    "a": 10,
    "b": 20,
    "c": {
        "x": 90,
        "y": 100
    }
}]""")

d2 = json.loads("""[{
    "b": 2,
    "a": 1,
    "c": {
        "x": 9,
        "y": 10
    }
},
{
    "a": 10,
    "b": 20,
    "c": {
        "y": 100,
        "x": 90
    }
}]""")
print(d1 == d2) # True
As a side note, your JSON documents are not valid; I had to add some commas.
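Note that == still requires the lists to be in the same order. If list order should be ignored, as in the question, one option (a sketch, reusing the canonical-sort idea from the ordered helper shown earlier on this page) is to compare sorted canonical forms:

def canonical(obj):
    # recursively sort dict items and list elements so ordering no longer matters
    if isinstance(obj, dict):
        return sorted((k, canonical(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(canonical(x) for x in obj)
    return obj

print(canonical(d1) == canonical(d2))  # True even if the lists were reordered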
I want to sort the JSON below by ID.
This is unsuccessful:
records2 = sorted(records, key=lambda d: d["id"])
Question
How do I sort this JSON code by ID?
{
    "item": [
        {
            "id": "2",
            "name": "name1",
            "arr": [
                "a",
                "b"
            ]
        },
        {
            "id": "1",
            "name": "name2",
            "arr": [
                "c",
                "d"
            ]
        }
    ]
}
Try this:
records2 = {"items": sorted(records["item"], key=lambda d: d["id"])}
sorted works on an array (list), so you must go down to "item", which is the array (list) inside the object.
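One caveat: the "id" values here are strings, so the sort above is lexicographic ("10" would sort before "2"). If numeric order is wanted, converting in the key function is a small tweak:

# convert the string ids to int for numeric ordering
records2 = {"items": sorted(records["item"], key=lambda d: int(d["id"]))}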