MongoDB: conditional updates considering arrays as unordered - python

I need each document in a collection to be updated only if its content is different, regardless of the order of the elements in nested lists.
Fundamentally, two versions should be the same if the elements are identical regardless of their order. MongoDB does not do that, by default.
def upsert(query, update):
    # collection is a pymongo.collection.Collection object
    result = collection.update_one(query, update, upsert=True)
    print("\tFound match: ", result.matched_count > 0)
    print("\tCreated: ", result.upserted_id is not None)
    print("\tModified existing: ", result.modified_count > 0)
query = {"name": "Some name"}
update = {"$set": {
"products": [
{"product_name": "a"},
{"product_name": "b"},
{"product_name": "c"}]
}}
print("First update")
upsert(query, update)
print("Same update")
upsert(query, update)
update = {"$set": {
"products": [
{"product_name": "c"},
{"product_name": "b"},
{"product_name": "a"}]
}}
print("Update with different order of products")
upsert(query, update)
Output:
First update
Found match: False
Created: True
Modified existing: False
Same update
Found match: True
Created: False
Modified existing: False
Update with different order of products
Found match: True
Created: False
Modified existing: True
The last update does modify the document because the order of the products is indeed different.
I did find a working solution which is to compare a sorting of the queried document's content and a sorting of the new one.
Thanks to Zero Piraeus's response for the short and convenient way to sort for comparison.
def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj
I apply it to compare the current and the new versions of the document. If their sorted forms differ, I apply the update.
new_update = {
"products": [
{"product_name": "b"},
{"product_name": "c"},
{"product_name": "a"}]
}
returned_doc = collection.find_one(query)
# Merging remote document with local dictionary
merged_doc = {**returned_doc, **new_update}
if ordered(returned_doc) != ordered(merged_doc):
    upsert(query, {"$set": new_update})
    print("Updated")
else:
    print("Not Updated")
Output:
Not Updated
That works, but it relies on Python to do the comparison, introducing a delay between the read and the write.
Is there a way to do it atomically? Or, even better, a way to set a MongoDB collection to adopt some kind of "order inside arrays doesn't matter" mode?
This is part of a generic implementation. Documents can have any kind of nesting in their structure.

@nimrodserok correctly pointed out a flaw in my first answer. Here's my updated answer that's a slight variation of his answer.
This should also preserve the upsert option.
new_new_update = [
{
"$set": {
"products": {
"$let": {
"vars": {
"new_products": [
{"product_name": "b"},
{"product_name": "c"},
{"product_name": "a"}
],
"current_products": {
# need this for upsert
"$ifNull": ["$products", []]
}
},
"in": {
"$cond": [
{"$setEquals": ["$$current_products", "$$new_products"]},
"$$current_products",
"$$new_products"
]
}
}
}
}
}
]
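For reuse with other array fields, the pipeline above could be generated by a small helper. This is only a sketch: the name build_unordered_set_pipeline and its parameters are made up for illustration, and it assumes a server that accepts aggregation pipelines as updates (MongoDB 4.2 or later). Keep in mind that $setEquals ignores duplicate elements as well as order.

```python
def build_unordered_set_pipeline(field, new_values):
    """Build an update pipeline that replaces `field` only when its
    elements differ from `new_values`, ignoring order.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return [
        {
            "$set": {
                field: {
                    "$let": {
                        "vars": {
                            "new_values": new_values,
                            # a missing field (e.g. on upsert) counts as empty
                            "current_values": {"$ifNull": ["$" + field, []]},
                        },
                        "in": {
                            "$cond": [
                                {"$setEquals": ["$$current_values", "$$new_values"]},
                                "$$current_values",
                                "$$new_values",
                            ]
                        },
                    }
                }
            }
        }
    ]


pipeline = build_unordered_set_pipeline(
    "products",
    [{"product_name": "b"}, {"product_name": "c"}, {"product_name": "a"}],
)
```

The resulting list would be passed in place of the update document, for example collection.update_one(query, pipeline, upsert=True).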
Here's a mongoplayground.net example to demonstrate the concept. You can change the "name" value to verify the upsert option.
I'm curious what the result values are for this update_one.

EDIT:
Based on your comment (and the resemblance to another question), I suggest:
db.collection.updateMany(
{"name": "Some name"},
[{
$set: {products: {
$cond: [
{$setEquals: [["a", "c", "b"], "$products.product_name"]},
"$products",
[{"product_name": "a"}, {"product_name": "c"}, {"product_name": "b"}]
]
}}
}]
)
See how it works on the playground example
Original answer:
One option is to use the query part of the update to work only on documents that are matching your condition:
db.collection.update(
{"name": "Some name",
"products.product_name": {
$not: {$all: ["a", "b", "c"]}}
},
{"$set": {
"products": [
{"product_name": "b"},
{"product_name": "c"},
{"product_name": "a"}
]
}}
)
See how it works on the playground example

Related

Filtering list and classify according to ID

I have a list of dictionaries in my python app. A sample list is as follows:
[
{
"rule": {"uid": "4"},
"layer": {"name": "Premium Network", "type": "access-layer"},
"package": {"name": "Standard"},
},
{
"rule": {"uid": "10"},
"layer": {"name": "Premium Network", "type": "access-layer"},
"package": {"name": "Standard"},
},
{
"rule": {"uid": "2"},
"layer": {"name": "Premium Network", "type": "access-layer"},
"package": {"name": "Premium"},
},
{
"rule": {"uid": "5"},
"layer": {"name": "Premium Network", "type": "access-layer"},
"package": {"name": "Premium"},
},
{
"rule": {"uid": "78"},
"layer": {"name": "Premium Network", "type": "access-layer"},
"package": {"name": "Premium"},
},
]
I want to filter the list in such a way where for each package name I get the UID in a list.
I already have a class 'Rule' with attributes policy name and uids to store the data.
In my example above I should get two Rule objects:
1. 1st rule object with name Standard and uids ["4", "10"]
2. 2nd rule object with name Premium and uids ["2", "5", "78"]
I achieved this with the use of a for loop and flags but I was wondering if there was a better way to do this using filters in Python or any other better way than a long for loop with flags?
Below is the code i used to achieve this functionality:
@staticmethod
def fetch_rules(object_name, filtered_list):
    access_rules_uid = []
    rules = []
    current_policy_processing = None
    # If list is not empty, proceed
    for rule in filtered_list:
        # Get the policy name
        policy = rule["package"]["name"]
        # Flag the current policy to
        # know when to switch
        if current_policy_processing is None:
            current_policy_processing = policy
        # If policy name has changed
        if current_policy_processing != policy:
            # Create new rule object and assign data
            rl = Rule(object_name, current_policy_processing, [])
            rl.access_rules_uid.extend(access_rules_uid)
            rules.append(rl)
            # Change old policy name to new
            current_policy_processing = policy
            # Clear uid array
            access_rules_uid.clear()
        access_rules_uid.append(rule["rule"]["uid"])
    if access_rules_uid:
        # This is to add the last item
        # Create new rule object and assign data
        rl = Rule(object_name, current_policy_processing, [])
        rl.access_rules_uid.extend(access_rules_uid)
        rules.append(rl)
        # Cleanup
        # Change old policy name to none
        current_policy_processing = None
        # Clear uid array
        access_rules_uid.clear()
    return rules
named_rules = [(a["rule"]["uid"], a["package"]["name"]) for a in yourlist]
groups = {}
for rule, name in named_rules:
    # collect the rule uid (not the package name) under its package
    if name in groups:
        groups[name].append(rule)
    else:
        groups[name] = [rule]
print(groups)
A one-liner parses the input, and the loop then groups the parsed pairs by package name, as you require.
I will leave converting the groups into your output format to you.
You could (and should) skip the one liner, and parse directly. I just wanted to make the example more readable.
You can do it using itertools.groupby like this:
import itertools

def transform(array):
    package_name_getter = lambda item: item['package']['name']
    # groupby only merges adjacent items, so sort by the same key first
    sorted_array = sorted(array, key=package_name_getter)
    return [
        Rule(
            package_name,
            [
                item['rule']['uid']
                for item in items
            ],
        )
        for package_name, items in itertools.groupby(sorted_array, key=package_name_getter)
    ]
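Another option, sketched here with collections.defaultdict, avoids sorting and still works when entries for the same package are not adjacent. The Rule dataclass below is a hypothetical stand-in for the asker's class (whose real constructor is not shown in full), and group_rules is an illustrative name.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    # hypothetical stand-in for the asker's Rule class
    policy_name: str
    access_rules_uid: List[str] = field(default_factory=list)


def group_rules(items):
    """Collect rule uids per package name, keeping first-seen order."""
    uids_by_package = defaultdict(list)
    for item in items:
        uids_by_package[item["package"]["name"]].append(item["rule"]["uid"])
    return [Rule(name, uids) for name, uids in uids_by_package.items()]


sample = [
    {"rule": {"uid": "4"}, "package": {"name": "Standard"}},
    {"rule": {"uid": "10"}, "package": {"name": "Standard"}},
    {"rule": {"uid": "2"}, "package": {"name": "Premium"}},
]
rules = group_rules(sample)
```

Because dicts preserve insertion order, the packages come out in the order they first appear in the input.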

Split dictionary into multiple dicts based on List value

I'm working with a nested mongo backend and am in the process of mapping it to SQL database tables. A user can fill in forms which will be stored as following
{
"response": {
"question_1": "answer_1",
"question_2": [
"answer_a",
"answer_b"
]
},
"user": "user_1",
"form_id": "2344"
}
Questions can have multiple answers stored as an array.
I would like to flatten this into a long format (i.e. a single record per question-answer combination), like so:
[
{
"question_id": "question_1",
"answer": "answer_1",
"user": "user_1",
"form_id": "2344"
},
{
"question_id": "question_2",
"answer": "answer_a",
"user": "user_1",
"form_id": "2344"
},
{
"question_id": "question_2",
"answer": "answer_b",
"user": "user_1",
"form_id": "2344"
}
]
What would be the most efficient way to achieve this in python?
I could brute-force it by looping over every response in the response dict but I have the feeling that's overkill...
Many thanks
EDIT:
A first working attempt uses the following function:
from typing import Dict

def transform_responses_to_long(completed_form: Dict):
    responses = completed_form["response"]
    # note: this removes "response" from the caller's dict
    del completed_form["response"]
    for key, values in responses.items():
        if not isinstance(values, (list, dict)):
            values = [values]
        for value in values:
            result = {}
            result.update(completed_form)
            result.update({"question_id": key, "answer": value})
            yield result
This yields the correct dict for every question-answer combination.
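One thing to watch in the function above is that it deletes the "response" key from the dict it is given. A variant that leaves the input untouched could look like the sketch below; the name responses_to_long is made up for illustration, and it assumes answers are either a single value or a list.

```python
def responses_to_long(completed_form):
    """Return one record per question/answer pair without mutating the input."""
    # fields shared by every output record (everything except the responses)
    shared = {k: v for k, v in completed_form.items() if k != "response"}
    records = []
    for question_id, answers in completed_form["response"].items():
        if not isinstance(answers, list):
            answers = [answers]
        for answer in answers:
            records.append({"question_id": question_id, "answer": answer, **shared})
    return records


form = {
    "response": {"question_1": "answer_1", "question_2": ["answer_a", "answer_b"]},
    "user": "user_1",
    "form_id": "2344",
}
long_records = responses_to_long(form)
```

A loop over every response is hard to avoid here, since each answer has to be visited once to produce its record.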

Group json objects to make tree structure [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I have a list of json objects like this -
[{Level1: "A", Level2: "B", Level3: "C", Level4: "item1"},
{Level1: "A", Level2: "B", Level3: null, Level4: "item2"},
{Level1: "D", Level2: null, Level3: null, Level4: "item3"}]
In Python, I want to group them by level to create a tree structure.
{text: "root", items: [
    {text: "A", items: [
        {text: "B", items: [
            {text: "C", items: [
                {text: "item1", items: []}]},
            {text: "item2", items: []}]}]},
    {text: "D", items: [{text: "item3", items: []}]}
]}
# pseudocode
result = dict()
result["text"] = "root"
result["items"] = []
d = {"Level1": set(), "Level2": set(), "Level3": set(), "Level4": set()}
for row in data_rows:
    insertLocation = result["items"]
    for key in ["Level1", "Level2", "Level3", "Level4"]:
        txt = row[key]
        if txt is None:
            continue  # skip empty levels
        if txt in d[key]:
            for j in insertLocation:
                if j["text"] == txt:
                    insertLocation = j["items"]
                    break
        else:
            newItem = {"text": txt, "items": []}
            insertLocation.append(newItem)
            insertLocation = newItem["items"]
            d[key].add(txt)
Can anyone provide any feedback on my code to perhaps make it more efficient? (or if there's a better way to do this, would be super great to know). I'm really looking to maximize efficiency.
Here is how I would suggest doing it.
To understand this, you will have to understand:
functools.reduce
collections.defaultdict
Recursion
List comprehensions
Basically, I first change the input (because the keys don't matter), to be a list of lists of non-null values.
Then I functools.reduce the lists of non-null values into a simple tree structure which is of the shape that we want.
Everything in this simple_tree is a collections.defaultdict so I finally convert these into normal dicts and use recursion to get the "text"/"items" structure that you want.
This approach is more "functional" than your approach, and is smarter about pruning the input.
import collections
import functools
sequence = [
{"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
{"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"},
{"Level1": "D", "Level2": None, "Level3": None, "Level4": "item3"},
]
# Change the input to be only the non-null values (order is preserved)
# I am using 2 list comprehensions here
seq_values = [[y for y in x.values() if y is not None] for x in sequence]
# Define a recursive defaultdict
# https://stackoverflow.com/questions/19189274/nested-defaultdict-of-defaultdict
def recursive_defaultdict():
    return collections.defaultdict(recursive_defaultdict)
# Use reduce to get a simple tree in the shape/structure we want
def reduce_function(agg, cur):
    sub_agg = agg
    for seq_value in cur:
        sub_agg = sub_agg[seq_value]
    return agg
# Use reduce to get a simple tree in the shape/structure we want
simple_tree = functools.reduce(
reduce_function,
seq_values,
recursive_defaultdict()
)
# Use recursion to change simple "defaultdict tree" into custom text/items tree
def convert_defaultdict_tree_to_custom_dict(dd_tree):
    if dd_tree:
        return [{"text": x, "items": convert_defaultdict_tree_to_custom_dict(y)} for x, y in dd_tree.items()]
    return []
# Here is the final_result, the thing you want.
final_result = {
"text": "root",
"items": convert_defaultdict_tree_to_custom_dict(simple_tree)
}
# Test the final_result is correct
expected_result = {
"text": "root",
"items": [
{
"text": "A",
"items": [
{
"text": "B",
"items": [
{
"text": "C",
"items": [
{"text": "item1", "items": []},
],
}, {
"text": "item2",
"items": [],
}
],
}
],
},
{"text": "D", "items": [{"text": "item3", "items": []}]},
],
}
assert final_result == expected_result
You can see at the end I do an assert to make sure the final_result is what we want. Let me know if you want help understanding any of this.
I suggest you check this answer by @anom on "What are some good code optimization methods?". In summary:
Step 1. Do not think about performance, think about clarity and correctness.
...
Step 4. See step 1.
Here's an alternative:
data_rows = [
{"Level1": "A", "Level2": "B", "Level3": "C", "Level4": "item1"},
{"Level1": "A", "Level2": "B", "Level3": None, "Level4": "item2"},
{"Level1": "D", "Level2": None,"Level3": None, "Level4": "item3"}
]
result = { "root": {} }
for row in data_rows:
    current_level = result["root"]
    # go through every "level" (key) in the row except the last one
    # (dict views are not subscriptable in Python 3, hence list())
    for level in list(row.keys())[:-1]:
        # make sure the level is not None
        if row[level]:
            # check if the level already exists
            if row[level] not in current_level:
                # if it doesn't, create this level
                current_level[row[level]] = {}
            # we move to the new level
            current_level = current_level[row[level]]
    # we set the "item" in the last valid level
    current_level["items"] = row[list(row.keys())[-1]]
import json
print(json.dumps(result, indent=4))
This will create a tree as below:
{
"root": {
"A": {
"B": {
"items": "item2",
"C": {
"items": "item1"
}
}
},
"D": {
"items": "item3"
}
}
}
So instead of having a separate key "text" with the level name, the level name itself becomes the key.
The above code will work with any number of levels, as long as the last item in every data_row is the "item"

Line wrapping code for nested dictionaries in python

With the new Entity Resolution in Alexa, nested dictionaries become very nested. What's the most Pythonic way to refer to a deeply nested value? How do I write the code while keeping within 79 characters per line?
This is what I currently have, and while it works, I'm pretty sure there is a better way:
if 'VolumeQuantity' in intent['slots']:
    if 'resolutions' in intent['slots']['VolumeQuantity']:
        half_decibels = intent['slots']['VolumeQuantity']['resolutions']['resolutionsPerAuthority'][0]['values'][0]['value']['name'].strip()
    elif 'value' in intent['slots']['VolumeQuantity']:
        half_decibels = intent['slots']['VolumeQuantity']['value'].strip()
Here is a partial sample of the json from alexa
{
"type": "IntentRequest",
"requestId": "amzn1.echo-api.request.9a...11",
"timestamp": "2018-03-28T20:37:21Z",
"locale": "en-US",
"intent": {
"name": "RelativeVolumeIntent",
"confirmationStatus": "NONE",
"slots": {
"VolumeQuantity": {
"name": "VolumeQuantity",
"confirmationStatus": "NONE"
},
"VolumeDirection": {
"name": "VolumeDirection",
"value": "softer",
"resolutions": {
"resolutionsPerAuthority": [
{
"authority": "amzn1.er-authority.echo-blah-blah-blah",
"status": {
"code": "ER_SUCCESS_MATCH"
},
"values": [
{
"value": {
"name": "down",
"id": "down"
}
}
]
}
]
},
"confirmationStatus": "NONE"
}
}
},
"dialogState": "STARTED"
}
You are probably referring to nested dictionaries; lists only accept integer indices.
Anyway, (ab?)using the implied line continuation inside parentheses, I think this is pretty readable:
>>> d = {'a':{'b':{'c':'value'}}}
>>> (d
... ['a']
... ['b']
... ['c']
... )
'value'
or alternatively
>>> (d['a']
... ['b']
... ['c'])
'value'
First, you can use some well-named intermediate variables to make the program more readable as well as simpler and faster:
volumes = intent['slots'] # Pick meaningful names. I'm just guessing.
if 'VolumeQuantity' in volumes:
    quantity = volumes['VolumeQuantity']
    if 'resolutions' in quantity:
        half_decibels = quantity['resolutions']['resolutionsPerAuthority'][0]['values'][0]['value']['name'].strip()
    elif 'value' in quantity:
        half_decibels = quantity['value'].strip()
Second, you can write a helper function nav(structure, path) for navigating through these structures, so that, e.g.
nav(quantity, 'resolutions.resolutionsPerAuthority.0.values.0.value.name')
splits up the given path and does the sequence of indexing/lookup operations. It could use dict.get(key, default) so you don't have to do so many if key in dict checks.
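A minimal sketch of such a helper follows. The answer only names nav(structure, path); the default parameter, the handling of list indices, and the sample slot data are assumptions made for illustration. It also assumes that no dictionary key contains a dot.

```python
def nav(structure, path, default=None):
    """Follow a dotted path through nested dicts and lists.

    Returns `default` as soon as a key or index is missing.
    """
    current = structure
    for part in path.split("."):
        if isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        elif isinstance(current, list):
            # list steps must be non-negative integers within range
            if not part.isdigit() or int(part) >= len(current):
                return default
            current = current[int(part)]
        else:
            return default
    return current


# trimmed-down slot, shaped like the Alexa sample above
slot = {
    "value": "softer",
    "resolutions": {
        "resolutionsPerAuthority": [
            {"values": [{"value": {"name": "down", "id": "down"}}]}
        ]
    },
}
name = nav(slot, "resolutions.resolutionsPerAuthority.0.values.0.value.name")
```

Because a missing step yields the default instead of raising, the two-branch check can shrink to something like nav(quantity, '...value.name') or nav(quantity, 'value'), with the path split across lines as needed to stay under 79 characters.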

Value in dictionary changes when variable changes

I am trying to grab certain values from a JSON file and then 're-create' a new JSON file (sort of like a conversion). In the code below, I do the following:
define a function that returns a dictionary
for each item in the JSON, if the function returns results, add the results to the list located inside the parentObj dictionary
oldFile.json:
{
"elements": [
{
"fieldValues": [
{
"id": "101",
"value": "John"
},
{
"id": "102",
"value": "Doe"
}
]
},
{
"fieldValues": [
{
"id": "101",
"value": "Jane"
},
{
"id": "102",
"value": "Doe"
}
]
}
]
}
file.py
import json
import os
output = {}
parentObj = {}
parentObj['entries'] = []
def grabVals(iCounter):
    # legend is a pre-determined dictionary matching ids with names (like first/last)
    for myKey in subResults['elements'][iCounter]['fieldValues']:
        if myKey['id'] in legend:
            if 'value' in myKey:
                newEntry = {legend[myKey['id']]: myKey['value']}
                output.update(newEntry) # first adds 'John', then 'Doe'
    # sample output below; next iteration would be 'Jane Doe'
    # {"First": "John", "Last": "Doe"}
    return output
with open("oldFile.json") as fh:
    subResults = json.load(fh)
formCount = len(subResults['elements']) # subResults is the JSON file above. Grab total number of entries
for counter in range(0, formCount):
    if convertTime(formEntryStamp, formEntryID) == 'keep': # self defined function (returns keep or None)
        parentObj['entries'].append(grabVals(counter))
    else:
        pass
export = json.dumps(parentObj, indent=4, sort_keys=False) # create new json based on dictionary
f = open("finished.json", "w")
f.write(export)
f.close()
Expected data in finished.json
{
"entries": [
{
"First": "John",
"Last": "Doe"
},
{
"First": "Jane",
"Last": "Doe"
}
]
}
Actual data in finished.json:
{
"entries": [
{
"First": "Jane",
"Last": "Doe"
},
{
"First": "Jane",
"Last": "Doe"
}
]
}
My question: How do I permanently write to parentObj? When output is changed in the function, the value inside parentObj is overwritten with the new value. Does this have something to do with mutable/immutable objects? Please let me know if any further clarification is required.
Related links are similar, but refer to lists, whereas mine is an object/dictionary:
Link 1
Link 2
After doing some reading on mutation of objects in python (link here), the code below solved my problem. Similar to what Juanpa said in the comments, I mutated the variable that was being re-used in my function (output). I assume with the code below, I am creating a copy thus leaving the original untouched.
def grabVals(iCounter, output=None):
    # start from a fresh dict on every call unless one is passed in
    if output is None:
        output = {}
    for myKey in subResults['elements'][iCounter]['fieldValues']:
        if myKey['id'] in legend:
            if 'value' in myKey:
                newEntry = {legend[myKey['id']]: myKey['value']}
                output.update(newEntry)
    return output
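Strictly speaking, no copy is made here: each call builds a brand-new dict, so the list no longer holds several references to one shared object. The standalone sketch below (with made-up sample names, independent of the JSON file) shows the difference between appending one shared dict repeatedly and appending a fresh dict each time.

```python
# One dict reused across iterations: the list ends up holding
# two references to the very same object.
shared = {}
entries = []
for first in ["John", "Jane"]:
    shared.update({"First": first, "Last": "Doe"})
    entries.append(shared)

# A new dict per iteration: each list element is independent.
fresh_entries = []
for first in ["John", "Jane"]:
    fresh = {"First": first, "Last": "Doe"}
    fresh_entries.append(fresh)

print(entries)
print(fresh_entries)
```

In the first list both elements show the last values written, which is the behaviour seen in finished.json; in the second list each entry keeps its own values.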
