pymongo embedded document update - python

I have the following document in MongoDB 3.6:
forum_collection = {
    '_id': id,
    'tags': tags,
    'sub_topic_name': sub_topic_name,
    'topic_creator': creator_name,
    'main_forum_name': forum_name,
    'threads': [
        {
            'thread_id': uuid4,
            'thread_title': title,
            'thread_author': author,
            'thread_comment_cout': 0,
            'thread_body': content,
            'thread_comments': [
                {
                    'thread_comment_id': uuid4,
                    'thread_comment_body': content,
                    'thread_commenter': author,
                    'thread_comment_time': time,
                },
            ],
            'thread_time': time,
        },
    ],
}
So I want forum_collection to hold multiple sub-topics; each sub-topic has threads, and each thread has comments.
How can I add a new comment to a thread and increment its comment count by 1? My attempt below fails miserably. My function receives three variables: sub_topic (to match sub_topic_name), thread (to match thread_id) and thread_vals (the new comment to append to that thread).
forum_collection.find_one_and_update({'sub_topic_name': sub_topic, 'threads.thread_id' : thread},
{'$inc': {'threads.'+thread+'.thread_comments_count': 1},
'$addToSet': {thread + '.threads_comment': thread_vals}},
return_document=pymongo.ReturnDocument.AFTER
)

How does it fail?
I don't know if it was a copy/paste error, but "thread_comment_cout" is misspelled in your document (your update uses "thread_comments_count").
Also, the $addToSet path should refer to "threads", not "thread".

I needed to add the positional operator $ in the right places: not 'threads.'+thread+'.thread_comments_count' but 'threads.$.thread_comments_count', and likewise for the comments array:
forum_collection.find_one_and_update({'sub_topic_name': sub_topic, 'threads.thread_id' : thread},
{'$inc': {'threads.$.thread_comments_count': 1},
'$addToSet': {'threads.$.thread_comments': thread_vals}},
return_document=pymongo.ReturnDocument.AFTER
)
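The positional form can be wrapped in a small helper that only builds the filter and the update document, which makes it easy to check without a running server. This is a sketch: the function name build_add_comment_update and the sample values are hypothetical, and it assumes the stored counter key is thread_comments_count (the name used in the working update, not the misspelled key in the question's document).

```python
def build_add_comment_update(sub_topic, thread_id, comment):
    """Return (filter, update) for adding one comment to one thread.

    Hypothetical helper; pass the result to
    forum_collection.find_one_and_update(filter, update, ...).
    """
    # The filter must mention the array field ('threads.thread_id') so that the
    # positional operator '$' knows which element of 'threads' matched.
    query = {'sub_topic_name': sub_topic, 'threads.thread_id': thread_id}
    update = {
        # '$' stands for the first element of 'threads' matched by the filter.
        '$inc': {'threads.$.thread_comments_count': 1},
        # $addToSet skips a comment that is an exact duplicate; use '$push'
        # if identical comments must be stored twice.
        '$addToSet': {'threads.$.thread_comments': comment},
    }
    return query, update


# Sample values below are made up for illustration.
query, update = build_add_comment_update(
    'python', 'thread-123', {'thread_comment_body': 'Nice post'}
)
print(query)
print(update)
```

With pymongo, the call would then be forum_collection.find_one_and_update(query, update, return_document=pymongo.ReturnDocument.AFTER).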

Related

Add Values into Nested Objects in PowerShell like you can in Python

I am looking to use PowerShell to output some JSON that looks like this, for use with a Python script:
{
  "run_date": "2020-08-27",
  "total_queries": 4,
  "number_results": 3,
  "number_warnings": 1,
  "number_errors": 5,
  "build_url": "https://some-url.com",
  "queries": {
    "query_a": {
      "database_a": "102 rows",
      "database_b": "Error: See pipeline logs for details"
    },
    "query_b": "No results",
    "query_c": {
      "database_a": "Warning: Number of results exceeded maximum threshold - 6509 rows",
      "database_c": "Error: See pipeline logs for details",
      "database_d": "Error: See pipeline logs for details"
    }
  }
}
I am using a foreach loop within PowerShell to run each of these queries sequentially, depending on which databases they need to be run on.
I know in Python I can create a template of the JSON like so:
import os

options = {
    'run_date': os.environ['SYSTEM_PIPELINESTARTTIME'].split()[0],
    'total_queries': 0,
    'number_results': 0,
    'number_warnings': 0,
    'number_errors': 0,
    'build_url': 'https://some-url.com',
    'queries': {},
}
and then use something like this to add data into the Python dictionaries:
options['queries'].setdefault(filename, {})[database] = '{} rows'.format(len(data))
I've tried using nested PSCustomObjects, but I end up with a conflict when different queries are run on the same database, because it tries to add a value to the PSCustomObject with a key that already exists. Is there a nice 'native' way to do this in PowerShell, like there is in Python?
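On the Python side, one way to make a plain two-level assignment such as options['queries'][filename][database] = ... work without creating each query's dictionary first is to store queries in a collections.defaultdict(dict). A minimal sketch; the run date and the result tuples are made-up sample values, not read from the pipeline.

```python
import json
from collections import defaultdict

options = {
    'run_date': '2020-08-27',  # sample value instead of SYSTEM_PIPELINESTARTTIME
    'total_queries': 0,
    'number_results': 0,
    'number_warnings': 0,
    'number_errors': 0,
    'build_url': 'https://some-url.com',
    # defaultdict(dict) creates an empty inner dict the first time a query
    # name is used, so several databases can be added per query.
    'queries': defaultdict(dict),
}

# Made-up (query, database, message) results for illustration.
results = [
    ('query_a', 'database_a', '102 rows'),
    ('query_a', 'database_b', 'Error: See pipeline logs for details'),
    ('query_c', 'database_a', '6509 rows'),
]
for filename, database, message in results:
    options['queries'][filename][database] = message
options['total_queries'] = len(options['queries'])

# json.dumps accepts a defaultdict because it is a dict subclass.
report = json.dumps(options, indent=2)
print(report)
```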
It turns out I was just being a bit of an idiot and not remembering how to work with PowerShell objects.
I ended up first adding all the query names to the parent object like so:
foreach ($name in $getqueries) {
    $notificationObj.queries | Add-Member -NotePropertyName $name.BaseName -NotePropertyValue ([PSCustomObject]@{})
}
Then I added information about each query within the loop:
$notificationObj.queries.$queryName | Add-Member -NotePropertyName $database -NotePropertyValue "$($dataTable.Rows.Count) Rows"
If the required end result is a JSON file, there is actually no need to work with complex (and rather fat) [PSCustomObject] types. Instead you can simply use a [HashTable] (or an ordered dictionary, by prefixing the hash table literal: [Ordered]@{...}).
To convert your JSON file into hash tables, use the -AsHashTable parameter of ConvertFrom-Json (introduced in PowerShell 6.0).
To build a template (or just to understand the PowerShell format), you might want to use the ConvertTo-Expression cmdlet (a community script, not built into PowerShell):
$Json | ConvertFrom-Json -AsHashTable | ConvertTo-Expression
@{
    'number_errors' = 5
    'number_warnings' = 1
    'queries' = @{
        'query_b' = 'No results'
        'query_a' = @{
            'database_a' = '102 rows'
            'database_b' = 'Error: See pipeline logs for details'
        }
        'query_c' = @{
            'database_a' = 'Warning: Number of results exceeded maximum threshold - 6509 rows'
            'database_d' = 'Error: See pipeline logs for details'
            'database_c' = 'Error: See pipeline logs for details'
        }
    }
    'build_url' = 'https://some-url.com'
    'run_date' = '2020-08-27'
    'number_results' = 3
    'total_queries' = 4
}
Meaning you can assign this template to $Options as follows:
$Options = @{
    'number_errors' = 5
    'number_warnings' = 1
    'queries' = @{ ...
You can then easily change properties in the nested objects, like:
$Options.Queries.query_c.database_d = 'Changed'
Or add a new property to a nested object:
$Options.Queries.query_a.database_c = 'Added'
Which results in:
$Options | ConvertTo-Json
{
"run_date": "2020-08-27",
"queries": {
"query_a": {
"database_c": "Added",
"database_b": "Error: See pipeline logs for details",
"database_a": "102 rows"
},
"query_b": "No results",
"query_c": {
"database_c": "Error: See pipeline logs for details",
"database_d": "Changed",
"database_a": "Warning: Number of results exceeded maximum threshold - 6509 rows"
}
},
"number_results": 3,
"build_url": "https://some-url.com",
"total_queries": 4,
"number_errors": 5,
"number_warnings": 1
}

Python - How to find duplicated names/documents in MongoDB?

I want to find the duplicated documents in my MongoDB collection based on name. I have the following code:
from pymongo import MongoClient

def Check_BFA_DB(options):
    issue_list = []
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    collection = db[options.collection]
    names = [{'$project': {'name': '$name'}}]
    name_cursor = collection.aggregate(names, cursor={})
    for name in name_cursor:
        issue_list.append(name)
        print(name)
It prints all names; how can I print only the duplicated ones?
Any help is appreciated!
The following query will show only duplicates:
db['collection_name'].aggregate([{'$group': {'_id':'$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])
How it works:
Step 1:
Go over the whole collection, group the documents by the name field, and for each name count how many documents use it.
Step 2:
Filter (using the $match stage) only the groups whose count is greater than 1 (the $gt operator).
An example (written for the mongo shell, but easily adapted to Python):
db.a.insert({name: "name1"})
db.a.insert({name: "name1"})
db.a.insert({name: "name2"})
db.a.aggregate([{"$group": {_id:"$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])
The result is { "_id" : "name1", "count" : 2 }
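To make the two stages concrete, the same computation can be mimicked in plain Python on the three example documents. This is only an illustration of what the pipeline returns; the function name find_duplicate_names is hypothetical, and against a real collection you would let the server run the aggregation rather than load every document.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Mimic [{'$group': ...}, {'$match': {'count': {'$gt': 1}}}] in memory."""
    # $group stage: count how many documents share each 'name' value
    # (a missing name is grouped under None, like null in MongoDB).
    counts = Counter(doc.get('name') for doc in documents)
    # $match stage: keep only the groups whose count is greater than 1.
    return [{'_id': name, 'count': n} for name, n in counts.items() if n > 1]


docs = [{'name': 'name1'}, {'name': 'name1'}, {'name': 'name2'}]
print(find_duplicate_names(docs))  # [{'_id': 'name1', 'count': 2}]
```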
So your code should look something like this:
from pymongo import MongoClient

def Check_BFA_DB(options):
    issue_list = []
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    # $group counts documents per name; $match keeps names used more than once.
    name_cursor = db[options.collection].aggregate([
        {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
        {'$match': {'count': {'$gt': 1}}}
    ])
    for document in name_cursor:
        name = document['_id']
        issue_list.append(name)
        print(name)
    return issue_list
BTW (not related to the question), the Python naming convention (PEP 8) for function names is lowercase words separated by underscores, so you might want to call it check_bfa_db().

pymongo $set on array of subdocuments

I have a pymongo collection in the form of:
{
"_id" : "R_123456789",
"supplier_ids" : [
{
"id" : "S_987654321",
"file_version" : ISODate("2016-03-15T00:00:00Z"),
"latest" : false
},
{
"id" : "S_101010101",
"file_version" : ISODate("2016-03-29T00:00:00Z"),
"latest" : true
}
]
}
When I get new supplier data and the supplier ID has changed, I want to record that by setting latest to False on the previous latest entry and then $push the new record.
$set is not working the way I am trying to use it (see the commented code after 'else'):
import pymongo
from dateutil.parser import parse

new_id = 'S_323232323'
new_date = parse('20160331')

with pymongo.MongoClient() as client:
    db = client.transactions
    collection_ids = db.ids
    try:
        collection_ids.insert_one({"_id": "R_123456789",
                                   "supplier_ids": ({"id": "S_987654321",
                                                     "file_version": parse('20160315'),
                                                     "latest": False},
                                                    {"id": "S_101010101",
                                                     "file_version": parse('20160329'),
                                                     "latest": True})})
    except pymongo.errors.DuplicateKeyError:
        print('record already exists')
    record = collection_ids.find_one({'_id': 'R_123456789'})
    for supplier_id in record['supplier_ids']:
        print(supplier_id)
        if supplier_id['latest']:
            print(supplier_id['id'], 'is the latest')
            if supplier_id['id'] == new_id:
                print(new_id, ' is already the latest version')
            else:
                # print('setting', supplier_id['id'], 'latest flag to False')
                # <<< THIS FAILS >>>
                # collection_ids.update_one({'_id': record['_id']},
                #                           {'$set': {'supplier_ids.latest': False}})
                print('appending', new_id)
                data_to_append = {"id": new_id,
                                  "file_version": new_date,
                                  "latest": True}
                collection_ids.update_one({'_id': record['_id']},
                                          {'$push': {'supplier_ids': data_to_append}})
Any and all help is much appreciated.
This whole process seems unnaturally verbose - should I be using a more streamlined approach?
Thanks!
You can try the positional operator:
collection_ids.update_one(
    {'_id': record['_id'], 'supplier_ids.latest': True},
    {'$set': {'supplier_ids.$.latest': False}}
)
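This sets latest to False on the first element of supplier_ids whose latest is True, provided the document also matches the rest of the filter.
The catch is that you have to include the array field in the query condition too: the positional $ refers to the array element matched by the query.
For more information, see the Update documentation.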
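Because the positional $ changes only the first array element that matched the query, the whole "demote the old latest, then append the new one" step can be pictured with plain dicts. This is an in-memory sketch of the semantics, not pymongo code; positional_set is a hypothetical helper, and the file_version dates are omitted for brevity.

```python
import copy


def positional_set(document, array_field, match, updates):
    """Emulate {'$set': {'<array_field>.$.<key>': value}} on a plain dict.

    Only the FIRST element of document[array_field] whose fields equal every
    pair in `match` is changed, mirroring the positional '$' operator.
    A modified deep copy is returned; the input document is not touched.
    """
    result = copy.deepcopy(document)
    for element in result.get(array_field, []):
        if all(element.get(k) == v for k, v in match.items()):
            element.update(updates)
            break  # '$' identifies a single element, so stop at the first match
    return result


doc = {'_id': 'R_123456789', 'supplier_ids': [
    {'id': 'S_987654321', 'latest': False},
    {'id': 'S_101010101', 'latest': True},
]}
# Step 1, like {'$set': {'supplier_ids.$.latest': False}} with
# 'supplier_ids.latest': True in the filter.
updated = positional_set(doc, 'supplier_ids', {'latest': True}, {'latest': False})
# Step 2, like {'$push': {'supplier_ids': {...}}}.
updated['supplier_ids'].append({'id': 'S_323232323', 'latest': True})
print(updated['supplier_ids'])
```

Against the real collection, step 1 is the update_one call from the answer and step 2 is the existing $push from the question.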

MongoDB - Upsert with increment

I am trying to run the following query:
data = {
'user_id':1,
'text':'Lorem ipsum',
'$inc':{'count':1},
'$set':{'updated':datetime.now()},
}
self.db.collection('collection').update({'user_id':1}, data, upsert=True)
but the two '$' operators cause it to fail. Is it possible to do this in one statement?
First of all, when you ask a question like this, it's very helpful to say why it's failing (e.g. include the error message).
Your query fails because you're mixing $ operators with document overrides. You should use the $set operator for the user_id and text fields as well (although the user_id part in your update is irrelevant in this example).
So convert this mongo shell query to pymongo:
db.test.update({user_id:1},
{$set:{text:"Lorem ipsum", updated:new Date()}, $inc:{count:1}},
true,
false)
I've removed user_id from the update because it isn't necessary: if the document exists, this value is already 1, and if it doesn't, the upsert copies the equality conditions from the query into the new document.
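In pymongo, the shell call above (upsert true, multi false) corresponds to update_one with upsert=True. A minimal sketch, assuming collection is a pymongo Collection; the function name upsert_user_text is hypothetical. It never imports pymongo itself, so it can be exercised with any object that has an update_one method.

```python
from datetime import datetime, timezone


def upsert_user_text(collection, user_id, text, now=None):
    """pymongo equivalent of the shell update above (upsert, single document)."""
    # Store an aware UTC timestamp unless the caller supplies one.
    now = now or datetime.now(timezone.utc)
    return collection.update_one(
        # On insert, the equality field user_id is copied into the new document.
        {'user_id': user_id},
        {
            # Every plain field goes through $set, because bare fields cannot
            # be mixed with operators in one update document.
            '$set': {'text': text, 'updated': now},
            # $inc creates count with the value 1 when the document is new.
            '$inc': {'count': 1},
        },
        upsert=True,
    )
```

For example: upsert_user_text(db['collection'], 1, 'Lorem ipsum').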
If you're trying to do the following:
If the doc doesn't exist, insert a new doc.
If it exists, then only increment one field.
Then you can use a combination of $setOnInsert and $inc. If the song exists, $setOnInsert does nothing and $inc increases the value of "listened". If the song doesn't exist, a new document is created with the fields "songId" and "songName", and $inc creates the "listened" field with the value 1.
const mongoose = require('mongoose');

let songsSchema = new mongoose.Schema({
    songId: String,
    songName: String,
    listened: Number
});
let Song = mongoose.model('Song', songsSchema);

let saveSong = (song) => {
    return Song.updateOne(
        {songId: song.songId},
        {
            $inc: {listened: 1},
            $setOnInsert: {
                songId: song.songId,
                songName: song.songName,
            }
        },
        {upsert: true}
    )
    .then((savedSong) => {
        return savedSong;
    })
    .catch((err) => {
        console.log('ERROR SAVING SONG IN DB', err);
    });
};
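For the pymongo side of the question, the same behaviour can be pictured with an in-memory stand-in for a collection. This is a simplified model, not how MongoDB is implemented: simulate_upsert and the dict-based store are hypothetical, and only $inc and $setOnInsert are handled. With a real collection the call would be collection.update_one({'songId': ...}, {'$inc': {...}, '$setOnInsert': {...}}, upsert=True).

```python
def simulate_upsert(store, key, inc, set_on_insert):
    """Model update_one(filter, {'$inc': inc, '$setOnInsert': set_on_insert}, upsert=True).

    `store` is a plain dict mapping a lookup key (here the songId) to a document.
    """
    doc = store.get(key)
    if doc is None:
        # Insert path: only now are the $setOnInsert fields written.
        doc = dict(set_on_insert)
        store[key] = doc
    # $inc applies on both paths; a missing field is treated as 0, so on insert
    # the counter ends up equal to the increment.
    for field, amount in inc.items():
        doc[field] = doc.get(field, 0) + amount
    return doc


songs = {}
simulate_upsert(songs, 's1', {'listened': 1}, {'songId': 's1', 'songName': 'First'})
simulate_upsert(songs, 's1', {'listened': 1}, {'songId': 's1', 'songName': 'Renamed'})
print(songs['s1'])  # {'songId': 's1', 'songName': 'First', 'listened': 2}
```

The second call leaves songName as 'First', which is the point of $setOnInsert: it only takes effect when the upsert actually inserts.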

Searching ID or property for match in Mongo

Goal:
I want to allow the user to search for a document by ID, or allow other text-based queries.
Code:
l_search_results = list(
cll_sips.find(
{
'$or': [
{'_id': ObjectId(s_term)},
{'s_text': re.compile(s_term, re.IGNORECASE)},
{'choices': re.compile(s_term, re.IGNORECASE)}
]
}
).limit(20)
)
Error:
<Whatever you searched for> is not a valid ObjectId
s_term needs to be a valid object ID (or at least in the right format) when you pass it to the ObjectId constructor. Since it's sometimes not an ID, that explains why you get the exception.
Try something like this instead:
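import re
from bson.errors import InvalidId
from bson.objectid import ObjectId

or_filter = [
    {'s_text': re.compile(s_term, re.IGNORECASE)},
    {'choices': re.compile(s_term, re.IGNORECASE)}
]
try:
    # ObjectId() raises InvalidId when s_term is not a valid ObjectId string.
    object_id = ObjectId(s_term)
    or_filter.append({'_id': object_id})
except InvalidId:
    pass  # not an ID: search the text fields only
l_search_results = list(
    cll_sips.find({'$or': or_filter}).limit(20)
)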
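A variant avoids the exception by checking the format up front: the usual textual form of an ObjectId is 24 hexadecimal characters (pymongo's bson package also offers ObjectId.is_valid(s_term) for this check). This sketch also passes the term through re.escape, so input such as "(" cannot raise re.error. The function name build_search_filter is hypothetical, and the ObjectId constructor is passed in as make_object_id only so the sketch runs without pymongo installed; in real code you would pass bson.objectid.ObjectId.

```python
import re

# 24 hexadecimal characters: the usual string form of an ObjectId.
_OBJECT_ID_RE = re.compile(r'[0-9a-fA-F]{24}')


def build_search_filter(s_term, make_object_id):
    """Build an $or filter for a free-text search term (hypothetical helper)."""
    # re.escape makes the term match literally; drop it if users are meant
    # to type regular expressions.
    pattern = re.compile(re.escape(s_term), re.IGNORECASE)
    or_filter = [{'s_text': pattern}, {'choices': pattern}]
    # Only add the _id condition when the term looks like an ObjectId,
    # so the constructor is never called with an invalid value.
    if _OBJECT_ID_RE.fullmatch(s_term):
        or_filter.append({'_id': make_object_id(s_term)})
    return {'$or': or_filter}
```

Usage would then be cll_sips.find(build_search_filter(s_term, ObjectId)).limit(20).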
