Nested complex query to MongoDB with Python - python

I have the following document in my MongoDB:
{
"_id" : ObjectId("5a672fe5c9afd19e04d011ca"),
"data" : [
{
"name" : "Smith",
"age" : 10,
"spouse" : "Lopez"
},
{
"name" : "Davis",
"age" : 10,
"spouse" : "Peter"
},
{
"name" : "Clark",
"age" : 10
}
],
"header" : {
"sourece" : "http://www.some.com/api/json/data?department=security&gender=female",
"fetch_time" : "2018-01-23T09:35:51"
}
}
Now I want to:
1. Get all the data under the "data" node.
2. Get all the people who have a "spouse" node.
The following code doesn't work:
from pymongo import MongoClient
from pprint import pprint
client = MongoClient('mongodb://localhost:27017/')
db = client['test']
coll = db['test_2']
print('All content:')
for item in coll.find():
    pprint(item)
print('-'*20)
print("Content under 'data':")
for item in coll.find({"data": "$all"}):
    pprint(item)
for item in coll.find({"data": []}):
    pprint(item)
for item in coll.find({"data": ["$all"]}):
    pprint(item)
print('-'*20)
print("People who have 'spouse':")
for item in coll.find({"data": [{"spouse":"$all"}]}):
    pprint(item)
The above code outputs the following:
All content:
{u'_id': ObjectId('5a672fe5c9afd19e04d011ca'),
u'data': [{u'age': 10, u'name': u'Smith', u'spouse': u'Lopez'},
{u'age': 10, u'name': u'Davis', u'spouse': u'Peter'},
{u'age': 10, u'name': u'Clark'}],
u'header': {u'fetch_time': u'2018-01-23T09:35:51',
u'sourece': u'http://www.some.com/api/json/data?department=security&gender=female'}}
--------------------
Content under 'data':
--------------------
People who have 'spouse':
I can get all the content from my MongoDB, which means the data is there in the database. But the subsequent queries print nothing. I tried different approaches, but none of them work.
Moreover, is there a reference document, like the Oracle SQL reference PDF, that specifies the query grammar strictly, so I can build any query statement from it?

There is no need to fetch all the data.
First part (regular query): use a projection with an empty query filter to return only the "data" field.
Something like coll.find({}, {"data": 1}).
Second part (aggregation query): use $match to keep only the documents where "data" has at least one array element with a spouse field, then $project with $filter, using a $type expression to detect the missing field, so that only the matching array elements are returned.
Something like
coll.aggregate([
    {"$match": {"data.spouse": {"$exists": True}}},
    {"$project": {
        "data": {
            "$filter": {
                "input": "$data",
                "as": "result",
                "cond": {"$ne": [{"$type": "$$result.spouse"}, "missing"]}
            }
        }
    }}
])
Also note that query operators are different from aggregation comparison operators.
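To make the effect of the $filter stage concrete, here is a plain-Python sketch of the same selection applied to the question's document after it has been fetched. people_with_field is a hypothetical helper, not part of PyMongo, and the sample document omits _id and header; in practice the aggregation does this work on the server.

```python
def people_with_field(doc, field):
    """Keep only the elements of doc["data"] that contain `field`.

    This mirrors the $filter condition {"$ne": [{"$type": ...}, "missing"]}:
    an element is kept when the field is present, whatever its value.
    """
    return [person for person in doc.get("data", []) if field in person]


# The "data" array from the question (assumed already loaded into a dict).
sample_doc = {
    "data": [
        {"name": "Smith", "age": 10, "spouse": "Lopez"},
        {"name": "Davis", "age": 10, "spouse": "Peter"},
        {"name": "Clark", "age": 10},
    ]
}

names = [person["name"] for person in people_with_field(sample_doc, "spouse")]
print(names)  # ['Smith', 'Davis']
```

Clark is dropped because that element has no spouse key at all.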

Related

Extracting and updating a dictionary from array of dictionaries in MongoDB

I have a structure like this:
{
"id" : 1,
"user" : "somebody",
"players" : [
{
"name" : "lala",
"surname" : "baba",
"player_place" : "1",
"start_num" : "123",
"results" : {
"1" : { ... }
"2" : { ... },
...
}
},
...
]
}
I am pretty new to MongoDB and I just cannot figure out how to extract results for a specific user (in this case "somebody", but there are many other users and each has an array of players and each player has many results) for a specific player with start_num.
I am using pymongo and this is the code I came up with:
record = collection.find(
{'user' : name}, {'players' : {'$elemMatch' : {'start_num' : start_num}}, '_id' : False}
)
This extracts the matching player for a given user. That is good, but now I need to get a specific result from results, something like this:
{ 'results' : { '2' : { ... } } }.
I tried:
record = collection.find(
{'user' : name}, {'players' : {'$elemMatch' : {'start_num' : start_num}}, 'results' : result_num, '_id' : False}
)
but that, of course, doesn't work. I could just turn that to list in Python and extract what I need, but I would like to do that with query in Mongo.
Also, what would I need to do to replace specific result in results for specific player for specific user? Let's say I have a new result with key 2 and I want to replace existing result that has key 2. Can I do it with same query as for find() (just replacing method find with method replace or find_and_replace)?
You can replace a specific result and the syntax for that should be something like this,
assuming you want to replace the result with key 1,
collection.updateOne({
"user": name,
"players.start_num": start_num
},
{ $set: { "players.$.results.1" : new_result }})
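Since the question uses pymongo, the same update would go through update_one with the operator names quoted as strings. A minimal sketch: build_result_update is a hypothetical helper that only builds the filter and update documents, and the {"score": 10} payload is a made-up placeholder for the new result.

```python
def build_result_update(name, start_num, result_key, new_result):
    """Return (filter, update) documents for replacing one result of one player."""
    query = {"user": name, "players.start_num": start_num}
    # "$" is the positional operator: it stands for the first element of
    # "players" matched by the "players.start_num" condition in the filter.
    update = {"$set": {"players.$.results.{}".format(result_key): new_result}}
    return query, update


query, update = build_result_update("somebody", "123", "2", {"score": 10})
print(query)
print(update)
# With PyMongo this would be passed as: collection.update_one(query, update)
```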

update nth document in a nested array document in mongo

I need to update a document in an array inside another document in Mongo DB.
{
"_id" : ObjectId("51cff693d342704b5047e6d8"),
"author" : "test",
"body" : "sdfkj dsfhk asdfjad ",
"comments" : [
{
"author" : "test",
"body" : "sdfkjdj\r\nasdjgkfdfj",
"email" : "test#tes.com"
},
{
"author" : "hola",
"body" : "sdfl\r\nhola \r\nwork here"
}
],
"date" : ISODate("2013-06-30T09:12:51.629Z"),
"permalink" : "mxwnnnqafl",
"tags" : [
"ab"
],
"title" : "cd"
}
If I update the first document in the comments array with the command below, it works.
db.posts.update({'permalink':"cxzdzjkztkqraoqlgcru"},{'$inc': {"comments.0.num_likes": 1}})
But if I put the same in Python code like below, I get a WriteError saying that it can't traverse the element. I don't understand what is missing.
Can anyone help me out, please?
post = self.posts.find_one({'permalink': permalink})
response = self.posts.update({'permalink': permalink},
                             {'$inc': {"comments.comment_ordinal.num_likes": 1}})
WriteError: cannot use the part (comments of comments.comment_ordinal.num_likes) to traverse the element
comment_ordinal sits inside the string literal, so MongoDB sees a field literally named "comment_ordinal" rather than the value of your variable. You need to substitute the value into the path, something like:
updated_field = "comments." + str(comment_ordinal) + ".num_likes"
response = self.posts.update({'permalink': permalink}, {'$inc': {updated_field: 1}})
Hope this helps.
You need to build the field path dynamically, and a clean way to do that is the str.format method.
response = self.posts.update_one(
{'permalink': permalink},
{'$inc': {"comments.{}.num_likes".format(comment_ordinal): 1}}
)
You should also use the update_one method for a single update, and update_many if you need to update multiple documents, because update is deprecated.
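A small sketch of the same idea with the index checked before it is put into the path. likes_update is a hypothetical helper name; it only builds the update document that would be passed to update_one, and converting with int() is an assumption about where comment_ordinal comes from (for example a form field delivered as a string).

```python
def likes_update(comment_ordinal):
    """Build the $inc document for the comment at position `comment_ordinal`."""
    index = int(comment_ordinal)  # accepts 2 as well as "2"
    if index < 0:
        raise ValueError("comment_ordinal must not be negative")
    return {"$inc": {"comments.{}.num_likes".format(index): 1}}


print(likes_update(0))
# With PyMongo: self.posts.update_one({'permalink': permalink}, likes_update(comment_ordinal))
```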

MongoDB generating same ID between inserts

I am using pymongo and I am trying to insert dicts into mongodb database. My dictionaries look like this
{
"name" : "abc",
"Jobs" : [
{
"position" : "Systems Engineer (Data Analyst)",
"time" : [
"October 2014",
"May 2015"
],
"is_current" : 1,
"location" : "xyz",
"organization" : "xyz"
},
{
"position" : "Systems Engineer (MDM Support Lead)",
"time" : [
"January 2014",
"October 2014"
],
"is_current" : 1,
"location" : "xxx",
"organization" : "xxx"
},
{
"position" : "Asst. Systems Engineer (ETL Support Executive)",
"time" : [
"May 2012",
"December 2013"
],
"is_current" : 1,
"location" : "zzz",
"organization" : "xzx"
},
],
"location" : "Buffalo, New York",
"education" : [
{
"school" : "State University of New York at Buffalo - School of Management",
"major" : "Management Information Systems, General",
"degree" : "Master of Science (MS), "
},
{
"school" : "Rajiv Gandhi Prodyogiki Vishwavidyalaya",
"major" : "Electrical and Electronics Engineering",
"degree" : "Bachelor of Engineering (B.E.), "
}
],
"id" : "abc123",
"profile_link" : "example.com",
"html_source" : "<html> some_source_code </html>"
}
I am getting this error:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
Linkedin_DB.employee_info.$id dup key: { :
ObjectId('56b64f6071c54604f02510a8') }
When I run my program, the first document gets inserted properly, but when I insert the second document I get this error. When I start my script again, the document that was not inserted because of this error gets inserted properly, the error comes for the next document, and this continues.
Clearly MongoDB is using the same ObjectId during two inserts. I don't understand why MongoDB is failing to generate a unique ID for new documents.
My code to save passed data:
import pymongo

class Mongosave:
    """
    Pass collection_name and dict data
    This module stores the passed dict in collection
    """
    def __init__(self):
        self.connection = pymongo.MongoClient()
        self.db = self.connection['Linkedin_DB']
    def _exists(self, id):
        # To check if user already exists
        return True if list(self.collection.find({'id': id})) else False
    def save(self, collection_name, data):
        self.collection = self.db[collection_name]
        if not self._exists(data['id']):
            print (data['id'])
            self.collection.insert(data)
        else:
            self.collection.update({'id':data['id']}, {"$set": data})
I can't figure out why this is happening. Any help is appreciated.
The problem is that your save method is using a field called "id" to decide if it should do an insert or an upsert. You want to use "_id" instead. The _id field and its index are covered in the MongoDB documentation. PyMongo automatically adds an _id to your document if one is not already present.
You might have inserted two copies of the same document into your collection in one run.
I don't quite understand what you mean by:
When I start my script again the document which was not inserted because of this error get inserted properly and error comes for next document and this continues.
What I do know is if you do:
from pymongo import MongoClient
client = MongoClient()
db = client['someDB']
collection = db['someCollection']
someDocument = {'x': 1}
for i in range(10):
    collection.insert_one(someDocument)
You'll get a:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
This happens because insert_one adds the generated _id to the dict you pass in. The first call stores that _id in someDocument, so every later call sends the same _id again and hits the unique index on _id. The _id that pymongo generates is unique; it is the reuse of the same dict that produces the duplicate.
Try passing a new dict for every insert, or generate your own unique _id each time, and see if it still happens.
Edit:
I just tried this and it works:
for i in range(10):
    collection.insert_one({'x':1})
This makes me think the _id that pymongo generates is stored on the object you feed into it. This time I'm not passing the same object anymore, and the problem disappeared.
Are you giving your database two references to the same object?
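Both experiments are consistent with insert_one adding the generated _id to the dict it is given. The sketch below imitates that behaviour with FakeCollection, a hypothetical in-memory stand-in (it is not PyMongo, and it uses a counter instead of a real ObjectId), just to show why reusing the same dict fails while a copy without _id does not.

```python
import itertools


class DuplicateKey(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError."""


class FakeCollection:
    """In-memory imitation of a collection with a unique index on _id."""

    def __init__(self):
        self._ids = set()
        self._counter = itertools.count(1)

    def insert_one(self, document):
        # Like PyMongo, add an _id to the caller's dict if it has none.
        if "_id" not in document:
            document["_id"] = next(self._counter)
        if document["_id"] in self._ids:
            raise DuplicateKey(document["_id"])
        self._ids.add(document["_id"])


collection = FakeCollection()
some_document = {"x": 1}
collection.insert_one(some_document)  # some_document now carries an _id

try:
    collection.insert_one(some_document)  # same dict, so the same _id is sent
    second_insert_failed = False
except DuplicateKey:
    second_insert_failed = True

# A copy without the _id gets a fresh one and inserts cleanly.
fresh = {k: v for k, v in some_document.items() if k != "_id"}
collection.insert_one(fresh)
print(second_insert_failed, some_document["_id"], fresh["_id"])
```

In the question's save method the same check applies: if data is reused or copied from an earlier document after an insert, it already contains that document's _id.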

Pymongo find by _id in subdocuments

Assume that this is one item in my database:
{"_id" : ObjectID("526fdde0ef501a7b0a51270e"),
"info": "foo",
"status": true,
"subitems" : [ {"subitem_id" : ObjectID("65sfdde0ef501a7b0a51e270"),
//more},
{....}
],
//more
}
I want to find (or find_one, doesn't matter) the document(s) with "subitems.subitem_id" : xxx.
I have tried the following. All of them return an empty list.
from pymongo import MongoClient,errors
from bson.objectid import ObjectId
id = '65sfdde0ef501a7b0a51e270'
db.col.find({"subitems.subitem_id" : id } ) #obviously wrong
db.col.find({"subitems.subitem_id" : Objectid(id) })
db.col.find({"subitems.subitem_id" : {"$oid":id} })
db.col.find({"subitems.subitem_id.$oid" : id })
db.col.find({"subitems.$.subitem_id" : Objectid(id) })
In mongoshell this one works however:
find({"subitems.subitem_id" : { "$oid" : "65sfdde0ef501a7b0a51e270" } })
The literal 65sfdde0ef501a7b0a51e270 is not hexadecimal, hence, not a valid ObjectId.
Also, id is a Python built-in function. Avoid shadowing it.
Finally, you execute a find but do not evaluate it, so you do not see any results. Remember that pymongo cursors are lazy.
Try this.
from pymongo import MongoClient
from bson.objectid import ObjectId
db = MongoClient().database
oid = '65cfdde0ef501a7b0a51e270'
x = db.col.find({"subitems.subitem_id" : ObjectId(oid)})
print(list(x))
Notice I adjusted oid to a valid hexadecimal string.
Same query in the Mongo JavaScript shell.
db.col.find({"subitems.subitem_id" : new ObjectId("65cfdde0ef501a7b0a51e270")})
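An ObjectId string has to be exactly 24 hexadecimal characters, which is why the original literal is rejected. A minimal sketch of that check in plain Python; looks_like_object_id is a hypothetical helper written here for illustration, not a bson function.

```python
import re

# 24 characters, each one a hexadecimal digit.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """True if `value` is a 24-character hexadecimal string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id("65sfdde0ef501a7b0a51e270"))  # False: 's' is not a hex digit
print(looks_like_object_id("65cfdde0ef501a7b0a51e270"))  # True
```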
Double checked. The right answer is db.col.find({"subitems.subitem_id" : ObjectId(id)})
Be aware that this query returns the full record, not just the matching part of the sub-array.
Mongo shell:
a = ObjectId("5273e7d989800e7f4959526a")
db.m.insert({"subitems": [{"subitem_id":a},
{"subitem_id":ObjectId()}]})
db.m.insert({"subitems": [{"subitem_id":ObjectId()},
{"subitem_id":ObjectId()}]})
db.m.find({"subitems.subitem_id" : a })
>>> { "_id" : ObjectId("5273e8e189800e7f4959526d"),
"subitems" :
[
{"subitem_id" : ObjectId("5273e7d989800e7f4959526a") },
{"subitem_id" : ObjectId("5273e8e189800e7f4959526c")}
]}

$addToSet nested nested object using pymongo

As part of a for-in loop in Python using pymongo, I want to add some nested documents/objects inside a linktype field, which is to be within a links field.
Neither the links field nor the linktype field exists before the first such entries are added.
What are the commands to do this?
Here is an item before adding links:
item = {
"_id" : ObjectId("5067c26b9d595266e25e825a"),
"name": "a Name"
}
And after adding one link of type typeA:
toType = "typeA"
to_link = {"_id" : ObjectId("5067c26b9d595266e25e825b"), "property":"value"}
{
"_id" : ObjectId("5067c26b9d595266e25e825a"),
"name": "a Name",
"links" : {
"typeA":{
{"_id" : ObjectId("5067c26b9d595266e25e825b"), "property":"value"}
}
}
}
I have tried:
db.collection.update({"name":"a Name"},{"links":{"$addToSet":{toType:to_link}}})
which doesn't work.
If I just use:
db.collection.update({"name":"a Name"},{"$addToSet":{toType:to_link}})
that works, but that is not what I want.
$addToSet is for adding to an array. To add a new property to an embedded object you need to use the $set operator and dot notation, as in:
db.collection.update({"name": "a Name"}, {"$set": {"links." + toType: to_link}})
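A sketch of how the dotted key behaves, written in plain Python so it runs without a server. link_update builds the update document, and apply_set is a hypothetical imitation of what $set does with a dotted path (creating the missing links object on the way); neither is a PyMongo function, and the _id is kept as a plain string here instead of an ObjectId.

```python
def link_update(link_type, link):
    """Build the $set document that stores `link` under links.<link_type>."""
    return {"$set": {"links.{}".format(link_type): link}}


def apply_set(document, update):
    """Plain-Python imitation of $set with dot notation (creates missing parents)."""
    for path, value in update["$set"].items():
        *parents, leaf = path.split(".")
        target = document
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return document


item = {"name": "a Name"}
to_link = {"_id": "5067c26b9d595266e25e825b", "property": "value"}
update = link_update("typeA", to_link)
print(update)
print(apply_set(item, update))
```

Running the update once per link type in the loop adds each type as its own key under links.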
