I have a Team class with __find method to query MongoDB and get a single record:
class Team:
def __find(self, key):
team_document = self._db.get_single_data(TeamModel.TEAM_COLLECTION, key)
return team_document
A team document will look like this:
{
"_id": {
"$oid": "62291a3deb9a30c9e3cf5d28"
},
"name": "Warriors of Hell",
"race": "wizards",
"matches": [
{
"MTT001": "won"
},
{
"MCH005": "lost"
}]
My __find method gives me a full document if I pass a query parameter like
{'name':'Warriors of Hell'}
Is there a way I can create a query into which I can pass name and match ID and I get ONLY the result back? Something like:
{'name':'Warriors of Hell', ,'match_id':'MTT001'}
and get back
won
I do not know how to implement this query to look inside the team matches. Can any MongoDB/Python guru help me please?
Related
I a lot of documents which I know will rarely change and are very similar to each other, specifically I know they have a nested field in the document that is always the same (for some of them)
{
"docid": 1
"nested_field_that_will_always_be_the_same": {
"title": "this will always be the same"
"desc": "this will always be the same, too"
}
}
{
"docid": 2
"nested_field_that_will_always_be_the_same": {
"title": "this will always be the same"
"desc": "this will always be the same, too"
}
}
I don't want to store the same document over and over again, instead I want Mongo to "intern" this field, i.e only store it once and the rest will only store pointers to it.
Something like:
{
"docid": 1
"nested_field_that_will_always_be_the_same": {
"title": "this will always be the same"
"desc": "this will always be the same, too"
}
}
{
"docid": 2
"nested_field_that_will_always_be_the_same": <pointer to doc1.nested_field_that_will_always_be_the_same>
}
Now, of course, I can take out this nested field into a different document and then have Mongo reference its _id field, but I am not looking for app-side solution, because this collection is being accessed via multiple workers and I don't have all the documents that have the same nested_field_that_will_always_be_the_same at any given moment.
Instead, I want a solution provided by Mongo to only store this field once for every instance it is unique.
How can I do that?
I am using Pymongo.
This is quite an interesting challenge - I don't think a "pure" mongo solution is possible - you'll still have to modify your app code at insertion time. Quite interested to see if anyone does come up with a pure solution
What I'd probably do is add a unique index on the nested document with a partialFilterExpression, the index ensures you can quickly find the ID of the matching document, and the unique enforces this strictly.
Something like this (I shortened your field to nested for brevity):
collection.createIndex(
{ nested: 1 },
{ unique: true, partialFilterExpression: { nested: { $type: "object" } } }
);
Then for my inserts, I'd do the following (pseudo code)
try {
found = collection.findOne({ nested }, { projection: { _id: 1 } })
if (found) {
collection.insert({ docId, nested: found._id })
}
else {
collection.insert({ docId, nested })
}
}
catch (e) {
// test for E11000 and retry
}
I have a Team class with __find method to query MongoDB and get a single record:
class Team:
def __find(self, key):
team_document = self._db.get_single_data(TeamModel.TEAM_COLLECTION, key)
return team_document
A team document will look like this:
{
"_id": {
"$oid": "62291a3deb9a30c9e3cf5d28"
},
"name": "Warriors of Hell",
"race": "wizards",
"matches": [
{
"MTT001": "won"
},
{
"MCH005": "lost"
}]
Now I want to create a method named match_result
def match_result(self, team_name, match_id):
print(check if team played in match with match_id and get result')
I do not know how to implement this query to look inside the team matches. Can any MongoDB/Python guru help me please?
Also, can this be done by "tweaking" the __find method? Currently this method just does simple key-value lookups
Appreciate if you could help me for the best way to transform a result into json as below.
We have a result like below, where we are getting an information on the employees and the companies. In the result, somehow, we are getting a enum like T, but not for all the properties.
[ {
"T.id":"Employee_11",
"T.category":"Employee",
"node_id":["11"]
},
{
"T.id":"Company_12",
"T.category":"Company",
"node_id":["12"],
"employeecount":800
},
{
"T.id":"id~Employee_11_to_Company_12",
"T.category":"WorksIn",
},
{
"T.id":"Employee_13",
"T.category":"Employee",
"node_id":["13"]
},
{
"T.id":"Parent_Company_14",
"T.category":"ParentCompany",
"node_id":["14"],
"employeecount":900,
"childcompany":"Company_12"
},
{
"T.id":"id~Employee_13_to_Parent_Company_14",
"T.category":"Contractorin",
}]
We need to transform this result into a different structure and grouping based on the category, if category in Employee, Company and ParentCompany, then it should be under the node_properties object, else, should be in the edge_properties. And also, apart from the common properties(property_id, property_category and node), different properties to be added if the category is company and parent company. There are few more logic also where we have to get the from and to properties of the edge object based on the 'to' . the expected response is,
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In java, we have used the enhanced for loop, switch etc. How we can write the code in the python to get the structure as above from the initial result structure. ( I am new to python), thank you in advance.
Regards
Here is a method that I quickly made, you can adjust it to your requirements. You can use regex or your own function to get the IDs of the edge_properties then assign it to an object like the way I did. I am not so sure of your full requirements but if that list that you gave is all the categories then this will be sufficient.
def transform(input_list):
node_properties = []
edge_properties = []
for input_obj in input_list:
# print(obj)
new_obj = {}
if input_obj['T.category'] == 'Employee' or input_obj['T.category'] == 'Company' or input_obj['T.category'] == 'ParentCompany':
new_obj['property_id'] = input_obj['T.id']
new_obj['property_category'] = input_obj['T.category']
new_obj['node'] = {input_obj['node_id'][0]}
if "employeecount" in input_obj:
new_obj['employeecount'] = input_obj['employeecount']
if "childcompany" in input_obj:
new_obj['childcompany'] = input_obj['childcompany']
node_properties.append(new_obj)
else: # You can do elif == to as well based on your requirements if there are other outliers
# You can use regex or whichever method here to split the string and add the values like above
edge_properties.append(new_obj)
return [node_properties, edge_properties]
I am experimenting with Python with MongoDB. I am a newbie with python. Here I get records from a collection and based on a particular value from that collection, I find the count of that record(from the 1st collection). But my problem is I cannot append this count into my list.
Here is the code:
#gen.coroutine
def post(self):
Sid = self.body['Sid']
alpha = []
test = db.student.find({"Sid": Sid})
count = yield test.count()
print(count)
for document in (yield test.to_list(length=1000)):
cursor = db.attendance.find({"StudentId": document.get('_id')})
check = yield cursor.count()
print(check)
alpha.append(document)
self.write(bson.json_util.dumps({"data": alpha}))
the displayed output alpha is from the first collection (student), the count value is from (attendance collection).
when I try to extend the list with check I end up with error
alpha.append(document.extend(check))
But I am getting the correct count value in python terminal, I am unable to write it along with the output.
My output is like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}]}
My output should be like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "5"}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "3"}]}
Please guide me on how I can get my desired output.
Thank you.
A better approach to this is to use the MongoDB .aggregate() method from the python driver you are using rather than repeated .find() and .count() operations:
db.attendance.aggregate([
{ "$group": {
"_id": "$StudentId",
"name": { "$first": "$Student Name" },
"count": { "$sum": 1 }
}}
])
Then it is already done for you.
What your current code is doing is looking up the current student and returning a "count" of how many occurances there are. And you are doing that for every student by the content of your output.
Rather than do that the data is "aggregated" to return both the values from the document along with a "count" within the returned results, and it is aggregated per student.
This means you don't need to run a query for each student just to get the count. Instead you just call the database "once" and make it count all the students you need in one result.
If you need more that one student but not all students then you filter that with query conditions;
db.attendance.aggregate([
{ "$match": { "StudentId": { "$in": list_of_student_ids } } },
{ "$group": {
"_id": "$StudentId",
"name": { "$first": "$Student Name" },
"count": { "$sum": 1 }
}}
])
And the selection along with the aggregation is done for you.
No need for looping code and lots of database request. The .aggregate() method and pipeline will do it for you.
Read the core documation on the Aggregation Pipeline.
Add count entry to the dictionary document and append the dictionary:
for document in (yield test.to_list(length=1000)):
cursor = db.attendance.find({"StudentId": document.get('_id')})
check = yield cursor.count()
document['count'] = check
alpha.append(document)
I got two class on Mongoengine:
class UserPoints(EmbeddedDocument):
user = ReferenceField(User, verbose_name='user')
points = IntField(verbose_name='points', required=True)
def __unicode__(self):
return self.points
And
class Local(Document):
token = StringField(max_length=250,verbose_name='token_identifier',unique=True)
points = ListField(EmbeddedDocumentField(UserPoints),required=False)
def __unicode__(self):
return self.name
If i do something like: "LP = Local.objects.filter(points__user=user)" I got all the locals with userpoints from my user. But i Want all the UserPoints from a User. How can i?
I try also: "lUs = UserPoints.objects.filter(user=user)" but i got an empty Array.
PD: I do something like this to solve the problem, but it's not efficient.
LDPoints = []
LP = Local.objects.filter(points__user=user)
print 'List P: '+str(len(LP))
for local in LP:
for points in local.points:
if points.user == user:
dPoints = parsePoints(points)
lDPoints.append(dPoints)
Adding to the original and getting venerable answer is that the aggregation framework has $filter now for some time, which is a lot cleaner that the $map and $setDifference method used in the original answer.
Local._get_collection().aggregate([
{ "$match": { "points.user": user } },
{ "$project": {
"token": 1,
"points": {
"$filter": {
"input": "$points",
"as": "el",
"cond": { "$eq": [ "$$el.user", user ] }
}
}
}}
])
The same principles apply though for obtaining "multiple" matches from an array in the collection you use the aggregate() method of the underlying driver, as called from _get_collection().
Original
The answer to avoid "filtering" your embedded documents for the selected "user" only is to use the aggregation framework. This allows you to manipulate the "array content" on the database server rather than filtering the results in your client code.
Aggregation is done with the raw pymongo driver methods, but since Mongoengine is built on top of this driver you access the raw collection object from your class with the ._get_collection() method:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# unwind the embedded array to de-normalize
{ "$unwind": "$points" },
# Matching now filters the elements
{ "$match": {
"points.user": user
}},
# Group back as an array
{ "$group": {
"_id": "$_id",
"token": { "$first": "$token" },
"points": { "$push": "$points" }
}}
])
If you have MongoDB 2.6 or greater on your server and your "user/points" combination is always unique you can alternately filter without the $unwind|$match|$group cycle using the $map and $setDifference operators available there:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# Filter the array in place
{ "$project": {
"token": 1,
"points": {
"$setDifference": [
{
"$map": {
"input": "$points",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.user", user ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
In the second case there the $cond is a ternary operator which takes a logical expression as it's first argument and the values to return when that expression is either true or false as it's other arguments. Inside the $map, each element is tested to see if the condition is true, in this case "is the user field equal to the selected user".
Either the content of that array position is returned or otherwise false. The $setDifference takes the resulting array and "filters" the false values out, so only the matching elements are returned.
In the legacy approach, the $unwind pipeline operator is used to effectively turn each array element into it's own document with all other parent properties. This allows you to apply the same $match condition, which unlike the initial query actually removes the documents which now as single elements no longer match your condition. You always want the first stage as there is no point processing this $unwind|$match combination on all of the documents that might not contain your matching condition.
The $group stage brings everything back into line per document. Using the $first option to return all other fields that were essentially duplicated by the $unwind and the $push operator to rebuild the array with the matching elements.
So while there no "built-in" methods to MongoEngine to do this sort of query, you can do this the MongoDB way by accessing the raw driver.
Also note that if you only expected one element to match in any array for your given "user" or other query, then you could alternately use the field projection form available to the raw driver as well. But the aggregation method is required for any more than one matching element of the array.