Count a particular value from a list in Python with MongoDB

I am experimenting with Python and MongoDB; I am a newbie with Python. I fetch records from one collection and, based on a particular value from each record, I count the matching records in another collection. But my problem is that I cannot append this count to my list.
Here is the code:
@gen.coroutine
def post(self):
    Sid = self.body['Sid']
    alpha = []
    test = db.student.find({"Sid": Sid})
    count = yield test.count()
    print(count)
    for document in (yield test.to_list(length=1000)):
        cursor = db.attendance.find({"StudentId": document.get('_id')})
        check = yield cursor.count()
        print(check)
        alpha.append(document)
    self.write(bson.json_util.dumps({"data": alpha}))
The displayed output alpha comes from the first collection (student); the count value comes from the second (attendance).
When I try to extend the list with check, I end up with an error:
alpha.append(document.extend(check))
I am getting the correct count value in the Python terminal, but I am unable to write it along with the output.
My output is like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."}}]}
My output should be like
{"data": [{"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "5"}, {"Sid": "1", "Student Name": "Alex","_id": {"$oid": "..."},"count": "3"}]}
Please guide me on how I can get my desired output.
Thank you.

A better approach to this is to use the MongoDB .aggregate() method from the Python driver you are using, rather than repeated .find() and .count() operations:
db.attendance.aggregate([
    { "$group": {
        "_id": "$StudentId",
        "name": { "$first": "$Student Name" },
        "count": { "$sum": 1 }
    }}
])
Then it is already done for you.
What your current code is doing is looking up the current student and returning a "count" of how many occurrences there are, and judging by your output you are doing that for every student.
Rather than doing that, the data is "aggregated" to return both the values from the document and a "count" within the returned results, aggregated per student.
This means you don't need to run a query for each student just to get the count. Instead you call the database "once" and have it count all the students you need in one result.
If you need more than one student, but not all students, then you filter with query conditions:
db.attendance.aggregate([
    { "$match": { "StudentId": { "$in": list_of_student_ids } } },
    { "$group": {
        "_id": "$StudentId",
        "name": { "$first": "$Student Name" },
        "count": { "$sum": 1 }
    }}
])
And the selection along with the aggregation is done for you.
No need for looping code and lots of database requests; the .aggregate() method and pipeline will do it for you.
Read the core documentation on the Aggregation Pipeline.
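For reference, here is a minimal sketch of how the handler from the question might run this pipeline with Motor. It assumes the same db, collection, and field names as the question, a recent Motor where aggregate() returns a command cursor, and that attendance documents carry the "Student Name" field as the pipeline above does:
@gen.coroutine
def post(self):
    Sid = self.body['Sid']
    # Collect the _id values of the matching students first
    students = yield db.student.find({"Sid": Sid}).to_list(length=1000)
    student_ids = [doc['_id'] for doc in students]
    # One aggregation call replaces the per-student .find()/.count() loop
    cursor = db.attendance.aggregate([
        {"$match": {"StudentId": {"$in": student_ids}}},
        {"$group": {
            "_id": "$StudentId",
            "name": {"$first": "$Student Name"},
            "count": {"$sum": 1}
        }}
    ])
    alpha = yield cursor.to_list(length=1000)  # length here is arbitrary
    self.write(bson.json_util.dumps({"data": alpha}))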

Add a count entry to the dictionary document and append the dictionary:
for document in (yield test.to_list(length=1000)):
    cursor = db.attendance.find({"StudentId": document.get('_id')})
    check = yield cursor.count()
    document['count'] = check
    alpha.append(document)
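Note that check is an integer, so the serialized result will contain "count": 5 rather than the quoted "count": "5" shown in the desired output. If you really want it as a string, cast it first:
document['count'] = str(check)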

Related

Get Single value from MongoDB using Python

I have a Team class with a __find method to query MongoDB and get a single record:
class Team:
    def __find(self, key):
        team_document = self._db.get_single_data(TeamModel.TEAM_COLLECTION, key)
        return team_document
A team document will look like this:
{
    "_id": {
        "$oid": "62291a3deb9a30c9e3cf5d28"
    },
    "name": "Warriors of Hell",
    "race": "wizards",
    "matches": [
        {
            "MTT001": "won"
        },
        {
            "MCH005": "lost"
        }
    ]
}
My __find method gives me a full document if I pass a query parameter like
{'name':'Warriors of Hell'}
Is there a way I can create a query into which I can pass the name and a match ID so that I get ONLY the result back? Something like:
{'name': 'Warriors of Hell', 'match_id': 'MTT001'}
and get back
won
I do not know how to implement this query to look inside the team matches. Can any MongoDB/Python guru help me please?
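One possible approach (a sketch, not from the original thread, and assuming a raw pymongo collection handle rather than the _db wrapper shown above): because each match is stored as a single-key dict inside the matches array, you can query the dotted path "matches.<match_id>" and use the positional $ projection to fetch only the matching element:
def find_match_result(collection, name, match_id):
    # Query on the embedded key, e.g. "matches.MTT001", and project only
    # the first array element that satisfied the query (positional "$").
    doc = collection.find_one(
        {"name": name, "matches." + match_id: {"$exists": True}},
        {"matches.$": 1},
    )
    if doc and doc.get("matches"):
        return doc["matches"][0][match_id]  # e.g. "won"
    return None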

Automatically entering next JSON level using Python in a similar way to JQ in bash

I am trying to use Python to extract pricePerUnit from JSON. There are many entries, and these are just two of them:
{
    "terms": {
        "OnDemand": {
            "7Y9ZZ3FXWPC86CZY": {
                "7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
                    "offerTermCode": "JRTCKXETXF",
                    "sku": "7Y9ZZ3FXWPC86CZY",
                    "effectiveDate": "2020-11-01T00:00:00Z",
                    "priceDimensions": {
                        "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
                            "rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
                            "description": "Processed translation request in AWS GovCloud (US)",
                            "beginRange": "0",
                            "endRange": "Inf",
                            "unit": "Character",
                            "pricePerUnit": {
                                "USD": "0.0000150000"
                            },
                            "appliesTo": []
                        }
                    },
                    "termAttributes": {}
                }
            },
            "CQNY8UFVUNQQYYV4": {
                "CQNY8UFVUNQQYYV4.JRTCKXETXF": {
                    "offerTermCode": "JRTCKXETXF",
                    "sku": "CQNY8UFVUNQQYYV4",
                    "effectiveDate": "2020-11-01T00:00:00Z",
                    "priceDimensions": {
                        "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
                            "rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
                            "description": "$0.000015 per Character for TextTranslationJob:TextTranslationJob in EU (London)",
                            "beginRange": "0",
                            "endRange": "Inf",
                            "unit": "Character",
                            "pricePerUnit": {
                                "USD": "0.0000150000"
                            },
                            "appliesTo": []
                        }
                    },
                    "termAttributes": {}
                }
            }
        }
    }
}
The issue I run into is that the keys, which in this sample are 7Y9ZZ3FXWPC86CZY, CQNY8UFVUNQQYYV4.JRTCKXETXF, and CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7, are changing strings that I cannot just type out as I am parsing the dictionary.
I have Python code that works for the first level of these random keys:
with open('index.json') as json_file:
    data = json.load(json_file)
    json_keys = list(data['terms']['OnDemand'].keys())
    # Get the region
    for i in json_keys:
        print((data['terms']['OnDemand'][i]))
However, this is tedious, as I would need to run the same code three times to get the other keys like 7Y9ZZ3FXWPC86CZY.JRTCKXETXF and 7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7, since the string changes with each JSON entry.
Is there a way that I can just tell python to automatically enter the next level of the JSON object, without having to parse all keys, save them, and then iterate through them? Using JQ in bash I can do this quite easily with jq -r '.terms[][][]'.
If you are really sure that there is exactly one key-value pair on each level, you can try the following:
def descend(x, depth):
    for i in range(depth):
        x = next(iter(x.values()))
    return x
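For example (hypothetical usage, assuming the data from the question), hopping from one SKU to its only offer term and then to its only price dimension looks like this:
sku = data['terms']['OnDemand']['7Y9ZZ3FXWPC86CZY']
term = descend(sku, 1)                     # the single "...JRTCKXETXF" term
dim = descend(term['priceDimensions'], 1)  # the single rate code entry
print(dim['pricePerUnit'])                 # {'USD': '0.0000150000'}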
You can use dict.values() to iterate over the values of a dict, and next(iter(dict.values())) to get the first (and only) element of a dict.
for demand in data['terms']['OnDemand'].values():
    next_level = next(iter(demand.values()))
    print(next_level)
If you expect a number of children other than 1 at the second level, you can just nest the for loops:
for demand in data['terms']['OnDemand'].values():
    for sub_demand in demand.values():
        print(sub_demand)
If you are interested in the keys too, you can use the dict.items() method to iterate over dict keys and values at the same time:
for demand_key, demand in data['terms']['OnDemand'].items():
    for sub_demand_key, sub_demand in demand.items():
        print(demand_key, sub_demand_key, sub_demand)
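Applied to the original question, nesting one more level reaches every pricePerUnit without naming a single dynamic key (a sketch assuming the structure shown above):
for sku in data['terms']['OnDemand'].values():
    for term in sku.values():
        for dim in term['priceDimensions'].values():
            print(dim['pricePerUnit'])  # e.g. {'USD': '0.0000150000'}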

Bulk Update for elasticsearch documents using Python

I have Elasticsearch documents like the ones below, where I need to rectify the age value based on creationtime and currentdate (age = currentdate - creationtime):
hits = [
    {
        "_id": "CrRvuvcC_uqfwo-WSwLi",
        "creationtime": "2018-05-20T20:57:02",
        "currentdate": "2021-02-05 00:00:00",
        "age": "60 months"
    },
    {
        "_id": "CrRvuvcC_uqfwo-WSwLi",
        "creationtime": "2013-07-20T20:57:02",
        "currentdate": "2021-02-05 00:00:00",
        "age": "60 months"
    },
    {
        "_id": "CrRvuvcC_uqfwo-WSwLi",
        "creationtime": "2014-08-20T20:57:02",
        "currentdate": "2021-02-05 00:00:00",
        "age": "60 months"
    },
    {
        "_id": "CrRvuvcC_uqfwo-WSwLi",
        "creationtime": "2015-09-20T20:57:02",
        "currentdate": "2021-02-05 00:00:00",
        "age": "60 months"
    }
]
I want to do a bulk update based on each document ID, but the problem is that I need to correct 6 months of data, and the index holds almost 535329 documents. I want to efficiently bulk-update age based on _id, for each day, across all documents, using Python.
Is there a way to do this without looping through everything? All the examples I came across use Pandas dataframes and update based on a known value, but here I only get the _id as and when the code runs.
The logic I had written was to fetch all docs, store their _id, and then update the age for each _id. But that is not an efficient way to update all documents in bulk for each day of 6 months.
Can anyone give me some ideas for this or point me in the right direction?
As mentioned in the comments, fetching the IDs won't be necessary. You don't even need to fetch the documents themselves!
A single _update_by_query call will be enough. You can use ChronoUnit to get the difference after you've parsed the dates:
POST your-index-name/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
      def created = LocalDateTime.parse(ctx._source.creationtime, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"));
      def currentdate = LocalDateTime.parse(ctx._source.currentdate, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
      def months = ChronoUnit.MONTHS.between(created, currentdate);
      ctx._source.age = months + ' month' + (months > 1 ? 's' : '');
    """,
    "lang": "painless"
  }
}
The official Python client has this method too, as update_by_query.
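A minimal sketch with the official elasticsearch-py client (a 7.x-style body= call; the host, index name, and field names are assumptions carried over from the question):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed host

script_source = """
def created = LocalDateTime.parse(ctx._source.creationtime,
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"));
def currentdate = LocalDateTime.parse(ctx._source.currentdate,
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
def months = ChronoUnit.MONTHS.between(created, currentdate);
ctx._source.age = months + ' month' + (months > 1 ? 's' : '');
"""

es.update_by_query(
    index="your-index-name",  # assumed index name
    body={
        "query": {"match_all": {}},  # narrow this down before a full run
        "script": {"source": script_source, "lang": "painless"},
    },
)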
🔑 Try running this update script on a small subset of your documents before letting it loose on your whole index, by adding a query other than the match_all I put there.
💡 It's worth mentioning that unless you search on this age field, it doesn't need to be stored in your index because it can be calculated at query time.
You see, if your index mapping's dates are properly defined like so:
{
  "mappings": {
    "properties": {
      "creationtime": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss"
      },
      "currentdate": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      ...
    }
  }
}
the age can be calculated as a script field:
POST ttimes/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "age_calculated": {
      "script": {
        "source": """
          def months = ChronoUnit.MONTHS.between(
            doc['creationtime'].value,
            doc['currentdate'].value );
          return months + ' month' + (months > 1 ? 's' : '');
        """
      }
    }
  }
}
The only caveat is that the value won't be inside the _source but rather inside its own group called fields (which implies that more script fields are possible at once!).
"hits" : [
{
...
"_id" : "FFfPuncBly0XYOUcdIs5",
"fields" : {
"age_calculated" : [ "32 months" ] <--
}
},
...
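Reading that back from Python would look roughly like this (a sketch; the es client and index name are the same assumptions as above):
resp = es.search(
    index="ttimes",
    body={
        "query": {"match_all": {}},
        "script_fields": {
            "age_calculated": {
                "script": {
                    "source": (
                        "def months = ChronoUnit.MONTHS.between("
                        "doc['creationtime'].value, doc['currentdate'].value);"
                        "return months + ' month' + (months > 1 ? 's' : '');"
                    )
                }
            }
        },
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["fields"]["age_calculated"][0])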

Getting KeyError when parsing JSON in Python for the following response

TL;DR:
Confused about how to parse the following JSON response and get the value of status under 12345 under dynamicValue_GGG under payload in this case.
Full question:
I get the following as a (sanitized) response upon hitting a REST API via the Python code below:
response = requests.request("POST", url, data=payload, headers=headers).json()
{
    "payload": {
        "name": "asdasdasdasd",
        "dynamicValue_GGG": {
            "12345": {
                "model": "asad",
                "status": "active",
                "subModel1": {
                    "dynamicValue_67890": {
                        "model": "qwerty",
                        "status": "active"
                    },
                    "subModel2": {
                        "dynamicValue_33445": {
                            "model": "gghjjj",
                            "status": "active"
                        },
                        "subModel3": {
                            "dynamicValue_66778": {
                                "model": "tyutyu",
                                "status": "active"
                            }
                        }
                    }
                },
                "date": "2016-02-04"
            },
            "design": "asdasdWWWsaasdasQ"
        }
    }
}
If I do a type(response['payload']), it gives me 'dict'.
Now, I'm trying to parse the response above and fetch certain keys and values out of it. The problem is that I'm not able to iterate through using an "index" and instead have to specify the "key", but the response has certain "keys" that are dynamically generated and sent over. For instance, the keys called "dynamicValue_GGG", "dynamicValue_66778", etc. are not static, unlike the "status" key.
I can successfully parse it by specifying the keys, like:
print response['payload']['dynamicValue_GGG']['12345']['status']
in which case I get the expected output 'active'.
However, since I have no control over 'dynamicValue_GGG', it would only work if I could specify something like this instead:
print response['payload'][0][0]['status']
But the above line gives me the error "KeyError: 0" when the Python code is executed.
Is there some way in which I can use the power of both keys and indexes together in this case?
A Python dictionary is keyed by name, not by position, so you cannot index it numerically. You'll have to iterate over all elements, potentially recursively, and test whether each one is the thing you're looking for. For example:
def find_submodels(your_dict):
    for item_key, item_values in your_dict.items():
        if isinstance(item_values, dict):
            if 'status' in item_values:
                print item_key, item_values['status']
            find_submodels(item_values)

find_submodels(your_dict)
Which would output:
12345 active
dynamicValue_67890 active
dynamicValue_66778 active
dynamicValue_33445 active

Get filtered embedded elements from all classes on mongoengine

I have two classes in Mongoengine:
class UserPoints(EmbeddedDocument):
    user = ReferenceField(User, verbose_name='user')
    points = IntField(verbose_name='points', required=True)

    def __unicode__(self):
        return self.points
And
class Local(Document):
    token = StringField(max_length=250, verbose_name='token_identifier', unique=True)
    points = ListField(EmbeddedDocumentField(UserPoints), required=False)

    def __unicode__(self):
        return self.name
If I do something like LP = Local.objects.filter(points__user=user), I get all the Locals with UserPoints from my user. But I want all the UserPoints for a user. How can I get those?
I also tried lUs = UserPoints.objects.filter(user=user), but I got an empty array.
P.S. I do something like this to work around the problem, but it is not efficient:
lDPoints = []
LP = Local.objects.filter(points__user=user)
print 'List P: ' + str(len(LP))
for local in LP:
    for points in local.points:
        if points.user == user:
            dPoints = parsePoints(points)
            lDPoints.append(dPoints)
Adding to the original, and now venerable, answer: the aggregation framework has had $filter for some time now, which is a lot cleaner than the $map and $setDifference method used in the original answer.
Local._get_collection().aggregate([
    { "$match": { "points.user": user } },
    { "$project": {
        "token": 1,
        "points": {
            "$filter": {
                "input": "$points",
                "as": "el",
                "cond": { "$eq": [ "$$el.user", user ] }
            }
        }
    }}
])
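As a rough sketch, consuming the result works like any other pymongo cursor. One caveat (an assumption on my part, depending on how your ReferenceField is stored): you may need to match on user.id or a DBRef rather than the document object:
pipeline = [
    {"$match": {"points.user": user.id}},  # or a DBRef; see caveat above
    {"$project": {
        "token": 1,
        "points": {
            "$filter": {
                "input": "$points",
                "as": "el",
                "cond": {"$eq": ["$$el.user", user.id]}
            }
        }
    }}
]
for doc in Local._get_collection().aggregate(pipeline):
    print(doc["token"], doc["points"])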
The same principles apply though: to obtain "multiple" matches from an array in the collection, you use the aggregate() method of the underlying driver, as called from _get_collection().
Original
The way to avoid "filtering" your embedded documents for the selected "user" in client code is to use the aggregation framework. This allows you to manipulate the "array content" on the database server rather than filtering the results in your client code.
Aggregation is done with the raw pymongo driver methods, but since Mongoengine is built on top of this driver, you access the raw collection object from your class with the ._get_collection() method:
Local._get_collection().aggregate([
    # Match the documents that have the required user
    { "$match": {
        "points.user": user
    }},

    # unwind the embedded array to de-normalize
    { "$unwind": "$points" },

    # Matching now filters the elements
    { "$match": {
        "points.user": user
    }},

    # Group back as an array
    { "$group": {
        "_id": "$_id",
        "token": { "$first": "$token" },
        "points": { "$push": "$points" }
    }}
])
If you have MongoDB 2.6 or greater on your server, and your "user/points" combination is always unique, you can alternately filter without the $unwind|$match|$group cycle by using the $map and $setDifference operators available there:
Local._get_collection().aggregate([
    # Match the documents that have the required user
    { "$match": {
        "points.user": user
    }},

    # Filter the array in place
    { "$project": {
        "token": 1,
        "points": {
            "$setDifference": [
                {
                    "$map": {
                        "input": "$points",
                        "as": "el",
                        "in": {
                            "$cond": [
                                { "$eq": [ "$$el.user", user ] },
                                "$$el",
                                False
                            ]
                        }
                    }
                },
                [False]
            ]
        }
    }}
])
In the second case, the $cond is a ternary operator which takes a logical expression as its first argument, and the values to return when that expression is either true or false as its other arguments. Inside the $map, each element is tested to see if the condition is true, in this case "is the user field equal to the selected user".
Either the content of that array position is returned, or otherwise false. The $setDifference takes the resulting array and "filters" the false values out, so only the matching elements are returned.
In the legacy approach, the $unwind pipeline operator is used to effectively turn each array element into its own document with all the other parent properties. This allows you to apply the same $match condition, which, unlike the initial query, actually removes the documents that, now as single elements, no longer match your condition. You always want the first stage, as there is no point processing this $unwind|$match combination on all of the documents that might not contain your matching condition.
The $group stage brings everything back into line per document, using the $first option to return all the other fields that were essentially duplicated by $unwind, and the $push operator to rebuild the array with the matching elements.
So while there are no "built-in" methods in MongoEngine to do this sort of query, you can do it the MongoDB way by accessing the raw driver.
Also note that if you only expect one element to match in any array for your given "user" or other query, then you could alternately use the field projection form available to the raw driver as well. But the aggregation method is required for more than one matching element of the array.
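A quick sketch of that single-element projection alternative, again through the raw collection (the same caveat applies about how the user reference is stored):
# Returns the document with only the first points element matching the user;
# the positional "$" keeps just the first matched array element.
doc = Local._get_collection().find_one(
    {"points.user": user.id},  # user.id is an assumption
    {"token": 1, "points.$": 1}
)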
