PyMongo aggregate by taking most recent value of field


I would like to group my documents and for certain fields take the value of the record with the most recent timestamp (i.e. most recently inserted/updated value). In the example below, I want to group by user ID and phone, and take the email of the record with the most recent timestamp in that group. My initial strategy is to sort by descending timestamp and take the first value for an aggregation like so:
import pymongo
...
pipeline = [
    {
        "$sort": {"timestamp": -1}
    },
    {"$group": {
        "_id": {
            "userId": "$userId",
            "userPhone": "$userPhone",
            "userEmail": {"$first": "$userEmail"},
            "count": {"$sum": 1}
        }
    }}
]
However I run into the following error:
pymongo.errors.OperationFailure: Unrecognized expression '$first'
Is there an equivalent $first function available for pymongo?

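Your pipeline syntax is incorrect. Accumulators such as $first and $sum must be top-level fields of the $group stage, alongside "_id", not nested inside it. This is not a PyMongo limitation: PyMongo sends the pipeline to the server unchanged.
Something like:
pipeline = [
    {"$sort": {"timestamp": -1}},
    {"$group": {
        "_id": {"userId": "$userId", "userPhone": "$userPhone"},
        "userEmail": {"$first": "$userEmail"},
        "count": {"$sum": 1}
    }}
]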

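To make the semantics of the corrected pipeline concrete, here is a minimal pure-Python sketch (no MongoDB server needed) of what it computes: sort by timestamp descending, then keep the first userEmail seen in each (userId, userPhone) group and count the group's documents. The sample documents and the emulate_latest_email helper are hypothetical, for illustration only.

```python
from operator import itemgetter

# The corrected pipeline, as it would be passed to collection.aggregate(...)
pipeline = [
    {"$sort": {"timestamp": -1}},
    {"$group": {
        "_id": {"userId": "$userId", "userPhone": "$userPhone"},
        "userEmail": {"$first": "$userEmail"},
        "count": {"$sum": 1},
    }},
]

def emulate_latest_email(docs):
    """Hypothetical in-memory equivalent of `pipeline` above."""
    groups = {}
    # $sort: newest timestamp first
    for doc in sorted(docs, key=itemgetter("timestamp"), reverse=True):
        key = (doc["userId"], doc["userPhone"])
        if key not in groups:
            # $first: the first document seen in a group is the newest one
            groups[key] = {"_id": {"userId": key[0], "userPhone": key[1]},
                           "userEmail": doc["userEmail"], "count": 0}
        groups[key]["count"] += 1  # $sum: 1
    return list(groups.values())

# Hypothetical sample data
docs = [
    {"userId": 1, "userPhone": "555-0100", "userEmail": "old@example.com", "timestamp": 1},
    {"userId": 1, "userPhone": "555-0100", "userEmail": "new@example.com", "timestamp": 2},
    {"userId": 2, "userPhone": "555-0199", "userEmail": "b@example.com", "timestamp": 1},
]
print(emulate_latest_email(docs))
```

Against a real server you would run list(db.collection.aggregate(pipeline)) instead (db.collection standing in for your collection); note that $group does not guarantee the order of its output documents.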
Related

How to paginate subdocuments in a MongoDB collection?

I have a MongoDB collection with the following data structure:
[
  {
    "_id": "1",
    "name": "businessName1",
    "reviews": [
      {
        "_id": "1",
        "comment": "comment1"
      },
      {
        "_id": "2",
        "comment": "comment1"
      },
      ...
    ]
  }
]
As you can see, each business stores its reviews as an array of subdocuments; in this sample, businessName1 has 2 reviews. In my real collection each business has hundreds of reviews, and I want to show only 10 per page.
I currently have a find_one() call in Python that retrieves this single business, but it also retrieves all of its reviews.
businesses.find_one(
    {"_id": "1"},  # the sample document's _id is the string "1"
    {"reviews": 1, "_id": 0})
I'm aware of the skip() and limit() methods in PyMongo, which limit the number of results retrieved, but as far as I know they can only be used on the cursor returned by find(). Is this correct?

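Option 1: You can use the $slice projection for pagination. $slice: [skip, limit] skips the first skip elements of the array and returns the next limit elements, so [3, 5] returns reviews 4 through 8: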
db.collection.find(
  { _id: "1" },
  { _id: 0, reviews: { $slice: [3, 5] } }
)
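In PyMongo the same projection is an ordinary dict passed as the second argument to find_one(). A minimal sketch, assuming zero-based page numbers and a page size of 10; the reviews_page_projection helper is hypothetical. The plain-list slice at the end mirrors what $slice: [skip, limit] returns, so the arithmetic can be checked without a server.

```python
def reviews_page_projection(page, page_size=10):
    """Hypothetical helper: projection returning one page of `reviews` (page is 0-based)."""
    skip = page * page_size
    # $slice: [skip, limit] skips `skip` elements, then returns up to `limit` elements
    return {"_id": 0, "reviews": {"$slice": [skip, page_size]}}

# With a live server (businesses is your collection):
#   businesses.find_one({"_id": "1"}, reviews_page_projection(2))

# Local check of the same arithmetic on a plain list of 25 fake reviews
reviews = [{"_id": str(i), "comment": f"comment{i}"} for i in range(1, 26)]
proj = reviews_page_projection(2)
skip, limit = proj["reviews"]["$slice"]
page_three = reviews[skip:skip + limit]
print([r["_id"] for r in page_three])
```

The last page is simply shorter: with 25 reviews, page 2 (the third page) holds only reviews 21 to 25.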
Option 2: Use an aggregation instead, which can also return the total size of the array (handy for computing the number of pages):
db.collection.aggregate([
  { $match: { _id: "1" } },  // only the business being paged
  {
    $project: {
      _id: 0,
      reviews: { $slice: ["$reviews", 3, 5] },
      total: { $size: "$reviews" }
    }
  }
])
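A PyMongo sketch of Option 2, parameterised by page number, plus the page-count arithmetic that the total field makes possible. The helper names, the zero-based page numbering and the page size of 10 are assumptions for illustration.

```python
import math

def reviews_page_pipeline(business_id, page, page_size=10):
    """Hypothetical helper: pipeline returning one page of reviews plus the total count."""
    return [
        {"$match": {"_id": business_id}},
        {"$project": {
            "_id": 0,
            # aggregation form of $slice: [array, skip, limit]
            "reviews": {"$slice": ["$reviews", page * page_size, page_size]},
            "total": {"$size": "$reviews"},
        }},
    ]

def page_count(total, page_size=10):
    """Hypothetical helper: number of pages needed to show `total` reviews."""
    return math.ceil(total / page_size)

# With a live server (businesses is your collection):
#   result = next(businesses.aggregate(reviews_page_pipeline("1", 0)))
#   pages = page_count(result["total"])
print(reviews_page_pipeline("1", 1))
print(page_count(95))
```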

MongoDB - How to aggregate data for each record

I have some stored data like this:
{
  "_id": 1,
  "serverAddresses": {
    "name1": "0.0.0.0:8000",
    "name2": "0.0.0.0:8001"
  }
}
I need to aggregate it into this:
[
  {
    "gameId": "1",
    "name": "name1",
    "url": "0.0.0.0:8000"
  },
  {
    "gameId": "1",
    "name": "name2",
    "url": "0.0.0.0:8001"
  }
]
How can I do this without a for loop?

1. $project: add an addresses field by converting $serverAddresses to a key-value array with $objectToArray.
2. $unwind: deconstruct the addresses field into one document per key-value pair.
3. $replaceRoot: reshape each document from step 2 into the desired output form.
db.collection.aggregate([
  {
    $project: {
      addresses: { $objectToArray: "$serverAddresses" }
    }
  },
  { $unwind: "$addresses" },
  {
    $replaceRoot: {
      newRoot: {
        gameId: { $toString: "$_id" },  // "1" rather than 1, as in the desired output
        name: "$addresses.k",
        url: "$addresses.v"
      }
    }
  }
])

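The same three stages can be written as a PyMongo pipeline (a list of dicts). Alongside it is a hypothetical pure-Python function that performs the same reshaping on one in-memory document, which makes it easy to check the expected output without a server. Converting _id with $toString (available since MongoDB 4.0) is an assumption made to match the "gameId": "1" in the desired output.

```python
pipeline = [
    # 1. Turn the serverAddresses object into [{"k": ..., "v": ...}, ...]
    {"$project": {"addresses": {"$objectToArray": "$serverAddresses"}}},
    # 2. One document per key/value pair
    {"$unwind": "$addresses"},
    # 3. Reshape each document into the desired output form
    {"$replaceRoot": {"newRoot": {
        "gameId": {"$toString": "$_id"},
        "name": "$addresses.k",
        "url": "$addresses.v",
    }}},
]

def flatten_server_addresses(doc):
    """Hypothetical in-memory equivalent of `pipeline` for a single document."""
    # dict.items() keeps insertion order, as $objectToArray keeps field order
    return [
        {"gameId": str(doc["_id"]), "name": k, "url": v}
        for k, v in doc["serverAddresses"].items()
    ]

doc = {"_id": 1, "serverAddresses": {"name1": "0.0.0.0:8000", "name2": "0.0.0.0:8001"}}
print(flatten_server_addresses(doc))
# With a live server: list(db.collection.aggregate(pipeline))
```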
Mongodb query using "$gt"

I have this task: "Write a MongoDB query to find the count of movies released after the year 1999". I've tried the lines below, but none of them works. Any thoughts?
PS: the collection's name is movies, and its fields are year and _id.
These are the lines I've tried:
docs = db.movies.find({"year":{"$gt":"total"("1999")}}).count()
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"$1999"}}}])
docs = db.movies.count( {"year": { "$gt": "moviecount"("1999") } } )
docs = db.movies.find({"year":{"$gt":"1999"}})
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"1999"}}}])

You can do it with an aggregation pipeline:
[
  {
    "$match": {
      "year": { "$gt": 1999 }
    }
  },
  {
    "$group": {
      "_id": 1,
      "count": { "$sum": 1 }
    }
  }
]
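The first stage of the pipeline, $match, keeps only the documents whose year is greater than 1999. This assumes year is stored as a number; if it is stored as a string, compare with "1999" instead, because $gt only matches values of the same type.
Then $group adds 1 for every remaining document, which gives the count.
The "_id": 1 is a dummy value: we are not grouping on any particular field, so every document falls into a single group.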

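In PyMongo, a simpler way to get this count is count_documents() with a plain filter (Cursor.count(), used in the first attempt, was deprecated and then removed in PyMongo 4). The sketch below assumes year is stored as a number; count_after is a hypothetical pure-Python stand-in that mimics the "same type only" rule of $gt on sample documents, so the logic can be checked without a server.

```python
# Filter for PyMongo, assuming `year` is stored as a number
year_filter = {"year": {"$gt": 1999}}

# With a live server (db is your pymongo Database):
#   n = db.movies.count_documents(year_filter)

def count_after(docs, year):
    """Hypothetical in-memory stand-in for count_documents({"year": {"$gt": year}})."""
    total = 0
    for d in docs:
        y = d.get("year")
        # Like $gt, compare numbers only with numbers; strings such as "2005" never match
        if isinstance(y, (int, float)) and not isinstance(y, bool) and y > year:
            total += 1
    return total

movies = [
    {"_id": 1, "year": 1998},
    {"_id": 2, "year": 2001},
    {"_id": 3, "year": 2010},
    {"_id": 4, "year": "2005"},  # stored as a string: not matched by a numeric $gt
]
print(count_after(movies, 1999))
```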
Get field value in MongoDB without parent object name

I'm trying to retrieve some data from MongoDB through Python scripts,
but I got stuck on the following situation:
I have to retrieve some data, check a field value and compare it with other data (MongoDB documents).
But the object's name may vary from module to module, see below:
Document 1
{
  "_id": "001",
  "promotion": {
    "Avocado": {
      "id": "01",
      "timestamp": "202005181407"
    },
    "Banana": {
      "id": "02",
      "timestamp": "202005181407"
    }
  },
  "product": {
    "id": "11"
  }
}
Document 2
{
  "_id": "002",
  "promotion": {
    "Grape": {
      "id": "02",
      "timestamp": "202005181407"
    },
    "Dragonfruit": {
      "id": "02",
      "timestamp": "202005181407"
    }
  },
  "product": {
    "id": "15"
  }
}
I'll always have an object called promotion, but its children's names may vary: sometimes they are sequential numbers, sometimes not. The field whose value I need is the id inside each child of promotion, and it always has that same name.
If a document matches the criteria, I'll retrieve it with Python and do the rest of the work there.
PS: I'm not the one responsible for this document structure.
I've already tried these operators, but couldn't get them to work the way I need:
$all
$elemMatch

Try this python pipeline:
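[
    {
        # promotion -> [{'k': 'Avocado', 'v': {...}}, {'k': 'Banana', 'v': {...}}]
        '$addFields': {
            'fruits': {'$objectToArray': '$promotion'}
        }
    }, {
        # collect v.id from every element of the fruits array
        '$addFields': {
            'FruitIds': '$fruits.v.id'
        }
    }, {
        '$project': {
            '_id': 0,
            'FruitIds': 1
        }
    }
]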
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
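To see why '$fruits.v.id' yields an array, here is a pure-Python sketch of the same steps on the two sample documents: $objectToArray turns promotion into k/v pairs, and a dotted path through an array collects the field from every element. Because the result is an array, a $match such as {'FruitIds': '01'} added after the second $addFields stage would keep documents where any promotion has that id. The helper name and the '01' filter value are hypothetical.

```python
def fruit_ids(doc):
    """Hypothetical equivalent of $objectToArray on promotion followed by '$fruits.v.id'."""
    fruits = [{"k": k, "v": v} for k, v in doc["promotion"].items()]
    return [f["v"]["id"] for f in fruits]

docs = [
    {"_id": "001",
     "promotion": {"Avocado": {"id": "01", "timestamp": "202005181407"},
                   "Banana": {"id": "02", "timestamp": "202005181407"}},
     "product": {"id": "11"}},
    {"_id": "002",
     "promotion": {"Grape": {"id": "02", "timestamp": "202005181407"},
                   "Dragonfruit": {"id": "02", "timestamp": "202005181407"}},
     "product": {"id": "15"}},
]

print([{"FruitIds": fruit_ids(d)} for d in docs])

# Keep documents where some promotion has id "01", like {"$match": {"FruitIds": "01"}}
matching = [d["_id"] for d in docs if "01" in fruit_ids(d)]
print(matching)
```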

MongoDB Query in Pymongo

I have a collection in this format:
{
"name": ....,
"users": [....,....,....,....]
}
I have two different names and I want to find the total number of users that belong to both documents. Currently I do it in Python: I download the document for name 1 and the document for name 2 and check how many users appear in both. Is there a way to do it only with MongoDB and return the number?
Example:
{
"name": "John",
"users": ["001","003","008","010"]
}
{
"name": "Peter",
"users": ["002", "003", "004", "005", "006", "008"]
}
The result would be 2, since users 003 and 008 belong to both documents.
How I do it:
doc1 = db.collection.find_one({"name": "John"})
doc2 = db.collection.find_one({"name": "Peter"})
total = 0
for user in doc1["users"]:
    if user in doc2["users"]:
        total += 1

You could also do this with the aggregation framework, though I think it only really makes sense when you are comparing more than two names, even though it works for two as well:
db.users.aggregate([
{ "$match": {
"name": { "$in": [ "John", "Peter" ] }
}},
{ "$unwind": "$users" },
{ "$group": {
"_id": "$users",
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } }},
{ "$group": {
"_id": null,
"count": { "$sum": 1 }
}}
])
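This gives the same count for whatever names you supply to $in in the first $match stage. With more than two names, "count" > 1 means "shared by at least two of the documents"; to count only users common to all of them, match "count" against the number of names instead.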

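For exactly two documents, a Python set intersection replaces the loop; for more names, the aggregation above can be mirrored in memory with collections.Counter to check its logic. Both helpers are hypothetical sketches. In PyMongo the pipeline itself is the same list of dicts, with null in the last $group written as None. One difference to keep in mind: the sketch deduplicates each document's users list, whereas $unwind + $group in the pipeline above would count a user listed twice in the same document twice.

```python
from collections import Counter

def shared_users_two(doc1, doc2):
    """Hypothetical helper: number of users present in both documents."""
    return len(set(doc1["users"]) & set(doc2["users"]))

def shared_users(docs, names, min_docs=2):
    """Hypothetical in-memory mirror of the aggregation pipeline above."""
    counts = Counter()
    for d in docs:
        if d["name"] in names:             # $match with $in
            counts.update(set(d["users"]))  # $unwind + $group/$sum (deduplicated per document)
    # $match count > 1, then $group/$sum over the surviving users
    return sum(1 for c in counts.values() if c >= min_docs)

john = {"name": "John", "users": ["001", "003", "008", "010"]}
peter = {"name": "Peter", "users": ["002", "003", "004", "005", "006", "008"]}
print(shared_users_two(john, peter))
print(shared_users([john, peter], ["John", "Peter"]))
```

Passing min_docs=len(names) to shared_users counts only users present in every named document.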