I am trying to pull data from two arrays (date and price time series) after a certain date cutoff, oldest, using pymongo.
date_price_collection.aggregate([
    {'$match': {'ticker': ticker}},
    {
        '$project': {
            'dates': {'$gte': ['dates', oldest]},
            # prices to match these dates
        },
    },
])
The way the data is organized, there are two arrays of the same length, one for dates and one for prices. How can I also pull the prices that correspond to the dates > oldest?
Thank you very much for any advice!
What you could do is first $filter the date array, then use $slice to match the proper prices, like so:
const oldest = ...date;
db.collection.aggregate([
{
"$addFields": {
"dates": {
$filter: {
input: "$dates",
as: "date",
cond: {
$gte: [
"$$date",
oldest
]
}
}
}
}
},
{
$project: {
dates: 1,
prices: {
// take the last N prices, where N is the number of dates kept by $filter
$slice: [
"$prices",
{ "$multiply": [ -1, { $size: "$dates" } ] }
]
}
}
}
])
Mongo Playground
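Since the question is in pymongo, here is the same pipeline translated to Python. This is only a minimal sketch: it assumes the dates array is stored in ascending order (so the matching prices sit at the end of the prices array), and the ticker and cutoff values are hypothetical placeholders.
from datetime import datetime

oldest = datetime(2020, 1, 1)  # hypothetical cutoff, use your real value
ticker = "AAPL"                # hypothetical ticker for the $match stage

pipeline = [
    {"$match": {"ticker": ticker}},
    # Keep only the dates on or after the cutoff.
    {"$addFields": {
        "dates": {
            "$filter": {
                "input": "$dates",
                "as": "date",
                "cond": {"$gte": ["$$date", oldest]},
            }
        }
    }},
    # Take the last N prices, where N is the number of dates that survived the
    # filter (valid because the arrays are parallel and sorted ascending).
    {"$project": {
        "dates": 1,
        "prices": {"$slice": ["$prices", {"$multiply": [-1, {"$size": "$dates"}]}]},
    }},
]

results = list(date_price_collection.aggregate(pipeline))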
I am using mongoengine as the ORM with a Flask application. The model class is defined like this:
class MyData(db.Document):
    task_id = db.StringField(max_length=50, required=True)
    url = db.URLField(max_length=500, required=True, unique=True)
    organization = db.StringField(max_length=250, required=True)
    val = db.StringField(max_length=50, required=True)
The organization field can repeat, and I want to get the count of duplicates with respect to the values of another field. For example, if the data in MongoDB is like this:
[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
{"task_id":"rfre43fed","url":"https:example1.com","organization":"Avengers","val":"valid"},
{"task_id":"uyje3dsxs","url":"https:example2.com","organization":"Metro","val":"valid"},
{"task_id":"ghs563vt6","url":"https:example1.com","organization":"Avengers","val":"invalid"},
{"task_id":"erf6egy64","url":"https:example2.com","organization":"Metro","val":"null"}]
Then I am querying all the objects using
data = MyData.objects()
I want a response like
[{"url":"https:example1.com","Avengers":{"valid":1,"null":1,"invalid":1}},{"url":"https:example2.com",Metro":{"valid":1,"null":1,"invalid":0}}]
I tried this:
db.collection.aggregate([
{
"$group": {
"_id": "$organization",
"count": [
{
"null": {
"$sum": 1
},
"valid": {
"$sum": 1
},
"invalid": {
"$sum": 1
}
}
]
}
}
])
but I am getting an error
The field 'count' must be an accumulator object
Maybe something like this:
db.collection.aggregate([
{
"$group": {
"_id": {
k: "$organization",
v: "$val"
},
"cnt": {
$sum: 1
}
}
},
{
$project: {
_id: 0,
k: "$_id.k",
o: {
k: "$_id.v",
v: "$cnt"
}
}
},
{
$group: {
_id: "$k",
v: {
$push: "$o"
}
}
},
{
$addFields: {
v: {
"$arrayToObject": "$v"
}
}
},
{
$project: {
_id: 0,
new: [
{
k: "$_id",
v: "$v"
}
]
}
},
{
"$addFields": {
"new": {
"$arrayToObject": "$new"
}
}
},
{
"$replaceRoot": {
"newRoot": "$new"
}
}
])
Explained:
Group to count
Project for arrayToObject
Group to join the values
arrayToObject one more time
project additionally
arrayToObject to form the final object
project one more time
replaceRoot to move the object to root.
P.S.
Please note this solution does not show missing values if they do not exist; if you need the missing values, additional mapping / $mergeObjects needs to be added.
playground1
Option with missing values (if the possible values are fixed to null, valid, invalid):
just replace the second $addFields with:
{
$addFields: {
v: {
"$mergeObjects": [
{
"null": 0,
valid: 0,
invalid: 0
},
{
"$arrayToObject": "$v"
}
]
}
}
}
playground2
And with the url included:
playground3
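To run a pipeline like the one above from the mongoengine model in the question, you can go through the queryset's aggregate() method or the underlying pymongo collection. A minimal sketch, assuming the stages are exactly the ones shown above; note that older mongoengine versions expect the stages unpacked as aggregate(*pipeline):
pipeline = [
    {"$group": {"_id": {"k": "$organization", "v": "$val"}, "cnt": {"$sum": 1}}},
    # ... the remaining stages exactly as written in the aggregation above ...
]

# Through mongoengine:
results = list(MyData.objects.aggregate(pipeline))

# Or through the underlying pymongo collection:
results = list(MyData._get_collection().aggregate(pipeline))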
I have this task: "Write a MongoDB query to find the count of movies released after the year 1999". I am trying the different lines of code listed below, but none of them works. Any thoughts?
PS: the collection's name is movies, and the fields are the year and _id of the movies.
These are the lines I'm trying:
docs = db.movies.find({"year":{"$gt":"total"("1999")}}).count()
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"$1999"}}}])
docs = db.movies.count( {"year": { "$gt": "moviecount"("1999") } } )
docs = db.movies.find({"year":{"$gt":"1999"}})
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"1999"}}}])
You can do it with an aggregation.
try it here
[
{
"$match": {
"year": {
"$gt": "1999"
}
}
},
{
"$group": {
"_id": 1,
"count": {
"$sum": "$total"
}
}
}
]
The first stage of the pipeline is $match; it keeps only the documents with a year greater than 1999 (the year is compared as a string here, matching the question; if year is stored as a number, drop the quotes).
Then in the $group stage we count the matching documents by summing 1 for each of them.
The "_id": 1 is a dummy value because we are not grouping on any particular field; we just want a single overall count.
I have a document in MongoDB:
{
"company": "npcompany",
"department": [
{
"name": "it",
"employeeIds": [
"emp1",
"emp2",
"emp3"
]
},
{
"name": "economy",
"employeeIds": [
"emp1",
"emp3",
"emp4"
]
}
]
}
I want to find "emp4"; in this case I want to get the "economy" department data only. If I search for "emp1", then I want to get the "npcompany" and "economy" data. How can I do this in MongoDB (or pymongo)?
play
db.collection.aggregate([ // As you need to fetch all matching array elements, reshape them
{
$unwind: "$department"
},
{
"$match": {//look for match
"department.employeeIds": "emp4"
}
},
{
$group: { // regroup them
"_id": "$_id",
data: {
"$push": "$$ROOT"
}
}
}
])
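The question also mentions pymongo, so here is a minimal sketch of the same pipeline in Python, assuming a pymongo collection handle named collection:
employee_id = "emp4"  # the employee to look up

pipeline = [
    # Flatten the department array so each element can be matched on its own.
    {"$unwind": "$department"},
    # Keep only the departments containing the employee.
    {"$match": {"department.employeeIds": employee_id}},
    # Regroup the matching departments back under their original document.
    {"$group": {"_id": "$_id", "data": {"$push": "$$ROOT"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc)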
I want to retrieve the array object with the newest date for a particular document.
Sadly I can't solve it; I always end up with errors.
Date format: 2020-06-10T13:25:25.645+00:00 (the values are Python datetime objects, e.g. from datetime.now()).
Sample data
collection.insert_one(
{
"document_name": "My Document",
"status": [
{
"status_time": datetimeobject, # 2020-01-02T13:25:25.645+00:00
"status_title": "Sample Title 1"
},
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title"
}
]
})
What I've tried
result = collection.find_one({"document_name": "My Document"}, {"status": 1}).sort({"status.status_time": -1}).limit(1)
result = collection.find_one({"document_name": "My Document"}, {"$max": {"status.status_time": -1})
result = collection_projects.find_one({"document_name": "Document"}, {"status": {"$elemMatch": {"$max": "$´status_time"}}})
result = list(collection.find({"document_name": "Document"}, {"_id": 0, "status": 1}).limit(1))
result = collection_projects.find_one(
{"document_name": "My Document"},
{"status.status_time": {"$arrayElemAt": -1}})
Result I'm looking for
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title 2"
}
You need to use aggregation to achieve this:
Query 1 :
db.collection.aggregate([
/** Re-create `status` field with what is needed */
{
$addFields: {
status: {
$reduce: {
input: "$status", // Iterate on array
initialValue: { initialDate: ISODate("1970-06-09T17:56:34.350Z"), doc: {} }, // Create initial values
in: { // If the condition is met, keep the current value; otherwise keep the accumulator as is
initialDate: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this.status_time", "$$value.initialDate" ] },
doc: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this", "$$value" ] }
}
}
}
}
},
/**
* re-create `status` field from `$status.doc`
* Since it will always hold only one object, you can make `status` an object rather than an array
* If `status` needs to stay an array, use { status: [ "$status.doc" ] } instead
*/
{
$addFields: { status: "$status.doc" }
}
])
Test: mongoplayground
Ref: $reduce, pymongo
Query 2 :
db.collection.aggregate([
/** unwind on `status` array */
{
$unwind: {
path: "$status",
preserveNullAndEmptyArrays: true // preserves doc where `status` field is `[]` or null or missing (Optional)
}
},
/** sort on descending order */
{
$sort: { "status.status_time": -1 }
},
/** group on `_id` & pick first found doc */
{
$group: { _id: "$_id", doc: { $first: "$$ROOT" } }
},
/** make `doc` field as new root */
{
$replaceRoot: { newRoot: "$doc" }
}
])
Test: mongoplayground
Test both queries; I believe that on a huge dataset $unwind & $sort might be a bit slow, similar to iterating over a huge array.
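Since the question itself is in pymongo, here is a minimal sketch of Query 2 run from Python; the extra $match on document_name is an addition so that only the document from the question is processed:
pipeline = [
    # Added: restrict to the document from the question.
    {"$match": {"document_name": "My Document"}},
    # Unwind the status array so each entry becomes its own document.
    {"$unwind": {"path": "$status", "preserveNullAndEmptyArrays": True}},
    # Newest status first.
    {"$sort": {"status.status_time": -1}},
    # Keep the first (newest) entry per document.
    {"$group": {"_id": "$_id", "doc": {"$first": "$$ROOT"}}},
    {"$replaceRoot": {"newRoot": "$doc"}},
]

result = next(collection.aggregate(pipeline), None)
if result is not None:
    print(result["status"])  # the newest status object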
You will have to use aggregation with $reduce; this solution is similar to #whoami's, except there is no nested document when using $reduce:
db.collection.aggregate([
{
$match: {
document_name: "My Document"
}
},
{
$project: { // use $project if you only want the status, use $addFields if you want other fields as well
status: {
$reduce: {
input: "$status",
initialValue: null,
in: {
$cond: [
{
$gte: [
"$$this.status_time",
"$$value.status_time"
]
},
"$$this",
"$$value"
]
}
}
}
}
}
])
mongoplayground
Currently we are able to group by customer_id in Elasticsearch.
Following is the document structure
{
"order_id":"6",
"customer_id":"1",
"customer_name":"shailendra",
"mailing_addres":"shailendra#gmail.com",
"actual_order_date":"2000-04-30",
"is_veg":"0",
"total_amount":"2499",
"store_id":"276",
"city_id":"12",
"payment_mode":"cod",
"is_elite":"0",
"product":["1","2"],
"coupon_id":"",
"client_source":"1",
"vendor_id":"",
"vendor_name: "",
"brand_id":"",
"third_party_source":""
}
Now we need to filter the groups to find the documents where:
the last order date is between two dates
the first order date is between two dates
How can we achieve this?
You can try the query below. Within each customer bucket, we further filter all documents between two dates (here I've taken the month of August 2016) and then run a stats aggregation on the date field. The min value will be the first order date and the max value will be the last order date.
{
"aggs": {
"customer_ids": {
"terms": {
"field": "customer_id"
},
"aggs": {
"date_filter": {
"filter": {
"range": {
"actual_order_date": {
"gt": "2016-08-01T00:00:00.000Z",
"lt": "2016-09-01T00:00:00.000Z"
}
}
},
"aggs": {
"min_max": {
"stats": {
"field": "actual_order_date"
}
}
}
}
}
}
}
}
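For completeness, here is a minimal sketch of how this request could be sent from Python with the official elasticsearch client. The index name orders is an assumption, and depending on your client version the request may need to be passed as keyword arguments instead of body=:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,  # only the aggregations are needed, not the hits
    "aggs": {
        "customer_ids": {
            "terms": {"field": "customer_id"},
            "aggs": {
                "date_filter": {
                    "filter": {
                        "range": {
                            "actual_order_date": {
                                "gt": "2016-08-01T00:00:00.000Z",
                                "lt": "2016-09-01T00:00:00.000Z"
                            }
                        }
                    },
                    "aggs": {
                        "min_max": {"stats": {"field": "actual_order_date"}}
                    }
                }
            }
        }
    }
}

response = es.search(index="orders", body=query)  # "orders" is an assumed index name
for bucket in response["aggregations"]["customer_ids"]["buckets"]:
    stats = bucket["date_filter"]["min_max"]
    print(bucket["key"], stats.get("min_as_string"), stats.get("max_as_string"))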