I am trying to pull data from two arrays (date and price time series) after a certain date cutoff, oldest, using pymongo.
date_price_collection.aggregate([
    {'$match': {'ticker': ticker}},
    {
        '$project': {
            'dates': {'$gte': ['dates', oldest]},
            # prices to match these dates
        },
    },
])
The way the data is organized, there are two arrays of the same length, one for dates and one for prices. How can I also pull the prices that correspond to the dates > oldest?
Thank you very much for any advice!
What you could do is first $filter the date array, then use $slice to match the proper prices, like so:
const oldest = ...date;
db.collection.aggregate([
{
"$addFields": {
"dates": {
$filter: {
input: "$dates",
as: "date",
cond: {
$gte: [
"$$date",
oldest
]
}
}
}
}
},
{
$project: {
dates: 1,
prices: {
// take the last N prices, where N is the number of dates kept by $filter
$slice: [
"$prices",
{ "$multiply": [ -1, { $size: "$dates" } ] }
]
}
}
}
])
Mongo Playground
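Since the question is in pymongo, here is the same pipeline translated to Python. This is only a minimal sketch: it assumes the dates array is stored in ascending order (so the matching prices sit at the end of the prices array), and the ticker and cutoff values are hypothetical placeholders.
from datetime import datetime

oldest = datetime(2020, 1, 1)  # hypothetical cutoff, use your real value
ticker = "AAPL"                # hypothetical ticker for the $match stage

pipeline = [
    {"$match": {"ticker": ticker}},
    # Keep only the dates on or after the cutoff.
    {"$addFields": {
        "dates": {
            "$filter": {
                "input": "$dates",
                "as": "date",
                "cond": {"$gte": ["$$date", oldest]},
            }
        }
    }},
    # Take the last N prices, where N is the number of dates that survived the
    # filter (valid because the arrays are parallel and sorted ascending).
    {"$project": {
        "dates": 1,
        "prices": {"$slice": ["$prices", {"$multiply": [-1, {"$size": "$dates"}]}]},
    }},
]

results = list(date_price_collection.aggregate(pipeline))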
I am using mongoengine as the ORM with a Flask application. The model class is defined like this:
class MyData(db.Document):
    task_id = db.StringField(max_length=50, required=True)
    url = db.URLField(max_length=500, required=True, unique=True)
    organization = db.StringField(max_length=250, required=True)
    val = db.StringField(max_length=50, required=True)
The organization field can repeat, and I want to get the count of duplicates with respect to the values of another field. For example, if the data in MongoDB is like this:
[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
{"task_id":"rfre43fed","url":"https:example1.com","organization":"Avengers","val":"valid"},
{"task_id":"uyje3dsxs","url":"https:example2.com","organization":"Metro","val":"valid"},
{"task_id":"ghs563vt6","url":"https:example1.com","organization":"Avengers","val":"invalid"},
{"task_id":"erf6egy64","url":"https:example2.com","organization":"Metro","val":"null"}]
Then I am querying all the objects using
data = MyData.objects()
I want a response like
[{"url":"https:example1.com","Avengers":{"valid":1,"null":1,"invalid":1}},{"url":"https:example2.com",Metro":{"valid":1,"null":1,"invalid":0}}]
I tried this:
db.collection.aggregate([
{
"$group": {
"_id": "$organization",
"count": [
{
"null": {
"$sum": 1
},
"valid": {
"$sum": 1
},
"invalid": {
"$sum": 1
}
}
]
}
}
])
but I am getting an error
The field 'count' must be an accumulator object
Maybe something like this:
db.collection.aggregate([
{
"$group": {
"_id": {
k: "$organization",
v: "$val"
},
"cnt": {
$sum: 1
}
}
},
{
$project: {
_id: 0,
k: "$_id.k",
o: {
k: "$_id.v",
v: "$cnt"
}
}
},
{
$group: {
_id: "$k",
v: {
$push: "$o"
}
}
},
{
$addFields: {
v: {
"$arrayToObject": "$v"
}
}
},
{
$project: {
_id: 0,
new: [
{
k: "$_id",
v: "$v"
}
]
}
},
{
"$addFields": {
"new": {
"$arrayToObject": "$new"
}
}
},
{
"$replaceRoot": {
"newRoot": "$new"
}
}
])
Explained:
Group to count
Project for arrayToObject
Group to join the values
arrayToObject one more time
project additionally
arrayToObject to form the final object
project one more time
replaceRoot to move the object to root.
P.S.
Please note this solution does not show missing values if they do not exist; if you need the missing values, additional mapping / $mergeObjects needs to be added.
playground1
Option with missing values (if the possible values are fixed to null, valid, invalid):
just replace the second $addFields with:
{
$addFields: {
v: {
"$mergeObjects": [
{
"null": 0,
valid: 0,
invalid: 0
},
{
"$arrayToObject": "$v"
}
]
}
}
}
playground2
And with the url included:
playground3
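To run a pipeline like the one above from the mongoengine model in the question, you can go through the queryset's aggregate() method or the underlying pymongo collection. A minimal sketch, assuming the stages are exactly the ones shown above; note that older mongoengine versions expect the stages unpacked as aggregate(*pipeline):
pipeline = [
    {"$group": {"_id": {"k": "$organization", "v": "$val"}, "cnt": {"$sum": 1}}},
    # ... the remaining stages exactly as written in the aggregation above ...
]

# Through mongoengine:
results = list(MyData.objects.aggregate(pipeline))

# Or through the underlying pymongo collection:
results = list(MyData._get_collection().aggregate(pipeline))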
I have this task: "Write a MongoDB query to find the count of movies released after the year 1999". I am trying the different lines of code listed below, but none of them works. Any thoughts?
PS: the collection's name is movies, and the fields are the year and _id of the movies.
These are the lines I'm trying:
docs = db.movies.find({"year":{"$gt":"total"("1999")}}).count()
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"$1999"}}}])
docs = db.movies.count( {"year": { "$gt": "moviecount"("1999") } } )
docs = db.movies.find({"year":{"$gt":"1999"}})
docs = db.movies.aggregate([{"$group":{"_id":"$year","count":{"$gt":"1999"}}}])
You can do it with an aggregation.
try it here
[
{
"$match": {
"year": {
"$gt": "1999"
}
}
},
{
"$group": {
"_id": 1,
"count": {
"$sum": "$total"
}
}
}
]
The first stage of the pipeline is $match; it keeps only the documents with a year greater than 1999 (the year is compared as a string here, matching the question; if year is stored as a number, drop the quotes).
Then in the $group stage we count the matching documents by summing 1 for each of them.
The "_id": 1 is a dummy value because we are not grouping on any particular field; we just want a single overall count.
I have a document in MongoDB:
{
"company": "npcompany",
"department": [
{
"name": "it",
"employeeIds": [
"emp1",
"emp2",
"emp3"
]
},
{
"name": "economy",
"employeeIds": [
"emp1",
"emp3",
"emp4"
]
}
]
}
I want to find "emp4"; in this case I want to get the "economy" department data only. If I search for "emp1", then I want to get the "npcompany" and "economy" data. How can I do this in MongoDB (or pymongo)?
play
db.collection.aggregate([ // As you need to fetch all matching array elements, reshape them
{
$unwind: "$department"
},
{
"$match": {//look for match
"department.employeeIds": "emp4"
}
},
{
$group: { // regroup them
"_id": "$_id",
data: {
"$push": "$$ROOT"
}
}
}
])
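The question also mentions pymongo, so here is a minimal sketch of the same pipeline in Python, assuming a pymongo collection handle named collection:
employee_id = "emp4"  # the employee to look up

pipeline = [
    # Flatten the department array so each element can be matched on its own.
    {"$unwind": "$department"},
    # Keep only the departments containing the employee.
    {"$match": {"department.employeeIds": employee_id}},
    # Regroup the matching departments back under their original document.
    {"$group": {"_id": "$_id", "data": {"$push": "$$ROOT"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc)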
I want to retrieve the array object with the newest date for a particular document.
Sadly I can't solve it; I always end up with errors.
Date format: 2020-06-10T13:25:25.645+00:00 (the values are Python datetime objects, e.g. from datetime.now()).
Sample data
collection.insert_one(
{
"document_name": "My Document",
"status": [
{
"status_time": datetimeobject, # 2020-01-02T13:25:25.645+00:00
"status_title": "Sample Title 1"
},
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title"
}
]
})
What I've tried
result = collection.find_one({"document_name": "My Document"}, {"status": 1}).sort({"status.status_time": -1}).limit(1)
result = collection.find_one({"document_name": "My Document"}, {"$max": {"status.status_time": -1})
result = collection_projects.find_one({"document_name": "Document"}, {"status": {"$elemMatch": {"$max": "$´status_time"}}})
result = list(collection.find({"document_name": "Document"}, {"_id": 0, "status": 1}).limit(1))
result = collection_projects.find_one(
{"document_name": "My Document"},
{"status.status_time": {"$arrayElemAt": -1}})
Result I'm looking for
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title 2"
}
You need to use aggregation to achieve this:
Query 1 :
db.collection.aggregate([
/** Re-create `status` field with what is needed */
{
$addFields: {
status: {
$reduce: {
input: "$status", // Iterate on array
initialValue: { initialDate: ISODate("1970-06-09T17:56:34.350Z"), doc: {} }, // Create initial values
in: { // If the condition is met, keep the current value; otherwise keep the accumulator as is
initialDate: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this.status_time", "$$value.initialDate" ] },
doc: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this", "$$value" ] }
}
}
}
}
},
/**
* re-create `status` field from `$status.doc`
* Since it will always hold only one object, you can make `status` an object rather than an array
* If `status` needs to stay an array, use { status: [ "$status.doc" ] } instead
*/
{
$addFields: { status: "$status.doc" }
}
])
Test: mongoplayground
Ref: $reduce, pymongo
Query 2 :
db.collection.aggregate([
/** unwind on `status` array */
{
$unwind: {
path: "$status",
preserveNullAndEmptyArrays: true // preserves doc where `status` field is `[]` or null or missing (Optional)
}
},
/** sort on descending order */
{
$sort: { "status.status_time": -1 }
},
/** group on `_id` & pick first found doc */
{
$group: { _id: "$_id", doc: { $first: "$$ROOT" } }
},
/** make `doc` field as new root */
{
$replaceRoot: { newRoot: "$doc" }
}
])
Test: mongoplayground
Test both queries; I believe that on a huge dataset $unwind & $sort might be a bit slow, similar to iterating over a huge array.
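Since the question itself is in pymongo, here is a minimal sketch of Query 2 run from Python; the extra $match on document_name is an addition so that only the document from the question is processed:
pipeline = [
    # Added: restrict to the document from the question.
    {"$match": {"document_name": "My Document"}},
    # Unwind the status array so each entry becomes its own document.
    {"$unwind": {"path": "$status", "preserveNullAndEmptyArrays": True}},
    # Newest status first.
    {"$sort": {"status.status_time": -1}},
    # Keep the first (newest) entry per document.
    {"$group": {"_id": "$_id", "doc": {"$first": "$$ROOT"}}},
    {"$replaceRoot": {"newRoot": "$doc"}},
]

result = next(collection.aggregate(pipeline), None)
if result is not None:
    print(result["status"])  # the newest status object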
You will have to use aggregation with $reduce; this solution is similar to #whoami's, except there is no nested document when using $reduce:
db.collection.aggregate([
{
$match: {
document_name: "My Document"
}
},
{
$project: { // use $project if you only want the status, use $addFields if you want other fields as well
status: {
$reduce: {
input: "$status",
initialValue: null,
in: {
$cond: [
{
$gte: [
"$$this.status_time",
"$$value.status_time"
]
},
"$$this",
"$$value"
]
}
}
}
}
}
])
mongoplayground
Currently we are able to group by customer_id in Elasticsearch.
Following is the document structure
{
"order_id":"6",
"customer_id":"1",
"customer_name":"shailendra",
"mailing_addres":"shailendra#gmail.com",
"actual_order_date":"2000-04-30",
"is_veg":"0",
"total_amount":"2499",
"store_id":"276",
"city_id":"12",
"payment_mode":"cod",
"is_elite":"0",
"product":["1","2"],
"coupon_id":"",
"client_source":"1",
"vendor_id":"",
"vendor_name: "",
"brand_id":"",
"third_party_source":""
}
Now we need to filter the groups to find the documents where:
the last order date is between two dates
the first order date is between two dates
How can we achieve this?
You can try the query below. Within each customer bucket, we further filter all documents between two dates (here I've taken the month of August 2016) and then run a stats aggregation on the date field. The min value will be the first order date and the max value will be the last order date.
{
"aggs": {
"customer_ids": {
"terms": {
"field": "customer_id"
},
"aggs": {
"date_filter": {
"filter": {
"range": {
"actual_order_date": {
"gt": "2016-08-01T00:00:00.000Z",
"lt": "2016-09-01T00:00:00.000Z"
}
}
},
"aggs": {
"min_max": {
"stats": {
"field": "actual_order_date"
}
}
}
}
}
}
}
}
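For completeness, here is a minimal sketch of how this request could be sent from Python with the official elasticsearch client. The index name orders is an assumption, and depending on your client version the request may need to be passed as keyword arguments instead of body=:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,  # only the aggregations are needed, not the hits
    "aggs": {
        "customer_ids": {
            "terms": {"field": "customer_id"},
            "aggs": {
                "date_filter": {
                    "filter": {
                        "range": {
                            "actual_order_date": {
                                "gt": "2016-08-01T00:00:00.000Z",
                                "lt": "2016-09-01T00:00:00.000Z"
                            }
                        }
                    },
                    "aggs": {
                        "min_max": {"stats": {"field": "actual_order_date"}}
                    }
                }
            }
        }
    }
}

response = es.search(index="orders", body=query)  # "orders" is an assumed index name
for bucket in response["aggregations"]["customer_ids"]["buckets"]:
    stats = bucket["date_filter"]["min_max"]
    print(bucket["key"], stats.get("min_as_string"), stats.get("max_as_string"))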