Is it possible to calculate a first order derivative using the aggregate framework?
For example, I have the data:
{time_series : [10,20,40,70,110]}
I'm trying to obtain an output like:
{derivative : [10,20,30,40]}
db.collection.aggregate(
[
{
"$addFields": {
"indexes": {
"$range": [
0,
{
"$size": "$time_series"
}
]
},
"reversedSeries": {
"$reverseArray": "$time_series"
}
}
},
{
"$project": {
"derivatives": {
"$reverseArray": {
"$slice": [
{
"$map": {
"input": {
"$zip": {
"inputs": [
"$reversedSeries",
"$indexes"
]
}
},
"in": {
"$subtract": [
{
"$arrayElemAt": [
"$$this",
0
]
},
{
"$arrayElemAt": [
"$reversedSeries",
{
"$add": [
{
"$arrayElemAt": [
"$$this",
1
]
},
1
]
}
]
}
]
}
}
},
{
"$subtract": [
{
"$size": "$time_series"
},
1
]
}
]
}
},
"time_series": 1
}
}
]
)
We can use the pipeline above in version 3.4+ to do this.
In the pipeline, we use the $addFields stage to add an array of the "time_series" elements' indexes to the document; we also reverse the time series array and add it to the document, using the $range and $reverseArray operators respectively.
We reverse the array here because in the original array the element at position p is always smaller than the element at position p+1, which means that [p] - [p+1] < 0, and we do not want to use $multiply here (see the pipeline for version 3.2).
Next we $zip the time series data with the indexes array and apply a subtract expression to the resulting array using the $map operator.
We then $slice the result to discard the null value from the array and re-reverse the result.
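To make the mechanics concrete, here is a plain-Python sketch of what that pipeline computes for the sample document (illustration only; the real work is done server-side by the stages above):
# Client-side illustration of the 3.4+ pipeline logic on the sample data.
time_series = [10, 20, 40, 70, 110]
reversed_series = time_series[::-1]        # $reverseArray -> [110, 70, 40, 20, 10]
indexes = range(len(time_series))          # $range -> [0, 1, 2, 3, 4]
pairs = zip(reversed_series, indexes)      # $zip -> (110, 0), (70, 1), ...
diffs = [value - reversed_series[index + 1] if index + 1 < len(reversed_series) else None
         for value, index in pairs]        # $map + $subtract -> [40, 30, 20, 10, None]
derivatives = diffs[:len(time_series) - 1][::-1]  # $slice + $reverseArray -> [10, 20, 30, 40]
print(derivatives)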
In 3.2 we can use the $unwind operator to unwind our array and include the index of each element in the array by specifying a document as operand instead of the traditional "path" prefixed by $.
Next in the pipeline, we need to $group our documents and use the $push accumulator operator to return an array of sub-documents that look like this:
{
"_id" : ObjectId("57c11ddbe860bd0b5df6bc64"),
"time_series" : [
{ "value" : 10, "index" : NumberLong(0) },
{ "value" : 20, "index" : NumberLong(1) },
{ "value" : 40, "index" : NumberLong(2) },
{ "value" : 70, "index" : NumberLong(3) },
{ "value" : 110, "index" : NumberLong(4) }
]
}
Finally comes the $project stage. In this stage, we use the $map operator to apply a series of expressions to each element of the array newly computed in the $group stage.
Here is what is going on inside the $map (think of $map as a for loop):
For each subdocument, we use the $let variable operator to assign the next element in the array to a variable, then subtract its "value" field from the current element's "value" field.
Since the next element in the array is the element at the current index plus one, all we need is the $arrayElemAt operator and a simple addition of the current element's index and 1.
The $subtract expression returns a negative value, so we need to multiply it by -1 using the $multiply operator.
We also need to $filter the resulting array because its last element is null: when the current element is the last element, $subtract returns null because the index of the next element equals the size of the array.
db.collection.aggregate([
{
"$unwind": {
"path": "$time_series",
"includeArrayIndex": "index"
}
},
{
"$group": {
"_id": "$_id",
"time_series": {
"$push": {
"value": "$time_series",
"index": "$index"
}
}
}
},
{
"$project": {
"time_series": {
"$filter": {
"input": {
"$map": {
"input": "$time_series",
"as": "el",
"in": {
"$multiply": [
{
"$subtract": [
"$$el.value",
{
"$let": {
"vars": {
"nextElement": {
"$arrayElemAt": [
"$time_series",
{
"$add": [
"$$el.index",
1
]
}
]
}
},
"in": "$$nextElement.value"
}
}
]
},
-1
]
}
}
},
"as": "item",
"cond": {
"$gte": [
"$$item",
0
]
}
}
}
}
}
])
Another option, which I think is less efficient, is to perform a map/reduce operation on our collection using the map_reduce method.
>>> import pymongo
>>> from bson.code import Code
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> mapper = Code("""
... function() {
... var derivatives = [];
... for (var index=1; index<this.time_series.length; index++) {
... derivatives.push(this.time_series[index] - this.time_series[index-1]);
... }
... emit(this._id, derivatives);
... }
... """)
>>> reducer = Code("""
... function(key, value) {}
... """)
>>> for res in collection.map_reduce(mapper, reducer, out={'inline': 1})['results']:
... print(res) # or do something with the document.
...
{'value': [10.0, 20.0, 30.0, 40.0], '_id': ObjectId('57c11ddbe860bd0b5df6bc64')}
You can also retrieve all the documents and use numpy.diff to compute the derivative, like this:
import numpy as np
for document in collection.find({}, {'time_series': 1}):
result = np.diff(document['time_series'])
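If you also want to persist that result in the shape asked for in the question, here is a small follow-up sketch (the derivative field name is an assumption taken from the desired output; converting to a plain list avoids BSON encoding issues with numpy integers):
import numpy as np
# Hedged sketch: compute the derivative client-side and store it back on each document.
for document in collection.find({}, {'time_series': 1}):
    derivative = np.diff(document['time_series']).tolist()  # numpy int64 -> plain int for BSON
    collection.update_one({'_id': document['_id']}, {'$set': {'derivative': derivative}})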
It's a bit dirty, but perhaps something like this?
use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})
var mapF = function() {
emit(this.id, this.time_series);
emit(this.id, this.time_series);
};
var reduceF = function(key, values){
var n = values[0].length;
var ret = [];
for(var i = 0; i < n-1; i++){
ret.push( values[0][i+1] - values[0][i] );
}
return {'gradient': ret};
};
var finalizeF = function(key, val){
return val.gradient;
}
db['data'].mapReduce(
mapF,
reduceF,
{ out: 'data_d1', finalize: finalizeF }
)
db['data_d1'].find({})
The "strategy" here is to emit the data to be operated on twice so that it is accessible in the reduce stage, return an object to avoid the message "reduce -> multiple not supported yet" and then filter back the array in the finalizer.
This script then produces:
MongoDB shell version: 3.2.9
connecting to: test
switched to db test_db
WriteResult({ "nRemoved" : 1 })
WriteResult({ "nInserted" : 1 })
{
"result" : "data_d1",
"timeMillis" : 13,
"counts" : {
"input" : 1,
"emit" : 2,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
{ "_id" : 1, "value" : [ 10, 20, 30, 40 ] }
bye
Alternatively, one could move all the processing into the finalizer (reduceF is not called here since mapF is assumed to emit unique keys):
use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})
var mapF = function() {
emit(this.id, this.time_series);
};
var reduceF = function(key, values){
};
var finalizeF = function(key, val){
var x = val;
var n = x.length;
var ret = [];
for(var i = 0; i < n-1; i++){
ret.push( x[i+1] - x[i] );
}
return ret;
}
db['data'].mapReduce(
mapF,
reduceF,
{ out: 'data_d1', finalize: finalizeF }
)
db['data_d1'].find({})
I am using mongoengine as the ORM with a Flask application. The model class is defined like:
class MyData(db.Document):
task_id = db.StringField(max_length=50, required=True)
url = db.URLField(max_length=500,required=True,unique=True)
organization = db.StringField(max_length=250,required=True)
val = db.StringField(max_length=50, required=True)
The field organization can repeat, and I want to get the count of duplicates with respect to values in another field. For example, if the data in mongodb is like
[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
{"task_id":"rfre43fed","url":"https:example1.com","organization":"Avengers","val":"valid"},
{"task_id":"uyje3dsxs","url":"https:example2.com","organization":"Metro","val":"valid"},
{"task_id":"ghs563vt6","url":"https:example1.com","organization":"Avengers","val":"invalid"},
{"task_id":"erf6egy64","url":"https:example2.com","organization":"Metro","val":"null"}]
Then I am querying all the objects using
data = MyData.objects()
I want a response like
[{"url":"https:example1.com","Avengers":{"valid":1,"null":1,"invalid":1}},{"url":"https:example2.com",Metro":{"valid":1,"null":1,"invalid":0}}]
I tried:
db.collection.aggregate([
{
"$group": {
"_id": "$organization",
"count": [
{
"null": {
"$sum": 1
},
"valid": {
"$sum": 1
},
"invalid": {
"$sum": 1
}
}
]
}
}
])
but I am getting an error:
The field 'count' must be an accumulator object
Maybe something like this:
db.collection.aggregate([
{
"$group": {
"_id": {
k: "$organization",
v: "$val"
},
"cnt": {
$sum: 1
}
}
},
{
$project: {
_id: 0,
k: "$_id.k",
o: {
k: "$_id.v",
v: "$cnt"
}
}
},
{
$group: {
_id: "$k",
v: {
$push: "$o"
}
}
},
{
$addFields: {
v: {
"$arrayToObject": "$v"
}
}
},
{
$project: {
_id: 0,
new: [
{
k: "$_id",
v: "$v"
}
]
}
},
{
"$addFields": {
"new": {
"$arrayToObject": "$new"
}
}
},
{
"$replaceRoot": {
"newRoot": "$new"
}
}
])
Explained:
Group to count per organization/val pair
Project the k/v pairs needed for $arrayToObject
Group to join the values per organization
$arrayToObject to build the per-organization counts object
Project to a k/v array keyed by the organization
$arrayToObject one more time to form the final object
$replaceRoot to move the object to the root.
P.S.
Please note this solution does not show the missing values if they do not exist; if you need the missing values, additional mapping / $mergeObjects needs to be added.
playground1
Option with missing values (if the possible values are fixed to null, valid, invalid):
just replace the second $addFields with:
{
$addFields: {
v: {
"$mergeObjects": [
{
"null": 0,
valid: 0,
invalid: 0
},
{
"$arrayToObject": "$v"
}
]
}
}
}
playground2
And with the url included as well:
playground3
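Since the question goes through mongoengine, here is a minimal sketch of running the pipeline from Python rather than the shell; it assumes a reasonably recent mongoengine where QuerySet.aggregate accepts the pipeline as a list (older releases take the stages as separate arguments), with the raw pymongo collection as a fallback:
# Assumption-laden sketch: executing the aggregation through mongoengine.
pipeline = [
    {"$group": {"_id": {"k": "$organization", "v": "$val"}, "cnt": {"$sum": 1}}},
    # ... the remaining stages exactly as in the shell pipeline above ...
]
results = list(MyData.objects.aggregate(pipeline))               # recent mongoengine
# results = list(MyData._get_collection().aggregate(pipeline))   # raw pymongo fallback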
I'm looking to update one field in an array nested inside another array.
db.germain.updateOne({}, {$set: { "items.$[elem].sub_items.price" : 2}}, {arrayFilters: [ { "elem.sub_item_name": "my_item_two_one" } ] } )
It finds one document but it doesn't update.
{
"_id" : ObjectId("4faaba123412d654fe83hg876"),
"user_id" : 123456,
"items" : [
{
"item_name" : "my_item_one",
"sub_items" : [
{
"sub_item_name" : "my_item_one_one",
"price" : 20
},
]
},
{
"item_name" : "my_item_two",
"sub_items" : [
{
"sub_item_name" : "my_item_two_one",
"price" : 30
},
{
"sub_item_name" : "my_item_two_two",
"price" : 50
},
]
}
]
}
Actually it's a nested array, so you need to specify which object in the parent array should be changed, and which object inside that parent object's sub-array should be changed.
db.collection.update({},
{
$set: {
"items.$[parent].sub_items.$[child].price": 2
}
},
{
arrayFilters: [
{ "parent.item_name": "my_item_two" },
{ "child.sub_item_name": "my_item_two_one" }
]
})
If you need to change the whole object in the parent array, you can simply use the $ positional operator, as in the sketch below.
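For illustration, here is a minimal pymongo sketch of that positional-operator form, with field values taken from the sample document (the replacement object itself is just a placeholder):
# Hedged sketch: replace the first matching object in the parent `items` array
# using the positional operator `$`; `collection` is assumed to be the target collection.
collection.update_one(
    {"user_id": 123456, "items.item_name": "my_item_two"},
    {"$set": {"items.$": {"item_name": "my_item_two", "sub_items": []}}}  # placeholder object
)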
I want to retrieve the array object with the newest date for a particular document.
But I sadly can't solve it; I always end up with errors.
Date format: 2020-06-10T13:25:25.645+00:00 (stored via datetime.now())
Sample data
collection.insert_one(
{
"document_name": "My Document",
"status": [
{
"status_time": datetimeobject, # 2020-01-02T13:25:25.645+00:00
"status_title": "Sample Title 1"
},
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title"
}
]
})
What I've tried
result = collection.find_one({"document_name": "My Document"}, {"status": 1}).sort({"status.status_time": -1}).limit(1)
result = collection.find_one({"document_name": "My Document"}, {"$max": {"status.status_time": -1})
result = collection_projects.find_one({"document_name": "Document"}, {"status": {"$elemMatch": {"$max": "$´status_time"}}})
result = list(collection.find({"document_name": "Document"}, {"_id": 0, "status": 1}).limit(1))
result = collection_projects.find_one(
{"document_name": "My Document"},
{"status.status_time": {"$arrayElemAt": -1}})
Result I'm looking for
{
"status_time": datetimeobject, # 2020-06-10T13:25:25.645+00:00
"status_title": "Sample Title 2"
}
You need to use aggregation to achieve this:
Query 1:
db.collection.aggregate([
/** Re-create `status` field with what is needed */
{
$addFields: {
status: {
$reduce: {
input: "$status", // Iterate on array
initialValue: { initialDate: ISODate("1970-06-09T17:56:34.350Z"), doc: {} }, // Create initial values
in: { // If the condition is met take the current value, otherwise keep the accumulator as is
initialDate: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this.status_time", "$$value.initialDate" ] },
doc: { $cond: [ { $gt: [ "$$this.status_time", "$$value.initialDate" ] }, "$$this", "$$value.doc" ] }
}
}
}
}
},
/**
* re-create `status` field from `$status.doc`
* Since it will always hold only one object, you can make `status` an object rather than an array
* If `status` needs to stay an array, do { status: [ "$status.doc" ] } instead
*/
{
$addFields: { status: "$status.doc" }
}
])
Test: mongoplayground
Ref: $reduce, pymongo
Query 2:
db.collection.aggregate([
/** unwind on `status` array */
{
$unwind: {
path: "$status",
preserveNullAndEmptyArrays: true // preserves doc where `status` field is `[]` or null or missing (Optional)
}
},
/** sort on descending order */
{
$sort: { "status.status_time": -1 }
},
/** group on `_id` & pick first found doc */
{
$group: { _id: "$_id", doc: { $first: "$$ROOT" } }
},
/** make `doc` field as new root */
{
$replaceRoot: { newRoot: "$doc" }
}
])
Test: mongoplayground
Test both queries; I believe that on a huge dataset $unwind & $sort might be a bit slow, similar to iterating over a huge array.
You will have to use aggregate with $reduce; this solution is similar to @whoami's, except there is no nested document when using $reduce.
db.collection.aggregate([
{
$match: {
document_name: "My Document"
}
},
{
$project: { // use $project if you only want the status, use $addFields if you want other fields as well
status: {
$reduce: {
input: "$status",
initialValue: null,
in: {
$cond: [
{
$gte: [
"$$this.status_time",
"$$value.status_time"
]
},
"$$this",
"$$value"
]
}
}
}
}
}
])
mongoplayground
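As a quick way to sanity-check what these pipelines should return, you can also fetch the array for the one document from pymongo and pick the newest entry client-side (a sketch, assuming the collection object from the question):
# Client-side sanity check: take the status entry with the newest status_time.
doc = collection.find_one({"document_name": "My Document"}, {"status": 1})
if doc and doc.get("status"):
    newest = max(doc["status"], key=lambda s: s["status_time"])
    print(newest)  # {"status_time": datetime(...), "status_title": "Sample Title"}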
Hi, I have a MongoDB collection matchedpairs with a data structure as follows:
Each document defines a pairwise connection, i.e. 1 is connected with 2, 2 is connected with 10, etc. There are a large number of relationships defined.
{
x:1,
y:2
},
{
x:2,
y:10
},
{
x:9,
y:10
},
{
x:8,
y:4
}
I would like to query the documents and retrieve the unique disjoint sets for the pairs, i.e. return a result like this
{
set:[1,2,9,10]
},
{
set:[8,4]
}
I am familiar with the aggregation framework, but cannot see how to create the correct accumulator in the $group stage to create the disjoint sets. The attempt below simply gives just one grouping of similar pairs. As I see it I would have to create a whole string of $group stages (depending upon my set of data) to get the result I am looking for. Any clever ideas here?
db.matchedpairs.aggregate([
{
'$group': {
'_id': '$y',
'like': {
'$addToSet': '$x'
},
'from': {
'$addToSet': '$y'
}
}
}, {
'$project': {
'_id': 0,
'set': {
'$setUnion': [
'$like', '$from'
]
}
}
}
])
gives:
{
set:[4,8]
},
{
set:[10,2,9]
},
{
set:[1,2]
}
Maybe it would be beneficial to convert the pairs into an array first; then a map/reduce or a custom script can be used.
db.matchedpairs.aggregate([
{ $project:{'set':['$x','$y']}},
{
'$group': {
'_id': '1',
'list': {
'$addToSet': '$set'
}
}
},
{
$out:'matchedpairs2'
}
]);
//gives => matchedpairs2
{
"_id" : "1",
"list" : [
[
1,
2
],
[
9,
10
],
[
2,
10
],
[
8,
4
]
]
}
var map = function() {
emit("list", this.list);
};
var emit = function(key, value) {
const result = [];   // pairs that share at least one value with another pair
const result2 = [];  // pairs that contain at least one value found in no other pair
value.map((item, i) => {
const distinct = value.filter((w, j) => i != j);           // every pair except the current one
const convertset = [...new Set([].concat(...distinct))];   // all values appearing in the other pairs
const b = new Set(convertset);
const intersection = item.filter(x => b.has(x));           // values of this pair shared with other pairs
const diff = item.filter(x => !b.has(x));                  // values of this pair unique to it
if (intersection.length > 0) result.push(item);
if (diff.length > 0) result2.push(item);
});
const set1 = [...new Set([].concat(...result))];   // union of all pairs linked to another pair (first set)
const set2 = [...new Set([].concat(...result2))];
const w = new Set(set1);
const diff2 = set2.filter(x => !w.has(x));         // values that are not in set1
const finalset = [...new Set([].concat(...diff2))] // the remaining values (second set)
print(set1);
print(finalset);
};
var myCursor = db.matchedpairs2.find({});
while (myCursor.hasNext()) {
var doc = myCursor.next();
map.apply(doc);
}
Result:
/* 1 */
[
9.0,
10.0,
1.0,
2.0
]
/* 2 */
[
8.0,
4.0
]
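As the answer suggests, a custom client-side script is probably the simplest route, and it can generalize beyond two sets. Here is a minimal union-find sketch in Python (pymongo assumed; database and collection names taken from the question) that computes the disjoint sets directly from the pair documents:
# Assumption-laden sketch, not part of the original answer: client-side union-find.
from collections import defaultdict
from pymongo import MongoClient

db = MongoClient().test  # database name assumed

parent = {}

def find(a):
    # Follow parent pointers to the root, compressing the path along the way.
    while parent.setdefault(a, a) != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def union(a, b):
    parent[find(a)] = find(b)

for pair in db.matchedpairs.find({}, {"x": 1, "y": 1}):
    union(pair["x"], pair["y"])

groups = defaultdict(set)
for value in list(parent):
    groups[find(value)].add(value)

print([sorted(group) for group in groups.values()])  # e.g. [[1, 2, 9, 10], [4, 8]]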
I have a DB in MongoDB that has 3 levels, and I want to get the value from the last level. The structure is the following:
{
"_id" : "10000",
"Values" : [
{
"Value1" : "Article 1",
"Value2" : [
{
"Value2_1" : 1,
"Value2_2" : 2,
}
]
}
]
}
I need to get the value from the field "Value2_1".
So far my code is the following:
for row in collection.find({"_id":1, "Values.Value2.Value2_1":1}):
print(row)
The output is always "None".
Any ideas about how to make the correct query?
Thanks!
By using dot (.) notation you can get your expected result.
db.collection.find({"Values.Value2.Value2_1" : 100})
The above query will select all documents where the Values array has a Value2 array and Value2 has a Value2_1 whose value equals 100.
Output:
{
"_id" : ObjectId("5b86bd1172876096c7a9d6cf"),
"Values" : [
{
"Value1" : "Article 1",
"Value2" : [
{
"Value2_1" : 100.0,
"Value2_2" : 200.0
},
{
"Value2_1" : 15.0,
"Value2_2" : 25.0
}
]
}
]
}
And if you search by _id, you don't need the second condition, because _id is by definition unique. (Note, by the way, that in your sample document _id is the string "10000", not the number 1, which is one reason your original query matched nothing.)
The following query will also show the same result as above.
db.collection.find({"_id" : ObjectId("5b86bd1172876096c7a9d6cf")})
If you specifically want to get only those items from the inner array which fulfill your inner conditions, you can aggregate the query.
PS - My 2 cents - I do not know if you require this, as I could not tell that clearly from your question; I just thought you might be asking this.
db.coll.aggregate([{
$unwind: '$Values'
}, {
$project: {
'Values_F': {
$filter: {
input: "$Values.Value2",
as: "value2",
cond: {
$eq: ["$$value2.Value2_1", 1]
}
}
}
}
}, {
$project: {
'Values_F': 1,
'total': {
$size: '$Values_F'
}
}
}, {
$match: {
total: {
$gte: 1
}
}
}
])
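And since the question itself is in pymongo, once a matching document comes back you can pull the nested Value2_1 values out on the client; a small sketch (note that in the question's sample document _id is the string "10000", not the number 1):
# Hedged sketch: extract every Value2_1 from the nested arrays of one document.
doc = collection.find_one({"_id": "10000"})
if doc:
    values = [inner["Value2_1"]
              for outer in doc.get("Values", [])
              for inner in outer.get("Value2", [])]
    print(values)  # [1] for the sample document in the question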