I'm unwinding one field which is an array of date objects; in some cases the array is empty, which is fine. I'd like the same treatment using a pipeline, except that in some cases I want to filter for the results which have an empty array.
pipeline = []
pipeline.append({"$unwind": "$date_object"})
pipeline.append({"$sort": {"date_object" : 1}})
I want to use the pipeline format; however, the following code does not return any records:
pipeline.append({"$match": {"date_object": {'$exists': False }}})
nor does the following work:
pipeline.append({"$match": {"date_object": []}})
and then:
results = mongo.db.xxxx.aggregate(pipeline)
I'm also trying:
pipeline.append({ "$cond" : [ { "$eq" : [ "$date_object", [] ] }, [ { '$value' : 0 } ], '$date_object' ] } )
But with this I get the following error:
.$cmd failed: exception: Unrecognized pipeline stage name: '$cond'
However, if I query using find, such as find({"date_object": []}), I do get these results. How can I make this work with the pipeline?
I've done this in the MongoDB shell, but it can be translated into Python easily.
Is this what you require?
I suppose you have a structure like this:
db.collection.save({foo:1, date_object:[new Date(), new Date(2016,1,1,1,0,0,0)]})
db.collection.save({foo:2, date_object:[new Date(2016,0,16,1,0,0,0), new Date(2016,0,5,1,0,0,0)]})
db.collection.save({foo:3, date_object:[]})
db.collection.save({foo:4, date_object:[new Date(2016,1,5,1,0,0,0), new Date(2016,1,6,1,0,0,0)]})
db.collection.save({foo:5, date_object:[]})
// Get empty arrays after unwind
db.collection.aggregate([
    {$project: {
        _id: "$_id",
        foo: "$foo",
        date_object: {
            $cond: [{$eq: [{$size: "$date_object"}, 0]}, [null], "$date_object"]
        }
    }},
    {$unwind: "$date_object"},
    {$match: {"date_object": null}}
])
// Get empty arrays before unwind
db.collection.aggregate([
    {$match: {"date_object.0": {$exists: false}}},
    {$project: {
        _id: "$_id",
        foo: "$foo",
        date_object: {
            $cond: [{$eq: [{$size: "$date_object"}, 0]}, [null], "$date_object"]
        }
    }},
    {$unwind: "$date_object"}
])
Output (only the empty date_object documents):
[
{
"_id" : ObjectId("56eb0bd618d4d09d4b51087a"),
"foo" : 3,
"date_object" : null
},
{
"_id" : ObjectId("56eb0bd618d4d09d4b51087c"),
"foo" : 5,
"date_object" : null
}
]
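Since the question uses PyMongo, here is the second pipeline translated to Python as a sketch, reusing the question's mongo.db.xxxx collection and pipeline.append style:
pipeline = []
# keep only documents whose date_object array is empty
pipeline.append({"$match": {"date_object.0": {"$exists": False}}})
# replace the empty array with [None] so $unwind does not drop the document
pipeline.append({"$project": {
    "foo": "$foo",
    "date_object": {
        "$cond": [{"$eq": [{"$size": "$date_object"}, 0]}, [None], "$date_object"]
    }
}})
pipeline.append({"$unwind": "$date_object"})
results = mongo.db.xxxx.aggregate(pipeline)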
In the end, if you need only the empty date_object documents, you don't need to aggregate at all; you can easily achieve it with find:
db.collection.find({"date_object.0": {$exists: false}}, {date_object: 0})
Output
{
"_id" : ObjectId("56eb0bd618d4d09d4b51087a"),
"foo" : 3
}
{
"_id" : ObjectId("56eb0bd618d4d09d4b51087c"),
"foo" : 5
}
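And the same find in PyMongo, again assuming the question's mongo.db.xxxx collection:
# documents with an empty date_object array, with the array excluded from the output
results = mongo.db.xxxx.find({"date_object.0": {"$exists": False}}, {"date_object": 0})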
JSON OUTPUT:
${response}= [
{
"Name":"7122Project",
"checkBy":[
{
"keyId":"NA",
"target":"1232"
}
],
"Enabled":false,
"aceess":"123"
},
{
"Name":"7122Project",
"checkBy":[
{
"keyId":"_GU6S3",
"target":"123"
}
],
"aceess":"11222",
"Enabled":false
},
{
"Name":"7122Project",
"checkBy":[
{
"keyId":"-1lLUy",
"target":"e123"
}
],
"aceess":"123"
}
]
I need to get the keyId values from the JSON without using a hardcoded index, using Robot Framework.
I did
${ID}= set variable ${response[0]['checkBy'][0]['keyId']}
But I need to check the length, get all keyId values, and store the values that do not contain NA.
How can I check the length and use a for loop in Robot Framework?
I suppose you can have more elements in checkBy arrays, like so:
response = [
{
"Name":"7122Project",
"checkBy": [
{
"keyId": "NA",
"target": "1232"
}
],
"Enabled": False,
"aceess": "123"
},
{
"Name": "7122Project",
"checkBy": [
{
"keyId": "_GUO6g6S3",
"target": "123"
}
],
"aceess": "11222",
"Enabled": False
},
{
"Name": "7122Project",
"checkBy": [
{
"keyId": "-1lLlZOUy",
"target": "e123"
},
{
"keyId": "test",
"target": "e123"
}
],
"aceess": "123"
}
]
then you can get all keyIds in Python with this code:
def get_key_ids(response):
    checkbys = [x["checkBy"] for x in response]
    key_ids = []
    for check_by in checkbys:
        for key_id in check_by:
            key_ids.append(key_id["keyId"])
    return key_ids
For the example above, it will return: ['NA', '_GUO6g6S3', '-1lLlZOUy', 'test_NA'].
You want to get both ids with NA and without NA, so perhaps you can change the function a bit:
def get_key_ids(response, predicate):
    checkbys = [x["checkBy"] for x in response]
    key_ids = []
    for check_by in checkbys:
        for key_id in check_by:
            if predicate(key_id["keyId"]):
                key_ids.append(key_id["keyId"])
    return key_ids
and use it like so:
get_key_ids(response, lambda id: id == "NA") # ['NA']
get_key_ids(response, lambda id: id != "NA") # ['_GUO6g6S3', '-1lLlZOUy', 'test_NA']
get_key_ids(response, lambda id: "NA" in id) # ['NA', 'test_NA']
get_key_ids(response, lambda id: "NA" not in id) # ['_GUO6g6S3', '-1lLlZOUy']
Now it's just a matter of creating a library and importing it into RF. You can get inspiration in the official documentation.
But I need to check the length, get all keyId values, and store the values that do not contain NA
I don't completely understand what you are up to. Do you mean length of keyId strings, like "NA" and its length of 2, or the number of keyIds in the response?
How can I check the length and use a for loop in Robot Framework?
You can use the Should Be Equal * keywords from the BuiltIn library. Some examples of for loops can be found in the user guide.
Now you should have all the parts you need to accomplish your task, you can try to put it all together.
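As a starting point, here is a minimal sketch of such a library file (the file name KeyIdLibrary.py and the keyword name are my own invention, not an existing library):
# KeyIdLibrary.py -- hypothetical library file placed next to your test suite
def get_key_ids_without_na(response):
    """Return every keyId value that does not contain 'NA'."""
    key_ids = []
    for item in response:
        for check_by in item["checkBy"]:
            if "NA" not in check_by["keyId"]:
                key_ids.append(check_by["keyId"])
    return key_ids
Imported with Library    KeyIdLibrary.py, the function becomes available in the suite as the keyword Get Key Ids Without Na.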
Is there any way to use $cond along with ($set, $inc, ...) operators in update? (MongoDB 4.2)
I want to update a field in my document by $inc-ing it with myDataInt if a condition holds, and otherwise keep it as it is:
db.mycoll.update(
    {"_id" : "5e9e5da03da783817d231dc4"},
    {"$inc" : {
        "my_data_sum" : {
            "$cond" : [
                {"$ne" : ["$snapshot_time", new_snapshot_time]},
                myDataInt,
                0
            ]
        }
    }},
    upsert=True, multi=False
)
However, this gives an error in pymongo:
raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: The dollar ($) prefixed field '$cond' in 'my_data_sum.$cond' is not valid for storage.
Any idea to avoid using find() before update in this case?
Update:
If I use the approach that Joe mentioned, an exception is raised in PyMongo (v3.10.1) due to passing a 'list' to update_many() instead of a 'dict':
from pymongo import MongoClient
db = MongoClient()['mydb']
db.mycoll.update_many(
    {"_id" : "5e9e5da03da783817d231dc4"},
    [{"$set" : {
        "my_data_sum" : {
            "$sum": [
                "$my_data_sum",
                {"$cond" : [
                    {"$ne" : ["$snapshot_time", new_snapshot_time]},
                    myDataInt,
                    0
                ]}
            ]
        }
    }}],
    upsert=True
)
That ends up with this error:
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 1076, in update_many session=session),
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 856, in _update_retryable _update, session)
File "/usr/local/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1491, in _retryable_write return self._retry_with_session(retryable, func, s, None)
File "/usr/local/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session return func(session, sock_info, retryable)
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 852, in _update retryable_write=retryable_write)
File "/usr/local/lib64/python3.6/site-packages/pymongo/collection.py", line 823, in _update _check_write_command_response(result)
File "/usr/local/lib64/python3.6/site-packages/pymongo/helpers.py", line 221, in _check_write_command_response _raise_last_write_error(write_errors)
File "/usr/local/lib64/python3.6/site-packages/pymongo/helpers.py", line 203, in _raise_last_write_error raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: Modifiers operate on fields but we found type array instead. For example: {$mod: {<field>: ...}} not {$set: [ { $set: { my_data_sum: { $sum: [ "$my_data_sum", { $cond: [ { $ne: [ "$snapshot_time", 1586910283 ] }, 1073741824, 0 ] } ] } } } ]}
If you are using MongoDB 4.2, you can use aggregation operators with updates. $inc is not an aggregation operator, but $sum is. To specify a pipeline, pass an array as the second argument to update:
db.coll.update(
{"_id" : "5e9e5da03da783817d231dc4"},
[{"$set" : {
"my_data_sum" : {
"$sum": [
"$my_data_sum",
{"$cond" : [
{"$ne" : ["snapshot_time", new_snapshot_time]},
myDataInt,
0
]}
]
}
}}],
{upsert:true, multi:false}
)
After spending some time searching online, I figured out that the update_many(), update_one(), and update() methods of the Collection object in PyMongo do not accept a list as the update parameter, which the new aggregation-pipeline form of the update operation in MongoDB 4.2+ requires. (At least this option is not available in PyMongo v3.10 yet.)
However, it looks like I can use the command method of the Database object in PyMongo, which corresponds to MongoDB's runCommand, and it worked just fine for me:
from pymongo import MongoClient
db = MongoClient()['mydb']
result = db.command(
{
"update" : "mycoll",
"updates" : [{
"q" : {"_id" : "5e9e5da03da783817d231dc4"},
"u" : [
{"$set" : {
"my_data_sum" : {
"$sum": [
"$my_data_sum",
{"$cond" : [
{"$ne" : ["snapshot_time", new_snapshot_time]},
myDataInt,
0
]}
]
}
}}
],
"upsert" : True,
"multi" : True
}],
"ordered": False
}
)
The command method of the Database object takes a dict of the full command as its first argument, and the aggregation-pipeline list can be included inside that dict (q is the update query, and u defines the fields to be updated).
result is a dictionary containing the acknowledgement message from MongoDB, which includes 'nModified', 'upserted', and 'writeErrors'.
https://mongoplayground.net/p/1AklFKuhFi6
[
{
"id": 1,
"like": 3
},
{
"id": 2,
"like": 1
}
]
let value = 1
If you want to increment instead, use:
value = -1 * value
db.collection.aggregate([
    {"$match": {"id": 1}},
    {"$set": {
        "count": {
            $cond: {
                if: {$gt: ["$like", 0]},
                then: {"$subtract": ["$like", value]},
                else: 0
            }
        }
    }}
])
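A rough PyMongo equivalent of the same conditional decrement, assuming collection is a handle to the same collection:
value = 1  # use -1 to increment instead of decrement

results = list(collection.aggregate([
    {"$match": {"id": 1}},
    {"$set": {
        "count": {
            "$cond": {
                # only subtract while like is still positive
                "if": {"$gt": ["$like", 0]},
                "then": {"$subtract": ["$like", value]},
                "else": 0,
            }
        }
    }},
]))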
I need to get a value inside a URL (/some/url/value, as a sub-resource) and make it usable as a parameter in an aggregation $match:
event/mac/11:22:33:44:55:66 --> {value:'11:22:33:44:55:66'}
and then:
{"$match":{"MAC":"$value"}},
Here is a non-working example:
event = {
'url': 'event/mac/<regex("([\w:]+)"):value>',
'datasource': {
'source':"event",
'aggregation': {
'pipeline': [
{"$match": {"MAC":"$value"}},
{"$group": {"_id":"$MAC", "total": {"$sum": "$count"}}},
]
}
}
}
This example works correctly with:
event/mac/blablabla?aggregate={"$value":"aa:11:bb:22:cc:33"}
Any suggestions?
The real quick and easy way would be:
path = "event/mac/11:22:33:44:55:66"
value = path.replace("event/mac/", "")
# or
value = path.split("/")[-1]
I need to run the following query on a MongoDB server:
QUERY = {
"$and" : [
{"x" : {'$gt' : 1.0}},
{"y" : {'$gt' : 0.1}},
{"$where" : 'this.s1.length < this.s2.length+3'}
]
}
This query is very slow, due to the JavaScript expression which the server needs to execute on every document in the collection.
Is there any way for me to optimize it?
I thought about using the $size operator, but I'm not really sure that it works on strings, and I'm even less sure how to compare its output for a pair of strings (as is the case here).
Here is the rest of my script, in case needed:
from pymongo import MongoClient
USERNAME = ...
PASSWORD = ...
SERVER_NAME = ...
DATABASE_NAME = ...
COLLECTION_NAME = ...
uri = 'mongodb://{}:{}@{}/{}'.format(USERNAME, PASSWORD, SERVER_NAME, DATABASE_NAME)
mongoClient = MongoClient(uri)
collection = mongoClient[DATABASE_NAME][COLLECTION_NAME]
cursor = collection.find(QUERY)
print(cursor.count())
The pymongo version is 3.4.
You can use the aggregation framework, which provides $strLenCP to get the length of a string and $cmp to compare the lengths:
db.collection.aggregate(
[
{
$match: {
"x" : {'$gt' : 1.0},
"y" : {'$gt' : 0.1}
}
},
{
$addFields: {
str_cmp: { $cmp: [ { $strLenCP: "$s1" }, { $add: [ { $strLenCP: "$s2" }, 3 ] } ] }
}
},
{
$match: {
"str_cmp": -1,
}
}
]
)
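Since the question's script uses PyMongo, here is the same pipeline as a Python sketch, reusing the collection object from the script above:
pipeline = [
    # cheap, indexable filters first
    {"$match": {"x": {"$gt": 1.0}, "y": {"$gt": 0.1}}},
    # compare string lengths without JavaScript
    {"$addFields": {
        "str_cmp": {"$cmp": [
            {"$strLenCP": "$s1"},
            {"$add": [{"$strLenCP": "$s2"}, 3]},
        ]}
    }},
    # keep documents where s1.length < s2.length + 3
    {"$match": {"str_cmp": -1}},
]
cursor = collection.aggregate(pipeline)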
My items are stored in MongoDB like this:
{"ProductName":"XXXX",
"Catalogs" : [
{
"50008064" : "Apple"
},
{
"50010566" : "Box"
},
{
"50016422" : "Water"
}
]}
Now I want to query all the items that belong to catalog 50008064. How can I do that?
(The catalog id is "50008064" and the catalog name is "Apple".)
You cannot query this in an efficient manner, and performance will decrease as your data grows. As such, I would consider it a schema bug; you should refactor/migrate to the following model, which does allow for indexing:
{"ProductName":"XXXX",
"Catalogs" : [
{
id : "50008064",
value : "Apple"
},
{
id : "50010566",
value : "Box"
},
{
id : "50016422",
value : "Water"
}
]}
And then index:
db.products.ensureIndex({'Catalogs.id': 1})
Again, I strongly suggest you change your schema as this is a potential performance bottleneck you cannot fix any other way.
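With the refactored model, the lookup is a plain indexed query; a quick PyMongo sketch, with the database and collection names assumed:
from pymongo import MongoClient

products = MongoClient()["mydb"]["products"]  # assumed connection details

# one-time index on the catalog id inside the array
products.create_index("Catalogs.id")

# all items that belong to catalog 50008064
for item in products.find({"Catalogs.id": "50008064"}):
    print(item["ProductName"])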
This should probably work according to the entry here, although it won't be very fast, as stated in the link.
db.products.find({ "Catalogs.50008064" : { $exists: true } } )
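The PyMongo equivalent of that $exists query, using a products collection handle as in the sketch above:
# matches documents whose Catalogs array contains a subdocument with key "50008064"
cursor = products.find({"Catalogs.50008064": {"$exists": True}})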