I have an aggregate query via mongoengine:
Foo.objects.aggregate(*[
    {"$match": {"groups.1": {"$exists": True}}},
    {"$redact": {
        "$cond": [
            {"$gte": [{"$size": {"$setIntersection": ["$groups", my_groups]}}, 1]},
            "$$KEEP",
            "$$PRUNE"
        ]
    }}
])
However, the results of this query alone are not enough.
I need to find all documents that match this query OR some other queries.
How should I do that?
Thank you!
I want to query my index so that it matches documents with a particular value of the sitename attribute, but only within a certain time range. I thought it might be something like the query below, but I'm not sure:
{
"query": {
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
},
"match": {"sitename" : "HARB00ZAF0" }
}
}
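You're almost there, but you need to combine the two clauses in a bool query: the range goes in filter (it only needs to include or exclude documents, not score them) and the match goes in must.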
{
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
}
}
],
"must": [
{
"match": {
"sitename": "HARB00ZAF0"
}
}
]
}
}
}
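Since the question is about the Python client, here is a minimal sketch of building that same body in Python. The helper name build_site_query is hypothetical, and the field names timestamp and sitename are taken from the question; the index name in the usage comment is a placeholder.

```python
def build_site_query(sitename, gte="now-1h/h", lt="now/h"):
    """Return a bool query: time window as a filter, sitename as a scored match."""
    return {
        "query": {
            "bool": {
                # Filter context: only includes/excludes by time range, no scoring
                "filter": [
                    {"range": {"timestamp": {"gte": gte, "lt": lt}}}
                ],
                # Query context: documents must match the sitename value
                "must": [
                    {"match": {"sitename": sitename}}
                ],
            }
        }
    }


body = build_site_query("HARB00ZAF0")
# With the elasticsearch-py client ("my-index" is a placeholder); newer client
# versions prefer es.search(index="my-index", query=body["query"]):
# result = es.search(index="my-index", body=body)
```

Keeping the range in filter rather than must means it does not affect relevance scores, and Elasticsearch can cache frequently used filters.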
I have two MongoDB collections, 'Contacts' and 'Messages'. Both collections share the phone number field (like a primary/foreign key relation in SQL), although it is stored differently in each.
The Contacts collection stores this field as follows:
{
"phone": "+192******",
"name": "test"
}
and Messages as follows:
{
"Tel": "tel:+192******"
}
I want to aggregate the two collections so that I get this nested document:
{
"text": "text sent by user",
"contact": {
"phone": "+192******",
"name": "test"
}
}
So far, I have tried the following aggregation but it doesn't work:
cursor = messages_client.aggregate([{
'$lookup':
{
'from': "contacts",
'let': { 'phone': "$phone"},
'pipeline': [
{ '$addFields': { 'phone_number': { "$substr": [ "$Tel", 4, -1 ] }}},
{'$match': { "$expr": { '$eq': [ '$phone_number', '$$phone']}}}
],
'as': 'contact'
}}
], allowDiskUse=True)
Could someone kindly help me? I'm using pymongo and Python 3, if that is helpful.
I found some help from the $indexOfCP operator, for anyone with a similar problem. It returns -1 when the substring is not found, so "$gt -1" means "contains":
cursor = messages_client.aggregate([{
'$lookup':
{
'from': "contacts",
'let': { 'tel': "$Tel"},  # Tel is on the message, e.g. "tel:+192..."
'pipeline': [
{'$match': { "$expr": { '$gt': [{ "$indexOfCP": ["$$tel", "$phone"]}, -1]}}}  # contact's phone occurs inside the message's Tel
],
'as': 'contact'
}}
], allowDiskUse=True)
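Try this: strip the "tel:" prefix from Tel on the messages side first, then pass the result into the $lookup as a variable: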
db.messages.aggregate([
{
$addFields: {
'phone_number': { "$substr": ["$Tel", 4, -1] }
}
},
{
$lookup: {
from: "contacts",
let: { "phone": "$phone_number" },
pipeline: [
{
$match: {
$expr: { $eq: ["$phone", "$$phone"] }
}
}
],
as: "contact"
}
},
{ $unwind: "$contact" }
]);
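The same pipeline translated to pymongo, as a sketch. It assumes messages_client is the pymongo Collection for "messages" (as in the question); the helper name contact_lookup_pipeline is hypothetical.

```python
def contact_lookup_pipeline():
    """pymongo version of the $addFields + $lookup + $unwind pipeline above."""
    return [
        # "tel:+192..." -> "+192...": skip the 4-character "tel:" prefix;
        # a negative length makes $substr take the rest of the string
        {"$addFields": {"phone_number": {"$substr": ["$Tel", 4, -1]}}},
        {
            "$lookup": {
                "from": "contacts",
                # Expose the message's stripped number to the sub-pipeline as $$phone
                "let": {"phone": "$phone_number"},
                "pipeline": [
                    # Inside the sub-pipeline, $phone refers to the contact document
                    {"$match": {"$expr": {"$eq": ["$phone", "$$phone"]}}}
                ],
                "as": "contact",
            }
        },
        # Turn the "contact" array into an embedded document; messages with
        # no matching contact are dropped by $unwind by default
        {"$unwind": "$contact"},
    ]


pipeline = contact_lookup_pipeline()
# for doc in messages_client.aggregate(pipeline, allowDiskUse=True):
#     print(doc["text"], doc["contact"]["name"])
```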
I have a document in MongoDB:
{
"company": "npcompany",
"department": [
{
"name": "it",
"employeeIds": [
"emp1",
"emp2",
"emp3"
]
},
{
"name": "economy",
"employeeIds": [
"emp1",
"emp3",
"emp4"
]
}
]
}
I want to find "emp4". In this case I want to get only the "economy" department data. If I search for "emp1", I want to get the "npcompany" and "economy" data. How can I do this in MongoDB (or pymongo)?
db.collection.aggregate([ //As you need to fetch all matching array elements, reshape them
{
$unwind: "$department"
},
{
"$match": {//look for match
"department.employeeIds": "emp4"
}
},
{
$group: {//regroup them
"_id": "$_id",
data: {
"$push": "$$ROOT"
}
}
}
])
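For pymongo, a sketch of the same three stages, plus a plain-Python helper that illustrates which departments the $unwind and $match stages keep for the sample document. Both function names are hypothetical, and collection in the usage comment is a placeholder. Because the pipeline pushes $$ROOT, each pushed element still carries "company" along with the single matching department.

```python
def departments_pipeline(emp_id):
    """pymongo version of the $unwind / $match / $group pipeline above."""
    return [
        # One document per department entry
        {"$unwind": "$department"},
        # Keep only departments whose employeeIds array contains emp_id
        {"$match": {"department.employeeIds": emp_id}},
        # Regroup the surviving documents under the original _id
        {"$group": {"_id": "$_id", "data": {"$push": "$$ROOT"}}},
    ]


def matching_departments(doc, emp_id):
    """Plain-Python illustration of what the $unwind + $match stages keep."""
    return [d["name"] for d in doc["department"] if emp_id in d["employeeIds"]]


doc = {
    "company": "npcompany",
    "department": [
        {"name": "it", "employeeIds": ["emp1", "emp2", "emp3"]},
        {"name": "economy", "employeeIds": ["emp1", "emp3", "emp4"]},
    ],
}

print(matching_departments(doc, "emp4"))  # ['economy']
print(matching_departments(doc, "emp1"))  # ['it', 'economy']
# With pymongo:
# result = list(collection.aggregate(departments_pipeline("emp4")))
```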
Some of my documents have a field containing a Unix timestamp, which is set if the document has been individually queried before:
"timelock": 1561081724.254
Documents that have never been individually queried don't have this field. I would also like a query that only returns documents that either DO NOT have the field, or have it but its timestamp is more than 10 minutes (600 s) older than the current time.
documents = es.search(index='index', size=10000, body={
"query": {
"bool": {
"must": [
{
"match_all": {}
},
],
"filter": [],
"should": [],
"must_not": [
]
}
}})
So in pseudocode I guess it would look like this:
if 'timelock' exists:
    if current_time - 'timelock' > 600:
        include in query
    else:
        exclude from query
else:
    include in query
I'm using the Python client for ES.
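Why not simply use date math? This assumes timelock is mapped as a date field (for example with the epoch_second format); date math expressions like now-10m don't work against a plain numeric field.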
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": "now-10m"
}
}
}
]
}
}
}
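A sketch of building that body with the Python client, assuming (as noted above) that timelock is mapped as a date field; the function name is hypothetical and the index name is the one from the question.

```python
def stale_or_missing_timelock_query(window="10m"):
    """Docs with no timelock, or whose timelock is older than now minus window.

    Assumes timelock is mapped as a date so that "now-10m" date math applies.
    """
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    # Never individually queried: the field is absent
                    {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                    # Queried, but longer ago than the window
                    {"range": {"timelock": {"lt": f"now-{window}"}}},
                ],
            }
        }
    }


body = stale_or_missing_timelock_query()
# documents = es.search(index="index", size=10000, body=body)
```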
I'm not familiar with Python syntax, but in pseudocode I'd suggest the following logic:
compare_stamp = current_timestamp - 600
if 'timelock' exists:
    if timelock < compare_stamp:
        include document
    else:
        exclude document
else:
    include document
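The pseudocode above translates fairly directly into a Python predicate, which can be handy for checking results client-side. This is a sketch; include_document and its now/max_age parameters are hypothetical names, and now defaults to time.time().

```python
import time


def include_document(doc, now=None, max_age=600):
    """Keep docs with no timelock, or whose timelock is older than max_age seconds."""
    now = time.time() if now is None else now
    compare_stamp = now - max_age  # timelocks before this point are "stale"
    if "timelock" in doc:
        return doc["timelock"] < compare_stamp
    return True


print(include_document({}))  # True
print(include_document({"timelock": 1561081724.254}, now=1561081724.254 + 601))  # True
print(include_document({"timelock": 1561081724.254}, now=1561081724.254 + 60))  # False
```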
You can easily compute compare_stamp in your Python script and then use that value in the Elasticsearch query below:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": compare_stamp
}
}
}
]
}
}
}
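A sketch of filling in compare_stamp from Python before sending the query, assuming timelock holds Unix seconds as in the question. The function name build_timelock_query is hypothetical; the search call mirrors the one in the question.

```python
import time


def build_timelock_query(now=None, max_age=600):
    """Build the should-query above with compare_stamp computed in Python."""
    now = time.time() if now is None else now
    compare_stamp = now - max_age
    return {
        "query": {
            "bool": {
                # With no must/filter clauses, at least one should clause must match
                "should": [
                    {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                    {"range": {"timelock": {"lt": compare_stamp}}},
                ]
            }
        }
    }


body = build_timelock_query()
# documents = es.search(index="index", size=10000, body=body)
```

Unlike the date-math version, this works on a numeric timelock field, because the comparison value is a plain number computed client-side.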
I have a large collection that can be modeled more or less like the one created by the following code:
import string
from random import randint, random, choice

from pymongo import MongoClient

# Hypothetical database/collection names; adjust to your setup
collection = MongoClient()["test"]["variants"]

documents = []
for i in range(100):
    # string.letters is Python 2 only; ascii_letters works in Python 3
    letters = choice(string.ascii_letters[0:15])
    # One document per sample_id, each with its own random leading digit
    for sample_id in ("CDE", "ABC", "GEF"):
        documents.append({'hgvs_id': "".join([str(randint(0, 9)), letters]),
                          'sample_id': sample_id,
                          'number': i * random() * 50 - 30})
for i in range(10):  # add some unique values for sample_id 'ABC'
    letters = choice(string.ascii_letters[0:15])
    documents.append({'hgvs_id': "55" + letters,
                      'sample_id': 'ABC',
                      'number': i * random() * 50 - 30})
collection.insert_many(documents)
I am trying to retrieve the unique hgvs_id values that occur in documents with a specific sample_id (ABC here) but not in documents with either of the other two. Usually there will be many more sample_id values than just three.
It sounds pretty simple, but so far I have been unsuccessful. Given the size of the collection I'm working with (~30 GB), I've been trying to use the aggregation framework as follows:
sample_1 = collection.aggregate(
[
{'$group':
{
'_id': '$hgvs_id',
#'sample_id' : {"addToSet": '$hgvs_id'},
'matchedDocuments':
{'$push':
{
'id': '$_id',
'sample_name': "$sample_id",
'hgvs_ids': "$hgvs_id"
}
},
}
},
{'$match': {
"$and": [
{'matchedDocuments': {"$elemMatch": {'sample_name': 'ABC'}}},
# Some other operation????
]
}
}
]) #, allowDiskUse=True) may be needed
This (understandably) returns all the hgvs_id values that have a sample_id equal to ABC. Any leads would be greatly appreciated.
If "ABC" is the only sample_id in the "set" of grouped values, then the $size of that set will be one. With MongoDB 3.4 you can combine that check with $in:
[
{ "$group": {
"_id": "$hgvs_id",
"samples": { "$addToSet": "$sample_id" }
}},
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$in": [ "ABC", "$samples" ] },
{ "$eq": [ { "$size": "$samples" }, 1 ] }
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
]
Otherwise (before MongoDB 3.4), use $setIntersection, which is just a little longer in syntax:
[
{ "$group": {
"_id": "$hgvs_id",
"samples": { "$addToSet": "$sample_id" }
}},
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ { "$size": { "$setIntersection": [ "$samples", ["ABC"] ] } }, 1 ] },
{ "$eq": [ { "$size": "$samples" }, 1 ] }
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
]
Or, probably the simplest form, which works in all versions that support aggregation:
[
{ "$group": {
"_id": "$hgvs_id",
"samples": { "$addToSet": "$sample_id" }
}},
{ "$match": {
"$and": [{ "samples": "ABC" },{ "samples": { "$size": 1 } }]
}}
]
The same principle applies to any number of arguments: the "set" produced must match the size of the list of arguments given, as well as contain each of the specific values.
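To make that generalization concrete, here is a pymongo sketch that builds the $group + $match pipeline for any list of sample ids. It expresses "contains every value and nothing else" with $all plus $size on the same field, which is a slight reformulation of the single-value { "samples": "ABC" } match above. An in-memory equivalent is included for checking on small data. Function names are hypothetical and collection in the usage comment is a placeholder.

```python
def unique_to_samples_pipeline(sample_ids):
    """hgvs_id values whose set of sample_ids is exactly sample_ids."""
    wanted = sorted(set(sample_ids))  # dedupe so $size matches the set size
    return [
        {"$group": {"_id": "$hgvs_id", "samples": {"$addToSet": "$sample_id"}}},
        # $all: every wanted sample is present; $size: nothing else is
        {"$match": {"samples": {"$all": wanted, "$size": len(wanted)}}},
    ]


def unique_to_samples(documents, sample_ids):
    """In-memory equivalent, handy for checking the pipeline on small data."""
    groups = {}
    for doc in documents:
        groups.setdefault(doc["hgvs_id"], set()).add(doc["sample_id"])
    wanted = set(sample_ids)
    return sorted(h for h, samples in groups.items() if samples == wanted)


docs = [
    {"hgvs_id": "55a", "sample_id": "ABC"},
    {"hgvs_id": "1b", "sample_id": "ABC"},
    {"hgvs_id": "1b", "sample_id": "CDE"},
]
print(unique_to_samples(docs, ["ABC"]))  # ['55a']
# On a large collection, allowDiskUse lets $group spill to disk:
# cursor = collection.aggregate(unique_to_samples_pipeline(["ABC"]), allowDiskUse=True)
```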