How can I dynamically create a combined Q query in elasticsearch dsl python?
I have gone through the docs and this SO post. I constructed a q_queries dictionary with all the required info.
ipdb> q_queries
{'queries': [{'status': 'PASS'}, {'status': 'FAIL'}, {'status': 'ABSE'}, {'status': 'WITH'}], 'operation': 'or'}
I want to perform the following Q query:
qq=Q("match", status='PASS') | Q("match", status="FAIL") | Q("match", status="ABSE") | Q("match", status="WITH")
For a list of dicts, the following works out:
ipdb> [Q('match', **z) for z in q_queries['queries']]
[Match(status='PASS'), Match(status='FAIL'), Match(status='ABSE'), Match(status='WITH')]
But how do I combine multiple Qs with an or operator or an and operator? Also, what is the corresponding Elasticsearch raw query for the above? I tried the following, since I have to filter based on test_id:
{
    "query": {
        "bool": {
            "must": [
                { "match": { "test_id": "7" }},
                {
                    "range": {
                        "created": {
                            "gte": "2016-01-01",
                            "lte": "2016-01-31"
                        }
                    }
                }
            ],
            "should": [
                { "match": { "status": "PASS" }},
                { "match": { "status": "FAIL" }}
            ]
        }
    }
}
But the results are not as expected: I ran the same query without the should clause and got the same results, so the should clauses were apparently not applied by Elasticsearch in my case.
Any help is much appreciated.
TIA
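A likely cause, noted for the record: when a bool query also contains a must clause, its should clauses only contribute to scoring and do not filter hits unless minimum_should_match is set. A sketch of the same raw query with that option added (field names taken from the question):

{
    "query": {
        "bool": {
            "must": [
                { "match": { "test_id": "7" }},
                { "range": { "created": { "gte": "2016-01-01", "lte": "2016-01-31" }}}
            ],
            "should": [
                { "match": { "status": "PASS" }},
                { "match": { "status": "FAIL" }}
            ],
            "minimum_should_match": 1
        }
    }
}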
After exploring elasticsearch-dsl-py for some more time, this piece of documentation helped me solve the above issue. Below is the function I wrote to resolve it.
def create_q_queries(self, q_queries, search_query):
    """
    Create Q queries and chain them when there are multiple query groups.

    :param q_queries: list of dicts, each holding an operation and its query params.
    :param search_query: Search() object.
    :return: search_query updated with the Q queries.
    """
    if q_queries:
        logical_operator_mappings = {'or': 'should', 'and': 'must'}
        for query_group in q_queries:
            # build one Q('match', ...) per set of params, then wrap them in a bool query
            queries = [Q('match', **params) for params in query_group['queries']]
            search_query = search_query.query(Q('bool', **{
                logical_operator_mappings.get(query_group.get('operation')): queries
            }))
    return search_query
I changed the format of q_queries so that chaining can be performed with multiple operators such as and, or, etc.
q_queries = [
    {
        "operation": "or",
        "queries": [
            {"status": "PASS"}, {"status": "FAIL"}, {"status": "ABSE"}, {"status": "WITH"}
        ]
    }
]
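For completeness, a minimal usage sketch; the client, index name, and the object holding the method are assumptions, not part of the original post:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

es = Elasticsearch()                        # default local client; an assumption
s = Search(using=es, index='test_results')  # 'test_results' is a hypothetical index name
s = obj.create_q_queries(q_queries, s)      # obj: whatever instance defines create_q_queries
print(s.to_dict())                          # inspect the generated bool/should query
response = s.execute()

# Equivalently, the individual Q objects can be OR-combined directly:
#   from functools import reduce
#   import operator
#   combined = reduce(operator.or_, [Q('match', **z) for z in q_queries[0]['queries']])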
For example, if this is my record
{
"_id":"123",
"name":"google",
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1",
"description":""}
I want to get only those fields starting with 'ip_'. Consider that I have 500 fields and only 15 of them start with 'ip_'.
Can we do something like this to get the output -
db.collection.find({id:"123"}, {'ip*':1})
Output -
{
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1"
}
The following aggregate query, using PyMongo, returns the documents with only the fields whose names start with "ip_".
Note the various aggregation operators used: $filter, $regexMatch, $objectToArray, $arrayToObject. The aggregation pipeline has two stages: $project and $replaceWith.
import pprint  # needed for the pretty-printed output below

pipeline = [
    {
        "$project": {
            "ipFields": {
                "$filter": {
                    "input": { "$objectToArray": "$$ROOT" },
                    "cond": { "$regexMatch": { "input": "$$this.k", "regex": "^ip" } }
                }
            }
        }
    },
    {
        "$replaceWith": { "$arrayToObject": "$ipFields" }
    }
]

# collection is the PyMongo Collection object to query
pprint.pprint(list(collection.aggregate(pipeline)))
I am unaware of a way to specify an expression that would decide which hash keys would be projected. MongoDB has projection operators but they deal with arrays and text search.
If you have a fixed possible set of ip fields, you can simply request all of them regardless of which fields are present in a particular document, e.g. project with
{ip_1: true, ip_2: true, ...}
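A minimal PyMongo sketch of that approach (the exact field list ip_1 .. ip_15 is an assumption based on the question):

# request a fixed set of possible ip_ fields; absent fields are simply omitted
projection = {f"ip_{i}": True for i in range(1, 16)}
projection["_id"] = False  # drop _id if only the ip_ fields are wanted
doc = collection.find_one({"_id": "123"}, projection)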
I have an elasticsearch DB with data of the form
record = {  # all but age are strings
    'diagnosis': self.diagnosis,
    'vignette': self.vignette,
    'symptoms': self.symptoms_list,
    'care': self.care_level_string,
    'age': self.age,  # float
    'gender': self.gender
}
I want to create a word cloud of the data in vignette.
I tried all sorts of queries and I get error 400, meaning I don't understand how to query the database.
I am using Python. This is the only successful query I was able to come up with:
def search_phrase_in_vignettes(self, phrase):
    body = {
        "_source": ["vignette"],
        "query": {
            "match_phrase": {
                "vignette": {
                    "query": phrase,
                }
            }
        }
    }
    res = self.es.search(index=self.index_name, doc_type=self.doc_type, body=body)
This finds any record with the phrase contained in the field 'vignette'.
I am thinking some aggregation should do the trick, but I can't seem to write a correct query with 'aggr'.
Would love some help on how to correctly write even the simplest query with aggregation in python.
Use a terms aggregation to get the word counts. Your query will be:
{
    "query": {
        "match_phrase": {
            "vignette": {
                "query": phrase,
            }
        }
    },
    "aggs": {
        "cloud": {
            "terms": { "field": "vignette" }
        }
    }
}
When you receive the results, take the buckets from the aggregations key:
res = self.es.search(index=self.index_name, doc_type=self.doc_type, body=body)
for bucket in res['aggregations']['cloud']['buckets']:
    # each bucket has a 'key' (the term) and a 'doc_count'; use these to build the cloud
    ...
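Putting it together in Python, a rough sketch. Note that a terms aggregation on an analyzed text field such as vignette may require fielddata to be enabled or a keyword sub-field (e.g. vignette.keyword); which one applies depends on your mapping and is an assumption here:

body = {
    "query": {"match_phrase": {"vignette": {"query": phrase}}},
    "aggs": {
        "cloud": {
            "terms": {
                "field": "vignette",  # or "vignette.keyword", depending on the mapping
                "size": 100           # how many terms to return for the cloud
            }
        }
    },
    "size": 0  # only the aggregation is needed, not the hits themselves
}
res = self.es.search(index=self.index_name, doc_type=self.doc_type, body=body)
counts = {b['key']: b['doc_count'] for b in res['aggregations']['cloud']['buckets']}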
I need to run the following query on a MongoDB server:
QUERY = {
    "$and": [
        {"x": {'$gt': 1.0}},
        {"y": {'$gt': 0.1}},
        {"$where": 'this.s1.length < this.s2.length+3'}
    ]
}
This query is very slow, due to the JavaScript expression which the server needs to execute on every document in the collection.
Is there any way for me to optimize it?
I thought about using the $size operator, but I'm not really sure that it works on strings, and I'm even less sure how to compare its output for a pair of strings (as is the case here).
Here is the rest of my script, in case needed:
from pymongo import MongoClient
USERNAME = ...
PASSWORD = ...
SERVER_NAME = ...
DATABASE_NAME = ...
COLLECTION_NAME = ...
uri = 'mongodb://{}:{}@{}/{}'.format(USERNAME, PASSWORD, SERVER_NAME, DATABASE_NAME)
mongoClient = MongoClient(uri)
collection = mongoClient[DATABASE_NAME][COLLECTION_NAME]
cursor = collection.find(QUERY)
print cursor.count()
The pymongo version is 3.4.
You can use the aggregation framework, which provides $strLenCP to get the length of a string and $cmp to compare the lengths:
db.collection.aggregate([
    {
        $match: {
            "x": {'$gt': 1.0},
            "y": {'$gt': 0.1}
        }
    },
    {
        $addFields: {
            str_cmp: { $cmp: [ { $strLenCP: "$s1" }, { $add: [ { $strLenCP: "$s2" }, 3 ] } ] }
        }
    },
    {
        $match: {
            "str_cmp": -1
        }
    }
])
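Since the script uses PyMongo, the same pipeline can be written as plain Python dicts; a sketch reusing the collection object from the question:

pipeline = [
    {"$match": {"x": {"$gt": 1.0}, "y": {"$gt": 0.1}}},
    {"$addFields": {
        # -1 when len(s1) < len(s2) + 3, 0 when equal, 1 when greater
        "str_cmp": {"$cmp": [{"$strLenCP": "$s1"},
                             {"$add": [{"$strLenCP": "$s2"}, 3]}]}
    }},
    {"$match": {"str_cmp": -1}}
]
cursor = collection.aggregate(pipeline)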
I have a list of JSON files in elasticsearch.
I have a list of strings, matching, which I want to use as the criteria for a search.
where matching = ["223232_ds", "dnjsnsd_22", "2ee2i33", "mkddsj2220", "23e3efdjn"]
I now need to find those records in elasticsearch where two keys contain values in this list, matching.
Without elasticsearch and simply loading the JSON as a python object I can do this like:
results = []
for record in JSON_list:
    if record['key_1'] in matching and record['key_2'] in matching:
        results.append(record)
Where the JSON_list looks like this:
[{'key_1' : "blahaksds",
'key_2' : "njasdnjkns"},
{'key_1' : "bladfgfdf",
'key_2' : "njasdsfsdrr"}]
How do I search for multiple criteria in es? Previously, I've used this setup to search for a record_id directly.
es = elasticsearch.Elasticsearch()
name = "so_sample"
# Formulate query
query = str("_id:"+'"'+ record_id +'"')
# Query
result = es.search(name,q=query)
You can use a bool query with two terms queries in the must clause, like this:
{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "key_1": ["223232_ds", "dnjsnsd_22", "2ee2i33", "mkddsj2220", "23e3efdjn"]
                    }
                },
                {
                    "terms": {
                        "key_2": ["223232_ds", "dnjsnsd_22", "2ee2i33", "mkddsj2220", "23e3efdjn"]
                    }
                }
            ]
        }
    }
}
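A sketch of running this from Python with the matching list (reusing the es client and index name from the question; note that terms queries are not analyzed, so this assumes key_1 and key_2 are keyword / not-analyzed fields):

body = {
    "query": {
        "bool": {
            "must": [
                {"terms": {"key_1": matching}},
                {"terms": {"key_2": matching}}
            ]
        }
    }
}
result = es.search(index=name, body=body)
hits = [hit['_source'] for hit in result['hits']['hits']]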
My actors collection contains an array-of-documents field, called acted_in. Instead of returning the size of acted_in.idmovies like so: {'$size': '$acted_in.idmovies'}, I want to return the number of distinct values inside $acted_in.idmovies. How can I do that?
c1 = actors.aggregate([
    {"$match": {'$and': [{'fname': f_name},
                         {'lname': l_name}]}},
    {"$project": {'first_name': '$fname',
                  'last_name': '$lname',
                  'gender': '$gender',
                  'distinct_movies_played_in': {'$size': '$acted_in.idmovies'}}}
])
You basically need to include $setDifference in there to obtain the "distinct" items. All "sets" are "distinct" by design and by obtaining the "difference" from the present array to an empty one [] you get the desired result. Then you can apply the $size.
You also have some common mistakes/misconceptions. Firstly, when using $match or any MongoDB query expression you do not need to use $and unless there is an explicit case to do so. All query expression arguments are "already" AND conditions unless explicitly stated otherwise, as with $or. So don't use it explicitly for this case.
Secondly, your $project was using the explicit field path variables for every field. You do not need to do that just to return the field; outside of usage in an "expression", you can simply use a 1 to indicate you want it included:
c1 = actors.aggregate([
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$project": {
        "first_name": 1,
        "last_name": 1,
        "gender": 1,
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])
In fact, if you are actually using MongoDB 3.4 or greater ( and your notation of an element within an array "$acted_in.idmovies" says you have at least MongoDB 3.2 ) which has support for $addFields then use that instead of specifying all other fields in the document.
c1 = actors.aggregate([
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$addFields": {
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])
Unless you explicitly need to just specify "some" other fields.
The basic rule here is: do not use $unwind for array operations unless you specifically need to perform a $group operation with its _id key pointing at a value obtained from "within" the array.
In all other cases, MongoDB has far more efficient operators for working with arrays than what $unwind does.
This should give you what you want:
actors.aggregate([
    {
        "$match": {"fname": f_name, "lname": l_name}
    },
    {
        "$unwind": "$acted_in"
    },
    {
        "$group": {
            "_id": "$_id",
            "first_name": {"$first": "$fname"},
            "last_name": {"$last": "$lname"},
            "gender": {"$first": "$gender"},
            "movies": {"$addToSet": "$acted_in.idmovies"}
        }
    },
    {
        "$project": {
            "first_name": 1,
            "last_name": 1,
            "gender": 1,
            "distinct": {"$size": "$movies"}
        }
    }
])
After the acted_in array is deconstructed and the movie ids are collected back into a set, you just need to take the size (length) of that set.