Multiple queries in one ElasticSearch Query - python

Here is an example of an item indexed in ES:
{
"_id" : ..,
"class": "A",
"name": "item1"
}
I want a single query that returns all items in the same class as the item named "item1". In other words, I want all indexed items with class A, knowing only the name.
I can do it with two queries:
Query 1:
SEARCH
{
"query": {
"query_string": {
"default_field": "name",
"query": "item1"
}
}
}
From the result I get the class, and then I run this query:
SEARCH
{
"query": {
"query_string": {
"default_field": "class",
"query": "A"
}
}
}
Any ideas? I know there's an easy way, but I can't find it.

You can combine multiple queries as clauses of a bool query. In this case, two criteria must be satisfied, so both queries should be must clauses:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "item1",
"fields": [
"name"
]
}
},
{
"query_string": {
"query": "A",
"fields": [
"class"
]
}
}
]
}
}
}
If you don't need relevancy scores, which appears to be the case here, both queries could be filter clauses instead of must clauses.
If name and class are mapped as keyword datatypes, you may want to use a term-level query rather than a full-text query such as query_string. Here's what that would look like, using filter clauses:
{
"query": {
"bool": {
"filter": [
{
"term": {
"name": {
"value": "item1"
}
}
},
{
"term": {
"class": {
"value": "A"
}
}
}
]
}
}
}
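Since the question asks about Python, the filter-based query above could be assembled as a plain dict before it is handed to a client. This is only a minimal sketch: the helper name `build_term_filter_query` is hypothetical, and it assumes `name` and `class` are mapped as `keyword` so that exact term matching applies.

```python
import json


def build_term_filter_query(criteria):
    """Build a bool query whose filter clauses are exact term matches.

    `criteria` maps field names to the exact values they must hold.
    """
    filters = [
        {"term": {field: {"value": value}}}
        for field, value in criteria.items()
    ]
    return {"query": {"bool": {"filter": filters}}}


# Same criteria as the JSON query above.
query = build_term_filter_query({"name": "item1", "class": "A"})
print(json.dumps(query, indent=2))
```

The resulting dict has the same shape as the JSON body shown above, so it can be passed as the search body to whichever client is in use.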

Related

How to filter ElasticSearch results without having it affect the document score?

I am trying to filter my results on the "publication_year" field without affecting the document score. However, whether I add the "range" to the query or to "filter", it seems to affect the score: documents whose "publication_year" is closer to the upper limit ("lte") of the range are scored higher.
My query:
query = {
'bool': {
'should': [
{
'match_phrase': {
"title": keywords
}
},
{
'match_phrase': {
"abstract": keywords
}
},
]
}
}
if publication_year_constraint:
range_query = {"range":{"publication_year":{"gte":publication_year_constraint, "lte": datetime.datetime.today().year}}}
query["bool"]["filter"] = [range_query]
I tried putting the "range" inside the "should" block as well, with similar results.
Try using the filter context.
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated.
Example:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
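Applied to the query from the question, this could look roughly like the Python sketch below. One thing to watch when the range goes into filter: in recent Elasticsearch versions, should clauses become optional once the bool query also has a must or filter clause, so documents that match only the year range can come back with a score of 0. The sketch sets minimum_should_match to 1 on the assumption that at least one phrase match is still wanted. The function name `build_search_query` and its parameters are hypothetical.

```python
import datetime


def build_search_query(keywords, min_year=None, max_year=None):
    """Score on title/abstract phrase matches; restrict by year without scoring."""
    bool_query = {
        "should": [
            {"match_phrase": {"title": keywords}},
            {"match_phrase": {"abstract": keywords}},
        ],
        # Assumption: at least one phrase clause must still match
        # even when a filter clause is present.
        "minimum_should_match": 1,
    }
    if min_year is not None:
        if max_year is None:
            max_year = datetime.date.today().year
        # The range lives in filter context, so it contributes no score.
        bool_query["filter"] = [
            {"range": {"publication_year": {"gte": min_year, "lte": max_year}}}
        ]
    return {"query": {"bool": bool_query}}


body = build_search_query("neural networks", min_year=2015, max_year=2020)
```

Unlike the dict in the question, this sketch wraps the bool query in a top-level "query" key so that it can be used directly as a search body.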

Query an elasticsearch index by an attribute, with a given range?

I want to query my index so that it matches documents with a particular value of an attribute called sitename, but only within a certain time range. I thought it might be something like the query below, but I'm unsure:
{
"query": {
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
},
"match": {"sitename" : "HARB00ZAF0" }
}
}
You're almost there, but you need to combine the two clauses in a bool query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
}
}
],
"must": [
{
"match": {
"sitename": "HARB00ZAF0"
}
}
]
}
}
}

How do you filter on not null values elasticsearch?

I am trying to keep only documents where a field is not null.
Example in SQL:
SELECT * FROM Mytable WHERE field_1 IS NOT NULL AND field_2 = 'alpha'
How should I write this query in elasticsearch-dsl (Python)?
I tried things like:
s = Mytable.search().query(
Q('match', field_2 ='alpha')
).filter(~Q('missing', field='field_1'))
but it returns elements where field_1 is null.
I also tried the following, but it didn't work:
field_name_1 = 'field_2'
value_1 = "alpha"
field_name_2 = 'field_1'
value_2 = " "
filter = {
"query": {
"bool": {
"must": [
{
"match": {
field_name_1 : value_1
}
},
{
"bool": {
"should": [
{
"bool": {
"must_not": [
{
field_name_2: {
"textContent": "*"
}
}
]
} }
]
}
}
]
}
}
}
I am not familiar with elasticsearch-dsl (Python), but the following search query will give you the same result as:
SELECT * FROM Mytable WHERE field_1 IS NOT NULL AND field_2 = 'alpha'
With the search query below, the results are the documents where name is "alpha" AND the item field is not null. See the exists query documentation to learn more.
Index Data:
{ "name": "alpha","item": null }
{ "name": "beta","item": null }
{ "name": "alpha","item": 1 }
{ "name": "alpha","item": [] }
Search query:
You can combine a bool query with an exists query like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "alpha"
}
},
{
"exists": {
"field": "item"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "4",
"_score": 1.6931472,
"_source": {
"name": "alpha",
"item": 1
}
}
]
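Only the document with "item": 1 is returned because the exists query treats a null value and an empty array as missing. The Python sketch below approximates that rule client-side on the sample data. It is an illustration of the semantics, not how Elasticsearch evaluates the query, and the helper name `field_exists` is hypothetical.

```python
def field_exists(doc, field):
    """Approximate the exists query for a top-level field."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        # An empty array, or one holding only nulls, has no indexed values.
        return any(item is not None for item in value)
    return True


docs = [
    {"name": "alpha", "item": None},
    {"name": "beta", "item": None},
    {"name": "alpha", "item": 1},
    {"name": "alpha", "item": []},
]

# name == "alpha" AND item is not null
matches = [d for d in docs if d["name"] == "alpha" and field_exists(d, "item")]
print(matches)  # [{'name': 'alpha', 'item': 1}]
```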
You can try this one:
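from elasticsearch_dsl import Q
# Require both: field_2 matches 'alpha' and field_1 holds a non-null value.
s = Mytable.search().query(
Q('bool', must=[Q('match', field_2='alpha'), Q('exists', field='field_1')]))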
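This is one way to write it as a boolean compound query. Note that the missing query used here was removed in Elasticsearch 5.0; on newer versions, use an exists query instead: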
query: {
bool: {
must: [
{
bool: {
must_not: {
missing: {
field: 'follower',
existence: true,
null_value: true,
},
},
},
},
{
nested: {
path: 'follower',
query: {
match: {
'follower.id': req.currentUser?.id,
},
},
},
},
],
},
},

Partial search using wildcard in Elastic Search

I want to search values inside an array in Elasticsearch using a wildcard.
{
"query": {
"wildcard": {
"short_message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
Searching on "short_message" works for me.
But searching on "messages.message" does not work:
{
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
I also want to search multiple fields in the array.
For example:
fields: ["messages.message","messages.subject", "messages.email_search"]
If that is possible, please suggest the best solution.
Thanks in advance.
It seems you are using the nested datatype for messages.
You would need to use a nested query for this:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
}
}
}
}
For multi-field querying, you can probably use query_string inside a nested query.
Query String:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"query_string": {
"fields": ["messages.message", "messages.subject"],
"query": "*nne*",
"boost": 1
}
}
}
}
}
Query DSL
You can also use wildcard queries in the Query DSL, but then you need to add a separate query clause for every field; I suspect that, for performance reasons, wildcard queries don't support multi-field querying.
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [
{
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
},
{
"wildcard": {
"messages.subject": {
"value": "*nne*",
"boost": 1
}
}
}
]
}
}
}
}
}
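The per-field wildcard clauses above can also be generated instead of written by hand, which helps once the field list grows (for example to include messages.email_search from the question). This is a minimal Python sketch that only builds the request body; the helper name `nested_wildcard_query` is hypothetical.

```python
def nested_wildcard_query(path, fields, pattern):
    """Build a nested query that matches `pattern` in any of `fields`."""
    clauses = [
        {"wildcard": {f"{path}.{field}": {"value": pattern}}}
        for field in fields
    ]
    return {
        "query": {
            "nested": {
                "path": path,
                # should: a nested object matches if any field matches.
                "query": {"bool": {"should": clauses}},
            }
        }
    }


body = nested_wildcard_query(
    "messages", ["message", "subject", "email_search"], "*nne*"
)
```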
Note that wildcard search is not advisable because of the number of pattern-matching operations it has to perform, which increases response latency. Instead, I would recommend looking into the Ngram Tokenizer, which lets you use a simple match query to get the desired result.
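To see why n-grams remove the need for a wildcard, the Python sketch below approximates what an n-gram tokenizer emits at index time: every substring within a length range becomes its own token, so a search for "nne" is a plain term lookup. The function `ngrams` and the sample value "joanne" are illustrative assumptions; the real tokenizer has further settings that are not modelled here.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of `text` with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


tokens = ngrams("joanne", 3, 3)
print(tokens)           # ['joa', 'oan', 'ann', 'nne']
print("nne" in tokens)  # True
```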
Let me know if this helps!

Querying nested objects in Elasticsearch

I have a Product-Merchant mapping which looks like the following
catalog_map = {
"catalog": {
"properties": {
"merchant_id": {
"type": "string",
},
"products": {
"type": "object",
},
"merchant_name" :{
"type" : "string"
}
}
}
}
Each entry in "product" is an object with fields such as product_id, product_name and product_price. Products and merchants are mapped as follows:
for merchant in Merchant.objects.all() :
products = [{"product_name" : x.product.name, "product_price" : x.price, "product_id" : x.product.id , "product_category" : x.product.category.name} for x in MerchantProductMapping.objects.filter(merchant=merchant)]
tab = {
'merchant_id': merchant.id,
'merchant_name': merchant.name,
'product': products
}
res = es.index(index="my-index", doc_type='catalog', body=tab)
The data gets indexed smoothly, in the desired form. Now, when I query the data from given index, I do it in the following way :
GET /esearch-index/catalog/_search
{
"query": {
"bool" :{
"must": [
{"match": {
"merchant_name": {
"query": "Sir John"
}
}}],
"should": [
{"match": {
"product_name": {
"query": "Vanilla"
}
}}
]
}}
}
This query gives me the result of all the products in the index with merchant name "Sir John" . However, I want it to return the details of the product "Vanilla" sold by "Sir John" instead.
On someone's recommendation, I used "_source" while querying, but that doesn't help.
How can I single out the information of one single object from the entire "catalog" index of the merchant?
Once your bool query has a must clause, all the conditions inside of it are required. The conditions inside of the should clause are not required. They will only boost the results. (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html#query-dsl-bool-query)
So, going back to your query, it will retrieve all catalogs matching merchant_name "Sir John". This is the only required (must) condition. The name "Vanilla" will only boost results with the name "Vanilla" to the top, because it is not required.
If you want to retrieve "Vanilla" sold by "Sir John", put both conditions inside of the must clause and change your query to this:
{
"query": {
"bool": {
"must": [
{
"match": {
"merchant_name": {
"query": "Sir John"
}
}
},
{
"match": {
"product_name": {
"query": "Vanilla"
}
}
}
]
}
}
}
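Keep in mind that each search hit is a whole indexed document, so even with both conditions under must, the response still carries every product of the matching merchant. One option is to pick the wanted product out of the hit on the client side, as in the Python sketch below. It assumes the products are stored under the "product" key, as in the indexing loop from the question; the helper name `matching_products` and the sample document are hypothetical. A server-side alternative, if the products were mapped as nested, would be a nested query with inner_hits.

```python
def matching_products(source, product_name):
    """Return the products in one hit's _source whose name contains `product_name`."""
    wanted = product_name.lower()
    return [
        product
        for product in source.get("product", [])
        if wanted in product.get("product_name", "").lower()
    ]


# Hypothetical _source of one hit, shaped like the indexed documents above.
source = {
    "merchant_id": 1,
    "merchant_name": "Sir John",
    "product": [
        {"product_id": 10, "product_name": "Vanilla", "product_price": 5},
        {"product_id": 11, "product_name": "Chocolate", "product_price": 6},
    ],
}

print(matching_products(source, "Vanilla"))
```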
