Partial search using wildcard in Elastic Search - python

I want to run a wildcard search on values inside an array in Elasticsearch.
{
"query": {
"wildcard": {
"short_message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
Searching on "short_message" works for me.
But searching on "messages.message" does not work.
{
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
I also want to search across multiple fields in the array.
For example:
fields: ["messages.message","messages.subject", "messages.email_search"]
If this is possible, please suggest the best solution.
Thanks in advance.

It seems you are using the nested datatype for messages.
If so, you need to use a nested query:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
}
}
}
}
For multi-field querying, you can probably use query_string, so your solution would be a query_string inside a nested query.
Query String:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"query_string": {
"fields": ["messages.message", "messages.subject"],
"query": "*nne*",
"boost": 1
}
}
}
}
}
Query DSL
You can also use wildcard with Query DSL, but then you need a separate query clause for every field. I suspect wildcard queries don't support multi-field querying for performance reasons.
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [
{
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
},
{
"wildcard": {
"messages.subject": {
"value": "*nne*",
"boost": 1
}
}
}
]
}
}
}
}
}
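The repeated wildcard clauses above can also be generated from a list of field names. The following is a minimal sketch, assuming the fields all live under the nested path "messages"; the helper name nested_multi_field_wildcard is made up for illustration and is not part of any Elasticsearch client.

```python
def nested_multi_field_wildcard(path, fields, pattern, boost=1.0):
    """Build a nested bool/should query with one wildcard clause per field.

    `path` is the nested object path; `fields` are names relative to it.
    (Hypothetical helper, not an Elasticsearch API.)
    """
    clauses = [
        {"wildcard": {f"{path}.{field}": {"value": pattern, "boost": boost}}}
        for field in fields
    ]
    return {
        "query": {
            "nested": {
                "path": path,
                # At least one of the wildcard clauses has to match.
                "query": {"bool": {"should": clauses, "minimum_should_match": 1}},
            }
        }
    }


body = nested_multi_field_wildcard(
    "messages", ["message", "subject", "email_search"], "*nne*"
)
```

The resulting dict can be passed as the search body, for example to the Python client's search call used elsewhere on this page.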
Note that wildcard search is not advisable because of the number of pattern-matching operations it has to perform, which increases the latency of the response. Instead, I would recommend looking into the Ngram Tokenizer, which lets you use a simple match query to get your desired result.
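As a rough illustration of the ngram approach, here is a sketch of index settings and a matching nested query, built as plain Python dicts. The names ngram_tokenizer and ngram_analyzer, the trigram size, and the mapping of messages.message are assumptions for this example, not taken from the question's actual index.

```python
def ngram_index_body(min_gram=3, max_gram=3):
    """Index body with a custom ngram analyzer on messages.message.

    Analyzer and tokenizer names are made up for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "ngram_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "messages": {
                    "type": "nested",
                    "properties": {
                        "message": {"type": "text", "analyzer": "ngram_analyzer"}
                    },
                }
            }
        },
    }


index_body = ngram_index_body()

# A plain match query replaces the wildcard; "nne" is a single trigram here.
match_body = {
    "query": {
        "nested": {
            "path": "messages",
            "query": {"match": {"messages.message": "nne"}},
        }
    }
}
```

With trigrams, a longer search term is itself split into several trigrams, so results may be looser than a true substring match unless the match query is tightened (for instance with its operator option).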
Let me know if this helps!

Related

How to filter ElasticSearch results without having it affect the document score?

I am trying to filter my results on the "publication_year" field without affecting the score of the documents. However, whether I add the "range" to the query or to "filter", it seems to affect the score: documents whose "publication_year" is closer to "lte" (the upper limit of the "range") score higher.
My query:
query = {
'bool': {
'should': [
{
'match_phrase': {
"title": keywords
}
},
{
'match_phrase': {
"abstract": keywords
}
},
]
}
}
if publication_year_constraint:
range_query = {"range":{"publication_year":{"gte":publication_year_constraint, "lte": datetime.datetime.today().year}}}
query["bool"]["filter"] = [range_query]
I tried putting the "range" inside the "should" block as well, with similar results.
Try using filter context.
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated.
Example:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
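Applied to the query from the question, this could look like the sketch below. It assumes the same field names (title, abstract, publication_year); the function name build_query is made up. One detail worth hedging on: once a bool query contains a filter clause, its should clauses become optional by default, so the sketch sets minimum_should_match explicitly to keep requiring a keyword match.

```python
import datetime


def build_query(keywords, min_year=None):
    """Score on title/abstract phrases; restrict years in filter context."""
    query = {
        "bool": {
            "should": [
                {"match_phrase": {"title": keywords}},
                {"match_phrase": {"abstract": keywords}},
            ],
            # Without this, adding "filter" makes the should clauses optional.
            "minimum_should_match": 1,
        }
    }
    if min_year is not None:
        # Filter clauses only include/exclude documents; they add no score.
        query["bool"]["filter"] = [
            {
                "range": {
                    "publication_year": {
                        "gte": min_year,
                        "lte": datetime.date.today().year,
                    }
                }
            }
        ]
    return query


q = build_query("deep learning", min_year=2015)
```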

ElasticSearch - Compile Error on Adding a Field?

Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "e" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch
es = Elasticsearch(["https://myESserver:9200"],
http_auth=('myUsername', 'myPassword'))
query_to_add_direction_field = {
"script": {
"inline": "direction=\"e\"",
"lang": "painless"
},
"query": {"constant_score": {
"filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}}}
}
results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
"size": "1",
"query": {
"bool": {
"filter": {
"term": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
}
}
}
query_to_add_direction_field = {
"script": {
"source": "ctx._source['egress'] = true",
"lang": "painless"
},
"query": {
"bool": {
"filter": {
"term": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
}
}
}
results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say I think I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query... but I think that just means the big 12B-row index is still being updated to match the change I made. Does that sound possibly accurate?
Please try the following query:
{
"script": {
"source": "ctx._source.direction = 'e'",
"lang": "painless"
},
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
]
}
}
}
}
}
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one that the update_by_query operation expects, for example because another process updated that doc at the same time.
You can add /_update_by_query?conflicts=proceed to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think it is a transient conflict, note that the single-document Update API (rather than update_by_query) accepts retry_on_conflict to try again after a conflict:
retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
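Putting the corrected script and the conflicts=proceed option together, a request body could be built as in the sketch below. The helper name build_update_by_query is hypothetical, and passing the field name and value through script params is a choice made for this example, not something from the original answer.

```python
def build_update_by_query(doc_id, field, value):
    """Body for _update_by_query that sets one field on one document."""
    return {
        "script": {
            # ctx._source is the document being updated; params avoids
            # string-building the script for each value.
            "source": "ctx._source[params.field] = params.value",
            "lang": "painless",
            "params": {"field": field, "value": value},
        },
        "query": {"term": {"_id": doc_id}},
    }


body = build_update_by_query("YKReAoQBk7dLIXMBhYBF", "direction", "e")

# With the client from the question this might be sent as (not executed here):
#   es.update_by_query(index="traffic-*", body=body, conflicts="proceed")
```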

How to Search in multiple OR conditions in Elastic search in Python

I have to search for all items in an array, combined with a static detail, in Elasticsearch.
Fields in the Elasticsearch index: tech_id, detail, volume
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
The number of tech_id values in the array can vary.
My search has to combine tech_id and detail, where tech_id varies while detail stays static. The tech_ids are combined with OR. In the end I expect the search to return documents that match any of the provided tech_ids together with the static detail.
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"match_phrase": {
"tech_id": tech_ids[0]
}
}]
}
What you're after, I think, is a bool-should within a bool-must:
{
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"bool": {
"should":
[{
"match_phrase": { "tech_id": tid }
} for tid in tech_ids]
}
}
]
}
}
}
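Wrapped in a function, the same structure can be built for any list of ids. This is a sketch only; the function name detail_with_any_tech_id is made up, and rejecting an empty list is a choice made here rather than something the answer specifies.

```python
def detail_with_any_tech_id(detail, tech_ids):
    """Require `detail` to match AND at least one tech_id to match."""
    if not tech_ids:
        raise ValueError("tech_ids must contain at least one id")
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"detail": detail}},
                    {
                        # Inner bool with only should clauses: one must match.
                        "bool": {
                            "should": [
                                {"match_phrase": {"tech_id": tid}}
                                for tid in tech_ids
                            ]
                        }
                    },
                ]
            }
        }
    }


tech_ids = ["qwe1", "qwe2", "qwe3"]
body = detail_with_any_tech_id("calci", tech_ids)
```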

Multiple queries in one ElasticSearch Query

Here an example of an item indexed in ES :
{
"_id" : ..,
"class": "A",
"name": "item1"
}
I want a single query where I can get all items of the same class of the item with name "item1". So basically, I want all indexed items with class A, with only having the name.
I can do it with 2 queries :
Query 1 :
SEARCH
{
"query": {
"query_string": {
"default_field": "name",
"query": "item1"
}
}
}
Then from this I get the class and I write this query :
SEARCH
{
"query": {
"query_string": {
"default_field": "class",
"query": "A"
}
}
}
Any idea ? I know there's an easy way but I can't find it...
You can combine multiple queries using a bool query. In this case, two criteria must be satisfied, so both queries should be must clauses:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "item1",
"fields": [
"name"
]
}
},
{
"query_string": {
"query": "A",
"fields": [
"class"
]
}
}
]
}
}
}
If you don't need relevancy scores, which it doesn't appear that you do in this case, both queries could be filter clauses instead of must clauses.
If name and class are mapped as keyword datatypes, you may want to use a term-level query as opposed to a full-text query like query_string query. Here's what that would look like, using filter clauses
{
"query": {
"bool": {
"filter": [
{
"term": {
"name": {
"value": "item1"
}
}
},
{
"term": {
"class": {
"value": "A"
}
}
}
]
}
}
}
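If the set of exact-match criteria varies, the filter clauses can be generated from a dict. A minimal sketch, assuming every field involved is mapped as keyword; the helper name term_filter_query is hypothetical.

```python
def term_filter_query(criteria):
    """Build a bool query whose filter holds one term clause per field."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: {"value": value}}}
                    for field, value in criteria.items()
                ]
            }
        }
    }


body = term_filter_query({"name": "item1", "class": "A"})
```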

Getting linked documents in single lookup query in Elastic Search

To provide some context :
I want to write a bulk update query (possibly affecting 0.5-1M docs). The update would be to the aspects field (shown below), whose values are mostly duplicated.
My thinking was if I normalised it into another entity (aspect_label), the amount of docs updated would be reduced drastically (say 500-1000 max).
Query : I want to find out if there is a way to get linked documents via id in Elastic Search.
Eg. if I have documents in index my_db according to the mapping below.
Just to point out : processed_reviews is a child of aspect_label
{
"my_db":{
"mappings":{
"processed_reviews":{
"_all":{
"enabled":false
},
"_parent":{
"type":"aspect_label"
},
"_routing":{
"required":true
},
"properties":{
"data":{
"properties":{
"insights":{
"type":"nested",
"properties":{
"aspects":{
"type":"nested",
"properties":{
"aspect_label_id":{
"type":"keyword"
},
"aspect_term_frequency":{
"type":"long"
}
}
}
}
},
"preprocessed_text":{
"type":"text"
},
"preprocessed_title":{
"type":"text"
}
}
}
}
}
}
}
}
And another entity aspect_label :
{
"my_db": {
"mappings": {
"aspect_label": {
"_all": {
"enabled": false
},
"properties": {
"aspect": {
"type": "keyword"
},
"aspect_label_new": {
"type": "keyword"
},
"aspect_label_old": {
"type": "text"
}
}
}
}
}
}
Now, I want to write a search query on the processed_reviews type such that aspect_label_id is replaced with the value of aspect_label_new, or with the entire doc in aspect_label matching that id.
{
"_index":"my_db",
"_type":"processed_reviews",
"_id":"191b3bff-4915-4404-a05a-10e6bd2b19d4",
"_score":1,
"_routing":"5",
"_parent":"5",
"_source":{
"data":{
"preprocessed_text":"Good product I really like so comfortable and so light wait and looks good",
"preprocessed_title":"Good choice",
"insights":[
{
"aspects":[
{
"aspect_label":"color",
"aspect_term_frequency":1
}
]
}
]
}
}
}
Also, if there is a better way to approach this problem, if something is wrong with my approach, or if this is simply not possible, please let me know.
