Using Python, I'm trying to go row by row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "i" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}
        }
    }
}

results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say I think I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query... but I think that just means the big 12B-document index is still being updated to match the change I made. Does that sound right?
Please try the following query:
{
"script": {
"source": "ctx._source.direction = 'e'",
"lang": "painless"
},
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"_id": "YKReAoQBk7dLIXMBhYBF"
}
}
]
}
}
}
}
}
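From the Python client in the question, that body can be passed to update_by_query as-is. A minimal sketch, reusing the es client and index pattern from the question, with the corrected body stored in a dict I'm calling fixed_query:

fixed_query = {
    "script": {"source": "ctx._source.direction = 'e'", "lang": "painless"},
    "query": {"constant_score": {"filter": {"bool": {"must": [
        {"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}
    ]}}}}
}
results = es.update_by_query(index="myIndex-*", body=fixed_query)
print(results["updated"])  # number of documents the script actually changed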
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one that the update_by_query operation expects, for example because another process updated that document at the same time.
You can add /_update_by_query?conflicts=proceed to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
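In the Python client from the question, the same option can be passed as a keyword argument; a minimal sketch (again assuming the es client and the fixed_query body shown above):

results = es.update_by_query(
    index="myIndex-*",
    body=fixed_query,      # the corrected script/query body from above
    conflicts="proceed",   # count version conflicts instead of failing the request
)
print(results["version_conflicts"])  # how many conflicting docs were skipped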
If you think it is a transient conflict, you can use retry_on_conflict to try again after the conflict:
retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
I want to search on an array value in Elasticsearch using a wildcard.
{
"query": {
"wildcard": {
"short_message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
I am searching on "short_message" and it's working for me.
But when I search on "messages.message", it's not working:
{
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
I also want to search across multiple fields in the array.
For example:
fields: ["messages.message", "messages.subject", "messages.email_search"]
Is it possible? If so, please give me the best solution.
Thanks in advance.
It seems like you are making use of the nested datatype for messages.
You would need to make use of a nested query for this:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
}
}
}
}
For multi-field querying, you can probably do it using query_string, so basically your solution would be to make use of query_string inside a nested query.
Query String:
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"query_string": {
"fields": ["messages.message", "messages.subject"],
"query": "*nne*",
"boost": 1
}
}
}
}
}
Query DSL
You can also make use of wildcard queries in the Query DSL, but then you need to add a separate query clause for every field; I suspect that, for performance reasons, wildcard queries don't support multi-field querying.
POST <your_index_name>/_search
{
"query": {
"nested": {
"path": "messages",
"query": {
"bool": {
"should": [
{
"wildcard": {
"messages.message": {
"value": "*nne*",
"boost": 1
}
}
},
{
"wildcard": {
"messages.subject": {
"value": "*nne*",
"boost": 1
}
}
}
]
}
}
}
}
}
Note that wildcard search is not advisable because of the number of regex operations it has to perform, which will affect your response latency. Instead, I would recommend looking into the Ngram Tokenizer, with which you can use a simple match query to get your desired result.
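To illustrate the n-gram suggestion, here is a rough sketch. It is not from the question: the index name, analyzer name, and gram sizes are assumptions, and it uses 7.x-style Python client calls. The messages.message field is indexed with a trigram analyzer, so a plain nested match finds the substring "nne":

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

es.indices.create(
    index="my_messages",  # placeholder index name
    body={
        "settings": {
            "analysis": {
                "tokenizer": {
                    "trigram_tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 3}
                },
                "analyzer": {
                    "trigram_analyzer": {
                        "type": "custom",
                        "tokenizer": "trigram_tokenizer",
                        "filter": ["lowercase"]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "messages": {
                    "type": "nested",
                    "properties": {
                        "message": {"type": "text", "analyzer": "trigram_analyzer"}
                    }
                }
            }
        }
    }
)

# With trigrams in the index, the wildcard becomes a simple nested match:
es.search(
    index="my_messages",
    body={
        "query": {
            "nested": {
                "path": "messages",
                "query": {"match": {"messages.message": "nne"}}
            }
        }
    }
)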
Let me know if this helps!
I have a basic Elasticsearch index that consists of a variety of help articles. Users can search for them in my Python/Django app.
The index has the following mappings:
{
"mappings": {
"properties": {
"body": {
"type": "text"
},
"category": {
"type": "nested",
"properties": {
"category_id": {
"type": "long"
},
"category_title": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
}
}
},
"title": {
"type": "keyword"
},
"date_updated": {
"type": "date"
},
"position": {
"type": "integer"
}
}
}
}
I basically want the user to be able to search for a query and get any results that match the article title or category.
Say I have an article called "I Can't Remember My Password" in the "Your Account" category.
If I search for the article title exactly, I see the result. If I search for the category title exactly, I also see the result.
But if I search for just "password", I get nothing. What do I need to change in my setup/query to make it so that this query (or similarly non-exact queries) also returns the result?
My query looks like:
{
"query": {
"bool": {
"should": [{
"multi_match": {
"fields": ["title"],
"query": "password"
}
},
{
"nested": {
"path": "category",
"query": {
"multi_match": {
"fields": ["category.category_title"],
"query": "password"
}
}
}
}
]
}
}
}
I have read other questions and experimented with various settings, but no luck so far. I am not doing anything particularly special at index time in terms of preparing the fields, so I don't know if that's something to look at. I'm just using the elasticsearch-dsl defaults.
The solution was to reindex the title field as text rather than keyword. The latter only allows exact matching.
Credit to LeBigCat for pointing that out in the comments. They haven't posted it as an answer so I'm doing it on their behalf to improve visibility.
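For reference, a minimal sketch of the corrected mapping using elasticsearch-dsl (which the question says is in use); the class name, index name, and keyword sub-field are illustrative assumptions, not from the original answer:

from elasticsearch_dsl import Document, Keyword, Text, connections

connections.create_connection(hosts=["http://localhost:9200"])  # placeholder host

class HelpArticle(Document):
    # "title" as analyzed text so partial queries like "password" can match,
    # with a keyword sub-field kept for exact matching and sorting.
    title = Text(fields={"keyword": Keyword(ignore_above=256)})

    class Index:
        name = "help-articles-v2"  # placeholder name for the reindex target

HelpArticle.init()  # creates the new index with the corrected title mapping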
I am trying to use scripting in Elasticsearch to update some data. My script is the following:
for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                "inline": "ctx._source.items.suggestionTime = updated_time",
                "params": {
                    "updated_time": {
                        "field": df['suggestionTime'][i]
                    }
                }
            }
        }
    )
But when I do that I get the following error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code,error_message, additional_info) elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', '[jLIZdmn][127.0.0.1:9300][indices:data/write/update[s]]')
And I have looked at this question to enable it, but even with this and the documentation it still raises the same error. I inserted the following elements in the config/elasticsearch.yml file:
script.inline: true
script.indexed: true
script.update: true
But I still cannot avoid the RequestError that I have had since the beginning.
You are almost there; you just need to add params. before updated_time:
{
"script": {
"inline": "ctx._source.items.suggestionTime = params.updated_time",
"params": {
"updated_time": {
"field": df['suggestionTime'][i]
}
}
}
}
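Plugged back into the loop from the question, the corrected call would look roughly like this (same arguments as in the question; only the script source changes):

for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                # the parameter is now referenced through the params object
                "inline": "ctx._source.items.suggestionTime = params.updated_time",
                "params": {
                    "updated_time": {
                        "field": df['suggestionTime'][i]
                    }
                }
            }
        }
    )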
If you were to run your original query in the Kibana console, it would look something like this:
POST /my-index-2018-12/doc/AWdpylbN3HZjlM-Ibd7X/_update
{
"script": {
"inline": "ctx._source.suggestionTime = updated_time",
"params": {
"updated_time": {
"field": "2018-10-03T18:33:00Z"
}
}
}
}
You would see the entire response from Elasticsearch, which looks like your error message plus some valuable details:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[7JNqOhT][127.0.0.1:9300][indices:data/write/update[s]]"
}
],
"type": "illegal_argument_exception",
"reason": "failed to execute script",
"caused_by": {
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"... _source.suggestionTime = updated_time",
" ^---- HERE"
],
"script": "ctx._source.suggestionTime = updated_time",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Variable [updated_time] is not defined."
}
}
},
"status": 400
}
This points us to the syntax error (parameters, apparently, are injected as a params object).
I believe the scripting settings are not the source of the problem in this case.
Hope that helps!
To provide some context:
I want to write a bulk update query (possibly affecting 0.5-1M docs). The update would be in the aspects field (shown below), whose values are mostly duplicated.
My thinking was that if I normalised it into another entity (aspect_label), the number of docs updated would be reduced drastically (say 500-1000 max).
Query: I want to find out if there is a way to get linked documents via id in Elasticsearch.
E.g., say I have documents in the index my_db according to the mapping below.
Just to point out: processed_reviews is a child of aspect_label.
{
"my_db":{
"mappings":{
"processed_reviews":{
"_all":{
"enabled":false
},
"_parent":{
"type":"aspect_label"
},
"_routing":{
"required":true
},
"properties":{
"data":{
"properties":{
"insights":{
"type":"nested",
"properties":{
"aspects":{
"type":"nested",
"properties":{
"aspect_label_id":{
"type":"keyword"
},
"aspect_term_frequency":{
"type":"long"
}
}
}
}
},
"preprocessed_text":{
"type":"text"
},
"preprocessed_title":{
"type":"text"
}
}
}
}
}
}
}
}
And another entity aspect_label :
{
"my_db": {
"mappings": {
"aspect_label": {
"_all": {
"enabled": false
},
"properties": {
"aspect": {
"type": "keyword"
},
"aspect_label_new": {
"type": "keyword"
},
"aspect_label_old": {
"type": "text"
}
}
}
}
}
}
Now, I want to write a search query on the processed_reviews type such that the aspect_label_id field is replaced with the value of aspect_label_new, or with the entire aspect_label doc matching that id.
{
"_index":"my_db",
"_type":"processed_reviews",
"_id":"191b3bff-4915-4404-a05a-10e6bd2b19d4",
"_score":1,
"_routing":"5",
"_parent":"5",
"_source":{
"data":{
"preprocessed_text":"Good product I really like so comfortable and so light wait and looks good",
"preprocessed_title":"Good choice",
"insights":[
{
"aspects":[
{
"aspect_label":"color",
"aspect_term_frequency":1
}
]
}
]
}
}
}
Also, if there is a better way to approach this problem, if something is wrong with my approach, or if this is not possible at all, please let me know.
I have a managed Elasticsearch (5.3) instance on AWS.
I want to do a sort on the results in Elasticsearch, but I always get
TransportError(500, u'search_phase_execution_exception', u'runtime error')
and I don't know why.
Looking into it in Kibana, I get the following error.
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:336)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:111)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:87)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:84)",
"java.security.AccessController.doPrivileged(Native Method)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:84)",
"doc['value'].value.length()",
" ^---- HERE"
],
"script": "doc['value'].value.length()",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [value] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
And my query is:
"query": {
"query_string": {
"fields": [
"value"
],
"query": "*a*"
}
},
"sort": {
"_script": {
"script": "doc['value'].value.length()",
"order": "asc",
"type": "string"
}
}
Do scripts work in AWS Elasticsearch?
I just want to order my results by the string length.
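For context on the quoted error: the message itself states the change it expects, namely fielddata=true on the value field. A minimal sketch of that mapping update, using the Python client for illustration (the endpoint, index, and type names are placeholders; as the message warns, fielddata on text fields can use significant heap memory):

from elasticsearch import Elasticsearch

es = Elasticsearch(["https://my-aws-endpoint:443"])  # placeholder endpoint

es.indices.put_mapping(
    index="my-index",      # placeholder index name
    doc_type="my-type",    # mapping types are still required on Elasticsearch 5.3
    body={
        "properties": {
            "value": {
                "type": "text",
                "fielddata": True  # enables the doc['value'] access the sort script needs
            }
        }
    }
)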