AWS Elasticsearch Scripts - Python

I have a managed Elasticsearch (5.3) instance on AWS.
I want to sort the results in Elasticsearch, but I always get
TransportError(500, u'search_phase_execution_exception', u'runtime error')
and I don't know why.
Looking into it in Kibana, I get the following error:
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:336)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:111)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:87)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:84)",
"java.security.AccessController.doPrivileged(Native Method)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:84)",
"doc['value'].value.length()",
" ^---- HERE"
],
"script": "doc['value'].value.length()",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [value] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
And my query is:
"query": {
"query_string": {
"fields": [
"value"
],
"query": "*a*"
}
},
"sort": {
"_script": {
"script": "doc['value'].value.length()",
"order": "asc",
"type": "string"
}
}
Do scripts work in AWS Elasticsearch?
I just want to order my results by string length.
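For what it's worth, the stack trace points at the script rather than at AWS: Painless cannot read doc['value'] because value is a text field and fielddata is disabled by default. Below is a minimal sketch of one common fix with the Python client, assuming the index has (or can be remapped with) a value.keyword sub-field; the endpoint and index name are hypothetical. Note the sort type should then be number, since the script returns a length:

from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"])  # hypothetical endpoint

body = {
    "query": {
        "query_string": {
            "fields": ["value"],
            "query": "*a*"
        }
    },
    "sort": {
        "_script": {
            # keyword sub-fields carry doc values, so no fielddata is needed
            "script": "doc['value.keyword'].value.length()",
            "type": "number",  # the script returns a numeric length
            "order": "asc"
        }
    }
}
results = es.search(index="my-index", body=body)  # hypothetical index name

The alternative that the error message itself suggests, setting fielddata=true on the text field, also works, but it loads the field into heap memory.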

Related

ElasticSearch - Compile Error on Adding a Field?

Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and a different letter for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}
        }
    }
}

results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say "I think" because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query. But I think that just means the big 12B-document index is still being updated to reflect the change I made. Does that sound accurate?
Please try the following query:
{
  "script": {
    "source": "ctx._source.direction = 'e'",
    "lang": "painless"
  },
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "_id": "YKReAoQBk7dLIXMBhYBF"
              }
            }
          ]
        }
      }
    }
  }
}
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one the update_by_query operation expects, for example because another process updated that document at the same time.
You can append ?conflicts=proceed to the /_update_by_query call to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think it is a transient conflict, you can use retry_on_conflict to try again after the conflict:
retry_on_conflict
(Optional, integer) Specify how many times the operation should be retried when a conflict occurs. Default: 0.
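With the Python client, conflicts can be passed as a parameter rather than appended to the URL. A minimal sketch, reusing the index pattern and request body from the question:

results = es.update_by_query(
    index="traffic-*",
    body=query_to_add_direction_field,
    conflicts="proceed",  # count version conflicts instead of aborting the run
)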

Elasticsearch not returning result for single word query

I have a basic Elasticsearch index that consists of a variety of help articles. Users can search for them in my Python/Django app.
The index has the following mappings:
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text"
      },
      "category": {
        "type": "nested",
        "properties": {
          "category_id": {
            "type": "long"
          },
          "category_title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "title": {
        "type": "keyword"
      },
      "date_updated": {
        "type": "date"
      },
      "position": {
        "type": "integer"
      }
    }
  }
}
I basically want the user to be able to search for a query and get any results that match the article title or category.
Say I have an article called "I Can't Remember My Password" in the "Your Account" category.
If I search for the article title exactly, I see the result. If I search for the category title exactly, I also see the result.
But if I search for just "password", I get nothing. What do I need to change in my setup/query to make it so that this query (or similarly non-exact queries) also returns the result?
My query looks like:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": ["title"],
            "query": "password"
          }
        },
        {
          "nested": {
            "path": "category",
            "query": {
              "multi_match": {
                "fields": ["category.category_title"],
                "query": "password"
              }
            }
          }
        }
      ]
    }
  }
}
I have read other questions and experimented with various settings, but no luck so far. I am not doing anything particularly special at index time to prepare the fields, so I don't know if that's something to look at. I'm just using the elasticsearch-dsl defaults.
The solution was to reindex the title field as text rather than keyword. The latter only allows exact matching.
Credit to LeBigCat for pointing that out in the comments. They haven't posted it as an answer so I'm doing it on their behalf to improve visibility.
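For reference, a minimal sketch of the remapped field; the new index name is hypothetical (an existing field's type cannot be changed in place, so reindexing is required), and the keyword sub-field keeps exact matching available:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connection details as appropriate

es.indices.create(index="help-articles-v2", body={  # hypothetical new index
    "mappings": {
        "properties": {
            "title": {
                "type": "text",  # analyzed, so "password" matches inside the title
                "fields": {
                    # exact matching still possible via title.keyword
                    "keyword": {"type": "keyword", "ignore_above": 256}
                }
            }
        }
    }
})

After reindexing the articles into it, the multi_match on title should match single words like "password".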

Update with scripting in Elasticsearch

I am trying to use scripting in Elasticsearch to update some data. My script is the following:
for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                "inline": "ctx._source.items.suggestionTime = updated_time",
                "params": {
                    "updated_time": {
                        "field": df['suggestionTime'][i]
                    }
                }
            }
        }
    )
But when I do that I get the following error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code,error_message, additional_info) elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', '[jLIZdmn][127.0.0.1:9300][indices:data/write/update[s]]')
And I have looked at this question to enable it, but even with this and the documentation it still raises the same error. I inserted the following elements in the config/elasticsearch.yml file :
script.inline: true
script.indexed: true
script.update: true
But I still cannot avoid the RequestError that I have had since the beginning.
You are almost there; you just need to add params. before updated_time:
{
  "script": {
    "inline": "ctx._source.items.suggestionTime = params.updated_time",
    "params": {
      "updated_time": {
        "field": df['suggestionTime'][i]
      }
    }
  }
}
If you would try to run your query in Kibana console, it would look something like this:
POST /my-index-2018-12/doc/AWdpylbN3HZjlM-Ibd7X/_update
{
  "script": {
    "inline": "ctx._source.suggestionTime = updated_time",
    "params": {
      "updated_time": {
        "field": "2018-10-03T18:33:00Z"
      }
    }
  }
}
You would see the entire response from Elasticsearch, which looks like your error message plus some valuable details:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[7JNqOhT][127.0.0.1:9300][indices:data/write/update[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to execute script",
    "caused_by": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "... _source.suggestionTime = updated_time",
        "                             ^---- HERE"
      ],
      "script": "ctx._source.suggestionTime = updated_time",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Variable [updated_time] is not defined."
      }
    }
  },
  "status": 400
}
This points us to the syntax error: parameters, apparently, are injected as the params object.
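Folded back into the Python loop from the question, a minimal sketch (reusing es, df, and indexout from there; it also assumes the intent is to store the raw timestamp, so the value is passed directly rather than wrapped in {"field": ...}):

for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                "inline": "ctx._source.items.suggestionTime = params.updated_time",
                "params": {
                    # plain value; params.updated_time resolves to it in the script
                    "updated_time": df['suggestionTime'][i]
                }
            }
        }
    )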
I believe the scripting settings are not the source of the problem in this case.
Hope that helps!

Elasticsearch "failed to find geo_point field [location]" when the mapping is there for that field

I have an index with the following mapping:
{
  "mappings": {
    "my_stuff_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "null_value": -1
        }
      }
    }
  }
}
I have to use the property null_value because some of my documents don't have information about their location (latitude/longitude), but I still would like to search by distance on a location, cf. here: https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html
When checking the index mapping details, I can verify that the geo mapping is there:
curl -XGET http://localhost:9200/my_stuff_index/_mapping | jq '.my_stuff_index.mappings.my_stuff_type.properties.location'
{
  "properties": {
    "lat": {
      "type": "float"
    },
    "lon": {
      "type": "float"
    }
  }
}
However when trying to search for documents on that index using a geo distance filter (cf. https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-distance.html), then I see this:
curl -XPOST http://localhost:9200/my_stuff_index/_search -d'
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "location": {
            "lat": <PUT_LATITUDE_FLOAT_HERE>,
            "lon": <PUT_LONGITUDE_FLOAT_HERE>
          },
          "distance": "200m"
        }
      }
    }
  }
}' | jq
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to find geo_point field [location]",
        "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
        "index": "my_stuff_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_stuff_index",
        "node": "MDueSn31TS2z0Lamo64zbw",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [location]",
          "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
          "index": "my_stuff_index"
        }
      }
    ],
    "caused_by": {
      "type": "query_shard_exception",
      "reason": "failed to find geo_point field [location]",
      "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
      "index": "my_stuff_index"
    }
  },
  "status": 400
}
I think the null_value property should allow me to insert documents without that location field, and at the same time I should be able to search with filters on that same "optional" field.
Why am I not able to filter on that "optional" field? How could I do this?
Edit:
To reproduce this issue with Python, run the following code snippet before performing the curl/jq operations from the command line.
The Python code depends on: pip install elasticsearch==5.4.0.
from elasticsearch import Elasticsearch
from elasticsearch import helpers

my_docs = [
    {"xyz": "foo", "location": {"lat": 0.0, "lon": 0.0}},
    {"xyz": "bar", "location": {"lat": 50.0, "lon": 50.0}}
]

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

index_mapping = '''
{
  "mappings": {
    "my_stuff_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "null_value": -1.0
        }
      }
    }
  }
}'''

es.indices.create(index='my_stuff_index', ignore=400, body=index_mapping)
helpers.bulk(es, my_docs, index='my_stuff_index', doc_type='my_stuff_type')
As #Val has said, you should change your mapping. If you define the location field in this way:
"location": {
"type": "geo_point"
}
you can index lat and lon as two separate subfields without declaring them explicitly in the mapping, as shown above and as described in the documentation.
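One more thing worth checking: the _mapping output in the question shows location as two float sub-fields, which suggests the explicit geo_point mapping never took effect; es.indices.create(..., ignore=400) silently swallows an "index already exists" error, so the documents may have been dynamically mapped into a pre-existing index. A sketch under that assumption, recreating the index before bulk-indexing:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Drop the previously (dynamically) mapped index so the new mapping can apply.
es.indices.delete(index='my_stuff_index', ignore=404)
es.indices.create(index='my_stuff_index', body={
    "mappings": {
        "my_stuff_type": {
            "properties": {
                "location": {"type": "geo_point"}
            }
        }
    }
})

my_docs = [
    {"xyz": "foo", "location": {"lat": 0.0, "lon": 0.0}},
    {"xyz": "bar", "location": {"lat": 50.0, "lon": 50.0}}
]
helpers.bulk(es, my_docs, index='my_stuff_index', doc_type='my_stuff_type')

With the field actually mapped as geo_point, the geo_distance query above should no longer fail.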

Elastic Search: including #/hashtags in search results

Using elastic search's query DSL this is how I am currently constructing my query:
elastic_sort = [
    {"timestamp": {"order": "desc"}},
    "_score",
    {"name": {"order": "desc"}},
    {"channel": {"order": "desc"}},
]

elastic_query = {
    "fuzzy_like_this": {
        "fields": ["msgs.channel", "msgs.msg", "msgs.name"],
        "like_text": search_string,
        "max_query_terms": 10,
        "fuzziness": 0.7,
    }
}

res = self.es.search(index="chat", body={
    "from": from_result, "size": results_per_page,
    "track_scores": True,
    "query": elastic_query,
    "sort": elastic_sort,
})
I've been trying to implement a filter or an analyzer that will allow "#" to be included in searches (I want a search for "#thing" to return results that include "#thing"), but I am coming up short. The error messages I get are not helpful; they just tell me that my query is malformed.
I attempted to incorporate the method found here: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html but it doesn't make any sense to me in context.
Does anyone have a clue how I can do this?
Did you create a mapping for your index? You can specify within your mapping that certain fields should not be analyzed.
For example, a tweet mapping can be something like:
"tweet": {
"properties": {
"id": {
"type": "long"
},
"msg": {
"type": "string"
},
"hashtags": {
"type": "string",
"index": "not_analyzed"
}
}
}
You can then perform a term query on "hashtags" for an exact string match, including the "#" character.
If you want "hashtags" to be tokenized as well, you can always create a multi-field for "hashtags"; see the sketch below.
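A sketch of that multi-field variant in the same pre-5.x syntax the answer uses (string / not_analyzed); the field and index names come from the question, while the sub-field name raw is an assumption:

mapping = {
    "tweet": {
        "properties": {
            "hashtags": {
                "type": "string",  # analyzed, for full-text matching
                "fields": {
                    "raw": {
                        "type": "string",
                        "index": "not_analyzed"  # exact match, '#' preserved
                    }
                }
            }
        }
    }
}

# Exact-match lookup that keeps the '#' character:
res = self.es.search(index="chat", body={
    "query": {"term": {"hashtags.raw": "#thing"}}
})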
