Update with scripting in Elasticsearch - Python

I am trying to use scripting in Elasticsearch to update some data. My script is the following:
for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                "inline": "ctx._source.items.suggestionTime = updated_time",
                "params": {
                    "updated_time": {
                        "field": df['suggestionTime'][i]
                    }
                }
            }
        }
    )
But when I do that I get the following error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', '[jLIZdmn][127.0.0.1:9300][indices:data/write/update[s]]')
And I have looked at this question to enable scripting, but even with this and the documentation it still raises the same error. I inserted the following settings in the config/elasticsearch.yml file:
script.inline: true
script.indexed: true
script.update: true
But I still cannot avoid the RequestError that I have had since the beginning.

You are almost there, just need to add params. before updated_time:
{
    "script": {
        "inline": "ctx._source.items.suggestionTime = params.updated_time",
        "params": {
            "updated_time": {
                "field": df['suggestionTime'][i]
            }
        }
    }
}
If you were to run your query in the Kibana console, it would look something like this:
POST /my-index-2018-12/doc/AWdpylbN3HZjlM-Ibd7X/_update
{
    "script": {
        "inline": "ctx._source.suggestionTime = updated_time",
        "params": {
            "updated_time": {
                "field": "2018-10-03T18:33:00Z"
            }
        }
    }
}
You would then see the entire Elasticsearch response, which looks like your error message plus valuable details:
{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[7JNqOhT][127.0.0.1:9300][indices:data/write/update[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to execute script",
        "caused_by": {
            "type": "script_exception",
            "reason": "compile error",
            "script_stack": [
                "... _source.suggestionTime = updated_time",
                "                             ^---- HERE"
            ],
            "script": "ctx._source.suggestionTime = updated_time",
            "lang": "painless",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Variable [updated_time] is not defined."
            }
        }
    },
    "status": 400
}
This points us to the syntax error: parameters are injected as the params object.
I believe the scripting settings are not the source of the problem in this case.
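Putting the fix back into the question's Python loop, a minimal sketch reusing the question's df, indexout, and es objects (note the question wrapped the parameter value in {"field": ...}, which would store an object rather than the timestamp itself, so this sketch passes the value directly):
for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                # the fix: reference the parameter via params.
                "inline": "ctx._source.items.suggestionTime = params.updated_time",
                "params": {
                    # pass the timestamp directly instead of {"field": ...}
                    "updated_time": df['suggestionTime'][i]
                }
            }
        }
    )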
Hope that helps!

Related

ElasticSearch - Compile Error on Adding a Field?

Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "i" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}
        }
    }
}

results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say I think I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query... but I think that just means the big 12B-row index is still being updated to reflect the change I made. Does that sound plausible?
Please try the following query:
{
    "script": {
        "source": "ctx._source.direction = 'e'",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "_id": "YKReAoQBk7dLIXMBhYBF"
                            }
                        }
                    ]
                }
            }
        }
    }
}
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one the update_by_query operation expects, for example because another process updated that document at the same time.
You can add conflicts=proceed to the _update_by_query request to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think it is a transient conflict, you can use retry_on_conflict to try again after the conflict:
retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
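With the Python client, conflicts=proceed can be passed as a keyword argument. A minimal sketch, reusing the question's traffic-* index and query_to_add_direction_field:
# conflicts="proceed" makes Elasticsearch count version conflicts
# instead of aborting the whole update-by-query run.
results = es.update_by_query(
    index="traffic-*",
    body=query_to_add_direction_field,
    conflicts="proceed",
)
print(results["updated"], "updated,", results["version_conflicts"], "version conflicts")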

Aggregation query fails using ElasticSearch Python client

Here is an aggregation query that works as expected when I run it in Dev Tools on Elasticsearch:
search_query = {
    "aggs": {
        "SHAID": {
            "terms": {
                "field": "identiferid",
                "order": {
                    "sort": "desc"
                },
                # "size": 100000
            },
            "aggs": {
                "update": {
                    "date_histogram": {
                        "field": "endTime",
                        "calendar_interval": "1d"
                    },
                    "aggs": {
                        "update1": {
                            "sum": {
                                "script": {
                                    "lang": "painless",
                                    "source": """
                                        if (doc['distanceIndex.att'].size()!=0) {
                                            return doc['distanceIndex.att'].value;
                                        }
                                        else {
                                            if (doc['distanceIndex.att2'].size()!=0) {
                                                return doc['distanceIndex.att2'].value;
                                            }
                                            return null;
                                        }
                                    """
                                }
                            }
                        },
                        "update2": {
                            "sum": {
                                "script": {
                                    "lang": "painless",
                                    "source": """
                                        if (doc['distanceIndex.att3'].size()!=0) {
                                            return doc['distanceIndex.att3'].value;
                                        }
                                        else {
                                            if (doc['distanceIndex.att4'].size()!=0) {
                                                return doc['distanceIndex.att4'].value;
                                            }
                                            return null;
                                        }
                                    """
                                }
                            }
                        }
                    }
                },
                "sort": {
                    "sum": {
                        "field": "time2"
                    }
                }
            }
        }
    },
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "endTime": {
                            "gte": "2021-11-01T00:00:00Z",
                            "lt": "2021-11-03T00:00:00Z"
                        }
                    }
                }
            ]
        }
    }
}
When I attempt to execute this aggregation using the Python Elasticsearch client (https://elasticsearch-py.readthedocs.io/en/v7.15.1/) I receive the exception:
exception search() got multiple values for keyword argument 'size'
If I remove the attribute:
"size": 0,
from the query, the exception is not thrown, but the aggregation does not run, as "size": 0 is required for an aggregation.
Is there a different query format I should use for performing aggregations with the Python Elasticsearch client?
Update:
Here is the code used to invoke the query:
import elasticsearch
from elasticsearch import Elasticsearch, helpers

es_client = Elasticsearch(
    ["https://test-elastic.com"],
    scheme="https",
    port=443,
    http_auth=("test-user", "test-password"),
    maxsize=400,
    timeout=120,
    max_retries=10,
    retry_on_timeout=True
)

query_response = helpers.scan(client=es_client,
                              query=search_query,
                              index="test_index",
                              clear_scroll=False,
                              request_timeout=1500)

rows = []
try:
    for row in query_response:
        rows.append(row)
except Exception as e:
    print('exception', e)
Using es_client directly:
es_client.search(index="test_index", query=search_query)
results in the error:
/opt/oss/conda3/lib/python3.7/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
336
337 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
--> 338 status_code, error_message, additional_info
339 )
340
RequestError: RequestError(400, 'parsing_exception', 'unknown query [aggs]')
Is aggs valid for the search API?
helpers.scan is a "Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlining scroll requests."
It's meant to iterate through large result sets and comes with a default keyword argument of size=1000, which is what collides with the "size" key in your query and produces the "multiple values for keyword argument" error.
To run an aggregation, use the es_client.search() method directly, passing in your query as body, and including "size": 0 in the query should be fine.
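A minimal sketch of that call, reusing the question's es_client and search_query (the SHAID terms aggregation name comes from the question):
# Pass the whole request, including "aggs" and "size": 0, as the body.
response = es_client.search(index="test_index", body=search_query)

# Hits are suppressed by "size": 0; results live under "aggregations".
for bucket in response["aggregations"]["SHAID"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])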

AWS Lex: sending a response from lambda function with python

I am having trouble sending a JSON response from my Python 3.8 Lambda function (the default lambda_handler function). I am pretty sure I understand what I am doing after reading most of the docs and the Lambda Function Input Event and Response Format. From that resource, it says the only required section is the 'dialogAction' section.
Right now, my Lex bot has one intent and one slot. I know that this works because when I add a logger to the code, I can see that my Lambda function is receiving the input JSON in the expected format.
My code tries to send a final response from the lambda function, but when I run the lex-bot in the console I get the following error:
Invalid Lambda Response: Received invalid response from Lambda: Can not construct instance of IntentResponse, problem: The validated object is null at [Source: {"dialogAction": {"type": "Close", "fulfillmentState": "Fulfilled", "message": {"contentType": "PlainText", "content": "milk"}}}; line: 1, column: 128]
Here is my python code:
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    item = event["sessionState"]["intent"]["slots"]["MilkProduct"]["value"]["resolvedValues"][0]
    logger.debug(item)
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": item
            }
        }
    }
I do not think this is necessary for you to see, but here is what the Lex bot sends me after the slot for the intent has been confirmed:
{
    "sessionId": "120304235774457",
    "inputTranscript": "I want to buy milk",
    "interpretations": [
        {
            "intent": {
                "slots": {
                    "MilkProduct": {
                        "shape": "Scalar",
                        "value": {
                            "originalValue": "milk",
                            "resolvedValues": [
                                "milk"
                            ],
                            "interpretedValue": "milk"
                        }
                    }
                },
                "confirmationState": "None",
                "name": "BuyCream",
                "state": "ReadyForFulfillment"
            },
            "nluConfidence": 1
        },
        {
            "intent": {
                "slots": {},
                "confirmationState": "None",
                "name": "FallbackIntent",
                "state": "ReadyForFulfillment"
            }
        }
    ],
    "responseContentType": "text/plain; charset=utf-8",
    "invocationSource": "FulfillmentCodeHook",
    "messageVersion": "1.0",
    "sessionState": {
        "intent": {
            "slots": {
                "MilkProduct": {
                    "shape": "Scalar",
                    "value": {
                        "originalValue": "milk",
                        "resolvedValues": [
                            "milk"
                        ],
                        "interpretedValue": "milk"
                    }
                }
            },
            "confirmationState": "None",
            "name": "BuyCream",
            "state": "ReadyForFulfillment"
        },
        "originatingRequestId": "417dff57-5260-45cc-81a7-06df13fbee9a"
    },
    "inputMode": "Text",
    "bot": {
        "aliasId": "TSTALIASID",
        "aliasName": "TestBotAlias",
        "name": "Shopping",
        "version": "DRAFT",
        "localeId": "en_US",
        "id": "JTGNDOEVQG"
    }
}
Can someone please tell me what I am doing wrong? I have been at this for hours and I seriously do not know what is wrong.
Thanks
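The event shown above is in the Lex V2 format (note the sessionState wrapper), while a bare dialogAction response is the V1 shape, which would explain why Lex cannot construct an IntentResponse from it. A hedged sketch of a V2-style return, assuming the intent name and state must be echoed back inside sessionState and that a top-level messages list replaces dialogAction.message:
# Sketch of a Lex V2 response; field names follow the V2 sessionState
# wrapper, and the intent is echoed back from the incoming event.
def lambda_handler(event, context):
    item = event["sessionState"]["intent"]["slots"]["MilkProduct"]["value"]["resolvedValues"][0]
    return {
        "sessionState": {
            "dialogAction": {
                "type": "Close"
            },
            "intent": {
                "name": event["sessionState"]["intent"]["name"],
                "state": "Fulfilled"
            }
        },
        "messages": [
            {
                "contentType": "PlainText",
                "content": item
            }
        ]
    }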

Elasticsearch "failed to find geo_point field [location]" when the mapping is there for that field

I have an index with the following mapping:
{
    "mappings": {
        "my_stuff_type": {
            "properties": {
                "location": {
                    "type": "geo_point",
                    "null_value": -1
                }
            }
        }
    }
}
I have to use the property null_value because some of my documents don't have information about their location (latitude/longitude), but I still would like to search by distance on a location, cf. here: https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html
When checking the index mapping details, I can verify that the geo mapping is there:
curl -XGET http://localhost:9200/my_stuff_index/_mapping | jq '.my_stuff_index.mappings.my_stuff_type.properties.location'
{
    "properties": {
        "lat": {
            "type": "float"
        },
        "lon": {
            "type": "float"
        }
    }
}
However, when trying to search for documents on that index using a geo distance filter (cf. https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-distance.html), I see this:
curl -XPOST http://localhost:9200/my_stuff_index/_search -d'
{
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "location": {
                        "lat": <PUT_LATITUDE_FLOAT_HERE>,
                        "lon": <PUT_LONGITUDE_FLOAT_HERE>
                    },
                    "distance": "200m"
                }
            }
        }
    }
}' | jq
{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "failed to find geo_point field [location]",
                "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
                "index": "my_stuff_index"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "my_stuff_index",
                "node": "MDueSn31TS2z0Lamo64zbw",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to find geo_point field [location]",
                    "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
                    "index": "my_stuff_index"
                }
            }
        ],
        "caused_by": {
            "type": "query_shard_exception",
            "reason": "failed to find geo_point field [location]",
            "index_uuid": "mO94yEsHQseQDFPkHjM6tA",
            "index": "my_stuff_index"
        }
    },
    "status": 400
}
I think the null_value property should allow me to insert documents without that location field, and at the same time I should be able to search with filters on that same "optional" field.
Why am I not able to filter on that "optional" field? How could I do this?
Edit:
To reproduce this issue with Python, run the following code snippet before performing the curl/jq operations from the command line.
The Python code depends on: pip install elasticsearch==5.4.0.
from elasticsearch import Elasticsearch
from elasticsearch import helpers

my_docs = [
    {"xyz": "foo", "location": {"lat": 0.0, "lon": 0.0}},
    {"xyz": "bar", "location": {"lat": 50.0, "lon": 50.0}}
]

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

index_mapping = '''
{
    "mappings": {
        "my_stuff_type": {
            "properties": {
                "location": {
                    "type": "geo_point",
                    "null_value": -1.0
                }
            }
        }
    }
}'''

es.indices.create(index='my_stuff_index', ignore=400, body=index_mapping)
helpers.bulk(es, my_docs, index='my_stuff_index', doc_type='my_stuff_type')
As @Val said, you should change your mapping. If you define the location field in this way:
"location": {
    "type": "geo_point"
}
you can index lat and lon as two subfields without declaring them explicitly in the mapping, as described in the documentation - look here
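A minimal sketch of the question's repro code with that mapping change (an assumption: the index can be dropped and recreated; without null_value, documents that lack a location simply omit the field):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Recreate the index with the plain geo_point mapping from the answer;
# ignore=404 in case the index does not exist yet.
es.indices.delete(index='my_stuff_index', ignore=404)
es.indices.create(index='my_stuff_index', body='''
{
    "mappings": {
        "my_stuff_type": {
            "properties": {
                "location": {"type": "geo_point"}
            }
        }
    }
}''')

my_docs = [
    {"xyz": "foo", "location": {"lat": 0.0, "lon": 0.0}},
    {"xyz": "bar", "location": {"lat": 50.0, "lon": 50.0}}
]
helpers.bulk(es, my_docs, index='my_stuff_index', doc_type='my_stuff_type')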

AWS Elasticsearch Scripts

I have a managed Elasticsearch (5.3) instance on AWS.
I want to sort the results in Elasticsearch but I always get
TransportError(500, u'search_phase_execution_exception', u'runtime error')
and I don't know why.
Looking into it in Kibana, I get the following error:
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:336)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:111)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:87)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:84)",
"java.security.AccessController.doPrivileged(Native Method)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:84)",
"doc['value'].value.length()",
" ^---- HERE"
],
"script": "doc['value'].value.length()",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [value] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
And my query is:
"query": {
"query_string": {
"fields": [
"value"
],
"query": "*a*"
}
},
"sort": {
"_script": {
"script": "doc['value'].value.length()",
"order": "asc",
"type": "string"
}
}
Do scripts work in AWS Elasticsearch? I just want to order my results by string length.
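The cause is in the error itself: value is mapped as a text field, and fielddata is disabled on text fields by default. A hedged sketch of the fix the error message suggests, via the Python client (my-index and my-type are hypothetical placeholders; note the memory caveat the error mentions, and that a keyword sub-field is the usual lighter-weight alternative if you can reindex):
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://my-aws-es-endpoint:443"])  # hypothetical endpoint

# Enable fielddata on the text field, as the error message suggests
# (this can use significant memory on large indices).
es.indices.put_mapping(
    index="my-index",    # hypothetical index name
    doc_type="my-type",  # mapping types still exist in Elasticsearch 5.3
    body={"properties": {"value": {"type": "text", "fielddata": True}}}
)

# The script sort should then compile; "type": "number" fits better here,
# since length() returns an integer.
sort_clause = {
    "_script": {
        "script": "doc['value'].value.length()",
        "order": "asc",
        "type": "number"
    }
}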
