Here is an aggregation query that works as expected when I run it in Dev Tools on Elasticsearch:
search_query = {
    "aggs": {
        "SHAID": {
            "terms": {
                "field": "identiferid",
                "order": {
                    "sort": "desc"
                },
                # "size": 100000
            },
            "aggs": {
                "update": {
                    "date_histogram": {
                        "field": "endTime",
                        "calendar_interval": "1d"
                    },
                    "aggs": {
                        "update1": {
                            "sum": {
                                "script": {
                                    "lang": "painless",
                                    "source": """
                                    if (doc['distanceIndex.att'].size() != 0) {
                                        return doc['distanceIndex.att'].value;
                                    }
                                    else {
                                        if (doc['distanceIndex.att2'].size() != 0) {
                                            return doc['distanceIndex.att2'].value;
                                        }
                                        return null;
                                    }
                                    """
                                }
                            }
                        },
                        "update2": {
                            "sum": {
                                "script": {
                                    "lang": "painless",
                                    "source": """
                                    if (doc['distanceIndex.att3'].size() != 0) {
                                        return doc['distanceIndex.att3'].value;
                                    }
                                    else {
                                        if (doc['distanceIndex.att4'].size() != 0) {
                                            return doc['distanceIndex.att4'].value;
                                        }
                                        return null;
                                    }
                                    """
                                }
                            }
                        }
                    }
                },
                "sort": {
                    "sum": {
                        "field": "time2"
                    }
                }
            }
        }
    },
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "endTime": {
                            "gte": "2021-11-01T00:00:00Z",
                            "lt": "2021-11-03T00:00:00Z"
                        }
                    }
                }
            ]
        }
    }
}
When I attempt to execute this aggregation using the Python Elasticsearch client (https://elasticsearch-py.readthedocs.io/en/v7.15.1/), I receive the exception:
exception search() got multiple values for keyword argument 'size'
If I remove the attribute:
"size": 0,
from the query, the exception is no longer thrown, but the aggregation does not run correctly, since "size": 0 is required for an aggregation.
Is there a different query format I should use for performing aggregations with the Python Elasticsearch client?
Update:
Here is the code used to invoke the query:
import elasticsearch
from elasticsearch import Elasticsearch, helpers
es_client = Elasticsearch(
    ["https://test-elastic.com"],
    scheme="https",
    port=443,
    http_auth=("test-user", "test-password"),
    maxsize=400,
    timeout=120,
    max_retries=10,
    retry_on_timeout=True
)
query_response = helpers.scan(client=es_client,
                              query=search_query,
                              index="test_index",
                              clear_scroll=False,
                              request_timeout=1500)
rows = []
try:
    for row in query_response:
        rows.append(row)
except Exception as e:
    print('exception', e)
Using es_client directly:
es_client.search(index="test_index", query=search_query)
results in the error:
/opt/oss/conda3/lib/python3.7/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
336
337 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
--> 338 status_code, error_message, additional_info
339 )
340
RequestError: RequestError(400, 'parsing_exception', 'unknown query [aggs]')
Is aggs valid for the search API?
helpers.scan is a

Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlying scroll requests.

It's meant to iterate through large result sets, and it comes with a default keyword argument of size=1000 that collides with the "size" key in your query body.
To run an aggregation, use the es_client.search() method directly, passing your query as body; including "size": 0 in that body is fine.
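A minimal sketch of the suggested approach, assuming a 7.x client where search() accepts the whole request as body (the aggregation name by_id is a placeholder; the index and field names are taken from the question):

```python
# helpers.scan() injects its own size keyword for scrolling, which
# collides with the "size" key in the query body - hence the
# "got multiple values for keyword argument 'size'" error.
# For aggregations, call search() and pass the whole body instead.

agg_body = {
    "size": 0,  # hits are not needed, only the aggregation buckets
    "query": {"match_all": {}},
    "aggs": {
        "by_id": {
            "terms": {"field": "identiferid"}
        }
    },
}

# response = es_client.search(index="test_index", body=agg_body)
# buckets = response["aggregations"]["by_id"]["buckets"]
```

The actual search() call is left commented out since it needs a live cluster; the point is that the body dict keeps "size": 0 and no conflicting keyword is passed.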
Related
Using Python, I'm trying to go row-by-row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and "e" for others. For this particular _id, the field should contain an "e".
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}
                    ]
                }
            }
        }
    }
}

results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)
I'm getting this error:
elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')
I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?
UPDATE:
I updated the code like this:
query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)
The code now runs without errors... I think I may have fixed it.
I say I think I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query... but I think that just means the big 12-billion-document index is still being updated to match the change I made. Does that sound plausible?
Please try the following query:
{
    "script": {
        "source": "ctx._source.direction = 'e'",
        "lang": "painless"
    },
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "_id": "YKReAoQBk7dLIXMBhYBF"
                            }
                        }
                    ]
                }
            }
        }
    }
}
Regarding version_conflict_engine_exception: it happens because the version of the document is not the one the update_by_query operation expects, for example because another process updated that document at the same time.
You can add ?conflicts=proceed to the _update_by_query call to work around the issue.
Read more about conflicts here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
If you think it is a transient conflict, you can use retry_on_conflict to retry after conflicts:
retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
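In the Python client, conflicts=proceed can be passed as a keyword argument to update_by_query. A hedged sketch (the index name and document id are taken from the question; the simplified term query is an assumption):

```python
# Sketch: a conflict-tolerant update_by_query body. Passing
# conflicts="proceed" makes Elasticsearch count version conflicts
# instead of aborting the whole request.
update_body = {
    "script": {
        "source": "ctx._source.direction = 'e'",
        "lang": "painless",
    },
    "query": {
        "term": {"_id": "YKReAoQBk7dLIXMBhYBF"}
    },
}

# results = es.update_by_query(index="traffic-*", body=update_body,
#                              conflicts="proceed")
```

The call itself is commented out because it requires a live cluster; the response then reports a version_conflicts count instead of raising.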
This question already has answers here:
Elasticsearch reindex error - client request timeout
(2 answers)
Closed 3 years ago.
I am re-indexing an index from Python, but the data is large (6 GB) and the reindex takes 60 minutes, so I am getting a timeout in the API.
Code:
def Reindex(src, dest):
    query = {
        "source": {
            "index": src,
            "query": {
                "range": {
                    "UTC_date": {
                        "lt": "now-15d/d"
                    }
                }
            }
        },
        "dest": {
            "index": dest
        }
    }
    Query = {
        "query": {
            "range": {
                "UTC_date": {
                    "lt": "now-15d/d"
                }
            }
        }
    }
    try:
        result = es.reindex(query, wait_for_completion=True, request_timeout=300)
    except:
        pass
I found the solution. Because I reindex 6 GB it takes more time, so I increased the timeout and now it works:
def Reindex(src, dest):
    print("[X] START Reindex")
    query = {
        "source": {
            "index": src,
            "query": {
                "range": {
                    "UTC_date": {
                        "lt": "now-1d/d"
                    }
                }
            }
        },
        "dest": {
            "index": dest
        }
    }
    Query = {
        "query": {
            "range": {
                "UTC_date": {
                    "lt": "now-1d/d"
                }
            }
        }
    }
    try:
        result = es.reindex(query, wait_for_completion=True, request_timeout=10000, conflicts="proceed")
        print(result)
        log_dict = {}
        log_dict['total'] = result['total']
        log_dict['created'] = result['created']
        log_dict['updated'] = result['updated']
        log_dict["Timestamp"] = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
        if log_dict['total'] == (log_dict['created'] + log_dict['updated']):
            log_dict['status'] = 'success'
            Delete(src)
        else:
            log_dict['status'] = 'failure'
        access_logger.info(json.dumps(log_dict))
    except Exception as e:  # the original snippet omitted the except clause
        print(e)
I have a field in some of my documents, set if they've been individually queried before, which is a Unix timestamp:
"timelock": 1561081724.254
Documents that have never been individually queried don't have this field. I would like a query that only returns documents that either DO NOT have the field, or have the field but the difference between its timestamp and the current time is greater than 10 minutes (600 s).
documents = es.search(index='index', size=10000, body={
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {}
                }
            ],
            "filter": [],
            "should": [],
            "must_not": []
        }
    }
})
So I guess in pseudo-code I'd do it like:
if 'timelock' exists:
    if current_time - 'timelock' > 600:
        include in query
    else:
        exclude from query
else:
    include in query
I'm using the python module for ES.
Why not simply use date math?
{
    "query": {
        "bool": {
            "minimum_should_match": 1,
            "should": [
                {
                    "bool": {
                        "must_not": [
                            {
                                "exists": {
                                    "field": "timelock"
                                }
                            }
                        ]
                    }
                },
                {
                    "range": {
                        "timelock": {
                            "lt": "now-10m"
                        }
                    }
                }
            ]
        }
    }
}
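Sent from the Python client, the date-math variant could look like this sketch (the index name is a placeholder; "now-10m" is evaluated server-side, so no client-side timestamp arithmetic is needed):

```python
# The 10-minute window is expressed entirely with Elasticsearch date
# math, so the client only has to build a static body.
query_body = {
    "query": {
        "bool": {
            "minimum_should_match": 1,
            "should": [
                {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                {"range": {"timelock": {"lt": "now-10m"}}},
            ],
        }
    }
}

# documents = es.search(index='index', size=10000, body=query_body)
```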
I'm not sure of the Python syntax, but what I can suggest via pseudo-code is the logic below:
compare_stamp = current_timestamp - 600
if 'timelock' exists:
    if timelock < compare_stamp:
        include document
    else:
        exclude document
else:
    include document
You can easily compute compare_stamp in your Python script. This value can then be used in the Elasticsearch query below:
{
    "query": {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must_not": [
                            {
                                "exists": {
                                    "field": "timelock"
                                }
                            }
                        ]
                    }
                },
                {
                    "range": {
                        "timelock": {
                            "lt": compare_stamp
                        }
                    }
                }
            ]
        }
    }
}
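For the Python side that this answer leaves open, a minimal sketch of computing compare_stamp (assuming timelock stores Unix epoch seconds, as the question indicates):

```python
import time

# 600 seconds = the 10-minute threshold from the question.
compare_stamp = time.time() - 600

query_body = {
    "query": {
        "bool": {
            "should": [
                {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                {"range": {"timelock": {"lt": compare_stamp}}},
            ]
        }
    }
}

# documents = es.search(index='index', size=10000, body=query_body)
```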
I am trying to use scripting in Elasticsearch to update some data. My script is the following:
for i in df.index:
    es.update(
        index=indexout,
        doc_type="suggestedTag",
        id=df['dataId'][i],
        _source=True,
        body={
            "script": {
                "inline": "ctx._source.items.suggestionTime = updated_time",
                "params": {
                    "updated_time": {
                        "field": df['suggestionTime'][i]
                    }
                }
            }
        }
    )
But when I do that I get the following error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code,error_message, additional_info) elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', '[jLIZdmn][127.0.0.1:9300][indices:data/write/update[s]]')
I have looked at this question to enable scripting, but even with this and the documentation it still raises the same error. I inserted the following lines in the config/elasticsearch.yml file:
script.inline: true
script.indexed: true
script.update: true
But I still cannot avoid the RequestError that I have had since the beginning.
You are almost there; you just need to add params. before updated_time:
{
    "script": {
        "inline": "ctx._source.items.suggestionTime = params.updated_time",
        "params": {
            "updated_time": {
                "field": df['suggestionTime'][i]
            }
        }
    }
}
If you would try to run your query in Kibana console, it would look something like this:
POST /my-index-2018-12/doc/AWdpylbN3HZjlM-Ibd7X/_update
{
    "script": {
        "inline": "ctx._source.suggestionTime = updated_time",
        "params": {
            "updated_time": {
                "field": "2018-10-03T18:33:00Z"
            }
        }
    }
}
You would see the entire response from Elasticsearch, which looks like your error message plus valuable details:
{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[7JNqOhT][127.0.0.1:9300][indices:data/write/update[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to execute script",
        "caused_by": {
            "type": "script_exception",
            "reason": "compile error",
            "script_stack": [
                "... _source.suggestionTime = updated_time",
                "                             ^---- HERE"
            ],
            "script": "ctx._source.suggestionTime = updated_time",
            "lang": "painless",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Variable [updated_time] is not defined."
            }
        }
    },
    "status": 400
}
Which points us to the syntax error: parameters, apparently, are injected as a params object.
I believe the scripting settings are not the source of the problem in this case.
Hope that helps!
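Applied back to the original loop, the corrected call might look like the sketch below. Note the params. prefix in the script; the timestamp is passed directly as the parameter value rather than wrapped in a {"field": ...} object, which is a simplification on my part (the wrapper would make updated_time a map). The literal date and doc_id are placeholders.

```python
# Sketch of the corrected update: the script reads params.updated_time,
# and the parameter is supplied as a plain value.
update_body = {
    "script": {
        "inline": "ctx._source.items.suggestionTime = params.updated_time",
        "lang": "painless",
        "params": {
            "updated_time": "2018-10-03T18:33:00Z"  # e.g. df['suggestionTime'][i]
        },
    }
}

# es.update(index=indexout, doc_type="suggestedTag",
#           id=doc_id, _source=True, body=update_body)
```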
I have to find the documents that match a string, for example "sky", within some "key" range. When I write separate match and range queries I get output from ES, but it throws an exception when they are merged together.
range query:
res = es.search(index="dummy",
                body={"from": 0, "size": 0, "query": {"range": {"key": {"gte": "1000"}}}})
match query:
res = es.search(index="dummy",
                body={"from": 0, "size": 0, "query": {"match": {"word": "sky"}}})
combined query:
res = es.search(index="dummy",
                body={
                    "from": 0,
                    "size": 0,
                    "query": {
                        "range": {
                            "key": {"gte": "1000"}
                        }
                    },
                    "match": {"word": "sky"}
                })
The combined query when executed throws the error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'parsing_exception', u'Unknown key for a START_OBJECT in [match].')
What is the correct way of merging both the queries?
You need to do it like this, using a bool/must query:
res = es.search(index="dummy", body={
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "key": {
                            "gte": "1000"
                        }
                    }
                },
                {
                    "match": {
                        "word": "sky"
                    }
                }
            ]
        }
    }
})