I have documents with timestamps in the following format:
2022-11-17T17:16:26.397Z
I am trying to get all documents between two dates and, on each day in between, only those within a time window, let's say 11:05 to 15:05.
This is my query:
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2022-11-01",
"lte": "2022-11-30"
}
}
}, {
"script": {
"script": {
"source": "doc.timestamp.getHourOfDay() >= params.min && doc.timestamp.getHourOfDay() <= params.max",
"params": {
"min": 11,
"max": 15
}
}
}
}
]
}
}
}
}
}
EDIT @rabbitbt: I ran your query on two different documents. After lots of testing, I found that it throws a runtime error whenever the timestamp includes a 0 directly after the T, for example:
"timestamp": "2022-11-07T01:04:39.357551"
Any idea how I can change the query to fix this?
Thanks for all the help. In the end I got it working by replacing this line in my original query:
"source": "doc.timestamp.getHourOfDay() >= params.min && doc.timestamp.getHourOfDay() <= params.max",
with
"source": "doc['timestamp'].value.getHour() >= params.min && doc['timestamp'].value.getHour() <= params.max",
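For reference, here is a minimal sketch of the corrected query run through the Python client; the client setup and the index name are assumptions, not part of the original question:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster (assumption)

query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "2022-11-01", "lte": "2022-11-30"}}},
                {
                    "script": {
                        "script": {
                            "source": "doc['timestamp'].value.getHour() >= params.min && doc['timestamp'].value.getHour() <= params.max",
                            "params": {"min": 11, "max": 15},
                        }
                    }
                },
            ]
        }
    }
}
result = es.search(index="my-index", body=query)  # "my-index" is a placeholder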
My suggestion:
POST bar/_doc
{
"date":"2022-11-14T11:12:46"
}
Python code:
# get_client_es() is assumed to return a configured Elasticsearch client
doc = {"date": "2022-11-17T11:16:26.397Z"}
# refresh="wait_for" makes the document searchable before the call returns
response_index = get_client_es().index(index="bar", body=doc, refresh="wait_for")
print(response_index)
query = {
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "2022-11-01",
"lte": "2022-11-30"
}
}
},
{
"script": {
"script": {
"lang": "painless",
"source": """
def targetDate = doc['date'].value;
def targetMinute = targetDate.getMinute();
if(targetDate.getMinute() < 10)
{
targetMinute = "0" + targetDate.getMinute();
}
def timeFrom = LocalTime.parse(params.timeFrom);
def timeTo = LocalTime.parse(params.timeTo);
def target = LocalTime.parse(targetDate.getHour().toString()
+ ":"+ targetMinute);
if(target.isBefore(timeTo) && target.isAfter(timeFrom)) {
return true;
}
""",
"params": {
"timeFrom": "10:30",
"timeTo": "15:13"
}
}
}
}
]
}
}
}
result = get_client_es().search(index="bar", body=query)
print(result)
Related
I want to query my index so that it matches whenever a particular attribute called sitename shows up, but I also want only the data from a certain time range. I thought it might be something like the below, but I am unsure:
{
"query": {
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
},
"match": {"sitename" : "HARB00ZAF0" }
}
}
You're almost there, but you need to leverage a bool query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
}
}
],
"must": [
{
"match": {
"sitename": "HARB00ZAF0"
}
}
]
}
}
}
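If you are running this from Python like the other examples here, a small sketch (the es client and the index name are assumptions):
body = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "now-1h/h", "lt": "now/h"}}}
            ],
            "must": [
                {"match": {"sitename": "HARB00ZAF0"}}
            ]
        }
    }
}
# "my-index" is a placeholder index name
result = es.search(index="my-index", body=body)
print(result["hits"]["hits"])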
I am trying to filter out values that are not null:
Example in SQL:
SELECT * FROM Mytable WHERE field_1 IS NOT NULL AND field_2 = 'alpha'
How should I be writing this query in elasticsearch-dsl(python)?
I tried things like:
s = Mytable.search().query(
Q('match', field_2 ='alpha')
).filter(~Q('missing', field='field_1'))
but it returns elements with null values of field_1.
I also tried the following, but it didn't work either:
field_name_1 = 'field_2'
value_1 = "alpha"
field_name_2 = 'field_1'
value_2 = " "
filter = {
"query": {
"bool": {
"must": [
{
"match": {
field_name_1 : value_1
}
},
{
"bool": {
"should": [
{
"bool": {
"must_not": [
{
field_name_2: {
"textContent": "*"
}
}
]
}
}
]
}
}
]
}
}
}
I am not familiar with elasticsearch-dsl (Python), but the following search query will get you the search result you want:
SELECT * FROM Mytable WHERE field_1 IS NOT NULL AND field_2 = 'alpha'
With the search query below, the results will be such that name = "alpha" AND the item field is not null. You can refer to the exists query documentation to know more about this.
Index Data:
{ "name": "alpha","item": null }
{ "name": "beta","item": null }
{ "name": "alpha","item": 1 }
{ "name": "alpha","item": [] }
Search query:
You can combine a bool query with an exists query like this (note that exists treats both null values and empty arrays as missing, so only the document with "item": 1 matches):
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "alpha"
}
},
{
"exists": {
"field": "item"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "4",
"_score": 1.6931472,
"_source": {
"name": "alpha",
"item": 1
}
}
]
You can try this one:
s = Mytable.search().query(
    Q('bool', must=[Q('match', field_2='alpha'), Q('exists', field='field_1')])
)
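To check what elasticsearch-dsl actually sends, and to run it, a short sketch (assuming the Mytable document class from the question):
print(s.to_dict())      # the raw bool/must query sent to Elasticsearch
response = s.execute()  # runs the search
for hit in response:
    print(hit.field_1, hit.field_2)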
This is the way to use a boolean compound query:
query: {
bool: {
must: [
{
bool: {
must_not: {
missing: {
field: 'follower',
existence: true,
null_value: true,
},
},
},
},
{
nested: {
path: 'follower',
query: {
match: {
'follower.id': req.currentUser?.id,
},
},
},
},
],
},
},
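Note that the missing query used above was removed in Elasticsearch 5.0; a sketch of the equivalent of that must_not branch in elasticsearch-dsl:
from elasticsearch_dsl import Q

# must_not + exists is the modern replacement for the removed `missing` query
no_follower = Q('bool', must_not=[Q('exists', field='follower')])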
I am reindexing one index from Python, but the index is large (6 GB) and the reindex takes about 60 minutes, so I am getting a timeout in the API.
Code:
def Reindex(src, dest):
    # Reindex documents older than 15 days from src into dest
    query = {
        "source": {
            "index": src,
            "query": {
                "range": {
                    "UTC_date": {
                        "lt": "now-15d/d"
                    }
                }
            }
        },
        "dest": {
            "index": dest
        }
    }
    try:
        result = es.reindex(query, wait_for_completion=True, request_timeout=300)
    except:
        pass
I found the solution. Because I am reindexing 6 GB it takes more time, so I increased the timeout and now it works:
import datetime
import json

def Reindex(src, dest):
    print("[X] START Reindex")
    # Reindex documents older than one day from src into dest
    query = {
        "source": {
            "index": src,
            "query": {
                "range": {
                    "UTC_date": {
                        "lt": "now-1d/d"
                    }
                }
            }
        },
        "dest": {
            "index": dest
        }
    }
    try:
        # 6 GB takes a long time, so the request timeout is raised accordingly
        result = es.reindex(query, wait_for_completion=True, request_timeout=10000, conflicts="proceed")
        print(result)
        log_dict = {}
        log_dict['total'] = result['total']
        log_dict['created'] = result['created']
        log_dict['updated'] = result['updated']
        log_dict["Timestamp"] = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
        if log_dict['total'] == (log_dict['created'] + log_dict['updated']):
            log_dict['status'] = 'success'
            Delete(src)  # author's helper that removes the source index
        else:
            log_dict['status'] = 'failure'
        access_logger.info(json.dumps(log_dict))
    except Exception as e:
        print("Reindex failed:", e)
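An alternative to raising request_timeout is to start the reindex asynchronously and poll the Tasks API, so no HTTP request has to stay open for an hour; a sketch assuming the same es client and query body:
# Start the reindex without waiting; Elasticsearch returns a task id
task = es.reindex(query, wait_for_completion=False)
task_id = task["task"]

# Poll the task until it reports completion
status = es.tasks.get(task_id=task_id)
if status["completed"]:
    print(status["response"])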
Some of my documents have a field, a Unix timestamp, that is set when they have been individually queried before:
"timelock": 1561081724.254
Documents that have never been individually queried don't have this field. I would like a query that only returns documents that either DO NOT have the field, or have it but the difference between its timestamp and the current time is greater than 10 minutes (600 seconds).
documents = es.search(index='index', size=10000, body={
"query": {
"bool": {
"must": [
{
"match_all": {}
},
],
"filter": [],
"should": [],
"must_not": [
]
}
}})
So I guess in pseudo-code I'd do it like:
if 'timelock' exists:
    if current_time - 'timelock' > 600:
        include in query
    else:
        exclude from query
else:
    include in query
I'm using the Python module for ES.
Why not simply use date math?
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": "now-10m"
}
}
}
]
}
}
}
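Note this assumes timelock is mapped as a date field; date math such as now-10m does not apply to plain numeric fields. Running it with the Python module from the question is then a sketch like:
documents = es.search(index='index', size=10000, body={
    "query": {
        "bool": {
            "minimum_should_match": 1,
            "should": [
                {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                {"range": {"timelock": {"lt": "now-10m"}}}
            ]
        }
    }
})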
I'm not aware of the Python syntax, but what I can suggest via pseudocode is the logic below:
compare_stamp = current_timestamp - 600
if 'timelock' exists:
if timelock < compare_stamp:
include document
else:
exclude document
else:
include document
You can easily compute compare_stamp in your Python script, and this value can then be used in the Elasticsearch query below:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": compare_timestamp
}
}
}
]
}
}
}
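Computing compare_stamp in Python is straightforward; a sketch (the index name is a placeholder):
import time

compare_stamp = time.time() - 600  # now minus 10 minutes, as a Unix timestamp

documents = es.search(index='index', size=10000, body={
    "query": {
        "bool": {
            "should": [
                {"bool": {"must_not": [{"exists": {"field": "timelock"}}]}},
                {"range": {"timelock": {"lt": compare_stamp}}}
            ]
        }
    }
})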
I'm trying to get a response from ES by hitting it from Python code, but it is showing the below error:
elasticsearch.exceptions.TransportError: TransportError(503, u'search_phase_execution_exception', u'[request] Data too large, data for [<agg [POSCodeModifier]>] would be [623327280/594.4mb], which is larger than the limit of [623326003/594.4mb]')
If I hit the same query from Kibana I get the results, but from Python I get this error. I'm using an aggregation in my code. Can someone explain whether I need to set some properties, or how to optimise it?
Below is the structure of the request I'm sending. If I set the start and end dates more than 5 days apart it gives me the error; otherwise I get the results:
unmtchd_ESdata = es.search(index='cstore_new', body={"size": 0, "aggs": {
"filtered": {
"filter": {
"bool": {
"must_not": [
{
"match": {
"CSPAccountNo": store_id
}
}
],
"must": [
{
"range": {
"ReportDate": {
"gte": start_dt,
"lte": end_dt
}
}
}
]
}
}
,
"aggs": {
"POSCode": {
"terms": {
"field": "POSCode",
"size": 10000
},
"aggs": {
"POSCodeModifier": {
"terms": {
"field": "POSCodeModifier",
"size": 10000
},
"aggs": {
"CSP": {
"terms": {
"field": "CSPAccountNo",
"size": 10000
},
"aggs": {
"per_stock": {
"date_histogram": {
"field": "ReportDate",
"interval": "week",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": start_dt,
"max": end_dt
}
},
"aggs": {
"avg_week_qty_sales": {
"sum": {
"field": "TotalCount"
}
}
}
},
"market_week_metrics": {
"extended_stats_bucket": {
"buckets_path": "per_stock>avg_week_qty_sales"
}
}
}
}
}
}
}
}
}
}
}},request_timeout=1000)
Edit 1:
Result variables needed from the Elasticsearch response:
for bucket in unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets']:
    metrics = bucket['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']
    list6.append(metrics['avg'])
    list7.append(bucket['key'])
    list8.append(metrics['max'] - metrics['min'])
    list9.append(metrics['max'])
    list10.append(metrics['min'])