Query an elasticsearch index by an attribute, with a given range? - python

I want to query my index so that it matches whenever a particular attribute shows up called sitename, but I want all the data from a certain time range. I thought it might be something of the below but unsure:
{
"query": {
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
},
"match": {"sitename" : "HARB00ZAF0" }
}
}

You're almost there, but you need to leverage the bool queries
{
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h/h",
"lt": "now/h"
}
}
}
],
"must": [
{
"match": {
"sitename": "HARB00ZAF0"
}
}
]
}
}
}

Related

convert elasticsearch query to elasticsearch-dsl query in python

i tried to convert this elasticsearch query to elasticsearch-dsl query but i didn't get the same result.
i am using elasticsearch version 5.X also for the elasticsearch-dsl
here the elasticsearch query :
qry = {
"_source": [
"accurate_call_duration",
"call_duration",
"called"
],
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "outgoing",
"query": {
"match_all": {}
},
"inner_hits": {
"size": 5000,
"_source": [
"accurate_call_duration",
"accurate_ringing_duration",
"anonymous_call"
]
}
}
}
],
"must": [
{
"term": {
"client": 1212,
}
},
{
"range": {
"created_time": {
"gte": date_from,
"lte": date_to
}
}
},
{
"term": {
"type": "incoming"
}
}
]
}
},
"size": 5000
}
elaticsearch-dsl query:
result = Search(using=escli, index="cdr").source([
"accurate_call_duration",
"call_duration", "called").filter(
'range', created_time={'gt': date_from, 'lte': date_to}).filter(
"term", type="incoming").extra(size=5000).execute()
how can i get the same result as the elasticsearch query (with inner_hits) ?

How to perform sum aggregation on n number of documents after sorting records (descending) ? elastic

{so, i want latest 30 document between 20/6 to 20/4 and perform the sum aggregation on field duration_seconds of those 30 latest doc. we had tried multiple aggregation on that like top_hits, terms for sorting but then we got the sum of all doc between 20/6 to 20/4}
"size": 1,
"query": {
"bool": {
"must": [
{
"range": {
"create_datetime": {
"gte": "2022-04-20",
"lte": "2022-06-20"
}
}
}
]
}
},
"sort": [
{
"create_datetime": {
"order": "desc"
}
}
],
"aggs": {
"videosession": {
"sampler": {
"shard_size":30
},
"aggs": {
"sum_duration_seconds": {
"sum": {
"field": "duration_seconds"
}
}
}
}
}
}```

How to Search in multiple OR conditions in Elastic search in Python

I have to do a search for all items in array along with a static detail in elastic search.
Fields in Elastics search index: tech_id, detail, volume
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
Number of tech_id in array can differ.
Now my search has to take place in a combination of tech_id and detail where tech_id varies while detail stays static. This combination is an or combination. In the end i am expecting search to have with provided tech_ids and static detail.
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"match_phrase": {
"tech_id": tech_ids[0]
}
}]
}
What you're after, I think, is a bool-should within a bool-must:
{
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"bool": {
"should":
[{
"match_phrase": { "tech_id": tid }
} for tid in tech_ids]
}
}
]
}
}
}

Get elasticsearch documents older than a certain age in minutes

I have a field in some of my documents if they've been individually queried before which is a unix timestamp:
"timelock": 1,561,081,724.254
Some documents don't have this if they've never been individually queried. I would like to also have a query that only returns documents that either DO NOT have the field or have the field but the difference between it's timestamp and the current time is greater than 10 minutes (600sec)
documents = es.search(index='index', size=10000, body={
"query": {
"bool": {
"must": [
{
"match_all": {}
},
],
"filter": [],
"should": [],
"must_not": [
]
}
}})
So I guess in pseudo-code I'd do it like:
if 'timelock' exists:
if current_time - 'timlock' > 600:
include in query
else:
exclude from query
else:
include in query
I'm using the python module for ES.
Why not simply using date math ?
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": "now-10m"
}
}
}
]
}
}
}
I'm not aware of python syntax but what I can suggest via sudo code is to use the logic below:
compare_stamp = current_timestamp - 600
if 'timelock' exists:
if timelock < compare_stamp:
include document
else:
exclude document
else:
include document
Since you can easily get the compare_stamp in python script. This value can then be used in elastic query below:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "timelock"
}
}
]
}
},
{
"range": {
"timelock": {
"lt": compare_timestamp
}
}
}
]
}
}
}

elasticsearch.exceptions.TransportError: TransportError 503: Data too large

I'm trying to get the response from ES hitting it from python code but it is showing the below error:
elasticsearch.exceptions.TransportError: TransportError(503, u'search_phase_execution_exception', u'[request] Data too large, data for [<agg [POSCodeModifier]>] would be [623327280/594.4mb], which is larger than the limit of [623326003/594.4mb]')
If i hit the same code from kibana i get the results but using python i'm getting this error. I'm using aggregation in my code. if someone can explain if i need to set some properties or how to optimise it??
Below is the structure for request i'm sending and if i set start and end date greater than 5 days it gives me the error, otherwise i'm getting the results
unmtchd_ESdata= es.search(index='cstore_new',body={'size' : 0, "aggs": {
"filtered": {
"filter": {
"bool": {
"must_not": [
{
"match": {
"CSPAccountNo": store_id
}
}
],
"must": [
{
"range": {
"ReportDate": {
"gte": start_dt,
"lte": end_dt
}
}
}
]
}
}
,
"aggs": {
"POSCode": {
"terms": {
"field": "POSCode",
"size": 10000
},
"aggs": {
"POSCodeModifier": {
"terms": {
"field": "POSCodeModifier",
"size": 10000
},
"aggs": {
"CSP": {
"terms": {
"field": "CSPAccountNo",
"size": 10000
},
"aggs": {
"per_stock": {
"date_histogram": {
"field": "ReportDate",
"interval": "week",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": start_dt,
"max": end_dt
}
},
"aggs": {
"avg_week_qty_sales": {
"sum": {
"field": "TotalCount"
}
}
}
},
"market_week_metrics": {
"extended_stats_bucket": {
"buckets_path": "per_stock>avg_week_qty_sales"
}
}
}
}
}
}
}
}
}
}
}},request_timeout=1000)
Edit1:
Result variables needed from elastic search response
for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
list6.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['avg'])
list7.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['key'])
list8.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max']-unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])
list9.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max'])
list10.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])

Categories

Resources