I am filtering in Elasticsearch. I want doc_count to be 0 for dates that have no data, but those dates are not returned at all; only dates with data come back. Do you know how I can do this? Here is the Python output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
33479 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33480 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33481 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33482 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33483 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
And here is my Elasticsearch query:
{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "now-1M",
              "lt": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "continent": {
      "terms": {
        "field": "source.geo.continent_name.keyword"
      },
      "aggs": {
        "_source": {
          "date_histogram": {
            "field": "#timestamp",
            "interval": "8m"
          }
        }
      }
    }
  }
}
You need to set min_doc_count to 0 on every aggregation for which you want buckets with a zero doc_count to be returned.
{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "now-1M",
              "lt": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "continent": {
      "terms": {
        "field": "source.geo.continent_name.keyword",
        "min_doc_count": 0
      },
      "aggs": {
        "_source": {
          "date_histogram": {
            "field": "#timestamp",
            "interval": "8m",
            "min_doc_count": 0
          }
        }
      }
    }
  }
}
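Once the zero-count buckets come back, the nested response can be flattened into one row per continent and time bucket. This is only a minimal sketch: it assumes the usual aggregation response layout (aggregations, then continent, then buckets, each carrying the _source sub-aggregation), and both the flatten_buckets helper and the sample_response values are hypothetical.

```python
def flatten_buckets(response):
    """Return one row per (continent, time bucket), including empty buckets."""
    rows = []
    for continent in response["aggregations"]["continent"]["buckets"]:
        for bucket in continent["_source"]["buckets"]:
            rows.append({
                "continent": continent["key"],
                "date": bucket["key_as_string"],
                "value": bucket["doc_count"],
            })
    return rows


# Hypothetical response fragment, shaped like the result of the query above.
sample_response = {
    "aggregations": {
        "continent": {
            "buckets": [
                {
                    "key": "Europe",
                    "doc_count": 7,
                    "_source": {
                        "buckets": [
                            {"key_as_string": "2022-04-13T08:00:00.000Z", "doc_count": 0},
                            {"key_as_string": "2022-04-13T08:08:00.000Z", "doc_count": 7},
                        ]
                    },
                }
            ]
        }
    }
}

rows = flatten_buckets(sample_response)
for row in rows:
    print(row)
```

Keeping the empty bucket as an explicit row with value 0 avoids the NaN entries that appear when a date is simply missing from the result.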
Say I have a JSON object with the following structure:
{
"json_1": {
"1": {
"banana": 0,
"corn": 5,
"apple": 5
},
"2": {
"melon": 10
},
"3": {
"onion": 9,
"garlic": 4
}
}
}
I also have another JSON object with the same structure but slightly different data:
{
"json_2": {
"1": {
"banana": 2,
"corn": 3
},
"2": {
"melon": 1,
"watermelon": 5
},
"3": {
"onion": 4,
"garlic": 1
}
}
}
What is a fast algorithm to combine these two JSON objects into one, so that for each number I get the json_1 amount and the json_2 amount for every fruit? If one object has a fruit that the other one lacks, that fruit should be left out of the result:
{
"combined": {
"1": {
"banana": {
"json_1": 0,
"json_2": 2
},
"corn": {
"json_1": 5,
"json_2": 3
}
},
"2": {
"melon": {
"json_1": 10,
"json_2": 1
}
},
"3": {
"onion": {
"json_1": 9,
"json_2": 4
},
"garlic": {
"json_1": 4,
"json_2": 1
}
}
}
}
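One possible approach, sketched in Python under the assumption that both inputs are already parsed into dictionaries: walk the first object once and keep only the names that also exist in the second. Dictionary lookups are constant time on average, so the work grows linearly with the number of entries. The combine function and its parameter names are hypothetical.

```python
def combine(first, second, first_name="json_1", second_name="json_2"):
    """Pair up values for names present in both inputs, group by group."""
    combined = {}
    for group, first_items in first.items():
        second_items = second.get(group)
        if second_items is None:
            continue  # group missing from the second input, so skip it
        combined[group] = {
            name: {first_name: amount, second_name: second_items[name]}
            for name, amount in first_items.items()
            if name in second_items  # drop names only one side has
        }
    return combined


json_1 = {"1": {"banana": 0, "corn": 5, "apple": 5},
          "2": {"melon": 10},
          "3": {"onion": 9, "garlic": 4}}
json_2 = {"1": {"banana": 2, "corn": 3},
          "2": {"melon": 1, "watermelon": 5},
          "3": {"onion": 4, "garlic": 1}}

result = {"combined": combine(json_1, json_2)}
print(result)
```

Here apple and watermelon are dropped because each appears in only one of the two inputs.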
I have this json file loaded in Python with json.loads('myfile.json'):
[
{
"cart": {
"items": {
"3154ba405e5c5a22bbdf9bf1": {
"item": {
"_id": "3154ba405e5c5a22bbdf9bf1",
"title": "Drink alla cannella",
"price": 5.65,
"__v": 0
},
"qty": 1,
"price": 5.65
}
},
"totalQty": 1,
"totalPrice": 5.65
}
},
{
"cart": {
"items": {
"6214ba405e4c5a31bbdf9ad7": {
"item": {
"_id": "6214ba405e4c5a31bbdf9ad7",
"title": "Drink alla menta",
"price": 5.65,
"__v": 0
},
"qty": 2,
"price": 11.3
}
},
"totalQty": 2,
"totalPrice": 11.3
}
}
]
How can I access both the totalQty and totalPrice fields at the same time and sum them?
How can I access both title fields to print them?
Let's assume that you have the JSON data available as a string:
import json

jdata = '''
[
{
"cart": {
"items": {
"3154ba405e5c5a22bbdf9bf1": {
"item": {
"_id": "3154ba405e5c5a22bbdf9bf1",
"title": "Drink alla cannella",
"price": 5.65,
"__v": 0
},
"qty": 1,
"price": 5.65
}
},
"totalQty": 1,
"totalPrice": 5.65
}
},
{
"cart": {
"items": {
"6214ba405e4c5a31bbdf9ad7": {
"item": {
"_id": "6214ba405e4c5a31bbdf9ad7",
"title": "Drink alla menta",
"price": 5.65,
"__v": 0
},
"qty": 2,
"price": 11.3
}
},
"totalQty": 2,
"totalPrice": 11.3
}
}
]
'''
totalQty = 0
totalPrice = 0
for d in json.loads(jdata):
    c = d['cart']
    # Accumulate the per-cart totals.
    totalQty += c['totalQty']
    totalPrice += c['totalPrice']
    # Print the title of every item in this cart.
    for sd in c['items'].values():
        print(sd['item']['title'])
print(f'{totalQty:d}', f'{totalPrice:.2f}')
Output:
Drink alla cannella
Drink alla menta
3 16.95
Note:
I suspect that what you really want to do is multiply those two values
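If the data lives in a file, as in the question, json.load on an open file handle is the call to use, because json.loads parses a string of JSON text rather than a filename. Below is a minimal sketch of that variant. To keep it runnable on its own, it writes a trimmed, hypothetical copy of the data (shortened item ids, fewer fields) to a temporary file first.

```python
import json
import tempfile
from pathlib import Path

# Trimmed, hypothetical copy of the cart data from the question.
carts_json = '''[
  {"cart": {"items": {"a1": {"item": {"title": "Drink alla cannella"}, "qty": 1, "price": 5.65}},
            "totalQty": 1, "totalPrice": 5.65}},
  {"cart": {"items": {"b2": {"item": {"title": "Drink alla menta"}, "qty": 2, "price": 11.3}},
            "totalQty": 2, "totalPrice": 11.3}}
]'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myfile.json"
    path.write_text(carts_json, encoding="utf-8")
    # json.load reads from an open file; json.loads expects the JSON text itself.
    with path.open(encoding="utf-8") as fh:
        carts = [entry["cart"] for entry in json.load(fh)]

# Sum the per-cart totals and collect every item title.
total_qty = sum(cart["totalQty"] for cart in carts)
total_price = sum(cart["totalPrice"] for cart in carts)
titles = [line["item"]["title"] for cart in carts for line in cart["items"].values()]

print(titles)
print(f"{total_qty:d} {total_price:.2f}")
```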
The idea is to compare N dictionaries against a single standard dictionary, where each key/value comparison has its own conditional rule.
E.g.,
Standard dictionary -
{'ram': 16,
'storage': [512, 1, 2],
'manufacturers': ['Dell', 'Apple', 'Asus', 'Alienware'],
'year': 2018,
'drives': ['A', 'B', 'C', 'D', 'E']
}
List of dictionaries -
{'ram': 8,
'storage': 1,
'manufacturers': 'Apple',
'year': 2018,
'drives': ['C', 'D', 'E']
},
{'ram': 16,
'storage': 4,
'manufacturers': 'Asus',
'year': 2021,
'drives': ['F', 'G','H']
},
{'ram': 4,
'storage': 2,
'manufacturers': 'ACER',
'year': 2016,
'drives': ['F', 'G', 'H']
}
Conditions-
'ram' > 8
if 'ram' >=8 then 'storage' >= 2 else 1
'manufacturers' in ['Dell', 'Apple', 'Asus', 'Alienware']
'year' >= 2018
if 'year' > 2018 then 'drives' in ['A', 'B', 'C', 'D', 'E'] else ['F', 'G', 'H']
So the expected output shows, for each dictionary, the non-matching values as they are and None/null in place of the matching values.
Expected Output -
{'ram': 8,
'storage': 1,
'manufacturers': None,
'year': None,
'drives': ['C', 'D', 'E']
},
{'ram': None,
'storage': None,
'manufacturers': None,
'year': None,
'drives': ['F','G','H']
},
{'ram': 4,
'storage': 2,
'manufacturers': 'ACER',
'year': 2016,
'drives': None
}
I encountered this problem while working with MongoDB, where each document in a data collection has to be compared with a standard collection. A direct MongoDB query would also be very helpful.
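For the plain-dictionary case, the rules can be sketched as one predicate per key. This is only an illustration under stated assumptions: the "else 1" branch of the storage rule is read as storage == 1 (which is what the expected output implies), the drives rule is read as "every drive is in the allowed list", and the names RULES, storage_ok, drives_ok and non_matching are hypothetical.

```python
ALLOWED_MANUFACTURERS = {"Dell", "Apple", "Asus", "Alienware"}
NEWER_DRIVES = {"A", "B", "C", "D", "E"}
OLDER_DRIVES = {"F", "G", "H"}


def storage_ok(doc):
    if doc["ram"] >= 8:
        return doc["storage"] >= 2
    return doc["storage"] == 1  # assumption: "else 1" means exactly 1


def drives_ok(doc):
    allowed = NEWER_DRIVES if doc["year"] > 2018 else OLDER_DRIVES
    return set(doc["drives"]) <= allowed  # assumption: all drives must be allowed


# One predicate per key; each returns True when the value matches its rule.
RULES = {
    "ram": lambda doc: doc["ram"] > 8,
    "storage": storage_ok,
    "manufacturers": lambda doc: doc["manufacturers"] in ALLOWED_MANUFACTURERS,
    "year": lambda doc: doc["year"] >= 2018,
    "drives": drives_ok,
}


def non_matching(doc):
    """Keep values that break their rule; replace matching values with None."""
    return {key: (None if rule(doc) else doc[key]) for key, rule in RULES.items()}


docs = [
    {"ram": 8, "storage": 1, "manufacturers": "Apple", "year": 2018, "drives": ["C", "D", "E"]},
    {"ram": 16, "storage": 4, "manufacturers": "Asus", "year": 2021, "drives": ["F", "G", "H"]},
    {"ram": 4, "storage": 2, "manufacturers": "ACER", "year": 2016, "drives": ["F", "G", "H"]},
]

results = [non_matching(doc) for doc in docs]
for row in results:
    print(row)
```

Keeping the rules in a dictionary makes it straightforward to add or change a condition without touching the comparison loop.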
To apply these conditions using a MongoDB aggregation, use the query below:
db.collection.aggregate([
{
"$project": {
"ram": {
"$cond": {
"if": {
"$gt": [
"$ram",
8
]
},
"then": null,
"else": "$ram",
}
},
"storage": {
"$cond": {
"if": {
"$and": [
{
"$gte": [
"$ram",
8
]
},
{
"$gte": [
"$storage",
2
]
},
],
},
"then": null,
"else": "$storage",
}
},
"manufacturers": {
"$cond": {
"if": {
"$in": [
"$manufacturers",
[
"Dell",
"Apple",
"Asus",
"Alienware"
],
]
},
"then": null,
"else": "$manufacturers",
}
},
"year": {
"$cond": {
"if": {
"$gte": [
"$year",
2018
]
},
"then": null,
"else": "$year",
}
},
"drives": {
"$cond": {
"if": {
"$gt": [
"$year",
2018
]
},
"then": {
"$setIntersection": [
"$drives",
[
"A",
"B",
"C",
"D",
"E"
]
]
},
"else": "$drives",
}
},
}
}
])
You can combine this with a for loop in Python:
for std_doc in std_col.find({}, {
"ram": 1,
"storage": 1,
"manufacturers": 1,
"year": 1,
"drives": 1,
}):
    print(list(list_col.aggregate([
{
"$project": {
"ram": {
"$cond": {
"if": {
"$gt": [
"$ram",
8
]
},
"then": None,
"else": "$ram",
}
},
"storage": {
"$cond": {
"if": {
"$and": [
{
"$gte": [
"$ram",
8
]
},
{
"$gte": [
"$storage",
2
]
},
],
},
"then": None,
"else": "$storage",
}
},
"manufacturers": {
"$cond": {
"if": {
"$in": [
"$manufacturers",
[
"Dell",
"Apple",
"Asus",
"Alienware"
],
]
},
"then": None,
"else": "$manufacturers",
}
},
"year": {
"$cond": {
"if": {
"$gte": [
"$year",
2018
]
},
"then": None,
"else": "$year",
}
},
"drives": {
"$cond": {
"if": {
"$gt": [
"$year",
2018
]
},
"then": {
"$setIntersection": [
"$drives",
[
"A",
"B",
"C",
"D",
"E"
]
]
},
"else": "$drives",
}
},
}
}
])))
A more efficient solution is to perform a $lookup, so that the comparison values come from the standard collection instead of being hard-coded; whether it fits depends on your requirements:
db.std_col.aggregate([
{
"$lookup": {
"from": "dict_col",
"let": {
"cmpRam": "$ram",
"cmpStorage": "$storage",
"cmpManufacturers": "$manufacturers",
"cmpYear": "$year",
"cmpDrives": "$drives",
},
"pipeline": [
{
"$project": {
"ram": {
"$cond": {
"if": {
"$gt": [
"$ram",
"$$cmpRam",
]
},
"then": null,
"else": "$ram",
}
},
"storage": {
"$cond": {
"if": {
"$and": [
{
"$gte": [
"$ram",
"$$cmpRam"
]
},
{
"$gte": [
"$storage",
"$$cmpStorage"
]
},
],
},
"then": null,
"else": "$storage",
}
},
"manufacturers": {
"$cond": {
"if": {
"$in": [
"$manufacturers",
"$$cmpManufacturers",
]
},
"then": null,
"else": "$manufacturers",
}
},
"year": {
"$cond": {
"if": {
"$gte": [
"$year",
"$$cmpYear",
]
},
"then": null,
"else": "$year",
}
},
"drives": {
"$cond": {
"if": {
"$gt": [
"$year",
"$$cmpYear"
]
},
"then": {
"$setIntersection": [
"$drives",
"$$cmpDrives"
]
},
"else": "$drives",
}
},
}
},
],
"as": "inventory_docs"
}
}
])
Hello, I have the following problem: whenever I aggregate data, the aggregations, and more precisely the date_histogram, are always different. The histogram starts at a pretty much random date.
I am using elasticpy, and my query looks like this before it is executed. Note that I am using Python datetime objects to get "real" results; I had some problems with other formats.
{
"query": {
"bool": {
"filter": [
{
"range": {
"original_date": {
"gte": datetime.datetime(2020, 2, 13, 0, 0),
"lte": datetime.datetime(2020, 2, 15, 23, 0),
}
}
}
],
"must": [
{
"query_string": {
"query": "whatever string"
}
}
],
}
},
"aggs": {
"docs_histogram": {
"date_histogram": {
"field": "original_date",
"interval": "hour",
"time_zone": "EET",
},
... (other aggs)
},
},
}
The date histogram should cover this range: 2020-02-13 00:00:00 - 2020-02-15 23:00:00. But look at where the output starts and ends: it starts one day later and ends on that same day at 18:00.
"buckets": [
{
"key_as_string": "2020-02-14T00:00:00.000+02:00",
"key": 1581631200000,
"doc_count": 1,
"source_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{"key": "WhateverKey", "doc_count": 1}],
},
},
...
{
"key_as_string": "2020-02-14T18:00:00.000+02:00",
"key": 1581696000000,
"doc_count": 1,
"source_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{"key": "WhateverKey2", "doc_count": 1}],
},
},
]
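One likely explanation, offered as an assumption because the matching documents are not shown: a date_histogram only produces buckets from the first to the last matching document, so the range filter alone does not fix where the histogram starts and ends. Elasticsearch provides min_doc_count and extended_bounds on date_histogram for this case. The sketch below only builds the aggregation body. The histogram_with_bounds helper is hypothetical, and whether the bound strings are read in UTC or in the given time_zone is worth verifying against your Elasticsearch version.

```python
from datetime import datetime


def histogram_with_bounds(field, start, end, interval="hour", time_zone="EET"):
    """Build a date_histogram that spans start..end even where no documents exist."""
    return {
        "date_histogram": {
            "field": field,
            "interval": interval,  # same parameter name as in the question
            "time_zone": time_zone,
            "min_doc_count": 0,  # return empty buckets instead of skipping them
            "extended_bounds": {  # force buckets out to the requested range
                "min": start.isoformat(),
                "max": end.isoformat(),
            },
        }
    }


agg = histogram_with_bounds(
    "original_date",
    datetime(2020, 2, 13, 0, 0),
    datetime(2020, 2, 15, 23, 0),
)
print(agg)
```

The resulting dictionary would take the place of the docs_histogram definition in the query above, with any sub-aggregations added next to the date_histogram key.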