I have 2 lists, looking like:
temp_data:
{
"id": 1,
"name": "test (replaced)",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"temperature": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
]}
hum_data:
{
"id": 1,
"name": "test (replaced)",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"humidity": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
]}
I need to merge the 2 lists to 1 without duplicating data. What is the easiest/efficient way? After merging, I want something like this:
{
"id": 1,
"name": "test",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"temperature": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
],
"humidity": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
Thanks for helping.
If your lists hum_data and temp_data are not sorted then first sort them and then concatenate the dictionaries pair-wise.
# To make comparisons for sorting
compare_function = lambda value : value['id']
# sort arrays before to make later concatenation easier
temp_data.sort(key=compare_function)
hum_data.sort(key=compare_function)
combined_data = temp_data.copy()
# concatenate the dictionries using the update function
for hum_row, combined_row in zip(hum_data, combined_data):
combined_row['data'].update(hum_row['data'])
# combined hum_data and temp_data
combined_data
If the lists are already sorted then you just need to concatenate dictionary by dictionary.
combined_data = temp_data.copy()
# concatenate the dictionries using the update function
for hum_row, combined_row in zip(hum_data, combined_data):
combined_row['data'].update(hum_row['data'])
# combined hum_data and temp_data
combined_data
With that code I got the following result:
[
{
'id': 1,
'name': 'test (replaced)',
'code': 'test',
'last_update': '2020-01-01',
'online': False,
'data': {
'temperature': [{'date': '2019-12-17', 'value': 1}],
'humidity': [{'date': '2019-12-17', 'value': 1}]}
},
{
'id': 2,
'name': 'test (replaced)',
'code': 'test',
'last_update': '2020-01-01',
'online': False,
'data': {
'temperature': [{'date': '2019-12-17', 'value': 2}],
'humidity': [{'date': '2019-12-17', 'value': 2}]}
}
]
Related
I am filtering in ElasticSearch. I want doc_count to return 0 on non-data dates, but it doesn't print those dates at all, only dates with data are returned to me. do you know how i can do it? Here is the Python output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
33479 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33480 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33481 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33482 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33483 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
And here is my ElasticSearch filter:
"from": 0,
"size": 0,
"query": {
"bool": {
"must":
[
{
"range": {
"#timestamp": {
"gte": "now-1M",
"lt": "now"
}
}
}
]
}
},
"aggs": {
"continent": {
"terms": {
"field": "source.geo.continent_name.keyword"
},
"aggs": {
"_source": {
"date_histogram": {
"field": "#timestamp", "interval": "8m"
}}}}}}
You need to set min_doc_count value to 0 for aggregation where you want result with zero doc_count.
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "now-1M",
"lt": "now"
}
}
}
]
}
},
"aggs": {
"continent": {
"terms": {
"field": "source.geo.continent_name.keyword",
"min_doc_count": 0
},
"aggs": {
"_source": {
"date_histogram": {
"field": "#timestamp",
"interval": "8m",
"min_doc_count": 0
}
}
}
}
}
}
I want to merge list of dictionary provided below with unique channel and zrepcode.
sample input:
[
{
"channel": 1,
"zrepcode": "123456",
"turn": 7833.9
},
{
"channel": 1,
"zrepcode": "123456",
"pipeline": 324
},
{
"channel": 1,
"zrepcode": "123456",
"inv_bal": 941.16
},
{
"channel": 1,
"zrepcode": "123456",
"display": 341
},
{
"channel": 3,
"zrepcode": "123456",
"display": 941.16
},
{
"channel": 3,
"zrepcode": "123456",
"turn": 7935.01
},
{
"channel": 3,
"zrepcode": "123456",
"pipeline": 0
},
{
"channel": 3,
"zrepcode": "123456",
"inv_bal": 341
},
{
"channel": 3,
"zrepcode": "789789",
"display": 941.16
},
{
"channel": 3,
"zrepcode": "789789",
"turn": 7935.01
},
{
"channel": 3,
"zrepcode": "789789",
"pipeline": 0
},
{
"channel": 3,
"zrepcode": "789789",
"inv_bal": 341
}
]
Sample output:
[
{'channel': 1, 'zrepcode': '123456', 'turn': 7833.9, 'pipeline': 324.0,'display': 341,'inv_bal': 941.16},
{'channel': 3, 'zrepcode': '123456', 'turn': 7935.01, 'pipeline': 0.0, 'display': 941.16, 'inv_bal': 341.0},
{'channel': 3, 'zrepcode': '789789', 'turn': 7935.01, 'pipeline': 0.0, 'display': 941.16, 'inv_bal': 341.0}
]
Easily solved with our good friend collections.defaultdict:
import collections
by_key = collections.defaultdict(dict)
for datum in data: # data is the list of dicts from the post
key = (datum.get("channel"), datum.get("zrepcode")) # form the key tuple
by_key[key].update(datum) # update the defaultdict by the key tuple
print(list(by_key.values()))
This outputs
[
{'channel': 1, 'zrepcode': '123456', 'turn': 7833.9, 'pipeline': 324, 'inv_bal': 941.16, 'display': 341},
{'channel': 3, 'zrepcode': '123456', 'display': 941.16, 'turn': 7935.01, 'pipeline': 0, 'inv_bal': 341},
{'channel': 3, 'zrepcode': '789789', 'display': 941.16, 'turn': 7935.01, 'pipeline': 0, 'inv_bal': 341},
]
i have my sample data as
b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
{"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
{"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
{"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
{"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]
i need to sort such that score ASC, name DESC, id ASC (by relational db notation).
So far, i have implemented
def sort_func(e):
return (e['score'], e['name']['d_name'], e['id'])
a = b.sort(key=sort_func, reverse=False)
This works for score ASC, name ASC, id ASC.
but for score ASC, name DESC, id ASC if i try to sort by name DESC it throws error. because of unary - error in -e['name']['d_name'].
How can i approach this problem, from here ? Thanks,
Edit 1:
i need to make this dynamic such that there can be case such as e['name'['d_name'] ASC, e['name']['ingredient'] DESC. How can i handle this type of dynamic behaviour ?
You can sort by -score, name, -id with reverse=True:
from pprint import pprint
b = [
{
"id": 1,
"name": {"d_name": "miranda", "ingredient": "orange"},
"score": 1.123,
},
{
"id": 20,
"name": {"d_name": "limca", "ingredient": "lime"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "coke", "ingredient": "water"},
"score": 4.231,
},
{
"id": 2,
"name": {"d_name": "fanta", "ingredient": "water"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "dew", "ingredient": "water & sugar"},
"score": 2.231,
},
]
pprint(
sorted(
b,
key=lambda k: (-k["score"], k["name"]["d_name"], -k["id"]),
reverse=True,
)
)
Prints:
[{'id': 1,
'name': {'d_name': 'miranda', 'ingredient': 'orange'},
'score': 1.123},
{'id': 3,
'name': {'d_name': 'dew', 'ingredient': 'water & sugar'},
'score': 2.231},
{'id': 20, 'name': {'d_name': 'limca', 'ingredient': 'lime'}, 'score': 4.231},
{'id': 2, 'name': {'d_name': 'fanta', 'ingredient': 'water'}, 'score': 4.231},
{'id': 3, 'name': {'d_name': 'coke', 'ingredient': 'water'}, 'score': 4.231}]
I want to remove some duplicates in my merged dictionary.
My data:
mongo_data = [{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'},],
'vendor': 'Fantasy'
},{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Hunger Games', 'value': '10.0'},
{'key': 'Maze Runner', 'value': '5.5'},],
'vendor': 'Dystopia'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'Divergent', 'value': '9.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},],
'vendor': 'Fantasy'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Divergent', 'value': '9.0'},],
'vendor': 'Fantasy'
}]
My code:
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
search = {"url": key, "results": []}
for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
result = {
"genre": vendor,
"data": [{'key': key['key'], 'value': key['value']}
for result2 in group2
for key in result2["variables"]],
}
search["results"].append(result)
searches.append(search)
My result:
[
{
"url": "https://goodreads.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Harry Potter",
"value": "10.0"
},
{
"key": "Discovery of Witches",
"value": "8.5"
}
]
},
{
"genre": "Dystopia",
"data": [
{
"key": "Hunger Games",
"value": "10.0"
},
{
"key": "Maze Runner",
"value": "5.5"
}
]
}
]
},
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
},
{
"key": "Divergent",
"value": "9.0"
}
]
}
}
]
}
]
I do not want any duplicates in my structure. I am not sure on how to take them out. My expected result can be seen below.
Expected result:
[
{
"url": "https://goodreads.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Harry Potter",
"value": "10.0"
},
{
"key": "Discovery of Witches",
"value": "8.5"
}
]
},
{
"genre": "Dystopia",
"data": [
{
"key": "Hunger Games",
"value": "10.0"
},
{
"key": "Maze Runner",
"value": "5.5"
}
]
}
]
},
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
}
]
}
}
]
}
]
Divergent is getting repeated in the last list of dictionaries. When I merged my dictionaries even the duplicates inside https://kindle.com/-->Fantasy got merged into one. Is there a way for me to remove the duplicate dictionary?
I want the https://kindle.com/ part to look like:
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
}
]
}
}
]
}
You can try convert those dict to a set of tuple first and then convert back to a list of dict later:
mongo_data = [{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'},],
'vendor': 'Fantasy'
},{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Hunger Games', 'value': '10.0'},
{'key': 'Maze Runner', 'value': '5.5'},],
'vendor': 'Dystopia'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'Divergent', 'value': '9.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},],
'vendor': 'Fantasy'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Divergent', 'value': '9.0'},],
'vendor': 'Fantasy'
}]
from itertools import groupby
searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
search = {"url": key, "results": []}
for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
result = {
"genre": vendor,
"data": set((key['key'], key['value'])
for result2 in group2
for key in result2["variables"]),
}
result['data'] = [{"key": tup[0], "value": tup[1]} for tup in result['data']]
search["results"].append(result)
searches.append(search)
searches
Output:
[{'results': [{'data': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'}],
'genre': 'Fantasy'},
{'data': [{'key': 'Maze Runner', 'value': '5.5'},
{'key': 'Hunger Games', 'value': '10.0'}],
'genre': 'Dystopia'}],
'url': 'https://goodreads.com/'},
{'results': [{'data': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},
{'key': 'Divergent', 'value': '9.0'}],
'genre': 'Fantasy'}],
'url': 'https://kindle.com/'}]
I have a list of dictionary and I want to get only a specific item from each dictionary. My data pattern is:
data = [
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 1,
"price": 100
}
},
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 2,
"price": 150
}
},
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 3,
"price": 90
}
}
]
My desired output:
formatted_data = [
{
"id": 1,
"price": 100
},
{
"id": 2,
"price": 150
},
{
"id": 3,
"price": 90
}
]
To formate data I have used iteration (for) like
formatted_data = []
for item in data:
formatted_data.append(item['_source'])
In PHP I can use array_column() instead of for loop. So what will be the alternative of for in python3 in my case?
Thanks in advance.
You can use list comprehension:
In [11]: [e['_source'] for e in data]
Out[11]: [{'id': 1, 'price': 100}, {'id': 2, 'price': 150}, {'id': 3, 'price': 90}]