Merge 2 lists and remove duplicates in Python

I have 2 lists, looking like:
temp_data:
{
"id": 1,
"name": "test (replaced)",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"temperature": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
]}
hum_data:
{
"id": 1,
"name": "test (replaced)",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"humidity": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
]}
I need to merge the 2 lists into 1 without duplicating data. What is the easiest and most efficient way? After merging, I want something like this:
{
"id": 1,
"name": "test",
"code": "test",
"last_update": "2020-01-01",
"online": false,
"data": {
"temperature": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
],
"humidity": [
{
"date": "2019-12-17",
"value": 23.652905748126333
},
...
Thanks for helping.

If your lists hum_data and temp_data are not sorted by id, sort them first and then merge the dictionaries pair-wise. This assumes both lists contain the same ids.
import copy
# Key function used for sorting: order the rows by id
compare_function = lambda value: value['id']
# Sort both lists first so that rows with the same id line up
temp_data.sort(key=compare_function)
hum_data.sort(key=compare_function)
# Deep copy so that updating the nested 'data' dicts does not modify temp_data
combined_data = copy.deepcopy(temp_data)
# Merge each pair of 'data' dictionaries using the update method
for hum_row, combined_row in zip(hum_data, combined_data):
    combined_row['data'].update(hum_row['data'])
# combined hum_data and temp_data
combined_data
If the lists are already sorted, you only need to merge them dictionary by dictionary.
import copy
# Deep copy so that updating the nested 'data' dicts does not modify temp_data
combined_data = copy.deepcopy(temp_data)
# Merge each pair of 'data' dictionaries using the update method
for hum_row, combined_row in zip(hum_data, combined_data):
    combined_row['data'].update(hum_row['data'])
# combined hum_data and temp_data
combined_data
With that code I got the following result:
[
{
'id': 1,
'name': 'test (replaced)',
'code': 'test',
'last_update': '2020-01-01',
'online': False,
'data': {
'temperature': [{'date': '2019-12-17', 'value': 1}],
'humidity': [{'date': '2019-12-17', 'value': 1}]}
},
{
'id': 2,
'name': 'test (replaced)',
'code': 'test',
'last_update': '2020-01-01',
'online': False,
'data': {
'temperature': [{'date': '2019-12-17', 'value': 2}],
'humidity': [{'date': '2019-12-17', 'value': 2}]}
}
]
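If the two lists are not guaranteed to contain the same ids in the same order, pairing rows with zip can silently merge the wrong rows. A minimal sketch of an alternative, assuming every row has an 'id' and a nested 'data' dict as in the question: index the rows by id and merge on that key. The function name merge_by_id and the shortened sample rows are hypothetical.

```python
import copy

def merge_by_id(temp_data, hum_data):
    """Merge rows that share an 'id', combining their nested 'data' dicts."""
    # Index deep copies of the temperature rows by id so the inputs stay untouched.
    merged = {row["id"]: copy.deepcopy(row) for row in temp_data}
    for row in hum_data:
        if row["id"] in merged:
            # Same id seen before: add the new series under its own key.
            merged[row["id"]]["data"].update(copy.deepcopy(row["data"]))
        else:
            # Id only present in hum_data: keep a copy of the row as-is.
            merged[row["id"]] = copy.deepcopy(row)
    return list(merged.values())

# Hypothetical sample rows, shortened from the question.
temp_data = [
    {"id": 2, "code": "test", "data": {"temperature": [{"date": "2019-12-17", "value": 2}]}},
    {"id": 1, "code": "test", "data": {"temperature": [{"date": "2019-12-17", "value": 1}]}},
]
hum_data = [
    {"id": 1, "code": "test", "data": {"humidity": [{"date": "2019-12-17", "value": 1}]}},
    {"id": 2, "code": "test", "data": {"humidity": [{"date": "2019-12-17", "value": 2}]}},
]
combined = merge_by_id(temp_data, hum_data)
```

No sorting is needed here, and rows keep the order in which their id first appears.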

Related

Returning data that is not in ElasticSearch as 0 in doc_count

I am filtering in ElasticSearch. I want doc_count to be returned as 0 for dates that have no data, but those dates are not returned at all; only dates with data come back. Do you know how I can do this? Here is the Python output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
33479 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33480 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33481 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33482 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
33483 {'date': '2022-04-13T08:08:00.000Z', 'value': 7}
And here is my ElasticSearch filter:
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must":
[
{
"range": {
"@timestamp": {
"gte": "now-1M",
"lt": "now"
}
}
}
]
}
},
"aggs": {
"continent": {
"terms": {
"field": "source.geo.continent_name.keyword"
},
"aggs": {
"_source": {
"date_histogram": {
"field": "@timestamp", "interval": "8m"
}}}}}}

You need to set min_doc_count to 0 on each aggregation for which you want buckets with a zero doc_count returned.
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": "now-1M",
"lt": "now"
}
}
}
]
}
},
"aggs": {
"continent": {
"terms": {
"field": "source.geo.continent_name.keyword",
"min_doc_count": 0
},
"aggs": {
"_source": {
"date_histogram": {
"field": "@timestamp",
"interval": "8m",
"min_doc_count": 0
}
}
}
}
}
}
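Since the question is driven from Python, the same request body can be built as a plain dict. This is only a sketch: the helper name build_histogram_query and the aggregation name per_interval are hypothetical, and the field is assumed to be the conventional @timestamp. With min_doc_count set to 0, a date_histogram fills gaps only between the first and last buckets that contain data, so the sketch also sets extended_bounds to cover the whole queried range; whether your cluster accepts date math such as now-1M there is an assumption worth checking against your version.

```python
import json

def build_histogram_query(start="now-1M", end="now", interval="8m"):
    """Build a search body whose date_histogram also returns empty buckets."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"range": {"@timestamp": {"gte": start, "lt": end}}}]
            }
        },
        "aggs": {
            "continent": {
                "terms": {"field": "source.geo.continent_name.keyword"},
                "aggs": {
                    "per_interval": {  # hypothetical aggregation name
                        "date_histogram": {
                            "field": "@timestamp",
                            "interval": interval,
                            # Return buckets even when they hold no documents.
                            "min_doc_count": 0,
                            # Extend the buckets to the edges of the queried range.
                            "extended_bounds": {"min": start, "max": end},
                        }
                    }
                },
            }
        },
    }

query = build_histogram_query()
print(json.dumps(query, indent=2))
```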

How to merge list of dictionaries by unique key value

I want to merge the list of dictionaries provided below so that there is one dictionary per unique combination of channel and zrepcode.
Sample input:
[
{
"channel": 1,
"zrepcode": "123456",
"turn": 7833.9
},
{
"channel": 1,
"zrepcode": "123456",
"pipeline": 324
},
{
"channel": 1,
"zrepcode": "123456",
"inv_bal": 941.16
},
{
"channel": 1,
"zrepcode": "123456",
"display": 341
},
{
"channel": 3,
"zrepcode": "123456",
"display": 941.16
},
{
"channel": 3,
"zrepcode": "123456",
"turn": 7935.01
},
{
"channel": 3,
"zrepcode": "123456",
"pipeline": 0
},
{
"channel": 3,
"zrepcode": "123456",
"inv_bal": 341
},
{
"channel": 3,
"zrepcode": "789789",
"display": 941.16
},
{
"channel": 3,
"zrepcode": "789789",
"turn": 7935.01
},
{
"channel": 3,
"zrepcode": "789789",
"pipeline": 0
},
{
"channel": 3,
"zrepcode": "789789",
"inv_bal": 341
}
]
Sample output:
[
{'channel': 1, 'zrepcode': '123456', 'turn': 7833.9, 'pipeline': 324.0,'display': 341,'inv_bal': 941.16},
{'channel': 3, 'zrepcode': '123456', 'turn': 7935.01, 'pipeline': 0.0, 'display': 941.16, 'inv_bal': 341.0},
{'channel': 3, 'zrepcode': '789789', 'turn': 7935.01, 'pipeline': 0.0, 'display': 941.16, 'inv_bal': 341.0}
]

Easily solved with our good friend collections.defaultdict:
import collections
by_key = collections.defaultdict(dict)
for datum in data:  # data is the list of dicts from the post
    key = (datum.get("channel"), datum.get("zrepcode"))  # form the key tuple
    by_key[key].update(datum)  # merge this dict into the entry for its key tuple
print(list(by_key.values()))
This outputs
[
{'channel': 1, 'zrepcode': '123456', 'turn': 7833.9, 'pipeline': 324, 'inv_bal': 941.16, 'display': 341},
{'channel': 3, 'zrepcode': '123456', 'display': 941.16, 'turn': 7935.01, 'pipeline': 0, 'inv_bal': 341},
{'channel': 3, 'zrepcode': '789789', 'display': 941.16, 'turn': 7935.01, 'pipeline': 0, 'inv_bal': 341},
]
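The same idea can be wrapped in a small helper that takes the key fields as a parameter. This is a sketch under the assumption that rows sharing a key never disagree on a value; if they do, the later row wins because dict.update overwrites. The name merge_rows and the shortened sample are hypothetical.

```python
from collections import defaultdict

def merge_rows(rows, key_fields):
    """Merge dicts that share the same values for all key_fields."""
    merged = defaultdict(dict)
    for row in rows:
        # Build a hashable key from the chosen fields (None if a field is missing).
        key = tuple(row.get(field) for field in key_fields)
        # Later rows overwrite earlier ones on conflicting fields.
        merged[key].update(row)
    return list(merged.values())

# Shortened sample in the shape used in the question.
sample = [
    {"channel": 1, "zrepcode": "123456", "turn": 7833.9},
    {"channel": 1, "zrepcode": "123456", "pipeline": 324},
    {"channel": 3, "zrepcode": "123456", "display": 941.16},
    {"channel": 3, "zrepcode": "789789", "display": 941.16},
]
merged_rows = merge_rows(sample, ("channel", "zrepcode"))
```

Because dicts preserve insertion order, the merged rows come back in the order in which each key combination first appears.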

Sort list of nested dictionaries by multiple attributes

I have my sample data as
b = [{"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
{"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
{"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
{"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
{"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231}]
I need to sort such that score ASC, name DESC, id ASC (in relational DB notation).
So far, I have implemented
def sort_func(e):
    return (e['score'], e['name']['d_name'], e['id'])
a = sorted(b, key=sort_func, reverse=False)
This works for score ASC, name ASC, id ASC.
But for score ASC, name DESC, id ASC, sorting by name DESC throws an error, because the unary - in -e['name']['d_name'] is not defined for strings.
How can I approach this problem from here? Thanks.
Edit 1:
I need to make this dynamic, so that there can be cases such as e['name']['d_name'] ASC, e['name']['ingredient'] DESC. How can I handle this type of dynamic behaviour?

You can sort by -score, name, -id with reverse=True:
from pprint import pprint
b = [
{
"id": 1,
"name": {"d_name": "miranda", "ingredient": "orange"},
"score": 1.123,
},
{
"id": 20,
"name": {"d_name": "limca", "ingredient": "lime"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "coke", "ingredient": "water"},
"score": 4.231,
},
{
"id": 2,
"name": {"d_name": "fanta", "ingredient": "water"},
"score": 4.231,
},
{
"id": 3,
"name": {"d_name": "dew", "ingredient": "water & sugar"},
"score": 2.231,
},
]
pprint(
sorted(
b,
key=lambda k: (-k["score"], k["name"]["d_name"], -k["id"]),
reverse=True,
)
)
Prints:
[{'id': 1,
'name': {'d_name': 'miranda', 'ingredient': 'orange'},
'score': 1.123},
{'id': 3,
'name': {'d_name': 'dew', 'ingredient': 'water & sugar'},
'score': 2.231},
{'id': 20, 'name': {'d_name': 'limca', 'ingredient': 'lime'}, 'score': 4.231},
{'id': 2, 'name': {'d_name': 'fanta', 'ingredient': 'water'}, 'score': 4.231},
{'id': 3, 'name': {'d_name': 'coke', 'ingredient': 'water'}, 'score': 4.231}]
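Negating values only works for numbers, so it does not cover the dynamic case from Edit 1 where a string field must be sorted descending. One possible approach, sketched here, relies on Python's sort being stable: sort once per key, starting with the least significant key, and pass reverse=True only for the descending ones. The helper name multisort and the shape of specs are hypothetical.

```python
def multisort(rows, specs):
    """Sort rows by several keys; specs is a list of (key_func, descending)
    pairs given in priority order (most significant first)."""
    result = list(rows)
    # Apply the least significant key first; each later stable sort
    # keeps the earlier ordering among rows that compare equal.
    for key_func, descending in reversed(specs):
        result.sort(key=key_func, reverse=descending)
    return result

b = [
    {"id": 1, "name": {"d_name": "miranda", "ingredient": "orange"}, "score": 1.123},
    {"id": 20, "name": {"d_name": "limca", "ingredient": "lime"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "coke", "ingredient": "water"}, "score": 4.231},
    {"id": 2, "name": {"d_name": "fanta", "ingredient": "water"}, "score": 4.231},
    {"id": 3, "name": {"d_name": "dew", "ingredient": "water & sugar"}, "score": 2.231},
]

# score ASC, name DESC, id ASC
specs = [
    (lambda e: e["score"], False),
    (lambda e: e["name"]["d_name"], True),
    (lambda e: e["id"], False),
]
ordered = multisort(b, specs)
```

Since specs is ordinary data, it can be built at runtime from whatever fields and directions the caller requests.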

Remove duplicates from list of dictionaries created using groupby itertools in Python

I want to remove some duplicates in my merged dictionary.
My data:
mongo_data = [{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'},],
'vendor': 'Fantasy'
},{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Hunger Games', 'value': '10.0'},
{'key': 'Maze Runner', 'value': '5.5'},],
'vendor': 'Dystopia'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'Divergent', 'value': '9.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},],
'vendor': 'Fantasy'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Divergent', 'value': '9.0'},],
'vendor': 'Fantasy'
}]
My code:
from itertools import groupby
searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            "data": [{'key': key['key'], 'value': key['value']}
                     for result2 in group2
                     for key in result2["variables"]],
        }
        search["results"].append(result)
    searches.append(search)
My result:
[
{
"url": "https://goodreads.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Harry Potter",
"value": "10.0"
},
{
"key": "Discovery of Witches",
"value": "8.5"
}
]
},
{
"genre": "Dystopia",
"data": [
{
"key": "Hunger Games",
"value": "10.0"
},
{
"key": "Maze Runner",
"value": "5.5"
}
]
}
]
},
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
},
{
"key": "Divergent",
"value": "9.0"
}
]
}
]
}
]
I do not want any duplicates in my structure. I am not sure on how to take them out. My expected result can be seen below.
Expected result:
[
{
"url": "https://goodreads.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Harry Potter",
"value": "10.0"
},
{
"key": "Discovery of Witches",
"value": "8.5"
}
]
},
{
"genre": "Dystopia",
"data": [
{
"key": "Hunger Games",
"value": "10.0"
},
{
"key": "Maze Runner",
"value": "5.5"
}
]
}
]
},
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
}
]
}
]
}
]
Divergent is repeated in the last list of dictionaries. When I merged my dictionaries, even the duplicates inside https://kindle.com/ --> Fantasy got merged into one list. Is there a way for me to remove the duplicate dictionary?
I want the https://kindle.com/ part to look like:
{
"url": "https://kindle.com/",
"results": [
{
"genre": "Fantasy",
"data": [
{
"key": "Divergent",
"value": "9.0"
},
{
"key": "Lord of the Rings",
"value": "9.0"
},
{
"key": "The Handmaids Tale",
"value": "10.0"
}
]
}
]
}

You can convert those dicts to a set of tuples first and then convert the set back to a list of dicts. Note that a set does not preserve the original order of the items:
mongo_data = [{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'},],
'vendor': 'Fantasy'
},{
'url': 'https://goodreads.com/',
'variables': [{'key': 'Hunger Games', 'value': '10.0'},
{'key': 'Maze Runner', 'value': '5.5'},],
'vendor': 'Dystopia'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'Divergent', 'value': '9.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},],
'vendor': 'Fantasy'
},{
'url': 'https://kindle.com/',
'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Divergent', 'value': '9.0'},],
'vendor': 'Fantasy'
}]
from itertools import groupby
searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            "data": set((key['key'], key['value'])
                        for result2 in group2
                        for key in result2["variables"]),
        }
        result['data'] = [{"key": tup[0], "value": tup[1]} for tup in result['data']]
        search["results"].append(result)
    searches.append(search)
searches
Output:
[{'results': [{'data': [{'key': 'Harry Potter', 'value': '10.0'},
{'key': 'Discovery of Witches', 'value': '8.5'}],
'genre': 'Fantasy'},
{'data': [{'key': 'Maze Runner', 'value': '5.5'},
{'key': 'Hunger Games', 'value': '10.0'}],
'genre': 'Dystopia'}],
'url': 'https://goodreads.com/'},
{'results': [{'data': [{'key': 'The Handmaids Tale', 'value': '10.0'},
{'key': 'Lord of the Rings', 'value': '9.0'},
{'key': 'Divergent', 'value': '9.0'}],
'genre': 'Fantasy'}],
'url': 'https://kindle.com/'}]
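As the output above shows, going through a set shuffles the entries. If the original order matters, one possible variation is to deduplicate with dict.fromkeys, which keeps the first occurrence of each (key, value) pair in order. This is a sketch with hypothetical helper names (dedupe_variables, build_searches); like the code above, it assumes the input is already ordered so that equal urls and vendors are adjacent, because groupby only groups consecutive items.

```python
from itertools import groupby

def dedupe_variables(rows):
    """Collect (key, value) pairs from all rows, dropping repeats but keeping order."""
    pairs = ((var["key"], var["value"]) for row in rows for var in row["variables"])
    # dict.fromkeys keeps only the first occurrence of each pair, in insertion order.
    return [{"key": k, "value": v} for k, v in dict.fromkeys(pairs)]

def build_searches(mongo_data):
    searches = []
    for url, by_url in groupby(mongo_data, key=lambda row: row["url"]):
        results = [
            {"genre": vendor, "data": dedupe_variables(by_vendor)}
            for vendor, by_vendor in groupby(by_url, key=lambda row: row["vendor"])
        ]
        searches.append({"url": url, "results": results})
    return searches

# The two kindle.com rows from the question.
sample = [
    {"url": "https://kindle.com/",
     "variables": [{"key": "Divergent", "value": "9.0"},
                   {"key": "Lord of the Rings", "value": "9.0"}],
     "vendor": "Fantasy"},
    {"url": "https://kindle.com/",
     "variables": [{"key": "The Handmaids Tale", "value": "10.0"},
                   {"key": "Divergent", "value": "9.0"}],
     "vendor": "Fantasy"},
]
searches = build_searches(sample)
```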

What is the equivalent of array_column in python3

I have a list of dictionaries and I want to get only a specific item from each dictionary. My data pattern is:
data = [
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 1,
"price": 100
}
},
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 2,
"price": 150
}
},
{
"_id": "uuid",
"_index": "my_index",
"_score": 1,
"_source": {
"id" : 3,
"price": 90
}
}
]
My desired output:
formatted_data = [
{
"id": 1,
"price": 100
},
{
"id": 2,
"price": 150
},
{
"id": 3,
"price": 90
}
]
To format the data I have used iteration (for) like
formatted_data = []
for item in data:
    formatted_data.append(item['_source'])
In PHP I can use array_column() instead of a for loop. So what is the alternative to the for loop in python3 in my case?
Thanks in advance.

You can use a list comprehension:
In [11]: [e['_source'] for e in data]
Out[11]: [{'id': 1, 'price': 100}, {'id': 2, 'price': 150}, {'id': 3, 'price': 90}]
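If the same extraction is needed in several places, the comprehension can be wrapped in a small helper. The sketch below uses the hypothetical name array_column, borrowed from the PHP function mentioned in the question; it is not a Python built-in, and skipping rows that lack the key is a design choice made here.

```python
def array_column(rows, column):
    """Return the value stored under `column` for every row that has it."""
    return [row[column] for row in rows if column in row]

data = [
    {"_id": "uuid", "_index": "my_index", "_score": 1, "_source": {"id": 1, "price": 100}},
    {"_id": "uuid", "_index": "my_index", "_score": 1, "_source": {"id": 2, "price": 150}},
    {"_id": "uuid", "_index": "my_index", "_score": 1, "_source": {"id": 3, "price": 90}},
]

formatted_data = array_column(data, "_source")
prices = array_column(formatted_data, "price")
```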
