Create unique list of dictionaries using dict keys - python

I have a list of dictionaries and I want a new list of dictionaries that is unique by two keys: 1. City, 2. Country.
list = [
{ City: "Gujranwala", Country: "Pakistan", other_columns },
{ City: "Gujrwanala", Country: "India", other_columns },
{ City: "Gujranwala", Country: "Pakistan", other_columns }
]
The output should be:
list = [
{ City: "Gujranwala", Country: "Pakistan", other_columns },
{ City: "Gujrwanala", Country: "India", other_columns }
]

You can first extract the key-value pairs from the dicts and then remove duplicates by using a set. So you can do something like this:
Convert dicts into a list of dict_items:
dict_items = [tuple(d.items()) for d in lst] # they need to be tuples, otherwise you wouldn't be able to cast the list to a set
Deduplicate:
deduplicated = set(dict_items)
Convert the dict_items back to dicts:
back_to_dicts = [dict(i) for i in deduplicated]
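Putting the three steps together, a runnable sketch (note that this needs every dict value to be hashable, and it deduplicates on all keys, not only City and Country):

```python
lst = [
    {"City": "Gujranwala", "Country": "Pakistan"},
    {"City": "Gujrwanala", "Country": "India"},
    {"City": "Gujranwala", "Country": "Pakistan"}
]

# 1. convert each dict into a hashable tuple of its items
dict_items = [tuple(d.items()) for d in lst]
# 2. deduplicate via a set (order is not preserved)
deduplicated = set(dict_items)
# 3. convert back to dicts
back_to_dicts = [dict(i) for i in deduplicated]
```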

I'm sure there are many other and probably better approaches to this problem, but you can use:
l = [
    { "City": "Gujranwala", "Country": "Pakistan" },
    { "City": "Gujrwanala", "Country": "India" },
    { "City": "Gujranwala", "Country": "Pakistan" }
]
ll, v = [], set()
for d in l:
    k = d["City"] + d["Country"]
    if k not in v:
        v.add(k)
        ll.append(d)
print(ll)
# [{'City': 'Gujranwala', 'Country': 'Pakistan'}, {'City': 'Gujrwanala', 'Country': 'India'}]
We basically maintain a set of concatenated city/country strings, which we use to check whether that combination is already present in the final list.

One way to do this reduction is to keep a dictionary with a unique key for every (city, country) combination. In my case I've just concatenated both properties to form the key, which is a simple working solution.
We use a dictionary here because lookups in a dictionary happen in constant time, so the whole algorithm runs in O(n).
lst = [
    {"City": "Gujranwala", "Country": "Pakistan"},
    {"City": "Gujrwanala", "Country": "India"},
    {"City": "Gujranwala", "Country": "Pakistan"}
]
unique = dict()
for item in lst:
    # concatenate City and Country into the key
    key = f"{item['City']}{item['Country']}"
    # only add the value if we do not already have an item with this key
    if key not in unique:
        unique[key] = item
# get the dictionary values (we don't care about the keys now)
result = list(unique.values())
print(result)
Expected output:
[{'City': 'Gujranwala', 'Country': 'Pakistan'}, {'City': 'Gujrwanala', 'Country': 'India'}]
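As a side note, string concatenation can collide (e.g. "ab" + "c" equals "a" + "bc"); a sketch of the same approach using a (City, Country) tuple as the key, which avoids that:

```python
lst = [
    {"City": "Gujranwala", "Country": "Pakistan"},
    {"City": "Gujrwanala", "Country": "India"},
    {"City": "Gujranwala", "Country": "Pakistan"}
]

unique = {}
for item in lst:
    key = (item["City"], item["Country"])  # tuples are hashable and collision-free
    if key not in unique:
        unique[key] = item

result = list(unique.values())
```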

Related

How to sort nested dictionary by values in python

PROBLEM
I have a nested dictionary that I want to sort by values. There are many solutions out there for this question, but I couldn't find one that satisfies my sorting condition.
CONDITION
I want to sort the dict in descending order of the likes given in the dict.
Dict
dict = {actor_name: {movie_name: likes}}
eg:- {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}
output should be:- {'tom holland': {'spiderman-nwh': 3000}, 'gal gadot': {'red notice': 1000}}
I suggest improving your data structure first.
As an example you could use a list of dictionaries list[dict].
This would help you later, if you expand your structure.
Try this structure:
data = [
    {
        "actor": "gal gadot",
        "movies": {
            "name": "red notice",
            "likes": 1000,
        },
    },
    {
        "actor": "tom holland",
        "movies": {
            "name": "spiderman-nwh",
            "likes": 3000,
        },
    },
]
Using that structure, you can sort your data like this:
# Least likes first
least_likes_sorted = sorted(data, key=lambda x: x["movies"]["likes"])
# Most likes first
most_likes_sorted = sorted(data, key=lambda x: x["movies"]["likes"], reverse=True)
You could build a list of tuples where each element is (likes, movie, actor).
Then sort the list in reverse.
Then reconstruct your dictionary.
Like this:
data = {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}
lot = []
for k, v in data.items():
    k_, v_ = next(iter(v.items()))
    lot.append((v_, k_, k))
newdata = {a: {b: c} for c, b, a in sorted(lot, reverse=True)}
print(newdata)
Output:
{'tom holland': {'spiderman-nwh': 3000}, 'gal gadot': {'red notice': 1000}}
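If you'd rather keep the original nested structure, a minimal sketch that sorts the dict directly (relying on dicts preserving insertion order in Python 3.7+):

```python
data = {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}

# Sort the (actor, inner_dict) pairs by the single likes value inside each
# inner dict, then rebuild a dict in that order.
sorted_data = dict(
    sorted(data.items(), key=lambda kv: next(iter(kv[1].values())), reverse=True)
)
```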

Merging multiple dictionaries with inconsistent keys

I'm a Python beginner and struggling with the following:
I'm attempting to merge multiple lists with nested dictionaries that I've decoded from multiple jsons. The common thread between the lists is the "uid" key for each nested dict corresponding to a name, but the problem is that some dicts have different names for the keys. For example, instead of "uid", a dict may have "number" as the key. I'd like to merge pieces of them together into a super nested-dictionary list of sorts. To illustrate, what I have is:
masterlist = [ ]
listA = [{"uid": "12345", "name": "John Smith"}, {etc...}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"}, {etc...}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"}, {etc...}]
What I'd like to end up with is:
masterlist = [{"uid": "12345", "name": "John Smith", "val1": "25", "val2": "65"}, {etc...}]
Is this possible to do efficiently/pythonically by iterating through and comparing for the identical "uid" value? I've seen a lot of how-tos on merging by matching keys but problem here obviously is the keys are not consistent. Sorting doesn't matter. All I need is for the master list to contain the corresponding uid, name, and values for each dict entry. Hopefully that makes sense and thank you!
There are probably solutions using base Python, but the simplest way I can think of is to use the pandas library to convert each list to a DataFrame, then join/merge them together.
import pandas as pd
dfA = pd.DataFrame(listA)
dfB = pd.DataFrame(listB)
merged_df = dfA.merge(dfB, left_on='uid', right_on='number')
That would return a DataFrame with more columns than you need (i.e. there would be columns for both "uid" and "number"), but you could specify which ones you want and the order you want them this way:
merged_df = merged_df[['uid', 'name', 'val1']]
For merging multiple DataFrames into one master frame, see here: pandas three-way joining multiple dataframes on columns
If you need to use different keys for each list, here is a solution that also uses an intermediate dict, with a function that takes the key representing uid and one or more keys to copy:
people_by_uid = {person["uid"]: person for person in listA}

def update_values(listX, uid_key, *val_keys):
    for entry in listX:
        person = people_by_uid[entry[uid_key]]
        for val_key in val_keys:
            person[val_key] = entry[val_key]

update_values(listB, "number", "val1")
update_values(listC, "number", "val2")
# e.g. if you had a listD from which you also needed val3 and val4:
# update_values(listD, "number", "val3", "val4")
masterlist = list(people_by_uid.values())
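For reference, a self-contained run of this approach with the single-entry example lists from the question (the hypothetical listD omitted):

```python
listA = [{"uid": "12345", "name": "John Smith"}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"}]

# Index the "canonical" records by uid.
people_by_uid = {person["uid"]: person for person in listA}

def update_values(listX, uid_key, *val_keys):
    # Copy the requested value keys onto the matching person record.
    for entry in listX:
        person = people_by_uid[entry[uid_key]]
        for val_key in val_keys:
            person[val_key] = entry[val_key]

update_values(listB, "number", "val1")
update_values(listC, "number", "val2")
masterlist = list(people_by_uid.values())
```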
Put all your input lists in a list of lists, so that you can construct a dict mapping each uid to a dict of aggregated item values; your desired list of dicts is then simply the values of that mapping. To allow for inconsistent key names in the input dicts, pop the ones you don't want (such as number and id in my example) and re-assign the value under the key you want to keep (uid in the example):
wanted_key = 'uid'
unwanted_keys = {'number', 'id'}
mapping = {}
for l in lists:
    for d in l:
        if wanted_key not in d:
            d[wanted_key] = d.pop(unwanted_keys.intersection(d).pop())
        mapping.setdefault(d[wanted_key], {}).update(d)
masterlist = list(mapping.values())
so that given:
lists = [
    [
        {"uid": "12345", "name": "John Smith"},
        {"uid": "56789", "name": "Joe Brown", "val1": "1"}
    ],
    [
        {"number": "12345", "name": "John Smith", "val1": "25"},
        {"number": "56789", "name": "Joe Brown", "val2": "2"}
    ],
    [
        {"id": "12345", "name": "John Smith", "val2": "65"}
    ]
]
masterlist becomes:
[
    {'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'},
    {'uid': '56789', 'name': 'Joe Brown', 'val1': '1', 'val2': '2'}
]
You can do this without Pandas using a list comprehension that builds a dictionary of dictionaries to group the list's dictionaries by their "uid". You then take the .values() of that grouping dictionary to get a list of dictionaries again:
listA = [{"uid": "12345", "name": "John Smith"},{"uid": "67890", "name": "Jane Doe"}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"},{"number": "67890", "val1": "37"}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"},{"number": "67890", "val2": "53"}]
from collections import defaultdict
fn = { "number":"uid", "person":"name" } # map to get uniform key names
data = [ { fn.get(k,k):v for k,v in d.items() } for d in listA+listB+listC ]
result = next(r for r in [defaultdict(dict)] if [r[d["uid"]].update(d) for d in data])
print(*result.values(), sep="\n")
{'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'}
{'uid': '67890', 'name': 'Jane Doe', 'val1': '37', 'val2': '53'}
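The one-liner above uses a list comprehension purely for its side effect; a more readable equivalent of the same grouping (my rewrite, not part of the original answer) is a plain loop over a defaultdict:

```python
from collections import defaultdict

# Records with uniform key names, as produced by the renaming step above.
data = [
    {"uid": "12345", "name": "John Smith"},
    {"uid": "12345", "val1": "25"},
    {"uid": "12345", "val2": "65"},
]

result = defaultdict(dict)
for d in data:
    result[d["uid"]].update(d)  # merge every record with the same uid

merged = list(result.values())
```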

Remove duplicate of a dictionary from list

How can I remove duplicates by the key "name"?
[
    {
        'items': [
            {'$oid': '5a192d0590866ecc5c1f1683'}
        ],
        'image': 'image12',
        '_id': {'$oid': '5a106f7490866e25ddf70cef'},
        'name': 'Amala',
        'store': {'$oid': '5a0a10ad90866e5abae59470'}
    },
    {
        'items': [
            {'$oid': '5a192d2890866ecc5c1f1684'}
        ],
        'image': 'fourth shit',
        '_id': {'$oid': '5a106fa190866e25ddf70cf0'},
        'name': 'Amala',
        'store': {'$oid': '5a0a10ad90866e5abae59470'}
    }
]
I want to merge together dictionaries with the same key "name".
Here is what I have tried:
b = []
for q in data:
    if len(data) == 0:
        b.append(q)
    else:
        for y in b:
            if q['name'] != y['name']:
                b.append(q)
but after trying this, the b list doesn't contain the unique dictionaries I wanted.
You loop through the assembled list and if you find a dict with a different name, you add the current dict. The logic should be different: only add it if you don't find one with the same name!
That being said, you should maintain a set of seen names. That will make the check more performant:
b, seen = [], set()
for q in data:
    if q['name'] not in seen:
        b.append(q)
        seen.add(q['name'])

Iterate list and compare its values with dict

I am dealing with 1. a list of dictionaries and 2. a list. I am trying to:
1. iterate through the list (list1),
2. match list1's value with the ID in the API response; if found, put the entire dict in new_dict,
3. else skip.
API response in JSON format:
list_of_dict = [
    {"id": 1500, "f_name": "alex", "age": 25},
    {"id": 1501, "f_name": "Bob", "age": 30},
    {"id": 1600, "f_name": "Charlie", "age": 35},
    ...
]
And a list1:
list1=[1500,1501,1211.....]
According to this, 1500 and 1501 are present in list_of_dict, so those entire dicts will be added to new_dict.
My attempt:
new_dict = dict()
for i, val in enumerate(list1):
    # assuming val found in dict
    # so put it in new dict
    print i, "=", val
    new_dict.update({"id": val, "name": name, "age": age})
What I see is this code keeping only the last item of the list when it updates the dict, but in my case new_dict should contain two dictionaries, with ids 1500 and 1501. What am I missing?
list_of_dict = [
    {"id": 1500, "f_name": "alex", "age": 25},
    {"id": 1501, "f_name": "Bob", "age": 30},
    {"id": 1600, "f_name": "Charlie", "age": 35}
]
list1 = [1500, 1501, 1211]
ret = list(filter(lambda x: x['id'] in list1, list_of_dict))
print(ret)
Check out the useful and simple filter function built into Python. It iterates through an iterable (e.g. a list) and returns only the items for which the provided function returns true (in Python 3, filter returns a lazy iterator).
In this case, our filtering function is:
lambda x: x['id'] in list1
list_of_dict = [
    {"id": 1500, "f_name": "alex", "age": 25},
    {"id": 1501, "f_name": "Bob", "age": 30},
    {"id": 1600, "f_name": "Charlie", "age": 35},
    ...
]
dicts = {d['id']:d for d in list_of_dict}
list1=[1500,1501,1211.....]
answer = [dicts[k] for k in list1 if k in dicts]
You can do this with a simple list comprehension, filtering the dicts by whether their id is in the list:
result = [d for d in list_of_dict if d["id"] in list1]
If your list1 is larger, you might want to turn it into a set first, so the lookup is faster:
list1_as_set = set(list1)
result = [d for d in list_of_dict if d["id"] in list1_as_set]

List of values for a key in List of list of dictionaries?

Suppose I have a list of lists of dictionaries.
I'm pretty sure Python has a nice and short way (without writing two for loops) to retrieve the values associated with a key. (Every dictionary has the key.)
How can I do this?
Edit: here is a sample (JSON representation) of the input:
[
    [
        {"lat": 45.1845931, "lgt": 5.7316984, "name": "Chavant", "sens": "", "line": [], "stationID": ""},
        {"lat": 45.1845898, "lgt": 5.731746, "name": "Chavant", "sens": "", "line": [], "stationID": ""}
    ],
    [
        {"lat": 45.1868233, "lgt": 5.7565727, "name": "Neyrpic - Belledonne", "sens": "", "line": [], "stationID": ""},
        {"lat": 45.1867322, "lgt": 5.7568569, "name": "Neyrpic - Belledonne", "sens": "", "line": [], "stationID": ""}
    ]
]
As output I'd like to have a list of names.
PS: Data under ODBL.
If you need all names in flat list:
response = # your big list
[d.get('name') for lst in response for d in lst]
If you want to keep the inner-list structure:
[[d.get('name') for d in lst] for lst in response]
Call your list of lists of dictionaries L. You can then retrieve all names with a list comprehension that iterates through each sublist and then each dictionary.
Demo
>>> vals = [ d['name'] for sublist in L for d in sublist ]
>>> vals
[u'Chavant', u'Chavant', u'Neyrpic - Belledonne', u'Neyrpic - Belledonne']
Note that this returns a flattened list of all names (and that 'Chavant' appears twice, since it appears twice in your input).
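An equivalent flattening (shown here as an alternative sketch) uses itertools.chain.from_iterable instead of the double loop:

```python
from itertools import chain

L = [
    [{"name": "Chavant"}, {"name": "Chavant"}],
    [{"name": "Neyrpic - Belledonne"}, {"name": "Neyrpic - Belledonne"}],
]

# chain.from_iterable lazily concatenates the sublists into one stream of dicts
names = [d["name"] for d in chain.from_iterable(L)]
```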
