Merging multiple dictionaries with inconsistent keys - python

I'm a Python beginner and struggling with the following:
I'm attempting to merge multiple lists of nested dictionaries that I've decoded from several JSON files. The common thread between the lists is the "uid" key of each nested dict, which corresponds to a name, but the problem is that some dicts use different names for that key. For example, instead of "uid", a dict may have "number" as the key. I'd like to merge pieces of them together into one combined list of dicts. To illustrate, what I have is:
masterlist = [ ]
listA = [{"uid": "12345", "name": "John Smith"}, {etc...}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"}, {etc...}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"}, {etc...}]
What I'd like to end up with is:
masterlist = [{"uid": "12345", "name": "John Smith", "val1": "25", "val2": "65"}, {etc...}]
Is this possible to do efficiently/pythonically by iterating through and comparing the identical "uid" values? I've seen a lot of how-tos on merging by matching keys, but the problem here is obviously that the keys are not consistent. Sorting doesn't matter. All I need is for the master list to contain the corresponding uid, name, and values for each dict entry. Hopefully that makes sense, and thank you!

There are probably solutions using base Python, but the simplest way I can think of is to use the pandas library to convert each list to a DataFrame, then join/merge them together.
import pandas as pd
dfA = pd.DataFrame(listA)
dfB = pd.DataFrame(listB)
merged_df = dfA.merge(dfB, left_on='uid', right_on='number')
That would return a DataFrame with more columns than you need (i.e. there would be columns for both "uid" and "number"), but you could specify which ones you want and the order you want them this way:
merged_df = merged_df[['uid', 'name', 'val1']]
For merging multiple DataFrames into one master frame, see here: pandas three-way joining multiple dataframes on columns
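If you need all three example lists combined into a single frame, one way to do it (just a sketch; the column renames are my own addition to make the join keys consistent) is to rename "number"/"person" to "uid"/"name" first and then reduce-merge:
import pandas as pd
from functools import reduce

dfA = pd.DataFrame(listA)
dfB = pd.DataFrame(listB).rename(columns={"number": "uid", "person": "name"})
dfC = pd.DataFrame(listC).rename(columns={"number": "uid", "person": "name"})
# merge every frame on the now-consistent "uid" and "name" columns
master_df = reduce(lambda left, right: left.merge(right, on=["uid", "name"]), [dfA, dfB, dfC])
masterlist = master_df.to_dict(orient="records")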

If you need to use different keys for each list, here is a solution that also uses an intermediate dict, with a function that takes the key representing uid and one or more keys to copy:
people_by_uid = {person["uid"]: person for person in listA}

def update_values(listX, uid_key, *val_keys):
    for entry in listX:
        person = people_by_uid[entry[uid_key]]
        for val_key in val_keys:
            person[val_key] = entry[val_key]

update_values(listB, "number", "val1")
update_values(listC, "number", "val2")
# e.g. if you had a listD from which you also needed val3 and val4:
update_values(listD, "number", "val3", "val4")

masterlist = [person for person in people_by_uid.values()]
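With the three example lists from the question, masterlist would then contain merged entries like [{'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'}, ...].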

You should put all your input lists in a list of lists. You can then construct a dict that maps each uid to a dict of aggregated item values, so that your desired list of dicts is simply the values of that mapping. To allow for inconsistent naming of the key in different input dicts, pop the ones you don't want (such as number and id in my example) and assign the value back under the key you want to keep (such as uid in the example):
wanted_key = 'uid'
unwanted_keys = {'number', 'id'}
mapping = {}
for l in lists:
    for d in l:
        if wanted_key not in d:
            d[wanted_key] = d.pop(unwanted_keys.intersection(d).pop())
        mapping.setdefault(d[wanted_key], {}).update(d)
masterlist = list(mapping.values())
so that given:
lists = [
    [
        {"uid": "12345", "name": "John Smith"},
        {"uid": "56789", "name": "Joe Brown", "val1": "1"}
    ],
    [
        {"number": "12345", "name": "John Smith", "val1": "25"},
        {"number": "56789", "name": "Joe Brown", "val2": "2"}
    ],
    [
        {"id": "12345", "name": "John Smith", "val2": "65"}
    ]
]
masterlist becomes:
[
    {'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'},
    {'uid': '56789', 'name': 'Joe Brown', 'val1': '1', 'val2': '2'}
]

You can do this without Pandas using a list comprehension that builds a dictionary of dictionaries to group the list's dictionaries by their "uid". You then take the .values() of that grouping dictionary to get a list of dictionaries again:
listA = [{"uid": "12345", "name": "John Smith"},{"uid": "67890", "name": "Jane Doe"}]
listB = [{"number": "12345", "person": "John Smith", "val1": "25"},{"number": "67890", "val1": "37"}]
listC = [{"number": "12345", "person": "John Smith", "val2": "65"},{"number": "67890", "val2": "53"}]
from collections import defaultdict
fn = { "number":"uid", "person":"name" } # map to get uniform key names
data = [ { fn.get(k,k):v for k,v in d.items() } for d in listA+listB+listC ]
result = next(r for r in [defaultdict(dict)] if [r[d["uid"]].update(d) for d in data])
print(*result.values())
{'uid': '12345', 'name': 'John Smith', 'val1': '25', 'val2': '65'}
{'uid': '67890', 'name': 'Jane Doe', 'val1': '37', 'val2': '53'}
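The next(...) expression is just a trick to build the defaultdict and populate it in a single statement; if that reads as too magical, an equivalent plain loop (same result, reusing the data list built above) would be:
from collections import defaultdict

result = defaultdict(dict)
for d in data:
    result[d["uid"]].update(d)  # merge every record that shares this uid
masterlist = list(result.values())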

Related

Create unique list of dictionaries using dict keys

I have a list of dictionaries and I want to get a new list of dictionaries that is unique on two keys: 1. City, 2. Country.
list = [
    { City: "Gujranwala", Country: "Pakistan", other_columns },
    { City: "Gujrwanala", Country: "India", other_columns },
    { City: "Gujranwala", Country: "Pakistan", other_columns }
]
The output should be:
list = [
    { City: "Gujranwala", Country: "Pakistan", other_columns },
    { City: "Gujrwanala", Country: "India", other_columns }
]
You can first extract the key-value pairs from the dicts and then remove duplicates by using a set. So you can do something like this:
Convert dicts into a list of dict_items:
dict_items = [tuple(d.items()) for d in lst] # they need to be tuples, otherwise you wouldn't be able to cast the list to a set
Deduplicate:
deduplicated = set(dict_items)
Convert the dict_items back to dicts:
back_to_dicts = [dict(i) for i in deduplicated]
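Put together, a minimal sketch of those three steps (assuming the input is named lst and the keys are real strings, unlike the pseudo-code in the question):
lst = [
    {"City": "Gujranwala", "Country": "Pakistan"},
    {"City": "Gujrwanala", "Country": "India"},
    {"City": "Gujranwala", "Country": "Pakistan"}
]
# freeze each dict's items into a tuple so the set can hash them
dict_items = [tuple(d.items()) for d in lst]
deduplicated = set(dict_items)
back_to_dicts = [dict(i) for i in deduplicated]
print(back_to_dicts)  # order is not guaranteed after going through a set
Note that this treats every key-value pair as part of the uniqueness check, so it only matches the City/Country requirement when there are no other columns that differ.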
I'm sure there are many other and probably better approaches to this problem, but you can use:
l = [
    { "City": "Gujranwala", "Country": "Pakistan" },
    { "City": "Gujrwanala", "Country": "India" },
    { "City": "Gujranwala", "Country": "Pakistan" }
]
ll, v = [], set()
for d in l:
    k = d["City"] + d["Country"]
    if k not in v:
        v.add(k)
        ll.append(d)
print(ll)
# [{'City': 'Gujranwala', 'Country': 'Pakistan'}, {'City': 'Gujrwanala', 'Country': 'India'}]
We basically build a set of strings combining the city and country, which we use to check whether that city/country pair is already present in the final list.
One way to do this reduction is to have a dictionary with a unique key for every city, country combination. In my case I've just concatenated both those properties for the key which is a simple working solution.
We are using a dictionary here as the lookup on a dictionary happens in constant time, so the whole algorithm will run in O(n).
lst = [
    {"City": "Gujranwala", "Country": "Pakistan"},
    {"City": "Gujrwanala", "Country": "India"},
    {"City": "Gujranwala", "Country": "Pakistan"}
]
unique = dict()
for item in lst:
    # concatenate key
    key = f"{item['City']}{item['Country']}"
    # only add the value to the dictionary if we do not already have an item with this key
    if key not in unique:
        unique[key] = item
# get the dictionary values (we don't care about the keys now)
result = list(unique.values())
print(result)
Expected output:
[{'City': 'Gujranwala', 'Country': 'Pakistan'}, {'City': 'Gujrwanala', 'Country': 'India'}]

How to sort nested dictionary by values in python

PROBLEM
I have a nested dictionary and I want to sort it by values. There are many solutions out there for the same question, but I couldn't find one that satisfies my sorting condition.
CONDITION
I want to sort the dict in descending order of the likes given in the dict
Dict
dict = {actor_name: {movie_name: likes}}
eg:- {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}
output should be:- {'tom holland': {'spiderman-nwh': 3000}, 'gal gadot': {'red notice': 1000}}
I suggest improving your data structure first.
As an example you could use a list of dictionaries list[dict].
This would help you later, if you expand your structure.
Try this structure:
data = [
    {
        "actor": "gal gadot",
        "movies": {
            "name": "red notice",
            "likes": 1000,
        },
    },
    {
        "actor": "tom holland",
        "movies": {
            "name": "spiderman-nwh",
            "likes": 3000,
        },
    },
]
Using that structure, you can sort your data like this:
# Least likes first
least_likes_sorted = sorted(data, key=lambda x: x["movies"]["likes"])
# Most likes first
most_likes_sorted = sorted(data, key=lambda x: x["movies"]["likes"], reverse=True)
You could build a list of tuples where each element is (likes, movie, actor).
Then sort the list in reverse.
Then reconstruct your dictionary.
Like this:
data = {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}
lot = []
for k, v in data.items():
    k_, v_ = next(iter(v.items()))
    lot.append((v_, k_, k))
newdata = {a : {b: c} for c, b, a in sorted(lot, reverse=True)}
print(newdata)
Output:
{'tom holland': {'spiderman-nwh': 3000}, 'gal gadot': {'red notice': 1000}}
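If you'd rather keep the original nested {actor: {movie: likes}} shape instead of restructuring, a shorter variant of the same idea (a sketch, assuming each actor maps to exactly one movie as in the example) is:
data = {'gal gadot': {'red notice': 1000}, 'tom holland': {'spiderman-nwh': 3000}}
# sort actors by the likes value stored in their inner dict, largest first
newdata = dict(sorted(data.items(), key=lambda kv: max(kv[1].values()), reverse=True))
print(newdata)
# {'tom holland': {'spiderman-nwh': 3000}, 'gal gadot': {'red notice': 1000}}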

Fastest way to get values from dictionary to another dictionary in python without using if?

I need to find a way to get values from one dictionary into another, based on matching key names, without using two loops / an if statement.
The main goal is to make it run more efficiently, since it's part of a larger codebase and runs on multiple threads.
It would help if you can keep the dictionary structure.
The second dict is initialized with 0 values in advance.
dict_1 = {
    "school": {
        "main": ["first_key"]
    },
    "specific": {
        "students": [
            {"name": "sam", "age": 13},
            {"name": "dan", "age": 9},
            {"name": "david", "age": 20},
            {"name": "mike", "age": 5}
        ],
        "total_students": 4
    }
}
dict_2 = {'sam': 0, 'mike': 0, 'david': 0, 'dan': 0, 'total_students': 0}
for i in dict_1["specific"]['students']:
    for x in dict_2:
        if x == i["name"]:
            dict_2[x] = i["age"]
dict_2["total_students"] = dict_1["specific"]["total_students"]
print(dict_2)
Is there a more elegant way of doing it?
You don't need two loops at all! You don't even need to initialize dict_2 in advance. Simply loop over dict_1["specific"]["students"] and assign the ages to dict_2 without an if.
for student in dict_1["specific"]["students"]:
    student_name = student["name"]
    student_age = student["age"]
    dict_2[student_name] = student_age
You could also write this as a comprehension:
dict_2 = {student["name"]: student["age"] for student in dict_1["specific"]["students"]}
Both these give the following dict_2:
{'sam': 13, 'dan': 9, 'david': 20, 'mike': 5}
Then you can set dict_2["total_students"] like you already do, but outside any loops.
dict_2["total_students"] = dict_1["specific"]["total_students"]
If you only want to assign ages for students already in dict_2, you do need the if. However, you can still do this with a single loop instead of two:
for student in dict_1["specific"]["students"]:
    student_name = student["name"]
    if student_name not in dict_2:
        continue  # skip the rest of this loop
    student_age = student["age"]
    dict_2[student_name] = student_age
Both these approaches use a single loop, so they're going to be faster than your two-loop approach.
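If you want the filtered behaviour without writing the loop out, the same update can also be expressed as a comprehension passed to dict.update (a sketch using the dict_1/dict_2 from the question; the filtering if just moves inside the comprehension):
dict_2.update({
    s["name"]: s["age"]
    for s in dict_1["specific"]["students"]
    if s["name"] in dict_2  # only touch names that were pre-initialized
})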
for i in dict_1['specific']['students']:
    dict_2[i['name']] = i['age']
dict_2['total_students'] = dict_1['specific']['total_students']
This runs in O(N) time and O(N) space. Yours currently runs in O(N^2) time and O(N) space.

Sort subdictionary lists by their dictionary's key values

I have an XML file (~500 lines) that is converted to a nested ordered dictionary in Python. I need it to be sorted by values, but because it is so deeply nested and contains lists, I am lost. I have searched for answers, trying to find a way to sort a dictionary that is mixed with lists, but no luck so far.
This is the closest I have got: Python: Sort nested dictionary by value. But because I have a key with two subkeys, "foo" and "bar", which hold lists that themselves contain dictionaries, it isn't quite what I need.
I would like to sort "foo" and "bar" according to "Date". Or return the subdictionaries ordered by their "Date" values.
I have a loop that iterates through the subdictionary's list, but it does not sort it. I have tried changing it to fix it, with no luck. It also doesn't help that lambda seems like magic to me.
for i in range(len(your_dict['History']['foo'])):
    mine = OrderedDict(sorted(your_dict.items(), key=lambda x: x[1]["foo"][i]['Date']))
Short example of the dictionary at hand:
"History": {
"#version": "4.0",
"foo": [
{"code": "ID", "Date": "2018-07-09T15:31:09+03:00"},
{"Date": "2018-07-09T13:46:09+03:00"}
],
"bar": [
{"code": "ID", "Date": "2018-07-09T09:39:29+03:00"},
{"code": "ID", "Date": "2018-07-09T09:48:25+03:00"}
]
}
So how could "foo" and "bar" be sorted?
>>> s = dict((k,sorted(v, key=lambda x: x['Date'])) for k,v in d['History'].items() if type(v)==list)
>>> s['foo']
[{'Date': '2018-07-09T13:46:09+03:00'}, {'code': 'ID', 'Date': '2018-07-09T15:31:09+03:00'}]
>>> s['bar']
[{'code': 'ID', 'Date': '2018-07-09T09:39:29+03:00'}, {'code': 'ID', 'Date': '2018-07-09T09:48:25+03:00'}]
Explanation
The following list comprehension would return the dict items as key-value pairs
[(k,v) for k,v in d['History'].items()]
This would filter the key-value pairs and include them only when value is of type list
[(k,v) for k,v in d['History'].items() if type(v)==list]
# [('foo', [{'code': 'ID', 'Date': '2018-07-09T15:31:09+03:00'}, {'Date': '2018-07-09T13:46:09+03:00'}]), ('bar', [{'code': 'ID', 'Date': '2018-07-09T09:39:29+03:00'}, {'code': 'ID', 'Date': '2018-07-09T09:48:25+03:00'}])]
All we have to do now is sort each value based on date. sorted(v, key=lambda x: x['Date']) does that.
Merging these in a single line, you get the above mentioned one-liner
Using sorted with lambda
Ex:
d = {"History": {
"#version": "4.0",
"foo": [
{"code": "ID", "Date": "2018-07-09T15:31:09+03:00"},
{"Date": "2018-07-09T13:46:09+03:00"}
],
"bar": [
{"code": "ID", "Date": "2018-07-09T09:39:29+03:00"},
{"code": "ID", "Date": "2018-07-09T09:48:25+03:00"}
]
}
}
print(sorted(d["History"]["foo"], key=lambda k: k['Date']))
print(sorted(d["History"]["bar"], key=lambda k: k['Date']))
Output:
[{'Date': '2018-07-09T13:46:09+03:00'}, {'Date': '2018-07-09T15:31:09+03:00', 'code': 'ID'}]
[{'Date': '2018-07-09T09:39:29+03:00', 'code': 'ID'}, {'Date': '2018-07-09T09:48:25+03:00', 'code': 'ID'}]
The problem is you are trying to get .items, while you just want to sort a list, so there is no need for all that; just do it directly like this:
for i in ('foo', 'bar'):
    your_dict['History'][i] = sorted(your_dict['History'][i], key=lambda x: x['Date'])

How to parse for wildcard values in dictionary inside a list?

How do you filter dictionaries inside a list using wildcard values? I have a large list with numerous dicts and want to pull out the data for all dicts whose value matches a wildcard pattern and dump it into a new list.
For example, I'd like to retrieve the entries whose name matches "Tom*" and dump them into a new list, list2:
list = [
    {"name": "Tom David Smith", "age": 10, "sex": "M"},
    {"name": "Tom Harrison", "age": 5, "sex": "M"},
    {"name": "Pam", "age": 7, "sex": "F"}
]
You can use a list comprehension and check the names with the str.startswith() method:
>>> [d for d in l if d['name'].startswith('Tom')]
[{'age': 10, 'name': 'Tom David Smith', 'sex': 'M'}, {'age': 5, 'name': 'Tom Harrison', 'sex': 'M'}]
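If you need real wildcard patterns (e.g. "*Smith" or "Tom*son") rather than just a prefix, a sketch using fnmatch from the standard library (the pattern variable is just illustrative):
from fnmatch import fnmatch

pattern = "Tom*"
list2 = [d for d in list if fnmatch(d["name"], pattern)]
fnmatch normalizes case the way the operating system does; use fnmatch.fnmatchcase if the match must be case-sensitive.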
