Merging two lists of dictionaries based on a key - Python

dict1 = [{'id': 1.0, 'name': 'aa'},
{'id': 4.0, 'name': 'bb'},
{'id': 2.0, 'name': 'cc'}]
and
dict2 = [{'name': 'aa', 'dtype': 'StringType'},
{'name': 'bb', 'dtype': 'StringType'},
{'name': 'xx', 'dtype': 'StringType'},
{'name': 'cc', 'dtype': 'StringType'}]
I would like to merge these two lists of dictionaries based on their common key, which is name.
This is the desired result:
merged_dict= [{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}]
I was trying to get this using the following for loop:
for i in dict1:
    for j in dict2:
        j.update(i)

To avoid quadratic complexity, it is better to first build a real dictionary (yours are lists of dictionaries), then update:
tmp = {d['name']: d for d in dict2}
for d in dict1:
    d.update(tmp.get(d['name'], {}))
print(dict1)
Output:
[{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}]
Intermediate tmp:
{'aa': {'name': 'aa', 'dtype': 'StringType'},
'bb': {'name': 'bb', 'dtype': 'StringType'},
'xx': {'name': 'xx', 'dtype': 'StringType'},
'cc': {'name': 'cc', 'dtype': 'StringType'}}
If you want a copy (rather than modifying dict1 in place):
tmp = {d['name']: d for d in dict2}
merged_dict = [d|tmp.get(d['name'], {}) for d in dict1]
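Note that the | dictionary-merge operator requires Python 3.9 or later; on earlier versions, the same copy can be built with dict unpacking:
merged_dict = [{**d, **tmp.get(d['name'], {})} for d in dict1]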

You can use pandas and try the following:
import pandas as pd
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
res = df1.merge(df2, on=['name'])
The output:
id name dtype
0 1.0 aa StringType
1 4.0 bb StringType
2 2.0 cc StringType
If you need a list of dictionaries, you can convert the merged DataFrame back with to_dict:
res.to_dict('records')
Final output is:
[
{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}
]
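Note that merge performs an inner join by default, which is why the unmatched 'xx' row from dict2 is dropped. To keep every row of df1 even without a match, use a left join:
res = df1.merge(df2, on='name', how='left')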

Drop a dictionary with a nan value

I have the following dictionary:
my_dict = {'fields': [{'id': 1.0,
'name': 'aaa',
'type': 'string'},
{'id': 3.0,
'name': 'eee',
'type': 'string'},
{'id': nan,
'name': 'bbb',
'type': 'string'},
{'id': 4.0,
'name': 'ccc',
'type': 'string'},
{'id': nan,
'name': 'ddd',
'type': 'string'}],
'type': 'struct'
}
From this dictionary, I would like to drop the inner dictionaries whose id value is nan, and get the following:
my_updated_dict = {'fields': [{'id': 1.0,
'name': 'aaa',
'type': 'string'},
{'id': 3.0,
'name': 'eee',
'type': 'string'},
{'id': 4.0,
'name': 'ccc',
'type': 'string'}],
'type': 'struct'
}
I tried converting to a DataFrame, dropping the rows whose id value is nan, and converting back to a dictionary, but couldn't get the intended result:
my_updated_dict = pd.DataFrame(my_dict).dropna().to_dict('list')
I do not know why you would need pandas for that when you can simply do:
import numpy as np

my_dict["fields"] = [i for i in my_dict["fields"] if not np.isnan(i["id"])]
** UPDATE **
If you really do need to use pandas for some reason, you may try this construction:
my_dict["fields"] = pd.Series(my_dict["fields"]).apply(pd.Series).dropna().to_dict(orient="records")
though I do not see any advantage over the simple list comprehension, except maybe on large volumes of data.
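If you would rather avoid the numpy dependency entirely, math.isnan from the standard library is a drop-in replacement here (a sketch, assuming every id is a float as in the example):
import math

my_dict["fields"] = [i for i in my_dict["fields"] if not math.isnan(i["id"])]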
You can use update() to overwrite the value of the key:
my_dict.update({'fields':[x for x in my_dict['fields'] if np.nan not in x.values()]})
Returning:
{'fields': [{'id': 1.0, 'name': 'aaa', 'type': 'string'},
{'id': 3.0, 'name': 'eee', 'type': 'string'},
{'id': 4.0, 'name': 'ccc', 'type': 'string'}],
'type': 'struct'}
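A caveat worth noting: np.nan not in x.values() only works because the in operator checks object identity before equality, and every nan here is the same np.nan object. A nan produced elsewhere compares unequal to everything, including itself, so the membership test would miss it:
>>> float('nan') == float('nan')
False
>>> np.nan in [np.nan]        # same object, identity check succeeds
True
>>> float('nan') in [np.nan]  # different object, equality check fails
False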
Consider the dictionary json:
import numpy as np
json = {'fields': [{'id': 1.0, 'name': 'aaa', 'type': 'string'},
{'id': 3.0, 'name': 'eee', 'type': 'string'},
{'id': np.nan, 'name': 'bbb', 'type': 'string'},
{'id': 4.0, 'name': 'ccc', 'type': 'string'},
{'id': np.nan, 'name': 'ddd', 'type': 'string'}],
'type': 'struct'}
In order to remove the entries where id is np.nan, one can use a list comprehension with numpy.isnan as follows:
json['fields'] = [x for x in json['fields'] if not np.isnan(x['id'])]
[Out]:
{'fields': [{'id': 1.0, 'name': 'aaa', 'type': 'string'},
{'id': 3.0, 'name': 'eee', 'type': 'string'},
{'id': 4.0, 'name': 'ccc', 'type': 'string'}],
'type': 'struct'}

Remove duplicates from a list of dicts

I have a list of dicts like this:
[{'ID': 'a', 'Number': 2}, {'ID': 'b', 'Number': 5} , {'ID': 'a', 'Number': 6}, {'ID': 'a', 'Number': 8}, {'ID': 'c', 'Number': 3}]
I want to remove the dicts that share the same ID and keep only the one with the smallest Number. The expected result should be:
[{'ID': 'a', 'Number': 2}, {'Id': 'b', 'Number': 5}, {'ID': 'c', 'Number': 3}]
The most efficient solution is to use a temporary lookup dictionary, keyed by ID, whose values are the dicts holding the lowest Number seen so far for each ID.
l = [{'ID': 'a', 'Number': 2},
{'ID': 'b', 'Number': 5}, # note that I corrected a typo Id --> ID
{'ID': 'a', 'Number': 6},
{'ID': 'a', 'Number': 8},
{'ID': 'c', 'Number': 3}]
lookup_dict = {}
for d in l:
    if d['ID'] not in lookup_dict or d['Number'] < lookup_dict[d['ID']]['Number']:
        lookup_dict[d['ID']] = d
output = list(lookup_dict.values())
which gives output as:
[{'ID': 'a', 'Number': 2}, {'ID': 'b', 'Number': 5}, {'ID': 'c', 'Number': 3}]
A piece of advice: given your final data structure, I wonder if you may be better off now representing this final data as a dictionary - with the IDs as keys since these are now unique. This would allow for more convenient data access.
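A minimal sketch of that suggestion: since lookup_dict is already keyed by ID, you can keep it directly, or reduce it to a plain ID-to-Number mapping:
# lookup_dict already maps each ID to its winning dict
smallest = {ID: d['Number'] for ID, d in lookup_dict.items()}
# {'a': 2, 'b': 5, 'c': 3}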

Using Glom on a nested structure, how do I move top-level dictionary fields into a list of dictionaries?

This is a question about the usage of Glom (https://github.com/mahmoud/glom/)
I have a dictionary that includes a list of other dictionaries.
{'date': '2020-01-01',
'location': 'A',
'items': [
{'name': 'A', 'id': 'A1'},
{'name': 'B', 'id': 'B1'},
{'name': 'C', 'id': 'C1'}
]}
I would like to use Glom to move the outer, global dictionary fields 'date' and 'location' into the list of item dictionaries.
This is the end result I am trying to reach:
[
{'name': 'A', 'id': 'A1', 'date': '2020-01-01', 'location': 'A'},
{'name': 'B', 'id': 'B1', 'date': '2020-01-01', 'location': 'A'},
{'name': 'C', 'id': 'C1', 'date': '2020-01-01', 'location': 'A'}
]
Alas, when the spec arrives at the items of the dictionary, the other values are no longer accessible and the T object is bound to the inner value instead.
from glom import glom, T
def update_dict(x, other_dict):
    x.update({'date': other_dict['date'], 'location': other_dict['location']})
    return x.copy()
spec = (T, 'items', [(lambda x: update_dict(x, T()))])
data = {'date': '2020-01-01',
'location': 'A',
'items': [{'name': 'A', 'id': 'A1'},
{'name': 'B', 'id': 'B1'},
{'name': 'C', 'id': 'C1'}]}
glom(data, spec) # print this
returns
[{'name': 'A', 'id': 'A1', 'date': T()['date'], 'location': T()['location']},
{'name': 'B', 'id': 'B1', 'date': T()['date'], 'location': T()['location']},
{'name': 'C', 'id': 'C1', 'date': T()['date'], 'location': T()['location']}]
Which is useless.
It's not difficult to update the dictionaries with regular Python code, but
is there a way to do this within a Glom spec?
The trick is to pass the target as the global scope as well; this way, the Assign command can access the full target.
from glom import S, glom, Assign, Spec
spec = ('items',
        [Assign('date', Spec(S['date']))],
        [Assign('location', Spec(S['location']))])
target = {'date': '2020-04-01',
'location': 'A',
'items': [
{'name': 'A', 'id': 'A1'},
{'name': 'B', 'id': 'B1'},
{'name': 'C', 'id': 'C1'}
]}
glom(target, spec, scope=target)
Results in
[{'name': 'A', 'id': 'A1', 'date': '2020-04-01', 'location': 'A'},
{'name': 'B', 'id': 'B1', 'date': '2020-04-01', 'location': 'A'},
{'name': 'C', 'id': 'C1', 'date': '2020-04-01', 'location': 'A'}]
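For comparison, the plain-Python equivalent the question alludes to is a short comprehension (a sketch, not using glom):
merged = [{**item, 'date': target['date'], 'location': target['location']}
          for item in target['items']]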

How do I merge multiple dictionaries' values having the same key in Python?

I have n dicts like this:
dict_1 = {1: {'Name': 'xyz', 'Title': 'Engineer'},
          2: {'Name': 'abc', 'Title': 'Software'}}
dict_2 = {1: {'Education': 'abc'}, 2: {'Education': 'xyz'}}
dict_3 = {1: {'Experience': 2}, 2: {'Experience': 3}}
.
.
.
dict_n
I just want to combine all of them based on the main key, like this:
final_dict = {1: {'Name': 'xyz', 'Title': 'Engineer',
                  'Education': 'abc', 'Experience': 2},
              2: {'Name': 'abc', 'Title': 'Software',
                  'Education': 'xyz', 'Experience': 3}}
Can anybody help me achieve this?
From your question, I understand you have n dicts. So make a list of your dicts and combine all the values sharing the same main key. That by itself won't give the exact answer: the combined values are lists of small dicts. So the second step is to merge each of those lists into a single dict.
Here is my code:
d1 = {1: {'Name': 'xyz', 'Title': 'Engineer'},
      2: {'Name': 'abc', 'Title': 'Software'}}
d2 = {1: {'Education': 'abc'}, 2: {'Education': 'xyz'}}
d3 = {1: {'Experience': 2}, 2: {'Experience': 3}}
ds = [d1, d2, d3]  # the list of your dicts (dict_1 ... dict_n in your naming)
big_dict = {}
for k in ds[0]:
    big_dict[k] = [d[k] for d in ds]
for k in big_dict.keys():
    result = {}
    for d in big_dict[k]:
        result.update(d)
    big_dict[k] = result
print(big_dict)
It gives output like this:
{1: {'Education': 'abc', 'Title': 'Engineer', 'Name': 'xyz', 'Experience': 2},
 2: {'Education': 'xyz', 'Title': 'Software', 'Name': 'abc', 'Experience': 3}}
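A more compact sketch of the same merge, assuming every dict shares the same outer keys as in the example:
final_dict = {k: {field: value for d in ds for field, value in d[k].items()}
              for k in ds[0]}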

What Is a Pythonic Way to Build a Dict of Dictionary-Lists by Attribute?

I'm looking for a Pythonic way to convert a list of dicts which looks like this:
res = [{'type': 1, 'name': 'Nick'}, {'type': 2, 'name': 'Helma'}, ...]
To dict like this:
{1: [{'type': 1, 'name': 'Nick'}, ...], 2: [{'type': 2, 'name': 'Helma'}, ...]}
Now I do this with code like this (based on this question):
from collections import defaultdict

d = defaultdict(list)
for v in res:
    d[v["type"]].append(v)
Is this a Pythonic way to build dict of lists of objects by attribute?
I agree with the commentators that here, list comprehension will lack, well, comprehension.
Having said that, here's how it can go:
import itertools
a = [{'type': 1, 'name': 'Nick'}, {'type': 2, 'name': 'Helma'}, {'type': 1, 'name': 'Moshe'}]
by_type = lambda d: d['type']
>>> dict([(k, list(g)) for (k, g) in itertools.groupby(sorted(a, key=by_type), key=by_type)])
{1: [{'name': 'Nick', 'type': 1}, {'name': 'Moshe', 'type': 1}], ...}
The code first sorts by 'type', then uses itertools.groupby to group by the exact same critera.
I stopped understanding this code 15 seconds after I finished writing it :-)
You could do it with a dictionary comprehension, which wouldn't be as illegible or incomprehensible as the comments suggest (IMHO):
# A collection of name and type dictionaries
res = [{'type': 1, 'name': 'Nick'},
{'type': 2, 'name': 'Helma'},
{'type': 3, 'name': 'Steve'},
{'type': 1, 'name': 'Billy'},
{'type': 3, 'name': 'George'},
{'type': 4, 'name': 'Sylvie'},
{'type': 2, 'name': 'Wilfred'},
{'type': 1, 'name': 'Jim'}]
# Creating a dictionary by type
res_new = {
    item['type']: [each for each in res
                   if each['type'] == item['type']]
    for item in res
}
>>> res_new
{1: [{'name': 'Nick', 'type': 1},
{'name': 'Billy', 'type': 1},
{'name': 'Jim', 'type': 1}],
2: [{'name': 'Helma', 'type': 2},
{'name': 'Wilfred', 'type': 2}],
3: [{'name': 'Steve', 'type': 3},
{'name': 'George', 'type': 3}],
4: [{'name': 'Sylvie', 'type': 4}]}
Unless I missed something, this should give you the result you're looking for.
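Note that the nested comprehension rescans res once per item, so it is quadratic; the defaultdict approach from the question does the grouping in a single pass. A stdlib sketch without the import uses dict.setdefault:
res_new = {}
for item in res:
    res_new.setdefault(item['type'], []).append(item)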
