dict1 = [{'id': 1.0, 'name': 'aa'},
{'id': 4.0, 'name': 'bb'},
{'id': 2.0, 'name': 'cc'}]
and
dict2 = [{'name': 'aa', 'dtype': 'StringType'},
{'name': 'bb', 'dtype': 'StringType'},
{'name': 'xx', 'dtype': 'StringType'},
{'name': 'cc', 'dtype': 'StringType'}]
I would like to merge these two lists of dictionaries on their common key, name.
I would like to get the following result.
merged_dict= [{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}]
I was trying to get this using the following for loop.
for i in dict1:
    for j in dict2:
        j.update(i)
To avoid quadratic complexity, first build a real dictionary keyed by name (yours are lists of dictionaries), then update:
tmp = {d['name']: d for d in dict2}
for d in dict1:
    d.update(tmp.get(d['name'], {}))
print(dict1)
Output:
[{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}]
Intermediate tmp:
{'aa': {'name': 'aa', 'dtype': 'StringType'},
'bb': {'name': 'bb', 'dtype': 'StringType'},
'xx': {'name': 'xx', 'dtype': 'StringType'},
'cc': {'name': 'cc', 'dtype': 'StringType'}}
If you want a copy (rather than modifying dict1 in place); note that the | operator on dicts needs Python 3.9 or later:
tmp = {d['name']: d for d in dict2}
merged_dict = [d|tmp.get(d['name'], {}) for d in dict1]
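On interpreters older than Python 3.9 the same copy can be built with dict unpacking. This is a minimal sketch that reuses the sample data from the question; it is an alternative spelling of the line above, not a different algorithm.

```python
dict1 = [{'id': 1.0, 'name': 'aa'},
         {'id': 4.0, 'name': 'bb'},
         {'id': 2.0, 'name': 'cc'}]
dict2 = [{'name': 'aa', 'dtype': 'StringType'},
         {'name': 'bb', 'dtype': 'StringType'},
         {'name': 'xx', 'dtype': 'StringType'},
         {'name': 'cc', 'dtype': 'StringType'}]

# Index dict2 by name so each lookup is O(1).
tmp = {d['name']: d for d in dict2}

# {**a, **b} builds a new dict; on a key clash the value from b wins.
merged_dict = [{**d, **tmp.get(d['name'], {})} for d in dict1]

print(merged_dict)
```

Because every element is a fresh dict, dict1 and dict2 are left unchanged.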
You can use pandas and try the following (merge defaults to an inner join, so only names present in both lists are kept):
import pandas as pd
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
res = df1.merge(df2, on=['name'])
The output:
id name dtype
0 1.0 aa StringType
1 4.0 bb StringType
2 2.0 cc StringType
If you need a list of dictionaries, you can convert the merged DataFrame with to_dict:
res.to_dict('records')
Final output is:
[
{'id': 1.0, 'name': 'aa', 'dtype': 'StringType'},
{'id': 4.0, 'name': 'bb', 'dtype': 'StringType'},
{'id': 2.0, 'name': 'cc', 'dtype': 'StringType'}
]
I have a data structure. It looks as follows:
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-A', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-B', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-D', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3C', 'name': 'grandChild-E', 'steps': 2},
{'id': '4A', 'name': 'final', 'steps': 3}
],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2', 'name': 'child', 'steps': 1},
]
]
My expected output is:
output = {
"1" : {
"2A": {
"3A": "grandChild-A",
"3B": "grandChild-B"
},
"2B": {
"3A": "grandChild-C",
"3B": "grandChild-D",
"3C": {
"4A": "final"
}
},
"2":"child"
}
}
How can I do that? I tried using enumerate, but I always end up with everything directly inside 1.
Thanks in advance
Update:
I have tried the following code:
parent = data[0][0]["id"]
dict_new = {}
dict_new[parent] = {}
for e in data:
    for idx, item in enumerate(e):
        display(item)
        if idx>0:
            dict_new[parent][e[idx]["id"]] = e[idx]["name"]
You can try:
d = {}
root = d
for L in data:
    d = root
    for M in L[:-1]:
        d = d.setdefault(M["id"], {})
    d[L[-1]["id"]] = L[-1]['name']
The idea is to follow each list to build a tree (hence d.setdefault(M["id"], {})). The leaf is handled differently, because its value must be the 'name' string rather than a nested dict.
from pprint import pprint
pprint(root)
Output:
{'1': {'2': 'child',
'2A': {'3A': 'grandChild-A', '3B': 'grandChild-B'},
'2B': {'3A': 'grandChild-C',
'3B': 'grandChild-D',
'3C': {'4A': 'final'}}}}
The solution above fails for the following input:
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
         {'id': '2B', 'name': 'child', 'steps': 1}],
        [{'id': '1', 'name': 'parent', 'steps': 0},
         {'id': '2B', 'name': 'child', 'steps': 1},
         {'id': '3A', 'name': 'grandChild-C', 'steps': 2}]]
Iterating over the second list will try to add a new element 3A -> grandChild-C to the d['1']['2B'] dict. But d['1']['2B'] is not a dict but the 'child' string here, because of the first list.
When we iterate over the elements, we check if the key is already mapped and otherwise create a new dict (that's the setdefault job). We can also check if the key was mapped to a str, and if that's the case, replace the string by a fresh new dict:
...
    for M in L[:-1]:
        if M["id"] not in d or isinstance(d[M["id"]], str):
            d[M["id"]] = {}
        d = d[M["id"]]
...
Output:
{'1': {'2B': {'3A': 'grandChild-C'}}}
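Putting the pieces together, the tree building could be wrapped in one function, as in the sketch below. The name build_tree is hypothetical. The extra guard on the leaf is an assumption added here, not part of the answers above: it keeps a short path from overwriting a branch that a longer path already created, so the result does not depend on the order of the lists.

```python
def build_tree(paths):
    """Nest the ids of each path; the last element becomes a leaf holding its name."""
    root = {}
    for path in paths:
        node = root
        for item in path[:-1]:
            child = node.get(item["id"])
            if not isinstance(child, dict):
                # Key is missing, or was stored earlier as a leaf string:
                # turn it into a branch.
                child = node[item["id"]] = {}
            node = child
        leaf = path[-1]
        # Assumption: never replace an existing branch with a leaf string.
        if not isinstance(node.get(leaf["id"]), dict):
            node[leaf["id"]] = leaf["name"]
    return root


data = [
    [{'id': '1', 'name': 'parent', 'steps': 0},
     {'id': '2B', 'name': 'child', 'steps': 1}],
    [{'id': '1', 'name': 'parent', 'steps': 0},
     {'id': '2B', 'name': 'child', 'steps': 1},
     {'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
]

print(build_tree(data))        # short path first
print(build_tree(data[::-1]))  # long path first
```

Both calls should print the same tree, since a leaf is promoted to a branch when needed and a branch is never demoted.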
I fixed your data: (missing comma)
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-A', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-B', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-D', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3C', 'name': 'grandChild-E', 'steps': 2},
{'id': '4A', 'name': 'final', 'steps': 3}
],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2', 'name': 'child', 'steps': 1},
]
]
And I came up with this code:
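output = {}
#print(data)
for lis in data:
    o = output
    ln = len(lis) - 1
    for idx, d in enumerate(lis):
        node_id = d['id']  # renamed from id to avoid shadowing the builtin
        if idx == ln:
            # last element of the path: store its name as the leaf
            o[node_id] = d['name']
        else:
            if node_id not in o:
                o[node_id] = {}
            o = o[node_id]
print('Result:')
print(output)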
I have a list as follows:
data = [
{'items': [
{'key': u'3', 'id': 1, 'name': u'Typeplaatje'},
{'key': u'2', 'id': 2, 'name': u'Aanduiding van het chassisnummer '},
{'key': u'1', 'id': 3, 'name': u'Kilometerteller: Kilometerstand '},
{'key': u'5', 'id': 4, 'name': u'Inschrijvingsbewijs '},
{'key': u'4', 'id': 5, 'name': u'COC of gelijkvormigheidsattest '}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'},
{'items': [
{'key': u'10', 'id': 10, 'name': u'Koppeling'},
{'key': u'7', 'id': 11, 'name': u'Differentieel '},
{'key': u'9', 'id': 12, 'name': u'Cardanhoezen '},
{'key': u'8', 'id': 13, 'name': u'Uitlaat '},
{'key': u'6', 'id': 15, 'name': u'Batterij'}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'}
]
And I want to sort items by key.
Thus the wanted result is as follows:
res = [
{'items': [
{'key': u'1', 'id': 3, 'name': u'Kilometerteller: Kilometerstand '},
{'key': u'2', 'id': 2, 'name': u'Aanduiding van het chassisnummer '},
{'key': u'3', 'id': 1, 'name': u'Typeplaatje'},
{'key': u'4', 'id': 5, 'name': u'COC of gelijkvormigheidsattest '},
{'key': u'5', 'id': 4, 'name': u'Inschrijvingsbewijs '},
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'},
{'items': [
{'key': u'6', 'id': 15, 'name': u'Batterij'},
{'key': u'7', 'id': 11, 'name': u'Differentieel '},
{'key': u'8', 'id': 13, 'name': u'Uitlaat '},
{'key': u'9', 'id': 12, 'name': u'Cardanhoezen '},
{'key': u'10', 'id': 10, 'name': u'Koppeling'}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'}
]
I've tried as follows:
res = []
for item in data:
    new_data = {
        'id': item['id'],
        'key': item['key'],
        'name': item['name'],
        'items': sorted(item['items'], key=lambda k : k['key'])
    }
    res.append(new_data)
print(res)
The first one is sorted fine, but the second one is not.
What am I doing wrong and is there a better way of doing it?
Your sort is wrong in the second case because the keys are strings, and strings are compared character by character, so '10' sorts before '6' because '1' < '6'. A slight modification to your sort key does the trick:
'items': sorted(item['items'], key=lambda k : int(k['key']))
I convert with int because you want to sort the keys as numbers. Here it is in your code:
res = []
for item in data:
    new_data = {
        'id': item['id'],
        'key': item['key'],
        'name': item['name'],
        'items': sorted(item['items'], key=lambda k : int(k['key']) )
    }
    res.append(new_data)
print(res)
And here's the result:
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
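The difference between the two orderings is easy to see on the bare key strings. A minimal sketch, using the keys of the second group from the question:

```python
keys = ['10', '7', '9', '8', '6']

# Default string comparison is lexicographic: '10' < '6' because '1' < '6'.
as_text = sorted(keys)

# Converting each key with int compares the numeric values instead.
as_numbers = sorted(keys, key=int)

print(as_text)
print(as_numbers)
```

Only the comparison changes; the elements stay strings in both results.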
You need to replace the old items in data with the items sorted numerically by key rather than as strings. So use int(x['key']) as the sort key, like this:
>>> data
[{'items': [{'key': '1', 'id': 3, 'name': 'Kilometerteller: Kilometerstand '}, {'key': '2', 'id': 2, 'name': 'Aanduiding van het chassisnummer '}, {'key': '3', 'id': 1, 'name': 'Typeplaatje'}, {'key': '4', 'id': 5, 'name': 'COC of gelijkvormigheidsattest '}, {'key': '5', 'id': 4, 'name': 'Inschrijvingsbewijs '}], 'id': 2, 'key': 'B', 'name': 'Onderdelen'}, {'items': [{'key': '6', 'id': 15, 'name': 'Batterij'}, {'key': '7', 'id': 11, 'name': 'Differentieel '}, {'key': '8', 'id': 13, 'name': 'Uitlaat '}, {'key': '9', 'id': 12, 'name': 'Cardanhoezen '}, {'key': '10', 'id': 10, 'name': 'Koppeling'}], 'id': 2, 'key': 'B', 'name': 'Onderdelen'}]
>>>
>>> for item in data:
...     item['items'] = sorted(item['items'], key=lambda x: int(x['key']))
...
>>> import pprint
>>> pprint.pprint(data)
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
A list comes with a handy method called sort, which sorts the list in place. I'd use that to your advantage:
for d in data:
    d['items'].sort(key=lambda x: int(x['key']))
Results:
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
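One thing to keep in mind when switching between the two spellings: sorted() returns a new list, while list.sort() reorders the existing list and returns None. A small sketch with made-up items (only the key field is kept, for brevity):

```python
items = [{'key': '10'}, {'key': '7'}, {'key': '9'}]

# sorted() builds a new list; items keeps its original order.
copy_sorted = sorted(items, key=lambda x: int(x['key']))
order_before = [i['key'] for i in items]

# list.sort() mutates items and returns None, so do not assign its result.
returned = items.sort(key=lambda x: int(x['key']))
order_after = [i['key'] for i in items]

print(order_before, order_after, returned)
```

Writing d['items'] = d['items'].sort(...) would therefore store None, which is a common slip.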
In the following example, I would like to sort the animals by the alphabetical order of their category, which is stored in a separate list of dictionaries.
category = [{'uid': 0, 'name': 'mammals'},
{'uid': 1, 'name': 'birds'},
{'uid': 2, 'name': 'fish'},
{'uid': 3, 'name': 'reptiles'},
{'uid': 4, 'name': 'invertebrates'},
{'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'horse', 'category': 0},
{'name': 'whale', 'category': 2},
{'name': 'mollusk', 'category': 4},
{'name': 'tuna ', 'category': 2},
{'name': 'worms', 'category': 4},
{'name': 'frog', 'category': 5},
{'name': 'dog', 'category': 0},
{'name': 'salamander', 'category': 5},
{'name': 'horse', 'category': 0},
{'name': 'octopus', 'category': 4},
{'name': 'alligator', 'category': 3},
{'name': 'monkey', 'category': 0},
{'name': 'kangaroos', 'category': 0},
{'name': 'salmon', 'category': 2}]
sorted_animals = sorted(animals, key=lambda k: k['category'])
How could I achieve this?
Thanks.
You are now sorting on the category id. All you need to do is map that id to a lookup for a given category name.
Create a dictionary for the categories first so you can directly map the numeric id to the associated name from the category list, then use that mapping when sorting:
catuid_to_name = {c['uid']: c['name'] for c in category}
sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
Demo:
>>> from pprint import pprint
>>> category = [{'uid': 0, 'name': 'mammals'},
... {'uid': 1, 'name': 'birds'},
... {'uid': 2, 'name': 'fish'},
... {'uid': 3, 'name': 'reptiles'},
... {'uid': 4, 'name': 'invertebrates'},
... {'uid': 5, 'name': 'amphibians'}]
>>> animals = [{'name': 'horse', 'category': 0},
... {'name': 'whale', 'category': 2},
... {'name': 'mollusk', 'category': 4},
... {'name': 'tuna ', 'category': 2},
... {'name': 'worms', 'category': 4},
... {'name': 'frog', 'category': 5},
... {'name': 'dog', 'category': 0},
... {'name': 'salamander', 'category': 5},
... {'name': 'horse', 'category': 0},
... {'name': 'octopus', 'category': 4},
... {'name': 'alligator', 'category': 3},
... {'name': 'monkey', 'category': 0},
... {'name': 'kangaroos', 'category': 0},
... {'name': 'salmon', 'category': 2}]
>>> catuid_to_name = {c['uid']: c['name'] for c in category}
>>> pprint(catuid_to_name)
{0: 'mammals',
1: 'birds',
2: 'fish',
3: 'reptiles',
4: 'invertebrates',
5: 'amphibians'}
>>> sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
>>> pprint(sorted_animals)
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'whale'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'salmon'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'worms'},
{'category': 4, 'name': 'octopus'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'monkey'},
{'category': 0, 'name': 'kangaroos'},
{'category': 3, 'name': 'alligator'}]
Note that within each category, the dictionaries have been left in relative input order. You could return a tuple of values from the sorting key to further apply a sorting order within each category, e.g.:
sorted_animals = sorted(
animals,
key=lambda k: (catuid_to_name[k['category']], k['name'])
)
would sort by animal name within each category, producing:
>>> pprint(sorted(animals, key=lambda k: (catuid_to_name[k['category']], k['name'])))
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'salmon'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'whale'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'octopus'},
{'category': 4, 'name': 'worms'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'kangaroos'},
{'category': 0, 'name': 'monkey'},
{'category': 3, 'name': 'alligator'}]
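Because Python's sort is stable, the same ordering can also be reached with two passes: sort by the secondary key first, then by the primary key. This is a sketch on a shortened, made-up subset of the data; it can be handy when the two keys need different directions (one pass with reverse=True).

```python
category = [{'uid': 0, 'name': 'mammals'},
            {'uid': 2, 'name': 'fish'},
            {'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'whale', 'category': 2},
           {'name': 'frog', 'category': 5},
           {'name': 'dog', 'category': 0},
           {'name': 'salmon', 'category': 2},
           {'name': 'horse', 'category': 0}]

catuid_to_name = {c['uid']: c['name'] for c in category}

# One pass with a tuple key: category name first, then animal name.
by_tuple = sorted(animals, key=lambda k: (catuid_to_name[k['category']], k['name']))

# Two passes: secondary key first, primary key last. Stability keeps the
# name order intact inside each category.
two_pass = sorted(animals, key=lambda k: k['name'])
two_pass.sort(key=lambda k: catuid_to_name[k['category']])

print([a['name'] for a in two_pass])
```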
imo your category structure is far too complicated - at least as long as the uid is nothing but the index, you could simply use a list for that:
category = [c['name'] for c in category]
# ['mammals', 'birds', 'fish', 'reptiles', 'invertebrates', 'amphibians']
sorted_animals = sorted(animals, key=lambda k: category[k['category']])
#[{'name': 'frog', 'category': 5}, {'name': 'salamander', 'category': 5}, {'name': 'whale', 'category': 2}, {'name': 'tuna ', 'category': 2}, {'name': 'salmon', 'category': 2}, {'name': 'mollusk', 'category': 4}, {'name': 'worms', 'category': 4}, {'name': 'octopus', 'category': 4}, {'name': 'horse', 'category': 0}, {'name': 'dog', 'category': 0}, {'name': 'horse', 'category': 0}, {'name': 'monkey', 'category': 0}, {'name': 'kangaroos', 'category': 0}, {'name': 'alligator', 'category': 3}]
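The list shortcut only holds while every uid equals its position in the list. A hedged sketch of how that assumption could be checked before relying on it; the variable names uid_is_index and names are made up for this example:

```python
category = [{'uid': 0, 'name': 'mammals'},
            {'uid': 1, 'name': 'birds'},
            {'uid': 2, 'name': 'fish'},
            {'uid': 3, 'name': 'reptiles'},
            {'uid': 4, 'name': 'invertebrates'},
            {'uid': 5, 'name': 'amphibians'}]

# True only if the uids are exactly 0, 1, 2, ... in list order.
uid_is_index = all(c['uid'] == i for i, c in enumerate(category))

if uid_is_index:
    names = [c['name'] for c in category]            # index by position
else:
    names = {c['uid']: c['name'] for c in category}  # fall back to a dict

# Both containers support names[uid], so the sort key is the same either way.
print(uid_is_index, names[5])
```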
I have seen Python: remove dictionary from list and Splitting a list of dictionaries into several lists of dictionaries - but this question is slightly different.
Consider this working example (same in Python 2 or 3):
#!/usr/bin/env python
from __future__ import print_function
origarr = [
{ 'name': 'test01', 'type': 0, 'value': 42 },
{ 'name': 'test02', 'type': 0, 'value': 142 },
{ 'name': 'test03', 'type': 2, 'value': 242 },
{ 'name': 'test04', 'type': 2, 'value': 342 },
{ 'name': 'test05', 'type': 3, 'value': 42 },
]
print("origarr: {}".format(origarr))
lastdictelem = origarr.pop()
print("\nlastdictelem: {}".format(lastdictelem))
print("after pop, origarr: {}".format(origarr))
namestofilter = [ 'test01', 'test02' ]
newarr = []
for iname in namestofilter:
    # find the object having the name iname
    foundidx = -1
    for ix, idict in enumerate(origarr):
        if idict.get('name') == iname:
            foundidx = ix
            break
    if foundidx > -1:
        # remove dict object via pop at index, save removed object
        remdict = origarr.pop(foundidx)
        # add removed object to newarr:
        newarr.append(remdict)
print("\nafter namestofilter:")
print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
Basically, mylist.pop() removes the last element from mylist as an object (here a dict), and returns it - then I can trivially insert it in a new array/list; this is illustrated by the first printout of this script:
$ python2 test.py
origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}, {'name': 'test05', 'type': 3, 'value': 42}]
lastdictelem: {'name': 'test05', 'type': 3, 'value': 42}
after pop, origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
Now, what I would like to do, is define an array with values for the name key in a dict (say, namestofilter = [ 'test01', 'test02' ]), and have those dicts removed from the original array/list, and put into a new array/list (as .pop() would do with a single element and an object reference).
Since pop removes the item at a specific index and returns it, the above code does exactly that - and works:
...
after namestofilter:
newarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}]
origarr: [{'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
... but I was wondering - is there a more compact way of doing that, other than "manually" for-looping through the two arrays, and calling .pop()/.append() individually (as done in the example)?
I'm not sure there is a way to make it more compact, probably not.
But you can simplify the code a little and also avoid spending O(n) on each .pop:
origarr = [
{ 'name': 'test01', 'type': 0, 'value': 42 },
{ 'name': 'test02', 'type': 0, 'value': 142 },
{ 'name': 'test03', 'type': 2, 'value': 242 },
{ 'name': 'test04', 'type': 2, 'value': 342 },
{ 'name': 'test05', 'type': 3, 'value': 42 },
]
namestofilter = set([ 'test01', 'test02' ])  # could be a list as in question
print("origarr: {}".format(origarr))
lastdictelem = origarr.pop()
print("\nlastdictelem: {}".format(lastdictelem))
print("after pop, origarr: {}".format(origarr))
shift = 0
newarr = []
for ix, idict in enumerate(origarr):
    if idict['name'] in namestofilter:
        shift += 1
        newarr.append(idict)
        continue
    origarr[ix-shift] = origarr[ix]
# drop the leftover tail in place; unlike origarr[:-shift] this is also correct when shift == 0
del origarr[len(origarr)-shift:]
print("\nafter namestofilter:")
print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
Output:
origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}, {'name': 'test05', 'type': 3, 'value': 42}]
lastdictelem: {'name': 'test05', 'type': 3, 'value': 42}
after pop, origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
after namestofilter:
newarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}]
origarr: [{'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
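If compactness matters more than doing a single pass, the split can also be written as two list comprehensions. This is a sketch under one assumption that differs from the question's loop: newarr ends up in the order of origarr, not in the order of namestofilter.

```python
origarr = [
    {'name': 'test01', 'type': 0, 'value': 42},
    {'name': 'test02', 'type': 0, 'value': 142},
    {'name': 'test03', 'type': 2, 'value': 242},
    {'name': 'test04', 'type': 2, 'value': 342},
]
namestofilter = {'test01', 'test02'}  # a set gives O(1) membership tests

# The dicts to move out; these are the same objects, not copies.
newarr = [d for d in origarr if d.get('name') in namestofilter]

# Slice assignment replaces the contents of the existing list, so any
# other reference to origarr sees the filtered result too.
origarr[:] = [d for d in origarr if d.get('name') not in namestofilter]

print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
```

This walks the list twice, which is still O(n) overall and avoids the O(n) cost of each .pop(index).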