Using pandas to add list of dicts together - python

This is a follow-up to this question: Using pandas to add list elements together. I would like to generalize this function to getting unique elements in an array, even if they're not of a 'hashable' type, such as a dict. Here is the input array:
items = [
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'X', 'Age': 22}, {'Place': 'Y', 'Age': 23}]
},
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Place': 'Z', 'Age': 20}]
}
]
I want to combine the unique Residences (dicts) for each person, so the final result would be:
items = [
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'X', 'Age': 22}, {'Place': 'Y', 'Age': 23}, {'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Place': 'Z', 'Age': 20}]
}
]
The SQL I would use would be something like this:
SELECT FirstName, LastName, GROUP_CONCAT(DISTINCT **Residence Object**)
FROM items
GROUP BY FirstName, LastName
How would I do this in pandas, so that I don't get an unhashable type error when trying to get the distinct array elements?

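Barring anything else, I don't think pandas would give you any real benefit here. A plain loop can group by name and skip residences that were already collected:
from collections import defaultdict
d = defaultdict(list)
for e in items:
    residences = d[(e['FirstName'], e['LastName'])]
    for r in e['Residence']:
        # dicts are unhashable, so test membership by equality instead of using a set
        if r not in residences:
            residences.append(r)
items = [{'FirstName': k[0], 'LastName': k[1], 'Residence': v} for k, v in d.items()]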

Solution from pandas:
import pandas as pd
df = pd.DataFrame(items)
df.groupby(['FirstName','LastName']).Residence.\
apply(lambda x : x.sum()).\
apply(lambda x : [dict(y) for y in set(tuple(t.items()) for t in x)]).\
reset_index().to_dict('records')
Out[104]:
[{'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Age': 20, 'Place': 'Z'}]},
{'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Age': 20, 'Place': 'Z'},
{'Age': 23, 'Place': 'Y'},
{'Age': 22, 'Place': 'X'}]}]
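The set-of-tuples step above relies on every value inside a residence dict being hashable, and a set does not keep the original order (note how the output lists Z first). A minimal order-preserving sketch without pandas follows. `merge_residences` is a hypothetical helper name, and it assumes the values inside each residence dict are hashable (strings, numbers).

```python
def merge_residences(items):
    """Group records by (FirstName, LastName) and keep each distinct residence once."""
    grouped = {}  # (first, last) -> {hashable key: residence dict}, insertion-ordered
    for item in items:
        person = (item['FirstName'], item['LastName'])
        seen = grouped.setdefault(person, {})
        for residence in item['Residence']:
            # A dict is unhashable, so use a sorted tuple of its items as the key.
            key = tuple(sorted(residence.items()))
            seen.setdefault(key, residence)  # keeps the first occurrence only
    return [
        {'FirstName': first, 'LastName': last, 'Residence': list(seen.values())}
        for (first, last), seen in grouped.items()
    ]


items = [
    {'FirstName': 'David', 'LastName': 'Smith',
     'Residence': [{'Place': 'X', 'Age': 22}, {'Place': 'Y', 'Age': 23}]},
    {'FirstName': 'David', 'LastName': 'Smith',
     'Residence': [{'Place': 'Z', 'Age': 20}]},
    {'FirstName': 'David', 'LastName': 'Smith',
     'Residence': [{'Place': 'Z', 'Age': 20}]},
    {'FirstName': 'Bob', 'LastName': 'Jones',
     'Residence': [{'Place': 'Z', 'Age': 20}]},
]
merged = merge_residences(items)
print(merged)
```

Sorting the items makes two dicts with the same content but different key order count as the same residence, which the plain `tuple(t.items())` key does not guarantee.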

Related

How to get key and value instead of only value when filtering with JMESPath?

Input data:
s = {'111': {'name': 'john', 'exp': '1'}, '222': {'name': 'mia', 'exp': '1'}}
Code:
import jmespath
jmespath.search("(*)[?name=='john']", s)
Output:
[{'name': 'john', 'exp': '1'}]
Output I want:
[{'111': {'name': 'john', 'exp': '1'}}]
Convert the dictionary to the list
l1 = [{'key': k, 'value': v} for k, v in s.items()]
gives
[{'key': '111', 'value': {'name': 'john', 'exp': '1'}}, {'key': '222', 'value': {'name': 'mia', 'exp': '1'}}]
Select the values where the attribute name is john
l2 = jmespath.search('[?value.name == `john`]', l1)
gives
[{'key': '111', 'value': {'name': 'john', 'exp': '1'}}]
Convert the list back to the dictionary
s2 = dict([[i['key'], i['value']] for i in l2])
gives the expected result
{'111': {'name': 'john', 'exp': '1'}}
Example of complete code for testing
#!/usr/bin/python3
import jmespath
s = {'111': {'name': 'john', 'exp': '1'},
'222': {'name': 'mia', 'exp': '1'}}
# '333': {'name': 'john', 'exp': '1'}}
l1 = [{'key': k, 'value': v} for k, v in s.items()]
print(l1)
l2 = jmespath.search('[?value.name == `john`]', l1)
print(l2)
s2 = dict([[i['key'], i['value']] for i in l2])
print(s2)
JMESPath cannot preserve keys when doing an object projection, so you would first have to build an intermediate structure with a loop to get your desired output (see the other answer). For your use case, it is probably simplest to set JMESPath aside and use a list comprehension:
Given:
s = {
'111': {'name': 'john', 'exp': '1'},
'222': {'name': 'mia', 'exp': '1'},
}
print([
    {key: value}
    for key, value in s.items()
    if value['name'] == 'john'
])
This yields the expected output:
[{'111': {'name': 'john', 'exp': '1'}}]
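If a single dict keyed like the input is more convenient than a list of one-key dicts, the same filter can be written as a dict comprehension. This is only a sketch: `filter_by_field` is a hypothetical helper name and is not part of JMESPath.

```python
def filter_by_field(mapping, field, wanted):
    """Return the entries of `mapping` whose nested dict has field == wanted."""
    return {
        key: value
        for key, value in mapping.items()
        # .get() skips entries that lack the field instead of raising KeyError
        if value.get(field) == wanted
    }


s = {
    '111': {'name': 'john', 'exp': '1'},
    '222': {'name': 'mia', 'exp': '1'},
}
matches = filter_by_field(s, 'name', 'john')
print(matches)                               # keys preserved in one dict
print([{k: v} for k, v in matches.items()])  # list-of-dicts form from the question
```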

How to "flatten" list of dictionaries in python

I have a list of dictionaries that looks like this:
test_dict = [{'id': 0, 'name': ['Jim'], 'lastname': ['kkk']}, {'id': 1, 'name': ['John'], 'lastname': ['lll']}]
test_dict
[{'id': 0, 'name': ['Jim'], 'lastname': ['kkk']},
{'id': 1, 'name': ['John'], 'lastname': ['lll']}]
I would like to create another dictionary that will have as keys, the ids of the test_dict.
The output I am looking for, looks like this:
test_dict_f = {'0': {'name': ['Jim'], 'lastname': ['kkk']},
'1': {'name': ['John'], 'lastname': ['lll']}}
test_dict_f
{'0': {'name': ['Jim'], 'lastname': ['kkk']},
'1': {'name': ['John'], 'lastname': ['lll']}}
Any ideas how I could achieve this ?
Try this in one line:
result = {str(i["id"]): {"name": i["name"], "lastname": i["lastname"]} for i in test_dict}
the result will be:
{'0': {'name': ['Jim'], 'lastname': ['kkk']},
'1': {'name': ['John'], 'lastname': ['lll']}}
Here's another way to do it that doesn't rely on knowledge of any keys other than 'id'. Note that this is destructive:
test_dict = [{'id': 0, 'name': ['Jim'], 'lastname': ['kkk']}, {'id': 1, 'name': ['John'], 'lastname': ['lll']}]
test_dict_f = {}
for d in test_dict:
    id_ = d.pop('id')
    test_dict_f[str(id_)] = d
print(test_dict_f)
Output:
{'0': {'name': ['Jim'], 'lastname': ['kkk']}, '1': {'name': ['John'], 'lastname': ['lll']}}
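Because `pop` mutates the inner dicts, `test_dict` no longer contains any 'id' keys after the loop above. If the original list must stay intact, one option is to pop from a shallow copy of each dict. A minimal sketch (the name `index_by_id` is made up for illustration):

```python
def index_by_id(records):
    """Build {str(id): rest_of_record} without modifying the input dicts."""
    indexed = {}
    for record in records:
        rest = dict(record)   # shallow copy, so the pop below leaves `record` alone
        id_ = rest.pop('id')
        indexed[str(id_)] = rest
    return indexed


test_dict = [{'id': 0, 'name': ['Jim'], 'lastname': ['kkk']},
             {'id': 1, 'name': ['John'], 'lastname': ['lll']}]
test_dict_f = index_by_id(test_dict)
print(test_dict_f)
print(test_dict[0])  # still contains 'id'
```

The copy is shallow, so the inner lists such as `['Jim']` are still shared between the input and the result.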
from pprint import pp  # pprint.pp is available from Python 3.8
test_dict = [{'id': 0, 'name': ['Jim'], 'lastname': ['kkk']}, {'id': 1, 'name': ['John'], 'lastname': ['lll']}]
pp({i["id"]: {k:v for k,v in i.items() if k!="id"} for i in test_dict})
Output:
{0: {'name': ['Jim'], 'lastname': ['kkk']},
 1: {'name': ['John'], 'lastname': ['lll']}}
or, even funnier (Python 3.9+; note that this variant keeps 'id' inside each inner dict):
pp({i["id"]: i|{"id":i["id"]} for i in test_dict})

Create a list of lists from a dictionary python

I have a list of dictionaries that I am wanting to convert to a nested list with the first element of that list(lst[0]) containing the dictionary keys and the rest of the elements of the list containing values for each dictionary.
[{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
My expected output result is:
[['id','name','city'], ['123','bob','LA'],['321','sally','manhattan'],['125','fred','miami']]
What would be a way to go about this? Any help would be greatly appreciated.
you can use:
d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
[[k for k in d[0].keys()], *[list(i.values()) for i in d ]]
output:
[['id', 'name', 'city'],
['123', 'bob', 'LA'],
['321', 'sally', 'manhattan'],
['125', 'fred', 'miami']]
First you get a list with your keys, then a list with the values of every inner dict.
>>> d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
>>> [list(d[0].keys())]+[list(i.values()) for i in d]
[['id', 'name', 'city'], ['123', 'bob', 'LA'], ['321', 'sally', 'manhattan'], ['125', 'fred', 'miami']]
Serious suggestion: To avoid the possibility of some dicts having a different iteration order, base the order off the first entry and use operator.itemgetter to get a consistent order from all entries efficiently:
import operator
d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
keys = list(d[0])
keygetter = operator.itemgetter(*keys)
result = [keys, *[list(keygetter(x)) for x in d]] # [keys, *map(list, map(keygetter, d))] might be a titch faster
If a list of tuples is acceptable, this is simpler/faster:
keys = tuple(d[0])
keygetter = operator.itemgetter(*keys)
result = [keys, *map(keygetter, d)]
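One caveat with `operator.itemgetter(*keys)`: when it is built from a single key it returns the bare value rather than a 1-tuple, so `list(keygetter(x))` would split a string value into characters. A plain comprehension avoids that edge case while still taking the column order from the first entry. The sketch below assumes every dict contains all the keys of the first one; `to_rows` is a hypothetical name.

```python
def to_rows(dicts):
    """Return [keys, row1, row2, ...] with every row ordered like the first dict."""
    if not dicts:
        return []
    keys = list(dicts[0])
    # Index by key rather than relying on each dict's own iteration order.
    return [keys] + [[d[k] for k in keys] for d in dicts]


d = [{'id': '123', 'name': 'bob', 'city': 'LA'},
     {'name': 'sally', 'city': 'manhattan', 'id': '321'},  # different key order
     {'id': '125', 'name': 'fred', 'city': 'miami'}]
rows = to_rows(d)
print(rows)
print(to_rows([{'id': '123'}, {'id': '321'}]))  # single-column input
```

A dict that lacks one of the keys raises `KeyError` here; swapping `d[k]` for `d.get(k)` would fill such gaps with `None` instead.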
Unserious suggestion: Let csv do it for you!
import csv
import io
dicts = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
with io.StringIO() as sio:
    writer = csv.DictWriter(sio, dicts[0].keys())
    writer.writeheader()
    writer.writerows(dicts)
    sio.seek(0)
    result = list(csv.reader(sio))
This can be done with for loop and enumerate() built-in method.
listOfDicts = [
{"id": "123", "name": "bob", "city": "LA"},
{"id": "321", "name": "sally", "city": "manhattan"},
{"id": "125", "name": "fred", "city": "miami"},
]
results = []
for index, dic in enumerate(listOfDicts, start = 0):
    if index == 0:
        results.append(list(dic.keys()))
        results.append(list(dic.values()))
    else:
        results.append(list(dic.values()))
print(results)
output:
[['id', 'name', 'city'], ['123', 'bob', 'LA'], ['321', 'sally', 'manhattan'], ['125', 'fred', 'miami']]

Append, Remove, Edit a Dictionary Item from List of Dictionary

How do I perform the following tasks on a list of dictionaries?
lists = [{'firstname': 'John', 'lastname': 'Doe', 'color': 'red'}]
(1) Append an item, {'age': '30'}, to the current lists[0].
lists = [{'firstname': 'John', 'lastname': 'Doe', 'color': 'red', 'age': '30'}]
(2) How do I change the 'lastname' to 'Smith'?
lists = [{'firstname': 'John', 'lastname': 'Smith', 'color': 'red', 'age': '30'}]
(3) How do I remove the 'color' from the list?
lists = [{'firstname': 'John', 'lastname': 'Smith', 'age': '30'}]
lists = [{'firstname': 'John', 'lastname': 'Doe', 'color': 'red'}]
# add the key 'age' with the value 30
lists[0]["age"] = 30
print(lists)
# update 'lastname' to "Smith"
lists[0]["lastname"] = "Smith"
print(lists)
# finally, remove 'color' with the del statement
del lists[0]["color"]
print(lists)
[{'firstname': 'John', 'lastname': 'Doe', 'age': 30, 'color': 'red'}]
[{'firstname': 'John', 'lastname': 'Smith', 'age': 30, 'color': 'red'}]
[{'firstname': 'John', 'lastname': 'Smith', 'age': 30}]
The same way you would with any other dictionary. lists[0] is a dictionary.
(1) Appending:
lists[0]['age'] = '30'
(2) Modifying
lists[0]['lastname'] = 'Smith'
(3) Deleting
del lists[0]['color']
(1) Append an item, {'age': '30'} to the current lists [0].
>>>lists[0]['age']=30
>>>lists
[{'age': 30, 'color': 'red', 'firstname': 'John', 'lastname': 'Doe'}]
(2) How do I change the 'lastname' to 'Smith'?
>>>lists[0]['lastname'] = "Smith"
>>>lists
[{'lastname': 'Smith', 'age': 30, 'color': 'red', 'firstname': 'John'}]
(3) How do I remove the 'color' from the list?
>>>del lists[0]['color'] #or lists[0].pop('color') , This should return `red`
>>>lists
[{'lastname': 'Smith', 'age': 30, 'firstname': 'John'}]
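The three steps can also be written with `dict.update` and `dict.pop`. A short sketch: passing a default to `pop` is an optional choice made here so that a missing 'color' key does not raise `KeyError`.

```python
lists = [{'firstname': 'John', 'lastname': 'Doe', 'color': 'red'}]

person = lists[0]                    # same dict object, not a copy
person.update({'age': '30'})         # (1) add a new key
person['lastname'] = 'Smith'         # (2) change an existing value
removed = person.pop('color', None)  # (3) remove a key; returns its value or None

print(lists)
print(removed)
```

Since `person` refers to the dict stored in the list, every change is visible through `lists[0]` as well.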

item frequency in a python list of dictionaries

Ok, so I have a list of dicts:
[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
and I want the 'frequency' of the items within each column. So for this I'd get something like:
{'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1},
'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1},
'age': {53: 1, 13: 1, 27: 1, 22: 1, 108: 1}}
Any modules out there that can do stuff like this?
collections.defaultdict from the standard library to the rescue:
from collections import defaultdict
LofD = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
def counters():
    return defaultdict(int)
def freqs(LofD):
    r = defaultdict(counters)
    for d in LofD:
        for k, v in d.items():
            r[k][v] += 1
    return dict((k, dict(v)) for k, v in r.items())
print(freqs(LofD))
emits
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
as desired (order of keys apart, of course -- it's irrelevant in a dict).
items = [{'name': 'johnny', 'surname': 'smith', 'age': 53}, {'name': 'johnny', 'surname': 'ryan', 'age': 13}, {'name': 'jakob', 'surname': 'smith', 'age': 27}, {'name': 'aaron', 'surname': 'specter', 'age': 22}, {'name': 'max', 'surname': 'headroom', 'age': 108}]
global_dict = {}
for item in items:
    for key, value in item.items():
        if key not in global_dict:
            global_dict[key] = {}
        if value not in global_dict[key]:
            global_dict[key][value] = 0
        global_dict[key][value] += 1
print(global_dict)
Simplest solution and actually tested.
New in Python 3.1: The collections.Counter class:
mydict=[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
import collections
newdict = {}
for key in mydict[0].keys():
    l = [value[key] for value in mydict]
    newdict[key] = dict(collections.Counter(l))
print(newdict)
outputs:
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1},
'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1},
'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
This?
from collections import defaultdict
fq = { 'name': defaultdict(int), 'surname': defaultdict(int), 'age': defaultdict(int) }
for row in listOfDicts:
    for field in fq:
        fq[field][row[field]] += 1
print(fq)
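A compact variant that combines the `defaultdict` and `Counter` ideas from the answers above and does not need the field names up front. This is a sketch: `column_frequencies` is a hypothetical name, and it assumes every value is hashable.

```python
from collections import Counter, defaultdict


def column_frequencies(rows):
    """Count how often each value occurs under each key across all rows."""
    freqs = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            freqs[key][value] += 1
    # Convert to plain dicts so the result compares equal to the desired output.
    return {key: dict(counter) for key, counter in freqs.items()}


rows = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
        {'name': 'johnny', 'surname': 'ryan', 'age': 13},
        {'name': 'jakob', 'surname': 'smith', 'age': 27},
        {'name': 'aaron', 'surname': 'specter', 'age': 22},
        {'name': 'max', 'surname': 'headroom', 'age': 108}]
result = column_frequencies(rows)
print(result)
```

Rows that lack a key simply contribute nothing to that key's counts, so the input dicts do not all need the same fields.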
