item frequency in a python list of dictionaries - python

Ok, so I have a list of dicts:
[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
and I want the 'frequency' of the items within each column. So for this I'd get something like:
{'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1},
'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1},
'age': {53:1, 13:1, 27: 1. 22:1, 108:1}}
Any modules out there that can do stuff like this?

collections.defaultdict from the standard library to the rescue:
from collections import defaultdict
LofD = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
def counters():
return defaultdict(int)
def freqs(LofD):
r = defaultdict(counters)
for d in LofD:
for k, v in d.items():
r[k][v] += 1
return dict((k, dict(v)) for k, v in r.items())
print freqs(LofD)
emits
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
as desired (order of keys apart, of course -- it's irrelevant in a dict).

items = [{'name': 'johnny', 'surname': 'smith', 'age': 53}, {'name': 'johnny', 'surname': 'ryan', 'age': 13}, {'name': 'jakob', 'surname': 'smith', 'age': 27}, {'name': 'aaron', 'surname': 'specter', 'age': 22}, {'name': 'max', 'surname': 'headroom', 'age': 108}]
global_dict = {}
for item in items:
for key, value in item.items():
if not global_dict.has_key(key):
global_dict[key] = {}
if not global_dict[key].has_key(value):
global_dict[key][value] = 0
global_dict[key][value] += 1
print global_dict
Simplest solution and actually tested.

New in Python 3.1: The collections.Counter class:
mydict=[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
import collections
newdict = {}
for key in mydict[0].keys():
l = [value[key] for value in mydict]
newdict[key] = dict(collections.Counter(l))
print(newdict)
outputs:
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1},
'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1},
'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}

This?
from collections import defaultdict
fq = { 'name': defaultdict(int), 'surname': defaultdict(int), 'age': defaultdict(int) }
for row in listOfDicts:
for field in fq:
fq[field][row[field]] += 1
print fq

Related

Count duplicates in dictionary by specific keys

I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If will specify 'age' and 'country' it should return
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I will specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'heigth': 185,
'count': 1
}
]
Maybe there is a way to implement this by Counter?
You can use itertools.groupby with sorted list:
>>> data = [
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas then you can use, pandas.DataFrame.groupby, pandas.groupby.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]

Increment a key value in a list of dictionaries

I would like to add an id key to a list of dictionaries, where each id represents the enumerated nested dictionary.
Current list of dictionaries:
current_list_d = [{'id': 0, 'name': 'Paco', 'age': 18} #all id's are 0
{'id': 0, 'name': 'John', 'age': 20}
{'id': 0, 'name': 'Claire', 'age': 22}]
Desired output:
output_list_d = [{'id': 1, 'name': 'Paco', 'age': 18} #id's are counted/enumerated
{'id': 2, 'name': 'John', 'age': 20}
{'id': 3, 'name': 'Claire', 'age': 22}]
My code:
for d in current_list_d:
d["id"]+=1
You could use a simple for loop with enumerate and update in-place the id keys in the dictionaries:
for new_id, d in enumerate(current_list_d, start=1):
d['id'] = new_id
current_list_d
[{'id': 1, 'name': 'Paco', 'age': 18},
{'id': 2, 'name': 'John', 'age': 20},
{'id': 3, 'name': 'Claire', 'age': 22}]
You can use a variable.
id_val = 1
for dict in current_list_d :
dict["id"] = id_val
id_val+=1

Select list element where a field has the min value

Suppose I have a named list as follows:
myListOfPeople = [{'ID': 0, 'Name': 'Mary', 'Age': 25}, {'ID': 1, 'Name': 'John', 'Age': 28}]
I want to select the element (not only the field) where an specific field meets certain criteria, e.g., the element with the minimum 'Age'. Something like:
youngerPerson = [person for person in myListOfPeople if person = ***person with minimum age***]
And will get as answer:
>>youngerPerson: {'ID': 0, 'Name': Mary, 'Age': 25}
How can I do that?
You can use the key parameter of min:
>>> myListOfPeople = [{'ID': 0, 'Name': 'Mary', 'Age': 25}, {'ID': 1, 'Name': 'John', 'Age': 28}]
>>>
>>> min(myListOfPeople, key=lambda x: x["Age"])
{'ID': 0, 'Name': 'Mary', 'Age': 25}
>>>
You can use itemgetter :
from operator import itemgetter
myListOfPeople = [{'ID': 0, 'Name': 'Mary', 'Age': 25}, {'ID': 1, 'Name': 'John', 'Age': 28}]
sorted(myListOfPeople, key=itemgetter('Age'))[0]
# {'ID': 0, 'Name': 'Mary', 'Age': 25}

Generate list of dictionaries without adjacent dictionary key/values equal

I have a sorted list based on the value for key name as below:
s = [{'name': 'Bart', 'age': 12}, {'name': 'Bart', 'age': 19}, {'name': 'Bart', 'age': 1},{'name': 'Homer', 'age': 30}, {'name': 'Homer', 'age': 12},{'name': 'Simpson', 'age': 19}]
I want to arrange the elements of the list such that dictionaries with the same value for key name do not occur one after the other.
Required output:
[{'name': 'Bart', 'age': 12}, {'name': 'Homer', 'age': 30}, {'name': 'Simpson', 'age': 19}, {'name': 'Bart', 'age': 19}, {'name': 'Homer', 'age': 12}, {'name': 'Bart', 'age': 1}]
OR
[{'name': 'Bart', 'age': 12}, {'name': 'Homer', 'age': 30}, {'name': 'Bart', 'age': 19}, {'name': 'Homer', 'age': 12}, {'name': 'Bart', 'age': 1},{'name': 'Simpson', 'age': 19}]
To get either one of the required outputs I tried using map and lambda
The idea was to compare every name element with the next elements name and if they don't match, swap the values and return the resulting list.
Below is the code which I was trying:
map(lambda x: x if x['name']!=next(iter(x))['name'] else None, s)
One thing that did not work is that next(iter(x) did not return the next element. I also want to know why and if the solution can be achieved using map and lambda ?
I wrote this according to your requirements, though it is not as condensed:
s = [{'name': 'Bart', 'age': 12}, {'name': 'Bart', 'age': 19}, {'name': 'Bart', 'age': 1},
{'name': 'Homer', 'age': 30}, {'name': 'Homer', 'age': 12},{'name': 'Simpson', 'age': 19}]
res=[]
for m,n in zip(s, reversed(s)):
if m!=n:
res.append(m)
res.append(n)
else:
res.append(m)
if len(res)==len(s):
break
print res
It makes use of the fact that you already have the sorted list.

How to get whole dict with max value of a common key in a list of dicts

I have a list of dicts like below:
lod = [
{'name': 'Tom', 'score': 60},
{'name': 'Tim', 'score': 70},
{'name': 'Tam', 'score': 80},
{'name': 'Tem', 'score': 90}
]
I want to get {'name': 'Tem', 'score':90} but I only can do below:
max(x['score'] for x in lod)
This only return the value 90.
How can I get the whole dict?
You can use the key function of max:
>>> lod = [
... {'name': 'Tom', 'score': 60},
... {'name': 'Tim', 'score': 70},
... {'name': 'Tam', 'score': 80},
... {'name': 'Tem', 'score': 90}
... ]
...
>>> max(lod, key=lambda x: x['score'])
{'name': 'Tem', 'score': 90}
Just pass your list to max, like this:
>>> from operator import itemgetter
>>> lod = [
... {'name': 'Tom', 'score': 60},
... {'name': 'Tim', 'score': 70},
... {'name': 'Tam', 'score': 80},
... {'name': 'Tem', 'score': 90}
... ]
>>> max(lod, key=itemgetter('score'))
{'score': 90, 'name': 'Tem'}
I dont know whether sorting is time consuming,
>>>sorted(lod, key=lambda x:x['score'])[-1]
{'name': 'Tem', 'score': 90}

Categories

Resources