Count duplicates in dictionary by specific keys


I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If I specify 'age' and 'country', it should return:
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'height': 185,
'count': 1
}
]
Is there a way to implement this with Counter?

You can use itertools.groupby on a sorted list:
>>> data = [
...     {'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
...     {'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
...     {'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
...     {'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
...     {'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
... ]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
...     {**dict(zip(key, k)), 'count': len([*g])}
...     for k, g in
...     groupby(sorted(data, key=list_sorter), grouper)
... ]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
...     {**dict(zip(key, k)), 'count': len([*g])}
...     for k, g in
...     groupby(sorted(data, key=list_sorter), grouper)
... ]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas, you can chain pandas.DataFrame.groupby, DataFrameGroupBy.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
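
The question asks whether Counter can do this. A minimal sketch, assuming every grouped value is hashable; count_by_keys is a hypothetical helper name used only for illustration, not part of any library:

```python
from collections import Counter

data = [
    {'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
    {'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
    {'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
    {'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
    {'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185},
]


def count_by_keys(rows, keys):
    """Count rows that share the same values for the given keys (hypothetical helper)."""
    # Each row is reduced to a tuple of the selected values, which Counter can hash.
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # Rebuild one dict per distinct tuple and attach its count.
    return [{**dict(zip(keys, values)), 'count': n} for values, n in counts.items()]


by_age_country = count_by_keys(data, ('age', 'country'))
by_name_height = count_by_keys(data, ('name', 'height'))
```

Because Counter keeps first-seen order (Python 3.7+), the groups come out in the order they first appear in data rather than sorted, and no sorting pass is needed.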

Related

list of dicts- get the number of duplications [closed]

I have a list of dicts (same format) like this :
L = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]
I want to remove the duplicates and get the number of occurrences of each, like this:
[
{'id': 1, 'name': 'john', 'age': 34, 'duplication': 2},
{'id': 2, 'name': 'hanna', 'age': 30, 'duplication': 2},
{'id': 3, 'name': 'stack', 'age': 40, 'duplication': 1}
]
I already managed to remove the duplicates by using a set, but I can't get the number of duplicates.
My code:
no_duplication = [dict(s) for s in set(frozenset(d.items()) for d in L)]
no_duplication = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]

Here is a solution you can try, using collections.Counter:
from collections import Counter
print([
{**dict(k), "duplicated": v}
for k, v in Counter(frozenset(i.items()) for i in L).items()
])
[{'age': 34, 'duplicated': 2, 'id': 1, 'name': 'john'},
{'age': 30, 'duplicated': 2, 'id': 2, 'name': 'hanna'},
{'age': 40, 'duplicated': 1, 'id': 3, 'name': 'stack'}]
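
Going through frozenset loses the original key order, as the output above shows. A minimal variant sketch, assuming every dict lists its keys in the same order and all values are hashable; it also uses the 'duplication' key name from the question:

```python
from collections import Counter

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 3, 'name': 'stack', 'age': 40},
]

# tuple(d.items()) is hashable and, unlike frozenset, remembers the key order.
counts = Counter(tuple(d.items()) for d in L)
result = [{**dict(items), 'duplication': n} for items, n in counts.items()]
```

If the dicts might list the same keys in different orders, tuple(sorted(d.items())) would group them together, at the cost of returning the keys in sorted order.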

ar = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]
br = []
cnt = []
for i in ar:
    if i not in br:
        br.append(i)
        cnt.append(1)
    else:
        cnt[br.index(i)] += 1
for i in range(len(br)):
    br[i]['duplication'] = cnt[i]
The desired output is contained in br as:
[
{'id': 1, 'name': 'john', 'age': 34, 'duplication': 2},
{'id': 2, 'name': 'hanna', 'age': 30, 'duplication': 2},
{'id': 3, 'name': 'stack', 'age': 40, 'duplication': 1}
]

Sort a list of dict with a key from another list of dict

In the following example, I would like to sort the animals by the alphabetical order of their category name, which is stored in a separate list of dictionaries.
category = [{'uid': 0, 'name': 'mammals'},
{'uid': 1, 'name': 'birds'},
{'uid': 2, 'name': 'fish'},
{'uid': 3, 'name': 'reptiles'},
{'uid': 4, 'name': 'invertebrates'},
{'uid': 5, 'name': 'amphibians'}]
animals = [{'name': 'horse', 'category': 0},
{'name': 'whale', 'category': 2},
{'name': 'mollusk', 'category': 4},
{'name': 'tuna ', 'category': 2},
{'name': 'worms', 'category': 4},
{'name': 'frog', 'category': 5},
{'name': 'dog', 'category': 0},
{'name': 'salamander', 'category': 5},
{'name': 'horse', 'category': 0},
{'name': 'octopus', 'category': 4},
{'name': 'alligator', 'category': 3},
{'name': 'monkey', 'category': 0},
{'name': 'kangaroos', 'category': 0},
{'name': 'salmon', 'category': 2}]
sorted_animals = sorted(animals, key=lambda k: k['category'])
How could I achieve this?
Thanks.

You are now sorting on the category id. All you need to do is map that id to a lookup for a given category name.
Create a dictionary for the categories first so you can directly map the numeric id to the associated name from the category list, then use that mapping when sorting:
catuid_to_name = {c['uid']: c['name'] for c in category}
sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
Demo:
>>> from pprint import pprint
>>> category = [{'uid': 0, 'name': 'mammals'},
... {'uid': 1, 'name': 'birds'},
... {'uid': 2, 'name': 'fish'},
... {'uid': 3, 'name': 'reptiles'},
... {'uid': 4, 'name': 'invertebrates'},
... {'uid': 5, 'name': 'amphibians'}]
>>> animals = [{'name': 'horse', 'category': 0},
... {'name': 'whale', 'category': 2},
... {'name': 'mollusk', 'category': 4},
... {'name': 'tuna ', 'category': 2},
... {'name': 'worms', 'category': 4},
... {'name': 'frog', 'category': 5},
... {'name': 'dog', 'category': 0},
... {'name': 'salamander', 'category': 5},
... {'name': 'horse', 'category': 0},
... {'name': 'octopus', 'category': 4},
... {'name': 'alligator', 'category': 3},
... {'name': 'monkey', 'category': 0},
... {'name': 'kangaroos', 'category': 0},
... {'name': 'salmon', 'category': 2}]
>>> catuid_to_name = {c['uid']: c['name'] for c in category}
>>> pprint(catuid_to_name)
{0: 'mammals',
1: 'birds',
2: 'fish',
3: 'reptiles',
4: 'invertebrates',
5: 'amphibians'}
>>> sorted_animals = sorted(animals, key=lambda k: catuid_to_name[k['category']])
>>> pprint(sorted_animals)
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'whale'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'salmon'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'worms'},
{'category': 4, 'name': 'octopus'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'monkey'},
{'category': 0, 'name': 'kangaroos'},
{'category': 3, 'name': 'alligator'}]
Note that within each category, the dictionaries have been left in relative input order. You could return a tuple of values from the sorting key to further apply a sorting order within each category, e.g.:
sorted_animals = sorted(
animals,
key=lambda k: (catuid_to_name[k['category']], k['name'])
)
would sort by animal name within each category, producing:
>>> pprint(sorted(animals, key=lambda k: (catuid_to_name[k['category']], k['name'])))
[{'category': 5, 'name': 'frog'},
{'category': 5, 'name': 'salamander'},
{'category': 2, 'name': 'salmon'},
{'category': 2, 'name': 'tuna '},
{'category': 2, 'name': 'whale'},
{'category': 4, 'name': 'mollusk'},
{'category': 4, 'name': 'octopus'},
{'category': 4, 'name': 'worms'},
{'category': 0, 'name': 'dog'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'horse'},
{'category': 0, 'name': 'kangaroos'},
{'category': 0, 'name': 'monkey'},
{'category': 3, 'name': 'alligator'}]
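
The direct lookup catuid_to_name[k['category']] raises KeyError if an animal refers to a category id that is missing from the category list. A minimal sketch of one way to handle that, assuming you want such animals sorted last; the trimmed data, the 'griffin' entry and the sort_key helper are hypothetical:

```python
category = [{'uid': 0, 'name': 'mammals'}, {'uid': 2, 'name': 'fish'}]
animals = [
    {'name': 'horse', 'category': 0},
    {'name': 'tuna', 'category': 2},
    {'name': 'griffin', 'category': 9},  # hypothetical id with no matching category
]

catuid_to_name = {c['uid']: c['name'] for c in category}


def sort_key(animal):
    """Sort by category name, then animal name; unknown categories go last."""
    name = catuid_to_name.get(animal['category'])
    # False sorts before True, so known categories come first.
    return (name is None, name or '', animal['name'])


sorted_animals = sorted(animals, key=sort_key)
```

Here 'fish' sorts before 'mammals', and the animal with the unknown category id ends up at the end instead of raising an exception.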

IMO your category structure is more complicated than it needs to be. As long as the uid is nothing but the list index, you could simply use a list:
category = [c['name'] for c in category]
# ['mammals', 'birds', 'fish', 'reptiles', 'invertebrates', 'amphibians']
sorted_animals = sorted(animals, key=lambda k: category[k['category']])
#[{'name': 'frog', 'category': 5}, {'name': 'salamander', 'category': 5}, {'name': 'whale', 'category': 2}, {'name': 'tuna ', 'category': 2}, {'name': 'salmon', 'category': 2}, {'name': 'mollusk', 'category': 4}, {'name': 'worms', 'category': 4}, {'name': 'octopus', 'category': 4}, {'name': 'horse', 'category': 0}, {'name': 'dog', 'category': 0}, {'name': 'horse', 'category': 0}, {'name': 'monkey', 'category': 0}, {'name': 'kangaroos', 'category': 0}, {'name': 'alligator', 'category': 3}]

Generate all combinations from a nested python dictionary and segregate them

My sample dict is:
sample_dict = {
'company': {
'employee': {
'name': [
{'explore': ["noname"],
'valid': ["john","tom"],
'boundary': ["aaaaaaaaaa"],
'negative': ["$"]}],
'age': [
{'explore': [200],
'valid': [20,30],
'boundary': [1,99],
'negative': [-1,100]}],
'others':{
'grade':[
{'explore': ["star"],
'valid': ["A","B"],
'boundary': ["C"],
'negative': ["AB"]}]}
}
}}
It's a follow-on question to: Split python dictionary to result in all combinations of values
I would like to get a segregated list of combinations like below
Valid combinations: [generated only from the valid list of data]
COMPLETE OUTPUT for VALID CATEGORY :
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'B'}}}
Negative combinations: [Here it's a bit tricky, because negative values should be combined with the "valid" pool as well, with at least one value being negative]
Complete output expected for NEGATIVE category :
=> [Basically, exclude combinations where all values are valid, ensuring at least one value in each combination is from the negative group]
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 100}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': 100}, 'name': '$', 'others': {'grade': 'AB'}}}
In the above output, the first line tests grade with the negative value AB while keeping all the remaining values valid. So it's not necessary to generate the same combination with age as 30, as the intent is to test only the negative set. We can supply the remaining parameters with any valid data.
Boundary combinations: similar to valid, i.e. combinations of values from the boundary pool only.
Explore: similar to negative, i.e. mixed with the valid pool, with at least one explore value in every combination.
Sample dict - revised version
sample_dict2 = {
'company': {
'employee_list': [
{'employee': {'age': [{'boundary': [1,99],
'explore': [200],
'negative': [-1,100],
'valid': [20, 30]}],
'name': [{'boundary': ['aaaaaaaaaa'],
'explore': ['noname'],
'negative': ['$'],
'valid': ['john','tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']},
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']}]}}},
{'employee': {'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [],
'valid': [20, 30]}],
'name': [{'boundary': [],
'explore': [],
'negative': ['$'],
'valid': ['john', 'tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': [],
'valid': ['A', 'B']},
{'boundary': [],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A', 'B']}]}}}
]
}
}
sample_dict2 contains a list of dicts. Here the whole "employee" hierarchy is a list element, and the leaf node "grade" is also a list.
Also, every data set except "valid" and "boundary" can be empty ([]), and we need to handle that as well.
VALID COMBINATIONS will be like
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
plus combinations of age=30 and name =tom in employee index 0

import itertools

def generate_combinations(thing, positive="valid", negative=None):
    """ Generate all possible combinations, walking and mimicking structure of "thing" """
    if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
        if positive in thing:
            return thing[positive] if negative is None else [thing[positive][0]] + thing[negative]
        else:
            results = []
            for key, value in thing.items(): # generate all possible key: value combinations
                subresults = []
                for result in generate_combinations(value, positive, negative):
                    subresults.append((key, result))
                results.append(subresults)
            return [dict(result) for result in itertools.product(*results)]
    elif isinstance(thing, list) or isinstance(thing, tuple): # added tuple just to be safe
        results = []
        for element in thing: # generate recursive result sets for each element of list
            for result in generate_combinations(element, positive, negative):
                results.append(result)
        return results
    else: # not a type we know how to handle
        raise TypeError("Unexpected type")

def generate_invalid_combinations(thing):
    """ Generate all possible combinations and weed out the valid ones """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='negative') if result not in valid]

def generate_boundary_combinations(thing):
    """ Generate all possible boundary combinations """
    return generate_combinations(thing, positive="boundary")

def generate_explore_combinations(thing):
    """ Generate all possible explore combinations and weed out the valid ones """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='explore') if result not in valid]
Calling generate_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'B'}}}}
]
Calling generate_invalid_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'AB'}}}}
]
Calling generate_boundary_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 1, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}},
{'company': {'employee': {'age': 99, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}}
]
Calling generate_explore_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'star'}}}}
]
REVISED SOLUTION (To match revised problem)
import itertools
import random

def generate_combinations(thing, positive="valid", negative=None):
    """ Generate all possible combinations, walking and mimicking structure of "thing" """
    if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
        if positive in thing:
            if negative is None:
                return thing[positive] # here it's OK if it's empty
            elif thing[positive]: # here it's not OK if it's empty
                return [random.choice(thing[positive])] + thing[negative]
            else:
                return []
        else:
            results = []
            for key, value in thing.items(): # generate all possible key: value combinations
                results.append([(key, result) for result in generate_combinations(value, positive, negative)])
            return [dict(result) for result in itertools.product(*results)]
    elif isinstance(thing, (list, tuple)): # added tuple just to be safe (thanks Padraic!)
        # generate recursive result sets for each element of list
        results = [generate_combinations(element, positive, negative) for element in thing]
        return [list(result) for result in itertools.product(*results)]
    else: # not a type we know how to handle
        raise TypeError("Unexpected type")

def generate_boundary_combinations(thing):
    """ Generate all possible boundary combinations """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='boundary') if result not in valid]
generate_invalid_combinations() and generate_explore_combinations() are the same as before. Subtle differences:
Instead of grabbing the first item out of the valid array in a negative evaluation, it now grabs a random item from the valid array.
Values for items like 'age': [30] come back as lists as that's how they were specified:
'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]}],
If you instead want 'age': 30 like the earlier output examples, then modify the definition accordingly:
'age': {'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]},
The boundary property is now treated like one of the 'negative' values.
Just for reference, I don't plan to generate all the outputs this time: calling generate_combinations(sample_dict2) returns results like:
[
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
...
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}]}}
]

This is an open-ended hornet's nest of a question.
Look at the whitepapers for Agitar and other tools by Agitar to see if this is what you are thinking about.
Look at Knuth's work on combinatorics. It's a tough read.
Consider just writing a recursive descent generator that uses 'yield '.
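
A minimal sketch of what such a yield-based recursive generator could look like, assuming the leaf format of sample_dict from the question and a single pool per call; iter_combinations is a hypothetical name, and the sample below is a trimmed-down version of the question's data:

```python
import itertools


def iter_combinations(thing, pool="valid"):
    """Yield combinations for one pool, mirroring the structure of `thing`."""
    if isinstance(thing, dict):
        if pool in thing:
            # Leaf spec such as {'valid': [...], 'negative': [...]}: yield its values.
            yield from thing[pool]
        else:
            keys = list(thing)
            per_key = [iter_combinations(thing[k], pool) for k in keys]
            # product reads each sub-generator once up front,
            # then the combined results are yielded one at a time.
            for values in itertools.product(*per_key):
                yield dict(zip(keys, values))
    elif isinstance(thing, (list, tuple)):
        # As in the first solution above, a list contributes the results of all its elements.
        for element in thing:
            yield from iter_combinations(element, pool)
    else:
        raise TypeError("Unexpected type")


sample = {
    'company': {
        'employee': {
            'name': [{'valid': ['john', 'tom'], 'negative': ['$']}],
            'age': [{'valid': [20, 30], 'negative': [-1, 100]}],
        }
    }
}

valid_combinations = list(iter_combinations(sample))
```

The caller can iterate over iter_combinations(sample) directly and stop early, instead of building the complete list first.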

How to get whole dict with max value of a common key in a list of dicts

I have a list of dicts like below:
lod = [
{'name': 'Tom', 'score': 60},
{'name': 'Tim', 'score': 70},
{'name': 'Tam', 'score': 80},
{'name': 'Tem', 'score': 90}
]
I want to get {'name': 'Tem', 'score': 90}, but so far I can only do this:
max(x['score'] for x in lod)
This only returns the value 90.
How can I get the whole dict?

You can use the key function of max:
>>> lod = [
... {'name': 'Tom', 'score': 60},
... {'name': 'Tim', 'score': 70},
... {'name': 'Tam', 'score': 80},
... {'name': 'Tem', 'score': 90}
... ]
...
>>> max(lod, key=lambda x: x['score'])
{'name': 'Tem', 'score': 90}

Just pass your list to max, like this:
>>> from operator import itemgetter
>>> lod = [
... {'name': 'Tom', 'score': 60},
... {'name': 'Tim', 'score': 70},
... {'name': 'Tam', 'score': 80},
... {'name': 'Tem', 'score': 90}
... ]
>>> max(lod, key=itemgetter('score'))
{'score': 90, 'name': 'Tem'}

I don't know whether the sorting cost matters for you (it is O(n log n), versus O(n) for max), but this also works:
>>> sorted(lod, key=lambda x: x['score'])[-1]
{'name': 'Tem', 'score': 90}
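
The max and sorted approaches can return different dicts when several entries share the top score. A minimal sketch with hypothetical data containing a tie; it also shows the default argument of max (Python 3.4+), which avoids a ValueError on an empty list:

```python
from operator import itemgetter

lod = [
    {'name': 'Tom', 'score': 90},
    {'name': 'Tim', 'score': 70},
    {'name': 'Tem', 'score': 90},  # hypothetical tie with Tom
]
by_score = itemgetter('score')

# max returns the first maximal item it encounters.
first_top = max(lod, key=by_score)
# sorted is stable, so the last element is the last of the tied items.
last_top = sorted(lod, key=by_score)[-1]
# Collect every dict that shares the top score.
all_top = [d for d in lod if d['score'] == first_top['score']]
# With default=None an empty list yields None instead of raising.
empty_top = max([], key=by_score, default=None)
```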

item frequency in a python list of dictionaries

Ok, so I have a list of dicts:
[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
and I want the 'frequency' of the items within each column. So for this I'd get something like:
{'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1},
'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1},
'age': {53: 1, 13: 1, 27: 1, 22: 1, 108: 1}}
Any modules out there that can do stuff like this?

collections.defaultdict from the standard library to the rescue:
from collections import defaultdict
LofD = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
def counters():
    return defaultdict(int)

def freqs(LofD):
    r = defaultdict(counters)
    for d in LofD:
        for k, v in d.items():
            r[k][v] += 1
    return dict((k, dict(v)) for k, v in r.items())

print(freqs(LofD))
emits
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
as desired (order of keys apart, of course -- it's irrelevant in a dict).

items = [{'name': 'johnny', 'surname': 'smith', 'age': 53}, {'name': 'johnny', 'surname': 'ryan', 'age': 13}, {'name': 'jakob', 'surname': 'smith', 'age': 27}, {'name': 'aaron', 'surname': 'specter', 'age': 22}, {'name': 'max', 'surname': 'headroom', 'age': 108}]
global_dict = {}
for item in items:
    for key, value in item.items():
        # dict.has_key() was removed in Python 3; use the "in" operator instead
        if key not in global_dict:
            global_dict[key] = {}
        if value not in global_dict[key]:
            global_dict[key][value] = 0
        global_dict[key][value] += 1
print(global_dict)
Simplest solution and actually tested.

New in Python 3.1: The collections.Counter class:
mydict=[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
import collections
newdict = {}
for key in mydict[0].keys():
    l = [value[key] for value in mydict]
    newdict[key] = dict(collections.Counter(l))
print(newdict)
outputs:
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1},
'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1},
'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
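
The loop above takes its keys from mydict[0], so it assumes every dict has the same keys. A minimal sketch that combines defaultdict and Counter and tolerates missing keys; the shortened rows list, including the row without 'age', is hypothetical:

```python
from collections import Counter, defaultdict

rows = [
    {'name': 'johnny', 'surname': 'smith', 'age': 53},
    {'name': 'johnny', 'surname': 'ryan'},  # hypothetical row without 'age'
    {'name': 'jakob', 'surname': 'smith', 'age': 27},
]

# One Counter per column, created on first use.
freqs = defaultdict(Counter)
for row in rows:
    for key, value in row.items():
        freqs[key][value] += 1

# Convert to plain dicts for printing or serialisation.
result = {key: dict(counter) for key, counter in freqs.items()}
```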

This?
from collections import defaultdict
fq = { 'name': defaultdict(int), 'surname': defaultdict(int), 'age': defaultdict(int) }
for row in listOfDicts:  # listOfDicts is the list of dicts from the question
    for field in fq:
        fq[field][row[field]] += 1
print(fq)
