This is a follow-up to this question: Using pandas to add list elements together. I would like to generalize that approach to get the unique elements of an array even when they are not of a hashable type, such as a dict. Here is the input array:
items = [
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'X', 'Age': 22}, {'Place': 'Y', 'Age': 23}]
},
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Place': 'Z', 'Age': 20}]
}
]
I want to combine the unique Residences (dicts) for each person, so the final result would be:
items = [
{
'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Place': 'X', 'Age': 22}, {'Place': 'Y', 'Age': 23}, {'Place': 'Z', 'Age': 20}]
},
{
'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Place': 'Z', 'Age': 20}]
}
]
The SQL I would use would be something like this:
SELECT FirstName, LastName, GROUP_CONCAT(DISTINCT **Residence Object**)
FROM items
GROUP BY FirstName, LastName
How would I do this in pandas, so that I don't get an unhashable type error when trying to get the distinct array elements?
Barring anything else, I don't think Pandas would give you any real benefit here:
from collections import defaultdict
d = defaultdict(list)
for e in items:
    residences = d[(e['FirstName'], e['LastName'])]
    for r in e['Residence']:
        if r not in residences:  # dicts are unhashable, so compare by equality
            residences.append(r)
items = [{'FirstName': k[0], 'LastName': k[1], 'Residence': v} for k, v in d.items()]
A solution using pandas:
import pandas as pd
df = pd.DataFrame(items)
df.groupby(['FirstName','LastName']).Residence.\
apply(lambda x : x.sum()).\
apply(lambda x : [dict(y) for y in set(tuple(t.items()) for t in x)]).\
reset_index().to_dict('records')
Out[104]:
[{'FirstName': 'Bob',
'LastName': 'Jones',
'Residence': [{'Age': 20, 'Place': 'Z'}]},
{'FirstName': 'David',
'LastName': 'Smith',
'Residence': [{'Age': 20, 'Place': 'Z'},
{'Age': 23, 'Place': 'Y'},
{'Age': 22, 'Place': 'X'}]}]
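Note that the set-of-tuples step above does not keep the original order (Z comes first in the output), and two dicts with the same items in a different key order produce different tuples. A minimal sketch of an order-preserving alternative follows; the helper name unique_dicts is hypothetical, and it assumes every value in the dicts is hashable, as is the case for the Place/Age records here:

```python
def unique_dicts(dicts):
    """Return dicts with duplicates removed, keeping first-seen order.

    Assumes every value is hashable (true for the Place/Age records above).
    """
    seen = set()
    unique = []
    for d in dicts:
        key = frozenset(d.items())  # hashable fingerprint, independent of key order
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


residences = [
    {'Place': 'X', 'Age': 22},
    {'Place': 'Z', 'Age': 20},
    {'Age': 20, 'Place': 'Z'},  # same record, different key order
]
print(unique_dicts(residences))
# [{'Place': 'X', 'Age': 22}, {'Place': 'Z', 'Age': 20}]
```

It could replace the second lambda in the groupby chain, e.g. apply(unique_dicts).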
My sample dict is:
sample_dict = {
'company': {
'employee': {
'name': [
{'explore': ["noname"],
'valid': ["john","tom"],
'boundary': ["aaaaaaaaaa"],
'negative': ["$"]}],
'age': [
{'explore': [200],
'valid': [20,30],
'boundary': [1,99],
'negative': [-1,100]}],
'others':{
'grade':[
{'explore': ["star"],
'valid': ["A","B"],
'boundary': ["C"],
'negative': ["AB"]}]}
}
}}
It's a follow-on question to: Split python dictionary to result in all combinations of values
I would like to get the combinations segregated by category, as shown below.
Valid combinations: [generated only from the valid lists of data]
Complete output for the VALID category:
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'B'}}}
Negative combinations: [This is a bit tricky, because negative values should also be combined with the "valid" pool, with at least one value in each combination being negative]
Complete output expected for the NEGATIVE category:
=> [Basically, exclude combinations where all values are valid, ensuring at least one value in each combination is from the negative group]
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 100}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': 100}, 'name': '$', 'others': {'grade': 'AB'}}}
In the above output, the first line tests grade with the negative value AB while keeping all remaining values valid. So it's not necessary to generate the same line with age as 30, as the intent is to test only the negative set. We can supply the remaining parameters with any valid data.
Boundary combinations: similar to valid, i.e. combinations of all values within the boundary pool only.
Explore combinations: similar to negative, i.e. mixed with the valid pool, with at least one explore value in every combination.
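To make the "one bad value, everything else valid" idea concrete, here is a rough sketch. It is not the full expected output (which also mixes several negative values), only the minimal single-fault subset. The flat pools layout and the function name one_bad_value_cases are assumptions made for illustration, and the first valid value is used as an arbitrary filler:

```python
def one_bad_value_cases(pools, bad="negative", good="valid"):
    """Yield one case per bad value, with every other field set to a valid value.

    `pools` maps a field name to {"valid": [...], "negative": [...], ...}.
    The first valid value of each field is the filler; that choice is arbitrary.
    """
    baseline = {field: pool[good][0] for field, pool in pools.items()}
    for field, pool in pools.items():
        for value in pool.get(bad, []):  # a missing or empty pool yields nothing
            case = dict(baseline)
            case[field] = value
            yield case


# Flattened version of the sample data (an assumption for this sketch).
pools = {
    "name": {"valid": ["john", "tom"], "negative": ["$"], "explore": ["noname"]},
    "age": {"valid": [20, 30], "negative": [-1, 100], "explore": [200]},
    "grade": {"valid": ["A", "B"], "negative": ["AB"], "explore": ["star"]},
}
for case in one_bad_value_cases(pools):
    print(case)
```

Passing bad="explore" gives the corresponding explore cases.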
Sample dict - revised version
sample_dict2 = {
'company': {
'employee_list': [
{'employee': {'age': [{'boundary': [1,99],
'explore': [200],
'negative': [-1,100],
'valid': [20, 30]}],
'name': [{'boundary': ['aaaaaaaaaa'],
'explore': ['noname'],
'negative': ['$'],
'valid': ['john','tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']},
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']}]}}},
{'employee': {'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [],
'valid': [20, 30]}],
'name': [{'boundary': [],
'explore': [],
'negative': ['$'],
'valid': ['john', 'tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': [],
'valid': ['A', 'B']},
{'boundary': [],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A', 'B']}]}}}
]
}
}
sample_dict2 contains lists of dicts. Here the whole "employee" hierarchy is a list element, and the leaf node "grade" is also a list.
Also, every data set except "valid" and "boundary" can be empty ([]), and we need to handle that as well.
VALID COMBINATIONS will look like:
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
plus the combinations with age=30 and name=tom for the employee at index 0
import itertools
def generate_combinations(thing, positive="valid", negative=None):
    """ Generate all possible combinations, walking and mimicking structure of "thing" """
    if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
        if positive in thing:
            return thing[positive] if negative is None else [thing[positive][0]] + thing[negative]
        else:
            results = []
            for key, value in thing.items(): # generate all possible key: value combinations
                subresults = []
                for result in generate_combinations(value, positive, negative):
                    subresults.append((key, result))
                results.append(subresults)
            return [dict(result) for result in itertools.product(*results)]
    elif isinstance(thing, list) or isinstance(thing, tuple): # added tuple just to be safe
        results = []
        for element in thing: # generate recursive result sets for each element of list
            for result in generate_combinations(element, positive, negative):
                results.append(result)
        return results
    else: # not a type we know how to handle
        raise TypeError("Unexpected type")
def generate_invalid_combinations(thing):
    """ Generate all possible combinations and weed out the valid ones """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='negative') if result not in valid]
def generate_boundary_combinations(thing):
    """ Generate all possible boundary combinations """
    return generate_combinations(thing, positive="boundary")
def generate_explore_combinations(thing):
    """ Generate all possible explore combinations and weed out the valid ones """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='explore') if result not in valid]
Calling generate_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'B'}}}}
]
Calling generate_invalid_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'AB'}}}}
]
Calling generate_boundary_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 1, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}},
{'company': {'employee': {'age': 99, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}}
]
Calling generate_explore_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'star'}}}}
]
REVISED SOLUTION (To match revised problem)
import itertools
import random
def generate_combinations(thing, positive="valid", negative=None):
    """ Generate all possible combinations, walking and mimicking structure of "thing" """
    if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
        if positive in thing:
            if negative is None:
                return thing[positive] # here it's OK if it's empty
            elif thing[positive]: # here it's not OK if it's empty
                return [random.choice(thing[positive])] + thing[negative]
            else:
                return []
        else:
            results = []
            for key, value in thing.items(): # generate all possible key: value combinations
                results.append([(key, result) for result in generate_combinations(value, positive, negative)])
            return [dict(result) for result in itertools.product(*results)]
    elif isinstance(thing, (list, tuple)): # added tuple just to be safe (thanks Padraic!)
        # generate recursive result sets for each element of list
        results = [generate_combinations(element, positive, negative) for element in thing]
        return [list(result) for result in itertools.product(*results)]
    else: # not a type we know how to handle
        raise TypeError("Unexpected type")
def generate_boundary_combinations(thing):
    """ Generate all possible boundary combinations """
    valid = generate_combinations(thing)
    return [result for result in generate_combinations(thing, negative='boundary') if result not in valid]
generate_invalid_combinations() and generate_explore_combinations() are the same as before. Subtle differences:
Instead of grabbing the first item out of the valid array in a negative evaluation, it now grabs a random item from the valid array.
Values for items like 'age': [30] come back as lists as that's how they were specified:
'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]}],
If you instead want 'age': 30 like the earlier output examples, then modify the definition accordingly:
'age': {'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]},
The boundary property is now treated like one of the 'negative' values.
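One more difference is visible in the revised code itself: the list branch no longer concatenates the results of its elements but takes their product, which is why values come back wrapped in lists. A small standalone illustration of the two behaviours (the variable names are made up for this sketch):

```python
import itertools

# Choices available for each of two list elements, e.g. the two "grade" entries.
per_element = [["A", "B"], ["A", "B"]]

# Original list branch: concatenate the choices of every element.
flattened = [choice for choices in per_element for choice in choices]

# Revised list branch: one output list per combination of element choices.
combined = [list(combo) for combo in itertools.product(*per_element)]

print(flattened)  # ['A', 'B', 'A', 'B']
print(combined)   # [['A', 'A'], ['A', 'B'], ['B', 'A'], ['B', 'B']]

# If any element has no choices, the product is empty.
with_empty = [list(combo) for combo in itertools.product(["A", "B"], [])]
print(with_empty)  # []
```

The last case matters for the empty [] pools in sample_dict2: an empty result at any level makes the whole product at that level empty.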
Just for reference, I don't plan to generate all the outputs this time: calling generate_combinations(sample_dict2) returns results like:
[
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
...
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}]}}
]
This is an open-ended hornet's nest of a question.
Look at the whitepapers for the tools by Agitar to see if this is what you are thinking about.
Look at Knuth's work on combinatorics. It's a tough read.
Consider just writing a recursive descent generator that uses 'yield'.
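A rough sketch of what such a generator could look like, assuming the same nested layout as sample_dict (dicts of pools at the leaves, wrapped in lists). The name iter_combinations is hypothetical. The outermost iteration is lazy, but the choices of each child are still materialized with list() so they can be fed to itertools.product:

```python
import itertools


def iter_combinations(thing, pool="valid"):
    """Yield every combination drawn from the `pool` lists found in `thing`."""
    if isinstance(thing, dict):
        if pool in thing:  # leaf: a dict of value pools
            yield from thing[pool]
        else:  # branch: combine the choices of every key
            keys = list(thing)
            choices = [list(iter_combinations(thing[k], pool)) for k in keys]
            for combo in itertools.product(*choices):
                yield dict(zip(keys, combo))
    elif isinstance(thing, (list, tuple)):
        for element in thing:  # a list contributes the choices of each element
            yield from iter_combinations(element, pool)
    else:
        raise TypeError("Unexpected type")


# A cut-down spec in the same shape as the sample data.
spec = {"employee": {"name": [{"valid": ["john", "tom"]}],
                     "age": [{"valid": [20, 30]}]}}
for combo in iter_combinations(spec):
    print(combo)
# {'employee': {'name': 'john', 'age': 20}}
# {'employee': {'name': 'john', 'age': 30}}
# {'employee': {'name': 'tom', 'age': 20}}
# {'employee': {'name': 'tom', 'age': 30}}
```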
Ok, so I have a list of dicts:
[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
and I want the 'frequency' of the items within each column. So for this I'd get something like:
{'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1},
'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1},
'age': {53: 1, 13: 1, 27: 1, 22: 1, 108: 1}}
Any modules out there that can do stuff like this?
collections.defaultdict from the standard library to the rescue:
from collections import defaultdict
LofD = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
def counters():
    return defaultdict(int)
def freqs(LofD):
    r = defaultdict(counters)
    for d in LofD:
        for k, v in d.items():
            r[k][v] += 1
    return dict((k, dict(v)) for k, v in r.items())
print(freqs(LofD))
emits
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
as desired (order of keys apart, of course -- it's irrelevant in a dict).
items = [{'name': 'johnny', 'surname': 'smith', 'age': 53}, {'name': 'johnny', 'surname': 'ryan', 'age': 13}, {'name': 'jakob', 'surname': 'smith', 'age': 27}, {'name': 'aaron', 'surname': 'specter', 'age': 22}, {'name': 'max', 'surname': 'headroom', 'age': 108}]
global_dict = {}
for item in items:
    for key, value in item.items():
        if key not in global_dict:  # dict.has_key() was removed in Python 3
            global_dict[key] = {}
        if value not in global_dict[key]:
            global_dict[key][value] = 0
        global_dict[key][value] += 1
print(global_dict)
Simplest solution and actually tested.
New in Python 3.1: The collections.Counter class:
mydict=[{'name': 'johnny', 'surname': 'smith', 'age': 53},
{'name': 'johnny', 'surname': 'ryan', 'age': 13},
{'name': 'jakob', 'surname': 'smith', 'age': 27},
{'name': 'aaron', 'surname': 'specter', 'age': 22},
{'name': 'max', 'surname': 'headroom', 'age': 108},
]
import collections
newdict = {}
for key in mydict[0].keys():
    l = [value[key] for value in mydict]
    newdict[key] = dict(collections.Counter(l))
print(newdict)
outputs:
{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1},
'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1},
'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
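The loop above takes its keys from mydict[0], so it raises a KeyError if a later row lacks one of those keys, and it ignores keys that only appear in later rows. A possible one-pass variant that tolerates such rows is sketched below; the function name column_frequencies and the shortened sample rows are made up for illustration:

```python
from collections import Counter, defaultdict


def column_frequencies(rows):
    """Count how often each value occurs under each key, in a single pass."""
    counts = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            counts[key][value] += 1
    # Convert to plain dicts so the result prints like the desired output.
    return {key: dict(counter) for key, counter in counts.items()}


rows = [
    {'name': 'johnny', 'surname': 'smith', 'age': 53},
    {'name': 'johnny', 'surname': 'ryan'},  # no 'age' key
    {'name': 'jakob', 'surname': 'smith', 'age': 27},
]
print(column_frequencies(rows))
# {'name': {'johnny': 2, 'jakob': 1}, 'surname': {'smith': 2, 'ryan': 1}, 'age': {53: 1, 27: 1}}
```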
This?
from collections import defaultdict
fq = { 'name': defaultdict(int), 'surname': defaultdict(int), 'age': defaultdict(int) }
for row in listOfDicts:  # listOfDicts is the list of dicts from the question
    for field in fq:
        fq[field][row[field]] += 1
print(fq)