I have a problem, I have a list like this:
[{'id': 34, 'questionid': 5, 'text': 'yes', 'score': 1}, {'id': 10, 'questionid': 5,
'text': 'test answer updated', 'score': 2}, {'id': 20, 'questionid': 5, 'text': 'no',
'score': 0}, {'id': 35, 'questionid': 5, 'text': 'yes', 'score': 1}]
and I want remove duplicate "questionid", "text" and "score", for example in this case I want output like this:
[{'id': 34, 'questionid': 5, 'text': 'yes', 'score': 1}, {'id': 10, 'questionid': 5,
'text': 'test answer updated', 'score': 2}, {'id': 20, 'questionid': 5, 'text': 'no',
'score': 0}]
How can I get this output in python?
We could create dictionary that has "questionid", "text" and "score" tuple as key and dicts as values and use this dictionary to check for duplicate values in data:
from operator import itemgetter
out = {}
for d in data:
key = itemgetter("questionid", "text", "score")(d)
if key not in out:
out[key] = d
out = list(out.values())
Output:
[{'id': 34, 'questionid': 5, 'text': 'yes', 'score': 1},
{'id': 10, 'questionid': 5, 'text': 'test answer updated', 'score': 2},
{'id': 20, 'questionid': 5, 'text': 'no', 'score': 0}]
I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If will specify 'age' and 'country' it should return
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I will specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'heigth': 185,
'count': 1
}
]
Maybe there is a way to implement this by Counter?
You can use itertools.groupby with sorted list:
>>> data = [
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas then you can use, pandas.DataFrame.groupby, pandas.groupby.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
{'alarms': [{'date': '20170925T235525-0700',
'id': 8,
'ip': '172.26.70.4',
'severity': 4,
'type': 45},
{'date': '20170925T235525-0700',
'id': 7,
'ip': '172.26.70.4',
'severity': 4,
'type': 45},
{'date': '20170925T235525-0700',
'id': 6,
'ip': '172.26.70.4',
'severity': 4,
'type': 45},
{'date': '20170925T220858-0700',
'id': 5,
'ip': '172.26.70.4',
'severity': 6,
'type': 44},
{'date': '20170925T220857-0700',
'id': 4,
'ip': '172.26.70.4',
'severity': 6,
'type': 44},
{'date': '20170925T220857-0700',
'id': 3,
'ip': '172.26.70.4',
'severity': 6,
'type': 44},
{'date': '20170925T220856-0700',
'id': 2,
'severity': 6,
'type': 32},
{'date': '20170925T220850-0700', 'id': 1, 'severity': 6, 'type': 1},
{'date': '20170925T220850-0700',
'id': 0,
'severity': 6,
'type': 33}]}
Need to fetch first key value pair (i.e., 'type': 45)
Kindly guide, I am trying it on Python 2.7.
Your data is a dictionary where the `"alarms" key is associated with a list of dictionaries.
That dictionary is in the list associated with the "alarms" key. So you can fetch it with:
data['alarms'][0]
with data the variable that stores this structure. So:
>>> data['alarms'][0]
{'date': '20170925T235525-0700', 'severity': 4, 'id': 8, 'ip': '172.26.70.4', 'type': 45}
you will need to do :
def return_correct_dict(data):
for d in data['alarms']:
if d.get('type',"") == 45:
return d
You have a dictionary of a list of dictionaries.
Suppose your dictionary is stored in a variable named dict.
dict['alarm'][0]['type'] will give you the value 45.
I m trying to sort a list of dict using sorted
>>> help(sorted)
Help on built-in function sorted in module __builtin__:
sorted(...)
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
I have just given list to sorted and it sorts according to id.
>>>l = [{'id': 4, 'quantity': 40}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}]
>>> sorted(l) # sorts by id
[{'id': -1, 'quantity': -10}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 4, 'quantity': 40}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}]
>>> l.sort()
>>> l # sorts by id
[{'id': -1, 'quantity': -10}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 4, 'quantity': 40}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}]
Many example of sorted says it requires key to sort the list of dict. But I didn't give any key. Why it didn't sort according to quantity? How did it choose to sort with id?
I tried another example with name & age,
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 30,'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 5, 'name': 'sita'}]
>>> sorted(a) # sorts by age
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name':'sita'}, {'age': 15, 'name': 'rita'}, {'age': 30, 'name': 'ram'}]
>>> a.sort() # sorts by age
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name':'sita'}, {'age': 15, 'name': 'rita'}, {'age': 30, 'name': 'ram'}]
Here it sorts according to age but not name. What am I missing in default behavior of these method?
From some old Python docs:
Mappings (dictionaries) compare equal if and only if their sorted (key, value) lists compare equal. Outcomes other than equality are resolved consistently, but are not otherwise defined.
Earlier versions of Python used lexicographic comparison of the sorted (key, value) lists, but this was very expensive for the common case of comparing for equality. An even earlier version of Python compared dictionaries by identity only, but this caused surprises because people expected to be able to test a dictionary for emptiness by comparing it to {}.
Ignore the default behaviour and just provide a key.
By default it will compare against the first difference it finds. If you are sorting dictionaries this is quite dangerous (consistent yet undefined).
Pass a function to key= parameter that takes a value from the list (in this case a dictionary) and returns the value to sort against.
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 30,'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 5, 'name': 'sita'}]
>>> sorted(a, key=lambda d : d['name']) # sorts by name
[{'age': 1, 'name': 'john'}, {'age': 30, 'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name': 'sita'}]
See https://wiki.python.org/moin/HowTo/Sorting
The key parameter is quite powerful as it can cope with all sorts of data to be sorted, although maybe not very intuitive.
My sample dict is:
sample_dict = {
'company': {
'employee': {
'name': [
{'explore': ["noname"],
'valid': ["john","tom"],
'boundary': ["aaaaaaaaaa"],
'negative': ["$"]}],
'age': [
{'explore': [200],
'valid': [20,30],
'boundary': [1,99],
'negative': [-1,100]}],
'others':{
'grade':[
{'explore': ["star"],
'valid': ["A","B"],
'boundary': ["C"],
'negative': ["AB"]}]}
}
}}
Its a "follow-on" question to-> Split python dictionary to result in all combinations of values
I would like to get a segregated list of combinations like below
Valid combinations:[generate only out of valid list of data]
COMPLETE OUTPUT for VALID CATEGORY :
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 20}, 'name': 'tom', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'john', 'others': {'grade': 'B'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': 'tom', 'others': {'grade': 'B'}}}
Negative combinations : [Here its bit tricky because, negative combinations should be combined with "valid" pool as well with atleast only value being negative]
Complete output expected for NEGATIVE category :
=>[Basically, excluding combinations where all values are valid - ensuring atleast one value in the combination is from negative group]
{'company': {'employee': {'age': 20}, 'name': 'john', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': 'tom', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 100}, 'name': 'john', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'A'}}}
{'company': {'employee': {'age': 30}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': -1}, 'name': '$', 'others': {'grade': 'AB'}}}
{'company': {'employee': {'age': 100}, 'name': '$', 'others': {'grade': 'AB'}}}
In the above output, in the first line, grade is tested for negative value AB by keeping remaining all valid. So its not necessary to generate the same with age as 30 as the intent is to test only negative set. We can supply the remaining parameters with any valid data.
Boundary Combinations is similar to valid -> Combinations for all values within the boundary pool only
Explore : Similar to negative - Mix with valid pool and always atleast one explore value in all combinations.
Sample dict - revised version
sample_dict2 = {
'company': {
'employee_list': [
{'employee': {'age': [{'boundary': [1,99],
'explore': [200],
'negative': [-1,100],
'valid': [20, 30]}],
'name': [{'boundary': ['aaaaaaaaaa'],
'explore': ['noname'],
'negative': ['$'],
'valid': ['john','tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']},
{'boundary': ['C'],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A','B']}]}}},
{'employee': {'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [],
'valid': [20, 30]}],
'name': [{'boundary': [],
'explore': [],
'negative': ['$'],
'valid': ['john', 'tom']}],
'others': {
'grade': [
{'boundary': ['C'],
'explore': ['star'],
'negative': [],
'valid': ['A', 'B']},
{'boundary': [],
'explore': ['star'],
'negative': ['AB'],
'valid': ['A', 'B']}]}}}
]
}
}
The sample_dict2 contains list of dicts. Here "employee" the whole hierarchy is a list element and also leaf node "grade" is a list
Also, except "valid" and "boundary" other data set can be empty - [] and we need to handle them as well.
VALID COMBINATIONS will be like
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','A']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'john', 'others': {'grade': ['A','B']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','A']}}]}}
{'company': {'employee_list':[{'employee': {'age': 20}, 'name': 'john', 'others': {'grade': ['A','B']}},{'employee': {'age': 1}, 'name': 'tom', 'others': {'grade': ['A','B']}}]}}
plus combinations of age=30 and name =tom in employee index 0
import itertools
def generate_combinations(thing, positive="valid", negative=None):
""" Generate all possible combinations, walking and mimicking structure of "thing" """
if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
if positive in thing:
return thing[positive] if negative is None else [thing[positive][0]] + thing[negative]
else:
results = []
for key, value in thing.items(): # generate all possible key: value combinations
subresults = []
for result in generate_combinations(value, positive, negative):
subresults.append((key, result))
results.append(subresults)
return [dict(result) for result in itertools.product(*results)]
elif isinstance(thing, list) or isinstance(thing, tuple): # added tuple just to be safe
results = []
for element in thing: # generate recursive result sets for each element of list
for result in generate_combinations(element, positive, negative):
results.append(result)
return results
else: # not a type we know how to handle
raise TypeError("Unexpected type")
def generate_invalid_combinations(thing):
""" Generate all possible combinations and weed out the valid ones """
valid = generate_combinations(thing)
return [result for result in generate_combinations(thing, negative='negative') if result not in valid]
def generate_boundary_combinations(thing):
""" Generate all possible boundary combinations """
return generate_combinations(thing, positive="boundary")
def generate_explore_combinations(thing):
""" Generate all possible explore combinations and weed out the valid ones """
valid = generate_combinations(thing)
return [result for result in generate_combinations(thing, negative='explore') if result not in valid]
Calling generate_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'tom', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'john', 'others': {'grade': 'B'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 30, 'name': 'tom', 'others': {'grade': 'B'}}}}
]
Calling generate_invalid_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': -1, 'name': '$', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': 'john', 'others': {'grade': 'AB'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 100, 'name': '$', 'others': {'grade': 'AB'}}}}
]
Calling generate_boundary_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 1, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}},
{'company': {'employee': {'age': 99, 'name': 'aaaaaaaaaa', 'others': {'grade': 'C'}}}}
]
Calling generate_explore_combinations(sample_dict) returns:
[
{'company': {'employee': {'age': 20, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 20, 'name': 'noname', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'john', 'others': {'grade': 'star'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'A'}}}},
{'company': {'employee': {'age': 200, 'name': 'noname', 'others': {'grade': 'star'}}}}
]
REVISED SOLUTION (To match revised problem)
import itertools
import random
def generate_combinations(thing, positive="valid", negative=None):
""" Generate all possible combinations, walking and mimicking structure of "thing" """
if isinstance(thing, dict): # if dictionary, distinguish between two types of dictionary
if positive in thing:
if negative is None:
return thing[positive] # here it's OK if it's empty
elif thing[positive]: # here it's not OK if it's empty
return [random.choice(thing[positive])] + thing[negative]
else:
return []
else:
results = []
for key, value in thing.items(): # generate all possible key: value combinations
results.append([(key, result) for result in generate_combinations(value, positive, negative)])
return [dict(result) for result in itertools.product(*results)]
elif isinstance(thing, (list, tuple)): # added tuple just to be safe (thanks Padraic!)
# generate recursive result sets for each element of list
results = [generate_combinations(element, positive, negative) for element in thing]
return [list(result) for result in itertools.product(*results)]
else: # not a type we know how to handle
raise TypeError("Unexpected type")
def generate_boundary_combinations(thing):
""" Generate all possible boundary combinations """
valid = generate_combinations(thing)
return [result for result in generate_combinations(thing, negative='boundary') if result not in valid]
generate_invalid_combinations() and generate_explore_combinations() are the same as before. Subtle differences:
Instead of grabbing the first item out of the valid array in a negative evaluation, it now grabs a random item from the valid array.
Values for items like 'age': [30] come back as lists as that's how they were specified:
'age': [{'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]}],
If you instead want 'age': 30 like the earlier output examples, then modify the definition accordingly:
'age': {'boundary': [1, 99],
'explore': [200],
'negative': [-1, 100],
'valid': [20, 30]},
The boundary property is now treated like one of the 'negative' values.
Just for reference, I don't plan to generate all the outputs this time: calling generate_combinations(sample_dict2) returns results like:
[
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['john'], 'others': {'grade': ['A', 'A']}, 'age': [20]}}, {'employee': {'name': ['john'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
...
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['A', 'B']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'A']}, 'age': [30]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [20]}}]}},
{'company': {'employee_list': [{'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}, {'employee': {'name': ['tom'], 'others': {'grade': ['B', 'B']}, 'age': [30]}}]}}
]
This is an open-ended hornet's nest of a question.
Look at the whitepapers for Agitar other tools by Agitar to see if this what you are thinking about.
Look at Knuth's work on combinationals. It's a tough read.
Consider just writing a recursive descent generator that uses 'yield '.