I need to group an input dictionary based on two keys and return each group as part of a list of dictionaries. For example:
data = {
'name': ['A', 'C', 'B', 'B'],
'tag': [13, 26, 13, 3],
'id': [234, 235, 236, 237],
'values': [[1, 3, 3], [1, 2, 1], [1, 2, 3], [1, 1, 1]],
}
I can use defaultdict to do the subsetting and to return one key of the dict pretty easily; for example, I can get a list of dicts grouped by data['name'].
Without using pandas (the dataset is too big), how can I group by one or more keys (say, by=['name', 'tag']) and return a list of dicts?
Edit: Expected output can be a list of dicts:
[
{'name': 'A', 'tag': 13, 'id': 234, 'values': [1, 3, 3]},
{'name': 'C', 'tag': 26, 'id': 235, 'values': [1, 2, 1]},
{'name': 'B', 'tag': 13, 'id': 236, 'values': [1, 2, 3]},
{'name': 'B', 'tag': 3, 'id': 237, 'values': [1, 1, 1]}
]
or a dict of dicts:
{
('A', 13): {'id': 234, 'values': [1, 3, 3]},
('C', 26): {'id': 235, 'values': [1, 2, 1]},
('B', 13): {'id': 236, 'values': [1, 2, 3]},
('B', 3): {'id': 237, 'values': [1, 1, 1]}
}
It's actually a lot easier than it might seem:
{(n, t): {'id': i, 'values': vs} for n, t, i, vs in zip(*data.values())}
Once you zip the 4 values together, it's just a matter of iterating over the resulting sequence of tuples, unpacking each tuple, and constructing the desired key/value pair from the unpacked values.
If there is any concern over the order in which the 4 list values will be returned by data.values(), you can be more explicit:
from operator import itemgetter
# g(data) == (data['name'], data['tag'], data['id'], data['values'])
g = itemgetter('name', 'tag', 'id', 'values')
result = {(n, t): {'id': i, 'values': vs} for n, t, i, vs in zip(*g(data))}
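The one-liner above assumes every (name, tag) pair occurs once and that the key columns are fixed. As a rough sketch of the more general case asked about (an arbitrary by=[...] list, possibly several rows per group), something like the following could work. The function name group_rows and its return shape (a dict mapping key tuples to lists of row dicts) are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

def group_rows(data, by):
    """Group a column-oriented dict by the columns named in `by` (hypothetical helper)."""
    columns = list(data)
    others = [c for c in columns if c not in by]
    groups = defaultdict(list)
    # zipping the columns together yields one tuple per row
    for row in zip(*(data[c] for c in columns)):
        record = dict(zip(columns, row))
        key = tuple(record[c] for c in by)
        # keep only the non-key columns in each grouped row
        groups[key].append({c: record[c] for c in others})
    return dict(groups)

data = {
    'name': ['A', 'C', 'B', 'B'],
    'tag': [13, 26, 13, 3],
    'id': [234, 235, 236, 237],
    'values': [[1, 3, 3], [1, 2, 1], [1, 2, 3], [1, 1, 1]],
}
by_name = group_rows(data, by=['name'])
by_name_tag = group_rows(data, by=['name', 'tag'])
```

Grouping by ['name'] alone puts both 'B' rows under the key ('B',), while grouping by ['name', 'tag'] keeps them apart.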
I'm having some trouble accessing a value that is inside an array that contains a dictionary and another array.
It looks like this:
[{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
I want to access the 'count' number for each 'classification'. For example, for 'name' Alex, if 'classification' is 3, then the code returns the 'count' of 383, and so on for the other classifications and names.
Thanks for your help!
Not sure what your question asks, but if it's just a mapping exercise this will get you on the right track.
def get_toys(personDict):
    person_toys = personDict.get('number_of_toys')
    return [(toys.get('classification'), toys.get('count')) for toys in person_toys]

def get_person_toys(database):
    return [(personDict.get('name'), get_toys(personDict)) for personDict in database]
The result is:
[('Alex', [(3, 383), (1, 29), (0, 61)]), ('John', [(3, 8461), (0, 3825), (1, 1319)])]
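If the goal is to look up a single count by name and classification, as the question describes, one option is to build a nested dict once and index into it. This is only a sketch: the names toys_db and counts are made up for the example, and it assumes names are unique within the list.

```python
toys_db = [
    {'name': 'Alex',
     'number_of_toys': [{'classification': 3, 'count': 383},
                        {'classification': 1, 'count': 29},
                        {'classification': 0, 'count': 61}],
     'total_toys': 473},
    {'name': 'John',
     'number_of_toys': [{'classification': 3, 'count': 8461},
                        {'classification': 0, 'count': 3825},
                        {'classification': 1, 'count': 1319}],
     'total_toys': 13605},
]

# counts[name][classification] -> count
counts = {
    person['name']: {t['classification']: t['count']
                     for t in person['number_of_toys']}
    for person in toys_db
}

alex_class_3 = counts['Alex'][3]
john_class_1 = counts['John'][1]
# .get() avoids a KeyError for a classification that is not present
missing = counts['Alex'].get(2)
```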
This isn't as elegant as the previous answer because it doesn't iterate over the values, but if you want to select specific elements, this is one way to do that:
data = [{'name': 'Alex',
'number_of_toys': [{'classification': 3, 'count': 383},
{'classification': 1, 'count': 29},
{'classification': 0, 'count': 61}],
'total_toys': 473},
{'name': 'John',
'number_of_toys': [{'classification': 3, 'count': 8461},
{'classification': 0, 'count': 3825},
{'classification': 1, 'count': 1319}],
'total_toys': 13605}]
import pandas as pd
df = pd.DataFrame(data)
print(df.loc[0, 'name'])
print(df.loc[0, 'number_of_toys'][0]['classification'])
print(df.loc[0, 'number_of_toys'][0]['count'])
which gives:
Alex
3
383
I have a list of dictionaries. The dictionaries have a key called friends whose value is a list of ids. I want to sort the list of dictionaries by the number of ids in the friends list.
users=[{'id': 0, 'name': 'Hero', 'friends': [1, 2]},
{'id': 1, 'name': 'Dunn', 'friends': [0, 2, 3]},
{'id': 2, 'name': 'Sue', 'friends': [0, 1, 3]},
{'id': 3, 'name': 'Chi', 'friends': [1, 2, 4]},
{'id': 4, 'name': 'Thor', 'friends': [3, 5]},
{'id': 5, 'name': 'Clive', 'friends': [4, 6, 7]},
{'id': 6, 'name': 'Hicks', 'friends': [5, 8]},
{'id': 7, 'name': 'Devin', 'friends': [5, 8]},
{'id': 8, 'name': 'Kate', 'friends': [6, 7, 9]},
{'id': 9, 'name': 'Klein', 'friends': [8]}]
How do I proceed?
I believe this is what you meant:
sorted(users, key=lambda d: len(d['friends']))
The list of users is sorted by the number of friends. Users with fewer friends appear first. Because sorted() is stable, users with the same number of friends keep the relative order they had in the original list.
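As a quick check of the ordering, here is the same call applied to the users list from the question, together with a descending variant. Python's sorted() is documented as stable, so ties keep their input order in both directions. The variable names below are chosen for the example only.

```python
users = [{'id': 0, 'name': 'Hero', 'friends': [1, 2]},
         {'id': 1, 'name': 'Dunn', 'friends': [0, 2, 3]},
         {'id': 2, 'name': 'Sue', 'friends': [0, 1, 3]},
         {'id': 3, 'name': 'Chi', 'friends': [1, 2, 4]},
         {'id': 4, 'name': 'Thor', 'friends': [3, 5]},
         {'id': 5, 'name': 'Clive', 'friends': [4, 6, 7]},
         {'id': 6, 'name': 'Hicks', 'friends': [5, 8]},
         {'id': 7, 'name': 'Devin', 'friends': [5, 8]},
         {'id': 8, 'name': 'Kate', 'friends': [6, 7, 9]},
         {'id': 9, 'name': 'Klein', 'friends': [8]}]

def friend_count(user):
    return len(user['friends'])

# fewest friends first; ties stay in their original order
ids_ascending = [u['id'] for u in sorted(users, key=friend_count)]
# most friends first; reverse=True still preserves the order of ties
ids_descending = [u['id'] for u in sorted(users, key=friend_count, reverse=True)]
```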
I have a list of dictionaries in the following way:
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
A second list contains some integers:
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
I now want to join 'some_id' and 'data' such that 'value' is between 'lower_range' and 'upper_range' in a new list. I.e., I want the output to be
list3 = [{'some_id': 1, 'data': 'A'},
{'some_id': 2, 'data': 'B'},
{'some_id': 2, 'data': 'C'},
{'some_id': 3, 'data': 'D'}]
One way to do this would be
list3 = []
for i in list1:
    for j in list2:
        if (j['value'] >= i['lower_range'] and
                j['value'] <= i['upper_range']):
            list3.append({'some_id': i['some_id'], 'data': j['data']})
However, this seems highly inefficient. Is there some faster way?
There is a special premise that the ranges do not overlap.
So for each value we can find the only candidate range by searching for the element with the largest lower_range that does not exceed the value, and then checking its upper_range.
Binary search can reduce the complexity from O(n*n) to O(n log n).
In Python 3, we can use bisect.
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
list3 = []
list1.sort(key=lambda r: r['lower_range'])
lower_ranges = [r['lower_range'] for r in list1]
from bisect import bisect_right
for record in list2:
    # index of the last range whose lower_range is <= the value
    position = bisect_right(lower_ranges, record['value']) - 1
    if position < 0:
        continue
    candidate = list1[position]
    if record['value'] <= candidate['upper_range']:
        list3.append({'some_id': candidate['some_id'], 'data': record['data']})
print(list3)
Output (manually indented):
[{'some_id': 1, 'data': 'A'},
{'some_id': 2, 'data': 'B'},
{'some_id': 2, 'data': 'C'},
{'some_id': 3, 'data': 'D'}]
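The same bisect idea can be wrapped in a function to see how it treats values that fall below every range or into a gap between ranges. This is a sketch under the same non-overlapping assumption; the function name join_by_range and the sample data are invented for illustration.

```python
from bisect import bisect_right

def join_by_range(ranges, records):
    """Match each record to the non-overlapping range containing its value."""
    ordered = sorted(ranges, key=lambda r: r['lower_range'])
    lowers = [r['lower_range'] for r in ordered]
    joined = []
    for rec in records:
        # last range whose lower_range is <= value, or -1 if there is none
        pos = bisect_right(lowers, rec['value']) - 1
        if pos >= 0 and rec['value'] <= ordered[pos]['upper_range']:
            joined.append({'some_id': ordered[pos]['some_id'],
                           'data': rec['data']})
    return joined

ranges = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
          {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
records = [{'value': 2, 'data': 'below'},
           {'value': 7, 'data': 'edge'},
           {'value': 10, 'data': 'gap'},
           {'value': 13, 'data': 'start'}]
result = join_by_range(ranges, records)
```

Only the 'edge' and 'start' records match; the other two are skipped rather than raising an error.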
This is a bit verbose but should be more efficient (O(n log n) instead of O(n^2)): sort both lists, then merge them in a single pass (you can also sort in place with list.sort):
#!/usr/bin/env python
from operator import itemgetter
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
# sort before merging so we iterate less (O(n log n))
list1 = sorted(list1, key=itemgetter('lower_range'))
list2 = sorted(list2, key=itemgetter('value'))
it1 = iter(list1)
it2 = iter(list2)
# define the result up front so it exists even if a list is empty
list3 = []
# merge lists that we know are sorted (simple merging algorithm - O(n))
try:
    curr_range = next(it1)
    curr_val = next(it2)
    while True:
        value = curr_val['value']
        if curr_range['lower_range'] <= value <= curr_range['upper_range']:
            # got a match, add it and check if there are more values
            list3.append({'some_id': curr_range['some_id'],
                          'data': curr_val['data']})
            curr_val = next(it2)
        elif value < curr_range['lower_range']:
            # no match, skip to next value
            curr_val = next(it2)
        else:
            # value is above this range, try the next range
            curr_range = next(it1)
except StopIteration:
    # one of the lists is exhausted, so no further matches are possible
    pass
print(list3)
Gives:
[{'data': 'A', 'some_id': 1},
{'data': 'B', 'some_id': 2},
{'data': 'C', 'some_id': 2},
{'data': 'D', 'some_id': 3}]
You could create a dict that maps values to ids like {3: 1, 4: 1, 5: 1, ..., 8: 2, 9: 2, ...}, which would let you find each dict's id in constant O(1) time:
# create a dict that maps values to ids
value_to_id_dict = {}
for dic in list1:
    id_ = dic['some_id']
    for value in range(dic['lower_range'], dic['upper_range']+1):
        value_to_id_dict[value] = id_
# look up each dict's id in the dict we just created
list3 = []
for dic in list2:
    new_dic = {'data': dic['data'],
               'some_id': value_to_id_dict[dic['value']]}
    list3.append(new_dic)
# result:
# [{'data': 'A', 'some_id': 1},
# {'data': 'B', 'some_id': 2},
# {'data': 'C', 'some_id': 2},
# {'data': 'D', 'some_id': 3}]
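Two caveats with the lookup-table approach: the table holds one entry per integer in every range, and indexing it directly raises KeyError for a value that no range covers. A minimal sketch that skips unmatched values instead, using made-up sample data and variable names:

```python
ranges = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
          {'some_id': 2, 'lower_range': 8, 'upper_range': 12}]
records = [{'value': 4, 'data': 'A'},
           {'value': 20, 'data': 'Z'}]  # 20 is outside every range

# one table entry per integer covered by a range
value_to_id = {v: r['some_id']
               for r in ranges
               for v in range(r['lower_range'], r['upper_range'] + 1)}

# keep only the records whose value appears in the table
matched = [{'some_id': value_to_id[rec['value']], 'data': rec['data']}
           for rec in records
           if rec['value'] in value_to_id]
```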
My current list:
my_list = [
{'id': 1, 'val': [6]},
{'id': 2, 'val': [7]},
{'id': 3, 'val': [8]},
{'id': 2, 'val': [9]},
{'id': 1, 'val': [10]},
]
Desired output:
my_list = [
{'id': 1, 'val': [6, 10]},
{'id': 2, 'val': [7, 9]},
{'id': 3, 'val': [8]},
]
What I tried so far:
my_new_list = []
id_set = set()
for d in my_list:
    if d['id'] not in id_set:
        id_set.add(d['id'])
        temp = {'id': d['id'], 'val': d['val']}
        my_new_list.append(temp)
    else:
        # loop over the new list and find the dict which already have d['id'] and update by appending value
        # but this is not efficient
        pass
Is there a more efficient approach, or maybe some built-in function I'm not aware of?
PS: Order is important!
.setdefault() is your friend:
(We should use collections.OrderedDict to remember the order that keys were first inserted.)
>>> import collections
>>> result = collections.OrderedDict()
>>> for d in my_list:
...     result.setdefault(d["id"], []).extend(d["val"])
>>> lst = []
>>> for k, v in result.items():
...     lst.append({"id": k, "val": v})
Same approach as ozgur, but using collections.defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for dd in my_list:
...     d[dd['id']].extend(dd['val'])
>>> d
defaultdict(<class 'list'>, {1: [6, 10], 2: [7, 9], 3: [8]})
>>> lst = []
>>> for k, v in d.items():
...     lst.append({'id': k, 'val': v})
>>> lst
[{'id': 1, 'val': [6, 10]}, {'id': 2, 'val': [7, 9]}, {'id': 3, 'val': [8]}]
You can sort the original list by 'id' and then use itertools.groupby to group it and accumulate the 'val' for each group. Note that the result is ordered by 'id', not by first appearance:
from itertools import groupby
key_fnc = lambda d: d['id']
result = [
    {'id': k, 'val': sum([d['val'] for d in g], [])}
    for k, g in groupby(sorted(my_list, key=key_fnc), key=key_fnc)
]
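Since the question stresses that order matters, it may be worth noting that from Python 3.7 on a plain dict also preserves insertion order, so the setdefault approach can be sketched without OrderedDict. The sample input below is made up so that first-seen order differs from sorted order.

```python
my_list = [
    {'id': 3, 'val': [8]},
    {'id': 1, 'val': [6]},
    {'id': 3, 'val': [9]},
]

merged = {}
for d in my_list:
    # a fresh list is created per id, so the input dicts are not mutated
    merged.setdefault(d['id'], []).extend(d['val'])

# groups come out in the order each id was first seen (Python 3.7+)
result = [{'id': k, 'val': v} for k, v in merged.items()]
```

Here id 3 stays ahead of id 1 because it appeared first, whereas a sort-based grouping would put id 1 first.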
I have the list of dicts shown below, and I want to merge some of the dicts into one based on certain key/value pairs.
[
{'key': 16, 'value': 3, 'user': 3, 'id': 7},
{'key': 17, 'value': 4, 'user': 3, 'id': 7},
{'key': 17, 'value': 5, 'user': 578, 'id': 7},
{'key': 52, 'value': 1, 'user': 3, 'id': 48},
{'key': 46, 'value': 2, 'user': 578, 'id': 48}
]
As you can see, dicts 1 and 2 have the same values for the user and id keys, so these two dicts can be merged like this:
[
{'key': [16,17], 'value': [3,4], 'user': 3, 'id': 7},
{'key': [17], 'value': [5], 'user': 578, 'id': 7},
{'key': [52], 'value': [1], 'user': 3, 'id': 48},
{'key': [46], 'value': [2], 'user': 578, 'id': 48}
]
That is, the user and id values must be unique together. What would be an efficient way to merge them (if possible)?
The following function converts the list of dictionaries to the new format:
def convert(d):
    res = {}
    for x in d:
        key = (x['user'], x['id'])
        if key in res:
            res[key]['key'].append(x['key'])
            res[key]['value'].append(x['value'])
        else:
            # first dict for this (user, id): wrap its scalars in lists
            x['key'] = [x['key']]
            x['value'] = [x['value']]
            res[key] = x
    return list(res.values())
It mutates the original dictionaries. The ordering of the dictionaries in the result is arbitrary on Python versions before 3.7 (later versions preserve insertion order). When applied to the input it produced the following result:
[
{'id': 7, 'value': [5], 'key': [17], 'user': 578},
{'id': 7, 'value': [3, 4], 'key': [16, 17], 'user': 3},
{'id': 48, 'value': [1], 'key': [52], 'user': 3},
{'id': 48, 'value': [2], 'key': [46], 'user': 578}
]
Let dicts be your original list of dictionaries. This idea maps unique combinations of user and id to plain dict objects (via defaultdict(dict)) whose 'key' and 'value' entries are lists. The final result will be the list of values from that dictionary.
from collections import defaultdict
tmp = defaultdict(dict)
for info in dicts:
    tmp[(info['user'], info['id'])].setdefault('key', []).append(info['key'])
    tmp[(info['user'], info['id'])].setdefault('value', []).append(info['value'])
for (user, id_), d in tmp.items(): # python2: use iteritems
    d.update(dict(user=user, id=id_))
result = list(tmp.values()) # python2: tmp.values() already gives a list
del tmp
You can use the following aggregate function (lst is the original list of dicts). Note that merged groups get tuples, while single-item groups are yielded unchanged:
def aggregate(lst):
    new = {}
    for d in lst:
        new.setdefault((d['user'], d['id']), []).append(d)
    for k, d in new.items():
        if len(d) > 1:
            keys, values = zip(*[(sub['key'], sub['value']) for sub in d])
            user, id_ = k
            yield {'key': keys, 'value': values, 'user': user, 'id': id_}
        else:
            yield d[0]
print(list(aggregate(lst)))
[{'id': 7, 'value': 5, 'key': 17, 'user': 578},
{'id': 7, 'value': (3, 4), 'key': (16, 17), 'user': 3},
{'id': 48, 'value': 1, 'key': 52, 'user': 3},
{'id': 48, 'value': 2, 'key': 46, 'user': 578}]
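For comparison, here is a sketch of a variant that always produces lists for 'key' and 'value' (matching the desired output in the question) and leaves the input dicts untouched. The function name merge_by_user_id is hypothetical, and first-seen group order relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_by_user_id(rows):
    """Merge rows sharing (user, id), collecting 'key' and 'value' into lists."""
    merged = {}
    for row in rows:
        # create the group on first sight; the default dict is discarded otherwise
        group = merged.setdefault(
            (row['user'], row['id']),
            {'key': [], 'value': [], 'user': row['user'], 'id': row['id']},
        )
        group['key'].append(row['key'])
        group['value'].append(row['value'])
    return list(merged.values())

rows = [
    {'key': 16, 'value': 3, 'user': 3, 'id': 7},
    {'key': 17, 'value': 4, 'user': 3, 'id': 7},
    {'key': 17, 'value': 5, 'user': 578, 'id': 7},
    {'key': 52, 'value': 1, 'user': 3, 'id': 48},
    {'key': 46, 'value': 2, 'user': 578, 'id': 48},
]
result = merge_by_user_id(rows)
```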