Python: mimic the effect of groupby with list/dict comprehension

I need to transform a list of dicts:
original = [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
]
into a dict keyed by the value of each item's type key:
expected = {
'a': [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
],
'b': [
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
],
}
This can be achieved with itertools.groupby or by iterating through the list manually, but is there any way to do it with just list/dict comprehension?

You could do something like this:
{t: [i for i in original if i['type'] == t] for t in {i['type'] for i in original}}
But it's both difficult to read and has a worst-case runtime complexity of O(n²), where n is the number of items in the list. Using itertools.groupby on a sorted list is both faster and easier to read.
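As a minimal sketch of the groupby alternative mentioned above (the names by_type and grouped are illustrative, not from the original answer), the list is sorted by the same key that groupby uses, since groupby only merges adjacent items:

```python
from itertools import groupby
from operator import itemgetter

original = [
    {'type': 'a', 'length': 34, 'width': 74},
    {'type': 'a', 'length': 15, 'width': 22},
    {'type': 'b', 'length': 53, 'width': 54},
    {'type': 'b', 'length': 11, 'width': 45},
]

# Hypothetical helper name: extracts the grouping key from each item.
by_type = itemgetter('type')

# groupby only groups consecutive items with equal keys,
# so the input is sorted with the same key function first.
grouped = {
    key: list(items)
    for key, items in groupby(sorted(original, key=by_type), key=by_type)
}

print(grouped)
```

The sort dominates the cost at O(n log n), and sorted is stable, so items keep their original relative order within each group.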

Related

How to remove empty key-value from dictionary comprehension when applying filter

I am new to Python and learning how to use dictionary comprehensions. I have a list of movie cast dictionaries that I would like to filter on a specific value using a dictionary comprehension. I was able to get it to work, but for some reason I also get empty dictionaries added when the condition is not met. Why does this happen, and how can I ensure they are not included?
movie_cast = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1},
{'id': 41091, 'name': 'Kristen Wiig', 'cast_id': 12,'order': 2},
{'id': 41092, 'name': 'Pedro Pascal', 'cast_id': 13, 'order': 3},
{'id': 32, 'name': 'Robin Wright', 'cast_id': 78, 'order': 4}]
limit = 2
cast_limit = []
for dict in movie_cast:
    d = {key:value for (key,value) in dict.items() if dict['order'] < limit}
    cast_limit.append(d)
print(cast_limit)
current_result = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1},{},{},{}]
desired_result = [{'id': 90633,'name': 'Gal Gadot','cast_id': 0, 'order': 0},
{'id': 62064, 'name': 'Chris Pine','cast_id': 15, 'order': 1}]
Try with this (you need a list comprehension, not a dict comprehension):
cast_limit = [dct for dct in movie_cast if dct['order'] < limit]
I.e., you need to filter out elements of the list, not elements of a dict.
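To see why the empty dictionaries appear, here is a small sketch comparing the two placements of the condition. The shortened cast list, the value limit = 2, and the names per_item and filtered are assumptions made for illustration:

```python
movie_cast = [
    {'id': 90633, 'name': 'Gal Gadot', 'cast_id': 0, 'order': 0},
    {'id': 62064, 'name': 'Chris Pine', 'cast_id': 15, 'order': 1},
    {'id': 41091, 'name': 'Kristen Wiig', 'cast_id': 12, 'order': 2},
]
limit = 2  # assumed value for this illustration

# Condition inside the dict comprehension: it is checked once per key/value
# pair, so a failing dict loses all of its pairs and becomes {}.
# The outer list still gets one entry per cast member.
per_item = [
    {key: value for key, value in member.items() if member['order'] < limit}
    for member in movie_cast
]

# Condition in the list comprehension: the whole dict is kept or dropped.
filtered = [member for member in movie_cast if member['order'] < limit]

print(per_item[-1])   # the third member fails the condition
print(len(filtered))
```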

How do I sort a dictionary by a 'subkey' in Python

Could I get some guidance on how to do this?
Variable:
people = {'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}
My intended output after sorting the people variable by the subkey 'distance':
people = {'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}
Most answers provide a way to create a new dictionary based on the contents of the old one. If you simply want to reorder the keys of the existing dictionary, you can do something similar:
for k in sorted(people, key=lambda x: people[x]['distance']):
    people[k] = people.pop(k)
When a key is removed, it is also removed from the iteration order. Adding it back makes it the last key in the iteration order. Repeat this for every key, and you redefine the iteration order of the keys. This works because sorted completes its iteration over the dict before the for loop starts modifying it.
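One way to check that this really reorders the existing object is to hold a second reference to it. The sketch below assumes Python 3.7 or later, where dicts preserve insertion order, and the name alias is only there for illustration:

```python
people = {
    'adam': {'distance': 14, 'age': 22, 'height': 1.3},
    'charles': {'distance': 3, 'age': 37, 'height': 1.4},
    'jeff': {'distance': 46, 'age': 42, 'height': 1.6},
}
alias = people  # second reference to the same dict object

# sorted() builds its full list of keys before the loop body runs,
# so popping and re-inserting keys inside the loop is safe here.
for name in sorted(people, key=lambda n: people[n]['distance']):
    people[name] = people.pop(name)

print(list(alias))       # the alias sees the new key order
print(alias is people)   # still the same object, no new dict was created
```

Rebinding people to a new dict, by contrast, would leave alias pointing at the old, unsorted object.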
Just use sorted():
people = dict(sorted(people.items(), key=lambda x: x[1]['distance']))
or
people = {k: v for k, v in sorted(people.items(), key=lambda x: x[1]['distance'])}
Try the following code:
people = {'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}
people = dict(sorted(people.items(), key=lambda item: item[1]['distance'], reverse=False))
print(people)
Output:
{'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}

Count duplicates in dictionary by specific keys

I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If I specify 'age' and 'country', it should return:
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'height': 185,
'count': 1
}
]
Maybe there is a way to implement this with Counter?
You can use itertools.groupby with a sorted list:
>>> data = [
...     {'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
...     {'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
...     {'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
...     {'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
...     {'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
... ]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
...     {**dict(zip(key, k)), 'count': len([*g])}
...     for k, g in
...     groupby(sorted(data, key=list_sorter), grouper)
... ]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
...     {**dict(zip(key, k)), 'count': len([*g])}
...     for k, g in
...     groupby(sorted(data, key=list_sorter), grouper)
... ]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas, you can chain pandas.DataFrame.groupby, DataFrameGroupBy.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
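Since the question asks about Counter, here is a minimal sketch of that route. The function name count_by is hypothetical, not part of any library. It counts tuples of the selected values and then rebuilds dictionaries from them. On Python 3.7 or later, Counter keeps keys in first-seen order, so no sorting is needed:

```python
from collections import Counter

data = [
    {'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
    {'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
    {'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
    {'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
    {'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185},
]

def count_by(records, keys):
    """Count records that share the same values for the given keys."""
    # Tuples are hashable, so they can serve as Counter keys.
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    # Turn each (values, count) pair back into a dict with a 'count' field.
    return [
        {**dict(zip(keys, values)), 'count': n}
        for values, n in counts.items()
    ]

print(count_by(data, ('age', 'country')))
print(count_by(data, ('name', 'height')))
```

Unlike the groupby version, the results come out in the order the combinations first appear in the input rather than in sorted order.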

Sorting a list of dictionaries: what is the default behaviour (without key parameter)?

I'm trying to sort a list of dicts using sorted:
>>> help(sorted)
Help on built-in function sorted in module __builtin__:
sorted(...)
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
I just passed the list to sorted, and it sorts according to id.
>>> l = [{'id': 4, 'quantity': 40}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}, {'id': -1, 'quantity': -10}]
>>> sorted(l) # sorts by id
[{'id': -1, 'quantity': -10}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 4, 'quantity': 40}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}]
>>> l.sort()
>>> l # sorts by id
[{'id': -1, 'quantity': -10}, {'id': 1, 'quantity': 10}, {'id': 2, 'quantity': 20}, {'id': 3, 'quantity': 30}, {'id': 4, 'quantity': 40}, {'id': 6, 'quantity': 60}, {'id': 7, 'quantity': -30}]
Many examples of sorted say it requires a key to sort a list of dicts, but I didn't give any key. Why didn't it sort according to quantity? How did it choose to sort by id?
I tried another example with name and age:
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 30,'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 5, 'name': 'sita'}]
>>> sorted(a) # sorts by age
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name':'sita'}, {'age': 15, 'name': 'rita'}, {'age': 30, 'name': 'ram'}]
>>> a.sort() # sorts by age
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name':'sita'}, {'age': 15, 'name': 'rita'}, {'age': 30, 'name': 'ram'}]
Here it sorts according to age but not name. What am I missing about the default behavior of these methods?
From some old Python docs:
Mappings (dictionaries) compare equal if and only if their sorted (key, value) lists compare equal. Outcomes other than equality are resolved consistently, but are not otherwise defined.
Earlier versions of Python used lexicographic comparison of the sorted (key, value) lists, but this was very expensive for the common case of comparing for equality. An even earlier version of Python compared dictionaries by identity only, but this caused surprises because people expected to be able to test a dictionary for emptiness by comparing it to {}.
Ignore the default behaviour and just provide a key.
By default, Python 2 compares dictionaries on the first difference it finds. Relying on this when sorting dictionaries is quite dangerous (consistent yet undefined). Python 3 removed ordering comparisons between dicts, so sorted without a key raises TypeError there.
Pass a function to the key= parameter that takes a value from the list (in this case a dictionary) and returns the value to sort against.
>>> a
[{'age': 1, 'name': 'john'}, {'age': 3, 'name': 'shyam'}, {'age': 30,'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 5, 'name': 'sita'}]
>>> sorted(a, key=lambda d : d['name']) # sorts by name
[{'age': 1, 'name': 'john'}, {'age': 30, 'name': 'ram'}, {'age': 15, 'name': 'rita'}, {'age': 3, 'name': 'shyam'}, {'age': 5, 'name': 'sita'}]
See https://wiki.python.org/moin/HowTo/Sorting
The key parameter is quite powerful as it can cope with all sorts of data to be sorted, although maybe not very intuitive.
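The help output in the question (__builtin__, cmp=) indicates Python 2. As a hedged sketch of how the same data behaves on Python 3, the example below uses operator.itemgetter as the key function. The names by_name, by_age_desc and raised are illustrative:

```python
from operator import itemgetter

a = [
    {'age': 1, 'name': 'john'},
    {'age': 3, 'name': 'shyam'},
    {'age': 30, 'name': 'ram'},
    {'age': 15, 'name': 'rita'},
    {'age': 5, 'name': 'sita'},
]

# An explicit key makes the ordering unambiguous.
by_name = sorted(a, key=itemgetter('name'))
by_age_desc = sorted(a, key=itemgetter('age'), reverse=True)

# On Python 3 dicts do not support '<', so sorting without a key fails.
raised = False
try:
    sorted(a)
except TypeError:
    raised = True

print([d['name'] for d in by_name])
print([d['age'] for d in by_age_desc])
print(raised)
```

itemgetter('name') is equivalent to lambda d: d['name'] here. Passing several field names to itemgetter returns a tuple, which gives a multi-field sort.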

Python dictionaries, find similarities

I have a python dictionary with a thousand items. Each item is, itself, a dictionary. I'm looking for a clean and elegant way to parse through each item, and find & create templates.
Here's a simplified example of the individual dictionaries' structure:
{'id': 1,
'template': None,
'height': 80,
'width': 120,
'length': 75,
'weight': 100}
From this, I want to make a single pass and, if 500 of the 1000 items share the same height and width, detect that so I can build a template from that data and assign the template id to 'template'. I could build a gigantic reference hash, but I'm hoping there's a cleaner, more elegant way to accomplish this.
The actual data includes closer to 30 keys, of which a small subset need to be excluded from the template checking.
Given a dict of dicts named items:
import itertools as it
for (height, width), itemIter in it.groupby(items.values(), lambda x: (x['height'], x['width'])):
    group = list(itemIter)  # all items with dimensions (height, width)
@eumiro had an excellent core idea, namely using itertools.groupby() to arrange the items with common values together in batches. However, besides neglecting to sort things first using the same key function, as @Jochen Ritzel pointed out (and as the documentation also mentions), he didn't address several other things you mentioned wanting to do.
Below is a more complete and somewhat longer answer. It determines the templates and assigns them in one pass through the dict-of-dicts. To do this, after first creating a sorted list of items, it uses groupby() to batch them and, if there are enough in a group, creates a template and assigns its ID to each member.
inventory = {
'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}
}
import itertools as itools
def print_inventory():
    print 'inventory:'
    for key in sorted(inventory.iterkeys()):
        print ' {}: {}'.format(key, inventory[key])
print "-- BEFORE --"
print_inventory()
THRESHOLD = 2
ALLKEYS = ['template', 'height', 'width', 'length', 'weight']
EXCLUDEDKEYS = ['template', 'length', 'weight']
INCLUDEDKEYS = [key for key in ALLKEYS if key not in EXCLUDEDKEYS]
# determines which keys make up a template
sortby = lambda item, keys=INCLUDEDKEYS: tuple(item[key] for key in keys)
templates = {}
templateID = 0
sortedinventory = sorted(inventory.itervalues(), key=sortby)
for templatetuple, similariter in itools.groupby(sortedinventory, sortby):
    similaritems = list(similariter)
    if len(similaritems) >= THRESHOLD:
        # create and assign a template
        templateID += 1
        templates[templateID] = templatetuple # tuple of values of INCLUDEDKEYS
        for item in similaritems:
            item['template'] = templateID
print
print "-- AFTER --"
print_inventory()
print
print 'templates:', templates
print
When I run it, the following is the output:
-- BEFORE --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': None, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': None, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': None, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': None, 'id': 5}
-- AFTER --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': 1, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': 2, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': 1, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': 2, 'id': 5}
templates: {1: (30, 40), 2: (80, 100)}
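The listing above is Python 2 (print statements, iterkeys, itervalues). A condensed Python 3 sketch of the same sort-then-groupby approach might look like the following. The names template_key, INCLUDED_KEYS and next_id are renamed or introduced for illustration, and printing is reduced to the final results:

```python
from itertools import groupby

inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
}

THRESHOLD = 2                        # minimum group size that earns a template
INCLUDED_KEYS = ('height', 'width')  # keys that define a template

def template_key(item):
    """Tuple of the values that must match for items to share a template."""
    return tuple(item[key] for key in INCLUDED_KEYS)

templates = {}
next_id = 0
# Sort and group with the same key so equal items end up adjacent.
ordered = sorted(inventory.values(), key=template_key)
for values, group in groupby(ordered, key=template_key):
    members = list(group)
    if len(members) >= THRESHOLD:
        next_id += 1
        templates[next_id] = values
        for item in members:
            # Items are the same dict objects stored in inventory,
            # so this assignment updates inventory in place.
            item['template'] = next_id

print(templates)
print({name: item['template'] for name, item in inventory.items()})
```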
