groupby iterator not adding to list in dictionary comprehension - python

I have a db query that returns a list. I then do a dictionary comprehension like so:
results = {product: [g for g in group] for product, group in groupby(db_results, lambda x: x.product_id)}
The problem is that the value of the dictionary is only returning 1 value. I assume this is due to the fact that the group is an iterator.
The following returns each item of the group, so I know that they are there:
groups = groupby(db_results, lambda x: x.product_id)
for k, g in groups:
    if k == 1001:
        print(list(g))
I am trying to get all the values of g in the above in a list whose key is the key of dictionary.
I've tried many variations like:
blah = dict((k,list(v)) for k,v in groupby(db_results, key=lambda x: x.product_id))
but I can't get it right.

If you insist on using groupby, you need to make sure that the input is sorted by the same key that you group on. However, I would suggest using defaultdict instead:
from collections import defaultdict
blah = defaultdict(list)
for item in db_results:
    blah[item.product_id].append(item)
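To see why the comprehension ends up with only one value per key, here is a minimal sketch. The Row type and the sample rows are made up for illustration and are not the asker's real data. groupby starts a new group every time the key changes, so with unsorted input the same product_id can show up in several groups, and in a dict comprehension the last group silently replaces the earlier ones.

```python
from collections import defaultdict, namedtuple
from itertools import groupby

# Hypothetical stand-in for the rows returned by the db query.
Row = namedtuple("Row", ["product_id", "name"])
db_results = [Row(1001, "a"), Row(1002, "b"), Row(1001, "c")]


def by_product(row):
    return row.product_id


# Unsorted input: 1001 appears in two separate groups, and the second
# group overwrites the first one in the resulting dictionary.
unsorted_groups = {k: list(g) for k, g in groupby(db_results, by_product)}

# Sorting by the same key first keeps each product's rows together.
sorted_groups = {
    k: list(g)
    for k, g in groupby(sorted(db_results, key=by_product), by_product)
}

# defaultdict needs no sorting at all.
grouped = defaultdict(list)
for row in db_results:
    grouped[row.product_id].append(row)
```

With this sample data, unsorted_groups keeps only the last row for 1001, while sorted_groups and grouped both keep the two rows.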

Related

Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts and adds each dict with an equal ObjectID to a list under that key.
The code below works, but I am trying to convert it to a dictionary comprehension:
# group together subblocks if they have equal ObjectID
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)
Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
from collections import defaultdict

output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.
If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat and less efficient than the approaches above or the loop (O(n log n) instead of O(n)):
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
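As a rough illustration of the "adjacent is enough" point, the sketch below uses made-up rows (the "name" field is hypothetical) and compares three cases: equal IDs next to each other but not in sorted order, equal IDs scattered, and the scattered rows sorted first.

```python
from itertools import groupby
from operator import itemgetter

key = itemgetter("OBJECTID")

# Hypothetical rows: equal IDs are adjacent, but the IDs are not in sorted order.
adjacent = [
    {"OBJECTID": 3, "name": "a"},
    {"OBJECTID": 3, "name": "b"},
    {"OBJECTID": 1, "name": "c"},
]
grouped_adjacent = {k: list(g) for k, g in groupby(adjacent, key=key)}

# Same rows with the two ID-3 rows separated: the later group replaces
# the earlier one, so one row is lost.
scattered = [adjacent[0], adjacent[2], adjacent[1]]
grouped_scattered = {k: list(g) for k, g in groupby(scattered, key=key)}

# Sorting by the grouping key first restores the full groups.
grouped_sorted = {
    k: list(g) for k, g in groupby(sorted(scattered, key=key), key=key)
}
```

Only the scattered case loses a row; the other two produce the same groups.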
You can add an else block to save a little time and slightly improve performance:
output = {}
subblkDBF: list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)

sorting a list by names in python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']
This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(list(output.values()))
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
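For reference, here is the setdefault approach run against the list from the question, as a small self-contained sketch. The variable names filenames, suffix and groups are my own. The order of the resulting lists relies on dictionaries preserving insertion order, which is only guaranteed from Python 3.7 on.

```python
filenames = [
    '1_result1.txt',
    '2_result2.txt',
    '3_result2.txt',
    '4_result3.txt',
    '5_result4.txt',
    '6_result1.txt',
    '7_result2.txt',
    '8_result3.txt',
]

output = {}
for name in filenames:
    # Everything after the underscore is used as the grouping key.
    suffix = name.split("_")[1]
    # setdefault returns the existing list, or stores and returns a new one.
    output.setdefault(suffix, []).append(name)

# One list per distinct suffix, in order of first appearance.
groups = list(output.values())
```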
Using defaultdict from the collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(list(output.values()))
Using groupby from the itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list, then you could use something like:
l.sort(key=lambda s: s.split('_')[1])
More information about key functions can be found in the Python documentation on sorting.
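A quick way to check the "called exactly once per record" claim is to record the calls. This is only a sketch: the suffix helper and the calls list are hypothetical names, and the sample names are a subset of the list in the question.

```python
calls = []


def suffix(name):
    # Record every invocation so the number of calls can be inspected.
    calls.append(name)
    return name.split("_")[1]


names = ["1_result1.txt", "2_result2.txt", "3_result2.txt", "6_result1.txt"]

# sorted() is stable, so names with the same suffix keep their input order.
ordered = sorted(names, key=suffix)
```

After sorting, calls holds one entry per name, and names with the same ending sit next to each other, which is what groupby needs.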

Sort keys in dictionary by value in a list in Python

I have seen this post and this post as well as many others, but haven't quite found the answer to my question nor can I figure it out.
I have a dictionary of lists. For example it looks like:
Dict = {'a':[1,2,3,4], 'b':[9,8,7,6], 'c':[8,5,3,2]}
I want to return a list of the keys sorted (descending/reverse) based on a specific item in the lists. For example, I want to sort a,b,c based on the 4th item in each list.
This should return the list sorted_keys = ['b','a','c'] which were sorted by values [6,4,2].
Make sense? Please help...thanks!
Supply a key function, a lambda is easiest, and sort reversed:
sorted(Dict.keys(), key=lambda k: Dict[k][3], reverse=True)
The key function tells sorted what to sort by: the 4th item in the value for the given key.
Demo:
>>> sorted(Dict.keys(), key=lambda k: Dict[k][3], reverse=True)
['b', 'a', 'c']
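If you need this for different positions in the lists, the same idea can be wrapped in a small helper. This is just a sketch; keys_sorted_by_item is a made-up name, not something from the standard library.

```python
def keys_sorted_by_item(mapping, index, reverse=True):
    """Return the keys of `mapping`, ordered by element `index` of each value."""
    # Iterating a dict yields its keys, so .keys() is not needed here.
    return sorted(mapping, key=lambda k: mapping[k][index], reverse=reverse)


Dict = {'a': [1, 2, 3, 4], 'b': [9, 8, 7, 6], 'c': [8, 5, 3, 2]}

# Sort by the 4th item of each list (index 3), descending.
sorted_keys = keys_sorted_by_item(Dict, 3)
```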

sort dictionary of objects

I have a dictionary of objects where the key is a simple string, and the value is a data object with a few attributes. I'd like to sort my dictionary based on an attribute in the values of the dictionary. I have used this to sort based on the dictionary's values:
sorted = dict.values()
sorted.sort(key = operator.attrgetter('total'), reverse=True)
This yields a sorted list of values (which is expected) and I lose my original keys from the dictionary (naturally). I would like to sort both the keys and values together... how can I achieve this? Any help would be greatly appreciated.
Use .items() (or its iterator version iteritems) instead of .values() to get a list of (key, value) tuples.
items = sorted(dct.iteritems(), key=lambda x: x[1].total, reverse=True)
You'll want to use .items() rather than .values(), for example:
def keyFromItem(func):
    return lambda item: func(*item)

sorted(
    dict.items(),
    key=keyFromItem(lambda k, v: (v.total, k))
)
The above will sort first based on total, and for items with equal total, will sort them alphabetically by key. It will return items as (key, value) pairs; you can use [x[1] for x in sorted(...)] to get just the values.
Use items instead of values, and just use a lambda to fetch the sorting key itself, since there won't be a ready-made operator for it:
sorted = dict.items()
sorted.sort(key = lambda item: item[1].total, reverse=True)
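Note that in Python 3, items() returns a view rather than a list and iteritems() no longer exists, so calling .sort() on the result will not work there. A minimal Python 3 sketch of the same idea follows. The Record type and the sample data are invented for illustration, and the dictionary is named data to avoid shadowing the built-in dict.

```python
from collections import namedtuple

# Hypothetical value type with a `total` attribute, as described in the question.
Record = namedtuple("Record", ["total"])
data = {"x": Record(5), "y": Record(9), "z": Record(5)}

# (key, value) pairs sorted by the value's total, largest first.
by_total = sorted(data.items(), key=lambda item: item[1].total, reverse=True)

# Ascending by total, with ties broken alphabetically by key.
by_total_then_key = sorted(data.items(), key=lambda item: (item[1].total, item[0]))

# The keys alone, in sorted order.
keys_by_total = [k for k, _ in by_total]
```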

Sorting a dict with tuples as values

I have a dictionary that looks like this:
{'key_info': (rank, raw_data1, raw_data2),
'key_info2': ...}
Basically I need back a list of the keys in sorted order, that is sorted based on the rank field in the tuple.
My code looks something like this right now (diffs is the name of the dict above):
def _sortRanked(self):
    print(type(self.diffs))
    return sorted(self.diffs.keys(), key=lambda x: x[1], reverse=True)
Right now, that returns this when I run it:
return sorted(self.diffs.keys(), key=lambda x: x[1], reverse=True)
IndexError: string index out of range
keys() only gives you keys, not values, so you have to use the keys to retrieve values from the dict if you want to sort on them:
return sorted(self.diffs.keys(), key=lambda x: self.diffs[x], reverse=True)
Since you're sorting on rank, which is the first item in the tuple, you don't need to specify which item in the value tuple you want to sort on. But if you wanted to sort on raw_data1:
return sorted(self.diffs.keys(), key=lambda x: self.diffs[x][1], reverse=True)
You're passing the key as the argument to, uh, key.
[k for (k, v) in sorted(D.iteritems(), key=lambda x: x[1], reverse=True)]
You're attempting to sort on the keys of the dictionary, not the values. Replace your self.diffs.keys() call with self.diffs.items(), and then it should work (but do keep the lambda, or use operator.itemgetter(1). Tuples sort starting with the first element, so you don't have to worry about that.)
Just noticed that you only want the keys. With my suggestion, you'd have to wrap the sort with zip()[0] (making sure to unpack the resultant list of tuples from the sort by prefixing with * in the call to zip()).
You're close. Try this instead:
return sorted(self.diffs.keys(), key = lambda x: self.diffs[x][0], reverse = True)
You're sorting a list of keys, so you have to take each key back to the dictionary and retrieve the first element of its tuple (the rank) in order to use it as a comparison value.
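Putting the answers together, here is a small sketch that reproduces the error and shows the fix. The keys and tuple contents are made up, and a plain dict named diffs stands in for self.diffs. The original lambda indexes into the key string itself, which fails as soon as a key is shorter than two characters.

```python
# Hypothetical data in the shape described: key -> (rank, raw_data1, raw_data2).
diffs = {
    "a": (2, "raw1", "raw2"),
    "bb": (7, "raw3", "raw4"),
    "c": (5, "raw5", "raw6"),
}

error_message = None
try:
    # x is the key string here, so x[1] fails for the one-character keys.
    sorted(diffs.keys(), key=lambda x: x[1], reverse=True)
except IndexError as exc:
    error_message = str(exc)

# Look the key up in the dictionary and sort by the rank (element 0).
ranked_keys = sorted(diffs, key=lambda k: diffs[k][0], reverse=True)
```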
