Python dictionary comprehension to group together equal keys - python

I have a code snippit that groups together equal keys from a list of dicts and adds the dict with equal ObjectID to a list under that key.
Code bellow works, but I am trying to convert it to a Dictionary comprehension
group togheter subblocks if they have equal ObjectID
output = {}
subblkDBF : list[dict]
for row in subblkDBF:
if row["OBJECTID"] not in output:
output[row["OBJECTID"]] = []
output[row["OBJECTID"]].append(row)

Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
output = defaultdict(list)
for row in subblkDBF:
output[row['OBJECTID']].append(row)
The problem with using a comprehension is that if really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.

If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than above or the loop (O(nlogn) instead of O(n)).
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}

You can adding an else block to safe on time n slightly improve perfomrance a little:
output = {}
subblkDBF : list[dict]
for row in subblkDBF:
if row["OBJECTID"] not in output:
output[row["OBJECTID"]] = [row]
else:
output[row["OBJECTID"]].append(row)

Related

Python: replace values of sublist, with values looked up from another sublist without indexing

Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset for this too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]) and these are associated with to two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a new third list, newlist which copies mainlist but replaces the identifiers with those from replacelist[x][2],replacelist[x][3]
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note newlist row position uses the same as mainlist
Attempt
What has me totally stumped on a simple problem is I feel I can't use basic list comprehension [i for i in replacelist if i in mainlist] as the order within a pair changes, and if I sorted(list) then I lose information about what to replace the lists with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
for i in mainlist:
if k[0] and k[1] in i:
# retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It'll help if you had replacelist as a dict:
mainlist = [[MH75,QF12,0,38], [JQ59,QR21,105,191], [JQ61,SQ48,186,284], [SQ84,QF36,0,123], [GA55,VA63,80,245], [MH98,CX12,171,263]]
replacelist = [[MH75,QF12,BA89,QR29], [QR21,JQ59,VA51,MH52], [GA55,VA63,MH19,CX84], [SQ84,QF36,SQ08,JQ65], [SQ48,JQ61,QF87,QF63], [MH98,CX12,GA34,GA60]]
replacements = {frozenset(r[:2]):dict(zip(r[:2], r[2:])) for r in replacements}
newlist = []
for *ids, val1, val2 in mainlist:
reps = replacements[frozenset([id1, id2])]
newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
First thing you do - transform both lists in a dictionary:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]),item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]):item[2:] for item in replacementlist}
# Now it is trivial to create another dict with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The big deal here is that by using a dictionary, you cancel up the search time for the indices on the replacement list - in a list you have to scan all the list for each item you have, which makes your performance worse with the square of your list length. With Python dictionaries, the search time is constant - and do not depend on the data length at all.

Copying first n items of a dictionary into another dictionary

This is a simple question but I am unable to code this in python. I want to copy first n items ( supposedly 100 ) i.e both the values and keys into another empty dictionary. I'll give a more clear picture of this. I created a dictionary and sorted it using OrderedDict. My code for sorting it is :
ordP = OrderedDict(reversed(sorted(wcP.items(), key=lambda t: t[1])))
Here ordP is the ordered dictionary I got. This is in descending order. And my original dictionary is wcP. I want to put the first 100 values of ordP i.e the first 100 maximum values of ordP ( sorted according to the keys ) in a new dictionary.
Dictionaries aren't ordered, but if you just want a random selection:
new_values = dict(your_values.items()[:n])
Or, for those obsessed with laziness:
import itertools
new_values = dict(itertools.islice(your_values.iteritems(), n))
If there's a particular sort you want to impose, define a key function that takes the key and value. People usually do lambdas, but there's no reason you can't use a full function.
def example_key_func((key, value)):
return key * value
new_dict = dict(sorted(your_values.items(), key=example_key_func)[:n])
n = 100
assert len(d.keys()) >= n
dic100 = {k:v for k,v in list(d.items())[:n]}

Is there a pythonic way to grab the total amount of variables within lists inside a dict?

I have a dictionary where each key has a list (vector) of items:
from collections import defaultdict
dict = defaultdict(list)
dict[133] = [2,4,64,312]
dict[4] = [2,3,5,12,45,32]
dict[54] = [12,2,443,223]
def getTotalVectorItems(items):
total = 0
for v in items.values():
total += len(v)
return total
print getTotalVectorItems(dict)
This will print:
14 # number of items in the 3 dict keys.
Is there an easier more pythonic way other than creating this "getTotalVectorItems" function? I feel like there is a quick way to do this already.
You are looking for the sum() built-in with a generator expression:
sum(len(v) for v in items.values())
The sum() function totals the values of the given iterator, and the generator expression yields the length of each value in the lists.
Note that calling lists vectors is probably confusing to most Python programmers, unless you are using the term vector in the context of the domain of the problem.
print sum(map(len,dic.itervalues()))

Correspendence between list indices originated from dictionary

I wrote the below code working with dictionary and list:
d = computeRanks() # dictionary of id : interestRank pairs
lst = list(d) # tuples (id, interestRank)
interestingIds = []
for i in range(20): # choice randomly 20 highly ranked ids
choice = randomWeightedChoice(d.values()) # returns random index from list
interestingIds.append(lst[choice][0])
There seems to be possible error because I'm not sure if there is a correspondence between indices in lst and d.values().
Do you know how to write this better?
One of the policies of dict is that the results of dict.keys() and dict.values() will correspond so long as the contents of the dictionary are not modified.
As #Ignacio says, the index choice does correspond to the intended element of lst, so your code's logic is correct. But your code should be much simpler: d already contains IDs for the elements, so rewrite randomWeightedChoice to take a dictionary and return an ID.
Perhaps it will help you to know that you can iterate over a dictionary's key-value pairs with d.items():
for k, v in d.items():
etc.

How to separate one list in two via list comprehension or otherwise

If have a list of dictionary items like so:
L = [{"a":1, "b":0}, {"a":3, "b":1}...]
I would like to split these entries based upon the value of "b", either 0 or 1.
A(b=0) = [{"a":1, "b":1}, ....]
B(b=1) = [{"a":3, "b":2}, .....]
I am comfortable with using simple list comprehensions, and i am currently looping through the list L two times.
A = [d for d in L if d["b"] == 0]
B = [d for d in L if d["b"] != 0]
Clearly this is not the most efficient way.
An else clause does not seem to be available within the list comprehension functionality.
Can I do what I want via list comprehension?
Is there a better way to do this?
I am looking for a good balance between readability and efficiency, leaning towards readability.
Thanks!
update:
thanks everyone for the comments and ideas! the most easiest one for me to read is the one by Thomas. but i will look at Alex' suggestion as well. i had not found any reference to the collections module before.
Don't use a list comprehension. List comprehensions are for when you want a single list result. You obviously don't :) Use a regular for loop:
A = []
B = []
for item in L:
if item['b'] == 0:
target = A
else:
target = B
target.append(item)
You can shorten the snippet by doing, say, (A, B)[item['b'] != 0].append(item), but why bother?
If the b value can be only 0 or 1, #Thomas's simple solution is probably best. For a more general case (in which you want to discriminate among several possible values of b -- your sample "expected results" appear to be completely divorced from and contradictory to your question's text, so it's far from obvious whether you actually need some generality;-):
from collections import defaultdict
separated = defaultdict(list)
for x in L:
separated[x['b']].append(x)
When this code executes, separated ends up with a dict (actually an instance of collections.defaultdict, a dict subclass) whose keys are all values for b that actually occur in dicts in list L, the corresponding values being the separated sublists. So, for example, if b takes only the values 0 and 1, separated[0] would be what (in your question's text as opposed to the example) you want as list A, and separated[1] what you want as list B.

Categories

Resources