Fastest way to get key with most items in a dictionary? - python

I am trying to find the fastest way to get the dictionary key with the most items. The two methods I have tried so far are:
def get_key_with_most_items(d):
    maxcount = max(len(v) for v in d.values())
    return [k for k, v in d.items() if len(v) == maxcount][0]
and
import operator

def sort_by_values_len(d):
    # Map each key to the length of its value, then sort by that length, longest first.
    dict_len = {key: len(value) for key, value in d.items()}
    sorted_key_list = sorted(dict_len.items(), key=operator.itemgetter(1), reverse=True)
    sorted_dict = [{item[0]: d[item[0]]} for item in sorted_key_list]
    return sorted_dict
The first method returns the key with the largest number of items, while the second returns the whole dictionary as a list of single-entry dictionaries, sorted by length. To be clear, in my case I only need the key. I compared the two methods like this:
import time

start_time = time.time()
for i in range(1000000):
    get_key_with_most_items(my_dict)  # or: sort_by_values_len(my_dict)
print("Time", (time.time() - start_time))
My conclusion is that get_key_with_most_items is almost 50% faster: it took 8.06 s, versus 15.68 s for sort_by_values_len. Could anyone recommend something even faster, if that is possible?

The solution is extremely simple:
max(d, key=lambda x: len(d[x]))
Explanation:
Iterating over a dictionary yields its keys, so max(some_dictionary) returns the largest key.
max optionally accepts a key function (the key argument), which is called on each element to produce the value used for comparison. To compare dictionary keys by the number of items stored under them, use len(d[x]) as that value.
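A minimal sketch of the explanation above, using made-up sample data (the dictionaries `d` and `tied` are assumptions for illustration). It also shows what happens on ties: `max` returns the first maximal element it encounters.

```python
# Hypothetical sample data: each key maps to a list of items.
d = {"a": [1], "b": [1, 2, 3], "c": [1, 2]}

# Iterating a dict yields its keys; the key function maps each key
# to the length of the list stored under it.
longest_key = max(d, key=lambda k: len(d[k]))
print(longest_key)  # b

# On ties, max returns the first maximal key in iteration order.
tied = {"x": [1, 2], "y": [3, 4]}
first_of_tied = max(tied, key=lambda k: len(tied[k]))
print(first_of_tied)  # x
```

Note that max raises ValueError on an empty dictionary unless a default is supplied, e.g. max(d, key=..., default=None).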

Use d.items() to get a sequence of the keys and values. Then get the max of this from the length of the values.
def get_key_with_most_items(d):
    maxitem = max(d.items(), key=lambda item: len(item[1]))
    return maxitem[0]
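For comparing candidates like these, the standard timeit module is usually more reliable than a hand-written time.time() loop. The sketch below is only an illustration: the test dictionary and the function names by_keys and by_items are assumptions, and the measured times will differ between machines.

```python
import timeit

# Hypothetical test data: 1000 keys with lists of varying length.
my_dict = {i: list(range(i % 50)) for i in range(1000)}

def by_keys(d):
    # Iterate over keys and look each value up.
    return max(d, key=lambda k: len(d[k]))

def by_items(d):
    # Iterate over (key, value) pairs directly.
    return max(d.items(), key=lambda item: len(item[1]))[0]

# Both variants return the first key with the longest list.
assert by_keys(my_dict) == by_items(my_dict)

for func in (by_keys, by_items):
    seconds = timeit.timeit(lambda: func(my_dict), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s for 1000 calls")
```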

Use the key argument of the max function:
max(d, key=lambda k: len(d[k]))
If you want the dict to be ordered, you can use OrderedDict. Your code will also work with a regular dict on current Python: since Python 3.7, dict is guaranteed to preserve insertion order (CPython 3.6 already did so as an implementation detail). In earlier versions a regular dict had no reliable order, so OrderedDict is the safe choice if you need to support them.
For example, this one-liner turns your dict into an ordered dict sorted by value length (ascending; pass reverse=True to sorted to put the longest first):
from collections import OrderedDict
ordered_dict = OrderedDict(sorted(d.items(), key=lambda t: len(t[1])))

Related

filter a dictionary by key less than some range of values

I have a dictionary like this,
d = {1:'a', 2:'b', 3:'c', 4:'d'}
Now I want to filter the dictionary where the key should be more than 1 and less than 4 so the dictionary will be,
d = {2:'b', 3:'c'}
I could do this with a for loop that iterates over all the keys, but I expect that to be slow. I am looking for a faster, more Pythonic way to do it.
You can try the code below:
d = {k:v for k,v in d.items() if 1<k<4}
The Pythonic way would be to use a dictionary comprehension:
{key: value for key, value in d.items() if 1 < key < 4}
It's readable enough: for each key and value in the items of the dictionary, keep those key: value pairs whose keys lie strictly between 1 and 4.
A more Pythonic way would be a dictionary comprehension:
d = {k: v for (k, v) in d.items() if k > 1 and k < 4}
If efficiency is a bottleneck, you may want to try a tree-based structure instead of a dictionary, which is hash based.
Python dictionaries use hashes of their keys to look up and store data efficiently. Unfortunately, that means they don't allow you to use the numeric properties of those keys to select data (the way you might with a slice on a list).
So I think the best way to do what you want is probably just a dictionary comprehension that tests each key against your boundary values:
d = {key: value for key, value in d.items() if 1 < key < 4}
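If many range queries run against the same dictionary, one way to approximate the ordered-structure idea mentioned above is to keep a sorted list of the keys and locate the range boundaries with the standard bisect module. This is a sketch under that assumption, not a drop-in replacement; the helper name filter_open_range is hypothetical, and the sorted list has to be rebuilt or maintained whenever keys change.

```python
from bisect import bisect_left, bisect_right

d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

# One-time cost: keep the keys in sorted order.
sorted_keys = sorted(d)

def filter_open_range(mapping, keys, low, high):
    """Return the items whose key k satisfies low < k < high."""
    start = bisect_right(keys, low)   # index of the first key > low
    stop = bisect_left(keys, high)    # index of the first key >= high
    return {k: mapping[k] for k in keys[start:stop]}

result = filter_open_range(d, sorted_keys, 1, 4)
print(result)  # {2: 'b', 3: 'c'}
```

Each query then costs two binary searches plus the size of the result, instead of a scan over every key. For a single query the plain comprehension is simpler and likely just as fast.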

avoid loops and increase performance to update a dict

I have a dict of the form:
dict1[element1] : reference1
dict1[element2] : reference2
dict1[element3] : reference2
Some elements share the same reference (as element2 and element3 do).
I need to convert this into a dict with the following form:
dict2[reference1] : [element1]
dict2[reference2] : [element2,element3]
To get this I wrote:
def UpdateDict(Dict, Key, Entry):
    Keys = list(Dict.keys())
    if Key in Keys:
        Dict[Key].append(Entry)
        return
    else:
        Item = list()
        Item.append(Entry)
        Dict[Key] = Item
        return

dict2 = dict()
for key in dict1:
    UpdateDict(dict2, dict1[key], key)
This works fine as long as dict1 is not very large, but if dict1 is large (some 1000 keys) it takes hours to get the result.
Is there a faster way to do it?
This:
    Keys = list(Dict.keys())
    if Key in Keys:
        ...
is probably the main culprit. It turns an O(1) lookup (if Key in Dict:) into an O(n) one, because the keys are copied into a list that is then scanned on every call. Add the overhead of one function call per key, and the result is clearly suboptimal.
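The difference between the two membership tests can be measured with a small sketch like the following. The dictionary big and the lookup value missing are made up for illustration, and absolute timings depend on the machine.

```python
import timeit

# Hypothetical dictionary with 10,000 keys.
big = {i: None for i in range(10_000)}
missing = -1  # worst case for the list scan: the key is absent

# Hash lookup directly on the dict.
in_dict = timeit.timeit(lambda: missing in big, number=200)

# Copies all keys into a new list and scans it on every call.
in_list = timeit.timeit(lambda: missing in list(big.keys()), number=200)

print(f"key in dict:       {in_dict:.6f} s")
print(f"key in list(keys): {in_list:.6f} s")
```

Both tests give the same answer; only the cost differs, and the gap grows with the size of the dictionary.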
A much simpler solution is to use a collections.defaultdict:
from collections import defaultdict
def revindex(dic):
    rev = defaultdict(list)
    # nb for py2.7 use `iteritems()` instead
    for k, v in dic.items():
        rev[v].append(k)
    return rev
dict2 = revindex(dict1)
Use a defaultdict instead of a vanilla dict to avoid those membership checks. You can also drop the helper function, since repeated function calls add non-trivial overhead:
from collections import defaultdict
dct2 = defaultdict(list)
for k in dct1:
    dct2[dct1[k]].append(k)
You can use a defaultdict or just dict.setdefault:
dict2 = {}
for key, value in dict1.items():
    dict2.setdefault(value, []).append(key)
A function seems unnecessary for such a simple call.
Instead of defaultdicts that were already mentioned, you could also create the dict with the setdefault method, like so:
def transform(some_dict):
    new_dict = {}
    for k, v in some_dict.items():
        new_dict.setdefault(v, []).append(k)
    return new_dict
There is no need for the if statement you wrote. In my benchmark, this is around 2 times faster than your method. For some reason, Moses Koledoye's method beats both (on my machine), and it is in turn beaten by bruno desthuilliers' method.
For the dictionary
dict1 = dict([(str(x),x%10) for x in range(0,100000)])
I get the following averages over 200 calls (transform_0 is your method together with the for loop, transform_d is Moses' method and transform_d2 is Bruno's):
benchmark for "transform_0": 133.516 ms
benchmark for "transform": 50.696 ms
benchmark for "transform_d": 43.967 ms
benchmark for "transform_d2": 38.408 ms
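A benchmark along these lines could be reproduced with a harness like the sketch below. This is an assumption about how such a measurement might be set up, not the code that produced the numbers above; it covers only the setdefault and defaultdict variants and uses fewer repetitions.

```python
import timeit
from collections import defaultdict

dict1 = {str(x): x % 10 for x in range(100000)}

def transform(some_dict):
    # dict.setdefault variant
    new_dict = {}
    for k, v in some_dict.items():
        new_dict.setdefault(v, []).append(k)
    return new_dict

def transform_d2(some_dict):
    # defaultdict variant
    rev = defaultdict(list)
    for k, v in some_dict.items():
        rev[v].append(k)
    return rev

# Both variants must build the same mapping before timing means anything.
assert transform(dict1) == transform_d2(dict1)

calls = 10
for func in (transform, transform_d2):
    total = timeit.timeit(lambda: func(dict1), number=calls)
    print(f'benchmark for "{func.__name__}": {total / calls * 1000:.3f} ms')
```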

Find n largest values from dictionary

I am working on a Python project and have a problem like the one described below, although my real data is different.
For example, if I have this dict:
fruitsCount= {"apple": 24, "orange": 20, "banana":18, "grape":13, "kiwi": 13}
How can I return the key with the maximum value? What if I want to return the three largest?
I used heapq.nlargest(3, fruitsCount.values()) but I don't know how to return the values together with their keys.
Note: fruitsCount is a dict returned after using Counter() on another dict.
Output: a dictionary like fruitsCount, containing only the entries with the n largest values.
You need to use heapq.nlargest() on the items, and use the key argument to tell it to take the value from that pair:
heapq.nlargest(3, fruitsCount.items(), key=lambda i: i[1])
This returns the 3 largest (key, value) pairs.
Or you could just use the collections.Counter() class, which has a most_common() method that does this for you:
Counter(fruitsCount).most_common(3)
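As a short illustration with the question's data, most_common returns a list of (key, value) pairs, which dict() turns back into a dictionary of the requested shape. The variable name top3 is only for illustration.

```python
from collections import Counter

fruitsCount = {"apple": 24, "orange": 20, "banana": 18, "grape": 13, "kiwi": 13}

# most_common(n) returns the n pairs with the highest counts, largest first.
top3 = Counter(fruitsCount).most_common(3)
print(top3)        # [('apple', 24), ('orange', 20), ('banana', 18)]

# Convert back to a plain dictionary.
print(dict(top3))  # {'apple': 24, 'orange': 20, 'banana': 18}
```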
Having never used the heapq module, I prefer this module-free approach:
sorted(fruitsCount.items(), key=lambda pair: pair[1], reverse=True)[:3]
Smaller, but less clear:
sorted(fruitsCount.items(), key=lambda pair: -pair[1])[:3]
You can try this one. Your answer will be apple, orange, banana.
heapq.nlargest(3, fruitsCount, key=fruitsCount.get)
The time complexity of this approach is O(n log t), where n is the number of elements in the dictionary and t is the number of elements you want. In the above case, t is 3.
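A sketch of how the keys returned by this approach can be turned back into a dictionary, using the question's data. The names top_keys and top are assumptions for illustration.

```python
import heapq

fruitsCount = {"apple": 24, "orange": 20, "banana": 18, "grape": 13, "kiwi": 13}

# Iterating the dict yields keys; fruitsCount.get supplies the value to rank by.
top_keys = heapq.nlargest(3, fruitsCount, key=fruitsCount.get)
print(top_keys)  # ['apple', 'orange', 'banana']

# Rebuild a dictionary holding only the three largest entries.
top = {k: fruitsCount[k] for k in top_keys}
print(top)       # {'apple': 24, 'orange': 20, 'banana': 18}
```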
I'm new to Python and not sure whether there is a shortcut, but here is my answer:
from heapq import nlargest
fruitsCount = {"apple": 24, "orange": 20, "banana": 18, "grape": 13, "kiwi": 13}
Reverse the key-value pairs so that sorting is done on the values (note that keys sharing a value, such as grape and kiwi, collapse into a single entry):
fruitsCount = {j: i for i, j in fruitsCount.items()}
Find the largest 3 items and store them in a separate dictionary:
x = dict(nlargest(3, fruitsCount.items()))
Reverse the dictionary back to its original form:
x = {z: y for y, z in x.items()}
print(x)
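One caveat with inverting the dictionary is that equal counts collide, because a dictionary cannot hold the same key twice. The sketch below shows the effect on the question's data and an alternative that ranks the original items by value instead; the variable names are assumptions for illustration.

```python
from heapq import nlargest

fruitsCount = {"apple": 24, "orange": 20, "banana": 18, "grape": 13, "kiwi": 13}

# Inverting maps count -> name, so the two fruits with count 13 collide.
inverted = {count: name for name, count in fruitsCount.items()}
print(len(fruitsCount), len(inverted))  # 5 4
print(inverted[13])                     # kiwi (the later entry overwrote grape)

# Ranking the original (name, count) pairs by count avoids the collision.
top4 = dict(nlargest(4, fruitsCount.items(), key=lambda pair: pair[1]))
print(len(top4))                        # 4
```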

How to check if a key/value is repeated elsewhere in a dictionary using Python

I have a dictionary in python like:
data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}
...and I want to identify if a key exists elsewhere in a dictionary (in some other list of values). There are other questions I could find that want to know if an item simply exists in the dictionary, but this is not my question. The same goes for items in each list of values, to identify items that do not exist in OTHER keys and their associated values in the dictionary.
In the above example, the idea is that dogs and cats are not equal: the keys and values that come from dogs have nothing in common with those that come from cats. Ideally, a second dictionary would be created that collects everything associated with each unique cluster:
unique_dict = {'cluster1': ['dog', 'milo', 'otis', 'laurel', 'hardy'],
               'cluster2': ['cat', 'bob', 'joe']}
This is a follow-up question to "In Python, count unique key/value pairs in a dictionary".
It appears that the relationship is symmetric, but your data is not (e.g. there is no key 'otis'). The first part involves making it symmetric, so it won't matter where we start.
(If your data actually is symmetric, then skip that part.)
Python 2.7
from collections import defaultdict
data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}
# create symmetric version of data
d = defaultdict(list)
for key, values in data.iteritems():
    for value in values:
        d[key].append(value)
        d[value].append(key)
visited = set()
def connected(key):
    result = []
    def connected(key):
        if key not in visited:
            visited.add(key)
            result.append(key)
            map(connected, d[key])
    connected(key)
    return result
print [connected(key) for key in d if key not in visited]
Python 3.3
from collections import defaultdict
data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}
# create symmetric version of data
d = defaultdict(list)
for key, values in data.items():
    for value in values:
        d[key].append(value)
        d[value].append(key)
visited = set()
def connected(key):
    visited.add(key)
    yield key
    for value in d[key]:
        # check the neighbour, not the current key, which is already visited
        if value not in visited:
            yield from connected(value)
print([list(connected(key)) for key in d if key not in visited])
Result
[['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]
Performance
O(n), where n is the total number of keys and values in data (in your case 18: 5 keys plus 13 list entries).
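The same traversal can also be written iteratively, which avoids deep recursion on large inputs and produces the dictionary of clusters the question asks for. This is a sketch of one way to do it, using a breadth-first search over a symmetric adjacency structure; the function name clusters and the names graph and unique are assumptions for illustration.

```python
from collections import defaultdict, deque

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

# Build a symmetric adjacency structure so the starting point does not matter.
graph = defaultdict(set)
for key, values in data.items():
    for value in values:
        graph[key].add(value)
        graph[value].add(key)

def clusters(graph):
    """Group the nodes of graph into connected components."""
    visited = set()
    result = {}
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
        result[f"cluster{len(result) + 1}"] = members
    return result

unique = clusters(graph)
for name, members in unique.items():
    print(name, sorted(members))
```

The order of names inside each cluster depends on set iteration order, so the output is sorted here only for display.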
I'm taking "in some other list of values" literally, to mean that a key existing in its own set of values is OK. If not, that would make things slightly simpler, but you should be able to adjust the code yourself, so I won't write it both ways.
If you insist on using this data structure, you have to do it by brute force:
def does_key_exist_in_other_value(d, key):
    for k, v in d.items():
        if k != key and key in v:
            return True
    return False
You could of course condense that into a one-liner with a genexpr and any:
    return any(key in v for k, v in d.items() if k != key)
But a smarter thing to do would be to use a better data structure. At the very least, use sets instead of lists as your values. That wouldn't simplify your code, but it would make it a lot faster: if you have K keys and V total elements across your values, a lookup would run in O(K) instead of O(K + V).
But really, if you want to look things up, build a dict to look things up in:
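from collections import defaultdict
inv_d = defaultdict(set)
for key, value in d.items():
    for v in value:
        inv_d[v].add(key)
And now, your code is just:
def does_key_exist_in_other_value(inv_d, key):
    # True only if some key other than `key` itself lists it as a value
    return bool(inv_d.get(key, set()) - {key})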
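A usage sketch for the inverse index on the question's sample data. The helper name appears_under_other_key is hypothetical; it removes the key itself from the set of owners before testing, so a key that appears in no value list at all yields False.

```python
from collections import defaultdict

d = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
     'cat': ['bob', 'joe'],
     'milo': ['otis', 'laurel', 'hardy', 'dog'],
     'hardy': ['dog'],
     'bob': ['joe', 'cat']}

# Inverse index: value -> set of keys whose list contains that value.
inv_d = defaultdict(set)
for key, values in d.items():
    for v in values:
        inv_d[v].add(key)

def appears_under_other_key(inv_d, key):
    # .get avoids creating an empty entry in the defaultdict on lookup.
    return bool(inv_d.get(key, set()) - {key})

print(appears_under_other_key(inv_d, 'dog'))    # True: listed under 'milo' and 'hardy'
print(appears_under_other_key(inv_d, 'otis'))   # True: listed under 'dog' and 'milo'
print(appears_under_other_key(inv_d, 'zebra'))  # False: appears nowhere
```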

Sort dictionary by specific order (list of keys)

I am trying to sort a dictionary based on the order in which its keys appear in a list.
The first element of the list would be the first element of the dictionary. At the end, the dictionary keys would be in the same order as in the provided list.
I can sort by value with this code:
newlist = sorted(product_list, key=lambda k: k['product'])
but I can't do it for a list.
Thanks for any help!
from collections import OrderedDict
new_dict = OrderedDict((k, old_dict[k]) for k in key_list)
Having said that, there's probably a better way to solve your problem than using an OrderedDict.
If some keys are missing, you'll need to use one of
new_dict = OrderedDict((k, old_dict.get(k)) for k in key_list)
or
new_dict = OrderedDict((k, old_dict[k]) for k in key_list if k in old_dict)
depending on how you want to handle the missing keys.
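A short sketch of the variants above on made-up data (old_dict and key_list here are assumptions for illustration), including a key that is missing from the dictionary. On Python 3.7 and later a plain dict comprehension keeps the same order, since dict preserves insertion order.

```python
from collections import OrderedDict

# Hypothetical input: 'z' is in the key list but not in the dictionary.
old_dict = {"b": 2, "a": 1, "c": 3}
key_list = ["c", "a", "b", "z"]

# Variant 1: skip keys that are missing from old_dict.
skipped = OrderedDict((k, old_dict[k]) for k in key_list if k in old_dict)
print(list(skipped.items()))    # [('c', 3), ('a', 1), ('b', 2)]

# Variant 2: keep missing keys, with None as their value.
with_none = OrderedDict((k, old_dict.get(k)) for k in key_list)
print(list(with_none.items()))  # [('c', 3), ('a', 1), ('b', 2), ('z', None)]

# Python 3.7+: a plain dict comprehension preserves the same order.
plain = {k: old_dict[k] for k in key_list if k in old_dict}
print(list(plain))              # ['c', 'a', 'b']
```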
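In Python (as in most languages) dictionaries are not sorted containers, so you can't "sort" a dictionary in place. Since Python 3.7 they do preserve insertion order, so you can build a new one in the order you want.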
You can retrieve and sort the keys and iterate through those:
for key in sorted(product_list.keys()):
    item = product_list[key]
    item.doSomething()
Or you can use an OrderedDict, like so:
from collections import OrderedDict
And then build the dictionary in the required order (which is up to you to determine). Below we sort by the keys:
product_list = OrderedDict(sorted(product_list.items(), key=lambda k: k[0]))
For reference, dict.items() returns the (key, value) pairs (as a list of tuples in Python 2, as a view object in Python 3) in the form:
[(key1, value1), (key2, value2) , ... , (keyN, valueN)]
Before Python 3.7, a dictionary was unordered by definition. On those versions you can use OrderedDict from collections, as seen at http://docs.python.org/2/library/collections.html#collections.OrderedDict, as a drop-in replacement.
