avoid loops and increase performance to update a dict - python

I have a dict of the form:
dict1[element1] : reference1
dict1[element2] : reference2
dict1[element3] : reference2
Some elements share the same reference (as element2 and element3 do).
I need to convert this into a dict with the following form:
dict2[reference1] : [element1]
dict2[reference2] : [element2,element3]
To get this I wrote:
def UpdateDict(Dict,Key,Entry):
    Keys = list(Dict.keys())
    if Key in Keys:
        Dict[Key].append(Entry)
        return
    else:
        Item = list()
        Item.append(Entry)
        Dict[Key] = Item
        return
dict2=dict()
for key in dict1:
    UpdateDict(dict2,dict1[key],key)
This works fine as long as dict1 is not very large, but if dict1 is large (some 1000 keys) it takes hours to get the result.
Is there a faster way to do it?

This:
    Keys = list(Dict.keys())
    if Key in Keys:
        ...
is probably the main culprit. It turns an O(1) lookup (if Key in Dict:) into an O(n) one, because a full list of keys is built and scanned on every call. That, plus the overhead of one function call per key, is certainly suboptimal.
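As a minimal sketch of that point, the original function can be kept almost as-is, with only the membership test changed to a direct dictionary lookup. The sample dict1 below is made up for illustration and is not from the question.

```python
def UpdateDict(Dict, Key, Entry):
    # `Key in Dict` is a hash lookup; no list of keys is built or scanned
    if Key in Dict:
        Dict[Key].append(Entry)
    else:
        Dict[Key] = [Entry]

# hypothetical sample data in the shape described in the question
dict1 = {"element1": "reference1", "element2": "reference2", "element3": "reference2"}

dict2 = {}
for key in dict1:
    UpdateDict(dict2, dict1[key], key)

print(dict2)
```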
A much simpler solution is to use a collections.defaultdict:
from collections import defaultdict
def revindex(dic):
    rev = defaultdict(list)
    # nb for py2.7 use `iteritems()` instead
    for k, v in dic.items():
        rev[v].append(k)
    return rev
dict2 = revindex(dict1)

Use a defaultdict instead of a vanilla dict to avoid those membership checks. You can also remove the function calls, which add a non-trivial overhead when repeated:
from collections import defaultdict
dct2 = defaultdict(list)
for k in dct1:
    dct2[dct1[k]].append(k)

You can use a defaultdict or just dict.setdefault:
dict2 = {}
for key, value in dict1.items():
    dict2.setdefault(value, []).append(key)
A function seems unnecessary for such a simple call.

Instead of defaultdicts that were already mentioned, you could also create the dict with the setdefault method, like so:
def transform(some_dict):
    new_dict = {}
    for k, v in some_dict.items():
        new_dict.setdefault(v, []).append(k)
    return new_dict
There is no need for the if statement you made. In my benchmark, this is around 2 times faster than your method. The method of Moses Koledoye beats those (on my machine) for some reason, which is in turn beaten by bruno desthuilliers' method.
For the dictionary
dict1 = dict([(str(x),x%10) for x in range(0,100000)])
I get the following averages over 200 calls (transform_0 is your method together with the for loop, transform_d is Moses' method and transform_d2 is Bruno's):
benchmark for "transform_0": 133.516 ms
benchmark for "transform": 50.696 ms
benchmark for "transform_d": 43.967 ms
benchmark for "transform_d2": 38.408 ms
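For anyone who wants to repeat the comparison, here is a rough sketch of how such a benchmark could be set up with timeit. The function bodies are reconstructed from the snippets in this thread, the helper name update_dict and the small call count are assumptions, and the timings will differ from machine to machine.

```python
import timeit
from collections import defaultdict

def update_dict(d, key, entry):
    # the question's approach: build a key list, then test membership in it
    keys = list(d.keys())
    if key in keys:
        d[key].append(entry)
    else:
        d[key] = [entry]

def transform_0(some_dict):
    new_dict = {}
    for k in some_dict:
        update_dict(new_dict, some_dict[k], k)
    return new_dict

def transform(some_dict):
    new_dict = {}
    for k, v in some_dict.items():
        new_dict.setdefault(v, []).append(k)
    return new_dict

def transform_d(some_dict):
    new_dict = defaultdict(list)
    for k in some_dict:
        new_dict[some_dict[k]].append(k)
    return new_dict

def transform_d2(some_dict):
    new_dict = defaultdict(list)
    for k, v in some_dict.items():
        new_dict[v].append(k)
    return new_dict

dict1 = dict([(str(x), x % 10) for x in range(0, 100000)])

calls = 5  # assumption: kept small here; the figures above used 200 calls
for func in (transform_0, transform, transform_d, transform_d2):
    total = timeit.timeit(lambda: func(dict1), number=calls)
    print('benchmark for "%s": %.3f ms' % (func.__name__, total / calls * 1000))
```

Note that this test dictionary only has 10 distinct values, so the key list built in update_dict stays tiny; with many distinct values the gap to the original method should be much larger.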

Related

Fastest way to get key with most items in a dictionary?

I am trying to find the fastest way to get the dictionary key with the most items. The two methods I have tried so far are:
def get_key_with_most_items(d):
    maxcount = max(len(v) for v in d.values())
    return [k for k, v in d.items() if len(v) == maxcount][0]
and
def sort_by_values_len(dict):
    dict_len = {key: len(value) for key, value in dict.items()}
    import operator
    sorted_key_list = sorted(dict_len.items(), key=operator.itemgetter(1), reverse=True)
    sorted_dict = [{item[0]: dict[item[0]]} for item in sorted_key_list]
    return sorted_dict
The first method returns the key with the largest number of items, while the second returns the whole dictionary as a list sorted by value length. To be clear, in my case I only need the key. I compared the two methods like this:
import time
start_time = time.time()
for i in range(1000000):
    get_key_with_most_items(my_dict) # or sort_by_values_len(my_dict)
print("Time", (time.time() - start_time))
I have come to the conclusion that get_key_with_most_items is faster by almost 50% (8.06s versus 15.68s for sort_by_values_len). Could anyone recommend (if possible) something even faster?
The solution is extremely simple:
max(d, key=lambda x: len(d[x]))
Explanation:
Iterating over a dictionary yields its keys, so max(some_dictionary) returns the maximum of the keys.
max optionally accepts a key function (key) that is applied to each element before comparing. To compare dictionary keys by the number of items stored under them, the key function just returns len(d[x]).
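A small illustration of the explanation above, using made-up sample data:

```python
# hypothetical sample data: each key maps to a list of items
d = {"a": [1], "b": [1, 2, 3], "c": [1, 2]}

# the key function is called once per dictionary key; max compares the returned lengths
longest = max(d, key=lambda x: len(d[x]))
print(longest)  # b
```

If several keys tie for the longest list, max returns the first one it encounters.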
Use d.items() to get a sequence of the keys and values. Then get the max of this from the length of the values.
def get_key_with_most_items(d):
    maxitem = max(d.items(), key = lambda item: len(item[1]))
    return maxitem[0]
Use the key argument of the max function:
max(d, key=lambda k: len(d[k]))
If you want the dict to be ordered, then use OrderedDict. Technically your code will still work with a regular dict, but only because of how Python's dict is currently implemented: in the past a regular dict did not have a reliable order, and insertion order has only been a language guarantee since Python 3.7.
You could do this for example as a one liner to turn your dict into an ordered dict by value length:
from collections import OrderedDict
ordered_dict = OrderedDict(sorted(d.items(), key=lambda t: len(t[1])))

how to combine the common key and join the values in the dictionary python

I have a list which contains a few dictionaries.
[{u'TEXT242.txt': u'work'},{u'TEXT242.txt': u'go to work'},{u'TEXT1007.txt': u'report'},{u'TEXT797.txt': u'study'}]
How can I combine the dictionaries that have the same key? For example, u'work' and u'go to work' are both under the key 'TEXT242.txt', so I want to merge them and remove the duplicated key:
[{u'TEXT242.txt': [u'work', u'go to work']},{u'TEXT1007.txt': u'report'},{u'TEXT797.txt': u'study'}]
The setdefault method of dictionaries is handy here... it can create an empty list when a dictionary key doesn't exist, so that you can always append the value.
dictlist = [{u'TEXT242.txt': u'work'},{u'TEXT242.txt': u'go to work'},{u'TEXT1007.txt': u'report'},{u'TEXT797.txt': u'study'}]
newdict = {}
for d in dictlist:
    for k in d:
        newdict.setdefault(k, []).append(d[k])
from collections import defaultdict
before = [{u'TEXT242.txt': u'work'},{u'TEXT242.txt': u'go to work'},{u'TEXT1007.txt': u'report'},{u'TEXT797.txt': u'study'}]
after = defaultdict(list)
for i in before:
    for k, v in i.items():
        after[k].append(v)
out:
defaultdict(list,
{'TEXT1007.txt': ['report'],
'TEXT242.txt': ['work', 'go to work'],
'TEXT797.txt': ['study']})
This technique is simpler and faster than an equivalent technique using dict.setdefault().
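Both answers produce one dictionary of lists. If the exact shape from the question is needed (a list of single-key dicts, with a lone value left unwrapped), a conversion step could look roughly like this. This is only a sketch; the names grouped and after are illustrative.

```python
from collections import defaultdict

before = [{u'TEXT242.txt': u'work'}, {u'TEXT242.txt': u'go to work'},
          {u'TEXT1007.txt': u'report'}, {u'TEXT797.txt': u'study'}]

grouped = defaultdict(list)
for d in before:
    for k, v in d.items():
        grouped[k].append(v)

# one single-key dict per file; a lone value is unwrapped, as in the question's example
after = [{k: vs if len(vs) > 1 else vs[0]} for k, vs in grouped.items()]
print(after)
```

Mixing plain strings and lists as values tends to complicate later code, so keeping every value as a list may be the more practical choice.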

Split dictionary into two sub-dictionaries using a string

I have a dictionary and I want to split it into two sub-dictionaries based on a string.
Is there a nicer (more pythonic) way to do it than this:
dict_1 = {k:v for (k,v) in initial_dict.iteritems() if string in k}
dict_2 = {k:v for (k,v) in initial_dict.iteritems() if string not in k}
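# iterate over a copy of the keys: popping while iterating over the dict itself raises RuntimeError
dict_1 = {key:initial_dict.pop(key) for key in list(initial_dict) if string in key}
dict_2 = initial_dict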
I'll vote for your original. Clear, short, and only slightly inefficient.
There is a way to do it without referencing or testing elements of initial_dict more than once, which is quite Pythonic if Pythonic means knowing that int(False)==0 and int(True)==1:
dict1, dict2 = {}, {}
for k,v in initial_dict.items(): (dict2,dict1)[string in k][k] = v
Like I said, I prefer the questioner's way!
By the way, if you have to perform an n-way partition, dict_list[i][key]=v inside a loop that fetches or generates key,i and v starts to look a lot better than a multi-way if ... elif ... elif ...
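A minimal sketch of such an n-way partition follows. The sample data, the prefixes list and the bucket_index helper are all hypothetical and only illustrate the dict_list[i][key] = v pattern.

```python
# hypothetical input and bucket rule, for illustration only
initial_dict = {"apple_1": 1, "pear_1": 2, "apple_2": 3, "plum_1": 4}
prefixes = ["apple", "pear"]  # bucket i collects keys starting with prefixes[i]

def bucket_index(key):
    # index of the first matching prefix; the last bucket catches everything else
    for i, prefix in enumerate(prefixes):
        if key.startswith(prefix):
            return i
    return len(prefixes)

dict_list = [{} for _ in range(len(prefixes) + 1)]
for key, v in initial_dict.items():
    dict_list[bucket_index(key)][key] = v

print(dict_list)
```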

How to check if a key/value is repeated elsewhere in a dictionary using Python

I have a dictionary in python like:
dict = {'dog':['milo','otis','laurel','hardy'],
'cat':['bob','joe'],
'milo':['otis','laurel','hardy','dog'],
'hardy':['dog'],'bob':['joe','cat']}
...and I want to identify if a key exists elsewhere in a dictionary (in some other list of values). There are other questions I could find that want to know if an item simply exists in the dictionary, but this is not my question. The same goes for items in each list of values, to identify items that do not exist in OTHER keys and their associated values in the dictionary.
In the above example, the idea is that dogs and cats are not equal: the keys/values associated with dogs have nothing in common with those that come from cats. Ideally, a second dictionary would be created that collects all of those associated with each unique cluster:
unique_dict = {'cluster1':['dog','milo','otis','laurel','hardy'],
               'cluster2':['cat','bob','joe'] }
This is a follow-up question to "In Python, count unique key/value pairs in a dictionary".
It appears that the relationship is symmetric, but your data is not (e.g. there is no key 'otis'). The first part involves making it symmetric, so it won't matter where we start.
(If your data actually is symmetric, then skip that part.)
Python 2.7
from collections import defaultdict
data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}
# create symmetric version of data
d = defaultdict(list)
for key, values in data.iteritems():
    for value in values:
        d[key].append(value)
        d[value].append(key)
visited = set()
def connected(key):
    result = []
    def connected(key):
        if key not in visited:
            visited.add(key)
            result.append(key)
            map(connected, d[key])
    connected(key)
    return result
print [connected(key) for key in d if key not in visited]
Python 3.3
from collections import defaultdict
data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}
# create symmetric version of data
d = defaultdict(list)
for key, values in data.items():
    for value in values:
        d[key].append(value)
        d[value].append(key)
visited = set()
def connected(key):
    visited.add(key)
    yield key
    for value in d[key]:
        # test the neighbour, not the current key (which was just marked visited)
        if value not in visited:
            yield from connected(value)
print([list(connected(key)) for key in d if key not in visited])
Result
[['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]
Performance
O(n), where n is the total number of keys and values in data (in your case, 18: 5 keys plus 13 values).
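To get the result in the dictionary shape asked for in the question, the same idea can be sketched with an explicit stack instead of recursion, which also avoids hitting the recursion limit on large clusters. This is an alternative formulation, not the code from the answer above; the clusters function and the 'clusterN' naming are assumptions.

```python
from collections import defaultdict

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'], 'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'], 'hardy': ['dog'],
        'bob': ['joe', 'cat']}

# symmetric adjacency, as in the answer above (sets avoid duplicate edges)
d = defaultdict(set)
for key, values in data.items():
    for value in values:
        d[key].add(value)
        d[value].add(key)

def clusters(graph):
    visited = set()
    result = {}
    for start in graph:
        if start in visited:
            continue
        # depth-first walk with an explicit stack instead of recursion
        stack, members = [start], []
        visited.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        result['cluster%d' % (len(result) + 1)] = members
    return result

unique_dict = clusters(d)
print(unique_dict)
```

The order of the members inside each cluster is not fixed, since it depends on set iteration order.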
I'm taking "in some other list of values" literally, to mean that a key existing in its own set of values is OK. If not, that would make things slightly simpler, but you should be able to adjust the code yourself, so I won't write it both ways.
If you insist on using this data structure, you have to do it by brute force:
def does_key_exist_in_other_value(d, key):
    for k, v in d.items():
        if k != key and key in v:
            return True
    return False
You could of course condense that into a one-liner with a genexpr and any:
    return any(key in v for k, v in d.items() if k != key)
But a smarter thing to do would be to use a better data structure. At the very least use sets instead of lists as your values (which wouldn't simplify your code, but would make it a lot faster: if you have K keys and V total elements across your values, it would run in O(K) instead of O(KV)).
But really, if you want to look things up, build a dict to look things up in:
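from collections import defaultdict
inv_d = defaultdict(set)
for key, value in d.items():
    for v in value:
        inv_d[v].add(key)
And now, your code is just:
def does_key_exist_in_other_value(inv_d, key):
    # true only if some key other than `key` itself lists `key` among its values
    return bool(inv_d.get(key, set()) - {key})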
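Here is a self-contained sketch of the inverted-index approach run against the data from the question. The helper name appears_under_other_key is hypothetical. The sketch uses a set difference so that a key which appears in no value list at all is reported as absent, and .get so that lookups do not insert new entries into the defaultdict.

```python
from collections import defaultdict

d = {'dog': ['milo', 'otis', 'laurel', 'hardy'], 'cat': ['bob', 'joe'],
     'milo': ['otis', 'laurel', 'hardy', 'dog'], 'hardy': ['dog'],
     'bob': ['joe', 'cat']}

# inverted index: value -> set of keys whose list contains that value
inv_d = defaultdict(set)
for key, values in d.items():
    for v in values:
        inv_d[v].add(key)

def appears_under_other_key(inv_d, key):
    # drop the key itself so that a self-reference alone does not count
    return bool(inv_d.get(key, set()) - {key})

print(appears_under_other_key(inv_d, 'dog'))    # listed under 'milo' and 'hardy'
print(appears_under_other_key(inv_d, 'otis'))   # listed under 'dog' and 'milo'
print(appears_under_other_key(inv_d, 'zebra'))  # appears nowhere
```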

Sort dictionary by specific order (list of keys)

I am trying to sort a dictionary based on the order that its keys appears in a list.
The first element of the list would be the first element of the dictionary. At the end, the dictionary keys would be in the same order as in the provided list.
I can sort by value with this code
newlist = sorted(product_list, key=lambda k: k['product'])
but I can't do it for a list.
Thanks for any help!
from collections import OrderedDict
new_dict = OrderedDict((k, old_dict[k]) for k in key_list)
Having said that, there's probably a better way to solve your problem than using an OrderedDict.
If some keys are missing, you'll need to use one of
new_dict = OrderedDict((k, old_dict.get(k)) for k in key_list)
or
new_dict = OrderedDict((k, old_dict[k]) for k in key_list if k in old_dict)
depending on how you want to handle the missing keys.
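To make the difference between the two variants concrete, here is a short example with made-up data in which one key from key_list is missing from old_dict:

```python
from collections import OrderedDict

# hypothetical sample data; 'd' is deliberately missing from old_dict
old_dict = {'b': 2, 'a': 1, 'c': 3}
key_list = ['c', 'a', 'd', 'b']

# missing keys kept, with None as their value
with_none = OrderedDict((k, old_dict.get(k)) for k in key_list)
# missing keys skipped
skipped = OrderedDict((k, old_dict[k]) for k in key_list if k in old_dict)

print(list(with_none.items()))  # [('c', 3), ('a', 1), ('d', None), ('b', 2)]
print(list(skipped.items()))    # [('c', 3), ('a', 1), ('b', 2)]
```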
In Python before 3.7 (and in most languages) dictionaries are unordered, so you can't "sort" a dictionary.
You can retrieve and sort the keys and iterate through those:
for key in sorted(product_list.keys()):
    item = product_list[key]
    item.doSomething()
Or you can use an OrderedDict, like so:
from collections import OrderedDict
And then build the dictionary in the required order (which is up to you to determine); below we sort using the keys:
product_list = OrderedDict(sorted(product_list.items(), key=lambda k: k[0]))
For reference, Dict.items() returns the (key, value) pairs as tuples (a list in Python 2, a view object in Python 3) in the form:
[(key1, value1), (key2, value2) , ... , (keyN, valueN)]
Before Python 3.7, a dictionary was unordered by definition. You can use OrderedDict from collections, as seen at http://docs.python.org/2/library/collections.html#collections.OrderedDict, as a drop-in replacement.
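On Python 3.7 or later, where a plain dict preserves insertion order, the same reordering can be sketched without OrderedDict. The sample data is hypothetical, and the approach assumes every key in key_list exists in old_dict.

```python
# assumes Python 3.7 or later, where dict preserves insertion order
old_dict = {'b': 2, 'a': 1, 'c': 3}   # hypothetical sample data
key_list = ['c', 'a', 'b']

new_dict = {k: old_dict[k] for k in key_list}
print(list(new_dict))  # ['c', 'a', 'b']
```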
