Finding matching keys in two large dictionaries and doing it fast

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries.
Say for example:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
I want to print out the value for Actinobacter (8924342) since it matches a key in myRDP.
The following code works, but is very slow:
for key in myRDP:
    for jey in myNames:
        if key == jey:
            print key, myNames[key]
I've tried the following but it always results in a KeyError:
for key in myRDP:
    print myNames[key]
Is there perhaps a function implemented in C for doing this? I've googled around but nothing seems to work.
Thanks.

Use sets, because they have a built-in intersection method which ought to be quick:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
rdpSet = set(myRDP)
namesSet = set(myNames)
for name in rdpSet.intersection(namesSet):
    print name, myNames[name]
# Prints: Actinobacter 8924342

You could do this:
for key in myRDP:
    if key in myNames:
        print key, myNames[key]
Your first attempt was slow because it compares every key in myRDP with every key in myNames. In algorithmic terms, if myRDP has n elements and myNames has m elements, that algorithm takes O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!
But testing whether a particular element is a key of a dictionary is fast; in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict test is O(1) on average, or constant-time. So my algorithm takes O(n) time, which is roughly one 600,000th of the time.
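To make the difference concrete, here is a small sketch in Python 3 syntax that counts the work done by each approach. The dictionaries, their sizes, and the function names are made up for illustration, and the sizes are far smaller than 600k so that the nested loop finishes quickly:

```python
# Hypothetical test data: 1,000 keys each, 500 of them shared.
left = {"key%d" % i: i for i in range(1000)}
right = {"key%d" % i: -i for i in range(500, 1500)}

def nested_loop_matches(a, b):
    """Compare every key of a with every key of b: O(n*m)."""
    comparisons = 0
    matches = []
    for key in a:
        for other in b:
            comparisons += 1
            if key == other:
                matches.append(key)
    return matches, comparisons

def membership_matches(a, b):
    """Do one hash lookup per key of a: O(n) on average."""
    lookups = 0
    matches = []
    for key in a:
        lookups += 1
        if key in b:
            matches.append(key)
    return matches, lookups

slow_matches, slow_count = nested_loop_matches(left, right)
fast_matches, fast_count = membership_matches(left, right)

print(slow_count)  # 1000000
print(fast_count)  # 1000
print(sorted(slow_matches) == sorted(fast_matches))  # True
```

Both functions find the same keys; only the amount of work differs.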

In Python 3 you can just do:
myNames.keys() & myRDP.keys()
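As a minimal sketch of that one-liner in Python 3, using the example data from the question: dictionary key views are set-like, so the & operator works on them directly without an explicit set() conversion.

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
myNames = {'Actinobacter': '8924342'}

# keys() returns a set-like view, so & yields a set of the shared keys.
common = myNames.keys() & myRDP.keys()

# Sorting is optional; it only makes the print order deterministic.
for key in sorted(common):
    print(key, myNames[key])
```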

for key in myRDP:
    name = myNames.get(key, None)
    if name:
        print key, name
dict.get returns the default value you give it (in this case, None) if the key doesn't exist.
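One caveat worth sketching: a truthiness test such as if name: also skips keys whose stored value is falsy (an empty string, 0, and so on). Assuming such keys should still count as matches, a unique sentinel object passed as the default distinguishes "missing" from "present but empty". The names _MISSING and matching_items are made up for this illustration, and the code uses Python 3 syntax:

```python
_MISSING = object()  # hypothetical sentinel; no stored value can be identical to it

def matching_items(first, second):
    """Return (key, value-from-second) pairs for keys present in both dicts."""
    pairs = []
    for key in first:
        value = second.get(key, _MISSING)
        if value is not _MISSING:
            pairs.append((key, value))
    return pairs

myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
# The empty identifier for 'subtilus sp.' is an assumption added for the demo.
myNames = {'Actinobacter': '8924342', 'subtilus sp.': ''}

print(matching_items(myRDP, myNames))
```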

You could start by finding the common keys and then iterating over them. Set operations should be fast because they are implemented in C, at least in modern versions of Python.
common_keys = set(myRDP).intersection(myNames)
for key in common_keys:
    print key, myNames[key]

The best and easiest way is simply to use the common set operations (Python 3).
a = {"a": 1, "b":2, "c":3, "d":4}
b = {"t1": 1, "b":2, "e":5, "c":3}
res = a.items() & b.items() # {('b', 2), ('c', 3)} For common Key and Value
res = {i[0]:i[1] for i in res} # In dict format
common_keys = a.keys() & b.keys() # {'b', 'c'}
Cheers!

Use the get method instead:
for key in myRDP:
    value = myNames.get(key)
    if value is not None:
        print key, "=", value

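You can simply write this code, which saves the common keys in a list. Testing membership against the dictionary itself rather than against myNames.keys() matters in Python 2, where keys() builds a list and every in test scans it linearly.
common = [i for i in myRDP if i in myNames]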

Copy both dictionaries into one dictionary/array. This makes sense as you have 1:1 related values. Then you need only one search, no comparison loop, and can access the related value directly.
Example Resulting Dictionary/Array:
[Name][Value1][Value2]
[Actinobacter][GATCGA...TCA][8924342]
[XYZbacter][BCABCA...ABC][43594344]
...
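A minimal sketch of that merged structure in Python 3. It assumes each name should map to a (sequence, identifier) pair and that names missing from either dictionary are simply left out; the answer does not specify either point.

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
myNames = {'Actinobacter': '8924342'}

# One pass over the shared keys builds name -> (sequence, identifier).
combined = {name: (myRDP[name], myNames[name])
            for name in myRDP.keys() & myNames.keys()}

# A single lookup now gives both related values.
sequence, identifier = combined['Actinobacter']
print(identifier)  # 8924342
```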

Here is my code for doing intersections, unions, differences, and other set operations on dictionaries:
class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect
    def removed(self):
        return self.set_past - self.intersect
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])
if __name__ == '__main__':
    import unittest
    class TestDictDifferNoChanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k) for k in range(3,8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5,6,7)))
        def testRemoved(self):
            self.assertEqual(self.d.removed(), set((0,1,2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set())
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set((3,4)))
    class TestDictDifferNoCUnchanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k+1) for k in range(3,8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5,6,7)))
        def testRemoved(self):
            self.assertEqual(self.d.removed(), set((0,1,2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set((3,4)))
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set())
    unittest.main()

def combine_two_json(json_request, json_request2):
    # Keep the keys present in both dicts, with the values from json_request2.
    intersect = {}
    for item in json_request:
        if item in json_request2:
            intersect[item] = json_request2[item]
    return intersect

Related

Python return key from value, but its a list in the dictionary

I'm quite new to Python, and maybe I've searched for the wrong words on Google...
My current problem:
In Python you can return the key for a value when that value is stored in a dictionary.
One thing I wonder: is it possible to return the key if the value I search for is part of a list of values for that key?
My test script is the following:
MainDict={'FAQ':['FAQ','faq','Faq']}
def key_return(X):
    for Y, value in MainDict.items():
        if X == value:
            return Y
    return "Key doesnt exist"
print(key_return(['FAQ', 'faq', 'Faq']))
print(key_return('faq'))
So I can only return the key if I ask for the whole list.
How can I return the key if I ask for just one value of that list, as in the second print? With the current code I get "Key doesnt exist" as the answer.

You can check to see if a value in the dict is a list, and if it is check to see if the value you're searching for is in the list.
MainDict = {'FAQ':['FAQ','faq','Faq']}
def key_return(X):
    for key, value in MainDict.items():
        if X == value:
            return key
        if isinstance(value, list) and X in value:
            return key
    return "Key doesnt exist"
print(key_return(['FAQ', 'faq', 'Faq']))
print(key_return('faq'))
Note: You should also consider making MainDict a parameter that you pass to key_return instead of a global variable.
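Following up on that note, a sketch of the same function with the dictionary passed in as a parameter. The parameter names and the default=None convention for "not found" are assumptions for illustration, not part of the original answer:

```python
def key_return(mapping, wanted, default=None):
    """Return the first key whose value equals wanted or, for list values, contains it."""
    for key, value in mapping.items():
        if wanted == value:
            return key
        if isinstance(value, list) and wanted in value:
            return key
    return default

main_dict = {'FAQ': ['FAQ', 'faq', 'Faq']}
print(key_return(main_dict, ['FAQ', 'faq', 'Faq']))  # FAQ
print(key_return(main_dict, 'faq'))                  # FAQ
print(key_return(main_dict, 'missing'))              # None
```

Returning a default instead of the string "Key doesnt exist" lets the caller tell a failed search apart from a key that happens to have that name.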

You can do this using next and a simple comprehension:
next(k for k, v in MainDict.items() if x == v or x in v)
So your code would look like:
MainDict = {'FAQ':['FAQ','faq','Faq']}
def key_return(x):
    return next(k for k, v in MainDict.items() if x == v or x in v)
print(key_return(['FAQ', 'faq', 'Faq']))
#FAQ
print(key_return('faq'))
#FAQ

You can create a dict that maps from values in the lists to keys in MainDict:
MainDict={'FAQ':['FAQ','faq','Faq']}
back_dict = {value: k for k,values in MainDict.items() for value in values}
Then rewrite key_return to use this dict:
def key_return(X):
    return back_dict[X]
print(key_return('faq'))
The line back_dict = {value: k for k,values in MainDict.items() for value in values} is a dictionary comprehension expression, which is equivalent to:
back_dict = {}
for k,values in MainDict.items():
    for value in values:
        back_dict[value] = k
This approach is more time-efficient than looping over every item of MainDict every time you search, since it only requires a single lookup rather than a loop.
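A sketch of how the reverse mapping might be built once and then queried many times. The function name build_reverse_index and the extra 'Help' entry are made up for illustration, and dict.get supplies a fallback for values that appear in no list:

```python
def build_reverse_index(mapping):
    """Map every list element back to the key whose list contains it."""
    return {value: key for key, values in mapping.items() for value in values}

# 'Help' is a hypothetical second entry added to show more than one key.
main_dict = {'FAQ': ['FAQ', 'faq', 'Faq'], 'Help': ['help', 'HELP']}
reverse_index = build_reverse_index(main_dict)

print(reverse_index.get('faq'))                            # FAQ
print(reverse_index.get('HELP'))                           # Help
print(reverse_index.get('unknown', 'Key does not exist'))  # Key does not exist
```

If the same element appears in more than one list, the key processed last wins, so this sketch assumes the lists do not overlap.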

dictionary value compare and reassign

I want to solve an itinerary (travel schedule) problem. Here is my existing code.
import array as arr
class Solution():
    def __init__(self):
        pass
    def printItenary(self,d):
        reverse_d = dict()
        for i in d:
            if i and d[i]:
                reverse_d[d[i]] = i
            else:
                print("Invalid Input")
                return
        for i in reverse_d:
            if reverse_d[i] not in reverse_d:
                starting_pt = reverse_d[i]
                break
        while(starting_pt in d):
            print(starting_pt,"->",d[starting_pt],end=", ")
            starting_pt = d[starting_pt]
if __name__=="__main__":
    d = dict()
    d["Chennai"] = "Banglore"
    d["Bombay"] = "Delhi"
    d["Goa"] = "Chennai"
    d["Delhi"] = "Goa"
    obj = Solution()
    obj.printItenary(d)
The problem is that if I add another line,
d["Chennai"] = "Delhi"
then there are multiple values for a single key. So I want to add a condition: if multiple destinations are given for the same city, priority goes by lexicographical order, unless that value is a dead end (that is, the last stop).
So my question is: how do I compare the dictionary data and update the value based on those conditions?

You need to make sure you handle the edge case of the key not being present. If you want it case insensitive, then do str.lower() as well in the comparison.
new_val = ...
val = d.get('Chennai')
val = min(val, new_val) if val else new_val
d['Chennai'] = val
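A small sketch of how that rule could be applied while building the dictionary. It assumes the trips arrive as (source, destination) pairs and ignores the dead-end exception from the question; add_leg and the sample data (including "Agra") are made up for illustration:

```python
def add_leg(routes, source, destination):
    """Store destination for source, keeping the lexicographically smaller one."""
    current = routes.get(source)
    # Python compares strings lexicographically with < and >.
    if current is None or destination < current:
        routes[source] = destination

routes = {}
legs = [("Chennai", "Banglore"),
        ("Goa", "Chennai"),
        ("Chennai", "Delhi"),
        ("Chennai", "Agra")]
for source, destination in legs:
    add_leg(routes, source, destination)

print(routes)  # {'Chennai': 'Agra', 'Goa': 'Chennai'}
```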

For determining lexicographical order you could use the ord() function; for example, ord('b') > ord('a') is True. Note that the code below compares only the first characters.
if "Chennai" in d:
    if ord(new_value[0]) > ord(d["Chennai"][0]):
        d["Chennai"] = new_value
else:
    d["Chennai"] = new_value

Modifying a nested dictionary element by a reference, generated from a list

The code:
def main():
    nested_dict = {'A': {'A_1': 'value_1', 'B_1': 'value_2'},
                   'B': 'value_3'}
    access_pattern = ['A', 'B_1']
    new_value = 'value_4'
    nested_dict[access_pattern] = new_value
    return nested_dict
Background information:
As can be seen, I have a variable called nested_dict - in reality, it contains hundreds of elements with a different number of sub-elements each (I'm simplifying it for the purpose of the example).
I need to modify the value of some elements inside this dictionary, but it is not predetermined which elements exactly. The specific "path" to the elements that need be modified, will be provided by the access_pattern variable, which will be different every time.
The problem:
I know how to reference the value of the dictionary with this function functools.reduce(dict.get, access_pattern, nested_dict). However, I do not know how to universally modify (regardless of the contained variable type) the value of the access_pattern in the dictionary.
The provided code produces a TypeError that I do not know how to overcome elegantly. I did think of one solution, shown under "Possible solutions" below.
Possible solutions:
if len(access_pattern) == 1:
    nested_dict[access_pattern[0]] = new_value
elif len(access_pattern) == 2:
    nested_dict[access_pattern[0]][access_pattern[1]] = new_value
...
And so on for every possible len().
This just seems VERY inelegant and painful. Is there a more practical way to achieve this?

Make use of recursion:
def edit_from_access_pattern(access_pattern, nested_dict, new_value):
    if len(access_pattern) == 1:
        nested_dict[access_pattern[0]] = new_value
    else:
        return edit_from_access_pattern(access_pattern[1:], nested_dict[access_pattern[0]], new_value)

You can use recursion:
def set_value(container, key, value):
    if len(key) == 1:
        container[key[0]] = value
    else:
        set_value(container[key[0]], key[1:], value)
but an explicit loop is probably going to be more efficient:
def set_value(container, key, value):
    for i in range(len(key)-1):
        container = container[key[i]]
    container[key[-1]] = value
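For comparison, a sketch that reuses the functools.reduce idea from the question: walk down to the parent container with all but the last key, then assign with the last key. It uses operator.getitem rather than dict.get so that a wrong key raises KeyError instead of silently yielding None; set_by_path is a hypothetical name:

```python
from functools import reduce
from operator import getitem

def set_by_path(container, path, value):
    """Assign value at the nested location described by path (a non-empty list of keys)."""
    # With a one-element path, reduce returns container itself.
    parent = reduce(getitem, path[:-1], container)
    parent[path[-1]] = value

nested_dict = {'A': {'A_1': 'value_1', 'B_1': 'value_2'},
               'B': 'value_3'}
set_by_path(nested_dict, ['A', 'B_1'], 'value_4')
print(nested_dict['A']['B_1'])  # value_4
```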

How to check that only one value of my dictionary is filled?

How can I check that my dict contains only one filled value?
I want to enter my condition only if that value (in my example "test2") is the only filled one in my dict.
For now I have this if statement:
my_dict = {}
my_dict["test1"] = ""
my_dict["test2"] = "example"
my_dict["test3"] = ""
my_dict["test4"] = ""
if my_dict["test2"] and not my_dict["test1"] and not my_dict["test3"] and not my_dict["test4"]:
    print("inside")
I would like to find a better, cleaner, and more PEP 8-friendly way to achieve that.
Any ideas?

You have to check every value for truthiness, there's no way around that, e.g.
if sum(1 for v in my_dict.values() if v) == 1:
    print('inside')
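If the filled value must also belong to a specific key, as in the question's "test2" example, counting alone is not enough. A minimal sketch, with only_key_filled as a hypothetical helper name:

```python
def only_key_filled(mapping, key):
    """Return True if mapping[key] is truthy and every other value is falsy."""
    filled_keys = [k for k, v in mapping.items() if v]
    return filled_keys == [key]

my_dict = {"test1": "", "test2": "example", "test3": "", "test4": ""}

if only_key_filled(my_dict, "test2"):
    print("inside")
```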

You can use filter() as below to count how many non-empty values the dictionary contains.
if len(list(filter(None, my_dict.values()))) == 1:
    print("inside")

Assuming that all your values are strings, what about
ref_key = "test2"
if ''.join(my_dict.values()) == my_dict[ref_key]:
    print("inside")
... since it looks like you have a precise key in mind (when you do if my_dict["test2"]). Otherwise, my answer is (twice) less general than (some) others'.

Maybe you want to check whether only one pair is left in the dictionary after removing the empty values.
my_dict = {}
my_dict["test1"] = ""
my_dict["test2"] = "example"
my_dict["test3"] = ""
my_dict["test4"] = ""
my_dict = {key: val for key, val in my_dict.items() if val}
if len(my_dict) == 1:
    print("inside")

Here is another flavour (without loops):
data = list(my_dict.values())
if data.count('') + 1 == len(data):
    print("Inside")

Working with python dictionaries

I am writing a function that takes in an argument. I want to compare that argument to a dictionary's set of keys and return the corresponding value for every match. So far I have only been able to return the matching keys themselves.
def func(str):
    a = []
    b = {'a':'b','c':'d','e':'f'}
    for i in str:
        if i in b.keys():
            a.append(i)
    return a
Output sample:
func('abcdefghiabcdefghi')
['a','c','e','a','c','e']
Wanted output:
['b','d','f','b','d','f']

Best not to use str as a variable name. I think your function can be written more simply like this:
def func(mystr):
    b = {'a':'b','c':'d','e':'f'}
    return [b[k] for k in mystr if k in b]
If you don't want to use a list comprehension, then you can fix it like this:
def func(mystr):
    a = []
    b = {'a':'b','c':'d','e':'f'}
    for i in mystr:
        if i in b: # i in b works the same as i in b.keys()
            a.append(b[i]) # look up the key(i) in the dictionary(b) here
    return a
