I am trying to implement an LHS pattern match with RHS action code in Python.
How can I get a fast hash table match?
Is this possible in Python?
I need to match features quickly in terms of (x, y, c), where x and y are coordinates and c is the color at index (x, y) of a 2D array.
A hashmap in Python is a dictionary.
There are several ways to create dictionaries; here are two:
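d = dict(k=v)  # the key is the string "k"
or
d = {"k": v}
To get the value of a key:
value = d.get("k")  # returns None if "k" is missing
or
value = d["k"]  # raises KeyError if "k" is missing
To set the value of a key:
d["k"] = "ok"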
Notes:
In Python, dictionaries (or “dicts”, for short) are a central data structure:
Dicts store an arbitrary number of objects, each identified by a unique dictionary key. Dictionaries are often also called maps, hashmaps, lookup tables, or associative arrays. They allow the efficient lookup, insertion, and deletion of any object associated with a given key.
Resources:
Dictionaries in Python
Dictionaries, Maps, and Hash Tables in Python
The native dict type maps each single hashable key to a value; it has no built-in notion of a two-dimensional table.
You can, however, simulate a 2D table by using (x, y) tuples as keys:
d = {}
d[(1,0)] = True
d[(1,1)] = False
This works because a tuple is hashable as long as every value it contains is hashable, so it can be used as a dictionary key.
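Applied to the question's (x, y, c) features, a minimal sketch might map every coordinate of a 2D array to its color once, then test a pattern with constant-time lookups. The names image, color_at, pattern, and matches() are hypothetical, and the pattern format (offsets from an anchor plus an expected color) is an assumption about what an LHS pattern looks like.

```python
# Hypothetical 2D array of colors: rows are indexed by y, columns by x.
image = [
    ["red", "red", "blue"],
    ["green", "red", "blue"],
]

# Map each (x, y) coordinate to its color c. Building this is one O(n) pass;
# each later lookup is an average O(1) hash lookup.
color_at = {(x, y): c
            for y, row in enumerate(image)
            for x, c in enumerate(row)}

# Hypothetical LHS pattern: (dx, dy, color) features relative to an anchor point.
pattern = [(0, 0, "red"), (1, 0, "red"), (0, 1, "green")]

def matches(anchor_x, anchor_y, pattern):
    """Return True if every feature in the pattern matches at the anchor.

    Coordinates outside the image give None from .get() and so never match.
    """
    return all(color_at.get((anchor_x + dx, anchor_y + dy)) == c
               for dx, dy, c in pattern)

print(matches(0, 0, pattern))  # True
print(matches(1, 0, pattern))  # False: (2, 0) is blue, not red
```

The RHS action would then run only when matches() returns True.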
Otherwise, you could extend the dict type to provide additional methods, letting you access values in a Java- or C-style 2D array:
d[1][0] = True
d[1][1] = False
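With a plain dict, d[1][0] = True raises KeyError unless d[1] already exists. A minimal sketch of d[1][0]-style access without writing a dict subclass, assuming collections.defaultdict is acceptable, creates each inner dict on first use (the name grid is illustrative):

```python
from collections import defaultdict

# Outer keys (x) map to inner dicts keyed by y; a missing x gets an empty inner dict.
grid = defaultdict(dict)
grid[1][0] = True
grid[1][1] = False

print(grid[1][0])      # True
print(grid[1].get(5))  # None: y=5 was never set
```

Note that merely reading grid[x] for an unseen x also inserts an empty inner dict; use grid.get(x) if that side effect is unwanted.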
Related
I'm looking for the fastest way to do the following: given a dictionary and a key value, return the lowest key in the dictionary greater than the value given. Per this question, the natural way would seem to be to create an OrderedDict, then use bisect on the keys to find the proper key location. The OrderedDict.keys() method doesn't support indexing, so, per e.g. this question, one has to convert the keys to a list before doing bisect or similar.
So once an OrderedDict has been created with its keys in order, accessing the correct position requires the following:
1. Convert the keys to a list.
2. Do a binary search of the keys with bisect or similar.
3. Check that the insertion point isn't at the end of the list before retrieving the key at that index.
4. Retrieve the corresponding value from the original OrderedDict.
I'm most concerned about step 1 above from an efficiency perspective (although all of this looks roundabout to me). Without knowing the details of how Python converts to a list, it seems it would have to be O(n), completely eliminating the savings of using OrderedDict and binary search. I'm hoping someone can tell me whether my assumption about step 1 is correct and, either way, whether there is a better method.
As an alternative, I could simply create a list of tuples sorted by the first element (the key), where the second element is the dict value associated with that key. Then I could pass key=lambda x: x[0] to bisect (the key parameter requires Python 3.10 or later). This seems reasonable, but I'd prefer to store my key/value pairs more uniformly (e.g. as JSON), since that's how it's done with other dicts in the same project that don't need this specific type of comparison.
Here's some example code for a single lookup. Edit: But lest anyone think I'm over-optimizing, the actual dictionary has ~3 million keys, and will be accessed ~7 million times in a batch, daily. So I'm very interested in finding a fast way of doing this.
# Single lookup example
from collections import OrderedDict
from bisect import bisect
d = OrderedDict()
d[5] = 'lowest_value'
d[7] = 'middle_value'
d[12] = 'highest_value'
sample_key = 6 # we want to find the value for the key above this in d, e.g. d[7]
list_of_keys = list(d.keys())
key_insertion_index = bisect(list_of_keys,sample_key)
if key_insertion_index < len(list_of_keys):
    next_higher_key = list_of_keys[key_insertion_index]
    v = d[next_higher_key]
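The concern about step 1 is justified: list(d.keys()) copies every key, so it is O(n) per call. If the dictionary does not change between lookups, though, the sorted key list only needs to be built once and can be reused for all of the daily lookups, leaving each lookup at O(log n). A minimal sketch under that assumption (next_higher_value is a hypothetical helper name); because sorted() supplies the order, a plain dict works and no OrderedDict is needed:

```python
from bisect import bisect

# Data mirroring the question's example.
d = {5: 'lowest_value', 7: 'middle_value', 12: 'highest_value'}

# Pay the O(n log n) sort once, outside the lookup loop.
# This is only valid while d is not modified between lookups.
sorted_keys = sorted(d)

def next_higher_value(sample_key):
    """Return the value for the smallest key strictly greater than sample_key, or None."""
    i = bisect(sorted_keys, sample_key)  # bisect_right: O(log n)
    if i < len(sorted_keys):
        return d[sorted_keys[i]]
    return None  # sample_key is >= every key

print(next_higher_value(6))   # middle_value
print(next_higher_value(12))  # None
```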
Say I want to store ordered values where the key values represent a lower bound. Like this example:
d = {1: "pear", 4: "banana", 7: "orange"}
I can access the first object with d[1]. I want to store the data so that looking up any value in [1, 4) returns "pear", and looking up any value in [4, 7) returns "banana". Is there a data structure like that in Python? I found interval trees, but they looked a bit more advanced than what I was looking for. In an interval tree the keys are intervals, which can overlap, and I don't want that. Or maybe a dictionary of any kind is not what I want, since a dictionary can mix data types as keys. What do you think?
EDIT: Based on the tip I got, this code works:
import bisect
d = [(1, "pear"), (4, "banana"), (7, "orange")]
keys = [j[0] for j in d]
for v in range(1, 10):
    print("Using input ", v)
    # Index of the largest key <= v; an input below 1 would give -1 and wrap to "orange"
    i = bisect.bisect(keys, v) - 1
    out = d[i]
    print(out)
    print("")
# Or using SortedDict
from sortedcontainers import SortedDict
d2 = SortedDict()
d2[1] = 'pear'
d2[4] = 'banana'
d2[7] = 'orange'
for v in range(1, 10):
    print("Using input ", v)
    i = bisect.bisect(d2.keys(), v) - 1
    j = d2.keys()[i]
    out = d2[j]
    print(out)
    print("")
The data structure you're looking for is a binary search tree (BST), preferably a balanced one. Your dictionary keys become the keys of the BST, and each node has an additional field to store the corresponding value. A lookup is then a floor search on the keys: find the largest key less than or equal to the query (with bisect, that is bisect_right minus one). Searching for Python red-black tree or AVL tree implementations turns up many possible packages.
There is no built-in library for always-sorted data. If you never need to add or delete keys, you can use bisect on a sorted list of (key, value) tuples.
For a pure Python implementation that allows modification, I would recommend checking out SortedDict from the SortedContainers library. It's built to be a drop-in replacement for BSTs, is very usable and well tested, and claims to outperform pointer-based BSTs in memory and speed on reasonably sized datasets (but does not have the same asymptotic guarantees as a BST). You can also provide a custom key for comparing objects of different types.
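For the static case, a minimal sketch of the floor lookup follows. It keeps the bounds and values in two parallel sorted lists (an illustrative choice, so bisect compares only the bounds) and guards the below-range case that would otherwise wrap around to the last entry. The names bounds, names, and floor_lookup are hypothetical.

```python
from bisect import bisect_right

# Lower bounds and their values, as parallel lists sorted by bound.
bounds = [1, 4, 7]
names = ["pear", "banana", "orange"]

def floor_lookup(x):
    """Return the value whose bound is the largest bound <= x, or None if x is below all bounds."""
    i = bisect_right(bounds, x) - 1
    if i < 0:
        # Without this check, names[-1] would silently return "orange".
        return None
    return names[i]

print(floor_lookup(3.9))  # pear
print(floor_lookup(4))    # banana
print(floor_lookup(0))    # None
```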
I am modeling data for an application and decided to use a dictionary as my data structure. But each row in the data has multiple keys, so I created a dictionary with multiple keys mapping each row, something like:
>>> multiKeyDict = {}
>>> multiKeyDict[('key1','key2','key3')] = 'value1'
>>> multiKeyDict.get(('key1','key2','key3'))
'value1'
Now I have to retrieve all the values with key1 in O(1) time. From my research, I know I could:
use the multi_key_dict package to get the job done, though I'm not sure it is O(1);
search for keys as suggested here: https://stackoverflow.com/a/18453567/4085019
I am also open for any better data structures instead of using the dictionary.
You don't have multiple keys. As far as the Python dictionary is concerned, there is just one key, a tuple object. You can't search for the constituents of the tuple in anything other than O(N) linear time.
If your keys are unique, just add each key individually:
multiKeyDict['key1'] = multiKeyDict['key2'] = multiKeyDict['key3'] = 'value1'
Now you have 3 keys all referencing one value. The value object is not duplicated here, only the references to it are.
The multi_key_dict package you found uses an intermediate mapping to map a given constituent key to the composite key, which then maps to the value. This gives you O(1) search too, with the same limitation that each constituent key must be unique.
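The intermediate-mapping idea can be sketched with two plain dicts. This is not the package's code, only the technique described above, and it assumes every constituent key is unique across all composite keys; composite_of and lookup are hypothetical names.

```python
# Composite key -> value, as in the question.
values = {('key1', 'key2', 'key3'): 'value1'}

# Intermediate mapping: each constituent key -> the composite key containing it.
composite_of = {}
for composite in values:
    for part in composite:
        composite_of[part] = composite

def lookup(part):
    """Two average-O(1) dict lookups: constituent -> composite -> value."""
    return values[composite_of[part]]

print(lookup('key2'))  # value1
```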
If your keys are not unique then you need to map each key to another container that then holds the values, like a set for instance:
for key in ('key1', 'key2', 'key3'):
    multiKeyDict.setdefault(key, set()).add(value)
Now looking up a key gives you the set of all values that that key references.
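A small end-to-end sketch of the non-unique case, using hypothetical rows in which key1 is shared by two composite keys. Note that the values must be hashable to be stored in a set.

```python
# Hypothetical data: two composite keys share the constituent 'key1'.
rows = {
    ('key1', 'key2', 'key3'): 'value1',
    ('key1', 'key4', 'key5'): 'value2',
}

by_part = {}
for composite, value in rows.items():
    for part in composite:
        # setdefault inserts an empty set the first time a constituent key is seen.
        by_part.setdefault(part, set()).add(value)

print(by_part['key4'])  # {'value2'}
print(by_part['key1'])  # contains 'value1' and 'value2' (set order may vary)
```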
If you need to be able to combine keys too, then you can add additional references with those combinations. Key-value pairings are relatively cheap, it's all just references. The key and value objects themselves are not duplicated.
Another possibility is to build an index that maps each key value to a list of the row objects containing it. Provided the number of rows sharing any particular key value is small, this will be quite efficient. (Assume row objects expose their keys as row.key1, row.key2, etc.; that detail isn't very relevant.) Untested code:
index = {}
for row in rows:
    index.setdefault(row.key1, []).append(row)
    index.setdefault(row.key2, []).append(row)
    index.setdefault(row.key3, []).append(row)
Then, to look up rows that match, say, key2 and key3:
candidates = index.get(key2, [])
if len(index.get(key3, [])) < len(candidates):
    candidates = index.get(key3, [])  # use key3 if it offers a better distribution
results = []
for cand in candidates:
    if cand.key2 == key2 and cand.key3 == key3:  # full test is necessary!
        results.append(cand)
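A runnable version of the index idea, as a sketch with a hypothetical Row namedtuple and sample data. One deviation from the untested code: each row is indexed under the set of its key values, so a row whose key1 and key2 happen to be equal is not listed twice under that value.

```python
from collections import namedtuple

# Hypothetical row type; the answer only assumes rows expose key1, key2, key3.
Row = namedtuple("Row", ["key1", "key2", "key3", "payload"])

rows = [
    Row("a", "b", "c", 1),
    Row("a", "x", "c", 2),
    Row("z", "b", "c", 3),
]

# Index every row under each distinct key value it contains.
index = {}
for row in rows:
    for k in {row.key1, row.key2, row.key3}:
        index.setdefault(k, []).append(row)

def find(key2, key3):
    """Return rows whose key2 and key3 match, scanning the shorter candidate list."""
    candidates = min(index.get(key2, []), index.get(key3, []), key=len)
    # The full test is needed: a value may appear under a different key position.
    return [r for r in candidates if r.key2 == key2 and r.key3 == key3]

print([r.payload for r in find("b", "c")])  # [1, 3]
```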
Apologies if this is a basic question about dictionaries. I'm learning Python, and my objective is to compare two dictionaries and recover the key-value pairs that are identical in both. I understand that, unlike with a list, order is not relevant in a dictionary. But I adapted some code to compare my dictionaries, and I just want to make sure that the order of the dictionaries does not matter.
The code I have written so far is:
def compare_dict(first, second):
    with open('Common_hits_python.txt', 'w') as file:
        for keyone in first:
            for keytwo in second:
                if keytwo == keyone:
                    if first[keyone] == second[keytwo]:
                        file.write(keyone + "\t" + first[keyone] + "\n")
Any recommendations would be appreciated, and I apologize for the redundancy in the code above. If someone could confirm that comparing two dictionaries this way does not require the keys to be in the same order, that would be great. Other ways of writing the function would be really appreciated as well.
Since you loop over both dictionaries and compare all the combinations, no, order doesn't matter. Every key in one dictionary is compared with every key in the other dictionary, eventually.
It is not a very efficient way to test for matching keys, however. Testing if a key is present is as simple as keyone in second, no need to loop over all the keys in second here.
Better still, you can use set intersections instead:
for key, value in first.viewitems() & second.viewitems():
    # loops over all key - value pairs that match in both.
    file.write('{}\t{}\n'.format(key, value))
This uses dictionary view objects; if you are using Python 3, then you can use first.items() & second.items() as dictionaries there return dictionary views by default.
Using dict.viewitems() as a set only works if the values are hashable too, but since you are treating your values as strings when writing to the file I assumed they were.
If your values are not hashable, you'll need to validate that the values match, but you can still use views and intersect just the keys:
for key in first.viewkeys() & second.viewkeys():
    # loops over all keys that match in both.
    if first[key] == second[key]:
        file.write('{}\t{}\n'.format(key, first[key]))
Again, in Python 3, use first.keys() & second.keys() for the intersection of the two dictionaries by keys.
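Both Python 3 variants side by side, as a sketch on hypothetical sample dictionaries (file writing omitted so the results are easy to inspect):

```python
# Hypothetical sample dictionaries.
first = {"a": "1", "b": "2", "c": "3"}
second = {"c": "3", "b": "20", "d": "4"}

# Items views behave like sets when the values are hashable.
common_items = first.items() & second.items()
print(sorted(common_items))  # [('c', '3')]

# If values may be unhashable, intersect only the keys, then compare the values.
common = {k: first[k] for k in first.keys() & second.keys() if first[k] == second[k]}
print(common)  # {'c': '3'}
```

Key "b" is present in both dictionaries but with different values, so neither variant keeps it.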
Your way of doing it is valid. Since you look through both dictionaries, the order of the dictionaries does not matter.
You could optimize your code like this instead:
for keyone in first:
    if keyone in second:  # returns True if keyone is present in second.
        if first[keyone] == second[keyone]:
            file.write(keyone + "\t" + first[keyone] + "\n")
The keys of a dictionary are effectively a set, and Python already has a built-in set type with an efficient intersection method. This will produce a set of keys that are common to both dictionaries:
dict0 = {...}
dict1 = {...}
set0 = set(dict0)
set1 = set(dict1)
keys = set0.intersection(set1)
Your goal is to build a dictionary out of these keys, which can be done with a dictionary comprehension. It will require a condition to keep out the keys that have unequal values in the two original dictionaries:
new_dict = {k: dict0[k] for k in keys if dict0[k] == dict1[k]}
Depending on your intended use for the new dictionary, you might want to copy or deepcopy the old dictionary's values into the new one.
Suppose I have a dictionary structure like this (or another data structure representing the same thing):
d = {
    42.123231: 'X',
    42.1432423: 'Y',
    45.3213213: 'Z',
    # ...etc.
}
I want to create a function like this:
def f(n, d, e):
    '''Return a list with the values in dictionary d corresponding to the float n
    within (+/-) the float error term e'''
So if I called the function like this with the above dictionary:
f(42,d,2)
It would return
['X','Y']
While it is straightforward to write this function with a loop, I don't want something that checks every key in the dictionary exhaustively; I want it to take advantage of an indexed structure (or even a sorted list) to make the search much faster.
A dictionary is the wrong data structure for this. Write a search tree.
A Python dictionary is a hash map implementation. Its keys can't be compared and traversed in order as in a search tree, so you simply can't do this with a Python dictionary without checking every key.
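If the data does not change after it is built, a sorted list of keys plus bisect gives the same O(log n + k) range query as a balanced search tree. A minimal sketch under that assumption: it adds a fourth parameter, sorted_keys (our addition, not part of the question's signature), so the sort is paid once rather than on every call. For data that changes, the SortedDict mentioned earlier in this thread offers range iteration through its irange method.

```python
from bisect import bisect_left, bisect_right

d = {42.123231: 'X', 42.1432423: 'Y', 45.3213213: 'Z'}

# Sort once; reuse for every query as long as d is not modified.
keys = sorted(d)

def f(n, d, e, sorted_keys):
    """Return the values in d whose keys lie within [n - e, n + e]."""
    lo = bisect_left(sorted_keys, n - e)   # first key >= n - e
    hi = bisect_right(sorted_keys, n + e)  # one past the last key <= n + e
    return [d[k] for k in sorted_keys[lo:hi]]

print(f(42, d, 2, keys))  # ['X', 'Y']
```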
A dictionary iterates in insertion order (guaranteed since Python 3.7), not in key order, so first rearrange it once into an OrderedDict sorted by key:
from collections import OrderedDict
d_ordered = OrderedDict(sorted(d.items(), key =lambda i:i[0]))
Then filtering the values is simple, and iteration stops at the upper bound:
import itertools
lower, upper = n - e, n + e
values = [val for k, val in
          itertools.takewhile(lambda kv: kv[0] < upper, d_ordered.items())
          if k > lower]
Don't skip the sort: a dictionary with numeric keys is not kept in key order, and takewhile only stops correctly at the upper bound if the keys arrive in ascending order.