Hash Map in Python

I want to implement a HashMap in Python. I want to ask the user for input and, depending on that input, retrieve some information from the HashMap: if the user enters a key of the HashMap, I would like to retrieve the corresponding value.
How do I implement this functionality in Python?
HashMap<String,String> streetno=new HashMap<String,String>();
streetno.put("1", "Sachin Tendulkar");
streetno.put("2", "Dravid");
streetno.put("3","Sehwag");
streetno.put("4","Laxman");
streetno.put("5","Kohli");

Python's dictionary (dict) is a built-in type that stores key-value pairs. It is the closest built-in equivalent to Java's HashMap.
You can create a dict with its key-value pairs already filled in:
streetno = {
    "1": "Sachin Tendulkar",
    "2": "Dravid",
    "3": "Sehwag",
    "4": "Laxman",
    "5": "Kohli"
}
You can also set a key-value mapping after creation:
streetno = {}
streetno["1"] = "Sachin Tendulkar"
print(streetno["1"]) # => "Sachin Tendulkar"
Another way to create a dictionary is the built-in dict() constructor. The keyword-argument form shown here only works when the keys are valid Python identifiers, so a key such as "1" cannot be written this way:
streetno = dict(one="Sachin Tendulkar", two="Dravid")
print(streetno["one"]) # => "Sachin Tendulkar"
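To tie this back to the original question (look up whatever the user types), a minimal sketch could look like the following. The helper names `lookup_player` and `main` and the fallback message are assumptions made for illustration; `dict.get` returns the fallback instead of raising `KeyError` when the key is missing.

```python
streetno = {
    "1": "Sachin Tendulkar",
    "2": "Dravid",
    "3": "Sehwag",
    "4": "Laxman",
    "5": "Kohli",
}


def lookup_player(key, table=streetno):
    """Return the value stored under key, or a fallback message (hypothetical helper)."""
    # input() returns a string, so string keys match what the user types.
    return table.get(key.strip(), "No entry for that key")


def main():
    # Call main() to try the lookup interactively.
    user_key = input("Enter a street number: ")
    print(lookup_player(user_key))
```

Stripping whitespace before the lookup is a design choice of this sketch, not something the dictionary does for you.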

All you wanted (at the time the question was originally asked) was a hint. Here's a hint: In Python, you can use dictionaries.

It's built-in for Python. See dictionaries.
Based on your example:
streetno = {"1": "Sachin Tendulkar",
            "2": "Dravid",
            "3": "Sehwag",
            "4": "Laxman",
            "5": "Kohli"}
You could then access it like so:
sachin = streetno["1"]
Also worth mentioning: any hashable object can be used as a key. In practice that means immutable types such as a string, a boolean, a number, or a tuple whose elements are themselves hashable.
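As a small illustration of the key rule, the sketch below stores values under a tuple, a boolean, and a string, and then shows that a list is rejected. The variable names are made up for the example.

```python
coordinates = {}
coordinates[(12, 34)] = "tuple key"     # a tuple of hashable items is hashable
coordinates[True] = "boolean key"
coordinates["name"] = "string key"

try:
    coordinates[[12, 34]] = "list key"  # lists are mutable and therefore unhashable
except TypeError as error:
    list_key_error = str(error)         # the message mentions an unhashable type
```

One caveat: `True == 1` and they hash the same, so `True` and `1` refer to the same dictionary entry.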

Hash maps are built into Python; they're called dictionaries:
streetno = {} #create a dictionary called streetno
streetno["1"] = "Sachin Tendulkar" #assign value to key "1"
Usage:
"1" in streetno #check if key "1" is in streetno
streetno["1"] #get the value from key "1"
See the documentation for more information, e.g. built-in methods and so on. They're great, and very common in Python programs (unsurprisingly).

streetno = { 1 : "Sachin Tendulkar",
2 : "Dravid",
3 : "Sehwag",
4 : "Laxman",
5 : "Kohli" }
And to retrieve values:
name = streetno.get(3, "default value")
Or
name = streetno[3]
That uses numbers as keys; put quotes around the numbers to use strings as keys instead.
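The difference between the two retrieval styles, and between integer and string keys, can be seen in a short sketch (variable names are illustrative only):

```python
streetno = {1: "Sachin Tendulkar", 2: "Dravid", 3: "Sehwag"}

by_int = streetno.get(3, "default value")    # key 3 exists
by_str = streetno.get("3", "default value")  # "3" is a different key, so the default comes back

try:
    streetno["3"]                            # square brackets raise instead of returning a default
    missing_raises = False
except KeyError:
    missing_raises = True
```

So if the key comes from `input()`, which always returns a string, either store string keys or convert the input with `int()` before the lookup.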

Here is an implementation of a hash map in Python.
For simplicity, the hash map has a fixed size of 16 buckets.
This can be changed easily.
Rehashing is out of scope for this code.
class Node:
    """One entry in a bucket's singly linked chain."""
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class HashMap:
    def __init__(self):
        # Fixed number of buckets; it must stay a power of two for the "& 15" mask.
        self.store = [None for _ in range(16)]

    def get(self, key):
        index = hash(key) & 15
        n = self.store[index]
        # Walk the chain until the key is found or the chain ends.
        while n is not None:
            if n.key == key:
                return n.value
            n = n.next
        return None

    def put(self, key, value):
        index = hash(key) & 15
        n = self.store[index]
        if n is None:
            self.store[index] = Node(key, value)
            return
        while True:
            if n.key == key:
                # Key already present (anywhere in the chain, including the
                # last node): overwrite instead of appending a duplicate.
                n.value = value
                return
            if n.next is None:
                break
            n = n.next
        n.next = Node(key, value)


hm = HashMap()
hm.put("1", "sachin")
hm.put("2", "sehwag")
hm.put("3", "ganguly")
hm.put("4", "srinath")
hm.put("5", "kumble")
hm.put("6", "dhoni")
hm.put("7", "kohli")
hm.put("8", "pandya")
hm.put("9", "rohit")
hm.put("10", "dhawan")
hm.put("11", "shastri")
hm.put("12", "manjarekar")
hm.put("13", "gupta")
hm.put("14", "agarkar")
hm.put("15", "nehra")
hm.put("16", "gawaskar")
hm.put("17", "vengsarkar")
print(hm.get("1"))
print(hm.get("2"))
print(hm.get("3"))
print(hm.get("4"))
print(hm.get("5"))
print(hm.get("6"))
print(hm.get("7"))
print(hm.get("8"))
print(hm.get("9"))
print(hm.get("10"))
print(hm.get("11"))
print(hm.get("12"))
print(hm.get("13"))
print(hm.get("14"))
print(hm.get("15"))
print(hm.get("16"))
print(hm.get("17"))
Output:
sachin
sehwag
ganguly
srinath
kumble
dhoni
kohli
pandya
rohit
dhawan
shastri
manjarekar
gupta
agarkar
nehra
gawaskar
vengsarkar
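Because the example stores 17 keys in 16 buckets, at least one bucket must hold a chain of two or more nodes. The sketch below recomputes the bucket indices with the same `hash(key) & 15` formula to show this. Which keys end up sharing a bucket varies between runs, since Python randomizes string hashes per process; the variable names are illustrative.

```python
from collections import defaultdict

keys = [str(n) for n in range(1, 18)]    # the 17 keys used above

buckets = defaultdict(list)
for key in keys:
    buckets[hash(key) & 15].append(key)  # same bucket formula as the class above

# With 17 keys and only 16 possible indices, some chain has length >= 2.
longest_chain = max(len(chain) for chain in buckets.values())
```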

class HashMap:
    def __init__(self):
        self.size = 64
        self.map = [None] * self.size

    def _get_hash(self, key):
        # Sum of the character codes of str(key), reduced to a bucket index.
        total = 0
        for char in str(key):
            total += ord(char)
        return total % self.size

    def add(self, key, value):
        key_hash = self._get_hash(key)
        key_value = [key, value]
        if self.map[key_hash] is None:
            self.map[key_hash] = [key_value]
            return True
        for pair in self.map[key_hash]:
            if pair[0] == key:
                pair[1] = value
                return True
        # Key not found in this bucket: append the pair itself, not a nested list.
        self.map[key_hash].append(key_value)
        return True

    def get(self, key):
        key_hash = self._get_hash(key)
        if self.map[key_hash] is not None:
            for pair in self.map[key_hash]:
                if pair[0] == key:
                    return pair[1]
        return None

    def delete(self, key):
        key_hash = self._get_hash(key)
        if self.map[key_hash] is None:
            return False
        for i in range(len(self.map[key_hash])):
            if self.map[key_hash][i][0] == key:
                self.map[key_hash].pop(i)
                return True
        return False

    def print(self):
        print('---Phonebook---')
        for item in self.map:
            if item is not None:
                print(str(item))


h = HashMap()
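One property of a character-sum hash like `_get_hash` is worth seeing in isolation: keys made of the same characters in a different order always land in the same bucket. The standalone function `char_sum_hash` below is a hypothetical copy of that idea, written only for this illustration.

```python
def char_sum_hash(key, size=64):
    """Sum of the code points of str(key), modulo the table size."""
    return sum(ord(char) for char in str(key)) % size


# Anagrams have identical character sums, so they always collide.
anagrams_collide = char_sum_hash("ab") == char_sum_hash("ba")

# Because the key is converted with str(), the int 12 hashes like the string "12".
int_and_str_collide = char_sum_hash(12) == char_sum_hash("21")
```

The table still works, because colliding pairs are kept in the same bucket list, but lookups get slower as more keys share a bucket.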

Python's Counter (from the collections module) is also an option when the values you need are counts:
from collections import Counter
counter = Counter(["Sachin Tendulkar", "Sachin Tendulkar", "other things"])
print(counter)
This prints a Counter, a dict subclass that maps each element of the list to its count:
Counter({'Sachin Tendulkar': 2, 'other things': 1})
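A short sketch of how a Counter behaves compared with a plain dict; the variable names are chosen for the example.

```python
from collections import Counter

counter = Counter(["Sachin Tendulkar", "Sachin Tendulkar", "other things"])

is_dict = isinstance(counter, dict)        # Counter is a dict subclass
seen_twice = counter["Sachin Tendulkar"]
never_seen = counter["Dravid"]             # missing keys count as 0 instead of raising KeyError
most_common = counter.most_common(1)       # list of (element, count) pairs, highest count first
```

Note that a Counter only maps elements to counts, so it does not replace a dict when you need arbitrary values such as names.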

In Python you would use a dictionary.
It is a very important type in Python and is used often.
You can create one easily with:
name = {}
Dictionaries have many methods:
# add entries:
>>> name['first'] = 'John'
>>> name['second'] = 'Doe'
>>> name
{'first': 'John', 'second': 'Doe'}
# you can store all objects and datatypes as value in a dictionary
# as key you can use all objects and datatypes that are hashable
>>> name['list'] = ['list', 'inside', 'dict']
>>> name[1] = 1
>>> name
{'first': 'John', 'second': 'Doe', 'list': ['list', 'inside', 'dict'], 1: 1}
Since Python 3.7, a dict preserves insertion order; in older versions the order was arbitrary and could not be relied on.
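On ordering specifically: since Python 3.7 the language guarantees that a dict iterates in insertion order. The sketch below shows what that means for updates and re-insertions; it is an illustration with made-up variable names.

```python
name = {}
name["first"] = "John"
name["second"] = "Doe"
name[1] = 1
keys_after_insert = list(name)   # keys come back in the order they were added

name["first"] = "Jane"           # updating a value keeps the key's position
del name["second"]
name["second"] = "Doe"           # deleting and re-adding moves the key to the end
keys_after_changes = list(name)
```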

A dictionary in Python is the best way to implement this. We can create the following dictionary using the given <key,value> pairs:
d = {"1": "Sachin Tendulkar", "2": "Dravid", "3": "Sehwag", "4": "Laxman", "5": "Kohli"}
To extract the value of a particular key, we can directly use d[key]:
name = d["1"] # The value of name would be "Sachin Tendulkar" here

This is my solution for LeetCode problem 706:
a hash map class with three methods: get, put, and remove.
class Item:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class MyHashMap:
    def __init__(self, size=100):
        self.items = [None] * size
        self.size = size

    def _get_index(self, key):
        return hash(key) & (self.size - 1)

    def put(self, key: int, value: int) -> None:
        index = self._get_index(key)
        item = self.items[index]
        if item is None:
            self.items[index] = Item(key, value)
        else:
            if item.key == key:
                item.value = value
            else:
                while True:
                    if item.key == key:
                        item.value = value
                        return
                    else:
                        if not item.next:
                            item.next = Item(key, value)
                            return
                        item = item.next

    def get(self, key: int) -> int:
        index = self._get_index(key)
        if self.items[index] is None:
            return -1
        item = self.items[index]
        while True:
            if item.key == key:
                return item.value
            else:
                if item.next:
                    item = item.next
                else:
                    return -1

    def remove(self, key: int) -> None:
        value = self.get(key)
        if value > -1:
            index = self._get_index(key)
            item = self.items[index]
            if item.key == key:
                self.items[index] = item.next if item.next else None
                return
            while True:
                if item.next and item.next.key == key:
                    item.next = item.next.next
                    return
                else:
                    if item.next:
                        item = item.next
                    else:
                        return
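A note on the index formula: masking with `size - 1` only spreads keys over every bucket when `size` is a power of two. With the default `size=100` the mask is 99 (binary 1100011), which has four set bits, so only 16 of the 100 buckets can ever be used. The map stays correct, but the chains get longer. The sketch below compares the mask with a plain modulo; the variable names are illustrative.

```python
size = 100

# For small non-negative ints, hash(key) == key in CPython.
buckets_with_mask = {hash(key) & (size - 1) for key in range(10_000)}
buckets_with_modulo = {hash(key) % size for key in range(10_000)}

used_with_mask = len(buckets_with_mask)      # limited by the set bits of 99
used_with_modulo = len(buckets_with_modulo)  # every bucket is reachable
```

Choosing a power-of-two size such as 128, or switching to `%`, would be two possible ways to use the whole table.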

Related

How to implement consistent hashing of integer type keys in Python custom dictionary?

I am trying to build a dictionary from scratch in Python. I have done most of the work but am stuck on one problem. I use the built-in hash() to get the hash of the key (which can be an int or a str), and the bucket index is that hash modulo the capacity of the dictionary. If the key is a string, everything works fine. As soon as the key is an integer, my custom dictionary breaks.
Sometimes everything works; other times the key gets index 0 (for instance) when it is added to the dictionary, but the same key gets index 4 (for instance) when I search for it, which raises a KeyError because the key is stored at index 0 and not 4. I believe that the index is first calculated as hash(key) % capacity with a capacity of, say, 4, but as soon as the capacity is doubled, hash(key) % capacity (now 8) returns a different index, which causes the problem. I saw this formula (hash(key) % capacity) on Wikipedia. Is this the problem I am facing? If not, what is actually causing this behavior, and how do I fix it?
Here is my code below:
class MyDictionary:
__LOAD_FACTOR_LIMIT = 0.75
__DEFAULT_CAPACITY = 4
def __init__(self):
self.__capacity = self.__DEFAULT_CAPACITY
self.__keys = [[] for i in range(self.__capacity)]
self.__values = [[] for i in range(self.__capacity)]
@property
def keys(self):
return [item for current_list in self.__keys for item in current_list]
@property
def values(self):
return [value for value_list in self.__values for value in value_list]
def __setitem__(self, key, value):
while self.__compute_load_factor() >= self.__LOAD_FACTOR_LIMIT:
self.__extend_dict()
index_hash = self.__hash_function(key)
if self.__is_key_in_dict(index_hash, key):
self.__set_value_to_an_existing_key(index_hash, key, value)
return
self.__set_value_to_a_new_key(index_hash, key, value)
def __getitem__(self, key):
index_hash = self.__hash_function(key)
if self.__is_key_in_dict(index_hash, key):
index_bucket = self.__get_index_bucket(index_hash, key)
return self.__values[index_hash][index_bucket]
raise KeyError('Key is not in dictionary!')
def __str__(self):
key_values = zip(self.keys, self.values)
result = '{' + ", ".join([f"{key}: {value}"
if isinstance(key, int) else f"'{key}': {value}"
for key, value in key_values]) + '}'
return result
def __hash_function(self, key):
index_hash = hash(key) % self.__capacity
return index_hash
def __is_key_in_dict(self, index_hash, key):
if key in self.__keys[index_hash]:
return True
return False
def __get_index_bucket(self, index_hash, key):
index_bucket = self.__keys[index_hash].index(key)
return index_bucket
def __extend_dict(self):
self.__keys += [[] for i in range(self.__capacity)]
self.__values += [[] for i in range(self.__capacity)]
self.__capacity *= 2
def __set_value_to_an_existing_key(self, index_hash, key, value):
index_bucket = self.__get_index_bucket(index_hash, key)
self.__values[index_hash][index_bucket] = value
def __set_value_to_a_new_key(self, index_hash, key, value):
self.__keys[index_hash].append(key)
self.__values[index_hash].append(value)
def __compute_load_factor(self):
k = len(self.__keys)
n = len([bucket for bucket in self.__keys if bucket])
return n / k
def get(self, key, return_value=None):
try:
index_hash = self.__hash_function(key)
index_bucket = self.__get_index_bucket(index_hash, key)
if self.__is_key_in_dict(index_hash, key):
return self.__keys[index_hash][index_bucket]
raise KeyError('Key is not in dictionary!')
except KeyError:
return return_value
def add(self):
pass
def pop(self):
pass
def clear(self):
self.__capacity = self.__DEFAULT_CAPACITY
self.__keys = [[] for i in range(self.__capacity)]
self.__values = [[] for i in range(self.__capacity)]
def items(self):
zipped_key_value = zip(self.keys, self.values)
return [item for item in zipped_key_value]
dictionary = MyDictionary()
dictionary.add()
dictionary[4] = 'hey'
dictionary['2'] = 'cya'
dictionary['4'] = 'welcome'
dictionary['5'] = 'welcome'
dictionary['32'] = 'heya'
dictionary['31'] = 'heya'
dictionary['36'] = 'heya'
dictionary['34'] = 'heya'
print(dictionary[4])

This is because you increase the capacity (stored in the __capacity attribute) by calling the __extend_dict method when the load is over a threshold, which makes the indices of the buckets in which the existing values are stored no longer valid, since you always derive the indices by taking the modulo of the capacity.
You should therefore re-insert the existing keys and values at their new indices every time you increase the dict's capacity:
def __extend_dict(self):
    self.__capacity *= 2
    new_keys = [[] for _ in range(self.__capacity)]
    new_values = [[] for _ in range(self.__capacity)]
    for keys, values in zip(self.__keys, self.__values):
        for key, value in zip(keys, values):
            index_hash = self.__hash_function(key)
            new_keys[index_hash].append(key)
            new_values[index_hash].append(value)
    self.__keys = new_keys
    self.__values = new_values
Demo: https://replit.com/#blhsing/NewEnchantingPerimeter
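The effect described in the question can be reproduced with the key 4 on its own: in CPython `hash(4)` is 4, so the index is 0 at capacity 4 and 4 at capacity 8. The sketch below shows that, followed by a stripped-down rehash over a list of buckets. The function `rehash` and its `(key, value)` bucket layout are assumptions for illustration and are simpler than the class above.

```python
key = 4
index_before = hash(key) % 4   # capacity 4
index_after = hash(key) % 8    # capacity 8, same key, different bucket


def rehash(buckets, new_capacity):
    """Redistribute every (key, value) pair for the new capacity (hypothetical helper)."""
    new_buckets = [[] for _ in range(new_capacity)]
    for bucket in buckets:
        for k, v in bucket:
            new_buckets[hash(k) % new_capacity].append((k, v))
    return new_buckets


old_buckets = [[] for _ in range(4)]
old_buckets[hash(key) % 4].append((key, "hey"))
new_buckets = rehash(old_buckets, 8)
```

After the rehash the pair sits where a lookup with the new capacity will search for it.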

Python: List index out of range in while loop

What I am trying to do
I have this hash table into which I am trying to insert values using double hashing; however, I am getting this error:
if hashtable_list[hashKey] == None:
IndexError: list index out of range
I have been at this for hours and can't find where I am going wrong with this double hashing algorithm. Can someone please help me? Any help will be much appreciated.
# The HashParent class is the main class and follows an ADT
# in which it holds the key and value
class HashParent:
def __init__(self, key, value):
self.key = key
self.value = value
self.isItemDeleted = False
class HashTable(object):
"""
a basic, minimal implementation of a hash map
"""
def __init__(self):
"""
constructs a new Map
"""
#Create a table size of 4 None values eg [None, None, None, None]
self.table = [None] * 4
self.hashTableSize = 0
#Uses Linear Probing to hash values into the table
def __get_hash_code(self, key, value):
return (hash(key) + value) % len(self.table)
# Uses quadratic probing to hash values into the table
def hashUsingQudratic(self, key, value):
return (hash(key) + value ** 2) % len(self.table)
def double_hashing(self, key, value):
hashtable_size = self.hashTableSize
hashtable_list = self.table
hashKey = hash(key)
if hashtable_list[hashKey] == None:
hashtable_list[hashKey] = key
else:
new_hashkey = hashKey
while hashtable_list[new_hashkey] is not None:
steps = value - (key % value)
new_hashkey = (new_hashkey + steps) % hashtable_size
hashtable_list[new_hashkey] = key
return hashtable_list
def getitem(self, key):
"""
gets the value associated with the key
"""
hashTableLength = len(self.table)
for i in range(hashTableLength):
index = self.__get_hash_code(key, i)
if self.table[index] != None:
if self.table[index].key == key:
if self.table[index].isItemDeleted:
raise KeyError('Key is not in the map')
else:
return self.table[index].value
elif self.table[index] is None:
raise KeyError('Key is not in the map')
raise KeyError('Hmm something has gone wrong here')
def whichMethod(self, whichType, key,i):
if whichType == 'linear':
index = self.__get_hash_code(key, i)
return index
if whichType == 'quadratic':
index = self.hashUsingQudratic(key, i)
return index
if whichType == 'double':
index = self.double_hashing(key, i)
return index
def putItem(self, key, item, whichType):
"""
stores the key value combo in the table
implements open addressing collision resolution
"""
parent = HashParent(key, item)
for i in range(len(self.table)):
index = self.whichMethod(whichType,key,i)
if self.table[index] is None or self.table[index].isItemDeleted:
self.table[index] = parent
self.hashTableSize += 1
break
def deleteValue(self, key):
"""
deletes a value from the hash table
"""
hashTableLength = len(self.table)
for i in range(hashTableLength):
index = self.__get_hash_code(key, i)
if self.table[index] != None:
if self.table[index].key == key:
if self.table[index].isItemDeleted:
raise KeyError('Key is not in the map')
else:
self.table[index].isItemDeleted = True
self.hashTableSize -= 1
break
m = HashTable()
linear = 'linear'
quadratic = 'quadratic'
doubleHash = 'double'
m.putItem('first', 1,doubleHash)
m.putItem('ninth',9 ,doubleHash)
m.putItem('third', 3,doubleHash)
m.putItem('Tenth', 10,doubleHash)
print("The value at key 'ninth' is:" ,m.getitem('ninth'))
m.deleteValue('Tenth')
#Size should now be 3
print('The Hashatble size is:',m.hashTableSize)

Ok, so you do m.putItem('first', 1, doubleHash), so key is "first".
You pass key from putItem to whichMethod, then from whichMethod to double_hashing.
Then, double_hashing does this:
hashtable_list = self.table
self.table starts out as self.table = [None] * 4. It just has four Nones. So hashtable_list will be [None, None, None, None].
Then, it does:
hashKey = hash(key)
if hashtable_list[hashKey] == None:
hash returns an integer, and key is "first". Let's just try that in the interpreter:
>>> hash("first")
-4954399314613441385
>>>
So, hashtable_list[hashKey] is like saying [None, None, None, None][hash("first")], which is like saying [None, None, None, None][-4954399314613441385]. There's your IndexError.
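One possible way to avoid the IndexError is to reduce the hash to the table length before indexing, as the linear and quadratic helpers in the question already do. The sketch below shows only that step; `slot_for` is a hypothetical helper, and the sketch does not address other parts of `double_hashing` that would likely need attention too (for example `key % value` with a string key, or taking the modulus from `hashTableSize`, which starts at 0).

```python
table = [None] * 4


def slot_for(key, table_size):
    # In Python, % with a positive right operand always yields a result in
    # range(table_size), even when hash(key) is negative.
    return hash(key) % table_size


index = slot_for("first", len(table))
table[index] = "first"

# The negative hash value shown above also maps to a valid slot.
slot_of_negative_hash = -4954399314613441385 % 4
```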

Converting a string to a nested dictionary

I have a string which I want to convert to a nested dictionary in Python.
Example Input :
import copy
diff_str = "/pathConstraint/latency/latencyValue"
value = "low"
diff_arr = diff.split("/")
final_temp_dict = dict()
for elem in reversed(diff_arr):
    if len(final_temp_dict) == 0:
        final_temp_dict.setdefault(elem, value)
    else:
        temp_final_dict = copy.deepcopy(final_temp_dict)
        final_temp_dict.setdefault(elem, temp_final_dict)
print (final_temp_dict)
While running this I face an error and I'm not getting the expected output.
The output needed is as a nested dictionary:
{"pathConstraint" : {"latency" : {"latencyValue" : "low"}}}

You could use the following recursive function:
def string_to_dict(keys, value):
    key = keys.split('/')
    if len(key) == 2:
        return {key[1]: value}
    else:
        return string_to_dict('/'.join(key[:-1]), {key[-1]: value})
Output:
>>> string_to_dict(diff_str, value)
{'pathConstraint': {'latency': {'latencyValue': 'low'}}}
Note that this assumes that diff_str begins with a / character.

The following is an iterative approach. Note diff_arr[1:] is used to exclude the empty string that is generated from splitting on the initial /.
diff_str = "/pathConstraint/latency/latencyValue"
value = "low"
diff_arr = diff_str.split("/")
for key in list(reversed(diff_arr[1:])):
    value = {key: value}
print(value)
Output
{'pathConstraint': {'latency': {'latencyValue': 'low'}}}

Shorter recursive approach:
def to_dict(d, v):
    return v if not d else {d[0]:to_dict(d[1:], v)}
diff_str = "/pathConstraint/latency/latencyValue"
value = "low"
print(to_dict(list(filter(None, diff_str.split('/'))), value))
Output:
{'pathConstraint': {'latency': {'latencyValue': 'low'}}}

I tried to modify your function as little as possible; this should work just fine:
import copy
def func():
    diff_str = "/pathConstraint/latency/latencyValue"
    value = "low"
    diff_arr = diff_str.split("/")
    final_temp_dict = dict()
    for elem in reversed(diff_arr):
        if elem == "":
            continue
        if len(final_temp_dict) == 0:
            final_temp_dict[elem] = value
        else:
            temp_final_dict = copy.deepcopy(final_temp_dict)
            final_temp_dict = {}
            final_temp_dict[elem] = temp_final_dict
    print (final_temp_dict)
However, there are much nicer ways to do something like this. See the other answers for inspiration.

def convert(items, value):
    if not items:
        return value
    return {items.pop(0): convert(items, value)}
print(convert(diff_str.strip('/').split('/'), 'low'))
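For comparison, the same wrapping-from-the-inside idea can be written with `functools.reduce`. This is only a sketch: the function name `path_to_nested_dict` and the `sep` parameter are assumptions for illustration. Unlike `convert` above, it does not mutate the list it works on, and empty segments (such as the one before a leading `/`) are skipped.

```python
from functools import reduce


def path_to_nested_dict(path, value, sep="/"):
    """Wrap value in one dict per path segment, innermost segment first."""
    keys = [part for part in path.split(sep) if part]
    # Start from the value and wrap it once for each key, from last to first.
    return reduce(lambda inner, key: {key: inner}, reversed(keys), value)


nested = path_to_nested_dict("/pathConstraint/latency/latencyValue", "low")
```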

What is the most efficient data structure to build a large word-to-index-to-word dictionary?

I would like to index a very large number of strings (mapping each string to a numeric value) but also be able to retrieve each string from its numeric index.
Using hash tables or a Python dict is not an option because of memory issues, so I decided to use a radix trie to store the strings. With it I can retrieve the index of any string very quickly and handle a very large number of strings.
My problem is that I also need to retrieve the strings from their numeric index, and if I maintain a "reverse index" list [string1, string2, ..., stringn] I'll lose the memory benefit of the trie.
I thought the "reverse index" could be a list of pointers to the last node of a trie-like structure, but first, there are no pointers in Python, and second, I'm not sure I can get node-level access to the trie structure I'm currently using.
Does this kind of data structure already exist? And if not, how would you do this in Python?

As per "What data structure to use to have O(log n) key AND value lookup?", you need two synchronized data structures for key and value lookups, each holding references to the other's leaf nodes.
The structure for the ID lookup can be anything with sufficient efficiency: a balanced tree, a hash table, another trie.
To be able to extract the value from a leaf node reference, a trie needs to allow 1) leaf node references themselves (not necessarily a real Python reference, anything that its API can use); 2) walking up the trie to extract the word from that reference.
Note that a reference is effectively a unique integer, so if your IDs are not larger than an integer, it makes sense to reuse something as IDs, e.g. the trie node references themselves. Then, if the trie API can validate such a reference (i.e. tell whether it has a used node with such a reference), this will act as the ID lookup and you don't need the second structure at all. The IDs will be non-persistent this way, though, because reference values (effectively memory addresses) change between processes and runs.
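The "walk up the trie from a leaf reference" idea can be sketched as follows. This is an illustration under simplifying assumptions, not the structure from the question: it uses a plain (uncompressed) character trie with dict children, which costs more memory than a radix trie, and the names `TrieNode`, `TwoWayIndex`, `id_of`, and `word_of` are invented for the example. The reverse index stores node references rather than copies of the strings.

```python
class TrieNode:
    __slots__ = ("char", "parent", "children", "word_id")

    def __init__(self, char="", parent=None):
        self.char = char
        self.parent = parent    # lets us walk back up to rebuild the word
        self.children = {}
        self.word_id = None     # set only on nodes that end a stored word


class TwoWayIndex:
    def __init__(self):
        self.root = TrieNode()
        self.nodes_by_id = []   # reverse index: id -> terminal node

    def add(self, word):
        node = self.root
        for char in word:
            child = node.children.get(char)
            if child is None:
                child = TrieNode(char, node)
                node.children[char] = child
            node = child
        if node.word_id is None:
            node.word_id = len(self.nodes_by_id)
            self.nodes_by_id.append(node)
        return node.word_id

    def id_of(self, word):
        node = self.root
        for char in word:
            node = node.children[char]   # raises KeyError if the path is missing
        if node.word_id is None:
            raise KeyError(word)
        return node.word_id

    def word_of(self, word_id):
        node = self.nodes_by_id[word_id]
        chars = []
        while node.parent is not None:   # climb until the root is reached
            chars.append(node.char)
            node = node.parent
        return "".join(reversed(chars))


index = TwoWayIndex()
car_id = index.add("car")
cart_id = index.add("cart")
dog_id = index.add("dog")
```

Here `index.word_of(cart_id)` rebuilds "cart" by climbing parent links, so no second copy of the string is kept.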

I'm answering my own question because I finally ended up creating my own data structure, which is well suited to the word-to-index-to-word problem I had and uses only Python 3 built-ins.
I tried to make it clean and efficient, but there's obviously room for improvement, and a C binding would be better.
The final result is an indexedtrie class that looks like a Python dict (or a defaultdict if you invoke it with a default_factory parameter) but can also be queried like a list, because a kind of "reverse index" is automatically maintained.
The keys, which are stored in an internal radix trie, can be any subscriptable object (bytes, strings, tuples, lists), and the values can be anything you want.
The indexedtrie class is also picklable, and you get the usual advantages of radix tries, such as prefix search.
Each key in the trie is associated with a unique integer index. You can retrieve the key from the index or the index from the key, and the whole thing is fast and memory-efficient, so I personally think it's one of the best data structures in the world and that it should be integrated into the Python standard library :).
Enough talking, here is the code; feel free to adapt and use it:
"""
A Python3 indexed trie class.
An indexed trie's key can be any subscriptable object.
Keys of the indexed trie are stored using a "radix trie", a space-optimized data-structure which has many advantages (see https://en.wikipedia.org/wiki/Radix_tree).
Also, each key in the indexed trie is associated with a unique index, which is built dynamically.
Indexed trie is used like a python dictionary (and even a collections.defaultdict if you want to) but its values can also be accessed or updated (but not created) like a list!
Example:
>>> t = indexedtrie()
>>> t["abc"] = "hello"
>>> t[0]
'hello'
>>> t["abc"]
'hello'
>>> t.index2key(0)
'abc'
>>> t.key2index("abc")
0
>>> t[:]
['hello']
>>> print(t)
{(0, abc): hello}
"""
__author__ = "@fbparis"
_SENTINEL = object()
class _Node(object):
"""
A single node in the trie.
"""
__slots__ = "_children", "_parent", "_index", "_key"
def __init__(self, key, parent, index=None):
self._children = set()
self._key = key
self._parent = parent
self._index = index
self._parent._children.add(self)
class IndexedtrieKey(object):
"""
A pair (index, key) acting as an indexedtrie's key
"""
__slots__ = "index", "key"
def __init__(self, index, key):
self.index = index
self.key = key
def __repr__(self):
return "(%d, %s)" % (self.index, self.key)
class indexedtrie(object):
"""
The indexed trie data-structure.
"""
__slots__ = "_children", "_indexes", "_values", "_nodescount", "_default_factory"
def __init__(self, items=None, default_factory=_SENTINEL):
"""
A list of items can be passed to initialize the indexed trie.
"""
self._children = set()
self.setdefault(default_factory)
self._indexes = []
self._values = []
self._nodescount = 0 # keeping track of nodes count is purely informational
if items is not None:
for k, v in items:
if isinstance(k, IndexedtrieKey):
self.__setitem__(k.key, v)
else:
self.__setitem__(k, v)
@classmethod
def fromkeys(cls, keys, value=_SENTINEL, default_factory=_SENTINEL):
"""
Build a new indexedtrie from a list of keys.
"""
obj = cls(default_factory=default_factory)
for key in keys:
if value is _SENTINEL:
if default_factory is not _SENTINEL:
obj[key] = obj._default_factory()
else:
obj[key] = None
else:
obj[key] = value
return obj
@classmethod
def fromsplit(cls, keys, value=_SENTINEL, default_factory=_SENTINEL):
"""
Build a new indexedtrie from a splitable object.
"""
obj = cls(default_factory=default_factory)
for key in keys.split():
if value is _SENTINEL:
if default_factory is not _SENTINEL:
obj[key] = obj._default_factory()
else:
obj[key] = None
else:
obj[key] = value
return obj
def setdefault(self, factory=_SENTINEL):
"""
"""
if factory is not _SENTINEL:
# indexed trie will act like a collections.defaultdict except in some cases because the __missing__
# method is not implemented here (on purpose).
# That means that simple lookups on a non existing key will return a default value without adding
# the key, which is the more logical way to do.
# Also means that if your default_factory is for example "list", you won't be able to create new
# items with "append" or "extend" methods which are updating the list itself.
# Instead you have to do something like trie["newkey"] += [...]
try:
_ = factory()
except TypeError:
# a default value is also accepted as default_factory, even "None"
self._default_factory = lambda: factory
else:
self._default_factory = factory
else:
self._default_factory = _SENTINEL
def copy(self):
"""
Return a pseudo-shallow copy of the indexedtrie.
Keys and nodes are deepcopied, but if you store some referenced objects in values, only the references will be copied.
"""
return self.__class__(self.items(), default_factory=self._default_factory)
def __len__(self):
return len(self._indexes)
def __repr__(self):
if self._default_factory is not _SENTINEL:
default = ", default_value=%s" % self._default_factory()
else:
default = ""
return "<%s object at %s: %d items, %d nodes%s>" % (self.__class__.__name__, hex(id(self)), len(self), self._nodescount, default)
def __str__(self):
ret = ["%s: %s" % (k, v) for k, v in self.items()]
return "{%s}" % ", ".join(ret)
def __iter__(self):
return self.keys()
def __contains__(self, key_or_index):
"""
Return True if the key or index exists in the indexed trie.
"""
if isinstance(key_or_index, IndexedtrieKey):
return key_or_index.index >= 0 and key_or_index.index < len(self)
if isinstance(key_or_index, int):
return key_or_index >= 0 and key_or_index < len(self)
if self._seems_valid_key(key_or_index):
try:
node = self._get_node(key_or_index)
except KeyError:
return False
else:
return node._index is not None
raise TypeError("invalid key type")
def __getitem__(self, key_or_index):
"""
"""
if isinstance(key_or_index, IndexedtrieKey):
return self._values[key_or_index.index]
if isinstance(key_or_index, int) or isinstance(key_or_index, slice):
return self._values[key_or_index]
if self._seems_valid_key(key_or_index):
try:
node = self._get_node(key_or_index)
except KeyError:
if self._default_factory is _SENTINEL:
raise
else:
return self._default_factory()
else:
if node._index is None:
if self._default_factory is _SENTINEL:
raise KeyError
else:
return self._default_factory()
else:
return self._values[node._index]
raise TypeError("invalid key type")
def __setitem__(self, key_or_index, value):
"""
"""
if isinstance(key_or_index, IndexedtrieKey):
self._values[key_or_index.index] = value
elif isinstance(key_or_index, int):
self._values[key_or_index] = value
elif isinstance(key_or_index, slice):
raise NotImplementedError
elif self._seems_valid_key(key_or_index):
try:
node = self._get_node(key_or_index)
except KeyError:
# create a new node
self._add_node(key_or_index, value)
else:
if node._index is None:
# if node exists but not indexed, we index it and update the value
self._add_to_index(node, value)
else:
# else we update its value
self._values[node._index] = value
else:
raise TypeError("invalid key type")
def __delitem__(self, key_or_index):
"""
"""
if isinstance(key_or_index, IndexedtrieKey):
node = self._indexes[key_or_index.index]
elif isinstance(key_or_index, int):
node = self._indexes[key_or_index]
elif isinstance(key_or_index, slice):
raise NotImplementedError
elif self._seems_valid_key(key_or_index):
node = self._get_node(key_or_index)
if node._index is None:
raise KeyError
else:
raise TypeError("invalid key type")
# switch last index with deleted index (except if deleted index is last index)
last_node, last_value = self._indexes.pop(), self._values.pop()
if node._index != last_node._index:
last_node._index = node._index
self._indexes[node._index] = last_node
self._values[node._index] = last_value
if len(node._children) > 1:
#case 1: node has more than 1 child, only turn index off
node._index = None
elif len(node._children) == 1:
# case 2: node has 1 child
child = node._children.pop()
child._key = node._key + child._key
child._parent = node._parent
node._parent._children.add(child)
node._parent._children.remove(node)
del(node)
self._nodescount -= 1
else:
# case 3: node has no child, check the parent node
parent = node._parent
parent._children.remove(node)
del(node)
self._nodescount -= 1
if hasattr(parent, "_index"):
if parent._index is None and len(parent._children) == 1:
node = parent._children.pop()
node._key = parent._key + node._key
node._parent = parent._parent
parent._parent._children.add(node)
parent._parent._children.remove(parent)
del(parent)
self._nodescount -= 1
@staticmethod
def _seems_valid_key(key):
"""
Return True if "key" can be a valid key (must be subscriptable).
"""
try:
_ = key[:0]
except TypeError:
return False
return True
def keys(self, prefix=None):
"""
Yield keys stored in the indexedtrie where key is a IndexedtrieKey object.
If prefix is given, yield only keys of items with key matching the prefix.
"""
if prefix is None:
for i, node in enumerate(self._indexes):
yield IndexedtrieKey(i, self._get_key(node))
else:
if self._seems_valid_key(prefix):
empty = prefix[:0]
children = [(empty, prefix, child) for child in self._children]
while len(children):
_children = []
for key, prefix, child in children:
if prefix == child._key[:len(prefix)]:
_key = key + child._key
_children.extend([(_key, empty, _child) for _child in child._children])
if child._index is not None:
yield IndexedtrieKey(child._index, _key)
elif prefix[:len(child._key)] == child._key:
_prefix = prefix[len(child._key):]
_key = key + prefix[:len(child._key)]
_children.extend([(_key, _prefix, _child) for _child in child._children])
children = _children
else:
raise ValueError("invalid prefix type")
def values(self, prefix=None):
"""
Yield values stored in the indexedtrie.
If prefix is given, yield only values of items with key matching the prefix.
"""
if prefix is None:
for value in self._values:
yield value
else:
for key in self.keys(prefix):
yield self._values[key.index]
def items(self, prefix=None):
"""
Yield (key, value) pairs stored in the indexedtrie where key is a IndexedtrieKey object.
If prefix is given, yield only (key, value) pairs of items with key matching the prefix.
"""
for key in self.keys(prefix):
yield key, self._values[key.index]
    def show_tree(self, node=None, level=0):
        """
        Pretty print the internal trie (recursive function).
        """
        if node is None:
            node = self
        for child in node._children:
            print("-" * level + "<key=%s, index=%s>" % (child._key, child._index))
            if len(child._children):
                self.show_tree(child, level + 1)
    def _get_node(self, key):
        """
        Return the node associated to key or raise a KeyError.
        """
        children = self._children
        while len(children):
            notfound = True
            for child in children:
                if key == child._key:
                    return child
                if child._key == key[:len(child._key)]:
                    children = child._children
                    key = key[len(child._key):]
                    notfound = False
                    break
            if notfound:
                break
        raise KeyError
    def _add_node(self, key, value):
        """
        Add a new key to the trie and update indexes and values.
        """
        children = self._children
        parent = self
        moved = None
        done = len(children) == 0
        # we want to insert key="abc"
        while not done:
            done = True
            for child in children:
                # assert child._key != key # uncomment if you don't trust me
                if child._key == key[:len(child._key)]:
                    # case 1: child's key is "ab", insert "c" in child's children
                    parent = child
                    children = child._children
                    key = key[len(child._key):]
                    done = len(children) == 0
                    break
                elif key == child._key[:len(key)]:
                    # case 2: child's key is "abcd", we insert "abc" in place of the child
                    # child's parent will be the inserted node and child's key is now "d"
                    parent = child._parent
                    moved = child
                    parent._children.remove(moved)
                    moved._key = moved._key[len(key):]
                    break
                elif type(key) is type(child._key): # don't mess it up
                    # find longest common prefix
                    prefix = key[:0]
                    for i, c in enumerate(key):
                        if child._key[i] != c:
                            prefix = key[:i]
                            break
                    if prefix:
                        # case 3: child's key is "abd", we spawn a new node with key "ab"
                        # to replace child; child's key is now "d" and child's parent is
                        # the newly created node.
                        # the new key will also be inserted as a child of this new node
                        # with key "c"
                        node = _Node(prefix, child._parent)
                        self._nodescount += 1
                        child._parent._children.remove(child)
                        child._key = child._key[len(prefix):]
                        child._parent = node
                        node._children.add(child)
                        key = key[len(prefix):]
                        parent = node
                        break
        # create the new node
        node = _Node(key, parent)
        self._nodescount += 1
        if moved is not None:
            # if we have moved an existing node, update it
            moved._parent = node
            node._children.add(moved)
        self._add_to_index(node, value)
    def _get_key(self, node):
        """
        Rebuild key from a terminal node.
        """
        key = node._key
        while node._parent is not self:
            node = node._parent
            key = node._key + key
        return key
    def _add_to_index(self, node, value):
        """
        Add a new node to the index.
        Also record its value.
        """
        node._index = len(self)
        self._indexes.append(node)
        self._values.append(value)
    def key2index(self, key):
        """
        key -> index
        """
        if self._seems_valid_key(key):
            node = self._get_node(key)
            if node._index is not None:
                return node._index
            raise KeyError
        raise TypeError("invalid key type")
    def index2key(self, index):
        """
        index or IndexedtrieKey -> key.
        """
        if isinstance(index, IndexedtrieKey):
            index = index.index
        elif not isinstance(index, int):
            raise TypeError("index must be an int")
        # valid indexes are 0 .. len - 1
        if index < 0 or index >= len(self._indexes):
            raise IndexError
        return self._get_key(self._indexes[index])
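
The three cases in _add_node come down to comparing the new key with one child's key. Below is a minimal standalone sketch of that comparison, not part of the class above: common_prefix and classify are hypothetical helper names introduced only for illustration, and the sketch assumes both keys are sliceable sequences of the same type (str, tuple, ...).

```python
def common_prefix(a, b):
    """Return the longest common prefix of two sliceable sequences of the same type."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def classify(new_key, child_key):
    """Name which insertion case applies for new_key against one child's key."""
    if new_key[:len(child_key)] == child_key:
        # child's key is a prefix of the new key: descend into the child
        return "case 1"
    if child_key[:len(new_key)] == new_key:
        # new key is a prefix of the child's key: new node takes the child's place
        return "case 2"
    if common_prefix(new_key, child_key):
        # keys share a non-empty prefix: split on it
        return "case 3"
    return "no match"


print(classify("abc", "ab"))    # child "ab" vs new key "abc"
print(classify("abc", "abcd"))  # child "abcd" vs new key "abc"
print(classify("abc", "abd"))   # child "abd" vs new key "abc"
```

Because only slicing and comparison are used, the same helpers work for tuples as well as strings, which is also why the class only requires keys to be subscriptable.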

automatic key assign in hashmap in python

I am implementing a hash map in Python. Right now I insert each key and value manually. What I want is for the key to be assigned automatically, in ascending order, whenever a value is inserted. Suppose we have a limit n = 8: keys would then be assigned from 1 to 8, and once key 8 has been used, any further insert should print a message such as "entry is full".
Instead of
hm.put("1", "sachin")
I want
hm.put("sachin")
and it should automatically assign key 1 to sachin.
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
class HashMap:
    def __init__(self):
        # 16 buckets, each the head of a singly linked chain (or None)
        self.store = [None for _ in range(16)]
    def get(self, key):
        index = hash(key) & 15
        if self.store[index] is None:
            return None
        n = self.store[index]
        while True:
            if n.key == key:
                return n.value
            else:
                if n.next:
                    n = n.next
                else:
                    return None
    def put(self, key, value):
        nd = Node(key, value)
        index = hash(key) & 15
        n = self.store[index]
        if n is None:
            self.store[index] = nd
        else:
            # walk the chain: update the value if the key exists, else append;
            # the last node is checked too, so no duplicate key is appended
            while True:
                if n.key == key:
                    n.value = value
                    return
                if n.next is None:
                    break
                n = n.next
            n.next = nd
hm = HashMap()
hm.put("1", "sachin")
hm.put("2", "sehwag")
hm.put("3", "ganguly")
hm.put("4", "srinath")
hm.put("5", "kumble")
hm.put("6", "dhoni")
hm.put("7", "kohli")
hm.put("8", "pandya")
hm.put("9", "rohit")
hm.put("10", "dhawan")
hm.put("11", "shastri")
hm.put("12", "manjarekar")
hm.put("13", "gupta")
hm.put("14", "agarkar")
hm.put("15", "nehra")
hm.put("16", "gawaskar")
hm.put("17", "vengsarkar")
print(hm.get("1"))
print(hm.get("2"))
print(hm.get("3"))
print(hm.get("4"))
print(hm.get("5"))
print(hm.get("6"))
print(hm.get("7"))
print(hm.get("8"))
print(hm.get("9"))
print(hm.get("10"))
print(hm.get("11"))
print(hm.get("12"))
print(hm.get("13"))
print(hm.get("14"))
print(hm.get("15"))
print(hm.get("16"))
print(hm.get("17"))
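
One possible way to get the behaviour described in the question is to keep a counter next to the store and let put() take only the value. The sketch below is an illustration under stated assumptions, not a definitive implementation: AutoKeyMap, capacity and next_key are hypothetical names, and a plain dict stands in for the hand-written HashMap to keep the example short.

```python
class AutoKeyMap:
    """Hypothetical wrapper that assigns keys "1" .. "capacity" in insertion order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_key = 1
        self.store = {}  # assumption: a plain dict replaces the custom HashMap

    def put(self, value):
        """Store value under the next free key; return the key, or None when full."""
        if self.next_key > self.capacity:
            print("entry is full")
            return None
        key = str(self.next_key)
        self.store[key] = value
        self.next_key += 1
        return key

    def get(self, key):
        """Return the value stored under key, or None if the key is unknown."""
        return self.store.get(key)


hm = AutoKeyMap(capacity=2)
first = hm.put("sachin")    # stored under key "1"
second = hm.put("sehwag")   # stored under key "2"
third = hm.put("ganguly")   # map is full: prints the message, stores nothing
print(hm.get("1"))
```

The same counter could be added to the HashMap class from the question by calling its two-argument put(str(self.next_key), value) internally.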


Hash Map in Python - python


streetno = { 1 : "Sachin Tendulkar", 2 : "Dravid", 3 : "Sehwag", 4 : "Laxman", 5 : "Kohli" }
And to retrieve values:
name = streetno.get(3, "default value")
Or
name = streetno[3]
That's using numbers as keys; put quotes around the numbers to use strings as keys.
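
The key type matters once the key comes from a user, because input() always returns a string and "3" is a different key from 3. A small sketch of a lookup that accounts for this, assuming int keys as in the dict above; lookup is a hypothetical helper name, and the raw argument stands in for whatever input() returned.

```python
streetno = {1: "Sachin Tendulkar", 2: "Dravid", 3: "Sehwag", 4: "Laxman", 5: "Kohli"}


def lookup(raw, table, default="not found"):
    """Look up text typed by a user in a dict that has int keys."""
    try:
        key = int(raw)  # convert the typed text to the key type used by the dict
    except ValueError:
        return default  # the text was not a number at all
    return table.get(key, default)


print(lookup("3", streetno))
print(lookup("9", streetno))
print("3" in streetno, 3 in streetno)
```

With string keys the conversion step disappears and the typed text can be passed to get() directly.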

Python's Counter is also a good option in this case:
from collections import Counter
counter = Counter(["Sachin Tendulkar", "Sachin Tendulkar", "other things"])
print(counter)
This prints a Counter (a dict subclass) holding the count of each element in the list:
Counter({'Sachin Tendulkar': 2, 'other things': 1})
