Related
I am setting up a trie, here is my code:
class Trie:
def __init__(self):
self.root = {}
self.endSymbol = "*"
def add(self, word):
node = self.root
for letter in word:
if letter not in node:
node[letter] = {}
node = node[letter]
node[self.endSymbol] = word
def multiStringSearch(smallStrings):
trie = Trie()
for word in smallStrings:
trie.add(word)
return trie
print(multiStringSearch(["abc", "mnopqr", "wyz", "no", "e", "tuuv","abb"])
Two questions:
How can I print out the trie?
For some reason, the above doesn't produce the correct trie, for example, "mnopqr" and "no" should be under the same root for the 'n', but they appear separately by the above construction.
Thanks John and Ervin for your contributions, can I just check then, which of the following would the trie set-up produce?:
For printing out the trie:
class Trie:
def __init__(self):
self.root = {}
self.endSymbol = "*"
def add(self, word):
node = self.root
for letter in word:
if letter not in node:
node[letter] = {}
node = node[letter]
node[self.endSymbol] = word
def __str__(self):
return str(self.root)
def multiStringSearch(smallStrings):
trie = Trie()
for word in smallStrings:
trie.add(word)
return trie
if __name__ == "__main__":
trie = multiStringSearch(["abc", "mnopqr", "wyz", "no", "e", "tuuv","abb"])
print(trie)
No, they should not be under the same root, because they do not share a common prefix. In your example abb and abc should be under the same root, and they are if you print out the `trie.
ad 1) The following code will print the tree with indentation (for easier examination of the tree)
class Trie:
def __init__(self):
self.root = {}
self.endSymbol = "*"
def add(self, word):
node = self.root
for letter in word:
if letter not in node:
node[letter] = {}
node = node[letter]
node[self.endSymbol] = word
def __str__(self):
return self.__format(self.root)
def __format(self, node, indent = 0):
s = ''
for key in node:
s += ' ' * indent
s += key
if key == self.endSymbol:
s += ' => ' + node[key]
s += '\n'
else:
s += '\n'
s += self.__format(node[key], indent + 1)
return s
def multiStringSearch(smallStrings):
trie = Trie()
for word in smallStrings:
trie.add(word)
return trie
print(multiStringSearch(["abc", "mnopqr", "wyz", "no", "e", "tuuv","abb"]))
Output:
a
b
c
* => abc
b
* => abb
m
n
o
p
q
r
* => mnopqr
w
y
z
* => wyz
n
o
* => no
e
* => e
t
u
u
v
* => tuuv
I'm interested in tries and DAWGs (direct acyclic word graph) and I've been reading a lot about them but I don't understand what should the output trie or DAWG file look like.
Should a trie be an object of nested dictionaries? Where each letter is divided in to letters and so on?
Would a lookup performed on such a dictionary be fast if there are 100k or 500k entries?
How to implement word-blocks consisting of more than one word separated with - or space?
How to link prefix or suffix of a word to another part in the structure? (for DAWG)
I want to understand the best output structure in order to figure out how to create and use one.
I would also appreciate what should be the output of a DAWG along with trie.
I do not want to see graphical representations with bubbles linked to each other, I want to know the output object once a set of words are turned into tries or DAWGs.
Unwind is essentially correct that there are many different ways to implement a trie; and for a large, scalable trie, nested dictionaries might become cumbersome -- or at least space inefficient. But since you're just getting started, I think that's the easiest approach; you could code up a simple trie in just a few lines. First, a function to construct the trie:
>>> _end = '_end_'
>>>
>>> def make_trie(*words):
... root = dict()
... for word in words:
... current_dict = root
... for letter in word:
... current_dict = current_dict.setdefault(letter, {})
... current_dict[_end] = _end
... return root
...
>>> make_trie('foo', 'bar', 'baz', 'barz')
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
'z': {'_end_': '_end_'}}},
'f': {'o': {'o': {'_end_': '_end_'}}}}
If you're not familiar with setdefault, it simply looks up a key in the dictionary (here, letter or _end). If the key is present, it returns the associated value; if not, it assigns a default value to that key and returns the value ({} or _end). (It's like a version of get that also updates the dictionary.)
Next, a function to test whether the word is in the trie:
>>> def in_trie(trie, word):
... current_dict = trie
... for letter in word:
... if letter not in current_dict:
... return False
... current_dict = current_dict[letter]
... return _end in current_dict
...
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'baz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barzz')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'bart')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'ba')
False
I'll leave insertion and removal to you as an exercise.
Of course, Unwind's suggestion wouldn't be much harder. There might be a slight speed disadvantage in that finding the correct sub-node would require a linear search. But the search would be limited to the number of possible characters -- 27 if we include _end. Also, there's nothing to be gained by creating a massive list of nodes and accessing them by index as he suggests; you might as well just nest the lists.
Finally, I'll add that creating a directed acyclic word graph (DAWG) would be a bit more complex, because you have to detect situations in which your current word shares a suffix with another word in the structure. In fact, this can get rather complex, depending on how you want to structure the DAWG! You may have to learn some stuff about Levenshtein distance to get it right.
Here is a list of python packages that implement Trie:
marisa-trie - a C++ based implementation.
python-trie - a simple pure python implementation.
PyTrie - a more advanced pure python implementation.
pygtrie - a pure python implementation by Google.
datrie - a double array trie implementation based on libdatrie.
Have a look at this:
https://github.com/kmike/marisa-trie
Static memory-efficient Trie structures for Python (2.x and 3.x).
String data in a MARISA-trie may take up to 50x-100x less memory than
in a standard Python dict; the raw lookup speed is comparable; trie
also provides fast advanced methods like prefix search.
Based on marisa-trie C++ library.
Here's a blog post from a company using marisa trie successfully:
https://www.repustate.com/blog/sharing-large-data-structure-across-processes-python/
At Repustate, much of our data models we use in our text analysis can be represented as simple key-value pairs, or dictionaries in Python lingo. In our particular case, our dictionaries are massive, a few hundred MB each, and they need to be accessed constantly. In fact for a given HTTP request, 4 or 5 models might be accessed, each doing 20-30 lookups. So the problem we face is how do we keep things fast for the client as well as light as possible for the server.
...
I found this package, marisa tries, which is a Python wrapper around a C++ implementation of a marisa trie. “Marisa” is an acronym for Matching Algorithm with Recursively Implemented StorAge. What’s great about marisa tries is the storage mechanism really shrinks how much memory you need. The author of the Python plugin claimed 50-100X reduction in size – our experience is similar.
What’s great about the marisa trie package is that the underlying trie structure can be written to disk and then read in via a memory mapped object. With a memory mapped marisa trie, all of our requirements are now met. Our server’s memory usage went down dramatically, by about 40%, and our performance was unchanged from when we used Python’s dictionary implementation.
There are also a couple of pure-python implementations, though unless you're on a restricted platform you'd want to use the C++ backed implementation above for best performance:
https://github.com/bdimmick/python-trie
https://pypi.python.org/pypi/PyTrie
Modified from senderle's method (above). I found that Python's defaultdict is ideal for creating a trie or a prefix tree.
from collections import defaultdict
class Trie:
"""
Implement a trie with insert, search, and startsWith methods.
"""
def __init__(self):
self.root = defaultdict()
# #param {string} word
# #return {void}
# Inserts a word into the trie.
def insert(self, word):
current = self.root
for letter in word:
current = current.setdefault(letter, {})
current.setdefault("_end")
# #param {string} word
# #return {boolean}
# Returns if the word is in the trie.
def search(self, word):
current = self.root
for letter in word:
if letter not in current:
return False
current = current[letter]
if "_end" in current:
return True
return False
# #param {string} prefix
# #return {boolean}
# Returns if there is any word in the trie
# that starts with the given prefix.
def startsWith(self, prefix):
current = self.root
for letter in prefix:
if letter not in current:
return False
current = current[letter]
return True
# Now test the class
test = Trie()
test.insert('helloworld')
test.insert('ilikeapple')
test.insert('helloz')
print test.search('hello')
print test.startsWith('hello')
print test.search('ilikeapple')
There's no "should"; it's up to you. Various implementations will have different performance characteristics, take various amounts of time to implement, understand, and get right. This is typical for software development as a whole, in my opinion.
I would probably first try having a global list of all trie nodes so far created, and representing the child-pointers in each node as a list of indices into the global list. Having a dictionary just to represent the child linking feels too heavy-weight, to me.
Using defaultdict and reduce function.
Create Trie
from functools import reduce
from collections import defaultdict
T = lambda : defaultdict(T)
trie = T()
reduce(dict.__getitem__,'how',trie)['isEnd'] = True
Trie :
defaultdict(<function __main__.<lambda>()>,
{'h': defaultdict(<function __main__.<lambda>()>,
{'o': defaultdict(<function __main__.<lambda>()>,
{'w': defaultdict(<function __main__.<lambda>()>,
{'isEnd': True})})})})
Search In Trie :
curr = trie
for w in 'how':
if w in curr:
curr = curr[w]
else:
print("Not Found")
break
if curr['isEnd']:
print('Found')
from collections import defaultdict
Define Trie:
_trie = lambda: defaultdict(_trie)
Create Trie:
trie = _trie()
for s in ["cat", "bat", "rat", "cam"]:
curr = trie
for c in s:
curr = curr[c]
curr.setdefault("_end")
Lookup:
def word_exist(trie, word):
curr = trie
for w in word:
if w not in curr:
return False
curr = curr[w]
return '_end' in curr
Test:
print(word_exist(trie, 'cam'))
Here is full code using a TrieNode class. Also implemented auto_complete method to return the matching words with a prefix.
Since we are using dictionary to store children, there is no need to convert char to integer and vice versa and don't need to allocate array memory in advance.
class TrieNode:
def __init__(self):
#Dict: Key = letter, Item = TrieNode
self.children = {}
self.end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def build_trie(self,words):
for word in words:
self.insert(word)
def insert(self,word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.end = True
def search(self, word):
node = self.root
for char in word:
if char in node.children:
node = node.children[char]
else:
return False
return node.end
def _walk_trie(self, node, word, word_list):
if node.children:
for char in node.children:
word_new = word + char
if node.children[char].end:
# if node.end:
word_list.append( word_new)
# word_list.append( word)
self._walk_trie(node.children[char], word_new , word_list)
def auto_complete(self, partial_word):
node = self.root
word_list = [ ]
#find the node for last char of word
for char in partial_word:
if char in node.children:
node = node.children[char]
else:
# partial_word not found return
return word_list
if node.end:
word_list.append(partial_word)
# word_list will be created in this method for suggestions that start with partial_word
self._walk_trie(node, partial_word, word_list)
return word_list
create a Trie
t = Trie()
words = ['hi', 'hieght', 'rat', 'ram', 'rattle', 'hill']
t.build_trie(words)
Search for word
words = ['hi', 'hello']
for word in words:
print(word, t.search(word))
hi True
hel False
search for words using prefix
partial_word = 'ra'
t.auto_complete(partial_word)
['rat', 'rattle', 'ram']
If you want a TRIE implemented as a Python class, here is something I wrote after reading about them:
class Trie:
def __init__(self):
self.__final = False
self.__nodes = {}
def __repr__(self):
return 'Trie<len={}, final={}>'.format(len(self), self.__final)
def __getstate__(self):
return self.__final, self.__nodes
def __setstate__(self, state):
self.__final, self.__nodes = state
def __len__(self):
return len(self.__nodes)
def __bool__(self):
return self.__final
def __contains__(self, array):
try:
return self[array]
except KeyError:
return False
def __iter__(self):
yield self
for node in self.__nodes.values():
yield from node
def __getitem__(self, array):
return self.__get(array, False)
def create(self, array):
self.__get(array, True).__final = True
def read(self):
yield from self.__read([])
def update(self, array):
self[array].__final = True
def delete(self, array):
self[array].__final = False
def prune(self):
for key, value in tuple(self.__nodes.items()):
if not value.prune():
del self.__nodes[key]
if not len(self):
self.delete([])
return self
def __get(self, array, create):
if array:
head, *tail = array
if create and head not in self.__nodes:
self.__nodes[head] = Trie()
return self.__nodes[head].__get(tail, create)
return self
def __read(self, name):
if self.__final:
yield name
for key, value in self.__nodes.items():
yield from value.__read(name + [key])
This version is using recursion
import pprint
from collections import deque
pp = pprint.PrettyPrinter(indent=4)
inp = raw_input("Enter a sentence to show as trie\n")
words = inp.split(" ")
trie = {}
def trie_recursion(trie_ds, word):
try:
letter = word.popleft()
out = trie_recursion(trie_ds.get(letter, {}), word)
except IndexError:
# End of the word
return {}
# Dont update if letter already present
if not trie_ds.has_key(letter):
trie_ds[letter] = out
return trie_ds
for word in words:
# Go through each word
trie = trie_recursion(trie, deque(word))
pprint.pprint(trie)
Output:
Coool👾 <algos>🚸 python trie.py
Enter a sentence to show as trie
foo bar baz fun
{
'b': {
'a': {
'r': {},
'z': {}
}
},
'f': {
'o': {
'o': {}
},
'u': {
'n': {}
}
}
}
This is much like a previous answer but simpler to read:
def make_trie(words):
trie = {}
for word in words:
head = trie
for char in word:
if char not in head:
head[char] = {}
head = head[char]
head["_end_"] = "_end_"
return trie
class TrieNode:
def __init__(self):
self.keys = {}
self.end = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str, node=None) -> None:
if node == None:
node = self.root
# insertion is a recursive operation
# this is base case to exit the recursion
if len(word) == 0:
node.end = True
return
# if this key does not exist create a new node
elif word[0] not in node.keys:
node.keys[word[0]] = TrieNode()
self.insert(word[1:], node.keys[word[0]])
# that means key exists
else:
self.insert(word[1:], node.keys[word[0]])
def search(self, word: str, node=None) -> bool:
if node == None:
node = self.root
# this is positive base case to exit the recursion
if len(word) == 0 and node.end == True:
return True
elif len(word) == 0:
return False
elif word[0] not in node.keys:
return False
else:
return self.search(word[1:], node.keys[word[0]])
def startsWith(self, prefix: str, node=None) -> bool:
if node == None:
node = self.root
if len(prefix) == 0:
return True
elif prefix[0] not in node.keys:
return False
else:
return self.startsWith(prefix[1:], node.keys[prefix[0]])
class Trie:
head = {}
def add(self,word):
cur = self.head
for ch in word:
if ch not in cur:
cur[ch] = {}
cur = cur[ch]
cur['*'] = True
def search(self,word):
cur = self.head
for ch in word:
if ch not in cur:
return False
cur = cur[ch]
if '*' in cur:
return True
else:
return False
def printf(self):
print (self.head)
dictionary = Trie()
dictionary.add("hi")
#dictionary.add("hello")
#dictionary.add("eye")
#dictionary.add("hey")
print(dictionary.search("hi"))
print(dictionary.search("hello"))
print(dictionary.search("hel"))
print(dictionary.search("he"))
dictionary.printf()
Out
True
False
False
False
{'h': {'i': {'*': True}}}
Python Class for Trie
Trie Data Structure can be used to store data in O(L) where L is the length of the string so for inserting N strings time complexity would be O(NL) the string can be searched in O(L) only same goes for deletion.
Can be clone from https://github.com/Parikshit22/pytrie.git
class Node:
def __init__(self):
self.children = [None]*26
self.isend = False
class trie:
def __init__(self,):
self.__root = Node()
def __len__(self,):
return len(self.search_byprefix(''))
def __str__(self):
ll = self.search_byprefix('')
string = ''
for i in ll:
string+=i
string+='\n'
return string
def chartoint(self,character):
return ord(character)-ord('a')
def remove(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
raise ValueError("Keyword doesn't exist in trie")
if ptr.isend is not True:
raise ValueError("Keyword doesn't exist in trie")
ptr.isend = False
return
def insert(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
ptr.children[i] = Node()
ptr = ptr.children[i]
ptr.isend = True
def search(self,string):
ptr = self.__root
length = len(string)
for idx in range(length):
i = self.chartoint(string[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
return False
if ptr.isend is not True:
return False
return True
def __getall(self,ptr,key,key_list):
if ptr is None:
key_list.append(key)
return
if ptr.isend==True:
key_list.append(key)
for i in range(26):
if ptr.children[i] is not None:
self.__getall(ptr.children[i],key+chr(ord('a')+i),key_list)
def search_byprefix(self,key):
ptr = self.__root
key_list = []
length = len(key)
for idx in range(length):
i = self.chartoint(key[idx])
if ptr.children[i] is not None:
ptr = ptr.children[i]
else:
return None
self.__getall(ptr,key,key_list)
return key_list
t = trie()
t.insert("shubham")
t.insert("shubhi")
t.insert("minhaj")
t.insert("parikshit")
t.insert("pari")
t.insert("shubh")
t.insert("minakshi")
print(t.search("minhaj"))
print(t.search("shubhk"))
print(t.search_byprefix('m'))
print(len(t))
print(t.remove("minhaj"))
print(t)
Code Oputpt
True
False
['minakshi', 'minhaj']
7
minakshi
minhajsir
pari
parikshit
shubh
shubham
shubhi
With prefix search
Here is #senderle's answer, slightly modified to accept prefix search (and not only whole-word matching):
_end = '_end_'
def make_trie(words):
root = dict()
for word in words:
current_dict = root
for letter in word:
current_dict = current_dict.setdefault(letter, {})
current_dict[_end] = _end
return root
def in_trie(trie, word):
current_dict = trie
for letter in word:
if _end in current_dict:
return True
if letter not in current_dict:
return False
current_dict = current_dict[letter]
t = make_trie(['hello', 'hi', 'foo', 'bar'])
print(in_trie(t, 'hello world'))
# True
In response to #basj
The following code will capture \b (end of word) letters.
_end = '_end_'
def make_trie(words):
root = dict()
for word in words:
current_dict = root
for letter in word:
current_dict = current_dict.setdefault(letter, {})
current_dict[_end] = _end
return root
def in_trie(trie, word):
current_dict = trie
for letter in word:
if letter not in current_dict: # Adjusted the
return False # order of letter
if _end in current_dict[letter]: # checks to capture
return True # the last letter.
current_dict = current_dict[letter]
t = make_trie(['hello', 'hi', 'foo', 'bar'])
>>> print(in_trie(t, 'hi'))
True
>>> print(in_trie(t, 'hola'))
False
>>> print(in_trie(t, 'hello friend'))
True
>>> print(in_trie(t, 'hel'))
None
I need to add the ability to list suffixes to implement our autocomplete feature. To do that, I implemented a function on the TrieNode object that will return all complete word suffixes that exist below it in the trie. For example, if our Trie contains the words ["fun", "function", "factory"] and we ask for suffixes from the f node, we would expect to receive ["un", "unction", "actory"] back from node.get_suffixes(). Here is how I get started:
class TrieNode:
def __init__(self):
## Initialize this node in the Trie
self.word_end = False
self.children = dict()
def insert(self, char):
## Add a child node in this Trie
if not char in self.children:
self.children[char] = TrieNode()
def get_suffixes(self):
pass
I have tested the get_suffixes function separately and it seemed to work fine.
result = []
def get_suffixes(node, suffix=""):
if not node.children == dict():
for key in node.children:
suffix += key
if node.children[key].word_end:
result.append(suffix)
get_suffixes(node.children[key], suffix)
suffix = suffix[:-1]
return result
How is how I tested the function:
# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True
result = []
def get_suffixes(node, suffix=""):
if not node.children == dict():
for key in node.children:
suffix += key
if node.children[key].word_end:
result.append(suffix)
get_suffixes(node.children[key], suffix)
suffix = suffix[:-1]
return result
get_suffixes(node.children["A"]) # Returns ['t', 'baca', 'dd', 'dmin'], as expected
The problem occured when I tried moving the get_suffixes function to the TrieNode class. Here I do not know how I should tackle the global variable result. It is not supposed to be a global variable anymore. I have tried two versions:
Version I: make result a class attribute
class TrieNode:
def __init__(self):
## Initialize this node in the Trie
self.word_end = False
self.children = dict()
self.result = []
def insert(self, char):
## Add a child node in this Trie
if not char in self.children:
self.children[char] = TrieNode()
def get_suffixes(self, suffix=""):
if not self.children == dict():
for key in self.children:
suffix += key
if self.children[key].word_end:
self.result.append(suffix)
self.children[key].get_suffixes(suffix)
suffix = suffix[:-1]
return self.result
node.children["A"].get_suffixes() # Returns ['t'], which is wrong
Version II: make result a default function parameter
class TrieNode:
def __init__(self):
## Initialize this node in the Trie
self.word_end = False
self.children = dict()
def insert(self, char):
## Add a child node in this Trie
if not char in self.children:
self.children[char] = TrieNode()
def suffixes(self, suffix="", result=[]):
if not self.children == dict():
for key in self.children:
suffix += key
if self.children[key].word_end:
result.append(suffix)
self.children[key].suffixes(suffix)
suffix = suffix[:-1]
return result
node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin']
node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin', 't', 'baca', 'dd', 'dmin']
The result of Version II is not surprising because:
def append(number, number_list=[]):
number_list.append(number)
print(number_list)
return number_list
append(5) # expecting: [5], actual: [5]
append(7) # expecting: [7], actual: [5, 7]
append(2) # expecting: [2], actual: [5, 7, 2]
I am learning algorithms and data structure in Python. I was asked to do it using a recursive function. Other approaches such as Implementing a Trie to support autocomplete in Python are not the answers I expect though they themselves might be able to solve the problem. I am extremely curious why self.result is not properly modified in Version I but works properly if it does not reside in a class.
result belongs to the class TrieNode.
When you return self.result from the get_suffixes method, you are only including the answers found in the current TrieNode Instance.
You need to include the answers found by its children as well. Thanks to recursion the code just needs a minor change and adding self.result+=self.children[key].get_suffixes(suffix) makes everything work.
class TrieNode:
def __init__(self):
## Initialize this node in the Trie
self.word_end = False
self.children = dict()
self.result = []
def insert(self, char):
## Add a child node in this Trie
if not char in self.children:
self.children[char] = TrieNode()
def get_suffixes(self, suffix=""):
if not self.children == dict():
for key in self.children:
suffix += key
if self.children[key].word_end:
self.result.append(suffix)
else:
self.result+=self.children[key].get_suffixes(suffix)
suffix = suffix[:-1]
return self.result
# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True
print(node.children["A"].get_suffixes())
Output:-
['t', 'baca', 'dd', 'dmin']
The thing to remember is that every child is a new instance of the TrieNode class and thus has its own separate result array.
Modified Insertion + No Result Array:-
class TrieNode:
def __init__(self):
## Initialize this node in the Trie
self.word_end = False
self.children = dict()
def insert(self, string):
if len(string) == 0:
self.word_end = True
return
## Add a child node in this Trie
if not string[0] in self.children:
self.children[string[0]] = TrieNode()
self.children[string[0]].insert(string[1:])
def get_suffixes(self, suffix=""):
query_result=[]
if self.word_end:
query_result.append(suffix)
for i in self.children:
query_result+=self.children[i].get_suffixes(suffix+i)
return query_result
# Create a mock trie for the test
node = TrieNode()
node.insert("Add")
node.insert("At")
node.insert("Abaca")
node.insert("Admin")
print(node.children["A"].get_suffixes())
print(node.children["A"].get_suffixes())
print(node.children["A"].children["t"].get_suffixes())
Output:-
['dd', 'dmin', 't', 'baca']
['dd', 'dmin', 't', 'baca']
['']
[Finished in 0.0s]
Can someone say what is wrong with my code, it is passing all the test cases except the last one when I downloaded the specific test case both the expected and actual output seems same, the question is https://leetcode.com/problems/implement-trie-prefix-tree/description/
Edit 1:
Here is the code:
class Trie:
def __init__(self):
"""
Initialize your data structure here.
"""
self.data = None
self.children = {}
self.isWord = False
def insert(self, word):
"""
Inserts a word into the trie.
:type word: str
:rtype: void
"""
if len(word) == 0:
return
if word[0] not in self.children:
self.children[word[0]] = Trie()
self.insertHelper(word[1:], self.children[word[0]])
else:
self.insertHelper(word[1:], self.children[word[0]])
if len(word) == 1:
self.isWord = True
def insertHelper(self, word, trie):
if len(word) == 0:
return
if word[0] not in trie.children:
trie.children[word[0]] = Trie()
trie.insertHelper(word[1:], trie.children[word[0]])
else:
trie.insertHelper(word[1:], trie.children[word[0]])
if len(word) == 1:
trie.isWord = True
def search(self, word):
"""
Returns if the word is in the trie.
:type word: str
:rtype: bool
"""
if len(word) == 1 and word[0] in self.children and self.isWord:
return True
elif len(word) == 0:
return False
if word[0] in self.children:
return self.searchHelper(word[1:], self.children[word[0]])
else:
return False
def searchHelper(self, word, trie):
if len(word) == 1 and word[0] in trie.children and trie.isWord:
return True
elif len(word) == 0:
return False
if word[0] in trie.children:
return self.searchHelper(word[1:], trie.children[word[0]])
else:
return False
def startsWith(self, prefix):
"""
Returns if there is any word in the trie that starts with the given prefix.
:type prefix: str
:rtype: bool
"""
if len(prefix) == 0:
return False
if prefix[0] in self.children:
return self.startsWithHelper(prefix[1:], self.children[prefix[0]])
else:
return False
def startsWithHelper(self, prefix, trie):
if len(prefix) == 0:
return True
if prefix[0] in trie.children:
return trie.startsWithHelper(prefix[1:], trie.children[prefix[0]])
else:
return False
Thanks in advance.
One quirk I noticed is passing an empty prefix into startsWith(). If this method is modeled on the Python str method startswith(), then we expect True:
>>> "apple".startswith("")
True
>>>
But your Trie returns False in this situation:
>>> t = Trie()
>>> t.insert("apple")
>>> t.startsWith("")
False
>>>
Below is my rework of your code that I did primarily to understand it but I also found you had redundancies, particularly your Helper functions. This code fixes the quirk mentioned above and is Python 3 specific:
class Trie:
def __init__(self):
self.children = {}
self.isWord = False
def insert(self, word):
"""
Inserts a word into the trie.
:type word: str (or list internally upon recursion)
:rtype: None
"""
if not word:
return
head, *tail = word
if head not in self.children:
self.children[head] = Trie()
trie = self.children[head]
if tail:
trie.insert(tail)
else:
self.isWord = True
def search(self, word):
"""
Returns True if the word is in the trie.
:type word: str (or list internally upon recursion)
:rtype: bool
"""
if not word:
return False
head, *tail = word
if head in self.children:
if not tail and self.isWord:
return True
return self.children[head].search(word[1:])
return False
def startsWith(self, prefix):
"""
Returns if there is any word in the trie that starts with the given prefix.
:type prefix: str (or list internally upon recursion)
:rtype: bool
"""
if not prefix:
return True
head, *tail = prefix
if head in self.children:
return self.children[head].startsWith(tail)
return False
Here's another solution using the 'defaultdictionary' from the collections module to utilize recursion in the 'insert' function too.
Credit: https://leetcode.com/problems/implement-trie-prefix-tree/discuss/631957/python-elegant-solution-no-nested-dictionaries
class Trie:
def __init__(self):
"""
Initialize your data structure here.
"""
self.nodes = collections.defaultdict(Trie)
self.is_word = False
def insert(self, word: str) -> None:
"""
Inserts a word into the trie.
"""
if not word:
self.is_word = True
else:
self.nodes[word[0]].insert(word[1:])
def search(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
if not word:
return self.is_word
if word[0] in self.nodes:
return self.nodes[word[0]].search(word[1:])
return False
def startsWith(self, prefix: str) -> bool:
"""
Returns if there is any word in the trie that starts with the given prefix.
"""
if not prefix:
return True
if prefix[0] in self.nodes:
return self.nodes[prefix[0]].startsWith(prefix[1:])
return False
Your Trie object will be instantiated and called as such:
obj = Trie()
obj.insert(word)
param_2 = obj.search(word)
param_3 = obj.startsWith(prefix)
class TrieNode:
def __init__(self):
# each key is a TrieNode
self.keys = {}
self.end = False
class Trie:
def __init__(self):
self.root = TrieNode()
# node=this.root gives error "this" is not defined
def insert(self, word: str, node=None) -> None:
if node == None:
node = self.root
# insertion is a recursive operation
if len(word) == 0:
node.end = True
return
elif word[0] not in node.keys:
node.keys[word[0]] = TrieNode()
self.insert(word[1:], node.keys[word[0]])
# that means key exists
else:
self.insert(word[1:], node.keys[word[0]])
def search(self, word: str, node=None) -> bool:
if node == None:
node = self.root
# node.end=True means we have inserted the word before
if len(word) == 0 and node.end == True:
return True
# if we inserted apple and then search for app we get false becase we never inserted app so a-p-p last_p.end is not True
# But startsWith(app) would return True
elif len(word) == 0:
return False
elif word[0] not in node.keys:
return False
else:
# we have to return becasue api expects us to return bool
return self.search(word[1:], node.keys[word[0]])
def startsWith(self, prefix: str, node=None) -> bool:
if node == None:
node = self.root
if len(prefix) == 0:
return True
elif prefix[0] not in node.keys:
return False
else:
return self.startsWith(prefix[1:], node.keys[prefix[0]])
I've read the following implementation of the trie in python:
https://stackoverflow.com/a/11016430/2225221
and tried to make the remove fnction for it.
Basically, I had problems even with the start: If you want to remove a word from a trie, it can has sub-"words", or it can be "subword" of another word.
If you remove with "del dict[key]", you are removing these above mentioned two kinds of words also.
Could anyone help me in this, how to remove properly the chosen word (let us presume it's in the trie)
Basically, to remove a word from the trie (as it is implemented in the answer you linked to), you'd just have to remove its _end marker, for example like this:
def remove_word(trie, word):
current_dict = trie
for letter in word:
current_dict = current_dict.get(letter, None)
if current_dict is None:
# the trie doesn't contain this word.
break
else:
del current_dict[_end]
Note however that this doesn't ensure that the trie has its minimal size. After deleting the word, there may be branches in the trie left that are no longer used by any words. This doesn't affect the correctness of the data structure, it just means that the trie may consume more memory than absolutely necessary. You could improve this by iterating backwards from the leaf node and delete branches until you find one that has more than one child.
EDIT: Here's an idea how you could implement a remove function that also culls any unnecessary branches. There's probably a more efficient way to do it, but this might get you started:
def remove_word2(trie, word):
current_dict = trie
path = [current_dict]
for letter in word:
current_dict = current_dict.get(letter, None)
path.append(current_dict)
if current_dict is None:
# the trie doesn't contain this word.
break
else:
if not path[-1].get(_end, None):
# the trie doesn't contain this word (but a prefix of it).
return
deleted_branches = []
for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
if len(current_dict[letter]) <= 1:
deleted_branches.append((current_dict, letter))
else:
break
if len(deleted_branches) > 0:
del deleted_branches[-1][0][deleted_branches[-1][1]]
del path[-1][_end]
Essentially, it first finds the "path" to the word that is about to be deleted and then iterates through that backwards to find nodes that can be removed. It then removes the root of the path that can be deleted (which also implicitly deletes the _end node).
I think it is better to do it recursively, code as following:
def remove(self, word):
self.delete(self.tries, word, 0)
def delete(self, dicts, word, i):
if i == len(word):
if 'end' in dicts:
del dicts['end']
if len(dicts) == 0:
return True
else:
return False
else:
return False
else:
if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
if len(dicts[word[i]]) == 0:
del dicts[word[i]]
return True
else:
return False
else:
return False
def remove_a_word_util(self, word, idx, node):
if len(word) == idx:
node.is_end_of_word = False
return bool(node.children)
ch = word[idx]
if ch not in node.children:
return True
flag = self.remove_a_word_util(word, idx+1, node.children[ch])
if flag:
return True
node.children.pop(ch)
return bool(node.children) or node.is_end_of_word
One method of handling structures like this is through recursion. The great thing about recursion in this case is that it zips to the bottom of the trie, then passes the returned values back up through the branches.
The following function does just that. It goes to the leaf and deletes the _end value, just in case the input word is a prefix of another. It then passes up a boolean (boo) which indicates that the current_dict is still in an outlying branch. Once we hit a point where the current dict has more than one child, we delete the appropriate branch and set boo to False so each remaining recursion will do nothing.
def trie_trim(term, trie=SYNONYMS, prev=0):
# checks that we haven't hit the end of the word
if term:
first, rest = term[0], term[1:]
current_length = len(trie)
next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)
# this statement avoids trimming excessively if the input is a prefix because
# if the word is a prefix, the first returned value will be greater than 1
if boo and next_length > 1:
boo = False
# this statement checks for the first occurrence of the current dict having more than one child
# or it checks that we've hit the bottom without trimming anything
elif boo and (current_length > 1 or not prev):
del trie[first]
boo = False
return current_length, boo
# when we do hit the end of the word, delete _end
else:
del trie[_end]
return len(trie) + 1, True
A bit of a long one, but I hope this helps answer your question:
class Trie:
WORD_END = "$"
def __init__(self):
self.trie = {}
def insert(self, word):
cur = self.trie
for char in word:
if char not in cur:
cur[char] = {}
cur = cur[char]
cur[Trie.WORD_END] = word
def delete(self, word):
def _delete(word, cur_trie, i=0):
if i == len(word):
if Trie.WORD_END not in cur_trie:
raise ValueError("'%s' is not registered in the trie..." %word)
cur_trie.pop(Trie.WORD_END)
if len(cur_trie) > 0:
return False
return True
if word[i] not in cur_trie:
raise ValueError("'%s' is not registered in the trie..." %word)
cont = _delete(word, cur_trie[word[i]], i+1)
if cont:
cur_trie.pop(word[i])
if Trie.WORD_END in cur_trie:
return False
return True
return False
_delete(word, self.trie)
t = Trie()
t.insert("bar")
t.insert("baraka")
t.insert("barakalar")
t.delete("barak") # raises error as 'barak' is not a valid WORD_END although it is a valid path.
t.delete("bareka") # raises error as 'e' does not exist in the path.
t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
t.delete("barakalar") # deletes until the previous word (until the first Trie.WORD_END; "$" - by going backwards with recursion) in the same path (until 'baraka').
In case you need the whole DS:
class TrieNode:
def __init__(self):
self.children = {}
self.wordCounter = 0
self.prefixCounter = 0
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node.prefixCounter += 1
node = node.children[char]
node.wordCounter += 1
def countWordsEqualTo(self, word: str) -> int:
node = self.root
if node.children:
for char in word:
node = node.children[char]
else:
return 0
return node.wordCounter
def countWordsStartingWith(self, prefix: str) -> int:
node = self.root
if node.children:
for char in prefix:
node = node.children[char]
else:
return 0
return node.prefixCounter
def erase(self, word: str) -> None:
node = self.root
for char in word:
if node.children:
node.prefixCounter -= 1
node = node.children[char]
else:
return None
node.wordCounter -= 1
if node.wordCounter == 0:
self.dfsRemove(self.root, word, 0)
def dfsRemove(self, node: TrieNode, word: str, idx: int) -> None:
if len(word) == idx:
node.wordCounter = 0
return
char = word[idx]
if char not in node.children:
return
self.dfsRemove(node.children[char], word, idx+1)
node.children.pop(char)
trie = Trie();
trie.insert("apple"); #// Inserts "apple".
trie.insert("apple"); #// Inserts another "apple".
print(trie.countWordsEqualTo("apple")) #// There are two instances of "apple" so return 2.
print(trie.countWordsStartingWith("app")) #// "app" is a prefix of "apple" so return 2.
trie.erase("apple") #// Erases one "apple".
print(trie.countWordsEqualTo("apple")) #// Now there is only one instance of "apple" so return 1.
print(trie.countWordsStartingWith("app")) #// return 1
trie.erase("apple"); #// Erases "apple". Now the trie is empty.
print(trie.countWordsEqualTo("apple")) #// return 0
print(trie.countWordsStartingWith("app")) #// return 0
I would argue that this implementation is the most succinct and easiest to understand after a bit of staring.
def removeWord(word, node=None):
if not node:
node = self.root
if word == "":
node.isEnd = False
return
newnode = node.children[word[0]]
removeWord(word[1:], newnode)
if not newnode.isEnd and len(newnode.children) == 0:
del node.children[word[0]]
Although it's a little tricky to understand with the default parameter node=None at first, this is the most succinct implementation of a Trie removal that handles marking the word node.isEnd = False while also pruning extraneous nodes.
The method is first called as Trie.removeWord("ToBeDeletedWord").
In subsequent recursion calls, a node tied to the corresponding letter ("T" then "o" then "B" then "e" etc. etc.) is added to the next recursion (e.g "remove 'oBeDeletedWord' with the node at T").
Once we hit the end node that has the full string ToBeDeletedWord , the last recursion calls removeWord("", <node d>)
In this last recursion call, we mark node.isEnd = False. Later, the node is no longer marked isEnd and it has no children so we can call the delete operator.
Once that last recursion call ends, the rest of the recursions (e.g TobeDeletedWor, TobeDeletedWo, TobeDeletedW, etc. etc.) will then observe that it too is not an end node and there are no more children. These nodes will also delete.
You will have to read this a couple of times but this implementation is concise, readable, and correct. The difficulty is that the recursion happens midfunction rather than at the beginning or end.
TL;DR
class TrieNode:
children: dict[str, "TrieNode"]
def __init__(self) -> None:
self.children = {}
self.end = False
def __contains__(self, char: str) -> bool:
return char in self.children
def __getitem__(self, __name: str) -> "TrieNode":
return self.children[__name]
def __setitem__(self, __name: str, __value: "TrieNode") -> None:
self.children[__name] = __value
def __len__(self):
return len(self.children)
def __delitem__(self, __name: str):
del self.children[__name]
class Trie:
def __init__(self, words: list[str]) -> None:
self.root = TrieNode()
for w in words:
self.insert(w)
def insert(self, word: str):
curr = self.root
for c in word:
curr = curr.children.setdefault(c, TrieNode())
curr.end = True
def remove(self, word: str):
def _remove(node: TrieNode, index: int):
if index >= len(word):
node.end = False
if not node.children:
return True
elif word[index] in node:
if _remove(node[word[index]], index + 1):
del node[word[index]]
_remove(self.root, 0)