The objective of my code is to get each separate word from a txt file into a list, and then build a binary search tree from that list to count the frequency of each word and print each word in alphabetical order along with its frequency. Each word in the file can only contain letters, numbers, -, or '. The part that I am unable to do with my beginner programming knowledge is building the binary search tree from the list I have (I am only able to insert the whole list into one node instead of putting each word in its own node to make the tree). The code I have so far is this:
def read_words(filename):
    openfile = open(filename, "r")
    templist = []
    letterslist = []
    for lines in openfile:
        for i in lines:
            ii = i.lower()
            letterslist.append(ii)
    for p in letterslist:
        if p not in ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',"'","-",' '] and p.isdigit() == False:
            letterslist.remove(p)
    wordslist = list("".join(letterslist).split())
    return wordslist
class BinaryTree:
    class _Node:
        def __init__(self, value, left=None, right=None):
            self._left = left
            self._right = right
            self._value = value
            self._count = 1
    def __init__(self):
        self.root = None
    def isEmpty(self):
        return self.root == None
    def insert(self, value) :
        if self.isEmpty() :
            self.root = self._Node(value)
            return
        parent = None
        pointer = self.root
        while (pointer != None) :
            if value == pointer._value:
                pointer._count += 1
                return
            elif value < pointer._value:
                parent = pointer
                pointer = pointer._left
            else :
                parent = pointer
                pointer = pointer._right
        if (value <= parent._value) :
            parent._left = self._Node(value)
        else :
            parent._right = self._Node(value)
    def printTree(self):
        pointer = self.root
        if pointer._left is not None:
            pointer._left.printTree()
        print(str(pointer._value) + " " + str(pointer._count))
        if pointer._right is not None:
            pointer._right.printTree()
    def createTree(self,words):
        if len(words) > 0:
            for word in words:
                BinaryTree().insert(word)
            return BinaryTree()
        else:
            return None
    def search(self,tree, word):
        node = tree
        depth = 0
        count = 0
        while True:
            print(node.value)
            depth += 1
            if node.value == word:
                count = node.count
                break
            elif word < node.value:
                node = node.left
            elif word > node.value:
                node = node.right
        return depth, count
def main():
    words = read_words('sample.txt')
    b = BinaryTree()
    b.insert(words)
    b.createTree(words)
    b.printTree()
Since you're a beginner, I'd advise implementing the tree methods with recursion instead of iteration, since this results in a simpler implementation. While recursion might seem a difficult concept at first, it is often the easiest approach.
Here's a draft implementation of a binary tree which uses recursion for insertion, searching and printing the tree. It should support the functionality you need.
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.count = 1
    def __str__(self):
        return 'value: {0}, count: {1}'.format(self.value, self.count)
def insert(root, value):
    # Returns the root of the (possibly new) subtree after inserting value.
    if not root:
        return Node(value)
    elif root.value == value:
        root.count += 1
    elif value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root
def create(seq):
    root = None
    for word in seq:
        root = insert(root, word)
    return root
def search(root, word, depth=1):
    # Returns (depth, count); (0, 0) means the word is not in the tree.
    if not root:
        return 0, 0
    elif root.value == word:
        return depth, root.count
    elif word < root.value:
        return search(root.left, word, depth + 1)
    else:
        return search(root.right, word, depth + 1)
def print_tree(root):
    # In-order traversal prints the words in alphabetical order.
    if root:
        print_tree(root.left)
        print(root)
        print_tree(root.right)
src = ['foo', 'bar', 'foobar', 'bar', 'barfoo']
tree = create(src)
print_tree(tree)
for word in src:
    print('search {0}, result: {1}'.format(word, search(tree, word)))
# Output
# value: bar, count: 2
# value: barfoo, count: 1
# value: foo, count: 1
# value: foobar, count: 1
# search foo, result: (1, 1)
# search bar, result: (2, 2)
# search foobar, result: (2, 1)
# search bar, result: (2, 2)
# search barfoo, result: (3, 1)
To answer your direct question, the reason why you are placing all of the words into a single node is because of the following statement inside of main():
b.insert(words)
The insert function creates a node and sets the value of the node to the item you pass in, so passing the whole list gives you one node holding the list. Instead, you need to create a node for each item in the list, which is what your createTree() function does. The preceding b.insert call is not necessary.
Removing that line makes your tree correctly formed, but reveals a fundamental problem with the design of your data structure, namely the printTree() method. This method seems designed to traverse the tree and recursively call itself on any child. In your initial version this function worked, because the tree was malformed, with only a single node holding the whole list (and the print function simply printed that value, since right and left were empty).
However, with a correctly formed tree the printTree() function now tries to invoke itself on the left and right descendants. The descendants, however, are of type _Node, not of type BinaryTree, and there is no method printTree() declared for _Node objects.
You can salvage your code and solve this new error in one of three ways. First, you can implement your BinaryTree.printTree() function as _Node.printTree(). You can't do a straight copy and paste, but the logic of the function won't have to change much. Second, you could leave the method where it is, but wrap each _left or _right node inside a new BinaryTree so that it has the necessary printTree() method. You will still have to implement some kind of helper traversal method inside _Node. Third, you could change all of your _Node objects to be BinaryTree objects instead.
The semantic difference between a node and a tree is one of scope. A node should only be aware of itself, its direct children (left and right), and possibly its parent. A tree, on the other hand, can be aware of any of its descendants, no matter how far removed. This is accomplished by treating any child node as its own tree. Even a leaf without any children at all can be thought of as a tree with a depth of 0. This behavior is what lets a tree work recursively. Your code is mixing the two together.
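As a rough sketch of the first option (traversal logic living on the node), assuming the same BinaryTree/_Node layout as in the question: the names in_order() and items() are made up for this illustration and are not part of the original code. The node yields its own subtree in sorted order, and the tree only starts the traversal at the root.

```python
class BinaryTree:
    class _Node:
        def __init__(self, value):
            self._value = value
            self._count = 1
            self._left = None
            self._right = None

        def in_order(self):
            """Yield (value, count) pairs from this subtree in sorted order."""
            if self._left is not None:
                yield from self._left.in_order()
            yield self._value, self._count
            if self._right is not None:
                yield from self._right.in_order()

    def __init__(self):
        self.root = None

    def insert(self, value):
        """Insert one word; a repeated word only bumps its counter."""
        if self.root is None:
            self.root = self._Node(value)
            return
        pointer = self.root
        while True:
            if value == pointer._value:
                pointer._count += 1
                return
            side = "_left" if value < pointer._value else "_right"
            child = getattr(pointer, side)
            if child is None:
                setattr(pointer, side, self._Node(value))
                return
            pointer = child

    def items(self):
        """Hypothetical helper: all (value, count) pairs, alphabetically."""
        return [] if self.root is None else list(self.root.in_order())

    def printTree(self):
        for value, count in self.items():
            print(value, count)


tree = BinaryTree()
for word in ["the", "cat", "and", "the", "hat"]:
    tree.insert(word)  # one insert per word, never the whole list
tree.printTree()
```

The tree object never recurses itself here; it delegates to the root node, which is the node/tree split described above.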
I have a tree data structure, defined as below:
class Tree(object):
    def __init__(self, name='ROOT', children=None):
        self.name = name
        self.children = []
        if children is not None:
            for child in children:
                self.add_child(child)
    def __repr__(self):
        return self.name
    def add_child(self, node):
        assert isinstance(node, Tree)
        self.children.append(node)
I need to write a function to find the depth of the tree. Here is the function I wrote (takes a Tree as input, and returns an integer value as output), but it is not giving the right answer:
def depth(tree):
    count = 1
    if len(tree.children) > 0:
        for child in tree.children:
            count = count + 1
            depth(child)
    return count
How do I correct it?
While your depth(child) does do the recursive call, it does not do anything with the return value (the depth). You seem to be simply counting the nodes at a given level and calling that the depth (it's really the width).
What you need is something like:
def maxDepth(node):
    # No children means depth zero below.
    if len(node.children) == 0:
        return 0
    # Otherwise get deepest child recursively.
    deepestChild = 0
    for child in node.children:
        childDepth = maxDepth(child)
        if childDepth > deepestChild:
            deepestChild = childDepth
    # Depth of this node is one plus the deepest child.
    return 1 + deepestChild
How about just taking the max of the depths of all the children, recursively? As a method on Tree:
def max_depth(self, depth=0):
    if not self.children:
        return depth
    return max(child.max_depth(depth + 1) for child in self.children)
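For completeness, here is a small self-contained sketch that can be run as-is. It reuses the Tree class from the question and assumes depth is counted in nodes, so a lone root has depth 1 (matching the question's count = 1 starting point); the answers above count edges instead, so their result is one less.

```python
class Tree(object):
    def __init__(self, name='ROOT', children=None):
        self.name = name
        self.children = []
        if children is not None:
            for child in children:
                self.add_child(child)

    def add_child(self, node):
        assert isinstance(node, Tree)
        self.children.append(node)


def depth(tree):
    """Number of nodes on the longest root-to-leaf path (a lone root is 1)."""
    # The recursive result is actually used here, unlike in the question.
    return 1 + max((depth(child) for child in tree.children), default=0)


root = Tree('ROOT', [
    Tree('a', [Tree('a1'), Tree('a2', [Tree('a2x')])]),
    Tree('b'),
])
print(depth(root))  # 4: ROOT -> a -> a2 -> a2x
```

The default=0 argument to max() covers the leaf case, so no separate branch for "no children" is needed.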
def str_tree(atree,indent_char ='.',indent_delta=2):
    def str_tree_1(indent,atree):
        if atree == None:
            return ''
        else:
            answer = ''
            answer += str_tree_1(indent+indent_delta,atree.right)
            answer += indent*indent_char+str(atree.value)+'\n'
            answer += str_tree_1(indent+indent_delta,atree.left)
            return answer
    return str_tree_1(0,atree)
def build_balanced_bst(l):
    d = []
    if len(l) == 0:
        return None
    else:
        mid = (len(l)-1)//2
        if mid >= 1:
            d.append(build_balanced_bst(l[:mid]))
            d.append(build_balanced_bst(l[mid:]))
        else:
            return d
The build_balanced_bst(l) function takes a list of unique values sorted in increasing order and returns a reference to the root of a well-balanced binary search tree. For example, calling build_balanced_bst(list(irange(1,10))) returns a binary search tree of height 3 that would print as:
......10
....9
..8
......7
....6
5
......4
....3
..2
....1
The str_tree function builds the printable string for whatever the build_balanced_bst function returns.
I am working on the build_balanced_bst(l) function so that its result works with the str_tree function. I used the middle value in the list as the root's value.
But when I call the function as shown below:
l = list(irange(1,10))
t = build_balanced_bst(l)
print('Tree is\n',str_tree(t),sep='')
it doesn't print anything. Can someone help me to fix my build_balanced_bst(l) function?
Keeping the str_tree method as it is, here's the remaining code.
class Node:
    """Represents a single node in the tree"""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
def build_balanced_bst(lt):
    """
    Find the middle element in the sorted list
    and make it root.
    Do same for left half and right half recursively.
    """
    if len(lt) == 1:
        return Node(lt[0])
    if len(lt) == 0:
        return None
    mid = (len(lt)-1)//2
    left = build_balanced_bst(lt[:mid])
    right = build_balanced_bst(lt[mid+1:])
    root = Node(lt[mid], left, right)
    return root
ordered_list = list(range(1,11))
bst=build_balanced_bst(ordered_list)
bst_repr = str_tree(bst)
print(bst_repr)
The output comes out as follows:
......10
....9
..8
......7
....6
5
......4
....3
..2
....1
I'm new to Python, hence the question. This is the implementation of my BST:
class BST(object):
    def __init__(self):
        self.root = None
        self.size = 0
    def add(self, item):
        return self.addHelper(item, self.root)
    def addHelper(self, item, root):
        if root is None:
            root = Node(item)
            return root
        if item < root.data:
            root.left = self.addHelper(item, root.left)
        else:
            root.right = self.addHelper(item, root.right)
This is the Node object
class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
This is my implementation of __str__:
def __str__(self):
    self.levelByLevel(self.root)
    return "Complete"
def levelByLevel(self, root):
    delim = Node(sys.maxsize)
    queue = deque()
    queue.append(root)
    queue.append(delim)
    while queue:
        temp = queue.popleft()
        if temp == delim and len(queue) > 0:
            queue.append(delim)
            print()
        else:
            print(temp.data, " ")
            if temp.left:
                queue.append(temp.left)
            if temp.right:
                queue.append(temp.right)
This is my calling client,
def main():
    bst = BST()
    bst.root = bst.add(12)
    bst.root = bst.add(15)
    bst.root = bst.add(9)
    bst.levelByLevel(bst.root)
if __name__ == '__main__':
    main()
Instead of the expected output of printing the BST level by level I get the following output,
9
9223372036854775807
When I look in the debugger, it seems that every time the add method is called it starts with root as None and then returns the last number as root. I'm not sure why this is happening.
Any help appreciated.
If the root argument of your addHelper is None, you set it to a newly-created Node object and return it. If it is not, then you modify the argument but return nothing, so you end up setting bst.root to None again. Try the following with your code above — it should help your understanding of what your code is doing.
bst = BST()
bst.root = bst.add(12)
try:
    print(bst.root.data)
except AttributeError:
    print('root is None')
# => 12
# `bst.addHelper(12, self.root)` returned `Node(12)`,
# which `bst.add` returned too, so now `bst.root`
# is `Node(12)`
bst.root = bst.add(15)
try:
    print(bst.root.data)
except AttributeError:
    print('root is None')
# => root is None
# `bst.addHelper(15, self.root)` returned `None`,
# which `bst.add` returned too, so now `bst.root`
# is `None`.
bst.root = bst.add(9)
try:
    print(bst.root.data)
except AttributeError:
    print('root is None')
# => 9
# `bst.addHelper(9, self.root)` returned `Node(9)`,
# which `bst.add` returned too, so now `bst.root`
# is `Node(9)`
So you should do two things:
1. make your addHelper always return its last argument (after the appropriate modifications), and
2. have your add function take care of assigning the result to self.root (do not leave it for the class user to do).
Here is the code:
def add(self, item):
    self.root = self.addHelper(item, self.root)
    self.size += 1 # Otherwise what good is `self.size`?
def addHelper(self, item, node):
    if node is None:
        node = Node(item)
    elif item < node.data:
        node.left = self.addHelper(item, node.left)
    else:
        node.right = self.addHelper(item, node.right)
    return node
Notice that I changed the name of the last argument in addHelper to node for clarity (there already is something called root: that of the tree!).
You can now write your main function as follows:
def main():
    bst = BST()
    bst.add(12)
    bst.add(15)
    bst.add(9)
    bst.levelByLevel(bst.root)
(which is exactly what @AaronTaggart suggests, but you need the modifications in add and addHelper). Its output is:
12
9
15
9223372036854775807
The above gets you to a working binary search tree. A few notes:
I would further modify your levelByLevel to avoid printing that last value, as well as not taking any arguments (besides self, of course) — it should always print from the root of the tree.
bst.add(None) will raise an error. You can guard against it by changing your add method. One possibility is
def add(self, item):
    try:
        self.root = self.addHelper(item, self.root)
        self.size += 1
    except TypeError:
        pass
Another option (faster, since it refuses to go on processing item if it is None) is
def add(self, item):
    if item is not None:
        self.root = self.addHelper(item, self.root)
        self.size += 1
From the point of view of design, I would expect that selecting a node from a binary search tree would give me the subtree below it. In a way it does (the node contains references to all other nodes below), but still: Node and BST objects are different things. You may want to think about a way of unifying the two (this is the point in @YairTwito's answer).
One last thing: in Python, the convention for naming things is to have words in lower case and separated by underscores, not the camelCasing you are using — so add_helper instead of addHelper. I would further add an underscore at the beginning to signal that it is not meant for public use — so _add_helper, or simply _add.
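To make the notes above concrete, here is one possible sketch (not the only way to do it) of a level-by-level printer that takes no arguments and needs no sentinel node, so the stray sys.maxsize value never appears. The names _add, levels and level_by_level are my own choices following the naming convention just described.

```python
from collections import deque


class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


class BST(object):
    def __init__(self):
        self.root = None
        self.size = 0

    def add(self, item):
        self.root = self._add(item, self.root)
        self.size += 1

    def _add(self, item, node):
        # Always return the (possibly new) root of this subtree.
        if node is None:
            return Node(item)
        if item < node.data:
            node.left = self._add(item, node.left)
        else:
            node.right = self._add(item, node.right)
        return node

    def levels(self):
        """Return the node values grouped by depth, top level first."""
        result = []
        queue = deque([self.root] if self.root is not None else [])
        while queue:
            level = []
            # Everything in the queue right now belongs to the same level.
            for _ in range(len(queue)):
                node = queue.popleft()
                level.append(node.data)
                if node.left is not None:
                    queue.append(node.left)
                if node.right is not None:
                    queue.append(node.right)
            result.append(level)
        return result

    def level_by_level(self):
        for level in self.levels():
            print(" ".join(str(value) for value in level))


bst = BST()
for item in (12, 15, 9):
    bst.add(item)
bst.level_by_level()
```

Counting the queue length at the start of each pass replaces the delimiter node: whatever is queued at that moment is exactly one level.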
Based on the following, you can see that bst.root is None after the second call to add():
>>> bst.root = bst.add(12)
>>> bst.root
<__main__.Node object at 0x7f9aaa29cfd0>
>>> bst.root = bst.add(15)
>>> type(bst.root)
<type 'NoneType'>
Your addHelper isn't returning the root node. Try this:
def addHelper(self, item, root):
    if root is None:
        root = Node(item)
        return root
    if item < root.data:
        root.left = self.addHelper(item, root.left)
    else:
        root.right = self.addHelper(item, root.right)
    return root
And then it works as expected:
>>> bst.root = bst.add(12)
>>> bst.root = bst.add(15)
>>> bst.levelByLevel(bst.root)
(12, ' ')
()
(15, ' ')
(9223372036854775807, ' ')
>>> bst.root = bst.add(9)
>>> bst.levelByLevel(bst.root)
(12, ' ')
()
(9, ' ')
(15, ' ')
(9223372036854775807, ' ')
You're using the BST object basically only to hold a root Node, and the add function doesn't really operate on the BST object, so it's better to have only one class (BstNode) and implement add there. Try that and you'll see that the add function becomes much simpler.
And, in general, when a member function doesn't use self (like addHelper), it shouldn't be a member function, i.e., it shouldn't have self as a parameter (if you'd like, I can show you how to write the BstNode class).
I tried writing a class that uses your idea of how to implement the BST.
class BstNode:
    def __init__(self):
        self.left = None
        self.right = None
        self.data = None
    def add(self,item):
        # Compare against None so that falsy items such as 0 are stored correctly.
        if self.data is None:
            self.data = item
        elif item >= self.data:
            if not self.right:
                self.right = BstNode()
            self.right.add(item)
        else:
            if not self.left:
                self.left = BstNode()
            self.left.add(item)
That way you can create a BST the following way:
bst = BstNode()
bst.add(13)
bst.add(10)
bst.add(20)
The difference is that now the add function actually operates on the object without any need for the user to do anything. The function changes the state of the object by itself.
In general a function should do only what it's expected to do. The add function is expected to add an item to the tree so it shouldn't return the root. The fact that you had to write bst.root = bst.add() each time should signal that there's some fault in your design.
Your add method probably shouldn't return a value. And you most certainly shouldn't assign the root of the tree to what the add method returns.
Try changing your main code to something like this:
def main():
    bst = BST()
    bst.add(12)
    bst.add(15)
    bst.add(9)
    bst.levelByLevel(bst.root)
if __name__ == '__main__':
    main()
I am trying to find the kth smallest element of a binary search tree and I have problems using recursion. I understand how to print the tree inorder/postorder etc., but I fail to return the rank of the element. Can someone point out where I am making a mistake? In general, I am having a hard time understanding recursion in trees.
Edit: this is an exercise, so I am not looking for using built-in functions. I have another solution where I keep track of number of left and right children as I insert nodes and that code is working fine. I am wondering if it is possible to do this using inorder traversal because it seems to be a simpler solution.
class BinaryTreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
def traverseInOrder(root,order):
    if root == None:
        return
    traverseInOrder(root.left,order+1)
    print root.data,
    print order
    traverseInOrder(root.right,order)
"""
         a
       /   \
      b     c
     / \   / \
    d   e f   g
   / \
  h   i
"""
h = BinaryTreeNode("h")
i = BinaryTreeNode("i")
d = BinaryTreeNode("d", h, i)
e = BinaryTreeNode("e")
f = BinaryTreeNode("f")
g = BinaryTreeNode("g")
b = BinaryTreeNode("b", d, e)
c = BinaryTreeNode("c", f, g)
a = BinaryTreeNode("a", b, c)
print traverseInOrder(a,0)
If this is an academic exercise, make traverseInOrder (or similar method tailored to the purpose) return the number of children it visited. From there things get simpler.
If this isn't academic, have a look at http://stromberg.dnsalias.org/~dstromberg/datastructures/ - the dictionary-like objects are all trees, and support iterators - so finding the nth is a matter of zip(tree, range(n)).
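Here is one way the "return the number of nodes visited" idea could look, as a sketch rather than a definitive solution. It assumes the tree really is a search tree (the lettered sample tree in the question is not ordered, so I use a small numeric one) and that no node stores None as its data, since None is used here to mean "not found yet". The function name kth_smallest and the tuple return shape are my own choices.

```python
class BinaryTreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def kth_smallest(root, k):
    """Return (nodes_visited, value); value stays None until rank k is reached."""
    if root is None:
        return 0, None
    # Everything in the left subtree ranks below this node.
    left_count, found = kth_smallest(root.left, k)
    if found is not None:
        return left_count, found
    if left_count + 1 == k:
        return left_count + 1, root.data
    # Ranks in the right subtree are shifted by what we have already passed.
    right_count, found = kth_smallest(root.right, k - left_count - 1)
    return left_count + 1 + right_count, found


# A small search tree:   4
#                       / \
#                      2   6
#                     / \
#                    1   3
root = BinaryTreeNode(4,
                      BinaryTreeNode(2, BinaryTreeNode(1), BinaryTreeNode(3)),
                      BinaryTreeNode(6))
visited, value = kth_smallest(root, 3)
print(value)  # 3
```

The key difference from the code in the question is that the count travels back up through the return value instead of being passed down as an argument, so each call learns how many nodes its left subtree contained.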
You could find the smallest element in the binary search tree first, then, starting from that element, call a method that gives you the next element k - 1 times.
For the find_smallest_node method, note that you could traverse all the nodes in order until you reach the smallest, but that approach takes O(n) time.
However, you do not need recursion to find the smallest node: in a BST the smallest node is simply the leftmost node, so you can walk down the left children until you find a node that has no left child. That takes O(h) time for a tree of height h, which is O(log n) when the tree is balanced:
class BST(object):
    def find_smallest_node(self):
        if self.root == None:
            return
        walking_node = self.root
        smallest_node = self.root
        while walking_node != None:
            if walking_node.data <= smallest_node.data:
                smallest_node = walking_node
            if walking_node.left != None:
                walking_node = walking_node.left
            elif walking_node.left == None:
                walking_node = None
        return smallest_node
    def find_k_smallest(self, k):
        k_smallest_node = self.find_smallest_node()
        if k_smallest_node == None:
            return
        else:
            k_smallest_data = k_smallest_node.data
        count = 1
        while count < k:
            k_smallest_data = self.get_next(k_smallest_data)
            count += 1
        return k_smallest_data
    def get_next (self, key):
        ...
It just requires keeping a reference to each node's parent when inserting nodes into the tree.
class Node(object):
    def __init__(self, data, left=None, right=None, parent=None):
        self.data = data
        self.right = right
        self.left = left
        self.parent = parent
An implementation of the bst class with the above methods and also def get_next (self, key) function is here. The upper folder contains the test cases for it and it worked.
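To illustrate why the parent reference helps, here is a hedged sketch of a "next element" step. This is not the linked implementation and does not reproduce get_next(key); the free functions insert, leftmost and successor are hypothetical names that work on nodes rather than keys.

```python
class Node(object):
    def __init__(self, data, left=None, right=None, parent=None):
        self.data = data
        self.left = left
        self.right = right
        self.parent = parent


def insert(root, data):
    """Insert data below root, recording each new node's parent; return the root."""
    if root is None:
        return Node(data)
    node = root
    while True:
        side = 'left' if data < node.data else 'right'
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(data, parent=node))
            return root
        node = child


def leftmost(node):
    """The smallest node of a subtree is its leftmost node."""
    while node.left is not None:
        node = node.left
    return node


def successor(node):
    """Return the node with the next larger value, or None if there is none."""
    if node.right is not None:
        return leftmost(node.right)
    # Climb until we leave a left subtree; that ancestor is the successor.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent


root = None
for value in (8, 3, 10, 1, 6, 14, 4, 7):
    root = insert(root, value)

node = leftmost(root)
ordered = []
while node is not None:
    ordered.append(node.data)
    node = successor(node)
print(ordered)  # [1, 3, 4, 6, 7, 8, 10, 14]
```

Without the parent reference, the climbing step in successor() would need a fresh search from the root each time.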
The problem:
I have a trie and I want to return the information stored in it. Some leaves have information (set as value > 0) and some leaves do not. I would like to return only those leaves that have a value.
As in all tries, the number of children on each node is variable, and the key to each value is made up of the path necessary to reach each leaf.
I am trying to use a generator to traverse the tree postorder, but I cannot get it to work. What am I doing wrong?
My module:
class Node():
    '''Each leaf in the trie is a Node() class'''
    def __init__(self):
        self.children = {}
        self.value = 0
class Trie():
    '''The Trie() holds all nodes and can return a list of their values'''
    def __init__(self):
        self.root = Node()
    def add(self, key, value):
        '''Store a "value" in a position "key"'''
        node = self.root
        for digit in key:
            number = digit
            if number not in node.children:
                node.children[number] = Node()
            node = node.children[number]
        node.value = value
    def __iter__(self):
        return self.postorder(self.root)
    def postorder(self, node):
        if node:
            for child in node.children.values():
                self.postorder(child)
            # Do my printing / job related stuff here
            if node.value > 0:
                yield node.value
Example use:
>>trie = Trie()
>>trie.add('foo', 3)
>>trie.add('foobar', 5)
>>trie.add('fobaz', 23)
>>for key in trie:
>>....print key
>>
3
5
23
I know that the example given is simple and can be solved using any other data structure. However, it is important for this program to use a trie as it is very beneficial for the data access patterns.
Thanks for the help!
Note: I have omitted newlines in the code block to be able to copy-paste with greater ease.
Change
self.postorder(child)
to
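for n in self.postorder(child):
    yield n
seems to make it work. Calling self.postorder(child) on its own only creates a generator object and then discards it, so the child's values never reach the caller; they have to be re-yielded.
P.S. It is very helpful that you left out the blank lines for ease of cut & paste :)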
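For reference, a compact runnable version of the fixed traversal, assuming Python 3.3 or later, where yield from does the same re-yielding as the explicit loop. The class layout follows the question; the use of setdefault in add is just a shorthand I picked.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.value = 0


class Trie:
    def __init__(self):
        self.root = Node()

    def add(self, key, value):
        node = self.root
        for char in key:
            # Create the child on first use, then descend into it.
            node = node.children.setdefault(char, Node())
        node.value = value

    def __iter__(self):
        return self.postorder(self.root)

    def postorder(self, node):
        for child in node.children.values():
            # Re-yield everything the recursive generator produces.
            yield from self.postorder(child)
        if node.value > 0:
            yield node.value


trie = Trie()
trie.add('foo', 3)
trie.add('foobar', 5)
trie.add('fobaz', 23)
print(list(trie))
```

Because this is a post-order walk, a node's descendants come out before the node itself, so the 5 stored under 'foobar' is yielded before the 3 stored under 'foo'.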