I'm trying to solve the keyboard autocompletion problem described here.
The problem is to calculate how many keystrokes a word requires, given some dictionary and autocomplete rules. For example, for the dictionary:
data = ['hello', 'hell', 'heaven', 'goodbye']
We get the following results (please refer to the link above for further explanations):
{'hell': 2, 'heaven': 2, 'hello': 3, 'goodbye': 1}
Quick explanation: if the user types h, then e is autocompleted because all words starting with h also have e as their second letter. If the user then types l, the other l is filled in, giving 2 strokes for the word hell. Of course, hello requires one more stroke. Please see the link above for more examples.
My trie code is below, and it works fine (adapted from https://en.wikipedia.org/wiki/Trie). The Stack class is used to traverse the tree from the root (see below):
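To make the rule concrete, here is a brute-force sketch that simulates typing each word against the whole dictionary instead of using a trie. It assumes, as in the example, that the first letter always has to be typed, and that autocompletion stops as soon as the typed prefix is itself a dictionary word or the longer words disagree on the next letter. The function name is illustrative, and the approach is quadratic in the dictionary size, so it is only meant as a cross-check for a trie-based solution.

```python
def strokes_bruteforce(words):
    """Count keystrokes per word by simulating the autocomplete rule directly."""
    vocab = set(words)
    result = {}
    for target in words:
        typed, count = "", 0
        while typed != target:
            typed += target[len(typed)]  # the user types the next letter
            count += 1
            # Autocomplete while the prefix is not a word and every longer
            # word sharing this prefix agrees on the next letter.
            while typed not in vocab:
                nexts = {w[len(typed)] for w in vocab
                         if len(w) > len(typed) and w.startswith(typed)}
                if len(nexts) != 1:
                    break
                typed += nexts.pop()
        result[target] = count
    return result


print(strokes_bruteforce(['hello', 'hell', 'heaven', 'goodbye']))
# {'hello': 3, 'hell': 2, 'heaven': 2, 'goodbye': 1}
```

Autocompletion can never overshoot the target word: it only fires when every longer word with the current prefix, including the target, agrees on the next letter.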
class Stack(object):
    def __init__(self, size):
        self.data = [None]*size
        self.i = 0
        self.size = size

    def pop(self):
        if self.i == 0:
            return None
        item = self.data[self.i - 1]
        self.i -= 1
        return item

    def push(self, item):
        if self.i >= self.size:
            return None
        self.data[self.i] = item
        self.i += 1
        return item

    def __str__(self):
        s = '# Stack contents #\n'
        if self.i == 0:
            return s  # __str__ must always return a string
        for idx in range(self.i - 1, -1, -1):
            s += str(self.data[idx]) + '\n'
        return s

class Trie(object):
    def __init__(self, value, children):
        self.value = value        # prefix leading to this node (the word itself at a word end)
        self.children = children  # {char: Trie}

class PrefixTree(object):
    def __init__(self, data):
        self.root = Trie(None, {})
        self.data = data
        for w in data:
            self.insert(w, w)

    def insert(self, string, value):
        node = self.root
        i = 0
        n = len(string)
        # Follow the existing path as far as possible
        while i < n:
            if string[i] in node.children:
                node = node.children[string[i]]
                i = i + 1
            else:
                break
        # Create nodes for the remaining characters
        while i < n:
            node.children[string[i]] = Trie(string[:i], {})
            node = node.children[string[i]]
            i = i + 1
        node.value = value

    def find(self, key):
        node = self.root
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return None
        return node
I couldn't figure out how to count the number of strokes:
data = ['hello', 'hell', 'heaven', 'goodbye']
tree = PrefixTree(data)
strokes = {w:1 for w in tree.data} #at least 1 stroke is necessary
And here's the code to traverse the tree from the root:
stack = Stack(100)
stack.push((None, tree.root))
print('Key\tChilds\tValue')
print('--'*25)
strokes = {}
while stack.i > 0:
    key, curr = stack.pop()
    # if something:
    #     update strokes
    print('%s\t%s\t%s' % (key, len(curr.children), curr.value))
    for key, node in curr.children.items():
        stack.push((key, node))
print(strokes)
Any idea or constructive comment would help, thanks!
Edit
Great answer by @SergiyKolesnikov. One small change avoids the call to endswith(): I just added a boolean field to the Trie class:
class Trie(object):
    def __init__(self, value, children, eow=False):
        self.value = value        # prefix leading to this node
        self.children = children  # {char: Trie}
        self.eow = eow            # end of word
And at the end of insert():
    def insert(self, string, value):
        # ...
        node.value = value
        node.eow = True
Then just replace curr.value.endswith('$') with curr.eow (and save the word as strokes[curr.value] instead of strokes[curr.value[:-1]], since no '$' is appended any more). Thank you all!
The trie for your example can look like this:
        ' '
       /   \
      H     G
      |     |
      E     O
     / \    |
    L   A   O
    |   |   |
    L$  V   D
    |   |   |
    O   E   B
        |   |
        N   Y
            |
            E
What nodes in the trie can be seen as markers for user key strokes? There are two types of such nodes:
Inner nodes with more than one child, because the user has to choose among multiple alternatives.
Nodes that represent the last letter of a word, but are not leaves (marked with $), because the user has to type the next letter if the current word is not what is needed.
While traversing the trie recursively, one counts how many of these marker nodes were encountered before the last letter of a word was reached. This count is the number of strokes needed for the word.
For the word "hell" it is two marker nodes: ' ' and E (2 strokes).
For the word "hello" it is three marker nodes: ' ', E, L$ (3 strokes).
And so on...
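The marker-node rule can also be written as a short recursive function. This sketch is not the code from the question: it uses a hypothetical nested-dict trie with a hypothetical `END` key marking word ends. It counts a node at most once even when it is both a word end and a branching node (typing one more letter resolves both), and it treats the root as always costing a stroke, on the assumption that the first letter must always be typed, which the question's `strokes = {w:1 ...}` initialization suggests.

```python
END = "$"  # hypothetical end-of-word key (assumes words never contain '$')

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def count_strokes(node, prefix="", strokes=0, result=None, is_root=True):
    """Depth-first walk; `strokes` is the number needed to reach `node`."""
    if result is None:
        result = {}
    children = [ch for ch in node if ch != END]
    if END in node:
        result[prefix] = strokes  # a complete word ends here
    # A marker node costs one extra stroke for everything below it.
    if is_root or len(children) > 1 or (END in node and children):
        strokes += 1
    for ch in children:
        count_strokes(node[ch], prefix + ch, strokes, result, is_root=False)
    return result

print(count_strokes(build_trie(['hello', 'hell', 'heaven', 'goodbye'])))
# {'hell': 2, 'hello': 3, 'heaven': 2, 'goodbye': 1}
```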
What needs to be changed in your implementation:
The end of a valid word needs to be marked in the tree, so that the second condition can be checked. Therefore, we change the last line of the PrefixTree.insert() method from
node.value = value
to
node.value = value + '$'
Now we add a stroke counter for each stack item (the last value in the triple pushed to the stack) and the checks that increase the counter:
stack = Stack(100)
stack.push((None, tree.root, 0))  # We start with stroke counter = 0
print('Key\tChilds\tValue')
print('--'*25)
strokes = {}
while stack.i > 0:
    key, curr, stroke_counter = stack.pop()
    if curr.value is not None and curr.value.endswith('$'):
        # The end of a valid word is reached. Save the word and the corresponding stroke counter.
        strokes[curr.value[:-1]] = stroke_counter
    if len(curr.children) > 1:
        # Condition 1 is true (several possible next letters). Increase the stroke counter.
        stroke_counter += 1
    elif curr.value is not None and curr.value.endswith('$') and len(curr.children) > 0:
        # Condition 2 is true (a word ends here but the node is not a leaf). Increase the stroke counter.
        # elif: a node that meets both conditions still costs only one extra stroke.
        stroke_counter += 1
    print('%s\t%s\t%s' % (key, len(curr.children), curr.value))
    for key, node in curr.children.items():
        stack.push((key, node, stroke_counter))  # Save the stroke counter
print(strokes)
Output:
Key     Childs  Value
--------------------------------------------------
None    2       None
h       1
e       2       h
a       1       he
v       1       hea
e       1       heav
n       0       heaven$
l       1       he
l       1       hell$
o       0       hello$
g       1
o       1       g
o       1       go
d       1       goo
b       1       good
y       1       goodb
e       0       goodbye$
{'heaven': 2, 'goodbye': 1, 'hell': 2, 'hello': 3}
While you go through your stack, you should keep a stroke counter for each node:
It begins at 0 for the root (the item whose key is None).
If the current node has more than one child, the counter of its children is 1 more than the current counter.
Otherwise, if the current value is a valid word and the node has at least one child, the counter of the child(ren) is 1 more than the current counter.
For documentation purposes, here's my Ruby answer:
class Node
  attr_reader :key, :children
  attr_writer :final

  def initialize(key, children = [])
    @key = key
    @children = children
    @final = false
  end

  def final?
    @final
  end
end

class Trie
  attr_reader :root

  def initialize
    @root = Node.new('')
  end

  def add(word)
    node = root
    word.each_char.each{|c|
      next_node = node.children.find{|child| child.key == c}
      if next_node then
        node = next_node
      else
        next_node = Node.new(c)
        node.children.push(next_node)
        node = next_node
      end
    }
    node.final = true
  end

  def count_strokes(node=root,word="",i=0)
    word=word+node.key
    strokes = {}
    if node.final? then
      strokes[word]=i
      if node.children.size>0 then
        i+=1
      end
    elsif node.children.size>1 then
      i+=1
    end
    node.children.each{|c|
      strokes.merge!(count_strokes(c, word, i))
    }
    strokes
  end
end

data = ['hello', 'hell', 'heaven', 'goodbye']
trie = Trie.new
data.each do |word|
  trie.add(word)
end
# File.readlines('/usr/share/dict/british-english').each{|line|
#   trie.add line.strip
# }
puts trie.count_strokes
#=> {"hell"=>2, "hello"=>3, "heaven"=>2, "goodbye"=>1}
It's only 60 lines, and it takes less than 3 seconds for 100,000 words.
Related
I'm trying to optimize this solution for a function that accepts 2 arguments: fullstring and substring. The function will return True if the substring exists in the fullstring, and False if it does not. There is one special wildcard that could be entered in the substring that denotes 0 or 1 of the previous symbol, and there can be more than one wildcard in the substring.
For example, "a*" means "" or "a"
The solution I have works fine but I'm trying to reduce the number of for loops (3) and optimize for time complexity. Using regex is not permitted. Is there a more pythonic way to do this?
Current Solution:
def complex_search(fullstring, substring):
    patterns = []
    if "*" in substring:
        index = substring.index("*")
        patterns.append(substring[:index-1] + substring[index+1:])
        patterns.append(substring[:index] + substring[index+1:])
    else:
        patterns.append(substring)

    def check(s1, s2):
        for a, b in zip(s1, s2):
            if a != b:
                return False
        return True

    for pattern in patterns:
        for i in range(len(fullstring) - len(pattern) + 1):
            if check(fullstring[i:i+len(pattern)], pattern):
                return True
    return False

>>> print(complex_search("dogandcats", "dogs*andcats"))
True
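As an alternative to building every combination (there are 2**k of them for k wildcards), the pattern can be matched in a single left-to-right pass by tracking which pattern positions are still reachable, in the spirit of an NFA simulation. This is a swapped-in technique, not the approach from the question or the answer below; the function names are illustrative, it assumes every '*' directly follows the character it makes optional, and it runs in roughly O(len(fullstring) * len(substring)) time without regex.

```python
def tokenize(pattern):
    """Turn 'dogs*andcats' into [('d', False), ..., ('s', True), ...]."""
    tokens, i = [], 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == '*'
        tokens.append((pattern[i], optional))
        i += 2 if optional else 1
    return tokens

def complex_search_nfa(fullstring, substring):
    tokens = tokenize(substring)
    goal = len(tokens)

    def skip_optional(states):
        # An optional token may match nothing, so its successor is reachable too.
        result, stack = set(states), list(states)
        while stack:
            t = stack.pop()
            if t < goal and tokens[t][1] and t + 1 not in result:
                result.add(t + 1)
                stack.append(t + 1)
        return result

    active = skip_optional({0})  # state t means "tokens[:t] matched so far"
    for ch in fullstring:
        if goal in active:
            return True
        advanced = {t + 1 for t in active if t < goal and tokens[t][0] == ch}
        active = skip_optional(advanced | {0})  # a new match may start here
    return goal in active

print(complex_search_nfa("dogandcats", "dogs*andcats"))    # True
print(complex_search_nfa("dogssandcats", "dogs*andcats"))  # False
```

Unlike the current solution, this handles any number of wildcards, because each '*' only adds an epsilon-style skip instead of doubling the pattern list.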
Approach
Create all alternatives for the substring based on the '*' characters in it (the substring can contain zero or more '*').
See function combs(...) below.
Use Aho-Corasick to check whether one of the substring patterns is in the string. Aho-Corasick is a very efficient algorithm for checking whether one or more substrings appear in a string, and it formed the basis of the original Unix command fgrep.
For illustrative purposes a Python version of Aho-Corasick is used below, but a C implementation (with a Python wrapper) is available as pyahocorasick for higher performance.
See class Aho_Corasick below.
Code
# Note: This is a modification of code explained in https://carshen.github.io/data-structures/algorithms/2014/04/07/aho-corasick-implementation-in-python.html
from collections import deque

class Aho_Corasick():
    def __init__(self, keywords):
        self.adj_list = []
        # creates a trie of keywords, then sets fail transitions
        self.create_empty_trie()
        self.add_keywords(keywords)
        self.set_fail_transitions()

    def create_empty_trie(self):
        """ initialize the root of the trie """
        self.adj_list.append({'value':'', 'next_states':[],'fail_state':0,'output':[]})

    def add_keywords(self, keywords):
        """ add all keywords in list of keywords """
        for keyword in keywords:
            self.add_keyword(keyword)

    def find_next_state(self, current_state, value):
        for node in self.adj_list[current_state]["next_states"]:
            if self.adj_list[node]["value"] == value:
                return node
        return None

    def add_keyword(self, keyword):
        """ add a keyword to the trie and mark output at the last node """
        current_state = 0
        j = 0
        keyword = keyword.lower()
        child = self.find_next_state(current_state, keyword[j])
        while child is not None:
            current_state = child
            j = j + 1
            if j < len(keyword):
                child = self.find_next_state(current_state, keyword[j])
            else:
                break
        for i in range(j, len(keyword)):
            node = {'value':keyword[i],'next_states':[],'fail_state':0,'output':[]}
            self.adj_list.append(node)
            self.adj_list[current_state]["next_states"].append(len(self.adj_list) - 1)
            current_state = len(self.adj_list) - 1
        self.adj_list[current_state]["output"].append(keyword)

    def set_fail_transitions(self):
        q = deque()
        child = 0
        for node in self.adj_list[0]["next_states"]:
            q.append(node)
            self.adj_list[node]["fail_state"] = 0
        while q:
            r = q.popleft()
            for child in self.adj_list[r]["next_states"]:
                q.append(child)
                state = self.adj_list[r]["fail_state"]
                while (self.find_next_state(state, self.adj_list[child]["value"]) is None
                       and state != 0):
                    state = self.adj_list[state]["fail_state"]
                self.adj_list[child]["fail_state"] = self.find_next_state(state, self.adj_list[child]["value"])
                if self.adj_list[child]["fail_state"] is None:
                    self.adj_list[child]["fail_state"] = 0
                self.adj_list[child]["output"] = self.adj_list[child]["output"] + self.adj_list[self.adj_list[child]["fail_state"]]["output"]

    def get_keywords_found(self, line):
        """ yields the keywords in the trie that occur in line """
        line = line.lower()
        current_state = 0
        keywords_found = []
        for i, c in enumerate(line):
            while self.find_next_state(current_state, c) is None and current_state != 0:
                current_state = self.adj_list[current_state]["fail_state"]
            current_state = self.find_next_state(current_state, c)
            if current_state is None:
                current_state = 0
            else:
                for j in self.adj_list[current_state]["output"]:
                    yield {"index":i-len(j) + 1,"word":j}

    def pattern_found(self, line):
        ''' Returns true when the pattern is found '''
        return next(self.get_keywords_found(line), None) is not None

def combs(word, n = 0, path = ""):
    ''' Generate all combinations of words with star
        e.g. list(combs("he*lp*")) = ['hl', 'hlp', 'hel', 'help']
    '''
    if n == len(word):
        yield path
    elif word[n] == '*':
        # Next letter
        yield from combs(word, n+1, path)  # don't add * to path
    else:
        if n < len(word) - 1 and word[n+1] == '*':
            yield from combs(word, n+1, path)  # Not including letter at n
        yield from combs(word, n+1, path + word[n])  # including letter at n
Test
patterns = combs("dogs*andcats") # ['dogandcats', 'dogsandcats']
aho = Aho_Corasick(patterns) # Aho-Corasick structure to recognize patterns
print(aho.pattern_found("dogandcats")) # Output: True
print(aho.pattern_found("dogsandcats")) # Output: True
I have a huge trie dictionary that I built from web data. Although it is only 5 MB when I write the trie to a file, it takes far more space when I load it into memory (more than 100 MB), so I have to compress the trie.
I am having difficulty writing a recursive function (preferably one that runs in linear time, like a DFS) to remove the words whose frequency is < 5 and length > 15. Any help is appreciated.
Here is my trie structure.
class TrieNode:
    def __init__(self, ch='|'):
        self.ch = ch
        self.score = 0
        self.childs = [None]*26
        self.isWord = False

class Trie:
    # Note: is_valid(word) and to_int(ch) are helper functions not shown here.
    def __init__(self):
        self.root = TrieNode('$')

    @staticmethod
    def print_trie(node, level):
        if node is None:
            return
        print(node.ch, " ", level, " ", node.isWord)
        for i in range(26):
            Trie.print_trie(node.childs[i], level+1)

    def insert(self, word):
        word = word.lower()
        if not is_valid(word):
            return
        childs = self.root.childs
        i = 0
        while i < len(word):
            idx = to_int(word[i])
            if childs[idx] is not None:
                t = childs[idx]
            else:
                t = TrieNode(word[i])
                childs[idx] = t
            childs = t.childs
            if i == len(word)-1:
                t.isWord = True
                t.score += 1
            i += 1

    def search_node(self, word):
        word = word.lower()
        if not is_valid(word):
            return False, 0
        if self.root is None or word is None or len(word) == 0:
            return False, 0
        children = self.root.childs
        for i in range(len(word)):
            idx = to_int(word[i])
            if children[idx] is not None:
                t = children[idx]
                children = t.childs
            else:
                return False, 0
        if t.isWord:
            return True, t.score
        else:
            return False, t.score
The following method takes a node and its level (initially pass in root and 0) and returns True if the node should remain alive after pruning and False if the node should be removed from the trie (with its subtrie).
def prune(node, level):
    if node is None:
        return False
    canPruneNode = True
    for idx in range(len(node.childs)):
        # If any of the children remains alive, don't prune current node.
        if prune(node.childs[idx], level + 1):
            canPruneNode = False
        else:
            # Remove dead child.
            node.childs[idx] = None
    if node.isWord and level > 15 and node.score < 5:
        node.isWord = False
    # Current node should be removed if and only if all of its children
    # were removed and it doesn't represent a word itself after pruning.
    return node.isWord or not canPruneNode
I am not sure removing will solve the problem: the space is consumed not by the words but by the 26 child slots every node has.
For example, I have a word cat with frequency 30 and another word cater with frequency 10. If you delete the node for t in cat, all the subsequent nodes are deleted as well (so cater is reduced to cat).
So removing a word from the trie means nothing more than setting its score to 0.
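One way to address the memory concern, sketched here as an assumption rather than a measured fix: give each node a dict that holds only the letters actually present (instead of 26 slots, most of them None) and declare `__slots__` so instances carry no per-object `__dict__`; for the sparse nodes that dominate a large trie this is typically much smaller. Pruning then just deletes dead dict entries. The class and method names are hypothetical, and the thresholds mirror the question (frequency < 5, length > 15).

```python
class CompactTrieNode:
    # __slots__ drops the per-instance __dict__; children only stores letters present.
    __slots__ = ("children", "score", "is_word")

    def __init__(self):
        self.children = {}
        self.score = 0
        self.is_word = False

class CompactTrie:
    def __init__(self):
        self.root = CompactTrieNode()

    def insert(self, word):
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, CompactTrieNode())
        node.is_word = True
        node.score += 1

    def search(self, word):
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return False, 0
        return node.is_word, node.score

    def prune(self, node=None, depth=0, min_score=5, max_len=15):
        """Unmark long, rare words; drop subtrees that no longer contain a word.
        Returns True if `node` should be kept."""
        if node is None:
            node = self.root
        for ch in list(node.children):
            if not self.prune(node.children[ch], depth + 1, min_score, max_len):
                del node.children[ch]
        if node.is_word and depth > max_len and node.score < min_score:
            node.is_word = False
        return node.is_word or bool(node.children)

t = CompactTrie()
for _ in range(30):
    t.insert("cat")
for _ in range(10):
    t.insert("cater")
t.insert("a" * 16)
t.prune()
print(t.search("cat"), t.search("cater"), t.search("a" * 16))
```

Because a word end is only unmarked, cat and cater stay independent: a subtree is removed only when no word remains anywhere below it.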
I was looking through Ben Langmead's suffix tree code. I am having a hard time figuring out how to print all the edges of the suffix tree. What is a way to store them in a set and save that set as an attribute of the object?
class SuffixTree(object):

    class Node(object):
        def __init__(self, lab):
            self.lab = lab  # label on path leading to this node
            self.out = {}   # outgoing edges; maps characters to nodes

    def __init__(self, s):
        """ Make suffix tree, without suffix links, from s in quadratic time
            and linear space """
        s += '$'  # unique terminator (assumes s has no '$'); hasSuffix relies on it
        suffix = []
        self.suffix = suffix
        self.root = self.Node(None)
        self.root.out[s[0]] = self.Node(s)  # trie for just longest suf
        # add the rest of the suffixes, from longest to shortest
        for i in range(1, len(s)):
            # start at root; we’ll walk down as far as we can go
            cur = self.root
            j = i
            while j < len(s):
                if s[j] in cur.out:
                    child = cur.out[s[j]]
                    lab = child.lab
                    # Walk along edge until we exhaust edge label or
                    # until we mismatch
                    k = j+1
                    while k-j < len(lab) and s[k] == lab[k-j]:
                        k += 1
                    if k-j == len(lab):
                        cur = child  # we exhausted the edge
                        suffix.append(child.lab)
                        j = k
                    else:
                        # we fell off in middle of edge
                        cExist, cNew = lab[k-j], s[k]
                        # create “mid”: new node bisecting edge
                        mid = self.Node(lab[:k-j])
                        mid.out[cNew] = self.Node(s[k:])
                        # original child becomes mid’s child
                        mid.out[cExist] = child
                        # original child’s label is curtailed
                        child.lab = lab[k-j:]
                        # mid becomes new child of original parent
                        cur.out[s[j]] = mid
                else:
                    # Fell off tree at a node: make new edge hanging off it
                    cur.out[s[j]] = self.Node(s[j:])

    def followPath(self, s):
        """ Follow path given by s. If we fall off tree, return None. If we
            finish mid-edge, return (node, offset) where 'node' is child and
            'offset' is label offset. If we finish on a node, return (node,
            None). """
        cur = self.root
        i = 0
        while i < len(s):
            c = s[i]
            if c not in cur.out:
                return (None, None)  # fell off at a node
            child = cur.out[s[i]]
            lab = child.lab
            j = i+1
            while j-i < len(lab) and j < len(s) and s[j] == lab[j-i]:
                j += 1
            if j-i == len(lab):
                cur = child  # exhausted edge
                i = j
            elif j == len(s):
                return (child, j-i)  # exhausted query string in middle of edge
            else:
                return (None, None)  # fell off in the middle of the edge
        return (cur, None)  # exhausted query string at internal node

    def hasSubstring(self, s):
        """ Return true iff s appears as a substring """
        node, off = self.followPath(s)
        return node is not None

    def hasSuffix(self, s):
        """ Return true iff s is a suffix """
        node, off = self.followPath(s)
        if node is None:
            return False  # fell off the tree
        if off is None:
            # finished on top of a node
            return '$' in node.out
        else:
            # finished at offset 'off' within an edge leading to 'node'
            return node.lab[off] == '$'
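A minimal sketch for the edge question: walk the tree depth-first and record each edge as a (path from the root to the parent, edge label) pair, so that equal labels under different parents stay distinct in the set. It relies only on the `lab` and `out` attributes shown above; the stand-alone `Node` class and the name `collect_edges` are illustrative. Inside `SuffixTree.__init__`, one could store the result with `self.edges = collect_edges(self.root)` after the construction loop.

```python
class Node:
    """Stand-in with the same attributes as SuffixTree.Node above."""
    def __init__(self, lab):
        self.lab = lab  # label on the edge leading to this node
        self.out = {}   # first character of a child's label -> child node

def collect_edges(root):
    """Return {(parent_path, edge_label), ...} using an iterative DFS."""
    edges = set()
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for child in node.out.values():
            edges.add((path, child.lab))
            stack.append((child, path + child.lab))
    return edges

# Hand-built suffix tree of "aa$" (suffixes "aa$", "a$", "$"):
root = Node(None)
mid = Node("a")
mid.out["a"] = Node("a$")
mid.out["$"] = Node("$")
root.out["a"] = mid
root.out["$"] = Node("$")

for parent_path, label in sorted(collect_edges(root)):
    print(repr(parent_path), "->", label)
```

An iterative stack is used instead of recursion so that long strings do not run into Python's recursion limit.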
The objective of my code is to read each separate word from a text file into a list, then build a binary search tree from that list to count the frequency of each word, and finally print each word in alphabetical order along with its frequency. Each word in the file can only contain letters, numbers, - or '. The part I can't manage with my beginner programming knowledge is building the binary search tree from the list I have: I am only able to insert the whole list into one node instead of putting each word in its own node. The code I have so far is this:
def read_words(filename):
    allowed = set("abcdefghijklmnopqrstuvwxyz'- ")
    letterslist = []
    with open(filename, "r") as openfile:
        for lines in openfile:
            for i in lines:
                ii = i.lower()
                # Build a filtered list instead of calling remove() while iterating (that skips items)
                if ii in allowed or ii.isdigit():
                    letterslist.append(ii)
                elif ii.isspace():
                    letterslist.append(' ')  # keep newlines/tabs as word separators
    wordslist = "".join(letterslist).split()
    return wordslist
class BinaryTree:
    class _Node:
        def __init__(self, value, left=None, right=None):
            self._left = left
            self._right = right
            self._value = value
            self._count = 1

    def __init__(self):
        self.root = None

    def isEmpty(self):
        return self.root == None

    def insert(self, value):
        if self.isEmpty():
            self.root = self._Node(value)
            return
        parent = None
        pointer = self.root
        while (pointer != None):
            if value == pointer._value:
                pointer._count += 1
                return
            elif value < pointer._value:
                parent = pointer
                pointer = pointer._left
            else:
                parent = pointer
                pointer = pointer._right
        if (value <= parent._value):
            parent._left = self._Node(value)
        else:
            parent._right = self._Node(value)

    def printTree(self):
        pointer = self.root
        if pointer._left is not None:
            pointer._left.printTree()
        print(str(pointer._value) + " " + str(pointer._count))
        if pointer._right is not None:
            pointer._right.printTree()

    def createTree(self, words):
        if len(words) > 0:
            for word in words:
                BinaryTree().insert(word)
            return BinaryTree()
        else:
            return None

    def search(self, tree, word):
        node = tree
        depth = 0
        count = 0
        while True:
            print(node.value)
            depth += 1
            if node.value == word:
                count = node.count
                break
            elif word < node.value:
                node = node.left
            elif word > node.value:
                node = node.right
        return depth, count

def main():
    words = read_words('sample.txt')
    b = BinaryTree()
    b.insert(words)
    b.createTree(words)
    b.printTree()
Since you're a beginner, I'd advise implementing the tree methods with recursion instead of iteration, since this results in a simpler implementation. While recursion might seem a difficult concept at first, it is often the easiest approach.
Here's a draft implementation of a binary tree that uses recursion for insertion, searching and printing the tree; it should support the functionality you need.
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.count = 1

    def __str__(self):
        return 'value: {0}, count: {1}'.format(self.value, self.count)

def insert(root, value):
    if not root:
        return Node(value)
    elif root.value == value:
        root.count += 1
    elif value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def create(seq):
    root = None
    for word in seq:
        root = insert(root, word)
    return root

def search(root, word, depth=1):
    if not root:
        return 0, 0
    elif root.value == word:
        return depth, root.count
    elif word < root.value:
        return search(root.left, word, depth + 1)
    else:
        return search(root.right, word, depth + 1)

def print_tree(root):
    if root:
        print_tree(root.left)
        print(root)
        print_tree(root.right)

src = ['foo', 'bar', 'foobar', 'bar', 'barfoo']
tree = create(src)
print_tree(tree)
for word in src:
    print('search {0}, result: {1}'.format(word, search(tree, word)))

# Output
# value: bar, count: 2
# value: barfoo, count: 1
# value: foo, count: 1
# value: foobar, count: 1
# search foo, result: (1, 1)
# search bar, result: (2, 2)
# search foobar, result: (2, 1)
# search bar, result: (2, 2)
# search barfoo, result: (3, 1)
To answer your direct question, the reason why you are placing all of the words into a single node is because of the following statement inside of main():
b.insert(words)
The insert function creates a Node and sets the node's value to whatever you pass in, here the whole list. Instead, you need to create a node for each item in the list, which is what your createTree() function is meant to do. The preceding b.insert is not necessary.
Removing that line makes your tree correctly formed, but reveals a fundamental problem with the design of your data structure, namely the printTree() method. This method seems designed to traverse the tree by recursively calling itself on each child. In your initial version this function worked because the tree was malformed, with only a single node holding the whole list (so the print function simply printed that value, since right and left were empty).
However, with a correctly formed tree the printTree() function now tries to invoke itself on the left and right descendants. The descendants, however, are of type _Node, not BinaryTree, and there is no method printTree() declared for _Node objects.
You can salvage your code and fix this new error in one of two ways. First, you can implement your BinaryTree.printTree() function as _Node.printTree(). You can't do a straight copy and paste, but the logic of the function won't have to change much. Second, you could leave the method where it is but wrap each _left or _right node in a new BinaryTree so that it has the necessary printTree() method; you will still have to implement some kind of helper traversal method inside _Node.
Finally, you could change all of your _Node objects into BinaryTree objects instead.
The semantic difference between a node and a tree is one of scope. A node should only be aware of itself, its direct children (left and right), and possibly its parent. A tree, on the other hand, can be aware of any of its descendants, no matter how far removed. This is accomplished by treating any child node as its own tree. Even a leaf, without any children at all, can be thought of as a tree with a depth of 0. This behavior is what lets a tree work recursively. Your code mixes the two together.
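Here is a minimal sketch of the first option (moving the traversal onto `_Node`), keeping the question's `BinaryTree`/`_Node` naming but written recursively; `createTree` inserts every word into this tree instead of into throwaway `BinaryTree()` objects. The `items()` methods are hypothetical helpers added so the in-order result can be inspected without printing.

```python
class BinaryTree:
    class _Node:
        def __init__(self, value):
            self._left = None
            self._right = None
            self._value = value
            self._count = 1

        def insert(self, value):
            if value == self._value:
                self._count += 1
            elif value < self._value:
                if self._left is None:
                    self._left = BinaryTree._Node(value)
                else:
                    self._left.insert(value)
            else:
                if self._right is None:
                    self._right = BinaryTree._Node(value)
                else:
                    self._right.insert(value)

        def items(self):
            # In-order traversal: left subtree, this node, right subtree.
            if self._left is not None:
                yield from self._left.items()
            yield self._value, self._count
            if self._right is not None:
                yield from self._right.items()

    def __init__(self):
        self.root = None

    def insert(self, value):
        if self.root is None:
            self.root = self._Node(value)
        else:
            self.root.insert(value)

    def createTree(self, words):
        for word in words:
            self.insert(word)  # one node per distinct word
        return self

    def items(self):
        return list(self.root.items()) if self.root is not None else []

    def printTree(self):
        for value, count in self.items():
            print(value, count)

BinaryTree().createTree(['foo', 'bar', 'foobar', 'bar', 'barfoo']).printTree()
```

In main(), this becomes `b = BinaryTree()`, `b.createTree(read_words('sample.txt'))`, `b.printTree()`, without the `b.insert(words)` call. Note that recursive insertion can hit Python's default recursion limit if the words arrive already sorted, since the tree then degenerates into a long chain.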
I need to create a binary tree from a list of lists. My problem is that some of the nodes overlap (in the sense that the left child of one node is the right child of another) and I want to separate them.
I duplicated the overlapping nodes and created a single list, but I am missing something. The code I use to do that:
self.root = root = BNodeItem(values[0][0], 0)
q = list()
q.append(root)
# make single tree list
tree_list = list()
tree_list.append(values[0][0])
for i in range(1, len(values[0])):
    ll = [v for v in numpy.array(values)[:, i] if v is not None]
    # duplicate the values
    p = []
    for item in ll[1:-1]:
        p.append(item)
        p.append(item)
    new_ll = list()
    new_ll.append(ll[0])
    new_ll.extend(p)
    new_ll.append(ll[-1])
    tree_list.extend(new_ll)
# fix tree
for ind in range(len(tree_list)//2 - 1):
    eval_node = q.pop(0)
    eval_node.left = BNodeItem(tree_list[2*ind + 1], 0)
    eval_node.right = BNodeItem(tree_list[2*ind + 2], 0)
    q.append(eval_node.left)
    q.append(eval_node.right)
The values variable looks like this (the 0 entries are None in my actual data):
100       141.9068  201.3753  285.7651  405.5200  575.4603
0         70.4688   100       141.9068  201.3753  285.7651
0         0         49.6585   70.4688   100.0000  141.9068
0         0         0         34.9938   49.6585   70.4688
0         0         0         0         24.6597   34.9938
0         0         0         0         0         17.3774
So, for example, 141.9 in column 1 (level 1 of the tree) has children 201.3 and 100 in column 2, but 70.4 has children 100 and 49.6 in column 2 (so 100 is shared).
Any suggestions?
EDIT: I had an error in len() and in creating the nodes from the list values (wrong lists). There still seems to be a bug.
Seems it's working
Use this to print the tree from @Arthur's solution:
class Node():
    def __init__(self, value):
        self.value = value
        self.leftChild = None
        self.rightChild = None

    def __str__(self, depth=0):
        ret = ""
        if self.leftChild is not None:
            ret += self.leftChild.__str__(depth + 1)
        ret += "\n" + (" " * depth) + str(self.value)
        if self.rightChild is not None:
            ret += self.rightChild.__str__(depth + 1)
        return ret
Here is a solution that returns a Node object with left and right children, which lets you use most tree-traversal algorithms. If needed, you can easily add a reference to the parent node.
data2 = [[1, 2, 3],
         [0, 4, 5],
         [0, 0, 6]]

def exceptFirstColumn(data):
    if data and data[0]:
        return [row[1:] for row in data]
    else:
        return []

def exceptFirstLine(data):
    if data:
        return data[1:]

def left(data):
    """ Returns the part of the data used to build the left subtree """
    return exceptFirstColumn(data)

def right(data):
    """ Returns the part of the data used to build the right subtree """
    return exceptFirstColumn(exceptFirstLine(data))

class Node():
    def __init__(self, value):
        self.value = value
        self.leftChild = None
        self.rightChild = None

    def __repr__(self):
        if self.leftChild != None and self.rightChild != None:
            return "[{0} (L:{1} | R:{2}]".format(self.value, self.leftChild.__repr__(), self.rightChild.__repr__())
        else:
            return "[{0}]".format(self.value)

def fromData2Tree(data):
    if data and data[0]:
        node = Node(data[0][0])
        node.leftChild = fromData2Tree(left(data))
        node.rightChild = fromData2Tree(right(data))
        return node
    else:
        return None

tree = fromData2Tree(data2)
print(tree)
This code gives the following result:
[1 (L:[2 (L:[3] | R:[5]] | R:[4 (L:[5] | R:[6]]]
That is the requested tree, shown below. Test it on your data; it works. Now try to understand how it works ;)
      +-----1-----+
      |           |
   +--2--+     +--4--+
   |     |     |     |
   3     5     5     6
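A variant of the same idea, sketched under the assumption visible in data2 and in the question's matrix: the value at (row r, column c) has children at (r, c+1) and (r+1, c+1). Indexing into the original matrix avoids copying sub-matrices at every call, and entries below the diagonal (the 0/None cells) are never visited. Duplicating shared values makes the result a full binary tree with 2**ncols - 1 nodes, so it grows exponentially with the number of columns (63 nodes for the 6-column matrix in the question). The names build_tree and count_nodes are illustrative.

```python
class Node():
    def __init__(self, value):
        self.value = value
        self.leftChild = None
        self.rightChild = None

def build_tree(data, row=0, col=0):
    """Node for data[row][col]; left child = same row, right child = next row."""
    if row >= len(data) or col >= len(data[row]):
        return None
    node = Node(data[row][col])
    node.leftChild = build_tree(data, row, col + 1)
    node.rightChild = build_tree(data, row + 1, col + 1)
    return node

def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.leftChild) + count_nodes(node.rightChild)

data2 = [[1, 2, 3],
         [0, 4, 5],
         [0, 0, 6]]
tree = build_tree(data2)
print(tree.value, tree.leftChild.value, tree.rightChild.value)  # 1 2 4
print(count_nodes(tree))                                         # 7
```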