I am taking a course on algorithms and data structures in Python 3, and my instructor recently introduced us to the binary search tree. However, I am having trouble understanding the deletion algorithm. Below is the implementation we were taught. When I initially wrote my own version, I did not include a "base case", and it still worked:
def remove(self, data):
    if self.root:
        self.root = self.remove_node(data, self.root)

def remove_node(self, data, node):
    if node is None:
        return node
    if data < node.data:
        node.leftChild = self.remove_node(data, node.leftChild)
    elif data > node.data:
        node.rightChild = self.remove_node(data, node.rightChild)
    else:
        if not node.rightChild and not node.leftChild:
            print('removing leaf node')
            del node
            return None
        if not node.leftChild:
            print('removing node with single right child')
            tempNode = node.rightChild
            del node
            return tempNode
        elif not node.rightChild:
            print('removing node with single left child')
            tempNode = node.leftChild
            del node
            return tempNode
        print('removing node with two children')
        tempNode = self.get_predecessor(node.leftChild)
        node.data = tempNode.data
        node.leftChild = self.remove_node(tempNode.data, node.leftChild)
    return node
Now, all of this makes sense to me except the statement below:
if node is None:
    return node
When we previously learned about base cases, we were taught that they were essentially the exit points for our algorithms. However, I do not understand how this is the case in the given code. For one, I do not see how a node could ever be empty and even if it was, why would we return an empty node? As far as I can see, this check serves no purpose in the overall recursion because we do not seem to "recur towards it" as we would in any other recursive function. I would greatly appreciate an explanation!
Base cases, in general, serve one or more of the following purposes:
1. preventing the function from recursing infinitely
2. preventing the function from throwing errors on corner cases
3. computing/returning a value or result to callers higher up in the recursion tree
With tree deletion, the first point isn't really a concern, because each recursive call moves one level down a finite tree, so the recursion can only visit a finite number of nodes. You will be concerned with points 2 and 3 here.
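In your function, you do have a base case; in fact, you have two (thanks to @user2357112):
1. The value-not-found case, specified by
if node is None:
    return node
If the value is not in the tree, the recursion eventually follows a leftChild or rightChild reference that is None. Without this check, the next comparison, data < node.data, would raise AttributeError: 'NoneType' object has no attribute 'data'. This is presumably also why your own version without the check still appeared to work: it only fails when you try to remove a value that isn't in the tree.
2. The value-found case, specified by the code inside your else branch, which performs the actual deletion.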
To keep the behaviour consistent with the recursive cases (every call returns the node its parent should point to), the value-not-found base case returns None, which leaves the parent's None pointer unchanged. As you can see, the first base case performs the second function of a generic base case outlined above, while the second base case performs the third.
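To see both base cases at work, here is a minimal runnable sketch built around the question's remove/remove_node logic, with the print and del statements dropped. The Node class, insert, inorder, and get_predecessor (assumed here to return the largest node of the subtree it is given) are assumptions, since the question does not show them. Removing a value that isn't in the tree walks down to a None child, hits the first base case, and leaves the tree unchanged.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.leftChild = None
        self.rightChild = None


class BST:
    def __init__(self):
        self.root = None

    def insert(self, data):
        # Assumed helper (not shown in the question); duplicates go right.
        new_node = Node(data)
        if self.root is None:
            self.root = new_node
            return
        node = self.root
        while True:
            if data < node.data:
                if node.leftChild is None:
                    node.leftChild = new_node
                    return
                node = node.leftChild
            else:
                if node.rightChild is None:
                    node.rightChild = new_node
                    return
                node = node.rightChild

    def get_predecessor(self, node):
        # Assumed behaviour: largest node in the subtree rooted at node.
        while node.rightChild is not None:
            node = node.rightChild
        return node

    def remove(self, data):
        if self.root:
            self.root = self.remove_node(data, self.root)

    def remove_node(self, data, node):
        if node is None:
            # Base case 1: walked off the tree, so data is not present.
            return node
        if data < node.data:
            node.leftChild = self.remove_node(data, node.leftChild)
        elif data > node.data:
            node.rightChild = self.remove_node(data, node.rightChild)
        else:
            # Base case 2: data found; return what the parent should point to.
            if node.leftChild is None:
                return node.rightChild   # also covers the leaf case (None)
            if node.rightChild is None:
                return node.leftChild
            pred = self.get_predecessor(node.leftChild)
            node.data = pred.data
            node.leftChild = self.remove_node(pred.data, node.leftChild)
        return node

    def inorder(self):
        # Assumed helper: values in sorted order.
        values = []

        def walk(node):
            if node is not None:
                walk(node.leftChild)
                values.append(node.data)
                walk(node.rightChild)

        walk(self.root)
        return values


tree = BST()
for value in [50, 30, 70, 20, 40]:
    tree.insert(value)
tree.remove(99)         # not in the tree: ends at a None child, nothing changes
print(tree.inorder())   # [20, 30, 40, 50, 70]
tree.remove(30)         # two children: predecessor 20 takes its place
print(tree.inorder())   # [20, 40, 50, 70]
```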
Related
I'm curious about this implementation of node deletion in a BST (for additional code and context, see the full implementation). Here's how I understand it:
1. if val < self.data and elif val > self.data are both cases where the current node isn't the node to be deleted, so they recursively call the delete function on the appropriate child node.
2. else is the case where we have found the right node and need to perform the deletion.
   a. if self.left is None and self.right is None: return None. I'm unclear on what the goal is here. We've returned None but haven't reassigned the node itself to None.
   b. At this point, we've ruled out the possibility that both left and right don't exist, so elif self.left is None is a verbose way to write elif self.right, which returns right. But why? It doesn't reassign the value of the node itself.
   c. I'm unsure why this last control flow statement is elif self.right is None. Why the absence of an else statement?
3. This min_val, self.data, self.right dance occurs only when none of the return statements from #2 executes, so I suppose that this is an implicit else statement.
   a. This step really boils down to assigning self.data the minimum value in its right subtree, then assigning self.right the output of a recursive function call, which might be left, right or None from #2.
def delete(self, val):
    if val < self.data:
        if self.left:
            self.left = self.left.delete(val)
    elif val > self.data:
        if self.right:
            self.right = self.right.delete(val)
    else:
        if self.left is None and self.right is None:
            return None
        elif self.left is None:
            return self.right
        elif self.right is None:
            return self.left
        min_val = self.right.find_min()
        self.data = min_val
        self.right = self.right.delete(min_val)
    return self
To answer this question, please confirm or correct some of my doubts above.
The first cases of the algorithm are about deleting nodes from the tree: removing all references to them while maintaining the BST's key order.
Think about how you'd do this with pencil and paper.
If the deleted node has no children, redraw the parent's pointer to point to nothing (None). The deleted node is now "cut out" of the tree. BST order is maintained.
If the deleted node has exactly one child, replace the parent's pointer so it now points to that child. Again the deleted node is cut out of the tree, the BST key order is maintained.
Note that the parent pointers are replaced at the recursive call sites, e.g. self.left = self.left.delete(val): whatever delete returns becomes the parent's new child. So by returning None, the "no children" case causes the parent to point to nothing. The "cut out" nodes will ultimately be garbage collected by Python.
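A self-contained sketch around the question's delete method makes the pointer reassignment visible. The insert and find_min helpers are assumptions (the question only links to the full class); find_min is assumed to return the smallest value in the subtree. Because deleting the root can return a different node (or None), the caller has to reassign too, as in root = root.delete(val).

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def insert(self, val):
        # Assumed helper: plain recursive BST insert.
        if val < self.data:
            if self.left is None:
                self.left = Node(val)
            else:
                self.left.insert(val)
        else:
            if self.right is None:
                self.right = Node(val)
            else:
                self.right.insert(val)

    def find_min(self):
        # Assumed helper: smallest value in this subtree (leftmost node).
        node = self
        while node.left is not None:
            node = node.left
        return node.data

    def delete(self, val):
        # Same logic as the question; the return value is what the caller
        # (the parent, or the code holding the root) should point to next.
        if val < self.data:
            if self.left:
                self.left = self.left.delete(val)
        elif val > self.data:
            if self.right:
                self.right = self.right.delete(val)
        else:
            if self.left is None and self.right is None:
                return None          # parent's pointer becomes None
            elif self.left is None:
                return self.right    # parent now points at the only child
            elif self.right is None:
                return self.left
            min_val = self.right.find_min()
            self.data = min_val      # copy the successor's key into this node
            self.right = self.right.delete(min_val)
        return self                  # this node stays where it is

    def inorder(self):
        left = self.left.inorder() if self.left else []
        right = self.right.inorder() if self.right else []
        return left + [self.data] + right


root = Node(8)
for v in [3, 10, 1, 6, 14]:
    root.insert(v)
root = root.delete(8)             # two children: successor 10 is copied into the root
print(root.data, root.inorder())  # 10 [1, 3, 6, 10, 14]
```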
Otherwise you have the more complex case: the deleted node has two children but only one parent pointer to hand to a replacement. What to do? This particular code finds the next larger key relative to the deleted one (the minimum of the right subtree, i.e. the in-order successor) and copies it over the key to be deleted. That node isn't cut out at all. Then it deletes the moved value from the right subtree, which does cause the successor's original node to be cut out. Again, BST key order is maintained.
This way of implementing the third case makes for clean code, and it probably makes sense in Python, where all data access is via reference (pointer). But it's not the only choice. The alternative is to continue the same pattern as the other cases: actually cut out the node containing the deleted value and move the successor node to that position (not just its key data). This is a better choice if the node actually contains the data (not just a reference to it) and that data is very large, since it saves a copy of all of it. If the author had made that choice, there would be another return of that node for re-assignment of the parent's pointer.
My findlast function returns the correct node (I checked the value before returning it), but the variable receiving it (lastnode) ends up as None.
I'm simply returning the last node of the tree so that I can later substitute the node to be deleted with it.
class tree:
    def __init__(self):
        self.root=None
    def findlast(self,node,arr):
        if(node.left!=None):
            arr.append(node.left)
        if(node.right!=None):
            arr.append(node.right)
        if(len(arr)==1):
            print(arr[0].data) #prints 3
            return arr[0]
        elif(self.root==None and len(arr)==0):
            print("empty tree")
        elif(self.root!=None and len(arr)==0):
            return self.root
        else:
            self.findlast(arr.pop(0),arr)

class treeNode:
    def __init__(self,data=None,left=None,right=None):
        self.data=data
        self.left=left
        self.right=right

bst=tree()
tn1=treeNode(2)
tn2=treeNode(3)
bst.root=treeNode(1,tn1,tn2)
arr=list()
print(bst.findlast(bst.root,arr).data) #throws error "nonetype has no object data"
To find the rightmost node at the lowest level, do an inorder traversal of the tree. When you find a leaf node, check its depth against the deepest node you've found previously. If the new node's depth is greater than or equal to the previous deepest node, then the new node becomes the deepest node. This works because an inorder traversal of the tree will visit the leaf nodes from left to right.
This is an O(n) operation. It's not possible to do it any faster because you must traverse the entire tree to find the deepest node.
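Separately, the immediate reason the question's call evaluates to None is that the else branch calls self.findlast(arr.pop(0), arr) without returning its result, so the node found deeper in the recursion is discarded and the outer call falls off the end of the function. The in-order approach described above can be sketched like this; TreeNode mirrors the question's treeNode class, and deepest_rightmost is a hypothetical name.

```python
class TreeNode:
    # Mirrors the question's treeNode class.
    def __init__(self, data=None, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def deepest_rightmost(root):
    """Return the rightmost node on the lowest level, or None for an empty tree."""
    best = None
    best_depth = -1

    def inorder(node, depth):
        nonlocal best, best_depth
        if node is None:
            return
        inorder(node.left, depth + 1)
        # In-order visits leaves left to right, so ">=" lets a leaf further
        # right on the same level replace the one found before it.
        if node.left is None and node.right is None and depth >= best_depth:
            best, best_depth = node, depth
        inorder(node.right, depth + 1)

    inorder(root, 0)
    return best


root = TreeNode(1, TreeNode(2), TreeNode(3))
print(deepest_rightmost(root).data)   # 3
```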
For example, to change greater than (>) to greater than or equal to (>=), I have successfully used:
def visit_Gt(self, node):
    new_node = ast.GtE()
    return ast.copy_location(new_node, node)
How would I visit/detect an assignment operation (=) and a function call () and simply delete them? I'm reading through the AST documentation and I can't find a way to visit the assignment or function call classes and then return nothing.
An example of what I'm seeking for assignment operations:
print("Start")
x = 5
print("End")
Becomes:
print("Start")
print("End")
And an example of what I'm seeking for deleting function calls:
print("Start")
my_function_call(Args)
print("End")
Becomes:
print("Start")
print("End")
You can use an ast.NodeTransformer subclass to mutate an existing AST:
import ast

class RemoveAssignments(ast.NodeTransformer):
    def visit_Assign(self, node):
        return None

    def visit_AugAssign(self, node):
        return None

# old_tree is the tree to transform, e.g. the result of ast.parse(source)
new_tree = RemoveAssignments().visit(old_tree)
The above class returns None to completely remove the node from the input tree. The Assign and AugAssign nodes contain the whole assignment statement: both the expression producing the result and the target list (one or more names to assign the result to).
This means that the above will turn
print('Start!')
foo = 'bar'
foo += 'eggs'
print('Done!')
into
print('Start!')
print('Done!')
If you need to make more fine-grained decisions, look at the child nodes of the assignment, either directly, or by passing the child nodes to self.visit() to have the transformer further call visit_* hooks for them if they exist:
class RemoveFunctionCallAssignments(ast.NodeTransformer):
    """Remove assignments of the form "target = name()", so a single name being called.

    The target list size plays no role.
    """
    def visit_Assign(self, node):
        if isinstance(node.value, ast.Call) and isinstance(node.value.func, ast.Name):
            return None
        return node
Here, we only return None if the value side of the assignment (the expression on the right-hand side) is a Call node that is applied to a straightforward Name node. Returning the original node object passed in means that it won't be replaced.
To remove stand-alone function calls (those without an assignment or further expressions), look at Expr nodes; these are expression statements, not just expressions that are part of some other construct. If you have an Expr node with a Call, you can remove it:
def visit_Expr(self, node):
    # stand-alone call to a single name is to be removed
    if isinstance(node.value, ast.Call) and isinstance(node.value.func, ast.Name):
        return None
    return node
Also see the excellent Green Tree Snakes documentation, which covers working on the AST tree with further examples.
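Putting the pieces together, here is a sketch that parses source, removes assignments and selected stand-alone calls, and turns the tree back into source with ast.unparse (Python 3.9+). Note that visit_Expr as written above would also remove the print(...) lines, since they are stand-alone calls to a name too; this sketch therefore only removes calls to names listed in call_names, a hypothetical parameter. It also handles AnnAssign (annotated assignments such as x: int = 5), which is a separate node type.

```python
import ast


class RemoveStatements(ast.NodeTransformer):
    """Drop assignment statements and stand-alone calls to selected names."""

    def __init__(self, call_names):
        super().__init__()
        self.call_names = set(call_names)  # hypothetical: names whose bare calls are removed

    def visit_Assign(self, node):      # x = 5, a = b = f()
        return None

    def visit_AugAssign(self, node):   # x += 1
        return None

    def visit_AnnAssign(self, node):   # x: int = 5
        return None

    def visit_Expr(self, node):
        value = node.value
        if (isinstance(value, ast.Call)
                and isinstance(value.func, ast.Name)
                and value.func.id in self.call_names):
            return None                # stand-alone call such as my_function_call(Args)
        return self.generic_visit(node)


def strip_source(source, call_names):
    tree = ast.parse(source)
    new_tree = RemoveStatements(call_names).visit(tree)
    ast.fix_missing_locations(new_tree)
    return ast.unparse(new_tree)       # ast.unparse needs Python 3.9+


source = 'print("Start")\nx = 5\nmy_function_call(Args)\nprint("End")\n'
print(strip_source(source, {"my_function_call"}))
```

If removing statements leaves a block empty (for example, a function body that only contained an assignment), the unparsed result is not valid Python; a real tool would need to insert a pass statement in that case.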
I have implemented a simple graph data structure in Python with the structure below. The code is here just to clarify what the functions/variables mean, but they are pretty self-explanatory, so you can skip reading it.
class Node:
    def __init__(self, label):
        self.out_edges = []
        self.label = label
        self.is_goal = False
        self.is_visited = False

    def add_edge(self, node, weight):
        self.out_edges.append(Edge(node, weight))

    def visit(self):
        self.is_visited = True

class Edge:
    def __init__(self, node, weight):
        self.node = node
        self.weight = weight

    def to(self):
        return self.node

class Graph:
    def __init__(self):
        self.nodes = []

    def add_node(self, label):
        self.nodes.append(Node(label))

    def visit_nodes(self):
        for node in self.nodes:
            node.is_visited = True
Now I am trying to implement a depth-first search which starts from a given node v, and returns a path (in list form) to a goal node. By goal node, I mean a node with the attribute is_goal set to true. If a path exists, and a goal node is found, the string ':-)' is added to the list. Otherwise, the function just performs a DFS and goes as far as it can go. (I do this here just to easily check whether a path exists or not).
This is my implementation:
def dfs(G, v):
    path = []                                # path is empty so far
    v.visit()                                # mark the node as visited
    path.append(v.label)                     # add to path
    if v.is_goal:                            # if v is a goal node
        path.append(':-)')                   # indicate a path is reached
        G.visit_nodes()                      # set all remaining nodes to visited
    else:
        for edge in v.out_edges:             # for each out_edge of the starting node
            if not edge.to().is_visited:     # if the node pointed to is not visited
                path += dfs(G, edge.to())    # return the path + dfs starting from that node
    return path
Now the problem is, I have to set all the nodes to visited (the G.visit_nodes() call) for the algorithm to end once a goal node is reached. In effect, this sort of breaks out of the awaiting recursive calls, since it ensures no other nodes are added to the path. My question is:
Is there a cleaner/better way to do this?
The solution seems a bit kludgy. I'd appreciate any help.
It would be better not to clutter the graph structure with visited information, as that really is context-sensitive information linked to a search algorithm, not with the graph itself. You can use a separate set instead.
Secondly, you have a bug in the code, as you keep adding to the path variable, even if your recursive call did not find the target node. So your path will even have nodes in sequence that have no edge between them, but are (close or remote) siblings/cousins.
Instead, you should only return a path when you have found the target node, and after making the recursive call you should test that condition to determine whether to prefix that path with the current edge node you are trying.
There is in fact no need to keep a path variable, since at each recursion level you are only looking for one node to be added to the path you get from the recursive call. It is not necessary to store that one node in a list. Just a simple variable will do.
Here is the suggested code (not tested):
def dfs(G, v):
    visited = set()  # keep visited information away from graph

    def _dfs(v):
        visited.add(v)  # mark the node as visited
        if v.is_goal:
            return [':-)']  # return end point of path
        for edge in v.out_edges:
            neighbor = edge.to()  # to avoid calling to() several times
            if neighbor not in visited:
                result = _dfs(neighbor)
                if result:  # only when successful
                    # we only need 1 success: add current neighbor and exit
                    return [neighbor.label] + result
                # otherwise, nothing should change to any path: continue
        # don't return anything in case of failure

    # call nested function: the visited and Graph variables are shared
    return _dfs(v)
Remark
For the same reason as for visited, it is maybe better to remove the is_goal marking from the graph as well, and pass that target node as an additional argument to the dfs function.
It would also be nice to give a default value for the weight argument, so that you can use this code for unweighted graphs as well.
See how it runs on a sample graph with 5 nodes on repl.it.
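A sketch applying both remarks: the search state lives in a local set, the goal is passed as an argument instead of an is_goal flag, and add_edge gets a default weight of 1. Unlike the answer's version, it includes the start node's label in the path and drops the ':-)' marker, returning None when the goal is unreachable; those are choices made for this sketch. The five-node graph in the usage lines is made up for illustration.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.out_edges = []

    def add_edge(self, node, weight=1):
        # Default weight so the same class also serves unweighted graphs.
        self.out_edges.append(Edge(node, weight))


class Edge:
    def __init__(self, node, weight):
        self.node = node
        self.weight = weight

    def to(self):
        return self.node


def dfs(start, goal):
    """Return the labels on a path from start to goal, or None if goal is unreachable."""
    visited = set()                       # search state, kept off the graph

    def _dfs(v):
        visited.add(v)
        if v is goal:
            return [v.label]
        for edge in v.out_edges:
            neighbor = edge.to()
            if neighbor not in visited:
                rest = _dfs(neighbor)
                if rest is not None:      # first success wins: stop searching
                    return [v.label] + rest
        return None                       # dead end from v

    return _dfs(start)


a, b, c, d, e = (Node(label) for label in "abcde")
a.add_edge(b)
a.add_edge(c, weight=3)
b.add_edge(d)
c.add_edge(e)
d.add_edge(b)          # a cycle: the visited set keeps it from looping
print(dfs(a, e))       # ['a', 'c', 'e']
print(dfs(b, e))       # None
```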
I am trying to write code to delete all nodes of a BST (each node has only three attributes, left, right and data; there are no parent pointers). The following code is what I have come up with; it deletes only the right half of the tree, keeping the left half intact. How do I modify it so that the left half is deleted as well (so that ultimately I am left with only the root node, which has neither left nor right subtrees)?
def delete(root):
    global last
    if root:
        delete(root.left)
        delete(root.right)
        if not (root.left or root.right):
            last = root
        elif root.left == last:
            root.left = None
        else:
            root.right = None
And secondly, can anybody suggest an iterative approach as well, using stack or other related data structure?
Blckknght is right about garbage collection, but in case you want to do some more complex cleanup than your example suggests, or want to understand why your code didn't work, I'll provide an additional answer:
Your problem seems to be the elif node.left == last check.
I'm not sure what your last variable is used for or what the logic is behind it.
But the problem is that node.left is almost never equal to last (you only assign a node to the last variable if both children are already set to None, which they aren't for any of the interesting nodes (those that have children)).
If you look at your code, you'll see that if node.left isn't equal to last, only the right child gets set to None, and thus only the right part of the tree is deleted.
I don't know python, but this should work:
def delete(node):
    if node:
        # recurse: visit all nodes in the two subtrees
        delete(node.left)
        delete(node.right)
        # after both subtrees have been visited, set pointers of this node to None
        node.left = None
        node.right = None
(I took the liberty of renaming your root parameter to node, since the node given to the function doesn't have to be the root-node of the tree.)
If you want to delete both subtrees, there's no need to recurse. Just set root.left and root.right to None and let the garbage collector take care of them. Indeed, rather than making a delete function in the first place, you could just set root = None and be done with it!
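You can check the garbage-collection claim with weakref, which refers to an object without keeping it alive. In this sketch, the Node class and build_tree are hypothetical; only root holds strong references into the tree, so dropping the links makes the subtrees unreachable. gc.collect() is called so the check does not depend on CPython's reference-counting timing.

```python
import gc
import weakref


class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def build_tree():
    """Hypothetical helper: build a 6-node tree, return (root, weak refs to all nodes)."""
    n4, n5, n6 = Node(4), Node(5), Node(6)
    n2 = Node(2, n4, n5)
    n3 = Node(3, n6)
    root = Node(1, n2, n3)
    # weakref.ref does not keep its target alive.
    refs = [weakref.ref(n) for n in (root, n2, n3, n4, n5, n6)]
    return root, refs


root, refs = build_tree()
root.left = None     # drop both subtrees...
root.right = None
gc.collect()         # ...and make sure any collectable garbage is gone
print(sum(r() is not None for r in refs))  # 1: only the root is still alive
```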
Edit: If you need to run cleanup code on the data values, you might want to recurse through the tree to get to all of them if the GC doesn't do enough. Tearing down the links in the tree shouldn't really be necessary, but I'll do that too for good measure:
def delete(node):
    if node:
        node.data.cleanup()  # run data value cleanup code
        delete(node.left)    # recurse
        delete(node.right)
        node.data = None     # clear pointers (not really necessary)
        node.left = None
        node.right = None
You had also asked about an iterative approach to traversing the tree, which is a little more complicated. Here's a way to do a post-order traversal using a deque (as a stack) to keep track of the ancestors:
from collections import deque

def delete_iterative(node):
    stack = deque()
    last = None

    # start up by pushing nodes to the stack until reaching leftmost node
    while node:
        stack.append(node)
        node = node.left

    # the main loop
    while stack:
        node = stack.pop()

        # should we expand the right subtree?
        if node.right and node.right is not last:  # yes
            stack.append(node)
            node = node.right
            while node:  # expand to find leftmost node in right subtree
                stack.append(node)
                node = node.left
        else:  # no, we just came from there (or it doesn't exist)
            # delete node's contents
            node.data.cleanup()
            node.data = None  # clear pointers (not really necessary)
            node.left = None
            node.right = None

            # let our parent know that it was us it just visited
            last = node
An iterative post-order traversal using a stack could look like this:
def is_first_visit(cur, prev):
    return prev is None or prev.left is cur or prev.right is cur

def visit_tree(root):
    if root:
        todo = [root]
        previous = None
        while len(todo):
            node = todo[-1]
            if is_first_visit(node, previous):
                # add one of our children to the stack
                if node.left:
                    todo.append(node.left)
                elif node.right:
                    todo.append(node.right)
                # now set previous to ourself and continue
            elif previous is node.left:
                # we've done the left subtree, do right subtree if any
                if node.right:
                    todo.append(node.right)
            else:
                # previous is either node.right (we've visited both sub-trees)
                # or ourself (we don't have a right subtree)
                do_something(node)
                todo.pop()
            previous = node
do_something does whatever you want to call "actually deleting this node".
You can do it a bit more simply by setting an attribute on each node to say whether it has had do_something called on it yet, but obviously that doesn't work so well if your nodes have __slots__ or whatever, and you don't want to modify the node type to allow for the flag.
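A common variant of the flag idea keeps the flag in the stack entry instead of on the node, by pushing (node, children_pushed) pairs, so the node type needs no extra attribute and __slots__ is not a problem. This is a swapped-in technique rather than the answer's code; postorder_iter, visit and clear_links are hypothetical names, and visit plays the role of do_something.

```python
class Node:
    __slots__ = ("data", "left", "right")   # no room for an extra "visited" flag

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def postorder_iter(root, visit):
    """Call visit(node) on every node in post-order, without recursion."""
    stack = [(root, False)] if root is not None else []
    while stack:
        node, children_pushed = stack.pop()
        if children_pushed:
            visit(node)            # both subtrees are finished by now
        else:
            # Re-push the node first so it pops again after its children;
            # push right before left so the left subtree is handled first.
            stack.append((node, True))
            if node.right is not None:
                stack.append((node.right, False))
            if node.left is not None:
                stack.append((node.left, False))


def clear_links(node):
    # Hypothetical stand-in for do_something: unlink the node's children.
    node.left = None
    node.right = None


root = Node(1, Node(2, Node(4), Node(5)), Node(3))
order = []
postorder_iter(root, lambda n: order.append(n.data))
print(order)                      # [4, 5, 2, 3, 1]
postorder_iter(root, clear_links)
```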
I'm not sure what you're doing with those conditions after the recursive calls, but I think this should be enough:
def delete(root):
    if root:
        delete(root.left)
        delete(root.right)
        root = None
As pointed out in comments, Python does not pass parameters by reference. In that case you can make this work in Python like this:
def delete(root):
    if root:
        delete(root.left)
        delete(root.right)
        root.left = None
        root.right = None
Usage:
delete(root)
root = None
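To see why the first version has no effect on the caller: Python passes each argument as a reference to an object, but assigning to the parameter name only rebinds a local variable, whereas assigning to an attribute mutates the shared object. A minimal sketch (rebind and clear_links are hypothetical names):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def rebind(root):
    # Only rebinds the local name "root"; the caller's variable is untouched.
    root = None


def clear_links(root):
    # Mutates the object the caller also refers to.
    root.left = None
    root.right = None


tree = Node(1, Node(2), Node(3))
rebind(tree)
print(tree.left is None)   # False: the caller's tree is unchanged
clear_links(tree)
print(tree.left is None)   # True
tree = None                # only the caller can drop its own reference
```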
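As for an iterative approach, you can try this. Basically we do a breadth-first search (BFS) with a queue, clearing each node's child pointers once its children have been queued:
from collections import deque

def delete(root):
    q = deque()
    if root:
        q.append(root)
    while q:
        c = q.popleft()
        # queue the children before unlinking them from c
        if c.left:
            q.append(c.left)
        if c.right:
            q.append(c.right)
        c.left = None
        c.right = None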
Again, this won't modify the root by default if you use it as a function, but it will delete all other nodes. You could just set the root to None after the function call, or remove the parameter and work on a global root variable.