Python Set not Detecting Duplicate Node

I have created a Graph class along with a Node class that I've tested. It is working with basic tests. I'm now trying to test with a custom input file and I'm running into an error where some of the Nodes are being duplicated. In the Graph class, there is a set called Nodes where the Node being created is added. However, in my parser file I have a checker which checks to see if the Node with the value has been added before but it's not working properly.
Even after adding in the hash function below, I still get set([Alpha, Beta, Hotel, Alpha]).
What am I doing wrong?
Test File:
directed unweighted
Alpha=Hotel
Alpha=Beta
Beta=Charlie
Charlie=Juliett
Charlie=Juliett
Delta=Foxtrot
Delta=Golf
Echo=Charlie
Echo=Delta
Foxtrot=Golf
Golf=Juliett
Golf=Alpha
Hotel=Echo
Hotel=India
India=Beta
India=Golf
Juliett=Golf
Graph and Node Class:
class Graph:
    def __init__(self):
        self.Nodes = set()
    def addVertex(self, vertex):
        self.Nodes.add(vertex)
    def getVertices(self):
        return self.Nodes
    @staticmethod
    def addEdgeDirect(fromEdge, toEdge, cost=1):
        fromEdge.neighbors[toEdge] = cost
class Node():
    def __init__(self, val = None):
        self.value = val
        self.neighbors = {}
    def getEdges(self):
        return self.neighbors.keys()
    def __repr__(self):
        return str(self.value)
Test File Parser:
from graph import Graph, Node
graph = Graph()
file1 = open('Graphs/du.gl', 'r')
Lines = file1.readlines()
direction = Lines[0].split(" ")[0].strip()
weight = Lines[0].split(" ")[1].strip()
count = 0
if(direction == "directed"):
    if(weight == "unweighted"):
        for line in Lines:
            count += 1
            if(count == 1):
                continue
            node1 = Node(line.split("=")[0].strip())
            node2 = Node(line.split("=")[1].strip())
            if node1 not in graph.Nodes:
                print("not found, to be added")
                graph.addVertex(Node(node1))
            if node2 not in graph.Nodes:
                graph.addVertex(Node(node2))
            print(graph.getVertices())
            graph.addEdgeDirect(node1,node2)
            # print("Line{}: {}".format(count, line.strip()))

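By default, objects in a set are distinguished by their identity (reference), because a class that defines neither __eq__ nor __hash__ inherits identity-based versions of both. Defining __hash__ alone doesn't help either: the set also compares candidates with __eq__, so value-based uniqueness needs both methods.
A simpler approach, which I would suggest, is to:
Use a dictionary instead of a set
Only create Node instances when you already know that you need a new instance, that is, after having checked that the Nodes dictionary doesn't have it yet
As a consequence: pass the string value to addVertex instead of a Node instance.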
With some other little changes, your code could be:
class Graph:
    def __init__(self):
        self.nodes = {}
    def addVertex(self, key):
        # Create the Node only if this key has not been seen before
        if key not in self.nodes:
            self.nodes[key] = Node(key)
        return self.nodes[key]
    def getVertices(self):
        return self.nodes.values()
    @staticmethod
    def addEdgeDirect(fromEdge, toEdge, cost=1):
        fromEdge.neighbors[toEdge] = cost
class Node():
    def __init__(self, val=None):
        self.value = val
        self.neighbors = {}
    def getEdges(self):
        return self.neighbors.keys()
    def __repr__(self):
        return str(self.value)
graph = Graph()
with open('du.gl', 'r') as file1:
    firstline, *lines = file1.readlines()
direction, weight = firstline.split()
if direction == "directed":
    if weight == "unweighted":
        for line in lines:
            # addVertex returns the stored Node for each side of "A=B"
            graph.addEdgeDirect(*(graph.addVertex(key.strip())
                                  for key in line.split("=")))
print(graph.getVertices())
Addendum
In comments you ask about getEdges and how it could return more information.
Here is a variant of that method:
def getEdges(self):
    return { self: self.neighbors }
If you then do this:
print(graph.nodes['Alpha'].getEdges())
...it will output:
{Alpha: {Hotel: 1, Beta: 1}}

You are expecting that the set should contain nodes with unique names, but nowhere do you specify that uniqueness should depend on the property value of those nodes.
You should add the following method to your node class:
def __hash__(self):
    return hash(self.value)

There are a few issues, some related to a lack of type checking and some to the design of your classes.
First, you have a Node class whose instances, you've implied, should each have a unique self.value, and that self.value is expected to always be a string (although neither expectation is enforced in the code).
One problem causing the set() behavior, addressed in another answer, is the lack of a __hash__() method. Since it seems you want two Nodes to be equal if and only if their value attribute is the same string, adding
def __hash__(self):
    return hash(self.value)
is needed. However, for set() to work the way you want, you also need to add an __eq__() method:
def __eq__(self, other):
    return isinstance(other, Node) and self.value == other.value
It's unclear to me whether the 'neighbors' attribute of a node should matter for its identity, as the terms node and graph don't carry information about what the classes actually are or represent, so maybe neighbors should be included in __eq__ too.
After adding those methods, the body of your loop is still incorrect. The problem line is graph.addVertex(Node(node1)), specifically the Node(node1). Presumably that was intended to create a copy of node1, but it actually initializes a new node whose value is an instance of Node, not a string. Type hints and/or explicit type checking help catch such problems, for example a check for isinstance(val, str) in the body of __init__.
Replacing those two conditionals in the loop body, the correct version would be:
if node1 not in graph.Nodes:
    graph.addVertex(node1)
if node2 not in graph.Nodes:
    graph.addVertex(node2)
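Putting the points above together, a minimal sketch of a value-based Node could look like the following. It assumes, as the question implies, that a node's identity is its string value alone; the isinstance guard in __init__ is an illustrative addition and not part of the original code.

```python
class Node:
    """Graph node whose identity is its string value (an assumption)."""

    def __init__(self, val):
        # Illustrative guard: catches mistakes such as Node(Node("Alpha")).
        if not isinstance(val, str):
            raise TypeError("Node value must be a str, got %s" % type(val).__name__)
        self.value = val
        self.neighbors = {}

    def __eq__(self, other):
        return isinstance(other, Node) and self.value == other.value

    def __hash__(self):
        # Must agree with __eq__: equal nodes produce equal hashes.
        return hash(self.value)

    def __repr__(self):
        return str(self.value)


nodes = set()
for name in ["Alpha", "Hotel", "Alpha", "Beta"]:
    nodes.add(Node(name))  # the second "Alpha" is recognised as a duplicate

print(len(nodes))  # prints 3
```

Note that the set keeps the first Alpha object it was given. An edge added to a later, equal but distinct, Alpha object is not attached to the stored one, which is one reason the dictionary approach may be easier to work with.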

Related

Linked lists in Python

I have a Linked Lists assignment for school although I am just getting the hang of class constructors. I am trying to simply get the basics of the linked list data structure down, and I understand the basic concept. I have watched lots of Youtube tutorials and the like, but where I am failing to understand is how to print out the cargo or data in my nodes using a loop.
I have written something along these lines:
class Node:
    def __init__(self, value, pointer):
        self.value = value
        self.pointer = pointer
node4 = Node(31, None)
node3 = Node(37, None)
node2 = Node(62, None)
node1 = Node(23, None)
Now...I understand that each node declaration is a call to the class constructor of Node and that the list is linked because each node contains a pointer to the next node, but I simply don't understand how to print them out using a loop. I've seen examples using global variables for the "head" and I've seen subclasses created to accomplish the task. I'm old and dumb. I was wondering if someone could take it slow and explain it to me like I'm 5. If anyone out there has the compassion and willingness to hold my hand through the explanation, I would be greatly obliged. Thank you in advance, kind sirs.
First of all, your nodes need to point to each other. Create the last node first, so that every node you pass as a pointer already exists:
node1 = Node(23, None)
node2 = Node(62, node1)
node3 = Node(37, node2)
node4 = Node(31, node3)
Now, I am sure you can see that the last node in the list points to None. Therefore, you can loop through the list until you encounter None. Something like this should work:
printhead = node4
while True:
    print(printhead.value)
    if printhead.pointer is None:
        break
    else:
        printhead = printhead.pointer
This is a very basic linked list implementation for educational purposes only.
from __future__ import print_function
"""The above is needed for Python 2.x unless you change
`print(node.value)` into `print node.value`"""
class Node(object):
    """This class represents list item (node)"""
    def __init__(self, value, next_node):
        """Store item value and pointer to the next node"""
        self.value = value
        self.next_node = next_node
class LinkedList(object):
    """This class represents linked list"""
    def __init__(self, *values):
        """Create nodes and store reference to the first node"""
        node = None
        # Create nodes in reversed order as each node needs to store reference to next node
        for value in reversed(values):
            node = Node(value, node)
        self.first_node = node
        # Initialize current_node for iterator
        self.current_node = self.first_node
    def __iter__(self):
        """Tell Python that this class is iterable"""
        return self
    def __next__(self):
        """Return next node from the linked list"""
        # If previous call marked iteration as done, let's really finish it
        if isinstance(self.current_node, StopIteration):
            stop_iteration = self.current_node
            # Reset current_node back to reference first_node
            self.current_node = self.first_node
            # Raise StopIteration to exit for loop
            raise stop_iteration
        # Take the current_node into local variable
        node = self.current_node
        # If next_node is None, then the current_node is the last one, let's mark this with StopIteration instance
        if node.next_node is None:
            self.current_node = StopIteration()
        else:
            # Put next_node reference into current_node
            self.current_node = self.current_node.next_node
        return node
linked_list = LinkedList(31, 37, 62, 23)
for node in linked_list:
    print(node.value)
This doesn't handle many cases properly (including break statement in the loop body) but the goal is to show minimum requirements for linked list implementation in Python.
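One way to sidestep the break problem is to write __iter__ as a generator, so that every for loop gets its own cursor. The following is a sketch of that alternative and not the implementation above; the default value for next_node is an assumption added for convenience.

```python
class Node(object):
    """One item of the list: a value plus a reference to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedList(object):
    def __init__(self, *values):
        node = None
        # Build back to front so each node can reference its successor
        for value in reversed(values):
            node = Node(value, node)
        self.first_node = node

    def __iter__(self):
        # Each call returns a fresh generator with its own position,
        # so an earlier loop that ended with break leaves no state behind.
        node = self.first_node
        while node is not None:
            yield node
            node = node.next_node


linked_list = LinkedList(31, 37, 62, 23)

for node in linked_list:
    if node.value == 62:
        break  # abandon this traversal early
    print(node.value)

# A second traversal still starts from the first node
print([node.value for node in linked_list])
```

Because the list object no longer stores a current position, nested loops over the same list also work independently of each other.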

Binary Tree: How Do Class Instances Link?

I am trying to understand binary trees, but doing so has brought me to confusion about how class instances interact, how does each instance link to another?
My Implementation:
class Node(object):
    def __init__(self, key):
        self.key= key
        self.L = None
        self.R = None
class BinaryTree(object):
    def __init__(self):
        self.root = None
    def get_root(self):
        return self.root
    def insert(self, key):
        if self.get_root()==None:
            self.root = Node(key)
        else:
            self._insert(key, self.root)
    def _insert(self, key, node):
        if key < node.key:
            if node.L == None:
                node.L = key
            else:
                self._insert(key, Node(node.L))
        if key > node.key:
            if node.R == None:
                node.R = key
            else:
                self._insert(key, Node(node.R))
myTree= BinaryTree()
A Scenario
So let's say I want to insert 10. I do myTree.insert(10) and this will instantiate a new instance of Node(); this is clear to me.
Now I want to add 11. I would expect this to become the right node of the root node, i.e. it will be stored in the attribute R of the root node Node().
Now here comes the part I don't understand. When I add 12, it should become the child of the root node's right child. In my code this creates a new instance of Node() where 11 should be the key and 12 should be R.
So my question is two-fold: what happens to the last instance of Node()? Is it deleted, and if not, how do I access it?
Or is the structure of a binary tree too abstract to think of each Node() as connected together like in a graph?
NB: this implementation is heavily derived from djra's implementation from this question How to Implement a Binary Tree?
Make L and R Nodes instead of ints. You can do this by changing the parts of your _insert function from this:
if node.L == None:
    node.L = key
to this:
if node.L == None:
    node.L = Node(key)
There is also a problem with this line:
self._insert(key, Node(node.L))
The way you're doing it right now, there is no way to access that last instance of Node(), because your _insert function inserted the key under an anonymously constructed node that has no parent node and is therefore not part of your tree. The node being passed in to your insert function is not the L or R of any other node in the tree, so you're not actually adding anything to the tree with this.
Now that we changed the Ls and Rs to be Nodes, you have a way to pass a node that's part of the tree into the insert function:
self._insert(key, node.L)
Now you're passing the node's left child into the recursive insert, which by the looks of things is what you were originally trying to do.
Once you make these changes in your code for both the L and R insert cases, you can get to the last instance of Node() in your example tree
10
  \
   11
     \
      12
via myTree.root.R.R. You can get its key via myTree.root.R.R.key, which equals 12.
Most of your questions come from not finishing the program. In your current code, after myTree.insert(11) your tree sets R to an int rather than to another Node.
If the child slot is empty, create the new node at that point. Otherwise pass the child node into the recursive function to keep moving further down the tree.
def _insert(self, key, node):
    if key < node.key:
        if node.L == None:
            node.L = Node(key)
        else:
            self._insert(key, node.L)
    if key > node.key:
        if node.R == None:
            node.R = Node(key)
        else:
            self._insert(key, node.R)
P.S. This isn't finished; you're going to need another level of logic testing in case something is bigger than the current Node.key but smaller than the next Node.
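To see how the instances end up linked, here is a small self-contained sketch based on the corrected _insert. The in_order helper and the choice to ignore duplicate keys are assumptions added for illustration.

```python
class Node(object):
    def __init__(self, key):
        self.key = key
        self.L = None  # left child: a Node or None
        self.R = None  # right child: a Node or None


class BinaryTree(object):
    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
        else:
            self._insert(key, self.root)

    def _insert(self, key, node):
        if key < node.key:
            if node.L is None:
                node.L = Node(key)         # attach a new node to the tree
            else:
                self._insert(key, node.L)  # descend into the existing child
        elif key > node.key:
            if node.R is None:
                node.R = Node(key)
            else:
                self._insert(key, node.R)
        # key == node.key: duplicates are ignored in this sketch


def in_order(node):
    """Hypothetical helper: keys of the subtree in ascending order."""
    if node is None:
        return []
    return in_order(node.L) + [node.key] + in_order(node.R)


my_tree = BinaryTree()
for key in [10, 11, 12, 5]:
    my_tree.insert(key)

print(my_tree.root.R.R.key)     # prints 12
print(in_order(my_tree.root))   # prints [5, 10, 11, 12]
```

Each Node stays alive because its parent holds a reference to it in L or R, and the tree holds the root. Nothing is deleted as long as that chain of references exists.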

Python tree operations

I need to implement (or just use) a tree data structure on which I can perform:
1. Child additions at any specified position. The new child can itself be a big tree (need not be a singleton)
2. Subtree deletions and moving (to another node in the same tree)
3. Common traversal operations.
4. Access parent from child node.
First, is there any module I can use for this?
Second, if I were to implement this by myself, I've this concern:
When I do tree manipulations like moving subtrees, removing subtrees or adding new subtrees, I only wish to move the "references" to these tree nodes. For example, in C/C++ these operations can be performed by pointer manipulations and I can be assured that only the references are being moved.
Similarly, when I do tree "movements" I need to move only the reference - aka, a new copy of the tree should not be created at the destination.
I'm still in a "pointers" frame of thinking, hence the question. Maybe I don't need to do all this?
You can easily make your own tree with operator overloading. For example, here is a basic class with __add__ implemented:
class Node(object):
    def __init__(self, value):
        self.value = value
        self.child = []
    def add_child(self, child):
        self.child.append(child)
    def __add__(self, element):
        if type(element) != Node:
            raise NotImplementedError("Addition is possible only between 2 nodes")
        self.value += element.value  # I wasn't sure whether the children should be added too
        return self  # return the Node object
To answer your second question: there is a Python trick here. Because __add__ returns self, this prints True:
a = Node(1)
b = Node(2)
print(a is a + b)
Evaluating a + b modifies a.value. The names a and b are, in effect, references: if you pass them as arguments to a function and modify them inside it, the original a and b instances are modified. There are two ways to avoid this (maybe more, but these are the two I use):
The first one is to directly modify the definition of __add__:
def __add__(self, element):
    # .../...
    value = self.value + element.value
    return Node(value)  # you may add lines here to copy the children
The second one is to add a copy method:
def copy(self):
    # .../...
    n = Node(self.value)
    n.child = self.child[:]  # Copy the list, so that the two nodes hold different list instances.
    return n
This will allow you to do something like c = a.copy() + b, and the expression c is a will be False.
I hope this answers your question.
This is an example for you:
class BinaryTree:
    def __init__(self,rootObj):
        self.key = rootObj
        self.leftChild = None
        self.rightChild = None
    def insertLeft(self,newNode):
        if self.leftChild == None:
            self.leftChild = BinaryTree(newNode)
        else:
            t = BinaryTree(newNode)
            t.leftChild = self.leftChild
            self.leftChild = t
    def insertRight(self,newNode):
        if self.rightChild == None:
            self.rightChild = BinaryTree(newNode)
        else:
            t = BinaryTree(newNode)
            t.rightChild = self.rightChild
            self.rightChild = t
    def getRightChild(self):
        return self.rightChild
    def getLeftChild(self):
        return self.leftChild
    def setRootVal(self,obj):
        self.key = obj
    def getRootVal(self):
        return self.key
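On the "pointers" concern in the question: Python names and attributes hold references to objects, so re-parenting a subtree only rebinds references and copies nothing. The sketch below illustrates this with a hypothetical TreeNode class (the class and its method names are assumptions, not an existing module) that covers positional child insertion, subtree moves, traversal, and parent access. It does not guard against attaching a node beneath its own descendant.

```python
class TreeNode(object):
    """Hypothetical n-ary tree node that remembers its parent."""

    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def detach(self):
        """Unlink this subtree from its parent; the subtree stays intact."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def add_child(self, child, index=None):
        """Attach child (possibly a whole subtree) at index, or at the end."""
        child.detach()  # moving a node means removing it from the old parent
        child.parent = self
        if index is None:
            self.children.append(child)
        else:
            self.children.insert(index, child)
        return child

    def walk(self):
        """Pre-order traversal over every node in this subtree."""
        yield self
        for child in self.children:
            for node in child.walk():
                yield node


root = TreeNode("root")
a = root.add_child(TreeNode("a"))
b = root.add_child(TreeNode("b"))
c = a.add_child(TreeNode("c"))

b.add_child(a)  # move subtree "a" (including "c") under "b"; no copy is made

print([n.value for n in root.walk()])      # prints ['root', 'b', 'a', 'c']
print(c.parent is a and a.parent is b)     # prints True
```

Deleting a subtree is then just a call to detach(); once no other reference to the detached node remains, Python reclaims it automatically.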

How to print leaves of a tree implemented as a list of subtrees in Python?

Basically I want to be able to have each node of type tree have a Data field and a list of branches. This list should contain a number of objects of type Tree.
I think I have the actual implementation of the list down, but I get strange behavior when I try using the getLeaves method. Basically it calls itself recursively and never returns, and the way that happens is that somehow the second node of the tree gets its first branch set as itself (I think).
class Tree:
    """Basic tree graph datatype"""
    branches = []
    def __init__(self, root):
        self.root = root
    def addBranch (self, addition):
        """Adds another object of type Tree as a branch"""
        self.branches += [addition]
    def getLeaves (self):
        """returns the leaves of a given branch. For leaves of the tree, specify root"""
        print (len(self.branches))
        if (len(self.branches) == 0):
            return self.root
        else:
            branchSum = []
            for b in self.branches:
                branchSum += b.getLeaves()
            return (branchSum)
Your 'branches' variable is a class attribute, shared by every instance, not an instance attribute. You need to initialize the 'branches' instance variable in the constructor:
class Tree:
    """Basic tree graph datatype"""
    def __init__(self, root):
        self.branches = []
        self.root = root
The rest of your code looks good.
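The difference can be reproduced in a few lines. This is a sketch with two hypothetical classes, SharedTree and OwnTree, that differ only in where branches is defined:

```python
class SharedTree:
    branches = []  # class attribute: one list shared by every instance

    def __init__(self, root):
        self.root = root

    def addBranch(self, addition):
        self.branches += [addition]  # extends the shared list in place


class OwnTree:
    def __init__(self, root):
        self.root = root
        self.branches = []  # instance attribute: a new list per object

    def addBranch(self, addition):
        self.branches += [addition]


parent, child = SharedTree("a"), SharedTree("b")
parent.addBranch(child)
print(child.branches[0] is child)  # prints True: the child lists itself as a branch

parent2, child2 = OwnTree("a"), OwnTree("b")
parent2.addBranch(child2)
print(child2.branches)             # prints []
```

With the shared list, every node appears to have every branch, including itself, which is why the recursive getLeaves in the question never terminates.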
Is self.root the parent of said tree? In that case, getLeaves() should return self if it has no branches (len(self.branches)==0) instead of self.root as you have it there. Also, if you do have child branches you should include self within branchSum.
Possible solution (your source code with small changes):
class Tree:
    def __init__(self, data):
        """Basic tree graph datatype"""
        self.data = data
        self.branches = []
    def addBranch (self, addition):
        """Adds another object of type Tree as a branch"""
        self.branches.append(addition)
    def getLeaves (self):
        """returns the leaves of a given branch. For
        leaves of the tree, specify data"""
        if len(self.branches) == 0:
            return self.data
        else:
            branchSum = []
            for b in self.branches:
                branchSum.append(b.getLeaves())
            return branchSum
## Use it
t0 = Tree("t0")
t1 = Tree("t1")
t2 = Tree("t2")
t3 = Tree("t3")
t4 = Tree("t4")
t0.addBranch(t1)
t0.addBranch(t4)
t1.addBranch(t2)
t1.addBranch(t3)
print(t0.getLeaves())
Output:
[['t2', 't3'], 't4']
Remarks:
It looks like some of the formatting in your code is broken.
I'm not really sure whether this is what you want. Do you want all the leaves in a single flat list? (If so, the source code has to be adapted.)
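If a single flat list of leaves is what is wanted, one possible adaptation is to have every call return a list and merge the results with extend. This is a sketch of that variant, reusing the class and variable names from the code above:

```python
class Tree:
    """Basic tree datatype: a data field plus a list of sub-trees."""

    def __init__(self, data):
        self.data = data
        self.branches = []

    def addBranch(self, addition):
        """Adds another object of type Tree as a branch"""
        self.branches.append(addition)

    def getLeaves(self):
        """Return the data of every leaf below this node as one flat list."""
        if not self.branches:
            return [self.data]            # a leaf returns a one-element list
        leaves = []
        for b in self.branches:
            leaves.extend(b.getLeaves())  # extend keeps the result flat
        return leaves


t0, t1, t2, t3, t4 = (Tree(name) for name in ["t0", "t1", "t2", "t3", "t4"])
t0.addBranch(t1)
t0.addBranch(t4)
t1.addBranch(t2)
t1.addBranch(t3)

print(t0.getLeaves())  # prints ['t2', 't3', 't4']
```

Returning a list in both branches of the method also means callers never have to check whether they received a single value or a list.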

How to use a generator to iterate over a tree's leafs

The problem:
I have a trie and I want to return the information stored in it. Some leaves have information (set as value > 0) and some leaves do not. I would like to return only those leaves that have a value.
As in all tries, the number of children on each node is variable, and the key to each value is made up of the path necessary to reach its leaf.
I am trying to use a generator to traverse the tree postorder, but I cannot get it to work. What am I doing wrong?
My module:
class Node():
    '''Each leaf in the trie is a Node() class'''
    def __init__(self):
        self.children = {}
        self.value = 0
class Trie():
    '''The Trie() holds all nodes and can return a list of their values'''
    def __init__(self):
        self.root = Node()
    def add(self, key, value):
        '''Store a "value" in a position "key"'''
        node = self.root
        for digit in key:
            number = digit
            if number not in node.children:
                node.children[number] = Node()
            node = node.children[number]
        node.value = value
    def __iter__(self):
        return self.postorder(self.root)
    def postorder(self, node):
        if node:
            for child in node.children.values():
                self.postorder(child)
            # Do my printing / job related stuff here
            if node.value > 0:
                yield node.value
Example use:
>>> trie = Trie()
>>> trie.add('foo', 3)
>>> trie.add('foobar', 5)
>>> trie.add('fobaz', 23)
>>> for key in trie:
...     print(key)
...
3
5
23
I know that the example given is simple and can be solved using any other data structure. However, it is important for this program to use a trie as it is very beneficial for the data access patterns.
Thanks for the help!
Note: I have omitted newlines in the code block to be able to copy-paste with greater ease.
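Changing
self.postorder(child)
to
for n in self.postorder(child):
    yield n
seems to make it work. Calling self.postorder(child) on its own only creates a generator object and discards it, so the values from the child are never yielded.
P.S. It was very helpful of you to leave out the blank lines for ease of cut & paste :)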
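On Python 3.3 or later, the same delegation can be written with yield from. The following sketch assumes Python 3 and rebuilds the trie from the question with that one change; using setdefault in add is a stylistic substitution, not part of the original code.

```python
class Node:
    """One position in the trie; value 0 means nothing is stored here."""

    def __init__(self):
        self.children = {}
        self.value = 0


class Trie:
    def __init__(self):
        self.root = Node()

    def add(self, key, value):
        """Store value at the position spelled out by key."""
        node = self.root
        for char in key:
            # Reuse the existing child or create it on first use
            node = node.children.setdefault(char, Node())
        node.value = value

    def __iter__(self):
        return self.postorder(self.root)

    def postorder(self, node):
        for child in node.children.values():
            # Delegate to the sub-generator so its values reach the caller
            yield from self.postorder(child)
        if node.value > 0:
            yield node.value


trie = Trie()
trie.add('foo', 3)
trie.add('foobar', 5)
trie.add('fobaz', 23)

print(sorted(trie))  # prints [3, 5, 23]
```

Because the traversal is postorder, the value stored under 'foobar' is produced before the one stored under 'foo'; the example sorts the values only to make the printed result independent of traversal order.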
