Implementing Graph for Bayes Net in FSharp - python

I'm trying to translate a graph formulation from Python to F#.
The Python Node class:
class Node:
    """A Node is the basic element of a graph. In its most basic form a graph is just a list of nodes. A Node is really just a list of neighbors."""

    def __init__(self, id, index=-1, name="anonymous"):
        # This defines a set of edges to other nodes in the graph.
        self.neighbors = set()
        self.visited = False
        self.id = id
        # The index of this node within the list of nodes in the overall graph.
        self.index = index
        # Optional name, most useful for debugging purposes.
        self.name = name

    def __lt__(self, other):
        # Defines a < operator for this class, which allows for easily sorting a list of nodes.
        return self.index < other.index

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, right):
        return self.id == right.id

    def add_neighbor(self, node):
        """Make node a neighbor if it is not already. This is a hack: we should be allowing self to be a neighbor of self in some graphs. This should be enforced at the level of a graph, because that is where the type of the graph would disallow it."""
        if node not in self.neighbors and self != node:
            self.neighbors.add(node)

    def remove_neighbor(self, node):
        # Remove the node from the set of neighbors, effectively deleting that edge from the graph.
        self.neighbors.remove(node)

    def is_neighbor(self, node):
        # Check if node is a member of neighbors.
        return node in self.neighbors
My F# class so far:
type Node<'T> = string * 'T
type Edge<'T, 'G> = Node<'T> * Node<'T> * 'G
type Graph<'T, 'G> =
    | Undirected of seq<Node<'T> * Edge<'T, 'G> list>
    | Directed of seq<Node<'T> * Edge<'T, 'G> list * Edge<'T, 'G> list>

Yes, this does have to do with immutability. F#'s Set is an immutable collection based on a binary tree, which supports Add, Remove and lookup in O(log n) time.
However, because the collection is immutable, the Add operation returns a new Set.
let originalSet = set [1; 2; 7]
let newSet = originalSet.Add(5) // originalSet is unchanged; newSet contains 1, 2, 5 and 7
The most purely functional solution is probably to restructure your problem to remove the mutability entirely. With this approach, you would rewrite your node class as an immutable data container (with no methods) and define the functions that act on that data container in a separate module.
module Nodes =
    // Assumes a Node type whose constructor takes a Set of neighbours and exposes it as a Neighbours property.
    /// Creates a new node from an old node with a supplied neighbour node added.
    let addNeighbour neighbourNode (node : Node) =
        Node <| Set.add neighbourNode node.Neighbours
    // Note: you'll need to replace the backwards pipe with brackets for pre-F# 4.0
See the immutable collections in the FSharp.Core library, such as List and Map, for more examples.
If you prefer the mutable approach, you could make your neighbours field mutable so that it can be updated when the neighbours change, or use a mutable collection such as System.Collections.Generic.HashSet<'T>.
When it comes to the hash code, Set<'T> doesn't actually use it. Instead, it requires that its elements support comparison (for example, by implementing the IComparable interface), which provides the ordering the binary tree needs. Your Python class already defines an ordering through __lt__ (by index), which would be an appropriate basis for this behaviour.
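The immutable-container idea can also be sketched in Python, the language the question starts from. This is a minimal sketch that stores neighbours by `id` rather than as node objects. That choice is a simplification not found in the original code, and it means a node never has to hash other nodes. Here `add_neighbour` returns a new node instead of mutating the old one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """Immutable data container: no methods that change its state."""
    id: int
    index: int = -1
    name: str = "anonymous"
    # Assumption: neighbours are stored by id, not as Node objects.
    neighbours: frozenset = frozenset()


def add_neighbour(node: Node, neighbour_id: int) -> Node:
    """Return a new Node with neighbour_id added; the original node is unchanged."""
    if neighbour_id == node.id or neighbour_id in node.neighbours:
        return node  # like the original: no self-loops, no duplicates
    return replace(node, neighbours=node.neighbours | {neighbour_id})


def remove_neighbour(node: Node, neighbour_id: int) -> Node:
    """Return a new Node without neighbour_id."""
    return replace(node, neighbours=node.neighbours - {neighbour_id})


a = Node(1, name="a")
b = add_neighbour(a, 2)
print(a.neighbours, b.neighbours)  # frozenset() frozenset({2})
```

Note that the dataclass compares and hashes on all fields, whereas the original class compares on `id` alone.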

Related

Python Temperamentally Accessing Global Object from within a Function (which references that object from within another)

(Edit: Fundamentally, my problem is that Python sometimes creates new instances of an object x (which I access through another object y, itself accessed through z) instead of editing the original x directly. x and y both belong to lists of global variables, but when I access y via z to access x via y on a subsequent iteration of my recursive algorithm, its information isn't always correctly updated.)
Introduction
I'm writing a recursive function to emulate this version of Dijkstra's algorithm (Problem 2) with input from a CSV file. I have two globals, Branches [] and Nodes [], to store all branch and node Python objects. (I'll show how everything is initialized below.)
When I attempt to change the set of the destination node of my branch from within my function, Python generates a new object. It's not that it cannot access the global Branches, because it does work correctly when the start node of the branch I am accessing is the same as the origin.
Update: I tried searching through the global list for a node with the label matching the one I wanted to change. This correctly changed the set of the global node, but none of the branches referencing that node recognized the change.
def dijkstra_algorithm(node_being_investigated, origin_node, destination_node):
    ...
    for branch in shortest_route.requisite_branches:
        # finding the new branch added to the route and adding it to set I and its destination to set A
        if branch not in branches_in_set_I:
            print(f"adding the new branch {branch.info()} to set I")
            print(f"Adding the new node {branch.destination.info()} to set A")
            print(f"The New Node: {branch.destination} ")
            branch.set = "I"
            branch.destination.set = "A"
    # a redundancy I added just in case the for loop was somehow messing with things
    shortest_route.requisite_branches[0].destination.set = "A"
    if destination in obtain_nodes_of_set("A"):
        return shortest_route
    else:
        return dijkstra_algorithm(shortest_route.requisite_branches[0].destination, origin, destination)
NB: There is one other class, Route, which stores a list of branches and their cumulative time. Since this is the only place where some operation is performed on a node or branch, I suspect the problem is linked to it somehow. However, since I'm not sure how, here's a small table of contents of all the code snippets I've attached. I can send more (or even the whole file) if need be.
The Code Below
the way I define the Node and Branch classes
the initialization of everything from the CSVs
the way I define the Route class
the search_to_origin function which finds the overall length of various Route options
the piece of the dijkstra_algorithm function which finds the shortest_route inputted to this piece of the algorithm
Defining the Node and Branch classes
class Node:
    def __init__(self, label, set):
        self.label = label
        self.set = set or "C"
        print(f"\n\n!!!creating new node object with label {self.label}!!!\n\n")

    def info(self):
        return f"Set {self.set}: [[{self.label}]]"
and:
class Branch:
    def __init__(self, origin, destination, duration, set):
        self.origin = origin
        self.destination = destination
        self.duration = duration
        self.set = set or "III"

    def info(self):
        return f"Set {self.set}:{self.origin.info()} -> {self.destination.info()} ({self.duration})"
Initializing the global Lists from the CSVs
From what I've checked by printing out the object IDs, this seems to be working correctly. But since I'm not sure where the problem lies, I'm including it just in case.
import csv

with open('nodes.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        Nodes.append(Node(row["LABEL"], "C"))

# here all the roads are read into the global list of branches
with open('roads.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        branch_info = {}
        # getting the actual node objects to attach
        for node in Nodes:
            if node.label == row["ORIGIN"]:
                branch_info['origin'] = node
            if node.label == row["DESTINATION"]:
                branch_info['destination'] = node
        Branches.append(Branch(branch_info['origin'], branch_info['destination'], int(row["DURATION"]), "III"))
The Route Class and the Main Function Operating on It
There is a sub function called by the overall algorithm which traces the route from a list of branches back to a specified origin node object. Afterwards it returns a list of possible_routes to the main dijkstra function, for that function to decide which is quickest.
Route Class Initialization
class Route:
    def __init__(self, branch_list, duration):
        self.requisite_branches = branch_list
        self.duration = duration

    def extend_route(self, branch):
        self.requisite_branches.append(branch)
        self.duration += branch.duration
The function determining routes to the origin
Note that this calls a utility function, find_attached_branches, which searches a specific set for branches that have a given node as their origin or destination.
import copy

def search_to_origin(origin, branches, route):
    print("\n Searching to origin...")
    possible_routes = []
    # for each of the branches being investigated
    for branch in branches:
        # if this branch has not already been investigated in this route
        if branch not in route.requisite_branches:
            # this is a new individual route being investigated
            new_route = copy.deepcopy(route)
            new_route.extend_route(branch)
            # if the start node has been found this is a possible route
            if branch.origin == origin:
                print("This branch leads to the origin")
                possible_routes.append(new_route)
            # if the start node has not been found search for branches preceding this one
            else:
                branches_to_this_node = find_attached_branches(branch.origin, 'destination', "II")
                branches_to_this_node.extend(find_attached_branches(branch.origin, 'destination', "I"))
                if len(branches_to_this_node) != 0:
                    print("this branch does not lead to the origin")
                    route_to_start = search_to_origin(origin, branches_to_this_node, new_route)
                    possible_routes.extend(route_to_start)
    # return the lengths and requisite branches found
    return possible_routes
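Because `search_to_origin` copies the route with `copy.deepcopy`, it may help to see what that call does to the objects a route refers to. The sketch below uses stripped-down stand-ins for the `Node`, `Branch` and `Route` classes (simplified versions, not the originals). It shows that `deepcopy` duplicates every object reachable from the route, including the `Node` objects. It does this without calling `__init__`, so no "creating new node object" message would be printed. Building the new route from a copy of the branch list alone keeps references to the shared objects.

```python
import copy


class Node:
    def __init__(self, label):
        self.label = label
        self.set = "C"


class Branch:
    def __init__(self, origin, destination, duration):
        self.origin = origin
        self.destination = destination
        self.duration = duration


class Route:
    def __init__(self, branch_list, duration):
        self.requisite_branches = branch_list
        self.duration = duration

    def extend_route(self, branch):
        self.requisite_branches.append(branch)
        self.duration += branch.duration


a, b = Node("A"), Node("B")
road = Branch(a, b, 5)
route = Route([road], 5)

# deepcopy clones the branch AND the nodes it points to.
deep = copy.deepcopy(route)
print(deep.requisite_branches[0].destination is b)  # False

# Copying only the list gives an independent route that shares the same nodes.
shallow = Route(list(route.requisite_branches), route.duration)
print(shallow.requisite_branches[0].destination is b)  # True

# Extending the copy leaves the original route's list alone.
shallow.extend_route(Branch(b, a, 3))
print(len(route.requisite_branches), len(shallow.requisite_branches))  # 1 2
```

With the deep copy, setting `.set` on a node reached through the copied route changes only the clone, never the object in the global `Nodes` list.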
Finding the Quickest Route to the Origin From a Given Node in B
This portion of the algorithm is done just before the one you see in the introduction section. The way it functions should be independent of whether the shortest route only consists of one branch, but this doesn't seem to be the case. It goes through the following steps:
Looks for all the nodes now in set B.
For each of these nodes it finds all the branches attached to each node in set B.
For all of those branches it looks for routes back to the origin.
From all of the resulting routes it determines the shortest_route
global Nodes
# get all the nodes in set B
nodes_in_set_B = obtain_nodes_of_set("B")
# get all branches in sets I or II
global Branches
branches_in_I_or_II = []
for branch in Branches:
    if branch.set != "III":
        branches_in_I_or_II.append(branch)
# the shortest route found from one of the nodes of set B. Initialized as a large empty route
shortest_route = Route([], 99999)
if nodes_in_set_B is not None:
    for node in nodes_in_set_B:
        # the branches under consideration are only those to this node in set B
        branches_under_consideration = []
        for branch in branches_in_I_or_II:
            if branch.destination == node:
                branches_under_consideration.append(branch)
        possible_routes = search_to_origin(origin, branches_under_consideration, Route([], 0))
        # finding the possible route of minimum length
        for route in possible_routes:
            if route.duration < shortest_route.duration:
                shortest_route = route

Python creating static objects which are shared amongst all class objects

I have a Node class something like this. It's a typical node object for a graph.
class Node(object):
    def __init__(self, data, edges = []):
        super(Node, self).__init__()
        self.data = data
        self.edges = edges
        self.visited = False

    def addEdge(self, *args):
        print(self)
        self.edges.extend(args)
        print(self.edges)
I create two objects like this -
one = Node(1)
two = Node(2)
Next I add a pointer of two to one using the addEdge method defined above -
one.addEdge(two)
Now comes the surprising bit. When I check the values of one.edges and two.edges, I get this:
one.edges
[<__main__.Node object at 0x109ed3e50>]
two.edges
[<__main__.Node object at 0x109ed3e50>]
As you can see, both objects have gotten the value. I'm quite puzzled by this and have no idea why it is happening. Is this how Python behaves? If so, can you explain this behaviour?
You need to be careful when using a list literal as a default value. The default is evaluated only once, when the def statement runs, so you don't get a new list with each instance: every call that omits edges gets a reference to the same list. In your example you will see:
>>> one.edges is two.edges
True
You need to make sure you get a new list each time. One thing you can do is:
self.edges = list(edges)
Another option is:
def __init__(self, data, edges=None):
    if edges is None:
        self.edges = []
    else:
        self.edges = edges
But in the else branch, self.edges is still the caller's own mutable list, so this may also lead to subtle bugs if the caller is not expecting that list to be changed.
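Here is a small runnable demonstration of both the shared default and the `None` fix, using trimmed-down node classes. `SharedNode` is a hypothetical name for the buggy version. The fixed version combines both suggestions (`None` default plus `list(edges)`), so a caller's list is copied rather than stored.

```python
class SharedNode:
    # Buggy version: the default list is built once, when `def` runs.
    def __init__(self, data, edges=[]):
        self.data = data
        self.edges = edges


class Node:
    # Fixed version: a fresh list per instance; a caller's list is copied.
    def __init__(self, data, edges=None):
        self.data = data
        self.edges = [] if edges is None else list(edges)

    def add_edge(self, *nodes):
        self.edges.extend(nodes)


s1, s2 = SharedNode(1), SharedNode(2)
s1.edges.append(s2)
print(s1.edges is s2.edges)  # True: both refer to the single default list
print(SharedNode.__init__.__defaults__[0] is s1.edges)  # True: it lives on the function

n1, n2 = Node(1), Node(2)
n1.add_edge(n2)
print(n1.edges is n2.edges, len(n2.edges))  # False 0
```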

Implementing a (modified) DFS in a Graph

I have implemented a simple graph data structure in Python with the following structure below. The code is here just to clarify what the functions/variables mean, but they are pretty self-explanatory so you can skip reading it.
class Node:
    def __init__(self, label):
        self.out_edges = []
        self.label = label
        self.is_goal = False
        self.is_visited = False

    def add_edge(self, node, weight):
        self.out_edges.append(Edge(node, weight))

    def visit(self):
        self.is_visited = True

class Edge:
    def __init__(self, node, weight):
        self.node = node
        self.weight = weight

    def to(self):
        return self.node

class Graph:
    def __init__(self):
        self.nodes = []

    def add_node(self, label):
        self.nodes.append(Node(label))

    def visit_nodes(self):
        for node in self.nodes:
            node.is_visited = True
Now I am trying to implement a depth-first search which starts from a given node v, and returns a path (in list form) to a goal node. By goal node, I mean a node with the attribute is_goal set to true. If a path exists, and a goal node is found, the string ':-)' is added to the list. Otherwise, the function just performs a DFS and goes as far as it can go. (I do this here just to easily check whether a path exists or not).
This is my implementation:
def dfs(G, v):
    path = []  # path is empty so far
    v.visit()  # mark the node as visited
    path.append(v.label)  # add to path
    if v.is_goal:  # if v is a goal node
        path.append(':-)')  # indicate a path is reached
        G.visit_nodes()  # set all remaining nodes to visited
    else:
        for edge in v.out_edges:  # for each out_edge of the starting node
            if not edge.to().is_visited:  # if the node pointed to is not visited
                path += dfs(G, edge.to())  # return the path + dfs starting from that node
    return path
Now the problem is that I have to set all the nodes to visited (the call to G.visit_nodes()) for the algorithm to end once a goal node is reached. In effect, this sort of breaks out of the pending recursive calls, since it ensures no other nodes are added to the path. My question is:
Is there a cleaner/better way to do this?
The solution seems a bit kludgy. I'd appreciate any help.
It would be better not to clutter the graph structure with visited information, as that really is context-sensitive information linked to a search algorithm, not with the graph itself. You can use a separate set instead.
Secondly, you have a bug in the code: you keep adding to the path variable even if your recursive call did not find the target node. So your path will even contain consecutive nodes that have no edge between them, but are (close or remote) siblings/cousins.
Instead, you should only return a path when you have found the target node. After making the recursive call, test that condition to decide whether to prefix the returned path with the node you are currently trying.
There is in fact no need to keep a path variable, as per recursion level you are only looking for one node to be added to a path you get from the recursive call. It is not necessary to store that one node in a list. Just a simple variable will do.
Here is the suggested code (not tested):
def dfs(G, v):
    visited = set()  # keep visited information away from graph

    def _dfs(v):
        visited.add(v)  # mark the node as visited
        if v.is_goal:
            return [v.label, ':-)']  # return end point of path
        for edge in v.out_edges:
            neighbor = edge.to()  # to avoid calling to() several times
            if neighbor not in visited:
                result = _dfs(neighbor)
                if result:  # only when successful
                    # we only need 1 success: prefix the current node and exit
                    return [v.label] + result
                # otherwise, nothing should change to any path: continue
        # don't return anything in case of failure

    # call nested function: the visited and Graph variables are shared
    return _dfs(v)
Remark
For the same reason as for visited, it may be better to remove the is_goal marking from the graph as well, and pass the target node as an additional argument to the dfs function.
It would also be nice to give a default value for the weight argument, so that you can use this code for unweighted graphs as well.
See how it runs on a sample graph with 5 nodes on repl.it.
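Following the remark, here is a sketch that keeps both the search state and the goal out of the graph. The target node is passed to `dfs`, and visited nodes live in a local set. `add_edge` gets a default weight of 1, an arbitrary choice for unweighted graphs. The returned path starts with the start node. The `Node` and `Edge` classes are trimmed versions of the question's, and returning `None` instead of a `':-)'` marker is a design choice of this sketch.

```python
class Edge:
    def __init__(self, node, weight):
        self.node = node
        self.weight = weight


class Node:
    def __init__(self, label):
        self.label = label
        self.out_edges = []

    def add_edge(self, node, weight=1):  # default weight for unweighted graphs
        self.out_edges.append(Edge(node, weight))


def dfs(start, target):
    """Return the labels on a path from start to target, or None if there is none."""
    visited = set()  # search state lives here, not on the nodes

    def _dfs(v):
        visited.add(v)
        if v is target:
            return [v.label]
        for edge in v.out_edges:
            if edge.node not in visited:
                rest = _dfs(edge.node)
                if rest:  # first success wins: prefix this node and stop
                    return [v.label] + rest
        return None  # dead end: contributes nothing to any path

    return _dfs(start)


a, b, c, d, e = (Node(label) for label in "abcde")
a.add_edge(b)
a.add_edge(c)
b.add_edge(d)
c.add_edge(e)
print(dfs(a, e))  # ['a', 'c', 'e']
print(dfs(b, e))  # None
```

Because `visited` is created per call, the same graph can be searched repeatedly without resetting any flags.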

Disjoint-Set forests in Python alternate implementation

I'm implementing a disjoint set system in Python, but I've hit a wall. I'm using a tree implementation for the system and am implementing Find(), Merge() and Create() functions for the system.
I am implementing a rank system and path compression for efficiency.
The catch is that these functions must take the set of disjoint sets as a parameter, making traversing hard.
class Node(object):
    def __init__(self, value):
        self.parent = self
        self.value = value
        self.rank = 0

def Create(values):
    l = [Node(value) for value in values]
    return l
The Create function takes in a list of values and returns a list of singular Nodes containing the appropriate data.
I'm thinking the Merge function would look similar to this,
def Merge(set, value1, value2):
    value1Root = Find(set, value1)
    value2Root = Find(set, value2)
    if value1Root == value2Root:
        return
    if value1Root.rank < value2Root.rank:
        value1Root.parent = value2Root
    elif value1Root.rank > value2Root.rank:
        value2Root.parent = value1Root
    else:
        value2Root.parent = value1Root
        value1Root.rank += 1
but I'm not sure how to implement the Find() function, since it is required to take the list of Nodes and a value (not just a node) as parameters. Find(set, value) would be the prototype.
I understand how to implement path compression when a Node is taken as a parameter for Find(x), but this method is throwing me off.
Any help would be greatly appreciated. Thank you.
Edited for clarification.
The implementation of this data structure becomes simpler when you realize that the operations union and find can also be implemented as methods of a disjoint set forest class, rather than on the individual disjoint sets.
If you can read C++, then have a look at my take on the data structure; it hides the actual sets from the outside world, representing them only as numeric indices in the API. In Python, it would be something like
class DisjSets(object):
    def __init__(self, n):
        self._parent = list(range(n))  # a real list, so entries can be reassigned on Python 3
        self._rank = [0] * n

    def find(self, i):
        if self._parent[i] == i:
            return i
        else:
            self._parent[i] = self.find(self._parent[i])
            return self._parent[i]

    def union(self, i, j):
        root_i = self.find(i)
        root_j = self.find(j)
        if root_i != root_j:
            if self._rank[root_i] < self._rank[root_j]:
                self._parent[root_i] = root_j
            elif self._rank[root_i] > self._rank[root_j]:
                self._parent[root_j] = root_i
            else:
                self._parent[root_i] = root_j
                self._rank[root_j] += 1
(Not tested.)
If you choose not to follow this path, the client of your code will indeed have to have knowledge of Nodes and Find must take a Node argument.
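The index-based class can still serve the `Find(set, value)` style of interface from the question if arbitrary values are first mapped to indices. This is a sketch assuming the values are hashable and distinct. The `ValueDisjSets` wrapper is a hypothetical name. The forest is restated so the block is self-contained, with the same union by rank and path compression written more compactly.

```python
class DisjSets:
    """Disjoint-set forest over indices 0..n-1 (union by rank, path compression)."""

    def __init__(self, n):
        self._parent = list(range(n))
        self._rank = [0] * n

    def find(self, i):
        if self._parent[i] != i:
            self._parent[i] = self.find(self._parent[i])  # path compression
        return self._parent[i]

    def union(self, i, j):
        root_i, root_j = self.find(i), self.find(j)
        if root_i == root_j:
            return
        if self._rank[root_i] < self._rank[root_j]:
            root_i, root_j = root_j, root_i  # make root_i the taller tree
        self._parent[root_j] = root_i
        if self._rank[root_i] == self._rank[root_j]:
            self._rank[root_i] += 1


class ValueDisjSets:
    """Hypothetical wrapper that accepts hashable values instead of indices."""

    def __init__(self, values):
        self._values = list(values)
        self._index = {v: i for i, v in enumerate(self._values)}
        self._sets = DisjSets(len(self._values))

    def find(self, value):
        # Representative *value* of the set that contains `value`.
        return self._values[self._sets.find(self._index[value])]

    def union(self, a, b):
        self._sets.union(self._index[a], self._index[b])


s = ValueDisjSets(["a", "b", "c", "d"])
s.union("a", "b")
s.union("c", "d")
print(s.find("a") == s.find("b"))  # True
print(s.find("a") == s.find("c"))  # False
```

The dictionary makes each value lookup O(1), so the client never handles indices or tree nodes directly.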
Clearly, the merge function should be applied to a pair of nodes.
So the find function should take a single node parameter and look like this:
def find(node):
    if node.parent != node:
        node.parent = find(node.parent)
    return node.parent
Wikipedia also has pseudocode that is easily translatable to Python.
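With the node-based `find` above, the `Find(set, value)` prototype from the question can be satisfied. First locate the node holding `value` in the list returned by `Create`, then run `find` on that node. This is a sketch assuming each value occurs once in the list. The linear scan makes each lookup O(n), which is the cost of that interface. A dictionary from value to node would avoid it.

```python
class Node:
    def __init__(self, value):
        self.parent = self
        self.value = value
        self.rank = 0


def Create(values):
    return [Node(value) for value in values]


def find(node):
    # Follow parent links to the root, compressing the path on the way back.
    if node.parent is not node:
        node.parent = find(node.parent)
    return node.parent


def Find(nodes, value):
    # Assumption: each value occurs at most once in `nodes`.
    for node in nodes:
        if node.value == value:
            return find(node)
    raise ValueError(f"{value!r} is not in the forest")


def Merge(nodes, value1, value2):
    root1, root2 = Find(nodes, value1), Find(nodes, value2)
    if root1 is root2:
        return
    if root1.rank < root2.rank:
        root1, root2 = root2, root1  # attach the shorter tree under the taller one
    root2.parent = root1
    if root1.rank == root2.rank:
        root1.rank += 1


forest = Create([1, 2, 3, 4])
Merge(forest, 1, 2)
Merge(forest, 3, 4)
print(Find(forest, 2) is Find(forest, 1))  # True
print(Find(forest, 1) is Find(forest, 3))  # False
```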
Find is always done on an item: Find(item) is defined as returning the set to which the item belongs. Merge as such must not take nodes; merge always takes two items/sets. Merge or union(item1, item2) must first call find(item1) and find(item2), which return the sets to which each of these items belongs. After that, the shorter up-tree must be attached under the taller one. When a find is issued, always retrace the path and compress it.
A tested implementation with path compression is here.

Python: Inheriting from Built-In Types

I have a question concerning subtypes of built-in types and their constructors. I want a class to inherit both from tuple and from a custom class.
Let me give you the concrete example. I work a lot with graphs, meaning nodes connected with edges. I am starting to do some work on my own graph framework.
There is a class Edge, which has its own attributes and methods. It should also inherit from a class GraphElement. (A GraphElement is every object that has no meaning outside the context of a specific graph.) But at the most basic level, an edge is just a tuple containing two nodes. It would be nice syntactic sugar if you could do the following:
edge = graph.create_edge("Spam","Eggs")
(u, v) = edge
So (u,v) would contain "Spam" and "Eggs". It would also support iteration like
for node in edge: ...
I hope you see why I would want to subtype tuple (or other basic types like set).
So here is my Edge class and its __init__:
class Edge(GraphElement, tuple):
    def __init__(self, graph, (source, target)):
        GraphElement.__init__(self, graph)
        tuple.__init__((source, target))
When i call
Edge(aGraph, (source, target))
I get a TypeError: tuple() takes at most 1 argument (2 given). What am I doing wrong?
Since tuples are immutable, you need to override the __new__ method as well. See http://www.python.org/download/releases/2.2.3/descrintro/#__new__
class GraphElement:
    def __init__(self, graph):
        pass

class Edge(GraphElement, tuple):
    def __new__(cls, graph, (source, target)):
        return tuple.__new__(cls, (source, target))

    def __init__(self, graph, (source, target)):
        GraphElement.__init__(self, graph)
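Here is the same `__new__` fix written for Python 3, where tuple parameters in `def` signatures no longer exist (PEP 3113). It includes a quick check that the result unpacks and iterates like a tuple. Storing `graph` on the instance is an assumption added for the demonstration; the answer's `GraphElement.__init__` discards it.

```python
class GraphElement:
    def __init__(self, graph):
        self.graph = graph  # assumption: keep a reference to the owning graph


class Edge(GraphElement, tuple):
    def __new__(cls, graph, endpoints):
        # A tuple's contents are fixed at creation time, so they are set here.
        source, target = endpoints
        return tuple.__new__(cls, (source, target))

    def __init__(self, graph, endpoints):
        # Only the GraphElement part still needs initialising.
        GraphElement.__init__(self, graph)


edge = Edge("my graph", ("Spam", "Eggs"))
u, v = edge
print(u, v, edge.graph)  # Spam Eggs my graph
print(isinstance(edge, tuple), list(edge))  # True ['Spam', 'Eggs']
```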
For what you need, I would avoid multiple inheritance and implement an iterator using a generator:
class GraphElement:
    def __init__(self, graph):
        pass

class Edge(GraphElement):
    def __init__(self, graph, (source, target)):
        GraphElement.__init__(self, graph)
        self.source = source
        self.target = target

    def __iter__(self):
        yield self.source
        yield self.target
In this case both usages work just fine:
e = Edge(None, ("Spam", "Eggs"))
(s, t) = e
print s, t
for p in e:
    print p
You need to override __new__ -- currently tuple.__new__ is getting called (as you don't override it) with all the arguments you're passing to Edge.
