I'm trying to create a depth-first search algorithm that assigns finishing times (the time when a vertex can no longer be expanded), which are used in algorithms such as Kosaraju's. I was able to write a recursive version of DFS fairly easily, but I'm having a hard time converting it to an iterative version.
I'm using an adjacency list to represent the graph: a dict that maps each vertex to the list of its successors. For example, the input graph {1: [0, 4], 2: [1, 5], 3: [1], 4: [1, 3], 5: [2, 4], 6: [3, 4, 7, 8], 7: [5, 6], 8: [9], 9: [6, 11], 10: [9], 11: [10]} represents edges (1,0), (1,4), (2,1), (2,5), etc. The following is an implementation of iterative DFS that uses a simple stack (LIFO), but it doesn't compute finishing times. One of the key problems I faced was that, since the vertices are popped, there is no way for the algorithm to trace back its path once a vertex has been fully expanded (unlike in recursion). How do I fix this?
def dfs(graph, vertex, finish, explored):
    global count
    stack = []
    stack.append(vertex)
    while len(stack) != 0:
        vertex = stack.pop()
        if explored[vertex] == False:
            explored[vertex] = True
            #add all outgoing edges to stack:
            if vertex in graph:  #check if key exists in hash -- since not all vertices have outgoing edges
                for v in graph[vertex]:
                    stack.append(v)
            #this doesn't assign finishing times, it assigns the times when vertices are discovered:
            #finish[count] = vertex
            #count += 1
N.B. There is also an outer loop that complements DFS, though I don't think the problem lies there:
#outer loop:
for vertex in range(size, 0, -1):
    if explored[vertex] == False:
        dfs(hGraph, vertex, finish, explored)
Think of your stack as a stack of tasks, not vertices. There are two types of task you need to do: you need to expand vertices, and you need to record finishing times.
When you expand a vertex, you first push the task of recording its finishing time, then push a task to expand each of its children. Because the stack is LIFO, the finishing-time task is popped only after all of those children have been handled.
When you pop a finishing-time task, you can record the time knowing that the expansion of that vertex has finished.
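A minimal Python sketch of that task-stack idea, assuming the dict-of-lists graph from the question (vertices without outgoing edges may be missing as keys). The function name finishing_order is hypothetical; it returns the vertices in the order they finish, so a vertex's finishing time is simply its index in the result. Unlike the question's outer loop, it starts from the vertices in dict order rather than from size down to 1.

```python
from typing import Dict, Hashable, List


def finishing_order(graph: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Return all vertices of `graph` in the order in which DFS finishes them.

    The stack holds (task, vertex) pairs, where task is "expand" or "finish".
    """
    # Every vertex, including those that only appear as edge targets,
    # in first-seen order so the result is deterministic.
    vertices = list(dict.fromkeys(
        list(graph) + [w for targets in graph.values() for w in targets]))
    explored = set()
    order: List[Hashable] = []

    for root in vertices:
        if root in explored:
            continue
        stack = [("expand", root)]
        while stack:
            task, v = stack.pop()
            if task == "finish":
                # Every task pushed after this one has already been popped,
                # so all of v's descendants are done: v finishes now.
                order.append(v)
                continue
            if v in explored:
                continue  # reached earlier through another path
            explored.add(v)
            # Push the finish task first, so it is popped after the children.
            stack.append(("finish", v))
            # Reversed so that children are expanded in list order.
            for w in reversed(graph.get(v, [])):
                if w not in explored:
                    stack.append(("expand", w))
    return order
```

For the question's example graph this yields [0, 3, 4, 1, 5, 2, 7, 10, 11, 9, 8, 6], the same order a recursive DFS that scans each neighbor list left to right would produce. The "explored" check happens when an expand task is popped, not when it is pushed, which is what keeps the traversal a true depth-first order.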
Here is a working solution that uses two stacks in the iterative subroutine. The list traceBack holds the vertices that have been explored but not yet finished (the current DFS path), and it is paired with a list of lists, stack, that holds, for each of those vertices, the edges that have yet to be explored. The two lists are kept in sync: whenever we push to (or pop from) traceBack, we also push to (or pop from) stack.
count = 0

def outerLoop(hGraph, N):
    explored = [False for iii in range(N+1)]
    finish = {}
    for vertex in range(N, -1, -1):
        if explored[vertex] == False:
            dfs(vertex, hGraph, explored, finish)
    return finish

def dfs(vertex, graph, explored, finish):
    global count
    stack = [[]]  #stack contains the possible edges to explore
    traceBack = []
    traceBack.append(vertex)
    while len(stack) > 0:
        explored[vertex] = True
        try:
            for n in graph[vertex]:
                if explored[n] == False:
                    if n not in stack[-1]:  #to prevent double adding (when we backtrack to previous vertex)
                        stack[-1].append(n)
                else:
                    if n in stack[-1]:  #make sure num exists in array before removing
                        stack[-1].remove(n)
        except IndexError: pass
        if len(stack[-1]) == 0:  #if current stack is empty then there are no outgoing edges:
            finish[count] = traceBack.pop()  #thus, we can add finishing times
            count += 1
            if len(traceBack) > 0:  #to prevent popping empty array
                vertex = traceBack[-1]
            stack.pop()
        else:
            vertex = stack[-1][-1]  #pick last element in top of stack to be new vertex
            stack.append([])
            traceBack.append(vertex)
Here is a way. Every time we meet one of the following conditions, we invoke the callback (or record the time):
The node has no outgoing edge (no way).
The parent of the traversed node differs from the last parent (parent change). In that case we finish the last parent.
We reach the end of the stack (tree end). We finish the last parent.
Here is the code:
var dfs_with_finishing_time = function(graph, start, cb) {
    var explored = [];
    var parent = [];
    var i = 0;
    for(i = 0; i < graph.length; i++) {
        if(i in explored)
            continue;
        explored[i] = 1;
        var stack = [i];
        parent[i] = -1;
        var last_parent = -1;
        while(stack.length) {
            var u = stack.pop();
            var k = 0;
            var no_way = true;
            for(k = 0; k < graph.length; k++) {
                if(k in explored)
                    continue;
                if(!graph[u][k])
                    continue;
                stack.push(k);
                explored[k] = 1;
                parent[k] = u;
                no_way = false;
            }
            if(no_way) {
                cb(null, u+1); // no way, reversed post-ordering (finishing time)
            }
            if(last_parent != parent[u] && last_parent != -1) {
                cb(null, last_parent+1); // parent change, reversed post-ordering (finishing time)
            }
            last_parent = parent[u];
        }
        if(last_parent != -1) {
            cb(null, last_parent+1); // tree end, reversed post-ordering (finishing time)
        }
    }
}
I would like to modify the networkx implementation of Johnson's algorithm for finding all elementary cycles in a graph (also copied below) so that it does not search for cycles longer than some maximum length.
from collections import defaultdict

import networkx as nx

def simple_cycles(G):
    def _unblock(thisnode,blocked,B):
        stack=set([thisnode])
        while stack:
            node=stack.pop()
            if node in blocked:
                blocked.remove(node)
                stack.update(B[node])
                B[node].clear()

    # Johnson's algorithm requires some ordering of the nodes.
    # We assign the arbitrary ordering given by the strongly connected comps
    # There is no need to track the ordering as each node removed as processed.
    subG = type(G)(G.edges())  # save the actual graph so we can mutate it here
                               # We only take the edges because we do not want to
                               # copy edge and node attributes here.
                               # (G.edges_iter() in networkx 1.x; removed in 2.0)
    sccs = list(nx.strongly_connected_components(subG))
    while sccs:
        scc=sccs.pop()
        # order of scc determines ordering of nodes
        startnode = scc.pop()
        # Processing node runs "circuit" routine from recursive version
        path=[startnode]
        blocked = set()  # vertex: blocked from search?
        closed = set()  # nodes involved in a cycle
        blocked.add(startnode)
        B=defaultdict(set)  # graph portions that yield no elementary circuit
        stack=[ (startnode,list(subG[startnode])) ]  # subG gives component nbrs
        while stack:
            thisnode,nbrs = stack[-1]
            if nbrs:
                nextnode = nbrs.pop()
                # print thisnode,nbrs,":",nextnode,blocked,B,path,stack,startnode
                # f=raw_input("pause")
                if nextnode == startnode:
                    yield path[:]
                    closed.update(path)
                    # print "Found a cycle",path,closed
                elif nextnode not in blocked:
                    path.append(nextnode)
                    stack.append( (nextnode,list(subG[nextnode])) )
                    closed.discard(nextnode)
                    blocked.add(nextnode)
                    continue
            # done with nextnode... look for more neighbors
            if not nbrs:  # no more nbrs
                if thisnode in closed:
                    _unblock(thisnode,blocked,B)
                else:
                    for nbr in subG[thisnode]:
                        if thisnode not in B[nbr]:
                            B[nbr].add(thisnode)
                stack.pop()
                assert path[-1]==thisnode
                path.pop()
        # done processing this node
        subG.remove_node(startnode)
        H=subG.subgraph(scc)  # make smaller to avoid work in SCC routine
        sccs.extend(list(nx.strongly_connected_components(H)))
Of course, I'd also accept a suggestion that differs from the implementation above but runs in similar time. Also, my project uses networkx, so feel free to use any other function from that library, such as shortest_path.
(Note: not homework!)
Edit
Dorijan Cirkveni suggested (if I understood correctly):
if len(blocked) >= limit + 1:
    continue
elif nextnode == startnode:
    yield path[:]
However, that doesn't work. Here's a counterexample:
G = nx.DiGraph()
G.add_edge(1, 2)
G.add_edge(2, 3)
G.add_edge(3, 1)
G.add_edge(3, 2)
G.add_edge(3, 4)
my_cycles = list(simple_cycles(G, limit = 3)) # Modification
nx_cycles = list(nx.simple_cycles(G)) # Original networkx code
print("MY:", my_cycles)
print("NX:", nx_cycles)
This outputs:
MY: [[2, 3]]
NX: [[1, 2, 3], [2, 3]]
Also, if we replace blocked with stack or path, the result is correct for this example, but wrong for other graphs.
This is a highly modified version of this code, but at least it is working.
import networkx as nx

def simple_cycles(G, limit):
    """Yield the elementary cycles of G that contain at most `limit` nodes."""
    subG = type(G)(G.edges())
    sccs = list(nx.strongly_connected_components(subG))
    while sccs:
        scc = sccs.pop()
        startnode = scc.pop()
        path = [startnode]
        blocked = set()  # nodes currently on the path
        blocked.add(startnode)
        stack = [(startnode, list(subG[startnode]))]

        while stack:
            thisnode, nbrs = stack[-1]

            if nbrs:
                nextnode = nbrs.pop()
                if nextnode == startnode:
                    # The edge back to startnode closes a cycle of len(path) nodes.
                    yield path[:]
                elif nextnode not in blocked and len(path) < limit:
                    # Only extend the path while it has fewer than `limit` nodes,
                    # so cycles of exactly `limit` nodes are still reported.
                    path.append(nextnode)
                    stack.append((nextnode, list(subG[nextnode])))
                    blocked.add(nextnode)
                continue

            # All neighbours of thisnode have been tried: backtrack.
            blocked.remove(thisnode)
            stack.pop()
            path.pop()

        subG.remove_node(startnode)
        H = subG.subgraph(scc)
        sccs.extend(list(nx.strongly_connected_components(H)))
You only need to change two things:
The definition line (obviously):
def simple_cycles(G, limit):
Add an overriding condition somewhere in the next-node processing (example below):
...
if len(blocked) >= limit + 1:
    pass
elif nextnode == startnode:
    yield path[:]
...
Bonus: using == instead of >= means that a negative limit behaves as if there were no limit, instead of making the function return no cycles at all.
I am having trouble wrapping my head around the nested for loop in my code. I am following Kahn's algorithm as described on Wikipedia (Kahn's). I don't understand how to test whether outGoingEdge still has incoming edges for each element m of endArray.
Here is what I have so far:
def topOrdering(self, graph):
    retList = []
    candidates = set()
    left = []
    right = []
    for key in graph:
        left.append(key)
        right.append(graph[key])
    flattenedRight = [val for sublist in right for val in sublist]
    for element in left:
        if element not in flattenedRight:
            #set of all nodes with no incoming edges
            candidates.add(element)
    candidates = sorted(candidates)
    while len(candidates) != 0:
        a = candidates.pop(0)
        retList.append(a)
        endArray = graph[a]
        for outGoingEdge in endArray:
            if outGoingEdge not in flattenedRight:
                candidates.append(outGoingEdge)
            #flattenedRight.remove(outGoingEdge)
            del outGoingEdge
    if not graph:
        return "the input graph is not a DAG"
    else:
        return retList
Here is a picture visualizing my algorithm. The graph is in the form of an adjacency list.
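You can store the in-degree (number of incoming edges) of each vertex separately. Every time you remove a vertex from the set of vertices with no incoming edges (empty in the code below), decrement the in-degree of each of its neighbors; when a neighbor's count reaches 0, add it to that set to be processed later. Here's an example: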
def top_sort(adj_list):
    # Find number of incoming edges for each vertex
    in_degree = {}
    for x, neighbors in adj_list.items():
        in_degree.setdefault(x, 0)
        for n in neighbors:
            in_degree[n] = in_degree.get(n, 0) + 1

    # Iterate over edges to find vertices with no incoming edges
    empty = {v for v, count in in_degree.items() if count == 0}

    result = []
    while empty:
        # Take random vertex from empty set
        v = empty.pop()
        result.append(v)

        # Remove edges originating from it, if vertex not present
        # in adjacency list use empty list as neighbors
        for neighbor in adj_list.get(v, []):
            in_degree[neighbor] -= 1

            # If neighbor has no more incoming edges add it to empty set
            if in_degree[neighbor] == 0:
                empty.add(neighbor)

    if len(result) != len(in_degree):
        return None  # Not DAG
    else:
        return result

ADJ_LIST = {
    1: [2],
    2: [3],
    4: [2],
    5: [3]
}

print(top_sort(ADJ_LIST))
Output:
[1, 4, 5, 2, 3]
This is my pathfinding function:
def get_distance(x1,y1,x2,y2):
    neighbors = [(-1,0),(1,0),(0,-1),(0,1)]
    old_nodes = [(square_pos[x1,y1],0)]
    new_nodes = []
    for i in range(50):
        for node in old_nodes:
            if node[0].x == x2 and node[0].y == y2:
                return node[1]
            for neighbor in neighbors:
                try:
                    square = square_pos[node[0].x+neighbor[0],node[0].y+neighbor[1]]
                    if square.lightcycle == None:
                        new_nodes.append((square,node[1]))
                except KeyError:
                    pass
        old_nodes = []
        old_nodes = list(new_nodes)
        new_nodes = []
        nodes = []
    return 50
The problem is that the AI takes too long to respond (the response time must be <= 100 ms).
This is just a Python version of https://en.wikipedia.org/wiki/Pathfinding#Sample_algorithm
You should replace your algorithm with an A* search that uses the Manhattan distance as its heuristic.
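A sketch of that suggestion in Python. The grid representation is an assumption, not the question's: instead of square_pos and .lightcycle, cells are (x, y) tuples on a width-by-height board, and occupied cells are collected in a set called blocked. The name astar_distance is hypothetical. Because the Manhattan distance never overestimates the cost of 4-directional unit moves (it is also consistent), the first time the goal is popped from the heap its cost is optimal.

```python
import heapq
from typing import Optional, Set, Tuple

Cell = Tuple[int, int]


def astar_distance(start: Cell, goal: Cell, width: int, height: int,
                   blocked: Set[Cell]) -> Optional[int]:
    """Number of steps on a shortest 4-connected path, or None if unreachable."""

    def manhattan(cell: Cell) -> int:
        # Never overestimates the remaining cost when each move costs 1
        # and only horizontal/vertical moves are allowed.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}                          # cheapest known cost g(cell)
    frontier = [(manhattan(start), 0, start)]  # heap of (g + h, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best[cell]:
            continue  # stale entry: a cheaper route to cell was found later
        x, y = cell
        for nxt in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the board
            if nxt in blocked:
                continue  # occupied, e.g. by a light-cycle trail
            cost = g + 1
            if cost < best.get(nxt, float("inf")):
                best[nxt] = cost
                heapq.heappush(frontier, (cost + manhattan(nxt), cost, nxt))
    return None
```

For example, astar_distance((0, 0), (2, 2), 3, 3, set()) returns 4. To keep the question's cap, a caller could map a None result (or anything above 50) to 50.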
One reasonably fast solution is to implement Dijkstra's algorithm (which I have already implemented in that question):
Build the original map. It's a masked array where the walker cannot step on masked elements:
%pylab inline
map_size = (20,20)
MAP = np.ma.masked_array(np.zeros(map_size), np.random.choice([0,1], size=map_size))
matshow(MAP)
Below is the Dijkstra algorithm:
def dijkstra(V):
    mask = V.mask
    visit_mask = mask.copy()  # mask visited cells
    m = numpy.ones_like(V) * numpy.inf
    connectivity = [(i,j) for i in [-1, 0, 1] for j in [-1, 0, 1] if (not (i == j == 0))]
    cc = unravel_index(V.argmin(), m.shape)  # current_cell
    m[cc] = 0
    P = {}  # dictionary of predecessors
    #while (~visit_mask).sum() > 0:
    for _ in range(V.size):
        #print cc
        # keep only neighbours inside the grid (row/column 0 included)
        neighbors = [tuple(e) for e in asarray(cc) - connectivity
                     if e[0] >= 0 and e[1] >= 0 and e[0] < V.shape[0] and e[1] < V.shape[1]]
        neighbors = [ e for e in neighbors if not visit_mask[e] ]
        tentative_distance = [(V[e]-V[cc])**2 for e in neighbors]
        for i,e in enumerate(neighbors):
            d = tentative_distance[i] + m[cc]
            if d < m[e]:
                m[e] = d
                P[e] = cc
        visit_mask[cc] = True
        m_mask = ma.masked_array(m, visit_mask)
        cc = unravel_index(m_mask.argmin(), m.shape)
    return m, P

def shortestPath(start, end, P):
    Path = []
    step = end
    while 1:
        Path.append(step)
        if step == start: break
        if step in P:  # dict.has_key() no longer exists in Python 3
            step = P[step]
        else:
            break
    Path.reverse()
    return asarray(Path)
And the result:
start = (2,8)
stop = (17,19)
D, P = dijkstra(MAP)
path = shortestPath(start, stop, P)
imshow(MAP, interpolation='nearest')
plot(path[:,1], path[:,0], 'ro-', linewidth=2.5)
Below are some timing statistics:
%timeit dijkstra(MAP)
#10 loops, best of 3: 32.6 ms per loop
The biggest issue with your code is that you don't do anything to avoid the same coordinates being visited multiple times. This means that the number of nodes you visit is guaranteed to grow exponentially, since it can keep going back and forth over the first few nodes many times.
The best way to avoid duplication is to maintain a set of the coordinates we've added to the queue (though if your node values are hashable, you might be able to add them directly to the set instead of coordinate tuples). Since we're doing a breadth-first search, we'll always reach a given coordinate by (one of) the shortest path(s), so we never need to worry about finding a better route later on.
Try something like this:
def get_distance(x1,y1,x2,y2):
    neighbors = [(-1,0),(1,0),(0,-1),(0,1)]
    nodes = [(square_pos[x1,y1],0)]
    seen = set([(x1, y1)])
    for node, path_length in nodes:
        if path_length == 50:
            break
        if node.x == x2 and node.y == y2:
            return path_length
        for nx, ny in neighbors:
            try:
                square = square_pos[node.x + nx, node.y + ny]
                if square.lightcycle == None and (square.x, square.y) not in seen:
                    nodes.append((square, path_length + 1))
                    seen.add((square.x, square.y))
            except KeyError:
                pass
    return 50
I've also simplified the loop a bit. Rather than swapping out the list after each depth, you can use a single loop and append to the end of the list while iterating over the earlier values. I still abort if a path hasn't been found with fewer than 50 steps (using the distance stored in the 2-tuple, rather than the number of passes of the outer loop). A further improvement might be to use a collections.deque for the queue, since you can efficiently pop from one end while appending to the other. It probably won't make a huge difference, but it might save a little memory.
I also avoided most of the indexing by one and zero in favor of unpacking into separate variable names in the for loops. I think this is much easier to read, and it avoids confusion, since the two kinds of 2-tuples have different meanings (one is a (node, distance) pair, the other is an (x, y) pair).
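A sketch of the collections.deque variant mentioned above. It uses a hypothetical grid representation (width, height and a set of blocked (x, y) cells) instead of square_pos, and keeps the question's cap of 50 as a max_steps parameter; the name bfs_distance is illustrative only.

```python
from collections import deque
from typing import Set, Tuple

Cell = Tuple[int, int]


def bfs_distance(start: Cell, goal: Cell, width: int, height: int,
                 blocked: Set[Cell], max_steps: int = 50) -> int:
    """BFS distance from start to goal, or max_steps if not found sooner."""
    queue = deque([(start, 0)])
    seen = {start}  # cells already enqueued: each cell is expanded at most once
    while queue:
        (x, y), dist = queue.popleft()  # O(1), unlike list.pop(0)
        if (x, y) == goal:
            return dist
        if dist == max_steps:
            continue  # do not expand beyond the step limit
        for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in seen):
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return max_steps
```

Marking cells as seen when they are enqueued (rather than when dequeued) is what keeps the queue from filling with duplicates.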
I am trying to use an interval tree to solve this problem. Below is my attempt, but understandably it is not working, i.e. it does not return all the intervals.
A cricket match is going to be held. The field is represented by a 1D plane. A cricketer, Mr. X, has favorite shots. Each shot has a particular range. The range of the ith shot is from A(i) to B(i). That means his favorite shot can be anywhere in this range. Each player on the opposite team can field only in a particular range. Player i can field from A(i) to B(i). You are given the favorite shots of Mr. X and the ranges of the M players.
A brute-force solution times out on some of the test cases. All I need is an idea.
class node:
    def __init__(self, low, high):
        self.left = None
        self.right = None
        self.highest = high
        self.low = low
        self.high = high

class interval:
    def __init__(self):
        self.head = None
        self.count = 0

    def add_node(self, node):
        if self.head == None:
            self.head = node
        else:
            if self.head.highest < node.high:
                self.head.highest = node.high
            self.__add_node(self.head, node)

    def __add_node(self, head, node):
        if node.low <= head.low:
            if head.left == None:
                head.left = node
            else:
                if head.left.highest < node.high:
                    head.left.highest = node.high
                self.__add_node(head.left, node)
        else:
            if head.right == None:
                head.right = node
            else:
                if head.right.highest < node.high:
                    head.right.highest = node.high
                self.__add_node(head.right, node)

    def search(self, node):
        self.count = 0
        return self._search(self.head, node)

    def _search(self, head, node):
        if node.low <= head.high and node.high >= head.low:
            self.count += 1
            print(self.count, head.high, head.low)
        if head.left != None and head.left.highest >= node.low:
            return self._search(head.left, node)
        elif head.right != None:
            return self._search(head.right, node)
        return self.count

data = input().split(" ")
N = int(data[0])
M = int(data[1])
intervals = interval()
for i in range(N):
    data = input().split(" ")
    p = node(int(data[0]), int(data[1]))
    intervals.add_node(p)
count = 0
for i in range(M):
    data = input().split(" ")
    count += intervals.search(node(int(data[0]), int(data[1])))
print(count)
The key to solving the problem is to realize that there's no need to compare each fielding range with each shot range, since only the total number of intersecting ranges needs to be known. To achieve this in O(n log n) time, the following algorithm can be used.
Take the shot ranges and create two ordered lists: one for start values and another for end values. The example problem has shots [[1, 2], [2, 3], [4, 5], [6, 7]] and after the sorting we have two lists: [1, 2, 4, 6] and [2, 3, 5, 7]. Everything so far can be done in O(n log n) time.
Next, process the outfield players. The first player has range [1, 5]. When we binary-search for the start value 1 in the sorted end values [2, 3, 5, 7], we see that every shot range ends after that start value, so 0 ranges end before it. Next we binary-search for the end value 5 in the sorted start values [1, 2, 4, 6] and see that 3 shot ranges start at or before it. The simple calculation 3 - 0 shows that the first outfield player can intersect 3 ranges. Repeating this for all M outfield players takes O(m log n) time.
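The two binary searches map directly onto Python's bisect module. A minimal sketch, assuming closed ranges with A(i) <= B(i), passed in as lists of (start, end) tuples rather than read with input(); count_intersections is a hypothetical name. A shot [a, b] overlaps a fielder [c, d] exactly when a <= d and b >= c, and every shot with b < c also has a <= d, so subtracting the second count from the first is safe.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def count_intersections(shots: List[Tuple[int, int]],
                        fielders: List[Tuple[int, int]]) -> int:
    """Total number of (shot, fielder) pairs whose closed ranges overlap."""
    starts = sorted(a for a, _ in shots)  # O(n log n)
    ends = sorted(b for _, b in shots)
    total = 0
    for c, d in fielders:                 # O(m log n) overall
        started = bisect_right(starts, d)     # shots starting at or before d
        ended_before = bisect_left(ends, c)   # shots ending strictly before c
        total += started - ended_before
    return total
```

With the example shots, the fielder (1, 5) contributes 3, matching the calculation above.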
I did some homework and tried to solve it with an interval tree. But as you have realized, a traditional interval tree may not be suitable for this problem. This is because searching an interval tree returns only one match, but we need all matches. More precisely, we only need to count the matches; we don't need to find each of them.
So I added 2 fields to your node for the sake of pruning. I'm not familiar with Python; in Java it looks like this:
static class Node implements Comparable<Node> {
    Node left;//left child
    Node right;//right child
    int low;//low of current node
    int high;//high of current node
    int lowest;//lowest of current subtree
    int highest;//highest of current subtree
    int nodeCount;//node count of current subtree

    @Override
    public int compareTo(Node o) {
        return low - o.low;
    }
}
To make a balanced tree, I sort all the intervals and then build the tree recursively from the middle out to both sides (it may be better to use a red-black tree). This has a large effect on performance, so I suggest adding this feature to your program.
The preparations are now finished. The search method looks like this:
private static int search(Node node, int low, int high) {
    //pruning 1: interval [low,high] totally overlaps with subtree,thus overlaps with all children
    if (node.lowest >= low && node.highest <= high) {
        return node.nodeCount;
    }
    //pruning 2: interval [low,high] never overlaps with subtree
    if (node.highest < low || node.lowest > high) {
        return 0;
    }

    //can't judge,go through left and right child

    //overlapped with current node or not
    int c = (high < node.low || low > node.high ? 0 : 1);

    if (node.left != null) {
        c += search(node.left, low, high);
    }
    if (node.right != null) {
        c += search(node.right, low, high);
    }
    return c;
}
There are 2 main prunings, as the comments show: there is no need to go through the children when the query interval either covers the whole current subtree or does not overlap it at all.
It works well in most cases and has been accepted by the system. It takes about 4000 ms to solve the most complicated test case (N=99600, M=98000). I'm still trying to optimize it further; I hope this helps.
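For readers following the Python question, here is a rough Python translation of the pruned tree described above. It is a sketch under assumptions: intervals are closed (low, high) tuples, the tree is built once from a sorted list (no later insertions), and the names build, search and node_count are chosen for illustration. Because the tree is balanced, the recursion depth stays around log2(n).

```python
from typing import List, Optional, Tuple


class Node:
    """Interval [low, high] plus summaries of the subtree rooted here."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.lowest, self.highest = low, high  # extent of the whole subtree
        self.node_count = 1                    # intervals in the subtree


def build(intervals: List[Tuple[int, int]]) -> Optional[Node]:
    """Build a balanced tree by taking the middle interval as each root."""
    ordered = sorted(intervals)

    def rec(lo: int, hi: int) -> Optional[Node]:
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        node = Node(*ordered[mid])
        node.left, node.right = rec(lo, mid), rec(mid + 1, hi)
        for child in (node.left, node.right):
            if child is not None:
                node.lowest = min(node.lowest, child.lowest)
                node.highest = max(node.highest, child.highest)
                node.node_count += child.node_count
        return node

    return rec(0, len(ordered))


def search(node: Optional[Node], low: int, high: int) -> int:
    """Count intervals in the subtree that overlap [low, high]."""
    if node is None or node.highest < low or node.lowest > high:
        return 0  # pruning 2: the query misses the whole subtree
    if low <= node.lowest and node.highest <= high:
        return node.node_count  # pruning 1: the query covers the whole subtree
    c = 0 if (high < node.low or low > node.high) else 1
    return c + search(node.left, low, high) + search(node.right, low, high)
```

The total answer is then the sum of search(tree, c, d) over all fielders (c, d).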
Here is the first part of the code that I have written for Kosaraju's algorithm.
###### reading the data #####
with open('data.txt') as req_file:
    ori_data = []
    for line in req_file:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            ori_data.append(line)

###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])
print(revscc_dic)

######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])
print(scc_dic)

##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
            path.append(v)
            q=revscc_dic[v]+q
print(path)
The code reads the data and builds Grev and G correctly. I have also written code for an iterative DFS. How can I extend it to find the finishing times? I understand how to find finishing times with pen and paper, but I do not understand how to express them in code. How can I implement this? Only after this can I proceed to the next part of my code. Please help. Thanks in advance.
The data.txt file contains:
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
Please save it as data.txt.
With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
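For comparison, a minimal recursive sketch in Python showing exactly where the finishing time is recorded: right after the loop of recursive calls. It assumes a dict-of-lists graph such as revscc_dic; the function name recursive_finish_times is hypothetical. It visits neighbors in list order, whereas the iterative code below explores the last-listed neighbor first, so the two can produce different (equally valid) finishing times. Python's default recursion limit (about 1000 frames) is also one reason to prefer an iterative version on large graphs.

```python
from typing import Dict, Iterable, List


def recursive_finish_times(graph: Dict[int, List[int]],
                           roots: Iterable[int]) -> Dict[int, int]:
    """Finishing time of every vertex reachable from `roots`, counted from 0."""
    finish: Dict[int, int] = {}
    explored = set()
    time = 0

    def visit(v: int) -> None:
        nonlocal time
        explored.add(v)
        for w in graph.get(v, []):
            if w not in explored:
                visit(w)
        # Every recursive call for v's neighbours has returned here,
        # so this is exactly the moment v finishes.
        finish[v] = time
        time += 1

    for v in roots:
        if v not in explored:
            visit(v)
    return finish
```

With revscc_dic built from data.txt and roots range(9, 0, -1), this gives {3: 0, 5: 1, 2: 2, 8: 3, 6: 4, 9: 5, 1: 6, 4: 7, 7: 8}.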
However, with iterative DFS this is not so easy. Because we now process the stack iteratively in a while loop, we have lost some of the nested structure that comes with function calls; more precisely, we don't know when backtracking occurs. Unfortunately, there is no way to know when backtracking occurs without adding some additional information to our stack of vertices.
The quickest way to add finishing times to your dfs implementation is like so:
##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print(path)
print(finish_time_dic)
The trick used here is that when we pop off v from the stack, if it is the first time we have seen it, then we add it back to the stack again. This is done using: q = [v] + q. We must push v onto the stack before we push on its neighbours (we write the code that pushes v before the for loop that pushes v's neighbours) - or else the trick doesn't work. Eventually we will pop v off the stack again. At this point, v has finished! We have seen v before, so, we go into the else case and compute a fresh finish time.
For the graph provided, finish_time_dic gives the correct finishing times:
{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}
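Note that this DFS algorithm (with the finishing-time modification) still has O(V+E) complexity, even though each vertex is pushed onto the stack at least twice: there is one extra push per vertex plus at most one push per edge. (The list-based code above adds overhead of its own, because q = [w] + q copies the list and v not in path scans it; using append/pop on the stack and a set for the visited vertices keeps it linear.) However, more elegant solutions exist.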
I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Questions 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternative solution to the problem.
Questions:
5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?
5-7 Write a nonrecursive version of DFS that can determine finish times.
Answers:
5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
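A minimal Python sketch of the marker idea from 5-7, written for this answer rather than taken from the book: stack items are edges (u, v), and (u, None) is pushed before u's edges, so popping it signals u's backtracking point. It assumes a dict-of-lists graph and that None is never used as a vertex; the function name dfs_finish_with_markers is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def dfs_finish_with_markers(graph: Dict[Hashable, List[Hashable]], start: Hashable
                            ) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Iterative DFS from `start`; returns (discovery, finish) times.

    Stack items are edges (u, v); the pair (u, None) marks u's backtracking point.
    """
    discover: Dict[Hashable, int] = {}
    finish: Dict[Hashable, int] = {}
    time = 0
    stack: List[Tuple[Optional[Hashable], Optional[Hashable]]] = [(None, start)]
    while stack:
        u, v = stack.pop()
        if v is None:
            # Marker: everything pushed after (u, None) is done, so u finishes.
            finish[u] = time
            time += 1
            continue
        if v in discover:
            continue  # v was already reached through another edge
        discover[v] = time
        time += 1
        stack.append((v, None))  # pushed before v's edges, popped after them
        for w in reversed(graph.get(v, [])):
            if w not in discover:
                stack.append((v, w))
    return discover, finish
```

For {1: [2, 3], 2: [4], 3: [4], 4: []} starting at 1, the finish times are {4: 3, 2: 4, 3: 6, 1: 7}, the same values a recursive DFS with a shared counter starting at 0 would give.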
Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0  # time counter

def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]  # create stack with starting vertex
    while stack:  # while stack is not empty
        step += 1
        v = stack[-1]  # peek top of stack
        if v.color:  # if already seen
            v = stack.pop()  # done with this node, pop it from stack
            if v.color == 1:  # if GRAY, finish this node
                v.time_finish = step
                v.color = 2  # BLACK, done
        else:  # seen for first time
            v.color = 1  # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:  # for all neighbor (v, w)
                if not w.color:  # if not seen
                    stack.append(w)
    return total

def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0  # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)
I am following the conventions of the CLR algorithms book (Cormen, Leiserson, Rivest) and use node coloring to designate each node's state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out white. When a node is discovered during the search, it is marked gray. When we are done with it, it is marked black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.
Here is some additional supporting code:
import heapq
from collections import defaultdict

class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []  # children | adjacency list
        self.color = 0  # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None  # DFS
        self.time_finish = None  # DFS

class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)  # dict of Nodes, keyed by name
        self.max_heap = []  # nodes in decreasing finish time for SCC

    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of the SCC computation, you can build a max heap of Nodes and pop from it in the outer loop (instead of iterating over g.nodes), calling dfs_visit() on each node that is still unvisited:
while g.max_heap:
    v = heapq.heappop(g.max_heap)[1]
    if not v.color:
        size = dfs_visit(g, v)
        scc_size.append(size)
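Putting both passes together, here is a compact iterative Kosaraju sketch in Python. It deliberately swaps the Node/Graph classes and coloring above for plain dicts of lists and sets, and replaces the max heap with the finishing-order list read backwards, which gives the same decreasing-finish-time order. The function name kosaraju_scc_sizes and the edge-list input are assumptions for illustration. Each stack entry keeps an iterator over the vertex's neighbors, so a vertex is popped (and finishes) only after all of its neighbors have been tried.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def kosaraju_scc_sizes(edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Sizes of the strongly connected components, largest first."""
    graph: Dict[int, List[int]] = defaultdict(list)
    reverse: Dict[int, List[int]] = defaultdict(list)
    vertices = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        vertices.update((u, v))

    # Pass 1: DFS on the reversed graph, recording vertices in finishing order.
    order: List[int] = []
    seen = set()
    for root in sorted(vertices):
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(reverse[root]))]
        while stack:
            v, children = stack[-1]
            for w in children:          # resumes where it stopped last time
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(reverse[w])))
                    break
            else:                       # no unseen neighbour left: v finishes
                stack.pop()
                order.append(v)

    # Pass 2: DFS on the original graph in decreasing finishing time.
    sizes: List[int] = []
    assigned = set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        size, stack2 = 0, [root]
        while stack2:
            v = stack2.pop()
            size += 1
            for w in graph[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack2.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

For the edges in the data.txt file from the Kosaraju question, this returns [3, 3, 3], i.e. the components {1, 4, 7}, {2, 5, 8} and {3, 6, 9}.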
I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is the code for my version, which maps one-to-one onto a recursive version of DFS.
n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)

# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True

# Loop through vertices in reverse order
for v in range(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True

            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:

                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break

            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time
Here are the recursive and iterative implementations in Java:
int time = 0;

public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}

public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}
First, you should know exactly what the finishing time is. In recursive DFS, a node v finishes when all of its adjacent nodes have finished.
With this in mind, you need an additional data structure to store that information.
adj[][]                    //graph
visited[]=NULL             //array of visited nodes
finished[]=NULL            //array of finished nodes, in finishing order
Stack st=new Stack         //normal stack
Stack backtrack=new Stack  //additional stack

function getFinishedTime(){
    for(node i in adj){
        if (!visited.contains(i)){
            st.push(i);
            visited.add(i)
            while(!st.isEmpty){
                int j=st.pop();
                int[] unvisitedChild= getUnvisitedChild(j);
                if(unvisitedChild!=null){
                    for(int c in unvisitedChild){
                        st.push(c);
                        visited.add(c);
                    }
                    backtrack.push([j,unvisitedChild]); //store each entry as an array: the parent node j first, followed by all of its unvisited child nodes
                }
                else{
                    finished.add(j);
                    while(!backtrack.isEmpty&&finished.containsAll(children of backtrack.peek())) //all of the child nodes are finished, so the parent node is finished too
                    {
                        parent=backtrack.pop()[0];
                        finished.add(parent);
                    }
                }
            }
        }
    }
}

function getUnvisitedChild(int i){
    unvisitedChild[]=null
    for(int child in adj[i]){
        if(!visited.contains(child))
            unvisitedChild.add(child);
    }
    return unvisitedChild;
}
and the finishing order (the vertices listed in the order they finish) should be
[5, 2, 8, 3, 6, 9, 1, 4, 7]