Kosaraju: finding finishing times using iterative DFS (Python)

Here is the first part of the code that I have written for Kosaraju's algorithm.
###### reading the data #####
with open('data.txt') as req_file:
    ori_data = []
    for line in req_file:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            ori_data.append(line)
###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])
print revscc_dic
######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])
print scc_dic
##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
            path.append(v)
            q=revscc_dic[v]+q
print path
The code reads the data and forms Grev and G correctly, and I have written an iterative DFS. How can I extend it to compute the finishing times? I understand how to find finishing times with pen and paper, but I do not understand how to express that in code. I can only proceed to the next part of my code once this works. Please help. Thanks in advance.
The data.txt file contains:
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
please save it as data.txt.

With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
However, with iterative dfs this is not so easy. Because we now process the stack iteratively in a while loop, we have lost some of the nested structure that comes with function calls. More precisely, we don't know when backtracking occurs, and there is no way to know without adding some additional information to our stack of vertices.
The quickest way to add finishing times to your dfs implementation is like so:
##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print path
print finish_time_dic
The trick used here is that when we pop off v from the stack, if it is the first time we have seen it, then we add it back to the stack again. This is done using: q = [v] + q. We must push v onto the stack before we push on its neighbours (we write the code that pushes v before the for loop that pushes v's neighbours) - or else the trick doesn't work. Eventually we will pop v off the stack again. At this point, v has finished! We have seen v before, so, we go into the else case and compute a fresh finish time.
For the graph provided, finish_time_dic gives the correct finishing times:
{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}
Note that this dfs algorithm (with the finishing times modification) still has O(V+E) complexity, despite the fact that we are pushing each node of the graph onto the stack twice. However, more elegant solutions exist.
I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Question 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternate solution to the problem.
Questions:
5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?
5-7. Write a nonrecursive version of DFS that can deal determine finish-times.
Answers:
5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
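The marker idea from answer 5-7 can be sketched as follows. This is a minimal illustration rather than the book's own code: the function names `dfs_finish_order` and `finishing_order`, and the dict-of-lists graph layout, are assumptions made for the example. Each stack entry is an edge `(u, v)`, and `(u, None)` is the marker that signals backtracking out of `u`.

```python
def dfs_finish_order(graph, start, visited):
    """Iterative DFS from start; returns vertices in the order they finish.

    graph is assumed to be a dict mapping a vertex to a list of neighbours.
    visited is a set shared between calls so that vertices are explored once.
    """
    finished = []
    stack = [(None, start)]        # entries are edges (u, v)
    while stack:
        u, v = stack.pop()
        if v is None:              # marker (u, None): all descendants of u are done
            finished.append(u)
            continue
        if v in visited:
            continue
        visited.add(v)
        stack.append((v, None))    # pushed before v's edges, so popped after them
        for w in graph.get(v, []):
            if w not in visited:
                stack.append((v, w))
    return finished


def finishing_order(graph, vertices):
    """Run the DFS from every unvisited vertex, in the given order."""
    visited = set()
    order = []
    for s in vertices:
        if s not in visited:
            order.extend(dfs_finish_order(graph, s, visited))
    return order


# Reversed graph built from the data.txt edges in the question
grev = {4: [1], 8: [2], 6: [3, 8], 7: [4, 9], 2: [5],
        9: [6], 1: [7], 5: [8], 3: [9]}
print(finishing_order(grev, range(9, 0, -1)))
```

On this input the vertices finish in the order 5, 2, 8, 3, 6, 9, 1, 4, 7, which agrees with the finishing times in finish_time_dic above. The order depends on which neighbour is pushed last, so a different push order yields a different but equally valid result.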

Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0 # time counter
def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v] # create stack with starting vertex
    while stack: # while stack is not empty
        step += 1
        v = stack[-1] # peek top of stack
        if v.color: # if already seen
            v = stack.pop() # done with this node, pop it from stack
            if v.color == 1: # if GRAY, finish this node
                v.time_finish = step
                v.color = 2 # BLACK, done
        else: # seen for first time
            v.color = 1 # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child: # for all neighbor (v, w)
                if not w.color: # if not seen
                    stack.append(w)
    return total
def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0 # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)
I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out as white. When it's discovered during the search it is marked as gray. When we are done with it, it is marked as black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.
Here is some additional supporting code:
import heapq
from collections import defaultdict
class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = [] # children | adjacency list
        self.color = 0 # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None # DFS
        self.time_finish = None # DFS
class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node) # maps node key to Node
        self.max_heap = [] # nodes in decreasing finish time for SCC
    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v)) # invert finish time for max heap
        heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of SCC computation, you can build a max heap of Nodes, and use that max heap in dfs_visit() instead of simply the child list.
while g.max_heap:
    v = heapq.heappop(g.max_heap)[1]
    if not v.color:
        size = dfs_visit(g, v)
        scc_size.append(size)

I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is code for my version which has a 1-to-1 mapping with a recursive version of DFS.
n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)
# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True
# Loop through vertices in reverse order
for v in xrange(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True
            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:
                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break
            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time

Here are the recursive and iterative implementations in java:
int time = 0;
public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}
public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}

First, you should know exactly what the finishing time is. In recursive DFS, a node v is finished once all of its adjacent nodes have finished.
With this in mind, you need an additional data structure to store that information. In pseudocode:
adj[][] //graph
visited[]=NULL //array of visited node
finished[]=NULL //array of finished node
Stack st=new Stack //normal stack
Stack backtrack=new Stack //additional stack
function getFinishedTime(){
    for(node i in adj){
        if (!visited.contains(i)){
            st.push(i);
            visited.add(i)
            while(!st.isEmpty){
                int j=st.pop();
                int[] unvisitedChild= getUnvistedChild(j);
                if(unvisitedChild!=null){
                    for(int c in unvisitedChild){
                        st.push(c);
                        visited.add(c);
                    }
                    backtrack.push([j,unvisitedChild]); //you can store each entry as array with the first index as the parent node j, followed by all the unvisited child node.
                }
                else{
                    finished.add(j);
                    while(!backtrack.isEmpty&&finished.containsALL(backtrack.peek())) //all of the child nodes are finished, so we can mark the parent node finished
                    {
                        parent=backtrack.pop()[0];
                        finished.add(parent);
                    }
                }
            }
        }
    }
}
function getUnvistedChild(int i){
    unvisitedChild[]=null
    for(int child in adj[i]){
        if(!visited.contains(child))
            unvisitedChild.add(child);
    }
    return unvisitedChild;
}
and the finishing order should be
[5, 2, 8, 3, 6, 9, 1, 4, 7]

Related

Classifying edges in DFS on a directed graph

Based on a DFS traversal, I want to classify edges (u, v) in a directed graph as:
Tree edge: when v is visited for the first time as we traverse the edge
Back edge: when v is an ancestor of u in the traversal tree
Forward edge: when v is a descendant of u in the traversal tree
Cross edge: when v is neither an ancestor nor a descendant of u in the traversal tree
I was following a GeeksForGeeks tutorial to write this code:
class Graph:
    def __init__(self, v):
        self.time = 0
        self.traversal_array = []
        self.v = v
        self.graph_list = [[] for _ in range(v)]
    def dfs(self):
        self.visited = [False]*self.v
        self.start_time = [0]*self.v
        self.end_time = [0]*self.v
        self.ff = 0
        self.fc = 0
        for node in range(self.v):
            if not self.visited[node]:
                self.traverse_dfs(node)
    def traverse_dfs(self, node):
        # mark the node visited
        self.visited[node] = True
        # add the node to traversal
        self.traversal_array.append(node)
        # get the starting time
        self.start_time[node] = self.time
        # increment the time by 1
        self.time += 1
        # traverse through the neighbours
        for neighbour in self.graph_list[node]:
            # if a node is not visited
            if not self.visited[neighbour]:
                # marks the edge as tree edge
                print('Tree Edge:', str(node)+'-->'+str(neighbour))
                # dfs from that node
                self.traverse_dfs(neighbour)
            else:
                # when the parent node is traversed after the neighbour node
                if self.start_time[node] > self.start_time[neighbour] and self.end_time[node] < self.end_time[neighbour]:
                    print('Back Edge:', str(node)+'-->'+str(neighbour))
                # when the neighbour node is a descendant but not a part of tree
                elif self.start_time[node] < self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Forward Edge:', str(node)+'-->'+str(neighbour))
                # when parent and neighbour node do not have any ancestor and a descendant relationship between them
                elif self.start_time[node] > self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Cross Edge:', str(node)+'-->'+str(neighbour))
            self.end_time[node] = self.time
            self.time += 1
But it does not output the desired results for the following graph, which is represented with:
self.v = 3
self.graph_list = [[1, 2], [], [1]]
The above code is not identifying the edge (2, 1) as a cross edge, but as a back edge.
I have no clue what to adapt in my code in order to detect cross edges correctly.
In a discussion someone gave this information, but I couldn't make it work:
The checking condition is wrong when the node has not been completely visited when the edge is classified. This is because in the initial state the start & end times are set to 0.
if the graph looks like this:
0 --> 1
1 --> 2
2 --> 3
3 --> 1
When checking the 3 --> 1 edge: the answer should be a back edge.
But now the start/end [3] = 4/0 ; start/end [1] = 1/0
and the condition end[3] < end[1] is false because of the initialization problem.
I see two solutions,
traverse the graph first and determine the correct start/end [i] values, but it needs more time complexity, or
use black/gray/white and discover the order to classify the edges
Here are some issues:
By initialising start_time and end_time to 0 for each node, you cannot distinguish an unset time from a real time of 0, which is assigned to the very first node's start time. You should initialise these lists with a value that indicates there was no start/end at all. You could use the value -1 for this purpose.
The following statements should not be inside the loop:
self.end_time[node] = self.time
self.time += 1
They should be executed after the loop has completed. Only at that point you can "end" the visit of the current node. So the indentation of these two statements should be less.
There are several places where the value of self.end_time[node] is compared in a condition, but that time has not been set yet (apart from its default value), so this condition makes little sense.
The last elif should really be an else because there are no other possibilities to cover. If ever the execution gets there, it means no other possibility remains, so no condition should be checked.
The condition self.start_time[node] > self.start_time[neighbour] is not strong enough for identifying a back edge, and as already said, the second part of that if condition makes no sense, since self.end_time[node] has not been given a non-default value yet. And so this if block is entered also when it is not a back edge. What you really want to test here is that the visit of neighbour has not been closed yet. In other words, you should check that self.end_time[neighbour] is still at its default value (and I propose to use -1 for that).
Not a problem, but there are also these remarks to make:
when you keep track of start_time and end_time, there is no need to have visited. Whether a node is visited follows from the value of start_time: if it still has its default value (-1), then the node has not yet been visited.
Don't use code comments to state the obvious. For instance the comment "increment the time by 1" really isn't explaining anything that cannot be seen directly from the code.
Attribute v could use a better name. Although V is often used to denote the set of nodes of a graph, it is not intuitive to see v as the number of nodes in the graph. I would suggest using num_nodes instead. It makes the code more readable.
Here is a correction of your code:
class Graph:
    def __init__(self, num_nodes):
        self.time = 0
        self.traversal_array = []
        self.num_nodes = num_nodes # Use more descriptive name
        self.graph_list = [[] for _ in range(num_nodes)]
    def dfs(self):
        self.start_time = [-1]*self.num_nodes
        self.end_time = [-1]*self.num_nodes
        for node in range(self.num_nodes):
            if self.start_time[node] == -1: # No need for self.visited
                self.traverse_dfs(node)
    def traverse_dfs(self, node):
        self.traversal_array.append(node)
        self.start_time[node] = self.time
        self.time += 1
        for neighbour in self.graph_list[node]:
            # when the neighbour was not yet visited
            if self.start_time[neighbour] == -1:
                print(f"Tree Edge: {node}-->{neighbour}")
                self.traverse_dfs(neighbour)
            # otherwise when the neighbour's visit is still ongoing:
            elif self.end_time[neighbour] == -1:
                print(f"Back Edge: {node}-->{neighbour}")
            # otherwise when the neighbour's visit started after the current node's visit:
            elif self.start_time[node] < self.start_time[neighbour]:
                print(f"Forward Edge: {node}-->{neighbour}")
            else: # No condition here: there are no other options
                print(f"Cross Edge: {node}-->{neighbour}")
        # Indentation corrected:
        self.end_time[node] = self.time
        self.time += 1
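To check the classification rules on the two graphs discussed in this thread, here is a compact sketch of the same logic as a standalone function. The name `classify_edges` and the choice to return a list of tuples instead of printing are assumptions made for the example; the four conditions are the ones from the corrected code.

```python
def classify_edges(adj):
    """Classify every edge of a directed graph during one DFS.

    adj is assumed to be a list of adjacency lists, nodes numbered 0..n-1.
    Returns a list of (kind, u, v) tuples in the order the edges are examined.
    """
    n = len(adj)
    start = [-1] * n    # -1 means the visit has not started
    end = [-1] * n      # -1 means the visit has not finished
    time = 0
    edges = []

    def visit(node):
        nonlocal time
        start[node] = time
        time += 1
        for nb in adj[node]:
            if start[nb] == -1:            # not visited yet
                edges.append(("tree", node, nb))
                visit(nb)
            elif end[nb] == -1:            # visit of nb still open: nb is an ancestor
                edges.append(("back", node, nb))
            elif start[node] < start[nb]:  # nb started after node and is already closed
                edges.append(("forward", node, nb))
            else:
                edges.append(("cross", node, nb))
        end[node] = time
        time += 1

    for node in range(n):
        if start[node] == -1:
            visit(node)
    return edges


print(classify_edges([[1, 2], [], [1]]))       # graph from the question
print(classify_edges([[1], [2], [3], [1]]))    # cycle 1 -> 2 -> 3 -> 1
```

For the question's graph, the edge (2, 1) comes out as a cross edge, and in the cycle example the edge (3, 1) comes out as a back edge. The function is recursive, so very deep graphs may hit Python's recursion limit.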

Faster way to add dummy nodes in networkx to limit degree

I am wondering if I can speed up my operation of limiting node degree using an inbuilt function.
A submodule of my task requires me to limit the indegree to 2. So, the solution I proposed was to introduce sequential dummy nodes and absorb the extra edges. Finally, the last dummy gets connected to the children of the original node. To be specific if an original node 2 is split into 3 nodes (original node 2 & two dummy nodes), ALL the properties of the graph should be maintained if we analyse the graph by packaging 2 & its dummies into one hypothetical node 2'; The function I wrote is shown below:
def split_merging(G, dummy_counter):
    """
    Args:
        G: as the name suggests
        dummy_counter: as the name suggests
    Returns:
        G with each merging node > 2 incoming split into several consecutive nodes
        and dummy_counter
    """
    # we need two copies; one to ensure the sanctity of the input G
    # and second, to ensure that while we change the Graph in the loop,
    # the loop doesn't go crazy due to changing bounds
    G_copy = nx.DiGraph(G)
    G_copy_2 = nx.DiGraph(G)
    for node in G_copy.nodes:
        in_deg = G_copy.in_degree[node]
        if in_deg > 2: # node must be split for incoming
            new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
            dummy_counter = dummy_counter + in_deg - 2
            upstreams = [i for i in G_copy_2.predecessors(node)]
            downstreams = [i for i in G_copy_2.successors(node)]
            for up in upstreams:
                G_copy_2.remove_edge(up, node)
            for down in downstreams:
                G_copy_2.remove_edge(node, down)
            prev_node = node
            G_copy_2.add_edge(upstreams[0], prev_node)
            G_copy_2.add_edge(upstreams[1], prev_node)
            for i in range(2, len(upstreams)):
                G_copy_2.add_edge(prev_node, new_nodes[i - 2])
                G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
                prev_node = new_nodes[i - 2]
            for down in downstreams:
                G_copy_2.add_edge(prev_node, down)
    return G_copy_2, dummy_counter
For clarification, the input and output graphs were shown as two figures (Input, Output).
It works as expected. But the problem is that this is very slow for larger graphs. Is there a way to speed this up using some inbuilt function from networkx or any other library?
Sure; the idea is similar to balancing a B-tree. If a node has too many in-neighbors, create two new children, and split up all your in-neighbors among those children. The children have out-degree 1 and point to your original node; you may need to recursively split them as well.
This is as balanced as possible: node n becomes a complete binary tree rooted at node n, with external in-neighbors at the leaves only, and external out-neighbors at the root.
def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
    """Given a possibly overfull node, create a minimal complete
    binary tree rooted at that node with no overfull nodes.
    Return the new graph."""
    global dummy_counter
    current_in_degree = G.in_degree[node]
    if current_in_degree <= max_in_degree:
        return G
    # Complete binary tree, so left gets 1 more descendant if tied
    left_child_in_degree = (current_in_degree + 1) // 2
    left_child = "dummy" + str(dummy_counter)
    right_child = "dummy" + str(dummy_counter + 1)
    dummy_counter += 2
    G.add_node(left_child)
    G.add_node(right_child)
    old_predecessors = list(G.predecessors(node))
    # Give all predecessors to left and right children
    G.add_edges_from([(y, left_child)
                      for y in old_predecessors[:left_child_in_degree]])
    G.add_edges_from([(y, right_child)
                      for y in old_predecessors[left_child_in_degree:]])
    # Remove all incoming edges
    G.remove_edges_from([(y, node) for y in old_predecessors])
    # Connect children to me
    G.add_edge(left_child, node)
    G.add_edge(right_child, node)
    # Split children
    G = recursive_split_node(G, left_child, max_in_degree)
    G = recursive_split_node(G, right_child, max_in_degree)
    return G
def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed our limit."""
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > max_in_degree:
            G_copy = recursive_split_node(G_copy, node, max_in_degree)
    return G_copy
This code for recursively splitting nodes is quite handy and easily generalized, and intentionally left unoptimized.
To solve your exact use case, you could go with an iterative solution: build a full, complete binary tree (with the same structure as a heap) implicitly as an array. This is, I believe, the theoretically optimal solution to the problem, in terms of minimizing the number of graph operations (new nodes, new edges, deleting edges) to achieve the constraint, and gives the same graph as the recursive solution.
def clean_graph(G):
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed 2."""
    global dummy_counter
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > 2:
            predecessors_list = list(G_copy.predecessors(node))
            G_copy.remove_edges_from((y, node) for y in predecessors_list)
            N = len(predecessors_list)
            leaf_count = (N + 1) // 2
            # A heap-shaped binary tree with L leaves has L - 1 internal nodes,
            # so every leaf index has a parent among the internal nodes
            internal_count = leaf_count - 1
            total_nodes = leaf_count + internal_count
            node_names = [node]
            node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
            dummy_counter += total_nodes - 1
            # Internal node i receives edges from its children 2i+1 and 2i+2
            for i in range(internal_count):
                G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))
            # Each leaf absorbs up to two of the original predecessors
            for leaf in range(internal_count, internal_count + leaf_count):
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
    return G_copy
From my testing, comparing performance on very dense graphs generated with nx.fast_gnp_random_graph(500, 0.3, directed=True), this is 2.75x faster than the recursive solution, and 1.75x faster than the original posted solution. The bottleneck for further optimizations is networkx and Python, or changing the input graphs to be less dense.

BFS in the nodes of a graph

Graph
I am trying to perform BFS on this graph starting from node 16. But my code is giving erroneous output. Can you please help me out. Thanks.
visited_nodes = set()
queue = [16]
pardaught = dict()
exclu = list()
path = set()
for node in queue:
    path.add(node)
    neighbors = G.neighbors(node)
    visited_nodes.add(node)
    queue.remove(node)
    queue.extend([n for n in neighbors if n not in visited_nodes])
newG = G.subgraph(path)
nx.draw(newG, with_labels=True)
My output is:
Output
The cause of your problem is that you are removing things from (the start of) queue while looping through it. As it loops it steps ahead, but because the element is removed from the start, the list "steps" one in the opposite direction. The net result is that it appears to jump 2 at a time. Here's an example:
integer_list = [1,2,3]
next_int = 4
for integer in integer_list:
    print integer
    integer_list.remove(integer)
    integer_list.append(next_int)
    next_int += 1
Produces output
1
3
5
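path should be a list, not a set, since a set has no order. Also mark a node as visited when it is added to the queue, so that no node is queued (and added to path) twice.
This should work:
visited_nodes = {16}
path = []
queue = [16]
while queue:
    node = queue.pop(0)
    path.append(node)
    for neighbor in G.neighbors(node):
        if neighbor in visited_nodes:
            continue
        visited_nodes.add(neighbor)  # mark on enqueue, not on dequeue
        queue.append(neighbor)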
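Since `list.pop(0)` shifts every remaining element, a `collections.deque` is the usual container for a BFS queue. Below is a minimal sketch of the same traversal on a plain dict; the function name `bfs_order` and the small adjacency dict are made up for illustration and are not the graph from the question (with networkx you would iterate over `G.neighbors(node)` instead).

```python
from collections import deque


def bfs_order(adjacency, start):
    """Return the nodes reachable from start in breadth-first order.

    adjacency is assumed to be a dict mapping a node to a list of neighbours.
    """
    visited = {start}          # nodes already queued or processed
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft() # O(1), unlike list.pop(0)
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical undirected graph, written with both directions of every edge
example = {16: [1, 2], 1: [16, 3], 2: [16, 3], 3: [1, 2]}
print(bfs_order(example, 16))
```

The loop never modifies a container while a `for` statement is iterating over it, which is what caused the skipped elements in the original code.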

How do I add finishing times for iterative depth-first search?

I'm trying to create depth-first algorithm that assigns finishing times (the time when a vertex can no longer be expanded) which are used for things like Kosaraju's algorithm. I was able to create a recursive version of DFS fairly easily, but I'm having a hard time converting it to an iterative version.
I'm using an adjacency list to represent the graph: a dict of vertices. For example, the input graph {1: [0, 4], 2: [1, 5], 3: [1], 4: [1, 3], 5: [2, 4], 6: [3, 4, 7, 8], 7: [5, 6], 8: [9], 9: [6, 11], 10: [9], 11: [10]} represents edges (1,0), (1,4), (2,1), (2,5), etc. The following is the implementation of an iterative DFS that uses a simple stack (LIFO), but it doesn't compute finishing times. One of the key problems I faced was that since the vertices are popped, there is no way for the algorithm to trace back its path once a vertex has been fully expanded (unlike in recursion). How do I fix this?
def dfs(graph, vertex, finish, explored):
    global count
    stack = []
    stack.append(vertex)
    while len(stack) != 0:
        vertex = stack.pop()
        if explored[vertex] == False:
            explored[vertex] = True
            #add all outgoing edges to stack:
            if vertex in graph: #check if key exists in hash -- since not all vertices have outgoing edges
                for v in graph[vertex]:
                    stack.append(v)
            #this doesn't assign finishing times, it assigns the times when vertices are discovered:
            #finish[count] = vertex
            #count += 1
N.b. there is also an outer loop that complements DFS -- though, I don't think the problem lies there:
#outer loop:
for vertex in range(size, 0, -1):
    if explored[vertex] == False:
        dfs(hGraph, vertex, finish, explored)
Think of your stack as a stack of tasks, not vertices. There are two types of task you need to do. You need to expand vertexes, and you need to add finishing times.
When you go to expand a vertex, you first add the task of computing a finishing time, then add expanding every child vertex.
When you go to add a finishing time, you can do so knowing that expansion finished.
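A minimal sketch of this "stack of tasks" idea follows. The task labels "expand" and "finish", the function name `dfs_finish_times`, and the use of a set for explored and a list for finish are assumptions made for the example, not part of the question's code.

```python
def dfs_finish_times(graph, start, explored, finish):
    """Iterative DFS whose stack holds tasks instead of bare vertices.

    graph: dict mapping a vertex to a list of neighbours (vertices without
           outgoing edges may be missing from the dict).
    explored: set of vertices already expanded, shared across calls.
    finish: list that receives vertices in the order they finish.
    """
    stack = [("expand", start)]
    while stack:
        task, vertex = stack.pop()
        if task == "finish":
            # Everything pushed after this task has been processed,
            # so the expansion of vertex is complete
            finish.append(vertex)
        elif vertex not in explored:
            explored.add(vertex)
            stack.append(("finish", vertex))    # runs after all children
            for child in graph.get(vertex, []):
                if child not in explored:
                    stack.append(("expand", child))


explored, finish = set(), []
dfs_finish_times({1: [2, 3], 2: [3], 3: [1]}, 1, explored, finish)
print(finish)
```

The position of a vertex in finish acts as its finishing time. Each vertex is pushed at most once per incoming edge plus once for its finish task, so the running time stays linear in the number of vertices and edges.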
Here is a working solution that uses two stacks during the iterative subroutine. The array traceBack holds the vertices that have been explored and is associated with complementary 2D-array, stack, that holds arrays of edges that have yet to be explored. These two arrays are linked; when we add an element to traceBack we also add to stack (same with popping elements).
count = 0
def outerLoop(hGraph, N):
    explored = [False for iii in range(N+1)]
    finish = {}
    for vertex in range(N, -1, -1):
        if explored[vertex] == False:
            dfs(vertex, hGraph, explored, finish)
    return finish
def dfs(vertex, graph, explored, finish):
    global count
    stack = [[]] #stack contains the possible edges to explore
    traceBack = []
    traceBack.append(vertex)
    while len(stack) > 0:
        explored[vertex] = True
        try:
            for n in graph[vertex]:
                if explored[n] == False:
                    if n not in stack[-1]: #to prevent double adding (when we backtrack to previous vertex)
                        stack[-1].append(n)
                else:
                    if n in stack[-1]: #make sure num exists in array before removing
                        stack[-1].remove(n)
        except IndexError: pass
        if len(stack[-1]) == 0: #if current stack is empty then there are no outgoing edges:
            finish[count] = traceBack.pop() #thus, we can add finishing times
            count += 1
            if len(traceBack) > 0: #to prevent popping empty array
                vertex = traceBack[-1]
            stack.pop()
        else:
            vertex = stack[-1][-1] #pick last element in top of stack to be new vertex
            stack.append([])
            traceBack.append(vertex)
Here is one way. We invoke a callback (or record the time) whenever one of the following conditions holds:
The node has no outgoing edge (no way).
The parent of the node being traversed differs from the last parent (parent change). In that case we finish the last parent.
We reach the end of the stack (tree end). We finish the last parent.
Here is the code:
var dfs_with_finishing_time = function(graph, start, cb) {
    var explored = [];
    var parent = [];
    var i = 0;
    for(i = 0; i < graph.length; i++) {
        if(i in explored)
            continue;
        explored[i] = 1;
        var stack = [i];
        parent[i] = -1;
        var last_parent = -1;
        while(stack.length) {
            var u = stack.pop();
            var k = 0;
            var no_way = true;
            for(k = 0; k < graph.length; k++) {
                if(k in explored)
                    continue;
                if(!graph[u][k])
                    continue;
                stack.push(k);
                explored[k] = 1;
                parent[k] = u;
                no_way = false;
            }
            if(no_way) {
                cb(null, u+1); // no way, reversed post-ordering (finishing time)
            }
            if(last_parent != parent[u] && last_parent != -1) {
                cb(null, last_parent+1); // parent change, reversed post-ordering (finishing time)
            }
            last_parent = parent[u];
        }
        if(last_parent != -1) {
            cb(null, last_parent+1); // tree end, reversed post-ordering (finishing time)
        }
    }
}

I can't find the corner cases

The problem specification is in https://www.dropbox.com/s/lmwxcsp3lie0x3n/437.pdf?dl=0
My solution is in http://ideone.com/3JsFCq
name = raw_input()
D = int(raw_input()) #degree of separation
N = int(raw_input()) #number of links
M = int(raw_input()) #book users
users = {}
books = {}
def build_edges(user1, user2):
    if user1 not in users:
        users[user1] = set([user2, ])
    else:
        users[user1].add(user2)
for i in xrange(N):
    nw = raw_input()
    us = nw.split('|')
    build_edges(us[0], us[1])
    build_edges(us[1], us[0])
def build_booklist(user1, book):
    if user1 not in books:
        users[user1] = []
    else:
        users[user1].append(user2)
for i in xrange(M):
    bk = raw_input().split('|')
    books[bk[0]] = []
    for book in bk[1:]:
        books[bk[0]].append(book)
rec = []
depth = [0,]
def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            for book in books[vertex]:
                if book not in books[start]:
                    rec.append(book)
            queue.extend(graph[vertex] - visited)
            depth[0] += 1
            if depth[0] > D:
                return
    return visited
bfs(users, name)
print len(rec)
I couldn't find the corner cases.
It passes the example case, but it doesn't pass some others.
What is going wrong?
Your problem is that you are increasing the depth every time you process a vertex. Instead you need to store a depth for every vertex, and stop when you encounter a vertex with a depth larger than the given limit.
For example, if Alice has two friends, Bob and Carl, then as you process Alice, you will set depth to 1. Then as you process Bob, you will set it to two, and stop, before you process Carl, who is within distance one from Alice. Instead, you should set Alice's depth to 0, then as you add Bob and Carl to the queue, set their depths to 1, and process them. As you process them, you add their friends, whom you have not seen yet, with depth 2, and as soon as you encounter any of them in the main loop (pop from the queue), you stop.
UPDATE: also, add the first vertex to the visited set, when you initialize it. Otherwise you will process it as a vertex with depth two (you will add Alice's friend Bob with depth 1, and then Alice as Bob's friend with distance two). It doesn't hurt in this particular problem, but might be a problem if you make a similar mistake in a solution for some other BFS problem.
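The per-vertex depth bookkeeping described above can be sketched like this. The function name `within_distance` and the sample friendship graph are hypothetical; as a small variation on the description, vertices beyond the limit are never put on the queue at all, which has the same effect as stopping when one is popped.

```python
from collections import deque


def within_distance(graph, start, max_depth):
    """Return {vertex: depth} for every vertex at most max_depth links from start.

    graph is assumed to be a dict mapping a user to a collection of friends.
    """
    depth = {start: 0}         # doubles as the visited set; start is in it from the beginning
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:
            continue           # friends of this vertex would exceed the limit
        for friend in graph.get(vertex, ()):
            if friend not in depth:
                depth[friend] = depth[vertex] + 1
                queue.append(friend)
    return depth


# Hypothetical friendship graph
friends = {
    "Alice": ["Bob", "Carl"],
    "Bob": ["Alice", "Dave"],
    "Carl": ["Alice"],
    "Dave": ["Bob", "Eve"],
    "Eve": ["Dave"],
}
print(within_distance(friends, "Alice", 1))
```

With a limit of 1 both Bob and Carl are reached, whereas a single global counter would have stopped after processing Bob. The recommendations could then be collected from the books of every user in the returned dict.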
