Faster way to add dummy nodes in networkx to limit degree - python

I am wondering if I can speed up my operation of limiting node degree using a built-in function.
A submodule of my task requires me to limit the in-degree of every node to 2. My solution introduces sequential dummy nodes that absorb the extra edges; the last dummy is then connected to the children of the original node. To be specific, if an original node 2 is split into 3 nodes (the original node 2 and two dummy nodes), ALL the properties of the graph should be maintained when we analyse the graph by packaging 2 and its dummies into one hypothetical node 2'. The function I wrote is shown below:
import networkx as nx

def split_merging(G, dummy_counter):
    """
    Args:
        G: as the name suggests
        dummy_counter: as the name suggests
    Returns:
        G with each merging node > 2 incoming split into several consecutive nodes
        and dummy_counter
    """
    # we need two copies; one to ensure the sanctity of the input G
    # and second, to ensure that while we change the Graph in the loop,
    # the loop doesn't go crazy due to changing bounds
    G_copy = nx.DiGraph(G)
    G_copy_2 = nx.DiGraph(G)
    for node in G_copy.nodes:
        in_deg = G_copy.in_degree[node]
        if in_deg > 2:  # node must be split for incoming
            new_nodes = ["dummy" + str(i) for i in range(dummy_counter, dummy_counter + in_deg - 2)]
            dummy_counter = dummy_counter + in_deg - 2
            upstreams = [i for i in G_copy_2.predecessors(node)]
            downstreams = [i for i in G_copy_2.successors(node)]
            for up in upstreams:
                G_copy_2.remove_edge(up, node)
            for down in downstreams:
                G_copy_2.remove_edge(node, down)
            prev_node = node
            G_copy_2.add_edge(upstreams[0], prev_node)
            G_copy_2.add_edge(upstreams[1], prev_node)
            for i in range(2, len(upstreams)):
                G_copy_2.add_edge(prev_node, new_nodes[i - 2])
                G_copy_2.add_edge(upstreams[i], new_nodes[i - 2])
                prev_node = new_nodes[i - 2]
            for down in downstreams:
                G_copy_2.add_edge(prev_node, down)
    return G_copy_2, dummy_counter
For clarification, the input and output are shown below:
Input: (figure not reproduced)
Output: (figure not reproduced)
It works as expected. But the problem is that this is very slow for larger graphs. Is there a way to speed this up using some inbuilt function from networkx or any other library?

Sure; the idea is similar to balancing a B-tree. If a node has too many in-neighbors, create two new children, and split up all your in-neighbors among those children. The children have out-degree 1 and point to your original node; you may need to recursively split them as well.
This is as balanced as possible: node n becomes a complete binary tree rooted at node n, with external in-neighbors at the leaves only, and external out-neighbors at the root.
import networkx as nx

dummy_counter = 0  # module-level counter used to name the dummy nodes

def recursive_split_node(G: 'nx.DiGraph', node, max_in_degree: int = 2):
    """Given a possibly overfull node, create a minimal complete
    binary tree rooted at that node with no overfull nodes.
    Return the new graph."""
    global dummy_counter
    current_in_degree = G.in_degree[node]
    if current_in_degree <= max_in_degree:
        return G
    # Complete binary tree, so left gets 1 more descendant if tied
    left_child_in_degree = (current_in_degree + 1) // 2
    left_child = "dummy" + str(dummy_counter)
    right_child = "dummy" + str(dummy_counter + 1)
    dummy_counter += 2
    G.add_node(left_child)
    G.add_node(right_child)
    old_predecessors = list(G.predecessors(node))
    # Give all predecessors to left and right children
    G.add_edges_from([(y, left_child)
                      for y in old_predecessors[:left_child_in_degree]])
    G.add_edges_from([(y, right_child)
                      for y in old_predecessors[left_child_in_degree:]])
    # Remove all incoming edges
    G.remove_edges_from([(y, node) for y in old_predecessors])
    # Connect children to me
    G.add_edge(left_child, node)
    G.add_edge(right_child, node)
    # Split children
    G = recursive_split_node(G, left_child, max_in_degree)
    G = recursive_split_node(G, right_child, max_in_degree)
    return G

def clean_graph(G: 'nx.DiGraph', max_in_degree: int = 2) -> 'nx.DiGraph':
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed our limit."""
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > max_in_degree:
            G_copy = recursive_split_node(G_copy, node, max_in_degree)
    return G_copy
This code for recursively splitting nodes is quite handy and easily generalized, and intentionally left unoptimized.
To solve your exact use case, you could go with an iterative solution: build a full, complete binary tree (with the same structure as a heap) implicitly as an array. This is, I believe, the theoretically optimal solution to the problem, in terms of minimizing the number of graph operations (new nodes, new edges, deleting edges) to achieve the constraint, and gives the same graph as the recursive solution.
def clean_graph(G):
    """Return a copy of our original graph, with nodes added to ensure
    the max in degree does not exceed 2."""
    global dummy_counter
    G_copy = nx.DiGraph(G)
    for node in G.nodes:
        if G_copy.in_degree[node] > 2:
            predecessors_list = list(G_copy.predecessors(node))
            G_copy.remove_edges_from((y, node) for y in predecessors_list)
            N = len(predecessors_list)
            leaf_count = (N + 1) // 2
            # A binary tree with leaf_count leaves has leaf_count - 1 internal
            # nodes; with that count every leaf in the array has a parent.
            internal_count = leaf_count - 1
            total_nodes = leaf_count + internal_count
            # Heap layout: node_names[0] is the original node,
            # and the children of index i sit at 2*i + 1 and 2*i + 2.
            node_names = [node]
            node_names.extend(("dummy" + str(dummy_counter + i) for i in range(total_nodes - 1)))
            dummy_counter += total_nodes - 1
            for i in range(internal_count):
                G_copy.add_edges_from(((node_names[2 * i + 1], node_names[i]), (node_names[2 * i + 2], node_names[i])))
            # Each leaf absorbs up to two of the original predecessors
            for leaf in range(internal_count, internal_count + leaf_count):
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
                G_copy.add_edge(predecessors_list.pop(), node_names[leaf])
                if not predecessors_list:
                    break
    return G_copy
From my testing, comparing performance on very dense graphs generated with nx.fast_gnp_random_graph(500, 0.3, directed=True), this is 2.75x faster than the recursive solution, and 1.75x faster than the original posted solution. The bottleneck for further optimizations is networkx and Python, or changing the input graphs to be less dense.

Related

Find the width of tree at each level/height (non-binary tree)

Dear experienced friends, I am looking for an algorithm (Python) that outputs the width of a tree at each level. Here are the input and expected outputs.
(I have updated the problem with a more complex edge list. The original question with a sorted edge list can be elegantly solved by @Samwise's answer.)
Input (Edge List: source-->target)
[[11,1],[11,2],
[10,11],[10,22],[10,33],
[33,3],[33,4],[33,5],[33,6]]
The tree graph looks like this:
         10
       /  |  \
     11   22   33
    /  \      / | \ \
   1    2    3  4  5  6
Expected Output (Width of each level/height)
[1,3,6] # according to the width of level 0,1,2
I have looked through the web. It seems this topic is related to BFS and level-order traversal. However, most solutions are based on binary trees. How can I solve the problem when the tree is not binary (e.g. the above case)?
(I'm new to the algorithm, and any references would be really appreciated. Thank you!)
Build a dictionary of the "level" of each node, and then count the number of nodes at each level:
>>> from collections import Counter
>>> def tree_width(edges):
...     levels = {}  # {node: level}
...     for [p, c] in edges:
...         levels[c] = levels.setdefault(p, 0) + 1
...     widths = Counter(levels.values())  # {level: width}
...     return [widths[level] for level in sorted(widths)]
...
>>> tree_width([[0,1],[0,2],[0,3],
...             [1,4],[1,5],
...             [3,6],[3,7],[3,8],[3,9]])
[1, 3, 6]
This might not be the most efficient, but it requires only two scans over the edge list, so it's optimal up to a constant factor. It places no requirement on the order of the edges in the edge list, but does insist that each edge be (source, dest). Also, it doesn't check that the edge list describes a connected tree (or a tree at all; if the edge list is cyclic, the program will never terminate).
from collections import defaultdict

# Turn the edge list into a (non-binary) tree, represented as a
# dictionary whose keys are the source nodes with the list of children
# as its value.
def edge_list_to_tree(edges):
    '''Given a list of (source, dest) pairs, constructs a tree.
    Returns a tuple (tree, root) where root is the root node
    and tree is a dict which maps each node to a list of its children.
    (Leaves are not present as keys in the dictionary.)
    '''
    tree = defaultdict(list)
    sources = set()  # nodes used as sources
    dests = set()    # nodes used as destinations
    for source, dest in edges:
        tree[source].append(dest)
        sources.add(source)
        dests.add(dest)
    roots = sources - dests      # Source nodes which are not destinations
    assert(len(roots) == 1)      # There is only one in a tree
    tree.default_factory = None  # Defang the defaultdict
    return tree, roots.pop()

# A simple breadth-first-search, keeping the count of nodes at each level.
def level_widths(tree, root):
    '''Does a BFS of tree starting at root counting nodes at each level.
    Returns a list of counts.
    '''
    widths = []      # Widths of the levels
    fringe = [root]  # List of nodes at current level
    while fringe:
        widths.append(len(fringe))
        kids = []    # List of nodes at next level
        for parent in fringe:
            if parent in tree:
                for kid in tree[parent]:
                    kids.append(kid)
        fringe = kids  # For next iteration, use this level's kids
    return widths

# Put the two pieces together.
def tree_width(edges):
    return level_widths(*edge_list_to_tree(edges))
A possible solution based on breadth-first traversal (JavaScript):
In a plain breadth-first traversal we add each node to the array; in this solution we wrap the node in an object together with its level and add that object to the array.
function levelWidth(root) {
  const counter = [];
  const traverseBF = fn => {
    const arr = [{n: root, l: 0}];
    const pushToArr = l => n => arr.push({n, l});
    while (arr.length) {
      const node = arr.shift();
      node.n.children.forEach(pushToArr(node.l + 1));
      fn(node);
    }
  };
  traverseBF(node => {
    counter[node.l] = (+counter[node.l] || 0) + 1;
  });
  return counter;
}

How to find levels of each node in a graph where the root node is in the last level?

I have the following python code to construct a graph and find the level of each node of the graph(DAG in my case):
import queue

# A class to represent a graph object
class Graph:
    # Constructor to construct a graph
    def __init__(self, edges, n):
        # A list of lists to represent an adjacency list
        self.adjList = [None] * n
        # allocate memory for the adjacency list
        for i in range(n):
            self.adjList[i] = []
        # add edges to the directed graph
        for (src, dest, weight) in edges:
            # allocate node in adjacency list from src to dest
            self.adjList[src].append((dest, weight))

# Function to print adjacency list representation of a graph
def printGraph(graph,n):
    for src in range(len(graph.adjList)):
        # print current vertex and all its neighboring vertices
        for (dest, weight) in graph.adjList[src]:
            new_graph[src].append(dest)
            print(f'({src} —> {dest}, {weight}) ', end='')
        print()

# function to determine level of
# each node starting from x using BFS
def getLevels(graph, V, x):
    level = [None] * V
    # array to store level of each node
    marked = [False] * V
    # create a queue
    que = queue.Queue()
    # enqueue element x
    que.put(x)
    # initialize level of source
    # node to 1
    level[x] = 1
    # marked it as visited
    marked[x] = True
    # do until queue is empty
    while (not que.empty()):
        # get the first element of queue
        x = que.get()
        # traverse neighbors of node x
        for i in range(len(graph[x])):
            # b is neighbor of node x
            b = graph[x][i]
            # if b is not marked already
            if (not marked[b]):
                # enqueue b in queue
                que.put(b)
                # level of b is level of x + 1
                level[b] = level[x] + 1
                # mark b
                marked[b] = True
    # display all nodes and their levels
    print("Nodes", " ", "Level")
    for i in range(V):
        print(" ",i, " --> ", level[i])
    return level

# construct a graph from a given list of edges
graph = Graph(data.edges, data.tasks)
new_graph = [[] for i in range(data.tasks)]
# print adjacency list representation of the graph
printGraph(graph,data.tasks)
parents = getParents(graph, data.tasks)
level = getLevels(new_graph, data.tasks, 0)
print(parents)
The code works fine for a graph in which the root node is in level 1 such as the graph shown below:
But for a graph that starts from the bottom, in which the root node is in the last level as shown in the figure below, my code shows the level of each node as NONE.
I have been struggling to level the graph as shown in the second picture. Any help would be appreciated!
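One possible reading of the symptom, offered as a sketch rather than a diagnosis: if every edge points toward node 0, node 0 has no outgoing edges, so a BFS that follows edges forward never leaves it and all other levels stay None. Under that assumption, running the BFS over the reversed edges gives each node its distance from the bottom node. The function name levels_from_sink and the sample edge list are hypothetical; edges are (src, dest, weight) triples as in the code above.

```python
from collections import deque

def levels_from_sink(edges, n, sink):
    """BFS over reversed edges: the sink gets level 1, its parents level 2, ..."""
    reverse_adj = [[] for _ in range(n)]
    for src, dest, _weight in edges:
        reverse_adj[dest].append(src)  # walk each edge backwards
    level = [None] * n
    level[sink] = 1
    pending = deque([sink])
    while pending:
        node = pending.popleft()
        for parent in reverse_adj[node]:
            if level[parent] is None:  # first visit fixes the level
                level[parent] = level[node] + 1
                pending.append(parent)
    return level

# Hypothetical DAG whose edges all lead down to node 0
sample_edges = [(1, 0, 5), (2, 0, 3), (3, 1, 2), (3, 2, 4)]
print(levels_from_sink(sample_edges, 4, 0))  # [1, 2, 2, 3]
```

Nodes that cannot reach the sink keep the level None, which makes it easy to spot edges stored in the unexpected direction.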

Is the given Graph a tree? Faster than below approach -

I was given a question during an interview and although my answer was accepted at the end they wanted a faster approach and I went blank..
Question :
Given an undirected graph, can you see if it's a tree? If so, return true and false otherwise.
A tree:
A - B
    |
    C - D
not a tree:
    A
   / \
  B - C
 /
D
You'll be given two parameters: n for number of nodes, and a multidimensional array of edges like such: [[1, 2], [2, 3]], each pair representing the vertices connected by the edge.
Note: Expected space complexity: O(|V|)
The array edges can be empty.
Here is my code (105 ms):
def is_graph_tree(n, edges):
    nodes = [None] * (n + 1)
    for i in range(1, n+1):
        nodes[i] = i
    for i in range(len(edges)):
        start_edge = edges[i][0]
        dest_edge = edges[i][1]
        if nodes[start_edge] != start_edge:
            start_edge = nodes[start_edge]
        if nodes[dest_edge] != dest_edge:
            dest_edge = nodes[dest_edge]
        if start_edge == dest_edge:
            return False
        nodes[start_edge] = dest_edge
    return len(edges) <= n - 1
Here's one approach using a disjoint-set-union / union-find data structure:
def is_graph_tree(n, edges):
    parent = list(range(n+1))
    size = [1] * (n + 1)
    for x, y in edges:
        # find x (path splitting); parent[x] must be assigned before x,
        # because targets are assigned left to right
        while parent[x] != x:
            parent[x], x = parent[parent[x]], parent[x]
        # find y
        while parent[y] != y:
            parent[y], y = parent[parent[y]], parent[y]
        if x == y:
            # Already connected
            return False
        # Union (by size)
        if size[x] < size[y]:
            x, y = y, x
        parent[y] = x
        size[x] += size[y]
    return True

assert not is_graph_tree(4, [(1, 2), (2, 3), (3, 4), (4, 2)])
assert is_graph_tree(6, [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6)])
The runtime is O(V + E*InverseAckermannFunction(V)), which is better than O(V + E * log(log V)), so it's basically O(V + E).
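A union-find pass on its own only detects cycles, so an acyclic but disconnected edge list (a forest) would still be accepted. The sketch below is one way to close that gap, assuming nodes are labelled 1..n as in the question: an acyclic graph with exactly n - 1 edges must be connected, so checking the edge count up front is enough. The name is_tree_union_find is hypothetical, and union by size is left out to keep the example short.

```python
def is_tree_union_find(n, edges):
    """Tree test: exactly n - 1 edges and no edge closing a cycle."""
    if len(edges) != n - 1:
        return False  # too few edges (disconnected) or too many (cycle)
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # a and b were already connected
        parent[root_a] = root_b
    return True

print(is_tree_union_find(6, [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6)]))  # True
print(is_tree_union_find(4, [(1, 2), (3, 4)]))                          # False
```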
Tim Roberts has posted a candidate solution, but this will work in the case of disconnected subtrees:
import queue

def is_graph_tree(n, edges):
    # Note: this version indexes vertices from 0 to n - 1.
    # A tree with n nodes has n - 1 edges.
    if len(edges) != n - 1:
        return False
    # Construct graph.
    graph = [[] for _ in range(n)]
    for first_vertex, second_vertex in edges:
        graph[first_vertex].append(second_vertex)
        graph[second_vertex].append(first_vertex)
    # BFS to find edges that create cycles.
    # The graph is undirected, so we can root the tree wherever we want.
    visited = set()
    q = queue.Queue()
    q.put((0, None))
    while not q.empty():
        current_node, previous_node = q.get()
        if current_node in visited:
            return False
        visited.add(current_node)
        for neighbor in graph[current_node]:
            if neighbor != previous_node:
                q.put((neighbor, current_node))
    # Only return true if the graph has only one connected component.
    return len(visited) == n
This runs in O(n + len(edges)) time.
You could approach this from the perspective of tree leaves. Every leaf node in a tree will have exactly one edge connected to it. So, if you count the number of edges for each node, you can get the list of leaves (i.e. the ones with only one edge).
Then, take the nodes linked to these leaves and reduce their edge count by one (as if you were removing all the leaves from the tree). That will give you a new set of leaves corresponding to the parents of the original leaves. Repeat the process until you have no more leaves.
[EDIT] Checking that the number of edges is N-1 eliminates the need to do the multi-root check, because there will be another discrepancy (e.g. double link, missing node) in the graph if there are multiple 'roots' or a disconnected subtree.
If the graph is a tree, this process should eliminate all nodes from the node counts (i.e. they will all be flagged as leaves at some point).
Using the Counter class (from collections) will make this relatively easy to implement:
from collections import Counter
def isTree(N,E):
    if N==1 and not E: return True                      # root only is a tree
    if len(E) != N-1: return False                      # a tree has N-1 edges
    counts = Counter(n for ab in E for n in ab)         # edge counts per node
    if len(counts) != N : return False                  # unlinked nodes
    while True:
        leaves = {n for n,c in counts.items() if c==1}  # new leaves
        if not leaves:break
        for a,b in E:                                   # subtract leaf counts
            if counts[a]>1 and b in leaves: counts[a] -= 1
            if counts[b]>1 and a in leaves: counts[b] -= 1
        for n in leaves: counts[n] = -1                 # flag leaves in counts
    return all(c==-1 for c in counts.values())          # all must become leaves
output:
G = [[1,2],[1,3],[4,5],[4,6]]
print(isTree(6,G)) # False (disconnected sub-tree)
G = [[1,2],[1,3],[1,4],[2,3],[5,6]]
print(isTree(6,G)) # False (doubly linked node 3)
G = [[1,2],[2,6],[3,4],[5,1],[2,3]]
print(isTree(6,G)) # True
G = [[1,2],[2,3]]
print(isTree(3,G)) # True
G = [[1,2],[2,3],[3,4]]
print(isTree(4,G)) # True
G = [[1,2],[1,3],[2,5],[2,4]]
print(isTree(6,G)) # False (missing node)
Space complexity is O(N) because the counts dictionary has one entry per node (vertex) with an integer as value. Time complexity will be O(E×L), where E is the number of edges and L is the number of levels in the tree. The worst-case time is O(E^2), for a tree where all parents have only one child node. However, since the initial condition is for E to be less than V, the worst case will actually be O(V^2).
Note that this algorithm makes no assumption on edge order or numerical relationships between node numbers. The root (last node to be made a leaf) found by this algorithm is not necessarily the only possible root given that, unless the nodes have an implicit cardinality relationship (or edges have an order), there could be ambiguous scenarios:
[1,2],[2,3],[2,4] could be:
1            2            3
|_2     OR   |_1     OR   |_2
  |_3        |_3            |_1
  |_4        |_4            |_4
If a cardinality relationship between node numbers or an order of edges can be relied upon, the algorithm could potentially be made more time efficient (because we could easily determine which node is the root and start from there).
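Even without any ordering assumption, rescanning the whole edge list on every round can be avoided by keeping adjacency lists and a queue of current leaves, so that each edge is handled a constant number of times. The following is a sketch of that variant, not the code benchmarked in this answer; the name is_tree_by_peeling is hypothetical. Because the edge count is checked first, the adjacency lists hold at most 2(N-1) entries and the space stays O(V).

```python
from collections import defaultdict, deque

def is_tree_by_peeling(n, edges):
    """Repeatedly strip leaves; a tree is consumed completely."""
    if n == 1 and not edges:
        return True                      # a lone root is a tree
    if len(edges) != n - 1:
        return False                     # a tree has n - 1 edges
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    if len(adj) != n:
        return False                     # some node has no edge at all
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    leaves = deque(v for v, d in degree.items() if d == 1)
    removed = set()
    while leaves:
        leaf = leaves.popleft()
        removed.add(leaf)
        for nbr in adj[leaf]:
            if nbr not in removed:
                degree[nbr] -= 1
                if degree[nbr] == 1:     # nbr has just become a leaf
                    leaves.append(nbr)
    return len(removed) == n             # nodes on a cycle are never removed

print(is_tree_by_peeling(6, [[1, 2], [2, 6], [3, 4], [5, 1], [2, 3]]))  # True
print(is_tree_by_peeling(6, [[1, 2], [1, 3], [1, 4], [2, 3], [5, 6]]))  # False
```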
[EDIT2] Alternative method using groups.
When the number of edges is N-1, if the graph is a tree, all nodes should be reachable from any other node. This means that, if we form groups of reachable nodes for each node and merge them together based on the edges, we should end up with a single group after going through all the edges.
Here is the modified function based on that approach:
def isTree(N,E):
    if N==1 and not E: return True                  # root only is a tree
    if len(E) != N-1: return False                  # a tree has N-1 edges
    groups = {n:[n] for ab in E for n in ab}        # each node in its own group
    if len(groups) != N : return False              # unlinked nodes
    for a,b in E:
        groups[a].extend(groups[b])                 # merge groups
        for n in groups[b]: groups[n] = groups[a]   # update nodes' groups
    return len(set(map(id,groups.values()))) == 1   # only one group when done
Given that we start out with fewer edges than nodes and that group merging will consume at most 2x a group size (so also < N), the space complexity will remain O(V). The time complexity will also be O(V^2) in the worst-case scenarios.
You don't even need to know how many edges there are:
def is_graph_tree(n, edges):
    seen = set()
    for a,b in edges:
        b = max(a,b)
        if b in seen:
            return False
        seen.add(b)
    return True

a = [[1,2],[2,3],[3,4]]
print(is_graph_tree(0,a))
b = [[1,2],[1,3],[2,3],[2,4]]
print(is_graph_tree(0,b))
Now, this WON'T catch the case of disconnected subtrees, but that wasn't in the problem description...

Distance map returned from shortest_distance function misses entries of certain vertices

I have a network present in a postgres database, where I can route with the pgrouting extension. I've read this into mem, and now want to calculate the distance of all nodes within 0.1 hours from a specific starting node:
dm = G.new_vp("double", np.inf)
gt.shortest_distance(G, source=nd[102481678], weights=wgts, dist_map = dm, max_dist=0.1)
where wgts is an EdgePropertyMap containing the weights per edge, and nd is a reverse mapping to get vertex index from the outside id.
In pgRouting this delivers 349 reachable nodes, using graph-tool only 328. The results are more or less the same (e.g. the furthest node is the same with the exact same cost, nodes present in both lists have same distance), but the graph-tool distance map just seems to miss certain nodes. The weird thing is that I found a cul-de-sac node labeled with a distance (second one from below), but the node connecting the cul-de-sac with the outside world is missing. Seems weird, because if the connecting node would not be reachable, the cul-de-sac would be unreachable as well.
I've compiled a MWE: https://gofile.io/d/YpgjSw
Below is the python code:
import graph_tool.all as gt
import numpy as np
import time

# construct list of source, target, edge-id (edge-id not really used in this example)
l = []
with open('nw.txt') as f:
    rows = f.readlines()
    for row in rows:
        id = int(row.split('\t')[0])
        source = int(row.split('\t')[1])
        target = int(row.split('\t')[2])
        l.append([source, target, id])
        l.append([target, source, id])
print(len(l))

# construct graph
G = gt.Graph(directed=True)
G.ep["edge_id"] = G.new_edge_property("int")
n = G.add_edge_list(l, hashed=True, eprops=G.ep["edge_id"])

# construct a dict for mapping outside node-id's to internal id's (node indexes)
nd = {}
i = 0
for x in n:
    nd[x] = i
    i = i + 1

# construct a dict for mapping (source, target) combis to a cost and reverse cost
db_wgts = {}
with open('costs.txt') as f:
    rows = f.readlines()
    for row in rows:
        source = int(row.split('\t')[0])
        target = int(row.split('\t')[1])
        cost = float(row.split('\t')[2])
        reverse_cost = float(row.split('\t')[3])
        db_wgts[(source, target)] = cost
        db_wgts[(target, source)] = reverse_cost

# construct an edge property and fill it according to previous dict
wgts = G.new_edge_property("double")
i = 0
for e in G.edges():
    i = i + 1
    print(i)
    print(e)
    s = n[int(e.source())]
    t = n[int(e.target())]
    try:
        wgts[e] = db_wgts[(s, t)]
    except KeyError:
        # this was necessary
        wgts[e] = 1000000

# calculate shortest distance to all nodes within 0.1 total cost from source-node with outside-id of 102481678
dm = G.new_vp("double", np.inf)
gt.shortest_distance(G, source=nd[102481678], weights=wgts, dist_map = dm, max_dist=0.1)

# some mumbo-jumbo for getting the result in a nice node-id: cost format
ar = dm.get_array()
idxs = np.where(dm.get_array() < 0.1)
vals = ar[ar < 0.1]
final_res = [(i, v) for (i,v) in zip(list(idxs[0]), list(vals))]
final_res.sort(key=lambda tup: tup[1])
for x in final_res:
    print(n[x[0]], x[1])
# output saved in result_missing_nodes.txt
# 328 records, should be 349
To illustrate (one of the) missing nodes:
>>> dm[nd[63447311]]
0.0696234786274957
>>> dm[nd[106448775]]
0.06165528930577409
>>> dm[nd[127601733]]
inf
>>> dm[nd[100428293]]
0.0819900275163846
>>>
This doesn't seem possible because this is the local layout of the network, labels are the id's referenced above:
This is a numerical precision problem. You have very low edge weights (1e-6) combined with very large values (1000000), which cause differences to be lost to finite precision. If you replace all values 1000000 (which I assume mean infinite weight) by numpy.inf, you actually get a more stable calculation, and no missing nodes in your example.
An even better alternative is to actually remove the "infinite weight" edges using an edge filter:
u = gt.GraphView(G, efilt=wgts.fa < 1000000)
and compute the distances on that.
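To see the kind of rounding involved, here is a small standalone illustration in plain Python floats. It does not reproduce graph-tool's internal computation; it only shows that once a value near 1e-6 has been added to 1000000, it can no longer be recovered exactly. The variable names are illustrative.

```python
tiny = 1e-6          # order of magnitude of the smallest edge weights
huge = 1_000_000.0   # placeholder "infinite" weight from the question

# Doubles near 1e6 are spaced about 1.2e-10 apart, so the sum is rounded
# to the nearest representable value and part of `tiny` is lost.
recovered = (huge + tiny) - huge

print(recovered == tiny)               # False: the sum was rounded
print(abs(recovered - tiny) / tiny)    # relative error introduced by the sum
```

Using numpy.inf or filtering the edges out, as suggested above, keeps such large placeholder values out of the distance sums altogether.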

kosaraju finding finishing time using iterative dfs

Here is the first part of the code that I have written for Kosaraju's algorithm.
###### reading the data #####
with open('data.txt') as req_file:
    ori_data = []
    for line in req_file:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            ori_data.append(line)

###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])
print(revscc_dic)

######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])
print(scc_dic)

##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
            path.append(v)
            q=revscc_dic[v]+q
print(path)
The code reads the data and forms Grev and G correctly. I have written code for an iterative DFS. How can I extend it to find the finishing times? I understand how to find the finishing times with pen and paper, but I do not understand how to express them in code. Only after this can I proceed to the next part of my code. Please help. Thanks in advance.
The data.txt file contains:
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
please save it as data.txt.
With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
However with iterative dfs, this is not so easy. Now that we are iteratively processing the queue using a while loop we have lost some of the nested structure that is associated with function calls. Or more precisely, we don't know when backtracking occurs. Unfortunately, there is no way to know when backtracking occurs without adding some additional information to our stack of vertices.
The quickest way to add finishing times to your dfs implementation is like so:
##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print(path)
print(finish_time_dic)
The trick used here is that when we pop off v from the stack, if it is the first time we have seen it, then we add it back to the stack again. This is done using: q = [v] + q. We must push v onto the stack before we push on its neighbours (we write the code that pushes v before the for loop that pushes v's neighbours) - or else the trick doesn't work. Eventually we will pop v off the stack again. At this point, v has finished! We have seen v before, so, we go into the else case and compute a fresh finish time.
For the graph provided, finish_time_dic gives the correct finishing times:
{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}
Note that this dfs algorithm (with the finishing times modification) still has O(V+E) complexity, despite the fact that we are pushing each node of the graph onto the stack twice. However, more elegant solutions exist.
I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Question 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternate solution to the problem.
Questions:
5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?
5-7. Write a nonrecursive version of DFS that can deal determine finish-times.
Answers:
5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
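The marker idea from answer 5-7 can be sketched as follows. This is one possible reading of the suggestion, not the book's code: instead of pushing (u, None) and edges (u, v), each stack entry here is a (node, is_marker) pair, and popping a marker means the node's whole subtree has been processed. The function name dfs_finish_times and the small example graph are hypothetical.

```python
def dfs_finish_times(graph, order):
    """graph: dict mapping node -> list of neighbours.
    order: iterable of start nodes, tried in the given sequence.
    Returns a dict mapping node -> finish time (0, 1, 2, ...)."""
    finish = {}
    visited = set()
    time = 0
    for start in order:
        if start in visited:
            continue
        stack = [(start, False)]
        while stack:
            node, is_marker = stack.pop()
            if is_marker:
                # everything pushed after the marker is done: node finishes
                finish[node] = time
                time += 1
                continue
            if node in visited:
                continue  # reached earlier through another neighbour
            visited.add(node)
            stack.append((node, True))  # backtracking marker for node
            # reversed() so neighbours are explored in list order
            for nxt in reversed(graph.get(node, [])):
                if nxt not in visited:
                    stack.append((nxt, False))
    return finish

example = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(dfs_finish_times(example, [1]))  # {4: 0, 2: 1, 3: 2, 1: 3}
```

For the first pass of Kosaraju's algorithm this would be called on the reversed graph with the start nodes in decreasing order. The exact finish times depend on the order in which neighbours are explored, so they need not match the numbers produced by a different traversal order.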
Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0  # time counter

def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]  # create stack with starting vertex
    while stack:  # while stack is not empty
        step += 1
        v = stack[-1]  # peek top of stack
        if v.color:  # if already seen
            v = stack.pop()  # done with this node, pop it from stack
            if v.color == 1:  # if GRAY, finish this node
                v.time_finish = step
                v.color = 2  # BLACK, done
        else:  # seen for first time
            v.color = 1  # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:  # for all neighbor (v, w)
                if not w.color:  # if not seen
                    stack.append(w)
    return total

def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0  # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)
I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out as white. When it's discovered during the search it is marked as gray. When we are done with it, it is marked as black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.
Here is some additional supporting code:
import heapq
from collections import defaultdict

class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []  # children | adjacency list
        self.color = 0  # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None  # DFS
        self.time_finish = None  # DFS

class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)  # dict of Nodes, created on first access
        self.max_heap = []  # nodes in decreasing finish time for SCC

    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of SCC computation, you can build a max heap of Nodes, and use that max heap in dfs_visit() instead of simply the child list.
while g.max_heap:
    v = heapq.heappop(g.max_heap)[1]
    if not v.color:
        size = dfs_visit(g, v)
        scc_size.append(size)
I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is code for my version which has a 1-to-1 mapping with a recursive version of DFS.
n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)

# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True

# Loop through vertices in reverse order
for v in range(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True
            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:
                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break
            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time
Here are the recursive and iterative implementations in java:
int time = 0;

public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}

public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}
First, you should know exactly what the finish time is. In recursive DFS, a node v finishes once all of its adjacent nodes have finished.
With this in mind, you need additional data structures to store that information (pseudocode):
adj[][]                      // graph
visited[] = NULL             // array of visited nodes
finished[] = NULL            // array of finished nodes
Stack st = new Stack         // normal stack
Stack backtrack = new Stack  // additional stack

function getFinishedTime() {
    for (node i in adj) {
        if (!visited.contains(i)) {
            st.push(i);
            visited.add(i)
            while (!st.isEmpty) {
                int j = st.pop();
                int[] unvisitedChild = getUnvisitedChild(j);
                if (unvisitedChild != null) {
                    for (int c in unvisitedChild) {
                        st.push(c);
                        visited.add(c);
                    }
                    // you can store each entry as an array with the parent node j
                    // at the first index, followed by all the unvisited child nodes
                    backtrack.push([j, unvisitedChild]);
                }
                else {
                    finished.add(j);
                    // all of the child nodes are finished, so we can mark the parent node finished
                    while (!backtrack.isEmpty && finished.containsALL(backtrack.peek()))
                    {
                        parent = backtrack.pop()[0];
                        finished.add(parent);
                    }
                }
            }
        }
    }
}

function getUnvisitedChild(int i) {
    unvisitedChild[] = null
    for (int child in adj[i]) {
        if (!visited.contains(child))
            unvisitedChild.add(child);
    }
    return unvisitedChild;
}
and the finished time should be
[5, 2, 8, 3, 6, 9, 1, 4, 7]
