The problem specification is in https://www.dropbox.com/s/lmwxcsp3lie0x3n/437.pdf?dl=0
My solution is in http://ideone.com/3JsFCq
name = raw_input()
D = int(raw_input()) #degree of separation
N = int(raw_input()) #number of links
M = int(raw_input()) #book users
users = {}
books = {}
def build_edges(user1, user2):
    if user1 not in users:
        users[user1] = set([user2, ])
    else:
        users[user1].add(user2)
for i in xrange(N):
    nw = raw_input()
    us = nw.split('|')
    build_edges(us[0], us[1])
    build_edges(us[1], us[0])
def build_booklist(user1, book):
    if user1 not in books:
        users[user1] = []
    else:
        users[user1].append(user2)
for i in xrange(M):
    bk = raw_input().split('|')
    books[bk[0]] = []
    for book in bk[1:]:
        books[bk[0]].append(book)
rec = []
depth = [0,]
def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            for book in books[vertex]:
                if book not in books[start]:
                    rec.append(book)
            queue.extend(graph[vertex] - visited)
            depth[0] += 1
            if depth[0] > D:
                return
    return visited
bfs(users, name)
print len(rec)
I couldn't find the corner cases.
It passes the example case, but it doesn't pass some others.
What is going wrong?
Your problem is that you are increasing the depth every time you process a vertex. Instead, you need to store a depth for every vertex, and stop when you encounter a vertex with a depth larger than the given limit.
For example, if Alice has two friends, Bob and Carl, then as you process Alice, you will set depth to 1. Then as you process Bob, you will set it to two, and stop, before you process Carl, who is within distance one from Alice. Instead, you should set Alice's depth to 0, then as you add Bob and Carl to the queue, set their depths to 1, and process them. As you process them, you add their friends, whom you have not seen yet, with depth 2, and as soon as you encounter any of them in the main loop (pop from the queue), you stop.
UPDATE: also, add the first vertex to the visited set, when you initialize it. Otherwise you will process it as a vertex with depth two (you will add Alice's friend Bob with depth 1, and then Alice as Bob's friend with distance two). It doesn't hurt in this particular problem, but might be a problem if you make a similar mistake in a solution for some other BFS problem.
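The per-vertex depth idea described above can be sketched as follows. This is a minimal illustration, not the asker's full solution: the function name recommend, its parameters, and the choice to return a set are assumptions made for the example, and the exact output format required by the problem specification is not reproduced here.

```python
from collections import deque

def recommend(friends, books, start, max_depth):
    """Collect books owned by users within max_depth links of start.

    friends: dict mapping a user to an iterable of linked users (hypothetical layout)
    books:   dict mapping a user to an iterable of book titles (hypothetical layout)
    """
    depth = {start: 0}               # doubles as the visited set; start is marked up front
    queue = deque([start])
    own = set(books.get(start, ()))
    found = set()
    while queue:
        user = queue.popleft()
        if user != start:
            found.update(b for b in books.get(user, ()) if b not in own)
        if depth[user] == max_depth:
            continue                 # users at the limit are processed but not expanded
        for other in friends.get(user, ()):
            if other not in depth:   # first time seen: record its own depth
                depth[other] = depth[user] + 1
                queue.append(other)
    return found
```

Because the depth is stored per vertex, Bob and Carl both get depth 1 no matter in which order they are dequeued.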
Based on a DFS traversal, I want to classify edges (u, v) in a directed graph as:
Tree edge: when v is visited for the first time as we traverse the edge
Back edge: when v is an ancestor of u in the traversal tree
Forward edge: when v is a descendant of u in the traversal tree
Cross edge: when v is neither an ancestor nor a descendant of u in the traversal tree
I was following a GeeksForGeeks tutorial to write this code:
class Graph:
    def __init__(self, v):
        self.time = 0
        self.traversal_array = []
        self.v = v
        self.graph_list = [[] for _ in range(v)]
    def dfs(self):
        self.visited = [False]*self.v
        self.start_time = [0]*self.v
        self.end_time = [0]*self.v
        self.ff = 0
        self.fc = 0
        for node in range(self.v):
            if not self.visited[node]:
                self.traverse_dfs(node)
    def traverse_dfs(self, node):
        # mark the node visited
        self.visited[node] = True
        # add the node to traversal
        self.traversal_array.append(node)
        # get the starting time
        self.start_time[node] = self.time
        # increment the time by 1
        self.time += 1
        # traverse through the neighbours
        for neighbour in self.graph_list[node]:
            # if a node is not visited
            if not self.visited[neighbour]:
                # marks the edge as tree edge
                print('Tree Edge:', str(node)+'-->'+str(neighbour))
                # dfs from that node
                self.traverse_dfs(neighbour)
            else:
                # when the parent node is traversed after the neighbour node
                if self.start_time[node] > self.start_time[neighbour] and self.end_time[node] < self.end_time[neighbour]:
                    print('Back Edge:', str(node)+'-->'+str(neighbour))
                # when the neighbour node is a descendant but not a part of tree
                elif self.start_time[node] < self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Forward Edge:', str(node)+'-->'+str(neighbour))
                # when parent and neighbour node do not have any ancestor and a descendant relationship between them
                elif self.start_time[node] > self.start_time[neighbour] and self.end_time[node] > self.end_time[neighbour]:
                    print('Cross Edge:', str(node)+'-->'+str(neighbour))
            self.end_time[node] = self.time
            self.time += 1
But it does not output the desired results for the following graph:
which is represented with:
self.v = 3
self.graph_list = [[1, 2], [], [1]]
The above code is not identifying the edge (2, 1) as a cross edge, but as a back edge.
I have no clue what to adapt in my code in order to detect cross edges correctly.
In a discussion someone gave this information, but I couldn't make it work:
The checking condition is wrong when the node has not been completely visited when the edge is classified. This is because in the initial state the start & end times are set to 0.
if the graph looks like this:
0 --> 1
1 --> 2
2 --> 3
3 --> 1
When checking the 3 --> 1 edge: the answer should be a back edge.
But now the start/end [3] = 4/0 ; start/end [1] = 1/0
and the condition end[3] < end[1] is false because of the initialization problem.
I see two solutions,
traverse the graph first and determine the correct start/end [i] values, but it needs more time complexity, or
use black/gray/white and discover the order to classify the edges
Here are some issues:
By initialising start_time and end_time to 0 for each node, you cannot distinguish that default from a real time of 0, which is assigned to the very first node's start time. You should initialise these lists with a value that indicates there was no start/end at all. You could use the value -1 for this purpose.
The following statements should not be inside the loop:
self.end_time[node] = self.time
self.time += 1
They should be executed after the loop has completed. Only at that point you can "end" the visit of the current node. So the indentation of these two statements should be less.
There are several places where the value of self.end_time[node] is compared in a condition, but that time has not been set yet (apart from its default value), so this condition makes little sense.
The last elif should really be an else because there are no other possibilities to cover. If ever the execution gets there, it means no other possibility remains, so no condition should be checked.
The condition self.start_time[node] > self.start_time[neighbour] is not strong enough for identifying a back edge, and as already said, the second part of that if condition makes no sense, since self.end_time[node] has not been given a non-default value yet. And so this if block is entered also when it is not a back edge. What you really want to test here is that the visit of neighbour has not been closed yet. In other words, you should check that self.end_time[neighbour] is still at its default value (and I propose to use -1 for that).
Not a problem, but there are also these remarks to make:
when you keep track of start_time and end_time, there is no need to have visited. Whether a node is visited follows from the value of start_time: if it still has its default value (-1), then the node has not yet been visited.
Don't use code comments to state the obvious. For instance the comment "increment the time by 1" really isn't explaining anything that cannot be seen directly from the code.
Attribute v could use a better name. Although V is often used to denote the set of nodes of a graph, it is not intuitive to see v as the number of nodes in the graph. I would suggest using num_nodes instead. It makes the code more readable.
Here is a correction of your code:
class Graph:
    def __init__(self, num_nodes):
        self.time = 0
        self.traversal_array = []
        self.num_nodes = num_nodes  # Use more descriptive name
        self.graph_list = [[] for _ in range(num_nodes)]
    def dfs(self):
        self.start_time = [-1]*self.num_nodes
        self.end_time = [-1]*self.num_nodes
        for node in range(self.num_nodes):
            if self.start_time[node] == -1:  # No need for self.visited
                self.traverse_dfs(node)
    def traverse_dfs(self, node):
        self.traversal_array.append(node)
        self.start_time[node] = self.time
        self.time += 1
        for neighbour in self.graph_list[node]:
            # when the neighbour was not yet visited
            if self.start_time[neighbour] == -1:
                print(f"Tree Edge: {node}-->{neighbour}")
                self.traverse_dfs(neighbour)
            # otherwise when the neighbour's visit is still ongoing:
            elif self.end_time[neighbour] == -1:
                print(f"Back Edge: {node}-->{neighbour}")
            # otherwise when the neighbour's visit started after the current node's visit:
            elif self.start_time[node] < self.start_time[neighbour]:
                print(f"Forward Edge: {node}-->{neighbour}")
            else:  # No condition here: there are no other options
                print(f"Cross Edge: {node}-->{neighbour}")
        # Indentation corrected:
        self.end_time[node] = self.time
        self.time += 1
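To check the classification rules against the two graphs mentioned in the question, here is a small standalone sketch of the same logic that returns the labels instead of printing them. The function name classify_edges and the tuple format are assumptions made for this illustration; they are not part of the original class.

```python
def classify_edges(adjacency):
    """Return a list of (u, v, kind) for every edge met during a DFS.

    adjacency is a list of neighbour lists, as in graph_list above.
    """
    n = len(adjacency)
    start = [-1] * n   # -1 means "visit not started"
    end = [-1] * n     # -1 means "visit not finished"
    clock = 0
    labels = []

    def visit(node):
        nonlocal clock
        start[node] = clock
        clock += 1
        for nb in adjacency[node]:
            if start[nb] == -1:
                labels.append((node, nb, "tree"))
                visit(nb)
            elif end[nb] == -1:           # neighbour still open: it is an ancestor
                labels.append((node, nb, "back"))
            elif start[node] < start[nb]:  # neighbour started later and already closed
                labels.append((node, nb, "forward"))
            else:
                labels.append((node, nb, "cross"))
        end[node] = clock                  # set only after all neighbours are handled
        clock += 1

    for node in range(n):
        if start[node] == -1:
            visit(node)
    return labels
```

For graph_list = [[1, 2], [], [1]], the edge (2, 1) comes out as a cross edge, and for the cycle 0 --> 1 --> 2 --> 3 --> 1 the edge (3, 1) comes out as a back edge.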
I'm trying to wrap my head around recursive functions with this one function def friends(self, name, degree):. The purpose of this one is to return the set of all friends up to a specified degree (for an address book). It's the last part of a larger class called class SocialAddressBook:. The 'degree' in this class allows the user to 'query' friends of friends: degree one is a direct friend, degree 2 is a friend-of-a-friend, and so on. The code I have is
def friends(self, name, degree):
    fs = set()
    if degree == 0:
        return set()
    if degree == 1:
which is as far as my knowledge on this goes...
also some more context:
Transitive friendship:
Fred → Barb → Jane → Emma → Lisa
Fred → Sue
Jane → Mary
and so my tests are : a.friends('Fred', 1) == {'Barb', 'Sue'}
a.friends('Fred', 2) == {'Barb', 'Jane', 'Sue'}
a.friends('Fred', 3) == {'Mary', 'Barb', 'Jane', 'Sue', 'Emma'}
a.friends('Fred', 4) == {'Barb', 'Emma', 'Mary', 'Lisa', 'Sue', 'Jane'}
It only goes up to degree 4. So should I even do this recursively, or just manually, since I know the degree it goes up to?
If anyone could point me in the right direction on how to complete this recursively, that'd be awesome, thanks!
I would say to do this iteratively: just make it add friends to the current list n times, where n is an input parameter.
fs = {self}
for i in range(n):
    wider = set()
    for chum in fs:
        for new_chum in chum.friend_list:
            wider.add(new_chum)
    fs |= wider
At each level, make a wider set from the friends of the current set. Once you've been through all of those, add them to the friend set. Repeat N times.
It is best to do this iteratively.
def get_friends_iteratively(self, name, degree):
    if degree < 0:
        raise ValueError('degree should be an int >= 0')
    if degree == 0:
        return set()  # no friends of degree 0!
    friends = set(self.friends)  # copy, so the stored set is not modified
    for _ in range(degree - 1):  # the direct friends already cover degree 1
        new_friends = set()
        # it is important not to change a set while we iterate through it.
        # thus, we change new_friends, then update friends when we are done.
        for other_person in friends:
            new_friends |= other_person.friends
        friends |= new_friends
    return friends
# In python, the |= ('in-place or') operator updates a set with
# the union of itself and another set.
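As a self-contained illustration of the iterative approach, here is a sketch that can be run against the tests from the question. It assumes the address book is a plain dict mapping each name to a list of direct friends; that layout and the name friends_up_to are made up for the example, since the real SocialAddressBook class is not shown.

```python
def friends_up_to(book, name, degree):
    """Return everyone reachable from name in at most `degree` friendship steps."""
    found = set()
    frontier = {name}                      # people added in the previous round
    for _ in range(degree):
        # Build the next round separately so no set is changed while iterating over it.
        frontier = {f for person in frontier for f in book.get(person, ())}
        frontier -= found | {name}         # keep only people not seen before
        if not frontier:
            break                          # nothing new: larger degrees add nobody
        found |= frontier
    return found

# Friendships from the question, written as directed links (hypothetical layout).
book = {
    'Fred': ['Barb', 'Sue'],
    'Barb': ['Jane'],
    'Jane': ['Emma', 'Mary'],
    'Emma': ['Lisa'],
}
```

Since the loop runs degree times, the same function covers degree 4 and any larger value without special cases.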
I'm trying to code an algorithm to find the best path from one node s to another node f using breadth-first search. I found the following code and I understand almost all of it, but I don't get what this line is doing:
while frontier: ...
My questions are:
What does frontier mean in terms of the graph?
For how long will the while condition be true, and what does that depend on?
Here's the code:
Graph_Adjs={'s':['a','x'],'a':['s','z'],'x':['s','d','c'],'z':['a'],'d':['x','c','f'], 'c':['d','x','f','v'],'f':['d','c','v'],'v':['f','c'] }
def BFS (Graph_Adjs,s='s'):
    level = {'s':0}
    father = {'s':None}
    i = 1
    frontier = [s]
    while frontier:
        next = []
        for u in frontier:
            for v in Graph_Adjs[u]:
                if v not in level:
                    level[v] = i
                    father[v] = u
                    next.append(v)
        frontier = next
        i+=1
    return father
if __name__=='__main__':
    father = BFS (Graph_Adjs,s='s')
    f = 'f'
    path = []
    while f !=None:
        path.append(f)
        f = father[f]
    path.reverse()
    print (path)
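In Python, the truth value of a list is True if it's non-empty and False otherwise. Here frontier holds the nodes that were discovered in the previous pass, so while frontier: keeps looping as long as the last pass found at least one new node, and stops once a pass finds nothing new. The check if v not in level is what records visited nodes, so they are not iterated over again.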
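To make the role of the frontier visible, here is a small sketch that records each frontier instead of the father links. The function name bfs_levels is an assumption for this example; the loop structure mirrors the code in the question.

```python
def bfs_levels(adjacency, source):
    """Return the list of frontiers, one per BFS level, starting from source."""
    seen = {source}
    frontier = [source]
    levels = []
    while frontier:                  # false as soon as a pass discovers nothing new
        levels.append(frontier)
        next_frontier = []
        for u in frontier:
            for v in adjacency[u]:
                if v not in seen:    # same role as "if v not in level" in the question
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return levels
```

With Graph_Adjs from the question and source 's', this gives [['s'], ['a', 'x'], ['z', 'd', 'c'], ['f', 'v']], so the loop body runs four times: once per level of the graph.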
Graph
I am trying to perform BFS on this graph starting from node 16. But my code is giving erroneous output. Can you please help me out. Thanks.
visited_nodes = set()
queue = [16]
pardaught = dict()
exclu = list()
path = set()
for node in queue:
    path.add(node)
    neighbors = G.neighbors(node)
    visited_nodes.add(node)
    queue.remove(node)
    queue.extend([n for n in neighbors if n not in visited_nodes])
newG = G.subgraph(path)
nx.draw(newG, with_labels=True)
My output is:
Output
The cause of your problem is that you are removing things from (the start of) queue while looping through it. As it loops it steps ahead, but because the element is removed from the start, the list "steps" one in the opposite direction. The net result is that it appears to jump 2 at a time. Here's an example:
integer_list = [1,2,3]
next_int = 4
for integer in integer_list:
    print integer
    integer_list.remove(integer)
    integer_list.append(next_int)
    next_int += 1
Produces output
1
3
5
path should be a list, not a set, since a set has no order.
That should work:
visited_nodes = {16}
path = []
queue = [16]
while queue:
    node = queue.pop(0)
    path.append(node)
    for neighbor in G.neighbors(node):
        if neighbor in visited_nodes:
            continue
        # mark when queuing, so a node reachable via two neighbors is queued only once
        visited_nodes.add(neighbor)
        queue.append(neighbor)
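The same loop can be tried without networkx. The sketch below assumes the graph is given as a plain dict of neighbor lists (a stand-in for G.neighbors), and uses collections.deque so that removing from the front of the queue does not shift the whole list as list.pop(0) does. The name bfs_order is made up for the example.

```python
from collections import deque

def bfs_order(neighbors, start):
    """Return the nodes reachable from start in breadth-first visiting order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:                     # loop on the queue itself, never "for node in queue"
        node = queue.popleft()
        order.append(node)
        for nb in neighbors[node]:
            if nb not in visited:
                visited.add(nb)      # mark on enqueue to avoid duplicates in the queue
                queue.append(nb)
    return order
```

If a subgraph is needed afterwards, the returned list can be passed to G.subgraph in the same way path is in the question.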
Here is the first part of the code that I have written for Kosaraju's algorithm.
###### reading the data #####
with open('data.txt') as req_file:
    ori_data = []
    for line in req_file:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            ori_data.append(line)
###### forming the Grev ####
revscc_dic = {}
for temp in ori_data:
    if temp[1] not in revscc_dic:
        revscc_dic[temp[1]] = [temp[0]]
    else:
        revscc_dic[temp[1]].append(temp[0])
print revscc_dic
######## finding the G#####
scc_dic = {}
for temp in ori_data:
    if temp[0] not in scc_dic:
        scc_dic[temp[0]] = [temp[1]]
    else:
        scc_dic[temp[0]].append(temp[1])
print scc_dic
##### iterative dfs ####
path = []
for i in range(max(max(ori_data)),0,-1):
    start = i
    q=[start]
    while q:
        v=q.pop(0)
        if v not in path:
            path.append(v)
            q=revscc_dic[v]+q
print path
The code reads the data and builds Grev and G correctly. I have written code for an iterative DFS. How can I extend it to compute the finishing times? I understand how to find the finishing times with pen and paper, but I do not understand how to express that in code. I can only proceed with the next part of my code once this works. Please help. Thanks in advance.
The data.txt file contains:
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
please save it as data.txt.
With recursive dfs, it is easy to see when a given vertex has "finished" (i.e. when we have visited all of its children in the dfs tree). The finish time can be calculated just after the recursive call has returned.
However with iterative dfs, this is not so easy. Now that we are iteratively processing the queue using a while loop we have lost some of the nested structure that is associated with function calls. Or more precisely, we don't know when backtracking occurs. Unfortunately, there is no way to know when backtracking occurs without adding some additional information to our stack of vertices.
The quickest way to add finishing times to your dfs implementation is like so:
##### iterative dfs (with finish times) ####
path = []
time = 0
finish_time_dic = {}
for i in range(max(max(ori_data)),0,-1):
    start = i
    q = [start]
    while q:
        v = q.pop(0)
        if v not in path:
            path.append(v)
            q = [v] + q
            for w in revscc_dic[v]:
                if w not in path: q = [w] + q
        else:
            if v not in finish_time_dic:
                finish_time_dic[v] = time
                time += 1
print path
print finish_time_dic
The trick used here is that when we pop off v from the stack, if it is the first time we have seen it, then we add it back to the stack again. This is done using: q = [v] + q. We must push v onto the stack before we push on its neighbours (we write the code that pushes v before the for loop that pushes v's neighbours) - or else the trick doesn't work. Eventually we will pop v off the stack again. At this point, v has finished! We have seen v before, so, we go into the else case and compute a fresh finish time.
For the graph provided, finish_time_dic gives the correct finishing times:
{1: 6, 2: 1, 3: 3, 4: 7, 5: 0, 6: 4, 7: 8, 8: 2, 9: 5}
Note that this dfs algorithm (with the finishing times modification) still has O(V+E) complexity, despite the fact that we are pushing each node of the graph onto the stack twice. However, more elegant solutions exist.
I recommend reading Chapter 5 of Python Algorithms: Mastering Basic Algorithms in the Python Language by Magnus Lie Hetland (ISBN: 1430232374, 9781430232377). Question 5-6 and 5-7 (on page 122) describe your problem exactly. The author answers these questions and gives an alternate solution to the problem.
Questions:
5-6 In recursive DFS, backtracking occurs when you return from one of the recursive calls. But where has the backtracking gone in the iterative version?
5-7 Write a nonrecursive version of DFS that can determine finish-times.
Answers:
5-6 It’s not really represented at all in the iterative version. It just implicitly occurs once you’ve popped off all your “traversal descendants” from the stack.
5-7 As explained in Exercise 5-6, there is no point in the code where backtracking occurs in the iterative DFS, so we can’t just set the finish time at some specific place (like in the recursive one). Instead, we’d need to add a marker to the stack. For example, instead of adding the neighbors of u to the stack, we could add edges of the form (u, v), and before all of them, we’d push (u, None), indicating the backtracking point for u.
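The marker idea from answer 5-7 could look roughly like this. It is a sketch of one possible reading, not the book's own code: the function name finish_times, the dict-of-lists graph layout, and the use of a simple counter as the clock are assumptions made for the example.

```python
def finish_times(graph, roots):
    """Iterative DFS that records a finish time per node using (u, None) markers.

    graph: dict mapping a node to a list of neighbours (nodes must not be None)
    roots: iterable giving the order in which start nodes are tried
    """
    finish, seen, clock = {}, set(), 0
    for root in roots:
        if root in seen:
            continue
        stack = [(None, root)]             # entries are edges (u, v)
        while stack:
            u, v = stack.pop()
            if v is None:                  # marker: everything below u is done
                finish[u] = clock
                clock += 1
            elif v not in seen:
                seen.add(v)
                stack.append((v, None))    # marker goes on before v's edges
                for w in graph.get(v, ()):
                    if w not in seen:
                        stack.append((v, w))
    return finish
```

Run on the reversed graph built from data.txt with roots 9 down to 1, this sketch yields the same finish times as finish_time_dic above, because the last neighbour pushed is also the first one explored.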
Iterative DFS itself is not complicated, as seen from Wikipedia. However, calculating the finish time of each node requires some tweaks to the algorithm. We only pop the node off the stack the 2nd time we encounter it.
Here's my implementation which I feel demonstrates what's going on a bit more clearly:
step = 0  # time counter
def dfs_visit(g, v):
    """Run iterative DFS from node V"""
    global step
    total = 0
    stack = [v]  # create stack with starting vertex
    while stack:  # while stack is not empty
        step += 1
        v = stack[-1]  # peek top of stack
        if v.color:  # if already seen
            v = stack.pop()  # done with this node, pop it from stack
            if v.color == 1:  # if GRAY, finish this node
                v.time_finish = step
                v.color = 2  # BLACK, done
        else:  # seen for first time
            v.color = 1  # GRAY: discovered
            v.time_discover = step
            total += 1
            for w in v.child:  # for all neighbor (v, w)
                if not w.color:  # if not seen
                    stack.append(w)
    return total
def dfs(g):
    """Run DFS on graph"""
    global step
    step = 0  # reset step counter
    for k, v in g.nodes.items():
        if not v.color:
            dfs_visit(g, v)
I am following the conventions of the CLR Algorithm Book and use node coloring to designate its state during the DFS search. I feel this is easier to understand than using a separate list to track node state.
All nodes start out as white. When it's discovered during the search it is marked as gray. When we are done with it, it is marked as black.
Within the while loop, if a node is white we keep it in the stack, and change its color to gray. If it's gray we change its color to black, and set its finish time. If it's black we just ignore it.
It is possible for a node on the stack to be black (even with our coloring check before adding it to the stack). A white node can be added to the stack twice (via two different neighbors). One will eventually turn black. When we reach the 2nd instance on the stack, we need to make sure we don't change its already set finish time.
Here is some additional supporting code:
from collections import defaultdict
import heapq
class Node(object):
    def __init__(self, name=None):
        self.name = name
        self.child = []  # children | adjacency list
        self.color = 0  # 0: white [unvisited], 1: gray [found], 2: black [finished]
        self.time_discover = None  # DFS
        self.time_finish = None  # DFS
class Graph(object):
    def __init__(self):
        self.nodes = defaultdict(Node)  # maps a node key to its Node
        self.max_heap = []  # nodes in decreasing finish time for SCC
    def build_max_heap(self):
        """Build list of nodes in max heap using DFS finish time"""
        for k, v in self.nodes.items():
            self.max_heap.append((0-v.time_finish, v))  # invert finish time for max heap
        heapq.heapify(self.max_heap)
To run DFS on the reverse graph, you can build a parent list similar to the child list for each Node when the edges file is processed, and use the parent list instead of the child list in dfs_visit().
To process Nodes in decreasing finish time for the last part of SCC computation, you can build a max heap of Nodes, and use that max heap in dfs_visit() instead of simply the child list.
scc_size = []  # sizes of the components found so far
while g.max_heap:
    v = heapq.heappop(g.max_heap)[1]
    if not v.color:
        size = dfs_visit(g, v)
        scc_size.append(size)
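Putting the two passes together, a compact end-to-end sketch of Kosaraju's algorithm might look like the following. It does not reuse the Node and Graph classes above; the function name kosaraju, the edge-list input, and the helper dfs with its on_finish callback are all assumptions made to keep the example self-contained. The first pass runs on the reversed graph and the second on the original graph in decreasing finish order, as in the question.

```python
from collections import defaultdict

def kosaraju(edges):
    """Return the strongly connected components of a directed graph as lists of nodes."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    def dfs(adj, root, seen, on_finish):
        # Iterative DFS; each stack entry keeps an iterator over unexplored edges.
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            node, remaining = stack[-1]
            for w in remaining:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:                      # iterator exhausted: node is finished
                stack.pop()
                on_finish(node)

    order, seen = [], set()            # pass 1: finish order on the reversed graph
    for node in sorted(nodes, reverse=True):
        if node not in seen:
            dfs(reverse, node, seen, order.append)

    components, seen = [], set()       # pass 2: original graph, latest finisher first
    for node in reversed(order):
        if node not in seen:
            component = []
            dfs(graph, node, seen, component.append)
            components.append(component)
    return components
```

For the edges listed in data.txt this sketch finds three components of three nodes each: {1, 4, 7}, {2, 5, 8} and {3, 6, 9}.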
I had a few issues with the order produced by Lawson's version of the iterative DFS. Here is code for my version which has a 1-to-1 mapping with a recursive version of DFS.
n = len(graph)
time = 0
finish_times = [0] * (n + 1)
explored = [False] * (n + 1)
# Determine if every vertex connected to v
# has already been explored
def all_explored(G, v):
    if v in G:
        for w in G[v]:
            if not explored[w]:
                return False
    return True
# Loop through vertices in reverse order
for v in xrange(n, 0, -1):
    if not explored[v]:
        stack = [v]
        while stack:
            print(stack)
            v = stack[-1]
            explored[v] = True
            # If v still has outgoing edges to explore
            if not all_explored(graph_reversed, v):
                for w in graph_reversed[v]:
                    # Explore w before others attached to v
                    if not explored[w]:
                        stack.append(w)
                        break
            # We have explored vertices findable from v
            else:
                stack.pop()
                time += 1
                finish_times[v] = time
Here are the recursive and iterative implementations in Java:
int time = 0;
public void dfsRecursive(Vertex vertex) {
    time += 1;
    vertex.setVisited(true);
    vertex.setDiscovered(time);
    for (String neighbour : vertex.getNeighbours()) {
        if (!vertices.get(neighbour).getVisited()) {
            dfsRecursive(vertices.get(neighbour));
        }
    }
    time += 1;
    vertex.setFinished(time);
}
public void dfsIterative(Vertex vertex) {
    Stack<Vertex> stack = new Stack<>();
    stack.push(vertex);
    while (!stack.isEmpty()) {
        Vertex current = stack.pop();
        if (!current.getVisited()) {
            time += 1;
            current.setVisited(true);
            current.setDiscovered(time);
            stack.push(current);
            List<String> currentsNeigbours = current.getNeighbours();
            for (int i = currentsNeigbours.size() - 1; i >= 0; i--) {
                String currentNeigbour = currentsNeigbours.get(i);
                Vertex neighBour = vertices.get(currentNeigbour);
                if (!neighBour.getVisited())
                    stack.push(neighBour);
            }
        } else {
            if (current.getFinished() < 1) {
                time += 1;
                current.setFinished(time);
            }
        }
    }
}
First, you should know exactly what the finish time is. In recursive DFS, a node v is finished once all of its adjacent nodes are finished.
With this in mind, you need an additional data structure to keep track of that information. In pseudocode:
adj[][] //graph
visited[]=NULL //array of visited node
finished[]=NULL //array of finished node
Stack st=new Stack //normal stack
Stack backtrack=new Stack //additional stack
function getFinishedTime(){
    for(node i in adj){
        if (!visited.contains(i)){
            st.push(i);
            visited.add(i)
            while(!st.isEmpty){
                int j=st.pop();
                int[] unvisitedChild= getUnvisitedChild(j);
                if(unvisitedChild!=null){
                    for(int c in unvisitedChild){
                        st.push(c);
                        visited.add(c);
                    }
                    backtrack.push([j,unvisitedChild]); //you can store each entry as array with the first index as the parent node j, followed by all the unvisited child node.
                }
                else{
                    finished.add(j);
                    while(!backtrack.isEmpty&&finished.containsALL(backtrack.peek())) //all of the child node is finished, then we can set the parent node finished
                    {
                        parent=backtrack.pop()[0];
                        finished.add(parent);
                    }
                }
            }
        }
    }
}
function getUnvisitedChild(int i){
    unvisitedChild[]=null
    for(int child in adj[i]){
        if(!visited.contains(child))
            unvisitedChild.add(child);
    }
    return unvisitedChild;
}
and the finishing order (first finished to last finished) should be
[5, 2, 8, 3, 6, 9, 1, 4, 7]