I've been trying to create a graph representation for the popular kevin bacon game. I have created the graph and vertex classes, but am having some trouble creating a Breadth first search method to traverse the graph and find the shortest path from Kevin Bacon to the actor, and print out the edges on the way.
The user should enter in an actor, and the program should find the shortest path from kevin bacon to that actor. The user will then keep entering actors, and the shortest path to that actor will be taken, and the kevin bacon number printed out, else it will print out none.
There is a vertex and graph class. The vertex class is a dictionary which contains the other vertexes it is connected to and the edges.
The data I am working with looks like this:
vertices:
["Kevin Bacon", "actor1", "actor2", "actor3", "actor4", "actor5", "actor6"]
edges:
("Kevin Bacon", "actor1", "movie1")
("Kevin Bacon", "actor2", "movie1")
("actor1", "actor2", "movie1")
("actor1", "actor3", "movie2")
("actor3", "actor2", "movie3")
("actor3", "actor4", "movie4")
("actor5", "actor6", "movie5")
Where the movie is the edge name or weight, and the other parts of the tuple are the vertices. I want the BFS algorithm to print out all of the edges and kevin bacon number, or print out that it is not possible if the actor cannot be reached.
Here is the code so far. Any advice and help is appreciated.
Thank you for your time
class Vertex:
'''
keep track of the vertices to which it is connected, and the weight of each edge
'''
def __init__(self, key):
'''
'''
self.ID = key
self.connected_to = {}
def add_neighbor(self, neighbor, weight=0):
'''
add a connection from this vertex to anothe
'''
self.connected_to[neighbor] = weight
def __str__(self):
'''
returns all of the vertices in the adjacency list, as represented by the connectedTo instance variable
'''
return str(self.ID) + ' connected to: ' + str([x.ID for x in self.connected_to])
def get_connections(self):
'''
returns all of the connections for each of the keys
'''
return self.connected_to.keys()
def get_ID(self):
'''
returns the current key id
'''
return self.ID
def get_weight(self, neighbor):
'''
returns the weight of the edge from this vertex to the vertex passed as a parameter
'''
return self.connected_to[neighbor]
class Graph:
'''
contains a dictionary that maps vertex names to vertex objects.
'''
def __init__(self):
'''
'''
self.vert_list = {}
self.num_vertices = 0
def __str__(self):
'''
'''
edges = ""
for vert in self.vert_list.values():
for vert2 in vert.get_connections():
edges += "(%s, %s)\n" %(vert.get_ID(), vert2.get_ID())
return edges
def add_vertex(self, key):
'''
adding vertices to a graph
'''
self.num_vertices = self.num_vertices + 1
new_vertex = Vertex(key)
self.vert_list[key] = new_vertex
return new_vertex
def get_vertex(self, n):
'''
'''
if n in self.vert_list:
return self.vert_list[n]
else:
return None
def __contains__(self, n):
'''
in operator
'''
return n in self.vert_list
def add_edge(self, f, t, cost=0):
'''
connecting one vertex to another
'''
if f not in self.vert_list:
nv = self.add_vertex(f)
if t not in self.vert_list:
nv = self.add_vertex(t)
self.vert_list[f].add_neighbor(self.vert_list[t], cost)
def get_vertices(self):
'''
returns the names of all of the vertices in the graph
'''
return self.vert_list.keys()
def __iter__(self):
'''
for functionality
'''
return iter(self.vert_list.values())
def bfs(self):
'''
Needs to be implemented
'''
pass
Get an actor
Check if the actor is Kevin Bacon
If the actor is Kevin Bacon, go back along the path you took
If the actor is not Kevin Bacon then find all the actors connected to this actor who you have not already checked.
Add all the actors who this actor is connected to to your list to check.
The hardest problem you will have here is keeping a record of which vertexes you have already visited. As such I think your algorithm should check a list of vertexes. Some assumptions:
Each vertex is listed only once.
Vertexes are single direction only. This means that if you want to go from Actor 1 to Actor 2 and Actor 2 to Actor 1, you need two vertexes, one for each actor essentially.
You have weights, but I don't see how they're relevant for this. I'll try to implement them though. Also your default weight should not be 0, or all paths will be equally short (0*n = 0).
OK lets go.
def bfs(self, actor):
from heapq import heappush, heappop
if actor == "Kevin Bacon":
return print("This actor is Kevin Bacon!")
visited = set()
checked = []
n = 0
heappush(checked, (0, n, [self.get_vertex(actor)]))
# if the list is empty we haven't been able to find any path
while checked:
# note that we pop our current list out of the list of checked lists,
# if all of the children of this list have been visited it won't be
# added again
current_list = heappop(checked)[2]
current_vertex = current_list[-1]
if current_vertex.ID == "Kevin Bacon":
return print(current_list)
for child in current_vertex.get_connections():
if child in visited:
# we've already found a shorter path to this actor
# don't add this path into the list
continue
n += 1
# make a hash function for the vertexes, probably just
# the hash of the ID is enough, ptherwise the memory address
# is used and identical vertices can be visited multiple times
visited.add(child)
w = sum(current_list[i].get_weight(current_list[i+1])
for i in range(len(current_list)-1))
heappush(checked, (w, n, current_list + [child]))
print("no path found!")
You should also implement a __repr__() method for your vertex class. With the one I used, the output looks like this:
g = Graph()
for t in [("Kevin Bacon", "actor1", "movie1")
,("Kevin Bacon", "actor2", "movie1")
,("actor1", "actor2", "movie1")
,("actor1", "actor3", "movie2")
,("actor3", "actor2", "movie3")
,("actor3", "actor4", "movie4")
,("actor5", "actor6", "movie5")]:
g.add_edge(t[0],t[1],cost=1)
g.add_edge(t[1],t[0],cost=1)
g.bfs("actor4")
# prints [Vertex(actor4), Vertex(actor3), Vertex(actor2), Vertex(Kevin Bacon)]
I originally wasn't going to use heapq to do this, but in the end decided I might as well. Essentially, you need to sort your checked list to get the shortest path first. The simplest way to do this is to just sort your list every time you want to pop the smallest value off the top, but this can get very slow when your list is getting large. Heapq can keep the list sorted in a more efficient manner, but there is no key method to get the smallest value of the list we add, so we need to fake it by using a tuple. The first value in the tuple is the actual cost of the path, while the second one is simply a "tie breaker" so that we do not try to compare Vertexes (which are not ordered and will raise an exception if we try to do so).
Related
Given a list of words, determine whether the words can be chained to form a circle. A word X
can be placed in front of another word Y in a circle if the last character of X is the same as
the first character of Y.
For example, the words ['chair', 'height', 'racket', touch', 'tunic'] can form the following circle:
chair --> racket --> touch --> height --> tunic --> chair
The output it has to be a txt file with one word per line, ex:
chair
racket
touch
height
tunic
I searched for the solution, but i only managed to get the partial solution which answers wether or not it can be a circle.
# Python program to check if a given directed graph is Eulerian or not
CHARS = 26
# A class that represents an undirected graph
class Graph(object):
def __init__(self, V):
self.V = V # No. of vertices
self.adj = [[] for x in range(V)] # a dynamic array
self.inp = [0] * V
# function to add an edge to graph
def addEdge(self, v, w):
self.adj[v].append(w)
self.inp[w]+=1
# Method to check if this graph is Eulerian or not
def isSC(self):
# Mark all the vertices as not visited (For first DFS)
visited = [False] * self.V
# Find the first vertex with non-zero degree
n = 0
for n in range(self.V):
if len(self.adj[n]) > 0:
break
# Do DFS traversal starting from first non zero degree vertex.
self.DFSUtil(n, visited)
# If DFS traversal doesn't visit all vertices, then return false.
for i in range(self.V):
if len(self.adj[i]) > 0 and visited[i] == False:
return False
# Create a reversed graph
gr = self.getTranspose()
# Mark all the vertices as not visited (For second DFS)
for i in range(self.V):
visited[i] = False
# Do DFS for reversed graph starting from first vertex.
# Starting Vertex must be same starting point of first DFS
gr.DFSUtil(n, visited)
# If all vertices are not visited in second DFS, then
# return false
for i in range(self.V):
if len(self.adj[i]) > 0 and visited[i] == False:
return False
return True
# This function returns true if the directed graph has an eulerian
# cycle, otherwise returns false
def isEulerianCycle(self):
# Check if all non-zero degree vertices are connected
if self.isSC() == False:
return False
# Check if in degree and out degree of every vertex is same
for i in range(self.V):
if len(self.adj[i]) != self.inp[i]:
return False
return True
# A recursive function to do DFS starting from v
def DFSUtil(self, v, visited):
# Mark the current node as visited and print it
visited[v] = True
# Recur for all the vertices adjacent to this vertex
for i in range(len(self.adj[v])):
if not visited[self.adj[v][i]]:
self.DFSUtil(self.adj[v][i], visited)
# Function that returns reverse (or transpose) of this graph
# This function is needed in isSC()
def getTranspose(self):
g = Graph(self.V)
for v in range(self.V):
# Recur for all the vertices adjacent to this vertex
for i in range(len(self.adj[v])):
g.adj[self.adj[v][i]].append(v)
g.inp[v]+=1
return g
# This function takes an of strings and returns true
# if the given array of strings can be chained to
# form cycle
def canBeChained(arr, n):
# Create a graph with 'alpha' edges
g = Graph(CHARS)
# Create an edge from first character to last character
# of every string
for i in range(n):
s = arr[i]
g.addEdge(ord(s[0])-ord('a'), ord(s[len(s)-1])-ord('a'))
# The given array of strings can be chained if there
# is an eulerian cycle in the created graph
return g.isEulerianCycle()
# Driver program
arr1 = ["for", "geek", "rig", "kaf"]
n1 = len(arr1)
if canBeChained(arr1, n1):
print ("Can be chained")
else:
print ("Cant be chained")
arr2 = ["aab", "abb"]
n2 = len(arr2)
if canBeChained(arr2, n2):
print ("Can be chained")
else:
print ("Can't be chained")
Source: https://www.geeksforgeeks.org/given-array-strings-find-strings-can-chained-form-circle/
This solution only returns the Boolean statement of the list, it means that if there is a circle it will output True. The goal for me is to try and expand this solution to give the list separated, i will give another example:
Input:
{"for", "geek", "rig", "kaf"}
Output:
for
rig
geek
kaf
for
The problem you're describing is the Eulerian circuit problem.
There is an algorithm implemented in module networkx:
networkx.algorithms.euler.eulerian_circuit
from networkx import DiGraph, eulerian_circuit
words = ['chair', 'height', 'racket', 'touch', 'tunic']
G = DiGraph()
G.add_weighted_edges_from(((w[0], w[-1], w) for w in words), weight='word')
result = [G[a][b]['word'] for a,b in eulerian_circuit(G)]
print(result)
# ['chair', 'racket', 'touch', 'height', 'tunic']
This seems like a lot of effort to solve this problem. Consider a simple solution like:
from collections import defaultdict
words = ['chair', 'height', 'racket', 'touch', 'tunic']
def findChains(words):
dictionary = defaultdict(list)
for word in words:
dictionary[word[0]].append(word)
chains = [[words[0]]] # start with an arbitrary word
while True:
new_chains = []
for chain in chains:
for follower in dictionary[chain[-1][-1]]:
if follower in chain:
continue
new_chains.append([*chain, follower])
if new_chains:
chains = new_chains
else:
break
return [chain for chain in chains if len(chain) == len(words) and chain[-1][-1] == chain[0][0]]
print(findChains(words))
OUTPUT
% python3 test.py
[['chair', 'racket', 'touch', 'height', 'tunic']]
%
Is the issue that a simple algorithm like the above becomes unworkable as the list of words gets longer? You also seem to assume a single solution, but with enough start and end letter redundancy, there could be multiple solutions. You need to code for multiple even if in the end you just pick one.
Load the graph at the input. This graph started as a tree (i.e. an unoriented graph that does not contain loops) with n vertices numbered 1 to n, to which one edge was added that it did not contain. The input graph is represented as a list of its edges, where a_i b_i says that the graph has an edge between the vertices a_i and b_i.
Program will list which edge we can remove from the graph to create a tree with n vertices from the graph. If more than one answer is possible, answer with the one at the input last.
For example, to input:
1 2
1 3
2 3
Program will answer 2 3
For input:
1 2
2 3
3 4
1 4
1 5
Answer 1 4
I have a code that can determine if numbers are a tree, but I don't know how to make it so that they can be entered, and how to make it so that it removes unnecessary edges:
from collections import defaultdict
class Graph():
def __init__(self, V):
self.V = V
self.graph = defaultdict(list)
def addEdge(self, v, w):
self.graph[v].append(w)
self.graph[w].append(v)
def isCyclicUtil(self, v, visited, parent):
visited[v] = True
for i in self.graph[v]:
if visited[i] == False:
if self.isCyclicUtil(i, visited, v) == True:
return True
elif i != parent:
return True
return False
def isTree(self):
visited = [False] * self.V
if self.isCyclicUtil(0, visited, -1) == True:
return False
for i in range(self.V):
if visited[i] == False:
return False
return True
g1 = Graph(5)
g1.addEdge(1, 0)
g1.addEdge(0, 2)
g1.addEdge(0, 3)
g1.addEdge(2, 3)
if g1.isTree() == True:
print("Tree")
else:
print("Not Tree")
You can read input from standard input via the input function.
Your code currently creates the graph structure, and then determines whether two nodes are connected by walking through the graph with a depth-first traversal.
I would however suggest a slightly more efficient algorithm: instead of creating a graph, create a disjoint set. There are several libraries out there that offer this data structure, but I'll throw my own in the below code.
This structure keeps track of which nodes belong to the same connected group(s).
Then the algorithm becomes simple: for each edge you read from the input, indicate (using the disjoint set interface) that the two involved nodes belong to the same set. Before doing that however, check whether they already are in the same set. If this happens, stop the algorithm and output this edge.
Here is the generic DisjointSet class I will be using:
class DisjointSet:
class Element:
def __init__(self):
self.parent = self
self.rank = 0
def __init__(self):
self.elements = {}
def add(self, key):
el = self.Element()
self.elements[key] = el
return el
def find(self, key, add_if_not_exists=False):
el = self.elements.get(key, None)
if not el:
if add_if_not_exists:
el = self.add(key)
return el
# Path splitting algorithm
while el.parent != el:
el, el.parent = el.parent, el.parent.parent
return el
def union(self, key=None, *otherkeys):
if key is not None:
root = self.find(key, True)
for otherkey in otherkeys:
el = self.find(otherkey, True)
if el != root:
# Union by rank
if root.rank < el.rank:
root, el = el, root
el.parent = root
if root.rank == el.rank:
root.rank += 1
And here is the code specific for the problem:
def solve():
visited = DisjointSet()
while True:
edge = input().split()
a = visited.find(edge[0])
b = visited.find(edge[1])
if a and a is b: # This edge creates a cycle
print(" ".join(edge))
break # Stop reading more input
visited.union(*edge)
As you can see, this code assumes that the input will have an edge that creates a cycle (like is stated in the problem description). So it only has a way to stop the process when such an offending edge is found.
I am trying to build a graph using Python. For every vertex, there is additional information about whether sufficient food is available at that vertex.
class Vertex:
"""
Vertex represents a location in the quokka's quest to find food.
It contains the relevant information surrounding the location.
Attributes:
* self.has_food (bool) - indicates whether this location has food.
* self.edges (List[Vertex]) - list of connected vertices.
"""
def __init__(self, has_food: bool) -> None:
"""
Initialises this vertex, by setting the attribute whether it has food.
:param has_food - boolean indicating whether this location has food.
"""
self.has_food = has_food
self.edges = []
In the graph class, I am trying to build a function called find_path(s, t, k); this operation consists of finding a path from vertex s to vertex t such that from any location with food along this path we reach the next location with food in at most k steps.
I was trying to find all paths, but I get stuck on how to check whether "from any location with food along this path we reach the next location with food in at most k steps."
For the graph class, I already have some operations defined. They are:
add_vertex(v) -> bool
Adds the Vertex v to the graph, returning True if the operation was successful, False if it was not, or it was invalid.
fix_edge(u, v) -> bool
Fixes an edge between two vertices, u and v. If an edge already exists, or the edge is invalid, then this operation should return False. Else, if the operation succeeds, return True.
Example: If an edge between u and v already exists or is invalid, fix_edge(u, v) should return False.
block_edge(u, v) -> bool
Blocks the edge between two vertices, u and v. Removes the edge if it exists, and returns True if the operation was successful. If the edge does not exist or is invalid, it should be unsuccessful and return False.
My code for find_path(s, t, k) looks like this and it's incomplete.
def find_path(self, s: Vertex, t: Vertex, k: in) -> Union[List[Vertex], None]:
"""
find_path returns a SIMPLE path between `s` and `t` such that from any
location with food along this path we reach the next location with food
in at most `k` steps
:param s - The start vertex for the quokka colony
:param t - The destination for the quokka colony
:param k - The maximum number of hops between locations with food, so
that the colony can survive!
:returns
* The list of vertices to form the simple path from `s` to `t`
satisfying the conditions.
OR
* None if no simple path exists that can satisfy the conditions, or
is invalid.
Example:
(* means the vertex has food)
* *
A---B---C---D---E
1/ find_path(s=A, t=E, k=2) -> returns: [A, B, C, D, E]
2/ find_path(s=A, t=E, k=1) -> returns: None
(because there isn't enough food!)
3/ find_path(s=A, t=C, k=4) -> returns: [A, B, C]
"""
# TODO implement me please
if k < 0:
return False
else:
path = path + [s]
if s == t:
return [path]
if not graph.has_key(s):
return []
paths = []
for node in graph[s]:
if node not in path:
newpaths = find_all_paths(graph, node, t, path)
for newpath in newpaths:
paths.append(newpath)
return paths
Can someone help me figure out how to implement the path-finding algorithm?
This is more of a conceptual programming question, so bear with me:
Say you have a list of scenes in a movie, and each scene may or may not make reference to past/future scenes in the same movie. I'm trying to find the most efficient algorithm of sorting these scenes. There may not be enough information for the scenes to be completely sorted, of course.
Here's some sample code in Python (pretty much pseudocode) to clarify:
class Reference:
def __init__(self, scene_id, relation):
self.scene_id = scene_id
self.relation = relation
class Scene:
def __init__(self, scene_id, references):
self.id = scene_id
self.references = references
def __repr__(self):
return self.id
def relative_sort(scenes):
return scenes # Algorithm in question
def main():
s1 = Scene('s1', [
Reference('s3', 'after')
])
s2 = Scene('s2', [
Reference('s1', 'before'),
Reference('s4', 'after')
])
s3 = Scene('s3', [
Reference('s4', 'after')
])
s4 = Scene('s4', [
Reference('s2', 'before')
])
print relative_sort([s1, s2, s3, s4])
if __name__ == '__main__':
main()
The goal is to have relative_sort return [s4, s3, s2, s1] in this case.
If it's helpful, I can share my initial attempt at the algorithm; I'm a little embarrassed at how brute-force it is. Also, if you're wondering, I'm trying to decode the plot of the film "Mulholland Drive".
FYI: The Python tag is only here because my pseudocode was written in Python.
The algorithm you are looking for is a topological sort:
In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.
You can compute this pretty easily using a graph library, for instance, networkx, which implements topological_sort. First we import the library and list all of the relationships between scenes -- that is, all of the directed edges in the graph
>>> import networkx as nx
>>> relations = [
(3, 1), # 1 after 3
(2, 1), # 2 before 1
(4, 2), # 2 after 4
(4, 3), # 3 after 4
(4, 2) # 4 before 2
]
We then create a directed graph:
>>> g = nx.DiGraph(relations)
Then we run a topological sort:
>>> nx.topological_sort(g)
[4, 3, 2, 1]
I have included your modified code in my answer, which solves the current (small) problem, but without a larger sample problem, I'm not sure how well it would scale. If you provide the actual problem you're trying to solve, I'd love to test and refine this code until it works on that problem, but without test data I won't optimize this solution any further.
For starters, we track references as sets, not lists.
Duplicates don't really help us (if "s1" before "s2", and "s1" before "s2", we've gained no information)
This also lets us add inverse references with abandon (if "s1" comes before "s2", then "s2" comes after "s1").
We compute a min and max position:
Min position based on how many scenes we come after
This could be extended easily: If we come after two scenes with a min_pos of 2, our min_pos is 4 (If one is 2, other must be 3)
Max position based on how many things we come before
This could be extended similarly: If we come before two scenes with a max_pos of 4, our max_pos is 2 (If one is 4, other must be 3)
If you decide to do this, just replace pass in tighten_bounds(self) with code to try to tighten the bounds for a single scene (and set anything_updated to true if it works).
The magic is in get_possible_orders
Generates all valid orderings if you iterate over it
If you only want one valid ordering, it doesn't take the time to create them all
Code:
class Reference:
def __init__(self, scene_id, relation):
self.scene_id = scene_id
self.relation = relation
def __repr__(self):
return '"%s %s"' % (self.relation, self.scene_id)
def __hash__(self):
return hash(self.scene_id)
def __eq__(self, other):
return self.scene_id == other.scene_id and self.relation == other.relation
class Scene:
def __init__(self, title, references):
self.title = title
self.references = references
self.min_pos = 0
self.max_pos = None
def __repr__(self):
return '%s (%s,%s)' % (self.title, self.min_pos, self.max_pos)
inverse_relation = {'before': 'after', 'after': 'before'}
def inverted_reference(scene, reference):
return Reference(scene.title, inverse_relation[reference.relation])
def is_valid_addition(scenes_so_far, new_scene, scenes_to_go):
previous_ids = {s.title for s in scenes_so_far}
future_ids = {s.title for s in scenes_to_go}
for ref in new_scene.references:
if ref.relation == 'before' and ref.scene_id in previous_ids:
return False
elif ref.relation == 'after' and ref.scene_id in future_ids:
return False
return True
class Movie:
def __init__(self, scene_list):
self.num_scenes = len(scene_list)
self.scene_dict = {scene.title: scene for scene in scene_list}
self.set_max_positions()
self.add_inverse_relations()
self.bound_min_max_pos()
self.can_tighten = True
while self.can_tighten:
self.tighten_bounds()
def set_max_positions(self):
for scene in self.scene_dict.values():
scene.max_pos = self.num_scenes - 1
def add_inverse_relations(self):
for scene in self.scene_dict.values():
for ref in scene.references:
self.scene_dict[ref.scene_id].references.add(inverted_reference(scene, ref))
def bound_min_max_pos(self):
for scene in self.scene_dict.values():
for ref in scene.references:
if ref.relation == 'before':
scene.max_pos -= 1
elif ref.relation == 'after':
scene.min_pos += 1
def tighten_bounds(self):
anything_updated = False
for scene in self.scene_dict.values():
pass
# If bounds for any scene are tightened, set anything_updated back to true
self.can_tighten = anything_updated
def get_possible_orders(self, scenes_so_far):
if len(scenes_so_far) == self.num_scenes:
yield scenes_so_far
raise StopIteration
n = len(scenes_so_far)
scenes_left = set(self.scene_dict.values()) - set(scenes_so_far)
valid_next_scenes = set(s
for s in scenes_left
if s.min_pos <= n <= s.max_pos)
# valid_next_scenes = sorted(valid_next_scenes, key=lambda s: s.min_pos * self.num_scenes + s.max_pos)
for s in valid_next_scenes:
if is_valid_addition(scenes_so_far, s, scenes_left - {s}):
for valid_complete_sequence in self.get_possible_orders(scenes_so_far + (s,)):
yield valid_complete_sequence
def get_possible_order(self):
return self.get_possible_orders(tuple()).__next__()
def relative_sort(lst):
try:
return [s.title for s in Movie(lst).get_possible_order()]
except StopIteration:
return None
def main():
s1 = Scene('s1', {Reference('s3', 'after')})
s2 = Scene('s2', {
Reference('s1', 'before'),
Reference('s4', 'after')
})
s3 = Scene('s3', {
Reference('s4', 'after')
})
s4 = Scene('s4', {
Reference('s2', 'before')
})
print(relative_sort([s1, s2, s3, s4]))
if __name__ == '__main__':
main()
As others have pointed out, you need a topological sort. A depth first traversal of the directed graph where the order relation forms the edges is all you need. Visit in post order. This the reverse of a topo sort. So to get the topo sort, just reverse the result.
I've encoded your data as a list of pairs showing what's known to go before what. This is just to keep my code short. You can just as easily traverse your list of classes to create the graph.
Note that for topo sort to be meaningful, the set being sorted must satisfy the definition of a partial order. Yours is fine. Order constraints on temporal events naturally satisfy the definition.
Note it's perfectly possible to create a graph with cycles. There's no topo sort of such a graph. This implementation doesn't detect cycles, but it would be easy to modify it to do so.
Of course you can use a library to get the topo sort, but where's the fun in that?
from collections import defaultdict
# Before -> After pairs dictating order. Repeats are okay. Cycles aren't.
# This is OP's data in a friendlier form.
OrderRelation = [('s3','s1'), ('s2','s1'), ('s4','s2'), ('s4','s3'), ('s4','s2')]
class OrderGraph:
# nodes is an optional list of items for use when some aren't related at all
def __init__(self, relation, nodes=[]):
self.succ = defaultdict(set) # Successor map
heads = set()
for tail, head in relation:
self.succ[tail].add(head)
heads.add(head)
# Sources are nodes that have no in-edges (tails - heads)
self.sources = set(self.succ.keys()) - heads | set(nodes)
# Recursive helper to traverse the graph and visit in post order
def __traverse(self, start):
if start in self.visited: return
self.visited.add(start)
for succ in self.succ[start]: self.__traverse(succ)
self.sorted.append(start) # Append in post-order
# Return a reverse post-order visit, which is a topo sort. Not thread safe.
def topoSort(self):
self.visited = set()
self.sorted = []
for source in self.sources: self.__traverse(source)
self.sorted.reverse()
return self.sorted
Then...
>>> print OrderGraph(OrderRelation).topoSort()
['s4', 's2', 's3', 's1']
>>> print OrderGraph(OrderRelation, ['s1', 'unordered']).topoSort()
['s4', 's2', 's3', 'unordered', 's1']
The second call shows that you can optionally pass values to be sorted in a separate list. You may but don't have mention values already in the relation pairs. Of course those not mentioned in order pairs are free to appear anywhere in the output.
The problem specification is in https://www.dropbox.com/s/lmwxcsp3lie0x3n/437.pdf?dl=0
My solution is in http://ideone.com/3JsFCq
name = raw_input()
D = int(raw_input()) #degree of separation
N = int(raw_input()) #number of links
M = int(raw_input()) #book users
users = {}
books = {}
def build_edges(user1, user2):
if user1 not in users:
users[user1] = set([user2, ])
else:
users[user1].add(user2)
for i in xrange(N):
nw = raw_input()
us = nw.split('|')
build_edges(us[0], us[1])
build_edges(us[1], us[0])
def build_booklist(user1, book):
if user1 not in books:
users[user1] = []
else:
users[user1].append(user2)
for i in xrange(M):
bk = raw_input().split('|')
books[bk[0]] = []
for book in bk[1:]:
books[bk[0]].append(book)
rec = []
depth = [0,]
def bfs(graph, start):
visited, queue = set(), [start]
while queue:
vertex = queue.pop(0)
if vertex not in visited:
visited.add(vertex)
for book in books[vertex]:
if book not in books[start]:
rec.append(book)
queue.extend(graph[vertex] - visited)
depth[0] += 1
if depth[0] > D:
return
return visited
bfs(users, name)
print len(rec)
I couldn't find the corner cases.
It passes the example case, but it doesn't pass some others.
What is going wrong?
Your problem is that you are increasing the depth every time you process a vertex. Instead you need to store a depth for every vertex, and stop when you encounter a vertex with a depth larger that the given.
For example, if Alice has two friends, Bob and Carl, then as you process Alice, you will set depth to 1. Then as you process Bob, you will set it to two, and stop, before you process Carl, who is within distance one from Alice. Instead, you should set Alice's depth to 0, then as you add Bob and Carl to the queue, set their depths to 1, and process them. As you process them, you add their friends, whom you have not seen yet, with depth 2, and as soon as you encounter any of them in the main loop (pop from the queue), you stop.
UPDATE: also, add the first vertex to the visited set, when you initialize it. Otherwise you will process it as a vertex with depth two (you will add Alice's friend Bob with depth 1, and then Alice as Bob's friend with distance two). It doesn't hurt in this particular problem, but might be a problem if you make a similar mistake in a solution for some other BFS problem.