I want to find out whether I can reach all nodes from a certain node. I am not interested in the path, I just want to output YES or NO if I can or cannot. Let's assume I have the following graph - As a constraint, I need to represent my nodes as a tuple (i,j):
graph = {
    (1,1): [(1,2),(2,2)],
    (1,2): [(1,3)],
    (1,3): [(1,2),(2,3)],
    (2,2): [(3,3)],
    (2,3): [],
    (3,3): [(2,2)]
}
Now, I need to show if I can reach from (1,1), (2,2) or (3,3), i.e. (i,j) with i = j, all other nodes where i != j. If yes, print(YES) - if no, print(NO).
The example mentioned above would output YES for node(1,1), since I can reach (1,2), (1,3) and (2,3) via node (1,1).
I tried the following:
import networkx as nx
G = nx.DiGraph()
G.add_edges_from(graph)
for reachable_node in nx.dfs_postorder_nodes(G, source=None):
    print(reachable_node)
However, if I declare (1,1), (2,2) or (3,3) as my source in nx.dfs_postorder_nodes(), I get, e.g., the following error: KeyError: (1,1)
Which function or library (the more standard the library is the better!!) should I use to indicate whether I can reach all nodes from any of the (i, i) nodes?
Thanks for all clarifications! I am a new member, so if my question doesn't follow the Stackoverflow guidelines, feel free to tell me how I can improve my next questions!
This program should do the job, and it uses only the standard library (basically, it gives you all the states that can be visited from a given starting point):
graph={
(1,1): [(1,2), (2,2)],
(1,2): [(1,3)],
(1,3): [(1,2), (2,3)],
(2,2): [(3,3)],
(2,3): [],
(3,3): [(2,2)]
}
node0 = (1,1)  # choose the starting node
node0_connections = [node0]  # this list will contain all the states that can be visited from node0
for node in node0_connections:
    for node_to in graph[node]:
        if node0_connections.count(node_to) == 0:
            node0_connections.append(node_to)
print('All possible states to be visited from node', node0, ':', node0_connections, '.')
# the targets are the nodes (i,j) with i != j, i.e. (1,2), (1,3) and (2,3)
count = node0_connections.count((1,2)) + node0_connections.count((1,3)) + node0_connections.count((2,3))
if count == 3:
    print('YES')
else:
    print('NO')
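The same idea can be sketched with a set and a queue, so that membership tests do not rescan a list. This is only an illustration under the assumption that the graph is the directed adjacency dict from the question; the function names reachable_from and reaches_all_off_diagonal are made up here and do not come from any library.

```python
from collections import deque

def reachable_from(graph, start):
    """Return the set of nodes reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get() tolerates nodes that only appear as targets, never as keys
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

def reaches_all_off_diagonal(graph, start):
    """True if every node (i, j) with i != j is reachable from start."""
    all_nodes = set(graph) | {n for targets in graph.values() for n in targets}
    off_diagonal = {(i, j) for (i, j) in all_nodes if i != j}
    return off_diagonal <= reachable_from(graph, start)

graph = {
    (1, 1): [(1, 2), (2, 2)],
    (1, 2): [(1, 3)],
    (1, 3): [(1, 2), (2, 3)],
    (2, 2): [(3, 3)],
    (2, 3): [],
    (3, 3): [(2, 2)],
}
for start in [(1, 1), (2, 2), (3, 3)]:
    print(start, "YES" if reaches_all_off_diagonal(graph, start) else "NO")
```

As far as I can tell, the KeyError in the question arises because G.add_edges_from(graph) iterates over the dict's keys and reads each key (i, j) as an edge from i to j, so the graph ends up with integer nodes rather than tuple nodes.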
I think I understand your question. You could try an exhaustive approach with a try/except block using nx.shortest_path like this:
import networkx as nx
graph={
(1,1): [(1,2),(2,2)],
(1,2): [(1,3)],
(1,3): [(1,2),(2,3)],
(2,2): [(3,3)],
(3,3): [(2,2)],
(4,4): [(1,3)],
(5,5): []
}
# note: nx.Graph treats the edges as undirected; nx.DiGraph(graph) would respect edge direction
G = nx.Graph(graph)
nodes = G.nodes()
balanced_nodes = [node for node in G.nodes() if node[0] == node[1]]
unbalanced_nodes = [node for node in G.nodes() if node[0] != node[1]]
for balanced_node in balanced_nodes:
    connected = True
    for unbalanced_node in unbalanced_nodes:
        try:
            path = nx.shortest_path(G, balanced_node, unbalanced_node)
        except nx.NetworkXNoPath:
            # no path to this unbalanced node, so stop checking the others
            connected = False
            break
    print(balanced_node, ": ", connected)
This results in:
(1, 1) : True
(2, 2) : True
(3, 3) : True
(4, 4) : True
(5, 5) : False
The code below should be completely reproducible. I tried my best to make the question clear; if it is not, please ask for clarification.
What I need to do:
"From set R, find the two closest nodes to the nodes in set P Call the closest node i and the next closest node j" - quoting page 157 end of paragraph that starts step 2, in this paper.
The list R is the ordered set of nodes in a graph and P contains sublists of nodes assigned to a particular vehicle. For example
R = [1,5,6,9] # Size of R might change for new k
P = [[4],[7,3,8],[2]] # Sizes of sublists might change for new k
So vehicle k=0 gets node P[0] = [4], vehicle k=1 gets nodes P[1] = [7,3,8] and vehicle k=2 gets node P[2] = [2]. For each sublist P[k] in P, I want to find the two nodes in R that are closest to P[k].
The distances are stored in a dict:
dist = {(1,4) : 52.35456045083369,
(5,4) : 37.48332962798263,
(6,4) : 52.92447448959697,
(9,4) : 76.83749084919418,
(1,7) : 94.89467845985885,
(1,3) : 58.9406481131655,
(1,8) : 11.180339887498949,
(5,7) : 54.817880294662984,
(5,3) : 51.478150704935004,
(5,8) : 45.044422518220834,
(6,7) : 27.80287754891569,
(6,3) : 60.74537019394976,
(6,8) : 72.3671196055225,
(9,7) : 99.68951800465283,
(9,3) : 44.68780594300866,
(9,8) : 15.811388300841896,
(1,2) : 102.44998779892558,
(5,2) : 65.60487786742691,
(6,2) : 42.37924020083418,
(9,2) : 102.55242561733974}
where the first element of each tuple key is an R-node and the second element is a P-node.
So first P[0] = [4], the two closest to this node in R are i = 5 since the distance from 5 to 4 is 37.48 and the second closest is j = 1 since the distance from 1 to 4 is 52.35.
Now we proceed to P[1] = [7,3,8]. Here, is where I run into trouble. I interpret the paper as "which two nodes in R are closest to the entire group [7,3,8]?" My first instinct was to calculate the average distance from the R-nodes to each node in P[1] and the smallest value is the closest.
I've made an attempt, but it only works if len(P[k]) == 1. What I need is a function that takes in R and P and returns i and j for each k. Here is my code:
for k in range(2):
    all_nodes_dict = {}
    for i in range(len(R)):
        all_nodes_dict[(R[i],P[k][0])] = dist[(R[i], P[k][0])]
    min_list = sorted(list(all_nodes_dict.values()), key = lambda x:float(x))
    min_vals = min_list[:2]
    two_closest_nodes = []
    for i in range(len(min_vals)):
        two_closest_nodes += [return_key(min_vals[i], all_nodes_dict)[0]]
    i = two_closest_nodes[0]
    j = two_closest_nodes[1]
    # do something with i and j before resetting them for new iteration
Here is my code for the function return_key().
# function to return key given value and dict
def return_key(val, my_dict):
    for key, value in my_dict.items():
        if val == value:
            return key
    return "key doesn't exist"
Here is the code to generate all distances, or the dist dictionary in my code:
import math
import random

n = 10
random.seed(1)
# Create n random points
points = [(0, 0)]
points += [(random.randint(0, 100), random.randint(0, 100)) for i in range(n - 1)]
# Dictionary of distances between each pair of points
dist = {
    (i, j): math.sqrt(sum((points[i][p] - points[j][p]) ** 2 for p in range(2)))
    for i in range(n)
    for j in range(n)
    if i != j
}
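A minimal sketch of the averaging idea described in the question, which may or may not be what the paper intends: rank every node in R by its mean distance to the whole group P[k] and take the two smallest. The name two_closest is made up for illustration, and the distances are copies of the dict above rounded to two decimals.

```python
def two_closest(R, group, dist):
    """Return (i, j): the two nodes of R with the smallest mean distance to group.

    Assumes R has at least two nodes and dist has a key (r, p) for every
    r in R and p in group.
    """
    def mean_distance(r):
        return sum(dist[(r, p)] for p in group) / len(group)

    i, j = sorted(R, key=mean_distance)[:2]
    return i, j

# rounded copies of the distances listed in the question
dist = {
    (1, 4): 52.35, (5, 4): 37.48, (6, 4): 52.92, (9, 4): 76.84,
    (1, 7): 94.89, (1, 3): 58.94, (1, 8): 11.18,
    (5, 7): 54.82, (5, 3): 51.48, (5, 8): 45.04,
    (6, 7): 27.80, (6, 3): 60.75, (6, 8): 72.37,
    (9, 7): 99.69, (9, 3): 44.69, (9, 8): 15.81,
    (1, 2): 102.45, (5, 2): 65.60, (6, 2): 42.38, (9, 2): 102.55,
}
R = [1, 5, 6, 9]
P = [[4], [7, 3, 8], [2]]

for k, group in enumerate(P):
    i, j = two_closest(R, group, dist)
    print(k, i, j)
```

For a single-node group the mean is just the distance itself, so this reduces to the behaviour of the original loop. Other aggregates (for example the minimum distance to any node of the group) could be swapped in by changing mean_distance.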
I was given a question during an interview, and although my answer was accepted in the end, they wanted a faster approach and I went blank.
Question :
Given an undirected graph, can you see if it's a tree? If so, return true and false otherwise.
A tree:
A - B
|
C - D
not a tree:
    A
   / \
  B - C
 /
D
You'll be given two parameters: n for number of nodes, and a multidimensional array of edges like such: [[1, 2], [2, 3]], each pair representing the vertices connected by the edge.
Note: Expected space complexity: O(|V|)
The array edges can be empty.
Here is my code (105 ms):
def is_graph_tree(n, edges):
    nodes = [None] * (n + 1)
    for i in range(1, n+1):
        nodes[i] = i
    for i in range(len(edges)):
        start_edge = edges[i][0]
        dest_edge = edges[i][1]
        if nodes[start_edge] != start_edge:
            start_edge = nodes[start_edge]
        if nodes[dest_edge] != dest_edge:
            dest_edge = nodes[dest_edge]
        if start_edge == dest_edge:
            return False
        nodes[start_edge] = dest_edge
    return len(edges) <= n - 1
Here's one approach using a disjoint-set-union / union-find data structure:
def is_graph_tree(n, edges):
    parent = list(range(n+1))
    size = [1] * (n + 1)
    for x, y in edges:
        # find x (path splitting): point x at its grandparent, then step to its old parent
        while parent[x] != x:
            parent[x], x = parent[parent[x]], parent[x]
        # find y
        while parent[y] != y:
            parent[y], y = parent[parent[y]], parent[y]
        if x == y:
            # Already connected
            return False
        # Union (by size)
        if size[x] < size[y]:
            x, y = y, x
        parent[y] = x
        size[x] += size[y]
    # No cycle was found; the graph is a single tree only if it has exactly n - 1 edges
    return len(edges) == n - 1
assert not is_graph_tree(4, [(1, 2), (2, 3), (3, 4), (4, 2)])
assert is_graph_tree(6, [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6)])
The runtime is O(V + E*InverseAckermannFunction(V)), which is better than O(V + E * log(log V)), so it's basically O(V + E).
Tim Roberts has posted a candidate solution, but this will work in the case of disconnected subtrees:
import queue
def is_graph_tree(n, edges):
    # A tree with n nodes has n - 1 edges.
    if len(edges) != n - 1:
        return False
    # Construct graph. Vertices are numbered 1..n as in the question, so index 0 stays unused.
    graph = [[] for _ in range(n + 1)]
    for first_vertex, second_vertex in edges:
        graph[first_vertex].append(second_vertex)
        graph[second_vertex].append(first_vertex)
    # BFS to find edges that create cycles.
    # The graph is undirected, so we can root the tree wherever we want.
    visited = set()
    q = queue.Queue()
    q.put((1, None))
    while not q.empty():
        current_node, previous_node = q.get()
        if current_node in visited:
            return False
        visited.add(current_node)
        for neighbor in graph[current_node]:
            if neighbor != previous_node:
                q.put((neighbor, current_node))
    # Only return true if the graph has only one connected component.
    return len(visited) == n
This runs in O(n + len(edges)) time.
You could approach this from the perspective of tree leaves. Every leaf node in a tree has exactly one edge connected to it. So, if you count the number of edges for each node, you can get the list of leaves (i.e. the ones with only one edge).
Then, take the nodes linked to these leaves and reduce their edge count by one (as if you were removing all the leaves from the tree). That will give you a new set of leaves corresponding to the parents of the original leaves. Repeat the process until you have no more leaves.
[EDIT] Checking that the number of edges is N-1 eliminates the need to do the multi-root check, because there will be another discrepancy (e.g. double link, missing node) in the graph if there are multiple 'roots' or a disconnected subtree.
If the graph is a tree, this process should eliminate all nodes from the node counts (i.e. they will all be flagged as leaves at some point).
Using the Counter class (from collections) will make this relatively easy to implement:
from collections import Counter
def isTree(N,E):
    if N==1 and not E: return True                     # root only is a tree
    if len(E) != N-1: return False                     # a tree has N-1 edges
    counts = Counter(n for ab in E for n in ab)        # edge counts per node
    if len(counts) != N : return False                 # unlinked nodes
    while True:
        leaves = {n for n,c in counts.items() if c==1} # new leaves
        if not leaves:break
        for a,b in E:                                  # subtract leaf counts
            if counts[a]>1 and b in leaves: counts[a] -= 1
            if counts[b]>1 and a in leaves: counts[b] -= 1
        for n in leaves: counts[n] = -1                # flag leaves in counts
    return all(c==-1 for c in counts.values())         # all must become leaves
output:
G = [[1,2],[1,3],[4,5],[4,6]]
print(isTree(6,G)) # False (disconnected sub-tree)
G = [[1,2],[1,3],[1,4],[2,3],[5,6]]
print(isTree(6,G)) # False (doubly linked node 3)
G = [[1,2],[2,6],[3,4],[5,1],[2,3]]
print(isTree(6,G)) # True
G = [[1,2],[2,3]]
print(isTree(3,G)) # True
G = [[1,2],[2,3],[3,4]]
print(isTree(4,G)) # True
G = [[1,2],[1,3],[2,5],[2,4]]
print(isTree(6,G)) # False (missing node)
Space complexity is O(N) because the counts dictionary has one entry per node (vertex) with an integer as value. Time complexity will be O(E x L), where E is the number of edges and L is the number of levels in the tree. The worst-case time is O(E^2), for a tree where all parents have only one child node. However, since the initial condition is for E to be less than V, the worst case will actually be O(V^2).
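The quadratic worst case comes from rescanning every edge on each level. As a rough sketch of the same leaf-peeling idea with a queue, each node and edge is handled a constant number of times; the name is_tree_by_peeling is hypothetical, and the adjacency sets are a swapped-in data structure rather than the Counter used above.

```python
from collections import defaultdict, deque

def is_tree_by_peeling(n, edges):
    """Peel leaves one at a time; a tree loses exactly n - 1 nodes this way."""
    if n == 1 and not edges:
        return True                      # a lone root is a tree
    if len(edges) != n - 1:
        return False                     # a tree has n - 1 edges
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            return False                 # a self-loop is a cycle
        adj[a].add(b)
        adj[b].add(a)
    if len(adj) != n:
        return False                     # some node has no edge at all
    leaves = deque(v for v, nbrs in adj.items() if len(nbrs) == 1)
    removed = 0
    while leaves:
        leaf = leaves.popleft()
        if len(adj[leaf]) != 1:
            continue                     # last remaining node of a component
        (parent,) = adj[leaf]
        adj[parent].discard(leaf)        # remove the leaf from the graph
        adj[leaf].clear()
        removed += 1
        if len(adj[parent]) == 1:
            leaves.append(parent)        # the parent just became a leaf
    return removed == n - 1

print(is_tree_by_peeling(6, [[1, 2], [2, 6], [3, 4], [5, 1], [2, 3]]))
print(is_tree_by_peeling(6, [[1, 2], [1, 3], [1, 4], [2, 3], [5, 6]]))
```

With E = N-1 edges the adjacency sets hold 2(N-1) entries in total, so the space stays O(N).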
Note that this algorithm makes no assumption on edge order or numerical relationships between node numbers. The root (last node to be made a leaf) found by this algorithm is not necessarily the only possible root given that, unless the nodes have an implicit cardinality relationship (or edges have an order), there could be ambiguous scenarios:
[1,2],[2,3],[2,4] could be:
1          2          3
|_2    OR  |_1    OR  |_2
  |_3      |_3          |_1
  |_4      |_4          |_4
If a cardinality relationship between node numbers or an order of edges can be relied upon, the algorithm could potentially be made more time efficient (because we could easily determine which node is the root and start from there).
[EDIT2] Alternative method using groups.
When the number of edges is N-1, if the graph is a tree, all nodes should be reachable from any other node. This means that, if we form groups of reachable nodes for each node and merge them together based on the edges, we should end up with a single group after going through all the edges.
Here is the modified function based on that approach:
def isTree(N,E):
    if N==1 and not E: return True                    # root only is a tree
    if len(E) != N-1: return False                    # a tree has N-1 edges
    groups = {n:[n] for ab in E for n in ab}          # each node in its own group
    if len(groups) != N : return False                # unlinked nodes
    for a,b in E:
        groups[a].extend(groups[b])                   # merge groups
        for n in groups[b]: groups[n] = groups[a]     # update nodes' groups
    return len(set(map(id,groups.values()))) == 1     # only one group when done
Given that we start out with fewer edges than nodes and that group merging will consume at most 2x a group size (so also < N), the space complexity will remain O(V). The time complexity will also be O(V^2) in the worst-case scenarios.
You don't even need to know how many edges there are:
def is_graph_tree(n, edges):
    seen = set()
    for a,b in edges:
        b = max(a,b)
        if b in seen:
            return False
        seen.add(b)
    return True
a = [[1,2],[2,3],[3,4]]
print(is_graph_tree(0,a))
b = [[1,2],[1,3],[2,3],[2,4]]
print(is_graph_tree(0,b))
Now, this WON'T catch the case of disconnected subtrees, but that wasn't in the problem description...
I have a tree, given e.g. as a networkx object. In order to input it into a black-box algorithm I was given, I need to save it in the following strange format:
Traverse the tree in a clockwise order. As I pass through one side of an edge, I label it incrementally. Then I want to save for each edge the labels of its two sides.
For example, a star will become a list [(0,1),(2,3),(4,5),...] and a path with 3 vertices will be [(0,3),(1,2)].
I am stumped with implementing this. How can this be done? I can use any library.
I'll answer this without reference to any library.
You would need to perform a depth-first traversal, and log the (global) incremental number before you visit a subtree, and also after you visited it. Those two numbers make up the tuple that you have to prepend to the result you get from the subtree traversal.
Here is an implementation that needs the graph to be represented as an adjacency list. The main function takes the root node and the adjacency list:
def iter_naturals():  # helper function to produce sequential numbers
    n = 0
    while True:
        yield n
        n += 1
def half_edges(root, adj):
    visited = set()
    sequence = iter_naturals()
    def dfs(node):
        result = []
        visited.add(node)
        for child in adj[node]:
            if child not in visited:
                forward = next(sequence)
                path = dfs(child)
                backward = next(sequence)
                result.extend([(forward, backward)] + path)
        return result
    return dfs(root)
Here is how you can run it for the two examples you mentioned. I have just implemented those graphs as adjacency lists, where nodes are identified by their index in that list:
Example 1: a "star":
The root is the parent of all other nodes
adj = [
[1,2,3], # 1,2,3 are children of 0
[],
[],
[]
]
print(half_edges(0, adj)) # [(0, 1), (2, 3), (4, 5)]
Example 2: a single path with 3 nodes
adj = [
[1], # 1 is a child of 0
[2], # 2 is a child of 1
[]
]
print(half_edges(0, adj)) # [(0, 3), (1, 2)]
I found this great built-in function dfs_labeled_edges in networkx. From there it is a breeze.
import networkx as nx
def get_new_encoding(G):
    dfs = [(v[0],v[1]) for v in nx.dfs_labeled_edges(G, source=1) if v[0]!=v[1] and v[2]!="nontree"]
    dfs_ind = sorted(range(len(dfs)), key=lambda k: dfs[k])
    new_tree_encoding = [(dfs_ind[i],dfs_ind[i+1]) for i in range(0,len(dfs_ind),2)]
    return new_tree_encoding
This is for my own understanding: I am trying to write code that identifies where there is a connection among nodes, sort of like a society of nodes. Basically, given a matrix and a node, it should return True if the given node has neighbors that are themselves related, and False otherwise.
I have tried using a while loop to go through sets of visited nodes, but I am still lost in the process. I feel more comfortable with for loops, in terms of understanding. A way to iterate over the rows of the matrix to find relations between nodes would be easy for me to understand and adapt.
def society(graph_matrix, node):
    for item in (graph_matrix):
        for j in item:
            if graph_matrix[item][j] and graph_matrix[item][node] and graph_matrix[j][node] == 1:
                return True
    return False
gmatrix = [ [0,1,1,1,0],
[1,0,0,1,0],
[1,0,0,0,1],
[1,1,0,0,0],
[0,0,1,0,0] ]
so if I input (society(gmatrix,0)) the answer should return True, as when you look at node 0 you can see its connection to node 1 and node 3, and node 1 is connected to node 3 as can be observed in the gmatrix matrix. sorta like a society of nodes. I am
however, society(gmatrix,2) should return False, node 2 is connected to 0, and 4 but 0 and 4 are not connected.
In your code, in for item in (graph_matrix):, item is a row of the matrix, i.e. a list of numbers.
You cannot use a list of numbers as a matrix index, as in graph_matrix[item][node].
As far as I understand your problem, you want to know whether three nodes are interconnected. To do this, you can modify your code in the following way:
def society(graph_matrix, node):
    for i in range(len(graph_matrix[node])):
        for j in range(len(graph_matrix[node])):
            if graph_matrix[node][i] and graph_matrix[node][j] and graph_matrix[i][j] == 1:
                return True
    return False
gmatrix = [ [0,1,1,1,0],
[1,0,0,1,0],
[1,0,0,0,1],
[1,1,0,0,0],
[0,0,1,0,0] ]
print(society(gmatrix, 0))
Here, len(graph_matrix[node]) will return the length of graph_matrix[node] and range(len(graph_matrix[node])) will iterate from 0 to length-1.
I think that having your graph in matrix form makes this harder to think about than it needs to be. Converting the rows of edge flags into lists of connected nodes would make things easier (and, as a bonus, reduce the computational load when society() returns False, which matters more as the number of nodes increases):
def to_map(gmatrix):
    return [[k for k,v in enumerate(edges) if v] for edges in gmatrix]
Then you'd be able to do:
def society(graph_map, node):
    for n in graph_map[node]:
        if n == node:
            continue
        for nn in graph_map[n]:
            if nn != node and nn != n and nn in graph_map[node]:
                return True
    return False
As in:
gmatrix = [ [0,1,1,1,0],
[1,0,0,1,0],
[1,0,0,0,1],
[1,1,0,0,0],
[0,0,1,0,0] ]
gmap = to_map(gmatrix)
print(society(gmap,0)) # True
print(society(gmap,2)) # False
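If the neighbor lists are stored as sets, the inner loop can be replaced by a set intersection. The following is only a sketch of that variation; to_neighbor_sets and society_sets are names invented for this example.

```python
def to_neighbor_sets(gmatrix):
    """Turn each row of 0/1 flags into the set of connected node indices."""
    return [{k for k, v in enumerate(row) if v} for row in gmatrix]

def society_sets(neighbor_sets, node):
    """True if two distinct neighbors of node are connected to each other."""
    mine = neighbor_sets[node] - {node}          # ignore a self-loop on node
    # for each neighbor n, look for a common neighbor other than n itself
    return any((neighbor_sets[n] & mine) - {n} for n in mine)

gmatrix = [[0, 1, 1, 1, 0],
           [1, 0, 0, 1, 0],
           [1, 0, 0, 0, 1],
           [1, 1, 0, 0, 0],
           [0, 0, 1, 0, 0]]
gsets = to_neighbor_sets(gmatrix)
print(society_sets(gsets, 0))  # True
print(society_sets(gsets, 2))  # False
```

The membership test nn in graph_map[node] scans a list each time, whereas the set intersection does the same check with hash lookups.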
Problem
I have a list of line segments:
exampleLineSegments = [(1,2),(2,3),(3,4),(4,5),(5,6),(4,7),(8,7)]
These segments include the indices of the corresponding point in a separate array.
From this sublist, one can see that there is a branching point (4). So three different branches are emerging from this branching point.
(In other, more specific problems, there might be / are multiple branching points for n branches.)
Target
My target is to get a dictionary including information about the existing branches, so e.g.:
result = { branch_1: [1,2,3,4],
branch_2: [4,5,6],
branch_3: [4,7,8]}
Current state of work/problems
Currently, I identify the branch points first by building a dictionary of neighbors for each point and checking each entry for more than 2 neighbor points; such a point is a branching point.
Afterwards I crawl through all points emerging from these branch points, checking for successors etc.
These functions contain quite a few for loops and generally do intensive "crawling". This is not the cleanest solution, and as the number of points increases, the performance is not good either.
Question
What is the best / fastest / most performant way to achieve the target in this case?
I think you can achieve it with the following steps:
use a neighbors dict to store the graph
find all branch points, i.e. points with more than 2 neighbors
start from every branch point, and use DFS to find all the paths
from collections import defaultdict
def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)
    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]
    res = []
    def dfs(cur, prev, path):
        # reach the leaf
        if len(neighbors[cur]) == 1:
            res.append(path)
            return
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                dfs(neighbor, cur, path + [neighbor])
    # start from all the branch points
    for branch_point in branch_points:
        dfs(branch_point, None, [branch_point])
    return res
Update: an iterative version for big data, where the recursive one may hit the recursion depth limit:
def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)
    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]
    res = []
    # iterative DFS with an explicit stack
    stack = [(bp, None, [bp]) for bp in branch_points]
    while stack:
        cur, prev, path = stack.pop()
        # stop at a leaf or at another branch point ("is not None" so that a point with index 0 also counts)
        if len(neighbors[cur]) == 1 or (prev is not None and cur in branch_points):
            res.append(path)
            continue
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                stack.append((neighbor, cur, path + [neighbor]))
    return res
test and output:
print(find_branch_paths([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# output:
# [[4, 3, 2, 1], [4, 5, 6], [4, 7, 8]]
Hope that helps you, and comment if you have further questions. : )
UPDATE: if there are many branch points, the number of paths will grow exponentially. So if you only want distinct segments, you can end the path when you encounter another branch point.
Change this line
if len(neighbors[cur]) == 1:
to
if len(neighbors[cur]) == 1 or (prev is not None and cur in branch_points):
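To get the dictionary form asked for in the question, one possible sketch walks each branch outward from a branch point until it reaches a leaf or another branch point, and records each segment only once. The function name find_branches and the keys branch_1, branch_2, ... are illustrative choices, and the sketch assumes the point indices can be sorted and that no segment is listed twice.

```python
from collections import defaultdict

def find_branches(segments):
    """Split the graph into segments that run between branch points and leaves."""
    neighbors = defaultdict(list)
    for p1, p2 in segments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)
    branch_points = {p for p, ns in neighbors.items() if len(ns) > 2}

    branches = []
    seen = set()  # (start, first step) pairs already covered from the other end
    for start in sorted(branch_points):      # sorted only for a stable order
        for first in neighbors[start]:
            if (start, first) in seen:
                continue
            path, prev, cur = [start, first], start, first
            # follow the chain while the current point has exactly two neighbors
            while len(neighbors[cur]) == 2:
                a, b = neighbors[cur]
                nxt = a if a != prev else b
                path.append(nxt)
                prev, cur = cur, nxt
            seen.add((cur, prev))            # skip the reverse walk later
            branches.append(path)
    return {f"branch_{k}": path for k, path in enumerate(branches, start=1)}

print(find_branches([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# {'branch_1': [4, 3, 2, 1], 'branch_2': [4, 5, 6], 'branch_3': [4, 7, 8]}
```

Each segment is walked once from one of its ends, so the work should stay proportional to the number of points plus segments. A graph without any branch point yields an empty dictionary in this sketch.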