Python basic spanning tree algorithm - python

I cannot figure out how to implement a basic spanning tree in Python; an un-weighted spanning tree.
I've learned how to implement an adjacency list:
for edge in adj1:
x, y = edge[int(0)], edge[int(1)]
if x not in adj2: adj2[x] = set()
if y not in adj2: adj2[y] = set()
adj2[x].add(y)
adj2[y].add(x)
print(adj2)
But I don't know how to implement "finding nearest unconnected vertex".

You don't say which spanning-tree algorithm you need to use. DFS? BFS? Prim's with constant weights? Kruskal's with constant weights? Also, what do you mean by "nearest" unconnected vertex, since the graph is unweighted? All the vertices adjacent to a given vertex v will be at the same distance from v. Finally, are you supposed to start at an arbitrary vertex? at 0? at a user-specified vertex? I'll assume 0 and try to push you in the right direction.
You need to have some way to represent which vertices are already in the spanning tree, so that you don't re-add them to the tree along another edge. That would form a cycle, which can't happen in a tree. Since you definitely need a way to represent the tree, like the [ [1], [0,2,3], [1], [1] ] example you gave, one way to start things out is with [ [], [], [], [] ].
You also need to make sure that you eventually connect everything to a single tree, rather than, for example, finishing with two trees that, taken together, cover all the vertices, but that aren't connected to each other via any edge. There are two ways to do this: (1) Start with a single vertex and grow the tree incrementally until it covers all the nodes, so that you never have more than one tree. (2) Add edges in some other way, keeping track of the connected components, so that you can make sure eventually to connect all the components. Unless you know you need (2), I suggest sticking with (1).
Getting around to your specific question: If you have the input inputgraph = [[1,2],[0,2,3],[0,1],[1]], and you start with currtree = [ [], [], [], [] ], and you start at vertex 0, then you look at inputgraph[0], discover that 0 is adjacent to 1 and 2, and pick either (0,1) or (0,2) to be an edge in the tree. Let's say the algorithm you are using tells you to pick (0,1). Then you update currtree to be [ [1], [0], [], [] ]. Your algorithm then directs you either to pick another edge at 0, which in this inputgraph would have to be (0,2), or directs you to pick an edge at 1, which could be (1,2) or (1,3). Note that your algorithm has to keep track of the fact that (1,0) is not an acceptable choice at 1, since (0,1) is already in the tree.
You need to use one of the algorithms I listed above, or some other spanning-tree algorithm, to be systematic about which vertex to examine next, and which edge to pick next.
I hope that gave you an idea of the issues you have to consider and how you can map an abstract algorithm description to running code. I leave learning the algorithm to you!

Related

How to find trio in list of lists? [duplicate]

I am working with complex networks. I want to find group of nodes which forms a cycle of 3 nodes (or triangles) in a given graph. As my graph contains about million edges, using a simple iterative solution (multiple "for" loop) is not very efficient.
I am using python for my programming, if these is some inbuilt modules for handling these problems, please let me know.
If someone knows any algorithm which can be used for finding triangles in graphs, kindly reply back.
Assuming its an undirected graph, the answer lies in networkx library of python.
if you just need to count triangles, use:
import networkx as nx
tri=nx.triangles(g)
But if you need to know the edge list with triangle (triadic) relationship, use
all_cliques= nx.enumerate_all_cliques(g)
This will give you all cliques (k=1,2,3...max degree - 1)
So, to filter just triangles i.e k=3,
triad_cliques=[x for x in all_cliques if len(x)==3 ]
The triad_cliques will give a edge list with only triangles.
A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.
I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.
For example:
nodes = {}
nodes[0] = 1,2
nodes[1] = tuple() # empty tuple
nodes[2] = 1
My solution:
def generate_triangles(nodes):
"""Generate triangles. Weed out duplicates."""
visited_ids = set() # remember the nodes that we have tested already
for node_a_id in nodes:
for node_b_id in nodes[node_a_id]:
if nod_b_id == node_a_id:
raise ValueError # nodes shouldn't point to themselves
if node_b_id in visited_ids:
continue # we should have already found b->a->??->b
for node_c_id in nodes[node_b_id]:
if node_c_id in visited_ids:
continue # we should have already found c->a->b->c
if node_a_id in nodes[node_c_id]:
yield(node_a_id, node_b_id, node_c_id)
visited_ids.add(node_a_id) # don't search a - we already have all those cycles
Checking performance:
from random import randint
n = 1000000
node_list = range(n)
nodes = {}
for node_id in node_list:
node = tuple()
for i in range(randint(0,10)): # add up to 10 neighbors
try:
neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node
except:
continue
if not neighbor_id in node:
node = node + (neighbor_id,)
nodes[node_id] = node
cycles = list(generate_triangles(nodes))
print len(cycles)
When I tried it, it took longer to build the random graph than to count the cycles.
You might want to test it though ;) I won't guarantee that it's correct.
You could also look into networkx, which is the big python graph library.
Pretty easy and clear way to do is to use Networkx:
With Networkx you can get the loops of an undirected graph by nx.cycle_basis(G) and then select the ones with 3 nodes
cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3]
or you can find all the cliques by find_cliques(G) and then select the ones you want (with 3 nodes). cliques are sections of the graph where all the nodes are connected to each other which happens in cycles/loops with 3 nodes.
Even though it isn't efficient, you may want to implement a solution, so use the loops. Write a test so you can get an idea as to how long it takes.
Then, as you try new approaches you can do two things:
1) Make certain that the answer remains the same.
2) See what the improvement is.
Having a faster algorithm that misses something is probably going to be worse than having a slower one.
Once you have the slow test, you can see if you can do this in parallel and see what the performance increase is.
Then, you can see if you can mark all nodes that have less than 3 vertices.
Ideally, you may want to shrink it down to just 100 or so first, so you can draw it, and see what is happening graphically.
Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.
I don't want to sound harsh, but have you tried to Google it? The first link is a pretty quick algorithm to do that:
http://www.mail-archive.com/algogeeks#googlegroups.com/msg05642.html
And then there is this article on ACM (which you may have access to):
http://portal.acm.org/citation.cfm?id=244866
(and if you don't have access, I am sure if you kindly ask the lady who wrote it, you will get a copy.)
Also, I can imagine a triangle enumeration method based on clique-decomposition, but I don't know if it was described somewhere.
I am working on the same problem of counting number of triangles on undirected graph and wisty's solution works really well in my case. I have modified it a bit so only undirected triangles are counted.
#### function for counting undirected cycles
def generate_triangles(nodes):
visited_ids = set() # mark visited node
for node_a_id in nodes:
temp_visited = set() # to get undirected triangles
for node_b_id in nodes[node_a_id]:
if node_b_id == node_a_id:
raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition
if node_b_id in visited_ids:
continue
for node_c_id in nodes[node_b_id]:
if node_c_id in visited_ids:
continue
if node_c_id in temp_visited:
continue
if node_a_id in nodes[node_c_id]:
yield(node_a_id, node_b_id, node_c_id)
else:
continue
temp_visited.add(node_b_id)
visited_ids.add(node_a_id)
Of course, you need to use a dictionary for example
#### Test cycles ####
nodes = {}
nodes[0] = [1, 2, 3]
nodes[1] = [0, 2]
nodes[2] = [0, 1, 3]
nodes[3] = [1]
cycles = list(generate_triangles(nodes))
print cycles
Using the code of Wisty, the triangles found will be
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]
which counted the triangle (0, 1, 2) and (0, 2, 1) as two different triangles. With the code I modified, these are counted as only one triangle.
I used this with a relatively small dictionary of under 100 keys and each key has on average 50 values.
Surprised to see no mention of the Networkx triangles function. I know it doesn't necessarily return the groups of nodes that form a triangle, but should be pretty relevant to many who find themselves on this page.
nx.triangles(G) # list of how many triangles each node is part of
sum(nx.triangles(G).values())/3 # total number of triangles
An alternative way to return clumps of nodes would be something like...
for u,v,d in G.edges(data=True):
u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes
v_array = adj_m.getrow(v).nonzero()[1]
# find the intersection of the two sets - these are the third node of the triangle
np.intersect1d(v_array,u_array)
If you don't care about multiple copies of the same triangle in different order then a list of 3-tuples works:
from itertools import combinations as combos
[(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]]
The logic here is to check each pair of neighbors of every node to see if they are connected. G[n] is a fast way to iterate over or look up neighbors.
If you want to get rid of reorderings, turn each triple into a frozenset and make a set of the frozensets:
set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n]) if nbr in G[nbr2])
If you don't like frozenset and want a list of sets then:
triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
triangles = set(frozenset(tri) for tri in triple_iter)
nice_triangles = [set(tri) for tri in triangles]
Do you need to find 'all' of the 'triangles', or just 'some'/'any'?
Or perhaps you just need to test whether a particular node is part of a triangle?
The test is simple - given a node A, are there any two connected nodes B & C that are also directly connected.
If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.
The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.
This is a more efficient version of Ajay M answer (I would have commented it, but I've not enough reputation).
Indeed the enumerate_all_cliques method of networkx will return all cliques in the graph, irrespectively of their length; hence looping over it may take a lot of time (especially with very dense graphs).
Moreover, once defined for triangles, it's just a matter of parametrization to generalize the method for every clique length so here's a function:
import networkx as nx
def get_cliques_by_length(G, length_clique):
""" Return the list of all cliques in an undirected graph G with length
equal to length_clique. """
cliques = []
for c in nx.enumerate_all_cliques(G) :
if len(c) <= length_clique:
if len(c) == length_clique:
cliques.append(c)
else:
return cliques
# return empty list if nothing is found
return cliques
To get triangles just use get_cliques_by_length(G, 3).
Caveat: this method works only for undirected graphs. Algorithm for cliques in directed graphs are not provided in networkx
i just found that nx.edge_disjoint_paths works to count the triangle contains certain edges. faster than nx.enumerate_all_cliques and nx.cycle_basis.
It returns the edges disjoint paths between source and target.Edge disjoint paths are paths that do not share any edge.
And result-1 is the number of triangles that contain certain edges or between source node and target node.
edge_triangle_dict = {}
for i in g.edges:
edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))-1)

Find all strongly connected elements

I have a list which looks like that:
elements=[(1,2),(1,3),(2,3),(3,4),(4,5),(3,5),(5,6),(12,13)]
I want to have all elenents which are strongly connected to be listed.
For the given list, elements would be: [[1,2,3],[3,4,5]]
Please suggest how I can do that
Please use Kosaraju's algorithm to find strongly connected components in any graph.
I think you may find it on geeksforgeeks website. It goes something like this.
Let's say there are 5 nodes, 0 through 4.
0,1,2 are strongly connected, 3 and 4 are strongly connected. 3 connects to say 0.
Then the algorithm goes : (source : https://www.geeksforgeeks.org/strongly-connected-components/)
1) Create an empty stack ā€˜Sā€™ and do DFS traversal of a graph. In DFS traversal, after calling recursive DFS for adjacent vertices of a vertex, push the vertex to stack. In the above graph, if we start DFS from vertex 0, we get vertices in stack as 1, 2, 4, 3, 0.
2) Reverse directions of all arcs to obtain the transpose graph.
3) One by one pop a vertex from S while S is not empty. Let the popped vertex be ā€˜vā€™. Take v as source and do DFS (call DFSUtil(v)). The DFS starting from v prints strongly connected component of v. In the above example, we process vertices in order 0, 3, 4, 2, 1 (One by one popped from stack).
Edit : SCC for undirected graphs doesn't make sense. Theoretically SCCs are defined for directed graphs alone.

Graph Theory: Finding all possible paths of 'n' length (with some constraints)

I was given a sample challenge question by one of my friends. I would like to hear some advice on how to best approach finding a solution.
The problem involves calculating all possible patterns of traversing a series of points on a grid-like scale. I will simply be given a number 'n' that represents how many times I must move, and I must determine the number of ways I can traverse the grid moving n times. The starting point can be any of the points so I must run my calculation on every starting point with my answer being the sum of the results of each starting point.
I am still a bit of a beginner is some regards to programming, and my best guess as to how to approach this problem is to use graph theory. I have started by creating a graph to represent the nodes as well as their neighbors. I am leaving this problem intentionally vague because I want to learn how to approach these kinds of problems rather than having some expert swoop in and simply solve the entire thing for me. Here is an example representation of my graph in Python 3 (a dictionary).
graph = {'a':['b','c']
'b':['a','e','f']
'c':['a','d']
'd':['c']
'e':['b','g'] and etc.
My real graph is significantly bigger with each node typically having at least 3-4 neighbors. Let's pretend the 'n' given is 6, meaning I need to return all possible valid paths that involve moving 6 times. I am allowed to revisit nodes, so a valid path could simply be a-b-a-b-a-b. Another example of a 'valid' path is a-b-a-c-d-c or e-b-a-c-a-b since we can start from any starting point.
I am at a bit of a loss as to how to best approach this problem. Recursion has crossed my mind as a possible solution where I traverse all possible paths and increment a counter each time I hit the 'end' of a path. Another possible solution I have considered is at each node, calculate the possible moves and multiply it with a running tally. For example, starting at 'a', I have two moves. If I navigate to 'b', I have 3 moves. If I navigate to 'c', I have 2 moves. At this point, I have 1*3*2 moves. This could be a completely wrong approach...just an idea I had.
The actual problem is a lot more complex with constraints for certain nodes (how many times you can visit it, rules against visiting it if a certain sequence of nodes were hit prior, etc.) but I will omit the details for now. What I will say is that given these constraints, my algorithm must know what the previous pattern of visited nodes was. At the 5th move, for example, I must be able to refer to the previous 4 moves at any time.
I would love to hear advice on how you would best approach solving the 'simpler' problem I outlined above.
Check out Depth First Search (DFS). Just off the top of my head: Use recursive DFS, use a counter for saving each node found after making 'n' moves. You would need to build an undirected graph representation of the given data, so that you could run the DFS algorithm on the graph.
Here's the simplest answer for the case you gave. Once you have you graph in the form of a "transition map" (which can just be a dictionary, like you've shown), then the following code will work:
def myDFS(trans_dict,start,length,paths,path=[]):
path=path+[start]
if len(path)==length:
paths.append(path)
else:
for node in trans_dict[start]:
myDFS(trans_dict,node,length,paths,path)
If you want the number of ways you can traverse the map with a path of a given length, then that would just be len(paths).
Example:
trans_dict = {0:[1,2],1:[2,3],2:[0,3],3:[3]}
paths = []
length = 3
for a in trans_dict:
myDFS(trans_dict,a,length,paths)
print paths # [[0, 1, 2], [0, 1, 3], [0, 2, 0], [0, 2, 3], [1, 2, 0], [1, 2, 3], [1, 3, 3], [2, 0, 1], [2, 0, 2], [2, 3, 3], [3, 3, 3]]
print len(paths) # 11
Answer was inspired by this Q&A: trying to find all the path in a graph using DFS recursive in Python

How to restrict certain paths in NetworkX graphs?

I am trying to calculate shortest path between 2 points using Dijkstra and A Star algorithms (in a directed NetworkX graph).
At the moment it works fine and I can see the calculated path but I would like to find a way of restricting certain paths.
For example if we have following nodes:
nodes = [1,2,3,4]
With these edges:
edges = ( (1,2),(2,3),(3,4) )
Is there a way of blocking/restricting 1 -> 2 -> 3 but still allow 2 -> 3 & 1 -> 2.
This would mean that:
can travel from 1 to 2
can travel from 2 to 3
cannot travel from 1 to 3 .. directly or indirectly (i.e. restrict 1->2->3 path).
Can this be achieved in NetworkX.. if not is there another graph library in Python that would allow this ?
Thanks.
Interesting question, I never heard of this problem, probably because I don't have much background in this topic, nor much experience with NetworkX. However, I do have a idea for a algorithm. This may just be the most naive way to do this and I'd be glad to hear of a cleverer algorithm.
The idea is that you can use your restriction rules to transform you graph to a new graph where all edges are valid, using the following algorithm.
The restriction of path (1,2,3) can be split in two rules:
If you came over (1,2) then remove (2,3)
If you leave over (2,3) then remove (1,2)
To put this in the graph you can insert copies of node 2 for each case. I'll call the new nodes 1_2 and 2_3 after the valid edge in the respective case. For both nodes, you copy all incoming and outgoing edges minus the restricted edge.
For example:
Nodes = [1,2,3,4]
Edges = [(1,2),(2,3),(4,2)]
The valid path shall only be 4->2->3 not 1->2->3. So we expand the graph:
Nodes = [1,1_2,2_3,3,4] # insert the two states of 2
Edges = [ # first case: no (1_2,3) because of the restriction
(1,1_2), (4, 1_2)
# 2nd case, no (1,2_3)
(2_3,3), (4,2_3)]
The only valid path in this graph is 4->2_3->3. This simply maps to 4->2->3 in the original graph.
I hope this answer can at least help you if you find no existing solution. Longer restriction rules would blow up the graph with a exponentially growing number of state nodes, so either this algorithm is too simple, or the problem is hard ;-)
You could set your node data {color=['blue']} for node 1, node 2 has {color=['red','blue']} and node3 has {color=['red']}. Then use an networkx.algorithms. astar_path() approach setting the
heuristic is set to a function which returns a might_as_well_be_infinity when it encountered an node without the same color you are searching for
weight=less_than_infinity.

Finding cycle of 3 nodes ( or triangles) in a graph

I am working with complex networks. I want to find group of nodes which forms a cycle of 3 nodes (or triangles) in a given graph. As my graph contains about million edges, using a simple iterative solution (multiple "for" loop) is not very efficient.
I am using python for my programming, if these is some inbuilt modules for handling these problems, please let me know.
If someone knows any algorithm which can be used for finding triangles in graphs, kindly reply back.
Assuming its an undirected graph, the answer lies in networkx library of python.
if you just need to count triangles, use:
import networkx as nx
tri=nx.triangles(g)
But if you need to know the edge list with triangle (triadic) relationship, use
all_cliques= nx.enumerate_all_cliques(g)
This will give you all cliques (k=1,2,3...max degree - 1)
So, to filter just triangles i.e k=3,
triad_cliques=[x for x in all_cliques if len(x)==3 ]
The triad_cliques will give a edge list with only triangles.
A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.
I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.
For example:
nodes = {}
nodes[0] = 1,2
nodes[1] = tuple() # empty tuple
nodes[2] = 1
My solution:
def generate_triangles(nodes):
"""Generate triangles. Weed out duplicates."""
visited_ids = set() # remember the nodes that we have tested already
for node_a_id in nodes:
for node_b_id in nodes[node_a_id]:
if nod_b_id == node_a_id:
raise ValueError # nodes shouldn't point to themselves
if node_b_id in visited_ids:
continue # we should have already found b->a->??->b
for node_c_id in nodes[node_b_id]:
if node_c_id in visited_ids:
continue # we should have already found c->a->b->c
if node_a_id in nodes[node_c_id]:
yield(node_a_id, node_b_id, node_c_id)
visited_ids.add(node_a_id) # don't search a - we already have all those cycles
Checking performance:
from random import randint
n = 1000000
node_list = range(n)
nodes = {}
for node_id in node_list:
node = tuple()
for i in range(randint(0,10)): # add up to 10 neighbors
try:
neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node
except:
continue
if not neighbor_id in node:
node = node + (neighbor_id,)
nodes[node_id] = node
cycles = list(generate_triangles(nodes))
print len(cycles)
When I tried it, it took longer to build the random graph than to count the cycles.
You might want to test it though ;) I won't guarantee that it's correct.
You could also look into networkx, which is the big python graph library.
Pretty easy and clear way to do is to use Networkx:
With Networkx you can get the loops of an undirected graph by nx.cycle_basis(G) and then select the ones with 3 nodes
cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3]
or you can find all the cliques by find_cliques(G) and then select the ones you want (with 3 nodes). cliques are sections of the graph where all the nodes are connected to each other which happens in cycles/loops with 3 nodes.
Even though it isn't efficient, you may want to implement a solution, so use the loops. Write a test so you can get an idea as to how long it takes.
Then, as you try new approaches you can do two things:
1) Make certain that the answer remains the same.
2) See what the improvement is.
Having a faster algorithm that misses something is probably going to be worse than having a slower one.
Once you have the slow test, you can see if you can do this in parallel and see what the performance increase is.
Then, you can see if you can mark all nodes that have less than 3 vertices.
Ideally, you may want to shrink it down to just 100 or so first, so you can draw it, and see what is happening graphically.
Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.
I don't want to sound harsh, but have you tried to Google it? The first link is a pretty quick algorithm to do that:
http://www.mail-archive.com/algogeeks#googlegroups.com/msg05642.html
And then there is this article on ACM (which you may have access to):
http://portal.acm.org/citation.cfm?id=244866
(and if you don't have access, I am sure if you kindly ask the lady who wrote it, you will get a copy.)
Also, I can imagine a triangle enumeration method based on clique-decomposition, but I don't know if it was described somewhere.
I am working on the same problem of counting number of triangles on undirected graph and wisty's solution works really well in my case. I have modified it a bit so only undirected triangles are counted.
#### function for counting undirected cycles
def generate_triangles(nodes):
visited_ids = set() # mark visited node
for node_a_id in nodes:
temp_visited = set() # to get undirected triangles
for node_b_id in nodes[node_a_id]:
if node_b_id == node_a_id:
raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition
if node_b_id in visited_ids:
continue
for node_c_id in nodes[node_b_id]:
if node_c_id in visited_ids:
continue
if node_c_id in temp_visited:
continue
if node_a_id in nodes[node_c_id]:
yield(node_a_id, node_b_id, node_c_id)
else:
continue
temp_visited.add(node_b_id)
visited_ids.add(node_a_id)
Of course, you need to use a dictionary for example
#### Test cycles ####
nodes = {}
nodes[0] = [1, 2, 3]
nodes[1] = [0, 2]
nodes[2] = [0, 1, 3]
nodes[3] = [1]
cycles = list(generate_triangles(nodes))
print cycles
Using the code of Wisty, the triangles found will be
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]
which counted the triangle (0, 1, 2) and (0, 2, 1) as two different triangles. With the code I modified, these are counted as only one triangle.
I used this with a relatively small dictionary of under 100 keys and each key has on average 50 values.
Surprised to see no mention of the Networkx triangles function. I know it doesn't necessarily return the groups of nodes that form a triangle, but should be pretty relevant to many who find themselves on this page.
nx.triangles(G) # list of how many triangles each node is part of
sum(nx.triangles(G).values())/3 # total number of triangles
An alternative way to return clumps of nodes would be something like...
for u,v,d in G.edges(data=True):
u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes
v_array = adj_m.getrow(v).nonzero()[1]
# find the intersection of the two sets - these are the third node of the triangle
np.intersect1d(v_array,u_array)
If you don't care about multiple copies of the same triangle in different order then a list of 3-tuples works:
from itertools import combinations as combos
[(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]]
The logic here is to check each pair of neighbors of every node to see if they are connected. G[n] is a fast way to iterate over or look up neighbors.
If you want to get rid of reorderings, turn each triple into a frozenset and make a set of the frozensets:
set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n]) if nbr in G[nbr2])
If you don't like frozenset and want a list of sets then:
triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
triangles = set(frozenset(tri) for tri in triple_iter)
nice_triangles = [set(tri) for tri in triangles]
Do you need to find 'all' of the 'triangles', or just 'some'/'any'?
Or perhaps you just need to test whether a particular node is part of a triangle?
The test is simple - given a node A, are there any two connected nodes B & C that are also directly connected.
If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.
The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.
This is a more efficient version of Ajay M answer (I would have commented it, but I've not enough reputation).
Indeed the enumerate_all_cliques method of networkx will return all cliques in the graph, irrespectively of their length; hence looping over it may take a lot of time (especially with very dense graphs).
Moreover, once defined for triangles, it's just a matter of parametrization to generalize the method for every clique length so here's a function:
import networkx as nx
def get_cliques_by_length(G, length_clique):
""" Return the list of all cliques in an undirected graph G with length
equal to length_clique. """
cliques = []
for c in nx.enumerate_all_cliques(G) :
if len(c) <= length_clique:
if len(c) == length_clique:
cliques.append(c)
else:
return cliques
# return empty list if nothing is found
return cliques
To get triangles just use get_cliques_by_length(G, 3).
Caveat: this method works only for undirected graphs. Algorithm for cliques in directed graphs are not provided in networkx
i just found that nx.edge_disjoint_paths works to count the triangle contains certain edges. faster than nx.enumerate_all_cliques and nx.cycle_basis.
It returns the edges disjoint paths between source and target.Edge disjoint paths are paths that do not share any edge.
And result-1 is the number of triangles that contain certain edges or between source node and target node.
edge_triangle_dict = {}
for i in g.edges:
edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))-1)

Categories

Resources