Remove reversible edges from directed graph networkx - python

Is there a way to remove reversible edges from a graph? For instance, take the following graph:
import networkx as nx
G=nx.DiGraph()
G.add_edge(1,2)
G.add_edge(2,3)
G.add_edge(2,1)
G.add_edge(3,1)
print (G.edges())
[(1, 2), (2, 3), (2,1), (3,1)]
I want to remove (2,1) and (3,1), since I want the graph to be directed in just one direction. I know you can remove self-loops with G.remove_edges_from(nx.selfloop_edges(G)), but that's not the case here. The output I am looking for would be [(1, 2), (2, 3)]. Is there a way to remove these edges once the graph is created, either with networkx or with another graph tool such as Cytoscape?

Method 1:
remove duplicate entries in edgelist -> remove everything from graph -> add back single edges => graph with single edges
Edges are stored as tuples. Converting each edge to a set temporarily discards its direction, so (1, 2) and (2, 1) become the same object. Collecting the results in an outer set then drops the duplicates. Note that the direction of each surviving edge depends on how the set happens to order its two nodes. After conversion back to a list, you have your list of edges with duplicate entries removed, like so:
stripped_list = list(set([tuple(set(edge)) for edge in G.edges()]))
Then remove all edges from the graph that are currently there, and add back those that are in the list just created:
G.remove_edges_from([e for e in G.edges()])
G.add_edges_from(stripped_list)
Method 2:
find duplicate edges -> remove only those from graph => graph with single edges
Again, losing positional information via conversion to sets:
set_list = [set(a) for a in G.edges()] # collect all edges, lose positional information
remove_list = [] # initialise
for i in range(len(set_list)):
    edge = set_list.pop(0) # look at zeroth element in list:
    # if there is still an edge like the current one in the list,
    # add the current edge to the remove list:
    if set_list.count(edge) > 0:
        u,v = edge
        # add the reversed edge
        remove_list.append((v, u))
        # alternatively, add the original edge:
        # remove_list.append((u, v))
G.remove_edges_from(remove_list) # remove all edges collected above
As far as I know, networkx does not store the order that edges were added in, so unless you want to write further logic, you either remove all duplicate edges going from nodes with lower number to nodes with higher number, or the other way round.
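As a rough sketch of the "further logic" mentioned above, the following keeps whichever edge of a mutual pair is encountered first while iterating over G.edges() and drops its reverse. The helper name drop_reverse_edges is made up for illustration, and which direction survives depends on networkx's iteration order, so treat it as an assumption rather than a guarantee. Edges without a reverse counterpart, such as (3, 1) in the question's graph, are left alone.

```python
import networkx as nx

def drop_reverse_edges(G):
    """Return a copy of G where each mutual pair u->v / v->u keeps only
    the direction that is encountered first while iterating G.edges()."""
    H = G.copy()
    for u, v in list(G.edges()):
        # Only act if both directions are still present in the copy.
        if u != v and H.has_edge(u, v) and H.has_edge(v, u):
            H.remove_edge(v, u)
    return H

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (2, 1), (3, 1)])
H = drop_reverse_edges(G)
print(sorted(H.edges()))  # [(1, 2), (2, 3), (3, 1)]
```

The original graph is not modified, which makes it easy to compare the two edge lists afterwards.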

Related

Add position to a NetworkX graph using dictionary

I have a DiGraph object called G1, a graph network with edges. G1 is composed of a list of nodes, and I want to give them coordinates stored in a Python dictionary.
This is the list of nodes:
LIST OF NODES
For every node I have built a python dictionary with node's name as keys and a tuple of coordinates as values:
DICTIONARY WITH COORDINATES
I want to add the attribute position (pos) to every node with those coordinates.
At the moment I have tried using this loop:
FOR LOOP TO ADD COORDINATES
But as a result only the last node appears to have coordinates; it seems the data are being overwritten with this method:
ERROR
The result should be a graph network plotted on a xy space with the right coordinates obtained with the code:
PLOT THE GRAPH
I am obtaining the following error:
KeyError: (78.44, 88.3)
I think it's mostly syntax errors here. Try this:
for node, pos in avg.items():
    G1.nodes[node]['pos'] = pos
Then, when building your visual, build your dictionary of positions beforehand like this and pass it as pos=pos when calling nx.draw().
pos = {node: G1.nodes[node]['pos'] for node in G1.nodes()}
**** EDIT: ****
There's a built-in way to do this that is much cleaner:
nx.set_node_attributes(G1, avg, 'pos')
Then, when drawing, use:
nx.draw(G1, pos=nx.get_node_attributes(G1, 'pos'))
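A minimal sketch of the built-in route, with made-up node names and coordinates standing in for the asker's data (the real contents of avg are not shown in the question). It only checks that every node ends up with its own position; the drawing call is left out so the snippet runs without matplotlib.

```python
import networkx as nx

# Hypothetical stand-ins for the asker's graph and coordinate dictionary.
G1 = nx.DiGraph()
G1.add_edges_from([("a", "b"), ("b", "c")])
avg = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (78.44, 88.3)}

# Attach each tuple to its node under the attribute name 'pos'.
nx.set_node_attributes(G1, avg, "pos")

# Read the positions back as a {node: (x, y)} dictionary,
# which is the shape nx.draw expects for its pos argument.
pos = nx.get_node_attributes(G1, "pos")
print(pos)
```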

How to find trio in list of lists? [duplicate]

I am working with complex networks. I want to find groups of nodes that form a cycle of 3 nodes (triangles) in a given graph. As my graph contains about a million edges, a simple iterative solution (multiple "for" loops) is not very efficient.
I am programming in Python; if there are any built-in modules for handling this problem, please let me know.
If someone knows an algorithm that can be used for finding triangles in graphs, kindly reply.
Assuming it's an undirected graph, the answer lies in the networkx library for Python.
If you just need to count triangles, use:
import networkx as nx
tri=nx.triangles(g)
But if you need the list of node triples that form triangles (triadic relationships), use
all_cliques= nx.enumerate_all_cliques(g)
This will give you all cliques of every size (k = 1, 2, 3, ...).
So, to filter just triangles, i.e. k=3,
triad_cliques=[x for x in all_cliques if len(x)==3 ]
triad_cliques is then a list of node triples, one per triangle.
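For a quick feel of what the two calls return, here is a small sketch on a hand-made four-node graph (the graph itself is invented for illustration): nx.triangles gives a per-node count, while filtering enumerate_all_cliques gives the actual node triples.

```python
import networkx as nx

# Toy undirected graph: one triangle (0, 1, 2) plus a dangling node 3.
g = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# Number of triangles each node takes part in, as a dict.
per_node = nx.triangles(g)

# Keep only the cliques of size 3, i.e. the triangles themselves.
triad_cliques = [c for c in nx.enumerate_all_cliques(g) if len(c) == 3]

print(per_node)
print(triad_cliques)
```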
A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.
I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.
For example:
nodes = {}
nodes[0] = 1,2
nodes[1] = tuple() # empty tuple
nodes[2] = (1,) # one-element tuple, so it can be iterated
My solution:
def generate_triangles(nodes):
    """Generate triangles. Weed out duplicates."""
    visited_ids = set() # remember the nodes that we have tested already
    for node_a_id in nodes:
        for node_b_id in nodes[node_a_id]:
            if node_b_id == node_a_id:
                raise ValueError # nodes shouldn't point to themselves
            if node_b_id in visited_ids:
                continue # we should have already found b->a->??->b
            for node_c_id in nodes[node_b_id]:
                if node_c_id in visited_ids:
                    continue # we should have already found c->a->b->c
                if node_a_id in nodes[node_c_id]:
                    yield(node_a_id, node_b_id, node_c_id)
        visited_ids.add(node_a_id) # don't search a - we already have all those cycles
Checking performance:
from random import randint
n = 1000000
node_list = range(n)
nodes = {}
for node_id in node_list:
    node = tuple()
    for i in range(randint(0,10)): # add up to 10 neighbors
        try:
            neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node
        except IndexError:
            continue # the index ran past the end of the list
        if neighbor_id != node_id and neighbor_id not in node: # skip self-loops and repeats
            node = node + (neighbor_id,)
    nodes[node_id] = node
cycles = list(generate_triangles(nodes))
print(len(cycles))
When I tried it, it took longer to build the random graph than to count the cycles.
You might want to test it though ;) I won't guarantee that it's correct.
You could also look into networkx, which is the big python graph library.
A pretty easy and clear way to do it is to use NetworkX:
With NetworkX you can get the loops of an undirected graph with nx.cycle_basis(G) and then select the ones with 3 nodes. Note that a cycle basis is a minimal set of independent cycles, so it is not guaranteed to contain every triangle.
cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3]
Or you can find all the maximal cliques with nx.find_cliques(G) and then select the ones you want (with 3 nodes). Cliques are sections of the graph where all the nodes are connected to each other, which is what happens in cycles/loops with 3 nodes.
Even though it isn't efficient, you may want to implement a solution, so use the loops. Write a test so you can get an idea as to how long it takes.
Then, as you try new approaches you can do two things:
1) Make certain that the answer remains the same.
2) See what the improvement is.
Having a faster algorithm that misses something is probably going to be worse than having a slower one.
Once you have the slow test, you can see if you can do this in parallel and see what the performance increase is.
Then, you can see if you can rule out all nodes that have fewer than two neighbours, since they cannot be part of a triangle.
Ideally, you may want to shrink it down to just 100 or so first, so you can draw it, and see what is happening graphically.
Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.
I don't want to sound harsh, but have you tried to Google it? The first link is a pretty quick algorithm to do that:
http://www.mail-archive.com/algogeeks#googlegroups.com/msg05642.html
And then there is this article on ACM (which you may have access to):
http://portal.acm.org/citation.cfm?id=244866
(and if you don't have access, I am sure if you kindly ask the lady who wrote it, you will get a copy.)
Also, I can imagine a triangle enumeration method based on clique-decomposition, but I don't know if it was described somewhere.
I am working on the same problem of counting number of triangles on undirected graph and wisty's solution works really well in my case. I have modified it a bit so only undirected triangles are counted.
#### function for counting undirected cycles
def generate_triangles(nodes):
    visited_ids = set() # mark visited node
    for node_a_id in nodes:
        temp_visited = set() # to get undirected triangles
        for node_b_id in nodes[node_a_id]:
            if node_b_id == node_a_id:
                raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition
            if node_b_id in visited_ids:
                continue
            for node_c_id in nodes[node_b_id]:
                if node_c_id in visited_ids:
                    continue
                if node_c_id in temp_visited:
                    continue
                if node_a_id in nodes[node_c_id]:
                    yield(node_a_id, node_b_id, node_c_id)
                else:
                    continue
            temp_visited.add(node_b_id)
        visited_ids.add(node_a_id)
Of course, you need to use a dictionary, for example:
#### Test cycles ####
nodes = {}
nodes[0] = [1, 2, 3]
nodes[1] = [0, 2]
nodes[2] = [0, 1, 3]
nodes[3] = [1]
cycles = list(generate_triangles(nodes))
print(cycles)
Using the code of Wisty, the triangles found will be
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]
which counted the triangle (0, 1, 2) and (0, 2, 1) as two different triangles. With the code I modified, these are counted as only one triangle.
I used this with a relatively small dictionary of under 100 keys and each key has on average 50 values.
Surprised to see no mention of the Networkx triangles function. I know it doesn't necessarily return the groups of nodes that form a triangle, but should be pretty relevant to many who find themselves on this page.
nx.triangles(G) # dict mapping each node to the number of triangles it is part of
sum(nx.triangles(G).values()) // 3 # total number of triangles
An alternative way to return clumps of nodes would be something like...
# assumes adj_m is a SciPy sparse adjacency matrix of G with rows indexed by node, and np is NumPy
for u,v,d in G.edges(data=True):
    u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes
    v_array = adj_m.getrow(v).nonzero()[1]
    # find the intersection of the two sets - these are the third node of the triangle
    third_nodes = np.intersect1d(v_array,u_array)
If you don't care about multiple copies of the same triangle in different order then a list of 3-tuples works:
from itertools import combinations as combos
[(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]]
The logic here is to check each pair of neighbors of every node to see if they are connected. G[n] is a fast way to iterate over or look up neighbors.
If you want to get rid of reorderings, turn each triple into a frozenset and make a set of the frozensets:
set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
If you don't like frozenset and want a list of sets then:
triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
triangles = set(frozenset(tri) for tri in triple_iter)
nice_triangles = [set(tri) for tri in triangles]
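To see the neighbor-pair idea end to end, here is a small self-contained sketch on an invented five-edge graph. The function name unique_triangles is only illustrative, and the sketch assumes an undirected graph without self-loops.

```python
from itertools import combinations
import networkx as nx

def unique_triangles(G):
    """Collect each triangle once as a frozenset of its three nodes."""
    found = set()
    for n in G:
        # Every pair of neighbors of n that is itself connected closes a triangle.
        for nbr, nbr2 in combinations(G[n], 2):
            if G.has_edge(nbr, nbr2):
                found.add(frozenset((n, nbr, nbr2)))
    return found

# Two triangles that share the edge 1-2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)])
tris = unique_triangles(G)
print(sorted(sorted(t) for t in tris))  # [[0, 1, 2], [1, 2, 3]]
```

Each triangle is discovered three times (once per corner), which is why the set of frozensets is needed to collapse the reorderings.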
Do you need to find 'all' of the 'triangles', or just 'some'/'any'?
Or perhaps you just need to test whether a particular node is part of a triangle?
The test is simple: given a node A, are there any two nodes B and C connected to A that are also directly connected to each other?
If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.
The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.
This is a more efficient version of Ajay M's answer (I would have commented on it, but I don't have enough reputation).
Indeed the enumerate_all_cliques method of networkx will return all cliques in the graph, irrespective of their size; hence looping over all of them may take a lot of time (especially with very dense graphs). Because it yields cliques in order of increasing size, the loop can stop as soon as a clique larger than the requested size appears.
Moreover, once defined for triangles, it's just a matter of parametrization to generalize the method to every clique size, so here's a function:
import networkx as nx
def get_cliques_by_length(G, length_clique):
    """ Return the list of all cliques in an undirected graph G with length
    equal to length_clique. """
    cliques = []
    for c in nx.enumerate_all_cliques(G) :
        if len(c) <= length_clique:
            if len(c) == length_clique:
                cliques.append(c)
        else:
            return cliques
    # return empty list if nothing is found
    return cliques
To get triangles just use get_cliques_by_length(G, 3).
Caveat: this method works only for undirected graphs. Algorithms for cliques in directed graphs are not provided in networkx.
I just found that nx.edge_disjoint_paths can be used to count the triangles that contain a given edge. It was faster than nx.enumerate_all_cliques and nx.cycle_basis.
It returns the edge-disjoint paths between source and target. Edge-disjoint paths are paths that do not share any edge.
The result minus 1 is then the number of triangles that contain the edge between the source node and the target node. Note that the paths may be longer than two edges, so this is exact only when every alternative path runs through a single shared neighbour; otherwise it overcounts.
edge_triangle_dict = {}
for i in g.edges:
    edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))) - 1
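If an exact per-edge count is what you are after, one possible sketch is to count the common neighbors of the two endpoints, since every shared neighbor closes exactly one triangle over that edge. This swaps in nx.common_neighbors instead of edge-disjoint paths; the toy graph and the use of frozenset keys (so either endpoint order can be used for lookup) are choices made for this illustration only.

```python
import networkx as nx

# Two triangles, (0, 1, 2) and (0, 2, 3), sharing the edge 0-2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 0)])

# For an undirected edge u-v, each common neighbor w forms the triangle u-v-w.
edge_triangles = {
    frozenset((u, v)): len(list(nx.common_neighbors(G, u, v)))
    for u, v in G.edges()
}

print(edge_triangles[frozenset((0, 2))])  # 2
print(edge_triangles[frozenset((0, 1))])  # 1
```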

How to check whether a graph is an undirected graph?

Currently, I am creating a function to check whether a graph is undirected.
My graphs are stored in the following way. This is an undirected graph of 3 nodes: 1, 2, 3.
graph = {1: {2:{...}, 3:{...}}, 2: {1:{...}, 3:{...}}, 3: {1:{...}, 2:{...}}}
The {...} represents alternating layers of the dictionaries for the connections of each node. It recurs infinitely, since the dictionaries are nested in each other.
More details about the graph:
The keys refer to the nodes, and each value is a dict containing the nodes that are connected to the key.
Example: two nodes (1, 2) with an undirected edge: graph = {1: {2: {1: {...}}}, 2: {1: {2: {...}}}}
Example2: two nodes (1, 2) with a directed edge from 1 to 2: graph = {1: {2: {}}, 2: {}}
My current way of figuring out whether a graph is undirected is to check whether the number of edges in the graph is equal to (n*(n-1))/2 (where n is the number of nodes), but this cannot differentiate between 15 directed edges and 15 undirected edges. What other way is there to confirm that my graph is undirected?
First off, I think you're abusing terminology by calling a graph with edges in both directions "undirected". In a real undirected graph, there is no notion of direction to an edge, which often means you don't need redundant direction information in the graph's representation in a computer program. What you have is a directed graph, and you want to see if it could be represented by an undirected graph, even though you're not doing so yet.
I'm not sure there's any easier way to do this than by checking every edge in the graph to see if the reversed edge also exists. This is pretty easy with your graph structure: just loop over the vertices and check that there is a returning edge for every outgoing edge:
def undirected_compatible(graph):
    for src, edges in graph.items(): # edges is dict of outgoing edges from src
        for dst, dst_edges in edges.items(): # dst_edges is dict of outgoing edges from dst
            if src not in dst_edges:
                return False
    return True
I'd note that a more typical way of describing a graph like yours would be to omit the nested dictionaries and just give a list of destinations for the edges. A fully connected 3-node graph would be:
{1: [2, 3], 2: [1, 3], 3: [1, 2]}
You can get the same information from this graph as your current one, you'd just need an extra indirection to look up the destination node in the top level graph dict, rather than having it be the value of the corresponding key in the edge container already. A version of my function above for this more conventional structure would be:
def undirected_compatible(graph):
    for src, edges in graph.items():
        for dst in edges:
            if src not in graph[dst]:
                return False
    return True
The not in test may make this slower for large graphs, since searching a list for an item is less asymptotically efficient than checking if a key is in a dictionary. If you needed the higher performance, you could use sets instead of lists, to speed up the membership tests.
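A small sketch of the set-based variant suggested above, assuming the graph is stored as a dict that maps each node to a set of destination nodes. The two sample graphs are invented for illustration, and graph.get(dst, ()) is an extra guard (not in the original) for destinations that have no entry of their own.

```python
def undirected_compatible(graph):
    """True if every edge src->dst has a matching edge dst->src."""
    return all(
        src in graph.get(dst, ())   # set membership is O(1) on average
        for src, dsts in graph.items()
        for dst in dsts
    )

symmetric = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}   # every edge has its reverse
one_way = {1: {2}, 2: set()}                     # 1->2 has no edge back

print(undirected_compatible(symmetric))  # True
print(undirected_compatible(one_way))    # False
```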

Topologically sort directed graph into buckets for disjoint sub graphs

I am looking for an algorithm which can take a graph and topologically sort it such that it produces a set of lists, each which contains the topologically sorted vertices of a disjoint subgraph.
The difficult part is merging the lists when a node depends on a node in two different lists.
Here is my incomplete code/pseudocode where graph is a dict {node: [node, node, ...]}
# Topologically sort graph into disjoint lists
sorted_subgraphs = []
while graph:
    cyclic = True
    for node, edges in list(graph.items()):
        for edge in edges:
            if edge in graph:
                break
        else:
            del graph[node]
            cyclic = False
            bucket = []
            for edge in edges:
                bucket.extend(...) # Get the list with edge in it, and remove it from sorted_subgraphs
            bucket.append(node)
            sorted_subgraphs.append(bucket)
    if cyclic:
        raise Exception('Cyclic graph')
First divide it into disjoint subgraphs using a flood fill algorithm, and then topologically sort each one.
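The two-step idea can be sketched roughly as follows, assuming graph maps each node to the list of nodes it depends on (which is how the question's loop seems to treat it). The function name sorted_components and the sample data are made up for this example, and the standard-library graphlib module (Python 3.9+) is used for the per-component sort instead of a hand-written one.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def sorted_components(graph):
    """Split graph into disjoint subgraphs and topologically sort each one.

    graph maps node -> list of nodes it depends on (assumed convention).
    """
    # Build an undirected neighbor map so the flood fill ignores direction.
    neighbours = {}
    for node, deps in graph.items():
        neighbours.setdefault(node, set())
        for dep in deps:
            neighbours[node].add(dep)
            neighbours.setdefault(dep, set()).add(node)

    seen = set()
    result = []
    for start in neighbours:
        if start in seen:
            continue
        # Flood fill: collect everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        # Sort only this component; raises graphlib.CycleError on a cycle.
        sub = {n: list(graph.get(n, ())) for n in component}
        result.append(list(TopologicalSorter(sub).static_order()))
    return result

graph = {"b": ["a"], "c": ["b"], "y": ["x"], "a": [], "x": []}
buckets = sorted_components(graph)
print(buckets)  # [['a', 'b', 'c'], ['x', 'y']]
```

Because the components are found first, no merging of partially built lists is needed.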
