Find highest weight edge(s) for a given node - python

I have a directed graph in NetworkX. The edges are weighted from 0 to 1, representing probabilities that they occurred. The network connectivity is quite high, so I want to prune the edges such that, for every node, only the highest-probability incoming edge remains.
I'm not sure how to iterate over every node and keep only the highest-weighted in_edges in the graph. Is there a NetworkX function that allows us to do this?
Here is an example of what I'd like to be able to do.
Nodes:
A, B, C, D
Edges:
A->B, weight=1.0
A->C, weight=1.0
A->D, weight=0.5
B->C, weight=0.9
B->D, weight=0.8
C->D, weight=0.9
Final Result Wanted:
A->B, weight=1.0
A->C, weight=1.0
C->D, weight=0.9
If there are two edges into a node, and they are both of the highest weight, I'd like to keep them both.

Here are some ideas:
import networkx as nx

G = nx.DiGraph()
G.add_edge('A', 'B', weight=1.0)
G.add_edge('A', 'C', weight=1.0)
G.add_edge('A', 'D', weight=0.5)
G.add_edge('B', 'C', weight=0.9)
G.add_edge('B', 'D', weight=0.8)
G.add_edge('C', 'D', weight=0.9)

print("all edges")
print(G.edges(data=True))
print("edges >= 0.9")
print([(u, v, d) for (u, v, d) in G.edges(data=True) if d['weight'] >= 0.9])
print("sorted by weight")
print(sorted(G.edges(data=True), key=lambda edge: edge[2]['weight']))

The solution I had was inspired by Aric. I used the following code:
for node in G.nodes():
    edges = G.in_edges(node, data=True)
    if len(edges) > 0:  # some nodes have zero edges going into them
        min_weight = min([edge[2]['weight'] for edge in edges])
        for edge in edges:
            if edge[2]['weight'] > min_weight:
                G.remove_edge(edge[0], edge[1])

The solution ericmjl provided does not work entirely in my program: it raises a RuntimeError, because edges are removed from the graph while the edge view returned by in_edges is still being iterated.
Moreover, it keeps the edges with the lowest probability, not the highest, as asked in the question (it removes all edges with weight greater than the minimum, when it should remove all edges with weight less than the maximum). Furthermore, it is sufficient to run the inner loop only when len(edges) > 1, because we only need to remove edges from nodes with more than one incoming edge.
The complete solution:
for node in G.nodes():
    edges = list(G.in_edges(node, data=True))  # materialize the view so edges can be removed while looping
    if len(edges) > 1:  # only nodes with more than one incoming edge need pruning
        max_weight = max(edge[2]['weight'] for edge in edges)
        for edge in edges:
            if edge[2]['weight'] < max_weight:
                G.remove_edge(edge[0], edge[1])
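Running this on the example graph from the question leaves exactly the three edges asked for:

print(sorted(G.edges(data=True)))
# [('A', 'B', {'weight': 1.0}), ('A', 'C', {'weight': 1.0}), ('C', 'D', {'weight': 0.9})]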

Related

Networkx bipartite graph edge cover

I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges between a node in the original graph and its copy in the larger graph). I thought it was because some edges had a weight of 0 (causing the vv' edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour (I use 0.1, since the minimum nonzero weight in the graph should be 1). This basically reduces the algorithm to "for each node, pick the edge that has the smallest weight", which is suboptimal.
Code:
def _min_weight_edge_cover(g: nx.Graph):
    """Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
    clone = g.copy()
    for node, bi in g.nodes(data='bipartite'):
        nd = f"{node}_copy"
        clone.nodes[node]['copy'] = False
        clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id))  # invert the bipartite flag
        minw = min([w for u, v, w in g.edges(node, data='weight')])
        clone.add_edge(node, nd, weight=(2 * minw))
    # Now clone contains both the nodes of g and their copies, and should still be bipartite
    tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
    bots = set(clone) - tops
    print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
    # Here the matching should always exist and be perfect
    matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
    cover = g.copy()
    cover.clear_edges()
    keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
    for k in keys:
        v = matching[k]
        if g.has_edge(k, v):
            # We never get here
            cover.add_edge(k, v)
        else:
            # v was a copy - this is always true
            assert clone.nodes[v]['copy']
            minw = math.inf
            mine = None
            # FIXME should check that we don't add edges between nodes that are already covered
            for u, va, w in g.edges(k, data='weight'):
                if w < minw:
                    minw = w
                    mine = (u, va)
            cover.add_edge(*mine)
    return cover
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching, then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function fails with a "cost matrix is infeasible" error from scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different numbers of nodes in the sets, and I've made sure that all nodes have at least one edge. networkx.min_weight_matching does work, but it's much, much slower than the bipartite version.
Code:
def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
    """Returns an edge cover that minimizes
    1. the number of edges included;
    2. the total weight of all edges included
    """
    # get the minimum weight matching.
    # By definition, it will have at most one edge per node but some node may end up unmatched
    matching = nx.bipartite.minimum_weight_full_matching(g, top_nodes={n for n, b in g.nodes(data='bipartite') if b == _og_id})
    # to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
    cover = nx.Graph()
    cover.add_edges_from(matching.items())
    missing = set(g.nodes) - set(cover.nodes)
    # there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
    # included in the matching
    for node in missing:
        minw = math.inf
        mine = None
        for u, v, w in g.edges(node, data='weight'):
            if w < minw:
                minw = w
                mine = (u, v)
        cover.add_edge(*mine)
    return cover
Any ideas as to what could be causing these issues?

How to find the shortest path in a graph where the origin is the same as the destination?

I'm currently using Python and NetworkX. I'm trying to obtain the minimum-cost path when the origin and the destination are the same node. What I've tried so far is the following (this is just a minimal example).
import networkx as nx

edges = [(0, 0, {'Cost': 1e100}),  # since this is the node I want to check, the self-loop for it is very high
         (0, 1, {'Cost': 5.0}),
         (1, 1, {'Cost': 0.0}),
         (1, 0, {'Cost': 10.0})]
G = nx.DiGraph()
G.add_edges_from(edges)
# I remove the self-loop for node 0 since I thought this way I could get the response I'm looking for
G.remove_edge(0, 0)
shortest = nx.shortest_path(G, 0, 0, weight='Cost')
The expected response should be:
print(shortest) --> [0,1,0] #With a total cost of 15
But what I get is:
print(shortest) --> [0] # Which means a total cost of 1e100
Any ideas? I was thinking of making a copy of node 0 named something like '0p' and then adding the same edges, which seems to work:
G = nx.DiGraph()
edges = [(0, 0, {'Cost': 1e100}),
         (0, 1, {'Cost': 5}),
         (1, 1, {'Cost': 0}),
         (1, 0, {'Cost': 10}),
         ('0p', 1, {'Cost': 5}),
         (1, '0p', {'Cost': 10})]
G.add_edges_from(edges)
shortest_path = nx.shortest_path(G, 0, '0p', weight='Cost')
print(shortest_path) --> [0, 1, '0p']
But I don't think that scales very well with the number of nodes.
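One way to avoid duplicating nodes is to search from the source to each of its predecessors and close the cycle with the final edge back to the source. A minimal sketch, assuming a DiGraph with 'Cost' edge attributes as above (shortest_cycle_through is a hypothetical helper, not a NetworkX function):

import networkx as nx

def shortest_cycle_through(G, source, weight='Cost'):
    # Cheapest cycle source -> ... -> p -> source over all predecessors p of the source
    best_cost, best_path = float('inf'), None
    for p in G.predecessors(source):
        try:
            cost = nx.shortest_path_length(G, source, p, weight=weight)
        except nx.NetworkXNoPath:
            continue
        total = cost + G[p][source][weight]
        if total < best_cost:
            best_cost = total
            best_path = nx.shortest_path(G, source, p, weight=weight) + [source]
    return best_path, best_cost

On the example graph this should return ([0, 1, 0], 15.0).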

Discrepancy in calculating graph coloring code complexity

Consider the code below. Suppose the graph in question has N nodes with at most D neighbors for each node, and D+1 colors are available for coloring the nodes such that no two nodes connected with an edge have the same color assigned to them. I reckon the complexity of the code below is O(N*D) because for each of the N nodes we loop through the at most D neighbors of that node to populate the set illegal_colors, and then iterate through colors list that comprises D+1 colors. But the complexity given is O(N+M) where M is the number of edges. What am I doing wrong here?
def color_graph(graph, colors):
    for node in graph:
        if node in node.neighbors:
            raise Exception('Legal coloring impossible for node with loop: %s' %
                            node.label)
        # Get the node's neighbors' colors, as a set so we
        # can check if a color is illegal in constant time
        illegal_colors = set([
            neighbor.color
            for neighbor in node.neighbors
            if neighbor.color
        ])
        # Assign the first legal color
        for color in colors:
            if color not in illegal_colors:
                node.color = color
                break
The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Therefore O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D. For instance, in a star graph with one hub joined to N-1 leaves, D = N-1 and M = N-1, so M+N is about 2N while N*D is about N^2.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.

Weight for edges according to number of occurence in NetworkX

Say that I have nodes ['a','b','c'] in the network, and the pairs are stored in a list:
[('a','b'), ('a','b'), ('b','a'), ('b','c'), ('a','c')]
I want to create a weighted network graph using NetworkX and matplotlib. Since the pair ('a','b') occurs 3 times (in an undirected network, ('b','a') also counts), while ('b','c') and ('a','c') each occur only once, I would like to change the width of the edges based on their weight.
Could anybody shed some light on this?
Something like this should work: find out whether the edge exists, and if it does, update its weight.
import networkx as nx

node_list = [('a', 'b'), ('a', 'b'), ('b', 'a'), ('b', 'c'), ('a', 'c')]
default_weight = 1  # each occurrence contributes 1 to the edge weight

G = nx.Graph()
for n0, n1 in node_list:
    if G.has_edge(n0, n1):
        G[n0][n1]['weight'] += default_weight
    else:
        G.add_edge(n0, n1, weight=default_weight)
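To draw the graph with edge widths proportional to these weights (the matplotlib part of the question), one sketch that should work is to pass the weights to nx.draw's width parameter:

import matplotlib.pyplot as plt

pos = nx.spring_layout(G)
widths = [d['weight'] for (u, v, d) in G.edges(data=True)]
nx.draw(G, pos, with_labels=True, width=widths)
plt.show()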

How to create random single-source acyclic directed graphs with negative edge weights in Python

I want to do an execution-time analysis of the Bellman-Ford algorithm on a large number of graphs, and in order to do that I need to generate a large number of random DAGs with the possibility of having negative edge weights.
I am using networkx in Python. There are a lot of random graph generators in the networkx library, but which one will return a directed graph with edge weights and a single source vertex?
I am using networkx.generators.directed.gnc_graph(), but that does not quite guarantee to return only a single source vertex.
Is there a way to do this with or even without networkx?
You can generate random DAGs using the gnp_random_graph() generator, keeping only the edges that point from lower indices to higher, e.g.
In [44]: import networkx as nx
In [45]: import random
In [46]: G=nx.gnp_random_graph(10,0.5,directed=True)
In [47]: DAG = nx.DiGraph([(u,v,{'weight':random.randint(-10,10)}) for (u,v) in G.edges() if u<v])
In [48]: nx.is_directed_acyclic_graph(DAG)
Out[48]: True
These can have more than one source, but you could fix that with @Christopher's suggestion of making a "super source" that points to all of the sources.
For small connectivity probability values (p=0.5 in the above) these won't likely be connected either.
I noticed that the generated graphs always have exactly one sink vertex, which is the first vertex. You can reverse the direction of all edges to get a graph with a single source vertex.
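If that observation holds for your graphs, the reversal is a one-liner in NetworkX:

# Reverse every edge, turning the unique sink into a unique source
single_source_DAG = DAG.reverse(copy=True)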
The method suggested by @Aric will generate random DAGs, but it will not work for a large number of nodes, for example for n approaching 100000.
def create_random_dag(n):
    G = nx.gnp_random_graph(n, 0.5, directed=True)
    DAG = nx.DiGraph([(u, v) for (u, v) in G.edges() if u < v])
    # print(nx.is_directed_acyclic_graph(DAG))  # to check that the graph is a DAG (it will be, by construction)
    A = nx.adjacency_matrix(DAG)
    AM = A.toarray().tolist()  # 1 for outgoing edges
    while len(AM) != n:  # retry: nodes without edges are dropped by the edge-list constructor
        AM = create_random_dag(n)
    # to display the DAG in matplotlib, uncomment these 2 lines
    # nx.draw(DAG, with_labels=True)
    # plt.show()
    return AM
For a large number of nodes, you can use the property that every lower-triangular adjacency matrix describes a DAG.
So generating a random lower-triangular matrix generates a random DAG.
import random

N = 1000  # number of nodes (example value)
mat = [[0 for x in range(N)] for y in range(N)]
for _ in range(N):
    for j in range(5):  # add up to 5 random edges per iteration
        v1 = random.randint(0, N - 1)
        v2 = random.randint(0, N - 1)
        if v1 > v2:
            mat[v1][v2] = 1
        elif v1 < v2:
            mat[v2][v1] = 1
for r in mat:
    print(','.join(map(str, r)))
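The matrix only encodes which edges exist; to get back to a NetworkX graph with the negative weights the question asks for, a possible follow-up (reusing the randint(-10, 10) weights from the earlier answer) would be:

import networkx as nx

DAG = nx.DiGraph()
DAG.add_nodes_from(range(N))
for i in range(N):
    for j in range(N):
        if mat[i][j]:
            # mat[i][j] == 1 implies i > j, so edges point from high to low index: still a DAG
            DAG.add_edge(i, j, weight=random.randint(-10, 10))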
For G -> DG -> DAG, i.e. a DAG with k inputs and m outputs:
1. Generate a graph with your favorite algorithm (e.g. G = watts_strogatz_graph(10, 2, 0.4)).
2. Make the graph directed in both directions (DG = G.to_directed()).
3. Ensure only nodes with a low index point to nodes with a high index.
4. Remove the k lowest-index nodes' input edges and the m highest-index nodes' output edges (this makes DG a DAG).
5. Make sure each of the k lowest-index nodes has output edges and each of the m highest-index nodes has input edges.
6. Check every node with k < index < n-m: if it has no input edges or no output edges, randomly choose one of the k lowest-index nodes to link to it, or one of the m highest-index nodes for it to link to. Then you get a random DAG with k inputs and m outputs, as in the function below.
Like:
import random
import networkx as nx

def g2dag(G: nx.Graph, k: int, m: int, seed=None) -> nx.DiGraph:
    if seed is not None:
        random.seed(seed)
    DG = G.to_directed()
    n = len(DG.nodes())
    assert n > k and n > m
    # Ensure only nodes with a low index point to nodes with a high index
    for e in list(DG.edges):
        if e[0] >= e[1]:
            DG.remove_edge(*e)
    # Remove the k lowest-index nodes' input edges; randomly link them to a node
    # if they have no output edges.
    # Remove the m highest-index nodes' output edges; randomly link a node to them
    # if they have no input edges.
    # (That makes DG a DAG.)
    n_list = sorted(list(DG.nodes))
    for i in range(k):
        n_idx = n_list[i]
        for e in list(DG.in_edges(n_idx)):
            DG.remove_edge(*e)
        if len(DG.out_edges(n_idx)) == 0:
            DG.add_edge(n_idx, random.choice(n_list[k:]))
    for i in range(n - m, n):
        n_idx = n_list[i]
        for e in list(DG.out_edges(n_idx)):
            DG.remove_edge(*e)
        if len(DG.in_edges(n_idx)) == 0:
            DG.add_edge(random.choice(n_list[:n - m]), n_idx)
    # If k < index < n-m and the node has no input edges or no output edges,
    # randomly choose a node among the k lowest-index nodes to link to it, or
    # choose a node among the m highest-index nodes for it to link to.
    for i in range(k, n - m):
        n_idx = n_list[i]
        if len(DG.in_edges(n_idx)) == 0:
            DG.add_edge(random.choice(n_list[:k]), n_idx)
        if len(DG.out_edges(n_idx)) == 0:
            DG.add_edge(n_idx, random.choice(n_list[n - m:]))
    # then you get a random DAG with k inputs and m outputs
    return DG
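A hypothetical usage sketch, reusing the watts_strogatz_graph example from the steps above:

G = nx.watts_strogatz_graph(10, 2, 0.4)
dag = g2dag(G, k=2, m=2, seed=42)
print(nx.is_directed_acyclic_graph(dag))  # True: every edge points from a lower to a higher index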
