Networkx bipartite graph edge cover - python

I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges between a node in the original graph and its copy in the larger graph). I thought it was because some edges had a weight of 0 (causing the vv' edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour (I use 0.1, since the minimum nonzero weight in the graph should be 1). This basically reduces the algorithm to "for each node, pick the edge that has the smallest weight", which is suboptimal.
Code:
def _min_weight_edge_cover(g: nx.Graph):
    """Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
    clone = g.copy()
    for node, bi in g.nodes(data='bipartite'):
        nd = f"{node}_copy"
        clone.nodes[node]['copy'] = False
        clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id))  # invert the bipartite flag
        minw = min([w for u, v, w in g.edges(node, data='weight')])
        clone.add_edge(node, nd, weight=(2 * minw))
    # Now clone contains both the nodes of g and their copies, and should still be bipartite
    tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
    bots = set(clone) - tops
    print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
    # Here the matching should always exist and be perfect
    matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
    cover = g.copy()
    cover.clear_edges()
    keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
    for k in keys:
        v = matching[k]
        if g.has_edge(k, v):
            # We never get here
            cover.add_edge(k, v)
        else:
            # v was a copy - this is always true
            assert clone.nodes[v]['copy']
            minw = math.inf
            mine = None
            # FIXME should check that we don't add edges between nodes that are already covered
            for u, va, w in g.edges(k, data='weight'):
                if w < minw:
                    minw = w
                    mine = (u, va)
            cover.add_edge(*mine)
    return cover
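A self-contained toy run (the _og_id/_tg_id values and the tiny graph are made up for illustration, and the function above is assumed to be in scope) that reproduces the behaviour described above:
import math
import networkx as nx

_og_id, _tg_id = 0, 1

g = nx.Graph()
g.add_nodes_from(['u1', 'u2'], bipartite=_og_id)
g.add_nodes_from(['v1', 'v2'], bipartite=_tg_id)
g.add_weighted_edges_from([('u1', 'v1', 1), ('u1', 'v2', 3), ('u2', 'v2', 2)])

cover = _min_weight_edge_cover(g)
print(sorted(cover.edges))  # the matching picked every node's copy edge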
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching and then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function fails with cost matrix is infeasible in scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different numbers of nodes in the two sets, and I've made sure that all nodes have at least one edge. networkx.min_weight_matching does work, but it's much, much slower than the bipartite version.
Code:
def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
    """Returns an edge cover that minimizes
    1. the number of edges included;
    2. the total weight of all edges included
    """
    # get the minimum weight matching.
    # By definition, it will have at most one edge per node but some node may end up unmatched
    matching = nx.bipartite.minimum_weight_full_matching(g, top_nodes={n for n, b in g.nodes(data='bipartite') if b == _og_id})
    # to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
    cover = nx.Graph()
    cover.add_edges_from(matching.items())
    missing = set(g.nodes) - set(cover.nodes)
    # there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
    # included in the matching
    for node in missing:
        minw = math.inf
        mine = None
        for u, v, w in g.edges(node, data='weight'):
            if w < minw:
                minw = w
                mine = (u, v)
        cover.add_edge(*mine)
    return cover
Any ideas as to what could be causing these issues?
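Edit: for debugging, here is a quick check I could run (hypothetical snippet): minimum_weight_full_matching can only succeed if every connected component can fully match its smaller side, so a single unbalanced component would make the cost matrix infeasible even though every node has an edge.
# assuming the _og_id side is the globally smaller one: any component
# with more tops than bots makes a full matching impossible
for comp in nx.connected_components(g):
    n_tops = sum(1 for n in comp if g.nodes[n]['bipartite'] == _og_id)
    n_bots = len(comp) - n_tops
    if n_tops > n_bots:
        print(f"unbalanced component: {n_tops} tops vs {n_bots} bots")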

Related

Fully connect an unconnected bi-partite graph in NetworkX

This is related to this question, with a small difference. Namely, I am already given a graph G, which is a bipartite graph, meaning that there exist two sets of vertices, set U and set I, and an edge can only exist between a node from set U and a node from set I.
I want to extend this unconnected graph, making it connected while keeping it bipartite, and I also want the probability of a new edge (when extending the graph with edges) to be proportional to the node degrees (i.e., a higher probability that an edge links two nodes of large degree). The code that I would like to extend:
import random
from itertools import combinations, groupby

components = dict(enumerate(nx.connected_components(G)))
components_combs = combinations(components.keys(), r=2)
for _, node_edges in groupby(components_combs, key=lambda x: x[0]):
    node_edges = list(node_edges)
    random_comps = random.choice(node_edges)
    source = random.choice(list(components[random_comps[0]]))
    target = random.choice(list(components[random_comps[1]]))
    G.add_edge(source, target)
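Not from the linked answer, but a minimal sketch of one way to do this, assuming the U side is available as a set (connect_bipartite_components is a hypothetical helper): pick the two endpoints with probability proportional to degree, and only ever pair a U node with an I node so the graph stays bipartite.
import random
import networkx as nx

def connect_bipartite_components(G, U):
    """Hypothetical sketch: make a bipartite graph connected by adding only
    U-to-I edges, sampling endpoints with probability proportional to degree.
    Assumes the running union of components always contains nodes of the
    side needed to pair with the next component."""
    components = list(nx.connected_components(G))
    merged = set(components[0])
    for comp in components[1:]:
        # prefer a U node from the merged part and an I node from comp;
        # fall back to the opposite pairing if either list is empty
        u_side = [n for n in merged if n in U]
        i_side = [n for n in comp if n not in U]
        if not u_side or not i_side:
            u_side = [n for n in comp if n in U]
            i_side = [n for n in merged if n not in U]
        # degree + 1 so isolated nodes still have a chance of being picked
        u = random.choices(u_side, weights=[G.degree(n) + 1 for n in u_side])[0]
        v = random.choices(i_side, weights=[G.degree(n) + 1 for n in i_side])[0]
        G.add_edge(u, v)
        merged |= comp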

Discrepancy in calculating graph coloring code complexity

Consider the code below. Suppose the graph in question has N nodes, each with at most D neighbors, and D+1 colors are available for coloring the nodes such that no two nodes connected by an edge are assigned the same color. I reckon the complexity of the code below is O(N*D), because for each of the N nodes we loop through that node's at most D neighbors to populate the set illegal_colors, and then iterate through the colors list, which comprises D+1 colors. But the complexity given is O(N+M), where M is the number of edges. What am I doing wrong here?
def color_graph(graph, colors):
    for node in graph:
        if node in node.neighbors:
            raise Exception('Legal coloring impossible for node with loop: %s' %
                            node.label)
        # Get the node's neighbors' colors, as a set so we
        # can check if a color is illegal in constant time
        illegal_colors = set([
            neighbor.color
            for neighbor in node.neighbors
            if neighbor.color
        ])
        # Assign the first legal color
        for color in colors:
            if color not in illegal_colors:
                node.color = color
                break
The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Therefore O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.
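As a quick sanity check of the degree-sum identity, a tiny sketch using a star graph, where most nodes have far fewer than D neighbours:
import networkx as nx

# star graph: one hub of degree 5, five leaves of degree 1 (N=6, D=5, M=5)
G = nx.star_graph(5)
assert sum(d for _, d in G.degree()) == 2 * G.number_of_edges()
# N*D = 30, while 2*M = 10: O(N+M) is the sharper bound here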

Create a networkx weighted graph and find the path between 2 nodes with the smallest weight

I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionary where each key is a node, and each value is the associated weight (between 10 and 200,000 or so).
weights = {node: weight}
I believe I do not need to normalize the weights with networkx.
At the moment, I create a non-weighted graph by adding the edges:
def create_graph(data):
    edges = create_edges(data)
    # Create the graph
    G = nx.Graph()
    # Add edges
    G.add_edges_from(edges)
    return G
From what I read, I can add a weight to an edge. However, I would prefer the weight to be applied to a specific node instead of an edge. How can I do that?
Idea: I create the graph by adding the weighted nodes first, and then I add the edges between them.
def create_graph(data, weights):
    nodes = create_nodes(data)
    edges = create_edges(data)  # list of tuples
    # Create the graph
    G = nx.Graph()
    # Add nodes with their weights
    for node in nodes:
        G.add_node(node, weight=weights[node])
    # Add edges
    G.add_edges_from(edges)
    return G
Is this approach correct?
The next step is to find the path between 2 nodes with the smallest weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path, which I think is doing the right thing. However, it uses weights on the edges instead of weights on the nodes. Could someone explain to me what this function does, what the difference between weights on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)
This generally looks right.
You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).
To handle the edge vs node weight issue, there are two options. First note that you are after the sum of the node weights along the path. If I give each edge the weight w(u,v) = w(u) + w(v), then the total weight along a path is w(source) + w(target) + 2*sum(w(v)), where the sum runs over the intermediate nodes v along the way. Whichever path has the minimum weight with these edge weights will also have the minimum weight with the node weights.
So you could simply assign each edge a weight equal to the sum of its two node weights:
for edge in G.edges():
    G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']
But an alternative is to note that the weight argument of bidirectional_dijkstra can be a function; networkx calls it with the two endpoints of an edge and that edge's attribute dictionary. Define your own function to return the sum of the two node weights:
def f(u, v, d):
    # d is the edge attribute dictionary (unused here, since the weights live on the nodes)
    return G.nodes[u]['weight'] + G.nodes[v]['weight']
and then in your call do bidirectional_dijkstra(G, source, target, weight=f).
So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.
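For concreteness, a minimal end-to-end sketch of the weight-function approach (the tiny path graph and its node weights are made up for illustration):
import networkx as nx

# hypothetical toy graph: a path 0-1-2-3 with node weights 1, 2, 3, 4
G = nx.path_graph(4)
for n in G:
    G.nodes[n]['weight'] = n + 1

def f(u, v, d):
    # edge weight = sum of the two endpoint node weights
    return G.nodes[u]['weight'] + G.nodes[v]['weight']

# bidirectional_dijkstra returns (path_length, path)
length, path = nx.bidirectional_dijkstra(G, 0, 3, weight=f)
print(length, path)  # 15 == w(0) + w(3) + 2*(w(1) + w(2)), path [0, 1, 2, 3]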
Some comments on bidirectional dijkstra: Imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The dijkstra algorithm (which is the default of shortest_path) will explore every point within distance R of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other. This has a volume of (4/3) pi R^3. With bidirectional_dijkstra we inflate balloons centered at each point until they touch. They will each have radius R/2. So the explored volume is (4/3) pi (R/2)^3 + (4/3) pi (R/2)^3, which is a quarter of the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings are often much bigger.

Finding the minimum cost set of nodes so that once removed, the graph is disconnected

It would be great if someone could help me.
I have an undirected graph where every vertex has a weight and where no edges have weights. I want to find a set of nodes with minimum total weight whose removal makes the graph disconnected. For example, given the choice between removing one node with a weight of 10 that disconnects the graph and removing 2 nodes with a total weight of 6 that disconnect it, the result set should contain those 2 nodes.
Is there any known algorithm for this problem?
Here is what I've done so far. I've made my code using networkx (Python). I've already changed my graph to be directed: for instance, for node 1, I create nodes 1in and 1out, and I connect 1in to 1out with an edge carrying the weight of node 1. I also add s and t nodes (I'm not sure if that's correct or not), and I defined a capacity for each edge in the new directed graph.
When I run the code, I get this error: NetworkXUnbounded: Infinite capacity path, flow unbounded above.
deg_G = nx.degree(G)
max_weight = max([deg for i, deg in deg_G]) + 1
st_Weighted_Complement_G = nx.DiGraph()
r = np.arange(len(Complement_G.nodes))
nodes = ['s', 't']
edges = []
for i in r:
    nIn = (str(i) + 'in')
    nOut = (str(i) + 'out')
    nodes.extend([nIn, nOut])
    edges.extend([(nIn, nOut, {'capacity': deg_G[i], 'weight': deg_G[i]}),
                  ('s', nIn, {'capacity': math.inf, 'weight': 0}),
                  (nOut, 't', {'capacity': math.inf, 'weight': 0})])
for edge in Complement_G.edges:
    print(edge[0], edge[1])
    edges.extend([((str(edge[0])) + 'out', (str(edge[1])) + 'in', {'capacity': max_weight, 'weight': 0}),
                  ((str(edge[1])) + 'out', (str(edge[0])) + 'in', {'capacity': max_weight, 'weight': 0})])
print(edges)
st_Weighted_Complement_G.add_nodes_from(nodes)
st_Weighted_Complement_G.add_weighted_edges_from(edges)
mincostFlow = nx.max_flow_min_cost(st_Weighted_Complement_G, 's', 't', capacity='capacity', weight='weight')
print(mincostFlow)
Thanks
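For reference (not from the original post), a minimal sketch of the same node-splitting idea for a fixed s-t pair, using nx.minimum_cut instead of max_flow_min_cost. min_weight_vertex_cut is a hypothetical helper, and weights is assumed to map each node to its weight; a global minimum-weight separator would minimize this over choices of s and t.
import networkx as nx

def min_weight_vertex_cut(G, weights, s, t):
    """Sketch: minimum total-weight set of vertices (other than s and t)
    whose removal disconnects s from t, via node splitting and a min cut."""
    D = nx.DiGraph()
    for n in G:
        # split each node: the in->out edge carries the node's weight
        D.add_edge((n, 'in'), (n, 'out'), capacity=weights[n])
    for u, v in G.edges():
        # original edges get infinite capacity in both directions,
        # so only the node-splitting edges can be cut
        D.add_edge((u, 'out'), (v, 'in'), capacity=float('inf'))
        D.add_edge((v, 'out'), (u, 'in'), capacity=float('inf'))
    cut_value, (S, T) = nx.minimum_cut(D, (s, 'out'), (t, 'in'))
    # the separator is every node whose in->out edge crosses the cut
    return cut_value, [n for n in G if (n, 'in') in S and (n, 'out') in T]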

Find highest weight edge(s) for a given node

I have a directed graph in NetworkX. The edges are weighted from 0 to 1, representing probabilities that they occurred. The network connectivity is quite high, so I want to prune the edges such that for every node, only the highest-probability incoming edge remains.
I'm not sure how to iterate over every node and keep only the highest weighted in_edges in the graph. Is there a networkx function that allows us to do this?
Here is an example of what I'd like to be able to do.
Nodes:
A, B, C, D
Edges:
A->B, weight=1.0
A->C, weight=1.0
A->D, weight=0.5
B->C, weight=0.9
B->D, weight=0.8
C->D, weight=0.9
Final Result Wanted:
A->B, weight=1.0
A->C, weight=1.0
C->D, weight=0.9
If there are two edges into a node, and they are both of the highest weight, I'd like to keep them both.
Here are some ideas:
import networkx as nx

G = nx.DiGraph()
G.add_edge('A', 'B', weight=1.0)
G.add_edge('A', 'C', weight=1.0)
G.add_edge('A', 'D', weight=0.5)
G.add_edge('B', 'C', weight=0.9)
G.add_edge('B', 'D', weight=0.8)
G.add_edge('C', 'D', weight=0.9)

print("all edges")
print(G.edges(data=True))
print("edges >= 0.9")
print([(u, v, d) for (u, v, d) in G.edges(data=True) if d['weight'] >= 0.9])
print("sorted by weight")
print(sorted(G.edges(data=True), key=lambda edge: edge[2]['weight']))
The solution I had was inspired by Aric. I used the following code:
for node in G.nodes():
    edges = G.in_edges(node, data=True)
    if len(edges) > 0:  # some nodes have zero edges going into it
        min_weight = min([edge[2]['weight'] for edge in edges])
        for edge in edges:
            if edge[2]['weight'] > min_weight:
                G.remove_edge(edge[0], edge[1])
The solution ericmjl provided does not work entirely in my program due to a RuntimeError. Moreover, it keeps the edges with the lowest probability, not the highest, as asked in the question (it removes all edges with weight above the minimum, instead of removing all edges with weight below the maximum). Furthermore, it is sufficient to run the inner loop only when len(edges) > 1, because we only want to prune edges from nodes with more than one incoming edge.
The complete solution:
for node in G.nodes():
    edges = G.in_edges(node, data=True)
    if len(edges) > 1:  # only prune nodes with more than one incoming edge
        max_weight = max([edge[2]['weight'] for edge in edges])
        for edge in list(edges):
            if edge[2]['weight'] < max_weight:
                G.remove_edge(edge[0], edge[1])
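As a quick check, running the in_edges version above on the example graph from the question leaves exactly the requested edges:
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([('A', 'B', 1.0), ('A', 'C', 1.0), ('A', 'D', 0.5),
                           ('B', 'C', 0.9), ('B', 'D', 0.8), ('C', 'D', 0.9)])

for node in G.nodes():
    edges = G.in_edges(node, data=True)
    if len(edges) > 1:
        max_weight = max(edge[2]['weight'] for edge in edges)
        for edge in list(edges):
            if edge[2]['weight'] < max_weight:
                G.remove_edge(edge[0], edge[1])

print(sorted(G.edges(data='weight')))
# [('A', 'B', 1.0), ('A', 'C', 1.0), ('C', 'D', 0.9)]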
