This is related to this question, with a small difference: I am already given a graph G, which is bipartite, meaning there are two sets of vertices, U and I, and an edge can only exist between a node from U and a node from I.
I want to extend this disconnected graph so that it becomes connected while remaining bipartite, and I also want the probability of a new edge (when extending the graph with edges) to be proportional to the degrees of its endpoints (i.e., an edge is more likely to link two nodes of high degree). The code that I would like to extend:
import random
import networkx as nx
from itertools import combinations, groupby

components = dict(enumerate(nx.connected_components(G)))
components_combs = combinations(components.keys(), r=2)

for _, node_edges in groupby(components_combs, key=lambda x: x[0]):
    node_edges = list(node_edges)
    random_comps = random.choice(node_edges)
    source = random.choice(list(components[random_comps[0]]))
    target = random.choice(list(components[random_comps[1]]))
    G.add_edge(source, target)
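The snippet above picks the two endpoints uniformly at random within each pair of components. A minimal sketch of the degree-proportional part I have in mind (assuming the nodes carry a 0/1 bipartite attribute for U/I, that each component contains nodes from both sides, and using a +1 smoothing so zero-degree nodes can still be picked) would be:

# Sketch: pick the endpoints with probability proportional to degree,
# restricting each endpoint to one bipartite set so the graph stays bipartite.
# The 'bipartite' attribute and the +1 smoothing are assumptions.
comp_u = [n for n in components[random_comps[0]] if G.nodes[n].get('bipartite') == 0]
comp_i = [n for n in components[random_comps[1]] if G.nodes[n].get('bipartite') == 1]
source = random.choices(comp_u, weights=[G.degree(n) + 1 for n in comp_u], k=1)[0]
target = random.choices(comp_i, weights=[G.degree(n) + 1 for n in comp_i], k=1)[0]
G.add_edge(source, target)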
I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges between a node in the original graph and its copy in the larger graph). I thought it was because some edges had a weight of 0 (causing the vv' edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour (I use 0.1, since the minimum nonzero weight in the graph should be 1). This basically reduces the algorithm to "for each node, pick the edge that has the smallest weight", which is suboptimal.
Code:
import math
import networkx as nx

def _min_weight_edge_cover(g: nx.Graph):
    """Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
    clone = g.copy()
    for node, bi in g.nodes(data='bipartite'):
        nd = f"{node}_copy"
        clone.nodes[node]['copy'] = False
        # invert the bipartite flag on the copy
        clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id))
        minw = min([w for u, v, w in g.edges(node, data='weight')])
        clone.add_edge(node, nd, weight=(2 * minw))
    # Now clone contains both the nodes of g and their copies, and should still be bipartite
    tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
    bots = set(clone) - tops
    print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
    # Here the matching should always exist and be perfect
    matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
    cover = g.copy()
    cover.clear_edges()
    keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
    for k in keys:
        v = matching[k]
        if g.has_edge(k, v):
            # We never get here
            cover.add_edge(k, v)
        else:
            # v was a copy - this is always true
            assert clone.nodes[v]['copy']
            minw = math.inf
            mine = None
            # FIXME should check that we don't add edges between nodes that are already covered
            for u, va, w in g.edges(k, data='weight'):
                if w < minw:
                    minw = w
                    mine = (u, va)
            cover.add_edge(*mine)
    return cover
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching and then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function fails with a "cost matrix is infeasible" error from scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different number of nodes in the two sets, and I've made sure that all nodes have at least one edge. networkx.min_weight_matching does work, but it is much, much slower than the bipartite version.
Code:
def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
    """Returns an edge cover that minimizes
    1. the number of edges included;
    2. the total weight of all edges included
    """
    # get the minimum weight matching.
    # By definition, it will have at most one edge per node but some node may end up unmatched
    matching = nx.bipartite.minimum_weight_full_matching(g, top_nodes={n for n, b in g.nodes(data='bipartite') if b == _og_id})
    # to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
    cover = nx.Graph()
    cover.add_edges_from(matching.items())
    missing = set(g.nodes) - set(cover.nodes)
    # there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
    # included in the matching
    for node in missing:
        minw = math.inf
        mine = None
        for u, v, w in g.edges(node, data='weight'):
            if w < minw:
                minw = w
                mine = (u, v)
        cover.add_edge(*mine)
    return cover
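One workaround I have considered (a sketch only, assuming the infeasibility comes from the graph being split into disconnected blocks of unequal size) is to replace the single matching call with one matching per connected component and merge the results; this would still fail if a single component admits no full matching:

# Sketch: run the matching per connected component and merge the partial matchings
matching = {}
for comp in nx.connected_components(g):
    sub = g.subgraph(comp)
    comp_tops = {n for n, b in sub.nodes(data='bipartite') if b == _og_id}
    matching.update(nx.bipartite.minimum_weight_full_matching(sub, comp_tops))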
Any ideas as to what could be causing these issues?
I am working with networks undergoing a number of disrupting events, so a number of nodes fail because of a given event. There is therefore a transition from the network in the left image to the one in the right image:
My question: how can I find the disconnected subgraphs, even if they contain only 1 node? My purpose is to count them and mark them as failed, since that is what applies to them in my study. By semi-isolated nodes I mean groups of nodes that are cut off from the rest of the network but still connected to each other.
I know I can find isolated nodes like this:
def find_isolated_nodes(graph):
    """ returns a list of isolated nodes. """
    isolated = []
    for node in graph:
        if not graph[node]:
            isolated.append(node)
    return isolated
but how would you amend these lines to make them find groups of isolated nodes as well, like those highlighted in the right hand side picture?
MY THEORETICAL ATTEMPT
It looks like this problem is addressed by the Flood Fill algorithm, which is explained here. However, I wonder how it could be possible to simply count the number of nodes in the giant component(s) and then subtract it from the number of nodes that appear still active at stage 2. How would you implement this?
If I understand correctly, you are looking for "isolated" nodes, meaning the nodes not in the largest component of the graph. As you mentioned, one method to identify the "isolated" nodes is to find all the nodes NOT in the largest component. To do so, you can just use networkx.connected_components, to get a list of the components and sort them by size:
components = list(nx.connected_components(G)) # list because it returns a generator
components.sort(key=len, reverse=True)
Then you can find the largest component, and get a count of the "isolated" nodes:
largest = components.pop(0)
num_isolated = G.order() - len(largest)
I put this all together in an example where I draw an Erdos-Renyi random graph, coloring isolated nodes blue:
# Load modules and create a random graph
import networkx as nx, matplotlib.pyplot as plt
G = nx.gnp_random_graph(10, 0.15)
# Identify the largest component and the "isolated" nodes
components = list(nx.connected_components(G)) # list because it returns a generator
components.sort(key=len, reverse=True)
largest = components.pop(0)
isolated = set( g for cc in components for g in cc )
# Draw the graph
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos=pos, nodelist=largest, node_color='r')
nx.draw_networkx_nodes(G, pos=pos, nodelist=isolated, node_color='b')
nx.draw_networkx_edges(G, pos=pos)
plt.show()
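As a side note, if you only need the largest component rather than the full sorted list, you can get it directly (this should be equivalent to the popping approach above):

# Equivalent shortcut: grab the largest connected component directly
largest = max(nx.connected_components(G), key=len)
isolated = set(G) - largest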
I am going to draw a network with Python 3 and the networkx module.
First of all, I am sorry I could not write any example code because I did not receive any raw data.
The network consists of 3 groups of nodes, and attached below is what I imagined.
It is hand drawn.
I would like any layout suggestions or tips for drawing this kind of network.
I know about the Multipartite Layout (https://networkx.org/documentation/stable/auto_examples/drawing/plot_multipartite_graph.html#multipartite-layout), but I am not sure it is suitable for me.
Thanks.
The multi-partite layout is going to put your nodes in rows/columns based on the partitions you specify, but it seems like what you want is to align your nodes so that the provided groups/partitions are clustered together and separated from the other groups/clusters. You can do this by making a position dictionary that you pass to the networkx drawing functions. The example function below takes your graph, the name of the node attribute in your Graph object that specifies which group/partition each node belongs to (partition_attr), an optional list of partition names specifying the order in which you want to display your groups/components left-to-right (partition_order), and the minimum space between nodes in different partitions (epsilon).
#%% Function to make position dicts by partition
def make_node_positions(graph, partition_attr, partition_order=None, epsilon=.5):
    if not partition_order:
        # get a list of all the partition names if not specified
        partition_order = list(set(dict(graph.nodes(data=partition_attr)).values()))
    # make position dict for each partition
    orig_partition_pos_dicts = {partition: nx.spring_layout(graph.subgraph([node for node, part in graph.nodes(data=partition_attr)
                                                                            if part == partition]))
                                for partition in partition_order}
    # update the x coordinate in the position dicts so partitions
    # don't overlap and are in the specified order left-to-right
    final_pos_dict = orig_partition_pos_dicts[partition_order[0]]
    for i, partition in enumerate(partition_order[1:]):
        # get the largest x coordinate from the previous partition's nodes
        max_previous = max([x for x, y in final_pos_dict.values()])
        # get smallest x coordinate from this partition's nodes
        current_min = min([x for x, y in orig_partition_pos_dicts[partition].values()])
        # update the x coordinates for this partition to be at least epsilon units
        # to the right of the right-most node in the previous partition
        final_pos_dict.update({node: (pos[0] + max_previous + abs(current_min) + epsilon, pos[1])
                               for node, pos in orig_partition_pos_dicts[partition].items()})
    return final_pos_dict
Now I've made a graph similar to your drawing and applied the function to it below:
#%% Set up toy graph
import networkx as nx
import matplotlib.pyplot as plt

# make the initial graphs
k5 = nx.complete_graph(5)
triangle = nx.from_edgelist([(5, 6), (6, 7), (5, 7)])
single_node = nx.Graph()
single_node.add_node(8)

# edges to connect the components
extra_edges = [(3, 5), (2, 6), (5, 8), (6, 8), (7, 8)]

# combine graphs and specify the original graphs
orig_graphs = {'k5': {'graph': k5, 'color': 'blue'},
               'triangle': {'graph': triangle, 'color': 'green'},
               'single_node': {'graph': single_node, 'color': 'red'}}
g = nx.Graph()
for g_name, g_val_dict in orig_graphs.items():
    # add the nodes from that graph and specify the partition and node colors
    g.add_nodes_from(g_val_dict['graph'].nodes, partition=g_name, color=g_val_dict['color'])
    if len(g_val_dict['graph'].edges) > 0:
        # if the graph has edges then add the edges
        g.add_edges_from(g_val_dict['graph'].edges, partition=g_name, color=g_val_dict['color'])
# add the extra edges to combine the graphs
g.add_edges_from(extra_edges, color='black')

#%% Draw graph #####
my_pos = make_node_positions(g, partition_attr='partition', partition_order=['k5', 'triangle', 'single_node'])
nx.draw_networkx_nodes(g, my_pos, node_color=[c for n, c in g.nodes(data='color')])
nx.draw_networkx_labels(g, my_pos)
nx.draw_networkx_edges(g, my_pos, edge_color=[c for u, v, c in g.edges(data='color')])
plt.show()
I need to build something like an Erdos-Renyi model (random graph):
I need to create it from a dictionary of node positions that is generated by a deterministic function. This means that I cannot allow Python to randomly decide where each node goes to, as I want to decide it. The function is:
pos = dict( (n, n) for n in G.nodes() ).
I was thinking of creating an adjacency matrix first, in order to randomly generate something similar to pairs of (start, endpoint) of each edge, like this:
G = np.random.randint(0, 2, 25).reshape(5, 5)
Then I was thinking of somehow turning the matrix into my list of edges, like this:
G1=nx.grid_2d_graph(G)
but of course it does not work since this function takes 2 args and I am only giving it 1.
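(For what it's worth, I believe a random symmetric 0/1 adjacency matrix could be turned into a graph roughly as below, though I am not sure this is the right direction; the names are just for illustration:)

import numpy as np
import networkx as nx

# Hypothetical sketch: random 0/1 matrix -> symmetric adjacency matrix -> graph
A = np.random.randint(0, 2, (5, 5))
A = np.triu(A, k=1)   # keep the upper triangle, drop the diagonal (no self loops)
A = A + A.T           # mirror it to make the matrix symmetric
G1 = nx.from_numpy_array(A)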
My questions:
1. How to create this kind of graph in NetworkX?
2. How to make sure that all nodes are connected?
3. How to make sure that, upon assigning the 1 in the matrix, each pair of nodes has the same probability of landing a 1?
Example for point 3: imagine we created the regular grid of points whose positions are determined according to pos. When we start connecting the network and select the first node, we want to make sure that the endpoint of this first edge is one of the N-1 other nodes in the network (anything except the starting node itself), and that all N-1 of them have the same probability of being connected to the node we are considering.
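For instance, something along these lines is roughly what I mean (just a sketch; start stands for a hypothetical already-chosen node):

import random

# Sketch of point 3: every node other than `start` is an equally likely endpoint
candidates = [m for m in G.nodes() if m != start]
end = random.choice(candidates)
G.add_edge(start, end)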
Thanks a lot!
I will try to build on the previous questions concerning this problem, to be consistent. First, relabel the nodes of the grid_2d_graph so that the keys are the integers 'n' rather than the (i, j) tuples, using the relabel_nodes function:
N = 10
G=nx.grid_2d_graph(N,N)
pos = dict( (n, n) for n in G.nodes() )
labels = dict( ((i, j), i + (N-1-j) * N ) for i, j in G.nodes() )
nx.relabel_nodes(G,labels,False)
Now you can set the pos dictionary to map from the 'n'-keyed nodes to the positions you already have, by switching keys with values. Then simply call the Erdos-Renyi function to create a graph in which an edge exists between any two nodes (except self edges) with probability 'p', as you described in point 3, and draw it with the pos dictionary.
import matplotlib.pyplot as plt

pos = {y: x for x, y in labels.items()}
G2 = nx.erdos_renyi_graph(100, 0.1)
nx.draw_networkx(G2, pos=pos, with_labels=True, node_size=300)
print(G.nodes())
plt.axis('off')
plt.show()
As for ensuring that the graph is connected in point 2: you cannot guarantee that the graph is connected with probability 1, but you can read a bit about the size of the giant component in an Erdos-Renyi graph. Without getting into theoretical details, the graph is almost surely connected when lambda = n*p (here 100*0.1) is greater than 4, though for smaller graphs (like 100 nodes) you should increase lambda further. From my own experience with n = 100 and p = 0.1, a non-connected graph was produced only about 0.2% of the time, over thousands of simulations. In any case, you can always check whether the produced graph is connected with the is_connected method.
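For example, a simple way to enforce this in practice (a sketch, assuming n*p is large enough that it terminates quickly) is to resample until the check passes:

# Resample until we happen to draw a connected graph
G2 = nx.erdos_renyi_graph(100, 0.1)
while not nx.is_connected(G2):
    G2 = nx.erdos_renyi_graph(100, 0.1)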
I am trying to create a connected graph where each node has attributes that determine which other nodes it is connected to. The network is a circular space to make it easy to establish links (there are 1000 nodes).
The way this network works is that a node has both neighbors (the ones to its immediate left/right, i.e. node 3 has neighbors 1 and 2) and also k long-distance links. A node picks its long-distance links by randomly choosing nodes in the clockwise direction (i.e. node 25 might have 200 as its long-distance link instead of 15).
Here is a sample image of what it might looks like: http://i.imgur.com/PkYk5bz.png
The picture shows a Symphony network, but my implementation is a simplification of that.
I partially implemented this in Java (via a linked list holding an ArrayList) but am lost on how to do this in NetworkX. I am especially confused about how to add the node attributes that say a node will acquire k long links but will not accept any more links after k. Is there a specific built-in graph in networkx suited to this model, or is any graph acceptable as long as I have the correct node attributes?
It is a simplification of a more complicated network where no node leaves and no edge disappears.
Any help or a link to an example would be appreciated on this.
This approximates your need:
import networkx as nx
import matplotlib.pyplot as plt
import random

N = 20  # number of nodes
K = 3   # number of "long" edges per node

G = nx.cycle_graph(N)
for node in G.nodes():
    while G.degree(node) < K + 2:
        # Add K long-range neighbors to each node
        # (each node already has two neighbors from the cycle)
        valid_target_found = False
        while not valid_target_found:
            # CAUTION
            # This loop will not terminate
            # if K is too high relative to N
            target = random.randint(0, N - 1)
            # pick a random node
            if (target != node
                    and not G.has_edge(node, target)
                    and G.degree(target) < K + 2):
                # Accept the target if (a) it is not the source and not already
                # connected to it and (b) the target itself
                # has fewer than K long edges
                valid_target_found = True
                G.add_edge(node, target)
nx.draw_circular(G)
plt.show()
It creates the graph below. There are improvements to be made, for example, a more efficient selection of the target nodes for the long edges, but this gets you started, I hope.
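For instance, one possible refinement (a sketch only, meant to replace the inner while loop in the snippet above) is to pick the target directly from the currently valid candidates instead of rejection sampling:

# Sketch: sample the long-edge target from the valid candidates in one step
candidates = [t for t in G.nodes()
              if t != node and not G.has_edge(node, t) and G.degree(t) < K + 2]
if not candidates:
    break  # no valid target left for this node
G.add_edge(node, random.choice(candidates))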
In NetworkX, any logic about how to connect your nodes is left to you.
Nevertheless, if you want to iterate over the nodes in Python (not tested):
for (nodeId, data) in yourGraph.nodes(data=True):
    # some logic here over data
    # to connect your node
    yourGraph.add_edge(nodeId, otherNodeId)
Side note: if you want to stay in Java you can also consider using Jung and Gephi.