For loop optimization to create an adjacency matrix - python

I am currently working with graph with labeled edges.
The original adjacency matrix is a matrix with shape [n_nodes, n_nodes, n_edges] where each cell [i,j, k] is 1 if node i and j are connected via edge k.
I need to create a reverse of the original graph, where nodes become edges and edges become nodes, so i need a new matrix with shape [n_edges, n_edges, n_nodes], where each cell [i,j,k] is 1 if edges i and j have k as a common vertex.
The following code correctly completes the task, but the use of 5 nested for-loops is too slow, to process the amount of graphs with which I have to work seems to take about 700 hours.
Is there a better way to implement this?
n_nodes = extended_adj.shape[0]
n_edges = extended_adj.shape[2]
reversed_graph = torch.zeros(n_edges, n_edges, n_nodes, 1)
for i in range(n_nodes):
for j in range(n_nodes):
for k in range(n_edges):
#If adj_mat[i][j][k] == 1 nodes i and j are connected with edge k
#For this reason the edge k must be connected via node j to every outcoming edge of j
if extended_adj[i][j][k] == 1:
#Given node j, we need to loop through every other possible node (l)
for l in range(n_nodes):
#For every other node, we need to check if they are connected by an edge (m)
for m in range(n_edges):
if extended_adj[j][l][m] == 1:
reversed_graph[k][m][j] = 1
Thanks is advance.

Echoing the comments above, this graph representation is almost certainly cumbersome and inefficient. But that notwithstanding, let's define a vectorized solution without loops and that uses tensor views whenever possible, which should be fairly efficient to compute for larger graphs.
For clarity let's use [i,j,k] to index G (original graph) and [i',j',k'] to index G' (new graph). And let's shorten n_edges to e and n_nodes to n.
Consider the 2D matrix slice = torch.max(G,dim = 1). At each coordinate [a,b] of this slice, a 1 indicates that node a is connected by edge b to some other node (we don't care which).
slice = torch.max(G,dim = 1) # dimension [n,e]
We're well on our way to the solution, but we need an expression that tells us whether a is connected to edge b and another edge c, for all edges c. We can map all combinations b,c by expanding slice, copying it and transposing it, and looking for intersections between the two.
expanded_dim = [slice.shape[0],slice.shape[1],slice.shape[1]] # value [n,e,e]
# two copies of slice, expanded on different dimensions
expanded_slice = slice.unsqueeze(1).expand(expanded_dim) # dimension [n,e,e]
transpose_slice = slice.unsqueeze(2).expand(expanded_dim) # dimension [n,e,e]
G = torch.bitwise_and(expanded_slice,transpose_slice).int() # dimension [n,e,e]
G[i',j',k'] now equals 1 iff node i' is connected by edge j' to some other node, AND node i' is connected by edge k' to some other node. If j' = k' the value is 1 as long as one of the endpoints of that edge is i'.
Lastly, we reorder dimensions to get to your desired form.
G = torch.permute(G,(1,2,0)) # dimension [e,e,n]

Related

Networkx bipartite graph edge cover

I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges that are between a node in the original graph and its copy in the larger graph). I thought it was because some edges had a weight of 0 (causing the vv'edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour. (I use 0.1 since the minimum nonzero weight in the graph should be 1) This basically reverts the algorithm to "for each node, pick the edge that has the smallest weight" which is suboptimal.
Code:
def _min_weight_edge_cover(g: nx.Graph):
"""Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
clone = g.copy()
for node, bi in g.nodes(data='bipartite'):
nd = f"{node}_copy"
clone.nodes[node]['copy'] = False
clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id)) # invert the bipartite flag
minw = min([w for u, v, w in g.edges(node, data='weight')])
clone.add_edge(node, nd, weight=(2 * minw))
# Now clone contains both the nodes of g and their copies, and should still be bipartite
tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
bots = set(clone) - tops
print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
# Here the matching should always exist and be perfect
matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
cover = g.copy()
cover.clear_edges()
keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
for k in keys:
v = matching[k]
if g.has_edge(k, v):
# We never get here
cover.add_edge(k, v)
else:
# v was a copy - this is always true
assert clone.nodes[v]['copy']
minw = math.inf
mine = None
# FIXME should check that we don't add edges between nodes that are already covered
for u, va, w in g.edges(k, data='weight'):
if w < minw:
minw = w
mine = (u, va)
cover.add_edge(*mine)
return cover
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching and then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function causes an error with cost matrix is infeasible in scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different number of nodes in the sets, and I've made sure that all nodes have at least one edge. networkx.min_weight matching does work, but it's much, much slower than the bipartite version.
Code:
def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
"""Returns an edge cover that minimizes
1. the number of edges included;
2. the total weight of all edges included
"""
# get the minimum weight matching.
# By definition, it will have at most one edge per node but some node may end up unmatched
matching = nx.bipartite.minimum_weight_full_matching(g, top_nodes={n for n, b in g.nodes(data='bipartite') if b ==_og_id})
# to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
cover = nx.Graph()
cover.add_edges_from(matching.items())
missing = set(g.nodes) - set(cover.nodes)
# there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
# included in the matching
for node in missing:
minw = math.inf
mine = None
for u, v, w in g.edges(node, data='weight'):
if w < minw:
minw = w
mine = (u, v)
cover.add_edge(*mine)
return cover
Any ideas as to what could be causing these issues?

Discrepancy in calculating graph coloring code complexity

Consider the code below. Suppose the graph in question has N nodes with at most D neighbors for each node, and D+1 colors are available for coloring the nodes such that no two nodes connected with an edge have the same color assigned to them. I reckon the complexity of the code below is O(N*D) because for each of the N nodes we loop through the at most D neighbors of that node to populate the set illegal_colors, and then iterate through colors list that comprises D+1 colors. But the complexity given is O(N+M) where M is the number of edges. What am I doing wrong here?
def color_graph(graph, colors):
for node in graph:
if node in node.neighbors:
raise Exception('Legal coloring impossible for node with loop: %s' %
node.label)
# Get the node's neighbors' colors, as a set so we
# can check if a color is illegal in constant time
illegal_colors = set([
neighbor.color
for neighbor in node.neighbors
if neighbor.color
])
# Assign the first legal color
for color in colors:
if color not in illegal_colors:
node.color = color
break
The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Therefore O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.

Proximity Graph in python

I'm relatively new to Python coding (I'm switching from R mostly due to running time speed) and I'm trying to figure out how to code a proximity graph.
That is suppose i have an array of evenly-spaced points in d-dimensional Euclidean space, these will be my nodes. I want to make these into an undirected graph by connecting two points if and only if they lie within e apart. How can I encode this functionally with parameters:
n: spacing between two points on the same axis
d: dimension of R^d
e: maximum distance allowed for an edge to exist.
The graph-tool library has much of the functionality you need. So you could do something like this, assuming you have numpy and graph-tool:
coords = numpy.meshgrid(*(numpy.linspace(0, (n-1)*delta, n) for i in range(d)))
# coords is a Python list of numpy arrays
coords = [c.flatten() for c in coords]
# now coords is a Python list of 1-d numpy arrays
coords = numpy.array(coords).transpose()
# now coords is a numpy array, one row per point
g = graph_tool.generation.geometric_graph(coords, e*(1+1e-9))
The silly e*(1+1e-9) thing is because your criterion is "distance <= e" and geometric_graph's criterion is "distance < e".
There's a parameter called delta that you didn't mention because I think your description of parameter n is doing duty for two params (spacing between points, and number of points).
This bit of code should work, although it certainly isn't the most efficient. It will go through each node and check its distance to all the other nodes (that haven't already compared to it). If that distance is less than your value e then the corresponding value in the connected matrix is set to one. Zero indicates two nodes are not connected.
In this code I'm assuming that your nodeList is a list of cartesian coordinates of the form nodeList = [[x1,y1,...],[x2,y2,...],...[xN,yN,...]]. I also assume you have some function called calcDistance which returns the euclidean distance between two cartesian coordinates. This is basic enough to implement that I haven't written the code for that, and in any case using a function allows for future generalizing and modability.
numNodes = len(nodeList)
connected = np.zeros([numNodes,numNodes])
for i, n1 in enumerate(nodeList):
for j, n2 in enumerate(nodeList[i:]):
dist = calcDistance(n1, n2)
if dist < e:
connected[i,j] = 1
connected[j,i] = 1

NetworkX: how to build an Erdos-Renyi graph from a set of predetermined positions?

I need to build something like an Erdos-Renyi model (random graph):
I need to create it from a dictionary of node positions that is generated by a deterministic function. This means that I cannot allow Python to randomly decide where each node goes to, as I want to decide it. The function is:
pos = dict( (n, n) for n in G.nodes() ).
I was thinking of creating an adjacency matrix first, in order to randomly generate something similar to pairs of (start, endpoint) of each edge, like this:
G=np.random.randint(0, 1, 25).reshape(5, 5)
Then I was thinking of somehow turning the matrix into my list of edges, like this:
G1=nx.grid_2d_graph(G)
but of course it does not work since this function takes 2 args and I am only giving it 1.
My questions:
How to create this kind of graph in NetworkX?
How to make sure that all nodes are connected?
How to make sure that, upon assigning the 1 in the matrix, each pair of nodes has the same probability of landing a 1?
Example for point 3. Imagine we created the regular grid of points which positions are determined according to pos. When we start connecting the network and we select the first node, we want to make sure that the endpoint of this first edge is one of the N-1 nodes left in the network (except the starting node itself). Anyhow, we want to make sure that all N-1 nodes have the same probability of being connected to the node we first analyze.
Thanks a lot!
I will try to build on the previous questions concerning this problem to be consistent. Given that you have the keys of the grid_2d_graph as 'n' and not (i,j) with the relabel nodes function:
N = 10
G=nx.grid_2d_graph(N,N)
pos = dict( (n, n) for n in G.nodes() )
labels = dict( ((i, j), i + (N-1-j) * N ) for i, j in G.nodes() )
nx.relabel_nodes(G,labels,False)
now you can set the pos dictionary to map from the 'n' keyed nodes to the positions you already have by switching keys with values. And then simply call the Erdos-Renyi function to create the graph that has a probability 'p' that an edge exists between any two nodes (except a self edge) as you described in point 3. Then draw with the pos dictionary.
pos = {y:x for x,y in labels.iteritems()}
G2 = nx.erdos_renyi_graph(100,0.1)
nx.draw_networkx(G2, pos=pos, with_labels=True, node_size = 300)
print G.nodes()
plt.axis('off')
plt.show()
As for ensuring that the graph is connected in point 2 You can not guarantee that the graph is connected with a probability 1, but you can read a bit about the size of the Giant component in Erdos-Renyi graph. But to avoid getting into theoretical details, one is almost sure that the graph will be connected when lambda which is n*p (here they are 100*0.1) is greater than 4. Though for smaller graphs (like 100 nodes) you better increase lambda. From my own experience using n= 100 and p = 0.1, produced a non-connected graph for only about 0.2% of the times and that is after thousands of simulations. And anyway you can always make sure if the produced graph is connected or not with the is_connected method.

how to create random single source random acyclic directed graphs with negative edge weights in python

I want to do a execution time analysis of the bellman ford algorithm on a large number of graphs and in order to do that I need to generate a large number of random DAGS with the possibility of having negative edge weights.
I am using networkx in python. There are a lot of random graph generators in the networkx library but what will be the one that will return the directed graph with edge weights and the source vertex.
I am using networkx.generators.directed.gnc_graph() but that does not quite guarantee to return only a single source vertex.
Is there a way to do this with or even without networkx?
You can generate random DAGs using the gnp_random_graph() generator and only keeping edges that point from lower indices to higher. e.g.
In [44]: import networkx as nx
In [45]: import random
In [46]: G=nx.gnp_random_graph(10,0.5,directed=True)
In [47]: DAG = nx.DiGraph([(u,v,{'weight':random.randint(-10,10)}) for (u,v) in G.edges() if u<v])
In [48]: nx.is_directed_acyclic_graph(DAG)
Out[48]: True
These can have more than one source but you could fix that with #Christopher's suggestion of making a "super source" that points to all of the sources.
For small connectivity probability values (p=0.5 in the above) these won't likely be connected either.
I noticed that the generated graphs have always exactly one sink vertex which is the first vertex. You can reverse direction of all edges to get a graph with single source vertex.
The method suggested by #Aric will generate random DAGs but the method will not work for a large number of nodes for example: for n tending to 100000.
G = nx.gnp_random_graph(n, 0.5, directed=True)
DAG = nx.DiGraph([(u, v,) for (u, v) in G.edges() if u < v])
# print(nx.is_directed_acyclic_graph(DAG)) # to check if the graph is DAG (though it will be a DAG)
A = nx.adjacency_matrix(DAG)
AM = A.toarray().tolist() # 1 for outgoing edges
while(len(AM)!=n):
AM = create_random_dag(n)
# to display the DAG in matplotlib uncomment these 2 line
# nx.draw(DAG,with_labels = True)
# plt.show()
return AM
For a large number of nodes, you can use the property that every lower triangular matrix is a DAG.
So generating random Lower Triangular matrix will generate random DAG.
mat = [[0 for x in range(N)] for y in range(N)]
for _ in range(N):
for j in range(5):
v1 = random.randint(0,N-1)
v2 = random.randint(0,N-1)
if(v1 > v2):
mat[v1][v2] = 1
elif(v1 < v2):
mat[v2][v1] = 1
for r in mat:
print(','.join(map(str, r)))
For G -> DG -> DAG
DAG with k inputs and m outputs
Generate a graph with your favorite algorithm( G=watts_strogatz_graph(10,2,0.4) )
make the graph to bidirectional ( DG = G.to_directed())
ensure only node with low index points to high index
remove k lowest index nodes' input edge, and m highest index nodes' output edges ( that make DG to DAG)
make sure every k lowest index nodes have output edges, and every m highest index nodes have input edges.
check every node in this DAG, if the k<index<n-m, and it only has no input edges or output edges, randomly choose a node in k lowest index nodes to link to or choose a node in m highest index nodes to link to it, then you get a random DAG with k inputs and m outputs
Like:
def g2dag(G: nx.Graph, k: int, m: int, seed=None) -> nx.DiGraph:
if seed is not None:
random.seed(seed)
DG = G.to_directed()
n = len(DG.nodes())
assert n > k and n > m
# Ensure only node with low index points to high index
for e in list(DG.edges):
if e[0] >= e[1]:
DG.remove_edge(*e)
# Remove k lowest index nodes' input edge. Randomly link a node if
# they have not output edges.
# And remove m highest index nodes' output edges. Randomly link a node if
# they have not input edges.
# ( that make DG to DAG)
n_list = sorted(list(DG.nodes))
for i in range(k):
n_idx = n_list[i]
for e in list(DG.in_edges(n_idx)):
DG.remove_edge(*e)
if len(DG.out_edges(n_idx)) == 0:
DG.add_edge(n_idx, random.random_choice(n_list[k:]))
for i in range(n-m, n):
n_idx = n_list[i]
for e in list(DG.out_edges(n_idx)):
DG.remove_edge(*e)
if len(DG.in_edges(n_idx)) == 0:
DG.add_edge(random.random_choice(n_list[:n-m], n_idx))
# If the k<index<n-m, and it only has no input edges or output edges,
# randomly choose a node in k lowest index nodes to link to or
# choose a node in m highest index nodes to link to it,
for i in range(k, m-n):
n_idx = n_list[i]
if len(DG.in_edges(n_idx)) == 0:
DG.add_edge(random.random_choice(n_list[:k], n_idx))
if len(DG.out_edges(n_idx)) == 0:
DG.add_edge(n_idx, random.random_choice(n_list[n-m:]))
# then you get a random DAG with k inputs and m outputs
return DG

Categories

Resources