Discrepancy in calculating graph coloring code complexity - python

Consider the code below. Suppose the graph in question has N nodes with at most D neighbors for each node, and D+1 colors are available for coloring the nodes such that no two nodes connected with an edge have the same color assigned to them. I reckon the complexity of the code below is O(N*D) because for each of the N nodes we loop through the at most D neighbors of that node to populate the set illegal_colors, and then iterate through the colors list, which comprises D+1 colors. But the complexity given is O(N+M), where M is the number of edges. What am I doing wrong here?
def color_graph(graph, colors):
    for node in graph:
        if node in node.neighbors:
            raise Exception('Legal coloring impossible for node with loop: %s' %
                            node.label)

        # Get the node's neighbors' colors, as a set so we
        # can check if a color is illegal in constant time
        illegal_colors = set([
            neighbor.color
            for neighbor in node.neighbors
            if neighbor.color
        ])

        # Assign the first legal color
        for color in colors:
            if color not in illegal_colors:
                node.color = color
                break

The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Since N + M <= N + N*D/2 <= N*(D+1), O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.
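To see the 2*M count concretely, here is a small sketch (my own, using a toy adjacency-list dict rather than the poster's node objects): the total number of inner-loop iterations over neighbors is the sum of the degrees, which equals 2*M.

# Small sketch (not the poster's code): count how many times the inner
# neighbor loop runs on a plain adjacency-list graph.
graph = {
    'a': ['b', 'c'],
    'b': ['a'],
    'c': ['a', 'd'],
    'd': ['c'],
}
inner_iterations = sum(len(neighbors) for neighbors in graph.values())
num_edges = inner_iterations // 2  # every undirected edge is stored twice
print(inner_iterations, num_edges)  # 6 and 3: the work is 2*M, not N*D (= 8 here)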

Related

Networkx bipartite graph edge cover

I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges between a node in the original graph and its copy in the larger graph). I thought it was because some edges had a weight of 0 (causing the vv' edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour. (I use 0.1, since the minimum nonzero weight in the graph should be 1.) This basically reverts the algorithm to "for each node, pick the edge that has the smallest weight", which is suboptimal.
Code:
def _min_weight_edge_cover(g: nx.Graph):
    """Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
    clone = g.copy()
    for node, bi in g.nodes(data='bipartite'):
        nd = f"{node}_copy"
        clone.nodes[node]['copy'] = False
        clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id))  # invert the bipartite flag
        minw = min([w for u, v, w in g.edges(node, data='weight')])
        clone.add_edge(node, nd, weight=(2 * minw))
    # Now clone contains both the nodes of g and their copies, and should still be bipartite
    tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
    bots = set(clone) - tops
    print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
    # Here the matching should always exist and be perfect
    matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
    cover = g.copy()
    cover.clear_edges()
    keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
    for k in keys:
        v = matching[k]
        if g.has_edge(k, v):
            # We never get here
            cover.add_edge(k, v)
        else:
            # v was a copy - this is always true
            assert clone.nodes[v]['copy']
            minw = math.inf
            mine = None
            # FIXME should check that we don't add edges between nodes that are already covered
            for u, va, w in g.edges(k, data='weight'):
                if w < minw:
                    minw = w
                    mine = (u, va)
            cover.add_edge(*mine)
    return cover
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching and then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function raises a "cost matrix is infeasible" error from scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different number of nodes in the sets, and I've made sure that all nodes have at least one edge. networkx.min_weight_matching does work, but it's much, much slower than the bipartite version.
Code:
def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
    """Returns an edge cover that minimizes
    1. the number of edges included;
    2. the total weight of all edges included
    """
    # get the minimum weight matching.
    # By definition, it will have at most one edge per node but some node may end up unmatched
    matching = nx.bipartite.minimum_weight_full_matching(
        g, top_nodes={n for n, b in g.nodes(data='bipartite') if b == _og_id})
    # to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
    cover = nx.Graph()
    cover.add_edges_from(matching.items())
    missing = set(g.nodes) - set(cover.nodes)
    # there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
    # included in the matching
    for node in missing:
        minw = math.inf
        mine = None
        for u, v, w in g.edges(node, data='weight'):
            if w < minw:
                minw = w
                mine = (u, v)
        cover.add_edge(*mine)
    return cover
Any ideas as to what could be causing these issues?

Algorithm for a complex graph visit

I have a graph of this type
And I have to estimate the probability to end in a node of a given color (let's say red for example). This probability is given by the chance of ending in a node of that color and not ending in any other node of that color in the graph. For example, the probability of ending in the upper red node is 0.5*(1 - 0.4*0.8), which is the product of the chance of ending directly in the upper red node (0.5) and the chance of not ending in the lower red node (1 - 0.4*0.8).
So the total probability of ending in a red node is 0.5*(1 - 0.4*0.8) + 0.4*0.8*(1 - 0.5).
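In plain numbers, the example above works out like this (a small sketch with the probabilities from the picture hard-coded):

# Worked example from above, probabilities hard-coded for illustration.
p_upper = 0.5        # chance of ending directly in the upper red node
p_lower = 0.4 * 0.8  # chance of ending in the lower red node
p_red = p_upper * (1 - p_lower) + p_lower * (1 - p_upper)
print(p_red)  # 0.5*0.68 + 0.32*0.5 = 0.34 + 0.16 = 0.5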
How can I formulate an algorithm solving this problem?
I created an algorithm that didn't take into account the chance of not ending in any other node of the same color (so its total probability of ending in a red node was simply 0.5 + 0.4*0.8). I could share it if it's useful, but I'm having trouble extending it to this problem.
The simpler code I was talking about before is this one:
def algorithm(self, startingNode, nodeToFind):
    valueToReturn = 0
    nodesToVisit = startingNode.nodesConnected
    for j in nodesToVisit:
        if j == nodeToFind:
            probabilityToVisit = graph.search_edge(startingNode, j)
            valueToReturn += probabilityToVisit
        else:
            valueToReturn += self.algorithm(j, nodeToFind)
    return valueToReturn
The graph is a sort of tree where each node has at most 2 children, and no successor of node X can have the same color as node X. Although two red nodes share almost all the same properties, they differ in their children, because each red node will have different children depending on the path traversed to reach it.

For loop optimization to create an adjacency matrix

I am currently working with a graph with labeled edges.
The original adjacency matrix is a matrix with shape [n_nodes, n_nodes, n_edges], where each cell [i,j,k] is 1 if nodes i and j are connected via edge k.
I need to create a reverse of the original graph, where nodes become edges and edges become nodes, so I need a new matrix with shape [n_edges, n_edges, n_nodes], where each cell [i,j,k] is 1 if edges i and j have k as a common vertex.
The following code correctly completes the task, but the use of 5 nested for-loops is too slow: processing the number of graphs I have to work with seems to take about 700 hours.
Is there a better way to implement this?
n_nodes = extended_adj.shape[0]
n_edges = extended_adj.shape[2]
reversed_graph = torch.zeros(n_edges, n_edges, n_nodes, 1)
for i in range(n_nodes):
    for j in range(n_nodes):
        for k in range(n_edges):
            # If adj_mat[i][j][k] == 1, nodes i and j are connected with edge k.
            # For this reason edge k must be connected via node j to every outgoing edge of j.
            if extended_adj[i][j][k] == 1:
                # Given node j, we need to loop through every other possible node (l)
                for l in range(n_nodes):
                    # For every other node, we need to check if they are connected by an edge (m)
                    for m in range(n_edges):
                        if extended_adj[j][l][m] == 1:
                            reversed_graph[k][m][j] = 1
Thanks in advance.
Echoing the comments above, this graph representation is almost certainly cumbersome and inefficient. But that notwithstanding, let's define a vectorized solution without loops that uses tensor views wherever possible, which should be fairly efficient to compute for larger graphs.
For clarity let's use [i,j,k] to index G (original graph) and [i',j',k'] to index G' (new graph). And let's shorten n_edges to e and n_nodes to n.
Consider the 2D matrix slice = torch.amax(G, dim=1) (note that torch.max(G, dim=1) returns a (values, indices) pair, so use torch.amax or torch.max(...).values to get just the maxima). At each coordinate [a,b] of this slice, a 1 indicates that node a is connected by edge b to some other node (we don't care which).
slice = torch.amax(G, dim=1).int()  # dimension [n,e]; .int() so bitwise_and below works on integer values
We're well on our way to the solution, but we need an expression that tells us whether a is connected to edge b and another edge c, for all edges c. We can map all combinations b,c by expanding slice, copying it and transposing it, and looking for intersections between the two.
expanded_dim = [slice.shape[0],slice.shape[1],slice.shape[1]] # value [n,e,e]
# two copies of slice, expanded on different dimensions
expanded_slice = slice.unsqueeze(1).expand(expanded_dim) # dimension [n,e,e]
transpose_slice = slice.unsqueeze(2).expand(expanded_dim) # dimension [n,e,e]
G = torch.bitwise_and(expanded_slice,transpose_slice).int() # dimension [n,e,e]
G[i',j',k'] now equals 1 iff node i' is connected by edge j' to some other node, AND node i' is connected by edge k' to some other node. If j' = k' the value is 1 as long as one of the endpoints of that edge is i'.
Lastly, we reorder dimensions to get to your desired form.
G = torch.permute(G,(1,2,0)) # dimension [e,e,n]
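As a quick self-contained sanity check (my own sketch, using a small symmetric random tensor rather than real data), the vectorized result can be compared against the original nested loops:

# Sanity check (my addition): compare vectorized and loop versions on a toy graph.
import torch

n, e = 4, 5
a = torch.randint(0, 2, (n, n, e))
extended_adj = torch.maximum(a, a.transpose(0, 1))  # symmetric, as for an undirected graph

# Vectorized version from above
s = torch.amax(extended_adj, dim=1).int()                                   # [n, e]
vec = torch.bitwise_and(s.unsqueeze(1).expand(n, e, e),
                        s.unsqueeze(2).expand(n, e, e)).permute(1, 2, 0)    # [e, e, n]

# Loop version from the question (without the trailing dimension of size 1)
loop = torch.zeros(e, e, n, dtype=torch.int32)
for i in range(n):
    for j in range(n):
        for k in range(e):
            if extended_adj[i][j][k] == 1:
                for l in range(n):
                    for m in range(e):
                        if extended_adj[j][l][m] == 1:
                            loop[k][m][j] = 1

print(torch.equal(vec.bool(), loop.bool()))  # True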

Understanding average_degree_connectivity in networkx?

I'm having a hard time understanding this graph quantity: networkx.algorithms.assortativity.average_degree_connectivity
average_neighbor_degree returns a dictionary mapping each node id to its average neighbor degree:
d – A dictionary keyed by node with average neighbors degree value.
However, I can't understand what average_degree_connectivity is? It returns:
d – A dictionary keyed by degree k with the value of average connectivity.
For example, I compared the average_degree_connectivity with the "average neighbors degree value" for three graphs. What does "average neighbors degree value" mean?
What does average_degree_connectivity represent?
How is average_neighbor_degree related to average_degree_connectivity?
It makes sense to answer your questions the other way round:
Let v be a vertex with m neighbors. The average_neighbor_degree of v is simply the sum of its neighbors' degrees divided by m.
For the average_degree_connectivity, this is the important part of the definition:
... is the average nearest neighbor degree of nodes with degree k
So for all the different degrees that occur in the graph, it gives the average of the average_neighbor_degree of all nodes with the same degree. It is a measure of how connected nodes with certain degrees are.
That's a lot of averages, so I hope this snippet clarifies question 2:
import networkx as nx
from collections import defaultdict

G = nx.karate_club_graph()
avg_neigh_degrees = nx.algorithms.assortativity.average_neighbor_degree(G)

deg_to_avg_neighbor_degrees = defaultdict(list)
for node, degree in nx.degree(G):
    deg_to_avg_neighbor_degrees[degree].append(avg_neigh_degrees[node])

# this is the same as nx.algorithms.assortativity.average_degree_connectivity(G)
avg_degree_connectivity = {degree: sum(vals) / len(vals)
                           for degree, vals in deg_to_avg_neighbor_degrees.items()}
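As a quick check (my own addition), printing both dictionaries side by side should show identical values:

# Compare the hand-rolled dictionary with networkx's built-in function.
builtin = nx.algorithms.assortativity.average_degree_connectivity(G)
for degree in sorted(builtin):
    print(degree, builtin[degree], avg_degree_connectivity[degree])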

How to create random single source directed acyclic graphs with negative edge weights in python

I want to do an execution time analysis of the Bellman-Ford algorithm on a large number of graphs, and in order to do that I need to generate a large number of random DAGs that may have negative edge weights.
I am using networkx in Python. There are a lot of random graph generators in the networkx library, but which one will return a directed graph with edge weights and a single source vertex?
I am using networkx.generators.directed.gnc_graph(), but that does not guarantee only a single source vertex.
Is there a way to do this with or even without networkx?
You can generate random DAGs using the gnp_random_graph() generator and only keeping edges that point from lower indices to higher. e.g.
In [44]: import networkx as nx
In [45]: import random
In [46]: G=nx.gnp_random_graph(10,0.5,directed=True)
In [47]: DAG = nx.DiGraph([(u,v,{'weight':random.randint(-10,10)}) for (u,v) in G.edges() if u<v])
In [48]: nx.is_directed_acyclic_graph(DAG)
Out[48]: True
These can have more than one source, but you could fix that with @Christopher's suggestion of making a "super source" that points to all of the sources.
For small connectivity probability values (p=0.5 in the above) these likely won't be connected either.
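A minimal sketch of that super-source idea (my addition, reusing the DAG built above): add one extra node with zero-weight edges to every vertex that currently has no incoming edge.

# Sketch of the "super source" fix: 'S' is an arbitrary label for the new node.
sources = [v for v in DAG.nodes() if DAG.in_degree(v) == 0]
DAG.add_node('S')
DAG.add_weighted_edges_from(('S', v, 0) for v in sources)
# 'S' is now the single source and the graph stays acyclic.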
I noticed that the generated graphs always have exactly one sink vertex, which is the first vertex. You can reverse the direction of all edges to get a graph with a single source vertex.
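In networkx that reversal is a one-liner (my note); DiGraph.reverse returns a copy with every edge flipped and keeps edge data such as weights.

reversed_dag = DAG.reverse(copy=True)  # flip every edge direction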
The method suggested by @Aric will generate random DAGs, but it will not work for a large number of nodes, for example for n around 100000.

def create_random_dag(n):
    G = nx.gnp_random_graph(n, 0.5, directed=True)
    DAG = nx.DiGraph([(u, v,) for (u, v) in G.edges() if u < v])
    # print(nx.is_directed_acyclic_graph(DAG))  # to check if the graph is a DAG (though it will be a DAG)
    A = nx.adjacency_matrix(DAG)
    AM = A.toarray().tolist()  # 1 for outgoing edges
    while len(AM) != n:
        AM = create_random_dag(n)
    # to display the DAG in matplotlib uncomment these 2 lines
    # nx.draw(DAG, with_labels=True)
    # plt.show()
    return AM
For a large number of nodes, you can use the property that every (strictly) lower triangular adjacency matrix is a DAG.
So generating a random lower triangular matrix will generate a random DAG.
import random

N = 10  # number of nodes (example value)
mat = [[0 for x in range(N)] for y in range(N)]
for _ in range(N):
    for j in range(5):
        v1 = random.randint(0, N - 1)
        v2 = random.randint(0, N - 1)
        if v1 > v2:
            mat[v1][v2] = 1
        elif v1 < v2:
            mat[v2][v1] = 1

for r in mat:
    print(','.join(map(str, r)))
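If you then want the networkx/Bellman-Ford setup from the question, a small sketch (my addition) turns the matrix into a weighted DiGraph and confirms it is acyclic:

# My addition: convert the lower triangular matrix into a weighted DiGraph.
import networkx as nx

DAG = nx.DiGraph()
DAG.add_nodes_from(range(N))
for v1 in range(N):
    for v2 in range(N):
        if mat[v1][v2] == 1:
            DAG.add_edge(v1, v2, weight=random.randint(-10, 10))  # negative weights allowed

print(nx.is_directed_acyclic_graph(DAG))  # True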
For G -> DG -> DAG, i.e. a DAG with k inputs and m outputs:
Generate a graph with your favorite algorithm (e.g. G = watts_strogatz_graph(10, 2, 0.4)).
Make the graph directed (DG = G.to_directed()).
Ensure that only nodes with a lower index point to nodes with a higher index.
Remove the input edges of the k lowest-index nodes and the output edges of the m highest-index nodes (that makes DG a DAG).
Make sure each of the k lowest-index nodes has output edges, and each of the m highest-index nodes has input edges.
Check every node with k < index < n-m: if it has no input edges or no output edges, randomly choose one of the k lowest-index nodes to link to it, or one of the m highest-index nodes to link it to. Then you get a random DAG with k inputs and m outputs.
Like:
import random
import networkx as nx

def g2dag(G: nx.Graph, k: int, m: int, seed=None) -> nx.DiGraph:
    if seed is not None:
        random.seed(seed)
    DG = G.to_directed()
    n = len(DG.nodes())
    assert n > k and n > m
    # Ensure only nodes with a lower index point to nodes with a higher index
    for e in list(DG.edges):
        if e[0] >= e[1]:
            DG.remove_edge(*e)
    # Remove the input edges of the k lowest-index nodes; randomly link a node
    # if they have no output edges.
    # And remove the output edges of the m highest-index nodes; randomly link a
    # node if they have no input edges.
    # (That makes DG a DAG.)
    n_list = sorted(list(DG.nodes))
    for i in range(k):
        n_idx = n_list[i]
        for e in list(DG.in_edges(n_idx)):
            DG.remove_edge(*e)
        if len(DG.out_edges(n_idx)) == 0:
            DG.add_edge(n_idx, random.choice(n_list[k:]))
    for i in range(n - m, n):
        n_idx = n_list[i]
        for e in list(DG.out_edges(n_idx)):
            DG.remove_edge(*e)
        if len(DG.in_edges(n_idx)) == 0:
            DG.add_edge(random.choice(n_list[:n - m]), n_idx)
    # If k < index < n-m and the node has no input edges or no output edges,
    # randomly choose one of the k lowest-index nodes to link to it, or one of
    # the m highest-index nodes to link it to.
    for i in range(k, n - m):
        n_idx = n_list[i]
        if len(DG.in_edges(n_idx)) == 0:
            DG.add_edge(random.choice(n_list[:k]), n_idx)
        if len(DG.out_edges(n_idx)) == 0:
            DG.add_edge(n_idx, random.choice(n_list[n - m:]))
    # Then you get a random DAG with k inputs and m outputs
    return DG
