I have a graph whose nodes are each assigned a value. I wish to find the clique in the graph with the maximum sum of node values (note: not necessarily the maximum clique).
One approach I thought of would be a greedy algorithm (sketched in code below) that:
1. Select the node with the largest value from the graph.
2. Select the next largest node that is connected to all previously selected nodes, provided adding it increases the sum.
3. Repeat step 2 until the sum no longer increases.
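A minimal sketch of this greedy, assuming the graph is a dict mapping each node to a set of neighbours and values is a dict of node values (both names are mine):

def greedy_clique_sum(graph, values):
    # graph: dict mapping each node to the set of its neighbours
    # values: dict mapping each node to its value
    clique, total = [], 0
    candidates = sorted(graph, key=values.get, reverse=True)
    while candidates:
        node = candidates[0]            # largest remaining value
        if values[node] <= 0:           # adding it would not increase the sum
            break
        clique.append(node)
        total += values[node]
        # keep only nodes adjacent to every selected node so far
        candidates = [c for c in candidates[1:] if c in graph[node]]
    return clique, total

On a counterexample like the one below, this stops at the single heavy node.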
However, this approach is not correct: imagine a graph with a clique of 8 nodes all of value 1, and a single node of value 7 connected to none of them. The greedy picks the value-7 node first and stops, giving 7, while the correct answer is 8. My actual problem has a complex graph, but I found the biggest sums for some examples manually (example figures with the solutions omitted).
What would be the best graph representation, and an efficient and correct way to solve this problem in Python on an arbitrary graph (in a representation of your choice), without libraries?
I have a set of points and need to select an optimal subset of 3 of them, where the criterion is a linear sum of some properties of the points, and some properties of pairs of the points.
In Python, this is quite easy using itertools.combinations:
from itertools import combinations

all_points = list(combinations(points, 3))  # materialise so we can index into it later
costs = []
for i, (p1, p2, p3) in enumerate(all_points):
    costs.append((p1.weight + p2.weight + p3.weight
                  + pair_weight(p1, p2) + pair_weight(p1, p3) + pair_weight(p2, p3),
                  i))
costs.sort()
best = all_points[costs[0][1]]  # triple with the smallest total cost
The problem is that this is a brute-force solution: it enumerates all possible combinations of 3 points, which is O(n^3) in the number of points and therefore easily leads to a very large number of evaluations. I have been trying to research whether there is a more efficient way to do this, perhaps taking advantage of the linearity of the cost function.
I have tried turning this into a networkx graph featuring node and edge weights. However, I have not yet found an algorithm in that toolkit that can calculate the "shortest triangle", particularly one that considers both edge and node weights. (Shortest path algorithms tend to only consider edge weights for example.)
There are functions to enumerate all cliques, and then I can select 3-cliques, and calculate the cost, but this is also brute force and therefore not better than doing it with combinations as above.
Are there any other algorithms I can look at?
By the way, if I do not have the edge weights, it is easy to just sort the nodes by their node-weight and choose the first three. So it is really the paired costs that add complexity to this problem. I am wondering if somehow I can just list all pairs and find the top-k of those that form triangles, or something better? At least if I could efficiently enumerate top candidates and stop the enumeration on some heuristic, it might be better than the brute force approach.
From now on, I will use n as the number of nodes and m as the number of edges. If your graph is fully connected, then m is just n choose 2. I'll also disregard node weights because, as the comments on your initial post noted, the node weights can be absorbed into the edges they're connected to.
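For triangles specifically, each node is incident to exactly two of the triangle's three edges, so one way to do that absorption (a sketch, reusing pair_weight and .weight from the question) is:

def edge_cost(p1, p2):
    # half of each endpoint's weight per edge; summed over a triangle's
    # three edges, this recovers each node weight exactly once
    return pair_weight(p1, p2) + (p1.weight + p2.weight) / 2.0

With this, a triangle's cost is just the sum of its three adjusted edge costs.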
Your algorithm is O(n^3); it's hopefully not too hard to see why: you iterate over every possible triplet of nodes. However, it is possible to iterate over every triangle in a graph in O(m sqrt(m)):
def triangles(adj):
    # adj: dict mapping each node to the set of its neighbours
    deg = {u: len(adj[u]) for u in adj}
    for u in adj:                      # for every node u
        for v in adj[u]:               # for every node v adjacent to u
            if deg[u] < deg[v]:
                continue
            for w in adj[v]:           # for every node w adjacent to v
                if deg[v] < deg[w]:
                    continue
                if w not in adj[u]:    # u must be connected to w
                    continue
                yield (u, v, w)        # <u, v, w> is a triangle!
The proof for this algorithm's runtime of O(m sqrt(m)) is nontrivial, so I'll direct you here: https://cs.stanford.edu/~rishig/courses/ref/l1.pdf
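A sketch of how the pieces could fit together for the original question, assuming adj maps each point object to its set of neighbours and edge_cost is the absorbed cost from above:

best = min(
    triangles(adj),
    key=lambda t: edge_cost(t[0], t[1]) + edge_cost(t[0], t[2]) + edge_cost(t[1], t[2]),
)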
If your graph is fully connected, then you've gotta stick with the O(n^3), I think. There might be some early-pruning ideas you can try, but they won't lead to a significant speedup, probably 2x at the very best.
I have an undirected NetworkX graph (not necessarily complete, but connected). I also have a list of a few nodes from the graph. I want to get the travel distance as per the TSP result (or its approximation).
I tried using networkx.approximation.traveling_salesman_problem(DS.G, nodes=['A0_S0_R0', 'A0_S14_R4', 'A0_S4_R4', 'A0_S14_R4', 'A0_S14_R4', 'A0_S7_R4']), but the output is a list of nodes covering the given nodes in the order they appear in the input list.
I want either the total minimised distance, or the list of nodes ordered so that the travelling distance is minimised rather than following their appearance in the input list.
The tsp function will return the list of nodes corresponding to the best path found.
You can get the minimum distance using
nodes = [...] # list of nodes returned by traveling_salesman_problem()
# assuming you used `weight` key for edge weights
distances = [G.get_edge_data(nodes[i], nodes[i+1])['weight'] for i in range(len(nodes)-1)]
min_distance = sum(distances)
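If your networkx version provides it (to my knowledge it was added around networkx 2.6, so treat this as version-dependent), nx.path_weight computes the same sum:

import networkx as nx
min_distance = nx.path_weight(G, nodes, weight='weight')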
I have a large graph with about 50,000 nodes and 2,000,000 edges. I need to find all pairs of nodes that are between 2 and 3 hops away from each other.
The simplest (and dirty) solution is first to create the combinatorial expansion and then check each pair of nodes:
import networkx as nx
import itertools

g = nx.erdos_renyi_graph(n=5000, p=0.05)
L = list(g.nodes())
# Create a complete graph on the same nodes to enumerate all pairs
G = nx.Graph()
G.add_nodes_from(L)
G.add_edges_from(itertools.combinations(L, 2))
However, when I try to create a complete graph with 45,000 nodes and 2 million edges, I run out of RAM.
Is there any other solution that could inspect a large graph in a reasonable time? Thanks for any advice or pointers.
Take your edge list and convert it to an adjacency matrix (see the answers here for how to do it memory-efficiently with scipy.sparse). Then use numpy.linalg.matrix_power to raise the adjacency matrix to the 2nd power. For each entry in the squared matrix, if the entry is non-zero, there exists a path of length two between the nodes (in fact, the entry gives you the number of paths of length 2 between the nodes). See the answers here:
Powers of the adjacency matrix are concatenating walks. The ijth entry of the kth power of the adjacency matrix tells you the number of walks of length k from vertex i to vertex j.
You can do the same for paths of length 3, by raising it to the 3rd power.
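A minimal sketch with scipy.sparse (edges, n, and the int64 dtype are my assumptions; note that the powers count walks, so you may want to zero the diagonal and exclude already-adjacent pairs when looking for nodes exactly 2 or 3 hops apart):

import numpy as np
import scipy.sparse as sp

# edges: list of (i, j) pairs of an undirected graph with nodes numbered 0..n-1
rows, cols = zip(*edges)
data = np.ones(len(edges), dtype=np.int64)
A = sp.csr_matrix((data, (rows, cols)), shape=(n, n))
A = A + A.T        # symmetrise: store each undirected edge in both directions

A2 = A @ A         # entry (i, j) = number of walks of length 2 from i to j
A3 = A2 @ A        # entry (i, j) = number of walks of length 3 from i to j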
There is a question from a quiz that I don't fully understand:
Suppose you have a weighted directed graph and want to find a path between nodes A and B with the smallest total weight. Select the most accurate statement:
1. If some edges have negative weights, depth-first search finds a correct solution.
2. If all edges have weight 2, depth-first search guarantees that the first path found is the shortest path.
3. If some edges have negative weights, breadth-first search finds a correct solution.
4. If all edges have weight 2, breadth-first search guarantees that the first path found is the shortest path.
Am I right that #1 is correct?
#4 is the correct one, because all edges have the same weight, so you need to reach the node traversing the minimum number of edges, which is exactly the order in which breadth-first search discovers nodes.
#1 is wrong because depth-first search doesn't consider edge weights, so any node could be reached first.
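A minimal BFS sketch illustrating why #4 holds when every edge has the same weight (the adjacency-dict representation is my assumption):

from collections import deque

def bfs_shortest_path(graph, start, goal):
    # graph: dict mapping each node to an iterable of its neighbours
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()        # paths come out in order of length
        if path[-1] == goal:
            return path               # first path found is a shortest one
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None                       # goal unreachable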
I have the following problem:
Consider a weighted directed graph. Each node has a rating, and the weighted edges represent the "influence" of a node on its neighbors. When a node's rating changes, its neighbors see their own ratings modified (positively or negatively).
How do I propagate a new rating from one node?
I think this should be a standard algorithm, but which one?
This is a general question but in practice I am using Python ;)
Thanks
[EDIT]
The rating is a simple float value between 0 and 1: [0.0, 1.0]
There is certainly a convergence issue; I just want to limit the propagation to a few iterations...
There is an easy standard way to do it as follows:
def propagate(G, w, rating, n, calculate_new_rating):
    # G: dict mapping each vertex v in V to its neighbours
    # w: dict mapping each edge e to its weight, w[e] = weight of edge e
    # rating: dict such that rating[v] = rating of v
    # n: the required number of iterations
    A = dict(rating)
    for _ in range(n):
        # use the array A for the old values (and w) to compute the new ones
        A_new = {v: calculate_new_rating(v, A, w) for v in G}
        A = A_new  # assign A the new values stored in A_new
    return A
However, for some cases you might have better algorithms, based on the features of the graph and on how the rating of each node is recalculated. For example:
Assume rating'(v) = sum(rating(u) * w(u,v)) over each (u,v) in E, and you get a variation of PageRank, which is guaranteed to converge to the principal eigenvector if the graph is strongly connected (Perron-Frobenius theorem), so calculating the final value is simple.
Assume rating'(v) = max{ rating(u) : (u,v) in E }, then it is also guaranteed to converge and can be solved linearly using strongly connected components. This thread discusses this case.
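For instance, the first variant's update rule could look like this (a sketch; pred is a hypothetical dict mapping each vertex to its in-neighbours):

def calculate_new_rating(v, A, w):
    # pred: hypothetical dict, pred[v] = in-neighbours of v
    # weighted sum of the old ratings of v's predecessors
    return sum(A[u] * w[(u, v)] for u in pred[v])

You would then run propagate(G, w, rating, n, calculate_new_rating) for your chosen number of iterations n.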