I have an undirected weighted graph G with a set of nodes and weighted edges.
I want to know whether networkx implements a method to find a minimum spanning tree between given nodes of a graph (e.g. nx.steiner_tree(G, ['Berlin', 'Kiel', 'Munster', 'Nurnberg'])) (apparently there is none?)
I don't have enough reputation points to post images. A link to a similar image could be:
Map
(A3, C1, C5, E4)
What I'm thinking:
check Dijkstra's shortest paths between all destination nodes;
put all the nodes (intermediate and destination) and edges into a new graph V;
compute the MST on V (to remove cycles by breaking the longest edge);
Maybe there are better ways (correctness- and computation-wise)? Because this approach does pretty badly with three destination nodes and becomes better with more nodes.
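The three steps above might be sketched like this (the helper name and the toy graph are mine, not part of networkx):

```python
import networkx as nx
from itertools import combinations

def steiner_approx(G, terminals, weight="weight"):
    """Shortest paths between all terminal pairs, union them,
    then take an MST of the union to break cycles (heuristic)."""
    H = nx.Graph()
    for u, v in combinations(terminals, 2):
        # Step 1: Dijkstra's shortest path between destination nodes
        path = nx.dijkstra_path(G, u, v, weight=weight)
        # Step 2: add intermediate/destination nodes and edges to the new graph
        for a, b in zip(path, path[1:]):
            H.add_edge(a, b, **{weight: G[a][b][weight]})
    # Step 3: MST on the union removes cycles by dropping heavy edges
    return nx.minimum_spanning_tree(H, weight=weight)

# Toy graph: X is a cheap hub between the three destinations
G = nx.Graph()
G.add_weighted_edges_from([("A", "X", 1), ("X", "B", 1), ("X", "C", 1),
                           ("A", "B", 3), ("B", "C", 3)])
T = steiner_approx(G, ["A", "B", "C"])
print(sorted(map(sorted, T.edges())))  # [['A', 'X'], ['B', 'X'], ['C', 'X']]
```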
P.S. My graph is planar (it can be drawn on paper so that edges do not intersect). So maybe some kind of spring/force algorithm (like in d3 visualisations) could help?
As I understand your question, you're trying to find the lowest-weight connected subgraph that contains a given set of nodes. This is the Steiner tree problem in graphs, and it is NP-hard. You're probably best off with some sort of heuristic suited to the specific case you are studying.
For two nodes, the approach to do this is Dijkstra's algorithm; it's fastest if you expand around both nodes until the two shells intersect. For three nodes, I suspect a version of Dijkstra's algorithm where you expand around each of the nodes will give some good results, but I don't see how to guarantee you're getting the best.
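networkx ships a bidirectional Dijkstra that does exactly this two-shell expansion for the two-node case; a small sketch (the toy graph is mine):

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 2), ("b", "c", 2), ("a", "c", 5)])

# Expands shells from both endpoints and stops when they meet
length, path = nx.bidirectional_dijkstra(G, "a", "c", weight="weight")
print(length, path)  # 4 ['a', 'b', 'c']
```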
I've found another question for this, which has several decent answers (the posted question had an unweighted graph so it's different, but the algorithms given in the answers are appropriate for weighted). There are some good ones beyond just the accepted answer.
In networkx there's a standard Kruskal algorithm implemented, taking an undirected weighted graph as input. The function is called "minimum_spanning_tree".
I propose you build a subgraph that contains the nodes you need and then run Kruskal's algorithm on it.
import networkx as nx

# Induced subgraph on the nodes of interest, then Kruskal's MST
H = G.subgraph(['Berlin', 'Kiel', 'Konstanz'])
MST = nx.minimum_spanning_tree(H)
As pointed out already, this is the Steiner tree problem in graphs.
There is a Steiner tree algorithm in networkx:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.steinertree.steiner_tree.html
However, it only gives you an approximate solution, and it is also rather slow. For state-of-the-art solvers, see the section "External Links" under:
https://en.wikipedia.org/wiki/Steiner_tree_problem
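A minimal usage sketch of that networkx function (the city graph and its weights below are invented for illustration):

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Hypothetical road network: Hannover is a hub between the four terminals
G = nx.Graph()
G.add_weighted_edges_from([
    ("Berlin", "Hannover", 290), ("Hannover", "Kiel", 240),
    ("Hannover", "Munster", 180), ("Hannover", "Nurnberg", 470),
])

# Approximate Steiner tree over the four terminal cities
T = steiner_tree(G, ["Berlin", "Kiel", "Munster", "Nurnberg"])
print(sorted(T.nodes()))
```

Note that the non-terminal node Hannover ends up in the tree, which is what distinguishes a Steiner tree from an MST over the terminals alone.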
Related
I created a simple directed graph (DiGraph) in the Python network package networkx, like so:
import networkx as nx
G = nx.DiGraph([(1, 2)])
Now I would like to know the fastest / most efficient way to determine the output nodes (which in this case is node 2). By output node I mean a node in the directed graph that has no successor, with input nodes being nodes that have no predecessor. The analogy is from machine learning, in which I supply values to the input nodes, perform mathematical operations on the edges and collect output values in the output nodes. See here for an example, in which p, d and lambda are input nodes and p' and d' are output nodes.
After reading the documentation I only came up with inefficient methods such as traversing all nodes of the graph, calling successors() on each and saving all nodes with no successor. This, however, is terribly inefficient, especially since the graph is considerably larger and more complex in my actual project. I was hoping for a simple and efficient method such as G.out_nodes() or something similar that keeps track of the output nodes as the graph is edited, though I seem to be unable to find it. Since determining output and input nodes in a directed graph is not an uncommon task, I believe I am missing something. I was hoping that you might be able to help me. Thank you very much.
Once you have your graph you can iterate over it and collect the nodes with out_degree equal to 0.
output_nodes = [u for u, deg in G.out_degree() if deg == 0]
However, you could also track them directly while building your graph: remove u from the candidate set whenever (u, v) is an arc. The remaining nodes will be the output nodes.
Another possibility is to use a different representation of your graph, chosen according to its size and the operations you apply most often.
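Putting the degree-based approach together on a toy graph (the graph is mine):

```python
import networkx as nx

G = nx.DiGraph([(1, 2), (1, 3), (3, 4)])

# Output nodes have no successors (out-degree 0);
# input nodes have no predecessors (in-degree 0)
output_nodes = [n for n, deg in G.out_degree() if deg == 0]
input_nodes = [n for n, deg in G.in_degree() if deg == 0]
print(input_nodes, output_nodes)  # [1] [2, 4]
```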
I'm working on a problem where I have a bunch of directed graphs with a single source/sink for each and the edges are probabilistic connections (albeit with 90% of the nodes having only 1 input and 1 output). There also is a critical node in each graph and I'd like to determine if there is any way to traverse the graph that bypasses this node. Ideally I'd also like to enumerate the specific paths which would bypass this node. I've been able to import an example graph into NetworkX and can run some of the functions on the graph without difficulty, but I'm not sure if what I'm looking for is a common request and I just don't know the right terminology to find it in the help files, or if this is something I'll need to code by hand. I'm also open to alternative tools or methods.
First, you might want some way to quantify critical nodes. For that you can use some measure of centrality, probably betweenness centrality in your case. Read more here.
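A small sketch of scoring nodes by betweenness centrality (the path graph is just an illustration):

```python
import networkx as nx

# In a path graph 0-1-2-3-4, the middle node sits on the most shortest paths
G = nx.path_graph(5)
bc = nx.betweenness_centrality(G)
critical = max(bc, key=bc.get)
print(critical)  # 2
```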
Next, if you know the two nodes you want to travel between, you can use this as a kind of pseudocode to help you get there. You can also loop through all possible pairs of nodes that could be traveled through, but that might take a while.
import networkx as nx

# G, source and target are assumed to be defined already
important_nodes = []  # fill with the critical nodes to bypass
paths = list(nx.all_simple_paths(G, source, target))
# Keep only the paths that avoid every important node
# (the next four lines could be done with a list comprehension)
exclusive_paths = []
for path in paths:
    if not any(node in path for node in important_nodes):
        exclusive_paths.append(path)
Read more on all_simple_paths here.
The list comprehension might look like this:
exclusive_paths = [p for p in paths if not any(node in p for node in important_nodes)]
I have an application with a graph, and I need to count the number of triangles in the graph using MrJob (MapReduce in Python). However, I'm having trouble wrapping my head around the mapping and reducing steps needed.
What is the best Map Reduce pipeline for computing the triangles of a network graph?
Well, it would help to have a bit more context to answer this. Do you have a single graph, a large number of graphs, or a tree? How many nodes are we talking about in your graph?
But in general, I would try to build a solution that uses the networkx package, specifically with the triangles function at its core.
An issue you may face is filtering duplicates, as the triangles are reported relative to a node.
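One way to handle that: nx.triangles counts each triangle once per corner node, so dividing the summed counts by three removes the duplicates (toy graph is mine):

```python
import networkx as nx

G = nx.complete_graph(4)  # K4 contains exactly 4 triangles

# Per-node counts: every triangle is reported once for each of its 3 corners
per_node = nx.triangles(G)
total = sum(per_node.values()) // 3
print(total)  # 4
```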
So a bit more context on the specifics would help narrow down the answer.
I have a very large connected graph (millions of nodes). Each edge has a weight -- identifying the proximity of the connected nodes. I want to find "clusters" in the graph (sets of nodes that are very close together). For instance, if the nodes were every city in the US and the edges were distance between the cities -- the clusters might be {Dallas, Houston, Fort Worth} and {New York, Bridgeport, Jersey City, Trenton}.
The clusters don't have to be the same size and not all nodes have to be in a cluster. Instead, clusters need to have some average minimum weight, W which is equal to (sum of weights in cluster) / (number of edges in cluster).
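The W criterion above could be written as a small scoring function (the helper name and the toy weights are mine):

```python
import networkx as nx

def cluster_avg_weight(G, nodes, weight="weight"):
    """W = (sum of weights in cluster) / (number of edges in cluster),
    over the subgraph induced by `nodes` (hypothetical helper)."""
    H = G.subgraph(nodes)
    if H.number_of_edges() == 0:
        return 0.0
    total = sum(d[weight] for _, _, d in H.edges(data=True))
    return total / H.number_of_edges()

# Toy proximity graph (higher weight = closer)
G = nx.Graph()
G.add_weighted_edges_from([("Dallas", "Houston", 0.9),
                           ("Houston", "Fort Worth", 0.8),
                           ("Dallas", "Fort Worth", 0.7),
                           ("Dallas", "New York", 0.1)])
avg = cluster_avg_weight(G, ["Dallas", "Houston", "Fort Worth"])
print(avg)
```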
I am most comfortable in Python, and NetworkX seems to be the standard tool for this.
What is the most efficient graph data structure in Python?
It seems like this would not be too hard to program, although not particularly efficiently. Is there a name for the algorithm I am describing? Is there an implementation in NetworkX already?
I know some graph partitioning algorithms whose goal is to make all parts approximately the same size with as small an edge cut as possible, but as you described it, you do not need such an algorithm. Anyway, I think your problem is NP-complete like many other graph partitioning problems.
Maybe there are algorithms which work specifically well for your problem (I think there are, but I do not know them), but I think you can still find good and acceptable solutions by slightly changing some algorithms which were originally designed for finding a minimum edge cut with equally sized components.
For example, see this. I think you can use multilevel k-way partitioning with some changes.
For example, in the coarsening phase you can use Light Edge Matching.
Consider a situation where, in the coarsening phase, you've matched A and B into one group and C and D into another group. The weight of the edge between these two groups is the sum of the edge weights between their members, e.g. W = Wac + Wad + Wbc + Wbd, where W is the combined edge weight, Wac is the edge weight between A and C, and so on. I also think that taking the average of Wac, Wad, Wbc and Wbd instead of their sum is worth a try.
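A sketch of that coarsening step in Python (the helper is hypothetical, not from any partitioning library):

```python
import networkx as nx

def coarsen_pairs(G, matching, weight="weight"):
    """Collapse each matched pair into one supernode; parallel edges
    between supernodes are merged by summing their weights (sketch)."""
    rep = {}  # node -> supernode representative
    for a, b in matching:
        rep[a] = rep[b] = (a, b)
    H = nx.Graph()
    for u, v, d in G.edges(data=True):
        su, sv = rep.get(u, u), rep.get(v, v)
        if su == sv:
            continue  # edge internal to a supernode
        w = d.get(weight, 1)
        if H.has_edge(su, sv):
            H[su][sv][weight] += w  # W = Wac + Wad + Wbc + Wbd
        else:
            H.add_edge(su, sv, **{weight: w})
    return H

G = nx.Graph()
G.add_weighted_edges_from([("A", "B", 1), ("C", "D", 1),
                           ("A", "C", 2), ("A", "D", 3),
                           ("B", "C", 4), ("B", "D", 5)])
H = coarsen_pairs(G, [("A", "B"), ("C", "D")])
print(H[("A", "B")][("C", "D")]["weight"])  # 14
```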
From my experience this algorithm is very fast, but I am not sure you will be able to find a precoded library in Python that you can modify.
Edited question to make it a bit more specific.
I'm not trying to base it on the content of nodes, but solely on the structure of the directed graph.
For example, PageRank (at first) solely used the link structure (a directed graph) to make inferences on what was more relevant. I'm not totally sure, but I think Elo (the chess ranking) does something similar to rank players (although it also incorporates scores).
I'm using python's networkx package but right now I just want to understand any algorithms that accomplish this.
Thanks!
Eigenvector centrality is a network metric that can be used to model the probability that a node will be encountered in a random walk. It factors in not only the number of edges a node has, but also the number of edges of the nodes it connects to, and onward through the nodes connected to those, and so on. It can be computed with a random walk, which is how Google's PageRank algorithm works.
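A minimal PageRank sketch on a toy directed graph (the graph is mine):

```python
import networkx as nx

# A directed star: every node links to node 0
G = nx.DiGraph([(1, 0), (2, 0), (3, 0)])

# PageRank rewards nodes that receive many (and well-ranked) links
pr = nx.pagerank(G)
top = max(pr, key=pr.get)
print(top)  # 0
```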
That said, the field of network analysis is broad and continues to develop with new and interesting research. The way you ask the question implies you might have a different impression. Perhaps start by looking over the three links I included here, see whether that gets you started, and then follow up with more specific questions.
You should probably take a look at Markov random fields and conditional random fields. Perhaps the closest thing to what you're describing is a Bayesian network.