I am trying to use the Python NetworkX package to validate some other code, but I am worried that load centrality does not mean what I think. When I run the example below, I expect to get only integer values for load, since it should just be a count at each node of the number of shortest paths passing through it (i.e., I list all the shortest paths between node pairs, then for each node 'v' count how many paths cross it, excluding paths where 'v' is the first or last node):
edges = [ ('a0','a1'),('a0','a2'),('a1','a4'),('a1','a2'),('a2','a4'),('a2','z5'),('a2','a3'),('a3','z5'),('a4','z5'),('a4','z6'),('a4','z7')
,('z5','z6'),('z5','z7'),('z5','z8'),('z6','z7'),('z6','z8'),('z6','z9'),('z7','z8'),('z7','z9'),('z8','z9')]
import networkx as nx
testg = nx.Graph( edges )
nx.load_centrality( testg, normalized=False )
I get output like this:
{'a0': 0.0,
'a1': 3.16666665,
'a2': 15.4999998,
'a3': 0.0,
'a4': 14.75,
'z5': 20.25,
'z6': 6.04166666,
'z7': 6.04166666,
'z8': 2.24999996,
'z9': 0.0}
These are similar to the values I compute by hand in terms of relative size, but why aren't they integer values? Every other network I have tested returns integer values for unnormalized load centrality, and I don't see anything in the definition that would lead to these values. The Python doc for this function says to see betweenness and also cites an article as a reference for the algorithm (which I can't access).
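For reference, here is the count I have in mind, computed directly with nx.all_shortest_paths and crediting only the interior nodes of each path (this is a sketch of my hand calculation, not of networkx's algorithm):

from itertools import combinations
import networkx as nx

def manual_load(G):
    counts = dict.fromkeys(G, 0)
    for s, t in combinations(G, 2):
        for path in nx.all_shortest_paths(G, s, t):
            for v in path[1:-1]:  # exclude the endpoints
                counts[v] += 1
    return counts

print(manual_load(testg))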
After very extensive calculations based on the paper Downshift linked to, it looks like 'load' follows the betweenness definition in that paper but subtracts a factor of (2n-1) to adjust for some overcounting in the algorithm. Either that, or the algorithm in the paper doesn't make clear that the initial packets of size 1 should only contribute to nodes they pass through and not to the endpoints of the paths. In any case, I can now match the output values of networkx. The values differ from networkx's own betweenness function, which follows the formula in the documentation based on node pairs rather than propagating packets of size 1 through the network.
In particular, because the packets split into equal parts at branch points, nodes can accumulate partial packets and therefore a non-integer 'load' value. That's not what the description in the networkx documentation implies, but it's clear enough now.
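The difference shows up if you run both measures side by side on the graph above; betweenness sums the fraction of pairwise shortest paths through each node, while load propagates unit packets that split at branch points:

print(nx.load_centrality(testg, normalized=False))
print(nx.betweenness_centrality(testg, normalized=False))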
Before I describe my problem I'll summarise what I think I'm looking for: I think I need a method for nearest-neighbor searches restricted by node type in Python (in my case a node represents an atom, and the node type represents which element the atom is), i.e., only returning the nearest neighbors of a given type. Maybe I am wording my problem incorrectly; I haven't been able to find any existing methods for this.
I am writing some ring statistics code to find different types of rings in molecular dynamics simulation data. The input data structure is a big array of atom id, atom type, and XYZ positions.
At the moment I only consider single-element systems (for example graphene, where only carbon atoms are present), so each node is considered the same type when finding its nearest neighbors and calculating the adjacency matrix.
For this, I am using scipy.spatial's KDTree to find all atoms within the bond length, r, of any given atom. If an atom is within radius r of a given atom, I consider it connected, and then populate and update an adjacency dictionary accordingly.
from scipy.spatial import KDTree

def create_adjacency_dict(data, r, leaf_size=5, box_size=None):
    tree = KDTree(data, leafsize=leaf_size, boxsize=box_size)
    # Find all neighbours within radius r of each point
    all_nn_indices = tree.query_ball_point(data, r, workers=5)
    adj_dict = {}
    for count, item in enumerate(all_nn_indices):
        adj_dict[count] = item  # Populate the adjacency dictionary
    for node, nodes in adj_dict.items():
        if node in nodes:
            nodes.remove(node)  # Remove each atom from its own neighbour list
    adj_dict = {k: set(v) for k, v in adj_dict.items()}
    return adj_dict
I would like to expand the code to deal with multi-species systems, for example AB2, AB2C4, etc. (where A, B and C represent different atomic species). However, I am struggling to figure out a nice way to do this.
A
/ \
B B
The obvious method would be a brute-force Euclidean approach. My idea is to input the bond types for a molecule, so for AB2 (shown above) you would input something like AB to indicate the types of bonds to consider, along with the respective bond lengths. Then loop over each atom, finding the distance to all other atoms; for this AB2 example, if an atom of type A is within the bond length of an atom of type B, consider them connected and populate the adjacency matrix. However, I'd like to be able to use the code on large datasets of 50,000+ atoms, so this method seems wasteful.
I suppose I could still use my current method, but just search for, say, the 10 nearest neighbors of a given atom and then do a Euclidean check on each atom pair, following the same approach as above. It still seems like a better method should already exist, though.
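Another variation I have been considering: build one KDTree per species and query between trees with the appropriate cutoff for each bond type (a sketch; the function name, species labels, and bond-length table are made up):

import numpy as np
from scipy.spatial import KDTree

def adjacency_by_species(positions, types, bond_lengths):
    # positions: (N, 3) array; types: length-N sequence of species labels;
    # bond_lengths: dict like {('A', 'B'): 1.9} giving the cutoff per bond type
    positions = np.asarray(positions)
    types = np.asarray(types)
    adj = {i: set() for i in range(len(positions))}
    trees = {}
    for t in np.unique(types):
        idx = np.flatnonzero(types == t)
        trees[t] = (idx, KDTree(positions[idx]))
    for (t1, t2), r in bond_lengths.items():
        idx1, tree1 = trees[t1]
        idx2, tree2 = trees[t2]
        # For every atom of species t1, find the atoms of species t2 within r
        for i, neighbours in enumerate(tree1.query_ball_tree(tree2, r)):
            for j in neighbours:
                if idx1[i] != idx2[j]:
                    adj[idx1[i]].add(idx2[j])
                    adj[idx2[j]].add(idx1[i])
    return adj

For the AB2 case this would be called as something like adjacency_by_species(positions, types, {('A', 'B'): 1.9}).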
Do better methods already exist for this type of problem, i.e. finding nearest neighbors restricted by node type? Or maybe someone knows a more correct wording of my problem, which I think is one of my issues here.
"Then search the data."
This sounds like that old cartoon where someone points to a humorously complex diagram with a tiny label in the middle that says "Here a miracle happens".
Seriously, I am guessing that this searching is what you need to optimize (you do not exactly say).
In turn, this suggests that you are doing a linear search through every atom, calculating the distance to each. Could it be so!?
There is a standard answer for this problem, called an octree.
https://en.wikipedia.org/wiki/Octree
A Netflix TV miniseries, 'The Billion Dollar Code', dramatizes the advantages of this approach: https://www.netflix.com/title/81074012
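For concreteness, here is a minimal, illustrative sketch of a point octree (unoptimized and not a library API; scipy's KDTree, which the question already uses, plays a similar role):

import numpy as np

class Octree:
    # Recursively split a cubic cell into 8 children; query() returns
    # the points within radius r of a target point.
    def __init__(self, points, center, half, capacity=16):
        self.center = np.asarray(center, dtype=float)
        self.half = float(half)
        self.points = np.asarray(points, dtype=float).reshape(-1, 3)
        self.children = None
        if len(self.points) > capacity and self.half > 1e-6:
            # Octant index 0..7 from the sign of each coordinate offset
            octant = ((self.points > self.center).astype(int) * [1, 2, 4]).sum(axis=1)
            self.children = []
            for k in range(8):
                offset = (np.array([(k >> b) & 1 for b in range(3)]) - 0.5) * self.half
                self.children.append(Octree(self.points[octant == k],
                                            self.center + offset,
                                            self.half / 2, capacity))
            self.points = None

    def query(self, target, r):
        # Prune any cell whose cube cannot intersect the query sphere
        if np.any(np.abs(np.asarray(target) - self.center) > self.half + r):
            return []
        if self.children is None:
            d = np.linalg.norm(self.points - np.asarray(target), axis=1)
            return [tuple(p) for p in self.points[d <= r]]
        return [p for child in self.children for p in child.query(target, r)]

pts = np.random.rand(1000, 3)
tree = Octree(pts, center=[0.5, 0.5, 0.5], half=0.5)
print(len(tree.query([0.5, 0.5, 0.5], r=0.1)))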
I have a networkx graph where every node has an attribute.
I need to extract nodes based on a numerical attribute in the range [0, inf) in order to create edges.
I tried using np.random.choice(G.nodes(), p) with p = attribute / (sum of the attributes in the graph).
The problem is that every time I extract a node to create an edge, its attribute changes (for example, let's say the attribute increases by 1), so I need to update all the probabilities, because the sum also increases by 1.
For example, I could have a graph with G.nodes(data=True) = {1: {'att': 10}, 2: {'att': 5}, 3: {'att': 2}}
So p=[10/17, 5/17, 2/17].
If I extract, for example, node 1 in the first extraction, my graph becomes G.nodes(data=True) = {1: {'att': 11}, 2: {'att': 5}, 3: {'att': 2}} and p = [11/18, 5/18, 2/18].
Now, because I have more than a thousand graphs, and for each one I need a loop of 50,000 iterations that creates edges, it is not computationally feasible to update all the probabilities every time I create an edge.
Is there a way to use only the node's attribute, or to avoid recalculating the probabilities every time?
Using a numpy array, I have done this:
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_nodes_from([1, 2, 3])
G.nodes[1]["att"] = 10
G.nodes[2]["att"] = 5
G.nodes[3]["att"] = 2
atts = {}
for i in G.nodes():
    atts[i] = G.nodes[i]["att"]
weights = np.fromiter(atts.values(), dtype="float")
extracted = np.random.choice(list(G.nodes()), p=weights / weights.sum())
When a node is extracted (for example node 1), G.nodes[1]["att"] += 1, and nothing else needs to be updated.
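If recomputing p for every draw ever becomes the bottleneck (the version above is still O(n) per extraction), one standard alternative, not a networkx feature, is a Fenwick (binary indexed) tree over the weights; it supports weighted draws and single-weight updates in O(log n). A sketch:

import numpy as np

class FenwickSampler:
    # Sample an index with probability proportional to its weight;
    # a single weight update costs O(log n) instead of rebuilding p.
    def __init__(self, weights):
        self.n = len(weights)
        self.tree = np.zeros(self.n + 1)
        self.total = 0.0
        for i, w in enumerate(weights):
            self.add(i, w)

    def add(self, i, delta):  # weights[i] += delta
        self.total += delta
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sample(self):  # P(return i) = weights[i] / total
        u = np.random.uniform(0.0, self.total)
        pos, step = 0, 1 << (self.n.bit_length() - 1)
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < u:
                u -= self.tree[nxt]
                pos = nxt
            step >>= 1
        return pos  # 0-based index into the original weight list

nodes = list(G.nodes())
sampler = FenwickSampler([G.nodes[i]["att"] for i in nodes])
idx = sampler.sample()
G.nodes[nodes[idx]]["att"] += 1
sampler.add(idx, 1)  # only the sampled weight needs updating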
We are given an m x n matrix w which represents the edge weights in the complete bipartite graph K_{m,n}. We wish to find a map {1,...,m} -> {1,...,n} with minimal weight which is injective or surjective. Choosing a map is equivalent to choosing, for every vertex v in {1,...,m}, exactly one edge incident to v.
Let m <= n. An injective function with minimal weight can be found by searching for a minimum-weight matching that covers {1,...,m}. In Python, this is implemented in scipy:
import numpy as np
import scipy.optimize

w = np.random.rand(5, 10)
print(scipy.optimize.linear_sum_assignment(w))
Let m>=n. How can a surjective function with minimal weight be found? I'm looking for a concrete implementation in Python.
EDIT: It turns out I might have misunderstood your question.
From injective to surjective
If m > n, and you already have an algorithm that handles the case m <= n, then swap the two components. Your algorithm will give you an injective function from the second component to the first. Take the inverse of that function; it will be a surjective function from a subset of the first component onto the second component.
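To make this concrete, and to extend the partial map to a total one: in an optimal surjection, I believe every row that is not "responsible" for covering a column can safely take its cheapest edge, so only the extra cost w[i, j] - min_k w[i, k] matters when assigning one distinct covering row to each column. A sketch with scipy (the function name is mine):

import numpy as np
from scipy.optimize import linear_sum_assignment

def min_weight_surjection(w):
    # w: shape (m, n) with m >= n; returns f with every column covered
    w = np.asarray(w)
    row_min = w.min(axis=1)
    extra = w - row_min[:, None]   # penalty for forcing row i onto column j
    cols, rows = linear_sum_assignment(extra.T)  # one distinct row per column
    f = np.argmin(w, axis=1)       # default: every row takes its cheapest edge
    f[rows] = cols                 # matched rows cover their assigned columns
    return f

w = np.random.rand(10, 5)
f = min_weight_surjection(w)
assert set(f) == set(range(5))     # surjective onto {0,...,n-1}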
Using networkx
What you are looking for is a maximum-cardinality matching in a bipartite graph.
The Python library networkx contains several functions for matchings. Standard problems are "maximum-cardinality matching" and "maximum-weight matching"; I'd be surprised if there were a function that solves your "minimum-weight maximum-cardinality matching" problem directly.
However, it looks to me as if your problem is equivalent to finding a maximum-weight matching in the weighted graph obtained by replacing every weight w by W - w, where W is some very large value (for instance, three times the maximum weight in the original graph).
By including this large value W in the weight of every edge, you force the maximum-weight matching to also be a maximum-cardinality matching. And by including the term -w, you ask the algorithm to prefer the edges with the smallest weights in the original graph.
Documentation: networkx.algorithms.matching.max_weight_matching
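A sketch of that transformation (the node labels, the factor of three, and the random matrix are just for illustration):

import networkx as nx
import numpy as np

w = np.random.rand(7, 4)   # m = 7, n = 4
W = 3 * w.max()            # large constant, as described above
B = nx.Graph()
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        B.add_edge(('row', i), ('col', j), weight=W - w[i, j])
matching = nx.max_weight_matching(B)   # set of matched node pairs
print(matching)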
I have a directed graph with a maximum of 7 nodes. Every node is connected to every other node (not including itself, of course) with a directed edge, and edges can have either positive or negative weights. My objective is to find a path from one given node to another such that the path has a specific length. However, there's a catch: not only can I make use of loops, but if I reach the end node, the path doesn't have to end immediately. This means that I can have a simple path leading to the end node, followed by a loop out of the end node that ultimately leads back into it. At the same time, I have to maximize the number of unique nodes visited by this path, so that if there are multiple paths of the desired length, I get the one with the most nodes in it.
Besides the problem with loops, I'm having trouble rephrasing this in terms of other, simpler problems, like Shortest Path or Traveling Salesman. I'm not sure how to start tackling this problem. I had an idea of finding all simple paths and all loops and recursively taking combinations of each, but this brings up the problem of loops within loops. Is there a more efficient approach to this problem?
Btw, I'm writing this in python.
EDIT: Another thing I forgot to mention: pairs of directed edges between nodes need not have the same weight. So A -> B might have weight -1, but B -> A might have weight 9.
EDIT 2: As requested, here's the input and output: I'm given the graph, the starting and exit nodes, and the desired length, and I return the path of desired length with the most visited nodes.
Sounds like a combinations problem, since you don't have a fixed end state.
Let's list what we know.
Every node is connected to every other node, though it is directed. This is a complete digraph. Link: https://en.wikipedia.org/wiki/Complete_graph.
You can cut off the algorithm when it exceeds the desired distance.
Be careful of an infinite loop, though; it is possible if the negative weights can balance out the positive ones.
In this example, I'd use recursion with a maximum depth based on the total number of nodes. While I won't do your homework, I'll attempt a skeleton start.
def recursion(depth, graph, path, previous_node, score, target, results):
    # 1A. Return if max depth exceeded
    if depth == 0:
        return results
    # 1B. Pruning when the score overshoots is only safe without negative weights
    # 1C. On a score match, record a copy of the path
    if score == target:
        results.append(list(path))
    # 2. Iterate and recurse through the neighbours of the current node:
    for node, weight in graph[previous_node].items():
        path.append(node)
        recursion(depth - 1, graph, path, node, score + weight, target, results)
        path.pop()  # backtrack before trying the next neighbour
    return results

# The results should contain all the possible paths with the given score.
This is where I'd start. Good luck.
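A hypothetical usage, with the graph stored as a dict of dicts of edge weights (the node names, weights, and target score below are made up):

graph = {'A': {'B': 2, 'C': -1},
         'B': {'A': 3, 'C': 1},
         'C': {'A': 4, 'B': 2}}
paths = recursion(6, graph, ['A'], 'A', 0, 5, [])
paths = [p for p in paths if p[-1] == 'C']  # keep only paths ending at the exit node
best = max(paths, key=lambda p: len(set(p)), default=None)  # most unique nodes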
I keep seeing pseudocode for depth-first search that is completely confusing to me in how it relates to my specific problem. I'm trying to determine whether or not a directed graph is strongly connected.
If I have a dict with 2 strings (the first represents the source, the second represents the destination) and an optional number that represents the edge weight:
{'Austin': {'Houston': 300}, 'SanFrancisco': {'Albany': 1000}, 'NewYorkCity': { 'SanDiego': True }}
How can I implement some of the elements of DFS? I know I can start at a vertex 'Austin' and that 'Houston' is another vertex, but I don't see how any of this works in Python code.
Like I have this pseudocode:
function graph_DFS(start):
    # Input: start vertex
    S = new Stack()
    # Mark start as visited
    S.push(start)
    while S is not empty:
        node = S.pop()
        # Do something? (e.g. print)
        for neighbor in node's adjacent nodes:
            if neighbor not visited:
                # Mark neighbor as visited
                S.push(neighbor)
I can see that I could pass 'Austin' as my start. But how in the world do I set 'Austin' to visited, and how do I see what nodes are adjacent to 'Austin'?
Plus, how can I even use this algorithm to return true or false depending on whether the graph is strongly connected?
I am just having a really hard time seeing this transfer from pseudocode to code. Appreciate any help.
I can see that I could pass 'Austin' as my start. But how in the world do I set 'Austin' to visited, and how do I see what nodes are adjacent to 'Austin'?
You can see in the code that you pop 'Austin' out, so we will not be looking back at it. In your data structure, you allow only one edge from a vertex, so you will never have more than one neighbor.
Plus how can I even use this algorithm to return true or false if the graph is strongly connected?
This is just a utility DFS function; the algorithm to find whether a graph is strongly connected or not requires running DFS twice. Basically, the hint is that you want to know whether you get a tree or a forest when running DFS.
In my opinion, you should update your data structure so that the value of every key is a list (the vertices to which it has an edge). You can store the weights in the list as well, in case you need them later.
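To make both points concrete, here is a sketch with the adjacency stored as a dict of lists (assuming every vertex appears as a key): a directed graph is strongly connected exactly when every vertex is reachable from some start vertex both in the graph and in its reverse.

def is_strongly_connected(graph):
    # graph: {vertex: [neighbours]}
    def reachable(adj, start):
        visited, stack = {start}, [start]   # mark start as visited
        while stack:
            node = stack.pop()
            for neighbor in adj.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)   # mark neighbor as visited
                    stack.append(neighbor)
        return visited

    vertices = set(graph)
    start = next(iter(vertices))
    reverse = {v: [] for v in vertices}
    for u in graph:
        for v in graph[u]:
            reverse[v].append(u)
    return reachable(graph, start) == vertices and \
           reachable(reverse, start) == vertices

g = {'Austin': ['Houston'],
     'Houston': ['Austin', 'SanFrancisco'],
     'SanFrancisco': ['Austin']}
print(is_strongly_connected(g))  # True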