Edited question to make it a bit more specific.
I'm not trying to base it on the content of the nodes but solely on the structure of the directed graph.
For example, PageRank (at first) used solely the link structure (a directed graph) to make inferences about what was more relevant. I'm not totally sure, but I think Elo (the chess rating system) does something similar to rank players (although it also incorporates scores).
I'm using Python's networkx package, but right now I just want to understand any algorithms that accomplish this.
Thanks!
Eigenvector centrality is a network metric that can be used to model the probability that a node will be encountered in a random walk. It factors in not only the number of edges a node has, but also the number of edges its neighbors have, and the neighbors of those neighbors, and so on recursively through the network. It can be estimated with a random walk, which is how Google's PageRank algorithm works.
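As a minimal sketch (the toy graph here is invented for illustration), networkx exposes this idea directly via its PageRank implementation, which uses only the link structure:

```python
import networkx as nx

# Toy directed graph: an edge points from "endorser" to "endorsed".
G = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "d"), ("d", "b")])

# PageRank models the stationary distribution of a damped random
# walk over the link structure alone (no node content involved).
pr = nx.pagerank(G, alpha=0.85)

# Node "b" collects the most link mass, so it ranks highest.
top = max(pr, key=pr.get)
```

`nx.eigenvector_centrality` works the same way for undirected graphs; both rank nodes purely from structure.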
That said, the field of network analysis is broad and continues to develop with new and interesting research. The way you ask the question implies that you might have a different impression. Perhaps start by looking over the three links I included here and see if that gets you started and then follow up with more specific questions.
You should probably take a look at Markov Random Fields and Conditional Random Fields. Perhaps the closest thing to what you're describing is a Bayesian Network.
According to the paper introducing Node2Vec, by Grover and Leskovec ("node2vec: Scalable Feature Learning for Networks"), the edge transition probabilities are computed in the first step of the algorithm; the random walks only follow afterwards. I am working on an embedding of a Networkx graph using Node2Vec, which works perfectly fine. However, I am not entirely sure how the transition probabilities are computed when the original graph has no edge weights.
Does each outgoing connection get the same transition probability, so that, for instance, each is 1/3 when there are three edges? Or do p (the return hyperparameter) and q (the in-out hyperparameter) already play a role here? Is this first step simply skipped for unweighted graphs?
Every hint or more explanation regarding this topic is highly appreciated!
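To make the question concrete, here is a sketch of node2vec's second-order transition probabilities for an unweighted graph (the tiny graph and parameter values are made up for illustration). Every edge weight is treated as 1, so the only bias comes from p and q:

```python
# Undirected adjacency list of a toy graph.
adj = {
    "t":  {"v", "x1"},
    "v":  {"t", "x1", "x2"},
    "x1": {"t", "v"},
    "x2": {"v"},
}

def transition_probs(prev, cur, p, q):
    """P(next | the walk just moved prev -> cur), unweighted case."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:            # distance 0: stepping back (return)
            weights[nxt] = 1.0 / p
        elif nxt in adj[prev]:     # distance 1: neighbor of prev
            weights[nxt] = 1.0
        else:                      # distance 2: moving outward
            weights[nxt] = 1.0 / q
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

probs = transition_probs("t", "v", p=1.0, q=1.0)
```

With p = q = 1 every unweighted edge is equally likely (here 1/3 each), i.e. the first step degenerates to uniform normalization; with p or q away from 1 the probabilities are biased even though the graph is unweighted.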
I have a complete undirected weighted graph. Think of a graph where persons are nodes and the edge (u, v, w) indicates the kind of relationship between u and v with weight w. w can take any value between 1 (don't know each other, hence the completeness), 2 (acquaintances), and 3 (friends). These kinds of relationships naturally form clusters based on the edge weight.
My goal is to define a model that captures this phenomenon and from which I can sample some graphs and compare them with the behaviour observed in reality.
So far I've played with stochastic block models (https://graspy.neurodata.io/tutorials/simulations/sbm.html), since there are some papers about using these generative models for community-detection tasks. However, I may be overlooking something, since I can't seem to fully represent what I need: g = sbm(list_of_params), where g is complete and there are some discernible clusters among nodes sharing weight 3.
At this point I am not even sure whether sbm is the best approach for this task.
I am also assuming that everything graph-tool can do, graspy can also do; when I first read about both, that seemed to be the case.
Summarizing:
Is there a way to generate a stochastic block model in graspy that yields a complete undirected weighted graph?
Is SBM the best model for the task, or should I be looking at a GMM?
Thanks
Is there a way to generate a stochastic block model in graspy that yields a complete undirected weighted graph?
Yes, but as pointed out in the comments above, that's a strange way to specify the model. If you want to benefit from the deep literature on community detection in social networks, you should not use a complete graph. Do what everyone else does: The presence (or absence) of an edge should indicate a relationship (or lack thereof), and an optional weight on the edge can indicate the strength of the relationship.
To generate graphs from SBM with weights, use this function:
https://graspy.neurodata.io/reference/simulations.html#graspologic.simulations.sbm
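Independently of graspologic's API, the idea can be sketched in plain numpy: set every connection probability to 1 so the sampled graph is complete, and let the community structure live entirely in the edge weights. The block sizes and weight distributions below are invented for illustration, mirroring the 1/2/3 scale in the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two communities of 3 nodes each (illustrative sizes).
sizes = [3, 3]
same = [0.05, 0.15, 0.80]   # same block: mostly friends (weight 3)
diff = [0.80, 0.15, 0.05]   # different blocks: mostly strangers (weight 1)

labels = np.repeat(np.arange(len(sizes)), sizes)
n = labels.size

# Connection probability is 1 everywhere, so every off-diagonal
# entry gets a weight: the result is a complete weighted graph.
W = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        p = same if labels[i] == labels[j] else diff
        w = rng.choice([1, 2, 3], p=p)
        W[i, j] = W[j, i] = w
```

The same construction should map onto graspologic's `sbm` by passing a block probability matrix of all ones together with a weight-sampling function.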
I am also assuming that everything that graph-tool can do, graspy can also do.
This is not true. There are (at least) two different popular methods for inferring the parameters of an SBM. Unfortunately, the practitioners of each method seem to avoid citing each other in their papers and code.
graph-tool uses an MCMC statistical inference approach to find the optimal graph partitioning.
graspologic (formerly graspy) uses a trick related to spectral clustering to find the partitioning.
From what I can tell, the graph-tool approach offers more straightforward and principled model selection methods. It also has useful extensions, such as overlapping communities, nested (hierarchical) communities, layered graphs, and more.
I'm not as familiar with the graspologic (spectral) methods, but -- to me -- they seem more difficult to extend beyond merely seeking a point estimate for the ideal community partitioning. You should take my opinion with a hefty bit of skepticism, though. I'm not really an expert in this space.
The title may be a little unclear, but to give a brief explanation: I'm applying some biological networks, like protein networks, to programming. I want to use a breadth-first search to calculate some values. Here's an example of a network I'm currently working with:
On a side note, just because a node isn't named doesn't mean it's not a node; it just means its name is not significant for the network.
Simpler example:
My problem here is that I need to represent this network with a data structure, which I need to use to calculate 2 values for every node:
The # of signal paths for a node (how many paths from input to output include the node)
The # of feedback loops for a node (how many loop paths the node is in)
I need to calculate these values for every single node in the network. Python came to mind because it's a standard for bioinformatics, but I'm open to other languages with potentially built-in structures. Within Python, the only thing that comes to mind is some form of DFA/dictionary sort of deal to represent these kinds of networks, but I'm posting the question here to see if anyone else has a better idea.
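For what it's worth, both values can be computed directly with networkx on a small graph; this is a sketch on a hypothetical network (node names invented), not the one in the question:

```python
import networkx as nx

# Hypothetical signalling network: "in" and "out" are the input and
# output nodes; b and c form a feedback loop.
G = nx.DiGraph([("in", "a"), ("a", "b"), ("b", "c"), ("c", "b"),
                ("a", "out"), ("b", "out")])

# Signal paths per node: simple input->output paths containing it.
paths = list(nx.all_simple_paths(G, "in", "out"))
signal_paths = {n: sum(n in p for p in paths) for n in G}

# Feedback loops per node: simple cycles containing it.
cycles = list(nx.simple_cycles(G))
feedback = {n: sum(n in c for c in cycles) for n in G}
```

Note that both `all_simple_paths` and `simple_cycles` can blow up combinatorially on dense graphs, so this is only practical for networks of modest size.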
NetworkX works well. If you read section 4.39.2 of the documentation, you will see how to do BFS with NetworkX.
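A minimal BFS sketch with networkx, on a made-up graph:

```python
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])

# Edges in the order BFS discovers them, starting from node 1.
order = list(nx.bfs_edges(G, source=1))

# Nodes in BFS order: the source plus each newly discovered node.
nodes = [1] + [v for _, v in order]
```

`nx.bfs_tree(G, source=1)` similarly returns the BFS tree itself as a directed graph.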
I've an application where I have a graph and I need to count the number of triangles in the graph using MrJob (MapReduce in Python). However, I'm having some trouble wrapping my head around the mapping and the reducing steps needed.
What is the best Map Reduce pipeline for computing the triangles of a network graph?
Well, it would help to have a bit more context to answer this. Do you have a single graph or a large number of graphs? Is it a tree? How many nodes are we talking about in your graph?
But in general, I would try to build a solution that uses the networkx package, specifically the triangles method at the core.
An issue you may face is filtering duplicates: the triangles are reported relative to each node, so every triangle gets counted once per corner.
So a bit more context here on the specifics here would help narrow down the answer.
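In the meantime, here is a sketch of the classic "node-iterator" map/reduce pipeline for triangle counting, written as plain Python rather than MrJob (the toy graph is invented; porting each phase to an MrJob step is left to the reader):

```python
from collections import defaultdict
from itertools import combinations

# Toy undirected graph with exactly one triangle: (1, 2, 3).
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Map: for every node (the "apex"), emit each pair of its neighbors
# as a candidate open triad, keyed by the sorted pair; also emit
# every real edge under the same kind of key.
intermediate = defaultdict(list)
for node, nbrs in adj.items():
    for pair in combinations(sorted(nbrs), 2):
        intermediate[pair].append("triad")
for u, v in edges:
    intermediate[tuple(sorted((u, v)))].append("edge")

# Reduce: a pair that is both a real edge and an open triad closes a
# triangle. Each triangle is found once per apex, i.e. three times,
# so divide by 3 at the end.
closed = 0
for values in intermediate.values():
    if "edge" in values:
        closed += values.count("triad")
triangle_count = closed // 3
```

The map phase shuffles on the neighbor pair, and the reduce phase only needs the values for one key at a time, which is what makes this shape fit MapReduce.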
I have an undirected weighted graph G with a set of nodes and weighted edges.
I want to know if there is a method implemented in networkx to find a minimum spanning tree in a graph between given nodes (e.g. nx.steiner_tree(G, ['Berlin', 'Kiel', 'Munster', 'Nurnberg'])) (apparently there is none?).
I don't have reputation points to post images. The link to similar image could be:
Map
(A3, C1, C5, E4)
What I'm thinking:
compute Dijkstra's shortest paths between all destination nodes;
put all the nodes (intermediate and destination) and edges into a new graph V;
compute the MST on V (to remove cycles by breaking the longest edge);
Maybe there are better ways (correctness- and computation-wise)? This approach does pretty badly with three destination nodes and gets better as more are added.
P.S. My graph is planar (it can be drawn on paper so that edges do not intersect). So maybe some kind of spring/force algorithm (like in d3 visualisations) could help?
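The three steps above can be sketched with networkx roughly like this (the weighted graph and node names are invented; "M" plays the role of a cheap intermediate hub):

```python
from itertools import combinations

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "M", 1), ("B", "M", 1), ("C", "M", 1),
    ("A", "B", 3), ("B", "C", 3),
])
destinations = ["A", "B", "C"]

# 1) Dijkstra shortest paths between all destination pairs.
# 2) Put every node and edge on those paths into a new graph V,
#    carrying the original edge weights over.
V = nx.Graph()
for s, t in combinations(destinations, 2):
    path = nx.dijkstra_path(G, s, t)
    for u, w in zip(path, path[1:]):
        V.add_edge(u, w, weight=G[u][w]["weight"])

# 3) MST on V removes any cycles the overlapping paths created.
T = nx.minimum_spanning_tree(V)
total = T.size(weight="weight")
```

On this example the heuristic happens to find the optimal tree (the star through M, total weight 3), but in general it is only an approximation.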
As I understand your question, you're trying to find the lowest-weight connected subgraph that contains a given set of nodes. This is the Steiner tree in graphs problem. It is NP-complete, so you're probably best off using a heuristic suited to the specific case you are studying.
For two nodes, the approach is Dijkstra's algorithm; it's fastest if you expand around both nodes until the two search shells intersect. For three nodes I suspect a version of Dijkstra's algorithm where you expand around each of the nodes will give some good results, but I don't see how to guarantee you're getting the best.
I've found another question for this, which has several decent answers (the posted question had an unweighted graph so it's different, but the algorithms given in the answers are appropriate for weighted). There are some good ones beyond just the accepted answer.
In networkx there's a standard Kruskal algorithm implemented that takes an undirected weighted graph as input. The function is called "minimum_spanning_tree".
I propose you build a subgraph that contains the nodes you need and then let the Kruskal algorithm run on it:
import networkx as nx
H = G.subgraph(['Berlin', 'Kiel', 'Konstanz'])
MST = nx.minimum_spanning_tree(H)
As pointed out already, this is the Steiner tree problem in graphs.
There is a Steiner tree algorithm in networkx:
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.steinertree.steiner_tree.html
However, it only gives you an approximate solution, and it is also rather slow. For state-of-the-art solvers, see the section "External Links" under:
https://en.wikipedia.org/wiki/Steiner_tree_problem
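A minimal usage sketch of the networkx approximation (the weighted graph is invented, with city names echoing the question's example):

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ("Berlin", "Hamburg", 3), ("Hamburg", "Kiel", 1),
    ("Berlin", "Munster", 5), ("Hamburg", "Munster", 3),
    ("Munster", "Nurnberg", 5), ("Berlin", "Nurnberg", 4),
])
terminals = ["Berlin", "Kiel", "Munster", "Nurnberg"]

# Returns an approximate Steiner tree: a low-weight subgraph that
# connects all terminals, possibly via intermediates like Hamburg.
T = steiner_tree(G, terminals)
```

The result is guaranteed to be within a factor of 2 of the optimal tree, which is often good enough in practice.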