What happens with unweighted graphs in the Node2Vec algorithm? - python

According to the paper introducing Node2Vec by Grover and Leskovec (node2vec: Scalable Feature Learning for Networks), the edge transition probabilities are precomputed in the first step of the algorithm; the random walks only follow afterwards. I am working on an embedding of a NetworkX graph using Node2Vec, which works perfectly fine. However, I am not entirely sure how the transition probabilities are computed when the original graph has no edge weights.
Is it that each outgoing connection gets the same transition probability, for instance 1/3 each when there are three edges? Or do p (the return hyperparameter) and q (the in-out hyperparameter) already play a role here? Or does this first step simply get skipped for unweighted graphs?
Every hint or more explanation regarding this topic is highly appreciated!
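For reference: in the paper, the unnormalized transition probability for stepping from v to a neighbour x, having just arrived from t, is alpha_pq(t, x) * w_vx, and for an unweighted graph w_vx is simply taken as 1. So the first step is not skipped; it just reduces to the p/q bias, and with p = q = 1 every neighbour is indeed equally likely (1/3 when there are three edges). A minimal sketch of that computation, assuming a NetworkX graph (the helper name and the karate-club example are illustrative, not from the question):

import networkx as nx

def node2vec_transition_probs(G, t, v, p=1.0, q=1.0):
    """Normalized transition probabilities for a walk that just moved t -> v."""
    biased = {}
    for x in G.neighbors(v):
        w = G[v][x].get("weight", 1.0)   # unweighted edges count as weight 1
        if x == t:                        # step back to the previous node
            biased[x] = w / p
        elif G.has_edge(t, x):            # x is also a neighbour of t
            biased[x] = w
        else:                             # move further away from t
            biased[x] = w / q
    total = sum(biased.values())
    return {x: b / total for x, b in biased.items()}

G = nx.karate_club_graph()
print(node2vec_transition_probs(G, t=0, v=1, p=1.0, q=2.0))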

Related

Model for generating and detecting communities in dense network

I have a complete undirected weighted graph. Think of a graph where persons are nodes and an edge (u, v, w) indicates the kind of relationship between u and v with weight w. w can take the value 1 (don't know each other, hence the completeness), 2 (acquaintances), or 3 (friends). These kinds of relationships naturally form clusters based on the edge weight.
My goal is to define a model that captures this phenomenon and from which I can sample graphs and compare them with the behaviour observed in reality.
So far I've played with stochastic block models (https://graspy.neurodata.io/tutorials/simulations/sbm.html), since there are some papers about using these generative models for community-detection tasks. However, I may be overlooking something, since I can't seem to fully represent what I need: g = sbm(list_of_params), where g is complete and there are some discernible clusters among nodes sharing weight 3.
At this point I am not even sure whether sbm is the best approach for this task.
I am also assuming that everything graph-tool can do, graspy can also do; from what I read about both at the beginning, that seemed to be the case.
Summarizing:
Is there a way to generate a stochastic block model in graspy that yields a complete undirected weighted graph?
Is sbm the best model for the task? Should I be looking at gmm?
Thanks
Is there a way to generate a stochastic block model in graspy that yields a complete undirected weighted graph?
Yes, but as pointed out in the comments above, that's a strange way to specify the model. If you want to benefit from the deep literature on community detection in social networks, you should not use a complete graph. Do what everyone else does: The presence (or absence) of an edge should indicate a relationship (or lack thereof), and an optional weight on the edge can indicate the strength of the relationship.
To generate graphs from SBM with weights, use this function:
https://graspy.neurodata.io/reference/simulations.html#graspologic.simulations.sbm
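A minimal sketch of how that function attaches weights (the block sizes, edge probabilities, and Poisson weight distribution below are illustrative choices, not values from the question):

import numpy as np
from graspologic.simulations import sbm

# Two blocks of 25 nodes; within-block edges are more likely than between-block ones.
n = [25, 25]
p = [[0.8, 0.2],
     [0.2, 0.8]]

# wt/wtargs draw a weight for every realized edge, here Poisson(lam=3).
A, labels = sbm(n, p, wt=np.random.poisson, wtargs=dict(lam=3), return_labels=True)
print(A.shape, labels[:5])

Setting every entry of p to 1 would make the sampled graph complete, as asked, though for the reasons above it is usually better to let the sparsity itself carry information.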
I am also assuming that everything that graph-tool can do, graspy can also do.
This is not true. There are (at least) two different popular methods for inferring the parameters of an SBM. Unfortunately, the practitioners of each method seem to avoid citing each other in their papers and code.
graph-tool uses an MCMC statistical inference approach to find the optimal graph partitioning.
graspologic (formerly graspy) uses a trick related to spectral clustering to find the partitioning.
From what I can tell, the graph-tool approach offers more straightforward and principled model selection methods. It also has useful extensions, such as overlapping communities, nested (hierarchical) communities, layered graphs, and more.
I'm not as familiar with the graspologic (spectral) methods, but -- to me -- they seem more difficult to extend beyond merely seeking a point estimate for the ideal community partitioning. You should take my opinion with a hefty bit of skepticism, though. I'm not really an expert in this space.
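For comparison, a minimal sketch of the graph-tool MCMC route (assuming graph-tool is installed; the bundled football network is just an example graph):

import graph_tool.all as gt

g = gt.collection.data["football"]

# Fit a stochastic block model by minimizing the description length via MCMC.
state = gt.minimize_blockmodel_dl(g)
blocks = state.get_blocks()       # inferred community label for every vertex
print(state.get_B(), "blocks found")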

Python programme to find relation between two parameters

I have experimental values: 16 intensity values corresponding to 16 distances. I want to find the relation between these points as an approximate equation, so that I can tell the distance corresponding to a given intensity value without plotting the graph.
Is there any Python programme for this?
I can share the values, if required.
Based on the values you have given us, I highly doubt fitting a graph rule to this will work at all. The reason is this:
If you aren't concerned with minute changes (in the decimals), then 5.9 is a fair estimate. If you are concerned with those changes, then the data looks erratic, and I highly doubt you will get an r^2 value sufficient for any practical use.
If you had significantly more points you might be able to derive a rule from this, or even apply a machine learning model to it (the data is simple enough that a basic feed-forward neural network would work; search for TensorFlow), but with just those points a guess of 5.9 is as good as any.
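If you do want to try a fit anyway, a minimal sketch with numpy.polyfit shows how to check whether any low-degree polynomial explains the points (the arrays below are placeholders, not the asker's data):

import numpy as np

# Placeholder data: replace with the 16 measured (distance, intensity) pairs.
distance = np.arange(1, 17, dtype=float)
intensity = np.array([5.9, 6.1, 5.8, 6.0, 5.7, 6.2, 5.9, 5.8,
                      6.0, 5.9, 6.1, 5.8, 5.9, 6.0, 5.7, 6.1])

for degree in (1, 2, 3):
    coeffs = np.polyfit(distance, intensity, degree)
    fitted = np.polyval(coeffs, distance)
    ss_res = np.sum((intensity - fitted) ** 2)
    ss_tot = np.sum((intensity - intensity.mean()) ** 2)
    print(f"degree {degree}: r^2 = {1 - ss_res / ss_tot:.3f}")

A low r^2 at every degree would back up the point above: with this few noisy points, the constant estimate is as good as any fitted curve.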

QuickSI algorithm for finding subgraph isomorphisms

I am studying the Quick Subgraph Isomorphism (QuickSI) algorithm and I am having a problem understanding the formulae for the inner support and average inner support described on page 6, equations (2) and (3). If "v" stands for vertex and "e" stands for edge, then what do f(v) and f(e) do? How can I obtain the values of Table 2 on page 6? Definition 4 on page 5 does not really help me understand. By isomorphic mappings from the query graph to the data graph I understand taking different components of the query graph and seeing whether they can be found in the data graph. But the computation time for this does not seem feasible for large graphs.
Here you can find the original article:
http://www.cse.unsw.edu.au/~lxue/10papers/vldb08_haichuan.pdf
Thank you in advance!
The function f is described in Definition 1 - it's just the isomorphism function that preserves the labels (l).
The 'average inner-support' is the number of 'features' (for example, vertices) that have an isomorphism divided by the number of graphs that have an isomorphism. To get the values of the table, you would need to know the dataset of graphs (D) that was used. It doesn't seem to be referenced except in Example 4.
Really, taking a step back - do you need to implement this particular algorithm? There are plenty of simpler ones that might be slightly slower, but clearer. Furthermore, why not use someone else's implementation of a subgraph isomorphism algorithm?
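For instance, NetworkX ships a VF2-based matcher; a minimal sketch (the two small example graphs are illustrative, not from the paper):

import networkx as nx
from networkx.algorithms import isomorphism

data_graph = nx.complete_graph(5)    # the larger "data" graph
query_graph = nx.cycle_graph(3)      # the pattern to look for

gm = isomorphism.GraphMatcher(data_graph, query_graph)
print(gm.subgraph_is_isomorphic())   # True: K5 contains a triangle

# Enumerate a few concrete mappings (data node -> query node).
for i, mapping in enumerate(gm.subgraph_isomorphisms_iter()):
    print(mapping)
    if i == 2:
        break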

What algorithms can I use to make inferences from a graph?

Edited question to make it a bit more specific.
I'm not trying to base it on the content of the nodes but solely on the structure of the directed graph.
For example, PageRank (at first) used only the link structure (a directed graph) to infer what was more relevant. I'm not totally sure, but I think Elo (chess ranking) does something similar to rank players (although it also incorporates scores).
I'm using Python's networkx package, but right now I just want to understand any algorithms that accomplish this.
Thanks!
Eigenvector centrality is a network metric that can be used to model the probability that a node will be encountered in a random walk. It factors in not only the number of edges a node has but also the number of edges its neighbours have, and their neighbours' neighbours, and so on. It can be implemented with a random walk, which is how Google's PageRank algorithm works.
That said, the field of network analysis is broad and continues to develop with new and interesting research. The way you ask the question implies that you might have a different impression. Perhaps start by looking over the three links I included here and see if that gets you started and then follow up with more specific questions.
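A minimal sketch with networkx (the small directed example graph is just an illustration):

import networkx as nx

# Small directed graph standing in for your link structure.
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 3), (2, 4), (4, 5), (5, 3)])

pr = nx.pagerank(G, alpha=0.85)                   # random-walk-with-teleport ranking
eig = nx.eigenvector_centrality(G, max_iter=500)  # importance from incoming links

for node in sorted(G, key=pr.get, reverse=True):
    print(node, round(pr[node], 3), round(eig[node], 3))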
You should probably take a look at Markov random fields and conditional random fields. Perhaps the closest thing to what you're describing is a Bayesian network.

Knight's Tour using a Neural Network

I was looking at the knight's tour problem and decided to have a go at implementing it in Python using a neural network to find solutions.
The general explanation of the method can be found on Wikipedia.
While I think I have implemented it correctly (I can't see anything else that is wrong), it doesn't work: it updates a few links, removing the edges where the connecting vertex has a degree greater than two, but it doesn't converge on a solution.
I was wondering if anyone had any ideas on what I have implemented incorrectly (sorry about the horrible code).
EDIT
Working code can be found at GitHub https://github.com/Yacoby/KnightsTour
You can't update the neurons in place. Since U[t+1] depends on U[t] and V[t], if you have already updated V, the calculation for U will be wrong.
I think you should split the update into two phases, update_state and update_output, so that all the U are updated first and then all the V:
for n in neurons:
    n.update_state()
for n in neurons:
    n.update_output()
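A minimal sketch of what that two-phase pattern could look like (this Neuron class is a hypothetical illustration roughly following the update rule described on the Wikipedia page, not the asker's actual code):

class Neuron:
    """One neuron per edge of the knight-move graph."""

    def __init__(self):
        self.state = 0          # U[t]
        self.output = 1         # V[t]
        self.next_state = 0     # U[t+1], staged until all neurons have computed it
        self.neighbours = []    # neurons sharing a vertex with this edge (self excluded)

    def update_state(self):
        # Compute U[t+1] from the *current* outputs only; nothing that other
        # neurons read is modified in this phase.
        self.next_state = self.state + 2 - sum(n.output for n in self.neighbours)

    def update_output(self):
        # Commit the new state and derive V[t+1] from it.
        self.state = self.next_state
        if self.state > 3:
            self.output = 1
        elif self.state < 0:
            self.output = 0
        # otherwise the output stays as it was

def step(neurons):
    for n in neurons:       # phase 1: states, using the old outputs everywhere
        n.update_state()
    for n in neurons:       # phase 2: only now do the outputs change
        n.update_output()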
First impression is that you only have one buffer for the board. I'm basing this on the fact that I don't see any buffer swaps between iterations - I haven't looked that closely and may easily be wrong.
If you modify a single buffer in place, when you do the neighbour counts, you base them on a partly modified board - not the board you had at the start.
After looking over your code, I think your explanation of the formula you used may be incorrect. You say that when updating the state you add four rather than two and subtract the output of the neuron itself. It looks to me like you're subtracting the output of the neuron itself twice: your code that finds neighbours does not appear to distinguish between neighbours of the neuron and the neuron itself, and you run it twice, once for each vertex.
Tests on my own code seem to confirm this. Convergence rates drastically improve when I subtract the neuron's own output twice rather than once.
