Basic questions about nested blockmodel in graph-tool

Very briefly, two or three basic questions about the minimize_nested_blockmodel_dl function in the graph-tool library. Is there a way to figure out which vertex falls into which block? In other words, can I extract, for each block, a list containing the labels of its vertices?
The hierarchical visualization is rather difficult to understand for amateurs in network theory. For example, are the squares with directed edges meant to indicate the main direction of the underlying edges between the two blocks under consideration? The blocks are nicely shown using different colors, but on a very conceptual level, which types of patterns or edge/vertex properties are behind the block categorization of vertices? In other words, when two vertices are in the same block, what can I say about their common properties?

Regarding your first question, it is fairly straightforward: The minimize_nested_blockmodel_dl() function returns a NestedBlockState object:
from graph_tool.all import *  # provides collection and minimize_nested_blockmodel_dl
g = collection.data["football"]
state = minimize_nested_blockmodel_dl(g)
You can query the group membership of the nodes by inspecting the first level of the hierarchy:
lstate = state.levels[0]
This is a BlockState object, from which we get the group memberships via the get_blocks() method:
b = lstate.get_blocks()
print(b[30]) # prints the group membership of node 30
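To get, for each block, the list of vertices it contains (the first question), the membership map can be inverted. The following is a minimal sketch: `vertices_by_block` is a hypothetical helper, not part of graph-tool, and a plain list stands in for the real labels so the example runs on its own.

```python
from collections import defaultdict

def vertices_by_block(block_labels):
    """Map each block label to the list of vertex indices assigned to it.

    `block_labels` is any sequence where position i holds the block of vertex i.
    """
    groups = defaultdict(list)
    for vertex, block in enumerate(block_labels):
        groups[int(block)].append(vertex)
    return dict(groups)

# Stand-in labels for six vertices (an assumption for illustration only).
# With graph-tool one would pass `b.a`, the array view of the property map
# returned by get_blocks(), instead of this list.
example_labels = [2, 0, 2, 1, 0, 2]
members = vertices_by_block(example_labels)
print(members)
```

Block labels are not guaranteed to be contiguous, which is why the sketch uses a dictionary rather than a list of lists.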
Regarding your second question, the stochastic block model assumes that nodes that belong to the same group have the same probability of connecting to the rest of the network. Hence, nodes that get classified in the same group by the function above have similar connectivity patterns. For example, if we look at the fit for the football network:
state.draw(output="football.png")
We see that nodes that belong to the same group tend to have more connections to other nodes of the same group, which is a typical example of community structure. However, this is just one of the many possibilities that can be uncovered by the stochastic block model. Other topological patterns include core-periphery organization, bipartiteness, etc.

Related

Use GCN-LSTM to generate descriptions for scenes represented by graphs

Given a scene, I retrieve objects contained in it with an object detector. Next, I identify the potential unary (one object) and binary (a pair of objects) visual relationships in this scene (using specialized classifiers) by specifying the most likely attribute for each unary relationship (e.g. Object Cat with attribute "Sitting", which means that "The cat is sitting") and the most likely predicate for each binary relationship (e.g. A pair of objects [Cup - Table] with the predicate "On", which means that "The cup is on the table").
All of these relationships are modeled by a directed graph of the following form:
Each node represents an object of the scene (with its id).
Each unary relation (attribute) is modeled by a loop (an arrow from a node to itself, labeled with the corresponding attribute).
Each binary relation is modeled by an arrow from one node (the left side of the relation) to another (the right side of the relation), labeled with the corresponding predicate.
The figure below shows an example of the relationship graph construction from a scene. The latter contains four objects: "cup 1", "cup 2", "cat" and a "table". The defined relationships are:
Unary relations (only one): For the "cat" object ("standing").
Binary relationships (5 relationships): Three with the predicate "on" ([Cat - Table], [Cup 1 - Table], [Cup 2 - Table]) and two with the predicate "next to" ([Cat - Cup 1], [Cat - Cup 2]).
The goal is to train a GCN-LSTM that receives the graph above as input and returns a description that corresponds to this graph, and therefore to the initial scene.
Training is done by creating, for each training scene, its graph (the input) and pairing it with its description (a paragraph) as the output. For example, for the previous figure, the training description is: "The cat standing on the table is next to a couple of cups". The process for this example is shown in the figure below.
I looked for implementation examples of GCN-LSTM for a similar problem (in particular, in the StellarGraph documentation), but found nothing. Is there a way to build such a model using StellarGraph? If yes, how? If not, which tool can help me build this model?

GCN-LSTM is designed for encoding graphs with node features that are sequences, and doing forecasting on those sequences. In this case, it looks like you might be trying to encode a graph with fixed features and then use that encoding to generate a sentence.
For this, an appropriate approach would be to use a graph classification model, such as GCNSupervisedGraphClassification or DeepGraphCNN. These can be used to encode a graph into a vector for input into a separate LSTM decoder.
Notably, these models cannot easily incorporate edge features or types, so some aspects of the on/next to/standing modelling would have to be adjusted, for example by incorporating them into the nodes as features.
For instance, one can replace the final Dense(units=1, activation="sigmoid") layer (or more of the Dense layers) in https://stellargraph.readthedocs.io/en/stable/demos/graph-classification/gcn-supervised-graph-classification.html#Create-the-Keras-graph-classification-model with a more complicated decoder model (e.g. an LSTM).
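As an illustration of folding the relations into node features, here is one possible encoding of the example scene from the question. It is only a sketch under assumptions: the vocabularies, the `node_features` helper and the choice of per-predicate counts are all hypothetical, and nothing here is StellarGraph API.

```python
# Hypothetical vocabularies for the example scene; a real system would use
# the label sets of its detector and relationship classifiers.
CLASSES = ["cup", "cat", "table"]
ATTRIBUTES = ["standing", "sitting"]
PREDICATES = ["on", "next to"]

def one_hot(value, vocabulary):
    """Return a one-hot list; all zeros if the value is absent or None."""
    return [1.0 if value == item else 0.0 for item in vocabulary]

def node_features(nodes, unary, binary):
    """Return {node_id: feature vector} with relations folded into the nodes.

    nodes:  {node_id: object class}
    unary:  {node_id: attribute}
    binary: list of (subject_id, predicate, object_id)
    """
    features = {}
    for node_id, cls in nodes.items():
        outgoing = [0.0] * len(PREDICATES)  # counts where the node is the subject
        incoming = [0.0] * len(PREDICATES)  # counts where the node is the object
        for subj, pred, obj in binary:
            if subj == node_id:
                outgoing[PREDICATES.index(pred)] += 1.0
            if obj == node_id:
                incoming[PREDICATES.index(pred)] += 1.0
        features[node_id] = (
            one_hot(cls, CLASSES)
            + one_hot(unary.get(node_id), ATTRIBUTES)
            + outgoing
            + incoming
        )
    return features

nodes = {"cup 1": "cup", "cup 2": "cup", "cat": "cat", "table": "table"}
unary = {"cat": "standing"}
binary = [
    ("cat", "on", "table"), ("cup 1", "on", "table"), ("cup 2", "on", "table"),
    ("cat", "next to", "cup 1"), ("cat", "next to", "cup 2"),
]
features = node_features(nodes, unary, binary)
print(features["cat"])
```

The untyped edges would still be passed to the graph model as structure; only the labels that the model cannot consume as edge types are moved into the node vectors.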
(This answer is copied from my response to a very similar (the same?) question asked on the StellarGraph community forum: original, archive.)

Sum a tree, with values in leaf nodes and subtotal in parent, up to root

Is there a readily available solution to sum all leaf nodes of an n-ary tree, and assign the sum to their parent node all the way up to the root?
Let me explain. I have several reports which I receive weekly. Each is a financial document that is essentially an unbalanced hierarchical dataset up to nine levels deep. Values are assigned only at the last level, but each parent holds the sum of its children (i.e. the nodes only one edge away).
It looks something like:
root:
  sectionA:
    sectionB:
      sectionC:
        sectionD:
          subtotal sectionE = sum of x
            leaf-1 = x_1
            leaf-2 = x_2
            leaf-n = x_n
I'm working to automate the validation of this dataset. I need to sum the leaves and determine whether the result matches their parent subtotal, all the way up to the root total.
Also, I have a second table listing all of the leaf elements together with their parent relations. Something like this:
root:sectionA:sectionB:sectionC:sectionD:sectionE:leaf
I'm thinking a k-ary tree could represent the correct report structure generated from table #2. Then use the tree to compare to the weekly reports. I like this direction because the second table has structural data in this form (the full parent-path). Further down the road, when my script is complete, I will need to generalize the solution.
Is there a Python module or generalizable algorithm that addresses this problem?
Here's an example solution, Sum of all elements of N-ary Tree. But this solution assumes each node has a unique value, and edges are just relations.

After this rousing response of encouragement I implemented a solution in NetworkX.
Studying the documentation and other diverse sources, I learned that my specific sum-tree problem (the root 'amount' value is the sum of the 'amount' attribute of each of its children) is not common in the wild, which may be obvious to others.
For anyone who discovers this later, here is the solution in outline. I constructed the graph with Pandas as follows.
1. Create DF of query results.
2. Munge DF (remove unused rows, columns).
3. Basic text normalization on headers and DF columns.
4. Explicitly edit node label exceptions in DF rows.
5. Condition DF for parent, child label discovery.
6. Partition the DF into chunks at increasing level distance from root.
7. Create base graph.
8. Update the base graph: add parent, child edges using #6.
9. Update the graph with node values.
10. Validate each parent node's assigned value against the sum of its child values, recursively, with a little help from my friends.
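The graph-building and validation steps above (7 to 10) can also be sketched without NetworkX, using plain dictionaries. This is not the implementation described in the answer, only an illustration under assumptions: `build_tree` and `validate` are hypothetical helpers, the paths use the colon-separated form shown in the question, and the amounts are invented.

```python
def build_tree(leaf_paths, sep=":"):
    """Build {parent: set(children)} from full paths such as 'root:A:leaf'."""
    children = {}
    for path in leaf_paths:
        parts = path.split(sep)
        # Use the full path prefix as the node key so repeated labels stay distinct.
        for depth in range(1, len(parts)):
            parent = sep.join(parts[:depth])
            child = sep.join(parts[:depth + 1])
            children.setdefault(parent, set()).add(child)
    return children

def validate(node, children, reported, mismatches, tol=0.005):
    """Return the total computed from the leaves below `node`.

    Every parent whose reported value differs from that total by more
    than `tol` is appended to `mismatches`.
    """
    if node not in children:  # leaf: its reported value is taken as given
        return reported[node]
    total = sum(
        validate(child, children, reported, mismatches, tol)
        for child in sorted(children[node])
    )
    if abs(total - reported[node]) > tol:
        mismatches.append((node, reported[node], total))
    return total

# Invented example data: one subtotal (root:B) is deliberately wrong.
paths = ["root:A:x1", "root:A:x2", "root:B:y1"]
reported = {
    "root:A:x1": 10.0, "root:A:x2": 5.0, "root:B:y1": 7.0,
    "root:A": 15.0, "root:B": 8.0, "root": 22.0,
}
mismatches = []
total = validate("root", build_tree(paths), reported, mismatches)
print(total, mismatches)
```

Because totals are recomputed from the leaves, a single wrong subtotal is reported once and does not cascade up to the root.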
Two helpful tips for using NetworkX to create a tree from diverse inputs
Why was G not a tree? Several recommended graph inspection methods (nx.is_connected(G), nx.connected_component_subgraphs(G)) result in this error: NetworkXNotImplemented: not implemented for directed type. (A directed graph was required for step #10). One additional method (list(nx.isolates(G))) didn't produce an error, but always produces an empty list.
The final graph generation solution for this tree used two techniques to ensure the graph was a tree:
1. Build the graph by adding nodes to a common, statically defined base graph.
2. Utilize more of the column data to chunk the input DF into increasing levels from the root.
This last point is not a requirement for creating a tree from this data, because the input data already had a tree structure. However, it was necessary for debugging the node labels in the volume of results. Updating the DF was not necessary for the graph, but proved useful for later debugging and identifying the label outliers.
All of the exception cases were inconsistent names in the source data. Source data changes, so this will continue to be a problem in search of future solutions.

Create a connected graph of common DBpedia entities

My problem is this: say I have four entities: Renoir, Newton, Leibniz and Pissarro. I need to create a connected graph of all entities common to them in the DBpedia ontology.
Example: this is a connected graph between Renoir and Pissarro from DBpedia. The nodes in between are the DBpedia schemas common to both. See image: http://postimg.org/image/6037y9lu1/
We need such a graph between all four: Renoir, Newton, Leibniz and Pissarro.
http://postimg.org/image/vud0o1lu1/
How should this be done?
I'm a novice with DBpedia, R, and anything related. Any help is useful.
My objective is to find transitive connections between entities at a conceptual level.

Have you tried to use relFinder? (http://www.visualdataweb.org/relfinder/relfinder.php) It serves precisely this purpose. I attach the graph I obtained when I introduced the four entities in your example:
As you can see, if you want to find a connection between them at a conceptual level you should aim for the "influencedBy"/"influences" relationship.

Get all disconnected pairs of nodes in a network

I have coded an undirected network using igraph, and I want to obtain the list of pairs of nodes that are not connected in the network.
Looking through igraph's documentation (Python), I haven't found a method that does this. Do I have to do this manually?
A related question: given any pair of nodes in the network, how do I find the list of common neighbors of these two nodes using igraph? Again, there seems to be no such method readily available in igraph.

Re the first question (listing pairs of disconnected nodes): yes, you have to do this manually, but it is fairly easy:
from itertools import product
all_nodes = set(range(g.vcount()))
# One list per connected component: every (inside, outside) pair of vertices.
disconnected_pairs = [list(product(cluster, all_nodes.difference(cluster)))
                      for cluster in g.clusters()]
But beware, this could be a fairly large list if your graph is large and consists of a lot of disconnected components.
Re the second question (listing common neighbors): again, you have to do this manually but it only takes a single set intersection operation in Python:
set(g.neighbors(v1)).intersection(set(g.neighbors(v2)))
If you find that you need to do this for many pairs of nodes, you should probably create the neighbor sets first:
neighbor_sets = [set(neis) for neis in g.get_adjlist()]
Then you can simply write neighbor_sets[i] instead of set(g.neighbors(i)).
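Regarding the first question again: the list comprehension shown earlier yields each pair twice, once per direction, grouped by component. If each unordered pair is wanted only once, a sketch along the following lines may help. It assumes a membership list with one component index per vertex (which is what `g.clusters().membership` should provide); `pairs_in_different_components` is a hypothetical helper name, and a hand-written list stands in for a real graph.

```python
from itertools import combinations

def pairs_in_different_components(membership):
    """Return each unordered pair (i, j), i < j, whose vertices lie in
    different connected components."""
    return [
        (i, j)
        for i, j in combinations(range(len(membership)), 2)
        if membership[i] != membership[j]
    ]

# Stand-in for g.clusters().membership: vertices 0-1, 2-3 and 4 form
# three separate components (invented for illustration).
membership = [0, 0, 1, 1, 2]
pairs = pairs_in_different_components(membership)
print(pairs)
```

The result still grows quadratically with the number of vertices, so the earlier warning about large graphs applies here as well.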

Using Python/Pexpect to crawl a network

This is more of a logical-thinking issue than a coding one.
I already have some working code blocks: one which telnets to a device, one which parses the results of a command, one which populates a dictionary, and so on.
Now let's say I want to analyse a network with unknown nodes a, b, c, etc., of which I only know one.
I give my code block node a. The result is a table that includes b and c, which I save in a dictionary.
I then want to use the first entry (b) as a target and see what it can see, possibly d, e, etc., and add those (if any) to the dict.
Then I want to do the same for the next node in this newly populated dictionary. The final output would be that every node has been visited exactly once, and all devices seen are recorded in this (or another) dictionary.
However, I can't figure out how to keep re-reading the dict as it grows, and I can't figure out how to avoid looking at a device more than once.
I understand this is clearer to me than I have explained it; apologies if it's confusing.

You are looking at graph algorithms, specifically DFS or BFS. Are you asking specifically about implementation details, or more generally about the algorithms?
Recursion would be a very neat way of doing this.
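# Maps each discovered node to its recorded info; add the start node
# before the first call so that it is not visited a second time.
seen = {}

def DFS(node):
    # Recurse into every neighbour that has not been recorded yet.
    for neighbour in node.neighbours():
        if neighbour not in seen:
            # some_info_about_neighbour is a placeholder for the parsed data.
            seen[neighbour] = some_info_about_neighbour
            DFS(neighbour)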
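The question of how to keep reading a collection while it grows is usually handled with a queue next to the dictionary: new devices are appended to the queue, and the dictionary is only used to check what has already been seen. Below is a breadth-first sketch of that idea. The `discover` callable is a hypothetical stand-in for the existing telnet-and-parse code, and the fake network exists only to exercise the function.

```python
from collections import deque

def crawl(start, discover):
    """Visit every reachable device exactly once, breadth-first.

    `discover(node)` is a hypothetical callable standing in for the
    telnet-and-parse code; it returns the neighbours that `node` reports.
    """
    seen = {start: None}      # device -> device that first reported it
    queue = deque([start])    # devices discovered but not yet queried
    while queue:
        node = queue.popleft()
        for neighbour in discover(node):
            if neighbour not in seen:
                seen[neighbour] = node
                queue.append(neighbour)
    return seen

# Fake neighbour tables used only for illustration.
fake_tables = {"a": ["b", "c"], "b": ["a", "d", "e"], "c": ["a"], "d": [], "e": ["c"]}
queried = []  # records the order in which devices are contacted

def fake_discover(node):
    queried.append(node)
    return fake_tables[node]

result = crawl("a", fake_discover)
print(result)
```

Unlike the recursive version, this form does not hit Python's recursion limit on long chains of devices, and marking a device as seen at the moment it is queued guarantees it is contacted only once.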
