This is more a logical-thinking issue than a coding one.
I already have some working code blocks: one telnets to a device, one parses the results of a command, one populates a dictionary, and so on.
Now let's say I want to analyse a network with unknown nodes a, b, c, etc., but I only know about one of them.
I give my code block node a. The results are a table including b and c. I save that in a dictionary.
I then want to use that first entry (b) as a target and see what it can see, possibly d, e, etc., and add those (if any) to the dict.
And then do the same on the next node in this newly populated dictionary. The final output would be that all nodes have been visited exactly once, and all devices seen are recorded in this (or another) dictionary.
However, I can't figure out how to keep re-reading the dict as it grows, and I can't figure out how to avoid looking at a device more than once.
I understand this more clearly than I have explained it; apologies if it's confusing.
You are looking at graph algorithms, specifically DFS or BFS. Are you asking specifically about implementation details, or more generally about the algorithms?
Recursion would be a very neat way of doing this.
seen = {}

def DFS(node):
    for neighbour in node.neighbours():
        if neighbour not in seen:
            seen[neighbour] = some_info_about_neighbour  # record whatever you parsed for it
            DFS(neighbour)  # recurse only into newly discovered nodes
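If recursion depth is a worry (Python's default limit is around 1000 frames), the same idea works iteratively with an explicit queue, which is breadth-first search. A minimal, self-contained sketch, where the neighbours callable and the topology dict are stand-ins for your telnet-and-parse step:

```python
from collections import deque

def discover(start, neighbours):
    """Breadth-first discovery: visit every reachable node exactly once.

    `neighbours` is a callable returning the nodes visible from a node
    (in your case, the telnet + parse step).
    """
    seen = {start: None}       # node -> info; doubles as the "visited" set
    queue = deque([start])     # nodes discovered but not yet probed
    while queue:
        node = queue.popleft()
        for nb in neighbours(node):
            if nb not in seen:      # record and enqueue each node only once
                seen[nb] = node     # e.g. remember which node first reported it
                queue.append(nb)
    return seen

# tiny fake topology standing in for real devices
topology = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"]}
result = discover("a", lambda n: topology.get(n, []))
print(sorted(result))  # ['a', 'b', 'c', 'd'] -- every device, visited once
```

The dict serves both purposes at once: it is the growing record of everything seen, and membership in it is the "have I visited this already?" check.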
I have multiple node and edge lists which together form a large graph; let's call it the maingraph. My current strategy is to first read all the node lists and import them with add_vertices. Every node then gets an internal ID which depends on the order of ingestion and therefore isn't very reliable (as I've read it, if you delete one node, all IDs higher than the deleted one change). I assign every node a name attribute, which corresponds to the external ID I use to keep track of my nodes between frameworks, and a type attribute.
Now, how do I add the edges? When I read an edge list, it will start building a new graph (subgraph) and hence starts the internal IDs at 0 again. Therefore, "merging" the graphs with maingraph.add_edges(subgraph.get_edgelist()) inevitably fails.
It is possible to work around this and use the name attribute from both maingraph and subgraph to find out which internal ID each edge's incident nodes have in the maingraph:
def _get_real_source_and_target_id(edge):
    '''Take an edge from the to-be-added subgraph and return the IDs of the
    corresponding nodes in the maingraph, looked up by their name.'''
    source_id = maingraph.vs.select(name_eq=subgraph.vs[edge[0]]["name"])[0].index
    target_id = maingraph.vs.select(name_eq=subgraph.vs[edge[1]]["name"])[0].index
    return (source_id, target_id)
And then I tried
edgelist = [_get_real_source_and_target_id(x) for x in subgraph.get_edgelist()]
maingraph.add_edges(edgelist)
But that is horribly slow. The graph has millions of nodes and edges; it takes 10 seconds to load with the fast but incorrect maingraph.add_edges(subgraph.get_edgelist()) approach, while the correct approach explained above takes minutes (I usually stop it after 5 minutes or so). I will have to do this tens of thousands of times. I switched from NetworkX to igraph because of the fast loading, but that doesn't really help if I have to do it like this.
Does anybody have a cleverer way to do this? Any help much appreciated!
Thanks!
Never mind, I figured out that the mistake was elsewhere. I used numpy.loadtxt() to read the nodes' names as strings, which somehow did funny stuff when the names were incrementing numbers with more than five figures (see my issue report here). The solution above therefore got stuck when it tried to look up a node whose name numpy had mangled: maingraph.vs.select(name_eq=subgraph.vs[edge[0]]["name"])[0].index simply sat there when it couldn't find the node. Now I use pandas to read the node names and it works fine.
The solution above is still ~10x faster than my previous NetworkX solution, so I will leave it here in case it helps someone. Feel free to delete it otherwise.
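As a side note for anyone with the same problem: the per-edge vs.select() lookup can be replaced by a dictionary built once from the name attributes, which makes each lookup O(1) instead of a linear scan. A sketch of the idea with plain Python lists standing in for the igraph attributes (the names here are made up; in igraph they would come from maingraph.vs["name"], subgraph.vs["name"] and subgraph.get_edgelist()):

```python
# vertex names of the maingraph, in internal-ID order
# (in igraph: maingraph.vs["name"])
main_names = ["n100001", "n100002", "n100003", "n100004"]

# vertex names and internal edge list of the subgraph to be merged
# (in igraph: subgraph.vs["name"] and subgraph.get_edgelist())
sub_names = ["n100003", "n100001"]
sub_edges = [(0, 1)]

# build the name -> maingraph-ID mapping once: O(number of vertices)
name_to_id = {name: idx for idx, name in enumerate(main_names)}

# translate each subgraph edge with two O(1) dict lookups
edgelist = [(name_to_id[sub_names[s]], name_to_id[sub_names[t]])
            for s, t in sub_edges]
print(edgelist)  # [(2, 0)] -- ready to pass to maingraph.add_edges(edgelist)
```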
I'm designing an algorithm in Python and trying to maintain an invariant, but I don't know whether that is even possible. It's part of an MST algorithm.
I have some "wanted nodes". They are wanted by one or more clusters, which are implemented as lists of nodes. If I get a node that is wanted by exactly one cluster, it gets placed into that cluster. However, if more than one cluster wants it, all those clusters get merged and the node is then placed into the resulting cluster.
My goal
I am trying to get the biggest cluster of the list of "wanting clusters" in constant time, as if I had a max-heap and I could use the updated size of each cluster as the key.
What I am doing so far
The structure I am using right now is a dict where the keys are nodes and the values are lists of the clusters that want the node at that key. This way, if I get a node, I can check in constant time whether some cluster wants it, and if there are any, I loop through the list of clusters to find the biggest. Once I finish the loop, I merge the clusters by updating the information in all the smaller clusters. This way I get a total merging time of O(n log n) instead of O(n²).
Question
I was wondering if I could use something like a heap as the value stored in my dict, but I don't know how that heap would be kept updated with the current size of each cluster. Is it possible to do something like that using references, and possibly another dict storing the size of each cluster?
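For reference, the merge-smaller-into-larger strategy described above (union by size) can be sketched as follows; the data layout here is illustrative, not the actual code:

```python
def merge_clusters(clusters, cluster_of):
    """Merge a list of clusters into the biggest one, relabelling only the
    members of the smaller clusters. This is what keeps total merging cost
    at O(n log n): each node changes cluster at most O(log n) times, since
    its new cluster is always at least twice the size of its old one."""
    biggest = max(clusters, key=len)   # O(k) scan over the wanting clusters
    for cluster in clusters:
        if cluster is biggest:
            continue
        for node in cluster:           # update only the smaller clusters
            cluster_of[node] = biggest
            biggest.append(node)
        cluster.clear()
    return biggest

# three clusters and a node -> cluster lookup table
a, b, c = [1, 2, 3], [4], [5, 6]
cluster_of = {n: cl for cl in (a, b, c) for n in cl}
merged = merge_clusters([a, b, c], cluster_of)
print(sorted(merged))  # [1, 2, 3, 4, 5, 6]
```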
Very briefly, two or three basic questions about the minimize_nested_blockmodel_dl function in the graph-tool library. Is there a way to figure out which vertex falls into which block? In other words, to extract from each block a list containing the labels of its vertices.
The hierarchical visualization is rather difficult for amateurs in network theory to understand. For example, are the squares with directed edges meant to indicate the main direction of the underlying edges between the two blocks under consideration? The blocks are nicely shown using different colors, but on a conceptual level, which types of patterns or edge/vertex properties are behind the block categorization of vertices? In other words, when two vertices are in the same block, what can I say about their common properties?
Regarding your first question, it is fairly straightforward: the minimize_nested_blockmodel_dl() function returns a NestedBlockState object:
from graph_tool.all import *

g = collection.data["football"]
state = minimize_nested_blockmodel_dl(g)
you can query the group membership of the nodes by inspecting the first level of the hierarchy:
lstate = state.levels[0]
This is a BlockState object, from which we get the group memberships via the get_blocks() method:
b = lstate.get_blocks()
print(b[30]) # prints the group membership of node 30
Regarding your second question, the stochastic block model assumes that nodes that belong to the same group have the same probability of connecting to the rest of the network. Hence, nodes that get classified in the same group by the function above have similar connectivity patterns. For example, if we look at the fit for the football network:
state.draw(output="football.png")
We see that nodes that belong to the same group tend to have more connections to other nodes of the same group --- a typical example of community structure. However, this is just one of the many possibilities that can be uncovered by the stochastic block model. Other topological patterns include core-periphery organization, bipartiteness, etc.
I have coded a network using igraph (undirected), and I want to obtain the list of pairs of nodes that are not connected in the network.
Looking through igraph's documentation (Python), I haven't found a method that does this. Do I have to do it manually?
A related question: given any pair of nodes in the network, how do I find the list of common neighbors of these two nodes using igraph? Again, there seems to be no such method readily available in igraph.
Re the first question (listing pairs of disconnected nodes): yes, you have to do this manually, but it is fairly easy:
from itertools import product

all_nodes = set(range(g.vcount()))
disconnected_pairs = [list(product(cluster, all_nodes.difference(cluster)))
                      for cluster in g.clusters()]
But beware, this could be a fairly large list if your graph is large and consists of a lot of disconnected components.
Re the second question (listing common neighbors): again, you have to do this manually but it only takes a single set intersection operation in Python:
set(g.neighbors(v1)).intersection(set(g.neighbors(v2)))
If you find that you need to do this for many pairs of nodes, you should probably create the neighbor sets first:
neighbor_sets = [set(neis) for neis in g.get_adjlist()]
Then you can simply write neighbor_sets[i] instead of set(g.neighbors(i)).
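The precompute-then-intersect pattern is independent of igraph; here it is on a plain adjacency list standing in for what g.get_adjlist() returns:

```python
# adjacency list for a small undirected graph (index = node ID),
# standing in for the result of igraph's g.get_adjlist()
adjlist = [
    [1, 2],     # node 0 is connected to 1 and 2
    [0, 2, 3],  # node 1 is connected to 0, 2 and 3
    [0, 1],     # node 2
    [1],        # node 3
]

# precompute the neighbor sets once...
neighbor_sets = [set(neis) for neis in adjlist]

# ...then each common-neighbor query is a single set intersection
common = neighbor_sets[0] & neighbor_sets[1]
print(common)  # {2}: the only node adjacent to both 0 and 1
```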
If I have an rdflib.URIRef that points to a resource that I do not need any more, how can I remove it safely using rdflib?
If, for example, I just remove all the triples that refer to it, I may break something like a BNode that is part of a list.
If I have an rdflib.URIRef that points to a resource that I do not need any more, how can I remove it safely using rdflib?
An RDF graph is just a collection of triples. It doesn't contain any resources or nodes independent of those triples.
If, for example, I just remove all the triples that refer to it, I may break something like a BNode that is part of a list.
Removing all the triples that use a URI resource is the correct way to "remove it from the graph". There's no way for this to "break" the graph. Whether it invalidates any structure in the graph is another question, but one that you'd have to answer based on the structure that you're putting in the graph. You'd need to check in advance whether the resource appears in any triples that shouldn't be removed.
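In rdflib, the removal itself is Graph.remove() with None acting as a wildcard: g.remove((uri, None, None)), g.remove((None, uri, None)) and g.remove((None, None, uri)) together delete every triple mentioning the resource. The inspect-before-removing step can be sketched with a plain set of triples standing in for the rdflib Graph (the example triples are made up):

```python
# a toy graph as a set of (subject, predicate, object) triples,
# standing in for an rdflib.Graph
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:age", "42"),
    ("ex:carol", "ex:knows", "ex:alice"),
}

def mentions(triple, resource):
    """True if the resource occurs in any position of the triple."""
    return resource in triple

# 1) inspect first: which triples would the removal delete?
doomed = {t for t in triples if mentions(t, "ex:bob")}

# 2) if none of them carries structure you need to keep, remove them
#    (rdflib equivalent: the three wildcard g.remove() calls above)
triples -= doomed

print(sorted(triples))  # only the carol-knows-alice triple survives
```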