Is there a way to cluster networks generated from netwokx? - python

I'm trying to simulate an environment with minimum 300 nodes and random edges connecting the nodes. The network is generated using networkx in python. I want to divide this network into n clusters so that I run algorithms (like travelling salesman or tabu) in each cluster. But I can't find a good resource to do clustering / grouping.
I can successfully generate the graphs and I have previously worked on k-mean clustering, but bridging both has been difficult.
The data type generated by networkx is multidigraph How do I convert this to data type which I can run grouping/clustering algorithm on? (like matrix?if that's possible)
Or am I approaching this in the wrong way?
Any help would be really appreciated.

Related

"Transitivity works on simple graphs only" InternalError in igraph

I'm analysing a big graph - 30M nodes and 350M+ edges - using the python interface of igraph. I can load the edges without any issue but executing a function like transitivity_local_undirected to compute the clustering coefficient of each node returns the error "Transitivity works on simple graphs only, Invalid value".
I can't find anything online - any help would be much appreciated, thanks!
A simple graph is a graph with no loops or multiple edges -- it sounds like the computer thinks your graph is non-simple for some reason.
Are you sure your nodes have no loops or multiple edges between them?

MapReduce: Finding the triangles in a network graph using Mrjob

I've an application where I have a graph and I need to count the number of triangles in the graph using MrJob (MapReduce in Python). However, I'm having some trouble wrapping my head around the mapping and the reducing steps needed.
What is the best Map Reduce pipeline for computing the triangles of a network graph?
Well, it would help to answer this to have a bit more context. Do you have a single graph or a large number of graphs, a tree? How many nodes are we talking about in your graph?
But in general, I would try to build a solution that uses the networkx package, specifically the triangles method at the core.
An issue you may face is filtering duplicates, as the triangles are reported relative to a node.
So a bit more context here on the specifics here would help narrow down the answer.

Graphing a very large graph in networkx

I am attempting to draw a very large networkx graph that has approximately 5000 nodes and 100000 edges. It represents the road network of a large city. I cannot determine if the computer is hanging or if it simply just takes forever. The line of code that it seems to be hanging on is the following:
##a is my network
pos = networkx.spring_layout(a)
Is there perhaps a better method for plotting such a large network?
Here is the good news. Yes it wasn't broken, It was working and you wouldn't want to wait for it even if you could.
Check out my answer to this question to see what your end result would look like.
Drawing massive networkx graph: Array too big
I think the spring layout is an n^3 algorithm which would take 125,000,000,000 calculations to get the positions for your graph. The best thing for you is to choose a different layout type or plot the positions yourself.
So another alternative is pulling out the relevant points yourself using a tool called gephi.
As Aric said, if you know the locations, that's probably the best option.
If instead you just know distances, but don't have locations to plug in, there's some calculation you can do that will reproduce locations pretty well (up to a rotation). If you do a principal component analysis of the distances and project into 2 dimensions, it will probably do a very good job estimating the geographic locations. (It was an example I saw in a linear algebra class once)

Connected components in a graph with 100 million nodes

I am trying to get the list of connected components in a graph with 100 million nodes. For smaller graphs, I usually use the connected_components function of the Networkx module in Python which does exactly that. However, loading a graph with 100 million nodes (and their edges) into memory with this module would require ca. 110GB of memory, which I don't have. An alternative would be to use a graph database which has a connected components function but I haven't found any in Python. It would seem that Dex (API: Java, .NET, C++) has this functionality but I'm not 100% sure. Ideally I'm looking for a solution in Python. Many thanks.
SciPy has a connected components algorithm. It expects as input the adjacency matrix of your graph in one of its sparse matrix formats and handles both the directed and undirected cases.
Building a sparse adjacency matrix from a sequence of (i, j) pairs adj_list where i and j are (zero-based) indices of nodes can be done with
i_indices, j_indices = zip(*adj_list)
adj_matrix = scipy.sparse.coo_matrix((np.ones(number_of_nodes),
(i_indices, j_indices)))
You'll have to do some extra work for the undirected case.
This approach should be efficient if your graph is sparse enough.
https://graph-tool.skewed.de/performance
this tool as you can see from performance is very fast. It's written in C++ but the interface is in Python.
If this tool isn't good enough for you. (Which I think it will) then you can try Apache Giraph (http://giraph.apache.org/).

What algorithms can I use to make inferences from a graph?

Edited question to make it a bit more specific.
Not trying to base it on content of nodes but solely of structure of directed graph.
For example, pagerank(at first) solely used the link structure(directed graph) to make inferences on what was more relevant. I'm not totally sure, but I think Elo(chess ranking) does something simlair to rank players(although it adds scores also).
I'm using python's networkx package but right now I just want to understand any algorithms that accomplish this.
Thanks!
Eigenvector centrality is a network metric that can be used to model the probability that a node will be encountered in a random walk. It factors in not only the number of edges that a node has but also the number of edges the nodes it connects to have and onward with the edges that the nodes connected to its connected nodes have and so on. It can be implemented with a random walk which is how Google's PageRank algorithm works.
That said, the field of network analysis is broad and continues to develop with new and interesting research. The way you ask the question implies that you might have a different impression. Perhaps start by looking over the three links I included here and see if that gets you started and then follow up with more specific questions.
You should probably take a look at Markov Random Fields and Conditional Random Fields. Perhaps the closest thing similar to what you're describing is a Bayesian Network

Categories

Resources