I'm analysing a big graph - 30M nodes and 350M+ edges - using the Python interface of igraph. I can load the edges without any issue, but executing a function like transitivity_local_undirected to compute the clustering coefficient of each node returns the error "Transitivity works on simple graphs only, Invalid value".
I can't find anything online - any help would be much appreciated, thanks!
A simple graph is a graph with no self-loops or multiple edges -- it sounds like igraph considers your graph non-simple for some reason.
Are you sure your nodes have no loops or multiple edges between them?
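One way to check this is to scan the edge list for self-loops and repeated edges before building the graph. The sketch below is pure Python and assumes the edges are available as (u, v) pairs; the function name and the toy edge list are made up for illustration. For a graph with 350M edges you would probably run it on a sample, or use igraph's own Graph.is_simple() to test and Graph.simplify() to drop loops and multi-edges.

```python
from collections import Counter

def find_non_simple_edges(edges):
    """Return (self_loops, duplicates) for an undirected edge list."""
    # A self-loop is an edge whose two endpoints are the same node.
    self_loops = [(u, v) for u, v in edges if u == v]
    # Treat (u, v) and (v, u) as the same undirected edge.
    counts = Counter((min(u, v), max(u, v)) for u, v in edges if u != v)
    duplicates = {edge: n for edge, n in counts.items() if n > 1}
    return self_loops, duplicates

# Hypothetical toy edge list: one self-loop and one repeated edge.
edges = [(0, 1), (1, 0), (1, 2), (2, 2), (2, 3)]
self_loops, duplicates = find_non_simple_edges(edges)
```

If either result is non-empty, the graph is not simple and the transitivity functions will reject it.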
Related
I'm trying to simulate an environment with at least 300 nodes and random edges connecting the nodes. The network is generated using networkx in Python. I want to divide this network into n clusters so that I can run algorithms (like travelling salesman or tabu search) in each cluster. But I can't find a good resource on how to do the clustering / grouping.
I can successfully generate the graphs and I have previously worked with k-means clustering, but bridging the two has been difficult.
The data type generated by networkx is MultiDiGraph. How do I convert this to a data type that I can run a grouping/clustering algorithm on (like a matrix, if that's possible)?
Or am I approaching this in the wrong way?
Any help would be really appreciated.
I've an application where I have a graph and I need to count the number of triangles in the graph using MrJob (MapReduce in Python). However, I'm having some trouble wrapping my head around the mapping and the reducing steps needed.
What is the best Map Reduce pipeline for computing the triangles of a network graph?
Well, it would help to have a bit more context before answering. Do you have a single graph or a large number of graphs? Is it a tree? How many nodes are we talking about?
But in general, I would try to build a solution that uses the networkx package, specifically the triangles function at the core.
An issue you may face is filtering duplicates, as the triangles are reported relative to a node, so each triangle shows up once for each of its three corners.
So a bit more context on the specifics would help narrow down the answer.
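To illustrate the duplicate issue, here is a minimal pure-Python sketch that counts triangles per node, in the same spirit as the dict that networkx.triangles returns, and then divides the sum by three to get the number of distinct triangles. The function name and the toy edge list are assumptions for illustration; this is not a MapReduce pipeline, only the counting logic that one would have to reproduce.

```python
from collections import defaultdict
from itertools import combinations

def triangles_per_node(edges):
    """Count, for every node, the triangles that node takes part in."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for node, nbrs in adj.items():
        # A triangle exists for each pair of neighbours that are themselves adjacent.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts

# Hypothetical toy graph: one triangle (1, 2, 3) plus a dangling edge to node 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
per_node = triangles_per_node(edges)
# Every triangle is counted at each of its three corners, so divide by 3.
total_triangles = sum(per_node.values()) // 3
```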
I am attempting to draw a very large networkx graph that has approximately 5000 nodes and 100000 edges. It represents the road network of a large city. I cannot determine if the computer is hanging or if it simply just takes forever. The line of code that it seems to be hanging on is the following:
##a is my network
pos = networkx.spring_layout(a)
Is there perhaps a better method for plotting such a large network?
Here is the good news: it wasn't broken. It was working, but you wouldn't want to wait for it to finish even if you could.
Check out my answer to the question "Drawing massive networkx graph: Array too big" to see what your end result would look like.
I think the spring layout is an n^3 algorithm, which for 5000 nodes would take on the order of 5000^3 = 125,000,000,000 calculations to get the positions for your graph. The best thing for you is to choose a different layout type or plot the positions yourself.
Another alternative is pulling out the relevant points yourself using a tool called Gephi.
As Aric said, if you know the locations, that's probably the best option.
If instead you just know distances, but don't have locations to plug in, there's a calculation you can do that will reproduce the locations pretty well (up to a rotation or reflection). If you do a principal component analysis of the distances and project into 2 dimensions, it will probably do a very good job of estimating the geographic locations. (It was an example I saw in a linear algebra class once.)
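The calculation described above is usually called classical multidimensional scaling. A minimal NumPy sketch follows, assuming you have a full symmetric matrix of pairwise distances; the function name and the rectangle test data are made up for illustration, and real road distances are not exactly Euclidean, so the result would only be an approximation.

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Estimate coordinates from a symmetric matrix of pairwise distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the directions with the largest eigenvalues.
    top = np.argsort(eigvals)[::-1][:dims]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Hypothetical check: the four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist)
# The recovered layout should reproduce the original pairwise distances.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The rows of coords can then be turned into a node-to-position dict and passed as pos to the networkx drawing functions instead of calling spring_layout.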
I have a graph that contains edges that must be visited, as well as edges that are optional. The edges have varying weights and can be traveled in either direction and as many times as required. I am trying to determine the route that minimises the total weight.
As I understand it, the Chinese Postman Problem deals with graphs where every edge must be visited at least once. Can anyone tell me if the variant described above has a 'name', or point me in the direction of algorithms that might deal with solving this type of graph?
I am attempting to program a solution in Python so any solutions that use that would be great, otherwise I'm sure I will be able to work through a solution.
I am trying to get the list of connected components in a graph with 100 million nodes. For smaller graphs, I usually use the connected_components function of the Networkx module in Python which does exactly that. However, loading a graph with 100 million nodes (and their edges) into memory with this module would require ca. 110GB of memory, which I don't have. An alternative would be to use a graph database which has a connected components function but I haven't found any in Python. It would seem that Dex (API: Java, .NET, C++) has this functionality but I'm not 100% sure. Ideally I'm looking for a solution in Python. Many thanks.
SciPy has a connected components algorithm. It expects as input the adjacency matrix of your graph in one of its sparse matrix formats and handles both the directed and undirected cases.
Building a sparse adjacency matrix from a sequence of (i, j) pairs adj_list where i and j are (zero-based) indices of nodes can be done with
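import numpy as np
import scipy.sparse
i_indices, j_indices = zip(*adj_list)
# One data value per edge; shape pins the matrix to number_of_nodes x number_of_nodes.
adj_matrix = scipy.sparse.coo_matrix((np.ones(len(i_indices)), (i_indices, j_indices)),
                                     shape=(number_of_nodes, number_of_nodes))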
You'll have to do some extra work for the undirected case.
This approach should be efficient if your graph is sparse enough.
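For completeness, here is a small end-to-end sketch of the SciPy route using scipy.sparse.csgraph.connected_components. The toy adj_list and number_of_nodes values are made up for illustration. Passing directed=False lets SciPy follow each stored (i, j) entry in both directions, which is one way to cover the undirected case without adding the reversed edges yourself.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical toy input: 6 nodes, edges given as zero-based (i, j) pairs.
adj_list = [(0, 1), (1, 2), (3, 4)]
number_of_nodes = 6

i_indices, j_indices = zip(*adj_list)
# One data value per edge; the explicit shape keeps isolated nodes in the matrix.
adj_matrix = coo_matrix(
    (np.ones(len(adj_list)), (i_indices, j_indices)),
    shape=(number_of_nodes, number_of_nodes),
)

# directed=False treats each stored (i, j) entry as an undirected edge.
n_components, labels = connected_components(adj_matrix, directed=False)
```

Here labels[k] is the component index of node k, so nodes that share a label belong to the same component.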
https://graph-tool.skewed.de/performance
As you can see from the performance comparison, this tool is very fast. It's written in C++, but the interface is in Python.
If this tool isn't good enough for you (though I think it will be), then you can try Apache Giraph (http://giraph.apache.org/).