I'd like to know the best way to read a disconnected undirected graph using igraph for Python. For instance, take the simple graph in which 0 is linked to 1 and 2 is a node not connected to any other. I couldn't get igraph to read it from an edge list format (Graph.Read_Edgelist(...)), because every line must be an edge, so the following is not allowed:
0 1
2
I've been wondering whether an adjacency matrix is my only/best option in this case (I could get it to work with this representation). I'd rather use a format in which I can understand the data by looking at it (something really hard with a matrix format).
Thanks in advance!
There's the LGL format, which allows isolated vertices (see Graph.Read_Lgl). The format looks like this:
# nodeID
nodeID2
nodeID3
# nodeID2
nodeID4
nodeID5
nodeID
# isolatedNode
# nodeID5
I think you get the basic idea; lines starting with a hash mark indicate that a new node is being defined. After this, the lines specify the neighbors of the node that has just been defined. If you need an isolated node, you just specify the node ID prepended by a hash mark in the line, then continue with the next node.
More information about the LGL format is to be found here.
Another fairly readable format that you might want to examine is the GML format which igraph also supports.
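To make the format concrete, here is a minimal sketch in plain Python that reads LGL-style text into a vertex list and an edge list. It is not igraph's parser: the function name parse_lgl is made up for illustration, and any optional weight column is simply dropped.

```python
def parse_lgl(text):
    """Return (vertices, edges) from LGL-style text.

    Hypothetical helper for illustration only; not part of igraph.
    A line starting with '#' defines a node; the lines that follow
    name its neighbors until the next '#' line.
    """
    vertices = []   # vertex names in order of first appearance
    edges = []      # (node, neighbor) pairs of names
    current = None

    def remember(name):
        if name not in vertices:
            vertices.append(name)

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:].strip()
            remember(current)
        else:
            if current is None:
                raise ValueError("neighbor line before any '# node' line")
            neighbor = line.split()[0]   # drop an optional weight column
            remember(neighbor)
            edges.append((current, neighbor))
    return vertices, edges


# The graph from the question: 0 linked to 1, and 2 isolated.
vertices, edges = parse_lgl("# 0\n1\n# 2\n")
print(vertices)
print(edges)
```

With python-igraph installed, saving those three lines to a file and loading it with Graph.Read_Lgl should give the same three vertices and single edge.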
Context
This is the first time I have had to work with NetworkX, so either I can't read the documentation correctly or I'm simply not using the right vocabulary.
Problem
I am working with a DiGraph, and I want to get a list of every node reachable from a specified node.
I thought of making a subgraph containing the nodes I just described, so that I would simply have to iterate over that specific subgraph. Unfortunately, I didn't find a way to automatically create a subgraph with the condition I mentioned.
It feels like an obvious feature. What am I missing?
You are looking for the nx.descendants method:
descendants(G, source)
Return all nodes reachable from source in G.
Parameters:
G : NetworkX DiGraph
source : node in G
Returns:
des : set()
The descendants of source in G
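To illustrate what that call returns, here is a small sketch in plain Python that computes the same kind of set with a breadth-first search over a successor dictionary. The function name reachable_from and the dict-of-lists graph are assumptions for the example, not NetworkX API.

```python
from collections import deque


def reachable_from(successors, source):
    """Return the set of nodes reachable from source, excluding source itself.

    successors is a hypothetical plain dict: node -> iterable of successor nodes.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen


graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
print(sorted(reachable_from(graph, "a")))
```

In NetworkX itself, G.subgraph(nx.descendants(G, source) | {source}) gives the subgraph the question asks for; the source has to be added explicitly because descendants does not include it.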
I have a large igraph object that has several edge and vertex attributes that I need to write to a file and load again later (probably from a different program, for example in Python).
> g
IGRAPH DN-- 85000 1000000 --
+ attr: name (v/c), numeric_var (e/n), binary_outcome1 (e/x), binary_outcome2 (e/x)
So what format should I use to be able to write all the edge attributes to the file?
write.graph(g, file = "test1.fileextension",format = "which_format?")
Thanks very much!
The pros & cons of the various supported formats are documented pretty well in the R igraph read.graph help file: http://igraph.sourceforge.net/doc/R/read.graph.html. The write.graph page shows support for more types of output:
Edge List is too simple for your needs
Pajek may be too domain-specific and has some similar limitations to GraphML
Dot might be able to do what you need (ref: http://www.graphviz.org/Documentation/dotguide.pdf)
GraphML won't deal with hypergraphs, nested graphs or mixed (directed/undirected) graphs.
GML says that "only node and edge attributes are used, and only if they have a simple type: integer, real or string. So if an attribute is an array or a record, then it is ignored. This is also true if only some values of the attribute are complex."
DL is probably not going to work for you.
NCOL is "simply a symbolic weighted edge list", so it's probably out, too.
LGL is also probably too simple to work.
DIMACS doesn't have the extra info you need.
LEDA (I believe) only supports single attributes.
GraphDB also has limitations.
So, I'd give either GraphML or GML a go.
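Since the file is meant to be loaded from Python later, here is a hedged sketch of pulling edge attributes back out of a GraphML document using only the standard library. The sample document is hand-written for illustration (the attribute name numeric_var is borrowed from the question; the key id and the value are made up), so a file written by igraph may differ in its details.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample, not actual igraph output.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="e_num" for="edge" attr.name="numeric_var" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge source="n0" target="n1">
      <data key="e_num">0.5</data>
    </edge>
  </graph>
</graphml>
"""


def read_edge_attributes(xml_text):
    """Return a list of (source, target, {attr_name: text_value}) tuples."""
    root = ET.fromstring(xml_text)
    # Map key ids to readable attribute names, for edge keys only.
    names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", NS)
        if key.get("for") == "edge"
    }
    edges = []
    for edge in root.findall("g:graph/g:edge", NS):
        attrs = {
            names.get(data.get("key"), data.get("key")): data.text
            for data in edge.findall("g:data", NS)
        }
        edges.append((edge.get("source"), edge.get("target"), attrs))
    return edges


print(read_edge_attributes(SAMPLE))
```

The values come back as text; converting them according to each key's attr.type is left out of the sketch.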
I have a graph and want to isolate distinct paths from it. Since I can't phrase this easily in graph jargon, here is a sketch:
On the left side is a highly simplified representation of my source graph. The graph has nodes with only 2 neighbors (shown as blue squares). Then it has intersection and end nodes with more than 2 neighbors or exactly 1 neighbor (shown as red dots).
On the right side, coded in three different colors, are the paths I want to isolate as a result. I want to isolate all paths connecting the red dots. The resulting paths must not cross (go through) any red dots. Each edge may be part of only one result path. No edges should remain (the shortest possible path has length 1).
I am sure this is a known task in the graph world. I have modeled the graph in NetworkX, which I'm using for the first time, and can't find the proper methods to do this. I'm sure I could code it the hard way, but would be glad to use a simple and fast method if it existed. Thanks!
Edit: After randomly browsing the NetworkX documentation I came across the all_simple_paths method. My idea is now to
iterate all nodes and identify the red dots (number of neighbors != 2)
use all_simple_paths() pairwise for the red dots, collect the resulting paths
deduplicate the resulting paths, throw away everything that contains a red dot except as the start and end node
Step 2, of course, won't scale well. With ~2000 intersection nodes, this seems still possible though.
Edit 2: all_simple_paths appears to be way too slow to use it this way.
I propose to find all straight nodes (i.e. nodes which have exactly two neighbors) and, from the list of those, build up a list of all your straight paths by picking one straight node at random and following its two leads to their two ends (the first non-straight nodes).
In code:
import networkx as nx

def eachStraightPath(g):
    # Straight nodes are the nodes with exactly two neighbors.
    straightNodes = {node for node in g if len(g[node]) == 2}
    while straightNodes:
        straightNode = straightNodes.pop()
        straightPath = [straightNode]
        neighborA, neighborB = g[straightNode]
        while True:  # extend at the front until a non-straight node is reached
            straightPath.insert(0, neighborA)
            if neighborA not in straightNodes:
                break
            # The next node is the neighbor we did not just come from.
            newNeighborA = (set(g[neighborA]) ^ {straightPath[1]}).pop()
            straightNodes.remove(neighborA)
            neighborA = newNeighborA
        while True:  # extend at the back until a non-straight node is reached
            straightPath.append(neighborB)
            if neighborB not in straightNodes:
                break
            newNeighborB = (set(g[neighborB]) ^ {straightPath[-2]}).pop()
            straightNodes.remove(neighborB)
            neighborB = newNeighborB
        yield straightPath

g = nx.lollipop_graph(5, 7)
for straightPath in eachStraightPath(g):
    print(straightPath)
If your graph is very large and you do not want to hold a set of all straight nodes in memory, then you can iterate through them instead, but then the check whether the next neighbor is straight will become less readable (though maybe even faster). The real problem with that approach would be that you'd have to introduce a check to prevent straight paths from being yielded more than once.
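As a complement to the generator above, here is a sketch of the reverse strategy: start at each non-straight node and follow every incident edge until the next non-straight node is reached. This variant also yields direct edges between two non-straight nodes (paths of length 1), which the question asks for. The function name junction_paths and the plain dict-of-sets adjacency are assumptions for the example; it assumes a simple undirected graph without self-loops and skips cycles that contain no non-straight node.

```python
def junction_paths(adj):
    """Yield each path between non-straight nodes exactly once.

    adj is a hypothetical plain-dict adjacency map: node -> set of neighbors.
    Interior nodes of every yielded path have exactly two neighbors.
    """
    junctions = {node for node, nbrs in adj.items() if len(nbrs) != 2}
    seen = set()  # undirected edges already used, stored as frozensets
    for start in junctions:
        for first in adj[start]:
            if frozenset((start, first)) in seen:
                continue  # this edge already belongs to a yielded path
            seen.add(frozenset((start, first)))
            path = [start, first]
            while path[-1] not in junctions:
                prev, cur = path[-2], path[-1]
                (nxt,) = adj[cur] - {prev}  # the one neighbor we did not come from
                seen.add(frozenset((cur, nxt)))
                path.append(nxt)
            yield path


# A-B-C is a chain, with C branching out to D and E.
adj = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}
paths = list(junction_paths(adj))
print(len(paths))
```

For a NetworkX graph g, an adjacency map of this shape can be built with {n: set(g[n]) for n in g}. The direction in which each path is reported depends on set iteration order.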
I want to read an SDF file (containing many molecules) and return the weighted adjacency matrix of each molecule. Atoms should be treated as vertices and bonds as edges. If vertices i and j are connected by a single, double, or triple bond, then the corresponding entry in the adjacency matrix should be 1, 2, or 3 respectively. I also need to obtain a distance vector for each vertex, which lists the number of vertices at each distance.
Is there any Python package available to do this?
I would recommend Pybel for reading and manipulating SDF files in Python. To get the bonding information, you will probably need to also use the more full-featured but less pythonic openbabel module, which can be used in concert with Pybel (as pybel.ob).
To start with, you would write something like this:
import pybel  # with Open Babel 3.x: from openbabel import pybel

for mol in pybel.readfile('sdf', 'many_molecules.sdf'):
    for atom in mol:
        coords = atom.coords
        # Iterate over the atoms bonded to this atom.
        for neighbor in pybel.ob.OBAtomAtomIter(atom.OBAtom):
            neighbor_coords = pybel.Atom(neighbor).coords
See
http://code.google.com/p/cinfony/
However for your exact problem you will need to consult the documentation.
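Whichever library reads the SDF file, the remaining steps can be sketched in plain Python once the bonds are available as (i, j, order) triples with 0-based atom indices. The function names and the triple layout are assumptions for the example, and "distance" is taken to mean the number of bonds on the shortest path.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric matrix whose entries are bond orders (0 = no bond)."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return matrix


def distance_vector(matrix, source):
    """Count atoms at bond distance 1, 2, ... from source.

    Entry k of the result is the number of atoms at distance k + 1.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search ignoring bond orders
        u = queue.popleft()
        for v, order in enumerate(matrix[u]):
            if order and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * max(dist.values())
    for d in dist.values():
        if d > 0:
            counts[d - 1] += 1
    return counts


# Heavy atoms of a C-C=O fragment: one single bond, one double bond.
matrix = adjacency_matrix(3, [(0, 1, 1), (1, 2, 2)])
print(matrix)
print([distance_vector(matrix, atom) for atom in range(3)])
```

The bond triples themselves would come from the SDF reader, for example by iterating over the bonds of mol.OBMol with Open Babel; check its documentation for the exact accessor names in your version.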
I have the following txt file representing a network in edgelist format.
The first two columns represent the usual: which node is connected to which other nodes
The third column represents weights, representing the number of times each node has contacted the other.
I have searched the igraph documentation but there's no mention of how to include an argument for weight when importing standard file formats like txt.
The file can be accessed from here and this is the code I've been using:
read.graph("Irvine/OClinks_w.txt", format="edgelist")
This code treats the third column as something other than weight.
Does anyone know the solution?
Does the following cause too much annoyance?
g <- read.table("Irvine/OClinks_w.txt")
g <- graph.data.frame(g)
If it does, then you can read it directly from the file using
g<-read.graph("Irvine/OClinks_w.txt",format="ncol")
E(g)$weight
If you are using Python and igraph, the following line of code works to import weights and vertex names:
g1w=Graph.Read_Ncol("g1_ncol_format_weighted.txt",names=True)
Note: you must tell igraph to read names attribute with names=True, otherwise just vertex numbers will be imported.
Where g1_ncol_format_weighted.txt looks something like:
A B 2
B C 3
To make sure the import worked properly, use the following lines:
print(g1w.get_edgelist())
print(g1w.es["weight"])
print(g1w.vs["name"])