I'm trying to save a NetworkX DiGraph while preserving its nodes' attributes.
I have tried nx.write_edgelist and nx.write_weighted_edgelist, and after trying them (and also looking at https://networkx.org/documentation/networkx-1.10/reference/readwrite.html) I know that neither adjacency lists nor edge lists preserve nodes' attributes.
I have also seen the other options on that NetworkX page, but I can't tell whether those commands preserve attributes, and I need to be sure that it works (my code needs to create and save more than 5000 graphs, and it takes almost a day to run).
So which is the best way to save a graph and preserve nodes' attributes?
First, note that you are using an outdated version of the NetworkX docs; you should always use the stable version.
One format which is guaranteed to preserve node data is the pickle (note that nx.write_gpickle and nx.read_gpickle are deprecated as of NetworkX 2.6 and were removed in 3.0, so the following works only on older versions):
In [1]: import networkx as nx
In [2]: G = nx.Graph()
In [3]: G.add_node("A", weight=10)
In [4]: nx.write_gpickle(G, "test.gpickle")
In [5]: H = nx.read_gpickle("test.gpickle")
In [6]: H.nodes(data=True)
Out[6]: NodeDataView({'A': {'weight': 10}})
The GML format should also work for most datatypes:
In [8]: nx.write_gml(G, "test.gml")
In [9]: H = nx.read_gml("test.gml")
In [10]: H.nodes(data=True)
Out[10]: NodeDataView({'A': {'weight': 10}})
GEXF works as well:
In [12]: nx.write_gexf(G, "test.gexf")
In [13]: H = nx.read_gexf("test.gexf")
In [14]: H.nodes(data=True)
Out[14]: NodeDataView({'A': {'weight': 10, 'label': 'A'}})
So you have several options and can decide based on performance and support for the specific attribute data you are trying to save.
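If nx.write_gpickle is not available in the installed NetworkX version, roughly the same round trip can be done with Python's standard pickle module. This is only a minimal sketch: the file name and the example attributes are placeholders, not taken from the question.

```python
import os
import pickle
import tempfile

import networkx as nx

# Small directed graph with node and edge attributes (illustrative data).
G = nx.DiGraph()
G.add_node("A", weight=10)
G.add_edge("A", "B", capacity=3)

# Placeholder path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.pickle")

# Serialize the whole graph object, attributes included.
with open(path, "wb") as f:
    pickle.dump(G, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back and inspect the restored attributes.
with open(path, "rb") as f:
    H = pickle.load(f)

print(H.nodes(data=True))
print(H.edges(data=True))
```

Pickle files are tied to Python and should only be loaded from trusted sources; GML or GEXF may be the better choice if the graphs need to be read by other tools.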
In the following documentation for networkx.Graph.edges:
https://networkx.github.io/documentation/stable/reference/classes/generated/networkx.Graph.edges.html
it is illustrated how we add weight to an edge:
G = nx.path_graph(3)
G.add_edge(2, 3, weight=5)
My question is, can we add multiple weights to the same edge in a simple (not multi) graph?
For example, I want to create a simple graph depicting the contact network of employees working in an organization. A node in my network denotes an employee. While creating the network, I want to draw an edge between two employees when they meet each other, but I want to assign multiple weights based on the time they spent together on different categories of work. For example, say node A and node B spend 5 hours working on a project, 2 hours in a meeting, and 1 hour at lunch.
Thanks for your reply in advance.
You can add as many edge attributes as you want. Example:
>>> G.add_edge(2, 3, lunch_time=1, project_time=5, meeting_time=2)
>>> G.edges[2,3]
{'weight': 5, 'lunch_time': 1, 'project_time': 5, 'meeting_time': 2}
>>> G.edges[2,3]['lunch_time']
1
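Once the hours are stored as separate edge attributes, any one of them can be named as the weight when it is needed. The following is a small sketch with made-up employees "A", "B" and "C" and made-up hours; it assumes that `G.degree(weight=...)` accepts the attribute name, which holds for the NetworkX versions I am aware of.

```python
import networkx as nx

G = nx.Graph()
# Hypothetical contact data: hours spent together per activity.
G.add_edge("A", "B", project_time=5, meeting_time=2, lunch_time=1)
G.add_edge("B", "C", project_time=1, meeting_time=4, lunch_time=0)

# Weighted degree using only the project hours as the weight.
project_hours = dict(G.degree(weight="project_time"))

# Total hours per edge, summing every attribute stored on that edge.
total_hours = {frozenset((u, v)): sum(d.values())
               for u, v, d in G.edges(data=True)}

print(project_hours)
print(total_hours)
```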
The answer is no, but that's not that bad. G.add_edge(2, 3, weight=5) is a method call that updates keys 2 and 3 in the dictionary G._adj, an attribute used internally by the nx.Graph class.
After the update it looks like this:
G._adj = {2: {3: {'weight': 5}}, 3: {2: {'weight': 5}}}
Just as a dictionary can't hold the same key twice, you can't assign several different values to the same weight attribute.
In order to store hours spent, you can use dictionaries, tuples, lists or any other datatypes that would be good for you, for example:
G.add_edge(2, 3, weight={'project': 5, 'meeting': 2, 'lunch': 1})
Each of these is still a single value; it just happens to be (at least, normally) an iterable one.
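A short sketch of both points: adding the same edge again overwrites the value stored under `weight`, while a dictionary stored under `weight` keeps all the hours together. The numbers are the ones from the question. Note that algorithms expecting a numeric weight will most likely not accept a dictionary there.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(2, 3, weight=5)
G.add_edge(2, 3, weight=7)  # same edge, same key: the old value is replaced
overwritten = G[2][3]["weight"]

# Store a container as the single value of the attribute instead.
G.add_edge(2, 3, weight={"project": 5, "meeting": 2, "lunch": 1})
total_hours = sum(G[2][3]["weight"].values())

print(overwritten, total_hours, G.number_of_edges())
```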
I have a dataset that shows parent-child relationships but no child-child relationships for siblings. I'm building a network using Python's NetworkX package (Python 3.6). I would like to add edges between siblings (if children share parents, they are siblings). How can I do this?
I've found some questions about conditional edge creation but in these questions the condition does not depend on other node properties (for example, existing edges to certain nodes):
python networkx remove nodes and edges with some condition
But I'm not sure how to formulate the condition in my case, to achieve what I want.
import networkx as nx
import pandas as pd

dat = {'child':[1,1,4,4,5,5,8,8], 'parent':[2,3,2,3,6,7,6,7]}
# Create DataFrame
data = pd.DataFrame(dat)
# Create graph with known connections
G = nx.Graph()
def create_edges(row):
    return G.add_edge(row['child'],row['parent'])
data.apply(create_edges, axis=1)
I would like to create edges between nodes 1 and 4, and between nodes 5 and 8 (because they share parents and are clearly siblings), but not between 1 and 5, or 4 and 8.
I hope I'm not overcomplicating things, but this is how I'd go:
First, group the children by joint parents. The resulting variable parents_children is a dict with parents as keys and the set of every parent's children as values.
parents_children = {parent: {child for child in dat['child']
                             if (parent,child) in list(zip(dat['parent'],dat['child']))}
                    for parent in dat['parent']}
Afterwards, go over pairs of children with the same parent, and add an edge between them:
from itertools import combinations
for children in parents_children.values():
    for children_couple in combinations(children,2):
        G.add_edge(*children_couple)
I ran it on my side and I think it got the right result.
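The same grouping idea can also be written as a single pass over the rows with a `defaultdict`, which avoids rebuilding the zipped list for every lookup. This is a sketch of that variant, not the answerer's code; it reuses the `dat` dictionary from the question, and `children_of` and `sibling_edges` are names introduced here for illustration.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx

dat = {'child': [1, 1, 4, 4, 5, 5, 8, 8], 'parent': [2, 3, 2, 3, 6, 7, 6, 7]}

# Graph with the known child-parent connections.
G = nx.Graph()
G.add_edges_from(zip(dat['child'], dat['parent']))

# Group children under each parent in one pass.
children_of = defaultdict(set)
for child, parent in zip(dat['child'], dat['parent']):
    children_of[parent].add(child)

# Every pair of children sharing a parent becomes a sibling edge.
sibling_edges = set()
for children in children_of.values():
    for a, b in combinations(sorted(children), 2):
        sibling_edges.add((a, b))

G.add_edges_from(sibling_edges)
print(sorted(sibling_edges))
```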
I have the following code:
import networkx

def reverse_graph(g):
    reversed = networkx.DiGraph()
    for e in g.edges():
        reversed.add_edge(e[1], e[0])
    return reversed

g = networkx.DiGraph()
for i in range(500000):
    g.add_edge(i, i+1)
g2 = g.reverse()
g3 = reverse_graph(g)
And according to my line profiler I am spending WAY more time reversing the graph using networkx (their reverse took about 21 sec, mine took about 7). The overhead seems high in this simple case, and it's even worse in other code I have with more complex objects. Is there something happening under the hood of networkx I'm not aware of? This seems like it should be a relatively cheap function.
For reference, here is the doc for the reverse function
EDIT: I also tried running the implementations the other way around (i.e. mine first) to make sure there was no caching happening when they created theirs. Mine was still significantly faster.
The source code for the reverse method looks like this:
def reverse(self, copy=True):
    """Return the reverse of the graph.

    The reverse is a graph with the same nodes and edges
    but with the directions of the edges reversed.

    Parameters
    ----------
    copy : bool optional (default=True)
        If True, return a new DiGraph holding the reversed edges.
        If False, reverse the reverse graph is created using
        the original graph (this changes the original graph).
    """
    if copy:
        H = self.__class__(name="Reverse of (%s)"%self.name)
        H.add_nodes_from(self)
        H.add_edges_from( (v,u,deepcopy(d)) for u,v,d
                          in self.edges(data=True) )
        H.graph=deepcopy(self.graph)
        H.node=deepcopy(self.node)
    else:
        self.pred,self.succ=self.succ,self.pred
        self.adj=self.succ
        H=self
    return H
So by default, when copy=True, not only are the edges reversed,
but a deepcopy of each edge's data is made as well. Then the graph attributes (held in
self.graph) are deepcopied, and then the node data (held in self.node) is deepcopied.
That's a lot of copying that reverse_graph does not do.
If you don't deepcopy everything, modifying g3 could affect g.
If you don't need to deepcopy everything (and if mutating g is acceptable), then
g.reverse(copy=False)
is even faster than
g3 = reverse_graph(g)
In [108]: %timeit g.reverse(copy=False)
1000000 loops, best of 3: 359 ns per loop
In [95]: %timeit reverse_graph(g)
1 loops, best of 3: 1.32 s per loop
In [96]: %timeit g.reverse()
1 loops, best of 3: 4.98 s per loop
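The effect of the deepcopy can be seen on a tiny graph. The sketch below assumes NetworkX 2.x or later, where, to my knowledge, `reverse()` still deep-copies attribute data and `reverse(copy=False)` returns a view of the original graph rather than swapping its internals in place as the older source above does. The `tags` attribute and the `shallow` graph are made up for illustration.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge(0, 1, tags=["a"])  # mutable edge attribute (illustrative)

# Default reverse(): edge data is deep-copied, so the copy is independent.
deep = g.reverse()
deep[1][0]["tags"].append("deep")

# Hand-rolled reversal that reuses the attribute values: the list is shared,
# so a change made through the new graph shows up in g as well.
shallow = nx.DiGraph()
shallow.add_edges_from((v, u, d) for u, v, d in g.edges(data=True))
shallow[1][0]["tags"].append("shallow")

# Assumption: on NetworkX 2.x+ this is a view and g itself is left unchanged.
view = g.reverse(copy=False)

print(g[0][1]["tags"], deep[1][0]["tags"], view.has_edge(1, 0))
```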
I'm using NetworkX to create a weighted graph (not a digraph). Each node has a node name and a number of edges that have a weight. The weights are always positive, non-zero integers.
What I'm trying to do is get a list of tuples where each tuple represents a node in the graph (by name) and the weighted degree of the node.
I can do something like this:
the_list = sorted(my_graph.degree_iter(),key=itemgetter(1),reverse=True)
But this doesn't appear to be taking the weighting of each node into account. Each node may have a different weight for every edge (or they may be the same, there's no way to know).
Do I need to write a function to do this manually? I've been combing through the NetworkX docs and am coming up empty on a built-in way to do this (but maybe I'm overlooking it).
If I have to write the function myself, I'm assuming I use the size() method with the weight flag set. That seems to only give me the sum of all the weights in the graph though.
Any help is greatly appreciated.
You can use the Graph.degree() method with the weight= keyword like this:
In [1]: import networkx as nx
In [2]: G = nx.Graph()
In [3]: G.add_edge(1,2,weight=7)
In [4]: G.add_edge(1,3,weight=42)
In [5]: G.degree(weight='weight')
Out[5]: {1: 49, 2: 7, 3: 42}
In [6]: G.degree(weight='weight').items()
Out[6]: [(1, 49), (2, 7), (3, 42)]
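The output above is what older NetworkX releases print. On NetworkX 2.x and later, `G.degree(weight='weight')` returns a degree view that iterates over (node, degree) pairs instead of a plain dict, so the sorted list of tuples asked for in the question can be sketched as follows (assuming such a version is installed):

```python
from operator import itemgetter

import networkx as nx

G = nx.Graph()
G.add_edge(1, 2, weight=7)
G.add_edge(1, 3, weight=42)

# Iterating the degree view yields (node, weighted_degree) tuples.
the_list = sorted(G.degree(weight="weight"), key=itemgetter(1), reverse=True)

# A plain dict is still available if that form is preferred.
as_dict = dict(G.degree(weight="weight"))

print(the_list)
print(as_dict)
```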
I am using python with networkx package. I need to find the nodes connected to out edges of a given node.
I know there is a function networkx.DiGraph.out_edges but it returns out edges for the entire graph.
I'm not a networkx expert, but have you tried networkx.DiGraph.out_edges, specifying the source node?
DiGraph.out_edges(nbunch=None, data=False)
Return a list of edges.
Edges are returned as tuples with optional data in the order (node,
neighbor, data).
If you just want the out edges for a single node, pass that node as the nbunch argument:
graph.out_edges([my_node])
The simplest way is to use the successors() method:
In [1]: import networkx as nx
In [2]: G=nx.DiGraph([(0,1),(1,2)])
In [3]: G.edges()
Out[3]: [(0, 1), (1, 2)]
In [4]: G.successors(1)
Out[4]: [2]
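On newer NetworkX releases (2.x and later, as far as I know) `successors()` returns an iterator and `out_edges()` returns a view rather than a list, so wrapping the result in `list()` gives the same output on old and new versions alike. A minimal sketch using the same graph:

```python
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2)])

# Nodes reachable through the out edges of node 1.
succ = list(G.successors(1))

# The out edges themselves, restricted to node 1.
edges_from_1 = list(G.out_edges(1))

# For comparison: nodes with an edge pointing into node 1.
pred = list(G.predecessors(1))

print(succ, edges_from_1, pred)
```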