I would like to find the node connectivity between node 1 and the rest of the nodes in a graph. The input text file format is as follows:
1 2 1
1 35 1
8 37 1
and so on, for 167 lines. The first column is the source node, the second column is the destination node, and the last column is the edge weight.
I'm trying to read the source and destination nodes from the input file and form an edge between each pair. I then need to find out whether the network is connected (a single component, with no sub-components). Here is the code:
from numpy import *
import networkx as nx
G = nx.empty_graph()
for row in file('out40.txt'):
    row = row.split()
    src = row[0]
    dest = row[1]
    #print src
    G.add_edge(src, dest)
    print src, dest
for i in range(2, 41):
    if nx.bidirectional_dijkstra(G, 1, i): print "path exists from 1 to ", i
Manually adding the edges using
G.add_edge(1, 2)
works, but it is tedious and not suitable for large input files such as mine. The if condition works when I add edges manually, but the code above throws the following error:
in neighbors_iter
raise NetworkXError("The node %s is not in the graph."%(n,))
networkx.exception.NetworkXError: The node 2 is not in the graph.
Any help will be much appreciated!
In your code, you're adding nodes "1", "2" et cetera (reading from a file gives you strings unless you explicitly convert them).
However, you then refer to the integer nodes 1 and 2. networkx stores nodes as dictionary keys, and since the integer 2 and the string "2" are not equal in Python, networkx treats them as different nodes.
Try changing this...
G.add_edge(src, dest)
to this:
G.add_edge(int(src), int(dest))
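A minimal sketch of the difference, using the three rows quoted in the question held in a list instead of read from out40.txt. It also swaps bidirectional_dijkstra for nx.has_path, which returns True or False instead of raising when two nodes are not connected.

```python
import networkx as nx

# The three rows quoted in the question, as they would come out of the file.
lines = ["1 2 1", "1 35 1", "8 37 1"]

# Without conversion, split() yields strings, so the nodes are "1", "2", ...
G_str = nx.Graph()
for row in lines:
    src, dest, _weight = row.split()
    G_str.add_edge(src, dest)

# With int() the nodes are integers and match lookups such as node 2.
G_int = nx.Graph()
for row in lines:
    src, dest, _weight = row.split()
    G_int.add_edge(int(src), int(dest))

print(2 in G_str, "2" in G_str)   # False True
print(2 in G_int)                 # True

# has_path returns a boolean rather than raising NetworkXNoPath.
print(nx.has_path(G_int, 1, 35))  # True
print(nx.has_path(G_int, 1, 37))  # False
```

For the loop over range(2, 41) in the question, a node number that never appears in the file is not in the graph at all, so checking `i in G` before calling has_path avoids the "not in the graph" error for those.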
Not sure if that is an option for you, but are you aware of networkx's built-in support for multiple graph text formats?
The edge list format seems to fit your case well. Specifically, the following call reads your input file without any custom code (nodetype=int converts the node labels to integers, so lookups such as node 1 work):
G = nx.read_weighted_edgelist(filename, nodetype=int)
If you want to remove the weights (because you don't need them), you could then do the following:
for e in G.edges(data=True):
    e[2].clear()  # [2] is the 3rd element of the tuple, which
                  # contains the dictionary with edge attributes
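A small end-to-end sketch of reading the file this way and then dropping the weights. It writes the three sample rows from the question to a temporary file (standing in for out40.txt) so that it runs on its own; read_weighted_edgelist stores the third column as a float 'weight' attribute.

```python
import os
import tempfile

import networkx as nx

# Write a small sample in the question's "source destination weight" format.
sample = "1 2 1\n1 35 1\n8 37 1\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(sample)
    path = fh.name

# nodetype=int converts node labels; without it they stay strings ("1", "2", ...).
G = nx.read_weighted_edgelist(path, nodetype=int)
os.remove(path)

print(sorted(G.nodes()))  # [1, 2, 8, 35, 37]
print(G[1][2])            # {'weight': 1.0}

# G.edges(data=True) yields (u, v, attr_dict); clearing the dict drops the weight.
for u, v, attrs in G.edges(data=True):
    attrs.clear()
print(G[1][2])            # {}
```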
From Networkx documentation:
for row in file('out40.txt'):
    row = row.split()
    src = row[0]
    dest = row[1]
    G.add_nodes_from([src, dest])
    #print src
    G.add_edge(src, dest)
    print src, dest
The error message says that graph G doesn't have the nodes you are trying to create an edge between.
You can also use is_connected() to make this a little simpler, e.g.:
$ cat disconnected.edgelist
1 2 1
2 3 1
4 5 1
$ cat connected.edgelist
1 2 1
2 3 1
3 4 1
$ ipython
In [1]: import networkx as nx
In [2]: print(nx.is_connected(nx.read_weighted_edgelist('disconnected.edgelist')))
False
In [3]: print(nx.is_connected(nx.read_weighted_edgelist('connected.edgelist')))
True
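To go one step further and see which nodes node 1 can reach (the original question), the connected component containing node 1 can be listed directly. A minimal sketch reusing the disconnected.edgelist data above, parsed from a list of lines with nx.parse_edgelist instead of a file:

```python
import networkx as nx

# Same edges as disconnected.edgelist above, given as lines instead of a file.
lines = ["1 2 1", "2 3 1", "4 5 1"]
G = nx.parse_edgelist(lines, nodetype=int, data=(("weight", float),))

print(nx.is_connected(G))                 # False
print(nx.number_connected_components(G))  # 2

# The connected component containing node 1 = every node reachable from it.
reachable = set(nx.node_connected_component(G, 1))
unreachable = sorted(set(G) - reachable)
print(sorted(reachable))  # [1, 2, 3]
print(unreachable)        # [4, 5]
```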
Another option is to load the file as a pandas dataframe and then use iterrows to iterate:
import pandas as pd
import networkx as nx
cols = ["src", "des", "wei"]
df = pd.read_csv('out40.txt', sep=" ", header=None, names=cols)
G = nx.empty_graph()
for index, row in df.iterrows():
    G.add_edge(row["src"], row["des"])
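With networkx 2.0 or later the iterrows loop can be replaced by nx.from_pandas_edgelist, which builds the graph from two DataFrame columns in one call. A sketch, assuming pandas is installed; io.StringIO with the three sample rows stands in for out40.txt. The node labels come out as NumPy integers, which compare and hash equal to plain Python ints, so lookups such as `1 in G` still work.

```python
import io

import networkx as nx
import pandas as pd

# Stand-in for out40.txt: the three rows quoted in the question.
data = io.StringIO("1 2 1\n1 35 1\n8 37 1\n")
cols = ["src", "des", "wei"]
df = pd.read_csv(data, sep=" ", header=None, names=cols)

# One call instead of iterrows; edge_attr copies the weight column onto each edge.
G = nx.from_pandas_edgelist(df, source="src", target="des", edge_attr="wei")

print(G.number_of_edges())  # 3
print(1 in G)               # True
print(G[1][35]["wei"] == 1) # True
```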
Related
I am trying to remove nodes at random from graphs using the networkx package. The first block describes the graph construction and the second block gives me the node lists that I have to remove from my graph H (20%, 50% and 70% removals). I want 3 versions of the base graph H in the end, in a list or any data structure. The code in block 3 gives me objects of type "None". The last block shows that it works for a single case.
I am guessing that the problem is in the append function, which somehow returns objects of type "None". I also feel that the base graph H might be getting altered after every iteration. Is there any way around this? Any help would be appreciated :)
import networkx as nx
import numpy as np
import random
# node removals from Graphs at random
# network construction
H = nx.Graph()
H.add_nodes_from([1,2,3,4,5,6,7,8,9,10])
H.add_edges_from([[1,2],[2,4],[5,6],[7,10],[1,5],[3,6]])
nx.info(H)
nodes_list = list(H.nodes)
# list of nodes to be removed
perc = [.20,.50,.70] # percentage of nodes to be removed
random_sample_list = []
for p in perc:
    interior_list = []
    random.seed(2) # for replicability
    sample = round(p*10)
    random_sample = random.sample(nodes_list, sample)
    interior_list.append(random_sample)
    random_sample_list.append(random_sample)
# applying the list of nodes to be removed to create a list of graphs - not working
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    graph_list.append(H1.remove_nodes_from(random_sample_list[i]))
# list access - works
H.remove_nodes_from(random_sample_list[1])
nx.info(H)
Final output should look like:
[Graph with 20% removed nodes, Graph with 50% removed nodes, Graph with 70% removed nodes], e.g. as a list
The function remove_nodes_from modifies the graph in place and returns None, so graph_list.append(H1.remove_nodes_from(...)) appends None. Instead, remove the nodes from the copy first and then append the copy itself:
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    H1.remove_nodes_from(random_sample_list[i])
    graph_list.append(H1)
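The same fix can be packaged as a small helper that copies, removes and returns, so the copies can be built in a list comprehension while H stays untouched. A sketch on the question's base graph; the helper name without_nodes is hypothetical, and the random seed is set once rather than inside the loop, so the sampled nodes differ from the question's, although the number removed per graph is the same.

```python
import random

import networkx as nx

# Same base graph as in the question.
H = nx.Graph()
H.add_nodes_from(range(1, 11))
H.add_edges_from([(1, 2), (2, 4), (5, 6), (7, 10), (1, 5), (3, 6)])

random.seed(2)  # for replicability
nodes_list = list(H.nodes())
samples = [random.sample(nodes_list, round(p * len(nodes_list)))
           for p in (0.20, 0.50, 0.70)]

def without_nodes(graph, nodes):
    """Hypothetical helper: return a copy of graph with nodes removed."""
    g = graph.copy()
    g.remove_nodes_from(nodes)  # modifies g in place and returns None
    return g

graph_list = [without_nodes(H, s) for s in samples]

print([g.number_of_nodes() for g in graph_list])  # [8, 5, 3]
print(H.number_of_nodes())                        # 10, H is unchanged
```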
I have a SpaCy dependency tree made by this code:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')  # same model as used in the answer below
text = "We could say to them that if in fact that's all there is, then we could, Oh, we can do something."
print(displacy.render(nlp(text), style='dep', jupyter = True, options = {'distance': 120}))
That renders this dependency tree:
[Figure: displaCy dependency parse of the sentence]
SpaCy determines that this entire string is connected in a dependency tree. What I am trying to figure out is how to discern how direct or indirect the connection is between a word and the next word. For example, looking at the first 3 words:
'We' is connected to the next word, 'could', because it is directly connected to 'say', which is directly connected to 'could'. Therefore, it is 2 connection points away from the next word.
'could' is directly connected to 'say'. Therefore, it is 1 connection point away from the next word.
and so on.
Essentially, I want to make a df that would look like this:
word connection_points_to_next_word
We 2
could 1
say 1
...
I'm not sure how to achieve this. As SpaCy makes this graph, I'm sure there is some efficient way to calculate the number of vertices required to connect adjacent nodes, but all of SpaCy's tools I've found, such as:
token.lefts
token.rights
token.subtree
token.children
(more at https://spacy.io/api/token)
include connection information, but not how direct the connection is. Any ideas on how to approach this problem?
Using the networkx library, we can build an undirected graph from the edgelist of token-children relationships. I am using the index of the token in the document as a unique identifier so that repeat words are treated as separate nodes.
import spacy
import networkx as nx
nlp = spacy.load('en_core_web_lg')
text = "We could say to them that if in fact that's all there is, then we could, Oh, we can do something."
doc = nlp(text)
edges = []
for tok in doc:
    edges.extend([(tok.i, child.i) for child in tok.children])
# Undirected graph over token indices, one edge per head-child relation
graph = nx.Graph(edges)
The shortest path length between neighboring tokens can then be calculated as below:
for idx in range(len(doc) - 1):
    print(doc[idx], doc[idx+1], nx.shortest_path_length(graph, source=idx, target=idx+1))
Output:
We could 2
could say 1
say to 1
to them 1
them that 4
that if 3
if in 2
in fact 1
fact that 3
that 's 1
's all 1
all there 2
there is 1
is , 4
, then 2
then we 2
we could 2
could , 2
, Oh 2
Oh , 2
, we 2
we can 2
can do 1
do something 1
something . 3
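The same calculation can be tried without loading a spaCy model. The sketch below uses a hand-written, hypothetical head array for the first three tokens, following the question's description ('We' and 'could' both attach to 'say'); the root points to itself, as spaCy's token.head does for the root token. Linking each token to its head gives the same undirected edges as linking each token to its children.

```python
import networkx as nx

# Hypothetical parse of "We could say": heads[i] is the index of token i's head,
# and the root ('say', index 2) is its own head.
tokens = ["We", "could", "say"]
heads = [2, 2, 2]

# Undirected graph over token indices, one edge per head-child relation.
graph = nx.Graph()
graph.add_nodes_from(range(len(tokens)))
graph.add_edges_from((i, h) for i, h in enumerate(heads) if i != h)

# Distance from each token to the next one; the last token has no successor.
rows = [
    (tokens[i], nx.shortest_path_length(graph, source=i, target=i + 1))
    for i in range(len(tokens) - 1)
]
print(rows)  # [('We', 2), ('could', 1)]
```

With a real Doc, tokens = [tok.text for tok in doc] and heads = [tok.head.i for tok in doc] plug into the same code, and rows can be passed to pandas.DataFrame(rows, columns=["word", "connection_points_to_next_word"]) to get the table from the question.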
I have a file in a format like this (but it's a bigger file):
13 16 1
11 17 1
8 18 -1
11 19 1
11 20 -1
11 21 1
11 22 1
The first column is the starting vertex, the second column is the ending vertex and the third is the weight between the starting and ending vertex.
I try to create a graph with networkx, but I'm getting this error:
"Edge tuple %s must be a 2-tuple or 3-tuple." % (e,))
Here is my code:
import networkx as nx
file = open("network.txt","r")
lines = file.readlines()
start_vertex = []
end_vertex = []
sign = []
for x in lines:
    start_vertex.append(x.split('\t')[0])
    end_vertex.append(x.split('\t')[1])
    sign.append(x.split('\t')[2])
file.close()
G = nx.Graph()
for i in lines:
    G.add_nodes_from(start_vertex)
    G.add_nodes_from(end_vertex)
    G.add_edges_from([start_vertex, end_vertex, sign])
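You should use networkx's read_edgelist function.
G = nx.read_edgelist('network.txt', delimiter='  ', nodetype=int, data=(('weight', int),))
Notice that the delimiter I'm using is two spaces, because this appears to be what you've used in your input file. If the columns are separated by tabs or a varying number of spaces, leave out delimiter: the default (None) splits on any whitespace.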
If you want to stick with your code:
First, get rid of the for i in lines loop; it only repeats the same work once per line.
Your error has two causes. First, you want to use G.add_weighted_edges_from rather than G.add_edges_from.
Second, this expects a list (or similar iterable) whose entries have the form (u, v, weight). So, for example, G.add_weighted_edges_from([(13,16,1), (11,17,1)]) would add your first two edges. Instead, it sees the command G.add_weighted_edges_from([[13,11,8,11,...],[16,17,18,19,...],[1,1,-1,1,...]]) and thinks that [13,11,8,11,...] is the information for the first edge, [16,17,18,19,...] for the second edge and [1,1,-1,1,...] for the third edge. It can't do this.
You could do G.add_weighted_edges_from(zip(start_vertex, end_vertex, sign)), ideally after converting the values with int(), since everything read from the file is a string. See this explanation of zip: https://stackoverflow.com/a/13704903/2966723
Finally, G.add_nodes_from(start_vertex) and G.add_nodes_from(end_vertex) are unneeded: if the nodes don't already exist when networkx adds an edge, it adds them as well.
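Putting those fixes together, a sketch of the question's approach that runs as-is. The first four rows from the question are held in a list instead of being read from network.txt, and split() is called without an argument (any run of whitespace), which is an assumption about the real file's separators.

```python
import networkx as nx

# First rows from the question; split() with no argument accepts tabs or spaces.
lines = ["13 16 1\n", "11 17 1\n", "8 18 -1\n", "11 19 1\n"]

start_vertex, end_vertex, sign = [], [], []
for x in lines:
    u, v, w = x.split()
    start_vertex.append(int(u))
    end_vertex.append(int(v))
    sign.append(int(w))

G = nx.Graph()
# zip pairs the three lists into (u, v, weight) triples, one per edge;
# missing nodes are created automatically.
G.add_weighted_edges_from(zip(start_vertex, end_vertex, sign))

print(G.number_of_edges())  # 4
print(G[8][18]["weight"])   # -1
```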
Use Python's networkx library (I am assuming Python 3.6).
The following code will read your file as is. You won't need the lines you have written above.
The print command I have written is there to help you check whether the graph has been read correctly.
Note: if your graph is not directed, you can remove the create_using=nx.DiGraph() argument.
import networkx as nx
g = nx.read_edgelist('network.txt', nodetype=int, data=(('weight', int),), create_using=nx.DiGraph(), delimiter=' ')
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")  # nx.info(g) was removed in networkx 3.0
I have the following dataset:
firm_id_1 firm_id_2
1 2
1 4
1 5
2 1
2 3
3 2
3 6
4 1
4 5
4 6
5 4
5 7
6 3 ....
I would like to graph the network of firm_id = 1. In other words, I want to see a graph showing that firm_id = 1 is directly connected to 2, 4 and 5, indirectly connected to 3 via firm 2, to 6 via firm 4, and to 7 via firm 5. That is, I want to graph the shortest distance to each node (firm_id) starting from firm_id = 1. There are 3,000 nodes in my data, and I know that firm 1 reaches all nodes in fewer than 9 hops. How can I graph this in Python?
I would start with a library called NetworkX. I'm not sure I understand everything that you are looking for, but I think this should be close enough for you to modify it.
This program will load the data in from a text file graphdata.txt, split by whitespace, and add the pair as an edge.
It will then calculate the shortest paths from node 1 to all other nodes and print a message for any node whose distance is larger than 9 (see the documentation for more details).
Lastly, it will render the graph using a spring layout to a file called mynetwork.png and to the screen.
Some optimization may or may not be needed for 3000 nodes.
Hope this helps!
import networkx as nx
import matplotlib.pyplot as plt
graph = nx.Graph()
with open('graphdata.txt') as f:
    for line in f:
        firm_id_1, firm_id_2 = line.split()
        graph.add_edge(firm_id_1, firm_id_2)
# Dict mapping each reachable node to its shortest path (a list of nodes) from "1"
paths_from_1 = nx.shortest_path(graph, "1")
for node, path in paths_from_1.items():
    # A path through k nodes has k - 1 edges
    if len(path) - 1 > 9:
        print("Shortest path from 1 to", node, "is longer than 9")
pos = nx.spring_layout(graph, iterations=200)
nx.draw(graph, pos)
plt.savefig("mynetwork.png")
plt.show()
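To draw only the "who reaches whom via whom" structure from the question rather than the whole network, a breadth-first-search tree rooted at firm 1 keeps exactly one shortest-path edge into each reachable firm. A sketch on the sample pairs from the question, with integer firm ids (an assumption; the answer above keeps them as strings):

```python
import networkx as nx

# The firm pairs listed in the question (each link appears in both directions there).
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]
graph = nx.Graph(pairs)

# Hop count from firm 1 to every firm it can reach.
dist = nx.single_source_shortest_path_length(graph, 1)
print(sorted(dist.items()))
# [(1, 0), (2, 1), (3, 2), (4, 1), (5, 1), (6, 2), (7, 2)]

# BFS tree rooted at firm 1: a directed graph with one shortest-path edge per firm.
tree = nx.bfs_tree(graph, 1)
print(sorted(tree.edges()))
# [(1, 2), (1, 4), (1, 5), (2, 3), (4, 6), (5, 7)]
```

The tree can be drawn the same way as the full graph, e.g. nx.draw(tree, with_labels=True) followed by plt.show(), and firms more than 9 hops away are [n for n, d in dist.items() if d > 9].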
You can try the python-graph package. I am not sure about its scalability, but you can do something like:
from pygraph.classes.digraph import digraph
from pygraph.algorithms.minmax import shortest_path
# num_nodes and edges are placeholders for your data:
# nodes numbered 1..num_nodes and a list of (start, end) pairs
gr = digraph()
gr.add_nodes(range(1, num_nodes + 1))
for edge_start, edge_end in edges:
    gr.add_edge((edge_start, edge_end))
# shortest path from node 1 to all others; returns (spanning tree, distances)
shortest_path(gr, 1)
I went through the manual but couldn't completely figure it out.
I have a massive file with 3 columns: node 1 hits node 2 with a certain strength. NetworkX generates many clusters from this, and that works perfectly. However, I cannot load these files into, for example, Cytoscape, so I need to write every cluster to a separate file.
I tried:
for n in G: nx.write_weighted_edgelist(G[n], 'test'+str(count))
I also looked into G.number_of_nodes / edges, G.graph.keys() and dir(G), but none of these gives me what I want.
Is there a way to store every cluster separately, with the strength?
With Clusters = nx.connected_components(G) I can obtain the clusters, yet I lose all the connection information.
for n, nbrs in G.adj.items():  # G.adjacency_iter() in networkx 1.x
    for nbr, eattr in nbrs.items():
        data = eattr['weight']
        if data < 2:
            print('(%s, %s, %s)' % (n, nbr, data))
Using that, I think the groups separated by an empty line are separate clusters.
Solution:
count = 0
Clusters = nx.connected_components(G)
for Cluster in Clusters:
    count = count + 1
    # One output file per cluster; the tmp/ directory must already exist
    with open("tmp/Cluster_" + str(count) + ".clus", "w") as cfile:
        for C in Cluster:
            hit = G[C]
            for h in hit:
                # Each undirected edge is written once from each endpoint
                cfile.write('\t'.join([str(C), str(h), str(hit[h]['weight'])]) + "\n")
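Try using graphs = nx.connected_component_subgraphs(G). That gives you each connected component as a separate graph, which you can write individually in whatever format works for Cytoscape. (connected_component_subgraphs was removed in networkx 2.4; in newer versions use graphs = [G.subgraph(c).copy() for c in nx.connected_components(G)].)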
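A sketch of the subgraph approach that keeps the strengths and writes each undirected edge once, assuming networkx 2.x or later and that every edge has a 'weight' attribute (as read_weighted_edgelist produces). The small graph, node names and temporary output directory are made up for illustration; nx.write_weighted_edgelist writes one "u v weight" line per edge.

```python
import os
import tempfile

import networkx as nx

# Small weighted graph with two separate clusters (illustrative data).
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 0.5), ("b", "c", 1.5), ("x", "y", 2.0)])

out_dir = tempfile.mkdtemp()
paths = []
for count, nodes in enumerate(nx.connected_components(G), start=1):
    # subgraph() keeps the edge attributes, including 'weight'.
    cluster = G.subgraph(nodes)
    path = os.path.join(out_dir, "Cluster_%d.clus" % count)
    # One tab-separated "u v weight" line per edge.
    nx.write_weighted_edgelist(cluster, path, delimiter="\t")
    paths.append(path)

for path in paths:
    with open(path) as fh:
        print(os.path.basename(path), fh.read().splitlines())
```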