I have created a graph object using the networkx library with the following code.
import networkx as nx
#Convert snap dataset to graph object
g = nx.read_edgelist('com-amazon.ungraph.txt',create_using=nx.Graph(),nodetype = int)
print(nx.info(g))
However, I need to write the graph object to the DIMACS file format, which I believe networkx's functions do not support. Is there a way to do so?
The specification described on http://prolland.free.fr/works/research/dsat/dimacs.html is pretty simple, so you can just do something like this:
g = nx.house_x_graph() # stand-in graph since we don't have your data
dimacs_filename = "mygraph.dimacs"
with open(dimacs_filename, "w") as f:
    # write the header
    f.write("p EDGE {} {}\n".format(g.number_of_nodes(), g.number_of_edges()))
    # now write all edges
    for u, v in g.edges():
        f.write("e {} {}\n".format(u, v))
This generates the file "mygraph.dimacs":
p EDGE 5 8
e 0 1
e 0 2
e 0 3
e 1 2
e 1 3
e 2 3
e 2 4
e 3 4
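One caveat: the snippet above writes node labels exactly as they are stored in the graph. Many DIMACS readers expect vertices numbered consecutively from 1, which is an assumption worth checking against the tool you use. A minimal sketch of a writer that relabels nodes first is shown here; `write_dimacs` and its arguments are hypothetical names, not part of networkx.

```python
import io

def write_dimacs(nodes, edges, out):
    """Write an undirected graph in DIMACS edge format to a file-like object.

    Nodes are relabelled 1..n in sorted order (assumption: the consumer
    expects consecutive 1-based vertex numbers).
    Returns the mapping from original label to DIMACS number.
    """
    nodes = sorted(nodes)
    edges = list(edges)
    index = {node: i for i, node in enumerate(nodes, start=1)}
    out.write("p EDGE {} {}\n".format(len(nodes), len(edges)))
    for u, v in edges:
        out.write("e {} {}\n".format(index[u], index[v]))
    return index

# Stand-in data with non-contiguous labels
buf = io.StringIO()
mapping = write_dimacs([10, 42, 7], [(10, 42), (42, 7)], buf)
print(buf.getvalue())
```

With a networkx graph g, the same function could be called as write_dimacs(g.nodes(), g.edges(), f) inside a with open(...) block.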
I would like to create a network with the networkx package from data stored in a CSV file. The data in the CSV file consists of two columns, as in the example below. All nodes within the same edge group connect with each other (i.e. in the E1 group, which has 3 elements, there are: ABC -> BCD, BCD -> DEF, ABC -> DEF).
What would be the best approach/practice for transforming such data in Python to get an input for the networkx package?
Edges Nodes
E1 ABC
E1 BCD
E1 DEF
E2 ABC
E2 BCD
E3 ABC
E3 BCD
E3 CDE
E3 DEF
Your format seems rather awkward as a graph specification. Edges are dyadic in nature, i.e. they connect two nodes, but your definition has potential for an edge to be related to a single node.
Your example also includes more than one edge between the same pair of nodes (e.g. ABC -> BCD via E1 and E2). This implies a MultiGraph.
If this is really what your format should define, here is a way to get it into a networkx graph. There are very likely cleaner ways to read the data.
import networkx as nx
import itertools
# read the file into a dictionary, in 3 stages
with open("graph.txt") as f:
    lines = f.readlines()
edges = []
for line in lines:
    if line.startswith('Edges'):
        continue
    parts = line.strip().split()
    edges.append(parts)
D = {}
for e, n in edges:
    if e not in D:
        D[e] = []
    D[e].append(n)
G = nx.MultiGraph()
# for each edge group, add edges between all pairs of nodes
for e, nodes in D.items():
    print(e, nodes)
    for (u, v) in itertools.combinations(nodes, 2):
        G.add_edge(u, v, label=e)
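As one possible cleaner variant of the grouping step above, here is a sketch that uses only the standard library and stops at a list of (u, v, group) triples. The inline `data` string is a stand-in for the file, and whitespace-separated columns are assumed.

```python
import io
from collections import defaultdict
from itertools import combinations

# Stand-in for the file contents (assumption: whitespace-separated columns)
data = """Edges Nodes
E1 ABC
E1 BCD
E1 DEF
E2 ABC
E2 BCD
"""

groups = defaultdict(list)
for line in io.StringIO(data):
    parts = line.split()
    if len(parts) != 2 or parts[0] == "Edges":
        continue  # skip the header and blank or malformed lines
    group, node = parts
    groups[group].append(node)

# One (u, v, group) triple per pair of nodes within each group
triples = [
    (u, v, group)
    for group, nodes in groups.items()
    for u, v in combinations(nodes, 2)
]
print(triples)
```

Each triple can then be added to the MultiGraph with G.add_edge(u, v, label=group), as in the loop above.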
I have a file with a format like this (but it's a bigger file):
13 16 1
11 17 1
8 18 -1
11 19 1
11 20 -1
11 21 1
11 22 1
The first column is the starting vertex, the second column is the ending vertex and the third is the weight between the starting and ending vertex.
I am trying to create a graph with networkx, but I'm getting this error:
"Edge tuple %s must be a 2-tuple or 3-tuple." % (e,))
Here is my code:
import networkx as nx
file = open("network.txt","r")
lines = file.readlines()
start_vertex = []
end_vertex = []
sign = []
for x in lines:
    start_vertex.append(x.split('\t')[0])
    end_vertex.append(x.split('\t')[1])
    sign.append(x.split('\t')[2])
file.close()
G = nx.Graph()
for i in lines:
    G.add_nodes_from(start_vertex)
    G.add_nodes_from(end_vertex)
    G.add_edges_from([start_vertex, end_vertex, sign])
You should use networkx's read_edgelist command.
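G = nx.read_edgelist('network.txt', delimiter='  ', nodetype=int, data=(('weight', int),))
Notice that the delimiter I'm using is two spaces, because this appears to be what you've used in your input file. If the columns are actually tab-separated, as the split('\t') calls in your code suggest, pass delimiter='\t' instead.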
If you want to stick to your code:
First, get rid of for i in lines.
The reason for your error is twofold. First, you want to use G.add_weighted_edges_from rather than G.add_edges_from.
Also, this expects a list (or similar object) whose entries are of the form (u,v,weight). So for example, G.add_weighted_edges_from([(13,16,1), (11,17,1)]) would add your first two edges. Your call instead passes [[13,11,8,11,...],[16,17,18,19,...],[1,1,-1,1,...]], so networkx treats [13,11,8,11,...] as the first edge, [16,17,18,19,...] as the second and [1,1,-1,1,...] as the third. None of these is a valid edge tuple, hence the error.
You could do G.add_weighted_edges_from(zip(start_vertex,end_vertex,sign)). See this explanation of zip: https://stackoverflow.com/a/13704903/2966723
Finally, G.add_nodes_from(start_vertex) and G.add_nodes_from(end_vertex) are unneeded. If the nodes don't already exist when networkx adds an edge, it adds the nodes as well.
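The zip step described above can be illustrated without networkx. This sketch uses the first three rows of the sample data as stand-in column lists and converts the strings to integers along the way.

```python
# Column lists as they would look after splitting each line of the file
# (the values are strings because they come from text)
start_vertex = ["13", "11", "8"]
end_vertex = ["16", "17", "18"]
sign = ["1", "1", "-1"]

# zip pairs up the i-th entries of each list, giving one tuple per edge
weighted_edges = [
    (int(u), int(v), int(w))
    for u, v, w in zip(start_vertex, end_vertex, sign)
]
print(weighted_edges)  # [(13, 16, 1), (11, 17, 1), (8, 18, -1)]
```

A list in this shape is what G.add_weighted_edges_from expects.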
Use Python's networkx library (I am assuming Python 3.6).
The following code will read your file as is. You won't need the lines you have written above.
The print command that I have written is to help you check if the graph which has been read is correct or not.
Note: If your graph is not a directed graph then you can remove the create_using=nx.DiGraph() part written in the function.
import networkx as nx
g = nx.read_edgelist('network.txt', nodetype=int, data=(('weight', int),), create_using=nx.DiGraph(),delimiter=' ')
print(g)  # nx.info(g) was removed in networkx 3.0; printing the graph gives a short summary in recent versions
There is a file format called .xyz that helps visualize molecular bonds. Basically, the format asks for a specific pattern:
The first line must contain the number of atoms, which in my case is 30.
After that comes the data, one atom per line: the first column is the name of the atom (in my case they are all carbon), the second column is the x coordinate, the third is the y coordinate and the last is the z coordinate, which is 0 for every atom in my case. So something like this:
30
C x1 y1 z1
C x2 y2 z2
...
...
...
I generated my data in C++ into a text file like this:
C 2.99996 7.31001e-05 0
C 2.93478 0.623697 0
C 2.74092 1.22011 0
C 2.42702 1.76343 0
C 2.0079 2.22961 0
C 1.50006 2.59812 0
C 0.927076 2.8532 0
C 0.313848 2.98349 0
C -0.313623 2.9837 0
C -0.927229 2.85319 0
C -1.5003 2.5981 0
C -2.00732 2.22951 0
C -2.42686 1.76331 0
C -2.74119 1.22029 0
C -2.93437 0.623802 0
C -2.99992 -5.5509e-05 0
C -2.93416 -0.623574 0
C -2.7409 -1.22022 0
C -2.42726 -1.7634 0
C -2.00723 -2.22941 0
C -1.49985 -2.59809 0
C -0.92683 -2.85314 0
C -0.313899 -2.98358 0
C 0.31363 -2.98356 0
C 0.927096 -2.85308 0
C 1.50005 -2.59792 0
C 2.00734 -2.22953 0
C 2.4273 -1.76339 0
C 2.74031 -1.22035 0
C 2.93441 -0.623647 0
So now I want to write this data into a .xyz file. I saw online that people do it with Python, in which I have almost no experience. So I checked around and came up with this script:
#!/usr/bin/env python
text_file = open("output.txt","r")
lines = text_file.readlines()
myfile = open("output.xyz","w")
for line in lines:
    atom, x, y, z = line.split()
    myfile.write("%s\t %d\t %d\t %d\t" %(atom,x,y,z))
myfile.close()
text_file.close()
However when I run this, it gives the following error: "%d format: a number is required, not str."
It doesn't make sense to me, since as you can see in txt file, they are all numbers apart from the first line. I tried changing my d's into s's but then the program I'll load this data into gave an error.
To summarize:
I have a data file in .txt, I want to change it into .xyz that's been specified but I am running into problems.
Thanks in advance.
line.split() returns strings, even when their contents look like numbers. '1' and 1 have different types, and %d only accepts numbers. Use %s for strings instead:
myfile.write("%s\t %s\t %s\t %s\t" % (atom, x, y, z))
If you want them to be floats, you should do this during the parsing stage:
x, y, z = map(float, (x, y, z))
Also, %-formatting is the older style in Python; str.format is generally preferred:
myfile.write("{}\t {}\t {}\t {}\t".format(atom,x,y,z))
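To make the type issue concrete, here is a small sketch using the first line of the sample data. It shows %d rejecting a string and the same values formatted after conversion to float; the six-decimal format is just an illustrative choice.

```python
line = "C 2.99996 7.31001e-05 0"
atom, x, y, z = line.split()

# split() returns strings, so %d rejects them
try:
    "%d" % x
except TypeError as err:
    print("TypeError:", err)

# Convert once while parsing, then use a float format
x, y, z = map(float, (x, y, z))
formatted = "{} {:.6f} {:.6f} {:.6f}".format(atom, x, y, z)
print(formatted)  # C 2.999960 0.000073 0.000000
```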
The problem you faced may have been caused by the "\t" (tab) in the answer.
The .xyz file uses only spaces to separate data on the same line, as stated here. A single space would be enough, but for an easily readable layout, like the one other programs produce when saving .xyz files, it's better to use the tips from https://pyformat.info/
My working code (in Python 3) to generate .xyz files, using molecule and atom objects from the CSD Python API library, is the following, which you can adapt to your case:
with open("file.xyz", 'w') as xyz_file:
    xyz_file.write("%d\n%s\n" % (len(molecule.atoms), title))
    for atom in molecule.atoms:
        xyz_file.write("{:4} {:11.6f} {:11.6f} {:11.6f}\n".format(
            atom.atomic_symbol, atom.coordinates.x, atom.coordinates.y, atom.coordinates.z))
The first two lines are the number of atoms and the title of the xyz file. The remaining lines are the atoms (atomic symbol and 3D position).
The atomic symbol gets 4 characters, aligned to the left: {:4}
Then this happens 3 times: {:11.6f}
That means one space followed by the next coordinate, which uses 11 characters aligned to the right, 6 of them after the decimal point. This fits numbers from -999.999999 to 9999.999999, which is usually sufficient. Numbers outside this interval only break the column alignment; the mandatory space between values is kept, so the xyz file still works in those cases.
The result is like this:
18
WERPOY
N       0.655321    3.658330   14.594159
C       1.174111    4.551873   13.561374
C       0.703656    3.889113   15.926147
S       1.455530    5.313258   16.574524
C       1.127601    5.061435   18.321297
N       0.146377    2.914698   16.639984
C      -0.288067    2.014580   15.736297
C       0.014111    2.441298   14.475693
N      -0.266880    1.787085   13.260652
O      -0.831165    0.699580   13.319885
O       0.056329    2.322290   12.209641
H       0.439235    4.780025   12.970561
H       1.821597    4.148825   13.137629
H       1.519448    5.312600   13.775525
H       1.522843    5.786000   18.590124
H       0.171557    5.069325   18.423056
H       1.477689    4.168550   18.574936
H      -0.665073    1.282125   16.053727
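Adapted to a plain text file of "symbol x y z" lines like the one in the question, the same layout could be produced roughly as follows. This is a sketch under assumptions: `txt_to_xyz` is a hypothetical helper, and the title line is left to the caller.

```python
import io

def txt_to_xyz(src, dst, title=""):
    """Copy 'symbol x y z' lines from src to dst in XYZ layout.

    Writes the atom count, then a title line, then one atom per line.
    Returns the number of atoms written.
    """
    atoms = []
    for line in src:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        x, y, z = (float(p) for p in parts[1:])
        atoms.append((symbol, x, y, z))
    dst.write("{}\n{}\n".format(len(atoms), title))
    for symbol, x, y, z in atoms:
        dst.write("{:4} {:11.6f} {:11.6f} {:11.6f}\n".format(symbol, x, y, z))
    return len(atoms)

# Stand-in input: the first two lines of the sample data
src = io.StringIO("C 2.99996 7.31001e-05 0\nC 2.93478 0.623697 0\n")
dst = io.StringIO()
count = txt_to_xyz(src, dst, title="ring")
print(dst.getvalue())
```

With real files, the call would sit inside something like with open("output.txt") as src, open("output.xyz", "w") as dst.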
I have the following dataset:
firm_id_1 firm_id_2
1 2
1 4
1 5
2 1
2 3
3 2
3 6
4 1
4 5
4 6
5 4
5 7
6 3 ....
I would like to graph the network of firm_id = 1. In other words, I want to see a graph showing that firm_id = 1 is directly connected to 2, 4 and 5, and indirectly connected to 3 via firm 2, to 6 via firm 4 and to 7 via firm 5. That is, I want to graph the shortest path to each node (firm_id) starting from firm_id = 1. There are 3000 nodes in my data, and I know that firm 1 reaches all nodes in fewer than 9 vertices. How can I graph this in Python?
I would start with a library called NetworkX. I'm not sure I understand everything that you are looking for, but I think this should be close enough for you to modify it.
This program will load the data in from a text file graphdata.txt, split by whitespace, and add the pair as an edge.
It will then calculate the shortest paths to all nodes from 1, and then print if the distance is larger than 9... see the documentation for more details.
Lastly, it will render the graph using a spring layout to a file called mynetwork.png and to the screen.
Some optimization may / may not be needed for 3000 nodes.
Hope this helps!
import networkx as nx
import matplotlib.pyplot as plt
graph = nx.Graph()
with open('graphdata.txt') as f:
    for line in f:
        firm_id_1, firm_id_2 = line.split()
        graph.add_edge(firm_id_1, firm_id_2)
paths_from_1 = nx.shortest_path(graph, "1")
for node, path in paths_from_1.items():
    # a path lists its nodes, so the distance is one less than its length
    if len(path) - 1 > 9:
        print("Shortest path from 1 to", node, "is longer than 9")
pos = nx.spring_layout(graph, iterations=200)
nx.draw(graph, pos)
plt.savefig("mynetwork.png")
plt.show()
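If only the shortest-path tree rooted at firm 1 should be drawn, rather than the whole network, a breadth-first search yields both the distances and the tree edges. The following is a library-free sketch on the sample pairs from the question; `shortest_path_tree` is a hypothetical helper. networkx also provides nx.bfs_tree, which should give a comparable tree.

```python
from collections import deque

def shortest_path_tree(adjacency, source):
    """Breadth-first search from source.

    Returns (dist, parent): hop distance to every reachable node, and
    the predecessor of each node on one shortest path from source.
    """
    dist = {source: 0}
    parent = {}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent

# Undirected pairs taken from the sample data
pairs = [(1, 2), (1, 4), (1, 5), (2, 3), (3, 6), (4, 5), (4, 6), (5, 7)]
adjacency = {}
for a, b in pairs:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

dist, parent = shortest_path_tree(adjacency, 1)
tree_edges = sorted((p, child) for child, p in parent.items())
print(dist)
print(tree_edges)
```

The resulting tree_edges list can be passed to nx.DiGraph(tree_edges) and drawn in place of the full graph.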
You can try the python-graph package. I am not sure about its scalability, but you can do something like...
from pygraph.classes.digraph import digraph
from pygraph.algorithms.minmax import shortest_path
gr= digraph()
gr.add_nodes(range(1, num_nodes + 1))  # nodes 1..num_nodes; range() excludes its end value
for i in range(num_edges):
    gr.add_edge((edge_start, edge_end))
# shortest path from the node 1 to all others
shortest_path(gr,1)
I would like to find node connectivity between node 1 and rest of the nodes in a graph. The input text file format is as follows:
1 2 1
1 35 1
8 37 1
and so on for 167 lines. First column represents source node, second column represents destination node while the last column represents weight of the edge.
I'm trying to read the source and destination nodes from the input file and form an edge between them. I then need to find out whether it is a connected network (only one component in the graph and no sub-components). Here is the code:
from numpy import*
import networkx as nx
G=nx.empty_graph()
for row in file('out40.txt'):
    row = row.split()
    src = row[0]
    dest = row[1]
    #print src
    G.add_edge(src, dest)
    print src, dest
for i in range(2, 41):
    if nx.bidirectional_dijkstra(G, 1, i): print "path exists from 1 to ", i
Manually adding the edges using
G.add_edge(1, 2)
works but is tedious and not suitable for large input files such as mine. The if condition works when I add edges manually, but the code above throws the following error:
in neighbors_iter
raise NetworkXError("The node %s is not in the graph."%(n,))
networkx.exception.NetworkXError: The node 2 is not in the graph.
Any help will be much appreciated!
In your code, you're adding nodes "1" and "2" et cetera (since reading from a file is going to give you strings unless you explicitly convert them).
However, you're then trying to refer to nodes 1 and 2. I'm guessing that networkx does not think that 2 == "2".
Try changing this...
G.add_edge(src, dest)
to this:
G.add_edge(int(src), int(dest))
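The mismatch is plain Python behaviour and can be seen without networkx. In this sketch the two sample rows stand in for the file, and a set stands in for the graph's node collection.

```python
nodes = set()
for row in ["1 2 1", "1 35 1"]:
    src, dest = row.split()[:2]
    nodes.update([src, dest])  # stored as strings, as read from the file

print(2 in nodes)    # False: the integer 2 is not the string "2"
print("2" in nodes)  # True

int_nodes = {int(n) for n in nodes}
print(2 in int_nodes)  # True once the labels are converted
```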
Not sure if that is an option for you, but are you aware of networkx's built-in support for multiple graph text formats?
The edge list format seems to apply pretty well to your case. Specifically, the following method will read your input files without the need for custom code:
G = nx.read_weighted_edgelist(filename)
If you want to remove the weights (because you don't need them), you could subsequently do the following:
# edges_iter() was removed in networkx 2.0; edges() replaces it
for e in G.edges(data=True):
    e[2].clear()  # e[2] is the 3rd element of the tuple, which
                  # contains the dictionary with edge attributes
From Networkx documentation:
for row in open('out40.txt'):
    row = row.split()
    src = row[0]
    dest = row[1]
    G.add_nodes_from([src, dest])
    #print src
    G.add_edge(src, dest)
    print(src, dest)
The error message says that the graph G doesn't have the nodes you are trying to create an edge between.
You can also use "is_connected()" to make this a little simpler. e.g.
$ cat disconnected.edgelist
1 2 1
2 3 1
4 5 1
$ cat connected.edgelist
1 2 1
2 3 1
3 4 1
$ ipython
In [1]: import networkx as nx
In [2]: print(nx.is_connected(nx.read_weighted_edgelist('disconnected.edgelist')))
False
In [3]: print(nx.is_connected(nx.read_weighted_edgelist('connected.edgelist')))
True
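For intuition about what such a connectivity check does, here is a library-free sketch run on the same two edge lists. `edge_list_is_connected` is a hypothetical helper, and treating an empty graph as not connected is an assumption of this sketch.

```python
def edge_list_is_connected(edge_lines):
    """True if the undirected graph given by 'u v w' lines is connected."""
    adjacency = {}
    for line in edge_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = parts[0], parts[1]
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if not adjacency:
        return False  # assumption: an empty graph counts as not connected
    # Depth-first traversal from an arbitrary node
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    # Connected exactly when the traversal reached every node
    return len(seen) == len(adjacency)

disconnected = ["1 2 1", "2 3 1", "4 5 1"]
connected = ["1 2 1", "2 3 1", "3 4 1"]
print(edge_list_is_connected(disconnected))  # False
print(edge_list_is_connected(connected))     # True
```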
Another option is to load the file as a pandas dataframe and then use iterrows to iterate:
import pandas as pd
import networkx as nx
cols = ["src", "des", "wei"]
df = pd.read_csv('out40.txt', sep=" ", header=None, names=cols)
G = nx.empty_graph()
for index, row in df.iterrows():
    G.add_edge(row["src"], row["des"])