I'm loading a directed weighted graph from a CSV file into a graph-tool graph in Python. The input CSV file is organized as follows:
1,2,300
2,4,432
3,89,1.24
...
Where the first two entries of a line identify the source and target of an edge, and the third number is the weight of the edge.
Currently I'm using:
g = gt.Graph()
e_weight = g.new_edge_property("float")
csv_network = open(in_file_directory + '/' + network_input, 'r')
csv_data_n = csv_network.readlines()
for line in csv_data_n:
    edge = line.replace('\r\n', '')
    edge = edge.split(delimiter)
    e = g.add_edge(edge[0], edge[1])
    e_weight[e] = float(edge[2])
However, it takes quite a long time to load the data (I have a network of 10 million nodes and it takes about 45 minutes).
I have tried to make it faster by using g.add_edge_list, but that works only for unweighted graphs. Any suggestions on how to make it faster?
This has been answered in graph-tool's mailing list:
http://lists.skewed.de/pipermail/graph-tool/2015-June/002043.html
In short, you should use the function g.add_edge_list(), as you said, and set the weights separately via the array interface for property maps:
e_weight.a = weight_list
The weight list should have the same ordering as the edges you passed to
g.add_edge_list().
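A minimal sketch of that pattern: the parsing helper below is plain Python (its name is made up for illustration), and the graph-tool calls are shown as comments since they assume graph-tool is installed.

```python
import csv

def parse_weighted_edges(lines, delimiter=','):
    """Split CSV lines into an integer edge list and a parallel weight list."""
    edges, weights = [], []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) == 3:
            edges.append((int(row[0]), int(row[1])))
            weights.append(float(row[2]))
    return edges, weights

edges, weights = parse_weighted_edges(["1,2,300", "2,4,432", "3,89,1.24"])

# With graph-tool available, the bulk insert would then be:
#   g = gt.Graph()
#   g.add_edge_list(edges)              # one bulk call, no per-edge Python loop
#   e_weight = g.new_edge_property("float")
#   e_weight.a = weights                # same ordering as the edges above
```

The key point is that `weights` is built in exactly the same order as `edges`, so the array assignment lines up with the edges created by `g.add_edge_list()`.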
I suggest you test the performance you get by using the csv library. This example yields edge as a list holding the three fields of each line.
import csv
reader = csv.reader(open(in_file_directory + '/' + network_input, 'r'), delimiter=",")
for edge in reader:
    if len(edge) == 3:
        edge_float = [float(param) for param in edge]
So you would get the following to work with...
edge_float = [1.0, 2.0, 300.0]
I created a graph from a layer using the code below. I want to save this graph to a shapefile for further use in networkx.
I don't want to also save it as a QGIS layer. So how can I simply save it without giving a layer as the first argument of writeAsVectorFormat?
And if I try to give an existing layer as the argument, I get a strange bug: the code runs, and Windows Explorer shows the .shp file among the recent files, but when I try to open it, it says the file does not exist, and I can't see it in the folder where it should be either.
I also can't find out how to just create a dummy layer, so if a layer really is needed, can someone tell me how to create one?
Thanks for any help.
from qgis.analysis import *
vectorLayer = qgis.utils.iface.mapCanvas().currentLayer()
director = QgsVectorLayerDirector(vectorLayer, 12, '2.0', '3.0', '1.0', QgsVectorLayerDirector.DirectionBoth)
# The index of the field that contains information about the edge speed
attributeId = 1
# Default speed value
defaultValue = 50
# Conversion from speed to metric units ('1' means no conversion)
toMetricFactor = 1
strategy = QgsNetworkSpeedStrategy(attributeId, defaultValue, toMetricFactor)
director.addStrategy(strategy)
builder = QgsGraphBuilder(vectorLayer.crs())
startPoint = QgsPointXY(16.8346339,46.8931070)
endPoint = QgsPointXY(16.8376039,46.8971058)
tiedPoints = director.makeGraph(builder, [startPoint, endPoint])
graph = builder.graph()
vl = QgsVectorLayer("Point", "temp", "memory")
QgsVectorFileWriter.writeAsVectorFormat(vl, "zzsh.shp", "CP1250", vectorLayer.crs(), "ESRI Shapefile")
#QgsProject.instance().mapLayersByName('ZZMap07 copy')[0]
I am new to python and FITS image files, as such I am running into issues. I have two FITS files; the first FITS file is pixels/counts and the second FITS file (calibration file) is pixels/wavelength. I need to convert pixels/counts into wavelength/counts. Once this is done, I need to output wavelength/counts as a new FITS file for further analysis. So far I have managed to array the required data as shown in the code below.
import numpy as np
from astropy.io import fits
# read the images
image_file = ("run_1.fits")
image_calibration = ("cali_1.fits")
hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)
# print headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')
# generation of arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach by creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)
image_data = file_list[0].data
calibration_data = calibration_list[0].data
# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)
# list to numpy array
img_array = np.array(img_list)
# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However, I could only save it as a cube, which is not correct because I just need the wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction, please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website:
https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required):
primaryhdu = fits.PrimaryHDU(flux)  # Makes a header
# Or, if you have a header that you've created: primaryhdu = fits.PrimaryHDU(arr1, header=head1)
# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)
# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])
# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)
image = ("filename.fits")
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data
If you are working with spectral data, it might be useful to look into specutils which is designed for common tasks associated with reading/writing/manipulating spectra.
It's common to store spectral data in FITS files using tables, rather than images. For example you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata.
The docs include an example on how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a "generic" FITS writer is not built-in).
You might also be able to use the wcs1d-fits format.
If you prefer not to use specutils, that example still might be useful as it demonstrates how to create an Astropy Table from your data and output it to a well-formatted FITS file.
I have a file with a format like this (but the actual file is bigger):
13 16 1
11 17 1
8 18 -1
11 19 1
11 20 -1
11 21 1
11 22 1
The first column is the starting vertex, the second column is the ending vertex and the third is the weight between the starting and ending vertex.
I try to create a graph with networkx but I'm getting this error:
"Edge tuple %s must be a 2-tuple or 3-tuple." % (e,))
Here is my code:
import networkx as nx
file = open("network.txt","r")
lines = file.readlines()
start_vertex = []
end_vertex = []
sign = []
for x in lines:
    start_vertex.append(x.split('\t')[0])
    end_vertex.append(x.split('\t')[1])
    sign.append(x.split('\t')[2])
file.close()
G = nx.Graph()
for i in lines:
    G.add_nodes_from(start_vertex)
    G.add_nodes_from(end_vertex)
    G.add_edges_from([start_vertex, end_vertex, sign])
You should use networkx's read_edgelist command.
G=nx.read_edgelist('network.txt', delimiter = ' ', nodetype = int, data = (('weight', int),))
Notice the delimiter argument: it must match the separator used in your input file.
If you want to stick to your code:
First, get rid of the for i in lines loop.
The reason for your error is twofold. First, you want to use G.add_weighted_edges_from rather than G.add_edges_from.
Also, this expects a list (or similar iterable) whose entries are of the form (u, v, weight). So, for example, G.add_weighted_edges_from([(13,16,1), (11,17,1)]) would add your first two edges. Instead it sees the command G.add_weighted_edges_from([[13,11,8,11,...],[16,17,18,19,...],[1,1,-1,1,...]]) and thinks that [13,11,8,11,...] is the information for the first edge, [16,17,18,19,...] for the second edge, and [1,1,-1,1,...] for the third edge. It can't do this.
You could do G.add_weighted_edges_from(zip(start_vertex,end_vertex,sign)). See this explanation of zip: https://stackoverflow.com/a/13704903/2966723
Finally, G.add_nodes_from(start_vertex) and G.add_nodes_from(end_vertex) are unneeded: if the nodes don't already exist when networkx adds an edge, it will create them as well.
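Putting that together, a minimal working version might look like this; the lists are shown inline, with integer values, purely for illustration (the question's code builds them from the file as strings).

```python
import networkx as nx

# Parallel lists as built in the question, shown inline for clarity
start_vertex = [13, 11, 8]
end_vertex = [16, 17, 18]
sign = [1, 1, -1]

G = nx.Graph()
# zip pairs up the i-th entries of each list into (u, v, weight) tuples
G.add_weighted_edges_from(zip(start_vertex, end_vertex, sign))

print(G[13][16]['weight'])  # -> 1
```

Each tuple produced by zip is exactly the (u, v, weight) form that add_weighted_edges_from expects, so the weights end up stored under the 'weight' edge attribute.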
Use the networkx library of Python (I am assuming Python 3.6).
The following code will read your file as is; you won't need the lines you have written above.
The print statement I have included is there to help you check whether the graph that has been read is correct.
Note: if your graph is not a directed graph, you can remove the create_using=nx.DiGraph() argument from the function call.
import networkx as nx
g = nx.read_edgelist('network.txt', nodetype=int, data=(('weight', int),), create_using=nx.DiGraph(), delimiter=' ')
print(nx.info(g))
I'm writing an application that runs an MST algorithm over huge graphs (100 to 150 million edges) in Python 2.7. The graph is set up as an adjacency list using a classic class with methods like:
def insertArcW(self, tail, head, weight):
    if head in self.nodes and tail in self.nodes:
        self.adj[tail].addAsLast(head)
        self.adj[tail].addAsLast(weight)

def insertNode(self, e):
    newnode = Node(self.nextId, e)
    self.nextId += 1
I'm also using a linked list (built on an array) and queue from the Python standard library (version 2.7).
With this piece of code the insert is really fast (since the number of nodes is much smaller than the number of edges):
n = []
for i in xrange(int(file_List[0])):
    n.append(G.insertNode(i))
The problem comes with the insertion of the edges:
for e in xrange(len(arc_List)):
    G.insertArcW(n[arc_List[e][0]].index, n[arc_List[e][1]].index, arc_List[e][2])
    G.insertArcW(n[arc_List[e][1]].index, n[arc_List[e][0]].index, arc_List[e][2])
It works great with 1 million edges, but with more it eats all of my RAM (4 GB, 64-bit), though without freezing! It can build the graph, but it takes a very long time. Considering that CPU usage is limited to 19-25% while doing this, is there a way to do it with multiprocessing or multithreading? For example, building the graph with two cores doing the same operation at the same time but on different data: one core working with half of the edges and the other core with the other half.
I'm practically new to this area of programming, especially in Python.
EDIT: With the code below I'm setting up two lists, for nodes and edges. I need to read the information from a ".txt" file. Interleaving the insertArcW and insertNode calls, RAM usage oscillates between 2.4 GB and 2.6 GB. Now I can say that it is stable (maybe due to the "deletion" of the two huge lists of edges and nodes) but it still runs at the same speed. Code:
f = open(graph + '.txt', 'r')
v = f.read()
file_List = re.split('\s+', v)
arc_List = []
n = []
p = []
for x in xrange(0, int(file_List[1])):
    arc_List.append([0, 0, 0])
for i in xrange(int(file_List[0])):
    n.append(G.insertNode(i))
for weight in xrange(1, int(file_List[1]) + 1):
    p.append(weight)
random.shuffle(p)
i = 0
r = 0
while r < int(file_List[1]):
    for k in xrange(2, len(file_List), 2):
        arc_List[r][0] = int(file_List[k])
        arc_List[r][1] = int(file_List[k+1])
        arc_List[r][2] = float(p[i])
        G.insertArcW(n[arc_List[r][0]].index, n[arc_List[r][1]].index, arc_List[r][2])
        G.insertArcW(n[arc_List[r][1]].index, n[arc_List[r][0]].index, arc_List[r][2])
        print r
        i += 1
        r += 1
f.close()
I went through the manual but couldn't completely figure it out.
I have a massive file of 3 columns: node1 hits node2 with a certain strength. From this, many clusters are generated by NetworkX, and this works perfectly. However, I cannot load these files into, for example, Cytoscape, so I need to write every cluster to a separate file.
I tried:
for n in G: nx.write_weighted_edgelist(G[n], 'test'+str(count))
I also looked into G.number_of_nodes / edges, G.graph.keys(), and dir(G), but this doesn't give me what I want.
Is there a way to store every cluster separately with the strength?
With Clusters = nx.connected_components(G) I can obtain the clusters, yet I lose all the connection information.
for n, nbrs in G.adjacency_iter():
    for nbr, eattr in nbrs.items():
        data = eattr['weight']
        if data < 2:
            print('(%s, %s, %s)' % (n, nbr, data))
When I run that, I think an empty line in the output marks the boundary between separate clusters.
##########Solution
Clusters = nx.connected_components(G)
for Cluster in Clusters:
    count = count + 1
    cfile = open("tmp/Cluster_" + str(count) + ".clus", "w")
    for C in Cluster:
        hit = G[C]
        for h in hit:
            cfile.write('\t'.join([str(C), str(h), str(hit[h].values()[0]), "\n"]))
Try using graphs = nx.connected_component_subgraphs(G). That will return a list of graphs, which you could write individually in whatever format works for Cytoscape.
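A sketch of that idea; note that connected_component_subgraphs was removed in recent networkx releases, so this uses G.subgraph over nx.connected_components, which does the same job. The sample edges and the output directory are illustrative.

```python
import os
import tempfile

import networkx as nx

# A toy graph with two connected components
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 0.5), (2, 3, 1.5), (10, 11, 2.0)])

# One subgraph per connected component; edge weights travel with the subgraph
components = [G.subgraph(c).copy() for c in nx.connected_components(G)]

outdir = tempfile.mkdtemp()
for count, sub in enumerate(components, start=1):
    # Each file holds one cluster as "node node weight" lines
    nx.write_weighted_edgelist(sub, os.path.join(outdir, 'cluster_%d.edgelist' % count))
```

Because write_weighted_edgelist includes the weight on every line, each cluster file keeps the strength information, unlike iterating over nx.connected_components alone.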