Is there a simpler, easier way to convert coordinates (long, lat) to a "networkx"-graph, than nested looping over those coordinates and adding weighted nodes/edges for each one?
for idx1, itm1 in enumerate(data):
    for idx2, itm2 in enumerate(data):
        pos1 = (itm1["lng"], itm1["lat"])
        pos2 = (itm2["lng"], itm2["lat"])
        distance = vincenty(pos1, pos2).meters #geopy distance
        # print(idx1, idx2, distance)
        graph.add_edge(idx1, idx2, weight=distance)
The target is representing points as a graph in order to use several functions on this graph.
Edit: Using an adjacency_matrix would still need a nested loop
You'll have to do some kind of loop. But if you are using an undirected graph, you can eliminate half of the graph.add_edge() calls (you only need to add u-v, not v-u). Also, as @EdChum suggests, you can use graph.add_weighted_edges_from() to make it go faster.
Here is a nifty way to do it
In [1]: from itertools import combinations
In [2]: import networkx as nx
In [3]: data = [10,20,30,40]
In [4]: edges = ( (s[0],t[0],s[1]+t[1]) for s,t in combinations(enumerate(data),2))
In [5]: G = nx.Graph()
In [6]: G.add_weighted_edges_from(edges)
In [7]: G.edges(data=True)
Out[7]:
[(0, 1, {'weight': 30}),
(0, 2, {'weight': 40}),
(0, 3, {'weight': 50}),
(1, 2, {'weight': 50}),
(1, 3, {'weight': 60}),
(2, 3, {'weight': 70})]
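Applied to the coordinates from the question, the same combinations() pattern might look like the sketch below. It is only an illustration under some assumptions: the sample points are made up, and a hand-written haversine (spherical Earth) distance stands in for geopy's vincenty so that the snippet has no dependency besides networkx. The helper name haversine_m is hypothetical.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

import networkx as nx

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_m(p, q):
    """Great-circle distance in meters between two {"lat", "lng"} dicts."""
    lat1, lng1, lat2, lng2 = map(radians, (p["lat"], p["lng"], q["lat"], q["lng"]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Hypothetical sample points in the same shape as the question's `data`.
data = [
    {"lat": 52.5200, "lng": 13.4050},
    {"lat": 48.8566, "lng": 2.3522},
    {"lat": 51.5074, "lng": -0.1278},
]

graph = nx.Graph()
graph.add_nodes_from(range(len(data)))
# combinations() yields each unordered pair once, so no v-u duplicates
# and no self-loops are generated.
graph.add_weighted_edges_from(
    (i, j, haversine_m(p, q)) for (i, p), (j, q) in combinations(enumerate(data), 2)
)
```

For n points this still computes n*(n-1)/2 distances; it only removes the redundant half and the explicit nested loop.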
Related
I have this code. It reads a list of sentences, and then uses sklearn's CountVectorizer to compute word co-occurrences.
from sklearn.feature_extraction.text import CountVectorizer
data = ['this is a sentence', 'this was a monkey', 'all this is nice']
count_model = CountVectorizer(ngram_range=(1,1)) # default unigram model
X = count_model.fit_transform(data)
Xc = (X.T * X) # this is co-occurrence matrix in sparse csr format
Xc.setdiag(0) # sometimes you want to fill same word cooccurence to 0
matrix_dense = Xc.todense() # matrix in dense format
import networkx as nx
G=nx.from_numpy_matrix(matrix_dense)
If I do G.edges(data=True), it outputs this:
[(0, 1, {'weight': 1}),
(0, 3, {'weight': 1}),
(0, 5, {'weight': 1}),
(1, 3, {'weight': 1}),
(1, 4, {'weight': 1}),
(1, 5, {'weight': 2})
and so on. How can I get words instead of numbers as source, target?
EDIT:
This is a:
labels = count_model.get_feature_names() # get the word labels
G=nx.from_numpy_matrix(matrix_dense) # create graph
for node, label in zip(G.nodes(), labels): # add labels to the graph
    G.node[node]['label'] = label
With networkx you can replace one set of node labels with another. This is done with relabel_nodes.
Here is the example from the documentation. It creates a 3-node graph and then creates a copy of that graph with the new node names. You can also relabel G in place by setting the optional argument copy to False in the function call.
G = nx.path_graph(3)
sorted(G)
> [0, 1, 2]
mapping = {0: 'a', 1: 'b', 2: 'c'}
H = nx.relabel_nodes(G, mapping)
sorted(H)
> ['a', 'b', 'c']
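For the word co-occurrence case, the mapping can be built from the label list with enumerate. The snippet below is a minimal sketch, not the asker's exact pipeline: the labels list is an assumed vocabulary (in practice it would come from count_model), and only a few of the integer edges from the question are recreated by hand.

```python
import networkx as nx

# Assumed vocabulary, in the column order of the co-occurrence matrix.
labels = ["all", "is", "monkey", "nice", "sentence", "this", "was"]

# A few of the integer-labelled edges shown in the question.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1), (0, 3, 1), (0, 5, 1), (1, 5, 2)])

mapping = dict(enumerate(labels))   # {0: 'all', 1: 'is', ...}
H = nx.relabel_nodes(G, mapping)    # returns a relabelled copy by default
```

Edge attributes such as weight are carried over to the relabelled graph, so H.edges(data=True) then lists word pairs instead of index pairs.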
I'm trying to create a graph with the following information.
n = 6 #number of nodes
V = []
V=range(n)# list of vertices
print("vertices",V)
# Create n random points
random.seed(1)
points = []
pos = []
pos = {i:(random.randint(0,50),random.randint(0,100)) for i in V}
print("pos =", pos)
This gives my positions as
pos = {0: (8, 72), 1: (48, 8), 2: (16, 15), 3: (31, 97), 4: (28, 60), 5: (41, 48)}
I want to draw a graph with these nodes and some edges (which can be obtained from some other calculation) using Matplotlib in Python. I tried the following, but it didn't work.
G_1 = nx.Graph()
nx.set_node_attributes(G_1,'pos',pos)
G_1.add_nodes_from(V) # V is the set of nodes and V =range(6)
# tempedgelist contains my edges as a list ... ex: tempedgelist = [[0, 2], [0, 3], [1, 2], [1, 4], [5, 3]]
for (u,v) in tempedgelist:
    G_1.add_edge(v, u, capacity=1)
nx.draw(G_1,pos, edge_labels=True)
plt.show()
Can someone please help me with this...
You only need pos for nx.draw(). You can set both nodes and edges using add_edges_from().
import networkx as nx
import matplotlib.pyplot as plt
import random
G_1 = nx.Graph()
tempedgelist = [[0, 2], [0, 3], [1, 2], [1, 4], [5, 3]]
G_1.add_edges_from(tempedgelist)
n_nodes = 6
pos = {i:(random.randint(0,50),random.randint(0,100)) for i in range(n_nodes)}
nx.draw(G_1, pos, with_labels=True) # nx.draw() has no edge_labels option; with_labels shows node labels
plt.show()
Note: If you need to track points and positions separately, write into lists from pos:
points = []
positions = []
for i in pos:
    points.append(pos[i])
    positions.append(i)
    positions.append(pos[i])
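If the goal of edge_labels=True in the question was to show the capacity on each edge, one possible approach is to draw the labels in a separate step with nx.draw_networkx_edge_labels. The following is a rough sketch that reuses the positions and edge list from the question; the Agg backend line is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import networkx as nx

# Positions and edges as given in the question.
pos = {0: (8, 72), 1: (48, 8), 2: (16, 15), 3: (31, 97), 4: (28, 60), 5: (41, 48)}
tempedgelist = [[0, 2], [0, 3], [1, 2], [1, 4], [5, 3]]

G_1 = nx.Graph()
G_1.add_nodes_from(pos)                        # node ids are the keys of pos
G_1.add_edges_from(tempedgelist, capacity=1)   # same attribute on every edge

nx.draw(G_1, pos, with_labels=True)            # nodes, edges and node labels
edge_labels = nx.get_edge_attributes(G_1, "capacity")  # {(u, v): 1, ...}
nx.draw_networkx_edge_labels(G_1, pos, edge_labels=edge_labels)
# Call plt.show() or plt.savefig(...) here to view the result.
```

Note that pos is passed to the drawing functions directly; storing it as a node attribute is optional.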
I don't have a proper IDE right now, but one issue I spot in your code is that pos should be a dictionary; see the networkx documentation for setting node attributes and for drawing.
Try this
import networkx as nx
import matplotlib.pyplot as plt
g= nx.Graph()
pos = {0:(0,0), 1:(1,2)}
g.add_nodes_from(range(2))
nx.set_node_attributes(g, pos, 'pos') # networkx >= 2.0 argument order: (G, values, name)
g.add_edge(0, 1)
nx.draw(g, pos, with_labels=True) # nx.draw() has no edge_labels option
plt.show()
Let me know if it works.
You must transform your list of positions into a dictionary:
pos = dict(zip(pos[::2],pos[1::2]))
Incidentally, you can build the graph directly from the edge list (the nodes are added automatically):
G_1 = nx.Graph(tempedgelist)
nx.set_node_attributes(G_1, 1, 'capacity') # networkx >= 2.0 argument order: (G, values, name)
I have a MultiDiGraph created in networkx to which I am trying to add edge weights, after which I want to assign a new weight based on the frequency/count of each edge's occurrence. I used the following code to create the graph and add weights, but I'm not sure how to tackle reassigning weights based on count:
g = nx.MultiDiGraph()
df = pd.read_csv('G:\cluster_centroids.csv', delimiter=',')
df['pos'] = list(zip(df.longitude,df.latitude))
dict_pos = dict(zip(df.cluster_label,df.pos))
#print dict_pos
for row in csv.reader(open('G:\edges.csv', 'r')):
    if '[' in row[1]: #
        g.add_edges_from(eval(row[1]))
for u, v, d in g.edges(data=True):
    d['weight'] = 1
for u,v,d in g.edges(data=True):
    print u,v,d
Edit
I was able to successfully assign weights to each edge, first part of my original question, with the following:
for u, v, d in g.edges(data=True):
    d['weight'] = 1
for u,v,d in g.edges(data=True):
    print u,v,d
However, I am still unable to reassign weights based on the number of times an edge occurs (a single edge in my graph can occur multiple times). I need this in order to visualize edges with a higher count differently from edges with a lower count (using edge color or width). I'm not sure how to proceed with reassigning weights based on count; please advise. Below are sample data and links to my full data set.
Data
Sample Centroids(nodes):
cluster_label,latitude,longitude
0,39.18193382,-77.51885109
1,39.18,-77.27
2,39.17917928,-76.6688633
3,39.1782,-77.2617
4,39.1765,-77.1927
5,39.1762375,-76.8675441
6,39.17468,-76.8204499
7,39.17457332,-77.2807235
8,39.17406072,-77.274685
9,39.1731621,-77.2716502
10,39.17,-77.27
Sample Edges:
user_id,edges
11011,"[[340, 269], [269, 340]]"
80973,"[[398, 279]]"
608473,"[[69, 28]]"
2139671,"[[382, 27], [27, 285]]"
3945641,"[[120, 422], [422, 217], [217, 340], [340, 340]]"
5820642,"[[458, 442]]"
6060732,"[[291, 431]]"
6912362,"[[68, 27]]"
7362602,"[[112, 269]]"
Full data:
Centroids(nodes):https://drive.google.com/open?id=0B1lvsCnLWydEdldYc3FQTmdQMmc
Edges: https://drive.google.com/open?id=0B1lvsCnLWydEdEtfM2E3eXViYkk
UPDATE
I was able to solve, at least temporarily, the issue of overly disproportional edge widths due to high edge weight by setting a minLineWidth and multiplying it by the weight:
minLineWidth = 0.25
for u, v, d in g.edges(data=True):
    d['weight'] = c[u, v]*minLineWidth
edges,weights = zip(*nx.get_edge_attributes(g,'weight').items())
and using width=[d['weight'] for u,v, d in g.edges(data=True)] in nx.draw_networkx_edges() as provided in the solution below.
Additionally, I was able to scale color using the following:
# Set Edge Color based on weight
values = range(7958) #this is based on the number of edges in the graph, use print len(g.edges()) to determine this
jet = cm = plt.get_cmap('YlOrRd')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
colorList = []
for i in range(7958):
    colorVal = scalarMap.to_rgba(values[i])
    colorList.append(colorVal)
And then using the argument edge_color=colorList in nx.draw_networkx_edges().
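The color list above is indexed by edge position, so the shade depends on the order of the edges and not on their count. If the color should follow the weight itself, one option is to pass the weights as edge_color together with edge_cmap. This is a rough sketch on a small hypothetical DiGraph whose weight attribute already holds the count; the layout and the sample edges are assumptions, and the Agg backend line is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical small graph; 'weight' stands for the edge count.
g = nx.DiGraph()
g.add_weighted_edges_from([(27, 285, 2), (340, 269, 1), (269, 340, 1), (382, 27, 5)])

weights = [d["weight"] for _, _, d in g.edges(data=True)]

min_line_width = 0.25
widths = [min_line_width * w for w in weights]   # same scaling idea as above

pos = nx.circular_layout(g)
nx.draw_networkx_nodes(g, pos)
# Numeric edge_color values are mapped through edge_cmap between
# edge_vmin and edge_vmax, so heavier edges get darker shades.
nx.draw_networkx_edges(
    g, pos,
    width=widths,
    edge_color=weights,
    edge_cmap=plt.get_cmap("YlOrRd"),
    edge_vmin=min(weights),
    edge_vmax=max(weights),
)
```

This also avoids hard-coding the number of edges (7958 above).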
Try this on for size.
Note: I added a duplicate of an existing edge, just to show the behavior when there are repeats in your multigraph.
from collections import Counter
c = Counter(g.edges()) # Contains frequencies of each directed edge.
for u, v, d in g.edges(data=True):
    d['weight'] = c[u, v]
print(list(g.edges(data=True)))
#[(340, 269, {'weight': 1}),
# (340, 340, {'weight': 1}),
# (269, 340, {'weight': 1}),
# (398, 279, {'weight': 1}),
# (69, 28, {'weight': 1}),
# (382, 27, {'weight': 1}),
# (27, 285, {'weight': 2}),
# (27, 285, {'weight': 2}),
# (120, 422, {'weight': 1}),
# (422, 217, {'weight': 1}),
# (217, 340, {'weight': 1}),
# (458, 442, {'weight': 1}),
# (291, 431, {'weight': 1}),
# (68, 27, {'weight': 1}),
# (112, 269, {'weight': 1})]
Edit: To visualize the graph with edge weights as thicknesses, use this:
nx.draw_networkx(g, width=[d['weight'] for _, _, d in g.edges(data=True)])
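With the approach above, every parallel copy of an edge stays in the multigraph and simply carries the same count. If a single edge per (source, target) pair is enough for plotting, the counts could instead be collapsed into a plain DiGraph. This is a sketch of that alternative, using a few sample edges rather than the full data set; the name h is just for illustration.

```python
from collections import Counter

import networkx as nx

# A few directed edges, with (27, 285) repeated to mimic a parallel edge.
edges = [(340, 269), (269, 340), (27, 285), (27, 285), (382, 27)]
g = nx.MultiDiGraph()
g.add_edges_from(edges)

counts = Counter(g.edges())   # {(u, v): number of parallel edges}

# One edge per (u, v) pair, with the count stored as its weight.
h = nx.DiGraph()
h.add_weighted_edges_from((u, v, n) for (u, v), n in counts.items())
```

The collapsed graph can then be drawn with width=[d['weight'] for _, _, d in h.edges(data=True)], as in the edit above, without overdrawing the same edge several times.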
With this code I found the list of all subgraphs. I am now trying to extract all positive and negative subnetworks, but I could not work out the logic for this. Can anyone help me?
import networkx as nx
from networkx.algorithms.components.connected import connected_components
import matplotlib.pyplot as plt
G = nx.read_edgelist('/home/suman/Desktop/dataset/CA-GrQc.txt', create_using = None, nodetype=int,edgetype=int)
H=nx.connected_component_subgraphs(G)
for i in H:
    print list(i)
pos=nx.spring_layout(G)
nx.draw(G,pos=pos)
nx.draw_networkx_labels(G,pos=pos)
plt.show()
I think what you're after is to create the network made up of just negative edges and the network made up of just positive edges.
If so, here is some code to do that (edited to account for the fact that add_edges_from can handle weighted edges - I had misread the documentation):
G=nx.Graph()
G.add_edges_from([(1,3),(2,4),(3,5),(4,6)], weight = 1)
G.add_edges_from([(1,2),(2,3),(3,4),(4,5)], weight = -1)
pos_edges = [(u,v,w) for (u,v,w) in G.edges(data=True) if w['weight']>0]
neg_edges = [(u,v,w) for (u,v,w) in G.edges(data=True) if w['weight']<0]
Hpos = nx.Graph()
Hneg = nx.Graph()
Hpos.add_edges_from(pos_edges)
Hneg.add_edges_from(neg_edges)
Hneg.edges(data=True)
> [(1, 2, {'weight': -1}),
(2, 3, {'weight': -1}),
(3, 4, {'weight': -1}),
(4, 5, {'weight': -1})]
Hpos.edges(data=True)
> [(1, 3, {'weight': 1}),
(2, 4, {'weight': 1}),
(3, 5, {'weight': 1}),
(4, 6, {'weight': 1})]
Please let me know if this is what you're after. I have to go now so I can't give detailed explanation, but if you have some comments on what does/does not make sense, I will respond later.
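A variation on the same idea, offered only as a sketch: G.edge_subgraph (available in networkx 2.x and later) builds the positive and negative networks without creating empty graphs first, and connected_components can then be run on each sign separately. The example graph is the one from the answer above.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([(1, 3), (2, 4), (3, 5), (4, 6)], weight=1)
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5)], weight=-1)

# Split the edges by the sign of their weight.
pos_edges = [(u, v) for u, v, w in G.edges(data="weight") if w > 0]
neg_edges = [(u, v) for u, v, w in G.edges(data="weight") if w < 0]

# edge_subgraph returns a read-only view; copy() makes it an independent graph.
Hpos = G.edge_subgraph(pos_edges).copy()
Hneg = G.edge_subgraph(neg_edges).copy()

# Connected pieces of each signed network.
pos_components = [sorted(c) for c in nx.connected_components(Hpos)]
neg_components = [sorted(c) for c in nx.connected_components(Hneg)]
```

Here the positive network splits into two components, while the negative edges form a single path.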
How can I randomly assign weights from a power-law distribution to a network with a very large number of nodes?
I wrote
import networkx as nx
import numpy as np
from networkx.utils import powerlaw_sequence
z=nx.utils.create_degree_sequence(200,nx.utils.powerlaw_sequence,exponent=1.9)
nx.is_valid_degree_sequence(z)
G=nx.configuration_model(z)
Gcc=nx.connected_component_subgraphs(G)[0]
edgelist=[nx.utils.powerlaw_sequence(nx.number_of_edges(Gcc),exponent=2.0)]
I know I assign weights to edges by a dictionary of tuples (node1,node2,weight) using:
nx.from_edgelist(edgelist,create_using=None)
But when I am just interested in getting a weighted network where weights are power-law distributed, is there another shorter way?
You can assign weights directly using G[u][v]['weight'], for example
In [1]: import networkx as nx
In [2]: import random
In [3]: G = nx.path_graph(10)
In [4]: for u,v in G.edges():
   ...:     G[u][v]['weight'] = random.paretovariate(2)
   ...:
   ...:
In [5]: print G.edges(data=True)
[(0, 1, {'weight': 1.6988521989583232}), (1, 2, {'weight': 1.0749963615177736}), (2, 3, {'weight': 1.1503859779558812}), (3, 4, {'weight': 1.675436575683888}), (4, 5, {'weight': 1.1948608572552846}), (5, 6, {'weight': 1.080152340891444}), (6, 7, {'weight': 1.0296667672332183}), (7, 8, {'weight': 2.0014384064255446}), (8, 9, {'weight': 2.2691612212058447})]
I used Python's random.paretovariate() to choose the weight but you can, of course, put whatever you want there.
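For a large graph, the weights can also be generated in one pass and attached with nx.set_edge_attributes. The sketch below assumes networkx 2.0 or later (argument order G, values, name) and an arbitrary seed. One detail worth checking against your own requirements: random.paretovariate(alpha) has a density that falls off as x**-(alpha + 1), so a density exponent of 2.0 corresponds to alpha = 1.0.

```python
import random

import networkx as nx

rng = random.Random(42)   # seeded for reproducibility (arbitrary seed)
exponent = 2.0            # target density exponent, p(w) ~ w**(-exponent)

G = nx.path_graph(10)

# One draw per edge; paretovariate(alpha) has density ~ x**-(alpha + 1).
weights = {(u, v): rng.paretovariate(exponent - 1) for u, v in G.edges()}
nx.set_edge_attributes(G, weights, "weight")   # networkx >= 2.0 argument order
```

Any other graph can be substituted for path_graph; the dictionary comprehension touches each edge exactly once.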
I tried the following; I hope it helps. I am still looking for better methods, as this does not ensure that I get a connected network, and I have yet to check its properties.
'''written by Aya Al-Zarka'''
import networkx as nx
import matplotlib.pyplot as plt
from networkx.utils import powerlaw_sequence
import random as r
import numpy as np
G=nx.Graph()
v=[]
for i in range(100):
    v.append(i)
G.add_nodes_from(v)
weight=[]
for j in range(300):
    l=powerlaw_sequence(300,exponent=2.0)
    weight.append(r.choice(l))
#print(weight)
e=[]
for k in range(300):
    f=[r.choice(v),r.choice(v),r.choice(weight)]
    e.append(f)
G.add_weighted_edges_from(e,weight='weight')
print(nx.is_connected(G)) #not always!
m=np.divide(weight,100.0)
pos=nx.random_layout(G,dim=2)
nx.draw_networkx_nodes(G,pos,nodelist=None,node_size=300,node_color='y',
node_shape='*', alpha=1.0, cmap=None, vmin=None,
vmax=None, ax=None, linewidths=None,)
nx.draw_networkx_edges(G,pos,edgelist=None,width=m,
edge_color='b',style='solid',alpha=None,edge_cmap=None, edge_vmin=None,
edge_vmax=None, ax=None, arrows=False)
plt.ylim(0,1)
plt.xlim(0,1)
plt.axis('off')
plt.show()