Initially, I have a 2D array. Using this array, I created a graph with weights on its edges. Now I am trying to use this graph to build a minimum spanning tree (MST) matrix, but I can't get the result I want. I am using the following code to make the graph.
G = nx.from_numpy_matrix(ED_Matrix, create_using=nx.DiGraph)
layout = nx.spring_layout(G)
sizes = len(ED_Matrix)
nx.draw(G, layout, with_labels=True, node_size=sizes)
labels = nx.get_edge_attributes(G, "weight")
output = nx.draw_networkx_edge_labels(G, pos=layout, edge_labels=labels)
plt.show()
It gives output like this:
Now I am using the MST code below to get the MST matrix, but it gives an error.
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
Tcsr = minimum_spanning_tree(G)
Tcsr.toarray().astype(int)
According to the example in the SciPy docs, the tree should be constructed from the adjacency matrix of G, not from G itself.
You might want to replace G with nx.adjacency_matrix(G), csr_matrix(nx.adjacency_matrix(G)), or ED_Matrix itself when computing Tcsr:
Tcsr = minimum_spanning_tree(nx.adjacency_matrix(G)) #or
Tcsr = minimum_spanning_tree(csr_matrix(nx.adjacency_matrix(G))) #or
Tcsr = minimum_spanning_tree(ED_Matrix)
Tcsr is a sparse matrix which is later converted to numpy array.
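As a minimal sketch of the point above, the snippet below passes a plain NumPy array to minimum_spanning_tree. The small weight matrix is a hypothetical stand-in for ED_Matrix (its values are for illustration only); a 0 entry is treated as "no edge".

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical stand-in for ED_Matrix: entry [i, j] is the weight of edge i-j,
# and 0 means "no edge".
ED_Matrix = np.array([
    [0, 8, 0, 3],
    [0, 0, 2, 5],
    [0, 0, 0, 6],
    [0, 0, 0, 0],
])

# minimum_spanning_tree works on a dense array or a sparse matrix,
# not on a networkx graph object.
Tcsr = minimum_spanning_tree(ED_Matrix)

# Convert the sparse result to a dense integer matrix holding only the kept edges.
mst = Tcsr.toarray().astype(int)
print(mst)
```

The nonzero entries of mst are the edges of the spanning tree, so their sum is the total tree weight.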
The following example is from: https://python-graph-gallery.com/404-dendrogram-with-heat-map/
It generates a dendrogram which, I assume, is based on scipy.
# Libraries
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
url = 'https://python-graph-gallery.com/wp-content/uploads/mtcars.csv'
df = pd.read_csv(url)
df = df.set_index('model')
df.index.name = None  # 'del df.index.name' raises an error in recent pandas versions
df
# Default plot
sns.clustermap(df)
Question: How can one get the dendrogram in non-graphical form?
Background information:
From the root of that dendrogram I want to cut it at the largest length. For example we have one edge from the root to a left cluster (L) and an edge to a right cluster (R) ...from those two I'd like to get their edge lengths and cut the whole dendrogram at the longest of these two edges.
Best regards
clustermap returns a handle to the ClusterGrid object, which includes child objects for each dendrogram,
h.dendrogram_col and h.dendrogram_row.
Inside these are the dendrograms themselves, which provide the dendrogram geometry
as per the scipy.cluster.hierarchy.dendrogram return data, from which you could compute
the length of a specific branch.
import numpy as np
h = sns.clustermap(df)
dgram = h.dendrogram_col.dendrogram
D = np.array(dgram['dcoord'])
I = np.array(dgram['icoord'])
# then the root node will be the last entry, and the length of the L/R branches will be
yy = D[-1]
lenL = yy[1]-yy[0]
lenR = yy[2]-yy[3]
The linkage matrix, the input used to compute the dendrogram, might also help:
h.dendrogram_col.linkage
h.dendrogram_row.linkage
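The branch lengths at the root can also be read from a linkage matrix directly. The sketch below is only an illustration under stated assumptions: it builds its own linkage from hypothetical toy data with scipy (average linkage is assumed here) rather than taking it from the clustermap, and node_height is a hypothetical helper name. The first child in a linkage row is not guaranteed to be drawn on the left, since that depends on the dendrogram ordering options.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(5.0, 0.1, size=(5, 2))])

# Same kind of array as h.dendrogram_row.linkage: one row per merge,
# columns are (child 1, child 2, merge height, cluster size).
Z = linkage(X, method="average")
n = Z.shape[0] + 1  # number of original observations

def node_height(Z, idx, n):
    """Height at which cluster idx was formed (0 for an original observation)."""
    idx = int(idx)
    return 0.0 if idx < n else Z[idx - n, 2]

# The last row of the linkage matrix is the root merge.
child_a, child_b, root_height = Z[-1, 0], Z[-1, 1], Z[-1, 2]
len_a = root_height - node_height(Z, child_a, n)
len_b = root_height - node_height(Z, child_b, n)

# One possible cut: split the tree into the two clusters below the root.
labels = fcluster(Z, t=2, criterion="maxclust")
```

Comparing len_a and len_b tells you which of the two root edges is longer; fcluster with criterion="distance" and a threshold of your choice would cut at a specific height instead.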
I'm using NetworkX in python. Given any undirected and unweighted graph, I want to loop through all the nodes. With each node, I want to add a random edge and/or delete an existing random edge for that node with probability p. Is there a simple way to do this? Thanks a lot!
Create a new random edge in networkx
Let's set up a test graph:
import networkx as nx
import random
import matplotlib.pyplot as plt
graph = nx.Graph()
graph.add_edges_from([(1,3), (3,5), (2,4)])
nx.draw(graph, with_labels=True)
plt.show()
Now we can pick a random edge from the list of non-edges of the graph. It is not yet entirely clear which probability you mean. Since you added a comment stating that you want to use random.choice, I'll stick to that.
def random_edge(graph, del_orig=True):
    '''
    Create a new random edge and delete one of the current edges if del_orig is True.
    :param graph: networkx graph
    :param del_orig: bool
    :return: networkx graph
    '''
    edges = list(graph.edges)
    nonedges = list(nx.non_edges(graph))
    # random edge choice
    chosen_edge = random.choice(edges)
    # pick a non-edge touching the first node of the chosen edge; non_edges lists
    # each pair only once, so that node may appear in either position
    # (random.choice raises IndexError if the node is already connected to all others)
    chosen_nonedge = random.choice([x for x in nonedges if chosen_edge[0] in x])
    if del_orig:
        # delete chosen edge
        graph.remove_edge(chosen_edge[0], chosen_edge[1])
    # add new edge
    graph.add_edge(chosen_nonedge[0], chosen_nonedge[1])
    return graph
Usage example:
new_graph = random_edge(graph, del_orig=True)
nx.draw(new_graph, with_labels=True)
plt.show()
We can still apply a probability distribution over the edges if you need to (using numpy.random.choice() instead of random.choice, for instance).
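A weighted choice could look roughly like the sketch below. The weighting scheme (favouring non-edges whose endpoints have a high combined degree) is a hypothetical choice made only for illustration; any non-negative weights that sum to a positive value would do.

```python
import networkx as nx
import numpy as np

graph = nx.Graph()
graph.add_edges_from([(1, 3), (3, 5), (2, 4)])

nonedges = list(nx.non_edges(graph))

# Hypothetical weighting: combined degree of the two endpoints of each non-edge.
weights = np.array([graph.degree(u) + graph.degree(v) for u, v in nonedges],
                   dtype=float)
probs = weights / weights.sum()  # normalise so the probabilities sum to 1

# Draw an index rather than the tuple itself, since numpy would otherwise
# treat the list of tuples as a 2D array.
rng = np.random.default_rng(42)
idx = rng.choice(len(nonedges), p=probs)
u, v = nonedges[idx]
graph.add_edge(u, v)
```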
Given a node i, to add edges without duplication you need to know (1) which edges from i already exist, and then compute (2) the set of candidate edges from i that don't exist yet. For removals, you already defined a method in the comment, which is based simply on (1).
Here is a function that provides one round of randomised addition and removal, based on list comprehensions:
def add_and_remove_edges(G, p_new_connection, p_remove_connection):
    '''
    for each node,
      add a new connection to random other node, with prob p_new_connection,
      remove a connection, with prob p_remove_connection
    operates on G in-place
    '''
    new_edges = []
    rem_edges = []
    for node in G.nodes():
        # find the other nodes this one is connected to
        connected = [to for (fr, to) in G.edges(node)]
        # and find the remainder of nodes, which are candidates for new edges
        # (the node itself is excluded to avoid self-loops)
        unconnected = [n for n in G.nodes() if n not in connected and n != node]
        # probabilistically add a random edge
        if len(unconnected):  # only try if new edge is possible
            if random.random() < p_new_connection:
                new = random.choice(unconnected)
                G.add_edge(node, new)
                print("\tnew edge:\t {} -- {}".format(node, new))
                new_edges.append((node, new))
                # book-keeping, in case both add and remove done in same cycle
                unconnected.remove(new)
                connected.append(new)
        # probabilistically remove a random edge
        if len(connected):  # only try if an edge exists to remove
            if random.random() < p_remove_connection:
                remove = random.choice(connected)
                G.remove_edge(node, remove)
                print("\tedge removed:\t {} -- {}".format(node, remove))
                rem_edges.append((node, remove))
                # book-keeping, in case lists are important later?
                connected.remove(remove)
                unconnected.append(remove)
    return rem_edges, new_edges
To see this function in action:
import networkx as nx
import random
import matplotlib.pyplot as plt
p_new_connection = 0.1
p_remove_connection = 0.1
G = nx.karate_club_graph() # sample graph (undirected, unweighted)
# show original
plt.figure(1); plt.clf()
fig, ax = plt.subplots(2,1, num=1, sharex=True, sharey=True)
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos=pos, ax=ax[0])
# now apply one round of changes
rem_edges, new_edges = add_and_remove_edges(G, p_new_connection, p_remove_connection)
# and draw new version and highlight changes
nx.draw_networkx(G, pos=pos, ax=ax[1])
nx.draw_networkx_edges(G, pos=pos, ax=ax[1], edgelist=new_edges,
                       edge_color='b', width=4)
# note: to highlight edges that were removed, add them back in;
# This is obviously just for display!
G.add_edges_from(rem_edges)
nx.draw_networkx_edges(G, pos=pos, ax=ax[1], edgelist=rem_edges,
                       edge_color='r', style='dashed', width=4)
G.remove_edges_from(rem_edges)
plt.show()
And you should see something like this.
Note that you could also do something similar with the adjacency matrix,
A = nx.adjacency_matrix(G).todense() (it's a numpy matrix so operations like A[i,:].nonzero() would be relevant). This might be more efficient if you have extremely large networks.
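A rough sketch of the matrix-based lookup follows. It uses nx.to_numpy_array (a dense array rather than the matrix type mentioned above), which is a substitution made for this illustration; the variable names are hypothetical.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
nodes = list(G.nodes())

# Row i of A describes the connections of nodes[i]; nonzero means "edge exists".
A = nx.to_numpy_array(G, nodelist=nodes)

i = 0  # position of the node of interest in `nodes`
connected = np.flatnonzero(A[i])          # indices of current neighbours
unconnected = np.flatnonzero(A[i] == 0)   # indices with no edge to node i
unconnected = unconnected[unconnected != i]  # drop the node itself (no self-loop)

# Pick one candidate for a new edge and map the index back to a node label.
rng = np.random.default_rng(0)
new_neighbour = nodes[rng.choice(unconnected)]
```

The index arrays play the same role as the connected and unconnected lists in the function above.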
I'm trying to represent some numbers as edges of a graph with connected components. For this, I've been using python's networkX module.
My graph is G, and has nodes and edges initialised as follows:
G = nx.Graph()
for (x,y) in my_set:
G.add_edge(x,y)
print(G.nodes()) #This prints all the nodes
print(G.edges()) #Prints all the edges as tuples
adj_matrix = nx.to_numpy_matrix(G)
Once I add the following line,
pos = nx.spring_layout(adj_matrix)
I get the abovementioned error.
If it might be useful, all the nodes are numbered in 9-15 digits. There are 412 nodes and 422 edges.
Detailed error:
File "pyjson.py", line 89, in <module>
mainevent()
File "pyjson.py", line 60, in mainevent
pos = nx.spring_layout(adj_matrix)
File "/usr/local/lib/python2.7/dist-packages/networkx/drawing/layout.py", line 244, in fruchterman_reingold_layout
A=nx.to_numpy_matrix(G,weight=weight)
File "/usr/local/lib/python2.7/dist-packages/networkx/convert_matrix.py", line 128, in to_numpy_matrix
nodelist = G.nodes()
AttributeError: 'matrix' object has no attribute 'nodes'
Edit: Solved below. Useful information: pos creates a dict with coordinates for each node. Doing nx.draw(G,pos) creates a pylab figure. But it doesn't display it, because pylab doesn't display automatically.
(Some of this answer addresses things in your comments. Can you add those to your question so that later users get more context?)
pos creates a dict with coordinates for each node. Doing nx.draw(G,pos) creates a pylab figure. But it doesn't display it, because pylab doesn't display automatically.
import networkx as nx
import pylab as py
G = nx.Graph()
for (x,y) in my_set:
G.add_edge(x,y)
print(G.nodes()) #This prints all the nodes
print(G.edges()) #Prints all the edges as tuples
pos = nx.spring_layout(G)
nx.draw(G,pos)
py.show() # or py.savefig('graph.pdf') if you want to create a pdf,
# similarly for png or other file types
The final py.show() will display it. py.savefig('filename.extension') will save as any of a number of filetypes based on what you use for extension.
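If only an adjacency matrix is at hand, one option is to convert it back into a graph before asking for a layout. The sketch below assumes a small hypothetical matrix and uses nx.from_numpy_array; the seed argument is only there to make the layout repeatable.

```python
import networkx as nx
import numpy as np

# Hypothetical 3-node adjacency matrix (a simple path 0 - 1 - 2).
adj_matrix = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Layout functions expect a graph, so rebuild one from the matrix first.
G = nx.from_numpy_array(adj_matrix)

# pos maps each node to an (x, y) coordinate pair.
pos = nx.spring_layout(G, seed=1)
```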
spring_layout takes a networkx graph as its first parameter, not a numpy array. What it returns are the positions of the nodes according to the Fruchterman-Reingold force-directed algorithm.
So you need to pass this to draw, for example:
import networkx as nx
%matplotlib inline
G=nx.lollipop_graph(14, 3)
nx.draw(G,nx.spring_layout(G))
yields:
I have an adjacency matrix stored as a pandas.DataFrame:
node_names = ['A', 'B', 'C']
a = pd.DataFrame([[1,2,3],[3,1,1],[4,0,2]],
index=node_names, columns=node_names)
a_numpy = a.as_matrix()
I'd like to create an igraph.Graph from either the pandas or the numpy adjacency matrices. In an ideal world the nodes would be named as expected.
Is this possible? The tutorial seems to be silent on the issue.
In igraph you can use igraph.Graph.Adjacency to create a graph from an adjacency matrix without having to use zip. There are some things to be aware of when a weighted adjacency matrix is used and stored in an np.array or pd.DataFrame:
igraph.Graph.Adjacency can't take an np.array as an argument, but that is easily solved using tolist.
Integers in the adjacency matrix are interpreted as the number of edges between nodes rather than as weights; this is solved by passing the adjacency as boolean.
An example of how to do it:
import igraph
import pandas as pd
node_names = ['A', 'B', 'C']
a = pd.DataFrame([[1,2,3],[3,1,1],[4,0,2]], index=node_names, columns=node_names)
# Get the values as np.array, it's more convenient.
A = a.values
# Create graph, A.astype(bool).tolist() or (A / A).tolist() can also be used.
g = igraph.Graph.Adjacency((A > 0).tolist())
# Add edge weights and node labels.
g.es['weight'] = A[A.nonzero()]
g.vs['label'] = node_names # or a.index/a.columns
You can reconstruct your adjacency dataframe using get_adjacency by:
df_from_g = pd.DataFrame(g.get_adjacency(attribute='weight').data,
columns=g.vs['label'], index=g.vs['label'])
(df_from_g == a).all().all() # --> True
Strictly speaking, an adjacency matrix is boolean, with 1 indicating the presence of a connection and 0 indicating the absence. Since many of the values in your a_numpy matrix are > 1, I will assume that they correspond to edge weights in your graph.
import igraph
import numpy as np
# get the row, col indices of the non-zero elements in your adjacency matrix
conn_indices = np.where(a_numpy)
# get the weights corresponding to these indices
weights = a_numpy[conn_indices]
# a sequence of (i, j) tuples, each corresponding to an edge from i -> j
# (list() is needed on Python 3, where zip returns an iterator)
edges = list(zip(*conn_indices))
# initialize the graph from the edge sequence
G = igraph.Graph(edges=edges, directed=True)
# assign node names and weights to be attributes of the vertices and edges
# respectively
G.vs['label'] = node_names
G.es['weight'] = weights
# I will also assign the weights to the 'width' attribute of the edges. this
# means that igraph.plot will set the line thicknesses according to the edge
# weights
G.es['width'] = weights
# plot the graph, just for fun
igraph.plot(G, layout="rt", labels=True, margin=80)
This is possible with igraph.Graph.Weighted_Adjacency:
g = igraph.Graph.Weighted_Adjacency(a.to_numpy().tolist())
pandas.DataFrame.as_matrix has been deprecated,
so pandas.DataFrame.to_numpy should be used instead.
Additionally the numpy.ndarray given by a.to_numpy() must be converted to a list with tolist() before being passed to Weighted_Adjacency.
The node names can be stored as another attribute with
g.vs['name'] = node_names
I imported my Facebook data onto my computer in the form of a .json file. The data is in the format:
{"nodes":[{"name":"Alan"},{"name":"Bob"}],"links":[{"source":0,"target":1}]}
Then, I use this function:
import json
import networkx as nx

def parse_graph(filename):
    """
    Returns networkx graph object of facebook
    social network in json format
    """
    G = nx.Graph()
    # the with-block closes the file even if parsing fails
    with open(filename) as json_data:
        data = json.load(json_data)
    # The nodes represent the names of the respective people
    # See networkx documentation for information on add_* functions
    G.add_nodes_from([n['name'] for n in data['nodes']])
    G.add_edges_from([(data['nodes'][e['source']]['name'], data['nodes'][e['target']]['name']) for e in data['links']])
    return G
to turn this .json file into a graph in NetworkX. To find the degree of the nodes, the only method I know how to use is:
degree = nx.degree(p)
Where p is the graph of all my friends. Now, I want to plot the graph such that the size of the node is the same as the degree of that node. How do I do this?
Using:
nx.draw(G,node_size=degree)
didn't work and I can't think of another method.
Update for those using networkx 2.x
The API has changed from v1.x to v2.x. networkx.degree no longer returns a dict but a DegreeView object, as per the documentation.
The NetworkX documentation includes a guide for migrating from 1.x to 2.x.
In this case it basically boils down to using dict(g.degree) instead of d = nx.degree(g).
The updated code looks like this:
import networkx as nx
import matplotlib.pyplot as plt
g = nx.Graph()
g.add_edges_from([(1,2), (2,3), (2,4), (3,4)])
d = dict(g.degree)
nx.draw(g, nodelist=d.keys(), node_size=[v * 100 for v in d.values()])
plt.show()
In networkx 1.x, nx.degree(p) returns a dict, while the node_size keyword argument needs a scalar or an array of sizes. You can use the dict nx.degree returns like this:
import networkx as nx
import matplotlib.pyplot as plt
g = nx.Graph()
g.add_edges_from([(1,2), (2,3), (2,4), (3,4)])
d = nx.degree(g)
nx.draw(g, nodelist=d.keys(), node_size=[v * 100 for v in d.values()])
plt.show()
@miles82 provided a great answer. However, if you've already added the nodes to your graph using something like G.add_nodes_from(nodes), then I found that d = nx.degree(G) may not return the degrees in the same order as your nodes.
Building on the previous answer, you can modify the solution slightly to ensure the degrees are in the correct order:
d = nx.degree(G)
d = [(d[node]+1) * 20 for node in G.nodes()]
Note the d[node]+1, which ensures that nodes of degree zero are still drawn on the chart.
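A small self-contained sketch of this ordering-safe approach is shown below. The graph and the scaling factor of 100 are hypothetical, and G.degree(n) is used for the per-node lookup.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_nodes_from(["isolated", "a", "b", "c"])  # nodes added before any edges
G.add_edges_from([("a", "b"), ("b", "c")])

# Build sizes in the same order as the node list handed to nx.draw.
nodes = list(G.nodes())
# Hypothetical scaling: +1 keeps degree-zero nodes visible, 100 is an arbitrary factor.
sizes = [(G.degree(n) + 1) * 100 for n in nodes]

nx.draw(G, nodelist=nodes, node_size=sizes, with_labels=True)
# plt.show() would display the figure in an interactive session
```

Passing the same nodes list as nodelist keeps each size matched to its node regardless of how the graph was built.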
Another method, if you still get the error 'DiDegreeView' object has no attribute 'keys':
1) First get the degree of each node as a list of tuples.
2) Build a node list from the first value of each tuple and a degree list from the second value.
3) Finally, draw the network with the node list and degree list you've created.
Here's the code:
list_degree = list(G.degree())  # a list of (node, degree) tuples
nodes, degree = map(list, zip(*list_degree))  # build a node list and the corresponding degree list
plt.figure(figsize=(20,10))
nx.draw(G, nodelist=nodes, node_size=[(v * 5)+1 for v in degree])
plt.show()  # plot the graph