I am following a tutorial on NetworkX: https://www.datacamp.com/community/tutorials/networkx-python-graph-tutorial
This is the code:
import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
edgelist = pd.read_csv('https://gist.githubusercontent.com/brooksandrew/e570c38bcc72a8d102422f2af836513b/raw/89c76b2563dbc0e88384719a35cba0dfc04cd522/edgelist_sleeping_giant.csv')
nodelist = pd.read_csv('https://gist.githubusercontent.com/brooksandrew/f989e10af17fb4c85b11409fea47895b/raw/a3a8da0fa5b094f1ca9d82e1642b384889ae16e8/nodelist_sleeping_giant.csv')
g = nx.Graph()
## Add edges and edge attributes
for i, elrow in edgelist.iterrows():
    g.add_edge(elrow[0], elrow[1], attr_dict=elrow[2:].to_dict())
## Add nodes and node attributes
for i, nlrow in nodelist.iterrows():
    g.nodes[nlrow['id']].update(nlrow[1:].to_dict())
##Visualization
# Define node positions data structure (dict) for plotting
node_positions = {node[0]: (node[1]['X'], -node[1]['Y']) for node in g.nodes(data=True)}
# Define data structure of edge colors for plotting
edge_colors = [e[2]["color"] for e in g.edges(data=True)]
The last line gives me a KeyError: 'color', although the column in the provided data is called color, so it has nothing to do with case sensitivity.
You are missing the "attr_dict" key: the "color" key is nested inside it, because add_edge stored the whole dictionary as a single edge attribute named attr_dict.
edge_colors = [e[2]["attr_dict"]["color"] for e in g.edges(data=True)]
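The nesting comes from an API change: since NetworkX 2.0, add_edge no longer treats attr_dict as a special parameter, so any keyword argument, including attr_dict, is stored as an ordinary edge attribute. A minimal sketch of an alternative, assuming NetworkX 2.x or later, unpacks the dictionary into keyword arguments so that color becomes a top-level attribute. The small DataFrame is a made-up stand-in for the tutorial's edge list, not its real data.

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-in for the tutorial's edge list CSV (values are illustrative).
edgelist = pd.DataFrame({
    "node1": ["a", "b"],
    "node2": ["b", "c"],
    "color": ["red", "blue"],
    "distance": [0.3, 0.7],
})

g = nx.Graph()
for _, elrow in edgelist.iterrows():
    # Unpack the remaining columns so each one becomes its own edge attribute.
    g.add_edge(elrow.iloc[0], elrow.iloc[1], **elrow.iloc[2:].to_dict())

# "color" is now a direct key of each edge's data dictionary.
edge_colors = [data["color"] for _, _, data in g.edges(data=True)]
print(edge_colors)
```

With the attributes stored this way, the original line edge_colors = [e[2]["color"] for e in g.edges(data=True)] works unchanged.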
The nodes in a directed graph have Name, Age and Height as attributes. I want to plot the distribution of the three attributes. Is that possible?
I know that it is possible to get attributes this way:
name = nx.get_node_attributes(G, "Name")
age = nx.get_node_attributes(G, "Age")
height = nx.get_node_attributes(G, "Height")
But I don't really see how I can use those instead of G in the function below:
import networkx as nx
import matplotlib.pyplot as plt
def plot_degree_dist(G):
    degrees = [G.degree(n) for n in G.nodes()]
    plt.hist(degrees)
    plt.show()
plot_degree_dist(nx.gnp_random_graph(100, 0.5, directed=True))
Or is there a better way to plot the distribution of node attributes?
Seems like a perfectly reasonable way to me. I'm not aware of any more convenient method. To make the function more general, add an argument that takes the name of the attribute you'd like to plot.
Just know that nx.get_node_attributes() returns a dictionary keyed to the nodes. Since we're just plotting the distribution, we're only interested in the values and not the keys.
Here's a self-contained example following your lead:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
def plot_attribute_dist(G, attribute):
    # get_node_attributes returns {node: value}; only the values are plotted
    values = list(nx.get_node_attributes(G, attribute).values())
    plt.hist(values)
    plt.show()
attribute_name = 'Name'
G = nx.gnp_random_graph(100, 0.5, directed=True)
rng = np.random.default_rng(seed=42)
for node, data in G.nodes(data=True):
    data[attribute_name] = rng.normal()
plot_attribute_dist(G, attribute_name)
which outputs a histogram of the attribute values.
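To look at several attributes at once, one option is to collect the node data into a pandas DataFrame first. This is only a sketch: the random Age and Height values are made up for illustration, and it assumes every node carries the same attributes.

```python
import networkx as nx
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
G = nx.gnp_random_graph(20, 0.3, directed=True)
for node, data in G.nodes(data=True):
    # Illustrative values only; real data would come from your own graph.
    data["Age"] = int(rng.integers(18, 80))
    data["Height"] = float(rng.normal(170, 10))

# One row per node, one column per attribute.
attributes = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")
print(attributes.describe())

# attributes.hist() would then draw one histogram per numeric column
# (this needs matplotlib and is left out so the sketch runs headless).
```

Non-numeric attributes such as Name would need a different plot, for example a bar chart of value counts.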
I am importing CSV data into NetworkX but cannot get the grouping or the colours to work. When I include them I get a KeyError: 'group'. My CSVs are set up with the headers:
Nodes: 'name', 'group', 'nodesize'
Edges: 'source','target','value'
When I remove all code related to groups or colours, it produces a graph.
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
G = nx.Graph(day="Stackoverflow")
df_nodes = pd.read_csv('C:/Users/classic.123456/Documents/Nodes.csv')
df_edges = pd.read_csv('C:/Users/classic.123456/Documents/Edges.csv')
for index, row in df_nodes.iterrows():
    G.add_node(row['name'], group=row['group'], nodesize=row['nodesize'])
for index, row in df_edges.iterrows():
    G.add_weighted_edges_from([(row['source'], row['target'], row['value'])])
color_map = {1:'#f09494', 2:'#eebcbc', 3:'#72bbd0', 4:'#91f0a1', 5:'#629fff', 6:'#bcc2f2'}
plt.figure(figsize=(25,25))
options = {
    'edge_color': '#FFDEA2',
    'width': 1,
    'with_labels': True,
    'font_weight': 'regular',
}
# G.node was removed in NetworkX 2.4; G.nodes is the supported accessor
colors = [color_map[G.nodes[node]['group']] for node in G]
sizes = [G.nodes[node]['nodesize']*10 for node in G]
"""
Using the spring layout :
- k controls the distance between the nodes and varies between 0 and 1
- iterations is the number of times simulated annealing is run
default k=0.1 and iterations=50
"""
nx.draw(G, pos=nx.spring_layout(G), **options)
ax = plt.gca()
ax.collections[0].set_edgecolor("#555555")
plt.show()
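One possible cause of the KeyError, offered as an assumption because the CSV files are not shown: if the edge list mentions a name that does not appear in the node list, NetworkX creates that node implicitly when the edge is added, without a group or nodesize attribute. The sketch below uses small made-up DataFrames in place of the two CSV files, lists the nodes that lack a group, and falls back to a default colour and size for them (default_color and the fallback size of 1 are arbitrary choices).

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-ins for Nodes.csv and Edges.csv; note "c" only appears in the edges.
df_nodes = pd.DataFrame({"name": ["a", "b"], "group": [1, 2], "nodesize": [3, 5]})
df_edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"], "value": [1, 2]})

G = nx.Graph()
for _, row in df_nodes.iterrows():
    G.add_node(row["name"], group=row["group"], nodesize=row["nodesize"])
G.add_weighted_edges_from(
    df_edges[["source", "target", "value"]].itertuples(index=False)
)

# Nodes that were created implicitly by an edge have no "group" attribute.
missing = [node for node, data in G.nodes(data=True) if "group" not in data]
print(missing)

color_map = {1: "#f09494", 2: "#eebcbc"}
default_color = "#cccccc"  # arbitrary fallback for nodes without a group
colors = [color_map.get(G.nodes[n].get("group"), default_color) for n in G]
sizes = [G.nodes[n].get("nodesize", 1) * 10 for n in G]

# The lists only take effect if they are passed to the drawing call, e.g.:
# nx.draw(G, node_color=colors, node_size=sizes, with_labels=True)
```

Note also that the code in the question computes colors and sizes but never passes them to nx.draw, so they would not affect the figure even without the error.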
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'id_emp': [1,2,3,4,1],
'name_emp': ['x','y','z','w','x'],
'donnated_value':[1100,11000,500,300,1000],
'refound_value':[22000,22000,50000,450,90]
})
df['return_percentagem'] = 100 * df['refound_value']/df['donnated_value']
df['classification_roi'] = ''
def comunidade(i):
    if i < 50:
        return 'Bad Investment'
    elif i >=50 and i < 100:
        return 'Median Investment'
    elif i >= 100:
        return 'Good Investment'
df['classification_roi'] = df['return_percentagem'].map(comunidade)
df
I want to build a graph from this DataFrame. The nodes would be the 'id_emp' values. Two nodes are connected if they have the same 'id_emp' (even when their classifications in 'classification_roi' differ) or if they have the same classification in 'classification_roi'.
I do not have much practice with networkx and what I'm trying is far from ideal:
import networkx as nx
G = nx.from_pandas_edgelist(df, 'id_emp', 'return_percentagem')
nx.draw(G, with_labels=True)
Any help is welcome.
Here is an approach that does not use from_pandas_edgelist. Instead, it builds the graph with list comprehensions and for-loops:
import matplotlib.pyplot as plt
import networkx as nx
import itertools
G = nx.Graph()
# use index to name nodes, rather than id_emp, otherwise
# multiple nodes would end up having the same name
G.add_nodes_from([a for a in df.index])
#create edges:
#same employee edges
for ie in set(df['id_emp']):
    indices = df[df['id_emp']==ie].index
    # pair up distinct rows only, so no self-loops are created
    G.add_edges_from(itertools.combinations(indices, 2))
# same classification edges
for cr in set(df['classification_roi']):
    indices = df[df['classification_roi']==cr].index
    G.add_edges_from(itertools.combinations(indices, 2))
nx.draw(G)
plt.show()
Optional: colouring, to distinguish nodes.
plt.subplot(121)
plt.title('coloured by id_emp')
nx.draw(G, node_color=df['id_emp'], cmap='viridis')
plt.subplot(122)
color_mapping = {
    'Bad Investment': 0,
    'Median Investment': 1,
    'Good Investment': 2}
plt.title('coloured by classification_roi')
# map each classification to its numeric code for the colormap
nx.draw(G, node_color=df['classification_roi'].map(color_mapping), cmap='RdYlBu')
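As a cross-check of the edge rule, here is a compact sketch that applies the same idea with groupby and lists the resulting edges. The classification_roi values are written out by hand; they are what comunidade yields for the sample data above (rows 0 to 3 are 'Good Investment', row 4 is 'Bad Investment').

```python
import itertools

import networkx as nx
import pandas as pd

# Sample data from the question, with the classification column filled in by hand.
df = pd.DataFrame({
    "id_emp": [1, 2, 3, 4, 1],
    "classification_roi": ["Good Investment"] * 4 + ["Bad Investment"],
})

G = nx.Graph()
G.add_nodes_from(df.index)  # one node per row, named by row index

# Connect every pair of rows that share an id_emp or a classification.
for column in ("id_emp", "classification_roi"):
    for _, rows in df.groupby(column):
        G.add_edges_from(itertools.combinations(rows.index, 2))

print(sorted(G.edges()))
```

Rows 0 and 4 are linked because they share id_emp 1, and rows 0 to 3 are fully connected because they share a classification.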
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_node('abc#gmail.com')
nx.draw(G, with_labels=True)
plt.show()
The output figure labels the node with the full address. What I want is a simplified label instead.
I have thousands of email records from person#email.com to another#email.com in a CSV file. I use G.add_node(email_address) and G.add_edge(from, to) to build G. I want to keep the whole email address in graph G but display it as a simplified string.
networkx has a method called relabel_nodes that takes a graph (G), a mapping (the relabeling rules) and returns a new graph (new_G) with the nodes relabeled.
That said, in your case:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_node('abc#gmail.com')
mapping = {
'abc#gmail.com': 'abc'
}
relabeled_G = nx.relabel_nodes(G,mapping)
nx.draw(relabeled_G, with_labels=True)
plt.show()
That way you keep G intact and have simplified labels.
You can optionally modify the labels in place, without creating a new copy, in which case you'd simply call G = nx.relabel_nodes(G, mapping, copy=False).
If you don't know the email addresses beforehand, you can pass relabel_nodes a function, like so:
G = nx.relabel_nodes(G, lambda email: email.split("#")[0], copy=False)
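If the full address should remain the node identifier in G, another option is to leave the graph untouched and pass display labels to the drawing call. This is a sketch rather than part of the answer above: it relies on the labels keyword that nx.draw forwards to nx.draw_networkx, the helper name draw_with_short_labels is made up, and the second address is invented for illustration.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("abc#gmail.com", "xyz#example.com")  # second address is hypothetical

# Display label for each node: the part before the "#" separator.
labels = {node: node.split("#")[0] for node in G.nodes}
print(labels)


def draw_with_short_labels(graph, labels):
    """Draw the graph with shortened labels; node identifiers are unchanged."""
    import matplotlib.pyplot as plt

    nx.draw(graph, labels=labels, with_labels=True)
    plt.show()
```

Calling draw_with_short_labels(G, labels) shows the short names, while G.nodes still contains the full addresses.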
This is my code. I am plotting a randomly generated graph containing 8 nodes:
import networkx as nx
import matplotlib.pyplot as plt
import random
def draw_graph(graph):
    # extract nodes from graph
    nodes = set([n1 for n1, n2 in graph] + [n2 for n1, n2 in graph])
    # create networkx graph
    G=nx.Graph()
    # add nodes
    for node in nodes:
        G.add_node(node)
    # add edges
    for edge in graph:
        G.add_edge(edge[0], edge[1])
    # draw graph
    pos = nx.shell_layout(G)
    nx.draw(G, pos)
    # show graph
    plt.show()
##draw example
zero=random.choice([1,4])
one=random.choice([2,3])
two=random.choice([1,7])
three=random.choice([1,5])
four=random.choice([0,6])
five=random.choice([3,7])
seven=random.choice([2,5])
graph = [(0,zero),(1,one),(2,two),(3,three),(4,four),(5,five),(6,4),(7,seven)]
draw_graph(graph)
Now I have to label each node, and I need to know the position of each node. How can I do that? Thanks in advance.
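A minimal sketch of one way to get both, assuming a recent NetworkX. Layout functions such as nx.shell_layout return a dictionary mapping each node to its (x, y) coordinates, so the positions are available before anything is drawn, and passing with_labels=True to nx.draw prints each node's identifier on it. The edge list is a fixed example rather than the random one above, and draw_labelled is a made-up helper name.

```python
import networkx as nx

# Fixed example edges covering nodes 0..7 (the question draws them at random).
graph = [(0, 1), (1, 2), (2, 7), (3, 5), (4, 6), (5, 3), (6, 4), (7, 2)]
G = nx.Graph()
G.add_edges_from(graph)

# pos maps node -> array([x, y]); the same dict is reused for drawing.
pos = nx.shell_layout(G)
for node in sorted(pos):
    x, y = pos[node]
    print(f"node {node}: x={x:.3f}, y={y:.3f}")


def draw_labelled(graph, pos):
    """Draw the graph at the given positions with node identifiers as labels."""
    import matplotlib.pyplot as plt

    nx.draw(graph, pos, with_labels=True)
    plt.show()
```

In the original draw_graph function, this amounts to passing with_labels=True to nx.draw and returning pos to the caller.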