Plot distribution of node attributes networkx

The nodes in a directed graph have Name, Age and Height as attributes. I want to plot the distribution of each of the three attributes. Is that possible?
I know that it is possible to get attributes this way:
name = nx.get_node_attributes(G, "Name")
age = nx.get_node_attributes(G, "Age")
height = nx.get_node_attributes(G, "Height")
But I don't really see how I can use those instead of G in the function below:
import networkx as nx
import matplotlib.pyplot as plt
def plot_degree_dist(G):
    degrees = [G.degree(n) for n in G.nodes()]
    plt.hist(degrees)
    plt.show()
plot_degree_dist(nx.gnp_random_graph(100, 0.5, directed=True))
Or is there a better way to plot the distribution of node attributes?

Seems like a perfectly reasonable way to me. I'm not aware of any more convenient method. To be more generalizable, add an argument to your function that takes the name of the attribute you'd like to plot.
Just know that nx.get_node_attributes() returns a dictionary keyed to the nodes. Since we're just plotting the distribution, we're only interested in the values and not the keys.
Here's a self-contained example following your lead:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
def plot_attribute_dist(G, attribute):
    # get_node_attributes returns {node: value}; only the values are plotted
    values = list(nx.get_node_attributes(G, attribute).values())
    plt.hist(values)
    plt.show()
attribute_name = 'Name'
G = nx.gnp_random_graph(100, 0.5, directed=True)
rng = np.random.default_rng(seed=42)
# Give every node a random, normally distributed value for the attribute
for node, data in G.nodes(data=True):
    data[attribute_name] = rng.normal()
plot_attribute_dist(G, attribute_name)
which outputs a histogram of the attribute values.
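Since the question mentions three attributes, and Name is presumably text rather than a number, here is a rough sketch that draws one panel per attribute and falls back to a bar chart of counts for non-numeric values. The helper names and the example data (names, ages, heights) are made up for illustration and are not part of the original answer.

```python
from collections import Counter

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


def attribute_values(G, attribute):
    """Return the values of one node attribute as a list (nodes lacking it are skipped)."""
    return list(nx.get_node_attributes(G, attribute).values())


def plot_attribute_dists(G, attributes):
    """Draw one panel per attribute: histogram if numeric, bar chart of counts otherwise."""
    fig, axes = plt.subplots(1, len(attributes),
                             figsize=(4 * len(attributes), 3), squeeze=False)
    for ax, attribute in zip(axes[0], attributes):
        values = attribute_values(G, attribute)
        if all(isinstance(v, (int, float)) for v in values):
            ax.hist(values)
        else:
            counts = Counter(values)
            ax.bar(list(counts.keys()), list(counts.values()))
        ax.set_title(attribute)
    return fig


# Made-up example data, only for illustration
G = nx.gnp_random_graph(100, 0.5, seed=1, directed=True)
rng = np.random.default_rng(seed=42)
for node, data in G.nodes(data=True):
    data["Name"] = str(rng.choice(["Ann", "Bob", "Cy"]))
    data["Age"] = int(rng.integers(18, 80))
    data["Height"] = float(rng.normal(170, 10))

if __name__ == "__main__":
    plot_attribute_dists(G, ["Name", "Age", "Height"])
    plt.show()
```

Converting the dictionary values to a list first keeps the plotting code independent of how matplotlib treats dictionary views.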

Related

Bipartite graph in NetworkX for LARGE amount of nodes

I am trying to draw a bipartite graph of certain nodes. For small numbers of nodes it looks perfectly fine:
Image for around 30 nodes
Unfortunately, this isn't the case for more nodes like this one:
Image for more nodes
My code for determining the position of each node looks something like this:
pos = {}
pos[SOURCE_STRING] = (0, width/2)
row = 0
for arr in left_side.keys():
    pos[str(arr).replace(" ","")]=(NODE_SIZE, row)
    row += NODE_SIZE
row = 0
for arr in right_side.keys():
    pos[str(arr).replace(" ","")]=(2*NODE_SIZE,row)
    row += NODE_SIZE
pos[SINK_STRING] = (3*NODE_SIZE, width/2)
return pos
And then I feed it to the DiGraph class:
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges, len=1)
nx.draw(G, pos=pos, node_shape="s", with_labels=True, node_size=NODE_SIZE)
This doesn't make much sense to me: the nodes should all be the same distance apart, since NODE_SIZE is constant and doesn't change for the rest of the program.
Following this thread didn't help me either:
Bipartite graph in NetworkX
Can something be done about this?
Edit (following Paul Brodersen's advice to use netgraph):
I used this documentation: netgraph doc
And still got much the same results, such as:
netgraph try
I tried different edges and positions and also played with the node size, with no success.
Code:
netgraph.Graph(edges, node_layout='bipartite', node_labels=True)
plt.show()

In your netgraph call, you are not changing the node size.
My suggestion with 30 nodes:
import numpy as np
import matplotlib.pyplot as plt
from netgraph import Graph
edges = np.vstack([np.random.randint(0, 15, 60),
                   np.random.randint(16, 30, 60)]).T
Graph(edges, node_layout='bipartite', node_size=0.5, node_labels=True, node_label_offset=0.1, edge_width=0.1)
plt.show()
With 100 nodes:
import numpy as np
import matplotlib.pyplot as plt
from netgraph import Graph
edges = np.vstack([np.random.randint(0, 50, 200),
                   np.random.randint(51, 100, 200)]).T
Graph(edges, node_layout='bipartite', node_size=0.5, node_labels=True, node_label_offset=0.1, edge_width=0.1)
plt.show()
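For the plain NetworkX drawing in the question, it may help to know that node_size in nx.draw is a marker area in points squared (as in matplotlib's scatter), not a length in data coordinates. Spacing positions by NODE_SIZE therefore does not keep the squares apart once there are many rows. One possible workaround, sketched below under my own assumptions, is to let nx.bipartite_layout place the two columns and grow the figure height with the number of rows. The test graph, the draw_bipartite helper and its inches_per_row parameter are hypothetical.

```python
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms import bipartite

# Made-up test graph: nodes 0..49 form the left side, 50..99 the right side
G = bipartite.random_graph(50, 50, 0.05, seed=0)
left = list(range(50))

# Two vertical columns, one per side
pos = nx.bipartite_layout(G, left)


def draw_bipartite(G, pos, n_rows, inches_per_row=0.3, node_size=200):
    """Draw G on a figure whose height grows with the number of rows."""
    fig, ax = plt.subplots(figsize=(6, max(3, inches_per_row * n_rows)))
    nx.draw(G, pos=pos, ax=ax, node_shape="s", node_size=node_size,
            with_labels=True, font_size=7)
    return fig


if __name__ == "__main__":
    draw_bipartite(G, pos, n_rows=50)
    plt.show()
```

A node_size of 200 corresponds to a square roughly 14 points (about 0.2 inch) wide, so 0.3 inch per row leaves a small gap between neighbours.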

KeyError: 'color' in networkX

I am doing a tutorial on NetworkX: https://www.datacamp.com/community/tutorials/networkx-python-graph-tutorial
This is the code:
import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
edgelist = pd.read_csv('https://gist.githubusercontent.com/brooksandrew/e570c38bcc72a8d102422f2af836513b/raw/89c76b2563dbc0e88384719a35cba0dfc04cd522/edgelist_sleeping_giant.csv')
nodelist = pd.read_csv('https://gist.githubusercontent.com/brooksandrew/f989e10af17fb4c85b11409fea47895b/raw/a3a8da0fa5b094f1ca9d82e1642b384889ae16e8/nodelist_sleeping_giant.csv')
g = nx.Graph()
## Add edges and edge attributes
for i, elrow in edgelist.iterrows():
    g.add_edge(elrow[0], elrow[1], attr_dict=elrow[2:].to_dict())
## Add nodes and node attributes
for i, nlrow in nodelist.iterrows():
    g.nodes[nlrow['id']].update(nlrow[1:].to_dict())
## Visualization
# Define node positions data structure (dict) for plotting
node_positions = {node[0]: (node[1]['X'], -node[1]['Y']) for node in g.nodes(data=True)}
# Define data structure of edge colors for plotting
edge_colors = [e[2]["color"] for e in g.edges(data=True)]
This gives me a KeyError: 'color', although the column in the data provided is called color, so it has nothing to do with case sensitivity.

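You are missing the "attr_dict" key. Since NetworkX 2.0, add_edge no longer treats attr_dict as a special parameter, so the whole dictionary is stored as a single edge attribute named "attr_dict" and the "color" key is nested inside it.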
edge_colors = [e[2]["attr_dict"]["color"] for e in g.edges(data=True)]
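An alternative that keeps the tutorial's edge_colors line unchanged is to unpack the dictionary into keyword arguments when adding the edge, so each column becomes its own edge attribute. In the question's loop that would be g.add_edge(elrow[0], elrow[1], **elrow[2:].to_dict()). A minimal sketch with made-up rows standing in for the CSV data:

```python
import networkx as nx

# Hypothetical rows standing in for the tutorial's edge list
rows = [
    ("a", "b", {"color": "red", "distance": 0.3}),
    ("b", "c", {"color": "blue", "distance": 0.5}),
]

g = nx.Graph()
for u, v, attrs in rows:
    # ** unpacks the dict, so "color" and "distance" become top-level edge attributes
    g.add_edge(u, v, **attrs)

edge_colors = [data["color"] for _, _, data in g.edges(data=True)]
print(edge_colors)
```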

Simplify networkx node labels

%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_node('abc#gmail.com')
nx.draw(G, with_labels=True)
plt.show()
The output figure is
What I want is
I have thousands of email records from person#email.com to another#email.com in a csv file. I use G.add_node(email_address) and G.add_edge(from, to) to build G. I want to keep the whole email address in graph G but display it as a simplified string.

networkx has a method called relabel_nodes that takes a graph (G), a mapping (the relabeling rules) and returns a new graph (new_G) with the nodes relabeled.
That said, in your case:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_node('abc#gmail.com')
mapping = {
'abc#gmail.com': 'abc'
}
relabeled_G = nx.relabel_nodes(G,mapping)
nx.draw(relabeled_G, with_labels=True)
plt.show()
That way you keep G intact and have simplified labels.
You can optionally modify the labels in place, without having a new copy, in which case you'd simply call G = nx.relabel_nodes(G, mapping, copy=False)
If you don't know the email addresses beforehand, you can pass relabel_nodes a function, like so:
G = nx.relabel_nodes(G, lambda email: email.split("#")[0], copy=False)
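If the goal is only to change what is displayed, another option is to leave the node names alone and pass a labels dictionary to nx.draw, which forwards it to draw_networkx. A small sketch, where the second address is made up for illustration:

```python
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_edge("abc#gmail.com", "xyz#example.com")  # second address is hypothetical

# Display labels only: G still stores the full addresses as node names
labels = {node: node.split("#")[0] for node in G.nodes()}

if __name__ == "__main__":
    nx.draw(G, labels=labels, with_labels=True)
    plt.show()
```

This avoids creating a second graph, and two addresses that share the same local part stay distinct nodes even though they are drawn with the same label.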

How to set random_state in networkx graph with holoviews/bokeh?

I would like to generate reproducible plots. With networkx it is possible to pass the random state to the layout, which ensures the plot is the same every time. When doing the same with holoviews I am getting an error.
%pylab inline
import pandas as pd
import networkx as nx
import holoviews as hv
# generating the graph
G = nx.Graph()
ndxs = [1, 2, 3, 4]
G.add_nodes_from(ndxs)
G.add_weighted_edges_from([(1,2,0), (1,3,1), (1,4,-1),
                           (2,4,1), (2,3,-1), (3,4,10)])
# drawing with networkx
nx.draw(G, nx.spring_layout(G, random_state=100))
# drawing with holoviews/bokeh
hv.extension('bokeh')
%opts Graph [width=400 height=400]
layout = nx.layout.spring_layout(G, random_state=100)
hv.Graph.from_networkx(G, layout)
>>> TypeError: 'dict' object is not callable

The first issue is that the Graph.from_networkx method accepts the layout function, not the dictionary that is output by that function. If you want to pass arguments to the function, you can do so as keyword arguments, e.g.:
hv.Graph.from_networkx(G, nx.layout.spring_layout, random_state=42)
In my version of networkx, random_state is not an accepted argument to the layout functions, in which case you can set the seed directly with NumPy:
np.random.seed(42)
hv.Graph.from_networkx(G, nx.layout.spring_layout)
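In the NetworkX releases I am familiar with, the layout functions take a seed keyword rather than random_state. Assuming such a version, the sketch below checks that two spring_layout calls with the same seed give the same positions. It uses the question's edges without the weights. If keyword arguments are forwarded as described above, the same seed could presumably be passed as hv.Graph.from_networkx(G, nx.spring_layout, seed=100), though I have not verified that against every holoviews version.

```python
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (2, 4), (2, 3), (3, 4)])

# Same seed, two independent layout runs
pos_a = nx.spring_layout(G, seed=100)
pos_b = nx.spring_layout(G, seed=100)

# Compare the coordinates node by node
same = all(np.allclose(pos_a[n], pos_b[n]) for n in G.nodes())
print(same)
```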

Python: Graph using NetworkX and mplleaflet

I have a networkx graph created from edges such as these:
user_id,edges
11011,"[[340, 269], [269, 340]]"
80973,"[[398, 279]]"
608473,"[[69, 28]]"
2139671,"[[382, 27], [27, 285]]"
3945641,"[[120, 422], [422, 217], [217, 340], [340, 340]]"
5820642,"[[458, 442]]"
Example
Where the edges are a user's movements between clusters, identified by their cluster label, e.g., [[340, 269], [269, 340]]. This represents a user's movement from cluster 340 to cluster 269 and then back to cluster 340. These clusters have coordinates, stored in another file, in the form of latitude and longitude, such as these:
cluster_label,latitude,longitude
0,39.18193382,-77.51885109
1,39.18,-77.27
2,39.17917928,-76.6688633
3,39.1782,-77.2617
4,39.1765,-77.1927
Is it possible to link the edges of my graph to their respective cluster in physical space using the node/cluster's lat/long and not in the abstract space of a graph? If so, how might I go about doing so? I would like to graph this on a map using a package such as mplleaflet (like shown here: http://htmlpreview.github.io/?https://github.com/jwass/mplleaflet/master/examples/readme_example.html) or directly into QGIS/ArcMap.
EDIT
I'm attempting to convert my csv with cluster centroid coordinates into a dictionary; however, I've run into several errors, mainly NetworkXError: Node 0 has no position and IndexError: too many indices for array. Below is how I'm trying to convert to a dict and then graph with mplleaflet.
import csv
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import time
import mplleaflet
g = nx.Graph()
# Set node positions as a dictionary
df = pd.read_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_centroids.csv', delimiter=',')
df.set_index('cluster_label', inplace=True)
dict_pos = df.to_dict(orient='index')
#print dict_pos
for row in csv.reader(open(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_edges.csv', 'r')):
    if '[' in row[1]:  # skip the header row, which has no edge list
        g.add_edges_from(eval(row[1]))
# Plotting with matplotlib
#nx.draw(g, with_labels=True, alpha=0.15, arrows=True, linewidths=0.01, edge_color='r', node_size=250, node_color='k')
#plt.show()
# Plotting with mplleaflet
fig, ax = plt.subplots()
nx.draw_networkx_nodes(g,pos=dict_pos,node_size=10)
nx.draw_networkx_edges(g,pos=dict_pos,edge_color='gray', alpha=.1)
nx.draw_networkx_labels(g,dict_pos, label_pos =10.3)
mplleaflet.display(fig=ax.figure)

Yes, this is quite easy. Try something along these lines.
Create a dictionary where the node (cluster_label) is the key and the longitude and latitude are stored as the value in a list. I would use pd.read_csv() to read the csv and then df.to_dict() to create the dictionary. It should look like this, for example:
dic_pos = {u'0': [-77.51885109, 39.18193382],
u'1': [-76.6688633, 39.18],
u'2': [-77.2617, 39.1791792],
u'3': [-77.1927, 39.1782],
.....
Then plotting the graph on a map is as easy as:
import matplotlib.pyplot as plt
import mplleaflet
import networkx as nx
fig, ax = plt.subplots()
# GG is the graph built from the edge list
nx.draw_networkx_nodes(GG, pos=dic_pos, node_size=10, node_color='red', edgecolors='k', alpha=.5)
nx.draw_networkx_edges(GG, pos=dic_pos, edge_color='gray', alpha=.1)
nx.draw_networkx_labels(GG, pos=dic_pos)
mplleaflet.display(fig=ax.figure)
If it does not produce the expected result, try swapping latitude and longitude: matplotlib expects positions as (x, y), which on a map means (longitude, latitude).
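One likely cause of the errors in the question is that df.to_dict(orient='index') returns nested dictionaries of the form {0: {'latitude': ..., 'longitude': ...}} rather than coordinate pairs. Here is a sketch of one way to build the position dictionary, using the sample rows from the question in place of the real file. The edges and the g_drawable name are made up for illustration.

```python
import io

import networkx as nx
import pandas as pd

# Sample rows copied from the question; a real run would read the CSV file instead
csv_text = """cluster_label,latitude,longitude
0,39.18193382,-77.51885109
1,39.18,-77.27
2,39.17917928,-76.6688633
3,39.1782,-77.2617
4,39.1765,-77.1927
"""
df = pd.read_csv(io.StringIO(csv_text))

# Map each cluster label (an int, like the graph's nodes) to (x, y) = (longitude, latitude)
pos = {
    int(label): (lon, lat)
    for label, lat, lon in zip(df["cluster_label"], df["latitude"], df["longitude"])
}

g = nx.Graph()
g.add_edges_from([(0, 1), (1, 3), (3, 4), (4, 99)])  # hypothetical edges; 99 has no coordinates

# Keep only nodes that have a position, to avoid "Node ... has no position"
g_drawable = g.subgraph(n for n in g if n in pos)
print(sorted(g_drawable.nodes()))
```

The keys must have the same type as the graph's nodes: integer labels in the graph will not match string keys such as '0' in the position dictionary.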
