Is there a function available in Python's NetworkX for generating random directed graphs with a maximum Euclidean distance between any two connected nodes? For example, for nodes separated by a certain Euclidean distance, there is a probability p of those nodes being connected and for all other nodes separated by greater than this distance, they will not be connected in the graph that is generated.
If you have a threshold such that distances greater than the threshold do not exist, and all edges shorter than that threshold have probability p, then you're in luck. [if it's not the same probability for all shorter edges, it's still doable but a bit harder]
Start by building a random geometric graph G. This is a graph whose nodes are put in place uniformly at random and any two are connected if they are within a threshold distance from each other.
Then create a new directed graph that includes each direction of every edge in G independently with probability p.
import networkx as nx
import random

N = 100  # 100 nodes
D = 0.2  # threshold distance of 0.2
G = nx.random_geometric_graph(N, D)

H = nx.DiGraph()
H.add_nodes_from(G.nodes())
p = 0.1  # keep each direction of each edge with probability 10%
for u, v in G.edges():
    if random.random() < p:
        H.add_edge(u, v)
    if random.random() < p:
        H.add_edge(v, u)
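For the harder case mentioned in the bracketed note, where the probability is not the same for all short edges, one possible sketch is below. It assumes the probability is supplied as a function of distance; the function name `directed_geometric_graph` and its `prob` argument are made up for this illustration. It relies on the `pos` node attribute that `random_geometric_graph` stores on each node.

```python
import math
import random

import networkx as nx


def directed_geometric_graph(n, radius, prob, seed=None):
    """Hypothetical helper: directed graph whose edges never exceed `radius`.

    `prob` is an assumed callable mapping a distance to a probability in [0, 1].
    """
    rng = random.Random(seed)
    G = nx.random_geometric_graph(n, radius, seed=seed)
    pos = nx.get_node_attributes(G, "pos")

    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))  # keep the node positions
    for u, v in G.edges():
        p = prob(math.dist(pos[u], pos[v]))
        # Each direction is decided independently.
        if rng.random() < p:
            H.add_edge(u, v)
        if rng.random() < p:
            H.add_edge(v, u)
    return H
```

For example, `directed_geometric_graph(100, 0.2, lambda d: 1 - d / 0.2)` would make close pairs more likely to be connected than pairs near the threshold.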
For each node (pink nodes) in an undirected/unweighted graph, I want to compute the closest distance to a given set of connected nodes (red nodes). It is essential that the computation is fast, also on large graphs (~4000 nodes) and for a large set of connected nodes. See the illustration below. What is the best algorithm/graph library for me to do this?
I tried to do something like that with NetworkX - shortest_path_length already, but it is too slow. Especially for a large set of connected red nodes.
import networkx as nx

all_distances = []
for node in red_nodes:
    # compute distances to all nodes in the graph from the source node
    distances = nx.shortest_path_length(graph, source=node)
    all_distances.append(distances)
shortest_distances = filter_for_shortest_distance(all_distances)
Here is an example on how to access a graph that I am working with. The red nodes could be any subset of connected nodes in the graph.
# Import navis
import navis
# Load one of the example neurons
sk = navis.example_neurons(n=1, kind='skeleton')
# Inspect the neuron graph
graph = sk.get_graph_nx()
You can use Dijkstra's algorithm with multiple start points:
import networkx as nx
import matplotlib.pyplot as plt
import typing

Node = int


def get_distances(graph: nx.Graph, red_nodes: typing.Set[Node]) -> typing.Dict[Node, int]:
    node_to_distance = {}
    node_to_neighbours = graph.adj

    # fill node_to_distance for red nodes and init set of reachable nodes
    reachable = set()
    for node in red_nodes:
        node_to_distance[node] = 0
        for neighbour in node_to_neighbours[node]:
            if neighbour not in red_nodes:
                node_to_distance[neighbour] = 1
                reachable.add(neighbour)

    # for each reachable node add neighbours as reachable nodes with greater distance
    while reachable:
        node = reachable.pop()
        distance = node_to_distance[node]
        for neighbour in node_to_neighbours[node]:
            if distance + 1 < node_to_distance.get(neighbour, len(graph)):
                node_to_distance[neighbour] = distance + 1
                reachable.add(neighbour)
    return node_to_distance


def check():
    g = nx.petersen_graph()
    red_nodes = {1, 2}
    node_to_distance = get_distances(g, red_nodes=red_nodes)
    plt.subplot(121)
    node_colors = ["red" if node in red_nodes else "yellow" for node in g.nodes()]
    nx.draw(g, with_labels=True, node_color=node_colors, labels=node_to_distance)
    plt.show()


check()
I've tested it on a graph with 40,000 nodes, and it took 42 ms to calculate the distances with 10,000 red nodes.
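Since the graph is unweighted, the same multi-source idea can also be written as a breadth-first search with a queue, which settles each node the first time it is reached. This is a minimal sketch, not a benchmarked replacement; the function name `multi_source_bfs` is chosen for this illustration and is not a NetworkX API.

```python
from collections import deque

import networkx as nx


def multi_source_bfs(graph, sources):
    """Hop distance from every reachable node to its nearest source node."""
    distance = {node: 0 for node in sources}
    queue = deque(distance)  # start the search from all sources at once
    while queue:
        node = queue.popleft()
        for neighbour in graph.adj[node]:
            if neighbour not in distance:  # first visit gives the shortest distance
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


distances = multi_source_bfs(nx.petersen_graph(), {1, 2})
```

Nodes that cannot be reached from any source are simply absent from the returned dictionary.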
I have an adjacency matrix and I need to calculate the fraction of nodes in the largest component (or the largest weakly connected component in the case of a directed network):
# from dataframe
matrix_weak = matrix.copy()
# to numpy array
matrix_weak_to_numpy = matrix_weak.to_numpy()
G = nx.from_numpy_array(matrix_weak_to_numpy)  # from_numpy_matrix in NetworkX < 3.0
G = G.to_directed()  # weakly connected components need a directed graph
max_wcc = max(nx.weakly_connected_components(G), key=len)
max_wcc = nx.subgraph(G, max_wcc)
How do I calculate this fraction from the code above?
The total number of nodes in the network is G.number_of_nodes(), so if I understand correctly, the answer is:
fraction = max_wcc.number_of_nodes() / G.number_of_nodes()
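Put together, the calculation might look like the sketch below. It assumes the adjacency matrix is a square NumPy array (or something `np.asarray` can convert) and that nonzero entries mean an edge; the function name `largest_wcc_fraction` is made up for this example.

```python
import networkx as nx
import numpy as np


def largest_wcc_fraction(adjacency):
    """Fraction of nodes in the largest weakly connected component."""
    # Build a directed graph directly, so edge directions from the matrix are kept.
    G = nx.from_numpy_array(np.asarray(adjacency), create_using=nx.DiGraph)
    largest = max(nx.weakly_connected_components(G), key=len)
    return len(largest) / G.number_of_nodes()


# Edges 0 -> 1 and 1 -> 2; node 3 is isolated.
A = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
fraction = largest_wcc_fraction(A)
```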
I have a large graph consisting of ~80k nodes and ~2M edges (SNAP Twitter graph). I want to downsample the graph to n nodes picked uniformly at random (n=~1k), without losing certain properties of the graph (average clustering coefficient and average degree).
I can subgraph in networkx using:
sg = g.subgraph(list_of_nodes)
Is it possible to use networkx to do what I mentioned?
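As a starting point, the sampling step and the two properties of interest can be computed as in the sketch below. It only measures how far a uniform node sample drifts from the original graph and does nothing to preserve the properties. It assumes an undirected graph, and the helper name `sample_and_compare` is hypothetical.

```python
import random

import networkx as nx


def sample_and_compare(g, n, seed=None):
    """Induce a subgraph on n uniformly sampled nodes and report both graphs' stats."""
    rng = random.Random(seed)
    nodes = rng.sample(list(g.nodes()), n)
    sg = g.subgraph(nodes)

    def stats(graph):
        avg_degree = sum(d for _, d in graph.degree()) / graph.number_of_nodes()
        return {
            "avg_degree": avg_degree,
            "avg_clustering": nx.average_clustering(graph),
        }

    return sg, stats(g), stats(sg)
```

Calling `sample_and_compare(g, 1000)` returns the sampled subgraph together with the statistics of the full graph and of the sample, so the two can be compared directly.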
I am trying to have nodes connect to a main node with different distances.
What I have so far:
import networkx as nx
G = nx.empty_graph(3, create_using=None)
G.add_edge(0,1)
G.add_edge(0,2)
Graph with equal distance to a main node
However, as can be seen from the image, the nodes on either side are at the same distance from the main node. Is there a way to give them different distances to the main node?
There are two parts to your question:
Part 1 - Distance between nodes:
In network theory, the distance between nodes is represented by the weight of the edge between them. So you can add all your edges with weights to your network with the following line:
G = nx.Graph()
G.add_weighted_edges_from([(0,1,4.0),(0,2,5.0)])
You can randomize the weights on the edges above for random distance between nodes.
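A small sketch of randomized weights, assuming a uniform range of 1.0 to 10.0 (the range and the seed are arbitrary choices for this example):

```python
import random

import networkx as nx

rng = random.Random(42)  # fixed seed so the weights are reproducible

G = nx.Graph()
# Connect the main node 0 to nodes 1 and 2 with random weights.
G.add_weighted_edges_from((0, node, rng.uniform(1.0, 10.0)) for node in (1, 2))
```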
Part 2 - Network Visualization:
I understand that you're more concerned with how the network graph is shown. If you use nx.draw_random(G), you get randomized distances between your nodes. I suggest saving the picture once you get the desired figure, since the layout is re-randomized every time you run it.
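If the drawn distances should follow the edge weights rather than be random, another option is to compute the positions yourself and pass them to the drawing function. The sketch below assumes a star-shaped graph like the one in the question and only places the main node and its direct neighbours; `radial_positions` is a hypothetical helper, not part of NetworkX.

```python
import math

import networkx as nx


def radial_positions(G, center, weight="weight"):
    """Place `center` at the origin and each neighbour at a radius equal to its edge weight."""
    neighbours = list(G.neighbors(center))
    pos = {center: (0.0, 0.0)}
    for i, node in enumerate(neighbours):
        angle = 2 * math.pi * i / len(neighbours)  # spread neighbours evenly around the centre
        r = G[center][node].get(weight, 1.0)
        pos[node] = (r * math.cos(angle), r * math.sin(angle))
    return pos


G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 4.0), (0, 2, 5.0)])
pos = radial_positions(G, center=0)
```

The resulting dictionary can be passed as `nx.draw(G, pos=pos)`, so the layout stays the same on every run.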
Hope it helps... :)
I am using the Facebook SNAP dataset and building a graph from it with NetworkX in Python, but I have not been able to find the most important node, or you could say the most connected one, in the network.
The code I am using to build the graph from the Facebook SNAP dataset is here:
import networkx as nx
import matplotlib.pyplot as plt

# Exploratory data analysis
g = nx.read_edgelist('facebook_combined.txt', create_using=nx.Graph(), nodetype=int)
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")

# Simple graph
sp = nx.spring_layout(g)
nx.draw_networkx(g, pos=sp, with_labels=False, node_size=35)
# plt.axis("off")
plt.show()
The result it gives is this:
The link to the dataset is here
The source of dataset is here
But the question is: how can I find the most important individual in this network?
One way to define "importance" is the individual's betweenness centrality. The betweenness centrality is a measure of how many shortest paths pass through a particular vertex. The more shortest paths that pass through the vertex, the more central the vertex is to the network.
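As a small illustration of the measure (single-process, on a toy graph rather than the Facebook data), the built-in `nx.betweenness_centrality` can be used to pick the most central node. The star graph here is just an assumed example in which the hub lies on every shortest path between two leaves.

```python
import networkx as nx

G = nx.star_graph(4)  # node 0 is the hub, nodes 1..4 are leaves

# Maps each node to its (normalized) betweenness centrality.
centrality = nx.betweenness_centrality(G)

# The node with the highest score is the "most important" under this definition.
most_central = max(centrality, key=centrality.get)
```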
Because the shortest paths between any pair of vertices can be determined independently of any other pair of vertices, the computation can be parallelized.
To do this, we will use the Pool object from the multiprocessing library and the itertools library.
The first thing we need to do is partition the vertices of the network into n subsets, where n depends on the number of processors we have access to. For example, on a machine with 32 cores, we partition the Facebook network into 32 chunks, each containing roughly 126 vertices.
Now, instead of one processor computing the betweenness for all 4,039 vertices, we can have 32 processors each computing the betweenness for their own chunk of vertices in parallel. This drastically reduces the run time of the algorithm and allows it to scale to larger networks.
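The partitioning step on its own can be sketched as follows. The worker count of 32 is the example figure from the text, and the helper name `chunked` is made up for this illustration; no graph or process pool is involved here.

```python
import itertools
import math


def chunked(items, size):
    """Yield successive tuples of at most `size` items."""
    it = iter(items)
    while True:
        chunk = tuple(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


num_vertices = 4039  # size of the Facebook network
num_workers = 32     # assumed number of cores

# Round up so that the vertices fit into exactly num_workers chunks.
chunk_size = math.ceil(num_vertices / num_workers)
chunks = list(chunked(range(num_vertices), chunk_size))
```

Each chunk can then be handed to one worker as its set of source vertices.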
The code I used is this:
import itertools
from multiprocessing import Pool

import matplotlib.pyplot as plt
import networkx as nx


def partitions(nodes, n):
    """Partition the nodes into chunks of at most n nodes each."""
    nodes_iter = iter(nodes)
    while True:
        partition = tuple(itertools.islice(nodes_iter, n))
        if not partition:
            return
        yield partition


def btwn_pool(G_tuple):
    # betweenness_centrality_source is no longer available in recent NetworkX
    # releases; betweenness_centrality_subset(G, sources, targets, normalized,
    # weight) is the call it used to forward to.
    return nx.betweenness_centrality_subset(*G_tuple)


def between_parallel(G, processes=None):
    with Pool(processes=processes) as p:
        part_generator = 4 * len(p._pool)
        chunk_size = max(1, len(G) // part_generator)
        node_partitions = list(partitions(G.nodes(), chunk_size))
        num_partitions = len(node_partitions)
        # Each task computes the contribution of one chunk of source nodes.
        bet_map = p.map(btwn_pool,
                        zip([G] * num_partitions,
                            node_partitions,
                            [list(G)] * num_partitions,
                            [True] * num_partitions,
                            [None] * num_partitions))
    # Sum the partial results from all chunks.
    bt_c = bet_map[0]
    for bt in bet_map[1:]:
        for n in bt:
            bt_c[n] += bt[n]
    return bt_c


if __name__ == "__main__":
    # Exploratory data analysis
    g = nx.read_edgelist('facebook_combined.txt', create_using=nx.Graph(), nodetype=int)
    print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
    spring_pos = nx.spring_layout(g)

    # Parallel betweenness centrality
    bt = between_parallel(g)
    top = 10
    max_nodes = sorted(bt.items(), key=lambda v: -v[1])[:top]
    top_nodes = {node for node, _ in max_nodes}
    # Build the size and color lists in g.nodes() order so they line up when drawing.
    bt_values = [150 if node in top_nodes else 5 for node in g.nodes()]
    bt_colors = [2 if node in top_nodes else 0 for node in g.nodes()]

    plt.axis("off")
    nx.draw_networkx(g, pos=spring_pos, cmap=plt.get_cmap("rainbow"), node_color=bt_colors,
                     node_size=bt_values, with_labels=False)
    plt.show()
The output it gives:
Now, let's look at the vertices with the top 10 highest betweenness centrality measures in the network.
As you can see, vertices that either sit at the center of a hub or act as a bridge between two hubs have higher betweenness centrality. The bridge vertices have high betweenness because all paths connecting the hubs pass through them, and the hub center vertices have high betweenness because all intra-hub paths pass through them.