I am trying to have nodes connect to a main node with different distances.
What I have so far:
import networkx as nx
G = nx.empty_graph(3, create_using=None)
G.add_edge(0,1)
G.add_edge(0,2)
[Image: graph in which both nodes are at equal distance from the main node]
However, as can be seen from the image, the nodes on either side are at equal distance from the main node. Is there a way to make their distances to the main node different?
There are two parts to your question:
Part 1 - Distance between nodes:
In network theory, the distance between two connected nodes is commonly represented by the weight of the edge between them. So you can add all your edges with weights to your network with the following lines:
G = nx.Graph()
G.add_weighted_edges_from([(0,1,4.0),(0,2,5.0)])
You can randomize the weights on the edges above for random distance between nodes.
Part 2 - Network Visualization:
I understand that you're more concerned with how the network graph is shown. If you use nx.draw_random(G), you get randomized distances between your nodes. I suggest saving a picture once you get the figure you want, since the layout is randomized every time you run it.
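If you want the drawn lengths to follow the edge weights from Part 1 instead of being random, you can also compute the positions yourself and pass them to nx.draw. A minimal sketch, assuming a star-shaped graph where every other node is attached directly to the main node (star_positions is a hypothetical helper, not a networkx function):

```python
import math
import networkx as nx

def star_positions(G, center, weight="weight"):
    """Hypothetical helper: put `center` at the origin and each neighbor at a
    distance equal to its edge weight, spread evenly around a circle."""
    pos = {center: (0.0, 0.0)}
    neighbors = list(G.neighbors(center))
    for i, node in enumerate(neighbors):
        angle = 2 * math.pi * i / len(neighbors)
        # Use the edge weight as the radius; fall back to 1.0 if it is missing
        r = G.edges[center, node].get(weight, 1.0)
        pos[node] = (r * math.cos(angle), r * math.sin(angle))
    return pos

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 4.0), (0, 2, 5.0)])
pos = star_positions(G, 0)
print(pos)
```

Nodes that are not adjacent to the center get no position, so this only covers the star case from the question; nx.draw(G, pos, with_labels=True) followed by plt.show() then draws the graph with those positions.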
Hope it helps... :)
I am working with networks undergoing a number of disruptive events, so a number of nodes fail because of a given event. Therefore the network transitions from the state in the image on the left to the one on the right:
My question: how can I find the disconnected subgraphs, even if they contain only 1 node? My purpose is to count them and mark them as failed, as in my study this is what applies to them. By semi-isolated nodes, I mean groups of nodes that are isolated from the rest of the network but connected to each other.
I know I can find isolated nodes like this:
def find_isolated_nodes(graph):
    """ returns a list of isolated nodes. """
    isolated = []
    for node in graph:
        # a node with no neighbors is isolated
        if not graph[node]:
            isolated.append(node)
    return isolated
but how would you amend these lines to make them find groups of isolated nodes as well, like those highlighted in the right hand side picture?
MY THEORETICAL ATTEMPT
It looks like this problem is addressed by the Flood Fill algorithm, which is explained here. However, I wonder whether it would be possible to simply count the number of nodes in the giant component(s) and then subtract it from the number of nodes that still appear active at stage 2. How would you implement this?
If I understand correctly, you are looking for "isolated" nodes, meaning the nodes not in the largest component of the graph. As you mentioned, one method to identify the "isolated" nodes is to find all the nodes NOT in the largest component. To do so, you can just use networkx.connected_components, to get a list of the components and sort them by size:
components = list(nx.connected_components(G)) # list because it returns a generator
components.sort(key=len, reverse=True)
Then you can find the largest component, and get a count of the "isolated" nodes:
largest = components.pop(0)
num_isolated = G.order() - len(largest)
I put this all together in an example where I draw an Erdős-Rényi random graph, coloring the largest component red and the isolated nodes blue:
# Load modules and create a random graph
import networkx as nx, matplotlib.pyplot as plt
G = nx.gnp_random_graph(10, 0.15)
# Identify the largest component and the "isolated" nodes
components = list(nx.connected_components(G)) # list because it returns a generator
components.sort(key=len, reverse=True)
largest = components.pop(0)
isolated = set(n for cc in components for n in cc)
# Draw the graph
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos=pos, nodelist=largest, node_color='r')
nx.draw_networkx_nodes(G, pos=pos, nodelist=isolated, node_color='b')
nx.draw_networkx_edges(G, pos=pos)
plt.show()
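The Flood Fill idea from the question is essentially a breadth-first search that collects one connected component at a time. A minimal sketch, assuming an undirected networkx graph (flood_fill_components and count_failed_nodes are hypothetical helper names), that finds every component by hand and counts the nodes outside the largest one:

```python
from collections import deque
import networkx as nx

def flood_fill_components(G):
    """Return a list of node sets, one per connected component (BFS flood fill)."""
    seen = set()
    components = []
    for start in G:
        if start in seen:
            continue
        # Flood outward from an unvisited node, collecting everything reachable
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in G[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

def count_failed_nodes(G):
    """Count nodes outside the largest component (isolated or semi-isolated)."""
    if G.number_of_nodes() == 0:
        return 0
    giant = max(flood_fill_components(G), key=len)
    return G.number_of_nodes() - len(giant)

G = nx.Graph([(0, 1), (1, 2), (2, 3), (4, 5)])
G.add_node(6)  # a single isolated node
print(count_failed_nodes(G))  # 3
```

This gives the same components as nx.connected_components; the library call is shorter, but the hand-written version shows what the flood fill does.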
I'm trying to predict the possible future links or missing edges in a directed graph, that is, will there be links between any node pairs in the future?
This is the dataset I'm using right now : https://github.com/JiaCheng-Lai/Link_Prediction
data_train_edge.csv is used for training data, there are about 20,000 edges. (This is a directed network, so each node pair represents a directed edge. E.g., (361, 981) represents an edge from node 361 to node 981.)
predict.csv contains the node pairs (node1, node2) to be predicted. The third column, ans, is the prediction result, 0 or 1 (0 means there is no hidden edge for this node pair; otherwise ans is 1).
import numpy as np
import pandas as pd
import networkx as nx
df = pd.read_csv("data_train_edge.csv")
G = nx.DiGraph()
G.add_edges_from(df[['node1', 'node2']].values)
I just built a directed graph with the above code, but I'm not quite sure how to do link prediction. You can use any kind of prediction method such as Jaccard Coefficient, Adamic-Adar index, etc.
However, I think networkx does not support link prediction on directed graphs, because its link-prediction functions raise errors for them. If directed graphs really don't work, it can be implemented with undirected graphs.
It would be a great help if you could provide the code or any tips! Thanks a lot.
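One way to follow the undirected route is to drop edge direction and score each requested pair with one of networkx's neighborhood-based indices. A minimal sketch, assuming toy edge lists in place of the two CSV files, the Jaccard coefficient as the score, and a hypothetical threshold of 0 (predict 1 whenever the two nodes share at least one neighbor); predict_links is a hypothetical helper name:

```python
import networkx as nx

# Hypothetical toy data standing in for data_train_edge.csv and predict.csv
train_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3), (5, 6)]
pairs_to_predict = [(1, 4), (1, 6), (4, 6)]

def predict_links(directed_edges, pairs, threshold=0.0):
    """Score each (u, v) pair with the Jaccard coefficient on an undirected
    copy of the graph and return {(u, v): 0 or 1}."""
    DG = nx.DiGraph()
    DG.add_edges_from(directed_edges)
    # networkx's link-prediction functions reject directed graphs,
    # so drop edge direction before scoring.
    UG = DG.to_undirected()
    # Nodes that never appear in training get no neighbors (score 0).
    UG.add_nodes_from(n for pair in pairs for n in pair)
    scores = nx.jaccard_coefficient(UG, pairs)
    return {(u, v): int(score > threshold) for u, v, score in scores}

predictions = predict_links(train_edges, pairs_to_predict)
for (u, v), ans in predictions.items():
    print(u, v, ans)
```

nx.adamic_adar_index and nx.resource_allocation_index take the same (graph, pairs) arguments, so swapping the index is a one-line change. The threshold is an assumption; with real data it would be tuned, for example on edges held out from the training file.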
I have a csr matrix from which I extracted data, rows, and columns.
I want to create a bipartite graph using NetworkX, and I have tried several solutions without success (for example: Plot bipartite graph using networkx in Python). In my opinion, the reason it doesn't work is a matter of labeling: my two sets and the nodes inside them have no string names.
For example, in a 10x10 matrix, the row/column indexes are the names of the nodes of the two sets, while the value at the intersection of a row and a column is the weight of the link between those nodes.
In my case, then, if I have (0,0)=0.5 it doesn't mean that it is a self-loop; instead, the link with weight 0.5 connects the "node 0" of the first set with the "node 0" of the second one.
import networkx as nx
from networkx.algorithms import bipartite
import matplotlib.pyplot as plt
def function(foo, n_row, n_col):
    n_row = 10
    n_col = 10
    # (matrix creation omitted) After the creation of the matrix, I obtain my data:
    weights = weights.tocsr()
    wcoo = weights.tocoo()
    m_data = wcoo.data
    m_rows = wcoo.row
    m_cols = wcoo.col
    g = nx.Graph()
    # TRIAL 1
    g.add_nodes_from(m_cols, bipartite=0)
    g.add_nodes_from(m_rows, bipartite=1)
    bi_m = bipartite.matrix.biadjacency_matrix(g, m_data)
    # TRIAL 2
    g.add_weighted_edges_from(zip(m_cols, m_rows, m_data))
    nx.draw(g, node_size=500)
    plt.show()
I expected a bipartite graph with two sets of 10 nodes each and a certain number of weighted links between them (with no links within the same set).
Instead, I obtained an ordinary undirected graph with only 10 nodes in total.
At the same time, I'd like to optimize my code as much as I can to speed up the computation without affecting readability.
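The labeling problem described above can be avoided by giving each index a side tag, so row 0 and column 0 become different nodes. A minimal sketch, assuming plain lists in place of wcoo.row, wcoo.col and wcoo.data, and hypothetical ("row", i) / ("col", j) node labels (build_bipartite is a hypothetical helper name):

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical stand-ins for wcoo.row, wcoo.col and wcoo.data
m_rows = [0, 0, 1, 2]
m_cols = [0, 2, 1, 2]
m_data = [0.5, 1.2, 0.3, 2.0]

def build_bipartite(rows, cols, data, n_row, n_col):
    """Build a weighted bipartite graph in which row index i and column
    index j become the distinct nodes ("row", i) and ("col", j)."""
    g = nx.Graph()
    # Add every node, including isolated ones, with its side as an attribute
    g.add_nodes_from((("row", i) for i in range(n_row)), bipartite=0)
    g.add_nodes_from((("col", j) for j in range(n_col)), bipartite=1)
    # Cell (i, j) = w becomes an edge between row node i and column node j;
    # int()/float() turn NumPy scalars into plain Python values
    g.add_weighted_edges_from(
        (("row", int(i)), ("col", int(j)), float(w))
        for i, j, w in zip(rows, cols, data)
    )
    return g

g = build_bipartite(m_rows, m_cols, m_data, n_row=3, n_col=3)
row_nodes = [n for n, d in g.nodes(data=True) if d["bipartite"] == 0]
print(bipartite.is_bipartite(g), g.number_of_nodes(), g.number_of_edges())  # True 6 4
```

To draw the two sets side by side, nx.draw(g, pos=nx.bipartite_layout(g, row_nodes), with_labels=True) can be used. Alternatively, bipartite.from_biadjacency_matrix(weights) builds the graph straight from the sparse matrix, numbering the column nodes after the row nodes instead of tagging them; it also avoids the Python-level loop, which may help with speed on large matrices.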
I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionary where each key is a node and each value is the associated weight (between 10 and 200 000 or so).
weights = {node: weight}
I believe I do not need to normalize the weights with networkx.
At the moment, I create a non-weighted graph by adding the edges:
def create_graph(data):
    edges = create_edges(data)
    # Create the graph
    G = nx.Graph()
    # Add edges
    G.add_edges_from(edges)
    return G
From what I read, I can add a weight to the edge. However, I would prefer the weight to be applied to a specific node instead of an edge. How can I do that?
Idea: I create the graph by adding the nodes weighted, and then I add the edges between the nodes.
def create_graph(data, weights):
    nodes = create_nodes(data)
    edges = create_edges(data)  # list of tuples
    # Create the graph
    G = nx.Graph()
    # Add nodes with their weights
    for node in nodes:
        G.add_node(node, weight=weights[node])
    # Add edges
    G.add_edges_from(edges)
    return G
Is this approach correct?
The next step is to find the path between 2 nodes with the smallest total weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path, which I think does the right thing. However, it uses weights on the edges instead of weights on the nodes. Could someone explain to me what this function does, what the difference between weights on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)
This generally looks right.
You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).
To handle the edge vs node weight issue, there are two options. First note that you are after the sum of the node weights along the path. If I give each edge the weight w(u,v) = w(u) + w(v), then the sum of edge weights along a path is w(source) + w(target) + 2 sum(w(v)), where the nodes v are all the intermediate nodes found along the way. Since w(source) and w(target) are the same for every path between the two endpoints, whatever path has the minimum weight with these edge weights will also have the minimum weight with the node weights.
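The argument can be written out for a path P from s = v_0 to t = v_k, with W_node(P) the sum of the node weights on P and W_edge(P) the sum of the derived edge weights:

```latex
W_{\text{edge}}(P) = \sum_{i=0}^{k-1} \bigl(w(v_i) + w(v_{i+1})\bigr)
                   = w(s) + w(t) + 2\sum_{i=1}^{k-1} w(v_i)
                   = 2\,W_{\text{node}}(P) - w(s) - w(t),
\qquad W_{\text{node}}(P) = \sum_{i=0}^{k} w(v_i).
```

Because w(s) and w(t) are fixed once the endpoints are chosen, W_edge is an increasing function of W_node, so the same path minimizes both.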
So you could go and assign each edge the weight to be the sum of the two nodes.
for edge in G.edges():
    G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']
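But an alternative is to note that the weight input into bidirectional_dijkstra can be a function. NetworkX calls it with three arguments: the two endpoints of the edge and the dictionary of that edge's attributes. Define your own function to give the sum of the two node weights:
def f(u, v, d):
    # u, v are the edge's endpoints; d is its attribute dict (unused here)
    return G.nodes[u]['weight'] + G.nodes[v]['weight']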
and then in your call do bidirectional_dijkstra(G, source, target, weight=f)
So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.
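Both options can be checked against each other on a small graph. A minimal sketch, assuming every node carries a 'weight' attribute as in the question's create_graph (node_weighted_path and assign_edge_weights are hypothetical helper names, and the example weights are made up):

```python
import networkx as nx

def node_weighted_path(G, source, target):
    """Return (path, sum of node weights on it) for the minimum-node-weight path."""
    def edge_cost(u, v, d):
        # networkx calls a weight function with (u, v, edge_attribute_dict)
        return G.nodes[u]["weight"] + G.nodes[v]["weight"]

    _, path = nx.bidirectional_dijkstra(G, source, target, weight=edge_cost)
    return path, sum(G.nodes[n]["weight"] for n in path)

def assign_edge_weights(G):
    """Alternative: store w(u) + w(v) on every edge as its 'weight' attribute."""
    for u, v in G.edges():
        G.edges[u, v]["weight"] = G.nodes[u]["weight"] + G.nodes[v]["weight"]

# Hypothetical example: a cheap long route versus an expensive short one
G = nx.Graph()
for node, w in {"A": 1, "B": 100, "C": 5, "D": 5, "E": 1}.items():
    G.add_node(node, weight=w)
G.add_edges_from([("A", "B"), ("B", "E"), ("A", "C"), ("C", "D"), ("D", "E")])

path, cost = node_weighted_path(G, "A", "E")
print(path, cost)  # ['A', 'C', 'D', 'E'] 12

assign_edge_weights(G)
print(nx.dijkstra_path(G, "A", "E"))  # ['A', 'C', 'D', 'E']
```

The route through B has fewer hops but a node-weight sum of 102, so both approaches pick the longer A-C-D-E route with sum 12.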
Some comments on bidirectional dijkstra: imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The dijkstra algorithm (which is the default of shortest_path) will explore every point within distance R of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other. This has a volume (4/3) pi R^3. With bidirectional_dijkstra we inflate balloons centered at each point until they touch. They will each have radius R/2. So the total volume is (4/3) pi (R/2)^3 + (4/3) pi (R/2)^3, which is a quarter of the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings are often much bigger.
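The balloon arithmetic, and its generalization to d dimensions under the same idealization that the explored volume scales as r^d, is:

```latex
\frac{V_{\text{bi}}}{V_{\text{uni}}}
  = \frac{2 \cdot \tfrac{4}{3}\pi (R/2)^3}{\tfrac{4}{3}\pi R^3}
  = \frac{2}{8} = \frac{1}{4},
\qquad
\text{and in } d \text{ dimensions: } \frac{2\,(R/2)^d}{R^d} = 2^{1-d}.
```

For d = 3 this gives 1/4; for d = 10 it is already 1/512, which is the sense in which a high effective dimension makes the savings larger.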
I am trying to create a connected graph where each node has some attributes that determine which other nodes it is connected to. The network is a circular space, to make it easy to establish links (there are 1000 nodes).
The way this network works is that a node has both neighbors (the ones to its immediate left/right, i.e. node 3 has neighbors 2 and 4) and also k long-distance links. A node picks its long-distance links by randomly choosing nodes in the clockwise direction (i.e. node 25 might have 200 as its long-distance link instead of 15).
Here is a sample image of what it might look like: http://i.imgur.com/PkYk5bz.png
The image shows a Symphony network, but my implementation is a simplification of that.
I partially implemented this in Java (via a linked list holding an ArrayList) but am lost on how to do this in NetworkX. I am especially confused about how to add node attributes that say that a node will find k long links but, after k, will not accept any more links. Is there a specific built-in graph in networkx that is suited to this model, or is any graph acceptable as long as I have the correct node attributes?
It's a simplification of a more complicated network where no node leaves and no edge disappears.
Any help or a link to an example would be appreciated on this.
This approximates what you need:
import networkx as nx
import matplotlib.pyplot as plt
import random
N = 20 # number of nodes
K = 3 # number of "long" edges
G = nx.cycle_graph(N)
for node in G.nodes():
    while G.degree(node) < K + 2:
        # Add K neighbors to each node
        # (each node already has two neighbors from the cycle)
        valid_target_found = False
        while not valid_target_found:
            # CAUTION
            # This loop never terminates if no eligible target is left,
            # which becomes likely when K is too high relative to N
            target = random.randint(0, N - 1)
            # pick a random node
            if (target != node
                    and not G.has_edge(node, target)
                    and G.degree(target) < K + 2):
                # Accept the target if (a) it is not the node itself,
                # (b) it is not already connected to the node and
                # (c) it has fewer than K long edges
                valid_target_found = True
        G.add_edge(node, target)
nx.draw_circular(G)
plt.show()
It creates the graph below. There are improvements to be made, for example, a more efficient selection of the target nodes for the long edges, but this gets you started, I hope.
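If you want the capacity rule from the question stored on the nodes themselves, one option is a per-node counter kept as a node attribute. A minimal sketch under stated assumptions: long-link targets are drawn uniformly at random (a real Symphony network uses a different distance distribution), every node has the same capacity k, "clockwise" means increasing node index modulo n, and symphony_like_ring, "k", "long_links" and "long" are hypothetical names. A node stops initiating links once it has k long links, and a node that already has k long links rejects further ones:

```python
import random
import networkx as nx

def symphony_like_ring(n, k, seed=None, max_tries=1000):
    """Ring of n nodes (n >= 4) plus at most k long links per node."""
    rng = random.Random(seed)
    G = nx.cycle_graph(n)                  # immediate left/right neighbors
    nx.set_node_attributes(G, k, "k")      # capacity for long links
    nx.set_node_attributes(G, 0, "long_links")
    for node in G.nodes:
        tries = 0
        while G.nodes[node]["long_links"] < G.nodes[node]["k"] and tries < max_tries:
            tries += 1
            # Step clockwise by 2..n-2, skipping the node and its ring neighbors
            target = (node + rng.randint(2, n - 2)) % n
            full = G.nodes[target]["long_links"] >= G.nodes[target]["k"]
            if full or G.has_edge(node, target):
                continue  # the target refuses, or the link already exists
            G.add_edge(node, target, long=True)
            G.nodes[node]["long_links"] += 1
            G.nodes[target]["long_links"] += 1
    return G

G = symphony_like_ring(20, 3, seed=1)
print(G.number_of_edges())
```

Because attempts are bounded by max_tries, a node can end up with fewer than k long links when few eligible targets remain, instead of looping forever. A plain nx.Graph with node attributes is enough for this; drawing it with nx.draw_circular(G) works the same way as above.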
In NetworkX, any logic about how your nodes connect is left to you.
Nevertheless, if you want to iterate over the nodes in Python (not tested):
for nodeId, data in yourGraph.nodes(data=True):
    # some logic here over data
    # to pick otherNodeId and connect your node
    yourGraph.add_edge(nodeId, otherNodeId)
Side note: if you want to stay in Java you can also consider using Jung and Gephi.