Link Prediction for directed graph in Python

I'm trying to predict possible future links (or missing edges) in a directed graph. In other words, will a link appear between a given pair of nodes in the future?
This is the dataset I'm using right now: https://github.com/JiaCheng-Lai/Link_Prediction
data_train_edge.csv is the training data and contains about 20,000 edges. (This is a directed network, so each node pair is a directed edge. E.g., (361, 981) is an edge from node 361 to node 981.)
predict.csv contains the node pairs (node1, node2) to be predicted. The third column, ans, is the prediction result, 0 or 1 (0 means there is no hidden edge for this node pair; otherwise ans is 1).
import numpy as np
import pandas as pd
import networkx as nx
# Each row of data_train_edge.csv is a directed edge node1 -> node2
df = pd.read_csv("data_train_edge.csv")
G = nx.DiGraph()
G.add_edges_from(df[['node1', 'node2']].values)
I just built a directed graph with the above code, but I'm not quite sure how to do link prediction. You can use any kind of prediction method such as Jaccard Coefficient, Adamic-Adar index, etc.
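Because the graph is directed, one option is to adapt the neighbourhood scores by hand instead of relying on networkx's built-ins. The sketch below is a hand-rolled heuristic, not a networkx API, and `directed_scores` is a hypothetical helper name. It treats the intermediate nodes w on two-step paths u -> w -> v as the "common neighbours" of a candidate edge u -> v and weights them Adamic-Adar style by 1/log(degree(w)). The tiny edge list is only a stand-in for data_train_edge.csv.

```python
import math

import networkx as nx


def directed_scores(G, u, v):
    """Return (common, adamic_adar) for the candidate directed edge u -> v.

    Hand-rolled heuristic: the "common neighbours" are the nodes w with
    u -> w and w -> v, i.e. the middle nodes of two-step paths from u to v.
    """
    if u not in G or v not in G:
        return 0, 0.0  # node never seen in training: no evidence either way
    middle = set(G.successors(u)) & set(G.predecessors(v))
    # Each middle node has at least one in-edge and one out-edge, so its
    # total degree is >= 2 and log(degree) is never zero.
    aa = sum((1.0 / math.log(G.degree(w)) for w in middle), 0.0)
    return len(middle), aa


# Toy stand-in for data_train_edge.csv (node1 -> node2).
G = nx.DiGraph([(1, 2), (2, 3), (1, 4), (4, 3), (3, 5)])
print(directed_scores(G, 1, 3))  # two paths: 1->2->3 and 1->4->3
print(directed_scores(G, 3, 1))  # reverse direction has no two-step paths
```

To score the real task, each (node1, node2) row of predict.csv would be passed to directed_scores and the score thresholded; the threshold is not given anywhere and would have to be tuned, for example on training edges held out from G.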
However, I get errors when I call networkx's link prediction functions on a directed graph, so I think they don't support directed graphs. If directed graphs really don't work, an undirected graph is acceptable.
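If an undirected approximation is acceptable, networkx's built-in indices (nx.jaccard_coefficient, nx.adamic_adar_index) can be used after collapsing the graph with to_undirected(). This is a minimal sketch, assuming hypothetical helper names (`undirected_scores`, `predict_labels`), a toy edge list in place of the CSV files, and an arbitrary threshold for turning scores into 0/1 labels.

```python
import networkx as nx


def undirected_scores(edges, pairs):
    """Jaccard and Adamic-Adar scores for candidate pairs on an undirected copy."""
    U = nx.DiGraph(edges).to_undirected()
    # networkx raises for pairs with unknown nodes, and self-pairs are not
    # meaningful here, so only pass pairs it can score.
    ok = [(u, v) for u, v in pairs if u != v and u in U and v in U]
    jac = {(u, v): s for u, v, s in nx.jaccard_coefficient(U, ok)}
    aa = {(u, v): s for u, v, s in nx.adamic_adar_index(U, ok)}
    return jac, aa


def predict_labels(scores, pairs, threshold):
    # Pairs that could not be scored default to 0 (no hidden edge).
    return [1 if scores.get(p, 0.0) > threshold else 0 for p in pairs]


edges = [(1, 2), (2, 3), (1, 4), (4, 3), (3, 5)]  # toy stand-in for data_train_edge.csv
pairs = [(1, 3), (1, 5), (2, 4)]                  # toy stand-in for predict.csv
jac, aa = undirected_scores(edges, pairs)
labels = predict_labels(jac, pairs, threshold=0.5)  # 0.5 is an arbitrary cut-off
```

With the real files, edges would be df[['node1', 'node2']].values and pairs the node1/node2 columns of predict.csv, and labels could fill the ans column. Note that collapsing to an undirected graph makes (u, v) and (v, u) score identically.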
It would be a great help if you could provide code or any tips! Thanks a lot.

Related

From adjacency matrix to bipartite graph in NetworkX

I have a CSR matrix from which I extracted the data, rows, and columns.
I want to create a bipartite graph with NetworkX, and I have tried several solutions without success (for example: Plot bipartite graph using networkx in Python). In my opinion, the reason it doesn't work is labeling: my two sets, and the nodes inside them, have no string names.
For example, in a 10x10 matrix, the row and column indexes are the names of the nodes in the two sets, and the entry at a (row, column) intersection is the weight of the link between those two nodes.
In my case, then, (0,0)=0.5 does not mean a self-loop; instead, a link with weight 0.5 connects "node 0" of the first set with "node 0" of the second set.
import networkx as nx
from networkx.algorithms import bipartite
import matplotlib.pyplot as plt

def function(foo, n_row, n_col):
    n_row = 10
    n_col = 10
    # After the creation of the matrix (not shown), I obtain my data
    weights = weights.tocsr()
    wcoo = weights.tocoo()
    m_data = wcoo.data
    m_rows = wcoo.row
    m_cols = wcoo.col
    g = nx.Graph()
    # TRIAL 1
    g.add_nodes_from(m_cols, bipartite=0)
    g.add_nodes_from(m_rows, bipartite=1)
    bi_m = bipartite.matrix.biadjacency_matrix(g, m_data)
    # TRIAL 2
    g.add_weighted_edges_from(zip(m_cols, m_rows, m_data))
    nx.draw(g, node_size=500)
    plt.show()
I expected a bipartite graph with two sets of 10 nodes each and some weighted links between them (with no links within the same set).
Instead, I obtained an ordinary undirected graph with only 10 nodes in total.
At the same time, I'd like to optimize my code as much as I can to speed up computation without hurting readability.
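The collapse to 10 nodes happens because both sets use the bare integers 0..9, so row 0 and column 0 become the same node. This is a minimal sketch, assuming scipy.sparse input and a hypothetical helper name `csr_to_bipartite`: it gives each set its own labels (("r", i) for rows, ("c", j) for columns) and adds all edges in a single call, which also avoids a Python-level loop per edge.

```python
import networkx as nx
import numpy as np
from scipy import sparse


def csr_to_bipartite(weights):
    """Weighted bipartite graph from an n_row x n_col sparse matrix.

    Row i becomes node ("r", i) and column j becomes node ("c", j), so
    entry (0, 0) links two different nodes instead of forming a self-loop.
    """
    coo = sparse.coo_matrix(weights)
    n_row, n_col = coo.shape
    g = nx.Graph()
    # Add every node up front so empty rows/columns still appear.
    g.add_nodes_from((("r", i) for i in range(n_row)), bipartite=0)
    g.add_nodes_from((("c", j) for j in range(n_col)), bipartite=1)
    # One weighted edge per stored entry, added in a single call.
    g.add_weighted_edges_from(
        (("r", int(i)), ("c", int(j)), float(w))
        for i, j, w in zip(coo.row, coo.col, coo.data)
    )
    return g


dense = np.zeros((10, 10))
dense[0, 0] = 0.5  # links row-node 0 to column-node 0, not a self-loop
dense[3, 7] = 1.2
g = csr_to_bipartite(sparse.csr_matrix(dense))
rows = [n for n, d in g.nodes(data=True) if d["bipartite"] == 0]
pos = nx.bipartite_layout(g, rows)  # two columns of 10 nodes
```

Drawing is then nx.draw(g, pos, node_size=500) followed by plt.show(). networkx also provides bipartite.from_biadjacency_matrix, which builds a similar graph directly from a scipy sparse matrix, numbering rows 0..n-1 and columns n..n+m-1 and setting the bipartite attribute.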

Creating random distance nodes in networkx

I am trying to have nodes connect to a main node with different distances.
What I have so far:
import networkx as nx
G = nx.empty_graph(3, create_using=None)
G.add_edge(0,1)
G.add_edge(0,2)
[Image: graph with equal distance to a main node]
However, as the image shows, the nodes on either side are at the same distance from the main node. Is there a way to make their distances to the main node different?
There are two parts to your question:
Part 1 - Distance between nodes:
In network theory, the distance between two nodes is commonly represented by the weight of the edge between them. So you can add all your edges to your network with weights using the following line:
G = nx.Graph()
G.add_weighted_edges_from([(0,1,4.0),(0,2,5.0)])
You can randomize the weights on the edges above for random distance between nodes.
Part 2 - Network Visualization:
I understand that you're more concerned with how the network graph is drawn. nx.draw_random(G) places the nodes at random positions (it ignores edge weights), so you get randomized distances between your nodes. Since the layout changes on every run, I suggest saving a picture once you get the figure you want.
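To make the drawing reflect the edge weights rather than random positions, a layout that reads the weights is needed. One option, swapped in here and not what the answer above uses, is nx.kamada_kawai_layout, which treats the weighted shortest-path length between two nodes as their target drawing distance. This is a minimal sketch, assuming random weights between 1 and 10 (an arbitrary range) and a fixed seed only for reproducibility; kamada_kawai_layout requires scipy.

```python
import random

import networkx as nx
import numpy as np

rng = random.Random(42)  # fixed seed so the example is reproducible
G = nx.Graph()
# Random "distances" from main node 0 to nodes 1 and 2 (range is arbitrary).
G.add_weighted_edges_from([(0, 1, rng.uniform(1, 10)), (0, 2, rng.uniform(1, 10))])

# Kamada-Kawai tries to place nodes so that drawn distances are
# proportional to weighted shortest-path lengths.
pos = nx.kamada_kawai_layout(G, weight="weight")

d01 = float(np.linalg.norm(pos[0] - pos[1]))
d02 = float(np.linalg.norm(pos[0] - pos[2]))
```

nx.draw(G, pos, with_labels=True) then draws the nodes at those positions, so the longer edge is drawn longer on every run with the same weights.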
Hope it helps... :)

Python networkx node spacing

I have 1000 different names, each constituting a node. Each name can be connected to anywhere from 0 to 1000 other names, any number of times. I would like to graph it so that the distance between two nodes is inversely proportional to the number of times they are connected.
Example:
'node1' : ['node2','node2','node2','node2','node2','node3']
'node2' : ['node1','node1','node1','node1','node1']
'node3' : ['node1']
node1 and node2 should huddle together, and node3 should be further away.
Is that possible? Currently I'm graphing using the following code:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_nodes_from(grapharr.keys())
for k in grapharr:
    for j in grapharr[k]:
        G.add_edge(k, j)
nx.draw_networkx(G, **options)
grapharr is a dict whose keys are nodes and whose values are arrays containing the connections for that node.
It is impossible in the general case. Look at this graph (image not included):
Imagine that the central node has a thousand connections to each of the other nodes, but the 'square' nodes have only one connection between them. How would you draw them?
Anyway, you can use the connection count as the edge weight and use a force-directed layout, which will try to produce a good layout (though not a guaranteed optimal one, of course). In networkx, the options include:
spring_layout
draw_spring
graphviz_layout (from nx.nx_agraph or nx.nx_pydot) with the prog='neato' parameter
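Following that suggestion, the repeated connections can be counted into a weight attribute; note that nx.Graph keeps only one edge per pair, so the original loop silently drops the repeats. This is a minimal sketch, assuming that counting every listed connection once is an acceptable measure (so node1-node2 gets 5 + 5 = 10). In spring_layout a larger weight means a stronger pull, so heavily connected nodes tend to end up closer together.

```python
from collections import Counter

import networkx as nx

grapharr = {
    "node1": ["node2"] * 5 + ["node3"],
    "node2": ["node1"] * 5,
    "node3": ["node1"],
}

counts = Counter()
for k, neighbours in grapharr.items():
    for j in neighbours:
        # Sort the pair so (a, b) and (b, a) count toward the same edge.
        counts[tuple(sorted((k, j)))] += 1

G = nx.Graph()
G.add_nodes_from(grapharr)
G.add_weighted_edges_from((u, v, c) for (u, v), c in counts.items())

# Force-directed layout; a fixed seed makes the picture reproducible.
pos = nx.spring_layout(G, weight="weight", seed=42)
```

nx.draw_networkx(G, pos, **options) then uses these positions. The result only approximates inverse proportionality, for the reason given above.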

How to generate a random network but keep the original node degree using networkx?

I have a network. How can I generate a random network with networkx while ensuring that each node keeps the same degree as in the original network? My first thought was to take the adjacency matrix and randomly shuffle each row, but that is somewhat complex: e.g., I'd need to avoid self-loops (which don't occur in the original network) and relabel the nodes. Thanks!
I believe what you're looking for is expected_degree_graph. It generates a random graph based on a sequence of expected degrees, where each degree in the list corresponds to a node. It also even includes an option to disallow self-loops!
You can get a list of degrees using networkx.degree. Here's an example of how you would use them together in networkx 2.0+ (degree works slightly differently in 1.x):
import networkx as nx
from networkx.generators.degree_seq import expected_degree_graph
N,P = 3, 0.5
G = nx.generators.random_graphs.gnp_random_graph(N, P)
G2 = expected_degree_graph([deg for (_, deg) in G.degree()], selfloops=False)
Note that expected_degree_graph does not guarantee each node's exact degree; as the name implies, it is probabilistic and matches each degree only in expectation. If you want something more exact, you can use configuration_model; however, it does not protect against parallel edges or self-loops, so you'd need to prune those out and replace the edges yourself.
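A minimal sketch of the configuration_model route, using a small random graph as a stand-in for the real network. For comparison it also shows nx.double_edge_swap, a different technique swapped in here (the answer does not mention it): it rewires a copy of the graph in place, keeps every node's degree exactly, and does not create parallel edges or self-loops. The graph size, seeds and swap counts are arbitrary assumptions.

```python
import networkx as nx

G = nx.gnp_random_graph(20, 0.2, seed=1)  # stand-in for the original network
degrees = [d for _, d in G.degree()]      # nodes are 0..19, so order matches

# configuration_model reproduces the degree sequence exactly, but as a
# MultiGraph that may contain parallel edges and self-loops.
M = nx.configuration_model(degrees, seed=1)
R = nx.Graph(M)                                   # collapse parallel edges
R.remove_edges_from(list(nx.selfloop_edges(R)))  # drop self-loops
# After pruning, some nodes may have a lower degree than in G.

# Alternative: degree-preserving rewiring of a copy of G.
S = G.copy()
m = S.number_of_edges()
nx.double_edge_swap(S, nswap=2 * m, max_tries=100 * m, seed=1)
```

The pruned R is simple but may have lost a few edges; S keeps the degree sequence exactly, at the cost of needing enough swaps to be well randomized.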

python: using networkX on biological networks for directional edges

As the title says, I'm using networkX to represent some cell networks in Python.
The network is at the bottom of this post since it's a large image.
The reason I'm doing this is that some of these nodes are considered "input" and some "output", and I need to calculate the number of signal paths (the number of paths from input to output) that each node participates in. However, I don't think networkX offers edge directionality, which I believe is needed to calculate signal paths for nodes.
Does anyone know whether it's possible to add direction to edges in networkX, or whether signal paths can be calculated without directionality?
Here's the code I wrote up until I realized I needed directional edges:
import networkx as nx
import matplotlib.pyplot as plt
G=nx.Graph()
molecules = ["CD40L", "CD40", "NF-kB", "XBP1", "Pax5", "Bach2", "Irf4", "IL-4",
"IL-4R", "STAT6", "AID", "Blimp1", "Bcl6", "ERK", "BCR", "STAT3", "Ag", "STAT5",
"IL-21R", "IL-21", "IL-2", "IL-2R"]
Bcl6_edges = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5_edges = [("STAT5", "Bcl6")]
edges = Bcl6_edges + STAT5_edges
G.add_nodes_from(molecules)
G.add_edges_from(edges)
Try G = nx.DiGraph() for a directed graph. With a DiGraph, each (u, v) tuple passed to add_edges_from is an edge from u to v.
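With a DiGraph, directed path queries become available. This is a minimal sketch of counting, for each node, how many input-to-output signal paths pass through it, using nx.all_simple_paths. The question does not say which molecules are inputs and outputs, so STAT5 as input and Blimp1/Irf4 as outputs are purely hypothetical, as is the helper name `signal_path_counts`; path endpoints are counted as participating. The number of simple paths can grow very quickly in larger, denser networks.

```python
from collections import Counter

import networkx as nx

G = nx.DiGraph()
Bcl6_edges = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5_edges = [("STAT5", "Bcl6")]
G.add_edges_from(Bcl6_edges + STAT5_edges)


def signal_path_counts(G, inputs, outputs):
    """Count, per node, the input -> output simple paths it lies on."""
    counts = Counter()
    for s in inputs:
        for t in outputs:
            if s == t or s not in G or t not in G:
                continue
            for path in nx.all_simple_paths(G, s, t):
                counts.update(path)  # a simple path never repeats a node
    return counts


forward = signal_path_counts(G, inputs=["STAT5"], outputs=["Blimp1", "Irf4"])
backward = signal_path_counts(G, inputs=["Blimp1"], outputs=["STAT5"])
```

backward is empty because no edges lead from Blimp1 back to STAT5; on an undirected nx.Graph the same query would find a path, which is why direction matters here.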
