Graph Visualization in NetworkX. Is this loop under node 10 ok? - python

I am making a graph visualization with NetworkX, and I found a self-loop around node 10. Of all the graph visualizations I have seen, I have never come across such a thing.
I don't know if this is wrong or right, but Mr Stark, I don't feel good about this. Can somebody help me out?
I tried modifying the dataframe that I used to make this graph, but I can't figure it out yet.
import dgl as dgl
import networkx as nx
curr = coun_df['Currencies'].to_numpy()
loca = coun_df['Location'].to_numpy()
g = dgl.graph((loca, curr))
print(g)
nxgraph = g.to_networkx().to_undirected()
pos = nx.spring_layout(nxgraph)
nx.draw(nxgraph, pos, with_labels=True, node_color=[[.7, .7, .7]])
And this is what the dataframe looks like (both Currencies and Location are actual categorical variables, one-hot encoded to look like this):
Dataframe
This is the image of the graph.
I can see currency 10 being mapped to location 10, but I wonder if self-loops are possible in graphs.

Yes, a network with loops in it may validly be called a "graph". As the wiki page indicates, there are multiple conventions as to what exactly constitutes a "graph"; which one applies can typically be inferred from context.
If you want to remove the loops, the easiest way is to drop the rows where the Currencies and Location entries match in the dataframe from which the data is pulled. For instance,
import dgl as dgl
import networkx as nx
noloop_df = coun_df[coun_df['Currencies']!=coun_df['Location']]
curr = noloop_df['Currencies'].to_numpy()
loca = noloop_df['Location'].to_numpy()
g = dgl.graph((loca, curr))
print(g)
nxgraph = g.to_networkx().to_undirected()
pos = nx.spring_layout(nxgraph)
nx.draw(nxgraph, pos, with_labels=True, node_color=[[.7, .7, .7]])
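If the graph has already been built, the self-loops can also be inspected and removed on the NetworkX side. The following is a minimal sketch on a made-up toy graph (the node numbers are illustrative, not the data from the question), using nx.selfloop_edges:

```python
import networkx as nx

# Toy undirected graph; the edge (10, 10) is a self-loop.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 10), (10, 10)])

# Collect the self-loops into a list before touching the graph:
# removing edges while iterating over the generator is unsafe.
loops = list(nx.selfloop_edges(G))
print(loops)

# Drop them; all other edges are left untouched.
G.remove_edges_from(loops)
print(nx.number_of_selfloops(G))
```

Whether to remove the loop depends on the data: if "currency 10 is used in location 10" is meaningful, keeping it is legitimate.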

Related

Can you specify a bidirectional edge in a NetworkX digraph?

I'd like to be able to draw a NetworkX graph connecting characters from the movie "Love, Actually" (because it's that time of the year in this country), and specifying how each character "relates" to the other in the story.
Certain relationships between characters are unidirectional - e.g. Mark is in love with Juliet, but not the reverse. However, Mark is best friends with Peter, and Peter is best friends with Mark - this is a bidirectional relationship. Ditto Peter and Juliet being married to each other.
I'd like to specify both kinds of relationships. Using a NetworkX digraph in Python, I seem to have a problem: to specify a bidirectional edge between two nodes, I apparently have to provide the same link twice, which will subsequently create two arrows between two nodes.
What I'd really like is a single arrow connecting two nodes, with heads pointing both ways. I'm using NetworkX to create the graph, and pyvis.Network to render it in HTML.
Here is the code so far, which loads a CSV specifying the nodes and edges to create in the graph.
import networkx as nx
import csv
from pyvis.network import Network
dg = nx.DiGraph()
with open("rels.txt", "r") as fh:
    reader = csv.reader(fh)
    for row in reader:
        if len(row) != 3:
            continue  # Quick check for malformed csv input
        dg.add_edge(row[0], row[1], label=row[2])
nt = Network('500px', '800px', directed=True)
nt.from_nx(dg)
nt.show('nx.html', True)
Here is the CSV, which can be read as "Node1", "Node2", "Edge label":
Mark,Juliet,in love with
Mark,Peter,best friends
Peter,Mark,best friends
Juliet,Peter,married
Peter,Juliet,married
And the resulting image:
Whereas what I'd really like the graph to look like is this:
(Thank you to this site for the wonderful graph tool for the above visualisation)
Is there a way to achieve the above visualisation using NetworkX and Pyvis? I wasn't able to find any documentation on ways to create bidirectional edges in a directed graph.
Read the CSV into pandas, create a DiGraph, and plot it. NetworkX has quite comprehensive documentation on plotting. Here is what I came up with:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
df = pd.DataFrame({'Source': ['Mark', 'Mark', 'Peter', 'Juliet', 'Peter'], 'Target': ['Juliet', 'Peter', 'Mark', 'Peter', 'Juliet'], 'Status': ['in love with', 'best friends', 'best friends', 'married', 'married']})
# Create graph
g = nx.from_pandas_edgelist(df, 'Source', 'Target', ['Status'], create_using=nx.DiGraph())
pos = nx.spring_layout(g)
nx.draw(g, pos, with_labels=True)
# Map each (source, target) pair to its relationship label
edge_labels = {(n1, n2): d['Status'] for n1, n2, d in g.edges(data=True)}
nx.draw_networkx_edge_labels(g,
    pos, edge_labels=edge_labels,
    label_pos=0.5,
    font_color='red',
    font_size=7,
    font_weight='bold',
    verticalalignment='bottom')
plt.show()
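The code above still stores each mutual relationship as two separate edges. One possible route to a single double-headed arrow, sketched here under assumptions, is to collapse every reciprocal pair that carries the same label into one edge and tag it with an arrows attribute. The helper name collapse_mutual_edges is hypothetical, and the value "to, from" assumes that pyvis forwards edge attributes to vis.js unchanged when nt.from_nx() is called; that part is not verified here and should be checked against your pyvis version.

```python
import networkx as nx

def collapse_mutual_edges(dg):
    """Return a DiGraph in which each reciprocal pair with equal labels is kept once.

    Hypothetical helper: the surviving edge gets arrows="to, from", which is
    assumed to be passed through to the renderer as a both-ways arrow.
    """
    out = nx.DiGraph()
    out.add_nodes_from(dg.nodes(data=True))
    for u, v, data in dg.edges(data=True):
        mutual = dg.has_edge(v, u) and dg[v][u].get("label") == data.get("label")
        if mutual:
            if out.has_edge(v, u):
                continue  # the pair is already represented by the reverse edge
            out.add_edge(u, v, arrows="to, from", **data)
        else:
            out.add_edge(u, v, **data)
    return out

# The relationships from the CSV in the question.
dg = nx.DiGraph()
dg.add_edge("Mark", "Juliet", label="in love with")
dg.add_edge("Mark", "Peter", label="best friends")
dg.add_edge("Peter", "Mark", label="best friends")
dg.add_edge("Juliet", "Peter", label="married")
dg.add_edge("Peter", "Juliet", label="married")

out = collapse_mutual_edges(dg)
for u, v, data in out.edges(data=True):
    print(u, v, data)
```

The collapsed graph could then be handed to nt.from_nx(out) in place of dg. Pairs whose labels differ in the two directions are deliberately left as two edges, since one label could not describe both.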

Network graph CircosPlot function: couldn't get the node labeling and node size variation to work

I would appreciate any help I can get here. I am using the CircosPlot function with NetworkX in Python to generate a network graph. The graph looks OK, but I am having trouble labeling the nodes and varying the node size. My spreadsheet has the following columns: "Model", "Node_Size", "Category", "Factors", "Edge_width". Please find the Python code below, thanks.
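import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms import bipartite
from nxviz import CircosPlot  # assumption: CircosPlot comes from the nxviz package
# df is the spreadsheet loaded into a pandas DataFrame
G = nx.from_pandas_edgelist(df,
    source="Model",
    target="Category",
    edge_attr=["edge_size"],
    create_using=nx.MultiGraph(),
)
bottom_nodes, top_nodes = bipartite.sets(G)
# Use a separate name so that the bipartite module is not overwritten
node_colors = bipartite.color(G)
nx.set_node_attributes(G, node_colors, 'Model')
c = CircosPlot(G, node_order='Model', node_grouping='Model', node_color='Model')
plt.show()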

I can't form a graph with networkx based on three criteria

I'm new to Python. Please help me solve a problem with graph construction. I have a database with the attributes "Source", "Interlocutor" and "Frequency".
An example of three lines:
I need to build a graph based on the Source-Interlocutor pairs, with the frequency also taken into account.
Like this:
My code:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
dic_values = {"Source": [24120.0, 24120.0, 24120.0], "Interlocutor": [34, 34, 34], "Frequency": [446625000, 442475000, 445300000]}
session_graph = pd.DataFrame(dic_values)
friquency = session_graph['Frequency'].unique()
plt.figure(figsize=(10, 10))
# Draw one graph per distinct frequency value
for i in range(len(friquency)):
    df_friq = session_graph[session_graph['Frequency'] == friquency[i]]
    G_frique = nx.from_pandas_edgelist(df_friq, source='Source', target='Interlocutor')
    pos = nx.spring_layout(G_frique)
    nx.draw_networkx_nodes(G_frique, pos, cmap=plt.get_cmap('jet'), node_size=20)
    nx.draw_networkx_edges(G_frique, pos, arrows=True)
    nx.draw_networkx_labels(G_frique, pos)
plt.show()
And this is what I get:
Your problem requires a MultiGraph:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import pydot
from IPython.display import Image
dic_values = {"Source":[24120.0,24120.0,24120.0], "Interlocutor":[34,34,34],
"Frequency":[446625000, 442475000, 445300000]}
session_graph = pd.DataFrame(dic_values)
sources = session_graph['Source'].unique()
targets = session_graph['Interlocutor'].unique()
# Create a MultiDiGraph and add the unique nodes
G = nx.MultiDiGraph()
for nodes in (sources, targets):
    G.add_nodes_from(nodes)
# Add edges; multiple connections between the same pair of nodes are okay,
# because a multigraph gives each parallel edge its own key.
# itertuples() is a faster way to iterate through a pandas DataFrame. Add one edge per row.
for row in session_graph.itertuples():
    # row[0] is the index; row[1], row[2], row[3] are Source, Interlocutor, Frequency
    G.add_edge(row[1], row[2], label=row[3])
#Now, render it to a file...
p=nx.drawing.nx_pydot.to_pydot(G)
p.write_png('multi.png')
Image(filename='multi.png') #optional
This will produce the following:
Please note that node layouts are trickier when you use Graphviz/Pydot.
For example, check this SO answer. I hope this helps you move forward. And welcome to SO.
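To see why a multigraph is needed for this data, here is a small sketch that feeds the three rows from the question into both a DiGraph and a MultiDiGraph. The variable names simple and multi are only illustrative.

```python
import networkx as nx

# The three rows from the question: (Source, Interlocutor, Frequency).
rows = [
    (24120.0, 34, 446625000),
    (24120.0, 34, 442475000),
    (24120.0, 34, 445300000),
]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for source, interlocutor, frequency in rows:
    # In a DiGraph, re-adding an existing edge only updates its attributes.
    simple.add_edge(source, interlocutor, label=frequency)
    # In a MultiDiGraph, every call creates a new parallel edge.
    multi.add_edge(source, interlocutor, label=frequency)

print(simple.number_of_edges())  # the three rows collapse into a single edge
print(multi.number_of_edges())   # one edge per row

# The frequency of every parallel edge is preserved in the multigraph.
labels = sorted(data["label"] for _, _, data in multi.edges(data=True))
print(labels)
```

In the plain DiGraph only the last frequency survives as the edge label, which matches the single unlabeled connection seen in the question.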

NetworkX Minimum Spanning Tree has different cluster arrangement with the same data?

I have a large dataset which compares products with a relatedness measure which looks like this:
product1 product2 relatedness
0101 0102 0.047619
0101 0103 0.023810
0101 0104 0.095238
0101 0105 0.214286
0101 0106 0.047619
... ... ...
I used the following code to feed the data into the NetworkX graphing tool and produce an MST diagram:
import networkx as nx
import matplotlib.pyplot as plt
products = (data['product1'])
products = list(dict.fromkeys(products))
products = sorted(products)
G = nx.Graph()
G.add_nodes_from(products)
print(G.number_of_nodes())
print(G.nodes())
row = 0
for c in data['product1']:
    p = data['product2'][row]
    w = data['relatedness'][row]
    if w > 0:
        G.add_edge(c, p, weight=w, with_labels=True)
    row = row + 1
nx.draw(nx.minimum_spanning_tree(G), with_labels=True)
plt.show()
The resulting diagram looks like this: https://i.imgur.com/pBbcPGc.jpg
However, when I re-run the code, with the same data and no modifications, the arrangement of the clusters appears to change, so it then looks different, example here: https://i.imgur.com/4phvFGz.jpg, second example here: https://i.imgur.com/f2YepVx.jpg. The clusters, edges, and weights do not appear to be changing, but the arrangement of them on the graph space is changing each time.
What causes the arrangement of the nodes to change each time without any changes to the code or data? How can I re-write this code to produce a network diagram with approximately the same arrangement of nodes and edges for the same data each time?
By default, nx.draw uses spring_layout (link to the doc). This layout implements the Fruchterman-Reingold force-directed algorithm, which starts from random initial positions. That random start is why the arrangement differs between runs.
If you want to "fix" the positions, you should explicitly call the spring_layout function and specify the initial positions in the pos argument.
Assign G = nx.minimum_spanning_tree(G) for purpose of clarity. Then
nx.draw(G, with_labels=True)
is equivalent to
pos = nx.spring_layout(G)
nx.draw(G, pos=pos, with_labels=True)
Since you don't want pos to be recalculated randomly every time you run your script, one way to keep pos stable is to store it once and load it from a file on each rerun. You can put this script, which computes pos only when no stored copy exists, before nx.draw(G, pos=pos, with_labels=True):
import os, json
def store(pos):
    # convert NumPy arrays to lists so that pos can be written as JSON
    return {k: v.tolist() for k, v in pos.items()}
def retrieve(pos):
    # JSON turns keys into strings; convert them back (here: to float)
    return {float(k): v for k, v in pos.items()}
if 'pos.txt' in os.listdir():
    with open('pos.txt') as infile:
        pos = retrieve(json.load(infile))  # retrieving dictionary from file
    print('retrieve', pos)
else:
    pos = nx.spring_layout(G)  # calculates pos
    print('store', pos)
    with open('pos.txt', 'w') as outfile:
        json.dump(store(pos), outfile, indent=4)  # records pos dictionary into file
This is an ugly solution because it depends entirely on the data types used in the pos dictionary. It worked for me, but you might need to define your own conversions in store and retrieve.
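Besides supplying initial positions or caching them in a file, spring_layout also accepts a seed argument that fixes its random number generator. The sketch below uses only the five sample rows shown in the question (not the full dataset), and the seed value 42 is arbitrary:

```python
import networkx as nx

# The five sample rows from the question: (product1, product2, relatedness).
G = nx.Graph()
G.add_weighted_edges_from([
    ("0101", "0102", 0.047619),
    ("0101", "0103", 0.023810),
    ("0101", "0104", 0.095238),
    ("0101", "0105", 0.214286),
    ("0101", "0106", 0.047619),
])
mst = nx.minimum_spanning_tree(G)

# The same seed gives the same starting positions, hence the same layout.
pos_a = nx.spring_layout(mst, seed=42)
pos_b = nx.spring_layout(mst, seed=42)

same = all((pos_a[node] == pos_b[node]).all() for node in mst.nodes)
print(same)

# To draw with the reproducible layout:
# nx.draw(mst, pos=pos_a, with_labels=True)
```

The layout stays stable across runs only while the graph and the seed stay the same. Adding or removing nodes will still move things around.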

Create a networkx color_map from OrderedDict elements

The goal is to create a networkx graph based on eigenvalue centrality and highlight the top five nodes of highest centrality measure with a different node color.
Below is my current script. It's working fine and everything's just set to one color for now.
#import packages
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import collections
#read data into nx graph
G = nx.from_pandas_edgelist(df, 'col1', 'col2')  # from_pandas_dataframe in older NetworkX releases
#get dictionary of centrality measures and order it
eig = nx.eigenvector_centrality(G)
ordered = collections.OrderedDict(sorted(eig.items(), reverse = True, key=lambda t: t[1]))
#draw graph
nx.draw_spring(G, k=1, node_color = 'skyblue',
node_size = 200, font_size = 6, with_labels = True)
plt.show()
Below is what I'm experimenting with for the node coloring. I'm attempting to append the first five ordered dictionary key names to the color_map, setting them to a different color from the rest. Please let me know if you have any suggestions here or if another method would be simpler. If possible, I'd prefer to stick to the packages I'm using.
#adjust color of top five
color_map = []
for key, value in ordered:
    if key < 5:
        color_map.append('blue')
    else:
        color_map.append('green')
Figured it out. There's no need to create a separate OrderedDict. Applying the key=eig.get argument to sorted() allows you to sort a dictionary by value (the biggest obstacle for this problem). Then I can just filter this and apply it to cmap.
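A minimal sketch of that idea follows. The helper name top_n_color_map and the centrality values are made up for illustration; in the question, the scores would come from nx.eigenvector_centrality(G) and the nodes from G.nodes().

```python
def top_n_color_map(nodes, scores, n=5, top_color="blue", other_color="green"):
    """Return one colour per node, in node order, highlighting the n highest scores."""
    # Sorting the dict by key=scores.get orders the node names by their value.
    top = set(sorted(scores, key=scores.get, reverse=True)[:n])
    return [top_color if node in top else other_color for node in nodes]

# Hypothetical centrality values standing in for nx.eigenvector_centrality(G).
eig = {"a": 0.10, "b": 0.45, "c": 0.30, "d": 0.05, "e": 0.25, "f": 0.40, "g": 0.20}

color_map = top_n_color_map(list(eig), eig, n=5)
print(color_map)
```

The list must follow the order of G.nodes(), because the drawing functions match node_color entries to nodes by position. The result could then be passed as node_color=color_map.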
Not a bad question, but I am pretty sure it's a duplicate. Either way, I'll try to answer it. You will have to forgive me for being vague, because I can't run your code and I haven't tried to write an example of your problem. I will update my answer when I can run an example of your code later. For now, I believe changing the node colour is similar to changing the edge colour. Going from that, you need to focus on this line of your code to change the node colour:
#draw graph
nx.draw_spring(G, k=1, node_color = 'skyblue',
node_size = 200, font_size = 6, with_labels = True)
and this is an example of how you could do this:
# Depending on how your graph is being read from the pandas
# dataframe, this should pull out the correct value to associate
# the colour with
nodes = G.nodes()
# Just a list to hold all the node colours
node_colours = []
for node in nodes:
    if node < 5:
        node_colours.append('blue')
    else:
        node_colours.append('green')
# draw graph, substituting the single colour with your list of colours
nx.draw_spring(G, k=1, node_color=node_colours,
    node_size=200, font_size=6, with_labels=True)
I hope this helps! I will get back to it later when I test an example of this on my own machine.
-Edit-
Since the OP has answered their own question, I will not elaborate further on my example. However, if anyone is interested in my example, or finds that it throws an error and wants me to fix and expand on it, just leave me a comment.
Tiz
