Building a graph from a list or dict - Python

I have an example list of word pairs like this:
[['rowerowy', 'rower'],
 ['rowerzysta', 'rower'],
 ['domeczek', 'domek'],
 ['domek', 'dom'],
 ['rowerzystka', 'rowerzysta']]
and I need to join the words into dependency groups, that is, build a graph of the connections between the forms:
rowerowy --> rower <-- rowerzysta <-- rowerzystka
domeczek --> domek --> dom
A pair that is not attached to any other pair in the relation forms a graph with one edge.
Any ideas?
I am building a dictionary:
data = [['rowerowy', 'rower'],
        ['rowerzysta', 'rower'],
        ['domeczek', 'domek'],
        ['domek', 'dom'],
        ['rowerzystka', 'rowerzysta']]
dc = {}
for a in data:
    if a[1] in dc:
        dc[a[1]].append(a[0])
    else:
        dc[a[1]] = [a[0]]
Output:
{'rower': ['rowerowy', 'rowerzysta'],
 'domek': ['domeczek'],
 'dom': ['domek'],
 'rowerzysta': ['rowerzystka']}
or:
def maketree(source):
    graph = {}
    for pair in source:
        nodein, nodeout = pair
        if nodeout in graph:
            graph[nodeout].add(nodein)
        else:
            graph[nodeout] = {nodein}
    # roots: keys that never appear in any value set
    graph[None] = set(graph.keys()).difference(set.union(*graph.values()))
    return graph
The second approach is better because the set values drop duplicate words.
How can I draw this graphically?

Here is a solution using the networkx library:
import matplotlib.pyplot as plt
import networkx as nx
data = [['rowerowy', 'rower'],
        ['rowerzysta', 'rower'],
        ['domeczek', 'domek'],
        ['domek', 'dom'],
        ['rowerzystka', 'rowerzysta']]
# instantiate graph
G = nx.Graph()
# add word connections as edges
G.add_edges_from(data)
# draw the graph with node labels (drawing requires matplotlib)
nx.draw_networkx(G)
plt.show()
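The arrows in the question are directed, so a directed graph may be closer to what was asked. The following is a minimal sketch, assuming the first word of each pair is the derived form and the second is its base. The names `groups` and `roots` are illustrative and not part of the original answer.

```python
import networkx as nx

data = [['rowerowy', 'rower'],
        ['rowerzysta', 'rower'],
        ['domeczek', 'domek'],
        ['domek', 'dom'],
        ['rowerzystka', 'rowerzysta']]

# Directed graph: each edge points from the derived word to its base word
G = nx.DiGraph()
G.add_edges_from(data)

# Ignoring edge direction, each connected piece is one dependency group
groups = [set(c) for c in nx.weakly_connected_components(G)]

# Base words: nodes with no outgoing edge (they derive from nothing else)
roots = [n for n in G if G.out_degree(n) == 0]

print(groups)
print(roots)
```

Passing this `G` to `nx.draw_networkx` draws the edges as arrows.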

Related

How to make a tree network graph in python with networkx (algorithm question)?

I have a list with 7 elements:
elements = ['A','B','C','D','E','F','G']
and I want to make a network starting from A in the center, and going down all the possible paths to each other element, so it would look something like this:
## A->[B1,C1,D1,E1,F1,G1]
## B1 -> [C2,D2,E2,F2,G2] for each of [B1,C1,...,G1]
## C2 -> [D3,E3,F3,G3] for each of the above, etc.
I'm currently trying this with the networkx package. The end goal is to make it a circular tree graph, and maybe color the nodes by letter or assign some weights.
For now, though, I'm stuck on the problem above. I've tried many things and it feels like it shouldn't be too tough, but I'm not very experienced with this kind of algorithm problem. It looks like it should be some kind of recursion. Here's one of the things I've tried, if it helps:
import networkx as nx

def add_edges(network, edge_list, i, previous_ele):
    edge_list1 = edge_list.copy()
    for ele in edge_list:
        network.add_edge(previous_ele+str(i), ele+str(i+1))
        edge_list1.remove(ele)
        add_edges(network, edge_list1, i+1, ele)

N = nx.DiGraph()
elements = ['A','B','C','D','E','F','G']
elements.remove('A')
for ele in elements:
    N.add_edge('A', ele+'1')
for i in range(len(elements)):
    add_edges(N, elements, 1, elements[i])
Thanks in advance for any tips!
I'm not sure I completely understood what you're trying to do, but here's a script that implements my best guess. The same script works for any number of elements, but with more elements the graph becomes hard to read.
import networkx as nx
from itertools import combinations
elements = ['A','B','C','D']
G = nx.DiGraph()
for c in elements[1:]:
    G.add_edge(elements[0], c+'1')
for i in range(1, len(elements)-1):
    for a, b in combinations(elements[i:], 2):
        G.add_edge(a+str(i), b+str(i+1))
The resulting graph, drawn using nx.draw(G, with_labels = True, node_color = "orange"):
As a bonus, here's a way to draw the graph so that the hierarchy is clear. After the first block of code, add the following:
pos = {elements[0]: (0,0)}
for i in range(1, len(elements)):
    for j in range(1, len(elements)):
        pos[elements[j]+str(i)] = (j-1, -i)
nx.draw(G, pos = pos, with_labels = True, node_color = "orange")
The resulting drawing for your original list of elements:

Create "short cut" aware graph in Python

Assume we have these sequences:
A->X->Y->Z
B->Y->Z
C->Y->Z
D->X->Z
I would like to create a graph like:
    C
    |
A-X-Y-Z
  | |
  D B
The sequence D->X->Z contains a short cut: X->Z skips Y. My goal is to create a directed acyclic graph by eliminating these short cuts and, conversely, by expanding an existing edge when a longer path between the same nodes turns up (e.g. replacing X-Z with X-Y-Z).
My approach so far was to create a directed graph with NetworkX, but this does not solve the problem because I could not find a way to eliminate the short cuts (it is a big graph with hundreds of thousands of nodes).
Any hints would be appreciated.
You can set up the graph:
import networkx as nx
text = '''
A-X-Y-Z
B-Y-Z
C-Y-Z
D-X-Z
'''
G = nx.Graph()
for s in text.strip().split('\n'):
    l = s.split('-')
    G.add_edges_from(zip(l, l[1:]))
Then use find_cycle and remove_edge repeatedly to identify and remove edges that form cycles:
while True:
    try:
        c = nx.find_cycle(G)
        print(f'found cycle: {c}')
        # drop the first edge of the cycle that was found
        G.remove_edge(*c[0])
    except nx.NetworkXNoCycle:
        break
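If the sequences are kept as directed edges, the short cuts are the edges that a transitive reduction drops. Here is a minimal sketch using `nx.transitive_reduction`, assuming the combined graph is a DAG. The names `sequences`, `reduced` and `removed` are illustrative, and the sketch says nothing about how this performs on hundreds of thousands of nodes.

```python
import networkx as nx

sequences = ["A-X-Y-Z", "B-Y-Z", "C-Y-Z", "D-X-Z"]

# Build a directed graph from consecutive pairs in each sequence
D = nx.DiGraph()
for s in sequences:
    nodes = s.split('-')
    D.add_edges_from(zip(nodes, nodes[1:]))

# transitive_reduction is only defined for directed acyclic graphs
if not nx.is_directed_acyclic_graph(D):
    raise ValueError("the combined sequences contain a directed cycle")

# Keep only edges that are not implied by a longer path
reduced = nx.transitive_reduction(D)
removed = set(D.edges) - set(reduced.edges)

print(sorted(reduced.edges))
print(removed)
```

In this example the only edge dropped is X->Z, because the path X->Y->Z already connects the two nodes.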

Problem with appending a graph object to lists for networkx in Python

I am trying to remove nodes at random from graphs using the networkx package. The first block describes the graph construction and the second block gives me the node lists that I have to remove from my graph H (20%, 50% and 70% removals). I want 3 versions of the base graph H in the end, in a list or any data structure. The code in block 3 gives me objects of type "None". The last block shows that it works for a single case.
I am guessing that the problem is in the append function, which somehow returns objects of type "None". I also feel that the base graph H might be getting altered after every iteration. Is there any way around this? Any help would be appreciated :)
import networkx as nx
import numpy as np
import random
# node removals from Graphs at random
# network construction
H = nx.Graph()
H.add_nodes_from([1,2,3,4,5,6,7,8,9,10])
H.add_edges_from([[1,2],[2,4],[5,6],[7,10],[1,5],[3,6]])
nx.info(H)
nodes_list = list(H.nodes)
# list of nodes to be removed
perc = [.20,.50,.70] # percentage of nodes to be removed
random_sample_list = []
for p in perc:
    interior_list = []
    random.seed(2) # for replicability
    sample = round(p*10)
    random_sample = random.sample(nodes_list, sample)
    interior_list.append(random_sample)
    random_sample_list.append(random_sample)
# applying the list of nodes to be removed to create a list of graphs - not working
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    graph_list.append(H1.remove_nodes_from(random_sample_list[i]))
# list access - works
H.remove_nodes_from(random_sample_list[1])
nx.info(H)
Final output should look like:
[Graph with 20% removed nodes, Graph with 50% removed nodes, Graph with 70% removed nodes] - e.g. a list
The function remove_nodes_from does not return the modified graph, but returns None. Consequently, you only need to create the graph with the desired percentage of your nodes and append it to the list:
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    H1.remove_nodes_from(random_sample_list[i])
    graph_list.append(H1)
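The same idea can be wrapped in a small helper so that the base graph is visibly left untouched. This is only a sketch: `without_nodes` is a hypothetical helper name, and the local `random.Random(2)` generator is an assumption standing in for the question's `random.seed(2)`.

```python
import random
import networkx as nx

H = nx.Graph()
H.add_nodes_from(range(1, 11))
H.add_edges_from([(1, 2), (2, 4), (5, 6), (7, 10), (1, 5), (3, 6)])

def without_nodes(graph, nodes):
    """Return a copy of graph with the given nodes removed (hypothetical helper)."""
    pruned = graph.copy()            # work on a copy so the input stays unchanged
    pruned.remove_nodes_from(nodes)  # modifies pruned in place and returns None
    return pruned

rng = random.Random(2)  # local generator, for replicability
fractions = [0.20, 0.50, 0.70]
samples = [rng.sample(list(H.nodes), round(p * H.number_of_nodes()))
           for p in fractions]

graph_list = [without_nodes(H, s) for s in samples]
sizes = [g.number_of_nodes() for g in graph_list]
print(sizes, H.number_of_nodes())
```

Because every removal happens on a copy, `H` still has all ten nodes afterwards.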

Calculate the longest path between two nodes NetworkX

I'm trying to make a Gantt chart using NetworkX. All the nodes in the network are "tasks" that need to be performed to complete the project. With NetworkX it is easy to calculate the total time of the project, but to make the Gantt chart I need the latest start of each node.
NetworkX includes one function (dag_longest_path_length), but this calculates the longest path in the whole network. Another function (astar_path_length) gives the shortest path between a source and a node, but no function is available that gives the longest path, or the latest start in my case. (If a node has two predecessors, it will take the fastest route, but in reality it also has to wait for the second one before it can start.)
I was thinking of one option: evaluating the previously attached nodes and selecting the longest path.
Unfortunately, I did not succeed.
start_time=[]
time=0
DD=nx.DiGraph()
for i in range(df.shape[0]):
    DD.add_edge(str(df.at[i,'blockT'])+'_'+df.at[i,'Task'], str(df.at[i,'blockS'])+'_'+df.at[i,'Succ'], weight=df.at[i,'duration'])
fig, ax = plt.subplots()
labels=[]
for i in range(df.shape[0]):
    labels.append(str(df.at[i,'blockT'])+'_'+df.at[i,'Task'])
    print(nx.astar_path_length(DD, '0_START', str(df.at[i,'blockT'])+'_'+df.at[i,'Task']))
    ax.broken_barh([(nx.astar_path_length(DD, '0_START', str(df.at[i,'blockT'])+'_'+df.at[i,'Task'], heuristic=None, weight='weight'), df.at[i,'duration'])], (i-0.4,0.8), facecolors='blue')
Here is some code that I use. I agree it really should be part of NetworkX, because it comes up pretty often for me. graph must be an acyclic DiGraph, s is the source node, and dist is a dict keyed by node whose values are the weighted longest-path distances from s.
def single_source_longest_dag_path_length(graph, s):
    assert(graph.in_degree(s) == 0)
    # nodes that cannot be reached from s keep a distance of -inf
    dist = dict.fromkeys(graph.nodes, -float('inf'))
    dist[s] = 0
    topo_order = nx.topological_sort(graph)
    for n in topo_order:
        # relax every outgoing edge, keeping the larger distance
        for succ in graph.successors(n):
            if dist[succ] < dist[n] + graph.edges[n,succ]['weight']:
                dist[succ] = dist[n] + graph.edges[n,succ]['weight']
    return dist
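To see what such a longest-distance table gives for a Gantt chart, here is a small self-contained sketch of the same topological-order technique. The task names and durations are made up, each edge weight is assumed to be the duration of the task the edge leaves (as in the question's add_edge call), and `longest_dist_from` is an illustrative name.

```python
import networkx as nx

def longest_dist_from(graph, source, weight="weight"):
    """Longest weighted distance from source to every node of a DAG."""
    dist = {n: float("-inf") for n in graph}
    dist[source] = 0
    for n in nx.topological_sort(graph):
        if dist[n] == float("-inf"):
            continue  # n is not reachable from source
        for succ in graph.successors(n):
            candidate = dist[n] + graph.edges[n, succ][weight]
            if candidate > dist[succ]:
                dist[succ] = candidate
    return dist

# Hypothetical project: c can only start once both a and b are finished
tasks = nx.DiGraph()
tasks.add_weighted_edges_from([
    ("START", "a", 0),
    ("START", "b", 0),
    ("a", "c", 3),    # a takes 3 time units
    ("b", "c", 5),    # b takes 5 time units
    ("c", "END", 2),  # c takes 2 time units
])

start = longest_dist_from(tasks, "START")
print(start)
```

Here `start["c"]` is 5 rather than 3, because c has to wait for the slower of its two predecessors.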
It looks like you are using DAGs.
Your problem is rather rare, so there is no built-in function for it in networkx. You can do it manually by taking the simple path with the most nodes:
max(nx.all_simple_paths(DAG, source, target), key=lambda x: len(x))
Here is the full testing code:
import networkx as nx
import random
from itertools import groupby
# Create random DAG
G = nx.gnp_random_graph(50,0.3,directed=True)
DAG = nx.DiGraph([(u,v) for (u,v) in G.edges() if u<v])
# Get the longest path from node 1 to node 10
max(nx.all_simple_paths(DAG, 1, 10), key=lambda x: len(x))
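Note that `key=len` picks the path with the most nodes, which is not necessarily the path with the largest total weight. The sketch below contrasts the two on a tiny hand-made DAG. The graph, its weights and the helper `total_weight` are assumptions for illustration, and enumerating all simple paths can become very slow on larger graphs.

```python
import networkx as nx

DAG = nx.DiGraph()
DAG.add_weighted_edges_from([
    (1, 2, 1),
    (2, 3, 1),
    (3, 4, 1),
    (1, 4, 10),  # a single heavy edge from 1 to 4
])

def total_weight(path):
    """Sum of the edge weights along a path given as a list of nodes."""
    return sum(DAG.edges[u, v]["weight"] for u, v in zip(path, path[1:]))

paths = list(nx.all_simple_paths(DAG, 1, 4))
most_nodes = max(paths, key=len)          # longest by number of nodes
heaviest = max(paths, key=total_weight)   # longest by total edge weight
print(most_nodes, heaviest)
```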

How to find all connected subgraph of a graph in networkx?

I'm developing a Python application, and I want to list all possible connected subgraphs of any size, starting from every node, using NetworkX.
I tried using combinations() from the itertools library to find all possible combinations of nodes, but it is far too slow because it also examines node sets that are not connected:
for r in range(0,NumberOfNodes):
    for SG in (G.subgraph(s) for s in combinations(G,r)):
        if (nx.is_connected(SG)):
            nx.draw(SG,with_labels=True)
            plt.show()
The output is correct, but I need a faster way to do this: for a graph of 50 nodes and 8 as LenghtTupleToFind there are up to 1 billion combinations of nodes (n! / r! / (n-r)!), and only a small fraction of them are connected subgraphs, which are the ones I am interested in. Is there a function that does this?
Sorry for my English, and thank you in advance.
EDIT:
As an example, the results I would like to have are:
[0]
[0,1]
[0,2]
[0,3]
[0,1,4]
[0,2,5]
[0,2,5,4]
[0,1,4,5]
[0,1,2,4,5]
[0,1,2,3]
[0,1,2,3,5]
[0,1,2,3,4]
[0,1,2,3,4,5]
[0,3,2]
[0,3,1]
[0,3,2]
[0,1,4,2]
and every other combination that generates a connected graph
I had the same requirements and ended up using this code, which is very close to what you were doing. This code yields exactly the output you asked for.
import networkx as nx
import itertools
G = you_graph
all_connected_subgraphs = []
# here we ask for all connected subgraphs that have at least 2 nodes AND have fewer nodes than the input graph
for nb_nodes in range(2, G.number_of_nodes()):
    for SG in (G.subgraph(selected_nodes) for selected_nodes in itertools.combinations(G, nb_nodes)):
        if nx.is_connected(SG):
            print(SG.nodes)
            all_connected_subgraphs.append(SG)
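I have modified Charly Empereur-mot's answer by using an ego graph to make it faster:
import networkx as nx
import itertools
G = you_graph.copy()
all_connected_subgraphs = []
# here we ask for all connected subgraphs that have nb_nodes nodes
# (nb_nodes must be set to the wanted subgraph size before this loop)
for n in you_graph.nodes():
    # only nodes within nb_nodes-1 steps of n can share a subgraph of that size with n
    egoG = nx.generators.ego_graph(G,n,radius=nb_nodes-1)
    # n is added to every combination, so leave it out of the candidates
    others = [m for m in egoG if m != n]
    for SG in (G.subgraph(sn+(n,)) for sn in itertools.combinations(others, nb_nodes-1)):
        if nx.is_connected(SG):
            # store a copy: subgraph views would change when n is removed from G below
            all_connected_subgraphs.append(SG.copy())
    # every subgraph containing n has been handled, so drop n
    G.remove_node(n)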
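Another way to avoid testing disconnected combinations is to grow connected node sets one neighbor at a time. The sketch below assumes an undirected graph and is not taken from the answers above. `connected_node_sets` is an illustrative name, and the number of sets can still grow very quickly with graph size.

```python
import networkx as nx

def connected_node_sets(G, max_size):
    """Return all node sets of size 1..max_size that induce a connected subgraph."""
    current = {frozenset([n]) for n in G}  # every single node is connected
    found = set(current)
    for _ in range(max_size - 1):
        grown = set()
        for nodes in current:
            for u in nodes:
                for v in G[u]:             # neighbors of a node already in the set
                    if v not in nodes:
                        grown.add(nodes | {v})
        found |= grown
        current = grown
    return found

P = nx.path_graph(4)  # 0 - 1 - 2 - 3
sets_up_to_3 = connected_node_sets(P, 3)
print(sorted(sorted(s) for s in sets_up_to_3))
```

Each connected set of size k+1 contains a connected set of size k plus one adjacent node, so no connected set is missed, and storing frozensets in a set removes duplicates reached by different growth orders.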
You might want to look into the connected_components function. It returns all sets of connected nodes, which you can then filter by size and node.
You can find all the connected components in O(V + E) time and memory, where V is the number of nodes and E the number of edges. Keep a seen boolean array, and run Depth-First Search (DFS) or Breadth-First Search (BFS) to find the connected components.
In my code, I used DFS to find the connected components.
seen = [False] * num_nodes
def dfs(node):
    component.append(node)
    seen[node] = True
    for neigh in G.neighbors(node):
        if not seen[neigh]:
            dfs(neigh)
all_subgraphs = []
# Assuming nodes are numbered 0, 1, ..., num_nodes - 1
for node in range(num_nodes):
    if seen[node]:
        continue  # node already belongs to a component found earlier
    component = []
    dfs(node)
    # Here `component` contains nodes in a connected component of G
    plot_graph(component) # or do anything
    all_subgraphs.append(component)
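For comparison, here is a short sketch of the built-in route with `nx.connected_components` and `nx.node_connected_component` on a small made-up graph. Connected components are the maximal connected subgraphs, so this returns far fewer results than the full enumeration asked for in the question.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (3, 4)])
G.add_node(5)  # an isolated node forms its own component

# One sorted node list per connected component
components = sorted(sorted(c) for c in nx.connected_components(G))

# Filter by size ...
large = [c for c in components if len(c) >= 2]

# ... or ask for the component that contains a given node
containing_3 = nx.node_connected_component(G, 3)

print(components, large, containing_3)
```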
