Create "short cut" aware graph in Python - python

Assume we have these sequences:
A->X->Y->Z
B->Y->Z
C->Y->Z
D->X->Z
I would like to create a graph like:
    C
    |
A-X-Y-Z
  | |
  D B
In the sequence D->X->Z, the edge X->Z is a shortcut that skips Y. My goal is to create a directed acyclic graph by eliminating such shortcuts and, conversely, by expanding existing edges when a longer path is encountered (e.g. replacing X-Z with X-Y-Z).
My approach so far was to create a directed graph with NetworkX, but this does not solve the problem because I could not find a way to eliminate the shortcuts (it is a big graph with hundreds of thousands of nodes).
Any hints would be appreciated.

You can set up the graph:
import networkx as nx
text = '''
A-X-Y-Z
B-Y-Z
C-Y-Z
D-X-Z
'''
G = nx.Graph()
for s in text.strip().split('\n'):
    l = s.split('-')
    G.add_edges_from(zip(l, l[1:]))
Then use find_cycle and remove_edge repeatedly to identify and remove edges that form cycles:
while True:
    try:
        c = nx.find_cycle(G)
        print(f'found cycle: {c}')
        # drop the first edge of the cycle that was found
        G.remove_edge(*c[0])
    except nx.NetworkXNoCycle:
        break
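If the sequences are kept as directed edges, removing shortcuts amounts to computing the transitive reduction of the DAG. The following is a minimal sketch of that alternative, not the approach of the answer above. It assumes the input really is acyclic (nx.transitive_reduction raises an error otherwise), and the variable names are illustrative. On a graph with hundreds of thousands of nodes the run time may be substantial, so it is worth trying on a sample first.

```python
import networkx as nx

# The four sequences from the question, as directed paths.
sequences = ["A-X-Y-Z", "B-Y-Z", "C-Y-Z", "D-X-Z"]

G = nx.DiGraph()
for seq in sequences:
    nodes = seq.split("-")
    # Consecutive nodes in a sequence become directed edges.
    G.add_edges_from(zip(nodes, nodes[1:]))

# The transitive reduction keeps an edge u->v only if there is
# no longer path from u to v, so the shortcut X->Z is dropped.
reduced = nx.transitive_reduction(G)

print(sorted(reduced.edges()))
```

Because the direct edge X->Z and the path X->Y->Z both end up in G before the reduction, the order in which the sequences are read does not matter.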

Related

Building a graph from a list or dict - Python

I have an example list of word lists like this:
[['rowerowy', 'rower'],
['rowerzysta', 'rower'],
['domeczek', 'domek'],
['domek', 'dom'],
['rowerzystka', 'rowerzysta']]
and I need to group the words by dependency, building a graph of the connections between the forms:
rowerowy --> rower <-- rowerzysta <--- rowerzystka
domeczek --> domek --> dom
Pairs that are not attached to any other pair in the relation form a graph with a single edge.
Any ideas?
I am building a dictionary:
data = [['rowerowy', 'rower'],
['rowerzysta', 'rower'],
['domeczek', 'domek'],
['domek', 'dom'],
['rowerzystka', 'rowerzysta']]
dc = {}
for a in data:
    if a[1] in dc:
        dc[a[1]].append(a[0])
    else:
        dc[a[1]] = [a[0]]
Output:
{'rower': ['rowerowy', 'rowerzysta'],
'domek': ['domeczek'],
'dom': ['domek'],
'rowerzysta': ['rowerzystka']}
or:
def maketree(source):
    graph = {}
    for pair in source:
        nodein, nodeout = pair
        if nodeout in graph:
            graph[nodeout].add(nodein)
        else:
            graph[nodeout] = {nodein}
    # roots: keys that never appear as a value of another key
    graph[None] = set(graph.keys()).difference(set.union(*graph.values()))
    return graph
The second approach is better because the sets remove duplicate words from the dictionary values.
How can I draw this graph?
Here is a solution using the networkx library:
import networkx as nx
data = [['rowerowy', 'rower'],
['rowerzysta', 'rower'],
['domeczek', 'domek'],
['domek', 'dom'],
['rowerzystka', 'rowerzysta']]
#instantiate graph
G = nx.Graph()
# add word connections as edges
G.add_edges_from(data)
nx.draw_networkx(G)
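Since the question draws the relations with arrows, a directed graph may be a closer fit than nx.Graph. The sketch below is an assumption about what is wanted rather than part of the answer above: it builds a DiGraph from the same pairs and uses weakly connected components to recover the dependency groups. The name `groups` is illustrative.

```python
import networkx as nx

data = [['rowerowy', 'rower'],
        ['rowerzysta', 'rower'],
        ['domeczek', 'domek'],
        ['domek', 'dom'],
        ['rowerzystka', 'rowerzysta']]

# Each pair [derived, base] becomes a directed edge derived -> base.
G = nx.DiGraph()
G.add_edges_from(data)

# Weakly connected components ignore edge direction, so each
# component is one family of related word forms.
groups = [sorted(component) for component in nx.weakly_connected_components(G)]

for group in groups:
    print(group)
```

Calling nx.draw_networkx(G) on this directed graph draws the edges with arrowheads, provided matplotlib is installed.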

Problem with appending a graph object to lists for networkx in Python

I am trying to remove nodes at random from graphs using the networkx package. The first block describes the graph construction and the second block gives me the node lists that I have to remove from my graph H (20%, 50% and 70% removals). I want 3 versions of the base graph H in the end, in a list or any data structure. The code in block 3 gives me objects of type "None". The last block shows that it works for a single case.
I am guessing that the problem is in the append function, which somehow returns objects of type "None". I also feel that the base graph H might be getting altered after every iteration. Is there any way around this? Any help would be appreciated :)
import networkx as nx
import numpy as np
import random
# node removals from Graphs at random
# network construction
H = nx.Graph()
H.add_nodes_from([1,2,3,4,5,6,7,8,9,10])
H.add_edges_from([[1,2],[2,4],[5,6],[7,10],[1,5],[3,6]])
nx.info(H)
nodes_list = list(H.nodes)
# list of nodes to be removed
perc = [.20,.50,.70] # percentage of nodes to be removed
random_sample_list = []
for p in perc:
    interior_list = []
    random.seed(2) # for replicability
    sample = round(p*10)
    random_sample = random.sample(nodes_list, sample)
    interior_list.append(random_sample)
    random_sample_list.append(random_sample)
# applying the list of nodes to be removed to create a list of graphs - not working
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    graph_list.append(H1.remove_nodes_from(random_sample_list[i]))
# list access - works
H.remove_nodes_from(random_sample_list[1])
nx.info(H)
Final output should look like:
[Graph with 20% removed nodes, Graph with 50% removed nodes, Graph with 70% removed nodes] - e.g. a list
The function remove_nodes_from does not return the modified graph, but returns None. Consequently, you only need to create the graph with the desired percentage of your nodes and append it to the list:
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    # remove_nodes_from modifies H1 in place and returns None
    H1.remove_nodes_from(random_sample_list[i])
    graph_list.append(H1)
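The same idea can be wrapped in a small helper so that the base graph H is never modified. This is only a sketch: the function name `remove_random_fraction` and its `seed` parameter are hypothetical, and it uses a private random.Random instance instead of reseeding the global generator.

```python
import random
import networkx as nx

H = nx.Graph()
H.add_nodes_from(range(1, 11))
H.add_edges_from([(1, 2), (2, 4), (5, 6), (7, 10), (1, 5), (3, 6)])

def remove_random_fraction(graph, fraction, seed=2):
    """Return a copy of `graph` with a random fraction of its nodes removed."""
    rng = random.Random(seed)  # local generator, for replicability
    count = round(fraction * graph.number_of_nodes())
    to_remove = rng.sample(list(graph.nodes), count)
    pruned = graph.copy()      # work on a copy so the input stays intact
    pruned.remove_nodes_from(to_remove)
    return pruned

graph_list = [remove_random_fraction(H, p) for p in (0.20, 0.50, 0.70)]

print([g.number_of_nodes() for g in graph_list])
print(H.number_of_nodes())  # the base graph is unchanged
```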

Improving BFS performance with some kind of memoization

I'm trying to build an algorithm that finds the distances from one vertex to all the others in a graph.
As a really simple example, say my network looks like this:
network = [[0,1,2],[2,3,4],[4,5,6],[6,7]]
I wrote BFS code that is supposed to find the lengths of the paths from a specified source to the graph's other vertices:
from itertools import chain
import numpy as np
n = 8
graph = {}
for i in range(0, n):
    graph[i] = []
# every vertex is adjacent to all other members of its group
for communes in network:
    for vertice in communes:
        work = communes.copy()
        work.remove(vertice)
        graph[vertice].append(work)
# flatten the per-group neighbour lists into one list per vertex
for k, v in graph.items():
    graph[k] = list(chain(*v))
def bsf3(graph, s):
    matrix = np.zeros([n,n])
    dist = {}
    visited = []
    queue = [s]
    dist[s] = 0
    visited.append(s)
    matrix[s][s] = 0
    while queue:
        v = queue.pop(0)
        for neighbour in graph[v]:
            if neighbour in visited:
                pass
            else:
                matrix[s][neighbour] = matrix[s][v] + 1
                queue.append(neighbour)
                visited.append(neighbour)
    return matrix
bsf3(graph,2)
First I create the graph (a dictionary) and then use the function to find the distances.
My concern is that this approach doesn't scale to larger networks (say, with 1000 people in them). I'm thinking about using some kind of memoization (which is why I made a matrix instead of a list). The idea is that when the algorithm calculates the path from, say, 0 to 3 (which it already does), it should also keep track of the other routes, so that matrix[1][3] = 1 etc.
Then, when I call the function as bsf3(graph, 1), it would not calculate everything from scratch but could reuse values already stored in the matrix.
Thanks in advance!
I know this does not fully answer your question, but here is another approach you can try.
In computer networks, each node keeps a routing table. For every destination in the network, the table simply stores which neighbouring node to go to next. An example routing table for node D:
A -> B
B -> B
C -> E
D -> D
E -> E
You need to run BFS from each node to build all the routing tables, which takes O(|V| * (|V| + |E|)) time. The space complexity is quadratic, but you have to check all possible paths anyway.
Once all this information is built, you can simply start from a node, look up your destination in its table, and find the next node to go to. This gives a better time complexity per query (if you use the right data structure for the table).
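A minimal sketch of the "run BFS from every node and store the result" idea follows. It stores distances rather than next hops, uses collections.deque so that popping from the queue is O(1), and uses the distance dictionary itself as the visited set. The helper names `build_adjacency` and `bfs_distances` are illustrative, not taken from the posts above.

```python
from collections import deque

def build_adjacency(groups):
    """Every vertex is adjacent to all other members of each group it is in."""
    adj = {}
    for group in groups:
        for v in group:
            adj.setdefault(v, set()).update(u for u in group if u != v)
    return adj

def bfs_distances(adj, source):
    """Return {vertex: hop count from source} for all reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # dict lookup doubles as the visited check
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

network = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7]]
adj = build_adjacency(network)

# One BFS per source; later queries are plain dictionary lookups.
all_dist = {s: bfs_distances(adj, s) for s in adj}

print(all_dist[2])
```

After the precomputation, a query such as all_dist[0][3] no longer touches the graph at all.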

How to find all connected subgraph of a graph in networkx?

I'm developing a Python application, and I want to list all possible connected subgraphs of any size, starting from every node, using NetworkX.
I tried using combinations() from the itertools library to generate all possible combinations of nodes, but it is far too slow because it also visits combinations of nodes that are not connected:
# start at 1: nx.is_connected raises an exception on an empty graph
for r in range(1, NumberOfNodes):
    for SG in (G.subgraph(s) for s in combinations(G, r)):
        if nx.is_connected(SG):
            nx.draw(SG, with_labels=True)
            plt.show()
The actual output is correct. But I need a faster way to do this, because for a graph of 50 nodes and 8 as LenghtTupleToFind there are up to 1 billion combinations of nodes (n! / r! / (n-r)!), and only a small fraction of them are connected subgraphs, which are the ones I am interested in. So, is there a function that does this?
Sorry for my English, and thank you in advance.
EDIT: as an example, these are the results I would like to have:
[0]
[0,1]
[0,2]
[0,3]
[0,1,4]
[0,2,5]
[0,2,5,4]
[0,1,4,5]
[0,1,2,4,5]
[0,1,2,3]
[0,1,2,3,5]
[0,1,2,3,4]
[0,1,2,3,4,5]
[0,3,2]
[0,3,1]
[0,3,2]
[0,1,4,2]
and every other combination that yields a connected graph
I had the same requirements and ended up using this code, which is very close to what you were doing. This code yields exactly the output you asked for.
import networkx as nx
import itertools
G = you_graph
all_connected_subgraphs = []
# here we ask for all connected subgraphs that have at least 2 nodes AND have less nodes than the input graph
for nb_nodes in range(2, G.number_of_nodes()):
    for SG in (G.subgraph(selected_nodes) for selected_nodes in itertools.combinations(G, nb_nodes)):
        if nx.is_connected(SG):
            print(SG.nodes)
            all_connected_subgraphs.append(SG)
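I have modified Charly Empereur-mot's answer by using an ego graph to make it faster:
import networkx as nx
import itertools
G = you_graph.copy()
nb_nodes = 3  # example value: size of the connected subgraphs to collect
all_connected_subgraphs = []
# here we ask for all connected subgraphs that have nb_nodes
for n in you_graph.nodes():
    # only nodes within nb_nodes-1 hops of n can share a connected subgraph of this size with n
    egoG = nx.generators.ego_graph(G, n, radius=nb_nodes-1)
    others = [m for m in egoG if m != n]
    for SG in (G.subgraph(sn + (n,)) for sn in itertools.combinations(others, nb_nodes-1)):
        if nx.is_connected(SG):
            all_connected_subgraphs.append(SG)
    # n is done: remove it so later iterations do not report the same subgraphs again
    G.remove_node(n)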
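Another way to avoid testing disconnected combinations is to grow node sets one neighbour at a time, so that every candidate is connected by construction. The sketch below is an alternative to the answers above, not a NetworkX built-in; the function name `connected_node_sets` and the `max_size` parameter are hypothetical. The number of connected subgraphs can still grow exponentially, so a size cap is assumed.

```python
import networkx as nx

def connected_node_sets(G, max_size):
    """Return all node sets of size 1..max_size that induce a connected subgraph."""
    found = set()
    # Level 1: every single node is trivially connected.
    frontier = {frozenset([n]) for n in G}
    while frontier:
        found |= frontier
        next_frontier = set()
        for nodes in frontier:
            if len(nodes) == max_size:
                continue
            # Extending a connected set by a neighbour keeps it connected.
            for v in nodes:
                for w in G[v]:
                    if w not in nodes:
                        next_frontier.add(nodes | {w})
        frontier = next_frontier - found
    return found

G = nx.path_graph(4)  # 0 - 1 - 2 - 3
result = connected_node_sets(G, 3)

for nodes in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(nodes))
```

Using frozensets lets the set type deduplicate node sets that are reached through different growth orders.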
You might want to look into the connected_components function. It returns all sets of connected nodes, which you can then filter by size and node.
You can find all the connected components in O(n + m) time (n nodes, m edges) with O(n) extra memory. Keep a boolean seen array, and run Depth-First Search (DFS) or Breadth-First Search (BFS) to find the connected components.
In my code, I used DFS to find the connected components.
seen = [False] * num_nodes
def dfs(node):
    component.append(node)
    seen[node] = True
    for neigh in G.neighbors(node):
        if not seen[neigh]:
            dfs(neigh)
all_subgraphs = []
# Assuming nodes are numbered 0, 1, ..., num_nodes - 1
for node in range(num_nodes):
    if seen[node]:
        continue  # already part of an earlier component
    component = []
    dfs(node)
    # Here `component` contains nodes in a connected component of G
    plot_graph(component) # or do anything
    all_subgraphs.append(component)
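For comparison, the built-in nx.connected_components does the same job as the hand-written DFS without recursion. A small sketch on a made-up graph (the edges below are illustrative only):

```python
import networkx as nx

# Two components plus one isolated node.
G = nx.Graph([(0, 1), (1, 2), (3, 4)])
G.add_node(5)

# connected_components yields one set of nodes per component.
components = [sorted(c) for c in nx.connected_components(G)]

# Example filter: keep only components with at least two nodes.
large = [c for c in components if len(c) >= 2]

print(components)
print(large)
```

Note that this returns the maximal connected pieces only, not every connected subgraph, so it answers a narrower question than the one asked.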

Networkx Python Edges Comparison

I have been trying to build a graph for a project and I have been trying to identify newly added edges after populating it with more information.
For instance below you can see its first and second iteration:
---------------------- General Info Graph H-----------------------------
Total number of Nodes in Graph: 2364
Total number of Edges: 3151
---------------------- General Info Graph G -----------------------------
Total number of Nodes in Graph: 6035
Total number of Edges: 11245
The problem I have been facing is when I try to identify newly added edges using the code:
counter = 0
edges_all = list(G.edges_iter(data=True))
edges_before = list(H.edges_iter(data=True))
print "How many edges in old graph: ", len(edges_before)
print "How many edges in new graph: ", len(edges_all)
edge_not_found = []
for edge in edges_all:
    if edge in edges_before:
        counter += 1
    else:
        edge_not_found.append(edge)
print "Edges found: ", counter
print "Not found: ", len(edge_not_found)
And I have been getting these results:
How many edges in old graph: 3151
How many edges in new graph: 11245
Edges found: 1601
Not found: 9644
I can't understand why I am getting 1601 found instead of 11245-3151 = 8094
Any ideas?
Thank you!
TL;DR: There's a simple explanation for what you see, and if you get to the end, there is a much shorter way to write your code (with a lot of explanation along the way).
First note that Edges found appears to be intended as the number of edges that are in both H and G. So it should be at most 3151, not 8094; 8094 is what Not found should be. Note also that the number of edges found, 1601, is about half the number you would expect. That makes sense for the following reason.
I believe the problem is that when networkx lists the edges, an edge might appear as (a,b) in edges_before, while in edges_all it might appear as (b,a).
In that case (b,a) will not be in edges_before, so it fails your test. Assuming the edge orders aren't correlated between the listings for H and G, you'd expect about half of the edges to pass. A better test is to ask H directly whether (b,a) is one of its edges. This is H.has_edge(b,a), which ignores the order of the endpoints in an undirected graph.
A straightforward improvement:
for edge in edges_all:
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)
This lets you avoid even defining edges_before.
You can also avoid defining edges_all through a better improvement:
for edge in G.edges_iter(data=True):
    if H.has_edge(edge[0],edge[1]):
        # etc.
Note: I've written it as H.has_edge(edge[0],edge[1]) to make clear what's happening. A more compact way to write it is H.has_edge(*edge[:2]). The * notation unpacks the sliced tuple into separate arguments. The slice is needed because with data=True each edge is a 3-tuple (u, v, data), while has_edge on a plain Graph takes only the two endpoints.
Finally, using a list comprehension gives a better way to get edge_not_found:
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
This creates a list made up of edges which are in G but not in H.
Putting this all together (and using the .size() method to count edges in a network), we arrive at a cleaner version:
print "How many edges in old graph: ", H.size()
print "How many edges in new graph: ", G.size()
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
print "Not found: ", len(edge_not_found)
print "Edges found: ", G.size()-len(edge_not_found)
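The code above targets Python 2 and NetworkX 1.x. In NetworkX 2.0 and later, edges_iter no longer exists and G.edges() is used instead. A minimal Python 3 sketch of the same comparison, on two small made-up graphs (the edges are illustrative only):

```python
import networkx as nx

# Old graph H and new graph G; G lists some shared edges in reverse order.
H = nx.Graph([(1, 2), (2, 3)])
G = nx.Graph([(2, 1), (3, 2), (3, 4), (4, 5)])

# has_edge ignores endpoint order on an undirected graph,
# so (2, 1) in G is correctly matched with (1, 2) in H.
new_edges = [(u, v) for u, v in G.edges() if not H.has_edge(u, v)]
found = G.number_of_edges() - len(new_edges)

print("How many edges in old graph:", H.number_of_edges())
print("How many edges in new graph:", G.number_of_edges())
print("Not found:", len(new_edges))
print("Edges found:", found)
```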
