construct a tree out of list of strings

I have 400 lists that look like this:
[A ,B, C,D,E]
[A, C, G, B, E]
[A,Z,B,D,E]
...
[A,B,R,D,E]
Each list has 5 items and starts with A.
I would like to construct a tree or directed acyclic graph (with counts as weights) where each level corresponds to the index of the item, i.e. A has edges to all possible items at the first index, those have edges to their children at the second index, and so on.
Is there an easy way to build it in networkx? What I thought of doing is creating all the tuples for each level, i.e. for level 0: (A,B), (A,C), (A,Z), etc., but I am not sure how to proceed from there.

If I understood you correctly, you can set each list as a path using nx.add_path of a directed graph.
l = [['A', 'B', 'C', 'D', 'E'],
     ['A', 'C', 'G', 'B', 'E'],
     ['A', 'Z', 'B', 'D', 'E'],
     ['A', 'B', 'R', 'D', 'E']]
Though since you have nodes across multiple levels, you should probably rename them according to their level, since you cannot have multiple nodes with the same name. So one way could be:
l = [[f'{j}_level{lev}' for lev,j in enumerate(i, 1)] for i in l]
#[['A_level1', 'B_level2', 'C_level3', 'D_level4', 'E_level5'],
# ['A_level1', 'C_level2', 'G_level3', 'B_level4', 'E_level5'],
# ['A_level1', 'Z_level2', 'B_level3', 'D_level4', 'E_level5'],
# ['A_level1', 'B_level2', 'R_level3', 'D_level4', 'E_level5']]
And now construct the graph with:
import networkx as nx
G = nx.DiGraph()
for path in l:
    # each list becomes a chain of edges: level1 -> level2 -> ... -> level5
    nx.add_path(G, path)
Then you could create a tree-like structure using Graphviz's dot layout:
# graphviz_layout needs pygraphviz to be installed
from networkx.drawing.nx_agraph import graphviz_layout
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos=pos,
        node_color='lightgreen',
        node_size=1500,
        with_labels=True,
        arrows=True)
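The question also asks for counts as weights, which the plain add_path loop does not record. Here is a minimal sketch, assuming the weight of an edge is simply the number of input lists that contain that level-to-level transition. The names count_level_edges and weighted_edges are made up for illustration.

```python
from collections import Counter

paths = [['A', 'B', 'C', 'D', 'E'],
         ['A', 'C', 'G', 'B', 'E'],
         ['A', 'Z', 'B', 'D', 'E'],
         ['A', 'B', 'R', 'D', 'E']]


def count_level_edges(paths):
    """Count how often each (parent, child) pair of level-tagged nodes occurs."""
    counts = Counter()
    for path in paths:
        # Same renaming as above, so 'B' at level 2 and 'B' at level 4 stay distinct.
        labelled = [f'{name}_level{lev}' for lev, name in enumerate(path, 1)]
        # Consecutive items of one list form the edges of that path.
        counts.update(zip(labelled, labelled[1:]))
    return counts


edge_counts = count_level_edges(paths)
# (u, v, weight) triples, one per distinct edge.
weighted_edges = [(u, v, w) for (u, v), w in edge_counts.items()]
print(edge_counts[('A_level1', 'B_level2')])  # 2, because two lists start with A, B
```

The triples should be usable as input to G.add_weighted_edges_from(weighted_edges) on an nx.DiGraph, in place of the add_path loop.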

Related

Generate a bipartite structure in gephi

I have constructed in python with networkx a bipartite network like this:
import networkx as nx
from random import choice
from string import ascii_lowercase, digits
# Define the characters to choose from
chars = ascii_lowercase + digits
# Create two separate lists of 100 random strings each
lst = [''.join(choice(chars) for _ in range(12)) for _ in range(100)]
lst1 = [''.join(choice(chars) for _ in range(12)) for _ in range(100)]
# Create node labels for each list
List1 = [city for city in lst]
List2 = [city for city in lst1]
# Create the graph object
G = nx.Graph()
# Add nodes to the graph with different bipartite indices
G.add_nodes_from(List1, bipartite=0)
G.add_nodes_from(List2, bipartite=1)
# Add edges connecting nodes from the two lists
for i in range(len(lst)):
    G.add_edge(List1[i], List2[i])
# Save the graph to a file
nx.write_gexf(G, "bipartite_network.gexf")
and I want to export this in Gephi which results in the following database:
which does not give me a bipartite structure (i.e. two separate lists of node connected via edges, namely the list under id connected to the list under Label). What is the right input to give Gephi in order to obtain the desired outcome?
Thank you

Find all possible paths using networkx

How can I find all possible paths between two nodes in a graph using networkx?
import networkx as nx
G = nx.Graph()
edges = ['start-A', 'start-b', 'A-c', 'A-b', 'b-d', 'A-end', 'b-end']
nodes = []
for node in edges:
    n1 = node.split('-')[0]
    n2 = node.split('-')[1]
    if n1 not in nodes:
        nodes.append(n1)
    if n2 not in nodes:
        nodes.append(n2)
for node in nodes:
    G.add_node(node)
for edge in edges:
    n1 = edge.split('-')[0]
    n2 = edge.split('-')[1]
    G.add_edge(n1, n2)
for path in nx.all_simple_paths(G, 'start', 'end'):
    print(path)
This is the result:
['start', 'A', 'b', 'end']
['start', 'A', 'end']
['start', 'b', 'A', 'end']
['start', 'b', 'end']
But I want all possible paths, including ones that revisit a node, e.g. start,b,A,c,A,end

If repeated visits to a node are allowed, then in a graph where at least 2 nodes on the path (not counting start and end) are connected, there is no upper bound to the number of valid paths. If there are 2 nodes on the path that are connected, e.g. nodes A and B, then any number of new paths can be formed by inserting A->B->A into the appropriate section of the valid path between start and end.
If the number of repeated visits is restricted, then one might take the output of all_simple_paths as a starting point and splice any valid path between two consecutive nodes in between them, repeating this as many times as the number of repeated visits allows.
In your example, this would be taking the third output of all_simple_paths(G, 'start', 'end'), i.e. ['start', 'b', 'A', 'end'] and then for all nodes connected to b iterate over the results of all_simple_paths(G, X, 'A'), where X is the iterated node.
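Here's a rough sketch of the idea (it reuses G from the question and does not limit the number of repeated visits):
for path in nx.all_simple_paths(G, 'start', 'end'):
    print(path)
    # try to replace each edge X -> Y of the path with a longer detour
    for n, (X, Y) in enumerate(zip(path, path[1:])):
        if X not in ('start', 'end'):
            for sub_path in nx.all_simple_paths(G, X, Y):
                if len(sub_path) > 2:  # the direct edge gives the path printed above
                    print(path[:n] + sub_path + path[n+2:])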
This is not great, since with this formulation it's hard to control the number of repeated visits. One way to fix that is to create an additional filter based on the counts of nodes. However, for any real-world graphs this is not going to be computationally feasible due to the very large number of paths and nodes...
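If a hard cap on repeated visits is acceptable, one alternative to splicing simple paths is a depth-first search that counts visits as it goes. This is a different technique from the all_simple_paths approach above, shown only as a sketch. It assumes start and end may appear exactly once, and every other node at most max_visits times. The names bounded_walks and max_visits are hypothetical, and the graph is held in a plain adjacency dict instead of a networkx object.

```python
from collections import Counter, defaultdict

edges = ['start-A', 'start-b', 'A-c', 'A-b', 'b-d', 'A-end', 'b-end']

# Undirected adjacency lists built from the 'u-v' strings of the question.
adj = defaultdict(list)
for edge in edges:
    u, v = edge.split('-')
    adj[u].append(v)
    adj[v].append(u)


def bounded_walks(adj, start, end, max_visits=2):
    """Yield walks from start to end; intermediate nodes appear at most max_visits times."""
    visits = Counter()
    walk = [start]

    def extend(node):
        for nxt in adj[node]:
            if nxt == end:
                yield walk + [end]  # a fresh list, safe to keep after the search moves on
            elif nxt != start and visits[nxt] < max_visits:
                visits[nxt] += 1
                walk.append(nxt)
                yield from extend(nxt)
                walk.pop()  # backtrack
                visits[nxt] -= 1

    yield from extend(start)


walks = list(bounded_walks(adj, 'start', 'end', max_visits=2))
print(['start', 'b', 'A', 'c', 'A', 'end'] in walks)  # True
```

With max_visits=1 this reduces to the four simple paths listed in the question. The number of walks still grows very quickly with max_visits and with graph size, so the feasibility caveat above still applies.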

How to create a DAG from a list in python

I am using networkx and manually entering the (u, v, weight) tuples for a graph. As the input gets bigger, this manual insertion of nodes and edges becomes tiresome and error-prone. I have been trying, but haven't figured out how to do this without the manual labour.
Sample Input:
my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]", "d3[0, 2]", "d4[1, 4]", "d5[2, 3]", "d6[1, 4]"]
Manual Insertion:
Before inserting nodes into a graph I need to number them, so that the first occurrence of 's' or 'd' can be differentiated from later occurrences of the same character, e.g. s1,s2,s3,... and d1,d2,d3,...
I am aware it is something similar to SSA form (compilers) but I was not able to find something helpful for my case.
Manually inserting (u, v, weights) to a DiGraph()
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])
Question:
How to automatically convert that input list(my_list) into a DAG(my_graph), avoiding manual insertion?
Complete Code:
This is what I have written so far.
import networkx as nx
from networkx.drawing.nx_agraph import write_dot, graphviz_layout
from matplotlib import pyplot as plt
my_graph = nx.DiGraph()
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])
write_dot(my_graph, "graph.dot")
plt.title("draw graph")
pos = graphviz_layout(my_graph, prog='dot')
nx.draw(my_graph, pos, with_labels=True, arrows=True)
plt.show()
plt.clf()
Explanation:
's' and 'd' are instructions that require 1 or 2 registers respectively to perform an operation.
In above example we have 2 's' operations and 6 'd' operations and there are five registers [0,1,2,3,4].
Each operation will perform some calculation and store the results in relevant register/s.
From input we can see that d1 uses register 0 and 2, so it cannot operate until both of these registers are free. Therefore, d1 is dependent on s1 because s1 comes before d1 and is using register 0. As soon as s1 finishes d1 can operate as register 2 is already free.
E.g. we initialize all registers with 1. s1 doubles its input, while d1 sums two inputs and stores the result in its second register:
so after s1[0] reg-0 * 2 -> 1 * 2 => reg-0 = 2
and after d1[0, 2] reg-0 + reg-2 -> 2 + 1 => reg-0 = 2 and reg-2 = 3
Update 1: The graph will be a dependency-graph based on some resources [0...4], each node will require 1(for 's') or 2(for 'd') of these resources.
Update 2: Two questions were causing confusion so I'm separating them. For now I have changed my input list and there is only a single task of converting that list into a DAG. I have also included an explanation section.
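PS: The networkx.drawing.nx_agraph functions used above need Graphviz and pygraphviz, so you might need to install those (e.g. pip install pygraphviz) if you don't already have them.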

OK, now that I have a better idea of how the mapping works, it just comes down to describing the process in code: keep a mapping of which op is currently using which resource and, while iterating over the operations, generate an edge whenever an operation uses a resource that a previous operation used. I think this is along the lines of what you are looking for:
import ast
class UniqueIdGenerator:
    def __init__(self, initial=1):
        self.auto_indexing = {}
        self.initial = initial
    def get_unique_name(self, name):
        "adds number after given string to ensure uniqueness."
        if name not in self.auto_indexing:
            self.auto_indexing[name] = self.initial
        unique_idx = self.auto_indexing[name]
        self.auto_indexing[name] += 1
        return f"{name}{unique_idx}"
def generate_DAG(source):
    """
    takes iterable of tuples in format (name, list_of_resources) where
    - name doesn't have to be unique
    - list_of_resources is a list of resources in any hashable format (list of numbers or strings is typical)
    generates edges in the format (name1, name2, resource),
    - name1 and name2 are unique-ified versions of names in input
    - resource is the value in the list of resources
    each "edge" represents a handoff of resource, so name1 and name2 use the same resource sequentially.
    """
    # format {resource: name} for each resource in use.
    resources = {}
    g = UniqueIdGenerator()
    for (op, deps) in source:
        op = g.get_unique_name(op)
        for resource in deps:
            # for each resource this operation requires, if a previous operation used it
            if resource in resources:
                # yield the new edge
                yield (resources[resource], op, resource)
            # either first or yielded an edge, this op is now using the resource.
            resources[resource] = op
my_list = ["s[0]", "d[0, 2]", "s[0]", "d[1, 3]", "d[0, 2]", "d[1, 4]", "d[2, 3]", "d[1, 4]"]
data = generate_DAG((a[0], ast.literal_eval(a[1:])) for a in my_list)
print(*data, sep="\n")
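The input in the question already carries numbered names ("s1[0]", "d1[0, 2]", ...), so the renaming step can be skipped if each string is parsed directly. The following is a sketch under that assumption. The helpers parse_op and dependency_edges are hypothetical names, and the regular expression assumes every item looks like a name followed by a bracketed Python list.

```python
import ast
import re


def parse_op(text):
    """Split a string such as 'd1[0, 2]' into ('d1', [0, 2])."""
    match = re.fullmatch(r"\s*([A-Za-z]\w*)\s*(\[.*\])\s*", text)
    if match is None:
        raise ValueError(f"unrecognised operation: {text!r}")
    return match.group(1), ast.literal_eval(match.group(2))


def dependency_edges(ops):
    """Return (previous_user, op, resource) for every resource handoff."""
    last_user = {}  # resource -> name of the most recent op that used it
    edges = []
    for name, resources in ops:
        for resource in resources:
            if resource in last_user:
                edges.append((last_user[resource], name, resource))
            last_user[resource] = name
    return edges


my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]", "d3[0, 2]",
           "d4[1, 4]", "d5[2, 3]", "d6[1, 4]"]
edges = dependency_edges(parse_op(item) for item in my_list)
print(*edges, sep="\n")
```

For this input the result contains the same nine triples as the manual list in the question, so it should be usable as the argument to my_graph.add_weighted_edges_from(edges). Note that d4 -> d6 appears twice (registers 1 and 4). A DiGraph keeps a single edge per node pair, so a MultiDiGraph would be needed to keep both.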

iGraph: selecting vertices connected to

Suppose I have the following graph:
import igraph as ig
g = ig.Graph([(0,1), (0,2), (2,3), (3,4), (4,2), (2,5), (5,0), (6,3), (5,6)], directed=False)
g.vs["name"] = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]
and I wish to see who Bob is connected to. Bob is only connected to one person, Alice. However, if I try to find the edge:
g.es.select(_source=1)
>>> <igraph.EdgeSeq at 0x7f15ece78050>
I simply get the above response. How do I infer the vertex index from it? Or, if that isn't possible, how do I find the vertices connected to Bob?

This seems to work. The keyword arguments consist of the property, e.g. _source or _target, and an operator, e.g. eq (=). It also seems you need to check both the source and the target of the edges (even though it's an undirected graph). After filtering the edges, you can use a list comprehension to loop through them and extract the source or target:
connected_from_bob = [edge.target for edge in g.es.select(_source_eq=1)]
connected_to_bob = [edge.source for edge in g.es.select(_target_eq=1)]
connected_from_bob
# []
connected_to_bob
# [0]
Then the vertices connected with Bob are the union of the two lists:
connected_with_bob = connected_from_bob + connected_to_bob
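To see why both endpoints have to be checked, here is a library-free sketch of the same lookup on the plain edge list from the question. It only illustrates the logic and is not how igraph implements it. The function name neighbours is made up for this example.

```python
edges = [(0, 1), (0, 2), (2, 3), (3, 4), (4, 2), (2, 5), (5, 0), (6, 3), (5, 6)]
names = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]


def neighbours(edges, vertex):
    """Return the index at the other end of every edge that touches `vertex`."""
    found = []
    for source, target in edges:
        if source == vertex:
            found.append(target)
        elif target == vertex:
            found.append(source)
    return found


bob = names.index("Bob")
print([names[i] for i in neighbours(edges, bob)])  # ['Alice']
```

Bob only ever appears as the second element of an edge, which is why filtering on the source alone returns nothing. In python-igraph itself, g.neighbors(1) should give the same indices directly, and g.vs[index]["name"] maps an index back to a name.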

Python - find longest path

The function will take in a dictionary as input, and I want to find the length of a longest path in a dictionary. Basically, if in a dictionary, key2 matches value1, and key3 matches value2, and so forth, this counts as a path. For example:
{'a':'b', 'b':'c', 'c':'d'}
In the case above, the length should be three. How would I achieve this? Or more specifically how would I compare keys to values? (it could be anything, strings, numbers, etc., not only numbers)
Many thanks in advance!

I would treat the dictionary as a list of edges in a directed acyclic graph (DAG) and use the networkx module to find the longest path in the graph:
import networkx as nx
data = {'a':'b', 'b':'c', 'c':'d'}
G = nx.DiGraph()
G.add_edges_from(data.items())
try:
    path = nx.dag_longest_path(G)
    print(path)
    # ['a', 'b', 'c', 'd']
    print(len(path) - 1)
    # 3
except nx.exception.NetworkXUnfeasible: # There's a loop!
    print("The graph has a cycle")

If you insist on not importing anything, you could do something like:
def find_longest_path(data):
    missing = object()  # sentinel, so falsy keys such as 0 or '' still work
    longest = 0
    for key in data:  # dict.iterkeys() only exists in Python 2
        seen = set()
        length = -1
        # follow key -> value -> value ... until a value is not itself a key
        while key is not missing:
            if key in seen:
                raise RuntimeError('Graph has loop')
            seen.add(key)
            key = data.get(key, missing)
            length += 1
        if length > longest:
            longest = length
    return longest
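The loop above re-walks the tail of every chain once per starting key, which can get slow for long chains. A possible refinement is sketched here, assuming (as above) that every value is hashable: remember the length already computed for each node so that each node is only walked once. The name longest_chain is hypothetical.

```python
def longest_chain(data):
    """Return the number of edges in the longest key -> value chain of `data`."""
    missing = object()  # sentinel for "this node has no outgoing edge"
    best = {}           # node -> edges in the longest chain starting at that node
    for start in data:
        trail = []       # nodes visited in this walk, in order
        on_trail = set()
        node = start
        # Walk forward until the chain ends or reaches a node solved earlier.
        while node is not missing and node not in best:
            if node in on_trail:
                raise RuntimeError('Graph has loop')
            on_trail.add(node)
            trail.append(node)
            node = data.get(node, missing)
        # The last trail node has 0 edges if the chain ended, else best[node] + 1.
        length = -1 if node is missing else best[node]
        for visited in reversed(trail):
            length += 1
            best[visited] = length
    return max(best.values(), default=0)


print(longest_chain({'a': 'b', 'b': 'c', 'c': 'd'}))  # 3
```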
