How to traverse tree with specific properties - python

I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
If Red than all Ancestors are also Red (and should not be checked again).
If Not Red than all Descendants are Not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth-1, n=n) for child_val in xrange(self.val+1, n)]
else:
vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to do traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps but even if I got an optimal tree traversal algorithm, I still would be checking too many combinations. (I noticed that just now.)
Let me explain. Assuming I need checked [4, 5] (so 4 and 5 are removed from bipartite graph as explained above, but irrelevant here.). If this comes out as "Red", my tree will prevent me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?

Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (evaluated at (1,2), gives the total number of spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.

Python is close enough to pseudocode.
class counter(object):
def __init__(self, ival = 0):
self.count = ival
def count_up(self):
self.count += 1
return self.count
def old_walk_fun(ilist, func=None):
def old_walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
for q in ilist:
tlist += old_walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return old_walk_fun_helper(ilist, func)
else:
return []
def walk_fun(ilist, func=None):
def walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
if(ilist[0] == "Red"): # Only evaluate sub-branches if current level is Red
for q in ilist:
tlist += walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return walk_fun_helper(ilist, func)
else:
return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
["Red",
["Red",
["Red",
[]
],
["White",
[]
],
["White",
[]
]
],
["White",
["White",
[]
],
["White",
[]
]
],
["Red",
[]
]
]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Unconditionally walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Selectively walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""

How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.

Related

Get all possible simple paths between two nodes (Graph theory)

In the context of graph theory, I'm trying to get all the possible simple paths between two nodes.
I record the network using an adjacency matrix stored in a pandas dataframe, in a way that network[x][y] store the value of the arrow which goes from x to y.
To get the paths between two nodes, what I do is:
I get all the possible permutations with all the nodes (using it.permutations -as the path is simple there is no repetitions).
Then I use an ad hoc function: adjacent (which gives me the neighbours of a node), to check which among all the possible paths are true.
This takes too long, and it's not efficient. Do you know how I can improve the code? May be with a recursive function??
For a non relevant reason I don't want to use Networkx
def get_simple_paths(self, node, node_objective):
# Gives you all simple path between two nodes
#We get all possibilities and then we will filter it
nodes = self.nodes #The list of all nodes
possible_paths = [] #Store all possible paths
simple_paths = [] #Store the truly paths
l = 2
while l <= len(nodes):
for x in it.permutations(nodes, l): #They are neighbourgs
if x[0] == node and x[-1] == node_objective:
possible_paths.append(x)
l += 1
# Now check which of those paths exists
for x_pos, x in enumerate(possible_paths):
for i_pos, i in enumerate(x):
#We use it to check among all the path,
#if two of the nodes are not neighbours, the loop brokes
if i in self.adjacencies(x[i_pos+1]):
if i_pos+2 == len(x):
simple_paths.append(x)
break
else:
continue
else:
break
#Return simple paths
return(simple_paths)

Iterate over two lists, execute function and return values

I am trying to iterate over two lists of the same length, and for the pair of entries per index, execute a function. The function aims to cluster the entries
according to some requirement X on the value the function returns.
The lists in questions are:
e_list = [-0.619489,-0.465505, 0.124281, -0.498212, -0.51]
p_list = [-1.7836,-1.14238, 1.73884, 1.94904, 1.84]
and the function takes 4 entries, every combination of l1 and l2.
The function is defined as
def deltaR(e1, p1, e2, p2):
de = e1 - e2
dp = p1 - p2
return de*de + dp*dp
I have so far been able to loop over the lists simultaneously as:
for index, (eta, phi) in enumerate(zip(e_list, p_list)):
for index2, (eta2, phi2) in enumerate(zip(e_list, p_list)):
if index == index2: continue # to avoid same indices
if deltaR(eta, phi, eta2, phi2) < X:
print (index, index2) , deltaR(eta, phi, eta2, phi2)
This loops executes the function on every combination, except those that are same i.e. index 0,0 or 1,1 etc
The output of the code returns:
(0, 1) 0.659449892453
(1, 0) 0.659449892453
(2, 3) 0.657024790285
(2, 4) 0.642297230697
(3, 2) 0.657024790285
(3, 4) 0.109675332432
(4, 2) 0.642297230697
(4, 3) 0.109675332432
I am trying to return the number of indices that are all matched following the condition above. In other words, to rearrange the output to:
output = [No. matched entries]
i.e.
output = [2, 3]
2 coming from the fact that indices 0 and 1 are matched
3 coming from the fact that indices 2, 3, and 4 are all matched
A possible way I have thought of is to append to a list, all the indices used such that I return
output_list = [0, 1, 1, 0, 2, 3, 4, 3, 2, 4, 4, 2, 3]
Then, I use defaultdict to count the occurrances:
for index in output_list:
hits[index] += 1
From the dict I can manipulate it to return [2,3] but is there a more pythonic way of achieving this?
This is finding connected components of a graph, which is very easy and well documented, once you revisit the problem from that view.
The data being in two lists is a distraction. I am going to consider the data to be zip(e_list, p_list). Consider this as a graph, which in this case has 5 nodes (but could have many more on a different data set). Construct the graph using these nodes, and connected them with an edge if they pass your distance test.
From there, you only need to determine the connected components of an undirected graph, which is covered on many many places. Here is a basic depth first search on this site: Find connected components in a graph
You loop through the nodes once, performing a DFS to find all connected nodes. Once you look at a node, mark it visited, so it does not get counted again. To get the answer in the format you want, simply count the number of unvisited nodes found from each unvisited starting point, and append that to a list.
------------------------ graph theory ----------------------
You have data points that you want to break down into related groups. This is a topic in both mathematics and computer science known as graph theory. see: https://en.wikipedia.org/wiki/Graph_theory
You have data points. Imagine drawing them in eta phi space as rectangular coordinates, and then draw lines between the points that are close to each other. You now have a "graph" with vertices and edges.
To determine which of these dots have lines between them is finding connected components. Obviously it's easy to see, but if you have thousands of points, and you want a computer to find the connected components quickly, you use graph theory.
Suppose I make a list of all the eta phi points with zip(e_list, p_list), and each entry in the list is a vertex. If you store the graph in "adjacency list" format, then each vertex will also have a list of the outgoing edges which connect it to another vertex.
Finding a connected component is literally as easy as looking at each vertex, putting a checkmark by it, and then following every line to the next vertex and putting a checkmark there, until you can't find anything else connected. Now find the next vertex without a checkmark, and repeat for the next connected component.
As a programmer, you know that writing your own data structures for common problems is a bad idea when you can use published and reviewed code to handle the task. Google "python graph module". One example mentioned in comments is "pip install networkx". If you build the graph in networkx, you can get the connected components as a list of lists, then take the len of each to get the format you want: [len(_) for _ in nx.connected_components(G)]
---------------- code -------------------
But if you don't understand the math, then you might not understand a module for graphs, nor a base python implementation, but it's pretty easy if you just look at some of those links. Basically dots and lines, but pretty useful when you apply the concepts, as you can see with your problem being nothing but a very simple graph theory problem in disguise.
My graph is a basic list here, so the vertices don't actually have names. They are identified by their list index.
e_list = [-0.619489,-0.465505, 0.124281, -0.498212, -0.51]
p_list = [-1.7836,-1.14238, 1.73884, 1.94904, 1.84]
def deltaR(e1, p1, e2, p2):
de = e1 - e2
dp = p1 - p2
return de*de + dp*dp
X = 1 # you never actually said, but this works
def these_two_particles_are_going_the_same_direction(p1, p2):
return deltaR(p1.eta, p1.phi, p2.eta, p2.phi) < X
class Vertex(object):
def __init__(self, eta, phi):
self.eta = eta
self.phi = phi
self.connected = []
self.visited = False
class Graph(object):
def __init__(self, e_list, p_list):
self.vertices = []
for eta, phi in zip(e_list, p_list):
self.add_node(eta, phi)
def add_node(self, eta, phi):
# add this data point at the next available index
n = len(self.vertices)
a = Vertex(eta, phi)
for i, b in enumerate(self.vertices):
if these_two_particles_are_going_the_same_direction(a,b):
b.connected.append(n)
a.connected.append(i)
self.vertices.append(a)
def reset_visited(self):
for v in self.nodes:
v.visited = False
def DFS(self, n):
#perform depth first search from node n, return count of connected vertices
count = 0
v = self.vertices[n]
if not v.visited:
v.visited = True
count += 1
for i in v.connected:
count += self.DFS(i)
return count
def connected_components(self):
self.reset_visited()
components = []
for i, v in enumerate(self.vertices):
if not v.visited:
components.append(self.DFS(i))
return components
g = Graph(e_list, p_list)
print g.connected_components()

Cycle detection in a 2-tuple python list

Given a list of edges in 2-tuple, (source, destination), is there any efficient way to determine if a cycle exists? Eg, in the example below, a cycle exists because 1 -> 3 -> 6 -> 4 -> 1. One idea is to calculate the number of occurrence of each integer in the list (again, is there any efficient way to do this?). Is there any better way? I am seeing a problem with 10,000 of 2-tuple edge information.
a = [(1,3), (4,6), (3,6), (1,4)]
I'm assuming you want to find a cycle in the undirected graph represented by your edge list and you don't want to count "trivial" cycles of size 1 or 2.
You can still use a standard depth-first search, but you need to be a bit careful about the node coloring (a simple flag to signal which nodes you have already visited is not sufficient):
from collections import defaultdict
edges = [(1,3), (4,6), (3,6), (1,4)]
adj = defaultdict(set)
for x, y in edges:
adj[x].add(y)
adj[y].add(x)
col = defaultdict(int)
def dfs(x, parent=None):
if col[x] == 1: return True
if col[x] == 2: return False
col[x] = 1
res = False
for y in adj[x]:
if y == parent: continue
if dfs(y, x): res = True
col[x] = 2
return res
for x in adj:
if dfs(x):
print "There's a cycle reachable from %d!" % x
This will detect if there is a back edge in the depth-first forest that spans at least 2 levels. This is exactly the case if there is a simple cycle of size >= 2. By storing parent pointers you can actually print the cycle as well if you found it.
For large graphs you might want to use an explicit stack instead of recursion, as illustrated on Wikipedia.

python, igraph coping with vertex renumbering

I am implementing an algorithm for finding a dense subgraph in a directed graph using python+igraph. The main loop maintains two subgraphs S and T which are initially identical and removes nodes (and incident edges) accoriding to a count of the indegree (or outdegree) of those nodes with respect to the other graph. The problem I have is that igraph renumbers the vertices so when I delete some from T, the remaining nodes no longer correspond to the same ones in S.
Here is the main part of the loop that is key.
def directed(S):
T = S.copy()
c = 2
while(S.vcount() > 0 and T.vcount() > 0):
if (S.vcount()/T.vcount() > c):
AS = S.vs.select(lambda vertex: T.outdegree(vertex) < 1.01*E(S,T)/S.vcount())
S.delete_vertices(AS)
else:
BT = T.vs.select(lambda vertex: S.indegree(vertex) < 1.01*E(S,T)/T.vcount())
T.delete_vertices(BT)
This doesn't work because of the effect of deleting vertices on the vertex ids. Is there a standard workaround for this problem?
One possibility is to assign unique names to the vertices in the name vertex attribute. These are kept intact when vertices are removed (unlike vertex IDs), and you can use them to refer to vertices in functions like indegree or outdegree. E.g.:
>>> g = Graph.Ring(4)
>>> g.vs["name"] = ["A", "B", "C", "D"]
>>> g.degree("C")
2
>>> g.delete_vertices(["B"])
>>> g.degree("C")
1
Note that I have removed vertex B so vertex C also gained a new ID, but the name is still the same.
In your case, the row with the select condition could probably be re-written like this:
AS = S.vs.select(lambda vertex: T.outdegree(vertex["name"]) < 1.01 * E(S,T)/S.vcount())
Of course this assumes that initially the vertex names are the same in S and T.

Converting a few python lines in pseudocde

Goal: Trying to convert some of the lines of an algorithm written in python to pseudocode.
Goal of the given algorithm: Find all cycles in a directed graph with cycles.
Where I stand: I well understand the theory behind the algorithm, I have also coded different versions on my own, however I cannot write an algorithm that small, efficient and correct on my own.
Source: stackoverflow
What I have done so far: I cannot describe enough how many weeks spent on it, have coded Tarjan, various versions DFS, Flloyd etc in php but unfortunately they are partial solutions only and one have to extend them more.
In addition: I have run this algorithm online and it worked, I need it for a school project that I am stack and cannot proceed further.
This is the algorithm:
def paths_rec(path,edges):
if len(path) > 0 and path[0][0] == path[-1][1]:
print "cycle", path
return #cut processing when find a cycle
if len(edges) == 0:
return
if len(path) == 0:
#path is empty so all edges are candidates for next step
next_edges = edges
else:
#only edges starting where the last one finishes are candidates
next_edges = filter(lambda x: path[-1][1] == x[0], edges)
for edge in next_edges:
edges_recursive = list(edges)
edges_recursive.remove(edge)
#recursive call to keep on permuting possible path combinations
paths_rec(list(path) + [edge], edges_recursive)
def all_paths(edges):
paths_rec(list(),edges)
if __name__ == "__main__":
#edges are represented as (node,node)
# so (1,2) represents 1->2 the edge from node 1 to node 2.
edges = [(1,2),(2,3),(3,4),(4,2),(2,1)]
all_paths(edges)
This is what I have managed to write in pseudocode from it, I have marked with #? the lines I do not understand. Once I have them in pseudocode I can code them in php with which I am a lot familiar.
procedure paths_rec (path, edges)
if size(path) > 0 and path[0][0] equals path[-1][1]
print "cycle"
for each element in path
print element
end of for
return
end of if
if size(edges) equals 0
return
end of if
if size(path) equals 0
next_edges equals edges
else
next edges equals filter(lambda x: path[-1][1] == x[0], edges) #?
end of else
for each edge in next_edges
edges_recursive = list(edges) #?
edges_recursive.remove(edge)#?
#recursive call to keep on permuting possible path combinations
paths_rec(list(path) + [edge], edges_recursive)#?
The line next_edges = filter(lambda x: path[-1][1] == x[0], edges) creates a new list containing those edges in edges whose first point is the same as the second point of the last element of the current path, and is equivalent to
next_edges = []
for x in edges:
if path[len(path) - 1][1] == x[0]
next_edges.append[x]
The lines create a new copy of the list of edges edges, so that when edge is removed from this copy, it doesn't change the list edges
edges_recursive = list(edges)
edges_recursive.remove(edge)
paths_rec(list(path) + [edge], edges_recursive) is just the recursive call with the path with edge added to the end and the new list of edges that has edge removed from it.

Categories

Resources