Given a graph G, a node n and a length L, I'd like to collect all (non-cyclic) paths of length L that depart from n.
Do you have any idea on how to approach this?
For now, my graph is a networkx.Graph instance, but I do not really mind if e.g. igraph is recommended instead.
Thanks a lot!
A very simple way to approach this problem, if you only need the number of paths, is to use the adjacency matrix A of the graph. The (i,j)th element of A^L is the number of walks of length L between nodes i and j (a walk may revisit nodes). So if you sum these over all j, keeping i fixed at n, you get the number of walks of length L emanating from node n.
This unfortunately also counts the closed walks that return to n. These, happily, are given by the element A^L(n,n), so just subtract that.
So your final answer is: Σj{A^L(n,j)} - A^L(n,n).
Word of caution: say you're looking for paths of length 5 from node 1: this calculation will also count the path with small cycles inside like 1-2-3-2-4, whose length is 5 or 4 depending on how you choose to see it, so be careful about that.
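To make the caution concrete, here is a small pure-Python sketch of the matrix-power count. The helper names mat_mul and count_open_walks are made up for this illustration. On the complete graph K4 the formula gives 21 for L = 3, while only 6 of those walks are genuinely cycle-free paths (3 * 2 * 1 choices), so the result is best read as an upper bound.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_open_walks(A, n, L):
    """Sum_j A^L[n][j] - A^L[n][n]: walks of length L from n that do not end at n."""
    size = len(A)
    P = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(L):
        P = mat_mul(P, A)
    return sum(P[n]) - P[n][n]


# Adjacency matrix of the complete graph K4
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
print(count_open_walks(A, 0, 3))  # 21, although only 6 simple paths exist
```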
I would just like to expand on Lance Helsten's excellent answer:
The depth-limited search searches for a particular node within a certain depth (what you're calling the length L), and stops when it finds it. If you take a look at the pseudocode in the wiki link in his answer, you'll see this:
DLS(node, goal, depth) {
  if ( depth >= 0 ) {
    if ( node == goal )
      return node
    for each child in expand(node)
      DLS(child, goal, depth-1)
  }
}
In your case, however, as you're looking for all paths of length L from a node, you will not stop anywhere. So the pseudocode must be modified to:
DLS(node, depth) {
  if ( depth > 0 ) {
    for each child in expand(node) {
      record paths as [node, child]
      DLS(child, depth-1)
    }
  }
}
After you're done with recording all the single-link paths from successive nests of the DLS, just take a product of them to get the entire path. The number of these gives you the number of paths of the required depth starting from the node.
Use a depth limited search (http://en.wikipedia.org/wiki/Depth-limited_search) where you keep a set of visited nodes for the detection of a cycle when on a path. For example you can build a tree from your node n with all nodes and length of L then prune the tree.
I did a quick search of graph algorithms to do this, but didn't find anything. There is a list of graph algorithms (http://en.wikipedia.org/wiki/Category:Graph_algorithms) that may have just what you are looking for.
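The depth-limited search with a visited set described above could be sketched roughly as follows. This is only an illustration: the function name simple_paths_of_length and the adjacency-dict input format are assumptions, and "length" is taken to mean the number of edges.

```python
def simple_paths_of_length(adj, start, length):
    """Yield every cycle-free path with exactly `length` edges from `start`.

    `adj` maps each node to an iterable of its neighbours.
    """
    path = [start]
    on_path = {start}  # nodes on the current path, for O(1) cycle detection

    def dfs(node, remaining):
        if remaining == 0:
            yield list(path)  # copy, because `path` keeps being mutated
            return
        for nb in adj[node]:
            if nb in on_path:
                continue  # would revisit a node, i.e. close a cycle
            path.append(nb)
            on_path.add(nb)
            yield from dfs(nb, remaining - 1)
            path.pop()
            on_path.remove(nb)

    yield from dfs(start, length)


# Complete graph on 4 nodes as an adjacency dict
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
for p in simple_paths_of_length(k4, 0, 3):
    print(p)
```

Because the visited set only tracks the current path, a node reached through two different parents is handled correctly, and paths are produced lazily.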
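This solution might be improved in terms of efficiency, but it is very short and makes use of networkx functionality:
import networkx as nx

G = nx.complete_graph(4)
n = 0
L = 3
result = []
for target in G.nodes:
    if target == n:
        continue
    # cutoff=L yields paths with at most L edges, so keep only those with exactly L
    for path in nx.all_simple_paths(G, n, target, cutoff=L):
        if len(path) == L + 1:
            result.append(path)
print(result)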
Here is another (rather naive) implementation I came up with after reading the answers here:
def findAllPaths(node, childrenFn, depth, _depth=0, _parents=None):
    if _parents is None:
        _parents = {}  # fresh map per top-level call instead of a shared mutable default
    if _depth == depth - 1:
        # path found with desired length, create path and stop traversing
        path = []
        parent = node
        for i in range(depth):
            path.insert(0, parent)
            if parent not in _parents:
                continue
            parent = _parents[parent]
            if parent in path:
                return  # this path is cyclic, forget
        yield path
        return
    for nb in childrenFn(node):
        _parents[nb] = node  # keep track of where we came from
        for p in findAllPaths(nb, childrenFn, depth, _depth + 1, _parents):
            yield p
graph = {
0: [1, 2],
1: [4, 5],
2: [3, 10],
3: [8, 9],
4: [6],
5: [6],
6: [7],
7: [],
8: [],
9: [],
10: [2] # cycle
}
for p in findAllPaths(0, lambda n: graph[n], depth=4):
    print(p)
# [0, 1, 4, 6]
# [0, 1, 5, 6]
# [0, 2, 3, 8]
# [0, 2, 3, 9]
This is a problem that could be done with dynamic programming that has O(n^3) complexity, but I am wondering if there are more efficient ways to do this.
Let's assume that we have the following points on a line segment of length 10
Points: [1, 3, 5, 9]
Line segment: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
There is a value for each pair of points, for example:
[1, 3]: 2; [1, 5]: 4; [1, 9]: 3; [3, 5]: 1; [3, 9]: 5; [5, 9]: 3
We want to find the maximum sum of selected point pairs, subject to two constraints. The two points of each pair must be less than 5 apart (in my example above, [1, 5] is fine but [3, 9] is not), and different pairs cannot overlap with each other ([[1, 5], [5, 9]] is fine but [[1, 5], [3, 5]] is not).
The answer to this question is [[1, 5], [5, 9]] with sum 7.
I can use dynamic programming to solve this problem. I start with the closest point pairs and work outward to the farthest pair. As I proceed, I use an n*n matrix to store the current results, computed from the earlier ones. This dynamic programming has time complexity O(n^3).
I am wondering if there are more efficient ways of doing so.
tldr, O(n^2)
You can use some graph algorithms:
Each of your points (say p) can be mapped as two nodes (po and pc):
1o, 1c, 3o, 3c 5o, 5c, 9o, 9c where o stands for opening, and c stands for closing.
Since you consider pairs
every opening node can be connected to a close node for the cost you defined
every close node can be connected to an opening node with cost 0 (to allow opening of a new pair)
e.g: in your case, the weighted edges are:
// user defined
w(1o, 3c) = 2
w(1o, 5c) = 4
w(3o, 5c) = 1
w(5o, 9c) = 3
// enable connection of pairs
w(1c, 1o) = 0
w(1c, 3o) = 0
w(1c, 5o) = 0
w(1c, 9o) = 0
w(3c, 3o) = 0
w(3c, 5o) = 0
w(3c, 9o) = 0
w(5c, 5o) = 0
w(5c, 9o) = 0
Notice that having an edge e(x, 9o) is useless, since we can't make any pair with the last point as first elem of the pair...
To those weighted edges, we can add an additional close node S which is connected to all opening nodes
w(S, 1o) = 0
w(S, 3o) = 0
w(S, 5o) = 0
w(S, 9o) = 0
We then have a DAG (directed acyclic graph) in which we want to find the longest path from the source S to any node.
Algorithm should be in O(E+V) where E is number of edges and V number of vertices
Regarding the topological ordering, the points p by increasing value almost already define it: p_c should come before p_o since we can have an edge (p_c, p_o)
Below code is in O(n^2), with n the number of points, since to connect pairs together we need to create an edge from every closing node to the following opening ones (about n(n+1)/2)
//pseudocode: https://www.geeksforgeeks.org/find-longest-path-directed-acyclic-graph/
function longest(V, E) {
let max = -9000
const dist = V.reduce((acc, v)=>(acc[v] = -9000, acc), [])
dist[V[0]] = 0
V.forEach(u => {
E[u].forEach(([v, w]) => {
if (dist[v] < dist[u] + w) {
dist[v] = dist[u] + w
if (dist[v] > max) {
max = dist[v]
}
}
})
})
return max
}
function buildVE () {
const open = x => x+'o'
const close = x => x+'c'
const points = [0].concat([1, 3, 5, 9])
//close occurs before open because 5c can link to 5o
const V = [close(0)].concat(points.slice(1).flatMap(x => [close(x), open(x)]))
const E = {}
const addEdge = (a, b, w) => {
E[a] = E[a] || []
E[a].push([b, w])
}
addEdge(open(1), close(3), 2)
addEdge(open(1), close(5), 4)
addEdge(open(3), close(5), 1)
addEdge(open(5), close(9), 3)
for (let i = 0; i < points.length; ++i) {
for (let j = i; j < points.length; ++j) {
addEdge(close(points[i]), open(points[j]), 0)
}
}
const last = points[points.length-1]
E[open(last)] = []
E[close(last)] = []
return {V, E}
}
const {V, E} = buildVE()
console.log(longest(V, E))
There's an O(R^2) dynamic programming solution (R = coordinates range). That can be brought further down to O(N^2) (N = number of pairs) by doing coordinate compression.
Let's call L[i][j] the weight of the pair with maximum weight starting at k >= i and ending exactly on j. L[i][j] can be computed in O(R^2) by using the recursion L[i][j] = max(W[i][j], L[i+1][j]).
Now let's call M[j] the answer to your problem with the added restriction that the last pair (let's call it Q) ends exactly on j. Notice that if the answer is composed of more than one pair, there must be another pair in the answer ending on k < j-1. If that is so, then it must be that the solution for M[j] is the same solution as for M[k], with Q appended in the end (otherwise we would be able to make M[j] bigger by using M[k] instead). Furthermore, Q must start in a coordinate i > k (since the pairs are non-overlapping), and it must be maximal, i.e. its weight is L[k+1][j] (otherwise we would be able to make M[j] bigger by using L[k+1][j] instead of Q). There's also the special case when M[j] is composed of a single pair, in which case there's no such k.
With all that, we can make the recursion M[j] = max(L[0][j], max(M[k] + L[k+1][j], for k in [1, j-2])). You can compute M[j] in O(R^2) with the recursion above.
The solution to your problem will be max(M[j], for j in [0, R)). To bring it down to O(N^2), just sort all the coordinates (in O(N log N)), then map them to coordinates in [0, 2*N) (in O(N) using a hash map), since the only thing that matters to the problem is their order.
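For comparison, here is a minimal O(N^2) sketch in the same spirit, working directly on the sorted points rather than on the L and M tables above. It is a simplified variant, not the exact recursion from this answer, and it makes two assumptions taken from the question's example: pairs may share an endpoint ([[1, 5], [5, 9]] is allowed), and "less than 5 apart" means a coordinate difference below 5. The names max_pair_sum, weight and max_gap are illustrative.

```python
def max_pair_sum(points, weight, max_gap=5):
    """Maximum total weight of non-overlapping pairs.

    `weight` maps (left, right) point tuples to their value.
    """
    pts = sorted(points)
    # best[j]: best total achievable using only points pts[0..j]
    best = [0] * len(pts)
    for j in range(1, len(pts)):
        best[j] = best[j - 1]  # option: no pair ends at pts[j]
        for i in range(j):
            w = weight.get((pts[i], pts[j]))
            if w is not None and pts[j] - pts[i] < max_gap:
                # pair (pts[i], pts[j]) on top of the best solution ending by pts[i]
                best[j] = max(best[j], best[i] + w)
    return best[-1] if best else 0


weights = {(1, 3): 2, (1, 5): 4, (1, 9): 3, (3, 5): 1, (3, 9): 5, (5, 9): 3}
print(max_pair_sum([1, 3, 5, 9], weights))  # 7, from [1, 5] and [5, 9]
```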
I have an algorithm that creates a graph that has all representations of 3-bit binary strings encoded in the form of the shortest graph paths, where an even number in the path means 0, while an odd number means 1:
from itertools import permutations, product
import networkx as nx
import progressbar
import itertools
def groups(sources, template):
func = permutations
keys = sources.keys()
combos = [func(sources[k], template.count(k)) for k in keys]
for t in product(*combos):
d = {k: iter(n) for k, n in zip(keys, t)}
yield [next(d[k]) for k in template]
g = nx.Graph()
added = []
good = []
index = []
# I create list with 3-bit binary strings
# I do not include one of the pairs of binary strings that have a mirror image
list_1 = [list(i) for i in itertools.product(tuple(range(2)), repeat=3) if tuple(reversed(i)) >= tuple(i)]
count = list(range(len(list_1)))
h = 0
while len(added) < len(list_1):
# In each next step I enlarge the list 'good` by the next even and odd number
if h != 0:
for q in range(2):
good.append([i for i in good if i%2 == q][-1] + 2)
# I create a list `c` with string indices from the list` list_1`, that are not yet used.
# Whereas the `index` list stores the numbering of strings from the list` list_1`, whose representations have already been correctly added to the `added` list.
c = [item for item in count if item not in index]
for m in c:
# I create representations of binary strings, where 0 is 'v0' and 1 is 'v1'. For example, the '001' combination is now 'v0v0v1'
a = ['v{}'.format(x%2) for x in list_1[m]]
if h == 0:
for w in range(2):
if len([i for i in good if i%2 == w]) < a.count('v{}'.format(w)):
for j in range(len([i for i in good if i%2 == w]), a.count('v{}'.format(w))):
good.insert(j,2*j + w)
sources={}
for x in range(2):
sources["v{0}".format(x)] = [n for n in good if n%2 == x]
# for each representation in the form 'v0v0v1', for example, I examine all combinations of strings where 'v0' is an even number and 'v1' is an odd number, choosing values from the `good` list and checking the following conditions.
for aaa_binary in groups(sources, a):
# Here, the edges and nodes are added to the graph from the combination of `aaa_binary` and checking whether the combination meets the conditions. If so, it is added to the `added` list. If not, the newly added edges are removed and the next `aaa_binary` combination is taken.
g.add_nodes_from (aaa_binary)
t1 = (aaa_binary[0],aaa_binary[1])
t2 = (aaa_binary[1],aaa_binary[2])
added_now = []
for edge in (t1,t2):
if not g.has_edge(*edge):
g.add_edge(*edge)
added_now.append(edge)
added.append(aaa_binary)
index.append(m)
for f in range(len(added)):
if nx.shortest_path(g, aaa_binary[0], aaa_binary[2]) != aaa_binary or nx.shortest_path(g, added[f][0], added[f][2]) != added[f]:
for edge in added_now:
g.remove_edge(*edge)
added.remove(aaa_binary)
index.remove(m)
break
# Calling a good combination search interrupt if it was found and the result added to the `added` list, while the index from the list 'list_1` was added to the` index` list
if m in index:
break
good.sort()
set(good)
index.sort()
h = h+1
Output paths representing 3-bit binary strings from added:
[[0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5], [0, 3, 6], [3, 0, 7]]
So these are representations of 3-bit binary strings:
[[0, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
Where in the step h = 0 the first 4 sub-lists were found, and in the step h = 1 the last two sub-lists were added.
Of course, as you can see, there are no reflections of the mirrored strings, because there is no such need in an undirected graph.
Graph:
The above solution creates a minimal graph with unique shortest paths. This means that each binary string has only one representation on the graph in the form of a shortest path, so choosing a given path unambiguously identifies a given binary sequence.
Now I would like to use multiprocessing on the for m in c loop, because the order of finding elements does not matter here.
I try to use multiprocessing in this way:
from multiprocessing import Pool

added = []

def foo(i):
    added = []
    # do something
    added.append(x[i])
    return added

if __name__ == '__main__':
    h = 0
    while len(added)<len(c):
        pool = Pool(4)
        result = pool.imap_unordered(foo, c)
        added.append(result[-1])
        pool.close()
        pool.join()
        h = h + 1
Multiprocessing takes place in the while loop, and the added list is created in the foo function. In each subsequent step h of the loop, the list added should be extended with further values, and the current list added should be used in the function foo. Is it possible to pass the current contents of the list to the function in each subsequent step of the loop? In the above code, the foo function creates the contents of the added list from scratch each time. How can this be solved?
Which in consequence gives bad results:
[[0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5], [0, 1, 2], [1, 0, 3]]
Because for such a graph, nodes and edges, the condition is not met that nx.shortest_path (graph, i, j) == added[k] for every final nodes i, j from added[k] for k in added list.
Here the elements found for h = 0, i.e. [0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5], are good, while the elements added in the step h = 1, i.e. [0, 1, 2], [1, 0, 3], were evidently found without taking the elements from the previous step into account.
How can this be solved?
I realize that this is a sequential type of algorithm, but I am also interested in partial solutions, i.e. parallel processing of even parts of the algorithm. For example, the steps h of the while loop could run sequentially while the for m in c loop uses multiprocessing. Other partial solutions that improve the entire algorithm for larger combinations are welcome too.
I will be grateful for showing and implementing some idea for the use of multiprocessing in my algorithm.
I don't think you can parallelise the code as it currently stands. The part that you want to parallelise, the for m in c loop, accesses three global lists (good, added and index) and the graph g itself. You could use a multiprocessing.Array for the lists, but that would undermine the whole point of parallelisation, as multiprocessing.Array is synchronised, so the processes would not actually be running in parallel.
So, the code needs to be refactored. My preferred way of parallelising algorithms is to use a kind of producer / consumer pattern:
1. initialisation to set up a job queue that needs to be executed (runs sequentially)
2. have a pool of workers that all pull jobs from that queue (runs in parallel)
3. after the job queue has been exhausted, aggregate results and build up the final solution (runs sequentially)
In this case 1. would be the setup code for list_1, count and probably the h == 0 case. After that you would build a queue of "job orders", this would be the c list -> pass that list to a bunch of workers -> get the results back and aggregate. The problem is that each execution of the for m in c loop has access to global state and the global state changes after each iteration. This logically means that you can not run the code in parallel, because the first iteration changes the global state and affects what the second iteration does. That is, by definition, a sequential algorithm. You can not, at least not easily, parallelise an algorithm that iteratively builds a graph.
You could use multiprocessing.Pool.starmap and multiprocessing.Array, but that doesn't solve the problem. You still have the graph g, which is also shared between all processes. So the whole thing would need to be refactored in such a way that each iteration of the for m in c loop is independent of any other iteration of that loop, or the entire logic has to be changed so that the for m in c loop is not needed to begin with.
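The three-step pattern (build jobs sequentially, fan out to workers, aggregate sequentially) could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the names score_candidate, aggregate and run_round are hypothetical, and the scoring rule (how many new edges a candidate path would need) is only a stand-in for whatever independent check the real algorithm performs. The important property is that each worker receives a read-only snapshot and returns a value instead of mutating shared state.

```python
from multiprocessing import Pool


def score_candidate(job):
    """Worker (parallel step): reads its inputs only, mutates nothing shared."""
    snapshot_edges, candidate = job
    path_edges = {frozenset(e) for e in zip(candidate, candidate[1:])}
    new_edges = path_edges - snapshot_edges
    return len(new_edges), candidate


def aggregate(results):
    """Sequential step: pick the candidate that needs the fewest new edges."""
    return min(results)


def run_round(snapshot_edges, candidates, workers=4):
    # 1. build the job queue (sequential)
    jobs = [(snapshot_edges, c) for c in candidates]
    # 2. let a pool of workers process the jobs (parallel)
    with Pool(workers) as pool:
        results = pool.map(score_candidate, jobs)
    # 3. aggregate the results (sequential)
    return aggregate(results)
```

run_round should be called from under an if __name__ == '__main__': guard, for example with the edge set {frozenset((0, 2)), frozenset((2, 4))} and a few candidate triples. The graph is only modified after the round returns, in the sequential part of the program.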
UPDATE
I was thinking that you could possibly turn the algorithm towards a slightly less sequential version with the following changes. I'm pretty sure the code already does something rather similar, but the code is a little too dense for me and graph algorithms aren't exactly my specialty.
Currently, for a new triple ('101' for instance), you're generating all possible connection points in the existing graph, then adding the new triple to the graph and eliminating nodes based on measuring shortest paths. This requires checking for shortest paths on the graph and modifying, which prevents parallelisation.
NOTE: what follows is a pretty rough outline for how the code could be refactored, I haven't tested this or verified mathematically that it actually works correctly
NOTE 2: In the discussion below, '101' (notice the quotes '') is a binary string, and so are '00' and '1', whereas 1, 0, 4 and so on (without quotes) are vertex labels in the graph.
What if, you instead were to do a kind of a substring search on the existing graph, I'll use the first triple as an example. To initialise
generate a job_queue which contains all triples
take the first one and insert that, for instance, '000' which would be (0, 2, 4) - this is trivial no need to check anything because the graph is empty when you start so the shortest path is by definition the one you insert.
At this point you also have partial paths for '011', '001', '010' and conversely ('110' and '001' because the graph is undirected). We're going to utilise the fact that the existing graph contains sub-solutions to remaining triples in job_queue. Let's say the next triple is '010', you iterate over the binary string '010' or list('010')
if a path/vertex for '0' already exists in the graph --> continue
if a path/vertices for '01' already exists in the graph --> continue
if a path/vertices for '010' exists you're done, no need to add anything (this is actually a failure case: '010' should not have been in the job queue anymore because it was already solved).
The second bullet point would fail because '01' does not exist in the graph. Insert '1' which in this case would be node 1 to the graph and connect it to one of the three even nodes, I don't think it matters which one but you have to record which one it was connected to, let's say you picked 0. The graph now looks something like
0 - 2 - 4
 \  *
  \ *
   \*
    1
The optimal edge to complete the path is 1 - 2 (marked with stars), giving the path 0 - 1 - 2 for '010'. If the edge 1 - 2 is added to the graph, this is the path that maximises the number of triples encoded. If you add 1 - 4 you encode only the '010' triple, whereas 1 - 2 encodes '010' but also '001' and '100'.
As an aside, let's pretend you connected 1 to 2 at first, instead of 0 (the first connection was picked random), you now have a graph
0 - 2 - 4
    |
    |
    |
    1
and you can connect 1 to either 4 or to 0, but you again get a graph that encodes the maximum number of triples remaining in job_queue.
So how do you check how many triples a potential new path encodes? You can check for this relatively easily and more importantly the check can be done in parallel without modifying the graph g, for 3bit strings the savings from parallel aren't that big, but for 32bit strings they would be. Here's how it works.
1. (sequential) generate all possible complete paths from the sub-path 0-1 -> (0-1-2), (0-1-4).
2. (parallel) for each potential complete path check how many other triples that path solves, i.e. for each path candidate generate all the triples that the graph solves and check if those triples are still in job_queue.
   - (0-1-2) solves two other triples: '001' (4-2-1) or (2-0-1) and '100' (1-0-2) or (1-2-4).
   - (0-1-4) only solves the triple '010', i.e. itself
3. the edge/path that solves the most triples remaining in job_queue is the optimal solution (I don't have a proof of this).
You run 2. above in parallel copying the graph to each worker. Because you're not modifying the graph, only checking how many triples it solves, you can do this in parallel. Each worker should have a signature something like
def check_graph(g, path, copy_of_job_queue):
    # do some magic
    return (n_paths_solved, paths_solved)
path is either (0-1-2) or (0-1-4), copy_of_job_queue should be a copy of the remaining paths on the job_queue. For K workers you create K copies of the queue. Once the worker pool finishes you know which path (0-1-2) or (0-1-4) solves the most triples.
You then add that path and modify the graph, and remove the solved paths from the job queue.
RINSE - REPEAT until job queue is empty.
There are a few obvious problems with the above. For one, you're doing a lot of copying and looping over job_queue; if you're dealing with large bit spaces, say 32 bits, then job_queue is pretty long, so you might not want to keep copying it to all the workers.
For the parallel step above (2.) you might want to have job_queue actually be a dict where the key is the triple, say '010', and the value is a boolean flag saying if that triple is already encoded in the graph.
Is there a faster algorithm? Look at these two trees (I've represented the numbers in binary to make the paths easier to see). To reduce this from 14 nodes to 7 nodes, can you layer the required paths from one tree onto the other? You can add any edge you like to one of the trees as long as it doesn't connect a node with its ancestors.
           _ 000
     _ 00 _/
    /      \_ 001
   0        _ 010
    \_ 01 _/
           \_ 011
           _ 100
     _ 10 _/
    /      \_ 101
   1        _ 110
    \_ 11 _/
           \_ 111
Can you see, for example, that connecting 01 to 00 would be similar to replacing the head of the tree's 0 with 01, and thus with one edge you have added 100, 101 and 110?
Here is a fragment of the code responsible for creating the graph and its edges, depending on whether the edge exists and the condition that validates the shortest paths:
for q in range(len(aaa_binary)):
if len(added)!=i+1:
g.add_nodes_from (aaa_binary[q])
t1 = (aaa_binary[q][0],aaa_binary[q][1])
t2 = (aaa_binary[q][1],aaa_binary[q][2])
t3 = (aaa_binary[q][2],aaa_binary[q][3])
if g.has_edge(*t1)==False and g.has_edge(*t2)==False and g.has_edge(*t3)==False:
g.add_edge(*t1)
g.add_edge(*t2)
g.add_edge(*t3)
added.append([aaa_binary[q],'p'+ str(i)])
for j in range(len(added)):
if nx.shortest_path(g, added[j][0][0], added[j][0][3])!=added[j][0] or nx.shortest_path(g, aaa_binary[q][0], aaa_binary[q][3])!=aaa_binary[q]:
g.remove_edge(*t1)
g.remove_edge(*t2)
g.remove_edge(*t3)
added.remove([aaa_binary[q],'p'+ str(i)])
break
if g.has_edge(*t1)==False and g.has_edge(*t2)==False and g.has_edge(*t3)==True:
g.add_edge(*t1)
g.add_edge(*t2)
added.append([aaa_binary[q],'p'+ str(i)])
for j in range(len(added)):
if nx.shortest_path(g, added[j][0][0], added[j][0][3])!=added[j][0] or nx.shortest_path(g, aaa_binary[q][0], aaa_binary[q][3])!=aaa_binary[q]:
g.remove_edge(*t1)
g.remove_edge(*t2)
added.remove([aaa_binary[q],'p'+ str(i)])
break
# ... and then the rest of the False and True possibilities combinations in the `if g.has_edge()'condition.
added[] - list of currently valid paths in the form [[[0, 2, 4, 6], 'p0'], [[0, 2, 4, 1], 'p1'],...]
aaa_binary[] - list of path combinations to check in the form [[0, 2, 4, 6], [0, 2, 6, 4], [0, 4, 2, 6],...]
Loop operation:
The algorithm selects one sublist from the aaa_binary list, then adds nodes to the graph and creates edges. Then the algorithm checks if the given edge exists. If it does not exist, it adds it to the graph, if it exists, it does not add. Then, if the condition of the shortest path is not met, only the newly added edge is removed from the graph. And so until you find the right path from the aaa_binary list.
As you can see, even with four-element sublists there are 8 different combinations of False and True in the if g.has_edge() condition, which already poses a technical problem. However, I would like to extend this to check, for example, eight-element paths, and then there will be 128 combinations! Obviously I cannot do that in the current way.
And I care that the loop necessarily adds only non-existent edges, because then it is easier to control the creation of the optimal graph.
Hence my question, is it possible to write such a loop differently and automate it more? I will be very grateful for any comments.
How about:
added_now = []
for edge in (t1,t2,t3):
    if not g.has_edge(*edge):
        g.add_edge(*edge)
        added_now.append(edge)
added.append([aaa_binary[q],'p'+ str(i)])
for j in range(len(added)):
    if nx.shortest_path(g, added[j][0][0], added[j][0][3])!=added[j][0] or nx.shortest_path(g, aaa_binary[q][0], aaa_binary[q][3])!=aaa_binary[q]:
        for edge in added_now:
            g.remove_edge(*edge)
        added.remove([aaa_binary[q],'p'+ str(i)])
        break  # the candidate was rolled back, stop checking
You just want to do the same for each edge that wasn't added.
Does this solution suit you?
It doesn't restrict you to 4-element paths; it adapts to the length of the current aaa_binary[q]. If you want n-element paths, it should be easy to modify. :)
It also doesn't have an endless list of ifs.
for q in range(len(aaa_binary)):
    if len(added)!=i+1:
        g.add_nodes_from(aaa_binary[q])
        # Instead of having hard-coded variables, make a list.
        tn = []
        for idx in range(0, len(aaa_binary[q]) - 1):
            # Linking the current elem to the next one.
            # The len() - 1 avoids iterating on the last elem,
            # which has no elem after it.
            tn.append((aaa_binary[q][idx], aaa_binary[q][idx + 1]))
        # Instead of checking each and every case, try to make your
        # task 'general'. Here, you want to add the edges that don't exist yet.
        indexSaver = []
        for index, item in enumerate(tn):
            if not g.has_edge(*item):
                g.add_edge(*item)
                # Remember which items we added, since we do not
                # want to execute `.has_edge` multiple times.
                indexSaver.append(index)
        # This line is quite unclear as we do not know what 'added' is,
        # nor 'i' in your code. So I will leave it as is.
        added.append([aaa_binary[q], 'p' + str(i)])
        # Now that non-existent edges have been added...
        # I don't understand this part, so we only replace the hard-coded [3]
        # index, which seemingly meant the last element, with [-1].
        for j in range(len(added)):
            if nx.shortest_path(g, added[j][0][0], added[j][0][-1])!=added[j][0] or nx.shortest_path(g, aaa_binary[q][0], aaa_binary[q][-1])!=aaa_binary[q]:
                # By the same logic as adding edges, we delete them.
                for idx, item in enumerate(tn):
                    if idx in indexSaver:
                        g.remove_edge(*item)
                added.remove([aaa_binary[q], 'p' + str(i)])
                break
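The underlying idea in both answers (add only the edges that are missing, validate, and roll back only what was added) does not depend on networkx. A minimal sketch with a plain set of undirected edges might look like this; try_add_path is a hypothetical helper name, and is_valid is a caller-supplied stand-in for the shortest-path condition from the question.

```python
def try_add_path(edges, path, is_valid):
    """Add the missing edges of `path` to `edges`; undo them if validation fails.

    `edges` is a set of frozenset({u, v}) pairs and is modified in place.
    Works for paths of any length.
    """
    new_edges = {frozenset(e) for e in zip(path, path[1:])} - edges
    edges.update(new_edges)
    if is_valid(edges):
        return True
    # Roll back only the edges this call added; pre-existing ones are untouched.
    edges.difference_update(new_edges)
    return False


edges = {frozenset((0, 2)), frozenset((2, 4))}
print(try_add_path(edges, (0, 2, 1), lambda e: True))   # True, edge 2-1 is kept
print(try_add_path(edges, (0, 2, 5), lambda e: False))  # False, edge 2-5 is removed again
```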
  1
2   3
returns 1 + 3 = 4
I want to first find the maximum height of the tree and then the sum of the nodes along that longest path.
If two paths have the same height, only the path with the larger sum should be returned.
Sorry for my bad examples... All I want to express is that a tree like the one above
should be represented as a list like [1, [2, None, None], [3, None, None]] instead of [1,2,3]
Recursive function as recommended by Egg:
def sum_of_longest_branch(tree):
    """
    Parses a tree and returns a tuple containing (depth, sum) of deepest branch.
    """
    # stop conditions on leaf nodes (can be None or a single [node]);
    # None must be tested first because len(None) raises a TypeError
    if tree is None:
        return 0, 0  # an empty branch adds neither depth nor value
    elif 1 == len(tree):
        return 1, tree[0]
    # splitting the branches
    else:
        # calling the function recursively on all branches branching from current node
        branches_sums = [sum_of_longest_branch(branch) for branch in tree[1:]]
        # extracting the currently deepest branch (ties broken by the larger sum)
        depth, total = max(branches_sums)
        # add own node value and one depth before returning
        return depth + 1, total + tree[0]
Example:
tree = [1, [2, [4]], [3, [0]]]
depth, total = sum_of_longest_branch(tree)
print(depth, total)
Gives:
3 7
Sorry if it's quick & dirty, but it works. The problem is actually not that trivial, especially for a beginner to programming / Python. I hope it's understandable.
Edit: Now checks first for depth and secondarily for the sum.
def tree_height(tree):
    if isinstance(tree, list):
        tree = tree[1:]
        if tree:
            return 1 + max([tree_height(x) for x in tree])
    return 0

def tree_sum(tree):
    if tree and isinstance(tree, list):
        return tree[0] + sum([tree_sum(x) for x in tree[1:]])
    return (tree or 0)
Are these values weights? Is the tree sorted in any way (is there a particular order to the tree)?
If so, maybe you can do a modified version of Dijkstra's Algorithm where you take the longest distance at each junction instead of the shortest, and instead of a min-priority queue use a stack so you traverse depth first instead of breadth first.
EDIT:
Now that I think about it, perhaps using a max-priority queue would work better. I'm still not sure what you are trying to accomplish. What I think you are asking for is the path with the largest sum, which doesn't necessarily mean it will be the branch with the most nodes. The path with the most nodes, given that each node appears to have a weight, seems meaningless because there could be shorter paths with greater weight.
I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
1. If Red, then all ancestors are also Red (and should not be checked again).
2. If not Red, then all descendants are not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
    vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth-1, n=n) for child_val in range(vertex.val+1, n)]
else:
    vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps but even if I got an optimal tree traversal algorithm, I still would be checking too many combinations. (I noticed that just now.)
Let me explain. Assume I have checked [4, 5] (so 4 and 5 are removed from the bipartite graph as explained above, but that is irrelevant here). If this comes out as "Red", my tree will save me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
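One way to picture the extra pruning asked for in Update 2 is to stop thinking in terms of tree ancestors and remember results as sets instead: by rule 1 every subset of a Red combination is Red, and by rule 2 every superset of a non-Red combination is non-Red. The sketch below assumes exactly that monotonicity. make_cached_check is a hypothetical helper, and is_red stands in for the expensive connectivity check.

```python
def make_cached_check(is_red):
    """Wrap an expensive monotone check so that implied results are never recomputed."""
    known_red = []      # combinations found to be Red
    known_not_red = []  # combinations found not to be Red

    def check(combo):
        s = frozenset(combo)
        if any(s <= r for r in known_red):
            return True   # subset of a Red combination (rule 1)
        if any(s >= w for w in known_not_red):
            return False  # superset of a non-Red combination (rule 2)
        result = is_red(s)
        (known_red if result else known_not_red).append(s)
        return result

    return check


calls = []

def expensive_is_red(s):
    """Toy stand-in for the real property check; records how often it runs."""
    calls.append(s)
    return len(s) <= 2

check = make_cached_check(expensive_is_red)
print(check([4, 5]), check([5]), check([4]))  # only the first call runs the real check
```

The linear scans over the known sets have their own cost, so this only pays off when the property check is much more expensive than a few subset comparisons. Checking large combinations first makes each Red result rule out more of the remaining work.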
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (which, evaluated at (1,2), gives the total number of connected spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.
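As a rough illustration of the recursion, here is a minimal sketch of plain deletion–contraction that counts all connected spanning subgraphs of a small multigraph. It is a simplification, not the variant described above: it returns a single total rather than a count per subgraph size, and it does not skip the deletion branch when G-e is disconnected (that branch simply ends up contributing 0). The function names and the vertex numbering (R as 0..r-1, C as r..r+c-1) are assumptions made for this example. The running time is exponential, so it is only suitable for very small graphs.

```python
def count_connected_spanning_subgraphs(num_vertices, edges):
    """Count edge subsets of a multigraph that connect all its vertices.

    `edges` is a list of (u, v) pairs; self-loops and parallel edges are
    allowed because contraction creates them.
    """
    # Find the first edge that is not a self-loop.
    for i, (u, v) in enumerate(edges):
        if u != v:
            break
    else:
        # Only self-loops remain: each may be kept or dropped freely,
        # but the graph is connected only if a single vertex is left.
        return 2 ** len(edges) if num_vertices == 1 else 0
    rest = edges[:i] + edges[i + 1:]
    # Subgraphs that do not use e: count them in G - e.
    without_e = count_connected_spanning_subgraphs(num_vertices, rest)
    # Subgraphs that use e: count them in G / e (merge v into u).
    merged = [(u if a == v else a, u if b == v else b) for a, b in rest]
    with_e = count_connected_spanning_subgraphs(num_vertices - 1, merged)
    return without_e + with_e

def complete_bipartite_edges(r, c):
    """Edges of K_{r,c} with R numbered 0..r-1 and C numbered r..r+c-1."""
    return [(x, r + y) for x in range(r) for y in range(c)]
```

For K_{2,2}, which is a 4-cycle, `count_connected_spanning_subgraphs(4, complete_bipartite_edges(2, 2))` gives 5: the four spanning trees plus the full cycle.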
Python is close enough to pseudocode.
class counter(object):
    def __init__(self, ival = 0):
        self.count = ival
    def count_up(self):
        self.count += 1
        return self.count
def old_walk_fun(ilist, func=None):
    # Walks every node of the nested list unconditionally
    def old_walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            for q in ilist:
                tlist += old_walk_fun_helper(q, func, count+1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return old_walk_fun_helper(ilist, func)
    else:
        return []
def walk_fun(ilist, func=None):
    def walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            if(ilist[0] == "Red"): # Only evaluate sub-branches if current level is Red
                for q in ilist:
                    tlist += walk_fun_helper(q, func, count+1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return walk_fun_helper(ilist, func)
    else:
        return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
["Red",
    ["Red",
        ["Red",
            []
        ],
        ["White",
            []
        ],
        ["White",
            []
        ]
    ],
    ["White",
        ["White",
            []
        ],
        ["White",
            []
        ]
    ],
    ["Red",
        []
    ]
]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print("Unconditionally walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print("Selectively walking")
print("Reds found: %d" % red_counter.count)
print("Evaluations made: %d" % eval_counter.count)
print("")
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
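A minimal sketch of that test, assuming vertices are numbered 0..num_vertices-1 and edges are given as (u, v) pairs; the function name `connectedness_certificate` and the optional `rng` parameter are invented for this example:

```python
import random

def connectedness_certificate(num_vertices, edges, rng=random):
    """Return a list of edges proving connectedness, or None if disconnected.

    The certificate holds exactly the edges that merged two previously
    separate components, i.e. a spanning tree with num_vertices - 1 edges.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    order = list(edges)
    rng.shuffle(order)          # examine the edges in a random order
    certificate = []
    for u, v in order:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            certificate.append((u, v))
            if len(certificate) == num_vertices - 1:
                return certificate   # everything is merged: stop early
    # No edge was needed for at most one vertex; otherwise we ran out of edges.
    return certificate if num_vertices <= 1 else None
```

For a connected graph the result is a spanning tree, so its length is always num_vertices - 1, although which edges it contains depends on the random order.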
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
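The quick check might look like the sketch below. It simplifies the idea in one respect: instead of rolling the union-find structure back to the state just before the removed edge was added, it rebuilds the components from the surviving certificate edges and then looks for a single replacement edge. The function name `certificate_after_removal` is hypothetical, and edges are assumed to be stored as identically oriented (u, v) tuples so that membership tests work.

```python
def certificate_after_removal(num_vertices, edges, certificate, removed):
    """Update a connectedness certificate after deleting the edge `removed`.

    `edges` is the set of edges before the deletion and `certificate` is a
    spanning tree of it. Returns a certificate for the remaining graph, or
    None if the remaining graph is disconnected.
    """
    if removed not in certificate:
        return certificate      # quick check: the old certificate still holds

    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # The surviving certificate edges leave exactly two components.
    kept = [e for e in certificate if e != removed]
    for u, v in kept:
        parent[find(u)] = find(v)

    # Any remaining edge that crosses between them restores connectedness.
    for u, v in edges - {removed}:
        if find(u) != find(v):
            return kept + [(u, v)]
    return None
```

Only removals that hit a certificate edge cost more than a membership test, and even then a single pass over the remaining edges is enough.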
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.