So essentially it is a simple two-sum problem, but there can be multiple solutions. I would like to return all pairs in a given list that sum to the target, then tally the total number of pairs and return that as well. Currently I can only seem to return one pair of numbers.
So far my approach has been to implement a function that counts the number of additions done and keeps iterating while that number is less than the length of the list. This did not prove effective, as it still does not take other solutions into account. Any help would be greatly appreciated.
I took your code and did a couple of tweaks to where summations were being tested and how the data was being stored. Following is your tweaked code.
def suminlist(mylist,target):
    sumlist = []
    count = 0
    for i in range(len(mylist)):
        for x in range(i+1,len(mylist)):
            sum = mylist[i] + mylist[x]
            if sum == target:
                count += 1
                worklist = []
                worklist.append(mylist[i])
                worklist.append(mylist[x])
                sumlist.append(worklist)
    return count, sumlist

list = [0, 5, 4, -6, 2, 7, 13, 3, 1]
print(suminlist(list,4))
Things to point out.
The sumlist variable is defined as a list with no initial values.
When the sum of two values in the passed list equals the target value, they are placed into a new interim list, that list is appended to sumlist, and the count value is incremented.
Once all list combinations are identified, the count value and sumlist are returned to the calling statement.
Following was the test output at the terminal for your list.
#Dev:~/Python_Programs/SumList$ python3 SumList.py
(2, [[0, 4], [3, 1]])
To split the count value out from the list, you might consider unpacking the returned data as described in the reference "Returning Multiple Values".
Give that a try to see if it meets the spirit of your project.
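In case it helps, here is a minimal sketch of that unpacking. It restates the function above in a shorter form; the names numbers and pairs are my own choices, not something from your code.

```python
def suminlist(mylist, target):
    """Return (count, pairs) for every index pair i < x whose values sum to target."""
    pairs = []
    for i in range(len(mylist)):
        for x in range(i + 1, len(mylist)):
            if mylist[i] + mylist[x] == target:
                pairs.append([mylist[i], mylist[x]])
    return len(pairs), pairs

numbers = [0, 5, 4, -6, 2, 7, 13, 3, 1]

# Tuple unpacking splits the returned (count, pairs) into two variables.
count, pairs = suminlist(numbers, 4)
print(count)   # 2
print(pairs)   # [[0, 4], [3, 1]]
```

Since the count is just the length of the list of pairs, the function does not need a separate counter.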
You can use the itertools module for this job.
import itertools

my_list = [0, 1, 2, 3]
target = 3
out = [x for x in itertools.combinations(my_list, r=2) if sum(x) == target]
print(out)
>>> [(0, 3), (1, 2)]
If you feel like using a Python standard library import is cheating, the official documentation linked above showcases example code for a "low level" Python implementation.
Issue:
The reason only one of several possible pairs is returned lies in the first return line (return sumlist). As written, the function ends as soon as it finds the first pair of values whose sum equals the target value. Therefore, we need to adjust it.
Adjustment:
I add a list (finallist) at the beginning of the function to collect all the pairs that sum to the target value. Then I create a new empty list (list) right after the if statement, so that every time the sum of two values matches the target value, a fresh list is used to store that pair. I add two more lines to append the two values to list, and then append list to finallist. I also move the return statement to the final line and adjust the indentation. After this adjustment, the function no longer ends right after discovering the first pair. Instead, it keeps iterating until all pairs have been found and stored in finallist.
Originally, the function had a return statement (return -1) at the end for the situation where no pair sums to the target value. After the previous adjustment, that statement would never be reached, because the function always ends at the preceding return line (return finallist). It also cannot become an else branch of the if statement, because the function would then return at the first pair that does not match. Therefore, I replace it with a check after both loops: when finallist is still empty, we return 'No two values in the list can add up to the target value.'
Changes in Function:
def suminlist(mylist,target):
    # count = 0                                   # delete
    # while count < len(mylist):                  # delete
    finallist=[]                                  # add
    for i in range(len(mylist)):
        for x in range(i+1,len(mylist)):
            sum = mylist[i]+mylist[x]
            # count = count + 1                   # delete
            if sum == target:
                # sumlist = mylist[i],mylist[x]   # delete
                # return sumlist                  # delete
                list=[]                           # add
                list.append(mylist[i])            # add
                list.append(mylist[x])            # add
                finallist.append(list)            # add
    if not finallist:                             # add
        return 'No two values in the list can add up to the target value.' # add
    return finallist                              # add
    # return -1                                   # delete
Final Version:
def suminlist(mylist,target):
    finallist=[]
    for i in range(len(mylist)):
        for x in range(i+1,len(mylist)):
            sum = mylist[i]+mylist[x]
            if sum == target:
                list=[]
                list.append(mylist[i])
                list.append(mylist[x])
                finallist.append(list)
    if not finallist:
        return 'No two values in the list can add up to the target value.'
    return finallist
Test Code and Output:
list = [0, 5, 4, -6, 2, 7, 13, 3, 1]
print(suminlist(list,100))
# Output: No two values in the list can add up to the target value.
print(suminlist(list,4))
# Output: [[0, 4], [3, 1]]
I have code that counts the number of triangles in an undirected graph using the matrix multiplication method. Now I would like it to also print these triangles, preferably as their vertices. It could be done with third-party libraries, e.g. numpy or networkx, but it has to be done with matrix multiplication, as I know that I could do it with the naive version.
To make it simpler I will use the easiest adjacency matrix:
[[0, 1, 0, 0],
 [1, 0, 1, 1],
 [0, 1, 0, 1],
 [0, 1, 1, 0]]
It has edges:
x,y
0,1
1,2
1,3
2,3
So the triangle exists between vertices 1, 2, 3, and this is what I would like the program to ALSO print to the console.
Now the code, which just prints how many triangles are in this graph:
# num of vertexes
V = 4
# graph from adjacency matrix
graph = [[0, 1, 0, 0],
         [1, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]

# get the vertexes in a dict
vertexes = {}
for i in range(len(graph)):
    vertexes[i] = i

print(vertexes)
## >> {0: 0, 1: 1, 2: 2, 3: 3}

# matrix multiplication
def multiply(A, B, C):
    global V
    for i in range(V):
        for j in range(V):
            C[i][j] = 0
            for k in range(V):
                C[i][j] += A[i][k] * B[k][j]

# Utility function to calculate
# trace of a matrix (sum of
# diagonal elements)
def getTrace(graph):
    global V
    trace = 0
    for i in range(V):
        trace += graph[i][i]
    return trace

# Utility function for calculating
# number of triangles in graph
def triangleInGraph(graph):
    global V

    # To Store graph^2
    aux2 = [[None] * V for _ in range(V)]

    # To Store graph^3
    aux3 = [[None] * V for i in range(V)]

    # Initialising aux
    # matrices with 0
    for i in range(V):
        for j in range(V):
            aux2[i][j] = aux3[i][j] = 0

    # aux2 is graph^2 now printMatrix(aux2)
    multiply(graph, graph, aux2)

    # after this multiplication aux3 is
    # graph^3 printMatrix(aux3)
    multiply(graph, aux2, aux3)

    trace = getTrace(aux3)
    return trace // 6

print("Total number of Triangle in Graph :",
      triangleInGraph(graph))
## >> Total number of Triangle in Graph : 1
The thing is, the information about the triangle (more generally, information about the paths between a vertex i and a vertex j) is lost during the matrix multiplication process. All that is stored is how many such paths exist.
For the adjacency matrix itself, whose entries are the number of length-1 paths between i and j, the answer is obvious: if a path exists, it has to be the edge (i,j). But even in M², when you see the number 2 at row i, column j, all you know is that there are 2 length-2 paths connecting i to j. That is, there exist 2 different indices k₁ and k₂ such that (i,k₁) and (k₁,j) are edges, and so are (i,k₂) and (k₂,j).
That is exactly why matrix multiplication works (and that is a virtue of coding it as explicitly as you did: I don't need to remind you that M²ᵢⱼ = ΣₖMᵢₖ×Mₖⱼ).
So it is exactly that: 1 for every intermediate vertex k such that (i,k) and (k,j) are both edges, that is, 1 for every intermediate vertex k such that (i,k),(k,j) is a length-2 path from i to j.
But as you can see, that Σ is just a sum. In a sum, we lose the detail of what contributed to it.
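To make that loss of detail concrete, here is a small sketch on your example graph. The two helper names (length2_count and length2_intermediates) are made up for this illustration: the first computes one entry of M², the second keeps the intermediate vertices that the sum throws away.

```python
graph = [[0, 1, 0, 0],
         [1, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]

def length2_count(M, i, j):
    # One entry of M squared: the number of length-2 paths from i to j.
    return sum(M[i][k] * M[k][j] for k in range(len(M)))

def length2_intermediates(M, i, j):
    # The same terms, but keeping which k contributed instead of summing.
    return [k for k in range(len(M)) if M[i][k] and M[k][j]]

print(length2_count(graph, 1, 1))          # 3
print(length2_intermediates(graph, 1, 1))  # [0, 2, 3]
```

The count 3 only says that three length-2 paths go from vertex 1 back to itself; the list says they pass through 0, 2 and 3.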
In other words, there is nothing to recover from what you computed. You've just computed the number of length-3 paths from i to j, for all i and j, and in particular what you are interested in: the number of length-3 paths from i to i for all i.
So the only solution you have is to write another algorithm that does a completely different computation (and makes yours redundant: why compute the number of paths when you have, or will compute, the list of paths?).
That computation is a rather classic one: you are just looking for paths from one node to another. Only, those two nodes are the same.
Nevertheless, the most classical algorithms (Dijkstra, Ford, ...) are not really useful here (you are not searching for the shortest path, and you want all paths, not just one).
One method I can think of is to start from your code anyway ("anyway" because I said earlier that counting the paths is redundant). It is not the easiest way, but your code is already here; besides, I always try to stay as close as possible to the original code.
Compute a matrix of paths
As I've said earlier, the formula ΣAᵢₖBₖⱼ makes sense: it counts the cases where we have some paths (Aᵢₖ) from i to k and some other paths (Bₖⱼ) from k to j.
You just have to do the same thing, but instead of summing numbers, concatenate lists of paths.
For the sake of simplicity, I'll use lists to store paths here. So path i,k,j is stored as the list [i,k,j]. Each cell of our matrix then holds a list of paths, that is, a list of lists (and since our matrix is itself implemented as a list of lists, the path matrix is a list of lists of lists of lists).
The path matrix (I made up the name just now, but I am pretty sure it already has an official name, since the idea can't be new, and that official name is probably "path matrix") for the initial matrix is very simple: each element is [] (no path) where Mᵢⱼ is 0, and [[i,j]] (1 path, i→j) where Mᵢⱼ is 1.
So, let's build it
def adjacencyToPath(M):
    P=[[[] for _ in range(len(M))] for _ in range(len(M))]
    for i in range(len(M)):
        for j in range(len(M)):
            if M[i][j]==1:
                P[i][j]=[[i,j]]
            else:
                P[i][j]=[]
    return P
Now that you have that, we just have to follow the same idea as in the matrix multiplication. For example (to use the most complete example, even if it is out of your scope, since you don't compute more than M³), when you compute M²×M³, so that M⁵ᵢⱼ = ΣM²ᵢₖM³ₖⱼ, this means that if M²ᵢₖ is 3 and M³ₖⱼ is 2, then you have 6 paths of length 5 between i and j that reach node k after their 2nd step: all 6 possible combinations of the 3 ways to go from i to k in 2 steps and the 2 ways to go from k to j in 3 steps.
So, let's do that for the path matrix as well.
# Args: 2 lists of paths.
# Returns 1 list of paths
# Ex, if lp1=[[1,2,3], [1,4,3]] and lp2=[[3,2,4,2], [3,4,5,2]]
# Then returns [[1,2,3,2,4,2], [1,2,3,4,5,2], [1,4,3,2,4,2], [1,4,3,4,5,2]]
def combineListPath(lp1, lp2):
    res=[]
    for p1 in lp1:
        for p2 in lp2:
            res.append(p1+p2[1:]) # p2[0] is redundant with p1[-1]
    return res
And the path matrix multiplication therefore goes like this
def pathMult(P1, P2):
    res=[[[] for _ in range(len(P1))] for _ in range(len(P1))]
    for i in range(len(P1)):
        for j in range(len(P1)):
            for k in range(len(P1)):
                res[i][j] += combineListPath(P1[i][k], P2[k][j])
    return res
So, all we have to do now is use this pathMult function the way we use matrix multiplication. As you computed aux2, let's compute pm2
pm=adjacencyToPath(graph)
pm2=pathMult(pm, pm)
and as you computed aux3, let's compute pm3
pm3=pathMult(pm, pm2)
And now pm3 holds, in each cell pm3[i][j], the list of paths of length 3 from i to j. In particular, every pm3[i][i] holds the list of triangles through i.
Now, the advantage of this method is that it mimics exactly your way of computing the number of paths: we do the exact same thing, but instead of retaining the number of paths, we retain the list of them.
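One thing to be aware of: each triangle shows up several times on the diagonal (once per starting vertex and per direction). Here is a rough sketch of how the duplicates could be collapsed, assuming the graph has no self-loops. The snake_case functions are compact restatements of the ones above, and unique_triangles is a name I made up for this example.

```python
graph = [[0, 1, 0, 0],
         [1, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]

def adjacency_to_path(M):
    # Cell (i, j) holds [[i, j]] if the edge exists, else an empty list.
    n = len(M)
    return [[[[i, j]] if M[i][j] == 1 else [] for j in range(n)] for i in range(n)]

def path_mult(P1, P2):
    # Same triple loop as a matrix product, concatenating paths instead of summing.
    n = len(P1)
    res = [[[] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                res[i][j] += [p1 + p2[1:] for p1 in P1[i][k] for p2 in P2[k][j]]
    return res

def unique_triangles(M):
    pm = adjacency_to_path(M)
    pm3 = path_mult(pm, path_mult(pm, pm))
    found = set()
    for i in range(len(M)):
        for walk in pm3[i][i]:                   # closed walk [i, a, b, i]
            found.add(tuple(sorted(walk[:3])))   # same vertex set -> same triangle
    return sorted(found)

print(unique_triangles(graph))   # [(1, 2, 3)]
```

Sorting the three vertices gives every rotation and direction of a triangle the same key, so the set keeps one entry per triangle.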
Faster way
Obviously there are more efficient ways. For example, you could just search for pairs (i,j) of connected nodes such that there is a third node k connected to both i and j (with an edge (j,k) and an edge (k,i), making no assumption about whether your graph is oriented or not).
def listTriangle(M):
    res=[]
    for i in range(len(M)):
        for j in range(i,len(M)):
            if M[i][j]==0: continue
            # So, at this point, we know i->j is an edge
            for k in range(i,len(M)):
                if M[j][k]>0 and M[k][i]>0:
                    res.append( (i,j,k) )
    return res
We assume j≥i and k≥i, because triangles (i,j,k), (j,k,i) and (k,i,j) are the same, and either all exist or none do.
It could be optimized further if we assume that the graph is always non-oriented (or at least symmetric), as your example suggests. In that case we can assume i≤j≤k, for example (since triangles (i,j,k) and (i,k,j) are then also the same), turning the 3rd loop from for k in range(i, len(M)) into for k in range(j, len(M)). And if we also exclude loops (either because there are none, as in your example, or because we don't want to count them as part of a triangle), then we can assume i<j<k, which turns the last 2 loops into for j in range(i+1, len(M)) and for k in range(j+1, len(M)).
Optimisation
One last thing I didn't want to introduce until now, to stay as close as possible to your code. It is worth mentioning that Python already has matrix manipulation routines, through numpy and the @ operator. So it is better to take advantage of them (even though I took advantage of the fact that you reinvented the wheel of matrix multiplication to explain my path multiplication).
Your code, for example, becomes
import numpy as np
graph = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]])

# Utility function for calculating
# number of triangles in graph
# That is the core of your code
def triangleInGraph(graph):
    return (graph @ graph @ graph).trace()//6 # numpy magic
# shorter than your version, isn't it?

print("Total number of Triangle in Graph :",
      triangleInGraph(graph))
## >> Total number of Triangle in Graph : 1
Mine is harder to optimize that way, but that can be done. We just have to define a new type, PathList, and define what are multiplication and addition of pathlists.
class PathList:
    def __init__(self, pl):
        self.l=pl

    def __mul__(self, b): # That's my previous pathmult
        res=[]
        for p1 in self.l:
            for p2 in b.l:
                res.append(p1+p2[1:])
        return PathList(res)

    def __add__(self,b): # Just concatenation of the 2 lists
        return PathList(self.l+b.l)

    # For fun, a compact way to print it
    def __repr__(self):
        res=''
        for n in self.l:
            one=''
            for o in n:
                one=one+'→'+str(o)
            res=res+','+one[1:]
        return '<'+res[1:]+'>'
Using this PathList type (which is just the same list of lists as before, but with add and mul operators), we can now redefine our adjacencyToPath
def adjacencyToPath(M):
    P=[[[] for _ in range(len(M))] for _ in range(len(M))]
    for i in range(len(M)):
        for j in range(len(M)):
            if M[i][j]==1:
                P[i][j]=PathList([[i,j]])
            else:
                P[i][j]=PathList([])
    return P
And now, a bit of numpy magic
pm = np.array(adjacencyToPath(graph))
pm3 = pm@pm@pm
triangles = [pm3[i,i] for i in range(len(pm3))]
pm3 is the matrix of all length-3 paths from i to j. So pm3[i,i] are the triangles.
Last remark
Some Python remarks on your code.
It is better to compute V from your data than to assume that the coder is consistent when choosing V=4 and a 4x4 graph. So V=len(graph) is better.
You don't need global V if you don't intend to overwrite V, and it is better to avoid as many global keywords as possible. I am not repeating a dogma here. I've nothing against a global variable from time to time, if we know what we are doing. Besides, in Python there is already a sort of local structure even for global variables (they are still local to the module), so it is not like some languages where global variables carry a high risk of collision with library symbols. But, well, there is no need to take the risk of overwriting V.
There is also no need for the allocate-then-write pattern you use for your matrix multiplication (you allocate the result matrices first, then call multiply(source1, source2, dest)). You can just return a new matrix; you have a garbage collector now. Well, sometimes it is still a good idea to spare the allocator/garbage collector some work, especially if you intend to "recycle" some variables (as in mult(A,A,B); mult(A,B,C); mult(A,C,B), where B is "recycled").
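Put together, those remarks could look something like the sketch below: no global V, sizes taken from the data, and a multiply that returns a new matrix. The names trace and triangle_in_graph are just my renamings of your getTrace and triangleInGraph.

```python
def multiply(A, B):
    """Return the product of two square matrices as a new list of lists."""
    n = len(A)   # size comes from the data, no global V needed
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    """Sum of the diagonal elements."""
    return sum(M[i][i] for i in range(len(M)))

def triangle_in_graph(graph):
    # graph^3 = graph x (graph x graph); each triangle is counted 6 times on the diagonal.
    cube = multiply(graph, multiply(graph, graph))
    return trace(cube) // 6

graph = [[0, 1, 0, 0],
         [1, 0, 1, 1],
         [0, 1, 0, 1],
         [0, 1, 1, 0]]

print("Total number of Triangle in Graph :", triangle_in_graph(graph))
## >> Total number of Triangle in Graph : 1
```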
Since a triangle is defined by a sequence of vertices i, j, k such that i < j < k and the edges (i,j), (j,k) and (k,i) all exist, we can define the following function:
def find_triangles(adj, n=None):
    if n is None:
        n = len(adj)

    triangles = []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if (adj[i][j] and adj[j][k] and adj[k][i]):
                    triangles.append([i, j, k])
    return triangles

print("The triangles are: ", find_triangles(graph, V))
## >> The triangles are: [[1, 2, 3]]
Hi all – I'm new to coding if anybody is wondering.
This is my first time posting here and I'm currently stuck on one of my assignments. The code below is my draft code.
The expected output for adjacency_list is:
[[1,3], [2], [3], [0,2]]
# with possibly [3,1] instead of [1,3] or [0,2] instead of [2,0]
The output from my code is:
[[1, 3], [2], [3], [0, 2]] – which is what I wanted
And for the expected output for maximal_path is:
[0, 1, 2, 3] and [0, 3, 2]
Whereas my output is:
[0, 1, 3, 2] – which is totally different from the expected output that my professor wanted. I have been going through the second function and keep hitting a dead end. Can anyone show where my mistake is with the second function?
Here is the digraph.txt list:
6 4
0 1
0 3
1 2
2 3
3 0
3 2
Thank you!
def adjacency_list(file_name):
    # create empty list
    adj_temp_list = []
    # open input file
    f = open('digraph0.txt','r')
    # read first line
    in_file = f.readline()
    first_row = [int(i) for i in in_file.strip().split()]
    # first_row[0] represents number of edges and first_row[1] represents number of vertices
    # add number of vertices amount of empty lists to adj_temp_list
    for i in range(first_row[1]):
        adj_temp_list.append([])
    # read edges( each line represents one edge)
    for i in range(first_row[0]):
        line = f.readline()
        # split the line and convert the returned list into a list of integers
        vertex=[int(i) for i in line.strip().split()]
        # vertex[0] and vertex[1] represents start vertex and end vertex respectively.
        # that is, vertex[1] is adjacent vertex or neighbour to vertex[0]
        # Thus, append vertex[1] to the adjcency list of vertex[0]
        adj_temp_list[vertex[0]].append(vertex[1])
    # close input file
    f.close()
    # create a new empty list
    adj_list=[]
    # sort each list in the adj_temp_list and append to adj_list
    for lst in adj_temp_list:
        lst.sort()
        adj_list.append(lst)
    # return the adjacency list
    return adj_list

def maximal_path(file_name,start):
    # call adjacency_list function to get adjacency list
    adj_list=adjacency_list(file_name)
    # create max_path as an empty list
    max_path=[]
    # create a boolean list that represents which vertex is added to path
    # Thus, initialize the list such that it represents no vertex is added to path
    visited=[False]*len(adj_list)
    # create an empty list(stack)
    stack=[]
    # add start vertex to stack list
    stack.append(start)
    # process until there is no possibility to extend the path
    while True:
        # get a node u from stack and remove it from stack
        u=stack[0]
        stack=stack[1:]
        # set vertex u visited and add it to max_path
        visited[u]=True
        max_path.append(u)
        # find whether u has unvisited neighbours to extend path or not
        existNeighbour=False
        for v in adj_list[u]:
            if visited[v]==False:
                existNeighbour=True
                break
        # if u has unvisited adjacent vertices
        if existNeighbour==True:
            # push all un-visited neighbours into stack
            for v in adj_list[u]:
                stack.append(v)
        # if u has no unvisited adjacent vertices, exit the loop
        else:
            break
    # return the maximal path
    return max_path
Your approach is what's known as a Breadth First Search, or BFS. Because you append new nodes to the end of the stack (so it really behaves as a queue), the graph is explored layer by layer. The smallest change you can make to your program to get your expected output is to insert the new nodes at the beginning of the stack instead, which turns the traversal into a depth-first one.
stack.insert(0,v)
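Here is a rough sketch of the loop with that change applied. To keep it self-contained I made two assumptions that are not in your code: the function takes the adjacency list directly instead of a file name, and it has an extra reverse flag so you can see both of the expected paths. It also pushes only unvisited neighbours, as your comment intends.

```python
def maximal_path(adj_list, start, reverse=False):
    """Follow unvisited neighbours depth-first until the path cannot be extended."""
    visited = [False] * len(adj_list)
    stack = [start]
    max_path = []
    while stack:
        u = stack.pop(0)              # take the node at the front
        visited[u] = True
        max_path.append(u)
        unvisited = [v for v in adj_list[u] if not visited[v]]
        if not unvisited:
            break                     # u has no unvisited neighbour: path is maximal
        if reverse:
            unvisited.reverse()
        for v in unvisited:
            stack.insert(0, v)        # the change: push to the front, not the end
    return max_path

adj = [[1, 3], [2], [3], [0, 2]]
print(maximal_path(adj, 0))                # [0, 3, 2]
print(maximal_path(adj, 0, reverse=True))  # [0, 1, 2, 3]
```

Which of the two expected paths you get depends only on the order in which the neighbours are pushed.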
I have an algorithm that creates a graph that has all representations of 3-bit binary strings encoded in the form of the shortest graph paths, where an even number in the path means 0, while an odd number means 1:
from itertools import permutations, product
import networkx as nx
import progressbar
import itertools

def groups(sources, template):
    func = permutations
    keys = sources.keys()
    combos = [func(sources[k], template.count(k)) for k in keys]
    for t in product(*combos):
        d = {k: iter(n) for k, n in zip(keys, t)}
        yield [next(d[k]) for k in template]

g = nx.Graph()
added = []
good = []
index = []
# I create a list of 3-bit binary strings.
# Of each pair of binary strings that mirror each other, I keep only one.
list_1 = [list(i) for i in itertools.product(tuple(range(2)), repeat=3) if tuple(reversed(i)) >= tuple(i)]
count = list(range(len(list_1)))

h = 0
while len(added) < len(list_1):
    # In each next step I enlarge the list `good` by the next even and odd number
    if h != 0:
        for q in range(2):
            good.append([i for i in good if i%2 == q][-1] + 2)
    # I create a list `c` with the indices of strings from `list_1` that are not yet used.
    # The `index` list stores the indices of strings from `list_1` whose representations have already been correctly added to the `added` list.
    c = [item for item in count if item not in index]
    for m in c:
        # I create representations of binary strings, where 0 is 'v0' and 1 is 'v1'. For example, the '001' combination is now 'v0v0v1'
        a = ['v{}'.format(x%2) for x in list_1[m]]
        if h == 0:
            for w in range(2):
                if len([i for i in good if i%2 == w]) < a.count('v{}'.format(w)):
                    for j in range(len([i for i in good if i%2 == w]), a.count('v{}'.format(w))):
                        good.insert(j,2*j + w)
        sources={}
        for x in range(2):
            sources["v{0}".format(x)] = [n for n in good if n%2 == x]
        # For each representation such as 'v0v0v1', I examine all combinations where 'v0' is an even number and 'v1' is an odd number, choosing values from the `good` list and checking the following conditions.
        for aaa_binary in groups(sources, a):
            # Here, the edges and nodes of the combination `aaa_binary` are added to the graph and the combination is checked against the conditions. If it meets them, it stays in the `added` list. If not, the newly added edges are removed and the next `aaa_binary` combination is taken.
            g.add_nodes_from (aaa_binary)
            t1 = (aaa_binary[0],aaa_binary[1])
            t2 = (aaa_binary[1],aaa_binary[2])

            added_now = []
            for edge in (t1,t2):
                if not g.has_edge(*edge):
                    g.add_edge(*edge)
                    added_now.append(edge)

            added.append(aaa_binary)
            index.append(m)

            for f in range(len(added)):
                if nx.shortest_path(g, aaa_binary[0], aaa_binary[2]) != aaa_binary or nx.shortest_path(g, added[f][0], added[f][2]) != added[f]:
                    for edge in added_now:
                        g.remove_edge(*edge)
                    added.remove(aaa_binary)
                    index.remove(m)
                    break
            # Stop searching for a good combination once one was found, i.e. the result was added to the `added` list and the index from `list_1` was added to the `index` list
            if m in index:
                break

    good.sort()
    set(good)
    index.sort()

    h = h+1
Output paths representing 3-bit binary strings from added:
[[0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5], [0, 3, 6], [3, 0, 7]]
So these are representations of 3-bit binary strings:
[[0, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
Where in the step h = 0 the first 4 sub-lists were found, and in the step h = 1 the last two sub-lists were added.
Of course, as you can see, there are no reflections of the mirrored strings, because there is no such need in an undirected graph.
Graph:
The above solution creates a minimal graph with unique shortest paths. This means that each binary string has only one representation on the graph in the form of a shortest path, so choosing a given path unambiguously identifies a given binary sequence.
Now I would like to use multiprocessing on the for m in c loop, because the order in which elements are found does not matter here.
I tried to use multiprocessing in this way:
from multiprocessing import Pool

added = []

def foo(i):
    added = []
    # do something
    added.append(x[i])
    return added

if __name__ == '__main__':
    h = 0
    while len(added)<len(c):
        pool = Pool(4)
        result = pool.imap_unordered(foo, c)
        added.append(result[-1])
        pool.close()
        pool.join()
        h = h + 1
Multiprocessing takes place in the while loop, and the added list is created in the foo function. In each subsequent step h of the loop, the list added should be extended with new values, and the current list added should be used in the function foo. Is it possible to pass the current contents of the list to the function in each subsequent step of the loop? In the above code, the foo function creates the contents of the added list from scratch each time. How can this be solved?
This in consequence gives bad results:
[[0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5], [0, 1, 2], [1, 0, 3]]
For such a graph, with these nodes and edges, the condition nx.shortest_path(graph, i, j) == added[k] is not met for the end nodes i, j of every added[k] in the added list.
For h = 0 the elements [0, 2, 4], [0, 2, 1], [2, 1, 3], [1, 3, 5] are good, while the elements added in step h = 1, i.e. [0, 1, 2], [1, 0, 3], are evidently found without taking the elements from the previous step into account.
How can this be solved?
I realize that this is a type of sequential algorithm, but I am also interested in partial solutions, i.e. running even parts of the algorithm in parallel. For example, the steps h of the while loop could run sequentially while the for m in c loop uses multiprocessing. Or other partial solutions that would speed up the entire algorithm for larger combinations.
I would be grateful if someone could show and implement an idea for using multiprocessing in my algorithm.
I don't think you can parallelise the code as it currently is. The part that you want to parallelise, the for m in c loop, accesses three global lists (good, added and index) and the graph g itself. You could use a multiprocessing.Array for the lists, but that would undermine the whole point of parallelisation, as multiprocessing.Array (docs) is synchronised, so the processes would not actually be running in parallel.
So, the code needs to be refactored. My preferred way of parallelising algorithms is to use a kind of producer / consumer pattern:
1. initialisation to set up a job queue that needs to be executed (runs sequentially)
2. have a pool of workers that all pull jobs from that queue (runs in parallel)
3. after the job queue has been exhausted, aggregate results and build up the final solution (runs sequentially)
In this case 1. would be the setup code for list_1, count and probably the h == 0 case. After that you would build a queue of "job orders", which would be the c list -> pass that list to a bunch of workers -> get the results back and aggregate. The problem is that each execution of the for m in c loop has access to global state, and the global state changes after each iteration. This logically means that you cannot run the code in parallel, because the first iteration changes the global state and affects what the second iteration does. That is, by definition, a sequential algorithm. You cannot, at least not easily, parallelise an algorithm that iteratively builds a graph.
You could use multiprocessing.Pool.starmap and multiprocessing.Array, but that doesn't solve the problem. You still have the graph g, which is also shared between all processes. So the whole thing would need to be refactored in such a way that each iteration of the for m in c loop is independent of any other iteration of that loop, or the entire logic has to be changed so that the for m in c loop is not needed to begin with.
UPDATE
I was thinking that you could possibly turn the algorithm towards a slightly less sequential version with the following changes. I'm pretty sure the code already does something rather similar, but the code is a little too dense for me and graph algorithms aren't exactly my specialty.
Currently, for a new triple ('101' for instance), you're generating all possible connection points in the existing graph, then adding the new triple to the graph and eliminating nodes based on measuring shortest paths. This requires checking for shortest paths on the graph and modifying, which prevents parallelisation.
NOTE: what follows is a pretty rough outline for how the code could be refactored, I haven't tested this or verified mathematically that it actually works correctly
NOTE 2: In the discussion below, '101' (notice the quotes '') is a binary string, and so are '00' and '1', whereas 1, 0, 4 and so on (without quotes) are vertex labels in the graph.
What if you instead did a kind of substring search on the existing graph? I'll use the first triple as an example. To initialise:
- generate a job_queue which contains all triples
- take the first one and insert it, for instance '000', which would be (0, 2, 4). This is trivial, with no need to check anything: the graph is empty when you start, so the shortest path is by definition the one you insert.
At this point you also have partial paths for '011', '001', '010' and conversely ('110' and '100', because the graph is undirected). We're going to utilise the fact that the existing graph contains sub-solutions to the remaining triples in job_queue. Let's say the next triple is '010'; you iterate over the binary string '010' or list('010'):
- if a path/vertex for '0' already exists in the graph --> continue
- if a path/vertices for '01' already exists in the graph --> continue
- if a path/vertices for '010' exists you're done, no need to add anything (this is actually a failure case: '010' should not have been in the job queue anymore because it was already solved).
The second bullet point would fail because '01' does not exist in the graph. Insert '1', which in this case would be node 1, into the graph and connect it to one of the three even nodes. I don't think it matters which one, but you have to record which one it was connected to; let's say you picked 0. The graph now looks something like
0 - 2 - 4
 \  *
  \ *
   \*
    1
The optimal edge to complete the path is 1 - 2 (marked with stars), giving the path 0 - 1 - 2 for '010'. This is the path that maximises the number of triples encoded once the edge 1 - 2 is added to the graph. If you add 1 - 4 you encode only the '010' triple, whereas 1 - 2 encodes '010' but also '001' and '100'.
As an aside, let's pretend you connected 1 to 2 at first, instead of 0 (the first connection was picked at random). You now have a graph
0 - 2 - 4
    |
    |
    |
    1
and you can connect 1 to either 4 or 0, but you again get a graph that encodes the maximum number of triples remaining in job_queue.
So how do you check how many triples a potential new path encodes? You can check this relatively easily and, more importantly, the check can be done in parallel without modifying the graph g. For 3-bit strings the savings from parallelism aren't that big, but for 32-bit strings they would be. Here's how it works.
1. (sequential) generate all possible complete paths from the sub-path 0-1 -> (0-1-2), (0-1-4).
2. (parallel) for each potential complete path, check how many other triples that path solves, i.e. for each path candidate generate all the triples that the graph solves and check if those triples are still in job_queue.
   - (0-1-2) solves two other triples: '001' (4-2-1) or (2-0-1), and '100' (1-0-2) or (1-2-4).
   - (0-1-4) only solves the triple '010', i.e. itself
3. the edge/path that solves the most triples remaining in job_queue is the optimal solution (I don't have a proof of this).
You run 2. above in parallel, copying the graph to each worker. Because you're not modifying the graph, only checking how many triples it solves, you can do this in parallel. Each worker should have a signature something like
def check_graph(g, path, copy_of_job_queue):
    # do some magic
    return (n_paths_solved, paths_solved)
Here path is either (0-1-2) or (0-1-4), and copy_of_job_queue is a copy of the remaining triples on job_queue. For K workers you create K copies of the queue. Once the worker pool finishes you know which path, (0-1-2) or (0-1-4), solves the most triples.
You then add that path and modify the graph, and remove the solved paths from the job queue.
RINSE - REPEAT until job queue is empty.
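A minimal sketch of what such a worker could look like. It assumes a reading of the example above that is not spelled out anywhere: a triple counts as encoded when the parities of the nodes (even -> '0', odd -> '1') along some simple three-node path spell it. The adjacency-dict representation and the helper name encoded_strings are made up for illustration.

```python
def encoded_strings(adj, length=3):
    """Bit strings spelled by node parity (even -> '0', odd -> '1')
    along every simple path of `length` nodes. Assumed encoding."""
    found = set()

    def walk(path):
        if len(path) == length:
            found.add("".join(str(node % 2) for node in path))
            return
        for nb in adj[path[-1]]:
            if nb not in path:  # simple paths only, no revisits
                walk(path + [nb])

    for start in adj:
        walk([start])
    return found


def check_graph(g, path, copy_of_job_queue):
    """Score one candidate path without modifying g."""
    trial = {node: set(nbs) for node, nbs in g.items()}  # private copy
    for a, b in zip(path, path[1:]):
        trial.setdefault(a, set()).add(b)
        trial.setdefault(b, set()).add(a)
    paths_solved = encoded_strings(trial) & set(copy_of_job_queue)
    return len(paths_solved), paths_solved


# Graph 0 - 2 - 4 with node 1 already attached to 0.
g = {0: {1, 2}, 1: {0}, 2: {0, 4}, 4: {2}}
job_queue = {"001", "010", "011", "100", "101", "110", "111"}

n_solved, solved = check_graph(g, (0, 1, 2), job_queue)
print(n_solved, sorted(solved))
```

Each call only reads g and its own copy of the queue, so the calls for different candidates are independent and could be handed to a worker pool.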
There are a few obvious problems with the above. For one, you're doing a lot of copying and looping over job_queue. If you're dealing with large bit spaces, say 32 bits, then job_queue is pretty long, so you might not want to keep copying it to all the workers.
For the parallel step above (2.) you might want to have job_queue actually be a dict where the key is the triple, say '010', and the value is a boolean flag saying if that triple is already encoded in the graph.
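As a rough illustration of that dict layout (the helper names mark_solved and remaining are hypothetical):

```python
from itertools import product

# Every 3-bit string starts out flagged as "not yet encoded in the graph".
job_queue = {"".join(bits): False for bits in product("01", repeat=3)}


def mark_solved(job_queue, triples):
    """Flip the flag for each triple the graph now encodes."""
    for triple in triples:
        job_queue[triple] = True


def remaining(job_queue):
    """Triples that still need a path."""
    return [triple for triple, done in job_queue.items() if not done]


mark_solved(job_queue, ["000", "010", "001", "100"])
print(remaining(job_queue))  # ['011', '101', '110', '111']
```

Looking up a triple is then a single dict access instead of a scan over a list.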
Is there a faster algorithm? Look at these two trees (I've represented the numbers in binary to make the paths easier to see). To reduce this from 14 nodes to 7 nodes, can you layer the required paths from one tree onto the other? You can add any edge you like to one of the trees as long as it doesn't connect a node with its ancestors.
          _ 000
   _ 00 _/
  /      \_ 001
 0         _ 010
  \_ 01 _/
         \_ 011
          _ 100
   _ 10 _/
  /      \_ 101
 1         _ 110
  \_ 11 _/
         \_ 111
Can you see, for example, that connecting 01 to 00 would be similar to replacing the head of the tree, 0, with 01, and thus with one edge you have added 100, 101 and 110?
Given a graph G, a node n and a length L, I'd like to collect all (non-cyclic) paths of length L that depart from n.
Do you have any idea on how to approach this?
For now, my graph is a networkx.Graph instance, but I would not mind if e.g. igraph is recommended instead.
Thanks a lot!
A very simple way to approach this problem is to use the adjacency matrix A of the graph. The (i,j)th element of A^L is the number of walks of length L between nodes i and j (a walk is a path that is allowed to revisit nodes). So if you sum these over all j, keeping i fixed at n, you get the number of walks of length L emanating from node n.
This will unfortunately also count the walks that end back at n. These, happily, are given by the element A^L(n,n), so just subtract that.
So your final answer is: Σj{A^L(n,j)} - A^L(n,n).
Word of caution: say you're looking for paths of length 5 from node 1. This calculation will still count paths with small cycles inside, like 1-2-3-2-4, whose length is 5 or 4 depending on how you choose to see it, so be careful about that. Note also that it gives you a count, not the paths themselves.
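A small sketch of that calculation in plain Python, assuming the graph is given as a 0/1 adjacency matrix (list of lists) and L >= 1. The function names are made up for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def count_walks(adj_matrix, n, length):
    """Sum_j A^L(n, j) - A^L(n, n) for L = length (assumes length >= 1)."""
    power = adj_matrix
    for _ in range(length - 1):
        power = mat_mul(power, adj_matrix)
    return sum(power[n]) - power[n][n]


# Complete graph on 4 nodes: every pair of distinct nodes is connected.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(count_walks(k4, 0, 3))  # 21
```

In this graph there are only 3 * 2 * 1 = 6 paths of three edges from node 0 that never repeat a node, so the 21 illustrates the word of caution: walks with small cycles inside are still counted.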
I would just like to expand on Lance Helsten's excellent answer:
The depth-limited search looks for a particular node within a certain depth (what you're calling the length L), and stops when it finds it. If you take a look at the pseudocode in the wiki link in his answer, you'll understand this:
DLS(node, goal, depth) {
  if ( depth >= 0 ) {
    if ( node == goal )
      return node
    for each child in expand(node)
      DLS(child, goal, depth-1)
  }
}
In your case, however, as you're looking for all paths of length L from a node, you will not stop anywhere. So the pseudocode must be modified to:
DLS(node, depth) {
  if ( depth > 0 ) {
    for each child in expand(node) {
      record paths as [node, child]
      DLS(child, depth-1)
    }
  }
}
After you're done recording all the single-link paths from the successive nested calls of the DLS, chain them together (each link's child is the next link's node) to get the entire paths. The number of these gives you the number of paths of the required depth starting from the node.
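One way this could look in Python, as a sketch rather than a definitive implementation: instead of recording single links and joining them afterwards, it carries the current path along and yields it once it has L edges. The adjacency dict and the function name paths_of_length are assumptions for the example.

```python
def paths_of_length(adj, node, length, _path=None):
    """Yield every path with exactly `length` edges starting at `node`,
    skipping any branch that would revisit a node (non-cyclic paths only)."""
    path = [node] if _path is None else _path
    if length == 0:
        yield list(path)  # copy, because `path` is mutated while backtracking
        return
    for child in adj[node]:
        if child in path:
            continue  # would close a cycle
        path.append(child)
        yield from paths_of_length(adj, child, length - 1, path)
        path.pop()


adj = {
    0: [1, 2], 1: [4, 5], 2: [3, 10], 3: [8, 9], 4: [6],
    5: [6], 6: [7], 7: [], 8: [], 9: [], 10: [2],  # 10 -> 2 is a cycle
}
for p in paths_of_length(adj, 0, 3):
    print(p)
# [0, 1, 4, 6]
# [0, 1, 5, 6]
# [0, 2, 3, 8]
# [0, 2, 3, 9]
```

Here the length counts edges, so a path of length 3 lists four nodes.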
Use a depth-limited search (http://en.wikipedia.org/wiki/Depth-limited_search), keeping a set of visited nodes to detect cycles along the current path. For example, you can build a tree of depth L rooted at your node n, containing all the nodes reachable from it, and then prune the tree.
I did a quick search of graph algorithms to do this, but didn't find anything. There is a list of graph algorithms (http://en.wikipedia.org/wiki/Category:Graph_algorithms) that may have just what you are looking for.
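This solution might be improved in terms of efficiency, but it is very short and makes use of networkx functionality:
import networkx as nx

G = nx.complete_graph(4)
n = 0
L = 3
result = []
for target in G.nodes:  # G.nodes_iter() was removed in networkx 2.0
    if target == n:
        continue  # a simple path cannot end where it started
    # cutoff=L yields paths of up to L edges, so keep only those with exactly L
    paths = [p for p in nx.all_simple_paths(G, n, target, cutoff=L) if len(p) == L + 1]
    print(paths)
    result += paths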
Here is another (rather naive) implementation I came up with after reading the answers here:
def findAllPaths(node, childrenFn, depth, _depth=0, _parents=None):
    if _parents is None:
        _parents = {}  # fresh map per top-level call (avoids a shared mutable default)
    if _depth == depth - 1:
        # path found with desired length, create path and stop traversing
        path = []
        parent = node
        for i in range(depth):
            path.insert(0, parent)
            if parent not in _parents:
                continue
            parent = _parents[parent]
            if parent in path:
                return  # this path is cyclic, forget
        yield path
        return
    for nb in childrenFn(node):
        _parents[nb] = node  # keep track of where we came from
        for p in findAllPaths(nb, childrenFn, depth, _depth + 1, _parents):
            yield p
graph = {
    0: [1, 2],
    1: [4, 5],
    2: [3, 10],
    3: [8, 9],
    4: [6],
    5: [6],
    6: [7],
    7: [],
    8: [],
    9: [],
    10: [2]  # cycle
}
for p in findAllPaths(0, lambda n: graph[n], depth=4):
    print(p)
# [0, 1, 4, 6]
# [0, 1, 5, 6]
# [0, 2, 3, 8]
# [0, 2, 3, 9]