How to isolate these (image included) paths in a graph?

How to isolate these (image included) paths in a graph? - python

I have a graph and want to isolate distinct paths from it. Since I can't phrase this easily in graph jargon, here is a sketch:
On the left side is a highly simplified representation of my source graph. The graph has nodes with only 2 neighbors (shown as blue squares). Then it has intersection and end nodes with more than 2 neighbors or exactly 1 neighbor (shown as red dots).
On the right side, coded in three different colors, are the paths I want to isolate as a result. I want to isolate alls paths connecting the red dots. The resulting paths must not cross (go through) any red dots. Each edge may only be part of one distinct result path. No edges should remain (shortest path length is 1).
I am sure this is a known task in the graph world. I have modeled the graph in NetworkX, which I'm using for the first time, and can't find the proper methods to do this. I'm sure I could code it the hard way, but would be glad to use a simple and fast method if it existed. Thanks!
Edit: After randomly browsing the NetworkX documentation I came across the all_simple_paths method. My idea is now to
iterate all nodes and identify the red dots (number of neighbors != 2)
use all_simple_paths() pairwise for the red dots, collect the resulting paths
deduplicate the resulting paths, throw away everything that contains a red dot except as the start and end node
Step 2, of course, won't scale well. With ~2000 intersection nodes, this seems still possible though.
Edit 2: all_simple_paths appears to be way too slow to use it this way.

I propose to find all straight nodes (i. e. nodes which have exactly two neighbors) and from the list of those build up a list of all your straight paths by picking one straight node by random and following its two leads to their two ends (the first non-straight nodes).
In code:
def eachStraightPath(g):
straightNodes = { node for node in g.node if len(g.edge[node]) == 2 }
print straightNodes
while straightNodes:
straightNode = straightNodes.pop()
straightPath = [ straightNode ]
neighborA, neighborB = g.edge[straightNode].keys()
while True: # break out later if node is not straight
straightPath.insert(0, neighborA)
if neighborA not in straightNodes:
break
newNeighborA = (set(g.edge[neighborA]) ^ { straightPath[1] }).pop()
straightNodes.remove(neighborA)
neighborA = newNeighborA
while True: # break out later if node is not straight
straightPath.append(neighborB)
if neighborB not in straightNodes:
break
newNeighborB = (set(g.edge[neighborB]) ^ { straightPath[-2] }).pop()
straightNodes.remove(neighborB)
neighborB = newNeighborB
yield straightPath
g = nx.lollipop_graph(5, 7)
for straightPath in eachStraightPath(g):
print straightPath
If your graph is very large and you do not want to hold a set of all straight nodes in memory, then you can iterate through them instead, but then the check whether the next neighbor is straight will become less readable (though maybe even faster). The real problem with that approach would be that you'd have to introduce a check to prevent straight paths from being yielded more than once.

Related

Find a path in a directed graph with a given length, allowing for loops and negative lengths

I have a directed graph with a maximum of 7 nodes. Every node is connected to every other node (not including itself of course) with a directed edge, and edges can have either positive or negative weights. My objective is to find a path from one given node to another, such that the path has a specific length. However, there's a catch. Not only can I make use of loops, if I reach the end node, the path doesn't have to immediately end. This means that I can have a simple path leading to the end node, and then have a loop out of the end node leading back into itself that ultimately. At the same time, I have to maximize the number of unique nodes visited by this path, so that if there are multiple paths of the desired length, I get the one with the most nodes in it.
Besides the problem with loops, I'm having trouble rephrasing this in terms of other simpler problems, like maybe Shortest Path, or Traveling Salesman. I'm not sure how to start tackling this problem. I had an idea of finding all simple paths and all loops, and recursively take combinations of each, but this brings up problems of loops within loops. Is there a more efficient approach to this problem?
Btw, I'm writing this in python.
EDIT: Another thing I forgot to mention that pairs of directed edges between nodes need not necessarily have the same weight. So A -> B might have weight -1, but B -> A might have weight 9.
EDIT 2: As requested, here's the input and output: I'm given the graph, the starting and exit nodes, and the desired length, and I return the path of desired length with the most visited nodes.

Sounds like a combinations problem. Since you don't have a fixed end state.
Let's list what we know.
Every node is connected to every other node, though it is directed. This is a complete digraph. Link: https://en.wikipedia.org/wiki/Complete_graph.
You can cut off the algorithm when it exceeds the desired distance.
Be careful of an infinite loop though; possible if the negative weights are able to equal the positive ones.
In this example, I'd use recursion with a maximum depth that is based on the total number of nodes. While I won't do your homework I'll attempt a pseudo-code start.
def recursion(depth, graph, path, previous_node, score, results):
// 1A. Return if max depth exceeded
// 1B. Return if score exceeded
// 1C. Return if score match AND append path to results
// 2. iterate and recurse through graph:
for node in graph:
path.append(node.name)
score += node.weight
recursion(depth, graph, path, node, score, results)
return results
# The results should contain all the possible paths with the given score.
This is where I'd start. Good luck.

Python Depth First Search with Dict

I keep seeing pseudocode for Depth First Search that is completely confusing to me in how it relates to my specific problem. I'm trying to determine whether or not a 'directed graph' is strongly connected.
If I have a dict with 2 strings (the first represents the source, the second represents the destination) and an optional number that represents edge weight:
{'Austin': {'Houston': 300}, 'SanFrancisco': {'Albany': 1000}, 'NewYorkCity': { 'SanDiego': True }}
How can I implement some of the elements of the DFS? I know I can start at a vertex 'Austin' and that 'Houston' is another vertex. But I don't see how any of this works in Python code
Like I have this pseudocode:
function graph_DFS(start):
# Input: start vertex
S = new Stack()
# Mark start as visited
S.push(start)
while S is not empty:
node = S.pop()
# Do something? (e.g. print)
for neighbor in node’s adjacent nodes:
if neighbor not visited:
# Mark neighbor as visited
S.push(neighbor)
I can see that I could pass 'Austin' as my start. But how in the world do I set 'Austin' to visited, and how do I see what nodes are adjacent to 'Austin'?
Plus how can I even use this algorithm to return true or false if the graph is strongly connected?
I am just having a really hard time seeing this transfer from pseudocode to code. Appreciate any help.

I can see that I could pass 'Austin' as my start. But how in the world
do I set 'Austin' to visited, and how do I see what nodes are adjacent
to 'Austin'?
You can see in the code that you pop out 'Austin', so we will not be looking back at it. In your data structure, you allow only one edge from a vertex, so you will never have more than one neighbor.
Plus how can I even use this algorithm to return true or false if the graph is strongly connected?
This is just a utility DFS function, the algorithm to find whether a graph is strongly connected or not, required running DFS twice. Basically, the hint is you want to know whether you get a tree or a forest on running DFS.
In my opinion, you should update your data structure such that the value of every key is a list (vertices to which it has an edge). You can store weights as well in the list in case you need them later.

Remove illegal edges from the permuation set (python)

I am trying to apply brute-force method to find the shortest path between an origin and a destination node (OD pair). I create the network using networkX and call the permutations followed by the application of brute force. If all the nodes in the network are connected with all others, this is ok. But if some or many edges are not there, this method is not gonna work out.
To make it right, i should delete all the permuations which are containing the illegal edges.
For example if two permutation tuples are
[(1,2,3,4,5), (1,2,4,3,5)]
and in my network no edge exist between node 2 and 3, the first tuple in the mentioned list should be deleted.
First question: Is it an efficient way to first create permutations and then go in there and delete those containing illegal edges? if not what should I do?
Second question: If yes, my strategy is that I am first creating a list of tuples containing all illegal edges from networkx "G.has_edge(u,v)" command and then going into the permutations and looking if such an edge exist, delete that permutation and so on. Is it a good strategy? if no, what else do you suggest.
Thank you :)

Exact solution for general TSP is recognized as non-polynomial. Enumerating all permutations, being the most straight-forward approach, is valid despite of its O(n!) complexity. Refer to wikipedia page of TSP for more information.
As for your specific problem, generating valid permutation is possible using depth-first search over the graph.
Python-like pseudo code showing this algorithm is as following:
def dfs(vertex, visited):
if vertex == target_vertex:
visited.append(target_vertex)
if visited != graph.vertices:
return
do_it(visited) # visited is a valid permutation
for i in graph.vertices:
if not i in visited and graph.has_edge(vertex, i):
dfs(i, visited + [i])
dfs(start_vertex, [start_vertex])

Algorithm to find "good" neighbours - graph coloring?

I have a group of people and for each of them a list of friends, and a list of foes. I want to line them up (no circle as on a table) so that preferrable no enemies but only friends are next to each other.
Example with the Input: https://gist.github.com/solars/53a132e34688cc5f396c
I think I need to use graph coloring to solve this, but I'm not sure how - I think I have to leave out the friends (or foes) list to make it easier and map to a graph.
Does anyone know how to solve such problems and can tell me if I'm on the right path?
Code samples or online examples would also be nice, I don't mind the programming language, I usually use Ruby, Java, Python, Javascript
Thanks a lot for your help!

It is already mentioned in the comments, that this problem is equivalent to travelling salesman problem. I would like to elaborate on that:
Every person is equivalent to a vertex and the edges are between vertices which represents persons who can seat to each other. Now, finding a possible seating arrangement is equivalent to finding a Hamiltonian path in the graph.
So this problem is NPC. The most naive solution would be to try all possible permutations resulting in a O(n!) running time. There are a lot of well known approaches which perform better than O(n!) and are freely available on the web. I would like to mention Held-Karp, which runs in O(n^2*2^n)and is pretty straight forward to code, here in python:
#graph[i] contains all possible neighbors of the i-th person
def held_karp(graph):
n = len(graph)#number of persons
#remember the set of already seated persons (as bitmask) and the last person in the line
#thus a configuration consists of the set of seated persons and the last person in the line
#start with every possible person:
possible=set([(2**i, i) for i in xrange(n)])
#remember the predecessor configuration for every possible configuration:
preds=dict([((2**i, i), (0,-1)) for i in xrange(n)])
#there are maximal n persons in the line - every iterations adds a person
for _ in xrange(n-1):
next_possible=set()
#iterate through all possible configurations
for seated, last in possible:
for neighbor in graph[last]:
bit_mask=2**neighbor
if (bit_mask&seated)==0: #this possible neighbor is not yet seated!
next_config=(seated|bit_mask, neighbor)#add neighbor to the bit mask of seated
next_possible.add(next_config)
preds[next_config]=(seated, last)
possible=next_possible
#now reconstruct the line
if not possible:
return []#it is not possible for all to be seated
line=[]
config=possible.pop() #any configuration in possible has n person seated and is good enough!
while config[1]!=-1:
line.insert(0, config[1])
config=preds[config]#go a step back
return line
Disclaimer: this code is not properly tested, but I hope you can get the gist of it.

Minimum removed nodes required to cut path from A to B algorithm in Python

I am trying to solve a problem related to graph theory but can't seem to remember/find/understand the proper/best approach so I figured I'd ask the experts...
I have a list of paths from two nodes (1 and 10 in example code). I'm trying to find the minimum number of nodes to remove to cut all paths. I'm also only able to remove certain nodes.
I currently have it implemented (below) as a brute force search. This works fine on my test set but is going to be an issue when scaling up to a graphs that have paths in the 100K and available nodes in the 100 (factorial issue). Right now, I'm not caring about the order I remove nodes in, but I will at some point want to take that into account (switch sets to list in code below).
I believe there should be a way to solve this using a max flow/min cut algorithm. Everything I'm reading though is going way over my head in some way. It's been several (SEVERAL) years since doing this type of stuff and I can't seem to remember anything.
So my questions are:
1) Is there a better way to solve this problem other than testing all combinations and taking the smallest set?
2) If so, can you either explain it or, preferably, give pseudo code to help explain? I'm guessing there is probably a library that already does this in some way (I have been looking and using networkX lately but am open to others)
3) If not (or even of so), suggestions for how to multithread/process solution? I want to try to get every bit of performance I can from computer. (I have found a few good threads on this question I just haven't had a chance to implement so figured I'd ask at same time just in chance. I first want to get everything working properly before optimizing.)
4) General suggestions on making code more "Pythonic" (probably will help with performance too). I know there are improvements I can make and am still new to Python.
Thanks for the help.
#!/usr/bin/env python
def bruteForcePaths(paths, availableNodes, setsTested, testCombination, results, loopId):
#for each node available, we are going to
# check if we have already tested set with node
# if true- move to next node
# if false- remove the paths effected,
# if there are paths left,
# record combo, continue removing with current combo,
# if there are no paths left,
# record success, record combo, continue to next node
#local copy
currentPaths = list(paths)
currentAvailableNodes = list(availableNodes)
currentSetsTested = set(setsTested)
currentTestCombination= set(testCombination)
currentLoopId = loopId+1
print "loop ID: %d" %(currentLoopId)
print "currentAvailableNodes:"
for set1 in currentAvailableNodes:
print " %s" %(set1)
for node in currentAvailableNodes:
#add to the current test set
print "%d-current node: %s current combo: %s" % (currentLoopId, node, currentTestCombination)
currentTestCombination.add(node)
# print "Testing: %s" % currentTestCombination
# print "Sets tested:"
# for set1 in currentSetsTested:
# print " %s" % set1
if currentTestCombination in currentSetsTested:
#we already tested this combination of nodes so go to next node
print "Already test: %s" % currentTestCombination
currentTestCombination.remove(node)
continue
#get all the paths that don't have node in it
currentRemainingPaths = [path for path in currentPaths if not (node in path)]
#if there are no paths left
if len(currentRemainingPaths) == 0:
#save this combination
print "successful combination: %s" % currentTestCombination
results.append(frozenset(currentTestCombination))
#add to remember we tested combo
currentSetsTested.add(frozenset(currentTestCombination))
#now remove the node that was add, and go to the next one
currentTestCombination.remove(node)
else:
#this combo didn't work, save it so we don't test it again
currentSetsTested.add(frozenset(currentTestCombination))
newAvailableNodes = list(currentAvailableNodes)
newAvailableNodes.remove(node)
bruteForcePaths(currentRemainingPaths,
newAvailableNodes,
currentSetsTested,
currentTestCombination,
results,
currentLoopId)
currentTestCombination.remove(node)
print "-------------------"
#need to pass "up" the tested sets from this loop
setsTested.update(currentSetsTested)
return None
if __name__ == '__main__':
testPaths = [
[1,2,14,15,16,18,9,10],
[1,2,24,25,26,28,9,10],
[1,2,34,35,36,38,9,10],
[1,3,44,45,46,48,9,10],
[1,3,54,55,56,58,9,10],
[1,3,64,65,66,68,9,10],
[1,2,14,15,16,7,10],
[1,2,24,7,10],
[1,3,34,35,7,10],
[1,3,44,35,6,10],
]
setsTested = set()
availableNodes = [2, 3, 6, 7, 9]
results = list()
currentTestCombination = set()
bruteForcePaths(testPaths, availableNodes, setsTested, currentTestCombination, results, 0)
print "results:"
for result in sorted(results, key=len):
print result
UPDATE:
I reworked the code using itertool for generating the combinations. It make the code cleaner and faster (and should be easier to multiprocess. Now to try to figure out the dominate nodes as suggested and multiprocess function.
def bruteForcePaths3(paths, availableNodes, results):
#start by taking each combination 2 at a time, then 3, etc
for i in range(1,len(availableNodes)+1):
print "combo number: %d" % i
currentCombos = combinations(availableNodes, i)
for combo in currentCombos:
#get a fresh copy of paths for this combiniation
currentPaths = list(paths)
currentRemainingPaths = []
# print combo
for node in combo:
#determine better way to remove nodes, for now- if it's in, we remove
currentRemainingPaths = [path for path in currentPaths if not (node in path)]
currentPaths = currentRemainingPaths
#if there are no paths left
if len(currentRemainingPaths) == 0:
#save this combination
print combo
results.append(frozenset(combo))
return None

Here is an answer which ignores the list of paths. It just takes a network, a source node, and a target node, and finds the minimum set of nodes within the network, not either source or target, so that removing these nodes disconnects the source from the target.
If I wanted to find the minimum set of edges, I could find out how just by searching for Max-Flow min-cut. Note that the Wikipedia article at http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem#Generalized_max-flow_min-cut_theorem states that there is a generalized max-flow min-cut theorem which considers vertex capacity as well as edge capacity, which is at least encouraging. Note also that edge capacities are given as Cuv, where Cuv is the maximum capacity from u to v. In the diagram they seem to be drawn as u/v. So the edge capacity in the forward direction can be different from the edge capacity in the backward direction.
To disguise a minimum vertex cut problem as a minimum edge cut problem I propose to make use of this asymmetry. First of all give all the existing edges a huge capacity - for example 100 times the number of nodes in the graph. Now replace every vertex X with two vertices Xi and Xo, which I will call the incoming and outgoing vertices. For every edge between X and Y create an edge between Xo and Yi with the existing capacity going forwards but 0 capacity going backwards - these are one-way edges. Now create an edge between Xi and Xo for each X with capacity 1 going forwards and capacity 0 going backwards.
Now run max-flow min-cut on the resulting graph. Because all the original links have huge capacity, the min cut must all be made up of the capacity 1 links (actually the min cut is defined as a division of the set of nodes into two: what you really want is the set of pairs of nodes Xi, Xo with Xi in one half and Xo in the other half, but you can easily get one from the other). If you break these links you disconnect the graph into two parts, as with standard max-flow min-cut, so deleting these nodes will disconnect the source from the target. Because you have the minimum cut, this is the smallest such set of nodes.
If you can find code for max-flow min-cut, such as those pointed to by http://www.cs.sunysb.edu/~algorith/files/network-flow.shtml I would expect that it will give you the min-cut. If not, for instance if you do it by solving a linear programming problem because you happen to have a linear programming solver handy, notice for example from http://www.cse.yorku.ca/~aaw/Wang/MaxFlowMinCutAlg.html that one half of the min cut is the set of nodes reachable from the source when the graph has been modifies to subtract out the edge capacities actually used by the solution - so given just the edge capacities used at max flow you can find it pretty easily.

If the paths were not provided as part of the problem I would agree that there should be some way to do this via http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem, given a sufficiently ingenious network construction. However, because you haven't given any indication as to what is a reasonable path and what is not I am left to worry that a sufficiently malicious opponent might be able to find strange collections of paths which don't arise from any possible network.
In the worst case, this might make your problem as difficult as http://en.wikipedia.org/wiki/Set_cover_problem, in the sense that somebody, given a problem in Set Cover, might be able to find a set of paths and nodes that produces a path-cut problem whose solution can be turned into a solution of the original Set Cover problem.
If so - and I haven't even attempted to prove it - your problem is NP-Complete, but since you have only 100 nodes it is possible that some of the many papers you can find on Set Cover will point at an approach that will work in practice, or can provide a good enough approximation for you. Apart from the Wikipedia article, http://www.cs.sunysb.edu/~algorith/files/set-cover.shtml points you at two implementations, and a quick search finds the following summary at the start of a paper in http://www.ise.ufl.edu/glan/files/2011/12/EJORpaper.pdf:
The SCP is an NP-hard problem in the strong sense (Garey and Johnson, 1979) and many algorithms
have been developed for solving the SCP. The exact algorithms (Fisher and Kedia, 1990; Beasley and
JØrnsten, 1992; Balas and Carrera, 1996) are mostly based on branch-and-bound and branch-and-cut.
Caprara et al. (2000) compared different exact algorithms for the SCP. They show that the best exact
algorithm for the SCP is CPLEX. Since exact methods require substantial computational effort to solve
large-scale SCP instances, heuristic algorithms are often used to find a good or near-optimal solution in a
reasonable time. Greedy algorithms may be the most natural heuristic approach for quickly solving large
combinatorial problems. As for the SCP, the simplest such approach is the greedy algorithm of Chvatal
(1979). Although simple, fast and easy to code, greedy algorithms could rarely generate solutions of good
quality....

Edit: If you want to destroy in fact all paths, and not those from a given list, then max-flow techniques as explained by mcdowella is much better than this approach.
As mentioned by mcdowella, the problem is NP-hard in general. However, the way your example looks, an exact approach might be feasible.
First, you can delete all vertices from the paths that are not available for deletion. Then, reduce the instance by eliminating dominated vertices. For example, every path that contains 15 also contains 2, so it never makes sense to delete 15. In the example if all vertices were available, 2, 3, 9, and 35 dominate all other vertices, so you'd have the problem down to 4 vertices.
Then take a vertex from the shortest path and branch recursively into two cases: delete it (remove all paths containing it) or don't delete it (delete it from all paths). (If the path has length one, omit the second case.) You can then check for dominance again.
This is exponential in the worst case, but might be sufficient for your examples.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.