How to traverse an ordered "network"? - python

I have an interesting python "network" challenge. Not sure what data construct to use and am hoping to learn something new from the experts!
The following is a snippet from an ordered table (see SORT column) with a FROM NODE and TO NODE. This table represents a hydrologic network (watershed sequencing) from upstream to downstream. TO NODE = 0 indicates the bottom of the network.
Given there can be many branches, how does one traverse a network from any given FROM NODE to TO NODE = 0 and store the list of nodes that are sequenced over?
For example, starting at:
HYBAS_ID (this is same as FROM NODE)= 2120523430
see <---- about 2/3rd way down, the sequence of Nodes would include the following:
[2120523430, 2120523020, 2120523290, 2120523280,2120523270,2120020790, 0]
Table:
SORT FROM NODE TO NODE
30534 2121173490 2120522070
30533 2120521930 2120521630
30532 2120522070 2120521630
30531 2120521620 2120522930
30530 2120521630 2120522930
30529 2121172200 2121173080
30528 2120522930 2120523360
30527 2120522940 2120523360
30526 2120519380 2120523170
30525 2120519520 2120523170
30524 2120523440 2120523350
30523 2120523360 2120523350
30522 2120525750 2120523430
30521 2120525490 2120523430
30520 2121173080 2120522820
30519 2120523430 2120523020 <------- if you start network here
30518 2120523350 2120523020
30517 2120522820 2120523290
30516 2120523020 2120523290 <------ the second node is here
30515 2120523170 2120523280
30514 2120523290 2120523280 <------ third node here
30513 2120523160 2120523270
30512 2120523280 2120523270 <------ fourth
30511 2120523150 2120020790
30510 2120523270 2120020790 <------ fifth
30509 2120020790 0 <------ End.
Would a dictionary and some kind of graph structure work to traverse this network? The code may be fed a list of thousands or millions of FROM NODES to calculate the routing so efficiency is important.

The standard way to do this is with an unordered dictionary representing a directed graph, and then to iterate through it. First you have to populate the dictionary (which let's say is a csv for the sake of argument), and then iterate through the values. so
my_graph = {}
with open("myfile.csv") as f:
for l in f:
split_lines = l.split(",")
my_graph[int(split_lines[1])] = int(split_lines[2])
HYBAS_ID = 2120523430
current_hybas = my_graph[HYBAS_ID]
return_value = [HYBAS_ID]
while current_hybas != 0:
return_value.append(current_hybas)
current_hybas = my_graph[current_hybas]
print return_value
Should approximately get you what you want.

Related

Untrackable object attribute

I am trying to adapt this code here: https://github.com/nachonavarro/gabes/blob/master/gabes/circuit.py (line 136)
but am coming across an issue because several times the attribute .chosen_label is used but I can find no mention of it anywhere in the code. The objects left_gate, right_gate and gate are Gate objects (https://github.com/nachonavarro/gabes/blob/master/gabes/gate.py)
def reconstruct(self, labels):
levels = [[node for node in children]
for children in anytree.LevelOrderGroupIter(self.tree)][::-1]
for level in levels:
for node in level:
gate = node.name
if node.is_leaf:
garblers_label = labels.pop(0)
evaluators_label = labels.pop(0)
else:
left_gate = node.children[0].name
right_gate = node.children[1].name
garblers_label = left_gate.chosen_label
evaluators_label = right_gate.chosen_label
output_label = gate.ungarble(garblers_label, evaluators_label)
gate.chosen_label = output_label
return self.tree.name.chosen_label
The code runs without error and the .chosen_label is a Label object (https://github.com/nachonavarro/gabes/blob/master/gabes/label.py)
Any help would be much appreciated
The attribute is set in the same method:
for level in levels:
for node in level:
gate = node.name
if node.is_leaf:
# set `garblers_label` and `evaluators_label` from
# the next two elements of the `labels` argument
else:
# use the child nodes of this node to use their gates, and
# set `garblers_label` and `evaluators_label` to the left and
# right `chosen_label` values, respectively.
# generate the `Label()` instance based on `garblers_label` and `evaluators_label`
output_label = gate.ungarble(garblers_label, evaluators_label)
gate.chosen_label = output_label
I'm not familiar with the anytree library, so I had to look up the documentation: the anytree.LevelOrderGroupIter(...) function orders the nodes in a tree from root to leaves, grouped by level. The tree here appears to be a balanced binary tree (each node has either 0 or 2 child nodes), so you get a list with [(rootnode,), (level1_left, level1_right), (level2_left_left, level2_left_right, level2_right_left, level2_right_right), ...]. The function loops over these levels in reverse order. This means that leaves are processed first.
Once all node.is_leaf nodes have their chosen_label set, the other non-leaf nodes can reference the chosen_label value on the leaf nodes on the level already processed before them.
So, assuming that labels is a list with at least twice the number of leaf nodes in the tree, you end up with those label values aggregated at every level via the gate.ungarble() function, and the final value is found at the root node via self.tree.name.chosen_label.

Improving BFS performance with some kind of memoization

I have this issue that I'm trying to build an algorithm which will find distances from one vertice to others in graph.
Let's say with the really simple example that my network looks like this:
network = [[0,1,2],[2,3,4],[4,5,6],[6,7]]
I created a BFS code which is supposed to find length of paths from the specified source to other graph's vertices
from itertools import chain
import numpy as np
n = 8
graph = {}
for i in range(0, n):
graph[i] = []
for communes in communities2:
for vertice in communes:
work = communes.copy()
work.remove(vertice)
graph[vertice].append(work)
for k, v in graph.items():
graph[k] = list(chain(*v))
def bsf3(graph, s):
matrix = np.zeros([n,n])
dist = {}
visited = []
queue = [s]
dist[s] = 0
visited.append(s)
matrix[s][s] = 0
while queue:
v = queue.pop(0)
for neighbour in graph[v]:
if neighbour in visited:
pass
else:
matrix[s][neighbour] = matrix[s][v] + 1
queue.append(neighbour)
visited.append(neighbour)
return matrix
bsf3(graph,2)
First I'm creating graph (dictionary) and than use the function to find distances.
What I'm concerned about is that this approach doesn't work with larger networks (let's say with 1000 people in there). And what I'm thinking about is to use some kind of memoization (actually that's why I made a matrix instead of list). The idea is that when the algorithm calculates the path from let's say 0 to 3 (what it does already) it should keep track for another routes in such a way that matrix[1][3] = 1 etc.
So I would use the function like bsf3(graph, 1) it would not calculate everything from scratch, but would be able to access some values from matrix.
Thanks in advance!
Knowing this not fully answer your question, but this is another approach you cabn try.
In networks you will have a routing table for each node inside your network. You simple save a list of all nodes inside the network and in which node you have to go. Example of routing table of node D
A -> B
B -> B
C -> E
D -> D
E -> E
You need to run BFS on each node to build all routing table and it will take O(|V|*(|V|+|E|). The space complexity is quadratic but you have to check all possible paths.
When you create all this information you can simple start from a node and search for your destination node inside the table and find the next node to go. This will give a more better time complexity (if you use the right data structure for the table).

How to cleanly avoid loops in recursive function (breadth-first traversal)

I'm writing a recursive breadth-first traversal of a network. The problem I ran into is that the network often looks like this:
1
/ \
2 3
\ /
4
|
5
So my traversal starts at 1, then traverses to 2, then 3. The next stop is to proceed to 4, so 2 traverses to 4. After this, 3 traverses to 4, and suddenly I'm duplicating work as both lines try to traverse to 5.
The solution I've found is to create a list called self.already_traversed, and every time a node is traversed, I append it to the list. Then, when I'm traversing from node 4, I check to make sure it hasn't already been traversed.
The problem here is that I'm using an instance variable for this, so I need a way to set up the list before the first recursion and a way to clean it up afterwards. The way I'm currently doing this is:
self.already_traversed = []
self._traverse_all_nodes(start_id)
self.already_traversed = []
Of course, it sucks to be twoggling variables outside of the function that's using them. Is there a better way to do this so this can be put into my traversal function?
Here's the actual code, though I recognize it's a bit dense:
def _traverse_all_nodes(self, root_authority, max_depth=6):
"""Recursively build a networkx graph
Process is:
- Work backwards through the authorities for self.cluster_end and all
of its children.
- For each authority, add it to a networkx graph, if:
- it happened after self.cluster_start
- it's in the Supreme Court
- we haven't exceeded a max_depth of six cases.
- we haven't already followed this path
"""
g = networkx.Graph()
if hasattr(self, 'already_traversed'):
is_already_traversed = (root_authority.pk in self.visited_nodes)
else:
# First run. Create an empty list.
self.already_traversed = []
is_already_traversed = False
is_past_max_depth = (max_depth <= 0)
is_cluster_start_obj = (root_authority == self.cluster_start)
blocking_conditions = [
is_past_max_depth,
is_cluster_start_obj,
is_already_traversed,
]
if not any(blocking_conditions):
print " No blocking conditions. Pressing on."
self.visited_nodes.append(root_authority.pk)
for authority in root_authority.authorities.filter(
docket__court='scotus',
date_filed__gte=self.cluster_start.date_filed):
g.add_edge(root_authority.pk, authority.pk)
# Combine our present graph with the result of the next
# recursion
g = networkx.compose(g, self._build_graph(
authority,
max_depth - 1,
))
return g
def add_clusters(self):
"""Do the network analysis to add clusters to the model.
Process is to:
- Build a networkx graph
- For all nodes in the graph, add them to self.clusters
"""
self.already_traversed = []
g = self._traverse_all_nodes(
self.cluster_end,
max_depth=6,
)
self.already_traversed = []
Check out:
How do I pass a variable by reference?
which contains an example on how to past a list by reference. If you pass the list by reference, every call to your function will refer to the same list.

Algorithm Is Node A Connected to Node B in Graph

I am looking for an algorithm to check for any valid connection (shortest or longest) between two arbitrary nodes on a graph.
My graph is fixed to a grid with logical (x, y) coordinates with north/south/east/west connections, but nodes can be removed randomly so you can't assume that taking the edge with coords closest to the target is always going to get you there.
The code is in python. The data structure is each node (object) has a list of connected nodes. The list elements are object refs, so we can then search that node's list of connected nodes recursively, like this:
for pnode in self.connected_nodes:
for cnode in pnode.connected_nodes:
...etc
I've included a diagram showing how the nodes map to x,y coords and how they are connected in north/east/south/west. Sometimes there are missing nodes (i.e between J and K), and sometimes there are missing edges (i.e between G and H). The presence of nodes and edges is in flux (although when we run the algorithm, it is taking a fixed snapshot in time), and can only be determined by checking each node for it's list of connected nodes.
The algorithm needs to yield a simple true/false to whether there is a valid connection between two nodes. Recursing through every list of connected nodes explodes the number of operations required - if the node is n edges away, it requires at most 4^n operations. My understanding is something like Dijistrka's algorithm works by finding the shortest path based on edge weights, but if there is no connection at all then would it still work?
For some background, I am using this to model 2D destructible objects. Each node represents a chunk of the material, and if one or more nodes do not have a connection to the rest of the material then it should separate off. In the diagram - D, H, R - should pare off from the main body as they are not connected.
UPDATE:
Although many of the posted answers might well work, DFS is quick, easy and very appropriate. I'm not keen on the idea of sticking extra edges between nodes with high value weights to use Dijkstra because node's themselves might disappear as well as edges. The SSC method seems more appropriate for distinguishing between strong and weakly connected graph sections, which in my graph would work if there was a single edge between G and H.
Here is my experiment code for DFS search, which creates the same graph as shown in the diagram.
class node(object):
def __init__(self, id):
self.connected_nodes = []
self.id = id
def dfs_is_connected(self, node):
# Initialise our stack and our discovered list
stack = []
discovered = []
# Declare operations count to track how many iterations it took
op_count = 0
# Push this node to the stack, for our starting point
stack.append(self)
# Keeping iterating while the stack isn't empty
while stack:
# Pop top element off the stack
current_node = stack.pop()
# Is this the droid/node you are looking for?
if current_node.id == node.id:
# Stop!
return True, op_count
# Check if current node has not been discovered
if current_node not in discovered:
# Increment op count
op_count += 1
# Is this the droid/node you are looking for?
if current_node.id == node.id:
# Stop!
return True, op_count
# Put this node in the discovered list
discovered.append(current_node)
# Iterate through all connected nodes of the current node
for connected_node in current_node.connected_nodes:
# Push this connected node into the stack
stack.append(connected_node)
# Couldn't find the node, return false. Sorry bud
return False, op_count
if __name__ == "__main__":
# Initialise all nodes
a = node('a')
b = node('b')
c = node('c')
d = node('d')
e = node('e')
f = node('f')
g = node('g')
h = node('h')
j = node('j')
k = node('k')
l = node('l')
m = node('m')
n = node('n')
p = node('p')
q = node('q')
r = node('r')
s = node('s')
# Connect up nodes
a.connected_nodes.extend([b, e])
b.connected_nodes.extend([a, f, c])
c.connected_nodes.extend([b, g])
d.connected_nodes.extend([r])
e.connected_nodes.extend([a, f, j])
f.connected_nodes.extend([e, b, g])
g.connected_nodes.extend([c, f, k])
h.connected_nodes.extend([r])
j.connected_nodes.extend([e, l])
k.connected_nodes.extend([g, n])
l.connected_nodes.extend([j, m, s])
m.connected_nodes.extend([l, p, n])
n.connected_nodes.extend([k, m, q])
p.connected_nodes.extend([s, m, q])
q.connected_nodes.extend([p, n])
r.connected_nodes.extend([h, d])
s.connected_nodes.extend([l, p])
# Check if a is connected to q
print a.dfs_is_connected(q)
print a.dfs_is_connected(d)
print p.dfs_is_connected(h)
To find this out, you just need to run simple DFS or BFS algorithm on one of the nodes, it'll find all reachable nodes within a continuous component of the graph, so you just mark it down if you've found the other node during the run of algorithm.
There is a way to use Dijkstra to find the path. If there is an edge between two nodes put 1 for weight, if there is no node, put weight of sys.maxint. Then when the min path is calculated, if it is larger than the number of nodes - there is no path between them.
Another approach is to first find the strongly connected components of the graph. If the nodes are on the same strong component then use Dijkstra to find the path, otherwise there is no path that connects them.
You could take a look at the A* Path Finding Algorithm (which uses heuristics to make it more efficient than Dijkstra's, so if there isn't anything you can exploit in your problem, you might be better off using Dijkstra's algorithm. You would need positive weights though. If this is not something you have in your graph, you could simply give each edge a weight of 1).
Looking at the pseudo code on Wikipedia, A* moves from one node to another by getting the neighbours of the current node. Dijkstra's Algorithm keeps an adjacency list so that it knows which nodes are connected to each other.
Thus, if you where to start from node H, you could only go to R and D. Since these nodes are not connected to the others, the algorithm will not go through the other nodes.
You can find strongly connected components(SCC) of your graph and then check if nodes of interest in one component or not. In your example H-R-D will be first component and rest second, so for H and R result will be true but H and A false.
See SCC algorithm here: https://class.coursera.org/algo-004/lecture/53.

By having a list with mazes houses(2 dimensions), how do i create a directed graph with a dictionary

I can only make an undirected graph. no idea on how i can make a directed one.
Any idea?
Apologies for the rather long winded post. I had time to kill on the train.
I'm guessing what you're after is a directed graph representing all paths leading away from your starting position (as opposed to a graph representation of the maze which once can use to solve arbitrary start/end positions).
(No offence meant, but) this sounds like homework, or at least, a task that is very suitable for homework. With this in mind, the following solution focuses on simplicity rather than performance or elegance.
Approach
One straight-forward way to do this would be to first store your map in a more navigable format, then, beginning with the start node do the following:
look up neighbours (top, bottom, left, right)
for each neighbour:
if it is not a possible path, ignore
if we have processed this node before, ignore
else, add this node as an edge and push it a queue (not a stack, more on this later) for further processing
for each node in the queue/stack, repeat from step 1.
(See example implementation below)
At this point, you'll end up with a directed acyclic graph (DAG) with the starting node at the top of the tree and end node as one of the leaves. Solving this would be easy at this point. See this answer on solving a maze representing as a graph.
A possible optimisation when building the graph would be to stop once the end point is found. You'll end up with an incomplete graph, but if you're only concerned about the final solution this doesn't matter.
stack or queue?
Note that using a stack (first in last out) would mean building the graph in a depth-first manner, while using a queue (first in first out) would result in a breadth-first approach.
You would generally want to use a queue (breadth first if the intention is to look for the shortest path. Consider the following map:
START
######## ######
######## ######
### b a ######
### ## ######
### ## e ######
### c d ######
######## ######
######## END
#################
If the path is traversed depth-first and at branch a you happen take the a->b path before a->e, you end up with the graph:
START
|
a
/ \
b e <-- end, since d already visited
|
c
|
d
\
END
However, using a breadth-first approach the a->e path would come across node d earlier, resulting in a shorter graph and a better solution:
START
|
a
/ \
b e
| |
c d
|
END
Example code
Sample input provided:
..........
#########.
..........
.#########
......#...
#####...#.
##...####.
##.#......
...#######
e = (0,0)
s = (8,0)
DISCLAIMER: The following code is written for clarity, not speed. It is not fully tested so there is no guarantee of correctness but it should give you an idea of what is possible.
We assumes that the input file is formatted consistently. Most error checking left out for brevity.
# regex to extract start/end positions
import re
re_sepos = re.compile("""
^([se])\s* # capture first char (s or e) followed by arbitrary spaces
=\s* # equal sign followed by arbitrary spaces
\( # left parenthesis
(\d+),(\d+) # capture 2 sets of integers separated by comma
\) # right parenthesis
""", re.VERBOSE)
def read_maze(filename):
"""
Reads input from file and returns tuple (maze, start, end)
maze : dict holding value of each maze cell { (x1,y1):'#', ... }
start: start node coordinage (x1,y1)
end : end node coordinate (x2,y2)
"""
# read whole file into a list
f = open(filename, "r")
data = f.readlines()
f.close()
# parse start and end positions from last 2 lines
pos = {}
for line in data[-2:]:
match = re_sepos.match(line)
if not match:
raise ValueError("invalid input file")
c,x,y = match.groups() # extract values
pos[c] = (int(x),int(y))
try:
start = pos["s"]
end = pos["e"]
except KeyError:
raise ValueError("invalid input file")
# read ascii maze, '#' for wall '.' for empty slor
# store as maze = { (x1,y1):'#', (x2,y2):'.', ... }
# NOTE: this is of course not optimal, but leads to a simpler access later
maze = {}
for line_num, line in enumerate(data[:-3]): # ignore last 3 lines
for col_num, value in enumerate(line[:-1]): # ignore \n at end
maze[(line_num, col_num)] = value
return maze, start, end
def maze_to_dag(maze, start, end):
"""
Traverses the map starting from start coordinate.
Returns directed acyclic graph in the form {(x,y):[(x1,y1),(x2,y2)], ...}
"""
dag = {} # directed acyclic graph
queue = [start] # queue of nodes to process
# repeat till queue is empty
while queue:
x,y = queue.pop(0) # get next node in queue
edges = dag[(x,y)] = [] # init list to store edges
# for each neighbour (top, bottom, left, right)
for coord in ((x,y-1), (x,y+1), (x-1,y), (x+1,y)):
if coord in dag.keys(): continue # visited before, ignore
node_value = maze.get(coord, None) # Return None if outside maze
if node_value == ".": # valid path found
edges.append(coord) # add as edge
queue.append(coord) # push into queue
# uncomment this to stop once we've found the end point
#if coord == end: return dag
return dag
if __name__ == "__main__":
maze,start,end = read_maze("l4.txt")
dag = maze_to_dag(maze, start, end)
print dag
This page provides a nice tutorial on implementing graphs with python. From the article, this is an example of a directory graph represented by dictionary:
graph = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']}
That said, you might also want to look into existing graph libraries such as NetworkX and igraph.
Since you already have a list, try creating an Adjacency Matrix instead of a dictionary.
list_of_houses = []
directed_graph = [][]
for i in xrange(len(list_of_houses)):
for i in xrange(len(list_of_houses)):
directed_graph[i][i] = 0
Then for any new edge from one house to another (or w/e the connection is)
directed_graph[from_house][to_house] = 1
and you're done. If there is an edge from house_a to house_b then directed_graph[house_a][house_b] == 1.

Categories

Resources