Goal: Trying to convert some of the lines of an algorithm written in python to pseudocode.
Goal of the given algorithm: Find all cycles in a directed graph with cycles.
Where I stand: I well understand the theory behind the algorithm, I have also coded different versions on my own, however I cannot write an algorithm that small, efficient and correct on my own.
Source: stackoverflow
What I have done so far: I cannot describe enough how many weeks spent on it, have coded Tarjan, various versions DFS, Flloyd etc in php but unfortunately they are partial solutions only and one have to extend them more.
In addition: I have run this algorithm online and it worked, I need it for a school project that I am stack and cannot proceed further.
This is the algorithm:
def paths_rec(path,edges):
if len(path) > 0 and path[0][0] == path[-1][1]:
print "cycle", path
return #cut processing when find a cycle
if len(edges) == 0:
return
if len(path) == 0:
#path is empty so all edges are candidates for next step
next_edges = edges
else:
#only edges starting where the last one finishes are candidates
next_edges = filter(lambda x: path[-1][1] == x[0], edges)
for edge in next_edges:
edges_recursive = list(edges)
edges_recursive.remove(edge)
#recursive call to keep on permuting possible path combinations
paths_rec(list(path) + [edge], edges_recursive)
def all_paths(edges):
paths_rec(list(),edges)
if __name__ == "__main__":
#edges are represented as (node,node)
# so (1,2) represents 1->2 the edge from node 1 to node 2.
edges = [(1,2),(2,3),(3,4),(4,2),(2,1)]
all_paths(edges)
This is what I have managed to write in pseudocode from it, I have marked with #? the lines I do not understand. Once I have them in pseudocode I can code them in php with which I am a lot familiar.
procedure paths_rec (path, edges)
if size(path) > 0 and path[0][0] equals path[-1][1]
print "cycle"
for each element in path
print element
end of for
return
end of if
if size(edges) equals 0
return
end of if
if size(path) equals 0
next_edges equals edges
else
next edges equals filter(lambda x: path[-1][1] == x[0], edges) #?
end of else
for each edge in next_edges
edges_recursive = list(edges) #?
edges_recursive.remove(edge)#?
#recursive call to keep on permuting possible path combinations
paths_rec(list(path) + [edge], edges_recursive)#?
The line next_edges = filter(lambda x: path[-1][1] == x[0], edges) creates a new list containing those edges in edges whose first point is the same as the second point of the last element of the current path, and is equivalent to
next_edges = []
for x in edges:
if path[len(path) - 1][1] == x[0]
next_edges.append[x]
The lines create a new copy of the list of edges edges, so that when edge is removed from this copy, it doesn't change the list edges
edges_recursive = list(edges)
edges_recursive.remove(edge)
paths_rec(list(path) + [edge], edges_recursive) is just the recursive call with the path with edge added to the end and the new list of edges that has edge removed from it.
Related
I have implemented Dijkstra's algorithm but I have a problem. It always prints the same minimum path while there may be other paths with the same weight.
How could I change my algorithm so that it randomly selects the neighbors with the same weight?
My algorithm is below:
def dijkstra_algorithm(graph, start_node):
unvisited_nodes = list(graph.get_nodes())
# We'll use this dict to save the cost of visiting each node and update it as we move along the graph
shortest_path = {}
# We'll use this dict to save the shortest known path to a node found so far
previous_nodes = {}
# We'll use max_value to initialize the "infinity" value of the unvisited nodes
max_value = sys.maxsize
for node in unvisited_nodes:
shortest_path[node] = max_value
# However, we initialize the starting node's value with 0
shortest_path[start_node] = 0
# The algorithm executes until we visit all nodes
while unvisited_nodes:
# The code block below finds the node with the lowest score
current_min_node = None
for node in unvisited_nodes: # Iterate over the nodes
if current_min_node == None:
current_min_node = node
elif shortest_path[node] < shortest_path[current_min_node]:
current_min_node = node
# The code block below retrieves the current node's neighbors and updates their distances
neighbors = graph.get_outgoing_edges(current_min_node)
for neighbor in neighbors:
tentative_value = shortest_path[current_min_node] + graph.value(current_min_node, neighbor)
if tentative_value < shortest_path[neighbor]:
shortest_path[neighbor] = tentative_value
# We also update the best path to the current node
previous_nodes[neighbor] = current_min_node
# After visiting its neighbors, we mark the node as "visited"
unvisited_nodes.remove(current_min_node)
return previous_nodes, shortest_path
# The code block below finds all the min nodes
# and randomly chooses one for traversal
min_nodes = []
for node in unvisited_nodes: # Iterate over the nodes
if len(min_nodes) == 0:
min_nodes.append(node)
elif shortest_path[node] < shortest_path[min_nodes[0]]:
min_nodes = [node]
else:
# this is the case where 2 nodes have the same cost
# we are going to take all of them
# and at the end choose one randomly
min_nodes.append(node)
current_min_node = random.choice(min_nodes)
What the code does is as follows:
Instead of taking the first smallest element, it creates a list of all the smallest elements.
At the end it choose one of the smallest elements randomly.
This will both guarantee the Dijkstra invariant and choose a random path among the cheapest.
probably just try something like this
random.shuffle(neighbors)
for neighbor in neighbors:
...
which should visit the neighbors randomly (this assumes neighbors is a list or tuple... if its a generator call list on it first...
In the context of graph theory, I'm trying to get all the possible simple paths between two nodes.
I record the network using an adjacency matrix stored in a pandas dataframe, in a way that network[x][y] store the value of the arrow which goes from x to y.
To get the paths between two nodes, what I do is:
I get all the possible permutations with all the nodes (using it.permutations -as the path is simple there is no repetitions).
Then I use an ad hoc function: adjacent (which gives me the neighbours of a node), to check which among all the possible paths are true.
This takes too long, and it's not efficient. Do you know how I can improve the code? May be with a recursive function??
For a non relevant reason I don't want to use Networkx
def get_simple_paths(self, node, node_objective):
# Gives you all simple path between two nodes
#We get all possibilities and then we will filter it
nodes = self.nodes #The list of all nodes
possible_paths = [] #Store all possible paths
simple_paths = [] #Store the truly paths
l = 2
while l <= len(nodes):
for x in it.permutations(nodes, l): #They are neighbourgs
if x[0] == node and x[-1] == node_objective:
possible_paths.append(x)
l += 1
# Now check which of those paths exists
for x_pos, x in enumerate(possible_paths):
for i_pos, i in enumerate(x):
#We use it to check among all the path,
#if two of the nodes are not neighbours, the loop brokes
if i in self.adjacencies(x[i_pos+1]):
if i_pos+2 == len(x):
simple_paths.append(x)
break
else:
continue
else:
break
#Return simple paths
return(simple_paths)
I'm trying to calculate the area of skyline (overlapping rectangles with same baseline)
building_count = int(input())
items = {} # dictionary, location on x axis is the key, height is the value
count = 0 # total area
for j in range(building_count):
line = input().split(' ')
H = int(line[0]) # height
L = int(line[1]) # left point (start of the building)
R = int(line[2]) # right point (end of the building)
for k in range(R - L):
if not (L+k in items): # if it's not there, add it
items[L+k] = H
elif H > items[L+k]: # if we have a higher building on that index
items[L+k] = H
for value in items.values(): # we add each column basically
count += value
print(count)
sample input would be:
5
3 -3 0
2 -1 1
4 2 4
2 3 7
3 6 8
and output is 29.
The issue is memory efficiency, when there are lots of values, the script simply throws MemoryError. Anyone have some ideas for optimizing memory usage?
You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where R = 1 and L = 100000. Your items dictionary will be filled with 1000000 items. Your basic idea of processing/removing overlaps is is sound, but the way you do it is massive overkill.
Like so much else in life, this is a graph problem in disguise. Imaging the vertices being the rectangles you are trying to process and the (weighted) edges being the overlaps. The complication is that you can not just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices will be removed as well, simplifying the graph, while in another case a new vertex is added:
Green: overlap to be chopped out.
Normally, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m2) operation because we would have to check all vertices for overlaps against each other. However, we can bypass a construction of the input graph entirely to get a O(m + n) traversal algorithm, which is going to be optimal since we will only analyze each rectangle once, and construct the output graph with no overlaps as efficiently as possible. O(m + n) assumes that your input rectangles are sorted according to their left edges in ascending order. If that is not the case, the algorithm will be O(mlog(m) + n) to account for the initial sorting step. Note that as the graph density increases, n will go from ~m to ~m2. This confirms the intuitive idea that the fewer overlaps there are, them more you would expect the process will run in O(m) time, while the more overlaps there are, the closer you will run to O(m2) time.
The space complexity of the proposed algorithm will be O(m): each rectangle in the input will result in at most two rectangles in the output, and 2m = O(m).
Enough about complexity analysis and on to the algorithm itself. The input will be a sequence of rectangles defined by L, R, H as you have now. I will assume that the input is sorted by the leftmost edge L. The output graph will be a linked list of rectangles defined by the same parameters, sorted in descending order by the rightmost edge. The head of the list will be the rightmost rectangle. The output will have no overlaps between any rectangles, so the total area of the skyline will just be the sum of H * (R - L) for each of the ~m output rectangles.
The reason for picking a linked list is that the only two operations we need is iteration from the head node and the cheapest insertion possible to maintain the list in sorted order. The sorting will be done as part of overlap checking, so we do not need to do any kind of binary searches through the list or anything like that.
Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each rectangle added will be checked only against the rectangles it actually overlaps1. We will do overlap checking and removal as shown in the diagram above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap with the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once, and that no non-overlapping rectangles are processed unnecessarily, making the algorithm optimal.
Before I show code, here is a diagram of the algorithm in action. Red rectangles are new rectangles; note that their left edges progress to the right. Blue rectangles are ones that are already added and have overlap with the new rectangle. Black rectangles are already added and have no overlap with the new one. The numbering represents the order of the output list. It is always done from the right. A linked list is a perfect structure to maintain this progression since it allows cheap insertions and replacements:
Here is an implementation of the algorithm which assumes that the input coordinates are passed in as an iterable of objects having the attributes l, r, and h. The iteration order is assumed to be sorted by the left edge. If that is not the case, apply sorted or list.sort to the input first:
from collections import namedtuple
# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])
class LinkedList:
__slots__ = ['value', 'next']
"""
Implements a singly-linked list with mutable nodes and an iterator.
"""
def __init__(self, value=None, next=None):
self.value = value
self.next = next
def __iter__(self):
"""
Iterate over the *nodes* in the list, starting with this one.
The `value` and `next` attribute of any node may be modified
during iteration.
"""
while self:
yield self
self = self.next
def __str__(self):
"""
Provided for inspection purposes.
Works well with `namedtuple` values.
"""
return ' -> '.join(repr(x.value) for x in self)
def process_skyline(skyline):
"""
Turns an iterable of rectangles sharing a common baseline into a
`LinkedList` of rectangles containing no overlaps.
The input is assumed to be sorted in ascending order by left edge.
Each element of the input must have the attributes `l`, r`, `h`.
The output will be sorted in descending order by right edge.
Return `None` if the input is empty.
"""
def intersect(r1, r2, default=None):
"""
Return (1) a flag indicating the order of `r1` and `r2`,
(2) a linked list of between one and three non-overlapping
rectangles covering the exact same area as `r1` and `r2`,
and (3) a pointer to the last nodes (4) a pointer to the
second-to-last node, or `default` if there is only one node.
The flag is set to True if the left edge of `r2` is strictly less
than the left edge of `r1`. That would indicate that the left-most
(last) chunk of the tuple came from `r2` instead of `r1`. For the
algorithm as a whole, that means that we need to keep checking for
overlaps.
The resulting list is always returned sorted descending by the
right edge. The input rectangles will not be modified. If they are
not returned as-is, a `Rect` object will be used instead.
"""
# Swap so left edge of r1 < left edge of r2
if r1.l > r2.l:
r1, r2 = r2, r1
swapped = True
else:
swapped = False
if r2.l >= r1.r:
# case 0: no overlap at all
last = LinkedList(r1)
s2l = result = LinkedList(r2, last)
elif r1.r < r2.r:
# case 1: simple overlap
if r1.h > r2.h:
# Chop r2
r2 = Rect(r1.r, r2.r, r2.h)
else:
r1 = Rect(r1.l, r2.l, r1.h)
last = LinkedList(r1)
s2l = result = LinkedList(r2, last)
elif r1.h < r2.h:
# case 2: split into 3
r1a = Rect(r1.l, r2.l, r1.h)
r1b = Rect(r2.r, r1.r, r1.h)
last = LinkedList(r1a)
s2l = LinkedList(r2, last)
result = LinkedList(r1b, s2l)
else:
# case 3: complete containment
result = LinkedList(r1)
last = result
s2l = default
return swapped, result, last, s2l
root = LinkedList()
skyline = iter(skyline)
try:
# Add the first node as-is
root.next = LinkedList(next(skyline))
except StopIteration:
# Empty input iterator
return None
for new_rect in skyline:
prev = root
for rect in root.next:
need_to_continue, replacement, last, second2last = \
intersect(rect.value, new_rect, prev)
# Replace the rectangle with the de-overlapped regions
prev.next = replacement
if not need_to_continue:
# Retain the remainder of the list
last.next = rect.next
break
# Force the iterator to move on to the last node
new_rect = last.value
prev = second2last
return root.next
Computing the total area is now trivial:
skyline = [
Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None
Notice the altered order of the input parameters (h moved to the end). The resulting area is 29. This matches with what I get by doing the computation by hand. You can also do
>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)
This is to be expected from the diagram of the inputs/output shown below:
As an additional verification, I added a new building, Rect(-4, 9, 1) to the start of the list. It overlaps all the others and adds three units to area, or a final result of 32. processed comes out as:
Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
Note:
While I am sure that this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.
1 Technically, if a new rectangle does not overlap any other rectangles, it will be checked against one rectangle it does not overlap. If that extra check was always the case, the algorithm would have an additional m - 1 comparisons to do. Fortunately, m + m + n - 1 = O(m + n) even if we always had to check one extra rectangle (which we don't).
The reason for getting MemoryError is huge size of the dictionary being created. In the worst case, the dict can have 10^10 keys, which would end up taking all your memory. If there really is a need, shelve is a possible solution to make use of such large dict.
Let's say there is a building with 10 0 100 and another with 20 50 150, then that list might have info like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you can add more entries in this list. This will be O(n^2).
This might help you further.
I was wondering how the flow of this recursive algorithm works: an inversion counter based on merge-sort. When I looked at the diagrams of the merge-sort recursion tree, it seemed fairly lucid; I thought that the leaves would keep splitting until each leaf was a single unit, then merge() would start combining them; and therefore, start 'moving back up' the tree -- so to speak.
But in the code below, if we print out this function with a given array print(sortAndCount(test_case)) then we're actually getting our 'final' output from the merge() function, not the return statement in sortAndCount()? So in the code below, I thought that the sortAndCount() method would call itself over and over in (invCountA, A) = sortAndCount(anArray[:halfN]) until reaching the base case and then moving on to processing the next half of the array -- but now that seems incorrect. Can someone correct my understanding of this recursive flow? (N.b. I truncated some of the code for the merge() method since I'm only interested the recursive process.)
def sortAndCount(anArray):
N = len(anArray)
halfN = N // 2
#base case:
if N == 1: return (0, anArray)
(invCountA, A) = sortAndCount(anArray[:halfN])
(invCountB, B) = sortAndCount(anArray[halfN:])
(invCountCross, anArray) = merge(A, B)
return (invCountA + invCountB + invCountCross, anArray)
def merge(listA, listB):
counter = 0
i, j = 0, 0
#some additional code...
#...
#...
#If all items in one array have been selected,
#we just return remaining values from other array:
if (i == Asize):
return (counter, output_array + listB[j:])
else:
return (counter, output_array + listA[i:])
The following image created using rcviz shows the order of recursive call, as explained in the documentation the edges are numbered by the order in which they were traversed by the execution.The edges are colored from black to grey to indicate order of traversal : black edges first, grey edges last.:
So if we follow the steps closely we see that first we traverse the left half of the original array completely then the right.
I have a tree as shown below.
Red means it has a certain property, unfilled means it doesn't have it. I want to minimise the Red checks.
If Red than all Ancestors are also Red (and should not be checked again).
If Not Red than all Descendants are Not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth-1, n=n) for child_val in xrange(self.val+1, n)]
else:
vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large. So I want to do a sort of Depth-First-Search and additionally use the properties 1 and 2 from above.
How can I design an algorithm to do traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get to the edges of the graph by starting with a full graph and removing edges (x, y) for all values in the path as such: x, y = divmod(value, c).
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps but even if I got an optimal tree traversal algorithm, I still would be checking too many combinations. (I noticed that just now.)
Let me explain. Assuming I need checked [4, 5] (so 4 and 5 are removed from bipartite graph as explained above, but irrelevant here.). If this comes out as "Red", my tree will prevent me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (evaluated at (1,2), gives the total number of spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge that isn't a self-loop in G and make two recursive calls. The first is on the graph G/e where e is contracted. The second is on the graph G-e where e is deleted, but only if G-e is connected.
Python is close enough to pseudocode.
class counter(object):
def __init__(self, ival = 0):
self.count = ival
def count_up(self):
self.count += 1
return self.count
def old_walk_fun(ilist, func=None):
def old_walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
for q in ilist:
tlist += old_walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return old_walk_fun_helper(ilist, func)
else:
return []
def walk_fun(ilist, func=None):
def walk_fun_helper(ilist, func=None, count=0):
tlist = []
if(isinstance(ilist, list) and ilist):
if(ilist[0] == "Red"): # Only evaluate sub-branches if current level is Red
for q in ilist:
tlist += walk_fun_helper(q, func, count+1)
else:
tlist = func(ilist)
return [tlist] if(count != 0) else tlist
if(func != None and hasattr(func, '__call__')):
return walk_fun_helper(ilist, func)
else:
return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
["Red",
["Red",
["Red",
[]
],
["White",
[]
],
["White",
[]
]
],
["White",
["White",
[]
],
["White",
[]
]
],
["Red",
[]
]
]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Unconditionally walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Selectively walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early if the graph was connected, and I have a sort of certificate of connectedness - the edges which connected two previously unconnected sets of vertices.
As you work down the tree/follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected - this looks like a quick check to me. If it is in the certificate of connectedness you could back up to the state of union/find as of just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.