I have a blank grid of 100x100 tiles. The start point is (0,0), the goal is (99,99), and tiles are 4-way connected.
My floodfill algorithm finds the shortest path in 30ms, but my A* implementation is around 10x slower.
Note: A* is consistently slower (3-10x) than my floodfill, regardless of grid size or layout. Since the floodfill is so simple, I suspect I'm missing some kind of optimisation in the A*.
Here's the function. I use Python's heapq to maintain an f-sorted list. The 'graph' holds all nodes, goals, neighbours and g/f values.
import heapq

def solve_astar(graph):
    open_q = []
    heapq.heappush(open_q, (0, graph.start_point))
    while open_q:
        current = heapq.heappop(open_q)[1]
        current.seen = True  # Equivalent of being in a closed queue
        for n in current.neighbours:
            if n is graph.end_point:
                n.parent = current
                open_q = []  # Clearing the queue stops the process
            # Ignore if previously seen (ie, in the closed queue)
            if n.seen:
                continue
            # Ignore if n already has a parent and the parent is closer
            if n.parent and n.parent.g <= current.g:
                continue
            # Set the parent, or switch parents if it already has one
            if not n.parent:
                n.parent = current
            elif n.parent.g > current.g:
                remove_from_heap(n, n.f, open_q)
                n.parent = current
            # Set the F score (simple, uses Manhattan)
            set_f(n, n.parent, graph.end_point)
            # Push it to queue, prioritised by F score
            heapq.heappush(open_q, (n.f, n))
def set_f(point, parent, goal):
    point.g += parent.g
    h = get_manhattan(point, goal)
    point.f = point.g + h
It's a tie-breaker issue. On an empty grid, starting at (0,0) and going to (99,99) produces many tiles with the same f-score.
By adding a tiny nudge to the heuristic, tiles that are slightly closer to the destination will get selected first, meaning the goal is reached quicker and fewer tiles need to be checked.
def set_f(point, parent, goal):
    point.g += parent.g
    h = get_manhattan(point, goal) * 1.001
    point.f = point.g + h
This resulted in around a 100x improvement, making it much faster than floodfill.
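For reference, another common way to break ties (a standard variant, not the method used above) is to add h as a secondary key in the heap tuple, so that among equal f-scores the node nearest the goal is expanded first. A minimal sketch, assuming each pushed node carries f and h as in the question's code:

import heapq
from itertools import count

# Among equal f-scores, the node with the smaller h (closer to the
# goal) pops first. The running counter breaks any remaining ties so
# heapq never has to compare the node objects themselves.
tie = count()
open_q = []

def push(node, f, h):
    heapq.heappush(open_q, (f, h, next(tie), node))

push('A', 10, 6)
push('B', 10, 2)  # same f, but closer to the goal
print(heapq.heappop(open_q)[3])  # -> 'B'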
There is a TLDR at the bottom. I have a Python class as follows, which is used to make nodes for D* pathfinding (the pathfinding process itself is not included):
import numpy as np
from tqdm import tqdm
import pickle
import sys

class Node():
    """A node class for D* Pathfinding"""

    def __init__(self, parent, position):
        self.parent = parent
        self.position = position
        self.tag = "New"
        self.state = "Empty"
        self.h = 0
        self.k = 0
        self.neighbours = []

    def __eq__(self, other):
        if type(other) != Node:
            return False
        else:
            return self.position == other.position

    def __lt__(self, other):
        return self.k < other.k
def makenodes(grid):
    """Given all the endpoints, make nodes for them"""
    endpoints = getendpoints(grid)
    nodes = [Node(None, pos) for pos in endpoints]
    t = tqdm(nodes, desc='Making Nodes')  # Just creates a progress bar
    for node in t:
        t.refresh()
        node.neighbours = getneighbours(node, grid, nodes)
    return nodes

def getneighbours(current_node, grid, spots):
    '''Given the current node, link it to its neighbours'''
    neighbours = []
    for new_position in [(0, -1), (0, 1), (-1, 0), (1, 0)]:  # Adjacent nodes
        # Get node position
        node_position = (int(current_node.position[0] + new_position[0]),
                         int(current_node.position[1] + new_position[1]))
        # Make sure it is within range
        if (node_position[0] > (len(grid) - 1) or node_position[0] < 0
                or node_position[1] > (len(grid[len(grid) - 1]) - 1)
                or node_position[1] < 0):
            continue
        # Look up the existing node at that position (an O(n) list search)
        new_node = spots[spots.index(Node(None, node_position))]
        neighbours.append(new_node)
    return neighbours

def getendpoints(grid):
    """Returns all locations on the grid"""
    x, y = np.where(grid == 0)
    endpoints = [(x, y) for (x, y) in zip(x, y)]
    return endpoints
"""Actually Making The Nodes Goes as Follows"""
grid = np.zeros((100,100))
spots = makenodes(grid)
sys.setrecursionlimit(40000)
grid = np.zeros((100,100))
spots = makenodes(grid)
with open('100x100 Nodes Init 4 Connected', 'wb') as f:
pickle.dump(spots, f)
print('Done 100x100')
The program runs well, however this node-making process takes 3 minutes on a 100x100 grid and over a week on 1000x1000. To counter this, I am using pickle.dump() to save all the nodes; then, when I run my program, I can use spots = pickle.load(...). On grid = np.zeros((40,50)) I have to set the recursion limit to around 16000; on 100x100 I increase the recursion limit to 40000, but the kernel dies.
To help you visualise what the nodes are and how they are connected to their neighbours, refer to the image I drew, where black: node and white: connection.
If I change getneighbours() to return tuples of (x,y) coordinates (instead of node objects), then I am able to pickle the nodes, as they're no longer recursive. This approach slows down the rest of the program, because each time I want to refer to a node I have to reconstruct it and search for it in the list spots. (I have simplified this part of the explanation, as the question is already very long.)
How can I save these node objects while maintaining their neighbours as nodes for larger grids?
TLDR: I have a node class in which each node holds references to its neighbours, making the structure highly recursive. I can save the nodes using pickle for grid = np.zeros((40,50)) by setting a fairly high recursion limit of 16000. For larger grids I reach the maximum system recursion depth when calling pickle.dump.
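To make the failure mode concrete, here is a minimal standalone demo (my own illustration, separate from the code above) of why pickling a linked node structure recurses: pickle walks the object graph depth-first, so a chain of N linked nodes needs on the order of N stack frames.

import pickle

class N:
    def __init__(self):
        self.neighbours = []

# A chain of 5000 nodes, each referencing the next
chain = [N() for _ in range(5000)]
for a, b in zip(chain, chain[1:]):
    a.neighbours.append(b)

try:
    pickle.dumps(chain[0])  # exceeds the default limit of 1000 frames
except RecursionError as e:
    print('RecursionError:', e)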
Once again, a TLDR is included (just so you can check whether this will work for you). I found a workaround which preserves the time-saving benefits of pickling but allows the nodes to be pickled even at larger sizes. Tests are included to demonstrate this:
I changed the structure of each Node() in makenodes() that is to be pickled, stripping it of its neighbours so that there is less recursion. With the method mentioned in the question, a grid = np.zeros((40,50)) would require sys.setrecursionlimit(16000) or so. With this approach, Python's built-in recursion limit of 1000 is never hit, and the intended structure can be reconstructed upon loading the pickled file.
Essentially, when pickling, the following procedure is followed:
Make a list of all nodes with their corresponding indexes, so that:
In [1]: nodes
Out [1]: [[0, <__main__.Node at 0x7fbfbebf60f0>],...]
Do the same for the neighbours such that:
In [2]: neighbours
Out [2]: [[0, [<__main__.Node at 0x7fbfbebf6f60>, <__main__.Node at 0x7fbfbec068d0>]],...]
The above is an example of what both nodes and neighbours look like for any size of maze; '...' indicates that the list continues for as many elements as it contains. An additional note: len(nodes) should equal len(neighbours). The index is not strictly necessary, but I have included it as an additional check.
The changes are as follows:
def makenodes(grid):
    """Given all the endpoints, make nodes for them"""
    endpoints = getendpoints(grid)
    spots = [Node(None, pos) for pos in endpoints]
    neighbours = []
    nodes = []
    t = tqdm(spots, desc='Making Nodes')
    i = 0
    for node in t:
        t.refresh()
        neighbour = getneighbours(node, grid, spots)
        nodes.append([i, node])
        neighbours.append([i, neighbour])
        i += 1
    return nodes, neighbours
"""Actually Making the Nodes"""
grid = np.zeros((100,100))
nodes, neighbours = makenodes(grid)
arr = np.array([nodes,neighbours])
with open('100x100 Nodes Init 4 Connected.pickle', 'wb') as f:
pickle.dump(arr,f)
print('Done 100x100')
def reconstruct(filename):
    '''Reconstruct the relationships by re-assigning neighbours'''
    with open(filename, 'rb') as file:
        f = pickle.load(file)
    nodes = f[0]
    neighbours = f[1]
    reconstructed = []
    for node in nodes:
        i = node[0]
        for neighbour in neighbours[i][1]:
            node[1].neighbours.append(neighbour)
        reconstructed.append(node[1])
    return reconstructed

nodes_in = reconstruct('100x100 Nodes Init 4 Connected.pickle')
This achieves the desired output and can re-establish neighbour/children relationships when reloading (reconstructing). Additionally, the objects in the neighbours list of each node point directly to their corresponding node, such that if you run this on just the pickled array without reconstructing it:
In [3]: nodes2[1][1] is neighbours2[0][1][0]
Out[3]: True
Time Differences
Mean time to create and save a 50x50 grid, averaged over 60 runs:
Original Way: 9.613s
Adapted Way (with reconstruction): 9.822s
Mean time to load:
Original Way: 0.0582s
Adapted Way: 0.04071s
Both File Sizes: 410KB
Conclusion
The adapted way can handle larger sizes and, judging by the times, has the same performance. Both methods create files of the same size, because pickle only stores each object once; see the pickle documentation for details.
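A quick round-trip check of that claim (my own demo, not from the tests above):

import pickle

# pickle stores each object once per dump and preserves shared
# references, so two pointers to the same list survive a round-trip.
a = [1, 2, 3]
b, c = pickle.loads(pickle.dumps((a, a)))
print(b is c)  # True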
Please note that the choice of a 50x50 grid is to prevent the original method from hitting recursion limits. The maximum size I have been able to pickle with the adapted way is 500x500. I haven't gone larger simply because the node-making process took almost 2 days at that size on my machine. However, loading the 500x500 result takes less than 1 sec, so the benefits are even more apparent.
**TLDR:** I managed to save the nodes and their corresponding neighbours as an array of np.array([nodes, neighbours]), so that when pickling, the data necessary to re-establish the relationships when un-pickling was available. In essence, the necessary data is computed and saved in a way that avoids raising the recursion limit. You can then reconstruct the data quickly when loading.
I'm currently in a programming competition where time is key, so I'm trying to optimise my solution as much as possible. I have to do many BFS passes over a 30x30 grid in under 1s. I tried 3 implementations (two of the same approach, once with lists and once with a deque, and one using a 2D array). Surprisingly, one BFS finished in 0.027 seconds while the other finished in 0.0023 seconds, even though both of them do the same work. I would like to understand why.
Implementation 1
def BFS3(grid, start):
    queue = deque([start])
    visited = deque([start])
    while queue:
        current_point = queue.popleft()
        if grid[current_point.y][current_point.x] == -1:
            continue
        adjacent_points = current_point.get_adjacents(grid)
        for point in adjacent_points:
            if grid[point.y][point.x] == -1:
                continue
            if (point.x, point.y) in visited:
                continue
            visited.append(point)
            queue.append(point)
    return visited
Implementation 2
def BFS4(grid, start):
    visited = []
    for _ in range(height):
        visited.append([0] * width)
    queue = [start]
    _x, _y = start.x, start.y
    visited[_y][_x] = 1
    while queue:
        current_point = queue.pop(0)
        if grid[current_point.y][current_point.x] == -1:
            continue
        adjacent_points = current_point.get_adjacents(grid)
        for point in adjacent_points:
            if grid[point.y][point.x] == -1:
                continue
            if visited[point.y][point.x]:
                continue
            _x, _y = point.x, point.y
            visited[_y][_x] = 1
            queue.append(point)
    return visited
and here is the main:
start_point = Point(30, 3)
new_grid = deepcopy(rows)
timer = time()
test = BFS4(rows, start_point)
print_debug("Timer: ", time() - timer)

start_point = Point(30, 3)
new_grid = deepcopy(rows)
timer = time()
test = BFS3(rows, start_point)
print_debug("Timer: ", time() - timer)
Output:
Timer: 0.002305269241333008
Timer: 0.027254581451416016
To my knowledge, the pop(0) method of the built-in list type is O(n), making it considerably more time-consuming than deque's O(1) popleft(), yet here the list version is the faster one.
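For what it's worth, the pop(0) vs popleft() gap on its own is easy to micro-benchmark (my own quick check, separate from the BFS code):

from timeit import timeit

# Draining 10**5 items: list.pop(0) shifts the whole list every call
# (O(n^2) total), while deque.popleft() is O(1) per call (O(n) total).
print(timeit('while d: d.pop(0)',
             setup='d = list(range(10**5))', number=1))
print(timeit('while d: d.popleft()',
             setup='from collections import deque; d = deque(range(10**5))',
             number=1))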
There are multiple components in the two implementations that differ in performance.
1) deque.popleft() runs in O(1), whereas list.pop(0) is O(n).
2) The way you look up the visited nodes:
In Implementation 2, visited is an N x N matrix of boolean-ish values looked up by coordinates as visited[y][x]. The lookup is O(1), but it has to store N^2 values, which is very inefficient storage-wise for sparse grids.
In Implementation 1 you use (x, y) in visited, which calls deque's __contains__() method. That is a linear search, O(n) in the number of visited nodes, so it gets slower and slower as the search progresses.
An alternative implementation for visited would be a hash table, which the Python native dict is. It also has O(1) lookup, but it only stores the 1s of the visited matrix, with d[(x, y)] = 1 for setting and d.get((x, y)) for lookup. Note, however, that hash tables come with an overhead.
If you are just concerned with runtime and RAM usage is negligible, you can stick with the matrix, as (I suspect) it should be faster than the hash table.
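A minimal sketch of that suggestion, using a Python set keyed on (x, y) tuples (a set is effectively the proposed hash table with only keys stored; the grid layout and -1 walls follow the question's convention):

from collections import deque

def bfs(grid, start):
    # deque.popleft() is O(1); set membership is O(1) on average.
    height, width = len(grid), len(grid[0])
    queue = deque([start])
    visited = {start}
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < width and 0 <= ny < height
                    and grid[ny][nx] != -1 and (nx, ny) not in visited):
                visited.add((nx, ny))
                queue.append((nx, ny))
    return visited

grid = [[0] * 5 for _ in range(5)]
grid[2][2] = -1                # one wall
print(len(bfs(grid, (0, 0))))  # 24 reachable cells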
I'm trying to calculate the area of a skyline (overlapping rectangles with the same baseline):
building_count = int(input())
items = {}  # dictionary: location on the x axis is the key, height is the value
count = 0   # total area
for j in range(building_count):
    line = input().split(' ')
    H = int(line[0])  # height
    L = int(line[1])  # left point (start of the building)
    R = int(line[2])  # right point (end of the building)
    for k in range(R - L):
        if not (L + k in items):   # if it's not there, add it
            items[L + k] = H
        elif H > items[L + k]:     # if we have a higher building on that index
            items[L + k] = H
for value in items.values():       # we add up each column, basically
    count += value
print(count)
sample input would be:
5
3 -3 0
2 -1 1
4 2 4
2 3 7
3 6 8
and the output is 29.
The issue is memory efficiency: when there are lots of values, the script simply throws a MemoryError. Does anyone have ideas for optimising the memory usage?
You are allocating a separate key-value pair for every single integer column in your range. Imagine the case where L = 0 and R = 1000000: your items dictionary will be filled with 1000000 entries. Your basic idea of processing/removing overlaps is sound, but the way you do it is massive overkill.
Like so much else in life, this is a graph problem in disguise. Imagine the vertices being the rectangles you are trying to process and the (weighted) edges being the overlaps. The complication is that you cannot just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices is removed as well, simplifying the graph, while in another case a new vertex is added:
Green: overlap to be chopped out.
Normally, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m^2) operation, because we would have to check all vertices for overlaps against each other. However, we can bypass construction of the input graph entirely to get an O(m + n) traversal algorithm, which is optimal, since we will only analyze each rectangle once and construct the output graph (which has no overlaps) as efficiently as possible. O(m + n) assumes that your input rectangles are sorted according to their left edges in ascending order. If that is not the case, the algorithm will be O(m log m + n) to account for the initial sorting step. Note that as the graph density increases, n will go from ~m to ~m^2. This confirms the intuitive idea that the fewer overlaps there are, the more you would expect the process to run in O(m) time, while the more overlaps there are, the closer you get to O(m^2) time.
The space complexity of the proposed algorithm will be O(m): each rectangle in the input will result in at most two rectangles in the output, and 2m = O(m).
Enough about complexity analysis and on to the algorithm itself. The input will be a sequence of rectangles defined by L, R, H as you have now. I will assume that the input is sorted by the leftmost edge L. The output graph will be a linked list of rectangles defined by the same parameters, sorted in descending order by the rightmost edge. The head of the list will be the rightmost rectangle. The output will have no overlaps between any rectangles, so the total area of the skyline will just be the sum of H * (R - L) for each of the ~m output rectangles.
The reason for picking a linked list is that the only two operations we need are iteration from the head node and the cheapest insertion possible to maintain the list in sorted order. The sorting will be done as part of overlap checking, so we do not need to do any kind of binary searches through the list or anything like that.
Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each rectangle added will be checked only against the rectangles it actually overlaps¹. We will do overlap checking and removal as shown in the diagram above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap with the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once, and that no non-overlapping rectangles are processed unnecessarily, making the algorithm optimal.
Before I show code, here is a diagram of the algorithm in action. Red rectangles are new rectangles; note that their left edges progress to the right. Blue rectangles are ones that are already added and have overlap with the new rectangle. Black rectangles are already added and have no overlap with the new one. The numbering represents the order of the output list. It is always done from the right. A linked list is a perfect structure to maintain this progression since it allows cheap insertions and replacements:
Here is an implementation of the algorithm which assumes that the input coordinates are passed in as an iterable of objects having the attributes l, r, and h. The iteration order is assumed to be sorted by the left edge. If that is not the case, apply sorted or list.sort to the input first:
from collections import namedtuple

# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])

class LinkedList:
    """
    Implements a singly-linked list with mutable nodes and an iterator.
    """
    __slots__ = ['value', 'next']

    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next

    def __iter__(self):
        """
        Iterate over the *nodes* in the list, starting with this one.
        The `value` and `next` attribute of any node may be modified
        during iteration.
        """
        while self:
            yield self
            self = self.next

    def __str__(self):
        """
        Provided for inspection purposes.
        Works well with `namedtuple` values.
        """
        return ' -> '.join(repr(x.value) for x in self)
def process_skyline(skyline):
    """
    Turns an iterable of rectangles sharing a common baseline into a
    `LinkedList` of rectangles containing no overlaps.
    The input is assumed to be sorted in ascending order by left edge.
    Each element of the input must have the attributes `l`, `r`, `h`.
    The output will be sorted in descending order by right edge.
    Return `None` if the input is empty.
    """
    def intersect(r1, r2, default=None):
        """
        Return (1) a flag indicating the order of `r1` and `r2`,
        (2) a linked list of between one and three non-overlapping
        rectangles covering the exact same area as `r1` and `r2`,
        (3) a pointer to the last node, and (4) a pointer to the
        second-to-last node, or `default` if there is only one node.
        The flag is set to True if the left edge of `r2` is strictly less
        than the left edge of `r1`. That would indicate that the left-most
        (last) chunk of the tuple came from `r2` instead of `r1`. For the
        algorithm as a whole, that means that we need to keep checking for
        overlaps.
        The resulting list is always returned sorted descending by the
        right edge. The input rectangles will not be modified. If they are
        not returned as-is, a `Rect` object will be used instead.
        """
        # Swap so left edge of r1 < left edge of r2
        if r1.l > r2.l:
            r1, r2 = r2, r1
            swapped = True
        else:
            swapped = False
        if r2.l >= r1.r:
            # case 0: no overlap at all
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.r < r2.r:
            # case 1: simple overlap
            if r1.h > r2.h:
                # Chop r2
                r2 = Rect(r1.r, r2.r, r2.h)
            else:
                r1 = Rect(r1.l, r2.l, r1.h)
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.h < r2.h:
            # case 2: split into 3
            r1a = Rect(r1.l, r2.l, r1.h)
            r1b = Rect(r2.r, r1.r, r1.h)
            last = LinkedList(r1a)
            s2l = LinkedList(r2, last)
            result = LinkedList(r1b, s2l)
        else:
            # case 3: complete containment
            result = LinkedList(r1)
            last = result
            s2l = default
        return swapped, result, last, s2l

    root = LinkedList()
    skyline = iter(skyline)
    try:
        # Add the first node as-is
        root.next = LinkedList(next(skyline))
    except StopIteration:
        # Empty input iterator
        return None
    for new_rect in skyline:
        prev = root
        for rect in root.next:
            need_to_continue, replacement, last, second2last = \
                intersect(rect.value, new_rect, prev)
            # Replace the rectangle with the de-overlapped regions
            prev.next = replacement
            if not need_to_continue:
                # Retain the remainder of the list
                last.next = rect.next
                break
            # Force the iterator to move on to the last node
            new_rect = last.value
            prev = second2last
    return root.next
Computing the total area is now trivial:
skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h
           for x in processed) if processed else None
Notice the altered order of the input parameters (h moved to the end). The resulting area is 29, which matches what I get by doing the computation by hand. You can also do
>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)
This is to be expected from the diagram of the inputs/output shown below:
As an additional verification, I added a new building, Rect(-4, 9, 1), to the start of the list. It overlaps all the others and adds three units of area, for a final result of 32. processed comes out as:
Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
Note:
While I am sure that this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.
¹ Technically, if a new rectangle does not overlap any other rectangles, it will still be checked against one rectangle it does not overlap. If that extra check always happened, the algorithm would have an additional m - 1 comparisons to do. Fortunately, m + m + n - 1 = O(m + n) even if we always had to check one extra rectangle (which we don't).
The reason for the MemoryError is the huge size of the dictionary being created. In the worst case, the dict can have 10^10 keys, which would end up taking all your memory. If there really is a need for a dict that large, shelve is a possible way to keep it on disk.
A better approach: instead of storing every integer column, keep a sorted list of (x, height) breakpoints, where each entry means the skyline has that height from x until the next entry's x. Say there is a building with 10 0 100 and another with 20 50 150; then that list might hold [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you add more entries to this list. This will be O(n^2) overall.
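A sketch of that breakpoint idea (my own illustration; it drops the ±10^9 sentinels and simply treats the height as 0 outside the recorded breakpoints, with an O(n) merge per building and O(n^2) overall):

def raise_region(bp, l, r, h):
    # bp is a list of (x, height) pairs sorted by x, meaning "height
    # from x until the next pair"; height is 0 before the first pair,
    # and the list always ends with a zero-height pair.
    xs = sorted(set([l, r] + [x for x, _ in bp]))
    out, old, i = [], 0, 0
    for x in xs:
        if i < len(bp) and bp[i][0] == x:
            old = bp[i][1]  # old skyline height starting at x
            i += 1
        new = max(old, h) if l <= x < r else old
        if not out or out[-1][1] != new:
            out.append((x, new))
    return out

def skyline_area(buildings):
    bp = []
    for h, l, r in buildings:   # (H, L, R) triples, as in the question
        bp = raise_region(bp, l, r, h)
    return sum(h0 * (x1 - x0) for (x0, h0), (x1, _) in zip(bp, bp[1:]))

# The question's sample input:
print(skyline_area([(3, -3, 0), (2, -1, 1), (4, 2, 4),
                    (2, 3, 7), (3, 6, 8)]))  # 29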
This is the famous path-counting problem; I am trying to solve it using memoization.
Enlighten me!
def pathCounter(a, b):
    matrix = [[0 for i in xrange(a)] for i in xrange(b)]
    if a == 0 or b == 0:
        return 1
    if matrix[a][b]:
        return matrix[a][b]
    print matrix[a][b]
    matrix[a][b] = pathCounter(a, b-1) + pathCounter(a-1, b)
    return matrix[2][2]

if __name__ == '__main__':
    k = pathCounter(2, 2)
    print k
I believe you're trying to solve this problem.
If that is the case, then you are correct that it would be sensible to solve it with recursion.
If you imagine each corner of the grid as a node, then you want a recursive function that simply takes a parameter for the node it is at, (x, y). In the function, it first needs to check whether the position it was called at is the bottom-right vertex of the grid. If it is, the function adds one to the path count (as a path is finished when it reaches this corner) and then returns. Otherwise, the function just calls two more copies of itself (this is the recursion), one for the node to its right and one for the node below it. An added step is to check that the coordinates are inside the grid before recursing, as a node in the middle of the bottom row, for instance, shouldn't call the node below it; that would be off the grid.
Now that you have the recursive function defined, all you need to do is declare a variable to store the path count and call the recursive function from the coordinate (0,0).
However, as I am sure you have seen, this solution does not complete in reasonable time, so it is necessary to use memoization: speeding it up by caching the results so that the same sections of paths aren't calculated twice.
It also makes the coding simpler if, as you have done, we work from the bottom-right corner up to the top-left corner. One last thing: if you use a dictionary, the code becomes clearer.
The final code should look something like:
cache = {}

def pathCounter(x, y):
    if x == 0 or y == 0:
        return 1
    if (x, y) in cache:
        return cache[(x, y)]
    cache[(x, y)] = pathCounter(x, y-1) + pathCounter(x-1, y)
    return cache[(x, y)]

print(pathCounter(2, 2))
This gives the expected result of 6.
I'll leave you to do the 20x20 grid. Hope this helps!
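A variant of the same idea, using the standard library's functools.lru_cache to do the caching instead of an explicit dict (a minimal sketch, not from the answer above):

from functools import lru_cache

@lru_cache(maxsize=None)
def pathCounter(x, y):
    # Same recurrence as above; the decorator plays the role of `cache`.
    if x == 0 or y == 0:
        return 1
    return pathCounter(x, y - 1) + pathCounter(x - 1, y)

print(pathCounter(2, 2))  # 6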
You made a few errors in your implementation of the algorithm. If you're using a purely recursive approach, you do not have to use the grid, because you won't actually require any of the stored data. You only need to return the two possible sub-paths from your current position - that's it! Therefore, you need to make some changes to the main idea of your code.
I tried to keep as much of your original code as possible, but still make it working:
def pathCounterNaive(width, height, startX=0, startY=0):
    if startX >= width or startY >= height:
        return 0
    if startX == width - 1 and startY == height - 1:
        return 1
    return (pathCounterNaive(width, height, startX + 1, startY)
            + pathCounterNaive(width, height, startX, startY + 1))

slowK = pathCounterNaive(3, 3)
print(slowK)
Please keep in mind that the parameters width and height represent the number of vertices, and are therefore not 2 but 3 for a 2x2 grid. As this code uses pure recursion, it is very slow. If you want to use your memoization approach, you have to modify the code like this:
import numpy as np

def pathCounter(width, height):
    grid = np.zeros((width + 1, height + 1))

    def pathCounterInternal(x, y):
        if x == 0 or y == 0:
            return 1
        if grid[x, y]:  # cache hit: this sub-count was already computed
            return grid[x, y]
        grid[x, y] = pathCounterInternal(x, y - 1) + pathCounterInternal(x - 1, y)
        return grid[x, y]

    grid[width, height] = pathCounterInternal(width, height)
    return grid[width, height]

k = pathCounter(2, 2)
print(k)
Here you have to call it with 2 as the parameter for a 2x2 grid. This code is much faster due to the caching of already calculated paths.
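As a cross-check (a standard combinatorial identity, not something from either answer), the number of monotone paths on an a x b grid is the binomial coefficient C(a + b, a), which these functions should reproduce:

from math import comb

print(comb(4, 2))    # 6, matches pathCounter(2, 2)
print(comb(40, 20))  # 137846528820, the answer for the 20x20 grid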
I was wondering how the flow of this recursive algorithm works: an inversion counter based on merge sort. When I looked at diagrams of the merge-sort recursion tree, it seemed fairly lucid; I thought that the leaves would keep splitting until each leaf was a single unit, and then merge() would start combining them, thereby 'moving back up' the tree, so to speak.
But in the code below, if we print out this function with a given array, print(sortAndCount(test_case)), then we're actually getting our 'final' output from the merge() function, not from the return statement in sortAndCount()? So I thought that the sortAndCount() method would call itself over and over in (invCountA, A) = sortAndCount(anArray[:halfN]) until reaching the base case, and only then move on to processing the second half of the array, but now that seems incorrect. Can someone correct my understanding of this recursive flow? (N.B. I truncated some of the code for the merge() method, since I'm only interested in the recursive process.)
def sortAndCount(anArray):
    N = len(anArray)
    halfN = N // 2
    # base case:
    if N == 1:
        return (0, anArray)
    (invCountA, A) = sortAndCount(anArray[:halfN])
    (invCountB, B) = sortAndCount(anArray[halfN:])
    (invCountCross, anArray) = merge(A, B)
    return (invCountA + invCountB + invCountCross, anArray)

def merge(listA, listB):
    counter = 0
    i, j = 0, 0
    # some additional code...
    # ...
    # ...
    # If all items in one array have been selected,
    # we just return remaining values from the other array:
    if (i == Asize):
        return (counter, output_array + listB[j:])
    else:
        return (counter, output_array + listA[i:])
The following image, created using rcviz, shows the order of the recursive calls. As explained in the documentation, the edges are numbered by the order in which they were traversed by the execution, and coloured from black to grey to indicate that order: black edges first, grey edges last.
So if we follow the steps closely, we see that we first traverse the left half of the original array completely, and then the right half.
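One way to see that order for yourself (my own illustrative trace, with sorted() standing in for the truncated merge()) is to print each call as it is entered:

def trace(anArray, depth=0):
    print('  ' * depth + str(anArray))
    N = len(anArray)
    if N == 1:
        return anArray
    halfN = N // 2
    A = trace(anArray[:halfN], depth + 1)  # left half recurses to its
    B = trace(anArray[halfN:], depth + 1)  # base cases before the right
    return sorted(A + B)                   # stand-in for merge()

trace([4, 3, 2, 1])
# [4, 3, 2, 1]
#   [4, 3]
#     [4]
#     [3]
#   [2, 1]
#     [2]
#     [1]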