I am trying to find the nearest points on an x,y plane within a given radius using k-d trees. I mapped all the points into the tree, but a problem arises because the points I am passing to the search algorithm are themselves in the tree. In other words, I get the same point back, so the distance is basically 0. So I should be looking for the second-nearest point, I guess. My idea was to do the same search but keep the result that is worse than best_result yet better than the root. I have been trying to implement this but have been unsuccessful so far. I did manage it with nested for loops; it's much easier to get the second-best result that way.
here's my kd tree implementation:
def kdtree(points, depth=0):
    n = len(points)
    if n <= 0:
        return None
    axis = depth % k  # k is the global number of dimensions (2 for an x,y plane)
    sorted_points = sorted(points, key=lambda point: point[axis])
    median_idx = n // 2  # floor division covers both the even and odd cases
    return {
        'point': sorted_points[median_idx],
        'left': kdtree(sorted_points[:median_idx], depth + 1),
        'right': kdtree(sorted_points[median_idx + 1:], depth + 1)
    }
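For reference, the builder assumes a global k holding the number of dimensions, so constructing a tree for 2-d points looks like this (the sample points are mine):

k = 2  # dimensionality; kdtree() reads this global
tree = kdtree([(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)])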
Here is my function for searching the nearest point:
def kd_nearest_search(root, point, depth=0):
    if root is None:
        return None
    axis = depth % k
    if point[axis] < root['point'][axis]:
        next_branch = root['left']
        opposite_branch = root['right']
    else:
        next_branch = root['right']
        opposite_branch = root['left']
    # best_distance returns whichever of its two candidate points is closer to `point`
    best_result = best_distance(point, kd_nearest_search(next_branch, point, depth + 1), root['point'])
    # Only search the opposite branch if the splitting plane is closer than the best match so far
    if distance(point, best_result) > abs(point[axis] - root['point'][axis]):
        best_result = best_distance(point, kd_nearest_search(opposite_branch, point, depth + 1), best_result)
    return best_result
I don't know your kd implementation. But I'll give you the basic idea.
Create tuples of (min_possible_distance, type, object), where the distance is the minimum possible distance to your search point, the type is either "box" or "point", and the object is the thing itself.
Put those tuples into a heap.
And now, in pseudocode, you do this:
place (0, "box", root_of_tree) into new heap
while heap is not empty:
(distance, type, obj) = pop minimal element from heap.
if type == "point":
yield obj
else:
if obj has a point:
store (distance, "point", point) in heap
for child_box from obj:
store (min_distance", "box", child_box) in heap
This will yield points in the kd tree from closest to farthest, so you can just keep reading until you find the second one. Alternatively, you could ask for the 10 nearest and it would give you those instead.
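To make that concrete, here is a minimal Python sketch of the idea against the dict-based tree from the question; the names nearest_points and k_dims are mine, and distance is assumed to be Euclidean:

import heapq
import itertools
from math import dist as distance  # Euclidean distance (Python 3.8+)

def nearest_points(root, query, k_dims=2):
    # Yield the tree's points ordered by distance to `query`, closest first.
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(tie), "box", root, 0)]
    while heap:
        bound, _, kind, obj, depth = heapq.heappop(heap)
        if kind == "point":
            yield obj
        elif obj is not None:
            axis = depth % k_dims
            point = obj['point']
            # The node's own point goes back in with its true distance.
            heapq.heappush(heap, (distance(query, point), next(tie), "point", point, depth))
            plane_gap = abs(query[axis] - point[axis])
            if query[axis] < point[axis]:
                near, far = obj['left'], obj['right']
            else:
                near, far = obj['right'], obj['left']
            # The near child inherits the parent's lower bound; the far child
            # can't be closer than the splitting plane.
            heapq.heappush(heap, (bound, next(tie), "box", near, depth + 1))
            heapq.heappush(heap, (max(bound, plane_gap), next(tie), "box", far, depth + 1))

k = 2
points = [(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)]
gen = nearest_points(kdtree(points), (5, 4))
next(gen)         # (5, 4) itself, at distance 0
print(next(gen))  # the second-nearest point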
Advanced solution
You can use Python's min function with the key argument like this:
def find_closest(start_point, remaining_points):
    return min(remaining_points, key=lambda a: distance(start_point, a))
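For example, with a Euclidean distance helper (the question doesn't show its distance function, so this one is an assumption):

from math import dist as distance  # Euclidean distance (Python 3.8+)

points = [(3, 4), (1, 1), (10, 0)]
print(find_closest((0, 0), points))  # -> (1, 1)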
Basic solution
Because of your specific needs (only loops and if statements), here is another solution. For other people who are not restricted, I recommend the solution above.
def find_closest(start_point, remaining_points):
    closest = None
    closest_distance = 1000000
    # Better but more advanced initialisation:
    # closest_distance = float("inf")
    for element in remaining_points:
        dist = distance(start_point, element)
        if dist < closest_distance:
            closest = element
            closest_distance = dist
    return closest
Explanation
Before going through all points, we initialise the closest point to None (it is not found yet) and the closest_distance to a very high value (to be sure that the first evaluated point will be closer).
Then, for each point in remaining_points, we calculate its distance from start_point and store it in dist.
If this distance dist is less than closest_distance, then the current point is closer than the currently stored one, so we update the stored closest point closest with the current point and update closest_distance with the current distance dist.
When all points have been evaluated, we return the closest point closest.
Links for more information
min function: https://www.programiz.com/python-programming/methods/built-in/min
lambda function: https://www.w3schools.com/python/python_lambda.asp
A quick and straightforward solution would be to create a list of results and then corroborate the index with your list of remaining points (because they are inherently lined up). Below is a step-by-step process to achieve this, and at the very bottom is a cleaned-up version of the code.
def find_closest(start_point, remaining_points):
    results = []  # list of results for later use
    for element in remaining_points:
        result = distance(start_point, element)
        results.append(result)  # append the result to the list
    # After iteration is finished, find the lowest result
    lowest_result = min(results)
    # Find the index of the lowest result
    lowest_result_index = results.index(lowest_result)
    # Corroborate this with the inputs
    closest_point = remaining_points[lowest_result_index]
    # Return the closest point
    return closest_point
Or to simplify the code:
def find_closest(start_point, remaining_points):
    results = []
    for element in remaining_points:
        results.append(distance(start_point, element))
    return remaining_points[results.index(min(results))]
Edit: you commented saying you can't use Python's built-in min() function. A solution would be to create your own minimum_value() function.
def minimum_value(lst: list):
    min_val = lst[0]
    for item in lst:
        if item < min_val:
            min_val = item
    return min_val

def find_closest(start_point, remaining_points):
    results = []
    for element in remaining_points:
        results.append(distance(start_point, element))
    return remaining_points[results.index(minimum_value(results))]
There is a TLDR at the bottom. I have a Python class, shown below, which is used to make nodes for D* pathfinding (the pathfinding process itself is not included):
import numpy as np
from tqdm import tqdm
import sys
import pickle  # needed for pickle.dump below

class Node():
    """A node class for D* Pathfinding"""

    def __init__(self, parent, position):
        self.parent = parent
        self.position = position
        self.tag = "New"
        self.state = "Empty"
        self.h = 0
        self.k = 0
        self.neighbours = []

    def __eq__(self, other):
        if type(other) != Node:
            return False
        else:
            return self.position == other.position

    def __lt__(self, other):
        return self.k < other.k
def makenodes(grid):
    """Given all the endpoints, make nodes for them"""
    endpoints = getendpoints(grid)
    nodes = [Node(None, pos) for pos in endpoints]
    t = tqdm(nodes, desc='Making Nodes')  # Just creates a progress bar
    for node in t:
        t.refresh()
        node.neighbours = getneighbours(node, grid, nodes)
    return nodes
def getneighbours(current_node, grid, spots):
    '''Given the current node, link it to its neighbours'''
    neighbours = []
    for new_position in [(0, -1), (0, 1), (-1, 0), (1, 0)]:  # Adjacent nodes
        # Get node position
        node_position = (int(current_node.position[0] + new_position[0]),
                         int(current_node.position[1] + new_position[1]))
        # Make sure it is within range
        if node_position[0] > (len(grid) - 1) or node_position[0] < 0 \
           or node_position[1] > (len(grid[len(grid) - 1]) - 1) or node_position[1] < 0:
            continue
        # Look up the existing node at this position (uses Node.__eq__)
        new_node = spots[spots.index(Node(None, node_position))]
        neighbours.append(new_node)
    return neighbours
def getendpoints(grid):
    """returns all locations on the grid"""
    x, y = np.where(grid == 0)
    endpoints = [(x, y) for (x, y) in zip(x, y)]
    return endpoints
"""Actually Making The Nodes Goes as Follows"""
grid = np.zeros((100,100))
spots = makenodes(grid)
sys.setrecursionlimit(40000)
grid = np.zeros((100,100))
spots = makenodes(grid)
with open('100x100 Nodes Init 4 Connected', 'wb') as f:
pickle.dump(spots, f)
print('Done 100x100')
The program runs well; however, this node-making process takes 3 minutes on 100x100 but over a week on 1000x1000. To counter this, I use pickle.dump() to save all the nodes; then, when I run my program, I can use spots = pickle.load(...). On grid = np.zeros((40,50)) I have to set the recursion limit to around 16000; on 100x100 I increase the recursion limit to 40000, but the kernel dies.
To help you visualise what the nodes are and how they are connected to their neighbours, refer to the image I drew, where black: node and white: connection.
If I change getneighbours() to return tuples of (x, y) coordinates (instead of node objects), then I am able to pickle the nodes, as they're no longer recursive. This approach slows down the rest of the program, because each time I want to refer to a node I have to reconstruct it and search for it in the list spots. (I have simplified this part of the explanation, as the question is already very long.)
How can I save these node objects while maintaining their neighbours as nodes for larger grids?
TLDR: I have a node class where each node holds references to its neighbours, making the structure highly recursive. I can save the nodes using pickle for grid = np.zeros((40,50)) by setting a fairly high recursion limit of 16000. For larger grids I hit the maximum system recursion depth when calling pickle.dump.
Once again, a TLDR is included (just so you can check whether this will work for you). I found a workaround which preserves the time-saving benefits of pickling but allows the nodes to be pickled even at larger sizes. Tests are included below to demonstrate this.
I changed the structure of each Node() in makenodes() that is to be pickled by stripping it of its neighbours, so that there is less recursion. With the method mentioned in the question, a grid = np.zeros((40,50)) would require sys.setrecursionlimit(16000) or so. With this approach, Python's built-in recursion limit of 1000 is never hit, and the intended structure can be reconstructed upon loading the pickled file.
Essentially when pickling, the following procedure is followed.
Make a list of all nodes with their corresponding indexes, so that:
In [1]: nodes
Out [1]: [[0, <__main__.Node at 0x7fbfbebf60f0>],...]
Do the same for the neighbours such that:
In [2]: neighbours
Out [2]: [[0, [<__main__.Node at 0x7fbfbebf6f60>, <__main__.Node at 0x7fbfbec068d0>]],...]
The above is an example of what both nodes and neighbours look like for any size of maze. '...' indicates that the list continues for as many elements as it contains; note also that len(nodes) should equal len(neighbours). The index is not strictly necessary, but I have included it as an additional check.
The changes are as follows:
def makenodes(grid):
    """Given all the endpoints, make nodes for them"""
    endpoints = getendpoints(grid)
    spots = [Node(None, pos) for pos in endpoints]
    neighbours = []
    nodes = []
    t = tqdm(spots, desc='Making Nodes')
    i = 0
    for node in t:
        t.refresh()
        neighbour = getneighbours(node, grid, spots)
        nodes.append([i, node])
        neighbours.append([i, neighbour])
        i += 1
    return nodes, neighbours
"""Actually Making the Nodes"""
grid = np.zeros((100,100))
nodes, neighbours = makenodes(grid)
arr = np.array([nodes,neighbours])
with open('100x100 Nodes Init 4 Connected.pickle', 'wb') as f:
pickle.dump(arr,f)
print('Done 100x100')
def reconstruct(filename):
    '''Reconstruct the relationships by re-assigning neighbours'''
    with open(filename, 'rb') as file:
        f = pickle.load(file)
    nodes = f[0]
    neighbours = f[1]
    reconstructed = []
    for node in nodes:
        i = node[0]
        for neighbour in neighbours[i][1]:
            node[1].neighbours.append(neighbour)
        reconstructed.append(node[1])
    return reconstructed
nodes_in = reconstruct('100x100 Nodes Init 4 Connected.pickle')
This achieves the desired output and can re-establish the neighbour/children relationships when reloading (reconstructing). Additionally, the objects in each node's neighbours list point directly to their corresponding node, such that if you run this on just the pickled array without reconstructing it:
In [3]: nodes2[1][1] is neighbours2[0][1][0]
Out[3]: True
Time Differences
Mean time to create and save a 50x50 grid, averaged over 60 runs:
Original Way: 9.613s
Adapted Way (with reconstruction): 9.822s
Mean time to load:
Original Way: 0.0582s
Adapted Way: 0.04071s
Both File Sizes: 410KB
Conclusion
The adapted way can handle larger sizes and, judging by the times, has the same performance. Both methods create files of the same size because pickle only stores each object once; see the pickle documentation for this.
Please note that the choice of a 50x50 grid is to prevent the original method from hitting recursion limits. The maximum size I have been able to pickle with the adapted way is 500x500. I haven't gone larger simply because the node-making process took almost 2 days at that size on my machine. However, loading a 500x500 grid takes less than 1 second, so the benefits are even more apparent.
TLDR: I managed to save the nodes and their corresponding neighbours as an array np.array([nodes, neighbours]), so that when pickling, the data needed to re-establish the relationships on un-pickling is available. In essence, the necessary data is computed and saved in a way that avoids raising the recursion limit, and the relationships can be reconstructed quickly when loading.
I'm currently working on getting the number of unique paths of maximum length from node 1..N in a weighted directed acyclic graph. I have worked out how to get the max length, but I am stuck on getting the NUMBER of paths of that given max length.
Data is inputted like this:
91 120 # Number of nodes, number of edges
1 2 34
1 3 15
2 4 10
....
which reads as: Node 1 -> Node 2 with a weight of 34, and so on.
I input my data using a dictionary, so my dict looks like:
_distance = {1: [(2, 34), (3, 15)], 2: [(4, 10)], 3: [(4, 17)], 4: [(5, 36), (6, 22)], 5: [(7, 8)], ...}
I have worked out how to compute the longest path length using the following. First I make a list of vertices:
class Vertice:
    def __init__(self, name, weight=0, visited=False):
        self._n = name
        self._w = weight
        self._visited = visited
        self.pathsTo = 0  # number of maximum-length paths reaching this vertex

_nodes = []
for i in range(numberOfNodes):  # List of vertices (0..n-1)
    _V = Vertice(i)
    _nodes.append(_V)
Next I iterate through my dictionary, setting each node's weight to the maximum it can be:
for vert, neighbors in _distance.iteritems():
    _vert = _nodes[vert-1]      # Vertice array starts at 0, so n-1
    for x, y in neighbors:      # x = neighbour, y = weight of the edge
        _v = _nodes[x-1]        # Node #1 will be array[0]
        if _v._visited == True:
            if _v._w < _vert._w + y:
                _v._w = _vert._w + y
        else:
            _v._w = y + _vert._w
            _v._visited = True
With this done, the last node will carry the maximum weight, so I can just call
max = _nodes[-1]._w
to get the max weight. This seems to perform fast and has no trouble finding the max-length path even on the bigger data set. I then pass the max value into this function:
# Start from the first node in the dictionary; distances is our dict
# target is the last node in the list of nodes, i.e. the total number of nodes
numLongestPaths(currentLocation=1, target=_numNodes, distances=_distance, maxlength=max)
def numLongestPaths(currentLocation, maxlength, target, sum=0, distances={}):
    _count = 0
    if currentLocation == target:
        if sum == maxlength:
            _count += 1
    else:
        for vert, weight in distances[currentLocation]:
            newSum = sum + weight
            _count += numLongestPaths(vert, maxlength, target, newSum, distances)
    return _count
I simply check, once we have hit the end node, whether our current sum is the max; if it is, I add one to our count, otherwise I pass.
This works instantly for inputs such as 8 nodes with a longest path of 20 (finding 3 paths), and for inputs such as 100 nodes with a longest length of 149 and only 1 unique path of that length. But when I try a data set with 91 nodes, a longest path of 1338 and 32 unique paths of that length, the function takes extremely LONG; it works, but is very slow.
Can someone give me some tips on what is wrong with my function that causes it to take so long finding the number of paths of length X from 1..N? I'm assuming it has exponential run time, but I'm unsure how to fix it.
Thank you for your help!
EDIT: Okay, I was overthinking this and going about it the wrong way. I restructured my approach and my code is now as follows:
# BEGIN SEARCH.
for vert, neighbors in _distance.iteritems():
    _vert = _nodes[vert-1]      # Vertice array starts at 0, so n-1
    for x, y in neighbors:      # x = neighbour, y = weight of the edge
        _v = _nodes[x-1]        # Node #1 will be array[0]
        if _v._visited == True:
            if _v._w > _vert._w + y:
                pass            # existing path is longer; keep it
            elif _v._w == _vert._w + y:
                _v.pathsTo += _vert.pathsTo     # tie: accumulate path counts
            else:
                _v.pathsTo = _vert.pathsTo
                _v._w = y + _vert._w
        else:
            _v._w = y + _vert._w
            _v.pathsTo = max(_vert.pathsTo, _v.pathsTo + 1)
            _v._visited = True
I added a pathsTo variable to my Vertice class, which holds the number of unique paths of MAX length.
Your numLongestPaths is slow because you're recursively trying every possible path, and there can be exponentially many of those. Find a way to avoid computing numLongestPaths for any node more than once.
Also, your original _w computation is broken: when it computes a node's _w value, it does nothing to ensure the other _w values it relies on have themselves been computed. You will need to avoid using uninitialized values; a topological sort may be useful, although it sounds like the vertex labels may already have been assigned in topological-sort order.
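For example, here is a minimal sketch of that memoization, assuming the same {node: [(neighbour, weight), ...]} dict from the question; the names count_longest_paths and best are mine:

def count_longest_paths(distances, source, target):
    """Return (longest path length, number of paths of that length)."""
    memo = {}
    def best(node):
        # (max length from node to target, count of max-length paths)
        if node == target:
            return (0, 1)
        if node in memo:
            return memo[node]
        best_len, count = None, 0
        for neighbour, weight in distances.get(node, []):
            sub_len, sub_count = best(neighbour)
            if sub_count == 0:
                continue  # neighbour cannot reach the target at all
            total = weight + sub_len
            if best_len is None or total > best_len:
                best_len, count = total, sub_count
            elif total == best_len:
                count += sub_count
        memo[node] = (best_len, count) if count else (0, 0)
        return memo[node]
    return best(source)

length, num_paths = count_longest_paths(_distance, 1, _numNodes)

Each node is now solved exactly once, so the run time is linear in the number of edges instead of exponential in the number of paths.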
In addition to user2357112's answer, here are two additional recommendations.
Language
If you want this code to be as efficient as possible, I recommend using C. Python is a great scripting language, but really slow compared to compiled alternatives.
Data-structure
Nodes are named in an ordered fashion, so you can optimize your code a lot by using a list instead of a dictionary, i.e.
_distance = [[] for i in range(_length)]
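For example, a sketch of filling that list from the edge input, assuming nodes are renumbered from 1-based to 0-based (the sample edges are taken from the question's data format):

_length = 91  # number of nodes
_distance = [[] for i in range(_length)]

# Each input line "u v w" becomes an edge u -> v with weight w.
for u, v, w in [(1, 2, 34), (1, 3, 15), (2, 4, 10)]:
    _distance[u - 1].append((v - 1, w))

# The neighbours of node 1 are now _distance[0], an O(1) list lookup.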
I have a tree as shown below.
Red means it has a certain property; unfilled means it doesn't have it. I want to minimise the Red checks.
If a node is Red, then all of its ancestors are also Red (and should not be checked again).
If a node is not Red, then all of its descendants are also not Red.
The depth of the tree is d.
The width of the tree is n.
Note that children nodes have value larger than the parent.
Example: In the tree below,
Node '0' has children [1, 2, 3],
Node '1' has children [2, 3],
Node '2' has children [3] and
Node '4' has children [] (No children).
Thus children can be constructed as:
if vertex.depth > 0:
    vertex.children = [Vertex(parent=vertex, val=child_val, depth=vertex.depth - 1, n=n)
                       for child_val in xrange(vertex.val + 1, n)]
else:
    vertex.children = []
Here is an example tree:
I am trying to count the number of Red nodes. Both the depth and the width of the tree will be large, so I want to do a sort of depth-first search and additionally use properties 1 and 2 from above.
How can I design an algorithm to traverse that tree?
PS: I tagged this [python] but any outline of an algorithm would do.
Update & Background
I want to minimise the property checks.
The property check is checking the connectedness of a bipartite graph constructed from my tree's path.
Example:
The bottom-left node in the example tree has path = [0, 1].
Let the bipartite graph have sets R and C with size r and c. (Note, that the width of the tree is n=r*c).
From the path I get the edges of the graph by starting with a full graph and removing the edge (x, y) for each value in the path, as such: x, y = divmod(value, c).
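In code, that construction would look something like this sketch (remaining_edges is my own name for it):

from itertools import product

def remaining_edges(path, r, c):
    # Edges of the complete bipartite graph K_{r,c} that survive after
    # removing the edge divmod(value, c) for every value in the path.
    removed = set(divmod(value, c) for value in path)
    return [(x, y) for x, y in product(range(r), range(c)) if (x, y) not in removed]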
The two rules for the property check come from the connectedness of the graph:
- If the graph is connected with edges [a, b, c] removed, then it must also be connected with [a, b] removed (rule 1).
- If the graph is disconnected with edges [a, b, c] removed, then it must also be disconnected with additional edge d removed [a, b, c, d] (rule 2).
Update 2
So what I really want to do is check all combinations of picking d elements out of [0..n]. The tree structure somewhat helps, but even with an optimal tree traversal algorithm I would still be checking too many combinations. (I only noticed that just now.)
Let me explain. Assume I checked [4, 5] (so 4 and 5 are removed from the bipartite graph as explained above, though that's irrelevant here). If this comes out as "Red", my tree will prevent me from checking [4] only. That is good. However, I should also mark off [5] from checking.
How can I change the structure of my tree (to a graph, maybe?) to further minimise my number of checks?
Use a variant of the deletion–contraction algorithm for evaluating the Tutte polynomial (evaluated at (1,2), it gives the total number of connected spanning subgraphs) on the complete bipartite graph K_{r,c}.
In a sentence, the idea is to order the edges arbitrarily, enumerate spanning trees, and count, for each spanning tree, how many spanning subgraphs of size r + c + k have that minimum spanning tree. The enumeration of spanning trees is performed recursively. If the graph G has exactly one vertex, the number of associated spanning subgraphs is the number of self-loops on that vertex choose k. Otherwise, find the minimum edge e that isn't a self-loop in G and make two recursive calls: the first on the graph G/e where e is contracted, the second on the graph G-e where e is deleted, but only if G-e is connected.
Python is close enough to pseudocode.
class counter(object):
    def __init__(self, ival=0):
        self.count = ival

    def count_up(self):
        self.count += 1
        return self.count
def old_walk_fun(ilist, func=None):
    def old_walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            for q in ilist:
                tlist += old_walk_fun_helper(q, func, count + 1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return old_walk_fun_helper(ilist, func)
    else:
        return []
def walk_fun(ilist, func=None):
    def walk_fun_helper(ilist, func=None, count=0):
        tlist = []
        if(isinstance(ilist, list) and ilist):
            if(ilist[0] == "Red"):  # Only evaluate sub-branches if current level is Red
                for q in ilist:
                    tlist += walk_fun_helper(q, func, count + 1)
        else:
            tlist = func(ilist)
        return [tlist] if(count != 0) else tlist
    if(func != None and hasattr(func, '__call__')):
        return walk_fun_helper(ilist, func)
    else:
        return []
# Crude tree structure, first element is always its colour; following elements are its children
tree_list = \
["Red",
    ["Red",
        ["Red",
            []
        ],
        ["White",
            []
        ],
        ["White",
            []
        ]
    ],
    ["White",
        ["White",
            []
        ],
        ["White",
            []
        ]
    ],
    ["Red",
        []
    ]
]
red_counter = counter()
eval_counter = counter()
old_walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Unconditionally walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
red_counter = counter()
eval_counter = counter()
walk_fun(tree_list, lambda x: (red_counter.count_up(), eval_counter.count_up()) if(x == "Red") else eval_counter.count_up())
print "Selectively walking"
print "Reds found: %d" % red_counter.count
print "Evaluations made: %d" % eval_counter.count
print ""
How hard are you working on making the test for connectedness fast?
To test a graph for connectedness, I would pick edges in a random order and use union-find to merge vertices when I see an edge that connects them. I could terminate early once the graph became connected, and I would have a sort of certificate of connectedness: the edges which connected two previously unconnected sets of vertices.
As you work down the tree / follow a path on the bipartite graph, you are removing edges from the graph. If the edge you remove is not in the certificate of connectedness, then the graph must still be connected; this looks like a quick check to me. If it is in the certificate, you could back up to the state of union-find just before that edge was added and then try adding new edges, rather than repeating the complete connectedness test.
Depending on exactly how you define a path, you may be able to say that extensions of that path will never include edges using a subset of vertices - such as vertices which are in the interior of the path so far. If edges originating from those untouchable vertices are sufficient to make the graph connected, then no extension of the path can ever make it unconnected. Then at the very least you just have to count the number of distinct paths. If the original graph is regular I would hope to find some dynamic programming recursion that lets you count them without explicitly enumerating them.
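Here is a minimal sketch of that union-find test with the certificate, under my own naming (vertices are 0..num_vertices-1, edges are (u, v) pairs):

def connected_with_certificate(num_vertices, edges):
    # Union-find connectivity test that also returns the 'certificate':
    # the edges that actually merged two previously separate components.
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    certificate = []
    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            certificate.append((u, v))
            components -= 1
            if components == 1:  # early exit: already connected
                break
    return components == 1, certificate

If an edge you then delete is not in the returned certificate, the graph is guaranteed to still be connected, so no new test is needed.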
I'm working on a front-end for a robotics project (an 'autonomous' car that localizes itself using some sensors and a map generated from an SVG file).
For the robot to be controllable, we must generate paths between its current position and its goal. I used the simplest algorithm for that: A*.
I got some weird results doing that: the car tends to move in multiples of 45°, and there is one especially annoying problem: some generated paths are quite noisy!
See the noisy path near the orange rectangle in this case:
Is there any way to avoid those weird/noisy results? Eventually we'd want to build a path with the minimum number of heading changes (the car can turn without moving, so we don't need any path 'smoothing').
Here's my A* implementation:
def search(self, begin, goal):
    if goal.x not in range(self.width) or goal.y not in range(self.height):
        print "Goal is out of bound"
        return []
    elif not self.grid[begin.y][begin.x].reachable:
        print "Beginning is unreachable"
        return []
    elif not self.grid[goal.y][goal.x].reachable:
        print "Goal is unreachable"
        return []
    else:
        self.cl = set()
        self.ol = set()
        curCell = begin
        self.ol.add(curCell)
        while len(self.ol) > 0:
            # We choose the cell in the open list having the minimum score as our current cell
            curCell = min(self.ol, key=lambda x: x.f)
            # We add the current cell to the closed list
            self.ol.remove(curCell)
            self.cl.add(curCell)
            # We check the cell's (reachable) neighbours:
            neighbours = self.neighbours(curCell)
            for cell in neighbours:
                # If the goal is a neighbour cell:
                if cell == goal:
                    cell.parent = curCell
                    self.path = cell.path()
                    self.display()
                    self.clear()
                    return self.path
                elif cell not in self.cl:
                    # We process the cells that are not in the closed list
                    # (processing <-> calculating the "F" score)
                    cell.process(curCell, goal)
                    self.ol.add(cell)
EDIT 1: By popular demand, here's the score calculation function (process):
def process(self, parent, goal):
    self.parent = parent
    self.g = parent.distance(self)
    self.h = self.manhattanDistance(goal)
    self.f = self.g + self.h
EDIT 2: Here's the neighbours method (updated following user1884905's answer):
def neighbours(self, cell, radius=1, unreachables=False, diagonal=True):
    neighbours = set()
    for i in xrange(-radius, radius + 1):
        for j in xrange(-radius, radius + 1):
            x = cell.x + j
            y = cell.y + i
            if 0 <= y < self.height and 0 <= x < self.width \
               and (self.grid[y][x].reachable or unreachables) \
               and (diagonal or (x == cell.x or y == cell.y)):
                neighbours.add(self.grid[y][x])
    return neighbours
(This looks complicated, but it just gives the 8 neighbours of a cell, including the diagonal ones; it can also take a radius different from 1 because it's used for other features.)
And the distance calculations (depending on whether diagonal neighbours are used or not):
def manhattanDistance(self, cell):
    return abs(self.x - cell.x) + abs(self.y - cell.y)

def diagonalDistance(self, cell):
    xDist = abs(self.x - cell.x)
    yDist = abs(self.y - cell.y)
    if xDist > yDist:
        return 1.4 * yDist + (xDist - yDist)
    else:
        return 1.4 * xDist + (yDist - xDist)
It seems the implementation is not correct, because it always moves to the unexamined cell that is nearest (as the crow flies) to the goal, whereas it should be able to try a cell and back out of that path when it hits an obstacle, in order to find the optimal route. See this nice animation on Wikipedia to get the idea.
The issue here is related to how you calculate cell.f: you may not be adding the score of the current cell when doing the calculation. In general, A* should take the steps marked in red here; yours generates suboptimal paths like that instead.
Since the space is divided into discrete cells, when the best path in the continuous world (always as the crow flies) lies right between two discrete moves, the algorithm approximates it as best it can, producing that weird path.
I see two approaches here:
Fix the algorithm (here is the pseudocode), keeping the correct distance value for each evaluated cell (in the pasted code there's no information about how cell.f is calculated).
Use Dijkstra; it should be easy to implement with a few changes to the current algorithm (see the sketch below). In fact, A* is just an optimized version of it.
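A minimal sketch of that Dijkstra variant, with the grid cells and the question's neighbours/distance helpers passed in as plain functions (the structuring is mine); it is essentially A* with the heuristic term dropped:

import heapq

def dijkstra(begin, goal, neighbours, dist_between):
    dist = {begin: 0.0}
    parent = {}
    heap = [(0.0, id(begin), begin)]  # id() breaks ties so cells never get compared
    while heap:
        d, _, cell = heapq.heappop(heap)
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry, already improved
        if cell == goal:
            path = [cell]
            while cell in parent:  # walk the parent chain back to begin
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        for nxt in neighbours(cell):
            nd = d + dist_between(cell, nxt)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = cell
                heapq.heappush(heap, (nd, id(nxt), nxt))
    return []  # goal unreachable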
Without being able to see how you have implemented your neighbour and distance functions, I still have a good guess about what is going wrong:
You should not use Manhattan distance if you allow diagonal traversal.
The Manhattan distance in the goal function should be a measure of the shortest distance to the goal (which it isn't, if you can drive diagonally through building blocks).
The easiest way to fix this would be to keep the Manhattan distance as the goal function and change the definition of neighbours to only include the four left-right-up-down adjacent cells, as in the snippet below.
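Given the neighbours method shown in the question (which already accepts a diagonal flag), that change is just a matter of the call site, e.g.:

# In search(): ask only for the four axis-aligned neighbours,
# which matches the Manhattan heuristic.
neighbours = self.neighbours(curCell, diagonal=False)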
Edit
There are still problems with your code. The following pseudocode is taken from Wikipedia. I have marked the important lines that you will have to check. You must ensure that (i) you update the nodes in the open set if you find a better solution and (ii) you always take into account the previously travelled distance.
function A*(start, goal)
    closedset := the empty set    // The set of nodes already evaluated.
    openset := {start}            // The set of tentative nodes to be evaluated, initially containing the start node
    came_from := the empty map    // The map of navigated nodes.

    g_score[start] := 0           // Cost from start along best known path.
    // Estimated total cost from start to goal through y.
    f_score[start] := g_score[start] + heuristic_cost_estimate(start, goal)

    while openset is not empty
        current := the node in openset having the lowest f_score[] value
        if current = goal
            return reconstruct_path(came_from, goal)

        remove current from openset
        add current to closedset
        for each neighbor in neighbor_nodes(current)
            // -------------------------------------------------------------------
            // This is the way the tentative_g_score should be calculated.
            // Do you include the current g_score in your calculation parent.distance(self)?
            tentative_g_score := g_score[current] + dist_between(current, neighbor)
            // -------------------------------------------------------------------
            if neighbor in closedset
                if tentative_g_score >= g_score[neighbor]
                    continue
            // -------------------------------------------------------------------
            // You never make this comparison
            if neighbor not in openset or tentative_g_score < g_score[neighbor]
            // -------------------------------------------------------------------
                came_from[neighbor] := current
                g_score[neighbor] := tentative_g_score
                f_score[neighbor] := g_score[neighbor] + heuristic_cost_estimate(neighbor, goal)
                if neighbor not in openset
                    add neighbor to openset
    return failure

function reconstruct_path(came_from, current_node)
    if current_node in came_from
        p := reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return current_node
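As a rough translation of the two marked fixes into the question's cell-based code (a sketch only; it assumes each cell carries g, f and parent attributes, with g starting at infinity):

# Inside the main loop of search(), after popping curCell:
for cell in self.neighbours(curCell):
    if cell in self.cl:
        continue
    # (i) always include the path cost accumulated so far
    tentative_g = curCell.g + curCell.distance(cell)
    # (ii) update a cell already in the open list if a better path is found
    if cell not in self.ol or tentative_g < cell.g:
        cell.parent = curCell
        cell.g = tentative_g
        cell.f = tentative_g + cell.manhattanDistance(goal)
        self.ol.add(cell)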