I have a graph of this type
I have to estimate the probability of ending in a node of a given color (red, for example). This is the chance of ending in one node of that color and not ending in any other node of that color in the graph. For example, the probability of ending in the upper red node is 0.5*(1-(0.4*0.8)): the product of the chance of ending directly in the upper red node (0.5) and the chance of not ending in the lower red node (1-0.4*0.8).
So the total probability of ending in a red node is 0.5 *(1-(0.4 *0.8)) + 0.4 *0.8 *(1-0.5).
How can I formulate an algorithm solving this problem?
I wrote an algorithm that ignores the chance of not ending in any other node of the same color in the tree (so its total probability of ending in a red node was simply 0.5 + 0.4*0.8), but I'm having trouble extending it to this problem.
The simpler code I mentioned is this one:
    def algorithm(self, startingNode, nodeToFind):
        valueToReturn = 0
        nodesToVisit = startingNode.nodesConnected
        for j in nodesToVisit:
            if j == nodeToFind:
                probabilityToVisit = graph.search_edge(startingNode, j)
                valueToReturn += probabilityToVisit
            else:
                valueToReturn += self.algorithm(j, nodeToFind)
        return valueToReturn
The graph is a sort of tree where each node has at most two children, and no successor of a node X can have the same color as X. Although two red nodes share almost all of their properties, they differ in their children, because each red node has different children depending on the path traversed to reach it.
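One way to express the rule described above in code, as a rough sketch rather than a definitive answer: first collect, for every node of the target color, the probability of reaching it (the product of the edge probabilities along its path from the root), then sum p_i * prod(1 - p_j for j != i) over those nodes. The dictionary-based tree, the node names, and the functions `reach_probabilities` and `prob_exactly_one` are hypothetical and only mirror the two-red-node example.

```python
from math import prod

# Hypothetical tree matching the example: root -> A (red) with 0.5,
# root -> B (blue) with 0.4, B -> C (red) with 0.8.
children = {
    "root": [("A", 0.5), ("B", 0.4)],
    "A": [],
    "B": [("C", 0.8)],
    "C": [],
}
color = {"root": "white", "A": "red", "B": "blue", "C": "red"}


def reach_probabilities(node, target_color, prob=1.0):
    """Return the reach probability of every target_color node below node."""
    found = []
    for child, edge_prob in children[node]:
        p = prob * edge_prob
        if color[child] == target_color:
            # No successor of a red node is red, so stop descending here.
            found.append(p)
        else:
            found.extend(reach_probabilities(child, target_color, p))
    return found


def prob_exactly_one(reach_probs):
    """Sum over i of p_i times the product of (1 - p_j) for all j != i."""
    total = 0.0
    for i, p in enumerate(reach_probs):
        others = prod(1.0 - q for j, q in enumerate(reach_probs) if j != i)
        total += p * others
    return total


red = reach_probabilities("root", "red")
# 0.5*(1 - 0.4*0.8) + 0.4*0.8*(1 - 0.5), which is about 0.5
print(prob_exactly_one(red))
```

The traversal visits each node once, and the final sum is quadratic in the number of nodes of the target color, which should be fine for small trees.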
I need to compute an edge cover of a weighted bipartite graph which I have built in Networkx. Based on this answer, I have two algorithms that respectively return a minimum weight edge cover and a minimum cardinality (and weight) one. The minimum weight algorithm presents some odd behaviour in the choice of edges, which may be related to an error that happens in the minimum cardinality algorithm, so I'll explain both situations below.
Here are a few details about the graphs being considered:
My current test case has about 1200 nodes on one side and 1600 on the other, with over a million edges
All nodes have at least one incident edge
The graph is typically disconnected in a few blocks
The problem is built as an undirected graph, but directed edges would also make sense (they would always be from the set with bipartite==_og_id to the other)
Minimum weight algorithm
This algorithm seems to always pick the vv' edges (i.e., the edges between a node in the original graph and its copy in the larger graph). I thought this was because some edges had a weight of 0 (causing the vv' edge to also have a weight of 0), but adding a minimum weight when building the graph did not change this behaviour (I use 0.1, since the minimum nonzero weight in the graph should be 1). This effectively reduces the algorithm to "for each node, pick the edge that has the smallest weight", which is suboptimal.
Code:
    import math

    import networkx as nx

    def _min_weight_edge_cover(g: nx.Graph):
        """Returns an edge cover that minimizes the total weight of included edges, but not the total number of edges"""
        clone = g.copy()
        for node, bi in g.nodes(data='bipartite'):
            nd = f"{node}_copy"
            clone.nodes[node]['copy'] = False
            clone.add_node(nd, copy=True, bipartite=(_og_id if bi == _tg_id else _tg_id))  # invert the bipartite flag
            minw = min([w for u, v, w in g.edges(node, data='weight')])
            clone.add_edge(node, nd, weight=(2 * minw))
        # Now clone contains both the nodes of g and their copies, and should still be bipartite
        tops = {n for n, d in clone.nodes(data=True) if d['bipartite'] == _og_id}
        bots = set(clone) - tops
        print(f"[cover] we have {len(tops)} tops and {len(bots)} bots")
        # Here the matching should always exist and be perfect
        matching = nx.bipartite.minimum_weight_full_matching(clone, tops)
        cover = g.copy()
        cover.clear_edges()
        keys = {k for k in matching.keys() if clone.nodes[k]['copy'] is False}
        for k in keys:
            v = matching[k]
            if g.has_edge(k, v):
                # We never get here
                cover.add_edge(k, v)
            else:
                # v was a copy - this is always true
                assert clone.nodes[v]['copy']
                minw = math.inf
                mine = None
                # FIXME should check that we don't add edges between nodes that are already covered
                for u, va, w in g.edges(k, data='weight'):
                    if w < minw:
                        minw = w
                        mine = (u, va)
                cover.add_edge(*mine)
        return cover
Minimum cardinality (and weight)
This algorithm is much simpler (start with a matching, then add the cheapest edge of each node not included in the matching). However, the nx.bipartite.minimum_weight_full_matching function fails with the error "cost matrix is infeasible", raised in scipy.optimize.linear_sum_assignment. Unfortunately, there are no details on what makes the cost matrix infeasible. The documentation states that the function takes into account the different number of nodes in the sets, and I've made sure that all nodes have at least one edge. networkx.min_weight_matching does work, but it's much, much slower than the bipartite version.
Code:
    def _min_cardinality_weight_edge_cover(g: nx.Graph) -> nx.Graph:
        """Returns an edge cover that minimizes
        1. the number of edges included;
        2. the total weight of all edges included
        """
        # get the minimum weight matching.
        # By definition, it will have at most one edge per node but some node may end up unmatched
        matching = nx.bipartite.minimum_weight_full_matching(
            g, top_nodes={n for n, b in g.nodes(data='bipartite') if b == _og_id})
        # to make it into a cover, we take all edges from the matching and, for each node not matched, add its cheapest edge
        cover = nx.Graph()
        cover.add_edges_from(matching.items())
        missing = set(g.nodes) - set(cover.nodes)
        # there shouldn't be a case where two missing nodes could connect to each other or else that edge would have been
        # included in the matching
        for node in missing:
            minw = math.inf
            mine = None
            for u, v, w in g.edges(node, data='weight'):
                if w < minw:
                    minw = w
                    mine = (u, v)
            cover.add_edge(*mine)
        return cover
Any ideas as to what could be causing these issues?
Consider the code below. Suppose the graph in question has N nodes with at most D neighbors for each node, and D+1 colors are available for coloring the nodes such that no two nodes connected with an edge have the same color assigned to them. I reckon the complexity of the code below is O(N*D) because for each of the N nodes we loop through the at most D neighbors of that node to populate the set illegal_colors, and then iterate through colors list that comprises D+1 colors. But the complexity given is O(N+M) where M is the number of edges. What am I doing wrong here?
    def color_graph(graph, colors):
        for node in graph:
            if node in node.neighbors:
                raise Exception('Legal coloring impossible for node with loop: %s' %
                                node.label)
            # Get the node's neighbors' colors, as a set so we
            # can check if a color is illegal in constant time
            illegal_colors = set([
                neighbor.color
                for neighbor in node.neighbors
                if neighbor.color
            ])
            # Assign the first legal color
            for color in colors:
                if color not in illegal_colors:
                    node.color = color
                    break
The number of edges M, the maximum degree D and the number of nodes N satisfy the inequality:
M <= N * D / 2.
Therefore O(N+M) is included in O(N*(D+1)).
In your algorithm, you loop over every neighbour of every node. The exact complexity of that is not N*D, but d1 + d2 + d3 + ... + dN where di is the degree of node i. This sum is equal to 2*M, which is at most N*D but might be less.
Therefore the complexity of your algorithm is O(N+M). Hence it is also O(N*(D+1)). Note that O(N*(D+1)) = O(N*D) under the assumption D >= 1.
Saying your algorithm runs in O(N+M) is slightly more precise than saying it runs in O(N*D). If most nodes have a lot fewer than D neighbours, then M+N might be much smaller than N*D.
Also note that O(M+N) = O(M) under the assumption that every node has at least one neighbour.
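The degree-sum argument can be checked on a small example. The sketch below uses a hypothetical star graph stored as a plain adjacency dictionary (not the `node.neighbors` objects from the question) and counts how often the inner loop over neighbours runs, comparing that count with 2*M and with N*D.

```python
def count_neighbor_visits(adjacency):
    """Count how many times the inner loop over neighbours runs in total."""
    visits = 0
    for node, neighbors in adjacency.items():
        for _ in neighbors:
            visits += 1
    return visits


# Hypothetical star graph: one hub (node 0) connected to five leaves.
adjacency = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}

n = len(adjacency)                                  # N = 6 nodes
m = sum(len(v) for v in adjacency.values()) // 2    # M = 5 edges
d = max(len(v) for v in adjacency.values())         # D = 5 (degree of the hub)
visits = count_neighbor_visits(adjacency)

print(visits, 2 * m, n * d)  # 10 10 30
```

Here the loop body runs 2*M = 10 times, well below the N*D = 30 bound, because only the hub actually has D neighbours.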
The problem I have been attempting to solve is path finding from a given position to a given goal for a Dubins car (no backwards motion, constant velocity) with obstacles. I've attempted to implement a gridless A* algorithm with some simple obstacle avoidance. I expected the generated path to head straight towards the goal and make minor adjustments to drive around the obstacles that it found. However, as soon as obstacles are introduced to the map, the path instead seems to get stuck at local minimum points of the algorithm's cost function.
The cost function I've implemented is the following:
f(x) = c(x) + g(x)
where c(x) is total travel cost, namely the cumulative cost of moving from node i-1 to i.
Also, g(x) is the cost of the optimal path from the current node to the end goal, which becomes a straight line as it ignores obstacles.
The cost is used as the priority value in a min heap: each iteration pops the minimum-cost node and generates its child nodes. As the children are generated, they are checked to make sure that they are not out of bounds, have not already been visited, and are not inside an obstacle. A child that passes these checks is added to the heap.
I've attempted introducing a weighting factor k * g(x) to the path cost, hoping that this would "incentivize" the algorithm to move towards the goal instead of getting stuck at a point. However, this merely shifted the minimum point to another location, but still resulted in getting stuck.
I will include my code implementation of the A* algorithm below:
    import heapq as h
    import math as m

    # Description: Pathfinding algorithm, iteratively generates new neighbouring
    # nodes and selects the cheapest of these through utilizing a min heap.
    # In: Car class object, a node as starting point.
    # Out: The finishing node, with attached parent pointers.
    def Astar(car, current):
        minHeap = []  # initialize heap as list
        h.heappush(minHeap, current)  # push initial node onto heap
        heapCount = 1  # add upon pushes to heap, subtract upon pop
        # Iterate through nodes in priority queue
        while not ((goal(car, current)) or heapCount == 0):
            current = h.heappop(minHeap)
            heapCount -= 1
            for phi in [-m.pi/4, 0, m.pi/4]:  # Full turns or straight are optimal, according to Pontryagin's maximum principle
                # calculate new values for each phi (steering angle)
                xn, yn, thetan = step(car, current.x, current.y, current.theta, phi)
                # control feasibility of position
                if validCheck(car, xn, yn, current):
                    # calculate costs for these directives
                    costC = current.travelled + m.hypot(current.x - xn, current.y - yn)  # cost of travel from start position
                    costG = m.hypot(car.xt - xn, car.yt - yn)  # current optimal distance to goal
                    totalCost = costC + costG
                    # renew time stamp
                    newTime = current.time + 0.01
                    # create child from new data
                    child = Node(xn, yn, thetan, phi, totalCost, costC, newTime, current)
                    # push child onto heap
                    h.heappush(minHeap, child)
                    heapCount += 1
        return current
Note that car is a class which includes certain attributes:
x0 : float: initial x-position [m]
y0 : float: initial y-position [m]
xt : float: target x-position [m]
yt : float: target y-position [m]
xlb : float: minimum x-position [m]
xub : float: maximum x-position [m]
ylb : float: minimum y-position [m]
yub : float: maximum y-position [m]
obs : list: list of tuples for each obstacle obs[i]
It also includes a method step which can generate a new heading angle and position when given a steering angle and previous heading and position.
Any advice or help regarding this problem, why it is occurring and what I can do to improve the path finding would very much be appreciated.
I don't have a solution ready, but here is an explanation of what's going on and maybe a hint about what you can do.
Analysis
The A* algorithm is for graph searching and, given a decent cost function, can greatly reduce the search space when compared with uninformed strategies like BFS. But still, the size of the problem graph matters.
In your code, I see a time increment of 0.01, and I read that as a hint that you are taking very small steps from parent to child nodes. That surely makes sense, as it most closely approximates a smooth, non-quantized movement. But at the same time, it results in a huge growth of your problem graph.
Without obstacles, A* will still handle that huge graph gracefully. It will postpone all deviations from the straight line, as their cost will be higher than the node on the straight line. Your heap will grow (have some debug output show you its size...), but most nodes will never be explored further.
With obstacles, the game changes drastically. Let's say, there's an obstacle so that the resulting best path is 1.00 units longer than the straight line. Then A* will explore all nonsense paths, starting from somewhere on the line from start to obstacle, arbitrarily turning left or right until these paths reach an additional length of 1.00. There will be lots of these useless paths, and A* gets stuck in exploring nonsense.
Suggestion
I'd have the A* operate on a higher level.
I guess your obstacles are polygons. So the resulting total path will either ignore an obstacle or touch it at one of its corners. The elements between the touching points will start at a touching point with some heading direction, consist of an initial full-turn part, then a straight part, and then a final full-turn part, and then arrive at the next touching point with some (different) heading (to be honest, I'm not absolutely sure that this turn-straight-turn pattern will really cover all possible situations). Given start and end points and the desired end heading of such an element, you can compute the parts using some geometry, and by the way, check for collisions.
You can't know in advance the optimum heading when passing some touching point, so you'd have to check all possible headings.
I'd take these elements as the steps to be explored by A*.
So, that's how I'd apply A* to your task:
To compute the children of a node, for all other polygon corners and all headings at that corner, compute the element from the parent corner to the other corner, resulting in the given heading there. Check if such an element is geometrically possible and does not collide with some obstacle.
As the cost function, accumulate the length travelled so far, and then add the length of the shortest obstacle-ignoring path to the target. This can either be the straight Pythagorean distance, or a more elaborate distance that takes into account the necessary initial turn from the current heading to facing the target.
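The simpler of the two cost variants can be sketched in a few lines. The function name `priority` and the tuple-based positions are assumptions made for illustration; the more elaborate heading-aware distance is not shown.

```python
import math


def priority(travelled, position, target):
    """A* priority: length travelled so far plus straight-line distance to the target."""
    (x, y), (xt, yt) = position, target
    return travelled + math.hypot(xt - x, yt - y)


# A node 2.0 units into its path, sitting at the origin, with the target at (3, 4).
print(priority(2.0, (0.0, 0.0), (3.0, 4.0)))  # 7.0
```

Because the straight line ignores both obstacles and turning limits, it never overestimates the remaining path length, which is the property A* needs from its heuristic to return a shortest path.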
I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionary where each key is a node and each value is the associated weight (between 10 and 200 000 or so).
    weights = {node: weight}
I believe I do not need to normalize the weights with networkx.
At the moment, I create a non-weighted graph by adding the edges:
    def create_graph(data):
        edges = create_edges(data)
        # Create the graph
        G = nx.Graph()
        # Add edges
        G.add_edges_from(edges)
        return G
From what I read, I can add a weight to the edge. However, I would prefer the weight to be applied to a specific node instead of an edge. How can I do that?
Idea: I create the graph by adding the nodes weighted, and then I add the edges between the nodes.
    def create_graph(data, weights):
        nodes = create_nodes(data)
        edges = create_edges(data)  # list of tuples
        # Create the graph
        G = nx.Graph()
        # Add nodes, each with its weight as a node attribute
        for node in nodes:
            G.add_node(node, weight=weights[node])
        # Add edges
        G.add_edges_from(edges)
        return G
Is this approach correct?
The next step is to find the path between two nodes with the smallest total weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path, which I think does the right thing. However, it uses weights on the edges instead of weights on the nodes. Could someone explain what this function does, what the difference between weights on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)
This generally looks right.
You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).
To handle the edge vs node weight issue, there are two options. First note that you are after the sum of the node weights along the path. If I give each edge a weight w(u,v) = w(u) + w(v), then the sum of the edge weights along a path is w(source) + w(target) + 2 sum(w(v)), where the nodes v are all the intermediate nodes along the way. Since w(source) and w(target) are the same for every path between the two endpoints, whatever path has the minimum weight with these edge weights also has the minimum weight with the node weights.
So you could go and assign each edge the weight to be the sum of the two nodes.
    for edge in G.edges():
        G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']
But an alternative is to note that the weight input into bidirectional_dijkstra can be a function. networkx calls it with three positional arguments: the two endpoints of the edge and the edge's attribute dictionary. Define your own function to give the sum of the two node weights:
    def f(u, v, d):
        return G.nodes[u]['weight'] + G.nodes[v]['weight']
and then in your call do bidirectional_dijkstra(G, source, target, weight=f)
So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.
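To illustrate why the edge weight w(u) + w(v) recovers the node-weight optimum, here is a small library-independent sketch. It uses a plain one-directional Dijkstra on an adjacency dictionary rather than networkx's bidirectional_dijkstra, and the graph, the weights and the function name `cheapest_node_weight_path` are made up for the example.

```python
import heapq


def cheapest_node_weight_path(adjacency, node_weight, source, target):
    """Dijkstra where traversing edge (u, v) costs node_weight[u] + node_weight[v].

    Returns (path, sum of node weights on the path), or None if the target
    cannot be reached from the source.
    """
    best = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best[u]:
            continue  # stale heap entry, a cheaper route to u was found already
        for v in adjacency[u]:
            new_cost = cost + node_weight[u] + node_weight[v]
            if v not in best or new_cost < best[v]:
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path, sum(node_weight[n] for n in path)


# Hypothetical graph: two routes from "s" to "t", via "a" (heavy) or "b" (light).
adjacency = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
node_weight = {"s": 10, "a": 200, "b": 50, "t": 10}

print(cheapest_node_weight_path(adjacency, node_weight, "s", "t"))
# (['s', 'b', 't'], 70)
```

On this graph the edge-weight total of the chosen path is 120 = 10 + 10 + 2*50, matching the w(source) + w(target) + 2 sum(w(v)) identity, while its node-weight total is 70.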
Some comments on bidirectional dijkstra: Imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The dijkstra algorithm (which is the default of shortest_path) will explore every point within distance R of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other. This has a volume (4/3) pi R^3. With bidirectional_dijkstra we inflate balloons centered at each point until they touch. They will each have radius R/2. So the volume is (4/3) pi (R/2)^3 + (4/3) pi (R/2)^3, which is a quarter of the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings are often much bigger.
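The same balloon argument can be repeated in d dimensions, where a ball of radius r has volume c_d r^d for a constant c_d that depends only on d. This is only the idealized geometric picture from the paragraph above, not a statement about any particular graph:

```latex
\frac{V_{\text{bidirectional}}}{V_{\text{one-sided}}}
  = \frac{2\, c_d \,(R/2)^d}{c_d \, R^d}
  = 2^{\,1-d},
\qquad d = 3:\ \tfrac{1}{4}, \quad d = 10:\ \tfrac{1}{512}.
```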
I'm trying to find the distance of a point (in 4 dimensions, only 2 are shown here) (any coloured crosses in the figure) to a supposed Pareto frontier (black line). This line represents the best Pareto frontier representation during an optimization process.
Pareto = [[0.3875575798354123, -2.4122340425531914], [0.37707675586149786, -2.398936170212766], [0.38176077842761763, -2.4069148936170213], [0.4080534133844003, -2.4914285714285715], [0.35963459448268725, -2.3631532329495126], [0.34395217638838566, -2.3579931972789114], [0.32203302106516224, -2.344858156028369], [0.36742404637441123, -2.3886054421768708], [0.40461156254852226, -2.4141156462585034], [0.36387868122767975, -2.375], [0.3393199109776927, -2.348404255319149]]
Right now, I calculate the distance from any point to the Pareto frontier like this:
    import numpy as np

    def dominates(row, rowCandidate):
        return all(r >= rc for r, rc in zip(row, rowCandidate))

    def dist2Pareto(pareto, candidate):
        listDist = []
        dominateN = 0
        dominatePoss = 0
        if len(pareto) >= 2:
            for i in pareto:
                if i != candidate:
                    dominatePoss += 1
                    dominate = dominates(candidate, i)
                    if dominate == True:
                        dominateN += 1
                    listDist.append(np.linalg.norm(np.array(i) - np.array(candidate)))
            listDist.sort()
            if dominateN == len(pareto):
                print("beyond")
                return listDist[0]
            else:
                return listDist[0]
Here I calculate the distance to each point of the black line and return the shortest one (the distance to the closest point of the known frontier).
However, I feel I should calculate the distance to the closest line segment instead. How would I go about achieving this?
The formula for the coordinates of the nearest point on the line is given here. Specifically, you are interested in the one called "line defined by two points". For posterity, the formula is:
Because the frontier is relatively simple, you can loop through each two-point line segment in the frontier, and calculate the closest distance for each, keeping the smallest. You could introduce other constraints / pre-computations to limit the number of calculations required.
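The loop over segments might look like the sketch below. It goes one step beyond the infinite-line formula by clamping the projection to the segment, so that points past an endpoint are measured to that endpoint. It works in any number of dimensions, assumes the frontier has at least two points, and assumes that sorting the points by their first coordinate yields the order in which the frontier is drawn. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in any number of dimensions."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        return math.dist(p, a)  # degenerate segment: a and b coincide
    # Parameter of the orthogonal projection of p onto the line through a and b.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_len2
    t = max(0.0, min(1.0, t))  # clamp so the closest point stays on the segment
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def distance_to_frontier(frontier, candidate):
    """Smallest distance from candidate to any segment between consecutive points."""
    pts = sorted(frontier)  # assumption: ordering by first coordinate traces the line
    return min(point_segment_distance(candidate, a, b)
               for a, b in zip(pts, pts[1:]))


frontier = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
print(distance_to_frontier(frontier, [0.5, 1.0]))  # 1.0
```

With 11 frontier points this is 10 segment checks per candidate, so no pre-computation should be needed at this size.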