Python - speed up pathfinding - python

This is my pathfinding function:
def get_distance(x1,y1,x2,y2):
neighbors = [(-1,0),(1,0),(0,-1),(0,1)]
old_nodes = [(square_pos[x1,y1],0)]
new_nodes = []
for i in range(50):
for node in old_nodes:
if node[0].x == x2 and node[0].y == y2:
return node[1]
for neighbor in neighbors:
try:
square = square_pos[node[0].x+neighbor[0],node[0].y+neighbor[1]]
if square.lightcycle == None:
new_nodes.append((square,node[1]))
except KeyError:
pass
old_nodes = []
old_nodes = list(new_nodes)
new_nodes = []
nodes = []
return 50
The problem is that the AI takes to long to respond( response time <= 100ms)
This is just a python way of doing https://en.wikipedia.org/wiki/Pathfinding#Sample_algorithm

You should replace your algorithm with A*-search with the Manhattan distance as a heuristic.

One reasonably fast solution is to implement the Dijkstra algorithm (that I have already implemented in that question):
Build the original map. It's a masked array where the walker cannot walk on masked element:
%pylab inline
map_size = (20,20)
MAP = np.ma.masked_array(np.zeros(map_size), np.random.choice([0,1], size=map_size))
matshow(MAP)
Below is the Dijkstra algorithm:
def dijkstra(V):
mask = V.mask
visit_mask = mask.copy() # mask visited cells
m = numpy.ones_like(V) * numpy.inf
connectivity = [(i,j) for i in [-1, 0, 1] for j in [-1, 0, 1] if (not (i == j == 0))]
cc = unravel_index(V.argmin(), m.shape) # current_cell
m[cc] = 0
P = {} # dictionary of predecessors
#while (~visit_mask).sum() > 0:
for _ in range(V.size):
#print cc
neighbors = [tuple(e) for e in asarray(cc) - connectivity
if e[0] > 0 and e[1] > 0 and e[0] < V.shape[0] and e[1] < V.shape[1]]
neighbors = [ e for e in neighbors if not visit_mask[e] ]
tentative_distance = [(V[e]-V[cc])**2 for e in neighbors]
for i,e in enumerate(neighbors):
d = tentative_distance[i] + m[cc]
if d < m[e]:
m[e] = d
P[e] = cc
visit_mask[cc] = True
m_mask = ma.masked_array(m, visit_mask)
cc = unravel_index(m_mask.argmin(), m.shape)
return m, P
def shortestPath(start, end, P):
Path = []
step = end
while 1:
Path.append(step)
if step == start: break
if P.has_key(step):
step = P[step]
else:
break
Path.reverse()
return asarray(Path)
And the result:
start = (2,8)
stop = (17,19)
D, P = dijkstra(MAP)
path = shortestPath(start, stop, P)
imshow(MAP, interpolation='nearest')
plot(path[:,1], path[:,0], 'ro-', linewidth=2.5)
Below some timing statistics:
%timeit dijkstra(MAP)
#10 loops, best of 3: 32.6 ms per loop

The biggest issue with your code is that you don't do anything to avoid the same coordinates being visited multiple times. This means that the number of nodes you visit is guaranteed to grow exponentially, since it can keep going back and forth over the first few nodes many times.
The best way to avoid duplication is to maintain a set of the coordinates we've added to the queue (though if your node values are hashable, you might be able to add them directly to the set instead of coordinate tuples). Since we're doing a breadth-first search, we'll always reach a given coordinate by (one of) the shortest path(s), so we never need to worry about finding a better route later on.
Try something like this:
def get_distance(x1,y1,x2,y2):
neighbors = [(-1,0),(1,0),(0,-1),(0,1)]
nodes = [(square_pos[x1,y1],0)]
seen = set([(x1, y1)])
for node, path_length in nodes:
if path_length == 50:
break
if node.x == x2 and node.y == y2:
return path_length
for nx, ny in neighbors:
try:
square = square_pos[node.x + nx, node.y + ny]
if square.lightcycle == None and (square.x, square.y) not in seen:
nodes.append((square, path_length + 1))
seen.add((square.x, square.y))
except KeyError:
pass
return 50
I've also simplified the loop a bit. Rather than switching out the list after each depth, you can just use one loop and add to its end as you're iterating over the earlier values. I still abort if a path hasn't been found with fewer than 50 steps (using the distance stored in the 2-tuple, rather than the number of passes of the outer loop). A further improvement might be to use a collections.dequeue for the queue, since you could efficiently pop from one end while appending to the other end. It probably won't make a huge difference, but might avoid a little bit of memory usage.
I also avoided most of the indexing by one and zero in favor of unpacking into separate variable names in the for loops. I think this is much easier to read, and it avoids confusion since the two different kinds of 2-tuples had had different meanings (one is a node, distance tuple, the other is x, y).

Related

Is the given Graph a tree? Faster than below approach -

I was given a question during an interview and although my answer was accepted at the end they wanted a faster approach and I went blank..
Question :
Given an undirected graph, can you see if it's a tree? If so, return true and false otherwise.
A tree:
A - B
|
C - D
not a tree:
A
/ \
B - C
/
D
You'll be given two parameters: n for number of nodes, and a multidimensional array of edges like such: [[1, 2], [2, 3]], each pair representing the vertices connected by the edge.
Note:Expected space complexity : O(|V|)
The array edges can be empty
Here is My code: 105ms
def is_graph_tree(n, edges):
nodes = [None] * (n + 1)
for i in range(1, n+1):
nodes[i] = i
for i in range(len(edges)):
start_edge = edges[i][0]
dest_edge = edges[i][1]
if nodes[start_edge] != start_edge:
start_edge = nodes[start_edge]
if nodes[dest_edge] != dest_edge:
dest_edge = nodes[dest_edge]
if start_edge == dest_edge:
return False
nodes[start_edge] = dest_edge
return len(edges) <= n - 1
Here's one approach using a disjoint-set-union / union-find data structure:
def is_graph_tree(n, edges):
parent = list(range(n+1))
size = [1] * (n + 1)
for x, y in edges:
# find x (path splitting)
while parent[x] != x:
x, parent[x] = parent[x], parent[parent[x]]
# find y
while parent[y] != y:
y, parent[y] = parent[y], parent[parent[y]]
if x == y:
# Already connected
return False
# Union (by size)
if size[x] < size[y]:
x, y = y, x
parent[y] = x
size[x] += size[y]
return True
assert not is_graph_tree(4, [(1, 2), (2, 3), (3, 4), (4, 2)])
assert is_graph_tree(6, [(1, 2), (2, 3), (3, 4), (3, 5), (1, 6)])
The runtime is O(V + E*InverseAckermannFunction(V)), which better than O(V + E * log(log V)), so it's basically O(V + E).
Tim Roberts has posted a candidate solution, but this will work in the case of disconnected subtrees:
import queue
def is_graph_tree(n, edges):
# A tree with n nodes has n - 1 edges.
if len(edges) != n - 1:
return False
# Construct graph.
graph = [[] for _ in range(n)]
for first_vertex, second_vertex in edges:
graph[first_vertex].append(second_vertex)
graph[second_vertex].append(first_vertex)
# BFS to find edges that create cycles.
# The graph is undirected, so we can root the tree wherever we want.
visited = set()
q = queue.Queue()
q.put((0, None))
while not q.empty():
current_node, previous_node = q.get()
if current_node in visited:
return False
visited.add(current_node)
for neighbor in graph[current_node]:
if neighbor != previous_node:
q.put((neighbor, current_node))
# Only return true if the graph has only one connected component.
return len(visited) == n
This runs in O(n + len(edges)) time.
You could approach this from the perspective of tree leaves. Every leaf node in a tree will have exactly one edge connected to it. So, if you count the number of edges for each nodes, you can get the list of leaves (i.e. the ones with only one edge).
Then, take the linked node from these leaves and reduce their edge count by one (as if you were removing all the leaves from the tree. That will give you a new set of leaves corresponding to the parents of the original leaves. Repeat the process until you have no more leaves.
[EDIT] checking that the number of edges is N-1 eliminiates the need to do the multi-root check because there will be another discrepancy (e.g. double link, missing node) in the graph if there are multiple 'roots' or a disconnected subtree
If the graph is a tree, this process should eliminate all nodes from the node counts (i.e. they will all be flagged as leaves at some point).
Using the Counter class (from collections) will make this relatively easy to implement:
from collections import Counter
def isTree(N,E):
if N==1 and not E: return True # root only is a tree
if len(E) != N-1: return False # a tree has N-1 edges
counts = Counter(n for ab in E for n in ab) # edge counts per node
if len(counts) != N : return False # unlinked nodes
while True:
leaves = {n for n,c in counts.items() if c==1} # new leaves
if not leaves:break
for a,b in E: # subtract leaf counts
if counts[a]>1 and b in leaves: counts[a] -= 1
if counts[b]>1 and a in leaves: counts[b] -= 1
for n in leaves: counts[n] = -1 # flag leaves in counts
return all(c==-1 for c in counts.values()) # all must become leaves
output:
G = [[1,2],[1,3],[4,5],[4,6]]
print(isTree(6,G)) # False (disconnected sub-tree)
G = [[1,2],[1,3],[1,4],[2,3],[5,6]]
print(isTree(6,G)) # False (doubly linked node 3)
G = [[1,2],[2,6],[3,4],[5,1],[2,3]]
print(isTree(6,G)) # True
G = [[1,2],[2,3]]
print(isTree(3,G)) # True
G = [[1,2],[2,3],[3,4]]
print(isTree(4,G)) # True
G = [[1,2],[1,3],[2,5],[2,4]]
print(isTree(6,G)) # False (missing node)
Space complexity is O(N) because the counts dictionary has one entry per node(vertex) with an integer as value. Time complexity will be O(ExL) where E is the number of edges and L is the number of levels in the tree. The worts case time is O(E^2) for a tree where all parents have only one child node. However, since the initial condition is for E to be less than V, the worst case will actually be O(V^2)
Note that this algorithm makes no assumption on edge order or numerical relationships between node numbers. The root (last node to be made a leaf) found by this algorithm is not necessarily the only possible root given that, unless the nodes have an implicit cardinality relationship (or edges have an order), there could be ambiguous scenarios:
[1,2],[2,3],[2,4] could be:
1 2 3
|_2 OR |_1 OR |_2
|_3 |_3 |_1
|_4 |_4 |_4
If a cardinality relationship between node numbers or an order of edges can be relied upon, the algorithm could potentially be made more time efficient (because we could easily determine which node is the root and start from there).
[EDIT2] Alternative method using groups.
When the number of edges is N-1, if the graph is a tree, all nodes should be reachable from any other node. This means that, if we form groups of reachable nodes for each node and merge them together based on the edges, we should end up with a single group after going through all the edges.
Here is the modified function based on that approach:
def isTree(N,E):
if N==1 and not E: return True # root only is a tree
if len(E) != N-1: return False # a tree has N-1 edges
groups = {n:[n] for ab in E for n in ab} # each node in its own group
if len(groups) != N : return False # unlinked nodes
for a,b in E:
groups[a].extend(groups[b]) # merge groups
for n in groups[b]: groups[n] = groups[a] # update nodes' groups
return len(set(map(id,groups.values()))) == 1 # only one group when done
Given that we start out with fewer edges than nodes and that group merging will consume at most 2x a group size (so also < N), the space complexity will remain O(V). The time complexity will also be O(V^2) at for the worts case scenarios
You don't even need to know how many edges there are:
def is_graph_tree(n, edges):
seen = set()
for a,b in edges:
b = max(a,b)
if b in seen:
return False
seen.add(b)
return True
a = [[1,2],[2,3],[3,4]]
print(is_graph_tree(0,a))
b = [[1,2],[1,3],[2,3],[2,4]]
print(is_graph_tree(0,b))
Now, this WON'T catch the case of disconnected subtrees, but that wasn't in the problem description...

Get a set of maximum number of dissimilar arrays

I have an array A of length n, each element of this array(say Wi) is an array is an array of length 10. There is a function, match_check(Wi, Wj) defined as :
def match_check(Wi, Wj):
n = len(Wi)
num_matches =0
for i in range(n):
if (round(Wi[i],4)== round(Wj[i]),4):
num_matches +=1
if (num_matches >= 3):
return True
else :
False
I want to get set of maximum number of elements from this array A, such that for no two elements in this set match_check is True. I have thought of this as a DP problem and written the following solution.
def maximum_arrays(start,end ,curr_items=[], match_dict={}, lookup_dict={}):
key = str(start) + "|" + str(end)
if (lookup_dict.get(key)):
return lookup_dict[key]
if (start == end ):
for items in curr_items:
match_key = str(start)+ ":" + str(items)
if(match_dict[match_key]):
lookup_dict[key] = len(curr_items)
return lookup_dict[key]
lookup_dict[key] = 1 + len(curr_items)
return lookup_dict[key]
match_flag = False
for items in curr_items:
match_key = str(start)+":" + str(items)
if (match_dict.get(match_key)):
match_flag = True
break
if (match_flag):
lookup_dict[key] = maximum_arrays(start+1,end, curr_items,match_dict, lookup_dict)
else:
curr_items_new = curr_items + [start]
lookup_dict[key] = max(1 + maximum_arrays(start+1,end, curr_items_new,match_dict, lookup_dict),
maximum_arrays(start+1,end, curr_items,match_dict, lookup_dict))
return lookup_dict[key]
Where match_dict is contains the result of match_check for all possible pairs of indexes from the array A. But I doubt that dynamic programming would help here and the solution would be O(2^n), since we have to evaluate for all possible cases(keeping and dropping each element in the set).
A simple algorithm which takes O(n^2) would be to first build an adjacency matrix for these arrays by simply applying match_check to every couple of arrays. An edge will be added iff the function match_check returned False.
Then, the problem reduces to finding the maximum clique within the graph and returning its size, a thing which can be done in O(n^2).
Here is a simple demo:
import networkx as nx
import numpy as np
def match_check(Wi, Wj):
n = len(Wi)
num_matches =0
for i in range(n):
if round(Wi[i],4) == round(Wj[i],4):
num_matches +=1
if (num_matches >= 3):
return True
else :
return False
check_arr = [list(10*np.random.rand(5)) for k in range(10)]
n = len(check_arr)
graph_adjacency_mat = np.zeros((n,n))
for i in range(n):
for j in range(n):
if i==j:
continue
graph_adjacency_mat[i][j] = not match_check(check_arr[i],check_arr[j])
graph_adjacency_mat[j][i] = graph_adjacency_mat[i][j]
G=nx.from_numpy_matrix(graph_adjacency_mat)
print(max([len(clique) for clique in nx.find_cliques(G)]))
Note that here I've used the find_cliques function from NetworkX which is NOT O(n^2) (but O(3^(n/3))) because the function max_clique of NetworkX seems to be discarded. You can easily implement max_clique by applying BFS/DFS on the graph starting from every vertex and saving the maximum clique found thus far.

Optimizing grid connections with python

I have the following situation:
(1) I have a large grid. By some conditions I want to further observe specific points/cells in this grid. Each cell has an ID and coordinates X, Y seperately. So in this case lets observe one cell only - marked C on the image, that is located on the edge of the grid. By some formula I can get all the neighbouring cells of the first order (marked 1 on the image) and the second order (marked 2 on the image).
(2) With a further condition I identify some cells in the neighbouring cells and are marked in orange on the second image. What I want to do is to connect all orange cells with each other by optimizing the distances and takih into account only min() distances. My first attempt was to observe cells only by calculating the distances to cells of the lower order. So when looking at cells in neighbours cells 2, i'm looking at the cells in 1 only. The solution of connections is presented on image 2, but it's not optimal, since the ideal solution would compare the distances of all cells and not only of the cells of the lower neighbour order. By doing this, i'm getting the situation presented on image 3. And the problem is that the cells are of course not connected to the centre. What to do?
The current code is:
CO - list of centre points.
data - df all all ID's with X,Y values
CO_list = CO['ID'].tolist()
neighbor100 = []
for p in IskanjeCO_list:
d = get_neighbors100k2(p, len(data)) #function that finds the ID's of neighbours of the first order
neighbor100.append(d)
neighbor200 = []
for p in IskanjeCO_list:
d = get_neighbors200k2(p, len(data)) #function that finds the ID's of neighbours of the second order
neighbor200.append(d)
flat100 = []
for i in neighbor100:
for j in i:
flat100.append(j)
flat200 = []
for i in neighbor200:
for j in i:
flat200.append(j)
neighbors100 = flat100
neighbors200 = flat200
data_sosedi100 = data.iloc[flat100,].reset_index(drop=True)
data_sosedi200 = data.iloc[flat200,].reset_index(drop=True)
dist200 = []
for b in flat200:
d = ((pd.DataFrame((data_sosedi100['X']* - data.iloc[b,]['X'])**2
+ (data_sosedi100['Y'] - data.iloc[b,]['Y'])**2 )**0.5)).sum(1)
dist200.append(d.min())
data_sosedi200['dist'] = dist200
data_sosedi200['id'] = None
for e in CO_list:
data_sosedi200.loc[data_sosedi200['FID_2'].isin((get_neighbors200k2(e, len(data)))),'id'] = e
Do you have any suggestion how to optimize this a bit further? I hope i presented the whole image. If needed, I'll clarify further. If you see a part of the code, where i'd be able to furher optimize this loop, i'd be very grateful!
I defined the points manually to work with:
import numpy as np
from operator import itemgetter, attrgetter
nodes = [[-2,1], [-2,0], [-1,0], [0,0], [1,1], [2,1], [2,0], [1,2], [2,2]]
center = [0,0]
def find_neighbor(node):
n=[]
for i in range(-1,2):
for j in range(-1,2):
if not (i ==0 and j ==0):
n.append([node[0]+i,node[1]+j])
return [N for N in n if N in nodes]
def distance_to_center(node):
return np.sqrt(node[0]**2+node[1]**2)
def distance_between_two_nodes(node1, node2):
return np.sqrt((node1[0]-node2[0])**2+(node1[1]-node2[1])**2)
def next_node_closest_to_center(node):
min = distance_to_center(node)
next_node = node
for n in find_neighbor(node):
if distance_to_center(n) < min:
min = distance_to_center(n)
next_node = n
return next_node, min
def get_path_to_center(node):
node_path = [node]
distance = 0.
while node!= center:
new_node = next_node_closest_to_center(node)[0]
distance += distance_between_two_nodes(node, new_node)
node_path.append(new_node)
node=new_node
return node_path,distance
def furthest_nodes_from_center(nodes):
max = 0.
for n in nodes:
if get_path_to_center(n)[1] > max:
furthest_nodes_pathwise = []
max = get_path_to_center(n)[1]
furthest_nodes_pathwise.append(n)
elif get_path_to_center(n)[1] == max:
furthest_nodes_pathwise.append(n)
return furthest_nodes_pathwise
def farthest_node_from_center(nodes):
max = 0.
farthest_node = center
for n in nodes:
if distance_to_center(n) > max:
max = distance_to_center(n)
farthest_node = n
return farthest_node
def closest_node_to_center(nodes):
min = distance_to_center(farthest_node_from_center(nodes))
for n in nodes:
if distance_to_center(n) < min:
min = distance_to_center(n)
closest_node = n
return closest_node
def closest_node_center_with_furthest_distance(node_selection):
if len(node_selection) == 1:
return node_selection[0]
else:
return closest_node_to_center(node_selection)
print(closest_node_center_with_furthest_distance(furthest_nodes_from_center(nodes)))
Output:
[2, 0]
[Finished in 0.266s]
By running on all nodes I can now determine that the furthest node away path-wise but still closest to the center distance wise is [2,0] and not [2,2]. So we start from there. To find the one on the other side just split the data like I said into negative x values and positive. if you run it over a list of only the negative x value cells you will get [-2,1]
Now that you have your 2 starting cells [2,0] and [-2,1] I will leave you to figure out the algorithm to navigate to the center passing by all cells using the steps in my comments (you can now skip step 1 because this is the answer posted)

Finding the optimal location for router placement

I am looking for an optimization algorithm that takes a text file encoded with 0s, 1s, and -1s:
1's denoting target cells that requires Wi-Fi coverage
0's denoting cells that are walls
1's denoting cells that are void (do not require Wi-Fi coverage)
Example of text file:
I have created a solution function along with other helper functions, but I can't seem to get the optimal positions of the routers to be placed to ensure proper coverage. There is another file that does the printing, I am struggling with finding the optimal location. I basically need to change the get_random_position function to get the optimal one, but I am unsure how to do that. The area covered by the various routers are:
This is the kind of output I am getting:
Each router covers a square area of at most (2S+1)^2
Type 1: S=5; Cost=180
Type 2: S=9; Cost=360
Type 3: S=15; Cost=480
My code is as follows:
import numpy as np
import time
from random import randint
def is_taken(taken, i, j):
for coords in taken:
if coords[0] == i and coords[1] == j:
return True
return False
def get_random_position(floor, taken , nrows, ncols):
i = randint(0, nrows-1)
j = randint(0, ncols-1)
while floor[i][j] == 0 or floor[i][j] == -1 or is_taken(taken, i, j):
i = randint(0, nrows-1)
j = randint(0, ncols-1)
return (i, j)
def solution(floor):
start_time = time.time()
router_types = [1,2,3]
nrows, ncols = floor.shape
ratio = 0.1
router_scale = int(nrows*ncols*0.0001)
if router_scale == 0:
router_scale = 1
row_ratio = int(nrows*ratio)
col_ratio = int(ncols*ratio)
print('Row : ',nrows, ', Col: ', ncols, ', Router scale :', router_scale)
global_best = [0, ([],[],[])]
taken = []
while True:
found_better = False
best = [global_best[0], (list(global_best[1][0]), list(global_best[1][1]), list(global_best[1][2]))]
for times in range(0, row_ratio+col_ratio):
if time.time() - start_time > 27.0:
print('Time ran out! Using what I got : ', time.time() - start_time)
return global_best[1]
fit = []
for rtype in router_types:
interim = (list(global_best[1][0]), list(global_best[1][1]), list(global_best[1][2]))
for i in range(0, router_scale):
pos = get_random_position(floor, taken, nrows, ncols)
interim[0].append(pos[0])
interim[1].append(pos[1])
interim[2].append(rtype)
fit.append((fitness(floor, interim), interim))
highest_fitness = fit[0]
for index in range(1, len(fit)):
if fit[index][0] > highest_fitness[0]:
highest_fitness = fit[index]
if highest_fitness[0] > best[0]:
best[0] = highest_fitness[0]
best[1] = (highest_fitness[1][0],highest_fitness[1][1], highest_fitness[1][2])
found_better = True
global_best = best
taken.append((best[1][0][-1],best[1][1][-1]))
break
if found_better == False:
break
print('Best:')
print(global_best)
end_time = time.time()
run_time = end_time - start_time
print("Run Time:", run_time)
return global_best[1]
def available_cells(floor):
available = 0
for i in range(0, len(floor)):
for j in range(0, len(floor[i])):
if floor[i][j] != 0:
available += 1
return available
def fitness(building, args):
render = np.array(building, dtype=int, copy=True)
cov_factor = 220
cost_factor = 22
router_types = { # type: [coverage, cost]
1: {'size' : 5, 'cost' : 180},
2: {'size' : 9, 'cost' : 360},
3: {'size' : 15, 'cost' : 480},
}
routers_used = args[-1]
for r, c, t in zip(*args):
size = router_types[t]['size']
nrows, ncols = render.shape
rows = range(max(0, r-size), min(nrows, r+size+1))
cols = range(max(0, c-size), min(ncols, c+size+1))
walls = []
for ri in rows:
for ci in cols:
if building[ri, ci] == 0:
walls.append((ri, ci))
def blocked(ri, ci):
for w in walls:
if min(r, ri) <= w[0] and max(r, ri) >= w[0]:
if min(c, ci) <= w[1] and max(c, ci) >= w[1]:
return True
return False
for ri in rows:
for ci in cols:
if blocked(ri, ci):
continue
if render[ri, ci] == 2:
render[ri, ci] = 4
if render[ri, ci] == 1:
render[ri, ci] = 2
render[r, c] = 5
return (
cov_factor * np.sum(render > 1) -
cost_factor * np.sum([router_types[x]['cost'] for x in routers_used])
)
Here's a suggestion on how to solve the problem; however I don't affirm this is the best approach, and it's certainly not the only one.
Main idea
Your problem can be modelised as a weighted minimum set cover problem.
Good news, this is a well known optimization problem:
It is easy to find algorithm descriptions for approximate solutions
A quick search on the web shows many implementations of approximation algorithms in Python.
Bad news, this is a NP-hard optimization problem:
If you need an exact solution: algorithms will work only for "small" sized problems in a reasonable amount of time(in your case: size of the problem <=> number of "1" cells).
Approximate (a.k.a greedy) algorithms are trade-off between computation requirements, and a risk do deliver far from optimal solutions in certain cases.
Note that the following part does not prove that your problem is NP-hard. The general minimum set cover problem is NP-hard. In your case the subsets have several properties that might help to design a better algorithm. I have no idea how though.
Translating into a cover set problem
Let's define some sets:
U: the set of "1" cells (requiring Wifi).
P(U): the power set of U (the set of subsets of U).
P: the set of cells on which you can place a router (not sure if P=U in your original post).
T: the set of router type (3 values in your case).
R+: positive Real number (used to describe prices).
Let's define a function (pseudo Python):
# Domain of definition : T,P --> R+,P(U)
# This function takes a router type and a position, and returns
# a tuple containing:
# - the price of a router of the given type.
# - the subset of U containing all the position covered by a router
# of the given type placed at the given position.
def weighted_subset(routerType, position):
pass # TODO: implementation
Now, we define a last set, as the image of the function we've just described: S=weighted_subset(T,P). Each element of this set is a subset of U, weighted by a price in R+.
With all this formalism, finding the router types & positions that:
gives coverage to all the desirable locations
minimize the cost
Is equivalent to finding a sub-collection of S:
whose union of their P(U) is equal to U
which minimise the sum of the associated weights
Which is the weighted minimal set cover problem.

backtracking not trying all possibilities

so I've got a list of questions as a dictionary, e.g
{"Question1": 3, "Question2": 5 ... }
That means the "Question1" has 3 points, the second one has 5, etc.
I'm trying to create all subset of question that have between a certain number of questions and points.
I've tried something like
questions = {"Q1":1, "Q2":2, "Q3": 1, "Q4" : 3, "Q5" : 1, "Q6" : 2}
u = 3 #
v = 5 # between u and v questions
x = 5 #
y = 10 #between x and y points
solution = []
n = 0
def main(n_):
global n
n = n_
global solution
solution = []
finalSolution = []
for x in questions.keys():
solution.append("_")
finalSolution.extend(Backtracking(0))
return finalSolution
def Backtracking(k):
finalSolution = []
for c in questions.keys():
solution[k] = c
print ("candidate: ", solution)
if not reject(k):
print ("not rejected: ", solution)
if accept(k):
finalSolution.append(list(solution))
else:
finalSolution.extend(Backtracking(k+1))
return finalSolution
def reject(k):
if solution[k] in solution: #if the question already exists
return True
if k > v: #too many questions
return True
points = 0
for x in solution:
if x in questions.keys():
points = points + questions[x]
if points > y: #too many points
return True
return False
def accept(k):
points = 0
for x in solution:
if x in questions.keys():
points = points + questions[x]
if points in range (x, y+1) and k in range (u, v+1):
return True
return False
print(main(len(questions.keys())))
but it's not trying all possibilities, only putting all the questions on the first index..
I have no idea what I'm doing wrong.
There are three problems with your code.
The first issue is that the first check in your reject function is always True. You can fix that in a variety of ways (you commented that you're now using solution.count(solution[k]) != 1).
The second issue is that your accept function uses the variable name x for what it intends to be two different things (a question from solution in the for loop and the global x that is the minimum number of points). That doesn't work, and you'll get a TypeError when trying to pass it to range. A simple fix is to rename the loop variable (I suggest q since it's a key into questions). Checking if a value is in a range is also a bit awkward. It's usually much nicer to use chained comparisons: if x <= points <= y and u <= k <= v
The third issue is that you're not backtracking at all. The backtracking step needs to reset the global solution list to the same state it had before Backtracking was called. You can do this at the end of the function, just before you return, using solution[k] = "_" (you commented that you've added this line, but I think you put it in the wrong place).
Anyway, here's a fixed version of your functions:
def Backtracking(k):
finalSolution = []
for c in questions.keys():
solution[k] = c
print ("candidate: ", solution)
if not reject(k):
print ("not rejected: ", solution)
if accept(k):
finalSolution.append(list(solution))
else:
finalSolution.extend(Backtracking(k+1))
solution[k] = "_" # backtracking step here!
return finalSolution
def reject(k):
if solution.count(solution[k]) != 1: # fix this condition
return True
if k > v:
return True
points = 0
for q in solution:
if q in questions:
points = points + questions[q]
if points > y: #too many points
return True
return False
def accept(k):
points = 0
for q in solution: # change this loop variable (also done above, for symmetry)
if q in questions:
points = points + questions[q]
if x <= points <= y and u <= k <= v: # chained comparisons are much nicer than range
return True
return False
There are still things that could probably be improved in there. I think having solution be a fixed-size global list with dummy values is especially unpythonic (a dynamically growing list that you pass as an argument would be much more natural). I'd also suggest using sum to add up the points rather than using an explicit loop of your own.

Categories

Resources