Get a set of maximum number of dissimilar arrays - python

I have an array A of length n; each element of this array (say Wi) is itself an array of length 10. There is a function match_check(Wi, Wj) defined as:
def match_check(Wi, Wj):
    n = len(Wi)
    num_matches = 0
    for i in range(n):
        if round(Wi[i], 4) == round(Wj[i], 4):
            num_matches += 1
    if num_matches >= 3:
        return True
    else:
        return False
I want to get a maximum-size set of elements from this array A such that match_check is not True for any two elements in the set. I have thought of this as a DP problem and written the following solution.
def maximum_arrays(start, end, curr_items=[], match_dict={}, lookup_dict={}):
    key = str(start) + "|" + str(end)
    if lookup_dict.get(key):
        return lookup_dict[key]
    if start == end:
        for items in curr_items:
            match_key = str(start) + ":" + str(items)
            if match_dict[match_key]:
                lookup_dict[key] = len(curr_items)
                return lookup_dict[key]
        lookup_dict[key] = 1 + len(curr_items)
        return lookup_dict[key]
    match_flag = False
    for items in curr_items:
        match_key = str(start) + ":" + str(items)
        if match_dict.get(match_key):
            match_flag = True
            break
    if match_flag:
        lookup_dict[key] = maximum_arrays(start + 1, end, curr_items, match_dict, lookup_dict)
    else:
        curr_items_new = curr_items + [start]
        lookup_dict[key] = max(1 + maximum_arrays(start + 1, end, curr_items_new, match_dict, lookup_dict),
                               maximum_arrays(start + 1, end, curr_items, match_dict, lookup_dict))
    return lookup_dict[key]
Here match_dict contains the result of match_check for all possible pairs of indices from the array A. But I doubt that dynamic programming helps here, and the solution looks O(2^n), since we have to evaluate all possible cases (keeping and dropping each element of the set).

A simple first step, which takes O(n^2) calls to match_check, is to build an adjacency matrix for these arrays by applying match_check to every pair of arrays. An edge is added iff the function match_check returned False.
Then the problem reduces to finding the maximum clique in the resulting graph and returning its size. Note that maximum clique is NP-hard in general, so no polynomial-time algorithm is known for this step; for small inputs an exponential-time search is still practical.
Here is a simple demo:
import networkx as nx
import numpy as np

def match_check(Wi, Wj):
    n = len(Wi)
    num_matches = 0
    for i in range(n):
        if round(Wi[i], 4) == round(Wj[i], 4):
            num_matches += 1
    if num_matches >= 3:
        return True
    else:
        return False
check_arr = [list(10 * np.random.rand(5)) for k in range(10)]
n = len(check_arr)
graph_adjacency_mat = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        graph_adjacency_mat[i][j] = not match_check(check_arr[i], check_arr[j])
        graph_adjacency_mat[j][i] = graph_adjacency_mat[i][j]

G = nx.from_numpy_array(graph_adjacency_mat)  # nx.from_numpy_matrix in NetworkX < 3.0
print(max(len(clique) for clique in nx.find_cliques(G)))
Note that here I've used the find_cliques function from NetworkX, which is NOT O(n^2) (its worst case is O(3^(n/3))), because NetworkX's max_clique helper appears to have been deprecated. You can also implement a maximum-clique search yourself by recursively growing cliques from every vertex and keeping the largest one found so far; a minimal sketch follows.
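For illustration, here is a minimal sketch of such a search (a simplified Bron-Kerbosch enumeration without pivoting; exponential in the worst case, but fine at the scale of the demo above). The adjacency representation adj is my own assumption: a dict mapping each vertex to the set of its neighbours, e.g. adj = {v: set(G[v]) for v in G} for the NetworkX graph above.

def max_clique_size(adj):
    # adj: dict mapping each vertex to the set of its neighbours (assumed).
    best = 0
    def extend(clique, candidates):
        nonlocal best
        best = max(best, len(clique))
        while candidates:
            v = candidates.pop()
            # every vertex in candidates is adjacent to all of clique,
            # so clique | {v} is again a clique
            extend(clique | {v}, candidates & adj[v])
    extend(set(), set(adj))
    return best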

Related

Longest common substring with rolling hash

I am implementing, in Python 3, an algorithm to find the longest common substring of two strings s and t. Given s and t, I need to return (a, b, l), where l is the length of the longest common substring, a is the position in s where it starts, and b is the position in t where it starts. I have a working version of the algorithm, but it is quite slow and I am not sure why; it is frustrating because I have found other Python implementations using pretty much the same logic that are many times faster. I am self-taught, so any help would be greatly appreciated.
The approach is based on comparing hash values rather than directly comparing substrings and using binary search to find maximal length of common substrings. Here is the code for my hash function (m is a big prime and x is just some constant):
def polynomial_hash(my_string, m, x):
    # power_mod_p(x, i, m) is modular exponentiation, i.e. (x ** i) % m,
    # the same as Python's built-in pow(x, i, m).
    str_len = len(my_string)
    result = 0
    for i in range(str_len):
        result = (result + ord(my_string[i]) * power_mod_p(x, i, m)) % m
    return result
Given two strings s and t, I first find which string is shorter; without loss of generality, let s be the shorter string. I then need the hash values of all length-k substrings of a string, which I compute with the following generator:
def all_length_k_hashes(my_string, k, m, x):
    current_position = len(my_string) - k
    x_to_the_k = power_mod_p(x, k, m)
    hash_value = polynomial_hash(my_string[current_position:], m, x)
    yield (hash_value, current_position)
    while current_position > 0:
        current_position = current_position - 1
        hash_value = ((hash_value * x) + ord(my_string[current_position])
                      - x_to_the_k * ord(my_string[current_position + k])) % m
        yield (hash_value, current_position)
This function is simple: its first yield is the hash value of the final length-k substring of the string; after that, each iteration yields the hash value of the next length-k substring to its left (we move left by one position: for example, for k=3 in "abcdefghi", the window slides from "ghi" to "fgh", then from "fgh" to "efg", and so on). This should calculate the hash values of all length-k substrings of my_string in O(|my_string|).
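As a quick sanity check (with my own hypothetical values for m and x, and taking power_mod_p to be plain modular exponentiation), every rolling hash should agree with a hash computed from scratch on the same window:

power_mod_p = pow  # assumption: power_mod_p(x, i, m) == pow(x, i, m)
m, x, k = 1000000007, 31, 3
s = "abcdefghi"
for h, pos in all_length_k_hashes(s, k, m, x):
    assert h == polynomial_hash(s[pos:pos + k], m, x), pos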
Then, to find out whether s and t have a length-k substring in common, I use the following function:
def common_sub_string_length_k(shorter_str, longer_str, k, m, x):
    short_str_dict = dict()
    for hash_and_index in all_length_k_hashes(shorter_str, k, m, x):
        short_str_dict.update({hash_and_index[0]: hash_and_index[1]})
    hash_generator_longer_str = all_length_k_hashes(longer_str, k, m, x)
    for hash_and_index in hash_generator_longer_str:
        if hash_and_index[0] in short_str_dict:
            return (short_str_dict[hash_and_index[0]], hash_and_index[1])
    return False
What is happening in this function: I create an empty Python dictionary and fill it with key:value pairs, where each key is the hash value of a length-k substring of the shorter string and its value is that substring's starting index; I call this short_str_dict.
Then, using all_length_k_hashes, I create a generator of hash values of length-k substrings of the longer string and iterate through it, checking whether any hash value is already in short_str_dict. If there is one, the two strings have a substring of length k in common (assuming no hash collisions). This whole process should take O(|shorter_string| + |longer_string|).
Finally, the following function repeatedly uses the previous process to find the maximal k, using a binary search technique:
from random import randint

def longest_common_substring(str_1, str_2):
    m_1 = 309000599
    m_2 = 988017827
    x = randint(1, 10 ** 6)
    len_str_1 = len(str_1)
    len_str_2 = len(str_2)
    if len_str_1 <= len_str_2:
        short_str = str_1
        long_str = str_2
        switched = False
    else:
        short_str = str_2
        long_str = str_1
        switched = True
    len_short_str = len(short_str)
    len_long_str = len(long_str)
    low = 0
    high = len_short_str
    mid = 0
    longest_so_far = 0
    longest_indices = (0, 0)
    while low <= high:
        mid = (high + low) // 2
        m1_result = common_sub_string_length_k(short_str, long_str, mid, m_1, x)
        m2_result = common_sub_string_length_k(short_str, long_str, mid, m_2, x)
        if m1_result is False or m2_result is False:
            high = mid - 1
        else:
            longest_so_far = mid
            longest_indices = m1_result
            low = mid + 1
    if switched:
        return (longest_indices[1], longest_indices[0], longest_so_far)
    else:
        return (longest_indices[0], longest_indices[1], longest_so_far)
Two different hashes are used to reduce the probability of a collision. So in total, assuming no collisions, this whole process should take O(log|shorter_string|) * O(|shorter_string| + |longer_string|).
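As a quick usage example (my own hypothetical strings), and assuming no hash collisions: the common substring "cde" starts at index 2 in str_1 and at index 0 in str_2, so the call should report it:

print(longest_common_substring("abcdefg", "cdexyz"))  # expected: (2, 0, 3)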
Have I made any error? Is it slow because of the use of Python dictionaries? I really want to understand my mistake. Any help is greatly appreciated.

Python - long execution time

I created my first Python program and I suspect something is wrong. The execution time of the testovanie() method was 2 hours; the same code in Java took 10 minutes.
The implementation must be in two classes, and the implementation of each algorithm must be as written (unless there is a problem with it).
Can you help me fix the execution time?
First Class
class Algoritmy:
    """
    The Algoritmy class creates an array of integer numbers and contains
    methods for working with the array (sorting).
    """
    def __init__(self, velkostPola):
        """
        Constructor that initializes attributes.
        :param velkostPola: array size.
        """
        self.velkostPola = velkostPola
        self.poleCisel = []

    def nacitajZoSuboru(self, nazov):
        """
        Reads integer values from a file and writes them into the array.
        :param nazov: name of the file from which the values are read.
        :type nazov: string
        """
        f = open(nazov, 'r+')
        self.poleCisel = f.readlines()
        self.poleCisel = [int(i) for i in self.poleCisel]
        f.close()

    def toString(self):
        """
        Prints a text representation of the entire poleCisel array.
        """
        for x in self.poleCisel:
            print(x)

    def bubbleSort(self):
        """
        Sorts the array with the bubble-sort algorithm, complexity n^2.
        Compares adjacent values; if the first value is greater than the
        second, it swaps them. This is repeated until the entire array is
        sorted from smallest to largest.
        """
        n = len(self.poleCisel)
        for i in range(0, n):
            for j in range(0, n - i - 1):
                if self.poleCisel[j] > self.poleCisel[j + 1]:
                    self.vymena(j, j + 1)

    def insertionSort(self):
        """
        Sorts the array with the insertion-sort algorithm, complexity n^2.
        Compares the next value with the previous ones and places it after
        a value that is less than it. Repeats until the entire array is
        sorted from smallest to largest.
        """
        n = len(self.poleCisel)
        for i in range(n):
            pom = self.poleCisel[i]
            j = i - 1
            while (j >= 0) and (pom < self.poleCisel[j]):
                self.poleCisel[j + 1] = self.poleCisel[j]
                j -= 1
            self.poleCisel[j + 1] = pom

    def quickSort(self, najm, najv):
        """
        Sorts the array with quicksort, complexity n * (log2 n).
        The algorithm chooses a pivot and partitions the elements so that
        smaller elements end up on the left side and larger elements on
        the right.
        :param najm: lowest index (integer)
        :param najv: highest index (integer)
        """
        i = najm
        j = najv
        pivot = self.poleCisel[najm + (najv - najm) // 2]
        while i <= j:
            while self.poleCisel[i] < pivot:
                i += 1
            while self.poleCisel[j] > pivot:
                j -= 1
            if i <= j:
                self.vymena(i, j)
                i += 1
                j -= 1
        if najm < j:
            self.quickSort(najm, j)
        if i < najv:
            self.quickSort(i, najv)

    def vymena(self, i, j):
        """
        Helper that swaps element i with element j in the array.
        :param i: index of one element (integer)
        :param j: index of the other element (integer)
        """
        pom = self.poleCisel[i]
        self.poleCisel[i] = self.poleCisel[j]
        self.poleCisel[j] = pom

    def selectionSort(self):
        """
        Sorts the array with the selection-sort algorithm, complexity n^2.
        The algorithm always finds the largest value among the unsorted
        elements and swaps it with the last unsorted element.
        """
        for i in reversed(range(0, len(self.poleCisel))):
            prvy = 0
            for j in range(0, i):
                if self.poleCisel[j] > self.poleCisel[prvy]:
                    prvy = j
            self.vymena(prvy, i)

    def shellSort(self, n):
        """
        Sorts the array with the shell-sort algorithm, complexity n^2.
        Elements a gap apart are compared, starting with gap = n / 2 where
        n is the size of the array being sorted. If the left element of a
        pair is larger than the right one, they are swapped. The gap is
        then halved and the procedure repeated.
        :param n: size of the array (integer)
        """
        medzera = n // 2
        while medzera > 0:
            for i in range(0, n):
                pom = self.poleCisel[i]
                j = i
                while (j >= medzera) and (self.poleCisel[j - medzera] > pom):
                    self.poleCisel[j] = self.poleCisel[j - medzera]
                    j = j - medzera
                self.poleCisel[j] = pom
            medzera = medzera // 2

    def heapSort(self):
        """
        Sorts the array with heapsort, complexity n * (log n).
        The array is first turned into a max-heap; then the maximum is
        repeatedly swapped to the end of the unsorted part and the heap
        property restored, until the whole array is sorted.
        """
        n = len(self.poleCisel)
        for k in reversed(range(1, n // 2 + 1)):  # include node n//2 so the whole array is heapified
            self.maxHeapify(k, n)
        while True:
            self.vymena(0, n - 1)
            n = n - 1
            self.maxHeapify(1, n)
            if n < 1:
                break

    def maxHeapify(self, otecI, n):
        """
        Restores the heap property.
        :param otecI: index of the parent (integer)
        :param n: size of the heap (integer)
        """
        otec = self.poleCisel[otecI - 1]
        while otecI <= n // 2:
            lavySyn = otecI + otecI
            if (lavySyn < n) and (self.poleCisel[lavySyn - 1] < self.poleCisel[lavySyn]):
                lavySyn += 1
            if otec >= self.poleCisel[lavySyn - 1]:
                break
            else:
                self.poleCisel[otecI - 1] = self.poleCisel[lavySyn - 1]
                otecI = lavySyn
        self.poleCisel[otecI - 1] = otec
Second Class
from Algoritmy import Algoritmy
import time

class Praca:
    def __init__(self):
        self.casB = []
        self.casQ = []
        self.casS = []
        self.casI = []
        self.casSh = []
        self.casH = []

    def vypisPriemer(self):
        """
        Calculates and prints the average duration of each algorithm from
        the arrays of measured times.
        """
        sumB = 0; sumQ = 0; sumS = 0; sumI = 0; sumSh = 0; sumH = 0
        for j in range(0, 200):
            sumB += self.casB[j]
            sumQ += self.casQ[j]
            sumS += self.casS[j]
            sumI += self.casI[j]
            sumSh += self.casSh[j]
            sumH += self.casH[j]
        priemerB = sumB / 200
        priemerQ = sumQ / 200
        priemerS = sumS / 200
        priemerI = sumI / 200
        priemerSh = sumSh / 200
        priemerH = sumH / 200
        print("Bubble Sort alg. priemer: %10.9f" % priemerB)
        print("Quick Sort alg. priemer: %10.9f" % priemerQ)
        print("Selection Sort alg. priemer: %10.9f" % priemerS)
        print("Insertion Sort alg. priemer: %10.9f" % priemerI)
        print("Shell Sort alg. priemer: %10.9f" % priemerSh)
        print("Heap Sort alg. priemer: %10.9f" % priemerH)

    def replikacie(self, velkost, nazovS):
        """
        Performs 200 replications for each algorithm, collecting the
        execution times in the corresponding arrays.
        :param velkost: array size (integer)
        :param nazovS: file name (string)
        """
        self.casB.clear()
        self.casQ.clear()
        self.casS.clear()
        self.casI.clear()
        self.casSh.clear()
        self.casH.clear()
        praca = Algoritmy(velkost)
        for i in range(0, 200):
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.bubbleSort()
            self.casB.append(time.time() - zaciatok)
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.quickSort(0, praca.velkostPola - 1)
            self.casQ.append(time.time() - zaciatok)
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.selectionSort()
            self.casS.append(time.time() - zaciatok)
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.insertionSort()
            self.casI.append(time.time() - zaciatok)
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.shellSort(praca.velkostPola)
            self.casSh.append(time.time() - zaciatok)
            praca.nacitajZoSuboru(nazovS)
            zaciatok = time.time()
            praca.heapSort()
            self.casH.append(time.time() - zaciatok)

    def testovanie(self):
        """
        Testing.
        """
        self.replikacie(10000, "neutr10000.txt")
        print("Neutriedene 10000")
        self.vypisPriemer()

    def main(self):
        zaciatok = time.time()
        self.testovanie()
        print(time.time() - zaciatok)

"""
Run
"""
if __name__ == '__main__':
    praca = Praca()
    praca.main()
If you have any improvements, don't be shy to tell me; as I said, it's my first Python program. Be nice to me :)
A more condensed MRE would make it easier to comment on the specific statements, but my guess is that your example just illustrates that Python is slow for certain use cases.
This kind of number crunching in pure-Python loops is the nightmare scenario for Python, at least for the most popular CPython implementation.
There are, however, different ways you could speed this up if you diverge a bit from pure CPython:
Use PyPy JIT to run your program instead of CPython. PyPy usually speeds your code ~3-5x, but for numeric stuff like yours you can get an even more impressive speed bump.
Use numeric libraries to vectorize your code and/or offload common operations to optimized routines (written in C, Fortran or even assembly). Numpy is a popular choice; see the sketch after this list.
Rewrite your program, or at least the "hottest" code paths, in Cython cdef functions and classes; see, e.g., https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html.
You may want to check out Numba, but I have no experience with it.
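To make the numeric-libraries option concrete, here is a minimal sketch (assuming the same input file as in the question) that offloads both the file loading and the sorting to numpy's C-implemented routines:

import numpy as np

# Load the integers and sort with numpy instead of pure-Python loops;
# "neutr10000.txt" is the file name from the question.
data = np.loadtxt("neutr10000.txt", dtype=np.int64)
sorted_data = np.sort(data, kind="quicksort")  # "heapsort" and "stable" also exist

Of course, if the assignment requires you to implement the sorting algorithms yourself, this only helps with the surrounding bookkeeping, but it shows the kind of speedup that moving the hot loop out of Python can buy.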

Numpy, how can I index an array to keep items that are smaller than the previous and next 5 items?

I'm making a trading strategy that uses support and resistance levels. One of the ways I'm finding those is by searching for maxima/minima (prices that are higher/lower than the previous and next 5 prices).
I have an array of smoothed closing prices, and I first tried to find them with a for loop:
import numpy as np

def find_max_min(smoothed_prices):  # smoothed_prices = np.array([1.873, ...])
    avg_delta = np.diff(smoothed_prices).mean()
    maximas = []
    minimas = []
    for index in range(len(smoothed_prices)):
        if index < 5 or index > len(smoothed_prices) - 6:
            continue
        current_value = smoothed_prices[index]
        previous_points = smoothed_prices[index - 5:index]
        next_points = smoothed_prices[index + 1:index + 6]
        previous_are_higher = all(x > current_value for x in previous_points)
        next_are_higher = all(x > current_value for x in next_points)
        previous_are_smaller = all(x < current_value for x in previous_points)
        next_are_smaller = all(x < current_value for x in next_points)
        previous_delta_is_enough = abs(previous_points[0] - current_value) > avg_delta
        next_delta_is_enough = abs(next_points[-1] - current_value) > avg_delta
        delta_is_enough = previous_delta_is_enough and next_delta_is_enough
        if previous_are_higher and next_are_higher and delta_is_enough:
            minimas.append(current_value)
        elif previous_are_smaller and next_are_smaller and delta_is_enough:
            maximas.append(current_value)
        else:
            continue
    return maximas, minimas
(This isn't the actual code that I used, because I erased it; it may not work, but it was something like that.)
This code could find the maximas and minimas, but it was way too slow, and I need to call the function multiple times per second on huge arrays.
My question is: is it possible to do it with a numpy mask, in a similar way to this:
smoothed_prices = s
minimas = s[all(x > s[index] for x in s[index-5:index]) and all(x > s[index] for x in s[index+1:index+6])]
maximas = ...
or do you know how I could do it in another efficient numpy way?
I have thought of a way; it should be faster than the for loop you presented, but it uses more memory. Simply put, it creates an intermediate matrix of windows, then just takes the max and min of each window:
def find_max_min(arr, win_pad_size=5):
    windows = np.zeros((len(arr) - 2 * win_pad_size, 2 * win_pad_size + 1))
    for i in range(2 * win_pad_size + 1):
        windows[:, i] = arr[i:i + windows.shape[0]]
    return windows.max(axis=1), windows.min(axis=1)
Edit: I found a faster way to calculate the sub-sequences (which I had called windows) from Split Python sequence into subsequences. It doesn't use more memory; instead, it creates a view of the array.
def subsequences(ts, window):
    shape = (ts.size - window + 1, window)
    strides = ts.strides * 2
    return np.lib.stride_tricks.as_strided(ts, shape=shape, strides=strides)

def find_max_min(arr, win_pad_size=5):
    windows = subsequences(arr, 2 * win_pad_size + 1)
    return windows.max(axis=1), windows.min(axis=1)
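A hedged usage sketch (variable names are mine): with win_pad_size=5, each length-11 window is centred on one position, so a position is a local minimum/maximum exactly when its value equals the min/max of its own window:

import numpy as np

prices = np.random.rand(100)            # stand-in for smoothed_prices
win_max, win_min = find_max_min(prices, win_pad_size=5)
centres = prices[5:-5]                  # the centre element of each window
minima_idx = np.nonzero(centres == win_min)[0] + 5   # indices into prices
maxima_idx = np.nonzero(centres == win_max)[0] + 5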
You can do it easily with scikit-image's view_as_windows:
from skimage.util import view_as_windows
a = smoothed_prices[5:-5]
minimas = a[a == view_as_windows(smoothed_prices, 11).min(-1)]
Please note that since you are looking at minima within +/- 5 of the index, they can only occur at indices [5:-5] of your array; each length-11 window is compared against its centre element.
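And symmetrically for the maxima (this line is my own addition, following the same pattern):
maximas = a[a == view_as_windows(smoothed_prices, 11).max(-1)]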

backtracking not trying all possibilities

So I've got a list of questions as a dictionary, e.g.
{"Question1": 3, "Question2": 5, ...}
That means "Question1" is worth 3 points, the second one 5, etc.
I'm trying to create all subsets of questions that contain between a certain minimum and maximum number of questions and whose total points fall within a certain range.
I've tried something like this:
questions = {"Q1": 1, "Q2": 2, "Q3": 1, "Q4": 3, "Q5": 1, "Q6": 2}
u = 3  #
v = 5  # between u and v questions
x = 5  #
y = 10  # between x and y points
solution = []
n = 0

def main(n_):
    global n
    n = n_
    global solution
    solution = []
    finalSolution = []
    for x in questions.keys():
        solution.append("_")
    finalSolution.extend(Backtracking(0))
    return finalSolution
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print("candidate: ", solution)
        if not reject(k):
            print("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k + 1))
    return finalSolution

def reject(k):
    if solution[k] in solution:  # if the question already exists
        return True
    if k > v:  # too many questions
        return True
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points > y:  # too many points
        return True
    return False

def accept(k):
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points in range(x, y + 1) and k in range(u, v + 1):
        return True
    return False

print(main(len(questions.keys())))
but it's not trying all possibilities; it only ever places all the questions at the first index.
I have no idea what I'm doing wrong.
There are three problems with your code.
The first issue is that the first check in your reject function is always True. You can fix that in a variety of ways (you commented that you're now using solution.count(solution[k]) != 1).
The second issue is that your accept function uses the variable name x for what it intends to be two different things: a question from solution in the for loop, and the global x that is the minimum number of points. That doesn't work; you'll get a TypeError when trying to pass a string to range. A simple fix is to rename the loop variable (I suggest q, since it's a key into questions). Checking whether a value is in a range is also a bit awkward; it's usually much nicer to use chained comparisons: if x <= points <= y and u <= k <= v.
The third issue is that you're not backtracking at all. The backtracking step needs to reset the global solution list to the same state it had before Backtracking was called. You can do this at the end of the function, just before you return, using solution[k] = "_" (you commented that you've added this line, but I think you put it in the wrong place).
Anyway, here's a fixed version of your functions:
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print("candidate: ", solution)
        if not reject(k):
            print("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k + 1))
    solution[k] = "_"  # backtracking step here!
    return finalSolution

def reject(k):
    if solution.count(solution[k]) != 1:  # fix this condition
        return True
    if k > v:
        return True
    points = 0
    for q in solution:
        if q in questions:
            points = points + questions[q]
    if points > y:  # too many points
        return True
    return False

def accept(k):
    points = 0
    for q in solution:  # renamed loop variable (also above, for symmetry)
        if q in questions:
            points = points + questions[q]
    if x <= points <= y and u <= k <= v:  # chained comparisons are much nicer than range
        return True
    return False
There are still things that could probably be improved. I think having solution be a fixed-size global list with dummy values is especially unpythonic (a dynamically growing list that you pass as an argument would be much more natural). I'd also suggest using sum to add up the points rather than an explicit loop of your own; a sketch along those lines follows.
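For illustration, here is a hedged sketch of that restructuring (function and variable names are mine, not the original code): the partial selection is passed as a growing list, points are totalled with sum, and unordered subsets rather than ordered sequences are enumerated:

def find_subsets(questions, u, v, x, y):
    names = list(questions)
    found = []

    def backtrack(i, chosen):
        points = sum(questions[q] for q in chosen)
        if points > y or len(chosen) > v:
            return  # prune: already over the limits
        if u <= len(chosen) <= v and x <= points <= y:
            found.append(list(chosen))
        for j in range(i, len(names)):
            chosen.append(names[j])
            backtrack(j + 1, chosen)
            chosen.pop()  # undo the choice: the backtracking step

    backtrack(0, [])
    return found

print(find_subsets({"Q1": 1, "Q2": 2, "Q3": 1, "Q4": 3, "Q5": 1, "Q6": 2},
                   u=3, v=5, x=5, y=10))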

Python - speed up pathfinding

This is my pathfinding function:
def get_distance(x1, y1, x2, y2):
    neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    old_nodes = [(square_pos[x1, y1], 0)]
    new_nodes = []
    for i in range(50):
        for node in old_nodes:
            if node[0].x == x2 and node[0].y == y2:
                return node[1]
            for neighbor in neighbors:
                try:
                    square = square_pos[node[0].x + neighbor[0], node[0].y + neighbor[1]]
                    if square.lightcycle == None:
                        new_nodes.append((square, node[1] + 1))
                except KeyError:
                    pass
        old_nodes = list(new_nodes)
        new_nodes = []
    return 50
The problem is that the AI takes too long to respond (the response time needs to be <= 100 ms).
This is just a Python version of https://en.wikipedia.org/wiki/Pathfinding#Sample_algorithm
You should replace your algorithm with A* search, using the Manhattan distance as a heuristic.
One reasonably fast solution is to implement Dijkstra's algorithm (which I had already implemented in another question):
Build the original map. It's a masked array where the walker cannot walk on masked elements:
%pylab inline
map_size = (20,20)
MAP = np.ma.masked_array(np.zeros(map_size), np.random.choice([0,1], size=map_size))
matshow(MAP)
Below is the Dijkstra algorithm:
def dijkstra(V):
    mask = V.mask
    visit_mask = mask.copy()  # mask visited cells
    m = numpy.ones_like(V) * numpy.inf
    connectivity = [(i, j) for i in [-1, 0, 1] for j in [-1, 0, 1] if not (i == j == 0)]
    cc = unravel_index(V.argmin(), m.shape)  # current cell
    m[cc] = 0
    P = {}  # dictionary of predecessors
    # while (~visit_mask).sum() > 0:
    for _ in range(V.size):
        # note: >= 0, so the first row/column are not excluded
        neighbors = [tuple(e) for e in asarray(cc) - connectivity
                     if e[0] >= 0 and e[1] >= 0 and e[0] < V.shape[0] and e[1] < V.shape[1]]
        neighbors = [e for e in neighbors if not visit_mask[e]]
        tentative_distance = [(V[e] - V[cc]) ** 2 for e in neighbors]
        for i, e in enumerate(neighbors):
            d = tentative_distance[i] + m[cc]
            if d < m[e]:
                m[e] = d
                P[e] = cc
        visit_mask[cc] = True
        m_mask = ma.masked_array(m, visit_mask)
        cc = unravel_index(m_mask.argmin(), m.shape)
    return m, P
def shortestPath(start, end, P):
    Path = []
    step = end
    while 1:
        Path.append(step)
        if step == start:
            break
        if step in P:  # P.has_key(step) is Python 2 only
            step = P[step]
        else:
            break
    Path.reverse()
    return asarray(Path)
And the result:
start = (2,8)
stop = (17,19)
D, P = dijkstra(MAP)
path = shortestPath(start, stop, P)
imshow(MAP, interpolation='nearest')
plot(path[:,1], path[:,0], 'ro-', linewidth=2.5)
Below some timing statistics:
%timeit dijkstra(MAP)
#10 loops, best of 3: 32.6 ms per loop
The biggest issue with your code is that you don't do anything to avoid the same coordinates being visited multiple times. This means that the number of nodes you visit can grow exponentially, since the search can keep going back and forth over the first few nodes many times.
The best way to avoid duplication is to maintain a set of the coordinates we've added to the queue (though if your node values are hashable, you might be able to add them to the set directly instead of coordinate tuples). Since we're doing a breadth-first search, we always reach a given coordinate by (one of) the shortest path(s), so we never need to worry about finding a better route later on.
Try something like this:
def get_distance(x1, y1, x2, y2):
    neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    nodes = [(square_pos[x1, y1], 0)]
    seen = set([(x1, y1)])
    for node, path_length in nodes:
        if path_length == 50:
            break
        if node.x == x2 and node.y == y2:
            return path_length
        for nx, ny in neighbors:
            try:
                square = square_pos[node.x + nx, node.y + ny]
                if square.lightcycle == None and (square.x, square.y) not in seen:
                    nodes.append((square, path_length + 1))
                    seen.add((square.x, square.y))
            except KeyError:
                pass
    return 50
I've also simplified the loop a bit. Rather than switching out the list after each depth, you can use one loop and append to its end as you iterate over the earlier values. I still abort if a path hasn't been found in fewer than 50 steps (using the distance stored in the 2-tuple, rather than the number of passes of the outer loop). A further improvement might be to use a collections.deque for the queue, since you could efficiently pop from one end while appending to the other; a sketch follows below. It probably won't make a huge difference, but it might save a little bit of memory.
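Here is a sketch of that deque variant; it reuses the question's own square_pos structure and mirrors the function above, so the behaviour should be identical:

from collections import deque

def get_distance_deque(x1, y1, x2, y2):
    neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    queue = deque([(square_pos[x1, y1], 0)])
    seen = set([(x1, y1)])
    while queue:
        node, path_length = queue.popleft()  # O(1) pop from the front
        if path_length == 50:
            break
        if node.x == x2 and node.y == y2:
            return path_length
        for nx, ny in neighbors:
            try:
                square = square_pos[node.x + nx, node.y + ny]
                if square.lightcycle is None and (square.x, square.y) not in seen:
                    queue.append((square, path_length + 1))
                    seen.add((square.x, square.y))
            except KeyError:
                pass
    return 50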
I also avoided most of the indexing by zero and one in favor of unpacking into separate variable names in the for loops. I think this is much easier to read, and it avoids confusion, since the two different kinds of 2-tuples had different meanings (one is a (node, distance) tuple, the other is (x, y)).
