Backtracing the longest palindromic subsequence - Python

I modified the code from GeeksforGeeks to backtrace the actual subsequence, not only its length. But when I backtrace and reach the end, where I can put an arbitrary character in the middle of the palindrome, I find my solution to be sloppy and not 'Pythonic'. Can someone please help me?
This piece smells particularly bad (if it works correctly at all):
if length_matrix[start][end] == 1 and substr_length >= 0:
    middle = sequence[start]
Here is the forward pass:
import numpy as np

def calc_subsequence_lengths(sequence):
    n = len(sequence)
    # Create a table to store results of subproblems
    palindrome_lengths = np.zeros((n, n))
    # Strings of length 1 are palindromes of length 1
    np.fill_diagonal(palindrome_lengths, 1)
    for substr_length in range(2, n + 1):
        for i in range(n - substr_length + 1):
            j = i + substr_length - 1
            if sequence[i] == sequence[j] and substr_length == 2:
                palindrome_lengths[i][j] = 2
            elif sequence[i] == sequence[j]:
                palindrome_lengths[i][j] = palindrome_lengths[i + 1][j - 1] + 2
            else:
                palindrome_lengths[i][j] = max(palindrome_lengths[i][j - 1],
                                               palindrome_lengths[i + 1][j])
    return palindrome_lengths
And here is the traceback:
def restore_palindrome(length_matrix, sequence):
    palindrome_left = ''
    middle = ''
    n, n = np.shape(length_matrix)
    # start in the north-eastern corner of the matrix
    substr_length, end = n - 1, n - 1
    # traceback
    while substr_length > 0 and end > 1:
        start = end - substr_length
        # if possible, go left
        if length_matrix[start][end] == length_matrix[start][end - 1]:
            substr_length -= 1
            end -= 1
        # the left cell == current - 2, but the lower is the same as current, go down
        elif length_matrix[start][end] == length_matrix[start + 1][end]:
            substr_length -= 1
        # both left and lower == current - 2, go south-west
        else:
            palindrome_left += sequence[start]
            substr_length -= 2
            end -= 1
    if length_matrix[start][end] == 1 and substr_length >= 0:
        middle = sequence[start + 1]
    result = ''.join(palindrome_left) + middle + ''.join(palindrome_left[::-1])
    return result, int(length_matrix[0][n - 1])
Update
First off, the problem is to calculate the longest palindromic subsequence, which need not be contiguous (as stated in the article I referred to). For the sequence BBABCBCAB, the output should be BABCBAB.
Secondly, as I have pointed out, I'm building on an existing DP solution which works in O(N^2) time and space. It calculates the length just fine, so I need to backtrace the actual palindrome in the most elegant way possible, without sacrificing efficiency for elegance.
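
For reference, one way to sidestep the middle-character special case entirely is to reconstruct the palindrome recursively from the finished table, mirroring the three cases of the forward pass; the base cases handle the middle character naturally. This is a sketch building on calc_subsequence_lengths above, not the original traceback, and on ties it may return a different palindrome of the same maximal length:

def restore_palindrome_recursive(lengths, sequence, start, end):
    # Base cases: empty range, or the single middle character of an odd-length palindrome
    if start > end:
        return ''
    if start == end:
        return sequence[start]
    # Matching ends contribute to both halves of the palindrome
    if sequence[start] == sequence[end]:
        inner = restore_palindrome_recursive(lengths, sequence, start + 1, end - 1)
        return sequence[start] + inner + sequence[end]
    # Otherwise follow whichever subproblem the forward pass took its max from
    if lengths[start + 1][end] >= lengths[start][end - 1]:
        return restore_palindrome_recursive(lengths, sequence, start + 1, end)
    return restore_palindrome_recursive(lengths, sequence, start, end - 1)

seq = 'BBABCBCAB'
table = calc_subsequence_lengths(seq)
print(restore_palindrome_recursive(table, seq, 0, len(seq) - 1))  # a length-7 palindrome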

Codility FibFrog Algorithm - Improving Time Complexity from O(N * log(N) ** N) to O(N*log(N))

I'm trying to solve the Codility FibFrog problem and I came up with the following solution:
def jumps_from(position, fb, A):
    paths = set([])
    for i in fb:
        newPos = position + i
        if newPos == len(A):
            return set([-1])
        elif newPos < len(A):
            if A[newPos] == 1:
                paths.add(newPos)
        else:
            break
    return paths
def solution(A):
    if len(A) < 3: return 1
    fibonaccis = fibonacci(len(A))
    if len(A) + 1 in fibonaccis: return 1
    paths = set([-1])
    steps = 0
    while True:
        paths = set([idx for pos in paths for idx in jumps_from(pos, fibonaccis, A)])
        if len(paths) == 0: return -1
        if -1 in paths:
            return steps + 1
        steps += 1
    return steps
def fibonacci(N):
    arr = [0] * (N + 2)
    arr[1] = 1
    for i in range(2, N + 2):
        arr[i] = arr[i-1] + arr[i-2]
    return dict.fromkeys(arr[2:], 1)
Codility detects the runtime of this as O(N * log(N) ** N).
Codility Report: https://app.codility.com/demo/results/trainingJV7YAC-G3B/
I'm comparing this with the following solution, which scores 100% on Codility, and has runtime O(N * log(N)):
def gen_fib(n):
    fn = [0, 1]
    i = 2
    s = 2
    while s < n:
        s = fn[i-2] + fn[i-1]
        fn.append(s)
        i += 1
    return fn
def new_paths(A, n, last_pos, fn):
    """
    Given an array A of len n.
    From index last_pos, which numbers in fn jump to a leaf?
    Returns a list of indexes with leaves.
    """
    paths = []
    for f in fn:
        new_pos = last_pos + f
        if new_pos == n or (new_pos < n and A[new_pos]):
            paths.append(new_pos)
    return paths
def solution(A):
    n = len(A)
    if n < 3:
        return 1
    # A.append(1) # mark final jump
    fn = sorted(gen_fib(100000)[2:])  # Fib numbers with 0, 1, 1, 2.. clipped to just 1, 2..
    # print(fn)
    paths = set([-1])  # locate all the leaves that are one fib jump from the start position.
    jump = 1
    while True:
        # Considering each of the previous jump positions - how many leaves from there are one fib jump away?
        paths = set([idx for pos in paths for idx in new_paths(A, n, pos, fn)])
        # no new jumps means game over!
        if not paths:
            break
        # If there was a result in the new jumps, record that
        if n in paths:
            return jump
        jump += 1
    return -1
I'm not sure why my solution differs in runtime, since the approach is exactly the same: compute all the indices you can jump to from -1, then compute all the indices you can jump to from the new positions, until you get to the other side of the river or no new positions can be found.
Please refer to the first point in my previous answer.
If len(A) = 100000, you are calculating over 100,000 Fibonacci numbers, while we only need the Fibonacci numbers that are less than 100k, of which there are fewer than 30.
The current fibonacci function still returns N Fibonacci numbers instead of just the Fibonacci numbers below N. For N = 100k, that is just 25 numbers instead of over 100k.
Please update your fibonacci function to this:
def fibonacci(N):
    arr = [1, 1]
    while arr[-1] < N:
        arr.append(arr[-1] + arr[-2])
    return dict.fromkeys(arr[1:], 1)
I just ran a test locally, and it looks like your fibonacci function takes ~1 sec to generate the first 100k Fibonacci numbers, which is likely why it fails the performance tests even though the rest of your code is optimal. I think you should be able to clear the required performance limits after correcting the fibonacci function.
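As a rough sanity check (my own numbers, not from the Codility report): Fibonacci numbers grow exponentially, so the 100,000th one has over 20,000 digits, and building 100k of those big integers dominates the runtime. The corrected function stops as soon as it passes N:

fibs = fibonacci(100000)
print(len(fibs))  # 25 distinct Fibonacci numbers
print(max(fibs))  # 121393, the first Fibonacci number >= 100000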

Codility StoneWall Problem Python - Code Fails on Performance Tests Due to Incorrect Responses

I'm trying to solve the StoneWall codility problem.
I came up with the following code:
def solution(H):
    if len(H) == 0: return 0
    if len(H) == 1: return 1
    count = 1
    potentialBases = {}
    for i in range(1, len(H)):
        # print(potentialBases, count, H[i])
        if H[i] > H[i - 1]:
            count += 1
            potentialBases[H[i-1]] = 1
        elif H[i] < H[i-1]:
            if H[i] not in potentialBases:
                count += 1
                potentialBases[H[i]] = 1
            if H[i-1] in potentialBases:
                potentialBases.pop(H[i-1])
    return count
This passes all the correctness tests (100%), but fails on pretty much all the performance tests - not because of time complexity issues, but because the values obtained are incorrect.
I'm trying to figure out what I'm doing wrong here, but I can't seem to come up with a small example which makes the code fail.
I managed to solve it with the following code:
def solution(H):
    if len(H) == 0: return 0
    if len(H) == 1: return 1
    count = 1
    potentialBases = {}
    for i in range(1, len(H)):
        if H[i] > H[i - 1]:
            count += 1
            potentialBases[H[i-1]] = 1
        elif H[i] < H[i-1]:
            if H[i] not in potentialBases:
                count += 1
                potentialBases[H[i]] = 1
            if len(potentialBases.keys()) < H[i-1] + 1 - H[i]:
                keysToPop = []
                for key in potentialBases.keys():
                    if key >= H[i] and key <= H[i-1]:
                        keysToPop.append(key)
                for key in keysToPop:
                    potentialBases.pop(key)
            else:
                for j in range(H[i], H[i-1] + 1):
                    if j in potentialBases:
                        potentialBases.pop(j)
    return count
The issue was occurring when the current block height was smaller than the previous one. In my previous solution, I was just removing the previous block height (H[i-1]) from my potentialBases dict. Instead, we must remove all bases in the range [H[i], H[i-1]].
The conditional I added, which compares the size of that range with the size of the potentialBases key space, is there for performance reasons: it ensures we always do the least work possible and iterate through the smallest number of values.
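In case it helps anyone else hunting for a small failing case for the first version, here is one I constructed: the base of height 3 recorded on the rise to 5 is never removed when the wall drops to 1, so the final block of height 3 is wrongly treated as a continuation of it. The first version returns 4; the correct answer (and the output of the fixed version) is 5.

H = [3, 5, 1, 4, 3]
print(solution(H))  # first version: 4 (wrong); fixed version: 5 (correct)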
EDIT:
Here's an alternative approach with a stack:
def solution(H):
    if len(H) == 0: return 0
    if len(H) == 1: return 1
    count = 0
    stack = []
    last = 0
    for i in range(0, len(H)):
        if H[i] > last:
            count += 1
            stack.append(H[i])
            last = H[i]
        elif H[i] < H[i-1]:
            while (len(stack) > 0 and H[i] < stack[-1]):
                stack.pop()
            if (len(stack) == 0 or H[i] != stack[-1]):
                count += 1
                stack.append(H[i])
            last = H[i]
    return count
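
A quick check against the example from the Codility problem statement, which needs 7 blocks:

print(solution([8, 8, 5, 7, 9, 8, 7, 4, 8]))  # 7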

3sum algorithm. I am not getting results for numbers less than the target

How can I get this to print all triplets that have a sum less than or equal to a target? Currently this returns only triplets whose sum is equal to the target. I've tried changing it and thinking it through, but I can't figure it out.
def triplets(nums):
    # Sort array first
    nums.sort()
    output = []
    # We use -2 because at that point the left and right pointers would be at the same index
    # For example, in [1,2,3,4,5] with current index 4, the left and right pointers would both be at 5, so we know we can't have a triplet
    # _ LR
    for i in range(len(nums) - 2):
        # skip if the current value equals the previous one, because we need distinct results
        if i > 0 and nums[i] == nums[i - 1]:
            continue
        left = i + 1
        right = len(nums) - 1
        while left < right:
            currentSum = nums[i] + nums[left] + nums[right]
            if currentSum <= 8:
                output.append([nums[i], nums[left], nums[right]])
                # below checks again to make sure the index isn't the same as the adjacent index
                while left < right and nums[left] == nums[left + 1]:
                    left += 1
                while left < right and nums[right] == nums[right - 1]:
                    right -= 1
                # In this case we have to move both pointers since we found a solution
                left += 1
                right -= 1
            elif currentSum > 8:
                left += 1
            else:
                right -= 1
    return output
So, for example, if the input array is [1,2,3,4,5], we should get the result (1,2,3), (1,2,4), (1,2,5), (1,3,4), because these have a sum less than or equal to the target of 8.
The main barrier to solving the new problem with small changes to your code is that your original goal of outputting all distinct triplets with sum == target can be solved in O(n^2) time using two loops, as in your algorithm. The size of the output can be proportional to n^2, so this is optimal in a certain sense.
The problem of outputting all distinct triplets with sum <= target cannot always be solved in O(n^2) time, since the output can have size proportional to n^3; for example, with the array nums = [1,2,...,n] and target = n^2 + 1, the answer is all possible triples of elements. So your algorithm has to change in a way equivalent to adding a third loop.
One O(n^3) solution is shown below. Being a bit more clever about filtering duplicate elements (like using a hashmap and working with frequencies), this should be improvable to O(max(n^2, H)) where H is the size of your output.
def triplets(nums, target=8):
    nums.sort()
    output = set()
    for i, first in enumerate(nums[:-2]):
        if first * 3 > target:
            break
        # Filter some distinct results
        if i + 3 < len(nums) and first == nums[i + 3]:
            continue
        for j, second in enumerate(nums[i + 1:], i + 1):
            if first + 2 * second > target:
                break
            if j + 2 < len(nums) and second == nums[j + 2]:
                continue
            for k, third in enumerate(nums[j + 1:], j + 1):
                if first + second + third > target:
                    break
                if k + 1 < len(nums) and third == nums[k + 1]:
                    continue
                output.add((first, second, third))
    return list(map(list, output))
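
Checking this against the example from the question (the order may differ, since a set is used internally):

print(triplets([1, 2, 3, 4, 5]))
# [[1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 3, 4]], possibly in another order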

Design O(log n) algorithm for finding 3 distinct elements in a list

The question is:
Design an O(log n) algorithm whose input is a sorted list A. The algorithm should return true if A contains at least 3 distinct elements. Otherwise, the algorithm should return false.
As it has to be O(log n), I tried to use binary search, and this is the code I wrote:
def hasThreeDistinctElements(A):
    if len(A) < 3:
        return False
    minInd = 0
    maxInd = len(A) - 1
    midInd = (maxInd + minInd) // 2
    count = 1
    while minInd < maxInd:
        if A[minInd] == A[midInd]:
            minInd = midInd
            if A[maxInd] == A[midInd]:
                maxInd = midInd
            else:
                count += 1
                maxInd -= 1
        else:
            count += 1
            minInd += 1
        midInd = (maxInd + minInd) // 2
    return count >= 3
Is there a better way to do this?
Thanks
from bisect import bisect

def hasThreeDistinctElements(A):
    return A[:1] < A[-1:] > [A[bisect(A, A[0])]]
The first comparison safely(*) checks whether there are two different values at all. If so, we check whether the first value larger than A[0] is also smaller than A[-1].
(*): Doesn't crash if A is empty.
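A few spot checks (my own examples):

print(hasThreeDistinctElements([1, 1, 2, 2, 3]))  # True
print(hasThreeDistinctElements([1, 1, 2, 2]))     # False
print(hasThreeDistinctElements([]))               # False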
Or without bisect, binary-searching for a third value in A[1:-1]. The invariant is that if there is any, it must be in A[lo : hi+1]:
def hasThreeDistinctElements(A):
    lo, hi = 1, len(A) - 2
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == A[0]:
            lo = mid + 1
        elif A[mid] == A[-1]:
            hi = mid - 1
        else:
            return True
    return False
In order to really be O(logN), the updates to the bounding indices minInd, maxInd should only ever be
maxInd = midInd [- 1]
minInd = midInd [+ 1]
to halve the search space. Since there are paths through your loop body that only do
minInd += 1
maxInd -= 1
respectively, I am not sure that you can't create data for which your function is linear. The following is a bit simpler and guaranteed O(logN):
def x(A):
    if len(A) < 3:
        return False
    minInd, maxInd = 0, len(A) - 1
    mn, mx = A[minInd], A[maxInd]
    while minInd < maxInd:
        midInd = (minInd + maxInd) // 2
        if mn != A[midInd] != mx:
            return True
        if A[midInd] == mn:
            minInd = midInd + 1  # minInd == midInd might occur
        else:
            maxInd = midInd      # while maxInd != midInd is safe
    return False
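
A couple of spot checks (my own examples):

print(x([1, 1, 2, 3]))  # True
print(x([1, 2, 2, 2]))  # False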
BTW, if you can use the standard library, it is as easy as:
from bisect import bisect_right

def x(A):
    return A and (i := bisect_right(A, A[0])) < len(A) and A[i] < A[-1]
Yes, there is a better approach.
As the list is sorted, you can use binary search with slight custom modifications as follows:
list = [1, 1, 1, 2, 2]
uniqueElementSet = set([])

def binary_search(minIndex, maxIndex, n):
    if len(uniqueElementSet) >= 3:
        return
    # Checking the bounds for index:
    if minIndex < 0 or minIndex >= n or maxIndex < 0 or maxIndex >= n:
        return
    if minIndex > maxIndex:
        return
    if minIndex == maxIndex:
        uniqueElementSet.add(list[minIndex])
        return
    if list[minIndex] == list[maxIndex]:
        uniqueElementSet.add(list[minIndex])
        return
    uniqueElementSet.add(list[minIndex])
    uniqueElementSet.add(list[maxIndex])
    midIndex = (minIndex + maxIndex) // 2
    binary_search(minIndex + 1, midIndex, n)
    binary_search(midIndex + 1, maxIndex - 1, n)
    return

binary_search(0, len(list) - 1, len(list))
print(True if len(uniqueElementSet) >= 3 else False)
As we are dividing the array into 2 parts in each iteration of the recursion, it will require a maximum of log(n) steps to check whether it contains 3 unique elements.
Time complexity = O(log(n)).

Modifying references to lists

For an algorithm I'm benchmarking I need to test some portion of a list (which could be very long, but is filled mostly with 0s and the occasional 1). The idea is that in a list of n items, with d of them being of interest, in expectation each item is defective with probability d/n. So, check a group of size about n/d (it's defined in terms of the floor and log functions for information-theoretic reasons - it makes the analysis of the algorithm easier).
Algorithm:
1./ If n <= 2*d - 2 (i.e. more than half the list is filled with 1s), just look at each item in turn.
2./ If n > 2*d - 2: check a group of size 2^alpha, where alpha = floor(binarylog(l/d)), l = n - d + 1, and d = the number of 1s. If there is a 1, do binary search on the group to find the defective and set d = d - 1 and n = n - 1 - x (x = the size of the group minus the defective). If there isn't a 1, set n = n - groupSize and go to 1 (i.e. check the rest of the list).
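
As a concrete instance of step 2 (my own arithmetic, matching the test setup below with n = 1024 and d = 10):

import math

n, d = 1024, 10
l = n - d + 1                                # 1015
alpha = int(math.floor(math.log(l / d, 2)))  # floor(log2(101.5)) == 6
print(2 ** alpha)                            # group size: 64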
However, when populating the list with ten 1s in random places, the algorithm finds all but a single 1 and then continues to loop while checking an empty list.
I think the problem is that when discarding a group containing all 0s, I'm not correctly modifying the reference that says where to start the next round, and this is causing my algorithm to fail.
Here is the relevant part of the function:
import math

def binary_search(inList):
    low = 0
    high = len(inList)
    while low < high:
        mid = (low + high) // 2
        upper = inList[mid:high]
        lower = inList[low:mid]
        if any(lower):
            high = mid
        elif any(upper):
            low = mid + 1
        elif mid == 1:
            return mid
        else:
            # Neither side has a 1
            return -1
    return mid
def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    # initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if n <= (2*num_defectives - 2):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            # params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            # print(groupSize)
            # print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1]
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize
    print(defectives)
    return defectives
Also here are the tests that the function currently passes:
from GroupTesting import HGBSA

# identify a single defective
inlist = [0]*1024
inlist[123] = 1
assert HGBSA(inlist, 1) == [123]

# identify two defectives
inlist = [0]*1024
inlist[123] = 1
inlist[789] = 1
assert inlist[123] == 1
assert inlist[789] == 1
assert HGBSA(inlist, 2) == [123, 789]

zeros = [0]*1024
ones = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for val in ones:
    zeros[val] = 1
assert HGBSA(zeros, 10) == ones
I.e. it finds a single 1, as well as 2 and 10 1s placed deterministically in the list, but this test:

from random import shuffle

zeros = [0] * 1024
ones = [1] * 10
l = zeros + ones
shuffle(l)
where_the_ones_are = [i for i, x in enumerate(l) if x == 1]
assert HGBSA(l, 10) == where_the_ones_are

has exposed the bug.
This test also fails with the code above:

# identify two defectives next to each other
inlist = [0]*1024
inlist[123] = 1
inlist[124] = 1
assert HGBSA(inlist, 2) == [123, 124]
The following modification (discarding a whole group if it is undefective, but only discarding the members of a group before the defective) passes the 'two next to each other' test, but not the '10 in a row' or random tests:
def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    # initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if n <= (2*num_defectives - 2):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            # params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            # print(groupSize)
            # print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1 in range(0, groupSize//2)]
                print(len(undefectives))
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                start = start + defective + 1
                # print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize
    print(defectives)
    return defectives
I.e. the problem occurs when there are multiple 1s in the group being tested: after the first, none are detected. The best test for the code to pass would be 1s distributed uniformly at random throughout the list, with all defectives found.
Also, how would I create tests to catch this kind of error in the future?
Your algorithm seemingly has worse performance than a linear scan.
A naïve algorithm would just scan a piece of the list of size n/d in O(n/d):

defectives = [index for (index, element) in enumerate(inList[start:end], start) if element == 1]
Common sense says that you can't possibly detect the positions of all 1s in a list without looking at every element of the list once, and there's no point in looking at any element more than once.
Your "binary search" uses any multiple times, effectively scanning pieces of the list multiple times. The same applies to constructs like if any(group): ... [s for s in group if ...], which scan group twice, the first time needlessly.
If you described the actual algorithm you're trying to implement, people could help troubleshoot it. From your code and your post, the algorithm is unclear. The fact that your HGBSA function is long and not exactly commented unfortunately does not help understanding.
Don't be afraid to tell people here the details of what your algorithm is doing and why; we're sort of computer geeks here, too, we're going to understand :)
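
On the closing question about tests: a randomized comparison against a trivially correct linear-scan oracle would have caught this (a sketch of my own; it assumes HGBSA is importable, as in the existing tests):

import random

def naive_defectives(lst):
    # Oracle: a linear scan is trivially correct
    return [i for i, x in enumerate(lst) if x == 1]

def test_hgbsa_random(trials=100, n=1024, d=10):
    for _ in range(trials):
        lst = [0] * n
        for i in random.sample(range(n), d):
            lst[i] = 1
        assert HGBSA(lst, d) == naive_defectives(lst), lst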
