Is it possible to process the string from the start in a DP solution? (Python)

I was trying out the longest palindromic subsequence problem from LeetCode.
One of the discussed solutions is as follows:
    class Solution:
        def longestPalindromeSubseq(self, s: str) -> int:
            n = len(s)
            dp = [[0] * n for _ in range(n)]
            for i in range(n - 1, -1, -1):
                dp[i][i] = 1
                for j in range(i+1, n):
                    if s[i] == s[j]:
                        dp[i][j] = dp[i + 1][j - 1] + 2
                    else:
                        dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
            return dp[0][n - 1]
So it starts from the end of the string.
I was wondering whether it is possible to begin from the start of the string instead, that is, whether the loops could look something like this:
    for i in range(0, n):
        for j in range(i+1, n):
            # ...
But dp[i + 1] won't have been calculated yet in any given iteration of i, and we need dp[i + 1] to evaluate
    dp[i][j] = dp[i + 1][j - 1] + 2
and
    dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
Is it possible to change these two updates to dp (and hence come up with a new recurrence relation) so that we can begin from the start of the string, or is starting from the end of the string the only way? (I was not able to come up with any recurrence or index adjustment that makes it possible, so I have started to believe that it is indeed not possible, but I wanted to be sure.)

The first hint that you can do this from the beginning is symmetry: if this logic gets the answer for a string such as 'baabbcc', the same logic also works for the reversed string ('ccbbaab'), and reversing a string does not change the length of its longest palindromic subsequence.
The more robust reasoning comes from what dp[i][j] represents: the length of the longest palindromic subsequence of s[i..j], inclusive. We fill this dp array using two pointers, i and j.
We iterate over all pairs of i and j. If s[i] == s[j], the answer for i..j equals the answer for i+1..j-1 plus 2, because we can take the best subsequence from i+1..j-1 and put s[i] and s[j] at its beginning and end. Otherwise, the answer is the better of dp[i+1][j] and dp[i][j-1]. I hope this is clear from the code you provided.
What that means is that to calculate dp[i][j], you need dp[i+1][j-1], dp[i+1][j], and dp[i][j-1].
The code you provided does this by starting the i pointer at the end and, for every i, looping j from i+1 to n-1. This means that i+1 is reached before i and j-1 is reached before j.
However, you can achieve the same effect starting from the beginning. This time, move the j pointer forward from the beginning and, for every j, move the i pointer backward from i = j-1 down to i = 0. This ensures that j-1 is reached before j and i+1 is reached before i, which is what we're looking for.
The final code would look something like this (which I've submitted and gotten accepted):
    class Solution:
        def longestPalindromeSubseq(self, s: str) -> int:
            n = len(s)
            dp = [[0] * n for _ in range(n)]
            for j in range(0, n):
                dp[j][j] = 1
                for i in range(j-1, -1, -1):
                    if s[i] == s[j]:
                        dp[i][j] = dp[i + 1][j - 1] + 2
                    else:
                        dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
            return dp[0][n - 1]
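As a quick sanity check, here is a minimal sketch that runs both loop orders side by side and compares them on random strings. The function names lps_from_end and lps_from_start and the helper fill are made up for this illustration; the recurrence is the one used in both listings above, and the empty-string guard is an addition.

```python
import random

def fill(dp, s, i, j):
    # Shared recurrence: dp[i][j] depends on dp[i+1][j-1], dp[i+1][j], dp[i][j-1].
    if s[i] == s[j]:
        dp[i][j] = dp[i + 1][j - 1] + 2
    else:
        dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])

def lps_from_end(s):
    # Outer loop on i, moving from the end of the string to the start.
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        dp[i][i] = 1
        for j in range(i + 1, n):
            fill(dp, s, i, j)
    return dp[0][n - 1]

def lps_from_start(s):
    # Outer loop on j, moving from the start of the string to the end.
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for j in range(n):
        dp[j][j] = 1
        for i in range(j - 1, -1, -1):
            fill(dp, s, i, j)
    return dp[0][n - 1]

# Compare the two traversal orders on random strings over a small alphabet.
random.seed(0)
for _ in range(200):
    s = "".join(random.choice("abc") for _ in range(random.randint(0, 12)))
    assert lps_from_end(s) == lps_from_start(s)

print(lps_from_start("bbbab"))  # 4
```

Both versions fill exactly the same cells; only the order in which the cells are visited differs.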

Related

Finding a subset whose sum is zero

I have an Excel file containing two rows: one containing numbers and another containing IDs (basically serial numbers).
Now, these numbers are both positive and negative. I have to find whether there exists a subset whose sum is zero and, if so, the IDs of the elements of that subset. It would be awesome if I could also find all such subsets with fewer than a given number of elements, say 3 or 4.
The most important part is that I want the IDs of the numbers.
    def subset_sum_exists(numbers, target):
        n = len(numbers)
        # Initialize a 2D array with size (n+1) * (target+1)
        dp = [[False for _ in range(target + 1)] for _ in range(n + 1)]
        # Initialize the first column with true
        for i in range(n + 1):
            dp[i][0] = True
        for i in range(1, n + 1):
            for j in range(1, target + 1):
                # if j is less than current number
                if j < numbers[i - 1]:
                    dp[i][j] = dp[i - 1][j]
                # if j is greater or equal than current number
                if j >= numbers[i - 1]:
                    dp[i][j] = dp[i - 1][j] or dp[i - 1][j - numbers[i - 1]]
        # return the last element
        return dp[n][target]
This code should at least tell me whether such a subset exists, but there seems to be some error in it. For target = 0, it always returns True. For other targets, it raises an error.
    numbers = [6, 20, 54, 93, -54, -26]
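Two observations about the table above: column 0 is True for every row because the empty subset sums to zero, and the table only has columns 0..target, so a negative number makes j - numbers[i - 1] larger than target and the lookup raises an IndexError. For small subset sizes, a brute-force search that carries the IDs along may be enough. The following is a minimal sketch under that assumption; the function name zero_sum_subsets, the max_size parameter, and the letter IDs are hypothetical, since the actual IDs from the Excel file are not shown.

```python
from itertools import combinations

def zero_sum_subsets(ids, numbers, max_size):
    """Return the ID tuples of every non-empty subset with at most
    max_size elements whose values sum to zero."""
    pairs = list(zip(ids, numbers))
    found = []
    for size in range(1, max_size + 1):
        # Try every combination of exactly `size` (id, value) pairs.
        for combo in combinations(pairs, size):
            if sum(value for _, value in combo) == 0:
                found.append(tuple(id_ for id_, _ in combo))
    return found

numbers = [6, 20, 54, 93, -54, -26]
ids = ["A", "B", "C", "D", "E", "F"]  # hypothetical IDs, one per number

print(zero_sum_subsets(ids, numbers, 4))  # [('C', 'E'), ('A', 'B', 'F')]
```

The number of combinations grows quickly with max_size and with the length of the list, so this only suits short lists or small size limits.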

ZigZag Quadruples

I've seen this interesting question and wonder if there are more ways to approach it:
Given a permutation of the numbers from 1 to n, count the number of index quadruples (i, j, k, l) such that i < j < k < l and A[i] < A[k] < A[j] < A[l].
e.g.
Input : [1,3,2,6,5,4]
Output : 1 (1,3,2,6)
The desired algorithm is O(n^2).
Approach:
I've tried to solve it using a stack, in a similar manner to LeetCode's 132 Pattern problem, but it seems to fail.
    def get_smaller_before(A):
        smaller_before = [0] * len(A)
        for i in range(len(A)):
            for j in range(i):
                if A[j] < A[i]:
                    smaller_before[i] += 1
        return smaller_before

    def get_larger_after(A):
        larger_after = [0] * len(A)
        for i in range(len(A)):
            for j in range(i+1, len(A)):
                if A[i] < A[j]:
                    larger_after[i] += 1
        return larger_after

    def countQuadrples(nums):
        if not nums:
            return False
        smaller_before = get_smaller_before(nums)
        larger_after = get_larger_after(nums)
        counter = 0
        stack = []
        for j in reversed(range(1, len(nums))):
            # i < j < k < l
            # smaller_before < nums[k] < nums[j] < larger_after
            while stack and nums[stack[-1]] < nums[j]:
                counter += smaller_before[j] * larger_after[stack[-1]]
                stack.pop()
            stack.append(j)
        return counter
Does anyone have a better idea?
What you need is some sort of 2-dimensional tree that lets you quickly answer the questions "How many points after k have a value bigger than A[j]?" and "How many points before j have a value less than A[k]?" Such structures usually take O(n log(n)) time to build, and each query should run in something like O(log(n)^2) time.
A number of such data structures exist. One option is a variant of a quadtree. You turn each array element into a point whose x-coordinate is its position in the array and whose y-coordinate is its value. Your queries then just count how many points lie in a box.
Now you can do a double loop over all pairs j < k with A[k] < A[j] and, for each pair, multiply the two counts to get the number of zig-zag quadruples that have those as the inner pair.
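Since the target is O(n^2) anyway, the tree can be swapped for a plain counting table, which is simpler to write. The sketch below is that substitution, not the tree-based structure described above: less_before[p][v] holds the number of indices i < p with A[i] < v, and both box-count queries become O(1) lookups. The function name count_zigzag_quadruples and the table name are made up for this illustration, and the values are assumed to lie in 1..n as in the problem statement.

```python
def count_zigzag_quadruples(A):
    """Count (i, j, k, l) with i < j < k < l and A[i] < A[k] < A[j] < A[l].
    Assumes the values of A lie in 1..n (e.g. a permutation of 1..n)."""
    n = len(A)
    # less_before[p][v] = number of indices i < p with A[i] < v, for v in 0..n+1.
    less_before = [[0] * (n + 2) for _ in range(n + 1)]
    for p in range(n):
        prev_row, next_row = less_before[p], less_before[p + 1]
        for v in range(n + 2):
            next_row[v] = prev_row[v] + (1 if A[p] < v else 0)

    total = 0
    for j in range(n):
        for k in range(j + 1, n):
            if A[k] < A[j]:
                # Candidates for i: indices before j with a value below A[k].
                smaller_left = less_before[j][A[k]]
                # Candidates for l: indices after k with a value above A[j],
                # i.e. all indices after k minus those with a value <= A[j].
                not_larger_right = less_before[n][A[j] + 1] - less_before[k + 1][A[j] + 1]
                larger_right = (n - k - 1) - not_larger_right
                total += smaller_left * larger_right
    return total

print(count_zigzag_quadruples([1, 3, 2, 4]))  # 1
```

The table costs O(n^2) time and memory to build, and the double loop over (j, k) is O(n^2). Note that for the sample input [1,3,2,6,5,4], the condition as stated is satisfied by (1,3,2,6), (1,3,2,5), and (1,3,2,4), so this count returns 3; the quoted output of 1 may rely on a constraint that is not given in the question.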

How to reduce time complexity of this program

The entry Y[i][j] stores the sum of the subarray X[i..j]. Can I get a better time complexity?
    def func(X, n):
        Y = [[0 for i in range(n)] for j in range(n)]
        for i in range(n):
            for j in range(i, n):
                for k in range(i, j+1):
                    Y[i][j] += X[k]
        return Y

    if __name__ == "__main__":
        n = 500
        X = list(range(n))
        for i in range(30, 50):
            print(X[i], end=" ")
        print()
        print(func(X, n)[30][49])
You could use a prefix sum array.
The idea is that you have an array where the entry ps[i] denotes the sum of all elements arr[0..i]. You can calculate it in linear time:
    ps = [0] * len(arr)
    ps[0] = arr[0]
    for i in range(1, len(arr)):
        ps[i] = ps[i - 1] + arr[i]
Can you guess how to retrieve a sum Y(i, j) in constant time?
Solution: Y(i, j) = ps[j] - ps[i - 1] for i >= 1, and Y(0, j) = ps[j]. You take the sum of everything from the start up to j and subtract the part that you don't want (everything from the start up to i-1).
Note: Be wary of edge cases such as i = 0 (in Python, ps[i - 1] would silently read ps[-1], the last entry), j = 0, and j < i.
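One common way to sidestep the i = 0 edge case is to shift the prefix array by one and store a leading 0. The sketch below does that; the names build_prefix and range_sum are made up for this illustration, and the indexing differs from ps above in that prefix[i] is the sum of X[0..i-1].

```python
from itertools import accumulate

def build_prefix(X):
    # prefix[i] is the sum of X[0..i-1]; prefix[0] = 0 covers the empty prefix.
    return [0] + list(accumulate(X))

def range_sum(prefix, i, j):
    # Sum of X[i..j] inclusive, for 0 <= i <= j < len(X), in O(1).
    return prefix[j + 1] - prefix[i]

X = list(range(500))
prefix = build_prefix(X)
print(range_sum(prefix, 30, 49))  # 790, same as func(X, 500)[30][49]
```

If the full table Y is really needed, each entry can be read off this way in constant time, which brings the total cost down from O(n^3) to O(n^2).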

Using complex numbers to traverse neighbors in a 2D array

I found a new way to traverse the four neighbors by using complex numbers in this solution:
https://leetcode.com/problems/word-search-ii/discuss/59804/27-lines-uses-complex-numbers
(You can just read my example.)
I think it is elegant and concise, but I cannot fully understand it.
Here I have extracted the key code and simplified the example.
board is a 2D array, and we want to start from every cell and traverse the neighbors in the four directions recursively by DFS.
This is the common way:
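    def dfs(i, j, word):
        # termination (e.g. marking visited cells) is omitted in this simplified example
        c = board[i][j]
        # create the 4 directions by hand
        for I, J in (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1):
            # need to check the boundary
            if 0 <= I < len(board) and 0 <= J < len(board[0]):
                dfs(I, J, word + c)

    for i in range(len(board)):
        for j in range(len(board[0])):
            dfs(i, j, '')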
Here is the version using complex numbers as indices:
    board = {i + 1j * j: c
             for i, row in enumerate(board)
             for j, c in enumerate(row)}

    def dfs(z, word):
        c = board.get(z)
        # here the 4 neighbors are visited, which I don't understand
        if c:
            for k in range(4):
                dfs(z + 1j ** k, word + c)

    for z in board:
        dfs(z, '')
I think there are two advantages to using complex numbers:
1. no need to create the 4 directions by hand
2. no need to check the boundary
But I can't understand this part: for k in range(4): dfs(z + 1j ** k, word + c)
Can somebody explain this algorithm? I'd really appreciate it.
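If what I think is correct, this solution uses the following property of the imaginary unit j: its successive powers cycle through four values, 1j ** 0 = 1, 1j ** 1 = 1j, 1j ** 2 = -1, 1j ** 3 = -1j.
If a complex number represents a position in a grid, adding these four values gives the neighboring nodes to the right, up, left, and down (in the usual complex plane; with the keys i + 1j * j from the question they are the next row, next column, previous row, and previous column). Positions outside the board are simply missing from the dict, so board.get(z) returns None and no boundary check is needed.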
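To see the four steps concretely, here is a small sketch on a 3x3 grid. The grid contents and the helper name neighbors are made up for this illustration; the key layout (row as real part, column as imaginary part) follows the question's code.

```python
grid = ["abc",
        "def",
        "ghi"]

# Key = row + 1j * column, as in the question's dict comprehension.
board = {r + 1j * c: ch
         for r, row in enumerate(grid)
         for c, ch in enumerate(row)}

# The four powers of 1j are the four unit steps.
steps = [1j ** k for k in range(4)]
print(steps == [1, 1j, -1, -1j])  # True

def neighbors(z):
    # Off-board positions are simply not keys of the dict, so no bounds check is needed.
    return [board[z + step] for step in steps if z + step in board]

print(neighbors(1 + 1j))  # ['h', 'f', 'b', 'd'] (centre cell 'e')
print(neighbors(0))       # ['d', 'b'] (corner cell 'a')
```

The order of the results mirrors the order of the powers: next row, next column, previous row, previous column.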

Incorrect indexing for max subarray in Python

I wrote both a brute-force and a divide-and-conquer implementation of the Max Subarray problem in Python. Tests are run by drawing a random sample of integers.
When the length of the input array is large, the assert in __main__ fails because the recursive algorithm does not return the correct answer. However, the two algorithms DO agree when the array is less than 10 elements long (this is approximate, and the actual size of the failed input varies on each execution). The issue does not seem to be related to even or odd array lengths, but it does appear to be related to how the array is indexed.
Sorry if I'm missing something stupid, but why does the recursive algorithm stop returning the correct output when the input array starts getting larger?
    # Subarray solutions are represented by an array in the form
    # [lower_bound, higher_bound, sum]
    from sys import maxsize
    import random
    import time

    # Brute force implementation (THETA(n^2))
    def bf_max_subarray(A):
        biggest = -maxsize - 1
        left = 0
        right = 0
        for i in range(0, len(A)):
            sum = 0
            for j in range(i, len(A)):
                sum += A[j]
                if sum > biggest:
                    biggest = sum
                    left = i
                    right = j
        return [left, right, biggest]

    # Part of divide-and-conquer solution
    def cross_subarray(A, l, m, r):
        lsum = -maxsize - 1
        rsum = -maxsize - 1
        lbound = 0
        rbound = 0
        tempsum = 0
        for i in range(m, l-1, -1):
            tempsum += A[i]
            if tempsum > lsum:
                lsum = tempsum
                lbound = i
        tempsum = 0
        for j in range(m+1, r+1):
            tempsum += A[j]
            if tempsum > rsum:
                rsum = tempsum
                rbound = j
        return [lbound, rbound, lsum + rsum]

    # Recursive solution
    def rec_max_subarray(A, l, r):
        # Base case: array of one element
        if (l == r):
            return [l, r, A[l]]
        else:
            m = (l+r)//2
            left = rec_max_subarray(A, l, m)
            right = rec_max_subarray(A, m+1, r)
            cross = cross_subarray(A, l, m, r)
            # Returns the array representing the subarray with the maximum sum.
            return max([left, right, cross], key=lambda i:i[2])

    if __name__ == "__main__":
        for i in range(1, 101):
            A = random.sample(range(-i*2, i), i)
            # time.clock() was removed in Python 3.8; perf_counter() replaces it
            start = time.perf_counter()
            bf = bf_max_subarray(A)
            bf_time = time.perf_counter() - start
            start = time.perf_counter()
            dc = rec_max_subarray(A, 0, len(A)-1)
            dc_time = time.perf_counter() - start
            assert dc == bf # Make sure the algorithms agree.
The subarray with the maximum sum is represented by an array of the form [left_bound, right_bound, sum].
Thanks to return max([left, right, cross], key=lambda i: i[2]), rec_max_subarray returns the correct maximum sum for A, but it risks returning indices that do not match the indices returned by bf_max_subarray. My error was assuming that the boundaries of a subarray with the maximum sum would be unique.
The solution is either to fix the criterion that selects among subarrays with equal sums, or just to assert the equality of the sums using assert dc[2] == bf[2].
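A tiny input is enough to show the non-uniqueness. The sketch below lists every pair of bounds that attains the maximum sum; the function name all_max_subarrays is made up for this illustration and is not part of the code above.

```python
def all_max_subarrays(A):
    """Return the maximum subarray sum and every (left, right) pair attaining it."""
    # Sum of A[i..j] for every non-empty subarray (fine for short inputs).
    sums = {(i, j): sum(A[i:j + 1])
            for i in range(len(A))
            for j in range(i, len(A))}
    best = max(sums.values())
    return best, [bounds for bounds, total in sums.items() if total == best]

print(all_max_subarrays([0, 1]))  # (1, [(0, 1), (1, 1)])
```

For A = [0, 1], the brute-force function in the question reports [0, 1, 1] while the recursive one returns [1, 1, 1]: the sums agree, but the bounds differ because max() keeps the first of the tied candidates. Longer random samples make such ties more likely, which fits the observation that the assertion mostly fails on larger inputs.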
