Middle partition in quicksort algorithm Python - python

I'm trying to implement a quicksort that partitions around the middle element. Below is my code:
def middlepartition(A, p, r):
    pvi = (p + r) // 2
    pv = A[pvi]
    while p < r:
        while p < len(A) and A[p] <= pv:
            p += 1
        while A[r] > pv:
            r -= 1
        if p < r:
            A[p], A[r] = A[r], A[p]
    A[r], A[pvi] = A[pvi], A[r]
    return r

def quicksort(A, p, r):
    if p < r:
        q = middlepartition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [0, 1, 5, 23, 0, 2, 5, 56, 79, 3, 65]
quicksort(A, 0, len(A)-1)
print(A)
But the code doesn't sort correctly, and I'm really confused. This is the output:
[0, 1, 2, 5, 23, 0, 3, 5, 56, 65, 79]
I seriously can't find the mistake...

This is your mistake:
A[r], A[pvi] = A[pvi], A[r]
(or at least a mistake - there could be other issues).
This bit of code occurs in variants of QuickSort that initially move (swap) the pivot element to the end of the array and then exclude that slot from the body of the partitioning loop. Once the partitioning loop is done, the initial pivot element is moved into place at the point where the pivot index has emerged.
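That's not what you're doing here, though. By the time control reaches the statement above, the swaps inside the loop may have moved the pivot value elsewhere, so pvi points to an arbitrary element and the final swap corrupts the partition. Remove that statement first, then re-test, keeping in mind that the loop by itself does not guarantee that A[r] holds the pivot on exit, which is what the recursion on (p, q - 1) and (q + 1, r) assumes.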
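For comparison, here is a minimal sketch of one common way to partition around the middle element, the Hoare scheme. It is a swapped-in technique rather than a patch of the code in the question, and the names hoare_partition and quicksort_middle are illustrative. Note that the returned index is a split point, not the pivot's final position, so the recursion covers (lo, q) and (q + 1, hi).

```python
def hoare_partition(A, lo, hi):
    """Partition A[lo..hi] (inclusive) around the middle element's value.

    Returns an index q such that every element of A[lo..q] is <= every
    element of A[q+1..hi]. The pivot itself may end up on either side.
    """
    pivot = A[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        # Advance i to the next element that is not smaller than the pivot.
        i += 1
        while A[i] < pivot:
            i += 1
        # Move j back to the next element that is not larger than the pivot.
        j -= 1
        while A[j] > pivot:
            j -= 1
        if i >= j:
            return j
        A[i], A[j] = A[j], A[i]


def quicksort_middle(A, lo, hi):
    """Sort A[lo..hi] in place."""
    if lo < hi:
        q = hoare_partition(A, lo, hi)
        quicksort_middle(A, lo, q)       # q is included on the left side
        quicksort_middle(A, q + 1, hi)


data = [0, 1, 5, 23, 0, 2, 5, 56, 79, 3, 65]
quicksort_middle(data, 0, len(data) - 1)
print(data)  # → [0, 0, 1, 2, 3, 5, 5, 23, 56, 65, 79]
```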

Related

Maximum Score from Two Arrays | Which Test Case is this approach missing?

Problem Statement
Given two integer arrays A and B of size N and M respectively. You begin with a score of 0. You want to perform exactly K operations. On the iᵗʰ operation (1-indexed), you will:
Choose one integer x from either the start or the end of any one array, A or B. Remove it from that array
Add x to score.
Return the maximum score after performing K operations.
Example
Input: A = [3,1,2], B = [2,8,1,9] and K=5
Output: 24
Explanation: An optimal solution is as follows:
Choose from end of B, add 9 to score. Remove 9 from B
Choose from start of A, add 3 to score. Remove 3 from A
Choose from start of B, add 2 to score. Remove 2 from B
Choose from start of B, add 8 to score. Remove 8 from B
Choose from end of A, add 2 to score. Remove 2 from A
The total score is 9+3+2+8+2 = 24
Constraints
1 ≤ N ≤ 6000
1 ≤ M ≤ 6000
1 ≤ A[i] ≤ 10^9
1 ≤ B[i] ≤ 10^9
1 ≤ K ≤ N+M
My Approach
The greedy approach (always taking the larger of the available end elements) fails here, because it cannot break ties when the best ends of both arrays are equal. That suggests trying all possible combinations; the sub-problems overlap, hence DP.
Here is a Python reprex of that approach:
A = [3,1,2]
N = len(A)
B = [2,8,1,9]
M = len(B)
K = 5
memo = {}

def solve(i,j, AL, BL):
    if (i,j,AL,BL) in memo:
        return memo[(i,j,AL,BL)]
    AR = (N-1)-(i-AL)
    BR = (M-1)-(j-BL)
    if AL>AR or BL>BR or i+j==K:
        return 0
    op1 = A[AL] + solve(i+1,j,AL+1,BL)
    op2 = B[BL] + solve(i,j+1,AL,BL+1)
    op3 = A[AR] + solve(i+1,j,AL,BL)
    op4 = B[BR] + solve(i,j+1,AL,BL)
    memo[(i,j,AL,BL)] = max(op1,op2,op3,op4)
    return memo[(i,j,AL,BL)]

print(solve(0,0,0,0))
In brief:
i is the number of operations performed on A so far.
j is the number of operations performed on B so far.
The total number of operations is thus i+j.
AL is the index in A to the left of which all integers have been used. Similarly, AR is the index in A to the right of which all integers have been used.
BL is the index in B to the left of which all integers have been used. Similarly, BR is the index in B to the right of which all integers have been used.
We try all possible combinations, take the maximum at each step, and memoize the answer.
Doubt
The code worked for several test cases but failed for a few. The verdict was Wrong Answer, so there was no Time Limit Exceeded, Memory Limit Exceeded, syntax error or runtime error; the error must be logical.
Can anyone help identify such test cases, and explain why this approach fails for them?
Examples where the posted code gives the wrong answer:
Example 1.
A = [1, 1, 1]
N = len(A)
B = [1, 1]
M = len(B)
K = 5
print(solve(0,0,0,0)) # Output: 4 (which is incorrect)
# Correct answer is 5
Example 2.
A = [1, 1]
B = [1]
N = len(A)
M = len(B)
K = 3
print(solve(0,0,0,0)) # Output: 2 (which is incorrect)
# Correct answer is 3
Alternative Code
def solve(A, B, k):
    def solve_(a_left, a_right, b_left, b_right, remaining_ops, sum_):
        '''
        a_left - left pointer into A
        a_right - right pointer in A
        b_left - left pointer into B
        b_right - right pointer into B
        remaining_ops - remaining operations
        sum_ - sum from previous operations
        '''
        if remaining_ops == 0:
            return sum_ # out of operations
        if a_left > a_right and b_left > b_right:
            return sum_ # both left and right are empty
        if (a_left, a_right, b_left, b_right) in cache:
            return cache[(a_left, a_right, b_left, b_right)]
        max_ = sum_ # init to current sum
        if a_left <= a_right: # A not empty
            max_ = max(max_,
                       solve_(a_left + 1, a_right, b_left, b_right, remaining_ops - 1, sum_ + A[a_left]), # Draw from left of A
                       solve_(a_left, a_right - 1, b_left, b_right, remaining_ops - 1, sum_ + A[a_right])) # Draw from right of A
        if b_left <= b_right: # B not empty
            max_ = max(max_,
                       solve_(a_left, a_right, b_left + 1, b_right, remaining_ops - 1, sum_ + B[b_left]), # Draw from left of B
                       solve_(a_left, a_right, b_left, b_right - 1, remaining_ops - 1, sum_ + B[b_right])) # Draw from right of B
        cache[(a_left, a_right, b_left, b_right)] = max_ # update cache
        return cache[(a_left, a_right, b_left, b_right)]
    cache = {}
    return solve_(0, len(A) - 1, 0, len(B) - 1, k, 0)
Tests
print(solve([3,1,2], [2,8,1,9], 5)) # Output 24
print(solve([1, 1, 1], [1, 1, 1], 5)) # Output 5
The approach fails because the recursive function stops exploring sub-problems as soon as either AL exceeds AR or BL exceeds BR.
It should stop and return 0 only when both conditions are true. If only one of them is true, the other array still has elements, so the sub-problem can still be solved by drawing from that array.
One quick optimization: when N+M==K, every element of both arrays must be taken, so the maximum score is simply sum(A)+sum(B).
Here is the corrected code:
A = [3,1,2]
B = [2,8,1,9]
K = 5
N, M = len(A), len(B)
memo = {}

def solve(i,j, AL, BL):
    if (i,j,AL,BL) in memo:
        return memo[(i,j,AL,BL)]
    AR = (N-1)-(i-AL)
    BR = (M-1)-(j-BL)
    if i+j==K or (AL>AR and BL>BR):
        return 0
    ans = -float('inf')
    if AL<=AR:
        ans = max(A[AL]+solve(i+1,j,AL+1,BL),A[AR]+solve(i+1,j,AL,BL),ans)
    if BL<=BR:
        ans = max(B[BL]+solve(i,j+1,AL,BL+1),B[BR]+solve(i,j+1,AL,BL),ans)
    memo[(i,j,AL,BL)] = ans
    return memo[(i,j,AL,BL)]

if N+M==K:
    print(sum(A)+sum(B))
else:
    print(solve(0,0,0,0))
[This answer was written with help from DarryIG's answer. It is posted separately to give code that mirrors the code in the question body; DarryIG's answer uses a different function prototype.]
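As a side note on the constraints: every operation removes from an end of one array, so the elements taken from A always form a prefix plus a suffix of A, and likewise for B. Under that observation the two arrays can be handled independently. The following is a minimal sketch of that idea, not the method of either answer above; max_score and best_by_count are hypothetical names.

```python
def max_score(A, B, K):
    """Maximum score from exactly K removals at the ends of A or B.

    Assumes 1 <= K <= len(A) + len(B), as in the problem constraints.
    """

    def best_by_count(arr):
        # prefix[l] is the sum of the first l elements of arr.
        prefix = [0]
        for x in arr:
            prefix.append(prefix[-1] + x)
        n, total = len(arr), prefix[-1]
        # best[t] is the largest sum obtainable by taking exactly t elements
        # from the ends: l from the left and t - l from the right.
        best = [0] * (n + 1)
        for t in range(1, n + 1):
            best[t] = max(
                prefix[l] + (total - prefix[n - (t - l)])
                for l in range(t + 1)
            )
        return best

    best_a = best_by_count(A)
    best_b = best_by_count(B)
    # t is the number of elements taken from A; the other K - t come from B.
    lo = max(0, K - len(B))
    hi = min(len(A), K)
    return max(best_a[t] + best_b[K - t] for t in range(lo, hi + 1))


print(max_score([3, 1, 2], [2, 8, 1, 9], 5))  # → 24
print(max_score([1, 1, 1], [1, 1], 5))        # → 5
print(max_score([1, 1], [1], 3))              # → 3
```

This sketch does work proportional to N² + M² instead of memoizing four-part states, which may matter for N and M up to 6000.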

Python: min heap swap count

Although there are lots of questions that have already been asked and answered regarding heap implementation in python, I was unable to find any practical clarifications about indexes. So, allow me to ask one more heap related question.
I'm trying to write code that transforms a list of values into a min-heap and records the swapped indexes. Here is what I have so far:
def mins(a, i, res):
    n = len(a)-1
    left = 2 * i + 1
    right = 2 * i + 2
    if not (i >= n//2 and i <= n):
        if (a[i] > a[left] or a[i] > a[right]):
            if a[left] < a[right]:
                res.append([i, left])
                a[i], a[left] = a[left], a[i]
                mins(a, left, res)
            else:
                res.append([i, right])
                a[i], a[right] = a[right], a[i]
                mins(a, right, res)

def heapify(a, res):
    n = len(a)
    for i in range(n//2, -1, -1):
        mins(a, i, res)
    return res

a = [7, 6, 5, 4, 3, 2]
res = heapify(a, [])
print(a)
print(res)
Expected output:
a = [2, 3, 4, 5, 6, 7]
res = [[2, 5], [1, 4], [0, 2], [2, 5]]
What I get:
a = [3, 4, 5, 6, 7, 2]
res = [[1, 4], [0, 1], [1, 3]]
It's clear that there is something wrong with indexation in the above script. Probably something very obvious, but I just don't see it. Help out!
You have some mistakes in your code:
In heapify, the last node that has a child (the first one the loop needs to visit) is at index (n - 2)//2, so use that as the start value of the range.
In mins, the condition not (i >= n//2 and i <= n) does not distinguish between a node with one child and a node with two. Also, i == n//2 should be allowed when n (as defined in mins) is odd, because that node then has a left child. It is much easier to compare left and right with n directly. It is also confusing that heapify defines n as len(a) while mins defines it as one less.
To avoid code duplication (the two blocks where you swap), introduce a new variable that is set to either left or right, depending on which one has the smaller value.
Here is a correction:
def mins(a, i, res):
    n = len(a)
    left = 2 * i + 1
    right = 2 * i + 2
    if left >= n:
        return
    child = left
    if right < n and a[right] < a[left]:
        child = right
    if a[child] < a[i]: # need to swap
        res.append([i, child])
        a[i], a[child] = a[child], a[i]
        mins(a, child, res)

def heapify(a, res):
    n = len(a)
    for i in range((n - 2)//2, -1, -1):
        mins(a, i, res)
    return res
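To check the correction, the sketch below reruns it on the list from the question and verifies the heap property with a small helper. The helper is_min_heap is hypothetical and not part of the answer. The swap list matches the expected one; the resulting list is a valid min-heap, although it is not fully sorted, since a heap only orders each parent relative to its children.

```python
def mins(a, i, res):
    # Sift a[i] down, recording each swap as [parent, child].
    n = len(a)
    left = 2 * i + 1
    right = 2 * i + 2
    if left >= n:
        return
    child = left
    if right < n and a[right] < a[left]:
        child = right
    if a[child] < a[i]:
        res.append([i, child])
        a[i], a[child] = a[child], a[i]
        mins(a, child, res)


def heapify(a, res):
    n = len(a)
    for i in range((n - 2) // 2, -1, -1):
        mins(a, i, res)
    return res


def is_min_heap(a):
    # Hypothetical helper: every element must be >= its parent.
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


a = [7, 6, 5, 4, 3, 2]
res = heapify(a, [])
print(a)               # → [2, 3, 5, 4, 6, 7]
print(res)             # → [[2, 5], [1, 4], [0, 2], [2, 5]]
print(is_min_heap(a))  # → True
```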

Problem implementing Merge Sort from pseudo code python

I'm trying to implement merge sort in Python based on the following pseudocode. I know there are many implementations out there, but I have not been able to find one that follows this pattern, with a for loop at the end as opposed to while loop(s). Setting the last values of the subarrays to infinity is also something I haven't seen in other implementations. NOTE: the pseudocode is 1-based, i.e. indexes start at 1, so I think my biggest issue is getting the indexing right. Right now it's just not sorting properly, and it's really hard to follow with the debugger. My implementation is at the bottom.
Current Output:
Input: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Merge Sort: [0, 0, 0, 3, 0, 5, 5, 5, 8, 0]
def merge_sort(arr, p, r):
    if p < r:
        q = (p + (r - 1)) // 2
        merge_sort(arr, p, q)
        merge_sort(arr, q + 1, r)
        merge(arr, p, q, r)

def merge(A, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    L = [0] * (n1 + 1)
    R = [0] * (n2 + 1)
    for i in range(0, n1):
        L[i] = A[p + i]
    for j in range(0, n2):
        R[j] = A[q + 1 + j]
    L[n1] = 10000000 #dont know how to do infinity for integers
    R[n2] = 10000000 #dont know how to do infinity for integers
    i = 0
    j = 0
    for k in range(p, r):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
    return A
First, decide whether the interval represented by p and r is open or closed at its endpoints. The pseudocode (whose for loops include the last index) establishes that the interval is closed at both endpoints: [p, r].
With that in mind, note that for k in range(p, r): never writes the last position, so the correct line is for k in range(p, r + 1):.
You can represent "infinity" in your problem by using the maximum element of A in the range [p, r] plus one. That will do the job.
You do not need to return the array A, because all changes are made in place through its reference.
Also, q = (p + (r - 1)) // 2 isn't wrong (because p < r), but the conventional formula is q = (p + r) // 2, since you want the integer midpoint of the two endpoints.
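Putting these points together, a minimal sketch of the closed-interval version could look as follows. The names merge_sort_closed and merge_closed are illustrative, and the sentinel follows the "maximum plus one" suggestion above, which assumes the elements are numbers.

```python
def merge_closed(A, p, q, r):
    """Merge sorted runs A[p..q] and A[q+1..r] (both inclusive) in place."""
    # Sentinel larger than every element in the range, standing in for infinity.
    sentinel = max(A[p:r + 1]) + 1
    L = A[p:q + 1] + [sentinel]
    R = A[q + 1:r + 1] + [sentinel]
    i = j = 0
    # r + 1 so that the last position of the closed interval is written too.
    for k in range(p, r + 1):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1


def merge_sort_closed(A, p, r):
    """Sort A[p..r] (inclusive, 0-based) in place."""
    if p < r:
        q = (p + r) // 2
        merge_sort_closed(A, p, q)
        merge_sort_closed(A, q + 1, r)
        merge_closed(A, p, q, r)


data = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
merge_sort_closed(data, 0, len(data) - 1)
print(data)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```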
Here is a rewrite of the algorithm with “modern” conventions, which are the following:
Indices are 0-based
The end of a range is not part of that range; in other words, intervals are closed on the left and open on the right.
This is the resulting code:
INF = float('inf')

def merge_sort(A, p=0, r=None):
    if r is None:
        r = len(A)
    if r - p > 1:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q, r)
        merge(A, p, q, r)

def merge(A, p, q, r):
    L = A[p:q]; L.append(INF)
    R = A[q:r]; R.append(INF)
    i = 0
    j = 0
    for k in range(p, r):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

A = [433, 17, 585, 699, 942, 483, 235, 736, 629, 609]
merge_sort(A)
print(A)
# → [17, 235, 433, 483, 585, 609, 629, 699, 736, 942]
Notes:
Python has a handy syntax for copying a subrange.
There is no int infinity in Python, but we can use the float one, because ints and floats can always be compared.
There is one difference between this algorithm and the original one, but it is irrelevant. Since the “midpoint” q does not belong to the left range, L is shorter than R when the sum of their lengths is odd. In the original algorithm, q belongs to L, and so L is the longer of the two in this case. This does not change the correctness of the algorithm, since it simply swaps the roles of L and R. If for some reason you need to avoid this difference, calculate q like this:
q = (p + r + 1) // 2
In mathematics, we represent all real numbers which are greater than or equal to i and smaller than j by [i, j). Notice the use of [ and ) brackets here. I have used i and j in the same way in my code to represent the region that I am dealing with currently.
The region [i, j) of an array covers all indexes (integer values) of this array which are greater than or equal to i and smaller than j. i and j are 0-based indexes. Ignore first_array and second_array for the time being.
Please notice, that i and j define the region of the array that I am dealing with currently.
Examples to understand this better
If your region spans over the whole array, then i should be 0 and j should be the length of array [0, length).
The region [i, i + 1) has only index i in it.
The region [i, i + 2) has index i and i + 1 in it.
def mergeSort(first_array, second_array, i, j):
    if j > i + 1:
        mid = (i + j + 1) // 2
        mergeSort(second_array, first_array, i, mid)
        mergeSort(second_array, first_array, mid, j)
        merge(first_array, second_array, i, mid, j)
I calculate the middle point as mid = (i + j + 1) // 2; mid = (i + j) // 2 works as well. The calculated mid splits the region of the array that I am currently dealing with into two smaller regions.
In line 4 of the code, MergeSort is called on the region [i, mid) and in line 5, MergeSort is called on the region [mid, j).
You can access the whole code here.
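The merge function used by mergeSort is not shown above. The following is a minimal sketch of how the two-array scheme could fit together, assuming that merge(src, dst, i, mid, j) merges the sorted runs src[i:mid] and src[mid:j] into dst[i:j], and that both arrays hold the same values on entry. These semantics are an assumption, not taken from the answer's full code.

```python
def merge(src, dst, i, mid, j):
    """Merge sorted regions [i, mid) and [mid, j) of src into dst[i:j]."""
    a, b = i, mid
    for k in range(i, j):
        # Take from the left run while it has elements and its head is
        # not larger than the head of the right run.
        if a < mid and (b >= j or src[a] <= src[b]):
            dst[k] = src[a]
            a += 1
        else:
            dst[k] = src[b]
            b += 1


def merge_sort(src, dst, i, j):
    """Leave region [i, j) sorted in dst.

    Assumes src and dst hold the same values in [i, j) on entry.
    """
    if j > i + 1:
        mid = (i + j) // 2
        # The roles swap on the way down, so the two halves end up sorted in src ...
        merge_sort(dst, src, i, mid)
        merge_sort(dst, src, mid, j)
        # ... and are then merged from src into dst.
        merge(src, dst, i, mid, j)


data = [5, 2, 9, 1, 5, 6]
scratch = data[:]  # second array, same length and contents
merge_sort(scratch, data, 0, len(data))
print(data)  # → [1, 2, 5, 5, 6, 9]
```

Alternating the two arrays this way avoids allocating temporary lists inside every merge step.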

number of subsequences whose sum is divisible by k

I just did a coding challenge for a company and was unable to solve this problem. Problem statement goes like:
Given an array of integers, find the number of subsequences in the array whose sum is divisible by k, where k is some positive integer.
For example, for [4, 1, 3, 2] and k = 3, the solution is 5. [[3], [1, 2], [4,3,2], [4,2], [1,3,2]] are the subsequences whose sum is divisible by k, i.e. (current_sum + nums[i]) % k == 0, where nums[i] is the current element in the array.
I tried to solve this recursively, however, I was unable to pass any test cases. My recursive code followed something like this:
def kSum(nums, k):
    def kSum(cur_sum, i):
        if i == len(nums): return 0
        sol = 1 if (cur_sum + nums[i]) % k == 0 else 0
        return sol + kSum(cur_sum, i+1) + kSum(cur_sum + nums[i], i+1)
    return kSum(0, 0)
What is wrong with this recursive approach, and how can I correct it? I'm not interested in an iterative solution, I just want to know why this recursive solution is wrong and how I can correct it.
Are you sure the test cases themselves are not the problem? For example:
[4, 1, 3, 2], k = 3
has
4+2 = 6, 1+2=3, 3, 1+2+3=6, 4+2+3 = 9
So, your function is right (it gives me 5) and I don't see a major problem with your function.
Here is a javascript reproduction of what you wrote with some console logs to help explain its behavior.
function kSum(nums, k) {
  let recursive_depth = 1;
  function _kSum(cur_sum, i) {
    recursive_depth++;
    if (i == nums.length) {
      recursive_depth--;
      return 0;
    }
    let sol = 0;
    if (((cur_sum + nums[i]) % k) === 0) {
      sol = 1;
      console.log(`Found valid sequence ending with ${nums[i]} with sum = ${cur_sum + nums[i]} with partial sum ${cur_sum} at depth ${recursive_depth}`);
    }
    const _kSum1 = _kSum(cur_sum, i+1);
    const _kSum2 = _kSum(cur_sum + nums[i], i+1);
    const res = sol + _kSum1 + _kSum2;
    recursive_depth--;
    return res;
  }
  return _kSum(0, 0);
}
let arr = [4, 1, 3, 2], k = 3;
console.log(kSum(arr, k));
I think this code actually gets the right answer. I'm not fluent in Python, but I might have inadvertently fixed a bug in your code by adding parentheses around (cur_sum + nums[i]) % k.
It seems to me that your solution is correct. It reaches the answer by trying all subsequences, which has 2^n complexity. We could formulate it recursively in an O(n*k) search space, although tabulating it could be more efficient. Let f(A, k, i, r) represent how many subsequences leave remainder r when their sum is divided by k, using elements up to A[i]. Then:
function f(A, k, i=A.length-1, r=0){
  // A[i] leaves remainder r
  // when divided by k
  const c = A[i] % k == r ? 1 : 0;
  if (i == 0)
    return c;
  return c +
    // All previous subsequences
    // whose sum leaves remainder r
    // when divided by k
    f(A, k, i - 1, r) +
    // All previous subsequences whose
    // sum when combined with A[i]
    // leaves remainder r when
    // divided by k
    f(A, k, i - 1, (k + r - A[i]%k) % k);
}
console.log(f([1,2,1], 3));
console.log(f([2,3,5,8], 5));
console.log(f([4,1,3,2], 3));
console.log(f([3,3,3], 3));
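Since the question is in Python, here is a minimal sketch of the same recurrence with memoization, which is what keeps the search space at n*k states. The function name count_divisible_subsequences is illustrative; functools.lru_cache is the standard-library memoization decorator.

```python
from functools import lru_cache


def count_divisible_subsequences(nums, k):
    """Count non-empty subsequences of nums whose sum is divisible by k."""

    @lru_cache(maxsize=None)
    def f(i, r):
        # Number of non-empty subsequences of nums[0..i] whose sum
        # leaves remainder r when divided by k.
        if i < 0:
            return 0
        alone = 1 if nums[i] % k == r else 0   # nums[i] on its own
        without = f(i - 1, r)                  # subsequences that skip nums[i]
        # Subsequences that end with nums[i]: the earlier part must supply
        # the remainder that nums[i] is missing.
        with_it = f(i - 1, (r - nums[i]) % k)
        return alone + without + with_it

    return f(len(nums) - 1, 0)


print(count_divisible_subsequences([4, 1, 3, 2], 3))  # → 5
print(count_divisible_subsequences([3, 3, 3], 3))     # → 7
```

The recursion is n levels deep, so very long inputs would need a tabulated version instead.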

Quickselect implementation in Python has indeterminant results

I was studying algorithms and tried implementing quickSelect as explained here and asked here on StackOverflow.
I am curious why the code returns seemingly random results (instead of the expected k-th smallest element) when I write quickSelect(A2, k - len(A) - len(A2)) instead of quickSelect(A2, k - (len(A) - len(A2))). I've pasted the whole code; feel free to comment/uncomment these two alternatives and run it. I thought expressions like k - len(A) - len(A2) are evaluated BEFORE being passed to the next stack level?
import random

def quickSelect(A, k):
    if not A:
        return
    pivot = random.choice(A)
    A1 = []
    A2 = []
    for i in A:
        if i < pivot:
            A1.append(i)
        else:
            A2.append(i)
    if k < len(A1):
        return quickSelect(A1, k)
    elif k > (len(A) - len(A2)):
        # commented code below gives wrong results
        # return quickSelect(A2, k - len(A) - len(A2))
        return quickSelect(A2, k - (len(A) - len(A2)))
    else:
        return pivot

myList = [54,26,93,17,77,31,44,55,20]
# ordered array looks like this
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
print(quickSelect(myList, 1))
print(quickSelect(myList, 2))
print(quickSelect(myList, 3))
print(quickSelect(myList, 4))
print(quickSelect(myList, 5))
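On the arithmetic itself: Python does evaluate the argument before the call, as the question assumes, but subtraction is left-associative, so k - len(A) - len(A2) means (k - len(A)) - len(A2), which is a different number from k - (len(A) - len(A2)). A small sketch with made-up lengths (n_all and n_right are illustrative stand-ins for len(A) and len(A2)):

```python
k = 5
n_all = 9     # stand-in for len(A)
n_right = 6   # stand-in for len(A2)

without_parens = k - n_all - n_right   # parsed as (k - n_all) - n_right
with_parens = k - (n_all - n_right)    # subtracts the size of the left part

print(without_parens)  # → -10
print(with_parens)     # → 2
print(without_parens == (k - n_all) - n_right)  # → True
```

With the parentheses missing, the recursive call receives an index that no longer refers to a position in A2.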
