Two quicksort implementations, differing comparisons

UPDATE:
I've been trying to calculate the number of comparisons made in a quicksort implementation (code below), but I noticed that the number of comparisons is only computed correctly when I take out the slice [:i - 1] in the line below:
left_comparisons, left_list = quick_sort(unsorted_list[:i - 1], l, i - 2)
I have two questions:
Why should the slice affect the total number of comparisons made?
Why does unsorted_list[l:r + 1] sometimes have a length of 0? Shouldn't this algorithm ensure that the base case always has 1 element in it?
def quick_sort(unsorted_list, l, r):
    if len(unsorted_list[l:r + 1]) <= 1:
        return 0, unsorted_list
    else:
        # choose_pivot(input_list) # TODO implement properly later. cd return input_list here.
        pivot = unsorted_list[l]
        i = l + 1
        num_comparisons = r - l
        for j in range(l + 1, r + 1):
            if unsorted_list[j] < pivot:
                temp = unsorted_list[i]
                unsorted_list[i] = unsorted_list[j]
                unsorted_list[j] = temp
                i += 1
        unsorted_list[l] = unsorted_list[i - 1]
        unsorted_list[i - 1] = pivot
        left_comparisons, left_list = quick_sort(unsorted_list[:i - 1], l, i - 2)
        right_comparisons, right_list = quick_sort(unsorted_list, i, len(unsorted_list) - 1)
        return left_comparisons + num_comparisons + right_comparisons, left_list[:i - 1] + [pivot] + right_list[i:]
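For comparison, here is a minimal in-place sketch. The name quick_sort_count is hypothetical, and the sketch assumes the quantity of interest is the same r - l comparisons per partition call that the question's num_comparisons uses. Unlike the code above, it never slices. Both recursive calls receive the same list with explicit inclusive bounds, and the right-hand call uses r, not len(unsorted_list) - 1, as its upper bound. Its base case also speaks to the second question. Whenever the pivot is the smallest or largest value in the range, one side of the partition is empty, so an empty range (r < l) is expected.

```python
def quick_sort_count(a, l, r):
    """Sort a[l..r] (both bounds inclusive) in place; return the comparison count."""
    # Fewer than two elements: nothing to partition. An empty range (r < l) is
    # expected whenever the pivot turns out to be the smallest or largest value.
    if r - l < 1:
        return 0
    pivot = a[l]  # first element as pivot, as in the question
    i = l + 1     # a[l+1 .. i-1] holds the elements smaller than the pivot
    for j in range(l + 1, r + 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]  # move the pivot into its final position
    comparisons = r - l              # one comparison per non-pivot element
    # Recurse on the SAME list with explicit bounds: no slicing, no len(...)
    comparisons += quick_sort_count(a, l, i - 2)
    comparisons += quick_sort_count(a, i, r)
    return comparisons


data = [3, 8, 2, 5, 1, 4, 7, 6]
count = quick_sort_count(data, 0, len(data) - 1)
print(count, data)
```

On an already sorted list of n elements, this first-element-pivot scheme makes n(n - 1)/2 comparisons. For example, [1, 2, 3, 4, 5] gives 10.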

Related

Maximum sum path in the matrix with a given starting point

I am learning to tackle a similar type of dynamic programming problem: finding a maximum path sum in a matrix.
I based my approach on the algorithm from the website below.
Source: Maximum path sum in matrix
The problem I am trying to solve is a little different from the one on the website.
The algorithm from the website uses max() to update each cell of the matrix with the best path sum that reaches it, which builds up the max path.
For example, given an array:
sample = [[110, 111, 108, 1],
          [9, 8, 7, 2],
          [4, 5, 10, 300],
          [1, 2, 3, 4]]
The best sum path is 111 + 7 + 300 + 4 = 422
In the example above, the path is free to start at any element of the first row.
My question is: what if I have to specify the starting point of the path? The value h is given as the column of the first element to start from.
For example, given the sample array above, if h = 0, we need to start at sample[0][h], so the best path would be
110 (our starting point) + 8 + 10 + 4 = 132
As you can see, the path can only move straight down or diagonally down to an adjacent column. So if we start at h = 0, some values, such as 300, cannot be reached.
Here is my messy attempt at solving this in O(N*M) time (N rows, M columns):
# Find max path given h as a starting point
def find_max_path_w_start(mat, h):
    res = mat[0][0]
    M = len(mat[0])
    N = len((mat))
    for i in range(1, N):
        res = 0
        for j in range(M):
            # Compute the adjacent sum of the adjacent values from h
            if i == 1:
                # If h is starting area, then compute the sum, find the max
                if j == h:
                    # All possible
                    if (h > 0 and h < M - 1):
                        mat[1][h + 1] += mat[0][h]
                        mat[1][h] += mat[0][h]
                        mat[1][h - 1] += mat[0][h]
                        print(mat)
                    # Diagonal right not possible
                    elif (h > 0):
                        mat[1][h] += mat[0][h]
                        mat[1][h - 1] += mat[0][h]
                    # Diagonal left not possible
                    elif (h < M - 1):
                        mat[1][h] += mat[0][h]
                        mat[1][h + 1] += mat[0][h]
                # Ignore value that has been filled.
                elif j == h + 1 or j == h - 1:
                    pass
                # Other elements that cannot reach, make it -1
                elif j > h + 1 or j < h - 1:
                    mat[i][j] = -1
            else:
                # Other elements that cannot reach, make it -1
                if j > h + 1 or j < h - 1:
                    mat[i][j] = -1
                else:
                    # When all paths are possible
                    if (j > 0 and j < M - 1):
                        mat[i][j] += max(mat[i - 1][j],
                                         max(mat[i - 1][j - 1],
                                             mat[i - 1][j + 1]))
                    # When diagonal right is not possible
                    elif (j > 0):
                        mat[i][j] += max(mat[i - 1][j],
                                         mat[i - 1][j - 1])
                    # When diagonal left is not possible
                    elif (j < M - 1):
                        mat[i][j] += max(mat[i - 1][j],
                                         mat[i - 1][j + 1])
            res = max(mat[i][j], res)
    return res
My approach is to store only the reachable values. For example, if we start at h = 0, i.e. at mat[0][h], we can only compute two sums: the current value plus the one directly below it (mat[1][h]), and the current value plus its diagonal right neighbour (mat[1][h + 1]). I mark every other value as -1 to flag it as unreachable.
This doesn't return the expected sum at the end.
Is my logic incorrect? Are there other values that I need to store to complete this?

You can set all elements of the first row except h to negative infinity, and then compute the answer as if there were no starting-point restriction.
For example, put this piece of code near the start of your function (after M is defined):
for i in range(M):
    if i != h:
        mat[0][i] = -1e100
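Putting that suggestion together with an unrestricted max-path DP gives something like the sketch below. The post does not show the unrestricted DP, so this version is an assumption. Each step moves straight down or diagonally down by one column, and the best total is read from the bottom row. The sketch uses math.inf rather than -1e100 and copies the matrix first. max_path_from is a hypothetical name.

```python
import math


def max_path_from(mat, h):
    """Max path sum starting at mat[0][h], moving down, down-left or down-right."""
    rows = [list(row) for row in mat]  # work on a copy; the caller's matrix is untouched
    M = len(rows[0])
    # Every other start cell becomes -infinity, so no path that begins there can win.
    for j in range(M):
        if j != h:
            rows[0][j] = -math.inf
    # Ordinary top-down DP with no starting-point restriction.
    for i in range(1, len(rows)):
        for j in range(M):
            lo, hi = max(j - 1, 0), min(j + 1, M - 1)
            rows[i][j] += max(rows[i - 1][k] for k in range(lo, hi + 1))
    return max(rows[-1])


sample = [[110, 111, 108, 1],
          [9, 8, 7, 2],
          [4, 5, 10, 300],
          [1, 2, 3, 4]]
print(max_path_from(sample, 0))  # 132
print(max_path_from(sample, 1))  # 422
```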

Here is a solution that works similarly to yours, but it only calculates path sums for matrix cells that are reachable from h.
def find_max_path_w_start(mat, h):
    M = len(mat[0])
    N = len(mat)
    for i in range(1, N):
        # `h - i` is the left-hand side of a triangle with `h` as the top point.
        # `max(..., 0)` ensures that it is at least 0 and inside the matrix.
        min_j = max(h - i, 0)
        # Similar to above, but the right-hand side of the triangle.
        max_j = min(h + i, M - 1)
        for j in range(min_j, max_j + 1):
            # min_k and max_k are the start and end indices of the points in the row above
            # which could potentially lead to a correct solution.
            # Generally, you want to iterate from `j - 1` up to `j + 1`,
            # however if at the edge of the triangle, do not take points from outside the triangle:
            # this leads to the `h - i + 1` and `h + i - 1`.
            # The `0` and `M - 1` prevent values outside the matrix being sampled.
            min_k = max(j - 1, h - i + 1, 0)
            max_k = min(j + 1, h + i - 1, M - 1)
            # Find the max of the possible path totals
            mat[i][j] += max(mat[i - 1][k] for k in range(min_k, max_k + 1))
    # Only sample from items in the bottom row which could be paths from `h`:
    # the bottom row is row N - 1, so the triangle spans h - (N - 1) .. h + (N - 1).
    return max(mat[-1][max(h - N + 1, 0):min(h + N - 1, M - 1) + 1])
sample = [[110, 111, 108, 1],
          [9, 8, 7, 2],
          [4, 5, 10, 300],
          [1, 2, 3, 4]]
print(find_max_path_w_start(sample, 0))  # prints 132

It's easy to build a bottom-up solution here. Start by thinking about the case with only one or two rows, then extend it; that makes the algorithm easy to understand.
Note: this modifies the original matrix instead of creating a new one. If you need to run the function multiple times on the same matrix, pass it a copy of the matrix each time.
def find_max_path_w_start(mat, h):
    M = len(mat[0])
    N = len(mat)
    # build solution bottom up: mat[i][j] becomes the best sum of a path from (i, j) to the last row
    for i in range(N - 2, -1, -1):
        for j in range(M):
            possible_values = [mat[i + 1][j]]  # straight down
            if j > 0:
                possible_values.append(mat[i + 1][j - 1])  # diagonal left
            if j < M - 1:
                possible_values.append(mat[i + 1][j + 1])  # diagonal right
            mat[i][j] += max(possible_values)
    return mat[0][h]
sample = [[110, 111, 108, 1],
          [9, 8, 7, 2],
          [4, 5, 10, 300],
          [1, 2, 3, 4]]
print(find_max_path_w_start(sample, 0))  # prints 132

Trying to find the optimal subset for the Greedy knapsack problem (python)

I think this is the correct algorithm for finding the optimal value, but now I need to find the subset of items that produces that value. Help would be greatly appreciated!
These were my directions:
Implement a greedy algorithm that arranges the items in the decreasing order of value to weight ratio (vi/wi for i = 1, 2, ..., n), then select the items in this order until the weight of the next item exceeds the remaining capacity (Note: In this greedy version, we stop right after the first item whose inclusion would exceed the knapsack capacity).
def greedy_knapsack(val, weight, W, n):
    # index = [0, 1, 2, ..., n - 1] for n items
    index = list(range(len(val)))
    # contains ratios of values to weight
    ratio = [v / w for v, w in zip(val, weight)]
    QuickSort(ratio, 0, len(ratio) - 1)
    max_value = 0
    for i in index:
        if weight[i] <= W:
            max_value += val[i]
            W -= weight[i]
        else:
            max_value += val[i] * W // weight[i]
            break
    return max_value
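Note that QuickSort(ratio, ...) receives only the list of ratios. The index list stays in its original order, so the loop still visits items as given. The else branch also adds a fractional item, even though the directions say to stop. The sketch below follows the directions as written and also records the chosen item indices, which is the subset asked about. greedy_knapsack_items is a hypothetical name, and the sketch sorts indices with the built-in sorted() instead of the question's QuickSort:

```python
def greedy_knapsack_items(val, weight, W):
    """Greedy by value/weight ratio that stops at the first item that does not fit.

    Returns (total_value, chosen_indices)."""
    # Sort the item indices, not just the ratios, so each ratio keeps its item.
    order = sorted(range(len(val)), key=lambda i: val[i] / weight[i], reverse=True)
    total, chosen = 0, []
    for i in order:
        if weight[i] > W:
            break  # per the directions: stop at the first item that would overflow
        total += val[i]
        W -= weight[i]
        chosen.append(i)
    return total, chosen


print(greedy_knapsack_items([60, 100, 120], [10, 20, 30], 50))  # (160, [0, 1])
```

With values [60, 100, 120], weights [10, 20, 30] and capacity 50, it returns (160, [0, 1]). The best 0/1 answer is 220, which is why the answer below recommends dynamic programming.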

Your greedy approach will fail in many cases.
One such trivial case:
weight = [5, 4, 3]
value = [5, 4, 3]
W = 7
Every item has the same value-to-weight ratio. A greedy algorithm that stops at the first item that does not fit will choose item 1 (value 5) and then stop, because item 2 no longer fits. The optimal answer is items 2 and 3, with a total value of 7.
You need a dynamic programming approach to solve this exactly. Keep the DP table of previous states so that you can walk back through it to reconstruct the solution and get the item list.
# Prints the items which are put in a
# knapsack of capacity W
def printknapSack(W, wt, val, n):
    K = [[0 for w in range(W + 1)]
         for i in range(n + 1)]
    # Build table K[][] in bottom
    # up manner
    for i in range(n + 1):
        for w in range(W + 1):
            if i == 0 or w == 0:
                K[i][w] = 0
            elif wt[i - 1] <= w:
                K[i][w] = max(val[i - 1]
                              + K[i - 1][w - wt[i - 1]],
                              K[i - 1][w])
            else:
                K[i][w] = K[i - 1][w]
    # stores the result of Knapsack
    res = K[n][W]
    print(res)
    w = W
    for i in range(n, 0, -1):
        if res <= 0:
            break
        # either the result comes from the
        # top (K[i-1][w]) or from (val[i-1]
        # + K[i-1][w-wt[i-1]]) as in Knapsack
        # table. If it comes from the latter
        # one, it means the item is included.
        if res == K[i - 1][w]:
            continue
        else:
            # This item is included.
            print(wt[i - 1])
            # Since this weight is included
            # its value is deducted
            res = res - val[i - 1]
            w = w - wt[i - 1]
# Driver code
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)
printknapSack(W, wt, val, n)  # prints 220, then the weights 30 and 20
ref: https://www.geeksforgeeks.org/printing-items-01-knapsack/
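If you need the chosen items as data rather than printed output, the same table can be walked back into a list. Here is a minimal sketch (knapsack_items is a hypothetical name) that returns the best value together with the 0-based indices of the chosen items:

```python
def knapsack_items(val, wt, W):
    """0/1 knapsack by dynamic programming; returns (best_value, chosen_indices)."""
    n = len(val)
    # K[i][w]: best value using only the first i items with capacity w
    K = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            K[i][w] = K[i - 1][w]  # skip item i - 1
            if wt[i - 1] <= w:     # or take it, if it fits
                K[i][w] = max(K[i][w], val[i - 1] + K[i - 1][w - wt[i - 1]])
    # Walk back: if the value changed when item i - 1 became available, it was taken.
    chosen, w = [], W
    for i in range(n, 0, -1):
        if K[i][w] != K[i - 1][w]:
            chosen.append(i - 1)
            w -= wt[i - 1]
    return K[n][W], chosen[::-1]


print(knapsack_items([60, 100, 120], [10, 20, 30], 50))  # (220, [1, 2])
```

For the driver data above it returns (220, [1, 2]), i.e. the items of weight 20 and 30.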

Merge Sort in Python Bug

I'm trying to implement a merge sort in Python. I completed a merge sort lesson on Khan Academy where they had me implement it in JavaScript, but I wanted to try and implement it in Python.
Lesson: https://www.khanacademy.org/computing/computer-science/algorithms#merge-sort
Here is my code:
from math import floor
def merge(array, p, q, r):
    left_array = []
    right_array = []
    k = p
    while (k < q):
        left_array.append(array[k])
        k += 1
    while (k < r):
        right_array.append(array[k])
        k += 1
    k = p
    i = 0
    j = 0
    while (i < len(left_array) and j < len(right_array)):
        if (left_array[i] <= right_array[j]):
            array[k] = left_array[i]
            k += 1
            i += 1
        else:
            array[k] = right_array[j]
            k += 1
            j += 1
    while (i < len(left_array)):
        array[k] = left_array[i]
        k += 1
        i += 1
    while (j < len(right_array)):
        array[k] = right_array[j]
        k += 1
        j += 1
    print("Merging", array)
def merge_sort(array, p, r):
    print("Splitting", array)
    if p < r:
        q = floor((p + r) / 2)
        merge_sort(array, p, q)
        merge_sort(array, q + 1, r)
        merge(array, p, q, r)
test3 = [3, 2, 1]
merge_sort(test3, 0, len(test3))
There's a bug somewhere in my code and I can't find it. I think it has to do with how I split the array, but I haven't been able to confirm this. Here is my output for the test at the bottom:
Splitting [3, 2, 1]
Splitting [3, 2, 1]
Splitting [3, 2, 1]
Splitting [3, 2, 1]
Merging [3, 2, 1]
Splitting [3, 2, 1]
Splitting [3, 2, 1]
Splitting [3, 2, 1]
Merging [3, 2, 1]
Merging [2, 1, 3]
I took the idea of adding print statements from here.
Any help is appreciated. Thank you!

Your code does not follow the convention of the lesson you linked to on whether the bounds are inclusive or exclusive. In the lesson they are inclusive, but in your code the upper bound is exclusive. As a result, when you have these two lines:
merge_sort(array, p, q)
merge_sort(array, q + 1, r)
the first sorts array[p] through array[q-1], the second sorts array[q+1] through array[r-1], and you end up completely skipping array[q].
I think you will find it easier to follow the conventions of the lesson and make both bounds inclusive. So modify your code, starting with
test3 = [3, 2, 1]
merge_sort(test3, 0, len(test3) - 1)
and go from there.
You can also clean up your code considerably by using Python slice notation. For example:
left_array = []
right_array = []
k = p
while (k < q):
    left_array.append(array[k])
    k += 1
while (k < r):
    right_array.append(array[k])
    k += 1
can be simplified to
left_array = array[p:q]
right_array = array[q:r]
although, as I stated, you'll probably want to start using inclusive indices.
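Here is a minimal sketch of the code rewritten with inclusive bounds throughout, as suggested above. It uses slices for the copies and integer division (//) instead of math.floor, and the print calls used for debugging are left out:

```python
def merge(array, p, q, r):
    """Merge the sorted runs array[p..q] and array[q+1..r] (inclusive bounds)."""
    left = array[p:q + 1]       # a slice excludes its end index, hence the + 1
    right = array[q + 1:r + 1]
    i = j = 0
    k = p
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            array[k] = left[i]
            i += 1
        else:
            array[k] = right[j]
            j += 1
        k += 1
    # At most one run has items left; they are already in order.
    array[k:r + 1] = left[i:] + right[j:]


def merge_sort(array, p, r):
    """Sort array[p..r] in place, with both bounds inclusive."""
    if p < r:
        q = (p + r) // 2
        merge_sort(array, p, q)
        merge_sort(array, q + 1, r)
        merge(array, p, q, r)


test3 = [3, 2, 1]
merge_sort(test3, 0, len(test3) - 1)
print(test3)  # [1, 2, 3]
```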

Unable to implement a dynamic programming table algorithm in python

I am having problems creating a table in Python. Basically, I want to build a table that tells me, for every number, whether I can use it to break down another number (it's the table algorithm from the accepted answer to "Can brute force algorithms scale?"). Here's the pseudocode:
for i = 1 to k
    for z = 0 to sum:
        for c = 1 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true
Here's the python implementation I have:
from collections import defaultdict
data = [1, 2, 4]
target_sum = 10
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T = defaultdict(bool)  # all values are False by default
T[0, 0] = True  # base case
for i, x in enumerate(data):  # i is index, x is data[i]
    for s in range(target_sum + 1):  # set the range one higher than sum to include sum itself
        for c in range(s // x + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True
# query area
target_result = 1
for node in T:
    if node[0] == target_result:
        print(node, ':', T[node])
What I expect is that if target_result is set to 8, the output shows how each item in the list data can be used to break that number down. For 8, the numbers 1, 2 and 4 all work, so I expect them all to be True, but this program is making everything True. For example, 1 should only be breakable by 1 (not by 2 or 4), but when I run it with target_result = 1, I get:
(1, 2) : True
(1, 0) : False
(1, 3) : True
(1, 1) : True
Can anyone help me understand what's wrong with the code? Or perhaps I am not understanding the algorithm posted in the answer I am referring to.
(Note: I could be completely wrong, but I learned that defaultdict creates an entry even when the key is not there, and if the entry exists the algorithm turns it to True. Maybe that's the problem; I'm not sure. I tried to follow that line of thought, but it didn't work for me because it seems to break the overall implementation.)
Thanks!

The code works if you print the solution using RecursivelyListAllThatWork():
coeff = [0] * len(data)
def RecursivelyListAllThatWork(k, sum):  # Using last k variables, make sum
    # Base case: if we've assigned all the variables correctly, list this solution.
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k - 1]
    # Recursive step: try all coefficients, but only if they work.
    for c in range(sum // x_k + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k - 1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k - 1] = 0
RecursivelyListAllThatWork(len(data), target_sum)
Output
10*1 + 0*2 + 0*4
8*1 + 1*2 + 0*4
6*1 + 2*2 + 0*4
4*1 + 3*2 + 0*4
2*1 + 4*2 + 0*4
0*1 + 5*2 + 0*4
6*1 + 0*2 + 1*4
4*1 + 1*2 + 1*4
2*1 + 2*2 + 1*4
0*1 + 3*2 + 1*4
2*1 + 0*2 + 2*4
0*1 + 1*2 + 2*4

As a side note, you don't really need a defaultdict for what you're doing; a normal dict plus .get() works:
data = [1, 2, 4]
target_sum = 10
T = {}
T[0, 0] = True
for i, x in enumerate(data):
    for s in range(target_sum + 1):  # xrange on python-2.x
        for c in range(s // x + 1):
            if T.get((s - c * x, i)):
                T[s, i + 1] = True
If you're using J.S.'s solution, don't forget to change:
if T[sum - c * x_k, k - 1]:
to:
if T.get((sum - c * x_k, k - 1)):

Your code is right.
1 = 1 * 1 + 0 * 2, so T[1, 2] is True.
1 = 1 * 1 + 0 * 2 + 0 * 4, so T[1, 3] is True.
As requested in the comments, a short explanation of the algorithm:
It calculates all numbers from 0 to target_sum that can be represented as a sum of (non-negative) multiples of some of the numbers in data.
If T[s, i] is True, then s can be represented in this way using only the first i elements of data.
At the start, 0 can be represented as the empty sum, thus T[0, 0] is True. (This step may seem a little technical.)
Let x be the (i+1)-th element of data. For each number s, the algorithm then checks whether s is the sum of some multiple of x and a number that has a representation using only the first i elements of data. Such a number exists exactly when T[s - c * x, i] is True for some c. If so, s can be represented using only the first i+1 elements of data.
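The explanation above can be checked directly. The sketch below builds the same table with a plain dict (build_table is a hypothetical name). It shows that T[1, i] is True for every i >= 1, because 1 is representable using only the first element. Asking which elements can appear in a decomposition is a different query. The hypothetical helper items_usable_for answers it under one assumption: "usable" means "appears with a coefficient of at least 1 in some representation". Under that reading, x is usable for target exactly when target - x is representable with all of data.

```python
def build_table(data, target_sum):
    """T[s, i] is True when s is a sum of non-negative multiples of data[:i]."""
    T = {(0, 0): True}  # base case: 0 is the empty sum
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s // x + 1):
                if T.get((s - c * x, i)):
                    T[s, i + 1] = True
                    break  # one witness is enough
    return T


def items_usable_for(data, target):
    """Hypothetical helper: elements of data that appear (coefficient >= 1)
    in at least one representation of target using all of data."""
    T = build_table(data, target)
    k = len(data)
    # target = x + rest has such a representation iff rest = target - x is representable
    return [x for x in data if x <= target and T.get((target - x, k), False)]


data = [1, 2, 4]
T = build_table(data, 10)
print([T.get((1, i), False) for i in range(len(data) + 1)])  # [False, True, True, True]
print(items_usable_for(data, 1))  # [1]
print(items_usable_for(data, 8))  # [1, 2, 4]
```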

how to diff / align Python lists using arbitrary matching function?

I'd like to align two lists in a similar way to what difflib.Differ would do except I want to be able to define a match function for comparing items, not just use string equality, and preferably a match function that can return a number between 0.0 and 1.0, not just a boolean.
So, for example, say I had the two lists:
L1 = [('A', 1), ('B', 3), ('C', 7)]
L2 = ['A', 'b', 'C']
and I want to be able to write a match function like this:
def match(item1, item2):
    if item1[0] == item2:
        return 1.0
    elif item1[0].lower() == item2.lower():
        return 0.5
    else:
        return 0.0
and then do:
d = Differ(match_func=match)
d.compare(L1, L2)
and have it diff using the match function. Like difflib, I'd rather the algorithm gave more intuitive Ratcliff-Obershelp type results rather than a purely minimal Levenshtein distance.

I just wrote this implementation of Needleman-Wunsch and it seems to do what I want:
def nw_align(a, b, replace_func, insert, delete):
    ZERO, LEFT, UP, DIAGONAL = 0, 1, 2, 3
    len_a = len(a)
    len_b = len(b)
    matrix = [[(0, ZERO) for x in range(len_b + 1)] for y in range(len_a + 1)]
    # First column: a[:i] aligned against nothing, reached by UP moves (score `delete` each)
    for i in range(len_a + 1):
        matrix[i][0] = (delete * i, UP)
    # First row: b[:j] aligned against nothing, reached by LEFT moves (score `insert` each)
    for j in range(len_b + 1):
        matrix[0][j] = (insert * j, LEFT)
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            replace = replace_func(a[i - 1], b[j - 1])
            matrix[i][j] = max([
                (matrix[i - 1][j - 1][0] + replace, DIAGONAL),
                (matrix[i][j - 1][0] + insert, LEFT),
                (matrix[i - 1][j][0] + delete, UP)
            ])
    i, j = len_a, len_b
    align_a = ""
    align_b = ""
    while (i, j) != (0, 0):
        if matrix[i][j][1] == DIAGONAL:
            align_a += a[i - 1]
            align_b += b[j - 1]
            i -= 1
            j -= 1
        elif matrix[i][j][1] == LEFT:
            align_a += "-"
            align_b += b[j - 1]
            j -= 1
        else:  # UP
            align_a += a[i - 1]
            align_b += "-"
            i -= 1
    return align_a[::-1], align_b[::-1]
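The implementation above builds its result by string concatenation (align_a += a[i - 1]). That only works when the items are characters: with tuples such as those in L1, it raises a TypeError. The variant below returns a list of aligned pairs, with None marking a gap. It assumes a single constant gap score; gap is an assumed parameter, whereas the original takes separate insert and delete scores. It also stores back-pointers instead of packing direction codes into tuples. It is still Needleman-Wunsch, not a Ratcliff-Obershelp-style matcher. nw_align_pairs is a hypothetical name.

```python
def nw_align_pairs(a, b, match, gap=-0.5):
    """Needleman-Wunsch global alignment of two arbitrary sequences.

    match(x, y) returns a similarity score (for example 0.0 to 1.0);
    gap is the score for aligning an item against nothing (an assumption).
    Returns a list of (item_from_a, item_from_b) pairs, with None for a gap.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = score[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = score[0][j - 1] + gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + match(a[i - 1], b[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),    # a[i-1] against a gap
                (score[i][j - 1] + gap, "left"),  # b[j-1] against a gap
            ]
            # max() returns the first best candidate, so ties prefer "diag"
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the bottom-right corner.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]


# Usage with the match function and lists from the question
def match(item1, item2):
    if item1[0] == item2:
        return 1.0
    elif item1[0].lower() == item2.lower():
        return 0.5
    return 0.0


L1 = [('A', 1), ('B', 3), ('C', 7)]
L2 = ['A', 'b', 'C']
for left, right in nw_align_pairs(L1, L2, match):
    print(left, right)
```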

I recently ran across a discussion of an algorithm called patience diff that sounds rather simple. You could try implementing that yourself, and then of course you can have it use whatever comparison algorithm you like.
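Here is a rough sketch of the core of patience diff, under simplifying assumptions. Items are compared by equality of a hashable key, so it does not use a fractional match score. Only items that occur exactly once on both sides become anchors. The anchors are filtered with patience sorting (bisect) down to the longest subsequence that is increasing on both sides, and the algorithm then recurses into the gaps between anchors. Real implementations also fall back to another diff algorithm when a region has no unique anchors; that fallback is omitted here. patience_matches and _longest_increasing are hypothetical names.

```python
from bisect import bisect_left
from collections import Counter


def _longest_increasing(pairs):
    """Patience sorting: longest subsequence of pairs whose second
    coordinate increases (pairs arrive sorted by the first coordinate)."""
    tails, tail_idx, prev = [], [], [None] * len(pairs)
    for k, (_, j) in enumerate(pairs):
        pos = bisect_left(tails, j)  # leftmost pile whose top is >= j
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        prev[k] = tail_idx[pos - 1] if pos > 0 else None
    out = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:
        out.append(pairs[k])
        k = prev[k]
    return out[::-1]


def patience_matches(a, b, key=lambda x: x):
    """Index pairs (i, j) that this simplified patience diff keeps as unchanged."""

    def recurse(a_lo, a_hi, b_lo, b_hi):
        count_a = Counter(key(x) for x in a[a_lo:a_hi])
        count_b = Counter(key(y) for y in b[b_lo:b_hi])
        # Anchors are keys that occur exactly once on each side of this range.
        pos_b = {key(b[j]): j for j in range(b_lo, b_hi) if count_b[key(b[j])] == 1}
        pairs = [(i, pos_b[key(a[i])]) for i in range(a_lo, a_hi)
                 if count_a[key(a[i])] == 1 and key(a[i]) in pos_b]
        anchors = _longest_increasing(pairs)
        if not anchors:
            return []  # a full implementation falls back to another diff here
        result, i0, j0 = [], a_lo, b_lo
        for i, j in anchors:
            result += recurse(i0, i, j0, j)  # recurse on the gap before each anchor
            result.append((i, j))
            i0, j0 = i + 1, j + 1
        return result + recurse(i0, a_hi, j0, b_hi)

    return recurse(0, len(a), 0, len(b))


print(patience_matches(list("abcd"), list("acbd")))
```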

Develop Reference
