Merge sort to count split inversions in Python - python

I am trying to use mergesort--which I get--to count the number of split inversions in a list (that is, where an element in the first half of the unsorted list should appear after a given element in the second half of the unsorted list; for example [3 2 1 4] would contain the split inversion (3, 1), but not (3, 2) as 3 and 2 are both in the first half). When I get to the final print statement, I am getting the answer I expect--in this case 9--but the return value is all wonky since it's returning the split value through the recursion. I've tried all sorts of combinations of indexing to no avail. Any help? (using Python 2.7)
(For the record, this is a Coursera homework problem, but I'm just learning for fun--no one's grading this other than me.)
def mergesort(lst):
'''Recursively divides list in halves to be sorted'''
if len(lst) is 1:
return lst
middle = int(len(lst)/2)
left = mergesort(lst[:middle])
right = mergesort(lst[middle:])
sortedlist = merge(left, right)
return sortedlist
def merge(left, right):
'''Subroutine of mergesort to sort split lists. Also returns number
of split inversions (i.e., each occurence of a number from the sorted second
half of the list appearing before a number from the sorted first half)'''
i, j = 0, 0
splits = 0
result = []
while i < len(left) and j < len(right):
if left[i] < right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
splits += len(left[i:])
result += left[i:]
result += right[j:]
print result, splits
return result, splits
print mergesort([7,2,6,4,5,1,3,8])

Modify your mergesort function to disregard the intermediate splits.
def mergesort(lst):
'''Recursively divides list in halves to be sorted'''
if len(lst) == 1:
return lst, 0
middle = len(lst)/2
left = mergesort(lst[:middle])[0] # Ignore intermediate splits
right = mergesort(lst[middle:])[0] # Ignore intermediate splits
sortedlist, splits = merge(left, right)
return sortedlist, splits

There's a few things wrong in your code:
Don't int() the result of len() / 2. If you are using Python3, I'd directly use integer division with the // operator.
The comparison in the first line of mergesort() is wrong. Firstly, do not use is to compare for equality. The is operator is only intended for identity. If you have two different integers which have the same value, they are equal but not identical. For small integers, I believe that at least some Python dialects will intern the value, hence your comparison works, but you are relying on something that isn't guaranteed. Still, using == also doesn't work, because you forgot the case of the empty list.
I guess your actual problem ("wonky return value") is caused by the fact that you return two values that you store (as a tuple) under a single name and which you then pass as parameters to recursive merge() calls.

Since, in your code merge returns a pair, mergesort must also return a pair.
Enable to get the total split inversion in the list you must add the left half, right half and merge split inversions.
Here is the modification I have made in your code.
def mergesort(lst):
'''Recursively divides list in halves to be sorted'''
if len(lst) == 1:
return lst, 0
middle = len(lst)/2
left, s1 = mergesort(lst[:middle])[0] # Ignore intermediate splits
right, s2 = mergesort(lst[middle:])[0] # Ignore intermediate splits
sortedlist, s3 = merge(left, right)
return sortedlist, (s1+s2+s3)`

Related

Ceiling of the element in sorted array

Hi I am doing DSA problems and found a problem called as ceiling of the element in sorted array. In this problem there is a sorted array and if the target element is present in the sorted array return the target. If the target element is not found in the sorted array we need to return the smallest element which is greater than target. I have written the code and also done some test cases but need to check if everything works correctly. This problem is not there on leetcode where I could run it with many different cases. Need suggestion/feedback if the problem is solved in the correct way and if it would give correct results in all cases
class Solution:
#My approch
def smallestNumberGreaterThanTarget(self, nums, target):
start = 0
end = len(nums)-1
if target > nums[end]:
return -1
while start <= end:
mid = start + (end-start)//2
if nums[mid] == target:
return nums[mid]
elif nums[mid] < target:
if nums[mid+1] >= target:
return nums[mid+1]
start = mid + 1
else:
end = mid-1
return nums[start]
IMO, the problem can be solved in a simpler way, with only one test inside the main loop. The figure below shows a partition of the real line, in subsets associated to the values in the array.
First, we notice that for all values above the largest, there is no corresponding element, and we will handle this case separately.
Now we have exactly N subsets left, and we can find the right one by a dichotomic search among these subsets.
if target > nums[len(nums)-1]:
return None
s, e= 0, len(nums);
while e > s:
m= e + ((s - e) >> 1);
if target > nums[m]:
s= m+1
else:
e= m
return s
We can formally prove the algorithm using the invariant nums[s-1] < target <= nums[e], with the fictional convention nums[-1] = -∞. In the end, we have the bracketing nums[s-1] < target <= nums[s].
The code errors out with an index out-of-range error for the empty list (though this may not be necessary because you haven't specified the problem constraints).
A simple if guard at the top of the function can fix this:
if not nums:
return -1
Otherwise, it seems fine to me. But if you're still not sure whether or not your algorithm works, you can always do random testing (e.g. create a linear search version of the algorithm and then randomly generate inputs to both algorithms, and then see if there's any difference).
Here's a one-liner that you can test against:
input_list = [0, 1, 2, 3, 4]
target = 0
print(next((num for num in input_list if num >= target), -1))

Quicksort - Median of Three with Edge Cases

I am posting for help on a Quicksort median of three problem. I am required to implement a partition that uses a pivot that is the median of three elements in the input list (start, middle, end) while accounting for some edges cases.
In brief, I need to implement a method choose_median() that returns the index of the median element (the chosen pivot); print the index of the pivot chosen at the previous step; create a partition()method so that it can work with the chosen pivot; print the list after the first partition.
I've been having trouble with this case for even lists:
"Note that if the size of a list is even, there are two ways to choose a median element. To avoid ambiguity, choose an element with a smaller index as a median in this case."
This is deceptively hard, I've been stumped for a while. Especially on lists with length 2 or less.
I have a method for the partition that works for cases if given the appropriate pivot:
def partition(arr, pivot):
less, equal, greater = [], [], []
for val in arr:
if val < pivot: less.append(val)
if val == pivot: equal.append(val)
if val > pivot: greater.append(val)
return less+equal+greater, pivot
I could also change this function to work with the pivot too.
def partition(lst, pivot, start, end):
j = start
for i in range(start + 1, end + 1):
if lst[i] <= lst[start]:
j += 1
lst[i], lst[j] = lst[j], lst[i]
lst[start], lst[j] = lst[j], lst[start]
return j
Choosing the median with odd number lists are straightforward.
Also, I've seen/know how to implement the implementations here:
Python: Quicksort with median of three
Any help would be appreciated.
You seem to make this harder than it really is. First of all, I think you confused your terminology of "median value" and "middle element".
The middle element of the list, rounding down, is simply lst[(len(lst) - 1)//2]: your index is the list length, minus 1, integer divide by 2.
Therefore, your pivot selection is easy: take the three indicated elements, sort them, and return the middle element.
def choose_pivot(lst):
return sorted( [lst[0],
lst[-1],
lst[(len(lst)-1) //2]
])[1]

recursion vs iteration time complexity

Could anyone explain exactly what's happening under the hood to make the recursive approach in the following problem much faster and efficient in terms of time complexity?
The problem: Write a program that would take an array of integers as input and return the largest three numbers sorted in an array, without sorting the original (input) array.
For example:
Input: [22, 5, 3, 1, 8, 2]
Output: [5, 8, 22]
Even though we can simply sort the original array and return the last three elements, that would take at least O(nlog(n)) time as the fastest sorting algorithm would do just that. So the challenge is to perform better and complete the task in O(n) time.
So I was able to come up with a recursive solution:
def findThreeLargestNumbers(array, largest=[]):
if len(largest) == 3:
return largest
max = array[0]
for i in array:
if i > max:
max = i
array.remove(max)
largest.insert(0, max)
return findThreeLargestNumbers(array, largest)
In which I kept finding the largest number, removing it from the original array, appending it to my empty array, and recursively calling the function again until there are three elements in my array.
However, when I looked at the suggested iterative method, I composed this code:
def findThreeLargestNumbers(array):
sortedLargest = [None, None, None]
for num in array:
check(num, sortedLargest)
return sortedLargest
def check(num, sortedLargest):
for i in reversed(range(len(sortedLargest))):
if sortedLargest[i] is None:
sortedLargest[i] = num
return
if num > sortedLargest[i]:
shift(sortedLargest, i, num)
return
def shift(array, idx, element):
if idx == 0:
array[0] = element
return array
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
return array
Both codes passed successfully all the tests and I was convinced that the iterative approach is faster (even though not as clean..). However, I imported the time module and put the codes to the test by providing an array of one million random integers and calculating how long each solution would take to return back the sorted array of the largest three numbers.
The recursive approach was way much faster (about 9 times faster) than the iterative approach!
Why is that? Even though the recursive approach is traversing the huge array three times and, on top of that, every time it removes an element (which takes O(n) time as all other 999 elements would need to be shifted in the memory), whereas the iterative approach is traversing the input array only once and yes making some operations at every iteration but with a very negligible array of size 3 that wouldn't even take time at all!
I really want to be able to judge and pick the most efficient algorithm for any given problem so any explanation would tremendously help.
Advice for optimization.
Avoid function calls. Avoid creating temporary garbage. Avoid extra comparisons. Have logic that looks at elements as little as possible. Walk through how your code works by hand and look at how many steps it takes.
Your recursive code makes only 3 function calls, and as pointed out elsewhere does an average of 1.5 comparisons per call. (1 while looking for the min, 0.5 while figuring out where to remove the element.)
Your iterative code makes lots of comparisons per element, calls excess functions, and makes calls to things like sorted that create/destroy junk.
Now compare with this iterative solution:
def find_largest(array, limit=3):
if len(array) <= limit:
# Special logic not needed.
return sorted(array)
else:
# Initialize the answer to values that will be replaced.
min_val = min(array[0:limit])
answer = [min_val for _ in range(limit)]
# Now scan for smallest.
for i in array:
if answer[0] < i:
# Sift elements down until we find the right spot.
j = 1
while j < limit and answer[j] < i:
answer[j-1] = answer[j]
j = j+1
# Now insert.
answer[j-1] = i
return answer
There are no function calls. It is possible that you can make up to 6 comparisons per element (verify that answer[0] < i, verify that (j=1) < 3, verify that answer[1] < i, verify that (j=2) < 3, verify that answer[2] < i, then find that (j=3) < 3 is not true). You will hit that worst case if array is sorted. But most of the time you only do the first comparison then move to the next element. No muss, no fuss.
How does it benchmark?
Note that if you wanted the smallest 100 elements, then you'd find it worthwhile to use a smarter data structure such as a heap to avoid the bubble sort.
I am not really confortable with python, but I have a different approach to the problem for what it's worth.
As far as I saw, all solutions posted are O(NM) where N is the length of the array and M the length of the largest elements array.
Because of your specific situation whereN >> M you could say it's O(N), but the longest the inputs the more it will be O(NM)
I agree with #zvone that it seems you have more steps in the iterative solution, which sounds like an valid explanation to your different computing speeds.
Back to my proposal, implements binary search O(N*logM) with recursion:
import math
def binarySearch(arr, target, origin = 0):
"""
Recursive binary search
Args:
arr (list): List of numbers to search in
target (int): Number to search with
Returns:
int: index + 1 from inmmediate lower element to target in arr or -1 if already present or lower than the lowest in arr
"""
half = math.floor((len(arr) - 1) / 2);
if target > arr[-1]:
return origin + len(arr)
if len(arr) == 1 or target < arr[0]:
return -1
if arr[half] < target and arr[half+1] > target:
return origin + half + 1
if arr[half] == target or arr[half+1] == target:
return -1
if arr[half] < target:
return binarySearch(arr[half:], target, origin + half)
if arr[half] > target:
return binarySearch(arr[:half + 1], target, origin)
def findLargestNumbers(array, limit = 3, result = []):
"""
Recursive linear search of the largest values in an array
Args:
array (list): Array of numbers to search in
limit (int): Length of array returned. Default: 3
Returns:
list: Array of max values with length as limit
"""
if len(result) == 0:
result = [float('-inf')] * limit
if len(array) < 1:
return result
val = array[-1]
foundIndex = binarySearch(result, val)
if foundIndex != -1:
result.insert(foundIndex, val)
return findLargestNumbers(array[:-1],limit, result[1:])
return findLargestNumbers(array[:-1], limit,result)
It is quite flexible and might be inspiration for a more elaborated answer.
The recursive solution
The recursive function goes through the list 3 times to fins the largest number and removes the largest number from the list 3 times.
for i in array:
if i > max:
...
and
array.remove(max)
So, you have 3×N comparisons, plus 3x removal. I guess the removal is optimized in C, but there is again about 3×(N/2) comparisons to find the item to be removed.
So, a total of approximately 4.5 × N comparisons.
The other solution
The other solution goes through the list only once, but each time it compares to the three elements in sortedLargest:
for i in reversed(range(len(sortedLargest))):
...
and almost each time it sorts the sortedLargest with these three assignments:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
So, you are N times:
calling check
creating and reversing a range(3)
accessing sortedLargest[i]
comparing num > sortedLargest[i]
calling shift
comparing idx == 0
and about 2×N/3 times doing:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
and N/3 times array[0] = element
It is difficult to count, but that is much more than 4.5×N comparisons.

Find pairs of numbers that add to a certain value?

I have a function match that takes in a list of numbers and a target number and I want to write a function that finds within the array two numbers that add to that target.
Here is my approach:
>>> def match(values, target=3):
... for i in values:
... for j in values:
... if j != i:
... if i + j == target:
... return print(f'{i} and {j}')
... return print('no matching pair')
Is this solution valiant? Can it be improved?
The best approach would result in O(NlogN) solution.
You sort the list, this will cost you O(NlogN)
Once the list is sorted you get two indices, the former points to the first element, the latter -- to the latest element and you check to see if the sum of the elements matches whatever is your target. If the sum is above the target, you move the upper index down, if the sum is below the target -- you move the lower index up. Finish when the upper index is equal to the lower index. This operation is linear and can be done in O(N) time.
All in all, you have O(NlogN) for the sorting and O(N) for the indexing, bringing the complexity of the whole solution to O(NlogN).
There is room for improvement. Right now, you have a nested loop. Also, you do not return when you use print.
As you iterate over values, you are getting the following:
values = [1, 2, 3]
target = 3
first_value = 1
difference: 3 - 1 = 2
We can see that in order for 1 to add up to 3, a 2 is required. Rather than iterating over the values, we can simply ask 2 in values.
def match(values, target):
values = set(values)
for value in values:
summand = target - value
if summand in values:
break
else:
print('No matching pair')
print(f'{value} and {summand}')
Edit: Converted values to a set since it has handles in quicker than if it were looking it up in a list. If you require the indices of these pairs, such as in the LeetCode problem you should not convert it to a set, since you will lose the order. You should also use enumerate in the for-loop to get the indices.
Edit: summand == value edge case
def match(values, target):
for i, value in enumerate(values):
summand = target - value
if summand in values[i + 1:]:
break
else:
print('No matching pair')
return
print(f'{value} and {summand}')

Simple selection sort with repeated elements?

Given a list x, I want to sort it with selection sort, and then count the number of swaps made within the sort. So I came out with something like this:
count=0
a=0
n=len(x)
while (n-a)>0:
#please recommend a better way to swap.
i = (min(x[a:n]))
x[i], x[a] = x[a], x[i]
a += 1
#the count must still be there
count+=1
print (x)
Could you help me to find a way to manage this better? It doesn't work that well.
The problem is NOT about repeated elements. Your code doesn't work for lists with all elements distinct, either. Try x = [2,6,4,5].
i = (min(x[a:n]))
min() here gets the value of the minimum element in the slice, and then you use it as an index, that doesn't make sense.
You are confusing the value of an element, with its location. You must use the index to identify the location.
seq = [2,1,0,0]
beg = 0
n = len(seq)
while (n - beg) > 0:
jdx = seq[beg:n].index((min(seq[beg:n]))) # use the remaining unsorted right
seq[jdx + beg], seq[beg] = seq[beg], seq[jdx + beg] # swap the minimum with the first unsorted element.
beg += 1
print(seq)
print('-->', seq)
As the sorting progresses, the left of the list [0:beg] is sorted, and the right side [beg:] is being sorted, until completion.
jdx is the location (the index) of the minimum of the remaining of the list (finding the min must happen on the unsorted right part of the list --> [beg:])

Categories

Resources