Python implementation of the mergeSort algorithm

I came across the following implementation of the mergeSort algorithm:
def merge_sort(x):
    merge_sort2(x,0,len(x)-1)

def merge_sort2(x,first,last):
    if first < last:
        middle = (first + last) // 2
        merge_sort2(x,first,middle)
        merge_sort2(x,middle+1,last)
        merge(x,first,middle,last)

def merge(x,first,middle,last):
    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)
    i=j=0
    for k in range(first,last+1):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1

x = [17, 87, 6, 22, 41, 3, 13, 54]
x_sorted = merge_sort(x)
print(x)
I get most of it. However, what I don't understand are the following four lines of the merge function:
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
First of all: why does the slicing end with middle+1 ? Slicing an array in Python includes the last element, right? So, shouldn't it be sufficient to slice from first:middle ? So, what is the +1 there for?
Secondly: Why do I have to append the huge number to the lists? Why doesn't it work without? It doesn't, I checked that. But I just don't know why.

Q1: Slicing an array in Python includes the last element, right?
No. Like the range function, Python slicing doesn't include the last element.
>>> a=[1,2,3,4,5]
>>> a[1:4]
[2, 3, 4]
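To connect this to the question's merge function, here is a small sketch showing how the two slices split the inclusive index range first..last with no gap and no overlap. The values chosen for first, middle and last are made up for illustration.

```python
x = [17, 87, 6, 22, 41, 3, 13, 54]
first, middle, last = 0, 3, 7  # illustrative indices, all inclusive

# The stop index of a slice is excluded, so +1 is needed
# to reach the elements at `middle` and `last`.
left = x[first:middle + 1]      # indices 0..3
right = x[middle + 1:last + 1]  # indices 4..7

print(left)             # [17, 87, 6, 22]
print(right)            # [41, 3, 13, 54]
print(x[first:middle])  # [17, 87, 6], the element at index 3 is lost
```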
Q2: Regarding the below snippet.
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
Without appending those large numbers to the lists, your merge code would have to look different, something like this:
# L and R are the plain slices here, without the appended large number
i = j = 0
k = first
while i < len(L) and j < len(R):
    if L[i] <= R[j]:
        x[k] = L[i]
        i += 1
    else:
        x[k] = R[j]
        j += 1
    k += 1
# Checking if any element was left
while i < len(L):
    x[k] = L[i]
    i += 1
    k += 1
while j < len(R):
    x[k] = R[j]
    j += 1
    k += 1
As @Cedced_Bro pointed out in the comment section, those large numbers are used to detect that the end of one of the sides has been reached.
If you look at the code snippet above, once we run out of numbers in one list we leave the first while loop and copy any remaining elements of the other list into x.
Appending those large numbers is a clever way to avoid those two extra while loops. But it comes at the cost of unnecessary comparisons of 999999999 with the remaining elements of the other list.
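One caveat of the sentinel approach is that 999999999 only works while every element is smaller than it. As a minimal sketch, and as my own substitution rather than anything from the question, float("inf") can serve as the sentinel, since it compares greater than any finite number. The function name merge_with_sentinels is hypothetical.

```python
def merge_with_sentinels(x, first, middle, last):
    """Merge the sorted runs x[first..middle] and x[middle+1..last] in place."""
    # Assumption: float("inf") replaces the question's 999999999 sentinel.
    left = x[first:middle + 1] + [float("inf")]
    right = x[middle + 1:last + 1] + [float("inf")]
    i = j = 0
    # The loop runs exactly once per real element, so the sentinels
    # are compared against but never copied into x.
    for k in range(first, last + 1):
        if left[i] <= right[j]:
            x[k] = left[i]
            i += 1
        else:
            x[k] = right[j]
            j += 1

data = [3, 8, 10, 1, 4, 12]
merge_with_sentinels(data, 0, 2, 5)
print(data)  # [1, 3, 4, 8, 10, 12]
```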

You don't really need the spaghetti-style helper function; simple recursion would do. From https://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python:
from heapq import merge

def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
The indexing doesn't need a +1 here, since two Python slices that share the same index don't overlap, i.e.
>>> x = [1,2,3,4,5,6]
>>> middle = 4
>>> x[:middle]
[1, 2, 3, 4]
>>> x[middle:]
[5, 6]
Moreover, the heapq implementation of merge is likely to be better optimized than what you would write by hand =)
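For reference, here is a short sketch of heapq.merge on its own. The input lists are made up; the only requirement is that each one is already sorted.

```python
from heapq import merge

left = [1, 4, 9]
right = [2, 3, 10]

# heapq.merge returns a lazy iterator over the combined values in sorted
# order, so list() is needed to materialize the result.
merged = list(merge(left, right))
print(merged)  # [1, 2, 3, 4, 9, 10]
```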

Merge Sorted array

I'm trying to solve the problem which is defined as:
we are given two integer arrays nums1 and nums2, sorted in non-decreasing order, and two integers m and n, representing the number of elements in nums1 and nums2 respectively.
Merge nums1 and nums2 into a single array sorted in non-decreasing order.
The function should not return the final sorted array, but instead, be stored inside the array nums1. To accommodate this, nums1 has a length of m + n, where the first m elements denote the elements that should be merged, and the last n elements are set to 0 and should be ignored. nums2 has a length of n.
I have written the following code in python :
def merge(nums1, m, nums2, n):
    """
    Do not return anything, modify nums1 in-place instead.
    """
    nums1=nums1[:m]
    # print("before",nums1)
    for i in nums1[m:n]:
        i=0
    # print("after1",nums1)
    nums1[m:n]=nums2[:n]
    nums1.sort()
    # print(nums1)

merge([0],0,[1],1)
I tried to submit this solution, but it is reported as an error. Can anyone find what is wrong? Please work from the code above rather than replacing it with something entirely different.
Input: nums1 = [1,2,3,0,0,0], m = 3, nums2 = [2,5,6], n = 3
Output: [1,2,2,3,5,6]
Explanation: The arrays we are merging are [1,2,3] and [2,5,6].
The result of the merge is [1,2,2,3,5,6] with the underlined elements coming from nums1.
You would like us to start from your code attempt, but none of the statements in your attempt are salvageable in an efficient solution:
nums1=nums1[:m]. This statement creates a new list, storing its reference in the variable that had a reference to the original input list. Thereby you lose the reference to the input list, and make it impossible to change anything in that list -- which was the purpose of this function.
for i in nums1[m:n]: In an efficient solution, you would not create a new list (like with using slice notation).
i=0: there is no benefit in repeating this in a loop. This is an assignment to a variable, not to a member of a list. It seems you thought this loop would clear part of the original list, but you cannot clear a list by assigning 0 to a variable. Moreover, there is no need to clear anything in any list for this code challenge.
nums1[m:n]=nums2[:n] although this copies the second list into nums1, this is copying it in a local list -- a list that the caller has no access to. Secondly, you would need to use the free space in the left list to efficiently merge the data. Now by having copied all values from the second list in it, you don't have any space anymore for such merge.
nums1.sort(). Calling sort defeats the purpose of the code challenge. sort doesn't use the knowledge that we're dealing with 2 sorted lists, and can only offer a time complexity of O(nlogn), while you can do it with a complexity of O(n).
So... nothing in your code can stay. It goes wrong from the first statement onwards, and takes the wrong approach.
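The first point, about losing the reference to the input list, can be seen in isolation with a small sketch. The function names rebind and mutate are hypothetical and exist only for this illustration.

```python
def rebind(nums):
    # Assigning to the parameter name creates a new local list;
    # the caller's list is not affected by anything done afterwards.
    nums = nums[:2]
    nums.append(99)

def mutate(nums):
    # Slice assignment modifies the list object the caller passed in.
    nums[2:] = [99]

a = [1, 2, 3]
rebind(a)
print(a)  # [1, 2, 3]

b = [1, 2, 3]
mutate(b)
print(b)  # [1, 2, 99]
```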
The algorithm to implement uses an index that runs from the end of nums1 to its start and stores there the greatest value that has not yet been placed. As the input lists are sorted, there are only two candidate values at each step: the last unplaced value of each list.
Here is an implementation:
def merge(nums1, m, nums2, n):
    # Let m and n refer to the last used index in given lists
    m -= 1
    n -= 1
    for i in range(m + n + 1, -1, -1):
        if n < 0 or m >= 0 and nums1[m] > nums2[n]:
            nums1[i] = nums1[m]
            m -= 1
        else:
            nums1[i] = nums2[n]
            n -= 1
What you are asking for is just the classic function used within the merge sort algorithm. You can find many solutions online.
Here is a possible solution:
def merge(nums1, m, nums2, n):
    i = m - 1
    j = n - 1
    for last in range(m + n - 1, -1, -1):
        if i < 0:
            nums1[last] = nums2[j]
            j -= 1
        elif j < 0:
            nums1[last] = nums1[i]
            i -= 1
        elif nums1[i] < nums2[j]:
            nums1[last] = nums2[j]
            j -= 1
        else:
            nums1[last] = nums1[i]
            i -= 1
Example:
>>> nums1 = [1, 3, 5, 7, 0, 0, 0]
>>> nums2 = [4, 6, 8]
>>> m = 4
>>> n = 3
>>> merge(nums1, m, nums2, n)
>>> nums1
[1, 3, 4, 5, 6, 7, 8]
Another example:
>>> nums1 = [1, 2, 3, 0, 0, 0]
>>> nums2 = [2, 5, 6]
>>> m = 3
>>> n = 3
>>> merge(nums1, m, nums2, n)
>>> nums1
[1, 2, 2, 3, 5, 6]
If you want to keep your original code and fix it then you can do this:
def merge(nums1, m, nums2, n):
    nums1[m:] = nums2
    nums1.sort()
Beware that this is much slower than my solution, and it's not the standard way to solve this well known problem.

Use list comprehensions to make a list of count of elements smaller than the element in an array

I was solving this leetcode problem - https://leetcode.com/problems/how-many-numbers-are-smaller-than-the-current-number/
I solved it easily by using nested for loops, but list comprehensions have always intrigued me. I've spent a lot of time trying to make that one-liner work, but I always get some syntax error.
Here's the solution:
count = 0
ans = []
for i in nums:
    for j in nums:
        if i > j:
            count = count + 1
    ans.append(count)
    count = 0
return ans
These are the ones so far that I think should've worked:
return [count = count + 1 for i in nums for j in nums if i > j]
return [count for i in nums for j in nums if i > j count = count + 1]
return [count:= count + 1 for i in nums for j in nums if i > j]
I'll also be happy if there's some resource or similar that puts it together. I've been searching the Python docs but didn't find anything that helps.
I will transform the code step by step in order to show the thought process.
First: we don't care what the value of count is afterward, but we need it to be 0 at the start of each inner loop. So it is simpler logically to set it there, rather than outside and then also at the end of the inner loop:
ans = []
for i in nums:
    count = 0
    for j in nums:
        if i > j:
            count = count + 1
    ans.append(count)
return ans
Next, we focus on the contents of the loop:
count = 0
for j in nums:
    if i > j:
        count = count + 1
ans.append(count)
A list comprehension is not good at math; it is good at producing a sequence of values from a source sequence. The transformation we need here is to put the actual elements into our "counter" variable [1], and then figure out how many there are (in order to append to ans). Thus:
smaller = []
for j in nums:
    if i > j:
        smaller.append(j)
ans.append(len(smaller))
Now that the creation of smaller has the right form, we can replace it with a list comprehension, in a mechanical, rule-based way. It becomes:
smaller = [j for j in nums if i > j]
#          ^ ^^^^^^^^^^^^^ ^^^^^^^^
#          | \- the rest of the parts are in the same order
#          \- this moves from last to first
# and then we use it the same as before
ans.append(len(smaller))
We notice that we can just fold that into one line. The square brackets have to stay, because len needs an actual list [2]:
ans.append(len([j for j in nums if i > j]))
Good. Now, let's put that back in the original context:
ans = []
for i in nums:
    ans.append(len([j for j in nums if i > j]))
return ans
We notice that the same technique applies: we have the desired form already. So we repeat the procedure:
ans = [len([j for j in nums if i > j]) for i in nums]
return ans
And of course:
return [len([j for j in nums if i > j]) for i in nums]
[1] Another popular trick is to put a 1 in the output for each original element, and then sum them. Last I checked, the performance is about the same, and I don't think either is clearer than the other.
[2] Without the brackets, this would be a generator expression instead. Normally, these are surrounded with () instead of [], but a special syntax rule lets you drop the extra pair of () when calling a function with a single argument that is a generator expression. This is especially convenient for the built-in function sum, as well as for any, all, max, min and (if you don't need a custom sort order) sorted. It does not work for len, because a generator has no length: len(j for j in nums if i > j) raises a TypeError.
Hmm, three people write sum solutions but every single one does sum(1 for ...). I prefer this:
[sum(j < i for j in nums) for i in nums]
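A brief sketch of why summing the comparisons directly works: each comparison produces a bool, and bool is a subclass of int, so True counts as 1 and False as 0. The sample list is the one from the LeetCode problem statement.

```python
nums = [8, 1, 2, 2, 3]

# bool is a subclass of int, so booleans can be added like numbers.
print(True + True)  # 2

# For each i, count how many j are strictly smaller.
result = [sum(j < i for j in nums) for i in nums]
print(result)  # [4, 0, 1, 1, 3]
```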
Instead of trying to advance an external counter, try adding ones to your list and then sum it:
for example:
nums = [1,2,3,4,5]
target = 3
print(sum(1 for n in nums if n < target))
Using a counter inside the list comprehension creates the challenge of resetting its value on each iteration of the first loop.
This can be avoided by filtering, and summing, in the second loop:
You use the first loop to iterate over the values of nums array.
return [SECOND_LOOP for i in nums]
You use the second loop to iterate over all elements of the nums array. You keep the elements that are smaller than i, the current element of the first loop, with if i > j, and produce a 1 for each of them. Finally, you sum all the 1s generated:
sum(1 for j in nums if i > j)
You get the number of values that meet the requirements, by the list comprehension of the first loop:
return [sum(1 for j in nums if i > j) for i in nums]
This solution has been checked & validated in LeetCode.
You need a slightly different approach for the inner loop than a list comprehension. Instead of repeatedly appending a value to a list you need to repeatedly add a value to a variable.
This can be done in a functional way by using sum and a generator expression:
count = 0
# ...
for j in nums:
    if i > j:
        count = count + 1
can be replaced by
count = sum(1 for j in nums if i > j)
So that we now have this:
ans = []
for i in nums:
    count = sum(1 for j in nums if i > j)
    ans.append(count)
return ans
This pattern can in fact be replaced by a list comprehension:
return [sum(1 for j in nums if i > j) for i in nums]
Alternative Solution
We can also use the Counter from collections:
import collections

class Solution:
    def smallerNumbersThanCurrent(self, nums):
        # Map each distinct value to the number of times it occurs
        count_map = collections.Counter(nums)
        smallers = []
        for index in range(len(nums)):
            count = 0
            for key, value in count_map.items():
                if key < nums[index]:
                    count += value
            smallers.append(count)
        return smallers
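As a further alternative, and a different technique from the Counter loop above, one could sort a copy once and look each value up with bisect_left. This is only a sketch; the function name smaller_numbers_than_current is hypothetical.

```python
from bisect import bisect_left

def smaller_numbers_than_current(nums):
    ordered = sorted(nums)
    # bisect_left returns the index of the first element >= value, which
    # equals the number of elements strictly smaller than value.
    return [bisect_left(ordered, value) for value in nums]

print(smaller_numbers_than_current([8, 1, 2, 2, 3]))  # [4, 0, 1, 1, 3]
```

This takes one O(n log n) sort plus a binary search per element, instead of comparing every pair.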

Solving the "firstDuplicate" question in Python

I'm trying to solve the following challenge from codesignal.com:
Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
Example
For a = [2, 1, 3, 5, 3, 2], the output should be
firstDuplicate(a) = 3.
There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.
For a = [2, 4, 3, 5, 1], the output should be
firstDuplicate(a) = -1.
The execution time limit is 4 seconds.
The guaranteed constraints were:
1 ≤ a.length ≤ 10^5, and
1 ≤ a[i] ≤ a.length
So my code was:
def firstDuplicate(a):
    b = a
    if len(list(set(a))) == len(a):
        return -1
    n = 0
    answer = -1
    starting_distance = float("inf")
    while n!=len(a):
        value = a[n]
        if a.count(value) > 1:
            place_of_first_number = a.index(value)
            a[place_of_first_number] = 'string'
            place_of_second_number = a.index(value)
            if place_of_second_number < starting_distance:
                starting_distance = place_of_second_number
                answer = value
            a=b
        n+=1
        if n == len(a)-1:
            return answer
    return answer
Out of the 22 tests the site had, I passed all of them up to #21, which failed because the test list was large and the execution time exceeded 4 seconds. What are some tips for reducing the execution time, while keeping the code more or less the same?
As @erip has pointed out in the comments, you can iterate through the list and add items to a set. If an item is already in the set, it is the duplicate whose second occurrence has the lowest index, so you can simply return it; if you reach the end of the loop without finding a duplicate, return -1:
def firstDuplicate(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1
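Separately from speed, it may help to see why the b = a and a=b lines in the question's code do not save and restore the list. Assignment only binds another name to the same list object; it does not copy. A minimal sketch with made-up values:

```python
a = [2, 1, 3, 5, 3, 2]
b = a              # no copy: both names refer to the same list
a[0] = 'string'
print(b[0])        # string
print(a is b)      # True

c = list(a)        # an actual (shallow) copy
a[1] = 'other'
print(c[1])        # 1
```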
Create a new set and check whether each element is already in it; if it is, return that element:
def firstDuplicate(a):
    dup = set()
    for i in range(len(a)):
        if a[i] in dup:
            return a[i]
        else:
            dup.add(a[i])
    return -1
This is just an idea, I didn't verify it but it should work. It seems there's no memory limit but just a time limit. Therefore using space to trade time is probably a practical way to do this. The computation complexity is O(n). This algorithm also depends on the condition that the number range is between 1 to len(a).
def first_duplicate(a):
    len_a = len(a)
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n
Update:
This solution is not as fast as the set() solution by @blhsing. However, the comparison might come out differently if both were implemented in C; it's somewhat unfair, since set is a built-in type implemented in C, like other core parts of CPython.

Index out of range in implementation of a variation of mergesort algorithm in python?

I have done a variation of my merge sort algorithm in python, based on what I've learnt from the CLRS book, and compared it with the implementation done on the introductory computer science book by MIT. I cannot find the problem in my algorithm, and the IDLE gives me an index out of range although everything looks fine to me. I'm unsure if this is due to some confusion in borrowing ideas from the MIT algorithm (see below).
lista = [1,2,3,1,1,1,1,6,7,12,2,7,7,67,4,7,9,6,6,3,1,14,4]

def merge(A, p, q, r):
    q = (p+r)/2
    L = A[p:q+1]
    R = A[q+1:r]
    i = 0
    j = 0
    for k in range(len(A)):
        #if the list R runs out of space and L[i] has nothing to compare
        if i+1 > len(R):
            A[k] = L[i]
            i += 1
        elif j+1 > len(L):
            A[k] = R[j]
            j += 1
        elif L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        elif R[j] <= L[i]:
            A[k] = R[j]
            j += 1
        #when both the sub arrays have run out and all the ifs and elifs done,
        # the for loop has effectively ended
    return A

def mergesort(A, p, r):
    """A is the list, p is the first index and r is the last index for which
    the portion of the list is to be sorted."""
    q = (p+r)/2
    if p<r:
        mergesort(A, p, q)
        mergesort(A, q+1, r)
        merge (A, p, q, r)
    return A

print mergesort(lista, 0, len(lista)-1)
I have followed the pseudocode in CLRS as closely as I could, just without using the "infinity value" at the end of L and R, which would continue to compare (is this less efficient?). I tried to incorporate ideas like that in the MIT book, which is to simply copy down the remaining L or R list to A, to mutate A and return a sorted list. However, I can't seem to find what has gone wrong with it. Also, I don't get why the pseudocode requires a 'q' as an input, given that q would be calculated as (p+r)/2 for the middle index anyway. And why is there a need to put p
On the other hand, from the MIT book, we have something that looks really elegant.
def merge(left, right, compare):
    """Assumes left and right are sorted lists and
    compare defines an ordering on the elements.
    Returns a new sorted(by compare) list containing the
    same elements as(left + right) would contain.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result

import operator

def mergeSort(L, compare=operator.lt):
    """Assumes L is a list, compare defines an ordering
    on elements of L.
    Returns a new sorted list containing the same elements as L"""
    if len(L) < 2:
        return L[:]
    else:
        middle = len(L) // 2
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)
Where could I have gone wrong?
Also, I think the key difference in the MIT implementation is that it creates a new list instead of mutating the original list. This makes it quite difficult for me to understand mergesort, because I found the CLRS explanation quite clear, by understanding it in terms of different layers of recursion occurring to sort the most minute components of the original list (the list of length 1 that needs no sorting), thus "storing" the results of recursion within the old list itself.
However, thinking again, is it right to say that the "result" returned by each recursion in the MIT algorithm, which is in turn combined?
Thank you!
The fundamental difference between your code and the MIT code is the conditional statement in the mergesort function. Where your if statement is:
if p<r:
theirs is:
if len(L) < 2:
This means that if you were to have, at any point in the recursive call tree, a list with len(A) == 1, it would still call merge on a list of size 1 or even 0. You can see that this causes problems in the merge function, because your L, R, or both sublists can end up being of size 0, which would then cause an out-of-bounds index error.
Your problem could then be easily fixed by changing your if statement to something like theirs, such as len(A) < 2 or r-p < 2.
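As a sketch only, and as my own reading of the question's code rather than part of the answer above: a few details in the question's merge also look likely to index out of range regardless of the base case. R = A[q+1:r] leaves out A[r], the loop runs over range(len(A)) instead of the slots p..r, and the exhaustion tests compare i against len(R) and j against len(L). One possible in-place version without sentinel values, keeping the CLRS convention that p, q and r are inclusive indices:

```python
def merge(A, p, q, r):
    """Merge sorted A[p..q] and A[q+1..r] (inclusive bounds) in place."""
    L = A[p:q + 1]
    R = A[q + 1:r + 1]          # r is inclusive, so the slice stops at r + 1
    i = j = 0
    for k in range(p, r + 1):   # only touch the slots p..r, not all of A
        # Take from L when R is used up, or when L still has the smaller head.
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

def mergesort(A, p, r):
    if p < r:
        q = (p + r) // 2        # // keeps q an integer on Python 3 as well
        mergesort(A, p, q)
        mergesort(A, q + 1, r)
        merge(A, p, q, r)
    return A

lista = [1, 2, 3, 1, 1, 1, 1, 6, 7, 12, 2, 7, 7, 67, 4, 7, 9, 6, 6, 3, 1, 14, 4]
expected = sorted(lista)
print(mergesort(lista, 0, len(lista) - 1) == expected)  # True
```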

Sort a list efficiently which contains only 0 and 1 without using any builtin python sort function?

What is the most efficient way to sort a list, [0,0,1,0,1,1,0] whose elements are only 0 & 1, without using any builtin sort() or sorted() or count() function. O(n) or less than that
>>> lst = [0,0,1,0,1,1,0]
>>> l, s = len(lst), sum(lst)
>>> result = [0] * (l - s) + [1] * s
>>> result
[0, 0, 0, 0, 1, 1, 1]
There are many different general sorting algorithms that can be used. However, in this case, the most important consideration is that all the elements to sort belong to the set {0, 1}.
As other contributors answered, there is a trivial implementation.
def radix_sort(a):
    slist = [[],[]]
    for elem in a:
        slist[elem].append(elem)
    return slist[0] + slist[1]

print(radix_sort([0,0,1,0,1,1,0]))
It must be noted that this is a particular implementation of the Radix sort. And this can be extended easily if the elements of the list to be sorted belong to a defined limited set.
def radix_sort(a, elems):
    slist = {}
    for elem in elems:
        slist[elem] = []
    for elem in a:
        slist[elem].append(elem)
    nslist = []
    for elem in elems:
        nslist += slist[elem]
    return nslist

print(radix_sort([2,0,0,1,3,0,1,1,0],[0,1,2,3]))
No sort() or sorted() or count() function. O(n)
This one is O(n) (you can't get less):
old = [0,0,1,0,1,1,0]
zeroes = old.count(0) #you gotta count them somehow!
new = [0]*zeroes + [1]*(len(old) - zeroes)
As there are no Python loops, this may be the fastest you can get in pure Python...
def sort_arr_with_zero_one():
    main_list = [0,0,1,0,1,1,0]
    zero_list = []
    one_list = []
    for i in main_list:
        if i:
            one_list.append(i)
        else:
            zero_list.append(i)
    return zero_list + one_list
You have only two values, so you know in advance the precise structure of the output: it will be divided into two regions of varying lengths.
I'd try this:
b = [0,0,1,0,1,1,0]

def odd_sort(a):
    zeroes = a.count(0)
    return [0 for i in range(zeroes)] + [1 for i in range(len(a) - zeroes)]
You could walk the list with two pointers, one from the start (i) and from the end (j), and compare the values one by one and swap them if necessary:
def sort_binary_values(l):
    i, j = 0, len(l)-1
    while i < j:
        # skip 0 values from the begin
        while i < j and l[i] == 0:
            i = i+1
        if i >= j: break
        # skip 1 values from the end
        while i < j and l[j] == 1:
            j = j-1
        if i >= j: break
        # since all in sequence values have been skipped and i and j did not reach each other
        # we encountered a pair that is out of order and needs to be swapped
        l[i], l[j] = l[j], l[i]
        j = j-1
        i = i+1
    return l
I like the answer by JBernado, but will throw in another monstrous option (although I've not done any profiling on it - it's not particularly extensible as it relies on the order of a dictionary hash, but works for 0 and 1):
from itertools import chain, repeat
from collections import Counter
list(chain.from_iterable(map(repeat, *zip(*Counter(bits).items()))))
Or - slightly less convoluted...
# On Python 3 the built-in filter is lazy and replaces itertools.ifilter
from itertools import repeat, chain, islice
from operator import not_
list(islice(chain(filter(not_, bits), repeat(1)), len(bits)))
This should keep everything at the C level - so it should be fairly optimal.
All you need to know is how long the original sequence is and how many ones are in it.
old = [0,0,1,0,1,1,0]
ones = sum(1 for b in old if b)
new = [0]*(len(old)-ones) + [1]*ones
Here is a Python solution in O(n) time and O(1) extra space.
There is no need to create new lists:
def sort01(arr):
    i = 0
    j = len(arr)-1
    while i < j:
        # the i < j guards keep both scans inside the list,
        # even when it holds only zeros or only ones
        while i < j and arr[i] == 0:
            i += 1
        while i < j and arr[j] == 1:
            j -= 1
        if i<j:
            arr[i] = 0
            arr[j] = 1
    return arr
