What is the sorting algorithm most optimized for sorting an array that consists of 2 sorted sub-arrays?
This question came up when I was solving: https://leetcode.com/problems/squares-of-a-sorted-array/
My solution is as follows:
def sortedSquares(self, nums: List[int]) -> List[int]:
    n = len(nums)
    l, r = 0, n-1
    res = [0] * n
    for i in range(n-1, -1, -1):
        if abs(nums[l]) > abs(nums[r]):
            res[i] = nums[l] ** 2
            l += 1
        else:
            res[i] = nums[r] ** 2
            r -= 1
    return res
However this solution beats mine:
def sortedSquares(self, nums: List[int]) -> List[int]:
    return sorted([i * i for i in nums])
If timsort is doing insertion sort on small chunks, then wouldn't an array like this cause O(n^2) on half the runs? How is that better than my supposedly O(n) solution?
Thank you!
In this case, the theoretical time complexity is O(n) because you don't need to sort at all (you merely merge two ordered lists). A general-purpose sort has O(n log n) complexity.
Complexity and performance are two different things, however. Your O(n) solution written in Python is competing with an O(n log n) sort implemented in highly optimized low-level C code. In theory there is a list size beyond which the Python-based O(n) solution catches up with the O(n log n) native sort, but you can expect that to be a very large number of elements (probably more than your computer's memory can hold). And if Python's native sort benefits from partially sorted data (Timsort, which CPython uses, does), or if it were smart enough to switch to a radix sort for integers, it will be impossible to catch up.
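A rough way to check this on your own machine is to time both approaches with timeit. The sketch below is only illustrative: the helper names two_pointer_squares and builtin_squares and the input size are my own choices, and the measured numbers will vary with the machine and the Python version.

```python
from random import randint
from timeit import timeit

def two_pointer_squares(nums):
    """O(n) merge from both ends, written in pure Python."""
    n = len(nums)
    l, r = 0, n - 1
    res = [0] * n
    for i in range(n - 1, -1, -1):
        if abs(nums[l]) > abs(nums[r]):
            res[i] = nums[l] ** 2
            l += 1
        else:
            res[i] = nums[r] ** 2
            r -= 1
    return res

def builtin_squares(nums):
    """Square everything, then let the C-implemented sort do the work."""
    return sorted([x * x for x in nums])

if __name__ == "__main__":
    # Hypothetical test input: 100,000 sorted integers, roughly half negative.
    nums = sorted(randint(-10**4, 10**4) for _ in range(10**5))
    assert two_pointer_squares(nums) == builtin_squares(nums)
    for f in (two_pointer_squares, builtin_squares):
        print(f.__name__, timeit(lambda: f(nums), number=10))
```

Both functions return the same list; only the constant factors differ, which is exactly what the timing is meant to expose.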
Related
I have an n-element array. All elements except 4√n of them are sorted. We do not know the positions of these misplaced elements. What is the most efficient way of sorting this list?
Is there an O(n) way to do this?
Update 1:
Is the time complexity of insertion sort O(n) for almost-sorted data, even in the worst case?
There is a fast general method for sorting almost sorted arrays:
Scan through the original array from start to end. If you find two items that are not ordered correctly, move them to a second array and remove them from the first array. Be careful; for example if you remove x2 and x3, then you need to check again that x1 ≤ x4. This is done in O(n) time. In your case, the new array is at most 8√n in size.
Sort the second array, then merge both arrays. With the small number of items in the second array, any reasonable sorting algorithm will sort the small second array in O(n), and the merge takes O(n) again, so the total time is O(n).
If you use an O(n log n) algorithm to sort the second array, then sorting is O(n) as long as the number of items in the wrong position is at most O(n / log n).
No, insertion sort isn't O(n) on that. Worst case is when it's the last 4√n elements that are misplaced, and they're so small that they belong at the front of the array. It'll take insertion sort Θ(n √n) to move them there.
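To make that worst case concrete, here is a minimal sketch that counts how many element shifts a textbook insertion sort performs on such an input. The function name insertion_sort_shifts and the chosen n are my own assumptions for illustration; with k misplaced small elements at the end, each of them has to pass all n - k larger elements.

```python
from math import isqrt

def insertion_sort_shifts(a):
    """Sort a in place with insertion sort and return the number of element shifts."""
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        # Move larger elements one slot to the right until x fits.
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = x
    return shifts

n = 2500
k = 4 * isqrt(n)  # number of misplaced elements, 4√n
# n - k large sorted values followed by the k smallest values
a = list(range(k, n)) + list(range(k))
shifts = insertion_sort_shifts(a)
print(shifts, k * (n - k))  # the two numbers should agree
```

Since k is proportional to √n, the count k * (n - k) grows like n√n rather than n.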
Here's a Python implementation of gnasher729's answer that's O(n) time and O(n) space on such near-sorted inputs. We can't naively "remove" pairs from the array, though, that would be inefficient. Instead, I move correctly sorted values into a good list and the misordered pairs into a bad list. So as long as the numbers are increasing, they're just added to good. But if the next number x is smaller than the last good number good[-1], then they're both moved to bad. When I'm done, I concatenate good and bad and let Python's Timsort do the rest. It detects the already sorted run good in O(n - √n) time, then sorts the bad part in O(√n log √n) time, and finally merges the two sorted parts in O(n) time.
def sort1(a):
    good, bad = [], []
    for x in a:
        if good and x < good[-1]:
            bad += x, good.pop()
        else:
            good += x,
    a[:] = sorted(good + bad)
Next is a space-improved version that takes O(n) time and only O(√n) space. Instead of storing the good part in an extra list, I store it in a[:good]:
def sort2(a):
    good, bad = 0, []
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            a[good] = x
            good += 1
    a[good:] = bad
    a.sort()
And here's another O(n) time and O(√n) space variation where I let Python sort bad for me, but then merge the good part with the bad part myself, from right to left. So this doesn't rely on Timsort's sorted-run detection and is thus easily ported to other languages:
def sort3(a):
    good, bad = 0, []
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            a[good] = x
            good += 1
    bad.sort()
    i = len(a)
    while bad:
        i -= 1
        if good and a[good-1] > bad[-1]:
            good -= 1
            a[i] = a[good]
        else:
            a[i] = bad.pop()
Finally, some test code:
from random import random, sample
from math import isqrt
def sort1(a):
    ...
def sort2(a):
    ...
def sort3(a):
    ...
def fake(a):
    """Intentionally do nothing, to show that the test works."""
def main():
    n = 10**6
    a = [random() for _ in range(n)]
    a.sort()
    for i in sample(range(n), 4 * isqrt(n)):
        a[i] = random()
    for sort in sort1, sort2, sort3, fake:
        copy = a.copy()
        sort(copy)
        print(sort.__name__, copy == sorted(a))
if __name__ == '__main__':
    main()
The output shows that all three solutions passed the test (and that the test works, detecting fake as incorrect):
sort1 True
sort2 True
sort3 True
fake False
Fun fact: For Timsort alone (i.e., not used as part of the above algorithms), the worst case I mentioned above is rather a best case: It would sort that in O(n) time. Just like in my first version's sorted(good + bad), it'd recognize the prefix of n-√n sorted elements in O(n - √n) time, sort the √n last elements in O(√n log √n) time, and then merge the two sorted parts in O(n) time.
So can we just let Timsort do the whole thing? Is it O(n) on all such near-sorted inputs? No, it's not. If the 4√n misplaced elements are evenly spread over the array, then we have up to 4√n sorted runs and Timsort will take O(n log(4√n)) = O(n log n) time to merge them.
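The difference between the two situations can be illustrated by counting sorted runs. The sketch below is a simplification and not Timsort's actual run detection (real Timsort also recognizes strictly descending runs and extends short runs with binary insertion sort); the name count_ascending_runs and the way the misplaced elements are generated are my own assumptions.

```python
def count_ascending_runs(a):
    """Count maximal non-decreasing runs (a simplified view of run detection)."""
    if not a:
        return 0
    runs = 1
    for prev, cur in zip(a, a[1:]):
        if cur < prev:  # a descent starts a new run
            runs += 1
    return runs

n, k = 10_000, 400  # k = 4 * sqrt(n) misplaced elements

# Misplaced elements clustered at the end: one long run plus one short run.
clustered = list(range(k, n)) + list(range(k))

# Misplaced elements spread evenly: every one of them breaks a run.
spread = list(range(n))
for i in range(n // k, n, n // k):
    spread[i] = -1

print(count_ascending_runs(clustered), count_ascending_runs(spread))  # 2 400
```

With only two runs a single O(n) merge suffices, whereas hundreds of runs need on the order of log(number of runs) merge levels.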
Given an array of integers, find all unique quartets summing up to a specified integer.
I will provide two different solutions below, I was just wondering which one was more efficient with respect to time complexity?
Solution 1:
def four_sum(arr, s):
    n = len(arr)
    output = set()
    for i in range(n-2):
        for j in range(i+1, n-1):
            seen = set()
            for k in range(j+1, n):
                target = s - arr[i] - arr[j] - arr[k]
                if target in seen:
                    output.add((arr[i], arr[j], arr[k], target))
                else:
                    seen.add(arr[k])
    return print('\n'.join(map(str, list(output))))
I know that this has time complexity of O(n^3).
Solution 2:
def four_sum2(arr, s):
    n = len(arr)
    seen = {}
    for i in range(n-1):
        for j in range(i+1, n):
            if arr[i] + arr[j] in seen:
                seen[arr[i] + arr[j]].add((i, j))
            else:
                seen[arr[i] + arr[j]] = {(i, j)}
    output = set()
    for key in seen:
        if s - key in seen:
            for (i, j) in seen[key]:
                for (p, q) in seen[s - key]:
                    sorted_index = tuple(sorted((arr[i], arr[j], arr[p], arr[q])))
                    if i not in (p, q) and j not in (p, q):
                        output.add(sorted_index)
    return output
Now, the first block has a time complexity of O(n^2), but I'm not sure what the time complexity is on the second block?
TL;DR: the worst-case complexity of this algorithm is O(n^4).
In the first part, a tuple is added to seen for every pair (i, j) with j > i.
Thus the number of tuples in seen is (n-1)*n/2 = O(n^2), as you guessed.
The second part is a bit more complex. If we ignore the first condition of the nested loops (critical case), the first two loops can iterate over all the tuples in seen. Thus the complexity is at least O(n^2). The third loop is trickier: it is hard to know its cost without making assumptions about the input data. However, there is a theoretical critical case where seen[s - key] contains O(n^2) tuples. In such a case, the overall algorithm would run in O(n^4)!
Is this theoretical critical case practical?
Well, sadly yes. Take the input arr = [5, 5, ..., 5, 5] with s = 20, for example. The seen map will contain one key (10) associated with a set of (n-1)*n/2 = O(n^2) elements. In this case the first two loops of the second part run in O(n^2) and the third nested loop in O(n^2) too.
Thus the overall algorithm runs in O(n^4).
However, note that in practice such a case should be quite rare, and the algorithm should run much faster on random inputs with many different numbers. The complexity can probably be improved to O(n^3) or even O(n^2) if this critical case is fixed (e.g. by handling this pathological case separately).
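The growth in the critical case can be checked without running the full algorithm, by counting how many pair-against-pair combinations the second part would examine. This is a sketch under my own naming (count_pair_checks is not part of the question's code); it only counts iterations and does not build the quartets.

```python
from collections import defaultdict

def count_pair_checks(arr, s):
    """Count the (pair, pair) combinations the second part of four_sum2 examines."""
    pairs_by_sum = defaultdict(int)
    n = len(arr)
    for i in range(n - 1):
        for j in range(i + 1, n):
            pairs_by_sum[arr[i] + arr[j]] += 1
    # For each key, the two innermost loops run
    # len(seen[key]) * len(seen[s - key]) times.
    return sum(count * pairs_by_sum.get(s - key, 0)
               for key, count in pairs_by_sum.items())

for n in (10, 20, 40):
    pairs = n * (n - 1) // 2
    # All-equal input: a single key holds every pair, so the count is pairs ** 2.
    print(n, count_pair_checks([5] * n, 20), pairs ** 2)
```

Doubling n multiplies the count by roughly 16 on the all-equal input, which matches the O(n^4) estimate.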
I looked this up online and learned that list.pop() has O(1) time complexity but list.pop(i) has O(n) time complexity. While working on LeetCode problems, I see many people use pop(i) in a for loop; they say it is O(n) time complexity, and in fact it is faster than my code, which uses only one loop but has many lines in that loop. I wonder why this happens, and whether I should use pop(i) instead of many lines.
Example: Leetcode 26. Remove Duplicates from Sorted Array
My code: (faster than 75%)
class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        count = 1
        while right < len(nums)-1:
            if nums[right] == nums[right+1]:
                right += 1
            else:
                nums[left+1]=nums[right+1]
                left += 1
                right += 1
                count += 1
        return count
And other people's code, faster than 90% (this person does not claim it is O(n), but why is an O(n^2) solution faster than my O(n) one?):
https://leetcode.com/problems/remove-duplicates-from-sorted-array/discuss/477370/python-3%3A-straight-forward-6-lines-solution-90-faster-100-less-memory
My optimized code (faster than 89%)
class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        while right < len(nums)-1:
            if nums[right] != nums[right+1]:
                nums[left+1]=nums[right+1]
                left += 1
            right += 1
        return left + 1
Your algorithm genuinely does take O(n) time and the "pop in reverse order" algorithm genuinely does take O(n²) time. However, LeetCode isn't reporting that your time complexity is better than 89% of submissions; it is reporting your actual running time is better than 89% of all submissions. The actual running time depends on what inputs the algorithm is tested with; not just the sizes but also the number of duplicates.
It also depends how the running times across multiple test cases are averaged; if most of the test cases are for small inputs where the quadratic solution is faster, then the quadratic solution may come out ahead overall even though its time complexity is higher. #Heap Overflow also points out in the comments that the overhead time of LeetCode's judging system is proportionally large and quite variable compared to the time it takes for the algorithms to run, so the discrepancy could simply be due to random variation in that overhead.
To shed some light on this, I measured running times using timeit. The graph below shows my results; the shapes are exactly what you'd expect given the time complexities, and the crossover point is somewhere in the range 8000 < n < 9000 on my machine. This is based on sorted lists where each distinct element appears on average twice. The code I used to generate the times is given below.
Timing code:
def linear_solution(nums):
    left, right = 0, 0
    while right < len(nums)-1:
        if nums[right] != nums[right+1]:
            nums[left+1]=nums[right+1]
            left += 1
        right += 1
    return left + 1
def quadratic_solution(nums):
    prev_obj = []
    for i in range(len(nums)-1,-1,-1):
        if prev_obj == nums[i]:
            nums.pop(i)
        prev_obj = nums[i]
    return len(nums)
from random import randint
from timeit import timeit
def gen_list(n):
    max_n = n // 2
    return sorted(randint(0, max_n) for i in range(n))
# I used a step size of 1000 up to 15000, then a step size of 5000 up to 50000
step = 1000
max_n = 15000
reps = 100
print('n', 'linear time (ms)', 'quadratic time (ms)', sep='\t')
for n in range(step, max_n+1, step):
    # generate input lists
    lsts1 = [ gen_list(n) for i in range(reps) ]
    # copy the lists by value, since the algorithms will mutate them
    lsts2 = [ list(g) for g in lsts1 ]
    # use iterators to supply the input lists one-by-one to timeit
    iter1 = iter(lsts1)
    iter2 = iter(lsts2)
    t1 = timeit(lambda: linear_solution(next(iter1)), number=reps)
    t2 = timeit(lambda: quadratic_solution(next(iter2)), number=reps)
    # timeit reports the total time in seconds across all reps
    print(n, 1000*t1/reps, 1000*t2/reps, sep='\t')
The conclusion is that your algorithm is indeed faster than the quadratic solution for large enough inputs, but the inputs LeetCode is using to measure running times are not "large enough" to overcome the variation in the judging overhead, and the fact that the average includes times measured on smaller inputs where the quadratic algorithm is faster.
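The quadratic behaviour can also be seen without a stopwatch by counting how many elements each pop(i) has to move. The sketch below assumes the usual array-backed list model, where pop(i) shifts the len(nums) - i - 1 elements after index i; the function name pop_reverse_shifts is my own.

```python
def pop_reverse_shifts(nums):
    """Run the pop-in-reverse dedup and count the elements shifted by pop(i)."""
    shifts = 0
    prev_obj = []
    for i in range(len(nums) - 1, -1, -1):
        if prev_obj == nums[i]:
            # Everything after index i moves one slot to the left.
            shifts += len(nums) - i - 1
            nums.pop(i)
        prev_obj = nums[i]
    return shifts

for m in (100, 200, 400):
    pairs = [v for v in range(m) for _ in range(2)]  # 0, 0, 1, 1, 2, 2, ...
    all_equal = [0] * (2 * m)
    print(2 * m, pop_reverse_shifts(pairs), pop_reverse_shifts(all_equal))
```

On the input where every value appears twice, the shift count grows quadratically with the list length, while on an all-equal list it stays linear, so the number and placement of duplicates matters a great deal.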
Just because the solution is not O(n), you can't assume it to be O(n^2).
In practice it doesn't behave quite like O(n^2), because he is using pop in reverse order, which reduces the cost of each pop: pop(i) only has to shift the elements after index i, and in every iteration the number of elements remaining at the back is decreasing. Using pop(i) in forward order will take more time than in reverse. Try the same solution in non-reverse order and run it a few times to make sure; you'll see.
Anyway, regarding why his solution is faster: you have an if condition with a lot of variables, whereas he uses only one variable, prev_obj; iterating in reverse order makes it possible to get by with just one variable. So the number of basic operations per iteration is higher in your case, and with the same O(n) complexity each of your n loop iterations takes longer than his.
Just look at your count variable: in every iteration its value is left+1, so you could return left+1. Just removing it would save the n count=count+1 operations you have to do.
I just posted this solution and it is 76% faster
class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a=sorted(set(nums),key=lambda item:item)
        for i,v in enumerate(a):
            nums[i]=v
        return len(a)
and this one gives faster than 90%.
class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a ={k:1 for k in nums} #<--- this is O(n)
        for i,v in enumerate(a.keys()): #<--- this is another O(n), but the length is small so O(m)
            nums[i]=v
        return len(a)
You could say both of them are more than O(n) if you look at the for loop.
But since we are working with duplicate members, I am looping over the reduced set of members while your code is looping over all members. So if the time required to build that unique set/dict is less than the time required for you to loop over those extra members and check the if conditions, then my solution can be faster.
I'm trying to figure out the time complexity of this whole algorithm. Is it O(n log n) or O(n)? I've been searching online; some say building a max heap is O(n log n) and some say O(n). I am trying to get O(n) time complexity.
def max_heapify(A, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < len(A) and A[left] > A[largest]:
        largest = left
    if right < len(A) and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest)
def build_max_heap(A):
    for i in range(len(A) // 2, -1, -1):
        max_heapify(A, i)
    return A
The code you have in the question rearranges array elements such that they satisfy the heap property i.e. the value of the parent node is greater than that of the children nodes. The time complexity of the heapify operation is O(n).
Here's an extract from the [Wikipedia page on Min-max heap](https://en.wikipedia.org/wiki/Min-max_heap#Build):
Creating a min-max heap is accomplished by an adaption of Floyd's linear-time heap construction algorithm, which proceeds in a bottom-up fashion.[10] A typical Floyd's build-heap algorithm[11] goes as follows:
function FLOYD-BUILD-HEAP (h):
    for each index i from floor(length(h)/2) down to 1 do:
        push-down(h, i)
    return h
Here the function FLOYD-BUILD-HEAP is same as your build_max_heap function and push-down is same as your max_heapify function.
A suggestion: the naming of your functions is a little confusing. Your max_heapify is not actually heapifying. It is just a part of the heapify operation. A better name could be something like push_down (as used in Wikipedia) or fix_heap.
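One way to convince yourself of the O(n) bound is to count the swaps performed during construction. The sketch below is an iterative variant of the build_max_heap from the question; the name build_max_heap_counting and the swap counter are my additions. Each push-down moves an element at most as far as the height of its node, and the sum of all node heights in a heap of n elements is less than n.

```python
def build_max_heap_counting(A):
    """Floyd's bottom-up heap construction; returns the number of swaps performed."""
    n = len(A)
    swaps = 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        # Iterative push-down of the element that starts at index `start`.
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < n and A[left] > A[largest]:
                largest = left
            if right < n and A[right] > A[largest]:
                largest = right
            if largest == i:
                break
            A[i], A[largest] = A[largest], A[i]
            swaps += 1
            i = largest
    return swaps

for n in (1_000, 10_000, 100_000):
    # An ascending input forces many swaps, yet the count stays below n.
    print(n, build_max_heap_counting(list(range(n))))
```

The printed swap counts grow in proportion to n, not n log n.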
A heap is a data structure which supports operations including insertion and retrieval. Each operation has its own runtime complexity.
Maybe you were thinking of the runtime complexity of heapsort which is a sorting algorithm that uses a heap. In that case, the runtime complexity is O(n*log(n)).
I have this, and it works:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
    finalList = []
    for item in list1:
        finalList.append(item)
    for item in list2:
        finalList.append(item)
    finalList.sort()
    return finalList
    # +++your code here+++
    return
But, I'd really like to learn this stuff well. :) What does 'linear' time mean?
Linear means O(n) in Big O notation, while your code uses sort(), which is most likely O(n log n).
The question is asking for the standard merge algorithm. A simple Python implementation would be:
def merge(l, m):
    result = []
    i = j = 0
    total = len(l) + len(m)
    while len(result) != total:
        if len(l) == i:
            result += m[j:]
            break
        elif len(m) == j:
            result += l[i:]
            break
        elif l[i] < m[j]:
            result.append(l[i])
            i += 1
        else:
            result.append(m[j])
            j += 1
    return result
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
Linear time means that the time taken is bounded by some undefined constant times (in this context) the number of items in the two lists you want to merge. Your approach doesn't achieve this - it takes O(n log n) time.
When specifying how long an algorithm takes in terms of the problem size, we ignore details like how fast the machine is, which basically means we ignore all the constant terms. We use "asymptotic notation" for that. These basically describe the shape of the curve you would plot in a graph of problem size in x against time taken in y. The logic is that a bad curve (one that gets steeper quickly) will always lead to a slower execution time if the problem is big enough. It may be faster on a very small problem (depending on the constants, which probably depends on the machine) but for small problems the execution time isn't generally a big issue anyway.
The "big O" specifies an upper bound on execution time. There are related notations for average execution time and lower bounds, but "big O" is the one that gets all the attention.
O(1) is constant time - the problem size doesn't matter.
O(log n) is a quite shallow curve - the time increases a bit as the problem gets bigger.
O(n) is linear time - each unit increase means it takes a roughly constant amount of extra time. The graph is (roughly) a straight line.
O(n log n) curves upwards more steeply as the problem gets more complex, but not by very much. This is the best that a general-purpose sorting algorithm can do.
O(n squared) curves upwards a lot more steeply as the problem gets more complex. This is typical for slower sorting algorithms like bubble sort.
The nastiest algorithms are classified as "np-hard" or "np-complete" where the "np" means "non-polynomial" - the curve gets steeper quicker than any polynomial. Exponential time is bad, but some are even worse. These kinds of things are still done, but only for very small problems.
EDIT: the last paragraph is wrong, as indicated by the comment ("NP" stands for "nondeterministic polynomial time", not "non-polynomial"). I do have some holes in my algorithm theory, and clearly it's time I checked the things I thought I had figured out. In the meantime, I'm not quite sure how to correct the rest of that paragraph, so just be warned.
For your merging problem, consider that your two input lists are already sorted. The smallest item from your output must be the smallest item from one of your inputs. Get the first item from both and compare the two, and put the smallest in your output. Put the largest back where it came from. You have done a constant amount of work and you have handled one item. Repeat until both lists are exhausted.
Some details... First, putting the item back in the list just to pull it back out again is obviously silly, but it makes the explanation easier. Next - one input list will be exhausted before the other, so you need to cope with that (basically just empty out the rest of the other list and add it to the output). Finally - you don't actually have to remove items from the input lists - again, that's just the explanation. You can just step through them.
Linear time means that the runtime of the program is proportional to the length of the input. In this case the input consists of two lists. If the lists are twice as long, then the program will run approximately twice as long. Technically, we say that the algorithm should be O(n), where n is the size of the input (in this case the length of the two input lists combined).
This appears to be homework, so I will not supply you with an answer. Even if this is not homework, I am of the opinion that you will be best served by taking a pen and a piece of paper, constructing two smallish sorted example lists, and figuring out how you would merge those two lists by hand. Once you have figured that out, implementing the algorithm is a piece of cake.
(If all goes well, you will notice that you need to iterate over each list only once, in a single direction. That means that the algorithm is indeed linear. Good luck!)
If you build the result in reverse sorted order, you can use pop() and still be O(N)
pop() from the right end of the list does not require shifting the elements, so is O(1)
Reversing the list before we return it is O(N)
>>> def merge(l, r):
...     result = []
...     while l and r:
...         if l[-1] > r[-1]:
...             result.append(l.pop())
...         else:
...             result.append(r.pop())
...     result+=(l+r)[::-1]
...     result.reverse()
...     return result
...
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
This thread contains various implementations of a linear-time merge algorithm. Note that for practical purposes, you would use heapq.merge.
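For reference, a minimal usage sketch of the standard library's heapq.merge, which takes already-sorted iterables and returns a lazy iterator over the merged result (wrapping it in list() here is just for display):

```python
from heapq import merge

# merge() yields items lazily; list() materializes the merged sequence.
merged = list(merge([1, 2, 6, 7], [1, 3, 5, 9]))
print(merged)  # [1, 1, 2, 3, 5, 6, 7, 9]
```

Because the result is an iterator, it also works for inputs that are too large to hold in memory at once.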
Linear time means O(n) complexity. You can read about algorithm complexity and big-O notation here: http://en.wikipedia.org/wiki/Big_O_notation .
Rather than combining the lists only after putting everything into finalList, try to merge them gradually: add an element, ensure the result is still sorted, then add the next element... This should give you some ideas.
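A simpler version that requires equal-sized lists. Note that it only orders each pair L1[i], L2[i] and does not produce a fully sorted result in general (for example, [1, 2] and [3, 4] give [1, 3, 2, 4]):
def merge_sort(L1, L2):
    res = []
    for i in range(len(L1)):
        if L1[i] < L2[i]:
            first = L1[i]
            second = L2[i]
        else:
            first = L2[i]
            second = L1[i]
        res.extend([first, second])
    return res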
itertoolz provides an efficient implementation to merge two sorted lists
https://toolz.readthedocs.io/en/latest/_modules/toolz/itertoolz.html#merge_sorted
'Linear time' means that the running time is an O(n) function, where n is the number of input items (items in the lists).
f(n) = O(n) means that there exist constants y and n0 such that f(n) <= y * n for all n >= n0. (If there is also a constant x > 0 with x * n <= f(n), then f(n) is Θ(n).)
def linear_merge(list1, list2):
    finalList = []
    i = 0
    j = 0
    while i < len(list1):
        if j < len(list2):
            if list1[i] < list2[j]:
                finalList.append(list1[i])
                i += 1
            else:
                finalList.append(list2[j])
                j += 1
        else:
            finalList.append(list1[i])
            i += 1
    while j < len(list2):
        finalList.append(list2[j])
        j += 1
    return finalList