heapsort coding for python 3 - python

def heapSort(lst):
    heap = arrayHeap.mkHeap(len(lst), arrayHeap.less)
    alst = list(lst)
    while alst != []:
        v = alst.pop(0)
        arrayHeap.add(heap, v)
    while heap.size != 0:
        w = arrayHeap.removeMin(heap)
        alst.append(w)
    return alst
is this a valid heap sort function?

Assuming your arrayHeap provides the same guarantees as the stdlib's heapq or any other reasonable heap implementation, then this is a valid heap sort, but it's a very silly one.
By copying the original sequence into a list and then popping from the left side, you're doing O(N^2) preparation for your O(N log N) sort.
If you change this to pop from the right side, then you're only doing O(N) preparation, so the whole thing takes O(N log N), as a heapsort should.
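For illustration, here is a minimal sketch of that pop-from-the-right version; since I don't have your arrayHeap, I'm using the stdlib's heapq as a stand-in, and the only point is the alst.pop() call:
import heapq

def heapSortPopRight(lst):
    alst = list(lst)
    heap = []
    while alst:
        heapq.heappush(heap, alst.pop())   # pop() from the right end is O(1)
    return [heapq.heappop(heap) for _ in range(len(heap))]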
That being said, I can't understand why you want to pop off the list instead of just iterating over it. Or, for that matter, why you want to copy the original sequence into a list instead of just iterating over it directly. If you do that, it will be faster, and use only half the memory, and be much simpler code. Like this:
def heapSort(lst):
    heap = arrayHeap.mkHeap(len(lst), arrayHeap.less)
    for v in lst:
        arrayHeap.add(heap, v)
    alst = []
    while heap.size:
        w = arrayHeap.removeMin(heap)
        alst.append(w)
    return alst
With a slightly nicer API, like the one in the stdlib's heapq module (is there a reason you're not using it, by the way?), you can make this even simpler:
import heapq

def heapSort(lst):
    alst = []
    for v in lst:
        heapq.heappush(alst, v)
    return [heapq.heappop(alst) for i in range(len(alst))]
… or, if you're sure lst is a list and you don't mind mutating it:
def heapSort(lst):
    heapq.heapify(lst)
    return [heapq.heappop(lst) for i in range(len(lst))]
… or, of course, you can copy lst and then mutate the copy:
def heapSort(lst):
    alst = lst[:]
    heapq.heapify(alst)
    return [heapq.heappop(alst) for i in range(len(alst))]
You may notice that the first one is the first of the Basic Examples in the heapq docs.
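As a quick sanity check with made-up data, any of the heapq-based versions above should agree with sorted():
data = [5, 1, 4, 1, 5, 9, 2, 6]
print(heapSort(data[:]))                  # [1, 1, 2, 4, 5, 5, 6, 9]
print(heapSort(data[:]) == sorted(data))  # True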

Related

Does one for loop mean a time complexity of n in this case?

So, I've run into this problem in the daily coding problem challenge, and I've devised two solutions. However, I am unsure if one is better than the other in terms of time complexity (Big O).
# Given a list of numbers and a number k,
# return whether any two numbers from the list add up to k.
#
# For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
#
# Bonus: Can you do this in one pass?
# The above part seemed to denote this can be done in O(n).
def can_get_value(lst=[11, 15, 3, 7], k=17):
    for x in lst:
        for y in lst:
            if x+y == k:
                return True
    return False
def optimized_can_get_value(lst=[10, 15, 3, 7], k=17):
    temp = lst
    for x in lst:
        if k-x in temp:
            return True
        else:
            return False
def main():
    print(can_get_value())
    print(optimized_can_get_value())

if __name__ == "__main__":
    main()
I think the second is better than the first since it has one for loop, but I'm not sure if it is O(n), since I'm still running through two lists. Another solution I had in mind that was apparently a O(n) solution was using the python equivalent of "Java HashSets". Would appreciate confirmation, and explanation of why/why not it is O(n).
The first solution can_get_value() is textbook O(n^2). You know this.
The second solution is as well. This is because checking `elem in list` has O(n) complexity, and you're executing it n times: O(n) * O(n) = O(n^2).
The O(n) solution here is to convert from a list into a set (or, well, any type of hash table - dict would work too). The following code runs through the list exactly twice, which is O(n):
def can_get_value(lst, k):
    st = set(lst)      # make a hashtable (set) where each key is the same as its value
    for x in st:       # this executes n times --> O(n)
        if k-x in st:  # unlike for lists, `in` is O(1) for hashtables
            return True
    return False
This is thus O(n) * O(1) = O(n) in most cases.
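For example, with the sample data from the problem statement:
print(can_get_value([10, 15, 3, 7], 17))   # True, because 10 + 7 == 17
print(can_get_value([10, 15, 3, 7], 19))   # False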
In order to analyze the asymptotic runtime of your code, you need to know the runtime of each of the functions which you call as well. We generally think of arithmetic expressions like addition as being constant time (O(1)), so your first function has two for loops over n elements and the loop body only takes constant time, coming out to O(n * n * 1) = O(n^2).
The second function has only one for loop, but checking membership for a list is an O(n) function in the length of the list, so you still have O(n * n) = O(n^2). The latter option may still be faster (Python probably has optimized code for checking list membership), but it won't be asymptotically faster (the runtime still increases quadratically in n).
EDIT - as @Mark_Meyer pointed out, your second function is actually O(1) because there's a bug in it; sorry, I skimmed it and didn't notice. This answer assumes a corrected version of the second function like
def optimized_can_get_value(lst, k=17):
    for x in lst:
        if k - x in lst:
            return True
    return False
(Note - don't give your function a mutable default value. See this SO question for the troubles that can bring. I also removed the temporary list because there's no need for it; it was just pointing to the same list object anyway.)
EDIT 2: for fun, here are a couple of O(n) solutions to this (both use the fact that checking containment for a set is O(1)).
A one-liner which still stops as soon as a solution is found:
def get_value_one_liner(lst, k):
    return any(k - x in set(lst) for x in lst)
EDIT 3: I think this is actually O(n^2) because we call set(lst) for each x. Using Python 3.8's assignment expressions could, I think, give us a one-liner that is still efficient. Does anybody have a good Python <3.8 one-liner?
And a version which tries not to do extra work by building up a set as it goes (not sure if this is actually faster in practice than creating the whole set at the start; it probably depends on the actual input data):
def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        if x in values:
            return True
        values.add(k - x)
    return False
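As for the pre-3.8 question in EDIT 3, one possible (admittedly obscure) trick is to bind the set once by looping over a one-element list inside the generator expression; this is just a sketch, not necessarily the cleanest option:
def get_value_one_liner_v2(lst, k):
    # set(lst) is built exactly once (the outer loop has a single element),
    # and any() still stops at the first match.
    return any(k - x in s for s in [set(lst)] for x in lst)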

O(n) list subtraction

When working on an AoC puzzle, I found I wanted to subtract lists (preserving ordering):
def bag_sub(list_big, sublist):
    result = list_big[:]
    for n in sublist:
        result.remove(n)
    return result
I didn't like the way the list.remove call (which is itself O(n)) is contained within the loop; that seemed needlessly inefficient. So I tried to rewrite it to avoid that:
from collections import Counter

def bag_sub(list_big, sublist):
    c = Counter(sublist)
    result = []
    for k in list_big:
        if k in c:
            c -= Counter({k: 1})
        else:
            result.append(k)
    return result
Is this now O(n), or does the Counter.__isub__ usage still screw things up?
This approach requires that elements must be hashable, a restriction which the original didn't have. Is there an O(n) solution which avoids creating this additional restriction? Does Python have any better "bag" datatype than collections.Counter?
You can assume sublist is half the length of list_big.
I'd use a Counter, but I'd probably do it slightly differently, and I'd probably do this iteratively...
def bag_sub(big_list, sublist):
    sublist_counts = Counter(sublist)
    result = []
    for item in big_list:
        if sublist_counts[item] > 0:
            sublist_counts[item] -= 1
        else:
            result.append(item)
    return result
This is very similar to your solution, but it's probably not efficient to create an entire new Counter every time you want to decrement the count on something.[1]
Also, if you don't need to return a list, then consider a generator function...
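For example, a minimal sketch of what that generator version might look like (same Counter logic as above, just yielding instead of appending; the name iter_bag_sub is mine):
def iter_bag_sub(big_list, sublist):
    # Yields the surviving items of big_list lazily, preserving their order.
    sublist_counts = Counter(sublist)
    for item in big_list:
        if sublist_counts[item] > 0:
            sublist_counts[item] -= 1   # cancel one occurrence
        else:
            yield item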
This works as long as all of the elements in list_big and sublist can be hashed. This solution is O(N + M) where N and M are the lengths of list_big and sublist respectively.
If the elements cannot be hashed, you are out of luck unless you have other constraints (e.g. the inputs are sorted using the same criterion). If your inputs are sorted, you could do something similar to the merge stage of merge-sort to determine which elements from list_big are in sublist.
[1] Note that Counters also behave a lot like a defaultdict(int), so it's perfectly fine to look for an item in a Counter that isn't there already.
Is this now O(n), or does the Counter.__isub__ usage still screw things up?
This would be expected-case O(n), except that when Counter.__isub__ discards nonpositive values, it goes through every key to do so. You're better off just subtracting 1 from the key value the "usual" way and checking c[k] instead of k in c. (c[k] is 0 for k not in c, so you don't need an in check.)
if c[k]:
    c[k] -= 1
else:
    result.append(k)
Is there an O(n) solution which avoids creating this additional restriction?
Only if the inputs are sorted, in which case a standard variant of a mergesort merge can do it.
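For completeness, here is a rough sketch of that merge-style approach, assuming both inputs are already sorted under the same ordering and that every element of sublist really does occur in list_big:
def bag_sub_sorted(list_big, sublist):
    # O(N + M): walk both sorted sequences once; no hashing required.
    result = []
    i = 0
    for item in list_big:
        if i < len(sublist) and sublist[i] == item:
            i += 1                      # cancel one matching occurrence
        else:
            result.append(item)
    return result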
Does Python have any better "bag" datatype than collections.Counter?
collections.Counter is Python's bag.
Removing an item from a list of length N is O(N) if the list is unordered, because you have to find it.
Removing k items from a list of length N, therefore, is O(kN) if we focus on "reasonable" cases where k << N.
So I don't see how you could get it down to O(N).
A concise way to write this:
new_list = [x for x in list_big if x not in sublist]
But that's still O(kN).

Simple Merge Sort bug in Python

I'm doing a Merge Sort assignment in Python, but I keep getting the error RuntimeError: maximum recursion depth exceeded
Here's my code:
def merge_sort(list):
    left_num = len(list) // 2
    left_sorted = merge_sort(list[:left_num])
    right_sorted = merge_sort(list[left_num:])
    final_sort = merge(left_sorted, right_sorted)
    return final_sort

def merge(left_sorted, right_sorted):
    final_sort = []
    while left_sorted and right_sorted:
        if left_sorted[0] <= right_sorted[0]:
            final_sort.append(left_sorted[0])
            left_sorted.pop(0)
        else:
            final_sort.append(right_sorted[0])
            right_sorted.pop(0)
    final_sort = final_sort + left_sorted + right_sorted
    return final_sort

if __name__ == "__main__":
    list = [4, 2]
    print(merge_sort(list))
Can someone tell me why? To make the problem more usable to others, feel free to edit the question to make it make more sense. ^_^
When you write a recursive function, you should be careful about the base case, which decides when the recursion should come to an end.
In your case, the base case is missing. For example, if the list has only one element, then you don't have to recursively sort it again. So, that is your base condition.
def merge_sort(list):
    if len(list) == 1:
        return list
    ...
    ...
Note: The variable name list shadows the builtin list type, so it's better to avoid using builtin names.
Since you are doing a lot of pop(0)s, it's worth noting that pop(0) is not efficient on lists. Quoting Python's official documentation,
Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.
So, the better alternative would be to use collections.deque, instead of list, if you are popping a lot. The actual popping from a deque is done with popleft method.
>>> from collections import deque
>>> d = deque([4, 2])
>>> d.popleft()
4
>>> d
deque([2])
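For instance, here is a rough sketch (my own code, not from the original answer) of the merge step rewritten with deques; it is the same logic as your merge, just with O(1) pops from the left:
from collections import deque

def merge(left_sorted, right_sorted):
    left, right = deque(left_sorted), deque(right_sorted)
    final_sort = []
    while left and right:
        if left[0] <= right[0]:
            final_sort.append(left.popleft())   # O(1), unlike list.pop(0)
        else:
            final_sort.append(right.popleft())
    final_sort.extend(left)
    final_sort.extend(right)
    return final_sort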
You don't have an exit point in merge_sort. You need to do something like:
if len(list) <= 1:
    return list
left_num = len(list) // 2
You always need to have a conditional exit in recursion function: if COND then EXIT else RECURSION_CALL.

What is a typical way to add a reverse feature to an insertion sort?

I wrote the following insertion sort algorithm
def insertionSort(L, reverse=False):
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j-1
        while i >= 0 and L[i] > valToInsert:
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
Edit: All you need to do is change the final > to < to get it to work in reverse.
However, what do most people do in these situations? Write the algorithm twice in two if-statements, one where it's > and the other where it's < instead? What is the "correct" way to typically handle these kinds of scenarios where the change is minor but it simply changes the nature of the loop/code entirely?
I know this question is a little subjective.
You could use a variable for the less-than operator:
import operator

def insertionSort(L, reverse=False):
    lt = operator.gt if reverse else operator.lt
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j-1
        while 0 <= i and lt(valToInsert, L[i]):
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
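A quick usage check with made-up data (on Python 3 you would swap xrange for range):
print(insertionSort([3, 1, 2]))                # [1, 2, 3]
print(insertionSort([3, 1, 2], reverse=True))  # [3, 2, 1]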
Option 1:
def insertionSort(L, reverse=False):
    # loop is the same...
    if reverse:
        L.reverse()
    return L
Option 2:
def insertionSort(L, reverse=False):
    if reverse:
        cmpfunc = lambda a, b: cmp(b, a)
    else:
        cmpfunc = cmp
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j-1
        while i >= 0 and cmpfunc(L[i], valToInsert) > 0:
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
You'll probably notice that sorted and list.sort and all other functions that do any kind of potentially-decorated processing have a key parameter, and those that specifically do ordering also have a reverse parameter. (The Sorting Mini-HOWTO covers this.)
So, you can look at how they're implemented. Unfortunately, in CPython, all of this stuff is implemented in C. Plus, it uses a custom algorithm called "timsort" (described in listsort.txt). But I think I can explain the key parts here, since it's blindingly simple. The list.sort code is separate from the sorted code, and they're both spread out over a slew of functions. But if you just look at the top-level function listsort, you can see how it handles the reverse flag:
/* Reverse sort stability achieved by initially reversing the list,
   applying a stable forward sort, then reversing the final result. */
if (reverse) {
    if (keys != NULL)
        reverse_slice(&keys[0], &keys[saved_ob_size]);
    reverse_slice(&saved_ob_item[0], &saved_ob_item[saved_ob_size]);
}
Why reverse the list at the start as well as the end? Well, in the case where the list is nearly-sorted in the first place, many sort algorithms—including both timsort and your insertion sort—will do a lot better starting in the right order than in backward order. Yes, it wastes an O(N) reverse call, but you're already doing one of those—and, since any sort algorithm is at least O(N log N), and yours is specifically O(N^2), this doesn't make it algorithmically worse. Of course for smallish N, and a better sort, and a list in random order, this wasted 2N is pretty close to N log N, so it can make a difference in practice. It'll be a difference that vanishes as N gets huge, but if you're sorting millions of smallish lists, rather than a few huge ones, it might be worth worrying about.
Second, notice that it does the reversing by creating a reverse slice. This, at least potentially, could be optimized by referencing the original list object with __getitem__ in reverse order, meaning the two reversals are actually O(1). The simplest way to do this is to literally create a reverse slice: lst[::-1]. Unfortunately, this actually creates a new reversed list, so timsort includes its own custom reverse-slice object. But you can do the same thing in Python by creating a ReversedList class.
This probably won't actually be faster in CPython, because the cost of the extra function calls is probably high enough to swamp the differences. But you're complaining about the algorithmic cost of the two reverse calls, and this solves the problem, in effectively the same way that the built-in sort functions do.
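If you want to experiment with that idea, here is a bare-bones sketch (the name ReversedList and the code are mine, not from any library); it supports only the minimal read-only indexing needed, with no negative indices or slices:
class ReversedList(object):
    """O(1) read-only reversed view over an existing list (no copying)."""
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return self.data[len(self.data) - 1 - i]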
You can also look at how PyPy does it. Its list is implemented in listobject.py. It delegates to one of a few different Strategy classes depending on what the list contains, but if you look over all of the strategies (except the ones that have nothing to do), they basically do the same thing: sort the list, then reverse it.
So, it's good enough for CPython, and for PyPy… it's probably good enough for you.

Question on a solution from Google python class day

Hey,
I'm trying to learn a bit about Python so I decided to follow Google's tutorial. Anyway, I had a question regarding one of their solutions for an exercise. Here's how I did it:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
    # +++your code here+++
    return sorted(list1 + list2)
However they did it in a more complicated way. So is Google's solution quicker? Because I noticed in the comment lines that the solution should work in "linear" time, which mine probably isn't?
This is their solution
def linear_merge(list1, list2):
    # +++your code here+++
    # LAB(begin solution)
    result = []
    # Look at the two lists so long as both are non-empty.
    # Take whichever element [0] is smaller.
    while len(list1) and len(list2):
        if list1[0] < list2[0]:
            result.append(list1.pop(0))
        else:
            result.append(list2.pop(0))
    # Now tack on what's left
    result.extend(list1)
    result.extend(list2)
    return result
Could this be another solution?
def linear_merge(list1, list2):
    tmp = []
    while len(list1) and len(list2):
        #print list1[-1], list2[-1]
        if list1[-1] > list2[-1]:
            tmp.append(list1.pop())
        else:
            tmp.append(list2.pop())
        #print "tmp = ", tmp
        #print list1, list2
    tmp = tmp + list1
    tmp = tmp + list2
    tmp.reverse()
    return tmp
Yours is not linear, but that doesn't mean it's slower. Algorithmic complexity ("big-oh notation") is often only a rough guide and always only tells one part of the story.
However, theirs isn't linear either, though it may appear to be at first blush. Popping from a list requires moving all later items, so popping from the front requires moving all remaining elements.
It is a good exercise to think about how to make this O(n). The below is in the same spirit as the given solution, but avoids its pitfalls while generalizing to more than 2 lists for the sake of exercise. For exactly 2 lists, you could remove the heap handling and simply test which next item is smaller.
import heapq

def iter_linear_merge(*args):
    """Yield non-decreasing items from the given sorted iterables."""
    # Technically, [1, 1, 2, 2] isn't an "increasing" sequence,
    # but it is non-decreasing.
    nexts = []
    for x in args:
        x = iter(x)
        for n in x:
            heapq.heappush(nexts, (n, x))
            break
    while len(nexts) >= 2:
        n, x = heapq.heappop(nexts)
        yield n
        for n in x:
            heapq.heappush(nexts, (n, x))
            break
    if nexts:  # Degenerate case of the heap, not strictly required.
        n, x = nexts[0]
        yield n
        for n in x:
            yield n
Instead of the last if-for, the while loop condition could be changed to just "nexts", but it is probably worthwhile to specially handle the last remaining iterator.
If you want to strictly return a list instead of an iterator:
def linear_merge(*args):
    return list(iter_linear_merge(*args))
With mostly-sorted data, timsort approaches linear. Also, your code doesn't have to screw around with the lists themselves. Therefore, your code is possibly just a bit faster.
But that's what timing is for, innit?
I think the issue here is that the tutorial is illustrating how to implement a well-known algorithm called 'merge' in Python. The tutorial is not expecting you to actually use a library sorting function in the solution.
sorted() is O(n log n) in the worst case, so your solution cannot be linear in the worst case.
It is important to understand how merge() works because it is useful in many other algorithms. It exploits the fact the input lists are individually sorted, moving through each list sequentially and selecting the smallest option. The remaining items are appended at the end.
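To make that concrete, here is one illustrative index-based merge (my own sketch, not the tutorial's solution) that makes a single pass without any pop(0) calls:
def linear_merge(list1, list2):
    # Single pass over both sorted lists using indices: O(len(list1) + len(list2)).
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    result.extend(list1[i:])   # at most one of these has anything left
    result.extend(list2[j:])
    return result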
The question isn't which is 'quicker' for a given input case but about which algorithm is more complex.
There are hybrid variations of merge-sort which fall back on another sorting algorithm once the input list size drops below a certain threshold.
