I have implemented insertion sort in Python and was wondering how to determine the complexity of the algorithm. Is this an inefficient way of implementing insertion sort? To me, this seems like the most readable algorithm.
import random as rand
source = [3,1,0,10,20,2,1]
target = []
while len(source)!=0:
    if len(target) ==0:
        target.append(source[0])
        source.pop(0)
    element = source.pop(0)
    if(element <= target[0]):
        target.reverse()
        target.append(element)
        target.reverse()
    elif element > target[len(target)-1]:
        target.append(element)
    else:
        for i in range(0,len(target)-1):
            if element >= target[i] and element <= target[i+1]:
                target.insert(i+1,element)
                break
print target
Instead of:
target.reverse()
target.append(element)
target.reverse()
try:
target.insert(0, element)
Also, consider a for loop instead of the while loop, which avoids the repeated source.pop(0) calls:
for value in source:
    ...
In the final else block, the first part of the if test is redundant:
else:
    for i in range(0,len(target)-1):
        if element >= target[i] and element <= target[i+1]:
            target.insert(i+1,element)
            break
Since the list is already sorted, as soon as you find an element larger than the one you're inserting, you've found the insertion location.
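Putting these suggestions together gives something like the following. This is only a minimal sketch: the function name insertion_sort_copy is hypothetical, and it deliberately keeps the question's approach of building a second list rather than sorting in place.

```python
def insertion_sort_copy(source):
    """Return a sorted copy of source, built by inserting one value at a time."""
    target = []
    for value in source:
        # target is always sorted, so the first larger element marks the spot.
        for i, existing in enumerate(target):
            if existing > value:
                target.insert(i, value)
                break
        else:
            # No larger element was found: value belongs at the end.
            target.append(value)
    return target


print(insertion_sort_copy([3, 1, 0, 10, 20, 2, 1]))
```

The for/else form covers the empty-list and append-at-end cases without separate branches.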
I would say it is rather inefficient. How can you tell? Your approach creates a second array, but you don't need one in an insertion sort. You also use a lot of operations: insertion sort only requires lookups and shifts, but you have lookups, appends, pops, inserts, and reverses. So you know that you can probably do better.
def insertionsort( aList ):
    for i in range( 1, len( aList ) ):
        tmp = aList[i]
        k = i
        while k > 0 and tmp < aList[k - 1]:
            aList[k] = aList[k - 1]
            k -= 1
        aList[k] = tmp
This code is taken from geekviewpoint.com. It is an O(n^2) algorithm in the worst case: the outer loop runs n - 1 times, and for each i the inner while-loop can shift up to i elements. If the input is already sorted, however, it runs in O(n), since the while-loop is skipped every time because tmp < aList[k - 1] fails immediately.
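One way to see both cases is to count how often the inner loop shifts an element. The sketch below is an instrumented copy of the same algorithm; the name insertion_sort_shifts and the counter are additions for illustration, not part of the original code.

```python
def insertion_sort_shifts(values):
    """Sort a copy of values with insertion sort and count the shifts made."""
    items = list(values)
    shifts = 0
    for i in range(1, len(items)):
        tmp = items[i]
        k = i
        while k > 0 and tmp < items[k - 1]:
            items[k] = items[k - 1]  # shift the larger element one slot right
            k -= 1
            shifts += 1
        items[k] = tmp
    return items, shifts


n = 10
print(insertion_sort_shifts(range(n))[1])         # already sorted: 0 shifts
print(insertion_sort_shifts(range(n, 0, -1))[1])  # reversed: n*(n-1)/2 = 45 shifts
```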
Could anyone explain exactly what's happening under the hood to make the recursive approach in the following problem much faster and efficient in terms of time complexity?
The problem: Write a program that would take an array of integers as input and return the largest three numbers sorted in an array, without sorting the original (input) array.
For example:
Input: [22, 5, 3, 1, 8, 2]
Output: [5, 8, 22]
Even though we could simply sort the original array and return the last three elements, that would take O(n log n) time, since that is the best a comparison-based sort can do. So the challenge is to do better and complete the task in O(n) time.
So I was able to come up with a recursive solution:
def findThreeLargestNumbers(array, largest=[]):
    if len(largest) == 3:
        return largest
    max = array[0]
    for i in array:
        if i > max:
            max = i
    array.remove(max)
    largest.insert(0, max)
    return findThreeLargestNumbers(array, largest)
Here I keep finding the largest number, removing it from the original array, inserting it at the front of my result array, and recursively calling the function again until there are three elements in the result.
However, when I looked at the suggested iterative method, I composed this code:
def findThreeLargestNumbers(array):
    sortedLargest = [None, None, None]
    for num in array:
        check(num, sortedLargest)
    return sortedLargest
def check(num, sortedLargest):
    for i in reversed(range(len(sortedLargest))):
        if sortedLargest[i] is None:
            sortedLargest[i] = num
            return
        if num > sortedLargest[i]:
            shift(sortedLargest, i, num)
            return
def shift(array, idx, element):
    if idx == 0:
        array[0] = element
        return array
    array[0] = array[1]
    array[idx-1] = array[idx]
    array[idx] = element
    return array
Both versions passed all the tests, and I was convinced that the iterative approach was faster (even though it is not as clean). However, I imported the time module and put both to the test by providing an array of one million random integers and measuring how long each solution took to return the sorted array of the three largest numbers.
The recursive approach was much faster (about 9 times faster) than the iterative approach!
Why is that? The recursive approach traverses the huge array three times and, on top of that, removes an element on each pass (which takes O(n) time, as up to 999,999 other elements need to be shifted in memory), whereas the iterative approach traverses the input array only once. It does perform some operations at every iteration, but only on a negligible array of size 3, which should take hardly any time at all.
I really want to be able to judge and pick the most efficient algorithm for any given problem so any explanation would tremendously help.
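For reference, a timing comparison like the one described could be set up roughly as follows. This is a sketch under assumptions: time_once and sort_baseline are hypothetical names, and the baseline simply sorts, so substitute the functions being compared. Each call gets a fresh copy of the data, because a function that removes elements from its argument (as the recursive version does) would otherwise change the input for later runs.

```python
import random
import time


def time_once(func, data):
    """Time one call of func on a private copy of data; return (seconds, result)."""
    fresh = list(data)  # copy, so a mutating function cannot affect later runs
    start = time.perf_counter()
    result = func(fresh)
    return time.perf_counter() - start, result


def sort_baseline(values):
    """O(n log n) baseline: sort a copy and take the last three values."""
    return sorted(values)[-3:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    data = [rng.randrange(10**9) for _ in range(100_000)]
    elapsed, top3 = time_once(sort_baseline, data)
    print(f"{elapsed:.4f} s", top3)
```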
Advice for optimization.
Avoid function calls. Avoid creating temporary garbage. Avoid extra comparisons. Have logic that looks at elements as little as possible. Walk through how your code works by hand and look at how many steps it takes.
Your recursive code makes only 3 function calls and, as pointed out elsewhere, does an average of 1.5 comparisons per element per call (1 while looking for the max, 0.5 while finding the element to remove).
Your iterative code makes several comparisons per element, calls extra functions for every element, and calls things like reversed(range(...)) that create and destroy temporary objects.
Now compare with this iterative solution:
def find_largest(array, limit=3):
    if len(array) <= limit:
        # Special logic not needed.
        return sorted(array)
    else:
        # Initialize the answer to values that will be replaced.
        min_val = min(array[0:limit])
        answer = [min_val for _ in range(limit)]
        # Now scan for larger values.
        for i in array:
            if answer[0] < i:
                # Sift elements down until we find the right spot.
                j = 1
                while j < limit and answer[j] < i:
                    answer[j-1] = answer[j]
                    j = j+1
                # Now insert.
                answer[j-1] = i
        return answer
There are no function calls. It is possible that you can make up to 6 comparisons per element (verify that answer[0] < i, verify that (j=1) < 3, verify that answer[1] < i, verify that (j=2) < 3, verify that answer[2] < i, then find that (j=3) < 3 is not true). You will hit that worst case if array is sorted. But most of the time you only do the first comparison then move to the next element. No muss, no fuss.
How does it benchmark?
Note that if you wanted the largest 100 elements, then you'd find it worthwhile to use a smarter data structure such as a heap to avoid the element-by-element shifting.
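For larger limits, Python's standard heapq module is one way to do this. The sketch below (largest_with_heap is a hypothetical name) keeps a min-heap of the limit largest values seen so far, so each replacement costs O(log limit) instead of shifting up to limit elements.

```python
import heapq


def largest_with_heap(values, limit=3):
    """Return the limit largest values in ascending order using a min-heap."""
    if limit <= 0:
        return []
    heap = []
    for value in values:
        if len(heap) < limit:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest value kept so far; swap it out.
            heapq.heapreplace(heap, value)
    return sorted(heap)


print(largest_with_heap([22, 5, 3, 1, 8, 2]))
```

The library also offers heapq.nlargest(limit, values), which returns the same values in descending order.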
I am not really comfortable with Python, but I have a different approach to the problem, for what it's worth.
As far as I can see, all the solutions posted are O(NM), where N is the length of the input array and M is the number of largest elements requested.
Because in your specific situation N >> M, you could call it O(N), but the larger M gets, the more it behaves like O(NM).
I agree with @zvone that the iterative solution seems to do more steps per element, which sounds like a valid explanation for the difference in running time.
Back to my proposal: it implements binary search with recursion, aiming for O(N*logM):
import math
def binarySearch(arr, target, origin = 0):
    """
    Recursive binary search
    Args:
        arr (list): List of numbers to search in
        target (int): Number to search with
    Returns:
        int: index + 1 from immediate lower element to target in arr or -1 if already present or lower than the lowest in arr
    """
    half = math.floor((len(arr) - 1) / 2)
    if target > arr[-1]:
        return origin + len(arr)
    if len(arr) == 1 or target < arr[0]:
        return -1
    if arr[half] < target and arr[half+1] > target:
        return origin + half + 1
    if arr[half] == target or arr[half+1] == target:
        return -1
    if arr[half] < target:
        return binarySearch(arr[half:], target, origin + half)
    if arr[half] > target:
        return binarySearch(arr[:half + 1], target, origin)
def findLargestNumbers(array, limit = 3, result = []):
    """
    Recursive linear search of the largest values in an array
    Args:
        array (list): Array of numbers to search in
        limit (int): Length of array returned. Default: 3
    Returns:
        list: Array of max values with length as limit
    """
    if len(result) == 0:
        result = [float('-inf')] * limit
    if len(array) < 1:
        return result
    val = array[-1]
    foundIndex = binarySearch(result, val)
    if foundIndex != -1:
        result.insert(foundIndex, val)
        return findLargestNumbers(array[:-1],limit, result[1:])
    return findLargestNumbers(array[:-1], limit,result)
It is quite flexible and might be inspiration for a more elaborated answer.
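The same idea can be sketched iteratively with the standard bisect module, which avoids the recursion and the list slicing. This is an assumption-laden variant rather than a translation of the code above: largest_with_bisect is a hypothetical name, and unlike binarySearch it keeps duplicate values instead of skipping them.

```python
import bisect


def largest_with_bisect(values, limit=3):
    """Return the limit largest values in ascending order."""
    if limit <= 0:
        return []
    best = []  # kept sorted ascending, never longer than limit
    for value in values:
        if len(best) < limit:
            bisect.insort(best, value)
        elif value > best[0]:
            # Binary search finds the slot; the smallest kept value is dropped.
            bisect.insort(best, value)
            del best[0]
    return best


print(largest_with_bisect([22, 5, 3, 1, 8, 2]))
```

The search itself takes O(log M) comparisons, although inserting into a Python list still moves up to M elements.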
The recursive solution
The recursive function goes through the list 3 times to find the largest number and removes the largest number from the list 3 times.
for i in array:
    if i > max:
        ...
and
array.remove(max)
So, you have 3×N comparisons, plus 3 removals. I guess the removal is optimized in C, but each one still needs about N/2 comparisons on average to find the item to be removed, so about 3×(N/2) more.
So, a total of approximately 4.5 × N comparisons.
The other solution
The other solution goes through the list only once, but each time it compares to the three elements in sortedLargest:
for i in reversed(range(len(sortedLargest))):
    ...
and almost each time it sorts the sortedLargest with these three assignments:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
So, N times, you are:
calling check
creating and reversing a range(3)
accessing sortedLargest[i]
comparing num > sortedLargest[i]
calling shift
comparing idx == 0
and about 2×N/3 times doing:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
and N/3 times array[0] = element
It is difficult to count, but that is much more than 4.5×N comparisons.
I am trying to solve an assignment problem, but the code I wrote takes extremely long to run. I think it's due to the nested loops I used. Is there another way to write the code to make it more efficient?
The problem I am trying to solve: starting at the first element, compare it with every element to its right. If it is larger than all of them, it is a "dominator". Then compare the second element with every element to its right, and so on, all the way to the last element, which is automatically a "dominator".
def count_dominators(items):
    if len(items) ==0:
        return len(items)
    else:
        k = 1
        for i in range(1,len(items)):
            for j in items[i:]:
                if items[i-1]>j:
                    k = k+1
                else:
                    k = k+0
        return k
You can use a list comprehension to check if each item is a "dominator", then take the length - you have to exclude the final one to avoid taking max of an empty list, then we add 1 because we know that the final one is actually a dominator.
num_dominators = len([i for i in range(len(items) - 1) if items[i] > max(items[i + 1:])]) + 1
This is nice because it fits on one line, but the more efficient (single pass through the list) way to do it is to start at the end and every time we find a new number bigger than any we have seen before, count it:
def count_dominators(items):
    if not items:
        return 0
    biggest = items[-1]
    n = 1  # the last element is always a dominator
    for x in reversed(items):
        if x > biggest:
            biggest = x
            n += 1
    return n
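A quick way to gain confidence in the single-pass idea is to check it against a direct translation of the definition. Both function names below are hypothetical, and the sketch assumes a dominator must be strictly greater than every element to its right.

```python
def count_dominators_quadratic(items):
    """Direct translation of the definition: O(n^2) comparisons."""
    return sum(
        all(items[i] > later for later in items[i + 1:])
        for i in range(len(items))
    )


def count_dominators_linear(items):
    """Single pass from the right, tracking the largest value seen so far."""
    count = 0
    biggest = None
    for x in reversed(items):
        if biggest is None or x > biggest:
            biggest = x
            count += 1
    return count


sample = [42, 7, 12, 9, 2, 5]
print(count_dominators_quadratic(sample), count_dominators_linear(sample))
```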
I should not use advanced functions, as this is a logic test for an interview.
I am trying to remove duplicate values from an array, keeping one copy of each.
testcase:
a=[1,1,2,3,2,4,5,6,7]
code:
def dup(a):
    i=0
    arraySize = len(a)
    print(arraySize)
    while i < arraySize:
        #print("1 = ",arraySize)
        k=i+1
        for k in range(k,arraySize):
            if a[i] == a[k]:
                a.remove(a[k])
                arraySize -= 1
                #print("2 = ",arraySize)
        i += 1
    print(a)
The result should be: 1,2,3,4,5,6,7
But I keep getting an index out of range error. I know that it is because the list changes inside the loop, so the length the loops started with no longer matches the current length.
The question is: is there any way to sync the new length (of the list inside the loop) with the parent loop (the index in the while loop)?
The only thing I can think of is to use a function inside the loop.
Any hints?
Re-Calculating Array Size Per Iteration
It looks like we have a couple of issues here. The first issue is that you can't update the "stop" value of your inner loop: range(k, arraySize) is evaluated once, when the loop starts. So first off, let's remove that and use another while loop, which gives us the ability to re-check the array size on every iteration.
Re-Checking Values Shifted Into Removed List Spot
Next, after you fix that, you will run into a larger issue. When you remove an element, every element after it shifts one position to the left to fill the gap, and you are not re-checking the value that got moved into the removed element's spot. To resolve this, we need to decrement i whenever we remove an element, which makes sure we check the value that ends up in the removed element's spot.
remove vs del
You should use del rather than remove in this case. remove searches the list and removes the first occurrence of the value, whereas we already know the exact index of the element we want to remove. remove might work, but its usage here overcomplicates things a bit.
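The difference is easy to see on a small list (the values here are made up for illustration):

```python
values = [1, 2, 1, 3]

by_value = list(values)
by_value.remove(by_value[2])  # searches from the front, so it drops index 0

by_index = list(values)
del by_index[2]               # drops exactly the element at index 2

print(by_value)  # [2, 1, 3]
print(by_index)  # [1, 2, 3]
```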
Functional Code with Minimal Changeset
def dup(a):
    i = 0
    arraySize = len(a)
    print(arraySize)
    while i < arraySize:
        k = i + 1
        while k < arraySize: # CHANGE: use a while loop to have greater control over the array size.
            if a[i] == a[k]:
                print("Duplicate found at indexes %d and %d." % (i, k))
                del a[i] # CHANGE: used del instead of remove.
                i -= 1 # CHANGE: you need to recheck the new value that got placed into the old removed spot.
                arraySize -= 1
                break
            k += 1
        i += 1
    return a
Now, I'd like to note that we have some readability and maintainability issues with the code above. Iterating through an array and manipulating the iterator in the way we are doing is a bit messy and could be prone to simple mistakes. Below are a couple ways I'd implement this problem in a more readable and maintainable manner.
Simple Readable Alternative
def remove_duplicates(old_numbers):
    """ Simple/naive implementation to remove duplicate numbers from a list of numbers. """
    new_numbers = []
    for old_number in old_numbers:
        is_duplicate = False
        for new_number in new_numbers:
            if old_number == new_number:
                is_duplicate = True
        if not is_duplicate:
            new_numbers.append(old_number)
    return new_numbers
Optimized Low Level Alternative
def remove_duplicates(numbers):
    """ Removes all duplicates in the list of numbers in place. """
    for i in range(len(numbers) - 1, -1, -1):
        for k in range(i, -1, -1):
            if i != k and numbers[i] == numbers[k]:
                print("Duplicate found. Removing number at index: %d" % i)
                del numbers[i]
                break
    return numbers
You could copy contents in another list and remove duplicates from that and return the list. For example:
duplicate = a.copy()
f = 0
for j in range(len(a)):
    for i in range(len(duplicate)):
        if i < len(duplicate):
            if a[j] == duplicate[i]:
                f = f+1
                if f > 1:
                    f = 0
                    duplicate.remove(duplicate[i])
    f=0
print(duplicate)
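If a set counts as allowed in the interview (an assumption, since the question rules out "advanced" functions), the nested loop can be replaced by a single pass. The name remove_duplicates_keep_order is hypothetical; the sketch keeps the first occurrence of each value and returns a new list.

```python
def remove_duplicates_keep_order(numbers):
    """Return a new list with duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for number in numbers:
        if number not in seen:  # average O(1) membership test
            seen.add(number)
            unique.append(number)
    return unique


print(remove_duplicates_keep_order([1, 1, 2, 3, 2, 4, 5, 6, 7]))
```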
Given a list x, I want to sort it with selection sort, and then count the number of swaps made within the sort. So I came out with something like this:
count=0
a=0
n=len(x)
while (n-a)>0:
    #please recommend a better way to swap.
    i = (min(x[a:n]))
    x[i], x[a] = x[a], x[i]
    a += 1
    #the count must still be there
    count+=1
print (x)
Could you help me to find a way to manage this better? It doesn't work that well.
The problem is NOT about repeated elements. Your code doesn't work for lists with all elements distinct, either. Try x = [2,6,4,5].
i = (min(x[a:n]))
min() here gets the value of the minimum element in the slice, and then you use it as an index, that doesn't make sense.
You are confusing the value of an element, with its location. You must use the index to identify the location.
seq = [2,1,0,0]
beg = 0
n = len(seq)
while (n - beg) > 0:
    jdx = seq[beg:n].index((min(seq[beg:n]))) # index of the minimum within the unsorted right part
    seq[jdx + beg], seq[beg] = seq[beg], seq[jdx + beg] # swap the minimum with the first unsorted element.
    beg += 1
    print(seq)
print('-->', seq)
As the sorting progresses, the left part of the list, [0:beg], is sorted, and the right part, [beg:], is still being sorted, until completion.
jdx is the location (the index) of the minimum within the unsorted slice [beg:], so jdx + beg is its index in the full list. Finding the min must happen on the unsorted right part of the list.
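Since the question also asks for the number of swaps, here is one possible sketch that tracks the index of the minimum directly and counts only swaps that actually move something. The function name selection_sort_count_swaps is hypothetical, and whether a swap of an element with itself should count is an assumption; here it does not.

```python
def selection_sort_count_swaps(values):
    """Selection-sort a copy of values; return (sorted_list, number_of_swaps)."""
    items = list(values)
    swaps = 0
    for start in range(len(items)):
        smallest = start  # index (not value) of the minimum found so far
        for j in range(start + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        if smallest != start:
            items[start], items[smallest] = items[smallest], items[start]
            swaps += 1
    return items, swaps


print(selection_sort_count_swaps([2, 6, 4, 5]))
```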
Just learning Python and got on to the subject of sorting lists. Two types of algorithms were shown: insertion and selection. So, I had an idea and created this:
def DiffSort(lst):
    lstDiff = [None] * len(lst)
    i = 0
    while i < len(lst):
        lstDiff[i] = lst[i] - lst[i-1] if i != 0 else lst[0]
        if lstDiff[i] < 0:
            sbj, tmp = lst[i], lstDiff[i]
            while tmp < 0:
                i -= 1
                tmp += lstDiff[i]
                lst[i+1] = lst[i]
            lst[i] = sbj
        else:
            i += 1
lst = [13,25,18,122,32,1,0.78,25,85,1,32,56,0.55,0.6,17]
print(lst)
DiffSort(lst)
print(lst)
Any good? Is there a similar method out there already?
Use list.sort() if you want to sort a list in place.
Use sorted(list) if you want to get a sorted copy of the list.
The second option works with any iterable type, whereas the first is list-exclusive (some other types may define the same or a similar method, but you generally cannot expect that).
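A short illustration of the difference (the sample values are arbitrary):

```python
numbers = [13, 25, 18, 1, 0.78]

sorted_copy = sorted(numbers)    # new list; numbers is left untouched
print(sorted_copy, numbers)

returned = numbers.sort()        # sorts in place and returns None
print(returned, numbers)

from_tuple = sorted((3, 1, 2))   # works on any iterable, always returns a list
print(from_tuple)
```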
Since you seem to care about the algorithmic part of it, this may be of interest to you:
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Isn't lst.sort() good enough? It's bound to be much faster than a Python solution that has to run in O(n^2) time.