What is this code's space complexity - python

I'm not sure about the space complexity of these two selection sort implementations:
def selection_sort(lst):
    n = len(lst)
    for i in range(n):
        m_index = i
        for j in range(i+1,n):
            if lst[m_index] > lst[j]:
                m_index = j
        swap(lst, i, m_index)
    return None
and this one:
def selection_sort2(lst):
    n = len(lst)
    for i in range(n):
        m = min(lst[i:n])
        m_index = lst.index(m) #find the index of the minimum
        lst[i], lst[m_index] = lst[m_index], lst[i]
    return None
and, regarding the second code, where are the previous slices being saved, once m gets a new slice?
Thanks!

The first point to make is that your second function contains a bug in its use of index. Running this:
def selection_sort2(lst):
    n = len(lst)
    for i in range(n):
        m = min(lst[i:n])
        m_index = lst.index(m) #find the index of the minimum
        lst[i], lst[m_index] = lst[m_index], lst[i]
    return
l = [5,4,1,3,4]
selection_sort2(l)
print(l)
prints out
[1, 3, 5, 4, 4]
This is because you have misunderstood the index function. What it does is to find the first occurrence of the supplied value (here m) in the supplied list (here lst). So what your code is doing is first of all to create a slice and find its min. Then the slice goes out of scope and is garbage collected. Then you find the value in the whole list (in the wrong place in this example).
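The first-occurrence behaviour is easy to check in isolation. A minimal sketch (the list name values is just for illustration), which also shows the optional start argument of list.index that restricts where the search begins:

```python
values = [5, 4, 1, 3, 4]

# index() returns the position of the first match in the whole list.
first_four = values.index(4)

# An optional start argument makes the search begin at that position.
later_four = values.index(4, 2)

print(first_four, later_four)  # 1 4
```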
We can fix this by restricting the index to the slice, though bear in mind that this is not good code, as I will explain next.
m_index = lst.index(m,i) #find the index of the minimum
With this change, the function works, but it has two problems. The first is that the slicing does (as you suspected) create a copy and so doubles the memory requirement of the code. But the second problem is that once you find the minimum value, you then pointlessly iterate through the slice a second time to find the index of the place where you found the minimum, so also doubling the run time.
The copying can be fixed by replacing the slice with a generator expression. So instead of a slice we just produce the values one at a time.
Then we can arrange to find the index of the minimum by carrying it along with the value in a tuple. Then minimising the tuples provides us with the index at the same time. The resulting code looks like this:
def selection_sort2(lst):
    n = len(lst)
    for i in range(n):
        m,m_index = min((lst[j],j) for j in range(i,n))
        lst[i], lst[m_index] = lst[m_index], lst[i]
    return
However, this code is functionally more or less the same as your first example and probably not any clearer - so why change?
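For reference, the first function in the question calls a swap helper that is not shown. A minimal sketch, assuming swap simply exchanges two positions in place, suggests why that version needs only a constant amount of extra memory, whereas the slice-based version allocates a temporary copy of up to n elements on each pass:

```python
def swap(lst, i, j):
    # Assumed helper (not shown in the question): exchange two items in place.
    lst[i], lst[j] = lst[j], lst[i]


def selection_sort(lst):
    n = len(lst)
    for i in range(n):
        m_index = i
        for j in range(i + 1, n):
            if lst[m_index] > lst[j]:
                m_index = j
        swap(lst, i, m_index)  # no slices are created, so O(1) extra space


data = [5, 4, 1, 3, 4]
selection_sort(data)
print(data)  # [1, 3, 4, 4, 5]
```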

Related

Swapping List Elements During Iteration

I have read in several places that it is bad practice to modify an array/list during iteration. However many common algorithms appear to do this. For example Bubble Sort, Insertion Sort, and the example below for finding the minimum number of swaps needed to sort a list.
Is swapping list items during iteration an exception to the rule? If so why?
Is there a difference between what happens with enumerate and a simple for i in range(len(arr)) loop in this regard?
def minimumSwaps(arr):
    ref_arr = sorted(arr)
    index_dict = {v: i for i,v in enumerate(arr)}
    swaps = 0
    for i,v in enumerate(arr):
        print("i:", i, "v:", v)
        print("arr: ", arr)
        correct_value = ref_arr[i]
        if v != correct_value:
            to_swap_ix = index_dict[correct_value]
            print("swapping", arr[to_swap_ix], "with", arr[i])
            # Why can you modify list during iteration?
            arr[to_swap_ix],arr[i] = arr[i], arr[to_swap_ix]
            index_dict[v] = to_swap_ix
            index_dict[correct_value] = i
            swaps += 1
    return swaps
arr = list(map(int, "1 3 5 2 4 6 7".split(" ")))
assert minimumSwaps(arr) == 3
An array should not be modified while iterating through it, because iterators cannot handle the changes. But there are other ways to go through an array, without using iterators.
This is using iterators:
for index, item in enumerate(array):
    # don't modify array here
    pass
This is without iterators:
for index in range(len(array)):
    item = array[index]
    # feel free to modify array, but make sure index and len(array) are still OK
If the length & index need to be modified when modifying an array, do it even more "manually":
index = 0
while index < len(array):
    item = array[index]
    # feel free to modify array and modify index if needed
    index += 1
Modifying items in a list during iteration can sometimes produce unexpected results, but it's perfectly fine to do if you are aware of the effects. It's not unpredictable.
You need to understand that you are not iterating over a copy of the original list. The next item is always the item at the next index in the list, so if you alter the item at an index before the iterator reaches it, the iterator will yield the new value.
For example, if you try to move every item one index up by assigning the value yielded from enumerate() to index+1, you will end up with a list completely filled with the item originally at index 0.
a = ['a','b','c','d']
for i, v in enumerate(a):
    next_i = (i + 1) % len(a)
    a[next_i] = v
print(a) # prints ['a', 'a', 'a', 'a']
And if you append or insert items into the list while iterating over it, you may never reach the end.
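A small sketch of that runaway growth, with an artificial step limit (steps is an illustrative guard, not part of any real algorithm) so that the loop terminates:

```python
items = [1, 2, 3]
steps = 0

for item in items:
    steps += 1
    if steps >= 10:
        # Without this guard the loop would never end: every pass
        # adds one more element for the iterator to visit.
        break
    items.append(item)

print(steps, len(items))  # 10 12
```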
In your example, and, as you pointed out, in a lot of algorithms (e.g. combinatorics and sorting), changing the forthcoming items is part of the algorithm.
An iterator over a range, as in for i in range(len(arr)), won't adapt to changes in the original list, because the range is created before the loop starts and is immutable. So if the list has length 4 at the beginning, the loop will try to iterate exactly 4 times regardless of any changes to the list's length.
# This is probably a bad idea
for i in range(len(arr)):
    item = arr[i]
    if item == 0:
        arr.pop()
# This will work (don't ask for a use case)
for i, item in enumerate(arr):
    if item == 0:
        arr.pop()
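To make the difference concrete, here is a sketch that runs both loops on a sample list (the values and function names are arbitrary). The range-based loop keeps counting up to the original length and fails once the list has shrunk, while the list iterator checks the current length at each step:

```python
def pop_with_range(arr):
    # range(len(arr)) is evaluated once, before the list shrinks.
    for i in range(len(arr)):
        item = arr[i]
        if item == 0:
            arr.pop()


def pop_with_enumerate(arr):
    # The list iterator stops when its index passes the current end.
    for i, item in enumerate(arr):
        if item == 0:
            arr.pop()


first = [0, 1, 2, 0]
try:
    pop_with_range(first)
    outcome = "finished"
except IndexError:
    outcome = "IndexError"

second = [0, 1, 2, 0]
pop_with_enumerate(second)

print(outcome, second)  # IndexError [0, 1, 2]
```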

recursion vs iteration time complexity

Could anyone explain exactly what's happening under the hood to make the recursive approach in the following problem much faster and efficient in terms of time complexity?
The problem: Write a program that would take an array of integers as input and return the largest three numbers sorted in an array, without sorting the original (input) array.
For example:
Input: [22, 5, 3, 1, 8, 2]
Output: [5, 8, 22]
Even though we can simply sort the original array and return the last three elements, that would take at least O(nlog(n)) time as the fastest sorting algorithm would do just that. So the challenge is to perform better and complete the task in O(n) time.
So I was able to come up with a recursive solution:
def findThreeLargestNumbers(array, largest=[]):
    if len(largest) == 3:
        return largest
    max = array[0]
    for i in array:
        if i > max:
            max = i
    array.remove(max)
    largest.insert(0, max)
    return findThreeLargestNumbers(array, largest)
In which I kept finding the largest number, removing it from the original array, appending it to my empty array, and recursively calling the function again until there are three elements in my array.
However, when I looked at the suggested iterative method, I composed this code:
def findThreeLargestNumbers(array):
    sortedLargest = [None, None, None]
    for num in array:
        check(num, sortedLargest)
    return sortedLargest
def check(num, sortedLargest):
    for i in reversed(range(len(sortedLargest))):
        if sortedLargest[i] is None:
            sortedLargest[i] = num
            return
        if num > sortedLargest[i]:
            shift(sortedLargest, i, num)
            return
def shift(array, idx, element):
    if idx == 0:
        array[0] = element
        return array
    array[0] = array[1]
    array[idx-1] = array[idx]
    array[idx] = element
    return array
Both codes passed successfully all the tests and I was convinced that the iterative approach is faster (even though not as clean..). However, I imported the time module and put the codes to the test by providing an array of one million random integers and calculating how long each solution would take to return back the sorted array of the largest three numbers.
The recursive approach was much faster (about 9 times faster) than the iterative approach!
Why is that? The recursive approach traverses the huge array three times and, on top of that, removes an element each time (which takes O(n) time, as the elements after it need to be shifted in memory), whereas the iterative approach traverses the input array only once. It does perform some operations at every iteration, but only on a negligible array of size 3, which should take hardly any time at all!
I really want to be able to judge and pick the most efficient algorithm for any given problem so any explanation would tremendously help.
Advice for optimization.
Avoid function calls. Avoid creating temporary garbage. Avoid extra comparisons. Have logic that looks at elements as little as possible. Walk through how your code works by hand and look at how many steps it takes.
Your recursive code makes only 3 function calls and, as pointed out elsewhere, does an average of 1.5 comparisons per element per call (1 while looking for the max, 0.5 while finding the element to remove).
Your iterative code makes lots of comparisons per element, calls extra functions, and makes calls to things like reversed(range(...)) that create and destroy temporary objects.
Now compare with this iterative solution:
def find_largest(array, limit=3):
    if len(array) <= limit:
        # Special logic not needed.
        return sorted(array)
    else:
        # Initialize the answer to values that will be replaced.
        min_val = min(array[0:limit])
        answer = [min_val for _ in range(limit)]
        # Now scan for the largest values.
        for i in array:
            if answer[0] < i:
                # Sift elements down until we find the right spot.
                j = 1
                while j < limit and answer[j] < i:
                    answer[j-1] = answer[j]
                    j = j+1
                # Now insert.
                answer[j-1] = i
        return answer
There are no function calls. It is possible that you can make up to 6 comparisons per element (verify that answer[0] < i, verify that (j=1) < 3, verify that answer[1] < i, verify that (j=2) < 3, verify that answer[2] < i, then find that (j=3) < 3 is not true). You will hit that worst case if array is sorted. But most of the time you only do the first comparison then move to the next element. No muss, no fuss.
How does it benchmark?
Note that if you wanted the smallest 100 elements, then you'd find it worthwhile to use a smarter data structure such as a heap to avoid the bubble sort.
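As a rough sketch of the heap idea, using Python's standard heapq module (the function name largest_k_sorted is made up for this example): keep a min-heap of the k largest values seen so far, so each element costs at most O(log k) instead of a linear sift.

```python
import heapq


def largest_k_sorted(values, k=3):
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k: swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap)


print(largest_k_sorted([22, 5, 3, 1, 8, 2]))  # [5, 8, 22]
```

heapq.nlargest(k, values) does the same job in one call and returns the values in descending order.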
I am not really comfortable with Python, but I have a different approach to the problem, for what it's worth.
As far as I can see, all the solutions posted are O(NM), where N is the length of the array and M is the length of the array of largest elements.
In your specific situation, where N >> M, you could say it's O(N), but the larger M gets, the closer it comes to O(NM).
I agree with @zvone that you seem to have more steps in the iterative solution, which sounds like a valid explanation for the different computing speeds.
Back to my proposal, which implements binary search, O(N*logM), with recursion:
import math
def binarySearch(arr, target, origin = 0):
    """
    Recursive binary search
    Args:
        arr (list): List of numbers to search in
        target (int): Number to search with
    Returns:
        int: index + 1 from immediate lower element to target in arr or -1 if already present or lower than the lowest in arr
    """
    half = math.floor((len(arr) - 1) / 2)
    if target > arr[-1]:
        return origin + len(arr)
    if len(arr) == 1 or target < arr[0]:
        return -1
    if arr[half] < target and arr[half+1] > target:
        return origin + half + 1
    if arr[half] == target or arr[half+1] == target:
        return -1
    if arr[half] < target:
        return binarySearch(arr[half:], target, origin + half)
    if arr[half] > target:
        return binarySearch(arr[:half + 1], target, origin)
def findLargestNumbers(array, limit = 3, result = []):
    """
    Recursive linear search of the largest values in an array
    Args:
        array (list): Array of numbers to search in
        limit (int): Length of array returned. Default: 3
    Returns:
        list: Array of max values with length as limit
    """
    if len(result) == 0:
        result = [float('-inf')] * limit
    if len(array) < 1:
        return result
    val = array[-1]
    foundIndex = binarySearch(result, val)
    if foundIndex != -1:
        result.insert(foundIndex, val)
        return findLargestNumbers(array[:-1],limit, result[1:])
    return findLargestNumbers(array[:-1], limit,result)
It is quite flexible and might be inspiration for a more elaborated answer.
The recursive solution
The recursive function goes through the list 3 times to find the largest number and removes the largest number from the list 3 times.
for i in array:
    if i > max:
        ...
and
array.remove(max)
So, you have 3×N comparisons, plus 3x removal. I guess the removal is optimized in C, but there is again about 3×(N/2) comparisons to find the item to be removed.
So, a total of approximately 4.5 × N comparisons.
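That estimate can be sanity-checked with a small counting sketch. It mirrors the question's three passes, assuming the removal costs one equality comparison per element scanned (count_recursive_comparisons is a hypothetical helper written for this illustration):

```python
def count_recursive_comparisons(values):
    array = list(values)  # work on a copy so the input is untouched
    comparisons = 0
    for _ in range(3):
        largest = array[0]
        for item in array:
            comparisons += 1  # the `i > max` test in the question
            if item > largest:
                largest = item
        position = array.index(largest)
        comparisons += position + 1  # remove() scans up to and including the match
        del array[position]
    return comparisons


print(count_recursive_comparisons(range(10)))         # 54 (largest is always last)
print(count_recursive_comparisons(range(9, -1, -1)))  # 30 (largest is always first)
```

For N elements the total lies between 3×N and 6×N − 6, which is consistent with roughly 4.5×N when the largest values sit, on average, in the middle of the list.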
The other solution
The other solution goes through the list only once, but each time it compares to the three elements in sortedLargest:
for i in reversed(range(len(sortedLargest))):
    ...
and almost each time it sorts the sortedLargest with these three assignments:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
So, you are N times:
calling check
creating and reversing a range(3)
accessing sortedLargest[i]
comparing num > sortedLargest[i]
calling shift
comparing idx == 0
and about 2×N/3 times doing:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
and N/3 times array[0] = element
It is difficult to count, but that is much more than 4.5×N comparisons.

How to update array Index in loop (IndexError: list index out of range)

I should not use advanced built-in functions, as this is a logic test for an interview.
I am trying to remove all values that appear more than once in an array.
testcase:
a=[1,1,2,3,2,4,5,6,7]
code:
def dup(a):
    i=0
    arraySize = len(a)
    print(arraySize)
    while i < arraySize:
        #print("1 = ",arraySize)
        k=i+1
        for k in range(k,arraySize):
            if a[i] == a[k]:
                a.remove(a[k])
                arraySize -= 1
                #print("2 = ",arraySize)
        i += 1
    print(a)
The result should be: 1,2,3,4,5,6,7
But I keep getting an index out of range error. I know it is because the list changes inside the loop, so the size the "while" loop started with no longer matches the new length.
The question is: is there any way to sync the new length (of the list modified inside the loop) with the parent loop (the index in the "while" loop)?
The only thing I can think of is to use a function inside the loop.
Any hints?
Re-Calculating Array Size Per Iteration
It looks like we have a couple issues here. The first issue is that you can't update the "stop" value in your inner loop (the range function). So first off, let's remove that and use another while loop to give us the ability to re-calculate our array size every iteration.
Re-Checking Values Shifted Into Removed List Spot
Next, after you fix that, you will run into a larger issue. When you remove an element from a list, every element after it shifts one position to the left to fill the gap, and you are not re-checking the value that moved into the removed element's spot. To resolve this, we need to decrement i whenever we remove an element, which makes sure we check the value that now occupies the removed element's spot.
remove vs del
You should use del over remove in this case. remove iterates over the list and removes the first occurrence of the value, but we already know the exact index of the value we want to remove. remove might work, but its usage here overcomplicates things a bit.
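The difference is easy to see on a tiny list (the values are arbitrary): remove searches for a value and deletes its first occurrence, which is not necessarily the position you had in mind, whereas del deletes exactly the index you give it.

```python
by_value = [1, 2, 3, 2]
by_value.remove(by_value[3])  # asks for the last 2, but removes the first 2

by_index = [1, 2, 3, 2]
del by_index[3]               # removes exactly the element at index 3

print(by_value)  # [1, 3, 2]
print(by_index)  # [1, 2, 3]
```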
Functional Code with Minimal Changeset
def dup(a):
    i = 0
    arraySize = len(a)
    print(arraySize)
    while i < arraySize:
        k = i + 1
        while k < arraySize: # CHANGE: use a while loop to have greater control over the array size.
            if a[i] == a[k]:
                print("Duplicate found at indexes %d and %d." % (i, k))
                del a[i] # CHANGE: used del instead of remove.
                i -= 1 # CHANGE: you need to recheck the new value that got placed into the old removed spot.
                arraySize -= 1
                break
            k += 1
        i += 1
    return a
Now, I'd like to note that we have some readability and maintainability issues with the code above. Iterating through an array and manipulating the iterator in the way we are doing is a bit messy and could be prone to simple mistakes. Below are a couple ways I'd implement this problem in a more readable and maintainable manner.
Simple Readable Alternative
def remove_duplicates(old_numbers):
    """ Simple/naive implementation to remove duplicate numbers from a list of numbers. """
    new_numbers = []
    for old_number in old_numbers:
        is_duplicate = False
        for new_number in new_numbers:
            if old_number == new_number:
                is_duplicate = True
        if is_duplicate == False:
            new_numbers.append(old_number)
    return new_numbers
Optimized Low Level Alternative
def remove_duplicates(numbers):
    """ Removes all duplicates in the list of numbers in place. """
    for i in range(len(numbers) - 1, -1, -1):
        for k in range(i, -1, -1):
            if i != k and numbers[i] == numbers[k]:
                print("Duplicate found. Removing number at index: %d" % i)
                del numbers[i]
                break
    return numbers
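If built-in containers are allowed (the question rules out "advanced" functions, so this may not fit the interview constraint), a set of already-seen values avoids the inner loop entirely. A sketch, assuming the values are hashable; remove_duplicates_seen is a name invented for this example:

```python
def remove_duplicates_seen(numbers):
    seen = set()   # values already copied into the result
    result = []
    for number in numbers:
        if number not in seen:
            seen.add(number)
            result.append(number)
    return result


print(remove_duplicates_seen([1, 1, 2, 3, 2, 4, 5, 6, 7]))  # [1, 2, 3, 4, 5, 6, 7]
```

Unlike the in-place versions above, this returns a new list and keeps the first occurrence of each value.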
You could copy contents in another list and remove duplicates from that and return the list. For example:
duplicate = a.copy()
f = 0
for j in range(len(a)):
    for i in range(len(duplicate)):
        if i < len(duplicate):
            if a[j] == duplicate[i]:
                f = f+1
            if f > 1:
                f = 0
                duplicate.remove(duplicate[i])
    f=0
print(duplicate)

Analyzing the complexity of this sort algorithm

I know merge sort is the best way to sort a list of arbitrary length, but I am wondering how to optimize my current method.
def sortList(l):
    '''
    Recursively sorts an arbitrary list, l, to increasing order.
    '''
    #base case.
    if len(l) == 0 or len(l) == 1:
        return l
    oldNum = l[0]
    newL = sortList(l[1:]) #recursive call.
    #if oldNum is the smallest number, add it to the beginning.
    if oldNum <= newL[0]:
        return [oldNum] + newL
    #find where oldNum goes.
    for n in xrange(len(newL)):
        if oldNum >= newL[n]:
            try:
                if oldNum <= newL[n+1]:
                    return newL[:n+1] + [oldNum] + newL[n+1:]
            #if index n+1 is non-existant, oldNum must be the largest number.
            except IndexError:
                return newL + [oldNum]
What is the complexity of this function? I was thinking O(n^2) but I wasn't sure. Also, is there anyway to further optimize this procedure? (besides ditching it and going for merge sort!).
There's a few places I'd optimize your code.
You do a lot of list copies: each time you slice, you create a new copy of the list. That can be avoided by adding an index to the function declaration that indicates where in the array to start sorting from.
You should follow PEP 8 for naming: sort_list rather than sortList.
The code that does the insertion is a bit weird; intentionally raising an out-of-bounds index exception isn't normal programming practice. Instead, just percolate the value up the array until it's in the right place.
Applying these changes gives this code:
def sort_list(l, i=0):
    if i == len(l): return
    sort_list(l, i+1)
    for j in xrange(i+1, len(l)):
        if l[j-1] <= l[j]: return
        l[j-1], l[j] = l[j], l[j-1]
This now sorts the array in-place, so there's no return value.
Here's some simple tests:
cases = [
    [1, 2, 0, 3, 4, 5],
    [0, 1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1, 1]
]
for c in cases:
    got = c[:]
    sort_list(got)
    if sorted(c) != got:
        print "sort_list(%s) = %s, want %s" % (c, got, sorted(c))
The time complexity is, as you suggest, O(n^2) where n is the length of the list. My version uses O(n) additional memory, whereas yours, because of the way the list gets copied at each stage, uses O(n^2).
One more step, which further improves the memory usage is to eliminate the recursion. Here's a version that does that:
def sort_list(l):
    for i in xrange(len(l)-2, -1, -1):
        for j in xrange(i+1, len(l)):
            if l[j-1] <= l[j]: break
            l[j-1], l[j] = l[j], l[j-1]
This works just the same as the recursive version, but does it iteratively; first sorting the last two elements in the array, then the last three, then the last four, and so on until the whole array is sorted.
This still has runtime complexity O(n^2), but now uses O(1) additional memory. Also, avoiding recursion means you can sort longer lists without hitting the notoriously low recursion limit in Python. And another benefit is that this code is now O(n) in the best case (when the array is already sorted).
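The best-case claim can be checked by counting comparisons. The sketch below is a Python 3 adaptation of the iterative version (range instead of xrange) with a counter added purely for illustration; sort_list_counting is not part of the original answer.

```python
def sort_list_counting(l):
    comparisons = 0
    for i in range(len(l) - 2, -1, -1):
        for j in range(i + 1, len(l)):
            comparisons += 1
            if l[j-1] <= l[j]:
                break  # l[i:] is sorted again, stop percolating
            l[j-1], l[j] = l[j], l[j-1]
    return comparisons


already_sorted = [0, 1, 2, 3, 4, 5]
reversed_input = [5, 4, 3, 2, 1, 0]
print(sort_list_counting(already_sorted))  # 5, i.e. n - 1
print(sort_list_counting(reversed_input))  # 15, i.e. n(n - 1)/2
print(reversed_input)                      # [0, 1, 2, 3, 4, 5]
```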
A young Gauss came up with a formula that seems appropriate here. The story goes that in grade school his teacher was very tired and, to keep the class busy for a while, told them to add up all the numbers from zero to one hundred. Young Gauss came back with this:
0 + 1 + 2 + ... + n = n(n + 1) / 2
This is applicable here because your run time is going to be proportional to the sum of all the numbers up to the length of your list: in the worst case your function will be sorting a list that is already sorted in descending order, and will go through the entire length of newL each time to find that the next element belongs at the end of the list.

Simple selection sort with repeated elements?

Given a list x, I want to sort it with selection sort, and then count the number of swaps made within the sort. So I came out with something like this:
count=0
a=0
n=len(x)
while (n-a)>0:
    #please recommend a better way to swap.
    i = (min(x[a:n]))
    x[i], x[a] = x[a], x[i]
    a += 1
    #the count must still be there
    count+=1
print (x)
Could you help me to find a way to manage this better? It doesn't work that well.
The problem is NOT about repeated elements. Your code doesn't work for lists with all elements distinct, either. Try x = [2,6,4,5].
i = (min(x[a:n]))
min() here gets the value of the minimum element in the slice, and then you use it as an index, that doesn't make sense.
You are confusing the value of an element, with its location. You must use the index to identify the location.
seq = [2,1,0,0]
beg = 0
n = len(seq)
while (n - beg) > 0:
    jdx = seq[beg:n].index((min(seq[beg:n]))) # use the remaining unsorted right
    seq[jdx + beg], seq[beg] = seq[beg], seq[jdx + beg] # swap the minimum with the first unsorted element.
    beg += 1
    print(seq)
print('-->', seq)
As the sorting progresses, the left part of the list [0:beg] is sorted, and the right part [beg:] is still being sorted, until completion.
jdx is the location (the index) of the minimum of the remainder of the list (finding the min must happen on the unsorted right part of the list --> [beg:])
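Since the question also asks for the number of swaps, here is one way to count them without slicing. This is a sketch under the assumption that only exchanges that actually move an element should be counted (the question's loop increments count on every pass); selection_sort_count_swaps is an illustrative name.

```python
def selection_sort_count_swaps(seq):
    swaps = 0
    n = len(seq)
    for beg in range(n):
        # Find the index of the minimum of the unsorted part seq[beg:].
        min_idx = beg
        for j in range(beg + 1, n):
            if seq[j] < seq[min_idx]:
                min_idx = j
        if min_idx != beg:
            seq[beg], seq[min_idx] = seq[min_idx], seq[beg]
            swaps += 1
    return swaps


seq = [2, 1, 0, 0]
print(selection_sort_count_swaps(seq), seq)  # 3 [0, 0, 1, 2]
```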
