Making the complexity smaller (better)


I have an algorithm that counts the good pairs in a list of numbers. A pair of indices (i, j) is good if i < j and arr[i] < arr[j]. It currently has O(n^2) complexity, but I want to make it O(n log n) using divide and conquer. How can I go about doing that?
Here's the algorithm:
def goodPairs(nums):
    count = 0
    for i in range(0,len(nums)):
        for j in range(i+1,len(nums)):
            if i < j and nums[i] < nums[j]:
                count += 1
            j += 1
        j += 1
    return count
Here's my attempt at making it but it just returns 0:
def goodPairs(arr):
    count = 0
    if len(arr) > 1:
        # Finding the mid of the array
        mid = len(arr)//2
        # Dividing the array elements
        left_side = arr[:mid]
        # into 2 halves
        right_side = arr[mid:]
        # Sorting the first half
        goodPairs(left_side)
        # Sorting the second half
        goodPairs(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count

The previously accepted answer by Fire Assassin doesn't really answer the question, which asks for better complexity. It's still quadratic, and about as fast as a much simpler quadratic solution. Benchmark with 2000 shuffled ints:
387.5 ms original
108.3 ms pythonic
104.6 ms divide_and_conquer_quadratic
4.1 ms divide_and_conquer_nlogn
4.6 ms divide_and_conquer_nlogn_2
Code:
def original(nums):
    count = 0
    for i in range(0,len(nums)):
        for j in range(i+1,len(nums)):
            if i < j and nums[i] < nums[j]:
                count += 1
            j += 1
        j += 1
    return count

def pythonic(nums):
    count = 0
    for i, a in enumerate(nums, 1):
        for b in nums[i:]:
            if a < b:
                count += 1
    return count

def divide_and_conquer_quadratic(arr):
    count = 0
    left_count = 0
    right_count = 0
    if len(arr) > 1:
        mid = len(arr) // 2
        left_side = arr[:mid]
        right_side = arr[mid:]
        left_count = divide_and_conquer_quadratic(left_side)
        right_count = divide_and_conquer_quadratic(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count + left_count + right_count

def divide_and_conquer_nlogn(arr):
    mid = len(arr) // 2
    if not mid:
        return 0
    left = arr[:mid]
    right = arr[mid:]
    count = divide_and_conquer_nlogn(left)
    count += divide_and_conquer_nlogn(right)
    i = 0
    for r in right:
        while i < mid and left[i] < r:
            i += 1
        count += i
    arr[:] = left + right
    arr.sort()  # linear, as Timsort takes advantage of the two sorted runs
    return count

def divide_and_conquer_nlogn_2(arr):
    mid = len(arr) // 2
    if not mid:
        return 0
    left = arr[:mid]
    right = arr[mid:]
    count = divide_and_conquer_nlogn_2(left)
    count += divide_and_conquer_nlogn_2(right)
    i = 0
    arr.clear()
    append = arr.append
    for r in right:
        while i < mid and left[i] < r:
            append(left[i])
            i += 1
        append(r)
        count += i
    arr += left[i:]
    return count

from timeit import timeit
from random import shuffle

arr = list(range(2000))
shuffle(arr)

funcs = [
    original,
    pythonic,
    divide_and_conquer_quadratic,
    divide_and_conquer_nlogn,
    divide_and_conquer_nlogn_2,
]

for func in funcs:
    print(func(arr[:]))

for _ in range(3):
    print()
    for func in funcs:
        arr2 = arr[:]
        t = timeit(lambda: func(arr2), number=1)
        print('%5.1f ms ' % (t * 1e3), func.__name__)

One of the most well-known divide-and-conquer algorithms is merge sort, and merge sort is actually a really good foundation for this algorithm.
The idea is that when comparing two numbers from two different 'partitions', you already have a lot of information about the remaining part of these partitions, as they're sorted in every iteration.
Let's take an example!
Consider the following partitions, which have already been sorted individually and whose internal "good pairs" have already been counted.
Partition x: [1, 3, 6, 9].
Partition y: [4, 5, 7, 8].
It is important to note that the numbers in partition x are located further to the left in the original list than those in partition y. In particular, the index i of every element in x is smaller than the index j of every element in y.
We will start off by comparing 1 and 4. Obviously 1 is smaller than 4. But since 4 is the smallest element in partition y, 1 must also be smaller than the rest of the elements in y. Consequently, we can conclude that there are 4 additional good pairs, since the index of 1 is also smaller than the index of every element of y.
The exact same thing happens with 3, and we can add 4 new good pairs to the sum.
For 6 we will conclude that there are two new good pairs. The comparisons between 6 and 4 and between 6 and 5 did not yield a good pair, so only 7 and 8 remain.
Do you see how these additional good pairs are counted? Basically, if the element from x is less than the element from y, add the number of elements remaining in y (including the current one) to the sum. Rinse and repeat.
Since merge sort is an O(n log n) algorithm, and the additional work per comparison in this algorithm is constant, we can conclude that this algorithm is also an O(n log n) algorithm.
I will leave the actual programming as an exercise for you.
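For readers who want to check their own attempt, here is one possible sketch of the merge-based counting described above. The names count_good_pairs and sort_and_count are made up for this illustration, and it returns a new sorted list at each level instead of sorting in place, which is just one of several reasonable choices.

```python
def count_good_pairs(nums):
    """Count pairs i < j with nums[i] < nums[j] (illustrative sketch)."""

    def sort_and_count(values):
        # Zero or one element: already sorted, no pairs.
        if len(values) <= 1:
            return values, 0
        mid = len(values) // 2
        left, left_count = sort_and_count(values[:mid])
        right, right_count = sort_and_count(values[mid:])

        merged = []
        count = left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                # left[i] is smaller than right[j] and every element after it.
                count += len(right) - j
                merged.append(left[i])
                i += 1
            else:
                # right[j] <= left[i]: no pair here, move on in the right half.
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    # Work on a copy so the caller's list is left untouched.
    return sort_and_count(list(nums))[1]


print(count_good_pairs([1, 3, 6, 9, 4, 5, 7, 8]))  # 22
```

For the example partitions, the two halves contribute 6 pairs each and the merge step adds 4 + 4 + 2 + 0 = 10, which gives 22.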

@niklasaa has added an explanation for the merge sort analogy, but your implementation still has issues.
You are partitioning the array and calculating the result for either half, but:
1. You haven't actually sorted either half, so a two-pointer approach over their elements won't be correct.
2. You haven't used their results in the final computation. That's why you're getting an incorrect answer.
For point #1, you should look at merge sort, especially the merge() function. That logic is what will give you the correct pair count without O(N^2) iteration.
For point #2, store the result for either half first:
# Sorting the first half
leftCount = goodPairs(left_side)
# Sorting the second half
rightCount = goodPairs(right_side)
While returning the final count, add these two results as well.
return count + leftCount + rightCount

Like @Abhinav Mathur stated, you have most of the code down. Your problem is with these lines:
# Sorting the first half
goodPairs(left_side)
# Sorting the second half
goodPairs(right_side)
You want to store these results in variables that are declared before the if statement. Here's an updated version of your code:
def goodPairs(arr):
    count = 0
    left_count = 0
    right_count = 0
    if len(arr) > 1:
        mid = len(arr) // 2
        left_side = arr[:mid]
        right_side = arr[mid:]
        left_count = goodPairs(left_side)
        right_count = goodPairs(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count + left_count + right_count
Recursion can be difficult at times. Look into merge sort and quick sort to get a better idea of how divide-and-conquer algorithms work.

Related

Minimum count to sort an array in Python by sending the element to the end

Here is the explanation of what I'm trying to say:-
Input:- 5 1 3 2 7
Output:- 3
Explanation:
In first move, we move 3 to the end. Our list becomes 5,1,2,7,3
In second move, we move 5 to the end. Our list becomes 1,2,7,3,5
In third move, we move 7 to the end. Our final list = 1,2,3,5,7
So, total moves are:- 3.
Here is what I tried to do, but failed.
a = [int(i) for i in input().split()]
count = 0
n = 0
while (n < len(a) - 1):
    for i in range(0,n+1):
        while (a[i] > a[i + 1]):
            temp = a[i]
            a.pop(i)
            a.append(temp)
            count += 1
    n += 1
print(count, end='')
I'd appreciate help solving this problem.

jdehesa's answer is basically right, but it is not optimal when several elements have the same value. Maybe a more complex solution is needed?
def min_moves(a):
    c = 0
    while(1):
        tmp = None
        for i in range(0, len(a)):
            if a[i] != min(a[i:]) and (tmp is None or a[i] < a[tmp]):
                tmp = i
        if tmp is None:
            return c
        else:
            a.append(a.pop(tmp))
            c += 1
Edit:
Or, if you don't need the sorted list itself, there's a much easier solution: just count the items that are out of order, for the reason given in jdehesa's answer.
def min_moves(a):
    c = 0
    for i in range(0, len(a)):
        if a[i] != min(a[i:]):
            c += 1
    return c
Edit 2:
Or, if you like jdehesa's answer more, a small fix is to reduce lst to a set, so that each value gets its smallest index:
sorted_index = {elem: i for i, elem in enumerate(sorted(set(lst)))}
I cannot comment yet.

I don't know if it can be done better, but I think the following algorithm gives the right answer:
def num_move_end_sort(lst):
    # dict that maps each list element to its index in the sorted list
    sorted_index = {elem: i for i, elem in enumerate(sorted(lst))}
    moves = 0
    for idx, elem in enumerate(lst):
        if idx != sorted_index[elem] + moves:
            moves += 1
    return moves

print(num_move_end_sort([5, 1, 3, 2, 7]))
# 3
The idea is as follows. Each element of the list would have to be moved to the end at most once (it should be easy to see that a solution that moves the same element to the end more than once can be simplified). So each element in the list may or may not need to be moved once to the end. If an element does not need to be moved, it is because it ended up in the right position after all the moves. So, if an element is currently at position i and should end up in position j, then the element will not need to be moved if the number of previous elements that need to be moved, n, satisfies i == j + n (because, after those n elements are moved away, the element will have shifted left to position j).
So in order to compute that, I sorted the list and took the indices of each element in the sorted list. Then you just count the number of elements that are not in the right position.
Note this algorithm does not tell you the actual sequence of steps you would need to take (the order in which the elements would have to be moved), only the count. The complexity is O(n·log(n)) (due to the sorting).

I think you can simplify your problem.
Counting the elements that need to be pushed to the end is equivalent to counting the elements that are not already in sorted order.
l = [5, 1, 3, 2, 7]
sorted_l = sorted(l)
current_element = sorted_l[0]
current_index = 0
ans = 0
for element in l:
    if current_element == element:
        current_index += 1
        if current_index < len(l):
            current_element = sorted_l[current_index]
    else:
        ans += 1
print(ans)
Here the answer is 3.

PassingCars in Codility using Python

I have a coding challenge next week as the first round interview. The HR said they will use Codility as the coding challenge platform. I have been practicing using the Codility Lessons.
My issue is that I often get a very high score on Correctness, but my Performance score, which measures time complexity, is horrible (I often get 0%).
Here's the question:
https://app.codility.com/programmers/lessons/5-prefix_sums/passing_cars/
My code is:
def solution(A):
    N = len(A)
    my_list = []
    count = 0
    for i in range(N):
        if A[i] == 1:
            continue
        else:
            my_list = A[i + 1:]
            count = count + sum(my_list)
    print(count)
    return count
It is supposed to be O(N) but mine is O(N**2).
How can someone approach this question to solve it under the O(N) time complexity?
In general, when you look at an algorithm question, how do you come up with an approach?

You should not sum the entire array each time you find a zero. That makes it O(n^2). Instead note that every zero found will give a +1 for each following one:
def solution(A):
    zeros = 0
    passing = 0
    for i in A:
        if i == 0:
            zeros += 1
        else:
            passing += zeros
    return passing
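If you want to convince yourself that the single pass counts the same pairs as the quadratic definition, a small randomized cross-check is one way to do it. This is only a sketch: the function names passing_cars_linear and passing_cars_quadratic are invented for this example, and the 1000000000 limit from the task is deliberately left out.

```python
import random


def passing_cars_linear(A):
    """Single pass: every 1 pairs with all zeros seen before it."""
    zeros = 0
    passing = 0
    for car in A:
        if car == 0:
            zeros += 1
        else:
            passing += zeros
    return passing


def passing_cars_quadratic(A):
    """Direct definition: count pairs (i, j), i < j, with A[i] == 0 and A[j] == 1."""
    return sum(
        1
        for i in range(len(A))
        for j in range(i + 1, len(A))
        if A[i] == 0 and A[j] == 1
    )


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(200):
    cars = [random.randint(0, 1) for _ in range(random.randint(0, 30))]
    assert passing_cars_linear(cars) == passing_cars_quadratic(cars)

print(passing_cars_linear([0, 1, 0, 1, 1]))  # 5
```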

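You may check all Codility solutions as well as the PassingCars example.
Don't forget the 1000000000 limit.
def solution(a):
    pc = 0  # passing pairs counted so far
    fz = 0  # zeros found so far
    for e in a:
        if e == 0:
            fz += 1
        else:
            pc += fz
            # check right after each update so the last addition is covered too
            if pc > 1000000000:
                return -1
    return pc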

The above answer is correct, but it does not check for the case where passing exceeds 1000000000.
Also, I found a simpler way to count the pairs of passing cars. First count all the ones in the array. Then, inside the loop, whenever you find a zero, it pairs with all the ones still ahead of it, so add that number to the count. Whenever you find a one, remove it from the ones counter, since no later zero can pair with it.
It takes O(N) time, since A.count(1) and the loop each pass over the array once.
def solution(A):
    ones = A.count(1)
    c = 0
    for i in range(0, len(A)):
        if A[i] == 0:
            c += ones
        else:
            ones -= 1
        if c > 1000000000:
            return -1
    return c

In a first loop, find the indices of the zeros in the list and the number of ones after the first zero. In a second loop, visit each subsequent zero and reduce nOnesInFront by the number of ones between the current and the previous zero (the index difference minus one).
def solution(A):
    count = 0
    zeroIndices = []
    nOnesInFront = 0
    foundZero = False
    for i in range(len(A)):
        if A[i] == 0:
            foundZero = True
            zeroIndices.append(i)
        elif foundZero:
            nOnesInFront += 1
    if nOnesInFront == 0:
        return 0  # no ones in front of a zero
    if not zeroIndices:
        return 0  # no zeros
    iPrev = zeroIndices[0]
    count = nOnesInFront
    for i in zeroIndices[1:]:
        # decrease the number of ones by the ones between the current and previous zero
        nOnesInFront -= (i - iPrev) - 1
        iPrev = i
        count += nOnesInFront
    if count > 1000000000:
        return -1
    else:
        return count
Here are some test cases to verify the solution:
print(solution([0, 1, 0, 1, 1])) # 5
print(solution([0, 1, 1, 0, 1])) # 4
print(solution([0])) # 0
print(solution([1])) # 0
print(solution([1, 0])) # 0
print(solution([0, 1])) # 1
print(solution([1, 0, 0])) # 0
print(solution([1, 0, 1])) # 1
print(solution([0, 0, 1])) # 2

Why can't I implement merge sort this way

I understand that mergesort works by divide and conquer: you keep halving until you reach a point where you can sort in constant time, or the list is just one element, and then you merge the lists.
def mergesort(l):
    if len(l)<=1:
        return l
    l1 = l[0:len(l)//2+1]
    l2 = l[len(l)//2:]
    l1 = mergesort(l1)
    l2 = mergesort(l2)
    return merge(l1,l2)
I have a working merge implementation, and I checked that it works fine, but the merge sort implementation does not work: it just returns half of the elements of the list.
I see on the internet that mergesort is implemented using l & r and m = (l + r)/2. What is wrong with my implementation? I am recursively subdividing the list and merging too.

The problem is the +1 in your code: it makes the two halves overlap at the middle element. Split like this instead:
l1 = l[0:len(l)//2]
l2 = l[len(l)//2:]
Replace your two slicing lines with these and you'll be fine.
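To see the corrected split in context, here is a minimal sketch of the whole function. The question does not show merge(), so the merge() below is a hypothetical stand-in written for this example, not the asker's implementation. Note also that with the + 1, a two-element list l gives l[0:len(l)//2+1] == l, so that half never gets shorter.

```python
def merge(a, b):
    """Hypothetical stand-in for the asker's merge(): combine two sorted lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One of the two lists is exhausted; append what is left of the other.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


def mergesort(l):
    if len(l) <= 1:
        return l
    mid = len(l) // 2
    # Non-overlapping halves: [0:mid] and [mid:]
    return merge(mergesort(l[:mid]), mergesort(l[mid:]))


print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```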

The code you have listed doesn't appear to do any sorting. I can't know for certain because you haven't listed the merge() function's code, but the only thing that the above function will do is recursively divide the list into halves. Here is a working implementation of a merge sort:
def mergeSort(L):
    # lists with only one value are already sorted
    if len(L) > 1:
        # determine halves of list
        mid = len(L) // 2
        left = L[:mid]
        right = L[mid:]
        # recursive function calls
        mergeSort(left)
        mergeSort(right)
        # keeps track of current index in left half
        i = 0
        # keeps track of current index in right half
        j = 0
        # keeps track of current index in new merged list
        k = 0
        while i < len(left) and j < len(right):
            # lower values are written to the merged list first
            if left[i] < right[j]:
                L[k] = left[i]
                i += 1
            else:
                L[k] = right[j]
                j += 1
            k += 1
        # catch remaining values in left and right
        while i < len(left):
            L[k] = left[i]
            i += 1
            k += 1
        while j < len(right):
            L[k] = right[j]
            j += 1
            k += 1
    return L
Your function makes no comparisons of values in the original list. Also, when you are splitting the list into halves in:
l1 = l[0:len(l)//2 + 1]
the '+ 1' is unnecessary (and can actually cause incorrect solutions). You can simply use:
l1 = l[:len(l)//2]
If the length is even (e.g. 12), it will divide the list into [0:6] and [6:12]. If it is odd, it will still divide correctly (e.g. length 13 gives [0:6] and [6:13]). I hope this helps!

What is wrong with this approach for insertion sort?

Could anyone explain to me why the following method for insertion sort is wrong please?
def insertion_sort(m):
    n = 1
    while n < len(m)-1:
        current_value = m[n]
        if m[n] < m[n-1]:
            m[n] = m[n-1]
            m[n-1] = current_value
        n = n + 1
    return m
# how my code turned out:
# m = [7,4,5] returned [4,7,5] instead of [4,5,7]

See explanations in the code comments:
def insertion_sort(m):
    n = 1
    while n < len(m): # <-- end at the end of the list
        current_value = m[n]
        if m[n] < m[n-1]:
            m[n] = m[n-1]
            m[n-1] = current_value
        n = n + 1
    return m

As alfasin mentioned, the problem is with the while loop condition!
You can actually use a for loop for this scenario. Take a look at this.
And to be more Pythonic, you can swap two numbers with a, b = b, a, as mentioned in the comments by @TigerhawkT3!
def insertion_sort(m):
    for n in range(1, len(m)):  # iterate from 1 to len(m)-1; the step is 1 by default
        if m[n] < m[n-1]: m[n], m[n-1] = m[n-1], m[n]
    return m
print(insertion_sort([7,4,5]))
Output:
[4, 5, 7]
Moreover, the one you've tried isn't actually the insertion sort algorithm. You'll have to rework it to handle lists with more than 3 elements!

Explanation in the comments
def insertion_sort(array):
    # For each number in the list (not starting
    # with the first element because the first
    # element has no prior, i.e., no array[j-1])
    for i in range(1, len(array)):
        # Set a temp variable j (totally not necessary
        # but is helpful for the sake of a visual)
        j = i
        # While j is greater than 0 (position 0 is the
        # very beginning of the array) AND the current
        # element array[j] is less than the prior element
        # array[j-1]
        while j>0 and array[j] < array[j-1]:
            # Set the current element equal to the prior
            # element, and set the prior element equal to
            # the current element (i.e., switch their positions)
            array[j], array[j-1] = array[j-1], array[j]
            # Decrement j (because we are moving each element
            # towards the beginning of the list until it is
            # in a sorted position)
            j-=1
    return array
array = [1, 5, 8, 3, 9, 2]
First for-loop iteration: [1,5,8,3,9,2] (5 is already in a sorted position)
Second for-loop iteration: [1,5,8,3,9,2] (8 is already in a sorted position)
Third: [1,3,5,8,9,2] (move 3 back until it's sorted)
Fourth: [1,3,5,8,9,2] (9 is already in a sorted position)
Fifth: [1,2,3,5,8,9] (move 2 back until it's sorted)
Hope this slight illustration helps.
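If you would rather generate such a trace than write it by hand, one option is to record a snapshot after every pass of the outer loop. The sketch below assumes a helper named insertion_sort_trace, which is made up for this example; it sorts a copy so the input list stays unchanged.

```python
def insertion_sort_trace(array):
    """Insertion sort that also returns the list state after each outer pass."""
    array = list(array)  # work on a copy
    snapshots = []
    for i in range(1, len(array)):
        j = i
        # Move array[j] towards the front until it is in a sorted position.
        while j > 0 and array[j] < array[j - 1]:
            array[j], array[j - 1] = array[j - 1], array[j]
            j -= 1
        snapshots.append(list(array))
    return array, snapshots


result, steps = insertion_sort_trace([1, 5, 8, 3, 9, 2])
for number, state in enumerate(steps, 1):
    print(number, state)
```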

Largest subset in an array such that the smallest and largest elements are less than K apart

Given an array, I want to find the largest subset of elements such that the smallest and largest elements of the subset are less than or equal to K apart. Specifically, I want the elements, not just the size. If there are multiple occurrences, any can be matched.
For example, in the array [14,15,17,20,23], if K was 3, the largest subset possible would be [14,15,17]. The same would go if 17 was replaced by 16. Also, multiple elements should be matched, such as [14,14,14,15,16,17,17]. The array is not necessarily sorted, but it is probably a good starting point to sort it. The elements are not necessarily integral and the subset not necessarily consecutive in the original array - I just want an occurrence of the largest possible subset.
To illustrate the desired result more clearly, a naïve approach would be to first sort the array, iterate over every element of the sorted array, and then create a new array containing the current element that is extended to contain every element after the current element <= K larger than it. (i.e. in the first above example, if the current element was 20, the array would be extended to [20,23] and then stop because the end of the array was reached. If the current element was 15, the array would be extended to [15,17] and then stop because 20 is more than 3 larger than 15.) This array would then be checked against a current maximum and, if it was larger, the current maximum would be replaced. The current maximum is then the largest subset. (This method is of complexity O(N^2), in the case that the largest subset is the array.)
I am aware of this naïve approach, and this question is asking for an optimised algorithm.
A solution in Python is preferable although I can run with a general algorithm.

This seems very similar to your "naïve" approach, but it's O(n) excluding the sort so I don't think you can improve on your approach much. The optimization is to use indices and only create a second array once the answer is known:
def largest_less_than_k_apart(a, k):
    a.sort()
    upper_index = lower_index = max_length = max_upper_index = max_lower_index = 0
    while upper_index < len(a):
        while a[lower_index] < a[upper_index] - k:
            lower_index += 1
        if upper_index - lower_index + 1 > max_length:
            max_length = upper_index - lower_index + 1
            max_upper_index, max_lower_index = upper_index, lower_index
        upper_index += 1
    return a[max_lower_index:max_upper_index + 1]

a = [14,15,17,20,23]
print(largest_less_than_k_apart(a, 3))
Output:
[14, 15, 17]
It does one pass through the sorted array, with the current index stored in upper_index and another index, lower_index, that lags behind as far as possible while still pointing to a value that is no more than K below the value of the current element. The function keeps track of when the two indices are as far apart as possible and uses those indices to slice the list and return the subset.
Duplicate elements are handled, because lower_index lags behind as far as possible (pointing to the earliest duplicate), whereas the difference of indices will be maximal when upper_index is pointing to the last duplicate of a given subset.
It's not valid to pass in a negative value for k.
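The same two-index idea can also be written without modifying the caller's list. The following is a sketch under that assumption: largest_window is a name invented for this example, and it uses sorted() to work on a copy and a half-open [lo, hi) window instead of inclusive indices.

```python
def largest_window(values, k):
    """Return a largest group of values whose min and max differ by at most k."""
    ordered = sorted(values)  # sorted copy; the input is not changed
    best_lo = best_hi = 0     # best window so far, as a half-open slice
    lo = 0
    for hi, value in enumerate(ordered):
        # Advance lo until the window [lo, hi] fits within k again.
        while value - ordered[lo] > k:
            lo += 1
        if hi + 1 - lo > best_hi - best_lo:
            best_lo, best_hi = lo, hi + 1
    return ordered[best_lo:best_hi]


print(largest_window([20, 14, 17, 15, 23], 3))         # [14, 15, 17]
print(largest_window([14, 14, 14, 15, 16, 17, 17], 3))  # [14, 14, 14, 15, 16, 17, 17]
```

Each index only ever moves forward, so the scan after sorting is linear.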

I assume that we cannot modify the array by sorting it and that we have to find the largest consecutive subset, so my solution (in Python 3.2) is:
arr = [14, 15, 17, 20, 23]
k = 3
f_start_index=0
f_end_index =0
length = len(arr)
for i in range(length):
    min_value = arr[i]
    max_value = arr[i]
    start_index = i
    end_index = i
    for j in range((i+1),length):
        if (min_value != arr[j] and max_value != arr[j]) :
            if (min_value > arr[j]) :
                min_value = arr[j]
            elif (max_value < arr[j]) :
                max_value = arr[j]
            if(max_value-min_value) > k :
                break
        end_index = j
    if (end_index-start_index) > (f_end_index-f_start_index):
        f_start_index = start_index
        f_end_index = end_index
    if(f_end_index-f_start_index>=(length-j+1)): # for optimization
        break
for i in range(f_start_index,f_end_index+1):
    print(arr[i],end=" ")
It is not the most efficient solution, but it will get the job done.
Tested against:
1. input: [14, 15, 17, 20, 23]
1. output: 14 15 17
2. input: [14,14,14,15,16,17,17]
2. output: 14 14 14 15 16 17 17
3. input: [23 ,20, 17 , 16 ,14]
3. output: 17 16 14
4. input: [-2,-1,0,1,2,4]
4. output: -2 -1 0 1
For input number 4 there are two possible answers:
-2 -1 0 1
-1 0 1 2
My solution takes the first one: when two subsets have the same length, it prints the one that occurs first when traversing the array from position 0 to length-1.
If we have to find the largest subset, which may or may not be consecutive in the array, the solution would be different.

Brute force approach:
arr = [14,14,14,15,16,17,17]
max_difference = 3
solution = []
for i, start in enumerate(arr):
    tmp = []
    largest = start
    smallest = start
    for j, end in enumerate(arr[i:]):
        if abs(end - largest) <= max_difference and abs(end - smallest) <= max_difference:
            tmp.append(end)
            if end > largest:
                largest = end
            if end < smallest:
                smallest = end
        else:
            break
    if len(tmp) > len(solution):
        solution = tmp
Try to optimize it! (Tip: the inner loop doesn't need to run as many times as it does here)

An inefficient algorithm (O(n^2)) for this would be very simple:
l = [14,15,17,20,23]
s = max((list(filter(lambda x: start<=x<=start+3, l)) for start in l), key=len)
print(s)

A speedy approach with complexity O(n*log(n)) for the sort and O(n) to search for the longest chain:
list_1 = [14, 15, 17, 20, 23]
k = 3
list_1.sort()
list_len = len(list_1)
min_idx = -1
max_idx = -1
idx1 = 0
idx2 = 0
while idx2 < list_len-1:
    idx2 += 1
    while list_1[idx2] - list_1[idx1] > k:
        idx1 += 1
    if idx2 - idx1 > max_idx - min_idx:
        min_idx, max_idx = idx1, idx2
print(list_1[min_idx:max_idx+1])
