I've been studying Python algorithms and would like to solve the following problem:
A positive integer array A and an integer K are given.
Find the largest even sum of the array A with K elements.
If not possible, return -1.
For example, if there is an array A= [1,2,3,4,4,5] and K= 3,
the answer is 12 (5+4+3),
which is the largest even sum with K (3) elements.
However, if A= [3, 3, 3] and K= 1,
the answer is -1 because it cannot make an even sum with one element.
I tried excluding the smallest odd numbers from the array, but it failed when K = n in the while loop.
Is there any simple way to solve this problem? I would sincerely appreciate it if you could give some advice.
Sort the array and "take" the biggest K elements.
If it's already even sum - you are done.
Otherwise, you need to replace exactly one element: either replace an even element you have chosen with an odd one you have not, or the other way around. You want the difference between the two elements to be as small as possible.
A naive solution would check all possible ways to do that, but that's O(n^2). You can do better by checking only the two viable candidates:
The maximal odd element you did not choose and the minimal even element you have chosen.
The maximal even element you did not choose and the minimal odd element you have chosen.
Pick the swap whose difference between the two elements is minimal. If no such pair of elements exists (as in the [3,3,3] example), there is no viable solution.
Time complexity is O(nlogn) for sorting.
In my (very rusty) python, it should be something like:
def FindMaximalEvenArray(a, k):
    a = sorted(a)
    chosen = a[len(a)-k:]        # the K largest elements
    not_chosen = a[0:len(a)-k]   # everything else

    if sum(chosen) % 2 == 0:
        return sum(chosen)

    # Candidate 1: swap the smallest chosen even for the biggest not-chosen odd.
    smallest_chosen_even = next((x for x in chosen if x % 2 == 0), None)
    biggest_not_chosen_odd = next((x for x in not_chosen[::-1] if x % 2 != 0), None)
    candidate1 = (smallest_chosen_even - biggest_not_chosen_odd
                  if smallest_chosen_even is not None and biggest_not_chosen_odd is not None
                  else float("inf"))

    # Candidate 2: swap the smallest chosen odd for the biggest not-chosen even.
    smallest_chosen_odd = next((x for x in chosen if x % 2 != 0), None)
    biggest_not_chosen_even = next((x for x in not_chosen[::-1] if x % 2 == 0), None)
    candidate2 = (smallest_chosen_odd - biggest_not_chosen_even
                  if smallest_chosen_odd is not None and biggest_not_chosen_even is not None
                  else float("inf"))

    if candidate1 == float("inf") and candidate2 == float("inf"):
        return -1
    return sum(chosen) - min(candidate1, candidate2)
Note: This can be done even better (in terms of time complexity), because you don't actually care for the order of all elements, only finding the "candidates" and the top K elements. So you could use Selection Algorithm instead of sorting, which will make this run in O(n) time.
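For instance, a selection-based split can be sketched with numpy's argpartition (introselect under the hood, O(n) on average); the function name here is just illustrative and assumes numpy is acceptable:

import numpy as np

def top_k_unsorted(a, k):
    # Split off the K largest without fully sorting; O(n) on average.
    a = np.asarray(a)
    idx = np.argpartition(a, len(a) - k)
    chosen = a[idx[len(a) - k:]]         # the K largest, in arbitrary order
    not_chosen = a[idx[:len(a) - k]]     # everything else
    return chosen, not_chosen

The candidate scans (smallest chosen even/odd, biggest not-chosen even/odd) are then plain O(n) passes, so the full sort is never needed.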
Suppose there are n sets of real numbers: S[1], S[2], ..., S[n]. We know two things about these sets:
Each set S[i] has exactly 3 elements.
All elements in each of the sets S[i] are real numbers in the [0, 1] range. (I don't know if this detail can be helpful for the solution, though).
Let's consider a set T of all numbers that can be represented as p[1] * p[2] * p[3] * ... * p[n] where p[i] is an element of S[i]. This set T, obviously, has 3^n elements.
My question is, given the sets S[1], S[2], ..., S[n] (1 <= n <= 30) and some 1 <= k <= 10 as input, can we find the k-th largest number in T faster than in O(3^n) time? It's important that I need not only the k-th largest number, but also the corresponding numbers (p[1], p[2], p[3], ... , p[n]) that produce it.
Even if the answer is no, I would appreciate any hints on how you would solve this problem approximately, maybe, by using some heuristics? I know about beam search, but maybe you could suggest something else? And even for beam search, it is not really clear how to implement it here the best way.
If the exact answer can be obtained algorithmically in less than O(3^n) time, I would greatly appreciate it if you could point out the solution.
Well, you know that the largest product is the one that uses the largest factor from each set.
Furthermore, every other product can be formed by starting with a larger one, and then decreasing the factor chosen in exactly one set.
That leads to a simple search:
Put the largest product in a max-first priority queue.
Repeat k times:
a. Remove the largest product p from the priority queue
b. For each set that has a smaller number than the one selected in p,
generate the product formed by decreasing that number to the next lower one in that set. If this selection of factors hasn't been seen before, then add it to the priority queue.
Products will be removed from the queue in decreasing order, so the kth one you take out is the kth largest.
Complexity is about N*(k log kN), depending on how you implement things.
Note that there may be multiple ways to select the factors that produce the same product. This solution considers those ways to be distinct products, i.e., each way is counted when finding the kth largest. That may or may not be what you want.
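A minimal sketch of the search described above, assuming each S[i] is given as a list of numbers; the function name and the returned (product, index-selection) pair are illustrative choices, not part of the original description:

import heapq

def kth_largest_product(sets, k):
    sets = [sorted(s) for s in sets]
    start = tuple(len(s) - 1 for s in sets)      # pick the largest factor of every set

    def prod(sel):
        p = 1.0
        for s, i in zip(sets, sel):
            p *= s[i]
        return p

    heap = [(-prod(start), start)]               # max-first queue via negated products
    seen = {start}
    best = None
    for _ in range(k):
        if not heap:                             # fewer than k products exist
            break
        neg_p, sel = heapq.heappop(heap)
        best = (-neg_p, sel)
        for i, idx in enumerate(sel):            # decrease the factor of exactly one set
            if idx > 0:
                nxt = sel[:i] + (idx - 1,) + sel[i + 1:]
                if nxt not in seen:              # skip selections already generated
                    seen.add(nxt)
                    heapq.heappush(heap, (-prod(nxt), nxt))
    return best                                  # (k-th largest product, chosen indices)

The chosen factors themselves can be recovered from the index tuple with [s[i] for s, i in zip(sets, sel)].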
To put the previous discussion into code we can do the following:
import operator
from functools import partial, reduce
import heapq
def prod_by_data(tup, data):
    # Product of the factors selected by the index tuple `tup`.
    return reduce(operator.mul, (datum[t] for t, datum in zip(tup, data)), 1)

def downset(tup):
    # All index tuples obtained by decreasing exactly one position of `tup`.
    return [
        tuple(t - (1 if j == i else 0) for j, t in enumerate(tup))
        for i in range(len(tup))
        if tup[i] > 0
    ]
data = [
    [1, 2, 3],
    [4, 2, 1],
    [8, 1, 3],
    [1, 1, 2],
]
data = [sorted(d) for d in data]
prod = partial(prod_by_data, data=data)
k_smallest = [tuple(len(dat) - 1 for dat in data)]
possible_k_smallest = []
while len(k_smallest) < 10:
    new_possible = sorted(downset(k_smallest[-1]), key=prod, reverse=True)
    possible_k_smallest = heapq.merge(possible_k_smallest, new_possible, key=prod, reverse=True)
    k_smallest.append(next(possible_k_smallest))
print(k_smallest)
print([prod(tup) for tup in k_smallest])
We maintain a lazily merged, descending stream of candidate tuples. After we pop off the front (largest remaining) tuple, we need to check all of its downset (tuples that differ by one step in exactly one position), because those tuples might be the next element in the order.
We loop k - 1 times, sorting O(n) elements each time with a key that is itself O(n) to evaluate. Because of the key, each sort takes O(n^2) instead of O(n log n). The heapq merge is lazy, so popping from it is actually O(k). The initial sorting and preparation should be O(n) as well. Overall I think this makes everything O(k n^2).
Input is an array that has at most one element that appears at least 60% of the time. The goal is to find out whether the array has such an element and, if yes, find that element. I came up with a divide and conquer function that finds such an element.
from collections import Counter
def CommonElement(a):
    c = Counter(a)
    return c.most_common(1)  # Returns the element and its frequency

def func(array):
    if len(array) == 1:
        return array[0]
    mid = len(array)//2
    left_element = func(array[:mid])
    right_element = func(array[mid:])
    if left_element == right_element:
        return right_element
    most_common_element = CommonElement(array)
    element_count = most_common_element[0][1]  # Getting the frequency of the element
    percent = element_count/len(array)
    if percent >= .6:
        return most_common_element[0][0]  # Returning the value of the element
    else:
        return None
array = [10,9,10,10,5,10,10,10,12,42,10,10,44,10,23,10] #Correctly Returns 10
array = [10,9,10,8,5,10,10,10,12,42,10,12,44,10,23,5] #Correctly Returns None
result = func(array)
print(result)
This function works, but it's O(n log n). I want to implement an algorithm that's O(n).
The recurrence for my algorithm is T(n) = 2T(n/2) + O(n). I think the goal is to eliminate the need to find the frequency, which takes O(n). Any thoughts?
If you are guaranteed to have a list 60% of which is a given number, that number is guaranteed to be the median. To see this intuitively, sort the list. The number in question represents a contiguous window that is 60% of the length of the list. There is no way to place that window so that it doesn't cover the middle.
There are plenty of divide-and-conquer algorithms for finding the median. A common one is called introselect. You can find an implementation in numpy's partition and argpartition functions (it's written in C). The basic idea is to do quicksort, but only recurse into the portion that contains the index you care about. This reduces the algorithm to O(n).
By the way, you could search for any index between 40% and 60% of the length of the list. 50% seems like a reasonable middle ground.
To verify that the median appears > 60% of the time, run a single loop over the array, counting the number of times the median appears.
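A minimal sketch of that median-then-verify idea, assuming numpy is acceptable (the function name and the 0.6 threshold are just illustrative):

import numpy as np

def supermajority(arr, threshold=0.6):
    a = np.asarray(arr)
    mid = len(a) // 2
    candidate = np.partition(a, mid)[mid]    # median via introselect, O(n) average
    count = int((a == candidate).sum())      # single verification pass
    return candidate if count / len(a) >= threshold else None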
You can create a frequency counter for all elements in the list one time in O(n). Then, iterate the lookup table and see if any are at least 60% of the elements (in other words, count / len(lst) >= 0.6).
>>> from collections import Counter
>>> L = [4, 2, 3, 2, 4, 4, 4]
>>> Counter(L)
Counter({4: 4, 2: 1, 3: 1})
>>> Counter(L).most_common(1)
[(4, 4)]
>>> item, count = Counter(L).most_common(1)[0]
>>> count / len(L)
0.6666666666666666
>>> count / len(L) >= 0.6
True
Divide & conquer is a creative, but inappropriate, approach for this problem.
...or so I thought, but see this answer.
There's a pretty simple algorithm (the Boyer-Moore majority vote) for finding the majority element of a collection, if the collection has one:
def majority(l):
    count, candidate = 0, None
    for element in l:
        if count == 0:
            count, candidate = 1, element
        elif element == candidate:
            count += 1
        else:
            count -= 1
    return candidate
This algorithm essentially pairs each element of the input against another element with a different value until all unpaired elements have the same value, then returns that value. If the input has a majority element, the algorithm must return that.
You can compute a candidate with this algorithm, then make another pass through the input and see if that candidate is a 60% supermajority. This works in O(1) space and O(n) time without mutating the input, while hash-based or introselect-based algorithms would need more space or mutate the input. It's also immune to hash collision attacks (unlike Counter and other hash-based approaches) and doesn't require elements to have an order relation (unlike introselect).
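A sketch of that two-pass check, reusing the majority function above (the wrapper name and the 0.6 threshold are illustrative):

def supermajority_element(l, threshold=0.6):
    candidate = majority(l)        # first pass: Boyer-Moore candidate
    # second pass: confirm the candidate really is a supermajority
    if candidate is not None and l.count(candidate) / len(l) >= threshold:
        return candidate
    return None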
Could anyone explain exactly what's happening under the hood to make the recursive approach in the following problem much faster and efficient in terms of time complexity?
The problem: Write a program that would take an array of integers as input and return the largest three numbers sorted in an array, without sorting the original (input) array.
For example:
Input: [22, 5, 3, 1, 8, 2]
Output: [5, 8, 22]
Even though we can simply sort the original array and return the last three elements, that would take at least O(n log n) time, since that is the best a comparison-based sort can do. So the challenge is to perform better and complete the task in O(n) time.
So I was able to come up with a recursive solution:
def findThreeLargestNumbers(array, largest=[]):
    if len(largest) == 3:
        return largest
    max = array[0]
    for i in array:
        if i > max:
            max = i
    array.remove(max)
    largest.insert(0, max)
    return findThreeLargestNumbers(array, largest)
In which I kept finding the largest number, removing it from the original array, appending it to my empty array, and recursively calling the function again until there are three elements in my array.
However, when I looked at the suggested iterative method, I composed this code:
def findThreeLargestNumbers(array):
    sortedLargest = [None, None, None]
    for num in array:
        check(num, sortedLargest)
    return sortedLargest

def check(num, sortedLargest):
    for i in reversed(range(len(sortedLargest))):
        if sortedLargest[i] is None:
            sortedLargest[i] = num
            return
        if num > sortedLargest[i]:
            shift(sortedLargest, i, num)
            return

def shift(array, idx, element):
    if idx == 0:
        array[0] = element
        return array
    array[0] = array[1]
    array[idx-1] = array[idx]
    array[idx] = element
    return array
Both codes successfully passed all the tests, and I was convinced that the iterative approach would be faster (even though not as clean). However, I imported the time module and put the codes to the test by providing an array of one million random integers and measuring how long each solution takes to return the sorted array of the largest three numbers.
The recursive approach was much faster (about 9 times faster) than the iterative approach!
Why is that? The recursive approach traverses the huge array three times and, on top of that, every removal takes O(n) time because the remaining elements have to be shifted in memory, whereas the iterative approach traverses the input array only once, doing some operations at every iteration but only on a tiny array of size 3, which should take almost no time at all!
I really want to be able to judge and pick the most efficient algorithm for any given problem so any explanation would tremendously help.
Advice for optimization.
Avoid function calls. Avoid creating temporary garbage. Avoid extra comparisons. Have logic that looks at elements as little as possible. Walk through how your code works by hand and look at how many steps it takes.
Your recursive code makes only 3 recursive calls, and as pointed out elsewhere does an average of about 1.5 comparisons per element per pass (1 while looking for the max, roughly 0.5 while figuring out which element to remove).
Your iterative code makes lots of comparisons per element, calls extra functions, and makes calls to things like reversed(range(...)) that create/destroy junk.
Now compare with this iterative solution:
def find_largest(array, limit=3):
    if len(array) <= limit:
        # Special logic not needed.
        return sorted(array)
    else:
        # Initialize the answer to values that will be replaced.
        min_val = min(array[0:limit])
        answer = [min_val for _ in range(limit)]
        # Now scan for the largest.
        for i in array:
            if answer[0] < i:
                # Sift elements down until we find the right spot.
                j = 1
                while j < limit and answer[j] < i:
                    answer[j-1] = answer[j]
                    j = j+1
                # Now insert.
                answer[j-1] = i
        return answer
There are no function calls inside the scanning loop. It is possible that you make up to 6 comparisons per element (verify that answer[0] < i, verify that (j=1) < 3, verify that answer[1] < i, verify that (j=2) < 3, verify that answer[2] < i, then find that (j=3) < 3 is not true). You will hit that worst case if array is sorted. But most of the time you only do the first comparison and then move to the next element. No muss, no fuss.
How does it benchmark?
Note that if you wanted the smallest 100 elements, then you'd find it worthwhile to use a smarter data structure such as a heap to avoid the bubble sort.
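For example, heapq.nlargest keeps a bounded heap of the current best limit elements and runs in roughly O(n log limit); a sketch (the function name is illustrative):

import heapq

def find_largest_heap(array, limit=100):
    # Ascending order, to match find_largest's output format above.
    return sorted(heapq.nlargest(limit, array))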
I am not really comfortable with Python, but I have a different approach to the problem, for what it's worth.
As far as I saw, all solutions posted are O(NM), where N is the length of the array and M the length of the largest-elements array.
Because of your specific situation, where N >> M, you could say it's O(N), but the longer the inputs, the more the M factor matters.
I agree with @zvone that you seem to have more steps in the iterative solution, which sounds like a valid explanation for your different computing speeds.
Back to my proposal: it implements binary search, O(N log M), with recursion:
import math

def binarySearch(arr, target, origin=0):
    """
    Recursive binary search
    Args:
        arr (list): List of numbers to search in
        target (int): Number to search with
    Returns:
        int: index + 1 of the immediate lower element to target in arr,
             or -1 if already present or lower than the lowest in arr
    """
    half = math.floor((len(arr) - 1) / 2)
    if target > arr[-1]:
        return origin + len(arr)
    if len(arr) == 1 or target < arr[0]:
        return -1
    if arr[half] < target and arr[half + 1] > target:
        return origin + half + 1
    if arr[half] == target or arr[half + 1] == target:
        return -1
    if arr[half] < target:
        return binarySearch(arr[half:], target, origin + half)
    if arr[half] > target:
        return binarySearch(arr[:half + 1], target, origin)

def findLargestNumbers(array, limit=3, result=[]):
    """
    Recursive linear search of the largest values in an array
    Args:
        array (list): Array of numbers to search in
        limit (int): Length of array returned. Default: 3
    Returns:
        list: Array of max values with length as limit
    """
    if len(result) == 0:
        result = [float('-inf')] * limit
    if len(array) < 1:
        return result
    val = array[-1]
    foundIndex = binarySearch(result, val)
    if foundIndex != -1:
        result.insert(foundIndex, val)
        return findLargestNumbers(array[:-1], limit, result[1:])
    return findLargestNumbers(array[:-1], limit, result)
It is quite flexible and might be inspiration for a more elaborated answer.
The recursive solution
The recursive function goes through the list 3 times to find the largest number, and removes the largest number from the list 3 times.
for i in array:
    if i > max:
        ...
and
array.remove(max)
So, you have 3×N comparisons, plus 3 removals. I guess the removal is optimized in C, but there are again about 3×(N/2) comparisons to find the item to be removed.
So, a total of approximately 4.5 × N comparisons.
The other solution
The other solution goes through the list only once, but each time it compares to the three elements in sortedLargest:
for i in reversed(range(len(sortedLargest))):
    ...
and almost every time it shifts sortedLargest with these three assignments:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
So, you are N times:
calling check
creating and reversing a range(3)
accessing sortedLargest[i]
comparing num > sortedLargest[i]
calling shift
comparing idx == 0
and about 2×N/3 times doing:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
and N/3 times array[0] = element
It is difficult to count, but that is much more than 4.5×N comparisons.
I'm trying to write the fastest algorithm possible to return the number of "magic triples" (i.e. x, y, z where z is a multiple of y and y is a multiple of x) in a list of 3-2000 integers.
(Note: I believe the list was expected to be sorted and unique but one of the test examples given was [1,1,1] with the expected result of 1 - that is a mistake in the challenge itself though because the definition of a magic triple was explicitly noted as x < y < z, which [1,1,1] isn't. In any case, I was trying to optimise an algorithm for sorted lists of unique integers.)
I haven't been able to work out a solution that doesn't include having three consecutive loops and therefore being O(n^3). I've seen one online that is O(n^2) but I can't get my head around what it's doing, so it doesn't feel right to submit it.
My code is:
def solution(l):
    if len(l) < 3:
        return 0
    elif l == [1,1,1]:
        return 1
    else:
        halfway = int(l[-1]/2)
        quarterway = int(halfway/2)
        quarterIndex = 0
        halfIndex = 0
        for i in range(len(l)):
            if l[i] >= quarterway:
                quarterIndex = i
                break
        for i in range(len(l)):
            if l[i] >= halfway:
                halfIndex = i
                break
        triples = 0
        for i in l[:quarterIndex+1]:
            for j in l[:halfIndex+1]:
                if j != i and j % i == 0:
                    multiple = 2
                    while (j * multiple) <= l[-1]:
                        if j * multiple in l:
                            triples += 1
                        multiple += 1
        return triples
I've spent quite a lot of time going through examples manually and removing loops through unnecessary sections of the lists, but this still completes a list of 2,000 integers in about a second, whereas the O(n^2) solution I found completes the same list in 0.6 seconds. It seems like a small difference, but obviously it means mine takes about 60% longer.
Am I missing a really obvious way of removing one of the loops?
Also, I saw mention of making a directed graph and I see the promise in that. I can make the list of first nodes from the original list with a built-in function, so in principle I presume that means I can make the overall graph with two for loops and then return the length of the third node list, but I hit a wall with that too. I just can't seem to make progress without that third loop!!
from array import array
def num_triples(l):
    n = len(l)
    pairs = set()
    lower_counts = array("I", (0 for _ in range(n)))
    upper_counts = lower_counts[:]
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[i] += 1
                upper_counts[j] += 1
    return sum(nx * nz for nz, nx in zip(lower_counts, upper_counts))
Here, lower_counts[i] is the number of pairs of which the ith number is the y, and z is the other number in the pair (i.e. the number of different z values for this y).
Similarly, upper_counts[i] is the number of pairs of which the ith number is the y, and x is the other number in the pair (i.e. the number of different x values for this y).
So the number of triples in which the ith number is the y value is just the product of those two numbers.
The use of an array here for storing the counts is for scalability of access time. Tests show that up to n=2000 it makes negligible difference in practice, and even up to n=20000 it only made about a 1% difference to the run time (compared to using a list), but it could in principle be the fastest growing term for very large n.
How about using itertools.combinations instead of nested for loops? Combined with list comprehension, it's cleaner and much faster. Let's say l = [your list of integers] and let's assume it's already sorted.
from itertools import combinations
def div(i, j, k):  # this function has the logic
    return l[k] % l[j] == l[j] % l[i] == 0

r = sum([div(i, j, k) for i, j, k in combinations(range(len(l)), 3) if i < j < k])
@alaniwi provided a very smart iterative solution.
Here is a recursive solution.
def find_magicals(lst, nplet):
    """Find the number of magical n-plets in a given lst"""
    res = 0
    for i, base in enumerate(lst):
        # find all the multiples of current base
        multiples = [num for num in lst[i + 1:] if not num % base]
        res += len(multiples) if nplet <= 2 else find_magicals(multiples, nplet - 1)
    return res

def solution(lst):
    return find_magicals(lst, 3)
The problem can be divided into selecting any number in the original list as the base (i.e. x) and then counting how many 2-plets (pairs) can be found among the numbers bigger than the base. Since the method for finding all 2-plets is the same as for finding tri-plets, we can solve the problem recursively.
From my testing, this recursive solution is comparable to, if not more performant than, the iterative solution.
This answer was the first suggestion by @alaniwi and is the one I've found to be the fastest (at 0.59 seconds for a 2,000 integer list).
def solution(l):
    n = len(l)
    lower_counts = dict((val, 0) for val in l)
    upper_counts = lower_counts.copy()
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[lower] += 1
                upper_counts[upper] += 1
    return sum((lower_counts[y] * upper_counts[y] for y in l))
I think I've managed to get my head around it. What it is essentially doing is comparing each number in the list with every other number to see if the larger is divisible by the smaller, and building two dictionaries:
One with, for each number, the count of larger numbers in the list that it divides,
One with, for each number, the count of smaller numbers in the list that divide it.
You then multiply the two values for each key and sum the products, because a key having a 0 in either dictionary essentially means it cannot be the middle number (y) of a triple.
Example:
l = [1,2,3,4,5,6]
lower_counts = {1:5, 2:2, 3:1, 4:0, 5:0, 6:0}
upper_counts = {1:0, 2:1, 3:1, 4:2, 5:1, 6:3}
triple_tuple = ([1,2,4], [1,2,6], [1,3,6])
I want to write a function match that takes in a list of numbers and a target number and finds within the list two numbers that add up to that target.
Here is my approach:
>>> def match(values, target=3):
...     for i in values:
...         for j in values:
...             if j != i:
...                 if i + j == target:
...                     return print(f'{i} and {j}')
...     return print('no matching pair')
Is this solution valid? Can it be improved?
The best approach would result in O(NlogN) solution.
You sort the list, this will cost you O(NlogN)
Once the list is sorted, you keep two indices: one points at the first element and the other at the last element, and you check whether the sum of the two elements matches your target. If the sum is above the target, you move the upper index down; if the sum is below the target, you move the lower index up. Finish when the upper index meets the lower index. This scan is linear and can be done in O(N) time.
All in all, you have O(NlogN) for the sorting and O(N) for the indexing, bringing the complexity of the whole solution to O(NlogN).
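A minimal sketch of that sort-plus-two-pointer scan (the function name is illustrative, not part of the original code):

def match_sorted(values, target=3):
    vals = sorted(values)              # O(N log N)
    lo, hi = 0, len(vals) - 1
    while lo < hi:                     # O(N) scan with two indices
        s = vals[lo] + vals[hi]
        if s == target:
            return vals[lo], vals[hi]
        if s < target:
            lo += 1                    # sum too small: move the lower index up
        else:
            hi -= 1                    # sum too large: move the upper index down
    return None                        # no matching pair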
There is room for improvement. Right now, you have a nested loop. Also, return print(...) returns None rather than the matching pair itself.
As you iterate over values, you are getting the following:
values = [1, 2, 3]
target = 3
first_value = 1
difference: 3 - 1 = 2
We can see that in order for 1 to add up to 3, a 2 is required. Rather than iterating over the values, we can simply ask 2 in values.
def match(values, target):
    values = set(values)
    for value in values:
        summand = target - value
        if summand in values:
            break
    else:
        print('No matching pair')
        return
    print(f'{value} and {summand}')
Edit: Converted values to a set, since it handles the in lookup quicker than a list would. If you require the indices of these pairs, such as in the LeetCode problem, you should not convert it to a set, since you will lose the order. You should also use enumerate in the for loop to get the indices.
Edit: summand == value edge case
def match(values, target):
    for i, value in enumerate(values):
        summand = target - value
        if summand in values[i + 1:]:
            break
    else:
        print('No matching pair')
        return
    print(f'{value} and {summand}')
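For the LeetCode-style variant that needs the indices, a hedged sketch using a dict of values seen so far (the function name is illustrative); it also handles the summand == value case, because a value only matches an earlier occurrence:

def match_indices(values, target):
    seen = {}                          # value -> index where it was first seen
    for i, value in enumerate(values):
        summand = target - value
        if summand in seen:
            return seen[summand], i    # indices of the matching pair
        seen[value] = i
    return None                        # no matching pair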