I know how to check if the number can be represented as the sum of two squares with a brute-force approach.
def sumSquare(n):
    i = 1
    while i * i <= n:
        j = 1
        while j * j <= n:
            if i * i + j * j == n:
                print(i, "^2 + ", j, "^2")
                return True
            j = j + 1
        i = i + 1
    return False
But how do I do it for n distinct positive integers? So the question would be:
Write a function which checks whether a number can be written as the sum of 'n' different squares.
Some examples:
is_sum_of_squares(18, 2) would be false because 18 can be written as the sum of two squares (3^2 + 3^2) but they are not distinct.
is_sum_of_squares(38, 3) would be true because 5^2+3^2+2^2 = 38 and 5, 3 and 2 are all different.
I can't extend the if condition for more values. I think it could be done with recursion, but I have problems with it.
I found this function very useful since it finds the minimum number of squares that sum to the number.
def findMinSquares(n):
    T = [0] * (n + 1)
    for i in range(n + 1):
        T[i] = i
        j = 1
        while j * j <= i:
            T[i] = min(T[i], 1 + T[i - j * j])
            j += 1
    return T[n]
But again I can't do it with recursion. Sadly I can't wrap my head around it. We started learning it a few weeks ago (I am in high school) and it is so different from the iterative approach.
Recursive approach:
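from math import isqrt

def is_sum_of_squares(x, n, used=None):
    used = used or set()
    # terms are chosen in increasing order, so the next one must exceed all used ones
    smallest = max(used, default=0) + 1
    if n == 1:
        root = isqrt(x)
        if root >= smallest and root * root == x:
            return used.union([root])
        return None
    # the smallest of n distinct terms satisfies n * i**2 < x
    for i in range(smallest, isqrt(x // n) + 1):
        squares = is_sum_of_squares(x - i**2, n - 1, used.union([i]))
        if squares:
            return squares
    return None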
Quite a compelling exercise. I have attempted to solve it using recursion in the form of backtracking. Start with an empty list, run a for loop that adds numbers to it from 1 up to the maximum feasible value (the square root of the target number), and for each added number continue with recursion. Once the list reaches the required size n, validate the result. If the result is incorrect, backtrack by removing the last number.
Not sure if it is 100% correct though. In terms of speed, I tried it on the (1000,13) input and the process finished reasonably fast (3-4s).
def is_sum_of_squares(num, count):
    max_num = int(num ** 0.5)
    return backtrack([], num, max_num, count)

def backtrack(candidates, target, max_num, count):
    """
    candidates = list of ints of max length <count>
    target = sum of squares of <count> nonidentical numbers
    max_num = square root of target, rounded
    count = desired size of candidates list
    """
    result_num = sum([x * x for x in candidates])  # calculate sum of squares
    if result_num > target:  # if sum exceeded target number stop recursion
        return False
    if len(candidates) == count:  # if candidates reach desired length, check if result is valid and return result
        result = result_num == target
        if result:  # print for result sense check, can be removed
            print("Found: ", candidates)
        return result
    for i in range(1, max_num + 1):  # cycle from 1 to max feasible number
        if candidates and i <= candidates[-1]:
            # for non empty list, skip numbers smaller than the last number.
            # allow only ascending order to eliminate duplicates
            continue
        candidates.append(i)  # add number to list
        if backtrack(candidates, target, max_num, count):  # next recursion
            return True
        candidates.pop()  # if combination was not valid then backtrack and remove the last number
    return False

assert(is_sum_of_squares(38, 3))
assert(is_sum_of_squares(30, 3))
assert(is_sum_of_squares(30, 4))
assert(is_sum_of_squares(36, 1))
assert not(is_sum_of_squares(35, 1))
assert not(is_sum_of_squares(18, 2))
assert not(is_sum_of_squares(1000, 13))
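For comparison, here is a more compact recursive sketch of the same idea. It is not taken from either answer above: the name sum_of_distinct_squares and the limit parameter are made up for illustration. It picks the largest term first and only allows strictly smaller terms afterwards, which keeps the terms distinct without tracking a list or set.

```python
from math import isqrt

def sum_of_distinct_squares(x, n, limit=None):
    """Return n distinct positive integers whose squares sum to x, or None."""
    if limit is None:
        limit = isqrt(x)
    if n == 0:
        # Nothing left to pick: succeed only if the target is used up exactly.
        return [] if x == 0 else None
    # Try the largest term first. It must be at least n, because n - 1
    # smaller distinct positive integers still have to fit below it.
    for i in range(min(limit, isqrt(x)), n - 1, -1):
        rest = sum_of_distinct_squares(x - i * i, n - 1, i - 1)
        if rest is not None:
            return rest + [i]
    return None

print(sum_of_distinct_squares(38, 3))  # [2, 3, 5]
print(sum_of_distinct_squares(18, 2))  # None
```

Returning the list (or None) instead of a bare True/False makes it easy to sanity-check a result by squaring and summing it.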
How can I get this to print all triplets that have a sum less than or equal to a target? Currently it returns triplets whose sum is equal to the target. I've tried changing it and thinking it through, but can't figure it out.
def triplets(nums):
    # Sort array first
    nums.sort()
    output = []
    # We use -2 because at this point the left and right pointers will be at same index
    # For example [1,2,3,4,5] current index is 4 and left and right pointer will be at 5, so we know we cant have a triplet
    # _ LR
    for i in range(len(nums) - 2):
        # check if current index and index -1 are same if same continue because we need distinct results
        if i > 0 and nums[i] == nums[i - 1]:
            continue
        left = i + 1
        right = len(nums) - 1
        while left < right:
            currentSum = nums[i] + nums[left] + nums[right]
            if currentSum <= 8:
                output.append([nums[i], nums[left], nums[right]])
                # below checks again to make sure index isnt same with adjacent index
                while left < right and nums[left] == nums[left + 1]:
                    left += 1
                while left < right and nums[right] == nums[right - 1]:
                    right -= 1
                # In this case we have to change both pointers since we found a solution
                left += 1
                right -= 1
            elif currentSum > 8:
                left += 1
            else:
                right -= 1
    return output
So for example, if the input array is [1,2,3,4,5], we should get the result (1,2,3),(1,2,4),(1,2,5),(1,3,4), because these have a sum less than or equal to the target of 8.
The main barrier to small changes to your code to solve the new problem is that your original goal of outputting all distinct triplets with sum == target can be solved in O(n^2) time using two loops, as in your algorithm. The size of the output can be of size proportional to n^2, so this is optimal in a certain sense.
The problem of outputting all distinct triplets with sum <= target cannot always be solved in O(n^2) time, since the output can have size proportional to n^3; for example, with an array nums = [1,2,...,n] and target = n^2 + 1, the answer is all possible triples of elements. So your algorithm has to change in a way equivalent to adding a third loop.
One O(n^3) solution is shown below. Being a bit more clever about filtering duplicate elements (like using a hashmap and working with frequencies), this should be improvable to O(max(n^2, H)) where H is the size of your output.
def triplets(nums, target=8):
    nums.sort()
    output = set()
    for i, first in enumerate(nums[:-2]):
        if first * 3 > target:
            break
        # Filter some distinct results
        if i + 3 < len(nums) and first == nums[i + 3]:
            continue
        for j, second in enumerate(nums[i + 1:], i + 1):
            if first + 2 * second > target:
                break
            if j + 2 < len(nums) and second == nums[j + 2]:
                continue
            for k, third in enumerate(nums[j + 1:], j + 1):
                if first + second + third > target:
                    break
                if k + 1 < len(nums) and third == nums[k + 1]:
                    continue
                output.add((first, second, third))
    return list(map(list, output))
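When experimenting with pruning and duplicate filtering like this, a brute-force reference is handy for cross-checking results on small inputs. The sketch below is only such a reference: triplets_bruteforce is a hypothetical helper name, and it makes no attempt to be faster than O(n^3).

```python
from itertools import combinations

def triplets_bruteforce(nums, target=8):
    """All distinct value triples (each sorted ascending) with sum <= target."""
    found = {tuple(sorted(c)) for c in combinations(nums, 3) if sum(c) <= target}
    return sorted(found)

print(triplets_bruteforce([1, 2, 3, 4, 5]))
# [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4)]
```

Because the result is built as a set of sorted tuples, repeated values in the input cannot produce the same triple twice.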
A common algorithm for solving the problem of finding the median of two sorted arrays of size m and n is to:
Run binary search to adjust "a cut" of the smaller array in two halves. When doing so, we adjust the cut of the larger array to make sure the total number of elements on the first halves of both arrays equals the total number of elements in the second halves of both arrays, which is a pre-condition for splitting both arrays around the median.
The binary search shifts the cuts left or right until all elements on the left halves <= all elements on the right halves.
At the end of the procedure, we can readily compute the median with a basic comparison of the elements on the boundary of the cuts of both arrays.
While I understand at a high level the algorithm, I'm not sure I understand why one needs to do the calculation on the smaller array, and adjust the larger array, as opposed to the other way around.
Here's a video explaining the algorithm, but the author doesn't explain exactly why we use the smaller array to drive the binary search.
I'm also including below Python code that is supposed to solve the problem, mostly to make the post self-contained, even if it's not well documented.
def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        ## Making sure that A refers to the smaller array
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError
    # integer division, so that i and j stay valid indices in Python 3 as well
    imin, imax, half_len = 0, m, (m + n + 1) // 2
    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])
            if (m + n) % 2 == 1:
                return max_of_left
            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])
            return (max_of_left + min_of_right) / 2.0
By enforcing m <= n, we make sure both i and j are always non-negative.
Also, we are able to reduce some redundant boundary checks in the while loop when working with i and j.
Take the first if condition in the while loop as an example, the code checks for i < m before accessing A[i], but why wouldn't it also check for j-1 >= 0 before accessing B[j-1]? This is because i falls into [0, m], and j = (m + n + 1) / 2 - i, so when i is the largest, j is the smallest.
When i < m, j = (m + n + 1)/2 - i > (m + n + 1)/2 - m = n/2 - m/2 + 1/2 >= 0. So j must be positive when i < m, and j - 1 >= 0.
Similarly, in the second if condition in the while loop, when i > 0, j is guaranteed to be less than n.
To verify this idea, you can try removing the size check and swap logic at the top, and run through the example input below, in which A is longer than B.
[1,2,3,4,6]
[5]
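The bound on j can also be checked without running the search at all. The following is a small sketch (j_range is a made-up helper, not part of the code above) that computes the smallest and largest value j = half_len - i can take as i ranges over [0, m]:

```python
def j_range(m, n):
    """Smallest and largest j = half_len - i for i in [0, m]."""
    half_len = (m + n + 1) // 2
    return half_len - m, half_len  # j at i = m, j at i = 0

print(j_range(1, 5))  # (2, 3): binary search over the shorter array, j stays inside [0, 5]
print(j_range(5, 1))  # (-2, 3): binary search over the longer array, j leaves [0, 1]
```

With m <= n, every j lies in [0, n], so the only boundary cases left are the ones the code already guards with i < m and i > 0.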
Given an array of integers size N, how can you efficiently find a subset of size K with elements that are closest to each other?
Let the closeness for a subset (x1,x2,x3,..xk) be defined as the sum of the absolute differences over all pairs:
$$\sum_{1 \le i < j \le k} |x_i - x_j|$$
Constraints:
2 <= N <= 10^5
2 <= K <= N
The array may contain duplicates and is not guaranteed to be sorted.
My brute force solution is very slow for large N, and it doesn't check if there's more than 1 solution:
import sys

N = input()
K = input()
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = []
for i in xrange(0, N):
    a.append(input())
a.sort()
minimum = sys.maxint
startindex = 0
for i in xrange(0,N-K+1):
    last = i + K
    tmp = 0
    for j in xrange(i, last):
        for l in xrange(j+1, last):
            tmp += abs(a[j]-a[l])
            if(tmp > minimum):
                break
    if(tmp < minimum):
        minimum = tmp
        startindex = i #end index = startindex + K?
Examples:
N = 7
K = 3
array = [10,100,300,200,1000,20,30]
result = [10,20,30]
N = 10
K = 4
array = [1,2,3,4,10,20,30,40,100,200]
result = [1,2,3,4]
Your current solution is O(NK^2) (assuming K > log N). With some analysis, I believe you can reduce this to O(NK).
The closest set of size K will consist of elements that are adjacent in the sorted list. You essentially have to first sort the array, so the subsequent analysis will assume that each sequence of K numbers is sorted, which allows the double sum to be simplified.
Assuming that the array is sorted such that x[j] >= x[i] when j > i, we can rewrite your closeness metric to eliminate the absolute value:
$$\sum_{1 \le i < j \le K} (x_j - x_i)$$
Next we rewrite your notation into a double summation with simple bounds:
$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} (x_j - x_i)$$
Notice that we can rewrite the inner distance between x[i] and x[j] as a third summation:
$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} d_l$$
where I've used d[l] to simplify the notation going forward:
$$d_l = x_{l+1} - x_l$$
Notice that d[l] is the distance between each adjacent element in the list. Look at the structure of the inner two summations for a fixed i:
j=i+1 d[i]
j=i+2 d[i] + d[i+1]
j=i+3 d[i] + d[i+1] + d[i+2]
...
j=K=i+(K-i) d[i] + d[i+1] + d[i+2] + ... + d[K-1]
Notice the triangular structure of the inner two summations. This allows us to rewrite the inner two summations as a single summation in terms of the distances of adjacent terms:
total: (K-i)*d[i] + (K-i-1)*d[i+1] + ... + 2*d[K-2] + 1*d[K-1]
which reduces the total sum to:
$$\sum_{i=1}^{K-1} \sum_{l=i}^{K-1} (K-l)\, d_l$$
Now we can look at the structure of this double summation:
i=1 (K-1)*d[1] + (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=2 (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=3 (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
...
i=K-2 2*d[K-2] + d[K-1]
i=K-1 d[K-1]
Again, notice the triangular pattern. The total sum then becomes:
1*(K-1)*d[1] + 2*(K-2)*d[2] + 3*(K-3)*d[3] + ... + (K-2)*2*d[K-2]
+ (K-1)*1*d[K-1]
Or, written as a single summation:
$$\sum_{l=1}^{K-1} l\,(K-l)\, d_l$$
This compact single summation of adjacent differences is the basis for a more efficient algorithm:
Sort the array, order O(N log N)
Compute the differences of each adjacent element, order O(N)
Iterate over each of the N-K+1 windows of K-1 consecutive differences and calculate the above sum, order O(NK)
Note that the second and third step could be combined, although with Python your mileage may vary.
The code:
def closeness(diff,K):
    acc = 0.0
    for (i,v) in enumerate(diff):
        acc += (i+1)*(K-(i+1))*v
    return acc

def closest(a,K):
    a.sort()
    N = len(a)
    # range instead of xrange, so this runs on both Python 2 and Python 3
    diff = [ a[i+1] - a[i] for i in range(N-1) ]
    min_ind = 0
    min_val = closeness(diff[0:K-1],K)
    for ind in range(1,N-K+1):
        cl = closeness(diff[ind:ind+K-1],K)
        if cl < min_val:
            min_ind = ind
            min_val = cl
    return a[min_ind:min_ind+K]
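The single-summation form is easy to check numerically against the original pairwise definition. This is just a sketch: the helper names pairwise_sum and weighted_gap_sum are invented here, and the window is assumed to be sorted already.

```python
from itertools import combinations

def pairwise_sum(window):
    """Closeness by definition: sum of |a - b| over all pairs."""
    return sum(abs(a - b) for a, b in combinations(window, 2))

def weighted_gap_sum(window):
    """Same value from adjacent gaps: the l-th gap (1-based) has weight l * (K - l)."""
    k = len(window)
    return sum(l * (k - l) * (window[l] - window[l - 1]) for l in range(1, k))

print(pairwise_sum([10, 20, 30]), weighted_gap_sum([10, 20, 30]))    # 40 40
print(pairwise_sum([1, 2, 3, 4]), weighted_gap_sum([1, 2, 3, 4]))    # 10 10
```

The 0-based expression window[l] - window[l - 1] corresponds to d[l] in the 1-based notation of the derivation.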
itertools to the rescue?
from itertools import combinations

def closest_elements(iterable, K):
    N = set(iterable)
    assert(2 <= K <= len(N) <= 10**5)
    combs = lambda it, k: combinations(it, k)
    _abs = lambda it: abs(it[0] - it[1])
    d = {}
    v = 0
    for x in combs(N, K):
        for y in combs(x, 2):
            v += _abs(y)
        d[x] = v
        v = 0
    return min(d, key=d.get)
>>> a = [10,100,300,200,1000,20,30]
>>> b = [1,2,3,4,10,20,30,40,100,200]
>>> print closest_elements(a, 3); closest_elements(b, 4)
(10, 20, 30) (1, 2, 3, 4)
This procedure can be done with O(N*K) if A is sorted. If A is not sorted, then the time will be bounded by the sorting procedure.
This is based on 2 facts (relevant only when A is ordered):
The closest subsets will always consist of consecutive elements.
When calculating the closeness of K consecutive elements, the sum of distances can be calculated as the sum of the distances between each two adjacent elements, each multiplied by (K-i)*i, where i is 1,...,K-1.
When iterating through the sorted array, it is redundant to recompute the entire sum. We can instead remove K times the distance between the previous two smallest elements, and add K times the distance between the two new largest elements. This fact is used to calculate the closeness of a subset in O(1) from the closeness of the previous subset.
Here's the pseudo-code
List<pair> FindClosestSubsets(int[] A, int K)
{
    List<pair> minList = new List<pair>();
    int minVal = infinity;
    int tempSum;
    int N = A.length;
    for (int i = K - 1; i < N; i++)
    {
        tempSum = 0;
        // the window is A[i-K+1 .. i]; t = 1..K-1 runs over its adjacent gaps
        for (int t = 1; t < K; t++)
            tempSum += (K-t)*t * (A[i-K+1+t] - A[i-K+t]);
        if (tempSum < minVal)
        {
            minVal = tempSum;
            minList.clear();
            minList.add(new pair(i-K+1, i));
        }
        else if (tempSum == minVal)
            minList.add(new pair(i-K+1, i));
    }
    return minList;
}
This function returns a list of index pairs representing the optimal solutions (the starting and ending index of each solution), since the question implied that you want all solutions with the minimal value.
try the following:
N = input()
K = input()
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = some_unsorted_list
a.sort()
cur_diff = sum([abs(a[i] - a[i + 1]) for i in range(K - 1)])
min_diff = cur_diff
min_last_idx = K - 1
for last_idx in range(K, N):
    # drop the gap that left the window, add the gap that entered it
    cur_diff = cur_diff - \
        abs(a[last_idx - K] - a[last_idx - K + 1]) + \
        abs(a[last_idx] - a[last_idx - 1])
    if min_diff > cur_diff:
        min_diff = cur_diff
        min_last_idx = last_idx
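From min_last_idx you can calculate min_first_idx (min_last_idx - K + 1). I use range to preserve the order of the indices; in Python 2.7 it builds a full list, so it takes memory linear in N. This follows the same window-by-window scan as your code but is slightly more efficient (smaller constant in the complexity), as it does less than summing everything again. Note that cur_diff tracks only the gaps between adjacent elements of the window.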
After sorting, we can be sure that, if x1, x2, ... xk are the solution, then x1, x2, ... xk are contiguous elements, right?
So,
take the intervals between numbers
sum these intervals to get the intervals between k numbers
Choose the smallest of them
My initial solution was to look through every K-element window, multiply each element by a weight m and take the sum over that window, where m starts at -(K-1) and is incremented by 2 at each step, and then take the minimum sum over the entire list. So for a window of size 3, m starts at -2 and the weights for the window are -2, 0, 2. This is because I observed that each element in the K window contributes a certain weight to the sum. For example, if the elements are [10 20 30], the sum is (30-10) + (30-20) + (20-10). If we break down the expression we have 2*30 + 0*20 + (-2)*10. Each window can be evaluated in O(K) time, so the entire operation takes O(NK) time.
However, it turns out that this solution is not optimal, and there are certain edge cases where this algorithm fails. I have yet to figure out those cases, but I am sharing the solution anyway in case anyone can get something useful from it.
for(i = 0 ;i <= n - k;++i)
{
    diff = 0;
    l = -(k-1);
    for(j = i;j < i + k;++j)
    {
        diff += a[j]*l;
        if(min < diff)
            break;
        l += 2;
    }
    if(j == i + k && diff > 0)
        min = diff;
}
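The weight observation itself can be checked in isolation. The sketch below uses Python rather than C and a made-up helper name, weighted_element_sum; it assumes the window is sorted and only evaluates a single window, without the early exit or the running minimum from the loop above.

```python
def weighted_element_sum(window):
    """Sum of pairwise distances of a sorted window via per-element weights.

    The element at 0-based position t gets weight 2*t - (K - 1),
    i.e. -(K-1), -(K-3), ..., K-3, K-1.
    """
    k = len(window)
    return sum((2 * t - (k - 1)) * x for t, x in enumerate(window))

print(weighted_element_sum([10, 20, 30]))  # 40 = (30-10) + (30-20) + (20-10)
print(weighted_element_sum([1, 2, 3, 4]))  # 10
```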
You can do this in O(n log n) time with a sliding window approach (O(n) if the array is already sorted).
First, suppose we've precomputed, at every index i in our array, the sum of distances from A[i] to the previous k-1 elements. The formula for that would be
(A[i] - A[i-1]) + (A[i] - A[i-2]) + ... + (A[i] - A[i-k+1]).
If i is less than k-1, we just compute the sum to the array boundary.
Suppose we also precompute, at every index i in our array, the sum of distances from A[i] to the next k-1 elements. Then we could solve the whole problem with a single pass of a sliding window.
If our sliding window is on [L, L+k-1] with closeness sum S, then the closeness sum for the interval [L+1, L+k] is just S - dist_sum_to_next[L] + dist_sum_to_prev[L+k]. The only changes in the sum of pairwise distances are removing all terms involving A[L] when it leaves our window, and adding all terms involving A[L+k] as it enters our window.
The only remaining part is how to compute, at a position i, the sum of distances between A[i] and the previous k-1 elements (the other computation is totally symmetric). If we know the distance sum at i-1, this is easy: subtract the distance from A[i-1] to A[i-k], and add in the extra distance from A[i-1] to A[i] k-1 times
dist_sum_to_prev[i] = (dist_sum_to_prev[i - 1] - (A[i - 1] - A[i - k]))
                      + (A[i] - A[i - 1]) * (k - 1)
Python code:
import math
from typing import List

def closest_subset(nums: List[int], k: int) -> List[int]:
    """Given a list of n (poss. unsorted and non-unique) integers nums,
    returns a (sorted) list of size k that minimizes the sum of pairwise
    distances between all elements in the list.
    Runs in O(n lg n) time, uses O(n) auxiliary space.
    """
    n = len(nums)
    assert len(nums) == n
    assert 2 <= k <= n
    nums.sort()
    # Sum of pairwise distances to the next (at most) k-1 elements
    dist_sum_to_next = [0] * n
    # Sum of pairwise distances to the last (at most) k-1 elements
    dist_sum_to_prev = [0] * n
    for i in range(1, n):
        if i >= k:
            dist_sum_to_prev[i] = ((dist_sum_to_prev[i - 1] -
                                    (nums[i - 1] - nums[i - k]))
                                   + (nums[i] - nums[i - 1]) * (k - 1))
        else:
            dist_sum_to_prev[i] = (dist_sum_to_prev[i - 1]
                                   + (nums[i] - nums[i - 1]) * i)
    for i in reversed(range(n - 1)):
        if i < n - k:
            dist_sum_to_next[i] = ((dist_sum_to_next[i + 1]
                                    - (nums[i + k] - nums[i + 1]))
                                   + (nums[i + 1] - nums[i]) * (k - 1))
        else:
            dist_sum_to_next[i] = (dist_sum_to_next[i + 1]
                                   + (nums[i + 1] - nums[i]) * (n-i-1))
    best_sum = math.inf
    curr_sum = 0
    answer_right_bound = 0
    for i in range(n):
        curr_sum += dist_sum_to_prev[i]
        if i >= k:
            curr_sum -= dist_sum_to_next[i - k]
        if curr_sum < best_sum and i >= k - 1:
            best_sum = curr_sum
            answer_right_bound = i
    return nums[answer_right_bound - k + 1:answer_right_bound + 1]
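For testing any of the faster versions on small inputs, a brute-force reference can be useful. The following is only such a sketch: closest_subset_bruteforce is a hypothetical name, and it tries every K-element combination, so it is exponential and meant for tiny arrays only.

```python
from itertools import combinations

def closest_subset_bruteforce(nums, k):
    """Try every k-element combination and keep one with the smallest pairwise sum."""
    # combinations() of a sorted list yields sorted tuples, so b - a >= 0 below.
    best = min(combinations(sorted(nums), k),
               key=lambda c: sum(b - a for a, b in combinations(c, 2)))
    return list(best)

print(closest_subset_bruteforce([10, 100, 300, 200, 1000, 20, 30], 3))
# [10, 20, 30]
print(closest_subset_bruteforce([1, 2, 3, 4, 10, 20, 30, 40, 100, 200], 4))
# [1, 2, 3, 4]
```

When several subsets tie for the minimum, min() returns the first one it meets, so a comparison against another implementation is safest on the minimal sum rather than on the subset itself.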
For an algorithm I'm benchmarking I need to test some portion of a list (which could be very long, but is filled with 0's mostly and the occasional 1). The idea is that in a list of n items, with d of them being of interest, in expectation each is defective with probability d/n. So, check a group of size d/n (it's defined in terms of the floor and log functions for information theoretic reasons - it makes the analysis of the algorithm easier).
Algorithm:
1./ If n <= 2*d -2 (ie more than half the list is filled with 1s) just look at each item in turn
2./ If n > 2*d - 2: check a group of size 2^alpha, where alpha = floor(binarylog(l/d)), l = n - d + 1 and d = number of 1s. If there is a 1, do a binary search on the group to find the defective and set d = d - 1 and n = n - 1 - x (x = size of the group minus the defective). If there isn't a 1, set n = n - groupSize and go to 1 (i.e. check the rest of the list).
However, when populating the list with 10 1s in random places, the algorithm finds all but a single 1 and then continues to loop whilst checking an empty list.
I think the problem is that when discarding a group containing all 0s I'm not correctly modifying the reference that says where to start for the next round, and this is causing my algorithm to fail.
Here is the relevant part of the function:
import math

def binary_search(inList):
    low = 0
    high = len(inList)
    while low < high:
        mid = (low + high) // 2
        upper = inList[mid:high]
        lower = inList[low:mid]
        if any(lower):
            high = mid
        elif any(upper):
            low = mid + 1
        elif mid == 1:
            return mid
        else:
            # Neither side has a 1
            return -1
    return mid

def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    #initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1]
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize
    print(defectives)
    return defectives
Also here are the tests that the function currently passes:
from GroupTesting import HGBSA
#identify a single defective
inlist = [0]*1024
inlist[123] = 1
assert HGBSA(inlist, 1) == [123]
#identify two defectives
inlist = [0]*1024
inlist[123] = 1
inlist[789] = 1
assert inlist[123] == 1
assert inlist[789] == 1
assert HGBSA(inlist, 2) == [123, 789]
zeros = [0]*1024
ones = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for val in ones:
    zeros[val] = 1
assert HGBSA(zeros, 10) == ones
I.e. it finds a single 1, 2 and 10 1s deterministically placed in the list, but this test:
from random import shuffle

zeros = [0] * 1024
ones = [1] * 10
l = zeros + ones
shuffle(l)
where_the_ones_are = [i for i, x in enumerate(l) if x == 1]
assert HGBSA(l, 10) == where_the_ones_are
Has exposed the bug.
This test also fails with the code above
#identify two defectives next to each other
inlist = [0]*1024
inlist[123] = 1
inlist[124] = 1
assert HGBSA(inlist, 2) == [123, 124]
The following modification (discarding a whole group if it is undefective, but only discarding the members of a group before the defective) passes the 'two next to each other' test, but not the '10 in a row' or random tests:
def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    #initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1 in range(0, groupSize//2)]
                print(len(undefectives))
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                start = start + defective + 1
                #print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize
    print(defectives)
    return defectives
I.e. the problem occurs when there are multiple 1s in a group being tested: after the first one is found, none of the others are detected. The best test for the code to pass would be one where the 1s are distributed uniformly at random throughout the list and all defectives are found.
Also, how would I create tests to catch this kind of error in future?
Your algorithm seemingly has worse performance than a linear scan.
A naïve algorithm would just scan a piece of list the size of d/n in O(d/n).
defectives = [index for (index, element) in enumerate(inList[start:end], start) if element == 1]
Common sense says that you can't possibly detect positions of all 1s in a list without looking at every element of the list once, and there's no point in looking at it more than once.
Your "binary search" uses any multiple times, effectively scanning pieces of the list multiple times. Same applies to constructs like if any(group): ... [s for s in group if ...] which scan group twice, first time needlessly.
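On the question of how to write tests that catch this kind of bug: one option is to use the linear scan as a reference and compare against it on many seeded random inputs. The sketch below assumes nothing about HGBSA itself; find_defectives and random_case are made-up helper names.

```python
import random

def find_defectives(in_list):
    """Indices of every 1 in in_list, found with a single linear pass."""
    return [index for index, element in enumerate(in_list) if element == 1]

def random_case(n, d, seed):
    """A length-n list of 0s with exactly d 1s at seeded random positions."""
    rng = random.Random(seed)  # seeded, so a failing case can be replayed
    items = [0] * n
    for index in rng.sample(range(n), d):
        items[index] = 1
    return items

case = random_case(1024, 10, seed=0)
print(find_defectives(case))
```

A test loop would then assert, for each seed in some range, that HGBSA(case, 10) == find_defectives(case). Adding hand-written cases such as adjacent 1s or a 1 at the very first or last index helps cover inputs that random placement rarely produces.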
If you described the actual algorithm you're trying to implement, people could help troubleshoot it. From your code and your post, the algorithm is unclear. The fact that your HGBSA function is long and not exactly commented unfortunately does not help understanding.
Don't be afraid to tell people here the details of what your algorithm is doing and why; we're sort of computer geeks here, too, we're going to understand :)