Fastest algorithm possible to pick number pairs

The question:
Given N integers [N<=10^5], count the total pairs of integers that have a difference of K. [K>0 and K<1e9]. Each of the N integers will be greater than 0 and at least K away from 2^31-1 (Everything can be done with 32 bit integers).
1st line contains N & K (integers).
2nd line contains N numbers of the set. All the N numbers are assured to be distinct.
Now the question is from HackerRank. I got a solution for the question but it doesn't satisfy the time limit for all the sample test cases. I'm not sure if it's possible to use another algorithm, but I'm out of ideas. I would really appreciate it if someone took a bit of time to check my code and give a tip or two.
temp = input()
temp = temp.split(" ")
N = int(temp[0])
K = int(temp[1])
num_array = input()
num_array = num_array.split(" ")
diff = 0
pairs= 0
i = 0
while(i < N):
    num_array[i] = int(num_array[i])
    i += 1
while(num_array != []):
    j = 0
    while(j < (len(num_array)-1)):
        diff = abs(num_array[j+1] - num_array[0])
        if(diff == K):
            pairs += 1
        j += 1
    del num_array[0]
    if(len(num_array) == 1):
        break
print(pairs)

You can do this in approximately linear time with the following procedure.
So, an O(n) solution:
For each number x, add it to a hash set H.
For each number x, check whether x-k is in H; if yes, add 1 to the answer.
Or use a balanced structure (like a tree-based set) for O(n lg n).
This solution relies on the assumption that the integers are distinct. If they are not, you need to store the number of times each element has been "added to the set" and, instead of adding 1 to the answer, add the product H[x]*H[x-k].
So in general you take a hash map H with "default value 0".
For each number x, update the map: H[x] = H[x]+1
For each distinct number x, add H[x]*H[x-k] to the answer (you don't have to check whether x-k is in the map, because if it is not, H[x-k]=0).
And again, the solution using a hash map is O(n) and the one using a tree map is O(n lg n).
So, given a set of numbers A and a number k (solution for distinct numbers):
H=set()
ans=0
for a in A:
    H.add(a)
for a in A:
    if a-k in H:
        ans+=1
print(ans)
or shorter
H=set(A)
ans = sum(1 for a in A if a-k in H)
print(ans)
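For the non-distinct case described above, a minimal Python 3 sketch could use collections.Counter as the map with "default value 0". The function name count_pairs_with_duplicates is made up for illustration; it iterates over the distinct values so that each (x, x-k) combination is counted once.

```python
from collections import Counter


def count_pairs_with_duplicates(numbers, k):
    """Count index pairs whose values differ by exactly k (k > 0).

    Hypothetical helper: a Counter plays the role of the hash map H,
    and a missing key reads as 0 without being inserted.
    """
    counts = Counter(numbers)
    # Each distinct x contributes (copies of x) * (copies of x - k) pairs.
    return sum(counts[x] * counts[x - k] for x in counts)


print(count_pairs_with_duplicates([1, 5, 3, 4, 2], 2))
```

With distinct inputs every count is 1, so this reduces to the set-based version.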

Use a hash-based collection (a dictionary or hash set).
Step 1: Fill the collection with all entries from the array.
Step 2: Count the occurrences of all A[i] + k in the collection.
// C#; needs "using System.Collections.Generic;"
HashSet<int> seen = new HashSet<int>();
foreach (int n in num_array) seen.Add(n);
int solutions = 0;
foreach (int n in num_array)
    if (seen.Contains(n + k))
        solutions += 1;
Each insertion is O(1) on average, and each lookup is O(1) as well. Doing both for every element in the array is O(n). This is as fast as it can get.
Sorry, you have to translate it to python, though.
EDIT: Same idea as the previous one. Sorry to post a duplicate. It's too late to remove my duplicate I guess.

Related

Optimal brute force solution for finding longest palindromic substring

This is my current approach:
def isPalindrome(s):
    if (s[::-1] == s):
        return True
    return False
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        subStr = s[i]
        for j in range(i + 1, l):
            subStr += s[j]
            if (j - i + 1 <= len(ans)):
                continue
            if (isPalindrome(subStr)):
                ans = max(ans, subStr, key=len)
    return ans if len(ans) > 1 else s[0]
print(solve(input()))
My code exceeds the time limit according to the auto scoring system. I've already spent some time searching on Google; all of the solutions I found either use the same idea with no optimization or use dynamic programming, but sadly I must use only brute force to solve this problem. I tried to break out of the loop earlier by skipping all the substrings that are shorter than the longest palindromic string found so far, but I still fail to meet the time requirement. Is there any other way to break out of these loops earlier, or a more time-efficient approach than the above?

With subStr += s[j], a new string is created over the length of the previous subStr. And with s[::-1], the substring from the previous offset j is copied over and over again. Both are inefficient because strings are immutable in Python and have to be copied as a new string for any string operation. On top of that, the string comparison in s[::-1] == s is also inefficient because you've already compared all of the inner substrings in the previous iterations and need to compare only the outermost two characters at the current offset.
You can instead keep track of just the index and the offset of the longest palindrome so far, and only slice the string upon return. To account for palindromes of both odd and even lengths, you can either increase the index by 0.5 at a time, or double the length to avoid having to deal with float-to-int conversions:
def solve(s):
    length = len(s) * 2
    index_longest = offset_longest = 0
    for index in range(length):
        offset = 0
        for offset in range(1 + index % 2, min(index, length - index), 2):
            if s[(index - offset) // 2] != s[(index + offset) // 2]:
                offset -= 2
                break
        if offset > offset_longest:
            index_longest = index
            offset_longest = offset
    return s[(index_longest - offset_longest) // 2: (index_longest + offset_longest) // 2 + 1]

Solved by using the "Expand Around Center" approach, thanks @Maruthi Adithya

This modification of your code should improve performance. You can stop as soon as the longest substring that can still start at i is no longer than your already computed answer. Also, you should start your second loop at i + len(ans) + 1 instead of i + 1 to avoid useless iterations:
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        # no substring starting at i can be longer than ans, so stop
        if (l - i <= len(ans)):
            break
        # only look at substrings that are strictly longer than ans
        for j in range(i + len(ans) + 1, l + 1):
            subStr = s[i:j]
            if (isPalindrome(subStr)):
                ans = subStr
    return ans if len(ans) > 1 else s[0]

This is a solution that has a time complexity greater than the solutions provided.
Note: This post is to think about the problem better and does not specifically answer the question. I have taken a mathematical approach to find a time complexity greater than 2^L (where L is size of input string)
Note: This is a post to discuss potential algorithms. You will not find the answer here. And the logic shown here has not been proven extensively.
Do let me know if there is something that I haven't considered.
Approach: Create the set of possible substrings. Compare and find the pair* from this set that gives the longest possible palindrome.
Example case with input string: "abc".
In this example, substring set has: "a","b","c","ab","ac","bc","abc".
7 elements.
Comparing each element with all other elements will involve: 7^2 = 49 calculations.
Hence, input size is 3 & no of calculations is 49.
Time Complexity:
First compute time complexity for generating the substring set:
$$\sum_{a=1}^{L} C_{a}^{L}$$
Here, we are adding all the different substring size combination from the input size L.
To make it clear: In the above example input size is 3. So we find all the pairs with size =1 (i.e: "a","b","c"). Then size =2 (i.e: "ab","ac","bc") and finally size = 3 (i.e: "abc").
So choosing 1 character from input string = combination of taking L things 1 at a time without repetition.
In our case number of combinations = 3.
This can be mathematically shown as (where a = 1):
$$C_{a}^{L}$$
Similarly choosing 2 char from input string = 3
Choosing 3 char from input string = 1
Finding time complexity of palindrome pair from generated set with maximum length:
Size of generated set: N
For this we have to compare each string in set with all other strings in set.
So N*N, or 2 for loops. Hence the final time complexity is:
$$\sum_{a=1}^{L} \left( C_{a}^{L} \right)^{2}$$
This is diverging function greater than 2^L for L > 1.
However, there can be multiple optimizations applied to this. For example: there is no need to compare "a" with "abc" as "a" will also be compared with "a". Even if this optimization is applied, it will still have a time complexity > 2^L (For the most cases).
Hope this gave you a new perspective to the problem.
PS: This is my first post.

You should not find the string start from the beginning of that string, but you should start from the middle of it & expand the current string
For example, for the string xyzabccbalmn, your solution will cost ~ 6 * 11 comparison but searching from the middle will cost ~ 11 * 2 + 2 operations
But anyhow, brute-forcing will never ensure that your solution will run fast enough for any arbitrary string.
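As a rough illustration of searching from the middle, here is a minimal Python sketch that tries every character and every gap between characters as a center and expands outwards while the two ends match. The name longest_palindrome is an assumption made for this example, not code from the question.

```python
def longest_palindrome(s):
    """Return one longest palindromic substring of s (sketch, O(n^2) worst case)."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    for center in range(len(s)):
        # (center, center) covers odd lengths, (center, center + 1) even lengths.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            # The loop overshoots by one on each side.
            length = right - left - 1
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]


print(longest_palindrome("xyzabccbalmn"))
```

Only the two outermost characters are compared at each step, and the string is sliced once at the end, so no intermediate substrings are built.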

Try this:
def solve(s):
    if len(s)==1:
        print(0)
        return '1'
    if len(s)<=2 and not(isPalindrome(s)):
        print (0)
        return '1'
    elif isPalindrome(s):
        print( len(s))
        return '1'
    elif isPalindrome(s[0:len(s)-1]) or isPalindrome(s[1:len(s)]):
        print (len(s)-1)
        return '1'
    elif len(s)>=2:
        solve(s[0:len(s)-1])
        return '1'
    return 0

Guidance on removing a nested for loop from function

I'm trying to write the fastest algorithm possible to return the number of "magic triples" (i.e. x, y, z where z is a multiple of y and y is a multiple of x) in a list of 3-2000 integers.
(Note: I believe the list was expected to be sorted and unique but one of the test examples given was [1,1,1] with the expected result of 1 - that is a mistake in the challenge itself though because the definition of a magic triple was explicitly noted as x < y < z, which [1,1,1] isn't. In any case, I was trying to optimise an algorithm for sorted lists of unique integers.)
I haven't been able to work out a solution that doesn't include having three consecutive loops and therefore being O(n^3). I've seen one online that is O(n^2) but I can't get my head around what it's doing, so it doesn't feel right to submit it.
My code is:
def solution(l):
    if len(l) < 3:
        return 0
    elif l == [1,1,1]:
        return 1
    else:
        halfway = int(l[-1]/2)
        quarterway = int(halfway/2)
        quarterIndex = 0
        halfIndex = 0
        for i in range(len(l)):
            if l[i] >= quarterway:
                quarterIndex = i
                break
        for i in range(len(l)):
            if l[i] >= halfway:
                halfIndex = i
                break
        triples = 0
        for i in l[:quarterIndex+1]:
            for j in l[:halfIndex+1]:
                if j != i and j % i == 0:
                    multiple = 2
                    while (j * multiple) <= l[-1]:
                        if j * multiple in l:
                            triples += 1
                        multiple += 1
        return triples
I've spent quite a lot of time going through examples manually and removing loops through unnecessary sections of the lists but this still completes a list of 2,000 integers in about a second where the O(n^2) solution I found completes the same list in 0.6 seconds - it seems like such a small difference but obviously it means mine takes 60% longer.
Am I missing a really obvious way of removing one of the loops?
Also, I saw mention of making a directed graph and I see the promise in that. I can make the list of first nodes from the original list with a built-in function, so in principle I presume that means I can make the overall graph with two for loops and then return the length of the third node list, but I hit a wall with that too. I just can't seem to make progress without that third loop!!

from array import array
def num_triples(l):
    n = len(l)
    lower_counts = array("I", (0 for _ in range(n)))
    upper_counts = lower_counts[:]
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[i] += 1
                upper_counts[j] += 1
    return sum(nx * nz for nz, nx in zip(lower_counts, upper_counts))
Here, lower_counts[i] is the number of pairs of which the ith number is the y, and z is the other number in the pair (i.e. the number of different z values for this y).
Similarly, upper_counts[i] is the number of pairs of which the ith number is the y, and x is the other number in the pair (i.e. the number of different x values for this y).
So the number of triples in which the ith number is the y value is just the product of those two numbers.
The use of an array here for storing the counts is for scalability of access time. Tests show that up to n=2000 it makes negligible difference in practice, and even up to n=20000 it only made about a 1% difference to the run time (compared to using a list), but it could in principle be the fastest growing term for very large n.

How about using itertools.combinations instead of nested for loops? Combined with a generator expression, it's cleaner, although it still examines every triple and so remains O(n^3). Let's say l = [your list of integers] and let's assume it's already sorted.
from itertools import combinations
def div(i,j,k): # this function has the logic
    return l[k]%l[j]==l[j]%l[i]==0
# combinations() already yields index tuples with i < j < k
r = sum(div(i,j,k) for i,j,k in combinations(range(len(l)),3))

@alaniwi provided a very smart iterative solution.
Here is a recursive solution.
def find_magicals(lst, nplet):
    """Find the number of magical n-plets in a given lst"""
    res = 0
    for i, base in enumerate(lst):
        # find all the multiples of current base
        multiples = [num for num in lst[i + 1:] if not num % base]
        res += len(multiples) if nplet <= 2 else find_magicals(multiples, nplet - 1)
    return res
def solution(lst):
    return find_magicals(lst, 3)
The problem can be divided into selecting any number in the original list as the base (i.e x), how many du-plets we can find among the numbers bigger than the base. Since the method to find all du-plets is the same as finding tri-plets, we can solve the problem recursively.
From my testing, this recursive solution is comparable to, if not more performant than, the iterative solution.

This answer was the first suggestion by @alaniwi and is the one I've found to be the fastest (at 0.59 seconds for a 2,000 integer list).
def solution(l):
    n = len(l)
    lower_counts = dict((val, 0) for val in l)
    upper_counts = lower_counts.copy()
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[lower] += 1
                upper_counts[upper] += 1
    return sum((lower_counts[y] * upper_counts[y] for y in l))
I think I've managed to get my head around it. What it is essentially doing is comparing each number in the list with every larger number to see if the larger is divisible by the smaller, and it builds two dictionaries:
One with the number of larger numbers that each number divides (lower_counts),
One with the number of smaller numbers that divide each number (upper_counts).
You then multiply the two values for each key and sum the products; a key with a 0 in either dictionary cannot be the second number in a triple.
Example:
l = [1,2,3,4,5,6]
lower_counts = {1:5, 2:2, 3:1, 4:0, 5:0, 6:0}
upper_counts = {1:0, 2:1, 3:1, 4:2, 5:1, 6:3}
triple_tuple = ([1,2,4], [1,2,6], [1,3,6])

Code challenge: finding the divisible in a list

I am playing a code challenge. Simply speaking, the problem is:
Given a list L (max length is of the order of 1000) containing positive integers.
Find the number of "Lucky Triples", i.e. triples (L[i], L[j], L[k]) where L[i] divides L[j] and L[j] divides L[k].
for example, [1,2,3,4,5,6] should give the answer 3 because [1,2,4], [1,2,6],[1,3,6]
My attempt:
Sort the list. (let say there are n elements)
3 For loops: i, j, k (i from 1 to n-2), (j from i+1 to n-1), (k from j+1 to n)
only if L[j] % L[i] == 0, the k for loop will be executed
The algorithm seems to give the correct answer, but the challenge said that my code exceeded the time limit. I tried it on my computer with the list [1,2,3,...,2000] and got count = 40888 (I guess it is correct). The time is around 5 seconds.
Is there any faster way to do that?
This is the code I have written in python.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt

You can use additional space to help yourself. After you sort the input list you should make a map/dict where the key is each element in the list and value is a list of elements which are divisible by that in the list so you would have something like this
assume sorted list is list = [1,2,3,4,5,6] your map would be
1 -> [2,3,4,5,6]
2-> [4,6]
3->[6]
4->[]
5->[]
6->[]
now for every key in the map you find what it can divide and then you find what that divides, for example you know that
1 divides 2 and 2 divides 4 and 6, similarly 1 divides 3 and 3 divides 6
the complexity of sorting should be O(nlogn) and that of constructing the list should be better than O(n^2) (but I am not sure about this part) and then I am not sure about the complexity of when you are actually checking for multiples but I think this should be much much faster than a brute force O(n^3)
If someone could help me figure out the time complexity of this I would really appreciate it
EDIT :
You can make the map creation part faster by incrementing by X (and not 1) where X is the number in the list you are currently on since it is sorted.

Thank you guys for all your suggestions. They are brilliant. But it seems that I still can't pass the speed test, or I cannot handle duplicated elements.
After discussing with my friend, I have just come up with another solution. It should be O(n^2) and I passed the speed test. Thanks all!!
def answer(lst):
    lst.sort()
    count = 0
    if len(lst) == 2:
        return count
    #for each middle element, count the divisors at the front and the multiples at the back. Then multiply them.
    for i, middle in enumerate(lst[1:len(lst)-1], start = 1):
        countfirst = 0
        countthird = 0
        for first in (lst[0:i]):
            if middle % first == 0:
                countfirst += 1
        for third in (lst[i+1:]):
            if third % middle == 0:
                countthird += 1
        count += countfirst*countthird
    return count

I guess sorting the list is pretty inefficient. I would rather try to iteratively reduce the number of candidates. You could do that in two steps.
At first filter all numbers that do not have a divisor.
from itertools import combinations
candidates = [max(pair) for pair in combinations(l, 2) if max(pair)%min(pair) == 0]
After that, count the number of remaining candidates, that do have a divisor.
result = sum(max(pair)%min(pair) == 0 for pair in combinations(candidates, 2))

Your original code, for reference.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt
There are a number of things to tidy up here, and with just a few tweaks we can probably get this running faster. Let's start:
def answer(lst):  # I prefer not to use `l` because it looks like `1`
    lst.sort()
    count = 0  # use whole words here. No reason not to.
    if len(lst) == 2:
        return count
    for i, first in enumerate(lst):
        # using `enumerate` here means you can avoid ugly ranges and
        # saves you from a look up on the list afterwards. Not really a
        # performance hit, but definitely looks and feels nicer.
        for j, second in enumerate(lst[i+1:], start=i+1):
            # Since the list is sorted, multiples of lst[i] can only
            # appear after index i, so enumerating over `lst[i+1:]`
            # is all that is needed.
            if second % first == 0:
                # see how using enumerate makes that look nicer?
                for third in lst[j+1:]:
                    if third % second == 0:
                        count += 1
    return count
I bet that on its own will pass your speed test, but if not, you can check for membership instead. In fact, using a set here is probably a great idea!
def answer2(lst):
    s = set(lst)
    limit = max(s)  # we'll never have a valid product higher than this
    multiples = {}  # accumulator for our mapping
    for n in sorted(s):
        max_prod = limit // n  # n * (max_prod+1) > limit
        multiples[n] = [n*k for k in range(2, max_prod+1) if n*k in s]
    # in [1,2,3,4,5,6]:
    # multiples = {1: [2, 3, 4, 5, 6],
    #              2: [4, 6],
    #              3: [6],
    #              4: [],
    #              5: [],
    #              6: []}
    # multiples is now a mapping you can use a Depth- or Breadth-first-search on
    triples = sum(1 for j in multiples
                  for k in multiples.get(j, [])
                  for l in multiples.get(k, []))
    # This basically just looks up each starting value as j, then grabs
    # each valid multiple and assigns it to k, then grabs each valid
    # multiple of k and assigns it to l. For every possible combination there,
    # it adds 1 more to the result of `triples`
    return triples

I'll give you just an idea; the implementation should be up to you:
Initialize the global counter to zero.
Sort the list, starting with the smallest number.
Create a list of integers (one entry per number, with the same index).
Iterate through each number (index i), and do the following:
  Check for divisors at positions 0 to i-1.
  Store the number of divisors in the list at position i.
  Fetch the number of divisors from the list for each divisor, and add each number to the global counter.
Unless you have finished, repeat the iteration step for the next number.
Your result should be in the global counter.
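The steps above can be sketched in Python roughly as follows. This is only one possible reading of the idea; the names count_lucky_triples and divisor_counts are chosen here for illustration. Duplicates are handled because elements are told apart by position rather than by value.

```python
def count_lucky_triples(numbers):
    """Count triples (x, y, z), taken in sorted order, with x | y and y | z.

    Sketch of the counter idea: O(n^2) time, O(n) extra space.
    """
    values = sorted(numbers)
    # divisor_counts[i] = how many earlier elements divide values[i]
    divisor_counts = [0] * len(values)
    total = 0
    for i, current in enumerate(values):
        for j in range(i):
            if current % values[j] == 0:
                divisor_counts[i] += 1
                # Every divisor of values[j] completes a triple ending at i.
                total += divisor_counts[j]
    return total


print(count_lucky_triples([1, 2, 3, 4, 5, 6]))
```

Each pair of positions is visited once, so a list of 2,000 integers needs about two million modulo operations.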

Missing Term Arithmetic Progression - Clean up my code

I just tried a little online programming quiz that asked me to solve this problem as quickly as possible. I got the right answer but I know it isn't pretty. I'm trying to become a better programmer and write cleaner, more efficient code so please give me some tips. I've included the description below. PS I think this algorithm fails for the case N=3
# Enter your code here. Read input from STDIN. Print output to STDOUT
import sys
N= int(sys.stdin.readline())
stringdata = sys.stdin.readline()
array = stringdata.split(' ')
diff1=[0]*(N-1)
diff2 = [0]*(N-2)
index = 0
diff = 0
for i in range(0,len(array)-1):
    diff1[i] = int(array[i+1])-int(array[i])
for i in range(0,len(diff1)-1):
    diff2[i] = diff1[i+1]-diff1[i]
    if diff2[i] == 0:
        diff = diff1[i]
    else:
        index = i
print(int(array[index])+diff)
Task: Find the missing term in an Arithmetic Progression.
An Arithmetic Progression is defined as one in which there is a constant difference between the consecutive terms of a given series of numbers. You are provided with consecutive elements of an Arithmetic Progression. There is however one hitch: Exactly one term from the original series is missing from the set of numbers which have been given to you. The rest of the given series is the same as the original AP. Find the missing term.
Input Format
The first line contains an Integer N, which is the number of terms which will be provided as input.
This is followed by N consecutive Integers, with a space between each pair of integers. All of these are on one line, and they are in AP (other than the point where an integer is missing).
Output Format
One Number which is the missing integer from the series.
Sample Input
5
1 3 5 9 11
Sample Output
7

I think this code can be somewhat simplified. First, the input. Not much different, except I use raw_input (or input in Python 3), and I immediately map the numbers to int.
n = int(raw_input("Number of Numbers: "))
s = raw_input("List of Numbers, space-separated: ")
nums = map(int, s.split())
assert n == len(nums) and n > 2
Now for the interesting part: Note that (assuming the list is well-formed) there can just be two differences between numbers: Either the correct difference, or two times that difference. I use a list comprehension to create a list of tuples (difference, at index). Now I can simply use the builtin max function to find the one with two times the correct difference and the respective index (d2, index) and calculate the missing number.
diffs = [(nums[i+1] - nums[i], i) for i in range(n-1)]
(d2, index) = max(diffs)
print nums[index] + d2 / 2
But the question was about coding style, not about the algorithm, so here are my thoughts:
add some blank lines and comments between logical blocks of your program (e.g. # read input)
map the array to int once, instead of casting the numbers each time you need them
you can use a list comprehension to create diff1 (aka first_diff), as in my example
you don't need diff2 at all; just write if diff1[i+1] - diff1[i] == 0:
be concise: range(0,len(array)-1) is the same as range(N-1)
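As a small Python 3 sketch of the largest-difference idea from this answer: picking the gap by absolute value instead of plain max is an addition made here so that decreasing progressions work too, and the function name find_missing_term is hypothetical.

```python
def find_missing_term(nums):
    """Return the single term missing from an arithmetic progression.

    Assumes at least three terms were given and exactly one interior
    term is missing, so one gap is twice the common difference.
    """
    diffs = [(nums[i + 1] - nums[i], i) for i in range(len(nums) - 1)]
    # The doubled gap is the one with the largest magnitude.
    gap, index = max(diffs, key=lambda pair: abs(pair[0]))
    return nums[index] + gap // 2


print(find_missing_term([1, 3, 5, 9, 11]))
```

Because the doubled gap is always even, the integer division is exact for negative differences as well.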

Works for
1) Any value of N (given 5 in example)
2) Any Difference between terms (given 2 in example)
3) Difference can be + as well as - (example: 11 5 2 -1 -4)
int diff[]= new int[length-1];
for(int i = 0; i<length-1;i++){
    diff[i] = n1[i+1]-n1[i];
    //System.out.println(diff[i]);
    if(i!=0){
        if(diff[i]<diff[i-1]){
            if(diff[i]<0)
                System.out.println(n1[i]+diff[i-1]);
            else
                System.out.println(n1[i-1]+diff[i]);
            break;
        }
        if(diff[i]>diff[i-1]){
            if(diff[i]<0)
                System.out.println(n1[i-1]+diff[i]);
            else
                System.out.println(n1[i]+diff[i-1]);
            break;
        }
    }
}
n1 is where you store the number array from String.
Length is how many numbers you are providing.
This is optimized so that if you miss number in between first two numbers then it only loops 3 times no matter how many numbers you have given

It's very simple: review the code below, and if you remove the blank lines it is exactly 8 lines.
I hope this answer is clear to you.
import re
N = int(raw_input()) #Number of Terms
I = raw_input() #The Series of Numbers received as a String
I = re.findall(r'\d+',I) #Extract items from the string
I = [int(s) for s in I] #I is a list with Series of Integers
for x in range(N-1):
    if (I[x]+2 != I[x+1]): #assumes a common difference of 2, as in the sample input
        print I[x]+2

int a[]={1,3,5,7,11};
int i=0,n=5,fd,sd;
printf("output:\n");
do
{
    fd=0;sd=0;
    fd=a[i+1]-a[i];
    sd=a[i+2]-a[i+1];
    if(fd<sd)
    {
        printf("missing term is %d",fd+a[i+1]);
    }
    else if(fd>sd){
        printf("missing term is %d",a[i]+sd);}
    else{
        i++;}
}while((fd==sd)&&i<n-2);

N = input()
max_num = range(N)
s = raw_input()
AP = map(int,s.split())
comm_dif = AP[1]-AP[0]
length = len(AP)
for i in range(N):
    if i != length-1:
        if AP[i+1]-AP[i] != comm_dif:
            print AP[i]+comm_dif
INPUT:
5
1 21 31 51 61
OUTPUT:
41

Here is my code; it works for both positive and negative differences.
def find_arith(aplist):
    idiff=[]
    flag=0
    for j in range(0, len(aplist)-1):
        diff1 = aplist[j+1] - aplist[j]
        if diff1 < 0:
            flag=1
        idiff.append(abs(diff1))
    if flag==1:
        final_diff=-1*min(idiff)
    else:
        final_diff=min(idiff)
    print(idiff)
    print("final diff:", final_diff)
    for i in range(aplist[0],aplist[len(aplist)-1]+1,final_diff):
        if i not in aplist:
            print(i)
if __name__ == "__main__":
    print("First Result")
    find_arith([13,21,25,33,37,45])
    print("Second Result")
    find_arith([-10,-6,-4,-2])
    print("3rd Result")
    find_arith([-5, -1, 3, 11])
    print("4th Result")
    find_arith([1, 5, 13, 17, 21])
    print("5th Result")
    find_arith([-2, -8, -11, -14, -17, -20, -23, -29])

Two for loops for a simple problem such as this! The solution above is quadratic in its behavior.
Here is one solution which is O(N) in the worst case, where the item at index 1 is missing; for any item missing after index 1 the solution is better than linear.
Pass the arithmetic progression (the input array) to this method, and substitute returns for the System.out.println calls as appropriate.
solution:
public static int findMissingNumberInAP(int[] ipArr)
{
    // ipArr will always be more than 3 elements in size.
    int maxDiff = ipArr[1] - ipArr[0];
    int i=0;
    while(i<ipArr.length-1)
    {
        if((ipArr[i+1] - ipArr[i]) > maxDiff)
            break;
        i++;
    }
    // This means the 2nd element or i=1 was missing so add ip[0] to
    // any random difference you are good to go.
    if(i == ipArr.length - 1)
        System.out.println(ipArr[0] + (ipArr[ipArr.length-1]-ipArr[ipArr.length-2]));
    else System.out.println(ipArr[i] + maxDiff);
    // Else just add the maxDiff you got from first step to the position
    // of i you broke the loop at.
    return -1;
}

Subset sum Problem

recently I became interested in the subset-sum problem which is finding a zero-sum subset in a superset. I found some solutions on SO, in addition, I came across a particular solution which uses the dynamic programming approach. I translated his solution in python based on his qualitative descriptions. I'm trying to optimize this for larger lists which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in python:
import random
from time import time
from itertools import product
time0 = time()
# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
    return [[0]*b for x in xrange(a)]
# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower,upper+1) for i in range(num)]
# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]
# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except:
        return 0
# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value
# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)
# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)
# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)
# iterate through each table element
#for i in xrange(1, m): #row
# for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1
# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])
print "Solution: ",solution
time1 = time()
print "Time execution: ", time1 - time0

I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-Complete.
Meaning, every known (exact) algorithm has an exponential time behavior on the size of the input.
Meaning, if you can process one operation in 0.1 nanoseconds (10,000,000,000 operations per second), then for a list of 59 elements it will take:
2^59 ops --> 2^59 / 10,000,000,000 seconds (about 2^26 seconds) --> 2^26 / (3600 x 24 x 365) years --> about 2 years
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other side, if you restrict the problem (to another) using bounds for the values of the numbers in the set, then the problem complexity reduces to polynomial time. But even then the memory space consumed will be a polynomial of VERY High Order.
The memory consumed will be much larger than the few gigabytes you have in memory.
And even much larger than the few tera-bytes on your hard drive.
( That's for small values of the bound for the value of the elements in the set )
Maybe this is the case with your dynamic programming algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You can try a smaller bound. That is, if your input consistently consists of small values.
Good Luck!

Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero (activities):
    subsets = {0: []}
    for (activity, cost) in activities.iteritems():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.iteritems():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I spent a few minutes with it and it worked very well.
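For readers on Python 3, the same idea can be sketched over a plain list of integers instead of a dictionary of activities. This is an adaptation rather than the original Hacker News code, and the name zero_sum_subset is invented here. It keeps one example subset per reachable sum, so memory grows with the number of distinct sums, not with the full width of the P - N table.

```python
def zero_sum_subset(numbers):
    """Return a non-empty sublist of numbers summing to 0, or [] if none exists.

    Sketch: maps every reachable sum to one subset that produces it.
    """
    reachable = {0: []}  # the empty subset reaches sum 0
    for value in numbers:
        updates = {}
        for total, subset in reachable.items():
            new_total = total + value
            candidate = subset + [value]
            if new_total == 0:
                return candidate
            if new_total not in reachable:
                updates[new_total] = candidate
        # Merge after the loop so each value is used at most once per subset.
        reachable.update(updates)
    return []


print(zero_sum_subset([1, -3, 2, 4]))
```

The worst case is still exponential in the number of elements when the sums do not collide, which matches the NP-completeness remark made earlier.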

An interesting article on optimizing python code is available here. Basically the main result is that you should inline your frequent loops, so in your case this would mean instead of calling get_element twice per loop, put the actual code of that function inside the loop in order to avoid the function call overhead.
Hope that helps! Cheers

The first thing that caught my eye:
def split_sum(A):
    N_list = 0
    P_list = 0
    for x in A:
        if x < 0:
            N_list+=x
        elif x > 0:
            P_list+=x
    return [N_list, P_list]
Some advice:
Try to use a 1D list and a bitarray to reduce the memory footprint to a minimum (http://pypi.python.org/pypi/bitarray), so you will only need to change the get / set functions. This should reduce your memory footprint by a factor of at least 64 (an integer in a list is a pointer to a typed integer object, so the factor can be about 3*32).
Avoid using try/except; figure out the proper ranges at the beginning instead, and you might find that you gain a huge amount of speed.

The following code works for Python 3.3+ , I have used the itertools module in Python that has some great methods to use.
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for i, combo in enumerate(powerset(nums), 1):
    sum = 0
    for num in combo:
        sum += int(num)
    if sum == inputSum:
        print(combo)
The Input Output is as Follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')

Just change the values in your set w, make an array x as long as w, and pass the sum for which you want subsets as the last argument of subsetsum, and you will be done (if you want to check by giving your own values).
def subsetsum(cs,k,r,x,w,d):
    x[k]=1
    if(cs+w[k]==d):
        for i in range(0,k+1):
            if x[i]==1:
                print (w[i],end=" ")
        print()
    elif cs+w[k]+w[k+1]<=d :
        subsetsum(cs+w[k],k+1,r-w[k],x,w,d)
    if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
        x[k]=0
        subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]
subsetsum(0,0,sum(w),x,w,7)
