Is the time complexity of this code O(n^2)? - python

The problem is to find two items in the array that add up to a target value.
It returns an array with the indices of the correct values.
I think the time complexity is O(n^2) because the while loop runs through the array once, which is O(n), and in the worst case it has to repeat that pass n times, giving n * n running time.
Even though the number of elements it has to iterate through decreases each time (the passes sum to roughly n(n-1)/2 comparisons), we drop the constant factors when calculating time complexity.
Is this analysis correct?
Any recommendations for bringing it down to O(n)?
def twoSum(nums, target):
    indx = []
    size = len(nums)
    if size < 2:
        return indx
    x = 0
    y = size - 1
    while x < y:
        if nums[x] + nums[y] == target:
            indx = [x, y]  # note: the original indx[0] = x would raise IndexError on an empty list
            break
        elif (y - 1) == x:
            x = x + 1
            y = size - 1
        else:
            y = y - 1
    return indx

You can do this in O(n). I believe this is a Google interview question that they have a YouTube video for, or at least they had a very similar problem:
def twoSum(nums, target):
    values = dict()
    for index, n in enumerate(nums):
        if target - n in values:
            return values[target - n], index
        else:
            values[n] = index

print(twoSum([4, 5, 2, 1, 3], 4))  # (3, 4)
- Edit -
Per the comments below, this solution technically still has a worst case of O(n^2) due to hash collisions. For most inputs you should get close to O(n), but if you are working with large numbers (negative or positive) you will see an increase in collisions, which will result in n * log(n) to n^2 time (especially if the test set given to you deliberately targets hash collisions).

Related

Guidance on removing a nested for loop from function

I'm trying to write the fastest algorithm possible to return the number of "magic triples" (i.e. x, y, z where z is a multiple of y and y is a multiple of x) in a list of 3-2000 integers.
(Note: I believe the list was expected to be sorted and unique, but one of the test examples given was [1,1,1] with the expected result of 1. That is a mistake in the challenge itself, though, because the definition of a magic triple was explicitly noted as x < y < z, which [1,1,1] isn't. In any case, I was trying to optimise an algorithm for sorted lists of unique integers.)
I haven't been able to work out a solution that doesn't involve three nested loops, and is therefore O(n^3). I've seen one online that is O(n^2), but I can't get my head around what it's doing, so it doesn't feel right to submit it.
My code is:
def solution(l):
    if len(l) < 3:
        return 0
    elif l == [1, 1, 1]:
        return 1
    else:
        halfway = int(l[-1] / 2)
        quarterway = int(halfway / 2)
        quarterIndex = 0
        halfIndex = 0
        for i in range(len(l)):
            if l[i] >= quarterway:
                quarterIndex = i
                break
        for i in range(len(l)):
            if l[i] >= halfway:
                halfIndex = i
                break
        triples = 0
        for i in l[:quarterIndex + 1]:
            for j in l[:halfIndex + 1]:
                if j != i and j % i == 0:
                    multiple = 2
                    while (j * multiple) <= l[-1]:
                        if j * multiple in l:
                            triples += 1
                        multiple += 1
        return triples
I've spent quite a lot of time going through examples manually and removing loops over unnecessary sections of the lists, but my solution still takes about a second on a list of 2,000 integers, where the O(n^2) solution I found completes the same list in 0.6 seconds. It seems like a small difference, but it means mine takes 60% longer.
Am I missing a really obvious way of removing one of the loops?
Also, I saw mention of making a directed graph, and I see the promise in that. I can make the list of first nodes from the original list with a built-in function, so in principle I presume I can build the overall graph with two for loops and then return the length of the third node list, but I hit a wall with that too. I just can't seem to make progress without that third loop!
from array import array

def num_triples(l):
    n = len(l)
    lower_counts = array("I", (0 for _ in range(n)))
    upper_counts = lower_counts[:]
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[i] += 1
                upper_counts[j] += 1
    return sum(nx * nz for nz, nx in zip(lower_counts, upper_counts))
Here, lower_counts[i] is the number of pairs of which the ith number is the y, and z is the other number in the pair (i.e. the number of different z values for this y).
Similarly, upper_counts[i] is the number of pairs of which the ith number is the y, and x is the other number in the pair (i.e. the number of different x values for this y).
So the number of triples in which the ith number is the y value is just the product of those two numbers.
The use of an array here for storing the counts is for scalability of access time. Tests show that up to n=2000 it makes negligible difference in practice, and even up to n=20000 it only made about a 1% difference to the run time (compared to using a list), but it could in principle be the fastest growing term for very large n.
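As a quick sanity check (my own example, not from the answer), running it on the list used in the worked example further down gives the expected count:

print(num_triples([1, 2, 3, 4, 5, 6]))  # 3: (1,2,4), (1,2,6), (1,3,6)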
How about using itertools.combinations instead of the nested for loops? Combined with a comprehension it's cleaner, and faster than the equivalent hand-written loops (though note it still examines all O(n^3) triples). Let's say l = [your list of integers] and let's assume it's already sorted.
from itertools import combinations

def div(i, j, k):  # this function has the logic
    return l[k] % l[j] == l[j] % l[i] == 0

# combinations(range(len(l)), 3) already yields index triples with i < j < k
r = sum(div(i, j, k) for i, j, k in combinations(range(len(l)), 3))
@alaniwi provided a very smart iterative solution.
Here is a recursive solution.
def find_magicals(lst, nplet):
    """Find the number of magical n-plets in a given lst"""
    res = 0
    for i, base in enumerate(lst):
        # find all the multiples of the current base
        multiples = [num for num in lst[i + 1:] if not num % base]
        res += len(multiples) if nplet <= 2 else find_magicals(multiples, nplet - 1)
    return res

def solution(lst):
    return find_magicals(lst, 3)
The problem can be divided into: for each choice of base number in the original list (i.e., the x), count how many duplets we can find among the numbers bigger than the base. Since the method for finding all duplets is the same as for finding triplets, we can solve the problem recursively.
From my testing, this recursive solution is comparable to, if not more performant than, the iterative solution.
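As a quick check (my example, not the author's), the two entry points give the expected counts on a small sorted list:

print(solution([1, 2, 3, 4, 5, 6]))          # 3 magic triples
print(find_magicals([1, 2, 3, 4, 5, 6], 2))  # 8 duplets (x, y) with y a multiple of x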
This answer was the first suggestion by @alaniwi and is the one I've found to be the fastest (at 0.59 seconds for a 2,000-integer list).
def solution(l):
    n = len(l)
    lower_counts = dict((val, 0) for val in l)
    upper_counts = lower_counts.copy()
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[lower] += 1
                upper_counts[upper] += 1
    return sum(lower_counts[y] * upper_counts[y] for y in l)
I think I've managed to get my head around it. What it is essentially doing is comparing each number in the list with every other number to see if the larger is divisible by the smaller, and building two dictionaries:
- One with, for each number, the count of larger numbers in the list that it divides (lower_counts),
- One with, for each number, the count of smaller numbers in the list that divide it (upper_counts).
You then multiply the two values for each key; a key having a 0 in either dictionary essentially means that number can never be the middle (y) value of a triple.
Example:

l = [1, 2, 3, 4, 5, 6]
lower_counts = {1: 5, 2: 2, 3: 1, 4: 0, 5: 0, 6: 0}
upper_counts = {1: 0, 2: 1, 3: 1, 4: 2, 5: 1, 6: 3}
triple_tuple = ([1, 2, 4], [1, 2, 6], [1, 3, 6])
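Summing the per-key products (a check I added, not part of the original answer) reproduces the count:

print(sum(lower_counts[y] * upper_counts[y] for y in l))  # 2*1 + 1*1 = 3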

What is an efficient way of counting the number of unique multiplicative and additive pairs in a list of integers in Python?

Given a sorted array A = [n, n+1, n+2, ..., n+k], I am trying to count the unique multiplicative and additive pairs such that the condition x*y >= x + y is satisfied, where x and y are elements taken at distinct indices of the list (the index of y greater than that of x).
Here is my minimum working example using a naive brute force approach:
def minimum_working_example(A):
    A.sort()
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(N):
            if x < y and (A[x] * A[y]) >= (A[x] + A[y]):
                mpairs.append([A[x], A[y]])
            else:
                continue
        x += 1
    return len(mpairs)

A = [1, 2, 3, 4, 5]
print(minimum_working_example(A))
# Output = 6; unique pairs that satisfy x*y >= x + y: (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)
However, this brute-force approach is O(n^2) in the length of the list, which is far too slow for large lists.
What sorting or searching techniques would allow me to implement a more efficient solution?
This question has a closed-form mathematical solution, but if you'd prefer to implement it in a programming language, you just need to find all unique pairs of numbers from your list and count the ones that satisfy your requirement. itertools.combinations is your friend here:
import itertools

A = [1, 2, 3, 4, 5]
pairs = []
for x, y in itertools.combinations(A, 2):
    if x * y >= x + y:
        pairs.append((x, y))
Output
[(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
Basic algebra ... solve for one variable in terms of the other:

xy >= x + y
xy - y >= x
y(x - 1) >= x

Now, if your elements are all positive integers, you get:

if x == 1, no solution (the inequality reduces to 0 >= 1)
if x == 2, y >= 2
else (x > 2), y >= x / (x - 1)

In this last case, x / (x - 1) is a fraction between 1 and 2, so again y >= 2 solves the inequality.
This gives you a trivially accessible solution in O(1) time; if you want the pairs themselves, you're constrained by the printing, which is O(n^2) time.
So, using the fact that x*y >= x + y if both x and y are >= 2 (there was a mistake in my original comment; see @Prune's answer for details), you may as well remove 0 and 1 from your list if they appear, because they won't make any suitable pair.
So now, assuming all remaining numbers are >= 2 and you have k of them (e.g., if a 1 appears in your list, use k - 1 instead of k below), all possible pairs will satisfy your condition. And the number of pairs among k elements is the well-known formula k*(k-1)/2 (google it if you don't know about it). The time to compute this number is essentially the same (one multiplication, one division) no matter what value of k you have (unless you go to crazily big numbers), so the complexity is O(1).
This assumes your integers are positive; if not, the formula will be slightly more complicated, but still possible as a closed-form solution.
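Here is a minimal sketch of that closed form (my illustration; count_pairs is a made-up name), assuming a list of distinct positive integers:

def count_pairs(A):
    # only values >= 2 can appear in a pair with x*y >= x + y
    k = sum(1 for v in A if v >= 2)
    return k * (k - 1) // 2

print(count_pairs([1, 2, 3, 4, 5]))  # 6, matching the brute-force result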
If you want a more mathematical solution, consider that x*y > x + y has no solutions for y = 1 (it would require x > x + 1). Otherwise, you can algebraically work this out to x > y / (y - 1). Now, if we divide one positive integer by the next smaller one, we either get exactly 2 (if y = 2) or some fraction strictly between 1 and 2. Note that x has to be greater than this y / (y - 1) quotient, but (taking x as the smaller element of the pair) also has to be less than y. If y = 2, then the only possible smaller x in our list of positive integers is 1, in which case there are no matches, because 1 is not greater than 2/1.
So this all simplifies to: "For each number y in our list, count all of the values x that are in the range [2, y)." If you do the math for a list of consecutive integers starting at n >= 2, this comes out to adding 1 + 2 + 3 + ... + k, which is simply k(k + 1)/2. Again, we're assuming n and k are positive integers; you can derive a slightly more complicated formula when you take into account cases for n <= 0.
But assuming you DO want to stick with a brute-force approach, and not do a little mathematical reasoning to find a different one: I tried out several variations, and here's a faster solution based on the following.
You said the list is already sorted, so I dropped the sorting call.
Likewise, the "else: continue" isn't necessary, so for simplicity I dropped that.
Instead of looping through all x and y values and then checking x < y, you can make the second loop run over y values in the range from x+1 to N. BUT...
You can use itertools to generate the unique pairs of all numbers in your list A.
If you ultimately only care about the length of the pairs list and not the pairs themselves, you can count the pairs along the way instead of storing them; otherwise you can run out of memory at high N values.
I get slightly faster results with the equivalent test x*(y-1) - y > 0, more so than with x*(y-1) > y too.
So here's what I have:
def example4(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0] * (pair[1] - 1) - pair[1] > 0:
            mpair_count += 1
    return mpair_count
Here's everything timed:
from timeit import default_timer as timer
import itertools

def minimum_working_example(A):
    A.sort()
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(N):
            if x < y and (A[x] * A[y]) >= (A[x] + A[y]):
                mpairs.append([A[x], A[y]])
            else:
                continue
        x += 1
    return len(mpairs)

# Cutting down the range
def example2(A):
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(x + 1, N):
            if (A[x] * A[y]) >= (A[x] + A[y]):
                mpairs.append([A[x], A[y]])
        x += 1
    return len(mpairs)

# Using itertools
def example3(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0] * pair[1] > sum(pair):
            mpair_count += 1
    return mpair_count

# Using itertools and the different comparison
def example4(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0] * (pair[1] - 1) - pair[1] > 0:
            mpair_count += 1
    return mpair_count

# Same as #4, but slightly different
def example5(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0] * (pair[1] - 1) > pair[1]:
            mpair_count += 1
    return mpair_count

A = list(range(1, 5000))  # list() so that A.sort() works on Python 3

for fn in (minimum_working_example, example2, example3, example4, example5):
    start = timer()
    print(fn(A))
    end = timer()
    print(end - start)
Result:
12487503
8.29403018155
12487503
7.81883932384
12487503
3.39669140954
12487503
2.79594281764
12487503
2.92911447083

Optimization of python code for calculating list of squared divisors

I was participating in a python challenge in codewars website. I encountered the following challenge:
Divisors of 42 are : 1, 2, 3, 6, 7, 14, 21, 42. These divisors squared are: 1, 4, 9, 36, 49, 196, 441, 1764. The sum of the squared divisors is 2500 which is 50 * 50, a square!
Given two integers m, n (1 <= m <= n) we want to find all integers between m and n whose sum of squared divisors is itself a square. 42 is such a number.
The result will be an array of arrays, each subarray having two elements, first the number whose squared divisors is a square and then the sum of the squared divisors.
The output should be:
list_squared(1, 250) --> [[1, 1], [42, 2500], [246, 84100]]
list_squared(42, 250) --> [[42, 2500], [246, 84100]]
list_squared(250, 500) --> [[287, 84100]]
I have written the following code with two additional functions: one to determine all factors of a number, and another to check whether a number is a perfect square.
Function to determine all factors:
def fact(m):
    return [i for i in range(1, m + 1) if m % i == 0]
Function to check if a number is a perfect square; it returns the square root if so, otherwise 0:
def square_root(x):
    ans = 0
    while ans < x // 2 + 1:
        ans = ans + 1
        if ans * ans == x:
            return ans
    return 0
Function where the desired result is calculated
def list_squared(m, n):
    # your code
    fac = []
    for i in range(m, n + 1):
        sq_sum = sum(list(map(lambda x: x**2, fact(i))))
        if square_root(sq_sum) != 0:
            fac.append([i, sq_sum])
    return fac
This code gives me the correct result, but it is too slow. I was able to pass all the tests, but it took around 6000 ms; when I attempted to submit the code, the site rejected the algorithm as inefficient because it took more than the 1200 ms maximum.
I would highly appreciate it if anyone could point me to a better algorithm for this.
There are several optimizations to your code but the biggest one is to stop when ans*ans becomes bigger than x here:
def square_root(x):
    ans = 0
    while True:
        ans += 1
        sqans = ans * ans
        if sqans == x:
            return ans
        elif sqans > x:
            return 0
The condition in the while can be removed, since the test is now done on the squared value.
With that optimization, I drop from 8 seconds to 0.07 seconds on the (250, 500) case.
But that's still not satisfactory. A loop like this is still O(sqrt(x)) per call, and even if you can save time, the complexity is too high.
You can do better by simply checking the square of the rounded square root:
def square_root(x):
ans = int(x**0.5 + 0.5) # rounded just in case it goes below the actual value (float inaccuracy)
sqans = ans*ans
return 0 if sqans !=x else x
That divides the execution time by a further 2 (confirmed by "Optimized way to find if a number is a perfect square").
Aside (doesn't speed things up that much, but worth mentioning):
There is no need to convert the map to a list inside sum:
sq_sum = sum(map(lambda x: x**2, fact(i)))
Also, fact could avoid looping all the way up to m. Looping up to m // 2 and then appending m itself is equivalent: no divisor of m other than m itself exceeds m / 2.
def fact(m):
    return [i for i in range(1, m // 2 + 1) if m % i == 0] + [m]
Final edit: this is still slow because of the list comprehension used in fact. I could cut the time drastically by using a generator instead and adding m*m outside it:
def sqfact(m):
    return (i*i for i in range(1, m // 2 + 1) if m % i == 0)
Final code; it now runs so fast that I measure 0 seconds:
def sqfact(m):
    return (i*i for i in range(1, m // 2 + 1) if m % i == 0)

def square_root(x):
    ans = int(x**0.5 + 0.5)
    return 0 if ans * ans != x else x

def list_squared(m, n):
    # your code
    fac = []
    for i in range(m, n + 1):
        sq_sum = sum(sqfact(i)) + i * i  # add i squared outside the generator
        if square_root(sq_sum):
            fac.append([i, sq_sum])
    return fac
I have updated the fact function, which was very inefficient. Rather than iterating up to the full value of m to find its factors, I now only go up to sqrt(m); every divisor above sqrt(m) is recovered as m // i for some divisor i below it. This has reduced the run time immensely. Following is the new code that worked for me.
def fact(m):
    # determine the lower factors, i.e., smaller than sqrt(m)
    fac = [i for i in range(1, int(m**0.5) + 1) if m % i == 0]
    # determine the higher factors, i.e., larger than sqrt(m)
    fac = fac + [m // i for i in range(1, int(m**0.5) + 1) if m % i == 0]
    return sorted(set(fac))  # get rid of duplicate factors
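As a quick check (mine, not the asker's), the updated code still reproduces the expected outputs from the challenge statement:

print(list_squared(1, 250))    # [[1, 1], [42, 2500], [246, 84100]]
print(list_squared(250, 500))  # [[287, 84100]]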

2 lists of positive integers, finding prime number sums - python

Given 2 lists of positive integers, find how many ways you can select a number from each of the lists such that their sum is a prime number.
My code is too slow, as both list1 and list2 contain 50,000 numbers each. Is there any way to make it faster, so it solves this in minutes instead of days? :)
def isprime(n):  # the def line was missing from the original post; restored from context
    # 2 is the only even prime number
    if n == 2: return True
    # all other even numbers are not primes
    if not n & 1: return False
    # the range starts at 3 and only needs to go up
    # to the square root of n, over the odd numbers
    for x in range(3, int(n**0.5) + 1, 2):
        if n % x == 0: return False
    return True

n = 0  # number of ways (initialization implied by the usage below)
for i2 in l2:
    for i1 in l1:
        if isprime(i1 + i2):
            n = n + 1  # increasing the number of ways
            s = "{0:02d}: {1:d}".format(n, i1 + i2)
            print(s)  # printing out
Sketch:
Following @Steve's advice, first figure out all the primes <= max(l1) + max(l2). Let's call that list primes. Note: primes doesn't really need to be a list; you could instead generate the primes up to the max one at a time.
Swap your lists (if necessary) so that l2 is the longer list. Then turn it into a set: l2 = set(l2).
Sort l1 (l1.sort()).
Then:
for p in primes:
    for i in l1:
        diff = p - i
        if diff < 0:
            # assuming there are no negative numbers in l2;
            # since l1 is sorted, all diffs at and beyond this
            # point will be negative
            break
        if diff in l2:
            # print whatever you like
            # at this point, p is a prime, and is the
            # sum of diff (from l2) and i (from l1)
            pass  # (placeholder so this sketch parses)
Alas, if l2 is, for example:

l2 = [2, 3, 100000000000000000000000000000000000000000000000000]

this is impractical. It relies on the fact that, as in your example, max(max(l1), max(l2)) is "reasonably small".
Fleshed out
Hmm! You said in a comment that the numbers in the lists are up to 5 digits long, so they're less than 100,000. And you said at the start that the lists have 50,000 elements each. So each contains about half of all possible integers under 100,000, and you're going to have a very large number of sums that are primes. That's all important if you want to micro-optimize ;-)
Anyway, since the maximum possible sum is less than 200,000, any way of sieving will be fast enough - it will be a trivial part of the runtime. Here's the rest of the code:
def primesum(xs, ys):
    if len(xs) > len(ys):
        xs, ys = ys, xs
    # Now xs is the shorter list.
    xs = sorted(xs)  # don't mutate the input list
    sum_limit = xs[-1] + max(ys)  # largest possible sum
    ys = set(ys)  # make lookups fast
    count = 0
    for p in gen_primes_through(sum_limit):
        for x in xs:
            diff = p - x
            if diff < 0:
                # Since xs is sorted, all diffs at and
                # beyond this point are negative too.
                # Since ys contains no negative integers,
                # no point continuing with this p.
                break
            if diff in ys:
                #print("%s + %s = prime %s" % (x, diff, p))
                count += 1
    return count
I'm not going to supply my gen_primes_through(), because it's irrelevant. Pick one from the other answers, or write your own.
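For completeness, here is one possible gen_primes_through (my sketch, not the answerer's; any Sieve of Eratosthenes works):

def gen_primes_through(limit):
    # yields every prime <= limit, using a plain sieve
    if limit < 2:
        return
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            # mark all multiples of i starting at i*i as composite
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    for i in range(2, limit + 1):
        if sieve[i]:
            yield i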
Here's a convenient way to supply test cases:
from random import sample

xs = sample(range(100000), 50000)
ys = sample(range(100000), 50000)
print(primesum(xs, ys))
Note: I'm using Python 3. If you're using Python 2, use xrange() instead of range().
Across two runs, they each took about 3.5 minutes. That's what you asked for at the start ("minutes instead of days"). Python 2 would probably be faster. The counts returned were:
219,334,097
and
219,457,533
The total number of possible sums is, of course, 50000**2 == 2,500,000,000.
About timing
All the methods discussed here, including your original one, take time proportional to the product of two lists' lengths. All the fiddling is to reduce the constant factor. Here's a huge improvement over your original:
def primesum2(xs, ys):
    sum_limit = max(xs) + max(ys)  # largest possible sum
    count = 0
    primes = set(gen_primes_through(sum_limit))
    for i in xs:
        for j in ys:
            if i + j in primes:
                # print("%s + %s = prime %s" % (i, j, i+j))
                count += 1
    return count
Perhaps you'll understand that one better. Why is it a huge improvement? Because it replaces your expensive isprime(n) function with a blazing fast set lookup. It still takes time proportional to len(xs) * len(ys), but the "constant of proportionality" is slashed by replacing a very expensive inner-loop operation with a very cheap operation.
And, in fact, primesum2() is faster than my primesum() in many cases too. What makes primesum() faster in your specific case is that there are only around 18,000 primes less than 200,000. So iterating over the primes (as primesum() does) goes a lot faster than iterating over a list with 50,000 elements.
A "fast" general-purpose function for this problem would need to pick different methods depending on the inputs.
You should use the Sieve of Eratosthenes to calculate prime numbers.
You are also calculating the prime numbers for each possible combination of sums. Instead, consider finding the maximum value you can achieve with the sum from the lists. Generate a list of all the prime numbers up to that maximum value.
Whilst you are adding up the numbers, you can see if the number appears in your prime number list or not.
I would find the highest number in each range. The range of primes is the sum of the highest numbers.
Here is code to sieve out primes:
def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)
It takes around 3 seconds to find the primes up to 10,000,000. Then I would use the same n^2 algorithm you are using for generating the sums. I think there is an n log n algorithm, but I can't come up with it.
It would look something like this:
from collections import defaultdict

possible = defaultdict(int)
for x in range1:  # range1, range2 are your two input lists
    for y in range2:
        possible[x + y] += 1

def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)

n = max(possible.keys())
primes = eras(n)
possible_primes = set(possible.keys()).intersection(set(primes))
for p in possible_primes:
    print "{0}: {1} possible ways".format(p, possible[p])

Subset sum problem

Recently I became interested in the subset-sum problem, which is finding a zero-sum subset in a superset. I found some solutions on SO; in addition, I came across a particular solution which uses a dynamic programming approach. I translated his solution into Python based on his qualitative descriptions. I'm trying to optimize this for larger lists, as it eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in Python:
import random
from time import time
from itertools import product

time0 = time()

# create a zero matrix of size a (rows) by b (cols)
def create_zero_matrix(a, b):
    return [[0]*b for x in xrange(a)]

# generate a list of size num with random integers between lower and upper bounds
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower, upper+1) for i in range(num)]

# split a list up into N and P, where N is the sum of the negative values
# and P the sum of the positive values.
# 0 does not count because of the additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]

# since the column indexes are in the range from 0 to P - N,
# we retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except:
        return 0

# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value

# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)

# create a zero matrix of size m (rows) by n (cols)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1
table = create_zero_matrix(m, n)

# set the first element at index (0, A[0]) to be true
# Definition: Q(1, s) := (x1 == s). Note that indexing starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)

# iterate through each table element
#for i in xrange(1, m): # row
#    for s in xrange(N, P + 1): # col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1

# find a zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])

print "Solution: ", solution

time1 = time()
print "Time execution: ", time1 - time0
I'm not quite sure if your solution is exact or a poly-time approximation.
But, as someone pointed out, this problem is indeed NP-complete.
Meaning, every known exact algorithm has exponential time behavior in the size of the input.
Meaning, if you can process one operation in 0.1 nanoseconds (10,000,000,000 operations per second), then for a list of 59 elements it will take 2^59 operations, i.e. 2^59 / 10,000,000,000 ≈ 5.8 * 10^7 seconds; dividing by 3600 * 24 * 365 gives roughly 1.8 years.
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other side, if you restrict the problem (to another one) by using bounds on the values of the numbers in the set, then the problem's complexity reduces to polynomial time. But even then, the memory space consumed will be a polynomial of VERY high order.
The memory consumed will be much larger than the few gigabytes you have in RAM,
and even much larger than the few terabytes on your hard drive.
(That's for small values of the bound on the values of the elements in the set.)
That may be the case for your dynamic programming algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You can try a smaller bound - that is, if your input consistently consists of small values.
Good luck!
Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero(activities):
    subsets = {0: []}
    for (activity, cost) in activities.iteritems():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.iteritems():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
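A small usage example (mine; the activity names are made up). activities maps each item to its cost, and the function returns the items of the first zero-sum subset it finds:

activities = {'a': 1, 'b': -3, 'c': 2, 'd': 4}
print(subset_summing_to_zero(activities))  # ['a', 'b', 'c'], since 1 - 3 + 2 == 0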
I spent a few minutes with it and it worked very well.
An interesting article on optimizing Python code is available here. Basically, the main result is that you should inline code in your frequent loops; in your case this would mean that instead of calling get_element twice per iteration, you put the actual code of that function inside the loop, in order to avoid the function-call overhead. A sketch of that follows.
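Here's what that inlining might look like (my rewrite of the question's hot loop, untested against the original; the explicit bounds check replaces get_element's try/except and also avoids Python's negative-index wraparound):

for i, s in product(xrange(1, m), xrange(N, P + 1)):
    col = s - N
    shifted = s - A[i] - N
    # inlined: get_element(table, i - 1, s, N) or A[i] == s
    #          or get_element(table, i - 1, s - A[i], N)
    if table[i - 1][col] or A[i] == s or (0 <= shifted < n and table[i - 1][shifted]):
        table[i][col] = 1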
Hope that helps! Cheers
First thing that caught my eye:
def split_sum(A):
    N_list = 0
    P_list = 0
    for x in A:
        if x < 0:
            N_list += x
        elif x > 0:
            P_list += x
    return [N_list, P_list]
Some advice:
Try to use a 1D list, and use bitarray to reduce the memory footprint to a minimum (http://pypi.python.org/pypi/bitarray) - you would just change the get/set functions. This should reduce your memory footprint by a factor of at least 64 (an integer stored in a list is a pointer to a full integer object, so the factor can be even larger); see the sketch below.
Avoid using try/except; figure out the proper index ranges at the beginning instead. You may find that you gain a lot of speed.
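Here is a minimal sketch of the 1D bitarray idea (assuming the bitarray package; the names are mine, and m, n, N come from the question's code):

from bitarray import bitarray

# one flat bit-table of m rows by n columns instead of a nested list of ints
bits = bitarray(m * n)
bits.setall(False)

def get_bit(bits, i, s):
    # mirrors get_element: out-of-range reads count as False
    col = s - N
    if i < 0 or col < 0 or col >= n:
        return False
    return bits[i * n + col]

def set_bit(bits, i, s):
    bits[i * n + (s - N)] = True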
The following code works for Python 3.3+. I have used the itertools module, which has some great methods to use.
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for i, combo in enumerate(powerset(nums), 1):
    total = 0  # renamed from "sum" so the builtin isn't shadowed
    for num in combo:
        total += int(num)
    if total == inputSum:
        print(combo)
The input/output is as follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Just change the values in your set w and correspondingly make an array x as big as the length of w, then pass the sum for which you want subsets as the last argument to the subsetsum function, and you will be done (if you want to check it with your own values).
def subsetsum(cs, k, r, x, w, d):
    x[k] = 1
    if cs + w[k] == d:
        for i in range(0, k + 1):
            if x[i] == 1:
                print(w[i], end=" ")
        print()
    elif cs + w[k] + w[k + 1] <= d:
        subsetsum(cs + w[k], k + 1, r - w[k], x, w, d)
    if (cs + r - w[k] >= d) and (cs + w[k] <= d):
        x[k] = 0
        subsetsum(cs, k + 1, r - w[k], x, w, d)

# driver for the above code
w = [2, 3, 4, 5, 0]
x = [0, 0, 0, 0, 0]
subsetsum(0, 0, sum(w), x, w, 7)
