Delete certain elements of a numpy array - python

I have two numpy arrays a and b. I have a function that constructs an array c whose elements are all the possible sums of different elements of a.
import numpy as np
def Sumarray(a):
    n = len(a)
    sumarray = np.array([0]) # Add a default zero element
    for k in range(2,n+1):
        full = np.mgrid[k*(slice(n),)]
        nd_triu_idx = full[:,(np.diff(full,axis=0)>0).all(axis=0)]
        sumarray = np.append(sumarray, a[nd_triu_idx].sum(axis=0))
    return sumarray
a = np.array([1,2,6,8])
c = Sumarray(a)
print(c)
I then perform a subset sum between an element of c and b: isSubsetSum returns the indices of the elements of b that, when summed, give c[1]. Let's say that I get
c[0] = b[2] + b[3]
Then I want to remove:
the elements b[2], b[3] (easy bit), and
the elements of a that when summed gave c[0]
As you can see from the definition of Sumarray, the order of the sums of different elements of a is preserved, so I need to work out some mapping from a sum back to its terms.
The function isSubsetSum is given by
def _isSubsetSum(numbers, n, x, indices):
    if (x == 0):
        return True
    if (n == 0 and x != 0):
        return False
    # If last element is greater than x, then ignore it
    if (numbers[n - 1] > x):
        return _isSubsetSum(numbers, n - 1, x, indices)
    # else, check if x can be obtained by any of the following
    found = _isSubsetSum(numbers, n - 1, x, indices)
    if found: return True
    indices.insert(0, n - 1)
    found = _isSubsetSum(numbers, n - 1, x - numbers[n - 1], indices)
    if not found: indices.pop(0)
    return found
def isSubsetSum(numbers, x):
    indices = []
    found = _isSubsetSum(numbers, len(numbers), x, indices)
    return indices if found else None

As you are iterating over all possible numbers of terms, you could as well directly generate all possible subsets.
These can be conveniently encoded as numbers 0,1,2,... by means of their binary representations: 0 means no terms at all, 1 means only the first term, 2 means only the second, 3 means the first and the second and so on.
Using this scheme it becomes very easy to recover the terms from the sum index because all we need to do is obtain the binary representation:
UPDATE: we can suppress 1-term-sums with a small amount of extra code:
import numpy as np
def find_all_subsums(a,drop_singletons=False):
    n = len(a)
    assert n<=32 # this gives 4G subsets, and we have to cut somewhere
    # compute the smallest integer type with enough bits
    dt = f"<u{1<<((n-1)>>3).bit_length()}"
    # the numbers 0 to 2^n - 1 encode all possible subsets of an n
    # element set by means of their binary representation:
    # each bit corresponds to one element, and number k represents the
    # subset consisting of all elements whose bit is set in k
    rng = np.arange(1<<n,dtype=dt)
    if drop_singletons:
        # one element subsets correspond to powers of two
        rng = np.delete(rng,1<<np.arange(n))
    # np.unpackbits transforms bytes to their binary representation
    # given a bitvector b we can compute the corresponding subsum
    # as b dot a; to do it in bulk we can multiply the matrix of
    # binary rows with a
    return np.unpackbits(rng[...,None].view('u1'),
                         axis=1,count=n,bitorder='little') @ a
def show_terms(a,idx,drop_singletons=False):
    n = len(a)
    if drop_singletons:
        # we must undo the dropping of powers of two to get an index
        # that is easy to translate. One can check that the following
        # formula does the trick
        idx += (idx+idx.bit_length()).bit_length()
    # now we can simply use the binary representation
    return a[np.unpackbits(np.asarray(idx,dtype='<u8')[None].view('u1'),
                           count=n,bitorder='little').view('?')]
example = np.logspace(1,7,7,base=3)
ss = find_all_subsums(example,True)
# check every single sum
for i,s in enumerate(ss):
    assert show_terms(example,i,True).sum() == s
# print one example
idx = 77
print(ss[idx],"="," + ".join(show_terms(example.astype('U'),idx,True)))
Sample run:
2457.0 = 27.0 + 243.0 + 2187.0
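To connect the binary encoding back to the original question (removing the elements of a that make up a given sum), here is a minimal sketch. It assumes the subset is identified by its binary code as described in this answer; subset_mask is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def subset_mask(code, n):
    """Boolean mask of length n: element k is selected iff bit k of `code` is set."""
    # Shift the code right by 0..n-1 and keep the lowest bit each time.
    return ((code >> np.arange(n)) & 1).astype(bool)

a = np.array([1, 2, 6, 8])
code = 0b1010                      # bits 1 and 3 are set: the terms 2 and 8
mask = subset_mask(code, len(a))

subset_total = int(a[mask].sum())  # the sum encoded by `code`
remaining = a[~mask]               # a with the summed terms removed

print(subset_total, remaining)
```

Because the mask is recovered from the code alone, no separate lookup table from sums to terms is needed.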

Related

Combinatorics 1 to 1 mapping for Power Groups

Problem Description:
I'm working on making a function which gives me a definition for a particular combination of several descriptors based on a single index. My inputs are a set of raw features X = [feat0,feat1,feat2,feat3,feat4], a list of powers to be used pow = [1,2,3], and a list of group sizes sizes = [1,3,5]. A valid output might look like the following:
feat0^2 * feat4^3 * feat1^1
This output is valid because feat0, feat4, and feat1 exist within X, their powers exist within pow, and the number of features being combined is in sizes.
Invalid edge cases include:
values which don't exist in X, powers not in pow, and combination sizes not in sizes
combinations that are identical to another are invalid: feat0^2 * feat1^3 and feat1^3 * feat0^2 are the same
combinations that include multiples of the same feature are invalid: feat0^1 * feat0^3 * feat2^2 is invalid
Under the hood I'm encoding these groupings as lists of tuples. So feat0^2 * feat4^3 * feat1^1 would be represented as [(0,2), (4,3), (1,1)], where the first element in the tuple is the feature index, and the second is the power.
Question:
My question is, how can I create a 1 to 1 mapping of a particular combination to an index i? I would like to get the number of possible combinations, and be able to plug in an integer i to a function, and have that function generate a particular combination. Something like this:
X = [0.123, 0.111, 11, -5]
pow = [1,2,3]
sizes = [1,3]
#getting total number of combinations
numCombos = get_num_combos(X,pow,sizes)
#getting a random index corresponding to a grouping
#(random.randint includes both endpoints, so the upper bound is numCombos - 1)
i = random.randint(0, numCombos - 1)
#getting grouping
grouping = generate_grouping(i, X, pow, sizes)
print(grouping)
Resulting in something like
[(0,1), (1,2), (3,1)]
So far, figuring out the generation when not accounting for the various edge cases wasn't too hard, but I'm at a loss for how to account for edge cases 2 and 3; making it guaranteed that no value of i is algebraically equivalent to any other value of i, and that the same feature does not appear multiple times in a grouping.
Current Progress
import math
import bisect
import numpy as np
#computes the n choose k of a list and a size
def get_num_groupings(n, k):
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
i = 150
n = 5
m = 3
sizes = [1, 3, 5]
#computing the number of elements in each group length
numElements = [m**k * get_num_groupings(n, k) for k in sizes]
#index bins for each group size
bins = list(np.cumsum(numElements))[:-1]
#getting the current group size
#(bisect_right, so that an index equal to a bin boundary lands in the next bin)
binIdx = bisect.bisect_right(bins, i)
curSize = sizes[binIdx]
#adding idx 0 to bins
bins = [0]+bins
#getting the location of i in the bin
z = i - bins[binIdx]
#getting the product index (0 .. m**curSize - 1) and the combination rank
pi = z % m**curSize
ci = z // m**curSize
#getting the indexes of the powers
pidx = [(pi // m**(curSize - (num+1)))%m for num in range(curSize)]
#getting the indexes of the features
#TODO cidx = unrank(i, range(n))
This is based on Mad Physicist's answer, though I haven't figured out how to get cidx yet. Some of the variable names are rewritten for my own understanding. To my knowledge this implementation works by logically separating the combinations of variables and which powers they each have. So far, I can get the powers from an index i, and once unrank is ironed out I should be able to get the indexes for which features are used.
Let's look at a slightly different problem that's closely related to what you want: generate all the possible valid combinations.
If you choose a size and a power, finding all possible combinations of features is fairly straightforward:
from itertools import combinations, product
n = len(X)
m = len(powers)
k = size = ... # e.g. 3
pow = ... # e.g. [1, 2, 3]
The iterator of unique combinations of features is given by
def elements(X, size, pow):
    for x in combinations(X, size):
        yield sum(e**p for p, e in zip(pow, x))
The equivalent one-liner would be
(sum(e**p for p, e in zip(pow, x)) for x in combinations(X, size))
This generator has exactly n choose k unique elements. These elements meet all your conditions by definition.
Now you can loop over all possible sizes and product of powers to get all the options:
def all_features(X, sizes, powers):
    for size in sizes:
        for pow in product(powers, repeat=size):
            for x in combinations(X, size):
                yield sum(e**p for p, e in zip(pow, x))
The total number of elements is the sum for each k of m**k * n choose k.
Now that you've counted the possibilities, you can compute the mapping of element to index and vice versa, using a combinatorial number system. Sample ranking and unranking functions for combinations are shown here. You can use them after you adjust the index for the size and power bins.
To show what I mean, assume you have three functions (given in the linked answer):
choose(n, k) computes n choose k
rank(combo) accepts the ordered indices of a specific combination and returns the rank.
unrank(ind, k) accepts a rank and size, and returns the k indices of the corresponding combination.
You can then compute the offsets of each size group and the step for each power within that group. Let's work through your concrete example with n = 5, m = 3, and sizes = [1, 3, 5].
The number of elements for each size is given by
elements = [m**k * choose(n, k) for k in sizes]
The total number of possible arrangements is sum(elements):
3**1 * choose(5, 1) + 3**3 * choose(5, 3) + 3**5 * choose(5, 5) = 3 * 5 + 27 * 10 + 243 * 1 = 15 + 270 + 243 = 528
The cumulative sum is useful to convert between index and element:
cumsum = [0, 15, 285]
When you get an index, you can check which bin it falls in using bisect.
Let's say you were given index = 55. Since 15 <= 55 < 285, your offset is 15 and size = 3. Within the size = 3 group, you have an offset of z = 55 - 15 = 40.
Within the k = 3 group, there are m**k = 3**3 = 27 power products and choose(n, k) = 10 combinations. The index of the product is pi = z % m**k and the combination rank is ci = z // m**k, so that pi ranges over 0..26 and ci over 0..9.
So the indices of the power are given by
pidx = [(pi // m**(k - 1)) % m, (pi // m**(k - 2)) % m, ...]
Similarly, the indices of the combination are given by
cidx = unrank(ci, k)
You can convert all these indices into a value using something like
sum(X[q]**powers[p] for p, q in zip(pidx, cidx))
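The index arithmetic described in this answer can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the linked answer's code: unrank_combination is a hypothetical lexicographic unranking helper, get_num_combos and generate_grouping take the number of features n rather than the list X, and within each size bin the combination rank is taken as z // m**k and the power-product index as z % m**k. math.comb needs Python 3.8 or later.

```python
from bisect import bisect_right
from itertools import accumulate
from math import comb

def unrank_combination(rank, n, k):
    """Return the rank-th k-combination of range(n) in lexicographic order."""
    combo, x = [], 0
    for remaining in range(k, 0, -1):
        # comb(n - x - 1, remaining - 1) combinations start with element x;
        # skip whole blocks until the one containing `rank` is found.
        while rank >= comb(n - x - 1, remaining - 1):
            rank -= comb(n - x - 1, remaining - 1)
            x += 1
        combo.append(x)
        x += 1
    return combo

def get_num_combos(n, powers, sizes):
    """Total number of valid groupings: sum over k of m**k * C(n, k)."""
    m = len(powers)
    return sum(m**k * comb(n, k) for k in sizes)

def generate_grouping(i, n, powers, sizes):
    """Map index i to a list of (feature index, power) tuples."""
    m = len(powers)
    counts = [m**k * comb(n, k) for k in sizes]
    starts = [0] + list(accumulate(counts))   # first index of each size bin
    if not 0 <= i < starts[-1]:
        raise IndexError("index out of range")
    b = bisect_right(starts, i) - 1           # which size bin i falls into
    k = sizes[b]
    z = i - starts[b]                         # position inside the bin
    ci, pi = divmod(z, m**k)                  # combination rank, power index
    pidx = [(pi // m**(k - 1 - j)) % m for j in range(k)]
    cidx = unrank_combination(ci, n, k)
    return [(f, powers[p]) for f, p in zip(cidx, pidx)]

print(get_num_combos(5, [1, 2, 3], [1, 3, 5]))
print(generate_grouping(55, 5, [1, 2, 3], [1, 3, 5]))
```

Since the feature indices of every grouping come out in increasing order, two different indices can never describe the same product in a different order, and no feature can appear twice.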

How to find the most frequent progressive digit from a list of 4-digits numbers

I am quite new to Python programming. What's an efficient and Pythonic way to find the most frequent progressive digit from a list of 4-digit numbers?
Let's say I have the following list: [6111, 7111, 6112, 6121, 6115, 6123].
The logic is to observe that for the first digit, 6 is the most frequent. I can therefore eliminate the number 7111 from further consideration.
For the second digit I consider the new candidates [6111, 6112, 6121, 6115, 6123] and I observe that the 1 is the most frequent digit and so on.
At the end of the algorithm I'll have just 1 number of the list left.
If there are 2 or more digits with the same number of occurrences, I can either pick the smaller one or a random one among them.
A simple approach could be to convert the list into an Nx4 matrix and find the most frequent digit in each column. This could work, but I find it a very stupid and inefficient way to solve this problem. Can anyone help?
EDIT: my code for this solution (NOTE: THIS CODE DOES NOT ALWAYS WORK, SOMETHING IS WRONG. FOR THE SOLUTION TO THIS PROBLEM PLEASE REFER TO @MadPhysicist's ANSWER)
import numpy as np
import pandas as pd
from collections import Counter
numbers_list = [6111, 7111, 6112, 6121, 6115, 6123]
my_list = []
for number in numbers_list:
    digit_list = []
    for c in str(number):
        digit_list.append(c)
    my_list.append(digit_list)
matrix = np.array(my_list)
matrix0 = matrix
my_counter = Counter(matrix.T[0]).most_common(1)
i=0
for digit0 in matrix.T[0]:
    if digit0 != my_counter[0][0]:
        matrix0 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix0
matrix1 = matrix
my_counter = Counter(matrix.T[1]).most_common(1)
i=0
for digit1 in matrix.T[1]:
    if digit1 != my_counter[0][0]:
        matrix1 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix1
matrix2 = matrix
my_counter = Counter(matrix.T[2]).most_common(1)
i=0
for digit2 in matrix.T[2]:
    if digit2 != my_counter[0][0]:
        matrix2 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix2
matrix3 = matrix
my_counter = Counter(matrix.T[3]).most_common(1)
i=0
for digit3 in matrix.T[3]:
    if digit3 != my_counter[0][0]:
        matrix3 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix3
print (matrix[0])
Your idea of converting to a numpy array is solid. You don't need to split it up-front. A series of masks and histograms will pare down the array fairly quickly.
z = np.array([6111, 7111, 6112, 6121, 6115, 6123])
The nth digits (zero-based) can be obtained with something like
nth = (z // 10**n) % 10
Counting the most frequent one can be accomplished quickly with np.bincount as shown here:
frequentest = np.argmax(np.bincount(nth))
You can select the elements that have that digit in the nth place with simply
mask = nth == frequentest
So now run this in a loop over n (going backwards):
# Input array
z = np.array([6111, 7111, 6112, 6121, 6115, 6123])
# Compute the maximum number of decimal digits in the list.
# You can just manually set this to 4 if you prefer
n = int(np.ceil(np.log10(z + 1).max()))
# Empty output array
output = np.empty(n, dtype=int)
# Loop over the number of digits in reverse.
# In this case, i will be 3, 2, 1, 0.
for i in range(n - 1, -1, -1):
    # Get the ith digit from each element of z
    # The operators //, ** and % are vectorized: they operate
    # on each element of an array to return an array
    ith = (z // 10**i) % 10
    # Count the number of occurrences of each digit in the ith position.
    # Bincount returns an array of at most 10 elements (one per value up to
    # the largest digit present). counts[0] is the number of 0s,
    # counts[1] is the number of 1s, and so on
    counts = np.bincount(ith)
    # argmax finds the index of the maximum element: the digit with the
    # highest count
    output[i] = np.argmax(counts)
    # Trim down the array to numbers that have the requested digit in the
    # right place. ith == output[i] is a boolean mask. It is True where ith
    # is the most common digit and False where it is not. Indexing with such a
    # mask selects the elements at locations that are True.
    z = z[ith == output[i]]
As it happens, np.argmax will return the index of the first maximum count if there are multiple available, meaning that ties are always resolved in favor of the smallest digit.
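The tie-breaking behavior can be checked on a tiny example. The sketch below uses made-up digits (not the list from the question), and the random tie-break at the end is only one possible way to get the "random one among them" variant the question mentions.

```python
import numpy as np

# Digits 1 and 2 are tied with two occurrences each.
last_digits = np.array([1, 2, 5, 2, 1])
counts = np.bincount(last_digits)      # index = digit, value = occurrences
winner = int(np.argmax(counts))        # first index holding the maximum count
print(counts.tolist(), winner)

# Alternative: pick one of the tied digits at random.
tied = np.flatnonzero(counts == counts.max())
rng = np.random.default_rng()
random_winner = int(rng.choice(tied))
```

Here counts is [0, 2, 2, 0, 0, 1], so np.argmax returns 1, the smaller of the two tied digits.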
You can recover the number from output with something like
>>> output
array([1, 1, 1, 6])
>>> (output * 10**np.arange(output.size)).sum()
6111
You can also just get the remaining element of z:
>>> z[0]
6111

Specifying a parameter (x) to be a multidimensional array that has a certain number of columns?

import numpy as np
def validation(x):
    x = np.asarray(x)
    if len(x) != 16:
        return("Card doesn't have exactly 16 digits. Try again")
    values = []
    rwhat = x[::-1] # reverse the order of the credit card numbers
    rwhat
    checkDig = rwhat[0] # the leftmost [originally rightmost] digit which is the checkDigit ... I'm just doing this because it's easier for me to work with
    checkDig
    withCheck = [] # to append later when we add all single digits
    everySec = rwhat[1:16:2] # we don't want to double the checkDigit, but we're extracting every second digit starting from the first, leftmost digit [tho we omit this checkDigit]
    everySec
    def double(num): # to double the extracted second digit values
        return [j * 2 for j in everySec]
    xx = double(everySec)
    xx
    def getSingle(y): # to add the sum of the digits of any of the new doubled numbers which happen to be greater than 9
        u = 0
        while y:
            u += y % 10
            y //= 10
        return u
    yy=list(map(getSingle,xx))
    yy
    withCheck.append(checkDig)
    withCheck
    new_vv = withCheck + yy
    new_vv # now we include the omitted checkDigit into this new list which should all be single digits
    sumDig = sum(new_vv)
    sumDig # now have the sum of the new_vv list.
    def final(f):
        if sumDig % 10 == 0: # if the calculated sum is divisible by 10, then the card is valid.
            return("Valid")
        else:
            return("Invalid")
    go = final(sumDig)
    values.append(go) # basically just appending into values[] for the sake of the validation(x) function, and so we can return something for this function. in this case we'd return values as seen below.
    return values
So I've created this program, and I need to figure out how to define that the first (outermost) function's parameter takes card numbers as a multidimensional array that consists of exactly 16 columns, and should ultimately return a list of values stating either "Valid" or "Invalid".
The stuff inside the def validation(x) works, I've tested it before actually making the said function, but I just don't know how to specify that this function [aka what this program basically is] takes in a multidimensional array of 16 columns.
I'm pretty sure the lines of code regarding if len(x) != 16 is part of the problem, but it works if we just wanted to run one card [aka one set of 16 digits]
For example, if I wanted to try validation([[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5],[1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6]]) I'm plagued with the output: "Card doesn't have exactly 16 digits. Try again" instead of the program properly running and returning a list that states Valid or Invalid for each card respectively.
Besides the basic issues addressed in @JohnZwinck's answer, there is the fundamental fact that you are not using numpy arrays as numpy arrays.
For the program you are writing, there should not be any explicit looping or comprehensions to compute sums or other quantities. Numpy arrays are excellent tools for vectorizing code and simplifying its appearance.
Here are some changes I would recommend in addition to asserting the size of the array:
Assert that all of the numbers are in the range 0-9:
assert np.all((x >= 0) & (x <= 9))
Be careful about whether you are using rows or columns. If you have n rows of 16 columns each, checkDig should be x[:, 0], which is the first column, not x[0], which is the first row, equivalent to x[0, :].
No need to reverse the array: checkDig is just the last element: x[:, -1]; everySec becomes x[:, :-1:2] (with 16 columns, rwhat[1:16:2] picks the original columns 14, 12, ..., 0). There is no need for it to be reversed given how it gets used.
The function double is just a mess:
You declare an unused parameter num.
You then operate on everySec in the enclosing namespace
You apply a list comprehension to a numpy array, which is slower, harder to understand and won't work correctly for 2D arrays.
You can replace it with just xx = everySec * 2, or even get rid of xx and just do everySec *= 2.
getSingle is overkill. You are doubling numbers nine and under, so the result can have no more than 2 digits (whose sum can be no more than 9). yy = (xx // 10) + (xx % 10) should do just fine. By maintaining numpy arrays instead of lists, you can make all the operations work for 2D arrays instead of having to loop over all the individual elements of a list.
The remainder of your operations are a bit unclear. You appear to be implementing the Luhn algorithm, but there is no attempt to add in the non-doubled digits. The non-doubled digits (apart from the check digit) are x[:, 1:-1:2].
Calling builtin sum will prevent you from processing multiple inputs without a loop. Use np.sum, with axis=1 to sum the columns in each row.
values.append(go) is only called once. If you wanted to process multiple numbers, you would have to write some sort of loop. It would be much easier to have go be a boolean array instead of a single boolean value.
Combining all these suggestions yields something like:
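def validation(x):
    x = np.asanyarray(x)
    assert x.ndim == 2, "input must be 2D"
    assert x.shape[1] == 16, "input must have 16 columns"
    assert np.issubdtype(x.dtype, np.integer), "input must be integers"
    assert np.all((x >= 0) & (x <= 9))
    checkDig = x[:, -1]
    # double every second digit, starting with the one left of the check digit
    xx = x[:, :-1:2] * 2
    # replace two-digit results by the sum of their digits
    xx = (xx // 10) + (xx % 10)
    # the digits that are not doubled (the check digit is added separately)
    yy = x[:, 1:-1:2]
    sumDig = np.sum(xx, axis=1) + np.sum(yy, axis=1) + checkDig
    return ['Invalid' if s % 10 else 'Valid' for s in sumDig]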
The function could be further simplified by making a copy of the input to avoid overwriting things, and operating in-place:
def validation(x):
    x = np.array(x, copy=True, subok=True)
    assert x.ndim == 2, "input must be 2D"
    assert x.shape[1] == 16, "input must have 16 columns"
    assert np.issubdtype(x.dtype, np.integer), "input must be integers"
    assert np.all((x >= 0) & (x <= 9))
    # every second column, starting with the one left of the check digit
    y = x[:, :-1:2]
    x[:, :-1:2] = ((2 * y) // 10) + ((2 * y) % 10)
    sumDig = np.sum(x, axis=1)
    return ['Invalid' if s % 10 else 'Valid' for s in sumDig]
You need to inspect the shape. Something like this:
assert len(x.shape) == 2, "input must be 2D"
assert x.shape[1] == 16, "input must have 16 columns"
assert np.issubdtype(x.dtype, np.integer), "input must be integers"
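As a sanity check of the vectorized approach discussed in this thread, here is a minimal self-contained sketch of a Luhn check over a 2D array of digits. The name luhn_valid is hypothetical, and the slices count from the right so the sketch does not depend on the row length. The number 4111 1111 1111 1111 is used only because it is a commonly cited test number that passes the check.

```python
import numpy as np

def luhn_valid(cards):
    """Return a boolean array, one entry per row of digits."""
    x = np.asarray(cards)
    if x.ndim != 2:
        raise ValueError("input must be 2D")
    if not np.issubdtype(x.dtype, np.integer):
        raise ValueError("input must be integers")
    # Every second digit, starting with the one left of the check digit.
    doubled = 2 * x[:, -2::-2]
    # A doubled digit is at most 18, so its digit sum is tens + units.
    doubled = doubled // 10 + doubled % 10
    # The remaining digits, including the check digit itself.
    untouched = x[:, -1::-2]
    total = doubled.sum(axis=1) + untouched.sum(axis=1)
    return total % 10 == 0

cards = [
    [4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
]
result = luhn_valid(cards)
print(["Valid" if ok else "Invalid" for ok in result])
```

Returning a boolean array keeps the function usable for any number of cards; the conversion to "Valid"/"Invalid" strings happens only at the very end.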

Why isn't my implementation O(NlogN)?

I was implementing and testing answers to this SO question -
Given an array of integers find the number of all ordered pairs of elements in the array whose sum lies in a given range [a,b]
The answer with the most upvotes (currently) only provides a text description of an algorithm that should be O(NlogN):
Sort the array... .
For each element x in the array:
Consider the array slice after the element.
Do a binary search on this array slice for [a - x], call it y0. If no exact match is found, consider the closest match bigger than [a - x] as y0.
Output all elements (x, y) from y0 forwards as long as x + y <= b. ... If you only need to count the number of pairs, you can do it in O(nlogn). Modify the above algorithm so [b - x] (or the next smaller element) is also searched for.
My implementation:
import bisect
def ani(arr, a, b):
    # Sort the array (say in increasing order).
    arr.sort()
    count = 0
    for ndx, x in enumerate(arr):
        # Consider the array slice after the element
        after = arr[ndx+1:]
        # Do a binary search on this array slice for [a - x], call it y0
        lower = a - x
        y0 = bisect.bisect_left(after, lower)
        # If you only need to count the number of pairs
        # Modify the ... algorithm so [b - x] ... is also searched for
        upper = b - x
        y1 = bisect.bisect_right(after, upper)
        count += y1 - y0
    return count
When I plot Time versus N or some function of N I am seeing an exponential or N^2 response.
import random
import time
from timeit import Timer
# generate timings
T = list() # run-times
N = range(100, 10001, 100) # N
arr = [random.randint(-10, 10) for _ in range(1000000)]
print('start')
start = time.time()
for n in N:
    arr1 = arr[:n]
    t = Timer('ani(arr1, 5, 16)', 'from __main__ import arr1, ani')
    timing_loops = 100
    T.append(t.timeit(timing_loops) / timing_loops)
Is my implementation incorrect or is the author's claim incorrect?
Here are some plots of the data.
T vs N
T / NlogN vs N - one commenter thought this should NOT produce a linear plot - but it does.
T vs NlogN - I thought this should be linear if the complexity is NlogN but it is not.
If nothing else, this is your error:
for ndx, x in enumerate(arr):
# Consider the array slice after the element
after = arr[ndx+1:]
arr[ndx+1:] creates a copy of the list of length len(arr) - ndx - 1, so each iteration costs O(n) and your loop is O(n^2) overall.
Instead, use the lo and hi arguments of bisect.bisect_left and bisect.bisect_right to search arr in place.
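A minimal sketch of that fix, assuming the same counting logic as the question's ani function; count_pairs_in_range is a hypothetical name, and unlike the original it sorts a copy rather than the caller's list.

```python
import bisect

def count_pairs_in_range(arr, a, b):
    """Count pairs (x, y) from arr with a <= x + y <= b in O(n log n)."""
    arr = sorted(arr)
    n = len(arr)
    count = 0
    for ndx, x in enumerate(arr):
        # Search only arr[ndx+1:n], but without copying it: lo and hi
        # restrict the binary search to that index range.
        y0 = bisect.bisect_left(arr, a - x, ndx + 1, n)
        y1 = bisect.bisect_right(arr, b - x, ndx + 1, n)
        count += y1 - y0
    return count

print(count_pairs_in_range([4, 1, 3, 2], 3, 5))
```

Each iteration now does two binary searches and no slicing, so the loop costs O(n log n) on top of the sort.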

Subset sum Problem

Recently I became interested in the subset-sum problem, which is finding a zero-sum subset in a superset. I found some solutions on SO; in addition, I came across a particular solution which uses the dynamic programming approach. I translated that solution into Python based on the author's qualitative descriptions. I'm trying to optimize this for larger lists, which eat up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in Python:
import random
from time import time
from itertools import product
time0 = time()
# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
    return [[0]*b for x in range(a)]
# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower,upper+1) for i in range(num)]
# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]
# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except IndexError:
        return 0
# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value
# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)
# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1
table = create_zero_matrix(m, n)
# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)
# iterate through each table element
#for i in range(1, m): #row
#    for s in range(N, P + 1): #col
for i, s in product(range(1, m), range(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1
# find zero-sum subset solution
s = 0
solution = []
for i in reversed(range(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])
print("Solution: ", solution)
time1 = time()
print("Time execution: ", time1 - time0)
I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-complete.
Meaning, every known (exact) algorithm has exponential time behavior in the size of the input.
Meaning, if you can process 10,000,000,000 operations per second (one operation every 0.1 nanoseconds), then for a list of 59 elements it will take:
2^59 ops --> 2^59 / 10,000,000,000 seconds (about 2^26 seconds) --> 2^26 / (3600 x 24 x 365) years --> about 2 years
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other hand, if you restrict the problem by bounding the values of the numbers in the set, then the complexity reduces to polynomial time. But even then the memory consumed will be a polynomial of VERY high order.
The memory consumed will be much larger than the few gigabytes you have in memory.
And even much larger than the few terabytes on your hard drive.
(That's for small values of the bound on the elements in the set.)
Maybe this is the case for your dynamic programming algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You can try a smaller bound. That is... if your input consistently consists of small values.
Good Luck!
Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero (activities):
    subsets = {0: []}
    for (activity, cost) in activities.items():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.items():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I spent a few minutes with it and it worked very well.
An interesting article on optimizing python code is available here. Basically the main result is that you should inline your frequent loops, so in your case this would mean instead of calling get_element twice per loop, put the actual code of that function inside the loop in order to avoid the function call overhead.
Hope that helps! Cheers
The first thing that catches the eye: split_sum can accumulate the two sums directly instead of building lists:
def split_sum(A):
    N_list = 0
    P_list = 0
    for x in A:
        if x < 0:
            N_list+=x
        elif x > 0:
            P_list+=x
    return [N_list, P_list]
Some advice:
Try to use a 1D list and use bitarray to reduce the memory footprint to a minimum (http://pypi.python.org/pypi/bitarray), so you will only have to change the get / set functions. This should reduce your memory footprint by a factor of at least 64 (an integer in a list is a pointer to an integer object with type information, so the factor can be as high as 3*32).
Avoid using try/except; instead, figure out the proper ranges at the beginning. You might find that you gain a huge speedup.
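The memory advice can be illustrated with a compact sketch. Note the swap: instead of the third-party bitarray package mentioned above, it uses a single Python integer as the bit set, and it only answers whether a non-empty zero-sum subset exists (it does not recover the subset). The name has_zero_sum_subset is hypothetical.

```python
def has_zero_sum_subset(values):
    """True if some non-empty subset of `values` sums to zero."""
    # Offset all sums by the magnitude of the most negative possible sum,
    # so that every reachable sum maps to a non-negative bit position.
    offset = -sum(v for v in values if v < 0)
    reachable = 0  # bit (s + offset) is set <=> s is a reachable subset sum
    for v in values:
        # Adding v to every reachable sum is a shift of the whole bit set.
        shifted = reachable << v if v >= 0 else reachable >> -v
        # Keep the old sums, add the shifted ones, and add {v} itself.
        reachable |= shifted | (1 << (v + offset))
    return bool((reachable >> offset) & 1)

print(has_zero_sum_subset([1, -3, 2, 4]))   # 1 + 2 - 3 == 0
print(has_zero_sum_subset([5, -2, -4]))
```

Only one row of the table is kept, at one bit per possible sum, and the range handling is done once up front through the offset, so no try/except is needed.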
The following code works for Python 3.3+ , I have used the itertools module in Python that has some great methods to use.
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for combo in powerset(nums):
    # named total to avoid shadowing the builtin sum
    total = 0
    for num in combo:
        total += int(num)
    if total == inputSum:
        print(combo)
The Input Output is as Follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Just change the values in your set w and make an array x of the same length as w, then pass the sum for which you want subsets as the last argument of the subsetsum function and you will be done (if you want to check with your own values).
def subsetsum(cs,k,r,x,w,d):
    x[k]=1
    if(cs+w[k]==d):
        for i in range(0,k+1):
            if x[i]==1:
                print (w[i],end=" ")
        print()
    elif cs+w[k]+w[k+1]<=d :
        subsetsum(cs+w[k],k+1,r-w[k],x,w,d)
    if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
        x[k]=0
        subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]
subsetsum(0,0,sum(w),x,w,7)
