I have a list of numbers. I also have a certain sum. The sum is made from a few numbers from my list (I may/may not know how many numbers it's made from). Is there a fast algorithm to get a list of possible numbers? Written in Python would be great, but pseudo-code's good too. (I can't yet read anything other than Python :P )
Example
list = [1,2,3,10]
sum = 12
result = [2,10]
NOTE: I do know of "Algorithm to find which numbers from a list of size n sum to another number", but I cannot read C# and I'm unable to check if it works for my needs. I'm on Linux and I tried using Mono, but I get errors and I can't figure out how to work C# :(
AND I do know of "algorithm to sum up a list of numbers for all combinations", but it seems to be fairly inefficient. I don't need all combinations.
This problem reduces to the 0-1 Knapsack Problem, where you are trying to find a set with an exact sum (this special case is usually called the subset sum problem). The solution depends on the constraints; in the general case this problem is NP-complete.
However, if the maximum search sum (let's call it S) is not too high, then you can solve the problem using dynamic programming. I will explain it using a recursive function and memoization, which is easier to understand than a bottom-up approach.
Let's code a function f(v, i, S) that returns the number of subsets of v[i:] that sum exactly to S. To solve it recursively, first we have to analyze the base case (i.e., v[i:] is empty):
S == 0: The only subset of [] has sum 0, so it is a valid subset. Because of this, the function should return 1.
S != 0: As the only subset of [] has sum 0, there is no valid subset. Because of this, the function should return 0.
Then, let's analyze the recursive case (i.e., v[i:] is not empty). There are two choices: include the number v[i] in the current subset, or not include it. If we include v[i], then we are looking for subsets that have sum S - v[i]; otherwise, we are still looking for subsets with sum S. The function f might be implemented in the following way:
def f(v, i, S):
    if i >= len(v): return 1 if S == 0 else 0
    count = f(v, i + 1, S)
    count += f(v, i + 1, S - v[i])
    return count
v = [1, 2, 3, 10]
sum = 12
print(f(v, 0, sum))
By checking f(v, 0, S) > 0, you can know if there is a solution to your problem. However, this code is too slow: each recursive call spawns two new calls, which leads to an O(2^n) algorithm. Now, we can apply memoization to make it run in time O(n*S) for non-negative integer inputs, which is faster if S is not too big:
def f(v, i, S, memo):
    if i >= len(v): return 1 if S == 0 else 0
    if (i, S) not in memo:  # <-- Check if value has not been calculated.
        count = f(v, i + 1, S, memo)
        count += f(v, i + 1, S - v[i], memo)
        memo[(i, S)] = count  # <-- Memoize calculated result.
    return memo[(i, S)]  # <-- Return memoized value.
v = [1, 2, 3, 10]
sum = 12
memo = dict()
print(f(v, 0, sum, memo))
Now, it is possible to code a function g that returns one subset that sums to S. To do this, it is enough to add elements only if there is at least one solution including them:
def f(v, i, S, memo):
    # ... same as before ...
def g(v, S, memo):
    subset = []
    for i, x in enumerate(v):
        # Check if there is still a solution if we include v[i]
        if f(v, i + 1, S - x, memo) > 0:
            subset.append(x)
            S -= x
    return subset
v = [1, 2, 3, 10]
sum = 12
memo = dict()
if f(v, 0, sum, memo) == 0: print("There are no valid subsets.")
else: print(g(v, sum, memo))
Disclaimer: This solution says there are two subsets of [10, 10] that sum to 10. This is because it treats the first ten as different from the second ten. The algorithm can be fixed to treat both tens as equal (and thus answer one), but that is a bit more complicated.
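Since the last listing elides the body of f, here is a minimal self-contained sketch that puts the memoized counter and the reconstruction step together. The names count_subsets and one_subset are placeholders of mine, not from the answer, and the sketch assumes integer inputs so that (i, S) works as a dictionary key.

```python
def count_subsets(v, i, S, memo):
    """Number of subsets of v[i:] that sum exactly to S (memoized on (i, S))."""
    if i >= len(v):
        return 1 if S == 0 else 0
    if (i, S) not in memo:
        memo[(i, S)] = (count_subsets(v, i + 1, S, memo)
                        + count_subsets(v, i + 1, S - v[i], memo))
    return memo[(i, S)]


def one_subset(v, S):
    """Return one subset of v summing to S, or None if there is none."""
    memo = {}
    if count_subsets(v, 0, S, memo) == 0:
        return None
    subset = []
    for i, x in enumerate(v):
        # Keep x only if the rest of the list can still complete the sum.
        if count_subsets(v, i + 1, S - x, memo) > 0:
            subset.append(x)
            S -= x
    return subset


print(one_subset([1, 2, 3, 10], 12))  # [2, 10]
```

Returning None instead of printing a message is just a design choice for the sketch, so the caller decides what to do when no subset exists.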
I know I'm answering 10 years after you asked this, but I really needed to know how to do this, and the way jbernadas did it was too hard for me, so I googled for an hour and found a Python library, itertools, that gets the job done!
I hope this helps future newbie programmers.
You just have to import the library and use its combinations() function. It is that simple: it returns all the subsets of a given length, ignoring order. I mean:
For the set [1, 2, 3, 4] and a subset of length 3, it will not return [1, 2, 3], [1, 3, 2] and [2, 3, 1] separately; it will return just [1, 2, 3].
As you want ALL the subsets of a set, you can iterate over the lengths:
import itertools
sequence = [1, 2, 3, 4]
# range(len(sequence) + 1) so that the full-length subset is included too
for i in range(len(sequence) + 1):
    for j in itertools.combinations(sequence, i):
        print(j)
The output will be
()
(1,)
(2,)
(3,)
(4,)
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
(1, 2, 3, 4)
Hope this helps!
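To tie this back to the original question, here is a rough sketch that stops at the first combination whose sum matches the target. The function name first_subset_with_sum is made up for illustration. It is still brute force (up to 2^n combinations), so it is only reasonable for short lists.

```python
import itertools


def first_subset_with_sum(numbers, target):
    """Return the first combination (smallest length first) summing to target."""
    for size in range(1, len(numbers) + 1):
        for combo in itertools.combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None  # no combination matches


print(first_subset_with_sum([1, 2, 3, 10], 12))  # [2, 10]
```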
So, the logic is to reverse-sort the numbers. Suppose the list of numbers is l and the sum to be formed is s.
for i in b:
    if(a(round(n-i,2),b[b.index(i)+1:])):
        r.append(i)
        return True
return False
Then we go through this loop, and a number is selected from l in order; let's say it is i.
There are 2 possible cases: either i is part of the sum or it is not.
So, we assume that i is part of the solution, and then the problem reduces to l being l[l.index(i)+1:] and s being s-i. So, if our function is a(s,l), then we call a(s-i, l[l.index(i)+1:]). And if i is not a part of s, then we have to form s from the list l[l.index(i)+1:].
So it is similar in both cases; the only change is that if i is part of s, then s=s-i, and otherwise s stays the same.
Now, to reduce the problem, numbers in l that are greater than s are removed to reduce the complexity. If l becomes empty this way, then the numbers selected so far are not part of our solution and we return False.
if(len(b)==0):
    return False
while(b[0]>n):
    b.remove(b[0])
    if(len(b)==0):
        return False
And in case l has only 1 element left, either it equals s and we return True, or it does not and we return False, and the loop goes on to the next number.
if(b[0]==n):
    r.append(b[0])
    return True
if(len(b)==1):
    return False
Note that in the code I have used b, but b is just our list l (and n is s). I have also rounded wherever possible, so that we do not get a wrong answer due to floating-point calculations in Python.
r=[]
list_of_numbers=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]
list_of_numbers=sorted(list_of_numbers)
list_of_numbers.reverse()
sum_to_be_formed=401.54
def a(n,b):
    global r
    if(len(b)==0):
        return False
    while(b[0]>n):
        b.remove(b[0])
        if(len(b)==0):
            return False
    if(b[0]==n):
        r.append(b[0])
        return True
    if(len(b)==1):
        return False
    for i in b:
        if(a(round(n-i,2),b[b.index(i)+1:])):
            r.append(i)
            return True
    return False
if(a(sum_to_be_formed,list_of_numbers)):
    print(r)
This solution works fast, faster than the one explained above.
However, it works for positive numbers only.
Also, it works well only if there is a solution; otherwise it takes too much time to get out of the loops.
An example run looks like this. Let's say
l=[1,6,7,8,10]
and s=22, i.e. s=1+6+7+8
so it goes through like this:
1.) [10, 8, 7, 6, 1] 22
i.e. 10 is selected to be part of 22, so s=22-10=12 and 10 is removed from l.
2.) [8, 7, 6, 1] 12
i.e. 8 is selected to be part of 12, so s=12-8=4 and 8 is removed from l.
3.) [7, 6, 1] 4
Now 7 and 6 are removed and 1!=4, so it returns False for this execution where 8 is selected.
4.) [6, 1] 5
i.e. 7 is selected to be part of 12, so s=12-7=5 and l becomes [6, 1].
Now 6 is removed and 1!=5, so it returns False for this execution where 7 is selected.
5.) [1] 6
i.e. 6 is selected to be part of 12, so s=12-6=6 and l becomes [1].
Now 1!=6, so it returns False for this execution where 6 is selected.
6.) [] 11
i.e. 1 is selected to be part of 12, so s=12-1=11 and l becomes [].
Now l is empty, so all the cases in which 10 was a part of s are false. So 10 is not a part of s, and we now start with 8 and the same cases follow.
7.) [7, 6, 1] 14
8.) [6, 1] 7
9.) [1] 1
Just to give a comparison, which I ran on my computer (which is not so good):
using
l=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,145.21,123.56,11.90,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]
and
s=2000
my loop ran 1018 times and took 31 ms,
and the previous code's loop ran 3415587 times and took somewhere near 16 seconds.
However, in the case where a solution does not exist, my code ran for more than a few minutes, so I stopped it, while the previous code took only around 17 ms. The previous code also works with negative numbers.
So I think some improvements can be made.
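One possible improvement, offered only as a sketch and not as the code benchmarked above: keep the reverse-sorted backtracking, but also give up on a branch as soon as the numbers still available cannot add up to the remaining sum. That check is what lets the no-solution case bail out early. The name find_subset is hypothetical, and the sketch assumes positive integers (for amounts like 401.54, convert to integer cents first so no rounding is needed).

```python
def find_subset(numbers, target):
    """Return one subset of positive integers summing to target, or None."""
    nums = sorted(numbers, reverse=True)
    # suffix[i] is the sum of nums[i:], used to prune hopeless branches.
    suffix = [0] * (len(nums) + 1)
    for i in range(len(nums) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + nums[i]

    def search(start, remaining):
        if remaining == 0:
            return []
        if suffix[start] < remaining:
            return None  # even taking everything left is not enough
        for i in range(start, len(nums)):
            if nums[i] > remaining:
                continue  # too large to be part of the remaining sum
            rest = search(i + 1, remaining - nums[i])
            if rest is not None:
                return [nums[i]] + rest
        return None

    return search(0, target)


print(find_subset([1, 6, 7, 8, 10], 22))  # [8, 7, 6, 1]
```

The worst case is still exponential; the pruning only cuts branches that are provably dead.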
#!/usr/bin/python2
ylist = [1, 2, 3, 4, 5, 6, 7, 9, 2, 5, 3, -1]
print ylist
target = int(raw_input("enter the target number"))
for i in xrange(len(ylist)):
    sno = target-ylist[i]
    for j in xrange(i+1, len(ylist)):
        if ylist[j] == sno:
            print ylist[i], ylist[j]
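This Python code does what you asked for the two-number case: it prints every pair of numbers (compared by position) whose sum equals the target variable. Because positions are compared, repeated values in the list produce repeated pairs.
If the target number is 8, it will print: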
1 7
2 6
3 5
3 5
5 3
6 2
9 -1
5 3
I have found an answer with run-time complexity O(n) and space complexity of about O(2n), where n is the length of the list.
The answer satisfies the following constraints:
The list can contain duplicates, e.g. [1,1,1,2,3], and you want to find the pairs that sum to 2
The list can contain both positive and negative integers
The code is below, followed by the explanation:
def countPairs(k, a):
    # List a, sum is k
    temp = dict()
    count = 0
    for iter1 in a:
        temp[iter1] = 0
        temp[k-iter1] = 0
    for iter2 in a:
        temp[iter2] += 1
    for iter3 in list(temp.keys()):
        if iter3 == k / 2 and temp[iter3] > 1:
            count += temp[iter3] * (temp[k-iter3] - 1) / 2
        elif iter3 == k / 2 and temp[iter3] <= 1:
            continue
        else:
            count += temp[iter3] * temp[k-iter3] / 2
    return int(count)
1. Create an empty dictionary, iterate through the list, and put all the possible keys in the dict with initial value 0.
Note that the key (k-iter1) is necessary: e.g. if the list contains 1 but not 4, and the sum is 5, then when we look at 1 we would like to find how many 4s we have, but if 4 is not in the dict, the lookup will raise an error.
2. Iterate through the list again, count how many times each integer occurs, and store the results in the dict.
3. Iterate through the dict, this time to find how many pairs we have. We need to consider 3 conditions:
3.1 The key is exactly half of the sum and occurs more than once in the list, e.g. the list is [1,1,1] and the sum is 2. With c occurrences there are c*(c-1)/2 pairs, which is what the code computes.
3.2 The key is exactly half of the sum and occurs only once in the list; we skip this condition.
3.3 For the other cases, where the key is not half of the sum, just multiply its value by the value of the other key, where these two keys sum to the given value. E.g. if the sum is 6, we multiply temp[1] and temp[5], temp[2] and temp[4], etc. Each pair of keys is visited twice, which is why the product is halved. (I didn't list cases where numbers are negative, but the idea is the same.)
The most complex step is step 3, which involves searching the dictionary, but searching a dictionary is usually fast, with nearly constant complexity. (The worst case is O(n), but that should not happen for integer keys.) Thus, assuming the lookups take constant time, the total complexity is O(n), as we only iterate over the list a fixed number of times, one pass after another.
Advice for a better solution is welcomed :)
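One possible simplification, as a sketch rather than a drop-in replacement: collections.Counter does the counting in one step, and integer arithmetic avoids the float division. The name count_pairs is mine. Each unordered pair of keys is visited once by only handling value < other.

```python
from collections import Counter


def count_pairs(k, a):
    """Count index pairs (i < j) in list a with a[i] + a[j] == k."""
    counts = Counter(a)
    total = 0
    for value, c in counts.items():
        other = k - value
        if other == value:
            # Pairs drawn from c equal values: c choose 2.
            total += c * (c - 1) // 2
        elif value < other:
            # Count each unordered pair of distinct values only once.
            total += c * counts.get(other, 0)
    return total


print(count_pairs(2, [1, 1, 1, 2, 3]))  # 3
```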
I have N elements in an array. I can select the first item at most N times, the second item at most N-1 times, and so on.
I have K tokens (called n in the example below) and need to use them so that I get the maximum number of items.
arr = [3, 4, 8], where each array element indicates the tokens required for the i'th item
n = 10, which represents the number of tokens I have
Output:
3
Explanation:
We have 2 options here:
1. option 1: 1st item 2 times for 6 tokens (3*2) and second item once for 4 tokens (4*1)
2. option 2: 1st item 3 times for 9 tokens (3*3)
so maximum we can have 3 items
Code:
def process(arr,n):
    count = 0
    sum = 0
    size = len(arr)+1
    for i in range(0, len(arr), 1):
        size1 = size-1
        size -= 1
        while((sum+arr[i] <= n) and (size1 > 0)):
            size1 = size1 -1
            sum = sum + arr[i]
            count += 1
    return count
But it worked for only a few test cases; it failed for some hidden test cases. I am not sure where I made a mistake. Can anybody help me?
Your greedy approach will fail for test cases like this:
[8,2,1,1] 10
Your code will return 2, but the maximum is 6.
I will use a heap of tuples, i.e. each heap entry is (cost_of_ride, max_no_rides), with the second value stored negated.
See the code below:
from heapq import *
def process(arr,n):
    count = 0
    heap = []
    for i in range(len(arr)):
        heappush(heap,(arr[i],-(len(arr)-i))) # Constructing min-heap with second index as negative of maximum number of rides
    while(n>0 and heap):
        cost,no_of_rides = heappop(heap)
        no_of_rides = -1 * no_of_rides # Changing maximum no_of_rides from negative to positive
        div = n//cost
        # If the remaining tokens cannot pay for all rides at this cost, take as many as we can afford and stop
        if(div<no_of_rides):
            count += div
            break
        # Else decrement the number of tokens by minimum cost * maximum no_of_rides
        else:
            count += no_of_rides
            n -= no_of_rides*cost
    return count
Time Complexity for the solution is: O(len(arr)*lg(len(arr))) or O(N*lg(N)).
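The same cheapest-first idea can also be sketched with a plain sort instead of a heap. This is only an illustration: max_items is a hypothetical name, and it assumes every cost is a positive integer.

```python
def max_items(costs, tokens):
    """Greedy sketch: buy the cheapest items first, respecting each cap."""
    n = len(costs)
    # Item i (0-based) may be taken at most n - i times.
    offers = sorted((cost, n - i) for i, cost in enumerate(costs))
    count = 0
    for cost, cap in offers:
        take = min(cap, tokens // cost)  # as many as allowed and affordable
        count += take
        tokens -= take * cost
    return count


print(max_items([3, 4, 8], 10))     # 3
print(max_items([8, 2, 1, 1], 10))  # 6
```

Sorting costs O(N*lg(N)) as well, so this is a readability trade-off and not a speed-up.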
Try:
def process(arr, n, res=[]):
    l=len(arr)
    for j in range(len(arr)+1):
        r=[arr[0]]*j
        if(sum(r)==n) or (sum(r)<n) and (l==1):
            yield len(res+r)
        elif(sum(r)<n):
            yield from process(arr[1:], n-sum(r), res+r)
        else:
            break
The idea is to iterate over all possible combinations of items. More precisely, the options for an individual item are just that item taken between 0 and N times, where N depends on the item's position, per your logic.
Combinations that exceed n are discarded along the way. The function returns a generator that produces the number of items in every feasible combination, so to answer your question you need to take max(...) of it.
Outputs:
>>> print(max(process([3,4,8],10)))
3
>>> print(max(process([8,2,1,1],10)))
6
>>> print(max(process([10, 8, 6, 4, 2], 30)))
6
@learner, your logic doesn't seem to be working properly.
Please try these inputs: arr = [10, 8, 6, 4, 2], n = 30.
As per your description, the answer should be 6 rides, but your code would produce 3.
Use a modified form of quickselect, where you select the next pivot based on the sum of the products of cost * max_times, but still sort based on just cost. This is worst-case O(n^2), but expected O(n).
I am trying to extract rows from a large numpy array. The columns of the array are obs number, group id (j), time id (t), and some data x_jt.
Here is an example:
import numpy as np
N = 100
T = 100
X = np.vstack((np.array(range(1,N*T+1)),np.repeat(np.array(range(1,N+1)),T), np.tile(np.array(range(1,T+1)),N), np.random.randint(100,size=N*T))).T
If I want to extract all rows from X where group id = 2, I would do
X[np.where(X[:,1] == 2)]
And if I wanted all rows where j = 2 or 3, I could extend that code. However, in my case, I have many group ids (j's) to extract. Specifically, I want to extract all rows where j comes from
samples = np.random.randint(N, size=N) + 1
For example, suppose size = 5 instead of N, and samples = (2,4,5,4,7). What I am after is code that goes through X and selects all rows where j = 2, then j = 4, then j = 5, j = 4, and finally j = 7, and creates a new array with the results. Basically this:
result = []
for j in samples:
    result.extend(X[np.where(X[:,1] == j)])
However, this code is slow when N is large. Do you have any suggestions to speed it up? Thanks!
Without replacement
This could be done with vectorized functions:
def contains(X, samples):
    # np is numpy, imported as in the question
    return np.vectorize(lambda x: x in samples)(X)
result = X[contains(X[:, 1], set(samples)), :]
With replacement
If you want to do this with replacement, just check off one count per sample until there are no more samples (assuming the order does not matter). This way you at least reduce the number of times you need to iterate over the matrix.
import collections
import itertools
result = []
sample_counts = collections.Counter(samples)
while sum(sample_counts.values()):
    # pick up one of each of the remaining samples and chain their rows
    # together in result
    s = set(key for key, value in sample_counts.items() if value)
    result = itertools.chain(result, X[contains(X[:, 1], s), :])
    sample_counts -= collections.Counter(dict.fromkeys(s, 1))
# create a matrix of the final result
result = np.array(list(result))
In that case the only way I can think of that might speed up what you're already doing is preallocating a matrix. So you would do:
It doesn't do exactly what you are describing, but this type of problem is a good candidate for np.in1d. Something like this should work:
result = X[np.in1d(X[:, 1], samples)]
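If the repeats and the order of samples matter (as in the question's loop), one way to avoid scanning X once per sample is to sort the group column once and then look up each sample's block of rows with np.searchsorted. This is a sketch under that assumption; rows_for_samples is a made-up name. As a side note, the NumPy documentation recommends np.isin over np.in1d for new code.

```python
import numpy as np


def rows_for_samples(X, samples, col=1):
    """Rows of X whose column `col` equals each sample, in sample order."""
    samples = np.asarray(samples)
    if samples.size == 0:
        return X[:0]
    # Stable sort keeps the original row order inside each group.
    order = np.argsort(X[:, col], kind="stable")
    sorted_ids = X[order, col]
    starts = np.searchsorted(sorted_ids, samples, side="left")
    stops = np.searchsorted(sorted_ids, samples, side="right")
    idx = np.concatenate([order[a:b] for a, b in zip(starts, stops)])
    return X[idx]


# Small check against the straightforward loop from the question.
N, T = 5, 3
X = np.vstack((np.arange(1, N * T + 1),
               np.repeat(np.arange(1, N + 1), T),
               np.tile(np.arange(1, T + 1), N),
               np.random.randint(100, size=N * T))).T
samples = np.array([2, 4, 5, 4, 2])
expected = np.concatenate([X[X[:, 1] == j] for j in samples])
print(np.array_equal(rows_for_samples(X, samples), expected))  # True
```

The sort is O(M log M) for M rows and is done once; each sample then costs a binary search plus a slice.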
I'm trying to create a program that will create a 10-element array and then assign random values to each element. I then want the program to tell if the array is balanced. By balanced I mean: is there any position in the array at which the sum of the values up to and including that element is equal to the sum of the values in the elements after it?
Example
Element (1,2,3,4) Values (2,1,3,0)
The program would then display that elements 1-2 are balanced with elements 3-4, because both sums equal 3.
So far I have
import random
size = 10
mean = 0
lists = [0] * size
for i in range(size):
    var = random.randint(0,4)
    lists[i] = var
for i in lists:
    mean += i
avg = (mean)/(size)
I figured the only way the elements could be balanced is if the values average is equal to 2, so I figured that's how I should start.
I'd appreciate any help in the right direction.
If I understand the question, the simplest solution is something like this:
def balanced(numbers):
    for pivot in range(len(numbers)):
        left_total = sum(numbers[:pivot])
        right_total = sum(numbers[pivot:])
        if left_total == right_total:
            return pivot
    return None
For example:
>>> numbers = [2, 1, 3, 0]
>>> balanced(numbers)
2
>>> more_numbers = [2, 1, 3, 4]
>>> balanced(more_numbers)
(That didn't print anything, because it returned None, meaning there is no pivot to balance the list around.)
While this is the simplest solution, it's obviously not the most efficient, because you keep adding the same numbers up over and over.
If you think about it, it should be pretty easy to figure out how to keep running totals for left_total and right_total, only calling sum once.
def balanced(numbers):
    left_total, right_total = 0, sum(numbers)
    for pivot, value in enumerate(numbers):
        if left_total == right_total:
            return pivot
        left_total += value
        right_total -= value
    return None
Finally, here's how you can build a program around it:
import random
size = 10
numbers = [random.randint(0, 4) for _ in range(size)]
pivot = balanced(numbers)
if pivot is None:
    print('{} is not balanced'.format(numbers))
else:
    # pivot is a 0-based index: the left part is elements 1..pivot (1-based)
    print('{} is balanced, because elements 1-{} equal {}-{}'.format(
        numbers, pivot, pivot+1, size))
A good data structure to know about for this kind of problem is an array holding the cumulative sum: element[j] - element[i] is the sum of the original series from position i up to (but not including) position j. If the original series is [1, 2, 3, 4], the cumulative series is [0, 1, 3, 6, 10]. The sum of the first i values of the original series is element[i] - element[0]. For this problem we are interested only in sums starting at 0, so this is a bit of overkill, but it is more generally useful for other problems.
Here is code to make a cumulative sum:
def cumulative_sum(series):
    s = [0]
    for element in series:
        s.append(element + s[-1])
    return s
Given that, we can find the pivot point with this code:
def find_pivot(series):
    cs = cumulative_sum(series)
    total = cs[-1]
    even_total = not (total & 1)
    if even_total:
        target = total // 2
        for i, element in enumerate(cs[1:]):
            if element == target:
                return i + 1
    return -1
Notice that it is not necessary to try dividing the series if we know the series sums to an odd number: there cannot be a pivot point then.
Alternatively, you can write find_pivot like this:
def find_pivot(series):
    cs = cumulative_sum(series)
    total = cs[-1]
    even_total = not (total & 1)
    if even_total:
        target = total // 2
        try:
            return cs.index(target)
        except ValueError:
            return -1
    return -1
It has the advantage that the looping is not done explicitly in python but in C code in the standard library.
Trying the code out:
def test():
    for i in range(1, 30):
        test_values = list(range(i))
        j = find_pivot(test_values)
        if j >= 0:
            print("{0} == {1}".format(test_values[:j], test_values[j:]))
And we get this output:
[0] == []
[0, 1, 2] == [3]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] == [15, 16, 17, 18, 19, 20]
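For what it's worth, the cumulative-sum idea can also be sketched with itertools.accumulate from the standard library, here collecting every pivot instead of only the first. The name all_pivots is mine, and the initial=0 argument needs Python 3.8 or later. Like balanced above, it only considers pivots from 0 to len(numbers) - 1.

```python
from itertools import accumulate


def all_pivots(numbers):
    """All p with sum(numbers[:p]) == sum(numbers[p:]), for 0 <= p < len(numbers)."""
    prefix = list(accumulate(numbers, initial=0))  # prefix[p] == sum(numbers[:p])
    total = prefix[-1]
    # left == total - left  is the same as  2 * left == total
    return [p for p, left in enumerate(prefix[:-1]) if 2 * left == total]


print(all_pivots([2, 1, 3, 0]))  # [2]
print(all_pivots([2, 1, 3, 4]))  # []
```

Comparing 2 * left with total sidesteps the separate odd/even check, since an odd total can never equal an even number.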