Math formula behind this find_single_in_triplets() function?

I came across this interesting function, which finds the only number that appears once in a list of integers; every other number is positive and appears exactly three times.
It works fine, as the example below shows.
However, I cannot figure out what math formula it is derived from. I hope someone can shed some light on this puzzle.
def find_single_in_triplets(L):
    orig_sum = sum(L)
    set_sum = sum(set(L))
    return (set_sum * 3 - orig_sum) // 2  # gives the single number
find_single_in_triplets([1, 2, 3, 4, 6, 2, 3, 4, 1, 3, 2, 1, 4])  # -> 6

This is more of a math problem really, but the reasoning is pretty simple.
Take your list of numbers L that contains this one unknown number x.
If we add 2 extra x's so that all numbers appear 3 times, then the sum sum(L) + 2*x will of course be equal to sum(set(L))*3.
Thus sum(set(L))*3 - sum(L) = 2*x. Just divide by 2 and you are done.
Of course, this works not only for triplets; we can generalize to any group size n:
def find_single_in_n(L, n):
    orig_sum = sum(L)
    set_sum = sum(set(L))
    return (set_sum * n - orig_sum) // (n-1)
find_single_in_n([1,1,1,1,2,3,3,3,3], 4)  # -> 2
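To convince yourself that the formula holds, it may help to compare it with a plain counting approach. The sketch below is only a sanity check, not part of the original function: find_single_by_counting is a hypothetical helper name, and it assumes exactly one value occurs once.

```python
from collections import Counter

def find_single_in_n(L, n):
    # sum(set(L)) * n is what the total would be if every value appeared n times.
    # The real total is short by (n - 1) copies of the single value.
    return (sum(set(L)) * n - sum(L)) // (n - 1)

def find_single_by_counting(L):
    # Hypothetical cross-check: return the value that occurs exactly once.
    counts = Counter(L)
    return next(value for value, c in counts.items() if c == 1)

data = [1, 2, 3, 4, 6, 2, 3, 4, 1, 3, 2, 1, 4]
print(find_single_in_n(data, 3))      # 6
print(find_single_by_counting(data))  # 6
```

Both approaches agree on this input; the sum-based one avoids building a dictionary of counts.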

Related

How can I get a sum from some elements of a list? [duplicate]

I have a list of numbers. I also have a certain sum. The sum is made from a few numbers from my list (I may/may not know how many numbers it's made from). Is there a fast algorithm to get a list of possible numbers? Written in Python would be great, but pseudo-code's good too. (I can't yet read anything other than Python :P )
Example
list = [1,2,3,10]
sum = 12
result = [2,10]
NOTE: I do know of "Algorithm to find which numbers from a list of size n sum to another number", but I cannot read C# and I'm unable to check whether it works for my needs. I'm on Linux and I tried using Mono, but I get errors and I can't figure out how to work with C#. :(
AND I do know of "algorithm to sum up a list of numbers for all combinations", but it seems to be fairly inefficient. I don't need all combinations.

This problem reduces to the 0-1 Knapsack Problem, where you are trying to find a set with an exact sum. The solution depends on the constraints; in the general case this problem is NP-complete.
However, if the maximum search sum (let's call it S) is not too high, then you can solve the problem using dynamic programming. I will explain it using a recursive function and memoization, which is easier to understand than a bottom-up approach.
Let's code a function f(v, i, S) that returns the number of subsets of v[i:] that sum exactly to S. To solve it recursively, first we have to analyze the base case (i.e. v[i:] is empty):
S == 0: The only subset of [] has sum 0, so it is a valid subset. Because of this, the function should return 1.
S != 0: As the only subset of [] has sum 0, there is no valid subset. Because of this, the function should return 0.
Then, let's analyze the recursive case (i.e. v[i:] is not empty). There are two choices: include the number v[i] in the current subset, or not include it. If we include v[i], then we are looking for subsets that have sum S - v[i]; otherwise, we are still looking for subsets with sum S. The function f might be implemented in the following way:
def f(v, i, S):
    if i >= len(v): return 1 if S == 0 else 0
    count = f(v, i + 1, S)
    count += f(v, i + 1, S - v[i])
    return count
v = [1, 2, 3, 10]
sum = 12
print(f(v, 0, sum))
By checking f(v, 0, S) > 0, you can tell whether your problem has a solution. However, this code is too slow: each recursive call spawns two new calls, which leads to an O(2^n) algorithm. We can apply memoization to make it run in O(n*S) time, which is faster if S is not too big:
def f(v, i, S, memo):
    if i >= len(v): return 1 if S == 0 else 0
    if (i, S) not in memo:  # <-- Check if value has not been calculated.
        count = f(v, i + 1, S, memo)
        count += f(v, i + 1, S - v[i], memo)
        memo[(i, S)] = count  # <-- Memoize calculated result.
    return memo[(i, S)]  # <-- Return memoized value.
v = [1, 2, 3, 10]
sum = 12
memo = dict()
print(f(v, 0, sum, memo))
Now, it is possible to code a function g that returns one subset that sums to S. To do this, it is enough to add an element only if there is at least one solution that includes it:
def f(v, i, S, memo):
    # ... same as before ...
def g(v, S, memo):
    subset = []
    for i, x in enumerate(v):
        # Check if there is still a solution if we include v[i]
        if f(v, i + 1, S - x, memo) > 0:
            subset.append(x)
            S -= x
    return subset
v = [1, 2, 3, 10]
sum = 12
memo = dict()
if f(v, 0, sum, memo) == 0: print("There are no valid subsets.")
else: print(g(v, sum, memo))
Disclaimer: This solution says there are two subsets of [10, 10] that sum to 10. This is because it treats the first ten as different from the second ten. The algorithm can be fixed to treat both tens as equal (and thus answer one), but that is a bit more complicated.
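For reference, here is one way the counting function and the subset reconstruction could be combined into a single self-contained sketch. It swaps the explicit memo dictionary for functools.lru_cache; the names find_subset and count are made up for this example and are not from the answer above.

```python
from functools import lru_cache

def find_subset(values, target):
    """Return one subset of values that sums to target, or None."""
    values = tuple(values)

    @lru_cache(maxsize=None)
    def count(i, remaining):
        # Number of subsets of values[i:] that sum exactly to remaining.
        if i >= len(values):
            return 1 if remaining == 0 else 0
        return count(i + 1, remaining) + count(i + 1, remaining - values[i])

    if count(0, target) == 0:
        return None
    subset = []
    for i, x in enumerate(values):
        # Take x only if the remaining elements can still reach the reduced target.
        if count(i + 1, target - x) > 0:
            subset.append(x)
            target -= x
    return subset

print(find_subset([1, 2, 3, 10], 12))  # [2, 10]
print(find_subset([1, 2, 3, 10], 9))   # None
```

Like the original, this is recursive, so very long lists may hit Python's recursion limit.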

I know I'm answering 10 years after you asked this, but I really needed to know how to do this, and the way jbernadas did it was too hard for me, so I googled it for an hour and found that the Python library itertools gets the job done!
I hope this helps future newbie programmers.
You just have to import the library and use the combinations() function. It is that simple: it returns all the subsets of a given length, ignoring order. I mean:
For the set [1, 2, 3, 4] and a subset length of 3 it will not return [1, 2, 3], [1, 3, 2] and [2, 3, 1]; it will return just [1, 2, 3].
As you want ALL the subsets of a set, you can iterate over every length:
import itertools
sequence = [1, 2, 3, 4]
for i in range(len(sequence) + 1):  # + 1 so the full set is included too
    for j in itertools.combinations(sequence, i):
        print(j)
The output will be
()
(1,)
(2,)
(3,)
(4,)
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
(1, 2, 3, 4)
Hope this helps!
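To connect this back to the original question, the combinations can be filtered by their sum. This is only a brute-force sketch (it may examine all 2^n subsets), and subsets_with_sum is a name invented for the example:

```python
from itertools import combinations

def subsets_with_sum(numbers, target):
    """Yield every combination of numbers whose sum equals target."""
    for size in range(len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                yield combo

# Stop at the first match instead of generating all of them.
print(next(subsets_with_sum([1, 2, 3, 10], 12), None))  # (2, 10)
```

Because it is a generator, you can stop after the first hit, which matters when you do not need all combinations.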

So, the logic is to sort the numbers in descending order. Suppose the list of numbers is l and the sum to be formed is s.
for i in b:
    if(a(round(n-i,2),b[b.index(i)+1:])):
        r.append(i)
        return True
return False
Then we go through this loop, and a number is selected from l in order; let's say it is i.
There are 2 possible cases: either i is part of the sum or it is not.
So, we assume that i is part of the solution, and then the problem reduces to l being l[l.index(i)+1:] and s being s-i. So if our function is a(s, l), then we call a(s-i, l[l.index(i)+1:]). And if i is not part of s, then we have to form s from the list l[l.index(i)+1:].
So both cases are similar; the only difference is that if i is part of s, then s becomes s-i, and otherwise s stays the same.
Now, to reduce the problem, we remove any numbers in l that are greater than s, which reduces the complexity. If l becomes empty, the numbers selected so far are not part of our solution and we return False.
if(len(b)==0):
    return False
while(b[0]>n):
    b.remove(b[0])
    if(len(b)==0):
        return False
And if l has only 1 element left, then either it equals s and we return True, or it does not and we return False, and the loop goes on to the next number.
if(b[0]==n):
    r.append(b[0])
    return True
if(len(b)==1):
    return False
Note that in the code I have used b, but b is just our list l (and n is the sum s). I have also rounded wherever possible, so that we do not get a wrong answer due to floating-point calculations in Python.
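The rounding matters because binary floats cannot represent most decimal fractions exactly. A tiny illustration, using the classic 0.1 + 0.2 case rather than numbers from this answer; the decimal module is shown only as one possible alternative to rounding:

```python
from decimal import Decimal

print(0.1 + 0.2 == 0.3)                                   # False
print(round(0.1 + 0.2, 2) == 0.3)                         # True
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```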
r=[]
list_of_numbers=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]
list_of_numbers=sorted(list_of_numbers)
list_of_numbers.reverse()
sum_to_be_formed=401.54
def a(n,b):
    global r
    if(len(b)==0):
        return False
    while(b[0]>n):
        b.remove(b[0])
        if(len(b)==0):
            return False
    if(b[0]==n):
        r.append(b[0])
        return True
    if(len(b)==1):
        return False
    for i in b:
        if(a(round(n-i,2),b[b.index(i)+1:])):
            r.append(i)
            return True
    return False
if(a(sum_to_be_formed,list_of_numbers)):
    print(r)
This solution works fast, faster than the one explained above.
However, it works for positive numbers only.
Also, it works well only if a solution exists; otherwise it takes too much time to get out of the loops.
An example run goes like this. Let's say
l=[1,6,7,8,10]
and s=22, i.e. s=1+6+7+8.
So it goes through like this:
1.) [10, 8, 7, 6, 1] 22
i.e. 10 is selected to be part of 22, so s=22-10=12 and l=l.remove(10)
2.) [8, 7, 6, 1] 12
i.e. 8 is selected to be part of 12, so s=12-8=4 and l=l.remove(8)
3.) [7, 6, 1] 4
Now 7 and 6 are removed and 1!=4, so it will return False for this execution where 8 is selected.
4.) [6, 1] 5
i.e. 7 is selected to be part of 12, so s=12-7=5 and l=l.remove(7)
Now 6 is removed and 1!=5, so it will return False for this execution where 7 is selected.
5.) [1] 6
i.e. 6 is selected to be part of 12, so s=12-6=6 and l=l.remove(6)
Now 1!=6, so it will return False for this execution where 6 is selected.
6.) [] 11
i.e. 1 is selected to be part of 12, so s=12-1=11 and l=l.remove(1)
Now l is empty, so all the cases in which 10 was part of s are false; 10 is therefore not part of s, and we now start with 8 and the same cases follow.
7.) [7, 6, 1] 14
8.) [6, 1] 7
9.) [1] 1
Just to give a comparison, which I ran on my computer (which is not so good),
using
l=[61.12,13.11,100.12,12.32,200,60.00,145.34,14.22,100.21,14.77,214.35,145.21,123.56,11.90,200.32,65.43,0.49,132.13,143.21,156.34,11.32,12.34,15.67,17.89,21.23,14.21,12,122,134]
and
s=2000
my loop ran 1018 times and took 31 ms,
and the previous code's loop ran 3415587 times and took somewhere near 16 seconds.
However, when a solution does not exist, my code ran for more than a few minutes, so I stopped it, whereas the previous code ran in around 17 ms. The previous code also works with negative numbers.
So I think some improvements can be made.

#!/usr/bin/python2
ylist = [1, 2, 3, 4, 5, 6, 7, 9, 2, 5, 3, -1]
print ylist
target = int(raw_input("enter the target number"))
for i in xrange(len(ylist)):
    sno = target-ylist[i]
    for j in xrange(i+1, len(ylist)):
        if ylist[j] == sno:
            print ylist[i], ylist[j]
This Python 2 code does what you asked: it prints every pair of numbers whose sum is equal to the target variable. Pairs are formed by position, so repeated values in the list produce repeated pairs.
If the target number is 8, it will print:
1 7
2 6
3 5
3 5
5 3
6 2
9 -1
5 3
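If repeated pairs such as "3 5" and "5 3" are not wanted, one option is to remember the values seen so far in a set and normalise each pair. The following is a Python 3 sketch under that assumption; unique_pairs is a made-up name, not part of the answer above:

```python
def unique_pairs(numbers, target):
    """Return each distinct pair of values that sums to target, once."""
    seen = set()
    pairs = set()
    for value in numbers:
        partner = target - value
        if partner in seen:
            # Store the pair smallest-first so (3, 5) and (5, 3) collapse.
            pairs.add((min(value, partner), max(value, partner)))
        seen.add(value)
    return sorted(pairs)

print(unique_pairs([1, 2, 3, 4, 5, 6, 7, 9, 2, 5, 3, -1], 8))
# [(-1, 9), (1, 7), (2, 6), (3, 5)]
```

This makes a single pass over the list instead of the nested loops.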

I have found an answer which has run-time complexity O(n) and space complexity of about O(2n), where n is the length of the list.
The answer satisfies the following constraints:
The list can contain duplicates, e.g. [1,1,1,2,3] when you want to find pairs that sum to 2
The list can contain both positive and negative integers
The code is below, followed by the explanation:
def countPairs(k, a):
    # List a, sum is k
    temp = dict()
    count = 0
    for iter1 in a:
        temp[iter1] = 0
        temp[k-iter1] = 0
    for iter2 in a:
        temp[iter2] += 1
    for iter3 in list(temp.keys()):
        if iter3 == k / 2 and temp[iter3] > 1:
            count += temp[iter3] * (temp[k-iter3] - 1) / 2
        elif iter3 == k / 2 and temp[iter3] <= 1:
            continue
        else:
            count += temp[iter3] * temp[k-iter3] / 2
    return int(count)
1. Create an empty dictionary, iterate through the list, and put all the possible keys in the dict with initial value 0.
Note that the key (k-iter1) must also be added. E.g. if the list contains 1 but not 4, and the sum is 5, then when we look at 1 we want to know how many 4s we have; if 4 is not in the dict, the lookup raises an error.
2. Iterate through the list again, count how many times each integer occurs, and store the results in the dict.
3. Iterate through the dict, this time to find how many pairs we have. We need to consider 3 conditions:
3.1 The key is exactly half of the sum and occurs more than once in the list, e.g. the list is [1,1,1] and the sum is 2. With c occurrences there are c*(c-1)/2 such pairs, which is what the code computes.
3.2 The key is exactly half of the sum and occurs only once in the list; we skip this condition.
3.3 For the other cases, where the key is not half of the sum, multiply its value by the value of the other key, where the two keys sum to the given value. E.g. if the sum is 6, we multiply temp[1] and temp[5], temp[2] and temp[4], etc. Each such pair of keys is visited twice, hence the division by 2 in the code. (I didn't list cases where numbers are negative, but the idea is the same.)
The most complex step is step 3, which involves searching the dictionary, but dictionary lookups are usually fast, with nearly constant complexity. (The worst case is O(n), but that should not happen for integer keys.) Thus, assuming lookups take constant time, the total complexity is O(n), as we only iterate over the list a fixed number of times, separately.
Advice for a better solution is welcome :)
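One possible simplification, offered as a sketch rather than a definitive improvement: collections.Counter does the counting step, and visiting each key pair only once (partner > value) removes the need to halve the result. The name count_pairs is hypothetical.

```python
from collections import Counter

def count_pairs(k, a):
    """Count index pairs (i < j) in list a with a[i] + a[j] == k."""
    counts = Counter(a)
    total = 0
    for value, c in counts.items():
        partner = k - value
        if partner == value:
            # Choose 2 of the c equal elements.
            total += c * (c - 1) // 2
        elif partner > value and partner in counts:
            # Visit each unordered key pair once, from its smaller key.
            total += c * counts[partner]
    return total

print(count_pairs(2, [1, 1, 1, 2, 3]))  # 3
print(count_pairs(5, [1, 4, 4, 2, 3]))  # 3
```

It stays in integer arithmetic throughout, so no final int() conversion is needed.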

Incorrect number n of *args (Non-Keyword Arguments) at function in for cycle

I'm a noob trying to complete some tasks and ran into a problem.
I have to calculate the arithmetic mean (I hope that's the correct term) in a function with *args (non-keyword arguments).
So I have this:
def avsum(*numbers):
    summ = 0
    print('Numbers', numbers)
    for n in numbers:
        summ += n
    print('Calc', summ)
    print('n', n)
    print('Numbers', numbers)
    result = summ / (n - 1)
    return result
print(avsum(2, 3, 4))
All those prints are just for checking and understanding what happens.
The terminal shows this:
Numbers (2, 3, 4)
Calc 9
n 4
Numbers (2, 3, 4)
3.0
As I learned, *numbers are non-keyword arguments and they form a tuple.
So in the tuple numbers I have 3 elements (Numbers (2, 3, 4)), but n is 4 for some reason.
I've computed the arithmetic mean as sum / (n - 1), but it looks like a weird solution.
Any ideas why that happened? Why is n not 3?
Thanks for the reply.
UPDATE
Thanks for the reply. Sorry, maybe I'm stupid, but I really can't understand how to make code in comments here 'readable'. As far as I understand, they cannot be multi-line, so I'm adding it to the question.
So I had a similar task:
array = [1, 2, 3, 4, 5]
calc = 0
for n in array:
    calc += n
print("Sum", calc)
print("Arithmetic mean", calc / n)
print(n)
And terminal shows this
Sum 15
Arithmetic mean 3.0
5
So in this case there is no such error as you are talking about: the array has 5 elements and n is 5. Why? Both cases use a for loop.

The name n in the statement for n in numbers: is the loop variable. It takes each element of the tuple in turn, and after the loop finishes it still holds the last element (4 here), not the number of elements. That is why summ / (n - 1) only gives the right answer by coincidence. To correct this, divide by the length of the tuple:
result = summ/len(numbers)
The same applies to your second example: n is 5 only because the last element of the array happens to equal its length. Given that your input is now named array:
array = [1, 2, 3, 4, 5]
calc = 0
for n in array:
    calc += n
print("Sum", calc)
print("Arithmetic mean", calc / len(array))
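A short sketch of the same idea applied to the *args version, using inputs where the last element clearly differs from the count. The empty-input check is an assumption added for the example, not something the question asked for:

```python
def avsum(*numbers):
    """Arithmetic mean of the positional arguments."""
    if not numbers:
        raise ValueError("avsum() needs at least one number")
    return sum(numbers) / len(numbers)

print(avsum(2, 3, 4))     # 3.0
print(avsum(10, 20, 30))  # 20.0

# The loop variable keeps the last element after the loop ends:
for n in (10, 20, 30):
    pass
print(n)  # 30, not 3
```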

Optimizing a factorial function in python

So I have written this function with an unpacking parameter (*x), but I want it to display the result rather than return it, and I want it well optimized, meaning it still needs to be a two-line function.
def fac(*x):
    return (fac(list(x)[0], list(x)[1] - 1)*list(x)[1]) if list(x)[1] > 0 else 1  # here I need the one line to print the factorial
I tried achieving this with a lambda, but I didn't know how to pass the *x parameter.

Your factorial lambda is correct. I take it that you would like to calculate the factorials for a list, say [1, 2, 3], and output the results; this is how you can achieve that.
fact = lambda x: x*fact(x-1) if x > 0 else 1
print(*[fact(i) for i in [1, 2, 3]])
Which will output: 1 2 6
Another option, if you have Python 3.8, is to use a list comprehension with the new walrus operator (:=). This is a bit more tricky, but it will calculate and output all factorials up to n inclusive whilst still fitting in your required two lines.
fac, n = 1, 5
print(*[fac for i in range(1, n+1) if (fac := fac*i)])
Which will output: 1 2 6 24 120

The function I have created below computes and displays the factorial.
from functools import reduce
def fact(n):
    list_fact = []
    if n > 1 and n not in list_fact:
        list_fact.extend(list(range(1, n + 1)))
    return reduce(lambda x, y: x * y, list_fact, 1)  # initial value 1 covers n <= 1
print(fact(9000))  # computed very quickly; Python 3.11+ needs sys.set_int_max_str_digits(0) to print this many digits
Note:
During the iteration I collect all the factors 1..n in a list before multiplying them. The list is rebuilt on every call, so nothing is cached between calls.
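If the goal is simply a fast factorial, the standard library already provides math.factorial, which is implemented in C. A minimal sketch; print_factorials is a hypothetical helper added only to show the "display, don't return" behaviour the question asked for:

```python
import math

def print_factorials(numbers):
    # Display each factorial instead of returning it.
    for n in numbers:
        print(n, math.factorial(n))

print_factorials([1, 2, 3, 5])
print(math.factorial(9000).bit_length())  # size of 9000! in bits
```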

Finding the largest palindrome product of two 3-digit numbers: what is the error in logic?

I thought of solving this problem in the following way: start with two variables with value 999, and multiply one by the other in a loop that decrements one or the other until a palindrome is found. The code is this:
def is_palindrome(n):
    if str(n) == str(n)[::-1]:
        return True
    else:
        return False
def largest_palindrome_product_of_3_digit():
    x = 999
    y = 999
    for i in reversed(range(x + y + 1)):
        if is_palindrome(x * y):
            return x * y
        if i % 2 == 0:
            x -= 1
        else:
            y -= 1
The result of my method is 698896, while the correct result is 906609. Could you point out where my logic is incorrect?

Here are a couple of hints:
If n=y*x is any number in the range(600000, 700000) (for example) with y<=x, and x<1000, what's the smallest possible value of x?
If n is a palindromic number, both its first and last digit are 6, so what does that imply about the last digits of x & y?
Now generalize and figure out an efficient algorithm. :)
I've never done this problem before, but I just coded a reasonably fast algorithm that's around 2000 times faster than a brute-force search that uses
for x in xrange(2, 1000):
    for y in xrange(2, x+1):
        n = y*x
        #etc
According to timeit.py, the brute-force algorithm takes around 1.29 seconds on my old machine; the algorithm I hinted at above takes around 747 microseconds.
Edit
I've improved my bounds (and modified my algorithm slightly) and brought the time down to 410 µsec. :)
To answer your questions in the comment:
Yes, we can start x at the square root of the beginning of the range, and we can stop y at x (just in case we find a palindromic square).
What I was getting at with my 2nd hint is that for x=10*I+i, y=10*J+j, we don't need to test all 81 combinations of i and j, we only need to test the ones where (i*j)%10 equals the digit we want. So if we know that our palindrome starts and ends with 9 then (i, j) must be in [(1, 9), (3, 3), (7, 7), (9, 1)].
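The last-digit filter from the second hint can be checked with a few lines. This only enumerates the digit pairs and is not the full algorithm; last_digit_pairs is a name invented for this sketch:

```python
def last_digit_pairs(digit):
    """Pairs of non-zero final digits (i, j) whose product ends in digit."""
    return [(i, j)
            for i in range(1, 10)
            for j in range(1, 10)
            if (i * j) % 10 == digit]

print(last_digit_pairs(9))  # [(1, 9), (3, 3), (7, 7), (9, 1)]
```

So for a palindrome ending in 9, only 4 of the 81 digit combinations need testing.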
I don't think I should post my actual code here; it's considered bad form on SO to post complete solutions to Project Euler problems. And perhaps some SO people don't even like it when people supply hints. Maybe that's why I got down-voted...

You're missing possible numbers.
You're considering O(x+y) numbers and you need to consider O(x * y) numbers. Your choices are, essentially, to either loop one of them from 999, down to 1, then decrement the other and...
Simple demonstration:
>>> want = set()
>>> for x in [1, 2, 3, 4, 5]:
...     for y in [1, 2, 3, 4, 5]:
...         want.add(x * y)
...
>>> got = set()
>>> x = 5
>>> y = 5
>>> for i in reversed(range(x + y + 1)):
...     got.add(x * y)
...     if i % 2:
...         x -= 1
...     else:
...         y -= 1
...
>>> want == got
False
Alternatively, you do know the top of the range (999 * 999) and you can generate all palindromic numbers in that range, from the highest to the lowest. From there, doing a prime factorization and checking if there's a split of the factors that multiply to two numbers in the range [100,999] is trivial.
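For comparison, here is what covering all O(x * y) products looks like as a plain nested loop with early exits. It is a brute-force sketch of the first option, not the factorization approach; the function name and its low/high parameters are assumptions made for the example.

```python
def largest_palindrome_product(low=100, high=999):
    """Largest palindromic product of two numbers in [low, high]."""
    best = 0
    for x in range(high, low - 1, -1):
        if x * high <= best:
            break  # no remaining x can beat the current best
        for y in range(high, x - 1, -1):
            product = x * y
            if product <= best:
                break  # smaller y only gives smaller products
            if str(product) == str(product)[::-1]:
                best = product
    return best

print(largest_palindrome_product())        # 906609
print(largest_palindrome_product(10, 99))  # 9009
```

Unlike the zig-zag loop in the question, every pair (x, y) with y >= x is either tested or skipped only when it provably cannot beat the best palindrome found so far.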

Finding numbers from a to b not divisible by x to y

This is a problem I've been pondering for quite some time.
What is the fastest way to find all numbers from a to b that are not divisible by any number from x to y?
Consider this:
I want to find all the numbers from 1 to 10 that are not divisible by 2 to 5.
This process will become extremely slow if I were to use a linear approach,
like this:
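result = []
a = 1
b = 10
x = 2
y = 5
for i in range(a, b + 1):  # + 1 so that b itself is checked
    t = False
    for j in range(x, y + 1):  # + 1 so that y itself is tried
        if i % j == 0:
            t = True
            break
    if not t:
        result.append(i)
print(result)  # [1, 7]; return is only valid inside a function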
Does anybody know of any other methods of doing this with less computation time than a linear solution?
If not, can anyone see how this might be done faster, as I am blank at this point...
Sincerely,
John
[EDIT]
The numbers can range from 0 to more than 1e+100.
This is true for a, b, x and y.

You only need to check prime values in the range of the possible divisors - for example, if a value is not divisible by 2, it won't be divisible by any multiple of 2 either; likewise for every other prime and prime multiple. Thus in your example you can check 2, 3, 5 - you don't need to check 4, because anything divisible by 4 must be divisible by 2. Hence, a faster approach would be to compute primes in whatever range you are interested in, and then simply calculate which values they divide.
Another speedup is to add each value in the range you are interested in to a set: when you find that it is divisible by a number in your range, remove it from the set. You then should only be testing numbers that remain in the set - this will stop you testing numbers multiple times.
If we combine these two approaches, we see that we can create a set of all values (so in the example, a set with all values 1 to 10), and simply remove the multiples of each prime in your second range from that set.
Edit: As Patashu pointed out, this won't quite work if the prime that divides a given value is not in the set. To fix this, we can apply a similar algorithm to the above: create a set with the divisor values [x, y], and for each value in the set, remove all of its multiples. So for the example given below in the comments (with [3, 6]) we'd start with 3 and remove its multiples in the set, so 6. Hence the remaining values we need to test would be [3, 4, 5], which is what we want in this case.
Edit2: Here's a really hacked up, crappy implementation that hasn't been optimized and has horrible variable names:
def find_non_factors():
    a = 1
    b = 1000000
    x = 200
    y = 1000
    z = [True for p in range(x, y+1)]
    for k, i in enumerate(z):
        if i:
            k += x
            n = 2
            while n * k < y + 1:
                z[(n*k) - x] = False
                n += 1
    k = {p for p in range(a, b+1)}
    for p, v in enumerate(z):
        if v:
            t = p + x
            n = 1
            while n * t < (b + 1):
                if (n * t) in k:
                    k.remove(n * t)
                n += 1
    return k
Try your original implementation with those numbers. It takes > 1 minute on my computer. This implementation takes under 2 seconds.

Ultimate optimization caveat: Do not prematurely optimize. Any time you attempt to optimize code, profile it to ensure it needs optimization, and profile the optimization on the same kind of data you intend it to be optimized for, to confirm it is a speedup. Most code does not need optimization; it just needs to give the correct answer.
If you are optimizing for small x-y and large a-b:
Create an array with length that is the lowest common multiple out of all the x, x+1, x+2... y. For example, for 2, 3, 4, 5 it would be 60, not 120.
Now populate this array with booleans - false initially for every cell, then for each number in x-y, populate all entries in the array that are multiples of that number with true.
Now for each number in a-b, index into the array modulo arraylength; if the entry is true, skip the number, and if it is false, return it.
You can do this a little quicker by removing from your x to y factors the numbers whose prime factor expansions are strict supersets of other numbers' prime factor expansions. By which I mean: if you have 2, 3, 4, 5, then 4 is 2*2, a strict superset of 2, so you can remove it and now our array length is only 30. For something like 3, 4, 5, 6, however, 4 is 2*2 and 6 is 3*2; 6 is a superset of 3 so we remove it, but 4 is not a superset of anything so we keep it in. The LCM is 3*2*2*5 = 60. Doing this kind of thing would give some speed-up on its own for large a-b, and you might not need to go the array direction if that's all you need.
Also, keep in mind that if you aren't going to use the entire result of the function every single time - like, maybe sometimes you're only interested in the lowest value - write it as a generator rather than as a function. That way you can call it until you have enough numbers and then stop, saving time.
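Putting the LCM table and the generator advice together, a rough sketch could look like the following. It assumes x >= 2 and a small x to y range so the table fits in memory; non_divisible and its variable names are made up for this example, and the superset pruning described above is left out for brevity.

```python
from functools import reduce
from math import gcd

def non_divisible(a, b, x, y):
    """Yield numbers in [a, b] not divisible by any number in [x, y]."""
    divisors = range(x, y + 1)
    # The divisibility pattern repeats with period lcm(x, ..., y).
    period = reduce(lambda m, d: m * d // gcd(m, d), divisors, 1)
    blocked = [False] * period
    for d in divisors:
        for multiple in range(0, period, d):
            blocked[multiple] = True
    for n in range(a, b + 1):
        if not blocked[n % period]:
            yield n

print(list(non_divisible(1, 10, 2, 5)))  # [1, 7]
```

Because it is a generator, a caller interested only in the first few values can stop early, for example with next() or itertools.islice.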
