Sum of subsequences recursion in Python - python

Over the weekend I was working on the Ad Infinitum challenge on HackerRank.
One problem was to calculate the sum of all subsequences of a finite sequence, if each subsequence is thought of as an integer.
For example, sequence 4,5,6 would give answer 4 + 5 + 6 + 45 + 46 + 56 + 456 = 618.
I found a recursion and wrote the Python code below. It solved 5/13 test cases.
The remaining 8/13 test cases had runtime errors.
I was hoping someone could spy where in the code the inefficiencies lie, and how they can be sped up. Or, help me decide that it must be that my recursion is not the best strategy.
# Input is a list, representing the given sequence, e.g. L = [4,5,6]
def T(L):
    limit = 10**9 + 7 # answer is returned modulo 10**9 + 7
    N = len(L)
    if N == 1:
        return L[0]
    else:
        last = L[-1]
        K = L[:N-1]
        ans = T(K)%limit + 10*T(K)%limit + (last%limit)*pow(2,N-1,limit)
        return ans%limit

This is my submission for the same problem (Manasa and Sub-sequences).
https://www.hackerrank.com/contests/infinitum-may14/challenges/manasa-and-sub-sequences
I hope this will help you to think of a better way.
ans = 0
count = 0
for item in raw_input():  # Python 2; use input() on Python 3
    temp = (ans * 10 + (count + 1)*(int(item)))%1000000007
    ans = (ans + temp)%1000000007
    count = (count*2 + 1)%1000000007
print(ans)
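For comparison, the recurrence from the question can also be evaluated in a single loop. The question's T calls T(K) twice per level and copies the list with L[:N-1], so the work roughly doubles with every extra digit, and long inputs can also exceed Python's recursion limit. The following is a minimal sketch, assuming the input is a list of digits; the name subsequence_sum is made up for this illustration and is not part of either submission.

```python
MOD = 10**9 + 7

def subsequence_sum(digits):
    """Sum of all subsequences read as integers, modulo MOD."""
    total = 0   # value of T for the prefix processed so far
    power = 1   # 2**(number of digits processed so far), modulo MOD
    for d in digits:
        # Same recurrence as in the question: T(n) = 11*T(n-1) + d * 2**(n-1)
        total = (11 * total + d * power) % MOD
        power = (2 * power) % MOD
    return total

print(subsequence_sum([4, 5, 6]))  # 618
```

Each digit is handled once, so the running time is linear in the length of the sequence.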

Well, you want the combinations:
from itertools import combinations
def all_combinations(digits):
    for r in range(len(digits)):
        yield from combinations(digits, r+1)
And you want to convert them to integers:
def digits_to_int(digits):
    return sum(10**i * digit for i, digit in enumerate(reversed(digits)))
And you want to sum them:
sum(map(digits_to_int, all_combinations([4, 5, 6])))
Then focus on speed.

Assuming you mean contiguous subsequences:
test = [4, 5, 6]
def coms(ilist):
    olist = []
    ilist_len = len(ilist)
    for win_size in range(ilist_len, 0, -1):
        for offset in range((ilist_len - win_size) + 1):
            subslice = ilist[offset: offset + win_size]
            sublist = [value * (10 ** power) for (power, value) in enumerate(reversed(subslice))]
            olist.extend(sublist)
    return olist
print(sum(coms(test)))

Related

What is the optimal way to remove numbers from a list that are multiples of any other number in the list

I was attempting to generalize an optimal answer to Problem 1 on Project Euler, and realized that while using the inclusion/exclusion method, the answer comes out wrong if you enter in a list where one of the numbers is a multiple of any of the other numbers in the list.
For example, with a limit of 1000, and a list of [3, 5, 6, 8], the answer comes out as 306004, but the answer SHOULD come out as 266824. This is because 6 is a multiple of 3, and needs to be removed.
I came up with the following code to remove extraneous multiples from a list:
def cleanMults(mults):
    m = sorted(mults)
    x = [m[0]]
    for i in range(len(m) - 1, 0, -1):
        multFound = False
        for j in range(i - 1, -1, -1):
            if m[i] % m[j] == 0:
                multFound = True
                break
        if multFound == False: x.append(m[i])
    return sorted(x)
Here I'm sorting the list (just in case it's out of order), then starting with the last element in the list, comparing every element to every other element. If an element being divided by another element results in a remainder of 0, then we set multFound = True, break, and don't add it to the solution list. Otherwise, if no divisor is found, we do add it to the list.
My question is, is there a more optimized way to do this? Even ignoring the sort, this runs in O(n^2) time. I know there is a way to compare two lists in O(n log(n)) time, but this isn't quite the same thing as that. Anybody have any ideas or solutions here?
You can use the gcd function going forward and backward once through the list. This will allow you to identify the divisors (e.g. 3) for which there is at least one multiple in the list. You can then filter the list based on these divisors.
from math import gcd
def cleanMult(A):
    divisors = []
    for d in (1,-1):
        seen = 1
        for a in A[::d]:
            if seen%a: seen = a*seen//gcd(a,seen)
            else: divisors.append(a)
    return [a for a in A if all(d==a or a%d for d in divisors)]
print(cleanMult([3,5,6,8]))
# [3, 5, 8]
print(cleanMult([6,5,9,15,16,12,8]))
# [6, 5, 9, 8]
However, for Euler problem one, there is a much simpler way to get the answer:
sum(n for n in range(1,1000) if n%5==0 or n%3==0)
You can also generate all multiples and place them in a set to only count them once each:
multiset = set()
N = 1000
for f in [3,5,6,8]:
    multiset.update(range(f,N,f))
total = sum(multiset)
# 266824
or get some inspiration from the sieve of Eratosthenes:
N=1000
sieve = [0]*N
for f in [3,5,6,8]:
    sieve[f::f] = range(f,N,f)
total = sum(sieve)
# 266824
At the very least, you need to collect all of the numbers, de-duplicate, sort, and then check lower numbers against higher ones. There are tricks you can use to streamline the process; you can make bit maps, check primality, etc. However, you still have an inherently O(N^2) process.
However, the faster way to solve the original problem is to take the sums of the individual arithmetic sequences:
min3 = 3
max3 = (999 // 3) * 3  # largest multiple of 3 below 1000
cnt3 = max3 // 3
sum3 = (min3 + max3) * cnt3 // 2
min5 = 5
max5 = (999 // 5) * 5  # largest multiple of 5 below 1000
cnt5 = max5 // 5
sum5 = (min5 + max5) * cnt5 // 2
Now, since you've double-counted everything with both factors, you need to subtract the extra instance of multiples of 15. Compute sum15 in the same fashion.
Finally:
total = sum3 + sum5 - sum15
To generalize this to even more factors, composite numbers must be subtracted k-1 times, where k is the quantity of factors. Generalizing this for your general case may give you a solution more time-efficient than removing multiples.
In short, doing the direct sequence computations greatly lowers the cost of having extra factors.
Does that get you moving?
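The arithmetic-series approach described above, including the sum15 term that is left to the reader, can be sketched as follows. This is only an illustration; the helper name sum_of_multiples_below is an assumption and does not come from the answer.

```python
def sum_of_multiples_below(limit, factor):
    """Sum of the positive multiples of `factor` strictly below `limit`."""
    count = (limit - 1) // factor           # how many multiples there are
    return factor * count * (count + 1) // 2  # factor * (1 + 2 + ... + count)

limit = 1000
sum3 = sum_of_multiples_below(limit, 3)
sum5 = sum_of_multiples_below(limit, 5)
sum15 = sum_of_multiples_below(limit, 15)  # counted in both sum3 and sum5
total = sum3 + sum5 - sum15
print(total)  # 233168
```

Each term costs a constant number of arithmetic operations, regardless of the limit.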
Here's what I currently have:
import time
from math import prod
from itertools import combinations as co
def SoM(lim, mul):
    n = (lim - 1) //mul
    return (n * (n + 1) * mul) // 2
def inex(lim, mults):
    ans = 0
    for i in range(len(mults)):
        for j in co(mults, i + 1):
            ans += (-1)**i * SoM(lim, prod(list(j)))
    return ans
def cleanMults(mults):
    m = sorted(mults)
    x = [m[0]]
    for i in range(len(m) - 1, 0, -1):
        multFound = False
        for j in range(i - 1, -1, -1):
            if m[i] % m[j] == 0: multFound = True
        if multFound == False: x.append(m[i])
    return sorted(x)
def toString(mults):
    if len(mults) == 1: return str(list(mults)[0])
    s = 'or ' + str(list(mults)[-1])
    for i in range(len(mults) - 2, -1, -1):
        s = str(list(mults)[i]) + ', ' + s
    return s
def SumOfMults(lim, mults):
    #Declare variables
    start = time.time()
    strnums, m = '', cleanMults(mults)
    #Solve the problem
    ans = str(inex(lim, m))
    #Print the results
    print('The sum of all of the multiples of ')
    print(toString(mults) + ' below ' + str(lim) + ' is ' + ans + '.')
    print('This took ' + str(time.time() - start) + ' seconds to calculate.')
I'm assuming there are no duplicates, though if there are, all I have to do is cast the list to a set, then back to a list again.
You say that:
"To generalize this to even more factors, composite numbers must be subtracted k-1 times, where k is the quantity of factors."
Can you go into a little more detail of what you mean, with examples?
In an example like the list [3, 5, 26], the 26 is a composite number, and it only has 2 prime factors (2 and 13), so if I'm understanding correctly, the sum of all multiples of 26 up to the limit needs to be subtracted from the total once?
With my logic, I do currently do the following:
sum3 + sum5 + sum26 - sum78 (26 * 3) - sum130 (26 * 5) - sum15 (3 * 5) + sum1170 (3 * 5 * 26).
Are you suggesting there is a more efficient way to calculate this?
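One variation that is not taken from the thread: inclusion/exclusion stays exact for lists such as [3, 5, 6, 8] if each subset is combined with its least common multiple instead of its product, because the numbers divisible by every element of a subset are exactly the multiples of the subset's lcm. With the product, the pair (3, 6) is treated as multiples of 18 rather than 6, which is where the wrong total comes from. A minimal sketch under that assumption; the function names are illustrative only.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def sum_of_multiples_below(limit, factor):
    """Sum of the positive multiples of `factor` strictly below `limit`."""
    count = (limit - 1) // factor
    return factor * count * (count + 1) // 2

def inclusion_exclusion(limit, factors):
    """Sum of numbers below `limit` divisible by at least one factor."""
    factors = sorted(set(factors))  # drop duplicates
    total = 0
    for size in range(1, len(factors) + 1):
        sign = 1 if size % 2 else -1  # add odd-sized subsets, subtract even-sized ones
        for subset in combinations(factors, size):
            total += sign * sum_of_multiples_below(limit, reduce(lcm, subset))
    return total

print(inclusion_exclusion(1000, [3, 5, 6, 8]))  # 266824
```

This still visits every non-empty subset, so removing redundant multiples first remains worthwhile as a way to shrink the list; it is just no longer required for correctness.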

Python: Reduce runtime?

I recently started to learn Python and I'm using CodeWars to train. The task is to return a list [p, p + 4, p + 6, p + 10, p + 12, p + 16] in which every element is prime and whose sum is higher than sum_limit. It works for low values, but at high values (about 2 million) the runtime is too long. How can I reduce the runtime?
from math import sqrt; from itertools import count, islice
def find_primes_sextuplet(sum_limit):
    for x in range(sum_limit):
        if isPrime(x) and isPrime(x+4) and isPrime(x+6) and isPrime(x+10) and isPrime(x+12) and isPrime(x+16):
            possible = [x, x+4, x+6, x+10, x+12, x+16]
            if sum(possible) > sum_limit:
                return possible
def isPrime(n):
    return n > 1 and all(n%i for i in islice(count(2), int(sqrt(n)-1)))
print(find_primes_sextuplet(2000000))
For non-negative integer values of n, you can use this:
def isPrime(n):
    if n in (2, 3):  # the divisibility check below would otherwise reject these two primes
        return True
    if n == 1 or n % 2 == 0 or n % 3 == 0:
        return False
    end = int(sqrt(n)+1)
    for start in [5, 7]:
        for k in range(start, end, 6):
            if n % k == 0:
                return False
    return True
It won't change the theoretical complexity, but it will reduce the practical running-time.
And if you change the outer loop to for x in range(5, sum_limit), then you can also get rid of the initial check if n == 1 or n % 2 == 0 or n % 3 == 0.
Here's my thinking about reducing complexity and run time.
You can write a sieve in O(n log log n). Here's a reasonable implementation:
def sieve(n):
    grid = [None for _ in range(n+1)]
    i = 2
    while i < n+1:
        if grid[i] is None:
            grid[i] = True
            for p in range(2*i, n+1, i):
                grid[p] = False
        else:
            i += 1
    return (index for index, b in enumerate(grid) if b)
There are 6 numbers, and the total amount added to the first number is 48. So the minimum possible value for the first number is (n - 48) / 6. In my sieve we can iterate the generator until number is greater than that.
def get_remaining_sieve(n):
    g = sieve(n)
    current = next(g)
    min_value = (n - 48) / 6
    while current < min_value:
        current = next(g)
    return [current] + list(g)
Now just iterate through every slice of length 6, and check if the separation matches the desired separation (4, 2, 4, 2, 4).
remaining = get_remaining_sieve(n)
for start in range(len(remaining) - 5):
    slice = remaining[start:start+6]
    differences = [slice[j] - slice[j-1] for j in range(1, 6)]
    if differences == [4, 2, 4, 2, 4]:
        print(slice)
Summary
Based on those principles, I've come up with this solution:
from itertools import dropwhile, islice
def get_solutions(n):
    grid = [None for _ in range(n+1)]
    i = 2
    while i < n+1:
        if grid[i] is None:
            grid[i] = True
            for p in range(2*i, n+1, i):
                grid[p] = False
        else:
            i += 1
    sieve = (index for index, b in enumerate(grid) if b)
    min_value = (n - 48) / 6
    reduced_sieve = dropwhile(lambda v: v < min_value, sieve)
    reference_slice = list(islice(reduced_sieve, 6))
    while True:
        try:
            ref = reference_slice[0]
            differences = [v - ref for v in reference_slice[1:]]
            if differences == [4, 6, 10, 12, 16]:
                yield reference_slice
            reference_slice = reference_slice[1:] + [next(reduced_sieve)]
        except StopIteration:
            break
n = 2000000
print(next(get_solutions(n))) # 695ms
# or for all solutions
for solution in get_solutions(n): # 755ms
    print(solution)
This runs in less than a second on my computer.
There are various ways to improve the runtime of your code. For example a lot of numbers are checked for being prime numbers even though their sum is not eligible as a result. Calculating the sum of 6 numbers is faster than checking if they are prime. You could move the sum check above the prime check and only check the numbers for primes if their sum would be eligible.
To improve this further you could skip numbers which will not result in an eligible sum by starting the range at the floor of possible numbers.
x + x + 4 + x + 6 + x + 10 + x + 12 + x + 16 = 6x + 48
which is supposed to be above your sum_limit:
6x + 48 > sum_limit
x > (sum_limit - 48) / 6
So if your range starts just above (sum_limit - 48) / 6, you will skip all numbers which would not result in an eligible sum anyway.
You would also be able to improve runtime by skipping even numbers in your loop (via range(y, x, 2)).
Further improving the runtime would require you to adjust the isPrime function.
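The suggestions above (start the search at the smallest x whose sum can exceed sum_limit, and step over even numbers) can be put together roughly as follows. This is a sketch rather than a tuned solution: is_prime is plain trial division, math.isqrt needs Python 3.8 or later, and the loop assumes that a sextuplet exists above the bound, since it runs until one is found.

```python
from itertools import count
from math import isqrt

OFFSETS = (0, 4, 6, 10, 12, 16)

def is_prime(n):
    """Trial division by odd numbers up to the integer square root."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % k for k in range(3, isqrt(n) + 1, 2))

def find_primes_sextuplet(sum_limit):
    # 6x + 48 > sum_limit  <=>  x > (sum_limit - 48) / 6
    start = max(3, (sum_limit - 48) // 6 + 1)
    if start % 2 == 0:
        start += 1  # an even x cannot be the first prime of a sextuplet
    for x in count(start, 2):  # odd candidates only
        if all(is_prime(x + offset) for offset in OFFSETS):
            return [x + offset for offset in OFFSETS]

print(find_primes_sextuplet(70))  # [7, 11, 13, 17, 19, 23]
```

Because every candidate already satisfies the sum condition, the first sextuplet found can be returned without checking its sum.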

Fibonacci series in bit string

I am working on Fibonacci series but in bit string which can be represented as:
f(0)=0;
f(1)=1;
f(2)=10;
f(3)=101;
f(4)=10110;
f(5)=10110101;
Secondly, I have a pattern, for example '10', and want to count how many times it occurs in a particular string of the series. For example, the string for n=5 is '10110101', so '10' occurs 3 times.
My code runs correctly without error, but it cannot run for values of n above 45, and I want to run it for n=100.
Can anyone help? I only want to calculate the count of occurrences.
n=5
fibonacci_numbers = ['0', '1']
for i in range(1,n):
    fibonacci_numbers.append(fibonacci_numbers[i]+fibonacci_numbers[i-1])
    #print(fibonacci_numbers[-1])
print(fibonacci_numbers[-1])
nStr = str (fibonacci_numbers[-1])
pattern = '10'
count = 0
flag = True
start = 0
while flag:
    a = nStr.find(pattern, start)
    if a == -1:
        flag = False
    else:
        count += 1
        start = a + 1
print(count)
This is a fun one! The trick is that you don't actually need that giant bit string, just the number of 10s it contains and the edges. This solution runs in O(n) time and O(1) space.
from typing import NamedTuple
class FibString(NamedTuple):
    """First digit, last digit, and the number of 10s in between."""
    first: int
    tens: int
    last: int
def count_fib_string_tens(n: int) -> int:
    """Count the number of 10s in a n-'Fibonacci bitstring'."""
    def combine(b: FibString, a: FibString) -> FibString:
        """Combine two FibStrings."""
        tens = b.tens + a.tens
        # mind the edges!
        if b.last == 1 and a.first == 0:
            tens += 1
        return FibString(b.first, tens, a.last)
    # First two values are 0 and 1 (tens=0 for both)
    a, b = FibString(0, 0, 0), FibString(1, 0, 1)
    for _ in range(1, n):
        a, b = b, combine(b, a)
    return b.tens # tada!
I tested this against your original implementation and sure enough it produces the same answers for all values that the original function is able to calculate (but it's about eight orders of magnitude faster by the time you get up to n=40). The answer for n=100 is 218922995834555169026 and it took 0.1ms to calculate using this method.
The nice thing about the Fibonacci sequence that will solve your issue is that you only need the last two values of the sequence. 10110 is made by combining 101 and 10. After that 10 is no longer needed. So instead of appending, you can just keep the two values. Here is what I've done:
n=45
fibonacci_numbers = ['0', '1']
for i in range(1,n):
    temp = fibonacci_numbers[1]
    fibonacci_numbers[1] = fibonacci_numbers[1] + fibonacci_numbers[0]
    fibonacci_numbers[0] = temp
Note that it still uses a decent amount of memory, but it didn't give me a memory error (it does take a bit of time to run though).
I also wasn't able to print the full string as I got an OSError [Errno 5] Input/Output error but it can still count and print that output.
For larger numbers, storing as a string is going to quickly cause a memory issue. In that case, I'd suggest doing the fibonacci sequence with plain integers and then converting to bits. See here for tips on binary conversion.
While the regular Fibonacci sequence doesn't work in a direct sense, consider that 10 is 2 and 101 is 5. Adding 5+2 doesn't work: you want 10110, which is the OR operation 10100 | 10, yielding 22. So if you shift one value by the length of the other, you can get the result. See for example:
x = 5
y = 2
(x << 2) | y
>> 22
Shifting x by the number of bits representing y and then doing a bitwise or with | solves the issue. Python summarizes these bitwise operations well here. All that's left for you to do is determine how many bits to shift and implement this into your for loop!
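One way the shift-and-OR idea could be turned into a loop is sketched below. The function name fib_bits is an assumption made for this illustration. The lengths are tracked separately because f(0) = "0" is one character long while the integer 0 has a bit length of zero, so int.bit_length() alone would give the wrong shift.

```python
def fib_bits(n):
    """Return (value, length) for the n-th Fibonacci bit string stored as an integer."""
    prev, prev_len = 0, 1  # f(0) = "0"
    cur, cur_len = 1, 1    # f(1) = "1"
    if n == 0:
        return prev, prev_len
    for _ in range(1, n):
        # f(k+1) is f(k) followed by f(k-1): shift f(k) left by len(f(k-1)), then OR
        prev, prev_len, cur, cur_len = (
            cur, cur_len, (cur << prev_len) | prev, cur_len + prev_len
        )
    return cur, cur_len

value, length = fib_bits(4)
print(value, format(value, "0{}b".format(length)))  # 22 10110
```

The integer still needs as many bits as the string has characters, so this reduces the constant factor but not the growth in memory.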
For really large n you will still have a memory issue (the original answer showed this in a plot).
Finally I got the answer, but can someone briefly explain why it works?
def count(p, n):
    count = 0
    i = n.find(p)
    while i != -1:
        n = n[i + 1:]
        i = n.find(p)
        count += 1
    return count
def occurence(p, n):
    a1 = "1"
    a0 = "0"
    lp = len(p)
    i = 1
    if n <= 5:
        return count(p, atring(n))
    while lp > len(a1):
        temp = a1
        a1 += a0
        a0 = temp
        i += 1
        if i >= n:
            return count(p, a1)
    fn = a1[:lp - 1]
    if -lp + 1 < 0:
        ln = a1[-lp + 1:]
    else:
        ln = ""
    countn = count(p, a1)
    a1 = a1 + a0
    i += 1
    if -lp + 1 < 0:
        lnp1 = a1[-lp + 1:]
    else:
        lnp1 = ""
    k = 0
    countn1 = count(p, a1)
    for j in range(i + 1, n + 1):
        temp = countn1
        countn1 += countn
        countn = temp
        if k % 2 == 0:
            string = lnp1 + fn
        else:
            string = ln + fn
        k += 1
        countn1 += count(p, string)
    return countn1
def atring(n):
    a0 = "0"
    a1 = "1"
    if n == 0 or n == 1:
        return str(n)
    for i in range(2, n + 1):
        temp = a1
        a1 += a0
        a0 = temp
    return a1
def fn():
    a = 100
    p = '10'
    print( occurence(p, a))
if __name__ == "__main__":
    fn()
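As far as I can tell, the code above works because f(n) is f(n-1) followed by f(n-2), so every occurrence of the pattern lies entirely in the left part, entirely in the right part, or across the junction. The first two counts are already known, and a match across the junction can only touch the last len(p) - 1 characters of the left part and the first len(p) - 1 characters of the right part. The sketch below restates that idea in a more compact form; it is not the poster's code, the names are made up for the illustration, and it assumes a non-empty pattern.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def count_in_fib_string(pattern, n):
    """Occurrences of pattern in f(n), with f(0)='0', f(1)='1', f(k)=f(k-1)+f(k-2)."""
    k = len(pattern) - 1  # most characters a junction match can take from one side

    def summarize(s):
        # (first k characters, last k characters, number of matches)
        return (s[:k], s[-k:] if k else "", count_overlapping(s, pattern))

    def join(left, right):
        l_head, l_tail, l_count = left
        r_head, r_tail, r_count = right
        # Both edges are shorter than the pattern, so every match found here
        # crosses the junction.
        crossing = count_overlapping(l_tail + r_head, pattern)
        head = (l_head + r_head)[:k]
        tail = (l_tail + r_tail)[-k:] if k else ""
        return (head, tail, l_count + r_count + crossing)

    older, newer = summarize("0"), summarize("1")
    if n == 0:
        return older[2]
    for _ in range(1, n):
        older, newer = newer, join(newer, older)
    return newer[2]

print(count_in_fib_string("10", 5))  # 3
```

Only the two short edges and a count are kept per step, so the full string is never built.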

Calculating the sum of the 4th power of each digit, why do I get a wrong result?

I am trying to complete Project Euler question #30, and I decided to verify my code against a known answer first. Basically the question is this:
Find the sum of all the numbers that can be written as the sum of fifth powers of their digits.
Here is the known answer I am trying to prove with python:
1634 = 1^4 + 6^4 + 3^4 + 4^4
8208 = 8^4 + 2^4 + 0^4 + 8^4
9474 = 9^4 + 4^4 + 7^4 + 4^4
As 1 = 1^4 is not a sum it is not included.
The sum of these numbers is 1634 + 8208 + 9474 = 19316.
When I run my code I get all three of the values which add up to 19316, great! However among these values there is an incorrect one: 6688
Here is my code:
i=1
answer = []
while True:
    list = []
    i=i+1
    digits = [int(x) for x in str(i)]
    for x in digits:
        a = x**4
        list.append(a)
        if sum(list) == i:
            print(sum(list))
            answer.append(sum(list))
The sum of list returns the three correct values, and the value 6688. Can anybody spot something I have missed?
You are checking the sum too early. You check for a matching sum for each individual digit in the number, and 6 ^ 4 + 6 ^ 4 + 8 ^ 4 is 6688. That's three of the digits, not all four.
Move your sum() test out of your for loop:
for x in digits:
    a = x**4
    list.append(a)
if sum(list) == i:
    print(sum(list))
    answer.append(sum(list))
At best you could discard a number early when the sum already exceeds the target:
digitsum = 0
for d in digits:
    digitsum += d ** 4
    if digitsum > i:
        break
else:
    if digitsum == i:
        answer.append(i)
but I'd not bother with that here, and just use a generator expression to combine determining the digits, raising them to the 4th power, and summing:
if sum(int(d) ** 4 for d in str(i)) == i:
    answer.append(i)
You haven't defined an upper bound, the point where numbers will always be bigger than the sum of their digits and you need to stop incrementing i. For the sum of nth powers, you can find such a point by taking 9 ^ n, counting its digits, then taking the number of digits in the nth power of 9 times the nth power of 9. If this creates a number with more digits, continue on until the number of digits no longer changes.
In the same vein, you can start i at max(10, 1 + 2 ** n), because the smallest sum you'll be able to make from digits will be using a single 2 digit plus the minimum number of 1 and 0 digits you can get away with, and at any power greater than 1, the power of digits other than 1 and 0 is always greater than the digit value itself, and you can't use i = 1:
def determine_bounds(n):
    """Given a power n > 1, return the lower and upper bounds in which to search"""
    nine_power, digit_count = 9 ** n, 1
    while True:
        upper = digit_count * nine_power
        new_count = len(str(upper))
        if new_count == digit_count:
            return max(10, 2 ** n), upper
        digit_count = new_count
If you combine the above function with range(*<expression>) variable-length parameter passing to range(), you can use a for loop:
for i in range(*determine_bounds(4)):
    # ...
You can put determining if a number is equal to the sum of its digits raised to a given power n in a function:
def is_digit_power_sum(i, n):
    return sum(int(d) ** n for d in str(i)) == i
then you can put everything into a list comprehension:
>>> n = 4
>>> [i for i in range(*determine_bounds(n)) if is_digit_power_sum(i, n)]
[1634, 8208, 9474]
>>> n = 5
>>> [i for i in range(*determine_bounds(n)) if is_digit_power_sum(i, n)]
[4150, 4151, 54748, 92727, 93084, 194979]
The is_digit_power_sum() could benefit from a cache of powers; adding a cache makes the function more than twice as fast for 4-digit inputs:
def is_digit_power_sum(i, n, _cache={}):
    try:
        powers = _cache[n]
    except KeyError:
        powers = _cache[n] = {str(d): d ** n for d in range(10)}
    return sum(powers[d] for d in str(i)) == i
and of course, the solution to the question is the sum of the numbers:
n = 5
answer = sum(i for i in range(*determine_bounds(n)) if is_digit_power_sum(i, n))
print(answer)
which produces the required output in under half a second on my 2.9 GHz Intel Core i7 MacBook Pro, using Python 3.8.0a3.
Here is a fixed version:
i=1
answer = []
while True:
    list = []
    i=i+1
    digits = [int(x) for x in str(i)]
    for x in digits:
        a = x**4
        list.append(a)
        if sum(list) == i and len(list) == 4:
            print(sum(list))
            answer.append(sum(list))
The bug I found:
6^4+6^4+8^4 = 6688
So I just put a check for len of list.

Maximum sum of sublist with a specific length

I'm supposed to write a function which takes two numbers: the first is a given number, and the second is the length of the sublist of digits whose maximum sum I'm supposed to find.
For example, for the input (1234,2)
the output would be 7.
This is my code so far; it just computes the sum of all the digits:
def altsum_digits(n,d):
    b=str(n)
    c=[]
    for digit in b:
        c.append(int(digit))
    maxthere=0
    realmax=0
    for a in str(d):
        for i in c:
            maxthere=max(0,(maxthere+int(i)))
            realmax=max(maxthere,realmax)
        maxthere==0
    print(realmax)
From what I understand of the question, this should do what you want:
def do(n, d):
    print(sum(sorted([int(x) for x in str(n)])[-d:]))
Let's say you get a number n and a length k.
What you have to do is first turn n into a list of digits, and then use a sliding window of size k where at each step you add the next digit and subtract the first one in the sliding window, keeping track of max_sum so you can return it at the end.
The function would look something like this:
def altsum_digits(n, k):
    list_n = [int(x) for x in str(n)]
    # sum of the first window; current_sum must be initialized before the loop
    current_sum = max_sum = sum(list_n[:k])
    for i in range(k, len(list_n)):
        current_sum = current_sum + list_n[i] - list_n[i - k]
        max_sum = max(current_sum, max_sum)
    return max_sum
It's an O(n) solution, so it's a lot better than generating all sublists of size k. Hope it helps!
Let's clarify to make sure we're on the same page.
Inputs: 1) a list li of digits; 2) n
Output: the slice from li of length n that has maximal sum.
li = [4,2,1,7,1,3,8,4,7,8,1]
n = 2
slices = (li[x:x+n] for x in range(len(li)-n+1))
max(map(sum,slices))
Out[113]: 15
def sublists(lst, n):
    return (lst[i:i+n] for i in range(len(lst) - n + 1))
def max_sublist_sum(lst, n):
    return max(sum(sub) for sub in sublists(lst, n))
max_sublist_sum([1,2,3,4], 2) # => 7
This should do the trick:
def altsum_digits(n, d):
    l = list(map(int, str(n)))
    m = c = sum(l[:d])
    for i in range(0, len(l)-d):
        c = c - l[i] + l[i+d]
        if c > m: m = c
    print(m)
altsum_digits(1234,2)
>>> 7
I think I understand what you're asking, and here is my solution. I've tested it on your input as well as other inputs with varying lengths of substring. This code finds the maximum sum of adjacent substrings in the input.
def sum_of_sublist(input, maxLength):
    input_array = [int(l) for l in str(input)]
    tempMax = 0
    realMax = 0
    for i in range(len(input_array) - (maxLength - 1)):
        for inc in range(0, maxLength):
            tempMax += input_array[i+inc]
        if tempMax > realMax:
            realMax = tempMax
        tempMax = 0
    print(realMax)
sum_of_sublist(1234, 2)
So, for the call sum_of_sublist(1234, 2), it will print the value 7 because the largest sum of 2 consecutive digits is 3 + 4 = 7. Similarly, for the call sum_of_sublist(12531, 3), the program will print 10 because the largest sum of 3 consecutive digits is 2 + 5 + 3 = 10.
