I am an amateur Python coder trying to find an efficient solution for Project Euler Digit Sum problem. My code returns the correct result but is is inefficient for large integers such as 1234567890123456789. I know that the inefficiency lies in my sigma_sum function where there is a 'for' loop.
I have tried various alternate solutions such as loading the values into an numpy array but ran out of memory with large integers with this approach. I am eager to learn more efficient solutions.
import math
def sumOfDigits(n: int) :
digitSum = 0
if n < 10: return n
else:
for i in str(n): digitSum += int(i)
return digitSum
def sigma_sum(start, end, expression):
return math.fsum(expression(i) for i in range(start, end))
def theArguement(n: int):
return n / sumOfDigits(n)
def F(N: int) -> float:
"""
>>> F(10)
19
>>> F(123)
1.187764610390e+03
>>> F(12345)
4.855801996238e+06
"""
s = sigma_sum(1, N + 1, theArguement)
if s.is_integer():
print("{:0.0f}".format(s))
else:
print("{:.12e}".format(s))
print(F(123))
if __name__ == '__main__':
import doctest
doctest.testmod()
Try solving a different problem.
Define G(n) to be a dictionary. Its keys are integers representing digit sums and its values are the sum of all positive integers < n whose digit sum is the key. So
F(n) = sum(v / k for k, v in G(n + 1).items())
[Using < instead of ≤ simplifies the calculations below]
Given the value of G(a) for any value, how would you calculate G(10 * a)?
This gives you a nice easy way to calculate G(x) for any value of x. Calculate G(x // 10) recursively, use that to calculate the value G((x // 10) * 10), and then manually add the few remaining elements in the range (x // 10) * 10 ≤ i < x.
Getting from G(a) to G(10 * a) is mildly tricky, but not overly so. If your code is correct, you can use calculating G(12346) as a test case to see if you get the right answer for F(12345).
Related
I tried to solve this Kata problem
Write a function that takes an integer as input, and returns the number of bits that are equal to one in the binary representation of that number. You can guarantee that input is non-negative.
Example: The binary representation of 1234 is 10011010010, so the function should return 5 in this case.
But the problem is my code gives the correct answers right for the first time around but when
it is run for 2nd time onward it gives wrong answers only. I think it has to do sth with how my code is recursive. Please help me figure it out.
for example when i run count_bits(24) it gives the output 2 which is correct but when i run the same function again it would give 4 and then 6 and so on . I dont know what's wrong with this
My code.
dec_num = []
def count_bits(n):
def DecimalToBinary(n):
if n >= 1:
DecimalToBinary(n // 2)
dec_num.append( n % 2)
return dec_num
dec = DecimalToBinary(n)
return dec.count(1)
There is no need to create a list of digits if all you need is their sum. When possible, avoid creating mutable state in a recursive solution:
def count_bits(n):
if n > 0:
return n % 2 + count_bits(n // 2)
else:
return 0
That's a pretty natural translation of the obvious algorithm:
The sum of the bits in a number is the last bit plus the sum of the bits in the rest of the number.
Sometimes it's convenient to accumulate a result, which is best done by adding an accumulator argument. In some languages, that can limit stack usage, although not in Python which doesn't condense tail calls. All the same, some might find this more readable:
def count_bits(n, accum = 0):
if n > 0:
return count_bits(n // 2, accum + n % 2)
else:
return accum
In Python, a generator is a more natural control structure:
def each_bit(n):
while n > 0:
yield n % 2
n //= 2
def count_bits(n):
return sum(each_bit(n))
Of course, there are lots more ways to solve this problem.
That is because dec_num is outside the method, so it's reused at every call, put it inside
def count_bits(n):
dec_num = []
def DecimalToBinary(n):
if n >= 1:
DecimalToBinary(n // 2)
dec_num.append(n % 2)
return dec_num
dec = DecimalToBinary(n)
return dec.count(1)
How do I go about problems like this one: T(n) = T(n/3) + T(n/4) + O(n^2)
I am able to use two for loops which will give me the O(n^2), right?
To interpret the equation, read it in English as: "the running time for an input of size n equals the running time for an input of size n/3, plus the running time for an input of size n/4, plus a time proportional to n^2".
Writing code that runs in time proportional to n^2 is possible using nested loops, for example, though it's simpler to write a single loop like for i in range(n ** 2): ....
Writing code that runs in time equal to the time it takes for the algorithm with an input of size n/3 or n/4 is even easier - just call the algorithm recursively with an input of that size. (Don't forget a base case for the recursion to terminate on.)
Putting it together, the code could look something like this:
def nonsense_algorithm(n):
if n <= 0:
print('base case')
else:
nonsense_algorithm(n // 3)
nonsense_algorithm(n // 4)
for i in range(n ** 2):
print('constant amount of text')
I've used integer division (rounding down) because I don't think it makes sense otherwise.
Here, we would use for and range(start, stop, steps), maybe with a simple to understand algorithm, similar to:
def algorithm_n2(n: int) -> int:
"""Returns an integer for a O(N2) addition algorithm
:n: must be a positive integer
"""
if n < 1:
return False
output = 0
for i in range(n):
for j in range(n):
output += n * 2
for i in range(0, n, 3):
output += n / 3
for i in range(0, n, 4):
output += n / 4
return int(output)
# Test
if __name__ == "__main__":
print(algorithm_n2(24))
Using print() inside a method is not a best practice.
Output
27748
I'm trying to write a program to find sum of first N natural numbers i.e. 1 + 2 + 3 + .. + N modulo 1000000009
I know this can be done by using the formula N * (N+1) / 2 but I'm trying to find a sort of recursive function to calculate the sum.
I tried searching the web, but I didn't get any solution to this.
Actually, the problem here is that the number N can have upto 100000 digits.
So, here is what I've tried until now.
First I tried splitting the number into parts each of length 9, then convert them into integers so that I can perform arithmetic operations using the operators for integers.
For example, the number 52562372318723712 will be split into 52562372 & 318723712.
But I didn't find a way to manipulate these numbers.
Then again I tried to write a function as follows:
def find_sum(n):
# n is a string
if len(n) == 1:
# use the formula if single digit
return int(int(n[0]) * (int(n[0]) + 1) / 2)
# I'm not sure what to return here
# I'm expecting some manipulation with n[0]
# and a recursive call to the function itself
# I've also not used modulo here just for testing with smaller numbers
# I'll add it once I find a solution to this
return int(n[0]) * something + find_sum(n[1:])
I'm not able to find the something here.
Can this be solved like this?
or is there any other method to do so?
NOTE: I prefer a solution similar to the above function because I want to modify this function to meet my other requirements which I want to try myself before asking here. But if it is not possible, any other solution will also be helpful.
Please give me any hint to solve it.
Your best bet is to just use the N*(N+1)/2 formula -- but using it mod p. The only tricky part is to interpret division by 2 -- this had to be the inverse of 2 mod p. For p prime (or simply for p odd) this is very easy to compute: it is just (p+1)//2.
Thus:
def find_sum(n,p):
two_inv = (p+1)//2 #inverse of 2, mod p
return ((n%p)*((n+1)%p)*two_inv)%p
For example:
>>> find_sum(10000000,1000000009)
4550000
>>> sum(range(1,10000001))%1000000009
4550000
Note that the above function will fail if you pass an even number for p.
On Edit as #user11908059 observed, it is possible to dispense with multiplication by the modular inverse of 2. As an added benefit, this approach no longer depends on the modulus being odd:
def find_sum2(n,k):
if n % 2 == 0:
a,b = (n//2) % k, (n+1) % k
else:
a,b = n % k, ((n+1)//2) % k
return (a*b) % k
I'm generating prime numbers from Fibonacci as follows (using Python, with mpmath and sympy for arbitrary precision):
from mpmath import *
def GCD(a,b):
while a:
a, b = fmod(b, a), a
return b
def generate(x):
mp.dps = round(x, int(log10(x))*-1)
if x == GCD(x, fibonacci(x-1)):
return True
if x == GCD(x, fibonacci(x+1)):
return True
return False
for x in range(1000, 2000)
if generate(x)
print(x)
It's a rather small algorithm but seemingly generates all primes (except for 5 somehow, but that's another question). I say seemingly because a very little percentage (0.5% under 1000 and 0.16% under 10K, getting less and less) isn't prime. For instance under 1000: 323, 377 and 442 are also generated. These numbers are not prime.
Is there something off in my script? I try to account for precision by relating the .dps setting to the number being calculated. Can it really be that Fibonacci and prime numbers are seemingly so related, but then when it's get detailed they aren't? :)
For this type of problem, you may want to look at the gmpy2 library. gmpy2 provides access to the GMP multiple-precision library which includes gcd() and fib() functions which calculate the greatest common divisor and the n-th fibonacci numbers quickly, and only using integer arithmetic.
Here is your program re-written to use gmpy2.
import gmpy2
def generate(x):
if x == gmpy2.gcd(x, gmpy2.fib(x-1)):
return True
if x == gmpy2.gcd(x, gmpy2.fib(x+1)):
return True
return False
for x in range(7, 2000):
if generate(x):
print(x)
You shouldn't be using any floating-point operations. You can calculate the GCD just using the builtin % (modulo) operator.
Update
As others have commented, you are checking for Fibonacci pseudoprimes. The actual test is slightly different than your code. Let's call the number being tested n. If n is divisible by 5, then the test passes if n evenly divides fib(n). If n divided by 5 leaves a remainder of either 1 or 4, then the test passes if n evenly divides fib(n-1). If n divided by 5 leaves a remainder of either 2 or 3, then the test passes if n evenly divides fib(n+1). Your code doesn't properly distinguish between the three cases.
If n evenly divides another number, say x, it leaves a remainder of 0. This is equivalent to x % n being 0. Calculating all the digits of the n-th Fibonacci number is not required. The test just cares about the remainder. Instead of calculating the Fibonacci number to full precision, you can calculate the remainder at each step. The following code calculates just the remainder of the Fibonacci numbers. It is based on the code given by #pts in Python mpmath not arbitrary precision?
def gcd(a,b):
while b:
a, b = b, a % b
return a
def fib_mod(n, m):
if n < 0:
raise ValueError
def fib_rec(n):
if n == 0:
return 0, 1
else:
a, b = fib_rec(n >> 1)
c = a * ((b << 1) - a)
d = b * b + a * a
if n & 1:
return d % m, (c + d) % m
else:
return c % m, d % m
return fib_rec(n)[0]
def is_fib_prp(n):
if n % 5 == 0:
return not fib_mod(n, n)
elif n % 5 == 1 or n % 5 == 4:
return not fib_mod(n-1, n)
else:
return not fib_mod(n+1, n)
It's written in pure Python and is very quick.
The sequence of numbers commonly known as the Fibonacci numbers is just a special case of a general Lucas sequence L(n) = p*L(n-1) - q*L(n-2). The usual Fibonacci numbers are generated by (p,q) = (1,-1). gmpy2.is_fibonacci_prp() accepts arbitrary values for p,q. gmpy2.is_fibonacci(1,-1,n) should match the results of the is_fib_pr(n) given above.
Disclaimer: I maintain gmpy2.
This isn't really a Python problem; it's a math/algorithm problem. You may want to ask it on the Math StackExchange instead.
Also, there is no need for any non-integer arithmetic whatsoever: you're computing floor(log10(x)) which can be done easily with purely integer math. Using arbitrary-precision math will greatly slow this algorithm down and may introduce some odd numerical errors too.
Here's a simple floor_log10(x) implementation:
from __future__ import division # if using Python 2.x
def floor_log10(x):
res = 0
if x < 1:
raise ValueError
while x >= 1:
x //= 10
res += 1
return res
I would like to compute
for values of n up to 1000000 as accurately as possible. Here is some sample code.
from __future__ import division
from scipy.misc import comb
def M(n):
return sum(comb(n,k,exact=True)*(1/n)*(1-k/n)**(2*n-k)*(k/n)**(k-1) for k in xrange(1,n+1))
for i in xrange(1,1000000,100):
print i,M(i)
The first problem is that I get OverflowError: long int too large to convert to float when n = 1101. This is because comb(n,k,exact=True) is too large to be converted to a float. The end result is however always a number around 0.159 .
I asked a related question at How to compute sum with large intermediate values however this question is different for three main reasons.
The formula I want to compute is different which causes different problems.
The solution proposed before to use exact=True does not help here as can be seen in the example I gave. Coding up my own implementation of comb is also not going to work as I still need to perform the floating point division.
I need to compute the answer for much bigger values than before which causes new problems. I suspect it can't be done without coding up the sum in some clever way.
A solution that doesn't crash is to use
from fractions import Fraction
def M2(n):
return sum(comb(n,k,exact=True)*Fraction(1,n)*(1-Fraction(k,n))**(2*n-k)*Fraction(k,n)**(k-1) for k in xrange(1,n+1))
for i in xrange(1,1000000,100):
print i, M2(i)*1.0
Unfortunately it is now so slow that I don't get an answer for n=1101 in a reasonable amount of time.
So the second problem is how to make it fast enough to complete for large n.
You can compute each summand in with a logarithm transformation that replaces multiplication, division, and exponentiation with addition, subtraction, and multiplication, respectively.
def summand(n,k):
lk=log(k)
ln=log(n)
a=(lk-ln)*(k-1)
b=(log(n-k)-ln)*(2*n-k)
c=-ln
d=sum(log(x) for x in xrange(n-k+1,n+1))-sum(log(x) for x in xrange(1,k+1))
return exp(a+b+c+d)
def M(n):
return sum(summand(n,k) for k in xrange(1,n))
Note that when k=n the summand will be zero so I do not compute it since the logarithm will be undefined.
You can use gmpy2. It has arbitrary precision floating point arithmetic with large exponent bounds.
from __future__ import division
from gmpy2 import comb,mpfr,fsum
def M(n):
return fsum(comb(n,k)*(mpfr(1)/n)*(mpfr(1)-mpfr(k)/n)**(mpfr(2)*n-k)*(mpfr(k)/n)**(k-1) for k in xrange(1,n+1))
for i in xrange(1,1000000,100):
print i,M(i)
Here is an excerpt of the output:
2001 0.15857490038127975
2101 0.15857582611615381
2201 0.15857666768820194
2301 0.15857743607577454
2401 0.15857814042739268
2501 0.15857878842787806
2601 0.15857938657957615
Disclaimer: I maintain gmpy2.
A rather brutal method is to compute all the factors and then mutliply in such a way that the result stays around 1.0 (Python 3.x):
def M(n):
return sum(summand(n, k) for k in range(1, n + 1))
def f1(n, k):
for i in range(k - 1):
yield k
for i in range(k):
yield n - i
def f2(n, k):
for i in range(k - 1):
yield 1 / n
for i in range(2 * n - k):
yield 1 - k / n
yield 1 / n
for i in range(2, k + 1):
yield 1 / i
def summand(n, k):
result = 1.0
factors1 = f1(n, k)
factors2 = f2(n, k)
while True:
empty1 = False
for factor in factors1:
result *= factor
if result > 1:
break
else:
empty1 = True
for factor in factors2:
result *= factor
if result < 1:
break
else:
if empty1:
break
return result
For M(1101) I get 0.15855899364641846, but it takes a few seconds. M(2000) takes about 14 seconds and yields 0.15857489065619598.
(I'm sure it can be optimised.)