I can't find where I went wrong :( - python

I was working on Project Euler question 23 with Python. For this question, I have to find the sum of all numbers < 28124 that cannot be written as the sum of two abundant numbers. Abundant numbers are numbers that are smaller than the sum of their own proper divisors.
My approach was: https://gist.github.com/anonymous/373f23098aeb5fea3b12fdc45142e8f7
from math import sqrt

def dSum(n):  # find sum of proper divisors
    lst = set([])
    if n % 2 == 0:
        step = 1
    else:
        step = 2
    for i in range(1, int(sqrt(n)) + 1, step):
        if n % i == 0:
            lst.add(i)
            lst.add(int(n / i))
    llst = list(lst)
    lst.remove(n)
    sum = 0
    for j in lst:
        sum += j
    return sum
# any numbers greater than 28123 can be written as the sum of two abundant numbers.
# thus, only have to find abundant numbers up to 28124 / 2 = 14062
abnum = []  # list of abundant numbers
sum = 0
can = set([])

for i in range(1, 14062):
    if i < dSum(i):
        abnum.append(i)

for i in abnum:
    for j in abnum:
        can.add(i + j)

print(abnum)
print(can)

cannot = set(range(1, 28124))
cannot = cannot - can
cannot = list(cannot)
cannot.sort()

result = 0
print(cannot)
for i in cannot:
    result += i

print(result)
This gave me an answer of 31531501, which is wrong.
I googled the answer and it should be 4179871.
There's roughly a 27 million difference between the answers, so I must be keeping numbers that actually can be written as the sum of two abundant numbers. But when I re-read the code it looks fine logically...
Please save me from this despair.

Just for the experience, you really should look at comprehensions and at leveraging the builtins (vs. hiding them):
Your loops outside of dSum() (which can also be simplified) could look like:
import itertools as it
abnum = [i for i in range(1,28124) if i < dSum(i)]
can = {i+j for i, j in it.product(abnum, repeat=2)}
cannot = set(range(1,28124)) - can
print(sum(cannot)) # 4179871

There are a few ways to improve your code.
Firstly, here's a more compact version of dSum that's fairly close to your code. Operators are generally faster than function calls, so I use ** .5 instead of calling math.sqrt. I use a conditional expression instead of an if...else block to compute the step size. I use the built-in sum function instead of a for loop to add up the divisors; also, I use integer subtraction to remove n from the total because that's more efficient than calling the set.remove method.
def dSum(n):
    lst = set()
    for i in range(1, int(n ** .5) + 1, 2 if n % 2 else 1):
        if n % i == 0:
            lst.add(i)
            lst.add(n // i)
    return sum(lst) - n
However, we don't really need to use a set here. We can just add the divisor pairs as we find them, if we're careful not to add any divisor twice.
def dSum(n):
    total = 0
    for i in range(1, int(n ** .5) + 1, 2 if n % 2 else 1):
        if n % i == 0:
            j = n // i
            if i < j:
                total += i + j
            else:
                if i == j:
                    total += i
                break
    return total - n
This is slightly faster, and uses less RAM, at the expense of added code complexity. However, there's a more efficient approach to this problem.
Instead of finding the divisors (and hence the divisor sum) of each number individually, it's better to use a sieving approach that finds the divisors of all the numbers in the required range. Here's a simple example.
num = 28124

# Build a table of divisor sums.
table = [1] * num
for i in range(2, num):
    for j in range(2 * i, num, i):
        table[j] += i

# Collect abundant numbers
abnum = [i for i in range(2, num) if i < table[i]]
print(len(abnum), abnum[0], abnum[-1])
output
6965 12 28122
If we need to find divisor sums for a very large num, a good approach is to find the prime power factors of each number, since there's an efficient way to compute the sum of the divisors from the prime power factorization. However, for numbers this small the minor time saving doesn't warrant the extra code complexity. (But I can add some prime power sieve code if you're curious; for finding divisor sums for all numbers < 28124, the prime power sieve technique is about twice as fast as the above code).
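For reference, here is a rough sketch of that prime power sieve idea (my own illustration, not the code referred to above): sieve the smallest prime factor of every number, then assemble each divisor sum from its prime-power factorization using 1 + p + ... + p**k = (p**(k+1) - 1) / (p - 1). Abundant numbers are then the i with sums[i] - i > i, the same test as with the table above.
# Rough sketch, assuming num >= 2 (e.g. num = 28124).
def divisor_sums(num):
    spf = list(range(num))                  # spf[i] = smallest prime factor of i
    for i in range(2, int(num ** .5) + 1):
        if spf[i] == i:                     # i is prime
            for j in range(i * i, num, i):
                if spf[j] == j:
                    spf[j] = i
    sums = [0, 1] + [0] * (num - 2)         # sums[n] = sum of all divisors of n
    for n in range(2, num):
        m, total = n, 1
        while m > 1:
            p, pk = spf[m], 1
            while m % p == 0:
                m //= p
                pk *= p
            total *= (pk * p - 1) // (p - 1)    # 1 + p + ... + p**k
        sums[n] = total
    return sums                             # proper-divisor sum of n is sums[n] - n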
AChampion's answer shows a very compact way to find the sum of the numbers that cannot be written as the sum of two abundant numbers. However, it's a bit slow, mostly because it loops over all pairs of abundant numbers in abnum. Here's a faster way.
def main():
    num = 28124
    # Build a table of divisor sums. table[0] should be 0, but we ignore it.
    table = [1] * num
    for i in range(2, num):
        for j in range(2 * i, num, i):
            table[j] += i

    # Collect abundant numbers
    abnum = [i for i in range(2, num) if i < table[i]]
    del table

    # Create a set for fast searching
    abset = set(abnum)
    print(len(abset), abnum[0], abnum[-1])

    total = 0
    for i in range(1, num):
        # Search for pairs of abundant numbers j <= d: j + d == i
        for j in abnum:
            d = i - j
            if d < j:
                # No pairs were found
                total += i
                break
            if d in abset:
                break

    print(total)

if __name__ == "__main__":
    main()
output
6965 12 28122
4179871
This code runs in around 2.7 seconds on my old 32-bit single-core 2GHz machine running Python 3.6.0. On Python 2, it's about 10% faster; I think that's because list comprehensions have less overhead in Python 2 (they run in the current scope rather than creating a new scope).

Related

Project Euler #23 efficiency

I am trying to write a program in python to answer the following problem:
A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.
A number n is called deficient if the sum of its proper divisors is less
than n and it is called abundant if this sum exceeds n.
As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24.
By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis, even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.
Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.
So here is my code which should theoretically work but which is way too slow.
import math
import time

l = 28123
abondant = []

def listNumbers():
    for i in range(1, l):
        s = 0
        for k in range(1, int(i / 2) + 1):
            if i % k == 0:
                s += k
        if s > i:
            abondant.append(i)

def check(nb):
    for a in abondant:
        for b in abondant:
            if a + b == nb:
                return False
    return True

def main():
    abondant_sum = 0
    for i in range(12, l):
        if check(i):
            abondant_sum += i
    return abondant_sum

start = time.time()
listNumbers()
print(main())
end = time.time()
print("the program took ", end - start, " s")
How can I make my program more efficient?
Checking divisors up to half of the number and summing up everything that passes is very inefficient.
Try changing
for k in range(1, int(i / 2) + 1):
    if i % k == 0:
        s += k
to
for k in range(1, int(i**0.5) + 1):
    if i % k == 0:
        s += k
        if k != 1 and k != i//k:  # skip the complement of 1, which is i itself
            s += i//k
The thing is that you make a double iteration over "abondant" in the check function, which you call 28111 times.
It would be much more efficient to compute a set of all the sums a+b once and then check if your number is in it.
Something like:
def get_combinations():
    return set(a+b for a in abondant for b in abondant)
And then maybe for the main function:
def main():
    combinations = get_combinations()
    non_abondant = filter(lambda nb: nb not in combinations, range(12, l))
    return sum(non_abondant)
Once you have the list of abundant numbers you can make a list result = [False] * 28124 and then
for a in abondant:
    for b in abondant:
        result[min(a+b, 28123)] = True
Then
l = []
for i in range(len(result)):
    if not result[i]:
        l.append(i)
print(l)

Most efficient way to filter prime numbers from a list of random numbers in Python

I have a list filled with random numbers and I want to return the prime numbers from this list. So I created these functions:
from math import sqrt

def is_prime(number):
    for i in range(2, int(sqrt(number)) + 1):
        if number % i == 0:
            return False
    return number > 1
And
def filter_primes(general_list):
    return set(filter(is_prime, general_list))
But I want to improve performance, so how can I achieve this?
Sieve of Eratosthenes, taking about 0.17 seconds for primes under 10 million on PyPy 3.5 on my device:
from array import array
from math import isqrt

def primes(upper):
    numbers = array('B', [1]) * (upper + 1)
    for i in range(2, isqrt(upper) + 1):
        if numbers[i]:
            low_multiple = i * i
            numbers[low_multiple:upper + 1:i] = array('B', [0]) * ((upper - low_multiple) // i + 1)
    return {i for i, x in enumerate(numbers) if i >= 2 and x}
and the filter function:
filter_primes = primes(10_000_000).intersection
Three rounds of the Miller-Rabin test ( https://en.wikipedia.org/wiki/Miller%2dRabin_primality_test ) using bases 2, 7, and 61 are known to correctly classify all numbers <= 32 bits, i.e., anything that fits into a 32-bit unsigned integer.
This is much, much faster than trial division or sieving if the numbers can be large.
If the numbers cannot be large (i.e., < 10,000,000 as you suggest in comments), then you may want to precompute the set of all primes < 10,000,000, but there are over 600,000 of those.
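For reference, here is a minimal sketch of that deterministic test (my code, not the answerer's; it assumes Python 3 and uses the built-in three-argument pow for modular exponentiation):
# Sketch: deterministic Miller-Rabin for n < 2**32 using bases 2, 7 and 61.
def is_prime_32(n):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 61):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in (2, 7, 61):
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False        # a witnesses that n is composite
    return True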
How about this? I think it's a little better:
def filter_primes(general_list):
    return filter(is_prime, set(general_list))
This way we don't call is_prime() for the same number multiple times.
The Sieve of Eratosthenes is more efficient than Trial Division, the method you are using.
Your trial division loop can be made more efficient, taking about half the time. Two is the only even prime number, so treat two as a special case and only deal with odd numbers thereafter, which will halve the work.
My Python is non-existent, but this pseudocode should make things clear:
def isPrime(num)
    // Low numbers.
    if (num <= 1)
        return false
    end if

    // Even numbers
    if (num % 2 == 0)
        return (num == 2)  // 2 is the only even prime.
    end if

    // Odd numbers
    for (i = 3 to sqrt(num) + 1 step 2)
        if (num % i == 0)
            return false
        end if
    end for

    // If we reach here, num is prime.
    return true;
end def
That step 2 in the for loop is what halves the work. Having earlier eliminated all even numbers you only need to test with odd trial divisors: 3, 5, 7, ...
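For completeness, a direct Python rendering of that pseudocode (my translation, not part of the original answer):
def is_prime(num):
    # Low numbers.
    if num <= 1:
        return False
    # Even numbers: 2 is the only even prime.
    if num % 2 == 0:
        return num == 2
    # Odd trial divisors 3, 5, 7, ... up to sqrt(num).
    for i in range(3, int(num ** 0.5) + 1, 2):
        if num % i == 0:
            return False
    return True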
def primes_list(num_list):
    divs = [2, 3, 5, 7]
    primes = [x for x in set(num_list) if 0 not in {1 if ((x % i != 0) | (x in divs)) & (x > 0) else 0 for i in divs}]
    return primes
For this function, it takes a list, num_list, as a parameter. divs is a predefined, or rather hard coded, list of prime numbers less than 10 excluding 1. Then we use list comprehension to filter num_list for prime numbers as the variable primes.
This is one more flavour of code to find the prime numbers in a range, the simple and easy way.
def find_prime(n):
    if n <= 1:
        return False
    else:
        for i in range(2, n):
            if n % i == 0:
                return False
        return True

n = 10
x = filter(find_prime, range(n))  # you can give a random number list too
print(list(x))
def is_prime(n):
    if n > 1:
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                return False
        return True
    else:
        return False

print([x for x in general_list if is_prime(x)])
Would you try this? There is no need for the filter at all, and you could simply apply set() to the general_list if there are duplicate elements in the list, to optimize it more.

Divisors of a number: How can I improve this code to find the number of divisors of big numbers?

My code is very slow when it comes to very large numbers.
def divisors(num):
    divs = 1
    if num == 1:
        return 1
    for i in range(1, int(num/2)):
        if num % i == 0:
            divs += 1
        elif int(num/2) == i:
            break
        else:
            pass
    return divs
For 10^9 I get a run time of 381.63 s.
Here is an approach that determines the multiplicities of the various distinct prime factors of n. Each such power, k, contributes a factor of k+1 to the total number of divisors.
import math

def smallest_divisor(p, n):
    # returns the smallest divisor of n which is greater than p
    for d in range(p + 1, 1 + math.ceil(math.sqrt(n))):
        if n % d == 0:
            return d
    return n

def divisors(n):
    divs = 1
    p = 1
    while p < n:
        p = smallest_divisor(p, n)
        k = 0
        while n % p == 0:
            k += 1
            n //= p
        divs *= (k + 1)
    return divs - 1
It returns the number of proper divisors (so not counting the number itself). If you want to count the number itself, don't subtract 1 from the result.
It works rapidly with numbers of the size 10**9, though will slow down dramatically with even larger numbers.
Division is expensive, multiplication is cheap.
Factorize the number into primes. (Download a list of primes, and keep dividing by them up to sqrt(num).)
Then count all the combinations of prime powers.
If your number is a power of exactly one prime, p^n, you obviously have n divisors for it, excluding 1; 8 = 2^3 has 3 divisors: 8, 4, 2.
In the general case, your number factors into k primes: p0^n0 * p1^n1 * ... * pk^nk. It has (n0 + 1) * (n1 + 1) * ... * (nk + 1) divisors. The "+1" term accounts for the case where a prime's power is 0, that is, it contributes a factor of 1 to the multiplication.
Alternatively, you could just google and RTFM.
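As a quick sketch of that counting formula (names are mine, just for illustration):
# d(n) = (n0 + 1) * (n1 + 1) * ... * (nk + 1), counting both 1 and n as divisors.
def count_divisors(num):
    divs = 1
    d = 2
    while d * d <= num:
        k = 0
        while num % d == 0:
            num //= d
            k += 1
        divs *= k + 1
        d += 1
    if num > 1:         # one prime factor larger than the square root is left over
        divs *= 2
    return divs

# Example: 12 = 2**2 * 3**1, so it has (2 + 1) * (1 + 1) = 6 divisors.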
Here is an improved version of my code in the question. The running time is better, 0.008s for 10^9 now.
from math import sqrt

def divisors(num):
    ceiling = int(sqrt(num))
    divs = []
    if num == 1:
        return [1]
    for i in range(1, ceiling + 1):
        if num % i == 0:
            divs.append(num // i)
            if i != num // i:
                divs.append(i)
    return divs
It is important for me to also keep the divisors, so if this can still be improved I'd be glad.
Consider this:
import math

def num_of_divisors(n):
    ct = 1
    rest = n
    for i in range(2, int(math.ceil(math.sqrt(n))) + 1):
        while rest % i == 0:
            ct += 1
            rest /= i
            print i  # the factors
        if rest == 1:
            break
    if rest != 1:
        print rest  # the last factor
        ct += 1
    return ct

def main():
    number = 2**31 * 3**13
    print '{} divisors in {}'.format(num_of_divisors(number), number)

if __name__ == '__main__':
    main()
We can stop searching for factors at the square root of n. Multiple factors are found in the while loop. And when a factor is found we divide it out from the number.
edit:
@Mark Ransom is right, the factor count was 1 too small for numbers where one factor was greater than the square root of the number, for instance 3*47*149*6991. The last check for rest != 1 accounts for that.
The number of factors is indeed correct then - you don't have to check beyond sqrt(n) for this.
Both statements where a number is printed could also be used to append that number to a list of factors, if desired.

How would you implement a divisor function?

The divisor function is the sum of divisors of a natural number.
Doing a little research, I found this to be a very good method if you want to find the divisor function of a given natural number N, so I tried to code it in Python:
def divisor_function(n):
    "Returns the sum of divisors of n"
    checked = [False] * 100000
    factors = prime_factors(n)
    sum_of_divisors = 1  # It's = 1 because it will be the result of a product
    for x in factors:
        if checked[x]:
            continue
        else:
            count = factors.count(x)
            tmp = (x**(count+1) - 1) // (x-1)
            sum_of_divisors *= tmp
            checked[x] = True
    return sum_of_divisors
It works pretty well, but I am sure that it can be improved (e.g. I create a list with 100000 elements, but I am not using most of them).
How would you improve/implement it?
P.S. This is prime_factors:
def prime_factors(n):
    "Returns all the prime factors of a positive integer"
    factors = []
    d = 2
    while (n > 1):
        while (n % d == 0):
            factors.append(d)
            n /= d
        d = d + 1
        if (d*d > n):
            if (n > 1): factors.append(int(n))
            break
    return factors
When computing the sum of divisors, you need the factorization of n in the form p1^k1 * p2^k2 * ...; that is, you need the exponent of each prime in the factorization. At the moment you are doing this by computing a flat list of prime factors, and then calling count to work out the exponent. This is a waste of time because you can easily generate the prime factorization in the format you need in the first place, like this:
def factorization(n):
    """
    Generate the prime factorization of n in the form of pairs (p, k)
    where the prime p appears k times in the factorization.

    >>> list(factorization(1))
    []
    >>> list(factorization(24))
    [(2, 3), (3, 1)]
    >>> list(factorization(1001))
    [(7, 1), (11, 1), (13, 1)]
    """
    p = 1
    while p * p < n:
        p += 1
        k = 0
        while n % p == 0:
            k += 1
            n //= p
        if k:
            yield p, k
    if n != 1:
        yield n, 1
Notes on the code above:
I've transformed this code so that it generates the factorization, instead of constructing a list (by repeated calls to append) and returning it. In Python, this transformation is nearly always an improvement, because it allows you to consume elements one by one as they are generated, without having to store the whole sequence in memory.
This is the kind of function for which doctests work well.
Now computing the sum of divisors is really simple: there's no need to store the set of checked factors or to count the number of times each factor occurs. In fact you can do it in just one line:
from functools import reduce  # reduce is a builtin on Python 2
from operator import mul

def sum_of_divisors(n):
    """
    Return the sum of divisors of n.

    >>> sum_of_divisors(1)
    1
    >>> sum_of_divisors(33550336) // 2
    33550336
    """
    return reduce(mul, ((p**(k+1) - 1) // (p-1) for p, k in factorization(n)), 1)
You need to change two lines only:
def divisor_function(n):
    "Returns the sum of divisors of n"
    checked = {}
    factors = prime_factors(n)
    sum_of_divisors = 1  # It's = 1 because it will be the result of a product
    for x in factors:
        if checked.get(x, False):
            continue
        else:
            count = factors.count(x)
            tmp = (x**(count+1) - 1) // (x-1)
            sum_of_divisors *= tmp
            checked[x] = True
    return sum_of_divisors
why use dict or set - or count() - at all, when prime_factors() is guaranteed to return the factors in ascending order? You only ever deal with a previous factor. Counting can just be a part of iteration:
def divisor_function(n):
    "Returns the sum of divisors of n"
    factors = prime_factors(n)
    sum_of_divisors = 1
    count = 0; prev = 0
    for x in factors:
        if x == prev:
            count += 1
        else:
            if prev: sum_of_divisors *= (prev**(count+1) - 1) // (prev-1)
            count = 1; prev = x
    if prev: sum_of_divisors *= (prev**(count+1) - 1) // (prev-1)
    return sum_of_divisors
from math import sqrt

def sum_divisors(n):
    assert n > 0
    if n == 1:
        return 0
    sum = 1
    if n % 2 == 0:  # if n is even we need to go over even candidates as well
        i = 2
        while i < sqrt(n):
            if n % i == 0:
                sum = sum + i + (n//i)  # if i|n then n/i is an integer so we want to add it as well
            i = i + 1
        if sqrt(n).is_integer():  # if sqrt(n)|n we would like to avoid adding it twice
            sum = sum + int(sqrt(n))
    else:
        i = 3
        while i < sqrt(n):  # this loop only goes over odd numbers since 2 is not a factor
            if n % i == 0:
                sum = sum + i + (n//i)  # if i|n then n/i is an integer so we want to add it as well
            i = i + 2
        if sqrt(n).is_integer():  # if sqrt(n)|n we would like to avoid adding it twice
            sum = sum + int(sqrt(n))
    return sum
Here is what I do in my Java number utilities (extensively used for Project Euler):
Generate the prime factorization with explicit exponents (see the answer by Gareth Rees).
Unfold the prime factorization in the various functions based on it. I.e., use the same algorithm as for prime factorization but directly compute the desired value instead of storing the factors and exponents.
By default test only divisors two and odd numbers. I have methods firstDivisor(n) and nextDivisor(n,d) for that.
Optionally precompute a table of least divisors for all numbers below a bound. This is very useful if you need to factorize all or most numbers below the bound (improves speed by about sqrt(limit)). I hook the table into the firstDivisor(n) and nextDivisor(n,d) methods, so this doesn't change the factorization algorithms.
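In Python, points 2 and 3 might look roughly like this sketch (the names first_divisor and next_divisor mirror the Java methods described above; the bodies are my guess at the approach, not the actual utilities):
def first_divisor(n):
    # Test the divisor two first, then fall through to odd candidates.
    return 2 if n % 2 == 0 else next_divisor(n, 1)

def next_divisor(n, d):
    # Return the next divisor of n greater than d, testing only odd candidates;
    # if nothing divides n up to sqrt(n), n itself is prime.
    c = 3 if d < 3 else d + 2
    while c * c <= n:
        if n % c == 0:
            return c
        c += 2
    return n

# "Unfolding" the factorization directly into a divisor sum, without storing it:
def divisor_sum(n):
    total, d = 1, 0
    while n > 1:
        d = first_divisor(n) if d == 0 else next_divisor(n, d)
        pk = 1
        while n % d == 0:
            n //= d
            pk *= d
        total *= (pk * d - 1) // (d - 1)    # 1 + d + ... + d**k
    return total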

Python Eratosthenes Sieve Algorithm Optimization

I'm attempting to implement the Sieve of Eratosthenes. The output seems to be correct (minus "2" that needs to be added) but if the input to the function is larger than 100k or so it seems to take an inordinate amount of time. What are ways that I can optimize this function?
import math

def sieveErato(n):
    numberList = list(range(3, n, 2))
    for item in range(int(math.sqrt(len(numberList)))):
        divisor = numberList[item]
        for thing in numberList:
            if (thing % divisor == 0) and thing != divisor:
                numberList.remove(thing)
    return numberList
Your algorithm is not the Sieve of Eratosthenes. You perform trial division (the modulus operator) instead of crossing-off multiples, as Eratosthenes did over two thousand years ago. Here is an explanation of the true sieving algorithm, and shown below is my simple, straight forward implementation, which returns a list of primes not exceeding n:
def sieve(n):
    m = (n-1) // 2
    b = [True] * m
    i, p, ps = 0, 3, [2]
    while p*p < n:
        if b[i]:
            ps.append(p)
            j = 2*i*i + 6*i + 3
            while j < m:
                b[j] = False
                j = j + 2*i + 3
        i += 1; p += 2
    while i < m:
        if b[i]:
            ps.append(p)
        i += 1; p += 2
    return ps
We sieve only on the odd numbers, stopping at the square root of n. The odd-looking calculations on j map between the integers being sieved 3, 5, 7, 9, ... and indexes 0, 1, 2, 3, ... in the b array of bits: the value at index i is p = 2*i + 3, its square p*p sits at index (p*p - 3) / 2 = 2*i*i + 6*i + 3, and moving to the next odd multiple (adding 2*p to the value) adds p = 2*i + 3 to the index.
You can see this function in action at http://ideone.com/YTaMB, where it computes the primes to a million in less than a second.
You can try the same way Eratosthenes did. Take an array with all the numbers you need to check, in ascending order, go to number 2 and mark it. Now scratch out every second number after it, to the end of the array. Then go to 3 and mark it. After that, scratch out every third number. Then go to 4 - it is already scratched, so skip it. Repeat this for every n+1 which is not already scratched.
In the end, the marked numbers are the prime ones. This algorithm is faster, but sometimes needs lots of memory. You can optimize it a little by dropping all even numbers (because they are not prime) and adding 2 manually to the list. This will twist the logic a little, but will take half the memory.
Here is an illustration of what I'm talking about: http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
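In code, that marking and scratching could look like this minimal sketch (mine, just to illustrate the description above; it keeps the even numbers, without the halving optimization):
def sieve_primes(n):
    scratched = [False] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if not scratched[i]:
            primes.append(i)                  # i was never scratched, so it is prime
            for j in range(2 * i, n + 1, i):  # scratch out every multiple of i
                scratched[j] = True
    return primes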
Warning: removing elements from a list while iterating over it can be dangerous...
You could make the
if (thing % divisor == 0) and thing != divisor:
test lighter by splitting it: iterate only over the part of the list that starts after the index of divisor, so the loop body is just the modulus test:
for thing in numberList_fromDivisorOn:
    if thing % divisor == 0:
        numberList.remove(thing)
This code takes 2 seconds to generate the primes less than 10M (it is not mine, I found it somewhere on Google):
def erat_sieve(bound):
    if bound < 2:
        return []
    max_ndx = (bound - 1) // 2
    sieve = [True] * (max_ndx + 1)
    # loop up to square root
    for ndx in range(int(bound ** 0.5) // 2):
        # check for prime
        if sieve[ndx]:
            # unmark all odd multiples of the prime
            num = ndx * 2 + 3
            sieve[ndx+num:max_ndx:num] = [False] * ((max_ndx-ndx-num-1)//num + 1)
    # translate into numbers
    return [2] + [ndx * 2 + 3 for ndx in range(max_ndx) if sieve[ndx]]
I followed this link: Sieve of Eratosthenes - Finding Primes Python as suggested by @MAK, and I've found that the accepted answer could be improved with an idea I found in your code:
import math

def primes_sieve2(limit):
    a = [True] * limit  # Initialize the primality list
    a[0] = a[1] = False
    sqrt = int(math.sqrt(limit)) + 1
    for i in xrange(sqrt):
        isprime = a[i]
        if isprime:
            yield i
            for n in xrange(i*i, limit, i):  # Mark factors non-prime
                a[n] = False
    for (i, isprime) in enumerate(a[sqrt:]):
        if isprime:
            yield i + sqrt
If given unlimited memory and time, the following code will print all the prime numbers, and it'll do it without using trial division. It is based on the Haskell code in the paper The Genuine Sieve of Eratosthenes by Melissa E. O'Neill.
from heapq import heappush, heappop, heapreplace

def sieve():
    w = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
    for p in [2,3,5,7]: print p
    n,o = 11,0
    t = []
    l = len(w)
    p = n
    heappush(t, (p*p, n,o,p))
    print p
    while True:
        n,o = n+w[o],(o+1)%l
        p = n
        if not t[0][0] <= p:
            heappush(t, (p*p, n,o,p))
            print p
            continue
        while t[0][0] <= p:
            _, b,c,d = t[0]
            b,c = b+w[c],(c+1)%l
            heapreplace(t, (b*d, b,c,d))

sieve()
