optimization help to find max number of divisors - python

I'm trying to solve Project Euler problem 12: what is the value of the first triangle number to have over five hundred divisors? (The 7th triangle number, for example, is 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28.) This is my code, but it is not fast enough.
Do you have any optimization tips?
n = 0
a = 0
list = []
maxcount = 0
while True:
    n += 1
    a += n
    count = 0
    for x in range(1, int(a+1)):
        if a % x == 0:
            count += 1
    if count > maxcount:
        maxcount = count
        print a, "has", maxcount, "divisors"
Thank you!

Start by reducing the search space: there is no need to look at numbers that are not triangle numbers. Also try looking for divisors in range(1, sqrt(n)) instead of range(1, n); every divisor x below sqrt(n) pairs with the divisor n / x above it, so the small half of the range tells you everything.
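A minimal sketch of the square-root idea (count_divisors is a name made up for illustration; math.isqrt needs Python 3.8+):
from math import isqrt

def count_divisors(a):
    # Divisors pair up as (x, a // x), so testing up to sqrt(a) suffices.
    count = 0
    for x in range(1, isqrt(a) + 1):
        if a % x == 0:
            count += 2  # counts both x and a // x
    if isqrt(a) ** 2 == a:
        count -= 1  # perfect square: sqrt(a) was counted twice
    return count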

Grab the code from this question which implements very fast prime factorization:
Fast prime factorization module
Then use the answer to this question to convert your prime factors into a list of all divisors (the length is what you want):
What is the best way to get all the divisors of a number?
For example, you can add the following function (adapted from the second link) to the bottom of the module from the first link:
def alldivisors(n):
    factors = list(factorization(n).items())
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            if i >= nfactors:
                return
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
Then, to count divisors in your code, you would use len(list(alldivisors(a))), which calculates the number of divisors significantly more quickly than the brute-force method you are currently using.
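Note that if you only need the count, you can avoid generating the divisors at all: the number of divisors of n is the product of (exponent + 1) over its prime factorization. A hedged sketch, assuming the factorization function from the linked module returns a {prime: exponent} mapping as the code above implies:
def numdivisors(n):
    # d(n) = product over prime factors p**e of (e + 1)
    count = 1
    for prime, exponent in factorization(n).items():
        count *= exponent + 1
    return count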

Apart from number theory: try caching, and doing things the other way around. For example: when you already know that 300 has 18 divisors (and what they are), what does that mean for a number which is divisible by 300? Can you cache such information? (Sure you can.)
Pure Python speedup hacks won't help you; you need a better algorithm.
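For instance, one concrete better algorithm for this particular problem: the n-th triangle number is n(n+1)/2, and n and n+1 share no factors, so its divisor count splits into a product of divisor counts of two much smaller numbers. A rough sketch, assuming a count_divisors helper such as the one sketched in the first answer:
def first_triangle_over(target):
    # T(n) = n*(n+1)/2; since gcd(n, n+1) == 1, the divisor count is
    # d(n/2)*d(n+1) for even n, and d(n)*d((n+1)/2) for odd n.
    n = 1
    while True:
        if n % 2 == 0:
            count = count_divisors(n // 2) * count_divisors(n + 1)
        else:
            count = count_divisors(n) * count_divisors((n + 1) // 2)
        if count > target:
            return n * (n + 1) // 2
        n += 1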

Related

How to speed up search for abundant numbers?

Is there a way this code could be improved so that it would run faster? Currently, this task takes between 11 and 12 seconds to run in my virtual environment:
def divisors(n):
    return sum([x for x in range(1, (round(n/2))) if n % x == 0])

def abundant_numbers():
    return [x for x in range(1, 28123) if x < divisors(x)]

result = abundant_numbers()
Whenever you look for speeding up, you should first check whether the algorithm itself should change. And in this case it should.
Instead of looking for divisors given a number, look for numbers that divide by a divisor. For the latter you can use a sieve-like approach. That leads to this algorithm:
def abundant_numbers(n):
    # All numbers are strict multiples of 1, except 0 and 1
    divsums = [1] * n
    for div in range(2, n//2 + 1):  # Corrected end-of-range
        for i in range(2*div, n, div):
            divsums[i] += div  # Sum up divisors for number i
    divsums[0] = 0  # Make sure that 0 is not counted
    return [i for i, divsum in enumerate(divsums) if divsum > i]

result = abundant_numbers(28123)
This runs quite fast, many times faster than the translation of your algorithm to numpy.
Note that you had a bug in your code: round(n/2) as the range end can miss a divisor. It should be n//2 + 1.
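A quick illustration of that off-by-one (n = 4 is the smallest case where it bites):
n = 4
print(list(range(1, round(n / 2))))  # [1] -- misses the divisor 2
print(list(range(1, n // 2 + 1)))    # [1, 2] -- includes it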

Find nearest prime number python

I want to find the largest prime number within range(old_number + 1 , 2*old_number)
This is my code so far:
def get_nearest_prime(self, old_number):
    for num in range(old_number + 1, 2 * old_number):
        for i in range(2, num):
            if num % i == 0:
                break
    return num
When I call get_nearest_prime(13), the correct output should be 23, but my result was 25.
Can anyone help me solve this problem? Help will be appreciated!
There are lots of changes you could make, but which ones you should make depend on what you want to accomplish. The biggest problem with your code as it stands is that you're successfully identifying primes with the break and then not doing anything with that information. Here's a minimal change that does roughly the same thing.
def get_nearest_prime(old_number):
    largest_prime = 0
    for num in range(old_number + 1, 2 * old_number):
        for i in range(2, num):
            if num % i == 0:
                break
        else:
            largest_prime = num
    return largest_prime
We're using the largest_prime local variable to keep track of all the primes you find (since you iterate through them in increasing order). The else clause is triggered any time you exit the inner for loop "normally" (i.e., without hitting the break clause). In other words, any time you've found a prime.
Here's a slightly faster solution.
import numpy as np

def seive(n):
    mask = np.ones(n+1)
    mask[:2] = 0
    for i in range(2, int(n**.5)+1):
        if not mask[i]:
            continue
        mask[i*i::i] = 0
    return np.argwhere(mask)

def get_nearest_prime(old_number):
    try:
        n = np.max(seive(2*old_number-1))
        if n < old_number+1:
            return None
        return n
    except ValueError:
        return None
It does roughly the same thing, but it uses an algorithm called the "Sieve of Eratosthenes" to speed up the finding of primes (as opposed to the "trial division" you're using). It isn't the fastest Sieve in the world, but it's reasonably understandable without too many tweaks.
In either case, if you're calling this a bunch of times you'll probably want to keep track of all the primes you've found since computing them is expensive. Caching is easy and flexible in Python, and there are dozens of ways to make that happen if you do need the speed boost.
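For instance, a minimal sketch of one such cache, using functools.lru_cache to memoize a primality test (is_prime is a name chosen for illustration):
from functools import lru_cache

@lru_cache(maxsize=None)
def is_prime(num):
    # Trial division up to sqrt(num); repeated calls hit the cache.
    if num < 2:
        return False
    return all(num % i for i in range(2, int(num ** 0.5) + 1))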
Note that I'm not positive the range you've specified always contains a prime. It very well might, and if it does you can get away with a lot shorter code. Something like the following.
def get_nearest_prime(old_number):
    return np.max(seive(2*old_number-1))
I don't completely agree with the name you've chosen since the largest prime in that interval is usually not the closest prime to old_number, but I think this is what you're looking for anyway.
You can use all() to check whether a number is prime: all(i % n for n in range(2, i)) means the number is prime, because every value returned from the modulo was truthy (non-zero). From there you can append those values to a list called primes and then take the max of that list.
List comprehension:
num = 13
l = [*range(num, (2*num)+1)]
print(max([i for i in l if all([i % n for n in range(2,i)])]))
Expanded:
num = 13
l = [*range(num, (2*num)+1)]
primes = []
for i in l:
    if all([i % n for n in range(2, i)]):
        primes.append(i)
print(max(primes))
23
Search for the nearest prime number from above using the seive function:
def get_nearest_prime(old_number):
    return old_number + min(seive(2*old_number-1) - old_number, key=lambda a: a < 0)

What would be the best answer for this Fibonacci exercise in Python?

What's the best answer for this Fibonacci exercise in Python?
http://www.scipy-lectures.org/intro/language/functions.html#exercises
Exercise: Fibonacci sequence
Write a function that displays the n first terms of the Fibonacci
sequence, defined by:
u0 = 1; u1 = 1
u(n+2) = u(n+1) + un
If this were simply asking for Fibonacci code, I would write it like this:
def fibo_R(n):
    if n == 1 or n == 2:
        return 1
    return fibo_R(n-1) + fibo_R(n-2)

print(fibo_R(6))
... However, in this exercise, the initial conditions are both 1 and 1, and the calculation goes in the positive direction (+). I don't know how to set the end condition. I've searched for an answer, but I couldn't find any. How would you answer this?
Note that u_(n+2) = u_(n+1) + u_n is equivalent to u_n = u_(n-1) + u_(n-2), i.e. your previous code will still apply. Fibonacci numbers are by definition defined in terms of their predecessors, no matter how you phrase the problem.
A good approach to solve this is to define a generator which produces the elements of the Fibonacci sequence on demand:
def fibonacci():
    i = 1
    j = 1
    while True:
        yield i
        x = i + j
        i = j
        j = x
You can then take the first N items of the generator via e.g. itertools.islice, or you can use enumerate to keep track of how many numbers you have seen:
for i, x in enumerate(fibonacci()):
    if i >= n:
        break
    print x
Having a generator means that you can use the same code for solving many different problems (and quite efficiently, too), such as:
- getting the n-th Fibonacci number
- getting the first n Fibonacci numbers
- getting all Fibonacci numbers satisfying some predicate (e.g. all Fibonacci numbers lower than 100)
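For instance, a short sketch of the itertools.islice approach mentioned above, reusing the fibonacci() generator defined earlier (Python 3 syntax):
from itertools import islice, takewhile

print(list(islice(fibonacci(), 10)))                    # first 10 terms
print(list(takewhile(lambda x: x < 100, fibonacci())))  # all terms below 100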
The best way to calculate a Fibonacci sequence is to simply start at the beginning and loop until you have calculated the n-th number. Recursion produces far too many function calls, since it calculates the same numbers over and over again.
This function calculates the first n fibonacci numbers, stores them in a list and then prints them out:
def fibonacci(n):
    array = [1]
    a = 0
    b = 1
    if n == 1:
        print array
        return
    for i in range(n-1):
        fib = a + b
        a = b
        b = fib
        array.append(fib)
    print array
If you want a super memory-efficient solution, use a generator that only produces the next number on demand:
def fib_generator():
    e1, e2 = 0, 1
    while True:
        e1, e2 = e2, e1+e2
        yield e1

f = fib_generator()
print(next(f))
print(next(f))
print(next(f))

## dump the rest with a for-loop
for i in range(3, 50):
    print(next(f))
The recursive solution is the most elegant, but it is slow. Keiwan's loop is the fastest for a large number of elements.
Yes, definitely no globals as correctly observed by DSM. Thanks!
An alternative recursive solution, just to show that things can be done in slightly different ways:
def fib2(n): return n if n < 2 else fib2( n - 1 ) + fib2( n - 2 )
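If you like the recursive form but not its cost, memoizing it removes the repeated work. A sketch using functools.lru_cache (Python 3):
from functools import lru_cache

@lru_cache(maxsize=None)
def fib2_memo(n):
    # Same recurrence as fib2, but each value is computed only once.
    return n if n < 2 else fib2_memo(n - 1) + fib2_memo(n - 2)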

Project Euler 240: number of ways to roll dice

I 'm trying to solve Project Euler problem 240:
In how many ways can twenty 12-sided dice (sides numbered 1 to 12) be rolled so that the top ten sum to 70?
I've come up with code to solve this, but it really takes a lot of time to compute. I know this approach is pretty bad. Can someone suggest how I can fix this code to perform better?
import itertools

def check(a, b):  # check all the elements in list a are less than or equal to value b
    chk = 0
    for x in a:
        if x <= b:
            chk = 1
    return chk

lst = []
count = 0
for x in itertools.product(range(1,13), repeat=20):
    a = sorted([x[y] for y in range(20)])
    if sum(a[-10:]) == 70 and check(a[:10], min(a[-10:])):
        count += 1
The code below is for the smaller example defined in the problem description (five 6-sided dice, top three summing to 15). It works perfectly and gives the exact solution...
import itertools

def check(a, b):
    chk = 1
    for x in a:
        if x > b:
            chk = 0
            break
    return chk

count = 0
for x in itertools.product(range(1,7), repeat=5):
    a = sorted([x[y] for y in range(5)])
    if sum(a[-3:]) == 15 and check(a[:2], min(a[-3:])):
        count += 1
It's no good iterating over all possibilities, because there are 12^20 = 3833759992447475122176 ways to roll 20 twelve-sided dice, and at, say, a million rolls per second, that would take millions of years to complete.
The way to solve this kind of problem is to use dynamic programming. Find some way to split up your problem into the sum of several smaller problems, and build up a table of the answers to these sub-problems until you can compute the result you need.
For example, let T(n, d, k, t) be the number of ways to roll n d-sided dice so that the top k of them sum to t. How can we split this up into sub-problems? Well, we could consider the number of dice, i, that roll d exactly. There are C(n, i) ways to choose these i dice, and T(n − i, d − 1, ...) ways to choose the n − i remaining dice, which must roll at most d − 1. (For some suitable choice of parameters for k and t which I've elided.)
Take the product of these, and sum it up for all suitable values of i and you're done. (Well, not quite done: you have to specify the base cases, but that should be easy.)
The number of sub-problems that you need to compute will be at most (n + 1)(d + 1)(k + 1)(t + 1), which in the Project Euler case (n = 20, d = 12, k = 10, t = 70) is at most 213213. (In practice, it's much less than this, because many branches of the tree reach base cases quickly: in my implementation it turns out that the answers to just 791 sub-problems are sufficient to compute the answer.)
To write a dynamic program, it's usually easiest to express it recursively and use memoization to avoid re-computing the answer to sub-problems. In Python you could use the @functools.lru_cache decorator.
So the skeleton of your program could look like this. I've replaced the crucial details by ??? so as not to deprive you of the pleasure of working it out for yourself. Work with small examples (e.g. "two 6-sided dice, the top 1 of which sums to 6") to check that your logic is correct, before trying bigger cases.
from functools import lru_cache

def combinations(n, k):
    """Return C(n, k), the number of combinations of k out of n."""
    c = 1
    k = min(k, n - k)
    for i in range(1, k + 1):
        c *= (n - k + i)
        c //= i
    return c

@lru_cache(maxsize=None)
def T(n, d, k, t):
    """Return the number of ways n distinguishable d-sided dice can be
    rolled so that the top k dice sum to t.
    """
    # Base cases
    if ???: return 1
    if ???: return 0
    # Divide and conquer. Let N be the maximum number of dice that
    # can roll exactly d.
    N = ???
    return sum(combinations(n, i)
               * T(n - i, d - 1, ???)
               for i in range(N + 1))
With appropriate choices for all the ???, this answers the Project Euler problem in a few milliseconds:
>>> from timeit import timeit
>>> timeit(lambda:T(20, 12, 10, 70), number=1)
0.008017531014047563
>>> T.cache_info()
CacheInfo(hits=1844, misses=791, maxsize=None, currsize=791)
This solution should work, though I'm not sure how long it will take on your system.
from itertools import product

lg = (p for p in product(xrange(1,13,1), repeat=10) if sum(p) == 70)
results = {}
for l in lg:
    results[l] = [p for p in product(xrange(1,min(l),1), repeat=10)]
What it does is create the "top ten" first, then add to each "top ten" a list of the possible "next ten" items, where the max value is capped at the minimum item in the "top ten".
results is a dict where the key is the "top ten" and the value is a list of the possible "next ten".
The solution (the number of combinations that fit the requirements) would be to count the number of lists in the whole results dict, like this:
count = 0
for k, v in results.items():
    count += len(v)
and then count will be the result.
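The same count can be written more compactly as a generator expression, which is equivalent:
count = sum(len(v) for v in results.values())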
Update
Okay, I have thought of a slightly better way of doing this.
from itertools import product
import math

def calc_ways(dice, sides, top, total):
    top_dice = (p for p in product(xrange(1,sides+1,1), repeat=top) if sum(p) == total)
    n_count = dict((n, math.pow(n, dice-top)) for n in xrange(1,sides+1,1))
    count = 0
    for l in top_dice:
        count += n_count[min(l)]
    return count
Since I'm only counting the length of the "next ten", I figured I would just pre-calculate the number of options for each 'lowest' number in the "top ten", so I created a dictionary which does that. The above code will run much more smoothly, as it is comprised only of a small dictionary, a counter, and a generator. As you can imagine, it will probably still take much time... but I ran it for the first 1 million results in under a minute, so I'm sure it's within the feasible range.
good luck :)
Update 2
After another comment of yours, I understood what I was doing wrong and tried to correct it.
from itertools import product, combinations_with_replacement, permutations
import math

def calc_ways(dice, sides, top, total):
    top_dice = (p for p in product(xrange(1,sides+1,1), repeat=top) if sum(p) == total)
    n_dice = dice - top
    n_sets = len(set([p for p in permutations(range(n_dice)+['x']*top)]))
    n_count = dict((n, n_sets*len([p for p in combinations_with_replacement(range(1,n+1,1), n_dice)])) for n in xrange(1,sides+1,1))
    count = 0
    for l in top_dice:
        count += n_count[min(l)]
    return count
As you can imagine, it is quite a disaster, and it does not even give the right answer. I think I am going to leave this one to the mathematicians, since my way of solving this would simply be:
def calc_ways1(dice, sides, top, total):
    return len([p for p in product(xrange(1,sides+1,1), repeat=dice) if sum(sorted(p)[-top:]) == total])
which is an elegant one-line solution that provides the right answer for calc_ways1(5,6,3,15), but takes forever for the calc_ways1(20,12,10,70) problem.
Anyway, math sure seems like the way to go on this, not my silly ideas.

Handling memory usage for big calculation in python

I am trying to do some calculations in Python, where I ran out of memory. Therefore, I want to read/write a file in order to free memory. I need something like a very big list object, so I thought of writing one line per object in the file and reading/writing to those lines instead of to memory. Line ordering is important for me, since I will use line numbers as an index. So I was wondering how I can replace lines in Python without moving the other lines around (actually, it is fine to move lines, as long as they return to where I expect them to be).
Edit
I am trying to help a friend who is worse than (or equal to) me at Python. This code is supposed to find the biggest prime number that divides a given non-prime number. It works for numbers up to around 1 million, but beyond that my memory gets exhausted while trying to build the numbers list.
# a comes from a user input
primes_upper_limit = (a+1) / 2
counter = 3L
numbers = list()
while counter <= primes_upper_limit:
    numbers.append(counter)
    counter += 2L

counter = 3
i = 0
half = (primes_upper_limit + 1) / 2 - 1
root = primes_upper_limit ** 0.5
while counter < root:
    if numbers[i]:
        j = int((counter*counter - 3) / 2)
        numbers[j] = 0
        while j < half:
            numbers[j] = 0
            j += counter
    i += 1
    counter = 2*i + 3

primes = [2] + [num for num in numbers if num]
for numb in reversed(primes):
    if a % numb == 0:
        print numb
        break
Another Edit
What about writing a different file for each index? For example, a billion files with long-integer filenames, and just a number inside each file?
You want to find the largest prime divisor of a. (Project Euler Question 3)
Your current choice of algorithm and implementation does this by:
1. Generate a list numbers of all candidate primes in range (3 <= n <= sqrt(a), or (a+1)/2 as you currently do)
2. Sieve the numbers list to get a list of primes {p} <= sqrt(a)
3. Trial division: test the divisibility of a by each p. Store all prime divisors {q} of a.
4. Print all divisors {q}; we only want the largest.
My comments on this algorithm are below. Sieving and trial division are seriously not scalable algorithms, as Owen and I comment. For large a (billion, or trillion) you really should use NumPy. Anyway some comments on implementing this algorithm:
Did you know you only need to test up to √a, int(math.sqrt(a)), not (a+1)/2 as you do?
There is no need to build a huge list of candidate numbers and then sieve it for primeness - the numbers list is not scalable. Just construct the list primes directly. You can use while/for loops and xrange(3, sqrt(a)+2, 2) (which gives you an iterator). As you mention, xrange() overflows at 2**31, but combined with the sqrt observation you can still successfully factor up to 2**62.
In general this is inferior to getting the prime decomposition of a, i.e. every time you find a prime divisor p | a, you only need to continue sieving the remaining factor a/p, or a/p², or a/p³, or whatever. Except for the rare case of very large primes (or pseudoprimes), this will greatly reduce the magnitude of the numbers you are working with.
Also, you only ever need to generate the list of primes {p} once; thereafter store it and do lookups, not regenerate it.
So I would separate out generate_primes(a) from find_largest_prime_divisor(a). Decomposition helps greatly.
Here is my rewrite of your code, but performance still falls off in the billions (a > 10**11 +1) due to keeping the sieved list. We can use collections.deque instead of list for primes, to get a faster O(1) append() operation, but that's a minor optimization.
# Prime Factorization by trial division
from math import ceil, sqrt
from collections import deque

# Global list of primes (strictly we should use a class variable not a global)
#primes = deque()
primes = []

def is_prime(n):
    """Test whether n is divisible by any prime known so far"""
    global primes
    for p in primes:
        if n%p == 0:
            return False  # n was divisible by p
    return True  # either n is prime, or divisible by some p larger than our list

def generate_primes(a):
    """Generate sieved list of primes (up to sqrt(a)) as we go"""
    global primes
    primes_upper_limit = int(sqrt(a))
    # We get huge speedup by using xrange() instead of range(), so we have to seed the list with 2
    primes.append(2)
    print "Generating sieved list of primes up to", primes_upper_limit, "...",
    # Consider prime candidates 2,3,5,7... in increasing increments of 2
    #for number in [2] + range(3,primes_upper_limit+2,2):
    for number in xrange(3,primes_upper_limit+2,2):
        if is_prime(number):  # use global 'primes'
            #print "Found new prime", number
            primes.append(number)  # Found a new prime larger than our list
    print "done"

def find_largest_prime_factor(x, debug=False):
    """Find all prime factors of x, and return the largest."""
    global primes
    # First we need the list of all primes <= sqrt(x)
    generate_primes(x)
    to_factor = x  # running value of the remaining quantity we need to factor
    largest_prime_factor = None
    for p in primes:
        if debug: print "Testing divisibility by", p
        if to_factor%p != 0:
            continue
        if debug: print "...yes it is"
        largest_prime_factor = p
        # Divide out all factors of p in x (may have multiplicity)
        while to_factor%p == 0:
            to_factor /= p
        # Stop when all factors have been found
        if to_factor == 1:
            break
    else:
        print "Tested all primes up to sqrt(a), remaining factor must be a single prime > sqrt(a) :", to_factor
    print "\nLargest prime factor of x is", largest_prime_factor
    return largest_prime_factor
If I'm understanding you correctly, this is not an easy task. The way I interpreted it, you want to keep a file handle open and use the file as a place to store character data.
Say you had a file like,
a
b
c
and you wanted to replace 'b' with 'bb'. That's going to be a pain, because the file actually looks like a\nb\nc -- you can't just overwrite the b, you need another byte.
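If you really do need in-place replacement, the usual workaround is fixed-width records: pad every value to the same number of bytes so a record can be overwritten with seek() without shifting its neighbours. A hedged sketch (the 20-byte record width and the helper names are arbitrary choices for illustration; the file must be opened in 'rb+' mode):
RECORD = 20  # bytes per record, including the trailing newline

def write_record(f, index, value):
    # Every record occupies exactly RECORD bytes, so nothing ever shifts.
    f.seek(index * RECORD)
    f.write(str(value).rjust(RECORD - 1).encode() + b'\n')

def read_record(f, index):
    f.seek(index * RECORD)
    return int(f.read(RECORD))  # int() ignores the padding and newline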
Still, my advice would be to try to find a way to make your algorithm work without using a file for extra storage. If you got a stack overflow, chances are you didn't really run out of memory; you overran the call stack, which is much smaller.
You could try reworking your algorithm to not be recursive. Sometimes you can use a list to substitute for the call stack -- but there are many things you could do, and I don't think I could give much general advice without seeing your algorithm.
Edit
Ah, I see what you mean... when the list
while counter <= primes_upper_limit:
    numbers.append(counter)
    counter += 2L
grows really big, you could run out of memory. So I guess you're basically doing a sieve, and that's why you have the big list numbers? It makes sense. If you want to keep doing it this way, you could try a numpy bool array, because it will use substantially less memory per cell:
import numpy as np
numbers = np.repeat(True, a/2)
Or (and maybe this is not appealing) you could go with an entirely different approach that doesn't use a big list, such as factoring the number entirely and picking the biggest factor.
Something like:
factors = []
tail = a
while tail > 1:
    j = 2
    while 1:
        if tail % j == 0:
            factors.append(j)
            tail = tail / j
            print('%s %s' % (factors, tail))
            break
        else:
            j += 1
I.e., say you were factoring 20: tail starts out as 20, then you find 2 and tail becomes 10, then it becomes 5.
This is not terribly efficient and will become way too slow for a large prime number (in the billions), but it's fine for numbers with small factors.
I mean, your sieve is good too, until you start running out of memory ;). You could give numpy a shot.
pytables is excellent for working with and storing huge amounts of data. But first, start by implementing the suggestions in smci's answer to minimize the amount of numbers you need to store.
For a number with only twelve digits, as in Project Euler #3, no fancy integer factorization method is needed, and there is no need to store intermediate results on disk. Use this algorithm to find the factors of n:
1. Set f = 2.
2. If n = 1, stop.
3. If f * f > n, print n and stop.
4. Divide n by f, keeping both the quotient q and the remainder r.
5. If r = 0, print f, set n = q, and go to Step 2.
6. Otherwise, increase f by 1 and go to Step 3.
This just does trial division by every integer until it reaches the square root, which indicates that the remaining cofactor is prime. Each factor is printed as it is found.
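A direct translation of those steps into Python, as a sketch:
def print_factors(n):
    f = 2
    while n > 1:
        if f * f > n:
            print(n)  # remaining cofactor is prime (Step 3)
            break
        q, r = divmod(n, f)  # Step 4
        if r == 0:
            print(f)  # f divides n (Step 5)
            n = q
        else:
            f += 1  # Step 6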
