Project Euler #641 Python 3.6 - Numpy

Project Euler #641 Python 3.6 - Numpy - python

I'm working on solve the below problem from Project Euler, which in short deals with iterating over 'n' dice and updating their values.
A Long Row of Dice - project Euler problem #641
Consider a row of n dice all showing 1.
First turn every second die,(2,4,6,…), so that the number showing is increased by 1. Then turn every third die. The sixth die will now show a 3. Then turn every fourth die and so on until every nth die (only the last die) is turned. If the die to be turned is showing a 6 then it is changed to show a 1.
Let f(n) be the number of dice that are showing a 1 when the process finishes. You are given f(100)=2 and f(10^8)=69.
Find f(10^36).
I've written the below code in Python using numpy, but can't exactly figure out what I'm doing wrong to my function output to match the output above. Right now f(100) returns 1 (should be 2); even f(1000) returns 1.
import numpy as np
def f(n):
# establish dice and the value sets for the dice
dice = np.arange(1, n + 1)
dice_values = np.ones(len(dice))
turns = range(2, len(dice) + 1)
print("{a} dice, {b} values, {c} runs to process".format(a=len(dice), b=len(dice_values), c=len(turns)))
# iterate and update the values of each die
# in our array of dice
for turn in turns:
# if die to be processed is 6, update to 1
dice_values[(dice_values == 6) & (dice % turn == 0)] = 1
# update dice_values to if the die's index has no remainder
# from the turn we're processing.
dice_values += dice % turn == 0
# output status
print('Processed every {0} dice'.format(turn))
print('{0}\n\n'.format(dice_values))
return "f({n}) = {x}".format(n=n, x=len(np.where(dice_values == 1)))
UPDATE 11/12/18
#Prune's guidance has been extremely helpful. My methodology is now as follows:
Find all the squares from 1 to n.
Find all squares with a number of factors which have a remainder of 1, when dividing by 6.
import numpy as np
# brute force to find number of factors for each n
def factors(n):
result = []
i = 1
# This will loop from 1 to int(sqrt(n))
while i * i <= n:
# Check if i divides x without leaving a remainder
if n % i == 0:
result.append(i)
if n / i != i:
result.append(n / i)
i += 1
# Return the list of factors of x
return len(result)
vect_factors = np.vectorize(factors)
# avoid brute forcing all numbers
def f(n):
# create an array of 1 to n + 1
# find all perfect squares in that range
dice = np.arange(1, n + 1)[(np.mod(np.sqrt(np.arange(1, n + 1)), 1) == 0)]
# find all squares which have n-factors, which
# when divided by 6 have a remainder of 1.
dice = dice[np.mod(vect_factors(dice), 6) == 1]
return len(dice)
Worth noting - on my machine, I'm unable to run larger than 10^10. While solving this would be ideal, I feel that what I've learned (and determined how to apply) in the process is enough for me.
UPDATE 11/13/2018
I'm continuing to spend a small bit of time trying to optimize this to get it processing more quickly. Here's the updated code base. This evaluates f(10**10) in 1 min and 17 seconds.
import time
from datetime import timedelta
import numpy as np
import math
from itertools import chain, cycle, accumulate
def find_squares(n):
return np.array([n ** 2 for n in np.arange(1, highest = math.sqrt(n) + 1)])
# brute force to find number of factors for each n
def find_factors(n):
def prime_powers(n):
# c goes through 2, 3, 5, then the infinite (6n+1, 6n+5) series
for c in accumulate(chain([2, 1, 2], cycle([2, 4]))):
if c * c > n: break
if n % c: continue
d, p = (), c
while not n % c:
n, p, d = n // c, p * c, d + (p,)
yield (d)
if n > 1: yield ((n,))
r = [1]
for e in prime_powers(n):
r += [a * b for a in r for b in e]
return len(r)
vect_factors = np.vectorize(find_factors)
# avoid brute forcing all numbers
def f(n):
# create an array of 1 to n + 1
# find all perfect squares in that range
start = time.time()
dice = find_squares(n)
# find all squares which have n-factors, which
# when divided by 6 have a remainder of 1.
dice = dice[np.mod(vect_factors(dice), 6) == 1]
diff = (timedelta(seconds=int(time.time() - start))).__str__()
print("{n} has {remain} dice with a value of 1. Computed in {diff}.".format(n=n, remain=len(dice), diff=diff))

I'm raising an x/y issue. Fixing your 6 => 1 flip will correct your code, but it will not solve the presented problem in reasonable time. To find f(10^36), you're processing 10^36 dice 10^36 times each, even if it's merely a divisibility check in the filter. That's a total of 10^72 checks. I don't know what hardware you have, but even my multi-core monster doesn't loop 10^72 times soon enough for comfort.
Instead, you need to figure out the underlying problem and try to generate a count for integers that fit the description.
The dice are merely a device to count something in mod 6. We're counting divisors of a number, including 1 and the number itself. This the (in)famous divisor function.
The problem at hand doesn't ask us to find σ0(n) for all numbers; it wants us to count how many integers have σ0(n) = 1 (mod 6). These are numbers with 1, 7, 13, 19, ... divisors.
First of all, note that this is an odd number. The only integers with an odd number of divisors are perfect squares. Look at the divisor function; how can we tell whether the square of a number will have the desired quantity of factors, 1 (mod 6)?
Does that get you moving?
WEEKEND UPDATE
My code to step through 10^18 candidates is still too slow to finish in this calendar year. It did well up to about 10^7 and then bogged down in the O(N log N) checking steps.
However, there are many more restrictions I've noted in my tracing output.
The main one is in characterizing what combinations of prime powers result in a solution. If we reduce each power mod 3, we have the following:
0 values do not affect validity of the result.
1 values make the number invalid.
2 values must be paired.
Also, these conditions are both necessary and sufficient to declare a given number as a solution. Therefore, it's possible to generate the desired solutions without bothering to step through the squares of all integers <= 10^18.
Among other things, we will need only primes up to 10^9: a solution's square root will need at least 2 of any prime factor.
I hope that's enough hints for now ... you'll need to construct an algorithm to generate certain restricted composite combinations with a given upper limit for the product.

As mentioned by Thierry in the comments, you are looping back to 2 when you flip dice at a 6. I'd suggest just changing dice_values[(dice_values == 6) & (dice % turn == 0)] = 1 to equal 0.
You also have an issue with return "f({n}) = {x}".format(n=n, x=len(np.where(dice_values == 1))) that I'd fix by replacing x=len(np.where(dice_values == 1)) with x=np.count_nonzero(dice_values == 1)
Doing both these changes gave me an output of f(100)=2

Related

Google foo.bar challenge "Hey I already did that", not passing all the test cases

This is the description of the problem I am trying to solve.
Hey, I Already Did That!
Commander Lambda uses an automated algorithm to assign minions randomly to tasks, in order to keep minions on their toes. But you've noticed a flaw in the algorithm -- it eventually loops back on itself, so that instead of assigning new minions as it iterates, it gets stuck in a cycle of values so that the same minions end up doing the same tasks over and over again. You think proving this to Commander Lambda will help you make a case for your next promotion.
You have worked out that the algorithm has the following process:
Start with a random minion ID n, which is a nonnegative integer of length k in base b
Define x and y as integers of length k. x has the digits of n in descending order, and y has the digits of n in ascending order
Define z = x - y. Add leading zeros to z to maintain length k if necessary
Assign n = z to get the next minion ID, and go back to step 2
For example, given minion ID n = 1211, k = 4, b = 10, then x = 2111, y = 1112 and z = 2111 - 1112 = 0999. Then the next minion ID will be n = 0999 and the algorithm iterates again: x = 9990, y = 0999 and z = 9990 - 0999 = 8991, and so on.
Depending on the values of n, k (derived from n), and b, at some point the algorithm reaches a cycle, such as by reaching a constant value. For example, starting with n = 210022, k = 6, b = 3, the algorithm will reach the cycle of values [210111, 122221, 102212] and it will stay in this cycle no matter how many times it continues iterating. Starting with n = 1211, the routine will reach the integer 6174, and since 7641 - 1467 is 6174, it will stay as that value no matter how many times it iterates.
Given a minion ID as a string n representing a nonnegative integer of length k in base b, where 2 <= k <= 9 and 2 <= b <= 10, write a function solution(n, b) which returns the length of the ending cycle of the algorithm above starting with n. For instance, in the example above, solution(210022, 3) would return 3, since iterating on 102212 would return to 210111 when done in base 3. If the algorithm reaches a constant, such as 0, then the length is 1.
My solution isn't passing 5 of the 10 test cases for the challenge. I don't understand if there's a problem with my code, as it's performing exactly as the problem asked to solve it, or if it's inefficient.
Here's my code for the problem. I have commented it for easier understanding.
def convert_to_any_base(num, b): # returns id after converting back to the original base as string
digits = []
while(num/b != 0):
digits.append(str(num % b))
num //= b
result = ''.join(digits[::-1])
return result
def solution(n, b):
minion_id_list = [] #list storing all occurrences of the minion id's
k = len(n)
while n not in minion_id_list: # until the minion id repeats
minion_id_list.append(n) # adds the id to the list
x = ''.join(sorted(n, reverse = True)) # gives x in descending order
y = x[::-1] # gives y in ascending order
if b == 10: # if number is already a decimal
n = str(int(x) - int(y)) # just calculate the difference
else:
n = int(x, b) - int(y, b) # else convert to decimal and, calculate difference
n = convert_to_any_base(n, b) # then convert it back to the given base
n = (k-len(n)) * '0' + n # adds the zeroes in front to maintain the id length
if int(n) == 0: # for the case that it reaches a constant, return 1
return 1
return len(minion_id_list[minion_id_list.index(n):]) # return length of the repeated id from
# first occurrence to the end of the list
I have been trying this problem for quite a while and still don't understand what's wrong with it. Any help will be appreciated.

How to refine number permutations for efficiency

I am working on a dice probability program and have been running into some efficiency issues in the permutation section when the numbers get big. For example, the perimeters I am required to run are 10 dice, with 10 sides, with an outcome of 50.
I require a total number of permutations to calculate the probability of the specified outcome given the number of dice and number of sides. The final_count(total, dice, faces) function lets the least number of combinations pass from the generator before moving into the perms(x) function.
The following code works, but for the previously mentioned perimeters it takes an extremely long time.
The perms(x) was posted by #Ashish Datta from this thread:
permutations with unique values
Which is where I believe I need help.
import itertools as it
total = 50
dice = 10
faces = 10
#-------------functions---------------------
# Checks for lists of ALL the same items
def same(lst):
return lst[1:] == lst[:-1]
# Generates the number of original permutations (10 digits takes 1.65s)
def perms(x):
uniq_set = set()
for out in it.permutations(x, len(x)):
if out not in uniq_set:
uniq_set.update([out])
return len(uniq_set)
# Finds total original dice rolls. "combinations" = (10d, 10f, 50t, takes 0.42s)
def final_count(total, dice, faces):
combinations = (it.combinations_with_replacement(range(1, faces+1), dice))
count = 0
for i in combinations:
if sum(i) == total and same(i) == True:
count += 1
elif sum(i) == total and same(i) != True:
count += perms(i)
else:
pass
return count
# --------------functions-------------------
answer = final_count(total, dice, faces) / float(faces**dice)
print(round(answer,4))
I have read the thread How to improve permutation algorithm efficiency with python. I believe my question is different, though a smarter algorithm is my end goal.
I originally posted my first draft of this program in CodeReview. https://codereview.stackexchange.com/questions/212930/calculate-probability-of-dice-total. I realize I am walking a fine line between a question and a code review, but I think in this case, I am more on the question side of things :)

You can use a function that deducts the current dice rolls from the totals for the recursive calls, and short-circuit the search if the total is less than 1 or greater than the number of dices times the number of faces. Use a cache to avoid redundant calculations of the same parameters:
from functools import lru_cache
#lru_cache(maxsize=None)
def final_count(total, dice, faces):
if total < 1 or total > dice * faces:
return 0
if dice == 1:
return 1
return sum(final_count(total - n, dice - 1, faces) for n in range(1, faces + 1))
so that:
final_count(50, 10, 10)
returns within a second: 374894389

I had a similar solution to blhsing but he beat me to it and, to be honest I didn't think of using lru_cache (nice! +1 for that). I'm posting it anyhow if only to illustrate how storage of previously computed counts cuts down on the recursion.
def permutationsTo(target, dices, faces, computed=dict()):
if target > dices*faces or target < 1: return 0
if dices == 1 : return 1
if (target,dices) in computed: return computed[(target,dices)]
result = 0
for face in range(1,min(target,faces+1)):
result += permutationsTo(target-face,dices-1,faces,computed)
computed[(target,dices)] = result
return result

One way to greatly reduce the time is to mathematically count how many combinations there are for each unique group of numbers in combinations, and increment count by that amount. If you have a list of n objects where x1 of them are all alike, x2 of them are all alike, etc., then the total number of ways to arrange them is n!/(x1! x2! x3! ...). For example, the number of different ways to arrange the letters of "Tennessee" is 9!/(1! 4! 2! 2!). So you can make a separate function for this:
import math
import itertools as it
import time
# Count the number of ways to arrange a list of items where
# some of the items may be identical.
def indiv_combos(thelist):
prod = math.factorial(len(thelist))
for i in set(thelist):
icount = thelist.count(i)
prod /= math.factorial(icount)
return prod
def final_count2(total, dice, faces):
combinations = it.combinations_with_replacement(range(1, faces + 1), dice)
count = 0
for i in combinations:
if sum(i) == total:
count += indiv_combos(i)
return count
I don't know off-hand if there's already some built-in function that does the job of what I wrote as indiv_combos2, but you could also use Counter to do the counting and mul to take the product of a list:
from operator import mul
from collections import Counter
def indiv_combos(thelist):
return math.factorial(len(thelist)) / reduce(mul, [math.factorial(i) for i in Counter(thelist).values()],1)
I get mixed results on the times when I try both methods with (25, 10, 10) as the input, but both give me the answer in less than 0.038 seconds every time.

Speeding up algorithm that finds multiples in a given range

I'm a stumped on how to speed up my algorithm which sums multiples in a given range. This is for a problem on codewars.com here is a link to the problem
codewars link
Here's the code and i'll explain what's going on in the bottom
import itertools
def solution(number):
return multiples(3, number) + multiples(5, number) - multiples(15, number)
def multiples(m, count):
l = 0
for i in itertools.count(m, m):
if i < count:
l += i
else:
break
return l
print solution(50000000) #takes 41.8 seconds
#one of the testers takes 50000000000000000000000000000000000000000 as input
# def multiples(m, count):
# l = 0
# for i in xrange(m,count ,m):
# l += i
# return l
so basically the problem ask the user return the sum of all the multiples of 3 and 5 within a number. Here are the testers.
test.assert_equals(solution(10), 23)
test.assert_equals(solution(20), 78)
test.assert_equals(solution(100), 2318)
test.assert_equals(solution(200), 9168)
test.assert_equals(solution(1000), 233168)
test.assert_equals(solution(10000), 23331668)
my program has no problem getting the right answer. The problem arises when the input is large. When pass in a number like 50000000 it takes over 40 seconds to return the answer. One of the inputs i'm asked to take is 50000000000000000000000000000000000000000, which a is huge number. That's also the reason why i'm using itertools.count() I tried using xrange in my first attempt but range can't handle numbers larger than a c type long. I know the slowest part the problem is the multiples method...yet it is still faster then my first attempt using list comprehension and checking whether i % 3 == 0 or i % 5 == 0, any ideas guys?

This solution should be faster for large numbers.
def solution(number):
number -= 1
a, b, c = number // 3, number // 5, number // 15
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
return 3*asum + 5*bsum - 15*csum
Explanation:
Take any sequence from 1 to n:
1, 2, 3, 4, ..., n
And it's sum will always be given by the formula n(n+1)/2. This can be proven easily if you consider that the expression (1 + n) / 2 is just a shortcut for computing the average, or Arithmetic mean of this particular sequence of numbers. Because average(S) = sum(S) / length(S), if you take the average of any sequence of numbers and multiply it by the length of the sequence, you get the sum of the sequence.
If we're given a number n, and we want the sum of the multiples of some given k up to n, including n, we want to find the summation:
k + 2k + 3k + 4k + ... xk
where xk is the highest multiple of k that is less than or equal to n. Now notice that this summation can be factored into:
k(1 + 2 + 3 + 4 + ... + x)
We are given k already, so now all we need to find is x. If x is defined to be the highest number you can multiply k by to get a natural number less than or equal to n, then we can get the number x by using Python's integer division:
n // k == x
Once we find x, we can find the sum of the multiples of any given k up to a given n using previous formulas:
k(x(x+1)/2)
Our three given k's are 3, 5, and 15.
We find our x's in this line:
a, b, c = number // 3, number // 5, number // 15
Compute the summations of their multiples up to n in this line:
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
And finally, multiply their summations by k in this line:
return 3*asum + 5*bsum - 15*csum
And we have our answer!

Improving runtime on Euler #10

So I was attacking a Euler Problem that seemed pretty simple on a small scale, but as soon as I bump it up to the number that I'm supposed to do, the code takes forever to run. This is the question:
The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
Find the sum of all the primes below two million.
I did it in Python. I could wait a few hours for the code to run, but I'd rather find a more efficient way to go about this. Here's my code in Python:
x = 1;
total = 0;
while x <= 2000000:
y = 1;
z = 0;
while x >= y:
if x % y == 0:
z += 1;
y += 1;
if z == 2:
total += x
x += 1;
print total;

Like mentioned in the comments, implementing the Sieve of Eratosthenes would be a far better choice. It takes up O(n) extra space, which is an array of length ~2 million, in this case. It also runs in O(n), which is astronomically faster than your implementation, which runs in O(n²).
I originally wrote this in JavaScript, so bear with my python:
max = 2000000 # we only need to check the first 2 million numbers
numbers = []
sum = 0
for i in range(2, max): # 0 and 1 are not primes
numbers.append(i) # fill our blank list
for p in range(2, max):
if numbers[p - 2] != -1: # if p (our array stays at 2, not 0) is not -1
# it is prime, so add it to our sum
sum += numbers[p - 2]
# now, we need to mark every multiple of p as composite, starting at 2p
c = 2 * p
while c < max:
# we'll mark composite numbers as -1
numbers[c - 2] = -1
# increment the count to 3p, 4p, 5p, ... np
c += p
print(sum)
The only confusing part here might be why I used numbers[p - 2]. That's because I skipped 0 and 1, meaning 2 is at index 0. In other words, everything's shifted to the side by 2 indices.

Clearly the long pole in this tent is computing the list of primes in the first place. For an artificial situation like this you could get someone else's list (say, this one), prase it and add up the numbers in seconds.
But that's unsporting, in my view. In which case, try the sieve of atkin as noted in this SO answer.

Finding numbers from a to b not divisible by x to y

This is a problem I've been pondering for quite some time.
What is the fastest way to find all numbers from a to b that are not divisible by any number from x to y?
Consider this:
I want to find all the numbers from 1 to 10 that are not divisible by 2 to 5.
This process will become extremely slow if I where to use a linear approach;
Like this:
result = []
a = 1
b = 10
x = 2
y = 5
for i in range(a,b):
t = False
for j in range(x,y):
if i%j==0:
t = True
break
if t is False:
result.append(i)
return result
Does anybody know of any other methods of doing this with less computation time than a linear solution?
If not, can anyone see how this might be done faster, as I am blank at this point...
Sincerely,
John
[EDIT]
The range of the number are 0 to >1,e+100
This is true for a, b, x and y

You only need to check prime values in the range of the possible divisors - for example, if a value is not divisible by 2, it won't be divisible by any multiple of 2 either; likewise for every other prime and prime multiple. Thus in your example you can check 2, 3, 5 - you don't need to check 4, because anything divisible by 4 must be divisible by 2. Hence, a faster approach would be to compute primes in whatever range you are interested in, and then simply calculate which values they divide.
Another speedup is to add each value in the range you are interested in to a set: when you find that it is divisible by a number in your range, remove it from the set. You then should only be testing numbers that remain in the set - this will stop you testing numbers multiple times.
If we combine these two approaches, we see that we can create a set of all values (so in the example, a set with all values 1 to 10), and simply remove the multiples of each prime in your second range from that set.
Edit: As Patashu pointed out, this won't quite work if the prime that divides a given value is not in the set. To fix this, we can apply a similar algorithm to the above: create a set with values [a, b], for each value in the set, remove all of its multiples. So for the example given below in the comments (with [3, 6]) we'd start with 3 and remove it's multiples in the set - so 6. Hence the remaining values we need to test would be [3, 4, 5] which is what we want in this case.
Edit2: Here's a really hacked up, crappy implementation that hasn't been optimized and has horrible variable names:
def find_non_factors():
a = 1
b = 1000000
x = 200
y = 1000
z = [True for p in range(x, y+1)]
for k, i in enumerate(z):
if i:
k += x
n = 2
while n * k < y + 1:
z[(n*k) - x] = False
n += 1
k = {p for p in range(a, b+1)}
for p, v in enumerate(z):
if v:
t = p + x
n = 1
while n * t < (b + 1):
if (n * t) in k:
k.remove(n * t)
n += 1
return k
Try your original implementation with those numbers. It takes > 1 minute on my computer. This implementation takes under 2 seconds.

Ultimate optimization caveat: Do not pre-maturely optimize. Any time you attempt to optimize code, profile it to ensure it needs optimization, and profile the optimization on the same kind of data you intend it to be optimized for to confirm it is a speedup. Almost all code does not need optimization, just to give the correct answer.
If you are optimizing for small x-y and large a-b:
Create an array with length that is the lowest common multiple out of all the x, x+1, x+2... y. For example, for 2, 3, 4, 5 it would be 60, not 120.
Now populate this array with booleans - false initially for every cell, then for each number in x-y, populate all entries in the array that are multiples of that number with true.
Now for each number in a-b, index into the array modulo arraylength and if it is true, skip else if it is false, return.
You can do this a little quicker by removing from you x to y factors numbers whos prime factor expansions are strict supersets of other numbers' prime factor expansions. By which I mean - if you have 2, 3, 4, 5, 4 is 2*2 a strict superset of 2 so you can remove it and now our array length is only 30. For something like 3, 4, 5, 6 however, 4 is 2*2 and 6 is 3*2 - 6 is a superset of 3 so we remove it, but 4 is not a superset of everything so we keep it in. LCM is 3*2*2*5 = 60. Doing this kind of thing would give some speed up on its own for large a-b, and you might not need to go the array direction if that's all you need.
Also, keep in mind that if you aren't going to use the entire result of the function every single time - like, maybe sometimes you're only interested in the lowest value - write it as a generator rather than as a function. That way you can call it until you have enough numbers and then stop, saving time.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.