Speeding up algorithm that finds multiples in a given range - python

I'm a stumped on how to speed up my algorithm which sums multiples in a given range. This is for a problem on codewars.com here is a link to the problem
codewars link
Here's the code and i'll explain what's going on in the bottom
import itertools
def solution(number):
return multiples(3, number) + multiples(5, number) - multiples(15, number)
def multiples(m, count):
l = 0
for i in itertools.count(m, m):
if i < count:
l += i
else:
break
return l
print solution(50000000) #takes 41.8 seconds
#one of the testers takes 50000000000000000000000000000000000000000 as input
# def multiples(m, count):
# l = 0
# for i in xrange(m,count ,m):
# l += i
# return l
so basically the problem ask the user return the sum of all the multiples of 3 and 5 within a number. Here are the testers.
test.assert_equals(solution(10), 23)
test.assert_equals(solution(20), 78)
test.assert_equals(solution(100), 2318)
test.assert_equals(solution(200), 9168)
test.assert_equals(solution(1000), 233168)
test.assert_equals(solution(10000), 23331668)
my program has no problem getting the right answer. The problem arises when the input is large. When pass in a number like 50000000 it takes over 40 seconds to return the answer. One of the inputs i'm asked to take is 50000000000000000000000000000000000000000, which a is huge number. That's also the reason why i'm using itertools.count() I tried using xrange in my first attempt but range can't handle numbers larger than a c type long. I know the slowest part the problem is the multiples method...yet it is still faster then my first attempt using list comprehension and checking whether i % 3 == 0 or i % 5 == 0, any ideas guys?

This solution should be faster for large numbers.
def solution(number):
number -= 1
a, b, c = number // 3, number // 5, number // 15
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
return 3*asum + 5*bsum - 15*csum
Explanation:
Take any sequence from 1 to n:
1, 2, 3, 4, ..., n
And it's sum will always be given by the formula n(n+1)/2. This can be proven easily if you consider that the expression (1 + n) / 2 is just a shortcut for computing the average, or Arithmetic mean of this particular sequence of numbers. Because average(S) = sum(S) / length(S), if you take the average of any sequence of numbers and multiply it by the length of the sequence, you get the sum of the sequence.
If we're given a number n, and we want the sum of the multiples of some given k up to n, including n, we want to find the summation:
k + 2k + 3k + 4k + ... xk
where xk is the highest multiple of k that is less than or equal to n. Now notice that this summation can be factored into:
k(1 + 2 + 3 + 4 + ... + x)
We are given k already, so now all we need to find is x. If x is defined to be the highest number you can multiply k by to get a natural number less than or equal to n, then we can get the number x by using Python's integer division:
n // k == x
Once we find x, we can find the sum of the multiples of any given k up to a given n using previous formulas:
k(x(x+1)/2)
Our three given k's are 3, 5, and 15.
We find our x's in this line:
a, b, c = number // 3, number // 5, number // 15
Compute the summations of their multiples up to n in this line:
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
And finally, multiply their summations by k in this line:
return 3*asum + 5*bsum - 15*csum
And we have our answer!

Related

Google foo.bar challenge "Hey I already did that", not passing all the test cases

This is the description of the problem I am trying to solve.
Hey, I Already Did That!
Commander Lambda uses an automated algorithm to assign minions randomly to tasks, in order to keep minions on their toes. But you've noticed a flaw in the algorithm -- it eventually loops back on itself, so that instead of assigning new minions as it iterates, it gets stuck in a cycle of values so that the same minions end up doing the same tasks over and over again. You think proving this to Commander Lambda will help you make a case for your next promotion.
You have worked out that the algorithm has the following process:
Start with a random minion ID n, which is a nonnegative integer of length k in base b
Define x and y as integers of length k. x has the digits of n in descending order, and y has the digits of n in ascending order
Define z = x - y. Add leading zeros to z to maintain length k if necessary
Assign n = z to get the next minion ID, and go back to step 2
For example, given minion ID n = 1211, k = 4, b = 10, then x = 2111, y = 1112 and z = 2111 - 1112 = 0999. Then the next minion ID will be n = 0999 and the algorithm iterates again: x = 9990, y = 0999 and z = 9990 - 0999 = 8991, and so on.
Depending on the values of n, k (derived from n), and b, at some point the algorithm reaches a cycle, such as by reaching a constant value. For example, starting with n = 210022, k = 6, b = 3, the algorithm will reach the cycle of values [210111, 122221, 102212] and it will stay in this cycle no matter how many times it continues iterating. Starting with n = 1211, the routine will reach the integer 6174, and since 7641 - 1467 is 6174, it will stay as that value no matter how many times it iterates.
Given a minion ID as a string n representing a nonnegative integer of length k in base b, where 2 <= k <= 9 and 2 <= b <= 10, write a function solution(n, b) which returns the length of the ending cycle of the algorithm above starting with n. For instance, in the example above, solution(210022, 3) would return 3, since iterating on 102212 would return to 210111 when done in base 3. If the algorithm reaches a constant, such as 0, then the length is 1.
My solution isn't passing 5 of the 10 test cases for the challenge. I don't understand if there's a problem with my code, as it's performing exactly as the problem asked to solve it, or if it's inefficient.
Here's my code for the problem. I have commented it for easier understanding.
def convert_to_any_base(num, b): # returns id after converting back to the original base as string
digits = []
while(num/b != 0):
digits.append(str(num % b))
num //= b
result = ''.join(digits[::-1])
return result
def solution(n, b):
minion_id_list = [] #list storing all occurrences of the minion id's
k = len(n)
while n not in minion_id_list: # until the minion id repeats
minion_id_list.append(n) # adds the id to the list
x = ''.join(sorted(n, reverse = True)) # gives x in descending order
y = x[::-1] # gives y in ascending order
if b == 10: # if number is already a decimal
n = str(int(x) - int(y)) # just calculate the difference
else:
n = int(x, b) - int(y, b) # else convert to decimal and, calculate difference
n = convert_to_any_base(n, b) # then convert it back to the given base
n = (k-len(n)) * '0' + n # adds the zeroes in front to maintain the id length
if int(n) == 0: # for the case that it reaches a constant, return 1
return 1
return len(minion_id_list[minion_id_list.index(n):]) # return length of the repeated id from
# first occurrence to the end of the list
I have been trying this problem for quite a while and still don't understand what's wrong with it. Any help will be appreciated.

Python coding, nested loops

Assume there are two variables, k and m, each already associated with a positive integer value and further assume that k's value is smaller than m's. Write the code necessary to compute the number of perfect squares between k and m. (A perfect square is an integer like 9, 16, 25, 36 that is equal to the square of another integer (in this case 3*3, 4*4, 5*5, 6*6 respectively).) Associate the number you compute with the variable q. For example, if k and m had the values 10 and 40 respectively, you would assign 3 to q because between 10 and 40 there are these perfect squares: 16, 25, and 36,.
**If I want to count the numbers between 16 and 100( 5,6,7,8,9 =makes 5)and write code in terms of with i and j, my code would be as follows but something goes wrong. I want to get the result,5 at last. how can I correct it?
k=16
m=100
i=0
j=0
q1=0
q2=0
while j**2 <m:
q2=q2+1
while i**2 <k:
q1=q1+1
i=i+1
j=j+1
print(q2-q1)
Your probably don't want to loop for this. If k and m are very far apart it will take a long time.
Given k < m, you want to compute how many integers l such that k < l^2 < m. The smallest possible such integer is floor( sqrt(k) +1 ) and the largest possible such integer is ceil(sqrt(m)-1). The number of such integers is:
import math
def sq_between(k,m):
return math.ceil(m**0.5-1) - math.floor(k**0.5+1) +1
This allows for
sq_between(16,100)
yielding:
5
Here is another version of your function that seems to do to what you ask for.
k = 16
m = 100
perfect_squares = []
for i in range(m):
if i**2 < k:
continue
if i**2 > m:
break
perfect_squares.append(i**2)
print(perfect_squares)
You code is mixing up everything in the second while loop. If you explain a bit further what you are trying to do there, I will probably be able to explain why your idea is not working.
I would change your code as follows in order to make it work:
k = 10
m = 40
i = 0
q = 0
while i ** 2 < m:
if i ** 2 > k:
print(i)
q += 1
i += 1
print (q)
By utilizing the fact that each square number can get expressed via square = sum from i = 1 to n (2 * i + 1) there is an easy way of speedup the above algorithm - but the algorithm will become much longer then ...

Counting with Replacement Every k Turns

The problem is as follows: You have n types of items and you want to select l of them (order matters). You can resample items of a type only if there are k other items selected since the last time you selected that item. Count the total number of sequences of items you can form. If this is confusing, the following example will clear things up:
Say n = 5, l = 6, and k = 3.
The answer is 5 * 4 * 3 * 2 * 2 * 2.
On the first turn we can choose any of the 5 items. On the second, third and fourth turns again we can choose any of the 4, 3, and 2 remaining items. Then, on the fifth turn we can choose 1, but also 5 again because there were 3 other items selected since the last it was picked, and so on. So the total count is 480.
Here's a naive algorithm to solve this:
def differentPlaylists(n, k, l):
ans, choices = 1, n
while l > 0:
ans = (ans * choices) % 1000000007
choices -= 1
k, l = k - 1, l - 1
if k < 0: choices += 1
return ans
This works, but it's too slow. I can't figure out how I could produce an algorithm that solves this problem in less than l multiply ops.
Can someone help me figure out how I could do that?
It seems you need only a remainder of the exact number. The answer is:
(n! / (n-k)! * (n-k)^(l-k)) % M =
(((n! / (n-k)!) % M) * ((n-k)^(l-k) % M)) % M.
You don't need a loop to find (n-k)^(l-k) % M, you can use exponentiation by squaring that works in O(log(l-k)). If k is small enough it will make overall computation significantly faster because the first factorial part of this formula is calculated in O(k) in your solution. As a result the complexity is O(log(l-k)) + O(k) instead of O(l) in your implementation.

Sum of all numbers

I need to write a function that calculates the sum of all numbers n.
Row 1: 1
Row 2: 2 3
Row 3: 4 5 6
Row 4: 7 8 9 10
Row 5: 11 12 13 14 15
Row 6: 16 17 18 19 20 21
It helps to imagine the above rows as a 'number triangle.' The function should take a number, n, which denotes how many numbers as well as which row to use. Row 5's sum is 65. How would I get my function to do this computation for any n-value?
For clarity's sake, this is not homework. It was on a recent midterm and needless to say, I was stumped.
The leftmost number in column 5 is 11 = (4+3+2+1)+1 which is sum(range(5))+1. This is generally true for any n.
So:
def triangle_sum(n):
start = sum(range(n))+1
return sum(range(start,start+n))
As noted by a bunch of people, you can express sum(range(n)) analytically as n*(n-1)//2 so this could be done even slightly more elegantly by:
def triangle_sum(n):
start = n*(n-1)//2+1
return sum(range(start,start+n))
A solution that uses an equation, but its a bit of work to arrive at that equation.
def sumRow(n):
return (n**3+n)/2
The numbers 1, 3, 6, 10, etc. are called triangle numbers and have a definite progression. Simply calculate the two bounding triangle numbers, use range() to get the numbers in the appropriate row from both triangle numbers, and sum() them.
Here is a generic solution:
start=1
n=5
for i in range(n):
start += len (range(i))
answer=sum(range(start,start+n))
As a function:
def trio(n):
start=1
for i in range(n):
start += len (range(i))
answer=sum(range(start,start+n))
return answer
def sum_row(n):
final = n*(n+1)/2
start = final - n
return final*(final+1)/2 - start*(start+1)/2
or maybe
def sum_row(n):
final = n*(n+1)/2
return sum((final - i) for i in range(n))
How does it work:
The first thing that the function does is to calculate the last number in each row. For n = 5, it returns 15. Why does it work? Because each row you increment the number on the right by the number of the row; at first you have 1; then 1+2 = 3; then 3+3=6; then 6+4=10, ecc. This impy that you are simply computing 1 + 2 + 3 + .. + n, which is equal to n(n+1)/2 for a famous formula.
then you can sum the numbers from final to final - n + 1 (a simple for loop will work, or maybe fancy stuff like list comprehension)
Or sum all the numbers from 1 to final and then subtract the sum of the numbers from 1 to final - n, like I did in the formula shown; you can do better with some mathematical operations
def compute(n):
first = n * (n - 1) / 2 + 1
last = first + n - 1
return sum(xrange(first, last + 1))

Subset sum Problem

recently I became interested in the subset-sum problem which is finding a zero-sum subset in a superset. I found some solutions on SO, in addition, I came across a particular solution which uses the dynamic programming approach. I translated his solution in python based on his qualitative descriptions. I'm trying to optimize this for larger lists which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in python:
import random
from time import time
from itertools import product
time0 = time()
# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
return [[0]*b for x in xrange(a)]
# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
return [random.randrange(lower,upper+1) for i in range(num)]
# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
N_list = []
P_list = []
for x in A:
if x < 0:
N_list.append(x)
elif x > 0:
P_list.append(x)
return [sum(N_list), sum(P_list)]
# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
if n < 0:
return 0
try:
return table[n][m - N]
except:
return 0
# same definition as above
def set_element(table, n, m, N, value):
table[n][m - N] = value
# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)
# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)
# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)
# iterate through each table element
#for i in xrange(1, m): #row
# for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
#set_element(table, i, s, N, 1)
table[i][s - N] = 1
# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
s = s - A[i]
solution.append(A[i])
print "Solution: ",solution
time1 = time()
print "Time execution: ", time1 - time0
I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-Complete.
Meaning, every known (exact) algorithm has an exponential time behavior on the size of the input.
Meaning, if you can process 1 operation in .01 nanosecond then, for a list of 59 elements it'll take:
2^59 ops --> 2^59 seconds --> 2^26 years --> 1 year
-------------- ---------------
10.000.000.000 3600 x 24 x 365
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other side, if you restrict the problem (to another) using bounds for the values of the numbers in the set, then the problem complexity reduces to polynomial time. But even then the memory space consumed will be a polynomial of VERY High Order.
The memory consumed will be much larger than the few gigabytes you have in memory.
And even much larger than the few tera-bytes on your hard drive.
( That's for small values of the bound for the value of the elements in the set )
May be this is the case of your Dynamic programing algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You can try a smaller bound. That is... if your input is consistently consist of small values.
Good Luck!
Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero (activities):
subsets = {0: []}
for (activity, cost) in activities.iteritems():
old_subsets = subsets
subsets = {}
for (prev_sum, subset) in old_subsets.iteritems():
subsets[prev_sum] = subset
new_sum = prev_sum + cost
new_subset = subset + [activity]
if 0 == new_sum:
new_subset.sort()
return new_subset
else:
subsets[new_sum] = new_subset
return []
I spent a few minutes with it and it worked very well.
An interesting article on optimizing python code is available here. Basically the main result is that you should inline your frequent loops, so in your case this would mean instead of calling get_element twice per loop, put the actual code of that function inside the loop in order to avoid the function call overhead.
Hope that helps! Cheers
, 1st eye catch
def split_sum(A):
N_list = 0
P_list = 0
for x in A:
if x < 0:
N_list+=x
elif x > 0:
P_list+=x
return [N_list, P_list]
Some advices:
Try to use 1D list and use bitarray to reduce memory footprint at minimum (http://pypi.python.org/pypi/bitarray) so you will just change get / set functon. This should reduce your memory footprint by at lest 64 (integer in list is pointer to integer whit type so it can be factor 3*32)
Avoid using try - catch, but figure out proper ranges at beginning, you might found out that you will gain huge speed.
The following code works for Python 3.3+ , I have used the itertools module in Python that has some great methods to use.
from itertools import chain, combinations
def powerset(iterable):
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for i, combo in enumerate(powerset(nums), 1):
sum = 0
for num in combo:
sum += int(num)
if sum == inputSum:
print(combo)
The Input Output is as Follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Just change the values in your set w and correspondingly make an array x as big as the len of w then pass the last value in the subsetsum function as the sum for which u want subsets and you wl bw done (if u want to check by giving your own values).
def subsetsum(cs,k,r,x,w,d):
x[k]=1
if(cs+w[k]==d):
for i in range(0,k+1):
if x[i]==1:
print (w[i],end=" ")
print()
elif cs+w[k]+w[k+1]<=d :
subsetsum(cs+w[k],k+1,r-w[k],x,w,d)
if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
x[k]=0
subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]
subsetsum(0,0,sum(w),x,w,7)

Categories

Resources