Python - How to improve efficiency of a complex recursive function?

In a video by Mathologer on, amongst other things, infinite sums, three different infinite expressions are shown at 9:25, when the video suddenly freezes and an elephant deity pops up, challenging the viewer to find "the probable values" of the expressions. I wrote the following script to approximate the last of the three (i.e. 1 + 3.../2...) with increasing precision:
from decimal import Decimal as D, getcontext  # for accurate results

def main(c):  # faster code when functions defined locally (I think)
    def run1(c):
        c += 1
        if c <= DEPTH:
            return D(1) + run3(c)/run2(c)
        else:
            return D(1)
    def run2(c):
        c += 1
        if c <= DEPTH:
            return D(2) + run2(c)/run1(c)
        else:
            return D(2)
    def run3(c):
        c += 1
        if c <= DEPTH:
            return D(3) + run1(c)/run3(c)
        else:
            return D(3)
    return run1(c)

getcontext().prec = 10  # too much precision isn't currently necessary
for x in range(1, 31):
    DEPTH = x
    print(x, main(0))
Now this works totally fine for 1 <= x <= 20 or so, but it starts taking an eternity for each result after that. I do realize that this is due to the exponentially increasing number of function calls being made at each DEPTH level. It is also clear that I won't be able to calculate the series comfortably up to an arbitrary point. However, the point at which the program slows down is too early for me to clearly identify the limit the series is converging to (it might be 1.75, but I need more DEPTH to be certain).
My question is: How do I get as much out of my script as possible (performance-wise)?
I have tried:
1. finding the mathematical solution to this problem. (No matching results)
2. finding ways to optimize recursive functions in general. According to multiple sources (e.g. this), Python doesn't optimize tail recursion by default, so I tried switching to an iterative style, but I ran out of ideas on how to accomplish this almost instantly...
Any help is appreciated!
NOTE: I know that I could go about this mathematically instead of "brute-forcing" the limit, but I want to get my program running well, now that I've started...
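For what it's worth, the recursion unwinds naturally from the inside out, so an iterative version is possible: start at the innermost level, where the three expressions are just 1, 2 and 3, and repeatedly wrap one more layer around them. A sketch (the simultaneous tuple assignment matters, since each new level uses the previous level's three values):

```python
from decimal import Decimal as D, getcontext

getcontext().prec = 10

def main(depth):
    # values at the innermost level, where the recursion bottoms out
    r1, r2, r3 = D(1), D(2), D(3)
    for _ in range(depth):
        # wrap one more layer around all three expressions at once
        r1, r2, r3 = D(1) + r3 / r2, D(2) + r2 / r1, D(3) + r1 / r3
    return r1

for x in range(1, 31):
    print(x, main(x))
```

This does constant work per level, so depth 1000 and beyond is no problem, and there is no recursion limit to hit.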

You can store the results of the run1, run2 and run3 functions in arrays to prevent them from being recalculated every time, since in your example, main(1) calls run1(1), which calls run3(2) and run2(2), which in turn call run1(3), run2(3), run1(3) (again) and run3(3), and so on.
You can see that run1(3) is evaluated twice, and this only gets worse as the depth increases; counting the number of times each function is called gives these results:
DEPTH   run1    run2    run3
    1      1       0       0
    2      0       1       1
    3      1       2       1
    4      3       2       3
    5      5       6       5
    6     11      10      11
    7     21      22      21
    8     43      42      43
    9     85      86      85
  ...
   20      ~160,000 each
  ...
   30      ~160 million each
This is actually a variant of a Pascal triangle, and you could probably figure out the results mathematically; but since you asked for a non-mathematical optimization here, just notice how the number of calls increases exponentially: it roughly doubles at each level. This is even worse since each call will generate thousands of subsequent calls with higher values, which is what you want to avoid.
Therefore what you want to do is store the value of each call, so that the function does not need to be called a thousand times (and itself make thousands more calls) to always get the same result. This is called memoization.
Here is an example solution in pseudo code:
before calling main, declare the arrays val1, val2, val3, all of size DEPTH, and fill them with -1

function run1(c)                      # same thing for run2 and run3
    c += 1
    if c <= DEPTH
        local3 = val3(c)              # read run3(c)
        if local3 is -1               # if run3(c) hasn't been computed yet
            local3 = run3(c)          # we compute it
            val3(c) = local3          # and store it into the array
        local2 = val2(c)              # same with run2(c)
        if local2 is -1
            local2 = run2(c)
            val2(c) = local2
        return D(1) + local3/local2   # we use the value we got from the array or from the computation
    else
        return D(1)
Here I use -1 since your functions seem to only generate positive numbers, and -1 is an easy placeholder for the empty cells. In other cases you might have to use a dict as Cabu did in the answer below. I however think this would be slower due to the cost of key lookups versus reading an array, but I might be wrong about that. Either way, your code should be much, much faster than it is now, with a cost of O(n) instead of O(2^n).
This would technically allow your code to run forever at a constant speed, but the recursion will actually cause an early stack overflow. You might still be able to get to a depth of several thousands before that happens though.
Edit: As ShadowRanger added in the comments, you can keep your original code and simply add @lru_cache(maxsize=n) before each of your run1, run2 and run3 functions, where n is one of the first powers of two above DEPTH (for example, 32 if DEPTH is 25). This requires importing lru_cache from functools.
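For reference, a sketch of what the @lru_cache variant might look like (module-level functions and an unbounded cache for simplicity, rather than a power-of-two bound):

```python
from decimal import Decimal as D, getcontext
from functools import lru_cache

getcontext().prec = 10
DEPTH = 25  # fixed here for simplicity; clear the caches before changing it

@lru_cache(maxsize=None)  # unbounded cache, simplest option
def run1(c):
    c += 1
    return D(1) + run3(c) / run2(c) if c <= DEPTH else D(1)

@lru_cache(maxsize=None)
def run2(c):
    c += 1
    return D(2) + run2(c) / run1(c) if c <= DEPTH else D(2)

@lru_cache(maxsize=None)
def run3(c):
    c += 1
    return D(3) + run1(c) / run3(c) if c <= DEPTH else D(3)

print(DEPTH, run1(0))
```

When sweeping DEPTH as in the original loop, call run1.cache_clear(), run2.cache_clear() and run3.cache_clear() between iterations, since the cached values depend on the current DEPTH.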

With some memoization, you could get up to the stack overflow:
from decimal import Decimal as D, getcontext  # for accurate results

def main(c):  # faster code when functions defined locally (I think)
    # Store partial results of run1, run2 and run3. These are kept here
    # rather than passed as parameters so they can be reset easily.
    mrun1, mrun2, mrun3 = {}, {}, {}

    def run1(c):
        c += 1  # increment first, so lookups and stores use the same key
        if c in mrun1:  # if partial result already computed, return it
            return mrun1[c]
        if c <= DEPTH:
            v = D(1) + run3(c) / run2(c)
        else:
            v = D(1)
        mrun1[c] = v  # else store it and return the value
        return v

    def run2(c):
        c += 1
        if c in mrun2:
            return mrun2[c]
        if c <= DEPTH:
            v = D(2) + run2(c) / run1(c)
        else:
            v = D(2)
        mrun2[c] = v
        return v

    def run3(c):
        c += 1
        if c in mrun3:
            return mrun3[c]
        if c <= DEPTH:
            v = D(3) + run1(c) / run3(c)
        else:
            v = D(3)
        mrun3[c] = v
        return v

    return run1(c)

getcontext().prec = 150  # enough precision for results this deep
for x in range(1, 997):
    DEPTH = x
    print(x, main(0))
Python will hit its recursion limit (a RecursionError) if you go over 997.
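If you do want to push past that limit, CPython lets you raise it, though with a caveat: set it too high and the interpreter can crash with a real C-stack overflow instead of a catchable RecursionError. A minimal sketch:

```python
import sys

sys.setrecursionlimit(10_000)  # the default is 1000; raise cautiously

def countdown(n):
    # a trivially deep recursion, just to demonstrate the raised limit
    return 0 if n == 0 else countdown(n - 1)

print(countdown(5000))  # would raise RecursionError at the default limit
```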

Related

Julia code not finishing while Python code does

I am very new to Julia and was trying to implement/rewrite some of my previous Python code as practice, using Project Euler problem 25.
In Python I have
def fibonacci(N):
    """Returns the Nth Fibonacci Number"""
    F = [0, 1]
    i = 0
    while i <= N-2:
        F_new = F[i] + F[i+1]
        F.append(F_new)
        i += 1
    return F[N]

N = 0
x = 1000
while len(str(fibonacci(N))) <= x:
    if len(str(fibonacci(N))) == x:
        print(N)
        break
    N = N + 1
This runs and gives me the correct answer in about 6.5 seconds. I then tried to do the same in Julia:
function fib(N)
    F = [0, 1]
    global i = 1
    while i <= N-2
        F_new = F[i] + F[i+1]
        append!(F, F_new)
        global i += 1
    end
    return F[N]
end

N = 1
x = 1000
while length(string(fib(N))) <= x
    if length(string(fib(N))) == x
        print(N-1)
        break
    end
    global N += 1
end
The Julia code seems to run "forever". It only finishes and produces the correct answer when x <= 20; for x > 20 the program never ends. I'm not sure where something could go wrong if it runs correctly for all values below 21. Could somebody explain where the error is happening and why?
Python integers are by default unbounded in size and will grow as needed. Julia on the other hand will default to a signed 64 bit integer if on a 64 bit system. (See docs) This begins to overflow when trying to calculate values above around 19 digits long, hence why this starts around x=20. In order to get the same behavior in Julia, you should use the BigInt type for any values or arguments which can get above this size.
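The same overflow is easy to reproduce from Python by forcing a fixed-width integer type (a sketch assuming NumPy is installed; plain Python ints are unbounded, while NumPy's int64 wraps around just like Julia's default Int64):

```python
import numpy as np

def fib_fixed(n, dtype=np.int64):
    # same iteration, but in 64-bit arithmetic that silently wraps
    a, b = dtype(0), dtype(1)
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_big(n):
    # plain Python ints grow as needed
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_big(93))    # 12200160415121876738
print(fib_fixed(93))  # wraps around to a negative number
```

fib(93) is the first Fibonacci number to exceed the signed 64-bit range, which is why the Julia version breaks around 19-20 digits.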
The main problem with your code is what @duckboycool has described. The second piece of advice is to always write functions in Julia. Read the Julia performance tips page for a good start.
Note that you can make the function by @Bill 2X faster by removing the unnecessary if, like this:
function test(x = 1000)
    N = 0
    while ndigits(fib(N)) < x
        N += 1
    end
    return N
end
But if you really want a 16000X faster Julia version, then you can do this:
function euler25()
    limit = big(10)^999
    a, b = big(1), big(1)
    N = 2
    while b <= limit
        a, b = b, a + b
        N += 1
    end
    return N
end

@btime euler25() = 4782
  377.700 μs (9573 allocations: 1.15 MiB)
This runs in 377 μs, because we avoid calculating fib(N) at every step from the beginning. And instead of comparing with the length of a string of the output at each iteration, we just compare with 10^999.
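For comparison, the same trick reads almost identically in Python, where integers are arbitrary precision by default (a sketch mirroring the Julia euler25 above):

```python
def euler25():
    limit = 10 ** 999          # smallest 1000-digit number
    a, b = 1, 1                # F(1), F(2)
    n = 2
    while b <= limit:
        a, b = b, a + b        # advance one Fibonacci step, no recomputation
        n += 1
    return n

print(euler25())  # 4782
```

The two key ideas carry over unchanged: keep the running pair (a, b) instead of recomputing fib(N) from scratch, and compare against 10^999 instead of building a decimal string every iteration.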
In addition to the earlier answer, note that you should avoid globals if looking at performance, so this is much faster than your global i and x code:
function fib(N)
    F = [big"0", big"1"]
    for i in 1:N-2
        F_new = F[i] + F[i+1]
        push!(F, F_new)
    end
    return F[N]
end

function test(x = 1000)
    N = 1
    while length(string(fib(N))) <= x
        if length(string(fib(N))) == x
            print(N-1)
            break
        end
        N += 1
    end
end

test()
test()
@AboAmmar shows probably the best "normal" way of writing this. But if you want something even a bit more optimized, you can use in-place BigInt calls. I'm not sure whether I would recommend this, but it's nice to be aware of it.
using Base.GMP.MPZ: add!, set!
function euler25_(limit=big(10)^999)
    a, b = big(1), big(1)
    N = 2
    c = big(0)
    while b <= limit
        add!(c, a, b)
        set!(a, b)
        set!(b, c)
        N += 1
    end
    return N
end
This uses the special BigInt functions in the GMP.MPZ library, and writes values in-place, avoiding most of the allocations, and running 2.5x faster on my laptop.

Does an algorithm exist that converts a (base-10) number into another base in constant time?

I am solving a problem where I am given three integers (a,b,c), all three can be very large and (a>b>c)
I want to identify which base between b and c produces the smallest sum of digits when we convert a to that base.
For example a = 216, b = 2, c = 7 -> the output = 6, because 216 in base 2 is 11011000, whose digits sum to 4; doing the same for all bases between 2 and 7, we find that 216 in base 6 produces the smallest digit sum, since 216 in base 6 is 1000, which has digit sum 1.
My question is: is there any function out there that can convert a number to any base in constant time, or at least faster than the algorithm below? Or any suggestions on how to optimise my algorithm?
from collections import defaultdict

n = int(input())
for _ in range(n):
    (N, X) = map(int, input().split())
    array = list(map(int, input().split()))
    my_dict = defaultdict(int)
    # original count of elements in array
    for i in range(len(array)):
        my_dict[array[i]] += 1
    # ensure array contains distinct elements
    array = set(array)
    count = max(my_dict.values())  # count = max of single value
    temp = count
    res = None
    XOR_count = float("inf")
    if X == 0:
        print(count, 0)
        break
    for j in array:
        if j ^ X in my_dict:
            curr = my_dict[j ^ X] + my_dict[j]
            if curr >= count:
                count = curr
                XOR_count = min(my_dict[j], XOR_count)
    if count == temp:
        XOR_count = 0
    print(f"{count} {XOR_count}")
Here are some sample input and outputs:
Sample Input
3
3 2
1 2 3
5 100
1 2 3 4 5
4 1
2 2 6 6
Sample Output
2 1
1 0
2 0
Which for the problem I am solving runs into time limit exceeded error.
I found this link to be quite useful (https://www.purplemath.com/modules/logrules5.htm) in terms of converting log bases, which I can kind of see how it relates, but I couldn't use it to get a solution for my above problem.
You could separate the problem in smaller concerns by writing a function that returns the sum of digits in a given base and another one that returns a number expressed in a given base (base 2 to 36 in my example below):
def digitSum(N, b=10):
    return N if N < b else N % b + digitSum(N // b, b)

digits = "0123456789abcdefghijklmnopqrstuvwxyz"

def asBase(N, b):
    return "" if N == 0 else asBase(N // b, b) + digits[N % b]

def lowestBase(N, a, b):
    return asBase(N, min(range(a, b+1), key=lambda c: digitSum(N, c)))
output:
print(lowestBase(216,2,7))
1000 # base 6
print(lowestBase(216,2,5))
11011000 # base 2
Note that both digitSum and asBase could be written as iterative instead of recursive if you're manipulating numbers that are greater than base^1000 and don't want to deal with recursion depth limits
Here's a procedural version of digitSum (to avoid recursion limits):
def digitSum(N, b=10):
    result = 0
    while N:
        result += N % b
        N //= b
    return result

and returning only the base (not the encoded number):

def lowestBase(N, a, b):
    return min(range(a, b+1), key=lambda c: digitSum(N, c))
    # in which case you don't need the asBase() function at all.
With those changes results for a range of bases from 2 to 1000 are returned in less than 60 milliseconds:
lowestBase(10**250+1,2,1000) --> 10 in 57 ms
lowestBase(10**1000-1,2,1000) --> 3 in 47 ms
I don't know how large is "very large" but it is still sub-second for millions of bases (yet for a relatively smaller number):
lowestBase(10**10-1,2,1000000) --> 99999 in 0.47 second
lowestBase(10**25-7,2,1000000) --> 2 in 0.85 second
[EDIT] optimization
By providing a maximum sum to the digitSum() function, you can make it stop counting as soon as it goes beyond that maximum. This will allow the lowestBase() function to obtain potential improvements more efficiently based on its current best (minimal sum so far). Going through the bases backwards also gives a better chance of hitting small digit sums faster (thus leveraging the maxSum parameter of digitSum()):
def digitSum(N, b=10, maxSum=None):
    result = 0
    while N:
        result += N % b
        if maxSum and result >= maxSum: break
        N //= b
    return result

def lowestBase(N, a, b):
    minBase = a
    minSum = digitSum(N, a)
    for base in range(b, a, -1):
        if N % base >= minSum: continue  # last digit already too large
        baseSum = digitSum(N, base, minSum)
        if baseSum < minSum:
            minBase, minSum = base, baseSum
            if minSum == 1: break
    return minBase
This should yield a significant performance improvement in most cases.

Multiprocessing a Straight-forward Computation

Breakdown of the problem: There's a fair bit of underlying motivation, but let's say we are given some matrix N (code included at the bottom) and we wish to solve the matrix-vector equation Nv = b, where b is a binary string of weight k. I'm currently solving this using numpy.linalg.lstsq. If the 'residual' of this least-square calculation is less than 0.000001, I'm accepting it as a solution.
First, I need to generate all binary strings of a certain length and weight k. The following function does this.
'''
This was provided in one of the answers here:
https://stackoverflow.com/questions/58069431/find-all-binary-strings-of-certain-weight-has-fast-as-possible/
Given a value with weight k, you can get the (co)lexically next value as follows (using bit manipulations).
'''
def ksubsetcolexsuccessor(length, k):
    limit = 1 << length
    val = (1 << k) - 1
    while val < limit:
        yield "{0:0{1}b}".format(val, length)
        minbit = val & -val
        fillbit = (val + minbit) & ~int(val)
        val = val + minbit | (fillbit // (minbit << 1)) - 1
So for length = 6 and k = 2 we have:
000011, 000101, 000110, 001001, 001010, 001100, 010001, 010010, 010100, 011000, 100001, 100010, 100100, 101000, 110000.
We plug each into Nv = b and solve:
import numpy as np

solutions = 0
for b in ksubsetcolexsuccessor(n, k):
    v = np.linalg.lstsq(N, np.array(list(b), dtype=float))
    if v[1] < 0.000001:
        solutions += 1
This totals the amount of solutions. My issue is that this is fairly slow. I'd like to eventually try n = 72 and k = 8. I noticed that only a fraction of my CPU is being used, so I'd like to multiprocess. I started by figuring out how to generate the above binary strings in chunks. I have written a function which gives me values for val that allows me to start and stop generating the above binary string sequence in any place. So my plan was to have as many chunks as I have cores on my machine. For example:
Core 1 processes: 000011, 000101, 000110, 001001, 001010, 001100, 010001
Core 2 processes: 010010, 010100, 011000, 100001, 100010, 100100, 101000, 110000
I am unfamiliar with multiprocessing and multithreading. Here is what I've done:
from multiprocessing import Process

'''
A version where you can specify where to start and end by passing start_val and end_val to the function.
It performs the least-squares computation for each b and tallies the trials that work.
'''
def ksubsetcolexsuccessorSegmentComp(n, k, val, limit, num):
    while val < limit:
        b = "{0:0{1}b}".format(val, n)
        v = np.linalg.lstsq(N, np.array(list(b), dtype=float))
        if v[1] < 0.000001:
            num += 1
        minbit = val & -val
        fillbit = (val + minbit) & ~int(val)
        val = val + minbit | (fillbit // (minbit << 1)) - 1
    print(num)

solutions = 0
length = 6
weight = 2
start_val = (1 << weight) - 1
mid_val = 18            # if splitting into 2 processes
end_val = 1 << length

# Multiprocessing
p1 = Process(target=ksubsetcolexsuccessorSegmentComp, args=(length, weight, start_val, mid_val, solutions))
p2 = Process(target=ksubsetcolexsuccessorSegmentComp, args=(length, weight, mid_val, end_val, solutions))
p1.start()
p2.start()
p1.join()
p2.join()
This does indeed reduce the computation time by around 45% (with increasing the length and weight). My CPU runs at around 50% during this. Oddly enough, if I split into 4 processes it takes roughly the same as if I split into 2 (even though my CPU runs at around 85% during that). On the laptop I'm testing this on, I have 2 physical cores and 4 logical cores. How many processes is optimal on such a CPU?
Questions:
In my lack of knowledge of multiprocessing, have I done this correctly? i.e. is this the fastest way to multiprocess this program?
Can anyone see any clear way to improve computation time? I've looked into multithreading within each process, but it seemed to slow down the computation.
The slowest part of this is definitely np.linalg.lstsq. Are there any clever ways to speed this part up?
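On the lstsq question: since N never changes between calls, one option is to factor it once rather than letting np.linalg.lstsq re-factorize it for every b. A sketch using a precomputed pseudoinverse (the 3×2 matrix here is made up for illustration; the real N would come from matrixgenerator):

```python
import numpy as np

# Hypothetical small example matrix standing in for the real N.
N = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Precompute the pseudoinverse once; every subsequent solve is a
# cheap matrix-vector product instead of a full factorization.
N_pinv = np.linalg.pinv(N)

def residual(b):
    v = N_pinv @ b        # least-squares solution
    r = N @ v - b
    return float(r @ r)   # squared residual, as lstsq reports it

b = np.array([1.0, 0.0, 1.0])
print(residual(b) < 1e-6)  # True: this b is solved exactly
```

For a further gain you could stack many b vectors as the columns of a matrix B and get all the solutions at once with a single N_pinv @ B product, which also parallelizes well inside BLAS.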
If anyone wishes to test the code, here is how I generate the matrix N:
'''
Creating the matrix N
'''
def matrixgenerator(G, size, n):
    matrix = np.zeros((size, (n-1)**2 + 1))
    # Generate the matrix (it'll be missing stab_nn)
    column_index = 0
    for i in range(1, n):
        for j in range(1, n):
            for k in range(size):
                if G[k](i) == j:
                    matrix[k][column_index] = 1
            column_index += 1
    # Determine the final column of N, which is determined by the characteristic vector of stab_nn
    for k in range(size):
        if G[k](n) == n:
            matrix[k][(n-1)**2] = 1
    return matrix

n = 3  # this will give length 6 and weight 2 binary strings
# Try n = 7 and change the second argument below to '4' - this takes roughly 14 min with 2 processes on my device
Group = TransitiveGroup(n, 2)  # SageMath 8.8
size = Group.order()
weight = int(size / n)
N = matrixgenerator(Group, size, n)
This requires SageMath 8.8. Note this is the only part of the code that requires SageMath. Everything else is effectively written in Python 2.7. Thanks in advance.

Power Function Python

This function calculates the value of a^b and returns it. My question: if m = log(b),
the best case scenario is that it does m+1 iterations,
but what is the worst case? And how many times does it enter the while loop?
def power(a, b):
    result = 1
    while b > 0:  # b is nonzero
        if b % 2 == 1:
            result = result * a
        a = a * a
        b = b // 2
    return result
As @EliSadoff stated in a comment, you need an initial value of result in your function. Insert the line
result = 1
just after the def line. The code then works, and this is a standard way to implicitly use the binary representation of b to quickly get the exponentiation. (The loop invariant is that the value of result * a ** b remains constant, which shows the validity of this algorithm.)
The worst case is where the body of your if b % 2 line is executed every time through the while loop. This happens whenever b is one less than a power of 2, so that every digit in b's binary representation is a one. The while loop condition b > 0 is still checked only m+1 times, but each pass through the loop now has a little more to do.
There are several ways to speed up your code. Use while b rather than while b > 0 and if b & 1 rather than if b % 2 == 1. Use result *= a rather than result = result * a, a *= a rather than a = a * a, and b >>= 1 rather than b = b // 2. These are fairly minor improvements, of course. The only way to speed up the loop further is to use non-structured code, which I believe isn't possible in Python. (There is one more modification of a than is necessary, but there is no good way to prevent that without a jump into a loop.) There are some variations on this code, such as an inner loop to keep modifying a and b as long as b is even, but that is not always faster.
The final code is then
def power(a, b):
    """Return a ** b, assuming b is a nonnegative integer"""
    result = 1
    while b:
        if b & 1:
            result *= a
        a *= a
        b >>= 1
    return result
I cleaned up your code a little to better fit PEP8 (Python style standards). Note that there is no error checking in your code, especially to ensure that b is a nonnegative integer. I believe my code gets an infinite loop if b is a negative integer while yours returns a false result. So please do that error check! Also note that your code says power(0, 0) == 1 which is pretty standard for such a function but still takes some people by surprise.
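A version with the suggested error check might look like this (a sketch; raising ValueError for a negative or non-integer exponent is one reasonable choice, not the only one):

```python
def power(a, b):
    """Return a ** b for a nonnegative integer exponent b."""
    if not isinstance(b, int) or b < 0:
        raise ValueError("exponent must be a nonnegative integer")
    result = 1
    while b:
        if b & 1:          # this bit of b is set, so fold in the current a
            result *= a
        a *= a             # square a for the next binary digit of b
        b >>= 1
    return result

print(power(3, 5))  # 243
print(power(0, 0))  # 1
```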

My short recursive function takes too long to execute, how can I optimize it?

I am trying to solve this problem on CodeChef: http://www.codechef.com/problems/COINS
But when I submit my code, it apparently takes too long to execute, and says the time has expired. I am not sure if my code is inefficient (it doesn't seem like it to me) or if I am having trouble with I/O. There is a 9 second time limit to solve at most 10 inputs, 0 <= n <= 1 000 000 000.
In Byteland they have a very strange monetary system.
Each Bytelandian gold coin has an integer number written on it. A coin
n can be exchanged in a bank into three coins: n/2, n/3 and n/4. But
these numbers are all rounded down (the banks have to make a profit).
You can also sell Bytelandian coins for American dollars. The exchange
rate is 1:1. But you can not buy Bytelandian coins.
You have one gold coin. What is the maximum amount of American dollars
you can get for it?
Here is my code; it seems to take too long for an input of 1 000 000 000:
def coinProfit(n):
    a = n/2
    b = n/3
    c = n/4
    if a+b+c > n:
        nextProfit = coinProfit(a)+coinProfit(b)+coinProfit(c)
        if nextProfit > a+b+c:
            return nextProfit
        else:
            return a+b+c
    return n

while True:
    try:
        n = input()
        print(coinProfit(n))
    except Exception:
        break
The problem is that your code branches each recursive call into three new ones. This leads to exponential behavior.
The nice thing however is that most calls are duplicates: if you call coinProfit with 40, this will cascade to:
coinProfit(40)
- coinProfit(20)
  - coinProfit(10)
  - coinProfit(6)
  - coinProfit(5)
- coinProfit(13)
- coinProfit(10)
What you see is that a lot of effort is repeated (even in this small example, coinProfit is already called twice on 10).
You can use dynamic programming to solve this: store earlier computed results, preventing the function from branching again on those parts.
One can implement dynamic programming by hand, but one can also use a @memoize decorator to do this automatically; without it, the function does the same work far too many times.
import math

def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]
    return helper

@memoize
def coinProfit(n):
    a = math.floor(n/2)
    b = math.floor(n/3)
    c = math.floor(n/4)
    if a+b+c > n:
        nextProfit = coinProfit(a)+coinProfit(b)+coinProfit(c)
        if nextProfit > a+b+c:
            return nextProfit
        else:
            return a+b+c
    return n
The @memoize decorator transforms the function such that a cache of already-calculated outputs is maintained. If the output for a given input has already been computed, it is returned immediately from the cache. Otherwise it is computed as defined by your method, stored in the cache (for later use) and returned.
As @steveha points out, Python already has a built-in memoization decorator, functools.lru_cache; more info can be found in the functools documentation.
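For illustration, here is the same function rewritten with the standard-library decorator (a sketch for Python 3; functools.lru_cache stands in for the hand-rolled memoize above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct coin value seen
def coinProfit(n):
    if n == 0:
        return 0
    # either keep the coin, or exchange it and profit from the three pieces
    return max(n, coinProfit(n // 2) + coinProfit(n // 3) + coinProfit(n // 4))

print(coinProfit(12))          # 13
print(coinProfit(1000000000))  # large inputs finish quickly with the cache
```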
A final note is that @memoize and other dynamic programming constructs are not the solution to all efficiency problems. First of all, @memoize can have an impact on side effects: say your function prints something on stdout; with @memoize this will change the number of times something is printed. And secondly, there are problems like the SAT problem where @memoize simply doesn't work at all, because the search space itself is exponential (as far as we know). Such problems are called NP-hard.
You can optimize the program by storing results in some sort of cache. If the result exists in the cache, there is no need to perform the calculation; otherwise calculate it and put the value in the cache. This way you avoid recalculating already-calculated values. E.g.
# note: Python 2 syntax (print statement, raw_input, integer /)
cache = {0: 0}

def coinProfit(num):
    if num in cache:
        return cache[num]
    else:
        a = num / 2
        b = num / 3
        c = num / 4
        tmp = coinProfit(c) + coinProfit(b) + coinProfit(a)
        cache[num] = max(num, tmp)
        return cache[num]

while True:
    try:
        print coinProfit(int(raw_input()))
    except:
        break
I just tried and noticed a few things... This doesn't have to be considered the answer.
On my (recent) machine, it takes a solid 30 seconds to compute with n = 100 000 000. I imagine that's pretty normal for the algorithm you just wrote, because it computes the same values time and time again (you didn't optimise your recursive calls with caching as suggested in other answers).
Also, the problem definition is pretty gentle because it insists: each Bytelandian gold coin has an integer number written on it, and these numbers are all rounded down. Knowing this, you should turn the first three lines of your function into:
import math

def coinProfit(n):
    a = math.floor(n/2)
    b = math.floor(n/3)
    c = math.floor(n/4)
This prevents a, b and c from being turned into floats (on Python 3 at least), which would otherwise send your computer into a huge recursive mess, even for the smallest values of n.
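Alternatively, Python 3's floor-division operator // rounds down while keeping everything as integers, so math.floor isn't needed at all (a sketch of the same function, still uncached):

```python
def coinProfit(n):
    a = n // 2  # // floor-divides and returns an int in Python 3
    b = n // 3
    c = n // 4
    if a + b + c > n:
        return max(coinProfit(a) + coinProfit(b) + coinProfit(c), a + b + c)
    return n

print(coinProfit(12))  # 13
```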
